Step-by-Step Implementation of LSTM (Long Short-Term Memory) in Python
Objective:
We will build an LSTM-based Recurrent Neural Network (RNN) using TensorFlow and Keras to perform sentiment analysis on the IMDB movie review dataset.
Step 1: Install Dependencies
Ensure you have the required libraries installed. If not, install them using:
pip install tensorflow numpy matplotlib
Step 2: Import Libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import matplotlib.pyplot as plt
Explanation:
- tensorflow and keras are used for building and training the LSTM model.
- imdb provides a dataset of 50,000 movie reviews labeled as positive or negative.
- LSTM provides the LSTM layer, a gated form of RNN designed to mitigate the vanishing gradient problem.
- Embedding converts words into dense vector representations.
- pad_sequences ensures all sequences have the same length (a short sketch of its behavior follows this list).
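To build intuition, here is a minimal sketch of what pad_sequences does, using made-up toy sequences (not the IMDB data) and the imports above:

# Toy illustration with hypothetical sequences
toy_sequences = [[1, 2, 3], [4, 5, 6, 7, 8]]
print(pad_sequences(toy_sequences, maxlen=4))
# [[0 1 2 3]
#  [5 6 7 8]]  <- shorter sequences are left-padded with 0; longer ones are truncated from the front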
Step 3: Load and Preprocess the Data
# Load IMDB dataset (keep only the 10,000 most frequent words)
max_features = 10000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
# Pad sequences to ensure equal length
maxlen = 100 # Max review length
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)
Explanation:
- The IMDB dataset contains 50,000 reviews, split evenly into 25,000 for training and 25,000 for testing.
- We keep only the 10,000 most frequent words to limit the vocabulary size and model complexity.
- Padding makes every sequence exactly 100 tokens long: shorter reviews are zero-padded and longer ones are truncated (a decoding sketch follows this list).
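As a sanity check, you can map an encoded review back into words. This is an optional sketch; it assumes the standard Keras IMDB convention that data indices are offset by 3, with 0, 1, and 2 reserved for padding, start-of-review, and unknown tokens:

# Optional: decode the first training review back into words
word_index = imdb.get_word_index()                                      # word -> rank
reverse_index = {rank + 3: word for word, rank in word_index.items()}   # shift by the reserved offset
reverse_index.update({0: "<PAD>", 1: "<START>", 2: "<UNK>"})
decoded = " ".join(reverse_index.get(i, "?") for i in X_train[0])
print(decoded[:200])  # print the first 200 characters of the decoded review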
Step 4: Build the LSTM Model
# Define the LSTM model
model = Sequential([
Embedding(input_dim=max_features, output_dim=128, input_length=maxlen), # Word embeddings
LSTM(64, return_sequences=True), # First LSTM layer
LSTM(32, return_sequences=False), # Second LSTM layer
Dense(16, activation='relu'), # Fully connected layer
Dropout(0.5), # Dropout to prevent overfitting
Dense(1, activation='sigmoid') # Output layer for binary classification
])
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Print model summary
model.summary()
Explanation:
- Embedding Layer: Converts words into dense vector representations.
- First LSTM Layer: Captures long-term dependencies across the token sequence and, with return_sequences=True, passes the full sequence of hidden states to the next layer.
- Second LSTM Layer: Condenses that sequence into a single summary vector for the review (a NumPy sketch of one LSTM step follows this list).
- Dense Layer: Adds fully connected neurons to refine predictions.
- Dropout Layer: Reduces overfitting by randomly dropping neurons.
- Sigmoid Activation: Outputs probabilities for binary classification (positive/negative review).
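To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM cell step. The weights are random and the dimensions are made up purely for illustration; this is not the Keras implementation:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM time step: input (i), forget (f), output (o) gates and candidate values (g)
    z = W @ x_t + U @ h_prev + b                 # shape (4 * units,)
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_t = f * c_prev + i * g                     # cell state carries long-term information
    h_t = o * np.tanh(c_t)                       # hidden state is the per-step output
    return h_t, c_t

units, features = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * units, features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)
h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, features)):       # a toy sequence of 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)                                         # final hidden state after the sequence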
Step 5: Train the Model
# Train the model
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))
Explanation:
- epochs=5 means the model makes 5 full passes over the training data.
- batch_size=64 means 64 reviews are processed per gradient update.
- validation_data=(X_test, y_test) monitors performance on data the model is not trained on. Reusing the test set for validation is a simplification for this tutorial; in practice you would hold out a separate validation split (see the optional early-stopping sketch below).
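If validation performance stops improving before the final epoch, training can be cut short. A minimal sketch using Keras's EarlyStopping callback (an optional addition, not part of the original recipe; it keeps the tutorial's validation_data for simplicity):

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss stops improving and roll back to the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=10, batch_size=64,
                    validation_data=(X_test, y_test), callbacks=[early_stop])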
Step 6: Evaluate the Model
# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
Explanation:
- evaluate() returns the loss and accuracy on the held-out test reviews; test accuracy estimates how well the model generalizes (a sketch of additional metrics follows).
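Accuracy alone can hide class-specific errors. A short optional sketch that computes precision, recall, and F1 with scikit-learn (assumed to be installed; it is not among the dependencies listed in Step 1):

from sklearn.metrics import classification_report

y_prob = model.predict(X_test)                     # predicted probabilities in [0, 1]
y_pred = (y_prob > 0.5).astype("int32").ravel()    # threshold at 0.5 to get class labels
print(classification_report(y_test, y_pred, target_names=["Negative", "Positive"]))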
Step 7: Make Predictions
# Make a prediction on a sample review
sample = X_test[0].reshape(1, -1) # Reshape for model input
prediction = model.predict(sample)[0][0]  # scalar probability of positive sentiment
sentiment = "Positive" if prediction > 0.5 else "Negative"
print(f"Predicted Sentiment: {sentiment}")
Explanation:
- predict() returns the probability of positive sentiment.
- A probability above 0.5 is classified as positive, otherwise negative (a sketch for scoring a brand-new review follows this list).
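The sample above is already encoded. To score a brand-new review, the raw text must be converted with the same IMDB word index and offset used during training. The helper below, encode_review, is a hypothetical name introduced here for illustration; it assumes the standard offset of 3 and maps out-of-vocabulary or rare words to index 2 (<UNK>):

def encode_review(text, word_index, max_features=10000, maxlen=100):
    # Hypothetical helper: map each word to its IMDB index (+3 offset); unknown/rare words become 2 (<UNK>)
    tokens = [1]  # index 1 marks the start of a review
    for word in text.lower().split():
        idx = word_index.get(word, -1) + 3
        tokens.append(idx if 2 < idx < max_features else 2)
    return pad_sequences([tokens], maxlen=maxlen)

word_index = imdb.get_word_index()
encoded = encode_review("a wonderful heartfelt film with great acting", word_index)
prob = model.predict(encoded)[0][0]
print("Positive" if prob > 0.5 else "Negative", prob)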
Step 8: Visualize Training Performance
# Plot accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Model Training Accuracy')
plt.show()
Explanation:
- The graph shows how training and validation accuracy change over the epochs.
- Validation accuracy tracking close to training accuracy suggests the model generalizes; a widening gap signals overfitting (a companion loss plot is sketched below).
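It is also worth plotting the loss curves, since a rising validation loss is often the earliest sign of overfitting:

# Plot loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Model Training Loss')
plt.show()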
Key Takeaways
- LSTMs outperform vanilla RNNs on long sequences because their gating mechanism preserves long-term dependencies.
- Embedding layers convert words into dense vectors, making text processing efficient.
- Multiple LSTM layers improve learning, allowing better feature extraction.
- ReLU activation in the dense layer helps mitigate vanishing gradients.
- Dropout layers help prevent overfitting.
- Sigmoid activation is used for binary classification (positive/negative review).
- Test accuracy determines how well the model generalizes to new reviews.
Next Blog: Generative Adversarial Networks (GANs)