Deep Learning | February 02, 2025

Step-by-Step Implementation of LSTM (Long Short-Term Memory) in Python

Objective:

We will build an LSTM-based Recurrent Neural Network (RNN) using TensorFlow and Keras to perform sentiment analysis on the IMDB movie review dataset.

Step 1: Install Dependencies

Ensure you have the required libraries installed. If not, install them using:

pip install tensorflow numpy matplotlib

Step 2: Import Libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import matplotlib.pyplot as plt

Explanation:

  • tensorflow and keras are used for building and training the LSTM model.
  • imdb provides a dataset of 50,000 movie reviews labeled as positive or negative.
  • LSTM defines the LSTM layer, a gated form of RNN that mitigates the vanishing-gradient problem and can learn long-range dependencies.
  • Embedding converts word indices into dense vector representations.
  • pad_sequences ensures all sequences have the same length (a toy example follows this list).
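
As a quick illustration of what pad_sequences does, the toy snippet below pads two made-up integer sequences to a common length of 5; by default, zeros are added at the front and sequences longer than maxlen are truncated from the front.

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Two "reviews" of different lengths, already encoded as word indices (made-up values)
toy_sequences = [[5, 12, 7], [3, 8]]

# Pad both to length 5; zeros are prepended by default ('pre' padding)
print(pad_sequences(toy_sequences, maxlen=5))
# [[ 0  0  5 12  7]
#  [ 0  0  0  3  8]]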

Step 3: Load and Preprocess the Data

# Load IMDB dataset (keep only the 10,000 most frequent words)
max_features = 10000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to ensure equal length
maxlen = 100  # Max review length
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)

Explanation:

  • The IMDB dataset contains 50,000 reviews, split evenly into 25,000 for training and 25,000 for testing.
  • We keep only the 10,000 most frequent words to limit the vocabulary and reduce model complexity.
  • Padding makes every sequence exactly 100 tokens long: shorter reviews are zero-padded and longer ones are truncated (both at the front by default). A quick way to decode a review back into words is shown below.
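
Each review is stored as a list of integer word indices, so it can be useful to decode one back into text as a sanity check. The optional snippet below uses imdb.get_word_index(); note that load_data() shifts every index by 3 because indices 0-2 are reserved for padding, start-of-sequence, and unknown tokens.

# Decode the first training review back into words (optional sanity check)
word_index = imdb.get_word_index()
reverse_index = {value + 3: key for key, value in word_index.items()}
reverse_index[0], reverse_index[1], reverse_index[2] = "<PAD>", "<START>", "<UNK>"

decoded_review = " ".join(reverse_index.get(i, "<UNK>") for i in X_train[0])
print(decoded_review[:200])  # First 200 characters of the decoded review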

Step 4: Build the LSTM Model

# Define the LSTM model
model = Sequential([
    Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),  # Word embeddings
    LSTM(64, return_sequences=True),  # First LSTM layer
    LSTM(32, return_sequences=False),  # Second LSTM layer
    Dense(16, activation='relu'),  # Fully connected layer
    Dropout(0.5),  # Dropout to prevent overfitting
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Print model summary
model.summary()

Explanation:

  • Embedding Layer: Converts word indices into dense 128-dimensional vectors.
  • First LSTM Layer: Extracts long-term dependencies from the text sequence; return_sequences=True passes the full output sequence to the next LSTM layer.
  • Second LSTM Layer: Further refines the sequential representation and returns only its final output (a hand-computed parameter count for all layers follows this list).
  • Dense Layer: Adds fully connected neurons to refine predictions.
  • Dropout Layer: Reduces overfitting by randomly dropping neurons.
  • Sigmoid Activation: Outputs probabilities for binary classification (positive/negative review).
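
As a rough cross-check of model.summary(), each layer's parameter count can be computed by hand. The arithmetic below follows the standard Keras formulas (an LSTM layer has four gate matrices acting on the concatenated input and hidden state, plus biases); compare the total against your own summary output.

# Hand-computed parameter counts (should match model.summary())
embedding_params = 10000 * 128                 # vocab_size * embedding_dim = 1,280,000
lstm1_params = 4 * ((128 + 64) * 64 + 64)      # 4 gates * ((input + units) * units + bias) = 49,408
lstm2_params = 4 * ((64 + 32) * 32 + 32)       # = 12,416
dense_params = 32 * 16 + 16                    # weights + bias = 528
output_params = 16 * 1 + 1                     # = 17

total = embedding_params + lstm1_params + lstm2_params + dense_params + output_params
print(total)  # 1,342,369 trainable parameters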

Step 5: Train the Model

# Train the model
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))

Explanation:

  • epochs=5 means the model makes 5 full passes over the training set.
  • batch_size=64 means the weights are updated after every batch of 64 reviews.
  • validation_data=(X_test, y_test) monitors performance on unseen data after each epoch; an optional early-stopping variant is sketched below.
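
Five epochs is a reasonable starting point, but the model may begin to overfit earlier or later. One optional refinement, sketched below, is Keras's EarlyStopping callback, which halts training once validation loss stops improving; the patience value here is an arbitrary choice.

from tensorflow.keras.callbacks import EarlyStopping

# Stop training when validation loss has not improved for 2 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=10,                      # upper bound; early stopping usually halts sooner
    batch_size=64,
    validation_data=(X_test, y_test),
    callbacks=[early_stop]
)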

Step 6: Evaluate the Model

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

Explanation:

  • evaluate() returns the loss and accuracy on the held-out test reviews; a manual re-computation of the accuracy is sketched below.
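
To see where that number comes from, the optional check below recomputes accuracy by hand: predict() returns one probability per review, which is thresholded at 0.5 and compared against the true labels. The result should closely match what evaluate() reports.

import numpy as np

# Threshold the predicted probabilities into 0/1 labels and compare with y_test
probs = model.predict(X_test)
preds = (probs.flatten() > 0.5).astype(int)

manual_acc = np.mean(preds == y_test)
print(f"Manually computed accuracy: {manual_acc:.4f}")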

Step 7: Make Predictions

# Make a prediction on a sample review
sample = X_test[0].reshape(1, -1)  # Reshape for model input
prediction = model.predict(sample)
sentiment = "Positive" if prediction[0][0] > 0.5 else "Negative"

print(f"Predicted Sentiment: {sentiment}")

Explanation:

  • predict() returns the probability that the review is positive.
  • A probability above 0.5 is labeled Positive, otherwise Negative; a sketch of how to classify a brand-new raw-text review follows below.
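
The sample above is already encoded, but in practice you usually start from raw text. The sketch below shows one way to encode a new review by hand using imdb.get_word_index(); the example sentence is made up, and any word outside the top 10,000 is mapped to the 'unknown' index (2), mirroring how load_data() encodes the dataset.

word_index = imdb.get_word_index()

def encode_review(text, num_words=max_features, index_from=3):
    # Map each word to its IMDB index (+3 offset for reserved tokens); rare/unknown words become 2
    tokens = [1]  # index 1 marks the start of a sequence in the IMDB encoding
    for word in text.lower().split():
        idx = word_index.get(word)
        tokens.append(idx + index_from if idx is not None and idx + index_from < num_words else 2)
    return pad_sequences([tokens], maxlen=maxlen)

new_review = "this movie was wonderful and the acting was great"  # made-up example
prob = model.predict(encode_review(new_review))[0][0]
print("Positive" if prob > 0.5 else "Negative", f"(probability {prob:.2f})")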

Step 8: Visualize Training Performance

# Plot accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Model Training Accuracy')
plt.show()

Explanation:

  • The graph shows how training accuracy improves over the epochs.
  • Validation accuracy indicates how well the model generalizes; plotting the loss curves as well (sketched below) makes overfitting easier to spot.
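
A complementary view is the loss plot: if training loss keeps falling while validation loss starts rising, the model is memorizing the training reviews. The snippet below reuses the same history object.

# Plot loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Model Training Loss')
plt.show()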

Key Takeaways

  1. LSTMs are superior to vanilla RNNs because they handle long-term dependencies efficiently.
  2. Embedding layers convert words into dense vectors, making text processing efficient.
  3. Multiple LSTM layers improve learning, allowing better feature extraction.
  4. ReLU activation in the dense layer helps mitigate vanishing gradients, while the LSTM's gating mechanism addresses them in the recurrent connections.
  5. Dropout layers help prevent overfitting.
  6. Sigmoid activation is used for binary classification (positive/negative review).
  7. Test accuracy determines how well the model generalizes to new reviews.

Next Blog: Generative Adversarial Networks (GANs)

 

Purnima