Step-by-Step Implementation of Convolutional Neural Networks (CNN) in Python
Objective:
We will build a Convolutional Neural Network (CNN) using Python, TensorFlow, and Keras to classify images from the CIFAR-10 dataset.
Step 1: Install Dependencies
Before proceeding, ensure you have the required libraries installed. If not, install them using:
pip install tensorflow numpy matplotlib
Step 2: Import Libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np
Explanation:
- tensorflow and keras are used for building and training the CNN model.
- cifar10 provides the CIFAR-10 dataset: 60,000 32x32 color images in 10 classes, split into 50,000 training and 10,000 test images.
- Conv2D, MaxPooling2D, Flatten, Dense, and Dropout define the layers of the CNN.
- to_categorical converts labels to one-hot encoded format.
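To make the one-hot idea concrete, here is a minimal pure-Python sketch. The helper name one_hot is hypothetical and only mirrors the idea behind to_categorical; in this tutorial the Keras function does the same job on NumPy arrays.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded

# Three example labels out of 4 possible classes (illustrative values)
encoded = one_hot([0, 2, 3], num_classes=4)
for row in encoded:
    print(row)
```

Each row contains exactly one 1.0, placed at the position of the class index.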
Step 3: Load and Preprocess the Data
# Load CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Normalize pixel values (0-255 → 0-1)
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
Explanation:
- CIFAR-10 has 10 classes (airplane, automobile, bird, etc.).
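- Pixel values are scaled from the 0-255 range to 0-1, which keeps the inputs small and consistent and tends to make training faster and more stable.
- Labels are one-hot encoded because the categorical_crossentropy loss used in Step 4 expects one-hot targets.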
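The pixel scaling above is a single division. The sketch below shows it on a few hypothetical intensity values using plain Python lists; the helper name scale_pixels is an assumption made for illustration, and the tutorial code applies the same division to whole NumPy arrays.

```python
def scale_pixels(pixels):
    """Map 8-bit intensities (0-255) to floats in the range 0.0-1.0."""
    return [value / 255.0 for value in pixels]

raw = [0, 51, 255]           # hypothetical pixel intensities
scaled = scale_pixels(raw)
print(scaled)                # [0.0, 0.2, 1.0]
```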
Step 4: Build the CNN Model
# Define the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # Convolutional layer
    MaxPooling2D((2, 2)),                                            # Pooling layer
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),                                                       # Flatten feature maps
    Dense(128, activation='relu'),                                   # Fully connected layer
    Dropout(0.5),                                                    # Dropout for regularization
    Dense(10, activation='softmax')                                  # Output layer
])
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Print model summary
model.summary()
Explanation:
- Convolutional Layers: Extract features from images (32, 64, 128 filters).
- Pooling Layers: Reduce spatial dimensions using max-pooling.
- Flatten Layer: Converts feature maps into a 1D array.
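- Dense Layer: A fully connected layer with 128 units combines the extracted features.
- Dropout Layer: Reduces overfitting by randomly zeroing 50% of the incoming activations during training; it has no effect at prediction time.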
- Output Layer: Uses softmax for multi-class classification (10 categories).
- Loss Function: categorical_crossentropy for multi-class problems.
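The layer stack above can be checked by hand. The sketch below traces the feature-map size and counts trainable parameters, assuming the Keras defaults used in Step 4 (no padding and stride 1 for Conv2D, non-overlapping 2x2 windows for MaxPooling2D). The helper names are hypothetical; the totals should agree with what model.summary() prints.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

size, channels, total = 32, 3, 0
for filters in (32, 64, 128):
    size = conv_out(size)
    total += conv_params(channels, filters)
    print(f"Conv2D({filters}): {size}x{size}x{filters}")
    size = pool_out(size)
    print(f"MaxPooling2D: {size}x{size}x{filters}")
    channels = filters

flat = size * size * channels          # values entering the first Dense layer
total += dense_params(flat, 128) + dense_params(128, 10)
print(f"Flatten: {flat} values")
print(f"Trainable parameters: {total}")
```

The spatial size shrinks as 32, 30, 15, 13, 6, 4, 2, so Flatten produces 2 x 2 x 128 = 512 values. Pooling, Flatten, and Dropout add no parameters.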
Step 5: Train the Model
# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))
Explanation:
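- epochs=10 means the model makes 10 full passes over the training set.
- batch_size=64 means the weights are updated once per batch of 64 images.
- validation_data=(X_test, y_test) reports loss and accuracy on held-out images after every epoch. Because the test set doubles as validation data here, avoid tuning the model against these numbers; a separate validation split gives a cleaner final estimate.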
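The relationship between epochs, batch size, and weight updates is simple arithmetic. This short sketch works it out for the 50,000 training images in CIFAR-10 and the settings from Step 5; the variable names are chosen for illustration.

```python
import math

train_images = 50_000   # size of the CIFAR-10 training split
batch_size = 64
epochs = 10

# The last batch of each epoch is smaller, so round up
steps_per_epoch = math.ceil(train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)   # 782 7820
```

The step count that Keras shows in its progress bar for each epoch should match steps_per_epoch.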
Step 6: Evaluate the Model
# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
Explanation:
- evaluate() returns the loss and accuracy on the test images, which the model was not trained on.
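The accuracy reported here is the fraction of images whose predicted class matches the true class. A minimal sketch of that calculation follows; the helper name accuracy and the label values are hypothetical and are not output from this model.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted class equals the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

predicted = [3, 8, 8, 0, 6]   # hypothetical predicted class indices
actual = [3, 8, 1, 0, 6]      # hypothetical true class indices
acc = accuracy(predicted, actual)
print(f"Accuracy: {acc:.4f}")
```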
Step 7: Make Predictions
# Make predictions
sample = np.expand_dims(X_test[0], axis=0) # Take one test image
prediction = model.predict(sample)
predicted_class = np.argmax(prediction)
# CIFAR-10 class names
class_names = ["Airplane", "Automobile", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck"]
print(f"Predicted Class: {class_names[predicted_class]}")
Explanation:
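- expand_dims() adds a batch dimension, turning the (32, 32, 3) image into a (1, 32, 32, 3) array, because the model expects a batch of images.
- predict() returns an array of shape (1, 10) with one probability per class.
- argmax() returns the index of the highest probability, which is then used to look up the class name.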
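The probabilities come from the softmax output layer, and picking the class is an argmax over them. The sketch below reproduces both steps in plain Python on hypothetical raw scores for three classes; the function name softmax is defined here for illustration and is not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 3.0, 0.5]      # hypothetical raw outputs for 3 classes
probs = softmax(scores)

# Index of the largest probability, the same idea as np.argmax
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, [round(p, 3) for p in probs])
```

The largest raw score always maps to the largest probability, so the predicted class here is index 1.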
Step 8: Visualize Training Performance
# Plot accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Model Training Accuracy')
plt.show()
Explanation:
- The graph shows how training and validation accuracy change over the epochs.
- Validation accuracy indicates generalization ability; a widening gap between the two curves suggests overfitting.
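The same curves can be read numerically. The sketch below finds the epoch with the best validation accuracy and the final gap between training and validation accuracy. The helper name summarize is hypothetical, and the numbers in toy_history are made up for illustration; they are not results from this model.

```python
def summarize(history):
    """Return (best_epoch, gap) from per-epoch accuracy lists.

    best_epoch is 1-based; gap is training minus validation accuracy
    at the last epoch.
    """
    val = history["val_accuracy"]
    best_epoch = val.index(max(val)) + 1
    gap = history["accuracy"][-1] - val[-1]
    return best_epoch, gap

# Hypothetical values for illustration only
toy_history = {
    "accuracy": [0.40, 0.55, 0.65, 0.75],
    "val_accuracy": [0.45, 0.55, 0.60, 0.58],
}
best_epoch, gap = summarize(toy_history)
print(best_epoch, round(gap, 2))
```

With a real run, pass history.history from Step 5 instead of toy_history.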
Key Takeaways
- CNNs excel at image classification by capturing spatial patterns.
- Convolutional layers extract features like edges and textures.
- Max pooling reduces spatial size while retaining important features.
- ReLU activation helps mitigate vanishing gradient issues.
- Dropout layers help prevent overfitting.
- Softmax activation is used for multi-class classification.
- Data normalization improves training stability.
- Test accuracy shows how well the model generalizes to unseen images.
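To see the first few takeaways in action, here is a pure-Python sketch of one convolution, ReLU, and max pooling pass on a tiny single-channel image. The image and the hand-written vertical-edge kernel are assumptions chosen for illustration; a real Conv2D layer learns its kernels, adds a bias, and works across several channels.

```python
def conv2d(image, kernel):
    """Stride-1 sliding-window sum with no padding, as a CNN layer computes it."""
    k = len(kernel)
    out = len(image) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]

def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, pool=2):
    """Keep the largest value in each non-overlapping pool x pool window."""
    n = len(feature_map) // pool
    return [
        [
            max(feature_map[i * pool + di][j * pool + dj]
                for di in range(pool) for dj in range(pool))
            for j in range(n)
        ]
        for i in range(n)
    ]

# 6x6 image with a bright vertical stripe in the middle
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Responds positively to dark-to-bright edges, negatively to bright-to-dark
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = conv2d(image, kernel)     # 4x4, rows of [3, 3, -3, -3]
activated = relu(features)           # 4x4, rows of [3, 3, 0, 0]
pooled = max_pool(activated)         # 2x2
print(pooled)                        # [[3, 0], [3, 0]]
```

The convolution marks where the edge is, ReLU discards the negative response, and pooling halves each spatial dimension while keeping the strongest activation.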
Next blog: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)