Street View Housing Number Digit Recognition¶
Context¶
One of the most interesting tasks in deep learning is recognising objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful, as demonstrated by applications such as the one described below.
The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos. It is one of the most popular image recognition datasets. It has been used in neural networks created by Google to improve map quality by automatically transcribing address numbers from patches of pixels. A transcribed number, combined with a known street address, helps pinpoint the location of the building it represents.
Objective¶
Our objective is to predict the digit depicted in each image, first using artificial (fully connected feed-forward) neural networks and then convolutional neural networks. We will go through several models of each type and finally select the one that gives us the best performance.
Dataset¶
Here, we will use a subset of the original data to save computation time. The dataset is provided as a .h5 file, and the basic preprocessing steps have already been applied to it. Its contents can also be inspected before loading, as sketched below.
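A minimal sketch for that inspection, assuming the drive is already mounted (next step) and the file sits at the path used later in this notebook:
#hedged sketch: listing the datasets stored in the .h5 file
import h5py
inspect_path = "/content/drive/MyDrive/MIT_Datasets/SVHN_single_grey1.h5"
with h5py.File(inspect_path, 'r') as f:
    #should list the data and label arrays: X_train, y_train, X_test, y_test
    print(list(f.keys()))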
Mount the drive¶
#mounting google drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Importing the necessary libraries¶
#basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import h5py
#model selection, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
#deep learning training
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.layers import Conv2D,LeakyReLU,MaxPooling2D,Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses, backend
#model metrics
from sklearn.metrics import classification_report, confusion_matrix
Checking the version of TensorFlow¶
#printing tensorflow version
print("TensorFlow version:", tf.__version__)
TensorFlow version: 2.15.0
Load the dataset¶
#setting path variable
path = "/content/drive/MyDrive/MIT_Datasets/SVHN_single_grey1.h5"
#loading dataset
with h5py.File(path, 'r') as f:
    #extracting data and labels
    X_train = np.array(f['X_train'])
    y_train = np.array(f['y_train'])
    X_test = np.array(f['X_test'])
    y_test = np.array(f['y_test'])
#printing set shapes to confirm number of images
print("Train data shape:", X_train.shape)
print("Train labels shape:", y_train.shape)
print("Test data shape:", X_test.shape)
print("Test labels shape:", y_test.shape)
Train data shape: (42000, 32, 32)
Train labels shape: (42000,)
Test data shape: (18000, 32, 32)
Test labels shape: (18000,)
Observation:
- The train labels array has a shape of (42000,) and the test labels array a shape of (18000,), i.e. one label per image in each set.
- The images in both the train and test sets have dimensions of 32x32 pixels.
Visualizing images¶
#visualising first 10 images
plt.figure(figsize=(10, 1))
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(y_train[i])
    plt.axis('off')
plt.show()
Data preparation¶
- Print the shape and the array of pixels for the first image in the training dataset.
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test dataset.
- One-hot encode the target variable.
#printing shape and array of pixels for the first image in the training dataset
print("First image shape:", X_train[0].shape)
print("First image - pixel array:\n", X_train[0])
#flattening dataset, dividing by 255 to normalise
X_train = X_train.reshape(X_train.shape[0], 1024)/255
X_test = X_test.reshape(X_test.shape[0], 1024)/255
#encoding target variable
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
#printing set shapes to confirm number of images
print("Train data shape:", X_train.shape)
print("Train labels shape:", y_train.shape)
print("Test data shape:", X_test.shape)
print("Test labels shape:", y_test.shape)
First image shape: (32, 32)
First image - pixel array:
[[ 33.0704  30.2601  26.852  ...  71.4471  58.2204  42.9939]
 [ 25.2283  25.5533  29.9765 ... 113.0209 103.3639  84.2949]
 [ 26.2775  22.6137  40.4763 ... 113.3028 121.775  115.4228]
 ...
 [ 28.5502  36.212   45.0801 ...  24.1359  25.0927  26.0603]
 [ 38.4352  26.4733  23.2717 ...  28.1094  29.4683  30.0661]
 [ 50.2984  26.0773  24.0389 ...  49.6682  50.853   53.0377]]
Train data shape: (42000, 1024)
Train labels shape: (42000, 10)
Test data shape: (18000, 1024)
Test labels shape: (18000, 10)
Observation:
- Now that the images are flattened into 1-dimensional arrays, each sample in both the training and testing datasets has 1024 features.
- The training labels array now has a shape of (42000, 10) and the test labels array (18000, 10): each label is now a 10-element one-hot vector, one position per digit class. This is the desired outcome of one-hot encoding the target variable, illustrated below.
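As a quick illustration of what one-hot encoding produces (to_categorical is already imported above), the label 3 becomes a 10-element vector with a 1 in position 3:
#illustrative example: one-hot encoding a single label
print(to_categorical([3], num_classes=10))
#expected output: [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]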
#fixing seed for random number generators
np.random.seed(42)
tf.random.set_seed(42)
Building and training an ANN model¶
#defining model function
def nn_model_1():
    model = Sequential([
        Dense(64, activation='relu', input_shape=(1024, )),
        Dense(32, activation='relu'),
        Dense(10, activation='softmax')
    ])
    #instantiating adam optimiser
    adam = Adam(learning_rate=0.001)
    #compiling model with the optimiser instance (passing the string 'adam' would ignore the learning rate set above)
    model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
#setting model function as variable
nn_model_1 = nn_model_1()
#printing summary
nn_model_1.summary()
#fitting model
hist_nn_model_1 = nn_model_1.fit(X_train, y_train, epochs=20, validation_split=0.2, batch_size=128, verbose = 1)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 65600
dense_1 (Dense) (None, 32) 2080
dense_2 (Dense) (None, 10) 330
=================================================================
Total params: 68010 (265.66 KB)
Trainable params: 68010 (265.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/20
263/263 [==============================] - 4s 5ms/step - loss: 2.2961 - accuracy: 0.1240 - val_loss: 2.2563 - val_accuracy: 0.1383
Epoch 2/20
263/263 [==============================] - 1s 4ms/step - loss: 2.0585 - accuracy: 0.2698 - val_loss: 1.8846 - val_accuracy: 0.3338
Epoch 3/20
263/263 [==============================] - 1s 4ms/step - loss: 1.7369 - accuracy: 0.4081 - val_loss: 1.6266 - val_accuracy: 0.4599
Epoch 4/20
263/263 [==============================] - 1s 4ms/step - loss: 1.5514 - accuracy: 0.4853 - val_loss: 1.4956 - val_accuracy: 0.5125
Epoch 5/20
263/263 [==============================] - 1s 4ms/step - loss: 1.4396 - accuracy: 0.5305 - val_loss: 1.4022 - val_accuracy: 0.5494
Epoch 6/20
263/263 [==============================] - 1s 4ms/step - loss: 1.3600 - accuracy: 0.5671 - val_loss: 1.3411 - val_accuracy: 0.5855
Epoch 7/20
263/263 [==============================] - 1s 5ms/step - loss: 1.3114 - accuracy: 0.5885 - val_loss: 1.3030 - val_accuracy: 0.5967
Epoch 8/20
263/263 [==============================] - 2s 6ms/step - loss: 1.2705 - accuracy: 0.6033 - val_loss: 1.2479 - val_accuracy: 0.6185
Epoch 9/20
263/263 [==============================] - 1s 5ms/step - loss: 1.2338 - accuracy: 0.6179 - val_loss: 1.2217 - val_accuracy: 0.6263
Epoch 10/20
263/263 [==============================] - 1s 4ms/step - loss: 1.2044 - accuracy: 0.6267 - val_loss: 1.1944 - val_accuracy: 0.6321
Epoch 11/20
263/263 [==============================] - 1s 4ms/step - loss: 1.1815 - accuracy: 0.6365 - val_loss: 1.1811 - val_accuracy: 0.6410
Epoch 12/20
263/263 [==============================] - 1s 4ms/step - loss: 1.1585 - accuracy: 0.6446 - val_loss: 1.1517 - val_accuracy: 0.6501
Epoch 13/20
263/263 [==============================] - 1s 4ms/step - loss: 1.1367 - accuracy: 0.6526 - val_loss: 1.1350 - val_accuracy: 0.6555
Epoch 14/20
263/263 [==============================] - 1s 4ms/step - loss: 1.1216 - accuracy: 0.6586 - val_loss: 1.1194 - val_accuracy: 0.6601
Epoch 15/20
263/263 [==============================] - 1s 4ms/step - loss: 1.1042 - accuracy: 0.6656 - val_loss: 1.1156 - val_accuracy: 0.6621
Epoch 16/20
263/263 [==============================] - 1s 4ms/step - loss: 1.0903 - accuracy: 0.6705 - val_loss: 1.1203 - val_accuracy: 0.6593
Epoch 17/20
263/263 [==============================] - 1s 4ms/step - loss: 1.0761 - accuracy: 0.6773 - val_loss: 1.0797 - val_accuracy: 0.6742
Epoch 18/20
263/263 [==============================] - 1s 4ms/step - loss: 1.0637 - accuracy: 0.6799 - val_loss: 1.0679 - val_accuracy: 0.6768
Epoch 19/20
263/263 [==============================] - 1s 5ms/step - loss: 1.0556 - accuracy: 0.6802 - val_loss: 1.0598 - val_accuracy: 0.6801
Epoch 20/20
263/263 [==============================] - 2s 6ms/step - loss: 1.0449 - accuracy: 0.6846 - val_loss: 1.0497 - val_accuracy: 0.6836
Plotting Training and Validation Accuracies¶
#extracting training history, list of epochs
model_hist = hist_nn_model_1.history
epochs = [i for i in range(1,21)]
#plotting accuracies using model history by epoch
plt.figure(figsize = (8,8))
plt.plot(epochs,model_hist['accuracy'],ls = '--', label = 'Accuracy')
plt.plot(epochs,model_hist['val_accuracy'],ls = '--', label = 'Validation Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The training accuracy and validation accuracy are closely matched throughout training. This indicates the model is well fitted to the data and can be expected to generalise to unseen data.
- The training accuracy improves rapidly through the initial epochs (up to epoch 6), after which the rate of improvement slows from epoch 7 onwards.
- The model achieved a final training accuracy of 68.46% at epoch 20. This isn't particularly high, but it is still encouraging given the simplicity of this particular model. A direct check on the test set is sketched below.
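As an aside, the model's generalisation could also be measured directly on the held-out test set; a minimal sketch, relying on X_test and y_test already being flattened and one-hot encoded at this point:
#hedged sketch: evaluating nn_model_1 on the test set
test_loss, test_acc = nn_model_1.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")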
Let's build one more model with higher complexity and see if we can improve the performance of the model.
First, we need to clear the previous model's history from the Keras backend. Also, let's fix the seed again after clearing the backend.
#clearing keras backend
backend.clear_session()
#fixing seed for random number generators
np.random.seed(42)
tf.random.set_seed(42)
Building and training the new ANN model¶
#defining model function
def nn_model_2():
    model = Sequential([
        Dense(256, activation='relu', input_shape=(1024, )),
        Dense(128, activation='relu'),
        Dropout(rate=0.2),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(32, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    #instantiating adam optimiser
    adam = Adam(learning_rate=0.0005)
    #compiling model
    model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
#setting model function as variable
nn_model_2 = nn_model_2()
#printing model summary
print(nn_model_2.summary())
#fitting model
hist_nn_model_2 = nn_model_2.fit(X_train,y_train, epochs=30, validation_split=0.2, batch_size=128, verbose = 1)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 256) 262400
dense_1 (Dense) (None, 128) 32896
dropout (Dropout) (None, 128) 0
dense_2 (Dense) (None, 64) 8256
dense_3 (Dense) (None, 64) 4160
dense_4 (Dense) (None, 32) 2080
batch_normalization (Batch (None, 32) 128
Normalization)
dense_5 (Dense) (None, 10) 330
=================================================================
Total params: 310250 (1.18 MB)
Trainable params: 310186 (1.18 MB)
Non-trainable params: 64 (256.00 Byte)
_________________________________________________________________
None
Epoch 1/30
263/263 [==============================] - 6s 8ms/step - loss: 2.3359 - accuracy: 0.1030 - val_loss: 2.3034 - val_accuracy: 0.0942
Epoch 2/30
263/263 [==============================] - 2s 8ms/step - loss: 2.2799 - accuracy: 0.1299 - val_loss: 2.2352 - val_accuracy: 0.1855
Epoch 3/30
263/263 [==============================] - 1s 5ms/step - loss: 1.9052 - accuracy: 0.2969 - val_loss: 1.8256 - val_accuracy: 0.3443
Epoch 4/30
263/263 [==============================] - 2s 6ms/step - loss: 1.5668 - accuracy: 0.4568 - val_loss: 1.3822 - val_accuracy: 0.5492
Epoch 5/30
263/263 [==============================] - 1s 5ms/step - loss: 1.3245 - accuracy: 0.5639 - val_loss: 1.1980 - val_accuracy: 0.6177
Epoch 6/30
263/263 [==============================] - 2s 6ms/step - loss: 1.2032 - accuracy: 0.6086 - val_loss: 1.1109 - val_accuracy: 0.6413
Epoch 7/30
263/263 [==============================] - 1s 5ms/step - loss: 1.1301 - accuracy: 0.6360 - val_loss: 1.1062 - val_accuracy: 0.6405
Epoch 8/30
263/263 [==============================] - 1s 6ms/step - loss: 1.0573 - accuracy: 0.6631 - val_loss: 0.9751 - val_accuracy: 0.6901
Epoch 9/30
263/263 [==============================] - 2s 7ms/step - loss: 1.0128 - accuracy: 0.6771 - val_loss: 0.9443 - val_accuracy: 0.7002
Epoch 10/30
263/263 [==============================] - 2s 7ms/step - loss: 0.9677 - accuracy: 0.6930 - val_loss: 0.9037 - val_accuracy: 0.7129
Epoch 11/30
263/263 [==============================] - 1s 5ms/step - loss: 0.9386 - accuracy: 0.7026 - val_loss: 0.8602 - val_accuracy: 0.7286
Epoch 12/30
263/263 [==============================] - 1s 5ms/step - loss: 0.9130 - accuracy: 0.7093 - val_loss: 0.8869 - val_accuracy: 0.7215
Epoch 13/30
263/263 [==============================] - 1s 5ms/step - loss: 0.9027 - accuracy: 0.7117 - val_loss: 0.8584 - val_accuracy: 0.7311
Epoch 14/30
263/263 [==============================] - 2s 6ms/step - loss: 0.8838 - accuracy: 0.7212 - val_loss: 0.8698 - val_accuracy: 0.7286
Epoch 15/30
263/263 [==============================] - 2s 6ms/step - loss: 0.8535 - accuracy: 0.7299 - val_loss: 0.8112 - val_accuracy: 0.7486
Epoch 16/30
263/263 [==============================] - 1s 5ms/step - loss: 0.8449 - accuracy: 0.7316 - val_loss: 0.8466 - val_accuracy: 0.7273
Epoch 17/30
263/263 [==============================] - 2s 6ms/step - loss: 0.8267 - accuracy: 0.7384 - val_loss: 0.8158 - val_accuracy: 0.7449
Epoch 18/30
263/263 [==============================] - 2s 8ms/step - loss: 0.8258 - accuracy: 0.7378 - val_loss: 0.8321 - val_accuracy: 0.7394
Epoch 19/30
263/263 [==============================] - 1s 5ms/step - loss: 0.8147 - accuracy: 0.7410 - val_loss: 0.8021 - val_accuracy: 0.7462
Epoch 20/30
263/263 [==============================] - 1s 5ms/step - loss: 0.7889 - accuracy: 0.7509 - val_loss: 0.7471 - val_accuracy: 0.7695
Epoch 21/30
263/263 [==============================] - 2s 6ms/step - loss: 0.7764 - accuracy: 0.7548 - val_loss: 0.7608 - val_accuracy: 0.7649
Epoch 22/30
263/263 [==============================] - 1s 5ms/step - loss: 0.7708 - accuracy: 0.7559 - val_loss: 0.7576 - val_accuracy: 0.7608
Epoch 23/30
263/263 [==============================] - 2s 6ms/step - loss: 0.7489 - accuracy: 0.7632 - val_loss: 0.7766 - val_accuracy: 0.7582
Epoch 24/30
263/263 [==============================] - 2s 7ms/step - loss: 0.7477 - accuracy: 0.7623 - val_loss: 0.7249 - val_accuracy: 0.7748
Epoch 25/30
263/263 [==============================] - 2s 7ms/step - loss: 0.7338 - accuracy: 0.7657 - val_loss: 0.7292 - val_accuracy: 0.7715
Epoch 26/30
263/263 [==============================] - 2s 8ms/step - loss: 0.7299 - accuracy: 0.7658 - val_loss: 0.7230 - val_accuracy: 0.7729
Epoch 27/30
263/263 [==============================] - 1s 5ms/step - loss: 0.7226 - accuracy: 0.7684 - val_loss: 0.7278 - val_accuracy: 0.7740
Epoch 28/30
263/263 [==============================] - 2s 6ms/step - loss: 0.7158 - accuracy: 0.7719 - val_loss: 0.7436 - val_accuracy: 0.7689
Epoch 29/30
263/263 [==============================] - 2s 6ms/step - loss: 0.7163 - accuracy: 0.7712 - val_loss: 0.7519 - val_accuracy: 0.7640
Epoch 30/30
263/263 [==============================] - 1s 5ms/step - loss: 0.7058 - accuracy: 0.7771 - val_loss: 0.6762 - val_accuracy: 0.7902
Plot the Training and Validation Accuracies and write down your Observations.¶
#extracting training history, list of epochs
model_hist = hist_nn_model_2.history
epochs = [i for i in range(1,31)]
#plotting accuracies using model history by epoch
plt.figure(figsize = (8,8))
plt.plot(epochs,model_hist['accuracy'],ls = '--', label = 'Accuracy')
plt.plot(epochs,model_hist['val_accuracy'],ls = '--', label = 'Validation Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- As with the first NN model above, the training and validation accuracies are closely matched throughout training. The validation accuracy dips and spikes from epoch 6 onwards, but the overall trend is positive, so the model is well fitted to the data and can be expected to generalise to unseen data.
- As before, the training accuracy improves rapidly through the initial epochs, after which the rate of improvement slows from epoch 7 onwards.
- The model achieved a final training accuracy of 77.71% by epoch 30, a marked improvement over the first NN model's 68.46%.
Predictions on the test data¶
#making predictions using nn_model_2
y_pred = nn_model_2.predict(X_test)
563/563 [==============================] - 1s 2ms/step
#converting y_pred, y_test entries to single label using argmax
y_pred = np.argmax(y_pred, axis=-1)
y_test = np.argmax(y_test, axis=-1)
#confirming unique values in y_test, y_pred
unique_y_test = np.unique(y_test)
unique_y_pred = np.unique(y_pred)
#printing unique values in y_test, y_pred
print("Unique values in y_test:", unique_y_test)
print("Unique values in y_pred:", unique_y_pred)
Unique values in y_test: [0 1 2 3 4 5 6 7 8 9]
Unique values in y_pred: [0 1 2 3 4 5 6 7 8 9]
Print the classification report and the confusion matrix¶
#printing classification report
print(classification_report(y_test, y_pred))
#plotting heatmap of confusion matrix
con_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8,5))
sns.heatmap(con_matrix, annot=True, fmt='.0f')
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
plt.show()
precision recall f1-score support
0 0.85 0.79 0.82 1814
1 0.73 0.86 0.79 1828
2 0.84 0.78 0.81 1803
3 0.76 0.75 0.75 1719
4 0.82 0.83 0.83 1812
5 0.76 0.76 0.76 1768
6 0.81 0.77 0.79 1832
7 0.83 0.83 0.83 1808
8 0.75 0.76 0.75 1812
9 0.77 0.77 0.77 1804
accuracy 0.79 18000
macro avg 0.79 0.79 0.79 18000
weighted avg 0.79 0.79 0.79 18000
Final Observations:
- In terms of the classification report, the model achieved a final accuracy of 0.79, indicating it performs reasonably well at identifying the digits shown in the dataset images. However, precision, recall, and F1-score vary across the digit classes from 0.73 to 0.86, showing considerable variation in how reliably the model predicts each class.
- Digits 1, 4, and 7 have relatively high precision, recall, and F1-scores compared to the other classes. Digit 1 has the highest recall, indicating the model is best suited to identifying images of the number 1. By comparison, digits 3 and 8 score relatively low on all three metrics.
- Turning to the confusion matrix, it's notable that the model often confuses similar-looking digits, for example 0 with 9, 2 with 7, and 3 with 5; the sketch below ranks the most-confused pairs. This further points to the drawbacks of using this model to predict digits.
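To make that confusion concrete, the largest off-diagonal entries of con_matrix can be ranked; a minimal sketch using the matrix computed above:
#hedged sketch: ranking the most-confused digit pairs
off_diag = con_matrix.copy()
np.fill_diagonal(off_diag, 0)  #zero out the correct predictions on the diagonal
for idx in np.argsort(off_diag, axis=None)[::-1][:5]:
    actual, predicted = np.unravel_index(idx, off_diag.shape)
    print(f"{actual} predicted as {predicted}: {off_diag[actual, predicted]} times")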
Using Convolutional Neural Networks¶
Reloading the dataset¶
#setting path variable
path = "/content/drive/MyDrive/MIT_Datasets/SVHN_single_grey1.h5"
#loading dataset
with h5py.File(path, 'r') as f:
    #extracting data and labels
    X_train = np.array(f['X_train'])
    y_train = np.array(f['y_train'])
    X_test = np.array(f['X_test'])
    y_test = np.array(f['y_test'])
#printing set shapes to confirm number of images
print("Train data shape:", X_train.shape)
print("Train labels shape:", y_train.shape)
print("Test data shape:", X_test.shape)
print("Test labels shape:", y_test.shape)
Train data shape: (42000, 32, 32)
Train labels shape: (42000,)
Test data shape: (18000, 32, 32)
Test labels shape: (18000,)
Observation:
- The train labels array has a shape of (42000,) and the test labels array a shape of (18000,), i.e. one label per image in each set.
- The images in both the train and test sets have dimensions of 32x32 pixels.
Data preparation¶
- Print the shape and the array of pixels for the first image in the training dataset.
- Reshape the train and the test dataset because we always have to give a 4D array as input to CNNs.
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test dataset.
- One-hot encode the target variable.
#printing shape and array of pixels for the first image in the training dataset
print("First image shape:", X_train[0].shape)
print("First image - pixel array:\n", X_train[0])
First image shape: (32, 32)
First image - pixel array:
[[ 33.0704  30.2601  26.852  ...  71.4471  58.2204  42.9939]
 [ 25.2283  25.5533  29.9765 ... 113.0209 103.3639  84.2949]
 [ 26.2775  22.6137  40.4763 ... 113.3028 121.775  115.4228]
 ...
 [ 28.5502  36.212   45.0801 ...  24.1359  25.0927  26.0603]
 [ 38.4352  26.4733  23.2717 ...  28.1094  29.4683  30.0661]
 [ 50.2984  26.0773  24.0389 ...  49.6682  50.853   53.0377]]
Reshape the dataset so it can be passed to the CNN. Remember that we always have to give a 4D array as input to CNNs.
#reshape train and test set
X_train = X_train.reshape(-1, 32, 32, 1)
X_test = X_test.reshape(-1, 32, 32, 1)
#normalise train and test set - dividing by 255
X_train = X_train/255
X_test = X_test/255
#printing shape of train and test set
print("Shape of reshaped X_train:", X_train.shape)
print("Shape of reshaped X_test:", X_test.shape)
Shape of reshaped X_train: (42000, 32, 32, 1)
Shape of reshaped X_test: (18000, 32, 32, 1)
One-hot encode the labels in the target variable y_train and y_test.¶
#encoding target variable
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
Observation:
- The images are no longer flattened: each sample is now a 32x32x1 array, matching the 4D (samples, height, width, channels) input format that CNNs expect.
- The training labels array now has a shape of (42000, 10) and the test labels array (18000, 10): each label is a 10-element one-hot vector, the desired outcome of one-hot encoding the target variable.
Model Building¶
Now that we have done data preprocessing, let's build a CNN model. Fix the seed for random number generators
#fixing seed for random number generators
np.random.seed(42)
tf.random.set_seed(42)
Build and train a CNN model¶
#defining model function
def cnn_model_1():
    model = Sequential([
        Conv2D(16, (3, 3), padding='same', input_shape=(32, 32, 1)),
        LeakyReLU(alpha=0.1),
        Conv2D(32, (3, 3), padding='same'),
        LeakyReLU(alpha=0.1),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(32),
        LeakyReLU(alpha=0.1),
        Dense(10, activation='softmax')
    ])
    #instantiating adam optimiser
    adam = Adam(learning_rate=0.001)
    #compiling model
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
#setting model function as variable
cnn_model_1 = cnn_model_1()
#printing summary
print(cnn_model_1.summary())
#fitting model
hist_cnn_model_1 = cnn_model_1.fit(X_train, y_train, validation_split=0.2, batch_size=32, verbose=1, epochs=20)
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 160
leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 4640
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0
max_pooling2d (MaxPooling2 (None, 16, 16, 32) 0
D)
flatten (Flatten) (None, 8192) 0
dense_6 (Dense) (None, 32) 262176
leaky_re_lu_2 (LeakyReLU) (None, 32) 0
dense_7 (Dense) (None, 10) 330
=================================================================
Total params: 267306 (1.02 MB)
Trainable params: 267306 (1.02 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
Epoch 1/20
1050/1050 [==============================] - 9s 5ms/step - loss: 1.0480 - accuracy: 0.6709 - val_loss: 0.6740 - val_accuracy: 0.8051
Epoch 2/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.5519 - accuracy: 0.8421 - val_loss: 0.5387 - val_accuracy: 0.8444
Epoch 3/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.4647 - accuracy: 0.8649 - val_loss: 0.5175 - val_accuracy: 0.8493
Epoch 4/20
1050/1050 [==============================] - 6s 6ms/step - loss: 0.4084 - accuracy: 0.8796 - val_loss: 0.4635 - val_accuracy: 0.8677
Epoch 5/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.3634 - accuracy: 0.8922 - val_loss: 0.4636 - val_accuracy: 0.8714
Epoch 6/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.3260 - accuracy: 0.9046 - val_loss: 0.4759 - val_accuracy: 0.8687
Epoch 7/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.2976 - accuracy: 0.9117 - val_loss: 0.4627 - val_accuracy: 0.8731
Epoch 8/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.2664 - accuracy: 0.9202 - val_loss: 0.4770 - val_accuracy: 0.8720
Epoch 9/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.2456 - accuracy: 0.9269 - val_loss: 0.4612 - val_accuracy: 0.8721
Epoch 10/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.2175 - accuracy: 0.9329 - val_loss: 0.4870 - val_accuracy: 0.8775
Epoch 11/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1977 - accuracy: 0.9396 - val_loss: 0.5203 - val_accuracy: 0.8693
Epoch 12/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.1814 - accuracy: 0.9440 - val_loss: 0.5256 - val_accuracy: 0.8787
Epoch 13/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1611 - accuracy: 0.9499 - val_loss: 0.5595 - val_accuracy: 0.8670
Epoch 14/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1454 - accuracy: 0.9543 - val_loss: 0.5857 - val_accuracy: 0.8660
Epoch 15/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.1317 - accuracy: 0.9587 - val_loss: 0.6338 - val_accuracy: 0.8673
Epoch 16/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1174 - accuracy: 0.9632 - val_loss: 0.6501 - val_accuracy: 0.8699
Epoch 17/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.1110 - accuracy: 0.9649 - val_loss: 0.6576 - val_accuracy: 0.8705
Epoch 18/20
1050/1050 [==============================] - 5s 5ms/step - loss: 0.0983 - accuracy: 0.9681 - val_loss: 0.6776 - val_accuracy: 0.8724
Epoch 19/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0916 - accuracy: 0.9707 - val_loss: 0.7382 - val_accuracy: 0.8696
Epoch 20/20
1050/1050 [==============================] - 4s 4ms/step - loss: 0.0842 - accuracy: 0.9731 - val_loss: 0.7642 - val_accuracy: 0.8669
Plotting Training and Validation Accuracies¶
#extracting training history, list of epochs
model_hist = hist_cnn_model_1.history
epochs = [i for i in range(1,21)]
#plotting accuracies using model history by epoch
plt.figure(figsize = (8,8))
plt.plot(epochs,model_hist['accuracy'],ls = '--', label = 'Accuracy')
plt.plot(epochs,model_hist['val_accuracy'],ls = '--', label = 'Validation Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The training accuracy surpasses the validation accuracy by epoch 2 and continues to improve rapidly over successive epochs. Compared to the earlier NN models, this CNN model is significantly more accurate at predicting the digits in the dataset.
- The final iteration of this model achieves a training accuracy of 97.31%, yet the final validation accuracy only reaches 86.69%. This gap is concerning, as it indicates the model is overfitting the data, leading to potential issues in generalising to unseen data; one common mitigation is sketched below.
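One way to curb such overfitting, offered here as a hedged suggestion rather than part of the original workflow, is an EarlyStopping callback that halts training once the validation loss stops improving:
#hedged sketch: early stopping to mitigate overfitting
from tensorflow.keras.callbacks import EarlyStopping
#patience=3 is an illustrative choice; restore_best_weights reverts to the best epoch's weights
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
#would be passed to model.fit(..., callbacks=[early_stop])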
Let's build another model and see if we can get a better model with generalized performance.
First, we need to clear the previous model's history from the Keras backend. Also, let's fix the seed again after clearing the backend.
#clearing keras backend
backend.clear_session()
#fixing seed for random number generators
np.random.seed(42)
tf.random.set_seed(42)
Building and training second CNN model¶
#defining model function
def cnn_model_2():
    model = Sequential([
        Conv2D(16, (3, 3), padding='same', input_shape=(32, 32, 1)),
        LeakyReLU(alpha=0.1),
        Conv2D(32, (3, 3), padding='same'),
        LeakyReLU(alpha=0.1),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Conv2D(32, (3, 3), padding='same'),
        LeakyReLU(alpha=0.1),
        Conv2D(64, (3, 3), padding='same'),
        LeakyReLU(alpha=0.1),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Flatten(),
        Dense(32),
        LeakyReLU(alpha=0.1),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])
    #instantiating adam optimiser
    adam = Adam(learning_rate=0.001)
    #compiling model
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
#setting model function to variable
cnn_model_2 = cnn_model_2()
#printing summary
print(cnn_model_2.summary())
#fitting model
hist_cnn_model_2 = cnn_model_2.fit(X_train, y_train, validation_split=0.2, batch_size=128, verbose=1, epochs=30)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 16) 160
leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 4640
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0
max_pooling2d (MaxPooling2 (None, 16, 16, 32) 0
D)
batch_normalization (Batch (None, 16, 16, 32) 128
Normalization)
conv2d_2 (Conv2D) (None, 16, 16, 32) 9248
leaky_re_lu_2 (LeakyReLU) (None, 16, 16, 32) 0
conv2d_3 (Conv2D) (None, 16, 16, 64) 18496
leaky_re_lu_3 (LeakyReLU) (None, 16, 16, 64) 0
max_pooling2d_1 (MaxPoolin (None, 8, 8, 64) 0
g2D)
batch_normalization_1 (Bat (None, 8, 8, 64) 256
chNormalization)
flatten (Flatten) (None, 4096) 0
dense (Dense) (None, 32) 131104
leaky_re_lu_4 (LeakyReLU) (None, 32) 0
dropout (Dropout) (None, 32) 0
dense_1 (Dense) (None, 10) 330
=================================================================
Total params: 164362 (642.04 KB)
Trainable params: 164170 (641.29 KB)
Non-trainable params: 192 (768.00 Byte)
_________________________________________________________________
None
Epoch 1/30
263/263 [==============================] - 6s 8ms/step - loss: 1.3951 - accuracy: 0.5281 - val_loss: 2.2783 - val_accuracy: 0.2210
Epoch 2/30
263/263 [==============================] - 2s 6ms/step - loss: 0.6708 - accuracy: 0.7962 - val_loss: 0.5396 - val_accuracy: 0.8382
Epoch 3/30
263/263 [==============================] - 2s 6ms/step - loss: 0.5484 - accuracy: 0.8328 - val_loss: 0.4997 - val_accuracy: 0.8550
Epoch 4/30
263/263 [==============================] - 2s 6ms/step - loss: 0.4803 - accuracy: 0.8545 - val_loss: 0.4409 - val_accuracy: 0.8743
Epoch 5/30
263/263 [==============================] - 2s 6ms/step - loss: 0.4336 - accuracy: 0.8677 - val_loss: 0.4091 - val_accuracy: 0.8830
Epoch 6/30
263/263 [==============================] - 2s 8ms/step - loss: 0.4026 - accuracy: 0.8781 - val_loss: 0.3886 - val_accuracy: 0.8914
Epoch 7/30
263/263 [==============================] - 2s 8ms/step - loss: 0.3780 - accuracy: 0.8838 - val_loss: 0.3993 - val_accuracy: 0.8905
Epoch 8/30
263/263 [==============================] - 2s 6ms/step - loss: 0.3455 - accuracy: 0.8946 - val_loss: 0.4080 - val_accuracy: 0.8946
Epoch 9/30
263/263 [==============================] - 2s 6ms/step - loss: 0.3285 - accuracy: 0.8974 - val_loss: 0.3947 - val_accuracy: 0.8965
Epoch 10/30
263/263 [==============================] - 2s 6ms/step - loss: 0.3101 - accuracy: 0.9041 - val_loss: 0.3718 - val_accuracy: 0.8999
Epoch 11/30
263/263 [==============================] - 2s 7ms/step - loss: 0.2904 - accuracy: 0.9081 - val_loss: 0.4118 - val_accuracy: 0.8950
Epoch 12/30
263/263 [==============================] - 2s 7ms/step - loss: 0.2787 - accuracy: 0.9133 - val_loss: 0.3757 - val_accuracy: 0.9019
Epoch 13/30
263/263 [==============================] - 2s 7ms/step - loss: 0.2619 - accuracy: 0.9180 - val_loss: 0.3649 - val_accuracy: 0.9038
Epoch 14/30
263/263 [==============================] - 2s 8ms/step - loss: 0.2553 - accuracy: 0.9182 - val_loss: 0.4027 - val_accuracy: 0.9063
Epoch 15/30
263/263 [==============================] - 2s 7ms/step - loss: 0.2438 - accuracy: 0.9209 - val_loss: 0.3927 - val_accuracy: 0.9065
Epoch 16/30
263/263 [==============================] - 2s 6ms/step - loss: 0.2328 - accuracy: 0.9246 - val_loss: 0.4087 - val_accuracy: 0.9094
Epoch 17/30
263/263 [==============================] - 2s 6ms/step - loss: 0.2257 - accuracy: 0.9274 - val_loss: 0.4052 - val_accuracy: 0.9048
Epoch 18/30
263/263 [==============================] - 2s 6ms/step - loss: 0.2132 - accuracy: 0.9318 - val_loss: 0.3433 - val_accuracy: 0.9154
Epoch 19/30
263/263 [==============================] - 2s 7ms/step - loss: 0.2051 - accuracy: 0.9343 - val_loss: 0.3683 - val_accuracy: 0.9124
Epoch 20/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1935 - accuracy: 0.9374 - val_loss: 0.3623 - val_accuracy: 0.9146
Epoch 21/30
263/263 [==============================] - 2s 8ms/step - loss: 0.1946 - accuracy: 0.9382 - val_loss: 0.3849 - val_accuracy: 0.9087
Epoch 22/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1838 - accuracy: 0.9393 - val_loss: 0.4420 - val_accuracy: 0.9101
Epoch 23/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1694 - accuracy: 0.9436 - val_loss: 0.4375 - val_accuracy: 0.9106
Epoch 24/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1725 - accuracy: 0.9420 - val_loss: 0.4669 - val_accuracy: 0.9014
Epoch 25/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1700 - accuracy: 0.9434 - val_loss: 0.4462 - val_accuracy: 0.9117
Epoch 26/30
263/263 [==============================] - 2s 7ms/step - loss: 0.1616 - accuracy: 0.9474 - val_loss: 0.4158 - val_accuracy: 0.9180
Epoch 27/30
263/263 [==============================] - 2s 6ms/step - loss: 0.1654 - accuracy: 0.9462 - val_loss: 0.4485 - val_accuracy: 0.9106
Epoch 28/30
263/263 [==============================] - 2s 8ms/step - loss: 0.1558 - accuracy: 0.9485 - val_loss: 0.4795 - val_accuracy: 0.9079
Epoch 29/30
263/263 [==============================] - 2s 8ms/step - loss: 0.1468 - accuracy: 0.9518 - val_loss: 0.4047 - val_accuracy: 0.9168
Epoch 30/30
263/263 [==============================] - 2s 6ms/step - loss: 0.1482 - accuracy: 0.9507 - val_loss: 0.3987 - val_accuracy: 0.9129
Plotting the Training and Validation accuracies¶
#extracting training history, list of epochs
model_hist = hist_cnn_model_2.history
epochs = [i for i in range(1,31)]
#plotting accuracies using model history by epoch
plt.figure(figsize = (8,8))
plt.plot(epochs,model_hist['accuracy'],ls = '--', label = 'Accuracy')
plt.plot(epochs,model_hist['val_accuracy'],ls = '--', label = 'Validation Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- As with the first CNN model, the training accuracy improves significantly after the 2nd epoch and continues to climb rapidly over successive epochs. Notably, the validation accuracy also increases significantly after epoch 2 and generally follows the trend of the training accuracy.
- The final iteration of this model achieves a training accuracy of 95.07%. While this is lower than the first CNN model's final accuracy of 97.31%, the final validation accuracy is markedly improved, at 91.29% compared to the earlier 86.69%.
- This model therefore does not appear to be overfitting to the degree the first CNN model did, and should generalise well to unseen data.
- With this in mind, it is the better of the two CNN models trained; a sketch of persisting it for later reuse follows.
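Having selected this model, it could be saved for later reuse; a minimal sketch, with an illustrative file path:
#hedged sketch: persisting the selected model in Keras's native format
cnn_model_2.save('/content/drive/MyDrive/svhn_cnn_model_2.keras')
#it can be restored later with tf.keras.models.load_model(...)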
Predictions on the test data¶
- Make predictions on the test set using the second model.
- Print the obtained results using the classification report and the confusion matrix.
- Final observations on the obtained results.
Predictions on the test data using the second model.¶
#making predictions using cnn_model_2
y_pred = cnn_model_2.predict(X_test)
563/563 [==============================] - 1s 2ms/step
#converting y_pred, y_test entries to single label using argmax
y_test = np.argmax(y_test, axis=-1)
y_pred = np.argmax(y_pred, axis=-1)
#confirming unique values in y_test, y_pred
unique_y_test = np.unique(y_test)
unique_y_pred = np.unique(y_pred)
#printing unique values in y_test, y_pred
print("Unique values in y_test:", unique_y_test)
print("Unique values in y_pred:", unique_y_pred)
Unique values in y_test: [0 1 2 3 4 5 6 7 8 9]
Unique values in y_pred: [0 1 2 3 4 5 6 7 8 9]
Final observations on the performance of the model¶
#printing classification report
print(classification_report(y_test, y_pred))
#plotting heatmap using confusion matrix
con_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8,5))
sns.heatmap(con_matrix, annot=True, fmt='.0f')
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
plt.show()
precision recall f1-score support
0 0.91 0.95 0.93 1814
1 0.92 0.90 0.91 1828
2 0.92 0.92 0.92 1803
3 0.91 0.86 0.89 1719
4 0.92 0.93 0.93 1812
5 0.93 0.90 0.91 1768
6 0.90 0.91 0.90 1832
7 0.95 0.91 0.93 1808
8 0.89 0.91 0.90 1812
9 0.88 0.92 0.90 1804
accuracy 0.91 18000
macro avg 0.91 0.91 0.91 18000
weighted avg 0.91 0.91 0.91 18000
Final Observations:
- The overall accuracy of the model is 0.91, markedly improved over the final NN model's 0.79. This indicates the CNN model is considerably better at predicting the digits in the dataset.
- Precision ranges from 0.88 to 0.95 across classes, and the F1-score from 0.89 to 0.93, demonstrating consistently strong performance across all classes.
- Digit 7 has the highest precision among all classes and one of the highest F1-scores, indicating the model performs exceptionally well at identifying images of the number 7. Digit 3 has the lowest recall, although it still maintains a relatively high F1-score, suggesting the model misses some instances of class 3 but is precise on those it does capture.
- The remaining digits also exhibit high precision, recall, and F1-scores, indicating robust performance across the board; the sketch below derives per-class accuracy directly from the confusion matrix.
- In terms of the confusion matrix, the model does not mistake similar-looking digits to the same degree as the earlier NN model, with the large majority of predictions falling within the correct class.
- As noted earlier, given the similarly high validation accuracy indicating good model fit, this is the superior CNN model of the two models trained, despite its slightly lower final training accuracy.
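As a final check, per-class accuracy (equivalently, per-class recall) can be derived directly from the confusion matrix; a minimal sketch using the con_matrix computed above:
#hedged sketch: per-class accuracy from the confusion matrix
#row i holds the true instances of digit i, so diagonal / row sum gives recall per digit
per_class_acc = con_matrix.diagonal() / con_matrix.sum(axis=1)
for digit, acc in enumerate(per_class_acc):
    print(f"Digit {digit}: {acc:.2%}")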