Malaria Detection¶
Executive Summary¶
A Deep Learning model is developed to accurately identify images of red blood cells infected with malaria. The model is built as a Convolutional Neural Network (CNN), trained on almost 25,000 images of parasitised and uninfected red blood cells, then tested on a further 2,600 images of infected and uninfected cells. The chosen final model is highly successful at correctly classifying infected cells, achieving a recall of 98.8% on malaria-infected cells in particular (98.4% recall for the model overall). From a business standpoint, this exceeds the WHO-defined sensitivity benchmark of 95% for malaria detection methods [1] by 3.8 percentage points, indicating the model meets conventional standards. Coupled with the considerable labour reduction such a model brings, this would be a highly advantageous tool for malaria diagnosis.
While the margin for improvement on this model is small, some gains may be achieved by trialling other image enhancement and augmentation methods. HSV conversion is used in this model; other methods, such as local contrast enhancement, may further highlight the parasites present on infected cells, leading to higher recall on images of infected cells. A brief sketch of one such method follows.
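As a hedged illustration of the local contrast enhancement idea mentioned above (not part of the final model), the sketch below applies CLAHE to the value channel of an HSV representation of a single cell image; the function name and parameter values are illustrative assumptions.
#illustrative sketch only: local contrast enhancement via CLAHE (not used in the final model)
#assumes `img` is a single cell image as a uint8 RGB array, e.g. np.array(Image.open(path).resize((64, 64)))
def enhance_local_contrast(img, clip_limit=2.0, tile_grid_size=(8, 8)):
    #convert to HSV and equalise only the value (brightness) channel
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(hsv)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
    v_eq = clahe.apply(v)
    #merge the channels back and return an RGB image for display or training
    return cv2.cvtColor(cv2.merge([h, s, v_eq]), cv2.COLOR_HSV2RGB)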
Problem Summary¶
Malaria is a life-threatening illness transmitted to humans through the bites of infected mosquitoes. The parasite enters the bloodstream, damages red blood cells and can lead to respiratory problems. The parasites can remain dormant in a person's body for over a year without displaying symptoms, making timely treatment crucial to avoid severe consequences (up to and including fatality). Almost half of the global population is at risk of malaria, with children being particularly vulnerable to this illness.
Traditional laboratory diagnosis of malaria requires careful examination by skilled professionals to distinguish between healthy and infected red blood cells, a laborious and time-consuming process prone to human error and variability between lab professionals.
The objective of this project is to create a high-performance computer vision model for malaria detection. The model will analyse images of red blood cells and determine whether or not they are infected with malaria, with binary classification labelling each cell as either parasitised or uninfected.
Solution Design¶
Based on the above requirements and the need to correctly classify as many infected red blood cells as possible, this is treated as a recall problem: the most important metric is the model's ability to identify as many malaria-infected cells as possible, which means minimising the number of false negatives, since falsely classifying infected cells as uninfected would be more detrimental to the model's goals than the reverse. A CNN architecture is used, given this architecture's proficiency in working with image data.
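To make the driving metric concrete: recall for the parasitised class is the fraction of truly infected cells the model actually flags, i.e. TP / (TP + FN). A minimal sketch with scikit-learn, using hypothetical labels purely for illustration:
#minimal illustration of the recall metric (hypothetical labels, not project data)
from sklearn.metrics import recall_score
y_true = [1, 1, 1, 1, 0, 0, 0, 0]   #1 = parasitised, 0 = uninfected
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]   #one infected cell is missed (a false negative)
#recall = TP / (TP + FN) = 3 / (3 + 1) = 0.75 for the parasitised class
print(recall_score(y_true, y_pred))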
Prior to classification, images fed into this model are converted from RGB to HSV, as this is found to greatly improve the discernibility of parasites on the infected red blood cells.
The model architecture comprises six successive convolutional blocks, each consisting of a convolutional layer with Leaky ReLU activation followed by batch normalisation, max pooling, and dropout, to extract relevant features while reducing the chance of overfitting.
The output of the final convolutional block is flattened into a 1D vector before being fed into the fully connected layers, culminating in a sigmoid output layer for binary classification.
The full details of the data preprocessing and model building steps, along with model evaluation, analysis and recommendations for implementation, can be found below.
Mounting Drive¶
#mounting google drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Loading libraries¶
#basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile
import os
import cv2
from PIL import Image
import random
from random import shuffle
import warnings
warnings.simplefilter("ignore")
#model selection, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
#deep learning training
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.layers import Conv2D,LeakyReLU,MaxPooling2D,Flatten
from tensorflow.keras.applications import VGG16
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses, backend
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
#model metrics
from sklearn.metrics import classification_report, confusion_matrix
Loading the data¶
# Storing the path of the data file from the Google drive
path = '/content/drive/MyDrive/cell_images.zip'
# The data is provided as a zip file so we need to extract the files from the zip file
with zipfile.ZipFile(path, 'r') as zip_folder:
    zip_folder.extractall()
Inputting data to lists¶
#setting train directory
train_dir = '/content/cell_images/train'
#setting test directory
test_dir = '/content/cell_images/test'
train_images = []
train_labels = []
test_images = []
test_labels = []
#iterating through folders
for folder_name in ['/parasitized/', '/uninfected/']:
    #folder path
    images_path = os.listdir(train_dir + folder_name)
    for i, image_name in enumerate(images_path):
        try:
            #opening image
            image = Image.open(train_dir + folder_name + image_name)
            #resizing image
            image = image.resize((64, 64))
            #converting image to array, appending to list
            train_images.append(np.array(image))
            #labels for parasitised, uninfected
            if folder_name == '/parasitized/':
                train_labels.append(1)
            else:
                train_labels.append(0)
        except Exception:
            pass
#converting lists to arrays
train_images = np.array(train_images)
train_labels = np.array(train_labels)
#iterating through folders
for folder_name in ['/parasitized/', '/uninfected/']:
    #folder path
    images_path = os.listdir(test_dir + folder_name)
    for i, image_name in enumerate(images_path):
        try:
            #opening image
            image = Image.open(test_dir + folder_name + image_name)
            #resizing image
            image = image.resize((64, 64))
            #converting image to array, appending to list
            test_images.append(np.array(image))
            #labels for parasitised, uninfected
            if folder_name == '/parasitized/':
                test_labels.append(1)
            else:
                test_labels.append(0)
        except Exception:
            pass
#converting lists to arrays
test_images = np.array(test_images)
test_labels = np.array(test_labels)
Check the shape of train and test images¶
#printing set shapes to confirm number of images
print("Train data shape:", train_images.shape)
print("Test data shape:", test_images.shape)
Train data shape: (24958, 64, 64, 3) Test data shape: (2600, 64, 64, 3)
Check the shape of train and test labels¶
#printing set shapes to confirm number of images
print("Train labels shape:", train_labels.shape)
print("Test labels shape:", test_labels.shape)
Train labels shape: (24958,) Test labels shape: (2600,)
Observations and insights:¶
- The train set contains 24,958 images, while the test set contains 2,600 images, meaning roughly 9.4% of the total images have been earmarked for testing.
- The images in both the train and test sets have dimensions of 64x64 pixels, with an additional dimension of 3 indicating these are colour images (RGB).
- The train labels array has a shape of (24958,), while the test labels array has a shape of (2600,), indicating both arrays contain labels for all of their images.
Check the minimum and maximum range of pixel values for train and test images¶
#check min and max pixel values for train images
train_min = np.amin(train_images)
train_max = np.amax(train_images)
#check min and max pixel values for test images
test_min = np.amin(test_images)
test_max = np.amax(test_images)
#printing output
print("Train Images:\n Min pixel value:", train_min, "\n Max pixel value:", train_max)
print("\nTest Images:\n Min pixel value:", test_min, "\n Max pixel value:", test_max)
Train Images: Min pixel value: 0 Max pixel value: 255 Test Images: Min pixel value: 0 Max pixel value: 255
Observations and insights:¶
- Both the train images and test images have a minimum pixel value of 0 and a maximum pixel value of 255.
Count the number of values in the train and test images¶
#count number of values in each set
train_count = train_images.size
test_count = test_images.size
#printing output
print("Number of values in train images:", train_count)
print("Number of values in test images:", test_count)
Number of values in train images: 306683904 Number of values in test images: 31948800
Normalise the images¶
#normalise train images, convert to float32
train_images = (train_images/255).astype('float32')
#normalise test images, convert to float32
test_images = (test_images / 255).astype('float32')
Observations and insights:¶
- There are significantly more values in the train images dataset (306,683,904) than the test images dataset (31,948,800).
- The number of values in the test images amounts to roughly 1/10th the values in the train images.
- Therefore the models will be trained on a very large dataset and tested on a much smaller dataset.
Plot to check if the data is balanced¶
#count occurrences of each label in the training set, print output
train_label_counts = np.bincount(train_labels)
print('Number of uninfected images in train data:',train_label_counts[0])
print('Number of parasitised images in train data:',train_label_counts[1])
#count occurrences of each label in the test set, print output
test_label_counts = np.bincount(test_labels)
print('\nNumber of uninfected images in test data:',test_label_counts[0])
print('Number of parasitised images in test data:',test_label_counts[1],'\n')
#plotting distributions
plt.figure(figsize=(10, 5))
#plotting for train data
plt.subplot(1, 2, 1)
plt.bar(range(len(train_label_counts)), train_label_counts, color='blue')
plt.title('Train Data Distribution')
plt.xlabel('Label')
plt.ylabel('Count')
plt.xticks(range(len(train_label_counts)))
#plotting for test data
plt.subplot(1, 2, 2)
plt.bar(range(len(test_label_counts)), test_label_counts, color='red')
plt.title('Test Data Distribution')
plt.xlabel('Label')
plt.ylabel('Count')
plt.xticks(range(len(test_label_counts)))
plt.tight_layout()
plt.show()
Number of uninfected images in train data: 12376 Number of parasitised images in train data: 12582 Number of uninfected images in test data: 1300 Number of parasitised images in test data: 1300
Observations and insights:¶
- The data appears to be well balanced between parasitised and uninfected images, for both the training data and test data.
- There are slightly more parasitised images than uninfected images in the training data, with 12,582 parasitised images compared to 12,376 uninfected images. However, this amounts to a difference of approximately 1.7%, and shouldn't have a significant effect on predictions.
- The images in the test data are perfectly balanced, with 1,300 images for both parasitised and uninfected classes.
Visualising the Training Images¶
#setting random seed
np.random.seed(42)
plt.figure(figsize=(12, 12))
#iterating through 36 images
for i in range(1, 37):
    plt.subplot(6, 6, i)
    #random index from train_images
    random_index = np.random.randint(0, train_images.shape[0])
    #get image and label
    image = train_images[random_index]
    label = train_labels[random_index]
    #set title
    title = 'parasitized' if label == 1 else 'uninfected'
    plt.title(title)
    #display image
    plt.imshow(image)
    plt.axis('off')
plt.show()
Observations and insights:¶
- The above images show a random selection of unprocessed images from the training dataset.
- In the parasitised images, there appear to be dark red-purple blots on the cell, showing the presence of the malaria parasite. These blots are absent from the uninfected cells.
- Aside from this, the cell images appear to range in size, shape, and colour, which may be a potential issue for a deep learning classification model.
Visualising the Test Images¶
#set random seed
np.random.seed(42)
plt.figure(figsize=(12, 12))
#visualising 36 random images from test images
for i in range(1, 37):
    plt.subplot(6, 6, i)
    #randomly select index from test_images
    random_index = np.random.randint(0, test_images.shape[0])
    #get image and label
    image = test_images[random_index]
    label = test_labels[random_index]
    #set title
    title = 'parasitized' if label == 1 else 'uninfected'
    plt.title(title)
    #display image
    plt.imshow(image)
    plt.axis('off')
plt.show()
Observations and insights:¶
- Similar to the earlier exercise, these images show a random selection of unprocessed images from the test dataset.
- Again, in the parasitised images there appear to be dark red-purple blots on the cell, showing the presence of the malaria parasite, while these blots are absent from the uninfected cells.
- As with the training images, the cell images appear to range in size, shape, and colour, which may be a potential issue for a deep learning classification model.
Plotting the mean images for parasitised and uninfected¶
#function to find the mean
def find_mean_image(full_images, title):
    # Calculate average image
    mean_image = np.mean(full_images, axis=0)
    # Plot the average image
    plt.imshow(mean_image)
    plt.title(f'Average {title}')
    plt.axis('off')
    plt.show()
    return mean_image
Mean image for parasitised¶
#creating list to hold parasitised images
parasitised_data = []
#iterate through train images and labels
for img, label in zip(train_images, train_labels):
    #check if label is 1
    if label == 1:
        parasitised_data.append(img)
#convert list to numpy array
parasitised_data = np.array(parasitised_data)
#calculate and plot mean of parasitised images
parasitized_mean = find_mean_image(parasitised_data, 'Parasitized')
Mean image for uninfected¶
#creating list to hold uninfected images
uninfected_data = []
#iterate through train images and labels
for img, label in zip(train_images, train_labels):
    #check if label is 0
    if label != 1:
        uninfected_data.append(img)
#convert list to numpy array
uninfected_data = np.array(uninfected_data)
#calculate and plot mean of uninfected images
uninfected_mean = find_mean_image(uninfected_data, 'Uninfected')
Observations and insights:¶
- Comparing the mean images for the parasitised and uninfected classes in the training data, there doesn't appear to be a significant difference between them.
- The mean parasitised image does appear slightly darker, but apart from this there are no obvious differences; a small sketch exploring this further follows below.
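To probe this further, an optional sketch (reusing the mean images computed above) plots the pixel-wise difference between the two mean images, which can make subtle class-level differences easier to see:
#optional sketch: visualise the pixel-wise difference between the two mean images
diff_image = parasitized_mean - uninfected_mean
plt.imshow(diff_image.mean(axis=2), cmap='coolwarm')
plt.title('Mean Parasitised minus Mean Uninfected')
plt.colorbar()
plt.axis('off')
plt.show()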
Converting the train data from RGB to HSV¶
#fixing seed for random number generators
np.random.seed(42)
#setting figsize
plt.figure(figsize=(12, 12))
#creating list to hold the HSV images
train_images_hsv = []
#convert all train images to HSV
for img in train_images:
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hsv_img = (hsv_img * 255).astype(np.uint8)
    train_images_hsv.append(hsv_img)
#convert list to a numpy array
train_images_hsv = np.array(train_images_hsv)
for i in range(1, 37):
    plt.subplot(6, 6, i)
    #randomly select index from train_images_hsv
    random_index = np.random.randint(0, train_images_hsv.shape[0])
    #get image and label
    image = train_images_hsv[random_index]
    label = train_labels[random_index]
    #set title
    title = 'parasitized' if label == 1 else 'uninfected'
    plt.title(title)
    #display image
    plt.imshow(image)
    plt.axis('off')
plt.show()
Converting the test data from RGB to HSV¶
np.random.seed(42)
plt.figure(figsize=(12, 12))
#creating list to hold HSV images
test_images_hsv = []
#convert all test images to HSV
for img in test_images:
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hsv_img = (hsv_img * 255).astype(np.uint8)
    test_images_hsv.append(hsv_img)
#convert list to numpy array
test_images_hsv = np.array(test_images_hsv)
for i in range(1, 37):
    plt.subplot(6, 6, i)
    #randomly select index from the test_images_hsv array
    random_index = np.random.randint(0, test_images_hsv.shape[0])
    #get image and label
    image = test_images_hsv[random_index]
    label = test_labels[random_index]
    #set title
    title = 'parasitized' if label == 1 else 'uninfected'
    plt.title(title)
    #display the image
    plt.imshow(image)
    plt.axis('off')
plt.show()
Observations and insights:¶
- Converting the train and test images to HSV does appear to highlight instances of the parasite in infected cells, now showing up as yellow-green blobs in the images.
- There still appears to be some red-blue noise in the images, which may affect the performance of the CNN model; one possible mitigation is sketched below.
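If this residual noise were found to affect training, one hedged option (not applied to the data used by the final model below) would be a light Gaussian blur on the HSV images before fitting, for example:
#optional, exploratory sketch: light Gaussian blur to suppress high-frequency noise
#(these arrays are illustrative and are not used by the final model below)
train_images_blur = np.array([cv2.GaussianBlur(img, (3, 3), 0) for img in train_images_hsv])
test_images_blur = np.array([cv2.GaussianBlur(img, (3, 3), 0) for img in test_images_hsv])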
One Hot Encoding the train and test labels¶
#encoding train, test labels
train_labels = to_categorical(train_labels, 2)
test_labels = to_categorical(test_labels, 2)
#confirming data shapes before proceeding
print("Shape of Training Data:", train_images_gauss.shape)
print("Shape of Training Labels:", train_labels.shape)
print("Shape of Test Data:", test_images_gauss.shape)
print("Shape of Test Labels:", test_labels.shape)
Shape of Training Data: (24958, 64, 64, 3) Shape of Training Labels: (24958, 2) Shape of Test Data: (2600, 64, 64, 3) Shape of Test Labels: (2600, 2)
Defining Functions for Evaluation¶
Defining function to print the classification report, plot the confusion matrix
#defining function to plot confusion matrix, print classification report
def plot_con_matrix(model, test_images, test_labels):
    # Making predictions on the supplied test images
    pred = model.predict(test_images)
    pred = np.argmax(pred, axis=1)
    # Getting true labels
    y_true = np.argmax(test_labels, axis=1)
    # Printing classification report, setting to 3 decimal places
    print(classification_report(y_true, pred, digits=3))
    # Plotting confusion matrix heatmap
    cm = confusion_matrix(y_true, pred)
    plt.figure(figsize=(8, 5))
    sns.heatmap(cm, annot=True, fmt='.0f', xticklabels=['Uninfected', 'Parasitised'], yticklabels=['Uninfected', 'Parasitised'])
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()
Defining function to plot the recall and loss curves
#defining function to plot recall and loss
def plot_recall_and_loss(history):
    #get number of epochs
    epochs = range(1, len(history.history["recall"]) + 1)
    #create figure and axes
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    #plot recall
    ax1.plot(epochs, history.history["recall"], label="Train Recall", ls='-')
    ax1.plot(epochs, history.history["val_recall"], label="Validation Recall", ls='-')
    ax1.set_title("Recall vs Epoch")
    ax1.set_xlabel("Epochs")
    ax1.set_ylabel("Recall")
    ax1.legend(loc="upper left")
    #plot loss
    ax2.plot(epochs, history.history["loss"], label="Train Loss", ls='-')
    ax2.plot(epochs, history.history["val_loss"], label="Validation Loss", ls='-')
    ax2.set_title("Loss vs Epoch")
    ax2.set_xlabel("Epochs")
    ax2.set_ylabel("Loss")
    ax2.legend(loc="upper right")
    # Show plot
    plt.show()
Building the Model¶
#clearing keras backend
backend.clear_session()
#fixing seed for random number generators
np.random.seed(42)
tf.random.set_seed(42)
#defining model and layers
malaria_model = Sequential([
#convolutional layers with same padding to maintain input size,
#leakyrelu to introduce non-linearity
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2), input_shape=(64, 64, 3)),
#normalising with batch normalisation
BatchNormalization(),
#downsampling with max pooling
MaxPooling2D((2, 2)),
#regularising with dropout
Dropout(0.2),
#repeating convolutional layer pattern
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2)),
BatchNormalization(),
MaxPooling2D((2, 2)),
Dropout(0.2),
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2)),
BatchNormalization(),
MaxPooling2D((2, 2)),
Dropout(0.2),
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2)),
BatchNormalization(),
MaxPooling2D((2, 2)),
Dropout(0.2),
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2)),
BatchNormalization(),
MaxPooling2D((2, 2)),
Dropout(0.2),
Conv2D(32, (2, 2), padding='same', activation=LeakyReLU(alpha=0.2)),
BatchNormalization(),
MaxPooling2D((2, 2)),
Dropout(0.2),
#converting to 1D vector with flatten
Flatten(),
#fully connected layers
#connecting neurons from previous layers with dense
Dense(512, activation=LeakyReLU(alpha=0.2)),
#regularising with dropout
Dropout(0.4),
#using sigmoid activation on final dense layer for binary classification
Dense(2, activation='sigmoid')
])
#calling summary of model
malaria_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D)                             (None, 64, 64, 32)        416
batch_normalization (BatchNormalization)    (None, 64, 64, 32)        128
max_pooling2d (MaxPooling2D)                (None, 32, 32, 32)        0
dropout (Dropout)                           (None, 32, 32, 32)        0
conv2d_1 (Conv2D)                           (None, 32, 32, 32)        4128
batch_normalization_1 (BatchNormalization)  (None, 32, 32, 32)        128
max_pooling2d_1 (MaxPooling2D)              (None, 16, 16, 32)        0
dropout_1 (Dropout)                         (None, 16, 16, 32)        0
conv2d_2 (Conv2D)                           (None, 16, 16, 32)        4128
batch_normalization_2 (BatchNormalization)  (None, 16, 16, 32)        128
max_pooling2d_2 (MaxPooling2D)              (None, 8, 8, 32)          0
dropout_2 (Dropout)                         (None, 8, 8, 32)          0
conv2d_3 (Conv2D)                           (None, 8, 8, 32)          4128
batch_normalization_3 (BatchNormalization)  (None, 8, 8, 32)          128
max_pooling2d_3 (MaxPooling2D)              (None, 4, 4, 32)          0
dropout_3 (Dropout)                         (None, 4, 4, 32)          0
conv2d_4 (Conv2D)                           (None, 4, 4, 32)          4128
batch_normalization_4 (BatchNormalization)  (None, 4, 4, 32)          128
max_pooling2d_4 (MaxPooling2D)              (None, 2, 2, 32)          0
dropout_4 (Dropout)                         (None, 2, 2, 32)          0
conv2d_5 (Conv2D)                           (None, 2, 2, 32)          4128
batch_normalization_5 (BatchNormalization)  (None, 2, 2, 32)          128
max_pooling2d_5 (MaxPooling2D)              (None, 1, 1, 32)          0
dropout_5 (Dropout)                         (None, 1, 1, 32)          0
flatten (Flatten)                           (None, 32)                0
dense (Dense)                               (None, 512)               16896
dropout_6 (Dropout)                         (None, 512)               0
dense_1 (Dense)                             (None, 2)                 1026
=================================================================
Total params: 39746 (155.26 KB)
Trainable params: 39362 (153.76 KB)
Non-trainable params: 384 (1.50 KB)
_________________________________________________________________
Compiling the model¶
#instantiating adam optimiser
adam = Adam(learning_rate=0.001)
#compiling model
malaria_model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = [tf.keras.metrics.Recall()])
Using Callbacks¶
#assigning early stopping and model checkpoint callbacks
callbacks = [EarlyStopping(monitor = 'val_loss', patience = 3),
ModelCheckpoint('.mdl_wts.hdf5', monitor = 'val_loss', save_best_only = True)]
Fit and Train the Model¶
#fitting and training model
malaria_model_hist = malaria_model.fit(train_images_hsv, train_labels, batch_size = 32,
callbacks = callbacks, validation_split = 0.2, epochs = 20, verbose = 1)
Epoch 1/20 624/624 [==============================] - 13s 14ms/step - loss: 0.3096 - recall: 0.8599 - val_loss: 0.1309 - val_recall: 0.9655
Epoch 2/20 624/624 [==============================] - 7s 12ms/step - loss: 0.1094 - recall: 0.9639 - val_loss: 0.0681 - val_recall: 0.9838
Epoch 3/20 624/624 [==============================] - 7s 12ms/step - loss: 0.0913 - recall: 0.9705 - val_loss: 0.0575 - val_recall: 0.9800
Epoch 4/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0846 - recall: 0.9721 - val_loss: 0.0463 - val_recall: 0.9850
Epoch 5/20 624/624 [==============================] - 7s 12ms/step - loss: 0.0770 - recall: 0.9756 - val_loss: 0.0593 - val_recall: 0.9800
Epoch 6/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0757 - recall: 0.9750 - val_loss: 0.0533 - val_recall: 0.9800
Epoch 7/20 624/624 [==============================] - 7s 11ms/step - loss: 0.0697 - recall: 0.9763 - val_loss: 0.0458 - val_recall: 0.9832
Epoch 8/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0682 - recall: 0.9778 - val_loss: 0.0632 - val_recall: 0.9774
Epoch 9/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0705 - recall: 0.9776 - val_loss: 0.0532 - val_recall: 0.9788
Epoch 10/20 624/624 [==============================] - 7s 12ms/step - loss: 0.0657 - recall: 0.9779 - val_loss: 0.0456 - val_recall: 0.9838
Epoch 11/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0618 - recall: 0.9802 - val_loss: 0.0537 - val_recall: 0.9804
Epoch 12/20 624/624 [==============================] - 7s 11ms/step - loss: 0.0617 - recall: 0.9797 - val_loss: 0.0399 - val_recall: 0.9860
Epoch 13/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0616 - recall: 0.9791 - val_loss: 0.0466 - val_recall: 0.9818
Epoch 14/20 624/624 [==============================] - 7s 12ms/step - loss: 0.0633 - recall: 0.9789 - val_loss: 0.0559 - val_recall: 0.9752
Epoch 15/20 624/624 [==============================] - 8s 13ms/step - loss: 0.0623 - recall: 0.9794 - val_loss: 0.0541 - val_recall: 0.9790
Evaluating the model¶
#evaluating for recall
malaria_model_eval = malaria_model.evaluate(test_images_hsv, test_labels, verbose = 1)
print('\n', 'Test Recall:', malaria_model_eval[1])
82/82 [==============================] - 1s 4ms/step - loss: 0.0446 - recall: 0.9842 Test Recall: 0.9842307567596436
Plotting the confusion matrix¶
#printing classification report and plotting confusion matrix
plot_con_matrix(malaria_model, test_images_hsv, test_labels)
82/82 [==============================] - 0s 3ms/step
precision recall f1-score support
0 0.988 0.980 0.984 1300
1 0.980 0.988 0.984 1300
accuracy 0.984 2600
macro avg 0.984 0.984 0.984 2600
weighted avg 0.984 0.984 0.984 2600
Plotting the recall and loss curves¶
#plotting recall and loss for model 1
plot_recall_and_loss(malaria_model_hist)
#saving malaria model
malaria_model.save("malaria_detection_model.keras")
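As a usage sketch, the saved model can be reloaded and applied to a single cell image; the image path below is hypothetical, and the preprocessing mirrors the steps applied to the training data above.
#usage sketch: reload the saved model and classify a single cell image
#the image path is hypothetical; preprocessing mirrors the steps used on the training data
from tensorflow.keras.models import load_model
loaded_model = load_model("malaria_detection_model.keras")
#load and resize the image, dropping any alpha channel
new_image = Image.open('/content/sample_cell.png').resize((64, 64))
new_image = (np.array(new_image)[:, :, :3] / 255).astype('float32')
#apply the same HSV conversion used for the training data
new_image = cv2.cvtColor(new_image, cv2.COLOR_BGR2HSV)
new_image = (new_image * 255).astype(np.uint8)
#predict on a batch of one image and report the class
pred = loaded_model.predict(np.expand_dims(new_image, axis=0))
print('parasitised' if np.argmax(pred, axis=1)[0] == 1 else 'uninfected')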
Analysis and Key Insights¶
From the data preprocessing steps, it is clear that the malaria parasite presents as dark purple blobs on the surface of infected cells. Therefore, preprocessing that enhances the visibility of these blobs (such as HSV conversion) can reasonably be expected to improve the model's detection of the parasite.
The model shows high performance across all metrics, with precision, recall, and F1-score exceeding 98% for both classes. In particular, the chosen model, using LeakyReLU and sigmoid activations, achieves a final recall of 98.8% on malaria-infected cells. From a business standpoint, this means the model detected 98.8% of all malaria-infected cells in the test set, exceeding the World Health Organization's recommended minimum sensitivity of 95% by 3.8 percentage points [1]. This surpasses conventional standards and substantially reduces the workload required for diagnosis, making it a highly beneficial tool for malaria detection.
Additionally, the recall and loss curves above show that the training recall closely follows the trend of the validation recall, with a similar pattern in the loss curves. While the validation recall and loss do dip and spike across epochs, they never diverge significantly from their training counterparts. This indicates a reasonably good fit, suggesting the model should generalise well to unseen data.
With the above considerations, this model is chosen as the final solution design for its high recall, low rate of false positives, ease of setup, and close agreement between training and validation recall, which indicates a high likelihood of generalising well to unseen data.
The model may be further improved by adjusting the number of convolutional layers and experimenting with different activation functions. Further investigation into data augmentation methods may also yield increased recall from a more optimised model; one possible approach is sketched below.
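As one example, the ImageDataGenerator imported earlier could supply simple geometric augmentation during training; the parameters below are illustrative rather than tuned.
#illustrative sketch of geometric data augmentation (parameters are not tuned)
augmentor = ImageDataGenerator(rotation_range=30,
                               zoom_range=0.1,
                               horizontal_flip=True,
                               vertical_flip=True)
#the model could then be fitted on augmented batches, for example:
#malaria_model.fit(augmentor.flow(train_images_hsv, train_labels, batch_size=32), epochs=20, ...)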
Recommendations for Implementation¶
There are some potential challenges in implementing this model for use by technicians in the field. For example, the user interface should be easy to understand and compatible with the hardware already used by professionals. Adequate training should also be provided so that healthcare workers can use the system effectively, covering proper blood smear preparation, image capture techniques, and interpretation of the model's results.
Additionally, it will be important to maintain patient confidentiality in diagnosing malaria using this model. Implementation steps for this model should include anonymising patient data, and adhering to regulations on health information.
References¶
[1] WHO. Guidelines for the treatment of malaria. 3rd ed. Geneva: World Health Organization; 2015. https://www.afro.who.int/publications/guidelines-treatment-malaria-third-edition. Accessed 1 Apr 2024.