12: Machine Learning with Image Processing

Combining machine learning with image processing has many practical use cases, such as identifying facial features, detecting defects in fruit, and classifying leaves.

I will give 3 illustrations in this post.

Musical Notes

First up, let’s consider this image of sheet music. The position of a note relative to the staff lines determines its pitch, while the shape of the note determines its duration in beats. To read the sheet music, the following steps were performed:

1. Morphological operations (opening, then closing, using a horizontal bar structuring element) were applied to remove the staff lines. A Fourier transform could also have been used to remove these repetitive lines.

Steps 1 & 2

2. Blobs were detected using the Difference of Gaussians (DoG) method, with a threshold chosen to exclude the thin vertical lines. With the coordinates of the blob centroids extracted, a bounding box was created around each one to segment the individual notes.

3. The highest y coordinate value of each note was mapped to find the pitch.

4. Nine of the 60 notes were manually tagged with their beat and used to fit a LinearSVC model, which then predicted the beats of the remaining notes. An accuracy of 100% was achieved.

Accurately labelled sheet music
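The post does not show its code, so here is a minimal sketch of how step 1 might look, assuming SciPy is available and the sheet has already been binarized so that ink is True. The function name `remove_staff_lines`, the bar length, and the small vertical bar used for the closing are all assumptions made for illustration. An opening with a horizontal bar isolates the long staff lines, these are subtracted from the mask, and a closing bridges the thin gaps left in any note the lines passed through.

```python
import numpy as np
from scipy import ndimage


def remove_staff_lines(ink, bar_length=15, repair_height=3):
    """Remove long horizontal lines from a boolean mask where ink is True.

    bar_length and repair_height are illustrative values, not the post's.
    """
    # Opening with a horizontal bar keeps only runs of ink that are at
    # least bar_length pixels wide, which isolates the staff lines.
    horizontal_bar = np.ones((1, bar_length), dtype=bool)
    lines = ndimage.binary_opening(ink, structure=horizontal_bar)

    # Subtract the lines; this also cuts a thin gap through any note
    # that a line crosses.
    without_lines = ink & ~lines

    # Closing with a short vertical bar bridges those thin gaps again.
    vertical_bar = np.ones((repair_height, 1), dtype=bool)
    return ndimage.binary_closing(without_lines, structure=vertical_bar)


# Synthetic example: one staff line crossing one square "note head".
ink = np.zeros((20, 40), dtype=bool)
ink[10, :] = True          # staff line, 1 pixel thick
ink[8:13, 15:20] = True    # 5x5 note head sitting on the line

cleaned = remove_staff_lines(ink)
```

On real scans the bar has to be longer than the widest note head and the repair height at least as tall as the staff line thickness, so both values would need tuning per image.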

Next, let us try to translate American Sign Language into letters of the alphabet.


The dataset was acquired from Kaggle and contained ~27,500 images, each 28x28 pixels. A Convolutional Neural Network with pooling, flatten, dense, and dropout layers was used to perform multi-class classification. An accuracy of 100% was achieved on the dataset. However, live predictions from a webcam produced many errors, drastically reducing the accuracy.

TensorFlow CNN Model Summary

Without modifying the model, image processing techniques were applied to each captured frame before feeding it into the model.

Image Processing Steps

The following steps were performed:

1. Slice the location of the fist

2. Bilateral filter to remove noise and smooth the image

3. Histogram manipulation to normalize each channel by the 50th percentile of its intensities.

4. Gray scaling

5. Downscaling
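As a rough illustration of steps 1, 3, 4, and 5, here is a NumPy-only sketch. It is not the author's code: the function name `preprocess_frame`, the crop coordinates, the 224-pixel crop size, and the luminance weights are assumptions, and step 3 is read here as dividing each channel by its 50th-percentile value and clipping to [0, 1]. The bilateral filter of step 2 is left out because it needs an image library such as OpenCV or scikit-image.

```python
import numpy as np


def preprocess_frame(frame, top, left, size=224, out_size=28, percentile=50):
    """Crop, normalize, gray-scale and downscale an RGB uint8 frame (H, W, 3)."""
    if size % out_size != 0:
        raise ValueError("size must be a multiple of out_size")

    # Step 1: slice the square region that contains the hand.
    roi = frame[top:top + size, left:left + size].astype(np.float64)
    if roi.shape[:2] != (size, size):
        raise ValueError("crop falls outside the frame")

    # Step 3 (one possible reading): divide each channel by its own
    # percentile value, then clip the result to [0, 1].
    reference = np.percentile(roi, percentile, axis=(0, 1))
    roi = np.clip(roi / np.maximum(reference, 1e-8), 0.0, 1.0)

    # Step 4: convert to gray scale with common luminance weights.
    gray = roi @ np.array([0.2125, 0.7154, 0.0721])

    # Step 5: downscale by averaging non-overlapping square blocks.
    factor = size // out_size
    blocks = gray.reshape(out_size, factor, out_size, factor)
    return blocks.mean(axis=(1, 3))


# Synthetic stand-in for a webcam frame (random noise, not real data).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
small = preprocess_frame(frame, top=100, left=200)
```

The result is a 28x28 array with values between 0 and 1, the same shape as the training images. Whether the value range also matches depends on how the training data was scaled.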

These steps were necessary to match the new input data to the dataset the model was trained on. Because of the downscaling, some information about the position of the thumb and fingers was lost. The model is therefore very sensitive to lighting, angle, and distance from the camera. The model accurately identified most letters except for “s”, which is a closed fist and resembles several other letters such as “a” and “n”.

Lastly, let us try to classify leaves. There are 5 classes of leaves, but the classes differ in image size and in the predominant orientation of the leaves. Image processing techniques such as segmentation, scaling, and rotation therefore need to be applied before building a model.

5 Classes of Leaves

1. Since the images were already in grayscale, thresholds were set to binarize them.

2. The binarized images were then labelled into connected components and passed to the regionprops function.

Steps 1 & 2

Unfortunately, the function also picked up stray points as separate components. These were filtered out using the area property. The individual leaf images were then extracted along with features such as orientation, eccentricity, Hu moments, and extent.

3. All individual leaves were then padded so they appear in the center of the image.

4. Each leaf was then rotated according to the orientation of its major axis, so that every leaf points strictly up or down instead of at an arbitrary angle.

Steps 3 & 4

5. All images were rescaled to a common size of 100x100 pixels.

6. From these scaled images, features such as the leaf area and length-width ratio were derived.

7. The flattened image arrays, together with the extracted and derived features, were then split 80–20 into training and test sets and used to fit a random forest classifier. After some hyperparameter tuning, an accuracy of 92% was achieved. The top predictors of the model were eccentricity, length-width ratio, components of the Hu moments, and finally some specific pixel values.
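The segmentation part of this pipeline (steps 1 to 3) could be sketched roughly as follows. This is an illustration under assumptions, not the post's code: it uses `scipy.ndimage.label` and `find_objects` as a stand-in for the regionprops workflow, assumes dark leaves on a light background with intensities in [0, 1], and the names `extract_leaves` and `pad_to_square` as well as the threshold, minimum area, and margin are made up for the example.

```python
import numpy as np
from scipy import ndimage


def pad_to_square(component, margin=2):
    """Place a boolean component in the center of a square canvas."""
    height, width = component.shape
    side = max(height, width) + 2 * margin
    top = (side - height) // 2
    left = (side - width) // 2
    canvas = np.zeros((side, side), dtype=bool)
    canvas[top:top + height, left:left + width] = component
    return canvas


def extract_leaves(gray, threshold=0.5, min_area=20):
    """Binarize, label, drop small stray components, and center each leaf."""
    mask = gray < threshold                 # assumed: leaves darker than background
    labels, _ = ndimage.label(mask)         # connected-component labelling
    leaves = []
    for index, box in enumerate(ndimage.find_objects(labels), start=1):
        component = labels[box] == index    # this component inside its bounding box
        if component.sum() < min_area:      # filter stray points by area
            continue
        leaves.append(pad_to_square(component))
    return leaves


# Synthetic example: one rectangular "leaf" plus a single stray pixel.
gray = np.ones((30, 30))
gray[5:15, 10:16] = 0.1    # 10x6 leaf, area 60
gray[25, 25] = 0.1         # stray point, area 1

leaves = extract_leaves(gray)
```

Each returned mask is square with the leaf centered, which makes the later rotation and rescaling to 100x100 pixels straightforward.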

This brings us to the end of my blog posts. Hope you were able to learn a few new things and develop an interest in image processing and machine learning. Thank you for reading!



Nisarg Nigam

MS Data Science student at Asian Institute of Management