Automatic License/Number Plate Recognition (ANPR) using CNN

Jul 2, 2021

Automatic License/Number Plate Recognition systems come in all shapes and sizes:

Even more advanced ANPR systems use specialized neural network architectures to pre-process and clean images before they are OCR’d, which improves ANPR accuracy.
Automatic License/Number Plate Recognition (ANPR/ALPR) is a process involving the following steps:
Step #1: Detect and localize a license plate in an input image/frame
Step #2: Extract the characters from the license plate
Step #3: Apply some form of Optical Character Recognition (OCR) to recognize the extracted characters

Step 1

Creating a workspace

>conda create -n 'name_of_the_environment' python=3.6

>conda activate 'name_of_the_environment'

# Installing OpenCV
>pip install opencv-python==4.1.0
# Installing Keras
>pip install keras
# Installing Jupyter
>pip install jupyter
# Installing Scikit-Learn
>pip install scikit-learn

Step 2

Setting up the environment!

We’ll start by running Jupyter Notebook and then importing the necessary libraries, in our case OpenCV, Keras, and scikit-learn.

# in your conda environment run
>jupyter notebook

This should open Jupyter notebook in the default web browser.
Once open, let’s import the libraries

# importing OpenCV
>import cv2
# importing numpy
>import numpy as np
# importing pandas to read the CSV file containing our data
>import pandas as pd
# importing keras and sub-libraries
>from keras.models import Sequential
>from keras.layers import Dense
>from keras.layers import Dropout
>from keras.layers import Flatten, MaxPool2D
>from keras.layers.convolutional import Conv2D
>from keras.layers.convolutional import MaxPooling2D
>from keras import backend as K
>from keras.utils import np_utils
>from sklearn.model_selection import train_test_split

Step 3

Number plate detection:

Let’s start by loading a sample image of a car with a license plate and defining some functions:

The above function takes an image as input and applies a ‘haar cascade’ that is pre-trained to detect Indian license plates. The scaleFactor parameter specifies how much the image size is reduced at each scale the detector searches, so that plates of different sizes can be found. minNeighbors is a parameter that reduces false positives: if this value is low, the algorithm is more prone to returning misrecognized outputs. (You can download the haar cascade file as ‘indian_license_plate.xml’ from my GitHub profile.)

input image

output image with detected plate highlighted

output image of detected license plate

Step 4

Performing some image processing on the License plate.

Now let’s process this image further to make the character extraction process easy. We’ll start by defining some more functions for that.

The above function takes the image as input and performs the following operations on it:

Resizes it to a dimension such that all characters appear distinct and clear.
Converts the colored image to a grayscale image, i.e. instead of 3 channels (BGR), the image has a single 8-bit channel with values ranging from 0 to 255, where 0 corresponds to black and 255 corresponds to white. We do this to prepare the image for the next step.
The threshold function then converts the grayscale image to a binary image, i.e. each pixel now has a value of 0 or 1, where 0 corresponds to black and 1 corresponds to white. This is done by applying a threshold between 0 and 255; here the value is 200, so pixels with a value above 200 in the grayscale image are set to 1 in the binary image and all other pixels are set to 0. (In an 8-bit OpenCV image the white value is typically stored as 255 rather than 1.)
The image is now in binary form and ready for the next step, eroding.
Eroding is a simple process for removing unwanted pixels from an object’s boundary, meaning pixels that should have a value of 0 but have a value of 1. It considers each pixel in the image one by one together with its neighbors (the number of neighbors depends on the kernel size); the pixel is given a value of 1 only if all its neighboring pixels are 1, otherwise it is given a value of 0.
The image is now clean and free of boundary noise, so we dilate it to fill in missing pixels, meaning pixels that should have a value of 1 but have a value of 0. Dilation works like erosion with one difference: the pixel is given a value of 1 if at least one of its neighboring pixels is 1.
The next step is to make the boundaries of the image white. This removes any out-of-frame pixels that may be present.
Next, we define a list of 4 dimension values against which we’ll compare each character’s dimensions to filter out the required characters.
Through the above steps, we have reduced our image to a processed binary image that is ready for character extraction.

Step 5

Segmenting the alphanumeric characters from the license plate

After step 4 we should have a clean binary image to work on. In this step, we will apply some more image processing to extract the individual characters from the license plate. The steps involved are:

Finding all the contours in the input image. The function cv2.findContours returns all the contours it finds in the image. Contours can be explained simply as a curve joining all the continuous points (along the boundary), having the same color or intensity.

plate with contours drawn in green

After finding all the contours, we consider them one by one and calculate the dimensions of their respective bounding rectangles. The bounding rectangle is the smallest upright rectangle that contains the contour. Let me illustrate the bounding rectangles by drawing them for each character here.

Since we have the dimensions of these bounding rectangles, all we need to do is tune some parameters and filter out the rectangles that contain the required characters. For this, we compare dimensions and accept only those rectangles whose width lies between 0 and (width of the plate)/(number of characters) and whose height lies between (height of the plate)/2 and 4*(height of the plate)/5. If everything works well, we should have all the characters extracted as binary images.
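The dimension comparison can be written as a short filter. This is a sketch under the assumption that the bounds are read as "width below plate width divided by the number of characters" and "height between one half and four fifths of the plate height"; the function name and the default of 10 characters are illustrative.

```python
def filter_char_rects(rects, plate_w, plate_h, n_chars=10):
    """Keep rectangles whose size is plausible for a single character."""
    max_w = plate_w / n_chars
    min_h, max_h = plate_h / 2, 4 * plate_h / 5
    return [
        (x, y, w, h)
        for (x, y, w, h) in rects
        if 0 < w < max_w and min_h < h < max_h
    ]


candidates = [
    (10, 15, 25, 45),   # character-sized
    (0, 0, 333, 75),    # the whole plate
    (50, 30, 5, 5),     # a speck of noise
    (60, 15, 30, 50),   # character-sized
]
kept = filter_char_rects(candidates, plate_w=333, plate_h=75)
```

Here only the first and last candidates survive, because the whole-plate rectangle is too wide and the speck is too short.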

The binary images of 10 extracted characters.

The characters may be unsorted, but don’t worry, the last few lines of the code take care of that. They sort the characters by the distance of their bounding rectangles from the left boundary of the plate.
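One simple way to do this ordering, shown here as a sketch with a hypothetical helper name, is to sort indices by the x coordinate of each bounding rectangle and reorder the character crops accordingly.

```python
def sort_left_to_right(rects, chars):
    """Order character crops by the x coordinate of their bounding rectangle."""
    order = sorted(range(len(rects)), key=lambda i: rects[i][0])
    return [chars[i] for i in order]


# Strings stand in for the character images to keep the example readable.
rects = [(120, 10, 20, 40), (15, 10, 20, 40), (60, 10, 20, 40)]
chars = ["C", "A", "B"]
ordered = sort_left_to_right(rects, chars)
```

After the call, ordered holds the crops in reading order, so the predicted characters can simply be concatenated into the plate number.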

Step 6

Creating a Machine Learning model and training it for the characters.

The data is all clean and ready, so now it’s time to create a Neural Network that will be intelligent enough to recognize the characters after training.

https://mesin-belajar.blogspot.com/2016/05/topological-visualisation-of.html

For modeling, we will be using a Convolutional Neural Network with 3 trainable layers: one convolutional layer followed by two dense layers.

## create model
>model = Sequential()
>model.add(Conv2D(filters=32, kernel_size=(5,5), input_shape=(28, 28, 1), activation='relu'))
>model.add(MaxPooling2D(pool_size=(2, 2)))
>model.add(Dropout(rate=0.4))
>model.add(Flatten())
>model.add(Dense(units=128, activation='relu'))
>model.add(Dense(units=36, activation='softmax'))

To keep the model simple, we’ll start by creating a sequential object.
The first layer will be a convolutional layer with 32 output filters, a convolution window of size (5,5), and ReLU (‘relu’) as the activation function.

Next, we’ll be adding a max-pooling layer with a window size of (2,2).
Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned.

max-pooling layer
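To see what a (2,2) max-pooling window does to actual numbers, here is a small NumPy illustration. It is a stand-in for the Keras layer, not its implementation, and the helper name max_pool_2x2 is made up for this example.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # Split into non-overlapping 2x2 blocks, then take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 5]
#  [7 9]]
```

Each output value is the largest entry of one 2x2 block, so the 4x4 input shrinks to 2x2 while keeping the strongest response from every region.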

Now, we will add a dropout layer to take care of overfitting.
Dropout is a regularization technique used to prevent Neural Networks from overfitting: randomly selected neurons are ignored (“dropped out”) during training. We have chosen a dropout rate of 0.4, meaning that on average 60% of the nodes are retained at each training step.
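The effect of a 0.4 dropout rate can be illustrated with a few lines of NumPy. This sketch uses the common "inverted dropout" formulation, in which the surviving activations are scaled up by 1/(1 - rate) so their expected sum is unchanged; the function name and the fixed random seed are assumptions for the example.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Inverted dropout: zero roughly a `rate` fraction of units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
out = dropout(np.ones(10_000), rate=0.4, rng=rng)
kept_fraction = np.count_nonzero(out) / out.size
```

With 10,000 units, kept_fraction lands close to 0.6. Dropout is only active during training; at prediction time all nodes are used.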
Now it’s time to flatten the node data so we add a flatten layer for that. The flatten layer takes data from the previous layer and represents it in a single dimension.

Finally, we will add 2 dense layers: one with an output dimensionality of 128 and activation function ‘relu’, and our final layer with 36 outputs for categorizing the 26 letters (A-Z) + 10 digits (0–9), with activation function ‘softmax’.
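It can help to trace how the tensor shape and the number of trainable parameters change through the model defined above. The arithmetic below assumes the Keras defaults for that code, namely no padding and stride 1 for the convolution and a stride equal to the pool size for max pooling; the helper name is made up for this example.

```python
def conv_output_side(side, kernel, stride=1):
    """Output side length of an unpadded ('valid') convolution or pooling."""
    return (side - kernel) // stride + 1


conv_side = conv_output_side(28, 5)                    # 28x28 -> 24x24
pool_side = conv_output_side(conv_side, 2, stride=2)   # 24x24 -> 12x12
flat_units = pool_side * pool_side * 32                # 12 * 12 * 32 = 4608

# Each filter or unit has one weight per input value plus one bias.
conv_params = (5 * 5 * 1 + 1) * 32                     # 832
dense1_params = (flat_units + 1) * 128                 # 589952
dense2_params = (128 + 1) * 36                         # 4644
total_params = conv_params + dense1_params + dense2_params  # 595428
```

Almost all of the parameters sit in the first dense layer, which is one reason the dropout layer is placed just before it.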

Step 7

Training our CNN model.

The data we will be using contains images of letters (A-Z) and digits (0–9) of size 28x28. The data is also balanced, so we won’t have to do any class rebalancing here.
I’ve created a zip file that contains the data as per the directory structure below, with a train/test split of 80:20.

https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720

We’ll be using the ImageDataGenerator class available in Keras to generate some more data using image augmentation techniques like width shift and height shift. To know more about ImageDataGenerator, please check out this nice blog.
Width shift: Accepts a float value denoting the maximum fraction of the image width by which the image is randomly shifted left or right.
Height shift: Accepts a float value denoting the maximum fraction of the image height by which the image is randomly shifted up or down.
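To show what a shift by a fraction of the image size means, here is a plain NumPy illustration of a single, fixed shift. It is not how ImageDataGenerator is implemented: the Keras class draws a random shift within the range given by its width_shift_range and height_shift_range arguments and has its own fill behavior. The function name and the zero fill value are assumptions for this example.

```python
import numpy as np


def shift_image(img, dx_frac=0.0, dy_frac=0.0, fill=0):
    """Shift a 2-D image by a fraction of its width/height, padding with `fill`."""
    h, w = img.shape
    dx, dy = int(round(dx_frac * w)), int(round(dy_frac * h))
    out = np.full_like(img, fill)
    # Copy the part of the source that stays inside the frame after the shift.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out


img = np.zeros((10, 10), dtype=np.uint8)
img[5, 5] = 1
right = shift_image(img, dx_frac=0.2)    # 20% of the width to the right
up = shift_image(img, dy_frac=-0.1)      # 10% of the height upwards
```

Positive fractions move the content right or down and negative fractions move it left or up, so the single marked pixel ends up two columns to the right in the first call and one row higher in the second.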
It’s time to train our model now!
We will use ‘categorical_crossentropy’ as the loss function, ‘Adam’ as the optimizer, and accuracy as our evaluation metric.
After training for 23 epochs, the model achieved an accuracy of 99.54%.
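For readers who want to see what the loss and the metric named above actually compute, here is a NumPy sketch. It mirrors the usual definitions of categorical cross-entropy and accuracy for one-hot labels rather than reproducing Keras internals; the function names and the small clipping constant eps are assumptions.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


# Two samples, three classes (the real model has 36 classes).
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

The loss only looks at the probability assigned to the correct class, so it keeps decreasing as the model grows more confident, even after the accuracy has already reached its maximum.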