OCR optimization for vehicle number plate Identification based on Template matching


Int. Journal of Electrical & Electronics Engg.

Vol. 2, Spl. Issue 1 (2015)

e-ISSN: 1694-2310 | p-ISSN: 1694-2426

Vikas Upadhyay#, Surbhi, Dixit Sharma
Department of Electronics and Communication, University of Allahabad

Vikasjk4@gmail.com

Abstract—Optical character recognition (OCR) is an approach for extracting characters from an image. Vehicle number plate identification remains a challenging OCR task. In this paper, a method for vehicle license plate identification is implemented and analyzed; it combines a novel adaptive image segmentation and filtering technique with optical character recognition. A novel method localizes the license plate characters on the basis of their height to width ratio and position. The localized characters are then correlated with predefined character templates, and the correlation value, compared against an appropriate character-authentication threshold, decides the valid character for each localized region of interest. The paper is divided into five parts. The first part consists of the introduction and literature survey. The second part deals with image conversion (from RGB to black and white), removal of unwanted noisy regions, and classification of connected components. The third part explains filtering based on the ratio of height to width, together with a height filter and a position filter, to validate true characters. The fourth part explains how to extract the likely character regions of the number plate using a median centroid approach; this approach enables the localization of number plates with English alphanumeric fonts under widely varying illumination conditions. The fifth part explains the correlation of each region with the templates, which validates the character on the basis of the maximum correlation value.

Keywords— OCR, adaptive image segmentation, connected component, median centroid approach, correlation, region of interest

NITTTR, Chandigarh EDIT-2015

I. INTRODUCTION

Digital image processing now provides many valid and useful results in the field of traffic and video inspection. Automatic Number Plate Recognition (ANPR) is a wide area of research for traffic control and vehicle security agencies, and a number of approaches have been proposed so far. The proposed ANPR is a real-time traffic surveillance system that automatically identifies and records the license numbers of vehicles. License plate standards vary from country to country; different countries use different fonts and styles for vehicle licence plates. Current ANPR systems are applied to number plates with standard English fonts. The proposed algorithm provides accurate results for high-quality images, even when the number plate is at an angle of a few degrees to the camera.

Different types of noise can alter the output of optical character recognition in number plate recognition. Noise such as dirt can be present on the number plate, and in some cases many unwanted characters or designs may be present as well. In order to recognize the license number, we have to extract the regions containing the required numbers or characters, also called regions of interest (ROIs). These extracted areas are then processed further to find the characters and numbers. Matlab is a suitable tool for implementing ANPR algorithms. The proposed method can also support real-time video traffic surveillance for stolen vehicles, based on a provided licence number.

A. Literature Survey

In the last few years, different methodologies have been used in the field of automatic number plate recognition. Traffic security surveillance and vehicle safety have raised the demand for research on ANPR systems. Paper [1] explains a technique for recognising a number plate found in any corner of the image. Edge detection and logical conversion of the image are used to extract the number plate from the image. The accuracy of this algorithm is limited to 87%. In papers [2] and [3], the authors explain that two major stages are involved in the identification of number plate characters: character separation and character recognition. An OCR algorithm is used to identify the characters in the number plate, that is, to recognize optically processed printed characters. In some countries, such as India, there is no standard for the numerals and characters, and the design of characters and numerals varies widely. In such cases the accuracy of OCR diminishes and it becomes difficult to find the correct result. In paper [6], an OCR based algorithm is developed for the recognition of number plates; it is most suitable for license plates with green and white backgrounds. The number plate is first extracted from the image, then the characters are segmented by thresholding, and finally template matching is performed.
Because this algorithm starts with a search for the number plate background color, it is not suitable for number plates with other background colors. In countries like India, ANPR using OCR is still a difficult task; hence an optimization of the OCR algorithm is proposed in this paper.

II. PROPOSED METHOD

The proposed method to recognize a vehicle number plate consists of the following steps:
• RGB to binary conversion
• Remove prior noise
• Extract the ROIs from the image
• Remove potential noise based on character position
• Recognize the license plate characters, i.e. retrieve the vehicle number, based on template correlation

B. RGB to Logical image

This is the very first step performed after capturing an image. The input image is usually an RGB image, so before any filtering it is first converted to gray scale. Otsu's method is then used to convert the gray scale image into a black and white image. Choosing the threshold of the gray scale image is critical, because a poorly chosen threshold loses information. Thresholding yields an appropriate value that can be used to convert the gray image into a logical (binary) image.
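The conversion described above can be illustrated with a minimal sketch. The paper works in Matlab and does not list its code; the Python below is only an illustration. The function names (`rgb_to_gray`, `otsu_threshold`, `binarize`) and the BT.601 luma weights are assumptions, not taken from the paper.

```python
def rgb_to_gray(rgb):
    """Convert rows of (R, G, B) tuples (0-255) to rows of integer gray levels.

    The BT.601 luma weights are an assumption; the paper does not state
    which weights its gray scale conversion uses.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue            # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the lower class
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold):
    """Return a logical image: True where the gray level exceeds the threshold."""
    return [[v > threshold for v in row] for row in gray]
```

A typical call sequence would be `gray = rgb_to_gray(image)`, `t = otsu_threshold(gray)`, `logical = binarize(gray, t)`.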

Fig 1. RGB Image

Fig 2. Logical Image

C. Removal of small noise

Binary conversion introduces some unwanted noisy regions, which should be removed to get the desired output. In the figure above, a white spotted area above the character H appears as noise. Image processing functions are used to remove such small groups of connected pixels. Regions with an area of less than 20 pixels are treated as noise and removed from the logical image; testing shows that such small areas cannot be regions of interest. In the next step, labels are assigned to all the connected components.

D. Label connected components

After removing the noise, it is important to label all the connected components. This process assigns a different label to each connected region. The connected components contain the regions of interest along with the remaining noise, so advanced filtering processes have been implemented to extract noise-free regions of interest.
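A minimal sketch of these two steps follows: labelling connected components and discarding those smaller than 20 pixels. It is an illustration in Python, not the authors' Matlab code. The function names and the choice of 8-connectivity are assumptions; only the 20-pixel limit comes from the text.

```python
from collections import deque


def label_components(binary, connectivity=8):
    """Label connected True pixels with a breadth-first flood fill.

    Returns (labels, areas): labels has the shape of binary, with 0 for
    background and 1..n for components; areas maps each label to its pixel count.
    """
    h, w = len(binary), len(binary[0])
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


def remove_small_components(binary, min_area=20, connectivity=8):
    """Drop every component with fewer than min_area pixels (20 in the paper)."""
    labels, areas = label_components(binary, connectivity)
    return [[bool(lab) and areas[lab] >= min_area for lab in row]
            for row in labels]
```

The `labels` array returned by `label_components` corresponds to the labelled image that the later filters operate on.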


Fig 3. Noisy image

Fig 4. Label Image

III. POSITION BASED FILTRATION

Prior studies describe various methods for designing filters and optimizing the result of ANPR. The proposed filtering method uses an approach suited to the identification of vehicle number plates: a position based filter removes the additional noisy regions on the basis of properties such as position and size, leaving only the regions of interest that contain a character or number. Considering the dimensional properties of characters, numbers and noisy regions, different types of filters are used at different levels of region extraction. These filters are as follows:
• Height to width ratio filter
• Height filter
• Position filter / mean centroid filter
• Pixel area based approach

The figure below shows the regions of interest in red boxes. A noisy region is visible at the left side of the image. To remove such noise we use a height to width ratio filter, explained in the next section.

Fig 5. Without height and width filter

E. Height to width ratio filter

English letters and digits, in any particular font, have a height to width ratio within a certain range. The ratio filter therefore increases the probability of isolating the effective ROIs in the target region. It is a very efficient filter, as it removes most of the unnecessary bounding regions that do not hold any character or number. Research and analysis were carried out on various fonts of English letters and digits, and the H/W ratio was found to lie between one and eight. With

HW = Height / Width

the algorithm, to get the maximum efficiency according to our analysis and statistics, keeps only the bounding ROIs that satisfy

1 < HW < 8

This HW ratio filter removes a number of unwanted regions labelled as connected components and provides the bounding regions of interest.

F. Height filter

The H/W filter is good enough to optimize the result for stationary images. In a real-time ANPR system, however, variations remain even after the bounding regions have been extracted by the height to width ratio filter, so the filtered bounding boxes are filtered further according to their height. Because the distance from vehicle to camera may vary in real time from a few centimetres to metres, filtering bounding regions against a constant height would not be a reliable method. The proposed algorithm overcomes this problem with the following steps:

1) The heights of the ROIs may vary from very small bounding regions to regions as large as the target image. The first step is to group the heights of the objects into bins with a margin of 10. For example, if the heights of 16 bounding regions are

[44, 450, 159, 541, 671, 89, 93, 95, 98, 91, 87, 93, 93, 85, 99, 156]

they are taken as

[41, 441, 151, 541, 671, 81, 91, 91, 91, 91, 81, 91, 91, 81, 91, 151]

2) Count the number of times each binned value occurs. In the above example, 151 occurs 2 times, which gives

[1, 1, 2, 1, 1, 3, 7, 7, 7, 7, 3, 7, 7, 3, 7, 2]

3) Find the largest number in this array, i.e. the binned value that repeats the maximum number of times. In the above example this is 7, corresponding to 91.

4) Save the positions of the ROIs with the highest frequency of repetition and take the mean height at those positions:

Mean Height = (93+95+98+91+93+93+99)/7 = 94.57

5) Filter out and save the locations of the ROIs (red bounding box objects) whose height lies between (mean height - 20) and (mean height + 20). In the given example, the heights of the filtered bounding boxes are

89, 93, 95, 98, 91, 87, 93, 93, 85, 99

and the positions of the filtered bounding boxes are

6, 7, 8, 9, 10, 11, 12, 13, 14, 15

This filtering process localizes the effective ROIs of the target image.

Fig 6.

Alternative Method: A centroid position based filtering approach can also be applied to the ROIs to filter the effective character regions. The centroid position filter can be used independently or together with the H/W filter to improve the result further.

IV. CENTROID POSITION FILTER

After the 'height filter' algorithm most of the noise is removed, but in some cases noise may still be present along with the required region. An alternative method, based on the position of the centroid of each bounded region of interest, has been tested in some cases to optimize the filter result. The centroid coordinates indicate the level of the ROIs in an image. Number plates generally consist of one or two rows of letters and digits.

G. Y-axis position filter

The centroid of each connected region in an image is given as a coordinate pair. The first element of the coordinate represents the Y-axis and the second element represents the X-axis. Inline letters or digits in an image, as on a vehicle number plate, are effectively identified using a median centroid tolerance filter, which defines the range of centroids for inline (horizontal) letters and digits based on Y-axis position.

1) As in the 'height filter', group the centroids of the bounding boxes (along the Y-axis) into bins with a margin of 10.
2) Count the number of times each binned value repeats.
3) Find the most frequently occurring value, i.e. the value that repeats the maximum number of times.
4) Save the positions of the bounding boxes with that value and take the mean of the centroids at those positions.
5) Save the positions of the bounding boxes whose centroid (Y-axis) value lies between (mean - 30) and (mean + 30).
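The height to width ratio filter (Section E) and the binning procedure shared by the height filter (Section F) and the Y-axis position filter (Section G) can be sketched as follows. This is an illustrative Python sketch rather than the authors' Matlab implementation. The names `hw_ratio_filter` and `modal_bin_filter` are hypothetical, and treating the limits as strict inequalities is an assumption. The example reuses the 16 heights from the worked example in Section F.

```python
from collections import Counter


def hw_ratio_filter(boxes, low=1.0, high=8.0):
    """Return the indices of (height, width) boxes with low < height/width < high."""
    return [i for i, (h, w) in enumerate(boxes) if w > 0 and low < h / w < high]


def modal_bin_filter(values, bin_width=10, tolerance=20):
    """Keep the values close to the mean of the most populated bin.

    Returns (mean_of_modal_bin, kept_indices); indices are 0-based.
    """
    # Step 1: map each value to the lower edge of its bin (1-10 -> 1, 11-20 -> 11, ...).
    bins = [((v - 1) // bin_width) * bin_width + 1 for v in values]
    # Steps 2-3: count the occurrences and pick the most frequent bin.
    modal_bin, _ = Counter(bins).most_common(1)[0]
    # Step 4: mean of the original values that fall into that bin.
    members = [v for v, b in zip(values, bins) if b == modal_bin]
    mean = sum(members) / len(members)
    # Step 5: keep every value within +/- tolerance of that mean.
    kept = [i for i, v in enumerate(values)
            if mean - tolerance < v < mean + tolerance]
    return mean, kept


heights = [44, 450, 159, 541, 671, 89, 93, 95, 98, 91, 87, 93, 93, 85, 99, 156]
mean_height, kept = modal_bin_filter(heights)
print(round(mean_height, 2))      # 94.57
print([i + 1 for i in kept])      # [6, 7, 8, 9, 10, 11, 12, 13, 14, 15] (1-based)
```

For the Y-axis position filter, the same function could be called on the Y-centroids with `tolerance=30`.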

Fig 7. Centroid Position filter


H. X-axis position filter

After the Y-axis filter algorithm, the X-axis position filter is applied. Perform the following operations on the output of the Y-axis position filter.

6) Find the mean of the centroid position values (along the X-axis).
7) Find the difference (DF) between the maximum and minimum centroid values (along the X-axis).
8) Save the positions of the bounding boxes whose centroid value lies between mean - (DF/1.8) and mean + (DF/1.8).

I. Pixel area based noise filter

The pixel area of the ROIs in an image is another important aspect for eliminating unwanted noisy ROIs from the target image. Matlab provides a number of functions for removing small unwanted noisy regions from an image. After the different filters have been applied, the ANPR result is optimized in most cases, but the main target image may still contain unwanted ROIs with a very large pixel area. The pixel area based approach is therefore applied as the very last stage of this ROI filtering algorithm.

V. CORRELATE IMAGES WITH PREDEFINED TEMPLATE

The correlation function is a widely used approach for measuring the similarity between two images by means of the correlation coefficient. The more closely two images match each other, the closer the correlation coefficient is to unity. The correlation function computes the correlation coefficient r using

$$ r = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}} $$

where $\bar{A}$ = mean2(A) and $\bar{B}$ = mean2(B).

If the correlation coefficient is unity, the two image regions can be considered perfectly matched. In this paper a threshold value of the correlation coefficient was chosen after analysing the correlation of the templates with specific matching patterns (ROIs) within the original image.

After successive filtering and morphological operations, the appropriate regions of interest are identified. To be processed further, these ROIs must be resized to the template dimensions. The bounding regions are cropped and saved as cropped images. The proposed algorithm uses templates of 30*50 pixels, so the cropped images are resized to 30*50 pixels to match the dimensions of the existing templates. The templates form a database of English capital letters and numerals. Increasing the number of templates for each character can improve the correlation result. The set of templates is defined according to the standard alphabets used in number plates.

Correlate each cropped region of interest with the saved set of templates and create a correlation matrix for each cropped region. Find the maximum correlation value in each correlation matrix. Each cropped region corresponds to a particular character or numeral: the maximum correlation coefficient identifies the matching template for the cropped region, and the character or number assigned to that template is taken as the text or digit of the cropped region.

VI. FLOW DIAGRAM OF THE PROCESS

VII. RESULTS AND DISCUSSION

We applied this algorithm to 500 different images to calculate the accuracy of our work. The accuracy of the proposed work is around 95.76% on Indian vehicle number plates with slight variations in camera angle and distance. The following table lists images of cars taken from varying distances and camera angles, and the corresponding characters recognised by the proposed algorithm.

Sr. No. | Image   | Recognized Character | Accuracy %
1.      | [image] | MH12DE1433           | 100
2.      | [image] | MH04FRT977           | 90
3.      | [image] | MH12DE1433           | 100
4.      | [image] | PB6517614            | 88.8
5.      | [image] | WB06F5977            | 100
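The template correlation step of Section V can be illustrated with a short sketch. It computes the two-dimensional correlation coefficient between a cropped region and each template and keeps the best match. The Python below is an illustration only: the names `corr2` and `match_character`, the dictionary of templates, and the default threshold of 0.5 are assumptions, since the paper does not state its threshold value. The tiny 3*3 patterns stand in for the 30*50 templates.

```python
import math


def corr2(a, b):
    """Two-dimensional correlation coefficient of two equally sized images."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in flat_a)
                    * sum((y - mean_b) ** 2 for y in flat_b))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def match_character(region, templates, threshold=0.5):
    """Return (label, r) for the best template, or (None, r) below the threshold.

    templates maps a character label to a template image of the region's size.
    The threshold default is a placeholder, not a value from the paper.
    """
    scores = {label: corr2(region, tpl) for label, tpl in templates.items()}
    best_label = max(scores, key=scores.get)
    best_r = scores[best_label]
    return (best_label if best_r >= threshold else None), best_r


# Toy 3*3 templates and a noisy region, for illustration only.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
region = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label, r = match_character(region, templates)
print(label)   # I
```

Storing several templates per character, as the paper suggests, would only require distinct dictionary keys that map back to the same character.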

VIII. CONCLUSION AND APPLICATION

The proposed algorithm is capable of reading the number plate from a distant camera, and a slight variation in angle is also acceptable. The number plate is a unique identity for a vehicle and can be utilized for many purposes:
• Automatic toll tax deduction
• Vehicle surveillance
• Vehicle security

Various algorithms have been developed for optical character recognition. This method differs in its filtration and segmentation. A multidimensional algorithm for extracting the target number plate region has been developed; it removes unnecessary data processing and improves the result of optical character recognition. A supplementary improvement has been proposed by creating more than one set of templates. As future work, we can increase the number of templates and create an artificial neural network, trained using the set of templates for each character as input with a particular output. This would automatically select the output for a given input from the available set of outputs (templates) and may improve the result.

REFERENCES

[1] Prathamesh Kulkarni, Ashish Khatri, Prateek Banga, Kushal Sha, "A Feature Based Approach for Localization of Indian Number Plates", Electro/Information Technology, 2009 (EIT '09), IEEE International Conference, E-ISBN 978-1-4244-3354-4, Print ISBN 978-1-4244-3354-4, June 2009.
[2] Shyang-Lih Chang, Li-Shien Chen, Yun-Chung Chung, and Sei-Wan Chen, "Automatic License Plate Recognition", IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 1, March 2004.
[3] Er. Kavneet Kaur, Vijay Kumar Banga, "Number Plate Recognition Using OCR Technique", IJRET: International Journal of Research in Engineering and Technology, EISSN: 2319-1163, PISSN: 2321-7308, vol. 02, issue 09, Sep. 2013.
[4] Mohamed El-Adawi, Hesham Abd el Moneim Keshk, Mona Mahmoud Haragi, "Automatic License Plate Recognition", IEEE Transactions on Intelligent Transport Systems, vol. 5, pp. 42-53, March 2004.
[5] Serkan Ozbay, Ergun Ercelebi, "Automatic Vehicle Identification by Plate Recognition", World Academy of Science, Engineering and Technology, International Journal of Electrical, Robotics, Electronics and Communications Engineering, vol. 1, no. 9, 2007.
[6] Sarmad Majeed Malik and Rehan Hafiz, "Automatic Number Plate Recognition Based on Connected Components Analysis Technique", 2nd International Conference on Emerging Trends in Engineering and Technology (ICETET'2014), May 30-31, 2014, London (UK).

Vikas Upadhyay received his Master of Technology degree in Electronics Engineering from the Department of Electronics and Communication, University of Allahabad, Allahabad, in 2013 and is currently working in the fields of embedded Linux based system design, digital image processing, neural networks and LED drivers. He worked as a Project Engineer at NIELIT. His research interests include embedded system design and digital image processing. He also received the IETE Post Graduate Fellowship from IETE, New Delhi, during his master's study at the University of Allahabad.

Surbhi received her Master's degree in Electronics Engineering from the Electronics and Electrical department of PEC University of Technology, Chandigarh, in 2013 and is currently working as a Project Engineer at NIELIT, Chandigarh. Her research areas include embedded systems and digital image processing. She also received an MHRD fellowship during her M.Tech study.

Dixit Sharma is a 3rd year B.Tech student at Guru Nanak Dev University regional campus, Jalandhar, and was awarded best all-rounder student of the institute. Dixit Sharma has a keen interest in research and wishes to pursue a future in this field.
