
EE 569- Fall’08

EE 569 Project 3: Geometric Modification, Texture Analysis & Optical Character Recognition (OCR)

Submitted by: NEHA RATHORE | ID: 5994499980



GEOMETRICAL MODIFICATION

Problem 1 - Geometrical Modification

Objective
We were given four images, Boat1-Boat4, all in different orientations and scales. Together they represent a single image, boat.raw. We had to implement an algorithm to properly scale, translate and rotate these images so that they join correctly into the final boat.raw.

Motivation
In this problem the objective is to perform geometrical modification on an image. This requires manipulation of the image coordinates and not of the image intensity values. Geometrical transformations modify the spatial relationship between the pixels in the image; they are often called rubber-sheet transformations. The geometric transforms of an image refer to the family of operations such as spatial translation, spatial rotation, spatial scaling and perspective transformation. These operations are an integral part of computer graphics and animation, which involve non-linear combinations of the above basic operations. It should be noted that these operators, when applied in series, are not commutative, a basic fact that follows from the properties of matrices. Geometric image modification plays an important role in image registration and image synthesis [1].

In terms of digital image processing, a geometrical transformation consists of two basic operations:
• a spatial transformation of coordinates, and
• intensity interpolation that assigns intensity values to the spatially transformed pixels.

Image registration is an important DIP application used to align two or more images of the same scene. The main learning in this assignment was converting image coordinates to Cartesian coordinates and processing the image in the Cartesian coordinate system. I also learnt how to scale, rotate and translate images to get the desired output, and finally how to use image registration to join the four images together. Along with this I learnt zoom-in, zoom-out and shearing concepts.

PROCEDURES
I modularized this part of the assignment into different challenges to be achieved:
• The first challenge was to find the corners of the rotated boat image, which was located inside a bigger white box.
• The second challenge was to find the center of the image and translate it to the (0,0) location.
• The third challenge was to rotate this image about the point (0,0) by the desired angle so as to make the image straight along the horizontal and vertical axes.

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing.



• The fourth challenge was to translate the rotated image back.
• The fifth challenge was to find the scaling factor to bring each piece to size 256x256.
• The sixth challenge was to join the four scaled images together.

Algorithm
We have four different images; all of them have different orientations and are scaled by different factors. The main task is to design an algorithm to translate, rotate and scale these images such that they can be combined to form the desired 512x512 image.

Image coordinate to Cartesian coordinate conversion
As mentioned above, the points are first represented in the Cartesian coordinate system, whose origin is the bottom-left corner, and this is transformed into the image coordinate system, whose origin is the top-left corner of the image [2]. The relationship between the Cartesian coordinate representation and the discrete output image array is

x_k = k - 1/2
y_j = J + 1/2 - j

These equations relate the output array indices to their Cartesian coordinates. Similarly, the input array relationship is given by

u_q = q - 1/2
v_p = P + 1/2 - p

The basic goal of implementing a geometrical modification is to find out where a particular pixel in the input image maps to in the output image. These transformations often yield non-integer values, which makes it hard to determine the correct position in the output. However, the reverse operation, finding out where a particular pixel in the output image comes from in the input image, is much better behaved. In order to implement this reverse mapping we need a function, called the reverse address mapping function. The entire problem of geometrical modification is resolved once we find this reverse address mapping function. For each of the different geometrical operations (translation, scaling and rotation) there is a different reverse address mapping function, obtained by multiplying the inverse of the corresponding transform matrix with the output coordinates to obtain the input coordinates.

Corner Detection
There are different algorithms for detecting corners, such as the Harris corner detector, which detect a corner from the change in intensity levels in neighboring pixels.

[2] W. K. Pratt, Digital Image Processing, 3rd ed.



However, since our images are very simple, with only four corners, and the object is located entirely within white space, it is easy to find the corners by merely scanning for the first non-white pixel from different directions. We scan the 256x256 image in the following order:

//first quadrant: scan rows top-down, columns left-right, to find the top-left corner
for (a = 0; a < 256; a++) {
    count = 0;
    for (b = 0; b < 256; b++) {
        if (Input[a][b] != 255) {
            // top-left corner detected
            x1 = a; y1 = b;
            Input[a][b] = 0;                       // mark the corner (for checking)
            cout << "x1=" << x1 << " y1=" << y1 << endl;
            count = 1;
            goto next;
        }
    }
}

//the third quadrant has no corner; the two remaining corners are located in the fourth quadrant
//bottom-right corner
next1:
for (a = 355; a > 0; a--) {
    count = 0;
    for (b = 355; b > 0; b--) {
        if (Input[a][b] != 255) {
            cout << "bottom-right corner detected: ";
            x3 = a; y3 = b;
            Input[a][b] = 0;
            cout << "x3=" << x3 << " y3=" << y3 << endl;
            count = 1;
            goto next2;
        }
    }
}

//the second quadrant holds the top-right corner (x2, y2)
next:
for (b = 255; b >= 0; b--) {
    count = 0;
    for (a = 0; a <= 255; a++) {
        if (Input[a][b] != 255) {
            cout << "top-right corner detected: ";
            x2 = a; y2 = b;
            Input[a][b] = 0;
            cout << "x2=" << x2 << " y2=" << y2 << endl;
            count = 1;
            goto next1;
        }
    }
}

//bottom-left corner
next2:
for (b = 0; b < 356; b++) {
    count = 0;
    for (a = 355; a >= 0; a--) {
        if (Input[a][b] != 255) {
            cout << "bottom-left corner detected: ";
            x4 = a; y4 = b;
            Input[a][b] = 0;
            cout << "x4=" << x4 << " y4=" << y4 << endl;
            count = 1;
            goto next3;
        }
    }
}



OUTPUT:

Since I replaced every detected corner with a black pixel (for checking purposes), I found that the corners were successfully detected for all four images. To find the angle of rotation we simply form a right-angled triangle using two corners of the image and use angle = sin^-1(opposite/hypotenuse). We use the following relations [3]:
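The relations referred to here are the standard rotation equations about the origin, x' = x·cos(theta) - y·sin(theta) and y' = x·sin(theta) + y·cos(theta). The following is a minimal sketch (not the exact project code) of how the angle can be obtained from two detected corners and how an output pixel is reverse-mapped through the inverse rotation about the image center; the function and variable names are illustrative.

#include <cmath>

// Estimate the rotation angle from two detected corners (x1,y1) and (x2,y2),
// which are assumed to come from the corner scans above.
double rotationAngle(double x1, double y1, double x2, double y2)
{
    // angle of the top edge with respect to the horizontal axis
    return std::atan2(y2 - y1, x2 - x1);        // same idea as asin(opposite/hypotenuse)
}

// Reverse address mapping: for an output pixel (u,v) of the straightened image,
// find where it came from in the rotated input, rotating by -theta about (cx,cy).
void reverseRotate(double u, double v, double theta, double cx, double cy,
                   double &xin, double &yin)
{
    const double du = u - cx, dv = v - cy;      // translate the centre to the origin
    xin =  du * std::cos(theta) + dv * std::sin(theta) + cx;   // apply R(-theta)
    yin = -du * std::sin(theta) + dv * std::cos(theta) + cy;   // and translate back
}

The non-integer (xin, yin) returned here is exactly what the intensity interpolation step below is needed for.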

[3] R. C. Gonzalez and R. E. Woods, Digital Image Processing.



Intensity interpolation

Once we have calculated the coordinate relation between input and output, the next most important operation is gray-level (intensity) interpolation. The address mapping done in the previous step may give us non-integer values. Because the distorted image is digital, its pixel values are defined only at integer coordinates, so a non-integer result maps to a location for which no gray level is defined. It therefore becomes necessary to decide what the gray level at such a location should be, based only on the pixel values at integer coordinate locations; gray-level interpolation is used for this. In our discussion we use bilinear interpolation, which uses the gray levels of the four nearest neighbors to interpolate the value at the new non-integer position. Since the gray level of each of the four integer nearest neighbors of a non-integer pair of coordinates is known, the gray-level value at these coordinates can be interpolated from the values of its neighbors by the relationship below.

(p,q)         (p,q+1)
       *(i,j)
(p+1,q)       (p+1,q+1)

F(i,j) = (1-a)[(1-b)·F(p,q) + b·F(p,q+1)] + a[(1-b)·F(p+1,q) + b·F(p+1,q+1)]

where a and b are the fractional distances of the intermediate point (i,j) from (p,q) along the vertical and horizontal directions, respectively.

Interpolation is basically averaging between neighboring pixels. Say we have a 3x3 image:

10  4  8
 2 12  6
 8  4  2

We might want to make a 6x6 image from it by bilinear interpolation (O: no value assigned yet):

10  O   4  O   8  O
 O  O   O  O   O  O
 2  O  12  O   6  O
 O  O   O  O   O  O
 8  O   4  O   2  O
 O  O   O  O   O  O

First, obtain the value of unassigned pixels by averaging the two horizontally neighboring pixels:

10  7   4  6   8  8
 O  O   O  O   O  O
 2  7  12  9   6  6
 O  O   O  O   O  O
 8  6   4  3   2  2
 O  O   O  O   O  O

Second, obtain the value of unassigned pixels by averaging the two vertically neighboring pixels:

10  7   4  6   8  8
 6  O   8  O   7  O
 2  7  12  9   6  6
 5  O   8  O   4  O
 8  6   4  3   2  2
 8  O   4  O   2  O

Lastly, obtain the value of the remaining unassigned pixels by averaging the four neighboring pixels (for edge pixels, by averaging the three neighboring pixels):

10  7    4  6    8  8
 6  7    8  7.5  7  7
 2  7   12  9    6  6
 5  6.5  8  6    4  4
 8  6    4  3    2  2
 8  6    4  3    2  2
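A minimal sketch of bilinear interpolation at a non-integer source location, as used after the reverse address mapping. It assumes a grayscale image stored row by row as unsigned char with the given rows and cols; names are illustrative, not the exact project code.

#include <cmath>

unsigned char bilinear(const unsigned char *img, int rows, int cols,
                       double xin, double yin)
{
    int p = (int)xin, q = (int)yin;              // top-left integer neighbour (p,q)
    if (p < 0 || q < 0 || p >= rows - 1 || q >= cols - 1)
        return 255;                              // outside the input: return white background
    double a = xin - p, b = yin - q;             // fractional distances
    double F = (1 - a) * ((1 - b) * img[p * cols + q]       + b * img[p * cols + q + 1])
             +      a  * ((1 - b) * img[(p + 1) * cols + q] + b * img[(p + 1) * cols + q + 1]);
    return (unsigned char)(F + 0.5);             // round to the nearest gray level
}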



RESULT AND DISCUSSION
Example of rotation, translation and scaling for one part of boat.raw

Final output



DISCUSSION
We were able to rotate the image and scale it accordingly, but we notice that the image is blurred to some extent. This is because of the approximation used in intensity interpolation: as we distribute the same intensity to a set of surrounding pixels, it has the effect of an averaging filter, which also blurs the image. We were also not able to stitch the images together perfectly. This is because of rounding errors in the decision of the boundaries; it could be avoided by copying pixels from slightly outside the boundaries so as to fill the white space.

1B - SPATIAL WARPING

MOTIVATION AND OBJECTIVE:
Spatial warping is a technique for determining the coordinate relationship between the input and output images so as to obtain a linear system in the mapping coefficients, given the control points (degrees of freedom). Once this system has been solved, all the points in the input image follow it and get warped to the corresponding coordinates in the output image. This system of equations, or set of coefficients, can be used to recover an image that has been distorted or warped in a particular manner. Geometric (or spatial) transformations on an image are typically used to correct for imaging-system distortion or, conversely, to purposely distort (i.e., warp) an image to achieve some desired visual effect. Geometric correction is an important image processing task in many application areas; distortion may arise from aberrations in the sensor. A geometric transformation is given by a mapping function that relates the points in the input image to corresponding points in the output image. The mapping may be represented by a pair of equations or a transformation matrix. The matrix is either known a priori or, as is true for the vast majority of applications, must be inferred from a set of points of correspondence, typically called control points. Once the transformation matrix is known, it may be used to compute a corrected output image from a known distorted input image; for example, it can be employed to recover an image that has been distorted by a physical imaging system. Typical examples include barrel and pincushion distortion. In remote sensing and satellite imagery, common distortions are due to earth curvature and various attitude and altitude effects. Non-linear geometrical modification also has a wide range of applications apart from its usage in multimedia and graphical illusions; for example, the pictures taken during aerial surveys or by satellites have considerable distortions that are non-linear in nature.

Procedure
In general, a spatial transformation is defined by a polynomial function of the form

u = sum over i,j (i + j <= N) of a_ij · x^i · y^j
v = sum over i,j (i + j <= N) of b_ij · x^i · y^j

where x, y and u, v are point coordinates in the input and output images, respectively, N is the polynomial order, and a_ij, b_ij are mapping coefficients that characterize the transformation.



We are required to apply a transformation in which the output coordinates are polynomials of degree 2 in the input coordinates, which implies the transformation is not linear. We choose the terms in x and y such that the maximum degree is 2.

u = (a0 a1 a2 a3 a4 a5) · (1  x  y  x^2  xy  y^2)^T
v = (b0 b1 b2 b3 b4 b5) · (1  x  y  x^2  xy  y^2)^T

Breaking this down into steps:
• Find the control points.
• Calculate the A matrix.
• Calculate the inverse of A.
• Find the coefficients a0-a5 and b0-b5 by multiplying Ainv·U and Ainv·V.
• Apply these coefficients to the general transformation equation to calculate the new coordinates for the input coordinates.

Algorithm
Finding the control points
We have the sample input and sample output images for this problem, and we are also given the radius of the circle in the output image. This makes it easy to determine the values of (u,v) in the output image corresponding to (x,y) in the input image. We read these points off manually and build the mapping chart; this chart is useful for finding the coefficients, since these correspondences are solutions of the above equations. We then form the A matrix, which is given as follows:

A = [1  X  Y  X^2  XY  Y^2]

The number of rows in this matrix depends on the number of control points. Ideally, for six unknown coefficients, six control points should be enough, but to make the result better we can choose more than six control points. We then calculate the inverse; I used MATLAB to find the inverse of this 6x6 matrix. (Reason: I first used an online matrix-inverse calculator, but it was giving drastically different results; the calculations were not done properly and hence the coefficients were coming out incorrect.)



If the number of control points is more than six, A is a rectangular matrix and its inverse does not exist, so we instead use the pseudo-inverse given by the following formula:

Ainv = (A^T A)^-1 A^T

We then multiply this matrix first with the U vector to get the a coefficients and then with the V vector to get the b coefficients. Finally, the coefficients are hard-coded into the program.
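A minimal sketch of how the hard-coded coefficients can then be applied. It assumes the polynomial is used as a reverse address mapping (as in Problem 1a): for every destination pixel (x, y) it gives the source location (u, v), which is then sampled (simple rounding here; bilinear interpolation also works). Array sizes, names and the sampling choice are assumptions for illustration.

void warpQuadrant(const unsigned char in[512][512], unsigned char out[512][512],
                  const double a[6], const double b[6],
                  int rowStart, int rowEnd, int colStart, int colEnd)
{
    for (int x = rowStart; x < rowEnd; ++x)
        for (int y = colStart; y < colEnd; ++y) {
            // evaluate the degree-2 mapping polynomials
            double u = a[0] + a[1]*x + a[2]*y + a[3]*x*x + a[4]*x*y + a[5]*y*y;
            double v = b[0] + b[1]*x + b[2]*y + b[3]*x*x + b[4]*x*y + b[5]*y*y;
            int ui = (int)(u + 0.5), vi = (int)(v + 0.5);        // nearest-pixel sampling
            out[x][y] = (ui >= 0 && ui < 512 && vi >= 0 && vi < 512) ? in[ui][vi] : 255;
        }
}

The routine is called once per quadrant with that quadrant's coefficient set, which matches the four-part division described below.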

RESULTS: CHOOSING THE CONTROL POINTS

The image was divided into four parts as indicated above, and the A matrix and coefficients for each part were calculated. This was done because it is very difficult to find a single set of coefficients that can warp the image in four different directions.

Input (x, y) ---------------> Output (u, v)



A = [1  x  y  x^2  xy  y^2]

PART 1
Control points (input -> output):
(0,0)     -> (256,0)
(128,128) -> (256,128)
(256,256) -> (256,256)
(128,384) -> (128,256)
(0,511)   -> (0,256)
(0,256)   -> (181,181)

A1 =
1    0    0       0       0       0
1  128  128   16384   16384   16384
1  256  256   65536   65536   65536
1  128  384   16384   49152  147456
1    0  511       0       0  261121
1    0  256       0       0   65536

a = A1inv·u,  b = A1inv·v
a coefficients: a0 = 256.0000, a1 = 0.0841, a2 = -0.0841, a3 = 0.0008, a4 = 0.0000, a5 = -0.0008
b coefficients: b0 = 0.0, b1 = 0.0861, b2 = 0.9139, b3 = 0.0008, b4 = -0.0000, b5 = -0.0008

PART 2
Control points (input -> output):
(256,256) -> (256,256)
(256,511) -> (181,331)
(511,511) -> (256,511)
(128,384) -> (128,256)
(384,384) -> (256,384)
(0,511)   -> (0,256)

A2 =
1  256  256   65536   65536   65536
1  256  511   65536  130816  261121
1  511  511  261121  261121  261121
1  128  384   16384   49152  147456
1  384  384  147456  147456  147456
1    0  511       0       0  261121

a coefficients: a0 = 256.0000, a1 = 0.9132, a2 = -0.9132, a3 = -0.0008, a4 = 0.0000, a5 = 0.0008
b coefficients: b0 = 0, b1 = 0.0868, b2 = 0.9132, b3 = 0.0008, b4 = -0.0000, b5 = -0.0008

PART 3
Control points (input -> output):
(256,256) -> (256,256)
(384,128) -> (384,256)
(511,511) -> (256,511)
(511,256) -> (331,331)
(511,0)   -> (511,256)
(384,384) -> (256,384)

A3 =
1  256  256   65536   65536   65536
1  384  128  147456   49152   16384
1  511  511  261121  261121  261121
1  511  256  261121  130816   65536
1  511    0  261121       0       0
1  384  384  147456  147456  147456

a coefficients: a0 = 256.0000, a1 = 0.9152, a2 = -0.9152, a3 = -0.0008, a4 = 0.0000, a5 = 0.0008
b coefficients: b0 = 0, b1 = 0.9132, b2 = 0.0868, b3 = -0.0008, b4 = -0.0000, b5 = 0.0008

PART 4
Control points (input -> output):
(256,256) -> (256,256)
(0,0)     -> (256,0)
(128,128) -> (256,128)
(256,0)   -> (331,181)
(384,128) -> (384,256)
(511,0)   -> (511,256)

A4 =
1  256  256   65536   65536   65536
1    0    0       0       0       0
1  128  128   16384   16384   16384
1  256    0   65536       0       0
1  384  128  147456   49152   16384
1  511    0  261121       0       0

a coefficients: a0 = 256.0000, a1 = 0.0861, a2 = -0.0861, a3 = 0.0008, a4 = 0.0000, a5 = -0.0008
b coefficients: b0 = 0, b1 = 0.9139, b2 = 0.0861, b3 = -0.0008, b4 = -0.0000, b5 = 0.0008



RESULTS & DISCUSSION
Input image

SCANNING REGIONS



The warped image of PART4



Final IMAGE

OUTPUT

DISCUSSION
If we look closely, the image is shifted by one pixel from the top and bottom. Also, the curve is not perfectly smooth and we can see a scale-like effect in some regions. The likely reason is that, since the warping used here is not linear, the values of (u,v) may be fractional for some values of (x,y); rounding (u,v) places the pixel in the nearest possible position, which produces the scale-like effect. Also, because the angle of warping is discrete, the end points are not mapped exactly, which shifts the image one pixel down from the top and bottom.



This was an efficient but tedious way of producing the warped image. The warped image could also be produced using a polynomial equation of degree 3. Even the slightest change in a coefficient value is drastically reflected in the output image, so the process is sensitive to rounding errors and decimal approximations. As this process produces a smaller image than the original, we did not see a lack of intensity in any region. We can see from the final image that the input image is spatially warped successfully.

PROBLEM 2 - TEXTURE ANALYSIS

Part A - TEXTURE CLASSIFICATION

OBJECTIVE: We are given 12 samples of textures. There are four groups containing three textures each. We have to design an algorithm to classify these images into clusters belonging to the different groups.

MOTIVATION: In this problem, either a group of images is to be classified according to their texture types, or a single image is to be segmented into parts, each with a distinct texture type. Texture analysis plays an important role in the interpretation of remote sensing images, satellite maps, etc. A texture is related to the visual appearance of a region; it is due to semi-regular patterns which are not strictly periodic. Texture analysis is carried out basically to describe structured patterns. Edge detection cannot be used here because if the texture is very fine, the edge density will be very high and the segmented output will not be appealing. Textures can be structured patterns of object surfaces such as wood, grain, sand, grass, cloth, etc. They are very difficult to define precisely. Each texture is characterized by a set of characteristics called "features". A feature is summarized information which captures the essence of a texture type but still has the desired discriminating power. If two images have the same texture type, their features should be identical [4]. Texture classification is used to identify features of an image and find out information about that image. For example, in a picture taken from the moon, the earth appears in different colors; through texture classification it is possible to identify the regions of water, land, forest, etc. on earth. Another motivation is that texture classification can be used in applications like voice-guided assistance: imagine a blind person walking through a park who is about to walk into a tree; if texture classification is done properly in real time, this collision can be avoided.

Procedure

[4] W. K. Pratt, Digital Image Processing, 3rd ed.



Image segmentation based on texture is very challenging and various methods have been proposed for achieving this goal; however, only a few of them have been successful. One such technique is the "Laws filters" approach, which I used for my implementation. The fifteen input images are read into a cell array such that each image is stored as an element in the array. Each of the fifteen images is passed through a filter bank that consists of nine filters. The three basic filters that give rise to these nine filters are:

Local average filter: L3 = 1/6 * [1 2 1]
Edge detector:       E3 = 1/2 * [-1 0 1]
Spot detector:       S3 = 1/2 * [1 -2 1]

The idea is to form the tensor product of each pair of these three filters to get nine 3x3 filters. The basic understanding behind the usage of these three filters is as follows: upon doing the Fourier analysis of each of these filters, it is observed that

L3 acts as a low-pass filter (LPF),
E3 acts as a band-pass filter (BPF),
S3 acts as a high-pass filter (HPF).

CALCULATION OF THE ENERGY VECTORS:
Therefore, when all three filters are put together, we cover the low-, middle- and high-frequency regions. Once the nine filters have been obtained, each of the 15 images is passed through the filter bank to produce a set of nine output images per input image:

Gi = input image * lawfilter(i),  i = 1 ... 9

For each Gi produced in the previous step, the energy is computed as follows:

Fk = (1/N^2) · sum over i · sum over j of |Gk(i,j)|^2,  k = 1 ... 9

Each input image passed through the filter bank thus gives rise to nine energy components related to the nine output images. These nine components can be treated as a 9-point energy vector in a 9-dimensional feature space. Since there are 15 images to be classified, there are 15 such energy vectors in the 9-D feature space; each 9-D energy vector is a point in the feature space. The resulting feature space is therefore a 15 x 9 array, since each image has nine features, one per Laws-filtered image.
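A minimal sketch of the filter bank and energy computation described above, assuming an N x N grayscale image stored row by row; the nine 3x3 Laws filters are formed as the tensor (outer) products of L3, E3 and S3. Names and the border handling are illustrative, not the exact project code.

void lawsEnergies(const unsigned char *img, int N, double energy[9])
{
    const double L3[3] = { 1/6.0, 2/6.0, 1/6.0 };   // local average
    const double E3[3] = { -0.5, 0.0, 0.5 };        // edge detector
    const double S3[3] = { 0.5, -1.0, 0.5 };        // spot detector
    const double *bank[3] = { L3, E3, S3 };

    for (int f = 0; f < 9; ++f) {
        const double *row = bank[f / 3], *col = bank[f % 3];   // outer product row x col
        double sum = 0.0;
        for (int i = 1; i < N - 1; ++i)                        // skip the 1-pixel border
            for (int j = 1; j < N - 1; ++j) {
                double g = 0.0;                                // response of filter f at (i,j)
                for (int di = -1; di <= 1; ++di)
                    for (int dj = -1; dj <= 1; ++dj)
                        g += row[di + 1] * col[dj + 1] * img[(i + di) * N + (j + dj)];
                sum += g * g;                                  // accumulate |G_f(i,j)|^2
            }
        energy[f] = sum / (double)(N * N);                     // F_f = (1/N^2) * sum
    }
}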



EUCLIDEAN DISTANCE CLASSIFICATION:
The classification can be done based on the proximity of the energy vectors in the feature space. The nearness of one vector to another can be determined by calculating the Euclidean distance:

Euclidean distance = ||Ei - Ej|| = sqrt( sum over k of (Ei,k - Ej,k)^2 )

The Euclidean distance is calculated from every energy vector to every other energy vector in the feature space.

NEAREST NEIGHBOR ALGORITHM: In my classification problem I have used nearest neighboring algorithm. An energy vector in the feature space whose Euclidean distance from another energy vector is below a threshold is said to be a nearest neighbour of the same. The essence of the approach is to determine all such vectors that lie in close proximity to each other.
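A small sketch of the proximity test between two 9-D energy vectors, under the definition above. In the classification step each image's distances to all other images are computed and the smallest ones identify its texture group; the function name is illustrative.

#include <cmath>

double euclidean(const double *e1, const double *e2, int dim /* 9 here */)
{
    double d = 0.0;
    for (int k = 0; k < dim; ++k)
        d += (e1[k] - e2[k]) * (e1[k] - e2[k]);   // squared differences per feature
    return std::sqrt(d);
}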

Since it is known that there are 4 classes among the group of 12 images, the Euclidean distances of one vector from every other vector are arranged in a row, the three smallest distances are determined, and the corresponding energy vectors are said to be in close proximity to it in the 9-D feature space.

RESULT for classification



Energy Computation

           T1      T2      T3      T4      T5      T6      T7      T8      T9
Image1   0.0420  0.0031  0.0050  0.0037  0.0030  0.0078  0.0079  0.0074  0.0203
Image2   0.0616  0.0037  0.0036  0.0040  0.0040  0.0063  0.0042  0.0070  0.0158
Image3   0.0446  0.0040  0.0072  0.0071  0.0074  0.0183  0.0101  0.0204  0.0524
Image4   0.0350  0.0048  0.0083  0.0041  0.0080  0.0196  0.0060  0.0170  0.0529
Image5   0.0616  0.0040  0.0040  0.0041  0.0041  0.0064  0.0045  0.0071  0.0157
Image6   0.0402  0.0023  0.0037  0.0025  0.0022  0.0055  0.0041  0.0057  0.0160
Image7   0.0327  0.0065  0.0111  0.0045  0.0101  0.0254  0.0072  0.0212  0.0627
Image8   0.0489  0.0043  0.0068  0.0049  0.0067  0.0169  0.0077  0.0180  0.0509
Image9   0.0604  0.0038  0.0039  0.0044  0.0047  0.0072  0.0048  0.0081  0.0178
Image10  0.0313  0.0070  0.0111  0.0052  0.0113  0.0263  0.0081  0.0217  0.0723
Image11  0.0450  0.0041  0.0062  0.0059  0.0067  0.0158  0.0082  0.0164  0.0470
Image12  0.0448  0.0023  0.0039  0.0027  0.0022  0.0055  0.0062  0.0055  0.0159

GRAPHICAL CALCULATIONS

DISCUSSION
We see from the graphs that the energy values of all 12 images lie within certain ranges; that is, for a given kind of texture, the energy values of different images fall in close proximity to each other. As seen from the graph, the nine feature values obtained from images having similar textures lie close to one another. For example, for images 1, 12 and 6, we see that all nine values



have more or less the same values. This decreases the Euclidean distance between the images and hence classifies them as the same texture. We also notice that our result shows an error in line 4, where it recognized image 4 in the group of images 3 and 11, but it was able to detect the proper group later on. I tried to find the logical error behind this but could not think of any, so I concluded that it might occur due to some transitional stage in the algorithm; I decided to take the best of three as the result.

Part 2B - TEXTURE SEGMENTATION

Objective: Now we are given a cluster of different textures within a single image. We have to perform segmentation such that we are able to draw a boundary between the different textures within this image. In our case, we are given two images that have four clusters each. We have to use the method used above for classification, but modify it so that it applies to pixels rather than whole images.

MOTIVATION: Instead of considering the entire image, we can analyze the features associated with every pixel in the image. It is intuitive to note that the visual vector of a single pixel is not stable while that of the entire image is stable. We also know that pixels belonging to the same type of texture have their corresponding feature vectors close to each other in the feature space, so analyzing pixels instead of the entire image provides more information about the texture. This kind of image processing is used in applications where we have to detect the features or objects of a picture, such as face detection, where eyes, hair, nose, lips, etc. are separated from each other for different analyses. One exciting application I can think of is extracting the features of a particular city when viewed from a great height.

Procedure: The input image is scanned pixel by pixel and a 3x3 matrix is formed from each pixel together with its 4-connected and 8-connected neighbors. We then apply the Laws filters to each such neighborhood and generate nine images, one per Laws filter. We then take a larger region of the image surrounding each pixel and calculate the energy. We use the K-means algorithm to find the areas of the different clusters.

ALGORITHM AND IMPLEMENTATION:



The input image is filtered with the nine 3x3 filters given by Laws to produce nine images, namely T1, T2, T3, ..., T9. We know that pixels belonging to the same type of texture have their corresponding feature vectors close to each other. The main assumption that leads us to perform the operations on a pixel-by-pixel basis is that it is more informative than operating on the entire image. The energy corresponding to each pixel in the set of images G is then determined. Since there are nine such images, each pixel location in the input image has a 9-dimensional energy vector, as opposed to the part above where each image had a single 9-dimensional energy vector in the feature space.

Fk(x,y) = (1/W^2) · sum over (i,j) in the W x W window around (x,y) of |Gk(i,j)|^2,  k = 1 ... 9, where W is the size of the window (15 x 15).

Thus each of the Gi is subjected to an energy computation at each of its pixels. This results in each pixel of the input image having nine different features, namely {f1, f2, f3, f4, f5, f6, f7, f8, f9}.

This kind of averaging is done to reduce the statistical fluctuations in the image. It prevents the spreading of the clusters in the feature space, which could make the segmentation process even more complicated. If the window is too large it will cause the clusters to merge in the feature space, which could result in the merging of textures in the output image. On the other hand, a window that is too small could result in over-spreading of the clusters and hence over-segmentation. The size of the window should therefore be chosen experimentally so that both cases are avoided as far as possible; I decided to go with a window size of 51x51. I scan the image after extending it by 25 pixels on each side, copying the first 25 pixels from each side into the adjacent extended areas. This is done to fill the new pixels with values similar to the image and to avoid errors in the averaging. For example, (5+5+5)/3 is 5, but (0+0+5)/3 is 5/3, which is smaller than the previous case; so if we left the extended pixels at zero, we would get faulty energy values.
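A short sketch of the border extension described above. One simple way to do it is to clamp each padded pixel to the nearest image pixel, which has the same intent as copying the first 25 pixels of each side; buffer layout and names are assumptions for illustration.

void replicatePad(const unsigned char *src, int rows, int cols,
                  unsigned char *dst, int pad /* 25 here */)
{
    int prows = rows + 2 * pad, pcols = cols + 2 * pad;
    for (int i = 0; i < prows; ++i)
        for (int j = 0; j < pcols; ++j) {
            int si = i - pad, sj = j - pad;
            if (si < 0) si = 0;            // clamp to the nearest image pixel
            if (si >= rows) si = rows - 1;
            if (sj < 0) sj = 0;
            if (sj >= cols) sj = cols - 1;
            dst[i * pcols + j] = src[si * cols + sj];
        }
}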

I then apply the k means algorithm.
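Before the detailed description in the next subsection, here is a compact, hedged sketch of the K-means loop. It assumes one feature vector per pixel (9-D, or 11-D once the x,y coordinates are appended), k = 4 clusters (one per texture), and illustrative initialization and stopping choices.

#include <vector>
#include <cmath>

std::vector<int> kmeans(const std::vector<std::vector<double> > &features,
                        int k, int maxIter = 100, double eps = 1e-4)
{
    int n = (int)features.size(), dim = (int)features[0].size();
    std::vector<std::vector<double> > c(k);
    for (int j = 0; j < k; ++j) c[j] = features[(size_t)j * n / k];   // spread-out initial centroids
    std::vector<int> label(n, 0);

    for (int it = 0; it < maxIter; ++it) {
        // assignment step: attach each point to its nearest centroid (Euclidean distance)
        for (int i = 0; i < n; ++i) {
            double best = 1e300; int bj = 0;
            for (int j = 0; j < k; ++j) {
                double d = 0.0;
                for (int t = 0; t < dim; ++t)
                    d += (features[i][t] - c[j][t]) * (features[i][t] - c[j][t]);
                if (d < best) { best = d; bj = j; }
            }
            label[i] = bj;
        }
        // update step: recompute centroids and measure how far they moved
        double shift = 0.0;
        for (int j = 0; j < k; ++j) {
            std::vector<double> m(dim, 0.0); int cnt = 0;
            for (int i = 0; i < n; ++i)
                if (label[i] == j) { for (int t = 0; t < dim; ++t) m[t] += features[i][t]; ++cnt; }
            if (cnt == 0) continue;                       // keep the old centroid if a cluster empties
            for (int t = 0; t < dim; ++t) {
                m[t] /= cnt;
                shift += std::fabs(m[t] - c[j][t]);
            }
            c[j] = m;
        }
        if (shift < eps) break;                           // centroids stopped moving: converged
    }
    return label;
}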

K-MEANS ALGORITHM: Initially a set of centroids is chosen. Suppose there are N^2 points in the feature space. If they are to be classified into k classes, the first step is to assume k centroids. The distance of each and every point from the centroids is then calculated.



The distances of each point from the centroids are calculated using the nearest-neighbor (Euclidean distance) method discussed above. Each point is associated with the closest centroid, thus forming the clusters. The average of each cluster is calculated to get a new centroid, and the above two steps are repeated, i.e. the clustering is performed again. The iteration converges when the difference between the centroids of the previous iteration and the current iteration is less than the threshold passed to the function; the threshold can be as small as desired.

RESULTS & DISCUSSION
15x15 window in texture1

15x15 window in texture2



We see that the result with a 15x15 window was very poor for texture1 but is fairly better for texture2. However, there is a lot of scope left for improvement. The reason is that the textures in the first image are very similar to each other, so it is quite possible that, without any information about their spatial distribution, the total energies calculated for each pixel are so close that it becomes hard to distinguish them, giving errors in detecting the proper segments. In the second image the textures differ greatly from one another, producing energies that differ by an intelligible amount, so it is easier to detect these patterns correctly.

DISCUSSION 1 - CHOOSING A DIMENSION FOR THE WINDOW:
We started with a 15x15 window and calculated the energies accordingly. However, as the results show, this window size was not sufficient, as it does not produce a proper segmentation. The reason is simple: say we are considering a brick pattern; a 15x15 window will cover only a small region of a whole brick and hence produce an energy value that is similar to some other region in another part of the image. This produces errors in the segmentation. I therefore decided to work with a 51x51 window.

DISCUSSION 2 - ADDING EXTRA DIMENSIONS TO THE FEATURE VECTOR:
Extending the size of the window improved the result; however, there was still some scope for improvement. I decided to add the coordinates (x, y) of each pixel as two extra dimensions in the feature space. This helps bind the pixels by a kind of spatial distribution: we are basically taking into consideration where a pixel is located inside the image.

51x51 window in texture1 with extra 2 dimensions

51X51 window in texture2 with extra 2 dimensions



Increasing the window size gives a better result. However, using a larger window produces errors in deciding the boundaries properly: with a 51x51 window the boundary regions are also included in the window, which makes the energy at the boundaries less distinct and hence produces errors in the boundary decision.

PROBLEM 3 - OPTICAL CHARACTER RECOGNITION (OCR)

OBJECTIVE: We have been given a training set that contains the alphabets A, B, C, D, E, K, L, M, N, O, P, R, S, T, U, all in Arial font and the same font size. We have to use this training set so that the program learns the characteristics of the different features of these alphabets. In the first part we extract the shape features of the alphabets in terms of line numbers and end points. In the second part we read the test image and compare its results with our feature set to find out which character it is. We have to develop a set of features based on the training data and then scan the test images and declare each character in these test images as the one that is the closest match in the training data.

MOTIVATION



The idea behind Optical Character Recognition is to extract features from characters (and/or numerals and special symbols) and use them as parameters to segment and detect their presence in any document. This is the principle of document processing, which plays an important part in pattern recognition and in describing objects in an image understanding system [5]. OCR is also used for shape analysis of images, where a particular symbol is declared to be a predetermined character.

PROCEDURE & ALGORITHM:
Steps, part 1:
• Binarize the image (to distinguish between the object and the background).
• Thin the image (used for part 1 only); we need to find end points and diagonals by finding hit-and-miss matches against a given set of masks.
• Find the minimum bounding box for each character and segment the characters.
• Run the algorithm to find end points and line numbers for each character, and store them in an array representing the feature vector.
• Compare the characters on the basis of this feature vector.

Steps, part 2:
• Binarize the image.
• Find the minimum bounding box for each character and segment the characters into different arrays.
• Run the algorithm to find: area, perimeter, Euler number, circularity, spatial moment, symmetry, aspect ratio, and Euclidean distances (my approach).

The following were the imperative concepts and steps:

BINARIZATION: Binarize the training image into two gray levels (0 and 255): if a pixel's value is less than a particular threshold, set its gray level to 0, else set it to 255. The given image has values ranging from 0 to 255, so binarization is an important step. I used a threshold value of 128 and gave value 0 to every pixel below 128 and value 255 to every pixel above 128.

OBJECT SEGMENTATION: In my program I have taken advantage of the fact that the characters are uniformly distributed. I check for all-white rows and all-white columns to segment the image into roughly 15 segments. Although this is not the best approach, it was a suggestion from the TA and I found it reasonable not to complicate matters. I first determine 15 boxes, each roughly containing one character; the character might not be centered, but it is contained inside the box.
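A minimal sketch of these two steps, under the assumptions above (an 8-bit raw image, threshold 128, characters separated by all-white rows and columns); the function names are illustrative, not the exact project code.

// Binarization: 0 = object (character), 255 = background.
void binarize(unsigned char *img, int rows, int cols, int threshold /* 128 here */)
{
    for (int i = 0; i < rows * cols; ++i)
        img[i] = (img[i] < threshold) ? 0 : 255;
}

// Helper used by the segmentation: a row that is entirely white separates two
// character bands (an analogous colIsWhite handles the columns).
bool rowIsWhite(const unsigned char *img, int cols, int r)
{
    for (int j = 0; j < cols; ++j)
        if (img[r * cols + j] != 255) return false;
    return true;
}

Scanning these tests over the whole image gives the transitions between blank bands and character bands, which yields the rough 15 boxes mentioned above.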

[5] R. C. Gonzalez and R. E. Woods, Digital Image Processing.



BOUNDING BOX DETERMINATION:
We detect the corners (as done in the previous problem) and store the x and y coordinates. From this set of values I find ymin, ymax, xmin and xmax and draw a (virtual) box; this box contains the character completely, with no extra rows or columns. I use this box to find the different features.

FEATURE EXTRACTION:
I have used the following features in my program in order to characterize the characters given in the training image.

Part 1:
• Line number
• End point number

Part 2:
• Area
• Perimeter
• Euler number
• Circularity
• Aspect ratio
• Symmetry (upper mass, lower mass, right mass and left mass)
• Central spatial moment
• Elongation
• Euclidean distance from the feature vector (my approach)

Part 1 - Finding the line direction
I check the (segmented) image for four patterns indicating the occurrence of a line. There are four different line directions that can occur in a character. These line directions are as follows:

I have declared hline, vline, ldline, rdline as the integers that will store the number of occurrence of these patterns in the given image.( h=horizontal,v= vertical, ld= left diagonal, rd=right diagonal) As soon as we get a hit, we record that particular instance as a hit and assign a value 1 to that respective integer. I have declared an array linenumber={hline,vline,ldline,rdline}; so for instance for A={1,0,1,1}. This is one part of my feature vector.



End point number
An end point is defined as a point that is connected in only one direction (4-connectivity or 8-connectivity); it is located at the end of a stroke. The number of end points can be calculated by using the masks below:

I have declared an array:
numberofendpoints1[15][8] = {leftendpoint, rightendpoint, topendpoint, bottomendpoint, topleftdiagonalendpoint, toprightdiagonalendpoint, bottomleftdiagonalendpoint, bottomrightdiagonalendpoint};
This stores the occurrence of these points in a character, so we have a clear picture of the spatial distribution of the end points along with the exact end-point counts. This is my second feature vector. I concatenate both feature vectors into a single feature vector (this helps me find the Euclidean distance).

PART 2
After segmenting each object in the training data, we compare the objects inside the bounding box with the following bit-quad patterns (1 = object pixel):

Q1 consists of four masks:
1 0    0 1    0 0    0 0
0 0    0 0    1 0    0 1

Q2 consists of four masks:
1 1    0 1    0 0    1 0
0 0    0 1    1 1    1 0

Q3 consists of four masks:
1 1    0 1    1 0    1 1
0 1    1 1    1 1    1 0

Q4 consists of one mask:
1 1
1 1

Qd consists of two masks:
1 0    0 1
0 1    1 0

AREA: The area of an object is the number of object pixels constituting the entire object. If an object pixel has a value equal to 1,

Area = 0.25 * (n{Q1} + 2*n{Q2} + 3*n{Q3} + 4*n{Q4} + 2*n{Qd})

However, the area can also be calculated by simply counting the number of object pixels, i.e. looking for the {1} pattern: Area = n{1}. [I have used this approach.]

PERIMETER: The perimeter of an object is defined as the number of sides of the object which separate pixels with different values.

Perimeter = n{Q1} + n{Q2} + n{Q3} + 2*n{Qd}

However, the perimeter can also be calculated by counting the patterns {1 0}, {0 1}, {1 0}^T and {0 1}^T, where n is the number of occurrences (again, I have used this approach):

Perimeter = n{1 0} + n{0 1} + n{1 0}^T + n{0 1}^T

AREA/PERIMETER RATIO: The ratio of area to perimeter is a better feature than the area or perimeter alone, since those are scale-variant. Sometimes the area is so large that the ratio is dominated by the area alone, which brings errors into the decision making, so to avoid this we scale the perimeter such that the ratio is normalized.

EULER NUMBER: Defined as the number of connected components that constitute the object minus the number of holes within the object.

Euler number = 0.25 * (n{Q1} - n{Q3} - 2*n{Qd})

CIRCULARITY: The circularity of an object describes how closely the shape of the object approximates a circle.

Circularity = 4 * pi * Area / (Perimeter)^2
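A minimal sketch of the bit-quad counting behind these formulas, assuming the character is available as a binarized array with 1 = object pixel inside its bounding box; names are illustrative.

void bitQuadFeatures(const int *bin, int rows, int cols,
                     double &area, double &perimeter, double &euler)
{
    int nQ1 = 0, nQ2 = 0, nQ3 = 0, nQ4 = 0, nQd = 0;
    for (int i = 0; i < rows - 1; ++i)
        for (int j = 0; j < cols - 1; ++j) {
            int a = bin[i * cols + j],       b = bin[i * cols + j + 1];
            int c = bin[(i + 1) * cols + j], d = bin[(i + 1) * cols + j + 1];
            int s = a + b + c + d;                     // object pixels in the 2x2 quad
            if      (s == 1) ++nQ1;
            else if (s == 3) ++nQ3;
            else if (s == 4) ++nQ4;
            else if (s == 2) { if (a == d) ++nQd; else ++nQ2; }   // diagonal pair vs adjacent pair
        }
    area      = 0.25 * (nQ1 + 2 * nQ2 + 3 * nQ3 + 4 * nQ4 + 2 * nQd);
    perimeter = nQ1 + nQ2 + nQ3 + 2 * nQd;
    euler     = 0.25 * (nQ1 - nQ3 - 2 * nQd);
}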



ASPECT RATIO: The aspect ratio is defined as the ratio of the height of the object to its width, where the height and width are those of the bounding box.

Aspect ratio = Height / Width
Width ratio  = Width / (Height + Width)
Height ratio = Height / (Height + Width)

SYMMETRY: An object is horizontally symmetric if the mirror image of one half of the object along the horizontal direction gives the other half; it is vertically symmetric if the mirror image of one half along the vertical direction gives the other half. We use the upper mass and lower mass to find the horizontal symmetry. An object is entirely symmetric if it exhibits both properties. I have calculated symmetry in terms of two parameters, the left-mass/right-mass ratio and the upper-mass/lower-mass ratio. To do this I divided the bounding box into two parts, first horizontally and then vertically, calculated the number of object pixels on each side of the axis, and took the ratio. For symmetric objects this ratio is one.

SPATIAL MOMENT: The (m, n)-th moment of the joint probability density function can be used to describe the features of an object; here the joint probability density function is replaced by the discrete image function. The shape of the object is characterized by a few of the low-order moments. The use of these features for OCR is justified by the fact that they are, to a certain extent, invariant for a particular symbol. They are used to fill the data structure corresponding to the features of the training-data symbols.

Mean_x = sum of x·f(i,j);  Mean_y = sum of y·f(i,j);  Moment(m, n) = sum of x^m · y^n · f(i,j), with f(i,j) = 1 or 0.

EUCLIDEAN DISTANCE: The Euclidean distance (ed) is defined as

Euclidean distance = ||Ei - Ej|| = sqrt( sum of (Ei - Ej)^2 )

I realized that the feature arrays stored for each character can be considered as feature vectors, with each feature contributing one dimension. If my decision tree fails, I can use the Euclidean distance to find the character closest to the given one. The logic is that each character has a certain spatial distribution that is unique, so the feature vector defines a character by a unique set of values. Some of the features are scale-variant and some are font-variant; these variances can sometimes lead to a faulty decision, so we keep the Euclidean distance as a final check to find the closest character.



Implementation and results: As discussed above, I segmented the image based on the characters' spatial distribution [6]. The set of pixels belonging to each object is stored as an individual entry in a cell array; I create 15 arrays, one per character.

Step 1 - Thinned image of training.raw, used to calculate the number of lines and end points.

I have designed a feature vector, as mentioned above, which represents the end points plus the line numbers. For example, for A we get:
Endpoints = {1,1,0,0,0,0,0,0}, line numbers = {1,0,1,1}
So the program stores {1,1,0,0,0,0,0,0,1,0,1,1} as the value of A.

To recognize a character in the test image, I match the features of the character against all the training characters. The training vector closest to the feature vector is picked as the alphabet for the test character under consideration.

Results & Discussion

Example: comparison of the features of A:
A training = {1,1,0,0,0,0,0,0,1,0,1,1}
A test     = {0,1,0,0,0,0,2,2,1,0,1,1}

[6] Segmentation approach confirmed by the TA.



This gives only a small error per element, so there is a possibility that the procedure can detect A if we accept errors within a limit of plus and minus 2: -2 < error < +2.

With hard boundaries, like ANDing the feature matches, this procedure failed for almost all alphabets other than U, T, S, R, P and O. These characters are simple in both the training and test images and have no extra extensions due to font style, which makes it possible to detect them.

I then chose the second approach.

I calculated the above-mentioned parameters (area/perimeter ratio, Euler number, etc.) and then grouped the characters according to their features:
Euler number = 0: A, D, O, P, R
Euler number = -1: B
and so on. When I get the input character, I first test it for symmetries; I have saved the uppermass/lowermass ratios and leftmass/rightmass ratios. I check these ratios for the input image and see whether it is horizontally symmetric, vertically symmetric or neither.
If symmetric: B, D, C, M, O, S, T, U, A, K, E. Else: L, P, R.
This narrows the comparison set. I then check the Euler number and narrow the set further; the Euler number is (mostly) unique for each.



Later on I check the aspect ratio, which helps me separate the tall characters from the wide characters. Using all these features I was able to successfully detect all the characters.

DISCUSSION & RESULTS (continued)
The reason is that these features are mostly based upon the basic structure of the character rather than just the font style, which was the problem in the first case. Hence, being scaling-invariant and design-invariant, we were able to detect the characters according to the basic structure they should have.

My approach: The chart below represents the Euclidean distances from each test character (rows a-u) to each alphabet in the training set (columns A-U). In the original chart the blocks marked in green are the distances for correct detection and red represents the second closest.

       A      B      C      D      E      K      L      M      N      O      P      R      S      T      U
a    3.87   3.61   1.73   3.32   4.58  10.77   5.10   7.42   6.00   3.00   2.65   5.00   3.74  40.12   6.56
b    4.47   3.46   3.74   2.83   4.00   9.95   4.80   5.83   4.58   3.16   1.41   3.46   3.32   2.83   5.66
c    4.58   5.20   2.65   4.36   5.92  11.14   5.66   8.43   7.07   4.58   3.32   6.40   5.29   4.58   7.14
d    4.47   3.74   3.46   2.45   3.74   9.11   3.87   6.16   5.00   2.83   1.41   4.00   3.32   2.00   4.69
e    4.12   4.58   4.80   4.12   5.00   9.17   4.47   5.00   4.24   4.36   3.32   3.87   3.74   3.32   6.40
k    4.36   4.36   2.24   3.61   4.58  10.20   4.69   7.81   6.48   3.32   3.00   5.57   4.00   3.87   5.74
l    3.74   4.24   2.83   3.16   4.69   9.64   4.12   6.78   5.57   3.46   2.00   4.90   3.61   2.83   5.48
m    3.74   4.24   2.83   3.16   4.69   9.64   4.12   6.78   5.57   3.46   2.00   4.90   3.61   2.83   5.48
n    5.20   5.00   3.87   4.12   4.12   9.70   4.90   7.68   6.63   3.87   3.32   5.57   4.69   3.61   5.74
o    4.12   4.12   2.24   3.32   4.58  10.10   4.69   7.68   6.48   3.00   2.65   5.39   3.74   3.61   5.57
p    4.00   4.69   2.00   3.74   5.48  10.91   5.20   8.12   6.71   4.00   2.45   6.00   4.80   4.00   6.78
r    4.24   3.74   3.16   2.83   3.74   9.95   4.58   6.32   5.00   3.16   0.00   4.00   3.61   2.45   5.66
s    4.24   3.46   2.83   2.83   3.46   9.95   4.58   6.32   4.80   2.83   1.41   4.00   3.32   2.83   5.66
t    3.00   3.87   3.00   2.65   3.87   9.27   3.74   6.25   4.90   3.61   1.73   4.80   4.00   2.24   5.92
u    4.58   4.80   3.00   3.32   4.80   9.38   4.00   7.68   6.48   3.61   2.65   5.74   4.24   3.00   4.80



The chart below shows the results of taking the error window into consideration; each row lists the same fifteen distances as the previous chart, rearranged in ascending order (nearest match first):

      1st    2nd    3rd    4th    5th    6th    7th    8th    9th   10th   11th   12th   13th   14th   15th
a    1.73   2.65   3.00   3.32   3.61   3.74   3.87   4.58   5.00   5.10   6.00   6.56   7.42  10.77  40.12
b    1.41   2.83   2.83   3.16   3.32   3.46   3.46   3.74   4.00   4.47   4.58   4.80   5.66   5.83   9.95
c    2.65   3.32   4.36   4.58   4.58   4.58   5.20   5.29   5.66   5.92   6.40   7.07   7.14   8.43  11.14
d    1.41   2.00   2.45   2.83   3.32   3.46   3.74   3.74   3.87   4.00   4.47   4.69   5.00   6.16   9.11
e    3.32   3.32   3.74   3.87   4.12   4.12   4.24   4.36   4.47   4.58   4.80   5.00   5.00   6.40   9.17
k    2.24   3.00   3.32   3.61   3.87   4.00   4.36   4.36   4.58   4.69   5.57   5.74   6.48   7.81  10.20
l    2.00   2.83   2.83   3.16   3.46   3.61   3.74   4.12   4.24   4.69   4.90   5.48   5.57   6.78   9.64
m    2.00   2.83   2.83   3.16   3.46   3.61   3.74   4.12   4.24   4.69   4.90   5.48   5.57   6.78   9.64
n    3.32   3.61   3.87   3.87   4.12   4.12   4.69   4.90   5.00   5.20   5.57   5.74   6.63   7.68   9.70
o    2.24   2.65   3.00   3.32   3.61   3.74   4.12   4.12   4.58   4.69   5.39   5.57   6.48   7.68  10.10
p    2.00   2.45   3.74   4.00   4.00   4.00   4.69   4.80   5.20   5.48   6.00   6.71   6.78   8.12  10.91
r    0.00   2.45   2.83   3.16   3.16   3.61   3.74   3.74   4.00   4.24   4.58   5.00   5.66   6.32   9.95
s    1.41   2.83   2.83   2.83   2.83   3.32   3.46   3.46   4.00   4.24   4.58   4.80   5.66   6.32   9.95
t    1.73   2.24   2.65   3.00   3.00   3.61   3.74   3.87   3.87   4.00   4.80   4.90   5.92   6.25   9.27
u    2.65   3.00   3.00   3.32   3.61   4.00   4.24   4.58   4.80   4.80   4.80   5.74   6.48   7.68   9.38
The chart shows that using this approach we were able to detect B, C, D, O, P, T and S from the test set. We can also detect K and L if we increase our error window. However, we cannot find a correct algorithm that takes a decision on the basis of Euclidean distances alone; our second check should take each feature into consideration separately. We find that this is not a very efficient way to detect alphabets when we take only the end points and line numbers into consideration. The reason for the error: we see that thinning has a considerable effect on the detection of the parameters. As shown above, the thinning effects in training.raw are not as drastic as in test.raw.



Unfortunately our algorithm runs only on thinned images and hence is very dependent on font style. Since in our test image O, P, R, S, T and U have very straight strokes, it is easier to detect them. I also realize there might be some small error in my programming somewhere, which is why it shows an error in the detection of L and M, but I could not find any such error in my code.

Part 2 - Euler Numbers

EULER NUMBERS Test.raw

Training.raw

Character indices: A=0, B=1, C=2, D=3, E=4, K=5, L=6, M=7, N=8, O=9, P=10, R=11, S=12, T=13, U=14.
Wrong detection of the Euler number occurs for the alphabets K, L and M, as compared with our matching set in training.raw; we treat these as wrong detections because we consider the results for the training set as the standard.
Detection of B: we simply match the Euler number of the input character against -1. This way we can detect B.

Check for vertical symmetry: A & O.
A has 2 end points -> correct decision of A (a font-dependent decision).
O has 0 end points -> correct decision of O.



We are not using the circularity property for O because the circularity of O in the training set was not 1, so it was not a good parameter to judge by; however, the area and perimeter ratios are still close. The reason the circularity is not 1 is that the O given in the training set is elongated and not a perfect circle.
Check for horizontal symmetry: D & O -> correct decision of D.
Check the uppermass/lowermass ratio: P < R.
Check the leftmass/rightmass ratio: P < R.
P and R are detected on the basis of these ratios. This method is invariant to font size and style and hence detects these alphabets correctly here.

Check for vertical symmetry: options S, T, U. S is included here because I calculate symmetry on the basis of the left and right areas, and because of its structure the two areas happen to be the same, making it symmetric.
End-point count check: T has 3 end points -> detected successfully. Remaining cases: S, U.
Uppermass/lowermass and rightmass/leftmass ratios: S has the same UM/LM ratio but U has a smaller UM/LM ratio -> U and S are detected.
Check for horizontal symmetry: C, E (the rest are already detected).
End-point check: C has 2 diagonal end points and E has 3 left end points -> C and E are detected properly.



Logically, M, K and L should also be detected here, but since the detection of the Euler number for M is faulty in our test set, this is reflected in the decision making for M.

Characters left: K, L, M.
We check for vertical symmetry: M. M could be detected as any of A, M, O, N, S, T and U. M has 2 bottom end points in the training set but 4 in the test set, so we also check for horizontal and vertical lines.

From the above set, only M has one left-diagonal and one right-diagonal line, so M matches and is detected properly.
We check for horizontal symmetry: K; in addition, K has 4 end points (an easy decision).
L has one top end point and one left end point, and no symmetry, so the possible choices are P, R and L. We check the uppermass/lowermass ratio, which only matches L; hence L is detected.

Hence, all the alphabets were detected in spite of the first check failing. We conclude that the second approach was much more efficient in this case, for the reasons mentioned in the discussion above.

PART 2

DETECTION OF NUMBERS (0-9)
DECISION TREE FOR THE TRAINING DATA:



When we take a decision, we prefer parameters that do not change on scaling or rotation; I use the area or perimeter rarely and keep their use minimal in my program.
1) Check whether the Euler number is 1, 0 or -1. We use the Euler number first, which gives three groups:
E = -1: {8}
E = 0:  {6, 9, 4, 0}
E = 1:  {1, 2, 3, 5, 7, +, -, ., /, *}

2) If the Euler number is -1, there is only one possibility, the number '8', so '8' has been identified!
3) If the Euler number is 0, there are only four possibilities: 6, 9, 4, 0. When we run the training image we see that '0' has the maximum circularity and its upper mass and lower mass are nearly symmetric, so we can set a small threshold on how much the symmetry may deviate; hence '0' has been identified. Next, 6 and 9 can be differentiated by their leftmass/rightmass (L/R) ratio: in the training image as well as the test images, the L/R ratio is largest for 6 and smallest for 9, so these two are identified. Then, with Euler number 0 and the minimum aspect ratio, we get the number 4, because its aspect ratio was the smallest in both the training and the test images. So the numbers 6, 9, 4 and 0 can all be identified.
4) With Euler number 1 we have {1, 2, 3, 5, 7, +, -, ., /, *}. Since there are many characters with Euler number 1, utmost care was taken to differentiate among them. An approximate way to detect '1' is to note that its aspect ratio is large (in fact the largest) and its circularity index is small; in my program I used a threshold value of 1.6, obtained by trial and error. We then come to the next three possibilities: 2, 5 and 7. A crucial feature that characterizes these three is the upper mass and lower mass of the bounding box. The upper and lower mass for '2' do not differ by much, and their sum is the biggest of all, so with a small threshold '2' can be detected. The upper and lower mass for '7' differ by a large value, so a large threshold detects '7'; another feature that differentiates 7 is the R/L ratio, which comes out to approximately 2, so a threshold from 1.9 to 2.2 isolates 7. If the above two conditions do not hold, it can mean only one thing: the number '5'. So we have differentiated 2, 5 and 7!



Sometimes 7 and 1 can be misjudged. Another suggested approach to distinguish 7 and 1 (not implemented): take a row-wise histogram of the symbol over its entire height; if it remains almost constant but with a big peak at the beginning, it is a 7, otherwise it is a 1. But this is a difficult and tedious procedure. Detecting a '3' was quite difficult because I could not find a single parameter that differentiates it from the others, so I had to think of another way: the difference between the mass of the pixels in the right portion of the bounding box and the mass in the left portion is always large for '3', so we can choose a certain threshold to detect it.
5) Now comes differentiating the symbols; there are five of them. The dot '.' can be found uniquely: its central moments, moments, area and perimeter are the smallest, so it is easily detected. Next, among the remaining symbols only '-' and '+' exhibit symmetry; once they are found to be symmetric, one way of separating '+' from '-' is that the normalized area of '+' is always greater than that of '-', a condition which holds universally. (What do I mean by symmetry? For plus, leftmass = rightmass and uppermass = lowermass; for minus, only leftmass = rightmass.) So they can be detected. Then we are left with '*' and '/': the central moments are approximately the same in the test and training images for '/', so we can detect '/'. For '*', leftmass = rightmass while the upper mass is greater than the lower mass; though the properties of '-' and '*' are likely to be similar, we emphasize that '*' is not a fully symmetric pattern. Thus, based on the various parameters and characteristics, all 15 patterns were uniquely identified and isolated, and this was tested using the test patterns below.
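A hedged sketch of this decision tree is given below. The branching structure follows the text, but every numeric threshold is a placeholder chosen for illustration only; the report obtains its thresholds from the training image by trial and error, and it distinguishes '/' by central moments, which is omitted here.

#include <string>

struct Features {
    int    euler;          // Euler number: -1, 0 or 1
    double circularity;    // 4*pi*area / perimeter^2
    double aspectRatio;    // height / width of the bounding box
    double upOverLow;      // upper mass / lower mass
    double leftOverRight;  // left mass / right mass
    double normArea;       // area normalized by the bounding-box area
};

std::string classify(const Features &f)
{
    if (f.euler == -1) return "8";

    if (f.euler == 0) {                                      // {6, 9, 4, 0}
        if (f.circularity > 0.6 &&
            f.upOverLow > 0.9 && f.upOverLow < 1.1) return "0";  // most circular, nearly symmetric
        if (f.aspectRatio < 1.1) return "4";                 // smallest aspect ratio in the group
        return (f.leftOverRight > 1.0) ? "6" : "9";          // L/R largest for 6, smallest for 9
    }

    // Euler number 1: {1, 2, 3, 5, 7, +, -, ., /, *}
    if (f.normArea < 0.02) return ".";                       // dot: smallest area and moments
    if (f.aspectRatio > 1.6 && f.circularity < 0.3) return "1";
    bool lrSym = f.leftOverRight > 0.9 && f.leftOverRight < 1.1;
    bool udSym = f.upOverLow    > 0.9 && f.upOverLow    < 1.1;
    if (lrSym && udSym) return "+";                          // symmetric in both directions
    if (lrSym && f.upOverLow > 1.2) return "*";              // L/R symmetric, top-heavy
    if (lrSym) return "-";                                   // L/R symmetric only
    if (f.upOverLow > 1.9 && f.upOverLow < 2.2) return "7";
    if (udSym) return "2";                                   // upper and lower mass nearly equal
    if (f.leftOverRight < 0.7) return "3";                   // right portion much heavier than left
    return "5";                                              // remaining case
}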

DISCUSSION OF RESULTS:
Training image used to train my program.



Word1.raw

Output:
The number/character is 9
The number/character is 6
The number/character is 1
The number/character is .
The number/character is 7

My program was able to recognize the characters 7, 1, ., 6 and 9, so my threshold settings for these numbers are correct and we get the desired output.

Word2.raw

Output:
The number/character is 4
The number/character is 2
The number/character is *
The number/character is /

My program was not able to recognize '7' in word2, although it detected the other characters, so my output was 4, 2, *, /.

METHODS TO OVERCOME THE PROBLEM: My algorithm to detect the number 7 uses the uppermass/lowermass ratio, as mentioned above. The ratio is approximately 2, but in the above image we can see that '7' is slightly slanted and has a different structure than in the training image; in fact the uppermass/lowermass ratio was nearly equal in this test image, so my program was not able to detect it. This is why I was able to detect '7' in word1.raw but could not detect it in word2.raw. One method to overcome this problem is to manually work out all possible writing styles for the digit '7'; once all the styles are collected, all the possible parameters are analyzed, a suitable threshold is obtained, and it is fed into the program. If we then run the program, the chance of detecting that number is higher than with the previous method, where the program is trained on only one training image. What I mean is that many training images should be fed to the program before the test images are given to it; then the program is less likely to make mistakes in detection.



I have attached how and why ‘7’ was not detected by my program!


