Text String Extraction from Scene Image Based on Morphological Features

Chapter: 1 Introduction

1.1 Introduction

As the world becomes increasingly digital, extracting text from images grows more and more important, because text present in images carries useful information for automatic annotation, indexing and structuring of images. Furthermore, text printed on magazine covers, signs, indicators, billboards etc. is usually mixed with photos and designs [1]. Such text in scene images may carry much information, so text strings need to be separated from the scene image. A text information extraction system has numerous applications, including document analysis, vehicle license plate extraction, technical paper analysis, and object-oriented data compression. In the following, we briefly describe some of these applications.
Using an Automatic Business Card Reader (BCR), the necessary contact information can be populated directly into the phonebook from business cards. Although such applications are commercially available in some mobile handsets, the accuracy is not yet good enough to be really useful in practice. It is observed that graphic backgrounds are commonly found in most business card images. In order to recognize the text information on the card, the text and background contents must be separated [2].
Text printed on magazine covers always mixes with photos and designs. This kind of text in scene images may carry much information, and automatic recognition can be useful for visually impaired persons and foreign travelers. For example, automatic recognition of signs and indicators can help blind people move about the streets freely. Also, recognition of magazine covers can help a library quickly load the information into a database, improving the efficiency of classification [1].
Text extraction is also used in OCR (Optical Character Recognition). OCR software extracts text from an image and converts it into an editable text document.
It is also used for form processing, map interpretation, bank cheque processing, postal address sorting and engineering drawing interpretation.
Our main objective is to extract text from scene images. In this paper, we discuss a novel approach for detecting and extracting text from scene images based on morphological features. A text extraction system involves detection and extraction of the text from a given image. However, variations of text due to differences in size, style, orientation and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging.

1.2 Previous works and overview of our approach

Several types of technologies have been developed to extract text from scene images. Nirmala Shivananda and P. Nagabhushan [1] proposed a hybrid method for separating text from color document images, but this method cannot extract text from complex graphics. Partha Pratim Roy, Josep Lladós and Umapada Pal [3] proposed a method for separating text from color maps based on connected component analysis and grouping of characters in a string. This approach can detect characters connected to graphics and separate them, but some characters cannot be separated through connected component analysis alone. To gain better results we use Lixu Gu's approach [4], which is based on mathematical morphology. We also focus on the limitations of Gu's approach. This approach includes two distinct stages: primary processing and extraction processing. In primary processing, the input image, which can be a color or gray scale image, is passed through a new shape decomposition filter based on morphological recursive opening and closing. The filter decomposes the input image into several subimages based on the size of the characters. Extraction processing includes three steps: feature emphasis, character extraction and noise reduction. In the feature emphasis step, a new morphological filter is used to emphasize character features in the subimages and remove most of the noise. Characters are then extracted from the subimages by a histogram method in the character extraction step. Lastly, a morphological filter based on closing is applied for noise reduction.

The remainder of this paper is organized as follows: Chapter 2 provides image processing basics. Morphological concepts for image processing are discussed in Chapter 3. In Chapter 4 we describe our method. Experimental results with scene images and discussions are presented in Chapter 5. Finally, the conclusions are drawn in Chapter 6.

Chapter: 2 Image Processing Basics

Digital image processing is an important area of computer science. As vision is the most advanced of our senses, it is not surprising that images play the single most important role in human perception. Thus, before starting our discussion we have to become familiar with image processing basics. In this chapter we describe the very basics of digital image processing.

2.1 Digital Image Processing

Any 2D mathematical function that bears information can be represented as an image. A digital image is an array of real or complex numbers represented by a finite number of elements. These elements are referred to as picture elements, image elements, pels, or pixels; pixel is the term used to denote the elements of a digital image. Digital image processing generally refers to the processing of a 2D picture by a digital computer. An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or
gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image [5]. Digital image processing is the use of computer algorithms to perform image processing on digital images [6]. Many types of remote sensing images are routinely recorded in digital form and then processed by computers to produce images for interpreters to study. The simplest form of digital image processing employs a microprocessor that converts the digital data tape into a film image with minimal corrections and calibrations. At the other extreme, large mainframe computers are employed for sophisticated interactive manipulation of the data to produce images in which specific information has been extracted and highlighted [7]. An image given in the form of a transparency, slide, photograph, or chart is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory which refreshes the monitor at 30 frames/s to produce a visibly continuous display. Mini- or microcomputers are used to communicate with and control the digitization, storage, processing, and display operations; commands are given through a terminal, and the outputs are available on a terminal, television monitor, or a printer/plotter.

2.2 Digital Image Representation
“Virtual image, a point or system of points, on one side of a mirror or lens, which, if it existed, would emit the system of rays which actually exists on the other side of the mirror or lens.” --Clerk Maxwell

We need a coordinate system to describe an image. The coordinate system used to place elements in relation to each other is called user space, since these are the coordinates the user uses to define elements and position them in relation to each other [8].
Figure 2.2 (a): Coordinate system (origin (0,0) at the top left, x across, y down)

The representation of a digitized image function in this coordinate system is given below:
f(x,y) =

    | f(0,0)     f(0,1)     ...   f(0,N-1)   |
    | f(1,0)     f(1,1)     ...   f(1,N-1)   |
    |   :          :                  :      |
    | f(M-1,0)   f(M-1,1)   ...   f(M-1,N-1) |
The right side of this equation is a digital image by definition, where the image has M rows and N columns. That is, the image is of size M x N [5].
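As a small illustration, the following MATLAB sketch (the file name is a hypothetical example) reads an image and inspects its M x N dimensions and an individual pixel value:

f = imread('scene.png');   % hypothetical file name; f is an M x N array of gray levels
[M, N] = size(f);          % number of rows and columns
p = f(1, 1);               % intensity f(0,0) in the notation above (MATLAB indexes from 1)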
2.3 Image Properties

In addition to the pixel data, images occasionally have many other kinds of data associated with them. These data, known as properties, form a simple database of arbitrary data attached to the images. Each property is simply an object with a unique, case-insensitive name [9].

2.3.1 Color
“Color is an important attribute for image matching and retrieval (see Niblack et al., 1993).” Color is the most widely used attribute in image retrieval and object recognition. Humans seem not to be as affected by small variations in color as by variations in gray level values. A color image is usually stored in memory as a raster map, a two-dimensional array of small integer triplets, or (rarely) as three separate raster maps, one for each channel. Eight bits per sample (24 bits per pixel) seem to be adequate for most uses. A (digital) color image is a digital image that includes color information for each pixel.
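A minimal sketch of this storage scheme, with a hypothetical file name: the image loads as an M x N x 3 array of 8-bit samples, from which individual channels and triplets can be read.

f = imread('cover.jpg');      % hypothetical 24-bit color image: M x N x 3 uint8 raster map
red   = f(:, :, 1);           % each channel is one 8-bit sample per pixel
green = f(:, :, 2);
blue  = f(:, :, 3);
t = squeeze(f(10, 20, :))     % the integer triplet stored for the pixel at row 10, column 20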
2.3.2 Texture
Small surface structure, natural or artificial, regular or irregular, is called texture. Statistical texture analysis describes texture as a whole based on specific attributes: regularity, coarseness, orientation, contrast etc. Examples of texture: wood bark, knitting patterns etc.
Figure 2.3.2(a): Example of texture.

2.4 Applications of Image Processing

Digital image processing has a broad spectrum of applications, such as remote sensing via satellites and other spacecraft, image transmission and storage for business applications, medical imaging, radar, sonar and acoustic image processing, robotics, and automated inspection of industrial parts.

(a) In Satellite Imaging. Images acquired by satellites are useful in tracking of earth resources, geographical mapping, prediction of agricultural crops, urban growth and weather, flood and fire control, and many other environmental applications.

(b) In Robot Vision. Indoor and outdoor robot navigation, automatic inspection of industrial processes, pattern recognition etc.

(c) Space Image Applications. These include recognition and analysis of objects contained in images obtained from deep space-probe missions.

(d) Image Transmission and Storage. Such applications occur in broadcast television, teleconferencing, transmission of facsimile images (printed documents and graphics) for office automation, communication over computer networks, closed-circuit television based security monitoring systems, and military communications.

(e) In Medical Applications. Here one is concerned with processing of chest X-rays, cineangiograms, projection images of transaxial tomography and other medical images that occur in radiology, nuclear magnetic resonance (NMR) and ultrasonic scanning. These images may be used for patient screening and monitoring or for detection of tumors or other disease in patients.

(f) Radar and Sonar Image Applications. Radar and sonar images are used for detection and recognition of various types of targets or in guidance and maneuvering of aircraft or missile systems.

(g) Other Applications. These range from robot vision for industrial automation to image synthesis for cartoon making or fashion design.

2.5 Problems with Digital Image Processing
There are some problems associated with digital image processing. These are given below:

(a) Image Compression. A modern trend in image storage and transmission is to use digital techniques. Digitizing a television signal results in roughly 100 megabits per second. But channel bandwidth is expensive, so for applications such as teleconferencing one wants to use a channel of 64 kilobits per second. For other applications such as videophone and mobile videophone, even lower channel bandwidths (e.g., 9 kilobits per second) are desirable [10].

(b) Image Enhancement. In enhancement, one aims to process images to improve their quality. An image may be of poor quality because its contrast is low, or it is noisy, or it is blurred, etc. Many algorithms have been devised to remove these degradations. The difficult problem is how to remove degradations without hurting the signal. For example, noise-reduction algorithms typically involve local averaging or smoothing which, unfortunately, will blur the edges in the image. Adaptive methods have been investigated, e.g., smoothing less near the edges. However, they are generally effective only if the degradation is slight [10].

(c) Image Recognition. Typically, a recognition system needs to classify an unknown input pattern into one of a set of pre-specified classes. The task is fairly easy if the number of classes is small and if all members in the same class are almost exactly the same. However, the problem can become very difficult if the number of classes is very large or if members in the same class can look very different [10].

(d) Image Representation and Modeling. An image could represent the luminances of objects in a scene (an image taken by a camera), the absorption characteristics of body tissue or material particles (X-ray imaging), the radar cross-section of a target (radar imaging), the temperature profile of a region (infrared imaging), or the gravitational field in an area (geophysical imaging). An important consideration in image representation is the fidelity or intelligibility criterion for measuring the quality of an image or the performance of a processing technique. Specification of such measures requires models of perception of contrast, spatial frequencies, color and so on. Knowledge of a fidelity criterion helps in designing the imaging sensor, because it tells us the variables that should be measured most accurately. A classical method of signal representation is an orthogonal series expansion, such as the Fourier series. For images, an analogous representation is possible via two-dimensional orthogonal functions called basis images. For sampled images the basis images can be determined from unitary matrices called image transforms. Any given image can be expressed as a weighted sum of the basis images. Several characteristics of images, such as their spatial frequency content, bandwidth, power spectrum, and application in filter design, feature extraction, and so on, can be studied via such expansions.
Statistical models describe an image as a member of an ensemble, often characterized by its mean and covariance functions. This permits development of algorithms that are useful for an entire class or ensemble of images rather than for a single image. In global modeling, an image is considered as a composition of several objects. Various objects in the scene are detected (for example, by segmentation techniques), and the model gives the rules for defining the relationships among the various objects. Such representations fall under the category of image understanding models.

(e) Image Analysis. Image analysis is the extraction of meaningful information from images, mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar-coded tags or as sophisticated as identifying a person from their face [11]. Computers are indispensable for the analysis of large amounts of data, for tasks that require complex computation, or for the extraction of quantitative information. On the other hand, the human visual cortex is an excellent image analysis apparatus, especially for extracting higher-level information, and for many applications, including medicine, security, and remote sensing, human analysts still cannot be replaced by computers.

Chapter: 3 Morphological Image Processing

The word “morphology” commonly denotes a branch of biology that deals with the form and structure of animals and plants. In digital image processing, mathematical morphology is considered a tool for extracting image components that are useful in the representation and description of region shape, such as boundaries and the convex hull. For our experiment, we use some fundamental morphological operations such as dilation, erosion, opening, closing etc. [5]. In this chapter we present some preliminaries on text string extraction tools.

3.1 Mathematical Morphology

Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions. MM is most commonly applied to digital images, but it can be employed as well on graphs, surface meshes, solids, and many other spatial structures. It aims at analyzing the shapes and forms of objects [12]. MM was initiated by G. Matheron and J. Serra at the Paris School of Mines. It was originally developed for binary images, and was later extended to grayscale functions and images. The subsequent generalization to complete lattices is widely accepted today as MM's theoretical foundation [12]. Topological and geometrical continuous-space concepts such as size, shape, convexity, connectivity, and geodesic distance can be characterized by MM on both continuous and discrete spaces. MM is also the foundation of morphological image processing, which
consists of a set of operators that transform images according to the above characterizations [12].
Figure 3.1(a): Image after segmentation
Figure 3.1(b): Image after segmentation and mathematical morphology
3.2 Use of Mathematical Morphology

Mathematical morphology is used in the following applications:

Image enhancement
Image segmentation
Image restoration
Edge and feature detection
Texture analysis
Feature generation
Skeletonization
Shape analysis
Image compression
Component analysis
Curve filling
Noise reduction
General thinning
3.3 Structure Element In mathematical morphology, a structuring element (s.e.) is a shape, used to probe or interact with a given image, with the purpose of drawing conclusions on how this shape fits or misses the shapes in the image. It is typically used in morphological operations, such as dilation, erosion, opening, and closing, as well as the hit-or-miss transform. It is also called kernel.
According to Georges Matheron, knowledge about an object (e.g., an image) depends on the manner in which we probe (observe) it [13]. In particular, the choice of a certain s.e. for a particular morphological operation influences the information one can obtain. There are two main characteristics that are directly related to s.e.s:

• Shape. For example, the s.e. can be a “ball” or a line; convex or a ring, etc. By choosing a particular s.e., one sets a way of differentiating some objects (or parts of objects) from others, according to their shape or spatial orientation.

• Size. For example, one s.e. can be a 3 x 3 square or a 21 x 21 square. Setting the size of the structuring element is similar to setting the observation scale, and setting the criterion to differentiate image objects or features according to size.
Figure 3.3(a): Structuring elements with their origins (the example grids, labeled (i)-(iv), did not survive reproduction).

IPT (Image Processing Toolbox) function strel constructs structuring elements with a variety of shapes and sizes. Its basic syntax is

se = strel(shape, parameters)

where shape is a string specifying the desired shape, and parameters is a list of parameters that specify information about the shape, such as its size. For example, strel('diamond', 5) returns a diamond-shaped structuring element that extends ±5 pixels along the horizontal and vertical axes [5].
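For instance, the following sketch (sizes chosen arbitrarily for illustration) constructs the two shapes used later in this work:

se_disk   = strel('disk', 3);    % disk-shaped s.e. of radius 3, origin at the center
se_square = strel('square', 5);  % 5 x 5 square s.e.
getnhood(se_square)              % view the neighborhood as a matrix of 0s and 1s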
Two standard-shaped structuring elements, DISK and SQUARE, are selected in our method. They are described as follows:

1. A DISK structuring element with its origin at the center and radius i is denoted by riBDisk.
2. A SQUARE structuring element with side length 2i is denoted by riBSquare [4].

3.4 Morphological Operations

Morphology is a technique of image processing based on shapes. The value of each pixel in the output image is based on a comparison of the corresponding pixel in the input image with its neighbors. By choosing the size and shape of the neighborhood, we can construct a morphological operation that is sensitive to specific shapes in the input image [14]. Morphological operations have been widely used to process binary and grayscale images, with morphological techniques being applied to noise reduction, image enhancement, and feature detection. Morphological operations modify the shape of a component or give information about the shape of a component. Mathematical morphology includes four basic operations:
1. Dilation (expand a component)
2. Erosion (shrink a component)
3. Opening
4. Closing

3.4.1 Dilation

Dilation is also called Minkowski addition. It is an operation that “grows” or “thickens” objects in a binary image. The specific manner and extent of this thickening is controlled by a shape referred to as a structuring element. This process can only turn pixels from OFF to ON.
Dilation is used to increase the area of a component. It adds pixels around the boundaries and fills interior holes. The value of the output pixel is the maximum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the neighborhood pixels is set to the value 1, the output pixel is set to 1 [5]. An image is processed by applying a structuring element:

• Center the structuring element S on pixel P.
• If P is OFF then set it to ON if any part of S overlaps an ON image pixel.

Given a binary image f and structuring element s, the dilated image g can be described as:

g = f ⊕ s

g(x, y) = OR[k = -m/2..m/2, l = -n/2..n/2] f(x-k, y-l) s(k,l)

that is,

g(x, y) = 1 if s hits f, and 0 otherwise.

Properties of morphological dilation are given below:

1. Commutative: D(A,B) = A ⊕ B = B ⊕ A = D(B,A)
2. Associative: A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
3. Translation invariance: A ⊕ (B + x) = (A ⊕ B) + x
4. Decomposition: A ⊕ (B ∪ C) = (A ⊕ B) ∪ (A ⊕ C)
5. Multi-dilations: nB = B ⊕ B ⊕ … ⊕ B (n times)
Figure 3.4.1(a): Square (3 x 3, all 1s) structuring element
Figure 3.4.1(b): Original image (a small binary grid; the values did not survive reproduction)
Figure 3.4.1(c): Image after dilation (binary grid; values lost in reproduction)

3.4.2 Erosion
Erosion is also called Minkowski subtraction. It shrinks or thins objects in a binary image. The manner or extent of shrinking is controlled by the structuring element. It is used to decrease the area of a component. It removes pixels around the boundaries and enlarges interior holes. The value of the output pixel is the minimum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the neighborhood pixels is set to 0, the output pixel is set to 0. An image is processed by applying a structuring element:

• Center the structuring element S on pixel P.
• If P is ON then set it to OFF if any part of S overlaps an OFF image pixel [5].
Given a binary image f and structuring element s, the eroded image g can be described as:

g = f Ө s

g(x, y) = AND[k = -m/2..m/2, l = -n/2..n/2] f(x-k, y-l) s(k,l)

that is,

g(x, y) = 1 if s fits f, and 0 otherwise.

Properties of morphological erosion are given below:

1. Non-commutative: E(A,B) ≠ E(B,A)
2. Non-inverses: D(E(A,B),B) ≠ A ≠ E(D(A,B),B)
3. Translation invariance: A Ө (B + x) = (A Ө B) + x
4. Decomposition: A Ө (B ∪ C) = (A Ө B) ∩ (A Ө C)

Figure 3.4.2(a): Square (3 x 3, all 1s) structuring element
Figure 3.4.2(b): Original image (binary grid; values lost in reproduction)
Figure 3.4.2(c): Image after erosion (binary grid; values lost in reproduction)
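Since the worked grid examples in Figures 3.4.1(b)-(c) and 3.4.2(b)-(c) did not survive reproduction, the following minimal MATLAB sketch illustrates both operations on a small made-up binary image:

% Dilation and erosion of a small binary image by a 3 x 3 square s.e.
A = [0 0 0 0 0;
     0 0 1 0 0;
     0 1 1 1 0;
     0 0 1 0 0;
     0 0 0 0 0];
se = strel('square', 3);
D = imdilate(A, se);   % ON pixels grow: every pixel touching the cross becomes 1
E = imerode(A, se);    % ON pixels shrink: the thin cross is removed entirely
disp(D); disp(E);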
3.4.3 Opening
The open operation is defined as erosion followed by dilation. It:

• Smoothens boundaries
• Enlarges narrow gaps
• Eliminates “spikes”

Usually it can remove regions which are smaller than the structuring element. The morphological opening of A by B, denoted A º B, is simply erosion of A by B, followed by dilation of the result by B:

A º B = (A Ө B) ⊕ B

Properties of morphological opening are given below:

1. Translation: O(A + x, B) = O(A,B) + x
2. Idempotence: (A º B) º B = A º B
Opening is implemented in the IPT with function imopen. Its syntax is:

C = imopen(A, B)
where A is the binary image and B is a matrix of 0s and 1s that specifies the structuring element [5].

3.4.4 Closing
The close operation is defined as dilation followed by erosion. It:

• Fills narrow gaps
• Eliminates small holes and breaks
Usually it can fill the small holes which are smaller than the structuring element. The morphological closing of A by B, denoted A • B, is simply dilation of A by B, followed by erosion of the result by B:

A • B = (A ⊕ B) Ө B
Properties of morphological closing are given below:

1. Translation: C(A + x, B) = C(A,B) + x
2. Idempotence: (A • B) • B = A • B
Closing is implemented in the toolbox with function imclose. Its syntax is:

C = imclose(A, B)

where A is the binary image and B is a matrix of 0s and 1s that specifies the structuring element [5].
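The following sketch, using a hypothetical binary input file, shows the typical use of both operators and verifies the idempotence property listed above:

% Opening removes specks smaller than the s.e.; closing fills small holes.
A = imread('text_binary.png') > 0;   % hypothetical binary input image
se = strel('square', 3);
Ao = imopen(A, se);                  % (A erode B) dilate B: small specks removed
Ac = imclose(A, se);                 % (A dilate B) erode B: small holes filled
isequal(imopen(Ao, se), Ao)          % idempotence: opening twice changes nothing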
3.5 Thresholding

Segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties. Or, similarly, we identify contours by identifying differences between regions (edges).

The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding, the separation of light and dark regions. Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T, then

g(x, y) = 1 if f(x, y) ≥ T, and 0 otherwise.

The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region (especially near the boundaries of the region). These effects get worse as the noise gets worse, simply because it is more likely that a pixel's intensity does not represent the normal intensity in the region. When we use thresholding, we typically have to play with it, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. Another problem with global thresholding is that changes in illumination across the scene may cause some parts to be brighter (in the light) and some parts darker (in shadow) in ways that have nothing to do with the objects in the image. We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image. To set a global threshold or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes: one for the foreground and one for the background. Recall that a histogram is a probability distribution:

p(g) = ng / n

that is, the number of pixels ng having greyscale intensity g, as a fraction of the total number of pixels n [14].
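A minimal sketch of global thresholding follows (the file name is hypothetical, and Otsu's method via graythresh is used here simply as one common way to choose T):

f = imread('scene_gray.png');        % hypothetical grayscale input
p = imhist(f) / numel(f);            % histogram as a probability distribution p(g)
T = graythresh(f);                   % Otsu's method: a global threshold in [0,1]
g = im2bw(f, T);                     % g(x,y) = 1 where the normalized intensity exceeds T
imshow(g);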
3.6 Finding Peaks and Valleys

One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it:

1. The histogram may be noisy, thus causing many local minima and maxima. To get around this, the histogram is usually smoothed before trying to find separate modes.
2. The sum of two separate distributions, each with their own mode, may not produce a distribution with two distinct modes [14].

Chapter: 4 Our Method

Extraction of text from a scene image is much more difficult than extraction from a simple document image. Many studies have succeeded in extracting a single text string from images, but cannot deal with images including many text strings [15]. Here we describe and implement Gu's algorithm, which uses mathematical morphology to extract text effectively. Mathematical morphology provides the theory and tools to capture geodesic information; hence structure segmentation and shape representation have been widely handled by morphological approaches. A primary morphology-based approach, the top-hat transformation (TT), provides an excellent tool for extracting bright or dark objects from an uneven background. But for many complicated segmentation problems, the TT alone cannot provide satisfactory solutions. When a series of structuring elements with various sizes is employed, gray scale particle segmentation for some complex cases is realized successfully in an iterative manner based on the TT. Here we implement a method derived from the TT idea [16]. A block diagram of the proposed method is given in figure 4(a).
Figure 4(a): Block diagram of the text extraction process. Primary process: the input scene image (color or gray scale) is decomposed into subimages by the shape decomposition filter. Extraction process: a morphological filter emphasizes character features and reduces noise in the subimages; characters are extracted from the subimages by the histogram method; a morphological filter reduces noise in the character regions; the composed subimages form the output image.

4.1 Conception and Notation
Characters in scene images may be characterized in a word: "they float on the background or sink under it with a flat surface". This characteristic underlies the basic idea of our method, whose aim is to separate the characters from the background. The whole process is divided into two distinct stages:

1. Primary processing (shape decomposition filter)
2. Extraction processing
   - Feature emphasis
   - Character extraction
   - Noise reduction

In the first stage, a new shape decomposition filter based on morphological recursive opening and closing is implemented. This filter decomposes a gray scale input image into a series of subimages according to the size of the characters. In the second stage, we first employ a new morphological filter to emphasize the characters' features in the subimages and remove most of the noise; then the characters are extracted directly from the gray scale subimages by the histogram method. Lastly, a morphological image cleaning algorithm based on morphological conditional dilation is introduced to make the extracted character regions distinct from noise. The resulting subimages are composed together to produce the final result in binary [4].

4.2 Morphological Notations

The original grey scale image is decomposed into a series of subimages containing different sizes of characters. A scene image involves a number of components besides characters and is difficult to deal with as a whole, so we decompose it into simpler ones in our first processing stage. The decomposition is implemented by the morphological algorithm below, using a set of particular structuring elements derived from a 3 x 3 region of support in a recursive manner. For a given simplest structuring element B, which may be a disk, a square, a triangle (in the Euclidean space R2), a sphere, or a cube (in the Euclidean space R3), etc., a set of structuring elements Xi is defined by

Xi = riB ……………… (1)

When ri is an integer, the above equation is equivalent to the following relation, if B is bounded and convex:

Xi = B ⊕ B ⊕ … ⊕ B (ri times) ……………… (2)

In our approach, the structuring elements riBSquare and riBDisk are also used. riBSquare can be produced directly in recurrence due to its simplicity:

riBSquare = r1BSquare ⊕ r1BSquare ⊕ … ⊕ r1BSquare (ri times) ……………… (3)

We also use

Xi = | Xo – Xo º riBDisk |B ……………… (4)
This equation implies that the original gray scale image Xo is opened by a disk structuring element with radius i, a subtraction is performed between the original image and the opened image, and the result is thresholded by a fixed value to produce a binary image Xi [4].

4.3 Primary Processing

The decomposition procedure is implemented by the following morphological algorithm, which is applied to the grey scale image and produces binary images:

Xi = | (Xo – Xo º riBDisk) – (Xo – Xo º ri-1BDisk) |B – X′i-1 ……………… (5)

or

Xi = | (Xo • riBDisk – Xo) – (Xo • ri-1BDisk – Xo) |B – X′i-1 ……………… (6)

where X′j = ∪ Xi over 0 < i ≤ j, and X1 = Ø.
Here Xo is the original grey scale image, X′i is the decomposed binary image, and | |B denotes a threshold operation at a defined value. Equation (5) is used to decompose images where characters are brighter than the background; if the characters are darker than the background, equation (6) is applied. In our experiment we use equation (6). The procedure starts with riBDisk, and a series of subimages X′i is produced in a recursive manner. The processing stops when the image Xo º riBDisk or Xo • riBDisk has no characters remaining [4].

4.4 Extraction Processing

Compared to documentary text, recognition of text in scene images is much more challenging. In particular, extraction of text strings is the most difficult work in the whole process, due to the problems listed below:

1. The characters are often mixed with other objects, such as structural bars, company logos and smears.
2. The text may be of any color, and the color of the background may differ only slightly from that of the text.
3. The font style and size of the characters may vary.
4. The lighting conditions in the scene are uneven.

Text has some properties different from the background that make it easy to identify in the scene, and there is an abundance of studies that extract text using these properties [10]. According to the property being used, present work can be classified into four categories:

1. Texts are often composed of equable connected components.
2. A text group is often printed in an equable color.
3. Texts often have high contrast against the background, so clear edges always enclose text regions.
4. Texts often have a distinguishing texture compared to the background.

Extraction processing is divided into three distinct stages:
1. Feature emphasis
2. Character extraction
3. Noise reduction / refinement

4.4.1 Feature Emphasis

In this stage, the decomposed subimages are processed by a morphological filter to remove noise and emphasize the character regions [4]:

Ei = (((X′i º ri-1BDisk) ⊕ ri+1BDisk) º r2iBDisk) x Xo   (i ≤ 10)
Ei = (X′i – X′10) x Xo   (i > 10)   ……………… (7)
4.4.2 Character Extraction

Since character regions are the main component in Ei, they hold the peak values in the histogram. The peak values that are bigger than the average of all peak values are selected, and Ei is thresholded by the selected peak values to extract the characters from it. The extracted characters are in Hi [4].

4.4.3 Refinement

The extracted characters in Hi are broken, and several noises remain. A morphological filter derived from conditional dilation is implemented to refine them:

Ri0 = Hi º ri-1BSquare
Rin = (Ri(n-1) ⊕ r5BDisk) ∩ |Xo|B ……………… (8)
if Rik = Ri(k-1) then stop

Lastly, the subimages Rik are intersected with Xr-1 to obtain the entire resultant image Xr, denoted by

Xr = Xr-1 ∩ Rik ……………… (9)

Equation (9) is repeated for 1 ≤ i ≤ j [11]. In our method we use another type of filter, defined in equation (10), for feature emphasis:

Ei = X′ki • ri-1BDisk ……………… (10)
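To make the flow of equations (6), (8) and (10) concrete, here is a minimal single-iteration sketch; the complete recursive implementation is given in the Appendix. The file name is hypothetical, the thresholds are the Cluster 1 values from Table 5.1(a), and the input is assumed to be gray scale with dark characters:

% One iteration (i = 1) of the pipeline, assuming dark characters (equation (6)).
X0 = imread('card.png');                           % hypothetical gray scale input
se0 = strel('disk', 0);  se1 = strel('disk', 1);
% Equation (6): difference of closing residues, thresholded to a binary subimage.
X1 = im2bw(imsubtract(imsubtract(imclose(X0, se1), X0), ...
                      imsubtract(imclose(X0, se0), X0)), 0.05);
E1 = double(imclose(X1, se0));                     % equation (10): feature emphasis
peak = mean(imhist(E1) / numel(E1));               % average of the histogram peaks
H1 = im2bw(E1, peak);                              % character extraction by histogram
R1 = imdilate(H1, se1) & im2bw(X0, 0.55);          % equation (8): conditional dilation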
Chapter: 5 Experimental Results and Discussions

For implementing our method, we use Matlab 7.1. The implementation code is given in the Appendix. In this chapter we describe our experimental results and discussions.

5.1 Experiments

In our experiment, we divide the test images into three clusters:

1. Cluster 1: images with small text, such as color maps.
2. Cluster 2: images with medium text, such as business cards.
3. Cluster 3: images with large text, such as color cover images.

To evaluate the performance of our method for extracting text from scene images, we used 40 images, all taken from the web. In the experiment, primary processing is conducted using equation (6); the stop condition is set to j ≤ 10, because the size of characters in scene images is no more than this value [4]. The Matlab code given in the Appendix can extract small text from color images such as color maps and business cards where characters are darker than the background. If we enlarge the disk size used in equation (8), the method can extract text of medium size. Again, if we change all the threshold values used in the method, it can extract large text from different color images such as cover images, billboards, business cards etc. Table 5.1(a) shows the changes applied for extracting small, medium and large text from the three clusters.

Table 5.1(a): Differences between the three clusters.

Changing position                       Cluster 1 (small text)   Cluster 2 (medium text)     Cluster 3 (large text)
Threshold value used in equation (6)    0.05                     0.05                        0.03
Threshold value used in equation (8)    0.55                     0.55                        0.65
Disk size used in equation (8)          Always 1                 1, but 6 during recursion   3, but 5 during recursion
In cluster 1, we use 15 test images including color maps, business cards, forms etc., where characters are darker than the background. Among them, three test images and their corresponding results are given below:
Input image
Figure 5.1(a): Original image with 466 x 350 pixels and 72 pixels/inch resolution Output Image
Figure 5.1(b): Output image
Input image
Figure 5.1(c): Original image with 604 x 304 pixels and 72 pixels/inch resolution. Output Image
Figure 5.1 (d): Output image.
Input image
Figure 5.1 (e): Original image with 350 x 350 pixels and 72 pixels/inch resolution. Output Image
Figure 5.1(f): Output image.

In cluster 2, we use 13 test images including business cards, cover images etc. In some images characters are darker than the background, and in the rest they are lighter. Cluster 2 consists of text with medium-sized fonts. Among them, three test images and their corresponding results are given below:

Input image
Figure 5.1 (g): Original image with 495 x 310 pixels and 96 pixels/inch resolution. Output image
Figure 5.1 (h): Output image
Input image
Figure 5.1 (i): Original image with 500 x 323 pixels and 72 pixels/inch resolution.
Figure 5.1(j): Output image
Output image
Input image
Output image
Figure 5.1 (k): Original image with 466 x 350 pixels and 300 pixels/inch resolution.
Figure 5.1(l): Output image.

In cluster 3, we use 12 test images including business cards, cover images, billboards etc. In some images characters are darker than the background, and in the rest they are lighter. Cluster 3 consists of text with large-sized fonts. Among them, three test images and their corresponding results are given below:

Input image
Figure 5.1 (m): Original image with 580 x 350 pixels and 72 pixels/inch resolution. Output image after digital negative
Figure 5.1 (n): Output image Input image
Figure 5.1 (o): Original image with 466 x 350 pixels and 72 pixels/inch resolution. Output image after digital negative
Figure 5.1 (p): Output image Input image
Figure 5.1 (q): Original image with 600 x 393 pixels and 300 pixels/inch resolution. Output image after digital negative
Figure 5.1(r): Output image

For measuring accuracy, the recall rate of characters (RRC) is calculated to evaluate the performance of our algorithm. It is defined as follows:

RRC (recall rate of characters) = (number of extracted characters / number of characters in the images) x 100%

Table 5.1(b): Extraction performance for character strings.

           Total character strings   Extracted character strings   RRC (accuracy)
Cluster 1  85                        73                             85.89%
Cluster 2  57                        53                             92.57%
Cluster 3  33                        31                             93.94%
Total      175                       153                            87.43%
5.2 Discussions

A color image is transformed to its RGB channels, and the grayscale images are processed by the presented approach. Although the approach is also applicable to color images, an improvement of the extraction processing is made to accommodate color characteristics [4]. Gray scale and color cover images, business cards, maps and billboards are used in our experiment to demonstrate the efficiency of our method. We focused on these types of images because they have many variations and are thought to be typical representatives of scene images. Currently, our algorithm is implemented in Matlab 7.1 under Windows XP on a PC with a Pentium D 2.80 GHz and 512 MB of memory. The test dataset contains 40 color images, all from the web; most of them are cover images, business cards, color maps, billboards etc. The average computation time for extracting text from these color images is approximately 30.81 s.

Chapter: 6 Conclusion

In this chapter, we discuss the limitations of our method along with our future works.

6.1 Summary and conclusion

A method of character string extraction from scene images is discussed and implemented. It can detect not only single and multiple text strings but also arbitrarily oriented text. It is based on mathematical morphology and can deal with various cases of scene images. In mathematical morphology, the top-hat transformation (TT) provides an excellent tool for extracting bright or dark objects from a complex background, but for many complicated segmentation problems the TT alone cannot produce good results. For this reason, we discuss this method, which is based on a morphological segmentation algorithm. Here we also used a series of structuring elements with different sizes and modified Gu's morphological filter for better results. The experimental results appear encouraging and demonstrate the efficiency of mathematical morphology for shape analysis and detection. We intend to proceed with our study on character extraction from scene images to improve its accuracy. Finally, a character recognition system on scene images using mathematical morphology is suggested to be constructed.

6.2 Limitations

Using our method, we successfully deal with multiple texts in one image and extract them from a color background. Of course, this method has some limitations:

1. Small noise parts have a similar nature to small characters. This type of noise remains in our resultant images.
2. Texts connected with graphics: the method does not retrieve a string completely when it is connected to graphics.
3. The method can only separate text from a complex background; it cannot recognize characters.

6.3 Future works
In our experiment we can only separate text from a complex color background; we cannot recognize characters. In the future, we will try to recognize characters using vertical and horizontal projections and convert the text image into an editable form.

References

[1] Nirmala Shivananda and P. Nagabhushan, “Separation of Foreground Text from Complex Background in Color Document Images”, IEEE Transactions on Image Processing, vol. 10, pp. 306-309, (2009).
[2] A. F. Mollah, S. Basu, M. Nasipuri and D. K. Basu, “Text/Graphic Separation for Business Card Images for Mobile Devices”, IAPR International Workshop on Graphics Recognition, pp. 263-270, (2009).
[3] Partha Pratim Roy, Josep Lladós and Umapada Pal, “Text/Graphics Separation in Color Maps”, IEEE Transactions on Image Processing, vol. 7, pp. 545-551, (2007).
[4] Lixu Gu, Naoki Tanaka, R. M. Haralick and Toyohisa Kaneko, “The Extraction of Characters from Scene Image Using Mathematical Morphology”, IAPR Workshop on Machine Vision Applications, vol. 2, pp. 12-14, (1996).
[5] Rafael Gonzalez, Richard E. Woods and Steven L. Eddins, “Digital Image Processing Using Matlab”, Prentice Hall, 2nd Edition.
[6] http://en.wikipedia.org/wiki/Digital_image_processing, (Accessed on: November 20, 2010).
[7] http://www.ciesin.org/docs/005-477/005-477.html, (Accessed on: November 22, 2010).
[8] http://pippin.gimp.org/image_processing/chap_dir.html, (Accessed on: November 22, 2010).
[9] http://java.sun.com/products/javamedia/jai/forDevelopers/jai1_0_1guide/unc/Properties.doc.html, (Accessed on: November 22, 2010).
[10] T. S. Huang and K. Aizawa, “Image Processing: Some Challenging Problems”, National Academy of Sciences, (1992).
[11] John C. Russ, “The Image Processing Handbook”, CRC Press, 4th Edition, (2002).
[12] J. Serra, “Image Analysis and Mathematical Morphology”, Academic Press, 2nd Edition, vol. 2, (2008).
[13] Edward R. Dougherty, “An Introduction to Morphological Image Processing”, SPIE Press, vol. TT59, (2003).
[14] Bryan S. Morse, “Lecture 4: Thresholding”, Brigham Young University, (1998-2000).
[15] Yuming Wang and Naoki Tanaka, “Text String Extraction from Scene Image Based on Edge Feature and Morphology”, IEEE Transactions on Image Processing, vol. 10, pp. 323-328, (2008).
[16] Henk J. A. M. Heijmans and Jos B. T. M. Roerdink (Eds.), “Mathematical Morphology and its Applications to Image and Signal Processing”, Springer, 1st Edition, pp. 367-374, (2000).
Tools Used

Language: Matlab
Software: Matlab 7.1

Appendix

This appendix contains the implementation code for our method using Matlab 7.1.
Matlab Code:

clc; clear all; close all;
k = input('Enter the file name :: ', 's');
X0 = imread(k);
% X0 = rgb2gray(X0);           % uncomment when the input is an RGB image
figure; imshow(X0); title('Input image');
[M N] = size(X0)
max = 10;                       % number of decomposition levels (note: shadows the built-in max)

%% Primary Processing: equation (6)
se0 = strel('disk', 0);
se1 = strel('disk', 1);
se2 = strel('disk', 2);
% difference of closing residues, for characters darker than the background
X = imsubtract(imsubtract(imclose(X0,se1),X0), imsubtract(imclose(X0,se0),X0));
% X = imsubtract(imsubtract(X0,imopen(X0,se1)), imsubtract(X0,imopen(X0,se0))); % equation (5)
Z1 = im2bw(X, 0.05);
[M1 N1] = size(Z1)
Y = zeros(M1,N1,max);           % decomposed images
E = zeros(M1,N1,max);           % feature emphasized images
H = zeros(M1,N1,max);           % extracted characters
R = zeros(M1,N1,max);           % refined subimages
Y(:,:,1) = Z1;
Y(:,:,1) = bwmorph(Y(:,:,1), 'clean', inf);   % remove isolated foreground pixels
F = imopen(imdilate(imopen(Y(:,:,1),se0),se2),se2);
% E(:,:,1) = immultiply(F, im2bw(X0,0.003));
E(:,:,1) = F;
hist = imhist(E(:,:,1)) / numel(E(:,:,1));
% finding peaks
peak = mean(hist)
H(:,:,1) = im2bw(E(:,:,1), peak);
R(:,:,1) = H(:,:,1);
th = 0.23;

for i = 2:1:max
    s1 = strel('disk', i);
    s2 = strel('disk', i-1);
    X1 = imsubtract(imsubtract(imclose(X0,s1),X0), imsubtract(imclose(X0,s2),X0));
    % X1 = imsubtract(imsubtract(X0,imopen(X0,se1)), imsubtract(X0,imopen(X0,se0)));
    X2 = im2bw(X1, th+0.02) - Y(:,:,i-1);
    Y(:,:,i) = X2 | Y(:,:,i-1);
    Y(:,:,i) = bwmorph(Y(:,:,i), 'clean', inf);
    % figure; imshow(Y(:,:,i)); title('Decomposed image2');
    %% equation (10) for rgb image
    if i <= 10
        % F1 = imopen(imdilate(imopen(Y(:,:,i),s2),strel('disk',i+1)),strel('disk',i+2));
        F1 = imclose(Y(:,:,i), s2);
        % E(:,:,i) = immultiply(F1, im2bw(X0,0.35));
        E(:,:,i) = F1;
    end
    % producing a histogram and finding its peaks
    hist = imhist(E(:,:,i)) / numel(E(:,:,i));
    peak = mean(hist)
    H(:,:,i) = im2bw(E(:,:,i), peak);
    %% refinement
    R(:,:,i) = imopen(H(:,:,i), strel('square', i-1));
end
% for j = 1:max
%     figure; imshow(R(:,:,j)); title('Decomposed image');
% end

%% equation (8)
max2 = 4;
D = zeros(M1,N1,max2);
G = zeros(M1,N1,max2);
% D(:,:,1) = R(:,:,1);
D(:,:,1) = imdilate(R(:,:,1), strel('disk',1)) & im2bw(X0, 0.55);
G(:,:,1) = D(:,:,1);
th = 0.45;
for k = 2:1:max2
    R(:,:,k-1) = bwmorph(R(:,:,k-1), 'clean', inf);   % was R(:,:,i-1): i is a stale index from the previous loop
    D(:,:,k) = imdilate(R(:,:,k-1), strel('disk',1)) & im2bw(X0, th+0.02);
    %% equation (9) (slightly modified)
    G(:,:,k) = G(:,:,k-1) & D(:,:,k);
end
figure; imshow(G(:,:,max2)); title('Output image');

[L num] = bwlabel(G(:,:,max2), 8);   % label matrix obtained using 8-connectivity
areas = regionprops(L, 'Area');
for i = 1:num
    if areas(i).Area < 1             % remove labeled components below the area threshold
        L(find(L == i)) = 0;
    end
end
% figure; imshow(L);
imagen = L;
threshold = graythresh(imagen);
imagen = ~im2bw(imagen, threshold);
figure; imshow(imagen); title('Output Image');