INTERNATIONAL JOURNAL OF IMAGE, GRAPHICS AND SIGNAL PROCESSING (IJIGSP)
ISSN Print: 2074-9074, ISSN Online: 2074-9082
Editor-in-Chief
Prof. Valerii P. Shyrochyn, National Technical University of Ukraine "KPI", Ukraine
Associate Editors
Prof. Aleksandr Cariow, West Pomeranian University of Technology, Poland
Prof. P. S. Hiremath, KLE Technological University BVBCET Campus, India
Prof. Hamid Amiri, National Engineering School of Tunis, Tunisia
Prof. V.K. Govindan, National Institute of Technology Calicut, India
Prof. Ruisong Ye, Shantou University, China
Prof. Y.K. Sundara Krishna, Krishna University, India
Members of Editorial and Reviewer Board
Dr. Alan Harris University of North Florida, USA
Prof. B. Eswara Reddy JNTU College of Engineering, India
Dr. Benaïssa Mohamed Bechar University, Algeria
Prof. Mohamed B. El_Mashade Al_Azhar University, Egypt
Prof. Srinivas Yarramalle GITAM University, India
Dr. Goran Bidjovski International Balkan University, Macedonia
Dr. D. B. Shah Sardar Patel University, India
Dr. Sedigheh Ghofrani Islamic Azad University, Iran
Prof. M. A. H. Akhand Khulna University of Engineering & Technology Khulna, Bangladesh
Prof. C. Patvardhan Dayalbagh Educational Institute, India
Dr. Mohamed M. Fouad Military Technical College Kobry Elkoppa, Egypt
Dr. Debotosh Bhattacharjee Jadavpur University, India
Dr. Abdelkrim Mebarki Université des sciences et de technologie d’Oran – Mohamed Boudiaf, Algeria
Dr. Olufade F. W. Onifade University of Ibadan, Nigeria
Dr. Mahua Bhattacharya ABV Indian Institute of Information Technology & Management, India
Dr. Muhammad Irfan University of Engineering and Technology, Pakistan
Dr. V. T. Humbe Swami Ramanand Teerth Marathwada University, India
Dr. Mohammad Motiur Rahman Mawlana Bhashani Science and Technology University, Bangladesh
Dr. S.N. Omkar Indian Institute of Science, India
Dr. Nagendraswamy H. S. University of Mysore, India
Prof. Abderrahmane Hajraoui University of Abdelmalek Essaâdi, Morocco
Dr. Khaled F. Hussain Assiut University, Egypt
Dr. A. K. Verma Hindustan Institute of Technology and Management, India
Dr. Ravi Kumar Jatoth National Institute of Technology, India
Dr. Ireneusz Kubiak Military Communication Institute, Poland
Dr. Hossein Ghanei-Yakhdan Yazd University Yazd, Iran
Dr. Abdol Hamid Pilevar Bu Ali Sina University Hamedan, Iran
Dr. Galina Cariowa West Pomeranian University of Technology, Poland
Prof. Ghazali Bin Sulong Universiti Teknologi Malaysia, Malaysia
Dr. C. Vasantha Lakshmi Dayalbagh Educational Institute, India
Dr. Basavaraj S. Anami K.L.E. Institute of Technology, India
Prof. Jiang Li Austin Peay State University (APSU), USA
International Journal of Image, Graphics and Signal Processing (IJIGSP, ISSN Print: 2074-9074, ISSN Online: 2074-9082) is published monthly by the MECS Publisher, Unit B 13/F PRAT COMM’L BLDG, 17-19 PRAT AVENUE, TSIMSHATSUI KLN, Hong Kong, E-mail: ijigsp@mecs-press.org, Website: www.mecs-press.org. The current and past issues are made available on-line at www.mecs-press.org/ijigsp. Opinions expressed in the papers are those of the author(s) and do not necessarily express the opinions of the editors or the MECS publisher. The papers are published as presented and without change, in the interests of timely dissemination. Copyright © by MECS Publisher. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.
International Journal of Image, Graphics and Signal Processing (IJIGSP) ISSN Print: 2074-9074, ISSN Online: 2074-9082 Volume 8, Number 12, December 2016
Contents
REGULAR PAPERS
The Multifractal Analysis Approach for Photogrammetric Image Edge Detection. Olga V. Spirintseva ... 1
Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words. Hashem Ghaleb, P. Nagabhushan, Umapada Pal ... 8
Remote Sensing Textual Image Classification based on Ensemble Learning. Ye zhiwei, Yang Juan, Zhang Xu, Hu Zhengbing ... 21
Optimization of MUSIC and Improved MUSIC Algorithm to Estimate Direction of Arrival. Pooja Gupta, Vijay Verma ... 30
A Survey on Shadow Removal Techniques for Single Image. Saritha Murali, V.K. Govindan, Saidalavi Kalady ... 38
Image Comparison with Different Filter Banks On Improved PCSM Code. Jagdish Giri Goswami, Pawan Kumar Mishra ... 47
2D Convolution Operation with Partial Buffering Implementation on FPGA. Arun Mahajan, Paramveer Gill ... 55
Enhancing Face Recognition Performance using Triplet Half Band Wavelet Filter Bank. Mohd.Abdul Muqeet, Raghunath S.Holambe ... 62
I.J. Image, Graphics and Signal Processing, 2016, 12, 1-7 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.01
The Multifractal Analysis Approach for Photogrammetric Image Edge Detection
Olga V. Spirintseva
Oles Honchar Dnipropetrovsk National University, Dnipropetrovsk, Ukraine
Email: spirintseva.olga@gmail.com
Abstract—With the rapid development of computer technology, fractals and fractal-based analysis have become especially popular. Space photogrammetric snapshots recorded in several spectral ranges of electromagnetic radiation have their own specific attributes compared with color images in general. This paper studies photogrammetric image segmentation based on multifractal analysis, with the goal of extracting object edges as well as possible. The aim of the study is to investigate fractal analysis based on the pointwise Hölder exponent for photogrammetric images recorded in several spectral ranges by remote sensing imaging systems.
Index Terms—Photogrammetric image, multifractal analysis, segmentation, Hölder exponent.
I. INTRODUCTION
Data obtained from digital image processing are very important for modern technology, science, engineering, and national economic development. Digital image processing methods are steadily being developed and improved. This work covers enhancement of visual perception, preprocessing, denoising, segmentation, and classification of images in order to extract objects of interest. Space photogrammetric snapshots are recorded in several spectral ranges of electromagnetic radiation. They are therefore multispectral images and have their own specific attributes compared with color images in general. Several features affect how they must be processed. Each spectral band carries different information and has a different spatial resolution. In addition, image formation depends on time, sensor position, and the characteristics of the optical sensor.
The principle of image segmentation for analyzing and classifying objects of interest is to separate the image into components or primitive objects. The level of detail of the resulting regions depends on the task at hand. For example, segmentation should stop when the object of interest no longer stays whole and begins to split into smaller parts. Image segmentation algorithms are often based on either discontinuities or similarities of image intensity values. The discontinuity approach relies on abrupt changes in image brightness. The similarity approach divides the image into regions that are similar according to a set of predefined criteria. Thus, the choice of segmentation algorithm depends on the task to be solved. Edge detection is an integral part of the set of image segmentation techniques, because many image processing and computer vision tasks succeed or fail on the quality of the extracted object contours.
The principle of multifractal image segmentation is as follows. It seems intuitively clear that points in an image can be classified according to their Hölder exponent [1]. The Hölder exponent characterizes the regularity of the intensity distribution over the image field. The pointwise (local) Hölder exponent describes local regularity: the higher the exponent, the more regular the point. Consider, for example, points lying on contours. These points often correspond to discontinuities of the grey-level map or of its derivative. They therefore generally have "low" Hölder regularity. However, the exact value of the exponent depends on the characteristics of the image. In addition, being an edge is not a purely local property, so a global criterion is needed to decide that a given point is an edge point. Indeed, points lying in textured regions also generally have low regularity, and one has to find a way to distinguish them from contours.
As a powerful mathematical tool, fractal theory, initiated by Mandelbrot [2], has been widely applied in many areas of the natural sciences. The image multifractal segmentation approach is fully nonparametric and analyzes the image through various features of its multifractal spectrum [3]. It is also possible to use Weakly Self-Affine functions [4] to model and segment images, and to use multifractal tools to segment 1D signals. In that case, it is really a modeling method used for segmentation purposes.
II. GENERAL APPROACHES TO IMAGE MULTIFRACTAL SEGMENTATION
Since edges are by definition sets of points of dimension one, a point is declared to lie on a contour if the multifractal spectrum value associated with its exponent is one. A statistical characterization of edge points is also possible, in addition to this geometrical one. Edge points may be defined by their probability of being hit when a pixel is chosen at random in the image at a given resolution. The link between the geometrical and statistical characterizations is provided by the multifractal formalism. Starting again from the Hölder exponents, one can decide to keep those points where the spectrum takes any given value.
The procedure has three steps. First, the Hölder exponent is computed at each point, which yields the image of Hölder exponents. Second, the multifractal spectrum is computed. Finally, one chooses which kind of points to extract (points on smooth edges, textures, etc.) by specifying the corresponding value of the spectrum. The analysis is performed with respect to the Lebesgue measure: exponents are computed by comparing the content of a region with its size.
In [5] the local behavior of a continuous function is described as follows. Let I be a non-trivial open interval of ℝ. The local regularity of a function f: I → ℝ at a point x0 ∈ I is given by its pointwise Hölder exponent. For α ≥ 0, the function f belongs to C^α(x0) if and only if there exist a constant C > 0 and a polynomial P of degree smaller than α such that, for all x in a neighborhood of x0,
$$|f(x) - P(x - x_0)| \le C\,|x - x_0|^{\alpha}.$$
The pointwise Hölder exponent of f at x0 is
$$h_f(x_0) = \sup\{\alpha \ge 0 : f \in C^{\alpha}(x_0)\}.$$
Multifractal analysis then focuses on the dimension of the fractal level sets of the function h_f, that is, the sets of the form
$$E_f(h) = \{x \in I : h_f(x) = h\}.$$
According to [5], the Hausdorff dimension, denoted dim, is the most common notion of dimension. The Hausdorff multifractal spectrum of f is defined by
$$d_f(h) = \dim E_f(h).$$
The Hausdorff spectrum gives geometrical information about the dimension of the sets of points in the image that share a given exponent. It is a curve whose abscissa covers all the Hölder exponents that occur in the image. Its ordinate is the dimension of the set of pixels with a given exponent. The second spectrum is the large deviation spectrum. It yields statistical information, related to the probability of finding a point with a given exponent in the image. Its computation is based on density estimation techniques. It uses a kernel of optimal, signal-dependent size computed from an empirical statistical criterion. The third spectrum is the Legendre spectrum, which is a concave approximation of the large deviation spectrum. Its main advantage is that it usually yields much more robust estimates, at the cost of some information loss.
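A standard one-line example, not taken from the paper, shows how the pointwise Hölder exponent h_f(x0) (the supremum of admissible α) behaves at a power-law cusp. Choosing P ≡ 0 and C = 1 gives the bound with α = γ. For any α > γ, P must satisfy P(0) = 0. Every remaining term of P is then O(|x − x0|), which is negligible against |x − x0|^γ, so no C can work near x0:

$$f(x) = |x - x_0|^{\gamma},\quad 0 < \gamma < 1 \;\Longrightarrow\; h_f(x_0) = \gamma .$$

Smaller γ therefore means a sharper singularity. This is why contour points, where grey levels jump, tend to have low exponents.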
The pointwise Hölder exponent was computed using several ways of measuring the content (capacity) of a given region of the image:
- sum capacity: the sum of the grey levels of the pixels in the region;
- lpsum capacity: the Lp norm, i.e. the 1/p-th power of the sum of the p-th powers of the individual grey levels;
- min capacity: the minimum of the grey levels of the region's pixels;
- max capacity: the maximum of the grey levels of the region's pixels;
- iso capacity: the cardinality of the largest subset of pixels having the same grey level.
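The capacities listed above can be sketched as simple reductions over the grey levels of one region. The function name `capacity` and its string keys are illustrative assumptions and are not FracLab's API. The adaptive iso capacity is omitted because its noise-tolerant grouping rule is not specified here.

```python
import numpy as np


def capacity(region: np.ndarray, kind: str, p: float = 2.0) -> float:
    """Content of an image region under one of the capacities (hypothetical helper)."""
    values = np.asarray(region, dtype=float).ravel()
    if kind == "sum":
        return float(values.sum())
    if kind == "lpsum":
        # 1/p-th power of the sum of the p-th powers of the grey levels
        return float(np.sum(np.abs(values) ** p) ** (1.0 / p))
    if kind == "min":
        return float(values.min())
    if kind == "max":
        return float(values.max())
    if kind == "iso":
        # size of the largest group of pixels that share one grey level
        _, counts = np.unique(values, return_counts=True)
        return float(counts.max())
    raise ValueError(f"unknown capacity: {kind}")
```

For a 2×2 region with four distinct grey levels the iso capacity is 1. For a constant 2×2 region it is 4, which matches the N-pixel example given in Section III.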
During segmentation, the points whose exponents have a spectrum value within a specified range of dimensions are extracted from the original image. The result is a binary image in which the extracted points are white and everything else is black.
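This extraction step can be sketched in a few lines, assuming the f(alpha) image is already available as a NumPy array. The name `extract_points` and the bounds used below are hypothetical. Section II describes edge points as those whose spectrum value is one, so a narrow range around 1 would target contours.

```python
import numpy as np


def extract_points(f_alpha: np.ndarray, f_min: float, f_max: float) -> np.ndarray:
    """Binary image: 1 (white) where f(alpha) lies in [f_min, f_max], 0 (black) elsewhere."""
    f = np.asarray(f_alpha, dtype=float)
    # NaN entries (e.g. from empty bins) compare False, so they stay black
    mask = (f >= f_min) & (f <= f_max)
    return mask.astype(np.uint8)
```

For example, `extract_points(f_img, 0.9, 1.1)` keeps pixels whose local spectrum value is close to one.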
III. HÖLDER EXPONENT BASED SEGMENTATION ALGORITHM
The image feature of interest is the Hölder exponent (also known as the Lipschitz exponent). It is a scalar quantity, readily computable from the wavelet transform [6, 7], that represents the regularity or differentiability of a signal: the higher the exponent, the higher the regularity. The segmentation algorithm consists of the following steps.
- Compute the alpha image from the normalized grayscale input image, using a chosen radius and measure type. The radius is an integer from 1 to 5 that defines the largest measure domain. The measures used in this study are Maximum, Minimum, Sum, Iso, and Adaptive Iso over square domains.
For instance, if a region is composed of N pixels that all have different grey levels, its iso capacity is one. If all its pixels have the same grey level, its iso capacity is N. The adaptive iso capacity refines this by taking possible image noise into account. In the next step, the multifractal spectrum of the processed image can be computed by one of the three algorithms provided by fractal theory.
The pixels of the alpha image are estimated values of the Hölder exponent at the corresponding points, and they describe the local regularity of the input image. The Hölder exponent at a given point is estimated from the bi-logarithmic diagram ln(Mi(m, n)) vs. ln(i). Here Mi(m, n) is the amount of the chosen measure within the measure domain of size i at the pixel with spatial coordinates (m, n). The limiting value alpha(m, n) is estimated as the slope of the linear regression line in this log-log diagram [8, 9].
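The log-log regression above can be sketched as follows for the sum, max, and min capacities. Several assumptions apply here, since the paper does not spell out its conventions:
- Domain number i is a (2i − 1)×(2i − 1) square centred on the pixel, and ln M is regressed against the log of that side length.
- Borders are handled by edge padding.
- A tiny `eps` keeps the logarithm finite on black pixels.

The name `alpha_image` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

REDUCERS = {"sum": np.sum, "max": np.max, "min": np.min}


def alpha_image(img: np.ndarray, radius: int = 3, kind: str = "sum",
                eps: float = 1e-12) -> np.ndarray:
    """Estimate the pointwise Hölder exponent at every pixel (illustrative sketch)."""
    if radius < 2:
        raise ValueError("need at least two domain sizes for a regression")
    a = np.asarray(img, dtype=float)
    h, w = a.shape
    reduce = REDUCERS[kind]
    pad = radius - 1  # half-width of the largest domain
    padded = np.pad(a, pad, mode="edge")
    sides = np.array([2 * i - 1 for i in range(1, radius + 1)], dtype=float)
    log_m = []
    for i in range(1, radius + 1):
        s = 2 * i - 1
        off = radius - i  # makes each s x s window centred on its pixel
        sub = padded[off:off + h + s - 1, off:off + w + s - 1]
        windows = sliding_window_view(sub, (s, s))  # shape (h, w, s, s)
        log_m.append(np.log(reduce(windows, axis=(-2, -1)) + eps))
    y = np.stack(log_m)  # shape (radius, h, w)
    xc = np.log(sides) - np.log(sides).mean()
    # least-squares slope of ln M_i versus ln(side), for all pixels at once
    return np.tensordot(xc, y - y.mean(axis=0), axes=(0, 0)) / np.sum(xc ** 2)
```

On a constant image, the sum capacity grows like the domain area, so alpha is 2 everywhere. The max capacity does not change with domain size, so alpha is 0. Edge pixels of a real image typically fall below 2 with the sum capacity.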
- Discretize the alpha image and cover it with a regular grid of boxes with integer box sizes j = 1, 2, 4, 6, 8, 10, 12, 14, 16 pixels. The alpha range is split into N equally sized bins using the bin scale factor k = (N - 1)/(max - min). Here N is the number of bins, and max and min are the maximum and minimum values of the alpha image. For each alpha bin, count the boxes that contain at least one pixel whose value falls in that bin. This gives the number of hit boxes of size j, Nj(alpha bin). Boxes of each size are taken into account in turn, and the Hausdorff measure of each bin is estimated from the bi-logarithmic diagram ln(Nj(alpha bin)) vs. ln(j). The limiting value f(alpha bin) is estimated from the slope of the linear regression line in this log-log diagram, with its sign reversed, since Nj decreases as j grows. This procedure is repeated for each of the N alpha bins. It produces the 1D multifractal spectrum f(alpha) and the f(alpha) image, in which each pixel holds the f(alpha) value of its bin.
- Compute the multifractal Hausdorff spectrum from the alpha image using a chosen number of bins (a positive integer). The components of the spectrum are then normalized to 2 to minimize distortion.
- Compute the f(alpha) image from the alpha image, using the chosen number of bins to discretize it. The function f(alpha) describes the global regularity of the input image, and the f(alpha) image is the result of the segmentation process.
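The discretization and box counting described above can be sketched as follows. This is an illustrative implementation under these assumptions:
- Bins are assigned as floor((alpha − min)·k) using the scale factor k.
- Boxes of size j tile the image from its top-left corner.
- f is taken as minus the regression slope of ln Nj against ln j.

The normalization of the spectrum to 2 is not reproduced. The name `hausdorff_spectrum` is hypothetical.

```python
import numpy as np

PAPER_BOX_SIZES = (1, 2, 4, 6, 8, 10, 12, 14, 16)


def hausdorff_spectrum(alpha: np.ndarray, n_bins: int = 20, box_sizes=PAPER_BOX_SIZES):
    """Box-counting f(alpha) for each alpha bin, plus the f(alpha) image (sketch)."""
    a = np.asarray(alpha, dtype=float)
    lo, hi = a.min(), a.max()
    k = (n_bins - 1) / (hi - lo) if hi > lo else 0.0
    bins = np.clip(np.floor((a - lo) * k).astype(int), 0, n_bins - 1)
    rows, cols = np.indices(a.shape)
    log_j = np.log(np.asarray(box_sizes, dtype=float))
    f = np.full(n_bins, np.nan)  # empty bins keep NaN
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        r, c = rows[mask], cols[mask]
        counts = []
        for j in box_sizes:
            # distinct j x j boxes hit by at least one pixel of this bin
            keys = (r // j) * (a.shape[1] // j + 1) + (c // j)
            counts.append(np.unique(keys).size)
        slope = np.polyfit(log_j, np.log(counts), 1)[0]
        f[b] = -slope  # N_j ~ j^(-f), so f is minus the slope
    return f, f[bins]
```

The test case below uses a flat background with a single horizontal line of different alpha. It behaves as expected: the line's bin gets f ≈ 1, a curve, and the background bin gets f ≈ 2, an area. The text above thresholds exactly this kind of value to extract contours.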
IV. EXPERIMENTAL RESULTS
Four Ikonos images served as test images:
- a panchromatic image of a university campus area (Fig. 1a);
- a panchromatic image of an urban area (Fig. 1c);
- a blue-band multispectral image of the university campus area (Fig. 1b);
- an NIR multispectral image of the urban area (Fig. 1d).

The panchromatic images have a resolution of 1 m and a spectral range of 0.45-0.90 µm. The multispectral images have a resolution of 4 m, with spectral ranges of 0.45-0.53 µm for the blue band and 0.77-0.88 µm for the NIR band.
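Fig.1. The initial photogrammetric images: (a) campus area, panchromatic; (b) campus area, blue band; (c) urban area, panchromatic; (d) urban area, NIR band.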
The images obtained by segmentation with the Hausdorff spectrum are shown in Table I for each Hölder exponent capacity. The parameters yielding a "fat" binary image containing smooth regions were chosen directly during segmentation, when the f(alpha) image was computed. The results were compared with those obtained by applying several well-known edge detectors to the original test image bands: Sobel, Prewitt, Roberts, Laplacian of Gaussian, zero-crossing, and Canny [10-12]. The comparison uses the SSIM index [13]:
$$S = \frac{2\,\bar{A}\,\bar{B}}{\bar{A}^{2} + \bar{B}^{2}} \cdot \frac{2\sum (A - \bar{A}) \circ (B - \bar{B})}{\|A - \bar{A}\|_F^{2} + \|B - \bar{B}\|_F^{2}},$$
where the overbar denotes the mean intensity of an image, ‖·‖_F denotes the Frobenius matrix norm, and ∘ denotes element-wise matrix multiplication. The matrices A and B have the same dimensions. The SSIM index measures how similar the geometric structure of two images is: the higher the index, the more similar the images. In this paper, the SSIM index is calculated for each pair consisting of an original image and its segmentation result. The SSIM values are listed in Table II. As Table II shows, the fractal Hölder exponent based segmentation approach deserves attention. Different capacities give different results for different bands of a given photogrammetric image. The noteworthy values are marked in bold, and the same effect can be seen by visually inspecting the corresponding images.
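A single-window form of the SSIM index of [13] can be sketched as below. It is the product of a luminance term 2·mean(A)·mean(B)/(mean(A)² + mean(B)²) and a structure term. The structure term is twice the sum of the element-wise product of the mean-removed images, divided by the sum of their squared Frobenius norms. This sketch assumes global image statistics and omits the stabilizing constants of the full SSIM, so it is undefined for constant or zero-mean images. The name `global_ssim` is hypothetical.

```python
import numpy as np


def global_ssim(a: np.ndarray, b: np.ndarray) -> float:
    """Single-window SSIM without stabilizing constants (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same dimensions")
    ma, mb = a.mean(), b.mean()
    da, db = a - ma, b - mb
    luminance = 2 * ma * mb / (ma ** 2 + mb ** 2)
    # np.linalg.norm of a 2-D array is the Frobenius norm
    structure = 2 * np.sum(da * db) / (np.linalg.norm(da) ** 2 + np.linalg.norm(db) ** 2)
    return float(luminance * structure)
```

Identical images give exactly 1, and the index is symmetric in its arguments. Windowed implementations, such as the one in scikit-image, average a stabilized version of this quantity over local patches instead.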
Table.1. Image Segmentation Results
[Segmented images. Columns: Campus Area (Panchromatic, Blue Band) and Urban Area (Panchromatic, NIR Band). Rows: Hölder exponent capacity (Adaptive Iso, Iso, Max, Min).]
In [14], uniformity and contrast measures were proposed as segmentation quality criteria. Accordingly, this paper uses the Standard Deviation (StD) as the uniformity measure and the Image Contrast (IC) as the inter-segment contrast measure. These are supplemented with Image Fidelity (IF) and Signal-to-Noise Ratio (SNR) values as standard digital image quality assessments [15]. The corresponding values are listed in Table III.
The values in Table III agree with the visual perception of the corresponding images. General conclusions are presented in the Discussion.
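The paper does not give its formulas for these measures, so the sketch below uses common definitions as assumptions:
- StD is the standard deviation of the intensity levels.
- IF = 1 − Σ(A − B)²/ΣA².
- The contrast between two segments is |m1 − m2|/(m1 + m2). We assume this ratio matches the inter-region contrast of [14].

SNR is left out because its exact definition in [15] is not stated here. All function names are hypothetical.

```python
import numpy as np


def std_uniformity(img: np.ndarray) -> float:
    """Standard deviation of the intensity levels of an image."""
    return float(np.std(np.asarray(img, dtype=float)))


def image_fidelity(original: np.ndarray, result: np.ndarray) -> float:
    """1 - sum((A - B)^2) / sum(A^2); equals 1 for identical images."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(result, dtype=float)
    return float(1.0 - np.sum((a - b) ** 2) / np.sum(a ** 2))


def region_contrast(img: np.ndarray, mask: np.ndarray) -> float:
    """|m1 - m2| / (m1 + m2) between mean levels inside and outside a binary mask."""
    a = np.asarray(img, dtype=float)
    m = np.asarray(mask, dtype=bool)
    m1, m2 = a[m].mean(), a[~m].mean()
    return float(abs(m1 - m2) / (m1 + m2))
```

For a binary 0/255 image with half of its pixels white, the StD is 127.5. This is the upper bound for such images and is close to the largest StD values reported in Table III.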
Table.2. Comparing with Edge Detection Methods According to Criteria of SSIM-Index
Method | Campus Area, Pan | Campus Area, Blue | Urban Area, Pan | Urban Area, NIR
Sobel | 0.0779 | 0.1391 | 0.1421 | 0.1008
Prewitt | 0.0765 | 0.1372 | 0.1361 | 0.0989
Roberts | 0.0631 | 0.1332 | 0.1392 | 0.1063
Log | 0.1127 | 0.1711 | 0.2065 | 0.1880
Zerocross | 0.1127 | 0.1711 | 0.2065 | 0.1880
Canny | 0.0382 | 0.0790 | 0.0906 | 0.0823
Max 0.4830 0.4334 0.3927 0.5051 0.0827 0.0036 0.0296 Min 0.3547 0.0064 0.0787 0.1125 0.0883 Iso 0.1797 0.2292 Adopt.Iso 0.2931 0.2745
1 - Area type, Band name; 2 - Edge Detection Methods
Table.3. Qualitative Segmentation Results
Capacity | Measure | Campus Area, Pan | Campus Area, Blue | Urban Area, Pan | Urban Area, NIR
Min | StD | 97.5521 | 109.881 | 98.8964 | 120.547
Min | IC | 0.9537 | 0.9537 | 0.9764 | 0.9764
Min | IF | 0.8219 | 0.7536 | 0.8156 | 0.6629
Min | SNR | 0.7495 | 0.6084 | 0.7342 | 0.4722
Max | StD | 90.3927 | 105.406 | 104.025 | 115.588
Max | IC | 0.9537 | 0.9537 | 0.9764 | 0.9764
Max | IF | 0.8526 | 0.7813 | 0.7891 | 0.7110
Max | SNR | 0.8316 | 0.6602 | 0.6759 | 0.5391
Iso | StD | 104.316 | 126.3406 | 124.3290 | 125.6667
Iso | IC | 0.9999 | 0.9537 | 0.9764 | 0.9764
Iso | IF | 0.7875 | 0.4327 | 0.3892 | 0.4155
Iso | SNR | 0.6726 | 0.2462 | 0.2141 | 0.2332
Adopt. Iso | StD | 93.7225 | 120.7393 | 111.0354 | 109.3223
Adopt. Iso | IC | 0.9537 | 0.9999 | 0.9999 | 0.9999
Adopt. Iso | IF | 0.8390 | 0.6607 | 0.7458 | 0.7573
Adopt. Iso | SNR | 0.7932 | 0.4694 | 0.5948 | 0.6149
1 - Area type, Band name; 2 - Hölder Exponent capacity
As mentioned earlier, photogrammetric images have specific features that affect their processing. These include the particularities of the image formation principle, spatial instability of the optical sensor during acquisition, and the random natural character of intensity distributions. These features also make the intensity distributions of such images very noisy. To obtain satisfactory segmentation results, the original images should be denoised before segmentation begins. The simplest, clearest, and most common denoising method suitable for photogrammetric images is adaptive Wiener filtering, which is designed to remove additive white Gaussian noise. The Wiener method is based on statistics (local mean and variance) estimated in a local neighborhood of each pixel:
$$\mu = \frac{1}{NM}\sum_{n_1, n_2 \in \eta} a(n_1, n_2), \qquad \sigma^2 = \frac{1}{NM}\sum_{n_1, n_2 \in \eta} a^2(n_1, n_2) - \mu^2,$$
where η is the N-by-M local neighborhood of each pixel in the image A. A pixel-wise Wiener filter is then built from these estimates:
$$b(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2}\,\bigl(a(n_1, n_2) - \mu\bigr),$$
where ν² is the noise variance. If the noise variance is not given, the method uses the average of all the locally estimated variances. Table IV presents the most illustrative results of applying preliminary Wiener filtering to the original images before segmentation, alongside segmentation alone. For the urban area images in Table IV, the contours become clearer and there are fewer spurious fragments. The campus area images also show fewer spurious fragments, and their contours are easier to distinguish, especially in the PAN band.
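The filter can be sketched in NumPy as below. It uses the local mean μ and variance σ² over an N-by-M window, the output μ + (σ² − ν²)/σ²·(a − μ), and the average local variance as ν² when no noise level is supplied. Two assumptions go beyond the text: borders are handled by reflective padding, and negative gains (σ² < ν²) are clamped to zero, a common practical safeguard. The name `adaptive_wiener` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_wiener(img: np.ndarray, size=(3, 3), noise=None) -> np.ndarray:
    """Pixel-wise adaptive Wiener filter from local mean and variance (sketch)."""
    a = np.asarray(img, dtype=float)
    n, m = size
    pads = ((n // 2, n - 1 - n // 2), (m // 2, m - 1 - m // 2))
    windows = sliding_window_view(np.pad(a, pads, mode="reflect"), (n, m))
    mu = windows.mean(axis=(-2, -1))  # local mean
    var = (windows ** 2).mean(axis=(-2, -1)) - mu ** 2  # local variance
    var = np.maximum(var, 0.0)  # remove tiny negative round-off
    if noise is None:
        noise = var.mean()  # average of the local variance estimates
    # gain (sigma^2 - nu^2) / sigma^2, clamped to [0, 1) and guarded against 0/0
    gain = np.maximum(var - noise, 0.0) / np.maximum(np.maximum(var, noise), 1e-12)
    return mu + gain * (a - mu)
```

In flat regions the gain is near zero and the output approaches the local mean. Near strong edges the local variance dominates ν², so the pixel is left mostly unchanged. SciPy provides a ready-made version as `scipy.signal.wiener`.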
V. DISCUSSION
Fractal geometry is intended to describe natural phenomena and objects. The spatial intensity distribution over the image field can be regarded as a phenomenon, and the image itself as an object. Fractal methods are fundamentally new signal and image processing methods. They use the fractional dimension of signals and images as well as their self-similarity and scaling properties.
When computing the Hölder exponent, the most important capacities are sum, max, and iso. The choice of one capacity over another has to be made by trial and error. As a general rule, the max, min, and (adaptive) iso capacities give more robust and meaningful results. In any case, it is worth experimenting with different capacities and inspecting the results before choosing one, since different capacities often highlight different aspects of the image.
When computing the box dimensions of point sets for the Hausdorff spectrum, note that large maximum box sizes (over 64) lead to long computation times. Increasing the minimum box size yields smoother but less precise spectra. The shape of the spectrum for a typical image depends strongly on the capacity:
- for the sum capacity, it is approximately bell-shaped;
- for the max capacity, it looks more like a line segment of the form y = 2 - ax, with a > 0;
- for the iso capacity, it resembles y = ax, again with a > 0.
Table.4. Preliminary Denoising Results
[Image pairs comparing "Segmentation only" with "Denoising + Segmentation" for: PAN band, Urban Area, Min capacity; NIR band, Urban Area, Max capacity; PAN band, Campus Area, Iso capacity; Blue band, Campus Area, Adaptive Iso capacity.]
Visual assessment of the segmentation results confirms the general conclusions stated above. As complementary quantitative assessments, the Standard Deviation and Image Contrast are best suited for evaluating photogrammetric image segmentation. In future studies they could be supplemented and compared with standard measures of digital image visual quality. These data support the following conclusions:
- Multifractal segmentation based on the pointwise Hölder exponent was performed for photogrammetric images recorded in several spectral ranges.
- The segmentation results were evaluated visually and with numerical quality measures.
- It is advisable to experiment with different capacities and inspect the results before choosing one, since different capacities often highlight different aspects of the image.
Preliminary adaptive Wiener denoising of the original images gives good practical results. It is therefore reasonable to study other denoising approaches to apply before segmentation, including fractal denoising methods. Further development of photogrammetric image segmentation may add preliminary processing steps. Examples are denoising, geometric correction of spatial intensity distributions, and image fusion for multispectral photogrammetric images. It is also worth studying post-processing that joins and extends object contours broken during coarse segmentation.
ACKNOWLEDGMENT
The author sincerely thanks Prof. Vladimir M. Korchinsky for suggesting and supporting this research.
REFERENCES
[1] L. Trujillo, P. Legrand, J. Levy-Vehel, "The Estimation of Hölderian Regularity using Genetic Programming", GECCO’10, pp. 861-686, 2010.
[2] B.B. Mandelbrot, J.W. Van Ness, "Fractional Brownian motions, fractional noises and applications", SIAM Rev. 10 (4), pp. 422-437, 1968.
[3] J. Levy-Vehel, P. Legrand, "Thinking in Patterns. Signal and Image Processing with FRACLAB", pp. 321-322, 2004.
[4] J. Levy-Vehel, "Weakly Self-Affine Functions and Applications in Signal Processing", CUADERNOS del Instituto de Matematica "Beppo Levi", 30, ISSN 03260690, 2001.
[5] J. Barral, S. Seuret, "From multifractal measures to multifractal wavelet series", The Journal of Fourier Analysis and Applications, Vol. 11, Issue 5, pp. 589-614, 2005.
[6] S. Mallat, "A Wavelet Tour of Signal Processing", Second edition, Academic Press, London, 1999.
[7] S. Mallat and W.L. Hwang, "Singularity detection and processing with wavelets", IEEE Transactions on Information Theory 38, pp. 617-643, 1992.
[8] J. Lévy-Véhel, P. Mignot, "Multifractal segmentation of images", Fractals, Vol. 2, No. 3, pp. 379-382, 2004.
[9] T. Stojic, I. Reljin, B. Reljin, "Adaptation of multifractal analysis to segmentation of microcalcifications in digital mammograms", Physica A: Statistical Mechanics and its Applications, Vol. 367, No. 15, pp. 494-508, 2006.
[10] J. Canny, "A Computational Approach to Edge Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, pp. 679-698, 1986.
[11] J.S. Lim, "Two-Dimensional Signal and Image Processing", Englewood Cliffs, NJ, Prentice-Hall, pp. 478-488, 1990.
[12] J. R. Parker, "Algorithms for Image Processing and Computer Vision", New York, John Wiley & Sons, Inc., pp. 23-29, 1997.
[13] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity", IEEE Trans. Image Processing, Vol. 13, pp. 600-612, 2004.
[14] M.D. Levine and A. Nazif, "Dynamic measurement of computer generated image segmentations", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 2, pp. 155-164, 1985.
[15] I.M. Zhuravel, "Digital Images Visual Quality Assessments. Short Theory Course of Image Processing", 2002. http://www.matlab.ru/imageprocess/book2/2.asp.
Authors’ Profiles Olga V. Spirintseva, female, is a Candidate of Technical Sciences (2013) and an associate professor at the Electronic Computer Department of Oles Honchar Dnipropetrovsk National University, Ukraine. The dissertation thesis specialty is Applied Geometry, Engineering Graphics, and its theme is "Object Identification on Photogrammetric Digital Images in Conditions of Fixing Parameters Indeterminateness". Research interests include digital and photogrammetric image processing, object identification, image segmentation. Teaching interests include digital image processing, Java programming, information security, wireless communications.
How to cite this paper: Olga V. Spirintseva, "The Multifractal Analysis Approach for Photogrammetric Image Edge Detection", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.1-7, 2016. DOI: 10.5815/ijigsp.2016.12.01
I.J. Image, Graphics and Signal Processing, 2016, 12, 8-20 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.02
Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words
Hashem Ghaleb¹, P. Nagabhushan¹ and Umapada Pal²
¹Department of Studies in Computer Science, University of Mysore, Mysore, India
²CVPR Unit, Indian Statistical Institute, Kolkata, India
Email: hashemx86@gmail.com, pnagabhushan@hotmail.com, umapadapal@isical.ac.in
Abstract—Segmentation of Arabic text is a major challenge that any recognition system must address. The cursive nature of Arabic writing makes it necessary to handle segmentation at several levels. An Arabic text line can be viewed as a sequence of words, which in turn can be viewed as sequences of sub-words. Sub-words frequently share the same vertical space, which makes segmentation based on vertical projection inefficient. This paper addresses the task of segmenting handwritten Arabic text at the sub-word level. The proposed algorithm pulls apart the connected components, which cannot be separated by a vertical-projection-based approach. Graph theoretic modeling is proposed to solve the problem of connected component extraction. These components are then analyzed thoroughly to obtain the constituent sub-words, where a sub-word may consist of several components. The proposed algorithm was tested on a variety of handwritten Arabic samples taken from different databases, and the results obtained are encouraging.
Index Terms—Arabic Handwriting Recognition; Arabic Sub-words; Sub-word Segmentation; Connected Component Extraction; Graph theoretic modeling
I. INTRODUCTION
Optical Character Recognition is the process of transforming text from the iconic form of writing to its symbolic form [1]. Numerous research results on recognizing printed and handwritten text have been reported, especially for Latin and Chinese scripts. However, the state of the art in Arabic text recognition lags far behind [2,3]. This is despite the fact that Arabic letters, along with a few additional letters, are used to write several languages such as Persian and Urdu.
Arabic script is written from right to left and is cursive in both handwritten and printed forms. The Arabic alphabet contains 28 letters, which are joined to form a word (see Fig. 1(a)). However, 6 letters (ا, د, ذ, ر, ز, and و) can be joined only to the letter preceding them, not to the succeeding letter. This conditional joining property leads to the emergence of PAWs (Parts of Arabic Words) [6]. A PAW, or sub-word, is a sequence of Arabic letters that are joined together. Generally, an Arabic word consists of one or more sub-words. For example, the word in Fig. 1(a) consists of 1 sub-word, whereas the word in Fig. 1(b) consists of 3 sub-words. A sub-word may in turn consist of several components, because several Arabic letters share the same basic shape and are distinguished only by dots and Hamza. These components are known in the literature as the primary component (the main body of the sub-word) and the secondary components (dots and diacritics). For example, the word in Fig. 1(c) consists of two sub-words. The right sub-word has only one component, whereas the left one has 4 components: one for the main body and three for the dots. Additionally, sub-words may overlap horizontally, i.e. share the same vertical space without touching (see Fig. 1(d), where the overlap is encircled). This overlapping causes problems for both word and character segmentation [4,5].
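The conditional joining rule can be illustrated on Unicode text rather than on images. The sketch below splits a word string, in logical order, into PAWs after each of the six non-joining letters listed above. `split_paws` is a hypothetical helper and not part of the proposed algorithm. It deliberately sticks to the six letters in the text, although hamza-carrying variants such as أ, إ, آ and ؤ also do not join to the next letter.

```python
NON_JOINING = {"ا", "د", "ذ", "ر", "ز", "و"}  # never connect to the following letter


def split_paws(word: str) -> list[str]:
    """Split an Arabic word (logical order) into its Parts of Arabic Words."""
    paws, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            paws.append(current)  # the chain of joined letters ends here
            current = ""
    if current:
        paws.append(current)
    return paws
```

For example, "مدرسة" (school) splits into three PAWs, "مد", "ر" and "سة". "كتاب" (book) splits into two, "كتا" and "ب".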
Fig.1. Arabic words ((a)-(d)).
Copyright © 2016 MECS
I.J. Image, Graphics and Signal Processing, 2016, 12, 8-20
Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words
Fig.2. Arabic Text line.
Overall, an Arabic text line can be viewed as a sequence of words, each of which is in turn a sequence of sub-words (see Fig. 2). The sequence of sub-words usually runs from right to left, and the secondary components associated with the main body of a particular sub-word are confined within the boundary of that main body. However, secondary components may be displaced in handwriting. Since segmenting a word into characters is a very challenging task [18], we propose to treat the sub-word itself as the basic unit of processing. Since the sub-words of a word are, in principle, disconnected components, while segmentation by projection can be difficult when sub-words overlap, we resort to graph-theoretic modeling to analyze connected and disconnected components. Graph theory allows complex problems to be mapped into simple representations and models, which in turn allow optimal solution techniques to be defined for such problems [16]. The work proposed in this paper is therefore devoted to segmenting a text line into its sub-words using the concepts of graph theory. The rest of the paper is organized as follows: related literature is presented in Section 2, the different stages of the proposed algorithm are described in Section 3, the experimental results are reported in Section 4, comparative results are reported in Section 5, and Section 6 concludes the paper.
II. RELATED WORKS
Arabic handwriting recognition systems proposed in the literature can be divided into segmentation-based (analytical) and segmentation-free (holistic) approaches. In segmentation-based systems, the word is segmented into smaller units (characters, graphemes, or primitives), which are then recognized. In contrast, segmentation-free systems treat the whole image as the unit of recognition. The property of Arabic writing whereby a word is separated into several pieces (i.e., sub-words) has been exploited by researchers for a variety of tasks such as word recognition, word spotting, and lexicon reduction. Motivated by the conditional joining property of Arabic letters, Abdulkader introduced in [6] a two-tier approach for the recognition of handwritten Arabic text, in which an Arabic word is viewed as a sequence of PAWs and PAWs are expressed in terms of letters. The recognition problem is decomposed into two problems that are solved simultaneously. To find the best matching word for an input image, a two-tier beam search is performed: in tier one, the search is constrained by a letter-to-PAW lexicon; in tier two, it is constrained by a PAW-to-word lexicon. In [21], characteristic loci features were used to cluster printed Farsi (Persian) sub-words by shape, yielding a pictorial dictionary that, according to the authors, can be used in a word recognition system. However, the authors did not address the segmentation of text into sub-words explicitly; instead, a dataset of sub-words was created and later used to build the dictionary. A word spotting system for handwritten Arabic documents that adapts to the nature of Arabic writing is introduced in [22]; the system recognizes PAWs and then reconstructs and spots words using language models. The number of PAWs, along with the number and position of dots, was used as input to a lexicon pruning method in [8]. A lexicon-reduction strategy for Arabic documents based on the structure of Arabic sub-word shapes, described by their topology, is introduced in [7]. Only a few attempts in the literature have aimed at segmenting an Arabic word or text line into its constituent sub-words. In [20], the vertical histogram of the line image is employed to segment printed Arabic text lines into sub-words. The strategy for segmenting the text line into sub-words in [14,15] is based on the vertical projection of the black pixels of the line onto the X-axis; such a strategy cannot handle overlapping sub-words. Parvez et al. [3] addressed sub-word segmentation as part of a recognition system for handwritten Arabic words. First, all connected components in the line are extracted, followed by baseline estimation. The baseline information is then used to determine the primary components of the sub-words; all other components are considered secondary components.
Each secondary component is then assigned to a primary component using a set of predefined rules. Finally, each primary component, along with its associated secondary components, is marked as a PAW and passed to the subsequent stages of character segmentation and recognition. The sub-word segmentation technique in [9] proceeds as follows: the word baseline is estimated using the horizontal projection histogram; then the main and secondary bodies are identified, the main bodies of the sub-words are extracted, and the secondary bodies are assigned to their respective main bodies using predefined rules to yield the sub-words.
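To make the limitation of projection-based splitting concrete, here is a generic vertical-projection splitter in Python. It is a simplified sketch, not the exact procedure of [14,15,20]: the line image is assumed to be a list of rows with 1 for ink, and a cut is made wherever a column contains no ink. Two sub-words that share columns without touching end up in a single segment.

```python
def projection_segments(img):
    """Split a binary line image at ink-free columns.

    img: list of rows, 1 = foreground (ink). Returns inclusive column ranges
    (first_col, last_col), ordered right to left like Arabic writing.
    """
    n_cols = len(img[0])
    profile = [sum(row[c] for row in img) for c in range(n_cols)]  # vertical histogram
    segments, start = [], None
    for c, count in enumerate(profile):
        if count and start is None:            # a run of inked columns begins
            start = c
        elif not count and start is not None:  # the run ends at an empty column
            segments.append((start, c - 1))
            start = None
    if start is not None:
        segments.append((start, n_cols - 1))
    return segments[::-1]


separate = [[1, 1, 0, 0, 1, 1]]
overlapping = [[0, 0, 1, 1, 1, 1],   # tail of a right-hand sub-word
               [0, 0, 0, 0, 0, 0],
               [1, 1, 1, 1, 0, 0]]   # a left-hand sub-word reaching under it
print(projection_segments(separate))     # [(4, 5), (0, 1)]
print(projection_segments(overlapping))  # [(0, 5)]: two components, one segment
```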
Fig.3. Arabic script based words ((a)-(c)).
Adapting connected component analysis to segment an Arabic word or text line into sub-words seems more appealing. However, the existing methods depend essentially on strictly identifying each component as either primary or secondary, based on the component's size and its distance from the baseline: secondary components are expected to be relatively small and far from the baseline. Furthermore, predefined rules are applied to associate the secondary components with their respective primary components. Applying such rigid rules is arguably error-prone, especially for handwritten text. For example, the component C4 (a primary component) in Fig. 3(a) is smaller than the component C6 (a secondary one), and the component C3 (a primary component) in Fig. 3(b) is smaller than the component C6 (a secondary component). It can also be observed that the component C6 is not far from the baseline. The component C1 in Fig. 3(c), which constitutes a complete sub-word, also has a relatively small size. In this paper we propose an algorithm that segments a text line into its constituent sub-words based on a graph-theoretic analysis of connected components. The proposed algorithm is inspired solely by the sequence of Arabic writing, which allowed us to introduce a simple yet efficient segmentation technique. To handle displaced secondary components, an additional refinement stage is incorporated into the algorithm. Furthermore, the proposed algorithm is flexible in the sense that it may associate a secondary component with more than one primary component; this ensures that the secondary component is associated with its true primary component and leaves the resolution of the ambiguity to the subsequent stages of the recognition system.
III. METHODOLOGY
The proposed algorithm comprises four stages: preprocessing, extraction of connected components, segmentation of the text line into sub-words, and refinement. The details of each stage are presented in the following sub-sections.
A. Preprocessing
The goal of preprocessing is to prepare the image for the subsequent stages. To close gaps and fill small holes, a morphological closing operation is applied to the image. Furthermore, two rows of background (OFF) pixels are padded to the image, one at the top and one at the bottom. Similarly, two columns are padded, one on the left and one on the right (the need for this operation is explained later). The sample images used in the experiments are drawn from datasets that are already binarized and de-noised [10-13]; hence, no further binarization or noise removal is applied.
B. Extraction of Connected Components
In a binary image, it is important to identify the foreground components present in the image. The general approach to this problem is to associate with each pixel of a connected component a label by which the component is identified [16]. Different techniques have been proposed for such labeling; their output is an image of the same size as the input image. A final scan of the labeled image may then be required to identify the pixels associated with each component, adding a further burden to the labeling task, which itself may require more than one pass over the input image. Unlike many existing methods, the connected component extraction algorithm proposed here extracts the set of foreground components directly in a single pass, retaining each foreground component in terms of the pixels comprising it. Moreover, the algorithm follows the direction of Arabic writing, which makes the subsequent analysis of connected components for sub-word segmentation easier. In this stage, the connected components are extracted, and each component is retained as the coordinates of the pixels that belong to it. To this end, the concept of a grid graph [16] is used to represent the binary image as a graph.
Given a set of pixels F in the image and a connectivity relationship on which a digital topology is based, the grid graph G(V,E) of the image (where V is the set of vertices and E is the set of edges) is defined as follows [16]:
i. To every pixel in F there corresponds a vertex in V.
ii. An edge (u,v) exists in E whenever the pixels p and q corresponding to the vertices u and v are neighbors in the digital topology.
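A minimal Python sketch of this definition follows, with F taken as the ON pixels and the 8-neighborhood as the connectivity (the choice used in this work). The adjacency-set representation and the breadth-first search are our own illustrative choices: the BFS is a standard way of reading off the connected components of the graph, not the single-pass extraction algorithm proposed in this paper.

```python
from collections import deque


def grid_graph(img):
    """Vertices = ON pixels (row, col); edges join 8-neighbouring ON pixels."""
    rows, cols = len(img), len(img[0])
    adj = {(r, c): set() for r in range(rows) for c in range(cols) if img[r][c]}
    for (r, c), neighbours in adj.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if (dr, dc) != (0, 0) and q in adj:
                    neighbours.add(q)
    return adj


def graph_components(adj):
    """Connected components of the grid graph via breadth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
g = grid_graph(img)
n_edges = sum(len(nb) for nb in g.values()) // 2  # each edge is stored twice
print(len(g), n_edges, len(graph_components(g)))  # 5 4 2
```

Here the diagonal pair (0,0)-(1,1) is an edge only because 8-adjacency is used; under 4-adjacency it would not be.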
This concept is used in our work, where the set F is defined as the foreground (ON) pixels of the binary image and the 8-neighborhood is used. This representation of the image as a grid graph is used to extract the connected components present in the image. The algorithm proceeds as follows. The image is scanned from top to bottom and from right to left (the coordinate system defined in [17] is used, in which a pixel has coordinates (x,y), where the first element refers to the row and the second to the column). Upon encountering the first foreground pixel, its coordinates are stored in a list that will later contain all the pixels connected to it. The pixels in the 8-neighborhood of this pixel are traced; if any of them is a foreground pixel, its coordinates are also stored in the same list. The scanning continues, and if a foreground pixel that does not belong to any existing list is encountered, a new list is created to hold its coordinates, reflecting the possibility that the pixel belongs to a different component. Sometimes two lists are merged: when a pixel that is about to be inserted into a given list already belongs to another list, the pixels contained in the two lists belong to the same component. The purpose of padding the input image with background pixels is to avoid repeatedly checking whether a foreground pixel lies on the image border; it also makes the tracing of 8-neighborhood pixels uniform rather than dependent on the position of the foreground pixel. Finally, since the connected components are extracted offline, the image could equally be scanned from left to right. However, to align with the direction of Arabic writing and to facilitate the subsequent stage of sub-word segmentation, we opted for right-to-left scanning.
Algorithm: Connected Component Extraction
Input: I, binary image
Output: S={C1,C2,…,Cn}, the set of connected components, where Ci={(x1,y1),(x2,y2),…,(xk,yk)} is the set of coordinates of the pixels comprising the component Ci
Method
Step 1: Apply preprocessing on I
Step 2: Dummy_List←Φ
Step 3: Pool_of_Lists←Dummy_List
Step 4: Scan the image column-wise from right to left (top to bottom within each column)
Step 5: If the pixel is OFF, goto Step 4; else goto Step 6
Step 6: If the pixel does not appear in any list included in Pool_of_Lists
    Create a new list
    Store the pixel coordinates in the list
    Update Pool_of_Lists
    Traverse the neighbors in the 8-neighborhood
    for each ON pixel in the 8-neighborhood do
        if the pixel is common with another list
            Merge the two lists
        else
            Store the coordinates of the pixel in the new list
    end for
    goto Step 4
else goto Step 7
Step 7: Identify the list where the pixel exists
    Traverse the neighbors in the 8-neighborhood
    for each ON pixel in the 8-neighborhood do
        if the neighbor pixel exists in the same list as the pixel under consideration
            continue with the next neighbor
        else if the neighbor pixel is common with another list
            Merge the two lists
        else
            Store the coordinates of the neighbor pixel in the same (identified) list
    end for
    goto Step 4
Algorithm ends
The above algorithm is illustrated with a simple example in Fig. 4; the corresponding sub-graphs are shown in Fig. 4(b).
Fig.4. Sample input image ((a) binary image; (b) corresponding sub-graphs).
Without loss of generality, it is assumed that the given image has already been preprocessed. First, a dummy list is created. Then, the input image is scanned from top to bottom and from right to left. When the first foreground pixel (located at (5,10)) is encountered, it is checked whether this pixel exists in any available list (here, only the dummy list); since it does not, a new list C1 is created and the coordinates (5,10) are stored in C1. The 8-neighborhood of the pixel under consideration is traced, and the pixels located at (4,9), (5,9) and (6,10) are stored in C1. The scanning continues with the pixel located at (6,10), which is found to be already present in C1. Therefore, the foreground pixels neighboring it are also stored in C1, reflecting the fact that they belong to the same component as the pixel at (6,10). Thus C1 is updated and the pixel at (7,9) is stored in it (the pixels at (5,9) and (5,10) are already stored in C1, so no further action is needed). When the pixel at (2,7) is scanned, it is found not to exist in any list available so far (i.e., the dummy list and C1); hence a new list C2 is created and the coordinates (2,7) are stored in it. Again, the 8-neighborhood of the pixel is traced and the coordinates (3,7) are stored in C2. Moving on to the pixel at (5,7), it is again found not to exist in any list; hence a new list C3 is created, and the coordinates of the current pixel, (5,7), and of the neighboring foreground pixels (5,6) and (6,6) are stored in C3. The algorithm proceeds to the pixel at (7,7), which is already stored in C1. Its 8-neighborhood is scanned, and it is established that (6,6), an 8-neighbor of the pixel at (7,7), is present in C3. Hence, C1 and C3 are merged. Overall, the algorithm produces 4 lists corresponding to the 4 different foreground components present in the image: C1={(5,10), (4,9), (5,9), (6,10), (7,9), (7,8), (8,9), (7,7), (5,7), (5,8), (6,6), (7,6), (4,5), (2,5), (3,5)}, C2={(2,7), (3,7)}, C3={(9,7), (10,6), (10,5), (9,4), (8,4), (8,3)} and C4={(6,4), (6,3), (6,2)}. Upon completion of connected component extraction, we obtain the set of connected components ordered by their appearance in the text image. Each component is described by its borders, i.e., (xmin,xmax) and (ymin,ymax).
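The list-merging procedure can be sketched in Python as follows. This is our reading of the pseudocode, not the authors' implementation: the image is a list of rows with 1 for ON pixels, and the lists are kept in a dictionary keyed by creation order so that, when two lists merge, the older one survives and the output stays in order of appearance. Bounds are still checked, although with the one-pixel padding of the preprocessing stage they would never be exceeded. The usage example rebuilds the Fig. 4 pixels from the lists above; we add the pixel (5,6), which the walkthrough names as a foreground pixel but which is missing from the printed C1 (without it, (2,5), (3,5) and (4,5) would not be connected to the rest of C1).

```python
def extract_components(img):
    """Single-pass connected component extraction, scanning right to left.

    img: list of rows, 1 = ON. Returns one list of (row, col) coordinates
    per 8-connected component, in order of first appearance in the scan.
    """
    rows, cols = len(img), len(img[0])
    owner = {}    # pixel -> id of the list currently holding it
    lists = {}    # list id -> pixel coordinates (ids grow in creation order)
    next_id = 0
    for c in range(cols - 1, -1, -1):          # columns from right to left
        for r in range(rows):                  # top to bottom within a column
            if not img[r][c]:
                continue
            if (r, c) not in owner:            # pixel in no list yet: open a new list
                owner[(r, c)] = next_id
                lists[next_id] = [(r, c)]
                next_id += 1
            cur = owner[(r, c)]
            for dr in (-1, 0, 1):              # trace the 8-neighbourhood
                for dc in (-1, 0, 1):
                    qr, qc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= qr < rows and 0 <= qc < cols):
                        continue
                    if not img[qr][qc]:
                        continue
                    other = owner.get((qr, qc))
                    if other is None:          # new ON neighbour joins the current list
                        owner[(qr, qc)] = cur
                        lists[cur].append((qr, qc))
                    elif other != cur:         # neighbour already in another list: merge
                        keep, drop = min(cur, other), max(cur, other)
                        for p in lists[drop]:
                            owner[p] = keep
                        lists[keep].extend(lists.pop(drop))
                        cur = keep
    return [lists[k] for k in sorted(lists)]


C1 = [(5, 10), (4, 9), (5, 9), (6, 10), (7, 9), (7, 8), (8, 9), (7, 7), (5, 7),
      (5, 8), (6, 6), (7, 6), (4, 5), (2, 5), (3, 5), (5, 6)]  # (5, 6) added, see above
C2 = [(2, 7), (3, 7)]
C3 = [(9, 7), (10, 6), (10, 5), (9, 4), (8, 4), (8, 3)]
C4 = [(6, 4), (6, 3), (6, 2)]
img = [[0] * 12 for _ in range(12)]   # 1-based coordinates with an OFF border
for r, c in C1 + C2 + C3 + C4:
    img[r][c] = 1
comps = extract_components(img)
print([len(comp) for comp in comps])  # [16, 2, 6, 3]
```

Keeping the older list on every merge is what preserves the right-to-left order of appearance that the sub-word segmentation stage relies on.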
This can be computed straightforwardly by sorting the points of the connected component with respect to the x-axis and the y-axis, respectively. A matrix, component_border_matrix (comp_border_matrix hereafter), is maintained to store this information: each component is assigned an ID number according to its order of appearance, and its border information is retained (see Table 1). The subsequent operations are based on this matrix.
Table 1. Comp_border_matrix
ID   xmin   xmax   ymin   ymax
1    2      8      5      10
2    2      3      7      7
3    8      10     3      7
4    6      6      2      4
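The border information in Table 1 needs only the extreme coordinates of each component. The Python sketch below takes the minimum and maximum directly rather than sorting (an equivalent, linear-time choice of ours); x is the row index and y the column index, following the coordinate convention above. Run on the component lists of Fig. 4, it reproduces Table 1.

```python
def border_matrix(components):
    """Rows (ID, xmin, xmax, ymin, ymax), with IDs following order of appearance."""
    matrix = []
    for comp_id, pixels in enumerate(components, start=1):
        xs = [x for x, _ in pixels]   # row coordinates
        ys = [y for _, y in pixels]   # column coordinates
        matrix.append((comp_id, min(xs), max(xs), min(ys), max(ys)))
    return matrix


C1 = [(5, 10), (4, 9), (5, 9), (6, 10), (7, 9), (7, 8), (8, 9), (7, 7), (5, 7),
      (5, 8), (6, 6), (7, 6), (4, 5), (2, 5), (3, 5)]
C2 = [(2, 7), (3, 7)]
C3 = [(9, 7), (10, 6), (10, 5), (9, 4), (8, 4), (8, 3)]
C4 = [(6, 4), (6, 3), (6, 2)]
for row in border_matrix([C1, C2, C3, C4]):
    print(row)
```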
C. Sub-word Segmentation
The core operation of segmenting text into sub-words is performed in this stage. It is inspired by the sequence of Arabic writing, in which sub-words are written one after the other from right to left. The sub-word segmentation is accomplished by analyzing the components retained in the comp_border_matrix constructed in the previous stage. Recall that an Arabic sub-word consists essentially of a main body (primary component). Sometimes there are diacritical marks (dots and/or Hamza) associated with the main body; these are known as secondary components. For example, the first sub-word of the word given in Fig. 5(b) consists only of the main body (C1), whereas the second sub-word contains a primary component (C2) and a secondary component (C3). Initially, it is assumed that the position information, i.e., (xmin,xmax) and (ymin,ymax), of the primary component of the sub-word is sufficient to extract the sub-image containing that sub-word. However, as mentioned earlier, a sub-word may consist of both primary and secondary components. To handle this scenario, the concept of a satellite component is introduced.
Definition: Satellite Component. A component Ci is said to be a satellite component with respect to a component Cj if it is fully placed within the borders of Cj.
For example, the component C2 in Fig. 4(a) is a satellite component with respect to the component C1. The component C3 in Fig. 5(b) is a satellite component with respect to the component C2, whereas the component C5 in Fig. 5(b) is a satellite component with respect to both C4 and C2.
Fig.5. Satellite Components ((a) and (b)).
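The definition reduces to an interval test on comp_border_matrix rows. In this Python sketch we assume, as an interpretation, that "the borders of Cj" means its column range (ymin, ymax). This reading agrees with the Fig. 6 example, where components 3 and 4 in Table 2 are treated as satellites of component 2 even though their xmin values (89 and 85) are smaller than that of component 2 (95). The rows used below are taken from Table 1.

```python
Row = tuple[int, int, int, int, int]  # (ID, xmin, xmax, ymin, ymax)


def is_satellite(ci: Row, cj: Row) -> bool:
    """True if ci lies fully within the column borders (ymin, ymax) of cj."""
    return ci[0] != cj[0] and cj[3] <= ci[3] and ci[4] <= cj[4]


def satellites_of(cj: Row, matrix: list[Row]) -> list[int]:
    """IDs of all components that are satellites with respect to cj."""
    return [ci[0] for ci in matrix if is_satellite(ci, cj)]


TABLE1 = [(1, 2, 8, 5, 10), (2, 2, 3, 7, 7), (3, 8, 10, 3, 7), (4, 6, 6, 2, 4)]
print(satellites_of(TABLE1[0], TABLE1))  # [2]: C2 is a satellite of C1
```

Because the test looks only at column ranges, a component can be a satellite of several components at once, which matches the C5 example of Fig. 5(b).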
The sub-word segmentation proceeds as follows. The rightmost component is considered to be (the primary component of) the first sub-word in the text line, and the satellite components associated with it, if any, are identified. If no satellite component is associated with it, the border information of the component under consideration is used to extract the sub-image contained between (xmin,xmax) and (ymin,ymax). Otherwise, the information of the satellite component(s) is used to modify (xmin,xmax), if necessary, prior to extracting the sub-image containing the sub-word. If the sub-word overlaps the succeeding sub-word, a translation operation is performed to resolve the overlap. This is repeated until all the sub-words are segmented. The detailed algorithm is listed below.
Algorithm: Sub-word Segmentation
Input: f, text line (word) image; comp_border_matrix, matrix containing the components' information; n, number of connected components
Output: SW1, SW2, …, SWk, the sub-words contained in the input text line
Method:
current_sub_word=1
while (current_sub_word<=n)
    Obtain the succeeding sub-word, i.e., the next successive component that is not a satellite component with respect to current_sub_word
    if the succeeding sub-word does not overlap the current sub-word
        Obtain the satellite component(s) associated with current_sub_word
        if there exists no satellite component
            Use the information contained in the comp_border_matrix to extract the sub-image containing the component current_sub_word
        else
            Update (xmin,xmax) using the positional information of the satellite component(s)
    else
        Extract the sub-image (SI) bounded by the rightmost column of the right sub-word (rc) and the leftmost column of the left sub-word (lc)
        Identify the satellite component(s) associated with current_sub_word
        Perform a translation operation of the right component, along with its satellite component(s), if any, by the …
        where ymax(lc) is the ymax of the left component and ymin(rc) is the ymin of the right component
        Extract the sub-image of SI which contains the current sub-word along with its satellite component(s), if any
    current_sub_word=succeeding sub-word
Algorithm ends
For the sake of clarity, the proposed algorithm is explained using the word shown in Fig. 6(a), which consists of 4 sub-words. The comp_border_matrix representing the components of the input word is shown in Table 2.
To start with, the first component (component 1 in the matrix) is considered to be (the primary component of) the first sub-word in the image. Based on an analysis of the (ymin,ymax) of the following component, it is deduced that no satellite component is associated with component 1; hence the component that immediately follows (component 2) is considered to be (the primary component of) the succeeding sub-word. Based on an analysis of the (ymin,ymax) of the two components, it is established that they do not overlap. Since no satellite components are associated with component 1, its (xmin,xmax) and (ymin,ymax) are used to extract the sub-image, i.e., the first sub-word in the image (see Fig. 6(b)). The algorithm proceeds to (the primary component of) the second sub-word. Two satellite components (components 3 and 4) are associated with component 2, and hence component 5 is identified as (the primary component of) the succeeding sub-word. The position information (xmin,xmax) of the satellite components is used to modify the (xmin,xmax) of the sub-image containing the second sub-word (here only xmin is modified; see Fig. 6(c)). Component 5 is then picked for processing. It has no satellite components, so component 6 is identified as (the primary component of) the succeeding sub-word. It is established that (the main body of) the current sub-word overlaps (the main body of) the succeeding sub-word. Therefore, the sub-image containing the current sub-word along with the succeeding sub-word is extracted (Fig. 6(d)). This is followed by a translation of the current sub-word by a suitable amount to resolve the overlap (Fig. 6(e)). Finally, the sub-image containing the sub-word under consideration is extracted (Fig. 6(f)). Moving on to the next sub-word, component 6 is identified as its primary component. Component 7 is found to be a satellite component with respect to component 6, and hence component 8 is considered to form (the primary component of) the succeeding sub-word. Since these two sub-words also overlap, a process similar to the previous case is applied (Fig. 6(g)-(i)). Finally, the sub-word constituted by component 8 is extracted (Fig. 6(j)).
Table 2. Comp_border_matrix of the image given in Fig. 6(a)
ID   xmin   xmax   ymin   ymax
1    93     164    345    367
2    95     175    222    337
3    89     102    270    282
4    85     98     253    266
5    117    208    83     218
6    144    178    62     102
7    113    126    73     87
8    114    126    57     70
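The selection of primary components and the attachment of satellites can be sketched in Python on comp_border_matrix rows. The following assumptions are ours: satellites are tested on the column range (ymin, ymax); a satellite is attached to every primary component whose column range contains it, following the remark that a secondary component may be associated with more than one primary component; and two sub-words are flagged as overlapping when the ymax of the left one reaches the ymin of the right one. The translation step is omitted because its exact amount is not given in the listing. Run on Table 2, the sketch yields the primaries 1, 2, 5, 6 and 8, flags the overlaps 5/6 and 6/8 described in the walkthrough, and widens only the xmin of the second sub-word (to 85).

```python
Row = tuple[int, int, int, int, int]  # (ID, xmin, xmax, ymin, ymax)


def is_satellite(ci: Row, cj: Row) -> bool:
    """ci lies within the column range (ymin, ymax) of cj."""
    return ci[0] != cj[0] and cj[3] <= ci[3] and ci[4] <= cj[4]


def group_subwords(matrix: list[Row]) -> list[dict]:
    """Pick primary components right to left and attach their satellites."""
    primaries, i = [], 0
    while i < len(matrix):
        primaries.append(matrix[i])
        j = i + 1
        # skip successive components that are satellites of the current primary
        while j < len(matrix) and is_satellite(matrix[j], matrix[i]):
            j += 1
        i = j
    primary_ids = {p[0] for p in primaries}
    subwords = []
    for k, p in enumerate(primaries):
        sats = [c for c in matrix if c[0] not in primary_ids and is_satellite(c, p)]
        members = [p] + sats
        left = primaries[k + 1] if k + 1 < len(primaries) else None
        subwords.append({
            "primary": p[0],
            "satellites": [s[0] for s in sats],
            # sub-image rows (xmin, xmax), widened by the satellites
            "rows": (min(m[1] for m in members), max(m[2] for m in members)),
            # the next (left) sub-word shares columns with this one
            "overlaps_left": left is not None and left[4] >= p[3],
        })
    return subwords


TABLE2 = [(1, 93, 164, 345, 367), (2, 95, 175, 222, 337), (3, 89, 102, 270, 282),
          (4, 85, 98, 253, 266), (5, 117, 208, 83, 218), (6, 144, 178, 62, 102),
          (7, 113, 126, 73, 87), (8, 114, 126, 57, 70)]
for sw in group_subwords(TABLE2):
    print(sw["primary"], sw["satellites"], sw["rows"], sw["overlaps_left"])
```

Component 8, the displaced dot, comes out as a sub-word of its own, which is exactly the over-segmentation that the refinement stage addresses.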
Fig.6. Stages of sub-word segmentation algorithm ((a)-(j)).
It is worth noting that the first three sub-words are segmented correctly, whereas the last sub-word is over-segmented: the displaced dot component is misidentified as a separate sub-word. To address such cases, an additional stage that refines the segmentation results is developed; it is described in detail in the following sub-section.
D. Refinement of Segmentation
The results of the previous stage are subjected to a refinement stage to compensate for the over-segmentation caused by displaced secondary components (i.e., dots and Hamza). The refinement strategy is as follows: the height of each candidate sub-word is computed. If the height is less than a threshold and the sub-word overlaps either the preceding or the succeeding sub-word, the sub-image containing the overlapped sub-word (along with its satellite components, if any) and the sub-word under investigation is reconstructed from the pixel information contained in the respective components. The detailed algorithm for reconstructing the sub-image containing a particular set of components of a binary image is listed below.
Algorithm: Reconstruct sub-image
Input: C1, C2, …, Cn: pixel coordinates of the components C1, C2, …, Cn, respectively
Output: SI, the reconstructed sub-image
Method:
Step 1: Find the minimum x-coordinate of each component, mx_Ci, i=1,2,…,n
Step 2: Find the maximum x-coordinate of each component, Mx_Ci, i=1,2,…,n
Step 3: Find the minimum y-coordinate of each component, my_Ci, i=1,2,…,n
Step 4: Find the maximum y-coordinate of each component, My_Ci, i=1,2,…,n
Step 5: Set mx=min(mx_Ci), Mx=max(Mx_Ci), my=min(my_Ci), My=max(My_Ci), i=1,2,…,n
Step 6: Declare SI to be an image of (Mx-mx+1) rows and (My-my+1) columns containing only background (OFF) pixels
Step 7: For each component Ci, i=1 to n do
    For each pixel with coordinates (i,j) do
        Map the pixel (i,j) to the output image (SI) pixel (i-mx+1, j-my+1)
        Set the output image pixel to a foreground (ON) pixel
Algorithm ends
The components C3={(9,7), (10,6), (10,5), (9,4), (8,4), (8,3)} and C4={(6,4), (6,3), (6,2)} of the image given in Fig. 4 are reconstructed as follows. First, the (xmin,xmax) and (ymin,ymax) of each component are retrieved: they are (8,10) and (3,7) for C3, and (6,6) and (2,4) for C4. Therefore, mx=6, Mx=10, my=2 and My=7. Then an image of (10-6+1=5) rows and (7-2+1=6) columns is created, with all pixels set to background. This is followed by mapping the pixels of C3 and C4 to the output image: a pixel p with coordinates (i,j) is mapped to (xout,yout)=(i-mx+1, j-my+1). For example, the pixel (9,7) of C3 is mapped to (9-6+1, 7-2+1)=(4,6). Similarly, the pixel (6,2) of C4 is mapped to (1,1). Each pixel of a component Ci is mapped to the corresponding output coordinate, and the corresponding pixel of the output image is set to foreground (shown as 0 in Fig. 7, where background is shown as 1).
0 0 0 1 1 1
1 1 1 1 1 1
1 0 0 1 1 1
1 1 0 1 1 0
1 1 1 0 0 1
Fig.7. Reconstruction of the components C3 and C4 of the image given in Fig. 4
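The reconstruction algorithm translates directly into Python. This sketch uses 0-based list indices internally, so the 1-based target (i-mx+1, j-my+1) becomes index [i-mx][j-my]; it also marks foreground as 1, whereas Fig. 7 displays foreground as 0, so the print statement inverts the values to reproduce Fig. 7 from C3 and C4.

```python
def reconstruct(components):
    """Paint the pixels of the given components into a tight sub-image (1 = ON)."""
    pixels = [p for comp in components for p in comp]
    # Steps 1-5: extreme coordinates over all components
    mx = min(x for x, _ in pixels)
    Mx = max(x for x, _ in pixels)
    my = min(y for _, y in pixels)
    My = max(y for _, y in pixels)
    # Step 6: (Mx-mx+1) x (My-my+1) image of background pixels
    si = [[0] * (My - my + 1) for _ in range(Mx - mx + 1)]
    # Step 7: 1-based (i-mx+1, j-my+1) is 0-based [i-mx][j-my]
    for i, j in pixels:
        si[i - mx][j - my] = 1
    return si


C3 = [(9, 7), (10, 6), (10, 5), (9, 4), (8, 4), (8, 3)]
C4 = [(6, 4), (6, 3), (6, 2)]
for row in reconstruct([C3, C4]):
    print(" ".join(str(1 - v) for v in row))  # Fig. 7 shows foreground as 0
```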
Returning to the example of Fig. 6, the last sub-word was over-segmented into two sub-words: the first includes the main body of the sub-word along with a dot, whereas the second contains only the displaced dot. When the reconstruction is applied to the components of the fourth and fifth sub-words, the over-segmentation is resolved; the output is shown in Fig. 8.
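The refinement rule can be sketched in Python at the level of comp_border_matrix rows rather than pixels. The following assumptions are ours: a sub-word's height is its row extent (xmax - xmin + 1); the threshold is 60% of the average component height, the value selected in the experimental section; "overlap" means sharing columns; and a short sub-word is merged into the preceding sub-word if possible, otherwise into the succeeding one. The candidate sub-words below are the ones obtained for Fig. 6(a) from Table 2: {1}, {2, 3, 4}, {5}, {6, 7} and {8}.

```python
Row = tuple[int, int, int, int, int]  # (ID, xmin, xmax, ymin, ymax)


def box(subword: list[Row]) -> tuple[int, int, int, int]:
    """(xmin, xmax, ymin, ymax) covering all components of a sub-word."""
    return (min(c[1] for c in subword), max(c[2] for c in subword),
            min(c[3] for c in subword), max(c[4] for c in subword))


def share_columns(a: list[Row], b: list[Row]) -> bool:
    _, _, a0, a1 = box(a)
    _, _, b0, b1 = box(b)
    return a0 <= b1 and b0 <= a1


def refine(subwords: list[list[Row]], components: list[Row], ratio: float = 0.6):
    """Merge short sub-words (e.g. displaced dots) into an overlapping neighbour."""
    avg_height = sum(c[2] - c[1] + 1 for c in components) / len(components)
    threshold = ratio * avg_height
    out = [list(sw) for sw in subwords]
    i = 0
    while i < len(out):
        xmin, xmax, _, _ = box(out[i])
        if xmax - xmin + 1 < threshold:
            if i > 0 and share_columns(out[i], out[i - 1]):
                moved = out.pop(i)
                out[i - 1].extend(moved)       # merge into the preceding sub-word
                continue
            if i + 1 < len(out) and share_columns(out[i], out[i + 1]):
                moved = out.pop(i)
                out[i] = moved + out[i]        # merge into the succeeding sub-word
                continue
        i += 1
    return out


T = {r[0]: r for r in [(1, 93, 164, 345, 367), (2, 95, 175, 222, 337),
                       (3, 89, 102, 270, 282), (4, 85, 98, 253, 266),
                       (5, 117, 208, 83, 218), (6, 144, 178, 62, 102),
                       (7, 113, 126, 73, 87), (8, 114, 126, 57, 70)]}
candidates = [[T[1]], [T[2], T[3], T[4]], [T[5]], [T[6], T[7]], [T[8]]]
refined = refine(candidates, list(T.values()))
print([[c[0] for c in sw] for sw in refined])  # [[1], [2, 3, 4], [5], [6, 7, 8]]
```

With the Table 2 heights the average is 41.875 rows, so the threshold is about 25.1; only the dot (component 8, 13 rows) falls below it, and it is merged into the overlapping sub-word {6, 7}, leaving the 4 sub-words of Fig. 6(a).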
Fig.8. Output of reconstruction operation.
IV. EXPERIMENTAL RESULTS
Experiments were conducted on handwritten images (words/text lines) drawn from four different datasets [10-13]. The sample consists of 350 images of Arabic/Persian text; details are listed in Table 3. The datasets are briefly described below.
Datasets
The IESK-ArDB [10] database was developed at the Institute for Electronics, Signal Processing and Communication (IESK) at Otto-von-Guericke University Magdeburg. It contains more than 4000 Arabic word images in addition to more than 6000 segmented character images, collected from 22 writers from different Arabic countries. The IFN/ENIT [13] dataset was developed by the Institute of Communications Technology (IFN) at Technical University Braunschweig in Germany and the National School of Engineers of Tunis (ENIT). It contains handwritten Arabic names of 946 towns/villages in Tunisia (a name may consist of several words) written by 411 writers, for a total of 26459 handwritten images. The KHATT [12] dataset consists of 1000 handwritten forms written by 1000 distinct writers from different countries. Each writer wrote 6 paragraphs: 1 fixed-text paragraph (written twice), 2 randomly selected paragraphs, and 2 optional paragraphs on a subject of the writer's choice. The fourth dataset is a Persian dataset that is part of the multi-lingual dataset introduced in [11]; the Persian part consists of 140 unconstrained handwritten pages.
Initially, the experiments were carried out without the refinement operation. Overall, 77.63% of the sub-words were segmented successfully. Some successful segmentation results are shown in Fig. 9. The detailed results are given in Table 4, and the correct segmentation results for each dataset are shown in Table 5.
Table 3. Experimentation Samples Details
Total no. of samples          350
No. of word images            212
No. of text line images       138
Total no. of words            988
Total no. of sub-words        2325
Table 4. Experimentation Results
Segmentation Category                                          Percentage
Correct segmentation                                           77.63%
Over-segmentation: Dot/Hamza displacement                      16.27%
Over-segmentation: Pen lifting                                 2.75%
Under-segmentation: Fully overlapped sub-words                 0.58%
Under-segmentation: Touching sub-words                         2%
Incorrect association of secondary component                   0.77%
Table 5. Correct Segmentation results
Dataset    No. of Samples   No. of Sub-words   No. of correctly segmented sub-words   Percent
ArDB       150              356                320                                    89.88%
IFN/ENIT   150              669                549                                    82.06%
KHATT      25               587                432                                    73.59%
Persian    25               713                504                                    70.69%
Fig.9. Some successful segmentation results (input images and output sub-words).
Overall, 19.02% of the sub-words were over-segmented. The main reason for over-segmentation is the displacement of dot/Hamza components (16.27%). Pen lifting also caused over-segmentation (2.75%); however, in most of the cases (87%) where pen lifting is the reason for over-segmentation, there is no effect on the subsequent character segmentation. Fig. 10 shows some text images in which some sub-words were over-segmented (the over-segmented sub-words are indicated by rectangles). Another segmentation error is under-segmentation. The first (and inevitable) source of under-segmentation is the touching of sub-words. Furthermore, there are cases where a sub-word totally overlaps another sub-word, leading it to be identified as a satellite component and causing under-segmentation. Fig. 11 shows some under-segmentation results (under-segmentation is highlighted by encircling). Finally, in a few cases the algorithm segmented the sub-word successfully but incorrectly associated a secondary component with it; Fig. 12 shows such cases.
Fig.10. Some over-segmentation results (columns: input image; output sub-words; remarks: dot displacement, Hamza displacement, dot displacement, pen lifting).

Fig.11. Some under-segmentation results (columns: input image; output sub-words; remarks: fully overlapped sub-words, touching sub-words).
Fig.12. Incorrect association of secondary components (columns: input image; output sub-words).

Table 6. Thresholding experimentation.
No. of secondary components | Threshold | No. of secondary components detected | Percentage | No. of primary components categorized as secondary components
850 | 50% | 757 | 89% | 0
850 | 55% | 773 | 91% | 2
850 | 60% | 811 | 95% | 4
850 | 65% | 818 | 96% | 19
850 | 70% | 833 | 98% | 31
Table 7. Correct segmentation results after the refinement.
Dataset | No. of samples | No. of sub-words | No. of correctly segmented sub-words | Percent
ArDB | 150 | 356 | 348 | 97.75%
IFN/ENIT | 150 | 669 | 632 | 94.47%
KHATT | 25 | 587 | 523 | 89.10%
Persian | 25 | 713 | 618 | 86.67%
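The overall rates quoted in the text (77.63% before refinement, 91.23% after) are pooled over all sub-words rather than averaged over the four per-dataset percentages. The short check below recomputes them from the counts in Tables 5 and 7; the pooled totals do not depend on which column is assigned to which dataset. `COUNTS` and `overall_rates` are illustrative names only.

```python
# (sub-words, correctly segmented before refinement, after refinement) per dataset,
# as listed in Tables 5 and 7
COUNTS = {
    "ArDB": (356, 320, 348),
    "IFN/ENIT": (669, 549, 632),
    "KHATT": (587, 432, 523),
    "Persian": (713, 504, 618),
}

def overall_rates(counts):
    """Pooled correct-segmentation rate (in %) before and after refinement."""
    total = sum(n for n, _, _ in counts.values())
    before = sum(b for _, b, _ in counts.values())
    after = sum(a for _, _, a in counts.values())
    return total, 100 * before / total, 100 * after / total

total, before, after = overall_rates(COUNTS)
print(total, f"{before:.2f}%", f"{after:.2f}%")  # 2325 77.63% 91.23%
```

Pooling weights each dataset by its number of sub-words, so the text-line datasets (KHATT, Persian) contribute more than the word datasets.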
Table 8. Comparative segmentation results.
Method | Correct segmentation
Method of [14] | 74%
Method of [3] | 82%
Proposed method | 91.23%
Fig.13. Comparative results for three input images, panels (a), (b) and (c); each panel shows the input image and the outputs obtained by [14], by [3] and by the proposed algorithm. Over-segmentation is highlighted by rectangles and under-segmentation by encircling.
Table 4 shows that dot/Hamza displacement is the major source of segmentation errors. This motivated us to add a further stage that handles such cases and refines the segmentation results; the refinement stage is described in detail above. Dot/Hamza components are observed to have a relatively small height, and the refinement stage exploits this property. To fix the height threshold, 120 handwritten images (77 word images and 43 text line images) were chosen, and the threshold was expressed as a percentage of the average height of the components in the image (Table 6). A threshold of 60% was chosen; at this value, 95% of the secondary components are detected while only 4 primary components are categorized as secondary, whereas the 65% and 70% thresholds misclassify 19 and 31 primary components, respectively. The experiments were then re-conducted, and the algorithm achieved 91.23% correct segmentation after incorporating the refinement stage. The correct segmentation results for each dataset after the refinement stage are given in Table 7.

Explicit experimentation with printed Arabic text
Finally, although the proposed algorithm was developed to segment handwritten Arabic text into sub-words, an experiment was carried out to test its efficiency when the input is printed text. In this experiment, 30 printed Arabic text lines were chosen from the database introduced in [19]. They are printed in 6 different styles and contain 1249 sub-words in total. The algorithm correctly segments 99.44% of the sub-words, and the only source of error is the incorrect association of satellite components; as stated earlier, this can be handled in the subsequent stages of the recognition system. Apart from this, the algorithm segments printed Arabic text lines into sub-words almost perfectly.
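Returning to the refinement stage, the height criterion it relies on (a component is a dot/Hamza candidate when its height is below 60% of the average component height in the image) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `flag_small_components` is a hypothetical name, component heights are assumed to be bounding-box heights, and the comparison is assumed to be strict. The step that merges a flagged component into its neighbouring sub-word is not reproduced.

```python
def flag_small_components(heights, ratio=0.6):
    """Indices of components shorter than `ratio` x the mean component height.

    heights: bounding-box heights of the connected components of one image
             (assumption: the text only speaks of component "height").
    ratio:   0.6 corresponds to the 60% threshold chosen from Table 6.
    """
    threshold = ratio * sum(heights) / len(heights)
    # Assumption: strict "<"; the text does not say how ties are treated
    return [i for i, h in enumerate(heights) if h < threshold]

# Hypothetical heights: four letter bodies and two dots
print(flag_small_components([30, 28, 32, 6, 5, 25]))  # [3, 4]
```

Because the threshold is relative to the image's own average height, the same ratio can be applied regardless of the scanning resolution or writing size.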
V. COMPARATIVE RESULTS
The sub-word segmentation method in [14] is based on the vertical projection of black pixels onto the x-axis. It achieved 74% correct segmentation on the handwritten samples used in our experiments (see Table 3 for details). Although it can withstand, to some extent, the disconnection caused by pen lifting, its major limitation is that it cannot segment overlapping sub-words. In contrast, the proposed method, which is based on connected component analysis, handles the overlapping nature of Arabic text efficiently, at the cost of some over-segmentation at the sub-word level. However, as stated above, such over-segmentation has no consequence for character segmentation in most cases. Finally, neither method can segment touching sub-words.

In another method, proposed by Parvez et al. [3], the connected components are extracted first. Each connected component is then enclosed in a rectangular bounding box and the area of the box is computed. Components whose bounding-box area is greater than or equal to half of the average area are identified as base components, and the centroids of these components are used to estimate the baseline. The area of the bounding box enclosing a component and the distance from its centroid to the estimated baseline are used to decide whether the component is primary or secondary. Finally, each secondary component is associated with a main component according to the amount of overlap: it is associated with the primary component with which it overlaps most. If a secondary component overlaps more than one component by the same amount, it is associated with the component that is closest to its centroid. Again, the experiment was conducted on the same handwritten samples, and this algorithm achieved a sub-word segmentation rate of 82%.
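Two of the rules described above for the method of Parvez et al. [3] can be sketched as follows: base components are those whose bounding-box area is at least half the average area, and a secondary component is attached to the primary component it overlaps most, with ties broken by centroid distance. This is a hedged sketch, not the code of [3]: `Box`, `base_components`, `horizontal_overlap` and `associate` are hypothetical names, overlap is assumed to mean the overlap of the boxes' x-extents, box centres stand in for pixel centroids, and the baseline-based primary/secondary test is omitted because its parameters are not given here.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Box:
    """Axis-aligned bounding box of a connected component (image coordinates)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self):
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    @property
    def centroid(self):
        # Assumption: the box centre stands in for the component's pixel centroid
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

def base_components(boxes):
    """Components whose box area is at least half of the average box area."""
    half_avg = sum(b.area for b in boxes) / len(boxes) / 2
    return [b for b in boxes if b.area >= half_avg]

def horizontal_overlap(a, b):
    # Assumption: "amount of overlap" = overlap of the boxes' x-extents
    return max(0, min(a.x1, b.x1) - max(a.x0, b.x0))

def associate(secondary, primaries):
    """Index of the primary component a secondary component is attached to."""
    best = max(horizontal_overlap(secondary, p) for p in primaries)
    tied = [i for i, p in enumerate(primaries)
            if horizontal_overlap(secondary, p) == best]
    sx, sy = secondary.centroid
    # Tie-break: the primary whose centroid is closest to the secondary's centroid
    return min(tied, key=lambda i: hypot(primaries[i].centroid[0] - sx,
                                         primaries[i].centroid[1] - sy))

# Hypothetical example: two letter bodies and a dot above the first one
main1, main2, dot = Box(0, 0, 10, 20), Box(12, 0, 30, 20), Box(4, -6, 7, -3)
base = base_components([main1, main2, dot])   # the dot is too small to be a base
print(associate(dot, base))                   # 0, i.e. attached to main1
```

The sketch makes the weakness discussed next easy to see: a genuine but small primary component falls below the half-average area rule and is then treated as a satellite of a neighbouring sub-word.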
The main source of error of this algorithm is its dependence on categorizing the components into primary and secondary ones: a primary component is quite frequently misidentified as a secondary component because of the small area of its bounding box, which leads to under-segmentation in many cases.

In [9], sub-word segmentation is addressed as an intermediate stage of a system for recognizing handwritten Arabic words. The input word is first segmented into sub-words; the sub-words are then segmented into graphemes, which are passed to a recognition engine based on recurrent neural networks. The sub-word segmentation is accomplished as follows. The baseline is estimated using horizontal projection, and the connected components are extracted. A component is considered a secondary body if it is very small compared with the other components, if it is relatively small and far from the baseline, or if it is a vertical line with a large component below it. The components that are not secondary bodies are considered the main bodies of the sub-words. A set of rules is applied to assign the secondary bodies to their respective main bodies. Finally, every main body is extracted together with its secondary bodies as one sub-word and passed to the grapheme segmentation stage. The segmentation results at the sub-word level are not reported, and the details of how the parameters (size, distance) are fixed are not given; hence, it was not possible to include this method in the experimental comparison. It may be observed, however, that the proposed method does not require such parameters and is independent of the classification of components into primary and secondary.
VI. CONCLUSION
Segmenting Arabic text into sub-words is a crucial task for any recognition system, and this paper is an attempt towards accomplishing it. The proposed algorithm is based on the analysis of connected components, which are extracted using graph theory concepts. The results obtained are encouraging and motivate further research in this direction.

ACKNOWLEDGEMENT
We would like to thank Ms. Faten Kallel Jaiem for providing the APTI printed Arabic database. The author also acknowledges the University of Thamar, Yemen, for financial support.

REFERENCES
[1] R.M. Bozinovic, S.N. Srihari, “Off-line cursive script word recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 1, pp. 68-83, 1989.
[2] S.N. Srihari, G. Ball, “An assessment of Arabic handwriting recognition technology”, in: V. Märgner, H. El Abed (Eds.), Guide to OCR for Arabic Script, Springer-Verlag, London, pp. 3-34, 2012.
[3] M.T. Parvez, S.A. Mahmoud, “Arabic handwriting recognition using structural and syntactic pattern attributes”, Pattern Recognition, Vol. 46, No. 1, pp. 141-154, 2013.
[4] A. Cheung, M. Bennamoun, N.W. Bergmann, “An Arabic optical character recognition system using recognition-based segmentation”, Pattern Recognition, Vol. 34, No. 2, pp. 215-233, 2001.
[5] M. Zand, A.N. Nilchi, S.A. Monadjemi, “Recognition-based Segmentation in Persian Character Recognition”, World Academy of Science, Engineering and Technology, Vol. 38, pp. 183-187, 2008.
[6] A. AbdulKader, “Two-Tier Approach for Arabic Offline Handwriting Recognition based on conditional joining rules”, in: Proceedings of the 2006 Summit on Arabic and Chinese Handwriting Recognition, pp. 121-127, 2006.
[7] Y. Chherawala, M. Cheriet, “W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents”, Pattern Recognition, Vol. 45, No. 9, pp. 3277-3287, 2012.
[8] S. Wshah, V. Govindaraju, Y. Cheng, H. Li, “A Novel Lexicon Reduction Method for Arabic Handwriting Recognition”, in: Proceedings of the Twentieth International Conference on Pattern Recognition, pp. 2865-2868, 2010.
[9] G.A. Abandah, F. Jamour, E. Qaralleh, “Recognizing handwritten Arabic words using grapheme segmentation and recurrent neural networks”, International Journal of Document Analysis and Recognition, Vol. 17, No. 3, pp. 275-291, 2014.
[10] M. Elzobi, A. Al-Hamadi, Z. Al Aghbari, L. Dings, “IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach”, International Journal of Document Analysis and Recognition, Vol. 16, No. 3, pp. 295-308, 2013.
[11] A. Alaei, P. Nagabhushan, U. Pal, “Dataset and Ground Truth for Handwritten Text in Four Different Scripts”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 26, No. 4, 2012.
[12] S.A. Mahmoud, I. Ahmad, W.G. Al-Khatib, M. Alshayeb, “KHATT: An open Arabic offline handwritten text database”, Pattern Recognition, Vol. 47, No. 3, pp. 1096-1112, 2014.
[13] M. Pechwitz, S.S. Maddouri, V. Märgner, N. Ellouze, H. Amiri, “IFN/ENIT-Database of Handwritten Arabic Words”, in: Proceedings of CIFED: Colloque International Francophone sur l’Écrit et le Document, pp. 129-136, 2002.
[14] A. Elnagar, R. Bentrcia, “A Multi-Agent Approach to Arabic Handwritten Text Segmentation”, Journal of Intelligent Learning Systems and Applications, Vol. 4, No. 3, pp. 207-215, 2012.
[15] A. Elnagar, R. Bentrcia, “A Recognition-Based Segmentation Approach to Segmenting Arabic Handwritten Text”, Journal of Intelligent Learning Systems and Applications, Vol. 7, No. 4, pp. 93-103, 2015.
[16] S. Marchand-Maillet, Y.M. Sharaiha, Binary Digital Image Processing: A Discrete Approach, first ed., Academic Press, London, 2000.
[17] R.C. Gonzalez, R.E. Woods, Digital Image Processing, third ed., Dorling Kindersley (India) Pvt. Ltd., India, 2009.
[18] S.N. Srihari, G.R. Ball, H. Srinivasan, “Versatile Search of Scanned Arabic Handwriting”, in: Proceedings of the 2006 Summit on Arabic and Chinese Handwriting Recognition, pp. 57-69, 2006.
[19] F.K. Jaiem, S. Kanoun, M. Khemakhem, H. El Abed, J. Kardoun, “Database for Arabic Printed Text Recognition Research”, in: Proceedings of the Seventeenth International Conference on Image Analysis and Processing, pp. 251-259, 2013.
[20] L. Zheng, A.H. Hassin, X. Tang, “A new algorithm for machine printed Arabic character segmentation”, Pattern Recognition Letters, Vol. 25, No. 15, pp. 1723-1729, 2004.
[21] A. Ebrahimi, E. Kabir, “A pictorial dictionary for printed Farsi subwords”, Pattern Recognition Letters, Vol. 29, No. 5, pp. 656-663, 2008.
[22] M. Khayyat, L. Lam, C.Y. Suen, “Learning-based word spotting system for Arabic handwritten documents”, Pattern Recognition, Vol. 47, No. 3, pp. 1021-1030, 2014.
Authors’ Profiles
Hashem Ghaleb received his B.E. in Computer Engineering from King Saud University, Riyadh, Saudi Arabia, and his M.Tech. from the University of Mysore, Mysore, India. He is currently a Ph.D. student at the Department of Studies in Computer Science, University of Mysore, India. His research interests include image processing, document image processing and pattern recognition.
P. Nagabhushan (B.E. 1980, M.Tech. 1983, Ph.D. 1989) is a professor at the Department of Studies in Computer Science and was Director of the Planning, Monitoring and Evaluation Board at the University of Mysore, India. He is an active researcher in areas pertaining to pattern recognition, document image processing, symbolic data analysis and data mining. He has over 400 publications in journals and conferences of international repute. He has chaired several international conferences and is a visiting professor in the USA, Japan and France. He is a fellow of the Institution of Engineers and the Institution of Telecommunication and Electronics Engineers, India.
Umapada Pal received his Ph.D. from the Indian Statistical Institute (ISI) and did his post-doctoral research at INRIA, France. From July 1997 to January 1998 he visited GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Germany, as a guest scientist. Since January 1997 he has been a faculty member of the Computer Vision and Pattern Recognition Unit, ISI, Kolkata. He has published numerous research papers in international journals, conference proceedings and edited volumes. He received a student best paper award from the Chennai Chapter of the Computer Society of India and a merit certificate from the Indian Science Congress Association in 1995 and 1996, respectively. Dr. Pal received the 'ICDAR Outstanding Young Researcher Award' from the TC-10 and TC-11 committees of IAPR in 2003. He has served as a guest editor, co-editor, program chair and program committee member of many international journals and conferences. He is a life member of the Indian unit of IAPR and a senior life member of the Computer Society of India.
How to cite this paper: Hashem Ghaleb, P. Nagabhushan, Umapada Pal, "Graph Modeling based Segmentation of Handwritten Arabic Text into Constituent Sub-words", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.8-20, 2016. DOI: 10.5815/ijigsp.2016.12.02
I.J. Image, Graphics and Signal Processing, 2016, 12, 21-29 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.03
Remote Sensing Textual Image Classification based on Ensemble Learning
Ye Zhiwei¹, Yang Juan¹, Zhang Xu¹, Hu Zhengbing²
¹ School of Computer Science, Hubei University of Technology, Wuhan, China
² School of Educational Information Technology, Central China Normal University, Wuhan, China
Abstract: Remote sensing textual image classification has been one of the hottest topics in the field of remote sensing. Texture is the most useful feature for image classification. Remote sensing images usually contain complex terrain types, for which multiple texture features must be extracted; in addition, the images contain noise, so a single classifier can hardly obtain optimal classification results. Integrating multiple classifiers makes good use of the characteristics of different classifiers and can improve classification accuracy to the largest extent. In this paper, based on the diversity measurement of the base classifiers, the J48, IBk, sequential minimal optimization (SMO), Naive Bayes and multilayer perceptron (MLP) classifiers are selected for ensemble learning. To evaluate the proposed method, it is compared with the five base classifiers in terms of average classification accuracy. Experiments on five UCI data sets and on remote sensing image data sets are performed to verify the effectiveness of the proposed method.
Index Terms: Remote Sensing, Textual Classification, Ensemble Learning, Bagging.
I. INTRODUCTION
Remote sensing image mining or classification is one of the most important methods of extracting land cover information about the Earth [1]. Unlike standard alphanumeric data mining, image mining or classification is difficult because image data are unstructured [2]. There are two main image classification techniques: unsupervised and supervised image classification. In supervised image classification, the user first selects representative samples, called the training set, for each land cover class; a classifier is then trained on this training set, and finally the trained classifier is used in the practical application. Each training sample consists of a low-level feature vector and its class label, and the trained classifier assigns unknown low-level feature vectors to one of the trained classes. Several classifiers, such as the maximum likelihood classifier and the minimum distance classifier, have been used for image classification [3].

With the development of remote sensing technology, the spatial and spectral resolution of remote sensing images has been getting higher and higher [4]. This presents new challenges to remote sensing image classification and calls for new classification methods. Many new classification methods, such as spectral information divergence and the object-oriented paradigm, have appeared [5]. To a certain extent, these classifiers or classification strategies can improve classification accuracy; however, different classifiers have their own characteristics, and their performance differs from one application to another [6]. Samples misclassified by one classifier may be labeled correctly by another, which indicates that classifiers are complementary. It is difficult to design a powerful model for classifying remote sensing images, because the model must not only capture the main discriminative information of the image but also be robust to its variations. As a result, merely improving traditional methods to achieve robust classification is not always feasible. In 1998, Duin et al. proposed combining multiple classifiers to enhance the classification performance of a single classifier [7]; that is, a combination of classifiers can correct the errors made by a single classifier on distinct parts of the input space. It is conceivable that a combination of multiple classifiers performs better than any of the base classifiers used in isolation [8]. The emergence of ensemble learning provides a new research direction for solving the problem of strong correlation and redundancy among the bands. Hansen et al. first proposed the concept of the neural network ensemble [9] and showed that the generalization ability of a learning system can be significantly improved by training multiple neural networks. In 2011, a multiple classifier ensemble was applied to face recognition [10].
In the same year, a support vector machine (SVM) was used as the base classifier to recognize facial expressions [11]. Texture is a vital characteristic for remote sensing image interpretation. However, texture often varies in orientation, scale or other aspects of visual appearance, so it is hard to describe accurately with a single mathematical model. Several descriptors are therefore generally used for classifying textures, which may improve classification accuracy but also makes classification more difficult. In this paper, based on the diversity measurement of the base classifiers, the J48 classifier, IBk classifier, sequential minimal optimization (SMO) classifier, Naive Bayes classifier and multilayer perceptron (MLP) classifier are selected for ensemble learning. These classifiers implement, respectively, the C4.5 decision tree, k-nearest neighbors (kNN), support vector machine (SVM, trained by SMO), Naive Bayes and artificial neural network (ANN) classification algorithms. To evaluate the proposed method, it is compared with the five base classifiers in terms of average classification accuracy. The remainder of this paper is organized as follows. Section 2 briefly reviews ensemble learning. Section 3 describes the selection of base classifiers and the proposed method in detail. Section 4 demonstrates the effectiveness of the proposed method through experiments on several public data sets from the UCI machine learning repository and on real remote sensing images. Finally, Section 5 draws conclusions from the experimental results.
II. OVERVIEW OF ENSEMBLE LEARNING
In a narrow sense, ensemble learning uses learners of the same type to learn the same problem; for example, all learners may be support vector machines, or all may be neural network classifiers. In a broad sense, applying a variety of learners to the problem can also be considered ensemble learning. In general, when learning new examples, ensemble learning integrates multiple individual learners and determines the result by combining their outputs, in order to achieve better performance than a single learner [12]. If a single learner is regarded as a decision maker, ensemble learning corresponds to a decision made jointly by a number of decision makers. By combining k base classifiers $M_1, M_2, \ldots, M_k$, an improved composite classification model $M^*$ is created. A given data set D is used to create k training data sets $D_1, D_2, \ldots, D_k$, where $D_i$ ($1 \le i \le k$) is used to generate classifier $M_i$. The ensemble result is a class prediction based on the votes of the base classifiers. The flow chart of ensemble learning is shown in Fig.1.
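The scheme just described (k training sets derived from D, one classifier $M_i$ per set, and a composite model $M^*$ that votes) can be sketched in a few lines. Since Section III builds on bagging, this sketch assumes the $D_i$ are bootstrap samples, i.e. drawn from D with replacement with $|D_i| = |D|$; the function names and the 1-NN toy learner are hypothetical and stand in for the Weka classifiers used in the paper.

```python
import random
from collections import Counter

def bootstrap_sets(data, k, seed=0):
    """Create k training sets D_1..D_k, each drawn from D with replacement (|D_i| = |D|)."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(len(data))] for _ in range(k)]

def train_ensemble(data, k, learner, seed=0):
    """Train base classifier M_i on D_i; `learner` maps a training set to a predict function."""
    return [learner(d) for d in bootstrap_sets(data, k, seed)]

def ensemble_predict(models, x):
    """Composite model M*: the class receiving the most votes from M_1..M_k."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

def nearest_neighbour_learner(train):
    """Hypothetical toy base learner: 1-NN on (value, label) pairs with 1-D values."""
    return lambda x: min(train, key=lambda s: abs(s[0] - x))[1]

# Usage with a hypothetical 1-D data set of (value, label) pairs
data = [(0.0, "A"), (0.5, "A"), (1.0, "A"), (5.0, "B"), (5.5, "B"), (6.0, "B")]
models = train_ensemble(data, k=11, learner=nearest_neighbour_learner)
print(ensemble_predict(models, 0.2))
```

Because each $D_i$ omits some samples and repeats others, the base models differ even when they share one learning algorithm; that variation is what the vote averages out.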
In theory, ensemble learning is guided by three approaches: 1) sample set reconstruction; 2) feature-level reconstruction; 3) output variable reconstruction. Ensemble learning is usually applied in two steps. The first step obtains individual models by producing several training subsets. The second step uses a combination technique to derive the final result from the individual outputs. Dietterich explained in three ways (statistical, computational and representational) why an ensemble is superior to a single model [13]. From the statistical perspective, the hypothesis space to be searched is very large, whereas the few available training samples, compared with the real samples in the world, are not enough to learn the target hypothesis accurately. Learning therefore yields a series of hypotheses that all fit the training set with approximately the same accuracy. These hypotheses may fit the training set well yet perform poorly in practice, so choosing only one classifier carries a big risk. Fortunately, this risk can be reduced by considering multiple hypotheses at the same time.
Fig.1. The flow chart of ensemble learning.

III. MULTIPLE CLASSIFIERS ENSEMBLE BASED ON BAGGING ALGORITHM
A. Selection of Base Classifiers
As discussed above, an important reason for the success of ensemble classifier algorithms is that a group of different base classifiers is employed. Diversity among a team of classifiers is deemed a key issue in classifier ensembles [14]. However, measuring diversity is not straightforward, because there is no widely accepted formal definition. In 1989, Littlewood and Miller identified diversity as a very important characteristic in classifier combination [15]. However, there is no rigid definition of what is intuitively perceived as dependence, diversity or orthogonality of classifiers. Many measures of the association between two classifier outputs can be derived from the statistical literature, such as the Q statistic and the correlation coefficient. There are formulas, methods and ideas aimed at quantifying diversity when three or more classifiers are involved, but little of this rests on a strict or systematic basis, owing to the lack of a definition. The general expectation is that diversity measures can help in designing the base classifiers and the combination technique. In this paper, we measure the diversity among five types of supervised classifiers using non-pairwise diversity measures such as the entropy measure, the Kappa measure and the Kohavi-Wolpert variance. Let $D = \{D_1, D_2, \ldots, D_L\}$ be a set of base classifiers and $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$ a set of class labels, where a sample x to be labeled is a vector with n features [16]. For a labeled data set $Z = \{z_1, \ldots, z_N\}$, let $l(z_j)$ denote the number of classifiers in D that label $z_j$ correctly. The entropy measure E is defined as Eq.(1)

$$E = \frac{1}{N}\sum_{j=1}^{N}\frac{1}{L - \lceil L/2 \rceil}\min\{\, l(z_j),\; L - l(z_j) \,\} \qquad (1)$$
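E varies between 0 and 1: E = 0 indicates no difference between the base classifiers, and E = 1 indicates the highest possible diversity, so a larger entropy value means higher diversity. The Kohavi-Wolpert (KW) variance expresses the variability of the predicted class label y for x across training samples, for a specific classifier model, as

$$v_x = \frac{1}{2}\left(1 - P(y=1 \mid x)^{2} - P(y=0 \mid x)^{2}\right)$$

where y = 1 denotes a correct and y = 0 an incorrect label. In the original formulation, $P(y = i \mid x)$ is estimated as an average over different training sets; for a team of L classifiers it is estimated over the classifiers, i.e., $P(y = 1 \mid z_j) = l(z_j)/L$, which gives $v_{z_j} = l(z_j)\left(L - l(z_j)\right)/L^{2}$. Averaging over the entire data set Z, the KW measure of diversity is defined as Eq.(2)

$$KW = \frac{1}{NL^{2}}\sum_{j=1}^{N} l(z_j)\left(L - l(z_j)\right) \qquad (2)$$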
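Let $\bar{p}$ be the average individual classification accuracy, i.e., $\bar{p} = \frac{1}{NL}\sum_{j=1}^{N}\sum_{i=1}^{L} y_{j,i}$, where $y_{j,i} = 1$ if classifier $D_i$ labels $z_j$ correctly and $y_{j,i} = 0$ otherwise. The Kappa measure is then defined as Eq.(3)

$$\kappa = 1 - \frac{\frac{1}{L}\sum_{j=1}^{N} l(z_j)\left(L - l(z_j)\right)}{N(L-1)\,\bar{p}\,(1-\bar{p})} \qquad (3)$$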
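Eqs. (1)-(3) all depend only on the N x L "oracle" matrix of correct/incorrect outputs, so they can be computed together. The sketch below assumes that matrix is already available (how it is obtained, e.g. on a held-out set, is not specified here); `diversity_measures` and `mean_kw_variance` are hypothetical helper names. The second function averages $v_x$ with $P(y=1 \mid z_j) = l(z_j)/L$ and should agree with Eq.(2).

```python
from math import ceil

def diversity_measures(oracle):
    """Entropy E, KW variance and Kappa (Eqs. 1-3) from an N x L oracle matrix.

    oracle[j][i] is 1 if classifier D_i labels sample z_j correctly, else 0.
    """
    N, L = len(oracle), len(oracle[0])
    l = [sum(row) for row in oracle]            # l(z_j): correct votes per sample
    p_bar = sum(l) / (N * L)                    # average individual accuracy
    # Eq. (1): entropy measure, normalised to lie in [0, 1]
    E = sum(min(lj, L - lj) for lj in l) / (N * (L - ceil(L / 2)))
    s = sum(lj * (L - lj) for lj in l)
    # Eq. (2): Kohavi-Wolpert variance
    KW = s / (N * L ** 2)
    # Eq. (3): Kappa; undefined when p_bar is 0 or 1 (division by zero)
    kappa = 1 - (s / L) / (N * (L - 1) * p_bar * (1 - p_bar))
    return E, KW, kappa

def mean_kw_variance(oracle):
    """Average of v_x = (1 - P(y=1|x)^2 - P(y=0|x)^2) / 2 with P(y=1|z_j) = l(z_j)/L."""
    L = len(oracle[0])
    v = [0.5 * (1 - (sum(r) / L) ** 2 - (1 - sum(r) / L) ** 2) for r in oracle]
    return sum(v) / len(v)

# Hypothetical oracle outputs of L = 3 classifiers on N = 4 samples
oracle = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
E, KW, kappa = diversity_measures(oracle)   # E = 0.75, KW ≈ 0.1667, kappa ≈ -0.3333
```

In this toy matrix every error is made by a different classifier, so E is high and κ is even negative; three identical classifiers would instead give E = 0, KW = 0 and κ = 1.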
From Eq.(3), the Kappa value increases as the correlation between the classifiers increases, so a lower Kappa indicates higher diversity. In this paper, the J48, IBk, SMO, Naive Bayes and MLP classifiers are selected as the base classifiers. First, the five supervised classifiers are briefly introduced; then, the diversity among them is measured using the non-pairwise diversity measures.
J48 Classifier
Decision tree learning builds a predictive model in the form of a decision tree, mapping observations about an item to conclusions about the item's target value. Ross Quinlan developed the C4.5 decision tree algorithm as an extension of the earlier ID3 algorithm [17]. Like ID3, C4.5 builds decision trees from training data using the concept of information entropy. The construction of decision trees was first derived from Hunt's method, which includes two steps [18]. In the first step, if the samples at a node belong to only one class, the node becomes a leaf node; otherwise, the method proceeds to the next step. In the second step, it searches for a variable that divides the data into two or more subsets of higher purity according to the condition on that variable. That is, it selects the variable according to local optimality and then returns to the first step for each subset. J48 is an open source Java implementation of the C4.5 algorithm in Weka. The classification rules produced by C4.5 are easy to understand and its accuracy is high. Its main drawback is that it needs to scan and sort the data set repeatedly while constructing the decision tree, which makes the algorithm inefficient.
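The entropy-based variable selection in the second step can be illustrated with the criterion C4.5 uses, the gain ratio (information gain divided by split information). This is a minimal sketch restricted to categorical attributes; Weka's J48 additionally handles numeric thresholds, missing values and pruning, none of which are shown. `entropy` and `gain_ratio` are illustrative names, and the toy data set is hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio of splitting on categorical attribute index `attr`."""
    n = len(labels)
    # Group the class labels by the value the attribute takes
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - after
    # Split information penalises attributes with many distinct values
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical data: attribute 0 separates the classes, attribute 1 does not
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["+", "+", "-", "-"]
best = max(range(2), key=lambda a: gain_ratio(rows, labels, a))   # 0
```

Applying this choice recursively to each resulting subset, and stopping when a subset is pure, gives the two-step procedure described above.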
IBk
The second classifier chosen is k-NN (implemented in Weka as IBk). The input of k-NN consists of the k closest training samples in the feature space, and the output is a class membership: an object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors. If k = 1, the object is simply assigned to the class of its single nearest neighbor. The training examples are vectors in a multidimensional feature space, each with a class label. In the classification phase, k is a user-defined constant, and an unlabeled vector is classified by assigning the label that is most frequent among the k training samples nearest to that query point. When there are few training samples, the whole training set can be used as the reference set; when there are many, a subset can be selected or prototypes of the reference set can be computed. The k-NN algorithm adapts well to test samples from classes with overlapping domains. A commonly used distance metric for continuous variables is the Euclidean distance [19], i.e., the length of the line segment connecting points i and j. In Cartesian coordinates, if $i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jn})$ are two points in Euclidean n-space, the distance d from i to j (or from j to i) is given by Eq.(4)

$$d(i,j) = \sqrt{(x_{i1}-x_{j1})^{2} + (x_{i2}-x_{j2})^{2} + \cdots + (x_{in}-x_{jn})^{2}} \qquad (4)$$
k-NN is a non-parametric classification technique [20]. It achieves high classification accuracy on data whose distribution is unknown or non-normal. Its advantages are that it is intuitive, highly feasible and conceptually clear. It directly uses the relationships between samples, which can reduce the classification error probability and avoid unnecessary complications. However, it is a lazy learning method: classification is slow and depends strongly on the sample size.
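The classification phase described above (rank the training samples by the Euclidean distance of Eq.(4), then take a majority vote among the k nearest) fits in a few lines. This is a plain-Python sketch of the technique, not Weka's IBk code; `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Rank training samples by Euclidean distance (Eq. 4) to the query point
    order = sorted(range(len(train_X)), key=lambda idx: dist(train_X[idx], query))
    votes = Counter(train_y[idx] for idx in order[:k])
    # most_common(1) returns the most frequent label; on a tie, the label seen first
    # (i.e. the one owning the nearest of the tied neighbours) wins
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set with two classes
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # A
```

Nothing is learned ahead of time: all the work happens at query time, which is why the text calls k-NN a lazy method with slow classification.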
Sequential Minimal Optimization Classifier
SMO takes the idea of the decomposition algorithm to the extreme in order to solve the optimization problem [21]. It is an iterative algorithm that breaks the problem into the smallest possible sub-problems; its most prominent feature is that the optimization problem for two data points can be solved analytically. Therefore, no quadratic programming optimizer is needed as part of the algorithm. Each step of SMO selects two elements (Lagrange multipliers) to optimize [22]: the optimal values of these two parameters are found, and the corresponding vectors are updated, while all other parameters are held fixed. Although more iterations are needed to converge, the overall speed increases because each iteration requires very little computation. Based on the principle of structural risk minimization and the VC dimension theory of statistical learning theory, the support vector machine (SVM) seeks the best trade-off between learning ability and model complexity from a limited number of samples, so as to obtain the best generalization ability. Compared with traditional artificial neural networks, an SVM has a simple structure. It considerably improves generalization performance and solves the local optimum problem that neural networks cannot avoid. SVMs can handle problems with small samples, high dimensionality and nonlinearity. They have several special properties that ensure good generalization ability, and they also avoid the curse of dimensionality.
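The analytic two-variable step that lets SMO avoid a quadratic programming solver can be sketched as follows, using the update from Platt's original SMO formulation: move $a_2$ along the line $y_1 a_1 + y_2 a_2 = \text{const}$ to its unconstrained optimum, clip it to the box $[0, C]$, and adjust $a_1$ to keep the constraint. This is a hedged sketch of one step only; the pair-selection heuristics, the threshold (bias) update and kernel caching are omitted, and `smo_pair_update` is a hypothetical name.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    """One analytic SMO step on the Lagrange multiplier pair (a1, a2).

    y1, y2 are labels in {-1, +1}; E1, E2 are prediction errors f(x) - y;
    k11, k12, k22 are kernel values; C is the box constraint.
    """
    # Feasible segment for a2 implied by 0 <= a <= C and y1*a1 + y2*a2 = const
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = k11 + k22 - 2 * k12          # curvature along the constraint line
    if eta <= 0 or low == high:
        return a1, a2                  # this sketch simply skips degenerate pairs
    # Unconstrained optimum for a2, then clip it to the feasible segment
    a2_new = min(high, max(low, a2 + y2 * (E1 - E2) / eta))
    # a1 moves so that y1*a1 + y2*a2 stays unchanged
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new

# Two hypothetical points with opposite labels, linear kernel, all multipliers at 0
print(smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 0.0, 1.0, C=1.0))  # (1.0, 1.0)
```

Each call costs only a handful of arithmetic operations, which is the point made above: SMO needs many iterations, but every iteration is cheap.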
Naive Bayes
The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with a strong independence assumption [23]. Starting from the known prior probabilities of the classes, it uses Bayes' formula to calculate the posterior probabilities, and the class with the largest posterior probability is selected as the class of the object. In theory, Naive Bayes is a conditional probability model: let S be the sample space of experiment E, let $B_1, B_2, \ldots, B_n$ be a partition of S (in classification, the n candidate classes) with $P(B_i) > 0$ $(i = 1, 2, \ldots, n)$, and let A be an observed event (the feature values of a sample). Using Bayes' theorem, the conditional probability can be calculated as Eq.(5)

$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}, \qquad i = 1, 2, \ldots, n \qquad (5)$$

Multilayer Perceptron
An artificial neural network simulates a biological neural network; it is used to estimate or approximate functions that can depend on a large number of inputs, and it consists of a large number of simple computing units connected to form a network [24]. In the learning stage, the network establishes the correspondence between the input samples and their correct outputs by adjusting the weights. A neural network has a strong ability to identify and classify input samples; in effect, it partitions the sample space into regions, each belonging to one class, that meet the classification requirements. An MLP is a feedforward ANN model, which can be regarded as a mapping $F: \mathbb{R}^d \rightarrow \mathbb{R}^M$. What distinguishes an MLP from a linear perceptron is that a number of its neurons use a nonlinear activation function. Learning occurs in the perceptron by changing the connection weights after each training sample is processed, based on the error in the output compared with the expected result. Compared with other algorithms, neural networks have a high tolerance to noisy data and perform very well in classifying the training data.

Different numbers and types of classifiers are combined to measure their diversity, and the resulting Kappa values are shown in Table 1.
Table 1. Kappa measurement of different classifiers.
(J48,IBk)
0.5952
(J48,SMO)
0.885
(J48,Bayes)
0.8745
(J48,MLP)
0.767
(IBk,SMO)
0.9492
(IBk,Bayes)
0.8751
(IBk,MLP)
0.7868
(SMO,Bayes)
0.885
(SMO,MLP)
0.7787
(Bayes,MLP)
0.7768
(J48,IBk,SMO)
0.9595
(J48,IBk,Bayes)
0.9476
(J48,IBk,MLP)
0.9158
(J48,SMO,Bayes)
0.9473
(J48,SMO,MLP)
0.9132
(J48,Bayes,MLP)
0.9116
(IBk,SMO,Bayes)
0.9573
(IBk,SMO,MLP)
0.9232
(IBk,Bayes,MLP)
0.9149
(SMO,Bayes,MLP)
0.9143
(J48,IBk,SMO,Bayes)
0.9627
(J48,IBk,SMO,MLP)
0.958
(J48,IBk,Bayes,MLP)
0.9548
(J48,SMO,Bayes,MLP)
0.9542
(IBk,SMO,Bayes,MLP)
0.9572
(J48,SMO,MLP,IBk,Bayes)
0.9671
3
A clear distinction between Naive Bayes and other learning methods is that it does not explicitly search possible hypothesis space. Naive Bayes algorithm takes less time and considers the logic relatively simple. Naive Bayes algorithm also has a high degree of feasibility and the characteristics of logic and high stability. 5)
Kappa
2
A is a event of E and P( A) 0 . For each of i possible results or classes Bi , the instance probabilities are
P( Bi | A)
classifiers
4
5
It can be seen from TABLE I that, if two classifiers are chosen, the best choice is the IBk and SMO classifiers. Similarly, if three classifiers are chosen, the best choice is J48, IBk and SMO, and if four are chosen, the best choice is J48, IBk, SMO and Naive Bayes. The highest kappa value overall (0.9671) is obtained when all five classifiers are used. Therefore, the J48, IBk, SMO, Naive Bayes and MLP classifiers are chosen to conduct ensemble learning.
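To make the two-variable step described for SMO concrete, the following is a minimal sketch of the simplified SMO variant (linear kernel, second multiplier chosen at random). It is an illustration under those assumptions, not Platt's full heuristic and not the Weka SMO implementation used in the paper; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def simplified_smo(X: Sequence[Vector], y: Sequence[int], C: float = 1.0, tol: float = 1e-3,
                   max_passes: int = 10, max_iter: int = 1000, seed: int = 0) -> Tuple[List[float], float]:
    """Dual SVM training that repeatedly solves a two-multiplier sub-problem analytically."""
    rng = random.Random(seed)
    m = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha, b = [0.0] * m, 0.0

    def error(i: int) -> float:
        # E_i = f(x_i) - y_i, with f(x) = sum_k alpha_k y_k K(x_k, x) + b
        return sum(alpha[k] * y[k] * K[k][i] for k in range(m)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:  # max_iter is a hard safety cap
        iters += 1
        changed = 0
        for i in range(m):
            E_i = error(i)
            # Only update alpha_i if it violates the KKT conditions by more than tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(m - 1)
            if j >= i:
                j += 1  # random j different from i
            E_j = error(j)
            a_i, a_j = alpha[i], alpha[j]
            # Box [L, H] keeps 0 <= alpha <= C and preserves sum_k alpha_k y_k.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            # Closed-form optimum for alpha_j along the constraint line, clipped to [L, H].
            new_a_j = min(H, max(L, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_a_j - a_j) < 1e-5:
                continue
            alpha[j] = new_a_j
            alpha[i] = a_i + y[i] * y[j] * (a_j - new_a_j)
            b1 = b - E_i - y[i] * (alpha[i] - a_i) * K[i][i] - y[j] * (alpha[j] - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (alpha[i] - a_i) * K[i][j] - y[j] * (alpha[j] - a_j) * K[j][j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def smo_predict(X: Sequence[Vector], y: Sequence[int], alpha: List[float], b: float, x: Vector) -> int:
    s = sum(a * yk * linear_kernel(xk, x) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

Because each update moves the two multipliers in opposite directions along the line y_i α_i + y_j α_j = const, the equality constraint of the dual problem is preserved without any quadratic programming step.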
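Eq.(5) can be sketched for numeric attributes as below. The Gaussian class-conditional density per attribute is an assumption (the text does not say how P(A|B_i) is modelled; a Gaussian is a common choice for numeric data). The naive independence assumption turns P(A|B_i) into a product over attributes, and the posterior is normalized exactly as by the denominator of Eq.(5), computed in log space for numerical stability. Function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Model = Dict[Hashable, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[Hashable], var_floor: float = 1e-9) -> Model:
    """Estimate the prior P(B_i) and per-attribute mean/variance for every class B_i."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model: Model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small floor keeps the density finite when an attribute is constant within a class.
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), var_floor) for c, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def posterior(model: Model, x: Sequence[float]) -> Dict[Hashable, float]:
    """Eq.(5): P(B_i|A) = P(A|B_i)P(B_i) / sum_j P(A|B_j)P(B_j), evaluated in log space."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        # Naive independence: log P(A|B_i) is a sum of per-attribute Gaussian log-densities.
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xk - m) ** 2 / (2 * v)
                      for xk, m, v in zip(x, means, variances))
        log_joint[label] = math.log(prior) + log_lik
    top = max(log_joint.values())  # subtract the maximum before exponentiating to avoid underflow
    weights = {label: math.exp(lj - top) for label, lj in log_joint.items()}
    total = sum(weights.values())  # denominator of Eq.(5), up to the common factor exp(top)
    return {label: w / total for label, w in weights.items()}


def predict(model: Model, x: Sequence[float]) -> Hashable:
    """Select the class with the largest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)
```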
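The text does not state exactly which label sequences the kappa values in TABLE I compare. A common choice is Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), between two labelings (for example, an ensemble's predictions and the reference labels); the sketch below computes it under that assumption and then applies the selection rule used with TABLE I, picking the highest-kappa subset from the two-classifier rows (values copied from the table). The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Agreement between two labelings, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the two labelings match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both labelings use one identical class everywhere
    return (p_o - p_e) / (1.0 - p_e)


# Two-classifier rows of TABLE I.
PAIR_KAPPA: Dict[Tuple[str, ...], float] = {
    ("J48", "IBk"): 0.5952, ("J48", "SMO"): 0.885, ("J48", "Bayes"): 0.8745,
    ("J48", "MLP"): 0.767, ("IBk", "SMO"): 0.9492, ("IBk", "Bayes"): 0.8751,
    ("IBk", "MLP"): 0.7868, ("SMO", "Bayes"): 0.885, ("SMO", "MLP"): 0.7787,
    ("Bayes", "MLP"): 0.7768,
}


def best_subset(kappa_by_subset: Dict[Tuple[str, ...], float]) -> Tuple[str, ...]:
    """Return the classifier subset with the highest kappa value."""
    return max(kappa_by_subset, key=kappa_by_subset.get)


print(best_subset(PAIR_KAPPA))
```

Running it prints the IBk/SMO pair, matching the choice read off TABLE I for two classifiers.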
B. Multiple Classifiers Ensemble based on Bagging Algorithm
Bagging is an ensemble learning method, proposed by Breiman in 1994 [25], that improves classification by combining the classifications of models learned from randomly selected training sets. Given a training set of size m, m instances are drawn from it at random with replacement and a classifier is learned from them; this process can be repeated several times. Because the draw is with replacement, each sample contains some duplicates and omits some instances of the initial training set. Each cycle thus results in one classifier, and once several classifiers have been constructed, the prediction of each classifier counts as one vote toward the final prediction.

Algorithm: Bagging algorithm
Input:
  D = {(x1, y1), (x2, y2), ..., (xm, ym)}, a set of m training tuples;
  T, the number of models in the ensemble;
  L, a classification learning scheme (J48, IBk, SMO, Naive Bayes and MLP).
Output: The ensemble, a composite model
  $$H(x) = \arg\max_{y \in Y} \sum_{t=1}^{T} I\big(y = h_t(x)\big),$$
  where I(·) equals 1 when the proposition in parentheses is true and 0 otherwise.
Method:
  for t = 1 to T do
    create bootstrap sample Dt = Bootstrap(D) by sampling D with replacement;
    use Dt and the learning scheme L to derive a model ht;
  endfor
  To use the ensemble to classify a tuple X: let each of the T models classify X and return the majority vote.
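The method above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Weka implementation used in the experiments: a learning scheme L is modelled as a function that takes a list of (features, label) pairs and returns a predictor, and `nearest_centroid` is a hypothetical stand-in base learner rather than one of the five classifiers.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]
Predictor = Callable[[Sequence[float]], Hashable]
Learner = Callable[[List[Example]], Predictor]


def bagging(D: List[Example], T: int, L: Learner, seed: int = 0) -> Predictor:
    """Train T models on bootstrap samples of D and combine them by majority vote."""
    rng = random.Random(seed)
    m = len(D)
    models: List[Predictor] = []
    for _ in range(T):
        # D_t = Bootstrap(D): m draws from D with replacement.
        D_t = [D[rng.randrange(m)] for _ in range(m)]
        models.append(L(D_t))  # h_t derived from D_t with learning scheme L

    def H(x: Sequence[float]) -> Hashable:
        # H(x) = argmax_y sum_t I(y = h_t(x)): the label receiving the most votes.
        votes = Counter(h(x) for h in models)
        return votes.most_common(1)[0][0]

    return H


def nearest_centroid(D_t: List[Example]) -> Predictor:
    """Hypothetical stand-in base learner: predicts the class with the closest mean vector."""
    sums, counts = {}, Counter()
    for x, label in D_t:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

    def h(x: Sequence[float]) -> Hashable:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return h
```

Passing a different learner function for L swaps the base classifier without touching the bootstrap-and-vote loop, which mirrors how the algorithm treats L as an input.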
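Given a training set D = {(x1, y1), (x2, y2), ..., (xm, ym)} of size m, the bagging algorithm generates T new training sets Dt, each of size m, by sampling from D uniformly with replacement. The probability that a given example of D appears in a particular sample set is 1 − (1 − 1/m)^m. For large m this tends to 1 − 1/e ≈ 63.2%, so about 63.2% of each sample consists of unique examples of D and the rest are duplicates. The T models are fitted on these T samples, known as bootstrap samples, and combined by voting. The best achievable (Bayes) correct classification rate is

$$r^* = \int \max_i P(i \mid x)\, P_X(x)\, dx$$

Based on the bagging algorithm, the probability of correct classification of the aggregated classifier φA can be written as Eq.(6):

$$r_A = \int_{x \in C} \max_i P(i \mid x)\, P_X(x)\, dx + \int_{x \in C'} \Big[\sum_i I\big(\varphi_A(x) = i\big)\, P(i \mid x)\Big] P_X(x)\, dx \tag{6}$$

where C is the set of inputs x at which the aggregated classifier is order-correct (its most-voted class is the class with the largest P(i|x)) and C' is the complement of C. It can be seen from Eq.(6) that on C bagging attains the optimal rate, so when the aggregated classifier is order-correct for most inputs, the result of the bagging algorithm can be better than that obtained by a single prediction function.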
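The 63.2% figure follows from the limit (1 − 1/m)^m → 1/e. A quick numerical check, together with a small simulation of the fraction of distinct examples in one bootstrap sample, is sketched below; the seed and the sample sizes are arbitrary choices for illustration.

```python
import math
import random


def prob_in_bootstrap(m: int) -> float:
    """Probability that a fixed example of D appears at least once among m draws with replacement."""
    return 1.0 - (1.0 - 1.0 / m) ** m


def unique_fraction(m: int, seed: int = 0) -> float:
    """Fraction of distinct original examples in one simulated bootstrap sample of size m."""
    rng = random.Random(seed)
    return len({rng.randrange(m) for _ in range(m)}) / m


for m in (10, 100, 1000, 100_000):
    print(m, round(prob_in_bootstrap(m), 4), round(unique_fraction(m), 4))
print("limit 1 - 1/e =", round(1.0 - math.exp(-1.0), 4))
```

Conversely, roughly 36.8% of the original examples are left out of each bootstrap sample for large m.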
IV. SIMULATION RESULTS AND DISCUSSION
A. Experiments for Public Data Sets
To evaluate the performance of the multiple-classifier ensemble based on the bagging algorithm, five public data sets from the UCI machine learning repository, “Image segment”, “german_credit”, “hepatitis”, “ionosphere” and “soybean”, are used in this part. For example, the “Image segmentation” data set has 19 continuous attributes, 210 training samples and 2100 test samples. Its instances were drawn randomly from a database of 7 outdoor images, and each instance is a 3x3 region. The images were segmented to create a classification for each pixel. The classes of the “Image segmentation” data set are brickface, cement, foliage, sky, path, window and grass. General information about the other data sets, such as the number of instances, the number of attributes and the number of classes, is shown in TABLE II.

Table 2. General Information of Public Data Sets from UCI (http://archive.ics.uci.edu/ml/datasets.html)
Data Set        Instances   Attributes   Classes
segment         2310        19           7
german_credit   1000        20           2
hepatitis       155         19           2
ionosphere      351         34           2
soybean         683         35           19
Before calculating the classification accuracy, the data are cleaned as a preprocessing step. Cleaning not only ensures the uniformity and accuracy of the data set but also makes it more suitable for the mining process by changing the internal structure and content of the data file. Data preprocessing improves the quality of the data samples and of the data mining results, and reduces the running time. For the neural network backpropagation algorithm, normalizing the input values of each attribute helps speed up the learning phase. For distance-based methods, normalization helps prevent attributes with originally large ranges from outweighing attributes with originally smaller ranges. Considering the classifiers chosen, we use min-max normalization to preprocess the data. In the Image segmentation data set, the ranges of the attributes region-centroid-col, region-centroid-row, region-pixel-count, short-line-density-5, short-line-density-2, vedge-mean, vedge-sd, hedge-mean, hedge-sd, intensity-mean, rawred-mean, rawblue-mean, rawgreen-mean, exred-mean, exblue-mean, exgreen-mean, value-mean, saturation-mean and hue-mean are [1,254], [11,251], [9,9], [0,0.333], [0,0.222], [0,29.222], [0,991.718], [0,44.722], [0,1386.33], [0,143.444], [0,137.111], [0,150.889], [0,142.556], [-49.667,9.889], [-12.444,82], [-33.889,24.667], [0,150.889], [0,1] and [-3.044,2.912], respectively. Clearly, the 19 numeric attributes are not in the same range, so they need to be brought to a common scale. The experiments use Normalize, an unsupervised filter in Weka. In min-max normalization, suppose that minA and maxA are the minimum and maximum values of an attribute A. A value v of A is mapped to a value v' in the range [0,1] by Eq.(7):

$$v' = \frac{v - \min_A}{\max_A - \min_A} \tag{7}$$

Fig.2, Fig.3, Fig.4, Fig.5 and Fig.6 are visualizations of the above five data sets using scatter-plot matrices.
Fig.2. Visualization of the Image Segmentation data set with part of attributes
Fig.3. Visualization of the german_credit data set using a scatter-plot matrix with part of attributes
Fig.4. Visualization of the hepatitis data set using a scatter-plot matrix with part of attributes
Fig.5. Visualization of the ionosphere data set using a scatter-plot matrix with part of attributes
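Eq.(7) can be sketched column by column as below. This is a plain-Python illustration, not Weka's Normalize filter; returning 0.0 for a constant attribute (such as region-pixel-count, whose range is [9,9]) is our own assumption, since Eq.(7) is undefined when maxA = minA.

```python
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Eq.(7): v' = (v - min_A) / (max_A - min_A), mapping attribute A into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Eq.(7) is undefined for a constant attribute; mapping it to 0.0 is an assumption here.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply Eq.(7) to every attribute (column) of a row-major data set."""
    columns = [min_max_normalize(col) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]
```

Note that the minimum and maximum are computed from the data passed in; applying the same mapping to a separate test set would require reusing the training-set minA and maxA.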
Fig.6. Visualization of the soybean data set using a scatter-plot matrix with part of attributes

In the experiment, the C4.5, k-NN, SVM, Naive Bayes, ANN and ensemble learning algorithms are used to classify the test data sets separately. For the five public data sets, TABLE III shows the average classification accuracy of the J48, IBk, SMO, Naive Bayes, MLP and bagging classifiers.

Table 3. Comparison of Classification Accuracy for Five Public Data Sets
Data Set        J48     IBk     SMO     Naive Bayes   MLP     Bagging
segment         96.9%   97.1%   93.0%   80.2%         96.2%   97.7%
german_credit   70.5%   72.0%   75.1%   75.6%         72.0%   76.4%
hepatitis       83.8%   80.6%   85.1%   84.5%         80.0%   85.8%
ionosphere      91.4%   86.3%   88.6%   82.6%         91.1%   92.0%
soybean         91.5%   91.2%   93.8%   92.9%         93.4%   94.4%

It can be seen from TABLE III that for the “Image segment” data set, the IBk classifier has the highest average classification accuracy of the five base classifiers, reaching 97.1% (slightly above J48 at 96.9%); for the “german_credit” data set, Naive Bayes reaches the highest accuracy, 75.6%; for the “hepatitis” and “soybean” data sets, the SMO classifier outperforms the other four base classifiers, reaching 85.1% and 93.8%, respectively; and for the “ionosphere” data set, the J48 classifier reaches 91.4%, while the IBk, SMO, Naive Bayes and MLP classifiers reach 86.3%, 88.6%, 82.6% and 91.1%, respectively. Thus the classification accuracy differs across data sets because the performance of the base classifiers differs; no single classifier has an absolute advantage. This is also the purpose of this experiment: based on the differences and complementary information among the base classifiers, the bagging algorithm combines different classifiers and makes full use of the advantages of each. From the experimental data in TABLE III, the average classification accuracy of the ensemble learning algorithm based on the five classifiers is higher than that of any single base classifier. The results of the bagging algorithm on the five data sets are 97.2%, 76.1%, 85.8%, 92.0% and 94.7%, respectively.

B. Experiments for Remote Sensing Image Data Sets
To further illustrate the performance of our method on real remote sensing images, we selected some real remote sensing images as training data. There are 684 instances divided into four classes: residential area, paddy field, water and vegetation area. Some training samples of each class are shown in Fig.7, Fig.8, Fig.9 and Fig.10.
Fig.7. Resident training samples
Fig.8. Paddy field training samples
Fig.9. Water training samples
Fig.10. Vegetation training samples
As with the five public data sets, min-max normalization is applied to the original data of the remote sensing image data set, which has 22 attributes, including the variance, skewness, prominence, energy, absolute value and texture energy of each order. Fig.11 shows a visualization of the remote sensing image data set using a scatter-plot matrix. Then the C4.5, k-NN, SVM, Naive Bayes, ANN and ensemble learning algorithms are used to classify the remote sensing image data set separately. TABLE IV shows the average classification accuracy of the J48, IBk, SMO, Naive Bayes, MLP and bagging classifiers on the remote sensing image data set.
Fig.11. Visualization of the remote sensing image data set using a scatter-plot matrix with part of attributes

Table 4. Comparison of Classification Accuracy for Remote Sensing Image Data Set
Data Set               J48     IBk     SMO     Naive Bayes   MLP     Bagging
Remote Sensing Image   81.2%   78.6%   86.5%   85.1%         86.9%   89.1%

From TABLE IV, the average classification accuracies of the J48, IBk, SMO, Naive Bayes and MLP classifiers on the remote sensing image data set are 81.2%, 78.6%, 86.5%, 85.1% and 86.9%, respectively. The bagging algorithm raises the accuracy to 89.1%, 2.2 percentage points higher than MLP, the best single base classifier. This suggests that, for texture image classification, ensemble learning is a promising approach that can achieve satisfactory results in practice.

V. CONCLUSION
To improve the classification accuracy of remote sensing images, our method uses ensemble learning to combine the J48, IBk, sequential minimal optimization, Naive Bayes and multilayer perceptron classifiers, which classify the data sets by straight voting. Five public data sets and real remote sensing images are used to verify the results. The experimental results show that a multiple-classifier ensemble can effectively improve the classification accuracy of textural remote sensing images. However, the classifiers in this paper are integrated in a simple mode; better integration methods will be explored in future work.

ACKNOWLEDGMENT
This work is funded by the National Natural Science Foundation of China under Grant No. 41301371 and by the State Key Laboratory of Geo-Information Engineering, No. SKLGIE2014-M-3-3.

REFERENCES
[1] Ghassemian H. A review of remote sensing image fusion methods[J]. Information Fusion, 2016, 32(PA): 75-89.
[2] Tsai C F. Image mining by spectral features: A case study of scenery image classification[J]. Expert Systems with Applications, 2007, 32(1): 135-142.
[3] Goel S, Gaur M, Jain E. Nature Inspired Algorithms in Remote Sensing Image Classification[J]. Procedia Computer Science, 2015, 57: 377-384.
[4] Xu M, Zhang L, Du B. An Image-Based Endmember Bundle Extraction Algorithm Using Both Spatial and Spectral Information[J]. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing, 2015, 8(6): 2607-2617.
[5] Platt R V, Rapoza L. An Evaluation of an Object-Oriented Paradigm for Land Use/Land Cover Classification[J]. Professional Geographer, 2008, 60(1): 87-100.
[6] Wolpert D H. The supervised learning no-free-lunch theorem[C]. Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, 2001.
[7] Kittler J, Hatef M, Duin R P W, et al. On combining classifiers[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1998, 20(3): 226-239.
[8] Doan H T, Foody G M. Increasing soft classification accuracy through the use of an ensemble of classifiers[J]. International Journal of Remote Sensing, 2007, 28(20): 4606-4623.
[9] Hansen L K, Salamon P. Neural network ensembles[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1990, 12(10): 993-1001.
[10] Lei Z, Liao S, Pietikäinen M, et al. Face Recognition by Exploring Information Jointly in Space, Scale and Orientation[J]. IEEE Transactions on Image Processing, 2011, 20(1): 247-256.
[11] Mountrakis G, Im J, Ogole C. Support vector machines in remote sensing: A review[J]. ISPRS Journal of Photogrammetry & Remote Sensing, 2011, 66(3): 247-259.
[12] Rokach L. Ensemble-based classifiers[J]. Artificial Intelligence Review, 2010, 33(1-2): 1-39.
[13] Dietterich T G. Ensemble Methods in Machine Learning[C]// International Workshop on Multiple Classifier Systems. Springer-Verlag, 2000: 1-15.
[14] Dietterich T G. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization[J]. Machine Learning, 2000, 40(2): 139-158.
[15] Littlewood B, Miller D R. Conceptual modeling of coincident failures in multiversion software[J]. IEEE Transactions on Software Engineering, 1989, 15(12): 1596-1614.
[16] Kuncheva L, Whitaker C J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy[J]. Machine Learning, 2003, 51(2): 181-207.
[17] Quinlan J R. Improved use of continuous attributes in C4.5[J]. Journal of Artificial Intelligence Research, 1996, 4(1): 77-90.
[18] Hunt E B, Marin J, Stone P J. Experiments in induction[J]. American Journal of Psychology, 1967, 80(4): 17-19.
[19] Luxburg U V. A tutorial on spectral clustering[J]. Statistics & Computing, 2007, 17(17): 395-416.
[20] Yang J F. A Novel Template Reduction K-Nearest Neighbor Classification Method Based on Weighted Distance[J]. Dianzi Yu Xinxi Xuebao/Journal of Electronics & Information Technology, 2011, 33(10): 2378-2383.
[21] Chen P H, Fan R E, Lin C J. A study on SMO-type decomposition methods for support vector machines[J]. IEEE Transactions on Neural Networks, 2006, 17(4): 893-908.
[22] Karatzoglou A, Smola A, Hornik K, et al. kernlab - An S4 Package for Kernel Methods in R[J]. Journal of Statistical Software, 2004, 11(i09): 721-729.
[23] Hameg S, Lazri M, Ameur S. Using naive Bayes classifier for classification of convective rainfall intensities based on spectral characteristics retrieved from SEVIRI[J]. Journal of Earth System Science, 2016: 1-11.
[24] Roy M, Routaray D, Ghosh S, et al. Ensemble of Multilayer Perceptrons for Change Detection in Remotely Sensed Images[J]. IEEE Geoscience & Remote Sensing Letters, 2014, 11(11): 49-53.
[25] Wolpert D H, Macready W G. An Efficient Method To Estimate Bagging's Generalization Error[C]// Santa Fe Institute, 1999: 41-55.
Authors' Profiles
Ye Zhiwei was born in Hubei, China. He is an associate professor in the School of Computer Science, Hubei University of Technology, Wuhan, China. His major research interests include image processing, swarm intelligence and machine learning.
How to cite this paper: Ye Zhiwei, Yang Juan, Zhang Xu, Hu Zhengbing, "Remote Sensing Textual Image Classification based on Ensemble Learning", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.21-29, 2016. DOI: 10.5815/ijigsp.2016.12.03
I.J. Image, Graphics and Signal Processing, 2016, 12, 30-37 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.04
Optimization of MUSIC and Improved MUSIC Algorithm to Estimate Direction of Arrival Pooja Gupta School of Electronics, KIIT University, Patia 751024, Bhubaneswar, Odisha, India Email: pooja.guptafet@kiit.ac.in
Vijay Verma School of Electronics, KIIT University, Patia 751024, Bhubaneswar, Odisha, India Email: vijay.vermafet@kiit.ac.in
Abstract—The performance of many signal processing applications is limited by the resolution of signal parameter estimation. Direction of Arrival (DOA) estimation algorithms therefore need to be effective and efficient in order to improve the performance of smart antennas. This paper presents simulations of a high-resolution, subspace-based DOA estimation algorithm. MUSIC (Multiple Signal Classification) and IMUSIC (Improved MUSIC) are presented and optimized by varying several parameters. The basic MUSIC algorithm is ineffective at estimating coherent incoming signals. The improved MUSIC algorithm overcomes this limitation and estimates correlated signals correctly with improved accuracy; it is obtained by taking the conjugate of the data matrix of the MUSIC algorithm and then reconstructing it. Factors such as the number of array elements, the number of snapshots, the distance between array elements, the SNR and the difference in arrival angles are varied to obtain better resolution. MUSIC and Improved MUSIC are compared in detail. Index Terms—DOA, MUSIC, ML, IMUSIC, ULA, ESPRIT.
I. INTRODUCTION
A smart antenna system detects the narrowband signals emitted by different sources with sensor arrays by applying specific estimation algorithms. Array signal processing processes and strengthens the useful signals received by the antenna elements, suppresses noise and interference, and simultaneously collects the signal parameters needed to carry out the estimation process. Direction of Arrival (DOA) estimation algorithms are of vital importance in engineering applications such as radio, sonar, wireless communication, astronomy, earthquake detection, medicine, tracking and other emergency services [1]. Smart antennas provide higher communication capacity by suppressing multipath signals and interference. The resolution of the estimation process can be enhanced by using an array of antennas rather than a single antenna [2], because the array provides spatial sampling. The Bartlett, MUSIC, MVDR, Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT), Capon, Maximum Likelihood (ML) [3] and Min-norm techniques are the main resolution algorithms. Spatial spectrum estimation comprises the target stage, the observation stage and the estimation stage [4], and assumes that the signals are distributed over the entire space in all directions. This spatial spectrum is exploited to obtain the directions of arrival of the incoming source signals. ESPRIT [5] and MUSIC (Multiple Signal Classification) [6-7] are the two most widely used subspace-based spectral estimation techniques, and both rely on eigenvalue decomposition. The covariance matrices of the signals form the basis of subspace-based approaches. Because ESPRIT is applicable only to array structures with particular geometries [8], MUSIC is regarded as the most classic and widely accepted parameter estimation technique, and it can be used for both uniform and nonuniform linear arrays. It works with the Uniform Linear Array (ULA), in which the Nyquist criterion determines the placement of the array elements; designing nonuniform arrays [9] is quite tedious.
Fig.1. The array antenna model for DOA Estimation Algorithms.
A simple arrangement of the antenna array receiving model is shown in Fig.1. The estimation algorithms are expected to compute the number of signals incident on the sensor array, their strengths and their directions of arrival, i.e. the incidence angles of the incoming signals. To carry out the estimation, the bearing space is first sampled uniformly to obtain a set of discrete angles; assuming that source and noise signals may arrive from each of these small bearing angles, the estimation algorithm picks the angles that correspond to the strongest power. Simulations carried out in MATLAB while varying these factors show that the efficiency and resolution of the spectrum obtained with the MUSIC algorithm can be increased by adjusting parameters such as the spacing between array elements, the number of array elements, the number of snapshots and the difference between signal incidence angles. The conventional MUSIC algorithm is not efficient at estimating correlated (coherent) signals, and its performance degrades when the antenna array has phase and amplitude errors. An improved MUSIC algorithm, obtained by taking the conjugate of the data matrix of the MUSIC algorithm and then reconstructing it, is also discussed in detail. This improved version solves the problem of coherent incoming signals and is found to correct the estimation errors where the conventional MUSIC falls short. This paper discusses the performance of both estimation algorithms as the factors of the estimation scenario are varied. The paper is organized as follows. Section II describes the mathematics behind the array sensor system. Section III presents the MUSIC algorithm, and Section IV discusses the parameters influencing its performance. Section V focuses on the detection of coherent signals, and the following section describes the Improved MUSIC algorithm.
Finally, Section VI concludes the article by drawing inferences from the study.

We consider F narrowband source signals with the same centre frequency f0 impinging on a linear array of D sensor elements with equal spacing between consecutive elements. The number of sensors is kept greater than the number of incident signals (D > F) for a better estimation result. The F incoming signals are incident on the sensor array with azimuth angles θp, where p varies from 1 to F. Consecutive elements of the array are placed half a wavelength of the received signal apart [10]. The spacing vector [d1 d2 … dD-1] gives the distances between consecutive elements of the antenna array.
Fig.2. The antenna system model for DOA Estimation
Fig.2 shows the structure of a uniform linear array, i.e. d1 = d2 = … = dD-1 = d, which lies in the same plane as the incoming signals. If the source signal has phase φ(t) and amplitude aamp(t), its complex representation is

$$S_s(t) = a_{amp}(t)\, e^{j\varphi(t)} \tag{1}$$
If the first element of the antenna array is taken as the reference element, the signal sensed by the kth array element due to the pth signal source [11] is

$$S_p(t)\, e^{-j\frac{2\pi}{\lambda}(k-1)d\sin\theta_p}, \quad 1 \le p \le F \tag{2}$$

where λ is the wavelength of the received signal.
The overall signal sensed at the kth array element (1 ≤ k ≤ D) due to all incoming signals is

$$O_k(t) = \sum_{p=1}^{F} S_p(t)\, e^{-j\frac{2\pi}{\lambda}(k-1)d\sin\theta_p} + n_k(t) \tag{3}$$
Here dk is the separation between the (k-1)th and kth array elements (equal to d for the uniform array), Sp(t) is the pth signal incident on the antenna array and nk(t) is the noise sensed by the kth element of the antenna array. The received signal matrix O(t), which contains all the information obtained by the D array elements, is then

$$O(t) = A S_s(t) + N(t) \tag{4}$$
where A is the steering vector matrix and N(t) is the total noise received by all the array elements. The signal received by the array elements is

$$O(t) = [O_1(t)\;\, O_2(t)\;\, \cdots\;\, O_D(t)]^T \tag{5}$$
Now calculating the matrix representation for the „F‟ signal sources that are being incident on sensor array and the steering vector matrix „A‟ [11] : Ss(t) = [ S1(t) S2(t)…SF(t) ]T
(6)
A= [ β(θ1) β(θ2), …. β(θF) ]
(7)
θ
(8)
The value of p in the above equation ranges from 1 to F.
III. THE MUSIC ALGORITHM Estimation approach known as Multiple Signal Classification (MUSIC) algorithm in 1979[12].This algorithm is similar in character with the maximum likelihood method and is basically an one-dimensional representation of the maximum entropy. The basic approach of this algorithm is to separate the signal from the noise using the Eigen value decomposition of the received signal covariance matrix. It uses the orthogonality property of both the signal and noise space. Then the estimation is brought about by using one of these subspaces and considering that the noise in each channel is highly uncorrelated. As this algorithm takes uncorrelated noise into account, the generated covariance matrix is diagonal in nature. Here the signal and the noise subspaces are computed using the matrix algebra and are found to be orthogonal to each other. Therefore this algorithm exploits the orthogonality property to isolate the signal and the noise subspaces. There are a variety of algorithms for carrying out the estimation process, out of which this paper focuses on the most accepted and widely used Multiple Signal Classification algorithm. It decomposes the obtained data correlation matrix into two orthogonal subspaces, namely the signal and the noise subspaces. And then it uses of these subspaces to estimate the direction of the incoming signal. The obtained data correlation matrix actually forms the base for the MUSIC algorithm. We then need to search through the entire steering vector matrix and then try bringing out those steering signals and noise vectors which are exactly orthogonal. Let „CJ‟ be the covariance matrix for the received data O, where covariance matrix is the expectance of the matrix with its Hermitian equivalent.
CO = E[OOH] Now using equation (5) for the value of O we get: CO = E[(AS+N) (AS+N)] Copyright © 2016 MECS
(10)
= ACOAH + CN CN is the correlation matrix for noise and it can be expressed as: CN = σ2 I
The above two matrices given in equations (6) and (7) form the signal subspace. And in the following equation β(θp) gives the number of source signals that are being incident as β(θp) = θ
= AE[SSH]AH + E[NNH]
(11)
I in the above equation represents an unit matrix for the antenna array D*D. As the signals practically are associated with some noise, we need to compute the correlation matrix taking noise into account, so the new modified matrix is: CO = ACSAH + CN
(12)
Here $C_S$ denotes the source correlation matrix, $C_N$ the noise correlation matrix and $A$ the steering vector matrix. To distinguish the signal sources from the noise, we carry out the eigenvalue decomposition of the covariance matrix $C_O$. This yields $D$ eigenvalues, of which the $F$ largest correspond to the signal sources, while the remaining $D - F$ smaller eigenvalues correspond to the noise. Let $B_S$ and $B_N$ be bases for the signal and noise subspaces respectively, formed by the corresponding eigenvectors. The decomposed form of the correlation matrix is then:
$$ C_O = B_S \Sigma_S B_S^{H} + B_N \Sigma_N B_N^{H} \tag{13} $$
where $\Sigma_S$ and $\Sigma_N$ are diagonal matrices holding the $F$ signal eigenvalues and the $D - F$ noise eigenvalues respectively.
$$ \beta^{H}(\theta) B_N = 0 \tag{14} $$
Equation (14) holds when $\theta$ is one of the true arrival angles, because the steering vectors of the incident sources lie in the signal subspace, which is orthogonal to the noise subspace; this orthogonality is the basis of the MUSIC estimation approach. The arrival angles can therefore be obtained by minimizing the projection of the steering vector onto the noise subspace:
$$ \theta_{\text{MUSIC}} = \arg\min_{\theta} \; \beta^{H}(\theta) B_N B_N^{H} \beta(\theta) \tag{15} $$
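The eigen-split of (13) and the orthogonality condition (14) can be verified on the ideal covariance $A A^{H} + \sigma^2 I$ (unit-power, uncorrelated sources). The sketch below is our own NumPy illustration; it assumes a ULA with $\lambda/2$ spacing and the phase convention $e^{-j2\pi (d/\lambda) m \sin\theta}$, and the helpers `steering_matrix` and `split_subspaces` are hypothetical names.

```python
import numpy as np

def steering_matrix(angles_deg, num_elements, d_over_lambda=0.5):
    m = np.arange(num_elements)[:, None]
    theta = np.deg2rad(np.atleast_1d(angles_deg))[None, :]
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

def split_subspaces(C, num_sources):
    """Eigenvalues (descending) and bases B_S, B_N of a Hermitian covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs[:, :num_sources], eigvecs[:, num_sources:]

D, F, sigma2 = 10, 2, 0.1                        # 10 elements, 2 sources, SNR = 10 dB
A = steering_matrix([30.0, 70.0], D)
C_O = A @ A.conj().T + sigma2 * np.eye(D)        # ideal covariance (12) with C_S = I
eigvals, B_S, B_N = split_subspaces(C_O, F)

print(np.round(eigvals, 3))                      # F large values, then D - F values equal to sigma2
# Equation (14): at the true angles the steering vectors are orthogonal to B_N
print(np.linalg.norm(B_N.conj().T @ A, axis=0))
# At an angle with no source, the projection onto B_N is clearly non-zero
print(np.linalg.norm(B_N.conj().T @ steering_matrix(50.0, D)))
```

With an estimated covariance instead of the ideal one, the noise eigenvalues are only approximately equal and (14) holds only approximately, which is why MUSIC searches for minima of (15) rather than exact zeros.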
Since the arrival angles are most easily identified as peaks in a spectral estimation plot, we work with the reciprocal of the quantity minimized in (15), so that the desired angles appear as spectral maxima. Several approaches, such as linear prediction, eigen analysis, beamforming, the array correlation matrix method, maximum likelihood, minimum variance and MUSIC, each define their own pseudo-spectrum function. For MUSIC, taking the reciprocal of (15) gives the pseudo-spectrum $P_{\text{MUSIC}}$:
$$ P_{\text{MUSIC}}(\theta) = \frac{1}{\beta^{H}(\theta) B_N B_N^{H} \beta(\theta)} \tag{16} $$
This pseudo-spectrum exhibits sharp, high peaks when $\theta$ equals the direction of arrival of a signal source. The $F$ highest peaks, which have the greatest power [13], correspond to the estimated arrival angles.
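The full estimation chain, from data matrix to the peaks of (16), can be sketched in a few lines of NumPy. This is our own illustration rather than the authors' MATLAB implementation. It assumes a ULA with the phase convention $e^{-j2\pi (d/\lambda) m \sin\theta}$, unit-power uncorrelated complex Gaussian sources, white Gaussian noise, a known number of sources $F$, and a $0.1^{\circ}$ search grid; the helpers `steering_matrix`, `music_spectrum`, `find_peaks` and `simulate` are hypothetical names. The defaults mirror the simulation setup of Section IV ($D = 10$, $\lambda/2$ spacing, 10 dB SNR, 200 snapshots, sources at $30^{\circ}$ and $70^{\circ}$).

```python
import numpy as np

def steering_matrix(angles_deg, num_elements, d_over_lambda=0.5):
    """ULA steering vectors beta(theta) as columns (D x len(angles))."""
    m = np.arange(num_elements)[:, None]
    theta = np.deg2rad(np.atleast_1d(angles_deg))[None, :]
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

def music_spectrum(O, num_sources, grid_deg, d_over_lambda=0.5):
    """Pseudo-spectrum P_MUSIC(theta) of equation (16) from a D x K data matrix O."""
    D, K = O.shape
    C_O = O @ O.conj().T / K                           # sample covariance
    _, eigvecs = np.linalg.eigh(C_O)                   # ascending eigenvalues
    B_N = eigvecs[:, : D - num_sources]                # D - F smallest -> noise subspace
    beta = steering_matrix(grid_deg, D, d_over_lambda)
    denom = np.sum(np.abs(B_N.conj().T @ beta) ** 2, axis=0)   # beta^H B_N B_N^H beta
    return 1.0 / denom

def find_peaks(P, grid_deg, num_peaks):
    """Angles of the num_peaks highest interior local maxima of P, sorted."""
    is_peak = (P[1:-1] > P[:-2]) & (P[1:-1] >= P[2:])
    idx = np.nonzero(is_peak)[0] + 1
    best = idx[np.argsort(P[idx])[::-1][:num_peaks]]
    return np.sort(grid_deg[best])

def simulate(angles, D=10, K=200, snr_db=10.0, d_over_lambda=0.5, seed=0):
    """Data O = AS + N with unit-power uncorrelated sources and white Gaussian noise."""
    rng = np.random.default_rng(seed)
    F = len(angles)
    sigma2 = 10 ** (-snr_db / 10)
    A = steering_matrix(angles, D, d_over_lambda)
    S = (rng.standard_normal((F, K)) + 1j * rng.standard_normal((F, K))) / np.sqrt(2)
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K)))
    return A @ S + N

grid = np.arange(-90.0, 90.0 + 1e-9, 0.1)
O = simulate([30.0, 70.0])
P = music_spectrum(O, num_sources=2, grid_deg=grid)
print(find_peaks(P, grid, num_peaks=2))   # expected to lie close to 30 and 70 degrees
```

Plotting `10 * np.log10(P / P.max())` against `grid` gives a spatial spectrum of the kind described for the basic simulation. The simple local-maximum search assumes $F$ is known; estimating $F$ from the eigenvalues is a separate problem.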
The entire algorithm can be summarized by the flow chart in Fig.3.

Fig.3. Flow chart summarizing MUSIC algorithm.

IV. SIMULATION RESULT FOR A BASIC MUSIC ALGORITHM
For the basic standard simulation of the MUSIC algorithm, the signals are narrowband and non-coherent, and the spacing between adjacent array elements is half the wavelength of the received signal. The standard number of array elements D is 10, the SNR is 10 dB and the number of snapshots is 200. The noise is additive white Gaussian noise. Two incoming signals with arrival angles 30° and 70° are considered. Fig.4 shows the MATLAB simulation of the MUSIC algorithm with these specifications.

Fig.4. Spatial spectrum for MUSIC algorithm

The independent spectrum peaks in the graph occur at 30° and 70°, showing that the MUSIC algorithm can estimate the arrival angles of the signal sources accurately.

V. FACTORS INFLUENCING THE PERFORMANCE OF MUSIC ALGORITHM
Various parameters influence the performance of the MUSIC algorithm. The following subsections describe the factors that can improve its resolution.
A. Dependence of the estimation process on SNR
We carry out the simulation keeping all specifications the same as for the basic MUSIC simulation and varying the SNR as 0, 10, 15 and 20 dB.

Fig.5. Efficiency result on varying the SNR

The red, blue, green and black curves correspond to the estimation results for 0, 10, 15 and 20 dB SNR respectively. The peaks become sharper and more distinct as the SNR increases, and the estimation plots covering a larger range of the spectral function have better efficiency and resolution. Improving the performance of the estimation process at low SNR is therefore an active research topic.
B. Dependence of the estimation process on the number of sensor elements
Keeping the other specifications unchanged, we vary the number of sensor elements as 10, 20, 50 and 100. The resulting estimation graph is shown in Fig.6: the peaks obtained with more array elements are sharper and more prominent.
Fig.6. Efficiency results on varying the number of sensor elements.

The red, blue and green curves represent the simulations with 10, 20 and 50 array elements respectively, and the black curve, which has the sharpest peaks, corresponds to 100 array elements. Hence the estimation becomes more efficient as the number of sensing elements increases. However, increasing the number of sensing elements indiscriminately is not economical.
C. Dependence of the estimation process on inter-element spacing
The MUSIC algorithm is efficient when the array is a uniform linear array, in which the inter-element spacing is the same between all consecutive sensors. The standard simulation places the elements equidistantly with a gap of λ/2, where λ is the wavelength of the signal. We then vary the uniform inter-element spacing as λ/6, λ/4 and λ/2.

Fig.7. Enhancing efficiency by increasing the inter element spacing within λ/2.

The red, blue and black curves correspond to inter-element spacings of λ/6, λ/4 and λ/2 respectively. Increasing the spacing up to a maximum of λ/2 clearly gives sharper and more prominent peaks. When the inter-element spacing is increased beyond λ/2, however, false peaks appear, as shown in Fig.8.

Fig.8. False peaks on increasing the inter element spacing beyond λ/2.
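The false peaks that appear for spacings larger than λ/2 are consistent with spatial aliasing (grating lobes). This is a standard array-processing explanation that we add here, not one given in the paper: when d > λ/2, two different arrival angles can produce identical steering vectors, so no subspace method can tell them apart. The sketch below assumes a ULA whose m-th element has phase $e^{-j2\pi (d/\lambda) m \sin\theta}$; `steering_vector` and `ambiguous_angles` are hypothetical helper names.

```python
import numpy as np

def steering_vector(theta_deg, num_elements, d_over_lambda):
    """ULA steering vector; element m has phase -2*pi*(d/lambda)*m*sin(theta)."""
    m = np.arange(num_elements)
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(np.deg2rad(theta_deg)))

def ambiguous_angles(theta_deg, d_over_lambda):
    """Angles in [-90, 90] whose steering vector coincides with that of theta_deg.

    The phases agree on every element when (d/lambda) * (sin t1 - sin t2) is an
    integer n != 0, i.e. sin t2 = sin t1 - n / (d/lambda).
    """
    u = np.sin(np.deg2rad(theta_deg))
    n_max = int(np.ceil(2 * d_over_lambda))      # |sin t1 - sin t2| <= 2 bounds n
    aliases = []
    for n in range(-n_max, n_max + 1):
        u2 = u - n / d_over_lambda
        if n != 0 and -1.0 <= u2 <= 1.0:
            aliases.append(float(np.rad2deg(np.arcsin(u2))))
    return aliases

for d in (0.5, 2 / 3, 1.0):                     # spacings lambda/2, 2*lambda/3, lambda
    print(f"d = {d:.3f} lambda: aliases of 70 deg -> {ambiguous_angles(70.0, d)}")
```

For d = λ/2 the list is empty for every angle, which matches the observation that λ/2 is the largest spacing free of false peaks.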
As Fig.8 shows, the MUSIC estimator produces false peaks when the inter-element distance exceeds λ/2; the blue and black curves correspond to element spacings of 2λ/3 and λ respectively. The best results for the MUSIC algorithm are therefore obtained with an inter-element spacing of λ/2.
D. Dependence of the estimation process on the number of snapshots
The estimation results obtained on varying the number of snapshots as 10, 100 and 200 are shown in Fig.9, where the red, blue and black curves correspond to 10, 100 and 200 snapshots respectively. With the other conditions unchanged, the beam width of the DOA estimation spectrum becomes narrower and the accuracy improves as the number of snapshots increases.

Fig.9. Enhancing efficiency by increasing the number of snapshots.
The curves corresponding to more snapshots cover larger spectral ranges. The number of snapshots can be increased to improve accuracy, but the amount of computation grows with it, so a moderate number of snapshots should be chosen that ensures accuracy while keeping the computation low.
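The trends reported in Section V for SNR, number of elements and number of snapshots can be reproduced qualitatively with a small Monte Carlo experiment. Instead of comparing plots, the sketch below measures the root-mean-square error of the MUSIC estimates over repeated trials. It is our own illustration under stated assumptions: a λ/2 ULA, two unit-power uncorrelated sources at 30° and 70°, white Gaussian noise, a 0.05° search grid and a fixed random seed; `steering_matrix`, `music_doa` and `doa_rmse` are hypothetical names.

```python
import numpy as np

def steering_matrix(angles_deg, num_elements, d_over_lambda=0.5):
    m = np.arange(num_elements)[:, None]
    theta = np.deg2rad(np.atleast_1d(angles_deg))[None, :]
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

def music_doa(O, num_sources, grid_deg):
    """Estimate DOAs as the highest local maxima of the MUSIC pseudo-spectrum (16)."""
    D, K = O.shape
    _, eigvecs = np.linalg.eigh(O @ O.conj().T / K)    # ascending eigenvalues
    B_N = eigvecs[:, : D - num_sources]                # noise subspace
    P = 1.0 / np.sum(np.abs(B_N.conj().T @ steering_matrix(grid_deg, D)) ** 2, axis=0)
    is_peak = (P[1:-1] > P[:-2]) & (P[1:-1] >= P[2:])
    idx = np.nonzero(is_peak)[0] + 1
    best = idx[np.argsort(P[idx])[::-1][:num_sources]]
    return np.sort(grid_deg[best])

def doa_rmse(D=10, snr_db=10.0, K=200, trials=20, seed=0, true_deg=(30.0, 70.0)):
    """Monte Carlo RMSE (degrees) of MUSIC for uncorrelated sources on a lambda/2 ULA."""
    rng = np.random.default_rng(seed)
    grid = np.arange(-90.0, 90.0 + 1e-9, 0.05)
    truth = np.array(true_deg)
    A = steering_matrix(truth, D)
    F = truth.size
    sigma2 = 10 ** (-snr_db / 10)
    sq_err = []
    for _ in range(trials):
        S = (rng.standard_normal((F, K)) + 1j * rng.standard_normal((F, K))) / np.sqrt(2)
        N = np.sqrt(sigma2 / 2) * (rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K)))
        est = music_doa(A @ S + N, F, grid)
        sq_err.append(np.mean((est - truth) ** 2))
    return float(np.sqrt(np.mean(sq_err)))

for label, kwargs in [("SNR 0 dB", dict(snr_db=0.0)), ("SNR 20 dB", dict(snr_db=20.0)),
                      ("D = 10", dict(D=10)), ("D = 50", dict(D=50)),
                      ("K = 10", dict(K=10)), ("K = 1000", dict(K=1000))]:
    print(f"{label:10s} RMSE = {doa_rmse(**kwargs):.3f} deg")
```

Because the grid step is 0.05°, RMSE values near that size mainly reflect grid quantization; a finer grid or local peak interpolation would be needed to resolve smaller errors.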
VI. DETECTION OF COHERENT SIGNALS
MUSIC is widely used among estimation algorithms because its sharp, needle-like spectrum can estimate independent source signals precisely. It provides unbiased estimation results in many practical applications and performs well even in a multiple-signal environment. However, it achieves high DOA resolution [14] only when the incoming signals are non-coherent. The results and simulations in the preceding sections assume that the incident signals are unrelated, i.e. not generated from the same source, and for such non-coherent signals the performance of the MUSIC algorithm is quite satisfactory. We now carry out simulations for coherent, and thus related, signals. The number of array elements is 10, the signals arrive from 30° and 70°, the SNR is 10 dB, the inter-element spacing is half the signal wavelength, the number of snapshots is 200, and the coherent signals have the same frequency. The comparison of coherent and non-coherent incoming signals using the MUSIC algorithm is shown in Fig.10.

Fig.10. Estimation of coherent and non-coherent incoming signals using MUSIC algorithm.

The blue and red curves correspond to the estimation results for coherent and non-coherent incoming signals respectively. The MUSIC algorithm clearly cannot detect related signals with the same frequency: the peak search fails when the signals are coherent. Hence coherent signals cannot be estimated efficiently with the basic conventional MUSIC algorithm, whose performance and accuracy decline when the incoming signals are related and also in the presence of amplitude and phase errors.

VII. THE IMPROVED MUSIC ALGORITHM
The Improved MUSIC algorithm proposed in this section can effectively estimate related as well as unrelated signals, and it can even partially calibrate array errors. For coherent signals, the peaks obtained with conventional MUSIC are neither sharp nor narrow, so the arrival angles cannot be estimated, and an improved algorithm is needed to meet the estimation requirements. To improve on the conventional MUSIC algorithm [15], we introduce a transition matrix $J$, the $D \times D$ exchange matrix (the identity matrix with its columns in reverse order), and apply it to the received signal:
$$ X = J O^{*} \tag{17} $$
where $O^{*}$ is the complex conjugate of the received signal matrix $O$. The correlation matrix of $X$ is then:
$$ C_X = E[X X^{H}] = J C_O^{*} J \tag{18} $$
A reconstructed matrix $C$ is obtained by summing $C_O$ and $C_X$. Since both matrices have the same noise subspace, their sum keeps it:
$$ C = C_O + C_X \tag{19} $$
$$ C = A C_S A^{H} + J (A C_S A^{H})^{*} J + 2\sigma^{2} I \tag{20} $$
Improved MUSIC differs from the conventional algorithm in that it discards the noise subspace obtained from the decomposition of $C_O$; instead, the noise subspace obtained from the eigendecomposition of the reconstructed matrix $C$ is used to construct the spatial spectrum and locate its peaks. Keeping all specifications the same as in the simulations of the basic MUSIC algorithm, we repeat the simulation with Improved MUSIC. Fig.11 compares the estimation of coherent and non-coherent signals using Improved MUSIC. It shows that Improved MUSIC overcomes this limitation of MUSIC: its performance for coherent signals is satisfactory, although its resolution and accuracy remain noticeably higher for non-coherent signals than for coherent ones.
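The reconstruction of (17)-(20) can be sketched in NumPy. We assume that the transition matrix is the exchange matrix $J$ (ones on the anti-diagonal); under that assumption the construction coincides with the well-known forward-backward averaging technique. The coherent pair is modelled, as an assumption of ours, by a second source that is a 90°-phase-shifted copy of the first; all other settings follow the simulation above, and `steering_matrix` and `music_peaks` are hypothetical helpers. The rank-one source covariance shows why conventional MUSIC fails, while the noise subspace of $C$ recovers both arrival angles.

```python
import numpy as np

def steering_matrix(angles_deg, num_elements, d_over_lambda=0.5):
    m = np.arange(num_elements)[:, None]
    theta = np.deg2rad(np.atleast_1d(angles_deg))[None, :]
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

def music_peaks(C, num_sources, grid_deg):
    """DOAs from the noise subspace of a covariance matrix C (highest local maxima of (16))."""
    D = C.shape[0]
    _, eigvecs = np.linalg.eigh(C)
    B_N = eigvecs[:, : D - num_sources]
    P = 1.0 / np.sum(np.abs(B_N.conj().T @ steering_matrix(grid_deg, D)) ** 2, axis=0)
    is_peak = (P[1:-1] > P[:-2]) & (P[1:-1] >= P[2:])
    idx = np.nonzero(is_peak)[0] + 1
    best = idx[np.argsort(P[idx])[::-1][:num_sources]]
    return np.sort(grid_deg[best])

rng = np.random.default_rng(0)
D, K, snr_db = 10, 200, 10.0
sigma2 = 10 ** (-snr_db / 10)
s1 = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
S = np.vstack([s1, 1j * s1])             # coherent pair: same waveform, fixed 90-degree phase offset
N = np.sqrt(sigma2 / 2) * (rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K)))
O = steering_matrix([30.0, 70.0], D) @ S + N

C_O = O @ O.conj().T / K                 # conventional covariance
J = np.fliplr(np.eye(D))                 # exchange matrix (ones on the anti-diagonal)
C_X = J @ C_O.conj() @ J                 # equation (18)
C = C_O + C_X                            # equation (19)

grid = np.arange(-90.0, 90.0 + 1e-9, 0.1)
print("rank of source covariance:", np.linalg.matrix_rank(S @ S.conj().T / K))
print("conventional MUSIC:", music_peaks(C_O, 2, grid))
print("Improved MUSIC:   ", music_peaks(C, 2, grid))
```

In general, forward-backward averaging can decorrelate at most two coherent sources, and only for phase relationships that do not leave the averaged source covariance singular; for larger groups of coherent signals, spatial smoothing is the usual remedy.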
Fig.11. Estimation of coherent and non-coherent incoming signals using Improved MUSIC algorithm.

The red and blue curves give the estimation results for coherent and non-coherent signals respectively. The blue curve for the non-coherent signals covers a larger range of the spectrum, so Improved MUSIC is more efficient for non-coherent signals than for coherent ones. Fig.12 compares the MUSIC and Improved MUSIC algorithms for the estimation of coherent signals.

Fig.12. Comparison of MUSIC and Improved MUSIC for the estimation of coherent signals.

The problems in estimating coherent signals are thus overcome by the Improved MUSIC algorithm. Improved MUSIC also performs better for non-coherent signals, as can be judged by comparing the simulation curves for the estimation of arriving non-coherent signals using both MUSIC and Improved MUSIC in Fig.13.

Fig.13. Estimation of incoming non-coherent signals using both MUSIC and Improved MUSIC algorithms.

In Fig.13, the curve representing the Improved MUSIC estimation covers a wider spectral range than the curve for conventional MUSIC, so non-coherent signals can also be estimated with higher resolution and accuracy using Improved MUSIC. As with basic MUSIC, the efficiency of Improved MUSIC for both coherent and non-coherent signals can be increased by tuning the parameters: increasing the number of sensors, the number of snapshots and the SNR, and increasing the inter-element spacing up to λ/2.

VIII. CONCLUSION
Eigenvalues and eigenvectors form the basis of both the MUSIC and Improved MUSIC algorithms, which exploit the orthogonality between the signal and noise subspaces to estimate the direction of arrival of the incoming signals. The performance of these algorithms improves with more array elements, a larger number of snapshots, a larger element spacing (subject to the λ/2 limit discussed above) and, most effectively, a higher SNR. The MUSIC algorithm is efficient for the detection of non-coherent signals, but its performance declines sharply when the incoming signals are coherent. The Improved MUSIC algorithm solves the problem of estimating coherent signals.

REFERENCES
[1] S. Haykin, J. P. Reilly, E. Vertatschitsch, "Some Aspects of Array Signal Processing", IEE Proc. F, vol. 139, pp. 1-26, 1992.
[2] Zekeriya Aliyazicioglu, H. K. Hwang, Marshall Grice, Anatoly Yakovlev, "Sensitivity Analysis for Direction of Arrival Estimation Using a Root-MUSIC Algorithm", Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS), vol. II, 19-21 March 2008.
[3] T. S. Rappaport, J. C. Liberti Jr., "Smart Antennas for Wireless Communications: IS-95 and Third Generation CDMA Applications", Upper Saddle River, NJ: Prentice Hall, 1999.
[4] Yongliang Wang, "Space Spectral Estimation Theory and Algorithm", Tsinghua Press, China, 2004.
[5] Richard Roy, Thomas Kailath, "ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 7, pp. 984-995, July 1989.
[6] Ronald D. DeGroat, "The Constrained MUSIC Problem", IEEE Trans. on Signal Processing, vol. 41, no. 3, pp. 1445-1449, 1993.
[7] Fu Li, Richard J. Vaccaro, "Analysis of Min-Norm and MUSIC with Arbitrary Array Geometry", IEEE Trans. on Aerospace and Electronic Systems, vol. 26, no. 6, pp. 976-985, 1990.
[8] M. Gavish, "Performance Analysis of the VIA ESPRIT Algorithm", IEE Proc. F, vol. 140, no. 2, pp. 123-128, 1993.
[9] B. P. Kumar, G. R. Branner, "Design of Unequally Spaced Arrays for Performance Improvement", IEEE Trans. on Antennas and Propagation, vol. 47, no. 3, pp. 511-523, 1999.
[10] Y. J. Huang, Y. W. Wang, F. J. Meng, G. L. Wang, "A Spatial Spectrum Estimation Algorithm based on Adaptive Beamforming Nulling", Piscataway, NJ, USA, June 2013, pp. 220-224.
[11] ZHANG Hongmei, GAO Zhenguo, FU Huixuan, "High Resolution Random Linear Sonar Array Based MUSIC Method for Underwater DOA Estimation", Proceedings of the 32nd Chinese Control Conference, July 26-28, 2013.
[12] Ralph O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation", IEEE Trans. on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, March 1986.
[13] L. N. Yan, "Study of Factors Affecting Accuracy of DOA", Modern Radar, vol. 29, no. 6, pp. 70-73, June 2007.
[14] Fei Wen, Qun Wan, Rong Fan, Hewen Wei, "Improved MUSIC Algorithm for Multiple Noncoherent Subarrays", IEEE Signal Processing Letters, vol. 21, no. 5, May 2014.
[15] Debasis Kundu, "Modified MUSIC Algorithm for Estimating DOA of Signals", Department of Mathematics, Indian Institute of Technology Kanpur, India, November 1993.
Authors’ Profiles
Pooja Gupta was born in Odisha, India in 1992. She completed her dual-degree B.Tech and M.Tech programme in Electronics and Telecommunication Engineering at KIIT University, Bhubaneswar, Odisha, in 2015. She is currently working as a Faculty Associate at KIIT University. Her research interests include subspace methods for direction of arrival estimation.
Vijay Kumar Verma was born in Uttar Pradesh, India on 21 May 1990 and completed his postgraduate studies at IIT (BHU) Varanasi. He is currently working as an Assistant Professor at KIIT University, Odisha. His fields of interest include magnetic levitation systems, model order control, robust control and optimization techniques.
How to cite this paper: Pooja Gupta, Vijay Verma, "Optimization of MUSIC and Improved MUSIC Algorithm to Estimate Direction of Arrival", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.30-37, 2016. DOI: 10.5815/ijigsp.2016.12.04
I.J. Image, Graphics and Signal Processing, 2016, 12, 38-46 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.05
A Survey on Shadow Removal Techniques for Single Image
Saritha Murali¹, V.K. Govindan², Saidalavi Kalady¹
¹ Department of Computer Science and Engineering, National Institute of Technology Calicut, India. Email: saritha.mkv@gmail.com
² Department of Computer Science and Engineering, Indian Institute of Information Technology, Kottayam, India. Email: vkg@nitc.ac.in, said@nitc.ac.in
Abstract—Shadows are physical phenomena that appear on a surface when direct light from a source cannot reach the surface because an object lies between the source and the surface. The formation of shadows and their various features have become a topic of discussion among researchers. Though the presence of shadows can help us understand the scene model, it may impair the performance of applications such as object detection. Hence, shadows must be removed from videos and images for certain image processing tasks to work reliably. This paper presents a survey of notable shadow removal techniques for a single image available in the literature. For the purpose of the survey, the shadow removal algorithms are classified into five categories, namely, reintegration methods, relighting methods, patch-based methods, color transfer methods, and interactive methods. A comparative study of the qualitative and quantitative performance of these works is also included, and the pros and cons of the various approaches are highlighted. The survey concludes with the following observations: (i) shadow removal should be performed in real time, since it is usually treated as a preprocessing task, (ii) the texture and color information of the regions underlying the shadow must be recovered, and (iii) there should be no hard transition between shadow and non-shadow regions after the shadows are removed. Index Terms—Shadow removal, reintegration, relighting, color-transfer.
I. INTRODUCTION
The presence of shadows in an image helps the viewer locate the light sources and determine the size and shape of the object casting the shadow. However, shadows impair the proper execution of computer vision algorithms for segmentation, object recognition, video analysis, etc. In image segmentation and object detection, the shadow region itself may be misclassified as an object or part of an object. Detection of moving objects in a video recording may also yield incorrect results because of moving shadows. Hence, these applications need shadows to be eliminated in the preprocessing stage. The human visual system can locate shadows in an image in most situations; however, it is hard to develop algorithms that detect shadows automatically without any human intervention. Various complications may arise when dealing with shadows in an image. An image may contain complex scattered shadows, self-shadows, hard and soft shadows caused by multiple light sources, shadow boundaries coinciding with object edges, and so on. In addition, shadows of different objects may overlap, or the object casting the shadow may be absent from the image. Shadows may span multiple surfaces or have complex underlying texture, and there may be dark regions that resemble shadows. All these issues make detecting and removing shadows from an image difficult.
We consider shadow removal as a two-phase process, involving a detection phase and a removal phase. Numerous methods to remove shadows from videos [1], aerial images [2] and outdoor images [3] are available in the literature. A survey of shadow detection and removal methods for real images was carried out by Xu et al. [4], and another extensive review with a classification of shadow removal algorithms was presented by Sasi and Govindan [5]. In this work, we review the shadow removal techniques for a single image available in the literature. We have identified five major classes of shadow removal algorithms, namely, reintegration methods, relighting methods, patch-based methods, color transfer methods, and interactive methods. Though some works may fit under more than one category, we have placed each of them in the most appropriate one.
The remainder of this paper is structured as follows. Section II introduces different types of shadows. Section III presents the major challenges involved and the expected outcome of shadow removal algorithms. The various classes of techniques employed for shadow removal are reviewed in Section IV. A comparison of the methods discussed follows in Section V. Finally, Section VI concludes the survey, highlighting the major observations.
II. TYPES OF SHADOWS
Shadows can be broadly classified as self-shadows and cast shadows. A self-shadow is the shadow cast by an object on itself; it appears on the part of an object that direct light from a source cannot reach. Cast shadows are shadows formed by an object on another object or surface. When the scene is illuminated by multiple light sources, a cast shadow has two main regions: the dark inner region, called the umbra, and the lighter outer region, called the penumbra. The texture and color information of the underlying surface is mostly lost in the umbra, while the illumination intensity is non-uniform in the penumbra. Regaining the texture or color information under hard shadows may be difficult because of the loss of detail in these poorly illuminated areas. The different shadow regions are shown in Fig.1.
Fig.1. Shadow regions

III. CHALLENGES IN SHADOW REMOVAL
Eliminating shadows from an image is particularly challenging because all the information needed to locate and eliminate them must be derived from the input image itself. Automated methods have no additional information about the number or location of the light sources illuminating the scene, nor any depth information, and the object casting the shadow is often not present in the image. All of these factors make detecting and removing shadows from an image difficult. An automated shadow removal algorithm is expected to have the following properties:
- Preserve the texture beneath the shadow
- Retain the color information of the surface
- Make the shadow edges unnoticeable in the shadow-free image
- Reduce visual artifacts
- Consume little time, since shadow removal is treated as a preprocessing task

IV. SHADOW REMOVAL TECHNIQUES
Some shadow removal techniques include a shadow detection phase before the removal stage, while others take a shadow detection result as input and perform removal on it. This survey focuses on the removal phase rather than the detection phase. The major works in shadow removal are classified into the following categories:
A. Reintegration methods
B. Relighting methods
C. Patch-based methods
D. Color transfer methods
E. Interactive methods
A brief description of the works in each category is given in this section.
A. Reintegration Methods
The reintegration methods for shadow removal are built on the idea that nullifying the image gradient along the shadow edges and integrating the modified gradient back produces a shadow-free image. Finlayson and colleagues carried out an extensive study of reintegration-based shadow removal [6,7,9,10,11]. The shadow edges were first detected using an invariant image representation; the image gradient was then computed and nullified along the shadow edges, followed by reintegration. These techniques assume that illumination varies more slowly across the image than reflectance. Finlayson et al. [6] removed shadows by reintegrating the image gradient after setting the x and y derivatives of the shadow-edge pixels to zero. The reintegration was done separately for each color band by solving a 2-D Poisson equation. In [7], the shadow-edge derivatives are not set to zero; instead, iterative diffusion fills them with values from non-shadow neighbors, followed by reintegration as in [6]. 2-D reintegration produces artifacts when the shadow edge detection input is imperfect, and it is computationally expensive because two derivatives are reintegrated per pixel. The authors later proposed a 1-D path-based reintegration using retinex [8] in [9] to reduce this computational overhead. The average lightness at each pixel was obtained by computing the ratio between the initial pixel and each pixel along multiple paths in each color channel, using 20 to 40 random paths, each three-quarters of the image size in length. Reintegration along random paths with retinex often visited certain pixels multiple times, leading to visible artifacts and higher complexity. Fredembach and Finlayson [10] proposed a faster reintegration of the image gradient along non-random raster and fractal paths; their technique gave good results when averaged over 16 such Hamiltonian paths. The authors subsequently observed that shadow edges should be closed and that a path should enter and leave the shadow region only once [11]. Hamiltonian paths satisfying these conditions were used for reintegration, yielding better shadow removal in less time.
Discussion: The reintegration-based shadow removal algorithms generally improve the quality of the entire input image. Their major drawback is the need for strong shadow edges in the detection output. In addition, 2-D reintegration is more computationally expensive than 1-D path reintegration, and 1-D reintegration produces fewer artifacts if each pixel is encountered only once and the results are averaged over multiple paths.
B. Relighting Methods
Shadow regions appear in an image because less light reaches those areas than the non-shadow regions. The objective of relighting methods is to find a factor that can be used to enhance the lightness of the shadow pixels. In [12], Arbel and Hel-Or noted that this factor can be either a multiplicative constant in the image domain or an additive constant in the log domain, as shown in (1) and (2) respectively.
$$ I = C \cdot R \cdot L \tag{1} $$
Taking the logarithm of (1):
$$ \log I = \log C + \log R + \log L \tag{2} $$
where I is the image, R and L are the reflectance and luminance components, and C is the factor used to correct the shadow pixels. The various methods to relight shadow regions are discussed in this section. In [3], the authors demonstrated that the costly reintegration procedure can be replaced by computing, in each color channel, a constant that minimizes the variation between shadow and non-shadow pixels on either side of the shadow edge. They evaluated an array C that minimizes the least-squares error between the pixel arrays P and S, outside and inside the shadow edge respectively, using equation (3):
$$ \min_{C} \lVert P - (S + C) \rVert^{2} \tag{3} $$
The values in C corresponding to the least error are then averaged to find the constant, which is calculated separately for each shadow region and added to the pixels in that region to obtain the shadow-free image. Du et al. [13] estimated the solar and environmental light, the pixel reflectance and the light attenuation factor to relight each shadow pixel in an outdoor image. Arbel and Hel-Or [12] proposed an algorithm to remove shadows on curved surfaces. They first calculated an additive scale factor in the log domain for the inner shadow region using cubic smoothing splines, and then applied directional smoothing to correct the scale factors in the penumbra, eliminating the abrupt variation that may otherwise appear at the shadow edge after removal. Salamati et al. [14] determined the lightness factor for each pixel in the umbra and penumbra using a shadow probability map in the LAB color space, and performed chromaticity and boundary correction after the lightness correction; their method preserves both texture and color in the results. Guo et al. [15] used fractional shadow coefficients derived by matting to obtain the scale factor for shadow pixels. Each pixel was relit using:
$$ I_i^{\text{shadow-free}} = \frac{r + 1}{k_i r + 1} I_i \tag{4} $$
where r is the ratio of direct light to environment light and I_i is the i-th image pixel. k_i is a value in the range [0,1] that depends on the amount of direct light falling on the pixel: it is 1 for a non-shadow pixel and 0 for an umbra pixel, and intermediate values indicate that the pixel is in the penumbra. Vicente et al. [16] achieved shadow removal by region relighting. A trained classifier was employed to pair each shadow region with its corresponding lit region, and the shadow region was relit by matching its luminance to that of the lit region. The relighting transformation T is defined from the shadow region (R_S) and the non-shadow region (R_NS) of the same material, as given in (5). Color and boundary correction were achieved by adding an offset in the LAB color space.
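The additive log-domain view of (1)-(2), and the claim in [3] that a per-region constant can replace costly reintegration, can be illustrated on a single synthetic scanline. This toy sketch is our own, with hypothetical data and helper names (`reintegrate_1d`, `relight_constant`). It assumes the shadow edges are already known. For the reintegration route, the derivatives at the two edge steps are replaced by the average of their neighbours, a crude stand-in for the iterative diffusion of [7], before integrating along the 1-D path. For the relighting route, the constant of (3) is taken as the mean of P - S over lit and shadow windows that, by construction, contain the same texture. On real images neither assumption holds exactly.

```python
import numpy as np

# Toy scanline (hypothetical data): smooth periodic texture, shadow on pixels 40..79
x = np.arange(120)
reflectance = 0.5 + 0.2 * np.sin(2 * np.pi * x / 40)   # texture with a 40-pixel period
shadow = np.zeros(x.size, dtype=bool)
shadow[40:80] = True
observed = reflectance * np.where(shadow, 0.3, 1.0)   # multiplicative attenuation, cf. (1)
log_obs = np.log(observed)                              # additive in the log domain, cf. (2)

def reintegrate_1d(log_img, edge_steps):
    """1-D path reintegration: replace derivatives at shadow-edge steps by the
    average of their neighbours, then integrate back from the first pixel."""
    grad = np.diff(log_img)
    for i in edge_steps:                   # grad[i] = log_img[i+1] - log_img[i]
        grad[i] = 0.5 * (grad[i - 1] + grad[i + 1])
    return np.concatenate([[log_img[0]], log_img[0] + np.cumsum(grad)])

def relight_constant(log_img, mask, lit_window, shadow_window):
    """Additive log-domain relighting: the per-pixel differences P - S minimise
    ||P - (S + C)||^2, and their mean is the constant added inside the shadow."""
    c = np.mean(log_img[lit_window] - log_img[shadow_window])
    return log_img + c * mask, c

log_reint = reintegrate_1d(log_obs, edge_steps=[39, 79])
log_relit, c = relight_constant(log_obs, shadow, slice(0, 40), slice(40, 80))
print("reintegration max error:", np.max(np.abs(log_reint - np.log(reflectance))))
print("relighting max error:   ", np.max(np.abs(log_relit - np.log(reflectance))))
print("estimated constant:", c, "true -log(0.3):", -np.log(0.3))
```

The reintegration route leaves a small residual wherever the edge derivative had to be guessed, whereas the constant is exact here only because the two windows see identical texture.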
Discussion: Relighting methods aim to find a scale factor to apply to the shadow region, thereby reducing the difference in illumination between shadow and non-shadow regions. These methods are usually simple and fast. The major challenges are finding separate scale factors for the umbra and penumbra, and correcting the shadow boundary. Within the penumbra, illumination may vary from pixel to pixel, so a single shadow region may ultimately need multiple scale factors. In addition, in most cases a suitable lit region must be found to calculate the relighting factor.
C. Patch-based Methods
Patch-based methods for shadow removal operate on patches rather than on single pixels. They assume that the illumination and reflectance within a shadow patch are almost constant. Some works use overlapping patches, while others use non-overlapping patches.
The shadow removal technique proposed by Gryka et al. [17] computed a feature vector for each non-overlapping patch in an image. The feature vector of each shadow patch was mapped to a set of possible shadow mattes using a trained regressor, and the best matching matte was selected with a Markov Random Field (MRF). Removing the shadow matte from the red, green and blue color planes yielded a shadow-free image. In [18], shadows were removed using an illumination recovering operator computed from a shadow patch and its corresponding lit patch. This method divided the image into overlapping patches, and the illumination of overlapped pixels was optimized by a weighted average of the pixels in the patch. Ma et al. [19] used a patch-based image synthesis approach that reconstructs the shadow region using patches sampled from non-shadow regions. The color and texture of the shadow patches were then modified based on correction parameters, followed by an optimization using a confidence measure which ensured that shadow patches without a matching non-shadow patch were also rectified. In [20], Sasi and Govindan extracted a shadow image by computing the difference between an image and its invariant. Geometric and shadow sub-dictionaries were learned from the patches of this shadow image; the geometric component of the shadow image was recovered and finally added to the invariant to obtain a shadow-free image.
Discussion: Patch-based methods process each shadow patch instead of single shadow pixels, which takes considerably less time. However, it is sometimes difficult to find a correctly matching non-shadow patch for a shadow patch, and inappropriate patch matching may lead to unexpected results.
D. Color Transfer Methods
The foundation of color transfer methods for shadow removal lies in the work of Reinhard et al. [21]. These methods transfer color information from the lit areas to the shadow areas.
Colors from the lit regions are transferred to the shadow regions using the mean and standard deviation of the Gaussian distribution that the color intensities in an image are taken to follow [22]. Wu and Tang [22] used a Bayesian formulation to extract the shadow image β from the image I, leaving a shadow-free image F. The image I is modeled as the product of β and F, as in (6). Within the shadow region, β was estimated using the mean intensities of the pixels in the shadow and lit regions. For uncertain regions, β was estimated from the affinity of a pixel to the probability distribution of the shadow region. The shadow-free image was then computed by solving a Poisson equation, followed by an optimization using a prior on β that keeps the shadow in β smooth and hence retains the texture in F.

$$I = \beta F \qquad (6)$$

Copyright © 2016 MECS
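A minimal sketch of recovering F from model (6) is given below. It rests on a simplifying assumption that is not part of [22]: β is a single constant per channel inside the shadow (and 1 elsewhere), estimated as the ratio of the mean shadow intensity to the mean lit intensity. Wu and Tang instead estimate a spatially varying β with a Bayesian formulation and a smoothness prior. The function name is hypothetical.

```python
import numpy as np

def remove_shadow_multiplicative(image, shadow_mask):
    """Invert I = beta * F with a per-channel constant beta inside the shadow.

    image: float array (H, W, C); shadow_mask: bool array (H, W).
    Returns (F_estimate, beta).
    """
    img = image.astype(np.float64)
    beta = np.ones_like(img)  # beta = 1 in lit areas (no attenuation)
    lit = ~shadow_mask
    for c in range(img.shape[2]):
        channel = img[..., c]
        # Assumed estimator: attenuation = mean shadow / mean lit intensity.
        ratio = channel[shadow_mask].mean() / channel[lit].mean()
        beta[..., c][shadow_mask] = ratio
    return img / beta, beta
```

Because β is constant here, any texture difference between the shadow and lit regions biases the ratio; this is the limitation that the adaptive methods discussed next try to address.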
Wu et al. [23] formulated the shadow effect as a light attenuation problem and generated an initial shadow-free image by transferring color from non-shadow to shadow regions, based on the probability of a pixel being a shadow pixel and on color histograms. The effect of the color transfer at shadow boundaries was then reduced using an affinity map, and the shadow matte β was computed by energy minimization. Shor and Lischinski [24] first computed a Laplacian pyramid of the input image and a downsampled shadow mask for each level. Four affine parameters modeling the relation between a shadow pixel and its illuminated intensities were estimated from the mean color and the standard deviation of luminance in the shadow and corresponding non-shadow regions, and applied to the shadow pixels at each level. The relation is shown in (7):

$$I_k^{\mathrm{lit}}(x) = \alpha_k(x) + \gamma(x)\, I_k^{\mathrm{shadow}}(x) \qquad (7)$$

Here, I_k is the camera response at k ∈ {R, G, B}, α_k is the additive parameter for channel k, and γ(x) = 1/a(x) is the inverse of the ambient attenuation factor. The modified pyramid was then flattened, and the shadow edges were inpainted to obtain the shadow-free image. This method used uniform parameters for the entire shadow region without considering reflectance variation within the shadow. Xiao et al. [25] proposed a technique to overcome this limitation: they calculated adaptive parameters for illumination transfer from lit regions to shadow regions at multiple scales and combined the results to obtain a less noisy output. The method applied a global illumination transfer followed by a direct illumination of each shadow pixel in the LAB color space, and recovered the shadow-free image using (7). These shadow removal approaches did not consider the varying texture within a shadow region, which often resulted in inaccurate texture recovery. This was addressed in [26], where the shadow and lit regions were segmented into sub-regions based on texture, and a matching lit sub-region was found for each shadow sub-region based on texture features and distance, followed by illumination transfer as in [25]. Khan et al. [27] estimated an initial shadow-free image using multilevel color transfer and improved the estimate by inpainting the boundaries. The mean value of each cluster was used for the color transfer at each level, and the transfers were integrated at the end. A Bayesian formulation was then used to extract the shadow model parameters, thereby forming a shadow-free image.
Discussion: The color transfer methods attempt to transfer color details from the lit regions to the shadow regions. Using a single uniform set of parameters to recover the whole shadow region may produce visual artifacts; determining adaptive parameters and selecting appropriate non-shadow regions for computing them was found to improve the results. These techniques may still produce unacceptable results for hard shadow edges and highly colored, non-uniform textures.
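The affine relighting I_k^lit = α_k + γ · I_k^shadow used by Shor and Lischinski [24] can be illustrated at a single level. The choices below are assumptions for the sketch, not the estimators of [24]: γ is shared by all channels and taken as the ratio of the lit to shadow luminance standard deviations, luminance is approximated by the channel average, and α_k then aligns the per-channel means. The pyramid decomposition and edge inpainting of the original method are omitted, and the function name is hypothetical.

```python
import numpy as np

def affine_relight(image, shadow_mask, lit_mask):
    """Relight shadow pixels with I_lit = alpha_k + gamma * I_shadow.

    image: float array (H, W, C); shadow_mask, lit_mask: bool arrays (H, W),
    where lit_mask marks the non-shadow region used as reference.
    Returns (relit_image, gamma).
    """
    img = image.astype(np.float64)
    lum = img.mean(axis=2)  # crude luminance: channel average (assumption)
    # One gamma for all channels, from luminance standard deviations.
    gamma = lum[lit_mask].std() / max(lum[shadow_mask].std(), 1e-8)
    out = img.copy()
    for k in range(img.shape[2]):
        shadow_vals = img[..., k][shadow_mask]
        # alpha_k maps the mean shadow value of channel k onto the lit mean.
        alpha_k = img[..., k][lit_mask].mean() - gamma * shadow_vals.mean()
        out[..., k][shadow_mask] = alpha_k + gamma * shadow_vals
    return out, gamma
```

Applying one (α_k, γ) set to the whole shadow is exactly the uniform-parameter assumption criticized above; adaptive variants would estimate these values per scale or per sub-region.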
I.J. Image, Graphics and Signal Processing, 2016, 12, 38-46
E. Interactive Methods
Shadow removal techniques are either interactive or automatic. Many authors suggest that simple user input can lead to faster shadow detection, and that giving the user a way to interact with the shadow removal system can improve the results. This section deals with various interactive shadow removal methods in the literature. Liu et al. [28] proposed an interactive shadow removal technique in which the user marks the shadow boundary in the image. The image gradient in the penumbra region derived from the user input was modified using illumination change curves, and the new gradient was integrated by solving a Poisson equation. Miyazaki et al. [29] proposed a way to eliminate shadows using a hierarchical graph cut. Their algorithm required the user to mark the shadow, lit and background regions with strokes. During every iteration of the hierarchical graph cut, the user could interact with the system by marking imperfectly recovered areas, leading to the final shadow-free output. Arbel and Hel-Or [30] generated shadow, penumbra and lit region masks from the shadow and lit regions marked by the user. Anchor points in the image were then selected based on the monotonicity of pixels to generate a shadow and surround mask. These anchor points were used to find the scale factors for the umbra and penumbra regions by intensity surface approximation. In [31], the features extracted from the shadow and lit regions marked by the user were used to train a Granular Reflex Fuzzy Min-Max Neural Network (GrRFMN). The pixels in the Region Of Interest (ROI) marked by the user were examined to find their fuzzy membership in the shadow or lit region, and the shadow pixels were recovered using a correction factor based on the mean RGB values in the shadow and lit regions. The user can then mark another ROI and repeat the procedure. Gong et al. [32] took a simple user stroke on the umbra region as input, from which the shadow boundary was derived. The shadow was then removed using a scale estimated from the illumination variance at intensity samples taken at different positions along the shadow boundary, followed by color transfer. A fast shadow removal technique that asks the user to scribble samples of lit and shadow pixels was given in [33]. A fusion image that boosts the illumination discontinuity along the shadow edge and conceals the texture was developed from the shadow mask. The illumination change along the shadow boundary was used to form a penumbra strip. A sparse shadow scale followed by a dense scale was estimated using the penumbra strip, and the shadow region was relit.
Discussion: The interactive techniques for shadow removal are generally simpler than the automated techniques, since the user can provide useful cues for locating the shadows in an image. In addition, certain techniques [29][31] let the user interact iteratively with the removal system to derive a shadow-free image.
V. COMPARISON
This section lists the most widely used single image shadow detection and removal datasets, followed by a qualitative and quantitative comparison of the shadow removal methods discussed in the previous section.
A. Datasets
The four main datasets available online for detection and removal of shadows in single images are as follows:
UIUC dataset by Guo et al.[15]: 108 natural scenes taken under different illumination, with their ground truth.
CMU dataset by Lalonde et al.[34]: 135 outdoor consumer photographs with shadow boundary annotations.
Dataset by Gong et al.[33]: 214 images with ground truth.
UCF dataset by Zhu et al.[35]: 355 images and the corresponding manually labeled ground truths.
B. Qualitative Analysis
In this section, we compare the shadow removal algorithms using the visual quality of the output shadow-free images. A comparison of these algorithms is given in Table 1. Shadow removal is usually performed at the pixel, region, or patch level. Pixel-based methods usually consume a large amount of time, since each pixel is processed individually, whereas patch-based and region-based techniques process a set of pixels together. Some of the algorithms use a learning-based system to detect or remove shadows: these methods train the system with a set of images and use the learned features to locate and remove shadows in an input image. The learning-based methods are found to give good quality results. The shadow removal algorithms also make assumptions about the lighting conditions and the camera or surface properties; many of these methods work for scenes illuminated by a point light source with Lambertian reflectance.
Table 2 and Table 3 depict the shadow removal results of some of the works discussed in this paper. A comparison of the 2-Dimensional reintegration [6] based on the Poisson equation and the 1-Dimensional path-based reintegration [11] using an average of 4 Hamiltonian paths is given in Table 2(a). The result of the 1-Dimensional reintegration looks more pleasing than that of the 2-Dimensional method, and the path-based results can be obtained in less time. Table 2(b)(iii) illustrates the artifacts at the shadow boundary caused by the poor performance of the boundary processing method in [16]. Table 2(b)(iv) gives the result of Khan et al.[27], in which the transition from the shadow to the non-shadow region is almost imperceptible. A shadow region may not always have the same texture or color reflectance; hence, applying uniform parameters to eliminate shadows from an image may lead to inappropriate results, as shown in Table 2(c)(iii). This problem is addressed by Xiao et al.[25], who also consider the reflectance and texture variation inside the shadow region when evaluating the parameters for shadow correction. Table 2(d)(iii) shows that the simple shadow removal by Fredembach and Finlayson[3] was not able to find a constant that could recover both the umbra and penumbra regions, whereas the relighting used by Salamati et al.[14] produced good results without loss of the underlying texture.
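The 1-Dimensional path-based reintegration compared above can be sketched as follows. Assumptions: the input is a single-channel log image, the shadow mask is already known, a derivative step is treated as a shadow edge when it crosses the mask boundary, and two boustrophedon (snake) paths, one row-wise and one column-wise, stand in for the Hamiltonian paths of [10][11]. Both paths start at pixel (0, 0), which is assumed to be lit. All function names are hypothetical.

```python
import numpy as np

def snake_path(h, w, by_rows=True):
    """(rows, cols) of a boustrophedon path, a Hamiltonian path on the pixel grid."""
    if by_rows:
        rows = np.repeat(np.arange(h), w)
        cols = np.tile(np.arange(w), h)
        cols = np.where(rows % 2 == 1, w - 1 - cols, cols)  # reverse odd rows
    else:
        cols = np.repeat(np.arange(w), h)
        rows = np.tile(np.arange(h), w)
        rows = np.where(cols % 2 == 1, h - 1 - rows, rows)  # reverse odd columns
    return rows, cols

def reintegrate_along_path(log_img, shadow_mask, rows, cols):
    """Cumulatively sum 1D derivatives along the path, zeroing shadow-edge steps."""
    values = log_img[rows, cols].astype(np.float64)
    diffs = np.diff(values)
    crossing = shadow_mask[rows[1:], cols[1:]] != shadow_mask[rows[:-1], cols[:-1]]
    diffs[crossing] = 0.0  # remove the illumination jump at the shadow edge
    out = np.empty(log_img.shape, dtype=np.float64)
    out[rows, cols] = values[0] + np.concatenate(([0.0], np.cumsum(diffs)))
    return out

def path_based_removal(log_img, shadow_mask):
    """Average the reintegrations along a row-wise and a column-wise snake path."""
    h, w = log_img.shape
    results = [reintegrate_along_path(log_img, shadow_mask, *snake_path(h, w, r))
               for r in (True, False)]
    return np.mean(results, axis=0)
```

Zeroing a step also discards any genuine reflectance change across it, which is one reason these methods average the results of several paths.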
Table 1. Qualitative comparison of shadow removal methods
Reintegration Methods
Method | level | FI | PT | LB | LD | UP | UI | DO | BC
Finlayson et al.[6] | pixel | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | edge | ✕
Finlayson et al.[9] | pixel | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | edge | ✕
Finlayson et al.[10] | pixel | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | edge | ✕
Fredembach and Finlayson[11] | pixel | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | edge | inpaint
Finlayson et al.[7] | pixel | ✓ | ✓ | ✕ | ✓ | ✕ | ✕ | edge | inpaint
Relighting Methods
Method | level | FI | PT | LB | LD | UP | UI | DO | BC
Fredembach and Finlayson[3] | pixel | ✕ | ✓ | ✕ | ✓ | ✕ | edge | inpaint
Du et al.[13] | pixel | ✕ | ✓ | ✕ | ✓ | ✓ | ✓ | matte | ✕
Arbel and Hel-Or[12] | pixel | ✕ | ✓ | ✕ | ✓ | ✕ | mask
Salamati et al.[14] | pixel | ✕ | ✓ | ✕ | ✓ | ✕ | probability map
Guo et al.[15] | pixel | ✕ | ✓ | ✕ | ✓ | ✓ | ✕ | shadow coefficient
Vicente et al.[16] | region | ✕ | ✓ | ✓ | ✕ | mask | Gaussian filter
Patch-based Methods
Method | level | FI | PT | LB | LD | UP | UI | DO | BC
Gryka et al.[17] | patch | ✕ | ✓ | ✓ | ✓ | ✓ | mask | mask | ✕
Zhang et al.[18] | patch | ✕ | ✓ | ✕ | ✓ | ✓ | trimap | matte | ✓
Ma et al.[19] | patch | ✕ | ✓ | ✕ | ✕ | mask | ✕
Sasi and Govindan[20] | patch | ✓ | ✓ | ✓ | ✕
Color Transfer Methods
Method | level | FI | PT | LB | LD | UP | UI | DO | BC
Wu and Tang[22] | pixel | ✕ | ✓ | ✕ | ✓ | ✓ | ✓ | extraction | ✓
Wu et al.[23] | pixel | ✕ | ✓ | ✕ | ✓ | ✓ | extraction | ✓
Shor and Lischinski[24] | pixel | ✕ | ✓ | ✕ | ✓ | ✓ | mask | inpaint
Xiao et al.[25] | region | ✕ | ✓ | ✕ | ✓ | ✕ | ✓ | mask | Bayesian
Xiao et al.[26] | region | ✕ | ✓ | ✕ | ✕ | ✕ | mask | alpha matte interpolation
Khan et al.[27] | pixel | ✕ | ✓ | ✓ | ✓ | ✓ | ✕ | mask | alpha matte interpolation
Interactive Methods
Method | level | FI | PT | LB | LD | UP | UI | DO | BC
Liu et al.[28] | pixel | ✕ | ✓ | ✕ | ✕ | ✓ | edge | edge | ✕
Miyazaki et al.[29] | region | ✕ | ✓ | ✕ | ✕ | ✓ | stroke | extraction | ✕
Arbel and Hel-Or[30] | pixel | ✕ | ✓ | ✕ | ✕ | ✓ | markings | mask | ✕
Nandedkar et al.[31] | pixel | ✕ | ✕ | ✓ | ✓ | stroke | fuzzy membership | ✕
Gong et al.[32] | pixel | ✕ | ✓ | ✕ | ✕ | ✓ | rough stroke | mask | ✕
Gong et al.[33] | pixel | ✕ | ✓ | ✕ | ✕ | ✓ | scribbles | mask | ✕
✕
gaussian smoothening
✕
FI: modifies entire image, PT: preserves texture, LB: learning based, LD: light or camera dependency, UP: handles both umbra and penumbra, UI: user input, DO: detection output, BC: shadow boundary correction, ✓: yes, ✕: no
Gryka et al.[17] designed a method to remove soft shadows. Table 2(e)(iii) illustrates the result of applying this method to a hard shadow; the method does not give good results because it was trained on soft shadows only. The region-based shadow removal by Guo et al.[15] treats an irregular shadow region as a whole even though it may contain different textures and colors, which introduces errors into the relighting constant for the region. In contrast, the patch-based method by Zhang et al.[18] uses adaptive overlapping patches and computes the relighting factor from matching patches. Table 2(g) illustrates shadow removal from curved surfaces.
Table 3(i) shows the effect of the shadow detection output on the final result: the detection by Guo et al. [15] misclassified the dark pattern on the box as shadow, and hence the dark pattern was removed from the image. Table 3(ii) shows that the method proposed by Finlayson et al.[6] enhances the entire image in its attempt to remove shadows. An example of the interactive methods is displayed in Table 3(iii): Liu et al. [28] require the user to input the shadow boundary itself, whereas [32] asks the user only for a rough stroke in the shadow region.
C. Quantitative Analysis
Most authors use the Root Mean Square Error (RMSE) to evaluate shadow removal results. The per-pixel RMSE between the shadow removal output and the ground-truth shadowless image is tabulated in Table 4 for some of the works; the figures are taken from the works discussed in this paper. The table shows the RMSE for shadow regions and non-shadow regions separately, together with the actual error between the input image with the shadow and its shadow-free ground truth. From the table, it can be observed that the RMSE for the shadow regions and for the overall image is lowest for Vicente et al.[16], which means that their method gives the best per-pixel accuracy among the methods compared. Table 5 gives the average RMSE for some of the methods discussed; here, the lowest RMSE is obtained by the method proposed by Sasi and Govindan[20].
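Region-wise errors of the kind reported in Table 4 can be computed with a short helper. This is a minimal sketch: it averages the squared errors over all channels of the pixels in each region and uses the pixel values as given, whereas the papers compared here may differ in color space and averaging conventions. The name `region_rmse` is hypothetical.

```python
import numpy as np

def region_rmse(result, ground_truth, shadow_mask):
    """Per-pixel RMSE over shadow, non-shadow and all pixels.

    result, ground_truth: arrays (H, W) or (H, W, C); shadow_mask: bool (H, W).
    """
    sq = (result.astype(np.float64) - ground_truth.astype(np.float64)) ** 2
    if sq.ndim == 3:
        sq = sq.mean(axis=2)  # mean squared error per pixel over channels

    def rmse(mask):
        return float(np.sqrt(sq[mask].mean()))

    return {
        "shadow": rmse(shadow_mask),
        "non_shadow": rmse(~shadow_mask),
        "all": float(np.sqrt(sq.mean())),
    }
```

Reporting the shadow and non-shadow errors separately matters because non-shadow pixels usually dominate the image, so the overall RMSE alone can hide poor recovery inside the shadow.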
The learning-based methods for shadow removal consume much more time and memory than the other methods. In addition, pixel-wise processing brings about an immense computational overhead. Among the reintegration-based methods, the 1-Dimensional path-based methods are found to be computationally more efficient than the 2-Dimensional Poisson equation approach. Initially, path lengths were taken as almost three quarters of the original image size [9], which required averaging the results along 20 to 40 paths to get a reasonably good output. Later, the number of paths was reduced to 16 by using non-random Hamiltonian paths [10]. By imposing a closed shadow edge constraint [11], 4 Hamiltonian paths were able to give good removal results.
Table 2. Qualitative Results (A): Column (ii) has the input images; columns (iii) and (iv) have the shadow removal results of the techniques named under each image. (a) 2D and 1D reintegration: Finlayson et al.[6], Fredembach and Finlayson[11]; (b) shadow boundary correction: Vicente et al.[16], Khan et al.[27]; (c) surface reflectance: Shor et al.[24], Xiao et al.[25]; (d) umbra and penumbra correction: Fredembach and Finlayson[3], Salamati et al.[14]; (e) effect of training set: Gryka et al.[17], Guo et al.[15]; (f) region vs patch: Guo et al.[15], Zhang et al.[18]; (g) curved surface: Khan et al.[27], Arbel and Hel-Or[12].
Table 3. Qualitative Results (B): The technique used is mentioned under each image. (i) input image, Guo et al.[15]; (ii) input image, Finlayson et al.[6]; (iii) Liu et al.[28], Gong et al.[32].
Table 4. Per-pixel RMSE for the UIUC dataset
Method | Shadow region RMSE | Non-shadow RMSE | All region RMSE
Wu et al.[23] | 21.3 | 5.9 | 9.7
Guo et al.[15] | 11.8 | 4.7 | 6.4
Khan et al.[27] | 10.5 | 4.7 | 6.1
Vicente et al.[16] | 9.24 | 4.9 | 5.9
Actual Error | 42 | 4.6 | 13.7
Table 5. Average RMSE for the UIUC dataset
Method | Average RMSE
Guo et al.[15] | 19.85
Arbel and Hel-Or[12] | 18.36
Gryka et al.[17] | 13.83
Sasi and Govindan[20] | 12.23
VI. CONCLUSION
Shadows are unavoidable entities that appear in an image when direct light cannot illuminate the entire scene uniformly because an obstruction lies between the light source and a surface. Because shadows adversely affect various image-based applications, removing shadows from an image has become an active area of research in Computer Vision. The task is considered difficult, since all the information needed to locate and relight the shadow region must be derived from a single image. In this paper, we have discussed some of the notable works in the literature for removing shadows from an image. To structure the review, shadow removal techniques were classified into five categories, namely, reintegration methods, relighting methods, patch-based methods, color transfer methods, and interactive methods. Numerous works are available for eliminating different kinds of shadows, such as cast shadows, self-shadows, soft shadows and hard shadows, and shadows in videos, aerial images, and outdoor images. Any shadow removal technique must first detect the shadows in the input image; this is followed by relighting the shadow regions to obtain a shadow-free output. One of the earliest methods for shadow removal reintegrated the derivative image after setting the derivatives at the shadow edges to zero. Since the 2-Dimensional reintegration methods were observed to be computationally expensive, 1-Dimensional path-based methods were later proposed. All these methods assumed a single point light source and needed strong shadow edges. The relighting methods aim to find a constant, or a set of constants, that can relight the shadow regions such that the transition from a shadow region to the adjacent non-shadow regions in the shadow-free image is imperceptible. These methods are usually fast, and most of them find separate constants for the umbra and penumbra regions. The major focus of patch-based methods is to reduce the computation time needed by pixel-based systems; they assume that the variation in reflectance within a patch is very small. The color transfer methods try to restore the information in the shadow region using the color information from the lit region. The works in this category have evolved from a uniform set of parameters for the whole shadow region to an adaptive set of parameters that accounts for the varying reflectance within a shadow region. The last category is that of interactive shadow removal techniques.
These methods need the user to provide information on the shadow location within the image. From the survey, it can be concluded that shadow removal algorithms should yield, in real time, good quality results that preserve the texture and color information underlying the shadow region without revealing the transition between shadow and non-shadow regions.
REFERENCES
[1] C. R. Jung, “Efficient background subtraction and shadow removal for monochromatic video sequences,” IEEE Trans. Multimed., vol. 11, no. 3, pp. 571–577, 2009.
[2] H. Li, L. Zhang, and H. Shen, “An adaptive nonlocal regularized shadow removal method for aerial remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 106–120, 2014.
[3] C. Fredembach and G. Finlayson, “Simple Shadow Removal,” pp. 3–6.
[4] L. Xu, F. Qi, R. Jiang, Y. Hao, and G. Wu, “Shadow Detection and Removal in Real Images: A Survey,” Citeseer, 2006.
[5] R. K. Sasi and V. K. Govindan, “Shadow Detection and Removal from Real Images,” Proc. Third Int. Symp. Women Comput. Informatics (WCI ’15), pp. 309–317, Aug. 2015.
[6] G. D. Finlayson, S. D. Hordley, and M. S. Drew, “Removing Shadows from Images,” Comput. Sci., vol. 2353, pp. 823–836, 2002.
[7] G. D. Finlayson, S. D. Hordley, C. Lu, and M. S. Drew, “On the removal of shadows from images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 59–68, 2006.
[8] E. H. Land, “The retinex theory of color vision,” Scientific American, 1977.
[9] G. D. Finlayson, S. D. Hordley, C. Lu, and M. S. Drew, “Removing Shadows From Images using Retinex,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 59–68, 2006.
[10] G. D. Finlayson and C. Fredembach, “Fast Re-integration of Shadow Free Images,” Color Imaging Conf., vol. 2004, no. 4, pp. 117–122, 2004.
[11] C. Fredembach and G. Finlayson, “Hamiltonian Path-Based Shadow Removal,” Br. Machine Vision Conf., vol. 2, pp. 502–511, 2005.
[12] E. Arbel and H. Hel-Or, “Texture-preserving shadow removal in color images containing curved surfaces,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2007.
[13] Z. Du, H. Lin, and H. Bao, “Shadow removal in gradient domain,” in Int. Conf. Image Analysis and Recognition, Springer Berlin Heidelberg, pp. 107–115, 2005.
[14] N. Salamati, A. Germain, and S. Süsstrunk, “Removing shadows from images using color and near-infrared,” Proc. Int. Conf. Image Process. (ICIP), pp. 1713–1716, 2011.
[15] R. Guo, Q. Dai, and D. Hoiem, “Paired regions for shadow detection and removal,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 2956–2967, 2013.
[16] T. F. Yago Vicente and D. Samaras, “Single image shadow removal via neighbor-based region relighting,” Lect. Notes Comput. Sci., vol. 8927, pp. 309–320, 2015.
[17] M. Gryka, M. Terry, and G. J. Brostow, “Learning to Remove Soft Shadows,” ACM Trans. Graph., vol. 34, no. 5, pp. 153:1–153:15, 2015.
[18] L. Zhang, Q. Zhang, and C. Xiao, “Shadow Remover: Image Shadow Removal Based on Illumination Recovering Optimization,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4623–4636, 2015.
[19] L. Ma, J. Wang, E. Shechtman, K. Sunkavalli, and S. Hu, “Appearance Harmonization for Single Image Shadow Removal,” 2016.
[20] R. K. Sasi and V. K. Govindan, “Shadow removal using sparse representation over local dictionaries,” Eng. Sci. Technol. Int. J., 2016.
[21] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley, “Color transfer between images,” IEEE Comput. Graph. Appl., vol. 21, no. 5, pp. 34–41, 2001.
[22] T. P. Wu and C. K. Tang, “A Bayesian approach for shadow extraction from a single image,” Proc. IEEE Int. Conf. Comput. Vis., vol. I, pp. 480–487, 2005.
[23] T.-P. Wu, C.-K. Tang, M. S. Brown, and H.-Y. Shum, “Natural shadow matting,” ACM Trans. Graph., vol. 26, no. 2, Art. 8, 2007.
[24] Y. Shor and D. Lischinski, “The shadow meets the mask: Pyramid-based shadow removal,” Comput. Graph. Forum, vol. 27, no. 2, pp. 577–586, 2008.
[25] C. Xiao, R. She, D. Xiao, and K. L. Ma, “Fast shadow removal using adaptive multi-scale illumination transfer,” Comput. Graph. Forum, vol. 32, no. 8, pp. 207–218, 2013.
[26] C. Xiao, D. Xiao, L. Zhang, and L. Chen, “Efficient shadow removal using subregion matching illumination transfer,” Comput. Graph. Forum, vol. 32, no. 7, pp. 421–430, 2013.
[27] S. Khan, M. Bennamoun, F. Sohel, and R. Togneri, “Automatic Shadow Detection and Removal from a Single Image,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 1, pp. 1–1, 2015.
[28] F. Liu and M. Gleicher, “Texture-consistent shadow removal,” Lect. Notes Comput. Sci., vol. 5305, pt. 4, pp. 437–450, 2008.
[29] D. Miyazaki, Y. Matsushita, and K. Ikeuchi, “Interactive shadow removal from a single image using hierarchical graph cut,” in Asian Conf. Computer Vision, Springer Berlin Heidelberg, pp. 234–245, 2009.
[30] E. Arbel and H. Hel-Or, “Shadow removal using intensity surfaces and texture anchor points,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 6, pp. 1202–1216, 2011.
[31] A. Nandedkar, “An interactive shadow detection and removal tool using granular reflex fuzzy min-max neural network,” Proc. World Congr. Eng., vol. II, pp. 4–7, 2012.
[32] H. Gong, D. Cosker, C. Li, and M. Brown, “User-aided single image shadow removal,” Proc. IEEE Int. Conf. Multimed. Expo, pp. 2–7, 2013.
[33] H. Gong and D. P. Cosker, “Interactive shadow removal and ground truth for variable scene categories,” in Proc. British Machine Vision Conf. (BMVC), University of Bath, 2014.
[34] J.-F. Lalonde, A. A. Efros, and S. G. Narasimhan, “Detecting ground shadows in outdoor consumer photographs,” in European Conf. Computer Vision, Springer Berlin Heidelberg, pp. 322–335, 2010.
[35] J. Zhu, K. G. G. Samuel, S. Z. Masood, and M. F. Tappen, “Learning to recognize shadows in monochromatic natural images,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 223–230, 2010.
Authors’ Profiles
Saritha Murali completed her Master of Technology in Computer Science (Information Security) at the National Institute of Technology, Calicut, India. She is currently pursuing a Ph.D. in the area of shadow detection and removal at the same institute. Her research interests are in the area of Computer Vision and Image Processing. She has a few research publications to her credit.
V.K. Govindan is currently working as Professor in Computer Science and Engineering at the Indian Institute of Information Technology, Kottayam, India. He worked as Professor in the Department of Computer Science and Engineering at the National Institute of Technology (NIT), Calicut, India from 1998 to 2015. He completed his Bachelor's and Master's degrees in Electrical Engineering at NIT Calicut and obtained his Ph.D. in the area of Character Recognition from the Indian Institute of Science, Bangalore, India. He has more than 37 years of teaching and research experience and has served as Head of the Department of Computer Science and Engineering, and Dean Academic of NIT Calicut. His research interests include image processing, pattern recognition, and operating systems. He has more than 170 research publications, has completed several sponsored research projects, has authored 20 books, has produced 9 Ph.D.s, and is currently guiding several Ph.D. students.
Saidalavi Kalady is Associate Professor and Head of the Department of Computer Science and Engineering at the National Institute of Technology, Calicut, India. He completed his postgraduate studies at the Indian Institute of Science, Bangalore, India. His research interests include Computational Intelligence and Operating Systems. He obtained his Ph.D. in the area of agent-based systems from NIT Calicut, India.
How to cite this paper: Saritha Murali, V.K. Govindan, Saidalavi Kalady, "A Survey on Shadow Removal Techniques for Single Image", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.38-46, 2016. DOI: 10.5815/ijigsp.2016.12.05
I.J. Image, Graphics and Signal Processing, 2016, 12, 47-54 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.06
Image Comparison with Different Filter Banks On Improved PCSM Code
Jagdish Giri Goswami 1
M.Tech Scholar, Uttarakhand Technical University/Computer Science, Dehradun, 248007, India E-mail: mail2jagdishgoswami@gmail.com
Pawan Kumar Mishra 2
Assistant Professor, Uttarakhand Technical University/Computer Science, Dehradun, 248007, India E-mail: pawantechno@rediffmail.com
Abstract—Image compression plays a vital role in the development of various multimedia applications. Image compression addresses the problem of reducing the amount of data required to represent a digital image. Many image compression techniques have evolved, and they can basically be divided into two classes: spatial domain compression techniques and frequency domain compression techniques. Frequency domain techniques include the Fourier Transform, the Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT). After the image is converted into the frequency domain, an encoding technique such as Embedded Zerotree Wavelet (EZW) coding, SPIHT (Set Partitioning in Hierarchical Trees), ASWDR (Adaptively Scanned Wavelet Difference Reduction), WDR (Wavelet Difference Reduction) or STW (Spatial-orientation Tree Wavelet) is applied. These encoding schemes are also known as Progressive Coefficients Significance Methods (PCSM). In this paper, wavelet filters are combined with improved PCSM codes, and a new filter is proposed for further improvement. The proposed wavelet filter slightly changes the scaling and wavelet functions of an existing filter. It gives a wide range of frequency selectivity on both the higher and the lower side, and hence provides a better lower bandwidth range with greater preservation of high frequencies. The scaling and wavelet impulse responses of the proposed filter are presented, and the proposed work is then compared with all the filters. The filters are demonstrated to show the compression performance using wavelet functions.
The filters used in this work, bi-orthogonal (BIO), reverse bi-orthogonal (RBIO), Coiflets (COIF), Daubechies (DB) and Symlets (SYM), together with the Improved Progressive Coefficients Significance Method (IPCSM) encoding scheme, are compared and analyzed using the compression parameters mean square error (MSE), peak signal-to-noise ratio (PSNR), compression ratio (CR), execution time (ET), bits per pixel (BPP) and root mean square error (RMSE). Index Terms—DWT, DCT, MSE, PSNR, CR, ET, BPP, RMSE, SNR, MAE, HAAR, DB, SYM, COIF, BIOR, RBIO.
I. INTRODUCTION
Digital image compression is useful when a digital image has to be stored for future reference or transmitted from one location to another for digital communication. Digitization of images has provided a useful tool to researchers in the medical field, space engineering, defence and law enforcement. With new technology in the field of digital image processing, we are creating a world in which digital images are a very important part of our day-to-day life. The use of digital image compression has already been addressed in the abstract, together with the compression methods by which this goal can be achieved; here, the introduction and importance of the work are given in brief. In this work, a digital image is converted into the frequency domain as high- and low-frequency coefficient spectra, and wavelet parameters are used to measure the performance of the specific coding technique [1]. In this paper, an image is compressed by the Improved Progressive Coefficients Significance Method (IPCSM), and different filter banks are then applied to the compressed code to improve the quality of the image. Image compression is a technique by which the size of an image can be reduced without a significant loss in the quality of the original image. Image compression addresses the storage limitations of digital images: it is used to reduce the redundancy of the digital image code, the storage requirement, the transmission time and the processing duration. A compressed image takes less memory space to store, which also means that less time is needed to process it. Wavelet image compression is very popular because it is free of blocking artifacts and yields high quality multi-resolution reconstructed images. Many sophisticated and novel wavelet-based algorithms for image compression have been developed and implemented in the last few years. There are two types of digital image compression algorithms: lossy image compression and lossless image compression.
I.J. Image, Graphics and Signal Processing, 2016, 12, 47-54
In lossy image compression, the reconstructed image is not an exact replica of the input image, and there is always some loss in the output image [2]. Wavelet image compression, vector quantization and transform coding are lossy techniques. In lossless image compression, by contrast, the decompressed output image is identical to the original input image, so there is no loss in image quality. Huffman encoding, run-length encoding and incremental encoding are lossless techniques [3]. The PCSM method compresses the digital image into a small code (low bit rate) that is convenient for storage and for transmission from one location to another, and different filters are used to improve the image quality. With PCSM, the digital image is converted into the frequency domain as high- and low-frequency coefficient spectra, and an appropriate method then converts these coefficients into a small code array [4]. A decoder reconstructs the image, and after the inverse transform is applied the reconstructed digital image is almost a replica of the input image [5]. Several compression parameters are used to analyze the performance of the reconstructed image: mean square error (MSE), peak signal-to-noise ratio (PSNR), compression ratio (CR), execution time (ET), bits per pixel (bpp), root mean square error (RMSE), etc. The objective of this paper is not only to provide a better compression method for real digital images but also to provide better-quality digital images using the wavelet method. Here, an improved PCSM code compresses the image into a lower-bit-rate code that can be used for storage and transmission. The wavelet transform is the basic tool for compressing the image, and an encoding method then codes the image pixels. The improved code is calibrated with different wavelet filter banks. To this end, a few important changes are made to the existing techniques to achieve the best image compression results.
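Run-length encoding, one of the lossless methods named above, makes the lossless property concrete: decoding returns exactly the input. The following is a minimal Python sketch on a hypothetical row of grey levels. It only illustrates the lossless/lossy distinction and is not part of the PCSM scheme.

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded

# A hypothetical row of 8-bit grey levels containing long constant runs.
row = [255, 255, 255, 255, 0, 0, 128, 128, 128, 255]
encoded = rle_encode(row)
print(encoded)                     # [(255, 4), (0, 2), (128, 3), (255, 1)]
assert rle_decode(encoded) == row  # lossless: the input is recovered exactly
```

Four pairs replace ten samples here. A lossy method would instead accept a small, bounded difference between `row` and its reconstruction in exchange for a shorter code.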
II. RELATED WORK
This section surveys related work on the classification of image compression methods. Many different approaches to image compression have been developed over the past seven decades for various applications. In Ref. [1], a wavelet-based progressive image transmission method is proposed, together with a combined method to minimize image browsing time. The scheme transforms a digital image from the spatial domain into the frequency domain using the discrete wavelet transform (DWT), with the Haar wavelet used in the transformation phase. Ref. [2] describes an application for transferring videos and images over the web that minimizes data transfer time and resource consumption. Many DCT- and DWT-based methods, such as JPEG, MPEG, EZW and SPIHT, have been introduced in the past. Ref. [3] proposed a Wavelet-MFOCPN based technique for color image compression and tested the algorithm on a variety of benchmark images. Ref. [4] defines the goal of image compression as minimizing irrelevant image data, so that the image can be stored in less memory and transferred faster. Ref. [5] proposed an image compression system that provides not only good-quality compressed images but also a good compression ratio, while keeping the time cost small. Ref. [6] describes image compression as the application of data compression to digital images and reviews techniques including the DFT, FFT, DCT and DWT. Ref. [7] described a novel image compression method that combines the wavelet transform with a neural network, using different algorithms. Ref. [8] reviewed lossless data compression methodologies and compared their performance; the researchers found that arithmetic encoding is more powerful than Huffman encoding. Ref. [9] presented and analyzed a survey of various lossy and lossless image compression techniques. Ref. [10] examined the performance of a set of lossless data compression algorithms on different forms of text data, implementing a set of selected algorithms to evaluate how well they compress text. Ref. [11] proposed a scheme for encryption and compression that focuses mainly on highly secure encryption and a high compression rate using SPIHT on larger images. Ref. [12] describes two kinds of techniques, lossy and lossless compression. With lossless techniques the image can be reconstructed without loss of information, whereas lossy compression causes some information loss on reconstruction. Ref. [13] presents digital image compression using embedded wavelet-based image coding combined with a Huffman encoder for further compression. Ref. [14] proposed a lossless image compression technique that combines a novel step with the integer wavelet transform. This step is a simplified version of the median edge detector algorithm. Ref. [15] proposed digital image compression using a hybrid wavelet transform matrix.
Copyright © 2016 MECS
The hybrid wavelet transform is formed from orthogonal component transforms. Ref. [16] proposed approaches for streamlining the execution of the basic FDWT and IDWT operations with a reduced number of multiplications. Ref. [17] proposed a novel technique for lossless image compression. Ref. [18] showed that DWT-based image compression methods achieve higher compression rates with lower memory requirements. That work examined compression performance using the dmey, Symlets, Daubechies, Coiflets and reverse biorthogonal wavelet filters. Ref. [19] compared and analyzed five different wavelet-based digital image compression techniques. Ref. [20] introduced an improved PCSM code that reduces the execution time, increases the PSNR and enhances the compression ratio. The improved PCSM is a significant step in digital image compression and enhanced coding.
III. BACKGROUND
A. Discrete Wavelet Transform (DWT)
In the wavelet transform, the sine or cosine waves of the (short-term) Fourier transform are replaced by a short wave that does not last long. This short wave is called a wavelet, and the corresponding transform is called the wavelet transform. In the continuous domain, a wavelet series is expressed as a combination of continuous scaling and wavelet functions. The coefficients of the discrete wavelet series expansion are called the Discrete Wavelet Transform (DWT). The DWT is used in image compression because the Discrete Cosine Transform (DCT) has several major drawbacks. The DCT is a cosine-wave-based technique that calculates frequency coefficients by applying the transform function to the image or to a region of it. Within such a region, the DCT cannot distinguish where high or low frequencies occur and effectively gives the result of an averaging filter. Applying the DCT block by block creates noticeable compression artefacts, known as blocking artefacts or the checkerboard effect [6]. The sine and cosine basis functions of the DCT extend over the complete range of the signal. To overcome this limitation, a DWT based on short waves is used instead [7]. The scaling functions give a low-frequency analysis of signals to obtain approximations, and the wavelet functions give a high-frequency analysis to extract details. Here, subband analysis and synthesis are presented, and it will be shown how the DWT fits the subband coding and decoding requirements. The DWT is widely used to perform subband analysis of signals in a multi-resolution approach. The DWT and subband decomposition are closely related, and the DWT can be computed through successive subband decompositions [8].
B. Progressive Coefficients Significance Methods (PCSM)
Progressive Coefficients Significance Methods (PCSM) compress the image into a lower-bit-rate code that can be used for storage and transmission. The PCSM family mainly includes Embedded Zerotree Wavelet (EZW) coding, SPIHT (Set Partitioning in Hierarchical Trees), ASWDR (Adaptively Scanned Wavelet Difference Reduction), WDR (Wavelet Difference Reduction) and STW (Spatial orientation Tree Wavelet) [9]. All these encoding methods run the compression algorithm progressively, so each pass yields a better version of the compressed image.
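The progressive behaviour described above can be sketched in Python. This is a simplified model, not EZW, SPIHT or the improved PCSM code. It ignores zerotrees, sorting lists and entropy coding, and only tracks what a decoder knows after each bit-plane pass with threshold T. The function name `progressive_passes` and the coefficient values are hypothetical.

```python
import numpy as np

def progressive_passes(coeffs, num_passes):
    """Simulate what a decoder knows after each bit-plane (threshold) pass.

    Simplified model of the significance principle shared by EZW, SPIHT,
    WDR, ASWDR and STW; zerotrees, sorting lists and entropy coding are
    not modelled. After the pass with threshold T, each coefficient with
    |c| >= T is significant and its magnitude is known to within an
    interval of width T; the decoder uses the midpoint of that interval.
    """
    c = np.asarray(coeffs, dtype=float)
    threshold = 2.0 ** np.floor(np.log2(np.max(np.abs(c))))  # initial T
    history = []
    for _ in range(num_passes):
        significant = np.abs(c) >= threshold
        magnitude = np.floor(np.abs(c) / threshold) * threshold + threshold / 2
        approx = np.where(significant, np.sign(c) * magnitude, 0.0)
        max_error = float(np.max(np.abs(c - approx)))
        history.append((float(threshold), int(significant.sum()), max_error))
        threshold /= 2  # the next pass works on the next, finer bit plane
    return history

# Hypothetical wavelet coefficients, e.g. from one small DWT block.
coeffs = [63.0, -34.0, 49.0, 10.0, 7.0, 13.0, -12.0, 7.0]
for t, n_sig, err in progressive_passes(coeffs, 4):
    print(f"T = {t:4.0f}   significant = {n_sig}   max error = {err:5.2f}")
```

With these sample coefficients the threshold halves from 32 to 4. Over the same passes, the number of significant coefficients grows from 3 to 8 and the largest reconstruction error shrinks from 15 to 2. This is the "better version after each pass" behaviour in miniature. The error after a pass is always below that pass's threshold, which is why truncating the bit stream anywhere still gives a usable image.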
C. Previous Modification on PCSM Code
An improved Progressive Coefficients Significance Method (IPCSM) coding is presented. It advances and modifies the SPIHT (Set Partitioning in Hierarchical Trees) and STW (Spatial orientation Tree Wavelet) encodings of the PCSM codes. The modified code uses the SPIHT lists to perform the sorting in the encoder, and a tree-based algorithm advances the compression process in the PCSM series of codes. The improved code combines the merits of SPIHT and STW coding and improves the PSNR and MSE image compression parameters. The wavelet structure is applied to the image, and the PCSM algorithms are performed one by one on the DWT coefficients [20].
Fig.1. Planning Chart of Improved PCSM Code [20]
D. Filter Banks
Every filter bank is defined by specific mathematical functions, each serving its own purpose, which are used to transform a digital image in the wavelet transform. Every filter bank has a wavelet function and a scaling function. These filters are divided into two categories: biorthogonal and orthogonal [18].
IV. PROPOSED ALGORITHM
Fig.2. Block diagram of compression and reconstruction of a digital image
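The compression and reconstruction chain of Fig.2 has three stages: a forward transform, suppression of low-energy coefficients, and an inverse transform. The following minimal Python/NumPy sketch illustrates that chain under stated assumptions, and it is not the paper's implementation. It uses a one-level orthonormal Haar DWT, swapped in for the paper's filters. In place of the PCSM encoder, it applies a simple rule that keeps only the largest-magnitude fraction `keep` of the coefficients. The names `haar_dwt2`, `haar_idwt2` and `compress_reconstruct`, and the synthetic test image, are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """One-level orthonormal 2-D Haar DWT; x must have even height and width.
    Returns the approximation band and the three detail bands."""
    lo = (x[0::2, :] + x[1::2, :]) / SQRT2    # low-pass over row pairs
    hi = (x[0::2, :] - x[1::2, :]) / SQRT2    # high-pass over row pairs
    approx = (lo[:, 0::2] + lo[:, 1::2]) / SQRT2
    detail_1 = (lo[:, 0::2] - lo[:, 1::2]) / SQRT2
    detail_2 = (hi[:, 0::2] + hi[:, 1::2]) / SQRT2
    detail_3 = (hi[:, 0::2] - hi[:, 1::2]) / SQRT2
    return approx, (detail_1, detail_2, detail_3)

def haar_idwt2(approx, details):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    d1, d2, d3 = details
    h, w = approx.shape
    lo = np.empty((h, 2 * w))
    hi = np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = (approx + d1) / SQRT2, (approx - d1) / SQRT2
    hi[:, 0::2], hi[:, 1::2] = (d2 + d3) / SQRT2, (d2 - d3) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x

def compress_reconstruct(image, keep):
    """Transform, keep the largest-magnitude fraction `keep` of the
    coefficients (suppressing low-energy ones), then inverse transform."""
    approx, details = haar_dwt2(np.asarray(image, dtype=float))
    bands = [approx, *details]
    mags = np.sort(np.concatenate([np.abs(b).ravel() for b in bands]))
    k = max(1, int(keep * mags.size))
    cutoff = mags[-k]                          # k-th largest magnitude
    kept = [np.where(np.abs(b) >= cutoff, b, 0.0) for b in bands]
    return haar_idwt2(kept[0], tuple(kept[1:]))

def psnr(original, reconstructed, max_i=255.0):
    mse = np.mean((np.asarray(original, dtype=float) - reconstructed) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(max_i ** 2 / mse))

# Hypothetical 64x64 test image: a smooth ramp plus mild noise, rounded to 8 bits.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
img = np.clip(np.round(2 * xx + yy + rng.normal(0, 2, (64, 64))), 0, 255)

for keep in (1.0, 0.5, 0.25, 0.1):
    print(f"keep {keep:4.0%} of coefficients -> PSNR {psnr(img, compress_reconstruct(img, keep)):.2f} dB")
```

Because the Haar transform here is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients. Keeping more coefficients can therefore never lower the PSNR. On a smooth image, most of the energy sits in the approximation band, so discarding most detail coefficients still gives a high PSNR.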
In this approach, the DWT first transforms the digital image from the spatial domain to the frequency domain. In the second stage, the coefficients of these frequency components are quantized using an appropriate encoding algorithm. The idea is that not all coefficients carry significant energy. Lower-energy coefficients can therefore be suppressed while most of the information is retained by keeping the high-energy coefficients. After compression, different filter banks are applied to improve the quality of the image. To reconstruct the image, a decoder turns the code back into frequency coefficients. A final inverse transform then yields a reconstructed image that is almost identical to the input image. In this paper, these filters are used to show the compression performance of different wavelet functions. The new filter uses some properties of the basic wavelet filters, with slight changes to the scaling and wavelet functions of the existing filters. Table 1 gives the scaling and wavelet impulse responses of the proposed filter.
Table.1 Scaling and wavelet function of proposed wavelet filter
Scaling Function        Wavelet Function
-0.0102009221870400      0.00134974786501000
-0.0102300708193700     -0.00135360470301000
 0.0556648607799600     -0.0120141966670800
 0.0285444717151500      0.00843901203981000
-0.295463938592920       0.0351664733065400
-0.536628801791570      -0.0546333136825200
-0.295463938592920      -0.0665099006248400
 0.0285444717151500      0.297547906345710
 0.0556648607799600      0.584015752240750
-0.0102300708193700      0.297547906345710
-0.0102009221870400     -0.0665099006248400
                        -0.0546333136825200
                         0.0351664733065400
                         0.00843901203981000
                        -0.0120141966670800
                        -0.00135360470301000
The proposed filter gives wide frequency selectivity on both the high and the low side. It therefore provides a better low-frequency range with greater preservation of high frequencies. In the program, the wavelet function first transforms the image into the frequency domain, and the low- and high-frequency coefficients are converted and coded into a bit stream. The size of this code is compared with the bit size of the original image to calculate the compression ratio. For reconstruction, the encoder's bit stream is decoded back into frequency coefficients. The inverse transform, using the same wavelet function, then converts these coefficients back into the pixel matrix in the spatial domain.
A. Flow Chart
Fig.3.Flow Chart of proposed coding scheme with filter
B. Algorithm
The algorithm of the proposed work is divided into three parts, as follows:
/* start algorithm */
Part 1: Filter Selection
X = [input image];   % create a variable and assign it the image matrix
Type = 'name of the filter';   % an existing filter, or the proposed filter whose scaling and wavelet functions are given in Table 1
% Calculate the low- and high-pass decomposition and reconstruction filters (Lo_D, Hi_D, Lo_R, Hi_R).
% Df: scaling-function values of Table 1; Rf: wavelet-function values of Table 1
W = Df;              % filter vector to be processed
W = W/sum(W);        % normalize the filter sum
Lo_R = sqrt(2)*W;    % scale by the square root of 2
Hi_R = qmf(Lo_R);    % quadrature mirror filter (QMF)
Hi_D = wrev(Hi_R);   % reverse the vector
Lo_D = wrev(Lo_R);   % reverse the vector
Part 2: Transformation
L = 3;               % level of decomposition
j = []; k = size(X); % j stores the details, k stores the sizes
for n = 1:L
    [X, h, v, d] = dwt2(X, Lo_D, Hi_D);   % one-level 2-D DWT
    j = [h(:)' v(:)' d(:)' j];            % store the details
    k = [size(X); k];                     % store the size
end
IW = [X(:)' j];      % output code: approximation followed by details
/* end of algorithm */
Part 3: PCSM coding (formation of tree)
From the significant code {a b c d} in binary, a three-layer binary tree can be built to reduce the encoding length, as follows:
Fig.4. Binary tree in the proposed coding for the formation of coefficients (three-layer tree with nodes T(a) to T(g))
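The steps of Part 1 (normalize W, scale by sqrt(2), form the quadrature mirror filter, reverse the reconstruction filters) appear to match the steps of MATLAB's `orthfilt`. The Python sketch below reproduces them under that assumption. `qmf` imitates MATLAB's `qmf` sign convention, and `part1_filters` is a hypothetical name. The sketch is checked on the Haar vector and on the Daubechies-2 scaling vector. A column of Table 1 can be passed in the same way.

```python
import numpy as np

def qmf(x, p=0):
    """Quadrature mirror filter in MATLAB's convention: reverse x, then
    negate every second entry (2nd, 4th, ... in 1-based terms when p is even)."""
    y = np.asarray(x, dtype=float)[::-1].copy()
    start = 1 if p % 2 == 0 else 0   # 0-based index of the first negated entry
    y[start::2] = -y[start::2]
    return y

def part1_filters(w):
    """Part 1 of the algorithm: return (Lo_D, Hi_D, Lo_R, Hi_R) built from W."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()              # W = W/sum(W): taps sum to 1
    lo_r = np.sqrt(2) * w        # Lo_R = sqrt(2)*W
    hi_r = qmf(lo_r)             # Hi_R = qmf(Lo_R)
    hi_d = hi_r[::-1]            # Hi_D = wrev(Hi_R)
    lo_d = lo_r[::-1]            # Lo_D = wrev(Lo_R)
    return lo_d, hi_d, lo_r, hi_r

# Haar: W = [1, 1] gives Lo_R = [0.7071, 0.7071] and Hi_R = [0.7071, -0.7071].
lo_d, hi_d, lo_r, hi_r = part1_filters([1.0, 1.0])
print(np.round(lo_d, 4), np.round(hi_d, 4), np.round(lo_r, 4), np.round(hi_r, 4))

# Daubechies-2 scaling vector: the resulting Lo_R has unit energy and is
# orthogonal to its own shift by two, as an orthogonal filter must be.
s3 = np.sqrt(3)
db2 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8
_, _, lo_r, hi_r = part1_filters(db2)
print(np.sum(lo_r ** 2), lo_r[0] * lo_r[2] + lo_r[1] * lo_r[3])
```

For an orthogonal W, this construction yields a perfect-reconstruction filter bank. The high-pass filter sums to zero and is orthogonal to the low-pass filter. Whether the same construction is appropriate for the non-orthogonal values of Table 1 is not addressed here.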
V. EXPERIMENT SIMULATION AND RESULT ANALYSIS
In this paper, a test image is used to demonstrate the benefit of the schemes used in the improved PCSM coding with filter banks. All functions for the improved PCSM code and the proposed filter coding are written in MATLAB 2011, and many built-in MATLAB functions are used to develop the different steps of the algorithm. Each phase processes the input image with the corresponding functions. The discrete wavelet transform (DWT) of the test images is obtained first. In this implementation, several predefined filters are used along with the proposed filter.
A. Important Parameters for Measuring Performance
i. Execution Time
This is the time the image compression algorithm needs to encode the original image. It is measured in seconds [10].
ii. Bits per pixel (bpp)
Bits per pixel (bpp) is the number of bits of information stored per pixel of an image. For the decoded image, bpp is calculated by multiplying the number of significant coefficients by 8 to obtain the total number of bits. This total is then divided by the total number of coefficients (or pixels) [13].
iii. Mean Square Error (MSE)
Mean square error (MSE) is the cumulative squared error between the original and the recovered image. A larger MSE means poorer image quality [19].
iv. Compression Ratio
The compression ratio measures compression efficiency as the ratio of the size (number of bits) of the original image data to the size of the compressed image data [15].
v. Peak Signal to Noise Ratio (PSNR)
PSNR measures how much information has been lost in reconstructing the input digital image. It is the distortion measure commonly used to evaluate lossy compression algorithms. PSNR is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation [16]:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{MAX_I^2}{\mathrm{MSE}}\right) = 20\log_{10}\left(\frac{MAX_I}{\mathrm{RMSE}}\right),$$
where RMSE is the square root of the MSE and $MAX_I$ is the maximum possible pixel value of the image, generally 255 for an unsigned 8-bit grayscale image. A small PSNR indicates a poor-quality image. PSNR is, however, very fast and easy to implement.
The hardware requirement is a personal computer with 2 GB of RAM that can run MATLAB 2011, with 10 GB of disk space to save the program and the input and output files. All results in this work were obtained with MATLAB 2011 on Microsoft Windows 7, with 2 GB DDR3 RAM, an Intel Core i3 2.20 GHz processor and a 320 GB HDD.
B. Test Images Result
Fig.5. Test image
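The measures defined in subsection A can be written as short functions. The Python sketch below assumes 8-bit images ($MAX_I$ = 255) and takes the bpp and compression-ratio definitions literally from items ii and iv. MAE and RMSE, which appear in the result plots, are included alongside MSE and PSNR. The function names and the 2×2 example arrays are hypothetical.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two images of the same shape."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return float(np.mean(diff ** 2))

def rmse(original, reconstructed):
    """Root mean square error: the square root of the MSE."""
    return float(np.sqrt(mse(original, reconstructed)))

def mae(original, reconstructed):
    """Mean absolute error."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return float(np.mean(np.abs(diff)))

def psnr(original, reconstructed, max_i=255.0):
    """PSNR in dB; max_i = 255 for unsigned 8-bit grayscale images."""
    m = mse(original, reconstructed)
    return float("inf") if m == 0 else float(10 * np.log10(max_i ** 2 / m))

def compression_ratio(original_bits, compressed_bits):
    """Size of the original image data divided by size of the compressed data."""
    return original_bits / compressed_bits

def bits_per_pixel(num_significant, num_pixels, bits_per_coefficient=8):
    """bpp as defined in ii: significant coefficients x 8 / total pixels."""
    return num_significant * bits_per_coefficient / num_pixels

original = np.array([[10, 20], [30, 40]])
reconstructed = np.array([[12, 20], [30, 36]])
print(mse(original, reconstructed), mae(original, reconstructed))  # 5.0 1.5
print(round(psnr(original, reconstructed), 2))
print(bits_per_pixel(32768, 512 * 512))                            # 1.0
```

PSNR is a monotone function of MSE for a fixed $MAX_I$. The two therefore always rank reconstructions of the same image identically; PSNR only rescales the error onto a logarithmic dB axis.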
The test image is first processed by the improved PCSM code without the improved wavelet filter, and all parameters are calculated. The existing wavelet filters are then used to enhance the result of the PCSM code, and all parameters are compared. Finally, the test image is processed by the proposed filter with the PCSM code to produce the best result with the wavelet compression technique. All image compression parameters are compared and analyzed, and the improvement in the compression process is stated. The experimental results for this test image are shown in the graphs and the table; the following eight graphs were obtained from the experiments.
Fig.6. RMSE plot for test image
Fig.7. SNR plot for test image
Fig.8. MAE plot for test image
Fig.9. ET plot for test image
Fig.10. BPP plot for test image
Fig.11. MSE plot for test image
Fig.12. CR plot for test image
Fig.13. PSNR plot for test image
Fig.14. Reconstructed test image
Table.2 Comparison Table of parameters
Parameter   PCSM      BIOR1.1   DB2       HAAR      SYM2      COIF1     RBIO1.1   IMPROVED FILTER
PSNR (dB)   36.9856   38.2825   39.2235   38.2825   39.2235   39.2908   38.2825   40.0028
MSE         13.0173   9.6568    7.7756    9.6568    7.7756    7.6559    9.6568    6.4983
BPP         1         0.5       0.5       0.5       0.5       0.5       0.5       0.5
CR (%)      12.5      75        75        75        75        75        75        75
ET (s)      9.8593    9.2486    9.588     9.3776    9.4796    9.5724    9.3316    9.6184
MAE         2.8255    3.3895    3.0609    3.3895    3.0609    3.0428    3.3895    2.8162
SNR         88.8548   90.1517   91.0927   90.1517   91.0927   91.16     90.1517   91.872
RMSE        3.608     3.1075    2.7885    3.1075    2.7885    2.7669    3.1075    2.5492
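PSNR and RMSE are both functions of MSE (subsection A, with $MAX_I$ = 255). The PSNR and RMSE rows of Table 2 can therefore be recomputed from the MSE row as a consistency check. The short Python sketch below does this; the variable and function names are hypothetical.

```python
import math

# Rows of Table 2, in column order.
columns = ["PCSM", "BIOR1.1", "DB2", "HAAR", "SYM2", "COIF1", "RBIO1.1", "IMPROVED FILTER"]
mse_row = [13.0173, 9.6568, 7.7756, 9.6568, 7.7756, 7.6559, 9.6568, 6.4983]
psnr_row = [36.9856, 38.2825, 39.2235, 38.2825, 39.2235, 39.2908, 38.2825, 40.0028]
rmse_row = [3.608, 3.1075, 2.7885, 3.1075, 2.7885, 2.7669, 3.1075, 2.5492]

def psnr_from_mse(mse, max_i=255.0):
    """PSNR = 10 log10(MAX_I^2 / MSE)."""
    return 10 * math.log10(max_i ** 2 / mse)

for name, m, p, r in zip(columns, mse_row, psnr_row, rmse_row):
    print(f"{name:<16} PSNR {psnr_from_mse(m):8.4f} (table {p:8.4f})   "
          f"RMSE {math.sqrt(m):6.4f} (table {r:6.4f})")
```

Each recomputed value agrees with the tabulated one to within the rounding of the four-decimal entries. The PSNR, MSE and RMSE rows therefore carry the same information, expressed on different scales.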
VI. CONCLUSION
This work presents a computationally efficient method for image compression. The objective of this paper is to present PCSM coding in a new scheme that yields digital images with both good quality and a higher compression ratio. All parameters improve when the predefined MATLAB filters are used instead of the Haar wavelet. CR increases to 75%, compared with 12.5% for the previous method, with a 3-5% improvement in the error, PSNR and SNR parameters. The results obtained above for mean square error (MSE), peak signal-to-noise ratio (PSNR), compression ratio (CR), execution time (ET) and bits per pixel (bpp) also show that the coding methods give different results for different images. As an advancement of PCSM, the new filter is used to improve these values further. The improved and enhanced coding scheme gives a further 5 to 10% reduction in MSE and a similar increase in PSNR. Future work is to develop a filter that enhances the compression process further. The values of the scaling and wavelet functions in the coding part of this work can also be changed to calibrate the improvement. This method is also useful in other image compression applications, such as satellite imaging and medical imaging.
REFERENCES
[1] Vinay Jeengar, S.N. Omkar, Amarjot Singh, Maneesh Kumar Yadav, Saksham Keshri, "A Review Comparison of Wavelet and Cosine Image Transforms," I.J. Image, Graphics and Signal Processing, 2012, 11, 16-25. Published Online September 2012 in MECS.
[2] Md. Rifat Ahmmad Rashid, Mir Tafseer Nayeem, Kamrul Hasan Talukder, Md. Saddam Hossain Mukta, "A Progressive Image Transmission Method Based on Discrete Wavelet Transform (DWT)," I.J. Image, Graphics and Signal Processing, 2012, 10, 18-24. Published Online September 2012 in MECS.
[3] Ashutosh Dwivedi, N Subhash Chandra Bose, Ashiwani Kumar, "A Novel Hybrid Image Compression Technique: Wavelet-MFOCPN," pp. 492-495, 2012.
[4] Prachi Tripathi, "Image Compression Enhancement using Bipolar Coding with LM Algorithm in Artificial Neural Network," IJSRP, Volume 2, Issue 8, 2012, 1, ISSN 2250-3153.
[5] M. Venkata Subbarao, "Hybrid Image Compression using DWT and Neural Networks," International Journal of Advanced Science and Technology, Vol. 53, April 2013.
[6] Gaganpreet Kaur, Sandeep Kaur, "Comparative Analysis of Various Digital Image Compression Techniques Using Wavelets," IJARCS, Volume 3, Issue 4, 2013, ISSN: 2277 128X.
[7] Farnoosh Negahban, Mohammad Ali Shafieian, and Mohammad Rahmanian, "Various Novel Wavelet-Based Image Compression Algorithms Using a Neural Network as a Predictor," J. Basic. Appl. Sci. Res., 3(6), 280-287, 2013, ISSN 2090-4304.
[8] S. Porwal, Y. Chaudhary, J. Joshi, and M. Jain, "Data Compression Methodologies for Lossless Data and Comparison between Algorithms," vol. 2, no. 2, pp. 142-147, 2013.
[9] S. Gaurav Vijayvargiya and R. P. Pandey, "A Survey: Various Techniques of Image Compression," vol. 11, no. 10, 2013.
[10] Arup Kumar Bhattacharjee, Tanumon Bej, and Saheb Agarwal, "Comparison Study of Lossless Data Compression Algorithms for Text Data," IOSR J. Comput. Eng., vol. 11, no. 6, pp. 15-19, 2013.
[11] C. Rengarajaswamy and S. Imaculate Rosaline, "SPIHT Compression of Encrypted Images," IEEE, pp. 336-341, 2013.
[12] Athira B. Kaimal, S. Manimurugan, C.S.C. Devadass, "Image Compression Techniques: A Survey," e-ISSN: 2278-7461, p-ISBN: 2319-6491, Volume 2, Issue 4 (February 2013), pp. 26-28.
[13] S. Srikanth and Sukadev Meher, "Compression Efficiency for Combining Different Embedded Image Compression Techniques with Huffman Encoding," IEEE, pp. 816-820, 2013.
[14] Richard M. Dansereau, Mohamed M. Fouad, "Lossless Image Compression Using A Simplified MED Algorithm with Integer Wavelet Transform," I.J. Image, Graphics and Signal Processing, 2014, 1, 18-23. Published Online November 2013 in MECS.
[15] Dr. H.B. Kekre, Dr. Tanuja Sarode, Prachi Natu, "Performance Comparison of Hybrid Wavelet Transform Formed by Combination of Different Base Transforms with DCT on Image Compression," I.J. Image, Graphics and Signal Processing, 2014, 4, 39-45. Published Online March 2014 in MECS.
[16] Aleksandr Cariow, Galina Cariowa, "Algorithmic Tricks for Reducing the Complexity of FDWT/IDWT Basic Operations Implementation," I.J. Image, Graphics and Signal Processing, 2014, 10, 1-9. Published Online September 2014 in MECS.
[17] B. C. Vemuri, S. Sahni, F. Chen, C. Kapoor, C. Leonard, and J. Fitzsimmons, "Lossless image compression," IGARSS 2014, vol. 45, no. 1, pp. 1-5.
[18] Pooja Rawat, Ashish Nautiyal, Swati Chamoli, "Performance Evaluation of Gray Scale Image using EZW and SPIHT Coding Schemes," International Journal of Computer Applications (0975-8887), Volume 124, No. 15, August 2015.
[19] Hunny Sharma, Satnam Singh, "Image Compression Using Wavelet Based Various Algorithms," IJARCSSE, pp. 1699-1702, 2015.
[20] Jagdish Giri Goswami, Pawan Mishra, "Performance Analysis of Image Compression using Progressive Coefficients Significance Methods (PCSM)," IJAFRC, pp. 17-25, 2016.
Authors' Profiles
Mr. Jagdish Giri Goswami is pursuing an M.Tech in Computer Science & Engineering at Uttarakhand Technical University, Dehradun. He received his B.Tech degree in Computer Science & Engineering from D.B.I.T. Dehradun, Uttarakhand Technical University, in 2011.
Mr. Pawan Kumar Mishra is pursuing a Ph.D in Computer Science & Engineering at Uttarakhand Technical University, Dehradun. He received his M.Tech. degree in Computer Science & Engineering from Uttarakhand Technical University, Dehradun in 2010 and his B.Tech degree in Computer Science & Engineering from Dr. B.R Ambedkar University, Agra in 2002.
I.J. Image, Graphics and Signal Processing, 2016, 12, 55-61 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.07
2D Convolution Operation with Partial Buffering Implementation on FPGA
Arun Mahajan CEC Landran, Mohali/ECE Dept, Chandigarh, 140307, India Email: arunnmahajann@gmail.com
Mr. Paramveer Gill CEC Landran, Mohali/ECE Dept, Chandigarh, 140307, India Email: paramveer.ece@cgc.edu.in
Abstract: In modern digital systems, digital image processing and digital signal processing applications are an integral part of system design. Many designers have proposed and implemented resource- and speed-efficient approaches in the recent past. An important aspect of designing any digital system is its memory efficiency. An image consists of pixels, and each pixel holds a value from 0 to 255, which requires 8 bits to represent. A large memory is therefore required to process an image, and the number of pixels grows with the size of the image. A buffering technique is used to read the pixel data from the image and process it efficiently. In the work presented in this paper, different window sizes are compared on the basis of timing efficiency and area utilization. An optimum window size must be selected to reduce resource use and maximize speed. The results compare the window operations on the basis of performance parameters. In future work, other window operations, such as the adaptive median filter, can be implemented along with the convolution filter by changing the row and column values of the window size.
Index Terms: 2D Convolution, Median Filter, FPGA
I. INTRODUCTION
Image de-noising is the process of eliminating noise from images. Nonlinear filters such as the adaptive mean filter are used for image de-noising because they reduce smoothing and preserve image edges. Digital images are affected by noise arising from image transmission, acquisition, scanners, camera sensors and many other sources. Noise in an image refers to unwanted changes in its appearance, such as contrast or brightness. Noise is usually defined as erroneous intensity variation caused by imperfections in the imaging device used to acquire the images, or by interruptions in the transmission channels [9]. Noise may be introduced while an image is being captured, and it changes the pixel intensity values. There are many sources of noise, including transmission errors, imperfect instruments, natural phenomena, imperfect data acquisition processes and compression techniques. The noise may not be visible, but it is still present in the image. Image quality is affected by many factors, such as environmental temperature, camera sensitivity and the time taken to capture the image. These factors affect the brightness, color and smoothness of the image, producing an undesirable picture. The following image is a noisy image with excessive random noise [3]. Images are affected by several kinds of noise, including gamma noise, uniform and non-uniform noise, and Gaussian noise. In digital images, impulse noise is one of the major sources of error, and it arises during image transmission. Noise elimination usually means filtering a signal that contains more or less noise affecting the image quality. The major objective of de-noising is to recover the real image while eliminating the noisy pixels [4]. Some median filter algorithms optimize the sorting process to reduce the computational complexity. Others increase the quality of the filtered images without increasing the computational complexity of the median filter algorithm; these algorithms try to detect the noisy pixels and adaptively filter only those pixels. However, the adaptive median filter algorithm presented in this work both reduces the computational complexity of the median filter algorithm and increases the quality of the filtered images by exploiting the pixel correlations in the input image [10].
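For reference, the classic two-stage adaptive median filter, as described in Gonzalez and Woods' Digital Image Processing, can be sketched in Python. It is a plain software baseline, not the reduced-complexity algorithm of [10] or the FPGA window design of this paper. The function name `adaptive_median` and the maximum window size `s_max = 7` are assumptions chosen for illustration.

```python
import numpy as np

def adaptive_median(img, s_max=7):
    """Classic two-stage adaptive median filter (Gonzalez & Woods formulation).

    For each pixel the window grows from 3x3 up to s_max x s_max until its
    median is not an impulse (z_min < z_med < z_max). The centre pixel is kept
    if it is not itself an impulse; otherwise it is replaced by the median.
    """
    img = np.asarray(img)
    pad = s_max // 2
    padded = np.pad(img, pad, mode="edge")   # replicate borders
    out = img.copy()
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            ci, cj = i + pad, j + pad
            z_xy = padded[ci, cj]
            for size in range(3, s_max + 1, 2):
                h = size // 2
                win = padded[ci - h:ci + h + 1, cj - h:cj + h + 1]
                z_min, z_med, z_max = win.min(), np.median(win), win.max()
                if z_min < z_med < z_max:        # Stage A: median is not an impulse
                    # Stage B: keep the pixel unless it is an impulse itself.
                    out[i, j] = z_xy if z_min < z_xy < z_max else z_med
                    break
            else:
                out[i, j] = z_med                # window limit reached
    return out

# Hypothetical test: a flat 8-bit image with isolated salt-and-pepper pixels.
noisy = np.full((16, 16), 100, dtype=np.uint8)
noisy[3, 4] = noisy[7, 12] = 255   # salt
noisy[10, 10] = 0                  # pepper
clean = adaptive_median(noisy)
print(np.unique(clean))
```

On this example, every impulse is replaced by the background value and the clean pixels stay unchanged. The window grows only when the median itself looks like an impulse. This is what lets the adaptive filter handle higher noise densities than a fixed 3×3 median, at the cost of a variable amount of work per pixel, which matters for hardware window buffering.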
II. RELATED WORK
Verma, Kesari et al. [1] presented an improved adaptive median filter for edge preservation. Edges are a significant feature of biomedical images, and the major objective of this work is to preserve them without loss in peak signal-to-noise ratio (PSNR) and SNR. The work also proposes a novel measure, the Edge Retrieval Index (ERI), which helps to evaluate edge preservation in images. In homogeneous areas, the proposed approach removes the noise from the entire image but preserves the edges. The results indicate that the approach can eliminate noise while preserving edges, with minimal edge loss. Saleem, S. Abdul et al. [2] proposed an effective noise-adaptive median filter to eliminate impulse noise from color images, which are degraded by noise during transmission, acquisition and storage. The method removes impulse noise while maintaining image details and enhancing image quality. It uses a spatial-domain approach with a 3×3 overlapping window to filter the signal. The filter was assessed in MATLAB with simulations on both grayscale and color images. The results demonstrated its effectiveness compared with some other adaptive mean algorithms. Chakravarthy, SR Sannasi et al. [3] proposed a boundary discriminative noise detection (BDND) algorithm for eliminating impulse noise and random noise, both of which degrade image features. This work modifies the filtering step of the BDND algorithm by increasing the window size one step above the existing size. The results indicate that the modified algorithm produces sharper images while removing the noise. The algorithm was implemented in MATLAB 7.12 using the Image Processing Toolbox. Kaur Amanpreet, Rahul Malhotra et al. [4] presented a non-iterative adaptive median filter that removes the noisy pixel from a given window. The approach performs efficiently at eliminating noisy pixels; the median filters were implemented with the MATLAB Image Processing Toolbox. Habib, Muhammad et al. [5] presented an adaptive fuzzy inference system.
The approach is used for recognizing and eliminating random-valued impulse noise. The filter uses intensity statistics to build adaptive fuzzy membership functions, and simulation results based on PSNR indicate its efficiency. Bhateja, Vikrant et al. [6] proposed a non-iterative adaptive median filter for denoising images corrupted with impulse noise. The scheme works in two stages. Tested on images with distinct features, it produces good results in terms of qualitative and quantitative image measures compared with other filtering methods. Meher, Saroj K. et al. [7] presented an improved recursive and adaptive median filter (RAMF) for restoring images corrupted with impulse noise, in which the window size changes depending on the occurrence of noise-free pixels. The approach is highly efficient compared with other algorithms in terms of PSNR and image enhancement factor, and its superiority is also confirmed by visual analysis. Shanmugavadivu et al. [8] proposed a noise filter that eliminates impulsive noise from corrupted images. It performs better than traditional approaches in terms of peak SNR and is effective even for images containing 90% noise. Mukherjee, Manali et al. [9] proposed a low-complexity reconfigurable hardware architecture of an adaptive median filter for eliminating noise from digital images. MSE and PSNR were used to compare the performance of the median and adaptive median filters. The work presented a hardware implementation suitable for real-time execution, for which FPGAs are widely used, and the simulations were carried out with Xilinx ISE 14.5. Kalali, Ercan et al. [10] presented a 2D adaptive median filter that reduces the complexity of 2D filters while producing good quality images. It was implemented for a 5×5 window; the proposed hardware can process 105 full HD (1920×1080) images per second in the worst case on a Xilinx Virtex 6 FPGA, and it consumes more than 80% less energy than the original 2D median filter hardware on the same FPGA. Hsieh, Mu-Hsien et al. [11] proposed an approach for detecting and removing random-valued impulse noise. The median filter was assessed in MATLAB with simulations on both grayscale and color images, and the results indicated its effectiveness compared with other adaptive mean algorithms. The approach restored noisy images with 1–99% levels of salt-and-pepper impulse noise.
Copyright © 2016 MECS
Sree, P. Syamala Jaya et al. [12] presented an adaptive median-based lifting filter that removes noise from images by computing the median of neighboring pixels within a lifting scheme. Numerical results indicate that the algorithm is very efficient: compared with traditional filters, it performs well and can eliminate salt-and-pepper noise at densities as high as 90%, and it works well at every noise level in terms of PSNR and SSIM. Deka, Bhabesh et al. [13] presented a multiscale-based adaptive median filter that can restore images degraded by impulsive and salt-and-pepper noise at very high noise ratios. The approach is efficient, simple and easy to implement, and its noise detection is based on multiscale filtering and switching median filtering. Several switching filters, such as DWM, SDROM, BDND and ACWM, are discussed and compared with the proposed adaptive filter, which is found to outperform these existing methods while also preserving the texture details and edges of images. Zhang, Peixuan et al. [14] presented an adaptive weighted mean filter (AWMF) for detecting and removing high levels of impulsive noise. Such algorithms try to detect the noisy pixels and adaptively filter only those pixels, which improves the quality of the filtered images without increasing the computational complexity of the median filter algorithm. The method removes impulse noise while maintaining image details and enhancing image quality. The results indicate that AWMF provides a very low detection error rate and high restoration quality, especially for high-level noise. Ibrahem et al. [15] presented a method for removing salt-and-pepper noise that uses a 3×3 window; if all the pixels in the window are compared for noise removal, the window size must be increased to 5×5. The technique works efficiently at a noise density of 97%. Ahmed, Faruk et al. [16] presented an iterative adaptive fuzzy mean filter for denoising images degraded by salt-and-pepper noise. Adaptive mean filters are used for image denoising because they reduce smoothing and preserve image edges. Images are affected by many sources of noise, and the result analysis shows the algorithm to be superior to state-of-the-art filters.
I.J. Image, Graphics and Signal Processing, 2016, 12, 55-61
III. PROPOSED METHODOLOGY
In the proposed technique, buffering operations are performed with different window sizes. The buffered image is then passed to the median filter for the removal of salt-and-pepper noise. The design is divided into two main blocks: the first implements the row buffering technique and the second implements the median filter. The blocks are described as follows.
Row Buffer: Four windowing operations are implemented in the row buffering technique. The first is a 3×3 window, into which 3 inputs are loaded; 9 outputs are generated after every clock cycle. Figure 1 shows the 3×3 window operation. In the second, 4×3 window, four inputs are loaded into the buffer and 12 outputs are generated after every clock cycle. Figure 2 shows the 4×3 window operation. The third is a 5×3 window operation, in which 5 inputs are loaded into the buffer after every clock cycle and 15 outputs are generated, which are then used for the median filtering calculation. Figure 3 shows the 5×3 window operation. The fourth is a 5×5 window, in which 5 inputs are loaded into the buffer and 25 outputs are generated for the median filter. Figure 4 shows the 5×5 window operation.

In7 In4 In1
In8 In5 In2
In9 In6 In3
Fig.1. 3×3 Window Operation

In10 In7 In4 In1
In11 In8 In5 In2
In12 In9 In6 In3
Fig.2. 4×3 Window Operation

In13 In10 In7 In4 In1
In14 In11 In8 In5 In2
In15 In12 In9 In6 In3
Fig.3. 5×3 Window Operation

In21 In16 In11 In6 In1
In22 In17 In12 In7 In2
In23 In18 In13 In8 In3
In24 In19 In14 In9 In4
In25 In20 In15 In10 In5
Fig.4. 5×5 Window Operation
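As a rough software model of the row-buffer windows, the sketch below treats a window as a shift register of pixel columns. It is an illustration under stated assumptions, not the authors' VHDL: following the In-numbering in Figs. 1-4, it assumes that on each clock one new column of `rows` pixels enters on the left and the oldest column drops out on the right, with a zero-filled initial state standing in for the zero padding. The class `WindowRegister` and its `clock` method are hypothetical names.

```python
from collections import deque


class WindowRegister:
    """Hypothetical software model of the window registers in Figs. 1-4.

    Assumption (read from the In-numbering in the figures, not from the
    authors' VHDL): on every clock one new column of `rows` pixels enters
    on the left and the oldest column leaves on the right.
    """

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        # Zero-filled columns stand in for the zero-padded image border.
        self.columns = deque([[0] * rows for _ in range(cols)], maxlen=cols)

    def clock(self, column):
        """Load one column of pixels and return the current rows x cols window."""
        if len(column) != self.rows:
            raise ValueError(f"expected {self.rows} pixels, got {len(column)}")
        # appendleft on a bounded deque discards the rightmost (oldest) column.
        self.columns.appendleft(list(column))
        # Read the stored columns back out row by row.
        return [[col[r] for col in self.columns] for r in range(self.rows)]


# Usage: feed In1-In3, In4-In6 and In7-In9 into a 3x3 window (Fig. 1).
win = WindowRegister(rows=3, cols=3)
for first in (1, 4, 7):
    window = win.clock([first, first + 1, first + 2])
for row in window:
    print(row)
```

Feeding the columns in order reproduces the Fig. 1 layout; `WindowRegister(3, 5)` and `WindowRegister(5, 5)` reproduce the layouts of Figs. 3 and 4 in the same way.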
Median Filter: Median filtering is implemented to remove salt-and-pepper noise from the image. Like a convolution, the median filter is a sliding-window operation; in a 3×3 window it is implemented by first sorting the rows, then sorting the columns, and finally performing diagonal sorting. Figures 5-7 show the row-wise, column-wise and diagonal sorting stages implemented in the proposed design.
Fig.5. Row Wise Sorting
Fig.6. Column Wise Sorting
Fig.7. Diagonal Sorting
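The three sorting stages can be modelled in software as follows. This sketch assumes the classic 3×3 median network, in which, once every row and then every column has been sorted, the median of all nine pixels equals the median of the three anti-diagonal elements. The exact comparator arrangement drawn in Figs. 5-7 cannot be recovered from the text, and the function names are hypothetical.

```python
def sort3(a, b, c):
    """Sort three values with three compare-and-swap steps (a 3-input sorting cell)."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c


def median3x3(window):
    """Median of a 3x3 window via row-wise, column-wise and diagonal sorting.

    Assumes the classic median network; the comparator layout of the
    paper's Figs. 5-7 may differ.
    """
    # Stage 1: sort each row.
    rows = [sort3(*row) for row in window]
    # Stage 2: sort each column of the row-sorted window.
    cols = [sort3(rows[0][j], rows[1][j], rows[2][j]) for j in range(3)]
    # Stage 3: after both sorts, the anti-diagonal holds the largest row
    # minimum, the median of the row medians and the smallest row maximum;
    # the median of these three is the median of all nine pixels.
    return sort3(cols[0][2], cols[1][1], cols[2][0])[1]


# A window hit by salt noise: the 255 impulses are rejected.
print(median3x3([[0, 255, 0], [0, 0, 255], [255, 0, 0]]))
```

Each `sort3` call corresponds to a small fixed comparator network with no data-dependent control flow, which is one reason this formulation suits pipelined hardware.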
IV. RESULTS AND DISCUSSIONS
The proposed methodology is implemented on a Xilinx Virtex-5 FPGA, using VHDL as the implementation language. The resources used and the speed of operation are compared for the different buffering window operations. Table 1 compares the number of slices and other resources used, and the total time required to complete the buffering operation, for each window. The total time is measured for buffering an image with two zero-padded rows and indicates the buffering speed of each window operation; the table also lists the FPGA resources each window utilizes. As the speed of operation increases (the total time decreases), the area utilized also increases, so a trade-off must be maintained in order to buffer the image efficiently.
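A quick arithmetic check on the first three columns of Table 1 shows how the total time scales with the number of inputs the text says are loaded per clock cycle (3, 4 and 5 for the 3×3, 4×3 and 5×3 windows). The sketch below uses Python's `fractions` module to compute exact speed-ups relative to the 3×3 window; the 5×5 columns are not included, and the function name `speedups` is hypothetical.

```python
from fractions import Fraction

# Total buffering times (ns) from Table 1 for the 3x3, 4x3 and 5x3 windows,
# keyed by the number of inputs the text says are loaded per clock cycle.
TOTAL_TIME_NS = {3: 221880, 4: 166410, 5: 133128}


def speedups(times, reference=3):
    """Exact speed-up of each window relative to the reference window."""
    base = times[reference]
    return {inputs: Fraction(base, t) for inputs, t in times.items()}


for inputs, s in speedups(TOTAL_TIME_NS).items():
    t = TOTAL_TIME_NS[inputs]
    print(f"{inputs} inputs/clock: {t} ns, speed-up {s} ({float(s):.3f}), "
          f"time x inputs = {t * inputs}")
```

The speed-ups come out as exactly 4/3 and 5/3, and the product of total time and inputs per clock is 665,640 for all three windows. In these measurements, the total time is therefore inversely proportional to the number of inputs loaded per clock.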
Table.1. Comparison table
Parameters              3×3 Window  4×3 Window  5×3 Window  5×5 Window (Basic)  5×5 Window (Proposed)
No. of Slice Registers  75          99          123         1296                208
No. of LUTs             516         828         1151        2400                3414
No. of LUT-FF Pairs     38          44          54          205                 192
Latency (ns)            19.514      18.883      18.907      19.674              -
Total Time (ns)         221880      166410      133128      79876               -

The resources utilized by the 5×5 window are compared with the base paper approach, and the comparison shows that the proposed approach makes more optimal use of resources than the basic approach. Figure 8 shows the simulation results of the proposed methodology. Tables 2 and 3 compare the PSNR and MSE with the base paper approach. Figure 9 (a-c) shows the original images and the images with added noise. The PSNR is calculated for the noisy image with respect to the original image, and the results are shown in Tables 2 and 3.
Fig.8. Simulation Results
Table.2. Comparison of PSNR (dB) with the Base Paper Approach
Image     Basic (3% noise)   Proposed (3% noise)   Basic (5% noise)   Proposed (5% noise)
Lena      25.54              28.5466               25.41              26.45
Baboon    22.72              25.19                 22.63              23.51
Peppers   26.82              27.96                 26.43              26.95
Table.3. Comparison of MSE with the Base Paper Approach
Image     Basic (3% noise)   Proposed (3% noise)   Basic (5% noise)   Proposed (5% noise)
Lena      181.44             90.8698               186.77             147.33
Baboon    347.5              205.35                354.72             300.88
Peppers   135.07             113.9                 147.6              134.95
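Tables 2 and 3 are linked by the usual 8-bit definition PSNR = 10 log10(255² / MSE). For instance, the Lena MSE values of 181.44 and 90.8698 correspond to the 25.54 dB and 28.5466 dB entries in Table 2. The sketch below assumes 8-bit grayscale images stored as nested lists of pixel values; the helper names `mse` and `psnr_from_mse` are hypothetical.

```python
import math


def mse(original, processed):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, processed)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr_from_mse(mse_value, peak=255.0):
    """PSNR in dB, assuming 8-bit images (peak value 255 by default)."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse_value)


# Lena entries from Table 3 (basic and proposed approach, 3% noise).
for label, value in (("basic", 181.44), ("proposed", 90.8698)):
    print(label, round(psnr_from_mse(value), 2))
```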
(a) Lena Image
(b) Peppers Image
(c) Baboon Image
Fig.9. (a-c): Original Image and Noisy Image
V. CONCLUSION
Buffering has recently become an important concern in digital image processing applications. In the proposed technique, four differently sized windows are used to buffer the image, and they are compared on the basis of area and speed. The 5×5 window implemented here is also compared with the base approach and shows an optimized total area utilized by the device. The results show that as the number of rows in the buffering operation increases, the area utilized by the device also increases, and to a large extent: the reading speed increases, but so does the memory required to hold and process the data. For the three test images considered, the proposed approach obtained higher PSNR and lower MSE values than the base approach. In future work, other window operations, together with filters such as the adaptive median filter, could be implemented by changing the row and column dimensions of the window.
ACKNOWLEDGMENT
The authors would like to thank Chandigarh Engineering College (CEC) Landran, Mohali (India) for important and timely help in the research. The work presented in this paper is fully supported by CEC, Landran.
REFERENCES
[1] Verma, Kesari, Bikesh Kumar Singh, and A. S. Thoke. "An Enhancement in Adaptive Median Filter for Edge Preservation." Procedia Computer Science 48 (2015): 29-36.
[2] Saleem, S. Abdul, and T. Abdul Razak. "An Effective Noise Adaptive Median Filter for Removing High Density Impulse Noises in Color Images." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 2 (2015).
[3] Chakravarthy, SR Sannasi, and S. A. Subhasakthe. "Adaptive Median Filtering with Modified BDND Algorithm for the Removal of High-Density Impulse and Random Noise." (2015).
[4] Kaur, Amanpreet, Rahul Malhotra, and Ravneet Kaur. "Performance evaluation of non-iterative adaptive median filter." In Advance Computing Conference (IACC), 2015 IEEE International, pp. 1117-1121. IEEE, 2015.
[5] Habib, Muhammad, Ayyaz Hussain, Saqib Rasheed, and Mubashir Ali. "Adaptive fuzzy inference system based directional median filter for impulse noise removal." AEU-International Journal of Electronics and Communications 70, no. 5 (2016): 689-697.
[6] Bhateja, Vikrant, Kartikeya Rastogi, Aviral Verma, and Chirag Malhotra. "A non-iterative adaptive median filter for image denoising." In Signal Processing and Integrated Networks (SPIN), 2014 International Conference on, pp. 113-118. IEEE, 2014.
[7] Meher, Saroj K., and Brijraj Singhawat. "An improved recursive and adaptive median filter for high density impulse noise." AEU-International Journal of Electronics and Communications 68, no. 12 (2014): 1173-1179.
[8] Shanmugavadivu, P., and P. S. Jeevaraj. "Laplace equation based Adaptive Median Filter for highly corrupted images." In Computer Communication and Informatics (ICCCI), 2012 International Conference on, pp. 1-5. IEEE, 2012.
[9] Mukherjee, Manali, and Mausumi Maitra. "Reconfigurable architecture of adaptive median filter - An FPGA based approach for impulse noise suppression." In Computer, Communication, Control and Information Technology (C3IT), 2015 Third International Conference on, pp. 1-6. IEEE, 2015.
[10] Kalali, Ercan, and Ilker Hamzaoglu. "A low energy 2D adaptive median filter hardware." In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 725-729. IEEE, 2015.
[11] Hsieh, Mu-Hsien, Fan-Chieh Cheng, Mon-Chau Shie, and Shanq-Jang Ruan. "Fast and efficient median filter for removing 1–99% levels of salt-and-pepper noise in images." Engineering Applications of Artificial Intelligence 26, no. 4 (2013): 1333-1338.
[12] Sree, P. Syamala Jaya, Pradeep Kumar, Rajesh Siddavatam, and Ravikant Verma. "Salt-and-pepper noise removal by adaptive median-based lifting filter using second-generation wavelets." Signal, Image and Video Processing 7, no. 1 (2013): 111-118.
[13] Deka, Bhabesh, and Sangita Choudhury. "A multiscale detection based adaptive median filter for the removal of salt and pepper noise from highly corrupted images." International Journal of Signal Processing, Image Processing and Pattern Recognition 6, no. 2 (2013): 129-144.
[14] Zhang, Peixuan, and Fang Li. "A new adaptive weighted mean filter for removing salt-and-pepper noise." IEEE Signal Processing Letters 21, no. 10 (2014): 1280-1283.
[15] Ibrahem, Hani M. "An efficient and simple switching filters for removal of high density salt-and-pepper noise." International Journal of Image, Graphics and Signal Processing 5, no. 12 (2013).
[16] Ahmed, Faruk, and Swagatam Das. "Removal of high-density salt-and-pepper noise in images with an iterative adaptive fuzzy filter using alpha-trimmed mean." IEEE Transactions on Fuzzy Systems 22, no. 5 (2014): 1352-1358.
Authors’ Profiles
Arun Mahajan is a postgraduate M.Tech (VLSI) student at Chandigarh Engineering College Landran, Mohali, India. He completed his B.Tech degree in Electronics and Communication at SUS College of Engineering and Technology, Mohali, India. His areas of interest include Image Processing, VLSI, and digital communication.
Mr. Paramveer Gill is an assistant professor in ECE Dept. at Chandigarh Engineering College Landran, Mohali, India. He is supervising many M.Tech students for their research work. His areas of interest are Image Processing, Wireless Sensor Network, Watermarking, Control and Automation.
How to cite this paper: Arun Mahajan, Paramveer Gill, "2D Convolution Operation with Partial Buffering Implementation on FPGA", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.55-61, 2016. DOI: 10.5815/ijigsp.2016.12.07
I.J. Image, Graphics and Signal Processing, 2016, 12, 62-70 Published Online December 2016 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijigsp.2016.12.08
Enhancing Face Recognition Performance using Triplet Half Band Wavelet Filter Bank
Mohd. Abdul Muqeet
Muffakham Jah College of Engineering and Technology, Hyderabad, India
Email: ab.muqeet2013@gmail.com
Raghunath S. Holambe
SGGS Institute of Engineering and Technology, Nanded, India
Email: holambe@yahoo.com
Abstract: Face recognition using subspace methods is quite popular in the research community. This paper proposes an efficient face recognition method that applies the recently developed triplet half band wavelet filter bank (TWFB) as a pre-processing step to further enhance the performance of well-known linear and nonlinear subspace methods such as principal component analysis (PCA), kernel principal component analysis (KPCA), linear discriminant analysis (LDA), and kernel discriminant analysis (KDA). A 6th order TWFB is used as the multiresolution analysis tool to perform the 2-D discrete wavelet transform (DWT). Experiments are performed on two standard databases, ORL and Yale. Comparative results are obtained in terms of verification performance parameters such as false acceptance rate (FAR), false rejection rate (FRR) and genuine acceptance rate (GAR). Application of the TWFB enhances the performance of PCA, KPCA, LDA, and KDA based methods.
Index Terms: Face Recognition, triplet half band wavelet filter bank (TWFB), PCA, KPCA, LDA, KDA.
I. INTRODUCTION
Face recognition has developed rapidly over the past years, with important applications in video surveillance, identity authentication, security monitoring, access control, and suspect tracking in commercial and law enforcement settings [1]. Challenges in face recognition arise from changes in face image characteristics such as the illumination of the scene, pose, facial expression, and occlusion of part of the face area. Appearance-based face recognition methods are categorized into linear subspace methods, nonlinear subspace methods and frequency-based methods. Principal component analysis (PCA) [3], Fisher discriminant analysis (FDA) [4], independent component analysis (ICA) [5] and two-dimensional PCA [6] are the widely used linear subspace methods. These methods attempt to represent face images in a lower dimensional feature space spanned by a set of basis vectors [25]. In spite of their popularity, linear subspace methods do not perform satisfactorily under large variations of facial images such as illumination, pose and occlusion. To overcome such variations, nonlinear subspace methods such as kernel PCA (KPCA) [7] and kernel discriminant analysis (KDA) [8] have been proposed as nonlinear extensions of PCA and LDA. These methods map an input image nonlinearly to a higher dimensional feature space with the help of a kernel function [8]. The non-statistical face recognition method using local binary patterns (LBP) [26] outperforms PCA and LDA-based methods in terms of recognition performance and computational simplicity; its disadvantages are sensitivity to noise and a large feature vector. Wavelet transform based face recognition methods are used extensively by the research community, and researchers have used different off-the-shelf wavelet filters for facial feature extraction. This paper uses the recently developed triplet wavelet filter bank (TWFB) to enhance the performance of existing linear and nonlinear subspace methods. Applying the TWFB provides more efficient multiresolution features than the existing wavelet filters.
The paper is organized as follows. Section II gives insight into the related work. Section III briefly reviews linear and nonlinear subspace methods such as PCA [3], LDA [4], KPCA [7], and KDA [8]. Section IV reviews the design of the TWFB [19], [22]. The proposed face recognition method using TWFB-based features, the face databases used, and the experimental results are summarized in Section V, followed by the conclusion in Section VI.
II. RELATED WORK
Recent progress in face recognition makes efficient use of the wavelet transform, which possesses good time and frequency localization; this helps to detect facial geometric structures and offers robust feature extraction even under variable illumination [9]. In wavelet analysis, face recognition methods based on the discrete wavelet transform (DWT) [10] and the Gabor wavelet transform (GWT) [11]–[12] are the most popular. GWT is more successful because its directional selectivity lets it express face features effectively. However, Gabor wavelets provide an over-complete representation, which increases computational complexity because each face image must be convolved with several Gabor wavelet kernels at many scales and orientations [13]. When DWT is applied before PCA or LDA, features such as edge information are extracted more efficiently, forming meaningful feature vectors [13]. For example, Feng et al. [14] applied PCA on a wavelet subband, using a midrange frequency subband for the PCA representation. Chien and Wu [10] performed face recognition by applying the wavelet transform to extract waveletfaces from face images and incorporating LDA for discriminant analysis. A comparison of face recognition systems based on principal component analysis (PCA), the Gabor wavelet transform (GWT) and the discrete wavelet transform (DWT) is presented in [15]; it demonstrates that the DWT algorithm can achieve recognition performance similar to that of the GWT algorithm with a faster analysis time. Utsumi et al. [9] evaluated face recognition performance with various wavelet transforms (e.g. Haar, French hat, Mexican hat, Daubechies, Coiflet, Symlet, and Ospline) and showed that the performance of these wavelets is similar to that of the Gabor wavelet. Jadhav and Holambe [16] applied the Radon transform to face images and then applied DWT to the generated Radon feature space. The Radon transform enhances the low-frequency components of the face images, and the wavelet transform applied on the Radon feature space provides multiresolution features; the resulting system is invariant to facial expression and illumination. Most existing DWT-based face feature extraction methods use off-the-shelf wavelet filters, either orthogonal (Haar, Daubechies, Coiflet, Symlet) or biorthogonal (B-spline biorthogonal, CDF-9/7). However, even for the same class of face images, the effect of these wavelet filters may vary from application to application.
This is because many face images do not share the same statistical characteristics, owing to changes in illumination, pose, and occlusion. Appropriately designed wavelet filters can give representations that are robust to changes in illumination, pose, and occlusion, and provide significant feature vectors with reduced computational complexity. Hence, designing wavelet filters that match the characteristics of face images for feature extraction is highly desirable. Biorthogonal wavelet filters are preferred over orthogonal wavelet filters for applications such as feature extraction and image compression. The most popular biorthogonal filter bank design is CDF-9/7 [17], which is obtained by factorizing a Lagrange half band polynomial (LHBP) into the wavelet filters of the filter bank. The Lagrange half band polynomial has the maximum number of zeros imposed at $z=-1$, so it offers no degree of freedom and no direct control over the frequency response of the filters [19], [22]. Patil et al. [18] designed a pair of biorthogonal wavelet filters that allows control of the frequency response; instead of the Lagrange half band polynomial, they factorize a general half band polynomial (GHBP). A recent biorthogonal wavelet filter bank design was proposed by Rahulkar et al. [19], [22]: a general half band polynomial is used, three polynomials are generated from it by factorization, and these triplet filters are used in the equations proposed in [21] to obtain the low pass analysis and synthesis filters. The authors used it for iris feature extraction; however, owing to properties such as flexible frequency response, near-orthogonality, and regularity, the designed filter bank can be used effectively for feature extraction and image compression [22]. The authors in [23] used the 10th order TWFB for iris feature extraction. Our work is motivated by [19] and [22]: we develop a face recognition system that derives significant and compact facial features by using the TWFB as a preprocessing step, providing essential multiresolution features to existing subspace methods such as PCA, KPCA, LDA, and KDA. Experiments are performed on two well-known databases, ORL [27] and Yale [28].
III. OVERVIEW OF LINEAR AND NON-LINEAR SUB-SPACE METHODS
In this section, a brief review of linear and non-linear sub-space methods is presented.
A. Principal Component Analysis
Principal component analysis is a well-known linear projection method for dimensionality reduction [3]. Assuming the training set consists of $N$ images, the data matrix $X = \{x_1, x_2, \dots, x_N\}$, $x_i \in \mathbb{R}^t$, is obtained by row concatenation of the image data. Let $C$ be the number of classes, with each image $x_i$ belonging to one of the $C$ classes $\{\omega_1, \omega_2, \dots, \omega_C\}$. The mean face of the data matrix is defined as $m = \frac{1}{N}\sum_{i=1}^{N} x_i$. The covariance matrix, or total scatter matrix, $S_t$ is defined as [3]:

$$S_t = \frac{1}{N}\sum_{i=1}^{N} (x_i - m)(x_i - m)^T \tag{1}$$
The eigenvalues $\lambda_i$ and corresponding eigenvectors $V_i$ are computed from the eigenvalue problem of $S_t$ defined by the following [3]:

$$S_t V_i = \lambda_i V_i \tag{2}$$
Arranging all eigenvalues in descending order and taking the $n$ largest eigenvalues and their corresponding eigenvectors generates the projection matrix $W_{pca}$ and, in turn, the new face space [3]:

$$z_i = W_{pca}^T (x_i - m), \quad z_i \in \mathbb{R}^f \ (f \ll t) \tag{3}$$
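Equations (1)-(3) translate directly into a few lines of NumPy. This is a minimal sketch, assuming the faces are already vectorized into the rows of an $N \times t$ array; for realistic image sizes one would usually work with the smaller $N \times N$ Gram matrix or an SVD instead of forming the $t \times t$ scatter matrix. The function names are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Hypothetical PCA following eqs. (1)-(3).

    X: (N, t) array with one vectorized face image per row.
    Returns the mean face m, W_pca (t x n_components) and the kept eigenvalues.
    """
    m = X.mean(axis=0)                      # mean face m
    Xc = X - m
    St = Xc.T @ Xc / X.shape[0]             # total scatter matrix, eq. (1)
    eigvals, eigvecs = np.linalg.eigh(St)   # eq. (2); eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return m, eigvecs[:, order], eigvals[order]


def pca_project(X, m, W):
    """Eq. (3) applied to every row: z_i = W_pca^T (x_i - m)."""
    return (X - m) @ W


# Toy usage: points on the line y = x, so one component carries all the variance.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
m, W, lam = pca_fit(X, n_components=1)
Z = pca_project(X, m, W)
print(lam, Z.ravel())
```

`np.linalg.eigh` is used because $S_t$ is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.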
B. Linear Discriminant Analysis (LDA)
The objective of the Fisherfaces method using linear discriminant analysis (LDA) is to choose the subspace of face features that maximizes the ratio of the between-class scatter matrix to the within-class scatter matrix [4]. Over all samples of all classes, the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined as [4]:

$$S_b = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T \tag{4}$$

$$S_w = \sum_{i=1}^{C} \sum_{j=1}^{n_i} (x_j - m_i)(x_j - m_i)^T \tag{5}$$

where $n_i$ is the number of samples in the $i$th class, $m_i$ is the mean of the $i$th class samples and $m$ is the mean face, or global mean, of all samples. If $S_w$ is non-singular, LDA seeks the optimal transformation matrix $W$ that maximizes the projection ratio [4]:

$$J(W) = \arg\max_{W} \frac{|W^T S_b W|}{|W^T S_w W|} \tag{6}$$

This ratio is maximized when $W$ is constructed from the $C-1$ leading eigenvectors of the following eigenvalue problem [4]:

$$S_w^{-1} S_b w_i = \lambda_i w_i, \quad i = 1, 2, \dots, C-1 \tag{7}$$

where $w_i$ $(i = 1, 2, \dots, C-1)$ are the generalized eigenvectors corresponding to the largest eigenvalues. However, $S_w$ is always singular in face recognition because of the high dimensionality of face images [4]. Therefore, to make $S_w$ non-singular, PCA is usually performed before LDA to reduce the dimension from $t$ to $N-C$, and LDA as defined by equation (6) is then applied to reduce the dimension further to $C-1$ [4].

C. Kernel Principal Component Analysis (KPCA)
KPCA is a nonlinear extension of classical PCA [7]. The input space $\mathbb{R}^t$ is mapped into a high dimensional feature space $F$ by a nonlinear mapping function

$$\phi : \mathbb{R}^t \to F, \quad x \mapsto \phi(x) \tag{8}$$

This nonlinear mapping is realized through the kernel trick [7]. The covariance matrix in the feature space $F$ is

$$S_t^{\phi} = \frac{1}{N}\sum_{i=1}^{N} \big(\phi(x_i) - m_0^{\phi}\big)\big(\phi(x_i) - m_0^{\phi}\big)^T \tag{9}$$

and

$$m_0^{\phi} = \frac{1}{N}\sum_{i=1}^{N} \phi(x_i) \tag{10}$$

The corresponding eigenvalue problem becomes [7]:

$$S_t^{\phi} V_i = \lambda_i V_i \tag{11}$$

Applying the kernel trick, equation (11) can be transformed to

$$K V_i = \lambda_i V_i \tag{12}$$

where $K$ is the $N \times N$ centralized kernel matrix defined by

$$K_{ij} = \phi(x_i)^T \phi(x_j) = \big(\phi(x_i) \cdot \phi(x_j)\big) = k(x_i, x_j) \tag{13}$$

where $k(x_i, x_j)$ is a kernel function employed to compute the dot product $(\phi(x_i) \cdot \phi(x_j))$, $i, j = 1, \dots, N$. Thus, after calculating the orthonormal eigenvectors $V_1, V_2, \dots, V_n$ corresponding to the $n$ largest positive eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, the new face space is calculated using [7]:

$$y_k = \sum_{i=1}^{N} \frac{V_{k,i}}{\sqrt{\lambda_k}}\, k(x_i, x), \quad k = 1, \dots, n \tag{14}$$

where $V_{k,i}$ denotes the $i$th component of $V_k$.

D. Kernel Discriminant Analysis (KDA)
In KDA [8], the traditional LDA is carried out in the kernel feature space $F$ instead of the input space $\mathbb{R}^t$. The between-class and within-class scatter matrices in the feature space $F$ are

$$S_b^{\phi} = \sum_{i=1}^{C} n_i (m_i^{\phi} - m^{\phi})(m_i^{\phi} - m^{\phi})^T \tag{15}$$

$$S_w^{\phi} = \sum_{i=1}^{C}\sum_{j=1}^{n_i} \big(\phi(x_j) - m_i^{\phi}\big)\big(\phi(x_j) - m_i^{\phi}\big)^T \tag{16}$$

where $m_i^{\phi}$ is the mean of the $i$th class mapped samples, $m^{\phi}$ is the global mean of all mapped samples and $n_i$ is the number of samples in the $i$th class. The Fisher discriminant function in the feature space $F$ is given as [8]:

$$J_1(w) = \frac{w^T S_b^{\phi} w}{w^T S_w^{\phi} w} \tag{17}$$
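The Fisher criterion in (6), whose feature-space counterpart is (17), leads to the eigenvalue problem (7). A minimal NumPy sketch of eqs. (4)-(7) in the input space follows. It assumes $S_w$ is already non-singular (for example, after the PCA step described above), and the function name `lda_fit` is hypothetical.

```python
import numpy as np


def lda_fit(X, y):
    """Hypothetical Fisher LDA following eqs. (4)-(7).

    X: (N, d) samples, e.g. PCA-reduced faces; y: (N,) class labels.
    Returns W whose columns are the C-1 leading eigenvectors of Sw^{-1} Sb.
    Assumes Sw is non-singular.
    """
    classes = np.unique(y)
    m = X.mean(axis=0)                      # global mean m
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xi = X[y == c]
        mi = Xi.mean(axis=0)                # class mean m_i
        diff = (mi - m)[:, None]
        Sb += len(Xi) * (diff @ diff.T)     # eq. (4)
        Sw += (Xi - mi).T @ (Xi - mi)       # eq. (5)
    # Eq. (7): Sw^{-1} Sb w = lambda w, solved without forming the inverse.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    return eigvecs[:, order].real


# Toy usage: two classes separated along the first axis.
X = np.array([[0, 0], [1, 0], [0, 4], [1, 4],
              [5, 0], [6, 0], [5, 4], [6, 4]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
W = lda_fit(X, y)
print(W.ravel())  # close to +/-[1, 0]
```

Because $S_w^{-1} S_b$ is generally not symmetric, the general `np.linalg.eig` is used and only real parts are kept.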
I.J. Image, Graphics and Signal Processing, 2016, 12, 62-70
Enhancing Face Recognition Performance using Triplet Half Band Wavelet Filter Bank
Since the eigenvectors $w$ lie in the span of all mapped samples in $F$, each eigenvector can be represented as a linear combination of the $\phi(x_i)$, i.e.

$$w = \sum_{i=1}^{N} a_i\,\phi(x_i) = Qa \qquad (18)$$

where $Q = [\phi(x_1), \dots, \phi(x_N)]$ and $a = [a_1, \dots, a_N]^T$. Substituting equation (18) into equation (17), we obtain the following function to be maximized for KDA [8]:

$$J_2(w) = \frac{a^T K_b\, a}{a^T K_w\, a} \qquad (19)$$

Thus, from the solution of equation (19), we obtain the eigenvectors corresponding to the largest eigenvalues of the following eigenvalue problem:

$$K_w^{-1} K_b\, \alpha_i = \lambda_i\, \alpha_i \qquad (20)$$

For the $n$ largest eigenvalues, the projection of a new sample $x$ onto $w$ in the feature space $F$ is calculated as

$$y_i = w_i \cdot \phi(x) = \alpha_i^T K_x = \sum_{j=1}^{N} \alpha_{ij}\, k(x_j, x), \qquad i = 1, \dots, n \qquad (21)$$

where $K_x = [k(x_1, x), \dots, k(x_N, x)]^T$.

IV. TRIPLET HALF BAND WAVELET FILTER BANK (TWFB) AND FEATURE EXTRACTION

A. Triplet half band wavelet filter bank (TWFB): Design Review

As per the design methodology adopted in [19], [22], [23], only FIR filters are used to implement the two-channel biorthogonal filter bank shown in Fig.1, where $H_0(z)$ and $H_1(z)$ are the low pass and high pass analysis filters, respectively, and $G_0(z)$ and $G_1(z)$ are the corresponding low pass and high pass synthesis filters. The perfect reconstruction conditions [20] are given as

$$G_0(z)H_0(z) + G_1(z)H_1(z) = 2z^{-D} \qquad (22)$$

and

$$G_0(z)H_0(-z) + G_1(z)H_1(-z) = 0 \qquad (23)$$

where $D$ is the amount of delay. If $H_1(z) = G_0(-z)$ and $G_1(z) = -H_0(-z)$, equation (23) is automatically satisfied, and equation (22) reduces to

$$P(z) - P(-z) = 2z^{-D}, \quad D \text{ odd} \qquad (24)$$

by defining the product filter $P(z) = H_0(z)G_0(z)$. Thus, the design of the filter bank reduces to the design of the half band filter $P(z)$, whose factorization gives $H_0(z)$ and $G_0(z)$ [20].

Fig.1. Two-channel Biorthogonal Filter Bank

As per the work described in [19], [22], [23], a 6th-order generalized half band polynomial is used to design (13/19) wavelet filters. The design procedure for the triplet half band wavelet filter bank (TWFB) is explained below.

1) Consider a generalized half band polynomial (GHBP) $P(z)$ of 6th order, i.e., $K = 6$:

$$P(z) = \alpha_0 + \alpha_2 z^{-2} + z^{-3} + \alpha_2 z^{-4} + \alpha_0 z^{-6} \qquad (25)$$

2) This GHBP is used to construct three half band polynomials (HBPs) by extracting three zeros at $z = -1$ using the method of synthetic division [19], [22]. Extracting zeros at $z = -1$ from the GHBP $P(z)$ imposes regularity and should satisfy the constraint $X \le (K/2) + 1$, where $X$ is the number of zeros at $z = -1$ and $K$ is the order of the GHBP. Regularity is thus imposed in the design of $P(z)$ by extracting three zeros at $z = -1$ to construct three half band polynomials $P_1(z)$, $P_2(z)$ and $P_3(z)$:

$$P_1(z) = (1+z^{-1})^{X_1} R_1(z), \qquad P_2(z) = (1+z^{-1})^{X_2} R_2(z), \qquad P_3(z) = (1+z^{-1})^{X_3} R_3(z) \qquad (26)$$

where $X_i$ is the number of zeros at $z = -1$ and $R_1(z)$, $R_2(z)$, $R_3(z)$ are the remainder polynomials. Consider $X_1 = 0$, $X_2 = 1$ and $X_3 = 2$, such that $X = X_1 + X_2 + X_3 = 3$, which satisfies the condition $X \le (K/2) + 1$ for the GHBP order $K = 6$.

3) These three half band polynomials are expressed in terms of $\alpha_0$ only, as given in equation (27) [19], [22], [23].
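Steps 1) and 2) can be sketched numerically. The snippet below builds the GHBP of equation (25), checks the half band condition (24) with $D = 3$, and extracts zeros at $z = -1$ by synthetic division using NumPy's `polydiv`. The choice $\alpha_2 = 0.5 - \alpha_0$ (which makes $P(-1) = 0$) and the value $\alpha_0 = 0.05$ are illustrative assumptions, and the helper names are hypothetical. Because (25) is palindromic, a zero at $z = -1$ always occurs with even multiplicity, so the same polynomial admits both $X = 1$ and $X = 2$.

```python
import numpy as np


def ghbp(alpha0, alpha2):
    """Taps of P(z) = a0 + a2 z^-2 + z^-3 + a2 z^-4 + a0 z^-6 (eq. 25), in increasing powers of z^-1."""
    return np.array([alpha0, 0.0, alpha2, 1.0, alpha2, 0.0, alpha0])


def is_half_band(p, D=3, tol=1e-12):
    """True if P(z) - P(-z) = 2 z^-D (eq. 24): every odd tap except the one at D must vanish."""
    p = np.asarray(p, dtype=float)
    p_neg = p * (-1.0) ** np.arange(len(p))     # taps of P(-z)
    target = np.zeros(len(p))
    target[D] = 2.0
    return np.allclose(p - p_neg, target, atol=tol)


def extract_zeros(p, X, tol=1e-9):
    """Remove X factors (1 + z^-1) by synthetic division, returning R(z) of eq. (26).

    Raises ValueError if P(z) has fewer than X zeros at z = -1.
    """
    r = np.asarray(p, dtype=float)
    for _ in range(X):
        # np.polydiv expects the highest power first, so reverse before and after dividing.
        q, rem = np.polydiv(r[::-1], [1.0, 1.0])
        if not np.allclose(rem, 0.0, atol=tol):
            raise ValueError("P(z) has fewer than %d zeros at z = -1" % X)
        r = q[::-1]
    return r


alpha0 = 0.05                        # illustrative value only
P = ghbp(alpha0, 0.5 - alpha0)       # alpha2 = 0.5 - alpha0 makes P(-1) = 0
print(is_half_band(P))               # True
print(extract_zeros(P, 1))           # R2(z): [a0, -a0, 0.5, 0.5, -a0, a0]
print(extract_zeros(P, 2))           # R3(z): [a0, -2a0, 2a0 + 0.5, -2a0, a0]
```

The remainders match the bracketed factors of $P_2(z)$ and $P_3(z)$ in equation (27), and asking for a third zero raises an error unless $\alpha_0$ is tuned to place it there.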
Copyright © 2016 MECS
$$\begin{aligned}
P_1(z) &= (1+z^{-1})^{0}\left(\alpha_0 + (\alpha_0 + 0.5)z^{-2} + z^{-3} + (\alpha_0 + 0.5)z^{-4} + \alpha_0 z^{-6}\right)\\
P_2(z) &= (1+z^{-1})^{1}\left(\alpha_0 - \alpha_0 z^{-1} + 0.5z^{-2} + 0.5z^{-3} - \alpha_0 z^{-4} + \alpha_0 z^{-5}\right)\\
P_3(z) &= (1+z^{-1})^{2}\left(\alpha_0 - 2\alpha_0 z^{-1} + (2\alpha_0 + 0.5)z^{-2} - 2\alpha_0 z^{-3} + \alpha_0 z^{-4}\right)
\end{aligned} \qquad (27)$$

As these HBPs are expressed in terms of $\alpha_0$ only, a flexible frequency response is obtained with a single degree of freedom [19], [22]. The value $\alpha_0 = 0.062499$ is obtained using the MATLAB unconstrained optimization function fminunc, with an objective function designed to minimize the energy in the ripples of these three HBPs [19], [22].

4) The analysis low pass filter $H_0(z)$ and the synthesis low pass filter $G_0(z)$ are given as follows, from [19] and [21]:

$$H_0(z) = \frac{1+p}{2} + \frac{1}{2}\,T_1(z)\left[1 - p\,T_0(z)\right], \qquad G_0(z) = \frac{1 + p\,T_0(z)}{1+p} + \frac{1-p}{1+p}\,T_2(z)\,H_0(-z) \qquad (28)$$

where $T_0(z)$, $T_1(z)$ and $T_2(z)$ are half band kernels whose ideal pass band and stop band responses are 1 and $-1$, respectively. In order to achieve perfect reconstruction, these kernels are formulated as

$$T_0(z) = P_1(z) - 1, \qquad T_1(z) = P_2(z) - 1, \qquad T_2(z) = P_3(z) - 1 \qquad (29)$$

5) Substituting equation (29) into equation (28), we obtain the low pass analysis filter $H_0(z)$ and the low pass synthesis filter $G_0(z)$ of the TWFB. To obtain the same magnitude response from $H_0(z)$ and $G_0(z)$ at $\omega = \pi/2$, the shaping parameter is set to $p = 0.41421$ [21], [22]. The lengths of $H_0(z)$ and $G_0(z)$ are 13 and 19, respectively.

6) The high pass analysis and synthesis filters are obtained using the following relation:

$$H_1(z) = G_0(-z), \qquad G_1(z) = -H_0(-z) \qquad (30)$$

Thus, the use of a GHBP to design the TWFB improves the frequency response of the $H_0(z)$ and $G_0(z)$ filters. These filters not only satisfy regularity but also achieve linear phase and perfect reconstruction [19], [22]. These properties are vital for achieving higher facial discrimination capability with linear and nonlinear subspace methods.

B. Facial Feature Extraction Using TWFB

A 2-D separable filter bank is needed for face feature extraction, and only the analysis (decomposition) part of the filter bank is required. The implementation is carried out by applying the one-dimensional wavelet filters to the rows of the face image and then to the columns of the row-transformed data. A one-level decomposition results in one approximation subband LL, which gives a coarser approximation of the original image; LH and HL subbands, which record the changes of the image along the horizontal and vertical directions; and an HH subband, which contains the high-frequency component of the image. In our approach, the TWFB is applied to face images to extract multiresolution facial features. First, we perform a two-level wavelet decomposition using the TWFB wavelet pairs and take the two-level low-frequency subband LL as the feature vector. The low-frequency subband LL is an approximation of the original image that retains most of the important features of the face and is sufficient for face recognition, so the remaining high-frequency subbands can be neglected. These wavelet pairs provide an effective discriminatory representation of face images. Thus, after obtaining the multiresolution facial features with the TWFB, linear and nonlinear subspace methods are applied to obtain enhanced face recognition performance. To measure the difference between two facial feature vectors for classification, we use the Euclidean distance for the LDA- and KDA-based methods and the cosine distance for the PCA- and KPCA-based methods.

V. EXPERIMENTATION AND RESULTS

The experiments were carried out in MATLAB 2014a on a 64-bit Core i3 2.13 GHz processor with 2 GB RAM. The performance evaluation is carried out on two standard face databases, ORL [27] and Yale [28]. For the face verification experiments, we consider all the face images and apply the approach described in the previous section. To test the robustness of the proposed approach, no preprocessing such as face normalization is applied to the face images. To enhance the performance of PCA, LDA, KPCA and KDA, a 2-D implementation of the TWFB is applied to the face images. Compared with linear and nonlinear subspace projection methods that operate on raw pixel values, the TWFB facial features contain more discriminating information and thus show robustness against variations in illumination, expression and pose.

A. Performance Measures
The verification performance of our face recognition methods is estimated using the false acceptance rate (FAR), the false rejection rate (FRR) and the genuine acceptance rate (GAR). FAR is defined as the ratio of the number of accepted impostor claims to the total number of impostor accesses, while FRR is defined as the ratio of the number of rejected genuine claims to the total number of genuine accesses; GAR equals 100% minus FRR. For a given individual, impostor images are all face images other than that individual's own images. Low values of both FAR and FRR are desirable, but FRR decreases as FAR increases because of the overlap between the genuine and impostor score distributions. Thus, to determine the optimum balance between FAR and FRR, the Receiver Operating Characteristic (ROC) curve, which plots GAR against FAR, is adopted [24]. A higher GAR is desirable for better verification performance.

B. Results on ORL face database

The ORL database [27] consists of 10 different images of each of 40 different persons. The images vary in capture time, lighting, head position, facial expression (eyes open or closed, smiling or not smiling) and facial details (glasses or no glasses). All images are 8-bit grayscale images of 112x92 pixels, taken against a dark homogeneous background. The face images are resized to 128x128 pixels. Some sample face images from the ORL face database are shown in Fig.2. The first series of experiments is conducted using the extracted TWFB features and four subspace methods for face verification. For this purpose, all parameters, such as the feature dimension and the distance measure, are optimized to achieve the best performance.
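The FAR, FRR and GAR definitions above can be made concrete with a short sketch. It assumes each comparison produces a distance (Euclidean or cosine, as in Section IV-B) and that a claim is accepted when the distance does not exceed a threshold; the function names are hypothetical, and sweeping the threshold yields the (FAR, GAR) pairs of an ROC curve.

```python
import numpy as np


def far_frr_gar(genuine, impostor, threshold):
    """FAR, FRR and GAR (in %) at a distance threshold.

    genuine, impostor: distances of genuine and impostor comparisons
    (smaller distance = more similar). A claim is accepted when distance <= threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = 100.0 * np.mean(impostor <= threshold)   # accepted impostors / impostor accesses
    frr = 100.0 * np.mean(genuine > threshold)     # rejected genuine claims / genuine accesses
    return far, frr, 100.0 - frr                   # GAR = 100% - FRR


def roc_points(genuine, impostor):
    """(FAR, GAR) pairs for every candidate threshold, suitable for an ROC plot."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    return [far_frr_gar(genuine, impostor, t)[0::2] for t in thresholds]


genuine_d = [0.1, 0.2, 0.3, 0.9]
impostor_d = [0.25, 0.8, 0.95, 1.0]
print(far_frr_gar(genuine_d, impostor_d, 0.3))   # FAR 25%, FRR 25%, GAR 75%
```

Raising the threshold accepts more impostors and fewer genuine claims are rejected, which is exactly the FAR/FRR trade-off the ROC curve summarizes.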
Fig.3. ROC curves of the subspace methods for ORL database
To verify the discriminating power of the extracted TWFB features over the raw pixels used by the subspace methods, a comparison of TWFB+PCA, TWFB+LDA, TWFB+KPCA and TWFB+KDA is carried out. It can be observed from Table 2 and Fig.4 that applying the TWFB before the PCA, LDA, KPCA and KDA methods enhances their performance compared with the original methods. The dimension of the feature vector is 399 for the PCA- and KPCA-based methods and 39 for the LDA- and KDA-based methods. The main advantage of applying the TWFB is that the 32x32 low-frequency approximation subband suffices to capture the essential discriminating characteristics of a face image, whereas the PCA, LDA, KPCA and KDA methods operate on the original 128x128 image. In terms of GAR, the LDA-based methods outperform the PCA-based methods, and the KDA-based methods outperform the LDA-based methods. Among all these methods, TWFB+KDA gives the best performance, with the highest GAR when the feature dimension is set to 39.
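The two-level LL extraction that reduces a 128x128 face to a 32x32 feature map can be sketched as below. This is a minimal sketch, not the authors' implementation: symmetric boundary extension and the decimation phase are assumptions, and since numeric TWFB taps are not given in the text, the LeGall 5/3 analysis low pass stands in for $H_0(z)$. Any low pass tap vector, such as the 13-tap TWFB $H_0(z)$, can be passed as `h`; the function names are hypothetical.

```python
import numpy as np


def lowpass_downsample(x, h, axis):
    """Filter along `axis` with low pass h (symmetric extension), then keep every 2nd sample."""
    h = np.asarray(h, dtype=float)
    left = (len(h) - 1) // 2
    right = len(h) - 1 - left
    widths = [(0, 0)] * x.ndim
    widths[axis] = (left, right)
    xp = np.pad(x, widths, mode="symmetric")
    # 'valid' convolution of each padded 1-D slice returns the original length.
    y = np.apply_along_axis(lambda v: np.convolve(v, h, mode="valid"), axis, xp)
    keep = np.arange(0, y.shape[axis], 2)
    return np.take(y, keep, axis=axis)


def ll_subband(image, h, levels=2):
    """LL approximation after `levels` separable analysis steps (rows first, then columns)."""
    ll = np.asarray(image, dtype=float)
    for _ in range(levels):
        ll = lowpass_downsample(ll, h, axis=1)   # filter and decimate the rows
        ll = lowpass_downsample(ll, h, axis=0)   # then the columns of the row-transformed data
    return ll


# Stand-in low pass (LeGall 5/3 analysis filter), NOT the 13-tap TWFB H0.
h = np.array([-1.0, 2.0, 6.0, 2.0, -1.0]) / 8.0
face = np.random.default_rng(0).random((128, 128))   # placeholder for a resized face image
feature = ll_subband(face, h).ravel()                # 32 x 32 LL subband -> 1024-D vector
print(feature.shape)                                 # (1024,)
```

Only the LL branch is computed, mirroring the text's choice to discard the LH, HL and HH subbands before the subspace projection.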
Fig.2. Sample face images from ORL face database
Comparative results of PCA, LDA, KPCA and KDA are tabulated in Table 1, and the corresponding ROC curves are shown in Fig.3. For these methods, the pixel values of the face images are simply concatenated to form a feature vector. For the KPCA- and KDA-based methods, we adopted a normalized polynomial kernel with degree d = 2.
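The normalized polynomial kernel and the dual-form KDA projection of equations (18)-(21) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the kernel offset c = 1 is assumed (the text only fixes d = 2), $K_b$ and $K_w$ use the standard kernel Fisher discriminant forms, which may differ from the KDDA variant of [8], and a small ridge term `reg` is added to $K_w$ because it is usually singular. Function names are hypothetical.

```python
import numpy as np


def normalized_poly_kernel(X, Y, d=2, c=1.0):
    """k(x, y) = (x.y + c)^d / sqrt((x.x + c)^d (y.y + c)^d); the offset c is an assumption."""
    G = (X @ Y.T + c) ** d
    nx = (np.sum(X * X, axis=1) + c) ** d
    ny = (np.sum(Y * Y, axis=1) + c) ** d
    return G / np.sqrt(np.outer(nx, ny))


def kda_fit(X, labels, n_components, d=2, c=1.0, reg=1e-3):
    """Kernel Fisher discriminant in dual form: K_b a = lambda (K_w + reg I) a."""
    K = normalized_poly_kernel(X, X, d, c)          # N x N Gram matrix
    N = K.shape[0]
    M_all = K.mean(axis=1)                          # kernel image of the global mean
    K_b = np.zeros((N, N))
    K_w = np.zeros((N, N))
    for cls in np.unique(labels):
        K_i = K[:, labels == cls]                   # N x n_i columns of class i
        n_i = K_i.shape[1]
        diff = (K_i.mean(axis=1) - M_all)[:, None]  # kernel image of (m_i - m)
        K_b += n_i * (diff @ diff.T)
        centering = np.eye(n_i) - np.full((n_i, n_i), 1.0 / n_i)
        K_w += K_i @ centering @ K_i.T
    K_w += reg * np.eye(N)                          # ridge term, since K_w is usually singular
    # Turn the generalized symmetric problem into a standard one with a Cholesky factor.
    L_inv = np.linalg.inv(np.linalg.cholesky(K_w))
    evals, V = np.linalg.eigh(L_inv @ K_b @ L_inv.T)
    top = np.argsort(evals)[::-1][:n_components]    # largest eigenvalues first
    return L_inv.T @ V[:, top]                      # N x n_components dual coefficients


def kda_project(X_train, A, X_new, d=2, c=1.0):
    """Projections y = A^T K_x with K_x = [k(x_1, x), ..., k(x_N, x)]^T, cf. eq. (21)."""
    K_x = normalized_poly_kernel(X_train, np.atleast_2d(X_new), d, c)   # N x m
    return (A.T @ K_x).T                                               # m x n_components


rng = np.random.default_rng(0)
X = np.vstack([np.array([3.0, 0.0]) + 0.1 * rng.standard_normal((5, 2)),
               np.array([0.0, 3.0]) + 0.1 * rng.standard_normal((5, 2))])
labels = np.array([0] * 5 + [1] * 5)
A = kda_fit(X, labels, n_components=1)
print(kda_project(X, A, X)[:, 0])   # the two classes fall on opposite sides
```

The Cholesky reduction keeps the eigenproblem symmetric, so `eigh` returns real eigenvectors; with C classes at most C - 1 directions carry discriminant information, consistent with the 39 dimensions used for 40 ORL subjects.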
Table 2. TWFB based subspace results in terms of FAR, FRR, and GAR on ORL database

| Method | FAR (%) | FRR (%) | GAR (%) |
|---|---|---|---|
| TWFB+PCA | 0.3031 | 1.3182 | 98.6818 |
| TWFB+LDA | 0.7161 | 1.2273 | 98.7727 |
| TWFB+KPCA | 0.3510 | 1.9545 | 98.0455 |
| TWFB+KDA | 0.2558 | 1.1364 | 98.8636 |
Table 1. Subspace results in terms of FAR, FRR, and GAR on ORL database

| Method | FAR (%) | FRR (%) | GAR (%) |
|---|---|---|---|
| PCA | 0.9744 | 3.2727 | 96.7273 |
| LDA | 0.6957 | 1.8636 | 98.1364 |
| KPCA | 0.4169 | 3.4545 | 96.5455 |
| KDA | 0.2558 | 1.8182 | 98.1818 |
Fig.4. ROC curves of the TWFB based subspace methods on ORL database
It is apparent from Table 2 and Fig.4 that TWFB-based feature vectors enhance the GAR of these subspace methods. TWFB+PCA achieves a 2.02% higher GAR than PCA, while a 0.6442% higher GAR is observed for TWFB+LDA compared with LDA. For TWFB+KPCA and TWFB+KDA, 1.53% and 0.6944% higher GAR are achieved, respectively.

C. Results on Yale face database

The Yale face database [28] consists of face images of 15 individuals, with 11 images per individual. The images contain variations in illumination and facial expression. The illumination variations are due to center-light, left-light and right-light conditions, and the facial expressions are normal, happy, sad, sleepy, surprised and wink. We used this database to evaluate the performance of our method under changes in illumination and facial expression. The face images are resized to 128x128 pixels. Sample face images of one person are shown in Fig.5.
The results of the TWFB-based subspace methods are tabulated in Table 4 and shown in Fig.7. It is evident that, on the Yale database, KPCA performs better than PCA and KDA performs better than the LDA-based method. For the Yale database, TWFB+PCA achieves a 0.28% higher GAR than PCA, while a 0.42% higher GAR is observed for TWFB+LDA compared with LDA. For TWFB+KPCA and TWFB+KDA, 1.39% and 0.6154% higher GAR are achieved, respectively.

Table 4. TWFB based subspace results in terms of FAR, FRR, and GAR on Yale database

| Method | FAR (%) | FRR (%) | GAR (%) |
|---|---|---|---|
| TWFB+PCA | 4.2972 | 28.1818 | 71.8182 |
| TWFB+LDA | 1.2551 | 2.9293 | 97.0707 |
| TWFB+KPCA | 10.5298 | 26.1616 | 73.8384 |
| TWFB+KDA | 0.6452 | 0.9091 | 99.0909 |
Fig.5. Sample face from Yale face database with different illumination variation
Similar FAR and FRR tests are performed on the Yale database, which gives 990 genuine attempts and 25575 impostor attempts. The results obtained by applying PCA, LDA, KPCA and KDA directly to raw pixel values are tabulated in Table 3, and the corresponding ROC curves are shown in Fig.6.

Table 3. Subspace results in terms of FAR, FRR, and GAR on Yale face database

| Method | FAR (%) | FRR (%) | GAR (%) |
|---|---|---|---|
| PCA | 6.9169 | 27.1717 | 71.6162 |
| LDA | 0.6452 | 3.3333 | 96.6667 |
| KPCA | 6.6276 | 27.1717 | 72.8283 |
| KDA | 0.6452 | 1.5152 | 98.4848 |
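The GAR improvements quoted for both databases appear to be relative to the baseline GAR rather than absolute differences; the short check below recomputes them from Tables 1 to 4. For ORL, the PCA (2.02%) and KDA (0.6944%) figures and all four Yale figures are reproduced this way, while the ORL LDA and KPCA values come out at about 0.65% and 1.55% rather than the quoted 0.6442% and 1.53%. The dictionary layout and function name are illustrative.

```python
# GAR values (%) copied from Tables 1-4: (baseline, TWFB + baseline).
gar = {
    "ORL": {"PCA": (96.7273, 98.6818), "LDA": (98.1364, 98.7727),
            "KPCA": (96.5455, 98.0455), "KDA": (98.1818, 98.8636)},
    "Yale": {"PCA": (71.6162, 71.8182), "LDA": (96.6667, 97.0707),
             "KPCA": (72.8283, 73.8384), "KDA": (98.4848, 99.0909)},
}


def relative_gain(base, twfb):
    """Improvement of TWFB+X over X, expressed as a percentage of the baseline GAR."""
    return 100.0 * (twfb - base) / base


for db, rows in gar.items():
    for method, (base, twfb) in rows.items():
        print(f"{db:4s} {method:4s} abs {twfb - base:+.4f}  rel {relative_gain(base, twfb):+.4f}%")
```

Reporting both columns makes the distinction explicit: the absolute GAR gain of TWFB+PCA on ORL, for example, is about 1.95 percentage points, which corresponds to the quoted 2.02% relative gain.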
Fig.7. ROC curves of the TWFB based subspace methods on Yale database
D. Comparative results

We also compared the effectiveness of TWFB facial features with CDF-9/7 wavelet filters, as these filters are commonly used for wavelet-based facial feature extraction. The comparative results are given in Table 5.

Table 5. Comparative results

| Method | ORL FAR (%) | ORL FRR (%) | ORL GAR (%) | Yale FAR (%) | Yale FRR (%) | Yale GAR (%) |
|---|---|---|---|---|---|---|
| CDF-9/7+LDA | 0.514 | 1.318 | 98.682 | 1.255 | 2.929 | 97.071 |
| TWFB+LDA | 0.716 | 1.227 | 98.773 | 1.857 | 2.323 | 97.677 |
| CDF-9/7+KDA | 0.302 | 1.500 | 98.500 | 0.645 | 1.111 | 98.889 |
| TWFB+KDA | 0.255 | 1.136 | 98.864 | 0.645 | 0.909 | 99.091 |
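To reproduce the TWFB low pass pair compared above, the structure of equations (27) to (30) can be assembled numerically. This sketch rests on assumptions: $P_1(z)$ is read as $\alpha_0 + (\alpha_0 + 0.5)z^{-2} + z^{-3} + (\alpha_0 + 0.5)z^{-4} + \alpha_0 z^{-6}$, the kernels of (29) are taken with the $z^{-3}$ delay removed, and (28) is read with $H_0(-z)$ in the $G_0$ term. Under this reading the filters have the stated lengths 13 and 19, satisfy perfect reconstruction, and with p = 0.41421 (about $\sqrt{2} - 1$) give equal magnitudes at $\omega = \pi/2$. These three checks hold for any polynomials of the form (25), so they do not depend on the exact value or sign of $\alpha_0$. Function names are hypothetical.

```python
import numpy as np


def kernel_from_hbp(P):
    """Kernel T(z) = P(z) - 1 of eq. (29), kept causal as z^-3 T(z): subtract 1 from the centre tap."""
    T = np.array(P, dtype=float)
    T[3] -= 1.0
    return T


def alternate(c):
    """Taps of C(-z) for a causal C(z) given in increasing powers of z^-1."""
    return c * (-1.0) ** np.arange(len(c))


def twfb_lowpass(P1, P2, P3, p):
    """Causal taps of z^-6 H0(z) and z^-9 G0(z) following the structure of eq. (28)."""
    T0, T1, T2 = (kernel_from_hbp(P) for P in (P1, P2, P3))
    delta3 = np.zeros(7)
    delta3[3] = 1.0                                            # z^-3
    # z^-6 H0 = (1+p)/2 z^-6 + 1/2 (z^-3 T1)(z^-3 - p z^-3 T0)
    h0 = 0.5 * np.convolve(T1, delta3 - p * T0)
    h0[6] += 0.5 * (1.0 + p)
    # z^-9 G0 = z^-6 (z^-3 + p z^-3 T0)/(1+p) + (1-p)/(1+p) (z^-3 T2)(z^-6 H0(-z))
    g0 = (1.0 - p) / (1.0 + p) * np.convolve(T2, alternate(h0))
    g0[6:13] += (delta3 + p * T0) / (1.0 + p)
    return h0, g0


def pr_residual(h0, g0):
    """Largest deviation of the odd taps of H0 G0 from 0.5 z^-15, i.e. from
    H0(z)G0(z) + H0(-z)G0(-z) = 1 once the overall delay is removed."""
    q = np.convolve(h0, g0)
    target = np.zeros_like(q)
    target[15] = 0.5
    odd = np.arange(1, len(q), 2)
    return float(np.max(np.abs(q[odd] - target[odd])))


a0, p = 0.062499, 0.41421                      # values quoted in steps 3) and 5)
P1 = [a0, 0.0, a0 + 0.5, 1.0, a0 + 0.5, 0.0, a0]
P2 = np.convolve([1.0, 1.0], [a0, -a0, 0.5, 0.5, -a0, a0])
P3 = np.convolve([1.0, 2.0, 1.0], [a0, -2 * a0, 2 * a0 + 0.5, -2 * a0, a0])
h0, g0 = twfb_lowpass(P1, P2, P3, p)
print(len(h0), len(g0), pr_residual(h0, g0))   # 13 19 and a residual near 0
```

The high pass pair then follows from (30) by sign-alternating these taps, so only the two low pass filters need to be designed explicitly.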
VI. CONCLUSION
Fig.6. ROC curves of the subspace methods on Yale database
In this paper, we have proposed a face recognition method that enhances the performance of existing subspace methods by applying the TWFB. The design of the TWFB for facial feature extraction has also been discussed and experimentally tested for face verification. The performance evaluation is carried out on two standard face databases. The proposed TWFB+KDA method achieves GAR values of 98.86% and 99.09% on the ORL and Yale face databases, respectively, which are higher than those of the PCA-, LDA- and KPCA-based methods.

REFERENCES

[1] W. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips, "Face Recognition: A Literature Survey," Technical Report CAR-TR-948, Univ. of Maryland, CfAR, 2000.
[2] S. Z. Li and A. K. Jain, Eds., Handbook of Face Recognition. Springer-Verlag, Mar. 2005.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[4] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[5] M. Bartlett, J. Movellan, and T. Sejnowski, "Face recognition by independent component analysis," IEEE Trans. Neural Netw., vol. 13, pp. 1450–1464, 2002.
[6] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, Jan. 2004.
[7] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Trans. Neural Netw., vol. 12, pp. 181–201, Mar. 2001.
[8] J. W. Lu, K. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117–126, Jan. 2003.
[9] Y. Utsumi, Y. Iwai, and M. Yachida, "Performance evaluation of face recognition in the wavelet domain," in Proc. Int. Conf. Intelligent Robots and Systems, 2006, pp. 3344–3351.
[10] J. T. Chien and C. C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 12, pp. 1644–1649, Dec. 2002.
[11] C. J. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, Apr. 2002.
[12] V. Štruc and N. Pavešić, "The complete Gabor-Fisher classifier for robust face recognition," EURASIP J. Advances in Signal Processing, vol. 2010, 26 pages, 2010, doi:10.1155/2010/847680.
[13] L. L. Shen and L. Bai, "A review on Gabor wavelets for face recognition," Pattern Anal. Appl., vol. 9, pp. 273–292, 2006.
[14] G. C. Feng, P. C. Yuen, and D. Q. Dai, "Human face recognition using PCA on wavelet subband," J. Electronic Imaging, vol. 9, no. 2, pp. 226–233, 2000.
[15] M. Meade, S. C. Sivakumar, and W. J. Phillips, "Comparative performance of principal component analysis, Gabor wavelets and discrete wavelet transforms for face recognition," Canadian J. Electrical and Computer Engineering, vol. 30, no. 2, pp. 93–102, Spring 2005, doi:10.1109/CJECE.2005.1541731.
[16] D. V. Jadhav and R. S. Holambe, "Feature extraction using Radon and wavelet transforms with application to face recognition," Neurocomputing, vol. 72, pp. 1951–1959, 2008.
[17] A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets," Commun. Pure Appl. Math., vol. 45, no. 5, pp. 485–560, 1992.
[18] B. D. Patil, P. G. Patwardhan, and V. M. Gadre, "On the design of FIR wavelet filter banks using factorization of a halfband polynomial," IEEE Signal Process. Lett., vol. 15, pp. 485–488, 2008.
[19] A. D. Rahulkar and R. S. Holambe, "Half-iris feature extraction and recognition using a new class of biorthogonal triplet half-band filter bank and flexible k-out-of-n: A postclassifier," IEEE Trans. Inf. Forensics Security, vol. 7, no. 1, pp. 230–240, Feb. 2012.
[20] G. Strang and T. Nguyen, Wavelets and Filter Banks. Cambridge, MA: Wellesley-Cambridge, 1996.
[21] R. Ansari, C. W. Kim, and M. Dedovic, "Structure and design of two channel filter banks derived from a triplet of halfband filters," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 12, pp. 1487–1496, Dec. 1999.
[22] A. D. Rahulkar, B. D. Patil, and R. S. Holambe, "A new approach to the design of biorthogonal triplet half-band filter banks using generalized half-band polynomials," Signal, Image Video Process., 2012, pp. 1–7, http://dx.doi.org/10.1007/s11760-012-0378-1.
[23] S. S. Barpanda, B. Majhi, and P. K. Sa, "Region based feature extraction from non-cooperative iris images using triplet half-band filter bank," Optics and Laser Technology, vol. 72, pp. 6–14, 2015, http://dx.doi.org/10.1016/j.optlastec.2015.03.003.
[24] K. Jonsson, J. Kittler, Y. P. Li, and J. Matas, "Support vector machines for face authentication," Image and Vision Computing, vol. 20, no. 5–6, pp. 369–375, 2002.
[25] A. Javed, "Face recognition based on principal component analysis," I.J. Image, Graphics and Signal Processing (IJIGSP), no. 2, pp. 38–44, 2013, DOI: 10.5815/ijigsp.2013.02.06.
[26] G. S. Murty, J. Sasi Kiran, and V. Vijaya Kumar, "Facial expression recognition based on features derived from the distinct LBP and GLCM," International Journal of Image, Graphics and Signal Processing, vol. 6, no. 2, p. 68, 2014.
[27] [Online]. Available: http://www.uk.research.att.com/pub/data/att_faces.zip
[28] [Online]. Available: http://cvc.yale.edu/projects/yalefaces/yalefaces.html
Authors’ Profiles

Mohd. Abdul Muqeet received the B.E. degree from M.B.E.S College of Engineering, Ambejogai, India in 2001, and the M.Tech. degree from SGGS Institute of Engineering and Technology, Nanded, India in 2007. He is presently a research scholar in Instrumentation Engineering at SGGS Institute of Engineering and Technology, Nanded, India. His research interests include filter banks, biometrics, and applications of the wavelet transform.
Dr. Raghunath S. Holambe received the Ph.D. degree from Indian Institute of Technology, Kharagpur, India, and he is presently a professor in Instrumentation Engineering in SGGS Institute of Engineering and Technology, Nanded, India. The areas of his research interest are digital signal processing, image processing, applications of wavelet transform, biometrics, and real time signal processing using DSP processors.
How to cite this paper: Mohd. Abdul Muqeet, Raghunath S. Holambe, "Enhancing Face Recognition Performance using Triplet Half Band Wavelet Filter Bank", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.12, pp.62-70, 2016. DOI: 10.5815/ijigsp.2016.12.08
Instructions for Authors

Manuscript Submission

We invite original, previously unpublished research papers, review, survey and tutorial papers, application papers, plus case studies, short research notes and letters, on both applied and theoretical aspects. Manuscripts should be written in English. All papers except surveys should ideally not exceed 18,000 words (15 pages) in length. Whenever applicable, submissions must include the following elements: title, authors, affiliations, contacts, abstract, index terms, introduction, main text, conclusions, appendixes, acknowledgement, references, and biographies. Papers should be formatted into A4-size (8.27″×11.69″) pages, with main text of 10-point Times New Roman, in single-spaced two-column format. Figures and tables must be sized as they are to appear in print. Figures should be placed exactly where they are to appear within the text. There is no strict requirement on the format of the manuscripts; however, authors are strongly recommended to follow the format of the final version. Papers should be submitted to the MECS Publisher, Unit B 13/F PRAT COMM’L BLDG, 17-19 PRAT AVENUE, TSIMSHATSUI KLN, Hong Kong (Email: ijigsp@mecs-press.org, Paper Submission System: www.mecs-press.org/ijigsp/submission.html), with a covering email clearly stating the name, address and affiliation of the corresponding author. Paper submissions are accepted only in PDF; other formats are not acceptable. Each paper will be provided with a unique paper ID for further reference. Authors may suggest 2-4 reviewers when submitting their works, by providing us with the reviewers’ title, full name and contact information. The editor will decide whether the recommendations will be used or not.
Conference Version

Submissions previously published in conference proceedings are eligible for consideration provided that the author informs the Editors at the time of submission and that the submission has undergone substantial revision. In the new submission, authors are required to cite the previous publication and very clearly indicate how the new submission offers substantively novel or different contributions beyond those of the previously published work. The appropriate way to indicate that your paper has been revised substantially is for the new paper to have a new title. Authors should supply a copy of the previous version to the Editor, and provide a brief description of the differences between the submitted manuscript and the previous version. If the authors provide a previously published conference submission, Editors will check the submission to determine whether there has been sufficient new material added to warrant publication in the Journal. The MECS Publisher’s guidelines are that the submission should contain a significant amount of new material, that is, material that has not been published elsewhere. New results are not required; however, the submission should contain expansions of key ideas, examples, and so on, of the conference submission. The paper submitted to the journal should differ from the previously published material by at least 50 percent.
Review Process

Submissions are accepted for review only if the same work has been neither submitted to, nor published in, another publication. Concurrent submission to other publications will result in immediate rejection of the submission. All manuscripts will be subject to a well-established, fair, unbiased peer review and refereeing procedure, and are considered on the basis of their significance, novelty and usefulness to the Journal's readership. The reviewing structure will always ensure the anonymity of the referees. The review output will be one of the following decisions: Accept, Accept with minor revision, Accept with major revision, Reject with a possibility of resubmitting, or Reject. The review process may take approximately three months to be completed. Should authors be requested by the editor to revise the text, the revised version should be submitted within three months for a major revision or one month for a minor revision. Authors who need more time are kindly requested to contact the Editor. The Editor reserves the right to reject a paper if it does not meet the aims and scope of the journal, it is not technically sound, it is not revised satisfactorily, or if it is inadequate in presentation.
Revised and Final Version Submission

Revised versions should follow the same formatting requirements as the final version, plus a short summary of the modifications the authors have made and the authors' comments. Authors are requested to follow the MECS Publisher Journal Style for preparing the final camera-ready version. A template in PDF and an MS Word template can be downloaded from the web site. Authors are requested to strictly follow the guidelines specified in the templates. Only PDF format is acceptable. The PDF document should be sent as an open file, i.e. without any data protection. Authors should submit their paper electronically through email to the Journal's submission address. Please always refer to the paper ID in the submissions and any further enquiries. Please do not use the Adobe Acrobat PDFWriter to generate the PDF file. Use the Adobe Acrobat Distiller instead, which is contained in the same package as the Acrobat PDFWriter. Make sure that you have used Type 1 or TrueType fonts (check with the Acrobat Reader or Acrobat Writer by clicking on File>Document Properties>Fonts to see the list of fonts and their type used in the PDF document).
Copyright

Submission of your paper to this journal implies that the paper is not under submission for publication elsewhere. Material which has been previously copyrighted, published, or accepted for publication will not be considered for publication in this journal. Submission of a manuscript is interpreted as a statement of certification that no part of the manuscript is under review by any other formal publication. Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of the MECS Publisher or its editorial staff. The main author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to the submission. More information about permission requests can be found at the web site. Authors are asked to sign a warranty and copyright agreement upon acceptance of their manuscript, before the manuscript can be published. The Copyright Transfer Agreement can be downloaded from the web site.

Publication Charges and Re-print

There are no page charges for publication in this journal. Reprints of the paper can be ordered at a price of 150 USD. Electronic: freely available on www.mecs-press.org. To subscribe, please contact the Journal Subscriptions Department, E-mail: ijigsp@mecs-press.org. More information is available on the web site at http://www.mecs-press.org/ijigsp.