
International Journal of Engineering Research and Development e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.05-13

Feature extraction for content-based mammogram retrieval

Dr. K. Karteeka Pavan¹, Sri M. Brahmaiah², Ms. Sk. Habi Munnissa³
¹,²,³ R.V.R & J.C. College of Engineering, ANU, Chowdavaram, Guntur-19

Abstract:- Extracting image features is one of the ways to classify images. Image texture is used in Content-Based Image Retrieval (CBIR) to represent and index images. Many statistical matrix representations have been proposed to distinguish textures by the statistical distribution of image intensities. This paper studies the performance of various gray level statistical matrices, with thirteen statistical texture features, for the classification of mammograms. The relative performance of the matrices is evaluated in terms of classification accuracy and retrieval time through experiments on MIAS (Mammography Image Analysis Society) images.

Keywords:- Content-based image retrieval, Mammogram, Texture, Gray level statistical matrix.

I. INTRODUCTION

Content-Based Image Retrieval (CBIR) has become increasingly popular across various applications [12]. Medical image diagnosis is one of the primary application domains for content-based access technologies [23]. In the medical field, enormous numbers of digital images such as X-ray, MRI, CT and mammogram are produced every day and used for diagnosis [17]. Finding anatomic structures and other regions of interest is important in the clinical decision-making process; hence, decision support systems in radiology create a need for powerful data retrieval [8]. Breast cancer is one of the leading causes of cancer deaths in women. Mammography is among the most reliable methods for the early detection of breast cancer and is one of the most frequent application areas for content-based search within the radiology department [11, 22]. Texture is one of the visual features used in CBIR to represent an image and to extract similar areas [27]. Mammograms possess discriminative textural information [16]; specific textural patterns can be revealed on mammograms for the calcification, architectural distortion, asymmetry and mass categories [30]. In statistical texture analysis, the texture information in an image is represented by a gray level statistical matrix from which the textural features are estimated [9]. Second order and higher order gray level statistical matrices have been found to be powerful statistical tools for the discrimination of textures [24]. In [29], gray level co-occurrence matrices (GLCMs) of pixel distances one, three and five are generated in order to estimate Haralick's texture features for the retrieval of abnormal mammograms from the MIAS database. Mohamed Eisa et al. [5] investigated the retrieval of mass and calcification mammograms from the MIAS database using texture and moment-based features [1].
In [7], the textural features of a medical database consisting of brain, spine, heart, lung, breast, adiposity, muscle, liver and bone images (11 of each) are extracted from gray level co-occurrence matrices; the descriptor combining gradient, entropy and homogeneity performs better than the remaining features. In [3], Gabor and GLCM based texture features are used in addition to shape features for the classification and retrieval of benign and malignant mammograms in the MIAS database. Sun et al. [26] proposed texture features based on the combination of a distortion constraint and weighted moments for the retrieval of abnormal mammograms from the MIAS database, and their results show better performance than region [14] and Gabor features. In [31], the gray level aura matrix (GLAM) is used to extract texture information for the retrieval of four categories of mammograms from the DDSM database [32]. The objectives of this work are: i) to extract texture features from various types of mammograms; ii) to investigate the effectiveness of the texture features for the retrieval of mammograms; and iii) to compare the retrieval performance of the GLCM, GLAM and GLNM texture feature extraction methods. Section 2 explains the methodology for mammogram retrieval using the proposed gray level statistical matrix. Section 3 presents the experimental results and discussion. Finally, Section 4 gives the conclusion.




II. METHODOLOGY

Content-based mammogram retrieval using the proposed gray level statistical matrix consists of two stages: feature extraction and image retrieval. In the first stage, as a pre-processing step, the regions of interest (ROIs) of the database images are normalized to zero mean and unit variance [4]. From the pre-processed images, gray level statistical matrices are generated using GLCM, GLAM and GLNM in order to estimate the texture features and to form the feature dataset. In the on-line image retrieval stage, the texture features based on the gray level statistical matrix are first estimated from the pre-processed ROI of the given query image. Finally, the performance measures are calculated using SVM classification in order to analyse the effectiveness of the proposed method for mammogram retrieval.

Fig. 2.1 Overview of mammogram retrieval using the proposed approach
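The zero-mean, unit-variance normalization used in the pre-processing step above can be sketched as follows; this is a minimal illustration, and the function name and the flat-region guard are our own additions:

```python
import numpy as np

def normalize_roi(roi):
    """Normalize an ROI to zero mean and unit variance before
    texture feature extraction (pre-processing step)."""
    roi = roi.astype(np.float64)
    std = roi.std()
    if std == 0:                 # guard for flat regions (assumption)
        return roi - roi.mean()
    return (roi - roi.mean()) / std

roi = np.arange(16, dtype=np.uint8).reshape(4, 4)
z = normalize_roi(roi)
print(round(float(z.mean()), 6), round(float(z.std()), 6))
```
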

2.1 GLCM
Texture filter functions provide a statistical view of texture based on the image histogram. These functions can provide useful information about the texture of an image but cannot provide information about shape, i.e., the spatial relationships of pixels in an image. A statistical method that does consider the spatial relationship of pixels is the gray-level co-occurrence matrix (GLCM), also known as the gray-level spatial dependence matrix. MATLAB's Image Processing Toolbox provides functions to create a GLCM and derive statistical measurements from it.

2.1.1 Creating a Gray-Level Co-Occurrence Matrix
To create a GLCM, use the graycomatrix function. It creates a gray-level co-occurrence matrix by calculating how often a pixel with intensity (gray-level) value i occurs in a specific spatial relationship to a pixel with value j. By default, the spatial relationship is defined as the pixel of interest and the pixel to its immediate right (horizontally adjacent), but other spatial relationships between the two pixels can be specified. Each element (i, j) in the resulting GLCM is simply the number of times that a pixel with value i occurred in the specified spatial relationship to a pixel with value j in the input image. Because the processing required to calculate a GLCM over the full dynamic range of an image is prohibitive, graycomatrix scales the input image: by default it reduces the number of intensity values in a grayscale image from 256 to eight. The number of gray levels determines the size of the GLCM. To control the number of gray levels in the GLCM and the scaling of intensity values, use the NumLevels and GrayLimits parameters of graycomatrix; see its reference page for more information. The gray-level co-occurrence matrix can reveal certain properties of the spatial distribution of the gray levels in the texture image. For example, if most of the entries in the GLCM are concentrated along the diagonal, the texture is coarse with respect to the specified offset. Several statistical measures can also be derived from the GLCM. To illustrate, the following figure shows how graycomatrix calculates the first three values in a GLCM. In the output GLCM, element (1,1) contains the value 1 because there is only one instance in the input image where two horizontally adjacent pixels have the values 1 and 1. glcm(1,2) contains the value 2 because there are two instances where two horizontally adjacent pixels have the values 1 and 2. Element (1,3) has the value 0 because there are no instances of two horizontally adjacent pixels with the values 1 and 3. graycomatrix continues processing the input image, scanning it for other pixel pairs (i, j) and recording the sums in the corresponding elements of the GLCM.

Figure 2.1.1: Process Used to Create the GLCM
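The counting procedure just described can be reproduced in a few lines. The sketch below builds a single-offset (horizontally adjacent) GLCM by direct counting, without the scaling step; the 4×5 example image is taken from the MATLAB documentation figure and is an assumption here, since the figure itself is not reproduced in the text:

```python
import numpy as np

def glcm_horizontal(img, levels):
    """Count horizontally adjacent pixel pairs (i, j): a minimal version
    of what graycomatrix computes for a single offset, with no scaling
    or normalization. Gray levels in img are assumed to start at 1."""
    g = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols - 1):
            i, j = img[r, c], img[r, c + 1]
            g[i - 1, j - 1] += 1
    return g

# Example image with gray levels 1..8 (assumed from the MATLAB doc figure)
I = np.array([[1, 1, 5, 6, 8],
              [2, 3, 5, 7, 1],
              [4, 5, 7, 1, 2],
              [8, 5, 1, 2, 5]])
G = glcm_horizontal(I, 8)
print(G[0, 0], G[0, 1], G[0, 2])  # → 1 2 0, matching the figure description
```
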

2.2 GLAM
An image can be modelled as a rectangular lattice S of m × n sites. Furthermore, a neighbourhood system N = {N_s, s ∈ S} can be defined, in which the neighbourhood N_s is built from a basic neighbourhood E at site s; the basic neighbourhood is thereby a chosen structural element [9].

Aura measure [9]: Given two subsets A, B ⊆ S, where |A| is the total number of elements in A, the aura measure of A with respect to B is given in (1).

m(A, B) = Σ_{s ∈ A} |N_s ∩ B|        (1)

GLAM (Gray Level Aura Matrix) [9]: Let N be the neighbourhood system over S and {S_i, 0 ≤ i ≤ G−1} be the gray level sets of an image over S, with G the number of distinct gray levels. Then the GLAM of the image is given in (2),

A = [m(S_i, S_j, N)], 0 ≤ i, j ≤ G−1        (2)

where S_i = {s ∈ S | x_s = i} is the gray level set corresponding to the ith level, and m(S_i, S_j, N) is the aura measure of S_i with respect to S_j under the neighbourhood system N.

Figure 2.2.1: Process Used to Create the GLAM
Fig. 2.2.1(a) A sample binary lattice S, where the subset A is the set of all 1's and B the set of all 0's. Fig. 2.2.1(b) The structural element of the neighbourhood system. Fig. 2.2.1(c) The shaded sites are those involved in building m(S1, S0, N). Fig. 2.2.1(d) The corresponding GLAM. The aura of A with respect to B characterizes how the subset B is represented in the neighbourhood of A. The GLAM of an image measures the amount of each gray level in the neighbourhood of each gray level. As an example, the GLAM for the image shown in Figure 2.2.1(a) is shown in Figure 2.2.1(d), calculated using the structural element of the four nearest-neighbour neighbourhood system.
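As a sketch of Eqs. (1)–(2), the GLAM under the four nearest-neighbour system can be computed by summing, for every site of gray level i, the number of its neighbours of gray level j. The function name is ours, and letting border sites simply have fewer neighbours is our assumption:

```python
import numpy as np

def glam(img, levels):
    """Gray Level Aura Matrix with the four nearest-neighbour system:
    A[i, j] accumulates, over all sites s of gray level i, the number
    of 4-neighbours of s having gray level j (aura measure m(Si, Sj, N))."""
    rows, cols = img.shape
    A = np.zeros((levels, levels), dtype=int)
    for r in range(rows):
        for c in range(cols):
            i = img[r, c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    A[i, img[nr, nc]] += 1
    return A

# A small binary lattice: A[1, 0] is the aura measure m(S1, S0, N)
S = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]])
A = glam(S, 2)
print(A)
```
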


2.3 Gray level neighbours matrix (GLNM)
The proposed gray level statistical matrix, termed the gray level neighbours matrix (GLNM), extracts textural information that contains the size information of texture elements and is based on the occurrence of gray level neighbours within a specified neighbourhood. The gray level neighbours are the pixels in the specified neighbourhood with the same gray level as the centre pixel. In the case of a 3×3 neighbourhood, the maximum number of possible gray level neighbours is eight. The numbers of rows and columns of the GLNM are equal to the number of gray levels and the maximum number of gray level neighbours, respectively. If the number of gray levels and the neighbourhood size are large, the matrix may become large; this can be controlled to a considerable extent by reducing the quantization level of the image. The matrix element (i, j) of the GLNM counts the occurrences of j gray level neighbours within the given neighbourhood for centre pixels of intensity i, which is defined as

G(i, j) = #{(p, q) ∈ S | x_pq = i and there are j pixels in N_xy(p, q) with gray level i},

where # denotes the number of elements in the set and N_xy(p, q) is the defined neighbourhood in the image. The generation of the GLNM is simple, i.e., the number of operations required to process an image to obtain the GLNM is directly proportional to the total number of pixels. Consider Fig. 2.3.2(a), which shows a 6×6 image matrix with eight gray levels ranging from 0 to 7. Figure 2.3.2(b) shows the corresponding GLNM. In this case, the row size of the GLNM is equal to the number of gray levels in the image matrix, i.e., eight, and the column size is equal to the maximum number of gray level neighbours of the specified neighbourhood, i.e., also eight for the 3×3 neighbourhood considered. For example, the element in the (1, 2) position (medium shaded) of the GLNM, whose value is five, indicates that two gray level neighbours occur five times for a centre pixel with gray level value zero. Likewise, the element in the (5, 3) position (light shaded), whose value is four, indicates that three gray level neighbours occur four times for a centre pixel with gray level value four. Also, the element in the (8, 2) position (dark shaded), whose value is three, indicates that two gray level neighbours occur three times for a centre pixel with gray level value seven.

Fig. 2.3.2 (a) Sample image matrix (b) Gray level neighbours matrix (c) Illustration for generating the value of G(1, 2). Zero-valued elements in the rightmost columns of the GLNM indicate the absence of higher numbers of pixel neighbours for all gray levels.
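A direct implementation of the GLNM construction reads, for every pixel, the number of its 3×3 neighbours sharing its gray level and accumulates the counts. This is a sketch; the treatment of border pixels, which simply have truncated neighbourhoods here, is our assumption since the text does not specify it:

```python
import numpy as np

def glnm(img, levels):
    """Gray Level Neighbours Matrix sketch.
    Row i (0-based)  -> gray level i.
    Col j (0-based)  -> 'j+1 same-level neighbours' in the 3x3 window,
    i.e. columns 1..8 in the text's 1-based indexing."""
    rows, cols = img.shape
    G = np.zeros((levels, 8), dtype=int)
    for r in range(rows):
        for c in range(cols):
            i = img[r, c]
            n = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    nr, nc = r + dr, c + dc
                    # border pixels get truncated neighbourhoods (assumption)
                    if 0 <= nr < rows and 0 <= nc < cols and img[nr, nc] == i:
                        n += 1
            if n > 0:
                G[i, n - 1] += 1
    return G

demo = np.full((2, 2), 5)          # every pixel has 3 same-level neighbours
print(glnm(demo, 8)[5, 2])         # → 4
```
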

2.4 Texture features
The coarseness or fineness of the ROI texture can be interpreted from the distribution of the elements in the GLNM. If a texture is smooth, then a pixel and its neighbours will probably have similar gray levels; the entries in the GLNM then take larger values and concentrate in the rightmost columns. On the other hand, if a texture has fine details, then the differences between a pixel and its neighbouring pixels will probably be large; the entries in the GLNM then take smaller values and concentrate in the leftmost columns. The thirteen texture features [9] computed from the GLNM are as follows:



Contrast:

When i and j are equal, the diagonal elements are considered and (i−j) = 0. These values represent pixels exactly like their neighbours, so they are given a weight of 0. If i and j differ by 1, there is a small contrast and the weight is 1. If i and j differ by 2, the contrast is larger and the weight is 4. The weights continue to increase exponentially as (i−j) increases.

Homogeneity: This statistic is also called the Inverse Difference Moment. It measures image homogeneity, as it assumes larger values for smaller gray tone differences in pair elements. It is more sensitive to the presence of near-diagonal elements in the GLNM. If weights decrease away from the diagonal, the result will be larger for windows with little contrast. It has its maximum value when all elements in the image are the same. GLNM contrast and homogeneity are strongly, but inversely, correlated in terms of equivalent distribution in the pixel-pair population; homogeneity decreases if contrast increases while energy is kept constant.

Dissimilarity: In the Contrast measure, weights increase exponentially (0, 1, 4, 9, etc.) as one moves away from the diagonal, whereas in the Dissimilarity measure weights increase linearly (0, 1, 2, 3, etc.). Dissimilarity and Contrast yield larger values for windows with more contrast. Homogeneity weights values by the inverse of the Contrast weights, with weights decreasing exponentially away from the diagonal.
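The three weighting schemes just described (quadratic for Contrast, linear for Dissimilarity, inverse-quadratic for Homogeneity/IDM) can be written down directly over a normalized matrix. This is a sketch using the common textbook formulas, since the paper's own equations are not reproduced:

```python
import numpy as np

def contrast_family(P):
    """Contrast, Dissimilarity and Homogeneity (IDM) of a gray level
    statistical matrix P, normalized to a probability distribution."""
    P = P / P.sum()
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)          # weights 0, 1, 4, 9, ...
    dissim = np.sum(P * np.abs(i - j))           # weights 0, 1, 2, 3, ...
    homog = np.sum(P / (1.0 + (i - j) ** 2))     # inverse of contrast weights
    return contrast, dissim, homog

# A purely diagonal matrix: no gray level differences at all
c, d, h = contrast_family(np.eye(4))
print(c, d, h)  # zero contrast and dissimilarity, maximal homogeneity
```
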

ASM: ASM and Energy use each P_ij as a weight for itself. High values of ASM or Energy occur when the window is very orderly. The name ASM comes from physics and reflects the similar form of the physics equations used to calculate the angular second moment, a measure of rotational acceleration. The square root of the ASM is sometimes used as a texture measure and is called Energy. This statistic is also called Uniformity. It measures textural uniformity, that is, pixel-pair repetitions, and detects disorder in textures. Energy reaches a maximum value of one. High energy values occur when the gray level distribution has a constant or periodic form.


Entropy: Entropy is a notoriously difficult term to understand; the concept comes from thermodynamics. It refers to the quantity of energy that is permanently lost to heat ("chaos") every time a reaction or a physical transformation occurs. Entropy cannot be recovered to do useful work. Because of this, the term is used in nontechnical speech to mean irremediable chaos or disorder. Also, as with ASM, the equation used to calculate physical entropy is very similar to the one used for the texture measure. This statistic measures the disorder or complexity of an image.
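ASM, Energy and Entropy follow the same pattern of sums over the normalized matrix; a sketch with the standard definitions (taking the natural logarithm over non-zero entries is an implementation choice on our part):

```python
import numpy as np

def orderliness(P):
    """ASM (sum of squared probabilities), Energy (its square root)
    and Entropy (-sum p log p over non-zero entries) of matrix P."""
    P = P / P.sum()
    asm = np.sum(P ** 2)
    energy = np.sqrt(asm)
    nz = P[P > 0]
    entropy = -np.sum(nz * np.log(nz))
    return asm, energy, entropy

# A single-entry matrix is perfectly orderly: ASM = Energy = 1, Entropy = 0
single = np.zeros((4, 4)); single[0, 0] = 1.0
print(orderliness(single))
# A uniform matrix is maximally disordered for its size
print(orderliness(np.ones((4, 4))))
```
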



Difference Entropy:

Sum Entropy:

Sum Average:

Variance: Here the means are taken relative to the ith pixel and jth pixel, respectively. This statistic is a measure of heterogeneity and is strongly correlated with first order statistical variables such as the standard deviation. Variance increases when the gray level values differ from their mean.

The standard deviations are given by

Variance as a texture measure performs the same task as the common descriptive statistic called variance.

Difference Variance:

Correlation: The Correlation texture measure describes the linear dependency of gray levels on those of neighbouring pixels. Correlation can be calculated for successively larger window sizes; the window size at which the Correlation value declines suddenly may be taken as one definition of the size of definable objects within an image.

G is the number of gray levels used; μx, μy, σx and σy are the means and standard deviations of Px and Py.

Cluster Shade:

Cluster Prominence:
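Correlation, Cluster Shade and Cluster Prominence all rely on the marginal means and standard deviations μx, μy, σx, σy defined above. The sketch below uses the standard covariance, third-moment and fourth-moment definitions, which is an assumption on our part since the paper's equations are not reproduced:

```python
import numpy as np

def cluster_features(P):
    """Correlation, Cluster Shade and Cluster Prominence of a
    normalized gray level statistical matrix P."""
    P = P / P.sum()
    i, j = np.indices(P.shape)
    mx, my = np.sum(i * P), np.sum(j * P)        # marginal means
    sx = np.sqrt(np.sum((i - mx) ** 2 * P))      # marginal std devs
    sy = np.sqrt(np.sum((j - my) ** 2 * P))
    corr = np.sum((i - mx) * (j - my) * P) / (sx * sy)
    shade = np.sum((i + j - mx - my) ** 3 * P)   # third moment
    prom = np.sum((i + j - mx - my) ** 4 * P)    # fourth moment
    return corr, shade, prom

# A diagonal matrix is perfectly correlated and symmetric (zero shade)
corr, shade, prom = cluster_features(np.eye(3))
print(round(float(corr), 6), round(float(shade), 6))
```
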

III. RESULTS AND DISCUSSIONS

The digital mammograms available in the MIAS database [25] were used for the experiments. The database includes 322 mammograms belonging to the normal (Norm) class and six abnormal classes: architectural distortion (Arch), asymmetry (Asym), calcification (Calci), circumscribed (Circ) masses, spiculated (Spic) masses and ill-defined (Ill-def) masses. Each mammogram is of size 1,024×1,024 pixels and is annotated with the class, severity, centre of abnormality, background tissue character and the radius of a circle enclosing the abnormality. As shown in Fig. 3, the ROIs from abnormal mammograms were extracted around the annotated abnormality; hence, the abnormal ROIs are of different sizes. In the case of normal mammograms, ROIs of uniform size 200×200 pixels were cropped about the centre, an approach that avoids bias for normal mammograms. The ROIs comprise 209 normal regions, 19 architectural distortions, 15 asymmetry cases, 26 calcification regions, 24 circumscribed masses, 19 spiculated masses and 15 ill-defined masses. In this work, all 327 ROIs were used to create the feature dataset, and 110 ROIs, comprising one-third from each mammogram class, were selected as queries. The performance analysis of the proposed gray level statistical matrix for texture feature



extraction for the mammogram retrieval problem is presented in this section. The overall time and performance offered by the various methods are reported in Table 1.

Method    Time (seconds)    Performance (error rate)
GLNM      76.22             53%
GLAM      1072.28           63%
GLCM      1200.08           60%

Table 1: Time and performance rates of the proposed method and competing methods

Fig 3.Graph for Time and Performance Analysis
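The uniform 200×200 centre crop applied to the normal mammograms above can be sketched as follows (the function name is ours; MIAS images are 1,024×1,024 as stated in the text):

```python
import numpy as np

def centre_crop(img, size=200):
    """Crop a size x size ROI about the image centre, as done for
    the normal-class mammograms."""
    r0 = (img.shape[0] - size) // 2
    c0 = (img.shape[1] - size) // 2
    return img[r0:r0 + size, c0:c0 + size]

mammo = np.zeros((1024, 1024), dtype=np.uint8)  # stand-in for a MIAS image
print(centre_crop(mammo).shape)  # → (200, 200)
```
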

IV. CONCLUSION
This paper reports the retrieval performance of the GLCM, GLAM and GLNM when applied to the MIAS database. The capability of these methods in extracting texture features is demonstrated. Such retrieval approaches may help physicians to search effectively for relevant mammograms during diagnosis. From the results, the least computational time is observed for GLNM, and a comparatively better classification rate is observed for GLAM. Developing a more efficient feature estimation method is our future endeavour.

REFERENCES
[1]. Chang HD, Shi XJ, Min R, Hu LM, Cai XP, Du HN (2006) Approaches for automated detection and classification of masses in mammograms. Pattern Recognit 39:646–668
[2]. Chen CH, Pau LF, Wang PSP (eds) (1998) The handbook of pattern recognition and computer vision, 2nd edn. World Scientific Publishing, pp 207–248
[3]. Choraś RS (2008) Feature extraction for classification and retrieval mammogram in databases. Int J Med Eng Inf 1(1):50–61
[4]. Do MN, Vetterli M (2002) Wavelet-based texture retrieval using generalized Gaussian density and Kullback–Leibler distance. IEEE Trans Image Process 11(2):146–158
[5]. Eisa M, Refaat M, El-Gamal AF (2009) Preliminary diagnostics of mammograms using moments and texture features. ICGST-GVIP J 9(5):21–27
[6]. El-Naqa I, Yang Y, Galatsanos NP, Nishikawa RM, Wernick MN (2004) A similarity learning approach to content-based image retrieval: application to digital mammography. IEEE Trans Med Imaging 23(10):1233–1244
[7]. Felipe JC, Traina AJM, Ribeiro MX, Souza EPM, Junior CT (2006) Effective shape-based retrieval and classification of mammograms. In: Proceedings of the Twenty First Annual ACM Symposium on Applied Computing, pp 250–255
[8]. Greenspan H, Pinhas AT (2007) Medical image categorization and retrieval for PACS using the GMM-KL framework. IEEE Trans Inf Technol Biomed 11:190–202
[9]. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 3(6):610–621
[10]. Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Trans Pattern Anal Mach Intell 12(5):489–497
[11]. Korn P, Sidiropoulos N, Faloutsos C, Siegel E, Protopapas Z (1998) Fast and effective retrieval of medical tumor shapes. IEEE Trans Knowl Data Eng 10(6):889–904
[12]. Kwitt R, Meerwald P, Uhl A (2011) Efficient texture image retrieval using copulas in a Bayesian framework. IEEE Trans Image Process 20(7):2063–2077
[13]. Lamard M, Cazuguel G, Quellec G, Bekri L, Roux C, Cochener B (2007) Content-based image retrieval based on wavelet transform coefficients distribution. In: Proceedings of the Twenty Ninth Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Press, Lyon, France, pp 4532–4535
[14]. Lu S, Bottema MJ (2003) Structural image texture and early detection of breast cancer. In: Proceedings of the 2003 APRS Workshop on Digital Image Computing, pp 15–20
[15]. Manjunath BS, Ma WY (1996) Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Mach Intell 18(8):837–842
[16]. Mudigonda NR, Rangayyan RM, Leo Desautels JE (2000) Gradient and texture analysis for the classification of mammographic masses. IEEE Trans Med Imaging 19(10):1032–1043
[17]. Muller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medical applications—clinical benefits and future directions. Int J Med Inform 73:1–23
[18]. Muller H, Muller W, Squire DM, Marchand-Maillet S, Pun T (2005) Performance evaluation in content-based image retrieval: overview and proposals. Pattern Recognit Lett 22(5):593–601
[19]. Pandey D, Kumar R (2011) Inter space local binary patterns for image indexing and retrieval. J Theor Appl Inf Technol 32(2):160–168
[20]. Qin X, Yang Y (2004) Similarity measure and learning with Gray Level Aura Matrices (GLAM) for texture image retrieval. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit, Washington DC, USA, 1:326–333
[21]. Quellec G, Lamard M, Cazuguel G, Cochener B, Roux C (2010) Wavelet optimization for content-based image retrieval in medical databases. Med Image Anal 14:227–241
[22]. Schnorrenberg F, Pattichis CS, Schizas CN, Kyriacou K (2000) Content-based retrieval of breast cancer biopsy slides. Technol Health Care 8:291–297
[23]. Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
[24]. Srinivasan GN, Shobha G (2008) Statistical texture analysis. Proc World Acad Sci Eng Technol 36:1264–1269
[25]. Suckling J, Parker J, Dance DR, Astley SM, Hutt I, Boggis CRM, Ricketts I, Stamatakis E, Cerneaz N, Kok SL, Taylor P, Betal D, Savage J (1994) The mammographic image analysis society digital mammogram database. In: Proceedings of the International Workshop on Digital Mammography, pp 211–221
[26]. Sun J, Zhang Z (2008) An effective method for mammograph image retrieval. In: Proceedings of the International Conference on Computational Intelligence and Security, pp 190–193
[27]. Tourassi GD (1999) Journey toward computer-aided diagnosis: role of image texture analysis. Radiology 213:317–320
[28]. Tourassi G, Harrawood B, Singh S, Lo J, Floyd C (2007) Evaluation of information theoretic similarity measure for content-based retrieval and detection of masses in mammograms. Med Phys 34:140–150
[29]. Wei CH, Li CT, Wilson R (2005) A general framework for content-based medical image retrieval with its application to mammogram retrieval. Proc SPIE Int Symp Med Imaging 5748:134–143
[30]. Wei CH, Li CT, Wilson R (2006) A content-based approach to medical image database retrieval. In: Ma ZM (ed) Database modeling for industrial data management: emerging technologies and applications. Idea Group Publishing, Hershey, pp 258–291
[31]. Wiesmuller S, Chandy DA (2010) Content-based mammogram retrieval using gray level aura matrix. Int J Comput Commun Inf Syst (IJCCIS) 2(1):217–222
[32]. Abraham Chandy D, Stanly Johnson J, Easter Selvan S (2013) Texture feature extraction using gray level statistical matrix for content-based mammogram retrieval. Springer Science+Business Media, New York



Dr. K. Karteeka Pavan completed her Ph.D. (CSE) from ANU. She is presently working as a Professor at R.V.R & J.C. College of Engineering, Chowdavaram, Guntur-19, India. She has about 15 years of teaching experience, has published papers in bioinformatics, and is an associate member of CSI and a life member of ISTE. E-mail: kkp@rvrjcce.ac.in

Sri Madamanchi Brahmaiah completed his M.Tech from ANU. He is presently working as an Assistant Professor at R.V.R & J.C. College of Engineering, Chowdavaram, Guntur-19, India. He has about 4 years of teaching experience and also worked for almost 14 years as a programmer. He is an associate member of CSI, a member of ISTE and a member of IAENG. E-mail: brahmaiah_m@yahoo.com

Ms. Sk. Habi Munnissa is studying in the final year of the M.C.A. programme at R.V.R & J.C. College of Engineering, Chowdavaram, Guntur-19, affiliated to ANU. She is an associate member of CSI. E-mail: habi.hr43@gmail.com


