IJIRST –International Journal for Innovative Research in Science & Technology| Volume 1 | Issue 6 | November 2014 ISSN (online): 2349-6010
An Efficient Segmentation of Remote Sensing Images For The Classification of Satellite Data Using K-Means Clustering Algorithm
D. Napoleon, Assistant Professor, Department of Computer Science, Bharathiar University, Coimbatore, India
Dr. E. Ramaraj, Professor, Department of Computer Science & Engineering, Alagappa University, Karaikudi, India
Abstract
Images play a major role in conveying information, and a large amount of information is hidden in images of various forms. One such form is the remote sensing image, which is used to derive high-level information. Clustering algorithms play an important role in classifying images to extract a variety of information. The K-Means algorithm provides an effective way of classifying and segmenting images from Quick Bird and other data sets. Three different numbers of centroids are used to classify and analyze the remote sensing images. The proposed approach uses the K-Means clustering algorithm, which attains good accuracy with different running times. Other clustering algorithms are to be used to compare performance and accuracy.
Keywords: Clustering, K-Means Algorithm, Segmentation, Remote Sensing Images
I. INTRODUCTION
An image can be defined as a two-dimensional function f(x, y), where x and y are the spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) is the intensity of the image at that point. When x, y and the amplitude values of f are all finite, discrete quantities, the image is called a digital image. Processing digital images with a digital computer is the field of study known as digital image processing. A digital image consists of a finite number of elements called pixels or picture elements, each of which has a particular location and value. Vision is the most advanced of the five senses, so images play an important role in human perception. Human vision is limited to a narrow band of the electromagnetic (EM) spectrum, whereas imaging machines cover almost the entire spectrum, from gamma rays to radio waves. These machines can also operate on images from sources that humans are not accustomed to associating with images, such as electron microscopy, ultrasound and computer-generated images. Digital image processing therefore plays an important role in a wide and varied field of applications [1].
Clustering techniques are commonly divided into hard clustering and soft clustering. In hard clustering each object either belongs to a cluster or does not, whereas in soft clustering each object belongs to a cluster to a certain degree. This paper deals with the K-means algorithm for clustering data; here the algorithm is applied to a large data set, namely image data. The paper is organized into six sections: Section II reviews related work, Section III discusses classification and the system flow, Section IV presents the K-Means algorithm, Section V reports the experimental results and performance analysis, and Section VI gives the conclusion.
II. RELATED WORK
Clustering is a widely studied problem with application domains such as knowledge discovery and data mining [1], statistical data analysis [3], medical image processing [5], compression [4], data classification and bioinformatics [6]. Various algorithms have been proposed for clustering [7], [8]. A. L. Abul et al. described cluster validity analysis using subsampling [10]. A clustering algorithm divides the data points into subsets such that similar objects fall into the same subset and different subsets have distinct characteristics [11], [12], [13]. Bradley and Fayyad proposed an algorithm for refining the initial cluster centers; it requires fewer iterations and finds the true clusters more often [15]. Some clustering methods improve performance by reducing the number of distance calculations. For example, Judd et al. proposed a parallel clustering algorithm, P-CLUSTER [16], which is based on three pruning techniques. The K-Means algorithm [7] is widely known for its efficiency in clustering large data sets. Ruspini [9] and Bezdek reported fuzzy versions of the K-means algorithm, in which each pattern is allowed a degree of membership in every cluster rather than a discrete membership in exactly one cluster. In the filtering algorithm of Kanungo et al. [17], the data points are stored in a k-d tree, and each node of the tree maintains a set of candidate centers that are pruned, or filtered, as they are propagated to the node's children. This algorithm is more robust than Alsabti's method, which relies on a less effective pruning mechanism based on computing the minimum and maximum distances to each cell. Ming-Chuan Hung described an efficient k-means clustering algorithm using simple partitioning [20].
All rights reserved by www.ijirst.org
An Efficient Segmentation of Remote Sensing Images For The Classification of Satellite Data Using K-Means Clustering Algorithm (IJIRST/ Volume 1 / Issue 6 / 055)
III. CLASSIFICATION
A classification unit can be a single pixel, a group of neighboring pixels, or the remote sensing image as a whole. Classification relies mainly on the spectral information in the image, while temporal, spatial and other information may also be considered. Two kinds of classes, information classes and spectral classes, are introduced to illustrate supervised and unsupervised classification. An information class relates to the information that the analyst wants to extract, while a spectral class groups similar gray-level vectors in multispectral space. The way in which spectral classes are associated with information classes is the main difference between supervised and unsupervised classification. In supervised classification the information classes are specified on the image first, and an algorithm is then used to summarize the multispectral information from the specified areas of the image to form class signatures. This process is known as supervised training. In unsupervised classification, by contrast, an algorithm is applied to the image first to form spectral classes, and the image analyst afterwards assigns the spectral classes to information classes. The methodology of the proposed work is shown in Fig. 1.
Fig. 1: Proposed Architecture (Input Image -> Conversion to Grayscale Image -> K-Means Clustering Algorithm -> Image Cluster -> Classified Image)
Quick Bird and Landsat images are taken as input. These images are converted into grayscale images so that the clustering performs better. The K-Means clustering algorithm is then used to cluster each image. The algorithm works well in classifying both input images.
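The grayscale conversion step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which conversion formula was used, so the common luma weights (0.299, 0.587, 0.114) are assumed here, and the function names `to_grayscale` and `pixels_as_samples` are hypothetical. A tiny synthetic array stands in for a satellite scene.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array (values 0-255) to an H x W grayscale array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Assumed weights (common luma coefficients); the paper does not specify them.
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb @ weights  # weighted sum over the colour axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

def pixels_as_samples(gray):
    """Flatten an H x W image into an (H*W) x 1 sample matrix for clustering."""
    return gray.reshape(-1, 1).astype(np.float64)

# Synthetic 2 x 2 image: white, black, pure red, pure blue.
demo = np.array([[[255, 255, 255], [0, 0, 0]],
                 [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
gray = to_grayscale(demo)
samples = pixels_as_samples(gray)
```

Each row of `samples` is one pixel, which is the form a clustering routine would typically expect as input.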
IV. K-MEANS CLUSTERING ALGORITHM
The K-means algorithm [5] is mainly used to solve clustering problems. The data set is partitioned into a number of clusters fixed in advance, say k clusters. One centroid is defined for each cluster, giving k centroids. The centroids must be placed carefully, because different locations lead to different results [8]. A good choice is to place them as far away from each other as possible. Each point of the data set is then taken and associated with its nearest centroid. When no point is pending, the first step is complete and an early grouping has been made. At this point, k new centroids are recalculated as the barycenters of the clusters resulting from the previous step. With these k new centroids, a new binding is made between the same data set points and the nearest new centroid. This generates a loop, as a result of which the k centroids change their location step by step until no more changes occur. The algorithm aims at minimizing an objective function, in this case a squared error function:

$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$

where $\| x_i^{(j)} - c_j \|^2$ is a chosen distance measure between a data point $x_i^{(j)}$ and the cluster centre $c_j$, and $J$ is an indicator of the distance of the n data points from their respective cluster centres. The algorithm consists of the following steps:
1) Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2) Assign each object to the group that has the closest centroid.
3) When all objects have been assigned, recalculate the positions of the K centroids.
4) Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
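The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: picking the initial centroids as random data points, the `max_iter` cap, the handling of empty clusters and the function name `kmeans` are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, labels, squared-error objective)."""
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Step 1: use k distinct data points as initial centroids (an assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and sum of squared distances to the assigned centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    objective = float((dists[np.arange(len(points)), labels] ** 2).sum())
    return centroids, labels, objective

# Six gray levels forming two obvious groups (synthetic data).
gray_levels = np.array([[10.0], [12.0], [11.0], [200.0], [202.0], [198.0]])
centroids, labels, objective = kmeans(gray_levels, k=2)
```

On data this well separated the result does not depend on the random start; on real images different seeds can give different partitions, which is why the choice of initial centroids matters.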
A. Pseudo code for K-Means Algorithm
Take k clusters and n samples in a multi-dimensional space. The procedure is iterated over the k centroids to obtain the optimal clusters, and a solution is constructed at each iteration. Finally, the best solution over the iterations is selected and the average time is calculated. The following steps are used to construct the algorithm.
Step 1: Initialize the k clusters and the n samples.
Step 2: Iteration (I) <= k.
Step 3: Randomly select one centroid cj, where 1 <= j <= k.
Step 4: Method1 <= n.
Step 5: Randomly choose one object oi, where 1 <= i <= n.
Step 6: Object i on cluster j is represented as oij.
Step 7: The value of oij is either 0 or 1. If oij = 1, object i belongs to cluster j. If oij = 0, object i belongs to some other cluster.
Step 8: Calculate the mean by Kj = 1/ni(oij)*cij.
Step 9: Repeat Step 4 to Step 8 until the method has covered all n samples.
Step 10: Recalculate the position of the centroid by Cj = 1/ni(Kj*cij).
Step 11: Repeat Step 2 to Step 10 for all k centroids.
Step 12: Find the optimal clusters from the results of the iterations by Min j = oij|Kj - Cj|^2.
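The 0/1 indicator oij of Step 7 can be illustrated with a small membership matrix. The sketch below uses the standard K-Means formulation (centroid = mean of the member samples, error = sum of squared distances of members to their centroid) rather than the exact expressions of Steps 8, 10 and 12, whose notation is not fully specified in the pseudo code. All function names are hypothetical.

```python
import numpy as np

def membership_matrix(labels, k):
    """Build the n x k indicator matrix o, with o[i, j] = 1 iff sample i is in cluster j."""
    labels = np.asarray(labels)
    o = np.zeros((labels.size, k), dtype=np.float64)
    o[np.arange(labels.size), labels] = 1.0
    return o

def centroids_from_membership(points, o):
    """Standard update: C_j = sum_i o_ij * x_i / sum_i o_ij."""
    counts = o.sum(axis=0)  # number of samples in each cluster
    if np.any(counts == 0):
        raise ValueError("every cluster needs at least one member")
    return (o.T @ points) / counts[:, None]

def within_cluster_error(points, o, centroids):
    """Sum over clusters j and samples i of o_ij * ||x_i - C_j||^2."""
    sq = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((o * sq).sum())

# Four one-dimensional samples, two per cluster (synthetic data).
points = np.array([[0.0], [2.0], [10.0], [14.0]])
o = membership_matrix([0, 0, 1, 1], k=2)
C = centroids_from_membership(points, o)
err = within_cluster_error(points, o, C)
```

Each row of `o` contains exactly one 1, which expresses the hard-clustering rule that a sample belongs to one cluster only.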
V. EXPERIMENTAL RESULTS
The first data set is part of a Quick Bird remote sensing image (Fig. 2) that covers a small area in the southern part of the city of Trento, Italy, acquired on July 17, 2006. It mainly comprises vegetation and exposed land cover types. Vegetation is shown in green, dry land appears as dark areas, grassland as slightly darker areas, and paddy fields as the darkest areas. The proportion of vegetation is high. Exposed land appears brown and dry salt flats slightly brown, and these are interrupted by mixed forest land and dry land. These features make clustering of the land cover difficult.
Clustering techniques play an important role in many fields. Because non-trivial variants of the clustering problem are NP-hard, heuristic algorithms are used to obtain acceptable results in terms of running time and solution quality. When the initial centroids are chosen randomly, the K-means algorithm produces different results in different runs. Choosing the initial centroids is therefore very important, because the result depends strongly on them. The K-means algorithm also takes more time to complete on large data sets. For this data set, the K-Means algorithm clusters the data based on color. The data set consists of six numerical attributes with values ranging from 0 to 255. Each data set consists of n points distributed around k cluster centers. The k cluster centers are allocated randomly first, and the six attributes in the range 0 to 255 are then allocated uniformly. D/r defines the calculation of the minimum distance (D) between two cluster points. The process is repeated until the required data points and clusters are obtained.
A. Dataset 1
Fig. 2: Quick Bird Image
Table 1 shows the performance measures obtained for the Quick Bird data set using the K-Means clustering algorithm.
Table - 1: Results On Quick Bird Data Set
Input Set                  Running Time (sec)   Distance Measurement
3 Centroids, 1000 points   221                  7.39 x 10^4
5 Centroids, 1000 points   505                  3.34 x 10^6
7 Centroids, 1000 points   2043                 1.01 x 10^10
Fig. 3: Classified Image (3 Centroids 1000 Points)
Fig. 4: Classified Image (5 Centroids 1000 Points)
Fig. 5: Classified Image (7 Centroids 1000 Points)
Fig. 6: Accuracy Analysis For Quick Bird Data Set
Fig. 7: Performance Analysis Chart For Quick Bird (series: Running Time, Distance Measurement, Accuracy; categories: 3, 5 and 7 centroids; vertical axis from 0 to 2500)
B. Dataset 2
Fig. 8: Landsat Image
Table - 2: Results On Landsat Data Set
Input Set                  Running Time (sec)   Distance Measurement
3 Centroids, 1000 points   234                  7.89 x 10^4
5 Centroids, 1000 points   595                  4.34 x 10^6
7 Centroids, 1000 points   2003                 1.00 x 10^10
Table 2 above shows the performance measures obtained for the Landsat data set using the K-Means clustering algorithm.
Fig. 9: Classified Image (3 Centroids 1000 Points)
Fig. 10: Classified Image (5 Centroids 1000 Points)
Fig. 11: Classified Image (7 Centroids 1000 Points)
Fig. 12: Accuracy Analysis For Landsat Data Set
Fig. 13: Performance Analysis Chart For Landsat
VI. CONCLUSION
Images are a rich representation of many kinds of information, and remote sensing images in particular are used to derive high-level information. Clustering techniques play an important role in classifying such images. The clustering performance of the K-Means algorithm has been evaluated on the Quick Bird and Landsat data sets, using three different numbers of centroids (3, 5 and 7) to classify and analyze the remote sensing images. The proposed K-Means clustering approach attains 78% accuracy with different running times. To improve the accuracy, other clustering algorithms can also be applied.
REFERENCES
[1] Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, Vol. 2, 1998, pp. 283-304.
[2] J. R. Wen, J. Y. Nie, and H. J. Zhang, "Query clustering using user logs," ACM Transactions on Information Systems, Vol. 20, 2002, pp. 59-81.
[3] J. Banfield and A. Raftery, "Model-based gaussian and non-gaussian clustering," Biometrics, Vol. 49, 1993, pp. 15-34.
[4] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, Vol. 18, 1975, pp. 509-517.
[5] D. A. Clausi, "K-means iterative fisher unsupervised clustering algorithm applied to image texture segmentation," Pattern Recognition, Vol. 35, 2002, pp. 1959-1972.
[6] F. X. Wu, W. J. Zhang, and A. L. Kusalik, "Determination of the minimum samples size in micro array experiments to cluster genes using K-means clustering," in Proceedings of 3rd IEEE Symposium on Bioinformatics and Bioengineering, 2003, pp. 401-406.
[7] K. Alsabti, S. Ranka, and V. Singh, "An efficient k-means clustering algorithm," in Proceedings of 1st Workshop on High Performance Data Mining, 1998.
[8] R. C. Dubes and A. K. Jain, "Algorithms for Clustering Data," Prentice Hall, 1988.
[9] E. R. Ruspini, "A new approach to clustering," Inform. Contr., Vol. 19, 1969, pp. 22-32.
[10] L. Abul, R. Alhajj, F. Polat, and K. Barker, "Cluster validity analysis using sub sampling," in Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Washington DC, Oct. 2003, Vol. 2, pp. 1435-1440.
[11] J. Grabmeier and A. Rudolph, "Techniques of cluster algorithms in data mining," Data Mining and Knowledge Discovery, Vol. 6, 2002, pp. 303-360.
[12] L. O. Hall, I. B. Ozyurt, and J. C. Bezdek, "Clustering with a genetically optimized approach," IEEE Transactions on Evolutionary Computation, 3(2), 1999, pp. 103-112.
[13] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, 31(3), 1999, pp. 264-323.
[14] P. Berkhin, "A survey of clustering data mining techniques," in J. Kogan, C. Nicholas, and M. Teboulle (Eds.), Grouping Multidimensional Data, Springer Press, 2006, pp. 25-72.
[15] P. S. Bradley and U. M. Fayyad, "Refining initial points for k-means clustering," in Proceedings of 15th International Conference on Machine Learning, 1998, pp. 91-99.
[16] D. Judd, P. McKinley, and A. Jain, "Large-scale parallel data clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, 1998, pp. 871-876.
[17] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, 2002, pp. 881-892.
[18] A. Demiriz, K. P. Bennett, and M. J. Embrechts, "Semi-supervised clustering using genetic algorithms," R.P.I. Math Report No. 9901, Rensselaer Polytechnic Institute, 1999.
[19] M. Painho and F. Bação, "Using genetic algorithms in clustering problems," in Proceedings of GeoComputation Conference, 2000.
[20] J. Han and M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, 2000.
[21] W. DuMouchel, C. Volinsky, T. Johnson, C. Cortes, and D. Pregibon, "Squashing flat files flatter," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1999, pp. 6-15.
[22] Ya-Wei Ho, Chih-Hung Wu, and Chih-Chin Lai, "Aerial image clustering using genetic algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, 2009.
[23] P. K. Agarwal and C. M. Procopiuc, "Exact and approximation algorithms for clustering," in Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 658-667.
[24] Venkatesh Katari, Suresh Chandra Satapathy, JVR Murthy, and PVGD Prasad Reddy, "Hybridized improved genetic algorithm with variable length chromosome for image clustering," IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 11, November 2007.