Paper id 26201424


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

A New Kernelized Fuzzy C-Means Clustering Algorithm with Enhanced Performance

Samarjit Das1, Hemanta K. Baruah2

1 Department of Computer Science & IT, Cotton College, Assam, India
2 Vice-Chancellor, Bodoland University, Assam, India
1 ssaimm@rediffmail.com, 2 hemanta_bh@yahoo.com

Abstract- The Kernelized Fuzzy C-Means clustering technique, which uses a kernel-induced distance function as its similarity measure instead of the Euclidean distance used in the conventional Fuzzy C-Means clustering technique, has recently gained popularity in the research community. Like the conventional Fuzzy C-Means clustering technique, it suffers from inconsistent performance, because its initial centroids are likewise obtained from randomly initialized membership values of the objects. The present work proposes a modified method that removes the effect of random initialization from the Kernelized Fuzzy C-Means clustering technique and improves its overall performance. In the proposed method, we use the algorithm of Yuan et al. to determine the initial centroids. These initial centroids are then used in the conventional Kernelized Fuzzy C-Means clustering technique to obtain the final clusters. We also compare our method with the Kernelized Fuzzy C-Means clustering technique of Hogo using two validity measures, namely Partition Coefficient and Clustering Entropy.

Keywords: kernel-induced distance function, Random initialization, Partition Coefficient, Clustering Entropy.

1. INTRODUCTION

Clustering is a technique that reveals the inherent grouping structure of data in an unsupervised manner. Conventional hard clustering techniques cannot deal with situations involving non-probabilistic uncertainty. They are based on crisp set theory, and therefore leave no possibility of an object belonging partially to multiple clusters. In other words, the clusters revealed by a hard clustering technique are disjoint: after a hard clustering technique is applied, each object of a dataset either belongs entirely to a particular cluster or does not belong to that cluster at all.
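The contrast between crisp and partial belongingness can be illustrated with a small sketch. It is not taken from the paper: the membership matrices `crisp` and `fuzzy` are made-up values, and the two functions assume Bezdek's standard definitions of the validity measures named in the abstract (Partition Coefficient as the mean of squared memberships, Clustering Entropy as the mean of -u log u). The definitions used later in the paper may differ in detail.

```python
import math

def partition_coefficient(u):
    """Mean of squared memberships; u[k][i] is the membership of object k in cluster i.
    Assumed definition (Bezdek's standard form), not quoted from the paper."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n

def clustering_entropy(u):
    """Mean of -u*log(u) over all memberships, with zero memberships contributing 0.
    Assumed definition (Bezdek's standard form), not quoted from the paper."""
    n = len(u)
    return sum(-m * math.log(m) for row in u for m in row if m > 0) / n

# Hypothetical example: 3 objects, 2 clusters; each row sums to 1.
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # hard clustering: all-or-nothing
fuzzy = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]   # fuzzy clustering: partial belongingness

pc_crisp = partition_coefficient(crisp)
ce_crisp = clustering_entropy(crisp)
pc_fuzzy = partition_coefficient(fuzzy)
ce_fuzzy = clustering_entropy(fuzzy)
```

Under these assumed definitions, a crisp partition attains the extreme values (Partition Coefficient 1, Clustering Entropy 0), while any partition with partial memberships has a Partition Coefficient below 1 and a positive Clustering Entropy.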
The concept of partial belongingness was first introduced by Zadeh (1965) in his well-known fuzzy set theory (FST). A complete presentation of all aspects of FST is available in the work of Zimmermann (1991). Applications of FST to ambiguous problems in which non-probabilistic uncertainty prevails are reflected in the works of Dewit (1982) and Ostaszewski (1993). Baruah (2011a, 2011b) has introduced a new approach to FST, in which he argues that the membership value of a fuzzy number can be expressed as the difference between a membership function and a reference function, and that the membership value and the membership function for the complement of a fuzzy set are therefore not the same. With the advent of FST, conventional hard clustering techniques opened the way to a new kind of clustering, known as fuzzy clustering, in which, owing to the concept of degree of belongingness, an object may belong exactly to one cluster or partially to more than one cluster, depending on its membership value.

In the literature, among the available fuzzy clustering techniques, the Fuzzy C-Means clustering technique (FCM) of Bezdek (1981) has been the most widely studied and applied. Derrig and Ostaszewski (1995) applied the FCM of Bezdek in their work on a method of pattern recognition for risk and claim classification. Das (2013) tried the fuzzy c-means algorithm of Bezdek with three different distances, namely Euclidean distance, Canberra distance and Hamming distance, and found that the algorithm produces its results fastest, and closest to expectation, with Euclidean distance, and slowest, and furthest from expectation, with Canberra distance. Das and Baruah (2013) applied Bezdek's (1981) FCM clustering technique to vehicular pollution, and used it to discuss the importance of applying a fuzzy clustering technique, rather than a hard clustering technique, to a dataset describing vehicular pollution. Although in most situations the FCM clustering technique is seen to perform better than other fuzzy clustering techniques, its performance varies significantly between executions because of the random initialization of the membership values. Yager and Filev (1992) proposed a simple and effective method,
