Study of Clustering Methods in Data Mining

Page 1

Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.55-59 SSN: 2278-2419

Study of Clustering Methods in Data Mining M.Arumaiselvam1, R.Anitajesi2 Assistant Professor, HOD PG& Research Department of Computer Science, St.Joseph’s College of Arts and Science, Cuddalore. 2 M.phil Scholar, St.Joseph’s College of Arts and Science, Cuddalore Email: Arumai_selvam@yahoo.com, anitajesi4@gmail.com

1

Abstract - A study will be made on the various clustering methods. Method of grouping set of physical or abstract objects into classes of similar objects is called as clustering. Splitting a data set into groups such that the similarity within a group is larger than among groups are done by clustering algorithm. This paper discusses about few types of clustering methods- Partitioning methods, Hierarchical methods, Density-based methods, Grid-based method, Model based methods, Constraint based methods. Keywords: Data mining, Clustering, Partitioning, Hierarchical, Density-based, Grid-based. I. INTRODUCTION Data mining is an action of discovering hidden information from the actual data. It is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database system. It is technique of exacting implicit, previously unknown and future useful information from available data. In short it is process of analysing data from different viewpoint and assembling the knowledge from it. Data mining is a set of techniques for efficient automated computerized of previously unknown, valid, novel, beneficial and comprehensible patterns in huge databases. The patterns need to be actionable in order that may be used in a decision of an organization making process. Data mining made organization to collect large amount of data at lowest cost using the technology data collection and storage. Taking advantage of this information, it is easy to access the stored data. Now a days, data mining is used all over the world in various fields like customer retention, education system, production control, healthcare, market basket analysis, manufacturing engineering, scientific discovery and decision making etc. [1] Major elements of Data mining are:  Extract, convert, and load transaction data onto the data warehouse system.  Data are stored in a multidimensional format.  Afford data access to business analysts and information technology professionals.  Using the application software data is analysed.  Present the data in a useful format, such as a graph , table, ect. [2] II. CLUSTERING IN DATA MINING A collection of data that are similar to one another within the same cluster are said to be clustering. It offers with locating a structure in a collection of unlabelled data so that clustering was taken into unsupervised learning technique . [3] Clustering is also called data segmentation in some applications because clustering partitions large data sets into groups according to their similarity. Outlier detection also possible by clustering. Often these techniques require that the user specifies how many groups are expected. Types of Clustering There are various types of clustering in data mining:  Partitioning methods  Hierarchical methods  Density-based methods  Grid-based methods

55


Integrated Intelligent Research (IIR)

 

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.55-59 SSN: 2278-2419

Model based methods Constraint based methods

Partitioning methods [18]Partitioning methods obtain a single level partition of objects. Given n objects, these methods make k<n clusters of data and use an iterative relocation method. If the following requirements are satisified then it will group the data into k groups:  Each group carries at least one object.  Each object must belong to just one group.

Algorithm used in partitioning methods is:  k-means algorithm  k-medoids algorithm k-means algorithm : k number groups of data are partitioned by k-means clustering. The sum of distances from all the objects in that cluster is minimized using centroid of each cluster. So k-means is an repeatative algorithm in which it minimizes the sum of distances from each object to its cluster centroid, over all clusters [3]. k-medoids algorithm : The k-medoids algorithm, each cluster is represented by one of the objects located near the centre of the cluster. The iterative process of replacing representative objects by no representative objects continues as long as the value of the resulting clustering is improved. This value is estimated using a cost function that measures the average dissimilarity between an object and the representative object of its cluster. Hierarchical methods: A hierarchical decomposition was created by hierarchical method of the given set of data objects. It is a set of clusters that are organized as a tree like structure. It produces a series of clusters as opposed to the partitioned methods which produce only a flat set of clusters. Essentially the hierarchical methods attempt to capture the structure of the data by constructing a tree of clusters.[16] Dendrogram is said to be cluster hierarchy. Every cluster node contains child clusters; sibling clusters partition the points covered by their common parent.

Approaches used in hierarchical methods are:  Agglomerative Approach  Divisive Approach

56


Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.55-59 SSN: 2278-2419

Agglomerative Approach: [7]This approach is also known as the bottom-up approach. First step in this method was object forming a separate group. Then all the objects are merged in a group that are close to one another. It will be ended at the stage until all of the groups are merged into single group or until the stop condition holds. Divisive Approach: This approach is also known as the top-down approach. It will start with all of the objects in the same cluster. Then a cluster is dividing into smaller clusters. It is ended when each object in one cluster or the stop condition holds. Density-based methods : Based on distance among objects most partitioning methods cluster objects.[14] Cluster is dense region of points, which is individual separated by low-density regions, from the other regions of high density regions. It used when the clusters are very irregular, and when noise and outliers are available.

Advantages  Does not require a-priori specification of number of clusters.  Able to recognize noise data while clustering. Disadvantages  DBSCAN algorithm fails in case of varying density clusters.  Fails in case of neck type of dataset.  Does not work well in case of large dimensional data. Grid-based methods: Grid based methods quantize the object space into a finite number of cells that form a grid structure. [13] [17] Different shaped clusters in multi-density environments are handled by grid based method. Based on number of points mapped to one grid, a grid density is defined.

Advantages:  Fast processing time when accessing the data.  It is dependent on the number of cells in each dimension .

57


Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.55-59 SSN: 2278-2419

Model based methods : The data from a circulation that is mixture of two or more clusters is called as model based clustering. The density function is used to locate the cluster. Spatial distribution is reflected on the data points. Standard statistics, taking outlier or noise into report also provided by this method.

Constraint based methods: This technique, the grouping is prepared by the method for fuse of client or application-arranged requirements. An crucial manages the folks need or the properties of wanted grouping come about.[7] It offer us with an instinctive method for connection with the clustering techniques.

III. CONCLUSION In this paper, we made a study about clustering and its methods. We would continue to practise our research in clustering algorithms as applied to educational background and will also be working towards generating a unified clustering approach such that it could easily be applied to any educational institutional dataset without any much overhead. References [1] S. Muthukumaran, Dr. P. Suresh, J. Amudhave "Sentimental Analysis on online product reviews using LSSVM method" Journal of Advanced Research in Dynamical and Control Systems Issue: 12-Special Issue Year: 2017 Pages: 1342-1352 [2] S. Muthukumaran, Dr. P. Suresh, J. Amudhave " A State of art approaches on Sentimental Analysis Techniques” Journal of Advanced Research in Dynamical and Control Systems Issue: 12-Special Issue Year: 2017 Pages: 1353-1370 [3] S. Muthukumaran, Dr. P. Suresh “Text Analysis for Product Reviews for Sentiment Analysis using NLP Methods” International Journal of Engineering Trends and Technology (IJETT) – Volume 47 Number 8 May 2017, Page: 474-480 [4] S. Muthukumaran, Dr. P. Suresh “Sentiment Analysis of Product Reviews Using LDA Method based on Customer Text Content” International Journal of Computer Science & Engineering Technology(IJCSET) – December 2015 | Vol 5, Issue 12, 427-431. [5] S.Muthukumaran, A. Victoria Anand Mary, M.Valli “ A Novel Approach to Data Discretization Using Clustering Techniques in Mining of High Dimensional Data” Published in International Journal of Trend in Research and Development (IJTRD) – September 2016 | Special Issue Page:62-67

58


Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.55-59 SSN: 2278-2419

[6] S.Muthukumaran, Dr.P.Suresh and A. Victoria Anand Mary “Sentiment Analysis for Online Product Reviews using NLP Techniques and Statistical Methods” International Journal of Mathematics And its Applications Volume 4, Issue 4 (2016), Page: 303-312. (Special Issue) [7] Abhishekkumar, K. (2017). Survey on K-Means Clustering Algorithm, pp.218–221. https://doi.org/10.21884/IJMTER.2017.4143.LGJZD [8] Bansal, A. (2017). Improved K-mean Clustering Algorithm for Prediction Analysis using Classification Technique in Data Mining Improved K-mean Clustering Algorithm for Prediction Analysis using Classification Technique in Data Mining, (January). https://doi.org/10.5120/ijca2017912719 [9] Patel, K. M. A., & Thakral, P. (2016). The Best Clustering Algorithms in Data Mining, (2), 2042–2046. [10] Pal, J., Parmar, S., Shivank, P., Soni, K., & Jain, P. A. (2016). A Survey on K-Means Clustering Algorithms for Large Datasets, 5(9), 48–53. https://doi.org/10.17148/IJARCCE.2016.5912 [11] Dhanachandra, N., Manglem, K., & Chanu, Y. J. (2015). Image Segmentation using K -means Clustering Algorithm and Subtractive Clustering Algorithm. Procedia - Procedia Computer Science, 54, 764–771. https://doi.org/10.1016/j.procs.2015.06.090 [12] Joshi, K., Gupta, H., Chaudhary, P., & Sharma, P. (2015). Survey on Different Enhanced K-Means Clustering Algorithm, 27(4), 178–181. [13] Mythili, S., & Madhiya, E. (2014). An Analysis on Clustering Algorithms in Data Mining Abstract :, 3(1), 334–340. [14] Mandloi, M. (2014). A Survey on Clustering Algorithms and K-Means, 2–6. [15] Agglomerative Hierarchical Clustering Algorithm- A Review. (2013), 3(3), 2–4. [16] Computing, M. (2013a). A Survey on Clustering Principles with K-means Clustering Algorithm Using Different Methods in Detail, 2(May), 327–331. [17] Computing, M. (2013b). A Survey on Various Clustering Techniques with K-means Clustering Algorithm in Detail, 2(April), 155–159. [18] Jain, N., & Srivastava, V. (2013). DATA MINING TECHNIQUES : A SURVEY PAPER, 116–119. [19] Mann, A. K., & Kaur, N. (2013). Grid Density Based Clustering Algorithm, 2(6), 2143–2147. [20] Verma, M., Srivastava, M., Chack, N., Diswar, A. K., & Gupta, N. (2012). A Comparative Study of Various Clustering Algorithms in Data Mining Manish Verma , Mauly Srivastava , Neha Chack , Atul Kumar Diswar , Nidhi Gupta, 2(3), 1379–1384. [21] Ramageri, M. (n.d.). DATA MINING TECHNIQUES AND APPLICATIONS, 1(4), 301–305. [22] Ranjini, K. (2011). Performance Analysis of Hierarchical Clustering Algorithm, 1011, 1006–1011. [23] Mohan, V. (2010). A Survey of Grid Based Clustering Algorithms, 2(8), 3441–3446. [24] Ayr, S. (2006). Introduction to partitioning-based clustering methods with a robust example (Vol . 35).

59


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.