IJSRD - International Journal for Scientific Research & Development| Vol. 4, Issue 05, 2016 | ISSN (online): 2321-0613
A Comparative Analysis of K-Means and K-Medoids Algorithm for Educational Data
Dr. (Mrs.) Ananthi Sheshasaayee¹, C. Kabila²
¹Research Supervisor, ²Research Scholar
¹,²Department of Computer Science & Engineering, Quaid E Millath Govt. College for Women, Chennai

Abstract— Data mining extracts specific information from large databases. It is useful in many fields; applied to education, it is known as Educational Data Mining (EDM). EDM deals with large volumes of education-related data, which can be used to predict student performance, a challenging task. Predicting performance lets the trainer monitor each student closely and also helps in keeping track of curriculum patterns. Many clustering algorithms can be used for this purpose. In this work two of them, k-means and k-medoids, are used to assess students' performance and the difficulty they have with individual questions. Performance is calculated from the marks each student secured in each question, and the two algorithms are compared to determine which is better suited to predicting student performance. The RapidMiner tool is used.
Key words: Educational Data Mining, K-Means, K-Medoids, RapidMiner

I. INTRODUCTION
Data mining is applied in many fields. It helps both industry and researchers find solutions to their problems. In education, data mining plays a major role in predicting performance, where it is called Educational Data Mining (EDM). Education is essential in every person's life and has to be provided in a proper and effective manner. In higher education, student performance is especially important, because the quality of higher education is judged by it. Knowing how students perform can motivate both educators and learners [1].
Data mining helps uncover hidden information about students in a large database. For example, students' grades are identified for recruitment processes [2]. Data mining is also known as Knowledge Discovery in Databases (KDD), in which useful information is extracted from a large database. For a given dataset, the mean values of each cluster can be computed and viewed as a centroid table [3]. Interest in applying data mining to education is growing. This emerging field, Educational Data Mining (EDM), is concerned with discovering knowledge from data that originate in educational settings [4]. EDM methods are drawn from a variety of literatures, including data mining, machine learning, information visualization, and computational modelling [5]. Clustering techniques such as k-means, k-medoids, agglomerative, and divisive clustering can be used in EDM to predict student performance. Managing and processing student data can help motivate students and improve current trends in higher education; data mining is used to manage these data [6]. The main objective of data mining in higher education is to provide quality education to students and to improve managerial decisions [7].
The main objective of this research is to use clustering to predict the performance of students and to find where they have difficulty in answering questions. Clustering assigns each object in a set to a specific group [12]. The k-means and k-medoids algorithms are applied and compared, and k-means is found to work better for the student data. Section II describes the methodology and Section III the tools and techniques used. Section IV covers the algorithms, the tool, the performance evaluation, and the result. Section V concludes and outlines future work.

II. METHODOLOGY
This research was undertaken after a review of related studies and discussion. Its aim is to find the algorithm best suited to student data. Student data were collected and converted into the standard format required by the RapidMiner tool, and then given as input to the tool.
Fig. 1: Methodology

III. TOOLS AND TECHNIQUES USED
Data mining techniques are applied to educational data. Clustering is used to assess the performance of the students; k-means and k-medoids are the two clustering algorithms selected for this study. RapidMiner is used for the implementation.

IV. K-MEANS AND K-MEDOIDS ALGORITHM IN RAPIDMINER
Clustering identifies classes of similar objects. It can reveal dense and sparse regions and uncover correlations among data attributes [8]. K-means is a centroid-based algorithm that groups similar data into the same cluster. Here it is applied to group the students' data and to predict their performance.
All rights reserved by www.ijsrd.com
A Comparative Analysis of K-Means and K-Medoids Algorithm for Educational Data (IJSRD/Vol. 4/Issue 05/2016/412)
K-means is run a number of times, each time choosing random data points as the initial centres. It can cluster large data sets and reduces the time required for the process [9][11]. K-means is an iterative algorithm that takes its name from its method of operation: each cluster is represented by the mean of its members [10]. K-medoids is a related algorithm that likewise minimizes the distance between points and their cluster centre, but each centre (medoid) is an actual data point rather than a mean. In both algorithms the value of k is specified first, and the data are then clustered into k groups.
RapidMiner is used to predict the performance of students and to find which algorithm suits educational data. RapidMiner supports many clustering algorithms; of these, k-means and k-medoids are compared. The clustering algorithms require the data set to be in the expected format. RapidMiner accepts many file formats; here the data are first loaded with the Read Excel operator. Fig. 2 shows the metadata view of the data set after applying the k-means and k-medoids algorithms.
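The k-means procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the standard assign-and-update loop, not RapidMiner's implementation; the `marks` data, the function names, and the `seed` and `max_iter` parameters are hypothetical and chosen only for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length mark vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical marks: one row per student, one column per question.
marks = [[9, 8], [8, 9], [9, 9], [2, 3], [3, 2], [2, 2]]
centroids, labels = kmeans(marks, k=2)
```

Because the starting centres are random, different seeds can give different clusterings, which is why the algorithm is usually run several times and the best run kept.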
Fig. 2: Meta data view of a mark dataset
RapidMiner produces a centroid table for both the k-means and k-medoids algorithms. The centroids are computed using the Euclidean distance measure or another metric. The centroid table is derived from the data points in the data table, and each point is assigned in turn to its nearest centroid. The centroid tables for k-means and k-medoids are shown in Fig. 3 and Fig. 4.
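For k-medoids, each cluster centre is one of the original records rather than a computed mean. The sketch below uses a simple alternating scheme (assign points, then pick the most central member of each cluster as its new medoid). It is an assumption for illustration only: it is not the classical PAM swap procedure and not necessarily what RapidMiner does, and all names and data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length mark vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmedoids(points, k, max_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid_indices, labels)."""
    rng = random.Random(seed)
    # Medoids are stored as indices into `points`.
    medoids = rng.sample(range(len(points)), k)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest medoid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, points[medoids[c]]))
            for p in points
        ]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # possible only with duplicate points
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all members of its cluster.
            best = min(
                members,
                key=lambda i: sum(euclidean(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, labels


# Hypothetical marks: one row per student, one column per question.
marks = [[9, 8], [8, 9], [9, 9], [2, 3], [3, 2], [2, 2]]
medoids, labels = kmedoids(marks, k=2)
medoid_rows = [marks[i] for i in medoids]  # rows of the medoid table
```

Every row of `medoid_rows` is a real student record, so the resulting table can be read directly as "representative students", whereas a k-means centroid may not match any actual student.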
Fig. 3: Centroid table of k-means algorithm
Fig. 4: Centroid table of k-medoids algorithm
A. Performance Evaluation of K-Means and K-Medoids
Performance is calculated after the cluster model has been built. The performance operator measures the distance between the points of each cluster and its centroid and averages these values. It reports either the average within-centroid distance or the Davies-Bouldin index. The average performance of k-means and k-medoids on the same data is shown in Fig. 5 and Fig. 6.
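The two measures named above can be written out as follows. This sketch uses the common textbook definitions with plain Euclidean distance; RapidMiner's exact conventions (for example, whether distances are squared, or the sign of the reported value) may differ, so the numbers are not meant to reproduce its output. The function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scatter(points, labels, centres):
    """Per-cluster average distance from the points to their centre."""
    result = []
    for c, centre in enumerate(centres):
        dists = [euclidean(p, centre) for p, lab in zip(points, labels) if lab == c]
        result.append(sum(dists) / len(dists))
    return result


def avg_within_distance(points, labels, centres):
    """Mean distance from every point to the centre of its own cluster."""
    total = sum(euclidean(p, centres[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def davies_bouldin(points, labels, centres):
    """Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / distance(centre_i, centre_j)."""
    s = scatter(points, labels, centres)
    k = len(centres)
    worst = [
        max(
            (s[i] + s[j]) / euclidean(centres[i], centres[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy data: two tight clusters whose centres are 10 units apart.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
centres = [[0, 1], [10, 1]]

avg = avg_within_distance(points, labels, centres)  # 1.0
db = davies_bouldin(points, labels, centres)        # 0.2
```

For both measures a smaller value indicates tighter, better separated clusters, which is the basis on which two clustering results over the same data can be compared.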
Fig. 5: K-means performance evaluation
Fig. 6: K-medoids performance evaluation
The performance value is the average distance for the points of each cluster in the table. These averages are used to find the better algorithm for educational data: the average value of each cluster is plotted on a graph to compare the performance of the two algorithms.
Fig. 7: Graphical representation of k-means and k-medoids performance
B. Result
The graph shows that k-means yields smaller distances between the points and their centroids than k-medoids. K-means therefore works better for the educational data in assessing student performance, and by predicting performance the trainer can help weak students improve.

V. CONCLUSION AND FUTURE WORK
Forecasting student performance helps students and trainers act on the results in future. Data mining is among the most commonly used techniques in the field of education. Within data mining, clustering is frequently used to group data, and k-means and k-medoids are two clustering algorithms. Of the two, k-means works better for predicting results on the educational data: it gives smaller distances between data values and their centroids than k-medoids. In future, this work can be extended to compare hierarchical clustering algorithms using a larger set of student data.
REFERENCES
[1] Amirah Mohamed Shahiri, Wahidah Husain, Nur’aini Abdul Rashid, “A Review on Predicting Student’s Performance using Data Mining Techniques”, The Third Information Systems International Conference, 2015. Published by Elsevier B.V.
[2] Umamaheswari. K, S. Niraimathi, “A Study on Student Data Analysis Using Data Mining Techniques”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 8, August 2013, ISSN: 2277 128X
[3] L. Arockiam, S. Charles, I. Carol, P. Bastin Thiyagaraj, S. Yosuva, V. Arulkumar, “Deriving Association between Urban and Rural Students Programming Skills”, (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 03, 2010, 687-690
[4] Parneet Kaur, Manpreet Singh, Gurpreet Singh Josan, “Classification and prediction based data mining algorithms to predict slow learners in education sector”, 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015)
[5] Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy, “An Educational Data Mining System for Advising Higher Education Students”, International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol. 7, No. 10, 2013
[6] P. Veeramuthu, Dr. R. Periasamy, “Application of Higher Education System for Predicting Student Using Data mining Techniques”, International Journal of Innovative Research in Advanced Engineering (IJIRAE), Volume 1, Issue 5 (June 2014)
[7] Mohammed M. Abu Tair, Alaa M. El-Halees, “Mining Educational Data to Improve Students’ Performance: A Case Study”, International Journal of Information and Communication Technology Research, Volume 2, No. 2, February 2012, ISSN 2223-4985
[8] Brijesh Kumar Baradwaj, Saurabh Pal, “Mining Educational Data to Analyze Students Performance”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011
[9] Shiwani Rana, Roopali Garg, “Evaluation of Student’s Performance of an Institute Using Clustering Algorithms”, International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 11, Number 5 (2016), pp 3605-3609
[10] Kehar Singh, Dimple Malik, Naveen Sharma, “Evolving limitations in k-means algorithm in data mining and their removal”, IJCEM International Journal of Computational Engineering & Management, Vol. 12, April 2011
[11] Mahendra Tiwari, Randhir Singh, Neeraj Vimal, “An Empirical Study of Applications of Data Mining Techniques for Predicting Student Performance in Higher Education”, IJCSMC, Vol. 2, Issue 2, February 2013, pg. 53-57
[12] Monika Goyal and Rajan Vohra, “Applications of Data Mining in Higher Education”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No 1, March 2012