International Journal of Engineering, Management & Sciences (IJEMS) ISSN-2348-3733, Volume-2, Issue-5, May 2015
Mining Student Learning Behavior in Library Usage Using K-Means Algorithm (Classification Algorithm)

Krutibash Nayak

Abstract— In this paper we use the K-means classification data mining algorithm to classify students based on their library usage data and the marks obtained in their respective courses. We have used a specific mining tool that makes the configuration and execution of data mining techniques easier for instructors. We have used real data from institutions and courses with university students. We have also applied pre-processing techniques to the original numerical data in order to verify whether better classifier models are obtained. Finally, we claim that a classifier model appropriate for educational use has to be both accurate and comprehensible to instructors and management in order to be of use for decision making.

Index Terms— K-means, Data Mining, classification, web mining
Manuscript received May 14, 2015.
Krutibash Nayak, Student, M.Tech, Suresh Gyan Vihar University, Jaipur, Rajasthan, India

I. INTRODUCTION

Data mining is a new kind of information processing technology. It can extract interesting patterns or knowledge hidden in large amounts of incomplete, noisy, ambiguous and random practical application data: knowledge that people do not know in advance but that has potential application.

Many methods are being developed to study student behaviour in e-learning and virtual classrooms, but few are being applied to classroom teaching and library usage. Most institutions spend a large share of their financial resources on the library, but it is difficult to calculate how much this affects students' study and the growth of the institution. Faculty members find it difficult to teach students if they do not know how the students participate during lectures.

Learning portfolios generally refer to any records created in the learning process, such as notes, assignments, test papers, and reports. Through computer techniques, the students' behaviour, such as the time taken to read learning materials (like books and journals), the time spent in the online library, logon frequency, assignments, and records of online conversations with others on the learning platform, can be recorded in a database. Thus, the learning portfolios of students participating in online learning include detailed raw data. If we could analyse the correlation between the students' learning behaviour and learning achievements, we would enable teachers to grasp the students' overall and personal learning situations to a greater extent. The teachers could also provide immediate guidance in order to improve the students' learning outcomes.

Data mining is also known as knowledge discovery in databases (KDD; Baker & Yacef, 2009; Han & Kamber, 2006) and follows the standard KDD process:
1. data cleaning and integration
2. selection and transformation
3. applying data mining algorithms, and evaluation and presentation
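As a rough illustration of the KDD steps applied to the kind of library records described above, the sketch below cleans a few records, selects two attributes, and computes the Pearson correlation between library hours and marks. The records, the field layout, and the function names (`clean`, `select`, `pearson`) are all hypothetical and are not taken from the data or the mining tool used in this study. Pearson correlation is chosen here only as one simple way of relating behaviour to achievement.

```python
from math import sqrt

# Hypothetical records, invented purely for illustration:
# (student_id, library_hours_per_week, logons_per_month, marks_percent).
# None marks a value that was never recorded.
RAW_RECORDS = [
    ("s01", 2.0, 4, 55.0),
    ("s02", 6.5, 12, 78.0),
    ("s03", None, 7, 64.0),  # missing library hours, removed during cleaning
    ("s04", 4.0, 9, 70.0),
    ("s05", 1.0, 2, 48.0),
    ("s06", 8.0, 15, 85.0),
]


def clean(records):
    """KDD step 1 (cleaning): drop records that contain a missing value."""
    return [r for r in records if None not in r]


def select(records):
    """KDD step 2 (selection): keep only library hours and marks."""
    hours = [r[1] for r in records]
    marks = [r[3] for r in records]
    return hours, marks


def pearson(xs, ys):
    """KDD step 3 (analysis): Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


cleaned = clean(RAW_RECORDS)
hours, marks = select(cleaned)
r = pearson(hours, marks)
print(f"students kept: {len(cleaned)}, correlation: {r:.2f}")
```

On real data, a value of r close to 1 would suggest that heavier library use goes along with higher marks, although it would not by itself show that one causes the other.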
Figure 1 above shows the overall architecture, which integrates all the phases of software development. All the users mine information, and various data mining algorithms are used to extract data from the databases listed in the figure. Information migration takes place while the users extract the data as and when required [Namo Narayan, 2008].
II. K-MEANS ALGORITHM

The k-means algorithm is an iterative algorithm that gains its name from its method of operation. The algorithm clusters observations into k groups, where k is provided as an input parameter. It assigns each observation to a cluster based upon the observation’s proximity to the mean of the cluster. The cluster’s mean is then recomputed and the process begins again. Here’s how the algorithm works:
1. The algorithm arbitrarily selects k points as the initial cluster centers (“means”).
2. Each point in the dataset is assigned to the closest cluster, based upon the Euclidean distance between each point and each cluster center.
3. Each cluster center is recomputed as the average of the points in that cluster.
4. Steps 2 and 3 repeat until the clusters converge.
Convergence may be defined differently depending upon the implementation, but it normally means that either no
www.alliedjournals.com