
International Journal of Advanced Engineering Research and Science (IJAERS)

[Vol-3, Issue-2, Feb-2016] ISSN: 2349-6495

Heart Disease Prediction using K Nearest Neighbour and K Means Clustering

Dr. Mohanraj.E1, SubhaSuryaa.K2, Sudha.P3, Sarath Kumar.K4

1 Assistant Professor, Department of CSE, K.S.Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India
2, 3, 4 Students, Department of CSE, K.S.Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India

Abstract—Data mining is widely applied in fields such as e-business, marketing and retail, and this has led to its adoption in other industries, including the healthcare sector. The healthcare environment is information rich but knowledge poor. Data mining techniques are commonly used to extract useful knowledge from medical databases. The medical field has come a long way in treating patients with various kinds of diseases. Among the most menacing is heart disease, which cannot be detected with the naked eye and strikes suddenly once its limits are reached. Bad medical decisions can cause the death of a patient, which no hospital can afford. To achieve correct and cost-effective treatment, computer-based decision support systems can be developed to help make good decisions. Many hospitals use hospital information systems to manage their healthcare or patient data. These systems produce huge amounts of data in the form of images, text, charts and numbers. K Nearest Neighbor and K Means are used to support medical decision making efficiently.

Keywords—K Nearest Neighbor, K Means.

I. INTRODUCTION

Heart disease is one of the major causes of death. Most healthcare organizations predict this disease from the doctor's experience. Computer technology has now improved to the point where software can be developed to analyze problems in the human body. The healthcare industry can collect a large amount of data for every person, but those details are not mined for hidden information. In this case, advanced data mining techniques are used to evaluate the data set effectively, which helps us to take decisions clearly. Accurate data helps both clinicians and patients to identify individual risk. The K Nearest Neighbor and K Means algorithms are used to partition a number of observations according to the nearest mean value, that is, to classify a given data set into a certain number of clusters.
The aim is to minimize an objective function and to give a safety measure for affected persons. The system compares every healthcare detail with the original dataset, provides an accurate result and gives an alert to the affected persons.

II. EXISTING SYSTEM

The existing approach combines K Nearest Neighbor and a genetic algorithm to improve the classification accuracy on a heart disease data set. Genetic search is used as a goodness measure to prune redundant and irrelevant attributes and to rank the attributes that contribute most towards classification. The lowest ranked attributes are removed, and the classification algorithm is built on the evaluated attributes. This classifier is trained to classify the heart disease data set as either healthy or sick. The existing paper addresses only classification, not prediction, so some security issues arise. Because the existing system uses an older genetic algorithm without prediction, it leads to low security of the system.

III. PROPOSED SYSTEM

The existing system proposes only a classification technique. This paper proposes both classification and prediction by combining K Nearest Neighbor with K Means clustering. The combined approach aims to improve the classification accuracy on the heart disease data set, and the prediction can be used to provide security for heart disease medical data. The proposed system works as follows.

Proposed algorithm
Step 1) The data set is loaded
Step 2) Attributes are ordered based on their value
Step 4) Select the subset of higher ranked attributes
Step 5) Apply (KNN + K-Means) on the subset of attributes that maximizes classification accuracy
Step 6) Estimate the accuracy of the classifier, which measures the ability of the classifier to correctly classify unknown samples

The accuracy of the classifier is computed as

$$\text{Accuracy} = \frac{N}{K}$$

where
N = number of samples correctly classified in the test data
K = total number of samples in the test data

IV. MODULES DESCRIPTION

4.1 ANALYSING THE DATA SET

The goal of data set analysis is to discover useful information in a large set of data. Data is collected from a variety of sources. Analysis of data is a procedure of inspecting, cleaning, transforming and modeling the data. Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. Data initially obtained must be processed or organized for analysis. For instance, this may involve placing data into rows and columns in a table format. Each column represents a particular variable, and each row lists values for each of the variables, such as age, height, weight, heart beat rate, cholesterol level and other blood test information. The values may be numbers, such as real numbers or integers (for example, a person's height in centimetres), but may also be nominal data (not consisting of numerical values). The data set related to heart disease is collected from various sources and processed.

4.2 ADD PATIENT AND DOCTOR DETAILS

The add patient and doctor details module is designed to add, view and edit patient details, and also to add, view and edit doctor details. Patient details such as name, sex, age, date of birth, mobile number and address are collected. After these details are entered, a unique patient ID and password are generated. These are used to store further patient health information under the patient's own login. The doctor details include name, ID, age, specialization and experience. This information is stored in the database and used for further processing. This module is aimed at viewing the tests prescribed to the patients by the doctor and at entering the patient's and doctor's general information.
Each record of this entry is stored in the database for each patient and doctor.

4.3 ADD THE PATIENT HEALTH INFORMATION

In the add patient health information module, patients log in to their web page using their own login ID and password to enter their health-related information and their habits. The health-related information comprises name, sex, age, height, weight, heart beat rate, and blood test information such as albumin, cholesterol level, HbA1c, BUN and triglyceride, together with habits such as smoking, alcohol consumption and exercise, and other heart disease related information. The collected information is examined and compared against the sample data. High valued attributes are extracted from the database. These attributes may be the reason for heart disease.
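The paper does not state how attributes are ordered "based on their value", so the following is only a minimal sketch of the ranking and subset selection steps under an assumed criterion: each attribute is scored by the absolute difference between its mean over sick patients and its mean over healthy patients, divided by the attribute's range. The function names `rank_attributes` and `select_top`, the scoring rule and the toy records are all hypothetical and are not taken from the paper.

```python
from statistics import mean

def rank_attributes(records, labels):
    """Rank attribute names from most to least separating.

    ASSUMED criterion (not from the paper):
    score = |mean(sick) - mean(healthy)| / (max - min) of the attribute.
    Both classes must be present, otherwise mean() raises an error.
    """
    names = list(records[0].keys())
    scores = {}
    for name in names:
        values = [r[name] for r in records]
        sick = [v for v, y in zip(values, labels) if y == 1]
        healthy = [v for v, y in zip(values, labels) if y == 0]
        spread = max(values) - min(values)
        # A constant attribute cannot separate the classes, so it scores zero.
        scores[name] = abs(mean(sick) - mean(healthy)) / spread if spread else 0.0
    return sorted(names, key=lambda n: scores[n], reverse=True)

def select_top(records, ranked, k):
    """Keep only the k highest ranked attributes in every record."""
    keep = ranked[:k]
    return [{n: r[n] for n in keep} for r in records]

# Toy records: illustrative values only, not the paper's data set.
records = [
    {"age": 63, "cholesterol": 290, "height_cm": 170},
    {"age": 58, "cholesterol": 275, "height_cm": 165},
    {"age": 41, "cholesterol": 180, "height_cm": 168},
    {"age": 35, "cholesterol": 170, "height_cm": 171},
]
labels = [1, 1, 0, 0]  # 1 = sick, 0 = healthy

ranked = rank_attributes(records, labels)
subset = select_top(records, ranked, 2)
print(ranked)
```

Any other ranking criterion could be substituted in `rank_attributes` without changing the selection step.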


4.4 K NEAREST NEIGHBOUR AND K-MEAN CLUSTERING ALGORITHM IMPLEMENTATION

Heart disease arises when a blood vessel that normally delivers oxygen and body fluid to the heart becomes narrowed or completely blocked. Heart problems may be present at birth or acquired later in life. According to survey results, heart disease is the main cause of death in India. The following types of heart disease arise for various reasons:
1) Coronary heart disease
2) Cardiomyopathy
3) Cardiovascular disease
4) Ischemic heart disease
5) Heart failure
6) Hypertensive heart disease
7) Inflammatory heart disease
8) Valvular heart disease

Common risk factors of heart disease include:
1) High blood pressure
2) Abnormal blood lipids
3) Use of tobacco
4) Alcohol consumption
5) Obesity
6) Physical inactivity
7) Diabetes
8) Age
9) Gender
10) High cholesterol level
11) High triglyceride level
12) Lack of exercise

These factors are applied to an algorithm for heart disease prediction.

K NEAREST NEIGHBOR ALGORITHM

K nearest neighbor (KNN) is a simple algorithm that stores all available data and classifies new data based on a similarity measure. KNN is a straightforward classifier in which samples are classified based on the class of their nearest neighbors. The distance is calculated using a measure such as the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where x_i and y_i are the i-th attribute values of the two samples and n is the number of attributes. Using this formula, the distance is calculated for the patient attributes. The points are plotted on a graph. If the value is nearest to the sample value, the patient may be affected by heart disease.
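As a minimal sketch of the KNN step described above, the code below computes Euclidean distances, takes a majority vote among the k nearest training samples, and evaluates the accuracy measure from Section III (correctly classified test samples divided by total test samples). The function names, the choice k = 3, and the toy data are assumptions for illustration; attribute values are assumed to be already scaled to a common range.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(predicted, actual):
    """Accuracy = N / K: correctly classified test samples over all test samples.

    Note that K here is the test set size, not the k used by KNN.
    """
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy data: two scaled attributes per patient (illustrative values only).
train_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
train_y = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]
test_X = [[0.85, 0.75], [0.15, 0.2], [0.6, 0.6]]
test_y = ["sick", "healthy", "sick"]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
score = accuracy(predictions, test_y)
print(predictions, score)
```

An odd k is used so that a two-class vote cannot tie.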


Fig. 1: Accuracy graph for different clusters with nearest value

V. CONCLUSION

In this paper we presented an approach to predict heart disease using the K Nearest Neighbor and K Means algorithms. The proposed system groups the health-related attributes into a number of clusters, calculates the distance between the clusters using the KNN algorithm, and calculates the centroid value of the clusters using the K Means algorithm. These values help to predict the heart disease of a person. This heart disease prediction model helps the doctor to diagnose heart disease effectively with a lower number of attributes.

K MEANS CLUSTERING ALGORITHM

Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets. The similarity of two elements is determined by a distance measure, which influences the shape of the clusters. K Means is an algorithm for partitioning (or clustering) N data points into K disjoint subsets. The distance is measured using the objective function

$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$

where x_n is the n-th data point, S_j is the j-th cluster and \mu_j is the centroid of the data points in S_j.

Step 1: Decide on the value of k, the number of clusters
Step 2: Take any initial partition that classifies the data into k clusters
Step 3: Compute the distance of each sample from the centroid of each cluster
Step 4: Repeat Step 3 until convergence is achieved

Using this algorithm, the centroid is calculated for the patient's attributes. If the mean value of the patient is nearest to the sample mean value, the patient may be affected by heart disease.
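The four steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the authors' implementation: the initial centroids are simply the first k points (the paper allows any initial partition), each sample is reassigned to its nearest centroid after the distances are computed, and convergence means that no assignment changes. The names `k_means` and `objective` and the toy points are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, max_iter=100):
    """Return (centroids, assignment) after iterating until assignments stop changing."""
    # ASSUMPTION: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Step 3: distance of every sample to every centroid, keep the nearest.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        # Step 4: stop once the partition no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return centroids, assignment

def objective(points, centroids, assignment):
    """J: sum of squared distances from each point to its cluster centroid."""
    return sum(euclidean(p, centroids[a]) ** 2 for p, a in zip(points, assignment))

# Toy points forming two well separated groups (illustrative values only).
points = [[1.0, 1.0], [5.0, 5.0], [1.0, 2.0], [5.0, 6.0], [2.0, 1.0], [6.0, 5.0]]
centroids, assignment = k_means(points, k=2)
J = objective(points, centroids, assignment)
print(assignment, centroids, J)
```

The result depends on the initial centroids, so in practice several initializations are usually tried and the one with the lowest J is kept.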

Mean value calculation.

Fig. 2: Accuracy graph for different clusters with their mean value

www.ijaers.com

