
Imperial Journal of Interdisciplinary Research (IJIR) Vol-3, Issue-2, 2017 ISSN: 2454-1362, http://www.onlinejournal.in

Anticipating the Chronic Kidney Disorder (CKD) using Performance Optimization in AdaBoost and Multilayer Perceptron

Dr. S. Sasikala, Dr. S. Jansi, Ms. S. Saranya, Ms. P. Deepika, Ms. A. Kiruthika

Asst. Prof., Research Department of Computer Science and Applications, Hindusthan College of Arts and Science, Coimbatore, India

Abstract: Chronic kidney disorder is a progressive loss of renal function. Classifying the disease using features such as blood pressure, albumin, and sugar helps in diagnosing it. Machine learning provides accuracy for the classification task, and meta-classifiers improve that accuracy further. This paper deploys AdaBoost and the Multilayer Perceptron for the purpose of anticipating the chronic kidney disorder.

Keywords: CKD, AdaBoost, Multilayer Perceptron, Machine Learning

1. Introduction

Chronic kidney disease (CKD), also known as chronic renal disease, is a progressive loss of renal function over a period of months or years. The symptoms of worsening kidney function are not specific and might include feeling generally unwell and experiencing a reduced appetite. Often, chronic kidney disease is diagnosed as a result of screening people known to be at risk of kidney problems, such as those with high blood pressure or diabetes and those with a blood relative with CKD. The disease may also be identified when it leads to one of its recognized complications, such as cardiovascular disease, anemia, or pericarditis. It is differentiated from acute kidney disease in that the reduction in kidney function must be present for over three months.

2. CKD Features

People with CKD suffer from accelerated atherosclerosis and are more likely to develop cardiovascular disease than the general population. Patients afflicted with both CKD and cardiovascular disease tend to have significantly worse prognoses than those suffering from cardiovascular disease alone.


Table 1: Feature Description

Feature   Description
age       age
bp        blood pressure
sg        specific gravity
al        albumin
su        sugar
rbc       red blood cells
pc        pus cell
pcc       pus cell clumps
ba        bacteria
bgr       blood glucose random
bu        blood urea
sc        serum creatinine
sod       sodium
pot       potassium
hemo      hemoglobin
pcv       packed cell volume
wc        white blood cell count
rc        red blood cell count
htn       hypertension
dm        diabetes mellitus
cad       coronary artery disease
appet     appetite
pe        pedal edema
ane       anemia
class     class (ckd / notckd)
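As a hedged illustration (not from the original paper), these attributes might be loaded for experimentation as follows; the file name and the assumption of a headerless CSV export of the UCI Chronic_Kidney_Disease data (with '?' marking missing values) are mine:

```python
# Minimal sketch: load the 400-instance CKD dataset with the 25 attributes
# of Table 1. Assumes a headerless CSV export of the UCI ARFF file in which
# '?' marks missing values (both assumptions, not from the paper).
import pandas as pd

cols = ["age", "bp", "sg", "al", "su", "rbc", "pc", "pcc", "ba", "bgr",
        "bu", "sc", "sod", "pot", "hemo", "pcv", "wc", "rc", "htn",
        "dm", "cad", "appet", "pe", "ane", "class"]

df = pd.read_csv("chronic_kidney_disease.csv", names=cols, na_values="?")
print(df.shape)                    # expected: (400, 25)
print(df["class"].value_counts())  # 250 ckd vs 150 notckd per the paper
```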

3. Machine Learners

Machine learning is mainly useful in cases where algorithmic or deterministic solutions are not available, i.e., where formal models are lacking or knowledge about the application domain is scarce. Its algorithms have been developed in a diverse set of disciplines such as statistics, computer science, robotics, computer vision, physics, and applied mathematics. Advantages of machine learning over statistical models include accuracy, automation, speed, customizability, and scalability.


Figure 1: Machine Learning

As medicine plays a great role in human life, automated knowledge extraction from medical data sets has become an important problem, and research on it is growing fast. All activities in medicine can be divided into six tasks: screening, diagnosis, treatment, prognosis, monitoring, and management. As the healthcare industry becomes more and more reliant on computer technology, machine learning methods are required to assist physicians in identifying and curing abnormalities at early stages. Medical diagnosis is one of the most important activities in medicine, and the accuracy of the diagnosis contributes to deciding on the right treatment and subsequently to curing the disease.

4. Multilayer Perceptron

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. The MLP utilizes a supervised learning technique called backpropagation for training the network. It is a modification of the standard linear perceptron and can distinguish data that are not linearly separable.

Activation function

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then it is easily proved with linear algebra that any number of layers can be reduced to the standard two-layer input-output model (see perceptron). What makes a multilayer perceptron different is that some neurons use a nonlinear activation function, which was developed to model the frequency of action potentials, or firing, of biological neurons in the brain. This function is modeled in several ways. The two main activation functions used in current applications are both sigmoids and are described by

$y(v_i) = \tanh(v_i)$ and $y(v_i) = (1 + e^{-v_i})^{-1}$,

in which the former is a hyperbolic tangent that ranges from -1 to 1, and the latter, the logistic function, is similar in shape but ranges from 0 to 1. Here $y_i$ is the output of the $i$th node (neuron) and $v_i$ is the weighted sum of the input synapses. Alternative activation functions have been proposed, including the rectifier and softplus functions. More specialized activation functions include radial basis functions, which are used in another class of supervised neural network models.

Layers

The multilayer perceptron consists of three or more layers (an input and an output layer with one or more hidden layers) of nonlinearly-activating nodes and is thus considered a deep neural network. Since an MLP is a fully connected network, each node in one layer connects with a certain weight $w_{ij}$ to every node in the following layer. Some authors do not include the input layer when counting the number of layers, and there is disagreement about whether $w_{ij}$ should be interpreted as the weight from $i$ to $j$ or the other way around.

Learning through backpropagation

Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning, and it is carried out through backpropagation, a generalization of the least mean squares algorithm in the linear perceptron. We represent the error in output node $j$ in the $n$th data point (training example) by $e_j(n) = d_j(n) - y_j(n)$, where $d_j(n)$ is the target value and $y_j(n)$ is the value produced by the perceptron. We then make corrections to the weights of the nodes that minimize the error in the entire output, given by

$\mathcal{E}(n) = \frac{1}{2} \sum_j e_j^2(n)$.

Using gradient descent, we find the change in each weight to be

$\Delta w_{ji}(n) = -\eta \, \frac{\partial \mathcal{E}(n)}{\partial v_j(n)} \, y_i(n)$,

where $y_i(n)$ is the output of the previous neuron and $\eta$ is the learning rate, which is carefully selected to ensure that the weights converge to a response quickly enough, without producing oscillations.



In programming applications, this parameter typically ranges from 0.2 to 0.8. The derivative to be calculated depends on the induced local field $v_j$, which itself varies. It is easy to prove that for an output node this derivative can be simplified to

$-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = e_j(n) \, \phi'(v_j(n))$,

where $\phi'$ is the derivative of the activation function described above, which itself does not vary. The analysis is more difficult for the change in weights to a hidden node, but it can be shown that the relevant derivative is

$-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = \phi'(v_j(n)) \sum_k \left( -\frac{\partial \mathcal{E}(n)}{\partial v_k(n)} \right) w_{kj}(n)$.

This depends on the change in weights of the $k$th nodes, which represent the output layer. So to change the hidden layer weights, we must first change the output layer weights according to the derivative of the activation function, and so this algorithm represents a backpropagation of the activation function. In particular, given a mini-batch of $m$ training examples, the following algorithm applies a gradient descent learning step based on that mini-batch (a code sketch follows the list):

1. Input a set of training examples.
2. For each training example $x$: set the corresponding input activation $a^{x,1}$, and perform the following steps:
   - Feedforward: for each $l = 2, 3, \ldots, L$ compute $z^{x,l} = w^l a^{x,l-1} + b^l$ and $a^{x,l} = \sigma(z^{x,l})$.
   - Output error: compute the vector $\delta^{x,L} = \nabla_a C_x \odot \sigma'(z^{x,L})$.
   - Backpropagate the error: for each $l = L-1, L-2, \ldots, 2$ compute $\delta^{x,l} = ((w^{l+1})^T \delta^{x,l+1}) \odot \sigma'(z^{x,l})$.
3. Gradient descent: for each $l = L, L-1, \ldots, 2$ update the weights according to the rule $w^l \to w^l - \frac{\eta}{m} \sum_x \delta^{x,l} (a^{x,l-1})^T$, and the biases according to the rule $b^l \to b^l - \frac{\eta}{m} \sum_x \delta^{x,l}$.
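Under stated assumptions (the layer sizes, the logistic $\sigma$, and a quadratic cost $C$ are illustrative choices, not taken from the paper), a minimal numpy sketch of this mini-batch step is:

```python
# Minimal numpy sketch of one mini-batch backpropagation step as listed above.
import numpy as np

rng = np.random.default_rng(0)
sizes = [24, 8, 2]  # assumed layout: 24 inputs, one hidden layer, 2 classes
weights = [rng.normal(0.0, 0.1, (n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))           # logistic activation
sigma_prime = lambda z: sigma(z) * (1.0 - sigma(z))  # its derivative

def minibatch_step(batch, eta):
    """One gradient descent step on a mini-batch of (x, y) column vectors."""
    m = len(batch)
    grad_w = [np.zeros_like(w) for w in weights]
    grad_b = [np.zeros_like(b) for b in biases]
    for x, y in batch:
        # Feedforward: keep every weighted input z and activation a per layer.
        a, activations, zs = x, [x], []
        for w, b in zip(weights, biases):
            z = w @ a + b
            zs.append(z)
            a = sigma(z)
            activations.append(a)
        # Output error for a quadratic cost: delta^L = (a^L - y) (.) sigma'(z^L).
        delta = (activations[-1] - y) * sigma_prime(zs[-1])
        grad_w[-1] += delta @ activations[-2].T
        grad_b[-1] += delta
        # Backpropagate the error through the hidden layers.
        for l in range(2, len(sizes)):
            delta = (weights[-l + 1].T @ delta) * sigma_prime(zs[-l])
            grad_w[-l] += delta @ activations[-l - 1].T
            grad_b[-l] += delta
    # Updates: w -> w - (eta/m) * sum_x delta (a)^T, and likewise for b.
    for l in range(len(weights)):
        weights[l] -= (eta / m) * grad_w[l]
        biases[l] -= (eta / m) * grad_b[l]
```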

5. AdaBoost

AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the Gödel Prize in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers, but in some problems it can be less susceptible to overfitting than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., its error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.
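As a hedged illustration of this behavior (a scikit-learn sketch on synthetic data, not the paper's WEKA experiment), boosting ten depth-1 decision stumps usually outperforms a single stump:

```python
# Sketch: a single weak learner vs. ten boosted weak learners.
# Data and parameters are illustrative assumptions, not from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=24, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)  # one weak learner
boosted = AdaBoostClassifier(n_estimators=10, random_state=0)  # stumps by default

print(cross_val_score(stump, X, y, cv=10).mean())    # weak-learner accuracy
print(cross_val_score(boosted, X, y, cv=10).mean())  # typically noticeably higher
```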

6. Evaluation Metrics

The evaluation metrics with their appropriate formulas are listed in Table 2; the experiments conducted in this work are assessed with these metrics. Information retrieval performance is evaluated with metrics such as Precision, Recall, and F-Measure. The total samples are divided into True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN). Taking Positive to mean identified and Negative to mean rejected:

- True Positive: number of correctly identified samples
- False Positive: number of incorrectly identified samples
- True Negative: number of correctly rejected samples
- False Negative: number of incorrectly rejected samples

Table 2: Evaluation Metrics
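For reference, the standard formulas for these metrics (which Table 2 presumably lists) are:

$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall (TPR)} = \frac{TP}{TP + FN},$

$\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$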





Figure 3: Performance Comparison

Results

AdaBoost results:

Time taken to build model: 0.09 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         396               99      %
Incorrectly Classified Instances         4                1      %
Kappa statistic                          0.9788
Mean absolute error                      0.0182
Root mean squared error                  0.0902
Relative absolute error                  3.8914 %
Root relative squared error             18.6237 %
Total Number of Instances              400

=== Confusion Matrix ===

   a   b   <-- classified as
 246   4 |  a = ckd
   0 150 |  b = notckd

Multilayer Perceptron results:

=== Run information ===

Scheme:     weka.classifiers.meta.AdaBoostM1 -P 100 -S 1 -I 10 -W weka.classifiers.functions.MultilayerPerceptron -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a
Relation:   Chronic_Kidney_Disease
Instances:  400
Attributes: 25

Time taken to build model: 13.07 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         399               99.75   %
Incorrectly Classified Instances         1                0.25   %
Kappa statistic                          0.9947
Mean absolute error                      0.0085
Root mean squared error                  0.0622
Relative absolute error                  1.8073 %
Root relative squared error             12.8559 %
Total Number of Instances              400

=== Confusion Matrix ===

   a   b   <-- classified as
 249   1 |  a = ckd
   0 150 |  b = notckd
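To make the evaluation protocol concrete, here is a hedged scikit-learn sketch of a comparable run (stratified 10-fold cross-validation with AdaBoost); the file name, the crude preprocessing, and the substitution of scikit-learn for WEKA are my assumptions, with the class labels ckd/notckd taken from the confusion matrices above:

```python
# Sketch: stratified 10-fold cross-validation of AdaBoost on the CKD data,
# mirroring (not reproducing) the WEKA protocol above.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

cols = ["age", "bp", "sg", "al", "su", "rbc", "pc", "pcc", "ba", "bgr",
        "bu", "sc", "sod", "pot", "hemo", "pcv", "wc", "rc", "htn",
        "dm", "cad", "appet", "pe", "ane", "class"]
df = pd.read_csv("chronic_kidney_disease.csv", names=cols, na_values="?")

# Crude preprocessing: one-hot encode categoricals, zero-fill missing values.
X = pd.get_dummies(df.drop(columns="class")).fillna(0)
y = df["class"]

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
pred = cross_val_predict(AdaBoostClassifier(n_estimators=10, random_state=0),
                         X, y, cv=cv)
print(accuracy_score(y, pred))
print(confusion_matrix(y, pred, labels=["ckd", "notckd"]))
```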


7. Conclusion

Machine learning helps in classifying the CKD dataset to anticipate the disease; the features used are listed in the paper. Two machine learners, the Multilayer Perceptron and AdaBoost, were deployed for the problem. Both give good Precision, Recall, TPR, F-Measure, and accuracy, with AdaBoost giving slightly better accuracy than the Multilayer Perceptron.


