GRD Journals- Global Research and Development Journal for Engineering | Volume 6 | Issue 1 | December 2020 ISSN- 2455-5703
Machine Learning Algorithms to Improve the Performance Metrics of Breast Cancer Diagnosis
Dr. V. S. R. Kumari, Principal (Professor), Department of Electronics and Communication Engineering, Sri Mittapalli Institute of Technology for Women / JNTU Kakinada
Suresh Veesa, Associate Professor, Department of Electronics and Communication Engineering, Sri Mittapalli Institute of Technology for Women / JNTU Kakinada
Srinivasa Rao Chevala, Assistant Professor, Department of Electronics and Communication Engineering, Sri Mittapalli Institute of Technology for Women / JNTU Kakinada
Abstract- Cancer, in all its types, is a common problem for people all over the world. Breast cancer in particular is the most frequent cancer type among women. Any advance in the diagnosis and prediction of cancer is therefore of capital importance for a healthy life. Cancer is a term for diseases in which abnormal cells divide without control and can invade nearby tissues. Cancer cells can also spread to other parts of the body through the blood and lymph systems, so detecting cancer in its early stages is important. There are several main types of cancer. Carcinoma, for example, is a cancer that begins in the skin or in tissues that line or cover internal organs. Breast cancer starts when cells in the breast begin to grow out of control. These cells usually form a tumor that can often be seen on an X-ray or felt as a lump. Machine learning techniques can contribute greatly to the early diagnosis and prediction of cancer. This paper focuses on breast cancer. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. The classification performance of the Support Vector Machine (SVM) and Artificial Neural Network (ANN) techniques is compared using accuracy, precision, recall and ROC area. The Support Vector Machine technique obtained the best performance, with the highest accuracy.
Keywords- Machine Learning, Breast Cancer, Classification, Early Diagnosis Necessary
I. INTRODUCTION
Cancer is the second leading cause of death worldwide and accounted for roughly 9.6 million deaths in 2018. Globally, about one in six deaths is caused by cancer. Almost 70 percent of cancer deaths occur in low- and middle-income countries. The most common cancer types among women are breast, lung and colorectal cancer, which together account for half of all new cancer cases in women. "People say that everyone knows someone who has breast cancer, but what I have seen is that everyone has someone close who has breast cancer." (Debbie Wasserman Schultz, US House of Representatives, breast cancer survivor). About 1.7 million breast cancer cases were diagnosed in 2012. In 2019, an estimated 271,270 new cases of invasive breast cancer were expected to be diagnosed in the United States, including 2,670 cases in men. Fig. 1 indicates that a large proportion of new cases end in death, and early detection can reduce this proportion. It also shows that breast cancer accounts for more cases than any other cancer. Early detection is therefore important for slowing the growth of breast cancer. Early diagnosis and screening are the two main methods for detecting breast cancer early. Over the last few decades, ML techniques have been widely applied in healthcare systems, especially for breast cancer (BC) diagnosis and prognosis. Traditionally, the accuracy of a patient's diagnosis depends on the physician's experience. That expertise is built up over many years of observing different patients' symptoms and confirmed diagnoses, and even then accuracy cannot be guaranteed. With the advent of computing technologies, it is now relatively easy to acquire and store large amounts of data.
All rights reserved by www.grdjournals.com
Machine Learning Algorithms to Improve the Performance Metrics of Breast Cancer Diagnosis (GRDJE/ Volume 6 / Issue 1 / 003)
Fig. 1: Statistics of cancer
Without the help of computers, it is practically impossible for health specialists to analyze these complex datasets, particularly when the analysis requires complex searching of the data. The intelligent healthcare system is therefore a valuable and important domain. An intelligent healthcare system can help physicians diagnose patients with greater accuracy or provide more meaningful benchmarks. It can also help people plan for their future physical condition. In this context, ML techniques can take over some complex manual work from physicians. For instance, text and voice analysis have been applied to identify and code patient emotions in response to healthcare professionals. Recently, ML techniques have played an important role in the diagnosis and prognosis of breast cancer. Classification techniques are applied to identify people with breast cancer, to differentiate benign from malignant tumours and to predict prognosis. Bektas and Babur studied the diagnosis of breast cancer using machine learning techniques. They used two Kent Ridge microarray datasets and applied support vector machine, K-star, random forest and voted perceptron algorithms. The random forest algorithm showed higher performance than the applied feature selection method [7]. Chen et al. applied the Support Vector Machine classification algorithm to the Wisconsin Diagnostic Breast Cancer dataset. In that study, the data were split into training and testing sets in 50-50%, 70-30% and 80-20% proportions, and accuracy values were calculated for each split [8]. In this paper, SVM and ANN, two of the most popular machine learning techniques, are applied to the Wisconsin Breast Cancer (Original) dataset, and their results are compared according to performance metrics. Accurate classification can further help clinicians prescribe the most appropriate treatment regime.
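The train/test protocol described for [8] can be outlined in a few lines. The following is a minimal sketch that assumes the data are held as a list of (feature_vector, label) pairs. It shuffles a copy of the samples with a fixed seed and cuts it at the 50-50, 70-30 and 80-20 ratios. The helper name `split_dataset`, the seed and the toy data are hypothetical and are not taken from [8] or from this paper.

```python
import random

def split_dataset(samples, train_fraction, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test).

    train_fraction is e.g. 0.5, 0.7 or 0.8 for the 50-50, 70-30 and
    80-20 splits; the fixed seed makes the split reproducible.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, global state unaffected
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical toy data: 10 (feature_vector, label) pairs, label 1 = malignant.
data = [([float(i)], i % 2) for i in range(10)]
for fraction in (0.5, 0.7, 0.8):
    train, test = split_dataset(data, fraction)
    print(f"{fraction:.0%} split: {len(train)} train / {len(test)} test")
```

This split is purely random and does not stratify by class. When the benign and malignant classes are imbalanced, a stratified split that keeps the class ratio in both parts is usually preferable.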
Classification is a kind of complex optimization problem, and researchers have applied many ML techniques to solve it. The following sections explain the classification methods applied to breast cancer. We focus on artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs) and k-nearest neighbors (k-NN), as they are the main methods used in breast cancer diagnosis and prognosis. Scientists strive to find the algorithm that gives the most accurate classification result. However, the quality of the data also influences the classification result.
II. METHODOLOGY
A. Support Vector Machines (SVM)
Support Vector Machines (SVMs) were first introduced by Vladimir Vapnik, and they have performed well on many pattern recognition problems. SVMs often show better classification performance than many other classification techniques. SVM is one of the most popular machine learning classification techniques used for cancer diagnosis and prognosis. An SVM separates the classes with a hyperplane that is determined by the support vectors, the critical samples of each class that lie closest to the boundary. This hyperplane is the decision boundary between the two sample clusters. For example, an SVM can classify tumors as benign or malignant based on features such as the patient's age and the tumor size.
B. Artificial Neural Networks
An artificial neural network (ANN), also simply called a neural network, is a computational model based on the structure and functions of biological neural networks. Information that flows through the network affects the structure of the ANN, because a neural network changes, or in a sense learns, based on its inputs and outputs. ANNs are considered nonlinear statistical data-modeling tools that model complex relationships between inputs and outputs or find patterns in data. Activation functions are essential for an ANN to learn complex, nonlinear functional mappings between the inputs and the response variable, because they introduce nonlinear properties into the network. Their main purpose is to convert the input signal of a node in an ANN into an output signal, which is then used as the input to the next layer in the stack. Specifically, an ANN computes the sum of products of the inputs (X) and their corresponding weights (W), applies an activation function f to the result to get the output of that layer, and feeds that output as the input to the next layer.
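The layer computation just described, a sum of products of inputs and weights followed by an activation function, can be sketched in plain Python. This is a minimal illustration, not the network used in this paper. The per-neuron bias term, the sigmoid activation, the layer sizes and all weight values are assumptions chosen for illustration. Training by backpropagation is not shown.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation=sigmoid):
    """One fully connected layer.

    weights[j][i] is the weight from input i to neuron j. Each neuron forms
    the sum of products of the inputs and its weights (plus an assumed bias),
    and the activation function turns that sum into the neuron's output.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(z))
    return outputs

def forward(inputs, layers):
    """Feed the output signal of each layer in as the input of the next."""
    signal = inputs
    for weights, biases in layers:
        signal = dense_layer(signal, weights, biases)
    return signal

# Hypothetical 2-input network: 2 hidden neurons, then 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]
p_malignant = forward([1.0, 2.0], layers)[0]   # read as a malignancy score in (0, 1)
print(round(p_malignant, 3))
```

Without the activation function, stacking layers would only produce another linear map of the inputs. The nonlinearity is what lets the network model the complex input-output relationships mentioned above.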
Fig. 2: Artificial neural network data flow
III. RESULTS AND DISCUSSION
In this paper, we apply the SVM and ANN techniques to classify breast cancer cases and compare them to determine which machine learning method performs better.
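The paper does not state the SVM kernel, solver or hyperparameters. As a minimal sketch of the idea in Section II-A, the block below trains a linear soft-margin SVM (no kernel) by full-batch subgradient descent on the regularised hinge loss. The function names, the values of `lam`, `lr` and `epochs`, the label encoding (-1 benign, +1 malignant) and the two-feature toy data (imagined as scaled age and tumour size) are hypothetical choices. They are not the authors' setup or results.

```python
# Linear soft-margin SVM trained by full-batch subgradient descent on
#   lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b))
# Labels: -1 = benign, +1 = malignant (assumed encoding).

def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Hinge loss is non-zero only inside the margin or on the wrong side.
            if yi * decision_value(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Assign +1 (malignant) or -1 (benign) by the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical toy data: two scaled features per sample.
X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
for xi, yi in zip(X, y):
    # Functional margin y * (w . x + b); values close to 1 mark support vectors.
    print(xi, yi, predict(w, b, xi), round(yi * decision_value(w, b, xi), 2))
```

Samples whose margin stays above 1 contribute no gradient, so only the samples nearest the boundary (the support vectors) shape the final hyperplane. In practice a library implementation such as scikit-learn's `SVC`, possibly with a nonlinear kernel, would normally be used instead of this hand-written loop.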
Fig. 3: Confusion Matrix for SVM model
Fig. 4: ANN confusion matrix
The dataset is divided into training and test sets. Evaluated on the test set, the confusion matrices give an accuracy of 98%. Using ANN and SVM reduced the number of false negatives compared with classifiers such as Naïve Bayes and logistic regression.
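The metrics named in the abstract (accuracy, precision, recall and ROC area) and the false-negative count can all be derived from test-set predictions. This is a minimal sketch that assumes malignant is the positive class (label 1). The helper names are hypothetical, and the toy labels and scores are invented for illustration. They are not the counts behind Figs. 3 and 4.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` (malignant) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1  # missed malignant case: the error this paper aims to reduce
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # falls as false negatives rise
        "false_negatives": fn,
    }

def roc_auc(y_true, scores, positive=1):
    """ROC area via its rank (Mann-Whitney) form: the probability that a random
    positive scores higher than a random negative, with ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test-set labels and predictions (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

Recall is tp / (tp + fn), so reducing false negatives, as reported above, shows up directly as higher recall.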
IV. CONCLUSION
Breast cancer is the most frequent cancer type among women, so any advance in the diagnosis and prediction of cancer is of capital importance for a healthy life. The breast cancer dataset used in this paper was taken from the UCI Machine Learning Repository and does not contain any missing values. Predicting whether a tumour is benign or malignant is a classification problem. Classification algorithms such as Naïve Bayes and logistic regression produce a high number of false negatives, so support vector machine and artificial neural network models were used to reduce the false negatives. The accuracy obtained is 98%.
REFERENCES
[1] Bayrak, E. A., Kırcı, P., & Ensari, T. (2019). Comparison of machine learning methods for breast cancer diagnosis. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, pp. 1-3. doi: 10.1109/EBBT.2019.8741990.
[2] Yue, W., Wang, Z., Chen, H., Payne, A., & Liu, X. (2018). Machine learning with applications in breast cancer diagnosis and prognosis. Designs, 2, 13. doi: 10.3390/designs2020013.
[3] Cancer. https://www.who.int/en/news-room/fact-sheets/detail/cancer. Last access: 25.01.2019.
[4] Kumar, V., Mishra, B. K., Mazzara, M., Thanh, D. N. H., & Verma, A. (2020). Prediction of malignant and benign breast cancer: A data mining approach in healthcare applications. In S. Borah, V. Emilia Balas, & Z. Polkowski (Eds.), Advances in Data Science and Management, Lecture Notes on Data Engineering and Communications Technologies, vol. 37. Springer, Singapore. https://doi.org/10.1007/978-981-15-0978-0_43
[5] Siegel, R. L., Miller, K. D., & Jemal, A. (2018). Cancer statistics, 2018. CA: A Cancer Journal for Clinicians, 68 (1), pp. 7-30.
[6] Maity, N. G., & Das, S. (2017). Machine learning for improved diagnosis and prognosis in healthcare. In 2017 IEEE Aerospace Conference, pp. 1-9.
[7] Huang, M. W., Chen, C. W., Lin, W. C., Ke, S. W., & Tsai, C. F. (2017). SVM and SVM ensembles in breast cancer prediction. PLoS ONE, 12 (1).
[8] Bazazeh, D., & Shubair, R. (2016). Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In 2016 5th International Conference on Electronic Devices, Systems and Applications, pp. 1-4.
[9] Ahmad, L. G., Eshlaghy, A. T., Poorebrahimi, A., Ebrahimi, M., & Razavi, A. R. (2013). Using three machine learning techniques for predicting breast cancer recurrence. J Health Med Inform, 4 (124).
[10] Bektas, B., & Babur, S. (2016). Machine learning based performance development for diagnosis of breast cancer. Medical Technologies National Congress, pp. 1-4.
[11] Umadevi, S., & Marseline, K. J. (2017). A survey on data mining classification algorithms. In 2017 International Conference on Signal Processing and Communication, pp. 264-268.
[12] https://www.cancer.gov/publications/dictionaries/cancer-terms/def/cancer
[13] https://siteman.wustl.edu/glossary/cdr0000045333/
[14] http://www.omegahospitals.com/Breast-Onocology-Omega-Cancer-Hospital.pdf
[15] https://archive.ics.uci.edu/ml/datasets/Breast%2BCancer%2BWisconsin%2B(Diagnostic)