Study on Data Mining Techniques for Cancer Prediction System

Page 1

Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.60-63 SSN: 2278-2419

Study on Data Mining Techniques for Cancer Prediction System Roseline Jecintha I1, Poonguzhali V2 Assistant Professor, 2Research Scholar PG & Research Department of Computer Applications, St.Joseph’s College of Arts & Science, Cuddalore Email: roseline_jessy@yahoo.co.in, mpss1806@gmail.com 1

. Abstract - Cancer occurs when changes called mutations obtain situate in genes that control cell growth. The mutations permit the cells to divide and multiply in an uncontrolled, hectic way. The cells are multiplying, producing copies that obtain increasingly more abnormal. In the majority cases, the cell copies finally form a tumor. Cancer is the most vital reason for death in the world. The most of common cancers diagnosed in the world are those of the breast, lung, and blood cancers. The prognosis of different cancer is extremely variable. Several cancers are curable with early detection and treatment. Cancers that are aggressive at a later stage may be more difficult to cure. Knowledge Discovery in the database (KDD), which includes data mining techniques are has been used in healthcare. This study paper we have discussed various data mining techniques that have been utilized for the breast cancer, lung cancer, blood cancer. We focus on present research being carried out using the data mining approach to enhance the breast, lung, blood cancers risk factors are diagnosis and prognosis. Keywords - Breast cancer, lung cancer, blood cancer, data mining and KDD. I. INTRODUCTION One of the popular research tools for medical field is Data Mining. Researchers to identify and relationships among a huge number of variables. Able to predict the result of a disease using the historical cases stored within the dataset. The aim of our study is to use various data mining techniques to classify the risk factors of breast cancer, lung cancer, and the blood cancer. This paper reviews the classification techniques in health domain. 1.1 Knowledge discovery process and data mining The term Knowledge Discovery in Databases or KDD for short, refers to the wide route of finding knowledge in data, and emphasizes the "high-level" function of particular data mining methods. The combined aim of the KDD process is to dig up knowledge from the large repositories. Data mining methods to pull out the information according to the condition of actions, using a database along with any vital preprocessing, sampling, and transformations of that database. 1.2.1 The list of steps involved in the knowledge discovery process:  Data Cleaning - The noise and inconsistent data is removed.  Data Integration - many data sources are combined.  Data Selection - Data relevant to the analysis task are retrieved from the database.  Data Transformation - Data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.  Data Mining - smart methods are applied in order to pull out data patterns.  Pattern Evaluation - Data patterns are evaluated.  Knowledge Presentation - Knowledge is represented. 1.2.2. How to use KDD and data mining in research 1. Developing an understanding of the application domain, the related prior information, and the goals of the user. 2. Creating a target data set: selecting a data set, or focusing on a separation of variables, or data samples, on which detection is to be performed

60


Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.60-63 SSN: 2278-2419

3. Data cleaning and preprocessing. Strategies for handling missing data fields. Accounting for time series information and known changes 4. Data reduction and projection. Using dimensionality decrease or transformation methods to decrease the useful number of variables under reflection or to find invariant representations for the data. 5. Choosing the data mining task. Decide whether the goal of the KDD process is classification, regression, clustering, etc. 6. Choosing the data mining algorithms. Deciding which models and parameters may be appropriate. Matching a particular Data mining method with the overall criteria of the KDD process. 7. Data mining. Searching for patterns of interest in a particular representational form Consolidating discovered knowledge. 1.2.3. Classification and Prediction Classification is the route of finding a model that describes the data classes. The reason is to be able to use this model to guess the class of objects whose class label is unidentified. This obtained model is based on the analysis of sets of training data. The obtained model can be presented in the following forms:  Classification (IF-THEN) Rules  Decision Trees  Mathematical Formulae  Neural Networks Classification - It predicts the class of objects whose class label is unidentified. Its purpose is to find an obtained model that describes and distinguishes data classes or concepts. The Derived Model is based on the study set of training data i.e. the data object whose class label is well known. Prediction - It is used to predict missing or engaged numerical data values rather than class labels. Regression study is usually used for prediction. Prediction can also be used for recognition of distribution trends based on available data. II. PRECEDING STUDIES OF CANCER PREDICTION AUTHORS Jaimini Majali, Rishikesh (2015)

PAPER Data Mining Techniques For Diagnosis And Prognosis Of Cancer

Dr.T.Christopher, J.Jamera banu(2016)

Study of Classification algorithm for lung cancer prediction

Alaa M. El-Halees, Asem H. Shurrab (2017)

Blood tumor prediction using data mining techniques

TECHNIQUE USED The Developing system with two approaches of data mining association rule mining and classification techniques, FPgrowth algorithm for diagnosis of cancer. Applied Wisconsin dataset to FP –growth algorithm Using Wisconsin data set. Classification algorithms such as Naïve Bayes, the Bayesian network and J48 algorithms are used to analyze lung cancer prediction.

RESULTS Among the various data mining classifiers and soft computing approaches, the decision tree is found to be the best predictor on Wisconsin data set.

Rule Induction, Association rules, Deep learning.

The experiments gave different accuracy ratio according to the type of blood disease and the type of the classifier. We found that the deep learning classifier has the best ability to detect tumor from blood samples.

61

The naïve Bayes algorithm gives a better performance over the other classification algorithm.


Integrated Intelligent Research (IIR)

A. Priayanga, S. Prakasam.(2013)

Durgalaksmi.R, Mannar mannan. J (2016)

Jaimini Majali, Rishikesh Niranjan (2015)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.60-63 SSN: 2278-2419

Effectiveness of data mining based cancer prediction system(2013). Prognosis of blood carcinoma using data mining techniques (2016).

J48, ID3, NaĂŻve Bayes.

Compared to those three algorithms. Id3 provides the highest accuracy.

Clustering, Ontology.

Data mining techniques for diagnosis and prognosis of cancer (2015).

Classification, association, FPGrowth, decision tree

Early prediction of blood cancer by classifying and clustering the data. Semantic relationships among the datasets are examined closely using ontology. The accuracy of the diagnosis analysis of various data mining classification techniques is highly acceptable and can help the medical professionals in decision making for early diagnosis.

Classification,

III. BREAST CANCER Breast cancer is the cancer that has spread from the ducts or glands to other parts of the breast. More than 40,000 women were expected to die from the disease. Breast cancer can also be diagnosed in men. Risk factors for breast cancer Age, drinking alcohol, Having dense breast tissue, Gender genes, Giving birth at an older age, never being pregnant these are included in risk factors for breast cancer. IV. LUNG CANCER Lung cancer is the uncontrolled growth of anomalous cells that starts in one or both lungs; regularly in the cells that line the air passages. The abnormal cells do not expand into healthy lung tissue; they divide quickly and form tumors. Two types of lung cancer are SCLC and NSCLC. SCLC stands for Small Cell Lung Cancers NSCLC stands for Non-Small Cell Lung Cancers Risk factors for lung cancer Smokings, passive smoking, exposure to asbestos fibers, exposure to radon gas, familial predisposition are included in risk factors for lung cancer. Blood Cancer Blood cancers affect the produce and function of blood cells. These cancers start in bone marrow. The most common are leukemia, lymphoma, and myeloma. Risk factors for blood cancer Certain blood disorders, genetic syndromes, family history, older age, male gender are some of the risk factors of blood cancer. V. CONCLUSION This paper discusses various data mining techniques used to researchers in their research papers. The data mining is used in the field of medical prediction are discussed. The main focus is on using different algorithms for cancer prediction using data mining. In future the work may be extended and improved for the automation of breast cancer, lung cancer, blood cancer prediction. We will use the other types of data mining techniques to predict. Some politeness restrictions such as contractual relationships between researcher and health care organization are compulsory to overcome the safety issues. There is also a need of standardized approach for constructing the data warehouse. In recent years due to an enhancement of internet facility, huge datasets (text and non-text form) are also available on the website. So, there is also an essential need for effective data mining techniques for analyzing this data to uncover hidden information.

62


Integrated Intelligent Research (IIR)

International Journal of Data Mining Techniques and Applications Volume: 07, Issue: 01, June 2018, Page No.60-63 SSN: 2278-2419

References [1] Ada, & Kaur, R. (2013). A Study of Detection of Lung Cancer Using Data Mining Classification Techniques. International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), 3(3), 131–134. [2] Arutchelvan, K., & Periyasamy, R. (2015). Cancer Prediction System Using Data mining Techniques, 1179–1183. [3] Bharathi, H., & Arulananth, T. S. (2017). A review of lung cancer prediction system using data mining techniques and self organizing map (SOM). International Journal of Applied Engineering Research, 12(10), 2190–2195. [4] Christopher, T., & Jamera, J. (2016). Study of Classification Algorithm for Lung Cancer Prediction. a. IJISET - International Journal of Innovative Science, Engineering & Technology, 3(2), 42–49. [5] Cornuéjols, A., Wemmert, C., Gançarski, P., & Bennani, Y. (2018). Collaborative clustering: Why, when, what and how. Information Fusion, 39, 81–95. https://doi.org/10.1016/j.inffus.2017.04.008 [6] Early Prediction of Heart Diseases Using Data Mining Techniques Early Prediction of Heart Diseases Using Data Mining. (2013), (December). [7] Gomathy, B. (2013). Disease Predicting System Using Data Mining Techniques. International Journal of Technical Research and Applications, 1(5), 41–45. [8] Kaur, H., & Wasan, S. K. (2006). Empirical Study on Applications of Data Mining Techniques in Healthcare. (B3) Journal of Computer Sciences, 2(2), 194–200. https://doi.org/10.3844/jcssp.2006.194.200 [9] Krishnaiah, V., Narsimha, G., & Chandra, N. S. (2013). Diagnosis of Lung Cancer Prediction System Using Data Mining Classification Techniques. International Journal of Computer Science and Information Technologies, 4(1), 39–45. Retrieved from http://www.ijcsit.com/docs/Volume 4/Vol4Issue1/ijcsit2013040110.pdf [10] Majali, J., Niranjan, R., Phatak, V., & Tadakhe, O. (2015). Data Mining Techniques for Diagnosis and Prognosis Of Cancer. Ijarcce, 4(3), 613–615. https://doi.org/10.17148/IJARCCE.2015.43147 [11] Rajesh, K., & Sangeetha, V. (2012). Application of Data Mining Methods and Techniques for Diabetes Diagnosis. (Sem Qualis) International Journal of Engineering Research and Innovative Technology (IJEIT), 2(3), 224–229. Retrieved from http://www.apeejay.edu/aitsm/journal/docs/issue-oct- 2016/ajmst040101.pdf [12] Ramachandran, P., Girija, N., Bhuvaneswari, T., & Professor, A. (2014). Early Detection and Prevention of Cancer using Data Mining Techniques. International Journal of Computer Applications, 97(13), 975–8887. Retrieved from http://research.ijcaonline.org/volume97/number13/pxc3897492.pdf [13] S.Dangare, C., & S. Apte, S. (2012). Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques. International Journal of Computer Applications, 47(10), 44–48. https://doi.org/10.5120/7228-0076.

63


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.