IJIERT-COMPARATIVE ANALYSIS OF STANDARD ERROR USING IMPUTATION METHOD

Novateur Publicationâ&#x20AC;&#x2122;s International Journal of Innovation in Engineering, Research and Technology [IJIERT] ICITDCEMEâ&#x20AC;&#x2122;15 Conference Proceedings ISSN No - 2394-3696

COMPARATIVE ANALYSIS OF STANDARD ERROR USING IMPUTATION METHOD V.B. Kamble P.E.S. College of Engineering,Aurangabad. (M.S.), India S.N. Deshmukh Dr. Babasaheb Ambedkar Marathwada University,Aurangabad. (M.S.) India.

ABSTRACT Presence of missing values in the dataset remains great challenge in the process of knowledge extracting. Also this leads to difficulty for performance analysis in data mining task. In this research work, student dataset is taken that contains marks of four different subjects of engineering college. Mean Imputation, Mode Imputation, Median Imputation and Standard Deviation Imputation were used to deal with challenges of incomplete data. By implementing imputation methods for example Mean Imputation, Mode Imputation, Median Imputation and Standard Deviation Imputation on the student dataset and find out standard errors for each imputation method then analyze obtained result. Mean Imputation with standard error is less as compare with other imputation method with standard error. Hence Mean Imputation Method with standard error is more suitable to handling the missing values in the dataset. INTRODUCTION Most of the real world datasets are incomplete, due to presence of missing values. Missing data is randomly distributed in the dataset. Incompleteness in the dataset occurs due to various reasons like manual data entry procedures, incorrect measurements, equipment errors etc. When data is incomplete, it is very difficult to extract useful knowledge from dataset as data analysis algorithms can work efficiently only with complete data. In this research work minimize errors by using imputation methods with standard error. Mean Imputation with standard error is less as compare with other imputation method with standard error. Hence Mean Imputation Method with standard error is more suitable to handling the missing values in the dataset. The organization of the paper is Section1: Introduction of Imputation Techniques. Section 2: Related Work, Section 3: Missing Data Imputation Technique, Section 4: Experimental result and Analysis, Section 5: Conclusion and future work. 1.1 MISSING VALUES Missing data occur due to nonresponsive field items when the data was collected and ignored by users. Missing values lead to the difficulty of extracting useful information from data set [2]. Missing data is the absence of data items that hide some information that may be important [1]. Most of the real world databases are characterized by an unavoidable problem of incompleteness, in terms of missing values. [3]. 1.2 HANDLING MISSING VALUES A) HANDLING MISSING VALUES BY IGNORING THE TUPLE This method is not effective, unless the tuple contains several attributes with missing values. It is poor when the percentage of missing values as per attribute varies considerably.

1|Page

Turn static files into dynamic content formats.

Create a flipbook

IJIERT-COMPARATIVE ANALYSIS OF STANDARD ERROR USING IMPUTATION METHOD

Published on Sep 21, 2018

IJIERT Journal

Presence of missing values in the dataset remains g reat challenge in the process of knowledge extracti ng. Also this leads to difficulty for performance analysis in dat a mining task. In this research work,student datas et is taken that contains marks of four different subjects of engine ering college. Mean Imputation,Mode Imputation,Me dian Imputation and Standard Deviation Imputation were u sed to deal with challenges of incomplete data. By implementing imputation methods for example Mean Im putation,Mode Imputation,Median Imputation and Standard Deviation Imputation on the student datase t and find out standard errors for each imputation method then analyze obtained result. Mean Imputation with stand ard error is less as compare with other imputation method with standard error. Hence Mean Imputation Method with s tandard error is more suitable to handling the miss ing values in the dataset. https://www.ijiert.org/admin/papers/1452798468_ICITDCEME-15.pdf