International Journal of Computer & Organization Trends – Volume 3 Issue 11 – Dec 2013

An Efficient Extended Attribute Selection Method for Classification

S. Rajeev 1, Mrs. N. Rajeswari 2
1 M.Tech Student, CSE Department, Gudlavalleru Engineering College, Gudlavalleru, Krishna (Dt)
2 Associate Professor, CSE Department, Gudlavalleru Engineering College, Gudlavalleru, Krishna (Dt)

Abstract – In recent years, many applications of data mining deal with high-dimensional data (a very large number of features), which imposes a high computational cost as well as the risk of overfitting. In these cases it is common practice to adopt a feature selection method to improve generalization accuracy. Data quantity is the main issue in the small data set problem, because insufficient data usually does not lead to robust classification performance. How to extract more effective information from a small data set is therefore of considerable interest. The present study is devoted not only to investigating the most relevant subset of features with minimum cardinality for achieving high predictive performance, by adopting the CFS and ChiSqr filtered feature selection techniques, but also to evaluating the goodness of subsets with different cardinalities and the quality of the two filtered feature selection algorithms in terms of the F-measure and Receiver Operating Characteristic (ROC) values generated by an improved SVM classifier with a radial basis kernel function. The results show that the proposed method has superior classification performance when compared to principal component analysis (PCA), kernel principal component analysis (KPCA), and kernel independent component analysis (KICA) with a Gaussian kernel in the support vector machine (SVM) classifier. The results of the present study effectively support the well known fact that predictive accuracy increases when a minimum number of features is used. The expected outcomes show a reduction in computational time and construction cost in both the training and classification phases.

Keywords – Attribute selection, Rules, Stemming, Probability.

I. INTRODUCTION

As the world grows in complexity, overwhelming us with the data it generates, data mining becomes the only practical means of elucidating the patterns that underlie it [1]. Manual data analysis becomes tedious as the size of the data grows and the number of dimensions increases, so the means of data analysis ought to be computerised.


Knowledge Discovery from Data (KDD) is the automated process of discovering knowledge from databases. KDD comprises several steps, namely data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation. Data mining is one step in this overall discovery process and can be described as the extraction, or mining, of knowledge from a large amount of data [2]. Data mining is essentially a form of knowledge discovery needed for solving problems in a specific domain. It can also be described as the non-trivial process of automatically collecting useful hidden information from data, returned in the form of rules, concepts, patterns and so forth [3]. The knowledge extracted through data mining allows the user to look for interesting patterns and regularities deeply buried in the data in order to support the selection process.

Data to be mined may contain many irrelevant attributes, and these should be removed. Moreover, many mining algorithms do not work efficiently with large numbers of features or attributes, so feature selection techniques should be applied before any mining algorithm is run. The main objectives of feature selection are to avoid overfitting, to improve model performance, and to obtain faster and more cost-effective models. Selecting optimal features adds an extra layer of complexity to modelling: instead of simply finding optimal parameters for the full set of features, an optimal feature subset is found first and then the model parameters are optimised. Attribute selection methods can be broadly divided into filter and wrapper approaches. In the filter approach, attribute selection is independent of the data mining algorithm that is later applied to the selected attributes; the relevance of features is assessed by looking only at the intrinsic properties of the data. In most cases a feature relevance score is calculated and low-scoring features are removed. The subset of features left after feature removal is then presented as input to the classification algorithm.


The advantages of filter techniques are that they easily scale to high-dimensional datasets, are computationally simple and fast, and, because the filter is independent of the mining algorithm, feature selection needs to be performed only once, after which different classifiers can be evaluated. Their disadvantage is that they ignore the interaction with the classifier: most proposed techniques are univariate, meaning that each feature is considered separately, thereby ignoring feature dependencies, which may lead to worse classification performance than other kinds of feature selection techniques. To overcome the problem of ignoring feature dependencies, a range of multivariate filter techniques has been introduced, aiming to incorporate feature dependencies to some extent. Wrapper methods embed the model hypothesis search within the feature subset search. In the wrapper approach, the attribute selection method uses the result of the data mining algorithm to determine how good a given attribute subset is. In this setup a search procedure over the space of possible feature subsets is defined, and various subsets of features are generated and evaluated. The defining characteristic of the wrapper approach is that the quality of an attribute subset is measured directly by the performance of the data mining algorithm applied to that subset. The wrapper approach therefore tends to be much slower than the filter approach, since the data mining algorithm is run for every attribute subset considered during the search; if several different data mining algorithms are to be applied to the data, the wrapper approach becomes even more computationally expensive. Data mining algorithms can follow three different learning approaches: supervised, unsupervised or semi-supervised. In supervised learning, the algorithm works with examples whose labels are known. The labels may be nominal values in the case of a classification task, or numerical values for a regression task. In unsupervised learning, in contrast, the labels of the examples in the dataset are unknown, and the algorithm typically aims at grouping examples according to the similarity of their attribute values, characterising a clustering task. Finally, semi-supervised learning is usually used when a small subset of labelled examples is available together with a large number of unlabelled examples [1].
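To make the filter/wrapper contrast above concrete, the sketch below uses scikit-learn and a stand-in dataset (both assumptions; the paper itself reports Weka-style output): the filter step scores each feature with the χ² statistic independently of any classifier, while the wrapper step judges one hypothetical candidate subset by the cross-validated accuracy of the SVM that would eventually be used.

```python
# A minimal sketch of filter- versus wrapper-style feature selection.
# scikit-learn and the iris data are placeholders, not the paper's setup.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # stand-in dataset (non-negative features)

# Filter approach: score each feature independently of any classifier,
# then keep the k highest-scoring ones.
filter_selector = SelectKBest(score_func=chi2, k=2)
X_filtered = filter_selector.fit_transform(X, y)
print("chi-squared scores per feature:", filter_selector.scores_)

# Wrapper approach: judge a candidate subset by the accuracy of the
# classifier that will ultimately be used (here an SVM).
candidate_subset = [0, 2]                    # hypothetical candidate feature indices
scores = cross_val_score(SVC(kernel="rbf"), X[:, candidate_subset], y, cv=5)
print("wrapper estimate for subset", candidate_subset, ":", scores.mean())
```

A real wrapper search would repeat the last evaluation for many candidate subsets, which is why it is so much more expensive than the one-off filter scoring.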

II. LITERATURE SURVEY

In early research on feature construction, some systems focused on decision-tree-based algorithms. BACON is a program that discovers relationships among real-valued features of instances using multiply and divide operators. FRINGE constructs new features by conjoining pairs of features at the fringe of each of the positive branches of the decision trees. During each iteration, the newly constructed features and the existing features are used as the input space for the algorithm. CITRE uses a variety of operands, such as root, fringe and root-fringe, to construct new features, all of which use conjunction as the operator. Other attribute construction methods include genetic-based algorithms, which can be divided into two major categories: wrapper and non-wrapper approaches. In the wrapper genetic programming approach, in which the final learner is used as an indicator of the appropriateness of the constructed attributes, the constructed attributes are fed into the classifier and the classifier accuracy is used as a guide to rank them. The non-wrapper genetic programming approach is performed as a preprocessing phase; since no particular classifier is involved in evaluating the constructed attributes, it is expected to be more efficient and its results more general. The information gain (IG) and information gain ratio (IGR) are the fitness functions commonly used for constructing attributes.
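Since information gain is cited above as a common fitness function, the short sketch below (an illustration of the standard definition, not code from any of the surveyed systems; the toy attribute and class values are invented) computes IG as the reduction in class entropy obtained by splitting on a nominal attribute.

```python
# Illustrative information-gain computation for a nominal attribute.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(attribute_values, labels):
    # IG = H(class) - sum over attribute values of weighted subset entropies
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [lab for att, lab in zip(attribute_values, labels) if att == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: the attribute separates the classes perfectly,
# so the gain equals the full class entropy (about 0.918 bits here).
attr = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
cls  = ["no",    "no",    "yes",  "yes",  "yes",      "yes"]
print(information_gain(attr, cls))
```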

Attribute Selection

Since not all the measured variables are important for understanding the underlying phenomena, dimension reduction is often possible in many cases.


There may be variables whose variance is less than the measurement noise; these are considered irrelevant to the model.

Feature Extraction

Feature extraction is a technique that projects the original features into a lower-dimensional feature space in order to reduce the number of data dimensions and improve analytical efficiency.
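As a rough sketch of such a projection (assuming scikit-learn's PCA and KernelPCA as stand-ins for the extractors compared later, and a placeholder dataset rather than the data used in this study), the fragment below maps the original attributes onto a smaller number of components.

```python
# Minimal feature-extraction sketch: project the original attributes onto a
# lower-dimensional space. scikit-learn and the dataset are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)            # scale before projecting

pca = PCA(n_components=10)                       # linear projection
X_pca = pca.fit_transform(X)

kpca = KernelPCA(n_components=10, kernel="rbf")  # non-linear (Gaussian-kernel) projection
X_kpca = kpca.fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_kpca.shape)
```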

The figure shows the flow chart of the proposed method, which starts from collecting the data, building the MTD functions and computing the overlap area of the MTD functions, and then moves on to class-possibility attribute transformation, attribute construction, attribute merging and, finally, SVM model building.

III. PROPOSED SYSTEM

CHI (χ² STATISTIC): This method measures the lack of independence between a term and the category. Chi-squared is a common statistical test that measures divergence from the distribution expected if one assumes that the feature occurrence is actually independent of the class value. As a statistical test, it is known to behave erratically for very small expected counts, which are common in text classification, both because of rarely occurring word features and sometimes because of having few positive training examples for a concept. In statistics, the χ² test is applied to test the independence of two events, where two events A and B are defined to be independent if P(AB) = P(A)P(B) or, equivalently, P(A|B) = P(A) and P(B|A) = P(B). In feature selection, the two events are the occurrence of the term and the occurrence of the class. Feature selection using the χ² statistic is analogous to performing a hypothesis test on the distribution of the class as it relates to the values of the feature in question. The null hypothesis is that there is no correlation: each value is as likely to have instances in any one class as in any other class. Under the null hypothesis, if p of the instances have a given value and q of the instances are in a specific class, then (p · q)/n instances have the given value and are in that class (n is the total number of instances in the dataset). This is because p/n instances have the value and q/n instances are in the class, and if the probabilities are independent (i.e. the null hypothesis holds) their joint probability is their product. Given the null hypothesis, the χ² statistic measures how far the observed counts O are from the expected counts E, summed over the cells of the term–class contingency table:

χ² = Σ (O − E)² / E
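To make the expected-count reasoning concrete, the toy computation below applies the formula above to a hypothetical 2x2 term–class contingency table; the counts are invented for illustration only.

```python
# Hypothetical 2x2 contingency table: rows = feature value present/absent,
# columns = class positive/negative. Counts are invented for illustration.
observed = [[30, 10],   # value present
            [20, 40]]   # value absent

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n   # (p * q) / n, as in the text
        chi2 += (observed[i][j] - expected) ** 2 / expected

print("chi-squared statistic:", round(chi2, 3))
```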

Confusion Matrix: A confusion matrix (Kohavi and Provost, 1998) contains information about the actual and predicted classifications produced by a classification system. The performance of such systems is commonly evaluated using the data in the matrix. The table below shows the confusion matrix for a two-class classifier. The entries in the confusion matrix have the following meaning in the context of our study:


a is the number of correct predictions that an instance is negative,

b is the number of incorrect predictions that an instance is positive,

c is the number of incorrect predictions that an instance is negative, and

d is the number of correct predictions that an instance is positive.

                        Predicted
                   Negative   Positive
Actual   Negative      a          b
         Positive      c          d

Several standard terms have been defined for the two-class matrix:



The accuracy (AC) is the proportion of the total number of predictions that were correct. It is determined using the equation:

AC = (a + d) / (a + b + c + d)

The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified, as calculated using the equation:

TP = d / (c + d)

The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:

FP = b / (a + b)


The true negative rate (TN) is defined as the proportion of negative cases that were classified correctly, as calculated using the equation:

TN = a / (a + b)

The false negative rate (FN) is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:

FN = c / (c + d)

Finally, precision (P) is the proportion of the predicted positive cases that were correct, as calculated using the equation:

P = d / (b + d)
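The equations above can be collected into a few lines of code; the counts a, b, c, d below are hypothetical and serve only to illustrate the formulas (the F-measure reported later combines precision and recall).

```python
# Compute the standard two-class metrics from confusion-matrix counts.
# a, b, c, d follow the definitions above; the values are hypothetical.
a, b, c, d = 50, 10, 5, 35    # TN, FP, FN, TP counts

accuracy       = (a + d) / (a + b + c + d)
recall_tp_rate = d / (c + d)
fp_rate        = b / (a + b)
tn_rate        = a / (a + b)
fn_rate        = c / (c + d)
precision      = d / (b + d)
f_measure      = 2 * precision * recall_tp_rate / (precision + recall_tp_rate)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall_tp_rate:.3f} F-measure={f_measure:.3f}")
```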

LINEAR AND POLYNOMIAL SVM

A support vector machine is primarily a two-class classifier. It is possible to solve multi-class problems with support vectors by treating each single class as a separate problem. The SVM aims to maximise the width of the margin between classes, that is, the empty area between the decision boundary and the nearest training patterns. Given a set of points {x_i} in n-dimensional space with corresponding classes {y_i : y_i ∈ {−1, +1}}, the training algorithm attempts to place a hyperplane between the points where y_i = +1 and the points where y_i = −1. Once this has been achieved, a new pattern x can be classified by testing which side of the hyperplane the point lies on.

Radial basis function kernel with width σ:

K(x, x′) = exp(−‖x − x′‖² / (2σ²))
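A brief sketch of training such a classifier follows, using scikit-learn's SVC with an RBF kernel as a stand-in for the improved SVM used in this work; the dataset, the gamma value (which plays the role of 1/(2σ²)) and the train/test split are placeholders.

```python
# Train a two-class SVM with an RBF (Gaussian) kernel on placeholder data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", gamma=0.01, C=1.0)     # gamma corresponds to 1 / (2 * sigma**2)
clf.fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
print(confusion_matrix(y_test, y_pred))        # rows: actual, columns: predicted
```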


IV. RESULTS

Attribute Evaluator (nominal, 40 class): Feature Filter ROBUST
selected attributes:
 0.16219   7  transform_6
 0.13081   3  transform_2
 0.13081   2  transform_1
 0.11552  13  transform_12
 0.09478   4  transform_3
 0.05674   8  transform_7
 0.0359   12  transform_11
 0.03445  25  transform_24
 0.0261   27  transform_26
 0.02276   5  transform_4
 0.02202  18  transform_17
 0.02077  21  transform_20
 0.02017  19  transform_18


 0.01635  28  transform_27
 0.01265  15  transform_14
 0.01195  38  transform_37
 0.01098  22  transform_21
 0.00908  36  transform_35
 0.00841  10  transform_9
 0.00795  30  transform_29
 0.00756  14  transform_13
 0.00677  17  transform_16
 0         6  transform_5
 0         1  transform_0
 0        11  transform_10
 0         9  transform_8
 0        34  transform_33
 0        32  transform_31
 0        33  transform_32
 0        39  transform_38
 0        35  transform_34
 0        37  transform_36
 0        23  transform_22
 0        16  transform_15
 0        20  transform_19
 0        29  transform_28
 0        31  transform_30
 0        24  transform_23
 0        26  transform_25

Selected attributes: 7,3,2,13,4,8,12,25,27,5,18,21,19,28,15,38,22,36,10,30,14,17,6,1,11,9,34,32,33,39,35,37,23,16,20,29,31,24,26 : 39

Attribute Evaluator (nominal, 40 class): KPCA Feature Extractor ROBUST
selected attributes:
 0.2185   13  transform_12
 0.2108   27  transform_26
 0.1929    7  transform_6
 0.1813   38  transform_37
 0.1691   30  transform_29
 0.1586   18  transform_17
 0.1432    4  transform_3
 0.1408   28  transform_27
 0.1378    3  transform_2
 0.1378    2  transform_1
 0.1087   36  transform_35
 0.0871   25  transform_24
 0.0794   21  transform_20
 0.0722    8  transform_7
 0.0589   12  transform_11
 0.05     22  transform_21
 0.0361   19  transform_18
 0.028    10  transform_9
 0.0277   17  transform_16
 0.0229    5  transform_4
 0.0204   15  transform_14
 0.0203   14  transform_13
 0         6  transform_5
 0         1  transform_0
 0        11  transform_10
 0         9  transform_8
 0        34  transform_33
 0        32  transform_31
 0        33  transform_32
 0        39  transform_38
 0        35  transform_34
 0        37  transform_36
 0        23  transform_22
 0        16  transform_15
 0        20  transform_19
 0        29  transform_28
 0        31  transform_30
 0        24  transform_23
 0        26  transform_25

Selected attributes: 13,27,7,38,30,18,4,28,3,2,36,25,21,8,12,22,19,10,17,5,15,14,6,1,11,9,34,32,33,39,35,37,23,16,20,29,31,24,26 : 39

ROBUST selected attributes:
 0.8297   1  0.33 transform_2 - 0.33 transform_1 - 0.287 transform_4 - 0.23 transform_8 + 0.226 transform_6 ...
 0.6938   2  0.328 transform_16 + 0.327 transform_21 + 0.32 transform_10 + 0.299 transform_28 + 0.272 transform_31 ...
 0.5835   3  0.374 transform_25 + 0.37 transform_30 + 0.342 transform_20 + 0.327 transform_34 + 0.256 transform_37 ...
 0.5195   4  -0.434 transform_19 - 0.38 transform_8 - 0.352 transform_14 + 0.264 transform_3 + 0.212 transform_34 ...
 0.4644   5  0.362 transform_12 - 0.314 transform_18 + 0.301 transform_9 - 0.285 transform_11 + 0.235 transform_5 ...
 0.4172   6  0.473 transform_12 - 0.297 transform_5 + 0.274 transform_31 - 0.272 transform_9 - 0.264 transform_27 ...
 0.3752   7  -0.448 transform_7 + 0.393 transform_3 - 0.376 transform_27 + 0.305 transform_9 - 0.298 transform_4 ...
 0.3354   8  0.528 transform_3 - 0.435 transform_9 - 0.314 transform_7 + 0.314 transform_27 - 0.275 transform_4 ...
 0.2977   9  -0.441 transform_26 + 0.371 transform_34 - 0.364 transform_33 - 0.357 transform_20 - 0.334 transform_29 ...
 0.2644  10  0.653 transform_13 + 0.393 transform_33 - 0.288 transform_26 + 0.226 transform_8 - 0.205 transform_14 ...
 0.2318  11  0.478 transform_13 + 0.453 transform_26 - 0.43 transform_33 - 0.34 transform_19 - 0.191 transform_14 ...
 0.2021  12  0.79 transform_32 - 0.252 transform_27 + 0.234 transform_28 - 0.219 transform_35 - 0.206 transform_31 ...
 0.173   13  -0.58 transform_22 - 0.505 transform_15 + 0.244 transform_27 + 0.238 transform_9 - 0.199 transform_10 ...
 0.1447  14  0.857 transform_17 - 0.198 transform_12 + 0.181 transform_15 + 0.173 transform_11 - 0.169 transform_18 ...
 0.1166  15  0.775 transform_29 - 0.307 transform_26 - 0.302 transform_33 - 0.22 transform_17 + 0.186 transform_25 ...
 0.0889  16  0.654 transform_23 - 0.574 transform_24 + 0.258 transform_29 - 0.192 transform_15 - 0.158 transform_32 ...
 0.062   17  0.719 transform_15 + 0.642 transform_22 + 0.165 transform_16 - 0.113 transform_27 - 0.092 transform_10 ...
 0.036   18  0.597 transform_36 + 0.518 transform_38 - 0.409 transform_37 - 0.349 transform_35 - 0.187 transform_23 ...

SVM KERNEL FUNCTION
Classifier for classes: 0, 1

    -0.2701 * pima_transform_1
 +   0.2701 * pima_transform_2
 +  -0.2045 * pima_transform_3
 +  -0.0656 * pima_transform_4
 +  -0.4121 * pima_transform_5
 +   0.6822 * pima_transform_6
 +  -0.139  * pima_transform_7
 +   0.0734 * pima_transform_8
 +  -0.3321 * pima_transform_9
 +  -0.0799 * pima_transform_10
 +   0.1083 * pima_transform_11
 +   0.5739 * pima_transform_12
 +  -0.2121 * pima_transform_13
 +   0.2855 * pima_transform_14
 +  -0.2516 * pima_transform_15
 +   0.1717 * pima_transform_16
 +   0.4647 * pima_transform_17
 +  -0.3564 * pima_transform_18
 +  -0.4977 * pima_transform_19
 +   0.7833 * pima_transform_20
 +   0.5947 * pima_transform_21
 +  -0.4231 * pima_transform_22
 +  -1.1781 * pima_transform_23
 +   0.8217 * pima_transform_24
 +   0.0638 * pima_transform_25
 +   0.7195 * pima_transform_26
 +   0.981  * pima_transform_27
 +  -0.3862 * pima_transform_28
 +   0.6545 * pima_transform_29
 +  -0.5907 * pima_transform_30
 +   0.2454 * pima_transform_31
 +  -0.6317 * pima_transform_32
 +  -0.7543 * pima_transform_33
 +   0.1636 * pima_transform_34
 +   1.1231 * pima_transform_35
 +  -0.8777 * pima_transform_36
 +   1.0819 * pima_transform_37
 +  -0.9183 * pima_transform_38
 -   0.5258

Number of kernel evaluations: 191468 (72.575%)
Selected attributes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 : 18
Time taken to build model: 0.24 seconds
Time taken to test model on training data: 0.04 seconds

=== Confusion Matrix ===
    a    b   <-- classified as
  468   32 |  a = 0
   90  178 |  b = 1


V. CONCLUSION AND FUTURE SCOPE

This system successfully extends the attributes of large datasets. Feature selection plays a vital role in this work by selecting the most important, highest-ranked attributes for the classifier. The experiments show that our approach is able

to identify meaningful classification rules within an


acceptable execution time. This framework develops a new algorithm based on coherent rules, so that users can mine items without domain knowledge, and it can mine items more efficiently than association rules. Implication in propositional logic is a good alternative basis for the definition of association; rules based on this definition can be searched for and discovered within feasible time.

REFERENCES
[1] S. Beniwal and J. Arora, "Classification and Feature Selection Techniques in Data Mining," International Journal of Engineering Research & Technology (IJERT), vol. 1, issue 6, August 2012.
[2] C. Kadu, "Hybrid Approach to Improve Pattern Discovery in Text Mining," International Journal of Advanced Research in Computer and Communication Engineering.
[3] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), pp. 478-499, 1994.
[4] H. Ahonen, O. Heinonen, M. Klemettinen, and A. I. Verkamo, "Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections," Proc. IEEE Int'l Forum on Research and Technology Advances in Digital Libraries (ADL '98), pp. 2-11, 1998.
[5] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
[7] T. Chau and A. K. C. Wong, "Pattern Discovery by Residual Analysis and Recursive Partitioning," IEEE Trans. Knowledge and Data Eng., vol. 11, pp. 833-852, Nov./Dec. 1999.
[8] N. Jindal, B. Liu, and E.-P. Lim, "Finding Unusual Review Patterns Using Unexpected Rules."
