IJRET: International Journal of Research in Engineering and Technology
eISSN: 2319-1163 | pISSN: 2321-7308
PRIVACY PRESERVING DATA MINING IN FOUR GROUP RANDOMIZED RESPONSE TECHNIQUE USING ID3 AND CART ALGORITHM

Monika Soni1, Vishal Shrivastava2
1M. Tech. Scholar, 2Associate Professor, Arya College of Engineering and IT, Rajasthan, India
12.monika@gmail.com, vishal500371@yahoo.co.in
Abstract

Data mining is a process in which data collected from different sources is analyzed for useful information; it is also known as knowledge discovery in databases (KDD). Privacy and accuracy are important issues in data mining when data is shared. Most methods use random permutation techniques to mask the data in order to preserve the privacy of sensitive values. Randomized response (RR) techniques were originally developed to protect the privacy of survey respondents and to avoid biased answers. The proposed work enhances the privacy level of the RR technique by using a four-group scheme. First, following the algorithm, the attributes a, b, c, and d are assigned to groups at random; then each dataset is randomized according to the value of theta; finally, the ID3 and CART algorithms are applied to the randomized data. The results show that increasing the number of groups increases the privacy level: compared with the three-group scheme, the four-group scheme decreases accuracy by 6% but increases privacy by 65%.
1. INTRODUCTION OF PROPOSED APPROACH

This work uses the ID3 and CART algorithms to enhance the privacy of secret data. The problem with the previous work, which used three groups of data sets with the ID3 algorithm, was that it did not check group performance at every step and its privacy level was not very high. The proposed work increases the level of privacy by using the ID3 and CART algorithms; whereas the previous work reported only an overall result, this work proceeds in a step-by-step manner. A sketch of the four-group randomization step is given below.
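The paper does not spell out the randomization operator itself, so the following is a minimal Python sketch under stated assumptions: attributes are binary, and each of the four groups is, independently, reported truthfully with probability theta and complemented otherwise. The function name randomize_record, the singleton grouping of a, b, c, d, and the value theta = 0.7 are illustrative, not taken from the paper.

```python
import random

def randomize_record(record, groups, theta):
    """Four-group randomized response sketch: for each attribute
    group, report the true binary values with probability theta
    and the complemented values with probability 1 - theta.
    `record` maps attribute names to 0/1 values; `groups` is a
    list of four disjoint attribute-name lists covering the record."""
    out = {}
    for group in groups:
        tell_truth = random.random() < theta
        for attr in group:
            out[attr] = record[attr] if tell_truth else 1 - record[attr]
    return out

# Hypothetical usage: attributes a, b, c, d placed in four groups.
record = {"a": 1, "b": 0, "c": 1, "d": 0}
groups = [["a"], ["b"], ["c"], ["d"]]
print(randomize_record(record, groups, theta=0.7))
```

With more groups, each group's disclosure is an independent coin flip, which is consistent with the paper's observation that privacy grows as the number of groups increases.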
1.1 The Basic Idea of the ID3 Algorithm

The ID3 algorithm uses information entropy to select, as the test attribute, the attribute with the maximum information gain in the current sample set. The sample set is divided according to the values of the test attribute, so the number of distinct values of the test attribute determines the number of subsample sets; at the same time, new leaf nodes grow out of the corresponding nodes of the sample set on the decision tree. The ID3 algorithm is given below (a sketch of the information-gain computation follows the listing):

ID3(S, AL)
Step 1. Create a node V.
Step 2. If S consists of samples all of the same class C, then return V as a leaf node labeled with class C.
Step 3. If AL is empty, then return V as a leaf node labeled with the majority class in S.
Step 4. Select the test attribute (TA) among those in AL with the highest information gain.
Step 5. Label node V with TA.
Step 6. For each known value ai of TA:
a) Grow a branch from node V for the condition TA = ai.
b) Let si be the set of samples in S for which TA = ai.
c) If si is empty, then attach a leaf labeled with the majority class in S.
d) Else attach the node returned by ID3(si, AL - TA).
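Step 4 is the heart of the algorithm. The following is a minimal Python sketch of the entropy and information-gain computations that Step 4 relies on; the helper names entropy and information_gain and the dict-based row representation are illustrative assumptions, not part of the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on `attr`.
    `rows` is a list of dicts; `labels` holds the parallel class labels."""
    base = entropy(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return base - remainder

# Hypothetical usage on a toy sample set.
rows = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "sunny"}]
labels = ["no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))
```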
1.2 CART Algorithm

Classification and Regression Trees (CART) is a classification method that uses historical data to construct so-called decision trees, which are then used to classify new data. In order to use CART, the number of classes should be known a priori. Decision trees are represented by a set of questions that splits the learning sample into smaller and smaller parts. CART asks only yes/no questions; possible questions are "Is age greater than 50?" or "Is sex male?". The CART algorithm searches over all possible variables and all possible values to find the best split, i.e., the question that splits the data into two parts with maximum homogeneity. The process is then repeated for each of the resulting data fragments.

Step 1: Find each predictor's best split.
Step 2: Find the node's best split.
Step 3: Split the node using its best split found in Step 2 if the stopping rules are not satisfied.

A sketch of the best-split search (Steps 1 and 2) is given below.
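The paper does not name its homogeneity measure, so the sketch below assumes the Gini impurity, the measure commonly associated with CART; the helper names gini and best_split and the numeric-threshold representation of yes/no questions are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels, attr):
    """Scan every candidate threshold on numeric attribute `attr`
    (each one a yes/no question such as 'is attr <= t?') and return
    the (threshold, weighted_gini) pair whose two parts are most
    homogeneous, i.e. have the lowest weighted Gini impurity."""
    best = (None, float("inf"))
    for threshold in sorted({row[attr] for row in rows}):
        left = [l for row, l in zip(rows, labels) if row[attr] <= threshold]
        right = [l for row, l in zip(rows, labels) if row[attr] > threshold]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left)
                    + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical usage: find the best age threshold on a toy sample.
rows = [{"age": 30}, {"age": 45}, {"age": 52}, {"age": 60}]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels, "age"))
```

Repeating this search over every predictor gives the node's best split (Step 2), and recursing on the two resulting fragments grows the tree.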