
International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

An Improved Approach to Enhance the Performance of Classical Apriori Algorithm to Mine Frequent Itemsets

Shivani Kwatra1, Ravneet Kaur2

1Student of Master of Technology and 2Assistant Professor, Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, Punjab, India
Email: shivanikwatra4@gmail.com1 and ravneetin2002@gmail.com2

Abstract: Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. It is used to obtain information from data sets. Association rule mining is one of the techniques used for obtaining frequent patterns. Frequent itemset mining is an important and difficult task in association rule mining. For large databases, research to improve mining performance is necessary. Researchers have proposed various ideas to improve the performance of mining frequent itemsets, and most of the resulting algorithms focus on reducing execution time. In this paper a new algorithm is proposed that uses binary search to generate frequent itemsets.

Keywords: Data mining; Association rule mining; Frequent itemsets

1. INTRODUCTION

Data mining is the process of discovering knowledge in a database, that is, patterns and relationships in data that may be used to make valid predictions. The growth in database size has led to the development of tools to mine frequent itemsets, and the need for automatic extraction of knowledge from data is increasing. The information hidden within databases has led to the discovery of association rules, which uncover useful patterns for decision support, marketing strategies, financial forecasting, and other applications. Different techniques are used to find frequent patterns, such as association rules, correlations, clustering, and classifiers, of which association rules are the most popular in the field of frequent itemset mining. The motivation behind frequent itemset mining is to examine which items are purchased together in a supermarket. This paper surveys work on frequent itemset mining and proposes a new algorithm.

2. LITERATURE REVIEW

Wei Zhang et al. [1] introduce an improved approach, the FP-growth algorithm, that helps resolve the two bottleneck problems of the traditional Apriori algorithm and is more efficient than the original. They describe how the FP-tree structure is constructed and report experimental results showing that the algorithm has higher mining efficiency in execution time, memory usage, and CPU utilization than most current algorithms such as Apriori.

Goswami D.N. et al. [2] describe three frequent pattern mining approaches based on the classical Apriori algorithm: Record filter, Intersection, and their proposed algorithm. The Record filter approach proved better than classical Apriori, the Intersection approach proved better than the Record filter approach, and the proposed algorithm proved better than the other frequent pattern mining algorithms. Finally, they perform a comparative study of all approaches on a dataset of 2000 transactions.

Basheer Mohamad Al-Maqaleh and Saleem Khalid Shaab [3] propose an efficient algorithm that integrates a confidence measure into the process of mining frequent itemsets, which generates confident frequent itemsets. The suggested algorithm then generates strong association rules from these confident frequent itemsets. The technique has been implemented, and the experimental results show the usefulness and effectiveness of the proposed algorithm.

Saurabh Malgaonkar et al. [4] describe a system designed to find the most frequent combinations of items. It is based on developing an efficient algorithm that outperforms the best available frequent pattern algorithms on a number of typical data sets, which helps in marketing and sales. The technique can be used to uncover interesting cross-sells and related products. Three different association mining algorithms were implemented, and a best-combination method is then used to find more interesting results. The analyst can then perform the data mining and extraction, draw conclusions from the result, and make an appropriate decision.



Patel Tushar S. et al. [5] present an in-depth analysis of frequent itemset mining algorithms and discuss some problems of generating frequent itemsets with them. They explore the unifying features in the internal working of various mining algorithms, and their comparative study covers aspects such as different support values.

Anitha Modi and Radhika Krishnan [6] describe the problem of mining frequent itemsets, which arises in large transactional databases where association rules among the transactional data must be found for the growth of business. Several algorithms have been proposed and developed to increase the efficiency of mining frequent itemsets. They present a survey of algorithms for mining frequent itemsets in transactional databases that work on horizontal, vertical, projected, and hybrid layout datasets.

Mihir R. Patel et al. [7] propose a method that can be combined with the Apriori algorithm to reduce both the storage required for candidates and the execution time. CPU time is saved by reducing the size of the candidate sets and the time required to calculate the support of each candidate. A checkpoint concept based on the support value is proposed to reduce the execution time and the overall storage space required for the candidates generated while scanning the dataset.

Damor Nirali N. et al. [8] describe a new method for generating frequent itemsets using a frequent itemset tree (FItree). They also give an example of the new method and analyze its results on the wine dataset. The execution time of the proposed method is better than that of the SaM method.

3. CHALLENGES OF DATA MINING WITH ASSOCIATION RULE MINING

1. Mining different kinds of knowledge in databases: Data mining needs to cover a broad range of knowledge discovery tasks.
2. Presentation and visualization of data mining results: Once patterns are discovered, they need to be expressed in high-level languages and visual representations that users can easily understand.
3. Handling noisy or incomplete data: Data cleaning methods are required that can handle noise and incomplete objects while mining data regularities. Without such methods, the accuracy of the discovered patterns will be poor.
4. Pattern evaluation: This refers to the interestingness of the discovered patterns. Patterns must be evaluated because many of them are uninteresting: they either represent common knowledge or lack novelty.
5. Efficiency and scalability of data mining algorithms: This is also one of the major challenges in association rule mining.

6. Parallel and distributed data mining: Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms.
7. Handling of relational and complex types of data: A database may contain complex data objects, multimedia data objects, spatial data, temporal data, and so on. It is not possible for one system to mine all these kinds of data.

4. COMPARATIVE ANALYSIS

| Algorithm | Memory Utilization | Time | Databases |
|---|---|---|---|
| Apriori Algorithm | Requires large space | More execution time | Both sparse and dense |
| DHP Algorithm | Less space at earlier passes and more space at later stages | Small execution time for small databases | Medium databases |
| Partitioning Algorithm | Requires less memory | More execution time | Large databases |
| DIC Algorithm | Variable memory | Small execution time | Medium and low databases |
| Sampling Algorithm | Very small amount of memory | Small execution time | Any kind of database, but does not give accurate results |
| Eclat Algorithm | Requires less memory | Small execution time | Not suitable for small datasets |
| FP-Growth Algorithm | Requires more main memory | More execution time | Medium and large databases |
| H-Mine | Variable memory | More execution time | Both sparse and dense |

5. PROPOSED ALGORITHM

Algorithm: Proposed Algorithm
Input: Data sets
Output: Frequent itemsets
Step 1: Input the dataset and the minimum threshold value.
Step 2: Calculate the length of the longest itemset in the dataset.
Step 3: Apply binary search to find the frequent itemsets:
    Low = length of the shortest itemset
    High = length of the longest itemset
    While (Low <= High)
        Mid = (Low + High) / 2
        1. If an itemset at level Mid has support greater than the threshold value, move forward in the binary search (Low = Mid + 1).
        2. Else move backward in the binary search (High = Mid - 1).
Step 4: Exit.
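The steps above leave several details open, so the following Python sketch is only one possible reading of them, not the authors' implementation. It assumes that the binary search runs over the itemset length, that "level Mid" succeeds when at least one itemset of that length meets the threshold, and that support is an absolute transaction count. The function names, the brute-force counting of candidates, and the choice of 1 as the lower bound are all assumptions made for illustration. The search is valid because of the downward-closure property: if a frequent itemset of length k exists, frequent itemsets exist at every shorter length.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_of_length(transactions, k, min_support):
    """Return {itemset: count} for every k-itemset with count >= min_support.

    Assumption: the paper does not say how level-k candidates are produced,
    so this helper simply enumerates the k-subsets of each transaction.
    """
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def longest_frequent_itemsets(transactions, min_support):
    """Binary-search the itemset length for the longest frequent itemsets."""
    # Assumption: the lower bound is 1 (the shortest possible itemset).
    low = 1
    high = max((len(set(t)) for t in transactions), default=0)
    best_length, best_itemsets = 0, {}
    while low <= high:
        mid = (low + high) // 2  # recomputed on every iteration
        found = frequent_itemsets_of_length(transactions, mid, min_support)
        if found:
            # Something is frequent at this length: remember it, try longer.
            best_length, best_itemsets = mid, found
            low = mid + 1
        else:
            # Nothing is frequent at this length: try shorter.
            high = mid - 1
    return best_length, best_itemsets


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["bread", "butter", "milk", "jam"],
        ["butter", "jam"],
    ]
    print(longest_frequent_itemsets(baskets, min_support=2))
    # prints (3, {('bread', 'butter', 'milk'): 2})
```

Under this reading the search returns only the longest frequent itemsets. Every subset of a returned itemset is also frequent, but shorter frequent itemsets that are not subsets of it (such as butter and jam in the example) would need a separate pass.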

The proposed algorithm works on the dataset to generate the frequent itemsets. Its inputs are the dataset and the minimum threshold value. It first calculates the length of the longest itemset in the dataset and then applies binary search to find the frequent itemsets. By using binary search, the proposed algorithm is expected to decrease the time taken to find frequent itemsets, and by decreasing the number of scans it is also expected to decrease space usage. Fetching data into main memory is a major issue in existing algorithms, and the proposed algorithm is intended to address it: it does not require the whole dataset to be fetched into main memory.

6. CONCLUSION

In this paper various techniques are discussed and analyzed. A new algorithm is proposed for association rule mining that is expected to generate results more efficiently than the traditional algorithms. The algorithm takes the dataset and a minimum threshold value as input. It calculates the length of the longest itemset in the database, applies the binary search technique to the database, and outputs the frequent itemsets. It is expected to produce this output in less time than other existing algorithms.

ACKNOWLEDGEMENTS

I would like to place on record my deep sense of gratitude to Mrs. Ravneet Kaur, Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, India, for her generous guidance, help, and useful suggestions. I am also thankful to Dr. Navdeep Kaur, Head of the Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, India, for her kind help and cooperation. I would like to thank my friends for their support and to express my appreciation to every person who contributed their inspiration. I am highly grateful to my parents and brother for their inspiration and ever-encouraging moral support, which enabled me to pursue my studies.

Shivani Kwatra

REFERENCES

[1] Wei Zhang, Hongzhi Liao and Na Zhao, "Research on the FP Growth Algorithm about Association Rule Mining", IEEE, Vol. 1, 2008, Wuhan, pp. 315-318.
[2] Goswami D.N., Chaturvedi Anshu and Raghuvanshi C.S., "An Algorithm for Frequent Pattern Mining Based On Apriori", IJCSE, Vol. 2, 2010, pp. 942-947.
[3] Basheer Mohamad Al-Maqaleh and Saleem Khalid Shaab, "An Efficient Algorithm for Mining Association Rules using Confident Frequent Itemsets", IEEE, 2013, Rohtak, pp. 90-94.
[4] Saurabh Malgaonkar, Sakshi Surve and Tejas Hirave, "Use of Mining Techniques To Improve The Effectiveness of Marketing and Sales", IEEE, 2013, Mumbai, India, pp. 1-5.
[5] Patel Tushar S., Panchal Mayur, Ladumor Dhara, Kapadiya Jahnvi, Desai Piyusha, Prajapati Ashish and Prajapati Reecha, "An Analytical Study of Various Frequent Itemset Mining Algorithms", Res. J. Computer & IT Sci., Vol. 1(1), 2013, pp. 6-9.
[6] Anitha Modi and Radhika Krishnan, "Mining Frequent Itemsets in Transactional Database Mining", IJETAE, Vol. 3, 2013, ISSN 2250-2459.
[7] Mihir R. Patel, Dipti P. Rana and Rupa G. Mehta, "FApriori: A Modified Apriori Algorithm Based on Checkpoint", IEEE, 2013, Mathura, pp. 50-53.



[8] Damor Nirali N., Radhika Krishnan and Patel Hardik, "A New Method to Mine Frequent Itemsets using Frequent Itemset Tree", Res. J. Computer & IT Sci., Vol. 1(3), 2013, ISSN 2320-6527, pp. 9-12.
