
Poster Paper Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013

Modified Bit-Apriori Algorithm for Frequent ItemSets in Data Mining

J Karthikeyan1 and Dr. Udaykumar2

1 Research Scholar, Hindustan University, Chennai, India. Email: karthikeyan_world@hotmail.com
2 ACOE, Hindustan University, Chennai, India. Email: aukumar71@gmail.com

Abstract - Mining frequent item-sets is one of the most important concepts in data mining. It is a fundamental and initial task of data mining. Apriori [3] is the most popular and frequently used algorithm for finding frequent item-sets. Other algorithms, viz. Eclat [4] and FP-growth [5], are also used to find frequent item-sets. To improve the time efficiency of the Apriori algorithm, Jiemin Zheng introduced the Bit-Apriori [1] algorithm, which modifies Apriori [3] as follows: 1) the support count is implemented by performing a bitwise “And” operation on binary strings; 2) special equal-support pruning is applied. In this paper, to improve the time efficiency of the Bit-Apriori [1] algorithm, a novel algorithm that deletes infrequent items during trie2 and subsequent tries is proposed and demonstrated with an example.

Index Terms - Data mining; frequent item-sets; Apriori; Bit-Apriori; trie2.

© 2013 ACEEE DOI: 03.LSCS.2013.2.66

I. INTRODUCTION

In recent years the size of databases has increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term data mining, or knowledge discovery in databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases. The implicit information within databases, mainly the interesting association relationships [5] among sets of objects that lead to association rules, may disclose useful patterns for decision support, financial forecasting, marketing policies, medical diagnosis and many other applications [7].

In frequent pattern mining, the challenge is the large number of result patterns. As the minimum support threshold becomes lower, an exponentially large number of item-sets is generated. Pruning [1] unimportant patterns effectively during the mining process has therefore become one of the main topics in frequent pattern mining. The main aim is to optimize the process of finding frequent patterns so that it is efficient, scalable, and able to detect the important patterns that can be used in various ways to extract knowledge from data. The study of frequent item-set mining is well acknowledged within frequent pattern mining because of its broad applications to association rules and other data mining tasks. An attempt is made in the present work to prune unimportant patterns in item-set mining.

II. RELATED WORK

A. Apriori Algorithm

In computer science and data mining, Apriori is a classic algorithm for learning association rules [8]. It is designed to operate on databases containing transactions and is commonly used in association rule mining [3]. Apriori uses a “bottom-up” approach, where frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data [9][10]. The algorithm terminates when no further successful extensions are found.

Apriori [2] uses breadth-first [3] search and a tree structure to count [6][12][13] candidate item-sets efficiently. It generates candidate item-sets of length k from item-sets of length k-1, then prunes the candidates that have an infrequent sub-pattern [11]. According to the downward closure lemma, the candidate set contains all frequent k-length item-sets. After that, it scans the transaction database to determine the frequent item-sets among the candidates.

Apriori [2], though historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets (the algorithm attempts to load the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all $2^{|S|} - 1$ of its proper subsets. The pseudocode for Apriori is shown in Table I.
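The level-wise Apriori procedure of Section II-A (generate candidates of length k from frequent item-sets of length k-1, prune by downward closure, then count support in one database scan) can be sketched in Python as follows. This is only an illustrative sketch: it does not reproduce the pseudocode of Table I, it counts support by plain subset tests rather than with a tree structure, and the function name `apriori`, the parameter `min_support` (an absolute count) and the sample data are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(item-set): support count} for all frequent item-sets.

    Illustrative sketch only; min_support is an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-item-sets into k-item-sets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Downward-closure pruning: every (k-1)-subset must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Support counting: one scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample database of five transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent_itemsets = apriori(transactions, min_support=3)
```

With a minimum support of 3 on this sample, the three single items and the three pairs are frequent, while {a, b, c} appears in only two transactions and is discarded after the third-level scan.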

B. Bit-Apriori Algorithm

Bit-Apriori uses the data structures and techniques of the Apriori [1] algorithm. The main difference between Apriori and Bit-Apriori lies in candidate item-set generation and the support-count approach; these two steps are the costly ones, in both time and memory, in the Apriori [2] algorithm. Given a set of item-sets, the algorithm attempts to find subsets which are common to at least a minimum number C of the item-sets. The time required for mining [14][15] frequent k-item-sets grows significantly as k increases in Apriori. Bit-Apriori [1] performs much better because it has no candidate generation and needs to traverse the trie only once. The pseudocode for Bit-Apriori is shown in Table II.
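The abstract describes the Bit-Apriori support count as a bitwise “And” on binary strings. A minimal Python sketch of that idea is given below, assuming each item is stored as an integer whose bit i is set when transaction i contains the item. This layout, the helper names `build_bit_vectors` and `support`, and the sample data are assumptions made for illustration; they are not taken from [1] or from Table II, and the trie and equal-support pruning are not shown.

```python
def build_bit_vectors(transactions):
    """Map each item to an int whose bit i is 1 iff transaction i contains it."""
    vectors = {}
    for i, t in enumerate(transactions):
        for item in t:
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors


def support(itemset, vectors, n_transactions):
    """Support count of an item-set: the number of 1 bits left after
    AND-ing the bit vectors of all its items."""
    acc = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        acc &= vectors.get(item, 0)  # an unknown item matches no transaction
    return bin(acc).count("1")


# Hypothetical sample database of five transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
vectors = build_bit_vectors(transactions)
ab_support = support(["a", "b"], vectors, len(transactions))
```

Once the bit vectors are built, the support of any item-set can be obtained without rescanning the transaction database; in this sample, `ab_support` is 3 because items a and b occur together in transactions 0, 1 and 4.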