GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016
e-ISSN: 2455-5703
Lossless Representation of High Utility Item set Using APRIORI Algorithm

S. Rashmi, A. Selvaraj, K. Pavithra, S. Sundareswari
Students, Department of Information Technology, K.L.N College of Engineering

Abstract
Mining high utility item sets (HUIs) has become an important data mining task. Too many HUIs may degrade mining performance. To achieve high efficiency in the mining task, the Apriori algorithm is used. In this method, a refined database is built that makes the search easier. This in turn serves as a compact and lossless representation of HUIs.

Keywords- High utility item sets (HUIs), Apriori algorithm
I. INTRODUCTION
Frequent item set mining (FIM) is a fundamental research topic in data mining. The original approach uses a frequent pattern tree structure for data mining. A tree structure uses more memory and space, which is a major disadvantage. In retail applications, the traditional model of FIM may discover a large number of frequent but low-revenue item sets and lose information on valuable item sets that have low selling frequencies. These problems arise because FIM treats all items as having the same importance.

In high utility item set mining, each item has a weight (e.g. unit profit) and can appear more than once in each transaction (e.g. purchase quantity). The utility of an item set represents its importance. If its utility is no less than a user-specified minimum utility threshold, the item set is called a high utility item set (HUI); otherwise, it is called a low utility item set. High utility item set mining has a wide range of applications, such as website click stream analysis, cross marketing in retail stores, mobile commerce environments, and biomedical applications.

Concise representations reduce the number of item sets found. In this paper, we address these challenges by proposing a condensed and meaningful representation of HUIs named closed high utility item sets (CHUIs), which integrates the concept of closed item sets into high utility item set mining. Because of a new structure named the utility unit array, the proposed representation is lossless: it allows all HUIs and their utilities to be recovered efficiently. The proposed representation is also compact. Experiments show that it reduces the number of item sets by several orders of magnitude, especially for datasets containing long high utility item sets.

The original dataset consists of weather data related to storm events. The dataset is first pre-processed: the noisy data is removed manually, and only the data that is helpful for processing is retained. The pre-processed dataset is then provided as input to the next step, in which the dataset is loaded and processed. The availability of the file is checked; if the file is found it is processed, otherwise a warning is issued. The Apriori algorithm is then applied. The state codes are verified and the frequent sets are counted, so the frequent sets are prepared based on the number of occurrences. The frequent sets are provided as output using attribute numbers.
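The utility definition given in the introduction can be illustrated with a small sketch. The item names, unit profits, quantities, and the threshold below are hypothetical values chosen for illustration, and the function names are ours rather than the paper's. The sketch assumes the common convention that the utility of an item set is summed over the transactions containing all of its items.

```python
def itemset_utility(itemset, transactions, unit_profit):
    """Total utility of `itemset` over all transactions that contain it.

    Each transaction maps an item to its purchase quantity; the utility of
    an item in a transaction is quantity * unit profit.
    """
    items = set(itemset)
    total = 0
    for transaction in transactions:
        # Only transactions containing every item of the item set contribute.
        if items.issubset(transaction):
            total += sum(transaction[i] * unit_profit[i] for i in items)
    return total


def is_high_utility(itemset, transactions, unit_profit, min_utility):
    """An item set is a HUI if its utility is no less than the threshold."""
    return itemset_utility(itemset, transactions, unit_profit) >= min_utility


# Hypothetical data: unit profit per item and quantities per transaction.
unit_profit = {"bread": 1, "milk": 2, "wine": 10}
transactions = [
    {"bread": 2, "milk": 1},
    {"milk": 3, "wine": 1},
    {"bread": 1, "milk": 2, "wine": 2},
]

# {"milk", "wine"}: (3*2 + 1*10) + (2*2 + 2*10) = 40
print(itemset_utility({"milk", "wine"}, transactions, unit_profit))
print(is_high_utility({"bread"}, transactions, unit_profit, min_utility=30))
```

In this toy data, {milk, wine} appears in only two transactions yet has a much higher utility than the more frequent {bread}, which is the situation that frequency-based mining misses.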
All rights reserved by www.grdjournals.com
Lossless Representation of High Utility Item set Using APRIORI Algorithm (GRDJE / CONFERENCE / ICIET - 2016 / 033)
II. ARCHITECTURE OVERVIEW
III. APRIORI ALGORITHM
Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets, as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to derive association rules that highlight general trends in the database; this has applications in domains such as market basket analysis. Apriori uses a "bottom up" approach, in which frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Frequent sets are identified using the support threshold, and the confidence level is applied to the rules derived from them. All processing is based on these support and confidence levels.
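The bottom-up procedure described above can be sketched as follows. This is a minimal illustration of the textbook Apriori loop, not the implementation used in this paper. The function name `apriori`, the use of an absolute support count as the threshold, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(item set): support count} for all frequent item sets.

    `min_support` is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-sets whose union has k items.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                # Kept only if every (k-1)-subset is itself frequent.
                candidates.add(union)

        # Test the candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1  # the loop stops when no extension is frequent

    return result


# Hypothetical transactions for illustration.
sample = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, count in sorted(apriori(sample, 3).items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

With a minimum support count of 3, all single items and all pairs in the sample are frequent, while {A, B, C} occurs only twice and is discarded, so the loop terminates after the third level.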
IV. RELATED WORK
The existing system analyses item sets using a frequent pattern tree. In this system a dataset is provided as input and arranged in the form of a tree structure, and further searches are made from that tree. The tree structure occupies more space in memory, and the time needed for a search is also higher. Every time the data has to be sorted, it is arranged in a tree structure, and this tree has to be traversed every time it is queried, which is a tedious job. These are the complexities of the existing system. To avoid them, we use a tabulation method in the proposed system, which reduces the complexity of usage.
V. PROPOSED WORK
The proposed system uses the Apriori algorithm. A concise dataset is prepared and the search is then done on it, so the original database no longer needs to be searched. Further processing arranges the dataset in hierarchical order. In this way space and time complexity can be reduced greatly. The weather dataset is pre-processed and the noisy data is removed. The frequent sets are then calculated from the available data using the support and confidence values provided. Finally, the frequent sets are displayed using the attribute display array of the input data.
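As an illustration of the support and confidence values mentioned above, the following sketch computes both for a single rule. It uses the standard definitions (support as the fraction of transactions containing an item set, confidence as the share of antecedent transactions that also contain the consequent). The function names and the sample transactions are hypothetical and are not taken from the paper's program.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= frozenset(t))


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def rule_confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Computed as count(antecedent and consequent) / count(antecedent);
    returns 0.0 when the antecedent never occurs.
    """
    base = support_count(antecedent, transactions)
    if base == 0:
        return 0.0
    joint = support_count(set(antecedent) | set(consequent), transactions)
    return joint / base


# Hypothetical transactions for illustration.
sample = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(support({"A", "B"}, sample))           # 3 of 5 transactions
print(rule_confidence({"A"}, {"B"}, sample))  # 3 of the 4 transactions with A
```

An item set would be kept as frequent when its support reaches the user-provided minimum, and a rule would be reported when its confidence reaches the user-provided confidence level.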
VI. IMPLEMENTATION
The modules to be implemented are pre-processing, loading and processing of data files, the Apriori check based on support and confidence values, and output of the frequent sets using attribute numbers.

In the first step, the dataset is pre-processed. The original database contains a large amount of data, which makes it noisy, so it has to be pre-processed. The weather dataset has a lot of unused data that is not helpful for processing; this data is removed during pre-processing.

Next, the dataset is loaded and processed. The dataset is provided as input to the program and the availability of the data file is checked. If the data file is present, processing continues; if it is not, a proper input file must be provided.

The Apriori check is then carried out using the support and confidence levels. The Apriori algorithm is applied to the input file, processing is done based on the support and confidence levels provided, and the frequent sets are calculated. Their count is provided as output.

Finally, the output is presented to the users using the attribute numbering method. The number of high utility item sets is calculated and provided as output to the users.
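The loading step described above (check that the data file exists, then read the rows that are useful for processing) might look like the following sketch. The CSV layout, the column names `STATE` and `EVENT_TYPE`, and the choice to raise an exception when the file is missing are all assumptions made for illustration; the paper does not specify the file format or how the warning is issued.

```python
import csv
import os


def load_transactions(path, columns):
    """Read a CSV file and return one set of "column=value" items per row.

    Rows with an empty value in any selected column are skipped, as a
    simple stand-in for the removal of noisy data.
    """
    if not os.path.isfile(path):
        # Availability check: signal that a proper input file is needed.
        raise FileNotFoundError(f"Input data file not found: {path}")

    transactions = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            values = [(row.get(c) or "").strip() for c in columns]
            if all(values):  # keep only complete rows
                transactions.append({f"{c}={v}" for c, v in zip(columns, values)})
    return transactions
```

A call such as `load_transactions("storms.csv", ["STATE", "EVENT_TYPE"])`, where `storms.csv` is a hypothetical file name, would return a list of item sets that can be passed to a frequent item set miner.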
VII. EXPERIMENTAL RESULT
VIII. CONCLUSION
The perception of the web has increased with the introduction of new social platforms, which need methods and tools that help users search for other user groups sharing their interests. The advent of social networks, web user groups, and other user groups has changed the way information is shared between users. In this paper, a weather dataset is used for processing. The various locations where storms have occurred are categorized and taken as input. The Apriori algorithm is then applied, and the count of frequent sets is calculated and processed. The result is provided as output in the form of an array.
IX. FUTURE ENHANCEMENT
This paper proposed a way of identifying the impact of storms in various locations. Only a few state codes were entered. The work could be enhanced by including more state codes, which would make prediction easy for various locations in the world. This would be helpful for tourists who visit such locations, and disaster zones could be analysed easily. Other analyses can also be made. For example, a crime dataset could be used as input to provide more security to users, which could help the government improve security.
REFERENCES
[1] Sen Zhang, Zhihui Du, and Jason T. L. Wang, "New Techniques for Mining Frequent Patterns in Unordered Trees," IEEE Transactions on Cybernetics, Vol. 45, No. 6, June 2015, p. 1113.
[2] Vincent S. Tseng, Cheng-Wei Wu, Philippe Fournier-Viger, and Philip S. Yu, "Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Item sets," IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 3, March 2015.
[3] Divya Bansal and Lekha Bhambhu, "Execution of Apriori Algorithm of Data Mining Directed Towards Tumultuous Crimes Concerning Women," International Journal of Advanced Research in Computer Science and Software Engineering (ISSN: 2277 128X), Vol. 3, Issue 9, September 2013.
[4] U. Kanimozhi, J. K. Kavitha, and D. Manjula, "Mining High Utility Item sets – A Recent Survey," International Journal of Scientific Engineering and Technology (ISSN: 2277-1581), Vol. 3, Issue 11, pp. 1339-1344.
[5] M. Premalatha and T. Menaka, "Discovering Frequent Item sets Using Fast Apriori Algorithm," International Journal of Innovative Research in Computer and Communication Engineering, Vol. 3, Issue 11, November 2015.
[6] Akshita Bhandari, Ashutosh Gupta, and Debasis Das, "Improvised Apriori Algorithm Using Frequent Pattern Tree for Real Time Applications in Data Mining," International Conference on Information and Communication Technologies (ICICT 2014).
[7] Claudio Lucchese, Salvatore Orlando, and Raffaele Perego, "Fast and Memory Efficient Mining of Frequent Closed Item sets," IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 1, January 2006.
[8] Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, "Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach."