ISSN (ONLINE): 2045-8711 ISSN (PRINT): 2045-869X
INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY & CREATIVE ENGINEERING
DECEMBER 2016 VOL- 6 NO-12
@IJITCE Publication
INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.12 DECEMBER 2016, IMPACT FACTOR: 0.61
UK: Managing Editor, International Journal of Innovative Technology and Creative Engineering, 1a Park Lane, Cranford, London TW5 9WA, UK. E-Mail: editor@ijitce.co.uk Phone: +44-773-043-0249
USA: Editor, International Journal of Innovative Technology and Creative Engineering, Dr. Arumugam, Department of Chemistry, University of Georgia, GA-30602, USA. Phone: 001-706-206-0812 Fax: 001-706-542-2626
India: Editor, International Journal of Innovative Technology & Creative Engineering, Dr. Arthanariee. A. M, Finance Tracking Center India, 66/2 East Mada St, Thiruvanmiyur, Chennai-600041. Mobile: 91-7598208700
www.ijitce.co.uk
From Editor's Desk
Dear Researcher,
Greetings! The research article in this issue discusses frequent item set mining. Let us review research around the world this month. Galaxies as wide as the Milky Way but bereft of starlight are scattered throughout our cosmic neighbourhood. Unlike Andromeda and other well-known galaxies, these dark beasts have neither grand spirals of stars and gas wrapped around a glowing core nor radiant balls of densely packed stars. Instead, researchers find just a wisp of starlight from a tenuous blob. How these dark galaxies form is unclear. They could be a whole new type of galaxy that challenges ideas about the birth of galaxies, or outliers of already familiar galaxies, black sheep shaped by their environment. Wherever they come from, dark galaxies appear to be ubiquitous. This haul of ghostly galaxies is puzzling on many fronts. Any galaxy the size of the Milky Way should have no trouble creating lots of stars, but it is still unclear how heavy the dark galaxies are. Perhaps these shadowy entities are failed galaxies, as massive as our own but mysteriously prevented from giving birth to a vast stellar family. Or, despite being as wide as the Milky Way, they could be relative lightweights stretched thin by internal or external forces. It has been an absolute pleasure to present to you articles that you wish to read. We look forward to many more research articles on new technologies from you and your friends. We are anxiously awaiting the rich and thorough research papers that our authors are preparing for the next issue.
Thanks, Editorial Team IJITCE
Editorial Members
Dr. Chee Kyun Ng, Ph.D, Department of Computer and Communication Systems, Faculty of Engineering, Universiti Putra Malaysia, UPM Serdang, 43400 Selangor, Malaysia.
Dr. Simon SEE, Ph.D, Chief Technologist and Technical Director at Oracle Corporation; Associate Professor (Adjunct) at Nanyang Technological University; Professor (Adjunct) at Shanghai Jiao Tong University, 27 West Coast Rise #08-12, Singapore 127470.
Dr. sc. agr. Horst Juergen SCHWARTZ, Ph.D, Humboldt-University of Berlin, Faculty of Agriculture and Horticulture, Asternplatz 2a, D-12203 Berlin, Germany.
Dr. Marco L. Bianchini, Ph.D, Italian National Research Council; IBAF-CNR, Via Salaria km 29.300, 00015 Monterotondo Scalo (RM), Italy.
Dr. Nijad Kabbara, Ph.D, Marine Research Centre / Remote Sensing Centre / National Council for Scientific Research, P. O. Box: 189 Jounieh, Lebanon.
Dr. Aaron Solomon, Ph.D, Department of Computer Science, National Chi Nan University, No. 303, University Road, Puli Town, Nantou County 54561, Taiwan.
Dr. Arthanariee. A. M, M.Sc., M.Phil., M.S., Ph.D, Director - Bharathidasan School of Computer Applications, Ellispettai, Erode, Tamil Nadu, India.
Dr. Takaharu KAMEOKA, Ph.D, Professor, Laboratory of Food, Environmental & Cultural Informatics, Division of Sustainable Resource Sciences, Graduate School of Bioresources, Mie University, 1577 Kurimamachiya-cho, Tsu, Mie, 514-8507, Japan.
Dr. M. Sivakumar, M.C.A., ITIL., PRINCE2., ISTQB., OCP., ICP., Ph.D., Project Manager - Software, Applied Materials, 1a Park Lane, Cranford, UK.
Dr. Bulent Acma, Ph.D, Anadolu University, Department of Economics, Unit of Southeastern Anatolia Project (GAP), 26470 Eskisehir, Turkey.
Dr. Selvanathan Arumugam, Ph.D, Research Scientist, Department of Chemistry, University of Georgia, GA-30602, USA.
Review Board Members
Dr. Paul Koltun, Senior Research Scientist, LCA and Industrial Ecology Group, Metallic & Ceramic Materials, CSIRO Process Science & Engineering, Private Bag 33, Clayton South MDC 3169, Gate 5 Normanby Rd., Clayton Vic. 3168, Australia.
Dr. Zhiming Yang, MD., Ph.D., Department of Radiation Oncology and Molecular Radiation Science, 1550 Orleans Street Rm 441, Baltimore MD, 21231, USA.
Dr. Jifeng Wang, Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA.
Dr. Giuseppe Baldacchini, ENEA - Frascati Research Center, Via Enrico Fermi 45 - P.O. Box 65, 00044 Frascati, Roma, Italy.
Dr. Mutamed Turki Nayef Khatib, Assistant Professor of Telecommunication Engineering, Head of Telecommunication Engineering Department, Palestine Technical University (Kadoorie), Tulkarm, Palestine.
Dr. P. Uma Maheswari, Prof & Head, Department of CSE/IT, INFO Institute of Engineering, Coimbatore.
Dr. T. Christopher, Ph.D., Assistant Professor & Head, Department of Computer Science, Government Arts College (Autonomous), Udumalpet, India.
Dr. T. DEVI, Ph.D. Engg. (Warwick, UK), Head, Department of Computer Applications, Bharathiar University, Coimbatore-641 046, India.
Dr. Renato J. Orsato, Professor at FGV-EAESP, Getulio Vargas Foundation, São Paulo Business School, Rua Itapeva, 474 (8° andar), 01332-000, São Paulo (SP), Brazil; Visiting Scholar at INSEAD, INSEAD Social Innovation Centre, Boulevard de Constance, 77305 Fontainebleau, France.
Y. Benal Yurtlu, Assist. Prof., Ondokuz Mayis University.
Dr. Sumeer Gul, Assistant Professor, Department of Library and Information Science, University of Kashmir, India.
Dr. Chutima Boonthum-Denecke, Ph.D, Department of Computer Science, Science & Technology Bldg., Rm 120, Hampton University, Hampton, VA 23688.
Dr. Renato J. Orsato, Professor at FGV-EAESP, Getulio Vargas Foundation, São Paulo Business School, Rua Itapeva, 474 (8° andar), 01332-000, São Paulo (SP), Brazil.
Dr. Lucy M. Brown, Ph.D., Texas State University, 601 University Drive, School of Journalism and Mass Communication, OM330B, San Marcos, TX 78666.
Javad Robati, Crop Production Department, University of Maragheh, Golshahr, Maragheh, Iran.
Vinesh Sukumar (PhD, MBA), Product Engineering Segment Manager, Imaging Products, Aptina Imaging Inc.
Dr. Binod Kumar, PhD (CS), M.Phil. (CS), MIAENG, MIEEE, HOD & Associate Professor, IT Dept, Medi-Caps Inst. of Science & Tech. (MIST), Indore, India.
Dr. S. B. Warkad, Associate Professor, Department of Electrical Engineering, Priyadarshini College of Engineering, Nagpur, India.
Dr. doc. Ing. Rostislav Chotěborský, Ph.D., Katedra materiálu a strojírenské technologie, Technická fakulta, Česká zemědělská univerzita v Praze, Kamýcká 129, Praha 6, 165 21.
Dr. Paul Koltun, Senior Research Scientist, LCA and Industrial Ecology Group, Metallic & Ceramic Materials, CSIRO Process Science & Engineering, Private Bag 33, Clayton South MDC 3169, Gate 5 Normanby Rd., Clayton Vic. 3168.
Dr. Chutima Boonthum-Denecke, Ph.D, Department of Computer Science, Science & Technology Bldg., Hampton University, Hampton, VA 23688.
Mr. Abhishek Taneja, B.Sc (Electronics), M.B.E, M.C.A., M.Phil., Assistant Professor in the Department of Computer Science & Applications, Dronacharya Institute of Management and Technology, Kurukshetra, India.
Dr. Ing. Rostislav Chotěborský, Ph.D., Katedra materiálu a strojírenské technologie, Technická fakulta, Česká zemědělská univerzita v Praze, Kamýcká 129, Praha 6, 165 21.
Dr. Amala Vijaya Selvi Rajan, B.Sc, Ph.D, Faculty - Information Technology, Dubai Women’s College - Higher Colleges of Technology, P.O. Box - 16062, Dubai, UAE.
Naik Nitin Ashokrao, B.Sc, M.Sc, Lecturer in Yeshwant Mahavidyalaya, Nanded University.
Dr. A. Kathirvell, B.E, M.E, Ph.D, MISTE, MIACSIT, MENGG, Professor - Department of Computer Science and Engineering, Tagore Engineering College, Chennai.
Dr. H. S. Fadewar, B.Sc, M.Sc, M.Phil., Ph.D, PGDBM, B.Ed., Associate Professor - Sinhgad Institute of Management & Computer Application, Mumbai-Bangalore Westernly Express Way, Narhe, Pune - 41.
Dr. David Batten, Leader, Algal Pre-Feasibility Study, Transport Technologies and Sustainable Fuels, CSIRO Energy Transformed Flagship, Private Bag 1, Aspendale, Vic. 3195, Australia.
Dr. R C Panda (MTech & PhD (IITM); Ex-Faculty (Curtin Univ Tech, Perth, Australia)), Scientist, CLRI (CSIR), Adyar, Chennai - 600 020, India.
Miss Jing He, Ph.D. Candidate, Georgia State University, 1450 Willow Lake Dr. NE, Atlanta, GA, 30329.
Jeremiah Neubert, Assistant Professor, Mechanical Engineering, University of North Dakota.
Hui Shen, Mechanical Engineering Dept, Ohio Northern Univ.
Dr. Xiangfa Wu, Ph.D., Assistant Professor / Mechanical Engineering, North Dakota State University.
Seraphin Chally Abou, Professor, Mechanical & Industrial Engineering Dept., MEHS Program, 235 Voss-Kovach Hall, 1305 Ordean Court, Duluth, Minnesota 55812-3042.
Dr. Qiang Cheng, Ph.D., Assistant Professor, Computer Science Department, Southern Illinois University Carbondale, Faner Hall, Room 2140, Mail Code 4511, 1000 Faner Drive, Carbondale, IL 62901.
Dr. Carlos Barrios, PhD, Assistant Professor of Architecture, School of Architecture and Planning, The Catholic University of America.
Y. Benal Yurtlu, Assist. Prof., Ondokuz Mayis University.
Dr. Lucy M. Brown, Ph.D., Texas State University, 601 University Drive, School of Journalism and Mass Communication, OM330B, San Marcos, TX 78666.
Dr. Paul Koltun, Senior Research Scientist, LCA and Industrial Ecology Group, Metallic & Ceramic Materials, CSIRO Process Science & Engineering.
Dr. Sumeer Gul, Assistant Professor, Department of Library and Information Science, University of Kashmir, India.
Dr. Chutima Boonthum-Denecke, Ph.D, Department of Computer Science, Science & Technology Bldg., Rm 120, Hampton University, Hampton, VA 23688.
Dr. Renato J. Orsato, Professor at FGV-EAESP, Getulio Vargas Foundation, São Paulo Business School, Rua Itapeva, 474 (8° andar), 01332-000, São Paulo (SP), Brazil.
Dr. Wael M. G. Ibrahim, Department Head - Electronics Engineering Technology Dept., School of Engineering Technology, ECPI College of Technology, 5501 Greenwich Road Suite 100, Virginia Beach, VA 23462.
Dr. Messaoud Jake Bahoura, Associate Professor - Engineering Department and Center for Materials Research, Norfolk State University, 700 Park Avenue, Norfolk, VA 23504.
Dr. V. P. Eswaramurthy, M.C.A., M.Phil., Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 007, India.
Dr. P. Kamakkannan, M.C.A., Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 007, India.
Dr. V. Karthikeyani, Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 008, India.
Dr. K. Thangadurai, Ph.D., Assistant Professor, Department of Computer Science, Government Arts College (Autonomous), Karur - 639 005, India.
Dr. N. Maheswari, Ph.D., Assistant Professor, Department of MCA, Faculty of Engineering and Technology, SRM University, Kattangulathur, Kanchipuram Dt - 603 203, India.
Mr. Md. Musfique Anwar, B.Sc (Engg.), Lecturer, Computer Science & Engineering Department, Jahangirnagar University, Savar, Dhaka, Bangladesh.
Mrs. Smitha Ramachandran, M.Sc (CS), SAP Analyst, Akzonobel, Slough, United Kingdom.
Dr. V. Vallimayil, Ph.D., Director, Department of MCA, Vivekanandha Business School For Women, Elayampalayam, Tiruchengode - 637 205, India.
Mr. M. Moorthi, M.C.A., M.Phil., Assistant Professor, Department of Computer Applications, Kongu Arts and Science College, India.
Prema Selvaraj, B.Sc, M.C.A, M.Phil, Assistant Professor, Department of Computer Science, KSR College of Arts and Science, Tiruchengode.
Mr. G. Rajendran, M.C.A., M.Phil., N.E.T., PGDBM., PGDBF., Assistant Professor, Department of Computer Science, Government Arts College, Salem, India.
Dr. Pradeep H Pendse, B.E., M.M.S., Ph.D, Dean - IT, Welingkar Institute of Management Development and Research, Mumbai, India.
Muhammad Javed, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin 9, Ireland.
Dr. G. GOBI, Assistant Professor - Department of Physics, Government Arts College, Salem - 636 007.
Dr. S. Senthilkumar, Post Doctoral Research Fellow (Mathematics and Computer Science & Applications), Universiti Sains Malaysia, School of Mathematical Sciences, Pulau Pinang-11800, Penang, Malaysia.
Manoj Sharma, Associate Professor, Dept. of ECE, Prannath Parnami Institute of Management & Technology, Hissar, Haryana, India.
RAMKUMAR JAGANATHAN, Asst. Professor, Dept. of Computer Science, V.L.B. Janakiammal College of Arts & Science, Coimbatore, Tamil Nadu, India.
Dr. S. B. Warkad, Assoc. Professor, Priyadarshini College of Engineering, Nagpur, Maharashtra State, India.
Dr. Saurabh Pal, Associate Professor, UNS Institute of Engg. & Tech., VBS Purvanchal University, Jaunpur, India.
Manimala, Assistant Professor, Department of Applied Electronics and Instrumentation, St Joseph’s College of Engineering & Technology, Choondacherry Post, Kottayam Dt., Kerala - 686579.
Dr. Qazi S. M. Zia-ul-Haque, Control Engineer, Synchrotron-light for Experimental Sciences and Applications in the Middle East (SESAME), P. O. Box 7, Allan 19252, Jordan.
Dr. A. Subramani, M.C.A., M.Phil., Ph.D., Professor, Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode - 637215.
Dr. Seraphin Chally Abou, Professor, Mechanical & Industrial Engineering Dept., MEHS Program, 235 Voss-Kovach Hall, 1305 Ordean Court, Duluth, Minnesota 55812-3042.
Dr. K. Kousalya, Professor, Department of CSE, Kongu Engineering College, Perundurai-638 052.
Dr. (Mrs.) R. Uma Rani, Asso. Prof., Department of Computer Science, Sri Sarada College For Women, Salem-16, Tamil Nadu, India.
MOHAMMAD YAZDANI-ASRAMI, Electrical and Computer Engineering Department, Babol "Noshirvani" University of Technology, Iran.
Dr. Kulasekharan, N, Ph.D, Technical Lead - CFD, GE Appliances and Lighting, GE India, John F Welch Technology Center, Plot # 122, EPIP, Phase 2, Whitefield Road, Bangalore - 560066, India.
Dr. Manjeet Bansal, Dean (Post Graduate), Department of Civil Engineering, Punjab Technical University, Giani Zail Singh Campus, Bathinda - 151001 (Punjab), India.
Dr. Oliver Jukić, Vice Dean for Education, Virovitica College, Matije Gupca 78, 33000 Virovitica, Croatia.
Dr. Lori A. Wolff, Ph.D., J.D., Professor of Leadership and Counselor Education, The University of Mississippi, Department of Leadership and Counselor Education, 139 Guyton, University, MS 38677.
Contents
An Adaptive Intelligence Technique for Ultra Metric Tree Frequent Item Set Mining, K. Mohankumar, M. Santhoshmani, Dr. S. Prasath [395]
An Adaptive Intelligence Technique for Ultra Metric Tree Frequent Item Set Mining
K. Mohankumar, M.Phil Research Scholar, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India.
Mrs. M. Santhoshmani, Assistant Professor, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India. Email: shrisanu21@gmail.com
Dr. S. Prasath, Assistant Professor, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India. Email: softprasaths@gmail.com
Abstract- Frequent items are items that occur frequently in a dataset. Frequent item set mining (FIM) is one of the core data mining operations and is mainly used for market basket analysis; for example, bread and butter are items that frequently occur together in the same purchase. The traditional frequent item set mining algorithms are Apriori and FP-Growth. Apriori is a level-wise iterative approach in which frequent k-item sets are used to generate candidate (k+1)-item sets. Each iteration consists of two steps, a join step and a prune step: candidate item sets are first generated by joining, and frequent item sets are then obtained by checking each candidate against the minimum support count. The process is repeated until no more frequent item sets can be generated. However, Apriori must generate a large number of candidate item sets, which increases computing time. To overcome this, a pattern-growth algorithm, FP-Growth, was proposed, which significantly reduces the size of the candidate sets. FP-Growth adopts a divide-and-conquer strategy for finding frequent item sets, but it also has a disadvantage: frequent item sets are generated by repeatedly scanning the database and recursively traversing the tree.
Keywords- Data Mining, Frequent Item Set, Apriori.
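To make Apriori's join and prune steps concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not the authors' code: the function name `apriori` is hypothetical, and running it on the example database D of Table 5.1 with a minimum support count of 3 is a choice made for this sketch.

```python
from itertools import combinations

# Example database D (Table 5.1): TID -> items bought.
D = {
    100: ["a", "c", "d", "f", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "c", "e", "f", "l", "m", "n", "p"],
}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # One database scan: count the transactions containing each candidate,
        # then keep only candidates that reach the minimum support count.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        # Prune step: drop any candidate that has an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        }
        frequent = count_frequent(candidates)
        result.update(frequent)
        k += 1
    return result


frequent_itemsets = apriori(D.values(), min_support=3)
longest = max(frequent_itemsets, key=len)
print(sorted(longest), frequent_itemsets[longest])  # prints ['a', 'c', 'f', 'm'] 3
```

Each pass of the `while` loop calls `count_frequent`, which rescans every transaction; this per-level rescanning and candidate generation is the cost the text attributes to Apriori.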
1. INTRODUCTION
Frequent items are items that occur frequently in a dataset. Frequent item set mining (FIM) is one of the core data mining operations and is mainly used for market basket analysis; for example, bread and butter are items that frequently occur together in the same purchase. The traditional frequent item set mining algorithms are Apriori and FP-Growth. Apriori is a level-wise iterative approach in which frequent k-item sets are used to generate candidate (k+1)-item sets. Each iteration consists of two steps, a join step and a prune step: candidate item sets are first generated by joining, and frequent item sets are then obtained by checking each candidate against the minimum support count. The process is repeated until no more frequent item sets can be generated. However, Apriori must generate a large number of candidate item sets, which increases computing time. To overcome this, a pattern-growth algorithm, FP-Growth, was proposed, which significantly reduces the size of the candidate sets. FP-Growth adopts a divide-and-conquer strategy for finding frequent item sets, but it also has a disadvantage: frequent item sets are generated by repeatedly scanning the database and recursively traversing the tree.
1.1 Frequent Item Set Ultra Metric Tree Using MapReduce
Nowadays FIM is used extensively by researchers because it is widely applied in the real world to find frequent item sets. As database volumes grow day by day, the problems of scalability and efficiency become more severe. As a solution to this problem, we design a parallel frequent item set mining method based on the FIUT algorithm on the MapReduce framework. In this paper we incorporate an enhanced Frequent Item set Ultra metric Tree (E-FIUT) rather than the traditional FP-tree. FIUT has four major advantages over the traditional FP-tree. First, it involves only two rounds of scanning, which minimizes I/O overhead. Second, the E-FIUT offers a highly improved way to partition a database, which considerably reduces the search space. Third, only the frequent items in each transaction are inserted as nodes into the E-FIUT, giving compressed storage. Finally, all frequent item sets are generated by checking the leaves of each E-FIUT, without traversing the tree recursively, which significantly reduces computing time.
2. BIG DATA
In the 21st century, daily life is increasingly inseparable from the network. People visit dozens or even hundreds of web pages, or upload photos or speech recordings, every day, so the data content on the network grows geometrically, and traditional technical architectures are increasingly unable to meet the needs of such vast amounts of data. Research on massive data processing and storage has therefore become more and more popular.
Big data refers to data so large that it becomes difficult to process with conventional database systems: the data is very large, moves very fast, or does not fit the structures of existing database architectures. To gain value from such data, another way of processing it must be chosen. In general, big data is defined as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Big data is the frontier of a firm's ability to store, process and access the large volume of data it needs to operate effectively, make decisions, reduce risks, and serve customers. However, the amount of data generated is often too large for a single computer to process in a reasonable amount of time, and the data itself may be too big to store on a single machine. Therefore, to reduce processing time and to allocate storage space for large files, it is necessary to write programs that execute on multiple computers and distribute the workload among them.
3. HADOOP
Hadoop is the foundation of many big data architectures. Apache Hadoop is an open-source Java framework for storing and processing large datasets on clusters of commodity hardware, where a cluster is a set of machines on a single LAN (Local Area Network). Hadoop mainly consists of the underlying distributed file system, HDFS (Hadoop Distributed File System), and the MapReduce parallel programming model engine. Hadoop is used by various universities and companies such as Google, eBay, Facebook, IBM, LinkedIn and Twitter. HDFS is a reliable distributed file system that provides high-throughput, scalable access to data, and MapReduce is a distributed framework for executing work in parallel. Hadoop has a master/slave architecture for both processing and storage.
HDFS is a file system specially designed for storing massive datasets on clusters of commodity hardware with a streaming access pattern. A streaming access pattern means that data is written once and read any number of times, but the content of files in the file system is not changed. HDFS differs significantly from other file systems: it is a very large distributed file system that is highly fault-tolerant, provides high-throughput access to large data, and is deployed on low-cost hardware. HDFS is mainly used for storing data, and storage capacity and computing power can grow simply by adding servers. MapReduce can make full use of the CPU resources of each server, which allows the stored data to be processed efficiently. To address the issues above, Google developed the Google File System (GFS), a distributed file system architecture for processing large amounts of data, and created the MapReduce programming model for processing massive amounts of data in parallel. Hadoop is open-source software, written in Java, that implements the MapReduce framework; it was originally developed by Yahoo. A MapReduce job consists of two kinds of tasks, Map tasks and Reduce tasks. Each Map task takes key-value pairs as input and produces key-value pairs as output. The input data is divided into input splits, and one Mapper is assigned to each input split. The RecordReader is the interface between an input split and its Mapper; it converts each record into a key-value pair. The Mapper reads key-value pairs as input and produces intermediate key-value pairs as output, and the Reducer then combines all the intermediate values associated with a particular key. The Mappers' input and the Reducers' final output are stored in HDFS. The advantages of MapReduce are high scalability, transparent fault-tolerant processing and automatic parallelization.
4. EXISTING SYSTEM
Rather than Apriori or FP-growth, the existing system incorporates the frequent items ultra metric tree (FIU-tree) in the design of its parallel FIM technique. The FIU-tree was chosen for four salient advantages: reduced I/O overhead, a natural way of partitioning a dataset, compressed storage, and avoidance of recursive traversal. Existing parallel mining algorithms for frequent item sets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large computing clusters; as a result, they are not efficient and require more time for mining. As a solution to this problem, a parallel frequent item set mining algorithm called FiDoop was designed using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultra metric tree rather than conventional FP-trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose item sets, and the reducers perform combination operations by constructing small ultra metric trees and mining these trees separately. FiDoop was implemented on an in-house Hadoop cluster.
On this cluster, FiDoop was shown to be sensitive to data distribution and dimensionality, because item sets of different lengths have different decomposition and construction costs. To improve FiDoop's performance, a workload balance metric was developed to measure load balance across the cluster's computing nodes, together with FiDoop-HD, an extension of FiDoop that speeds up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrated that this solution is efficient and scalable.
5. PROPOSED SYSTEM
The proposed system introduces a new data partitioning method to balance the computing load well among the cluster nodes, and develops FiDoop-HD, an extension of FiDoop, to meet the needs of high-dimensional data processing.
5.1 Frequent Item Set Ultra Metric Tree
I-FIUT is a new method for mining frequent item sets from a database. I-FIUT has four major advantages over the traditional FP-tree. First, it involves only two rounds of scanning, which minimizes I/O overhead. Second, it offers a highly improved way to partition a database, which considerably reduces the search space. Third, only the frequent items in each transaction are inserted as nodes into the I-FIUT, giving compressed storage. Finally, all frequent item sets are generated by checking the leaves of each I-FIUT, without traversing the tree recursively, which significantly reduces computing time. I-FIUT generates the frequent item sets from the transactions in two phases, using two rounds of scanning the database.
5.2 Generating One Item Sets and K Item Sets
Phase 1 consists of two rounds of scanning the database. In the first round, the frequent one-item sets are generated based on the minimum support count. In the second round, all k-item sets are generated by pruning the infrequent items from each transaction.
5.3 Generating Frequent K Item Sets
Phase 2 consists of two processes. First, each h-item set is decomposed into k-item sets. Second, a k-FIU-tree is constructed repeatedly, and all frequent k-item sets are generated by checking the leaves of the FIU-tree, where k runs from M down to 2. The k-item sets produced by decomposition are used to construct the k-FIU-tree as follows. Initially, the root is labeled null, and each k-item set is then inserted into the tree. If the first frequent item of the k-item set exists as one of the children of the root, that child is denoted the temporary 1st root; if it does not exist, a new node for this item is added as a child of the root and denoted the temporary 1st root. Then, for the sth frequent item of the k-item set, where s runs from 2 to k − 1: if the sth item exists as one of the children of the temporary (s−1)th root, that child is denoted the temporary sth root; otherwise, a new node for this item is added as a child of the temporary (s−1)th root and denoted the temporary sth root. Finally, the kth item is stored as a leaf under the temporary (k−1)th root, and the leaf keeps a count of how many times that k-item set has been inserted. This process is repeated until the k-FIU-tree is constructed, and all frequent k-item sets are then generated by checking the leaf nodes.
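The k-FIU-tree construction of Section 5.3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: items are inserted in a fixed alphabetical order so that k-item sets with a common prefix share a path, and the tree keeps a dictionary of its leaves so that frequent k-item sets are read off the leaves without a recursive traversal. The class names `FIUNode` and `KFIUTree` are hypothetical. The usage example decomposes three 5-item sets ({a, c, f, m, p} twice and {a, b, c, f, m} once, the pruned transactions of the example in Section 5.4) into 4-item sets and assumes a minimum support count of 3.

```python
from itertools import combinations


class FIUNode:
    """Node of a k-FIU-tree; only leaves (at depth k) carry a count."""

    def __init__(self, item=None):
        self.item = item      # None for the root, which is labelled null
        self.children = {}    # item -> FIUNode
        self.count = 0


class KFIUTree:
    def __init__(self, k):
        self.k = k
        self.root = FIUNode()
        self.leaves = {}      # path tuple -> leaf, so leaves can be checked directly

    def insert(self, itemset):
        items = sorted(itemset)  # assumption: a fixed global (alphabetical) order
        if len(items) != self.k:
            raise ValueError(f"expected a {self.k}-item set, got {items}")
        temp_root = self.root
        # Items 1..k-1: reuse a matching child as the next temporary root, else add one.
        for item in items[:-1]:
            temp_root = temp_root.children.setdefault(item, FIUNode(item))
        # kth item: the leaf; create it the first time, then increment its count.
        last = items[-1]
        leaf = temp_root.children.get(last)
        if leaf is None:
            leaf = FIUNode(last)
            temp_root.children[last] = leaf
            self.leaves[tuple(items)] = leaf
        leaf.count += 1

    def frequent(self, min_support):
        """Check the leaves only; the tree is never traversed recursively."""
        return {path: leaf.count for path, leaf in self.leaves.items()
                if leaf.count >= min_support}


five_itemsets = [("a", "c", "f", "m", "p"), ("a", "c", "f", "m", "p"), ("a", "b", "c", "f", "m")]
tree4 = KFIUTree(4)
for h_itemset in five_itemsets:
    for subset in combinations(h_itemset, 4):  # decompose each 5-item set
        tree4.insert(subset)
print(tree4.frequent(3))  # prints {('a', 'c', 'f', 'm'): 3}
```

Keeping a leaf registry is one way to realize "checking the leaves"; a production version would also need to decide how item order is chosen, which this sketch fixes alphabetically.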
5.4 I-FIUT Example
Each phase of the Frequent Item Set Ultra Metric Tree is explained with an example. Consider the transactional database D of five transactions shown in Table 5.1.
Table 5.1 Database D
TID    Items bought
100    a, c, d, f, g, i, m, p
200    a, b, c, f, l, m, o
300    b, f, h, j, o
400    b, c, k, s, p
500    a, c, e, f, l, m, n, p
In the first round of scanning in Phase 1, the frequent one-item sets are generated with a minimum support count of 3, so items such as l and o, which occur in only two transactions, are pruned. Table 5.2 shows the frequent one-item sets of database D.
Table 5.2 Frequent 1-item sets
Item    Support count
a       3
b       3
c       4
f       4
m       3
p       3
In the second round of scanning in Phase 1, all k-item sets are generated by pruning each infrequent item from each transaction. Table 5.3 shows all k-item sets.
Table 5.3 All k-item sets
k              Item sets
5-item sets    {a, c, f, m, p} (TIDs 100 and 500), {a, b, c, f, m} (TID 200)
4-item sets    Ø
3-item sets    {b, c, p} (TID 400)
2-item sets    {b, f} (TID 300)
5.5 Parallel Mining of Frequent Item Sets Using I-FIUT on the MapReduce Framework
As database volumes increase day by day, traditional frequent item set mining algorithms become inefficient. As a solution to this problem, parallel mining of frequent item sets using the I-FIUT algorithm is implemented on the MapReduce framework. We use the I-FIUT algorithm rather than the traditional FP-tree algorithm to avoid building conditional pattern bases and to achieve compressed storage. We build this using the Hadoop framework, and synthetic datasets are used for the experimental analysis. The workflow of the I-FIUT algorithm on the MapReduce framework consists of three MapReduce jobs: the output of the first job is all frequent one-item sets, the second job is responsible for creating all k-item sets, and the third job is responsible for creating all frequent k-item sets.
5.6 Frequent One Item Sets Generation
The first MapReduce job is responsible for mining all frequent one-item sets. A transaction database is partitioned into multiple input split files stored by HDFS across multiple data nodes of a Hadoop cluster. The number of mappers is determined by the number of input splits. Each mapper sequentially reads each transaction from its local input split, where the RecordReader presents each transaction as a key-value pair <LongWritable offset, Text record>. Then, the mappers compute the frequencies of items and generate local one-item sets.
Next, the one-item sets with the same key emitted by different mappers are sorted and merged in a specific reducer, which produces the global one-item sets. Finally, infrequent items are pruned by applying the minimum support, and the global frequent one-item sets are written as pairs <Text item, LongWritable count>, the output of the first MapReduce job. Importantly, the frequent one-item sets, along with their counts, are stored in a local file system and become the input of the second MapReduce job.
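The data flow of the first MapReduce job can be simulated in plain Python to show what each stage does. This sketch does not use the Hadoop API: Python `int` and `str` stand in for `LongWritable` and `Text`, dividing Table 5.1 into two input splits is an arbitrary choice, and the function names (`record_reader`, `mapper`, `shuffle`, `reducer`, `job1`) are illustrative. A minimum support count of 3 is assumed.

```python
from collections import defaultdict

# Table 5.1 divided into two input splits (an arbitrary split for illustration).
SPLITS = [
    "100 a,c,d,f,g,i,m,p\n200 a,b,c,f,l,m,o\n300 b,f,h,j,o\n",
    "400 b,c,k,s,p\n500 a,c,e,f,l,m,n,p\n",
]


def record_reader(split_text):
    """Turn a split into (offset, record) pairs, like <LongWritable offset, Text record>."""
    offset = 0
    for line in split_text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line)


def mapper(split_text):
    """Count item frequencies within one split and emit local one-item sets."""
    local_counts = defaultdict(int)
    for _offset, record in record_reader(split_text):
        _tid, items = record.split(" ", 1)
        for item in items.split(","):
            local_counts[item] += 1
    return list(local_counts.items())


def shuffle(mapper_outputs):
    """Group the values emitted by all mappers under their key, with keys sorted."""
    groups = defaultdict(list)
    for output in mapper_outputs:
        for item, count in output:
            groups[item].append(count)
    return sorted(groups.items())


def reducer(item, counts, min_support):
    """Merge local counts into a global count and prune infrequent items."""
    total = sum(counts)
    return (item, total) if total >= min_support else None


def job1(splits, min_support):
    mapper_outputs = [mapper(split) for split in splits]  # one mapper per split
    frequent_one_itemsets = {}
    for item, counts in shuffle(mapper_outputs):
        pair = reducer(item, counts, min_support)
        if pair is not None:
            frequent_one_itemsets[pair[0]] = pair[1]  # <Text item, LongWritable count>
    return frequent_one_itemsets


print(job1(SPLITS, min_support=3))
# prints {'a': 3, 'b': 3, 'c': 4, 'f': 4, 'm': 3, 'p': 3}
```

The mapper aggregates counts locally before emitting, as the text describes; in Hadoop a combiner is the usual way to get a similar local aggregation.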
5.7 All K Item sets Generation
Given the frequent one-item sets generated by the first MapReduce job, the second MapReduce job scans the database a second time to prune infrequent items from each transaction record. The second job marks an item set as a k-item set if it contains k frequent items (2 ≤ k ≤ M, where M is the maximal value of k in the pruned transactions). Each mapper of the second job takes transactions as input and emits a set of pairs <ArrayWritable itemsets, LongWritable ONE>. The key records both the number of items remaining after pruning and the items themselves. The pairs produced by the mappers are combined and shuffled to the second job's reducers. After the combination step, each reducer emits key/value pairs in which the key is the number of items in an item set and the value is the item set together with its count. More formally, the output of the second MapReduce job is pair<IntWritable item number, MapWritable<ArrayWritable k-item, LongWritable>>.

5.8 Frequent K Item sets Generation
The third MapReduce job, a computationally expensive phase, is dedicated to 1) decomposing item sets, 2) constructing k-FIU-trees, and 3) mining frequent item sets. Each mapper has two goals. The first is to decompose each k-item set obtained from the second MapReduce job into a list of smaller sets, each containing between 2 and k − 1 items. The second is to construct an FIU-tree by merging local decomposition results of the same length. The third MapReduce job is highly scalable because each mapper's decomposition is independent of the other mappers, so multiple mappers can perform the decomposition in parallel. Building these small FIU-trees locally improves storage efficiency and I/O performance, because identical item sets are merged in advance.
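The second job described in Section 5.7 can be sketched the same way. This is a hypothetical single-process illustration, not the authors' Hadoop code. `map_prune` and `reduce_group` are illustrative names, and the frequent items are assumed to come from the first job. A `(length, item set)` tuple stands in for the `ArrayWritable` key, and one function plays the role of all reducers. On the hypothetical toy database below, the pruned item sets it produces are the same as those listed in Table 5.3.

```python
from collections import Counter, defaultdict

def map_prune(transaction, frequent_items):
    """Hypothetical mapper: keep only frequent items and emit ((k, item set), 1)."""
    kept = tuple(sorted(set(transaction) & frequent_items))
    if len(kept) >= 2:  # only k-item sets with 2 <= k <= M are passed on
        return [((len(kept), kept), 1)]
    return []

def reduce_group(pairs):
    """Hypothetical reducer: combine identical item sets, then group them by length k."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    grouped = defaultdict(dict)
    for (k, itemset), count in counts.items():
        grouped[k][itemset] = count
    return dict(grouped)  # {k: {item set: count}}

# Hypothetical inputs: a toy database and the frequent items from the first job.
database = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
frequent_items = {"a", "b", "c", "f", "m", "p"}

pairs = [pair for t in database for pair in map_prune(t, frequent_items)]
k_itemsets = reduce_group(pairs)
for k in sorted(k_itemsets, reverse=True):
    print(f"{k}-item sets:", k_itemsets[k])
```

Sorting the surviving items gives every item set a canonical order. This lets identical item sets from different transactions collapse into one key with a combined count.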
The Map function of the third job generates a set of key/value pairs. In each pair, the key is the number of items in an item set, and the value is an FIU-tree made up of non-leaf and leaf nodes. Non-leaf nodes store an item name and a node-link; leaf nodes store an item name and its support. As a result, item sets with the same number of items are delivered to a single reducer. By parsing the key/value pair (k2, v2), the reducer constructs the k2-FIU-tree and mines all frequent item sets simply by checking the count of each leaf in the k2-FIU-tree, without repeatedly traversing the tree. Figure 6.4 illustrates the Map and Reduce functions, including the details of the t-FIU-tree generation function (for t-item sets). The decompose() function is recursive: it decomposes an h-item set into a list of k-item sets, where k is an integer between 2 and h.
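The third job can be sketched as follows, again as a hypothetical single-process illustration rather than the authors' implementation. `decompose`, `FIUNode`, `map_decompose` and `reduce_mine` are illustrative names, and `decompose` here returns the t-item subsets for one length t at a time. As simplifications, each mapper merges its local results with a `Counter` per length t instead of a small FIU-tree, and a dictionary of children stands in for the item-name/node-link fields of non-leaf nodes. The input is assumed to be the grouped output of the second job on a toy database, divided between two hypothetical mappers.

```python
from collections import Counter, defaultdict

def decompose(itemset, t):
    """Recursively list every t-item subset of a sorted item set (tuple)."""
    if t == 0:
        return [()]
    if len(itemset) < t:
        return []
    head, rest = itemset[0], itemset[1:]
    with_head = [(head,) + tail for tail in decompose(rest, t - 1)]
    return with_head + decompose(rest, t)  # subsets with head, then without it

class FIUNode:
    """Trie node: inner nodes route by item name; leaves (depth t) hold support."""
    def __init__(self):
        self.children = {}  # item name -> child node
        self.support = 0

def map_decompose(k_itemsets):
    """Mapper: decompose each h-item set into t-item sets (2 <= t <= h) and
    merge identical results locally, keyed by their length t."""
    local = defaultdict(Counter)
    for h, itemsets in k_itemsets.items():
        for itemset, count in itemsets.items():
            for t in range(2, h + 1):
                for subset in decompose(itemset, t):
                    local[t][subset] += count
    return local

def reduce_mine(t, local_results, min_support):
    """Reducer for key t: build the t-FIU-tree, then read frequent sets off its leaves."""
    root = FIUNode()
    for local in local_results:
        for itemset, count in local.get(t, {}).items():
            node = root
            for item in itemset:
                node = node.children.setdefault(item, FIUNode())
            node.support += count
    frequent, stack = {}, [(root, ())]
    while stack:  # a single pass that only checks leaf counts
        node, prefix = stack.pop()
        if not node.children and prefix:
            if node.support >= min_support:
                frequent[prefix] = node.support
        for item, child in node.children.items():
            stack.append((child, prefix + (item,)))
    return frequent

# Hypothetical second-job output, divided between two mappers.
split_a = {5: {("a", "c", "f", "m", "p"): 2}}
split_b = {5: {("a", "b", "c", "f", "m"): 1}, 3: {("b", "c", "p"): 1}, 2: {("b", "f"): 1}}
local_results = [map_decompose(split_a), map_decompose(split_b)]
for t in range(2, 6):
    print(t, reduce_mine(t, local_results, min_support=3))
```

Every pruned transaction contributes each of its t-item subsets exactly once, weighted by its count. The value at a leaf of the t-FIU-tree therefore equals the support of that item set, so the reducer only needs to compare leaf counts with the minimum support.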
6. PERFORMANCE EVALUATION
The performance of the proposed method is evaluated using the following parameters.

6.1 Minimum Support Count
The minimum support count plays an important role in mining frequent item sets. Increasing the minimum support threshold reduces the running time of the proposed algorithm, whereas a small minimum support slows down the evaluated algorithms. This is because more items satisfy a lower minimum support, and processing this larger number of items takes more time. Figure 6.1 shows the execution time for four different minimum support counts.
Fig.6.1 Minimum Support

6.2 Scalability
This experiment evaluates the scalability of the proposed algorithm as the size of the input dataset grows dramatically. The parallel mining process is slowed down by the large amount of data that has to be scanned twice, since a larger dataset leads to a longer scanning time. The output of the second MapReduce job is distributed and stored in intermediate files according to item set length. These files are read by the third MapReduce job as its input, and the decomposed results are also written into these external files. The proposed algorithm scales well when mining very large amounts of data in parallel.
Fig.6.2 Scalability
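The reasoning in Section 6.1 (a lower minimum support lets more item sets qualify, so there is more work to do) can be checked on a toy example. The brute-force counter below is only an illustration, not the I-FIUT algorithm. `count_frequent` is a hypothetical helper, and the database is a hypothetical five-transaction example rather than the synthetic datasets used in the experiments.

```python
from itertools import combinations

def count_frequent(database, min_support):
    """Count frequent item sets of every length by brute force (toy data only)."""
    transactions = [frozenset(t) for t in database]
    items = sorted(set().union(*transactions))
    total = 0
    for k in range(1, len(items) + 1):
        found = sum(
            1
            for candidate in combinations(items, k)
            if sum(1 for t in transactions if t.issuperset(candidate)) >= min_support
        )
        if found == 0:
            break  # no frequent k-item set implies no frequent (k+1)-item set
        total += found
    return total

# Hypothetical toy database.
database = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
for min_support in (1, 2, 3, 4, 5):
    print(min_support, count_frequent(database, min_support))
```

On this toy database, raising the minimum support from 3 to 4 shrinks the result from 18 frequent item sets to 2. The number of item sets that every later phase must process falls accordingly.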
7. CONCLUSION
To address the scalability and efficiency problems of existing parallel frequent item set mining algorithms, we applied parallel mining of frequent item sets using the Frequent Item set Ultrametric Tree (FIUT) on the MapReduce framework. We incorporate the Frequent Item set Ultrametric Tree rather than conventional FP-trees, thereby achieving compressed storage and avoiding the need to build conditional pattern bases. The proposed algorithm integrates three MapReduce jobs to accomplish parallel mining of frequent item sets; at the end of the third MapReduce job, all frequent k-item sets are generated. To evaluate the performance of the proposed I-FIUT algorithm on the MapReduce framework, we use synthetic datasets in our experiments.
REFERENCES
[1] Chang E.Y., Li H., Wang Y., Zhang D. and Zhang M. (2008), ‘PFP: Parallel FP-growth for query recommendation’, in Proc. ACM Conf. Recommend. Syst., Lausanne, Switzerland, pp. 107–114.
[2] Chang W.L., Chen P.L. and Lin K.W. (2011), ‘A novel frequent pattern mining algorithm for very large databases in cloud computing environments’, in Proc. IEEE Int. Conf. Granular Comput. (GrC), Kaohsiung, Taiwan, pp. 399–403.
[3] Chunyan H., Hong S., Huaxuan Z. and Shiping S. (2013), ‘The study of improved FP-growth algorithm in MapReduce’, in Proc. 1st Int. Workshop Cloud Comput. Inf. Security, Shanghai, China, pp. 250–253.
[4] Cong S., Han J., Hoeflinger J. and Padua D. (2005), ‘A sampling-based framework for parallel data mining’, in Proc. 10th ACM SIGPLAN Symp. Prin. Pract. Parallel Program., Chicago, IL, USA, pp. 255–265.
[5] Dean J. and Ghemawat S. (2008), ‘MapReduce: Simplified data processing on large clusters’, Commun. ACM, vol. 51, no. 1, pp. 107–113.
[6] Dean J. and Ghemawat S. (2010), ‘MapReduce: A flexible data processing tool’, Commun. ACM, vol. 53, no. 1, pp. 72–77.
[7] Han E.H., Karypis G. and Kumar V. (2000), ‘Scalable parallel data mining for association rules’, IEEE Trans. Knowl. Data Eng., vol. 12, no. 3, pp. 337–352.
[8] Han J., Mao R., Pei J. and Yin Y. (2004), ‘Mining frequent patterns without candidate generation: A frequent-pattern tree approach’, Data Min. Knowl. Disc., vol. 8, no. 1, pp. 53–87.
[9] Hsueh S.C., Lin M.Y. and Lee P.Y. (2012), ‘Apriori-based frequent itemset mining algorithms on MapReduce’, in Proc. 6th Int. Conf. Ubiquit. Inf. Manage. Commun. (ICUIMC), Danang, Vietnam, pp. 76:1–76:8.
[10] Hsu T.J., Tsay J.Y. and Yu J.R. (2009), ‘I-FIUT: A new method for mining frequent itemsets’, Inf. Sci., vol. 179, no. 11, pp. 1724–1737.
Kitsuregawa M. and Pramudiono I. (2003), ‘Parallel FP-growth on PC cluster’, in Advances in Knowledge Discovery and Data Mining. Berlin, Germany: Springer, pp. 467–473.