ISSN (ONLINE): 2045-8711 ISSN (PRINT): 2045-869X
INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY & CREATIVE ENGINEERING
MAY 2017 VOL-7 NO-05
@IJITCE Publication
INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.7 NO.05 MAY 2017, IMPACT FACTOR: 1.04
UK: Managing Editor International Journal of Innovative Technology and Creative Engineering 1a park lane, Cranford London TW59WA UK E-Mail: editor@ijitce.co.uk Phone: +44-773-043-0249 USA: Editor International Journal of Innovative Technology and Creative Engineering Dr. Arumugam Department of Chemistry University of Georgia GA-30602, USA. Phone: 001-706-206-0812 Fax:001-706-542-2626 India: Editor International Journal of Innovative Technology & Creative Engineering Dr. Arthanariee. A. M Finance Tracking Center India 66/2 East mada st, Thiruvanmiyur, Chennai -600041 Mobile: 91-7598208700
www.ijitce.co.uk
From Editor's Desk
Dear Researcher,
Greetings! The research article in this issue discusses motivational factor analysis. Let us review research around the world this month.
Particles with mind-bending quantum properties still follow a standard gravitational rule, at least as far as scientists can tell. One of the first reported tests of the equivalence principle, well before it was understood in the framework of general relativity, was Galileo’s apocryphal experiment in which he is said to have dropped weights from the Leaning Tower of Pisa. Scientists have since adapted that test to smaller scales, swapping out the weights for atoms. In the new study, physicists went a step further, putting atoms into a quantum superposition, a kind of limbo in which an atom does not have a definite energy but occupies a combination of two energy levels. Manipulating rubidium atoms with lasers, scientists led by researchers from Italy gave the atoms an upward kick and observed how gravity tugged them down. Quantum tests of the equivalence principle explore the murky realm where quantum mechanics and general relativity meet. The two theories don’t play well with one another. Scientists are currently struggling to unify the pair into one theory of quantum gravity, and some candidate theories predict that the equivalence principle breaks down at the quantum level.
It has been an absolute pleasure to present you with articles that you wish to read. We look forward to many more research articles on new technologies from you and your friends. We are eagerly awaiting the rich and thorough research papers that our authors have prepared for the next issue.
Thanks, Editorial Team IJITCE
Editorial Members Dr. Chee Kyun Ng Ph.D Department of Computer and Communication Systems, Faculty of Engineering,Universiti Putra Malaysia,UPMSerdang, 43400 Selangor,Malaysia. Dr. Simon SEE Ph.D Chief Technologist and Technical Director at Oracle Corporation, Associate Professor (Adjunct) at Nanyang Technological University Professor (Adjunct) at ShangaiJiaotong University, 27 West Coast Rise #08-12,Singapore 127470 Dr. sc.agr. Horst Juergen SCHWARTZ Ph.D, Humboldt-University of Berlin,Faculty of Agriculture and Horticulture,Asternplatz 2a, D-12203 Berlin,Germany Dr. Marco L. BianchiniPh.D Italian National Research Council; IBAF-CNR,Via Salaria km 29.300, 00015 MonterotondoScalo (RM),Italy Dr. NijadKabbaraPh.D Marine Research Centre / Remote Sensing Centre/ National Council for Scientific Research, P. O. Box: 189 Jounieh,Lebanon Dr. Aaron Solomon Ph.D Department of Computer Science, National Chi Nan University,No. 303, University Road,Puli Town, Nantou County 54561,Taiwan Dr. Arthanariee. A. M M.Sc.,M.Phil.,M.S.,Ph.D Director - Bharathidasan School of Computer Applications, Ellispettai, Erode, Tamil Nadu,India Dr. Takaharu KAMEOKA, Ph.D Professor, Laboratory of Food, Environmental & Cultural Informatics Division of Sustainable Resource Sciences, Graduate School of Bioresources,Mie University, 1577 Kurimamachiya-cho, Tsu, Mie, 514-8507, Japan Dr. M. Sivakumar M.C.A.,ITIL.,PRINCE2.,ISTQB.,OCP.,ICP. Ph.D. Project Manager - Software,Applied Materials,1a park lane,cranford,UK Dr. Bulent AcmaPh.D Anadolu University, Department of Economics,Unit of Southeastern Anatolia Project(GAP),26470 Eskisehir,TURKEY Dr. SelvanathanArumugamPh.D Research Scientist, Department of Chemistry, University of Georgia, GA-30602,USA.
Review Board Members Dr. Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials,CSIRO Process Science & Engineering Private Bag 33, Clayton South MDC 3169,Gate 5 Normanby Rd., Clayton Vic. 3168, Australia Dr. Zhiming Yang MD., Ph. D. Department of Radiation Oncology and Molecular Radiation Science,1550 Orleans Street Rm 441, Baltimore MD, 21231,USA Dr. Jifeng Wang Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign Urbana, Illinois, 61801, USA Dr. Giuseppe Baldacchini ENEA - Frascati Research Center, Via Enrico Fermi 45 - P.O. Box 65,00044 Frascati, Roma, ITALY.
Dr. MutamedTurkiNayefKhatib Assistant Professor of Telecommunication Engineering,Head of Telecommunication Engineering Department,Palestine Technical University (Kadoorie), TulKarm, PALESTINE. Dr.P.UmaMaheswari Prof &Head,Depaartment of CSE/IT, INFO Institute of Engineering,Coimbatore. Dr. T. Christopher, Ph.D., Assistant Professor &Head,Department of Computer Science,Government Arts College(Autonomous),Udumalpet, India. Dr. T. DEVI Ph.D. Engg. (Warwick, UK), Head,Department of Computer Applications,Bharathiar University,Coimbatore-641 046, India. Dr. Renato J. orsato Professor at FGV-EAESP,Getulio Vargas Foundation,São Paulo Business School,RuaItapeva, 474 (8° andar),01332-000, São Paulo (SP), Brazil Visiting Scholar at INSEAD,INSEAD Social Innovation Centre,Boulevard de Constance,77305 Fontainebleau - France Y. BenalYurtlu Assist. Prof. OndokuzMayis University Dr.Sumeer Gul Assistant Professor,Department of Library and Information Science,University of Kashmir,India Dr. ChutimaBoonthum-Denecke, Ph.D Department of Computer Science,Science& Technology Bldg., Rm 120,Hampton University,Hampton, VA 23688 Dr. Renato J. Orsato Professor at FGV-EAESP,Getulio Vargas Foundation,São Paulo Business SchoolRuaItapeva, 474 (8° andar),01332-000, São Paulo (SP), Brazil Dr. Lucy M. Brown, Ph.D. Texas State University,601 University Drive,School of Journalism and Mass Communication,OM330B,San Marcos, TX 78666 JavadRobati Crop Production Departement,University of Maragheh,Golshahr,Maragheh,Iran VineshSukumar (PhD, MBA) Product Engineering Segment Manager, Imaging Products, Aptina Imaging Inc. Dr. Binod Kumar PhD(CS), M.Phil.(CS), MIAENG,MIEEE HOD & Associate Professor, IT Dept, Medi-Caps Inst. of Science & Tech.(MIST),Indore, India Dr. S. B. Warkad Associate Professor, Department of Electrical Engineering, Priyadarshini College of Engineering, Nagpur, India Dr. doc. Ing. RostislavChoteborský, Ph.D. 
Katedramateriálu a strojírenskétechnologieTechnickáfakulta,Ceskázemedelskáuniverzita v Praze,Kamýcká 129, Praha 6, 165 21 Dr. Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials,CSIRO Process Science & Engineering Private Bag 33, Clayton South MDC 3169,Gate 5 Normanby Rd., Clayton Vic. 3168 DR.ChutimaBoonthum-Denecke, Ph.D Department of Computer Science,Science& Technology Bldg.,HamptonUniversity,Hampton, VA 23688 Mr. Abhishek Taneja B.sc(Electronics),M.B.E,M.C.A.,M.Phil., Assistant Professor in the Department of Computer Science & Applications, at Dronacharya Institute of Management and Technology, Kurukshetra. (India).
Dr. Ing. RostislavChotěborský,ph.d, Katedramateriálu a strojírenskétechnologie, Technickáfakulta,Českázemědělskáuniverzita v Praze,Kamýcká 129, Praha 6, 165 21
Dr. AmalaVijayaSelvi Rajan, B.sc,Ph.d, Faculty – Information Technology Dubai Women’s College – Higher Colleges of Technology,P.O. Box – 16062, Dubai, UAE Naik Nitin AshokraoB.sc,M.Sc Lecturer in YeshwantMahavidyalayaNanded University Dr.A.Kathirvell, B.E, M.E, Ph.D,MISTE, MIACSIT, MENGG Professor - Department of Computer Science and Engineering,Tagore Engineering College, Chennai Dr. H. S. Fadewar B.sc,M.sc,M.Phil.,ph.d,PGDBM,B.Ed. Associate Professor - Sinhgad Institute of Management & Computer Application, Mumbai-BangloreWesternly Express Way Narhe, Pune - 41 Dr. David Batten Leader, Algal Pre-Feasibility Study,Transport Technologies and Sustainable Fuels,CSIRO Energy Transformed Flagship Private Bag 1,Aspendale, Vic. 3195,AUSTRALIA Dr R C Panda (MTech& PhD(IITM);Ex-Faculty (Curtin Univ Tech, Perth, Australia))Scientist CLRI (CSIR), Adyar, Chennai - 600 020,India Miss Jing He PH.D. Candidate of Georgia State University,1450 Willow Lake Dr. NE,Atlanta, GA, 30329 Jeremiah Neubert Assistant Professor,MechanicalEngineering,University of North Dakota Hui Shen Mechanical Engineering Dept,Ohio Northern Univ. Dr. Xiangfa Wu, Ph.D. Assistant Professor / Mechanical Engineering,NORTH DAKOTA STATE UNIVERSITY SeraphinChallyAbou Professor,Mechanical& Industrial Engineering Depart,MEHS Program, 235 Voss-Kovach Hall,1305 OrdeanCourt,Duluth, Minnesota 55812-3042 Dr. Qiang Cheng, Ph.D. Assistant Professor,Computer Science Department Southern Illinois University CarbondaleFaner Hall, Room 2140-Mail Code 45111000 Faner Drive, Carbondale, IL 62901 Dr. Carlos Barrios, PhD Assistant Professor of Architecture,School of Architecture and Planning,The Catholic University of America Y. BenalYurtlu Assist. Prof. OndokuzMayis University Dr. Lucy M. Brown, Ph.D. Texas State University,601 University Drive,School of Journalism and Mass Communication,OM330B,San Marcos, TX 78666 Dr. 
Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials CSIRO Process Science & Engineering Dr.Sumeer Gul Assistant Professor,Department of Library and Information Science,University of Kashmir,India
Dr. Chutima Boonthum-Denecke, Ph.D Department of Computer Science, Science & Technology Bldg., Rm 120, Hampton University, Hampton, VA 23688
Dr. Renato J. Orsato Professor at FGV-EAESP,Getulio Vargas Foundation,São Paulo Business School,RuaItapeva, 474 (8° andar)01332-000, São Paulo (SP), Brazil Dr. Wael M. G. Ibrahim Department Head-Electronics Engineering Technology Dept.School of Engineering Technology ECPI College of Technology 5501 Greenwich Road Suite 100,Virginia Beach, VA 23462 Dr. Messaoud Jake Bahoura Associate Professor-Engineering Department and Center for Materials Research Norfolk State University,700 Park avenue,Norfolk, VA 23504 Dr. V. P. Eswaramurthy M.C.A., M.Phil., Ph.D., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 007, India. Dr. P. Kamakkannan,M.C.A., Ph.D ., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 007, India. Dr. V. Karthikeyani Ph.D., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 008, India. Dr. K. Thangadurai Ph.D., Assistant Professor, Department of Computer Science, Government Arts College ( Autonomous ), Karur - 639 005,India. Dr. N. Maheswari Ph.D., Assistant Professor, Department of MCA, Faculty of Engineering and Technology, SRM University, Kattangulathur, Kanchipiram Dt - 603 203, India. Mr. Md. Musfique Anwar B.Sc(Engg.) Lecturer, Computer Science & Engineering Department, Jahangirnagar University, Savar, Dhaka, Bangladesh. Mrs. Smitha Ramachandran M.Sc(CS)., SAP Analyst, Akzonobel, Slough, United Kingdom. Dr. V. Vallimayil Ph.D., Director, Department of MCA, Vivekanandha Business School For Women, Elayampalayam, Tiruchengode - 637 205, India. Mr. M. Moorthi M.C.A., M.Phil., Assistant Professor, Department of computer Applications, Kongu Arts and Science College, India PremaSelvarajBsc,M.C.A,M.Phil Assistant Professor,Department of Computer Science,KSR College of Arts and Science, Tiruchengode Mr. G. Rajendran M.C.A., M.Phil., N.E.T., PGDBM., PGDBF., Assistant Professor, Department of Computer Science, Government Arts College, Salem, India. 
Dr. Pradeep H Pendse B.E.,M.M.S.,Ph.d Dean - IT,Welingkar Institute of Management Development and Research, Mumbai, India Muhammad Javed Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin 9, Ireland Dr. G. GOBI Assistant Professor-Department of Physics,Government Arts College,Salem - 636 007 Dr.S.Senthilkumar Post Doctoral Research Fellow, (Mathematics and Computer Science & Applications),UniversitiSainsMalaysia,School of Mathematical Sciences, Pulau Pinang-11800,[PENANG],MALAYSIA. Manoj Sharma Associate Professor Deptt. of ECE, PrannathParnami Institute of Management & Technology, Hissar, Haryana, India
RAMKUMAR JAGANATHAN Asst-Professor,Dept of Computer Science, V.L.B Janakiammal college of Arts & Science, Coimbatore,Tamilnadu, India Dr. S. B. Warkad Assoc. Professor, Priyadarshini College of Engineering, Nagpur, Maharashtra State, India Dr. Saurabh Pal Associate Professor, UNS Institute of Engg. & Tech., VBS Purvanchal University, Jaunpur, India Manimala Assistant Professor, Department of Applied Electronics and Instrumentation, St Joseph’s College of Engineering & Technology, Choondacherry Post, Kottayam Dt. Kerala -686579 Dr. Qazi S. M. Zia-ul-Haque Control Engineer Synchrotron-light for Experimental Sciences and Applications in the Middle East (SESAME),P. O. Box 7, Allan 19252, Jordan Dr. A. Subramani, M.C.A.,M.Phil.,Ph.D. Professor,Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode - 637215 Dr. SeraphinChallyAbou Professor, Mechanical & Industrial Engineering Depart. MEHS Program, 235 Voss-Kovach Hall, 1305 Ordean Court Duluth, Minnesota 55812-3042 Dr. K. Kousalya Professor, Department of CSE,Kongu Engineering College,Perundurai-638 052 Dr. (Mrs.) R. Uma Rani Asso.Prof., Department of Computer Science, Sri Sarada College For Women, Salem-16, Tamil Nadu, India. MOHAMMAD YAZDANI-ASRAMI Electrical and Computer Engineering Department, Babol"Noshirvani" University of Technology, Iran. Dr. Kulasekharan, N, Ph.D Technical Lead - CFD,GE Appliances and Lighting, GE India,John F Welch Technology Center,Plot # 122, EPIP, Phase 2,Whitefield Road,Bangalore – 560066, India. Dr. Manjeet Bansal Dean (Post Graduate),Department of Civil Engineering,Punjab Technical University,GianiZail Singh Campus,Bathinda -151001 (Punjab),INDIA Dr. Oliver Jukić Vice Dean for education,Virovitica College,MatijeGupca 78,33000 Virovitica, Croatia Dr. Lori A. Wolff, Ph.D., J.D. Professor of Leadership and Counselor Education,The University of Mississippi,Department of Leadership and Counselor Education, 139 Guyton University, MS 38677
Contents
Analysis of Microarray Gene Expression Data Using Boolean Association Rule Mining, R. Vengateshkumar, S. Alagukumar & Dr. R. Lawrance
Analysis of Microarray Gene Expression Data Using Boolean Association Rule Mining

R. Vengateshkumar, Research Scholar, Research & Development center, Bharathiar University, Coimbatore, Tamil Nadu, India. Email: vengatesh.kumar@gmail.com
S. Alagukumar, Assistant Professor, Department of Computer Applications, Ayya Nadar Janaki Ammal College, Sivakasi, Tamil Nadu, India. Email: alagukumarmca@gmail.com
Dr. R. Lawrance, Director, Department of Computer Applications, Ayya Nadar Janaki Ammal College, Sivakasi, Tamil Nadu, India. Email: lawrancer@yahoo.com

Abstract- Data mining is an interdisciplinary research field. Association rule mining plays a vital role in data mining for finding significant relations in biological data. Microarray technology is widely used by researchers to find meaningful relations among gene expression data. In this paper, the statistical t-test is applied to select the significant genes, the equal frequency binning method is used to discretize the gene expression data, the Boolean Association Rule (BAR) method generates the frequent gene expression intervals and, finally, the association rules are discovered. The association rules reveal significant relations among microarray gene expression data. They expose correlations among gene expression levels and can support decisions in cancer diagnosis.

Keywords- Microarray, Gene Filtering, Equal Frequency, Frequent Pattern Mining, Association Rule Mining.
1. INTRODUCTION
Nowadays, huge amounts of data are being collected from biological experiments. Analyzing and extracting information from such large volumes of data is difficult, and data mining techniques are used to obtain useful knowledge from them. In this paper, the proposed methodology focuses on the association rule mining technique to extract interesting relationships among sets of genes in the field of bioinformatics. Microarray technologies provide the opportunity to measure the expression levels of tens of thousands of genes in cells simultaneously. One interesting fact about microarray data is that the behavior of thousands of genes can be examined at different times. Gene expression is the process of transcribing a DNA sequence into an mRNA sequence, which is later translated into the amino acid sequence known as a protein. The number of RNA copies produced is called the gene expression level. Microarray experiments produce huge amounts of data, and the main challenge with microarray data is its high density. Data collected from microarray experiments take the form of an R x C matrix of expression levels, where R represents rows (experiments) and C represents columns (genes). A microarray contains an order of magnitude more genes than experiments.
This paper focuses on microarray gene expression interval association analysis based on frequent pattern mining. Frequent pattern mining is the most important task of association rule mining. Microarray gene expression interval association analysis explores the biologically relevant associations between different genes under different experimental samples. The rest of the paper is organized as follows. The related papers are reviewed in Section 2. The proposed methodology of Boolean association rules (BAR) is illustrated in Section 3. The experimental results are shown in Section 4. The conclusion of the work is discussed in Section 5.
2. RELATED WORKS
Various algorithms were studied for this survey. To extract interesting relationships among sets of genes using gene intervals and association rules, a researcher needs basic knowledge of gene filters, discretization techniques and association algorithms. Jeanmougin, M., et al. discussed statistical approaches for selecting genes that are differentially expressed between two groups, applying the t-test and comparing it with various other statistical methods for finding significant genes [1]. Garcia, S., et al. surveyed discretization techniques. Discretization is an essential preprocessing technique that transforms a set of continuous attributes into discrete attributes by associating categorical values with intervals [2]. Alves, R., et al. discussed frequent pattern methods for gene association analysis. In dense datasets such as telecommunications and microarray data there are many long frequent patterns, so these methods scale very poorly and are sometimes impractical. This drawback is due to the high computational cost of the Apriori algorithm. They also pointed out that tree-based methods such as Frequent Pattern growth (FP-growth) may encounter difficulties when dealing with high-dimensional datasets [3].
Zakaria, W., et al. proposed a column enumeration based algorithm using high confidence association rules for up- and down-expressed genes. They explained that generating all frequent itemsets in dense datasets requires large memory [4]. Alagukumar, S., et al. discussed microarray data analysis using association rule mining. They compared the Apriori and FP-growth frequent pattern mining methods on microarray gene expression data [5]. Wur, S.Y., et al. proposed an effective Boolean algorithm for mining association rules in large databases. Their sparse matrix approach gave better performance than the Apriori algorithm [6].
From the literature study, it is concluded that microarray datasets typically contain high-density data. Association rules have proved useful in analyzing such datasets. However, most existing association rule mining algorithms cannot efficiently handle normalized microarray datasets with continuous values. They require large memory and take exponential time to generate frequent gene expression patterns and discover association rules. In this paper, a new algorithm called Boolean Association Rule (BAR) is described. It is specially designed to select the significant genes, generate frequent gene expression intervals and discover association rules from microarray gene expression data using gene intervals, with less memory and low computational time.
3. METHODOLOGY
Association rule mining finds frequent item-sets whose occurrences exceed a predefined threshold in the dataset. It then generates association rules from the frequent item-sets using support and confidence. Here, association rule mining is applied to a microarray dataset to extract interesting associations among sets of genes.
In the Boolean Association Rule (BAR) method, item-sets are gene expression intervals. The aim of BAR is to extract the frequent gene expression intervals and then use them to generate association rules. Before mining, BAR selects the significant genes from the microarray gene expression data and converts the continuous gene expression data into discretized gene expression data. Finally, the discretized data are used as transaction data for mining. The proposed BAR method for microarray gene association analysis using frequent gene expression intervals and association rules is shown in figure 1. BAR comprises two phases, namely preprocessing and gene association analysis. The pseudo code for the overall algorithm is given in figure 2.
Fig.1: Block diagram of BAR
Input : Microarray gene expression data
Output : Microarray gene association rules
Begin
Step 1 : Read the gene expression data
Step 2 : Filter significant genes using t-test
Step 3 : Discretize the gene expression data using equal frequency binning method
Step 4 : Convert the discretized gene intervals data to transaction data
Step 5 : Generate the frequent gene intervals using Boolean method
Step 6 : Extract the microarray gene association rules using Boolean method
End
Fig.2: BAR Algorithm
At the end of the two phases, the frequent patterns and the significant relations among microarray gene expression intervals are extracted.
A. Preprocessing
Data preprocessing is a data mining technique that transforms raw data into an understandable format. Real-world data are often incomplete and inconsistent, and data preprocessing is a proven method of resolving such issues. In this paper, the informative genes are selected using gene filtering, and the continuous gene expression data are transformed into discrete data using a discretization technique.
1. Gene Filtering using t-test
Gene filtering is the process of selecting the genes that are differentially expressed and statistically significant in the gene expression data, here using the t-test. The t-test is the method most often used to analyze microarray data. The t-statistic
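provides a standardized estimate of differential expression based on the following formula:

$$t = \frac{\bar{A} - \bar{B}}{\hat{\sigma}_p \sqrt{\frac{1}{n} + \frac{1}{m}}} \qquad (1)$$

$$\hat{\sigma}_p^2 = \frac{\left(\sum_{i=1}^{n} A_i^2 - n\bar{A}^2\right) + \left(\sum_{i=1}^{m} B_i^2 - m\bar{B}^2\right)}{n + m - 2} \qquad (2)$$

where $\bar{A}$ and $\bar{B}$ are the sample means of the two groups, $n$ and $m$ are the group sizes, and $\hat{\sigma}_p$ is the pooled estimator of the standard deviation, obtained from the unbiased variance estimate in (2).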
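The calculation can be sketched in a few lines of Python. This is only an illustrative sketch of the pooled two-sample t-statistic, not the authors' Java implementation; the function name `pooled_t_statistic` and the two small groups of values are hypothetical, since the paper does not state which samples belong to which class.

```python
import math


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two groups of expression values."""
    n, m = len(a), len(b)
    mean_a = sum(a) / n
    mean_b = sum(b) / m
    # Sum of squares minus the correction term (n * mean^2) gives the
    # sum of squared residuals of each group.
    ss_a = sum(x * x for x in a) - n * mean_a ** 2
    ss_b = sum(x * x for x in b) - m * mean_b ** 2
    # Pooled variance with n + m - 2 degrees of freedom.
    pooled_var = (ss_a + ss_b) / (n + m - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / n + 1.0 / m))
    return t, n + m - 2


# Hypothetical expression values of one gene in two sample groups.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t_value, dof = pooled_t_statistic(group_a, group_b)
print(round(t_value, 4), dof)
```

Turning the statistic into a p-value additionally needs the cumulative distribution of the t-distribution, which the Python standard library does not provide; a statistics package would be required for that step, so it is left out here.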
First, the sum of squares and the correction term are calculated for each group; subtracting the correction term from the sum of squares gives n times the sample variance, also called the sum of squared residuals. The associated probability under the null hypothesis is then obtained by reference to the t-distribution with N − 2 degrees of freedom, where N = n + m is the total number of samples. The p-value is used to decide whether the observed difference is statistically significant; a p-value of 0.05 or less is commonly considered statistically significant. Finally, the genes that are differentially expressed with p < 0.05 are selected as statistically significant, and the uninformative genes are removed from the gene expression data.

2. Discretization using Equal Frequency Binning
Data discretization is a commonly used preprocessing method that reduces the number of distinct values of a given continuous variable by dividing its range into a finite set of disjoint intervals and then associating these intervals with meaningful labels [7]. Subsequently, data are analyzed or reported at this higher level of knowledge representation rather than as individual values, which simplifies data representation in data exploration and data mining. Discretization methods can be supervised or unsupervised. In this paper, the Equal Frequency Binning method is used to discretize the gene expression values. The equal-frequency algorithm determines the minimum and maximum values of the attribute to be discretized, sorts all values in ascending order, and divides the sorted continuous values into k intervals such that each interval contains approximately n/k data instances with adjacent values [7].
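As a hedged illustration of equal-frequency binning and of the later conversion to transaction data, the sketch below discretizes the five listed samples of the filtered data (Table 2) into k = 2 intervals. The function names and the rule of placing each cut point at position ceil(i·n/k) of the sorted values are assumptions made for this example; they happen to give the same interval labels as the paper's tables for these five samples, but the authors' exact procedure is not given.

```python
import math

# Values of the six filtered genes for samples S1..S5, copied from Table 2.
TABLE2 = {
    "LYPD6": [0.45, -1.49, -0.46, 0.65, 0.08],
    "EST_2": [-0.49, -0.38, -0.56, -0.48, -0.41],
    "EST_3": [0.64, 0.48, -0.34, -0.04, -1.24],
    "IL17BR": [-0.7, -1.1, -2.16, -0.59, -3.56],
    "IL1R2": [1.37, 1.65, 0.53, 1.44, 1.1],
    "ABCC11": [5.96, 6.55, 4.35, 6.82, 6.02],
}


def equal_frequency_cuts(values, k):
    """Return k-1 cut points so each bin holds about n/k values (assumes k <= n)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(i * n / k)] for i in range(1, k)]


def discretize(values, k=2):
    """Map every value to the (low, high) bounds of its interval."""
    cuts = equal_frequency_cuts(values, k)
    edges = [min(values)] + cuts + [max(values)]
    intervals = []
    for v in values:
        # A value belongs to the bin whose lower edge is the last cut <= v.
        b = sum(1 for c in cuts if v >= c)
        intervals.append((edges[b], edges[b + 1]))
    return intervals


def to_transactions(expression, k=2):
    """One transaction (set of 'GENE=[low,high]' items) per sample."""
    n_samples = len(next(iter(expression.values())))
    rows = [set() for _ in range(n_samples)]
    for gene, values in expression.items():
        for row, (low, high) in zip(rows, discretize(values, k)):
            row.add(f"{gene}=[{low:.2f},{high:.2f}]")
    return rows


transactions = to_transactions(TABLE2)
for name, items in zip(["S1", "S2", "S3", "S4", "S5"], transactions):
    print(name, sorted(items))
```

Each value equal to a cut point is assigned to the upper interval, so the lower interval is effectively half-open even though the labels are written with closed brackets. Repeated values in the data can produce identical cut points, which this sketch does not handle.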
B. Gene Association Analysis
The Boolean Association Rule (BAR) algorithm finds useful patterns and rules from transaction data using a Boolean method. These patterns and rules are very useful for decision making. The Boolean method generates the frequent microarray gene intervals without generating candidate item-sets and extracts the association rules, in two steps.
Step-1: The frequent gene interval sets are identified using bitwise OR and bitwise AND operations.
Step-2: Microarray gene association rules are generated from the frequent gene interval sets using bitwise AND and bitwise XOR operations.
1. Frequent gene interval sets
Given a set of genes G = {g1, g2, g3 … gn} and a set of samples tID = {s1, s2, s3 … sm}, a subset S ⊆ G is called frequent if support(S) ≥ minimum support, where minimum support is a user-defined threshold [8].
2. Microarray Gene Association Rules
Association Rule: Let G = {g1, g2, g3 … gn} be a set of n elements called genes. A rule is defined as an implication of the form X→Y, where X ⊆ G, Y ⊆ G and X ∩ Y = ∅ [8]. The left-hand side of the rule is called the antecedent and the right-hand side is called the consequent.
Support: The rule X→Y holds in the transaction set T with support S, where S is the percentage of samples in T that contain X ∪ Y [8].
Confidence: The rule X→Y has confidence C in the transaction set T, where C is the percentage of samples in T containing X that also contain Y [8].
4. EXPERIMENTAL RESULTS
The sample gene expression data from the breast cancer2 dataset, which consists of 30 samples and 16 genes, are shown in table 1. The sample gene expression data are filtered using the statistical t-test gene filtering method. The filtered gene expression data are shown in table 2. After gene filtering, the gene expression data are transformed into gene intervals using the equal frequency binning discretization method, where the values of each gene are divided into 2 intervals, as shown in table 3. Finally, the gene intervals are converted into transactional data, where samples are represented by transactions and gene intervals are represented by items, as shown in table 4. In microarray gene association, the frequent gene interval sets are generated with a minimum support of 50%, as shown in table 5. From the frequent gene interval sets, the association rules are extracted with support 50% and confidence 100%, as shown in table 6. Finally, biological knowledge is extracted from the association rules, which can support gene-targeted treatment decisions for cancer patients.
Table 1: Sample Microarray Gene Expression Data
Sample
S1
S2
S3
S4
S5
Sn
LYPD6
0.45
-1.49
-0.46
0.65
0.08
…
PTGER3
1.66
2.26
1.22
3.98
3.59
…
EST_1
-0.62
-0.68
-0.8
-0.68
-0.83
…
EST_2     -0.49   -0.38   -0.56   -0.48   -0.41   …
CHDH       0.86    1.17    0.34    1      -1.82   …
EST_3      0.64    0.48   -0.34   -0.04   -1.24   …
IL17BR    -0.7    -1.1    -2.16   -0.59   -3.56   …
SCYA4      7.13    7.11    7.05    6.51    8.58   …
IL1R2      1.37    1.65    0.53    1.44    1.1    …
ABCC11     5.96    6.55    4.35    6.82    6.02   …
HOXB13    -3.1     3.2     2.05   -3.33    1.12   …
APS       -1.45    0.68   -0.17    0.22   -0.45   …
ESTs_4    -2.57    2.3     1.11   -2.54    0.59   …
DOK2       2.04    1.69    1.77    3.09    2.24   …
EST_5      1.83    1.55    1.94    1.25    2.1    …
GUCY2D     1.99    4.25    5.49    1.56    2.59   …

Table-2: Filtered Gene Expression Data

Sample   LYPD6    EST_2    EST_3    IL17BR   IL1R2    ABCC11
S1        0.45    -0.49     0.64    -0.7      1.37     5.96
S2       -1.49    -0.38     0.48    -1.1      1.65     6.55
S3       -0.46    -0.56    -0.34    -2.16     0.53     4.35
S4        0.65    -0.48    -0.04    -0.59     1.44     6.82
S5        0.08    -0.41    -1.24    -3.56     1.1      6.02
Sn        …        …        …        …        …        …

Table-3: Microarray Gene Expression Intervals

Sample   LYPD6           EST_2            EST_3           IL17BR           IL1R2          ABCC11
S1       [0.45,0.65]     [-0.56,-0.41]    [0.48,0.64]     [-0.70,-0.59]    [0.53,1.44]    [4.35,6.55]
S2       [-1.49,0.45]    [-0.41,-0.38]    [0.48,0.64]     [-3.56,-0.70]    [1.44,1.65]    [6.55,6.82]
S3       [-1.49,0.45]    [-0.56,-0.41]    [-1.24,0.48]    [-3.56,-0.70]    [0.53,1.44]    [4.35,6.55]
S4       [0.45,0.65]     [-0.56,-0.41]    [-1.24,0.48]    [-0.70,-0.59]    [1.44,1.65]    [6.55,6.82]
S5       [-1.49,0.45]    [-0.41,-0.38]    [-1.24,0.48]    [-3.56,-0.70]    [0.53,1.44]    [4.35,6.55]
Sn       …               …                …               …                …              …

Table-4: Transaction dataset

tID   Itemset
S1    {LYPD6=[0.45,0.65], EST_2=[-0.56,-0.41], EST_3=[0.48,0.64], IL17BR=[-0.70,-0.59], IL1R2=[0.53,1.44], ABCC11=[4.35,6.55]}
S2    {LYPD6=[-1.49,0.45], EST_2=[-0.41,-0.38], EST_3=[0.48,0.64], IL17BR=[-3.56,-0.70], IL1R2=[1.44,1.65], ABCC11=[6.55,6.82]}
S3    {LYPD6=[-1.49,0.45], EST_2=[-0.56,-0.41], EST_3=[-1.24,0.48], IL17BR=[-3.56,-0.70], IL1R2=[0.53,1.44), ABCC11=[4.35,6.55)}
S4    {LYPD6=[0.45,0.65], EST_2=[-0.56,-0.41), EST_3=[-1.24,0.48], IL17BR=[-0.70,-0.59], IL1R2=[1.44,1.65], ABCC11=[6.55,6.82]}
S5    {LYPD6=[-1.49,0.45], EST_2=[-0.41,-0.38], EST_3=[-1.24,0.48], IL17BR=[-3.56,-0.70], IL1R2=[0.53,1.44], ABCC11=[4.35,6.55]}

Table-5: Frequent Gene Expression Intervals set

Frequent Gene Expression Intervals set         Support Count
LYPD6=[-1.49,0.45]                             3
EST_2=[-0.56,-0.41]                            3
EST_3=[-1.24,0.48]                             3
IL17BR=[-3.56,-0.70]                           3
IL1R2=[0.53,1.44]                              3
ABCC11=[4.35,6.55]                             3
LYPD6=[-1.49,0.45], IL17BR=[-3.56,-0.70]       3
IL1R2=[0.53,1.44], ABCC11=[4.35,6.55]          3
….                                             ….

Table-6: Association Rules

Antecedent                                   Consequent                                   Sup.   Conf.
EST_3[-1.24,0.48], ABCC11[4.35,6.55]         IL17BR[-1.83 : -0.59], IL1R2[0.53,1.44]      50%    100%
IL17BR[-1.83 : -0.59], IL1R2[0.53,1.44]      EST_3[-1.24,0.48]                            50%    100%
IL17BR[-1.83 : -0.59], IL1R2[0.53,1.44]      ABCC11[4.35,6.55]                            50%    100%
IL17BR[-1.83 : -0.59], IL1R2[0.53,1.44]      EST_3[-1.24,0.48], ABCC11[4.35,6.55]         50%    100%
IL17BR[-1.83 : -0.59], ABCC11[4.35,6.55]     EST_3[-1.24,0.48]                            50%    100%
IL1R2=[0.53:1.44]                            ABCC11=[4.35:6.55]                           60%    100%
ABCC11=[4.35:6.55]                           IL1R2=[0.53:1.44]                            60%    100%
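The step from Table-2 to Table-5 can be retraced with a short sketch. It is written in Python for brevity and is not the paper's Java implementation. The interval boundaries in `CUTS` are read off Table-3 rather than recomputed, the rule that a value equal to an inner boundary falls into the upper interval is inferred from the tables, and the minimum support count of 3 is an assumption based on the counts shown in Table-5. All names (`VALUES`, `CUTS`, `to_interval`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

GENES = ["LYPD6", "EST_2", "EST_3", "IL17BR", "IL1R2", "ABCC11"]

# Filtered expression values for samples S1..S5, copied from Table-2.
VALUES = {
    "S1": [0.45, -0.49, 0.64, -0.70, 1.37, 5.96],
    "S2": [-1.49, -0.38, 0.48, -1.10, 1.65, 6.55],
    "S3": [-0.46, -0.56, -0.34, -2.16, 0.53, 4.35],
    "S4": [0.65, -0.48, -0.04, -0.59, 1.44, 6.82],
    "S5": [0.08, -0.41, -1.24, -3.56, 1.10, 6.02],
}

# Interval boundaries per gene, read off Table-3 (not recomputed here).
CUTS = {
    "LYPD6": [-1.49, 0.45, 0.65],
    "EST_2": [-0.56, -0.41, -0.38],
    "EST_3": [-1.24, 0.48, 0.64],
    "IL17BR": [-3.56, -0.70, -0.59],
    "IL1R2": [0.53, 1.44, 1.65],
    "ABCC11": [4.35, 6.55, 6.82],
}

MIN_SUPPORT_COUNT = 3  # assumption: the count shared by every row of Table-5


def to_interval(gene, value):
    """Return the label of the Table-3 interval that contains value."""
    cuts = CUTS[gene]
    # bisect_right sends a value equal to an inner boundary to the upper interval.
    idx = bisect_right(cuts, value) - 1
    # Clamp so the smallest and largest values land in the first and last interval.
    idx = min(max(idx, 0), len(cuts) - 2)
    return f"{gene}=[{cuts[idx]:.2f},{cuts[idx + 1]:.2f}]"


# One transaction (list of interval items) per sample, as in Table-4.
transactions = {
    sample: [to_interval(gene, value) for gene, value in zip(GENES, row)]
    for sample, row in VALUES.items()
}

# Count how many samples contain each interval item and keep the frequent ones.
counts = Counter(item for items in transactions.values() for item in items)
frequent = {item: n for item, n in counts.items() if n >= MIN_SUPPORT_COUNT}

for item, n in frequent.items():
    print(item, n)
```

On the five samples shown, the items kept in `frequent` are the six single intervals listed in Table-5, each with a support count of 3.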
The experiments were performed on a computer with an Intel Core 2 Duo CPU and 2 GB of main memory. The proposed algorithm was implemented in Java (JDK 1.4). The microarray breast cancer 2 gene expression data were taken from the National Center for Biotechnology Information (NCBI) [9].
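The Java implementation itself is not reproduced here. As a rough illustration of how Boolean vectors allow support and confidence to be computed after a single pass over the data, the following Python sketch stores one bit mask per interval item and combines masks with a logical AND. This is a generic vertical-bitmap technique and only an assumption about the kind of operation a Boolean algorithm relies on, not the authors' BAR code. The function names are hypothetical, the transactions are the five samples of Table-4, and all interval labels are written with closed brackets.

```python
# Transactions S1..S5 from Table-4, one set of interval items per sample.
TRANSACTIONS = [
    {"LYPD6=[0.45,0.65]", "EST_2=[-0.56,-0.41]", "EST_3=[0.48,0.64]",
     "IL17BR=[-0.70,-0.59]", "IL1R2=[0.53,1.44]", "ABCC11=[4.35,6.55]"},   # S1
    {"LYPD6=[-1.49,0.45]", "EST_2=[-0.41,-0.38]", "EST_3=[0.48,0.64]",
     "IL17BR=[-3.56,-0.70]", "IL1R2=[1.44,1.65]", "ABCC11=[6.55,6.82]"},   # S2
    {"LYPD6=[-1.49,0.45]", "EST_2=[-0.56,-0.41]", "EST_3=[-1.24,0.48]",
     "IL17BR=[-3.56,-0.70]", "IL1R2=[0.53,1.44]", "ABCC11=[4.35,6.55]"},   # S3
    {"LYPD6=[0.45,0.65]", "EST_2=[-0.56,-0.41]", "EST_3=[-1.24,0.48]",
     "IL17BR=[-0.70,-0.59]", "IL1R2=[1.44,1.65]", "ABCC11=[6.55,6.82]"},   # S4
    {"LYPD6=[-1.49,0.45]", "EST_2=[-0.41,-0.38]", "EST_3=[-1.24,0.48]",
     "IL17BR=[-3.56,-0.70]", "IL1R2=[0.53,1.44]", "ABCC11=[4.35,6.55]"},   # S5
]


def build_masks(transactions):
    """Single scan: bit i of masks[item] is set when transaction i contains item."""
    masks = {}
    for i, items in enumerate(transactions):
        for item in items:
            masks[item] = masks.get(item, 0) | (1 << i)
    return masks


def support_count(masks, itemset, n):
    """Number of transactions containing every item: popcount of the ANDed masks."""
    combined = (1 << n) - 1  # start with all n transactions selected
    for item in itemset:
        combined &= masks.get(item, 0)
    return bin(combined).count("1")


def rule_metrics(masks, antecedent, consequent, n):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = support_count(masks, antecedent | consequent, n)
    ante = support_count(masks, antecedent, n)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence


n = len(TRANSACTIONS)
masks = build_masks(TRANSACTIONS)
sup, conf = rule_metrics(masks, {"IL1R2=[0.53,1.44]"}, {"ABCC11=[4.35,6.55]"}, n)
print(f"IL1R2=[0.53,1.44] -> ABCC11=[4.35,6.55]: sup={sup:.0%}, conf={conf:.0%}")
```

For the rule IL1R2=[0.53,1.44] -> ABCC11=[4.35,6.55] this gives a support of 60% and a confidence of 100% on the five samples, in line with the corresponding row of Table-6. Because the masks are built once, any further itemset is evaluated with AND operations alone and the transactions are not scanned again.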
A. Comparative analysis

The proposed BAR algorithm was compared with the classical association rule algorithms Apriori and FP-growth. The proposed algorithm discovers fewer rules and reduces both time complexity and memory usage compared with these benchmark algorithms. Figure 3 depicts the comparative analysis of rule generation for the proposed algorithm against Apriori and FP-growth.

Fig.3: Comparative Analysis of Rule Generation

6. CONCLUSION

The proposed Boolean Association Rule (BAR) algorithm obtains frequent itemsets without candidate generation and scans the database only once, which reduces time complexity and memory usage. It extracts significant relations among microarray genes. The experiments were carried out on the microarray breast cancer 2 dataset. The algorithm was also compared with the traditional benchmark algorithms Apriori and FP-growth. Apriori requires large memory and takes exponential time for candidate generation. FP-growth generates frequent itemsets without candidate itemset generation, so it requires less memory, but it scans the database twice. The comparative analysis showed that the BAR algorithm performed better than both methods. The results of this work can be used to reveal crucial resources for diseases and to support gene-targeted treatments.

REFERENCES

[1] M. Jeanmougin, et al., "Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies", PLoS ONE, vol. 5, no. 9, 2010.
[2] S. Garcia, J. Luengo, J.A. Sáez, V. López, and F. Herrera, "A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning", Knowledge and Data Engineering, IEEE Transactions, vol. 25, no. 4, pp. 734-750, 2013.
[3] R. Alves, B.D.S Rodriguez, and R.J.S. Aguilar, "Gene association analysis: a survey of frequent pattern mining from gene expression data", Briefings in Bioinformatics, vol. 2, no. 2, pp. 210-224, 2009.
[4] W. Zakaria, Y. Kotb, and F. Ghaleb, "MCR-Miner: Maximal Confident Association Rules Miner Algorithm for Up/Down-Expressed Genes", Applied Mathematics and Information Sciences, vol. 8, no. 2, pp. 799-809, 2014.
[5] S. Alagukumar and R. Lawrance, "A Selective Analysis of Microarray Data Using Association Rule Mining", Procedia Computer Science, no. 47, pp. 3-12, 2015. http://dx.doi.org/10.1016/j.procs.2015.03.177
[6] S.Y. Wur and Y. Leu, "An Effective Boolean Algorithm for Mining Association Rules in Large Databases", Database Systems for Advanced Applications, IEEE Transactions, pp. 179-186, 1999.
[7] R. Dash and R.L. Paramguru, "Comparative analysis of Supervised and Unsupervised Discretization Techniques", International Journal of Advances in Science and Technology, vol. 2, no. 3, pp. 29-7, 2011.
[8] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, Elsevier, 2002.
[9] www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1379