

ISSN (ONLINE): 2279-0071 ISSN (PRINT): 2279-0063

Issue 11, Volume 1 December-2014 to February-2015

International Journal of Software and Web Sciences

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

STEM International Scientific Online Media and Publishing House Head Office: 148, Summit Drive, Byron, Georgia-31008, United States. Offices Overseas: Germany, Australia, India, Netherlands, Canada. Website: www.iasir.net, E-mail (s): iasir.journals@iasir.net, iasir.journals@gmail.com, ijswss@gmail.com



PREFACE
We are delighted to welcome you to the eleventh issue of the International Journal of Software and Web Sciences (IJSWS). In recent years, advances in science, technology, engineering, and mathematics have radically expanded the data available to researchers and professionals in a wide variety of domains. This unique combination of theory with data has the potential for broad impact on educational research and practice. IJSWS publishes high-quality, peer-reviewed papers covering a number of topics in the areas of Software architectures for scientific computing, Mobile robots, Artificial intelligence systems and architectures, Microcontrollers & microprocessor applications, Natural language processing and expert systems, Fuzzy logic and soft computing, Semantic Web, Web retrieval systems, Software and multimedia Web, Advanced database systems, Information retrieval systems, Computer architecture & VLSI, Distributed and parallel processing, Software testing, verification and validation methods, Web mining and data mining, UML/MDA and AADL, Object oriented technology, Software and Web metrics, Software maintenance and evolution, Component based software engineering, middleware, and tools, Service oriented software architecture, Hypermedia design applications, Ontology creation, evolution, reconciliation, and mediation, Web authoring tools, Web application architectures and frameworks, Testing and evaluation of Web applications, Empirical Web engineering, Deep and hidden Web, and other relevant fields in the vicinity of software and Web sciences.

The editorial board of IJSWS is composed of members of the teaching and research community who have expertise in a variety of disciplines, including software process models, software and technology deployments, ICT solutions, and other related disciplines of software and Web based applications. In order to best serve our community, this Journal is available online as well as in hard-copy form. Because of the rapid advances in underlying technologies and the interdisciplinary nature of the field, we believe it is important to provide quality research articles promptly and to the widest possible audience.

We are happy that this Journal has continued to grow and develop. We have made every effort to evaluate and process submissions for review, and to address queries from authors and the general public promptly. The Journal strives to reflect the most recent and finest research in the field of emerging technologies, especially those related to software and Web sciences. This Journal is completely refereed and indexed with major databases such as IndexCopernicus, Computer Science Directory, GetCITED, DOAJ, SSRN, TGDScholar, WorldWideScience, CiteSeerX, CRCnetBASE, Google Scholar, Microsoft Academic Search, INSPEC, ProQuest, ArnetMiner, Base, ChemXSeer, citebase, OpenJ-Gate, eLibrary, SafetyLit, VADLO, OpenGrey, EBSCO, UlrichWeb, ISSUU, SPIE Digital Library, arXiv, ERIC, EasyBib, Infotopia, WorldCat, docstoc, JURN, Mendeley, ResearchGate, cogprints, OCLC, iSEEK, Scribd, LOCKSS, CASSI, E-PrintNetwork, intute, and some other databases.

We are grateful to all of the individuals and agencies whose work and support made the Journal's success possible. We thank the executive board and core committee members of the IJSWS for entrusting us with this important job. We are thankful to the members of the IJSWS editorial board, who have contributed energy and time to the Journal with their steadfast support, constructive advice, and reviews of submissions. We are deeply indebted to the numerous anonymous reviewers who have contributed expert evaluations of the submissions to help maintain the quality of the Journal. For this eleventh issue, we received 96 research papers, of which only 23 are published in this volume as per the reviewers' recommendations. We have the highest respect for all the authors who have submitted articles to the Journal, for their intellectual energy and creativity, and for their dedication to the field of software and Web sciences.

This issue of the IJSWS has attracted a large number of authors and researchers from across the world and provides an effective platform for intellectuals of different streams to put forth their suggestions and ideas, which might prove beneficial for the accelerated development of emerging technologies in software and Web sciences and may open new areas for research and development. We hope you will enjoy this eleventh issue of the IJSWS, and we look forward to hearing your feedback and receiving your contributions.

(Administrative Chief)

(Managing Director)

(Editorial Head)

--------------------------------------------------------------------------------------------------------------------------Published papers in the International Journal of Software and Web Sciences (IJSWS), ISSN (Online): 2279-0071, ISSN (Print): 2279-0063 (December-2014 to February-2015, Issue 11, Volume 1). ---------------------------------------------------------------------------------------------------------------------------


BOARD MEMBERS

EDITOR IN CHIEF Prof. (Dr.) Waressara Weerawat, Director of Logistics Innovation Center, Department of Industrial Engineering, Faculty of Engineering, Mahidol University, Thailand. Prof. (Dr.) Yen-Chun Lin, Professor and Chair, Dept. of Computer Science and Information Engineering, Chang Jung Christian University, Kway Jen, Tainan, Taiwan. Divya Sethi, GM Conferencing & VSAT Solutions, Enterprise Services, Bharti Airtel, Gurgaon, India. CHIEF EDITOR (TECHNICAL) Prof. (Dr.) Atul K. Raturi, Head School of Engineering and Physics, Faculty of Science, Technology and Environment, The University of the South Pacific, Laucala campus, Suva, Fiji Islands. Prof. (Dr.) Hadi Suwastio, College of Applied Science, Department of Information Technology, The Sultanate of Oman and Director of IETI-Research Institute-Bandung, Indonesia. Dr. Nitin Jindal, Vice President, Max Coreth, North America Gas & Power Trading, New York, United States. CHIEF EDITOR (GENERAL) Prof. (Dr.) Thanakorn Naenna, Department of Industrial Engineering, Faculty of Engineering, Mahidol University, Thailand. Prof. (Dr.) Jose Francisco Vicent Frances, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Huiyun Liu, Department of Electronic & Electrical Engineering, University College London, Torrington Place, London. ADVISORY BOARD Prof. (Dr.) Kimberly A. Freeman, Professor & Director of Undergraduate Programs, Stetson School of Business and Economics, Mercer University, Macon, Georgia, United States. Prof. (Dr.) Klaus G. Troitzsch, Professor, Institute for IS Research, University of Koblenz-Landau, Germany. Prof. (Dr.) T. Anthony Choi, Professor, Department of Electrical & Computer Engineering, Mercer University, Macon, Georgia, United States. Prof. (Dr.) Fabrizio Gerli, Department of Management, Ca' Foscari University of Venice, Italy. Prof. (Dr.) Jen-Wei Hsieh, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taiwan. Prof. (Dr.) Jose C. Martinez, Dept. Physical Chemistry, Faculty of Sciences, University of Granada, Spain. Prof. (Dr.) Panayiotis Vafeas, Department of Engineering Sciences, University of Patras, Greece. Prof. (Dr.) Soib Taib, School of Electrical & Electronics Engineering, University Science Malaysia, Malaysia. Prof. (Dr.) Vit Vozenilek, Department of Geoinformatics, Palacky University, Olomouc, Czech Republic. Prof. (Dr.) Sim Kwan Hua, School of Engineering, Computing and Science, Swinburne University of Technology, Sarawak, Malaysia. Prof. (Dr.) Jose Francisco Vicent Frances, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Rafael Ignacio Alvarez Sanchez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Praneel Chand, Ph.D., M.IEEEC/O School of Engineering & Physics Faculty of Science & Technology The University of the South Pacific (USP) Laucala Campus, Private Mail Bag, Suva, Fiji. Prof. (Dr.) Francisco Miguel Martinez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Antonio Zamora Gomez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Leandro Tortosa, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) 
Samir Ananou, Department of Microbiology, Universidad de Granada, Granada, Spain. Dr. Miguel Angel Bautista, Department de Matematica Aplicada y Analisis, Facultad de Matematicas, Universidad de Barcelona, Spain.


           

                  

Prof. (Dr.) Prof. Adam Baharum, School of Mathematical Sciences, University of Universiti Sains, Malaysia, Malaysia. Dr. Cathryn J. Peoples, Faculty of Computing and Engineering, School of Computing and Information Engineering, University of Ulster, Coleraine, Northern Ireland, United Kingdom. Prof. (Dr.) Pavel Lafata, Department of Telecommunication Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, 166 27, Czech Republic. Prof. (Dr.) P. Bhanu Prasad, Vision Specialist, Matrix vision GmbH, Germany, Consultant, TIFACCORE for Machine Vision, Advisor, Kelenn Technology, France Advisor, Shubham Automation & Services, Ahmedabad, and Professor of C.S.E, Rajalakshmi Engineering College, India. Prof. (Dr.) Anis Zarrad, Department of Computer Science and Information System, Prince Sultan University, Riyadh, Saudi Arabia. Prof. (Dr.) Mohammed Ali Hussain, Professor, Dept. of Electronics and Computer Engineering, KL University, Green Fields, Vaddeswaram, Andhra Pradesh, India. Dr. Cristiano De Magalhaes Barros, Governo do Estado de Minas Gerais, Brazil. Prof. (Dr.) Md. Rizwan Beg, Professor & Head, Dean, Faculty of Computer Applications, Deptt. of Computer Sc. & Engg. & Information Technology, Integral University Kursi Road, Dasauli, Lucknow, India. Prof. (Dr.) Vishnu Narayan Mishra, Assistant Professor of Mathematics, Sardar Vallabhbhai National Institute of Technology, Ichchhanath Mahadev Road, Surat, Surat-395007, Gujarat, India. Dr. Jia Hu, Member Research Staff, Philips Research North America, New York Area, NY. Prof. Shashikant Shantilal Patil SVKM, MPSTME Shirpur Campus, NMIMS University Vile Parle Mumbai, India. Prof. (Dr.) Bindhya Chal Yadav, Assistant Professor in Botany, Govt. Post Graduate College, Fatehabad, Agra, Uttar Pradesh, India. REVIEW BOARD Prof. (Dr.) Kimberly A. Freeman, Professor & Director of Undergraduate Programs, Stetson School of Business and Economics, Mercer University, Macon, Georgia, United States. Prof. (Dr.) Klaus G. Troitzsch, Professor, Institute for IS Research, University of Koblenz-Landau, Germany. Prof. (Dr.) T. Anthony Choi, Professor, Department of Electrical & Computer Engineering, Mercer University, Macon, Georgia, United States. Prof. (Dr.) Yen-Chun Lin, Professor and Chair, Dept. of Computer Science and Information Engineering, Chang Jung Christian University, Kway Jen, Tainan, Taiwan. Prof. (Dr.) Jen-Wei Hsieh, Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taiwan. Prof. (Dr.) Jose C. Martinez, Dept. Physical Chemistry, Faculty of Sciences, University of Granada, Spain. Prof. (Dr.) Joel Saltz, Emory University, Atlanta, Georgia, United States. Prof. (Dr.) Panayiotis Vafeas, Department of Engineering Sciences, University of Patras, Greece. Prof. (Dr.) Soib Taib, School of Electrical & Electronics Engineering, University Science Malaysia, Malaysia. Prof. (Dr.) Sim Kwan Hua, School of Engineering, Computing and Science, Swinburne University of Technology, Sarawak, Malaysia. Prof. (Dr.) Jose Francisco Vicent Frances, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Rafael Ignacio Alvarez Sanchez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Francisco Miguel Martinez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) 
Antonio Zamora Gomez, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Leandro Tortosa, Department of Science of the Computation and Artificial Intelligence, Universidad de Alicante, Alicante, Spain. Prof. (Dr.) Samir Ananou, Department of Microbiology, Universidad de Granada, Granada, Spain. Dr. Miguel Angel Bautista, Department de Matematica Aplicada y Analisis, Facultad de Matematicas, Universidad de Barcelona, Spain. Prof. (Dr.) Prof. Adam Baharum, School of Mathematical Sciences, University of Universiti Sains, Malaysia, Malaysia. Prof. (Dr.) Huiyun Liu, Department of Electronic & Electrical Engineering, University College London, Torrington Place, London.


                                

Dr. Cristiano De Magalhaes Barros, Governo do Estado de Minas Gerais, Brazil. Prof. (Dr.) Pravin G. Ingole, Senior Researcher, Greenhouse Gas Research Center, Korea Institute of Energy Research (KIER), 152 Gajeong-ro, Yuseong-gu, Daejeon 305-343, KOREA Prof. (Dr.) Dilum Bandara, Dept. Computer Science & Engineering, University of Moratuwa, Sri Lanka. Prof. (Dr.) Faudziah Ahmad, School of Computing, UUM College of Arts and Sciences, University Utara Malaysia, 06010 UUM Sintok, Kedah Darulaman Prof. (Dr.) G. Manoj Someswar, Principal, Dept. of CSE at Anwar-ul-uloom College of Engineering & Technology, Yennepally, Vikarabad, RR District., A.P., India. Prof. (Dr.) Abdelghni Lakehal, Applied Mathematics, Rue 10 no 6 cite des fonctionnaires dokkarat 30010 Fes Marocco. Dr. Kamal Kulshreshtha, Associate Professor & Head, Deptt. of Computer Sc. & Applications, Modi Institute of Management & Technology, Kota-324 009, Rajasthan, India. Prof. (Dr.) Anukrati Sharma, Associate Professor, Faculty of Commerce and Management, University of Kota, Kota, Rajasthan, India. Prof. (Dr.) S. Natarajan, Department of Electronics and Communication Engineering, SSM College of Engineering, NH 47, Salem Main Road, Komarapalayam, Namakkal District, Tamilnadu 638183, India. Prof. (Dr.) J. Sadhik Basha, Department of Mechanical Engineering, King Khalid University, Abha, Kingdom of Saudi Arabia Prof. (Dr.) G. SAVITHRI, Department of Sericulture, S.P. Mahila Visvavidyalayam, Tirupati517502, Andhra Pradesh, India. Prof. (Dr.) Shweta jain, Tolani College of Commerce, Andheri, Mumbai. 400001, India Prof. (Dr.) Abdullah M. Abdul-Jabbar, Department of Mathematics, College of Science, University of Salahaddin-Erbil, Kurdistan Region, Iraq. Prof. (Dr.) P.Sujathamma, Department of Sericulture, S.P.Mahila Visvavidyalayam, Tirupati517502, India. Prof. (Dr.) Bimla Dhanda, Professor & Head, Department of Human Development and Family Studies, College of Home Science, CCS, Haryana Agricultural University, Hisar- 125001 (Haryana) India. Prof. (Dr.) Manjulatha, Dept of Biochemistry,School of Life Sciences,University of Hyderabad,Gachibowli, Hyderabad, India. Prof. (Dr.) Upasani Dhananjay Eknath Advisor & Chief Coordinator, ALUMNI Association, Sinhgad Institute of Technology & Science, Narhe, Pune- 411 041, India. Prof. (Dr.) Sudhindra Bhat, Professor & Finance Area Chair, School of Business, Alliance University Bangalore-562106. Prof. Prasenjit Chatterjee , Dept. of Mechanical Engineering, MCKV Institute of Engineering West Bengal, India. Prof. Rajesh Murukesan, Deptt. of Automobile Engineering, Rajalakshmi Engineering college, Chennai, India. Prof. (Dr.) Parmil Kumar, Department of Statistics, University of Jammu, Jammu, India Prof. (Dr.) M.N. Shesha Prakash, Vice Principal, Professor & Head of Civil Engineering, Vidya Vikas Institute of Engineering and Technology, Alanahally, Mysore-570 028 Prof. (Dr.) Piyush Singhal, Mechanical Engineering Deptt., GLA University, India. Prof. M. Mahbubur Rahman, School of Engineering & Information Technology, Murdoch University, Perth Western Australia 6150, Australia. Prof. Nawaraj Chaulagain, Department of Religion, Illinois Wesleyan University, Bloomington, IL. Prof. Hassan Jafari, Faculty of Maritime Economics & Management, Khoramshahr University of Marine Science and Technology, khoramshahr, Khuzestan province, Iran Prof. (Dr.) Kantipudi MVV Prasad , Dept of EC, School of Engg, R.K.University,Kast urbhadham, Tramba, Rajkot-360020, India. Prof. (Mrs.) 
P.Sujathamma, Department of Sericulture, S.P.Mahila Visvavidyalayam, ( Women's University), Tirupati-517502, India. Prof. (Dr.) M A Rizvi, Dept. of Computer Engineering and Applications, National Institute of Technical Teachers' Training and Research, Bhopal M.P. India Prof. (Dr.) Mohsen Shafiei Nikabadi, Faculty of Economics and Management, Industrial Management Department, Semnan University, Semnan, Iran. Prof. P.R.SivaSankar, Head, Dept. of Commerce, Vikrama Simhapuri University Post Graduate Centre, KAVALI - 524201, A.P. India. Prof. (Dr.) Bhawna Dubey, Institute of Environmental Science( AIES), Amity University, Noida, India. Prof. Manoj Chouhan, Deptt. of Information Technology, SVITS Indore, India.


                                

Prof. Yupal S Shukla, V M Patel College of Management Studies, Ganpat University, KhervaMehsana, India. Prof. (Dr.) Amit Kohli, Head of the Department, Department of Mechanical Engineering, D.A.V.Institute of Engg. and Technology, Kabir Nagar, Jalandhar, Punjab(India) Prof. (Dr.) Kumar Irayya Maddani, and Head of the Department of Physics in SDM College of Engineering and Technology, Dhavalagiri, Dharwad, State: Karnataka (INDIA). Prof. (Dr.) Shafi Phaniband, SDM College of Engineering and Technology, Dharwad, INDIA. Prof. M H Annaiah, Head, Department of Automobile Engineering, Acharya Institute of Technology, Soladevana Halli, Bangalore -560107, India. Prof. (Dr.) Shriram K V, Faculty Computer Science and Engineering, Amrita Vishwa Vidhyapeetham University, Coimbatore, India. Prof. (Dr.) Sohail Ayub, Department of Civil Engineering, Z.H College of Engineering & Technology, Aligarh Muslim University, Aligarh. 202002 UP-India Prof. (Dr.) Santosh Kumar Behera, Department of Education, Sidho-Kanho-Birsha University, Purulia, West Bengal, India. Prof. (Dr.) Urmila Shrawankar, Department of Computer Science & Engineering, G H Raisoni College of Engineering, Nagpur (MS), India. Prof. Anbu Kumar. S, Deptt. of Civil Engg., Delhi Technological University (Formerly Delhi College of Engineering) Delhi, India. Prof. (Dr.) Meenakshi Sood, Vegetable Science, College of Horticulture, Mysore, University of Horticultural Sciences, Bagalkot, Karnataka (India) Prof. (Dr.) Prof. R. R. Patil, Director School Of Earth Science, Solapur University, Solapur, India. Prof. (Dr.) Manoj Khandelwal, Dept. of Mining Engg, College of Technology & Engineering, Maharana Pratap University of Agriculture & Technology, Udaipur-313 001 (Rajasthan), India Prof. (Dr.) Kishor Chandra Satpathy, Librarian, National Institute of Technology, Silchar-788010, Assam, India. Prof. (Dr.) Juhana Jaafar, Gas Engineering Department, Faculty of Petroleum and Renewable Energy Engineering (FPREE), Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor. Prof. (Dr.) Rita Khare, Assistant Professor in chemistry, Govt. Women,s College, Gardanibagh, Patna, Bihar, India. Prof. (Dr.) Raviraj Kusanur, Dept of Chemistry, R V College of Engineering, Bangalore-59, India. Prof. (Dr.) Hameem Shanavas .I, M.V.J College of Engineering, Bangalore, India. Prof. (Dr.) Sandhya Mehrotra, Department of Biological Sciences, Birla Institute of Technology and Sciences, Pilani, Rajasthan, India. Prof. (Dr.) Dr. Ravindra Jilte, Head of the Department, Department of Mechanical Engineering,VCET, Thane-401202, India. Prof. (Dr.) Sanjay Kumar, JKL University, Ajmer Road, Jaipur Prof. (Dr.) Pushp Lata Faculty of English and Communication, Department of Humanities and Languages, Nucleus Member, Publications and Media Relations Unit Editor, BITScan, BITS, PilaniIndia Prof. Arun Agarwal, Faculty of ECE Dept., ITER College, Siksha 'O' Anusandhan University Bhubaneswar, Odisha, India Prof. (Dr.) Pratima Tripathi, Department of Biosciences, SSSIHL, Anantapur Campus Anantapur515001 (A.P.) India. Prof. (Dr.) Sudip Das, Department of Biotechnology, Haldia Institute of Technology, I.C.A.R.E. Complex, H.I.T. Campus, P.O. Hit, Haldia; Dist: Puba Medinipur, West Bengal, India. Prof. (Dr.) ABHIJIT MITRA , Associate Professor and former Head, Department of Marine Science, University of Calcutta , India. Prof. (Dr.) N.Ramu , Associate Professor , Department of Commerce, Annamalai University, AnnamalaiNadar-608 002, Chidambaram, Tamil Nadu , India. Prof. (Dr.) 
Saber Mohamed Abd-Allah, Assistant Professor of Theriogenology , Faculty of Veterinary Medicine , Beni-Suef University , Egypt. Prof. (Dr.) Ramel D. Tomaquin, Dean, College of Arts and Sciences Surigao Del Sur State University (SDSSU), Tandag City Surigao Del Sur, Philippines. Prof. (Dr.) Bimla Dhanda, Professor & Head, Department of Human Development and Family Studies College of Home Science, CCS, Haryana Agricultural University, Hisar- 125001 (Haryana) India. Prof. (Dr.) R.K.Tiwari, Professor, S.O.S. in Physics, Jiwaji University, Gwalior, M.P.-474011, India. Prof. (Dr.) Sandeep Gupta, Department of Computer Science & Engineering, Noida Institute of Engineering and Technology, Gr.Noida, India. Prof. (Dr.) Mohammad Akram, Jazan University, Kingdom of Saudi Arabia.


                               

Prof. (Dr.) Sanjay Sharma, Dept. of Mathematics, BIT, Durg(C.G.), India. Prof. (Dr.) Manas R. Panigrahi, Department of Physics, School of Applied Sciences, KIIT University, Bhubaneswar, India. Prof. (Dr.) P.Kiran Sree, Dept of CSE, Jawaharlal Nehru Technological University, India Prof. (Dr.) Suvroma Gupta, Department of Biotechnology in Haldia Institute of Technology, Haldia, West Bengal, India. Prof. (Dr.) SREEKANTH. K. J., Department of Mechanical Engineering at Mar Baselios College of Engineering & Technology, University of Kerala, Trivandrum, Kerala, India Prof. Bhubneshwar Sharma, Department of Electronics and Communication Engineering, Eternal University (H.P), India. Prof. Love Kumar, Electronics and Communication Engineering, DAV Institute of Engineering and Technology, Jalandhar (Punjab), India. Prof. S.KANNAN, Department of History, Annamalai University, Annamalainagar- 608002, Tamil Nadu, India. Prof. (Dr.) Hasrinah Hasbullah, Faculty of Petroleum & Renewable Energy Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia. Prof. Rajesh Duvvuru, Dept. of Computer Sc. & Engg., N.I.T. Jamshedpur, Jharkhand, India. Prof. (Dr.) Bhargavi H. Goswami, Department of MCA, Sunshine Group of Institutes, Nr. Rangoli Park, Kalawad Road, Rajkot, Gujarat, India. Prof. (Dr.) Essam H. Houssein, Computer Science Department, Faculty of Computers & Informatics, Benha University, Benha 13518, Qalyubia Governorate, Egypt. Arash Shaghaghi, University College London, University of London, Great Britain. Prof. Rajesh Duvvuru, Dept. of Computer Sc. & Engg., N.I.T. Jamshedpur, Jharkhand, India. Prof. (Dr.) Anand Kumar, Head, Department of MCA, M.S. Engineering College, Navarathna Agrahara, Sadahalli Post, Bangalore, PIN 562110, Karnataka, INDIA. Prof. (Dr.) Venkata Raghavendra Miriampally, Electrical and Computer Engineering Dept, Adama Science & Technology University, Adama, Ethiopia. Prof. (Dr.) Jatinderkumar R. Saini, Director (I.T.), GTU's Ankleshwar-Bharuch Innovation Sankul &Director I/C & Associate Professor, Narmada College of Computer Application, Zadeshwar, Bharuch, Gujarat, India. Prof. Jaswinder Singh, Mechanical Engineering Department, University Institute Of Engineering & Technology, Panjab University SSG Regional Centre, Hoshiarpur, Punjab, India- 146001. Prof. (Dr.) S.Kadhiravan, Head i/c, Department of Psychology, Periyar University, Salem- 636 011,Tamil Nadu, India. Prof. (Dr.) Mohammad Israr, Principal, Balaji Engineering College,Junagadh, Gujarat-362014, India. Prof. (Dr.) VENKATESWARLU B., Director of MCA in Sreenivasa Institute of Technology and Management Studies (SITAMS), Chittoor. Prof. (Dr.) Deepak Paliwal, Faculty of Sociology, Uttarakhand Open University, Haldwani-Nainital Prof. (Dr.) Dr. Anil K Dwivedi, Faculty of Pollution & Environmental Assay Research Laboratory (PEARL), Department of Botany,DDU Gorakhpur University,Gorakhpur-273009, India. Prof. R. Ravikumar, Department of Agricultural and Rural Management, TamilNadu Agricultural University, Coimbatore-641003,Tamil Nadu, India. Prof. (Dr.) R.Raman, Professor of Agronomy, Faculty of Agriculture, Annamalai university, Annamalai Nagar 608 002Tamil Nadu, India. Prof. (Dr.) Ahmed Khalafallah, Coordinator of the CM Degree Program, Department of Architectural and Manufacturing Sciences, Ogden College of Sciences and Engineering Western Kentucky University 1906 College Heights Blvd Bowling Green, KY 42103-1066 Prof. (Dr.) 
Asmita Das , Delhi Technological University (Formerly Delhi College of Engineering), Shahbad, Daulatpur, Delhi 110042, India. Prof. (Dr.)Aniruddha Bhattacharjya, Assistant Professor (Senior Grade), CSE Department, Amrita School of Engineering , Amrita Vishwa VidyaPeetham (University), Kasavanahalli, Carmelaram P.O., Bangalore 560035, Karnataka, India Prof. (Dr.) S. Rama Krishna Pisipaty, Prof & Geoarchaeologist, Head of the Department of Sanskrit & Indian Culture, SCSVMV University, Enathur, Kanchipuram 631561, India Prof. (Dr.) Shubhasheesh Bhattacharya, Professor & HOD(HR), Symbiosis Institute of International Business (SIIB), Hinjewadi, Phase-I, Pune- 411 057 Prof. (Dr.) Vijay Kothari, Institute of Science, Nirma University, S-G Highway, Ahmedabad 382481, India. Prof. (Dr.) Raja Sekhar Mamillapalli, Department of Civil Engineering at Sir Padampat Singhania University, Udaipur, India.


                             

Prof. (Dr.)B. M. Kunar, Department of Mining Engineering, Indian School of Mines, Dhanbad 826004, Jharkhand, India. Prof. (Dr.) Prabir Sarkar, Assistant Professor, School of Mechanical, Materials and Energy Engineering, Room 307, Academic Block, Indian Institute of Technology, Ropar, Nangal Road, Rupnagar 140001, Punjab, India. Prof. (Dr.) K.Srinivasmoorthy, Associate Professor, Department of Earth Sciences, School of Physical,Chemical and Applied Sciences, Pondicherry university, R.Venkataraman Nagar, Kalapet, Puducherry 605014, India. Prof. (Dr.) Bhawna Dubey, Institute of Environmental Science (AIES), Amity University, Noida, India. Prof. (Dr.) P. Bhanu Prasad, Vision Specialist, Matrix vision GmbH, Germany, Consultant, TIFACCORE for Machine Vision, Advisor, Kelenn Technology, France Advisor, Shubham Automation & Services, Ahmedabad, and Professor of C.S.E, Rajalakshmi Engineering College, India. Prof. (Dr.)P.Raviraj, Professor & Head, Dept. of CSE, Kalaignar Karunanidhi, Institute of Technology, Coimbatore 641402,Tamilnadu,India. Prof. (Dr.) Damodar Reddy Edla, Department of Computer Science & Engineering, Indian School of Mines, Dhanbad, Jharkhand 826004, India. Prof. (Dr.) T.C. Manjunath, Principal in HKBK College of Engg., Bangalore, Karnataka, India. Prof. (Dr.) Pankaj Bhambri, I.T. Deptt., Guru Nanak Dev Engineering College, Ludhiana 141006, Punjab, India . Prof. Shashikant Shantilal Patil SVKM, MPSTME Shirpur Campus, NMIMS University Vile Parle Mumbai, India. Prof. (Dr.) Shambhu Nath Choudhary, Department of Physics, T.M. Bhagalpur University, Bhagalpur 81200, Bihar, India. Prof. (Dr.) Venkateshwarlu Sonnati, Professor & Head of EEED, Department of EEE, Sreenidhi Institute of Science & Technology, Ghatkesar, Hyderabad, Andhra Pradesh, India. Prof. (Dr.) Saurabh Dalela, Department of Pure & Applied Physics, University of Kota, KOTA 324010, Rajasthan, India. Prof. S. Arman Hashemi Monfared, Department of Civil Eng, University of Sistan & Baluchestan, Daneshgah St.,Zahedan, IRAN, P.C. 98155-987 Prof. (Dr.) R.S.Chanda, Dept. of Jute & Fibre Tech., University of Calcutta, Kolkata 700019, West Bengal, India. Prof. V.S.VAKULA, Department of Electrical and Electronics Engineering, JNTUK, University College of Engg., Vizianagaram5 35003, Andhra Pradesh, India. Prof. (Dr.) Nehal Gitesh Chitaliya, Sardar Vallabhbhai Patel Institute of Technology, Vasad 388 306, Gujarat, India. Prof. (Dr.) D.R. Prajapati, Department of Mechanical Engineering, PEC University of Technology,Chandigarh 160012, India. Dr. A. SENTHIL KUMAR, Postdoctoral Researcher, Centre for Energy and Electrical Power, Electrical Engineering Department, Faculty of Engineering and the Built Environment, Tshwane University of Technology, Pretoria 0001, South Africa. Prof. (Dr.)Vijay Harishchandra Mankar, Department of Electronics & Telecommunication Engineering, Govt. Polytechnic, Mangalwari Bazar, Besa Road, Nagpur- 440027, India. Prof. Varun.G.Menon, Department Of C.S.E, S.C.M.S School of Engineering, Karukutty, Ernakulam, Kerala 683544, India. Prof. (Dr.) U C Srivastava, Department of Physics, Amity Institute of Applied Sciences, Amity University, Noida, U.P-203301.India. Prof. (Dr.) Surendra Yadav, Professor and Head (Computer Science & Engineering Department), Maharashi Arvind College of Engineering and Research Centre (MACERC), Jaipur, Rajasthan, India. Prof. (Dr.) Sunil Kumar, H.O.D. Applied Sciences & Humanities Dehradun Institute of Technology, (D.I.T. School of Engineering), 48 A K.P-3 Gr. Noida (U.P.) 201308 Prof. 
Naveen Jain, Dept. of Electrical Engineering, College of Technology and Engineering, Udaipur-313 001, India. Prof. Veera Jyothi.B, CBIT ,Hyderabad, Andhra Pradesh, India. Prof. Aritra Ghosh, Global Institute of Management and Technology, Krishnagar, Nadia, W.B. India Prof. Anuj K. Gupta, Head, Dept. of Computer Science & Engineering, RIMT Group of Institutions, Sirhind Mandi Gobindgarh, Punajb, India. Prof. (Dr.) Varala Ravi, Head, Department of Chemistry, IIIT Basar Campus, Rajiv Gandhi University of Knowledge Technologies, Mudhole, Adilabad, Andhra Pradesh- 504 107, India Prof. (Dr.) Ravikumar C Baratakke, faculty of Biology,Govt. College, Saundatti - 591 126, India.


                              

Prof. (Dr.) NALIN BHARTI, School of Humanities and Social Science, Indian Institute of Technology Patna, India. Prof. (Dr.) Shivanand S.Gornale, Head, Department of Studies in Computer Science, Government College (Autonomous), Mandya, Mandya-571 401-Karanataka Prof. (Dr.) Naveen.P.Badiger, Dept.Of Chemistry, S.D.M.College of Engg. & Technology, Dharwad-580002, Karnataka State, India. Prof. (Dr.) Bimla Dhanda, Professor & Head, Department of Human Development and Family Studies, College of Home Science, CCS, Haryana Agricultural University, Hisar- 125001 (Haryana) India. Prof. (Dr.) Tauqeer Ahmad Usmani, Faculty of IT, Salalah College of Technology, Salalah, Sultanate of Oman, Prof. (Dr.) Naresh Kr. Vats, Chairman, Department of Law, BGC Trust University Bangladesh Prof. (Dr.) Papita Das (Saha), Department of Environmental Science, University of Calcutta, Kolkata, India Prof. (Dr.) Rekha Govindan , Dept of Biotechnology, Aarupadai Veedu Institute of technology , Vinayaka Missions University , Paiyanoor , Kanchipuram Dt, Tamilnadu , India Prof. (Dr.) Lawrence Abraham Gojeh, Department of Information Science, Jimma University, P.o.Box 378, Jimma, Ethiopia Prof. (Dr.) M.N. Kalasad, Department of Physics, SDM College of Engineering & Technology, Dharwad, Karnataka, India Prof. Rab Nawaz Lodhi, Department of Management Sciences, COMSATS Institute of Information Technology Sahiwal Prof. (Dr.) Masoud Hajarian, Department of Mathematics, Faculty of Mathematical Sciences, Shahid Beheshti University, General Campus, Evin, Tehran 19839,Iran Prof. (Dr.) Chandra Kala Singh, Associate professor, Department of Human Development and Family Studies, College of Home Science, CCS, Haryana Agricultural University, Hisar- 125001 (Haryana) India Prof. (Dr.) J.Babu, Professor & Dean of research, St.Joseph's College of Engineering & Technology, Choondacherry, Palai,Kerala. Prof. (Dr.) Pradip Kumar Roy, Department of Applied Mechanics, Birla Institute of Technology (BIT) Mesra, Ranchi-835215, Jharkhand, India. Prof. (Dr.) P. Sanjeevi kumar, School of Electrical Engineering (SELECT), Vandalur Kelambakkam Road, VIT University, Chennai, India. Prof. (Dr.) Debasis Patnaik, BITS-Pilani, Goa Campus, India. Prof. (Dr.) SANDEEP BANSAL, Associate Professor, Department of Commerce, I.G.N. College, Haryana, India. Dr. Radhakrishnan S V S, Department of Pharmacognosy, Faser Hall, The University of Mississippi Oxford, MS-38655, USA Prof. (Dr.) Megha Mittal, Faculty of Chemistry, Manav Rachna College of Engineering, Faridabad (HR), 121001, India. Prof. (Dr.) Mihaela Simionescu (BRATU), BUCHAREST, District no. 6, Romania, member of the Romanian Society of Econometrics, Romanian Regional Science Association and General Association of Economists from Romania Prof. (Dr.) Atmani Hassan, Director Regional of Organization Entraide Nationale Prof. (Dr.) Deepshikha Gupta, Dept. of Chemistry, Amity Institute of Applied Sciences,Amity University, Sec.125, Noida, India Prof. (Dr.) Muhammad Kamruzzaman, Deaprtment of Infectious Diseases, The University of Sydney, Westmead Hospital, Westmead, NSW-2145. Prof. (Dr.) Meghshyam K. Patil , Assistant Professor & Head, Department of Chemistry,Dr. Babasaheb Ambedkar Marathwada University,Sub-Campus, Osmanabad- 413 501, Maharashtra, INDIA Prof. (Dr.) Ashok Kr. Dargar, Department of Mechanical Engineering, School of Engineering, Sir Padampat Singhania University, Udaipur (Raj.) Prof. (Dr.) Sudarson Jena, Dept. of Information Technology, GITAM University, Hyderabad, India Prof. (Dr.) 
Jai Prakash Jaiswal, Department of Mathematics, Maulana Azad National Institute of Technology Bhopal-India Prof. (Dr.) S.Amutha, Dept. of Educational Technology, Bharathidasan University, Tiruchirappalli620 023, Tamil Nadu-India Prof. (Dr.) R. HEMA KRISHNA, Environmental chemistry, University of Toronto, Canada. Prof. (Dr.) B.Swaminathan, Dept. of Agrl.Economics, Tamil Nadu Agricultural University, India.


                             

Prof. (Dr.) Meghshyam K. Patil, Assistant Professor & Head, Department of Chemistry, Dr. Babasaheb Ambedkar Marathwada University, Sub-Campus, Osmanabad- 413 501, Maharashtra, INDIA Prof. (Dr.) K. Ramesh, Department of Chemistry, C .B . I. T, Gandipet, Hyderabad-500075 Prof. (Dr.) Sunil Kumar, H.O.D. Applied Sciences &Humanities, JIMS Technical campus,(I.P. University,New Delhi), 48/4 ,K.P.-3,Gr.Noida (U.P.) Prof. (Dr.) G.V.S.R.Anjaneyulu, CHAIRMAN - P.G. BOS in Statistics & Deputy Coordinator UGC DRS-I Project, Executive Member ISPS-2013, Department of Statistics, Acharya Nagarjuna University, Nagarjuna Nagar-522510, Guntur, Andhra Pradesh, India Prof. (Dr.) Sribas Goswami, Department of Sociology, Serampore College, Serampore 712201, West Bengal, India. Prof. (Dr.) Sunanda Sharma, Department of Veterinary Obstetrics Y Gynecology, College of Veterinary & Animal Science,Rajasthan University of Veterinary & Animal Sciences,Bikaner334001, India. Prof. (Dr.) S.K. Tiwari, Department of Zoology, D.D.U. Gorakhpur University, Gorakhpur-273009 U.P., India. Prof. (Dr.) Praveena Kuruva, Materials Research Centre, Indian Institute of Science, Bangalore560012, INDIA Prof. (Dr.) Rajesh Kumar, Department Of Applied Physics , Bhilai Institute Of Technology, Durg (C.G.) 491001 Prof. (Dr.) Y.P.Singh, (Director), Somany (PG) Institute of Technology and Management, Garhi Bolni Road, Delhi-Jaipur Highway No. 8, Beside 3 km from City Rewari, Rewari-123401, India. Prof. (Dr.) MIR IQBAL FAHEEM, VICE PRINCIPAL &HEAD- Department of Civil Engineering & Professor of Civil Engineering, Deccan College of Engineering & Technology, Dar-us-Salam, Aghapura, Hyderabad (AP) 500 036. Prof. (Dr.) Jitendra Gupta, Regional Head, Co-ordinator(U.P. State Representative)& Asstt. Prof., (Pharmaceutics), Institute of Pharmaceutical Research, GLA University, Mathura. Prof. (Dr.) N. Sakthivel, Scientist - C,Research Extension Center,Central Silk Board, Government of India, Inam Karisal Kulam (Post), Srivilliputtur - 626 125,Tamil Nadu, India. Prof. (Dr.) Omprakash Srivastav, Centre of Advanced Study, Department of History, Aligarh Muslim University, Aligarh-202 001, INDIA. Prof. (Dr.) K.V.L.N.Acharyulu, Associate Professor, Department of Mathematics, Bapatla Engineering college, Bapatla-522101, INDIA. Prof. (Dr.) Fateh Mebarek-Oudina, Assoc. Prof., Sciences Faculty,20 aout 1955-Skikda University, B.P 26 Route El-Hadaiek, 21000,Skikda, Algeria. NagaLaxmi M. Raman, Project Support Officer, Amity International Centre for Postharvest, Technology & Cold Chain Management, Amity University Campus, Sector-125, Expressway, Noida Prof. (Dr.) V.SIVASANKAR, Associate Professor, Department Of Chemistry, Thiagarajar College Of Engineering (Autonomous), Madurai 625015, Tamil Nadu, India (Dr.) Ramkrishna Singh Solanki, School of Studies in Statistics, Vikram University, Ujjain, India Prof. (Dr.) M.A.Rabbani, Professor/Computer Applications, School of Computer, Information and Mathematical Sciences, B.S.Abdur Rahman University, Chennai, India Prof. (Dr.) P.P.Satya Paul Kumar, Associate Professor, Physical Education & Sports Sciences, University College of Physical Education & Sports, Sciences, Acharya Nagarjuna University, Guntur. Prof. (Dr.) Fazal Shirazi, PostDoctoral Fellow, Infectious Disease, MD Anderson Cancer Center, Houston, Texas, USA Prof. (Dr.) Omprakash Srivastav, Department of Museology, Aligarh Muslim University, Aligarh202 001, INDIA. Prof. (Dr.) Mandeep Singh walia, A.P. 
E.C.E., Panjab University SSG Regional Centre Hoshiarpur, Una Road, V.P.O. Allahabad, Bajwara, Hoshiarpur Prof. (Dr.) Ho Soon Min, Senior Lecturer, Faculty of Applied Sciences, INTI International University, Persiaran Perdana BBN, Putra Nilai, 71800 Nilai, Negeri Sembilan, Malaysia Prof. (Dr.) L.Ganesamoorthy, Assistant Professor in Commerce, Annamalai University, Annamalai Nagar-608002, Chidambaram, Tamilnadu, India. Prof. (Dr.) Vuda Sreenivasarao, Professor, School of Computing and Electrical Engineering, Bahir Dar University, Bahirdar,Ethiopia Prof. (Dr.) Umesh Sharma, Professor & HOD Applied Sciences & Humanities, Eshan college of Engineering, Mathura, India. Prof. (Dr.) K. John Singh, School of Information Technology and Engineering, VIT University, Vellore, Tamil Nadu, India. Prof. (Dr.) Sita Ram Pal (Asst.Prof.), Dept. of Special Education, Dr.BAOU, Ahmedabad, India.


                                 

Prof. Vishal S.Rana, H.O.D, Department of Business Administration, S.S.B.T'S College of Engineering & Technology, Bambhori,Jalgaon (M.S), India. Prof. (Dr.) Chandrakant Badgaiyan, Department of Mechatronics and Engineering, Chhattisgarh. Dr. (Mrs.) Shubhrata Gupta, Prof. (Electrical), NIT Raipur, India. Prof. (Dr.) Usha Rani. Nelakuditi, Assoc. Prof., ECE Deptt., Vignan’s Engineering College, Vignan University, India. Prof. (Dr.) S. Swathi, Asst. Professor, Department of Information Technology, Vardhaman college of Engineering(Autonomous) , Shamshabad, R.R District, India. Prof. (Dr.) Raja Chakraverty, M Pharm (Pharmacology), BCPSR, Durgapur, West Bengal, India Prof. (Dr.) P. Sanjeevi Kumar, Electrical & Electronics Engineering, National Institute of Technology (NIT-Puducherry), An Institute of National Importance under MHRD (Govt. of India), Karaikal- 609 605, India. Prof. (Dr.) Amitava Ghosh, Professor & Principal, Bengal College of Pharmaceutical Sciences and Research, B.R.B. Sarani, Bidhannagar, Durgapur, West Bengal- 713212. Prof. (Dr.) Om Kumar Harsh, Group Director, Amritsar College of Engineering and Technology, Amritsar 143001 (Punjab), India. Prof. (Dr.) Mansoor Maitah, Department of International Relations, Faculty of Economics and Management, Czech University of Life Sciences Prague, 165 21 Praha 6 Suchdol, Czech Republic. Prof. (Dr.) Zahid Mahmood, Department of Management Sciences (Graduate Studies), Bahria University, Naval Complex, Sector, E-9, Islamabad, Pakistan. Prof. (Dr.) N. Sandeep, Faculty Division of Fluid Dynamics, VIT University, Vellore-632 014. Mr. Jiban Shrestha, Scientist (Plant Breeding and Genetics), Nepal Agricultural Research Council, National Maize Research Program, Rampur, Chitwan, Nepal. Prof. (Dr.) Rakhi Garg, Banaras Hindu University, Varanasi, Uttar Pradesh, India. Prof. (Dr.) Ramakant Pandey. Dept. of Biochemistry. Patna University Patna (Bihar)-India. Prof. (Dr.) Nalah Augustine Bala, Behavioural Health Unit, Psychology Department, Nasarawa State University, Keffi, P.M.B. 1022 Keffi, Nasarawa State, Nigeria. Prof. (Dr.) Mehdi Babaei, Department of Engineering, Faculty of Civil Engineering, University of Zanjan, Iran. Prof. (Dr.) A. SENTHIL KUMAR., Professor/EEE, VELAMMAL ENGINEERING COLLEGE, CHENNAI Prof. (Dr.) Gudikandhula Narasimha Rao, Dept. of Computer Sc. & Engg., KKR & KSR Inst Of Tech & Sciences, Guntur, Andhra Pradesh, India. Prof. (Dr.) Dhanesh singh, Department of Chemistry, K.G. Arts & Science College, Raigarh (C.G.) India. Prof. (Dr.) Syed Umar , Dept. of Electronics and Computer Engineering, KL University, Guntur, A.P., India. Prof. (Dr.) Rachna Goswami, Faculty in Bio-Science Department, IIIT Nuzvid (RGUKT), DistrictKrishna , Andhra Pradesh - 521201 Prof. (Dr.) Ahsas Goyal, FSRHCP, Founder & Vice president of Society of Researchers and Health Care Professionals Prof. (Dr.) Gagan Singh, School of Management Studies and Commerce, Department of Commerce, Uttarakhand Open University, Haldwani-Nainital, Uttarakhand (UK)-263139 (India) Prof. (Dr.) Solomon A. O. Iyekekpolor, Mathematics and Statistics, Federal University, WukariNigeria. Prof. (Dr.) S. Saiganesh, Faculty of Marketing, Dayananda Sagar Business School, Bangalore, India. Dr. K.C.Sivabalan, Field Enumerator and Data Analyst, Asian Vegetable Research Centre, The World Vegetable Centre, Taiwan Prof. (Dr.) Amit Kumar Mishra, Department of Environmntal Science and Energy Research, Weizmann Institute of Science, Rehovot, Israel Prof. (Dr.) Manisha N. 
Paliwal, Sinhgad Institute of Management, Vadgaon (Bk), Pune, India Prof. (Dr.) M. S. HIREMATH, Principal, K.L.ESOCIETY’S SCHOOL, ATHANI, India Prof. Manoj Dhawan, Department of Information Technology, Shri Vaishnav Institute of Technology & Science, Indore, (M. P.), India Prof. (Dr.) V.R.Naik, Professor & Head of Department, Mechancal Engineering , Textile & Engineering Institute, Ichalkaranji (Dist. Kolhapur), Maharashatra, India Prof. (Dr.) Jyotindra C. Prajapati,Head, Department of Mathematical Sciences, Faculty of Applied Sciences, Charotar University of Science and Technology, Changa Anand -388421, Gujarat, India Prof. (Dr.) Sarbjit Singh, Head, Department of Industrial & Production Engineering, Dr BR Ambedkar National Institute of Technology, Jalandhar, Punjab,India


                                

Prof. (Dr.) Professor Braja Gopal Bag, Department of Chemistry and Chemical Technology, Vidyasagar University, West Midnapore Prof. (Dr.) Ashok Kumar Chandra, Department of Management, Bhilai Institute of Technology, Bhilai House, Durg (C.G.) Prof. (Dr.) Amit Kumar, Assistant Professor, School of Chemistry, Shoolini University, Solan, Himachal Pradesh, India Prof. (Dr.) L. Suresh Kumar, Mechanical Department, Chaitanya Bharathi Institute of Technology, Hyderabad, India. Scientist Sheeraz Saleem Bhat, Lac Production Division, Indian Institute of Natural Resins and Gums, Namkum, Ranchi, Jharkhand Prof. C.Divya , Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli - 627012, Tamilnadu , India Prof. T.D.Subash, Infant Jesus College Of Engineering and Technology, Thoothukudi Tamilnadu, India Prof. (Dr.) Vinay Nassa, Prof. E.C.E Deptt., Dronacharya.Engg. College, Gurgaon India. Prof. Sunny Narayan, university of Roma Tre, Italy. Prof. (Dr.) Sanjoy Deb, Dept. of ECE, BIT Sathy, Sathyamangalam, Tamilnadu-638401, India. Prof. (Dr.) Reena Gupta, Institute of Pharmaceutical Research, GLA University, Mathura-India Prof. (Dr.) P.R.SivaSankar, Head Dept. of Commerce, Vikrama Simhapuri University Post Graduate Centre, KAVALI - 524201, A.P., India Prof. (Dr.) Mohsen Shafiei Nikabadi, Faculty of Economics and Management, Industrial Management Department, Semnan University, Semnan, Iran. Prof. (Dr.) Praveen Kumar Rai, Department of Geography, Faculty of Science, Banaras Hindu University, Varanasi-221005, U.P. India Prof. (Dr.) Christine Jeyaseelan, Dept of Chemistry, Amity Institute of Applied Sciences, Amity University, Noida, India Prof. (Dr.) M A Rizvi, Dept. of Computer Engineering and Applications , National Institute of Technical Teachers' Training and Research, Bhopal M.P. India Prof. (Dr.) K.V.N.R.Sai Krishna, H O D in Computer Science, S.V.R.M.College,(Autonomous), Nagaram, Guntur(DT), Andhra Pradesh, India. Prof. (Dr.) Ashok Kr. Dargar, Department of Mechanical Engineering, School of Engineering, Sir Padampat Singhania University, Udaipur (Raj.) Prof. (Dr.) Asim Kumar Sen, Principal , ST.Francis Institute of Technology (Engineering College) under University of Mumbai , MT. Poinsur, S.V.P Road, Borivali (W), Mumbai, 400103, India, Prof. (Dr.) Rahmathulla Noufal.E, Civil Engineering Department, Govt.Engg.College-Kozhikode Prof. (Dr.) N.Rajesh, Department of Agronomy, TamilNadu Agricultural University -Coimbatore, TamilNadu, India Prof. (Dr.) Har Mohan Rai, Professor, Electronics and Communication Engineering, N.I.T. Kurukshetra 136131,India Prof. (Dr.) Eng. Sutasn Thipprakmas from King Mongkut, University of Technology Thonburi, Thailand Prof. (Dr.) Kantipudi MVV Prasad, EC Department, RK University, Rajkot. Prof. (Dr.) Jitendra Gupta,Faculty of Pharmaceutics, Institute of Pharmaceutical Research, GLA University, Mathura. Prof. (Dr.) Swapnali Borah, HOD, Dept of Family Resource Management, College of Home Science, Central Agricultural University, Tura, Meghalaya, India Prof. (Dr.) N.Nazar Khan, Professor in Chemistry, BTK Institute of Technology, Dwarahat-263653 (Almora), Uttarakhand-India Prof. (Dr.) Rajiv Sharma, Department of Ocean Engineering, Indian Institute of Technology Madras, Chennai (TN) - 600 036, India. Prof. (Dr.) Aparna Sarkar, PH.D. Physiology, AIPT, Amity University , F 1 Block, LGF, Sector125,Noida-201303, UP, India. Prof. (Dr.) 
Manpreet Singh, Professor and Head, Department of Computer Engineering, Maharishi Markandeshwar University, Mullana, Haryana, India. Prof. (Dr.) Sukumar Senthilkumar, Senior Researcher, Advanced Education Center of Jeonbuk for Electronics and Information Technology, Chon Buk National University, Chon Buk, 561-756, SOUTH KOREA. . Prof. (Dr.) Hari Singh Dhillon, Assistant Professor, Department of Electronics and Communication Engineering, DAV Institute of Engineering and Technology, Jalandhar (Punjab), INDIA. . Prof. (Dr.) Poonkuzhali, G., Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai, INDIA. .


                                 

Prof. (Dr.) Bharath K N, Assistant Professor, Dept. of Mechanical Engineering, GM Institute of Technology, PB Road, Davangere 577006, Karnataka, India. Prof. (Dr.) F.Alipanahi, Assistant Professor, Islamic Azad University, Zanjan Branch, Atemadeyeh, Moalem Street, Zanjan IRAN. Prof. Yogesh Rathore, Assistant Professor, Dept. of Computer Science & Engineering, RITEE, Raipur, India Prof. (Dr.) Ratneshwer, Department of Computer Science (MMV),Banaras Hindu University Varanasi-221005, India. Prof. Pramod Kumar Pandey, Assistant Professor, Department Electronics & Instrumentation Engineering, ITM University, Gwalior, M.P., India. Prof. (Dr.)Sudarson Jena, Associate Professor, Dept.of IT, GITAM University, Hyderabad, India Prof. (Dr.) Binod Kumar, PhD(CS), M.Phil(CS), MIEEE,MIAENG, Dean & Professor( MCA), Jayawant Technical Campus(JSPM's), Pune, India. Prof. (Dr.) Mohan Singh Mehata, (JSPS fellow), Assistant Professor, Department of Applied Physics, Delhi Technological University, Delhi Prof. Ajay Kumar Agarwal, Asstt. Prof., Deptt. of Mech. Engg., Royal Institute of Management & Technology, Sonipat (Haryana), India. Prof. (Dr.) Siddharth Sharma, University School of Management, Kurukshetra University, Kurukshetra, India. Prof. (Dr.) Satish Chandra Dixit, Department of Chemistry, D.B.S.College, Govind Nagar,Kanpur208006, India. Prof. (Dr.) Ajay Solkhe, Department of Management, Kurukshetra University, Kurukshetra, India. Prof. (Dr.) Neeraj Sharma, Asst. Prof. Dept. of Chemistry, GLA University, Mathura, India. Prof. (Dr.) Basant Lal, Department of Chemistry, G.L.A. University, Mathura, India. Prof. (Dr.) T Venkat Narayana Rao, C.S.E, Guru Nanak Engineering College, Hyderabad, Andhra Pradesh, India. Prof. (Dr.) Rajanarender Reddy Pingili, S.R. International Institute of Technology, Hyderabad, Andhra Pradesh, India. Prof. (Dr.) V.S.Vairale, Department of Computer Engineering, All India Shri Shivaji Memorial Society College of Engineering, Kennedy Road, Pune-411 001, Maharashtra, India. Prof. (Dr.) Vasavi Bande, Department of Computer Science & Engineering, Netaji Institute of Engineering and Technology, Hyderabad, Andhra Pradesh, India Prof. (Dr.) Hardeep Anand, Department of Chemistry, Kurukshetra University Kurukshetra, Haryana, India. Prof. Aasheesh shukla, Asst Professor, Dept. of EC, GLA University, Mathura, India. Prof. S.P.Anandaraj., CSE Dept, SREC, Warangal, India. Prof. (Dr.) Chitranjan Agrawal, Department of Mechanical Engineering, College of Technology & Engineering, Maharana Pratap University of Agriculture & Technology, Udaipur- 313001, Rajasthan, India. Prof. (Dr.) Rangnath Aher, Principal, New Arts, Commerce and Science College, Parner, DistAhmednagar, M.S. India. Prof. (Dr.) Chandan Kumar Panda, Department of Agricultural Extension, College of Agriculture, Tripura, Lembucherra-799210 Prof. (Dr.) Latika Kharb, IP Faculty (MCA Deptt), Jagan Institute of Management Studies (JIMS), Sector-5, Rohini, Delhi, India. Raj Mohan Raja Muthiah, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts. Prof. (Dr.) Chhanda Chatterjee, Dept of Philosophy, Balurghat College, West Bengal, India. Prof. (Dr.) Mihir Kumar Shome , H.O.D of Mathematics, Management and Humanities, National Institute of Technology, Arunachal Pradesh, India Prof. (Dr.) Muthukumar .Subramanyam, Registrar (I/C), Faculty, Computer Science and Engineering, National Institute of Technology, Puducherry, India. Prof. (Dr.) 
Vinay Saxena, Department of Mathematics, Kisan Postgraduate College, Bahraich – 271801 UP, India. Satya Rishi Takyar, Senior ISO Consultant, New Delhi, India. Prof. Anuj K. Gupta, Head, Dept. of Computer Science & Engineering, RIMT Group of Institutions, Mandi Gobindgarh (PB) Prof. (Dr.) Harish Kumar, Department of Sports Science, Punjabi University, Patiala, Punjab, India. Prof. (Dr.) Mohammed Ali Hussain, Professor, Dept. of Electronics and Computer Engineering, KL University, Green Fields, Vaddeswaram, Andhra Pradesh, India.


                                           

Prof. (Dr.) Manish Gupta, Department of Mechanical Engineering, GJU, Haryana, India. Prof. Mridul Chawla, Department of Elect. and Comm. Engineering, Deenbandhu Chhotu Ram University of Science & Technology, Murthal, Haryana, India. Prof. Seema Chawla, Department of Bio-medical Engineering, Deenbandhu Chhotu Ram University of Science & Technology, Murthal, Haryana, India. Prof. (Dr.) Atul M. Gosai, Department of Computer Science, Saurashtra University, Rajkot, Gujarat, India. Prof. (Dr.) Ajit Kr. Bansal, Department of Management, Shoolini University, H.P., India. Prof. (Dr.) Sunil Vasistha, Mody Institute of Tecnology and Science, Sikar, Rajasthan, India. Prof. Vivekta Singh, GNIT Girls Institute of Technology, Greater Noida, India. Prof. Ajay Loura, Assistant Professor at Thapar University, Patiala, India. Prof. Sushil Sharma, Department of Computer Science and Applications, Govt. P. G. College, Ambala Cantt., Haryana, India. Prof. Sube Singh, Assistant Professor, Department of Computer Engineering, Govt. Polytechnic, Narnaul, Haryana, India. Prof. Himanshu Arora, Delhi Institute of Technology and Management, New Delhi, India. Dr. Sabina Amporful, Bibb Family Practice Association, Macon, Georgia, USA. Dr. Pawan K. Monga, Jindal Institute of Medical Sciences, Hisar, Haryana, India. Dr. Sam Ampoful, Bibb Family Practice Association, Macon, Georgia, USA. Dr. Nagender Sangra, Director of Sangra Technologies, Chandigarh, India. Vipin Gujral, CPA, New Jersey, USA. Sarfo Baffour, University of Ghana, Ghana. Monique Vincon, Hype Softwaretechnik GmbH, Bonn, Germany. Natasha Sigmund, Atlanta, USA. Marta Trochimowicz, Rhein-Zeitung, Koblenz, Germany. Kamalesh Desai, Atlanta, USA. Vijay Attri, Software Developer Google, San Jose, California, USA. Neeraj Khillan, Wipro Technologies, Boston, USA. Ruchir Sachdeva, Software Engineer at Infosys, Pune, Maharashtra, India. Anadi Charan, Senior Software Consultant at Capgemini, Mumbai, Maharashtra. Pawan Monga, Senior Product Manager, LG Electronics India Pvt. Ltd., New Delhi, India. Sunil Kumar, Senior Information Developer, Honeywell Technology Solutions, Inc., Bangalore, India. Bharat Gambhir, Technical Architect, Tata Consultancy Services (TCS), Noida, India. Vinay Chopra, Team Leader, Access Infotech Pvt Ltd. Chandigarh, India. Sumit Sharma, Team Lead, American Express, New Delhi, India. Vivek Gautam, Senior Software Engineer, Wipro, Noida, India. Anirudh Trehan, Nagarro Software Gurgaon, Haryana, India. Manjot Singh, Senior Software Engineer, HCL Technologies Delhi, India. Rajat Adlakha, Senior Software Engineer, Tech Mahindra Ltd, Mumbai, Maharashtra, India. Mohit Bhayana, Senior Software Engineer, Nagarro Software Pvt. Gurgaon, Haryana, India. Dheeraj Sardana, Tech. Head, Nagarro Software, Gurgaon, Haryana, India. Naresh Setia, Senior Software Engineer, Infogain, Noida, India. Raj Agarwal Megh, Idhasoft Limited, Pune, Maharashtra, India. Shrikant Bhardwaj, Senior Software Engineer, Mphasis an HP Company, Pune, Maharashtra, India. Vikas Chawla, Technical Lead, Xavient Software Solutions, Noida, India. Kapoor Singh, Sr. Executive at IBM, Gurgaon, Haryana, India. Ashwani Rohilla, Senior SAP Consultant at TCS, Mumbai, India. Anuj Chhabra, Sr. Software Engineer, McKinsey & Company, Faridabad, Haryana, India. Jaspreet Singh, Business Analyst at HCL Technologies, Gurgaon, Haryana, India.


TOPICS OF INTEREST
Topics of interest include, but are not limited to, the following:

Software architectures for scientific computing
Computer architecture & VLSI
Mobile robots
Artificial intelligence systems and architectures
Distributed and parallel processing
Microcontrollers & microprocessor applications
Natural language processing and expert systems
Fuzzy logic and soft computing
Semantic Web
e-Learning design and methodologies
Knowledge and information management techniques
Enterprise Applications for software and Web engineering
Open-source e-Learning platforms
Internet payment systems
Advanced Web service technologies including security, process management and QoS
Web retrieval systems
Software and multimedia Web
Advanced database systems
Software testing, verifications and validation methods
UML/MDA and AADL
e-Commerce applications using Web services
Semantic Web for e-Business and e-Learning
Object oriented technology
Software and Web metrics
Techniques for B2B e-Commerce
e-Business models and architectures
Service-oriented e-Commerce
Enterprise-wide client-server architectures
Software maintenance and evolution
Component based software engineering
Multimedia and hypermedia software engineering
Enterprise software, middleware, and tools
Service oriented software architecture
Model based software engineering
Information systems analysis and specification
Aspect-oriented programming
Web-based learning, wikis and blogs
Social networks and intelligence
Social science simulation



TABLE OF CONTENTS
International Journal of Software and Web Sciences (IJSWS)
ISSN (Print): 2279-0063, ISSN (Online): 2279-0071
(December-2014 to February-2015, Issue 11, Volume 1)

Paper Code: Paper Title (Authors), Page No.

IJSWS 15-104: A TOOL TO GENERATE A COLLABORATIVE CONTENT COMPATIBLE WITH IMS-LD (Fauzi El Moudden, Souhaib Aammou, Mohamed Khaldi), pp. 01-08
IJSWS 15-105: A Query Enhancement Technique for Extracting Relevant Information (Mansour Aldawood and Dr. Ahmed Z Emam), pp. 09-15
IJSWS 15-110: Review on Various Routing Protocols Based on VANET's: A Survey (Gurminder Kaur, Manju Bala, Manoj Kumar), pp. 16-24
IJSWS 15-113: A novel clustering approach to select optimal usability principles for educational websites (Prafulla Bharat Bafna), pp. 25-27
IJSWS 15-115: NEURAL NETWORK BASED SOFTWARE RELIABILITY (Gaurav Aggarwal, Dr. V.K. Gupta), pp. 28-30
IJSWS 15-118: GIS Based Automated Drainage Extraction for the Analysis of Basin Morphometry in Vaniyar Subbasin, South India (Ebenezer Sahayam Samuel. A, Dr. Sorna Chandra Devadass), pp. 31-34
IJSWS 15-121: Visualizing and Analyzing Industrial Samples Using Non-Destructive Testing (Snigdha S. Parthan, Aditi Deodhar, Pranoti Nage, Rupali Deshmukh), pp. 35-39
IJSWS 15-124: Determining the Co-relation between Behavioral Engagement and Performance of e-Learners (Dias B.A.M.T., Malida K.K.D.S., Sahani M.A., Jayathilaka J.M.S.C., T. C. Sandanayaka, G.T.I. Karunarathne), pp. 40-46
IJSWS 15-125: Community Detection: A Boom to Society (Mini Singh Ahuja, Jasmine), pp. 47-50
IJSWS 15-127: Prioritizing Usability Data Collection Methods (Teena Tiwari Parmar, Dr. Kshama Paithankar), pp. 51-55
IJSWS 15-128: A Review on Personalised Search Engine (Snehal D. Jadhav, Vaishali P. Suryawanshi), pp. 56-59
IJSWS 15-132: Dynamic Data Storage and Replication Based on the Category and Data Access Patterns (Priya Deshpande, Radhika Jaju), pp. 60-63
IJSWS 15-133: Testing of Brain Tumor Segmentation Using Hierarchal Self Organizing Map (HSOM) (Dr. M. Anto Bennet, G. Sankar Babu, S. Lokesh, S. Sankaranarayanan), pp. 64-71
IJSWS 15-148: Providing efficient quality web services as per the consumer requirement by composing web service on demand (Raghu Rajalingam, R. Prabhu Doss), pp. 72-75
IJSWS 15-149: State of The Art In Handwritten Digit Recognition (Pooja Agrawal), pp. 76-78
IJSWS 15-153: A Survey on Privacy Preserving Technique to Secure Cloud (Vitthal Sadashiv Gutte, Prof. Priya Deshpande), pp. 79-82
IJSWS 15-154: Optimal solution of software component selection by using software metric (Hema Gaikwad, Ms. Prafulla Bafna), pp. 83-87
IJSWS 15-165: Applying data mining in higher education sector (Archana Sharma, Prof. Vibhakar Mansotra), pp. 88-92
IJSWS 15-172: Virtualization in Cloud computing Domain structure and data security enhancement (Santvana Singh, Sarla Singh, Sumit Dubey), pp. 93-95
IJSWS 15-176: Cost Efficient RESTful Services Caching for Mobile Devices (Iyad Ollite, Dr. Nawaz Mohamudally), pp. 96-101
IJSWS 15-184: INDOOR SURVEILLANCE SYSTEM USING IMAGE PROCESSING (Upasana Dugad, Chaitrali Mahanwar, Nimisha Rajeev, Rupali Deshmukh), pp. 102-106
IJSWS 15-189: Primary Education Analysis Based on Decision tree, Apriori Algorithm with Neural Networks (Manmohan Singh, Anjali Sant), pp. 107-113
IJSWS 15-190: Generating Recurrent Patterns Using Clique Algorithm (Bipin Nair B J), pp. 114-119


International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

A TOOL TO GENERATE A COLLABORATIVE CONTENT COMPATIBLE WITH IMS-LD
Fauzi El Moudden, Souhaib Aammou, Mohamed Khaldi
Abdelmalek Essaâdi University, Faculty of Sciences, LIROSA, Tetouan, MOROCCO
_______________________________________________________________________________________
Abstract: In this research we adapt the IMS-LD meta-model to a model supporting collaborative learning. The adaptation proceeds in three stages: first, the development of a collaborative model; second, the study of the correspondence between the developed model and the IMS-LD meta-model; and third, the transformation of the collaborative model to the IMS-LD meta-model, following the MDA approach with transformation rules implemented in the ATL language.

Keywords: IMS-LD, MDA, ATL, collaborative learning. ______________________________________________________________________________________

I. Introduction
Distance learning is promoted through educational platforms: integrated systems that offer a wide range of activities in the learning process. Teachers use the platforms to monitor or evaluate the work of students, and they use learning content management systems (LCMS) to create courses, tests, etc. However, the platforms do not offer personalized services and therefore do not take into account aspects of personalization such as the level of knowledge, interest, motivation and goals of learners: all learners access the same resource sets in the same way. In this paper, we present an easy way for teachers to create and administer educational content online collaboratively. The tool allows the generation and editing of website structures from a database rather than from fixed pedagogical models, with a variety of choices that ensures better adaptation to the teaching of the course and to learning styles. Furthermore, the social constructivist approach is centered on learner activity, which must be supported by synchronous and asynchronous collaboration. It is therefore necessary to find a method to model all types of activities. In order to model the activities, we based our work on the IMS-LD specification, focusing on collaborative learning.
II. Theoretical approach
A. Collaborative Learning Online
A.1. Definition
According to [1], collaborative learning is any learning activity carried out by a group of learners with a common purpose, each learner being a source of information, motivation, interaction and support for the others, and each benefiting from the contributions of the others, from the synergy of the group and from the help of a trainer facilitating individual and collective learning.
A.2. Importance of collaborative learning online
Collaborative learning was experienced at the onset of online education in the late 1980s under the name of computer conferencing, first by e-mail and then through forums. As a form of online learning, collaborative learning allows learners to benefit from great flexibility of time and place (stimulating autonomy and reflection) and from excellent asynchronous interaction (a source of mutual motivation, critical thinking and synthesis). That is why [2] reported in 1989 that "the collective nature of computer conferencing may be the single most critical and fundamental" element underlying theory development and the design and implementation of online educational activities. In this context, collaborative learning is the most important educational contribution of online learning, and, by an irrefutable logic [3], offering online education without letting those who follow it benefit from its "most fundamental" contribution is absurd and devalues the remarkable educational tool that telematics provides to learners. This does not mean that online education should be limited to online collaborative learning, but it is important that any online program include a minimum of collaborative learning and exploit it to the extent appropriate to the program and to its students' way of learning. Since the emergence of the web, e-learning has aroused great enthusiasm and developed rapidly, in the form of educational materials posted on the web, often without human interaction, sometimes with e-mail interaction between each learner and the tutor, and later with forums for the interaction of each student with a tutor and peers. But the

online collaborative learning activities in small groups have been neglected by many directors online training; they are probably too busy to multiply online training to concentrate on the design and facilitation of small group activities, while they are the most beneficial innovation of online learning. B. Instructional Management Systems (IMS-LD) IMS-LD was published in 2003 by the IMS / GLC. (Instructional Management Systems Global Learning Consortium: Consortium for global learning management systems with training, the original name when IMS was started in 1997 Instructional Management Systems project). [4] Reminds us of its origins: the source (EML) of the proposed language was assessed by the European Committee for Standardization (CEN) in a comparative study of different SRMS [5], as best suited to satisfy the criteria definition of an EML. EML (Educational Modelling Language (EML)) is defined by CEN / ISS as "an information aggregation and semantic model describing the content and processes involved in a unit of learning from an educational perspective and in order to ensure the reusability and interoperability. "In this context, the North American IMS consortium undertook a study and provided a specification of such a language, giving birth in February 2003, the Learning Design specification V1.0 (IMS-LD). She adds that proposal, largely inspired EML developed by [6] (OUNL) provides a conceptual framework for modeling a Learning Unit and claims to offer a good compromise between on the one hand to the generic implement a variety of instructional approaches and secondly, the power of expression that allows an accurate description of each learning unit. This specification allows us to represent and encode learning structures for learners both alone and in groups, compiled by roles, such as "learners" and "Team". [7] We can model a lesson plan in IMS-LD, defining roles, learning activities, services and many other elements and building learning units. The syllabus is modeled and built with resources assembled in a compressed Zip file then started by an executable ("player"). It coordinates the teachers, students and activities as long as the respective learning process progresses. A user takes a "role" to play and execute the activities related to in order to achieve a satisfactory learning unit. In all, the unit structure, roles and activities build the learning scenario to be executed in a system compatible with IMS LD. IMS-LD does not impose a particular pedagogical model but can be used with a large number of scenarios and pedagogical models, demonstrating its flexibility. That is why IMS-LD is often called a pedagogical metamodel. Previous initiatives in e-learning pretend pedagogically neutral, IMS-LS is not intended to pedagogical neutrality but seeks to raise awareness of e-learning on the need for a flexible approach. IMS-LD has been developed for e-learning and virtual classes but during a face-to-face can be made and incorporated into a structure created with this specification, as an activity or learning support activities. If the ultimate goal of creating rich learning units, with support to achieve the learning objectives by providing the best possible experience, face-to-face and other learning resources are allowed such as videoconferencing, collaborative table or any action research field. 
IMS-LD uses a theatrical metaphor, which implies the existence of roles, resources and the learning scenario itself: a play is divided into one or more acts and is performed by several actors who can take on different roles at different times. Each role must perform a number of activities to complete the learning process. In addition, all roles must be synchronized at the end of each act before proceeding to the next act. Figure 1 shows a conceptual model of these three levels.

Fig. 1 The conceptual model of IMS LD [8]
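To make the roles/play/acts/activities hierarchy described above concrete, the sketch below assembles a miniature unit-of-learning description in Python. The element and attribute names are simplified placeholders chosen for illustration; they do not follow the official IMS-LD XML binding, they only mirror its structure.

```python
# Minimal sketch of the roles/play/acts/activities hierarchy
# (placeholder element names, not the official IMS-LD binding).
import xml.etree.ElementTree as ET

unit = ET.Element("unit-of-learning", attrib={"title": "Sample collaborative lesson"})

roles = ET.SubElement(unit, "roles")
ET.SubElement(roles, "learner", attrib={"identifier": "role-learner"})
ET.SubElement(roles, "staff", attrib={"identifier": "role-tutor"})

play = ET.SubElement(unit, "play")
act1 = ET.SubElement(play, "act", attrib={"title": "Act 1"})
# Each role-part binds a role to the activity it performs during the act.
ET.SubElement(act1, "role-part", attrib={"role": "role-learner", "activity": "discuss-topic"})
ET.SubElement(act1, "role-part", attrib={"role": "role-tutor", "activity": "moderate-discussion"})

# All roles synchronise at the end of an act before the next act starts,
# so a second act simply follows the first inside the same play.
act2 = ET.SubElement(play, "act", attrib={"title": "Act 2"})
ET.SubElement(act2, "role-part", attrib={"role": "role-learner", "activity": "write-summary"})

print(ET.tostring(unit, encoding="unicode"))
```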


III. Model-driven engineering
A. Model Driven Architecture (MDA)
In November 2000 the OMG, a software engineering consortium of over 1,000 companies, initiated the MDA process [OMG MDA], oriented towards models rather than objects. The Model Driven Architecture (MDA) [OMG MDA] offers the power of abstraction, refinement and different views of the models. This standard adds a new way of designing applications by separating the business logic from any technical platform, in order to increase the reuse of previously developed code, reduce development time and facilitate the integration of new technology [9]. It gives the opportunity to develop models that are independent of the platform and implementation environment [10]. MDA is used to separate two extreme views of the same system [11]: its functional specification on the one hand, and its physical implementation on the other hand, while covering several aspects of the life of the software, namely its tests, its quality requirements, the definition of successive iterations, etc. The MDA architecture consists of four layers. In the center are the UML (Unified Modeling Language), MOF (Meta-Object Facility) and CWM (Common Warehouse Meta-model) standards. The second layer contains the XMI (XML Metadata Interchange) standard for dialogue between middleware (Java, CORBA, .NET, and Web services). The next layer refers to services that manage events, transactions, security and directories. The last layer offers frameworks specific to an application domain (telecommunications, medicine, electronic commerce, finance, etc.). A designer creating an application can use UML as well as other languages. According to this architecture, which is independent of the technical context, MDA proposes to structure the requirements first and then to transform this functional modeling into technical modeling, while testing each produced model [12]. The application model is created independently of the target implementation (hardware or software), which allows greater reuse of models. MDA is considered an approach with the ambition to cover the widest possible view of the software life cycle, not only its production, and this overview is intended to be described in a unified syntax. One of the assumptions underlying MDA is that the operationalization of an abstract model is not a trivial problem; one of the benefits of MDA is to solve this problem [13]. MDA proposes to design an application through a software chain divided into four phases, with the aim of flexible implementation, integration, maintenance and testing:
- the development of a computation independent model (CIM: Computation Independent Model);
- its manual transformation into a platform independent model (PIM: Platform Independent Model);
- the automatic transformation into a model associated with the target implementation platform (PSM: Platform Specific Model), to be refined;
- its implementation on the target platform.
B. Collaborative meta-model
In our research, we propose a meta-model for a system designed to meet the needs of educational projects that require online collaboration, and the needs of teachers in terms of generation of collaborative educational content. We therefore establish the following diagram as a first proposal of a meta-model for collaborative learning (Fig. 2):

Fig. 2 Proposal of a conceptual collaborative meta-model (diagram omitted; it shows classes such as Project, Phase, Task, Sub Task, Team, Member, Tutor, Teacher, Learner, Tools, Objectives, Calendar, Discussion, Notification, Document and Production, linked by relations such as consist, have, use, realise and produce)
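Before turning to the transformation rules, the main classes of this meta-model can be encoded directly for experimentation; the sketch below does so with Python dataclasses. The attribute choices are illustrative assumptions, not the authors' definitive class definitions.

```python
# Illustrative encoding of a few classes from the collaborative meta-model
# (attribute names are assumptions for the sake of the example).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Subtask:
    number: int
    name: str

@dataclass
class Task:
    identifier: str
    title: str
    subtasks: List[Subtask] = field(default_factory=list)

@dataclass
class Member:
    name: str
    role: str          # e.g. "Teacher", "Learner" or "Tutor"

@dataclass
class Team:
    name: str
    members: List[Member] = field(default_factory=list)

@dataclass
class Project:
    title: str
    objectives: List[str] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)
    teams: List[Team] = field(default_factory=list)
```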


We subsequently propose a collaborative model derived from the meta-model proposed in Fig. 2, in which we define the properties of each class and the relationships between them:

Fig. 3 Proposal of collaborative model

C. Correspondence between the terminology of IMS-LD and that of the collaborative model
The majority of the classes designed in our collaborative model correspond closely to the IMS-LD model, which makes their transformation to it possible. Model transformation is a technique that aims to establish links between models in order to avoid unnecessary reproduction. In the next section, we discuss how to perform transformations between models, starting with the study of model driven engineering and ending with the transformation rules used to make our collaborative model compatible with the IMS-LD meta-model. In the following table we collect the collaborative model classes and their equivalents in IMS-LD:

Table 1 Correspondence between the terminology of IMS-LD and that of the collaborative model

Collaborative meta-model | IMS-LD
Project | Activity
Task | Role
Subtask | Activity structure
Team, Members | Person
Teacher | Staff
Learner | Learner
Production | Outcome
Notification | Notification
Objective | Learning Objective
Services | Services

IV. Transformation rules
A. Atlas Transformation Language (ATL)
In their operational use of ATL, Canals et al. [14] state that, to deal with model transformation, it is difficult and cumbersome to use object languages, since too much effort is spent on developing transformation definitions and on framework setup work. The use of XSLT as a language, while more direct and adaptable, is on the other hand difficult to maintain [14]. We follow their choice by focusing on the implementation of approaches centered on MDA (Model Driven Architecture), MDE (Model Driven Engineering) and QVT (Query/View/Transformation) tools. Query/View/Transformation (QVT) [18] is a standard defined by the OMG: a standardized language for expressing model transformations. QVT is not yet sufficiently advanced in its definition of the Query and View aspects; by contrast, the Transformation aspect expressed by the MDA approach has resulted in various experiments (e.g. Triskell, ATL) at both the academic and commercial level. To perform transformations, transformation tools are necessary. These are based on transformation languages that must respect the QVT standard [18] proposed by the OMG [15]. There is an offer of free tools (ATL, MTF, MTL, QVTP, etc.) and commercial ones (e.g. MIA). We chose ATL (Atlas Transformation Language) among the free tools, insofar as only ATL has a spirit consistent with OMG/MDA/MOF/QVT [14].
B. ATL Description
The Atlas Transformation Language (ATL) has been designed to perform transformations within the MDA framework proposed by the OMG [16], [17]. The ATL language is mainly based on the fact that models are first-class entities: transformations are themselves considered models (transformation models). Since transformations are themselves models, transformations can be applied to them. This possibility of ATL is considered an important point, because it provides the means to achieve higher-order transformations (HOT: Higher-Order Transformations) [17]. A higher-order transformation is a transformation whose source and target models are themselves transformations. As ATL is among the model transformation languages respecting the QVT standard [18] proposed by the OMG [15], we describe its structure in relation to this standard. Studying the abstract syntax of the ATL language amounts to studying two features provided by the language in addition to the transformation rules. The first feature, navigation, covers the possibility of navigating between the source and target meta-models. The second feature, operations, describes the ability to define operations on model elements. Finally, the study of the transformation rules describes the types of rules, how they are called and the type of results they return.
- Navigation [16], [15]: this feature is offered to ATL by OCL (Object Constraint Language). Navigation is allowed only if the model elements are fully initialized; the elements of the target model are not definitively initialized until the end of the execution of the transformation. Therefore, navigation in ATL can only be made between elements of the source model (or meta-model) and the target model (or meta-model).
- Operations [16]: this ATL feature is also provided by OCL. In OCL, operations can be defined on the elements of the model; ATL takes advantage of this possibility to allow defining operations on elements of the source model and of the transformation model [15].
- Transformation rules: there are several types of transformation rules, depending on how they are called and the kind of results they return (Fig. 4).

Fig. 4 ATL Transformation rules

CalledRule [16]: a rule explicitly called by its name and by setting its parameters. MatchedRule [16]: a rule executed when a pattern (InPattern) is recognized in the source model. The result of a rule may be a set of predefined model elements (OutPattern) or a block of imperative statements (ActionBlock). If the rule is of MatchedRule type and its result is a set of elements of the target model (OutPattern), it is called declarative. If it is of CalledRule type and its result is a block of statements, it is called imperative (procedural). Combinations of declarative and imperative rules are called hybrid rules. [15]

C. Rules of transformation from the collaborative project meta-model to the IMS-LD meta-model
In this section, we define, explain and justify the transformation rules developed from the collaborative meta-model to the IMS-LD meta-model. A rule defines the "mapping" between classes of the meta-models and the rules for handling the attributes and relationships of those classes.
D. Rule between the collaborative meta-model and the IMS-LD meta-model, activity-structure side
In the collaborative meta-model, there are three levels of conceptualization to cover: Project, Task and Subtask. These three concepts are linked by composition relations (a project consists of tasks, which are composed of sub-tasks), while in the IMS-LD meta-model the concepts of activity and activity structure describe these three levels.

Fig. 5 Rule Project to Activity-structure

Regarding the first rule (Projet2Structure) (Fig. 5), we create a rule of type MatchedRule with a source class (Project) and a target class (Activity structure). This rule is the "mapping" between the Project class of the collaborative meta-model and the activity structure of the IMS-LD meta-model, and it handles the attributes of these classes.
E. Rule between the collaborative meta-model Task and the IMS-LD Activity
Regarding the second rule (Task2Activity) (Fig. 6), we also chose a rule of type MatchedRule with a source class (Task) and a target class (Activity). This rule is the "mapping" between the Task class of the collaborative meta-model and the Activity of the IMS-LD meta-model, and it handles the attributes of these classes.

Fig. 6 Rule Task2Activity

F. Rule between the collaborative meta-model Subtask and the IMS-LD Activity
For the third rule (Subtask2Activity) (Fig. 7), we also chose a rule of type MatchedRule with a source class (Subtask) and a target class (Activity). This rule is the "mapping" between the Subtask class of the collaborative meta-model and the Activity of the IMS-LD meta-model.

Fig. 7 Rule Subtask2Activity

Concerning the handling of the relationship between the Task and Subtask classes of the collaborative meta-model, we are unable to provide a corresponding relationship in the IMS-LD meta-model, because no equivalent relationship exists in the target meta-model (IMS-LD): in IMS-LD we do not have the possibility to build an Activity that consists of several Activities. At this level there is therefore a semantic loss (certain information specified in a scenario disappears) (Fig. 8), and it is the responsibility of the user to inspect the Subtask names of the project to see which Subtask corresponds to which Task. In the following rule we show how we concatenate the title of the Subtask with that of its Task, and the number of the Subtask with the identifier of its Task, to address this problem.
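The three matched rules can be imitated outside ATL with plain mapping functions. The sketch below is a Python analogue (not the authors' ATL code) that turns Project, Task and Subtask objects, shaped like the earlier dataclass sketch, into dictionaries resembling IMS-LD concepts; the concatenation that compensates for the semantic loss is shown again after the helper discussion below.

```python
# Python analogue of the three matched rules (illustration only, not ATL).
# The project/task/subtask arguments are assumed to carry the attributes
# used in the earlier dataclass sketch (title, identifier, number, name).

def project_to_activity_structure(project):
    """Projet2Structure: one collaborative Project becomes one activity-structure."""
    return {
        "element": "activity-structure",
        "title": project.title,
        "activities": [task_to_activity(t) for t in project.tasks],
    }

def task_to_activity(task):
    """Task2Activity: one Task becomes one learning activity."""
    return {"element": "learning-activity",
            "identifier": task.identifier,
            "title": task.title}

def subtask_to_activity(task, subtask):
    """Subtask2Activity: one Subtask becomes one learning activity.

    IMS-LD cannot nest an Activity inside another Activity, so the parent
    Task is passed in and its title and identifier are concatenated into
    the generated element to preserve the lost relationship.
    """
    return {"element": "learning-activity",
            "identifier": f"{task.identifier}_{subtask.number}",
            "title": f"{task.title} - {subtask.name}"}
```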

Fig. 8 Concatenation ATL code

The concatenation is not ensured within the rule directly. To do this, we use "helpers" [15]. We define two helpers (Fig. 9) in the context of the concept "Subtask": the first returns the value of the attribute "name_subtask" of the concept "Subtask", and the second returns the value of the attribute "Number_Subtask" of the concept "Subtask".
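In the same spirit as the two ATL helpers, the concatenation can be written as two small functions, one for the combined title and one for the combined identifier. This is only a hedged sketch of the idea in Python, not the authors' helper code, and the attribute names follow the earlier illustrative dataclass sketch.

```python
# Sketch of the two "helpers": they flatten the lost Task/Subtask relationship
# into the generated activity's title and identifier.

def subtask_title(task, subtask):
    # e.g. "Write report - Collect references"
    return f"{task.title} - {subtask.name}"

def subtask_identifier(task, subtask):
    # e.g. "T3_2" for subtask number 2 of task T3
    return f"{task.identifier}_{subtask.number}"
```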

Fig. 9 ATL helper code

V. Conclusion
In this paper we addressed the modeling of a collaborative online education meta-model compatible with IMS-LD. We started by expressing the need for this modeling and then devoted a section of the theoretical framework to the IMS-LD specification. In the second part we studied the MDE (Model Driven Engineering) approach and adopted the MDA (Model Driven Architecture) approach defined by the OMG, based on four stages of implementation:
- the development of a computation independent model (CIM);
- its manual transformation into a platform independent model (PIM);
- the automatic transformation into a model associated with the target implementation platform (PSM: Platform Specific Model), to be refined;
- its implementation on the target platform.
This led us to study the ATL transformation language, which in turn allowed us to define the transformation rules from our proposed meta-model to the IMS-LD meta-model. Concerning the transformations executed in this paper, we limited ourselves to a few examples, which pave the way for performing all the other transformations from the proposed collaborative meta-model to the IMS-LD meta-model.

References
[1] Henri F., Lundgren-Cayrol K., Apprentissage collaboratif à distance - Pour comprendre et concevoir les environnements d'apprentissages virtuels, Sainte-Foy (Canada), Presses de l'Université du Québec, 2001.
[2] Harasim L., "Online education: a new domain", in Mason R. et al. (eds.), Mindweave, Oxford, Pergamon, 1989, pp. 50-52, http://www-icdl.open.ac.uk/literaturestore/mindweave/chap4.html
[3] Salmon G., E-Moderating - The key to teaching and learning online, London, Kogan Page, 2000.
[4] Lejeune A., IMS Learning Design : Étude d'un langage de modélisation pédagogique, Revue Distances et Savoirs, volume 2.
[5] CEN/ISS WS/LT, Learning Technologies Workshop, "Survey of Educational Modelling Languages (EMLs)", Version 1, September 2002.
[6] Koper R. (2001). Modeling Units of Study from a Pedagogical Perspective: the pedagogical meta-model behind EML. http://eml.ou.nl/introduction/docs/pedmetamodel.pdf
[7] Burgos D., Berbegal N., Griffiths D. (2005a). IMS Learning Design Level 0. http://moodle.learningnetworks.org/
[8] IMS Learning Design, www.epi.asso.fr/revue/articles/a0512c.htm
[9] P. Boulet, J.L. Dekeyser, C. Dumoulin, and P. Marquet. MDA for SoC embedded systems design, intensive signal processing experiment. In SIVOES-MDA workshop at UML 2003, San Francisco, October 2003.
[10] D. Thi-Lan-Anh, G. Olivier, and S. Houari. Gestion de modèles : définitions, besoins et revue de littérature. In Premières Journées sur l'Ingénierie Dirigée par les Modèles, pages 1-15, Paris, France, 30 Juin - 1 Juillet 2005.
[11] A. Clave. D'UML à MDA en passant par les méta-modèles. Technical report, La Lettre d'ADELI n° 56, 2004.
[12] A. Clave. D'UML à MDA en passant par les méta-modèles. Technical report, La Lettre d'ADELI n° 56, 2004.
[13] P.A. Caron, F. Hoogstoel, X. Le Pallec, and B. Warin. Construire des dispositifs sur la plateforme Moodle - application de l'ingénierie Bricoles. In MoodleMoot-2007, Castres, France, 14-15 Juin 2007.
[14] A. Canals, C. Le-Camus, M. Feau, G. Jolly, V. Bonnafous, and P. Bazavan. Une utilisation opérationnelle d'ATL : l'intégration de la transformation de modèles dans le projet TOPCASED. In Génie Logiciel (73), pages 21-26, 2005.
[15] J. Bézivin, G. Dupé, F. Jouault, G. Pitette, and J. Eddine Rougui. First experiments with the ATL model transformation language: transforming XSLT into XQuery. In OOPSLA 2003 Workshop, Anaheim, California, 2003.
[16] ATLAS group, LINA and INRIA Nantes. ATL transformation language: ATL user manual, version 0.7. Technical report, INRIA, University of Nantes, February 2006.
[17] B. Combemale and S. Rougemaille. ATL - Atlas Transformation Language. Master 2 Recherche SLCP, module RTM edition, 2005.
[18] OMG/RFP/QVT MOF 2.0 Query/Views/Transformations RFP. OMG, October 2002.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

A Query Enhancement Technique for Extracting Relevant Information Mansour Aldawood and Dr. Ahmed Z Emam Information Systems Department King Saud University Riyadh, Kingdom of Saudi Arabia

Abstract: The size of the content on the web is increasing exponentially, facilitated by modern technologies that ease its creation and storage. This evolution of database technology leads researchers and specialists in a given field to save their publications easily in these databases, according to their backgrounds and interests. With time, the amount of information stored in these databases becomes huge, which makes searching for a piece of information difficult. In the medical field, a predefined database such as Medline has more than 23 million scientific papers concerned with healthcare. Extracting a specific piece of information from such a huge database requires a technique that guides the searcher towards the most relevant results that satisfy his need. This paper introduces a technique that aims to enhance the query entered by the searcher by preprocessing it in order to increase the relevancy of the results. The technique aims to improve the results of extraction tools that specialize in extracting information from healthcare databases.
Keywords: Information Extraction; Medline; Query enhancement; UMLS; information retrieval
I. Introduction
The amount of medical information in healthcare databases has grown exponentially during the past few years, with the aid of advanced technologies that facilitate storing this information easily. The huge amount of information residing in these databases has made the search for relevant information more difficult than ever for physicians who seek a certain piece of information [1]. One of the most famous healthcare databases is Medline [2], which is considered the most comprehensive database specialized in the biomedical field. Its information is updated on a daily basis and it holds more than 24 million records dating back to the early 1960s. With this evolution in database size, traditional search tools no longer serve physicians or researchers properly, since they present a large number of results, which makes finding the relevant information very time consuming; this time consumption directly affects decision-making in most cases [3]. This issue leads to the need for specialized extraction tools that mainly extract structured information from unstructured repositories based on the keywords entered by the searchers. Even with existing extraction tools, the number of results for a query about relevant information is still high, which makes finding the most relevant results for physicians or researchers difficult and slow. This leads to the need for a query enhancement technique that helps the physician to search for any information with keywords that give the most relevant results to the query, taking the existing extraction tools into consideration.
II. Literature Review
The most common definition of information extraction is the automatic extraction of structured information from an unstructured source or database, such as databases in natural language format. The ultimate goal of this technique is to make any wanted information accessible through a certain mechanism and in a timely manner. In addition, most researchers believe that the use of information extraction tools will help to develop databases with structured information, which will make searching and indexing the information residing in them easier.
What motivates this technique to be appearing and developed over the past years, is the need for searching in wider pool rather than just searching on a certain keyword in a specific area. This concept has recognized in the late of the 70s and start to be develop widely in the 80s [4]. The Unified Medical Language System is a set of files and tools that contains many standards and vocabulary that related to the field of healthcare and biomedical in order to achieve interoperability between information systems that specialized in the healthcare sector. The most advantage of using UMLS is to have the ability to link between health information, medical standards and medicine names over multiple information systems. UMLS has created during 1986 as databases that about the vocabularies in the biomedical sciences. With the enormous increase of contents that specialized in biomedical resources in predefined databases, the retrieve for this content become more difficult and has a large volume of results. UMLS help solving the issue by enhancing the accessing to these contents by providing a mapping structure between vocabularies related biomedical and thesaurus of biomedical concepts [5]. Medline is the unite state, national library of medicine (NLM) which has over than 24 million scientific papers and journals in science but more concentrates on the biomedical one. Medline [2] has started its development in


1964 as a medical literature analysis and retrieval system (MEDLARS).The Porter stemmer algorithm according to his founder Porter [6] is a "process for removing the commoner morphological endings from words in English". In other word, this algorithm is responsible for retrieving English word to its original form to aid the developers in the field of information retrieval. Porter stemmer is a rule-based algorithm that aims to remove to the endings of English words, for example, the word relational will be relate according to porter stemming.The growth of contents in predefined databases has exponentially increased, therefore the need for tools that extract the exact what we looking for in large databases has become necessary. One of these tools called BioIE that has developed in 2005 by Divoli and Attwood [7], which is a rule based extraction tool that extract an informative sentence which are related to a protein families and their structure. In the same year, Mitchell and his team [8] stated that BioIE has the ability to perform in classifying sentences that related to disease more than a support vector machine (SVM) since it has 56% precision while in SVM 48% but when using other factor such as the sentence that related to the structure it precision become less. In 2006, Schuhmann and his colleagues [9] has developed a tool called EBIMed to be efficiently retrieve sentences or abstracts from the Medline database with ability to analyze these phrases. The extracted abstracts that used to create a table, which has an overview about the protein, gene ontology, drugs and species of the same biological context. Furthermore, in 2004 a BioRAT has presented by Corney and his team [10] as an information extraction tool. This software aims to extract a biomedical information and be able to locate and analyze either abstract or full papers. Additionally, in 2007, Hearst and his colleagues [11] has developed a web based search engine called BioText. From the chosen name, it indicates its purpose which is to help the biologists to have a new method to access the recent scientific papers by searching on articles illustrations, figures and captions. BioText is a web based application and still ongoing research papers with an interface that's designed carefully to serve the purpose of this engine and to accomplish its functionality. After a period of time, BioText developers [12] added more functionality to their search engine by allowing the users to have the ability to search over full text, abstracts, figure captions and tables. Later in 2008, Gladki and his colleagues [13] has represented a web based tool that extract the abstracts of a biomedical information in the PubMed database. Their goal was to design a tool that can find the right and true correlations of a search by the users. They present a software that has a useful functionality such as searching by author name and using logical operators such as AND, OR and NOT. Over the years, the researchers and developers start to think differently about the way that the search engines must perform. One of these ways, as it is presented by Wang and his team in 2010 [14] which using the fuzzy search technique in their search engine. They present a web-based tool called IPubMed that extract and search for publications from the Medline database. Their goal of this tool is to have the ability to retrieve instant exact feedback to searcher query plus to have the approximate result of the same query as a fuzzy result. 
IPubMed is the tool utilized for the proposed system.
III. Research Question
As mentioned in the introduction of this paper, even with existing extraction tools the number of results for a query about relevant information is still high, which makes finding the most relevant results for physicians or researchers difficult and time consuming. This leads to the need for a query enhancement technique that helps the physician to search for any information with keywords that give the most relevant results to the query. This paper has two questions to verify: first, how using the UMLS biomedical Metathesaurus concepts and ontology improves the searching quality for physicians and searchers; second, what is the appropriate tool to integrate with UMLS to enhance query performance. Moreover, it will verify whether applying the proposed extraction tool together with the proposed system (EQI: Enhanced Query for IPubMed) improves the precision and quality of the search.
IV. Methodology
The proposed methodology starts by utilizing the existing extraction tool IPubMed, which is responsible for query handling, document analysis and indexing to improve search results and rank the most relevant information, abstracts or sites. The second step is using the Unified Medical Language System (UMLS) APIs to retrieve synonym keywords related to the healthcare domain. The third step is applying the Porter stemmer to reduce the entered keywords to their original form and then removing the stop words from the query. The last step is using the resulting keywords from UMLS as a query in the EQI and PubMed search engines and comparing their precision and recall using the confusion matrix technique.

Table 1: Synonym keywords for the "stomach cancer" query from UMLS
- Malignant neoplasm of stomach
- Malignant neoplasm of stomach stage IV
- cellular diagnosis, gastric cancer
- FH: Stomach cancer
- Stomach Carcinoma
- recurrent gastric cancer
- stage, gastric cancer
- Stage I Gastric Carcinoma
- Stomach Neoplasms
- Gastric Adenocarcinoma
- Gastric Fundus Carcinoma
- Stomach Problem
- Carcinoma in situ of stomach
- intestinal adenocarcinoma of the stomach
- Gastric Body Carcinoma
- Endoscopy of stomach

V. Proposed System
The proposed system runs through two phases: the preprocessing phase, which analyzes the entered keywords, and the extraction phase, which is responsible for extracting the information from the predefined repository.
Figure 1: Architecture of the proposed system

The first phase is responsible for analyzing the keywords entered by the physician before integrating them with the extraction tool. First, the physician or researcher types keywords through a web-based interface in order to obtain information based on them. The preprocessing client removes the stop words from the query, so as to keep only the terms that are really wanted and that will yield better results, and it stems the entered keywords back to their original form using a stemmer algorithm, in order to improve the precision of the extracted information. After that, the client starts communicating with the UMLS database to obtain synonym words: it sends the entered keywords to the UMLS database through its APIs to check whether these keywords exist in the UMLS repository. If they do not exist, an empty result is returned to the client, which in turn presents a message to the physician indicating that the search words have no synonyms and that a new keyword should be entered. If the keywords do have synonyms, these are retrieved from the UMLS database and sent to the client in order to present them to the physician. At this stage, the synonym words are presented to the physician by the analysis client, and the physician chooses the most relevant words, i.e. those closest to what he or she is looking for. After the relevant words are selected, the analysis client sends them to the extraction tool to retrieve the information based on the chosen keywords. The second phase is responsible for extracting the information from the repository. It utilizes a pre-developed extraction tool that is integrated with the Medline repository; this tool receives the analyzed keywords from the client of the preprocessing phase and extracts the information from the repository to present it to the physician.
Retrieving Synonyms using the UMLS API Design
The Unified Medical Language System (UMLS) database has rich synonyms for words related to the healthcare field. The proposed system uses this facility through the provided API, which is explained in detail in the following illustration. As Figure 2 shows, the communication with the API requires prerequisites that permit the transfer of information between the proposed system and the UMLS database, essentially a user name and password. Their validity is needed to create the ticket, issued once per eight hours, that communicates with the UMLS database. This ticket is generated by a security service that first checks the validity of the user name and password and then generates the single-use ticket. After the ticket is generated, communication with the UMLS database and its synonym words is opened through the provided API. The API provided by the UMLS Terminology Services offers many useful features; in this proposed system, retrieving the synonyms of the entered keywords is the feature that the system needs. A function used to retrieve the synonyms provides the choice of searching the database by either


exact, approximate or normalized words. Moreover, this function has the ability to retrieve either the synonym concept, the source of the concept or the code of the concept. After this customization of the search, the results are stored in an array and presented.
Figure 2: Retrieving Synonyms using UMLS API Architecture
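The ticket-based access described above can be outlined as below. The endpoint URLs are deliberate placeholders (the real authentication and search endpoints should be taken from the UMLS Terminology Services documentation), so this is an assumption-laden sketch of the call sequence rather than verified client code.

```python
# Hedged sketch of the ticket-based UMLS call chain.
# AUTH_URL and SEARCH_URL are placeholders, not verified UTS endpoints.
import requests

AUTH_URL = "https://example.invalid/uts/api-key"      # placeholder: UTS auth endpoint
SEARCH_URL = "https://example.invalid/uts/search"     # placeholder: UTS search endpoint

def get_ticket_granting_ticket(api_key: str) -> str:
    """One ticket-granting ticket, valid for roughly eight hours per the paper."""
    resp = requests.post(AUTH_URL, data={"apikey": api_key})
    resp.raise_for_status()
    return resp.text  # in practice the TGT URL would be parsed out of the response

def get_service_ticket(tgt_url: str, service_name: str) -> str:
    """A single-use service ticket is minted from the TGT for each request."""
    resp = requests.post(tgt_url, data={"service": service_name})
    resp.raise_for_status()
    return resp.text

def search_synonyms(term: str, service_ticket: str):
    """Exact / approximate / normalized search, as described above."""
    resp = requests.get(SEARCH_URL,
                        params={"string": term,
                                "searchType": "exact",
                                "ticket": service_ticket})
    resp.raise_for_status()
    return resp.json()
```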

The main goal of the proposed system is to enhance the query entered in the search box in order to yield better and more relevant results for the user. In this system, the deletion of stop words, the stemming of the keywords to their original form and the retrieval of synonyms from the UMLS database form an automated process. The searcher who uses the system simply enters the query that expresses his information need, which in our case belongs to the healthcare field. The query then goes through the refining process: first the system checks the query for stop words and, once detected, deletes them and passes the result to the second refining step, stemming. In the stemming step the query is divided into word tokens so that each can easily be reduced to its original form; after stemming is completed for each token, the query is recombined as a whole. Thirdly, the query is sent to the UMLS database to check whether synonym words exist for it. If synonyms exist, they are returned to the searcher as suggested keywords. As Figure 3 shows, the user selects one of the suggested keywords if it satisfies his or her information need. The selected keywords are then sent, as the query keywords, to the predefined search engine, IPubMed, in order to extract the needed information from the Medline database. The IPubMed engine is a well-developed engine responsible for retrieving the most relevant healthcare-related information.
Figure 3: Process flow of the proposed system
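Putting the steps together, a minimal version of this preprocessing pipeline might look like the sketch below. The function and variable names are invented for illustration, and fetch_umls_synonyms is a stub standing in for the UMLS call outlined in the previous section.

```python
# Minimal sketch of the EQI preprocessing pipeline (names are illustrative).
from typing import List

from nltk.corpus import stopwords      # may require nltk.download("stopwords")
from nltk.stem import PorterStemmer

STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(query: str) -> List[str]:
    """Remove stop words and stem each remaining token."""
    tokens = query.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

def fetch_umls_synonyms(term: str) -> List[str]:
    """Placeholder for the UMLS synonym lookup sketched above."""
    raise NotImplementedError

def enhance_query(query: str) -> List[str]:
    cleaned = " ".join(preprocess(query))
    synonyms = fetch_umls_synonyms(cleaned)
    # In the real system the user picks the most relevant suggestion;
    # here we simply return all of them (or the cleaned query if none exist).
    return synonyms or [cleaned]
```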


VI. System Validation and Results
As Figure 4 illustrates, the query "Stomach Cancer" was used in the proposed system's API to retrieve the synonym words from the UMLS database. Thirty-one results were received for the query, with different synonyms used in the healthcare field regarding stomach cancer. These keywords were integrated into the EQI search engine to count the number of scientific papers that contain both a synonymous keyword and the original keywords "stomach cancer" in the same paper.
Figure 4: Number of citations per synonym word (bar chart; per-synonym values omitted)

The confusion matrix technique was used to facilitate measuring the performance of the proposed system, since it offers a classification that leads to measuring the precision, recall, error rate, accuracy and F-measure of the presented technique. The confusion matrix divides the class to be measured into four categories, as shown in Table 2.

Table 2: Four classes of the confusion matrix
                Predicted +            Predicted -
Actual +        True positives (++)    False negatives (+-)
Actual -        False positives (-+)   True negatives (--)

A set of results (hits) was retrieved by applying the top-ranked synonym term, "Malignant neoplasm of stomach", on the two search engines. One hundred hits were tested from each of the two search engines, and the following results are based on this token sample. The results of the two search engines, EQI and PubMed, were analyzed on 100 extracted documents for the queries: query 1 "Malignant neoplasm of stomach", query 2 "Stomach problem" and query 3 "Stomach Neoplasms". The four confusion matrix classes are shown in Table 3.

Table 3: Performance results for the search engines

PubMed (Total = 100)
Query/Class   TP   FN   FP   TN
Query1        37    9   22   32
Query2        28    6   19   47
Query3        33   14   23   30

EQI (Total = 100)
Query/Class   TP   FN   FP   TN
Query1        19    7   28   46
Query2        11    5   26   58
Query3        21    6   24   49

Table 4: Equations of accuracy, error rate, recall, precision and F-measure
Accuracy = (True positives + True negatives) / Total documents
Error rate = (False positives + False negatives) / Total documents
Recall = True positives / (True positives + False negatives)
Precision = True positives / (True positives + False positives)
F-measure = 2 * (Precision * Recall) / (Precision + Recall)
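The equations in Table 4 are straightforward to compute from the four confusion-matrix counts. The sketch below does so and, as a sanity check, reproduces the PubMed Query 1 row of Table 5 from the Table 3 counts (TP=37, FN=9, FP=22, TN=32).

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the Table 4 measures from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "recall": recall,
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# PubMed, Query 1 (Table 3): TP=37, FN=9, FP=22, TN=32
print(metrics(37, 9, 22, 32))
# -> accuracy 0.69, error rate 0.31, recall ~0.804, precision ~0.627,
#    F-measure ~0.704, matching the corresponding row of Table 5.
```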

The performance evaluation metrics suggested in the literature for similar systems are precision, recall, error rate, accuracy and F-measure; the equations used to compute them are given in Table 4. The results are shown in Table 5.

Table 5: Calculation of precision, recall, error rate, accuracy and F-measure

EQI        Accuracy   Error rate   Recall   Precision   F-Measure
Query1     65%        35%          73%      40.4%       52%
Query2     69%        31%          68.8%    29.7%       41.4%
Query3     70%        30%          77.8%    46.7%       58.3%

PubMed     Accuracy   Error rate   Recall   Precision   F-Measure
Query1     69%        31%          80.4%    62.7%       70.4%
Query2     75%        25%          82.3%    59.6%       69.1%
Query3     63%        37%          70.2%    58.9%       64%

To evaluate the efficiency of the proposed system, the evaluation metrics described in Table 4 were measured. We implemented this evaluation model using the proposed system, and the comparison among the three queries for each metric is shown in Figure 5.
Figure 5: Performance Metrics

VII. Conclusion and Future Work
According to the results presented in the previous section, a set of recommendations has been assembled regarding the proposed system. Using the proposed system has value for the searching physicians who face difficulty in finding the right query that leads to the most relevant information. Therefore, in any


information retrieval system, enhancing the query is a must in order to obtain relevant results. Enhancing the query is not enough, however: the ranking mechanism is just as important for retrieving the most relevant information, because with a good enhancement technique but a low-quality ranking technique the results will not be what the user expects. The information retrieval mechanisms used by the two systems, EQI and PubMed, rank results differently; therefore the number of recalled elements in PubMed is much higher than in EQI, even though the two search engines read their results from the same database. Moreover, the relevancy measure in EQI depends on the term frequency within the abstract of a document, which means that if a document repeats the query term many times it is ranked at the top, and so on. PubMed, on the other hand, uses ranking algorithms that combine more than one factor, such as the term frequency over the whole document and the date of publication. Therefore, the precision of PubMed is higher than that of EQI. The accuracy and error rates of the two search engines are considered reasonable, given that they retrieve information from Medline in different ways, although the results collected from PubMed are considered more reliable for the query than those of EQI, since PubMed uses the term frequency of the whole document for scoring and ranking. As future work, the proposed system could use more components for enhancing the query than those presented here. For instance, a tokenization component could be added that splits the query into a group of words so that they can be stemmed or treated as individual queries; this approach takes one word at a time and prepares it for synonym lookup. Moreover, automatic annotation could be added to enhance the query, i.e. taking care of the metadata of the query and not just the query itself: for instance, if a searcher wants to find a paper knowing only its author, a list of suggested authors can be helpful. The synonyms retrieved from the UMLS database can also introduce noise into the proposed system, because not all the synonym words collected from UMLS are usable as keywords in scientific papers; moreover, among the 200 citations ranked according to their relevancy to the user query, some synonyms were never used, which adds noise. As a future improvement, a filtering technique could be added to the query enhancement system to filter out unusable keywords and reduce the suggestions to the most relevant ones. Finally, the Porter stemmer used in this system is a general technique that reduces keywords to their base form for general English words; an enhancement would be to add a stemmer specialized in reducing medical keywords to their root form, which could raise the quality of the suggestions retrieved for the user seeking relevant documents.
VIII. References

[1] Ebbert, J., Dupras, D. and Erwin, P. (2003). Searching the Medical Literature Using PubMed: A Tutorial. Mayo Clinic Proceedings, 78(1), pp. 87-91.
[2] Ncbi.nlm.nih.gov (2014). Home - PubMed - NCBI. [online] Available at: http://www.ncbi.nlm.nih.gov/pubmed [Accessed 25 Dec. 2014].
[3] Daraselia, N., Yuryev, A., Egorov, S., Novichkova, S., Nikitin, A. and Mazo, I. (2004). Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics, 20(5), pp. 604-611.
[4] Pazienza, M. (1997). Information Extraction. Berlin: Springer.
[5] Selden, C. and Humphreys, B. (1997). Unified Medical Language System (UMLS). Bethesda, Md.: U.S. Dept. of Health and Human Services, Public Health Service, National Institutes of Health, National Library of Medicine, Reference Section.
[6] Porter, M. (1980). An algorithm for suffix stripping. Program: electronic library and information systems, 14(3), pp. 130-137.
[7] Divoli, A. and Attwood, T. (2005). BioIE: extracting informative sentences from the biomedical literature. Bioinformatics, 21(9), pp. 2138-2139.
[8] Mitchell, A., Divoli, A., Kim, J., Hilario, M., Selimas, I. and Attwood, T. (2005). METIS: multiple extraction techniques for informative sentences. Bioinformatics, 21(22), pp. 4196-4197.
[9] Rebholz-Schuhmann, D., Kirsch, H., Arregui, M., Gaudan, S., Riethoven, M. and Stoehr, P. (2007). EBIMed: text crunching to gather facts for proteins from Medline. Bioinformatics, 23(2), pp. e237-e244.
[10] Corney, D., Buxton, B., Langdon, W. and Jones, D. (2004). BioRAT: extracting biological information from full-length papers. Bioinformatics, 20(17), pp. 3206-3213.
[11] Hearst, M., Divoli, A., Guturu, H., Ksikes, A., Nakov, P., Wooldridge, M. and Ye, J. (2007). BioText Search Engine: beyond abstract search. Bioinformatics, 23(16), pp. 2196-2197.
[12] Hearst, M.A., Divoli, A., Ye, J. and Wooldridge, M.A. (2007). Exploring the Efficacy of Caption Search for Bioscience Journal Search Interfaces. In Proceedings of BioNLP 2007, a workshop of ACL 2007.
[13] Gladki, A., Siedlecki, P., Kaczanowski, S. and Zielenkiewicz, P. (2008). e-LiSe - An online tool for finding needles in the "(Medline) haystack". Bioinformatics.
[14] Wang, J., Cetindil, I., Ji, S., Li, C., Xie, X., Li, G. and Feng, J. (2010). Interactive and fuzzy search: a dynamic way to explore MEDLINE. Bioinformatics, 26(18), pp. 2321-2327.
[15] Chen, C. (2013). Feature selection based on compactness and separability: comparison with filter-based methods. Computational Intelligence, 30(3), pp. 636-656.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Review on Various Routing Protocols Based on VANETs: A Survey
Gurminder Kaur, Manju Bala*, Manoj Kumar**
*Department of Computer Science and Engineering, **Department of Electronics and Communication Engineering, CT Institute of Engineering, Management & Technology, Shahpur, Jalandhar, Punjab, INDIA.

Abstract: A Vehicular Ad-hoc Network (VANET) is a collection of self-organizing nodes that operates without any fixed infrastructure. Vehicular nodes with wireless radio interfaces are joined by wireless links, and every device in a VANET is free to move independently and randomly, changing its links to other devices frequently. Recent VANET research concentrates on areas such as routing, security, and quality of service, but because of the highly dynamic nature of the network, designing an efficient routing protocol for all VANET applications is hard; there remains scope for simulating and designing new protocols and services for VANET architectures. Improving an existing approach or proposing a novel routing method is a milestone, but a survey of routing protocols against the various VANET parameters is an essential issue for vehicle-to-vehicle (V2V) and infrastructure-to-vehicle (IVC) communication in intelligent transportation systems (ITS). Routing is the principal issue in the development of VANETs; there are various security flaws and attacks on VANET routing protocols, and these attacks can affect the performance of the different routing protocols.
Keywords: VANET, V2V, V2C, ITS, IVC

I. Introduction
A Vehicular Ad-hoc Network (VANET) [1] is a new and challenging network setting that pursues the vision of pervasive computing for the future. Vehicles equipped with wireless communication technologies and acting like mobile network nodes are already on the road, and this can change the way traffic is managed. A growing range of vehicles is fitted with wireless transceivers to communicate with other vehicles, forming a special class of wireless networks known as vehicular ad hoc networks, or VANETs. An ad hoc wireless network must be able to self-organize and self-configure, because its mobile structure changes continuously. Mobile hosts have a limited transmission range, so a message addressed to a host, or to several hosts, that is not within the sender's transmission range must be forwarded through the network by other hosts, which act as routers to carry the message across the whole network. A mobile host should use broadcast for sending messages and should stay in promiscuous mode so that it can accept any message it receives. To support driver safety and provide a comfortable driving environment, messages for different purposes must be delivered to vehicles through inter-vehicle communication. VANETs open up many possibilities for new applications that can not only make travel safer but also more enjoyable; reaching a destination or getting help would become much easier.
The concept of VANETs is fairly simple: by combining wireless communication and data-sharing capabilities, vehicles can be turned into a network providing services similar to the ones we are used to in our offices or homes. VANETs are considered an off-shoot of Mobile Ad-hoc Networks (MANETs), but they have some characteristics of their own, so solutions designed for MANETs must be evaluated carefully and adapted before they can be used in a VANET setting. In many ways, though, VANETs resemble MANETs; for example, both are multi-hop mobile networks with dynamic topology.

II. Applications
VANET applications can be categorized into the following classes:
a) VANET provides on-the-road connectivity to mobile users, so moving vehicles can communicate with each other in an economical way; they simply transmit their messages through the network.




b) It provides economical vehicle-to-vehicle communication that enables the Intelligent Transport System (ITS). ITS covers a range of applications such as cooperative traffic monitoring, management of traffic flows, blind crossing, and collision prevention.
c) Comfort applications allow travellers to communicate with other vehicles and with Internet hosts, improving passenger comfort; for example, VANET provides Internet connectivity to moving nodes so that a traveller can download music, send messages, watch online films, and so on.

III. Challenges
VANET characteristics include fast node movement, frequent topology change, and short link lifetimes, especially over multi-hop paths. These three characteristics considerably degrade the performance of many existing topological routing protocols for ad hoc networks, because topological routing must maintain a path from the source to the destination, and that path expires quickly due to frequent topology changes. The performance of VANET routing protocols depends on several parameters such as the mobility model, the driving environment, and more, so devising an efficient routing protocol for all situations is an enormous challenge in VANET. A VANET routing protocol must also handle issues such as varying network density, interference, long path lengths, and latency.

IV. Categories of Routing Protocols
Routing protocols can be classified into five categories: topology-based routing protocols, position-based routing protocols, cluster-based routing protocols, geocast routing protocols, and broadcast routing protocols [2].

A. Topology-Based Routing Protocols
These routing protocols use the link information that exists within the network to perform packet forwarding. They are further divided into proactive, reactive, and hybrid protocols.
A.1 Proactive routing protocols
Proactive routing means that routing information, such as the next forwarding hop, is maintained in the background regardless of communication requests. The advantage of a proactive routing protocol is that there is no route-discovery delay, since the route to the destination is maintained in the background, which keeps latency low for real-time applications; the disadvantage is that maintaining routes that may never be used consumes bandwidth and table space. The various proactive routing protocols are FSR, DSDV, OLSR, CGSR, WRP, and TBRPF.
"Fisheye State Routing" [3]: FSR is an efficient link-state routing protocol that maintains a topology map at each node and propagates link-state updates only to immediate neighbours, not to the whole network. In addition, link-state information is broadcast at different frequencies for different entries depending on their hop distance from the current node: entries that are farther away are broadcast with lower frequency than closer ones. The reduction in broadcast overhead is traded for some imprecision in routing, but this imprecision is corrected as packets approach the destination.
"Destination-Sequenced Distance-Vector Routing" [4]: DSDV is a table-driven routing scheme for ad hoc mobile networks based on the Bellman-Ford algorithm. It eliminates routing loops, increases convergence speed, and reduces control-message overhead. In DSDV, each node maintains a next-hop table that it exchanges with its neighbours.
"Optimized Link State Routing Protocol" [5]: OLSR is an optimization of a pure link-state protocol for mobile ad hoc networks.
Each node in the network selects a set of neighbour nodes, referred to as multipoint relays (MPRs), to retransmit its packets; neighbour nodes that are not in its MPR set only read and process the packet. This technique reduces the number of retransmissions in a broadcast procedure.
"Clusterhead Gateway Switch Routing" [6]: The CGSR protocol differs from the previous protocols in the kind of addressing and network organization it uses. Instead of a "flat" network, CGSR is a clustered multi-hop mobile wireless network with several heuristic routing schemes. By having a cluster head controlling a group of ad hoc nodes, a framework for code separation, channel access, routing, and bandwidth allocation can be achieved.
"Wireless Routing Protocol" [7]: WRP is a table-based protocol whose goal is to maintain routing information among all nodes in the network. Each node is responsible for maintaining four tables: (a) a distance table, (b) a routing table, (c) a link-cost table, and (d) a message retransmission list (MRL) table.
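As a rough sketch of the proactive, table-driven idea behind DSDV (not the full specification), the following keeps a per-node distance-vector table of (next hop, metric, sequence number) entries and applies a Bellman-Ford style update from a neighbour's advertised table, preferring newer sequence numbers and, for equal sequence numbers, smaller metrics.

```python
# Simplified DSDV-style table update: each entry stores (next_hop, metric,
# sequence number). A neighbour's advertisement is accepted if it carries a
# newer sequence number, or the same sequence number with a smaller metric.
def dsdv_update(table, neighbour, neighbour_table, link_cost=1):
    for dest, (_, metric, seq) in neighbour_table.items():
        candidate = (neighbour, metric + link_cost, seq)
        current = table.get(dest)
        if current is None:
            table[dest] = candidate
        else:
            _, cur_metric, cur_seq = current
            if seq > cur_seq or (seq == cur_seq and metric + link_cost < cur_metric):
                table[dest] = candidate
    return table

# Example: node A learns routes from neighbour B's advertised table.
table_a = {"B": ("B", 1, 10)}
table_b = {"B": ("B", 0, 12), "C": ("C", 1, 8)}
print(dsdv_update(table_a, "B", table_b))
```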




A.2 Reactive (on-demand) routing protocols
Reactive routing opens a route only when it is necessary for a node to communicate with another. Reactive routing consists of a route-discovery phase in which query packets are flooded into the network in search of a path, and this phase completes once a route is found. The various reactive routing protocols are AODV, PGB, DSR, TORA, and JARR.
"Ad hoc On-demand Distance Vector" [8]: In AODV routing, upon receipt of a broadcast query (RREQ), nodes record in their routing tables the address of the node from which the query was received. This recording of the previous hop is called backward learning. When the query reaches the destination, a reply packet (RREP) is sent back along the path obtained from backward learning to the source. At each stop along the path, a node records its previous hop, thereby establishing the forward path from the source. The flooding of the query and the return of the reply establish a full duplex path. Once the path has been built, it is maintained as long as the source uses it. Link failures are reported recursively to the source, which then triggers another query-reply procedure to find a new route.
"Ad hoc On-demand Multipath Distance Vector" [9]: AOMDV is an enhancement of the AODV protocol. It is based on the distance-vector concept and uses a hop-by-hop routing approach; like AODV, it finds routes on demand using a route-discovery procedure. The main difference between the two protocols lies in the number of routes found in each route discovery. In AOMDV, an RREQ propagating from the source towards the destination builds multiple reverse paths, both at intermediate nodes and at the destination. Multiple RREPs traverse these reverse paths back to form multiple forward paths to the destination at the source and at intermediate nodes. AOMDV also provides intermediate nodes with alternative routes, which have been found useful in reducing route-discovery frequency. The multiple paths found by AOMDV are loop-free and disjoint, and such paths are found efficiently using a flood-based route discovery.
"Preferred Group Broadcasting" [10]: PGB is a broadcasting mechanism that aims to reduce the broadcast overhead associated with AODV's route discovery and to improve route stability, which is especially important in VANETs where fast-moving vehicles are used as wireless hosts. Based on the received signal strength of the broadcast, receivers can determine whether they are in the preferred group and which node in the group should broadcast.
"Dynamic Source Routing" [11]: DSR uses source routing; that is, the source indicates in a data packet's header the sequence of intermediate nodes on the routing path. In DSR, the query packet accumulates in its header the IDs of the intermediate nodes it has traversed. The destination then retrieves the entire path from the query packet and uses it to reply to the source; as a result, the source obtains a path to the destination. If the destination is allowed to send multiple route replies, the source node may receive and store multiple routes from the destination, and an alternative route can be used when a link in the current route breaks. In a network with low mobility, this is advantageous over AODV, since the alternative route can be tried before DSR initiates another flood for route discovery.
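The backward-learning step of AODV-style route discovery can be pictured with the toy sketch below: an RREQ is flooded through a static graph while each node records the neighbour it first heard the request from, and the reply then retraces those recorded previous hops from the destination back to the source. Sequence numbers, timers, and route maintenance, which real AODV requires, are omitted here.

```python
# AODV-flavoured route discovery on a toy topology: flood an RREQ and let each
# node record the previous hop (backward learning); the RREP then retraces
# those recorded hops from the destination back to the source.
from collections import deque

def discover_route(graph, source, destination):
    prev_hop = {source: None}          # backward-learning table
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            break
        for neighbour in graph.get(node, []):
            if neighbour not in prev_hop:      # first copy of the RREQ wins
                prev_hop[neighbour] = node
                queue.append(neighbour)
    if destination not in prev_hop:
        return None                            # no route found
    path, node = [], destination               # RREP walks the reverse path
    while node is not None:
        path.append(node)
        node = prev_hop[node]
    return list(reversed(path))

topology = {"S": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"], "D": []}
print(discover_route(topology, "S", "D"))      # e.g. ['S', 'A', 'C', 'D']
```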
"Temporally Ordered Routing Algorithm” [12]TORA directing has a place with a group of connection inversion steering calculations wherever a steered non-cyclic diagram (DAG) at the end is made backed the crest of the tree stock-still at the supply. The regulated non-cyclic chart guides the stream of parcels and guarantees achieve capacity to any or all hubs. When a hub offers a parcel to send, it shows the bundle. Its neighbor exclusively telecasts the bundle in the event that it's the creating hub's descending connection upheld the DAG. "Junction-based reconciling Reactive Routing”[13] The design of transport impromptu system (VANET) amid a town air comprises of the numerous possible ways and intersections that structures the directing ways. Most limited way directing isn't feasible as an issue of each way ought to be occupied with vehicles. An adaptable multi-jump directing convention that adjusts well to the town environment even with rapidly continually changing system topologies and a lot of separated and thick system conditions is requested. A totally interesting position based basically directing convention i.e. JARR, it'll address the weaknesses of this conventions by assessing the thickness of approaches to be utilized. "Dynamic Manet On-demand”[14] DYMO convention may be a clear and snappy steering convention for multi jump systems. It decides uni-cast courses among DYMO switches at interims the system in AN on-interest an alternate touchy convention, giving enhanced joining in dynamic topologies in an exceptionally organize. To affirm the accuracy of this convention, Digital marks and hash capacities region unit utilized. The key operations of the DYMO convention zone unit course revelation and course administration. Firstly, course disclosure is that the technique for making a course to an end of the line once a hub wants a course to that. When a supply hub needs to talk with an end hub, it starts a Route Request (RREQ) message. Inside the RREQ message, the supply hub incorporates it address and its arrangement extend that gets augmented before its added to the RREQ. A.3 Hybrid Protocols The cross breed conventions square measure acquainted with curtail the administration overhead of proactive directing conventions and decrease the beginning course disclosure defer in touchy steering conventions. "Zone routing protocol” [15] The Zone Routing Protocol (ZRP) consolidates the profits of the proactive and receptive methodologies by keeping up a state-of-the-art topological guide of a zone focused on every hub. Inside the zone, courses region unit specifically reachable. For ends outside the zone, ZRP utilizes a course




In ZRP, a proactive routing protocol (IARP) is used for intra-zone communication and a reactive inter-zone routing protocol (IERP) is used for inter-zone communication. The source sends data directly to the destination if both are in the same routing zone; otherwise IERP reactively initiates a route discovery.
"Hybrid Ad hoc Routing Protocol" [16]: HARP partitions the whole network into non-overlapping zones. It aims to determine a stable route from a source to a destination in order to improve delay. It applies route discovery between zones to limit flooding in the network and selects the best route based on stability criteria. In HARP, routing is performed at two levels, intra-zone and inter-zone, depending on the position of the destination; it uses proactive and reactive protocols for intra-zone and inter-zone routing respectively. It is not applicable to large ad hoc networks.

B. Position-Based Routing Protocols
Position-based routing comprises several classes of routing principles; they share the property of using geographic positioning information to select the next forwarding hop. Position-based routing is broadly divided into two types: position-based greedy V2V protocols and delay-tolerant protocols.
B.1 Delay Tolerant Network (DTN) position-based routing protocols
Some vehicular routing protocols designed for VANETs are treated as Delay Tolerant Network (DTN) protocols. Since nodes are highly mobile in this kind of network, they suffer from frequent disconnections. To overcome this, packet delivery is improved by allowing nodes to store packets when there is no contact with other nodes, to carry the packets for some distance until meeting other nodes, and then to forward them to neighbouring nodes based on some metric (also referred to as the carry-and-forward strategy).
"Vehicle-Assisted Data Delivery" [17]: VADD is a vehicle routing scheme designed for scalable routing in disconnected vehicular networks through the idea of carry-and-forward, exploiting predictable vehicle mobility. A vehicle makes a decision at a junction and selects the next forwarding path with the smallest packet-delivery delay, where a path is simply a stretch of road branching from a junction.
"Geographical Opportunistic Routing" [18]: GeOpps exploits the suggested routes of vehicles' navigation systems to select vehicles that are likely to move closer to the final destination of a packet. It computes the shortest distance from the packet's destination to the nearest point (NP) on a vehicle's path and estimates the arrival time of the packet at the destination. GeOpps requires navigation information to be exposed to the network, so privacy, such as a vehicle's whereabouts, could be a concern.
B.2 Hybrid position-based routing protocols
GeoDTN+Nav [19] is a hybrid of the non-DTN and DTN approaches, comprising a greedy mode, a perimeter mode, and a DTN mode. It switches from non-DTN mode to DTN mode by estimating the connectivity of the network based on the number of hops a packet has travelled so far, the neighbours' delivery quality, and the neighbours' direction with respect to the destination.
B.3 Non-DTN position-based routing protocols
The basic principle of the greedy approach is that a node forwards its packet to the neighbour that is closest to the destination.
The forwarding strategy fails when no neighbour is closer to the destination than the node itself. In that case we say the packet has reached a local maximum at that node, since it has made the most local progress there. The routing protocols in this class each have a recovery strategy to deal with such a failure. Some of them fall under overlay routing, which has the characteristic that the routing protocol operates on a set of representative nodes overlaid on top of the underlying network. In an urban setting, it is easy to observe that decisions are made at junctions, since these are the places where packets make turns onto new road segments; hence the overlaid routing protocols presented below mostly work with nodes at junctions.
"Greedy Perimeter Stateless Routing" [20]: In Greedy Perimeter Stateless Routing (GPSR), a node forwards a packet to the immediate neighbour that is geographically closest to the destination node. This mode of forwarding is termed greedy mode. When a packet reaches a local maximum, a recovery mode is used to forward the packet to a node that is closer to the destination than the node where the packet encountered the local maximum. The packet resumes greedy forwarding once it reaches a node whose distance to the destination is smaller than that of the node at the local maximum.
"Position-Based Routing with Distance Vector Recovery" [21]: PBR-DV uses AODV-style recovery when packets reach a local maximum. The node at the local maximum broadcasts a request packet containing its position and the destination's location. On receiving a request packet, a node first checks whether it is closer to the destination than the node at the local maximum. If it is not, it records the node from which it received the request packet (as in backward learning) and rebroadcasts the request; otherwise, it sends a reply to the node from which it received the request.
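A minimal sketch of the greedy geographic forwarding step shared by these protocols is given below: at each hop the packet goes to the neighbour nearest to the destination, and forwarding stops at a local maximum, where a real protocol such as GPSR would switch to its recovery (perimeter) mode, which is not implemented here. The coordinates and topology are illustrative only.

```python
# Greedy geographic forwarding: repeatedly hand the packet to the neighbour
# closest to the destination; stop at a local maximum, where a real protocol
# (e.g. GPSR) would switch to its recovery/perimeter mode.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_forward(positions, neighbours, source, destination):
    path, node = [source], source
    while node != destination:
        best = min(neighbours.get(node, []),
                   key=lambda n: dist(positions[n], positions[destination]),
                   default=None)
        if best is None or dist(positions[best], positions[destination]) >= \
                           dist(positions[node], positions[destination]):
            return path, "local maximum reached"   # recovery mode needed
        path.append(best)
        node = best
    return path, "delivered"

positions = {"S": (0, 0), "A": (1, 1), "B": (2, 0), "D": (3, 0)}
neighbours = {"S": ["A", "B"], "A": ["S", "B"], "B": ["A", "D"], "D": ["B"]}
print(greedy_forward(positions, neighbours, "S", "D"))
```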




"Greedy Perimeter Coordinator Routing" [22]: GPCR relies on the fact that city streets form a natural planar graph, so GPCR does not need an external static street map for its operation. GPCR consists of two components: a restricted greedy forwarding procedure and a repair strategy for the routing algorithm. GPCR follows a destination-based greedy forwarding approach and routes messages to nodes at intersections. Since GPCR does not use any external static street map, nodes at intersections are harder to identify. GPCR not only avoids the need for node planarization but also improves routing performance, as packets travel fewer hops in perimeter mode; moreover, the improved routing decision keeps packets from being routed in the wrong direction, which would otherwise lead to higher delay.
"Connectivity Aware Routing" [23]: CAR uses AODV-based path discovery to find routes, with broadcasting limited by PGB. However, nodes that form the route record neither their previous hop from backward learning nor the previous hop that forwards the path-reply packet from the destination. Advanced Greedy Forwarding (AGF) is instead used to forward the route reply back to the source through recorded anchor points. When the source receives the route reply, it records the path to the destination and begins transmission. Data packets are forwarded greedily towards the destination through the set of anchor points using AGF. In addition, to handle mobility beyond AGF, CAR introduces "guards" that help track the current position of a destination; a guarding node can filter or redirect packets, or add information to a packet that will eventually carry this information to the packet's destination.
"Geographic Source Routing" [24]: GSR relies on the availability of a map and computes a Dijkstra shortest path on the overlaid graph, where the vertices are junction nodes and the edges are the streets connecting those vertices. The sequence of junctions forms the route to the destination, and packets are then forwarded greedily between junctions. GSR does not consider the connectivity between two junctions, so the route may not be fully connected; recovery in such a case is greedy forwarding. The key difference between GSR and CAR is that CAR does not use a map but uses proactive discovery of anchor points that indicate a turn at a junction.
"Anchor-Based Street and Traffic Aware Routing" [25]: A-STAR is analogous to GSR in that packets are routed through anchor points of the overlay. However, A-STAR is traffic-aware: the traffic on the road determines whether the anchor points of a street are considered in the shortest path. A-STAR routes using two kinds of overlaid maps: a statically rated map and a dynamically rated map. A statically rated map indicates bus routes, which generally imply a stable amount of traffic, and statistically rated maps can be combined as additional information; a dynamically rated map is generated from the real-time traffic conditions on the streets.
"Landmark Overlays for Urban Vehicular Routing Environments" [26]: this work summarizes geographic greedy overlay routing into two camps.
The first camp is geo-reactive overlay routing, where the next overlaid node is chosen based on the neighbouring nodes' distance to the destination (STBR) or a combination of that and traffic density (GyTAR). The second camp is geo-proactive overlay routing, where the sequence of overlaid nodes is determined in advance (GSR and A-STAR). Landmark Overlays for Urban Vehicular Routing Environments (LOUVRE) belongs to the second camp.
"Greedy Traffic Aware Routing protocol" [27]: GyTAR is an overlaid approach similar to those mentioned above, in that packets are forwarded greedily to the next junction, which then determines the best junction to forward to next. GyTAR assumes that the number of cars on each road is available from roadside units and determines the connectivity of roads. A score is given to each neighbouring junction, considering the traffic density and the distance to the destination; the weights for traffic density and distance are configurable parameters. GyTAR tries to approximate shortest-path routing while taking road connectivity into account.

C. Cluster-Based Routing Protocols
Cluster-based routing works on clusters: a group of nodes identifies itself as a cluster, and one node is elected as cluster head, which broadcasts packets to the cluster. Good scalability can be provided for large networks, but network delays and overhead are incurred when forming clusters in a highly mobile VANET. The various cluster-based routing protocols are COIN, LORA-CBF, HCB, and CBDRP.
"Cluster-Based Directional Routing Protocol" [28]: CBDRP divides the vehicles into clusters, and vehicles that are moving in the same direction form a cluster. The source sends the message to its cluster header, which forwards it to the header of the cluster containing the destination; finally, the destination's header delivers the message to the destination. Cluster-head election and maintenance are similar to CBR, but CBDRP also considers the speed and direction of a vehicle.
"Location Routing Algorithm with Cluster-Based Flooding" [29]: In LORA-CBF, each node can become a cluster head, a gateway, or a cluster member. For each cluster there is one cluster head; a node connecting two clusters is referred to as a gateway. The cluster head maintains information about its members and gateways.




Packet forwarding is similar to greedy routing; only the cluster head and gateways send Location Request (LREQ) packets when the location of the destination is not available, as well as the corresponding Location Reply (LREP) messages.
"Clustering for Open IVC Networks" [30]: Cluster-head election in COIN is based on vehicle dynamics and driver intentions rather than on ID or relative mobility as in conventional clustering schemes. COIN also accommodates the oscillatory nature of inter-vehicle distances. Ideally, the relative mobility between a cluster head and a member node should be low, so that they stay in radio contact for as long as possible.
"Hierarchical Cluster Based Routing" [31]: HCB is a hierarchical cluster-based routing protocol designed for highly mobile ad hoc networks. HCB uses a two-layer communication scheme. In layer 1, most nodes have a single radio interface and communicate with each other over multi-hop paths; some of these nodes also have a second, long-range radio interface and are called super nodes, which exist in both layer 1 and layer 2. Super nodes can communicate with each other through the base station in layer 2.
"Cluster-Based Location Routing" [32]: This protocol assumes that all vehicles can obtain their positions through GPS. The algorithm divides the network into several clusters. Each cluster has a cluster head and a group of members within the transmission range of the cluster head. The cluster head and members are formed as follows: a new vehicle transmits a hello message; if the vehicle receives a reply from a cluster-head vehicle, the new vehicle becomes a member of that cluster, otherwise the new vehicle becomes a cluster head.

D. Geocast Routing Protocols
Geocast routing is essentially a location-based multicast routing. Its objective is to deliver a packet from the source node to all other nodes within a given region (the Zone of Relevance, ZOR). The various geocast routing protocols are IVG, DG-CASTOR, and DRG.
"Inter-Vehicle Geocast" [33]: IVG is designed for disseminating safety messages to vehicles on highways. The protocol uses a timer-based mechanism for message forwarding, and periodic broadcasts are used to overcome network fragmentation.
"Direction-based Geocast Routing Protocol for query dissemination in VANET" [34]: DG-CastoR is a novel geocast routing protocol tailored for query-dissemination applications in VANETs. It aims to build a virtual community based on the predicted future locations of the mobile nodes in the network; this community is called a rendez-vous cluster, in which the nodes are expected to meet in the near future. The query is disseminated only among the nodes belonging to the same rendez-vous cluster.
"Distributed Robust Geocast" [35]: DRG improves the reliability of message forwarding by defining a Zone of Forwarding (ZOF) that encompasses the region of interest. The Zone of Relevance (ZOR) is the set of geographic criteria a node must satisfy for the geocast message to be relevant to that node, whereas the Zone of Forwarding (ZOF) is the set of geographic criteria a node must satisfy in order to forward a geocast message.
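Since the ZOR and ZOF are simply geographic predicates on a node's position, a toy membership check for rectangular zones (the corner-coordinate representation also used by ROVER below) could look like the following; the zone coordinates are illustrative assumptions.

```python
# Toy geocast zone check: a node accepts a geocast message if it lies inside
# the ZOR, and (re)forwards it if it lies inside the larger ZOF.
def in_zone(position, corner_min, corner_max):
    x, y = position
    return corner_min[0] <= x <= corner_max[0] and corner_min[1] <= y <= corner_max[1]

def handle_geocast(position, zor, zof):
    accept = in_zone(position, *zor)       # message is relevant to this node
    forward = in_zone(position, *zof)      # node participates in forwarding
    return accept, forward

zor = ((0, 0), (100, 50))                  # zone of relevance (illustrative)
zof = ((-20, -20), (120, 70))              # wider zone of forwarding
print(handle_geocast((110, 60), zor, zof)) # (False, True): forwards only
```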
"Robust Vehicular Routing" [36]: ROVER is a reliable geographical multicast protocol in which only control packets are broadcast in the network, while data packets are unicast. The aim of the protocol is to send a message to all other vehicles within a specified Zone of Relevance (ZOR). The ZOR is defined as a rectangle specified by its corner coordinates. A message is defined by the triplet [A, M, Z], indicating the application, the message, and the zone respectively. When a vehicle receives a message, it accepts the message if it is within the ZOR.
"Dynamic Time-Stable Geocast Routing" [37]: The main aim of the DTSG protocol is to work even in sparse networks. It dynamically adjusts the protocol according to the network density and the vehicles' speed for better performance. It defines two phases: a pre-stable period and a stable period. The pre-stable phase helps the message to be disseminated within the region, and during the stable period intermediate nodes use a store-and-forward strategy for a predefined time within the region.

E. Broadcast-Based Routing Protocols
Broadcast routing is commonly used in VANET for sharing traffic, weather, emergency, and road-condition information among vehicles and for delivering announcements and advertisements. The various broadcast routing protocols are BROADCOMM, UMB, V-TRADE, and DV-CAST.
"BROADCOMM" [38]: BROADCOMM is based on a hierarchical structure for a highway network. In BROADCOMM the highway is divided into virtual cells that move along with the vehicles. The nodes on the highway are organized into a two-level hierarchy: the first level includes all the nodes in a cell, and the second level is represented by cell reflectors, which are a few nodes located close to




the geographical centre of the cell. A cell reflector behaves for a certain interval of time as a cluster head and handles the emergency messages coming from members of the same cell or from nearby neighbours.
"Urban Multi-hop Broadcast Protocol" [39]: UMB is designed to overcome interference, packet collisions, and hidden-node problems during message dissemination in multi-hop broadcast. In UMB the sender node tries to select the furthest node in the broadcast direction for forwarding and acknowledging the packet, without any prior topology information. UMB performs with much more success at high packet loads and traffic densities.
"Vector-based Tracking Detection" [40]: V-TRADE is a GPS-based message broadcasting protocol. The basic idea is similar to the unicast Zone Routing Protocol (ZRP): it classifies the neighbours into different forwarding groups depending on position and movement information, and for each group only a small subset of vehicles is selected to retransmit the message.
"Distributed Vehicular Broadcast protocol" [41]: DV-CAST uses local topology information, obtained from periodic hello messages, for broadcasting the information. Each vehicle uses a flag variable to determine whether a packet is redundant or not. This protocol divides the vehicles into three types depending on local connectivity: well connected, sparsely connected, and totally disconnected neighbourhoods.
"Edge-Aware Epidemic Protocol" [42]: EAEP is a reliable, bandwidth-efficient, information-dissemination protocol for highly dynamic VANETs. It reduces control-packet overhead by eliminating the exchange of extra hello packets for message transfer between different groups of vehicles and simplifies cluster maintenance. Each vehicle piggybacks its own geographical position onto broadcast messages, eliminating beacon messages. After receiving a new rebroadcast message, EAEP uses the number of transmissions from nodes in front and nodes behind during a given period of time to compute the probability used to decide whether the node should rebroadcast the message or not.
"Secure Ring Broadcasting" [43]: SRB aims to reduce the number of retransmitted messages and to obtain more stable routes. It classifies nodes into three groups according to their received signal power: Inner Nodes (close to the sending node), Outer Nodes (far away from the sending node), and Secure Ring Nodes (at the preferred distance from the sending node). It restricts rebroadcasting to secure-ring nodes only, reducing the number of retransmissions.

V. Comparison of Routing Protocols
The different protocols are compared on several important parameters and requirements in Table 1 below.

Table 1: Comparison of routing protocols. The protocols are compared on the following parameters: routing mechanism (unicast, broadcast, multicast, geocast, or cluster based), whether a digital map is required, scenario (urban, highway, or both), position verification, clustering, forwarding strategy (multi-hop, greedy forwarding, store-and-forward, or flooding), and control overhead (low, medium, or high). The protocols covered are the topology-based protocols CGSR, DSDV, OLSR, FSR, AODV, DSR, DYMO, TORA, ZRP, and HARP; the position-based protocols GPCR, GPSR, CAR, GSR, A-STAR, CBF, STBR, and B-MFR; the geocast protocols IVG, ROVER, DSTG, DG-CASTOR, and DRG; the cluster-based protocols COIN, CBDRP, CBR, CBLR, HCB, and PGB; and the broadcast protocols DV-CAST, UMB, V-TRADE, EAEP, SRB, PBSB, and BROADCOMM.

VI. Summary
VANET is a self-organizing network that plays a major role in intelligent transport systems (ITS). The main aim of VANET is to provide safety in the vehicular environment and to save lives. In VANET the topology of the network changes quickly, so designing an efficient routing protocol is a very difficult task. Routing is an essential component of VANET communication, and the performance of a routing protocol depends on the movement of vehicles, the driving environment, and many other factors.

VII. Conclusion
In this paper we have found that the GSR routing protocol performs better than the other routing protocols in the highway scenario in VANETs. GSR is a geographic routing protocol; geographic routing assumes that every node can determine its physical position through GPS or some other positioning service. Proactive routing protocols may fail in VANET because of their higher bandwidth use and large table data. Hybrid protocols, i.e. combinations of proactive and reactive routing, work on the inter-zone and intra-zone concept, which increases the routing overhead and therefore degrades performance. For these reasons GSR comes out ahead on all three metrics, i.e. end-to-end delay, throughput, and jitter.

References

1. Bijan Paul, Md. Ibrahim, Md. Abu Naser Bikas, "VANET Routing Protocols: Pros and Cons", International Journal of Computer Applications (0975-8887), Volume 20, No. 3, April 2011.
2. W. Franz, "Inter-Vehicle-Communications Based on Ad Hoc Networking Principles - The Fleet Net", 2005.
3. M. Gerla, X. Hong, G. Pei, "FSR", July 2002.
4. C.E. Perkins, P. Bhagwat, "Highly dynamic Destination-Sequenced Distance-Vector routing (DSDV) for mobile computers", SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications, 1994, pp. 234-244.
5. T. Clausen et al., "Optimized Link State Routing Protocol (OLSR)", Oct. 2003.
6. C.S. Murthy, B.S. Manoj, "Ad Hoc Wireless Networks", Pearson, 2004, pp. 336-338 and 627.
7. S. Murthy, "An Efficient Routing Protocol for Wireless Networks", October 1996.
8. C. Perkins, E. Belding-Royer, S. Das, "AODV Routing", RFC 3561, Network Working Group, 2003.
9. M.K. Marina and S.R. Das, "On-Demand multipath distance vector routing in ad hoc networks", in Proceedings of the 9th IEEE International Conference on Network Protocols (ICNP), 2001, pp. 124-130.
10. V. Naumov, "An evaluation of inter-vehicle ad hoc networks based on realistic vehicular traces", MOBIHOC 2006.
11. D. Johnson, B.D.A. Maltz, Y.C. Hu, "DSR", International Journal of Advance in Science and Technology, 2004, pp. 64-69.
12. V. Park, "Temporally-Ordered Routing Algorithm (TORA) Version 1 Functional Specification", 2001.
13. C. A. T. H. Tee, A. Lee, "ARP in City Environments", in International Journal of
14. Phat Tran and Christer Wibom, "Dynamic MANET On-demand".
15. Z. J. Haas, "The Zone Routing Protocol", Sixth WINLAB Workshop on Third Generation Wireless Information Networks, New Brunswick, NJ, Nov. 1997.
16. Navid Nikaein, Christian Bonnet, Neda Nikaein, "HARP", IEEE INFOCOM 2001.
17. J. Zhao, "VADD: Vehicle-Assisted Data Delivery in Vehicular Ad Hoc Networks", IEEE INFOCOM, 2006.
18. I. Leontiadis, "GeOpps: Opportunistic Geographical Routing for Vehicular Networks", 2007.
19. P.C. Cheng, "GeoDTN+Nav: Geographic DTN routing with navigator prediction for urban vehicular environments", International Journal of Computer Applications, 2003, pp. 23-28.
20. B. Karp, "GPSR: Greedy perimeter stateless routing for wireless networks", 2000.
21. Kevin C. Lee, "Survey of Routing Protocols in Vehicular Ad Hoc Networks".
22. Kevin C. Lee, "Survey of Routing Protocols in Vehicular Ad Hoc Networks", 2009.
23. V. Naumov, "CAR in VANET", International Journal of Advance in Science and Technology, May 2007, pp. 340-346.
24. J. Zhao, "VADD: Vehicle-Assisted Data Delivery in Vehicular Ad Hoc Networks", IEEE INFOCOM, 2006.
25. B.-C. Seet, "A-STAR: A Mobile Ad Hoc Routing Strategy for Metropolis Vehicular Communications", 2004.
26. K. Lee, M. Gerla, "LOUVRE", International Journal of Computer Applications, 2008, pp. 321-324.
27. Moez Jerbi, "GyTAR", International Journal of Computer Applications, September 2006, pp. 32-38.




28. Tao Song, "A Cluster-Based Directional Routing Protocol in VANET", International Journal of Computer Applications, 2009, pp. 376-382.
29. R. A. Santos, "Performance evaluation of routing protocols in vehicular ad hoc networks", International Journal of Ad Hoc and Ubiquitous Computing, 2005, pp. 80-91.
30. J. Blum, "Mobility management in IVC networks", International Journal of Computer Applications, 2003, pp. 214-224.
31. Yang Xia, "Hierarchical Cluster Based Routing for Highly Mobile Heterogeneous MANET", IOSR Journal of Computer Engineering (IOSRJCE), 2007, pp. 1-9.
32. "Using the Cluster-Based Location Routing (CBLR) Algorithm for Exchanging Information on a Motorway", Fourth IEEE Conference on Mobile and Wireless Communications Networks, September 2002, Stockholm, Sweden, Proceedings pp. 212-216, ISBN 0-7803-7606-4.
33. A. Bachir, "A Multicast Protocol in Ad hoc Networks: Inter-Vehicle Geocast", Springer, April 2003, pp. 281-286.
34. Talar Atechian, "DG-CastoR: Direction-based GeoCast Routing Protocol for query dissemination in VANET".
35. H. P. Joshi, "Distributed Robust Geocast Multicast Routing for Inter-Vehicle Communication".
36. M. Kihl, "Reliable Geographical Multicast Routing in Vehicular Ad-hoc Networks", 2007.
37. Hamidreza Rahbar, "Dynamic Time-Stable Geocast Routing in Vehicular Ad Hoc Networks", Master's thesis in applied science, University of Waterloo, 2010.
38. M. Durresi, "Emergency broadcast protocol for inter-vehicle communications", 11th International Conference on Parallel and Distributed Systems - Workshops (ICPADS'05), 2005, pp. 290-298.
39. G. Korkmaz, "Urban multihop broadcast protocol for inter-vehicle communication systems", 2003.
40. M. Sum, "GPS-based message broadcasting for inter-vehicle communication", 2000.
41. O. K. Tonguz, "Broadcasting in VANET", 2007.
42. M. Nekovee, "Reliable and efficient information dissemination in intermittently connected vehicular ad hoc networks", 2007.
43. Rainer Baumann, "Vehicular Ad hoc Networks", Master's Thesis in Computer Science, ETH Zurich, 2004.
44. Adnan Afsar Khan, "Parameterless broadcasting in static to highly mobile wireless ad hoc, sensor and actuator networks", August 1999.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

A novel clustering approach to select optimal usability principles for educational websites
Prafulla Bharat Bafna
Symbiosis Institute of Computer Studies and Research (SICSR), Symbiosis International University (SIU), Atur Centre, Gokhale Cross Road, Model Colony, Pune - 411 016, Maharashtra, INDIA.
_________________________________________________________________________________
Abstract: Usability is the quality measure of a user's interaction with a product; when the product is a website, it is termed web usability. It is the quality attribute that assesses how easy user interfaces are to use, and the user interface contributes to improving the design process. Web usability is an approach to making a website easy to use for an end user without requiring any specialized training. Website design is the key issue, because how the end user experiences the design determines acceptance. On the Web, usability is a necessary condition for survival: if a website is difficult to use, users will leave the site and its usability will be zero. Several usability principles are mentioned in the literature, but all of them depend on the application domain, which means that usability principles for commercial websites may not be applicable to educational websites. A clustering-based technique is proposed that extracts the major usability principles from among many, using forward feature selection and experts' knowledge; it reduces time, effort, and resources. Cluster quality is used as decision support for choosing the optimal usability principles, and clustering on the complete data set provides effective decision support. The data set used relates to the top 35 educational websites with respect to their usability count.
Keywords: Usability, scale, optimal, entropy, purity, SMSE, clustering
________________________________________________________________________________________
I. Introduction
With advances in Internet technology, website design has become a critical issue. The main objective of a website is to achieve the goals set by the application domain. A user can become frustrated and may give up if too much effort is needed to accomplish a particular task, and a confusing website can affect productivity. Usability principles are application specific [1]. To obtain optimal principles for educational websites, a clustering-based approach is proposed in which cluster accuracy is used as an indicator to select the set of features. The k-means and hierarchical algorithms are used to confirm the optimal principles; the selection criteria are entropy, purity, and Sum of Mean Square Error (SMSE). Considering usability guidelines, the top 35 college websites are analysed with respect to their usability count [1]. A questionnaire was designed based on website usability guidelines and distributed to users. Each objective question had 4 options, and each option is associated with a scale (1-4); the question weight is calculated according to the selected option. For each website 35 parameters are considered and a distance matrix is formed, in which websites represent rows and the scale assigned to each usability principle represents a column. The hierarchical algorithm is applied to different combinations of features, guided by expert advice and the forward feature selection method [7]. The combination with minimum entropy, maximum purity, and minimum SMSE is selected. Finally, 18 usability principles were selected from the 35; these principles can be considered the ones to focus on when designing an educational website.
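A minimal sketch of the forward feature selection loop described above is shown below, using scikit-learn's KMeans and the within-cluster sum of squares (inertia) as a stand-in for the SMSE criterion. The matrix shape (35 websites by 35 principle scores on a 1-4 scale) follows the paper, but the random data, the number of clusters, and the stopping rule are illustrative assumptions.

```python
# Forward feature selection over usability principles, scored by the
# within-cluster sum of squares (used here as a stand-in for SMSE).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scores = rng.integers(1, 5, size=(35, 35))   # 35 websites x 35 principles, scale 1-4

def cluster_cost(data, k=3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    return model.inertia_                    # lower is better

def forward_select(data, target_count=18):
    selected, remaining = [], list(range(data.shape[1]))
    while len(selected) < target_count:
        best_feature, best_cost = None, None
        for f in remaining:                  # try adding each unused principle
            cost = cluster_cost(data[:, selected + [f]])
            if best_cost is None or cost < best_cost:
                best_feature, best_cost = f, cost
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected

print(forward_select(scores))
```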
II. Background
The usability process involves design together with its evaluation and follows the basic steps of requirement analysis, conceptual design, prototyping, production, launch, and maintenance [2]. The home page should be created in such a way that it makes a positive first impression of the website. The important issues while designing websites are navigation, graphics, screen-based controls, page layout, etc. [3]. The benefits of planning usability into a website are that higher satisfaction leads to productivity, successful task completion builds brand loyalty, and a higher rate of repeat users helps in competition. The challenging task lies in extracting useful information from a large collection of data, whether from a data warehouse or from a database [3, 8]. Collected data generally contains irrelevant or redundant attributes, and classification and clustering do not give accurate results when attributes are interdependent, so correct feature selection is a fundamental data-preprocessing step in data mining. The FeatureMine algorithm combines sequence mining and classification algorithms and efficiently handles very large data sets with thousands of items and millions of records [4]. Edie Rasmussen states that cluster analysis is a technique which assigns items to groups based on a calculation of the degree of association between items and groups. Cluster analysis can be hierarchical, producing a nested data set in which pairs of items or clusters are connected successively.




However, hierarchical methods are better suited to information retrieval. The commonly used hierarchical methods, such as single link, complete link, group-average link, and Ward's method, have high space and time requirements, so clustering large data sets with high dimensionality needs better algorithms; examples are the minimal spanning tree algorithms for the single-link method, the Voorhees algorithm for group-average link, and the reciprocal nearest neighbour algorithm for Ward's method. Rasmussen lists the steps of clustering as selecting the attributes on which items are to be clustered, selecting an appropriate clustering method, creating the clusters or cluster hierarchies, and interpreting and validating the results [5]. Liu and Yu focus on feature selection algorithms for classification (with known class labels) and for clustering (unsupervised feature selection, where data is unlabelled); feature selection algorithms designed with different evaluation criteria broadly fall into three categories: the filter model, the wrapper model, and the hybrid model [6].

III. Data Collection
Data is collected from 35 different websites having the topmost usability count [1], and 35 usability principles are considered for each website. Each principle is converted into an objective question, and a scale is assigned to each question [1]. A k-means plot [5] was obtained for the resulting matrix, with consistency on the X-axis and feedback message on the Y-axis. The hierarchical algorithm [5] was applied to the entire data and the dendrograms obtained are presented in Graphs A and B.

IV. Selection of Optimal Usability Principles
The k-means and hierarchical algorithms are applied to the domain-specific enhanced principles obtained by the forward feature selection approach [7]. Graphs A and B present the dendrograms produced by the hierarchical algorithm using Euclidean distance for 25 websites with 35 and 18 usability principles respectively, and Graphs C and D present the corresponding k-means plots for 25 websites with 35 and 18 usability principles respectively. The optimal set of guidelines can easily be identified from Table A as the one with minimum SMSE and entropy together with maximum purity. From the 35 standard website principles, the principles selected are navigation, error-free operation, flexibility, homepage, menu, feedback, memory load, graphs, notice board, category label, validation, scrolling, group label, grouping, response time, unique actions, parallel actions, and focus.

Table A: Optimal website principles

No of Principles    SMSE (sum mean square error)    Entropy    Purity
10                  15.67                           0.29       0.76
14                  20                              0.35       0.78
18                  8.26                            0.19       0.88
24                  25                              0.38       0.79
30                  12                              0.24       0.74
35                  >22                             0.42       0.69
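For reference, the entropy and purity criteria in Table A can be computed from the distribution of reference labels (here, an assumed expert grouping of the websites) within each cluster, as in the generic sketch below; the assignments and labels shown are illustrative and do not reproduce the paper's values.

```python
# Cluster entropy and purity against reference (expert) labels: purity rewards
# clusters dominated by one label, entropy penalizes mixed clusters.
import math
from collections import Counter

def entropy_and_purity(cluster_ids, labels):
    n = len(labels)
    total_entropy, total_purity = 0.0, 0.0
    for c in set(cluster_ids):
        members = [labels[i] for i in range(n) if cluster_ids[i] == c]
        counts = Counter(members)
        size = len(members)
        h = -sum((v / size) * math.log2(v / size) for v in counts.values())
        total_entropy += (size / n) * h
        total_purity += counts.most_common(1)[0][1] / n
    return total_entropy, total_purity

clusters = [0, 0, 0, 1, 1, 2, 2, 2]          # illustrative assignments
expert = ["a", "a", "b", "b", "b", "c", "c", "c"]
print(entropy_and_purity(clusters, expert))
```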

Graph A: Dendrogram for 25X35


Graph B: Dendrogram for 25X18




Graph C: Kmeans plot for 25X35

Graph D: Kmeans plot for 25X18

V. Conclusion
The overall quality of a website is actually the sum of many quality attributes, an important one of which is usability. With usability as a quality goal, there are several usability principles explored in the literature, but these principles are not domain specific, and applying all of them at once is cumbersome. The above approach of selecting optimal usability principles for educational websites increases productivity and saves time, cost, and effort.

References

[1]. Prafulla Bafna, Shailaja Shirwaikar, Human Computer Interaction - paradigms, process, practices, Advances in Computer Vision and Information Technology, part 20, pp. 989-998, 2009.
[2]. Nielsen, Iterative User Interface Design, IEEE Computer, Vol. 26, No. 11, November 1993.
[3]. Alan Dix, 'Human Computer Interaction', Prentice-Hall, Microsoft Corporation, 2004.
[4]. Lesh, N. (MERL), Zaki, M.J., Scalable feature mining for sequential data, Intelligent Systems and their Applications, IEEE, Volume 15, Issue 2, 2000.
[5]. Chapter 16: Clustering Algorithms, Edie Rasmussen, University of Pittsburgh.
[6]. Huan Liu and Lei Yu, Toward Integrating Feature Selection Algorithms for Classification and Clustering, IEEE Transactions on Knowledge and Data Engineering, v.5 n.6, pp. 914-925, 2004.
[7]. Prafulla Bafna, Pravin Metkewar and Shailaja Shirwaikar, Novel Clustering Approach for Feature Selection, American International Journal of Research in Science, Technology, Engineering & Mathematics (ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-362), September-November 2014, issue 8, volume
[8]. Prafulla Bafna et al., Comparative Analysis of Apriori and Improved Apriori Algorithm, International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), Issue 7, Volume 2, pp. 135-143, 2014.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

NEURAL NETWORK BASED SOFTWARE RELIABILITY
Gaurav Aggarwal1, Dr. V.K. Gupta2
1Research Scholar, Jagannath University, Jaipur, Rajasthan, INDIA.
2Assistant Professor, Rajasthan University, Jaipur, Rajasthan, INDIA.

Abstract: Software reliability is a major concern due to the growing nature of the industry. The reliability check is normally performed after the coding phase, and a failed check leads to re-implementation of the software. This paper designs a software reliability model that works in two phases. The first phase of the model is completed after the design phase and before coding; in this phase the design is checked against the requirements. The second phase of the model is placed after the implementation phase. The model is analysed and its results tend towards optimization.
Keywords: Neural Network, Software Reliability, Error Back Propagation, Coding Phase

I. Introduction
Software reliability is about defining the stability, or the life, of a software system in terms of different properties. These properties include the trustworthiness of the software system, software cost, execution time, software stability, and so on. The related aspects include the probability of software faults, the frequency of fault occurrence, the criticality of a fault, and the module associated with each fault. In a software development process, a pre-estimation of software reliability is required before delivering the software product; according to the required level of software quality, the software cost and development time are also estimated. There are a number of quality measures that attest to software reliability [1]. Each stage of the software life cycle itself takes some time to deal with software reliability; the higher the software quality, the lower the software maintenance effort. Software reliability growth models are models that try to predict software reliability from test data [2]. These models try to show a relationship between fault-detection data (i.e. test data) and known mathematical functions such as logarithmic or exponential functions; the goodness of fit of these models depends on the degree of correlation between the test data and the mathematical function [3].

II. Existing Software Reliability Model
The authors of [4] proposed a software reliability growth model on the basis of existing models such as the logistic growth curve model. The model is designed on the basis of the tangent function, which must be drawn in the positive axis; it varies from 0 to infinity, similar to software reliability. Software reliability is inversely proportional to fault detection: as the number of detected faults decreases, reliability increases; zero fault detection means infinite reliability, and zero software reliability means infinite faults. The proposed model thus matches the behaviour of software reliability. At the beginning of testing there is an unknown but fixed number of faults in the software code; all faults are of the same type, and each fault can be detected independently of the others. The remaining number of faults and the remaining time are used to determine the other parameters. The probability of occurrence of each fault is the same, and each fault that occurs can be removed instantaneously. The mean value function can be given as in (1), and the failure intensity can be expressed as in (2); according to (2), the failure intensity of the software at time t is proportional to the expected number of faults remaining in the software.
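Equations (1) and (2) themselves are not reproduced in the text above. As a hedged illustration only, the stated proportionality corresponds, in the generic NHPP formulation, to the following form, where a is the total number of faults and phi is a positive constant; this is the generic exponential form, not necessarily the tangent-based model of [4]:

```latex
% Failure intensity proportional to the expected number of remaining faults.
\lambda(t) = \frac{dm(t)}{dt} = \phi\,\bigl(a - m(t)\bigr),
\qquad m(0)=0 \;\Rightarrow\; m(t) = a\bigl(1 - e^{-\phi t}\bigr)
```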
III. Proposed Model
In the existing software reliability models, the failure check is performed after the coding and implementation phases. Faulty designs can therefore require reimplementation of the project, which wastes resources and time. The proposed software reliability model works in two phases. The first phase of the model is completed after the design phase and before coding; in this phase the design is checked against the requirements, using the error back propagation algorithm of a neural network. The second phase of the model


is placed after the implementation phase. This model uses the mean time to failure and the failure intensity to increase reliability. The detailed working of the model can be understood from the block diagram in Figure 1.

[Figure 1 block diagram: Requirement → Requirement Analysis → Neural Network (design issues) → Design → Design Analysis → Error → Update design → Coding and implementation → Testing → Reliability]

Figure 1: Block Diagram of the Software Reliability Model.

A. Methodology
The methodology is completed in two phases: a training phase and a testing phase. A sketch of the kind of network used in steps 1-7 follows the listing below.
Training Phase:
1. Input the requirement analysis.
2. Input the design issues corresponding to the requirement analysis.
3. Train the network (calculate the weight matrix) using a threshold activation function.
Testing Phase:
4. Input the requirement.
5. Analyze the requirement.
6. The requirement analysis is given as input to the neural network.
7. The neural network processes the requirement analysis and provides the corresponding design issues.
8. The design issues are checked against the design.
9. If any error occurs, the design is updated; go to step 8.
10. Perform coding.
11. Obtain the mean time to failure.
12. Calculate the failure intensity.
13. Remove failures.
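The paper gives no source code for the network; the following is a minimal sketch, under stated assumptions, of a single-layer network with a threshold (step) activation of the kind steps 1-7 describe. The binary encoding of requirement-analysis items and design issues, the layer size, and the learning rate are illustrative assumptions, and the perceptron-style update is used here as the single-layer analogue of the error back propagation named above; it is not the authors' implementation.

```python
import numpy as np

def step(x):
    """Threshold (step) activation: 1 if the weighted sum is positive, else 0."""
    return (x > 0).astype(float)

def train(requirements, design_issues, epochs=100, lr=0.1):
    """Training phase: learn a weight matrix mapping requirement-analysis vectors
    to design-issue vectors (perceptron rule, used here as an illustrative stand-in)."""
    n_in, n_out = requirements.shape[1], design_issues.shape[1]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    for _ in range(epochs):
        for x, t in zip(requirements, design_issues):
            y = step(x @ W + b)
            W += lr * np.outer(x, t - y)   # adjust weights by the output error
            b += lr * (t - y)
    return W, b

def predict_design_issues(W, b, requirement_vector):
    """Testing phase: given an encoded requirement analysis, return the predicted
    design-issue vector to be checked against the actual design."""
    return step(requirement_vector @ W + b)

# Toy illustration only: 4 hypothetical requirement features -> 3 possible design issues.
X = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]], dtype=float)
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
W, b = train(X, Y)
print(predict_design_issues(W, b, np.array([1, 0, 1, 0], dtype=float)))
```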

IV. Implementation
The proposed methodology is analyzed in two ways. In the first, library software is built from the initial phase and the complete methodology is applied to the software to achieve high reliability; the software is built for the Vaish College of Engineering, Rohtak, Haryana. In the second, the proposed neural network based methodology is analyzed on the datasets downloaded from [5]. These datasets record defects in five modules of NASA products; the products under analysis are JM1, PC1, KM1, KC1, and KC2. The variables in each dataset are static code measures used as prediction variables. The subsets in the dataset


are prepared by classifying the set on the basis of module size, which results in higher prediction performance. Prediction quality is better on class-level data than on method-level data, and defect prediction is more accurate for large modules than for small ones. The present work uses 60% of each dataset for training and the rest for testing. The target of this work on these datasets is to assess software reliability by finding the defects accurately; the resulting RMSE values are listed in the table below.

Dataset             RMSE (Training Data)    RMSE (Testing Data)
JM1                 0.65                    0.64
PC1                 0.63                    0.61
KM1                 0.64                    0.63
KC1                 0.63                    0.61
KC2                 0.61                    0.60
Library Software    0.48                    0.18

The graphical representation of the above values is shown below.
[Chart: training and testing RMSE for the JM1, PC1, KM1, KC1, KC2, and Library Software datasets.]

The reduction in error confirms the better performance of the model.

V. Conclusion
This paper introduces a software reliability model that performs the reliability check both before and after the implementation phase. The full model is suitable only if the software is being built from scratch; otherwise only the second phase of the model is applicable, and even then its performance is better than that of other software reliability growth curve models. In future, the model can be analyzed over larger software under development.

References
[1] Garima Chawla et al., "A Fault Analysis based Model for Software Reliability Estimation," International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Vol. 2, Issue 3, July 2013.
[2] Rita G. Al gargoor et al., "Software Reliability Prediction Using Artificial Techniques," IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 4, No. 2, July 2013, ISSN (Print): 1694-0814, ISSN (Online): 1694-0784, www.IJCSI.org.
[3] Al-Rahamneh Z., Reyalat M., Sheta A. F., Bani-Ahmad S., and Al-Oqeili S., "A New Software Reliability Growth Model: Genetic-Programming-Based Approach," Journal of Software Engineering and Applications, 2011, pp. 476-481.
[4] Gaurav Aggarwal, V. K. Gupta, "Software Reliability Growth Model," International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 1, January 2014, ISSN 2277 128X.
[5] http://promise.site.uottawa.ca/SERepository/datasets




GIS Based Automated Drainage Extraction for the Analysis of Basin Morphometry in Vaniyar Sub-basin, South India
1Ebenezer Sahayam Samuel A., 2Dr. Sorna Chandra Devadass
1,2School of Civil Engineering, Karunya University, Coimbatore, Tamilnadu, INDIA.

Abstract: In this paper, automated extraction of drainage from SRTM data is attempted in order to delineate the basin morphometry using ArcGIS techniques. The automated extraction tool creates the possible drainage pattern in the study area. After extraction, the drainage was used to analyze morphometric parameters such as the stream network (Strahler's ordering) and the basin boundary. The technique is very useful for those who work in terrain analysis, hydrology, and watershed analysis, as a reliable database for morphometric analysis can be generated with a single click.
Keywords: Morphometry, GIS, SRTM, Drainage, Hydrology

I. Introduction
The analysis of a drainage basin is one of the most important components of water resource studies, as it provides valuable information for the quantitative description of the drainage system. This quantitative analysis is an important aspect of the characterization of a basin [12]. Morphometric analysis requires the measurement of linear features, areal aspects, the gradient of the channel network, and the contributing ground slopes of the drainage basin [10]. The drainage characteristics of many river basins and sub-basins around the globe have been studied using conventional methods [4, 11, 12, 7, 6]. Traditional methods such as field observation and topographic maps are commonly used for identifying the drainage networks in a basin; alternatively, advanced methods such as remote sensing and feature extraction from digital elevation models are adopted [13, 9, 8, 5]. Analyzing all drainage networks from field observation is a tedious task because of the vast extent of rough terrain, so DEMs can be used to extract the drainage networks with the help of GIS techniques [3]. In this study an automated extraction model is developed that extracts the drainage networks from SRTM data using ArcGIS software. The SRTM data are used as the input to the hydrology tool for basin delineation, together with other supporting data for morphometric analysis. The developed tool was applied to the Vaniyar River basin for validation, and the data thus generated were found reliable for further morphometric analysis.

II. Study Area
The Vaniyar Sub-basin of the Ponniyar River, Tamilnadu, has been selected for the present study. The study area lies between latitudes 11°46' N to 12°09'39" N and longitudes 78°12'27" E to 78°36'65" E, covering an area of 982.25 km2, out of which plain land covers 591.43 km2 (Figure 1). The study area falls in the Salem and Dharmapuri districts of Tamil Nadu. The base map was prepared from toposheets nos. 57L/4, 8, 58, I/1, and 5 at 1:50,000 scale. The ephemeral stream Vaniyar has its source along the northern slopes of the Shervorayan hills, originating at Kombur; it takes a northeasterly course along the valley and emerges as the main artery of Dharmapuri district, with a northeast gradient, while a small portion of the catchment area falls in Salem district.


Figure 1: Location map of the study area.

III. Methodology
Figure 2: Flow chart showing the methodology. [Flowchart: ArcGIS software (Hydrology tool) — Input (SRTM data) → Fill → Flow Direction → Flow Accumulation → Stream Order.]


IV. Result and Discussion
The present study explores the hydrology tool and describes the process of extracting basin morphometry. The hydrologic tools allow the user to identify sinks, determine flow direction, calculate flow accumulation, delineate watersheds, and create stream networks. Using a digital elevation model (DEM) as input, it is possible to automatically delineate a drainage system and quantify its characteristics. The following steps are involved in calculating a stream network from a DEM.
A. Fill
Using the hydrology tools in ArcGIS, any sinks in the original DEM are identified. A sink is usually an incorrect value lower than the values of its surroundings. Such depressions are problematic because any water that flows into them cannot flow out. In order to ensure proper drainage mapping, these depressions can be filled using the Fill tool (Figure 3).
B. Flow Direction
Using the filled DEM as input to the Flow Direction tool, the direction in which water would flow out of each cell is determined (Figure 4).
C. Flow Accumulation
To create a stream network, the Flow Accumulation tool is used to calculate the number of upslope cells flowing to each location. The output of the Flow Direction tool from above is used as the input (Figure 5).
D. Stream Order
A threshold can be specified on the raster derived from the Flow Accumulation tool; this is the initial stage of defining the stream network. It can be accomplished with the Con tool or using Map Algebra, for example: new raster = Con(accum > 100, 1). All cells with more than 100 cells flowing into them then form part of the stream network. The Stream Order tool is applied to assign an order to each segment in the network; the Strahler technique is used for ordering. In the study area, 1st, 2nd, 3rd, 4th, and 5th order streams are present in the Vaniyar basin (Figure 6).
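Since the steps above map one-to-one onto ArcGIS Spatial Analyst tools, a minimal arcpy sketch of the sequence is shown below. It assumes a licensed ArcGIS installation; the workspace path, the DEM file name srtm_dem.tif, and the 100-cell threshold are illustrative values, not those used in the study.

```python
# Sketch of the Fill -> Flow Direction -> Flow Accumulation -> Con -> Stream Order
# sequence described above, using the ArcGIS Spatial Analyst (arcpy) tools.
import arcpy
from arcpy.sa import Fill, FlowDirection, FlowAccumulation, Con, StreamOrder

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\vaniyar"          # hypothetical workspace

dem      = "srtm_dem.tif"                    # SRTM DEM of the study area (assumed file name)
filled   = Fill(dem)                         # remove sinks/depressions
flow_dir = FlowDirection(filled)             # direction of flow out of each cell
flow_acc = FlowAccumulation(flow_dir)        # number of upslope cells per cell
streams  = Con(flow_acc > 100, 1)            # cells with >100 contributing cells form the network
orders   = StreamOrder(streams, flow_dir, "STRAHLER")  # Strahler stream ordering

orders.save("stream_order.tif")
```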


V. Conclusion
In this paper, an attempt has been made to study automated drainage network extraction and its analysis using SRTM data as input. The detailed drainage information is extracted through the Hydrology tool in ArcGIS. Streams of 1st, 2nd, 3rd, 4th, and 5th order in the Vaniyar river basin are covered in the present study. A reliable database for morphometric analysis can be generated with a single click, and the technique is very useful for those who work in terrain analysis, hydrology, and watershed analysis.

References
[1] Amsterdam, Oxford, New York. 1983.
[2] Strahler, A.N. (1952). "Dynamics basis of geomorphology". Bull. Geol. Soc. Am., Vol. 63, pp. 923-938.
[3] Gurugnanam, B. and Kalaivanan, K. (2014). "A GIS Based Automated Extraction Tool for the Analysis of Basin Morphometry in Kolli Hill, Tamil Nadu, India". Indian Journal of Applied Research, Vol. 4, Issue 9, pp. 247-248.
[4] Horton, R.E. (1945). "Erosional development of streams and their drainage basins: hydrophysical approach to quantitative morphology". Geol. Soc. Am. Bull., Vol. 56, pp. 275-370.
[5] Kalaivanan, K., Gurugnanam, B. and Suresh, M. (2014). "GIS Based Morphometric Analysis of Gadilam River Basin, Tamil Nadu, India". International Journal of Advanced Research, Vol. 2, Issue 7, pp. 1015-1022.
[6] Krishnamurthy, J., Srinivas, G., Jayaram, V. and Chandrasekar, M.G. (1996). "Influence of rock types and structures in the development of drainage networks in typical hard rock terrain". ITC Journal, Vol. 3, No. 4, pp. 252-259.
[7] Leopold, L.B. and Miller, J.P. (1956). "Ephemeral streams: hydraulic factors and their relation to the drainage network". U.S. Geological Survey, Professional Paper 282-A, p. 38.
[8] Magesh, N.S., Chandrasekar, N. and Soundranayagam, J.P. (2011). "Morphometric evaluation of Papanasam and Manimuthar watersheds, parts of Western Ghats, Tirunelveli district, Tamil Nadu, India: a GIS approach". Environ. Earth Sci., Vol. 64, No. 2, pp. 373-381.
[9] Maidment, D.R. (2002). "Arc Hydro GIS for water resources". Esri Press, California.
[10] Nautiyal, M.D. (1994). "Morphometric analysis of drainage basin using aerial photographs: a case study of Khairkuli basin, District
[11] Strahler, A.N. (1957). "Quantitative analysis of watershed geomorphology". Trans. Am. Geophys. Union, Vol. 38, pp. 913-920.
[12] Strahler, A.N. (1964). "Quantitative geomorphology of drainage basins and channel networks". In: Ven Te Chow (ed.), Handbook of Applied Hydrology. McGraw Hill Book Company, New York.
[13] Verstappen, H. (1983). "The applied geomorphology". International Institute for Aerial Survey and Earth Science (ITC), Enschede, The Netherlands.


Visualizing and Analyzing Industrial Samples Using Non-Destructive Testing
Snigdha S. Parthan1, Aditi Deodhar1, Pranoti Nage1, Rupali Deshmukh2
1Student, 2Asst. Professor
1,2Department of Computer Engineering, Fr. C. Rodrigues Institute of Technology, Vashi, Navi Mumbai-706, INDIA
Abstract: Non-destructive testing (NDT) is used in industry to check the properties of a material, internal flaws, etc. without cutting open the samples. Digital Radiography (DR), an NDT method, is a form of X-ray imaging in which digital X-ray sensors are used. Advantages of DR include time efficiency and the ability to digitally enhance and transfer images. In the proposed system, solid industrial objects such as computer chips, reactor parts, etc. are considered. The proposed software converts these radiographs into tomographic images (virtual slices), as done in Computed Tomography (CT). CT is another powerful non-destructive evaluation technique for producing 2D and 3D cross-sectional images of an object from X-ray images, and it is widely used in the medical and industrial sectors. As the slices of the object can be viewed using the software, internal flaws, defects, and the overall product can be observed. 3-D models of sample objects are reconstructed from the set of X-ray frames.
Keywords: Non-Destructive Testing; Digital Radiography (DR); Computed Tomography (CT); 3D analysis

I. Introduction
Non-destructive means without destroying or breaking. Non-destructive methods and techniques are widely used today in a great variety of fields. Several types of non-destructive activities are in use, including NDT (Non-Destructive Testing), NDE (Non-Destructive Evaluation/Examination), and NDI (Non-Destructive Inspection). NDT is a wide group of analysis techniques used in science and industry to evaluate the properties of a material, component, or system without causing damage. With this technique, complex internal parts can be precisely measured without destructive testing, and inspection and analysis costs are reduced. These methods can be performed on metals, plastics, ceramics, composites, cermets, and coatings in order to detect cracks, internal voids, surface cavities, delamination, incomplete or defective welds, and any type of flaw that could lead to premature failure. Non-destructive Evaluation (NDE) is an interdisciplinary field of study concerned with the development of analysis techniques and measurement technologies for the quantitative characterization of materials, tissues, and structures by non-destructive means. Non-destructive Inspection (NDI) is the examination of an object or material with technology that does not affect its future usefulness. NDI can be used without destroying or damaging a product or material. Because it allows inspection without interfering with a product's final use, NDI provides an excellent balance between quality control and cost-effectiveness. The term "NDI" covers many methods that can detect internal or external imperfections; determine structure, composition, or material properties; and measure geometric characteristics. Popular non-destructive testing methods include vibration analysis, infrared thermography, acoustic emission analysis, Digital Radiography (DR), X-ray Computed Tomography (CT), Ground Penetrating Radar (GPR), eddy current imaging, magneto-inductive cable testing, and optical imaging. This paper focuses on DR and CT, two powerful non-destructive testing methods.
Computed Tomography
Industrial CT has its main applications in specific examinations for flaw detection, failure analysis, dimensional measurement of inaccessible geometric features, inspection of assemblies, and statistical investigation of material properties such as density distribution [1]. Today the most important application of CT has become scanning for 3D-digitizing purposes, which includes point cloud generation for first-article inspection procedures, reverse engineering of a motorcycle engine cylinder, reverse engineering of a cylinder head, etc. Computed tomography is excellent for generating 3D data of complex cast parts. Aluminium, the most widely used material in engine production, can easily be penetrated up to about 300 mm diameter at a resolution in the range of 0.2 mm. The CT process itself may now be described. X-ray slice data are generated using an X-ray source that rotates around the object; X-ray sensors are positioned on the opposite side of the circle from the X-ray source. The earliest sensors were scintillation detectors with photomultiplier tubes excited by (typically cesium) iodide crystals; cesium iodide was replaced during the 1980s by ion chambers containing high-pressure xenon gas. These systems were


in turn replaced by scintillation systems based on photodiodes instead of photomultipliers, using modern scintillation materials (for example, rare earth garnet or rare earth oxide ceramics) with more desirable characteristics. Initial machines would rotate the X-ray source and detectors around a stationary object [2]. Following a complete rotation, the object would be moved along its axis and the next rotation started. Newer machines permitted continuous rotation, with the object to be imaged slid slowly and smoothly through the X-ray ring; these are called helical or spiral CT machines. A subsequent development of helical CT was multi-slice (or multi-detector) CT: instead of a single row of detectors, multiple rows of detectors are used, effectively capturing multiple cross-sections simultaneously. Systems with a very large number of detector rows, such that the z-axis coverage is comparable to the xy-axis coverage, are often termed cone beam CT because of the shape of the X-ray beam (strictly, the beam is pyramidal rather than conical). In conventional CT machines, an X-ray tube and detector are physically rotated behind a circular shroud [2]. An alternative, short-lived design, known as electron beam tomography (EBT), used electromagnetic deflection of an electron beam within a very large conical X-ray tube and a stationary array of detectors to achieve very high temporal resolution for imaging rapidly moving structures. CT has several advantages over traditional 2D radiography. First, it completely eliminates the superimposition of images of structures outside the area of interest. Second, because of the inherent high-contrast resolution of CT, complex internal parts can be precisely measured without destructive testing, thereby reducing inspection and analysis costs. Also, development cost is reduced by working against the CAD model, and product quality is improved. The improved resolution of CT has permitted the development of new investigations that may have advantages compared to conventional radiography. The drawbacks of CT are that it is exorbitantly expensive and that it is regarded as a moderate- to high-radiation technique, delivering a high dose of radiation. A CT scan should never be performed on a pregnant woman or on diabetic patients because it damages human tissue; it can damage body cells, including DNA molecules, which can lead to cancer, and the younger the age, the greater the risk of being affected. As for applications, CT is an analysis and inspection technique used in assembly checks, part comparison, void, crack, and defect detection, and geometric dimensioning and tolerance analysis. Although most common in medicine, CT is also used in other fields such as non-destructive materials testing; another example is archaeological uses such as imaging the contents of sarcophagi.

II. Existing System
3D CT reconstruction models have been directly compared to CAD models (Julien Noel, North Star Imaging Inc., Dec 2008) [4] and to other CT models in order to display differences or similarities in measurements, densities, etc. The objective was to measure the part, inspect its internal integrity, and dimensionally compare the actual manufactured casting to the originally designed CAD model. For reconstruction, 2D X-ray images were converted into a 3D voxel volume model. The 3D CT reconstruction, made of several million or billion voxels, could also be transformed into a surface model. The resolution of the 3D model depends on the number of voxels generated from the CT reconstruction.
A threshold value of radiodensity is chosen by the operator and then set using edge-detection image processing algorithms. From this, a 3D model can be constructed and displayed on screen. The problem here is that the reconstruction assumes that the X-ray attenuation within each voxel is homogeneous, which may not be the case at sharp edges. Reverse engineering is yet another technology used for digitizing and reconstructing products. Alexander Flisch et al. [5] used an X-ray source, a detector system, an object positioning system, and a computer system. Initially, Computer Aided Design (CAD) models of a sample object were made; these were then compared with the original object and scaled accordingly. Finally, a 3D model of the object was created which can be worked upon. This involves the processes of data acquisition, determining the threshold after segmentation, reducing the amount of data, point cloud generation, and reverse engineering. The object used was a motorcycle engine. A disadvantage of this system is that computations give a better result if they are based on a real physical model rather than on data gathered from theoretical CAD models. Further, there are technologies like micro computed tomography and nano computed tomography. These are like traditional CT with some improvements, allowing the structure of samples to be investigated with high spatial resolution [6]. They are used to make tomographic scans of a variety of objects, from biological to geological samples, and high-quality detectors are used. Their disadvantages are that they are exorbitantly costly and that a resolution of the order of 10^-9 m may not always be needed; also, the technology is not yet widely accepted and popular, and since it is at an initial stage the systems are roughly designed.

III. Proposed System
The proposed system focuses on the development of software for the visualization and analysis of three-dimensional (3-D) scientific data. Software development includes a Graphical User Interface (GUI) tool for easy and interactive visualization and analysis of CT data of different industrial objects such as computer chips, reactor parts, etc. The software is expected to provide two different types of visualization:
• One is to navigate the image stack in three different orientations, with or without interpolation.
• The other is to render it as a volume in order to help users understand the 3-D structure inside objects.


Figure 1 Data acquisition process

The entire process, shown in Fig. 2, has been divided into five broad steps. The first is data acquisition, shown in Fig. 1, in which two-dimensional digital radiographs are acquired. Depending on the resolution required, the number of radiographs needed is decided; for instance, 360 radiographs may be acquired (one for every degree) for each industrial part that needs to be reconstructed and analysed. The next step is the stacking of 2D images: the acquired digital radiographs are stacked together. Next comes the reconstruction software. In this stage, the stacked DRs are converted into computed tomographs; the CTs obtained are essentially the cross-sectional slices of the sample part. These CTs are sensitive to noise and are processed in later stages. After this comes the stacking of 2D CT slices; as in the second step, stacking is performed, only here it is the two-dimensional CT slices obtained in step 3 that are stacked. Finally comes the data analysis software. Using the data analysis software, the reconstructed industrial part can be rotated in three-dimensional space. The three-dimensional phantom is generated using linear interpolation, noise removal is performed with the help of filters, and 3D rendering generates slices of the inner areas of the part. Using this 3D model, measurements of the object can be taken for analysis, such as the ratio of individual elements of the object, for research purposes, etc. For the third step, developing the reconstruction software, an analytic algorithm is used. This method is based on exact mathematical solutions to the imaging equations and hence is faster; to implement these solutions it is once again necessary to limit the spatial resolution of the image.
Figure 2: Block diagram of the entire system
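As a small illustration of the stacking and three-orientation navigation mentioned above (and in the proposed system), the CT slices can be held in a single 3-D array. The sketch below is a minimal NumPy version with placeholder data, since the actual slice loading depends on the acquisition format.

```python
import numpy as np

# Stack the reconstructed 2-D CT slices into a 3-D volume (z, y, x),
# then pull out one slice in each of the three orientations for navigation.
slices = [np.random.rand(256, 256) for _ in range(180)]   # placeholder for loaded CT slices
volume = np.stack(slices, axis=0)                          # shape (180, 256, 256)

axial    = volume[90, :, :]    # slice across the stacking axis
coronal  = volume[:, 128, :]   # slice along the second axis
sagittal = volume[:, :, 128]   # slice along the third axis
print(axial.shape, coronal.shape, sagittal.shape)
```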

Analytical reconstruction methods are divided into two types: 1. the two-dimensional Fourier reconstruction method, and 2. the filtered back projection method. Of these two techniques, the reconstruction software will be developed using the filtered back projection method, as it gives better performance and its performance is not degraded by other filtering techniques; it has also been shown that this technique is the one most commonly used for X-ray image processing.
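One hedged way to prototype the filtered back projection step is the radon/iradon pair in scikit-image; the sketch below simulates a sinogram from a standard test slice and reconstructs it. The phantom, the number of projection angles, and the ramp filter are illustrative choices, not the authors' implementation (filter_name is the parameter name in recent scikit-image releases).

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

# Simulate a sinogram (set of 1-D projections) and reconstruct one slice
# with filtered back projection, the method chosen above.
image = rescale(shepp_logan_phantom(), scale=0.5)          # test slice
theta = np.linspace(0.0, 180.0, max(image.shape), endpoint=False)
sinogram = radon(image, theta=theta)                       # forward projection
reconstruction = iradon(sinogram, theta=theta, filter_name="ramp")

error = np.sqrt(np.mean((reconstruction - image) ** 2))
print("reconstruction RMSE:", error)
```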


IV. Flow Chart

V. Conclusion
Software can be built based on this proposition to carry out non-destructive testing of industrial samples. The proposed system can be used for visualizing and analyzing these samples in three-dimensional space, with the principal benefit of obtaining a complete model of both the external and internal surfaces without destroying the sample. The software can also measure the length and depth of flaws. With an extremely user-friendly front end, this proposition can be better than existing systems because of its ease of use, so that even individuals with the slightest technical knowledge may benefit from such systems. Minute flaws and defects can be detected, which may then be corrected on the original sample. This technique for testing the quality of the final product can be used in various sectors apart from the industrial sector; to name a few, these include research, forensic engineering, mechanical engineering, electrical engineering, civil engineering, and medicine.

References
[1] Alexander Flisch, Joachim Wirth, Robert Zanini, Michael Breitenstein, Adrian Rudin, Florian Wendt, Franz Mnich and Roland Golz, "Industrial Computed Tomography in Reverse Engineering Applications", Computerized Tomography for Industrial Applications and Image Processing in Radiology, DGZfP-Proceedings BB 67-CD.


[2] http://en.wikipedia.org/wiki/X-ray_computed_tomography
[3] Rodney A. Brooks, Giovanni Di Chiro, "Principles of Computer Assisted Tomography (CAT) in Radiographic and Radioisotopic Imaging", Phys. Med. Biol., Vol. 21, No. 5, pp. 689-732.
[4] Julien Noel, "Advantages of CT in 3D Scanning of Industrial Parts", 3D Scanning Technologies Magazine, Vol. 1, No. 3, Dec 2008.
[5] Alexander Flisch, Joachim Wirth, Robert Zanini, Michael Breitenstein, Adrian Rudin, Florian Wendt, Franz Mnich, Roland Golz, "Industrial Computed Tomography in Reverse Engineering Applications", DGZfP-Proceedings BB 67-CD, March 1999, pp. 45-53.
[6] B.C. Masschaele, V. Cnudde, M. Dierick, P. Jacobs, L. Van Hoorebeke, J. Vlassenbroeck, "UGCT: New X-ray radiography and tomography facility", Nuclear Instruments and Methods in Physics Research Section A 580, May 2007, pp. 266-269.

VI. Acknowledgments
We would like to express our sincere gratitude to Dr. Umesh Kumar, Section Head, Industrial Tomography and Instrumentation Section, IP&AD, BARC, and Mr. Lakshminarayana Y., Scientific Officer, Industrial Tomography and Instrumentation Section, IP&AD, BARC, for introducing us to the field of DR & CT. We are highly indebted to them for the consistent support and guidance provided to us.


Determining the Co-relation between Behavioral Engagement and Performance of e-Learners
Dias B.A.M.T., Malida K.K.D.S., Sahani M.A., Jayathilaka J.M.S.C., T. C. Sandanayaka, G.T.I. Karunarathne
Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
Abstract: The main focus of the research is to identify the co-relation between the performance and the behavioral engagement of learners in an e-learning environment. Two experiments were conducted in a controlled environment to gather data using behavioral tracking software. Data analysis is done using both a statistical approach and a data mining approach to develop the final model, and a set of specialized software tools was used to preprocess the dataset. The developed model can be used to predict a learner's performance level based on the identified independent parameters.
Keywords: Co-relation, Data Mining, Behavioral Engagement

I. Introduction
E-Learning is defined as all forms of electronically supported learning and teaching which are procedural in character and aim to effect the construction of knowledge with reference to the individual experience, practice, and knowledge of the learner; information and communication systems, whether networked or not, serve as specific media to implement the learning process [1]. As a highly dynamic application area in modern computing, e-learning promises to play a crucial role in transforming traditional education [2]. The knowledge transfer process within the context of technology-based teaching and learning environments can be interpreted as a holistic phenomenon composed of two related streams: the teaching process and the learning process [3]. Furthermore, e-learning embraces more than simply reading online lessons. It is a large and complex field of research encompassing a variety of learning and teaching paradigms, for example constructivist, serial, symmetric, cognitive, face-to-face, discovery, and managed learning [4]. Though early approaches focused merely on the student's cognitive processes, this has changed with new research emphasizing the role of user behavioral engagement in creating a productive and enjoyable learning process [5]. The goal of the project is to identify the co-relation between learner performance and learner behavioral engagement by tracking the e-learner's different behaviors together with performance. The e-learner's performance level has been determined from the grade achieved at the end of the learning activity, and that grade is used as the dependent variable in the relationship identification. The selected learner behavioral engagement parameters are the eye focus area on the screen, the cursor position over time, the time spent on the lesson material, the time spent on other websites (social media), and the keyboard input of the learner. Background software applications were developed and used to record data relevant to the learner's interactions with the lesson. Statistical analysis and data mining techniques were used to analyze the dataset and develop the model. As the final outcome, a model is developed to automatically predict the performance level of the learner by capturing, processing, and analyzing data relevant to the learner's behavioral engagement.
II. Research Methodology
The research project is conducted under the assumption that there is a relationship between e-learner behavioral engagement and e-learner performance during an e-learning activity.

Figure 2.1: High Level Architecture of the System


The research was conducted in four main phases. The first phase is developing data capturing methodologies and software applications: four basic software applications were developed to track the behavioral engagement of the learner, and finally all those separate applications were integrated into one system. The second phase is gathering primary data on learner behavior and performance by conducting experiments; two experiments were conducted for data gathering. Round 01 data gathering was conducted with 177 students in a controlled lab environment.

Figure 2.2: Round 01 Data Collection
The round two data gathering experiment was conducted with 53 undergraduate students in the same lab environment as round one. Phase three is data analysis and co-relationship identification. Data analysis is done using a statistical approach and data mining techniques. Statistical methodologies such as correlation calculation and linear regression line calculation were applied to the data set to identify the relationship between the independent variables and the dependent variable. Data mining techniques were also used to map the behavioral data to the performance data. Evaluation is the fourth phase of the project; it was conducted in two main ways, namely evaluation of the tools developed for capturing data and evaluation of the final model developed by analyzing learner behavior and performance data.

III. Implementation
(A) Eye Movement Capturing Software
The OpenCV library and C++ were used to develop the application. The face of the user is first detected by the eye tracking application, and the eyes are detected after that. After detecting the eyes, the area of each eye is divided into two parts vertically. The sclera percentage of each eye is detected and saved to the database for every frame in which the eye is detected.
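The tracker itself was written in C++ with OpenCV; purely for illustration, a Python equivalent of the face-then-eyes detection step using OpenCV's bundled Haar cascades could look as follows (the webcam index and cascade files are assumptions, not the authors' exact configuration).

```python
import cv2

# Detect the face first, then the eyes inside the face region, as described above.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)                      # default webcam (assumed)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face_roi = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(face_roi)
        print("eyes found in this frame:", len(eyes))
cap.release()
```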

Figure 3.1: Areas used for Image Processing
At the beginning, a calibration page is presented to the user, who is asked to focus the eyes on the left corner and then the right corner of the screen. From this, the maximum and minimum values of the sclera area are identified. The computer screen is divided into three equal vertical areas (A, B, C), and using the minimum and maximum calibration values, the area on which the user is focusing his or her eyes in each frame is predicted using the following algorithm:


Where:
w = screen width (in pixels)
a1 = minimum area of the sclera at calibration (left side)
a2 = maximum area of the sclera at calibration (left side)
A = area of the sclera at a given moment (frame)
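The formula for this prediction is not reproduced above; one plausible, hedged reading of it, using only the variables just defined, is to map the sclera area A linearly between the calibration extremes a1 and a2 to a horizontal screen position and then bucket that position into the three areas. The sketch below follows that reading and is not the authors' exact rule.

```python
def eye_focus_area(A, a1, a2, w):
    """Map the measured sclera area A (between calibration extremes a1 and a2)
    to an estimated horizontal screen position, then bucket it into area A/B/C.
    Hedged reconstruction of the missing formula, not the original code."""
    fraction = (A - a1) / float(a2 - a1)          # 0 at the left extreme, 1 at the right
    fraction = min(max(fraction, 0.0), 1.0)       # clamp to the calibrated range
    x = fraction * w                              # estimated x position in pixels
    if x < w / 3.0:
        return "A"
    elif x < 2.0 * w / 3.0:
        return "B"
    return "C"

print(eye_focus_area(A=0.55, a1=0.30, a2=0.80, w=1366))   # illustrative values
```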

Figure 3.2: Eye Calibration Points
(B) Mouse Movement Capturing Software and Data Preprocessing
The X, Y coordinates of the cursor in the activity window are captured over time (every hundred milliseconds) by a Java program. The captured data are written automatically to a text file created on the C drive of the user's computer. The data set cannot be applied directly to the analysis process since it is not compatible with the type of the other data; hence the variance of the mouse movements with respect to time was used for the analysis. The mouse movement variance was calculated using the following algorithm.

Where:
Z = length of one movement
X2 = second x coordinate on the screen
X1 = first x coordinate on the screen
Y2 = second y coordinate on the screen
Y1 = first y coordinate on the screen
N = number of movements
(C) Keyboard Tracking Software and Data Preprocessing
In this project, a key logger was developed which quietly monitors keyboard actions to log any key press the user makes. The program stores the keyboard input in a log file and sends the input to the database. The program is operating-system specific; at this stage the keyboard capturing program is written in C# for the Windows OS. The keyboard input data are saved in a text file, which is then read by a Java program and checked against a set of pre-defined keywords from the lesson. The keyboard input data set was preprocessed to obtain the number of keywords searched by the learner; the count of the keywords typed by the user is calculated and sent to the database as an input parameter.
(D) Web Extension to Track the Sites Visited and the Time Spent on Each Site, and Data Preprocessing
An extension for the Google Chrome browser was developed using JavaScript. Extensions are extra features and functionality that can be added to Google Chrome. Using the extension, Google Chrome is customized to capture the switching time and the URL when the learner switches between materials while engaging with the e-learning activity. The software therefore records all the URLs visited by the learner and the time spent on each URL. Based on the URLs and the respective time spent on each URL, the time spent on the learning activity was taken as a percentage of the total time spent and used as an input parameter. In the same way, the time spent on non-learning


URLs was taken as a percentage of the total time and used as an input parameter. The ten social media and entertainment sites most visited by Sri Lankan users were defined as the non-learning URLs in the calculation process.
(E) Student Marks Prediction Software
Finally, a software application was developed to predict learner performance from the values of the independent parameters, based on the statistical model that is the final outcome of the statistical data analysis. The application was developed using the Java programming language.
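The exact formulas used in the preprocessing of sections (B) and (C) are not reproduced above; a minimal sketch under the stated definitions (Z as the length of one cursor movement, the variance taken over the movements, and a simple keyword count against the pre-defined lesson keywords) could be:

```python
import math

def mouse_movement_variance(points):
    """points: list of (x, y) cursor coordinates sampled every 100 ms.
    Computes the length Z of each movement between consecutive samples and
    returns the variance of those lengths (hedged reading of the missing formula)."""
    lengths = [math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:])]
    n = len(lengths)
    mean = sum(lengths) / n
    return sum((z - mean) ** 2 for z in lengths) / n

def keyword_count(logged_text, lesson_keywords):
    """Count how many of the pre-defined lesson keywords appear in the key log."""
    text = logged_text.lower()
    return sum(text.count(word.lower()) for word in lesson_keywords)

points = [(0, 0), (30, 40), (30, 40), (90, 120)]              # illustrative samples
print(mouse_movement_variance(points))                        # variance of movement lengths
print(keyword_count("sql join and index", ["join", "index", "view"]))
```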

Figure 3.3: Web Extension
IV. Data Analysis
(A) Statistical Data Analysis
Statistical analysis is done to determine whether there is a relationship among the obtained data. Correlation calculation and multiple regression line calculation were applied to the data set during the statistical data analysis, using the "Data Analysis" plug-in in Microsoft Excel.
(B) Round One Data Collection
As the majority of marks fall in the 80-100 range, the marks do not have a normal distribution; the frequency of marks in the round one performance data is as follows:

The X axis denotes the marks and the Y axis denotes the number of students who obtained the relevant marks.
(C) Round Two Data Collection
The e-learning experiment to track behavioral and performance data was conducted on 50 level 04 undergraduates. This experiment avoided the practical and technical issues of the round one data collection and was based on a more common, historical subject. The performance data are distributed more naturally in the second round data set; hence the data collected in round two are used for the advanced data analysis to develop a model that predicts the learner performance level. The frequency of the student marks in the second round data set is as follows:

The X axis denotes the marks and the Y axis denotes the number of students who obtained the relevant marks.


The following co-relationships were identified during the correlation analysis:
1. The correlation coefficient between eye movements and marks is 0.2400.
2. The correlation coefficient between mouse movements (variance) and marks is 0.3436.
3. The correlation coefficient between social media engagement (as a percentage of the time spent on the whole activity) and marks is 0.6713.
4. The correlation coefficient between the number of keywords typed and marks is 0.4852.
Multiple linear regression was used to model the relationship between the dependent parameter and the several independent variables: y denotes the student marks, and the x variables denote the independent variables, namely the eye focus points on the screen, the cursor points over time, the time spent on the lesson material, the time spent on other websites (social media), and the keyboard input of the learner.

Figure 4.1: Multiple Linear Regression Model Calculation
The multiple linear regression line was calculated using Microsoft Excel for the round 02 data at an 85% accuracy level. The final model developed from the statistical analysis is as follows:
y = (-0.000129) x1 + 0.000174 x2 + 0.19402 x3 + 0.082514 x4
Where:
y = marks
x1 = mouse variance
x2 = eye focus area
x3 = number of keywords
x4 = involvement time
(D) Data Mining
Data mining techniques were used for further analysis of the data set and to validate the model. Weka 3.6 was used for the mining, and the collected data were used as the training set on which a model was built with the J48 classifier. A new dataset is used to validate the model, and the grades of the learners in the new dataset are predicted based on the developed model.
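Applying the statistical model to a new learner then reduces to a single weighted sum. The sketch below uses the coefficients reported above; the example input values are illustrative, and the units of each variable follow whatever preprocessing produced the training data.

```python
def predict_marks(mouse_variance, eye_focus_area, keyword_count, involve_time):
    """Predicted marks from the multiple linear regression model reported above:
    y = -0.000129*x1 + 0.000174*x2 + 0.19402*x3 + 0.082514*x4."""
    return (-0.000129 * mouse_variance
            + 0.000174 * eye_focus_area
            + 0.19402 * keyword_count
            + 0.082514 * involve_time)

# Illustrative input values only; units depend on the preprocessing described above.
print(predict_marks(mouse_variance=1500, eye_focus_area=2000,
                    keyword_count=12, involve_time=85))
```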

The developed system is capable of predicting the learner's performance level, based on the developed model, by tracking the behavioral engagement of the student in the e-learning environment.


(E) Evaluation
Evaluation methods were chosen based on the functionality and the nature of the modules of the system. The developed software was evaluated using actual e-learners and by carrying out cross validations and backward validations.
E1. Evaluation of the Model
The developed model was tested using ten unused data elements from the round two data collection. To evaluate the accuracy of the model, the data regarding the eye focus point, mouse movement variation, searched text, and visited sites of the e-learners during the e-learning activity were applied to the developed model, and from those behavioral data the marks of each learner were predicted using the model. The accuracy level of the model was 85%.
E2. Evaluation of the Software
E2.1. Evaluation of the Eye Tracker
If the eye tracker works accurately, it should report the exact area (A, B, C) when a learner focuses his or her eyes on some area of the computer screen. Hence a set of ten eye focus instances was used as the input for the evaluation. The test users were asked to focus their eyes within the A, B, and C areas, and the values recorded as output by the eye tracker were checked against the actual eye focus areas of the three points on the screen.

No. of Instances    Actual eye focused area on the screen    Correctly predicted no. of instances    Incorrectly predicted no. of instances
10                  A                                        6                                       4
10                  B                                        8                                       2
10                  C                                        5                                       5

Table 5.1: Test case for the "eye tracker"
E2.2 Evaluation of the Mouse Tracker
If the mouse tracker works accurately, it should report the exact x, y coordinates when a learner points the cursor at some point on the computer screen. Hence a set of x, y coordinates and user cursor points was used as the input for the evaluation. Four corner points on the screen, whose x, y coordinates are known and pre-defined, were used. The test users were asked to point the cursor at the four corners of the screen (upper left, upper right, lower right, and lower left), and the x, y coordinates recorded as output by the mouse tracker were checked against the actual x, y coordinates of the four corners of the screen.

Cursor point on the screen (input)    Expected x, y coordinates    Output by the mouse tracker
upper left                            0, 0                         0, 0
upper right                           1366, 0                      1366, 0
lower right                           1366, 768                    1366, 768
lower left                            0, 768                       0, 768

Table 5.2: Test case for the "mouse tracker"
E2.3 Evaluation of the Keyboard Tracker
If the keyboard tracker works accurately, it should record the exact words when a learner types particular words on the keyboard. Hence a set of pre-defined words was used as the input for the evaluation. The test users were asked to type those words while the software ran as a background application, and the words recorded as output by the keyboard tracker were checked against the actual words.

Expected Outcome          Actual Outcome
Sahani Matharaarachchi    Sahani Matharaarachchi
Madushi Dias              Madushi Dias
Malinda Kandalama         Malinda Kandalama
Sadun Jayasekara          Sadun Jayasekara

Table 5.3: Test case for the "keyboard tracker"
E2.4 Evaluation of the Web Extension
If the web extension works accurately, it should record the visited web URLs and the time spent on each URL when a learner visits various websites during the e-learning activity. Four commonly used websites were therefore used for the evaluation. The test users were asked to visit these websites and stay on each for a duration of two minutes while the web extension ran as a background application, and the URLs and the time spent on each URL recorded as output by the web extension were checked against the actual URLs and the time spent on them.


The URLs visited                 The Approximate Time Spent    Recorded URLs by the web extension    Recorded Time Duration
https://www.facebook.com/        2 minutes                     https://www.facebook.com/             2.02 minutes
http://en.wikipedia.org/         2 minutes                     http://en.wikipedia.org/              2.00 minutes
https://www.youtube.com/         2 minutes                     https://www.youtube.com/              1.99 minutes
http://moodle.itfac.mrt.ac.lk/   2 minutes                     http://moodle.itfac.mrt.ac.lk/        2.03 minutes

Table 5.4: Test case for the "web extension"
E2.5 Evaluation of the Developed Model
After determining the co-relationships and developing the relationship model, 10 unused data records from the round 02 data collection were used to validate the identified relationships by applying them to the model. To evaluate the accuracy of the model, the data regarding the eye movement variation, mouse movement variation, searched text, and visited sites of the e-learners during the e-learning activity were applied to the developed model, and from those behavioral data the marks of each learner were predicted using the model.
VI. Further Work
The knowledge obtained from this research project, and the relationships and sub-relationships identified by the pattern recognition process, can be used in future research to enhance learner performance, to enhance the quality and design of e-learning materials, and to support decision making on a student's overall performance and knowledge gathering; the results will also be usable in future research and implementations of e-learning applications.
VII. Conclusion
From the developed model it can be concluded that the mouse movement variance negatively affects the marks achieved by the learner, the eye focus pattern of the learner positively affects the marks achieved, the number of keywords searched positively affects the marks achieved, and the time involved with the learning activity positively and strongly affects the marks achieved by the e-learner. Compared to the other attributes, the time spent on non-learning activities has a very weak co-relation with the marks achieved; since it does not have a considerable effect on the marks of the student, it was removed from the multiple linear regression model.

References
[1] Dr. P. Nagarajan and Dr. G. Wiselin Jiji, "Online Educational System (e-Learning)", International Journal of u- and e-Service, Science and Technology, Vol. 3, No. 4, December 2010.
[2] Johan Ismail, "The design of an e-learning system: Beyond the hype", Internet and Higher Education 4 (2002), 329-336.
[3] Victor Manuel García Barrios, Christian Gütl, Alexandra M. Preis, Keith Andrews, Maja Pivec, Felix Mödritscher and Christian Trummer, "AdELE: A Framework for Adaptive E-Learning through Eye Tracking", Proceedings of I-KNOW '04, Graz, Austria, June 30 - July 2, 2004.
[4] Jennifer Lennon and Hermann Maurer, "Why it is Difficult to Introduce e-Learning into Schools and Some New Solutions", Journal of Universal Computer Science, Vol. 9, No. 10 (2003), 1244-1257.
[5] Oye N.D., Mazleena Salleh and N. A. Iahad, "E-Learning Methodologies and Tools", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.


Community Detection: A Boom to Society
Mini Singh Ahuja1, Jasmine2
1Assistant Professor, 2Student (M.Tech.)
1,2Department of Computer Science, GNDU Regional Campus, Gurdaspur, Punjab, INDIA.
Abstract: In the real world, many networks are available, such as social networks, biological networks, etc. These networks have abundant information stored in them, which can be extracted to help society; consequently, the analysis of complex networks has received a lot of attention from the scientific community during the last decades. Community structure is one of the properties of these networks, and community detection techniques are used to find the community structure within complex networks.
Keywords: Complex Network, Community Structure, Applications of Community Structure

I. Introduction
Huge graphs arising in real life are called complex networks. From the perspective of network theory, a complex network is a graph (network) with non-trivial topological features, that is, features that do not occur in simple networks such as lattices or random graphs but often occur in real graphs. A complex network is a set of many associated nodes that relate to each other in different ways.
Fig. 1. A Complex Network

The nodes in a network are also called vertices or elements and can be represented by the symbols v1, v2, ..., vn, where n is the total number of nodes in the network. The study of complex networks is a young and vigorous area of scientific research stimulated largely by the empirical study of real-world networks such as computer networks and social networks. In nature, one can find many diverse networks, with different kinds of nodes and associations; for example, in a social network the nodes are people and the connections can be friendship relations. In the same world, one can define connections in a different way, for example: two people are related if they are siblings. Clearly, the network defined through the friendship connection is different from the one defined through the sibling relationship, because two people who are friends are not necessarily siblings and vice versa, even though the nodes in each network can be the same. Biological networks that can be found at the molecular level include genetic regulation networks, protein networks, neuronal networks, and metabolic networks. On another level, there are information networks (e.g., the Internet), social networks (e.g., Facebook, scientific collaborators, the spread of diseases), and ecological networks.
Fig. 2. Connections

This very simple but striking fact has made it possible to create mathematical models to recognize and explain the structural and sometimes dynamical properties of these networks. The study of networks concerns


different fields of science, from neurobiology to statistical physics. The most important study of these networks deals with their structure, because structure is also associated with function.
Fig. 3. Friendship and Internet

A. Collective Actions in Nature
The collective behaviour of systems such as shoals of fish, swarms of insects, or flocks of birds, in which hundreds or thousands of organisms move together in the same direction without the evident leadership of a leader, is one of the most magnificent examples of large-scale organization observed in nature. From a theoretical point of view, there has been a drive to discover and understand the general principles that govern the emergence of collective order in these systems, in which the interactions between individuals are most probably short-range. During the last few years, several models have been proposed to account for the large-scale properties of flocks and swarms.

II. Community Structure
Much research effort has been devoted to developing methods and algorithms that can efficiently uncover this hidden structure of a network, yielding a huge literature on what is today called community detection. Community detection is one of the most important sub-domains in complex network study. It is vital for many reasons, including the categorization of nodes into homogeneous groups and the identification of group leaders or vital group connectors. Communities may correspond to groups of pages of the World Wide Web dealing with related topics, to functional modules such as cycles and pathways in metabolic networks, to groups of correlated individuals in social networks, and so on. A community is an interconnected subset of nodes with denser internal links relative to the rest of the network, and a community structure is a set of communities, or more specifically a partition of the network node set. Communities are clusters of strongly associated nodes within a network; the nodes in a community should have more intra-community connections than inter-community connections. This type of structure reveals much knowledge about the network: in a metabolic network the communities correspond to biological functions of the cell, and in the web graph the communities correspond to topics of interest. Community detection is an important part of network study. The goal is simple: to detect how the nodes of the graph should be grouped into communities. Some algorithms totally partition the nodes; others allow some nodes to be left without a community. Community detection has several applications, for example suggesting connections (say, on Facebook) and determining network structure. Community detection can also be quite useful for deriving features for classification tasks where the data include connections that are not easily converted into features.
Fig. 4. Community Detection Technique
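As a concrete, hedged illustration of the idea (separate from the specific algorithms surveyed in this paper), the sketch below applies the NetworkX library's greedy modularity method to a small benchmark graph:

```python
import networkx as nx
from networkx.algorithms import community

# Toy example: Zachary's karate club, a standard benchmark for community detection.
G = nx.karate_club_graph()

# Greedy modularity maximisation partitions the nodes into densely linked groups.
communities = community.greedy_modularity_communities(G)
for i, group in enumerate(communities):
    print(f"community {i}: {sorted(group)}")
```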

A. Why Do We Detect Communities?
- Community discovery permits us to "make sense" of the primary structure of networks.
- It helps in understanding the interactions between nodes (vertices).
- It enables easy visualization and navigation of huge networks.

- Forming the base for other undertakings such as data mining.
- Understanding the complexity of networks.

III. Applications of Community Structures
A. Lung Cancer: Joel J. Bechtel, MD, FCCP; William A. Kelley, MD, FCCP; Teresa A. Coons, PhD; Al Gerry Klein, MD; Daniel D. Slagel, MD; and Thomas L. Petty, MD, Master FCCP (CHEST 2005) described a community-based lung cancer identification project focusing on high-risk patients who receive general care in a primary care outpatient practice. Chest posteroanterior radiographs, thoracic CT scans and sputum cytology were offered to subjects with airflow obstruction. Case finding in high-risk patients in a primary care population can be accomplished at a relatively low cost [4].
B. Terrorist Groups: Todd Waskiewicz discussed the work of terrorist groups on social media sites, who use friend-of-friend relationships to transfer their influence and ideas to other members of the group through intermediaries in order to spread their propaganda. Social media pages are less prone to attack [5].
C. Ant Colony Optimization: Di Jin, Dayou Liu, Bo Yang, Jie Liu and Dongxiao He proposed a community detection algorithm, MACO. This algorithm makes the ants' movement decisions become more and more intelligent, and the tendency of an ant to remain in its own community becomes increasingly obvious [6].
D. Ring Search: Kwan Hui Lim and Amitava Datta (2013) proposed a community detection algorithm that directly detects the community centered on an individual of interest, without the need to first detect all communities. The proposed algorithm utilizes an expanding ring search starting from the individual of interest as the seed user [7].
E. Health Social Network: Xiaoxiao Ma, Guanling Chen and Juntao Xiao (2010) provided an empirical analysis of a health OSN, which allows its users to record their foods and exercises, to track their diet progress towards weight-change goals, and to socialize and group with each other for community support [8].
F. Facebook: Emilio Ferrara (2012) presented the first large-scale community structure investigation of the Facebook social network. He crawled Facebook and detected its communities using two algorithms: Label Propagation and the Fast Network Community Algorithm [11].
G. Link Prediction: Tsuyoshi Murata and Sakiko Moriyasu (2008) described an improved method for predicting links based on weighted proximity measures of social networks. The method is based on the assumption that proximities between nodes can be estimated better by using both graph proximity measures and the weights of existing links in a social network [12].
H. Dengue Fever in Peru: Oxford University described how community structure, i.e., the study of densely connected groups within a network, can be applied to the spread of epidemics, using two algorithms, the greedy and Newman algorithms [17].
I. Incomplete Information Networks: Wangqun Lin, Xiangnan Kong and Philip S. Yu (2012) studied the problem of detecting communities in incomplete information networks with missing edges; a hierarchical clustering approach is proposed to detect communities within such networks [18].
J. Refactoring Software Packages: Wei-Feng Pan, Bo Jiang and Bing Li (2013) gave a novel approach to refactor the package structures of object-oriented software. It proposes a constrained community detection algorithm to obtain optimized community structures in software networks, which also correspond to optimized package structures [19].
K. Recommendation Systems: Massimiliano Zanin showed that, when constructing a recommendation algorithm that can guide the customer through a great variety of items in an e-store, complex networks can help in improving the results [20].
L. Fraud Events in Telecommunications Networks: Carlos André Reis Pinheiro proposed using community structure to identify fraud events in telecommunications networks. He describes community detection to characterize users and a methodology to detect outliers inside the social networks [21].

IV. Conclusion
In this paper, we have tried to outline the research that has used community structure to solve various practical problems of society. Community detection has become a boon to society.

V. References

[1]. Michele Coscia, Fosca Giannotti and Dino Pedreschi (Computer Science Department, University of Pisa; KDDLab, ISTI-CNR, Pisa, Italy; Center for Complex Network Research, Northeastern University, Boston, USA), "A Classification for Community Discovery Methods in Complex Networks", 2011.
[2]. Wangsheng Zhang, Gang Pan, Zhaohui Wu, Shijian Li (Department of Computer Science, Zhejiang University, China), "Online Community Detection for Large Complex Networks".
[3]. Günce Keziban Orman, Vincent Labatut, Hocine Cherifi (Galatasaray University, University of Burgundy), "On Accuracy of Community Discovery Algorithms".
[4]. Joel J. Bechtel, MD, FCCP; William A. Kelley, MD, FCCP; Teresa A. Coons, PhD; Al Gerry Klein, MD; Daniel D. Slagel, MD; and Thomas L. Petty, MD, Master FCCP, "Lung Cancer Detection in Patients With Airflow Obstruction Identified in a Primary Care Outpatient Practice", 2005.
[5]. Todd Waskiewicz (Air Force Research Laboratory, AFRL/RIEA, 525 Brooks Road, Rome, NY 13441-4505), "Friend of a Friend Influence in Terrorist Social Networks".

[6]. Di Jin, Dayou Liu, Bo Yang, Jie Liu, Dongxiao He (College of Computer Science and Technology, Jilin University, Changchun, 130012, China), "Ant colony optimization with a new random walk model for community detection in complex networks", 2012.
[7]. Kwan Hui Lim, Amitava Datta (School of Computer Science and Software Engineering, The University of Western Australia, Crawley, WA 6009, Australia), "A Seed-Centric Community Detection Algorithm based on an Expanding Ring Search", 2013.
[8]. Xiaoxiao Ma, Guanling Chen, Juntao Xiao (Department of Computer Science, University of Massachusetts Lowell), "Analysis of An Online Health Social Network", 2010.
[9]. D. Maloney-Krichmar and J. Preece, "The meaning of an online health community in the lives of its members: Roles, relationships and group dynamics", in Proceedings of the International Symposium on Technology and Society, 2002.
[10]. D. Maloney-Krichmar and J. Preece, "A multilevel analysis of sociability, usability, and community dynamics in an online health community", ACM Transactions on Computer-Human Interaction, 12(2), June 2005.
[11]. Emilio Ferrara (Department of Mathematics, University of Messina, V.le F. Stagno D'Alcontres n. 31, 98166, Italy), "Community Structure Discovery in Facebook", 2012.
[12]. Tsuyoshi Murata, Sakiko Moriyasu (Department of Computer Science, Tokyo Institute of Technology, W8-59 2-12-1 Ookayama, Meguro, Tokyo 152-8552, Japan), "Link Prediction based on Structural Properties of Online Social Networks", 2008.
[13]. Günce Keziban Orman and Vincent Labatut, "A Comparison of Community Detection Algorithms on Artificial Networks", hal-00633640, version 1, 19 Oct 2011.
[14]. Catanese, S., De Meo, P., Ferrara, E., Fiumara, G., Provetti, A., "Crawling Facebook for social network analysis purposes", in Proceedings of the International Conference on Web Intelligence, Mining and Semantics, pp. 52:1-52:8, 2011.
[15]. Clauset, A., Newman, M., Moore, C., "Finding community structure in very large networks", Physical Review E 70 (6), 066111, 2004.
[16]. Newman, M.E.J., "The structure and function of complex networks", SIAM Review 45 (2003) 167-256.
[17]. Oxford University, "Community Detection in Relation to the Spread of Epidemics".
[18]. Wangqun Lin, Xiangnan Kong, Philip S. Yu, "Community Detection in Incomplete Information Networks", 2012.
[19]. Wei-Feng Pan, Bo Jiang, Bing Li, "Refactoring Software Packages via Community Detection in Complex Software Networks", 2013.
[20]. Massimiliano Zanin, Pedro Cano, "Complex Networks in Recommendation Systems".
[21]. Carlos André Reis Pinheiro, "Community Detection to Identify Fraud Events in Telecommunications Networks", Paper 106-2012.



ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Prioritizing Usability Data Collection Methods
Teena Tiwari Parmar1, Dr. Kshama Paithankar2
1Assistant Professor, 2Professor, Shri Vaishnav Institute of Management, Indore, Madhya Pradesh, INDIA

Abstract: Various data collection methods are available to serve the purpose of fulfilling usability requirements. Though all these methods are equally useful for data collection, the scale of influence of these methods varies across different classes of software in fulfilling usability requirements. Observations strengthen the fact that to attain software usability, usability requirements must be determined and fulfilled throughout software development. Therefore, it is required to identify the Usability Data Collection Methods (UDCM) with preferences. In this paper, we propose to prioritize the aforesaid UDCMs by computing some metrics referring to different classes of software. A case study has also been performed considering three classes of software. The proposed work is useful for understanding the importance of UDCMs and further for usability measurement considering the specific nature of software. It will also provide ease to developers and thereby maximize user satisfaction as well.

Keywords: Potential Data Collection Methods (PDCM), Applicability Factor, Productivity Factor, Total Productivity Factor, Usability Data Collection Methods (UDCM).

I. Introduction
Usability requirements capture the characteristics needed to build a usable system from the user perspective. Understanding usability requirements is an integral part of usable software development [1]. It is widely observed that successful systems and products begin with identifying and fulfilling usability requirements [2]. Therefore, a strong need has emerged to identify the Data Collection Methods (DCM) that satisfy phase-wise usability requirements. This provides benefits such as increased productivity, enhanced quality of work, reductions in support and training costs, and finally improved user satisfaction ([3], [4]). Also, incorporating usability requirements in the development process can reduce the risk of failure [5]. Further, to fulfill these usability requirements, data must be collected, for which various DCM exist. It has been observed that usability requirements vary phase-wise in the development process and with the kind of final product, i.e., usable software ([6], [7]). This paper is intended to find the Potential Data Collection Methods (PDCM) that are defined as UDCM for fulfillment of usability requirements ([8], [9]). Mainly, prioritization of UDCMs is proposed in this paper. Section II covers the identification of PDCM towards prioritization of UDCM. Section III deals with the prerequisites for prioritization of UDCM. In Section IV some useful metrics are proposed, with their computation, along with the steps for UDCM prioritization. A case study illustrating the proposal quantitatively is presented in Section V. Finally, we end with the conclusion in Section VI.

II. Identification of PDCM
It has been observed that various DCM are available for collecting data related to usable software development ([8], [9]). However, not all of these methods are useful for fulfilling usability requirements completely. Therefore, to identify PDCM amongst all DCM, the scale of influence is used, considering Mandatory (Man), Not Required (NR) and Optional (Opt) influence for a specific software class.
It is suggested that a method is a PDCM if its influence in all phases of software development is either mandatory or optional [9]. Further, priority is assigned to the phases of software development. Considering analysis the most important phase of software development, this phase has been assigned the highest priority. Similarly, priorities are assigned according to the importance of each phase. Finally, combining the scale of influence and the phase-wise priority, PDCM are identified for each class of software and termed UDCM, as shown in Table-1.

Table-1: Usability Data Collection Methods (UDCM)
Potential DCM for Aps (analytical preference): M1 Comparative Study, M2 Competitive Analysis, M3 Prototyping, M4 Scenarios, M5 Parallel Design, M6 Using Available Information.
Potential DCM for UIn (analytical preference): M1 Comparative Study, M2 Competitive Analysis, M3 Scenarios, M4 Parallel Design, M5 Using Available Information, M6 Prototyping.
Potential DCM for Wes (analytical preference): M1 Comparative Study, M2 Competitive Analysis, M3 Prototyping, M4 Scenarios, M5 Parallel Design, M6 User Testing, M7 Using Available Information.

III. Prerequisites for Prioritization
To attain the most usable software, UDCM play a vital role during development. Though there exist many UDCM, the appropriateness of their application in varying classes of software must be investigated. The subsequent sub-sections focus on classes of software, phases of software development and usability requirements respectively, as prerequisites for the said prioritization.
A. Classes of Software
It has been observed that different classes of software may have distinct usability requirements [9]. There exist various types of software with different functionality and characteristics. Based on the features and users of the software, three different software classes (Cs) have been identified: Application Software (Aps) as C1, User Interface (UIn) as C2, and Website (Wes) as C3.
B. Phases of Software Development
In every phase of software development, the requirement for data and the applicability of PDCM may vary. Mainly, four phases of software development, namely Analysis (P1), Design (P2), Testing (P3) and Implementation (P4), have been considered for further study. According to preference, each phase has been assigned a weight as shown in Table-2 for further computation.

Table-2: Weighing Factor Phase Wise
Phase of SDLC (Pi)   Weighing factor (Wi)   Weight assigned
P1                   W1                     4
P2                   W2                     3
P3                   W3                     2
P4                   W4                     1

C. Usability Requirements
Usability requirements specify how easily a system can be used and learned, and how efficiently and effectively the software supports user tasks [10]. Including the usability concept from the early stages of software development increases the developers' understanding of usability requirements [3]. Each phase has its own specific usability requirements [10]. Around 32 usability requirements of P1 have been observed, such as: the end-users of the system, easy to learn, subjective satisfaction, easy to use, adaptability to the operator's skill level, understandability, operability, effective data, efficient data, flexibility, utility, environment in which people use the product, prioritize user needs, understand the need of the system, etc. For P2, 34 usability requirements have been identified, such as: learnability, rate of errors by users, subjective satisfaction, no mistakes, access is easy, simplicity, recoverable, stress-free use, context-sensitive usage, provides support, visibility, reusability, error messages, reaction time, time spent on completing a particular task, time from committing an error to recovering from it, help on problems, how comfortable it is to use, convenience of use, etc. P3 has 11 usability requirements, such as: feedback, error messages, whether primary requirements can be fulfilled or not, communicate to the users, testable, flexibility, ensuring everything behaves as expected, tolerance, reviewable, self-documenting, reliability, etc. A total of 10 usability requirements of P4 have been observed, such as: simplicity, recoverable, subjective satisfaction, accepting the system, communicate to the users, user training required, tolerance, tools required to carry out the task, flexibility, convenience of use, etc.

IV. Prioritization of UDCM
Prerequisites for prioritization of UDCM have been described in the previous section. The definitions of some useful metrics for the prioritization, along with the steps of prioritization, are described in this section.
1. Applicability factor (Af): Af denotes the degree to which a particular PDCM is applicable, or fit to be applied, during usable software development. In other words, Af refers to the number of usability requirements satisfied by a particular PDCM out of the total number of requirements defined for a phase of development, and it is computed using expression (1):

Af(Cs(Pi(Mj))) = (No. of usability requirements satisfied by Mj in Pi) / (Total no. of usability requirements of Pi)    (1)

Where Cs denotes the class of software, Pi represents ith phase of software development and Mj is the jth PDCM applicable for selected Cs. The value of s varies from 1 to 3 representing the class of software as Aps, UIn and Wes respectively. Since only four phases are being considered, the value of i varies from 1 to 4. As shown in Table-1, upper bound of j is 6, 6 and 7 for class of software Aps, UIn and Wes respectively.

2. Productivity factor (Pf): Pf captures the utility of a PDCM in including usability in the best manner. Pf is the measure of efficiency and utility of a particular PDCM in a specific phase of software development. The computation of Pf is also performed phase-wise, using expression (2):

Pf(Cs(Pi(Mj))) = Af(Cs(Pi(Mj))) * Wi    (2)

where Wi is the weighing factor of the i-th phase of software development as denoted in Table-2.
3. Total Productivity factor (TPf): TPf is the measure of efficiency of a PDCM in general, i.e., across all phases of development. It is computed using expression (3) as the sum of the phase-wise Pf of each method:

TPf(Cs(Mj)) = Σ (i = 1 to 4) Pf(Cs(Pi(Mj)))    (3)

Now, prioritization of UDCM can be achieved through the computation of Af, Pf and then TPf for a class of software. Priority is assigned to UDCM on the basis of TPf: the higher the value of TPf, the higher the priority.
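As an illustration of how expressions (1)-(3) combine, the sketch below computes Af, Pf and TPf for a handful of methods and ranks them; the counts of satisfied requirements are made-up example numbers, not the paper's measured data.

# Hedged sketch of the Af / Pf / TPf computation (expressions 1-3); the
# requirement counts below are invented for illustration only.
PHASE_WEIGHTS = {"P1": 4, "P2": 3, "P3": 2, "P4": 1}      # Wi from Table-2
TOTAL_REQS    = {"P1": 32, "P2": 34, "P3": 11, "P4": 10}  # requirements per phase

# satisfied[method][phase] = number of usability requirements the method satisfies (assumed values)
satisfied = {
    "M3 Prototyping":     {"P1": 26, "P2": 29, "P3": 11, "P4": 7},
    "M5 Parallel Design": {"P1": 24, "P2": 31, "P3": 7,  "P4": 0},
    "M6 Available Info":  {"P1": 16, "P2": 13, "P3": 4,  "P4": 0},
}

def af(method, phase):
    # Expression (1): fraction of the phase's requirements the method satisfies.
    return satisfied[method][phase] / TOTAL_REQS[phase]

def tpf(method):
    # Expressions (2) and (3): weight each phase's Af by Wi and sum over the four phases.
    return sum(af(method, p) * PHASE_WEIGHTS[p] for p in PHASE_WEIGHTS)

# Higher TPf means higher priority for the UDCM.
ranking = sorted(((m, tpf(m)) for m in satisfied), key=lambda x: -x[1])
for rank, (m, score) in enumerate(ranking, start=1):
    print(f"priority {rank}: {m}  (TPf = {score:.3f})")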

V. Case Study
Here, for each standard software class, Af, Pf and TPf are computed. The results of the computation are shown in Table-3 and Table-4 respectively. Also, the prioritization of UDCM for the standard software classes, based upon TPf, is shown in Table-5. Further, two software systems from each class have been studied. The usability requirements along with the general requirements of these software systems have been analyzed and, passing through the phases of software development, Af, Pf and TPf are computed for each. The software systems referred to in the study are: Library Management System (LMS) and Hospital Management System (HMS) from the application software class; Railway Ticket Generator System (RTGS) and ATM System from the user interface class; and Matrimonial Website (MWes) and University Website (UWes) representing the website class. The results of the computation of Af corresponding to the above-mentioned software are shown in Table-6(a) and Table-6(b), whereas results representing Pf and TPf are illustrated in Table-7(a) and Table-7(b) for the same software and termed observed values. Further, the deviation of the observed values from the standard value of TPf has been evaluated in terms of percentage, as shown in Table-8. The deviation corresponds to an increment, a decrement or no change in the observed values. Considering a 5% variation between standard and observed values as negligible for positive and negative differences, it has been observed that the variation is bound to be obvious in some cases due to domain change. There exist many factors causing this deviation and affecting usable software development as well, including the type of system developed, the users of the system, the functioning of the system, domain-specific features, the simple requirements and usability requirements of the system, and most importantly the data collection methods used in different phases of software development.

Table-3: Af for standard software classes
Class  Method  AfP1   AfP2   AfP3   AfP4
Aps    M1      0.593  0.411  0.454  0.6
Aps    M2      0.593  0.529  0.454  0.5
Aps    M3      0.812  0.852  1      0.7
Aps    M4      0.687  0.617  0.727  0.7
Aps    M5      0.75   0.911  0.636  0
Aps    M6      0.5    0.382  0.363  0
UIn    M1      0.687  0.764  0.636  0.8
UIn    M2      0.718  0.794  0.636  0.7
UIn    M3      0.687  0.529  0.545  0.6
UIn    M4      0.875  0.941  0.636  0
UIn    M5      0.718  0.676  0.363  0
UIn    M6      0.843  0.882  0.727  1
Wes    M1      0.875  0.764  0.636  0.8
Wes    M2      0.656  0.588  0.818  0.9
Wes    M3      0.937  0.941  0.545  0.8
Wes    M4      0.781  0.617  0.727  0.8
Wes    M5      0.875  1      0.909  0
Wes    M6      0.875  0.941  1      0.8
Wes    M7      0.968  0.911  0.909  0

Table-4: Pf for standard software classes
Class  Method  PfP1   PfP2   PfP3   PfP4  TPf
Aps    M1      2.372  1.233  0.908  0.6   5.113
Aps    M2      2.372  1.587  0.908  0.5   5.367
Aps    M3      3.248  2.556  2      0.7   8.504
Aps    M4      2.748  1.851  1.454  0.7   6.753
Aps    M5      3      2.733  1.272  0     7.005
Aps    M6      2      1.146  0.726  0     3.872
UIn    M1      2.748  2.292  1.272  0.8   7.112
UIn    M2      2.872  2.382  1.272  0.7   7.226
UIn    M3      2.748  1.587  1.09   0.6   6.025
UIn    M4      3.5    2.823  1.272  0     7.595
UIn    M5      2.872  2.028  0.726  0     5.626
UIn    M6      3.372  2.646  1.454  1     8.472
Wes    M1      3.5    2.292  1.272  0.8   7.864
Wes    M2      2.624  1.764  1.636  0.9   6.924
Wes    M3      3.748  2.823  1.09   0.8   8.461
Wes    M4      3.124  1.851  1.454  0.8   7.229
Wes    M5      3.5    3      1.818  0     8.318
Wes    M6      3.5    2.823  2      0.8   9.123
Wes    M7      3.872  2.733  1.818  0     8.423

Table-5: Prioritization of UDCM for standard software classes
Priority  Aps                               UIn                               Wes
1         M3 Prototyping                    M6 Prototyping                    M6 User Testing
2         M5 Parallel Design                M4 Parallel Design                M3 Prototyping
3         M4 Scenarios                      M2 Competitive Analysis           M7 Using Available Information
4         M2 Competitive Analysis           M1 Comparative Study              M5 Parallel Design
5         M1 Comparative Study              M3 Scenarios                      M1 Comparative Study
6         M6 Using Available Information    M5 Using Available Information    M4 Scenarios
7         -                                 -                                 M2 Competitive Analysis

Table-6(b): Observed values of Af
System  Method  AfP1   AfP2   AfP3   AfP4
LMS     M1      0.56   0.558  0.454  0.5
LMS     M2      0.5    0.588  0.454  0.5
LMS     M3      0.843  0.852  1      0.9
LMS     M4      0.718  0.588  0.727  0.7
LMS     M5      0.75   0.941  0.636  0
LMS     M6      0.687  0.470  0.363  0
RTGS    M1      0.625  0.705  0.636  0.8
RTGS    M2      0.687  0.735  0.545  0.7
RTGS    M3      0.687  0.529  0.636  0.6
RTGS    M4      0.875  0.941  0.727  0
RTGS    M5      0.781  0.647  0.363  0
RTGS    M6      0.875  0.882  0.727  1
MWes    M1      0.875  0.735  0.727  0.8
MWes    M2      0.687  0.529  0.818  0.9
MWes    M3      0.937  0.941  0.636  0.9
MWes    M4      0.812  0.647  0.636  0.8
MWes    M5      0.875  1      0.909  0
MWes    M6      0.843  0.911  1      0.8
MWes    M7      0.937  0.911  0.909  0

Table-7(a): Observed values of Pf and TPf
System  Method  PfP1   PfP2   PfP3   PfP4  TPf
HMS     M1      2.5    1.41   0.908  0.5   5.318
HMS     M2      2.116  1.674  1.09   0.5   5.38
HMS     M3      3.124  2.556  2      1     8.68
HMS     M4      2.748  1.941  1.636  0.7   7.025
HMS     M5      2.872  2.823  1.272  0     6.967
HMS     M6      2.748  1.323  0.726  0     4.797
ATM     M1      2.748  2.205  1.272  0.8   7.025
ATM     M2      2.872  2.292  1.272  0.7   7.136
ATM     M3      3      1.65   1.635  0.6   6.885
ATM     M4      3.5    2.823  1.272  0     7.595
ATM     M5      3.124  1.941  0.726  0     5.791
ATM     M6      3.5    2.556  1.454  1     8.51
UWes    M1      3.5    2.115  1.272  0.8   7.687
UWes    M2      2.748  1.587  1.454  0.8   6.589
UWes    M3      3.748  2.823  1.272  0.9   8.743
UWes    M4      3.124  1.851  1.272  0.8   7.047
UWes    M5      3.5    3      1.818  0     8.318
UWes    M6      3.5    2.733  2      0.8   9.033
UWes    M7      3.872  2.733  1.818  0     8.423

Table-7(b): Observed values of Pf and TPf
System  Method  PfP1   PfP2   PfP3   PfP4  TPf
LMS     M1      2.24   1.674  0.908  0.5   5.322
LMS     M2      2      1.764  0.908  0.5   5.172
LMS     M3      3.372  2.556  2      0.9   8.828
LMS     M4      2.872  1.764  1.454  0.7   6.79
LMS     M5      3      2.823  1.272  0     7.095
LMS     M6      2.748  1.41   0.726  0     4.884
RTGS    M1      2.5    2.115  1.272  0.8   6.687
RTGS    M2      2.748  2.205  1.09   0.7   6.743
RTGS    M3      2.748  1.587  1.272  0.6   6.207
RTGS    M4      3.5    2.823  1.454  0     7.777
RTGS    M5      3.124  1.941  0.726  0     5.791
RTGS    M6      3.5    2.646  1.454  1     8.6
MWes    M1      3.5    2.205  1.454  0.8   7.959
MWes    M2      2.748  1.587  1.636  0.9   6.871
MWes    M3      3.748  2.823  1.272  0.9   8.743
MWes    M4      3.248  1.941  1.272  0.8   7.261
MWes    M5      3.5    3      1.818  0     8.318
MWes    M6      3.372  2.733  2      0.8   8.905
MWes    M7      3.748  2.733  1.818  0     8.299

Table-8: Deviation (%) of standard and observed TPf values
Class  Method  LMS / RTGS / MWes  HMS / ATM / UWes
Aps    M1      4.09               4.01
Aps    M2      3.63               0.24
Aps    M3      3.81               2.07
Aps    M4      0.55               4.03
Aps    M5      1.28               0.54
Aps    M6      26.14              23.89
UIn    M1      5.98               1.22
UIn    M2      6.68               1.25
UIn    M3      3.02               14.27
UIn    M4      2.40               0
UIn    M5      2.93               2.93
UIn    M6      1.51               0.45
Wes    M1      1.21               2.25
Wes    M2      0.77               4.84
Wes    M3      3.33               3.33
Wes    M4      0.44               2.52
Wes    M5      0                  0
Wes    M6      2.39               0.99
Wes    M7      1.47               0

Referring to Table-8, it is observed that for LMS an incremental difference of 26.14% was obtained for M6. This shows that method M6 gives better results when a manual system already exists. Similarly, for HMS an incremental difference of 23.89% was obtained. This result confirms that method M6 always shows a variation in results when the domain of the developed system changes and is known to the developer. All remaining methods of the Aps category show less than 5% variation. For the ATM system, an incremental difference of 14.27% was obtained for M3, showing that M3 gives better results because the working of an ATM system is known to the developer. For both M1 and M2, a decrementing difference of more than 5% was observed in the case of RTGS, exceeding the 5% threshold by only 0.98 and 1.68 percentage points respectively. Since RTGS is emerging software, we may conclude that for existing software with ample available information, M1 and M2 showcase better results in terms of usable software development. For the rest of the methods of the UIn category, less than 5% difference was observed. For MWes and UWes, less than 5% deviation was observed for all methods, and no deviation was obtained for M5 of the Wes class, i.e., Parallel Design, as expected. After performing the case study, it is observed that the priority of UDCM changes in the case of specific software, as shown in Table-9.

Table-9: Prioritization of UDCM

Priority  Aps                               UIn                               Wes
1         M1 Comparative Study              M5 Using Available Information    M3 Prototyping
2         M5 Parallel Design                M6 Prototyping                    M5 Parallel Design
3         M3 Prototyping                    M4 Parallel Design                M1 Comparative Study
4         M6 Using Available Information    M1 Comparative Study              M6 User Testing
5         M2 Competitive Analysis           M2 Competitive Analysis           M7 Using Available Information
6         M4 Scenarios                      M3 Scenarios                      M4 Scenarios
7         -                                 -                                 M2 Competitive Analysis

VI. Conclusion
UDCM help to collect data in all phases of software development while considering usability requirements. Prioritization of UDCM also reduces the effort needed to select appropriate data collection methods and makes it easier for the developer to fulfill usability requirements phase-wise. As a result, the developer will be able to fulfill most of the usability requirements and thus maximize user satisfaction by developing usable software. In this paper, prioritization of UDCM has been attempted for generalized software classes and for specific software systems as well. Further, a case study has been performed, and the results verified the proposed prioritization, indicating the change that occurs when domain-specific software is considered for development. It has been observed that some methods result in negligible deviation, whereas a particular method may deviate considerably for different kinds of software. This work concludes that complete generalization of UDCM may not yet be feasible, but there remains a
scope to investigate more specific UDCM on the basis of various parameters in order to get a more usable product.

References
1. J. Nielsen, The Usability Engineering Life Cycle, IEEE Computer Society, Volume 25, Issue 3, pp 12-22.
2. Y. Ormeno, J. Panch, N. Fernandez and O. Pastor, Towards a Proposal to Capture Usability Requirements Through Guidelines, Proceedings of the 7th IEEE International Conference on Research Challenges in Information Science, pp 1-12.
3. L. M. Cysneiros and A. Kushniruk, Bringing Usability to the Early Stages of Software Development, Proceedings of the 11th IEEE International Conference on Requirements Engineering, pp 359-360.
4. Y. Tao, Work in Progress - Introducing Usability Concepts in Early Phase of Software Development, 35th ASEE/IEEE Frontiers in Education Conference.
5. N. Bevan, N. Claridge, M. Maguire, M. Athousaki, Specifying and Evaluating Usability Requirements Using the Common Industry Format, Proceedings of the IFIP 17th World Computer Congress, Montreal, Canada, pp 133-148, Kluwer Academic Publishers.
6. N. Bevan, Usability Issues in Web Site Design, National Physical Laboratory, Usability Services, Teddington, Middx, TW11 0LW, UK.
7. T. Geis, W. Dzida and W. Redtenbacher, Specifying Usability Requirements and Test Criteria for Interactive Systems.
8. R. Pressman, Software Engineering: A Practitioner's Approach, Prentice Hall, 6th edition.
9. T. Tiwari, K. Paithankar, Identifying Parameters with Data Collection Methods for Software Development in View of Usability, 4th National Conference, INDIACom.
10. K. Paithankar and M. Ingle, Identification of Vital Factors by Analyzing Usability Requirements, Second International Conference on Advances in Computer Vision and Information Technology (ACVIT 2009), Babasaheb Ambedkar Marathwada University, Aurangabad, pp 578-587.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

A Review on Personalized Search Engine
Snehal D. Jadhav1, Vaishali P. Suryawanshi2
Department of Information Technology, MITCOE, Paud Road, Kothrud, Pune - 411038, Savitribai Phule Pune University, Pune, INDIA

Abstract: Trillions of pages are available on the World Wide Web, which can provide users with an effective source of information. Web search is the most frequent activity on the Internet but is still difficult when a user uses mobile devices with small screens and a default keypad. A major problem of current web search is that queries are short and obscure, and the results of such queries are not specific to user needs. To relieve this problem, some search engines suggest terms semantically related to the user-submitted queries so that the user can select terms as needed. In this paper, we provide a review of various personalization techniques that provide users with more relevant data. The first step of the personalization process includes techniques to extract the contents of the user query. The next step involves applying personalization algorithms to give the user more relevant search results.

Keywords: Ontology, Personalization, concept-based clustering, clickthrough, user profiling, concept.

I. Introduction
As the Internet expands, the number of indexed pages also increases. With such a large volume of data, it becomes more difficult to find the relevant information to satisfy user queries using a simple web search engine. The queries submitted by users to search engines are short and ambiguous; they are not able to express explicitly what the user needs exactly. As a result, many query results are retrieved which may be irrelevant to the user's query. To improve users' search, many search engines provide query suggestions to help users formulate their queries. When a user submits a query, terms relevant to the query are provided to help the user identify the terms he/she wants, thereby increasing the effectiveness of retrieval. In this way, however, the search engine provides the same semantically related words for a query without considering the user's personal specific interests. Wilfred Ng et al. proposed a method that provides personalized query suggestions based on a personalized concept-based clustering technique. This approach used clickthrough data to estimate the user's preferences and then provide personalized query suggestions. Dik Lun Lee et al. proposed a personalization technique that captures the user's preferences in the form of concepts by mining their clickthrough data. Ontology-based user profiling is used to capture the user preferences. Location concepts are also given importance in order to give more specific results related to the user profile. Three major functions are done in this personalization process: capturing user preferences, reranking results, and updating the user profile. User preferences are captured using the user's clickthrough data and are treated as positive preferences.

II. Literature Survey
Query clustering techniques have been developed in various ways. The very first query clustering techniques come from information retrieval studies. Similarity between queries was measured based on overlapping keywords or phrases in the queries. Each query is represented as a keyword vector, and similarity functions such as cosine similarity or Jaccard similarity were used to measure the distance between two queries.
One major limitation of this approach is that common keywords also exist in unrelated queries. For example, the queries "apple iPod" and "apple pie" appear very similar since they both contain the keyword "apple". Dik Lun Lee et al. [2] proposed a personalized concept-based clustering technique that accounts for the different meanings behind the queries submitted to search engines. For example, depending on the user, the query "apple" may refer to a fruit, the company Apple Computer, the name of a person, and so forth. Thus, providing personalized query suggestions (e.g., users interested in "apple" as a fruit get suggestions about fruit, while users interested in "apple" as a company get suggestions about the company's products) certainly helps users formulate more effective queries according to their needs. The proposed approach consists of the following four major steps. First, when a user submits a query, concepts (i.e., important terms or phrases in web-snippets) and their relations are mined online from web-snippets to build a concept relationship graph. Second, clickthroughs are collected to predict the user's conceptual preferences. Third, the concept relationship graph together with the user's conceptual preferences is used as input to a concept-based clustering algorithm that finds conceptually close queries. Finally, the most similar queries are suggested to the user for search refinement.

A. Personalized Concept-Based Clustering
Algorithm:
Input: A Query-Concept Bipartite Graph G

IJSWS 15-128; © 2015, IJSWS All Rights Reserved

Page 56


SnehalJadhav et al., International Journal of Software and Web Sciences, 11(1), December-2014 to February-2015, pp. 56-59

Output: A Personalized Clustered Query-Concept Bipartite Graph Gp
// Initial Clustering
1. Obtain the similarity scores in G for all possible pairs of queries using the noise-tolerant similarity function.
2. Merge the pair of most similar queries (qi, qj) that does not contain the same queries from different users.
3. Obtain the similarity scores in G for all possible pairs of concepts using the noise-tolerant similarity function.
4. Merge the pair of concepts (ci, cj) having the highest similarity score.
5. Unless termination is reached, repeat steps 1-4.
// Community Merging
6. Obtain the similarity scores in G for all possible pairs of queries using the noise-tolerant similarity function.
7. Merge the pair of most similar queries (qi, qj) that contains the same queries from different users.
8. Unless termination is reached, repeat steps 6 and 7.
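A rough sketch of the agglomerative merging idea behind the steps above is given below. It substitutes plain cosine similarity for the paper's noise-tolerant similarity function, uses invented query-to-concept click weights, and omits the initial-clustering/community-merging distinction, so it is only an illustration of the merging loop.

# Hedged sketch: iteratively merge the most similar query clusters (toy data, cosine similarity).
import math

# query -> {concept: clickthrough weight}; invented example data.
queries = {
    "apple pie":    {"recipe": 3, "fruit": 2},
    "banana bread": {"recipe": 4, "fruit": 1},
    "apple ipod":   {"gadget": 5, "music": 2},
}

def cosine(a, b):
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_once(clusters, threshold=0.5):
    # Find the most similar pair of clusters and merge them if similar enough.
    names = list(clusters)
    pairs = [(cosine(clusters[x], clusters[y]), x, y)
             for i, x in enumerate(names) for y in names[i + 1:]]
    score, x, y = max(pairs)
    if score < threshold:
        return False
    merged = {k: clusters[x].get(k, 0) + clusters[y].get(k, 0)
              for k in set(clusters[x]) | set(clusters[y])}
    del clusters[x], clusters[y]
    clusters[f"{x} + {y}"] = merged
    return True

clusters = dict(queries)
while len(clusters) > 1 and merge_once(clusters):
    pass
print(list(clusters))   # "apple pie" and "banana bread" end up in one cluster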

Figure. 1 Initial clustering

Figure 2. Community Merging

The main issue with this technique is that the timing of the start of community merging is important for the success of the algorithm; it affects the values of recall and precision, which in turn affect the accuracy of the algorithm.
B. Personalization Based on Ontology
The personalization approach [3] is based on "concepts" to profile the interests and preferences of a user. Therefore, an issue that has to be addressed is how to extract and represent concepts from the user's search results. In this work an ontology-based, multi-facet (OMF) profiling method is proposed, in which concepts can be further classified into different types, such as content concepts, location concepts, name entities, dates, etc. As an important first step, the focus is on two major types of concepts: content concepts and location concepts. A content concept, for example a keyword or key-phrase in a web page, defines the content of the page, whereas a location concept refers to a physical location related to the page.

Figure. 3. Personalization Process

1. Joachims' Method
In Joachims' method, scanning of results is done from top to bottom. The document with the highest rank is displayed first and the other results are displayed after it in descending order of ranking. If a user does not read a
document which is at a higher rank but reads a document at a lower rank, it means that the user is not interested in the document at the higher rank. Thus, Joachims' method concludes that the user prefers the lower-ranked document di to the higher-ranked document dj (denoted as dj <r0 di, where r0 is the user's preference order of the documents in the search result list).
2. SpyNB Method
Similar to Joachims' method, SpyNB learns user behavior models from preferences extracted from clickthrough data. SpyNB assumes that users only click on documents that are of interest to them, so the clicked documents can be treated as positive samples. However, documents that are not clicked are treated as unlabeled samples because they could be either relevant or irrelevant to the user.
3. RSVM
Ranking SVM is an application of SVM aimed at solving certain ranking problems; the main purpose is to improve the performance of an Internet search engine. Clickthrough data can be used as the input. RSVM aims at finding a linear ranking function that holds for as many document preference pairs as possible. It maps the similarities between queries and the clicked pages onto a certain space, calculates the weights between any two of the vectors obtained, and re-ranks the search results based on the weights. If a user has visited the GPS location lr, the weight of the corresponding location concept is incremented. Further, it is assumed that a location the user visited a long time ago is less important than a location the user has visited recently. Thus GPS plays an important role in location information.
C. Personalization Technique with Privacy
1. Concept and Entropy: The content and location entropies [4] are introduced for measuring the diversity of content and location information in the search results of a query. In addition, the click content and location entropies were introduced to determine how much a user is interested in the content and location information associated with a query.

2. Personalization Effectiveness: A query result set with high content/location entropy indicates that it has a high degree of ambiguity. Thus, applying personalization on the search results helps the user to find out the relevant information.

Chen et al. studied the problem of efficient query processing in location-based search systems [7]. A query is assigned a query footprint that specifies the geographical area of interest to the user, and several algorithms are employed to rank the search results as a combination of a textual and a geographic score.
3. Personalized Ranking Function: Once the user preferences are received, Ranking SVM is applied to obtain a personalized ranking function for rank adaptation of the search results according to the user's content and location preferences.
4. User Preferences Extraction and Privacy Preservation: Two feature vectors were introduced, namely the content feature vector and the location feature vector, to represent the content and location information associated with a document and to be used for training. PMSE [4] incorporates a user's physical locations in the personalization process. Experiments studying the influence of a user's GPS locations in personalization show that GPS locations help improve retrieval effectiveness for location queries (i.e., queries that retrieve a lot of location information).
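For readers who want to see how clickthrough data turns into the training preferences that Ranking SVM consumes, the sketch below applies the skip-above reading of clicks described earlier (a clicked result is taken to be preferred over unclicked results ranked above it); the result list and clicks are invented, and the surveyed papers' exact preference-mining rules are not reproduced.

# Hedged sketch: derive document preference pairs from one query's clickthrough log.
# Assumed toy data; a clicked document is treated as preferred over every
# unclicked document that was ranked above it (skip-above interpretation).
ranked_results = ["d1", "d2", "d3", "d4", "d5"]   # search-engine ranking, top first
clicked = {"d3", "d5"}                            # documents the user clicked

preference_pairs = []
for rank, doc in enumerate(ranked_results):
    if doc not in clicked:
        continue
    for skipped in ranked_results[:rank]:
        if skipped not in clicked:
            preference_pairs.append((doc, skipped))   # doc preferred over skipped

# These pairs would be fed to a pairwise learner such as Ranking SVM.
print(preference_pairs)   # [('d3', 'd1'), ('d3', 'd2'), ('d5', 'd1'), ('d5', 'd2'), ('d5', 'd4')]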

Figure.4. Parent-child relationships, 1) Ancestors, 2) Descendants, and 3) Sibling Concepts, in a concept ontology.

a) Content Feature Vector: If a content concept Ci appears in web snippet Sk, the value of the corresponding entry of the content feature vector is incremented by 1. Related concepts are then given as:

b) Location Feature Vector: If a location concept Li appears in a web snippet Dk, the value of the corresponding entry of the location feature vector is incremented by 1. Related location concepts are:
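The related-concept formulas referenced above did not survive the text extraction and are not reproduced here. As a rough illustration of just the counting step, the sketch below builds content and location feature vectors from a few snippets; the snippets, concept lists and naive tokenization are all assumptions.

# Hedged sketch of the basic feature-vector counting step; example data only.
from collections import Counter

content_concepts = {"hotel", "booking", "review"}
location_concepts = {"paris", "london"}

snippets = [
    "cheap hotel booking in paris with free cancellation",
    "london hotel review and booking tips",
]

content_vector, location_vector = Counter(), Counter()
for snippet in snippets:
    words = set(snippet.lower().split())              # naive tokenization, for illustration
    content_vector.update(words & content_concepts)   # +1 per snippet containing the concept
    location_vector.update(words & location_concepts)

print(dict(content_vector))    # {'hotel': 2, 'booking': 2, 'review': 1}
print(dict(location_vector))   # {'paris': 1, 'london': 1}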

III. Conclusion
The concept-based clustering technique used for personalization gives greater personalization of the user queries. The critical factor affecting the effectiveness of this technique was the time at which the agglomerative algorithm should be applied so as to get personalized clustering. Later, the ontology-based multi-facet profiling technique was introduced, which used content and location concepts in an ontology tree; here, privacy preservation was not provided for the user profiles and the user queries. Subsequently, the privacy of users was preserved by parameters such as the minimum distance of the leaf node from the root in the ontology, giving the user more accurate and secure results. The future scope of these personalization techniques may involve relating queries to the user name. By recording the places the user visited in the near past, frequent travelling patterns can also be found in order to give more specific results according to the location concept.

References

[1] Dr. C. R. Rene Robin and R. Divya, "Onto-Search: An Ontology Based Personalized Mobile Search Engine", 2014.
[2] K.W.-T. Leung, W. Ng, and D.L. Lee, "Personalized Concept-Based Clustering of Search Engine Queries," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[3] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, "Personalized Web Search with Location Preferences," Proc. IEEE Int'l Conf. Data Engineering (ICDE), 2010.
[4] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, "PMSE: A Personalized Mobile Search Engine," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 4, April 2013.
[5] E. Agichtein, E. Brill, and S. Dumais, "Improving Web Search Ranking by Incorporating User Behavior Information," Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[6] E. Agichtein, E. Brill, S. Dumais, and R. Ragno, "Learning User Interaction Models for Predicting Web Search Result Preferences," Proc. Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[7] Y.-Y. Chen, T. Suel, and A. Markowetz, "Efficient Query Processing in Geographic Web Search Engines," Proc. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[8] T. Joachims, "Optimizing Search Engines Using Clickthrough Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2002.
[9] Y. Xu, K. Wang, B. Zhang, and Z. Chen, "Privacy-Enhancing Personalized Web Search," Proc. Int'l Conf. World Wide Web (WWW), 2007.
[10] S. Yokoji, "Kokono Search: A Location Based Search Engine," Proc. Int'l Conf. World Wide Web (WWW), 2001.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Dynamic Data Storage and Replication Based on the Category and Data Access Patterns
Priya Deshpande1, Radhika Jaju2
1Assistant Professor, 2Student M.E. (I.T.), Department of Information Technology Engineering, MIT College of Engineering, Kothrud, Pune 411038, Maharashtra, INDIA
_________________________________________________________________________________________
Abstract: Nowadays, big data storage is becoming a tricky issue and a popular research topic. Dealing with large-scale massive data that needs to be stored efficiently and accessed easily is a crucial problem. This paper offers a solution to this problem. Category-wise data distribution helps to improve data access time, and ultimately it reduces job execution time, which improves the performance of the data grid. Here we use the K-Means algorithm to divide the data category-wise. We then work within the limited storage capacity of each node; due to this limited capacity we need to replace data, and to do this we need a replication strategy. So here we use different strategies depending on the user's data access pattern. This paper focuses on improving performance, reducing bandwidth consumption, and providing efficient and easy data access.

Keywords: Category, Popularity, Data Access Pattern
________________________________________________________________________________________
I. Introduction
The main feature of data grid technology, its storage capability, is becoming a popular research topic nowadays. Dynamic, massive, large-size data is stored through data grid technology with the help of the Internet. The size of such massive data is growing day by day, so there is a need for efficient handling of it. For efficient handling, data placement, replacement and data access should all be effective. Large amounts of data are generated regularly, and this data needs to be stored properly for efficient data access and also to prevent data loss, memory loss, data redundancy and data duplication. So:
- How to place the data?
- Where to place the data?
- Why to place the data? [3]
Hadoop is an open-source framework that works on the storage issue and plays an important role in distributed computing systems. The following diagram shows the process of storing data on Hadoop.

Figure 1: Hadoop Data Storage process. [7]

The figure shows the basic architecture of Hadoop. The namenode is connected to a number of datanodes. The client asks the namenode to store the data; the namenode checks the availability of datanodes and sends an acknowledgement to the client accordingly. The selected datanodes are pipelined together to store the data. The client then sends data directly to those datanodes, where it is stored [7]. Along with these good functionalities, Hadoop also has some disadvantages, such as redundancy, overwriting, overheads, data loading and data retrieval costs, which ultimately affect overall performance. So we need to find solutions to these issues. A number of strategies have been defined and executed to overcome them, and some have shown effective results. Our data placement strategy will give answers to the questions above. A number of strategies have been introduced and implemented for category-wise placement of data, and some have shown effective results. Depending on the data, jobs will be assigned to sites accordingly. Replacement strategies are used for better performance in data-intensive applications; many replacement strategies have been proposed, and some of them show better results. In this paper we try to apply different strategies to different types of data for better performance and to avoid overheads. Replacement of a file will depend on the access frequency of that file and also on the requirement for that file at that particular node. Jobs will be executed on the sites where most of the job-related data resides, so data transfer time will be reduced and bandwidth consumption will also be reduced.

II. Related Work
Many replication strategies have been implemented for replication purposes. Some strategies that we have studied for reference, basic and older strategies which show a good effect in replication, are as follows.
LRU (Least Recently Used): In this strategy the replica which has not been used for a long time is deleted to make room for replication, i.e., the file which has not been used recently is deleted. If the size of the deleted file is less than that of the new replica, then the next least recently used file is also deleted for the new replica placement, and the process goes on [1].
LFU (Least Frequently Used): This strategy is based on the access frequency of a file. The less popular file is replaced by the newer one. The files on which jobs will operate are stored in local storage, and the least accessed file is removed from storage for the new replica even if the file to be deleted was replicated recently [2].
DORS: In this work, different replication strategies are used for different access patterns. Replication is based on a replica's value; the replica value is calculated and files are replaced accordingly. This strategy considers parameters such as file size, network status and file access history [3].
Chang proposed LALW (Latest Access Largest Weight), in which the largest weight is applied to the file which has been accessed most recently. Similarly, Sato et al. presented a small modification to the simple replication algorithms on the basis of file access patterns and network capacity [4].
DRCP (Dynamic Replica Creation and Placement) proposes a placement policy and replica selection to reduce execution time and bandwidth consumption. Its replication is based on the popularity of the file, and the strategy was implemented using the data grid simulator OptorSim [5, 6].
In this paper we propose a strategy for data placement in which we store the data category-wise and accordingly apply different replication policies based on the data access patterns. The rest of the paper is organized as follows: Section III describes the system architecture and the storage of the data and defines the different replication strategies according to the data access pattern; Section IV gives the conclusion and future work; and the references are listed at the end.

Figure 2: Data Storage and Replacement System Architecture [8].

A. Data Distribution
Data storage is an important issue, as data size is increasing rapidly day by day. After the storage of the data, information retrieval is another big issue. The stored data is of different types, so retrieving the required data from a collection of different data is quite difficult. To retrieve the required data easily, some operations on the data need to be performed, such as dividing the data into different categories and storing this category-wise data on the different nodes where the request ratio for that particular data is high. Data will then be retrieved easily and the file transfer traffic ratio will be low, which ultimately affects the performance and cost of file transfer. Data is divided category-wise; for example, a plastic factory's data will have different categories such as vendor data, supplier data, etc. Different strategies have been studied to divide such data category-wise. K-means is one of the algorithms used for data categorization. It is a well-known partitioning algorithm in which objects are categorized as belonging to one of K groups, where K is given a priori. Depending on the multidimensional mean, i.e., the centroid of the cluster, the membership of an object in a particular cluster is decided: an object is assigned to the group with the closest centroid [10, 11]. K-means works by repeatedly calculating the centroid of each cluster, and it is cost-effective. The basic K-means algorithm is:
{
  Select K initial centroids (K points);
  DO {
    Create K clusters;           // i.e., assign each point to its closest centroid
    Recalculate the centroid of each cluster;
  } WHILE the centroids of the clusters still change;   // i.e., repeat until the centroids no longer change
}
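As a concrete counterpart to the pseudocode above, a minimal runnable K-means sketch is given below; the two-dimensional sample points, K = 2 and the convergence test are illustrative assumptions and not tied to the paper's data-grid implementation.

# Minimal K-means sketch (assumed toy 2-D data, K = 2); mirrors the pseudocode above.
import random

def kmeans(points, k, max_iter=100):
    centroids = random.sample(points, k)                 # select K initial centroids
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                  # assign each point to its closest centroid
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:                    # stop when centroids no longer change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 7.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)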

Whenever data comes to the job tracker, the job tracker invokes the K-means algorithm. K-means divides that data category-wise, and the categorized data is stored on the node assigned for that category. If the storage is full, the newly arriving data automatically propagates to the nearest node.
B. Replication
When we request certain files, say [f1, f2, f3, ..., fn], for executing a particular job, some of the files will be available in local storage and these files will be processed directly. But the files which are not in local storage have to be fetched from other nodes and stored on the local node before execution can be carried out. And as
we know that each node has limited storage capacity, what if the storage is full? Where should these files be stored? The answer is to delete some files from local storage and store the new ones. Again the question arises: which file should be deleted, and how many files must be deleted to store the new files? These are some of the key points we consider while applying our strategy. We achieve our strategy in two steps: the first is to decide whether a file from another node should be stored on the local device or not, and the second is to apply a different replacement strategy depending on the data access pattern.
1. The storage of a file depends on the replication factor. If a file has fewer copies than the replication factor it is copied; if it has more, it is not copied. The replication factor is decided on the basis of the ratio of the capacity of all the nodes to the total size of all the files:
R = C / W    [9]
where R is the replication factor, C is the capacity of all the nodes, and W is the total size of all files in the data grid. Here R decides whether to replicate a file or not: if the number of copies of a file is less than R then the file will be replicated, otherwise not.
2. Different replacement strategies are used for different data access patterns. We use the strategies which show better results for the particular data access pattern [8]. For example, for a random data access pattern LRU shows better performance, and the replacement depends on the request rate of the user; that is, the file which has not been used for a long time on that particular node is replaced by the new one. Deciding how many replicas to create and when to replicate the file depends on the replication factor. The category is also important for replication: depending on the category of the data, it goes to the particular node assigned for similar types of data. Here we assume that a particular node is assigned for a particular subject's information; for example, if a particular data file belongs to the chemistry domain then it will go to the node assigned for the chemistry domain.

IV. Conclusion and Future Work
In this paper, our data distribution strategy helps to improve data access time. The K-means algorithm divides the data category-wise and then sends it to the respective node assigned for that particular category. So when the user requests a file or stores a file, K-Means runs, goes to the particular category node, and operates on it. As our replication strategy is based on the user's data access pattern, the replica strategy depends on it. This shows better results than applying the same strategy for all types of access patterns. In the future we will also consider scheduling criteria, load balancing and recovery so that we can operate on the whole system and give better results.

References

IV. Conclusion and Future Work
In this paper, our data distribution strategy helps to improve data access time. The K-means algorithm divides the data category-wise and sends it to the node assigned to the particular category, so when a user requests or stores a file, K-means runs and the request is directed to the node of the corresponding category. Our replication strategy is based on the user's data access pattern, which shows better results than applying the same strategy to all types of access patterns. In future work we will also consider scheduling, load balancing, and recovery criteria so that the approach covers the whole system and gives better results.

References
[1] W.H. Bell, D.G. Cameron, R. Carvajal-Schiaffino, A.P. Millar, K. Stockinger, F. Zini, "Evaluation of an economy-based file replication strategy for a data grid," in Proc. of 3rd IEEE Int. Symposium on Cluster Computing and the Grid (CCGrid'2003), IEEE CS Press, Japan, 2003.
[2] W.H. Bell, D.G. Cameron, R. Carvajal-Schiaffino, A.P. Millar, K. Stockinger, F. Zini, "Evaluating Scheduling and Replica Optimization Strategies in Data Grid," IEEE, 2003.
[3] W. Zhao, et al., "A Dynamic Optimal Replication Strategy in Data Grid Environment," International Conference on Internet Technology and Applications, pp. 1-4, 2010.
[4] R.S. Chang, H.P. Chang, "A Dynamic Data Replication Strategy Using Access-Weights in Data Grids," Supercomputing, Vol. 45, No. 3, pp. 277-295, 2008.
[5] K. Sashi, A. Selvadoss Thanamani, "A New Replica Creation and Placement Algorithm for Data Grid Environment," IEEE International Conference on Data Storage and Data Engineering, 2010.
[6] K. Sashi, A. Selvadoss Thanamani, "Dynamic Replication in a Data Grid using a Modified BHR Region Based Algorithm," Elsevier, Future Generation Computer Systems, 2011.
[7] White, Tom. Hadoop: The Definitive Guide. Sebastopol: O'Reilly, 2010.
[8] Myunghoon Jeon, Kwang-Ho Lim, Hyun Ahn, Byoung-Dai Lee, "Dynamic data replication scheme in cloud computing environment," 2012 IEEE Second Symposium on Network Cloud Computing and Applications.
[9] Wolfgang Hoschek, Francisco Javier Jaén-Martínez, Asad Samar, Heinz Stockinger, and Kurt Stockinger, "Data Management in an International Data Grid Project," Proceedings of the First IEEE/ACM International Workshop on Grid Computing, Springer-Verlag, 2000, pp. 77-90.
[10] Chen G., Jaradat S., Banerjee N., Tanaka T., Ko M., and Zhang M., "Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data," Statistica Sinica, vol. 12, pp. 241-262, 2002.
[11] Osama Abu Abbas, "Comparisons between Data Clustering Algorithms," The International Arab Journal of Information Technology, Vol. 5, No. 3, July 2008.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net
Testing of Brain Tumor Segmentation Using Hierarchical Self Organizing Map (HSOM)
Dr. M. Anto Bennet, G. Sankar Babu, S. Lokesh, S. Sankaranarayanan
Department of Electronics and Communication Engineering, VELTECH, Avadi, Chennai-600062, Tamilnadu, INDIA
_________________________________________________________________________________________
Abstract: A robust segmentation tool for the detection of brain tumors is used to assist clinicians and researchers in radiosurgery applications. A clustering-based approach using the hierarchical self organizing map algorithm is proposed for MR image segmentation. The hierarchical self organizing map (HSOM) is a dynamically growing neural network model that evolves into a hierarchical structure according to the requirements of the input data during an unsupervised training process, and it has proved exceptionally successful for data visualization applications. HSOM is an extension of the conventional self organizing map and is used here to classify the image row by row. The technique achieves the lowest level of weight vector, a higher value for tumor pixels, and high computation speed through HSOM with vector quantization. HSOM segmentation is done in two phases: in the first phase the MRI brain image is acquired from the patient database and pre-processed to remove noise; in the second phase the tumor present in the brain and its severity are detected. Finally, the number of affected cells is counted using a row- and column-wise scanning method. The proposed system uses HSOM. A self-organizing map (SOM), or self-organizing feature map (SOFM), is a type of artificial neural network for unsupervised learning. SOMs operate in two modes, training and mapping: training is a competitive process, also called vector quantization, while mapping automatically classifies a new input vector. Segmentation is an important process for extracting information from complex medical images.
Keywords: HSOM, Image analysis, Magnetic Resonance Imaging (MRI), Segmentation, Tumor detection.
__________________________________________________________________________________________
I. INTRODUCTION
The existing system uses cellular automata (CA) based seeded tumor segmentation. Cellular automata were introduced to provide a formal framework for investigating the behavior of dynamic complex systems in which time and space are discrete. A CA model is composed of cells, a state set for each cell, a neighborhood, and a local rule.

[Figure 1 block labels: INPUT IMAGE; SELECTION OF VOI; FOREGROUND (RED) / BACKGROUND (BLUE); INITIALIZE PT = 0.5; TUMOR PROBABILITY MAP; RESULT]
Figure 1. Block diagram of cellular automata (CA) based seeded tumor segmentation.
Figure 1 shows the block diagram of the existing system: the region of interest is selected first and then segmentation is performed. A cellular automaton is essentially a computer algorithm that is discrete in space and time and operates on a lattice of cells. Cellular automata have attracted researchers from various fields in both the physical and social sciences because of their simplicity and potential for modeling complex systems. Formally, a cellular automaton is a triple A = (S, N, γ), where S is a nonempty set called the state set, N is the neighborhood, and γ is the local transition function (rule). The grow-cut method uses continuous-state cellular automata to interactively label images using user-supplied seeds. The cells correspond to image pixels, and the feature vector is the RGB or gray-scale intensity. The automaton is initialized by assigning the corresponding labels at the seeds with a strength value between 0 and 1, where a higher value reflects higher confidence in choosing the seed; strengths for unlabeled cells are set to 0. The ultimate aim of image processing applications is to extract important features from the image data, from which a description, interpretation, or understanding of the scene can be provided by the machine.
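To make the grow-cut idea concrete, here is a minimal sketch of grow-cut-style label propagation under simplifying assumptions (gray-scale image, 4-neighborhood, fixed iteration count). It is illustrative only, not the existing system's code.

```python
# Minimal grow-cut sketch: labeled cells "attack" their neighbors; a neighbor is
# relabeled when the attack strength g * theta_q exceeds its own strength.
import numpy as np

def grow_cut(image, seeds, n_iter=50):
    img = image.astype(float) / max(image.max(), 1e-9)
    labels = seeds.copy()
    strength = (seeds > 0).astype(float)        # seed strength 1.0, unlabeled 0.0
    h, w = img.shape
    for _ in range(n_iter):
        new_labels, new_strength = labels.copy(), strength.copy()
        for y in range(h):
            for x in range(w):
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] > 0:
                        g = 1.0 - abs(img[y, x] - img[ny, nx])   # attack strength
                        if g * strength[ny, nx] > new_strength[y, x]:
                            new_strength[y, x] = g * strength[ny, nx]
                            new_labels[y, x] = labels[ny, nx]
        labels, strength = new_labels, new_strength
    return labels

# Toy usage: bright blob on a dark background; seed label 1 = tumor, 2 = background.
img = np.zeros((20, 20)); img[8:14, 8:14] = 200
seeds = np.zeros((20, 20), dtype=int); seeds[10, 10] = 1; seeds[1, 1] = 2
print((grow_cut(img, seeds) == 1).sum(), "pixels labeled as tumor")
```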


The segmentation of brain tumors from magnetic resonance images is an important but time-consuming task performed by medical experts. The digital image processing community has developed several segmentation methods.
II. DIAGNOSIS OF BRAIN TUMOR
Identifying a brain tumor usually involves a neurological examination, brain scans, and/or an analysis of the brain tissue. Doctors use the diagnostic information to classify the tumor from the least aggressive (benign) to the most aggressive (malignant). In most cases, a brain tumor is named for the cell type of origin or its location in the brain. Identifying the type of tumor helps doctors determine the most appropriate course of treatment.
MRI Scan
MRI (Magnetic Resonance Imaging) is a scanning technique that uses magnetic fields and computers to capture images of the brain on film. It does not use X-rays. It provides pictures from various planes, which permit doctors to create a three-dimensional image of the tumor. The MRI detects signals emitted from normal and abnormal tissue, providing clear images of most tumors.

Figure 2. MRI of a normal brain. Figure 3. MRI of a brain with a tumor.
Figure 2 shows the MRI of a normal brain, giving a clear view of the brain; the brain consists of different parts on which the sense organs depend. Figure 3 shows the MRI of a brain with a tumor. A tumor may be present in any portion, and depending on the number of cells affected it is possible to plan treatment.
CT Scan
A CT or CAT scan (Computed Tomography) combines sophisticated X-ray and computer technology. CT can show a combination of soft tissue, bone, and blood vessels. CT images can reveal some types of tumors, as well as help detect swelling, bleeding, and bone and tissue calcification. Usually, iodine is the contrast agent used during a CT scan.

Figure 4. CT scan image of a brain tumor.
Figure 4 shows the CT scan image of a brain tumor. CT is an extended form of X-ray imaging, and hence a clear description of the tumor cannot be obtained from it.
III. HSOM ALGORITHM
The hierarchical self-organizing map (HSOM) is an artificial neural network model that has proved exceptionally successful for data visualization applications, where a mapping from a usually very high-dimensional data space into a two-dimensional representation space is required. The remarkable benefit of HSOMs in this kind of application is that the similarity between the input data, as measured in the input data space, is preserved as faithfully as possible within the representation space. Thus, the similarity of the input data is mirrored to a very large extent in terms of geographical vicinity within the representation space. Image segmentation techniques can be classified as based on edge detection, region or surface growing, threshold level, classifiers such as the hierarchical self organizing map (HSOM), and feature vector clustering or vector quantization.


Vector quantization has proved to be a very effective model for the image segmentation process. Vector quantization is the process of partitioning an n-dimensional vector space into M regions so as to optimize a criterion function when all the points in each region are approximated by the representative vector Xi associated with that region. There are two processes involved in vector quantization: the training process, which determines the set of codebook vectors according to the probability distribution of the input data, and the encoding process, which assigns input vectors to the codebook vectors. HSOM combines this with the idea of regarding the image segmentation process as one of data abstraction, where the segmented image is the final domain-independent abstraction of the input image. The hierarchical structure produced by the hierarchical segmentation process is called the abstraction tree. The abstraction tree bears some resemblance to the familiar quad-tree data structure used in several image processing and image analysis algorithms. Researchers in this field have used SOM or HSOM separately as a tool for the segmentation of MRI brain images for tumor analysis.
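To make the training/encoding distinction concrete, here is a minimal sketch of codebook learning by competitive (SOM-style) updates on pixel intensities. It is an illustrative simplification, not the authors' HSOM implementation.

```python
# Minimal vector-quantization sketch: learn a small codebook of pixel intensities
# by competitive updates (the "training" process), then encode pixels by nearest
# codebook vector (the "encoding" process). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.8, 0.05, 500)])

M = 4                                # number of codebook vectors (regions)
codebook = rng.random(M)             # random initialization
lr = 0.5
for epoch in range(20):
    for x in rng.permutation(pixels):
        winner = np.argmin(np.abs(codebook - x))        # best-matching unit
        codebook[winner] += lr * (x - codebook[winner]) # move winner toward sample
    lr *= 0.8                                           # decay the learning rate

encoded = np.argmin(np.abs(pixels[:, None] - codebook[None, :]), axis=1)
print("codebook:", np.round(codebook, 3))
print("pixels per region:", np.bincount(encoded, minlength=M))
```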

[Figure 5 block labels: DICOM IMAGE; PRE-PROCESSING; IMAGE ENHANCEMENT; HSOM; BINARIZATION; AFFECTED CELL COUNTING; RESULTANT OUTPUT]
Figure 5. Block diagram of the hierarchical self-organizing map (HSOM) method.
Figure 5 shows the block diagram of the proposed system. The process is carried out in two phases: the first phase produces the segmented output with the help of the HSOM algorithm, and during the second phase the number of affected cells is calculated, from which the severity of the tumor is determined.
DICOM Image
The Image Processing Toolbox supports writing files in Digital Imaging and Communications in Medicine (DICOM) format using the dicomwrite function. Here the DICOM image is converted into JPEG format so that pre-processing of the image becomes easier.
Preprocessing
The MRI contains film artifacts, or labels, such as the patient name, age, and marks. Film artifacts are removed using a tracking algorithm: starting from the first row and the first column, intensity values greater than the threshold value are removed from the MRI, which eliminates the high-intensity film artifacts from the MRI brain image. After removal of the film artifacts, the image still contains salt-and-pepper noise.
Image Enhancement
The image is passed to the enhancement stage to remove the remaining high-intensity components and the above noise. This stage enhances smoothness towards piecewise-homogeneous regions and reduces edge-blurring effects. The proposed system uses a weighted median filter for removing the high-frequency components, after which the HSOM algorithm is applied and the segmented output is obtained. Figure 6 shows a satellite image in which the details are not clearly visible; among the various transformations available, the gamma transformation is used to enhance this image.

Figure 6. Original satellite image. Figure 7. Enhanced satellite image.
Figure 7 shows the enhanced satellite image. Image enhancement is performed using the power-law (gamma) transformation s = c * r^γ, with γ = 5.0. Finally, binarization is carried out, from which the number of affected cells is counted. Binarization is the process of converting a grey-scale image into a black-and-white image.
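A minimal sketch of the enhancement, binarization, and cell-counting steps described above, assuming a grey-scale image normalized to [0, 1]. The filter choice (a plain median filter standing in for the weighted median filter) and the threshold are illustrative, not the authors' exact parameters.

```python
# Minimal sketch: median filtering, gamma (power-law) enhancement s = c * r**gamma,
# binarization, and counting of "affected" (foreground) pixels.
import numpy as np
from scipy.ndimage import median_filter

def enhance_and_count(image, gamma=5.0, c=1.0, threshold=0.5):
    r = image.astype(float)
    r = (r - r.min()) / max(r.max() - r.min(), 1e-9)   # normalize to [0, 1]
    r = median_filter(r, size=3)                        # suppress salt-and-pepper noise
    s = c * np.power(r, gamma)                          # power-law enhancement
    binary = s > threshold                              # binarization
    return binary, int(binary.sum())                    # count of affected cells

img = np.zeros((64, 64)); img[20:30, 20:30] = 0.9      # toy bright region ("tumor")
_, affected = enhance_and_count(img)
print("affected cells:", affected)
```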


IV. SYSTEM IMPLEMENTATION
Image segmentation is a technique that divides an image into regions with different features and extracts the targets of interest. The features can be pixel grayscale, color, texture, etc. Predefined targets can correspond to a single region or to multiple regions. To illustrate the level at which image segmentation sits in image processing, the concept of "image engineering" is introduced; it brings the theory, methods, algorithms, tools, and equipment involved in image segmentation into an overall framework.

[Figure 8 block labels: IMAGE UNDERSTANDING (high level, symbol); IMAGE ANALYSIS (middle level, target); IMAGE PROCESSING (low level, pixel)]

Figure 8. Levels of segmentation.
Figure 8 shows the three levels of segmentation. Image engineering is a new subject for research and application in the image field, and its content is very rich. According to the degree of abstraction and the research methods involved, it can be divided into three levels: image processing, image analysis, and image understanding. Target expression based on segmentation, together with feature extraction and parameter measurement, converts the original image into a more abstract and more compact form, making higher-level image analysis and understanding possible. In practice, image segmentation is applied very widely and appears in almost all areas related to image processing, involving various types of images: for example, satellite image processing in remote sensing, brain MR image analysis in medicine, segmentation of license-plate regions of offending vehicles in traffic image analysis, and extraction of regions of interest in object-oriented image compression and content-based image retrieval. Image segmentation is usually used for image analysis, identification, and compression coding. Since the accuracy of region extraction directly affects the effectiveness of the subsequent tasks, the method and accuracy of segmentation are very important.
V. METHOD FOR IMAGE SEGMENTATION
Image segmentation is the first important process in numerous applications of computer vision. It partitions the image into different meaningful regions with homogeneous characteristics using discontinuities or similarities of image components, and the subsequent processes rely heavily on its performance. In most cases, the segmentation of color images proves more useful than the segmentation of monochrome images, because a color image expresses many more image features than a monochrome one: each pixel is characterized by a great number of combinations of RGB chromatic components. However, more complicated segmentation techniques are required to deal with the rich chromatic information in the segmentation of color images. According to the usage of prior knowledge of the image, color images can be segmented in an unsupervised or supervised way; the former attempts to construct the "natural grouping" of the image without using any prior knowledge.
Unsupervised Segmentation
Spatial compactness and color homogeneity are two desirable properties in unsupervised segmentation, which lead to image-domain and feature-space based segmentation techniques. According to the strategy of spatial grouping, image-domain techniques include split-and-merge, region growing, and edge detection techniques, and there have been extensive studies on them in the literature. The Markov Random Field (MRF) is defined on the quad-tree structure to represent the continuity of color regions in the process of split-and-merge. Histogram thresholding is a technique that seeks the peaks or valleys in three color histograms or in a three-dimensional (3-D) histogram. HSV histograms are used for the segmentation of color images: the achromatic regions are determined by the saturation values, and the remaining chromatic regions are segmented by thresholding the peaks of the hue histogram. A 3-D color histogram is built from the color components, and the valleys of the color histogram are identified by the watershed algorithm. Nonparametric clustering is a promising solution for color clustering.
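As an illustration of the histogram-thresholding idea mentioned above, here is a minimal sketch that finds the two highest peaks of a 1-D intensity histogram and thresholds at the deepest valley between them. This is a simplification under assumed parameters; the HSV-based variant in the text would apply the same idea to the hue channel.

```python
# Minimal histogram-thresholding sketch: threshold at the valley between the two
# dominant histogram peaks. Illustrative only.
import numpy as np

def valley_threshold(gray, bins=64):
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p1 = int(np.argmax(hist))                      # highest peak
    masked = hist.copy(); masked[max(p1 - 2, 0):p1 + 3] = 0
    p2 = int(np.argmax(masked))                    # second peak, away from the first
    lo, hi = sorted((p1, p2))
    valley = lo + int(np.argmin(hist[lo:hi + 1]))  # deepest valley between the peaks
    return 0.5 * (edges[valley] + edges[valley + 1])

rng = np.random.default_rng(1)
gray = np.concatenate([rng.normal(0.25, 0.05, 3000), rng.normal(0.75, 0.05, 1000)]).clip(0, 1)
t = valley_threshold(gray)
print(f"threshold = {t:.2f}, foreground fraction = {(gray > t).mean():.2f}")
```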
Supervised Segmentation
In supervised segmentation, a pixel classifier is trained for the best partition of the color space using samples of the object colors, and the image is segmented by assigning each pixel to one of the predefined classes. The common techniques of supervised segmentation that have been evaluated in the literature include maximum likelihood, decision trees, nearest neighbor, and neural networks. Supervised segmentation has also been employed for the segmentation of video shots.


The segmentation of image frames is hierarchized by three classifiers, i.e., k-nearest neighbor, naive Bayes, and support vector machine. In other work, image segmentation is performed by a procedure of supervised pixel classification, where the rule of minimum-distance decision is used to assign each pixel to a specific class in a color-texture space. Histogram-based approaches can also be quickly adapted to operate over multiple frames while maintaining their single-pass efficiency.
Histogram-Based Methods
Histogram-based methods are very efficient compared to other image segmentation methods because they typically require only one pass through the pixels. The histogram can be computed in multiple fashions when multiple frames are considered. The same approach that is taken with one frame can be applied to multiple frames, and after the results are merged, peaks and valleys that were previously difficult to identify are more likely to be distinguishable. The histogram can also be applied on a per-pixel basis, where the resulting information is used to determine the most frequent color for each pixel location. This approach segments based on active objects and a static environment, resulting in a different type of segmentation useful in video tracking.
Edge Detection
Edge detection is one of the fundamental steps in image processing, image analysis, image pattern recognition, and computer vision techniques. During recent years, however, substantial and successful research has also been carried out on computer vision methods that do not explicitly rely on edge detection as a pre-processing step. The well-known and early Sobel operator is based on the following filters:

Gx = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]],  Gy = [[+1, +2, +1], [0, 0, 0], [-1, -2, -1]]

These are estimates of the first-order derivatives; the gradient magnitude is then computed as |G| = sqrt(Gx^2 + Gy^2). In natural images, edges are rarely ideal step edges, because of:
1. Focal blur caused by a finite depth-of-field and finite point spread function.
2. Penumbral blur caused by shadows created by light sources of non-zero radius.
3. Shading at a smooth object.
Region Growing Methods
The first region growing method was the seeded region growing method. This method takes a set of seeds as input along with the image; the seeds mark each of the objects to be segmented. The regions are iteratively grown by comparing all unallocated neighboring pixels to the regions. The difference between a pixel's intensity value and the region's mean, δ, is used as a measure of similarity, and the pixel with the smallest difference measured this way is allocated to the respective region. This process continues until all pixels are allocated to a region. Seeded region growing requires seeds as additional input, so the segmentation results depend on the choice of seeds, and noise in the image can cause the seeds to be poorly placed. Unseeded region growing is a modified algorithm that does not require explicit seeds: it starts off with a single region A1, and the pixel chosen to start it does not significantly influence the final segmentation.
Semi-Automatic Segmentation
In this kind of segmentation, the user outlines the region of interest with mouse clicks, and algorithms are applied to find the path that best fits the edge of the object. Techniques like SIOX, Livewire, or Intelligent Scissors are used in this kind of segmentation.
Neural Network Segmentation
Neural network segmentation relies on processing small areas of an image using an artificial neural network or a set of neural networks. After such processing, the decision-making mechanism marks the areas of the image according to the category recognized by the neural network.
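For concreteness, a minimal sketch of the seeded region growing procedure described in the "Region Growing Methods" subsection above. It is a simplified single-seed variant with a stopping tolerance rather than growing until every pixel is assigned; all parameters are illustrative.

```python
# Minimal seeded region growing sketch: repeatedly allocate the unassigned neighbor
# whose intensity is closest to the region mean. Single seed, 4-neighborhood.
import heapq
import numpy as np

def seeded_region_grow(image, seed, tol=0.2):
    img = image.astype(float)
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    total, count = img[seed], 1
    frontier = []

    def push_neighbors(y, x):
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                delta = abs(img[ny, nx] - total / count)   # distance to region mean
                heapq.heappush(frontier, (delta, ny, nx))

    push_neighbors(*seed)
    while frontier:
        delta, y, x = heapq.heappop(frontier)
        if region[y, x]:
            continue
        if abs(img[y, x] - total / count) > tol:           # stop growing past the tolerance
            continue
        region[y, x] = True
        total, count = total + img[y, x], count + 1
        push_neighbors(y, x)
    return region

img = np.zeros((30, 30)); img[10:20, 10:20] = 1.0
print("region size:", seeded_region_grow(img, (15, 15)).sum())   # expect 100
```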

VI. SIMULATED RESULTS
Figure 9. Input image A.
Figure 9 shows an input image collected from the tumor diagnosis center. It is an MRI scan in which the tumor is not very severe. A GUI program analyzes the tumor and reports the cells affected by it.


Figure 10. Output of image A.
Figure 10 shows the output for image A. The analysis shows that 2076 cells are affected and that the analysis takes 3.40082 seconds; the tumor present in the brain is clearly segmented.

Figure 11. Input image B.
Figure 11 shows another input image collected from the tumor diagnosis center. It is an MRI scan in which the tumor is not very severe. A GUI program analyzes the tumor and reports the cells affected by it.


Figure 12. Output of image B.
Figure 12 shows the output for image B. The analysis shows that 3803 cells are affected and that the analysis takes 2.26981 seconds; the tumor present in the brain is clearly segmented.

Figure 13. Input image C.
Figure 13 shows a third input image collected from the tumor diagnosis center. It is an MRI scan in which the tumor is not very severe. A GUI program analyzes the tumor and reports the cells affected by it.

Figure 14. Output of image C.
Figure 14 shows the output for image C. The analysis shows that 115 cells are affected and that the analysis takes 1.77841 seconds; the tumor present in the brain is clearly segmented.


VII. CONCLUSION
The proposed method uses the hierarchical self organizing map algorithm, by which the red, blue, and green components are separated; of these, the red component is considered best suited for segmentation. Segmentation is carried out on the red component and the tumor region is identified. The HSOM algorithm reports how many tumor cells are affected, and the time taken to detect the tumor cells is also provided. Segmentation of brain tumors can be done using different segmentation methodologies; hence different techniques such as the K-means clustering algorithm, rule-based algorithms, and fuzzy c-means clustering will be implemented and a comparative analysis of all of them will be carried out, thereby demonstrating the advantage of using the HSOM algorithm.
REFERENCES

[1] Andac Hamamci, Nadir Kucuk, Kutlay Karaman, Kayihan Engin, and Gozde Unal (2012), "Tumor-Cut: Segmentation of Brain Tumors on Contrast Enhanced MR Images for Radiosurgery Applications," IEEE Transactions on Medical Imaging, Vol. 31, No. 3.
[2] Fedde van der Lijn, Marleen de Bruijne, Stefan Klein, Tom den Heijer, Yoo Y. Hoogendam, Aad van der Lugt, Monique M. B. Breteler, and Wiro J. Niessen (2012), "Automated Brain Structure Segmentation Based on Atlas Registration and Appearance Models," IEEE Transactions on Medical Imaging, Vol. 31, No. 2.
[3] Shanhui Sun, Christian Bauer, and Reinhard Beichel (2012), "Automated 3-D Segmentation of Lungs With Lung Cancer in CT Data Using a Novel Robust Active Shape Model Approach," IEEE Transactions on Medical Imaging, Vol. 31, No. 2.
[4] Nazem-Zadeh M.R., Davoodi-Bojd E., and Soltanian-Zadeh H. (2011), "Atlas-based fiber bundle segmentation using principal diffusion directions and spherical harmonic coefficients," NeuroImage, vol. 54, pp. S146-S164.
[5] Gooya, G. Biros, and Davatzikos C. (2011), "Deformable registration of glioma images using EM algorithm and diffusion reaction modeling," IEEE Trans. Med. Imag., vol. 30, no. 2, pp. 375-390.
[6] Menze, Leemput K.V., Lashkari D., Weber M.A., Ayache N., and Golland P. (2010), "A generative model for brain tumor segmentation in multimodal images," Med. Image Comput. Comput. Assist. Intervent., vol. 13, pp. 151-159.
[7] Hamamci, Unal G., Kucuk N., and Engin K. (2010), "Cellular automata segmentation of brain tumors on post contrast MR images," in MICCAI, New York: Springer, pp. 137-146.
[8] Kauffmann and Pich N. (2010), "Seeded ND medical image segmentation by cellular automaton on GPU," Int. J. Comput. Assist. Radiol. Surg., vol. 5, pp. 251-262.
[9] Bai X. and Sapiro G. (2009), "Geodesic matting: A framework for fast interactive image and video segmentation and matting," Int. J. Comput. Vis., vol. 82, pp. 113-132.
[10] Couprie C., Grady L., Najman L., and Talbot H. (2009), "Power watersheds: A new image segmentation framework extending graph cuts, random walker and optimal spanning forest," in ICCV, pp. 731-738.
[11] Biswas T. (2009), "Stereotactic radiosurgery for glioblastoma: Retrospective analysis," Radiation Oncology, vol. 4, no. 11, p. 11.
[12] Eisenhauer E., Therasse P., Bogaerts J., Schwartz L., Sargent D., Ford R., Dancey J., Arbuck S., Gwyther S., Mooney M., Rubinstein L., Shankar L., Dodd L., Kaplan R., Lacombe D., and Verweij J. (2009), "New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)," Eur. J. Cancer, vol. 45, no. 2, pp. 228-247.
[13] Criminisi, Sharp T., and Blake A. (2008), "GeoS: Geodesic image segmentation," in Comput. Vis. ECCV, vol. 5302, pp. 99-112.
[14] Szeliski R., Zabih R., Scharstein D., Veksler O., Kolmogorov V., Agarwala A., Tappen M., and Rother C. (2008), "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 1068-1080.
[15] Sinop and Grady L. (2007), "A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm," in ICCV, pp. 1-8.
[16] Angelini E.D., Clatz O., Mandonnet E., Konukoglu E., Capelle L., and Duffau H. (2007), "Glioma dynamics and computational models: A review of segmentation, registration, and in silico growth algorithms and their clinical applications," Curr. Med. Imag. Rev., vol. 3, no. 4, pp. 262-276.
[17] Alvino C., Unal G., Slabaugh G., Peny B., and Fang T. (2007), "Efficient segmentation based on eikonal and diffusion equations," Int. J. Comput. Math., vol. 84, pp. 1309-1324.
[18] Archip N., Jolesz F., and Warfield S. (2007), "A validation framework for brain tumor segmentation," Acad. Radiol., vol. 14, no. 10, pp. 1242-1251.
[19] Grady L. (2006), "Random walks for image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1768-1783.
[20] Liu J., Udupa J.K., Odhner D., Hackney D., and Moonis G. (2005), "A system for brain tumor volume estimation via MR imaging and fuzzy connectedness," Comput. Med. Imag. Graph., vol. 29, pp. 21-34.
[21] Vezhnevets V. and Konouchine V. (2005), "GrowCut: interactive multi-label N-D image segmentation by cellular automata," presented at Graphicon, Novosibirsk Akademgorodok, Russia.
[22] Kari J. (2005), "Theory of cellular automata: A survey," Theoretical Comput. Sci., vol. 334, no. 1-3, pp. 3-33.
[23] Prastawa M., Bullitt E., and Gerig G. (2005), "Synthetic ground truth for validation of brain tumor MRI segmentation," in MICCAI, New York: Springer, pp. 26-33.
[24] Warfield S., Zou K., and Wells W. (2004), "Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation," IEEE Trans. Med. Imag., vol. 23, no. 7, pp. 903-921.
[25] Zou K.H., Warfield S.K., Bharatha A., Tempany C.M.C., Kaus M.R., Haker S.J., Wells W.M., Jolesz F.A., and Kikinis R. (2004), "Statistical validation of image segmentation quality based on a spatial overlap index," Acad. Radiol., vol. 11, no. 2, pp. 178-189.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net
Providing Efficient, Quality Web Services as per Consumer Requirements by Composing Web Services on Demand
Raghu Rajalingam1, R. Prabhu Doss2
1 PG Scholar, Dept. of Computer Science and Engineering, Prist University, Tanjore, Tamilnadu, INDIA
2 Asst. Professor, Dept. of Computer Science and Engineering, Prist University, Tanjore, Tamilnadu, INDIA
_____________________________________________________________________________________
Abstract: In today's environment of rapidly changing technology and customer needs, everyone wants to consume IT services of better quality. Web services are here taken to mean IT services provided by third parties known as service providers. These days no organization wants to invest heavily in IT services or IT infrastructure, because it is becoming more expensive to maintain. Everyone who wants a cost-efficient and quick way to integrate their IT requirements is moving towards web-service-enabled IT solutions, and organizations are focusing on integrating existing in-house IT solutions with externally available ones. In this space, web services dominate the software industry: organizations are converting their existing solutions into some form of web service and offering the services to customers. The existing web service technology is extended here to completely customize the way services are composed as per the customer requirement, so as to add value and provide the maximum benefit to the customer. Dynamically composing web services based on user constraints and providing them to the customer is always challenging. This paper focuses on how to compose web services dynamically based on user constraints; the proposal is a win-win for both the service provider and the service consumer.
Keywords: Service oriented computing; composition of services; dynamic composition; UDDI registry; web services.
___________________________________________________________________________________

I. Introduction In today’s fast moving technology world everyone wants to have better quality of service whatever they want to use with their constraints addressed. Here user constraints wary from user to user and it cannot be same for everyone. User wants to use a better reliable service or user wants to use less expensive services or with multiple constraints such as less expensive with more reliable service constraints. Currently lots of research is focused on in this area how to compose web services dynamically to address user constraints. Composition of web services plays a critical role in case of B2B and integration of enterprise application. Here some time composition of web service may be a single service or it could be more than one dependent different web service to complete the user request. Most of the time service composition consumes lots of development time as well as to create new application. Now there is another parameter is added called user constraint and this will add more time to the existing problem. The problem with current scenario is Service Consumer (SC) and Service Provider (SP) get in to legal agreement for consuming and providing service for a pre-determined duration such as 1 years or 2 year time frame. SC has to use the services offered by the SP with whom he has a legal agreement for that duration whether the services offered by him is good or bad. Imagine if services offered by the SP are really bad or not meeting the expectation of the SC, then he has to terminate the agreement to find better SP or give feed back to the SP that he is not happy with his service. Here SC’s are forced to use the available services and SP dominate with their services without any improvement. SC doesn’t have a much choice as there are few SP’s and there is no much competition to improve the quality of the services. How about provide a multiple option’s to SC’s so that the overall service delivery quality will improve. SC’s will choose the service only meet their explicitly expressed constraints. Here SC’s are not tied to any SP’s and they can choose any service as per their requirement as well as their constraints. This will create a healthy competition among the SP’s to provide better quality of service. Following are some of issues need to be taken care.  Web services are dynamically created and updated so the decision should be taken at execution time and based on recent information.  How to access easily the different web services provided by the different service provider with different conceptual data model without any technical difficulties.  How to ensure the access of the web services allowed only for the authorized user. It should not be the case where everyone accesses all web services. Web services can be categorizedin two ways on the basis of their functionality.  Semantic annotation describes the functionality of web service and  Functional annotation describes how it performs itsfunctionality.


WSDL (Web Services Description Language) is used for the specification of the messages that are used for communication between service providers and consumers. Web services can be composed in two ways: static web service composition and automated/dynamic web service composition. In static web service composition, web services are composed manually, i.e., each web service is executed sequentially, one by one, to fulfill the service consumer's request. Static composition is a time-consuming and tedious task and requires a lot of developer effort to ensure the services are properly composed. In automated web service composition, a generic framework with built-in intelligence is used to compose web services, which may internally involve many web services to fulfill the service consumer's request; from the service consumer's point of view it is considered a single service. Web services are composed using either a centralized or a decentralized data-flow approach; for dynamic web service composition, both have advantages and limitations. The limitation of centralized data flow is that all component services must pass through a composite service.
II. Preliminaries
The section below provides some insight into web services composition, automated web services composition, and the actors involved in dynamic web services composition.
A. Web Services Composition
Web services are distributed applications. The main advantage of web services over other techniques is that web services can be dynamically discovered and invoked on demand, unlike other applications in which static binding is required before execution and discovery. Semantic and ontological concepts have been introduced to compose web services dynamically, so that clients invoke web services by composing them dynamically without any prior knowledge of the services; semantic and ontological techniques are used to discover and compose at the same time (at run time).
B. Automated Web Services Composition
Automated web service composition methods generate the request/response automatically; most of these methods are based on AI planning. First, the request goes to the Translator, which translates it from its external form into the form used by the system; then the services that meet the user criteria are selected from repositories. The Process Generator composes these services. If there is more than one composite service that meets the user criteria, the Evaluator evaluates them and returns the best selected service to the Execution Engine, and the results are returned to the client (requester). There should be well-defined methods and interfaces through which clients interact with the system and get the response.
III. Proposed Solution
We propose a framework for composing web services dynamically based on the user context as well as user constraints. Its main components are listed below.

1. User Constraint Query Builder – constructs the user query based on the user-specific constraints for searching the web services in the WS Meta Data server.
2. Service Requester – constructs the user request object and submits the request to the WS Meta Data server.


3. WS Meta Data server – all metadata-related information is indexed for faster retrieval of the search data.
4. WS Composition – constructs the WS client dynamically based on the user-chosen web service.
5. WS Instance cache – an in-memory DB that persists instances of frequently accessed web services.
6. Service Repository – master DB of all the web services registered by the service providers.
7. Execution Engine – a common framework to execute the WS calls.
8. Service Rating Engine – automatically computes the service rating based on request/response time, QoS, etc.
9. Service Recommendation Engine – recommends services for the user's query.
10. Response – a common response object for the WS execution request.

IV. Proposed Technique
A. Methodology
The methodology of the proposed model is as follows:
1. Web service metadata are indexed for faster retrieval during the search operation.
2. The web services are registered in registries.
3. The service requester sends a request for a service.
4. The Translator converts the query into the form used by the internal system.
5. The request arrives at the composition module. The Matching Engine checks for the requested service in the WSDBs; if it finds the desired interface-based service composition, it sends the results to the Evaluator.
6. The Evaluator evaluates the selected web services in two steps: first on the basis of an interface-based search, and second on the basis of functionality-based rules. After evaluation it sends the selected services to the Composer, whose purpose is to compose these component web services. Multiple WSDBs are introduced so that if one goes down, the others can be used. A timestamp (aging) is maintained with each URI in the WSDB; if a request arrives before that time expires, the service is looked up in the WSDB.
7. If the Matching Engine does not find the requested service composition in the web services database, it starts searching the web.
8. The web services are searched for in multiple registries and the results are returned to the Evaluator. The Matching Engine also saves their references in the WSDB with an aging factor; the purpose of aging is to keep the information about web services up to date, as the contents are refreshed each time the aging period expires.
9. The Evaluator evaluates these web services based on interface-based and functionality-based rules.
10. The Composer composes the evaluated services and sends the result to the Execution Engine, which executes the web services; the results are sent back to the requester through the Translator.
11. After a successful or failed transaction, the service rating is computed and persisted in the DB; later the user can use the rating as one of the constraints for a web service search.
B. Pseudo Code
Algorithm: dynamic web service composition based on user constraints.
Input: request for a web service. Output: composed service.
  Web services, with their constraint data, are ingested into the WS Meta Data server;
  The user enters a request for a web service along with the constraints;
  The WS Query Builder translates the web service request;
  The web service request is submitted to the WS Meta Data server;
  The WS Meta Data server searches services for the submitted user query;
  The WS Meta Data server returns the services matching the user's query;
  The user selects a web service matching the requirement and submits it to the WS Composition Engine;
  The WS Composition Engine looks up the submitted web service in the WS Instance cache;
  If the selected WS instance is available in the WS Instance cache then
      return the instance to the Execution Engine;
  else
      create a new instance, add it to the WS Instance cache, and return it to the Execution Engine;
  The Execution Engine executes the selected web service and returns the result;
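As an illustration of the cache-aware lookup in the pseudo code above, here is a minimal Python sketch. The metadata search, repository check, and "execution" are hypothetical stand-ins, not a real UDDI/WSDL client or the authors' implementation.

```python
# Minimal sketch of the composition flow above: search metadata by constraints,
# reuse a cached service instance when possible, otherwise verify the service is
# still registered and create a new instance. All data structures are stand-ins.
ws_metadata = [
    {"name": "WeatherWS", "price": 5, "reliability": 0.99, "rating": 4.5},
    {"name": "GeoCodeWS", "price": 2, "reliability": 0.95, "rating": 4.0},
]
service_repository = {"WeatherWS", "GeoCodeWS"}     # master registry of live services
instance_cache = {}                                 # WS-Instance cache

def search_services(max_price, min_reliability):
    """WS Meta Data server: return services matching the user constraints."""
    return [m for m in ws_metadata
            if m["price"] <= max_price and m["reliability"] >= min_reliability]

def get_instance(name):
    """WS Composition: reuse a cached instance or create one if still registered."""
    if name in instance_cache:
        return instance_cache[name]
    if name not in service_repository:
        raise LookupError(f"{name} is no longer registered")
    instance_cache[name] = {"service": name, "client": f"client-for-{name}"}
    return instance_cache[name]

def execute(instance):
    """Execution Engine stand-in: pretend to invoke the composed service."""
    return f"invoked {instance['service']}"

matches = search_services(max_price=10, min_reliability=0.98)
chosen = matches[0]["name"]                         # the user picks a matching service
print(execute(get_instance(chosen)))                # a second call would hit the cache
```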


Here we have a query builder with which the user can specify constraints. Based on the query, a search request is submitted to the WS Meta Data server, which returns the search results based on the user constraints. If any record matches the user query, the result is returned to the user, who can then choose one of the matching web services and complete the operation. When the user chooses a specific service matching the constraints, the WS Instance cache is searched to see whether an instance is already available. If no instance is available, a query is sent to the service repository to ensure that the service is still registered and available; sometimes services are registered for a short time and later removed from the registry, so it is always good to check before composing or executing the web service. The core component of this framework is the WS Meta Data server. It contains only the metadata of the web services, along with useful information such as pricing, availability, reliability, user rating, and partnership status, since this information may not be available in the service registry, which may contain only the service name, service provider, namespace, etc. We create an index server that retrieves the required data very quickly based on the user query and holds the metadata of the services. The web service information, along with the constraints, is ingested into the metadata server and is updated frequently as the data changes. The data in the metadata server may sometimes be obsolete, as it is not the master database for the web services; it is a secondary server for fast searching, and the service registry remains the master copy. The WS Instance cache is another important component, as it holds frequently used web services to speed up composition and execution. A few services will be used frequently by users, and we cache such service instances so that if the same web service is requested again it is retrieved from the cache instead of being created anew. If the referenced web service instance is already available in the WS Instance cache, the instance is retrieved and the web service call is executed; if not, the instance is created and added to the cache. This approach increases the speed of web service execution.
V. Conclusion
Dynamic composition of web services based on user constraints ensures a better quality of service for the service consumer. This approach will create a healthy ecosystem in which service providers compete to provide better-quality services.




International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net
State of the Art in Handwritten Digit Recognition
Pooja Agrawal, Department of Computer Science, SVITS, Indore, Madhya Pradesh, INDIA
Prof. Anand Rajavat, Department of Computer Science, SVITS, Indore, Madhya Pradesh, INDIA
RGPV/SVITS Indore, Sanwer Road, Gram Baroli, Alwasa, Indore, Madhya Pradesh, INDIA
______________________________________________________________________________________
Abstract: In this paper we present an overview of existing handwritten character recognition techniques, especially handwritten digit recognition. The algorithms are described more or less on their own terms. Handwritten character recognition is a very popular and computationally expensive task. We also explain the fundamentals of handwritten character recognition, describe modern and popular approaches, and analyze their strengths and weaknesses. We conclude with the common problems found in these methods.
__________________________________________________________________________________________

I. Introduction

Character recognition is the art of detecting, segmenting, and identifying characters from an image. More precisely, character recognition is the process of detecting and recognizing characters from an input image and converting them into ASCII or another machine-editable form [1, 2, 3]. Handwritten digit recognition (HDR) contributes substantially to the advancement of automation and to improving the interface between man and machine in many applications [4]. Character recognition is one of the most interesting and fascinating areas of pattern recognition and artificial intelligence [5], [6], and it has been getting more and more attention over the last decade due to its wide range of applications. Recognition of handwritten characters is important for processing documents related to our history: it helps convert historical documents such as manuscripts into machine-editable form so that they can be easily accessed and preserved. Independent work is going on in Optical Character Recognition, that is, the processing of printed or computer-generated documents, and in the processing of handwritten, manually created documents, i.e., handwritten character recognition, which includes handwritten digit recognition. In an offline character recognition system, the handwritten document is first produced, digitized, and stored in a computer, and then processed; in an online character recognition system, the character is processed while it is being written. Image processing and pattern recognition also play a significant role in handwritten character recognition.
II. Literature Survey
In [10], Rajbala et al. have discussed various classes of feature extraction methods, such as statistical feature-based methods and structural feature-based methods. The statistical methods are based on how the data are sampled; they use information about the statistical distribution of pixels in the image and can be classified into three main categories: partitioning into regions, profile generation and projections, and distances and crossings. Structural features are extracted from the structure and geometry of a character, such as the number of horizontal and vertical lines, number of cross points, aspect ratio, number of branch points, number of loops, number of curves, number of strokes, etc. Global transformation features are calculated by converting the image into the frequency domain, using, for example, the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Gabor filtering, or the Walsh-Hadamard transform. Extracted features can be either low level or high level. Low-level features include width, aspect ratio, height, curliness, etc.; these features alone are not sufficient to distinguish one character from another in the character set of a language [11], so a number of high-level features are also used, including the number and position of loops, curves, lines, headlines, etc. Tirthraj Dash et al. have discussed HCR using an associative memory net (AMN) in their paper [12]. They work directly at the pixel level; the dataset was designed in MS Paint 6.1 with a normal Arial font of size 28, the characters are extracted first, and their binary pixel values are used directly to train the AMN. I.K. Pathan et al. have proposed an offline approach for handwritten isolated Urdu characters in their work mentioned in [13]. An Urdu character may contain any number of segments, of which one is known as the primary component and the rest as secondary components.
The authors use moment invariant (MI) features to recognize the characters. MI features are well known to be invariant under rotation, reflection, scaling, and translation; they measure the pixel distribution around the center of gravity of the character and capture the global character shape information. If the character image is a single component, it is normalized to 60 x 60 pixels and divided horizontally into three equal parts; 7 MI are extracted from each zone and 7 MI are calculated from the overall image, so a total of twenty-eight features are used to train an SVM.
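For illustration, a minimal sketch of this zoned moment-invariant feature extraction with OpenCV, under the assumption that the seven Hu moment invariants are the MI features meant (the survey does not name the exact formulation):

```python
# Minimal sketch: normalize a character image to 60x60, split it into three
# horizontal zones, and compute 7 Hu moment invariants per zone plus 7 for the
# whole image (7*3 + 7 = 28 features). Assumes Hu moments are the MI features meant.
import cv2
import numpy as np

def mi_features(char_img):
    img = cv2.resize(char_img, (60, 60), interpolation=cv2.INTER_AREA)
    zones = [img[0:20, :], img[20:40, :], img[40:60, :]]   # three horizontal parts
    feats = []
    for region in zones + [img]:
        hu = cv2.HuMoments(cv2.moments(region)).flatten()  # 7 invariants per region
        feats.extend(hu)
    return np.asarray(feats)                                # length 28

toy = np.zeros((80, 50), dtype=np.uint8); toy[10:70, 10:40] = 255
print(mi_features(toy).shape)   # (28,)
```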


If the image has multiple components, then 28 MI are extracted from the primary component (60 x 60) and 21 MI are extracted from the secondary component (22 x 22); separate SVMs are trained for both, and the decision is taken using rules that satisfy certain criteria. In [4], Pradeep et al. have proposed a neural-network-based classification system for handwritten character recognition. Every character is resized to 30 x 20 pixels for processing, and the method uses binary features to train the neural network, although such features are not robust. In the post-processing stage, recognized characters are converted to ASCII format. In this method the input layer has 600 neurons, equal to the number of pixels, and the output layer has 26 neurons, as English has 26 alphabets; back-propagation with momentum and an adaptive learning rate is used. Rajib et al. have proposed a Hidden Markov Model based system for English HCR in [8]. They employ both global and local feature extraction: the global features comprise four gradient features, six projection features, and four curvature features, while for the local features the image is divided into nine equal blocks and 4 gradient features are calculated from each block, giving 36 features. The overall feature vector therefore contains 50 features per character: O = [G(4) P(6) C(4) L(36)], where G, P, C, and L represent the global gradient, projection, curvature, and local gradient features respectively, and the number in parentheses is the number of features of each type. The HMM is trained using these features and experiments are carried out. Post-processing is also applied after the recognition phase for highly confused groups of characters such as N and M, O and Q, and C and O; for each group a new feature is calculated to discriminate the characters within the group. A gradient-feature-based method is discussed in [14] by Ashutosh et al. The experiment is carried out on Hindi, the third most popular language in the world; the first research work on handwritten Devnagari characters was published in 1977, and 300 million people use the Devnagari script for documentation in the central and northern regions of India [8]. In the proposed method, the gradient vector is calculated at each pixel and the image is divided into 9 x 9 blocks; the gradient strength is then accumulated along eight standard directions in each sub-block, and the 9 x 9 blocks are further down-sampled to 5 x 5 blocks using a Gaussian filter. The pre-processing steps are as follows: the intensity values of the image are adjusted and the image is converted to binary with a threshold value of 0.8; connected components with a pixel density of less than 30 are removed from further processing; a median filter is applied to remove the salt-and-pepper noise present in the binary images; and finally, individual characters are extracted by row and column histogram processing and normalized to 90 x 90 pixel blocks. The gradient features are extracted using the Sobel operator. Velappa et al. have proposed a multi-scale neural network approach in [15]. Neural networks such as feed-forward back-propagation networks require a long training time to memorize and generalize all input feature vectors [10], and there is still a good chance of misclassification; the generalization problem can be overcome by using a multi-scale neural network [11]. The proposed system first converts the camera-captured RGB image to a binary image.
The width-to-height ratio (WH), relative height ratio (RH) and relative width ratio (RW) are calculated to remove unnecessary connected components from the image. For the multiscale neural network, each detected character is resized to 20 X 28, 10 X 14 and 5 X 7 pixels, and the binary features of these different resolutions are fed to a three-layer feed forward back propagation network. In [16], T. Som et al. have discussed a fuzzy membership function based approach for HCR. Character images are normalized to 20 X 10 pixels, and an average (fused) image is formed from 10 images of each character. The bounding box around a character is determined using its vertical and horizontal projections; after cropping to the bounding box, the image is resized to 10 X 10 pixels. Thinning is then performed and the thinned image is placed row by row on a 100 X 100 canvas. The similarity score of a test image against the fusion image is used to classify the character. In [17], Rakesh Kumar et al. have proposed a single layer neural network based approach for HCR to reduce training time. Characters are written on A4 paper in uniform boxes, and the segmented characters are scaled to 80 X 80 pixels; each 0 is replaced by -1 for better training. Diagonal based feature extraction is presented in [19] and improved by Sharma et al. in [20], who propose a zone based hybrid feature extraction method in which the Euler number concept is used to improve speed and accuracy. Thresholding, filtering and thinning operations are performed as preprocessing. Segmentation techniques can be classified into three broad categories: top-down, bottom-up and hybrid. In the proposed method each segmented character is resized to 90 X 60 pixels. After the Euler number is calculated from this image, the character is divided into 54 zones of 10 X 10 pixels, and the average intensity of each zone is used as a feature value. A further 9 and 6 features are extracted by averaging the values row wise and column wise, giving 69 features in total. A FFBPNN with configuration 69-100-100-26 is used for classification.
III. Conclusion
In this paper we have elaborated the basic concept of handwritten recognition and analyzed some popular and modern methods for handwritten character recognition. The accuracy of HCR is still limited to about 90 percent because of the large variation in shape, scale, style and orientation of handwriting. A great deal of work has been done on handwritten digit recognition, but it remains a favourite research topic because the accuracy still needs to be improved.


IV. References
[1] Kai Ding, Zhibin Liu, Lianwen Jin, Xinghua Zhu, "A Comparative Study of GABOR Feature and Gradient Feature for Handwritten Chinese Character Recognition", International Conference on Wavelet Analysis and Pattern Recognition, pp. 1182-1186, Beijing, China, 2-4 Nov. 2007.
[2] Pranob K Charles, V. Harish, M. Swathi, CH. Deepthi, "A Review on the Various Techniques used for Optical Character Recognition", International Journal of Engineering Research and Applications, Vol. 2, Issue 1, pp. 659-662, Jan-Feb 2012.
[3] Om Prakash Sharma, M. K. Ghose, Krishna Bikram Shah, "An Improved Zone Based Hybrid Feature Extraction Model for Handwritten Alphabets Recognition Using Euler Number", International Journal of Soft Computing and Engineering, Vol. 2, Issue 2, pp. 504-508, May 2012.
[4] J. Pradeep, E. Srinivasan, S. Himavathi, "Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten", International Journal of Engineering, Vol. 25, No. 2, pp. 99-106, May 2012.
[5] Liu Cheng-Lin, Kazuki Nakashima, H. Sako, H. Fujisawa, "Handwritten digit recognition: investigation of normalization and feature extraction techniques", Pattern Recognition, Vol. 37, No. 2, pp. 265-279, 2004.
[6] Supriya Deshmukh, Leena Ragha, "Analysis of Directional Features - Stroke and Contour for Handwritten Character Recognition", IEEE International Advance Computing Conference, pp. 1114-1118, 6-7 March 2009, India.
[7] Amritha Sampath, Tripti C, Govindaru V, "Freeman code based online handwritten character recognition for Malayalam using Back propagation neural networks", Advanced Computing: An International Journal, Vol. 3, No. 4, pp. 51-58, July 2012.
[8] Rajib Lochan Das, Binod Kumar Prasad, Goutam Sanyal, "HMM based Offline Handwritten Writer Independent English Character Recognition using Global and Local Feature Extraction", International Journal of Computer Applications (0975-8887), Vol. 46, No. 10, pp. 45-50, May 2012.
[9] N. Bhatia and Vandana, "Survey of Nearest Neighbor Techniques", International Journal of Computer Science and Information Security, Vol. 8, No. 2, pp. 302-305, 2001.
[10] Rajbala Tokas, Aruna Bhadu, "A comparative analysis of feature extraction techniques for handwritten character recognition", International Journal of Advanced Technology & Engineering Research, Vol. 2, Issue 4, pp. 215-219, July 2012.
[11] Amritha Sampath, Tripti C, Govindaru V, "Freeman code based online handwritten character recognition for Malayalam using backpropagation neural networks", International Journal on Advanced Computing, Vol. 3, No. 4, pp. 51-58, July 2012.
[12] Tirtharaj Dash, "Time efficient approach to offline hand written character recognition using associative memory net", International Journal of Computing and Business Research, Vol. 3, Issue 3, September 2012.
[13] Imran Khan Pathan, Abdulbari Ahmed Bari Ahmed Ali, R. J. Ramteke, "Recognition of offline handwritten isolated Urdu character", International Journal on Advances in Computational Research, Vol. 4, Issue 1, pp. 117-121, 2012.
[14] Ashutosh Aggarwal, Rajneesh Rani, Renu Dhir, "Handwritten Devanagari Character Recognition Using Gradient Features", International Journal of Advanced Research in Computer Science and Software Engineering (ISSN: 2277-128X), Vol. 2, Issue 5, pp. 85-90, May 2012.
[15] Velappa Ganapathy, Kok Leong Liew, "Handwritten Character Recognition Using Multiscale Neural Network Training Technique", World Academy of Science, Engineering and Technology, pp. 32-37, 2008.
[16] T. Som, Sumit Saha, "Handwritten Character Recognition Using Fuzzy Membership Function", International Journal of Emerging Technologies in Sciences and Engineering, Vol. 5, No. 2, pp. 11-15, Dec 2011.
[17] Rakesh Kumar Mandal, N. R. Manna, "Hand Written English Character Recognition using Row-wise Segmentation Technique", International Symposium on Devices MEMS, Intelligent Systems & Communication, pp. 5-9, 2011.
[18] Farah Hanna Zawaideh, "Arabic Hand Written Character Recognition Using Modified Multi-Neural Network", Journal of Emerging Trends in Computing and Information Sciences (ISSN 2079-8407), Vol. 3, No. 7, pp. 1021-1026, July 2012.
[19] J. Pradeep, E. Shrinivasan, S. Himavathi, "Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System Using Neural Network", International Journal of Computer Science & Information Technology (IJCSIT), Vol. 3, No. 1, Feb 2011.
[20] Om Prakash Sharma, M. K. Ghose, Krishna Bikram Shah, "An Improved Zone Based Hybrid Feature Extraction Model for Handwritten Alphabets Recognition Using Euler Number", International Journal of Soft Computing and Engineering (ISSN: 2231-2307), Vol. 2, Issue 2, pp. 504-508, May 2012.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

A Survey on Privacy Preserving Technique to Secure Cloud
Vitthal Sadashiv Gutte, Prof. Priya Deshpande
Pune University, MIT College of Engineering, Kothrud, Pune 411038, Maharashtra, India

___________________________________________________________________________
Abstract: With the advancement of cloud computing and storage technology, large-scale databases are being generated at an exponential rate. Moving storage management systems to the cloud still faces a number of fundamental and critical challenges, among which storage space and security are the top concerns. To ensure the correctness of users and users' data in the cloud, we propose a third party authentication system, in addition to simplified data storage and secure data acquisition. Finally, we perform a security and performance analysis which shows that the proposed scheme is highly efficient for maintaining secure data storage and acquisition.
Keywords: Storage technology, security, database, storage management

___________________________________________________________________________
I. Introduction
There is a risk in storing personal data on a public cloud, and data encryption is a good way to discourage unauthorized access to it. If you plan to store your data on the public cloud, a security key identifies it as your own work and discourages others from copying it or claiming it as their own; at the same time, cloud storage makes it difficult to manage storage space as well as security. Cloud computing is a technology that keeps data and applications on central remote servers accessed through the internet. It can be considered a new computing paradigm with implications of greater flexibility and availability at minimum cost, and because of this it has been receiving a good deal of attention from people in many different areas of work. When using the storage services offered by cloud service providers, it is very important to secure the information that enters the cloud and to protect the privacy associated with it; this requires building security deeper into the cloud's infrastructure. As privacy issues are sure to be central to user concerns about the adoption of cloud computing, building such protections into the design and operation of the cloud is vital to the future success of this new networking paradigm.
II. Literature Survey
A. Challenging Issues [1]: To secure data and to build a highly efficient architecture which allows batch processing during auditing. Gap Analysis: The authors performed an analysis of the system and prove that it is provably secure, but users' files are not encrypted on some open source cloud storage systems. Statement of Aims and Objectives: In this paper the authors focus on eliminating the burden of the tedious and possibly expensive auditing task from the cloud user; they propose a privacy-preserving public auditing system for data storage security in cloud computing which also prevents outsourced data leakage. The method also performs multiple auditing tasks in a batch manner for better efficiency. The authors used the Amazon EC2 cloud for demonstration. Methodology and Techniques to be Used: The authors use the homomorphic linear authenticator and random masking techniques to guarantee that the TPA does not learn any knowledge about the data content stored on the cloud server. Finally, the authors performed an extensive analysis which shows that their proposed schemes are provably secure and highly efficient.
B. Challenging Issues [2]: To maintain data correctness, and to design the system so that it is highly efficient and resilient against attacks such as malicious data modification attacks, server colluding attacks and Byzantine failures. Gap Analysis: The authors' analysis shows that the system is built to maintain data correctness and proves that it is provably secure, but users' files are not encrypted on some open source cloud storage systems. Statement of Aims and Objectives: In this paper the authors describe cloud storage, the process of storing data remotely and the on-demand high quality cloud applications it enables without the burden of local hardware and software management, and explain the


benefits of the same. Methodology and Techniques to be Used: In this paper the authors propose a flexible distributed storage integrity auditing mechanism which utilizes homomorphic tokens and distributed erasure-coded data. The system is designed so that users can audit the cloud storage with very lightweight communication and computation cost. The authors mainly focus on the correctness of the data in the cloud. The proposed system is highly efficient and resilient against malicious attacks such as data modification attacks, server colluding attacks and Byzantine failures.
C. Challenging Issues [3]: To ensure data correctness, storage correctness and error localization. Gap Analysis: The proposed system ensures data correctness, storage correctness and error localization, but anyone can intentionally access or modify the data files as long as they remain internally consistent, because the authors do not use any encryption scheme. Statement of Aims and Objectives: In this paper the authors propose an effective and flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud. They propose a data correcting code in the file distribution preparation to provide redundancy and guarantee data dependability, which drastically reduces the communication and storage overhead compared to traditional replication-based file distribution techniques. Methodology and Techniques to be Used: The authors use homomorphic tokens with distributed verification of erasure-coded data. The proposed system is highly efficient and resilient against malicious data modification attacks, server colluding attacks and Byzantine failures, and it achieves not only storage correctness insurance but also data error localization.
D. Challenging Issues [4]: To support dynamic data operations, and to support batch auditing for multiple owners as well as multiple clouds without using any trusted organizer. Gap Analysis: The proposed method provides a consistent place to save valuable data and documents, but owners' files are not encrypted on open source cloud storage systems. Statement of Aims and Objectives: The authors study data owners and data consumers, their access privileges and the new security challenges that come with cloud computing, which needs an independent auditing service to check the integrity of data in the cloud. They also review some existing remote integrity checking methods that can only serve static archive data; existing data integrity checking methods do not suffice for cloud computing security needs because data in the cloud can be dynamically updated, so the authors propose an efficient and secure dynamic auditing protocol. Methodology and Techniques to be Used: The authors design an auditing framework for cloud storage systems and propose an efficient and privacy-preserving auditing protocol, then extend it to support dynamic data operations, and further extend it to support batch auditing for both multiple clouds and multiple owners without using any trusted organizer.
E.
Challenging Issues [5]: To design a system which supports distributed storage at acceptable cost. Gap Analysis: The authors only propose distributed storage; the system does not ensure data security. Statement of Aims and Objectives: The main objective is to provide applications with explicit knowledge about the properties of the storage resources they wish to use and, in turn, to use that knowledge to maintain such properties in a manner which limits the costs only to the data and state that require them. Methodology and Techniques to be Used: In this paper the authors present FleCS and its associated abstractions and explore their utility in supporting diverse and flexible cloud storage services. Through the use of storage containers and their associated attributes, FleCS enables the realization of a range of storage services at acceptable cost, including those targeting cross-cloud storage interactions. The authors are continuing to evolve the FleCS prototype and to develop and experiment with realistic applications and uses.
III. Proposed Architectures and Methods
In this paper, we propose an effective and flexible distributed scheme with explicit dynamic data support to


ensure the correctness of users' data in the cloud. An optional third party auditor (TPA), who has expertise and capabilities that users may not have, is trusted and is able to expose the risks of cloud storage services on behalf of users upon request. We propose batch auditing by the TPA for multiple users. To enable privacy-preserving public auditing for cloud data storage under the proposed model, our protocol design should achieve the following security and performance guarantees: public auditability, which allows the TPA to verify the correctness of users' data without retrieving a copy of the whole data; storage correctness, which ensures that no cheating cloud server can pass the TPA's audit; and privacy preservation, which ensures that the TPA cannot derive users' data content from the information collected during the auditing process.
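As a small illustration of the protection goals just described (a minimal sketch only, not the auditing protocol proposed in this paper: it assumes AES-GCM encryption from the Python cryptography package and a plain SHA-256 digest as the integrity tag, and the function names protect and verify are hypothetical), a client could encrypt its data and record an integrity tag before uploading to the cloud:

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def protect(plaintext: bytes, key: bytes):
        """Encrypt the data and compute an integrity tag before it leaves the client."""
        nonce = os.urandom(12)                        # unique nonce per encryption
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
        tag = hashlib.sha256(ciphertext).hexdigest()  # kept by the user (or handed to the TPA) for audits
        return nonce, ciphertext, tag

    def verify(nonce: bytes, ciphertext: bytes, tag: str, key: bytes):
        """Re-check the stored copy: integrity first, then decrypt."""
        if hashlib.sha256(ciphertext).hexdigest() != tag:
            raise ValueError("stored data has been modified")
        return AESGCM(key).decrypt(nonce, ciphertext, None)

    key = AESGCM.generate_key(bit_length=256)
    nonce, ct, tag = protect(b"confidential user data", key)
    assert verify(nonce, ct, tag, key) == b"confidential user data"

In the proposed model, the checking of such tags would be delegated to the TPA, which never sees the plaintext because only the ciphertext and the tag leave the client.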

Fig. 3.1: Proposed Architecture

We propose an effective and flexible distributed scheme with explicit dynamic data support to ensure the correctness of users and users' data in the cloud. We use an erasure-correcting code in the file storage preparation to provide redundancy and guarantee data dependability. Our goal is to build a repository that facilitates data integration and sharing across clouds along with preservation of data confidentiality; for this we will use an effective encryption technique to secure the stored data. As shown in Fig. 3.1, encrypting the data before storing it in the cloud handles the confidentiality issue, and to ensure the correctness of users' data in the cloud we use a TPA, so the proposed system provides effective and efficient data correctness with minimum computation, communication and storage overhead. In the past several years cloud computing has experienced massive growth in the corporate industry, especially as the technology caters to media interoperability and accessibility. Our objective is to build a security service, provided by a trusted third party, which offers only security services. The main aim is to provide security for data in the public cloud by focusing on two important issues: 1) integrity and 2) privacy. In more detail: 1. To construct a web service system which provides data integrity verification and encryption/decryption of consumer data. 2. To define access lists for sharing data securely with a specific group of individuals. 3. To construct a thin client application which calls this web service before uploading data to or downloading it from the cloud.
VI. Conclusion
In many organizations the main issue is maintaining the security and privacy of confidential data. The cloud stores different types of data, for example documents, data sheets and digital media objects, and it is necessary to guarantee data confidentiality. Data integrity, privacy and auditing examine all stored data to maintain the privacy and integrity of the data and to provide data confidentiality.
References

[1] Cong Wang, Sherman S. M. Chow, Qian Wang, Kui Ren, and Wenjing Lou, "Privacy-Preserving Public Auditing for Secure Cloud Storage", IEEE Transactions on Computers, Vol. 62, No. 2, February 2013.
[2] Cong Wang, Qian Wang, Kui Ren, Ning Cao, and Wenjing Lou, "Towards Secure and Dependable Storage Services in Cloud Computing".


[3] Cong Wang, Qian Wang, Kui Ren (Department of ECE, Illinois Institute of Technology) and Wenjing Lou (Department of ECE, Worcester Polytechnic Institute), "Ensuring Data Storage Security in Cloud Computing".
[4] Kan Yang and Xiaohua Jia, "An Efficient and Secure Dynamic Auditing Protocol for Data Storage in Cloud Computing", IEEE Transactions on Parallel and Distributed Systems, Vol. 24, No. 9, September 2013.
[5] P. Syam Kumar and R. Subramanian, "An Efficient and Secure Protocol for Ensuring Data Storage Security in Cloud Computing", Department of Computer Science, School of Engineering & Technology, Pondicherry University, Puducherry-605014, India.
[6] Cong Wang, Bingsheng Zhang, Kui Ren, and Janet M. Roveda, "Privacy-Assured Outsourcing of Image Reconstruction Service in Cloud", City University of Hong Kong; The State University of New York at Buffalo; University of Arizona, Tucson.
[7] Smitha Sundareswaran, Anna C. Squicciarini, and Dan Lin, "Ensuring Distributed Accountability for Data Sharing in the Cloud".
[8] Sravan Kumar R and Ashutosh Saxena, "Data Integrity Proofs in Cloud Storage", Software Engineering and Technology Labs, Infosys Technologies Ltd, Hyderabad, India.
[9] Yan Zhu, Hongxin Hu, Gail-Joon Ahn, and Mengyang Yu, "Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage".
[10] Kuyoro S. O., Ibikunle F., and Awodele O., "Cloud Computing Security Issues and Challenges", Department of Computer Science, Babcock University, Ilishan-Remo, Nigeria; Covenant University, Otta, Nigeria.
[11] Keiko Hashizume, David G. Rosado, Eduardo Fernández-Medina, and Eduardo B. Fernandez, "An analysis of security issues for cloud computing".
[12] Pradeep Kumar Tiwari and Dr. Bharat Mishra, "Cloud Computing Security Issues, Challenges and Solution", International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459), Volume 2, Issue 8, August 2012, Mahatma Gandhi Chitrakoot Gramodaya Vishwavidyalaya, Chitrakoot-Satna (M.P.).



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Optimal solution of software component selection by using software metric

1Hema Gaikwad, 2Prafulla Bafna
Assistant Professor, Symbiosis Institute of Computer Studies and Research, Symbiosis International University, Pune, Maharashtra, INDIA
______________________________________________________________________________________
Abstract: The basis of component based software engineering (CBSE) is reusable code/architecture (the component), which in turn saves development cost, time and resource effort. The CBSE process consists of four phases: qualifying the component, adapting the component, composing the component and updating the component. The CBSE concept is derived from the iterative spiral process model. The first and most challenging part of CBSE is component selection. Stakeholders' requirements are generally not constant and vary from stakeholder to stakeholder, so it becomes very difficult to select the component which exactly fulfils the requirements; in very few cases can a component be used directly, otherwise it requires some amount of modification. This paper explores the idea of component selection using software metrics, where a software metric represents a quantitative measurement. There are two categories of software metric: direct metrics and indirect metrics. Direct metrics include cost, effort, LoC, response time, resource usage and so on. Indirect metrics include functionality, user-friendliness, usability, efficiency, ease of use, quality and testability. The paper focuses on an indirect testability metric for software components.
Keywords: CBSE, Software component, Reusability, LoC
______________________________________________________________________________________
I. Introduction
Software is a set of instructions or programs, and engineering is to build something; software engineering is a process through which we can develop software. Pressman stated that software engineering is a systematic, disciplined, quantifiable approach to the development, operation and maintenance of software; that is, the application of engineering to software [1]. A number of models are available for the development of software, and all process models are based on the principle of the PDCA (Plan, Do, Check and Act) cycle. Plan means defining your objectives and goals and planning how you will achieve them. Do means executing accordingly. Check (or Test) means making sure that we are moving according to the plan and getting the desired result. Act means that if any issues are found during the check cycle, appropriate action is taken and the plan is revised. Generally the developers and other stakeholders of the project do the "Plan, Do and Act" while testers do the "Check" part. Examples of process models are the System Development Life Cycle (SDLC), Rapid Application Development (RAD), the spiral model and the component based assembly model. The system development life cycle (SDLC) is the base for all process models. When we develop any application we first decompose it into small modules or units; this process is known as the work breakdown structure. It is the method through which we can easily develop as well as maintain the software. The phases of the SDLC are as follows: requirement engineering, feasibility study, planning, analysis and design, coding, verification & validation, and implementation. Requirement engineering is the first phase of the SDLC.
We collect the requirements using various tools such as interviews, questionnaires, observation and review of documents, and then segregate them into functional requirements (WHAT requirements) and non-functional requirements (HOW requirements). The feasibility study identifies, describes and evaluates the requirements and tells whether the candidate system should be developed or not. Planning includes the project course of action and determining what is to be done to meet the goals. Analysis means the study of the previous/old system, and design means the process of developing the technical and operational specification of a system for implementation. Quality assurance and quality control activities represent the verification and validation activities. In the implementation phase we deploy the system to the client. The component based development model incorporates many of the characteristics of the spiral model: it is evolutionary in nature, demanding an iterative approach to the creation of software; however, it constructs applications from pre-packaged software components. The component based model is associated with component based software engineering. Component-based software engineering (CBSE) is used to develop/assemble software from existing components [6]. It also promotes software reuse and reusability. Software reuse, the use of existing software artifacts or knowledge to create new software, is a key method for significantly improving software quality and productivity, and reusability is the degree to which a thing can be reused [2]. The integration or assembly of already existing software components accelerates the development process. Many virtual component libraries are available on the web.


Component integration becomes easy when we select the right component: if the right components are chosen, the integration aspect is made much simpler.

II. Analysis Study

Process Model Name: Software Development Model
When to Use: Requirements are clearly given.
Features: 1. Easy to use. 2. Easy to implement.
Disadvantages: 1. Sequential process. 2. Backtracking is not allowed.

Process Model Name: Component Assembly Model
When to Use: The component based model should be used when a particular application is risk based and when many reusable components are present; in that case it is the most useful.
Features: 1. Reduced development time. 2. It is a pure reusability model. 3. It is an enhancement of the spiral model, so it also includes risk analysis, which is very important. 4. Increased quality and productivity.
Disadvantages: NA

Table 1

III. Re usability Model or Component Base Model

Figure 1: Component base model

Figure 1 shows the component base model. The diagram has two parts: the left part represents the spiral model and the right part represents the virtual library. The spiral model consists of several spirals and always moves in the clockwise direction; the innermost spiral represents the concept development area and the outermost spiral represents the maintenance and future enhancement area. It is a development model and consists of customer communication, planning, risk analysis, engineering, construction & release and, finally, customer evaluation. In customer communication we collect all requirements with the help of various fact finding tools such as interviews, questionnaires, on-site observation and review of documents. In the planning phase we prepare the plan document, which includes the estimation process, schedule details and resource details. Risk analysis is the most important and unique feature of the spiral model. Engineering includes analysis, design and construction activities. When we construct any application we first prepare the WBS (work breakdown structure) and obtain information about the modules. After decomposition we start searching for components; if a suitable component is available we use it, and if not we develop a new component and keep it in the library for future reference.
IV. Proposed re usability model and software metric
Ronald J. Leach stated that "software reuse" is a situation in which some software is used in more than one project, where software is defined loosely as one or more items that are considered part of an organization's standard software engineering process that produces some product [5]. Capers Jones stated that we can reuse data, architecture, design, programs and modules [5]. Component-based software development is a collection of processes that deals with the systematic reuse of existing components, often known as commercial off-the-shelf (COTS) components, and assembling them together to develop an application rather than building and coding the overall application from scratch; thus the life cycle of component-based software systems differs from that of traditional software systems. In general, analysis and design phases for component-based software


development process models take more time than traditional ones and much less time is spent in development, while testing occurs throughout the process [7]. Component based software engineering (CBSE) is a very useful concept, and reusability helps to save time, effort and cost. The following table shows each SDLC phase and its corresponding components.

SDLC phase | Components
Requirement Engineering | SRS, RTM
Feasibility study & Planning | Feasibility report & plan document
Modeling | Design document
Coding | Code
Verification & Validation | Verification & Validation document
Deployment | Final Report
Maintenance & Post Maintenance | Maintenance & Post Maintenance report

Table 2

The proposed model says that if we start searching for components from the first phase of the SDLC, instead of checking only in the coding phase, we can minimize the TCE (time, cost and effort).

Figure 2: Virtual library mapped to the SDLC phases and their components

Figure 2 shows the virtual library mapped to the SDLC phases and their components. The most challenging part is component selection, and software metrics help in this regard. The paper focuses on the use of an indirect testability metric based on the control flow graph. Assume that we are searching for components for the coding phase of the SDLC; suppose P is the problem, C is the component, and L1, L2 and L3 are languages. The source code is the input for the control flow graph and the output is a numeric value. The steps of the indirect testability metric using the control flow graph are as follows:
1. Input any source code.
2. Draw the control flow graph.
3. Find the number of edges (e) and nodes (n).
4. Apply the cyclomatic complexity formula CC = e - n + 2.


5. Get output (numeric value).

Figure 3: Control flow graph for L1 (statements grouped into blocks 1-8 and 9-11). No. of nodes n = 01, no. of edges e = 0, CC = 03.

Figure 4: Control flow graph for L2 (statements grouped into blocks 1-8 and 9-13). No. of nodes n = 01, no. of edges e = 0, CC = 03.

Figure 5: Control flow graph for L3 (statements grouped into blocks 1-5, 6-9, 10, 11, 12-14, 15-16 and 17). No. of nodes n = 05, no. of edges e = 05, CC = 02.

Language | Component Cyclomatic Complexity
L1 | 03
L2 | 03
L3 | 02

Table 3
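As a minimal sketch of how steps 1-5 above could be automated (the adjacency-list representation, the function name and the example graph are illustrative assumptions, not the actual components evaluated in Table 3):

    # A control flow graph represented as an adjacency list: node -> list of successor nodes.
    def cyclomatic_complexity(cfg):
        """Compute CC = e - n + 2 for a single connected control flow graph."""
        n = len(cfg)                                   # number of nodes
        e = sum(len(succ) for succ in cfg.values())    # number of edges
        return e - n + 2

    # Hypothetical CFG for a component with one if/else branch.
    cfg_example = {
        "entry": ["cond"],
        "cond": ["then", "else"],
        "then": ["exit"],
        "else": ["exit"],
        "exit": [],
    }
    print(cyclomatic_complexity(cfg_example))  # 5 edges - 5 nodes + 2 = 2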

Table 3 shows that the complexity of the component in L1 is 03, meaning that three white box test cases are needed to test it; the complexity of the component in L2 is also 03, again requiring three white box test cases; and the complexity of the component in L3 is 02, requiring only two white box test cases. In this way we optimally select the component.
V. Conclusion
Software component selection can be done through various metrics such as requirement metrics, design metrics, source code metrics, validation metrics and documentation metrics. A metric is a quantitative measurement. After applying the metric we get a number, and the component with the least value is the best. We can select this


component and rework on it. Software development can be speeded up with less re-work by reusing components, the cost of software projects can be reduced optimally, and the overall product development cycle can be made faster; this can be pivotal for project development.
References
[1] Roger Pressman, Software Engineering, 7th international edition, 2010.
[2] William Frakes and Carol Terry, "Software Reuse Metrics and Models", Virginia Tech and INCODE Corporation.
[3] Dorothy Graham, Erik van Veenendaal, Isabel Evans and Rex Black, Foundations of Software Testing, US, 2006.
[4] Omprakash Deshmukh, Mandakini Kaushik, "An Overview of Software Verification & Validation and Selection Process", International Journal of Computer Trends and Technology, Volume 4, Issue 2, 2013.
[5] Ronald J. Leach, Software Reuse: Methods, Models and Costs.
[6] Asif Irshad Khan, Noor-ul-Qayyum, Usman Ali Khan, "An Improved Model for Component Based Software Development", Department of Computer Science, Singhania University, Jhunjhunu, Rajasthan, India; Faculty of Computing and Information Technology, King Abdul Aziz University, Jeddah, Saudi Arabia.
[7] Murat Güneştaş, "A Study on Component Based Software Engineering", Master's thesis in Computer Engineering, Atılım University, January 2005.
[8] Sajid Riaz, Moving Towards Component Based Software Engineering in Train Control Applications, Final thesis, Linköpings universitet, Sweden, 2012.



International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Applying Data Mining in Higher Education Sector

1Archana Sharma, 2Prof. Vibhakar Mansotra, 3Rhea Mahajan
1Department of Computer Application, Jodhpur National University, Jodhpur, Rajasthan, India.
2Department of Computer Science & IT, University of Jammu, Jammu, India.
3Department of Computer Engineering, Yogananda College of Engineering and Technology, India.
_________________________________________________________________________________________
Abstract: Data mining is a new and interesting subject offered by institutions to attract more students. In this paper we discuss the problem students face in choosing the best institute for learning. One of the biggest challenges students face is selecting the right engineering college, one that opens doors to exciting careers. Students would like to know which college provides better quality education and whose alumni are successful in the real world. Data mining helps students take this decision more accurately: it is a good tool for predicting general information about a college, such as the courses offered, number of seats, selection criteria, infrastructure, faculty performance, industry interface, placement, potential to network, exchange programs, global exposure and national and international alumni chapters. In this paper we discuss data mining, its different phases and its advantages, and we classify data using the WEKA data mining tool, which helps in understanding the data. We use a decision tree algorithm to predict the status of colleges, faculty performance, student feedback, student performance, infrastructure, placements and the emotional states of students.
Key terms: Induction Algorithm, Knowledge Discovery, Information Gain
_________________________________________________________________________________________

I. INTRODUCTION
Data mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process [7]. The following figure (Figure 1) shows data mining as a step in an iterative knowledge discovery process.

Fig. 1: Data mining process

The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:
1. Data cleaning: also known as data cleansing, a phase in which noisy and irrelevant data are removed from the collection.
2. Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common source.
3. Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.
4. Data transformation: also known as data consolidation, a phase in which the selected data is transformed into forms appropriate for the mining procedure.
5. Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
6. Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.


7. Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.
It is common to combine some of these steps. For instance, data cleaning and data integration can be performed together as a pre-processing phase to generate a data warehouse. Data selection and data transformation can also be combined, where the consolidation of the data is the result of the selection or, as in the case of data warehouses, the selection is done on transformed data. KDD is an iterative process: once the discovered knowledge is presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results. Data mining tools and algorithms include Machine Learning (computer science, heuristics and induction algorithms), Artificial Intelligence (emulating human intelligence) and Neural Networks.
A. Classification by Decision Tree Induction
The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down recursive divide-and-conquer manner. The algorithm is a version of ID3, a well-known decision tree induction algorithm [6]. The basic strategy is as follows:
1. The tree starts as a single node representing the training samples.
2. If the samples are all of the same class, then the node becomes a leaf and is labeled with that class.
3. Otherwise, the algorithm uses an entropy based measure known as information gain as a heuristic for selecting the attribute that will best separate the samples into individual classes. This attribute becomes the "test" or "decision" attribute at the node. In this version of the algorithm all attributes are categorical, that is, discrete-valued; continuous-valued attributes must be discretized.
4. A branch is created for each known value of the test attribute, and the samples are partitioned accordingly.
5. The algorithm uses the same process recursively to form a decision tree for the samples at each partition. Once an attribute has occurred at a node, it need not be considered in any of the node's descendants.
6. The recursive partitioning stops only when any one of the following conditions is true: a) all samples for a given node belong to the same class; b) there are no remaining attributes on which the samples may be further partitioned, in which case majority voting is employed, converting the given node into a leaf labeled with the class in the majority among its samples (alternatively, the class distribution of the node samples may be stored); c) there are no samples for the branch test-attribute = ai, in which case a leaf is created with the majority class among the samples.
The attribute selection measure is computed by information gain, which is used by the ID3, C4.5 and C5.0 tree-generation algorithms. Information gain is based on the concept of entropy from information theory.
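To make the strategy above concrete, a compact ID3-style sketch is shown below (an illustrative outline only, under the assumption of categorical attributes; the function names and the tiny sample records are ours, not taken from [6]):

    import math
    from collections import Counter

    def entropy(rows, target):
        counts = Counter(r[target] for r in rows)
        total = len(rows)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def info_gain(rows, attr, target):
        total = len(rows)
        remainder = sum(
            len(subset) / total * entropy(subset, target)
            for value in set(r[attr] for r in rows)
            for subset in [[r for r in rows if r[attr] == value]]
        )
        return entropy(rows, target) - remainder

    def id3(rows, attributes, target):
        classes = [r[target] for r in rows]
        if len(set(classes)) == 1:                    # condition (a): pure node becomes a leaf
            return classes[0]
        if not attributes:                            # condition (b): majority voting
            return Counter(classes).most_common(1)[0][0]
        best = max(attributes, key=lambda a: info_gain(rows, a, target))   # step 3
        tree = {best: {}}
        for value in set(r[best] for r in rows):      # step 4: one branch per observed value
            subset = [r for r in rows if r[best] == value]
            tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
        return tree

    # Hypothetical miniature training set mirroring the attribute names used later in the paper.
    rows = [
        {"college_name": "Y", "campus_placement": "Yes", "take_admission": "yes"},
        {"college_name": "X", "campus_placement": "No",  "take_admission": "no"},
        {"college_name": "X", "campus_placement": "Yes", "take_admission": "yes"},
    ]
    print(id3(rows, ["college_name", "campus_placement"], "take_admission"))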

B. Tools of Data Collection & Analysis
Various tools are needed for this project: some for analyzing data, some for design and implementation, and some for developing the software tool. These are:

MySQL database, Excel, MS Access, SPSS, MATLAB, the WEKA data mining tool, the TANAGRA data mining tool, Web Miner, and VB 6.0.

II. DATA MINING EXPERIMENT
In this research work we will collect data on three thousand students from three colleges, but in this example fourteen students from three colleges have been chosen. In the first step we clean and integrate the data. For our problem we


chose five attributes, which were converted into their equivalent values as given in the table below.

Table I: Selected Attributes
S.No | Selected attributes | Description
1 | College_name | X, Y, Z
2 | Faculty_performance | Excellent, good, average
3 | Campus_placement | Yes, no
4 | Student_performance | Pass, reappear
5 | Class: take_admission | Yes, no

Fig. 2: CSV file of the database

After collecting and cleaning the data we classify it using the WEKA mining tool. For classification and prediction we use the ID3 algorithm. In this experiment, Table 2 presents a training set of data tuples taken from the college database. The class label attribute, take_admission, has two distinct values (namely yes or no); let class C1 correspond to yes and class C2 correspond to no. There are 9 samples of class yes and 5 samples of class no. To compute the information gain of each attribute, we first compute the expected information needed to classify a given sample: I(s1, s2) = I(9, 5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940. Next we compute the entropy of each attribute. For college_name = "X": s11 = 2, s21 = 3, I(s11, s21) = 0.971. For college_name = "Y": s12 = 4, s22 = 0, I(s12, s22) = 0. For college_name = "Z": s13 = 3, s23 = 2, I(s13, s23) = 0.971. Using equation 2, the expected information needed to classify a given sample if the samples are partitioned according to college_name is E(college_name) = (5/14) I(s11, s21) + (4/14) I(s12, s22) + (5/14) I(s13, s23) = 0.694. Hence the gain in information from such a partitioning would be Gain(college_name) = I(s1, s2) - E(college_name) = 0.246. Similarly we can compute Gain(campus_placement) = 0.151, Gain(faculty_performance) = 0.029 and Gain(student_performance) = 0.048. Since college_name has the highest information gain among the attributes, it is selected as the test attribute.
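The information gain figures above can be reproduced with a short script (a minimal sketch; the helper name info is ours):

    import math

    def info(*counts):
        """Expected information (entropy) I(s1, s2, ...) for the given class counts."""
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c)

    i_total = info(9, 5)                    # 0.940: 9 yes and 5 no overall
    # Partitions induced by college_name: X -> (2 yes, 3 no), Y -> (4, 0), Z -> (3, 2).
    partitions = [(2, 3), (4, 0), (3, 2)]
    e_college = sum(sum(p) / 14 * info(*p) for p in partitions)   # 0.694
    print(round(i_total, 3), round(e_college, 3), round(i_total - e_college, 3))  # 0.94 0.694 0.246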


The samples are then partitioned as shown in the figure below.

College_Name = X:
Faculty_performance | Campus_placement | Student_performance | Class
Excellent | No | Reappear | No
Excellent | No | Pass | No
Good | No | Reappear | Yes
Average | Yes | Reappear | Yes
Good | Yes | Pass | Yes

College_Name = Z:
Faculty_performance | Campus_placement | Student_performance | Class
Good | No | Reappear | Yes
Average | Yes | Reappear | Yes
Average | Yes | Pass | No
Good | Yes | Reappear | Yes
Good | No | Pass | No

College_Name = Y:
Faculty_performance | Campus_placement | Student_performance | Class
Excellent | No | Reappear | Yes
Average | Yes | Pass | Yes
Good | No | Pass | Yes
Excellent | Yes | Reappear | Yes

Fig. 3: Decision table

The samples falling into the partition for college_name = "Y" all belong to the same class. Since they all belong to class yes, a leaf is created at the end of this branch and labeled yes. The final decision tree returned by the algorithm is shown below.

College_Name
  = X -> Campus_Placement: No -> Class = No; Yes -> Class = Yes
  = Y -> Class = Yes
  = Z -> Student_Performance: Pass -> Class = No; Reappear -> Class = Yes

Fig. 4: Decision tree

The classification learning was also used to predict a student's failure or success in the academic exam based on their present behavioral profile. For the ID3 classification learning based on the training set, there was an 85.71% success rate (correctly classified instances), which is a high value of prediction: from a sample of 14 instances, 12 are correctly classified and 2 are not. From the decision tree we can easily identify the weak institute, whose chances of failure are highest. After identifying the weak institute we can work harder on that institute to minimize the failure rate and improve the overall result and performance of the institute.
Advantages of Data Mining in Academics
Data mining gives answers to questions such as:
Q1. Which college provides quality education?
Q2. How qualified is the faculty, i.e. Ph.D., M.Tech., B.Tech.?
Q3. How active is the faculty in research?
Q4. Are there many visitors giving seminars?
Q5. Do they arrange workshops and conferences regularly?
Q6. What is the number of faculty members?
Q7. How good are the labs in the discipline of your choice?
Q8. How many books are there per student in the library?
Q9. What e-journals do they subscribe to?
Q10. How much bandwidth do they have on a per capita basis?
Q11. Are lecture halls equipped with PCs, LCD projectors, screens and audio facilities?


Q12. What sports facilities exist for the student body?
Q13. Which college provides the best branch?

Statistical result given by the ID3 algorithm

III. CONCLUSION
The analysis of institute success rates helps students in choosing the right college. In this paper we have addressed the problem students face in choosing the best institute for learning: whether it provides better quality education and whether its alumni are successful in the real world. Since the model shows the weak institute, it also helps students to identify the best college to build their future; in the end it helps students take more accurate decisions about colleges and courses. The proposed system also shows the data graphically according to the needs of students in particular fields. For future work we will also use clustering, with the help of which we can study the domains and emotional states of students.
REFERENCES

[1] Hideko Kitahama, "Data Mining through Cluster Analysis Evaluation on Internationalization of Universities in Japan".
[2] Bruce L. Golden, R. H. Smith School of Business, University of Maryland, College Park, MD 20742, "An Example of Visualization in Data Mining".
[3] Jing Luan, Chief Planning and Research Officer, Cabrillo College, and Founder, Knowledge Discovery Laboratories, "Data Mining Applications in Higher Education".
[4] Thulasi Kumar, University of Northern Iowa, "Theoretical Basis for Data Mining Approach to Higher Education Research".
[5] N. V. Anand Kumar and G. V. Uma, Department of Computer Science and Engineering, Anna University, Chennai, "Improving Academic Performance of Students by Applying Data Mining Technique".
[6] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd edition, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor, 2006.
[7] C. Romero, S. Ventura, and E. Garcia, "Data mining in course management systems: Moodle case study and tutorial", Computers & Education, Vol. 51, No. 1, pp. 368-384, 2008.
Hideko Kitahama, “Data Mining through Cluster Analysis Evaluation on Internationalization of Universities in Japan”. Bruce L. Golden R. H. Smith School of Business University of Maryland College Park, MD 20742 “An Example of Visualization in Data Mining” Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories “Data Mining Applications in Higher Education”. Thulasi Kumarthulasi.kumar@uni.edu, University of Northern Iowa “ Theoretical Basis for Data Mining Approach to Higher Education Research”. N.V.Anand Kumar Research Scholar, Department of Computer Science and engineering Anna university, Chennai G.V.Uma Assistant professor, Department of Computer Science and Engineering Anna university, Chennai “Improving Academic Performance of Students by Applying Data Mining Technique”. Han, J. and Kamber, M., (2006), "Data Mining: Concepts and Techniques", 2nd edition. The Morgan Kaufmann Series in Data Management Systems", Jim Gray, Series Editor, 2006. C. Romero, S. Ventura, E. Garcia (2008), "Data mining in course management Systems: Moodle case study and tutorial", Computers & Education, Vol. 51, No. 1, pp. 368- 384, 2008.

IJSWS 15-165; © 2015, IJSWS All Rights Reserved

Page 92


International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

ISSN (Print): 2279-0063 ISSN (Online): 2279-0071

International Journal of Software and Web Sciences (IJSWS) www.iasir.net

Virtualization in Cloud Computing Domain Structure and Data Security Enhancement
1Santvana Singh, 2Sarla Singh, 3Sumit Dubey
1Student (M.Tech. EC), 2Head of Department, 3Assistant Professor
1,2,3Department of Electronics and Communication Engineering, Jawahar Lal Nehru College of Technology, Rewa, Madhya Pradesh, INDIA.
__________________________________________________________________________________________
Abstract: Cloud virtualization and data security provide secure services and virtualization to cloud consumers. We have implemented a private cloud domain structure to control a private cloud consumer network and its clients. In this paper we use Microsoft Windows Server 2012 as the cloud server and domain controller and provide virtualization and data security services using a centralized storage system.
__________________________________________________________________________________________

I. Introduction
Cloud computing is a new, flexible, secure, adaptive and easily upgradable technique which provides centralized control and data storage facilities. Its virtualization capability enables remote login services and secure communication. Because it is based on virtualization, no special device is needed to implement a cloud server, and the cloud is a secure and robust facility for corporate clients. Cloud computing provides a shared pool of resources, including data storage space, networks, computer processing power, and specialized corporate and user applications. In other words, cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
II. Literature survey
Virtualization is a technique in which we share our private resources for a limited time period with the next hops. With virtualization we can run multiple sessions at the same time on the same shell. Virtualization is a very powerful technique in computer networking which provides very secure communication and requires no upgrade of the other hosts. It is a very flexible service of cloud computing, and it provides centralized control so that we can protect our private network from unwanted malicious activity.
III. How cloud computing services work
Cloud computing services have several common attributes:
Virtualization - cloud computing uses server and storage virtualization extensively to allocate and reallocate resources rapidly.
Multi-tenancy - resources are pooled and shared among multiple users to gain economies of scale.
Network access - resources are accessed via a web browser or thin client using a variety of networked devices (computer, tablet, smartphone).
On demand - resources are self-provisioned from an online catalogue of pre-defined configurations.
Cloud computing services: cloud computing basically has three main services in its background.

Fig. 1.1: cloud services


1. SaaS (Software as a Service): Cloud-based applications, or software as a service, run on distant computers "in the cloud" that are owned and operated by others and that connect to users' computers via the Internet and, usually, a web browser.
2. PaaS (Platform as a Service): Platform as a service provides a cloud-based environment with everything required to support the complete life cycle of building and delivering web based (cloud) applications, without the cost and complexity of buying and managing the underlying hardware, software, provisioning and hosting.
3. IaaS (Infrastructure as a Service): Infrastructure as a service provides companies with computing resources including servers, networking, storage, and data centre space on a pay-per-use basis.
IV. Proposed system
We have implemented a private cloud domain structure to control a private cloud consumer network and its clients. We use Microsoft Windows Server 2012 as the cloud server and domain controller, and provide virtualization and data security services using a centralized storage system.
Experimental methodology: we have developed code in Java using IIS services.
Running the service:
    import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
    import org.apache.cxf.jaxrs.lifecycle.SingletonResourceProvider;

    JAXRSServerFactoryBean sf = new JAXRSServerFactoryBean();
    sf.setResourceClasses(RestCalculatorServiceImpl.class);
    sf.setResourceProvider(RestCalculatorServiceImpl.class,
            new SingletonResourceProvider(new RestCalculatorServiceImpl()));
    sf.setAddress("http://192.168.1.1/");
    sf.create();
The libcloud interface provides list_images(), list_sizes(), list_locations(), create_node(), list_nodes(), reboot_node(), and other calls for querying UUIDs and locations, setting passwords, etc.
Apache libcloud - find all the VMs running in the IBM, Slicehost and Rackspace clouds:
    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    ibm = get_driver(Provider.IBM)
    slicehost = get_driver(Provider.SLICEHOST)
    rackspace = get_driver(Provider.RACKSPACE)
    drivers = [ibm('access key id', 'secret key'),
               slicehost('api key'),
               rackspace('username', 'api key')]
    # Now do what you like with your running VMs
V. Result
After applying these algorithms in the cloud using the Java code, our virtualization in cloud computing domain structure is ready; clients are able to use the cloud virtualization services and can also store their data remotely on the cloud server with greater security. First, the cloud server asks for a username and password.

Fig. 1.2: Cloud secure login services
After the username and password are provided at the cloud login window, it takes the user directly to the cloud server main screen, and the client is able to use the virtual cloud system.


Fig. 1.3: Cloud server connected
VI. Conclusion
In this paper we have implemented a virtual cloud computing domain structure and a data security enhancement system with the help of Java and IIS services, and we have created a more efficient and secure cloud server.
VII. Future Work
Building on this work, further security measures can be added in the future: firewalls, advanced security systems, access control lists, and other mechanisms can be used for further modifications and service deployment.



International Journal of Software and Web Sciences (IJSWS) www.iasir.net
Cost Efficient RESTful Services Caching for Mobile Devices
Iyad Ollite, Dr. Nawaz Mohamudally
School of Innovative Technologies and Engineering, University of Technology, Mauritius, La Tour Koenig, Mauritius
Abstract: The use of mobile devices to access the Internet has increased considerably in the past years, and the growing popularity of smartphones allows users to use applications that access content remotely through web services. The most widely adopted web service architecture is RESTful services, which consist of HTTP requests and replies. The challenges faced by mobile RESTful services are high response time, high battery use, and high bandwidth use. The HTTP protocol was not initially designed for RESTful services, and although HTTP-compatible caching servers can be used to cache RESTful services, they are not optimized for that purpose. This work proposes a novel transparent 2-tier proxy architecture that can be easily deployed and used by producers and consumers of RESTful services. This proxy architecture greatly reduces bandwidth usage, which brings financial cost savings to the mobile user.
Keywords: RESTful services, mobile web service, proxy architecture, web caching, mobile proxy, performance analysis, cost
I. Introduction
With the increasing popularity of smartphones, there has been a change in the way people use their phones and the way businesses reach customers, namely through the use of 'apps'. Apps are basically software that allows a user to perform a given purpose and is downloaded to the mobile phone. Different apps provide different functionalities, ranging from social networking and gaming to medical advice or news. Some apps are designed to operate without the use of the Internet, for example single-player games, while others cannot operate without the Internet, such as newsreaders, social networking tools, and messengers. In between, there are apps that use the Internet only to provide additional features like advertising, streaming videos, push notifications, or updates. A large number of apps that require data from a server do so using a request-response model: the client creates a connection and makes a request to the server, the server listens permanently for incoming connections and sends responses, and once the client acknowledges the response the connection is closed. To implement such functionality, several techniques can be used, including REST, XML-RPC, and SOAP. In recent years, there has been increasing use of REST (Representational State Transfer) based APIs, and REST is currently the most popular API style among developers. REST refers to an architectural model in which a set of rules is applied to components, connectors, and data elements in a distributed system. A REST-based API does not prescribe a specific format in which request messages are sent or responses are received. Given its simplicity, REST is the most commonly used technique for web APIs and apps.
When applied to a web service and used by apps, a RESTful API will consist of the following:
• the location of the server
• the location of each service to be provided
• the data type of the data requested (XML, JSON, images)
• the specification of the data, as defined by the developer
• the HTTP method to be used (GET, POST, PUT or DELETE)
Basically, a RESTful service is a series of HTTP requests and replies.
II. Issues with RESTful Services and Mobile Devices
The architectural constraint of a RESTful web service being stateless can lead to performance degradation. RESTful web services are nevertheless appropriate for mobile devices, since with each call clients send all state information; consequently, even if there are frequent disconnections from the server, a client can continue requesting its normal flow of resources without needing to restart from the first request. The performance issues faced by RESTful services when called from mobile devices include:
A. Increase in network latency
For different requests, the client will generally initiate different connections to the server. That is, for each request, a new socket connection is required, although this part is usually handled by the operating system and


by libraries already available to the client when making an HTTP request. RESTful servers might use a keep-alive mechanism and allow a client to reuse the same connection, but such connections tend to be short-lived (about 180 seconds), so for any given request a new socket usually needs to be established to the server.
B. Increase in processing time
Since the server does not keep session information for a client, it has to do more processing for a given request. For example, with each request, the server has to recheck the authentication/credentials of the client before processing the actual request.
C. Increase in data transfer and financial cost
For RESTful services, the client manages its own session information. Consequently, for each request it has to send more information to the server. Additionally, all header data needs to be transferred again with every request, and these headers can be larger than the actual data required to process the call. Since resources need to be fine-grained, the client needs to make several requests to the server in order to get all the required information, and the server too needs to transfer more information and header data with each request. Mobile Internet is usually sold in subscriptions where users pay for the amount of data they download: the more they use mobile Internet, the more they have to pay, and mobile broadband Internet is expensive compared to cable-based broadband Internet such as ADSL.
III. Proposed RESTful Services Proxy Model
The proposed RESTful proxy system is a 2-tier model with one component implemented on the client device, referred to as the RESTful Proxy Client (RESTPC), and a server component, referred to as the RESTful Proxy Server (RESTPS), to be used by the service provider. The aim of the model is to minimize the amount of bandwidth used and therefore bring a cost saving to the end user; the request-handling flow is detailed in the next section and shown in Figure 1.
IV. Proposed RESTful Services Proxy Model
The model assumes that the user has installed the RESTPC on his device and that the application developer has provided an option for the end user to use the proxy. The RESTPC listens for HTTP requests on a port specified by the user.
When an application makes use of the RESTful proxy service, the request is sent to the RESTPC, which processes it; the RESTPS listens for requests on a given port and processes all requests coming from the RESTPC. If the RESTful proxy is unavailable, the application makes the request directly to the service provider; otherwise the request is forwarded to the RESTful proxy. The application determines whether the RESTful proxy client is running by checking whether a service is listening on the expected port. When the RESTful proxy is available, the request is processed as depicted in Figure 1: all RESTful requests from the application are sent to the RESTPC, which analyses each request to determine whether the response can be provided from cache or needs to be fetched from the RESTPS. If it is available in the local cache, a RESTful response is sent to the application; otherwise a reformatted request is sent to the RESTful Proxy Server. The RESTPS processes the request and checks whether the response can be served from its own cache or needs to be fetched from the service provider. If it is available from cache, a custom-formatted reply is sent to the RESTPC, which converts it back into a complete RESTful response and forwards it to the application. If the response was not available at the RESTPS, the RESTful request is reconstructed and sent to the service provider; when the RESTful reply is received, the RESTPS reformats the message and sends it to the RESTPC, which rebuilds the full RESTful response and sends it to the application. With this model, the application on the mobile device and the server at the service provider receive unmodified RESTful requests and replies: the RESTful proxy is transparent. Whether the RESTful proxy is used or not, the mobile application sends an HTTP request and receives an HTTP reply, and the service provider receives an HTTP request and sends an HTTP reply.
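To make the cache-or-forward decision just described concrete, here is a minimal Java sketch; the RestpcSketch and RestpsClient names, the in-memory map, and the method-plus-URL cache key are illustrative assumptions and not the paper's custom RESTPC-RESTPS wire format.

    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch of the RESTPC decision logic: serve from the local cache
    // when possible, otherwise forward a (reformatted) request to the RESTPS.
    public class RestpcSketch {

        private final Map<String, String> localCache = new HashMap<>();

        public String handle(String method, String url, RestpsClient restps) {
            String key = method + " " + url;
            String cached = localCache.get(key);
            if (cached != null) {
                return cached;                          // served from the RESTPC cache
            }
            String response = restps.forward(method, url); // reformatted request to the RESTPS
            localCache.put(key, response);              // keep a copy for later requests
            return response;
        }

        // Placeholder for the server-side proxy component (RESTPS).
        public interface RestpsClient {
            String forward(String method, String url);
        }
    }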


Figure 1: RESTful Proxy Model.

V. RESTful Services Proxy Caching Techniques
Like all cache systems, the RESTful proxy needs to address several issues in order to be as efficient as possible.
A. Cache placement
A client proxy resides on the user's mobile device, and another proxy is installed on a machine on the same local area network as the service provider, or on the same machine.
B. Cached content
Given the characteristics of a RESTful service, only GET responses are usually cached. However, the proposed model caches GET responses as well as headers of GET, POST, PUT, and DELETE requests and responses. This is possible because the client proxy and server proxy have their own internal communication protocol and can reuse previous headers to rebuild a complete request to be sent to the server and header responses to be sent to clients.
C. Inter-cache communication
The communication protocol modelled allows the RESTPC to communicate with the RESTPS; RESTPS instances do not share content among themselves. In certain cases, the RESTPC might send the RESTPS a request that contains only an identifier of a previous request, but the RESTPS might no longer have a record of that request. In that case, the RESTPS replies with status 301 (Moved Permanently), and the RESTPC then needs to send the full request. The identifier system implemented does not require the RESTPS to check whether the RESTPC has a cached version of the content: since the RESTPC sends the unique ID of the resource it has in its cache, the RESTPS can determine whether to send a new response or a response with status 304 (Not Modified).
D. Cache replacement policy for the RESTPC
For the RESTPC, two separate replacement policies are used, depending on whether responses contain cache directives. Firstly, the Next To Expire (NTE) algorithm is used, since it is a relatively simple algorithm that does not require much processing power and consequently limits battery use. However, for NTE to be efficient, the response headers must include the max-age directive of the Cache-Control response header, or the Expires header must be specified. For responses not containing any cache directive, the Least Frequently Used (LFU) algorithm is used. LFU is likewise a straightforward algorithm that does not require much processing power. In the absence of cache directives, the client proxy still has to send the request to the server proxy to ensure that the content has not changed. To keep track of which items are least frequently used, every time a resource is served from cache a counter associated with it is incremented.
E. Cache replacement policy for the RESTPS
For the server proxy, the NTE algorithm (when cache directives are available) and a Greedy Dual-Size (GDS) algorithm (when cache directives are not available) are used. The GDS variant used takes into consideration the time required to process a request by the server and the size of the response. It differs from regular GDS algorithms because the server proxy is very close to the service provider, with two or fewer network hops between them: rather than using network hops to determine the cost of a resource, the actual processing time of the request by the service provider is used. The second parameter considered by the algorithm is the compressed size of the resource. The weight for a given resource is given by 1 − (1/(Cost × Size)), and the element with the least weight is evicted from the cache.
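As a rough sketch of the eviction rule just described, where cost is the provider's processing time and size is the compressed response size; the field names, units, and sample values below are assumptions for illustration.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of the modified Greedy Dual-Size policy: weight = 1 - 1/(cost * size),
    // and the cache entry with the smallest weight is the one evicted.
    public class GdsEvictionSketch {

        static class Entry {
            final String key;
            final double processingTimeMs;   // cost: time the provider took to build the response
            final double compressedSizeKb;   // size: compressed response size

            Entry(String key, double processingTimeMs, double compressedSizeKb) {
                this.key = key;
                this.processingTimeMs = processingTimeMs;
                this.compressedSizeKb = compressedSizeKb;
            }

            double weight() {
                return 1.0 - 1.0 / (processingTimeMs * compressedSizeKb);
            }
        }

        // Returns the key that should be removed from the cache.
        static String selectVictim(List<Entry> entries) {
            return entries.stream()
                    .min(Comparator.comparingDouble(Entry::weight))
                    .map(e -> e.key)
                    .orElse(null);
        }

        public static void main(String[] args) {
            List<Entry> cache = new ArrayList<>();
            cache.add(new Entry("/news/1", 40, 12));   // cheap, small  -> lower weight
            cache.add(new Entry("/news/2", 300, 95));  // costly, large -> higher weight
            System.out.println("Evict: " + selectVictim(cache)); // prints /news/1
        }
    }

With these sample values the cheap, small entry has the smaller weight and is evicted first, which matches the intent of keeping expensive, large (but compressed) resources in the cache.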


VI. Implementation
A prototype of the above model has been implemented for mobile devices running the Android operating system; the RESTPS has been implemented in Java. The RESTPC runs on a certain port on the mobile device, and the application can connect to that port. When coding the application, the developer needs to check whether a service is running on that given port and, if the port is accepting connections, request that all calls be made using that proxy. For test purposes, the end user can install the software and configure the mobile device settings to use the RESTPC as a proxy. The pseudocode to be implemented by the mobile developer is as follows:
    Read global settings
    If RESTful proxy port defined
        Set proxy host as localhost for current application
        Set proxy port as defined in global settings for current application
    Endif
    Make standard HTTP request to call RESTful service
The RESTPS runs on a certain port and is installed on the same machine providing the service or on the same local area network. It is totally transparent to the developer, and no changes are required on the RESTful service provider's side. A RESTful news service was also implemented to determine the effectiveness of the system on a real service.
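A hedged Java sketch of the developer-side check in the pseudocode above; probing localhost with a short connect timeout and falling back via java.net.Proxy are assumptions about how this could be done, not the paper's actual Android implementation.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.Socket;

    // If something is listening on the configured RESTPC port, route HTTP requests
    // through localhost:<port>; otherwise call the service provider directly.
    public class RestProxySettingsSketch {

        static boolean restpcRunning(int port) {
            try (Socket probe = new Socket()) {
                probe.connect(new InetSocketAddress("127.0.0.1", port), 200);
                return true;
            } catch (IOException e) {
                return false;
            }
        }

        static Proxy chooseProxy(int configuredPort) {
            if (configuredPort > 0 && restpcRunning(configuredPort)) {
                return new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", configuredPort));
            }
            return Proxy.NO_PROXY; // fall back to calling the service provider directly
        }
    }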
VII. Performance Review
In order to assess the performance of the mobile caching architecture described above, the following metrics were used: hit ratio, byte-hit ratio, and bandwidth usage. These metrics were compared for the following architectures:
• RESTful service without any cache
• RESTful service with a single web cache
• RESTful service with the RESTful proxy, without the client proxy active
• RESTful service with the RESTful proxy, with the client proxy active
For all the metrics, the RESTPC cache size tested was 100 KB, while the RESTPS cache size was 5 MB.
A. Hit ratio
The hit ratio represents the ratio of the number of items fetched from cache to the total number of requests made. For the RESTful proxy model, the hit ratio is calculated over both the RESTPC and the RESTPS. Figure 2 shows the hit ratio of the different architectures. It can be seen that when using a single cache, the hit ratio slowly increases to 72% as the number of requests grows and the cache fills. When using the RESTful proxy, both with and without the client cache, the hit ratio is 81.7%, an increase of around 9.5 percentage points. This increase can be explained by the cache replacement policy used by the RESTPS and by the fact that more resources can be stored in the cache when compression is used and the data exchange between the RESTPC and the RESTPS is optimized.
B. Byte-hit ratio
The byte-hit ratio represents the ratio of the total size of resources fetched from cache to the total size of resources served. Figure 3 shows the byte-hit ratio of the different architectures; this metric is calculated over both the RESTPC and the RESTPS. When using a single cache, the byte-hit ratio is around 72%; when using the RESTful proxy, it is around 82%. The byte-hit ratio is higher due to the Greedy Dual-Size algorithm implemented: by keeping larger but compressed resources in the cache, these larger items are served directly from cache.
C. Bandwidth usage
Bandwidth usage represents the total number of bytes sent over the network, and the bandwidth savings represent the total size of resources that were not sent over the network but retrieved from cache. Since the proposed model is 2-tier, when a resource needs to be validated by the RESTPS either the complete resource or only a validation message needs to be sent. When no cache is used, the bandwidth used by the service provider and the clients for the 1,000,000 requests is 5,130 MB. When a single cache is used, the bandwidth used by the service provider falls to 1,413 MB, a bandwidth saving of 72% for the service provider. However, for the clients, 5,130 MB of data is still transferred, since the cached data is served to them from the cache rather than from the service provider.


When the RESTful proxy is used without the client component, only 1,026 MB is required for all requests. This represents the bandwidth consumed by both the server and the clients, a bandwidth reduction of 80%. This reduction in data usage can be explained by the use of compression by the RESTful proxy. When the RESTful proxy is used with both client and server caches, the amount of data transferred for all 1,000,000 requests is only 712 MB, a bandwidth usage reduction of 86%. The additional gain is explained by the fact that some requests can be served from the client's local cache, so no network data transfer is required.
Figure 2: Hit Ratio

Figure 3: Byte Hit Ratio

Figure 4: Bandwidth Usage

VIII. Conclusion
Based on the above results, it can be deduced that the RESTful proxy will bring cost savings to both the service provider and mobile users. For mobile users, there is a reduction in bandwidth usage and therefore a reduction in the amount spent on mobile data packages. For service providers, less bandwidth is required as well as less CPU time; consequently, the service will be able to serve more requests before reaching full capacity. The proposed model also has its limitations. Firstly, it focuses only on the economic gain for the mobile user; a full study needs to be done to determine the impact of the system on other key metrics such as response time, network latency, processing time, and battery use. Additionally, different compression algorithms need to be tested with the above model to determine which is most appropriate.




International Journal of Software and Web Sciences (IJSWS) www.iasir.net
Indoor Surveillance System using Image Processing
Upasana Dugad1, Chaitrali Mahanwar1, Nimisha Rajeev1, Rupali Deshmukh2
1Student, 2Assistant Professor
1,2Department of Computer Engineering, Fr. C. Rodrigues Institute of Technology, Sector 9A, Vashi, Navi Mumbai, 400708, INDIA
Abstract: Video surveillance has long been in use to monitor security-sensitive areas such as banks, department stores, highways, crowded public places, and borders. The purpose of this application is to detect a human intruder: it detects the occurrence of motion within the area under surveillance and then determines whether the motion is caused by a human intruder. This paper predominantly focuses on how motion and humans can be detected. The study of these concepts has established a strong foundation on which the application has been architected.
Keywords: Differentiation; Vitruvius proportions; Ghosting effect; Blob; perturbations
I. Introduction
Insecurity and crime constitute some of the major problems facing society today. People live with a fear of being attacked by burglars, vandals, and thieves. Security is one of the major issues today, and having a 24x7 human eye on every area is simply impossible, so realizing and managing a smart surveillance system has become a necessity. Despite all the effort, resources, and time devoted to developing tools that reduce crime rates and make the world a safer place to live, these problems are still increasing substantially. This gives rise to the need for continued development in motion-sensing technology. Even with the introduction of alarm systems, which have greatly reduced the level of insecurity, there is still a problem of false alarms that needs to be minimized. Also, changes in illumination, noise, and compression artifacts make motion detection a challenging problem. To achieve robust detection, a new technique is needed, i.e. using various image processing algorithms to detect motion. This system will provide security and ensure that alarms are activated only when an unauthorized person tries to gain access to the protected area.
II. Existing Systems
A burglar alarm may be implemented using different methods. Two of them are: (a) using image processing, and (b) using motion sensors. A motion detector is an electronic device that detects physical movement in a given area or designated location and transforms motion into an electric signal. All motion sensors indicate the same thing: that some condition has changed. All sensors have some 'normal' state; some sensors only report when the 'normal' status is disturbed, while others also report when the condition reverts to 'normal'. [1] However, there are a number of drawbacks associated with them:
• It might not cover a full room.
• Things may trigger the motion detector that you do not want to.
• An outside detector mounted too close to a light that stays on at night will be triggered continuously by bugs.
• The set shutoff time of the motion sensor may be too short (maybe only 30 seconds). For outside lights with a motion detector, if more "on time" is wanted, it is best to buy a model with an adjustable shutoff time.
• The reliability of a motion sensor may also be affected by rapid environmental changes and direct sunlight, as well as things like a fireplace or direct wind from an air conditioner or heater. This is because a PIR sensor actually detects changes in infrared energy, specifically the "heat energy" emitted at normal human skin temperature.
All of the above disadvantages can be overcome by using image processing. [2]
III. Human Detection Using Histograms
Once motion is detected using image processing, the next step is to detect whether the motion is due to a human intruder. This can be done using histograms or using the optical flow algorithm. However, the histogram-based method also has a number of drawbacks:
• Higher cost.
• Less accurate.
• Comparatively higher CPU load.


• Requires a lot of processing time.
Thus, to avoid these drawbacks, the optical flow algorithm is used to detect the presence of a human intruder.
IV. Proposed System
In this paper a burglar alarm system is proposed. The main objective is to provide security to the user. For developing a burglar alarm using image processing, there are two major steps to be followed: 1. motion detection, and 2. human detection.
A. Motion Detection using Image Processing
The surveillance system being designed is for a static background and for night-time use. Since no motion is expected at night, the first step is to detect whether there is any motion in the area under surveillance. Initially, the software switches on the webcam in video mode. A video is nothing but a stream of images, so the first step is to extract images from the video. The first image is stored as the background image.

Figure 1: Flowchart of the design of the system
All images following the first image are considered foreground images. A video has around 30 frames per second. Since the pace of a human is not that fast, frames are compared at an interval of two seconds: every two seconds, the captured image is compared with the background image. Image differentiation is performed on these two images, i.e. the background image is subtracted from the foreground image, and the two images are compared pixel by pixel. If there is a difference in the pixel values of the two images, the result of the subtraction is a non-zero value; otherwise it is zero. If there is no motion in the frame, there is no difference between the images. However, in case of motion, the two images differ, the pixel values are different, and the subtraction of the two images yields a non-zero result; thus, motion can be detected. The reason for comparing the images with the initial image, and not with the


image captured just prior to the current image, is that the latter may result in the ghosting effect. Consider a person entering the frame at instant t. If the image captured at instant t+1 is compared with this image, the result is an overlapped image of the person at instant t and at instant t+1, which makes it difficult to determine the exact position of the person. Such an effect is known as the ghosting effect. In order to avoid this, all images are compared with the initial image, which is static and contains no motion.
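A minimal sketch of the pixel-by-pixel differencing described above, assuming 8-bit grayscale-like frames; the per-pixel threshold and the 1% changed-pixel criterion are illustrative assumptions, since the paper does not give numeric thresholds.

    import java.awt.image.BufferedImage;

    // Background subtraction sketch: compare the current frame with the stored
    // background frame pixel by pixel; if enough pixels differ by more than a
    // threshold, motion is reported.
    public class MotionDetectionSketch {

        static boolean motionDetected(BufferedImage background, BufferedImage current) {
            int pixelThreshold = 30;      // minimum per-pixel intensity difference (assumed)
            int changedPixels = 0;
            for (int y = 0; y < background.getHeight(); y++) {
                for (int x = 0; x < background.getWidth(); x++) {
                    int bg = background.getRGB(x, y) & 0xFF;   // blue channel as grayscale proxy
                    int cur = current.getRGB(x, y) & 0xFF;
                    if (Math.abs(bg - cur) > pixelThreshold) {
                        changedPixels++;
                    }
                }
            }
            // Report motion if more than 1% of the pixels changed (assumed criterion).
            return changedPixels > 0.01 * background.getWidth() * background.getHeight();
        }
    }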

Figure 2: Difference in frames [3]

Figure 3: Ghosting effect [3]
Consider the example of a shop where the area facing the entrance is under surveillance. During night time there will be no motion in this area, so the video captured by the webcam should ideally be a series of identical images. Thus, when subsequently captured images are compared with the first frame, the result should be zero. However, when there is motion, say at instant t, the captured image will differ from the initial one; when this image is compared with the original background image, the difference in pixel values yields a non-zero result. This shows that there has been motion in the area under surveillance. [3]
B. Human Detection
B1. Scope of human detection
The needs of a human motion analysis system can be summarized as robustness, accuracy, and speed. Using these three features, it is possible to give a basic description of the requirements for a proper algorithm or software. Camera: the camera is not calibrated; it is situated in a fixed position and should not be affected by any kind of vibration. Scenario: the scenario is indoor; due to design requirements, it is preferable to avoid major illumination changes, hence the environment is indoor. Detected objects/persons: the target person's height has to appear completely in the scene, as the ultimate measurement is the height. In addition, a sub-stage of this software applies a filter based on human shape proportions, so it is necessary to have the complete shape of the person in the image. It is not necessary to have the person in movement, because the detection is based on frame differencing, comparing every frame with the previously computed background scenario. After it has been found that motion has occurred, the next step is to find the reason behind the motion, i.e. whether the motion is caused by a human intruder. The motion detected could be caused by an object falling, curtains flying, or other such reasons. So, to find out whether the motion is caused by an intruder, the second step is human detection. This system uses the optical flow algorithm for human detection.


Once the motion regions are detected, several sorts of moving objects can be distinguished, i.e. persons, animals, cars, etc. Objects are classified based on the shape of the motion regions. That information can be related to the surface, the aspect ratio, or the diffusion of the shape. It is common practice to consider a square box around the region of interest in order to work with the surface or with the width/length ratio of the box. This technique makes use of shape-based object classification based on human shape proportions. The frame obtained from the image differentiation technique is the input to this algorithm. The algorithm gives two outputs. The principal one is the coordinates of the detected object; with these coordinates, the aforementioned measuring system is able to calculate the height by itself. The other output is the video sequence with the detected person bounded by a rectangle, which provides visual feedback to the user.

Figure 4: System inputs, outputs and perturbations [4]
The goal of this application is to build a real-time human detection algorithm that could be used for further estimation of human height. The algorithm should accept any kind of video input; an important assumption is that it should work with any standard webcam. Furthermore, it is assumed that indoor scenes are being analyzed and that there is no need for camera calibration.
B2. Object classification
Once the motion regions are detected, several sorts of moving objects can be distinguished, i.e. persons, animals, cars, etc., even though false positives due to noise may appear. There are two methods to implement moving-object classification: shape-based and motion-based classification. Shape-based: objects are classified based on the shape of the motion regions; that information can be related to the surface, the aspect ratio, or the diffusion of the shape. It is common practice to consider a square box around the region of interest in order to work with the surface or with the width/length ratio of the box, and that is how it is done in this project. This technique makes use of shape-based object classification based on human shape proportions. The basic steps are:
(a) Getting blobs of the objects. The system draws a rectangle which, allegedly, fits a human body silhouette, i.e. the blob. The base and top coordinates of the detected object are estimated, and with these coordinates a person's height is estimated. An average height-to-width ratio of the object is computed.
(b) Blob generation. This is a simple technique which consists of filling up the complete shape of the moving object from the detected blobs using classic morphological processing procedures such as dilation and erosion. These blobs come from the result of differencing the background and the current frame, i.e. from the motion detection algorithm (a small sketch of the dilation step is given below).
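As a small illustration of the blob-filling step, the sketch below performs one binary dilation pass over a foreground mask obtained from frame differencing; the 3x3 structuring element is an assumption, and erosion would be implemented analogously.

    // One morphological dilation pass over the binary foreground mask, which helps
    // fill small holes inside a blob before the bounding rectangle is computed.
    public class BlobDilationSketch {

        static boolean[][] dilate(boolean[][] mask) {
            int h = mask.length, w = mask[0].length;
            boolean[][] out = new boolean[h][w];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    // A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground.
                    for (int dy = -1; dy <= 1 && !out[y][x]; dy++) {
                        for (int dx = -1; dx <= 1 && !out[y][x]; dx++) {
                            int ny = y + dy, nx = x + dx;
                            if (ny >= 0 && ny < h && nx >= 0 && nx < w && mask[ny][nx]) {
                                out[y][x] = true;
                            }
                        }
                    }
                }
            }
            return out;
        }
    }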

Figure 5: Window showing the blob of a person [4]
Now a rectangle is drawn which completely inscribes the blob. [4]
B3. Deciding Stage
Using the measurements of the rectangle, the base and top coordinates are found, and from these the height of the blob is determined. In general, according to the Vitruvian proportions, the ratio between the height and width of a human is about 1.7, so the ratio of the height to the width of the blob is computed. If the ratio lies in the range 1.6-1.8, a human is detected; if not, the motion is not caused by an intruder. If a human is detected, the alarm rings; otherwise the remaining frames continue to be compared with the first frame, and the entire procedure repeats while the video is on.
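A minimal sketch of this deciding stage: the height-to-width ratio of the blob's bounding rectangle is checked against the 1.6-1.8 range stated above.

    // Decide whether the detected blob is a person using the Vitruvian
    // height-to-width ratio (about 1.7, accepted between 1.6 and 1.8).
    public class HumanDecisionSketch {

        static boolean isHuman(int top, int base, int left, int right) {
            double height = base - top;
            double width = right - left;
            if (width <= 0) {
                return false;
            }
            double ratio = height / width;
            return ratio >= 1.6 && ratio <= 1.8;
        }
    }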


V. Conclusion
Using sensors for motion detection is not completely reliable, as it has a higher probability of raising false alarms and can function inefficiently, whereas in the proposed system motion detection using image processing is much faster and more cost-effective. Effective storage and transmission, availability in any desired format, and noise-free processing are the advantages of using image processing for motion detection. For human detection, the traditional HOG approach cannot describe body detail in larger image regions, requires a lot of processing time, often yields noisy information, and, as the final descriptor vector grows larger, the time for extracting and training on a given class increases. The optical flow method of the proposed system, by contrast, deals with vibrations and illumination changes. The problem of detecting the silhouette of a person correctly and distinguishing it from windows and doors is solved by checking the human proportion constraints, based on the number of moving pixels inside the detected rectangle. It also helps save a lot of computational time.
References

[1] http://www.engineersgarage.com/articles/motion-detection-sensors
[2] http://www.answers.com/Q/Disadvantages_for_motion_sensor
[3] E. Martínez-Martín and Á. P. del Pobil, Robust Motion Detection in Real-Life Scenarios.
[4] http://www.bth.se/fou/cuppsats.nsf/all/91ff3eaa9d89ef19c125795400337419/$file/BTH2011Quiros.pdf


International Journal of Software and Web Sciences (IJSWS) www.iasir.net
Primary Education Analysis Based on Decision Tree, Apriori Algorithm with Neural Networks

1Manmohan Singh, 2Anjali Sant
1Research Scholar, Department of Computer Science, Mewar University, Rajasthan, India
2Department of Mathematics, Bhopal Institute of Tech. and Science, Bhopal, Madhya Pradesh, India
Abstract: In this research work we concentrate on measuring the status of primary education in Betul District, M.P., India. It is known that the literacy rate of Betul District is very low, and the situation is not very different across India. We measure the result (pass) rate of primary school classes III, IV, and V across the different blocks of the district. We use the Apriori algorithm to separate the relevant data from irrelevant and tertiary-level data, and then apply a neural network (multilayer perceptron) trained on the data set to obtain better results. Finally, we compare these results with those obtained from a decision tree. We find that the neural network is the better tool for measuring the result rate and for identifying how to improve it, and it produces a better improvement factor.
Keywords: Decision Tree Algorithm, Neural Network, Apriori algorithm, Primary school, Association Mining.
I. Introduction
The development of any country depends mainly on its manpower, and the pillar of good manpower is primary education. It is easy to see that as more children are engaged in primary education, the hopes of prosperity rise for the respective country. Children's learning is very important for every country, rich or poor. The countries that are more developed are developed because of their education level, and that development starts at the primary level. Children are naturally very curious about everything and love to engage with innovative culture and fashion. To ensure children's literacy, governments as well as parents should take effective measures that attract children to learn with joy and enjoyment. Proper education for children can empower human beings to liberate the individual mind from the curse of ignorance and darkness. It represents the foundation of the development process of any society and a key indicator of the people's progress and prosperity. In view of the importance of education to a country like India, the present paper addresses the limitations of the primary education system, which is diversified and multifarious due to economic, socio-cultural, political, regional, and religious factors. Primary education is maintained mainly by the Government of India. More than 73% of schools are controlled by the government, and around 82% of the total children enrolled at the primary level go to these schools (Baseline Survey, 2005:3). Similarly, more than 69% of primary teachers work in government-controlled schools. Besides government-run primary schools, nine other categories of primary schools are administered, monitored, and maintained by different authorities. Disparity and lack of coordination among these institutions constrain the attainment of universal primary education and efforts to increase enrolment and the quality of education. Variations in the teacher-student ratio and in the number of qualified and trained teachers between the categories also pose a big challenge towards achieving the goals of universal primary education.
In this open scenario, India became one of the signatories to the UN Millennium Declaration in 2000 and committed to the eight Millennium Development Goals (MDGs) that affirmed a vision for the 21st century (Burns et al, 2003:23). Bangladesh also pledged to implement the MDGs roadmap by 2016. The MDG-2 targets for 'School Chalo Abhiyan: Universal Primary Education' are claimed to be on track in India, showing remarkable achievements in terms of the net enrolment rate in primary education, from 72.7% in 1992 to 86% in 2005, and primary education completion, from 44.5% in 1992 to 85.3% in 2004 (Triumvir, 2005:120). The Indian government itself has taken many initiatives, including the Compulsory Primary Education Act 1973, which made the five-year primary education programme free in all primary schools. The government adopted demand-side intervention policies such as the food-for-education programme and the stipend programme for primary education. Later, the government introduced the Primary Education Development Program (PEDP-II), a five-year programme beginning in 1983, which aims to increase access, quality, and efficiency across the board in the primary education sector despite existing socioeconomic problems. India has by now achieved good progress in the net enrolment rate and the education completion rate in primary education. The current paper examines the outcomes and challenges that have emerged in Betul District, and further investigates whether the target of the second Millennium Development Goal is attainable within the stipulated time. To improve literacy at the primary level, we examined and interviewed many children at various places and found that children tend to learn through joy and entertainment. Recently, various primary schools in Betul District (M.P.), following strategies used in other districts and states, implemented the School Chalo


Abhiyan campaign at the primary school level and obtained very positive results. Schools are being supported by constructing roads and by providing food: a morning-time lunch called Madhyan Bhojan, intended to give students easy access to schools and good healthy food. Construction of 70 more schools is underway, and 45 roads have been paved for easy access to schools for students from nearby areas [3]; about 78 more are under construction. The Madhyan Bhojan facility is provided by the government all over India, including the rural and urban area schools of Betul District. A student of class IV at a rural village government primary school said she could not attend school in the rainy season last year because of the bad condition of the road to her school, but she has not missed any classes this year after the road was repaired. The headmaster of the rural government primary school (Chakli Kala, Multai Block) said most students now come to school in the rainy season, whereas attendance hovered around 70-75 percent during the same period before the road was repaired; he credited the children's School Chalo Campaign. In a similar case, private primary schools in urban and rural areas do not provide any type of food facility, but the school management provides some extra-curricular activities and motivates the students as well as their parents; this is an additional attraction for students, he added. Several students were found waiting for their turn on a swing at an M.P. government primary school; many of them come to school early to play on the swings. A student of class V of a rural area government primary school said their playground used to go under water in the rainy season; it was developed last year, and now they can use it all year round. The headmaster of the rural government primary school (Amdana, Goradongri Block) said the student initiative helped increase the attendance rate in his school: now about 93 to 97 percent of students attend classes, while it was 80 to 85 percent a year ago. The headmaster of the Amla government primary school (Barai Village, Multai Block) [3] said the children's School Chalo Campaign of the school has brought a big change, with the attendance rate going up to more than 90 percent from below 80 percent. Students said they are setting up children's School Chalo Campaign activities on school playgrounds to provide students with leisure facilities on instructions from the school headmaster. The organization of this work starts with the literature overview after the introductory description. In the introductory description we have observed the situation of primary education in Betul District and the current status of the country. In the literature study we look at the situation across the world, especially in developing countries. Data collection was done in all the blocks of Betul District, covering rural and urban area private, government, and semi-government schools; the data set consists of real-world data from various primary schools of Betul, Madhya Pradesh. It is very alarming that we found a great deal of irregularity while collecting the data. Data collection helps us design the intelligent system for our desired work. First we applied data classification techniques to the collected data; we chose the decision tree and the Apriori algorithm for association analysis of the data set. The data set classification helps us reduce the redundant data. Last but not least, we compare the results with a neural network (multilayer perceptron).
II. Literature Overview
[Ma, Liu, Wong, Yu, and Lee 2000] applied a data mining approach based on association rules in order to select weak tertiary school students (n = 264) of Singapore for remedial classes. The input variables included demographic attributes (e.g. sex, region) and school performance over past years, and the proposed solution outperformed the traditional allocation procedure. Three scoring measures, namely Scoring Based on Associations (SBA-score), C4.5-score, and NB-score, were used for evaluating the prediction in connection with the selection of students for remedial classes; it was found that the predictive accuracy of the SBA-score methodology was 20% higher than that of the C4.5-score and NB-score methods and of the traditional method.
[Walters and Soyibo 2001] conducted a study to determine Jamaican high school students' (n = 305) level of performance on five integrated science process skills, with performance linked to gender, grade level, school location, school type, student type, and socio-economic background (SEB). The results revealed a positive significant relationship between the academic performance of the students and the nature of the school.
[Minaei-Bidgoli et al. 2003] modeled online student grades from Michigan State University using three classification approaches (binary: pass/fail; 3-level: low, middle, high; and 9-level: from 1, the lowest grade, to 9, the highest score). The database included 227 samples with online features (e.g. number of correct answers or tries for homework), and the best results were obtained by a classifier ensemble (e.g. decision tree and neural network) with accuracy rates of 94% (binary), 72% (3 classes), and 62% (9 classes).
[Varapron et al. 2003] used Rough Set theory as a classification approach to analyze student data, where the Rosetta toolkit was used to evaluate the student data and describe different dependencies between the attributes and the student status. The discovered patterns are explained in plain English.
[Kotsiantis et al. 2004] applied several data mining algorithms to predict the performance of computer science students in a university distance-learning program. For each student, several demographic attributes (e.g. sex, age, marital status)


and performance attributes (e.g. the mark in a given assignment) were used as inputs to a binary pass/fail classifier. The best solution was obtained by a Naive Bayes method with an accuracy of 74%; it was also found that past school grades have a much higher impact than demographic variables.
[Khan 2005] conducted a performance study on 400 students, comprising 200 boys and 200 girls selected from the senior secondary school of Aligarh Muslim University, Aligarh, India, with the main objective of establishing the prognostic value of different measures of cognition, personality, and demographic variables for success at the higher secondary level in the science stream. The selection was based on a cluster sampling technique, in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analysis. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream, and boys with low socio-economic status had relatively higher academic achievement in general.
[Moriana, Alos, Alcala, Pino, Herruzo, and Ruiz 2006] studied the possible influence of extra-curricular activities, both study-related (tutoring or private classes, computers) and sports-related (indoor and outdoor games), on the academic performance of secondary school students in Spain. A total of 222 students from 12 different schools were sampled and categorized into two groups as a function of student activities (both sports and academic) outside the school day. Analysis of variance (ANOVA) was used to verify the effect of extracurricular activities on academic performance, and it was observed that the group involved in activities outside school yielded better academic performance.
[Pardos et al. 2006] collected data from an online tutoring system regarding USA 8th-grade math tests. The authors adopted a regression approach, where the aim was to predict the math test score based on individual skills. The authors used Bayesian networks, and the best result was a predictive error of 15%.
[Anand Kumar and Uma 2009] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated to Punjab University. The hypothesis framed was that "students' attitude towards attendance in class, hours spent in study on a daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors like mother's education and the student's family income were highly correlated with student academic performance.
III. Decision Tree
The philosophy of operation of any algorithm based on decision trees is quite simple. Although the algorithms in this category sometimes contain important differences in the way a given step is performed, all of them are based on the divide-and-conquer strategy. In general, this philosophy is based on the successive division of the problem into several sub-problems with a smaller number of dimensions, until a solution for each of the simpler problems can be found.
Based on this principle, decision tree classifiers try to divide the universe into successively smaller subgroups (creating nodes containing the respective tests) until each subgroup addresses only one class, or until one class shows such a clear majority that further divisions are not justified, in which case a leaf containing the majority class is generated. To classify a new example, one simply follows the path dictated by the successive tests placed along the tree until a leaf is reached, and the class stored in that leaf is assigned to the example.

Figure 1: Decision Tree

We now need objective criteria for judging how good a split is. The information gain measure is used to select the test attribute at each node in the tree. The attribute with the highest information gain (or greatest entropy reduction) is chosen as the test attribute for the current node; this attribute minimizes the information needed to classify the samples in the resulting partitions. Entropy, in general, measures the amount of disorder or uncertainty in a system. In the classification setting, higher entropy (more disorder) corresponds to a sample with a mixed collection of labels, while lower entropy corresponds to mostly pure partitions. In information theory, the entropy of a sample D is defined as

H(D) = - \sum_{i=1}^{k} p_i \log_2 p_i

where p_i is the probability of a data point in D being labeled with class c_i, and k is the number of classes. p_i can be estimated directly from the data as

p_i = |{x in D : class(x) = c_i}| / |D|

We can also define the weighted entropy of a decision/split as

H(D | split) = (|D_Y| / |D|) H(D_Y) + (|D_N| / |D|) H(D_N)

where D has been partitioned into D_Y and D_N due to some split decision. Finally, we can define the information gain for a given split as

Gain(D, split) = H(D) - H(D | split)
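As a concrete illustration of these formulas, the following short Python sketch computes the entropy of a labeled sample and the information gain of a candidate binary split; the function names, variable names and toy data are our own illustration, not taken from the paper:

from collections import Counter
from math import log2

def entropy(labels):
    # H(D) = -sum p_i * log2(p_i), with p_i estimated from class frequencies
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_mask):
    # Gain = H(D) minus the weighted entropy of the two partitions induced by split_mask
    d_yes = [y for y, m in zip(labels, split_mask) if m]
    d_no = [y for y, m in zip(labels, split_mask) if not m]
    weighted = (len(d_yes) / len(labels)) * entropy(d_yes) + \
               (len(d_no) / len(labels)) * entropy(d_no)
    return entropy(labels) - weighted

# Example: grades of ten students split by whether the school type is Government
grades = ["A", "A", "B", "B", "B", "C", "A", "C", "B", "A"]
is_government = [True, True, False, True, False, False, True, False, True, True]
print(information_gain(grades, is_government))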

In other words, Gain is the expected reduction in entropy caused by knowing the value of an attribute. The decision tree obtained for the student data is listed below:

School_type = Private: B (161.0/64.0)
School_type = Government
|  Mother_Education = Middle
|  |  Category = SC: A (13.0/6.0)
|  |  Category = ST
|  |  |  Sex = M: B (2.0)
|  |  |  Sex = F: C (2.0)
|  |  Category = OBC: B (26.0/16.0)
|  |  Category = Gen: A (28.0/13.0)
|  |  Category = Gen: A (0.0)
|  Mother_Education = SSC: A (19.0/8.0)
|  Mother_Education = HSC
|  |  Sex = M
|  |  |  Familysize = Large: A (5.0/2.0)
|  |  |  Familysize = Small: B (2.0)
|  |  Sex = F: C (2.0/1.0)
|  Mother_Education = Primary
|  |  Location_School = Urban: C (18.0/9.0)
|  |  Location_School = Rural
|  |  |  Sex = M: B (50.0/25.0)
|  |  |  Sex = F: C (50.0/25.0)
|  Mother_Education = Literal: C (59.0/26.0)
|  Mother_Education = UG: A (2.0)
|  Mother_Education = PG: C (0.0)
|  Mother_Education = Illiteral
|  |  Father_Education = SSC: C (0.0)
|  |  Father_Education = Middle: B (3.0/1.0)
|  |  Father_Education = HSC: A (1.0)
|  |  Father_Education = Primary: C (11.0/5.0)
|  |  Father_Education = Literal: C (1.0)
|  |  Father_Education = UG: C (0.0)
|  |  Father_Education = Illiteral
|  |  |  Category = SC: A (1.0)
|  |  |  Category = ST: A (3.0/1.0)
|  |  |  Category = OBC: B (6.0/2.0)
|  |  |  Category = Gen: C (2.0/1.0)
|  |  |  Category = Gen: A (0.0)
|  |  Father_Education = PG: C (0.0)
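The listing above has the flavour of C4.5-style output (such as Weka's J48), although the paper does not name the tool. A roughly equivalent workflow can be sketched in Python with scikit-learn; the file name, column names and leaf-size parameter below are placeholders, not the authors' actual configuration:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder file: categorical student attributes plus a Grade column (A/B/C)
df = pd.read_csv("student_performance.csv")
features = ["School_type", "Mother_Education", "Father_Education",
            "Category", "Sex", "Familysize", "Location_School"]

# scikit-learn trees need numeric inputs, so encode the categorical attributes
X = OrdinalEncoder().fit_transform(df[features])
y = df["Grade"]

# The entropy criterion corresponds to the information-gain measure described above
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5)
tree.fit(X, y)
print(export_text(tree, feature_names=features))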

IV. Apriori Algorithm
1: Find all large 1-itemsets L1
2: for (k = 2; Lk-1 is non-empty; k++) {
3:     Ck = apriori-gen(Lk-1)
4:     for each c in Ck, initialise c.count to zero
5:     for all records r in the DB {
6:         Cr = subset(Ck, r); for each c in Cr, c.count++
7:     }
8:     Lk = all c in Ck whose count >= minsup
9: } /* end; return all of the Lk sets */
The support of each candidate k-itemset in Ck is obtained by counting how many times each of these itemsets appears in the records of the database.
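For readers who want to experiment, the loop above can be sketched in plain Python roughly as follows. This is a simplified illustration (absolute support counts and a naive join step instead of a full apriori-gen with subset pruning), not the implementation used in the paper:

from itertools import combinations

def apriori(transactions, minsup):
    """transactions: list of sets of items; minsup: minimum absolute support count."""
    # L1: all 1-itemsets meeting the support threshold
    items = {i for t in transactions for i in t}
    L = [{frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= minsup}]
    k = 2
    while L[-1]:
        # candidate generation: join frequent (k-1)-itemsets into k-itemsets
        prev = L[-1]
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # count the support of each candidate over the whole database
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        L.append({c for c, n in counts.items() if n >= minsup})
        k += 1
    return [lk for lk in L if lk]

# toy usage with three transactions
db = [{"Regular", "Pass", "Govt"}, {"Regular", "Pass"}, {"Govt", "Pass"}]
print(apriori(db, minsup=2))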

[Table 1 in the original shows, for 20 sample records, a binary (1/0) occurrence matrix over item values such as P_Grade(A), Private, Rural, F_Govt, M_Govt, Fm_size4, Regular, OBC and Male; the row and column alignment was lost in extraction, so the matrix is not reproduced here.]
Table 1: Using Apriori Algorithm
Minimum support: 0.55 (257 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 9
Generated sets of large itemsets:
Size of set of large itemsets L(1): 7
Size of set of large itemsets L(2): 13
Size of set of large itemsets L(3): 11
Size of set of large itemsets L(4): 5
Size of set of large itemsets L(5): 1
Best rules found:
1. Attendence_School=Regular 442 ==> Previous_Result=Pass 442 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.95)
2. School_type=Government 306 ==> Previous_Result=Pass 306 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.66)
3. School_type=Government Attendence_School=Regular 302 ==> Previous_Result=Pass 302 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.65)
4. Familysize=Small Attendence_School=Regular 281 ==> Previous_Result=Pass 281 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.6)
5. Sex=M 265 ==> Previous_Result=Pass 265 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.57)
6. Leaving_Area=Rural 263 ==> Previous_Result=Pass 263 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.56)
7. Location_School=Rural Leaving_Area=Rural 262 ==> School_type=Government 262 <conf:(1)> lift:(1.53) lev:(0.19) [90] conv:(90.33)
8. School_type=Government Leaving_Area=Rural 262 ==> Location_School=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
9. School_type=Government Location_School=Rural 262 ==> Leaving_Area=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
10. Location_School=Rural Previous_Result=Pass 262 ==> School_type=Government 262 <conf:(1)> lift:(1.53) lev:(0.19) [90] conv:(90.33)
11. School_type=Government Location_School=Rural 262 ==> Previous_Result=Pass 262 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.56)
12. School_type=Government Leaving_Area=Rural 262 ==> Previous_Result=Pass 262 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.56)
13. Location_School=Rural Previous_Result=Pass 262 ==> Leaving_Area=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
14. Location_School=Rural Leaving_Area=Rural 262 ==> Previous_Result=Pass 262 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.56)
15. Location_School=Rural Leaving_Area=Rural Previous_Result=Pass 262 ==> School_type=Government 262 <conf:(1)> lift:(1.53) lev:(0.19) [90] conv:(90.33)
16. School_type=Government Leaving_Area=Rural Previous_Result=Pass 262 ==> Location_School=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
17. School_type=Government Location_School=Rural Previous_Result=Pass 262 ==> Leaving_Area=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
18. School_type=Government Location_School=Rural Leaving_Area=Rural 262 ==> Previous_Result=Pass 262 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.56)
19. Location_School=Rural Previous_Result=Pass 262 ==> School_type=Government Leaving_Area=Rural 262 <conf:(1)> lift:(1.78) lev:(0.25) [115] conv:(115.01)
20. Location_School=Rural Leaving_Area=Rural 262 ==> School_type=Government Previous_Result=Pass 262 <conf:(1)> lift:(1.53) lev:(0.19) [90] conv:(90.33)
21. School_type=Government Leaving_Area=Rural 262 ==> Location_School=Rural Previous_Result=Pass 262 <conf:(1)> lift:(1.78) lev:(0.25) [115] conv:(115.01)
22. School_type=Government Location_School=Rural 262 ==> Leaving_Area=Rural Previous_Result=Pass 262 <conf:(1)> lift:(1.78) lev:(0.25) [114] conv:(114.45)
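Each rule line reports support counts together with confidence, lift, leverage and conviction. These metrics can be recomputed from the raw counts with a small sketch like the one below. The total number of transactions n is not stated explicitly in the paper; it is estimated here as roughly 467 (a minimum support of 0.55 corresponding to 257 instances), and the textbook definition of conviction is used, whereas the tool that produced the output above evidently applies a smoothing correction to conviction for confidence-1 rules:

def rule_metrics(n, count_a, count_b, count_ab):
    """Metrics for an association rule A ==> B, given absolute counts over n transactions."""
    supp_a, supp_b, supp_ab = count_a / n, count_b / n, count_ab / n
    confidence = supp_ab / supp_a
    lift = confidence / supp_b
    leverage = supp_ab - supp_a * supp_b
    # textbook conviction; undefined (infinite) when confidence == 1
    conviction = (1 - supp_b) / (1 - confidence) if confidence < 1 else float("inf")
    return confidence, lift, leverage, conviction

# Rule 7: Location_School=Rural & Leaving_Area=Rural (262) ==> School_type=Government (306)
# With n around 467 this reproduces the reported lift of about 1.53 and leverage of about 0.19.
print(rule_metrics(n=467, count_a=262, count_b=306, count_ab=262))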

V. Neural networks
Neural networks represent a brain analogy for information processing. These models are biologically inspired rather than exact replicas of how the brain actually functions. Figure 2 shows the similarities between artificial neural networks and biological neurons. Neural ideas are usually implemented as software simulations of massively parallel processes involving processing elements interconnected in a network architecture.

Figure 2: The biological and artificial neurons
Neurons receive the sum of information from other neurons or from external inputs, perform a transformation on the inputs, and then pass the transformed information on to other neurons or to the external outputs. A typical structure is shown in Figure 3. For better measurement and more accurate results we experimented with a multi-layer neural network, an advanced computational and learning method in modern computation and intelligent systems. Such a network consists of three kinds of layers: an input layer, hidden layers, and an output layer. A hidden layer receives input from the previous layer and converts those inputs into outputs for further processing. Several hidden layers can be placed between the input and output layers, although it is quite common to use only one hidden layer. Every unit is connected to the units of the neighbouring layers, so a neural network is very much like a directed graph in which the neuron cells are the vertices and the connections between cells are the edges; each edge is associated with a weight, and the weights reflect the mapping between the inputs and the outputs. An artificial neural network (ANN) of this kind consists of adaptive linear neural elements (ADALINE-style units) that adjust their weights according to the propagation of information through the network during the learning phase.
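As an illustration of such a network with one hidden layer, the following scikit-learn sketch trains a small classifier on encoded student attributes; the file name, feature encoding, layer size and train/test split are assumptions for the sketch, not the configuration reported by the authors:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OrdinalEncoder

# Placeholder data: categorical student attributes and a Grade label
df = pd.read_csv("student_performance.csv")
X = OrdinalEncoder().fit_transform(df.drop(columns=["Grade"]))
y = df["Grade"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# One hidden layer of 10 units between the input and output layers
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
print("accuracy:", mlp.score(X_test, y_test))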

Figure 3: Neural Network
VI. Comparisons
The results of the decision tree, Apriori and neural network approaches are all useful for this research, and all of the procedures are suitable for predicting student improvement. Comparing their efficiency on our data, the neural network performed better than the decision tree: in this study the neural network was 96% accurate while the decision tree was 92.5% accurate.
VII. Conclusion
This research can help to assess the degree of improvement of students in any country, especially in developing countries. We measured a significant amount of progress of students from the primary stage onwards, despite various socio-economic problems such as lack of knowledge, poverty, and social barriers. Our implementation is an efficient data mining approach. We also noticed a few drawbacks regarding the timeline and the collection and organization of data, and we intend to overcome these problems in future work.

VIII. Acknowledgements

We thank the head masters of the schools who helped us by providing the data regarding enrolments and improvement results. We are grateful to the head teachers of the government, private and semi-government primary schools of Betul District for their regular support during data collection and analysis.

References
[1] Pardeep Mittal, Preet Inder Kaur, Hardeep Kaur, "Implementation of Government education system using data mining", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 4, April 2013.
[2] Osman N. Darcan and Bertan Y. Badur, "Student Profiling on Academic Performance Using Cluster Analysis", IBIMA Publishing Journal of e-Learning & Higher Education, Article ID 622480, 8 pages, 2012. DOI: 10.5171/2012.622480.
[3] Brajesh K. Bharadwaj and Saurabh Pal, "Data Mining: A prediction for performance Improvement using classification", International Journal of Computer Science and Information Security (IJCSIS), Vol. 9, No. 4, April 2011.
[4] V. Ramesh, P. Parkavi, P. Yasodha, "Performance Analysis of Data Mining Techniques for Placement Chance Prediction", International Journal of Scientific & Engineering Research, Vol. 2, Issue 8, August 2011, ISSN 2229-5518.
[5] Ernesto Pathros Ibarra García, Pablo Medina Mora, "Model Prediction of Academic Performance for First Year Students", in Proc. 10th Mexican International Conference on Artificial Intelligence (MICAI), pp. 169-174, 2011.
[6] Adel Ben Youssef and Mounir Dahmani, "The impact of ICTs on students' performance in Higher Education: Direct effects, Indirect effects and Organizational change", 2010.
[7] N. V. Anand Kumar and G. V. Uma, "Improving Academic Performance of Students by Applying Data Mining Technique", European Journal of Scientific Research, ISSN 1450-216X, Vol. 34, No. 4, pp. 526-534, 2009.
[8] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, Morgan Kaufmann Publishers, 2008.
[9] M. Bray, "The Shadow Education System: Private Tutoring And Its Implications For Planners", 2nd ed., UNESCO, Paris, France, 2007.
[10] J. A. Moriana, F. Alos, R. Alcala, M. J. Pino, J. Herruzo, and R. Ruiz, "Extra Curricular Activities and Academic Performance in Secondary Students", Electronic Journal of Research in Educational Psychology, Vol. 4, No. 1, pp. 35-46, 2006.
[11] Z. Pardos, N. Heffernan, B. Anderson, and C. Heffernan, "Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks", in Proc. of 8th Int. Conf. on Intelligent Tutoring Systems, Taiwan, 2006.
[12] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd ed., Elsevier, Boston, MA, 2006.
[13] D. Hand, H. Mannila, and P. Smyth, "Principles of Data Mining", The MIT Press, 2006.
[14] S. T. Hijazi and R. S. M. M. Naqvi, "Factors Affecting Student's Performance: A Case of Private Colleges", Bangladesh e-Journal of Sociology, Vol. 3.
[15] Z. N. Khan, "Scholastic Achievement of Higher Secondary Students in Science Stream", Journal of Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.
[16] S. Kotsiantis, C. Pierrakeas, and P. Pintelas, "Predicting Students' Performance in Distance Learning Using Machine Learning Techniques", Applied Artificial Intelligence (AAI), Vol. 18, No. 5, pp. 411-426, 2004.

Generating Recurrent Patterns Using Clique Algorithm
Bipin Nair B J
Lecturer in Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysore Campus, Karnataka, INDIA
_________________________________________________________________________________________
Abstract: Clustering is one of several machine learning techniques used to find frequent patterns. Most clustering methods are not designed for high-dimensional data: as the number of dimensions increases, cluster formation becomes a major challenge in data mining. As a solution to this problem, an algorithm called CLIQUE was introduced. CLIQUE is a clustering algorithm that helps to find frequent patterns in high-dimensional data and is used in many real-world applications. This report looks into a sales application, where we find frequent patterns in product sales; the algorithm can be regarded as one way to improve sales.
Keywords: Clustering, Clique
__________________________________________________________________________________________
I. INTRODUCTION
Data mining is the process of extracting data from a huge data store and discovering interesting patterns in it. Data mining models are mainly of two types: predictive and descriptive. A predictive model predicts unknown values on the basis of known data, while a descriptive model is used to identify patterns. Clustering is an important technique in data mining. It is a grouping process whereby similar objects are grouped together: objects inside a cluster are similar to one another and dissimilar to objects belonging to other clusters. Clustering is a descriptive model used to find frequent patterns in data, and it is also known as unsupervised classification. Different approaches to clustering exist, such as those based on maximizing the similarity between objects of the same class (intra-class similarity) and those based on minimizing the similarity between objects of different classes (inter-class similarity). One of the major problems in clustering is the curse of dimensionality [13]: as the number of dimensions grows, the distance and similarity measures on which clustering relies to group objects become less meaningful. Here, the problem is to find clusters in a full-dimensional space, whereas clusters of points may only exist in subsets of the high-dimensional space, and the number of possible subspaces is exponential in the dimensionality of the space. To overcome this problem CLIQUE (Clustering in Quest) is used [2]. CLIQUE combines a grid-based and a density-based approach to clustering in high-dimensional data spaces and is characterized as dimension-growth subspace clustering; it was the first high-dimensional subspace clustering algorithm [7]. Dense units are found in subspaces of increasing dimensionality: dimension-growth subspace clustering starts with the single-dimensional subspaces and then grows upwards to higher-dimensional ones, discovering dense regions in each subspace. CLIQUE partitions each dimension like a grid structure and determines whether a cell is dense based on the number of points it contains; a unit is dense if the number of data points in it exceeds a threshold value. The CLIQUE algorithm finds the crowded regions in a multidimensional database and discovers patterns. The application described here employs the CLIQUE algorithm to interpret sales data.
Using this algorithm makes it easy to analyse sales details and suggest patterns that help businesses make informed and profitable decisions. All details related to sales are available in the dataset, and the CLIQUE algorithm uncovers the hidden frequent patterns in it. Based on these frequent patterns, sales can be improved by taking the necessary decisions and making better plans for the future.
II. OUTLINE OF THE PAPER
Section III explains the CLIQUE algorithm with a brief discussion of each step. Section IV explains the implementation of the CLIQUE algorithm along with pseudo code, in subsections covering preprocessing, customization, the base algorithm, post-processing and the model. Section V discusses related work. Section VI presents the experimental results of the application. Section VII corresponds to the literature survey, and the final section is the conclusion.

III. CLIQUE ALGORITHM
There are mainly three steps in the CLIQUE algorithm:
1) Identification of the subspaces that contain clusters.
2) Identification of the clusters.
3) Generation of a minimal description for the clusters.
Brief discussion of the algorithm:
1) Identification of the subspaces that contain clusters: to identify the subspaces that contain clusters, one first needs to identify the dense units in the different subspaces. A bottom-up iterative process is used to find the dense subspace units.
2) Identification of the clusters: the dense units whose selectivity is at least tau form frequent patterns, and similar frequent patterns are merged into clusters.
3) Generation of a minimal description for the clusters: the algorithm first determines the maximal dense regions covering each cluster in the subspaces and then determines the minimal cover for those maximal regions. The same procedure is followed to find clusters as the dimensionality increases.
Algorithm 1: Pseudo code
1) Based on m, the input feature space is split.
2) The input is quantized to a particular grid.
3) Initialize the count (of elements) to 0 across all the grids in the feature space.
4) As each input record is read from the file, line by line, the count of the grid to which it is mapped is increased by 1.
5) For every attribute, activate the regions of high density, i.e., store the levels of that attribute which are important.
6) Now take two attributes at a time and check for dense regions in the intersection of the dense regions of the individual attributes.
7) Repeat step 6, adding one dimension with each iteration and choosing all possible combinations, until all the dimensions (attributes) in the data set are covered.
8) Label all the connected clusters with a label value.
(An executable sketch of these steps is given at the end of this section, after the description of the base algorithm.)
IV. IMPLEMENTING CLIQUE
When the data set has many dimensions, a naive search wastes time looking for clusters in highly sparse regions. To counter this problem, CLIQUE uses a simple but effective observation: if there is a cluster in a k-dimensional subspace, then every (k-1)-dimensional projection of that subspace must also be dense. Taking this in reverse, we start by looking for dense regions in each single dimension and then move upwards to higher and higher dimensions. This is precisely the intuition behind the algorithm.
A. Preprocessing: Preprocessing here is the discretization of the data set. Discretization is the process of putting values into buckets so that there is a limited number of possible states. Some columns may contain so many distinct values that the algorithm cannot easily identify interesting patterns in the data, so discretization is applied to them.
B. Customization: There are two customization parameters. 1. The number of levels into which a particular dimension is divided; this is usually taken care of in the preprocessing step, or in the data preparation stage itself. 2. The other parameter that can be customized is tau, also called the selectivity. If the number of data points within a grid cell falls below tau, that cell is not considered for further processing.
C. Base algorithm. Input: a dataset of N records, each of dimension d. Each dimension is split into bins (after preprocessing); if the dimensions are split into m1, m2, ..., md bins respectively, then the total number of bins/grids in the space is m1*m2*...*md. Either a d-element vector of bin counts or a single number m (splitting all dimensions equally into m bins, giving m^d grids) can be used. Output: cluster labels.
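As promised after the pseudo code, the following rough Python sketch illustrates steps 1 to 7: quantize each dimension into m bins, keep the bins whose counts reach the selectivity threshold tau, and combine dense units bottom-up across dimensions. The function names, toy data and parameter values are illustrative assumptions, not the authors' implementation, and step 8 (labelling connected clusters) is omitted:

from collections import Counter
from itertools import combinations

import numpy as np

def clique_dense_units(data, m=5, tau=3):
    """data: 2-D numpy array (rows = records, columns = dimensions)."""
    n_rows, n_dims = data.shape
    # Steps 1-2: quantize every dimension into m equal-width bins (grid cells)
    bins = np.empty_like(data, dtype=int)
    for d in range(n_dims):
        edges = np.linspace(data[:, d].min(), data[:, d].max(), m + 1)
        bins[:, d] = np.clip(np.digitize(data[:, d], edges[1:-1]), 0, m - 1)

    # Steps 3-5: count points per one-dimensional cell and keep the dense ones
    dense = {}  # maps a tuple of dimensions to the set of dense cells in that subspace
    for d in range(n_dims):
        counts = Counter(bins[:, d])
        dense[(d,)] = {(c,) for c, cnt in counts.items() if cnt >= tau}

    # Steps 6-7: grow subspaces one dimension at a time, keeping dense intersections
    for k in range(2, n_dims + 1):
        for dims in combinations(range(n_dims), k):
            # a subspace can only contain dense units if all its projections do
            if not all(dense.get(sub) for sub in combinations(dims, k - 1)):
                continue
            counts = Counter(tuple(bins[i, list(dims)]) for i in range(n_rows))
            cells = {c for c, cnt in counts.items() if cnt >= tau}
            if cells:
                dense[dims] = cells
    return dense

# toy usage with random 3-dimensional data
rng = np.random.default_rng(0)
print(clique_dense_units(rng.random((100, 3)), m=4, tau=10))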
D. Post-processing: In this algorithm only discretized inputs are considered and the input is not changed by the algorithm, so there is no specific post-processing step in this technique.
E. Model: There is no model as such for the algorithm. Since this is by itself an unsupervised learning procedure, the cluster labels are the output of the procedure.
V. RELATED WORKS
The CLIQUE algorithm proceeds from lower to higher dimensions, discovering dense regions first in low-dimensional subspaces and then in higher-dimensional ones.

There are many clustering algorithms available in data mining, and algorithms such as DBSCAN, OPTICS, and PROCLUS also cluster high-dimensional data.
DBSCAN is a density-based clustering method [6]. The algorithm has two parameters: tau, the distance within which the neighbourhood of a point is searched for other prospective candidates for the same cluster, and min_points, which tells the algorithm to give importance to a point only if at least min_points other points lie in its vicinity; this parameter can be increased to favour dense clusters and discard outliers. If the points in a cluster are close to each other and the required number of neighbours is present within that boundary, the points are classed into a cluster. A cluster satisfies two properties: all the points within the cluster are mutually density-connected, and any point that is density-connected to a point of the cluster is part of the cluster as well.
OPTICS is also a density-based clustering method [3], with essentially the same two parameters (tau and min_points) and the same two cluster properties as DBSCAN; again, min_points can be increased to accommodate dense clusters and leave out outliers. In both of these algorithms only numerical values are used as inputs (otherwise, only the numerical attributes of the dataset are given to the algorithm), because the metric used to measure closeness is the Euclidean distance.
PROCLUS (PROjected CLUStering) is a typical dimension-reduction subspace clustering method [1]. Instead of starting from single-dimensional spaces, PROCLUS starts by finding an initial approximation of the clusters in the high-dimensional attribute space. The algorithm consists of three phases: initialization, iteration, and cluster refinement. Initialization uses a greedy algorithm to select a set of initial medoids that are far apart from each other, which ensures that each cluster is represented by at least one object in the selected set. Iteration chooses a random set of k medoids from the reduced set and progressively improves their quality by iteratively replacing bad medoids in the current set with new points. In refinement, one more pass over the data improves the quality of the clustering: new dimensions are computed for each medoid based on the clusters found, points are reassigned to medoids, and finally the outliers are removed. PROCLUS also takes only numerical values as input; the attributes can be continuous or discrete-valued variables, and the algorithm has two parameters, the number of clusters k and the average number of dimensions l. There is no model as such for these algorithms; like CLIQUE, they are unsupervised learning procedures whose output is the cluster labels. In CLIQUE it is not necessary to restrict the input to numerical values; other values can also be given, but only discretized values are taken as input.
VI. EXPERIMENTAL RESULT
The real-world application used for this algorithm is sales data. The competition in this field is very high.
Many businesses make more loss than profit, so they need a better plan to improve their sales, and CLIQUE offers a good solution to this problem. In sales there are patterns to follow: if customers buy certain products frequently, we need to discover which frequent patterns they follow. This is easy only when the dataset is small; with large datasets, finding the frequent patterns is quite difficult. The CLIQUE algorithm is therefore used to deal with the high-dimensional data, and it helps to find the frequent patterns in a high-dimensional dataset easily. The results depend on the dataset.

Figure 1: Dataset
Figure 1 gives a brief idea of what the dataset looks like. This is a sample sales dataset; the products are the items which people buy from the shop. A detailed description of the dataset is given below: Table I gives a brief description of the dataset, and Table II gives an overall idea of the attributes present in it.
Table I: Description of the Dataset
Dataset | No. of attributes | No. of rows
Sales.csv | 10 | 500

To explain this application, we use a sales dataset stored as a .csv file. The dataset has a total of 10 attributes and 500 instances, and the attributes are treated as the dimensions.
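To make the setup concrete, the snippet below shows how a file of this shape could be loaded and inspected with pandas as a first, attribute-wise counting step; the column layout is an assumption based on Table II, not the authors' actual file:

import pandas as pd

# Hypothetical layout: one row per sale, one column per shop section,
# each cell holding the product bought from that section (e.g. "Cone", "Knorr")
sales = pd.read_csv("Sales.csv")
print(sales.shape)  # expected: (500, 10)

# First step of the analysis: per-attribute (per-dimension) frequency counts
for column in sales.columns:
    print(sales[column].value_counts())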

Table II: Attribute Names and Details
Name of attribute | Products available in each section
Product_in_sectionA | Ice, Cone and Cream
Product_in_sectionB | Maggi, Knorr and Top_ramen
Product_in_sectionC | Toyota, i20 and Ford
Product_in_sectionD | Hammer, Nails and Bolt
Product_in_sectionE | Book, Pencil and Ink
Product_in_sectionF | Bag, Box and Bottle
Product_in_sectionG | Mango, Orange and Banana
Product_in_sectionH | Diarymilk, Kitkat and Snickers
Product_in_sectionI | Chain, Ring and Bangles
Product_in_sectionJ | Laptop, Mouse and CPU

Table II explains the attributes of the sales dataset and the products present in each section; different products are available in each section. The algorithm works in such a way that only discretized datasets are taken as input. In a high-dimensional dataset it is not easy to identify interesting patterns directly, so discretization is applied: discretization is the process of putting the values into buckets, so that only a limited number of possible states is present. There are several methods to discretize data; the number of buckets used for grouping the data is set through the discretization bucket-count property, whose default value is 5. This is the first customized parameter, taken care of in the pre-processing step. The working of the algorithm depends on the customized tau (selectivity) value: counts are taken attribute-wise, counts that fall below tau are removed, the attributes are then compared in all possible combinations without repetition, the counts are checked again and those below tau are removed, and the frequent patterns are obtained. All of this work depends on the customized tau value.
Table III: Frequent Patterns
Products in frequent pattern | No. of items purchased | Count of items
Cone, Knorr | 2 items | 178
Cone, Hammer | 2 items | 242
Cone, Book | 2 items | 178
Toyota, Mango | 2 items | 194
Hammer, Book | 2 items | 178
Book, Mango | 2 items | 178
Cone, Hammer, Book | 3 items | 178

Table III lists the frequent patterns found in the sales dataset and gives a clear idea of which products are bought together frequently. Using CLIQUE we can easily identify which products have more marketing value, and the algorithm reduces the workload of finding patterns in such a large dataset. The result of this application is that the combination of cone and knorr was purchased together 178 times, cone and hammer 242 times, cone and book 178 times, toyota and mango 194 times, hammer and book 178 times, book and mango 178 times, and the combination of cone, hammer and book 178 times. By analyzing this result, sales can be improved by keeping more stock of the products which are frequently purchased. Finally, we obtain five clusters based on the customized tau value. Sales analysis is one application of the CLIQUE algorithm; likewise there are many other areas where the CLIQUE algorithm can be used for easy analysis of data.
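A minimal sketch of this counting-and-pruning procedure follows, assuming each row of the dataset is one transaction of products; the threshold is illustrative (with tau around 170 the surviving combinations have counts comparable to those in Table III):

from collections import Counter
from itertools import combinations

import pandas as pd

def frequent_product_patterns(sales, tau, max_size=3):
    """Count co-occurring products per transaction and keep combinations with count >= tau."""
    baskets = [set(row) for row in sales.itertuples(index=False)]
    patterns = {}
    for size in range(2, max_size + 1):
        counts = Counter()
        for basket in baskets:
            for combo in combinations(sorted(basket), size):
                counts[combo] += 1
        # prune combinations that fall below the selectivity threshold tau
        patterns.update({c: n for c, n in counts.items() if n >= tau})
    return patterns

# usage with the hypothetical Sales.csv described above
sales = pd.read_csv("Sales.csv")
for combo, count in sorted(frequent_product_patterns(sales, tau=170).items(), key=lambda kv: -kv[1]):
    print(combo, count)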

Figure 2: Clusters

The visualization of the clusters is shown in Figure 2. Based on the customized tau value, five clusters are formed for this dataset. The visualization is based on the final frequent patterns, which are listed in Table III, and each frequent pattern is visualized separately for better understanding. In Figure 2 each cluster is represented in a different color; a cluster that contains more than one frequent pattern is visualized in a single color and is considered as one cluster. The result of this application is also represented in the form of a multibar chart, shown in Figure 3, which visualizes the frequent patterns in each cluster separately; for this multibar chart the result is converted to D3.js. In total five clusters are charted: bars belonging to the same cluster are grouped together (for example, the first bar of the first group and the corresponding bar of the second group belong to one cluster), while the remaining bars of the first group show clusters with a single frequent pattern. For the details of the frequent patterns mentioned in the multibar chart, refer to Table III.

Figure 3: Multibar chart
The multibar chart is plotted from the frequent-pattern values. Its advantage is that it gives the details of each cluster separately when the cluster labels at the top of the chart are selected, and it also shows the count of a frequent pattern when the cursor is placed over a bar. The benefit of visualizing the result as a multibar chart is that it shows the details of the clusters more clearly than visualizing them only in cluster form.
VII. LITERATURE SURVEY
A method for speeding up a step of the CLIQUE algorithm is described in [12]. CLIQUE does not have a model because it is unsupervised learning. Clustering is the unsupervised learning of a hidden data concept [5], and the resulting system represents that concept; clustering is a division of data into groups of similar objects. CLIQUE was one of the first algorithms proposed to find clusters within subspaces of a dataset [11]. Subspace clustering is an extension of feature selection that attempts to find clusters in different subspaces of the same dataset; like feature selection, subspace clustering requires a search method and evaluation criteria. The CLIQUE algorithm combines density-based and grid-based clustering and uses an apriori-style technique to find clusterable subspaces.
IX. CONCLUSION
In this paper a brief explanation of data mining is provided, along with clustering and the challenges of clustering high-dimensional data. The CLIQUE approach has been successfully applied in some areas and needs more study to understand its strengths and limitations. There is no reason to expect that this specific clustering algorithm will give good clusters for all sorts of high-dimensional data; however, almost all high-dimensional clustering problems can be addressed by applying this clustering algorithm.
REFERENCES

[1] Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, and Jong Soo Park. Fast algorithms for projected clustering. SIGMOD Rec., 28(2):61-72, June 1999.
[2] Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec., 27(2):94-105, June 1998.
[3] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. OPTICS: Ordering points to identify the clustering structure. SIGMOD Rec., 28(2):49-60, June 1999.
[4] Christian Baumgartner, Claudia Plant, Karin Kailing, Hans-Peter Kriegel, and Peer Kröger. Subspace selection for clustering high-dimensional data. In ICDM, pages 11-18, 2004.
[5] P. Berkhin. A survey of clustering data mining techniques. In Grouping Multidimensional Data, pages 25-71, 2006.
[6] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Pages 226-231, AAAI Press, 1996.
[7] Jiawei Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
[8] Karin Kailing, Hans-Peter Kriegel, and Peer Kröger. Density-connected subspace clustering for high-dimensional data. In Proc. SDM 2004, pages 246-257, 2004.
[9] Hans-Peter Kriegel, Peer Kröger, Irene Ntoutsi, and Arthur Zimek. Density based subspace clustering over dynamic data. In SSDBM, pages 387-404, 2011.

[10] Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek. Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering. PVLDB, 1(2):1528-1529, 2008.
[11] Lance Parsons, Ehtesham Haque, and Huan Liu. Subspace clustering for high dimensional data: A review. SIGKDD Explor. Newsl., 6(1):90-105, June 2004.
[12] Jyoti Pawar and P. R. Rao. An attribute based storage method for speeding up clique algorithm for subspace clustering. In Proceedings of the 10th International Database Engineering and Applications Symposium, IDEAS '06, pages 309-310, Washington, DC, USA, 2006. IEEE Computer Society.
[13] Michael Steinbach, Levent Ertöz, and Vipin Kumar. The challenges of clustering high-dimensional data. In New Vistas in Statistical Physics: Applications in Econophysics, Bioinformatics, and Pattern Recognition. Springer-Verlag, 2003.
