
www.globaljournal.asia GJESR REVIEW PAPER VOL. 1 [ISSUE 4] MAY, 2014

ISSN:- 2349–283X

NEW CHALLENGES IN DATA INTEGRATION: LARGE SCALE AUTOMATIC SCHEMA MATCHING

1*Kamal Kant
Department of Computer Science & Engineering, Madan Mohan Malviya University of Technology, Gorakhpur (UP), India. Email: kk.kamal2525@yahoo.com

2,3,4A.K. Sharma, Kumar Ashis, Sanjay Kumar
Department of Computer Science & Engineering, Madan Mohan Malviya University of Technology, Gorakhpur (UP), India. Email: akscse@rediffmail.com

ABSTRACT: Schema matching is today a basic task in almost every data-intensive distributed application, such as enterprise information integration, collaborating web services, web catalogue integration and schema-based P2P database systems. A plethora of algorithms and techniques for schema matching and integration have been researched to achieve data interoperability, and many surveys have summarized this research. The dynamic nature and rapid growth of these data-intensive applications now call for an extension of those surveys. Indeed, evolving large scale distributed information systems are pushing schema matching research to exploit processing power that was not available in the past, and are directly increasing industry investment in the matching domain. This article reviews the latest application domains in which schema matching is being utilized. It gives a detailed insight into the desiderata for schema matching and integration in large scale scenarios. It also covers the shift from manual to automatic schema matching. Finally, the paper presents the state of the art in large scale schema matching, classifying the tools and prototypes according to their input, output, execution strategies and algorithms.

Keywords: Schema integration, data integration, mappings, schema merging, schema evolution, large scale.

1. INTRODUCTION

There exists an unending list of digital devices cooperating to solve problems at the individual level, personal or professional, and at the organisational level. The collaboration between these devices leads to better performance and results. Every day a new gadget hits the market, creating a ripple effect in its surrounding operating environment. For the database community, this means the emergence of new forms of data or information, which have to be utilised in the most efficient and effective manner.
The ability to exchange and use data or information between different devices (physical or logical), usually referred to as data interoperability, is the basic activity in any type of system. Previous work on schema matching was developed in the context of schema translation and integration (Bernstein, Melnik, Petropoulos, & Quix, 2004; Do & Rahm, 2007; A. Halevy, Ives, Suciu, & Tatarinov, 2003), knowledge representation (Giunchiglia, Shvaiko, & Yatskevich, 2004; Shvaiko & Euzenat, 2005), machine learning, and information retrieval (Doan, Madhavan, Dhamankar, Domingos, & Halevy, 2003). All these approaches aimed to provide good-quality matching but require significant human intervention (Bernstein et al., 2004; Doan et al., 2003; Do & Rahm, 2007; Giunchiglia et al., 2004; A. Halevy et al., 2003; Lu, Wang, & Wang, 2005; Madhavan, Bernstein, & Rahm, 2001). However, they failed to consider the performance aspect, which is equally important in large scale scenarios (large schemas or a large number of schemas to be matched). By definition, schema matching is the task of discovering correspondences between

© Virtu and Foi
