A FRAMEWORK FOR KNOWLEDGE REPRESENTATION OF INFORMATION SYSTEM

International Engineering Journal For Research & Development E-ISSN NO: 2349-0721 Volume 1:Issue 1

Dr. G. R. Bamnote (1), Prof. S. S. Agrawal (2)

(1) Professor & Head, Department of Computer Science & Engineering, PRMITR, Badnera
(2) Asst. Professor, Department of Computer Science & Engineering, COE & T, Akola

------------------------------------------------------------------------------------------------------------------------------------------------------------

Abstract

Nowadays, data can be searched and represented by combining data from multiple sources. The central problem is how to represent information from multiple sources in distributed databases, cooperative information systems, and data warehousing. Two important issues in the design and maintenance of such applications are the conceptual modeling of the domain and reasoning support over the conceptual representation. Knowledge representation and reasoning techniques play an important role in both. The development of successful information combination solutions requires not only resorting to expressive description logics, but also significantly extending them. A different approach to information combination is presented, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources. Inference procedures are also developed for the fundamental reasoning services, i.e., relation and concept representation, and query containment. Finally, we present a procedural framework for information combination, which can be applied in several situations and highlights the role of reasoning services within the design process.

Keywords: Query Processing, Information Combination, Relation

1. Introduction

The main purpose of information combination is to access, transmit, and merge data from many sources. Information combination is one of the main problems in distributed databases, cooperative information systems, and data warehousing [1-2]. Early work on information combination was carried out in the context of database design and focused on the schema integration problem, i.e., designing a global, unified schema for a database application starting from several sub-schemata, each produced independently of the others [3]. Further efforts have been devoted to information combination proper, which generalizes schema integration by taking the actual data into account in the integration process. The input is a collection of source data sets, and the purpose is to provide an integrated and reconciled view of the data residing at the sources without interfering with their autonomy [4].

Information combination can be either virtual or materialized. In the virtual case, the integration system acts as an interface between the user and the sources [5], whereas in the materialized case, the system maintains a reconciled, replicated view of the data at the sources [6].

There are two basic approaches to the information combination problem: procedural and declarative. In the procedural approach, data are combined in an ad hoc manner with respect to a set of predefined information requirements. The basic question is then how to design suitable software modules that access the sources in order to fulfill those requirements. Some data integration projects, such as TSIMMIS [7], SQUIRREL [8], and WHIPS [9], follow this idea. They do not require an explicit notion of integrated data schema, and rely on wrappers to encapsulate sources and on mediators to merge data coming from wrappers and other mediators.
In the declarative approach, the purpose is to model the data at the sources by means of a suitable language, to construct a unified representation, to refer to that representation when querying the global information system, and to derive the query answers by means of suitable mechanisms that access the sources. Projects such as Carnot [10], SIMS [11], and Information Manifold [12-14] use the declarative approach. The declarative approach has a key advantage over the procedural one: although building a unified representation may be costly, it is a reusable component of the information combination system.

In this paper a declarative approach to information combination is adopted. Two factors are crucial for the design and maintenance of applications requiring information combination: the conceptual modeling of the domain and the possibility of reasoning over the conceptual representation. Knowledge representation and reasoning techniques play a key role in both, and we therefore propose a description logic [15, 16] based framework for information combination. A different architecture for an information combination system is presented, which allows data and information needs to be represented at various levels. At the conceptual level we use description logics to model both the global domain and the various sources. Since the development of a successful information combination system requires detailed modeling features, we propose a new description logic that treats n-ary relations as first-class citizens. The common restriction of many description logics to unary and binary predicates would be an unacceptable limitation. We also provide a suitable mechanism for expressing intermodel assertions, i.e., interrelationships between concepts in different sources. Hence, information combination is seen as the incremental process of understanding and representing the relationships between data in the sources, rather than as simply producing a unified data schema.

For a precise description of the information sources, we include the possibility of describing all data at the logical level in terms of a set of relational structures. Each relational structure is defined as a view over the conceptual representation, thus providing a mapping between the description of the data and the conceptual representation of the domain. Inference procedures are provided for the fundamental reasoning services, i.e., concept and relation representation, and query containment. We present the first decidability result on query containment for a description logic with n-ary relations [17]. We also present a framework for information combination based on these reasoning methods, which can be applied in both the virtual and the materialized approach.

2. Architecture of Information Combination System

2.1 Components

Figure 1 shows the data structures used by an information combination system. It consists of four components: the conceptual, logical, physical, and meta levels. In Figure 1, the interface module handles communication with the user and the designer, and the external sources represent the independent systems managing the actual data that the system is supposed to combine.

Figure 1: Information System Combination Architecture. [Diagram not reproduced. Labels in the figure: Interface; Source 1, Source 2; Meta Model; Domain Model; Enterprise Model; Query Module 1, 2, ..., n; Domain Schema; Materialized view schema; Query Schema 1, 2, ..., n; Materialized View Store; Mediators; Wrappers; Meta Level, Conceptual Level, Logical Level, Physical Level.]

2.1.1 Conceptual Level

The conceptual level provides a formal description of the concepts, the relationships between concepts, and the information requirements that the integration application has to deal with. The important feature of this level is that the description is independent of any system consideration and is oriented towards expressing the semantics of the application.

The first element of the conceptual level is the enterprise model, a conceptual representation of the global concepts and relationships that are of interest to the application. It corresponds to the notion of integrated conceptual schema in traditional approaches to schema integration. For an information source S, the source model of S is a conceptual representation of the data residing in S.

The second element of the conceptual level is the domain model. It represents the combination of the enterprise model and the various source models, together with the relationships holding between concepts belonging to different models.
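The domain model described above can be pictured as a small data structure. The following is a minimal Python sketch, not the paper's formalism: the names `Concept`, `DomainModel`, `add_inclusion`, `implied_inclusions`, and `build_example`, as well as the example concepts, are hypothetical, and intermodel relationships are reduced to plain inclusion assertions between concepts. Deriving implicit relationships is approximated here by a naive transitive closure, which is far weaker than description logic reasoning.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept qualified by the model it belongs to (hypothetical naming)."""
    model: str  # e.g. "enterprise", "source1"
    name: str


@dataclass
class DomainModel:
    """Enterprise model plus source models plus inclusion assertions."""
    concepts: set = field(default_factory=set)
    # Each pair (a, b) asserts: every instance of a is an instance of b.
    inclusions: set = field(default_factory=set)

    def add_inclusion(self, sub: Concept, sup: Concept) -> None:
        self.concepts.update({sub, sup})
        self.inclusions.add((sub, sup))

    def implied_inclusions(self) -> set:
        """Asserted inclusions plus those obtained by transitivity."""
        closure = set(self.inclusions)
        changed = True
        while changed:
            changed = False
            for a, b in list(closure):
                for c, d in list(closure):
                    if b == c and a != d and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure


def build_example() -> DomainModel:
    """Two source models related to an enterprise model (illustrative data)."""
    dm = DomainModel()
    client = Concept("source1", "Client")
    vip = Concept("source2", "VipClient")
    customer = Concept("enterprise", "Customer")
    party = Concept("enterprise", "Party")
    dm.add_inclusion(client, customer)  # source model to enterprise model
    dm.add_inclusion(customer, party)   # inside the enterprise model
    dm.add_inclusion(vip, client)       # between two source models
    return dm
```

In this toy example, the assertion between the two source models lets the closure conclude that `VipClient` of source 2 is included in the enterprise concept `Party`, although no such assertion was stated directly.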

The last element of the conceptual level is the query model, a conceptual representation of an information need, for example a relational query over the domain model. The domain model contains the specification of the interdependencies between elements of different source models and between the source models and the enterprise model. The notion of interdependency is central to the architecture. Since the sources are of interest in the system, integration does not simply mean producing the enterprise model, but rather being able to establish the correct relationships between the source models and the enterprise model, and between the various source models.

2.1.2 Logical Level

At this level, the data and the queries are described in terms of the logical structures managed by database systems. The source schema of a source S describes the logical content of S, and the materialized view schema describes the logical content of the materialized views maintained by the system. Together, the source schemas and the materialized view schema form the data schema. The materialized view schema matters only when the combined data are materialized; it is irrelevant in fully virtual combination. The query schemas express the information needs at the logical level, for instance as a set of relational queries over the data schema.

2.1.3 Physical Level

The physical level refers to the actual data managed by the system, so the extensional information of the system is taken into account at this level. The materialized view store contains the data that the system maintains in materialized form. Figure 1 shows wrappers and mediators at this level. A wrapper is a software component that is able to access a source and retrieve its data in a form that is consistent with the logical specification of the source.
A mediator is a software module that takes as input sets of data produced by wrappers or other mediators, refines this information by integrating it and resolving conflicts, and produces as output another set of data, namely the one corresponding to the result of a given query.

2.1.4 Meta Level

This level consists of the meta model, which is the repository of all meta information about the various system components and is used by both the user and the designer.
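The wrapper and mediator roles of the physical level can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the class names `Wrapper` and `Mediator`, the sources `crm` and `billing`, their field names, and the conflict rule (a fixed source priority) are hypothetical and are not prescribed by the architecture.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Wrapper:
    """Accesses one source and returns rows in the form of its source schema."""

    def __init__(self, name: str, fetch: Callable[[], Iterable[Row]],
                 rename: Dict[str, str]) -> None:
        self.name = name
        self._fetch = fetch
        self._rename = rename  # source field name -> schema attribute name

    def rows(self) -> List[Row]:
        out = []
        for raw in self._fetch():
            row = {self._rename[k]: v for k, v in raw.items() if k in self._rename}
            row["_source"] = self.name  # remember where the row came from
            out.append(row)
        return out


class Mediator:
    """Merges rows from wrappers or other mediators and resolves conflicts."""

    def __init__(self, inputs: list, key: str, priority: List[str]) -> None:
        self.inputs = inputs
        self.key = key
        self.priority = priority  # earlier source names win on conflict

    def rows(self) -> List[Row]:
        rank = {name: i for i, name in enumerate(self.priority)}
        candidates = [r for inp in self.inputs for r in inp.rows()]
        # Apply low-priority rows first so high-priority values overwrite them.
        candidates.sort(key=lambda r: rank.get(r["_source"], len(rank)),
                        reverse=True)
        merged: Dict[object, Row] = {}
        for r in candidates:
            merged.setdefault(r[self.key], {}).update(r)
        return sorted(merged.values(), key=lambda r: str(r[self.key]))


# Two hypothetical sources that disagree on the city of customer 1.
crm = Wrapper("crm",
              lambda: [{"id": 1, "nm": "Ada", "city": "Pune"}],
              {"id": "cust_id", "nm": "name", "city": "city"})
billing = Wrapper("billing",
                  lambda: [{"cid": 1, "town": "Akola", "due": 40},
                           {"cid": 2, "town": "Badnera", "due": 0}],
                  {"cid": "cust_id", "town": "city", "due": "balance"})
mediator = Mediator([crm, billing], key="cust_id", priority=["crm", "billing"])
result = mediator.rows()
```

Because a `Mediator` exposes the same `rows()` method as a `Wrapper`, a mediator can take other mediators as input, which mirrors the description above.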

2.2 Tasks

We now associate activities with the different elements of the architecture. The first class of tasks concerns the design of the information combination system. For example, the specification of the various conceptual models and of the intermodel links belongs to this phase. The architecture does not require the conceptual level to be built in a single step, but rather supports an incremental definition of both the domain and the query models. Such models are subject to changes and additions as the analysis of the information sources proceeds. In the materialized approach to integration, one of the most critical tasks is deciding what to materialize and how. In both the materialized and the virtual approach, the task of wrapper and mediator design is also extremely important. Designing a wrapper means deciding how to access the source in order to retrieve data, and designing a mediator means deciding how to use wrappers in order to answer a particular query or to materialize a particular view. The design of a mediator includes the resolution of conflicts and/or heterogeneity among data residing in different sources.

The second class of tasks includes all the design activities to be performed when a new information need arises. In this case, the new query has to be compared with those computed by the available mediators. The most important problem here is query rewriting, i.e., checking whether and how the original query can be reformulated in terms of those computed by the existing mediators. In virtual integration, this may lead the new mediator simply to call the existing mediators. In materialized integration, reformulating the query in terms of the materialized views makes it possible to avoid accessing the sources.
If, however, the query cannot be answered by relying only on the existing materialized views, a new view should be materialized, and the problem of query rewriting arises in a different form: the new view to materialize is seen as a query that has to be formulated in terms of the source schemas.

The third class of tasks relates to the activities that are routinely carried out during the operational phase of the system, i.e., data extraction, query computation, and view materialization and refreshment.
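The difference between virtual and materialized access, and the refreshment task of the operational phase, can be sketched in a few lines of Python. The names `ViewStore`, `answer`, and `all_rows`, and the in-memory list standing in for a source, are hypothetical. The sketch recomputes a view in full on refresh and makes no attempt at incremental maintenance.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Query = Callable[[], List[Row]]  # a query evaluated directly on the sources


class ViewStore:
    """Toy materialized view store keyed by view name."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Query] = {}
        self._data: Dict[str, List[Row]] = {}

    def has(self, name: str) -> bool:
        return name in self._definitions

    def materialize(self, name: str, definition: Query) -> None:
        self._definitions[name] = definition
        self._data[name] = definition()

    def refresh(self, name: str) -> None:
        # Full recomputation from the stored view definition.
        self._data[name] = self._definitions[name]()

    def read(self, name: str) -> List[Row]:
        return list(self._data[name])


def answer(name: str, definition: Query, store: ViewStore, virtual: bool) -> List[Row]:
    """Virtual: go to the sources. Materialized: read the stored copy."""
    if virtual:
        return definition()
    if not store.has(name):
        store.materialize(name, definition)
    return store.read(name)


# A mutable list stands in for an autonomous source (illustrative only).
source = [{"id": 1}]


def all_rows() -> List[Row]:
    return [dict(r) for r in source]


store = ViewStore()
first = answer("all", all_rows, store, virtual=False)  # materializes the view
source.append({"id": 2})                               # the source changes
stale = answer("all", all_rows, store, virtual=False)  # stored copy, not updated
live = answer("all", all_rows, store, virtual=True)    # reads the source directly
store.refresh("all")
fresh = answer("all", all_rows, store, virtual=False)  # stored copy after refresh
```

The stale read shows why refreshment appears among the routine tasks: a materialized view avoids accessing the sources, but it reflects their state only as of the last refresh.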

2.3 Comparison with existing systems

2.3.1 Multidatabases

Multidatabase systems deal with different sources, which are considered as internal components of the information combination system. Based on a logical representation of the sources, mediators are designed in order to satisfy information needs that are also expressed at the logical level. Mediators do not materialize data in the system. Also, the conceptual level is generally not taken into account.

2.3.2 Schema integration

Integration starts by providing a conceptual representation of the sources and proceeds by generating the global database schema. This schema is then used for the design of the implemented database. Once the database has been created, the sources are discarded and the conceptual level is no longer used.

2.3.3 Global information systems

The goal is to provide tools for integrated access to multiple and diverse autonomous information sources and repositories, such as databases, HTML documents, and unstructured files.

3. Proposed Methodology

The proposed methodology for information combination is based on techniques that can be applied in both virtual and materialized data combination. The methodology covers two cases: source-driven and client-driven combination.

3.1 Source driven combination

Source-driven combination is triggered when a new source, or a new portion of a source, is taken into account for combination. First, the source model is constructed: a model capturing the concepts and relationships of the new source that are critical for the organization is produced. Next, the source model is combined with the domain model. This can lead to changes both to the source models and to the enterprise model. The specification of intermodel assertions and the derivation of implicit relationships by exploiting the reasoning techniques represent the novel part of the methodology.
Not only assertions relating elements in one source model to elements in the enterprise model, but also assertions relating elements in different source models are of importance.

The next step is quality analysis. The quality factors of the resulting domain model are evaluated, and a restructuring is carried out to meet the required criteria. This step requires the use of the reasoning techniques associated with the formalism to check for quality factors such as consistency, redundancy, readability, accessibility, and believability [18].

In source schema construction, the source schema, i.e., the logical view of the new source or of a new portion of the source, is produced. The source schemas are used to determine the sources relevant for computing answers to queries, by exploiting the ability to reason about queries.

In materialized view schema reorganization, an analysis is carried out, on the basis of the new source, on whether the materialized view schema should be restructured and/or modified in order to meet quality requirements. The schema consists of a set of queries over the domain model, and the use of reasoning techniques is crucial for its restructuring. A restructuring of the materialized view schema may require the design of new mediators.

3.2 Client driven combination

The client-driven combination approach refers to the case in which a new query, or a set of queries, posed by a client is considered. The reasoning facilities are exploited to analyze and systematically decompose the query and to check whether its components are subsumed by the views defined in the various schemas. In materialized data integration, the analysis is carried out as follows. By exploiting query containment checking, we verify whether and how the answer can be computed from the materialized views. If the materialized views are not sufficient, we verify whether the answer can be obtained by materializing new concepts represented in the domain model.
In this case, query containment helps to identify the set of sub-queries to be issued on the sources and to extend and/or restructure the materialized view schema. Different choices can be identified, based on various preference criteria that take into account the above-mentioned quality factors. If neither the materialized data nor the concepts in the domain model are sufficient, the necessary data should be searched for in new sources or in new portions of already analyzed sources. The new portions of the sources are then added to the domain model using the source-driven approach, and the analysis of the query is iterated. In virtual data integration, one has to determine whether and how the answer can be computed from the data in the analyzed sources.
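As an illustration of the containment check used in client-driven combination, the sketch below implements the classical homomorphism test for plain conjunctive queries. This is a deliberately simpler technique than containment under description logic constraints as treated in [17], which is not reproduced here. The names `CQ`, `is_var`, and `contained_in`, the `?`-prefix convention for variables, and the example relations `supply` and `located` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, arguments), any arity


class CQ:
    """A conjunctive query: head variables and a list of body atoms."""

    def __init__(self, head: Tuple[str, ...], body: List[Atom]) -> None:
        self.head = head
        self.body = body


def is_var(term: str) -> bool:
    return term.startswith("?")  # convention assumed for this sketch


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True if every answer of q1 is also an answer of q2 on every database.

    q1's body is frozen into a canonical database; q1 is contained in q2
    exactly when q2's body maps into it with q2's head mapped onto q1's head.
    """
    if len(q1.head) != len(q2.head):
        return False

    def unify(t2: str, t1: str, h: Dict[str, str]) -> Optional[Dict[str, str]]:
        if is_var(t2):
            if t2 in h:
                return h if h[t2] == t1 else None
            extended = dict(h)
            extended[t2] = t1
            return extended
        return h if t2 == t1 else None  # constants must match exactly

    start: Optional[Dict[str, str]] = {}
    for t2, t1 in zip(q2.head, q1.head):
        start = unify(t2, t1, start)
        if start is None:
            return False

    def search(i: int, h: Dict[str, str]) -> bool:
        if i == len(q2.body):
            return True
        pred, args = q2.body[i]
        for fact_pred, fact_args in q1.body:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            current: Optional[Dict[str, str]] = h
            for a2, a1 in zip(args, fact_args):
                current = unify(a2, a1, current)
                if current is None:
                    break
            if current is not None and search(i + 1, current):
                return True
        return False

    return search(0, start)


# Hypothetical materialized view: suppliers that supply anything anywhere.
view = CQ(("?s",), [("supply", ("?s", "?a", "?b"))])
# Hypothetical client query: suppliers delivering to a customer in Akola.
query = CQ(("?x",), [("supply", ("?x", "?p", "?c")),
                     ("located", ("?c", "akola"))])

query_in_view = contained_in(query, view)
view_in_query = contained_in(view, query)
```

In this example the client query is contained in the view, so every answer to the query appears among the view's tuples. The view is not contained in the query, so its tuples are only candidate answers, and the `located` condition still has to be evaluated, for instance against another view or a source.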

4. Conclusion

The basic features of a declarative and procedural approach to information combination based on description logic have been presented. The proposed framework can be applied to source-driven combination, to client-driven combination, or to both. We resort to expressive description logics and significantly extend them. A different approach to information combination is presented, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources.

References

[1] G. Wiederhold. Special issue: Intelligent integration of information. J. of Intelligent Information Systems, 6(2/3).
[2] C. Knoblock and A. Levy, editors. AAAI Symposium on Information Gathering from Heterogeneous, Distributed Environments, number SS-95-08 in AAAI Spring Symposium Series. AAAI Press/The MIT Press.
[3] C. Batini, M. Lenzerini, and S. B. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4):323–364.
[4] J. D. Ullman. Information integration using logical views. In Proc. of the 6th Int. Conf. on Database Theory (ICDT-97), number 1186 in Lecture Notes in Computer Science, pages 19–40. Springer-Verlag.
[5] A. Sheth and J. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3).
[6] W. H. Inmon. Building the Data Warehouse. John Wiley & Sons, second edition.
[7] S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In Proc. of IPSI Conf. (IPSI’94), Tokyo (Japan).
[8] G. Zhou, R. Hull, and R. King. Generating data integration mediators that use materializations. J. of Intelligent Information Systems, 6:199–221.
[9] J. Hammer, H. Garcia-Molina, J. Widom, W. Labio, and Y. Zhuge. The Stanford data warehousing project. IEEE Bulletin of the Technical Committee on Data Engineering, 18(2):41–48.
[10] C. Collet, M. N. Huhns, and W.-M. Shen. Resource integration using a large knowledge base in Carnot. IEEE Computer, 24(12):55–62.
[11] Y. Arens, C. Y. Chee, C. Hsu, and C. A. Knoblock. Retrieving and integrating data from multiple information sources. J. of Intelligent and Cooperative Information Systems, 2(2):127–158.
[12] A. Y. Levy, D. Srivastava, and T. Kirk. Data model and query evaluation in global information systems. J. of Intelligent Information Systems, 5:121–143.
[13] T. Kirk, A. Y. Levy, Y. Sagiv, and D. Srivastava. The Information Manifold. In Proceedings of the AAAI 1995 Spring Symp. on Information Gathering from Heterogeneous, Distributed Environments, pages 85–91.
[14] A. Y. Levy, A. Rajaraman, and J. J. Ordille. Query answering algorithms for information agents. In Proc. of the 13th Nat. Conf. on Artificial Intelligence (AAAI-96), pages 40–47.
[15] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Reasoning in description logics. In G. Brewka, editor, Principles of Knowledge Representation, Studies in Logic, Language and Information, pages 193–238. CSLI Publications.
[16] A. Borgida. Description logics in data management. IEEE Trans. on Knowledge and Data Engineering, 7(5):671–682.
[17] D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query containment under constraints. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS-98).
[18] D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, and R. Rosati. Source integration in data warehousing. Technical Report DWQ-UNIROMA-002, DWQ Consortium, Oct.

www.iejrd.in
