SR08


In Germany, the national programme »Information Systems in Earth Management« was initiated in 2002 as part of the R&D programme GEOTECHNOLOGIEN. Between 2002 and 2005, six joint projects were funded with about 4 million euros by the Federal Ministry of Education and Research. All projects were carried out in close cooperation with various national and international partners from academia and industry. This report highlights the scientific results from this funding period, addressing the following objectives:

- Semantic and geometric integration of topographical, soil, and geological data
- Rule-based derivation of geoinformation
- Typologisation of marine and geoscientific information
- Investigation and development of mobile geo-services
- Coupling information systems and simulation systems for the evaluation of transport processes

The GEOTECHNOLOGIEN programme is funded by the Federal Ministry of Education and Research (BMBF) and the German Research Council (DFG)

ISSN: 1619-7399


GEOTECHNOLOGIEN Science Report

Information Systems in Earth Management

From Science to Application

Results from the First Funding Period (2002-2005)


No. 8


Impressum

Schriftleitung (Editorship): Dr. Ludwig Stroink
© Koordinierungsbüro GEOTECHNOLOGIEN, Potsdam 2006
ISSN 1619-7399

The Editors and the Publisher cannot be held responsible for the opinions expressed and the statements made in the articles published; such responsibility rests with the authors.

Die Deutsche Bibliothek – CIP Einheitsaufnahme
GEOTECHNOLOGIEN: Information Systems in Earth Management, From Science to Application – Results from the First Funding Period (2002-2005)
Potsdam: Koordinierungsbüro GEOTECHNOLOGIEN, 2006 (GEOTECHNOLOGIEN Science Report No. 8)
ISSN 1619-7399

Bezug / Distribution:
Koordinierungsbüro GEOTECHNOLOGIEN
Heinrich-Mann-Allee 18/19
14473 Potsdam, Germany
Fon +49 (0)331-620 14 800
Fax +49 (0)331-620 14 801
www.geotechnologien.de
geotech@gfz-potsdam.de

Bildnachweis Titel / Copyright Cover Picture: M. Butenuth


Preface

In Germany, the national programme »Information Systems in Earth Management« was initiated in 2002 as part of the R&D programme GEOTECHNOLOGIEN. After a public call, more than 40 project proposals were evaluated in an international two-step review procedure involving several experts from four different countries. Finally, six joint projects were recommended and funded by the Federal Ministry of Education and Research (BMBF) with about 4 million euros for a three-year funding period (2002-2005). The research projects, involving 15 partners from academia and industry, covered the following key topics:

- Semantic and geometric integration of topographical, soil, and geological data
- Rule-based derivation of geoinformation
- Typologisation of marine and geoscientific information
- Investigation and development of mobile geo-services
- Coupling information systems and simulation systems for the evaluation of transport processes

The closing workshop took place on June 22, 2005, as part of the international Münster GI Days. More than 60 scientists presented their results in the presence of the international reviewers. The highlights of the research projects are presented in this report.

Information technologies will remain an important focus of the R&D programme GEOTECHNOLOGIEN. Interoperable information architectures are, for example, an essential link in the early warning chain. Their systematic further development and implementation will therefore be a main focus of future research activities in the development of early warning systems. Corresponding research projects will start in early 2007 within the framework of the R&D programme GEOTECHNOLOGIEN. The intelligent combination of various research elements towards the goal of a global Earth System Management is thus being implemented piece by piece.

Ludwig Stroink



Table of Contents

Rule Based Derivation of Groundwater Vulnerability on Different Scales
Azzam R., Kappler W., Kiehle C., Kunkel R., Meiners H.G., Wendland F. . . . . 2-11

Overcoming Semantic Heterogeneity in Spatial Data Infrastructures
Lutz M., Christ I., Witte J., Klien E., Hübner S. . . . . 12-31

Advancement of Mobile Geoservices: Potential and Experiences
Breunig M., Bär W., Thomsen A., Häußler J., Kipfer A., Kandawasvika A., Mäs S., Reinhardt W., Wang F., Brand S., Staub G., Wiesel J. . . . . 32-51

Development of a data structure and tools for the integration of heterogeneous geospatial data sets
Butenuth M., Gösseln G. v., Heipke C., Lipeck U., Sester M., Tiedge M. . . . . 52-73

ISSNEW – Developing an Information and Simulation System to Evaluate Non-point Nutrient Loading into Waterbodies
Dannowski R., Arndt O., Schätzl P., Michels I., Steidl J., Hecker J.-M., v. Waldow H., Kersebaum K.-C. . . . . 74-87

Marine Geo-Information-System for Spatial Analysis and Visualization of Heterogeneous Data (MarGIS)
Schlüter M., Schröder W., Vetter L., Jerosch K., Peesch R., Köberle A., Morchner C., Fritsche U. . . . . 88-101



Rule Based Derivation of Groundwater Vulnerability on Different Scales

Azzam R. (1), Kappler W. (2), Kiehle C. (1), Kunkel R. (3), Meiners H.G. (2), Wendland F. (3)

(1) RWTH Aachen University, Chair for Engineering Geology and Hydrogeology, Lochnerstraße 4-20, 52064 Aachen, Germany, {azzam|kiehle}@lih.rwth-aachen.de
(2) ahu AG Wasser Boden Geomatik, Kirberichshofer Weg 6, 52066 Aachen, Germany, {w.kappler|g.meiners}@ahu.de
(3) Research Centre Jülich, Programme Group Systems Analysis and Technology Evaluation, 52425 Jülich, Germany, {r.kunkel|f.wendland}@fz-juelich.de

1. Introduction

The project »Development of an information infrastructure for the rule based derivation of geoinformation from distributive, heterogeneous geodata inventories on different scales with an example regarding the groundwater vulnerability assessment« was funded from 2002 to 2005 by the BMBF/DFG programme »Geotechnologien – Information Systems in Earth Management« (project number 03F0372A). The overall goal of the project was the development of a Spatial Data Infrastructure (SDI) for the processing of geoinformation in a rule-based manner, independent of scale. This was achieved by providing a »Geo Web Service Groundwater Vulnerability« (GWV) as an integral part of a web-based decision support system. The Geo Web Service GWV accesses distributed geodata inventories provided through web service technology and processes the heterogeneous base data into consistent information. The concept for determining groundwater vulnerability (Hölting et al., 1995) serves as a geoscientific case study. This case study provides base data and methods for the development of an SDI for processing distributed geoinformation. The project therefore has both a geoscientific and an information technological (IT) focus.

2. Spatial scale and uncertainty analysis

The geoscientific objectives of the project concern the investigation of the consequences of using data originating from different scales, taking the derivation of groundwater vulnerability according to Hölting et al. (1995) as an example. This method considers the intrinsic susceptibility of the groundwater system depending on the properties of the aquifer coverage (field capacity of the soil, petrographic structure) and the associated sources of water and stresses for the system (recharge, travel distance). From this investigation, rules were compiled and implemented that permit consistent spatial information to be derived and displayed from geodata recorded on different scales and in different formats. In order to derive these rules, the effects of spatial information and its uncertainties on the derived groundwater vulnerability were analysed at different scales: the micro scale (~1:5,000), the meso scale (~1:25,000) and the macro scale (<1:50,000). Three study areas, located in the southwestern part of the Federal State of North Rhine-Westphalia (NRW) and representing the different scale levels, were selected: the rivers Rur and Erft, Inde and Saubach (see Azzam et al., 2003). These areas were selected in particular because they show the greatest possible variability of natural and anthropogenic factors, which is also reflected in the available data, their resolution and quality.


The method, data and results of the assessment of groundwater vulnerability have already been described in detail by Bogena et al. (2004) and are not repeated here. However, the results show that the assessment of groundwater vulnerability leads to different results depending on the scale of the data used for the derivation. The main reason is that the knowledge of the deeper subsurface, which is an important factor for vulnerability assessment, depends on the scale of the data. Whereas on the micro scale borehole data provide detailed knowledge of the three-dimensional structure of the subsurface, macro scale data, often general geological maps, contain only two-dimensional information. Therefore, the assumptions required for the »unknown« characteristics of the subsurface missing in the data may lead to different results in the groundwater vulnerability assessment. On the other hand, the usability of the information system requires information on the quality of the presented results. Therefore, methods for the quality assessment of the input data and calculation results have been developed. These methods include the quantification of numerical uncertainties using Gaussian error propagation. For the groundwater recharge level, one input parameter of the groundwater vulnerability assessment, this has already been discussed in detail in Bogena et al. (2005). For other input parameters, uncertainty assessment has been performed on the basis of best case / worst case calculations. In Figure 1, the results of the groundwater vulnerability and uncertainty assessment are shown for the example of the macro scale area. It becomes clear that the vulnerability as well as its uncertainty varies strongly within the investigation area depending on the individual situation at a given site. It can be concluded that a spatially differentiated assessment of groundwater vulnerability and the estimation of uncertainties is possible on the different scale levels. The results are credible from a geoscientific point of view. However, the results differ between scale levels due to the different data. Because the concept of groundwater vulnerability according to Hölting et al. (1995) contains a significant amount of »expert judgement« and interpretation of the data, validation of the final result (GWV) is not possible. Therefore, the quality of the groundwater vulnerability assessment can only be evaluated within and not across scale levels.
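For orientation, the Gaussian error propagation mentioned above can be written in its general form as follows, assuming independent input parameters x_1, …, x_n of a derived quantity y = f(x_1, …, x_n); the specific parameterisation applied to the groundwater recharge term is given in Bogena et al. (2005) and is not reproduced here:

\sigma_y^2 \;=\; \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2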

Figure 1: Groundwater vulnerability and uncertainty of vulnerability classes due to data uncertainties



3. Implementation of the Spatial Data Infrastructure

The Spatial Data Infrastructure (SDI) was to reach pilot operation within the project's overall duration, processing distributed geodata in a standardised way and meeting user demands appropriately wherever the effort is acceptable. User workshops were organized to investigate the demands of providers and users, and an incremental, iterative model of software development was applied to accommodate changing requirements. The SDI consists of web-based components and services, which can be divided into three tiers (see figure 2):

1. The data tier, representing the distributed data inventories and providing access to metadata and the catalogue service.
2. The business logic tier, implementing the geoprocessing capabilities and the components needed for the rule based derivation.
3. The presentation tier, providing a web-based graphical user interface to interact with the system.

Figure 2: Spatial Data Infrastructure, conceptual view

3.1 Data Tier

The data tier contains both the descriptive information (metadata) required by the system to select the appropriate input data for groundwater vulnerability assessment and the input data itself. All elements of the data tier were implemented as individual web services compliant with current OGC specifications, foremost the Web Coverage Service, the Web Feature Service and the Catalogue Service for the Web (see Evans 2003, Vretanos 2002, and Nogueras-Iso et al. 2005, respectively). Metadata about available and used data are required for the tasks of the other tiers. From the case studies it became evident that information concerning data quality is very important for the uncertainty analysis and for the users' assessment of the results presented through the system. The common metadata standards (e.g. FGDC, ISO, Dublin Core) were evaluated with respect to their capability of specifying data quality information. Based on this evaluation, the ISO 19115 standard was selected, and the mandatory elements to be specified were defined. Data quality was characterized by general numeric uncertainties (e.g. 10%), references to digital uncertainty maps provided by web services, and/or other qualitative information. For all input data used by the system, appropriate metadata sets were created, containing information e.g. about data formats, coordinate systems, data quality and online access (addresses, access restrictions etc.). Technically, the data structure was defined by an XML Schema document according to the ISO 19115 specification and reproduced in an ORACLE 10g database by registering the XML Schema. All metadata have been validated against this schema and stored in the database as XMLTYPE records. The business logic and presentation tiers communicate with the metadatabase via an OGC CS-W 2.0-compliant catalogue service. For this purpose, the existing deegree2 (http://www.deegree.org) catalogue service was modified and extended to support both the ISO 19115 metadata used and XMLTYPE data.
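To make the metadata handling more concrete, the following is a strongly simplified, hypothetical sketch of such a metadata record as it might be stored as an XMLTYPE record; the element names are abbreviated for illustration only and do not reproduce the project's actual ISO 19115 profile or a schema-valid ISO encoding:

<MD_Metadata>
  <fileIdentifier>soil_map_nrw_meso_scale</fileIdentifier>
  <referenceSystemInfo>EPSG:31466</referenceSystemInfo>
  <onlineAccess>http://example.org/wcs?SERVICE=WCS&amp;REQUEST=GetCapabilities</onlineAccess>
  <dataQualityInfo>
    <!-- general numeric uncertainty of the data set, e.g. 10% -->
    <quantitativeUncertainty unit="percent">10</quantitativeUncertainty>
    <!-- reference to a digital uncertainty map served by a web service -->
    <uncertaintyMap>http://example.org/wcs?SERVICE=WCS&amp;COVERAGE=field_capacity_uncertainty</uncertaintyMap>
    <qualitativeStatement>Derived from the 1:25,000 soil survey</qualitativeStatement>
  </dataQualityInfo>
</MD_Metadata>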


3.2 Business Logic Tier

The business logic tier is the mediator between the data tier and the presentation tier. It provides the main service for the generation of information and the rule based derivation of geoinformation. The two main components implemented are the data processing component and the rule based derivation component, which are introduced in the following two sections.

3.2.1 Data Processing Component

According to figure 2, the business logic tier is the centre of any SDI; it is placed between the data tier and the presentation tier and is capable of integrating several data sources of heterogeneous origin. The business logic tier consists of several components which encapsulate all tasks needed for accessing data, generating information out of the data and forwarding results to the presentation tier (i.e. the user). Inside the business logic, all kinds of tasks conventionally performed in geoinformation systems can be implemented. The tasks needed for geoprocessing, i.e. the processing of spatial data according to a strictly defined geospatial algorithm, consist mainly of three elements:

1. Connection of web services which serve as data providers
2. Application of the geospatial algorithm (in this case: the calculation of groundwater vulnerability)
3. Preparation of information for output (e.g. on a handheld device or through a web-based information system)

According to the principles of service-oriented architecture (Chappell and Jewell, 2003), from which the concept of spatial data infrastructures is derived, the business logic component has also been implemented as a web service. To generate uniform information, three steps have to be undertaken:

1. Providing the factors as OGC compliant services (either WFS or WCS)
2. Accessing the services in order to acquire the provided data, transforming all data to grid data and using map algebra (Tomlin, 1990) to calculate the geoinformation »Groundwater Vulnerability«
3. Providing the result as a W3C compliant SOAP web service for platform-independent consumption

Figure 3 gives an overview of the service interaction. Clients (here: a web application; generally: any kind of HTTP client) access an integrative SOAP web service »Groundwater Vulnerability« which encapsulates services for authentication and for the assessment of groundwater vulnerability. The base data are not coupled with the system. An OGC-compliant catalogue service acts as a data broker for distributed data services. The data are computed by a map algebra service. The results of this process are afterwards incorporated into a map-like display by the »base service groundwater vulnerability mapping«. The groundwater vulnerability map classifier acts as a statistics module, computing map parameters such as the mean distribution of values, minimum/maximum values of computed results, etc. All the complexity remains hidden from the service consumer, who just accesses the integrative SOAP web service, which is well defined by a WSDL interface. The web application consuming this web service does not have to be installed on the same machine as the one providing the service. Figure 4 shows four screens of the information system in front of a map of the study area. The main tasks performed by users can be described as:

1. select the area of interest based on a topographic map provided by a public data provider
2. inform, by generating geoinformation from distributed data inventories
3. compare, by generating several maps based on different data sources and afterwards letting the system compare the maps, e.g. by map algebra tasks like subtraction, addition, etc.
4. decide on a specific topic by using the information generated by the system
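As an illustration of how a client might invoke the integrative SOAP web service described above, the following hypothetical request sketch assumes an operation name, parameter names and coordinates that are not taken from the project's actual WSDL:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:gwv="http://example.org/gwv">
  <soap:Body>
    <gwv:calculateGroundwaterVulnerability>
      <!-- area of interest as a bounding box (placeholder coordinates) -->
      <gwv:areaOfInterest srsName="EPSG:31466">2500000,5620000 2540000,5660000</gwv:areaOfInterest>
      <!-- scale level controlling which base data the rule component admits -->
      <gwv:scaleLevel>1:25000</gwv:scaleLevel>
      <gwv:outputFormat>map</gwv:outputFormat>
    </gwv:calculateGroundwaterVulnerability>
  </soap:Body>
</soap:Envelope>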



Figure 3: Communication schema inside the developed SDI

Figure 4: The main tasks of the developed spatial data infrastructure

3.2.2 Rule Based Derivation Component – The Business Rules Approach

Kardasis and Loucopoulos (2004) define business rules as »projections of external constraints on an organisation's way of working, and on its supporting information systems' functionality«. Such external constraints projected on spatial information are, for example, the limits of a groundwater model, the validity of geological assumptions related to scale levels, or restrictions regarding the generalization of data sets. This concept is enhanced by Wan-Kadir and Loucopoulos (2004), who extend the business rule approach by the concept of an »evolvable software architecture«. This definition aims to separate the code from the business logic. The strict separation allows software to grow with new requirements and focuses on a just-in-time integration of business logic. The advantages, especially for enterprise technologies, are the fast integration of altered market situations, the integration of new laws, etc. The relevant aspects of business rules for this project are:

- strict separation of programming code from business logic
- the definition of restrictions on base data (e.g. scale levels)
- user-defined rule alteration (during runtime)
- integration of already existing metadata

This architecture (figure 5) enables a strict separation of domain experts, software engineers and users. The software engineers choose the implementation technology, which is fully transparent to the user. Strictly defined metadata schemes are not suitable for inserting business rules; their purpose is to describe the underlying spatial data sets. Business rules therefore have to be defined in a separate rule repository. To ensure interaction in distributed software environments, defining business rules in the form of eXtensible Markup Language (XML) files (Harold and Means, 2004) is a suitable way, which has been followed in this project. An XML schema definition has been developed to specify the valid data types for any given rule set. The XML schema defines, for example, the minimum or maximum scale for integrating a file into the process of information retrieval, or another data set from which the proposed information could be derived. The XML schema thus allows an easy integration of constraints into the process of data selection. XML files validated against the schema contain simple rules which are transformed from the textual representation of expert knowledge. A business rule declaration should contain at least the following entries for any given spatial data set (a hypothetical example file is sketched at the end of this section):

- minimum and maximum values for scale restrictions
- algorithms for generalization

- URLs to alternative data sets
- URL to the data set from which the current data set is derived
- rules for deriving factors from base data (e.g. how to convert usable field capacity to the factor soil)
- textual use restrictions (containing non-numeric information about data quality in the context of different scales)
- any other rule description

Any XML file containing rules has to be validated against the XML schema in order to ensure the correctness of all entries (e.g. preventing the user from entering a negative scale level). The generation of XML files is easy: users simply fill in web forms to define, for example, scale restrictions.

Figure 5: System of rule based derivation of geoinformation

Three interfaces provide standardized communication between the rule component and the user (human or machine):

- GetCapabilities: well known from any Open Geospatial Consortium compliant Web Service (OWS), this interface provides the service metadata
- GetRules: retrieves all rules defined for a specific data set
- DefineRule: defines (and alters) rules for a specific data set

These interfaces allow the rule component to be implemented in the SDI and also to be integrated into any other distributed environment, due to their generic nature and high level of abstraction.
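The following is a hypothetical sketch of such a rule file; the element names, values and URLs are illustrative only and do not reproduce the project's actual rule schema:

<ruleSet dataSet="http://example.org/wfs?TYPENAME=soil_nrw">
  <!-- scale range within which this data set may be used -->
  <scaleRestriction minimum="5000" maximum="25000"/>
  <!-- generalization algorithm to apply when the target scale is coarser -->
  <generalization algorithm="majorityResampling"/>
  <!-- alternative data set to use outside the valid scale range -->
  <alternativeDataSet>http://example.org/wcs?COVERAGE=soil_generalized</alternativeDataSet>
  <!-- data set from which the current data set was derived -->
  <derivedFrom>http://example.org/wcs?COVERAGE=soil_survey_raw</derivedFrom>
  <!-- rule for deriving the factor soil from usable field capacity -->
  <derivationRule factor="soil" input="usableFieldCapacity" method="classification"/>
  <useRestriction>Not suitable for site-specific assessments at scales larger than 1:5,000</useRestriction>
</ruleSet>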


3.3 Presentation Tier

The main objective of the presentation tier was the development of an exemplary web application which provides easy-to-handle functionalities to analyse the different possibilities of calculating groundwater vulnerability on different data sets. The application should work without proprietary software components and should communicate via standardised interfaces (W3C, OGC) with other components in order to be exchangeable. To meet the technical and user requirements, which were discussed in workshop sessions, an exemplary web application has been developed using the Apache Turbine framework and Velocity. It contains modules

- to communicate with the business logic,
- to analyse user requests,
- to transform GWV results into graphical output (maps, diagrams) and
- to calculate statistics.

The business logic and presentation tier communicate via HTTP using WSDL interfaces (W3C compliant). The most important functionalities of the presentation tier are:

- an easy-to-handle predefinition of the area under investigation (figure 6) using an existing Web Map Service with background information (topography)

- provision of original data and GWV results
- provision of metadata and data uncertainties
- clear presentation of GWV results in the form of maps, diagrams or tables
- calculation of GWV variations by varying input parameters (e.g. scale) and
- calculation of differences between GWV variations (figure 7).

In the workshops, users emphasized the benefits of calculations on distributed data and the adequate presentation of metadata and uncertainties.

Figure 6: Predefinition of the area under investigation

Figure 7: Comparison of the computation results for different input parameters (e.g. scales)

4. Conclusions

Referring to the objectives of the BMBF/DFG programme »Geotechnologien – Information Systems in Earth Management«, the project »Geo Web Service Groundwater Vulnerability« delivers several research results. First of all, the Geo Web Service GWV shows how distributed heterogeneous data can be provided as OGC compliant data services (WFS, WCS). These data services are used directly by the Geo Web Service GWV, but can also be used by other spatial data infrastructures. The service-oriented provision of data leads to syntactical interoperability. The new OGC specification for Web Processing Services (WPS) opens up further possibilities for geo web services: it fills the gap between different data sets, because intersection and other operations on data services can now be provided in a standardised way. Through the use of strategy patterns in the Geo Web Service GWV, algorithmic modifications can easily be carried out. The Geo Web Service GWV enables the just-in-time integration of domain expertise: a rule based reasoner controls the data sets used for calculation. Rules are represented as XML files and processed via SAX events. The implementation is based on components and services provided by distributed servers. Standardised interfaces (OGC, W3C) can be used to communicate with other services, applications and clients. Additional functionality implemented in the business logic can therefore easily be used by clients. The free and open source software used has proved its applicability; it reduces expense and eases the transfer to other fields of investigation. The Geo Web Service GWV is a practical example of an online-calculating, easy-to-handle web application. Users taking part in the pilot operation confirm the advantages of the system. The automated online calculation offers new possibilities for data providers and clients. The system reduces the need for secondary data storage, especially when data change quickly, and the possibility of combining data from different providers facilitates data handling: the need for ordering or converting data is minimized. The approach is transferable to other areas of application; many geoscientific and engineering questions can be answered by similar systems. It can therefore be regarded as a contribution to widening the range of existing geo web service applications and to increasing the acceptance of web technologies for geoscientific matters.

5. References

Azzam, R.; Bauer, Ch.; Bogena, H.; Kappler, W.; Kiehle, Ch.; Kunkel, R.; Leppig, B.; Meiners, H.-G.; Müller, F.; Wendland, F.; Wimmer, G. (2003): Geoservice Groundwater Vulnerability – Development of an Information Infrastructure for the Rule-based Derivation of Geoinformation from Distributive, Heterogeneous Geodata Inventories on Different Scales with an Example regarding the Groundwater Vulnerability Assessment. Geotechnologien: Information Systems in Earth Management, Kick-Off-Meeting, University of Hannover, 19 February 2003, Projects. Geotechnologien Science Report, 2, Koordinierungsbüro GEOTECHNOLOGIEN, Potsdam, Germany, 31-35.

Bogena, H.; Kunkel, R.; Leppig, B.; Müller, F.; Wendland, F. (2004): Assessment of Groundwater Vulnerability at Different Scales. Geotechnologien: Information Systems in Earth Management, Status Seminar, RWTH Aachen University, 23-24 March 2004, Programme & Abstracts. Geotechnologien Science Report, 4, Koordinierungsbüro GEOTECHNOLOGIEN, Potsdam, Germany, 30-34.

Bogena, H.; Kunkel, R.; Montzka, C.; Wendland, F. (2005): Uncertainties in the simulation of groundwater recharge at different scales. Advances in Geosciences 5, 1-6.

Chappell, D.A.; Jewell, T. (2003): Java Web Services. Cologne (O'Reilly).

Evans, J.D. (ed.) (2003): Web Coverage Service (WCS), Version 1.0.0. OGC 03-065r6. Online: https://portal.opengeospatial.org/files/?artifact_id=3837

Harold, E.R.; Means, W.S. (2004): XML in a Nutshell. 3rd edition. Cologne (O'Reilly).

Hölting, B.; Haertlé, T.; Hohberger, K.-H.; Nachtigall, K.-H.; Villinger, E.; Weinzierl, W.; Wrobel, J.P. (1995): Konzept zur Ermittlung der Schutzfunktion der Grundwasserüberdeckung. Geologisches Jahrbuch (63). Hannover (Schweizerbart): 7-20.

Kardasis, P.; Loucopoulos, P. (2004): Expressing and organising business rules. Information and Software Technology 46, 701-718.

Nogueras-Iso, J.; Zarazaga-Soria, J.; Muro-Medrano, P. (2005): Geographic Information Metadata for Spatial Data Infrastructures. Berlin (Springer).

Tomlin, C.D. (1990): Geographic information systems and cartographic modelling. New Jersey (Prentice Hall).

Vretanos, P. (ed.) (2002): Web Feature Service 1.0. OGC 02-058. Online: https://portal.opengeospatial.org/files/?artifact_id=7176

Wan-Kadir, W.M.N.; Loucopoulos, P. (2004): Relating evolving business rules to software design. Journal of Systems Architecture 50, 367-382.



Overcoming Semantic Heterogeneity in Spatial Data Infrastructures

Lutz M. (1), Christ I. (2), Witte J. (3), Klien E. (1), Hübner S. (3)

(1) Institute for Geoinformatics (IfGI), Münster, {m.lutz|klien}@uni-muenster.de
(2) Delphi InformationsMusterManagement (DELPHI IMM), Potsdam, ingrid.christ@delphi-imm.de
(3) Center for Computing Technologies (TZI), Bremen, {witte|huebner}@tzi.de

1 Introduction

Spatial data infrastructures (SDIs) play a major role in searching, accessing and integrating heterogeneous geographic data sets and geographic information (GI) services. The standards of the Open Geospatial Consortium (OGC) provide a syntactical basis for data interchange between different user communities. But this is only the first step, as semantic heterogeneity (Bishr 1998) still presents an obstacle on the way towards full interoperability (Sheth 1999; Sondheim et al. 1999; Egenhofer 2002). In SDIs, existing standards fail to address semantic problems that occur due to heterogeneous data content and heterogeneous user communities (e.g. different languages, terminologies, and perspectives). Semantic heterogeneity occurs at different levels, and at each of these levels it can inhibit tasks that are essential to the success of SDIs:

- At the metadata level, semantic heterogeneity impedes the discovery of geographic information;
- at the schema level, semantic heterogeneity impedes the retrieval of geographic information; and
- at the data content level, semantic heterogeneity impedes the interpretation and integration of geographic information.

The goal of the work presented in this paper is to enhance important tasks in SDIs by overcoming these semantic heterogeneity problems. We present an ontology-based methodology for enhancing GI discovery, retrieval, interpretation and integration in SDIs, which has been developed in the meanInGs project. To illustrate its benefits and practical use, we introduce two examples:

- an example from the hydrology domain for illustrating discovery, retrieval and transformation, and
- an example from the geology domain for illustrating interpretation and integration.

The remainder of the paper is structured as follows. Section 2 elaborates on the problems caused by semantic heterogeneity at the metadata, schema and data content levels. In section 3, we explain the notion of ontologies and introduce the ontology architecture and language employed in both presented approaches. Section 4 describes the proposed methodology for overcoming semantic heterogeneity at the metadata and schema levels. The approach for dealing with semantic heterogeneities particularly at the data level is described in section 5. We conclude the paper with a discussion of related work (section 6) and a conclusion and outlook on future work (section 7).

2 Problems Caused by Semantic Heterogeneity

The Metadata Level

In current SDIs, catalogues (OGC 2004) provide query functionalities based on keywords and/or spatial filters. The metadata fields that can be included in the query depend on the metadata schema used, e.g. ISO 19115 (ISO/TC-211 2003), and on the query functionality of the service that is used for accessing the metadata. Even though natural language processing techniques can increase the semantic relevance of search results with respect to the search request (e.g. Richardson & Smeaton 1995), keyword-based techniques are inherently restricted by the ambiguities of natural language. If different terminology is used by providers and requesters, keyword-based search can have low recall, i.e. not all relevant information sources are discovered. If terms are homonymous, or because of the limited ability to express complex queries in keyword-based search, precision can also be low, i.e. some of the discovered services are not relevant (Bernstein & Klein 2002).

The Schema Level

Once an appropriate data source has been discovered, it can be accessed through a standardized interface like a Web Feature Service (WFS) (OGC 2002). Here, semantic heterogeneity can cause another difficulty. While the service can be queried for the schema of a data source, a requester might still run into trouble when formulating a query filter if the property names are not intuitively interpretable. Also, when the retrieved data are to be consumed by another service (e.g. in a composite service chain), they might have to be mapped from the providing service's (source) schema into the consuming service's (target) schema.

The Data Content Level

Problems can also occur when interpreting the content of data, in particular if the semantics of values depend on some reference system (e.g. units of measure or a classification system). For example, it is difficult to interpret a value correctly if the unit of measure is not given. Problems can also occur when classification systems (e.g. for rock, soil, or vegetation types) are used. They can differ between information communities (e.g. between geology and soil science), but also within one information community when the vocabulary used by that community changes over time. The resulting heterogeneities present serious problems when several datasets using different classification schemes are to be represented in a common map, interpreted by a user or combined for analysis. Likewise, the data integration task within a composite service chain requires the detection and elimination of semantic heterogeneities, e.g. transformations of values between different units of measure.

3 Building Blocks for Overcoming Semantic Heterogeneity

Ontologies can be employed for making the semantics of the information content of geospatial web services explicit. In this section, we describe the ontology architecture (section 3.1), ontology language (section 3.2) and reasoning procedures for matchmaking (section 3.3) that are employed in the proposed methodology. The notion of registration mappings (section 3.4) is required to establish a link between a data schema and its semantic description, which is crucial for the tasks of data retrieval and schema transformation. The rule-based method for semantic mediation (section 3.5) is required for detecting and eliminating semantic heterogeneities in the task of data transformation.

3.1 Ontology Architecture

The backbone of our methodology is an infrastructure of geospatial domain and application ontologies (Fig. 1). Domain ontologies represent the basic concepts and relations that are known to all members of an information community; together they form the shared vocabulary of that domain. Based on these common terms, application ontologies are derived that further constrain specific concepts and relations and thus describe a particular application, e.g. a geographic dataset or the categories in a classification scheme. A user searching for data or a category with certain properties can also use the concepts and relations from the shared vocabulary to specify a query. As both application ontologies and queries are based on the same terms, they become comparable – and thus the commitment of providers and requesters to a common shared vocabulary ensures semantic interoperability.



Figure 1: The hybrid ontology approach, modified from (Wache et al. 2001)

3.2 Description Logics

The ontologies shown in this paper are expressed using the Description Logic (DL) (Baader & Nutt 2003) notation used in the RACER system (Haarslev & Möller 2004). DL is a family of knowledge representation languages that are subsets of first-order logic (for a mapping from DL to FOL, see e.g. Sattler et al. 2003). They provide the basis for the Web Ontology Language (OWL), the proposed standard language for the Semantic Web (Antoniou & Van Harmelen 2003). The basic syntactic building blocks of a DL are atomic concepts (unary predicates), atomic roles (binary predicates), and individuals (constants). The expressive power of DL languages is restricted to a small set of constructors for building complex concepts and roles. Implicit knowledge about concepts and individuals can be inferred automatically with the help of inference procedures (Baader & Nutt 2003). A DL knowledge base consists of a TBox containing intensional knowledge (declarations that describe general properties of concepts) and an ABox containing extensional knowledge that is specific to the individuals of the domain. In our work, we only use TBox language features, namely

- concept definition: (define-concept C D),
- concept inclusion: (implies C D), and
- role definition: (define-primitive-role R :parent P :domain C :range D).

The domain of a role is a concept describing the set of all things from which this role can originate. This notion of the term should not be confused with the notion »domain of interest« (as in domain ontology). The range of a role is a concept describing the set of all things the role can lead to. Concepts C, D can be defined using the following constructors:

(and E F)                               (intersection)
(or E F)                                (union)
(all R C)                               (value restriction)
(some R C)                              (existential quantification)
(at-least | at-most | exactly n R)      (number restrictions)
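Since OWL is named above as the Semantic Web counterpart of DL, the following hedged sketch shows how a simple TBox definition in the RACER-style notation, e.g. (define-concept WaterLevelMeasurement (and Measurement (all observedProperty WaterLevel))), might be written in OWL RDF/XML; the concept and role names are illustrative and are not taken from the project's ontologies:

<!-- to be embedded in an rdf:RDF element declaring the owl and rdf namespaces -->
<owl:Class rdf:about="#WaterLevelMeasurement">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Measurement"/>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#observedProperty"/>
          <owl:allValuesFrom rdf:resource="#WaterLevel"/>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>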


3.3 Subsumption Reasoning

There are two different types of user queries. The user can either choose an existing concept from a domain or application ontology (simple query), or she can define a concept based on the concepts and relations in the shared vocabulary (defined concept query). Both cases result in a query concept that can be used in the matchmaking. To determine whether a concept describing a data source or category in an application ontology is a match for a given query concept, we use one of the available DL inference procedures, computing subsumption relationships between concepts. Subsumption reasoning determines whether a concept is more specific or more general than another one. A data source or category is a match for a given query if the corresponding concept is subsumed by, i.e. more specific than, the query concept. For a more detailed introduction to DL languages and different subsumption algorithms see (Baader & Nutt 2003).

3.4 Registration Mappings for GI Retrieval

In order to establish a mapping between a data source's schema and its description in an application ontology, we have introduced registration mappings (Bowers & Ludäscher 2004). An example registration mapping for a feature type representing a water level measurement is shown in Fig. 2. The complete feature type is mapped to an application concept (chmi_Measurement) and its properties are mapped to contextual paths in the ontology. For example, the mapping from tok to chmi_Measurement.quantityResult.observedWaterBody.name states that this property represents the name of the water body in which the measurement was taken. The main idea of registration mappings is to have separate descriptions of the application concept C (called semantic type in Bowers & Ludäscher 2004) and of the structural details of the feature type it describes (called structural type). This has the advantage that the semantics of the feature type can be specified more accurately in application concepts if the specification does not try to mirror the feature type's structure. This is especially true for feature types that have a »flat« structure that does not reflect the conceptual model of the domain well. The property tok in Fig. 5, for example, represents the name of the river where the water level measurement was taken.
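To make this more tangible, the following is a hypothetical XML rendering of such a registration mapping; the project's actual mapping format is not reproduced here, only the mapping for tok is taken from the text, and the remaining property paths are invented for illustration:

<registrationMapping featureType="StavVody" applicationConcept="chmi_Measurement">
  <!-- property of the feature type mapped to a contextual path in the ontology -->
  <propertyMapping property="tok"
      path="chmi_Measurement.quantityResult.observedWaterBody.name"/>
  <!-- the following paths are illustrative placeholders -->
  <propertyMapping property="stav"
      path="chmi_Measurement.quantityResult.value"/>
  <propertyMapping property="datum"
      path="chmi_Measurement.timeOfMeasurement"/>
</registrationMapping>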

Figure 2: An example registration mapping for the GML document shown on top

3.5 Semantic Mediation for GI Service Composition

The goal of the matchmaking approach described in section 3.3 is to identify data sources that exactly match the semantics required by the requester. When matchmaking is done in the context of composing a complex service chain, these requirements are also determined by the application semantics and schema of an existing service (which is to be combined with the service to be discovered). In this case, mediation between the schema of the discovered service and that of the target service can become necessary. Our approach to mediation is based on the idea of semantic mediation introduced in Wache (2003). Compared to approaches from the area of schema integration (for an overview, see Conrad 2002), semantic mediation focuses especially on semantic heterogeneities. The specification of the integration mapping on the semantic level is identified as the most difficult task when developing a mediator for integrating heterogeneous information sources. As information on the semantics of an information source is often only available implicitly inside the information systems managing the information, an explicit semantic description of information sources is introduced in addition to syntactic descriptions. This semantic description consists of two parts: a description of meaning and a description of context (Fig. 3).

Figure 3: Semantics as a composition of meaning and context

The meaning is defined uniformly across all data sources in a domain, e.g. by a concept WaterLevel. It can be used to identify semantically equivalent information or semantic heterogeneities within one domain (Wache 2003). The context describes the different representations of semantically equivalent information in different data sources, e.g. the unit or the scale of WaterLevel. It can be used to identify and solve semantic heterogeneity problems between specific data sources within one domain. The specification of the integration mapping is based on a rule-based approach. The mapping consists of transformation rules, which are divided into rules for solving structural and semantic heterogeneities. Accordingly, two kinds of transformation rules can be distinguished:

- Query decomposition rules are based on the »global-as-view« principle (Levy 1999; Halevy 2001) and enable the splitting of a query against a global (target) schema into several subqueries against the respective information sources. They are used to solve conflicts caused by structural heterogeneities.
- Context transformation rules specify how a piece of information can be transformed from one context into another. They are used to solve conflicts caused by semantic heterogeneities.

4 Enhancing GI Discovery, Retrieval and Integration

This section illustrates how semantic heterogeneity can be overcome to enhance the discovery, retrieval and integration of geographic information. We first introduce a scenario in section 4.1, which we use for illustration. In section 4.2, an approach for ontology-based GI discovery and retrieval is introduced. Section 4.3 describes an approach for semantic translation, and section 4.4 introduces the SDI architecture for implementing the presented approaches.

4.1 Hydrology Scenario

In this section, we use the following scenario for illustration. A service developer wants to implement a service chain (ISO 2005) that provides fast and up-to-date access to water level measurements in a certain river, interpolates these measurements along the river course and visualizes the interpolation results. Such a service chain could, for example, enable the detection of hazard areas during flood events. The central component of this service chain is the service that interpolates the water level measurements. In order to execute this service in an open and distributed environment, the following steps are necessary: (1) appropriate input data (provided by WFSs (OGC 2002)) have to be discovered, (2) the input data have to be retrieved using given constraints, and (3) the retrieved data have to be transformed to fit the requirements of the interpolation service. In current SDI architectures the user will face the problems described in section 2. On the metadata level, keyword-based search in a catalogue will result in (1) low recall, e.g. if the user searches for »water level measurement« whereas the service is described with »tide scale«, and (2) low precision, e.g. if the user searches for the water level in rivers whereas the service provides water levels in groundwater, and both use »water level measurement« in their descriptions. On the schema level, it might be difficult for a user to correctly interpret the meaning of the discovered feature type's property names, e.g. if a water level measurement is called »height«. This makes the formulation of the WFS GetFeature request a difficult task. Finally, on the data content level, the results returned by the WFS may be incorrectly interpreted by the consuming interpolation service due to missing information on the provided data. If, for example, the interpolation service expects water level measurements in meters and the WFS provides water level measurements in centimeters, this will lead to wrong interpolation results.

4.2 Ontology-Based Discovery and Retrieval of Geographic Data

To address the problems on the metadata and schema levels, we have proposed an approach that enhances the discovery of geographic data in SDIs by providing ontology-based descriptions (section 3.1) and integrates the discovery and retrieval processes to make the overall task of GI retrieval more user-friendly (Lutz & Klien 2006).

Rather than having to formulate separate queries for discovering data and retrieving it, the requester only has to formulate one query for the data she is interested in. This query is based on terms from existing domain ontologies and is automatically translated into one or several DL concepts. These concepts are used as query concepts for discovering semantically appropriate feature types. The matchmaking between the query concept and the application concepts describing feature types is based on subsumption reasoning and follows the approach described in section 3.3. After the requester has selected one of the discovered feature types, a GetFeature query, which uses the property names of the feature type's application schema, is constructed from the user query. This step requires registration mappings (section 3.4) between the feature type's properties and the roles from the domain ontology. As a final step, the derived WFS query is then executed and its results are returned to the user. In the following, we illustrate the translation of a requester's query (1) into a DL query concept (2) and subsequently into a GetFeature request (3) based on the scenario described above. (1) An example query statement for finding water level measurements for the Elbe River provided in centimeters for a given date (2004-04-22) and location is presented in Fig. 4. This query is based on concepts and relations from three domain ontologies (for the domains of MEASUREMENTS, HYDROLOGY and GEOGRAPHIC FEATURES). Note that the constraints are either type restrictions or comparisons with a value specified by the requester (value constraints). Value constraints can only be defined for roles whose range is an XSD datatype or a GML geometry type. In addition to common string and number comparators (such as ≥ or startsWith), spatial comparators such as withinBoundingBox, intersects or withindistance-of can be used.



Figure 4: Example for a semantic query statement. The keywords of the proposed syntax are shown in capitals, the comparators in italics

(2) This query can be translated following the guidelines in Lutz & Klien (2006) into the following DL query concept. Type constraints are expressed through universally quantified value restrictions in DL. Value constraints only become relevant when the data are retrieved from the WFS. In order to be able to express these constraints as a filter expression in the GetFeature query, it is important that the feature type contains the property to be constrained. Therefore, in the discovery phase, value constraints are expressed as existential quantification on the specified roles.

This query concept is then used for discovering appropriate data sources based on DL subsumption reasoning (section 3.3). (3) In the next step we want to retrieve the requested information from the discovered WFS. This requires formulating a GetFeature request including a filter expression. In order to do this, the structure of the WFS’s feature type has to be known. Also, the property names of the selected feature type that are equivalent to the domain ontology terms used in the query statement have to be derived. All the required information can be accessed from the feature type’s registration mapping. By using the registration mapping shown in Fig. 2 (p. 4), the example query statement shown in Fig. 4 can be translated into the WFS GetFeature request (OGC 2002b) and filter expression (OGC 2001) shown in Fig. 5.

4.3 Semantic Translation of Geographic Data

When considering a service chain composed of several separate services (as in the presented example), an additional step might be required after discovering and retrieving appropriate data. If the structure and semantics of the data provided by one service (e.g. the WFS providing the water level measurements) do not exactly match those required by the consuming service (e.g. the interpolation service), a translation becomes necessary.

Figure 5: A WFS query that requests the property stav from a feature type called StavVody. The filter expression constrains the query to features whose tok property equals »Elbe«, whose datum property equals »2004-04-22« and whose position property is within the specified bounding box
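The figure itself is not reproduced here; a request of the kind it describes might look roughly as follows in WFS 1.0.0 / Filter Encoding 1.0 syntax, where the bounding box coordinates are placeholders and the feature type namespace is omitted for brevity:

<wfs:GetFeature service="WFS" version="1.0.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml">
  <wfs:Query typeName="StavVody">
    <ogc:PropertyName>stav</ogc:PropertyName>
    <ogc:Filter>
      <ogc:And>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>tok</ogc:PropertyName>
          <ogc:Literal>Elbe</ogc:Literal>
        </ogc:PropertyIsEqualTo>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>datum</ogc:PropertyName>
          <ogc:Literal>2004-04-22</ogc:Literal>
        </ogc:PropertyIsEqualTo>
        <ogc:BBOX>
          <ogc:PropertyName>position</ogc:PropertyName>
          <gml:Box>
            <gml:coordinates>13.0,50.1 14.5,51.2</gml:coordinates>
          </gml:Box>
        </ogc:BBOX>
      </ogc:And>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>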



In general, translations connect one or more data sources to a destination with the help of appropriate conversion rules. The challenge of a semantic translation is not to process the translation, but to discover and to specify it, in our case for the GML documents returned by a WFS. For the discovery, the semantics and the structure of the source(s) (i.e. the WFS) as well as of the target (i.e. the interpolation service) have to be examined. Meaning and context can be extracted for each feature type property from the registration mappings and application ontologies of the source and target services. Based on the meaning, semantic correspondences between the attributes of the feature types can be identified using subsumption reasoning (step 2 in section 4.2). After matching concepts have been discovered, appropriate mappings between the corresponding properties of the feature types are generated automatically for a defined domain. These mappings are combined for the different feature types as query transformation rules. The rules (section 3.5) contain the information on how queries can be applied to one schema of a feature type, specified by a corresponding description in the rule head, and how they can be transformed into equivalent queries against other feature types, specified in the rule body. Thus, for example, a query against the schema of the feature type required by the interpolation service can be transformed at the property level into a request for a feature type StavVody, as shown in Fig. 6. The rule decomposes a query against a feature type called Measurement from the interpolation service into a query against a feature type called StavVody (both being instances of the meaning Measurement). Additionally, it defines correspondences between properties that are semantically equivalent, e.g. value from the feature type Measurement and stav from the feature type StavVody. After this, we have to take a look at the context of the semantic description. In our scenario, the interpolation service requires water levels in meters (see the context description in Fig. 6), whereas the discovered WFS provides water levels in centimeters. Therefore the measurement values have to be converted. For this conversion, a library of predefined context transformation rules is used.

Figure 6: A query decomposition rule that decomposes a query against a feature type called Measurement from the WLIS into a feature type called StavVody.



<?name1, (?meaning)::[meas:unitOfMeasure ->> meas:Centimeter], ?datatype1, ?value1> @?source1
-->
<?name2, (?meaning)::[meas:unitOfMeasure ->> meas:Meter], ?datatype2, ?value2> @?source2
: ?value1 is ?value2 / 100.

Figure 7: Example for a simple context transformation rule for the conversion between centimeter and meter (domain-independent)

These rules specify how to convert between different contexts. As an example, Fig. 7 shows a rule for transforming measurement values from centimeters to meters. To build query plans containing detailed sequences of query transformation rules and context transformation rules, we use logical inference based on a representation of the feature types and the rules in Horn Logic.

Figure 8: Simplified view on the service architecture


4.4 SDI Architecture

In order to use the methods for enhancing discovery, retrieval and transformation in SDIs, they have to be encapsulated in software components. In this section, we introduce an architecture that includes such components in addition to existing SDI components and describe the flow of information between them. Fig. 8 depicts a simplified view of the architecture, including the tasks fulfilled by each of the components. The central component of the architecture is a client that manages the overall workflow (WS Client). In our current architecture, this client is tailored to its specific application, i.e. to executing the interpolation service chain. In a future version of the architecture, it could be substituted by a generic workflow client that executes a customised description of the service chain, e.g. using the Business Process Execution Language (BPEL) (Andrews et al. 2003). The entry point for the ontology-based discovery (step 1 in Fig. 8) and retrieval (step 4) is the query client. This client provides a user interface for query formulation. It is built on the query language described in section 4.2, thus enabling users to pose formal and precise queries based on existing domain ontologies (Fig. 9).


Figure 9: User interface of the query client.

After a user query has been submitted, it is translated into a DL query concept (section 3.3). When submitting a query, the user might choose whether the transformation rules offered by the Semantic Translation Specification Service (STSS, see below) should be taken into account (step 2). If, for example, a transformation rule between the units meter and centimeter exists, it makes sense to search for feature types that offer water level measurements in both units, thus extending the potential result set. In our scenario, the STSS offers unit transformations for length measurements (e.g. from centimeters to meters) and the query concept is relaxed accordingly, i.e. the range restriction for the unitOfMeasure role is relaxed from Centimeter to the disjunction of unit concepts that are transformable into centimeters: (all unitOfMeasure (or Centimeter Meter Millimeter Yard Inch … )). Note that it is then no longer guaranteed that the discovered feature types exactly match the requirements of the interpolation service. However, it is guaranteed that all feature types can be transformed into the correct schema by the STSS.

The query concept is sent to the Ontology-Based Reasoner (OBR), which stores the registered ontologies and provides the reasoning functionality (step 4). Based on subsumption reasoning, the OBR discovers semantically appropriate feature types and returns their metadata to the query client (section 3.3). In order to access the discovered feature types through their WFS interfaces, the query client then automatically constructs GetFeature requests from the user's query using the property names of the feature types' application schemas (section 4.2).

For the discovery and specification of the transformation between a source and a target schema, the Semantic Translation Specification Service (STSS) has been developed. The input parameters of the STSS include the results of the ontology-based discovery and retrieval, i.e. the application ontologies, registration mappings, and GetFeature requests of the source services (the discovered WFSs) and the corresponding information for the target service (the interpolation service). The STSS determines the semantic correlations, selects the needed context transformation rules, adds changes to the structure and specifies a transformation, which is noted in XSLT (step 5). The actual transformation is performed by the Transformation Service (TS), which simply executes the XSLT rules and returns a GML document with the transformed water level measurements in meters (step 6). Now, the interpolation service correctly interprets all the data provided by the WFSs and the results of the interpolation (step 7) can be displayed in a WMS (step 8).
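At step 6 the Transformation Service only has to apply the XSLT specification produced by the STSS to the GML delivered by a WFS. The following minimal sketch shows this step with the standard javax.xml.transform API; the file names are illustrative assumptions and not those of the actual services.

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GmlUnitTransformation {
    public static void main(String[] args) throws Exception {
        // Stylesheet as specified by the STSS (file name assumed for this sketch)
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("stss-centimeter-to-meter.xsl")));
        // Input: GML feature collection returned by the discovered WFS;
        // output: GML conforming to the schema expected by the interpolation service.
        transformer.transform(new StreamSource(new File("wfs-response.gml")),
                              new StreamResult(new File("water-levels-in-meters.gml")));
    }
}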

5 Enhancing GI Interpretation and Integration In this section, we address the problem of semantic heterogeneity on the data content level from a different angle. In section 5.1, we introduce another scenario (from the geology domain), which differs from the one introduced in the previous section in that it addresses the problem of different classification systems within one user community. In section 5.2, we present how ontology-based methods can be applied to support the tasks of interpreting and integrating geographic information.

5.1 Geology Scenario Semantic heterogeneity problems at the data content level occur not only between different information communities (e.g. from the domains of soil science and geology) but also within the same information community at different times. The latter applies, for example, to the Geological Survey of Saxony-Anhalt, where different authors of geological maps have used different classifications at different times in history. This leads to the problem of synonymous and homonymous stratigraphic terms within the geological database, which makes the creation of a common map presentation or data analysis a difficult task (if done in a conventional way). At the same time, there are numerous questions that require integrated geological data, like »Where are good conditions for groundwater formations?« or »Where is the geological subsoil suitable for a dump site?«. We use the geology scenario to illustrate how these problems can be tackled using ontology-based data integration methods.

5.2 Ontology-based Data Integration In our approach, the meaning of the stratigraphic terms used in the different classification systems is made explicit by means of DL ontologies. As in the approach for GI discovery and retrieval, the use of ontologies should be transparent for the user. Therefore, we have implemented a function for translating user queries into DL query concepts, which can subsequently be used by an inference engine to do the actual matchmaking. The ontologies were modelled according to the hybrid approach as depicted in Fig. 1. A shared vocabulary for a subdomain of geology was created and, based on this, more specific application ontologies were defined for each stratigraphic concept from the Rogenstein Zone of the Lower Buntsandstein3 (like »su4«) according to different classification schemes like that of Jung or Fulda & Huelsemann. The graphical user interface (Fig. 10) allows two kinds of queries: a concept-based query providing pre-defined stratigraphic concepts from existing classification schemes and a more complex user-defined query based on petrographic characteristics. The latter allows, for example, the search for rocks offering a good protection against groundwater pollution. Translated into petrographic characteristics, this means it must be a solid rock or an unconsolidated soil whose main constituent is clay or silt. This query would be translated into the following DL query concept:


(define-concept query (and Gestein (or (and (all hat-konsistenz Fest) (exactly 1 hat-hauptbestandteil)) (and (all hat-konsistenz Locker) (exactly 1 hat-hauptbestandteil) (or (at-least 1 hat-hauptbestandteil Ton) (at-least 1 hat-hauptbestandteil Schluff))))) )
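Read in standard description logic notation (keeping the German class and role names of the ontology, where Gestein = rock, Fest = solid, Locker = unconsolidated, Ton = clay and Schluff = silt), the query concept above corresponds to:

Gestein ⊓ ( (∀hat-konsistenz.Fest ⊓ (=1 hat-hauptbestandteil))
          ⊔ (∀hat-konsistenz.Locker ⊓ (=1 hat-hauptbestandteil)
             ⊓ ((≥1 hat-hauptbestandteil.Ton) ⊔ (≥1 hat-hauptbestandteil.Schluff))) )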

Figure 10: User Interface for the geology scenario

The query is sent to the Ontology-Based Reasoner (OBR), which performs the matchmaking between the query concept and the concepts representing stratigraphic terms in one of the available classification schemes. The matching stratigraphic concepts in different classification schemes are shown in Fig. 11. These concepts are then used to generate a filter for the GetFeature request to the WFS that provides standardized access to the geological database. The retrieved features are displayed by a WMS. As a result of the user-defined query, Fig. 12 shows the successful data integration in two neighbouring map sheets in Saxony-Anhalt that use different classification schemes. The ontology-based approach presented in this section has a tremendous benefit compared to conventional ones: users can use terms from a classification scheme they are already familiar with and still find features classified according to a different (unknown) scheme. Furthermore, they can create their own search terms based on the petrographic characteristics available in the shared vocabulary and find the appropriate features irrespective of the classification scheme used for them. This is a crucial feature for using geological information as a basis for engineering decisions, for example the building of a dump site.
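The filter generation just mentioned can be pictured with the following minimal sketch, which assembles an OGC Filter for a WFS 1.0.0 GetFeature request from the concept identifiers returned by the OBR. The feature type name, the stratigraphy property and all codes except »su4« are illustrative assumptions, and namespace declarations are omitted for brevity.

import java.util.List;

public class StratigraphyFilterBuilder {

    static String buildGetFeature(List<String> matchedConceptCodes) {
        // One equality predicate per matching stratigraphic concept, combined by ogc:Or
        StringBuilder filter = new StringBuilder("<ogc:Filter><ogc:Or>");
        for (String code : matchedConceptCodes) {
            filter.append("<ogc:PropertyIsEqualTo>")
                  .append("<ogc:PropertyName>STRATIGRAPHY</ogc:PropertyName>")
                  .append("<ogc:Literal>").append(code).append("</ogc:Literal>")
                  .append("</ogc:PropertyIsEqualTo>");
        }
        filter.append("</ogc:Or></ogc:Filter>");
        return "<wfs:GetFeature service=\"WFS\" version=\"1.0.0\">"
                + "<wfs:Query typeName=\"GeologicalUnit\">" + filter + "</wfs:Query>"
                + "</wfs:GetFeature>";
    }

    public static void main(String[] args) {
        // e.g. the codes of the concepts that matched the query in one classification scheme
        System.out.println(buildGetFeature(List.of("su4", "su4T2")));
    }
}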



Figure 11: Results (extract) of the user-defined query for rocks with a good protection against groundwater pollution



Figure 12: Graphical Result for rock types offering a good protection against ground water pollution


6 Discussion and Related Work The approach presented in this paper is related to previous work in the fields of geographic information science, information discovery and retrieval, data integration and artificial intelligence. A first step towards overcoming semantic heterogeneity in the geospatial domain has been the proposal of Integrated Geographic Information Systems (IGIS), i.e. systems that integrate diverse GIS technologies or reflect a particular point of view of a community (Hinton 1996). This idea has been advanced in Fonseca et al. (2002a) and Fonseca et al. (2002b) by introducing ontologies as means for supporting representations of incomplete information, multiple representations of geographical space, and different levels of detail. In SDIs, where geographic information is usually highly distributed and heterogeneous, solving heterogeneity problems becomes a prerequisite. One focus of the research presented here is to transfer the ontology approach for dealing with semantic heterogeneity to the SDI domain and to demonstrate how it can be integrated into existing standards-based architectures.

Work in the field of information discovery and retrieval is manifold. There is widespread agreement among researchers in this field that declarative content descriptions and query capabilities are necessary (Mena et al. 1998; Czerwinski et al. 1999; Guarino et al. 1999; Heflin & Hendler 2000). The vision of most research in this domain is that users should be able to express what they want, and the system should find the relevant sources and obtain the answer (Levy et al. 1996). As this might involve combining data from multiple sources, information discovery and retrieval is closely related to data integration, whose goal is to provide a uniform interface (through a global schema) to a multitude of data sources (each with a local schema). In data integration terminology (Levy 2000), our approach can be considered as a Local As View approach. This means that the contents of a data source are described as a query over the mediated schema, which in our case is substituted by the ontology (see Mädche et al. (2001); Guha et al. (2003) for other examples, where ontologies are used in search and retrieval mechanisms). Usually, a query through the mediated schema in this approach requires complex transformation rules. With his semantic mediator, Wache (2003) suggests a way to generate these rules from fully annotated data sources semi-automatically with the help of assistants. The assistants attempt to find inter-correspondences between data elements of the query and the sources.



In our SDI scenario, it is often not possible to ask the user for confirmation, especially not on low-level relations between sources the user is not familiar with. We circumvent this problem by specialising in a specific domain with explicitly pre-modelled information and relations, e.g. transformation rules between units (see section 3.5). In the BUSTER (Bremen University Semantic Translator for Enhanced Retrieval) project (Vögele et al. 2003), DL descriptions have been used to describe and query classifications (Visser & Stuckenschmidt 2002) and data content (Hübner et al. 2004; Vögele & Spittel 2004). However, these approaches use simple ontologies, and queries only have limited expressivity. They show how well-established catalogue systems for electronic devices, namely ETIM and ecl@ss (Visser et al. 2002a), or landuse classifications, ATKIS and CORINE Landcover (Visser et al. 2002b), can be used as the grounding shared vocabularies for semantic translation. However, as these classification schemes are often imprecise, lack details and contain hardly comprehensible verbal circumscriptions and even inconsistencies, a lot of adjustments were needed in order to transform them into ontological descriptions. Providing an ontology-based query interface that enables uniform access to heterogeneous data sources and supports the user in formulating a precise query has also been proposed by the SEWASIE4 project (Dongilli et al. 2004), which employs the same ontology and matchmaking approach for information retrieval. Moreover, the SEWASIE query interface enables an iterative refinement process of the query and utilizes natural language as the query representation. While this certainly represents a user-friendly approach, it additionally requires that the ontology engineer provides verbalizations for each ontology term. In contrast, we propose an intuitive but still formal query language. And whereas the SEWASIE query interface is developed for the needs of the Semantic Web in general, we are focused on geospatial information infrastructures.


A strategy based on semantics for supporting the discovery and integration of datasets and services is used in the Science Environment for Ecological Knowledge (SEEK) project (Pennington et al. 2004). We have benefited from the work conducted in SEEK by adapting the method of registration mapping (Bowers & Ludäscher 2004) for our purposes. The key difference of our approach lies in combining both tasks, information discovery and retrieval, and in hiding the complexity from the user. Integrating information from different user communities based on ontologies with the goal of displaying a single map (with a single well-understood legend) has also been the focus of other research projects. In the GEON project, geological data from different US states have been combined according to a simple shared vocabulary comprising geological age, composition, fabrics, texture and genesis (Lin & Ludäscher 2003). Rather than using a hybrid ontology approach as proposed in this paper, the authors propose to use explicit mappings between different ontologies. Also, the ontologies used are simple is-a hierarchies and do not contain roles. In the HarmonISA project5, landuse data from the border region of Austria, Slovenia and Italy have been combined into a single landcover map. In this project, a comprehensive shared vocabulary for defining landuse classes has been developed based on the different national landuse classification systems. In contrast to the approach presented in this paper, where all areas matching certain requirements were searched for, the HarmonISA project aimed at producing a landuse map that completely covers the area. Therefore, the authors used a complex similarity measurement between landuse type definitions rather than subsumption reasoning.


7 Conclusions and Future Work Problems caused by semantic heterogeneities can occur on different levels in SDIs. In this paper, we have illustrated how these problems can be overcome by using DL ontologies and reasoning. We have also shown how the proposed methodology can be encapsulated in services and clients and how these can be combined with existing SDI components. The two use cases illustrate how these intelligent services effectively support the semantic query, retrieval, translation and integration of geographic data. Moreover, we have shown that the approach also supports dynamic service chaining in order to answer complex queries. In our future work we will address the following issues:
- Extensions of the tested scenarios. The tested scenario comprises requests and application schemas with a relatively simple structure. Also, the effects of the scale of a data source have not been taken into account. Future tests of the approach will include more complex request possibilities (like support for spatial comparators and nested queries) and data sources at different scales. Also, the effectiveness of the approach will be tested in a more generic setting with complex application schemas and examples from other domains.
- Semantics of geoprocessing services. In scenarios where a service chain is required to answer a complex question, the semantics not only of the data but also of the services for processing the data are of vital importance. In our future work, we will therefore investigate approaches for the semantic description and discovery of geoprocessing services and examine how these can be combined with the presented approach for the discovery and retrieval of geographic data. For first steps in this direction, see Lutz (2005a,b).
- Template Service Chains. In recent years, many researchers have addressed the automated generation of complex service chains based on user queries (e.g. Burstein et al. 2005). However, these approaches still face many problems of complexity. A simpler solution for supporting the creation of complex service chains by the user could be based on providing generic templates for service chains that solve a particular type of task. Such a template should be a fixed combination of several generic service types, each of which performs a subtask of the overall functionality. In an iterative process, requesters could subsequently instantiate these templates with services discovered for each of these subtasks.
- User-friendly generation of application ontologies. While our approach hides much of the complexity of the ontology-based GI retrieval from the requester, the data provider still has to create and register rather complex application ontologies. We are aware that this is one of the crucial bottlenecks for our approach to be accepted and used in future SDIs. Future work will therefore address how the process of creating formal descriptions of the geodata could be automated. First ideas on how this can be achieved using spatial analyses of geographic datasets are presented in Klien & Lutz (2005).

References Andrews, T., Curbera, F., Dholakia, H., Goland, Y., Klein, J., Leymann, F., Liu, K., Roller, D., Smith, D., Thatte, S., Trickovic, I. & Weerawarana, S. (2003): Business Process Execution Language for Web Services, Version 1.1, BEA Systems, IBM, Microsoft, SAP, Siebel Systems. Antoniou, G. & Van Harmelen, F. (2003): Web Ontology Language: OWL, in: Staab, S. & R. Studer (ed.): Handbook on Ontologies, Springer: 67-92. Baader, F. & Nutt, W. (2003): Basic Description Logics, in: Baader, F., D. Calvanese, D. McGuinness, D. Nardi & P. Patel-Schneider (ed.): The Description Logic Handbook. Theory, Implementation and Applications, Cambridge, Cambridge University Press: 43-95.



Bernstein, A. & Klein, M. (2002): Towards High-Precision Service Retrieval, in: Horrocks, I. & J. Hendler (ed.): The Semantic Web - First International Semantic Web Conference (ISWC 2002): 84-101. Bishr, Y. (1998): Overcoming the Semantic and Other Barriers to GIS Interoperability, International Journal of Geographical Information Science 12 (4): 299-314. Bowers, S. & Ludäscher, B. (2004): An Ontology-Driven Framework for Data Transformation in Scientific Workflows, International Workshop on Data Integration in the Life Sciences (DILS'04). Burstein, M., Bussler, C., Pistore, M. & Roman, D. [ed.] (2005): Proceedings of the Workshop on WWW Service Composition with Semantic Web Services 2005 (wscomps05), University of Technology of Compiegne, France. Conrad, S. (2002): Schemaintegration – Integrationskonflikte, Lösungsansätze, aktuelle Herausforderungen, Informatik – Forschung & Entwicklung 17 (3): 101-111. Czerwinski, S., Zhao, B. Y. & Hodes, T. (1999): An architecture for a secure service discovery service, Fifth ACM/IEEE International Conference on Mobile Computing and Networking: 24-35. Dongilli, P., Franconi, E. & Tessaris, S. (2004): Semantics driven support for query formulation, in: Haarslev, V. & R. Möller (ed.): International Workshop on Description Logics (CEUR Workshop Proceedings). Egenhofer, M. (2002): Toward the Semantic Geospatial Web, The 10th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS).


Fonseca, F., Egenhofer, M., Agouris, P. & Camara, G. (2002a): Using Ontologies for Integrated Geographic Information Systems, Transactions in GIS 6 (3). Fonseca, F., Egenhofer, M., Davis, C. & Câmara, G. (2002b): Semantic Granularity in Ontology-Driven Geographic Information Systems, Annals of Mathematics and Artificial Intelligence 36 (1-2): 121-151. Guarino, N., Masolo, C. & Vetere, G. (1999): OntoSeek: Content-Based Access to the Web, IEEE Intelligent Systems 14 (3): 70-80. Guha, R., McCool, R. & Miller, E. (2003): Semantic Search, 12th International Conference on World Wide Web: 700-709. Haarslev, V. & Möller, R. (2004): RACER User’s Guide and Reference Manual. Version 1.7.19, URL: http://www.cs.concordia.ca/~haarslev/racer/racer-manual-1-7-19.pdf. Last accessed: July 29, 2005. Halevy, A. Y. (2001): Answering Queries Using Views: A Survey, Very Large Data Bases 10 (4): 270-294. Heflin, J. & Hendler, J. (2000): Searching the Web with SHOE, Papers from the AAAI Workshop (In Artificial Intelligence for Web Search): 35-40. Hinton, J. (1996): GIS and Remote Sensing Integration for Environmental Applications, International Journal of Geographical Information Science 10: 877-890. ISO (2005): Geographic Information - Services. ISO 19119, International Organization for Standardization.


ISO/TC-211 (2003): Text for FDIS 19115 Geographic information - Metadata. Final Draft Version, International Organization for Standardization.

Klien, E. & Lutz, M. (2005): The Role of Spatial Relations in Automating the Semantic Annotation of Geodata, Conference on Spatial Information Theory (COSIT 2005).

Levy, A. Y. (1999): Combining Artificial Intelligence and Databases for Data Integration, in: Wooldridge, M. & M. M. Veloso (ed.): Artificial Intelligence Today: Recent Trends and Developments (LNCS 1600), Berlin, Springer: 249-268.

Levy, A. Y. (2000): Logic-Based Techniques in Data Integration, in: Minker, J. (ed.): Logic Based Artificial Intelligence, Dordrecht, NL, Kluwer: 575-595.

Levy, A. Y., Rajaraman, A. & Ordille, J. (1996): Querying heterogeneous information sources using source descriptions, 22nd VLDB Conference: 251-262.

Lin, K. & Ludäscher, B. (2003): A System for Semantic Integration of Geologic Maps via Ontologies, Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW).

Lutz, M. (2005a): Ontology-based Descriptions for Semantic Discovery and Composition of Geoprocessing Services, GeoInformatica (in press).

Lutz, M. (2005b): Ontology-Based Service Discovery in Spatial Data Infrastructures, in: Jones, C. & R. Purves (ed.): ACM Workshop on Geographic Information Retrieval (GIR'05).

Lutz, M. & Klien, E. (2006): Ontology-based Retrieval of Geographic Information, International Journal of Geographical Information Science (forthcoming).

Mädche, A., Staab, S., Stojanovic, N., Studer, R. & Sure, Y. (2001): SEAL - A Framework for Developing SEmantic portALs, 18th British National Conference on Databases (Lecture Notes in Computer Science): 1-22.

Mena, F., Kashyap, V., Illarramendi, A. & Sheth, A. (1998): Domain Specific Ontologies for Semantic Information Brokering on the Global Information Infrastructure, First International Conference on Formal Ontologies in Information Systems.

OGC (2002): Web Feature Service Implementation Specification, Version 1.0.0, Open GIS Consortium.

OGC (2004): Catalogue Services Specification, Version 2.0 (OGC Implementation Specification 04-021r2), Open Geospatial Consortium.

Pennington, D., Michener, W. K., Berkley, C., Higgins, D., Jones, M. B., Schildhauer, M., Bowers, S., Ludäscher, B. & Rajasekar, A. (2004): Building SEEK: The Science Environment for Ecological Knowledge (SEEK): A Distributed, Ontology-Driven Environment for Ecological Modeling and Analysis (Abstract), in: Egenhofer, M., C. Freksa & H. Miller (ed.): The Third Conference of Geographic Information Science (GIScience 2004).

Richardson, R. & Smeaton, A. F. (1995): Using WordNet in a Knowledge-based Approach to Information Retrieval (Technical Report CA0395), Dublin City University.

Sattler, U., Calvanese, D. & Molitor, R. (2003): Relationships with other Formalisms, in: Baader, F., D. Calvanese, D. McGuinness, D. Nardi & P. Patel-Schneider (ed.): The Description Logic Handbook. Theory, Implementation and Applications, Cambridge, Cambridge University Press: 142-183.



Sheth, A. P. (1999): Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics, in: Goodchild, M. F., M. Egenhofer, R. Fegeas & C. A. Kottman (ed.): Interoperating Geographic Information Systems, Dordrecht, NL, Kluwer: 5-30.

Sondheim, M., Gardels, K. & Buehler, K. [ed.] (1999): GIS Interoperability, New York, John Wiley & Sons (Geographic Information Systems 1, Principles and Technical Issues).

Visser, U. & Stuckenschmidt, H. (2002): Interoperability in GIS - Enabling Technologies, in: Ruiz, M., M. Gould & J. Ramon (ed.): 5th AGILE Conference on Geographic Information Science: 291-297.

Visser, U., Stuckenschmidt, H., Schlieder, C., Wache, H. & Timm, I. (2002a): Terminology Integration for the Management of distributed Information Resources, Künstliche Intelligenz 16 (1): 31-34.

Visser, U., Vögele, T. & Schlieder, C. (2002b): Spatio-Terminological Information Retrieval using the BUSTER System, in: Pillmann, W. & K. Tochtermann (ed.): Environmental Communication in the Information Society, 16th Conference on Informatics for Environmental Protection (EnviroInfo): 93-100.

Vögele, T., Hübner, S. & Schuster, G. (2003): BUSTER - An Information Broker for the Semantic Web, KI - Künstliche Intelligenz 03 (3): 31-34.

Vögele, T. & Spittel, R. (2004): Enhancing Spatial Data Infrastructures with Semantic Web Technologies, 7th Conference on Geographic Information Science (AGILE 2004).

Wache, H. (2003): Semantische Mediation für heterogene Informationsquellen, Berlin, Akademische Verlagsgesellschaft.

Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H. & Hübner, S. (2001): Ontology-Based Integration of Information – A Survey of Existing Approaches, IJCAI-01 Workshop: Ontologies and Information Sharing: 108-117.

1 see http://www.meanings.de/
2 taken from the hydrology example described in section 4.1.
3 Oolitic Limestone
4 Semantic Webs and Agents in Integrated Economies, see http://www.sewasie.org/
5 see http://harmonisa.uni-klu.ac.at





Advancement of Mobile Geoservices: Potential and Experiences Breunig M. (1), Bär W. (1), Thomsen A. (1), Häußler J. (2), Kipfer A. (2), Kandawasvika A. (3), Mäs S. (3), Reinhardt W. (3), Wang F. (3), Brand S. (4), Staub G. (4), Wiesel J. (4) (1) Research Centre for Geoinformatics and Remote Sensing, University of Osnabrück, Kolpingstr. 7, 49069 Osnabrück, Germany; E-Mail: Martin.Breunig@uni-osnabrueck.de (2) European Media Laboratory GmbH, Villa Bosch , Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany E-Mail: Jochen.Haeussler@eml.villa-bosch.de (3) GIS Lab, University of the Bundeswehr Munich, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany E-Mail: Wolfgang.Reinhardt@UniBW-Muenchen.de (4) Institute of Photogrammetry and Remote Sensing (IPF), University of Karlsruhe, Englerstr. 7, 76128 Karlsruhe, Germany, E-Mail: Joachim.Wiesel@ipf.uni-karlsruhe.de

1. Introduction Mobile information technology opens new perspectives and dimensions for the geosciences by providing experts in governmental and non-governmental authorities, industry and science with ubiquitous access to geoscientific information. With this new instrument, the digital acquisition, management, visualization, and analysis of geodata needed for the understanding of geoscientific processes and natural disasters can be supported directly in the field. The number of applications in which geoinformation systems (GIS) have to cooperate with distributed mobile applications and with suitable geodatabase management systems is increasing (Balovnev, Bode, Breunig, Cremers, Müller, Pogodaev et al., 2004). The current paradigm shift from the development of monolithic GIS to flexible, mobile accessible geoservices can be recognized in many application fields. New geoservices will provide ubiquitous access to geodata needed in applications such as environmental monitoring and disaster management. Client applications communicating with geoservices have to efficiently acquire, visualize and manage application-specific 2D and 3D objects and complex spatiotemporal models (Breunig, Cremers, Shumilov & Siebeck, 2003).


In this contribution a geoscientific case study dealing with the analysis of landslides shows the potential behind mobile geoservices. Contributions to a distributed software system (Breunig, Malaka, Reinhardt & Wiesel, 2003) consisting of geoservices used by on-site clients for geodata acquisition, viewing, augmented reality, and geodata management are presented. The clients communicate over a network with geodatabase services. Experiences are reported and, finally, conclusions and a short outlook are given which address further research in the field of mobile geoservices.

2. Objectives of the project The concrete problem we are referring to in this project is the analysis of landslides in an area near Balingen in south-west Germany (Ruch, 2002). For several years there have been active creeping movements of the terrain, which may endanger traffic and people using a nearby road. The geodetic measurements show a gradual sinking of the soil and rocks. A forecast for a slowing down or speeding up of the movements cannot be given. However, mobile data acquisition of the ongoing movements and remote data access to a central station help to monitor the situation. The movement measurements are done by extensometers located in some of the biggest clefts (see figure 1).


Figure 1: Clefts in the Balingen case study area with extensometer measurement units

If conspicuous extensions of a monitored cleft are registered, an alarm is triggered and the local road is closed immediately. In this case a geologist has to go to the area and decide if this was a false alarm or if the next landslide can be expected shortly. The main objective of the project is to show the potential of mobile geoservices prototypically for geoscientific applications like the Balingen landslide example. The available primary data of the Balingen examination area are fixed points with direction vectors, measurement plots of the extensometers, a digital elevation model, contour lines, the path network, structural edges and slopes at a scale of 1:250. From these primary data the following interpreted data are constructed: stratigraphic boundaries and 3D strata bodies. Typical requirements of the Balingen case study for geoservices are:
- Storage of 2.5D geodata (digital elevation model), measurement data, and 3D models.
- Retrieval of stored geodata and computed, deduced 2D profile sections.
- Online geodata acquisition and analysis of the terrain.
- Geodata editing of rocks and clefts in the terrain.
- Viewing of primary and interpreted data in the terrain.
- Overlapping of the 3D model with the physical reality by AR methods.

The Balingen case study is a well-suited example to demonstrate the use of modern geoservices supporting environmental monitoring and prediction in the geosciences.

3. Methods & results 3.1 Mobile acquisition of geodata For the mobile geodata acquisition in the given project environment of the Balingen case study, four main objectives have been investigated:



(a) Refinement of concepts for mobile acquisition of geodata. (b) Development of a prototype system. (c) Definition of a detailed concept for the quality assurance. (d) Proof of concepts – Application of the system to the Balingen test area.

3.1.1 Refinement of concepts for mobile data acquisition This point included in particular the following important research issues:
- The development of refined workflows for mobile acquisition of geodata, which make full use of ubiquitous access to various sources of information. This includes the selection of the servers and the download of the data usable for the current application, the feature acquisition, updates etc. The specific requirements of these workflows during mobile online data acquisition have been analysed and evaluated, and finally the application has been adjusted accordingly.
- Multi-sensor treatment: In the user scenario in the Balingen test area, different kinds of sensors like GPS receivers, total stations, extensometers and even laser scanning devices have to be considered. In the future, OGC standards like SensorWeb or SensorML will allow interoperable access to these sensors. Until the sensor manufacturers support these protocols, alternative options have to be considered. The use of common standardised protocols like NMEA for GPS receivers allows access to many sensors of that specific kind independently of a certain vendor (a minimal parsing sketch follows this list). This enables the application to access and control a maximum number of sensors.
- Technical issues like the connectivity via wireless techniques have been investigated. In rural and especially forested areas, cellular radio and WLAN have to be combined in order to fully cover an area of interest and to transfer the data to the server. Some of the experiences made are summarised in section
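To illustrate the vendor-independent access via NMEA mentioned above, the following minimal sketch extracts a position from a single NMEA GGA sentence. It is an illustration only, not part of the project's prototype; the sample sentence is a generic example and real receivers stream such sentences continuously.

public class NmeaGgaParser {

    /** Converts an NMEA coordinate (ddmm.mmmm / dddmm.mmmm) to decimal degrees. */
    private static double toDecimalDegrees(String value, String hemisphere) {
        double raw = Double.parseDouble(value);
        double degrees = Math.floor(raw / 100.0);
        double minutes = raw - degrees * 100.0;
        double result = degrees + minutes / 60.0;
        return ("S".equals(hemisphere) || "W".equals(hemisphere)) ? -result : result;
    }

    public static void main(String[] args) {
        String sentence = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47";
        String[] f = sentence.split(",");
        double lat = toDecimalDegrees(f[2], f[3]);
        double lon = toDecimalDegrees(f[4], f[5]);
        double altitude = Double.parseDouble(f[9]);  // metres above mean sea level
        System.out.printf("lat=%.5f lon=%.5f alt=%.1f m (fix quality %s, %s satellites)%n",
                lat, lon, altitude, f[6], f[7]);
    }
}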


3.1.2 Development of a prototype system A prototype for mobile acquisition of geodata has been developed. The most important guidelines for this development have been:
- Development of an open architecture based on standards, which means that no proprietary, vendor-dependent modules and interfaces have been included. Proprietary interfaces restrict in particular the connection to the disparate data services and the applied sensors. Hence, for the access to the heterogeneous distributed servers, standards like the OGC Web Map and Web Feature Services (WMS and WFS) and the Geography Markup Language (GML) have been employed. As mentioned in the previous section, the control of the sensors and the transfer of the measurement results should also be based on standardised interfaces (if possible).
- A generic approach to data acquisition has been developed which allows using the system in various applications. Therefore the client application has to be able to adjust itself to the requirements imposed by the data model. In particular, the measuring process and the templates for the input of further attributes must be flexible and adaptable. The standardised service interfaces mentioned above are self-contained and self-describing. The client application uses these features for the purposes of data access and acquisition. This means the client application downloads capabilities (supported operations and existing feature classes) and schema information of the server at runtime. Such an XML schema contains all necessary details about the modelled feature types, their geometry and associated attributes as well as the interrelations between features of one or different types. With the information contained in the XML schema, it is possible for the client application to adjust the acquisition process with regard to the required attributes, geometry types and relationships of a particular feature type and to guide the user through the whole data collection procedure. The templates for the input of attribute values are generated automatically at runtime and the process of measuring geometry elements is adjusted to the requirements of the feature type currently being measured. This assures that the collection of the data conforms to the schema provided by the particular server (Mäs, Reinhardt & Wang 2005a).


- The architecture of the client software allows for an easy extension and adaptation to the requirements of a particular application. Section 3.2 (Graphical geodata editor) includes further explanations regarding this.
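The schema-driven generation of input templates described above can be pictured with the following minimal sketch, which requests the schema of a feature type via the WFS DescribeFeatureType operation and lists its element declarations as candidate input fields. The endpoint URL, the feature type name and the flat handling of the schema are simplifying assumptions for this sketch.

import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SchemaDrivenTemplate {
    public static void main(String[] args) throws Exception {
        // Hypothetical WFS endpoint and feature type name, chosen for this sketch only.
        String request = "http://example.org/wfs?SERVICE=WFS&VERSION=1.0.0"
                + "&REQUEST=DescribeFeatureType&TYPENAME=Cleft";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document schema = dbf.newDocumentBuilder().parse(new URL(request).openStream());

        // Each xsd:element declaration becomes one entry of the attribute input template.
        NodeList elements = schema.getElementsByTagNameNS(
                "http://www.w3.org/2001/XMLSchema", "element");
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            System.out.println("input field: " + e.getAttribute("name")
                    + " (" + e.getAttribute("type") + ")");
        }
    }
}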

3.1.3 Quality assurance concept The mobile interoperable access to heterogeneous geodatabases and their update from the field has far-reaching consequences for the data acquisition process. As mentioned before, this approach provides the possibility to check the newly acquired data in terms of quality and reliability directly in the field, which makes quality management investigations necessary. In our work, specific focus was placed on finding a way to define integrity constraints and transfer them to the client in a standardized way, as additional information to the XML schema available through the WFS. These constraints allow for related automatic checks during data collection in the field.

Therefore it has been investigated how spatial and other constraints can be formalised in SWRL (Semantic Web Rule Language, W3C 2004b), which is a combination of OWL (Web Ontology Language, W3C 2004a) and RuleML (Rule Markup Language). The defined constraints can be applied, for example:
- on spatial relations between objects of the same or of different classes,
- on a single or on numerous attribute values,
- on a defined relation between two attribute values of one object,
- or on a combination of spatial relations and attribute values of different objects.
The rules are not restricted to relating only two object classes or attributes. Even complex spatial and topological relations between numerous spatial objects, together with their attribute values, can be described. A simple example of a quality constraint for geospatial data is given in figure 2. In natural language the meaning of this rule is: »a clearing is always within a forest«. The two atoms in the antecedent define variables for each one of the object classes. In the consequent these variables are used to set the object classes in relation.

Figure 2: Example quality constraint encoded in SWRL.



Figure 3: System configuration for the Balingen test area

Therefore the »Within« relation is employed. The denotation and the definition of such spatial relations refer to the spatial operators defined in the OGC Filter Encoding Implementation Specification (OGC 2001). More details regarding the constraint formalisation in SWRL and the quality assurance concept can be found in Mäs et al. (2005b). 3.1.4 Proof of concepts – application of the system to the »Balingen test area« As mentioned before, the proposed mobile client system should support the decision-making process of the geologist in case of an alarm. The concepts and prototype implementations have been validated in the Balingen test area. For this purpose a data model has been defined in cooperation with the users and a WFS server has been set up. This data model and the alarm scenario are described in more detail in Kandawasvika et al. (2004). Figure 3 shows the system configuration for the field tests. The central component of the configuration has been the »Geotech in-field server«, which is normally installed in a car at a point where a connection to the central geodata warehouse via GSM / GPRS / UMTS is possible.


This connection might not be necessary in every case. Sometimes it is better to have the database and the service running directly on the in-field server, depending on the data volume and the available bandwidth / transfer rate. With the in-field server and a locally installed WLAN it is possible to support several mobile users at the same time. Mobile units are preferably tablet PCs, because of their capacity and performance, but the client application should also support other devices. For the data collection, GPS and a total station have been used as measurement devices. In practical tests we found that the whole area of around 200 × 150 m² can be covered by using only two WLAN access points (high-end APs and antennas). Please note that our examination area is very steep (50 m height difference), undulating terrain covered by tall trees. The user is able to move in the whole area, e.g. with a tablet PC, while always being connected to the geodata warehouse (via the in-field server).


The performed field tests verified the practical advantage of the developed concepts for the geologist's field tasks. The support for the geologists included:
- Online request and visualisation of available existing data
- Positioning in the map
- Possibilities to analyse data and do inspection measurements
- Validation of the alarm
- Acquisition of new features like e.g. ditches or gaps
- Quality assurance of these data
The quality assurance process not only helps to acquire data conforming to the server data model, but also supports the decision-making process and helps to identify dangerous situations that are not always obvious to see. For example, a constraint prohibiting a publicly accessible way from being within a certain distance of a ditch would lead to an automatic warning to the geologist while measuring such a newly formed ditch.

It is then up to him to decide if and how to react: he might close the way for public access or at least mark the danger with some warning signs. In any case, for traceability he has to document his decision in the system.
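A check like the ditch example above could be evaluated on the client with a standard topology library. The following minimal sketch uses the JTS Topology Suite (package names from the pre-LocationTech releases); the geometries and the distance threshold are invented for illustration and do not stem from the project.

import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.io.WKTReader;

public class DitchDistanceCheck {
    public static void main(String[] args) throws Exception {
        WKTReader reader = new WKTReader();
        // Illustrative geometries in a local metric coordinate system (not survey data).
        Geometry way   = reader.read("LINESTRING (0 0, 120 5, 240 10)");
        Geometry ditch = reader.read("LINESTRING (100 12, 140 14)");

        double minDistance = 10.0; // assumed threshold from the constraint, in metres
        if (way.isWithinDistance(ditch, minDistance)) {
            // In the field client this check would raise a warning for the geologist.
            System.out.println("Constraint violated: way is within "
                    + minDistance + " m of the newly measured ditch.");
        }
    }
}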

3.2 Graphical geodata editor A central component of the mobile acquisition system is the graphical editor for geodata (see figure 4). It is implemented as a lightweight Java application running on the mobile device, e.g. a ruggedized tablet PC. The editor constitutes the user interface of the mobile data acquisition system and provides the core functionality for acquiring and editing geodata in the field. The central element of the editor GUI is a map which displays the geodata received from the server. The usual tools for navigating the map (pan, zoom), getting information about features and editing their attribute data and geometries are being implemented. A straightforward possibility for the visualization of the GML feature collections received from a WFS is the transformation to SVG using XSL transformations (XSLT).

Figure 4: Architecture of the mobile acquisition system



In this context, we have investigated whether and how SVG can be used to visualize and store geodata within a Java SVG implementation. As the SVG implementation, we have chosen the open-source Apache Batik framework. This framework supports SVG rendering and access to the SVG DOM from Java applications. We investigated how to manage the geodata with all its non-geometric attributes on the client (in memory). It is not desirable to store the data redundantly, e.g. as an SVG DOM and as objects of a GIS library in parallel. So, we need to hold all spatial as well as non-spatial attributes of the GML features in the SVG representation. While the GML geometries can be transformed to SVG geometries, the non-spatial attributes of the features can be stored in »svg:metadata« elements. This is a standardized mechanism to embed arbitrary metadata within SVG documents. It is also possible to insert elements of other namespaces into an SVG document. The Java SVG binding provides possibilities to address particular nodes in the SVG DOM directly. This allows manipulating SVG subtrees representing single geographic features. We realized a simple way to keep all the geodata in one SVG DOM, including a change history (during one editing session) for each feature attribute. When geodata is represented as SVG, it is necessary to transform from GML to SVG and also back from SVG to GML in order to write edited data back to the server. In order to develop a generic application we had to investigate the general possibilities and restrictions for bidirectional XSL transformations between GML and SVG. We used Styled Layer Descriptor (SLD) documents for the definition of the visual attributes of the different feature types. In Merdes, Häußler & Zipf (2005) it is shown that it is possible to build a generic application, without the necessity for application developers to write any application- or domain-specific code, by using the self-describing mechanisms of the WFS services and SLD as the styling definition.
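The storage of non-spatial attributes in »svg:metadata« elements can be pictured with the following minimal DOM sketch. It only illustrates the mechanism described above; the feature id, the attribute name and the namespace are invented for this example and are not those of the project's data model.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SvgMetadataSketch {
    private static final String SVG_NS = "http://www.w3.org/2000/svg";

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element svg = doc.createElementNS(SVG_NS, "svg");
        doc.appendChild(svg);

        // Geometry of one feature, transformed from GML to an SVG path (coordinates illustrative).
        Element path = doc.createElementNS(SVG_NS, "path");
        path.setAttribute("id", "cleft.17");
        path.setAttribute("d", "M 10 10 L 40 25 L 90 30");
        svg.appendChild(path);

        // Non-spatial GML attributes kept in an svg:metadata child, here in a made-up namespace.
        Element metadata = doc.createElementNS(SVG_NS, "metadata");
        Element width = doc.createElementNS("http://example.org/geotech", "gt:width");
        width.setTextContent("0.35");
        metadata.appendChild(width);
        path.appendChild(metadata);
    }
}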


To integrate additional functionality for more specific application scenarios we have developed an architecture and runtime engine for plugins. This way, parts of the additional functionality can be integrated into the core editor while keeping the actual editor component thin. Thus the system can be adapted to the application scenario, which also fosters the desired reusability of the software application. The SVG representation of the geodata inside the core editor is transparent to the plugins. One group of plugins is »position sources«. As position sources we denote plugins that provide geo-positions with semantics well known to the user. Examples of position sources are GPS and total stations. A position source plugin encapsulates e.g. a single GPS device and provides its measurements to the editor environment, together with additional information like timestamp, precision etc. With such a plugin the current position can be displayed on the map. Several position sources can be connected simultaneously. The plugin infrastructure makes it possible for all plugins to connect to all registered position sources at any time, making the editor a very flexible platform for additional and more advanced functionality (a minimal interface sketch follows the list below). A second group of plugins connects devices which do not function as position sources, e.g. other measuring devices which deliver measurements for non-geometric attributes of new or existing features (temperature, soil parameters, precipitation measurements, etc.). Triggering a single measurement or a series of measurements at certain spatial or temporal intervals and inserting the respective located measuring point(s) into the database are possible that way. A third group of plugins are those that do not link hardware devices to the editor, but provide other kinds of functionality. Examples of such plugins include:
- A feature acquisition plugin which generates and adds new features using the position sources as input for the new geometries.
- A plugin for quality assurance that controls the correctness of the edited features by performing topological tests.


- A 2D profile section plugin: The user defines a planar profile section in the map of the editor and gets a 2D profile generated by the 3Dto2D service described in section 3.4.1.
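The contract between the editor and a position source plugin can be sketched as a small Java interface. The names and fields below are illustrative assumptions and not the actual API of the editor framework.

public interface PositionSource {

    /** A single georeferenced measurement delivered by the plugin. */
    final class Position {
        public final double x, y, z;      // coordinates in the working reference system
        public final long timestamp;      // acquisition time in milliseconds
        public final double precision;    // estimated precision in metres
        public Position(double x, double y, double z, long timestamp, double precision) {
            this.x = x; this.y = y; this.z = z;
            this.timestamp = timestamp; this.precision = precision;
        }
    }

    /** Callback through which the editor (or other plugins) receive measurements. */
    interface Listener {
        void onPosition(Position position);
    }

    String getName();                     // e.g. "GPS receiver", "total station"
    void addListener(Listener listener);  // any plugin may subscribe at any time
    void start();                         // open the device connection and begin streaming
    void stop();
}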

For the described Balingen case study there are several ways of supporting the geologist with such a system. As there is online access to the geodata, the geologist does not have to go to the office (which in our case is hundreds of kilometres away) to consult the latest data. In the case of a false alarm, this makes it possible to bring the endangered road back into service very quickly. Furthermore, observations of the geologist can be added to the geodatabase server directly in the field and are therefore immediately accessible by other specialists. The quality control plugin could largely make post-processing of the data superfluous, which is again important due to the large distance between the monitored area and the office. Decision making is assisted by the availability of calculation-intensive services like the mentioned 3Dto2D service, which is important for (in-field) interpretation of the acquired data.

The described infrastructure provides a flexible and extensible solution for a mobile, open standards-based geodata editor.

3.3 Augmented reality client A mobile AR prototype system (see figures 5 & 6) has been designed and developed (Wiesel, Staub, Brand & Coelho, 2004) to support geoscientists in the field. The system is based on an IEEE1394 camera and a monoscopic Head Mount Display (HMD), hardware for navigation, an inertial measurement unit (IMU) and the necessary computing equipment mounted on a backpack.

The system has been designed to allow a human to move around in the test area and analyse the geological structures and landslides by
- Inspecting the scene.
- Overlaying the terrain with time-stamped 3D geodatabase content (e.g. profiles or displacement vectors, geological data).
- Gathering new geodata (e.g. new clefts or rifts).
- Entering and editing geodata in real time into the geodatabase.
- Entering attribute data into the database.
- Writing reports about current situations in the field.

Figure 5: Proposal for an Augmented Reality System Architecture and Hardware Mockup



Figure 6: Testing the proposed ARS outdoors

Navigation and orientation of the sensor system (based either on a camera or a head-mounted display) is crucial for the usability of such mobile AR clients. We have combined a GPS receiver and a low-cost IMU to achieve a positioning precision in the cm range. By using Real Time Kinematic (RTK) GPS positioning we can achieve a positioning precision down to ±1 cm. Yet, in a typical geoscientific application, we have to deal with GPS dropouts while moving around in the field. To overcome this problem we are using an IMU mounted on top of the system, which can provide velocity, position and attitude of the HMD and camera for a short time period. Ongoing studies (Staub, Coelho & Leebmann, 2004) calibrate and filter the sensor readings to bridge the gaps caused by satellite signal outages. Furthermore, we use a so-called wrist-worn keyboard to interact with the system. Many important features have been implemented so far and can be triggered by predefined keyboard shortcuts.


For example, it is possible to change the transparency and line width of the virtual objects or the lighting conditions of the virtual scene, as well as to load or remove objects. It is also possible to zoom into the scene and to pan and rotate the virtual objects. The feedback signal sent by the ARS after receiving such a command is either visual or acoustic, depending on the action performed by the user. The human-computer interface had to be designed to be straightforward, without occluding the area of interest in the »real world«. Therefore, a transparent interface with minimal contents and alternative controls on demand is proposed. It consists of permanent output of the user’s position (Gauß-Krüger coordinates and ellipsoidal height) and orientation, which is shown in the upper-most position of the display. An overview window is placed at the lower right corner of the display, which can be removed if it is occluding some important objects. These three components combine all the necessary positioning and orientation information to give the user knowledge about his (or her) location in the field. In the centre of the field of view a crosshair is displayed. This is used for capturing additional information from virtual and real objects.


This is a useful feature of the ARS, because it offers the possibility to receive information about the objects in real time. Non-visible information is gathered from the artificial objects. To achieve a realistic impression of the superimposed scene, it is important to provide a smooth transition between virtual and real objects. Depth information is needed to fit virtual objects into the environment, which may occlude parts of the virtual scene. In urban environments, the developed ARS uses additional building models to retrieve depth information and to compute occlusion (Coelho, 2004). In the context described in this article, the user has to operate in a forest. No information on the location and size of the trees, which are the main source of occluding objects, is available.

To operate in such an environment, a head-mounted stereo camera system is used to obtain the necessary depth information on the fly with a dense two-frame stereo algorithm. In the Balingen test area, newly discovered clefts or rifts have to be surveyed by a geoscientist. Therefore, Leebmann (2005) proposes a methodology to gather such information from a distance. It is necessary to survey the object of interest from a minimum of two different points of view. This way it is possible to calculate the Gauß-Krüger coordinates. Figure 7 shows the approach, tested by surveying an edge, and the augmented view of it after calculating its position in the field.

Figure 7: Surveying an edge in the terrain



3.4 3D geodatabase services In order to provide geoservices accessible by arbitrary mobile clients, a 3D geodatabase system should provide an open service architecture giving access to the whole functionality of the underlying geodatabase, while ensuring the communication with the mobile clients based on interoperable open protocols. In the prototype of our 3D geodatabase system the service framework is provided by a rich set of single services implemented as remote method calls in Java. The service framework supports the combination of the single services into so-called service chains – which are then capable of providing complex processing capabilities inside the database system. The data transfer between services and clients is primarily based on the Extensible Markup Language (XML) with an associated XML schema. The output services currently cover a specialised XML format, VRML and X3D, and are extensible by the user through XSL transformations to arbitrary XML and text-based formats. To support applications like the Balingen landslide scenario, the geodatabase services must provide access to and update capabilities for entire 3D models and related geometric and thematic data from mobile devices in the field. The following sections give two examples from our 3D geodatabase system which are meant to address these capabilities. The first example describes the support of constrained mobile devices (PDAs), which are not yet capable of working with complex 3D models, through a special application service. The second example explains our ongoing research on supporting update capabilities on mobile devices through the use of mobile databases integrated with the server geodatabase system.

3.4.1 Supporting constrained mobile devices A comprehensive subsurface model may consist of hundreds of geological bodies, each represented by complex objects, e.g. triangulated surfaces or volumes, composed of up to more than a hundred thousand elements (e.g. triangles or tetrahedra).


Considering constrained clients, e.g. PDAs combined with a GPS, both the transmission and the graphical representation of such a complex model are not yet realistic, because of insufficient available bandwidth and performance of the graphical display. On the other hand, the geoscientist in the field often needs only a selected part of the information, specified by e.g. a 3D region, a stratigraphic interval, a set of thematic attributes or some other geometric and thematic criteria. Even such reduced information may be too large for use in the field, motivating the use of techniques of data reduction and progressive transmission (Shumilov, Thomsen, Cremers & Koos, 2002). Therefore – due to today’s hardware restrictions on PDAs – the graphical representation of a 3D model could be reduced to a sequence of 2D sections and projections. By sliding through successive sections, even a 2D display can provide insight into the form and structure of a complex 3D body. However, this means that services have to be provided that compute 2D profile sections for arbitrary planes of a 3D subsurface model. Such a service allows the field geologist to compare the actually observed situation with the information provided by the subsurface model, and to take decisions on sampling accordingly. As an example, we present such a service, the so-called 3Dto2D service. It provides the derivation of 2D geological profiles from a 3D subsurface model for a specified arbitrary plane in 3D space. Additionally, further objects spatially located within a specified distance of the plane, which are of interest for interpretation, can be projected onto the computed 2D profile. The service is composed of the following single services provided by our service framework (see figure 8):
- RetrieveService – supports queries on complex geoscientific 3D models.
- PlaneCut – cuts a planar profile through the 3D model for a spatially specified arbitrary 3D plane.


Figure 8: 3Dto2D service

- PlaneProjection – projects interesting 3D objects, which are spatially located within a specified distance of interest to the 3D plane, onto the profile plane.
- AffineTransform – transforms the resulting 3D objects into a 2D xy plane.
Figure 8 shows the principal steps. The user may specify a planar profile section between endpoints A and B, with further data such as spatially neighbouring boreholes b1 and b2. Figure 8 (a) shows the location in map plane view. The block view of the 3D model is given in figure 8 (b), and figure 8 (c) shows the view of the profile section with part of the model removed. Finally, figure 8 (d) shows the resulting 2D profile section with the projected borehole profiles as additional information. Each of the single services of the 3Dto2D service implies geometric operations requiring a considerable amount of time (Breunig, Bär & Thomsen, 2004). Therefore, in order to reduce the length of transactions, the single services each operate in transactional mode. A failure of one service can be compensated by restarting this single service and does not require starting the whole service chain from the beginning.
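How the single services named above might be composed into the 3Dto2D chain is sketched below. The interfaces, method signatures and the marker type are illustrative assumptions for this sketch, not the actual API of the service framework; transaction handling is omitted.

import java.util.ArrayList;
import java.util.List;

interface GeoObject { }
interface RetrieveService   { List<GeoObject> retrieve(String modelId, String query); }
interface PlaneCutService   { List<GeoObject> cut(List<GeoObject> model, double[] plane); }
interface ProjectionService { List<GeoObject> project(List<GeoObject> candidates, double[] plane, double maxDistance); }
interface TransformService  { List<GeoObject> toProfilePlane(List<GeoObject> objects, double[] plane); }

public class ProfileSectionChain {
    private final RetrieveService retrieve;
    private final PlaneCutService planeCut;
    private final ProjectionService projection;
    private final TransformService transform;

    public ProfileSectionChain(RetrieveService r, PlaneCutService c,
                               ProjectionService p, TransformService t) {
        this.retrieve = r; this.planeCut = c; this.projection = p; this.transform = t;
    }

    /** Runs the chain: retrieve the model, cut the profile, project nearby objects, map to 2D. */
    public List<GeoObject> profileSection(String modelId, double[] plane, double maxDistance) {
        List<GeoObject> model = retrieve.retrieve(modelId, "all strata");
        List<GeoObject> profile = new ArrayList<>(planeCut.cut(model, plane));
        profile.addAll(projection.project(model, plane, maxDistance)); // e.g. neighbouring boreholes
        return transform.toProfilePlane(profile, plane);
    }
}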

3.4.2 Supporting update operations with detached mobile databases A 3D geodatabase system for geological applications should enable the geologists in the field, as well as in the laboratory, to refer to a shared common 3D model during the process

of data caption, processing, interpretation and assessment. The cycle of steps involved in updating a geological model can be rather long and the result may never be free of subjective appreciation. Therefore it is advisable to use strategies of version management to control the evolution of the 3D model rather than supporting direct editing by transaction management. In the following we will give an overview of how we address update capabilities in the field using mobile databases (Bär & Breunig, 2005). The approach is based on a version management extension of our 3D geodatabase server. The mobile database is regarded as a special client to this version management system and therefore updates on local 3D objects during offline mode are integrated back to the 3D geodatabase system as new revisions of the previously replicated 3D object. This approach makes it possible to review changes in 3D objects or to complete 3D models before they are merged to the original 3D model of the database system. Therefore our 3D geodatabase system has been extended with version management capabilities. The generic version management extension is motivated by the object-oriented version model of Schönhoff (2002) and provides the management of history graphs of versions and a hierarchy of workspaces as version repositories. For an overview of version models we refer to Katz (1990). Figure 9 gives an overview of the general structure of the version management system as



Figure 9: Structure of the version management system

The existing 3D database system is called the release database, in which the releasable database objects reside. To modify a database object from the release database, it has to be put under version control first. This operation creates a new design object for the database object and the initial version in the main workspace. Starting from this initial version, new modifications result in a new version of the design object. Inside a workspace there exist only revisions of versions (a linear history). Modifications to versions which are meant to provide an alternative representation of an object are called alternative versions. Such alternative versions have to be created in a new child workspace with their own revision history. Propagating a later revision of an alternative version back to the parent workspace is called merging. To execute a merge, no conflicts with the latest revision in the parent workspace are allowed. Otherwise a conflict resolution must be done beforehand by the user. This way mature versions can be propagated up the workspace hierarchy and finally replace the original database object in the release database.


Besides the concepts of design objects and their versions, the concept of configurations has to be supported in the version management system. Configurations allow grouping specific versions of several design objects. With configurations, the notion of 3D models from geology as a consistent set of several design objects can be realised. Therefore they must be extensible to allow enforcing constraints on the added versions, such as »no geometric intersection between the objects represented by the added versions is allowed«. Furthermore, configurations provide a way for batch propagation of a consolidated set of versions between the workspaces. The version management presented so far does not force a specific representation of versions in the system. As the complex 3D objects used in geo applications can internally consist of up to several hundred thousand simplexes, the storage of a complete object for each version is impracticable. Therefore, we represent a version as the set of changes to its revision or alternative predecessor in the version history (delta storage). Besides the reduction of


used storage, this approach enables efficient algorithms for conflict detection and also for providing change histories. Having used simplicial complexes as the underlying data model of our 3D objects, the changes are represented inside the version management system as additions / deletions of single simplexes or complete components and the associated thematic changes. Conversion operations between the version representation and the database object representation ensure that all the operations from the geodatabase system can also be applied to the versions of 3D objects. This makes it possible, for example, to create profile sections with the described 3Dto2DService from different versions of a 3D object and therefore to compare differences also on constrained devices in the 2D space. The version management extension is integrated with the service framework of the geodatabase system. The communication between the mobile databases and the version management is based on the XML representation of 3D objects or change sets between versions. Although the version management system was designed with the support of detached mobile databases (offline usage mode) in mind, the integration with the service framework also enables every mobile or static client to use the version management capabilities provided.
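A minimal sketch of the delta-storage idea, assuming a simplex is identified by its vertex ids; the change-set structure and the conflict test are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Simplex = Tuple[int, ...]                     # vertex ids of one simplex

@dataclass(frozen=True)
class Delta:
    added: FrozenSet[Simplex]
    removed: FrozenSet[Simplex]

def apply_delta(simplexes: Set[Simplex], delta: Delta) -> Set[Simplex]:
    # rebuild a version by replaying its delta on the predecessor
    return (simplexes - set(delta.removed)) | set(delta.added)

def conflicts(a: Delta, b: Delta) -> Set[Simplex]:
    # two independent deltas conflict where they touch the same simplexes
    return (set(a.added) | set(a.removed)) & (set(b.added) | set(b.removed))

base = {(0, 1, 2), (1, 2, 3)}
remesh = Delta(frozenset({(1, 2, 4), (2, 3, 4)}), frozenset({(1, 2, 3)}))   # field update
cleanup = Delta(frozenset(), frozenset({(1, 2, 3)}))                        # lab update
print(apply_delta(base, remesh))      # {(0, 1, 2), (1, 2, 4), (2, 3, 4)}
print(conflicts(remesh, cleanup))     # {(1, 2, 3)} -> must be resolved before merging
```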

3.4.3 The 4D-Extension: Managing spatial objects varying with time Landslides obviously involve changes of location and form of spatially extended objects depending on time. The modelling of displacements and deformations can be done by numerical models, or by a scientist designing a sufficient number of discrete states of the model at different time instants, based on observations and measurements. The task of the geodatabase is to manage the resulting time-dependent spatial objects (4D-objects) and to provide services that allow the state of a spatial object at any given time of its lifespan to be retrieved by appropriate searching and interpolation methods. The 4D-extension of the

geodatabase is based on earlier experiences with the timescene tree (Polthier & Rumpf, 1995), with the GeoToolKit (Balovnev et al., 2004), and on concepts presented by Worboys (1995). Rolfs (2005) presents a detailed discussion of the spatiotemporal extension and its implementation, as well as more extensive references. A time-dependent spatial object is considered as a function defined on a time interval, with values in a set of spatial 3D-objects. This implies that, in addition to the spatial model discussed in previous chapters, a model and a discretisation of time are required. It consists of time instants t and time intervals (t_i, t_i+1) that are concatenated to form time sequences. A number of temporal operations support set operations and predicates, especially to determine intersections. Searching is supported by a temporal index based on Bentley’s segment tree, cf. (de Berg et al., 2000). The temporal behaviour of objects is defined in an interface that is inherited by all temporal and spatiotemporal classes. The central questions concern the discretisation of time and the necessary interpolation between discrete states of the object, as well as changes of topology, i.e. of meshing and of connectivity. As the static 3D-objects of a geological model may already comprise meshes of considerable size (up to several 100000 elements), a simple repetition of slightly changed copies at each time step may result in intolerably big and redundant 4D-objects. Therefore, attempts are made to reduce redundancy in parts of 4D-objects that are either static or show only very small changes over time or changes that depend linearly on time, by allowing for different densities of discretisation in different parts. Whereas the geometry (location, extent and form) of a 4D-object may vary continuously or by steps, its topology (meshing, connectivity) can only change in discrete steps. Moreover, it seems reasonable to assume that continuous deformations and displacements take place more



frequently than disruptions or re-meshing because of extreme deformations or changes of size. The time-dependent geometry therefore assumes that a number of contiguous discretisation intervals, in which continuous displacements and deformations occur without change of topology, can be grouped together into larger time intervals, at the boundaries of which meshing and connectivity change. A 4D-object is composed of spatiotemporal (ST-) elements of two kinds: one is defined as a pair (t, s^d(t)) with a time instant t and a d-simplex s^d(t); the other one is defined as a tuple (t_i, t_i+1, s^d(t_i), s^d(t_i+1), f) – it consists of an open time interval ]t_i, t_i+1[, a pair of spatial d-simplexes s^d(t_i), s^d(t_i+1) defined at the interval boundaries, and an interpolation function f that, for any t in the open interval ]t_i, t_i+1[, yields a snapshot f(t) = (t, s^d(t)). The present geodatabase supports linear interpolation of vertex co-ordinates, but the approach can be generalised to more elaborate interpolation methods. Considered as a 4D-geometry object, such an ST-element resembles a deformed prism (figure 10).

Figure 10: A spatiotemporal (ST-) element

The ST-elements are grouped into a number of spatiotemporal (ST-) components, each with a common discretisation of time and a constant and connected mesh. Between spatially neighbouring ST-components, the time discretisation may vary, and at the contact of subsequent ST-components in time, meshing and connectivity may change (figure 11). Different discretisations in space or time may cause inconsistencies at the contact of ST-components. These might be avoided by carefully designing the 3D-objects, or by the user imposing appropriate constraints. In a simple case, an ST-object may consist of a single ST-component, with a common time discretisation and no discontinuities.


Besides methods for the loading and checking of ST-objects, for intersection with 4D search boxes, numerical functions etc., two main operations supported by an ST-object O^d are to be mentioned:
1. The calculation, for any given time t within its interval of definition, of its 3D-snapshot S^d(t), yielding an ST-object which is a d-simplicial complex in R^3 with an additional time stamp and which can be subject to any spatial operation defined in the 3D-geodatabase.
2. The intersection with a 4D query box, resulting in a new ST-object defined by the intersection with the query box.
In principle, a combined spatiotemporal index can be defined by extending the well-known R-tree to four dimensions. In the present model, however, separate indexes for time – a segment tree – and for space – an R-tree – are used, thus keeping in line with the static 3D-model (Rolfs, 2005).
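The snapshot operation on an ST-element of the second kind can be sketched as follows, assuming the linear interpolation of vertex coordinates stated above; the types and names are illustrative, not the system's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]

@dataclass
class STElement:
    t0: float
    t1: float
    verts0: List[Point]            # d-simplex vertices at t0
    verts1: List[Point]            # corresponding vertices at t1

    def snapshot(self, t: float) -> List[Point]:
        """Linearly interpolated vertex positions for t0 <= t <= t1."""
        if not self.t0 <= t <= self.t1:
            raise ValueError("t outside the element's time interval")
        w = (t - self.t0) / (self.t1 - self.t0)
        return [tuple(a + w * (b - a) for a, b in zip(p0, p1))
                for p0, p1 in zip(self.verts0, self.verts1)]

elem = STElement(0.0, 10.0,
                 [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
                 [(0, 0, -2), (1, 0, -2), (0, 1, -1)])   # a slowly subsiding triangle
print(elem.snapshot(5.0))                                 # 3D snapshot halfway through
```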


Figure 11: A spatiotemporal (ST-) object composed of 3 ST-components with different time discretisation

4. Conclusions This contribution reported on typical geoscientific requirements for new geoservices. In a case study dealing with landslides at Balingen, south-west Germany, it has been shown that the mobile acquisition, visualization and management of spatial data can simplify geoscientific work by digitally supporting geoscientists directly in the field. Contributions of the project partners to a prototype of a distributed software system of geoclients and services usable by mobile geoscientific applications were discussed. A mobile graphical editor for geodata acquisition, a mobile AR client and geodatabase services were presented in detail. We are optimistic that in the future, the merging of the 3D database content with the live scene in real time, executed by AR methods, will help the geoscientific expert in the field to examine geological subsurface structures efficiently and to compare them with visible fault lines at the surface. The 3Dto2D geodatabase service, for example, meets the introduced geoscientific requirements by remotely computing 2D profile sections from a 3D subsurface model and by visualizing the database query results on the mobile client. For the future we see research demands in integrating single

geoservices into geodata infrastructures and in developing new mobile data acquisition and visualization tools coupled by efficient geodatabase services.

Acknowledgements The funding of the research project »Advancement of Geoservices« (Weiterentwicklung von Geodiensten) by the German Ministry of Education and Research (BMBF) under grant no. 03F0373B et al. within the framework of the geotechnology initiative (http://www.geotechnologien.de) is gratefully acknowledged. The responsibility for the contents of this publication lies with the authors. We also thank Dr. Ruch from the LGRB Baden-Württemberg for providing an interesting application and data for the Balingen test example. An earlier version of this contribution has been published as GEOTECH-126.



References Agarwal, S., Arun, G., Chatterjee, R., Speckhard, B. & Vasudevan, R. (2003) Long transactions in an RDBMS. 7th Annual Conference and Exhibition: A Geospatial Odyssey, Geospatial Information and Technology Association (GITA). Balovnev, O., Bode, T., Breunig, M., Cremers, A.B., Müller, W., Pogodaev, G., Shumilov, S., Siebeck, J., Siehl, A. & Thomsen, A. (2004) The story of the GeoToolKit – An ObjectOriented Geodatabase Kernel System. GeoInformatica, 8 (1), 5-47. Batty, P.M. (2002) Version management revisited. Proc. of GITA Annual Conference, Florida. Bär, W. & Breunig, M. (2005) Usage of Mobile Databases for Mobile Geoscientific Applications. Accepted for publication in AGILE 2005 Proceedings, 8th AGILE Conference on Geographic Information Science, Estoril, Portugal, 9 p. Bernard, G., Ben-Othman, J., Bouganim, L., Canals, G., Chabridon, S., Defude, B., Ferrie, J. Gaucarski, S., Guerraoui, R., Molli, P., Pucheral, Ph., Roncancio, C., Serrano-Alvarado, P. & Valduriez, P. (2004) Mobile Databases: a Selection of Open Issues and Research Directions. ACM SIGMOD Record, Vol. 33, No. 2, June, 6 p. Breunig, M. (2001) On the Way to Component-Based 3D/4D Geoinformation Systems. Lecture Notes in Earth Sciences, No. 94, Springer, 199 p. Breunig, M., Cremers, A.B., Shumilov, S. & Siebeck, J. (2003) Spatio-Temporal Database Support for Long-Period Scientific Data. Data Science Journal, International Council for Science, Vol. 2, 175-191. Breunig, M., Malaka, R., Reinhardt, W. & Wiesel, J. (2003) Advancement of Geoservices. Geotechnologien Science Report No. 2, Information Systems in Earth Management, Potsdam, 37-50.


Breunig, M., Türker, C., Böhlen, H., Dieker, S., Güting, R.H., Jensen, C.S., Relly, L., Rigaux, P., Schek H.J. & Scholl M. (2003) Architecture and Implementation of Spatio-Temporal DBMS. Spatio-Temporal Databases – The CHOROCHRONOS Approach, Lecture Notes in Computer Science Vol. 2520, Springer, 219-264. Breunig, M., Bär, W. & Thomsen, A. (2004) Usage of Spatial Data Stores for Geo-Services. Proceeding 7th AGILE Conference on Geographic Information Science, Heraklion, Greece, 687-696. Brinkhoff, T. (1999) Requirements of Traffic Telematics to Spatial Databases. Proceedings of the 6th Intern. Symposium on Large Spatial Databases, Hong Kong, China. In: LNCS, Vol. 1651, 365-369. Coelho, A. H. (2004) Erweiterte Realität zur Visualisierung simulierter Hochwasserereignisse. Phd thesis, Karlsruhe, Univ., Fak. Für Bauingenieur-, Geo- und Umweltwissenschaften. Dahne, P. & Karigiannis, J. N. (2002) Archeoguide: system architecture of a mobile outdoor augmented reality system. Proceedings of the International Symposium on Mixed and Augmented Reality, ISMAR 2002, 263-264. de Berg, M. et al. (2000): Computational geometry – algorithms and applications, Springer. Feiner, S., MacIntyre, B., Höllerer, T. & Webster, T. (1997) A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Proceedings of the 1st Int. Symposium on Wearable Computers, ISWC97, October 13-14, 1997, Cambridge, 208-217. Gollmick, Ch. (2003) Client-Oriented Replication in Mobile Database Environments. Jenaer Schriften zur Mathematik und Informatik, Math/Inf/08/03, University of Jena, Germany.


Güting, R. H., Bohlen, M. H., Erwig, M., Jensen, C. S., Lorentzos, N. A., Schneider, M. & Vazirgiannis, M. (2000) A foundation for representing and querying moving objects. ACM Transactions on Database Systems 25, 1, 1-42. Höllerer, T., Feiner, S., Terauchi, T., Rashid, G. & Hallaway, D. (1999) Exploring mars: Developing indoor and outdoor user interfaces to a mobile augmented reality system. Computer & Graphics, Vol.23, No. 6. Elsevier Publishers, 779-785. Kandawasvika, A., Mäs, S., Plan, O., Reinhardt, W. & Wang, F. (2004) Concepts and development of a mobile client for online geospatial data acquisition. GeoLeipzig 2004 - Geowissenschaften sichern Zukunft. Schriftenreihe der Deutschen Geologischen Gesellschaft, Vol. 34, p. 79. Kandawasvika, A. & Reinhardt, W. (2005) Concept for interoperable usage of multi-sensors within a landslide monitoring application scenario, Accepted for publication in AGILE 2005 Proceedings, 8th AGILE Conference on Geographic Information Science, Estoril, Portugal, 10 p. Katz, R.H. (1990) Towards a Unified Framework for Version Modeling in Engineering Databases. ACM Computing Surveys 22, No. 4, 375-408. Leebmann, J. (2005) Dreidimensionale Skizzen in der physikalischen Welt. Phd thesis, Karlsruhe, Univ., Fak. Für Bauingenieur-, Geo- und Umweltwissenschaften, submitted on July 5th 2005. Livingston, M.A., Brown, D. & Gabbard, J.L. (2002) An augmented reality system for military operations in urban terrain. Proceedings of Interservice/ Industry Training, Simulation and Education Conference, IEEE, Zurich, 31-38. Mäs, S., Reinhardt, W. & Wang, F. (2005a) Concepts for quality assurance during mobile online data acquisition. Accepted for publication in AGILE 2005 Proceedings, 8th AGILE Conference on Geographic Information Sci-

ence, Estoril, Portugal, 10 p. Mäs, Stephan; Wang, Fei; Reinhardt, Wolfgang (2005b): »Using Ontologies for Integrity Constraint Definition«, In: Proceedings of the 4th International Symposium On Spatial Data Quality, pp. 304-313, August 25- 26, 2005, Peking, China Merdes, M., Häußler, J. & Zipf, A. (2005) GML2GML: Generic and Interoperable RoundTrip Geodata Editing – Concepts and Example. Accepted for publication in AGILE 2005 Proceedings, 8th AGILE Conference on Geographic Information Science, Estoril, Portugal, 10 p. Mostafavi, M.-A., Edwards, G. & Jeansoulin, R. (2004): Ontology-based method for quality assessment of spatial data bases. In: ISSDQ '04 Proceedings, 49-66. Mutschler, B. & Specht, G. (2003) Implementation concepts and application development of commercial mobile database systems (in german). Proceedings Workshop Mobility and Information Systems, ETH-Zürich, Report No. 422, 67-76. Newell, R.G. & Easterfield, M. (1990) Version Management – the problem of the long transaction. Smallworld Systems Ltd., Technical Paper 4. OGC (2001): Filter Encoding Implementation Specification, Version: 1.0.0, OpenGIS® Implementation Specification, OpenGIS project document: OGC 02-059, 19 September 2001 Polthier, K. & Rumpf, M. (1995): A Concept For Time-Dependent Processes. In: Goebel, M., Mueller, H., Urban, B. (eds.): Visualization in Scientific Computing, Springer, 137-153. Pundt, H. (2002): Field Data Acquisition with Mobile GIS: Dependencies Between Data Quality and Semantics, GeoInformatica 6:4, 2002, Kluwer Academic Publishers, 363-380.



Roddick, J.F., Egenhofer, M.J., Hoel, E. & Papadias, D. (2004) Spatial, Temporal and Spatio-Temporal Databases – Hot Issues and Directions for PhD Research. ACM SIGMOD Record, Vol. 33, No. 2, June, 6 p. Rolfs, C. (2005) Konzeption und Implementierung eines Datenmodells zur Verwaltung von zeitabhängigen 3D-Modellen in geowissenschaftlichen Anwendungen. Diplomarbeit Geoinformatik, Fachhochschule Oldenburg, FB Bauwesen u. Geoinformation, 90 p. Ruch, C. (2002) Georisiken: Aktive Massenbewegungen am Albtrauf. LGRB-Nachrichten No. 8/2002, Landesamt für Geologie, Rohstoffe und Bergbau (LGRB), Baden-Württemberg, 2 p. Schönhoff, M. (2002) Version Management in Federated Database Systems. DISDBIS 81, Akademische Verlagsgesellschaft (Aka), Berlin. Sellis, T. (1999) Research Issues in Spatio-temporal Database Systems. In: Güting, Papadias & Lochovsky (Eds.): Advances in Spatial Databases. Lecture Notes in Computer Science 1651, Springer. Shumilov, S., Thomsen, A., Cremers, A.B. & Koos, B. (2002) Management and Visualization of large, complex and time-dependent 3D Objects in Distributed GIS. Proc. 10th ACM GIS, McLean (VA). Staub, G., Coelho, A. & Leebmann, J. (2004) An analysis approach for inertial measurements of a low cost IMU. Proceedings of the 10th International Conference on Virtual Systems and Multimedia, Ogaki, Japan, November 2004, 924-933. W3C (2004a) OWL Web Ontology Language Reference. Editors: Dean, M.; Schreiber, G., W3C Recommendation, 10 February 2004. Available at: http://www.w3.org/TR/owl-ref/. W3C (2004b) SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission, 21 May 2004. Available at: http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/


Wiesel, J., Staub, G., Brand, S. & Coelho, A.H. (2004) Advancement of Geoservices – Augmented Reality GIS Client. Geotechnologien Science Report No. 4, Aachen, 94-97. Wolfson, O. (2002) Moving Objects Information Management: The Database Challenge. Proc. of the 5th International Workshop on Next Generation Information Technologies and Systems, LNCS 2382, Springer-Verlag, 75-89. Worboys, F. M. (1994): A unified model for spatial and temporal information. Computer Journal vol. 37 no. 1, 26-34.




Development of a data structure and tools for the integration of heterogeneous geospatial data sets

Butenuth M. (1), Gösseln G. v. (2), Heipke C. (1), Lipeck U. (3), Sester M. (2), Tiedge M. (3)

(1) Institute of Photogrammetry and GeoInformation, University of Hannover, Nienburger Str. 1, 30167 Hannover, Germany, {butenuth, heipke}@ipi.uni-hannover.de
(2) Institute of Cartography and Geoinformatics, University of Hannover, Appelstr. 9a, 30167 Hannover, Germany, {goesseln, sester}@ikg.uni-hannover.de
(3) Institute of Practical Informatics, University of Hannover, Welfengarten 1, 30167 Hannover, Germany, {ul, mti}@dbs.uni-hannover.de

Abstract The integration of heterogeneous geospatial data sets offers extended possibilities of deriving new information which could not be accessed by using only single sources. Different acquisition methods, data schemata and updating periods of the topographic content lead to discrepancies in geometry, accuracy and topicality, which hamper the combined usage of these data sets. The integration of different data sets – in our case topographic data, geoscientific data and imagery – allows for a consistent representation, the propagation of updates from one data set to the other and the automatic derivation of new information. In order to achieve these goals, basic methods for the integration and harmonisation of data from different sources and of different types are needed. To provide integrated access to the heterogeneous data sets, a federated spatial database is developed. We demonstrate two generic integration cases, namely the integration of two heterogeneous vector data sets, and the integration of raster and vector data.

1. Introduction Geospatial data integration is often applied to solve complex geoscientific questions. To ensure successful data integration, i.e. ensure that the integrated data sets fit to each other and


can be analysed in a meaningful way, an intelligent strategy is required because these data sets are mostly acquired using different methods and quality standards and at different points in time. Differences between printed analogue maps were not as apparent as those between the digital data of today, where different data sets are overlaid in modern GIS applications. Integrating different data sets allows for a consistent representation and for the propagation of updates from one data set to the other. To enable the integration of vector data sets, a strategy based on semantic and geometric matching, object-based linking, geometric alignment, change detection and updating will be used. With this strategy the actual topographic content from an up-to-date data set can be used as a reference to enhance the content of certain geoscientific data sets. In addition, the integration of two data sets with the aim of deriving an updated data set with an intermediate geometry based on given weights is possible. The integration of raster and vector data sets is the second integration task dealt with in this paper. As an example, field boundaries and wind erosion obstacles are extracted from aerial imagery exploiting prior GIS knowledge. One application area is geoscientific questions, for example the derivation of potential wind erosion risk fields, which can be generated with field boundaries


and additional input information about the prevailing wind direction and soil parameters. Another area is the agricultural sector, where information about field geometry is important for tasks such as precision farming or the monitoring and control of subsidies. The paper is structured as follows: The following section gives an overview of the state of the art concerning the topic of data integration. Afterwards, the used data sets are presented and an architecture for database-supported integration is described. Methods for vector/vector and raster/vector data integration are highlighted in the following section. Results demonstrate the potential of the proposed solution; finally, conclusions are drawn and further work is discussed.

2. State of the art of geospatial data integration The integration of vector data sets presented in this paper is based on the idea of comparing two data sets, while one is used as a reference and a second one – the candidate – is aligned to the first one, which is a general matching problem, see e.g. Walter and Fritsch (1999). For the integration of multiple data sets, it has been shown how corresponding objects can be found when several data sets have to be integrated (Beeri et al., 2005). Due to the complexity of the integration problem it is very difficult to solve this task with one closed system, therefore the development of a strategy based on component ware technology was proposed (Yuan and Tao, 1999) and a software prototype for the vector data integration has been developed as a set of components to ensure the applicability in different integration tasks. While this approach uses a reference data set to enhance and update the topographic content of a candidate data set, data integration can also be used for data registration, when one data set is spatially referenced and the other has to be aligned to it (Sester et al., 1998). In order to geometrically adapt data sets of different origin, rubber sheeting mechanisms are being applied (Doythser, 2000).

Strategies applied to cadastral data based on triangulation to enhance the rubber-sheeting process have been presented by Hettwer and Benning (2000). The recognition of objects with the help of image analysis methods often starts with an integration of raster and vector data, i.e. using prior knowledge to support object extraction. An integrated modelling of the objects of interest and the surrounding scene exploiting the context relations between different objects leads to an overall and holistic description (Baltsavias, 2004). In this paper, the extraction of field boundaries and wind erosion obstacles from imagery is chosen to demonstrate the methodology integrating raster and vector data. In the past, several investigations regarding the automatic extraction of man-made objects have been carried out (e.g. Mayer, 2001). Similarly, the extraction of trees has been accomplished, cf. Hill and Leckie (1999) for an overview of approaches suitable for woodland. In contrast, the extraction of field boundaries is not at an advanced stage: a first approach to update and refine topologically correct field boundaries by fusing raster images and vector map data is presented in Löcherbach (1998). The author focuses on the reconstruction of the geometry and features of the land-use units; however, the acquisition of new boundaries is not discussed. In Torre and Radeva (2000) a so-called region competition approach is described, which extracts field boundaries from aerial images with a combination of region growing techniques and snakes. To initialise the process, seed regions have to be defined manually, which is a time- and cost-intensive procedure. In order to connect heterogeneous databases, at first so-called multi-database architectures were discussed for loose coupling. Subsequently, so-called federated databases have been chosen to support closer coupling (Conrad, 1997). Federated databases allow integrating heterogeneous databases via a global schema and provide a unified database interface for global applications. Local applications remain unchanged, as they still access the



databases via local schemata. For database schema integration a broad spectrum of methods has been investigated (Batini et al., 1986), but identifying objects is typically restricted to one-to-one relationships. In the context of geospatial integration, more sophisticated methods are needed to incorporate complex correspondences between objects (many-to-many relationships), which are usually not considered in federated databases. Whereas there are many overview articles on spatial databases (e.g. Rigeaux, 2002), federated spatial databases have hardly been investigated, with the exceptions of Devogele (1998) and Laurini (1998).

3. Architecture for integration Different geospatial data sets which represent the same real world region, but cover different thematic aspects, are acquired with respect to different needs. In this section we present an architecture that provides integrated access to heterogeneous data sets. It is designed to store and export results of the vector/vector and the raster/vector integration steps. This task is accomplished according to the paradigm of federated databases. For this purpose the known architecture of a federated database is expanded to handle geospatial data. In order to select certain objects satisfying given semantic criteria it is possible to define mappings to harmonise the attributes of the different data sets. Furthermore, the database provides mechanisms to pre-process geospatial objects for the integration of raster and vector data. Fig. 1 gives a simplified overview of the realised system architecture with respect to the interaction between the federated database and the integration process, namely object matching and extraction. In the next section, the involved vector and raster data sets are described to demonstrate how much the geospatial data models differ structurally and semantically. Then the architecture and modelling concepts of the database integration are explained; they provide an organisational framework for the approaches of geospatial data integration given in section 4.

Figure 1: System overview


3.1 Data sets The vector data sets used in this project include the German topographic data set (ATKIS DLMBasis), the geological map (GK) and the soil science map (BK), all at a scale of 1:25000. Simple superimposition of different data sets already reveals some differences. These differences can be explained by looking at the creation of the maps. For ATKIS, topography is the main thematic focus; for the geoscientific maps it is either geology or soil science. Thus, these maps have been produced using the results of geological drillings, and from this point-wise information, area objects have been derived using interpolation methods based on geoscientific models. They are, however, related to the underlying topography. The connection between the data sets has been achieved by using the topographic information together with the geoscientific results at the point in time when the geological or soil science information was collected. The selection and integration of objects from one data set into another was performed manually, and in most of the cases the objects have been generalised by the geoscientist. While the geological content of these data sets keeps its topicality for decades, the topographic information in these maps does not: In general, topographic updates are not

integrated unless new geological information has to be inserted in these data sets. The geoscientific maps have been digitised to use the benefits of digital data sets, but due to the digitalisation even more discrepancies occurred. Another problem which amplifies the deviations of the geometry is the case of different data models. Geological and soil science maps are single-layered data sets which consist only of polygons with attribute tables for the representation of thematic and topographic content, while ATKIS has a multi-layered data structure with objects of all geometric types, namely points, lines and polygons, equally with attribute tables. In addition to the described vector data, raster data sets are used to enable object recognition while exploiting the prior ATKIS knowledge. The raster data sets are aerial images or high resolution satellite images, which include an infrared channel.

3.2 Architecture and concepts of integration As the previous section has shown, the various geospatial data sets differ significantly due to the various objectives of their acquisition. In order to integrate the corresponding databases we have chosen the architectural paradigm of federation (Conrad, 1997), as it gives a close coupling and at the same time keeps the databases autonomous. Hereby, the matching and extraction processes are given an integrated view of the different databases via a global database schema (global applications). Nevertheless, particular applications (like import and export processes) may still access the databases locally, as shown in Fig. 2.

Figure 2: Architecture of a federated database.

The federation service requires an »integration database« (cf. section 3.2.4) of its own to maintain imports and descriptions of the involved data sets (component databases), and to incorporate qualified links between objects as the result of the matching process, as well as further findings such as geometrically adjusted and newly extracted objects.

3.2.1 Schema adaptation To make the structurally different data sets accessible to the federation service, a generic but flexible export schema was designed, based on experiences with geospatial data sets containing topographic objects in object-relational databases (Kleiner, 2000). The schema contains all objects, object classes, attribute types and attribute values, each of them in one entity type (or table in the relational DBMS). Fig. 3 shows the schema for the topographic data (ATKIS); the geoscientific data sets get isomorphic export views; in more detail, they have application-specific attribute types and object classes according to their own representation model.

Figure 3: Export schema for the topographic map (ATKIS).

A geoobject of entity type ATKIS_Objects, e.g. a road, has several entries of type ATKIS_Attributes, namely (attribute, value) pairs such as (width, 10 meters). The corresponding type of the attributes and the classification of the geoobjects can be found in the collections ATKIS_AttributeTypes and ATKIS_ObjectClasses.

3.2.2 Object linking Given the structural adaptation of the different data sets, the federated database can be enabled to incorporate correspondences through so-called links. Linking objects, however, should not only involve simple one-to-one relationships, as real-world objects are represented differently in different maps. The federation service has to cope with more complex correspondences, namely one-to-many and even many-to-many relationships as shown in Fig. 4, which represents different partitions of a real world object in two maps. This task is accomplished with a flexible schema that integrates these general correspondences as attributed one-to-one links between aggregated objects. Fig. 4 shows an instance of three and two objects, respectively, e.g. a section of a water body segmented in two different ways, whose aggregations (denoted by dashed lines) are linked.
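A possible representation of such attributed links between aggregated objects is sketched below; the identifiers, the cardinality string and the quality attribute are hypothetical and only illustrate the idea of storing an m:n correspondence as one 1:1 link between two aggregations.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Link:
    left: FrozenSet[str]           # object ids aggregated in data set A (e.g. ATKIS)
    right: FrozenSet[str]          # object ids aggregated in data set B (e.g. GK)
    cardinality: str               # "1:1", "1:n", "n:m", ...
    match_quality: float           # attribute of the link, e.g. area overlap ratio

def make_link(left_ids, right_ids, quality):
    card = f"{len(left_ids)}:{len(right_ids)}"
    return Link(frozenset(left_ids), frozenset(right_ids), card, quality)

# three water polygons in one map matched to two in the other (cf. Fig. 4)
link = make_link({"atkis:101", "atkis:102", "atkis:103"}, {"gk:7", "gk:8"}, 0.92)
print(link.cardinality)            # 3:2
```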



Figure 4: Realisation of a many-to-many-relationship as a link between object aggregations.

Figure 5: Semantic selections for regions and networks.

3.2.3 Attribute harmonisation and semantic selection In order to provide the applications with a model independent and uniform method to access certain objects with respect to thematic attributes, a mechanism for the semantic description of geoobjects was developed, to characterise comparable object sets for the matching process and to characterise object selection for the extraction process. To fulfil these requirements, the architecture of federated databases had to be expanded to unify the handling of semantic descriptions. Fig. 5 shows two simplified semantic selections of

topographic objects, namely of open landscape and a partitioning network. Semantic object selections are defined in the following three stages: Coarse semantic classification is achieved through the references to object classes given by the export views. Fig. 5 depicts some object classes of the topographic map, e.g. farmland and roads. Next, a more precise characterisation is provided through the specification of object attributes, i.e. the coarse selection via object classes is restricted by attribute conditions. For instance, road objects appear as both one-dimensional and



two-dimensional objects due to acquisition rules. In order to build a partitioning network only the one-dimensional road objects are needed. Finally, fine object classes are merged to class sets, which provide semantic selections for the global applications, independent of the original data set’s semantic specifications. Next to the structural unification through export views attribute harmonisation is achieved by connecting two conforming semantic selections of two different data sets (e.g. water bodies both in the topographic and the geological map). It is necessary to provide this semantic description for any representation model only once, independent of the quantity of instances of this particular model (component databases).
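The three-stage semantic selection could, for example, be captured by simple data structures like the following sketch; the class names, attribute conditions and class-set names are invented for illustration and do not reproduce the integration database schema.

```python
open_landscape = {
    "classes": ["farmland", "grassland", "heath"],   # coarse classification via object classes
    "conditions": [],                                 # no further attribute restriction
}
road_network = {
    "classes": ["road"],
    "conditions": [("geometry_type", "line")],        # keep only one-dimensional road objects
}
class_sets = {                                        # model-independent selections for applications
    "OpenLandscape": [open_landscape],
    "PartitioningNetwork": [road_network],
}

def matches(obj: dict, selection: dict) -> bool:
    return (obj["class"] in selection["classes"]
            and all(obj.get(attr) == value for attr, value in selection["conditions"]))

print(matches({"class": "road", "geometry_type": "line"}, road_network))   # True
```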

3.2.4 Integrated schema Fig. 6 summarises the schema architecture of the integration database. The component databases are the originally involved geospatial data sets, based on the previously described export views, and the term »Objects« stands for all objects of the integration database, i.e. adjusted and extracted geometric objects.

Figure 6: Overview of the integrated schema.


The different parts of Fig. 6 show that the federation service is supported with respect to the following tasks:
- the model description, characterising object classes and attribute types of a certain data set model
- the registration of the component databases
- the semantic selection, as described in the previous section
- the application control, which stores meta data about extraction and matching processes, in particular about the used semantic selections and links between the involved component databases
- the linking of objects from different data sets (object linking, cf. Section 3.2.2)

4. Methods of data integration In this section the methodologies of the vector/vector and the raster/vector data integration are described. First, the integration of


heterogeneous vector data sets which have been acquired for different purposes and with unequal updating strategies is presented based on a component based strategy. Subsequently, the integration of raster and vector data is highlighted with the example of the extraction of field boundaries and wind erosion obstacles from imagery exploiting prior GIS knowledge.

4.1 Integration of vector data At the beginning of the integration process the semantic content of all data sets was compared. Based on this step, certain selection groups were built up for each data set (e.g. water area). This selection is mandatory to avoid comparing »apples and oranges« and has to be the first step to ensure a successful integration. An area-based matching process is used for the creation of links between object candidates. These links are stored in the federated database using an XML schema, followed by an alignment process which reduces geometric discrepancies to a minimum to ensure satisfying results in the subsequent intersection process, while still being capable of distinguishing between geometric discrepancies caused by map creation and topographic changes which occurred between the different times of acquisition. A rule-based evaluation of the intersection results is used for change detection.

4.1.1 Revelation of links between corresponding objects Various data sets have different forms of representation for certain topographic objects (e.g. rivers); the decision which kind of representation to use often depends on specific attributes, e.g. in ATKIS DLMBasis (cf. Section 3) the width of the river is used for this decision: thinner than 12 meters – polyline, wider than 12 meters – polygon. Since there are different thresholds for each data set, these differences have to be resolved using harmonisation strategies. To ensure a suitable result in the revelation of links, line objects have to be transformed into polygons by applying a buffer algorithm using the width attribute. Another problem is the representation of grouped objects in different maps. For a group of water objects, e.g. a group of ponds, the representation in the different data sets could either be a group of objects with the same or a different number of objects, or even a single generalised object (see Fig. 7). Finally, objects can also be present in one data set and not represented in the other. All these considerations lead to the following relation cardinalities that have to be integrated: 1:0, 1:1, 1:n, and n:m. After the corresponding relations have been identified, each selection set will be aggregated, so they can be handled as 1:1 relations, so-called relation-sets (Goesseln and Sester, 2004).
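The buffering step mentioned above can be sketched with the shapely package, assuming a width attribute on the line objects; the feature layout and attribute names are illustrative only.

```python
from shapely.geometry import LineString, Polygon

def river_to_polygon(feature: dict):
    """Harmonise a river representation: expand centrelines to area objects."""
    geom = feature["geometry"]
    if isinstance(geom, Polygon):
        return geom                                  # already an area object
    width = float(feature["attributes"].get("width", 0.0))
    return geom.buffer(width / 2.0)                  # centreline -> buffered area

narrow_river = {"geometry": LineString([(0, 0), (100, 0)]),
                "attributes": {"width": 8.0}}        # < 12 m: stored as a polyline
print(river_to_polygon(narrow_river).area)           # ~800 plus the rounded end caps
```

The resulting polygons can then be matched area-based against the polygon representations of the other data set and aggregated into relation-sets as described above.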

Figure 7: Different representations - ATKIS (solid line), GK (dotted line).



These relation-sets will be visualised for the operator – using a GUI-based application – enabling manual correction of the derived links. With this software each relation-set can be inspected and edited, to check whether the automated process has failed to build up the suitable correspondences between the selected data sets. Because the objects from all three data sets are representations of the same real world objects, they show an apparent resemblance in shape and position. Nevertheless, the alignment of the geometries is required after the evaluation of the matching results. As will be described later, different geometric alignment methods are required to cover all alignment tasks; therefore the technique offering the most suitable result can be selected for every single relation-set.

4.1.2 Geometric Alignment of corresponding objects Objects which have been identified as a matching pair can be investigated for change detection using intersection. At this stage the mentioned differences cause further problems, which are visible as discrepancies in position, scale and shape. These discrepancies would lead to unsatisfying results in the evaluation of the intersection elements and would evoke an excessive estimation of the area regarded as change of topographic content. Therefore a geometric adaptation is applied, leading to a better geometric correspondence of the objects. For these adaptation processes, thresholds are required which allow the reduction of discrepancies caused by map creation, but do not remove the changes which happened to real world objects between the different times of data acquisition. Iterative closest point (ICP): The iterative closest point algorithm (ICP) developed by Besl and McKay (1992) has been implemented to achieve the best fit between the objects from ATKIS and the geoscientific elements using a rigid 7-parameter transformation. The selection of a suitable algorithm for ICP depends on the


alignment to be performed; in this case the problem is reduced to a 2D problem requiring four parameters (position, scale and orientation) and is solved using a Helmert transformation. These calculations are repeated iteratively and evaluated after each step; the iteration stops when no more variation in the four parameters occurs. At the end of the process the best fit between the objects under the given transformation is achieved. Evaluating the transformation parameters allows for classifying and characterising the quality of the matching: in the ideal case, the scale parameter should be close to 1 and rotation and translation should be close to 0. Assuming that the registration of the data sets is good, these four parameters reflect exactly the distortions introduced when the analogue data sets were created by manual copying of printed maps. Therefore a larger scale factor can be an indicator for differences between two objects that are not based on map creation, but on a change of the real world object that occurred between the different times of data acquisition (Goesseln and Sester, 2004). The result of this transformation is stored as a set of shifting vectors, which are required in a subsequent step in which the neighbourhood of the transformed objects is aligned. This step will be described later on (cf. Section 4.1.3). The application of the iterative adaptation using the ICP approach based on the Helmert transformation showed very good results and revealed the possibility of reducing the number of objects which have to be evaluated manually. However, there are some situations where this approach does not generate sufficient results (e.g. objects which cover several map sheets or at least touch the map boundaries). Dual interval alignment (DIA): The DIA approach has been implemented, enabling the alignment of local discrepancies of corresponding geometries by calculating the transformation of single vertices, based on the ideas of Kohonen (1997); however, this appro-


ach handles each vector separately. Corresponding objects which have been assigned as representations of the same real world object through the matching process are investigated based on their vertices. For every point in one object the nearest neighbour in the corresponding partner object is determined using the criterion of proximity. The approach then evaluates the distance between these coordinates, based on an interval which is predetermined by the human operator. This threshold defines the largest distance – representing a change in geometry – which is still acceptable for the candidate data set. Distances exceeding this threshold indicate a topographic difference which has to be investigated during field work. As can be seen in Fig. 8, for each point (P_C) from object C and the corresponding point (P_R) of the linked object R, the point transformation is calculated based on the Euclidean distance (d) between these points. The new coordinates are determined taking the interval ranges a and b into account. Points within the first distance interval (0<d<a) are aligned to a single point, a distance falling into the second interval (a<d<b) leads to an approximation of the selected points, and points with a distance beyond b are not adapted (see Eq. 1). It may seem paradoxical that complete alignment is not necessarily the desired result. The integration of data sets which cover the same area, which are based on the same method of representation and which are acquired at nearly the same point of time can be performed by using an alignment strategy with elimination of all dif-

ferences. While integrating data sets which have been acquired at different points in time, it is obvious that a certain amount of change to topography, built-up area and/or vegetation has occurred. Therefore an alignment threshold is required which allows the operator to decide between errors due to map creation and real-world changes. The introduction of a second threshold follows the idea of »fuzzy« logic and ensures that there are no hard discontinuities in the geometry of aligned objects. The integration of a weight p (see Eq. 1) into the alignment process not only takes different accuracies of the geometries into account, but also opens this approach to a much wider range of conflation tasks, e.g. in a project where one data set is handled as a reference data set and must not be changed (weight set to 1). In other cases, when two data sets have to be aligned and no single data set can be selected as reference, the alignment is performed following the common idea of conflation by aligning the two data sets to a new, improved data set. (1)
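A sketch of the dual-interval behaviour described above; since equation (1) is not reproduced here, the blending rule, the parameter names and the handling of the weight p are an illustrative reading of the text, not the original formula.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def nearest(p: Point, candidates: List[Point]) -> Point:
    return min(candidates, key=lambda q: math.dist(p, q))

def dia_align(cand: List[Point], ref: List[Point],
              a: float, b: float, p: float = 1.0) -> List[Point]:
    """p = 1.0 keeps the reference fixed; p = 0.5 conflates both towards the middle."""
    out = []
    for pc in cand:
        pr = nearest(pc, ref)
        d = math.dist(pc, pr)
        if d < a:
            w = p                                   # full alignment: map-creation error
        elif d < b:
            w = p * (b - d) / (b - a)               # partial approximation ("fuzzy" zone)
        else:
            w = 0.0                                 # assumed real-world change: keep point
        out.append((pc[0] + w * (pr[0] - pc[0]), pc[1] + w * (pr[1] - pc[1])))
    return out

print(dia_align([(0.0, 0.3), (0.0, 2.0), (0.0, 9.0)],
                [(0.0, 0.0)], a=1.0, b=5.0))
```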

Calculating the shift distance based on the nearest neighbour enables very good alignment, but can result in topologic errors. This requires the integration of an additional criterion, which is

Figure 8: Application of DIA for the partial alignment of object geometries (schematic).



vertex orientation. Therefore the orientation of the polygon segments is calculated for all corresponding objects. If a point and its corresponding partner are selected using the distance criterion, the direction to each corresponding successor is calculated. If the difference between these directions exceeds a given threshold, the points must not be aligned, due to the assumption that they do not represent the same »side« of the real world object. Comparing the adaptation approaches ICP and DIA, each is suitable for a different kind of objects in this project. ICP matches the idea that the majority of the geometric discrepancies is caused by the way the data sets have been created, namely by integrating topographic elements through manual copying. The resulting parameters can be used for the investigation and evaluation of the influences which were responsible for the geometric discrepancies: an object which can be aligned by just using translations with a small change in scale can be judged as a minor error based on manual copying, whereas a larger scale factor can reveal topographic changes of the real world objects, which have to be investigated by a human operator. The transformation on which the implemented ICP algorithm is based is very fast and reliable, but depending on the chosen transformation algorithm it does not give satisfying results for larger, irregularly shaped objects like rivers, or objects that have been changed during different periods of time and therefore only match partially. And as good as the alignment results of DIA are, it is much more time-consuming and susceptible to errors. The combination of both approaches delivered very good results, offering the possibility to assess the geometric discrepancies by evaluating the resulting ICP parameters, and to align large object groups or partially matching objects using DIA. The automated decision between these methods has not been completed yet. So far both methods are applied for every relation-set and the most suitable result is chosen by comparing the results with certain geometric operators (e.g. angle histogram, symmetric difference).
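For the ICP part, the following sketch estimates the four Helmert parameters from nearest-neighbour correspondences in each iteration; the closed-form estimator, the accumulation of the parameters and the example data are illustrative and not taken from the project software.

```python
import numpy as np

def fit_helmert_2d(src: np.ndarray, dst: np.ndarray):
    """Least-squares 2D similarity dst ~ s*R*src + t, via complex arithmetic."""
    zs = src[:, 0] + 1j * src[:, 1]
    zd = dst[:, 0] + 1j * dst[:, 1]
    zs_c, zd_c = zs - zs.mean(), zd - zd.mean()
    a = (np.conj(zs_c) @ zd_c) / (np.conj(zs_c) @ zs_c)   # a = scale * exp(i*rotation)
    b = zd.mean() - a * zs.mean()
    return a, b                                            # apply as z' = a*z + b

def icp_2d(cand: np.ndarray, ref: np.ndarray, iterations: int = 10):
    z = cand[:, 0] + 1j * cand[:, 1]
    zr = ref[:, 0] + 1j * ref[:, 1]
    a_total, b_total = 1.0 + 0j, 0.0 + 0j                  # accumulated similarity
    for _ in range(iterations):
        # nearest reference vertex for every candidate vertex
        nn = zr[np.argmin(np.abs(z[:, None] - zr[None, :]), axis=1)]
        a, b = fit_helmert_2d(np.column_stack([z.real, z.imag]),
                              np.column_stack([nn.real, nn.imag]))
        z = a * z + b
        a_total, b_total = a * a_total, a * b_total + b
    return abs(a_total), np.degrees(np.angle(a_total))

square = np.array([(0, 0), (1, 0), (1, 1), (0, 1)], dtype=float)
copied = square * 1.05 + np.array([0.2, -0.1])             # distortion from manual copying
scale, rot_deg = icp_2d(copied, square)
print(round(scale, 3), round(rot_deg, 2))                  # scale ~0.952, rotation ~0.0
```

A scale close to 1 and negligible rotation, as in this example, would point to a copying error rather than a real-world change, in line with the interpretation given above.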


4.1.3 Neighbourhood adaptation using rubber-sheeting The individual alignment of selected objects would result in gaps, overlaps or inconsistencies concerning the rest of the data set, so that the neighbourhood of the aligned objects must be transformed accordingly. To ensure an overall alignment, the results which originate from the individual alignment processes are stored as a collection of displacement vectors. All vectors build up a vector field which is the basis of the neighbourhood adaptation, ensuring a homogeneous data set. Using a distance-weighted interpolation, the rubber-sheeting method calculates a new transformation target for every point in the data set based on the vectors derived from the alignment. This strategy has to be carefully adapted for every adaptation process with regard to the data set used. Different data sets require different constraints which the rubber-sheeting algorithm must be able to take into account. These constraints can be e.g. points or areas which must not be changed, like fixed points or areas which have been updated manually in advance, or objects of a higher category.
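A minimal sketch of the distance-weighted interpolation over the field of displacement vectors; the weighting exponent and the representation of fixed points as zero-displacement anchors are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Displacement = Tuple[Point, Tuple[float, float]]      # (anchor position, shift vector)

def idw_shift(p: Point, anchors: List[Displacement], power: float = 2.0):
    num_x = num_y = den = 0.0
    for (ax, ay), (dx, dy) in anchors:
        d2 = (p[0] - ax) ** 2 + (p[1] - ay) ** 2
        if d2 == 0.0:
            return (dx, dy)                            # point coincides with an anchor
        w = 1.0 / d2 ** (power / 2.0)
        num_x, num_y, den = num_x + w * dx, num_y + w * dy, den + w
    return (num_x / den, num_y / den)

def rubber_sheet(points: List[Point], anchors: List[Displacement]) -> List[Point]:
    return [(x + s[0], y + s[1])
            for (x, y) in points
            for s in [idw_shift((x, y), anchors)]]

anchors = [((0.0, 0.0), (1.0, 0.0)),                   # aligned object moved 1 m east
           ((10.0, 0.0), (0.0, 0.0))]                  # fixed point: no change allowed
print(rubber_sheet([(2.0, 0.0), (8.0, 0.0)], anchors))
```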

4.2 Integration of raster and vector data The integration of raster and vector data is highlighted by means of the extraction of field boundaries and wind erosion obstacles from imagery exploiting prior GIS knowledge. First, the integrated modelling and the derived strategy are described, followed by the presentation of fully automatic methods to extract the field boundaries and wind erosion obstacles.

4.2.1 Model and strategy The semantic model comprises the integration of raster data (imagery) and vector data (GIS data) as starting point for the object extraction, as described in detail in Butenuth (2004). The model is differentiated in an object layer, a geometric and material part, as well as an image layer (cf. Fig. 9). It is based on the assumption, that the used images include an infrared (IR) channel and are generated in summer, when


the vegetation is in an advanced period of growth. The use of vector data as prior knowledge plays an important role, which is represented in the semantic model with an additional GIS-layer (ATKIS DLMBasis, cf. Section 3): Field boundaries and wind erosion obstacles are exclusively located in the open landscape, thus, further investigations are focussed to this area. Additionally, the objects road, river and railway are introduced in the semantic model as field boundaries with a direct relation from the GISlayer to the real world (i.e. a road is a field boundary). Of course, the underlying assumption is based on correct GIS-objects. Modelling of the GIS-objects in the geometry and material layer together with the image layer is not of interest, because they do not have to be extracted from the imagery; thus, the corresponding parts are represented with dashed lines in Fig. 9. Nevertheless, additionally extracted objects which are not yet included in the GIS database can be introduced at any time. The field is divided in the semantic model into field boundary and field area in order to allow for different modelling in the layers. The field boundary is a 2D elongated vegetation boundary, which is formed as a straight line or edge in the image. The field area is a 2D vegetation region, which is a homogeneous region with a

high NDVI (Normalised Difference Vegetation Index) value in the colour infrared (CIR) image. The wind erosion obstacle is divided in hedge and tree row due to different available information from the GIS-layer, which is partially stored in the database. The wind erosion obstacles are not only described by their direct appearance in geometry and material, but also through the fact, that due to their height (3D object) there is a 2D elongated shadow region next to the object and in a known direction. In particular, the relationships between the objects to be extracted are of interest leading to connections within the layers: One object can be part of another one or be parallel and nearby, and together they form a context network in the real world. For instance, wind erosion obstacles are not located in the middle of a field because of disadvantageous cultivation conditions, but solely on the field boundaries. The strategy derived from the modelled characteristics of the field boundaries and wind erosion obstacles aims at realising an automatic processing flow. Imagery and GIS-data are the input data to initialise the process: First, field boundaries and wind erosion obstacles are extracted separately. At the end, a combined evaluation of the preliminary results is advanta-

Figure 9: Semantic Model.



geous due to the modelled geometrical and thematic similarities of the objects of interest, yielding a refined and integrated solution. The strategy for extracting the field boundaries starts with the derivation of the open landscape from the GIS data. In addition, within the open landscape, regions of interest are selected using the roads, rivers and railways as borderlines (cf. Section 4.2.2). Consequently, the borderlines of the regions of interest are field boundaries, which are already fixed. In each region of interest a segmentation is carried out at a coarse scale, ignoring small disturbing structures and thus exploiting the relative homogeneity of each field. The aim is to obtain a topologically correct result, even though the geometrical correctness may not be very high. Afterwards, network snakes are used to improve the preliminary field boundaries. The strategy for extracting the wind erosion obstacles starts again within the open landscape due to the modelled characteristics. Search buffers can be defined around the GIS objects roads, rivers and railways, because tree rows or hedges are often located alongside these objects, and have to be verified using the imagery. In contrast, there is no prior information about the loca-

Figure 10: Generation of regions of interest (a) and adjustment of tree rows and hedges (b).


tion of all other wind erosion obstacles, which can lie anywhere within the open landscape. In addition to the modelled material characteristics, the geometrical part such as the straightness or minimum length has to be considered. Finally, the combined evaluation of the preliminary results identifies discrepancies between the field boundaries and wind erosion obstacles. For example, extracted wind erosion obstacles without a corresponding extracted field boundary have to be checked, whether nearby a field boundary is missed, or whether the extraction of the wind erosion obstacle is wrong. Consequently, the combined evaluation and refined extraction process leads to a consistent and integrative final result.

4.2.2 Preparation of GIS data As described in the previous section the regions of interest are primarily derived from roads, rivers and railways according to the GIS data, as far as these objects are located in the open landscape. The fact that these network generating objects imply a segmentation of the open landscape is used as starting point, as all necessary borderlines are already present in


this data set – however, this segmentation is too extensive (e.g. because of administrative reasons). In order to detect only the borderlines concerning regions of interest, a topological data model is generated (Egenhofer et al., 1989), consisting of an embedded graph structure which contains the open landscape regions and the network generating objects. This graph-based data model represents boundaries of area objects and one-dimensional objects as edges and therefore allows deciding whether certain segmentations are a result of the separating network. The removal of all edges which are not caused by the separating network implies the merging of the adjacent regions and finally results in the generation of the regions of interest (Fig. 10 a). Furthermore, the initial topological data model (i.e. before edge removal) is used to prepare the tree row and hedge objects used in the wind erosion obstacle extraction process by extending them to the next respective edge (Fig. 10 b, dark lines). This process of alignment is similar to the topological error correction of inaccurately produced maps (Ubeda et al., 1997), whereas here these objects are not inaccurately acquired; rather, tree rows or hedges often end a short distance from roads, rivers or railways.
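The merging of adjacent regions by removing non-network edges can be sketched as a simple union-find over a region adjacency list; reducing the topological model to such an adjacency list is an illustrative simplification of the embedded graph structure described above.

```python
from typing import Dict, List, Tuple

def merge_regions(regions: List[str],
                  shared_edges: List[Tuple[str, str, bool]]) -> Dict[str, int]:
    """shared_edges: (region_a, region_b, edge_belongs_to_separating_network)."""
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]     # path halving
            r = parent[r]
        return r

    for a, b, is_network in shared_edges:
        if not is_network:                    # remove the edge -> merge the neighbours
            parent[find(a)] = find(b)

    groups: Dict[str, int] = {}
    return {r: groups.setdefault(find(r), len(groups)) for r in regions}

# two parcels split only by an administrative boundary, a third one across a road
print(merge_regions(["f1", "f2", "f3"],
                    [("f1", "f2", False), ("f2", "f3", True)]))
# -> {'f1': 0, 'f2': 0, 'f3': 1}: two regions of interest remain
```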

4.2.3 Extraction of field boundaries
The extraction of field boundaries starts with a segmentation within each region of interest, exploiting the modelled similar characteristics of each field. The border area of each region is masked out due to disturbing heterogeneities, which are typical for fields and deteriorate the subsequent steps. A multi-channel region growing is carried out using the RGB and IR channels of the images with a resolution of a few meters. The four channels give rise to a 4-dimensional feature vector: neighbouring pixels are aggregated into the same field region if the difference of their feature vectors does not exceed a predefined threshold. In concert with the modelled constraints, the resulting field regions must have a minimum size. The case of identical vegetation of neighbouring fields may lead to missing boundaries. In order to overcome this problem, the standard deviation of the grey values in the image within a quadratic mask is computed, i.e. high values typically belong to field boundaries. Extracted lines from the standard deviation image within sufficiently large field regions are evaluated concerning length and straightness. Positively evaluated lines are used to split the initially generated field regions. The segmentation thus leads to topologically correct but geometrically inaccurate results. Network snakes are used to improve the geometrical correctness of the preliminary field boundaries while maintaining the topological constraints. Snakes were originally introduced in Kass et al. (1988) as a mid-level image analysis algorithm which combines geometric and/or topologic constraints with the extraction of low-level features from images. A traditional snake is a parametric curve (Kass et al., 1988; Butenuth and Heipke, 2005)

v(s,t) = (x(s,t), y(s,t))   (2)

where s is the arc length, t the time, and x and y are the image coordinates of the 2D curve. The image energy is defined as

E_I(v) = − 1/|v| ∫ |∇I(v(s,t))| ds   (3)

where I represents the image, |∇I(v(s,t))| is the norm of the gradient magnitude of the image at the coordinates x(s) and y(s), and |v| is the total length of v. In practice, the image energy E_I(v) is computed by integrating the values |∇I(v(s,t))| in precomputed gradient magnitude images along the line segments that connect the polygon vertices. The internal energy is defined as

E_v(s,t) = 1/2 ∫ ( α(s) |∂v(s,t)/∂s|² + β(s) |∂²v(s,t)/∂s²|² ) ds   (4)

where the function α(s) controls the first-order term of the internal energy: the elasticity. Large values of α(s) let the contour become very straight between two points. The function β(s) controls the second-order term: the rigidity. Large values of β(s) let the contour become



smooth, small values allow the generation of corners. α(s) and β(s) need to be predefined based on experimental data and experience. The total energy of the snake, to be minimised, is defined as E_snake = E_v(s,t) + E_I(v). A minimum of the total energy can be derived by embedding the curve in a virtual viscous medium and solving the equation

γ ∂v(s,t)/∂t + ∂E_v(s,t)/∂v + κ ∂E_I(v)/∂v = 0   (5)

where γ is the viscosity of the medium and κ is the weight between internal and image energy. After substituting

∂E_v(s,t)/∂v = A · v(s,t)   (6)

into equation (5), a solution for the contour at time t depending on time t−1 can be computed:

V_{s,t} = (A + γ I)⁻¹ (γ V_{s,t−1} − κ ∂E_I/∂V (V_{s,t−1}))   (7)

(I: identity matrix). V_{s,t} stands for either X or Y, the vectors of the x and y coordinates of the contour. A is a pentadiagonal matrix, which depends only on the functions α(s) and β(s). A main problem of snakes is the necessity to have an initialisation close to the true solution. Methods to increase the capture range of the image forces are not useful in our case, because there are many disturbing structures within the fields which can cause an unwanted image energy and therefore a wrong result. Thus, only the local image information is of interest. As described above, the result of the segmentation is used to initialise the processing. In addition to the good initialisation, the derivation of the topology of the initial contours is most important. The global framework of the accomplished segmentation gives rise to a network of the preliminary field boundaries: enhancing traditional snakes, network snakes are linked to each other in the nodal points and thus interact during processing (cf. Fig. 13 b). Similarly, the connection of the end points of the contours to the borders of the region of interest must be taken into account: in contrast to the nodal points, a movement of the


end points is only allowed along the borders of the regions of interest. These topological constraints are considered when filling the matrix A (see equation (7)) with the functions α(s) and β(s), which in our case are taken to be constant.
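Equation (7) translates into a simple iterative scheme. The following Python sketch is a minimal illustration under stated assumptions (a closed contour, constant α and β, nearest-pixel sampling of a precomputed gradient-magnitude image); it is not the network-snake implementation with nodal-point and end-point constraints described above, and all names are hypothetical.

import numpy as np

def internal_matrix(n, alpha=0.1, beta=0.05):
    """Pentadiagonal matrix A for a closed contour with constant alpha and beta."""
    A = np.zeros((n, n))
    stencil = np.array([beta, -alpha - 4 * beta, 2 * alpha + 6 * beta, -alpha - 4 * beta, beta])
    for i in range(n):
        for k, v in zip(range(i - 2, i + 3), stencil):
            A[i, k % n] += v                  # wrap around: closed contour
    return A

def snake_step(x, y, grad_mag, A, gamma=1.0, kappa=0.5):
    """One iteration of equation (7): V_t = (A + gamma*I)^-1 (gamma*V_{t-1} - kappa*f)."""
    gy, gx = np.gradient(grad_mag)            # derivatives of the gradient-magnitude image
    xi = np.clip(x.round().astype(int), 0, grad_mag.shape[1] - 1)
    yi = np.clip(y.round().astype(int), 0, grad_mag.shape[0] - 1)
    fx, fy = -gx[yi, xi], -gy[yi, xi]         # derivative of the image energy at the vertices
    inv = np.linalg.inv(A + gamma * np.eye(len(x)))
    return inv @ (gamma * x - kappa * fx), inv @ (gamma * y - kappa * fy)

Applied repeatedly, the vertices are pulled towards high gradient magnitudes while the matrix A keeps the contour smooth, which is the behaviour exploited for the preliminary field boundaries.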

4.2.4 Extraction of wind erosion obstacles
The extraction of wind erosion obstacles is concentrated on the open landscape, as pointed out in the semantic model. No other GIS data (road, river, railway) are used, i.e. no prior geometric information reduces the search area, in order to acquire all tree rows and hedges within the open landscape. A texture segmentation is accomplished in the CIR images with a resolution of a few meters, yielding the texture classes tree/hedge, settlement area and agricultural area; for details concerning the approach cf. Gimel´farb (1996). The training images are generated manually by a human operator. The texture class of interest, tree/hedge, is, as expected, fragmented and not complete. Therefore, the elongated and small regions of the class are vectorised. The starting points are the left and right boundaries of these regions: centrelines are then computed, which are evaluated concerning length and straightness. Currently, the third dimension, as described in the semantic model, is not used to extract the wind erosion obstacles due to a missing digital surface model.
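The evaluation of the vectorised centrelines by length and straightness can be sketched as follows. The thresholds are illustrative assumptions, not the values used in the project.

import math

def accept_centreline(points, min_length=50.0, min_straightness=0.8):
    """points: (x, y) vertices of a candidate tree-row/hedge centreline, in metres."""
    path_len = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    if path_len < min_length:
        return False
    straightness = math.dist(points[0], points[-1]) / path_len   # 1.0 = perfectly straight
    return straightness >= min_straightness

# A gently curved 60 m centreline is accepted, a short 10 m stub is rejected.
print(accept_centreline([(0, 0), (30, 2), (60, 0)]))   # True
print(accept_centreline([(0, 0), (10, 0)]))            # False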

5. Results
In this section some results from test areas in northern Germany are presented. First, results are shown to demonstrate that the alignment and change detection for the updating of vector data sets can be performed with a high degree of automation. Second, results of the extraction of field boundaries and wind erosion obstacles are highlighted to demonstrate the capability of the described methods.

5.1 Results of the vector data integration
In Fig. 11 the results of the different alignment


Figure 11: Result of the approach, GK 25 (thin, dotted line) aligned on the reference German digital topographic map (ATKIS, dark lines).

methods can be seen. The ICP algorithm using an iterative four-parameter transformation is very suitable for the alignment of objects which already have a similar geometry. The alignment parameters resulting from the ICP algorithm can give a first hint whether the geometric discrepancies are due to map creation and acquisition methods (a., d.) or to changes which occurred to the real world object (c.). The scale factor calculated for the alignment of object c. was rated as too large and therefore no alignment was performed. Of course, changes of the topography cannot be discovered by simple evaluation of these parameters. For object b. the algorithm achieved a best fit with four parameters below certain thresholds, but the remaining differences between the geometries still have to be corrected. The DIA implementation showed very good results in compensating local discrepancies which cannot be corrected using the four-parameter ICP, as the latter aims for the best alignment of the whole object. There is no single four-parameter transformation which is capable of adjusting large extended natural objects like rivers to their corresponding partners in ATKIS so that parts e, f and h would be properly aligned. The results do exhibit some small gaps between the geometries (see e.g. area (g)): this is due to the fact that the DIA

algorithm in the current version works only with existing vertices, without inserting additional ones. In order to identify possible changes between the objects in the different representations, an intersection of corresponding objects is used for the change detection after the alignment process has been completed. The intersection is performed on all types of topographic elements which are represented in the data sets. The results of the intersection process are evaluated and, according to their semantic attribution, sorted into three different classes (see the sketch below):
- Type I: the segment has the corresponding semantic attribute in both data sets; no adaptation is required.
- Type II: the segment has different semantic attributes, but suitable information can be derived from the reference data set; the candidate data set will be updated.
- Type III: the segment has different semantic attributes, and suitable information cannot be derived from the reference data set. Manual interaction is required.
Type II will also be assigned to objects which are represented in the reference but not in the candidate data set; this is the result of different updating periods between the reference and the candidate data set, which results in



outdated objects. While Type I and Type II segments require only geometric corrections or attribute adaptation and can be handled automatically, Type III needs more of the operator's attention. Depending on the size and the shape of a Type III segment and by using a user-defined threshold, these segments can be filtered and removed, and the remaining gap can be corrected automatically; this avoids the integration of sliver polygons and of segments which are only the result of geometric discrepancies.
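The three-class decision on the intersection segments, including the size/shape filtering of Type III candidates, could look roughly like the following Python sketch. Function name, attribute handling and thresholds are illustrative assumptions and not the project's code; in particular, "reference attribute missing" is used here as a stand-in for "no suitable information derivable from the reference".

def classify_segment(ref_attr, cand_attr, area, compactness,
                     min_area=25.0, min_compactness=0.1):
    """Classify one intersection segment from the reference/candidate overlay."""
    if cand_attr is not None and ref_attr == cand_attr:
        return "Type I"      # identical semantics: no adaptation required
    if ref_attr is not None:
        return "Type II"     # attribute (or missing object) can be taken from the reference
    # semantics differ and cannot be resolved from the reference data set
    if area < min_area or compactness < min_compactness:
        return "sliver"      # small/elongated segment: filter out, close gap automatically
    return "Type III"        # genuine conflict: manual interaction required

print(classify_segment("forest", "forest", 400.0, 0.6))   # Type I
print(classify_segment("forest", "water", 400.0, 0.6))    # Type II
print(classify_segment(None, "water", 8.0, 0.05))         # sliver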

5.2 Results of the raster and vector data integration
5.2.1 Results of the extraction of field boundaries
Results of the proposed strategy to extract field boundaries are presented in this section. The result of the first step, the segmentation, is shown in Fig. 12: the boundaries of the regions of interest are depicted in black, the preliminary field boundaries in white. Compared to reference data, the completeness of the segmentation within a test area of 25 km2 (Lower Saxony, northern Germany) is 73 %, the correctness is 82 %, and the rms error computed

by considering the horizontal deviation between the extracted and the reference result is 5.8 m or 3 pixels. The quality of the results is promising, but as expected the geometrical correctness is not very high. One region of interest is selected to demonstrate the methodology of the network snakes (cf. Fig. 13 a-d): the initialisation of the snake – equivalent to the result of the segmentation – is shown in the first figure. The topology is pointed out in Fig. 13 b): the individual snakes forming the network are linked to each other in the nodal point (black), and the end points (black with white hole) are linked to the boundary of the region of interest. The movement of the snake superimposed on the standard deviation image is shown in Fig. 13 c), the final result superimposed on the real image in Fig. 13 d). The example demonstrates that network snakes are a useful tool to improve the geometrical correctness of topologically correct but geometrically inaccurate results.
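The quality figures quoted above (completeness, correctness, rms error) can be computed from a comparison of extracted and reference boundaries. The Python sketch below illustrates the idea under the simplifying assumption that the matching decision per line segment is already available; it is not the evaluation tool used in the project.

import math

def quality_measures(extracted, reference):
    """extracted/reference: lists of (length_m, matched_flag, distance_m) per segment."""
    ref_total = sum(l for l, _, _ in reference)
    ref_matched = sum(l for l, m, _ in reference if m)
    ext_total = sum(l for l, _, _ in extracted)
    ext_matched = sum(l for l, m, _ in extracted if m)
    dists = [d for _, m, d in extracted if m]
    completeness = ref_matched / ref_total     # share of the reference that was found
    correctness = ext_matched / ext_total      # share of the extraction that is correct
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    return completeness, correctness, rms

print(quality_measures([(100, True, 4.0), (50, False, 0.0)],
                       [(120, True, 4.0), (40, False, 0.0)]))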

5.2.2 Results of the extraction of wind erosion obstacles
The result of the texture segmentation is presented in Fig. 14: the class tree/hedge is depic-

Figure 12: Result of the segmentation.



Figure 13: Results of the use of network snakes: a) initialisation, b) building the topology, c) initialisation (white) and movement of the snake (black), d) extracted field boundaries.

ted in white, the class agricultural area in light grey and the class settlement area in dark grey. The parts of the image which do not belong to the open landscape, according to the prior GIS knowledge, are marked in black. The fragmented class tree/hedge is vectorised, yielding the wind erosion obstacles, as depicted in Fig. 15 in white. The texture segmentation works well, but an additional digital surface model is needed to improve and stabilise the results.

6. Conclusions
The geospatial federated database has provided the expected access to the involved data sets and to the results of the matching and extraction processes. It provides a basis not

only for querying linked objects but also for update propagation. Appropriate data structures like topological data models offer further approaches to assure topological consistency during geometric alignment and to accomplish a structural graph-based matching. The geometric comparison and the derivation of object links, together with the ICP and DIA alignment followed by rubber-sheeting and the evaluation process, show good results. So far this strategy was used with one data set as reference which remains unchanged, while a second data set is adjusted, but it can also be adapted to other vector-based conflation tasks requiring an intermediate geometry. Depending on the selected thresholds, large discrepancies of the shape boundaries can be considered



Figure 14: Result of the texture segmentation.

Figure 15: Result of the extracted wind erosion obstacles.

as outliers and can be treated accordingly in the subsequent overlay and analysis step. While matching can be performed automatically, there are still some steps during geometric alignment and change detection which require the decision of a human operator, but the high degree of automation reduces the manual process considerably. Future work will concentrate on developing a strategy to also automate these processes. Especially the selection of the appropriate alignment method and the corresponding thresholds will be enhanced. The method integrating raster and vector data by means of the extraction of field boundaries and wind erosion obstacles from imagery exploiting prior GIS knowledge has also shown promising results. Concerning the extraction of field boundaries, the basic step of the strategy, the segmentation, could be enhanced by using an additional texture channel to prevent wrong field boundaries, which occur when there are large heterogeneities within a field. The control of the network snakes could be improved by selecting variable values when filling the matrix A to increase the geometrical correctness; the use of network snakes provides a topologically consistent solution. Regarding the extraction of wind erosion obstacles, initial results show the potential, but also the limitations of the current approach. The use of a digital surface model will probably be very helpful to achieve better results. Finally, the com-

bined evaluation of the field boundaries and wind erosion obstacles will identify discrepancies between the different extracted objects, resulting in a more consistent and integrative final result.


References
Baltsavias, E.P. (2004): Object Extraction and Revision by Image Analysis Using Existing Geodata and Knowledge: Current Status and Steps towards Operational Systems. ISPRS Journal of Photogrammetry and Remote Sensing 58 (3-4), 129-151.
Batini, C., Lenzerini, M., Navathe, S.B. (1986): A Comparative Analysis of Methodologies for Database Schema Integration. ACM Comput. Surv. 18 (4), 323-364.
Beeri, C., Doytsher, Y., Kanza, Y., Safra, E., Sagiv, Y. (2005): Finding Corresponding Objects when Integrating Several Geo-Spatial Datasets. Proc. 13th ACM International Symposium on Advances in Geographic Information Systems, Bremen, Germany, 4-5 November 2005, 87-96.
Besl, P., McKay, N. (1992): A Method for Registration of 3-D Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence (Special issue on interpretation of 3-D scenes part II) 14 (2), 239-256.


Butenuth, M. (2004): Modelling the Extraction of Field Boundaries and Wind Erosion Obstacles from Aerial Imagery. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXV (Part B4), 1065-1070.
Butenuth, M., Heipke, C. (2005): Network Snakes-Supported Extraction of Field Boundaries from Imagery. In: Kropatsch, Sablatnig, Hanbury (Eds), 27th DAGM Symposium, Wien, Österreich, Springer LNCS 3663, 417-424.
Conrad, S. (1997): Föderierte Datenbanksysteme. Springer-Verlag, Berlin.
Devogele, T., Parent, C., Spaccapietra, S. (1998): On spatial database integration. International Journal of Geographical Information Science 12 (4), 335-352.
Doytsher, Y. (2000): A rubber sheeting algorithm for non-rectangular maps. Computers & Geosciences 26 (9-10), 1001-1010.
Egenhofer, M.J., Frank, A.U., Jackson, J.P. (1989): A Topological Data Model for Spatial Databases. Lecture Notes in Computer Science 409, 271-286.
Gimel´farb, G.L. (1996): Texture modelling by multiple pairwise pixel interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (11), 1110-1114.
Goesseln, G. v., Sester, M. (2004): Integration of geoscientific data sets and the German digital map using a matching approach. International Archives of Photogrammetry and Remote Sensing 35 (Part 4B), 1249-1254.
Hettwer, J., Benning, W. (2000): Nachbarschaftstreue Koordinatenberechnung in der Kartenhomogenisierung. Allg. Verm. Nachr. 107, 194-197.
Hill, D.A. and Leckie, D.G. (Eds.) (1999): International forum: Automated interpretation of high spatial resolution digital imagery for forestry. February 10-12, 1998, Natural

Resources Canada, Canadian Forest Service, Pacific Forestry Centre, Victoria, British Columbia.
Kass, M., Witkin, A., Terzopoulos, D. (1988): Snakes: Active Contour Models. International Journal of Computer Vision 1, 321-331.
Kohonen, T. (1997): Self-Organizing Maps. Springer.
Kleiner, C., Lipeck, U., Falke, S. (2000): Objekt-Relationale Datenbanken zur Verwaltung von ATKIS-Daten. In: Bill, R., Schmidt, F.: ATKIS Stand und Fortführung, Verlag Konrad Wittwer, Stuttgart, 169-177.
Laurini, R. (1998): Spatial multi-database topological continuity and indexing: A step towards seamless GIS data interoperability. International Journal of Geographical Information Science 12 (4), 373-402.
Löcherbach, T. (1998): Fusing Raster- and Vector-Data with Applications to Land-Use Mapping. Inaugural-Dissertation der Hohen Landwirtschaftlichen Fakultät der Universität Bonn.
Mantel, D., Lipeck, U.W. (2004): Datenbankgestütztes Matching von Kartenobjekten. In: Arbeitsgruppe Automation in der Kartographie - Tagung Erfurt 2003, BKG, Frankfurt, 145-153.
Mayer, H. (1999): Automatic Object Extraction from Aerial Imagery - A Survey Focusing on Buildings. Computer Vision and Image Understanding 74 (2), 138-149.
Rigaux, P., Scholl, M., Voisard, A. (2002): Spatial Databases with Application to GIS. Morgan Kaufmann Publishers.
Sattler, K.-U., Conrad, S., Saake, G. (2000): Adding Conflict Resolution Features to a Query Language for Database Federations, 41-52.
Sester, M., Hild, H. & Fritsch, D. (1998): Definition of Ground-Control Features for



Image Registration using GIS-Data. In: Schenk, T. & Habib, A. (Eds.), IAPRS 32/3, ISPRS Commission III Symposium on Object Recognition and Scene Classification from Multispectral and Multisensor Pixels, Columbus/Ohio, USA, 537-543.
Torre, M., Radeva, P. (2000): Agricultural Field Extraction from Aerial Images Using a Region Competition Algorithm. International Archives of Photogrammetry and Remote Sensing XXXIII (Part B2), 889-896.
Ubeda, T., Egenhofer, M.J. (1997): Topological Error Correcting in GIS. In: Advances in Spatial Databases, 5th International Symposium, SSD '97.
Walter, V. & Fritsch, D. (1999): Matching Spatial Data Sets: a Statistical Approach. International Journal of Geographical Information Science 13 (5), 445-473.
Yuan, T., Tao, C. (1999): Development of conflation components. In: Li, B., et al. (Eds.), Geoinformatics and Socioinformatics – The Proceedings of Geoinformatics ´99 Conference, Ann Arbor, USA, 19-21 June 1999, 1-13.





ISSNEW – Developing an Information and Simulation System to Evaluate Non-point Nutrient Loading into Waterbodies

Dannowski R. (1), Arndt O. (2), Schätzl P. (2), Michels I. (2), Steidl J. (1), Hecker J.-M. (1), v. Waldow H. (1), Kersebaum K.-C. (1)

(1) Leibniz-Centre for Agricultural Landscape Research (ZALF), Eberswalder Straße 84, D-15374 Müncheberg, Germany, rdannowski@zalf.de
(2) WASY GmbH Institute for Water Resources Planning and Systems Research, Waltersdorfer Straße 105, D-12526 Berlin, mail@wasy.de

1. Introduction
Since it came into effect on 22 December 2000, the European Water Framework Directive (WFD 2000) has been implemented as an important element of Community action in the area of water resources protection and management. This entails a number of new requirements to be met by the water management administrations. Not only does additional information about the state of water resources need to be collected and systematically prepared but also, extending on this data, all relevant waterbodies in the EU member states had to be initially documented and assessed by the end of 2004. If a waterbody fails the parameters of a good ecological status, steps appropriate to meet the WFD requirements are to be specified and undertaken. Measures should be introduced based on efficiency and differentiated according to specific place and time. They will be summed up in river basin management plans to be set up by 2009, with the WFD targets to be fulfilled no later than 2015. Also incorporated in these river basin management plans will be an area-wide summary of significant pressures and impacts of human activity on the status of surface water and groundwater, comprising, amongst others, the estimation of diffuse-source pollution including a summary of land use. All these activities are to be carried out within


a governmentally controlled procedure undergoing a multi-step process of open participation from the public. In determining the appropriate measures, tools must be available that explicitly address space and time in allocating the courses of action, e.g. via scenario analysis techniques based on distributed and process-oriented models. In doing so, relevant information should be factored in as extensively as possible. The WFD unconditionally stipulates river-basin-related planning regions, which as a rule are not congruent with the traditional administrative structure of Germany. From the perspective of IT, this entails such topics as multi-user access, client/server architectures, and data storage, among many others. The presented ISSNEW project concerned the preparation of software components meeting essential WFD parameters in the form of a market-ready product family to supply many of the current WFD obligations with efficient solutions. This has been exemplified by nitrogen emission analyses, which track the subsurface transport processes of excess N from the plant root zone via the unsaturated zone and the groundwater zone to the surface waterbodies. Potential users are state and federal environmental agencies responsible for planning measures to meet the WFD targets.


ISSNEW was planned and executed as a two-year cooperative project. The partners were two institutes of a research centre (which was also the leading partner), experienced in hydrologic modelling and software design, and a consulting and software developing enterprise. In this respect, the project work itself was an experiment, quite apart from its intent to unite the philosophies of open-source and proprietary software development.

2. Objectives of the Project
ISSNEW was intended to unleash the information technology available in the fields of water resources planning and management by means of a modular information and simulation system. In particular, it was directed at introducing and providing modern information technology in the course of evaluating and planning measures for river basin management. Thus the project aimed at developing the following software components:
1. A GIS- and database-based information system for the gathering, structuring and visualisation of geodata and simulation results on non-point nutrient input into waterbodies (groundwater and surface waters).
2. A simulation system for the evaluation of the effects on water quality offered by measures against non-point nutrient flow from agricultural lands and non-point nutrient input into waterbodies.
3. Bi-directional intelligent interfaces between the information system and the simulation system.
In view of the new questions arising from implementing the WFD, the software system to be developed was meant to be utilisable as a decision support tool. In particular it had to be qualified for drawing regionally differentiated conclusions on the risks from non-point nitrogen sources to aquatic ecosystems as well as for specifying and evaluating measures of preventing those risks. Consequently, through the components themselves and especially through their effective collaboration supported by large geodatabases, the following goals were to be attained:
- A platform-independent, completely component-based software system for data storage, analysis, and presentation, extending on a standardised information structure and taking into account the importance of simulation models.
- The growing stocks of geodata, both those already existing and those still to be collected by water management institutions and services in river basins, to be supplied performantly by the implemented information system, improved, and optimally further processed.
- Against the background of the WFD implementation, an improved management of knowledge to be made possible through the integration of space-time-based, scenario-capable modelling software for non-point nutrient input into the groundwater and into the surface waters.

3. Results
3.1 The information system
The software ArcWFD (now termed WISYS) developed by WASY forms the basis for the ISSNEW information system. ArcWFD/WISYS is used by numerous government agencies all over Europe to implement the WFD. Basic information about ArcWFD/WISYS is available from www.arcwfd.com. ArcWFD/WISYS is based on the geographic information system software ArcGIS developed by ESRI. It consists of an extensive database structure for storing WFD-related data and additional tools in ArcGIS. The database structure has been extended for purposes within the ISSNEW project. It allows efficient handling and analysis of all basic data. An ESRI personal geodatabase is used within ISSNEW for data storage.



Figure 1: ArcWFD/WISYS User Interface

For ISSNEW, the following ArcWFD/WISYS tools are of major interest:
- Theme overview: theme activation according to user preferences and user permissions
- Theme and meta data manager: preselection for theme and meta data visualisation
For general data analysis and display the means of ArcGIS are used.

ISSNEW Object Model
Efficient data storage and transfer of basic data into the different simulation systems require a sophisticated normalised relational database system. The object-oriented ISSNEW data model has been developed in the Unified Modeling Language (UML), using the CASE tool Microsoft Visio. The ArcWFD/WISYS data model has been developed and optimised over several years. It completely implements the horizontal guidance document »Water Bodies« (2002) as well as the guidance document »Implementing the GIS Elements of the WFD« (2002). The ArcWFD object model contains around 300 classes with approximately 3,000


different attributes, relations, value domains and rules. The ISSNEW object model consists mainly of two parts:
1. Basic data
2. Model data for newSocrates and UZModule
In the »basic data« section all relevant basic data can be stored. Class structures and attributes correspond to the data sampling formats. The content of the »model data« section is derived from the basic data and provides the simulation systems with the data structures they require for modelling. In Figure 2 a cutout of the »model data« object model is shown. From the ISSNEW-related basic data section, the hydrogeologic data is taken as an example. Storing 2D and 3D hydrogeologic data in geodatabases has become an increasingly important research topic in recent years. There are several initial ideas for data structures and standardisation, for example by Strassberg & Pierce (2001) or Strassberg & Maidment (2004).


Figure 2: ISSNEW object model: Model data (cutout)

As the corresponding data models were developed in parallel with ISSNEW, and as they were not directly applicable to the ISSNEW project, a completely new object model had to be developed. The idea was to primarily store original hard field data like borehole information, adding derived and interpreted data like stratigraphic information and model layers in a second step. When applying complex normalised data models, there is a strong need for convenient data import from existing data collections and for easy integration of additional sample data. Within ISSNEW, the software GeoDAta eXchange (by WASY GmbH) is used. GeoDAta eXchange allows the transfer of file-based data and data out of existing geodatabases into a normalised geodatabase structure. Linking source and destination attributes is done graphically (Figure 3), and during the data import the relations between the objects

are created automatically. For repeated import of data structures, numerous template projects for ISSNEW have been created.

Editor with Mask Generator
The Editor has been developed to efficiently enter and edit data in the GIS system. This tool offers diverse functionality for editing graphical and non-graphical objects in a geodatabase. All objects in all currently loaded layers, including all objects connected via relations, can be displayed and edited. Object selection can be done by attribute enquiry or by geometric selection. For editing, a specific form with interactive input features is provided, which is automatically adapted to the different object types (Mask Generator). Relations to other objects are visualised and can be edited, too; related objects can be directly edited. In Figure 4 the Editor with Mask Generator is shown.



Figure 3: GeoDAta eXchange import project

Figure 4: Editor with Mask Generator – user interface



Figure 5: ISSNEW UML Component Diagram

3.2 The simulation system
The ISSNEW software system is built up of a set of associated simulation components which are coupled with each other and with the information system via ISSNEW-specific interfaces (Figure 5). These simulation components are as follows:
- FEFLOW® (WASY): two-dimensional non-steady groundwater flow and nitrate transport through the saturated zone (FEFLOW is actually the main controlling component for simulation runs, using FEFLOW's interface manager)
- newSocrates (ZALF): stepwise steady-state soil-vegetation-atmosphere transfer (SVAT) of moisture and dynamic nitrogen transformation in soil (redesign of the open-source SOCRATES legacy code)
- UZModule (ZALF): one-dimensional (down/upward) non-steady water and nitrate flow in the deeper unsaturated zone (»trailing wave« approach combined with an enhanced upwind differencing algorithm – open-source programming)
- MODEST (ZALF): grid-based steady-state evaluation of the nitrate transport and transformation in the (un-)saturated zone (functionally extended C++ implementation of a GIS-based stand-alone tool for non-point N sources risk analysis).

Direct linking of simulation components and the information system »on the fly« permits detailed reproduction of the partially interrelated processes of leachate flow and nitrate transformation.

3.2.1 FEFLOW
FEFLOW is a commercial simulation system which has been developed by WASY GmbH for more than 20 years. It is used all over the world by government agencies, universities and research institutions, and private consulting companies. FEFLOW has been designed for the most complex groundwater simulations in water resources management, mining, environmental tasks, hydrogeology and geothermal energy exploitation. Interaction between groundwater and surface water bodies can be considered. FEFLOW allows the simulation of



very large models in high temporal and spatial resolution. FEFLOW is based on finite-element technology. Its modelling abilities encompass 2D and 3D simulation, saturated and unsaturated conditions, flow and mass/heat transport. FEFLOW’s open programming interface IFM enables interfacing to other simulation codes by linking user-written code to the commercial simulation engine. Within ISSNEW, FEFLOW is used to simulate 2D groundwater flow and nitrate transport in groundwater. It is also the controlling component for the simulation network, i. e., FEFLOW is used for starting and stopping the simulation and for controlling the time stepping procedure. The interfaces between the different simulation components are included into the FEFLOW user interface via IFM.

3.2.2 newSocrates
This soil-crop-atmosphere oriented regional model consists of (1) a plant growth model based on the evolon approach, starting from a yield estimate to dynamically calculate biomass accumulation (Mirschel et al. 2002), and (2) a coupled set of capacity models to simulate the water budget (Wegehenkel 2000) and nitrate transformation (Kersebaum 1995) in soil. Infiltration is calculated following an empirical approach; moisture content and flux are captured based on a cascade of non-linear reservoirs. Within the nitrogen model, net N mineralisation, denitrification and nitrate transport (as conveyed by vertical water flow) are simulated. The time step is fixed at one day; thus newSocrates determines the timing for the whole ISSNEW simulation system. The vertical domain for newSocrates use is confined to 2 m below surface within ISSNEW. The newSocrates module is organised into the blocks COM interface, database access (ODBC) and SVAT model, as well as the model-specific part of the ISSNEW-PGDB database. Consistent data preparation (intersecting spatial input information for defining topoi, filling

80

newSocrates-relevant tables of ISSNEW-PGDB) is an unconditional prerequisite for a successful model run.
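The capacity ("bucket") approach to the soil water budget, with water drained layer by layer through a cascade of reservoirs, can be sketched as follows. This is a hedged illustration of the general principle with hypothetical parameters; newSocrates itself is considerably more detailed (non-linear reservoirs, plant uptake, nitrogen transformation).

def percolation_step(moisture, field_capacity, infiltration):
    """One daily step of a simple capacity (bucket) cascade.
    moisture, field_capacity: per-layer water contents/capacities in mm.
    Returns updated moisture and the percolation leaving the lowest layer (mm/day)."""
    flux = infiltration                      # water entering the top layer
    for i in range(len(moisture)):
        moisture[i] += flux
        excess = max(0.0, moisture[i] - field_capacity[i])
        moisture[i] -= excess                # water above field capacity drains downward
        flux = excess
    return moisture, flux                    # flux = percolation below the profile

m, leach = percolation_step([20.0, 35.0, 40.0], [25.0, 40.0, 42.0], 15.0)
print(m, leach)   # [25.0, 40.0, 42.0] and 3.0 mm percolating below the root zone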

3.2.3 UZModule
UZModule was tailored from two components: KinFlow for vertical flux calculation in the deeper unsaturated zone (> 2 m below surface) and transportMPDATA for the associated nitrate transport. Unlike for the other simulation models, there was no prior development before the project start. Work was based on a thorough examination of the existing literature to identify the appropriate methodology for the prevailing flow conditions. KinFlow implements an analytic solution of the quasi-linear hyperbolic partial differential equation of moisture flow using a »trailing wave« approximation (Charbeneau 1984) for tracking a wetting front towards groundwater. In terms of computational cost and the physically correct description of the process, this approach takes the middle ground between conceptual models and models solving the more process-based Richards equation. transportMPDATA was designed to use the MPDATA algorithm (Multidimensional Positive Definite Advection Transport Algorithm – Smolarkiewicz 1983) to evaluate the advective transport of dissolved nitrogen through the unsaturated zone based on upwind differencing. Smolarkiewicz & Margolin (1998) presented an enhanced form of that algorithm which is inherently mass conservative, stable and simple to code and which provides the basis for nitrate transport simulation in UZModule. By the end of the project period, UZModule development was not yet finished. In particular, the transport component transportMPDATA is missing. Therefore, for ISSNEW test runs a dummy code had to be used for the time being.
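As a point of reference for what transportMPDATA was intended to provide, the basic first-order upwind step that MPDATA refines can be written in a few lines. The sketch below is a generic, hedged illustration (uniform grid, purely downward flow, simple boundary handling), not the enhanced algorithm of Smolarkiewicz & Margolin (1998) and not the project's code.

import numpy as np

def upwind_step(c, v, dz, dt):
    """First-order upwind advection of a concentration profile c (top to bottom)
    with non-negative downward velocity v; CFL condition v*dt/dz <= 1 assumed."""
    cr = v * dt / dz                            # Courant number
    c_new = c.copy()
    c_new[1:] = c[1:] - cr * (c[1:] - c[:-1])   # flux taken from the upwind (upper) cell
    c_new[0] = c[0] - cr * c[0]                 # no solute inflow at the top boundary
    return c_new

c = np.array([1.0, 0.0, 0.0, 0.0])              # nitrate pulse entering the column
for _ in range(3):
    c = upwind_step(c, v=0.5, dz=0.5, dt=0.5)
print(c)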

3.2.4 MODEST
MODEST reproduces the fundamental aspects of the underground dissolved N transport and


nitrate depletion in a spatially explicit, two-dimensional manner by means of coupled analytic and empirical calculation methods assuming steady-state flow conditions. The actual C++ implementation relies on a grid-based approximation of the advective, non-dispersive groundwater-borne solute transport. From a given groundwater table and known aquifer parameters, the model provides an evaluation of the transport and transformation dynamics of nitrates along the underground path – from entering the groundwater at any point of the model region towards discharging into the respective draining waterbody. Pathlines are calculated according to Pollock (1988). Compared to the preceding GIS implementation, MODEST was extended by the possibility of a spatially differentiated parameterisation of the denitrification module and a nitrate retention potential calculation. The model is applicable for large areas at virtually unlimited spatial resolution. Input data and the computation results current-

ly are still exchanged raster-based via a simple GIS interface (ASCII-GRID). Though not fully integrated into ISSNEW, MODEST represents a valuable addition to the much more sophisticated simulation system. It is recommended to be applied prior to running the system to reduce the modelling area of ISSNEW and, thus, to significantly shorten the expenses for parameterisation and computation.

3.3 The intelligent interface
The intelligent interface is the central core of the ISSNEW simulation system. It connects the information system and the simulation system and provides a graphical user interface for the entire system. Via the user interface, participating simulation components can be chosen, basic settings can be made, and the type and point in time of data exchange between modules can be defined.

Figure 6: ISSNEW parameter association



Scenario management
For the planning of measures within the scope of the WFD, scenario techniques have to be applied. The graphical user interface of the ISSNEW interface allows for handling different scenarios and scenario versions, which can be loaded, edited and saved. All settings for a prepared scenario are saved within the ISSNEW database. The main settings of a scenario are the participating modules (information system, simulation components), the workspace, the starting time of the simulation and the options for data transfer between different modules.

Data exchange
One of the most important functions of the ISSNEW interface is transferring data between the different components of the simulation. Data exchange is defined graphically by linking »source« modules with »destination« modules (Figure 6).

Each parameter association has specific properties like the moment of execution (e.g., before or after a time step) and possible options for regionalisation.

Regionalisation
Coupling different simulation components and the information system, it is necessary to deal with different geometric properties of the related data. For example, the simulation components newSocrates and UZModule run on a 1D basis for so-called topos polygons, whereas FEFLOW uses a nodal and elemental spatial discretisation. The ISSNEW interface provides the possibility to regionalise data during transfer. Two different regionalisation methods are currently supported: inverse distance weighting for point-related data and a polygon-to-polygon intersection algorithm for areal data (see the sketch below). Via the COM interface, additional regionalisation methods can easily be included. The Regionalisation Editor offers the possibility to limit the regionalisation to subsets of the entire geometry (Figure 7).

Figure 7: Regionalisation Editor
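Of the two regionalisation methods, inverse distance weighting is the simpler one; a minimal Python sketch is given below (hypothetical function name, fixed power of 2, no search radius), while the polygon-to-polygon intersection works analogously with area weights. This is an illustration of the principle, not the ISSNEW implementation.

def idw(target_xy, points, power=2.0):
    """Inverse distance weighting: interpolate a value at target_xy from
    (x, y, value) tuples, e.g. when transferring nodal results to a topos point."""
    num = den = 0.0
    for x, y, value in points:
        d2 = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
        if d2 == 0.0:
            return value                 # target coincides with a data point
        w = 1.0 / d2 ** (power / 2.0)    # weight = 1 / distance^power
        num += w * value
        den += w
    return num / den

print(idw((10.0, 10.0), [(0.0, 0.0, 1.0), (20.0, 0.0, 2.0), (10.0, 25.0, 4.0)]))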



Technical implementation
Designing the ISSNEW interface, the focus was on convenient and easy extensibility of the system. Therefore, Microsoft COM technology has been used to encapsulate all modules of the system. Interface definitions have been developed which have to be implemented by all participating components. There are three classes of modules:
- Simulation components participating in the simulation run
- Data providers
- Data consumers

Additional interface definitions, for example, allow for linking data sources and destinations or error handling. The IParticipator interface is required to participate in the ISSNEW simulation system and must be implemented by all components to be included. Its methods are used to control the simulation run. Data providers and data consumers use the IDataProvider and IDataConsumer interface definitions. Each of the ISSNEW modules can implement these interfaces, depending on whether data have to be provided or received. Data exchange between the components is realised using the IDataLink interface which contains references to the data source, the data destination and additional attributes. If source and destination geometries cannot be referenced via a common attribute, the IRegionalization interface is used to call a spatial interpolation routine. As both source and destination data can be organised in time series, an ITimeTable interface provides the functionality to access time series data. Scenario management is done using a specific model repository within the geodatabase. Using this repository, different applications are enabled to store independent scenario or

model definitions along with arbitrary model-specific data. To manage the corresponding database entries, a separate component called Storage Server has been developed which provides the interfaces to load and save data.
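The division of responsibilities between the interfaces can be illustrated in a language-neutral way by the Python sketch below. The method names are purely hypothetical and only indicate the kind of functionality described above; they are not the actual ISSNEW COM interface definitions.

from abc import ABC, abstractmethod

class IParticipator(ABC):
    """Every component taking part in a simulation run (hypothetical method names)."""
    @abstractmethod
    def initialise(self, scenario): ...
    @abstractmethod
    def do_time_step(self, t): ...
    @abstractmethod
    def finalise(self): ...

class IDataProvider(ABC):
    @abstractmethod
    def provide(self, parameter, t): ...            # hand data to a linked consumer

class IDataConsumer(ABC):
    @abstractmethod
    def consume(self, parameter, values, t): ...    # receive data, e.g. regionalised fluxes

class DataLink:
    """Rough counterpart of a data link: source, destination and transfer options."""
    def __init__(self, source: IDataProvider, destination: IDataConsumer,
                 parameter, regionalisation=None, when="after_step"):
        self.source, self.destination = source, destination
        self.parameter, self.regionalisation, self.when = parameter, regionalisation, when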

3.4 Pilot area, test scenario and test results
Usability, performance and validity of the software system were to be tested and proven by the example of a mesoscale pilot area under agrarian land use in the unconsolidated rock region of eastern Germany. After consulting the Brandenburg state environmental agency (Landesumweltamt Brandenburg), the area of a groundwater body (ca. 100 km2, Figure 8) was chosen, situated in the upland neighbourhood of the Oderbruch depression in eastern Brandenburg. This body of groundwater, »Oder 2« (ODR_OD_2), had been identified during the initial characterisation in the course of implementing the WFD as being »at risk of failing to meet the objectives of a good chemical status« (IKSO 2005), based on the detected concentration of nitrates in groundwater. For example, the nitrate concentration in the upper aquifer near the village of Reichenow has been relatively constant at about 150 mg/l, three times the maximum permissible value in drinking water. This fact corresponds very well with the predominantly low nitrate retention potential of that region, as depicted in a preliminary inspection by means of MODEST. Furthermore, the suitability of this groundwater body for testing the ISSNEW simulation system is deduced from the geohydrologic situation: depth to groundwater strongly varying between up to 70 m near the water divide and less than 2 m in the Oderbruch depression; streamlines predominantly parallelised towards the receiving surface waterbody (the channelised Old Oder river), though several production wells are also being operated near the town of Wriezen; partially unconfined groundwater conditions prevailing, combined with considerable infiltration potential of the unsaturated zone, but also, in other parts, confined or even temporarily water-bearing conditions; the aquifer system being pervaded by widespread, though in



Figure 8: ISSNEW pilot area »Oder 2« (ODR_OD_2), Eastern Brandenburg

places only local, aquitards. Altogether, this renders the pilot area suitable for proving the ISSNEW simulation system against a fairly wide spectrum of practical cases. Model runs performed during the ISSNEW working period integrated FEFLOW and newSocrates as a prototype simulation system. UZModule, however, was replaced by a dummy component. Like the simulation components, it operated on the basis of the topos geometry as filed in the model-specific part of the ISSNEW information system. This dummy component simulated a spatiotemporally variable flow of percolate (water and dissolved nitrate) through the deeper unsaturated zone. This offered the possibility of testing all the processes of simulation and data transfer to be executed, with the exception of the actual UZModule simulation. MODEST was excluded a priori from being integrated into the simulation system.


Technical test calculations were performed for a simulation period of 10 years and, for runs storing complete results, of 100 days. All parameter associations were performed without errors; read and write operations and the simulation run were finished without problems. The system ran stably, and no major oscillations occurred. The required simulation times were far below the expected ones. An AMD Athlon™ XP 2800+ system with 2.08 GHz and 512 MB RAM was used for the tests. For the polygon-to-polygon regionalisation necessary after initialisation in each time step, 300,000 data sets per second could be processed; for writing results for observation points to the database using IDW regionalisation, 12,000 values per second were written. FEFLOW took around 1.9 seconds for the simulation of a time step of 1 day. Generally, the coupling of simulation systems and the time-step based transfer of data bet-


ween the components leads to reasonable simulation times. A final estimation of the cost-value ratio, however, has to be postponed until after a real application. The results of the technical tests point to the usefulness of the concept in real model simulations.

4. Summary and conclusions
Growing stocks of geodata, and new demands against the background of the European Water Framework Directive (WFD) for regionally differentiated conclusions on risks to aquatic ecosystems from diffuse nutrient entries, as well as for suitable measures of preventing negative impacts from land use on the quality of water resources and the waterbodies, call for unleashing the available information technology by means of a modular information and simulation system. The ISSNEW joint project aimed at developing such a software system, also named ISSNEW, to be utilisable as a reliable decision support tool in the course of evaluating and planning measures for river basin management. By integrating an information system containing a geodatabase with components of a simulation system for modelling the nitrate loading of ground and surface waterbodies, ISSNEW provided the prerequisite for scenario-based analyses directed at the WFD targets. ISSNEW consists of the components GIS- and database-supported information system, modular simulation system, and bi-directional intelligent interface. The information system (ArcWFD/WISYS) assists efficient acquisition, maintenance, visualisation and analysis of geodata and simulation results for non-point nutrient emissions into waterbodies. The simulation system combines modules for water flow and dissolved nitrogen transport from the root zone (newSocrates) via the deeper unsaturated zone (UZModule) and the aquifer (FEFLOW) towards the waterbodies. Direct linking between simulation components and the information system permits detailed reproduction of interrelated processes. A grid-based stand-alone tool for N emissions risk analysis

(MODEST) is recommended to be run in advance, in order to reduce the area over which ISSNEW is applied and thus to shorten the expense of parameterisation and computation. With regard to the simulation components newSocrates, UZModule and MODEST, which were (further) developed by ZALF during the project, the objectives set for the application were achieved to a differentiated degree, and the principal compatibility of proprietary (FEFLOW, Microsoft COM) and open-source software development within one joint project was demonstrated. Problems arose especially at the seam between the two domains (i.e. the applicants ZALF and WASY GmbH): implementing interfaces, agreeing upon database structures and access routines, as well as accomplishing test runs on the example of the pilot area. For the latter, the lack of adequate shared personnel capacity was a particular handicap. The ZALF simulation components are available according to the open-source convention and are to be published online. The information system, as well as the prototype simulation system built up of modelling components in parts not yet fully tested and validated, were proven on the example of a 100 km2 pilot region in eastern Brandenburg. The ISSNEW project provided substantial expertise for the participants in the fields of coupling information and simulation systems, as well as advances in the development and availability of process-oriented simulation components for water flow and matter transport across the compartments root zone, deeper unsaturated zone, and groundwater zone.

Acknowledgements
The project has been funded in the frame of the programme GEOTECHNOLOGIEN of BMBF and DFG, BMBF Grant 03F0371 A, B.



References
Charbeneau, R.J. (1984): Kinematic models for soil moisture and solute transport. Water Resour. Res., 20(6): 699-706.
IKSO (2005): Internationale Flussgebietseinheit Oder. Merkmale der Flussgebietseinheit, Überprüfung der Umweltauswirkungen menschlicher Tätigkeiten und Wirtschaftliche Analyse der Wassernutzung. Bericht an die Europäische Kommission gemäß Artikel 15, Abs. 2, 1. Anstrich der Richtlinie 2000/60/EG des Europäischen Parlamentes und des Rates vom 23. Oktober 2000 zur Schaffung eines Ordnungsrahmens für Maßnahmen der Gemeinschaft im Bereich der Wasserpolitik (Bericht 2005). Koordination im Rahmen der Internationalen Kommission zum Schutz der Oder (IKSO), 169 S. + Anlagen.
Kersebaum, K.C. (1995): Application of a simple management model to simulate water and nitrogen dynamics. Ecological Modelling, 81: 145-156.
Mirschel, W., Wieland, R., Jochheim, H., Kersebaum, K.C., Wegehenkel, M. & Wenkel, K.-O. (2002): Einheitliches Pflanzenwachstumsmodell für Ackerkulturen im Modellsystem SOCRATES. In: Gnauck, A. (ed.): Theorie und Modellierung von Ökosystemen: 225-243, Aachen (Shaker).
Pollock, D.W. (1988): Semianalytical computation of path lines for finite difference models. Ground Water, 26(6): 743-750.
Smolarkiewicz, P.K. (1983): A simple positive definite advection scheme with small implicit diffusion. Mon. Weather Review, 111: 479.
Smolarkiewicz, P.K. & Margolin, L.G. (1998): MPDATA: A finite-difference solver for geophysical flows. J. Comp. Physics, 140: 459-480.
Strassberg, G. & Pierce, S. (2001): Arc Hydro Groundwater Data Model, Center for Research in Water Resources, University of Texas at Austin.


Strassberg, G. & Maidment, D.R. (2004): Arc Hydro Groundwater Data Model, AWRA Spring Specialty Conference »Geographic Information Systems and Water Resources III«, Nashville, Tennessee, 17-19 May 2004.
Wegehenkel, M. (2000): Test of a modelling system for simulating water balances and plant growth using various different complex approaches. Ecological Modelling, 129: 39-64.
WFD (2000): Richtlinie 2000/60/EG des Europäischen Parlaments und des Rates vom 23. Oktober 2000 zur Schaffung eines Ordnungsrahmens für Maßnahmen der Gemeinschaft im Bereich der Wasserpolitik. Amtsblatt der Europäischen Gemeinschaften, L 327, 22. Dezember 2000, 1-72.




Marine Geo-Information-System for Spatial Analysis and Visualization of Heterogeneous Data (MarGIS)

Schlüter M. (1), Schröder W. (2), Vetter L. (3), Jerosch K. (1), Peesch R. (2), Köberle A. (3), Morchner C. (1), and Fritsche U. (1)

(1) Alfred-Wegener-Institute for Polar and Marine Research, Am Handelshafen, Box 120161, D-27515 Bremerhaven, Germany, mschlueter@awi-bremerhaven.de, kjerosch@awi-bremerhaven.de, cmorchner@awi-bremerhaven.de, ufritsche@awi-bremerhaven.de
(2) University Vechta, Box 1553, D-49364 Vechta, Germany, wschroeder@iuw.uni-vechta.de, rpeesch@iuw.uni-vechta.de
(3) University of Applied Sciences, Box 110121, D-17041 Neubrandenburg, Germany, vetter@fh-nb.de, akoeberle@fh-nb.de

Introduction
Profound investigations of marine and terrestrial environments require the compilation of data sets that are extensive and complex with respect to the number of parameters and methods, for process-oriented and spatial analysis. Starting from the »endmembers« of pure research, such as the expeditions by RV Challenger (1872-1876), FS Gazelle (1874-1876), or FS Valdivia (1898-1899), and economically oriented surveys for fish or for oil and gas, the number and diversity of scientific and applied studies has increased considerably. Especially within the last two decades, developments of new sampling devices, in situ sensors, and mobile underwater platforms such as ROVs (Remotely Operated Vehicles), AUVs (Autonomous Underwater Vehicles) and Crawlers (mobile, wheel-driven underwater vehicles) provide new capabilities for marine research. The multitude of measured parameters and the quantity of information compiled during multidisciplinary research cruises requires new concepts for the management and spatial analysis of geodata. The term geodata summarizes biological, chemical, oceanographic or geological measurements, which are tied to the geographic coordinate (x, y, z, t) of the site of sampling or measurement. Geodata are the cornerstones for research


objectives, a requirement for the management of offshore resources, and for the implementation of national and international regulations such as the Water Framework Directive or nature conservation issues. Furthermore, they support decisions about the possible use of the coastal seafloor for economic demands such as offshore wind energy plants, sand and gravel mining, oil and gas exploration, or mariculture. Whereas on land such decisions are supported by the availability of thematic maps (e.g., about land use, soil type, fauna and flora, geology or hydrology) and environmental data, such compiled geoinformation is rather sparse for the marine environment. The increase of geodata derived by application of new marine technology and the demand for aggregated geoinformation for scientific and applied purposes were starting points for the MarGIS project. Objectives of MarGIS are the compilation of heterogeneous marine data, the development of an appropriate database model which is closely linked to a Geo-Information System (GIS), and the application of advanced statistical techniques to characterize and identify provinces at the seafloor. Target areas are the North Sea and parts of the Baltic Sea and Norwegian continental margin. For these purposes data on sedimentology, geochemistry,


benthic biology, fish stock, bathymetry or chemical oceanography were compiled.

Marine Geodata: Sites, Tracks, Areas, and Time Series
In marine research, sampling and data acquisition is conducted during cruises by research vessels and, to a lesser extent, by satellite or airborne remote sensing. During cruises, a variety of devices are applied for sampling of surface sediments and sediment cores (Fig. 1). Different types of equipment are used to catch fish, plankton, or benthic biota. In situ sensor packages are deployed for long-term measurements of water currents, concentrations of nutrients, suspended matter, or fluxes of particulate matter from the photic zone to the seafloor. Acoustic techniques such as multi-beam, side scan sonar, or shallow seismics are used for bathymetric surveys, habitat mapping, and investigation of sub-seafloor geology (Wright and Bartlett, 2000). High resolution still photographs and videos are recorded by towed devices and during dives by ROVs, submersibles, or Crawlers like the MOVE-System (MARUM, Univ. Bremen) for visual identification of geological or biological features as well as for inspection of infrastructure such as pipelines. For example, with the ROV Victor6000 (IFREMER, FR) about 250 million soundings were recorded by a high resolution multibeam system during two dives of 44 and 15 hours bottom time at the Haakon Mosby Mud Volcano (Jerosch et al., subm.; Klages et al., 2004). This allowed computation of a high resolution microbathymetry map with a depth resolution of better than 0.1 m and a footprint of 0.5 m at the seafloor. During just six dives of video surveys and video mosaicing, more than 4300 georeferenced mosaics were generated for classification of geochemical habitats. Furthermore, sediment and water samples were gathered and sensor packages recorded temperature, density, conductivity and concentrations of chemical constituents along transects. Consequently, a significant amount of geodata stored on DVDs was obtained within just about 2.5 weeks of ROV survey during one cruise. Considering that in most cases an area

of investigation is studied by several cruises, a significant amount of geodata is compiled. Obviously, data management and spatial analysis of geodata by GIS techniques combining several information layers, geostatistics, or image analysis (Jerosch et al., subm.) are upcoming requirements. For these purposes, storage of geoinformation on a »cruise-entity level«, for example in a file structure separating data derived during single cruises, is not efficient. Such a separation would cause considerable effort to find and merge geodata. Especially during a cruise, flexible and efficient access to data derived during previous expeditions or compiled from literature and reports might support decisions about sampling sites or track lines. A data model providing access on a »data value entity« level is therefore beneficial. An integrated approach supporting spatial analysis of marine geodata is required, which allows compilation of different data sets and data types (Fig. 1) within one database management system (DBMS). Basically, this encompasses that – in the best case – one query is able to retrieve a complex georeferenced data set which can be mapped and analysed within a Geo-Information-System (GIS). Besides measured or aggregated geodata, the description of the mode and techniques applied for sampling and chemical or physical analysis is essential for the evaluation of the data. Such metadata have to be stored within the geodatabase.
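What such "one query" access to a georeferenced data set could look like is sketched below with Python and SQLite. The table and column names, as well as the toy values, are purely illustrative assumptions and not the actual MarGIS_GDB schema or data.

import sqlite3

# Hypothetical mini-schema: sampling sites plus measured parameters per depth level.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE site (site_id INTEGER PRIMARY KEY, cruise TEXT, lat REAL, lon REAL);
    CREATE TABLE measurement (site_id INTEGER, depth_m REAL, parameter TEXT, value REAL);
    INSERT INTO site VALUES (1, 'cruise-A', 54.2, 7.9), (2, 'cruise-B', 55.1, 3.2);
    INSERT INTO measurement VALUES (1, 0.05, 'TOC', 1.4), (2, 0.05, 'TOC', 0.7);
""")

# A single query retrieves a georeferenced data set across all cruises,
# ready to be mapped and analysed in a GIS.
rows = con.execute("""
    SELECT s.lat, s.lon, s.cruise, m.depth_m, m.value
    FROM site s JOIN measurement m ON m.site_id = s.site_id
    WHERE m.parameter = 'TOC' AND m.depth_m <= 0.1
""").fetchall()
print(rows)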

Marine Data Model
The general demand for efficient and adequate data management initiated developments of Marine Data Models (MDM) at several institutes and companies. For example, the Marine Data Model Group intends to develop a UML-based, generic MDM scheme which is closely linked to the geo-object concept and to GIS software distributed by major IT companies. In some cases, this very promising concept has to be modified to cover specific scientific and administrative needs. For example, in Germany the BSH (Federal Maritime and Hydrographic Office) is establishing an advanced MDM for



Figure 1: During research cruises a multitude of sampling devices and sensors are applied to derive geological, chemical, oceanographic, or biological data. The multitude of measured parameters and the quantity of information compiled during multidisciplinary research cruises require new concepts for the management and spatial analysis of geodata. Photographs are kindly provided by MARUM (MOVE system), AWI (AUV Bluefin, in situ profiler), MPI Bremen (in situ profiler), and IFREMER (ROV Victor6000).

data management of a complex set of hydrographic measurements, results of numerical models, or information about currents and tides. With different focus, MDMs were developed or are in development at institutes like IFREMER (FR), the National Oceanography Centre Southampton (UK), Oregon State University (US), or Scripps (US). For the purposes of the MarGIS project, we started from the generic MDM scheme. Our aim was to integrate geochemical, geological, biological, and oceanographic data measured along vertical profiles or transects into this scheme to support spatial analysis and process-oriented investigations. The latter include studies on marine habitats, geochemical turnover processes, and sediment-water interaction. For these specific purposes we decided, after contacts and discussions with other research groups, not to fully adopt the geo-object approach of the generic MDM


scheme. Instead, we developed a data model which intends to bridge the gap between distributed, file-based data storage, geo-object related MDMs, and the elaborate DBMS used in long-term archives such as the world data centre PANGAEA. The MarGIS geodatabase (MarGIS_GDB) model is implemented on a client-server DBMS which allows integration of maps (vector and raster data), raw data, georeferenced images (GeoTIFFs), and metadata (Fig. 2). The central part of the data model are tables containing information about sampling sites and tracks studied during research cruises or obtained from the literature or reports. These tables are attributed as sites of measurements, locations of time series, or starting and end points of line features such as track lines, dives, or net hauls. Furthermore, data derived by multilevel sampling, by in situ sensors, and obtained along track lines and areas are stored.


Figure 2: Simplified scheme of the data model, integrating geological, geochemical, oceanographic, and biological information into a geodatabase.

In addition to raw data, aggregated information such as thematic maps on sedimentology or bathymetry was integrated into the relational database management system.
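To make the site and track tables tangible, the following sketch assembles point and line features of the kind described above and exports them as GIS layers. It assumes the geopandas library and hypothetical attribute names; it is an illustration only, not the ArcGIS/ArcSDE workflow used in MarGIS.

```python
import geopandas as gpd
from shapely.geometry import Point, LineString

# Hypothetical attribute names; coordinates in geographic WGS84 (EPSG:4326).
sites = gpd.GeoDataFrame(
    {"site_id": [1, 2], "device": ["multicorer", "CTD"]},
    geometry=[Point(7.9, 54.2), Point(6.3, 55.1)],
    crs="EPSG:4326",
)
tracks = gpd.GeoDataFrame(
    {"track_id": [10], "device": ["ROV dive"]},
    geometry=[LineString([(7.90, 54.20), (7.95, 54.23)])],
    crs="EPSG:4326",
)

# Export as GeoPackage layers that any desktop GIS can load alongside raster maps.
sites.to_file("margis_demo.gpkg", layer="sites", driver="GPKG")
tracks.to_file("margis_demo.gpkg", layer="tracks", driver="GPKG")
```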

Metadata
Comparison and analysis of heterogeneous data derived from different sources or obtained by a multitude of research cruises has to consider the different sampling techniques or analytical methods. These methods might change over the years. Metadata, providing information on where and how samples were obtained or which analytical methods were applied, are essential for data compilation and retrospective data analysis. For example, in the MarGIS_GDB metadata about cruises and dives are stored. This includes information about the mode of navigation (e.g. for ROV dives, navigation by USBL (ultra-short baseline) or inertial navigation systems), the coordinate system, or the geodetic datum (WGS84, ED50). Essential information about the technical specifications of sampling devices such as

sediment corers or water samplers, about in situ sensors, camera systems, and chemical methods for sediment and water analysis is archived as metadata as well. For aggregated information such as geological or geochemical maps, which were georeferenced, digitized, and integrated into the geodatabase, the data source, the geographic projection, and the applied geodetic datum are stored as metadata. In addition, information about the data quality or the applied classification scheme is stored. The latter is important, for example, for sedimentological data, where the Folk or the Wentworth classification is applied to describe grain size distributions. International initiatives established different formats and schemes for the description of geodata by metadata. These include the Dublin Core Metadata Element Set (a basic standard for resource description), the FGDC (Federal Geographic Data Committee) standard widely applied in the US, and ISO 19115 (metadata). For the MarGIS project we decided on ISO 19115, which includes information about the identification, the extent, the quality, the spatial and temporal scheme, the spatial



reference, and the distribution of digital geographic data (Kresse & Fadaie, 2004). A »Metadata ISO 19115 document« describes each parameter set compiled within MarGIS. The ISO standard organizes the structure of the metadata into core metadata and comprehensive metadata elements. The metadata are stored as XML documents in the database. As described subsequently, metadata are assigned to each information layer and are distributed in conjunction with the geodata by an Internet Map Server.
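A strongly simplified example of such an XML metadata document is sketched below. The element names follow the spirit of the ISO 19115 core elements but are abbreviated; they are assumptions for illustration and do not reproduce the full ISO encoding used in MarGIS.

```python
import xml.etree.ElementTree as ET

# Minimal, illustrative metadata record loosely following ISO 19115 core elements.
md = ET.Element("MD_Metadata")
ET.SubElement(md, "fileIdentifier").text = "margis-no3-bottom-water"   # hypothetical identifier
ident = ET.SubElement(md, "identificationInfo")
ET.SubElement(ident, "title").text = "Nitrate concentrations in bottom water, North Sea"
ET.SubElement(ident, "abstract").text = "Compiled from cruise reports and database retrievals."
extent = ET.SubElement(ident, "geographicExtent")
extent.set("westBoundLongitude", "-4.0")
extent.set("eastBoundLongitude", "12.0")
extent.set("southBoundLatitude", "51.0")
extent.set("northBoundLatitude", "61.0")
ET.SubElement(md, "referenceSystemInfo").text = "WGS84"

# The resulting XML string could be stored in a text/XML column of the geodatabase
# and served together with the corresponding information layer.
xml_doc = ET.tostring(md, encoding="unicode")
print(xml_doc)
```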

Geodata compiled in the MarGIS_GDB
The geodata compiled within the MarGIS project and integrated into the database were derived by an intensive search of the literature and reports, by retrievals of published geodata from marine database systems (MDBS), and through close co-operation with scientists from various research disciplines. A considerable amount of published data was retrieved from marine database systems established by international initiatives or by federal hydrographic and oceanographic agencies. Prominent examples are the database systems operated by ICES (International Council for the Exploration of the Sea) or the Marine Environmental

Database (MUDAB) initiated and operated by the German Federal Maritime and Hydrographic Office (BSH) and the Federal Environmental Agency (UBA). In total, information about the following parameters was compiled (Tab. 1): bathymetry, salinity, temperature, concentrations of oxygen, ammonium, nitrate, nitrite, phosphate, silicic acid, pH, alkalinity and suspended matter, data on benthic biology such as epibenthic and endobenthic organisms, fish populations, fish ages and lengths, and data on the geology and geochemistry of sediments. The latter include sediment maps and the spatial distribution of distinct features at the seafloor like pockmarks, seeps, and reefs. These compilations were supported by the METROL project (Methane Flux Control in Ocean Margin Sediments, http://www.metrol.org, coordinated by the Max Planck Institute for Marine Microbiology). In this context geoinformation about the methane cycle, gas-rich deposits, fault zones, the distribution of earthquakes, and source rocks for oil and gas was compiled. Furthermore, data about the use of the seafloor, such as pipelines, platforms, nature conservation areas, and sand and gravel mining, were compiled. The aggregation of heterogeneous geodata obtained from various sources required a rather laborious

Table 1: Overview of some of the parameters compiled within the MarGIS database. (BFA/IFOE: Federal Research Centre for Fisheries, Institute for Fishery Ecology; BFA/ISH: Federal Research Centre for Fisheries, Institute for Sea Fisheries; BSH: Federal Maritime and Hydrographic Office; EC: European Community project 98/021; CEFAS: Centre for Environment, Fisheries and Aquaculture Science; GFS: National Ground Fish Surveys; ICES: International Council for the Exploration of the Sea; IFM-HH: Institute of Marine Research, University of Hamburg; SBS/UWB: School of Biological Sciences, University of Wales, Swansea and Bangor.)



harmonization procedure. This was one prerequisite for the integration of data and metadata into the MarGIS_GDB linked to the Geo-Information System ArcGIS. Whereas the development of a database model suited for the incorporation of heterogeneous data, supporting analysis by GIS, and able to be operated during cruises was one objective of the project, it was not designed as a long-term archive. Such specific purposes are the domain of world data centres such as PANGAEA. For example, PANGAEA provides the required infrastructure and includes specific concepts for referencing data tables or citations by DOIs (Digital Object Identifiers).
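A single, hypothetical harmonization step of the kind required here is sketched below: two source tables with different column names, units, and station identifiers are mapped onto one common layout before being loaded into the geodatabase. The column names and the unit conversion are illustrative assumptions.

```python
import pandas as pd

# Two hypothetical source tables with inconsistent layouts and units.
src_a = pd.DataFrame({"Station": ["GB1"], "Lat": [54.2], "Lon": [7.9], "NH4_umol_l": [2.1]})
src_b = pd.DataFrame({"code": ["t-05"], "latitude": [55.1], "longitude": [6.3], "nh4_mg_l": [0.025]})

# Map both onto a common layout.
a = src_a.rename(columns={"Station": "site", "Lat": "lat", "Lon": "lon", "NH4_umol_l": "nh4_umol_l"})
b = src_b.rename(columns={"code": "site", "latitude": "lat", "longitude": "lon"})
# Convert mg/l to µmol/l using the molar mass of ammonium (~18.04 g/mol).
b["nh4_umol_l"] = b.pop("nh4_mg_l") * 1000.0 / 18.04

harmonised = pd.concat([a, b], ignore_index=True)
print(harmonised)
```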

Geostatistical analysis for identification of provinces at the seafloor
Objectives such as the characterisation of seafloor habitats by benthos biology or the computation of geochemical budgets such as sediment oxygen demand or the release of CH4 from the seafloor require extrapolation of measurements obtained at distinct sites or along tracks to areas. For these purposes geostatistical methods were applied. Originally stemming from geological research and applied to estimate mineral resources and reserves (Krige 1951; Matheron 1965, 1971), geostatistics is nowadays used in various terrestrial and marine fields of research. As far as marine research is concerned, geostatistical instruments have been applied by various scientific disciplines, e.g. pollution research (Poon et al. 2000), geology (Chihi et al. 2000; Pehlke 2005), biology (Harbitz & Lindstrøm 2001; Jelinski et al. 2002), or marine geochemistry (Schlüter et al. 1998; Jerosch et al. subm.). Compared to deterministic procedures like IDW (Inverse Distance Weighting), geostatistical methods take into account the degree of spatial autocorrelation when predicting measurements. Geostatistics can be subdivided into two working steps: variogram analysis and kriging procedures. With variogram analysis the autocorrelation structure of the underlying spatial process is examined and modelled. Variogram

maps can be used to detect directional dependencies, so-called anisotropies, in the data field. The variogram models are then used to predict measurement values by chosen kriging procedures (e.g. ordinary kriging). In the MarGIS project geostatistical analyses were carried out for ecologically relevant biotic and abiotic parameters such as grain size (0-20 µm, 0-63 µm, 20-63 µm and 63-2000 µm), bottom water temperature, salinity, and concentrations of dissolved constituents such as silicic acid, nitrate, phosphate, ammonium, or oxygen measured at or near the sea floor (Fig. 3). The measurement data were analysed with regard to three different areas of interest and grouped into different time intervals: the Exclusive Economic Zone (EEZ) of the North Sea (four three-month intervals aggregated over a six-year period from 1995 to 2000), the entire North Sea (summer and winter months aggregated over a three-year period from 1997 to 2000), and the western part of the Baltic Sea (summer and winter months aggregated over a three-year period from 1995 to 2000). Quality assurance and assessment of the derived maps were conducted by cross-validation and standard error maps. Additionally, if the semivariances displayed on the variogram map indicated anisotropies in the data field, different ranges for different directions were compared with each other. Depending on existing spatial trends and skewed value distributions, either ordinary, universal, or lognormal kriging was applied to spatially extrapolate the measurement data to raster maps. The quality of estimation was documented in terms of chosen key values derived from the results of cross-validation. Figure 3 depicts results of the geostatistical analyses, for example the ammonium concentrations measured at the sea floor of the North Sea in the summer months between 1998 and 2000 (Fig. 3D). The results of the variogram analysis reveal a distinct autocorrelation structure with a low nugget-sill ratio, indicative of low small-scale variability as well as strong spatial dependencies of the measurement values. With the help of the variogram map, anisotropies in the 53.3° direction were detected, resulting in a search ellipse that was applied



Figure 3: Examples of geodata compiled and mapped by geostatistical techniques for the North Sea. A: Digital bathymetric map of the North Sea, including the Exclusive Economic Zones. B: Distribution of sites where NO3 concentrations in bottom water are available. C: Sediment map compiled and computed for the North Sea and part of the Baltic Sea. D: Example of estimated ammonium concentrations in bottom water (summer months between 1998 and 2000) derived by variogram analysis and kriging.

in the subsequent ordinary kriging calculations. These maps are considered preliminary results, which have to be discussed with chemical oceanographers.
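To make the variogram-and-kriging workflow concrete, the following sketch interpolates synthetic bottom-water ammonium values by ordinary kriging using the pykrige package. The package choice, the fitted variogram model, and the data are assumptions for illustration; the anisotropic variogram analysis described above is not reproduced here.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Synthetic bottom-water ammonium values (µmol/l) at scattered stations.
rng = np.random.default_rng(1)
lon = rng.uniform(0.0, 9.0, 200)
lat = rng.uniform(53.0, 58.0, 200)
nh4 = 1.0 + 0.3 * (lat - 53.0) + rng.normal(0, 0.1, 200)

# Ordinary kriging with a spherical variogram model fitted to the experimental variogram.
ok = OrdinaryKriging(lon, lat, nh4, variogram_model="spherical", enable_plotting=False)

# Interpolate onto a regular grid; z holds the estimates, ss the kriging variance,
# which can be mapped as a standard error map for quality assessment.
grid_lon = np.linspace(0.0, 9.0, 60)
grid_lat = np.linspace(53.0, 58.0, 40)
z, ss = ok.execute("grid", grid_lon, grid_lat)
```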

Calculation of ecological sea floor provinces
On the basis of the geostatistically estimated surface maps, ecological sea floor provinces were calculated by means of multivariate statistical methods. For the EEZ of the North Sea as well as for the entire North Sea, predictive habitat mapping was performed with the help of the decision


tree algorithm Classification and Regression Trees (CART). The predictive habitat maps calculated for the EEZ of the North Sea as well as for the entire North Sea rely on point data on eight benthic communities collected at 184 sites within the German Bight and the bordering central North Sea (Rachor & Nehmer 2003). Predictive habitat mapping can be defined as the development of a numerical or statistical model of the relationship between environmental variables (bottom water measurements of salinity, temperature, dissolved components such as silicic acid, oxygen, phosphate and nitrate, or sedimentological data) and


Figure 4: Decision tree for the occurrence of eight benthic communities derived by Rachor & Nehmer (2003)

benthic biological communities. The methods used for predictive mapping vary widely, from statistical approaches (including geostatistics) to more complex methods such as expert systems and decision tree analysis (Kelly et al. 2005). In the MarGIS project, CART was applied to derive a classification model for the eight benthic communities investigated by Rachor & Nehmer (2003). CART is applied in various scientific disciplines to uncover hidden structures in complex data matrices. CART is a so-called tree-growing algorithm that produces decision trees to predict or classify the outcome of a certain feature (the target variable) from a given set of meaningful predictor variables (Breiman et al. 1984). A major advantage of this technique is its ability to model non-additive and non-linear relationships among input variables. In contrast to most classification techniques, such as cluster analysis or classical regression analysis, CART handles very large sets of mixed, i.e. both categorical and parametric, data without prior transformation of the measurement scale. The central goal of the CART algorithm is to produce classes that are homogeneous with respect to the features of the target variable. Depending on whether the target variable is of metric, ordinal, or nominal scale, different impurity measures exist. The Gini index is commonly used when the target variable is categorical, although other options exist (Steinberg

and Colla 1995). CART does not make any assumptions about the distribution of the data and is extremely robust with respect to special cases such as outliers or rare biotopes. Decision trees were computed to predict the occurrence of benthic communities from the intersected abiotic grid data. One decision tree each was computed for two sets of predictors and time intervals: the geostatistically estimated raster data for the German EEZ of the North Sea and for the entire North Sea. Figure 4 shows the nodes of the decision tree for the EEZ in terms of histograms, where each bar is representative of one of the eight communities. The CART analysis resulted in a decision tree grown in nine binary splits, leading to ten endnodes or classes. As can be seen, each decision tree starts with one root node containing all observations of the sample. By following the dendrogram from top to bottom it can be observed that the proportion of each benthic community increases stepwise. This leads to nine endnodes in which one of the eight communities is dominant (proportion > 75%). Since each of these endnodes is defined by a set of decision rules, the tree can be applied to predict the occurrence of benthic communities at places where no such information is available. Each of the resulting spatial units may therefore be described with respect to the probability of the occurrence of one of



Figure 5: Example of a classification scheme derived by CART analysis for the seafloor of the North Sea.

the eight communities. The probability of occurrence of each community can be derived from its percentage in the corresponding endnode. This was done for both the EEZ and the entire North Sea, resulting in two different habitat maps. The preliminary habitat map of the entire North Sea is shown in Figure 5. All habitat classes were described with the help of suitable statistical measures.
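The decision-tree step can be illustrated with a short scikit-learn sketch on synthetic data. The CART-style classifier from scikit-learn and the synthetic predictors are assumptions for illustration; they do not reproduce the original CART software cited above or the Rachor & Nehmer (2003) community data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for 184 sampling sites with three abiotic predictors.
rng = np.random.default_rng(2)
n = 184
X = np.column_stack([
    rng.uniform(10, 60, n),    # water depth (m)
    rng.uniform(20, 35, n),    # bottom-water salinity
    rng.uniform(0, 100, n),    # mud content (%)
])
# Three synthetic "communities" defined by simple thresholds (illustration only).
y = (X[:, 2] > 50).astype(int) + (X[:, 0] > 40).astype(int)

# Grow a CART-style tree with the Gini impurity as splitting criterion.
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, min_samples_leaf=10)
tree.fit(X, y)

# Once grown, the decision rules can be applied to every raster cell of the
# kriged predictor maps to predict the most probable benthic community.
grid_cells = rng.uniform([10, 20, 0], [60, 35, 100], size=(5, 3))
print(tree.predict(grid_cells))
```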

Distribution of Geodata via an Internet Map Server
The traditional way of distributing geoinformation – single printed maps or atlases – is supplemented by computer-based techniques which are often accessible via the internet. The internet has evolved very rapidly from delivering essentially static information into a very dynamic information resource. In recent years the demand for spatial web-based information has increased strongly, that is, the demand for interactive, »clickable maps« delivered on the fly to suit the needs of the user. To meet this requirement the large GIS


manufacturers – and meanwhile the open source community as well – are developing special software tools, so-called Map Server technology, for managing dynamic geospatial information. Nowadays it is state of the art in the GIS community to work in or to provide a Map Server environment. Up to now, several web-based Internet Map Servers (IMS) have been implemented to provide information about terrestrial environments. In contrast, in the marine context these developments seem to be just at the beginning. One goal of the MarGIS project was to design and install a pertinent, web-based, user-friendly system for the dissemination of marine geoinformation. Figure 6 provides an overview of how the different software components, the spatial elements, and the data model interact. In this case a combination of a client-server database and Map Server technology is responsible for the web-based spatial analyses. The dynamic presentation of the maps on the internet and the availability of the metadata are realized by tools such as ArcIMS (Fig. 7). This structure allows the results of a spatial data query to be viewed with the most common internet browsers. For the client, basic functions such as pan, zoom, query, find, and printing are implemented. The Internet Map Server ArcIMS is linked to an Apache web server with Tomcat as servlet engine. The various geodata formats (shapefiles, coverages, grids, raster images) and the metadata are stored in an MS SQL geodatabase, and the ArcSDE software works as a gateway. The processing chain for the user is: web browser as client via ArcIMS and ArcSDE to the data. The ArcIMS Servlet Connector is responsible for the communication between the web server and the Application Server. The Application Server organizes the processing of the incoming queries and transfers them to the Spatial Server. Different services such as Image, Feature, Metadata, Query, Geocode, or Extract services are carried out on the Application Server. MarGIS uses Image, Query, and Metadata services. The Image service creates raster data in formats such as jpg, png, or gif. Whenever a user zooms or pans, a query is sent to the Image service with the coordinates of the


Figure 6: System structure of the web-based Marine Geo-Information-Services.

new map window. The Image service generates the new image with the new coordinates and sends the URL of the image back to the client. The communication between client and Application Server is realized by ArcXML. The client produces and analyzes the requests and responses of the server with the help of JavaScript. Besides maps and data, the metadata (according to ISO 19115) assigned to each information layer are distributed via the Map Server as well. To tackle the problem of presenting large data sets, MarGIS uses a web-based viewer which allows a clear presentation of information (Fig. 7). By cascading the information, the different layers are grouped into thematic blocks such as geology, temperature, chemistry, fishery data, salinity, and others. With the help of pull-down menus the user can select the appropriate layer (Fig. 7A). For each information layer, metadata (ISO 19115) are available (Fig. 7B). The metadata document is also located on the database server, which optimizes the updating process. There are different possibilities for querying the data: by a certain object (point, line, polygon, pixel, grid), by a certain region, or with the help of SQL commands. The result of such a request is presented in Figure 7A for the salinity layer. For information layers which include quantitative data, for example the concentration of nitrate, descriptive statistical summaries such as box-and-whisker plots, averages, or histograms can be generated directly.
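The ArcXML round trip described above can be sketched as follows. The host, service name, and envelope are hypothetical, and the exact request elements accepted depend on the configuration of the ArcIMS image service; the sketch only indicates the general shape of such a request.

```python
import urllib.request

# Hedged illustration of a GET_IMAGE request of the kind a map client sends
# after a zoom or pan; element names follow the general ArcXML pattern.
arcxml_request = """<?xml version="1.0"?>
<ARCXML version="1.1">
  <REQUEST>
    <GET_IMAGE>
      <PROPERTIES>
        <ENVELOPE minx="-2.0" miny="51.0" maxx="9.0" maxy="60.0"/>
        <IMAGESIZE width="600" height="400"/>
      </PROPERTIES>
    </GET_IMAGE>
  </REQUEST>
</ARCXML>"""

url = "http://example.org/servlet/com.esri.esrimap.Esrimap?ServiceName=margis"  # hypothetical endpoint
req = urllib.request.Request(url, data=arcxml_request.encode("utf-8"),
                             headers={"Content-Type": "text/xml"})

# The response would be an ArcXML document containing the URL of the rendered
# map image, which the client then loads into the new map window.
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```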

As expected, the search for possible data sources and the data acquisition was a rather laborious task. Very different data types and formats of data tables were received. Harmonization of these data and their integration into a geodatabase was a rather time-consuming step. Digital maps and other aggregated digital information describing the marine environment of the North Sea or the Baltic Sea are still very scarce or available only for rather small sub-regions. Without the very good cooperation with different research institutes, federal authorities, and scientists from various research disciplines, such a compilation would have been nearly impossible (see the Acknowledgements). Although elaborate Marine Data Models were developed by research groups and Federal Maritime and Hydrographic Offices, these had to be modified to suit our specific research needs. For our requirements, the close link to the geo-object concept causes some overhead for the integration of data into an MDM. For example, the integration of chemical or geological raw data measured in the laboratory and of associated metadata, describing analytical methods and quality assessment, into an MDM is a multi-step process which requires some experience with GIS. For our purposes, we used only a subset of the geo-object functionality to ensure data integration, export of geodata for statistical analysis, and visualisation of concentration profiles without the requirement of specific software. Within the project different statistical techniques were



Figure 7: A: Geodata distributed via the Internet Map Server (IMS) and displayed in a standard web browser. The data points are sites of salinity measurements in bottom water. The data retrieval results in colour-coded data points mapped onto the bathymetric map and in a table providing detailed information about each measurement. The data are served from the MarGIS_GDB geodatabase. B: Metadata, according to ISO 19115, which are closely linked to each information layer. The metadata are accessible via the Internet Map Server.

Conclusions
The objective of the MarGIS project, to identify spatial entities – provinces – at the seafloor by a combination of biotic and abiotic data and geostatistical analysis, was achieved by a three-step process (Fig. 8): 1. compilation of heterogeneous data describing the geological, chemical, physical, and biological environment of the lower water column and seafloor, 2. application and comparison of geostatistical and multivariate-statistical methods for data aggregation, and 3. distribution of data and maps via an Internet Map Server.

applied for the identification of provinces of the seafloor. The intention was to come up with a »best practice« for such a spatial analysis. Obviously, no general recommendation can be given for the analysis of heterogeneous geodata. From our perspective and experience, the combination of variogram analysis and kriging and the application of CART provides rather good and robust results for the identification of provinces at the seafloor. The data search showed that national and international programs have compiled an impressive set of marine geodata. Especially the amount of information on sedimentology, benthic and pelagic biology, or chemical oceanography provides a very good overview of the environment of the North Sea and Baltic Sea. Nevertheless, for studies of seasonal


variations or long-term trends, only a few time series stations are available where chemical or biological data were measured over longer periods. Furthermore, data on dissolved constituents relevant for studies of global change and the carbon cycle, such as CH4 or DMS, are very limited and accessible only for a few sub-regions. Other data sets required for ecosystem modelling, describing the sediment-water exchange of oxygen, nutrients, and trace gases or »admixture rates« such as eddy diffusion coefficients, are scarce. From our perspective, measurement of such data sets is a prerequisite for upcoming research projects aiming to model and predict the impact of global change on coastal environments. A compilation of available data and a geostatistical approach might support the optimisation of


Figure 8: Schematic workflow of the MarGIS project, starting from (I.) the data search for geological, chemical, biological, and oceanographic raw data and aggregated information about the bottom water and seafloor of the North Sea and parts of the Baltic Sea and the Norwegian continental margin. These geoinformation were integrated into a geodatabase. (II.) Maps for the different parameters were derived by geostatistical analysis and converted to raster data. (III.) Geodata and results of the spatial analysis are distributed via an Internet Map Server.

future research programs ensuring coverage of representative spatial entities at the seafloor. A topic closely related to the distribution of compiled geodata to other research groups or via the Internet Map Server is the copyright issue. For sure, data which were received on a partnership basis cannot be delivered to third parties. For geodata compiled from journals and published reports or for aggregated geoinformation (e.g. maps we derived by multivariate statistics or geostatistics), the copyright issue is not as well defined. In MarGIS all data sources are cited in the metadata, which are closely linked to the geodata. Nevertheless,

we decided in some cases to reduce the information content of the geodata, e.g. by reclassification of thematic maps or by providing colour codes instead of data values, prior to integration into the Internet Map Server. Unfortunately, in our as well as in other applications, the potential of an IMS to provide complex data sets from geodatabase systems for research and the general public is limited. Numerous discussions in workshops revealed the need for a general agreement or other regulations covering copyright issues related to the distribution of geodata via the internet.



Acknowledgements
We are very grateful to several scientists and institutes for their support of the objectives of MarGIS. On an institutional level we would like to thank especially scientists from the following institutes and organisations: BSH: Federal Maritime and Hydrographic Office; BfN: Federal Nature Conservation Agency; MUDAB: Marine Environmental Database of BSH and UBA; ICES: International Council for the Exploration of the Sea; IFM-HH: Institute of Marine Research, University of Hamburg; IfÖ/BFA-Fi: Institute for Fishery Ecology / Federal Research Centre for Fisheries; ISH/BFA-Fi: Institute for Sea Fisheries / Federal Research Centre for Fisheries; GEUS: Geological Survey of Denmark and Greenland; BAW: Federal Institute for Waterway Engineering; BGS: British Geological Survey; BODC: British Oceanographic Data Centre; DOD: German Oceanographic Data Centre; RIKZ: National Institute for Coastal and Marine Management; SBS/UWB: School of Biological Sciences, University of Wales, Swansea and Bangor; TNO-NITG: Netherlands Institute of Applied Geoscience TNO – National Geological Survey; WSD North: Directorate for Water and Navigation, North Region; WSD North-West: Directorate for Water and Navigation, North-West Region; Danish Energy Authority; DEAL Data Registry; Nederlands Instituut voor Geowetenschappen TNO, Utrecht, NL; UBA: Federal Environmental Agency. For their interest and support we would like to thank especially Dr. E. Rachor (AWI), Prof. Dr. S. Ehrich (BFA Hamburg), Dr. U. Brockmann (University of Hamburg), and colleagues from the BSH (Federal Maritime and Hydrographic Office). Furthermore, we are grateful for the initiative of the BMBF/DFG programme »Informationssysteme im Erdmanagement« and the stimulating efforts of Prof. Dr. R. Bill (University of Rostock).

References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984): Classification and Regression Trees, Wadsworth.
Chihi, H., Galli, A., Ravenne, C., Tesson, M., De Marsily, G. (2000): Estimating the Depth of Stratigraphic Units from Marine Seismic Profiles Using Nonstationary Geostatistics, Natural Resources Research 9 (1), pp. 77-95.
Harbitz, A., Lindstrøm, U. (2001): Stochastic spatial analysis of marine resources with application to minke whales (Balaenoptera acutorostrata) foraging: A synoptic case study from the southern Barents Sea, Sarsia 86, pp. 485-501.
Jelinski, D.E., Krueger, C.C., Duffus, D.A. (2002): Geostatistical analyses of interactions between killer whales (Orcinus orca) and recreational whale-watching boats, Applied Geography 22, pp. 393-411.
Jerosch, K., Schlüter, M., Foucher, J.P., Allais, A.G., Klages, M., Edy, C. (subm.): Spatial distribution of mud flows and chemoautotrophic communities affecting the methane cycle at Håkon Mosby Mud Volcano.
Kelly, A., Powell, D., Riggs, R.A. (2005): Predicting potential natural vegetation in an interior northwest landscape using classification tree modeling and a GIS, Western Journal of Applied Forestry 20 (2), pp. 117-127.
Klages, M., Thiede, J., Foucher, J.-P. (2004): The expedition ARK XIX/3 of the Research Vessel »Polarstern« in 2003, Berichte zur Polar- und Meeresforschung 488, 346 pp.
Kresse, W., Fadaie, K. (2004): ISO Standards for Geographic Information, 322 pp., Berlin, Heidelberg (Springer).



Krige, D.G. (1951): A statistical approach to some basic mine valuation problems on the Witwatersrand, J. Chem. Metall. Min. Soc. S. Africa 52 (6), pp. 119-139.
Lembo, G., Silecchia, T., Carbonara, P., Spedicato, M.T. (2000): Nursery areas of Merluccius merluccius in the Italian Seas and in the east side of the Adriatic Sea, Biol. Mar. Medit. 7 (3), pp. 98-116.
Matheron, G. (1965): Les variables régionalisées et leur estimation, Masson, Paris.
Matheron, G. (1971): The theory of regionalized variables and its applications, Fontainebleau.
Pehlke, H. (2005): Prädiktive Habitatkartierung für die Ausschließliche Wirtschaftszone (AWZ) der Nordsee, Diplomarbeit, Universität Vechta, Institut für Umweltwissenschaften.
Poon, K.-F., Wong, R.W.-H., Lam, M.H.-W., Yeung, H.-Y., Chiu, T.K.-T. (2000): Geostatistical modelling of the spatial distribution of sewage pollution in coastal sediments, Water Research 34 (1), pp. 99-108.
Rachor, E., Nehmer, P. (2003): Erfassung und Bewertung ökologisch wertvoller Lebensräume in der Nordsee, Abschlußbericht für das F und E-Vorhaben FKZ 899 85 310 (Bundesamt für Naturschutz), Alfred-Wegener-Institut für Polar- und Meeresforschung, Bremerhaven.
Schlüter, M., Rutgers van der Loeff, M.M., Holby, O., Kuhn, G. (1998): Silica cycle in surface sediments of the South Atlantic, Deep-Sea Research I 45, pp. 1085-1109.
Steinberg, D., Colla, P. (1995): CART. Tree-structured Non-Parametric Data Analysis, Salford Systems, San Diego, CA.
Wright, D., Bartlett, D. (2000): Marine and Coastal Geographical Information Systems, 320 pp., London (Taylor & Francis).



Author’s Index

A Arndt O. . . . . . . . . . . . . . . . . . . . . . 74 Azzam R. . . . . . . . . . . . . . . . . . . . . . . 2 B Bär W . . . . . . . . . . . . . . . . . . . . . . . 32 Brand S.. . . . . . . . . . . . . . . . . . . . . . 32 Breunig M.. . . . . . . . . . . . . . . . . . . . 32 Butenuth M. . . . . . . . . . . . . . . . . . . 52 C Christ I. . . . . . . . . . . . . . . . . . . . . . . 12 D Dannowski R.. . . . . . . . . . . . . . . . . . 74 F Fritsche U. . . . . . . . . . . . . . . . . . . . . 88 G Gösseln G.v.. . . . . . . . . . . . . . . . . . . 52 H Häußler J. . . . . . . . . . . . . . . . . . . . . 32 Hecker J.-M. . . . . . . . . . . . . . . . . . . 74 Heipke C.. . . . . . . . . . . . . . . . . . . . . 52 Hübner S . . . . . . . . . . . . . . . . . . . . . 12 J Jerosch K. . . . . . . . . . . . . . . . . . . . . 88


K Kandawasvika A. . . . . . . . . . . . . . . . 32 Kappler W.. . . . . . . . . . . . . . . . . . . . . 2 Kersebaum K.-C. . . . . . . . . . . . . . . . 74 Kiehle C. . . . . . . . . . . . . . . . . . . . . . . 2 Kipfer A. . . . . . . . . . . . . . . . . . . . . . 32 Klien E. . . . . . . . . . . . . . . . . . . . . . . 12 Köberle A. . . . . . . . . . . . . . . . . . . . . 88 Kunkel R. . . . . . . . . . . . . . . . . . . . . . . 2 L Lipeck U. . . . . . . . . . . . . . . . . . . . . . 52 Lutz M. . . . . . . . . . . . . . . . . . . . . . . 12 M Mäs S. . . . . . . . . . . . . . . . . . . . . . . . 32 Meiners H.G. . . . . . . . . . . . . . . . . . . . 2 Michels I. . . . . . . . . . . . . . . . . . . . . . 74 Morchner C. . . . . . . . . . . . . . . . . . . 88 P Peesch R . . . . . . . . . . . . . . . . . . . . . . 88 R Reinhardt W. . . . . . . . . . . . . . . . . . . 32



S Schätzl P . . . . . . . . . . . . . . . . . . . . . . 74 Schlüter M. . . . . . . . . . . . . . . . . . . . . 88 Schröder W. . . . . . . . . . . . . . . . . . . . 88 Sester M. . . . . . . . . . . . . . . . . . . . . . 52 Staub G. . . . . . . . . . . . . . . . . . . . . . . 32 Steidl J. . . . . . . . . . . . . . . . . . . . . . . . 74 T Thomsen A. . . . . . . . . . . . . . . . . . . . 32 Tiedge M. . . . . . . . . . . . . . . . . . . . . . 52 V Vetter L. . . . . . . . . . . . . . . . . . . . . . . 88 W Waldow H.v. . . . . . . . . . . . . . . . . . . . 74 Wang F. . . . . . . . . . . . . . . . . . . . . . . 32 Wendland F. . . . . . . . . . . . . . . . . . . . . 2 Wiesel J. . . . . . . . . . . . . . . . . . . . . . . 32 Witte J. . . . . . . . . . . . . . . . . . . . . . . . 12




