Spatial Data Infrastructure Modernization: Let's Move to Big Data
For us, as the National Environmental Information System, achieving the Sustainable Development Goals (SDGs) is a primary objective; our system, and the technological infrastructure behind it, must therefore evolve to give concrete answers.
By Carlo Cipolloni, ISPRA (Italian Institute for Environmental Protection and Research), Technical Manager of the National Environmental Information System and Italian Technical Manager for the INSPIRE Directive.
Over the past five years, many of the world's public and private organizations have seen exponential growth in ingested data flows. The National Environmental Information System (SINA), managed by the Italian Institute for Environmental Protection and Research, is part of this trend: it has to manage ever more millions of records per year coming from the environmental sensor network (e.g. air quality, water quality, electromagnetic waves).
This required a paradigm shift in how data flows are managed, because established spatial data infrastructures (SDIs) could not respond dynamically when managing and querying such masses of data. The shift also has to meet the ever-increasing demand from stakeholders for quality data: industry, citizens, and public administrations are increasingly hungry for data to feed into artificial intelligence (AI) systems to expand their knowledge and market offerings.
The information system has moved from managing structured spatial data flows to acquiring both structured and unstructured data in real time or near real time, with a tenfold growth in records (from 100k to 1M per day).
How did we respond to this need?
Given this growth, the system architecture had to be redesigned: how to ingest data more flexibly, how to store it and, above all, how to query it dynamically through WebGIS applications or analysis and control dashboards.
The new architecture moves from data management based only on relational databases to a mixed relational plus No-SQL system, which has allowed the infrastructure to evolve towards data management via native APIs. Using relational and No-SQL databases together made it possible to design two parallel data management paths, one structured and one unstructured, whose outputs are then archived together as GeoJSON or JSON documents in the new data management system.
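To make the idea of archiving records as GeoJSON documents more concrete, here is a minimal sketch of how one relational-style record could become a GeoJSON Feature. The field names, the station identifier and the reading are hypothetical and are not taken from the SINA data model; only the GeoJSON structure itself (RFC 7946) is standard.

```python
import json


def row_to_geojson_feature(row):
    """Convert one relational-style record (a dict) into a GeoJSON Feature."""
    # GeoJSON (RFC 7946) orders point coordinates as [longitude, latitude].
    geometry = {"type": "Point", "coordinates": [row["lon"], row["lat"]]}
    # Every non-geometry column becomes a property of the feature.
    properties = {k: v for k, v in row.items() if k not in ("lon", "lat")}
    return {"type": "Feature", "geometry": geometry, "properties": properties}


# Hypothetical air-quality reading, as it might come out of a relational table.
row = {
    "station_id": "IT0001",
    "parameter": "NO2",
    "value": 41.5,
    "unit": "ug/m3",
    "observed_at": "2023-01-15T10:00:00Z",
    "lon": 12.4964,
    "lat": 41.9028,
}

feature = row_to_geojson_feature(row)
print(json.dumps(feature, indent=2))
```

A document of this shape can be stored as-is in a document database and served directly to a WebGIS client, which is one reason the same format can work for both the structured and the unstructured path.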
The structured environmental data flows that we manage follow well-defined national and European rules, with an organized verification and control system based on checks that ascertain the syntactic and semantic harmonization of the ingested flows.
In this context, a major re-engineering of the system was necessary: the traditional relational system, where many of the controls are implemented, was placed side by side with a No-SQL system of document collections, which allows faster querying and also guarantees greater reliability. The incoming flow is archived and checked in the traditional system and then automatically transformed and collected as JSON documents in the new non-relational database cluster.
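The check-then-transform flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual SINA implementation: the required fields, the parameter vocabulary and the value ranges are all hypothetical, and in the real system the checks run in the relational database rather than in Python.

```python
# Hypothetical schema: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "station_id": (str,),
    "parameter": (str,),
    "value": (int, float),
    "lon": (int, float),
    "lat": (int, float),
}
# Hypothetical controlled vocabulary with a plausible value range per parameter.
VALUE_RANGES = {"NO2": (0.0, 1000.0), "PM10": (0.0, 2000.0)}


def syntactic_errors(record):
    """Syntactic check: are all required fields present with the right type?"""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], types):
            errors.append(f"wrong type for field: {field}")
    return errors


def semantic_errors(record):
    """Semantic check: do the values make sense against the agreed vocabulary?"""
    errors = []
    limits = VALUE_RANGES.get(record["parameter"])
    if limits is None:
        errors.append(f"unknown parameter: {record['parameter']}")
    elif not limits[0] <= record["value"] <= limits[1]:
        errors.append(f"value out of range: {record['value']}")
    if not (-180 <= record["lon"] <= 180 and -90 <= record["lat"] <= 90):
        errors.append("coordinates out of range")
    return errors


def check_and_transform(records):
    """Validate each record, then turn the valid ones into GeoJSON documents."""
    documents, rejected = [], []
    for record in records:
        # Semantic checks only make sense once the record is syntactically valid.
        errors = syntactic_errors(record) or semantic_errors(record)
        if errors:
            rejected.append((record, errors))
            continue
        documents.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [record["lon"], record["lat"]]},
            "properties": {k: v for k, v in record.items()
                           if k not in ("lon", "lat")},
        })
    return documents, rejected


records = [
    {"station_id": "IT0001", "parameter": "NO2", "value": 41.5,
     "lon": 12.4964, "lat": 41.9028},
    {"station_id": "IT0002", "parameter": "NO2", "value": 38.0,
     "lon": 9.19},                                   # latitude is missing
    {"station_id": "IT0003", "parameter": "XYZ", "value": 5.0,
     "lon": 11.25, "lat": 43.77},                    # parameter not in vocabulary
]
documents, rejected = check_and_transform(records)
print(len(documents), "accepted;", len(rejected), "rejected")
```

Keeping the rejected records together with their error messages, instead of silently dropping them, makes it possible to report back to the data provider which rule each record failed.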
The new archive is then made available through cascading REST API services. This also required revising the applications for visualizing, analyzing and querying the data, moving them towards a more modern and dynamic model.
The unstructured environmental data flows, on the other hand, were acquired only recently, so from the beginning we designed a system based entirely on No-SQL technology that allows the flows to be acquired directly as JSON documents. In the same way, the applications have been designed to respond to dynamic requests for data, while the implementation of the INSPIRE/OGC network services has been based entirely on No-SQL datastores.
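As an illustration of what a dynamic request against the document archive might look like, the sketch below parses a query string and filters GeoJSON features in memory. The `parameter` filter name and the sample features are hypothetical; the `bbox` order (min lon, min lat, max lon, max lat) follows the convention used by OGC API - Features. A real service would push these filters down to the database instead of looping in Python.

```python
from urllib.parse import parse_qs


def parse_query(query_string):
    """Extract the optional 'parameter' and 'bbox' filters from a query string."""
    params = parse_qs(query_string)
    parameter = params.get("parameter", [None])[0]
    bbox = None
    if "bbox" in params:
        # bbox is given as "min_lon,min_lat,max_lon,max_lat".
        bbox = tuple(float(x) for x in params["bbox"][0].split(","))
    return parameter, bbox


def filter_features(features, parameter=None, bbox=None):
    """Return a FeatureCollection with the features matching both filters."""
    selected = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"]
        if parameter is not None and feature["properties"].get("parameter") != parameter:
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                continue
        selected.append(feature)
    return {"type": "FeatureCollection", "features": selected}


def point_feature(lon, lat, parameter, value):
    """Small helper to build the hypothetical sample data below."""
    return {"type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"parameter": parameter, "value": value}}


features = [
    point_feature(12.4964, 41.9028, "NO2", 41.5),   # Rome
    point_feature(9.19, 45.4642, "NO2", 38.0),      # Milan
    point_feature(12.4964, 41.9028, "PM10", 27.0),  # Rome, other parameter
]

parameter, bbox = parse_query("parameter=NO2&bbox=12.0,41.0,13.0,42.5")
result = filter_features(features, parameter=parameter, bbox=bbox)
print(len(result["features"]), "feature(s) matched")
```

Returning a FeatureCollection means a WebGIS client or dashboard can consume the response directly, without an intermediate conversion step.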
But we didn't stop at data flow management alone!
The new infrastructure is aimed at creating an information system of knowledge, which is increasingly necessary and in which the new data ecosystem plays a central and decisive role. For this reason, and to enrich the data offering from an open data point of view, a triple-store database component has been added to the architecture described above, providing a SPARQL endpoint which, for some environmental topics, supports federated integration with other data sources.
The next step will be to create a platform based on semantic knowledge graphs, able to design differentiated knowledge paths according to stakeholder requests and, therefore, applications to support industry and citizens.
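A federated query against such an endpoint could look roughly like the sketch below, which only builds the query text and the request URL and does not contact any server. The endpoint URLs, the `ex:` vocabulary and the property names are placeholders invented for illustration; the `SERVICE` keyword comes from SPARQL 1.1 Federated Query, and passing the query text in a `query` parameter follows the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode

# Placeholder endpoints, for illustration only.
LOCAL_ENDPOINT = "https://example.org/sparql"
REMOTE_ENDPOINT = "https://example.com/sparql"


def build_federated_query(remote_endpoint, limit=10):
    """Build a SPARQL query that joins local readings with remote place labels."""
    return f"""
PREFIX ex: <http://example.org/env#>
SELECT ?station ?value ?label WHERE {{
  ?station ex:parameter "NO2" ;
           ex:value ?value ;
           ex:locatedIn ?place .
  SERVICE <{remote_endpoint}> {{
    ?place ex:label ?label .
  }}
}}
LIMIT {limit}
""".strip()


def build_request_url(endpoint, query):
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


query = build_federated_query(REMOTE_ENDPOINT)
url = build_request_url(LOCAL_ENDPOINT, query)
print(query)
print(url)
```

The part of the pattern inside `SERVICE` is evaluated by the remote endpoint, so the local triple store only needs to hold the environmental data and can borrow descriptive information from other sources at query time.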
Carlo Cipolloni National Technical Manager