
The European Open Science what?

The Square Kilometre Array, an international initiative to build the world’s largest radio telescope, will generate 300 petabytes of scientific data per year by 2024. Storing and processing data on this scale is too big a job for any one research organisation. As Michael Wise, Head of Astronomy at ASTRON in the Netherlands, said: “We cannot do this alone, we simply have to collaborate.”

‘Big’ isn’t the only aspect of the data-driven challenge facing science today. It is also complex – for example, how do you launch, track and manage a computational model of a Tokamak nuclear fusion reactor that involves multiply connected simulations across desktop, cloud and high-performance computers? Sensitive data presents another challenge – linking personal medical data with behavioural data from supermarket loyalty cards could provide insights into the emergence of dementia and other neuro-degenerative diseases, but doing that across borders demands the utmost care.


European Open Science Cloud

Launched last year, the European Open Science Cloud (EOSC) is an EU research initiative that seeks to lay the international computing foundations for tomorrow’s big science. It builds on the ideas of open access to scientific methods and results (software, papers and data), and focuses strongly on open data.

EOSC envisages a rich, ever-expanding suite of computational services on top of a layer of findable, accessible, interoperable and re-usable research data (the FAIR principles). And while EOSC’s push towards openness and interoperability may be new, the data, services and resources that form the heart of this cloud are not.

Europe’s scientific computing infrastructure (e-infrastructure) has coalesced into a number of specialised initiatives (EUDAT focuses on storage and data management; EGI on high-throughput and cloud computing; GÉANT on networking) and a number of challenges must be met before frictionless interoperability is achieved.

The biggest win for EOSC would be single sign-on, followed by uniform access to data. One of the reasons the Web works so smoothly for many is the ability to “log in with Facebook” or “sign in with Google”: a widely accepted authentication token provides access to a broad range of services. Single sign-on is even more useful in a cloud environment. It would be a major achievement if researchers or their computational proxies could log in once to the “Science Cloud” and then access multiple European resources. How to achieve this is well understood, but there are perhaps too many ways to do it: interoperability between numerous existing authentication and identity systems is the stumbling block. This illustrates EOSC’s main challenge: the solutions it needs to put in place are more political than technical, and need agreement between multiple stakeholders.

A uniformly accessible data layer is regarded as the foundation of EOSC, and again the principal challenge is one of agreement between stakeholders. Following the FAIR principles, data in EOSC will first need to be well described with an agreed basic metadata record to support search and cataloguing (there are plenty of lessons here from library science). Web services will play a big role in accessibility, and standard, open data formats will support interoperability: EOSC data should achieve at least a “three star” rating on the five-star open data scale. Reusability is underpinned by the “O” in EOSC; where data can be shared freely they will be available under unambiguous open licences – public domain or simple attribution under a scheme like Creative Commons. (One of EOSC’s challenges will be enabling international public health research using restricted data.)
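
To make the “three star” target concrete, here is a minimal Python sketch of how a dataset’s rating on the five-star open data scale might be worked out from a basic metadata record. It is an illustration only, not an EOSC specification: the record fields, the `DatasetRecord` and `open_data_stars` names, and the short lists of open formats and licences are all assumptions made for this example.

```python
from dataclasses import dataclass

# Illustrative, incomplete sets chosen for this sketch (assumptions, not an EOSC list).
OPEN_FORMATS = {"csv", "json", "xml", "netcdf", "hdf5"}
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are assumptions."""
    title: str
    licence: str
    file_format: str
    machine_readable: bool = False
    uses_uris: bool = False
    links_to_other_data: bool = False


def open_data_stars(record: DatasetRecord) -> int:
    """Return a 0-5 rating; each star requires every star before it."""
    criteria = [
        record.licence in OPEN_LICENCES,             # 1: published under an open licence
        record.machine_readable,                     # 2: structured, machine-readable data
        record.file_format.lower() in OPEN_FORMATS,  # 3: non-proprietary open format
        record.uses_uris,                            # 4: URIs identify the things described
        record.links_to_other_data,                  # 5: linked to other data for context
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


def meets_three_star_target(record: DatasetRecord) -> bool:
    """Check the 'at least three star' target mentioned in the text."""
    return open_data_stars(record) >= 3
```

Under these assumptions, a machine-readable CSV file released under CC-BY-4.0 would reach three stars, while the same table shipped only in a proprietary spreadsheet format would stop at two.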

EPCC’s role

EPCC has long been a partner in two of the major underpinning e-infrastructures: the high-performance computing alliance PRACE, and the data infrastructure EUDAT. We also have roles in two new EOSC projects: EOSC-hub and eInfraCentral.

EOSC-hub, which started this year, is regarded as one of the cornerstone projects of EOSC, bringing together service providers from the EUDAT and EGI.eu infrastructure organisations with software providers from Indigo DataCloud and a significant number of scientific research infrastructure users from across Europe.

EOSC-hub will blend EUDAT’s data services with EGI’s computational services to create an “EOSC 1.0”, a blueprint for e-infrastructure in Europe for the next decade. It will do this in tandem with OpenAIRE-Advance, the new phase of the open access initiative, and eInfraCentral. eInfraCentral focuses on the “findability” aspects of EOSC, working with e-infrastructure service providers to build a common catalogue of everything, from data to HPC services.

An example of mission-led science: the European Incoherent SCATter Scientific Association (EISCAT) is an international scientific organisation that conducts ionospheric and atmospheric measurements with radars, including observing the effects of the aurora borealis.

Image: Craig Heinselman, EISCAT director.

There’s a lot of work ahead, but the hope is that by 2021 European e-infrastructure will be prepared for whatever big science can throw at it.

Rob Baxter, EPCC Group Manager r.baxter@epcc.ed.ac.uk
