4 minute read

Teams take on third SKAO Science Data Challenge

BY CASSANDRA CAVALLARO (SKAO)

The latest SKAO Science Data Challenge is under way, with 33 teams spread across 16 countries busy analysing a huge dataset that simulates one of the SKAO’s key science goals.

In total, 234 participants have registered for this third challenge in the series, with a particularly strong showing from China and India where dozens of researchers are taking part. Participants are being supported by 12 supercomputing centres in 10 countries, including SKA Regional Centre prototypes, which are providing the resources for participants to access the data and deploy their analysis pipelines.

The challenge simulates observations of the Epoch of Reionisation, when the Universe transitioned from the so-called dark ages, to when the first stars and galaxies formed, ionising the surrounding gas. EoR studies aim to establish when this happened, through observing neutral hydrogen (i.e. not ionised). The neutral hydrogen signal – also known as the 21cm line due to its wavelength – is visible to radio telescopes, so the reionisation of the Universe is characterised by an absence of the signal.

Teams are analysing a 7.5 TB data cube which has layers of simulated astronomical data going back billions of years through cosmic time. Behind it all lies the EoR signal. If observed from Earth, its very faint light is obscured by everything in between us and it: the light from other galaxies and emission from within our own galaxy in the foreground.

Above and below: Slices of a SDC3 data cube showing the simulated EoR signal (top) and the foreground emission which is obscuring it (bottom: orange dots are galaxies, and the ribbon-like shape is diffuse gas in our galaxy). While the features of each image appear equally bright here, in the data cube the background is millions of times fainter than the foreground. Credit: Dr Philippa Hartley (SKAO)

“Removing the foreground emission is extremely challenging because there are so many objects – even the needle in a haystack analogy doesn’t do it justice!” SKAO Postdoctoral Fellow Dr Simon Purser said. “It demands sophisticated algorithms to characterise those objects so they can be removed, sifting through the data to gradually root out the signal we’re looking for. Some of these sources haven’t been observed before so that adds an extra challenge.”

The purpose of the challenges is not only to prepare the astronomical community for dealing with such huge data sets, and accessing them remotely rather than on their own laptops, but also to assess the most effective ways to process the data.

“Part of the value is seeing the techniques that teams choose and the innovations they make. We have learned a lot from the previous challenges – so much so that we’ve written a paper on Science Data Challenge 2 which highlights the lessons we’ve taken from that,” said SKAO Scientist Dr Philippa Hartley. “We’ve learned in particular that you can get better results by combining different techniques, not just relying on one. It’s vital that we do this work ahead of time so that when the data does start to flow, we have all the best tools at our disposal to exploit it.”

Working with the 12 computing centre partners – four more than last time – is helping the SKAO to establish the amount and type of computing resources different science observations will demand, making it an important exercise for the SKAO’s Data Operations team, too.

“It’s really valuable as it has been allowing us to test prototypes and processes that are being developed for the SKA Regional Centre network,” said SKAO Operations Data Scientist Dr James Collinson. “Thanks to SDC3 we completed our biggest data transfer yet, when we replicated the 7.5 TB data cube between two of the computing centres using the prototype data lake manager Rucio [a central repository for managing large amounts of data and storing corresponding metadata]. Previously our largest transfers were at the GB scale, so this has been a great performance test as we look to the future with SKA-scale data, and allows us to demonstrate what works and what needs refining.

“Just as importantly, the data challenges present us with a key opportunity to get feedback from the scientists involved, which will lead to a more capable SRC network and a better user experience.”

The challenge will run until August, and the results will be featured in the next issue of Contact!

This article is from: