Enriching Life Through Science
New tools and strategies to maximize the quantity, quality, and value of NOAA science
By Craig Collins
In late 2019, NOAA began to finalize the agency’s vision for a new set of strategies to maximize the value of NOAA science through six interdependent elements: uncrewed systems, ‘omics, cloud computing, artificial intelligence, data management, and citizen science. The strategies were developed by teams of experts within NOAA who understand the most important point about all of them: none works alone. Together, they are intended to dramatically expand the agency’s application of these emerging science and technology focus areas and to guide transformative advancements in the quality and timeliness of NOAA science, products, and services.
“These strategies will accelerate the implementation of the most effective science and technology applications to advance NOAA’s mission to protect life and property and grow the American Blue Economy,” said retired Navy Rear Adm. Tim Gallaudet, Ph.D., assistant secretary of commerce for oceans and atmosphere and deputy NOAA administrator.
NOAA’s vocation is to enrich life through science. Two parts of its overall mission – sharing knowledge of the changing planet with others (service) and conserving and managing coastal and marine resources (stewardship) – are anchored by the third, science: its ability to understand and predict changes in climate, weather, oceans and coastlines.
Everything NOAA produces – weather forecasts and advisories; climate information; harmful algal bloom (HAB) forecasts; nautical charts; fishing regulations; coastal management decision tools; endangered species recovery plans and more – depends on science. And the science behind every one of these applications depends on data.
NOAA has one of the world’s most advanced and comprehensive systems for collecting environmental data, gathering observations from the ocean floor to the sun: buoys, gauges, radar stations, geographic reference stations, satellites, air- or ocean-borne sensor arrays, and other sensing platforms use cutting-edge technologies to reveal conditions in, on and beyond the planet. Just a few years ago, NOAA estimated that these sensing capabilities enabled it to collect about 20 terabytes of data – the amount found in the texts of the Library of Congress – every day. Today its daily haul is up to five times that amount.
More Data Means More Computing Power and More AI
The recent spike in the quantity and variety of NOAA’s observational data has the potential to overburden the systems tasked with incorporating all of this new information. Much of this data is fed into NOAA’s massive weather forecasting models, which are based on numerical calculations so intricate that, even when run through a powerful supercomputer, they can take hours to generate a single forecast.
In early 2020, NOAA’s National Weather Service (NWS) invested in an upgrade of its supercomputing system that would triple its capacity to run models and generate forecasts, as well as double the speed at which data is stored. The agency purchased two high-capacity Cray computers that will triple NOAA’s total supercomputing capacity for prediction and research. After a period of code migration and testing, they will be brought online in 2022 to replace existing systems in Virginia and Florida.
Raw computing power alone won’t be able to optimize how NOAA’s data is stored, analyzed and shared. Even the most powerful supercomputers available, running world-class models, are predicting the future using data that’s already hours old – which can limit forecasters’ awareness of current conditions.
Image analysis is one area that has benefited from artificial intelligence, which NOAA scientists have used for more than 25 years to process and act on data in real time using machine learning algorithms. Imagery collected during aircraft and drone overflights of HABs in Lake Erie, for example, is processed by an algorithm developed to estimate the concentration of cyanobacteria (blue-green algae) in Erie waters. Another algorithm under development will be able to tell the difference between toxic and non-toxic algae from these overhead images. In Alaska, scientists have trained computers to identify animals of specific species, such as ice seals and beluga whales, with algorithms that process both visual and acoustic data – doing in hours what would take humans months.
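To make the idea concrete: at its core, this kind of image analysis is supervised classification – a model learns from labeled examples to sort pixels or detections into categories. Below is a minimal, illustrative sketch using Python and scikit-learn on synthetic data; it is not NOAA’s operational algorithm, and the band values and labels are invented for demonstration.

```python
# Minimal sketch: classify image pixels as bloom / no-bloom from band values.
# Synthetic data stands in for labeled imagery; not NOAA's operational code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Pretend each pixel has 4 spectral bands; cyanobacteria-rich pixels
# (label 1) reflect more strongly in the synthetic "green" band.
n = 5000
bands = rng.normal(0.3, 0.1, size=(n, 4))
labels = (bands[:, 1] + rng.normal(0, 0.05, n) > 0.35).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    bands, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

An operational system would train on calibrated spectral imagery validated against water samples rather than random numbers, but the learn-from-labeled-examples workflow is the same.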
NOAA is already applying AI in Alaska’s remote Aleutian Islands, where volunteers have helped classify hundreds of thousands of images of Steller sea lions taken by remote cameras, identifying animals that researchers from NOAA’s Alaska Fisheries Science Center have marked for monitoring. These volunteers have already saved NOAA staff hundreds of viewing hours – but are still spending hundreds of hours themselves performing tedious work.
NOAA’s Big Data Project Takes to the Cloud
Not every NOAA workstation is powerful enough to run machine learning algorithms, and it can be difficult for users of NOAA data to download and work with such large data volumes. NOAA is in the midst of a sweeping public-private initiative – the Big Data Project – to provide public access to NOAA’s open data on commercial cloud platforms.
The Big Data Project became operational in December of 2019, when NOAA contracted with three commercial cloud providers – Amazon Web Services, Google Cloud and Microsoft Azure – to host NOAA data in support of some of its most widely used online applications. The project combines NOAA’s expansive collection of high-quality environmental data with the infrastructure and expertise of some of the data sector’s biggest innovators.
NOAA’s N-Wave network is facilitating this big data move to the cloud. N-Wave’s high-speed network services include direct connections to multiple commercial cloud providers, enabling both NOAA research and operations in the cloud. NOAA datasets are already moving to the cloud, with more arriving every day – and NOAA is already seeing that its data are being used more, sometimes in surprising and innovative ways. For example, when the entire archive of data collected from the National Weather Service’s NEXRAD (next-generation radar) stations was moved onto a cloud platform maintained by Amazon Web Services, an international team of ornithologists and computer scientists was able to reprocess the archive using a machine learning tool that could distinguish birds from weather. The new tool, MistNet, helped to reveal previously undiscovered bird migration patterns.
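Part of what made work like MistNet possible is simply that the NEXRAD archive now sits in a public cloud bucket anyone can read. As an illustration, the sketch below lists one day of Level II files for a single radar site from the AWS Open Data bucket; the bucket name and year/month/day/site key layout follow AWS’s public documentation for this dataset, but should be verified before relying on them.

```python
# Sketch: list one day of NEXRAD Level II files for a single radar site
# from the public AWS Open Data bucket (no credentials required).
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned requests work because the bucket is publicly readable.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(
    Bucket="noaa-nexrad-level2",
    Prefix="2015/05/15/KTLX/",  # YYYY/MM/DD/SITE layout; adjust as needed
    MaxKeys=10,
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```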
According to Jonathan O’Neil, director of the Big Data Project, two things enabled this new application, neither previously achievable with NEXRAD: making the entire archive – most of which had been written onto computer tapes – available to outside experts, and providing the computing power to process it. “There are some challenges getting the entire archive off of tape,” said O’Neil, “but the other side of it is having enough computing power, which the cloud provides. It’s not something you could run on your PC. Combining the high availability of this data with the computing power of the cloud platforms allows those types of activities to happen.”
The “challenge” O’Neil refers to is, in part, one of formatting data that has been housed for years on one platform – tapes – so that it can be shared and processed in the cloud. The Big Data Project is experimenting with converting traditional NOAA formats to more cloud-native formats optimized for high-powered tools and algorithms.
As O’Neil pointed out, NOAA hasn’t adopted obscure data formats that work only for internal users; it has adhered to international data standards – but those standards were written before the advent of many of the most powerful cloud-computing tools. For example, much of the National Geodetic Survey’s spatial reference data – which NOAA uses to support emergency response and homeland security activities – is collected in several datasets that can be rapidly disseminated to federal, state and local agencies. These data include lidar, high-resolution digital imagery, film camera photographs and hyperspectral scans. NOAA’s Big Data Project team has begun the process of converting all of these data to Cloud-Optimized GeoTIFF (COG) files, a format that can be accessed and analyzed in the cloud, without the need for a second conversion.
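As a sketch of what that conversion step can look like, the snippet below uses the open-source rio-cogeo package to turn a conventional GeoTIFF into a COG. The filenames are hypothetical, and NOAA’s actual pipeline may use different tooling.

```python
# Sketch: convert a conventional GeoTIFF to a Cloud-Optimized GeoTIFF (COG)
# using the open-source rio-cogeo package (pip install rio-cogeo).
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles

src = "coastal_imagery.tif"      # hypothetical input file
dst = "coastal_imagery_cog.tif"  # tiled output with internal overviews

# "deflate" selects a lossless compression profile; the COG layout adds
# internal tiling and overviews so clients can fetch only what they need.
cog_translate(src, dst, cog_profiles.get("deflate"))
```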
“When people say AI-ready, analysis-ready or cloud-optimized, they’re talking about the same thing,” O’Neil said. “Parts of NOAA, as they see the utility of these formats, may begin to produce them. The usefulness of data in the cloud, though, is mostly due to the scalability of the platform. If your data is on an FTP feed, really all you can do is download it. But if your data is on Microsoft [Azure], Google [Cloud], or Amazon [Web Services], you don’t really need to download it. You can do all your processing and use all their tools right where the data sits.”
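What makes a COG “cloud-optimized” is that its internal tiling and overviews let a client fetch just the bytes it needs over HTTP range requests – processing the data right where it sits, as O’Neil puts it. A minimal sketch, assuming a hypothetical COG URL:

```python
# Sketch: read a small window of a COG directly over HTTP, without
# downloading the whole file. rasterio issues range requests under the
# hood. The URL below is a placeholder, not a real NOAA asset.
import rasterio
from rasterio.windows import Window

url = "https://example-bucket.s3.amazonaws.com/coastal_imagery_cog.tif"

with rasterio.open(url) as src:
    # Fetch only a 512 x 512 pixel tile from band 1.
    tile = src.read(1, window=Window(col_off=0, row_off=0,
                                     width=512, height=512))
    print(tile.shape, tile.dtype)
```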
According to David Layton, NOAA’s chief enterprise architect, the Big Data Project teams are also working to make more data available to the general public. “Often, people who need to access data from our archives are experts. They know where to find it. They know the formats, and they know how to ask the right questions. Today’s data infrastructure is oriented toward those users.” NOAA is merging all of this data into a new framework, the OneStop portal (data.noaa.gov/onestop/), an online gateway to all of NOAA’s data that can be searched by topic (e.g., “fisheries” or “weather”). “With OneStop, we’re trying to broaden our audience and make the data easier to access through modern cloud-based technologies,” Layton said. “It will be an increasingly important interface for NOAA customers, and the public in particular, to discover and access our data holdings.”
Both the big data and cloud strategies have broad application across the agency while underpinning and enabling the goals of NOAA’s interrelated science and technology strategies on uncrewed systems, artificial intelligence, ‘omics and citizen science.
NOAA’s N-Wave, which the agency has operated and managed since 2010 as its own shared national network infrastructure (noc.nwave.noaa.gov), enables NOAA-wide data transport, connects scientists to remote high-performance computing resources, and bridges public access to NOAA data. N-Wave’s high-speed network capabilities are scalable to meet the needs of NOAA science from the campus to the cloud, including the orders-of-magnitude increase in data sourced from innovative, higher-resolution instrumentation.
Citizen Scientists, Data Enthusiasts
Technology only partly explains NOAA’s data reach. The key is people: NOAA employs 6,770 scientists, a significant number for any agency – but its weather enterprise alone is fed by data from a network of more than 10,000 volunteer observers throughout the United States and its territories, participants in the Cooperative Observer Network who take daily weather observations and make them available online.
NOAA’s growing community of citizen scientists, some using NOAA-developed mobile apps, reports from every part of the country daily, helping NOAA scientists observe, predict and protect the environment. Crowdsourced data is becoming an increasingly significant NOAA asset. Volunteers have collected fish for genetic analysis; mapped urban hot spots; warned of severe weather; monitored and reported marine debris; and even monitored sea stars along the West Coast.
Citizen science benefits society while cost-effectively enhancing NOAA’s research and monitoring efforts. Volunteers don’t just fill NOAA’s data bucket; they often work to process data as well – for example, converting older weather records, such as ship logs, to digital formats. Through its Citizen Science Strategy, NOAA is fully leveraging the power of public participation in support of agency mission areas. Citizen science, crowdsourcing, and challenge competitions provide opportunities for the agency to engage the American public in addressing societal needs and accelerating science, technology, and innovation.
Autonomous Uncrewed Systems and ‘Omics Analysis
The continuing increase in NOAA data volume is supported by new technologies, including uncrewed vehicles that can be operated remotely.
Autonomous instruments are capable of operating unattended for extended periods of time, and can collect data in the atmosphere or in bodies of water. NOAA’s Great Lakes Environmental Research Laboratory (GLERL) is working to integrate uncrewed aircraft into its airborne HAB detection and mapping program, to complement the work of the piloted aircraft now conducting aerial surveys – enabling more rapid response and data collection in a constantly changing environment.
GLERL is also exploring the use of underwater drones to help with short- and long-term decision-making in an area of Lake Erie that supplies drinking water to 11 million people. A torpedo-shaped autonomous underwater vehicle (AUV) released from a NOAA research vessel into the waters of western Lake Erie in the summer of 2018 had two assignments. The first was to measure levels of a toxic chemical, microcystin, produced by blue-green algae that bloom with increasing frequency. The resulting three-dimensional picture of the toxin’s concentration in the water column, and of where it was moving, would help regional and municipal managers make forecasts and decisions, such as whether and when to close off drinking water intake pipes.
The AUV’s second job – to collect water samples from both within and outside the algal bloom – was aimed at the longer term: With new technologies that allow sequencing and analysis of DNA in a given water sample, scientists could catalog all the organisms present in certain locations at certain times. The genetic information collected from Lake Erie would help scientists to understand the ecological dynamics of HABs, and ultimately to plot strategies to avoid and reduce their harm.
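The cataloging idea – matching DNA sequences recovered from water against a reference library – can be illustrated with a deliberately simplified toy: exact-match lookup of short “barcode” sequences. Real pipelines rely on alignment and k-mer classification tools such as BLAST or Kraken, and the sequences below are invented for demonstration.

```python
# Toy sketch of eDNA cataloging: assign "reads" to taxa by matching a
# short barcode against a reference table. The barcodes are made up;
# real pipelines use alignment/k-mer tools, not exact string matching.
from collections import Counter

reference = {
    "ACGTTGCA": "Microcystis aeruginosa",  # hypothetical barcodes
    "TTGACCGT": "Dolichospermum sp.",
    "GGCATTAC": "Planktothrix agardhii",
}

reads = ["ACGTTGCA", "GGCATTAC", "ACGTTGCA", "TTGACCGT", "ACGTTGCA"]

counts = Counter(reference.get(r, "unclassified") for r in reads)
for taxon, n in counts.most_common():
    print(f"{taxon}: {n} reads")
```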
The ability to characterize larger ecosystems by analyzing molecules in organic material such as DNA, RNA and proteins belongs to a relatively new group of fields: the ‘omics sciences, known by their common suffix (e.g., genomics, proteomics, metabolomics).
Dr. Kelly Goodwin, a molecular biologist and microbiologist with NOAA’s Atlantic Oceanographic and Meteorological Laboratory, is co-chair of the task force that developed an overall ‘omics strategy for the agency. Uncrewed systems such as the gliders used in Lake Erie constitute a huge leap forward in data collection. “It’s a big deal,” she said, “because there are a lot of places we can’t trawl, or we can’t sample: under ice, the deep ocean, places where the habitat is really sensitive.” Uncrewed systems have the potential to increase not only the raw quantity of data, but also the types of data that can be collected feasibly and safely.
Putting It All Together: Strategies for the Future
As O’Neil and Layton point out, exciting scientific and technological innovations are developing across NOAA’s different line offices.
The Office of Marine and Aviation Operations is implementing NOAA’s new Uncrewed Systems Operations Program, establishing an uncrewed systems office to coordinate testing, training, development, and operations for the roughly 100 aerial, sea surface, and undersea vehicles now being used to collect high-quality environmental data for NOAA’s line offices.
Kelly Goodwin and her NOAA microbiologist colleagues are investigating and discovering new applications for the ‘omics sciences, using molecular analyses of environmental DNA in marine and aquatic environments to unravel the mysteries of how these ecosystems work.
New communications technologies and mobile applications have unlocked new pathways for citizen scientists to engage in NOAA’s scientific research and monitoring.
Emerging capabilities such as artificial intelligence, cloud platforms and computing tools are enabling NOAA scientists to rapidly store, share, and analyze data.
These developments are part of a NOAA science and technology strategy for the future, one with the potential to greatly enhance the benefits of NOAA science – but only, its scientists caution, if they grow together. The ability to use uncrewed vehicles and sample processors to sort genetic information from a seawater sample is immensely promising – but according to Goodwin, it also presents new challenges to NOAA biologists, most of whom received their education before such tools existed. “The transformational part of ‘omics – the part that gives you so much information on the cheap – is also the part we have to manage, which is the data dump,” she said. “We’re really talking about a new field of science, and we’re going to need partnerships and people to grow it. Utilizing these new supercool tools is going to require a lot of expertise and computing power.”
Dr. Alan Leonardi, director of NOAA’s Office of Ocean Exploration and Research, is excited about the potential of uncrewed systems to help reveal the undersea world – but he knows it will be more complicated than simply tossing robots into the ocean. Advances in engineering, he said, will have to be matched with advances in data and software capability: “If you’re going to have robots out there doing their thing, you’d better make them smart,” he said. “Artificial intelligence and machine learning are going to help us do that. If we can put our data in the cloud, rather than sit on a hard disk somewhere in a shore-based location, those tools can help a broad array of clients in the world – whether in the business, philanthropic, academic or government sectors – to quickly assess whether or not our data and information have value to them, so they can make decisions, improve a product, or in some cases make money.”
Dr. Jamese Sims, a physical scientist who began her NOAA career more than 15 years ago developing and improving predictive models for the National Weather Service, coordinated NOAA’s Artificial Intelligence Strategy, released in February of 2020. “We have a lot of synergy across all these strategies. We’re really changing the culture in the way that we manage our science and technology,” she said, “by making sure we’re taking the ‘One NOAA’ approach, and that line offices are constantly communicating about driving these strategies and implementing them.”
“NOAA is a pioneer with a strong track record of applying the latest science and technology, and these new strategies will allow us to dramatically expand these applications across our mission areas,” said Neil Jacobs, Ph.D., acting NOAA administrator. “These detailed strategies will enable us to achieve our priorities of reclaiming and maintaining global leadership in numerical weather prediction and sustainably expanding the American Blue Economy.”
Such an approach makes it impossible to view any of these six strategies in isolation. As NOAA scientists expand their use of uncrewed systems and partner with citizen scientists to collect more data – including more detailed and comprehensive data, such as the molecular information revealed by ‘omics analyses – NOAA experts will ensure that the growing data catalog reaches a wider audience, who will use new tools to do more with it.
Emerging technologies like AI, uncrewed systems, ‘omics, big data and cloud services hold incredible promise to solve difficult challenges, and through these strategies and initiatives, NOAA seeks to bring this potential to all Americans for their use and benefit.