WorldFAIR Chemistry

Making IUPAC Assets FAIR

by Leah McEwen and Fatima Mustafa

Most of us as chemists are very familiar with IUPAC's contributions over more than 100 years to nomenclature, terminology, and standardized chemical methods; however, we may be less familiar with other activities and projects in which IUPAC is involved. With the growing attention on Open Science and FAIR (Findable, Accessible, Interoperable, Reusable) data, do you know that IUPAC is increasing its efforts to translate existing standards into digital formats [1]? Having chemical terminology and data available in the digital environment using standard file formats and standard identifiers will increase the accessibility and interoperability of data for both humans and machines [2]. An example of these digital standards is the International Chemical Identifier (InChI), which encodes chemical information in layers, such as chemical formula, structure, and stereochemistry, resulting in a unique, barcode-like identifier for a particular substance [3].
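To make the layered structure of an InChI concrete, here is a minimal Python sketch that splits an identifier into its slash-separated layers. The function name `split_inchi_layers` is our own illustration and not part of any IUPAC software. The sketch assumes each layer after the formula starts with a single-letter prefix (for example `c` for connectivity and `h` for hydrogens), and it keeps only the first layer seen for each prefix. It does not validate the chemistry.

```python
def split_inchi_layers(inchi: str) -> dict[str, str]:
    """Split an InChI string into layers keyed by their prefix letter.

    Illustrative only: this is a string-level split, not a validator.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")

    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}

    # The formula layer has no prefix letter; it starts with an element
    # symbol or a multiplier digit, never a lowercase letter.
    if len(parts) > 1 and parts[1] and not parts[1][0].islower():
        layers["formula"] = parts[1]
        rest = parts[2:]
    else:
        rest = parts[1:]

    for part in rest:
        if part:
            # Simplification: keep the first occurrence of each prefix.
            layers.setdefault(part[0], part[1:])
    return layers


# Standard InChI for ethanol
ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)
```

For ethanol, the result separates the version (`1S`), the formula (`C2H6O`), the connectivity layer (`c`), and the hydrogen layer (`h`).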

IUPAC WorldFAIR Chemistry

The IUPAC goal is to align chemistry data standards with the FAIR data principles

The IUPAC Committee on Publications and Cheminformatics Data Standards (CPCDS) [4] has been tasked since 2014 with creating standards to enable and promote the interoperable and uniform transmission, storage, and management of digital chemical information. Recently, the CPCDS has been involved in a two-year WorldFAIR Project, leading the chemistry case study with the main goal of cultivating the application of the FAIR data principles within the chemical community. WorldFAIR is an international initiative coordinated by CODATA and the Research Data Alliance (RDA) Association to advance implementation of the FAIR data principles, in particular those for Interoperability, and to develop a set of recommendations and a framework for FAIR assessment in a set of disciplines or cross-disciplinary research areas [5].

One of the 11 case studies in the WorldFAIR Project is Chemistry [6]. As chemical data are increasingly captured, analyzed, and exchanged across digital systems, there is an urgent need to ensure that critical information is machine-readable so that data can be appropriately re-used. The IUPAC goal is to align chemistry data standards with the FAIR data principles through:

• Development of guidelines, tools and validation services that enable scientists to share and store data in a FAIR manner.

• Addressing gaps in standards that currently restrain chemistry in both academic and industrial areas, in particular taking advantage of developments in AI/ML.

• Engaging critical stakeholders in the adoption of standards and best practices to significantly increase the amount of chemical data available for all scientific disciplines.

To check out the project and its progress, visit our website regularly at https://iupac.org/project/2022-012-1-024/ in addition to our open science Zenodo community at https://zenodo.org/communities/fairchemistry/ [7], and follow us on Twitter [@FAIRChemistry].

The IUPAC WorldFAIR Chemistry initiative is divided into three main sub-projects, each of which has a clear objective and dedicated members to accomplish the broad vision.

Sub-Project 1. Reporting Guidance: Recommendations for FAIR Chemical Data Reporting

Developing guidance on best practices for handling and reporting FAIR-enabled chemistry data for different stakeholder roles that are developing policies, practices, products, and services

Ensuring that data are discoverable and re-usable will necessitate community-wide practices for describing data that meet the FAIR criteria for machine-readability. Implementation of domain data standards will be critical for maximizing interoperability and sustainability across collections and resources.

This sub-project aims in particular to develop guidance on best practices for handling and reporting FAIR-enabled chemistry data for the different stakeholder roles that are developing policies, practices, products, and services. Many different stakeholders can be involved in the processes and workflows to capture, prepare, publish, and compile datasets, including researchers, publishers, repositories, software developers, instrument facilities, and libraries, among others. The intended scope will bridge general guidance for FAIR data and specific guidance for chemistry data types emerging through numerous activities in IUPAC standards projects and community-based use cases.

https://iupac.org/project/2022-027-1-024

Sub-Project 2. Training Cookbook: Digital Recipes for Managing Chemical Data

Developing an online community resource of practical and re-usable training materials that demonstrate how to manage digital data files and content

FAIR chemical data need to be machine-readable, and this can be an unfamiliar scenario for many researchers and other stakeholders involved with publishing and managing experimental data. An advantage of the cloud environment is the availability of readily accessible online tools for working with digital content. Even with workflow tools such as Electronic Laboratory Notebooks, some additional tasks will always be needed to meet data sharing requirements. Explanatory information is increasingly available for the FAIR Data Principles and data sharing, and IUPAC provides extensive documentation on various standards for representing chemical information. However, very few practical resources exist that actively demonstrate how to manage the various tasks associated with preparing data files for publication so that they align with the technical criteria for FAIR machine-readable data.

This sub-project aims to develop an online community resource of practical and re-usable training materials that demonstrate how to manage digital data files and content. The overall goal is to get practical tools and tips in the hands of practicing chemists to lower barriers and smooth the adoption of best practices for sharing and re-using FAIR chemical data.

https://iupac.org/project/2022-028-1-024
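As an illustration of the kind of small, re-usable recipe such a resource might contain, the Python sketch below builds a machine-readable description of a data file: its name, size, SHA-256 checksum, and the InChI of the substance it describes. This is our own hypothetical example, not a recipe from the project. The function names, the field names in the record, and the file name `spectrum.jdx` are all assumptions made for illustration.

```python
import hashlib
import json


def build_metadata_record(filename: str, content: bytes, inchi: str) -> dict:
    """Describe a data file in a small machine-readable record.

    The field names are hypothetical and follow no particular standard.
    """
    return {
        "filename": filename,
        "size_bytes": len(content),
        # A checksum lets a re-user confirm the file arrived unchanged.
        "sha256": hashlib.sha256(content).hexdigest(),
        # A standard identifier links the file to the substance measured.
        "inchi": inchi,
    }


def to_json(record: dict) -> str:
    """Serialize the record with stable key order for easy comparison."""
    return json.dumps(record, indent=2, sort_keys=True)


record = build_metadata_record(
    "spectrum.jdx",                       # hypothetical file name
    b"abc",                               # stand-in for the file contents
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # standard InChI for ethanol
)
print(to_json(record))
```

In practice the content would be read from the actual data file, and the record would follow whatever metadata schema the target repository requires.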

Sub-Project 3. Protocol Services: Standardized Programmatic Access to Chemical Information

Developing web-based services that confirm chemical identity and provide real-time feedback on the machine-readability of chemical data and metadata representation

Representing chemical substances in structure form is one of the most critical functions in communicating chemistry, including sharing FAIR and machine-readable chemical data, as many resources are indexed by chemical structures. There is a range of approaches for articulating chemical substance information, depending on the scientific nature and context, and the digital motifs used in chemical databases and chemical software present additional layers of complexity. Chemical interpretation can vary between data systems and directly impact downstream reuse, especially when it comes to the representation and analysis of associated data. Validation of chemical description is an essential requirement for the re-usability of chemical data, including in discovery and in many modeling and predictive AI/ML applications.

This sub-project aims to develop web-based services that confirm chemical identity and provide real-time feedback on the machine-readability of chemical data and metadata representation, based on IUPAC standard rule sets, and recommended best practices.

https://iupac.org/project/2022-029-1-024
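To give a flavor of what feedback on machine-readability could look like, the Python sketch below runs a purely syntactic check on the identifiers in a record and returns a list of problems. It is not the project's service and does not implement IUPAC rule sets. The function name `check_identifiers` and the record keys `inchi` and `inchikey` are assumptions made for illustration. The check only confirms that the strings have the expected shape (an `InChI=1` or `InChI=1S` prefix, and an InChIKey of 14, 10, and 1 uppercase letters separated by hyphens). It cannot confirm chemical identity.

```python
import re

# Shape checks only: these patterns do not validate the chemistry.
INCHI_PATTERN = re.compile(r"^InChI=1S?/\S+$")
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def check_identifiers(record: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a record's identifiers."""
    problems = []

    inchi = record.get("inchi", "")
    if not inchi:
        problems.append("missing InChI")
    elif not INCHI_PATTERN.match(inchi):
        problems.append("InChI does not start with 'InChI=1/' or 'InChI=1S/'")

    inchikey = record.get("inchikey", "")
    if not inchikey:
        problems.append("missing InChIKey")
    elif not INCHIKEY_PATTERN.match(inchikey):
        problems.append("InChIKey is not in the 14-10-1 uppercase letter form")

    return problems


good = {
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
print(check_identifiers(good))
print(check_identifiers({"inchi": "C2H6O", "inchikey": "lfqs"}))
```

A real validation service would go further, for example by regenerating the identifier from the structure and comparing the two.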

Outreach activities - Having IUPAC WorldFAIR Chemistry outcomes exposed from early stages to the chemical data community

Our project has been engaged with the chemical data community from the beginning and continues to collaborate with volunteers, experts, and generators and users of data and data standards from the chemistry community and across disciplines. One of our first engagements with this wide community was the webinar series “What is a Chemical?”, which was launched in September 2022. The series highlighted the status of working with chemical notations, the development of digital tools to transform chemical notations into digital entities, and ways to implement the FAIR data principles across the chemical enterprise and related disciplines. The main goals were to understand chemical substance notations across multiple disciplines (geochemistry, nanochemistry, atmospheric chemistry, environmental chemistry, oceanography, crystallography, etc.) and various applied industrial areas (AI/ML, agri-chemistry, dyes and pigments for textiles, pharmaceutical cheminformatics). The series explored data resources used in these applied areas, discussed how other groups currently communicate and access data, and investigated the various digital, machine-readable depictions or notations of chemical substances, reactions, and datasets. The content and recordings of the webinars are available on the WorldFAIR Chemistry Zenodo community [8-10].

The third webinar focused on some of the existing notations for single molecular entities: InChI, HELM, SMILES, graphical representation, and systematic nomenclature. Our guest speakers discussed the user perspective of working with these tools, the challenges, their expansion to cover chemistry needs, and how they could complement each other to meet chemistry data and cheminformatics needs. We are planning to host a fourth round in January or February to elaborate on innovation in chemical descriptions, e.g. chemicals in complex systems such as reactions, multiple-component systems/mixtures, complexes, and composites, and the use of these descriptions in different computational settings and representation services, tools, and mechanisms.

Additionally, we aim to have early-stage prototypes of services available for community review early in 2023. More in-person events are planned in collaboration with several cheminformatics groups (including RDA CRDIG, ACS CINF, and NFDI4Chem) on various occasions, as listed on page 16. Mark your calendar for March 29 at ACS in Indianapolis, where we are organizing a one-day workshop to present early work on these resources. We invite you to share your feedback on what will help you implement these in your workflows: What works well? What needs further refinement? What is missing? The workshop will be supported by hands-on activities, so bring your laptops! Stay tuned and join us in our webinars, workshops, symposia, and talks.

Get Engaged and Volunteer with Us

As mentioned earlier, our sub-projects have many tasks to explore, and many services will soon be available for testing. We invite you not only to attend the outreach activities above, but also to be an active part of this project. The future of implementing the FAIR data principles in chemical data is bright, and it is worth being part of this significant effort of IUPAC and WorldFAIR. Lastly, to check out the project and its progress, visit our website regularly [https://iupac.org/project/2022-012-1-024/] in addition to our Zenodo community [https://zenodo.org/communities/fairchemistry/], and follow us on Twitter [@FAIRChemistry].

References

1. IUPAC Digital Standards, an index page collecting InChI, ThermoML, JCAMP-DX, etc., https://iupac.org/what-we-do/digital-standards/

2. Bruno et al., 2020, “Here come the Crystal Structure Data”, from the SSP – Charleston Conference Joint Webinar “Here come the Data”, broadcast 5 Feb 2020, https://charlestonlibraryconference.com/here-come-the-data/

3. IUPAC International Chemical Identifier, https://iupac.org/inchi or https://iupac.org/who-we-are/divisions/division-details/inchi/

4. The IUPAC Committee on Publications and Cheminformatics Data Standards, https://iupac.org/body/024

5. CODATA coordinated project ‘WorldFAIR: Global cooperation on FAIR data policy and practice’, https://codata.org/initiatives/decadal-programme2/worldfair/

6. IUPAC Project “WorldFAIR Chemistry: making IUPAC assets FAIR”, https://iupac.org/project/2022-012-1-024/

7. WorldFAIR Chemistry Zenodo community, curated by FAIRChemistry, created 26 Sep 2022, https://zenodo.org/communities/fairchemistry/

8. S. Chalk, I. Bruno, L. McEwen, and F. Mustafa, 2022, FAIRChemistry Webinar 01/03 “What is a chemical? - Handling Chemical Data Across Disciplines”, https://doi.org/10.5281/zenodo.7259101

9. Chalk et al., 2022, FAIRChemistry Webinar 02/03 “What is a chemical? – Applying Chemical Data to Industry Challenges”, https://doi.org/10.5281/zenodo.7259727

10. Chalk et al., 2022, FAIRChemistry Webinar 03 “What is a chemical? – User Perspectives on Digital Machine-Readable Depictions”, https://doi.org/10.5281/zenodo.7435258

Leah R. McEwen, of Cornell University, is the chair of the IUPAC Committee on Publications and Cheminformatics Data Standards, http://orcid.org/0000-0003-2968-1674

Fatima Mustafa, of Texas A&M-San Antonio, is the IUPAC WorldFAIR Chemistry project coordinator, https://orcid.org/0000-0001-6754-7375

CITE: Chemistry International, vol. 45, no. 1, 2023, pp. 14-17. https://doi.org/10.1515/ci-2023-0104
