Final Report June, 2021
E-CAM The European Centre of Excellence for Software, Training and Consultancy in Simulation and Modelling
Funded by the European Union under grant agreement 676531
E-CAM Final Report
Contents
Executive Summary
1 Introduction
2 Highlights from E-CAM
3 Overview of the results achieved
3.1 WP1, Classical MD
3.2 WP2, Electronic Structure
3.3 WP3, Quantum Dynamics
3.4 WP4, Meso and Multi-scale Modelling
3.5 WP5, Training
3.6 WP6, Software Infrastructure
3.7 WP7, Hardware considerations and the PRACE relationship
3.8 WP8, Engagement with Industry
3.9 WP9, Dissemination
4 Summary of Exploitable results
5 Conclusions
Copyright notices: This report was coordinated by CH (project coordinator), on behalf of the E-CAM consortium. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0.
Executive Summary
E-CAM is an e-infrastructure for software development, training, and industrial discussion in simulation and modelling. It started in October 2015 and received funding from the European Union’s Horizon 2020 R&I programme for a period of 66 months. Built around the scientific community of the Centre Européen de Calcul Atomique et Moléculaire (CECAM), E-CAM is coordinated by CECAM Headquarters (HQ) at EPFL, and is a partnership of 14 CECAM Nodes, 4 Partnership for Advanced Computing in Europe (PRACE) centres and one Centre for Industrial Computing (Hartree Centre). The present final report provides an overview of the project results during the EU funding period, and includes:
• project highlights;
• an overview of the project results in each Work-Package (WP);
• exploitable results in E-CAM.
1 Introduction
The overall objective of E-CAM is to create, develop and sustain a European infrastructure for computational science applied to simulation and modelling of materials and of biological processes of industrial and societal interest. To achieve its objective, E-CAM uses the following complementary instruments:
1. development, testing and dissemination of modular software targeted at end-user needs;
2. tuning those codes to run on HPC, through application co-design and the provision of HPC oriented libraries and services;
3. advanced training of current and future academic and industrial researchers in this area;
4. multidisciplinary, coordinated, top level discussions to support industrial end-users (both large multinationals and SMEs) in their use of simulation and modelling.
Our approach is focused on four scientific areas, critical for high-performance simulations relevant to key societal and industrial challenges, ranging from the design of new materials and drugs to energy research. These are:
• classical molecular dynamics
• electronic structure calculations
• quantum dynamics
• meso- and multiscale modelling.
E-CAM develops new scientific ideas and transfers them to algorithm development, optimisation, and parallelization in these four areas, and delivers the related training. E-CAM is built around the scientific community of CECAM, and it also involves the computational and hardware expertise of PRACE. E-CAM is coordinated by CECAM HQ at EPFL, and is a partnership of 14 CECAM Nodes, 4 PRACE centres and one Centre for Industrial Computing (Hartree Centre). Figure 1 is a map of the E-CAM consortium, showing both its European reach and the strong ties to the CECAM network.
Figure 1: Map of the E-CAM partners and the connection to CECAM. Map title: "E-CAM: 18 partners (COO: CECAM HQ, EPFL)". Legend: CECAM Node / E-CAM Partner; E-CAM Partner. Sites labelled on the map: CSC, UK-DARESBURY, ICHEC, IRL, FI, NL, DE-MMS, UK-JCMAXWELL, DE-JUELICH, DE-SMSM, FR-MOSER, AT, HQ, IT-SISSA, FR-RA, IT-SNS, IT-SIMUL, ES.
E-CAM received funding from the EU between October 2015 and March 2021. In the next sections we summarise the highlights of the E-CAM project during that period, and provide an overview of the results achieved within each WP.
2 Highlights from E-CAM
Software Developments
• E-CAM software library: During the lifetime of the project we built a library of software with more than 200 contributions in the areas of classical molecular dynamics (WP1), electronic structure (WP2), quantum dynamics (WP3) and meso- and multi-scale modelling (WP4). Many of these have applications in industry and are targeted at solving an industrial problem. E-CAM’s software modules satisfy the E-CAM style guidelines for best-practice programming, documentation and testing.
– The community of users is large, and in some of our WPs the majority of contributions to the software were provided by external contributors not funded by the project. This is the case for WP2 and WP3, where 51% and 63% of the E-CAM modules, respectively, were produced by participants in the Extended Software Development Workshops (ESDWs) and through collaborations.
• E-CAM has developed transversal libraries aimed at tackling issues that become increasingly important when running applications at scale, in particular relating to the coordination of an ensemble of petascale calculations, and load balancing:
– High Throughput Computing library (jobqueue_features) [1]: the library addresses interactive HTC workloads with HPC characteristics. It offers dynamic task scheduling that leverages the data analytics framework Dask, and provides the ability to run multi-node tasks on heterogeneous resources (GPU, KNL, ...). In a prototypical application, the HTC library coupled with the OpenPathSampling code for rare events is being used to study the dynamical transitions of the binding/unbinding reaction of the SARS-CoV-2 main protease.
– Load Balancing Library (ALL): ALL provides an easy and portable way to include dynamic load balancing in particle based simulation codes, improving the code scalability and the size of systems that can be simulated. The library has significant applicability in codes from the E-CAM community and beyond, and is already implemented in several codes (including the flagship code HemeLB from CompBioMed CoE [2]).
• Multi-GPU version of DL_MESO_DPD [3]: The current multi-GPU version of DL_MESO_DPD scales with an 85% parallel efficiency up to 4096 GPUs. This version of the code also includes the ALL library for load balancing, and the Kokkos library for accelerator performance portability has been investigated.
• Electronic Structure Library (ESL) [4]: The goal of the Electronic Structure Library (ESL) is to develop and maintain a set of common libraries for the electronic structure community. E-CAM has been working closely with the ESL for some years now, and we provided support for this community effort through our ESDWs in WP2, Electronic Structure, and through access to our experts on code development.
• Development of the OPS package: E-CAM has been supporting the development of rare events sampling methods [5] and the OpenPathSampling (OPS) library [6],[7], a Python library to facilitate path sampling simulations. OPS was interfaced with MD codes such as LAMMPS and GROMACS to take advantage of their high scalability. Furthermore, OPS was also integrated with our HTC library to run on the largest HPC sites in Europe.
• Development of n2p2, a package that provides ready-to-use software for high-dimensional neural network potentials in computational physics and chemistry. One of the major advantages of n2p2 from the user’s perspective is its integration in LAMMPS, as a user contributed package, which allows one to run massively parallelised molecular dynamics simulations with pre-trained neural network potentials.
• Development of PANNA: Properties from Artificial Neural Network Architectures [8], a package to train and validate all-to-all connected network models for Behler–Parrinello and modified-Behler–Parrinello type local atomic environment descriptors and atomic potentials.
• In a collaboration with the CoE MaX and the NCCR MARVEL, we have developed a new procedure for automatically generating Maximally-Localised Wannier functions (MLWFs) for high-throughput frameworks [9]. This work can facilitate the development of novel materials.
• Contributions to EasyBuild: the E-CAM software manager is a maintainer of the software build and installation framework EasyBuild and has made a number of key contributions over the project lifetime.
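As a reading aid for the scaling figure quoted above for the multi-GPU DL_MESO_DPD (85% parallel efficiency up to 4096 GPUs), the following minimal sketch shows how a parallel efficiency is commonly computed from timings. It assumes the usual strong-scaling definition (measured speedup divided by ideal speedup, relative to a reference run). The report does not state which reference run or scaling mode was used, so the function names and the timings below are hypothetical and purely illustrative.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Strong-scaling efficiency of a run on n devices relative to n_ref devices.

    t_ref: wall time of the reference run on n_ref devices
    t_n:   wall time of the same problem on n devices
    """
    if t_ref <= 0 or t_n <= 0 or n_ref <= 0 or n <= 0:
        raise ValueError("timings and device counts must be positive")
    measured_speedup = t_ref / t_n
    ideal_speedup = n / n_ref
    return measured_speedup / ideal_speedup


def implied_speedup(efficiency, n, n_ref=1):
    """Speedup over the reference run implied by a given efficiency."""
    return efficiency * n / n_ref


# Hypothetical timings (an assumption, not measured data from the report).
t_1gpu = 3481.6    # seconds on 1 GPU
t_4096gpu = 1.0    # seconds on 4096 GPUs
eff = parallel_efficiency(t_1gpu, 1, t_4096gpu, 4096)
print(f"efficiency = {eff:.2f}")
print(f"speedup implied by 85% on 4096 GPUs = {implied_speedup(0.85, 4096):.1f}x")
```

Under this definition and a single-GPU reference, an 85% efficiency on 4096 GPUs would correspond to a speedup of about 3480 over one GPU. With a different reference run, or with weak scaling, the same percentage would have a different meaning.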
Training
• Extended Software Development Workshops (ESDWs) and transversal training events: we held 18 ESDWs during E-CAM, and 12 transversal training events, 8 of which were in collaboration with PRACE. In total, we trained 474 people during the E-CAM project on topics such as co-design of applications, code rewriting to improve performance, HPC best programming practices, and others.
• Online training infrastructure: we built a training portal which now includes training material for more than 100 topics. The portal is built using Clowder, a scalable data repository to share, organize and analyze data. It is intended to support the training aspects of future or ongoing E-CAM events and to build a repository of training/background material by integrating new material from those events. It also allows contributors to the repository to disseminate their training/expository material more widely. The portal includes material ranging from talks focused on state of the art methods to tutorials on the writing of robust software and performance optimization on massively parallel computer platforms. This project will be further developed in the future via co-funding from CECAM, MaX and the NCCR MARVEL.
• E-CAM has assisted in the creation of the LearnHPC project, which leverages the European Environment for Scientific Software Installations (EESSI) and Magic Castle to create dynamic, event-specific HPC training clusters in public clouds (such as AWS, Azure, GCP and OpenStack). LearnHPC has been awarded resources from the Fenix Research Infrastructure and is in discussion with PRACE as a candidate remote learning resource for PRACE courses.
Industry
• Pilot projects: E-CAM supported academic and industrial research via a set of pilot projects focused on industrially oriented problems, sustained by E-CAM postdoctoral researchers supervised by scientists in the team. Ten of our pilot projects were coordinated with industrial counterparts.
• Industry events: we organised industry focused scoping workshops where academics outlined the major advances to be expected in the workshop interest area and industrialists outlined specific items that they wished to see developed and the kind of support that they required. We held 7 scoping workshops, where 58 industrial researchers had access to expertise and discussions in simulation and modelling. Five of these events were co-organised with industry.
• To further expand the portfolio of activities targeted at industrialists, E-CAM has established a series of new events that train interested industrial researchers in the simulation and modelling techniques implemented in specific codes and in the direct use of this software for their industrial applications.
– The first event of this kind took place on 16 April, and focused on the area of meso- and multiscale simulations (WP4) and on the flagship code DL_MESO. A second event is scheduled, in collaboration with the SME BiKi Technologies.
• Molecular biosensor: E-CAM supported the development and optimisation of a new type of biosensor for diagnostics, which is now evolving into a commercial product and a start-up. At the heart of this ongoing story are advanced simulation using massively parallel computation, rare-event methods and training.
Outreach
• EKHAM the Comic: Identifying exciting and original tools to engage the general public with advanced research is an intriguing and non-trivial challenge for the scientific community. E-CAM decided to try something unusual, and developed a collaboration with experts and artists to use comics to talk about HPC and simulation and modelling. Our comic book was presented at the 2020 Lucca Comics & Games Festival and at the 2021 edition of the "Science Web Festival", and it was the subject of an interview on Spanish TV. So far, our comic book has been read 2,184 times and downloaded 1,269 times.
• Scientific publications: we produced 43 publications during E-CAM, and at least 3 others are in preparation. These papers reported the development of new algorithms and associated software. Some are related to community packages to which we have contributed. Our publications have 1956 citations to date.
• E-CAM webpage on the CECAM website: To expand the access points to E-CAM we developed an E-CAM webpage on the CECAM website. CECAM is the coordinator of E-CAM and plays a critical role in E-CAM’s sustainability. This initiative ensures that E-CAM’s most important results continue to be disseminated and communicated to the target groups even after the EU funding period. Activities planned beyond March 2021 will also be reported here.
3 Overview of the results achieved
In the following section we outline the results under 9 categories (called work packages, or WPs), which is how the project organised itself to achieve its objectives.
3.1 WP1, Classical MD
E-CAM’s WP1 "Classical Molecular Dynamics" provides a means for academics and industrialists to address computational questions that involve classical MD calculations. In particular, the WP focuses on the development of software for path-based sampling and analysis to study rare events. This WP developed 70 software modules (against a target of 43), which were certified according to the E-CAM guidelines [10], and a further 10 that are work in progress. The modules developed consist of:
• Modules for studying the thermodynamics and kinetics of rare events. Under the framework of an E-CAM pilot project on binding kinetics in collaboration with the software vendor SME BiKi Technologies, we have investigated the binding/unbinding of a selective reversible inhibitor for protein GSK3B using path sampling methods. This inhibitor has been proposed as a potential drug for Alzheimer’s Disease. In addition to the specific results from this project, the software developed has broader applicability than protein-ligand binding alone: it can be applied to protein-protein aggregation or DNA-protein binding, as well as to many questions about large scale conformational changes in biomolecules, such as protein folding.
• Modules that contribute to the development of OpenPathSampling (OPS). OPS ([6],[7]) is a software package to perform path sampling simulations and other trajectory-based approaches to study rare events, and is one of the key application codes of WP1. Much of the development of OPS has been supported by E-CAM through resources and at ESDWs. The methods implemented in OPS can be used to study many kinds of problems, including drug binding and unbinding, self-assembly processes, conformational changes in biomolecules, and chemical reactions. During the lifetime of the project we have:
– improved the performance and usability of OPS;
– provided support for OPS to interface with other codes in the MD community (e.g. Integrating LAMMPS with OpenPathSampling, Gromacs engine in OpenPathSampling, and PLUMED Wrapper for OpenPathSampling);
– added more trajectory-based rare event methods to OPS;
– maintained and improved the quality of the OPS code base;
– integrated OPS with E-CAM’s High Throughput Computing (HTC) library jobqueue_features. OPS can now almost seamlessly transition from use on a personal laptop to some of the largest HPC sites in Europe. A white paper describing this work is in ref. [11]. This work was done in collaboration with a PRACE team in Poland;
– promoted the use of OPS for studying systems of high importance, such as the SARS-CoV-2 main protease.
The full portfolio of modules that are based on OPS and were developed in the context of E-CAM can be found here.
• Modules to optimize a protein based biosensor. An E-CAM transverse action is the development of a protein based sensor (patent publications EP3265812A2, 2018-01-10 and WO2018047110A1, 2018-03-15) with applications in medical diagnostics, scientific visualisation and therapeutics. At the heart of the sensor is a novel protein based molecular switch which allows extremely sensitive real time measurement of molecular targets to be made, and to turn on or off protein functions and other processes accordingly. For a description of the sensor, see here. Its development relies on advanced simulation using massively parallel computation, rare-event methods and genetic engineering. More recently the sensor caught a great deal of attention due to its potential application in detecting the presence of SARS-CoV-2. Work in this direction is currently being developed, in close collaboration with industry (see the story here). Experimental teams will provide a proof of concept for the COVID-19 sensors designed entirely in silico using software developed in the context of E-CAM. Besides the specific results from this work, this whole exercise shows the transformation of an idea born via simulation into a commercial opportunity and a potential start-up creation. We have been reporting the successes of this work through a news blog on our website: From idea to market.
• Modules that study polymer dynamics. In the context of a pilot project on the Implementation of contact joint to resolve excluded volume constraints we have developed a new approach (adapted from physics engines to the present use) to resolve the excluded volume constraint problem. The modules originating from this work can be used to study mitotic chromosome unfolding and the rheological properties of polymers. More specifically, the work is being used in a genome wide simulation of the fruit fly (publication in progress). The pilot project webpage provides the complete list of modules developed in the context of this work.
• Modules that further expand the Neural Network Potential Package n2p2. The n2p2 package contains software that allows users to use an existing neural network potential parameterization to predict energies and forces (with standalone tools but also in conjunction with the MD software LAMMPS). In addition, it allows users to train new neural network potentials with the provided training tools. E-CAM supported n2p2 developments in the context of a pilot project on the Implementation of neural network potentials for coarse-grained models. n2p2 can run on HPC systems through its interface with LAMMPS. Additionally, n2p2 was recently added to the official LAMMPS repository, as a user contributed package. Furthermore, the n2p2 build system was adapted to allow for multiple interfaces to other software packages, with an option to select only those of interest to the user. As a first application, the user contributed CabanaMD package was integrated in the new build process. CabanaMD is a proxy application for Molecular Dynamics (MD) which makes use of the Kokkos performance portability library and n2p2 to port neural network potentials in MD simulations to GPUs and other HPC hardware. The list of modules developed in the context of this work is available here.
The full portfolio of modules developed under WP1 is accessible from the software library for WP1 at https://e-cam.readthedocs.io/en/latest/Classical-MD-Modules/.
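The neural network potentials mentioned above take as input a numerical description of each atom's local environment rather than raw coordinates. As an illustration only, the sketch below computes a Behler–Parrinello-style radial symmetry function with a cosine cutoff, in plain Python. It is not n2p2 or PANNA code and does not use their APIs; the function names, parameter values and three-atom configuration are assumptions made for this example.

```python
import math


def cosine_cutoff(r, r_c):
    """Smooth cutoff: 1 at r = 0, decaying to 0 at r = r_c and beyond."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Radial descriptor of atom i.

    Sums exp(-eta * (r_ij - r_s)**2) * f_c(r_ij) over all other atoms j,
    where r_ij is the distance between atoms i and j.
    """
    center = positions[i]
    total = 0.0
    for j, other in enumerate(positions):
        if j == i:
            continue  # an atom is not its own neighbour
        r_ij = math.dist(center, other)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cosine_cutoff(r_ij, r_c)
    return total


# Hypothetical three-atom configuration (arbitrary length units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
g_atom0 = radial_symmetry_function(atoms, i=0, eta=1.0, r_s=0.0, r_c=2.0)
print(g_atom0)
```

Because the descriptor depends only on interatomic distances, it is unchanged when the whole configuration is translated or rotated, and atoms beyond the cutoff radius contribute nothing. In a typical setup, several such functions with different parameters form the input vector of the per-atom neural network.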
Dissemination and exploitation
13 scientific publications originated from the project in the area of Classical MD.
1. Improved description of atomic environments using low-cost polynomial functions with compact support, Bircher, M. P., Singraber, A. and Dellago, C., Machine Learning: Science and Technology 2021. In press. DOI: https://doi.org/10.1088/2632-2153/abf817. Open access.
2. Microswimmers learning chemotaxis with genetic algorithms, Benedikt Hartl et al., PNAS 2021, 118, 19. DOI: https://www.pnas.org/content/118/19/e2019683118. Open access.
3. Unfolding the prospects of computational (bio)materials modelling, G. J. Agur Sevink et al., J. Chem. Phys. 2020, 153, 100901. DOI: https://doi.org/10.1063/5.0019773. Open access version here.
4. Reliable computational prediction of the supramolecular ordering of complex molecules under electrochemical conditions, Benedikt Hartl et al., J. Chem. Theory Comput. 2020, 16, 8, 5227–5243. DOI: https://doi.org/10.1021/acs.jctc.9b01251. Open access.
5. Atomistic insight into the kinetic pathways for Watson-Crick to Hoogsteen transitions in DNA, Jocelyne Vreede et al., Nucleic Acids Research 2019, 47, 21, 11069–11076. DOI: https://doi.org/10.1093/nar/gkz837. Open access.
6. OpenPathSampling: A Python Framework for Path Sampling Simulations. 1. Basics, David W. H. Swenson et al., J. Chem. Theory Comput. 2019, 15, 813-836. DOI: https://doi.org/10.1021/acs.jctc.8b00626. Open access.
7. OpenPathSampling: A Python Framework for Path Sampling Simulations. 2. Building and Customizing Path Ensembles and Sample Schemes, David W. H. Swenson et al., J. Chem. Theory Comput. 2019, 15, 837-856. DOI: https://doi.org/10.1021/acs.jctc.8b00627. Open access.
8. The asymmetric Wigner bilayer, Moritz Antlanger et al., J. Chem. Phys. 2018, 149, 244904. DOI: https://doi.org/10.1063/1.5053651. Open access version here.
9. Unimolecular FRET sensors: Simple linker designs and properties, Shourjya Sanyal et al., Nano Communication Networks 2018, 18, 44–50. DOI: 10.1016/j.nancom.2018.10.003. Open access version here.
10. The opposing effects of isotropic and anisotropic attraction on association kinetics of proteins and colloids, Arthur C. Newton et al., J. Chem. Phys. 2017, 147, 155101. DOI: http://doi.org/10.1063/1.5006485. Open access version here.
11. Benchmarking a Fast Proton Titration Scheme in Implicit Solvent for Biomolecular Simulations, Fernando Luís Barroso da Silva and Donal MacKernan, J. Chem. Theory Comput. 2017, 13, 2915-2929. DOI: https://doi.org/10.1021/acs.jctc.6b01114. Open access version here.
12. Rich Polymorphic Behavior of Wigner Bilayers, Moritz Antlanger et al., Phys. Rev. Lett. 2016, 117, 118002. DOI: https://doi.org/10.1103/PhysRevLett.117.118002. Open access.
13. Equilibrium structures of anisometric, quadrupolar particles confined to a monolayer, Thomas Heinemann et al., J. Chem. Phys. 2016, 144, 074504. DOI: https://doi.org/10.1063/1.4941585. Open access version here.
The software produced within this WP was disseminated via the articles above, at conferences (e.g. FOODSIM, PASC) and workshops organized by the members of the WP (see next sections), and via the project website (e.g. in success stories, in the modules of the month category, in the newsletter, etc.). For an overview of the news items on our website that are associated with this WP see here.
Six deliverables produced during the project lifetime, listed here (deliverables numbered 1.x), provided further dissemination channels. We would like to note that the activities within this WP fostered several scientific collaborations. In particular, the work on the development of OPS has been used in Bachelor, Master and PhD theses. The 4th paper on this list is an example of such collaboration. Collaborations with the University of Amsterdam (P. Bolhuis) and the Chodera lab at MSKCC in the US (John Chodera) should also be highlighted in connection with the developments on OPS, rare events and path sampling. The modules developed for the excluded volume constraints problem are being used in collaborations at the ENS Lyon and at the University of Grenoble Alpes. Furthermore, these modules will be used by a funded Agence Nationale de la Recherche (ANR) project (CRYOCHROM).
Industrial impact
Industry connection within this WP happened through two pilot projects in collaboration with industry:
• Binding Kinetics, in collaboration with BiKi Technologies
• Food and Pharmaceutical Proteins, in collaboration with APC. This work is in direct connection to the development of a molecular biosensor.
Furthermore, we have held six workshops in WP1 that dealt with topics of interest for industry:
1. State-of-the-Art Workshop in Reaction Coordinates from Molecular Trajectories, 29 August - 2 September 2016, Lorentz Centre, Leiden, The Netherlands. Workshop report.
2. State-of-the-Art Workshop in Large Scale activated event simulations, 1 - 3 October 2018, CECAM-AT, Austria. Workshop report.
3. Scoping workshop: Building the bridge between theories and software: SME as a boost for technology transfer in industrial simulative pipelines, 23 - 25 May 2018, Fondazione Istituto Italiano di Tecnologia (IIT), Genoa, Italy. Workshop report.
4. Scoping Workshop: Solubility prediction, 14 - 15 May 2018, CECAM-FR-RA, Ecole Normale Supérieure de Lyon, France. Workshop report.
5. Scoping workshop on Electrochemical energy storage: Theory meets industry, 12 - 14 June 2019, CECAM-FR-MOSER, France. Workshop report.
6. Simulation of open systems in Chemistry, Pharma, Food Science and Immuno-diagnostics: Rare-event methods at constant chemical potentials including constant pH, 25 February, 2 March, 23 March and 25 March 2021. Workshop page.
Training
During the lifetime of the project we organized five ESDWs in the area of Classical MD:
• Inverse Molecular Design & Inference: building a Molecular Foundry, 4 - 8 2019, CECAM-IRL, Ireland. Workshop page.
• Topics in Classical MD, 3 - 12 April 2019, CECAM-FR-RA, ENS Lyon, France. Workshop report.
• Intelligent High Throughput Computing for Scientific Applications, 16 - 20 July 2018, CECAM-IT-SIMUL, Politecnico di Torino, Turin, Italy. Workshop report.
• Classical Molecular Dynamics, 14 - 25 August 2017, CECAM-NL, Lorentz Centre, Leiden, The Netherlands. Workshop report.
• Trajectory Sampling, 14 - 25 November 2016, CECAM-AT, Traunkirchen, Austria. Workshop report.
Additionally, we held a joint PRACE/E-CAM Tutorial on Machine Learning and Simulations on 10 - 13 March 2020. In January and February 2021 we organized a series of three webinars on High Throughput Computing with Dask and E-CAM’s HTC library jobqueue_features. Key lectures are available here.
Societal impact
The societal benefits arising from the developments in WP1 are twofold. At the fundamental level, the tools will enable new discoveries and developments, for instance in material science and molecular medicine. This will be made possible by the scientific community using the sampling and analysis tools that are being developed. On an economic level, industry can benefit from software containing efficient and easy to use simulation and analysis tools to extract observables for applications including but not limited to:
• Drug design
• Diagnostics
• Food/dairy industry
• Genomics
• Materials science
Society can benefit from the methods developed and ultimately the software, which can make drug development cheaper, improve diagnostics and medicine as well as food quality, and help in the development of high-tech materials.
3.2 WP2, Electronic Structure
E-CAM’s WP2 "Electronic Structure" focuses on the development of software that helps to understand the behaviour of electronic systems. This understanding is fundamental for evaluating and predicting the properties of physical systems of interest in condensed matter physics and quantum chemistry, which have a wide range of applications, e.g. in material science, biology and nano-medicine.
This WP developed 61 software modules (against a target of 40) which were certified according to the E-CAM guidelines [10]. The modules developed are: • Modules that are contributions to the ESL initiative. E-CAM worked closely with the ESL — The Electronic Structure Library[4] initiative, and four of our ESDWs have been dedicated to the ESL and to supporting this community initiative. The goal of the Electronic Structure Library (ESL) is to provide a repository of tested, common software for electronic structure computations that can work as a tool box of utilities used by many electronic structure codes, and that is simple to use by non-experts in a wide range of fields and applications. In this way, new ideas, and new science, can be coded by scientists without needing to rewrite functionalities that are already well-established, and without needing to know more software engineering than science. In other words, the goal is to separate the coding effort for cutting-edge research from the software infrastructure, which needs maintaining and rewriting at every step of the hardware race. The impact of the ESL can be therefore quite significant for the electronic structure community. For instance, the ESL Demonstrator is an atomic-scale simulation software illustrating how to use and bring together the various available components of the ESL. ESL-Demo is part of the ESL and is already used by newcomers to the electronic-structure field who want to learn how to perform electronic-structure calculations. In the same way, the ESL Bundle is a collection of libraries and utilities broadly used in electronic structure calculations, put together by the ESL to make their use easier by researchers and scientific software developers. The modules contributed by the ESL to E-CAM’s software library are listed here. • Modules that extend the Wannier90 software package. 
Wannier90 [12] is a program that, for a given system, generates the Wannier functions with minimum spatial spreads, known as Maximally localized Wannier functions (MLWF)s. The locality of MLWFs can be exploited to compute, among other things, band-structure, density of states and Fermi surfaces at modest computational cost. Wannier functions are an important class of functions which enable to obtain a real-space picture of the electronic structure of a system. They provide an insightful chemical analysis of the nature of bonding, and chemical reaction in condensed-matter physics, similar to the role played by localized molecular orbitals in chemistry. They are also a powerful tool in the study of dielectric properties via the modern theory of polarisation. The modules built in the context of E-CAM originate from a Wannier90 dedicated ESDW and one pilot project (see next point), and they meet the desire of the electronic-structure community to extend the use of WFs, and in particular of MLWFs, to a broader class of physical and chemical problems by adding new functionalities to the Wannier90 code. The modules contributed by the Wannier90 developers to E-CAM’s software library are listed here. • Modules that build electronic structure functionalities for multi-thread workflows. In the context of a pilot project at the University of Cambridge, E-CAM developed tools to improve, isolate and automatize the calculation of MLWFs, key quantities to analyze the nature of chemical bonding (and therefore of material’s properties) and its evolution during, for example, chemical reactions [12]. In particular, researchers implemented the Selected Columns of Density Matrix (SCDM) method in the pw2wannier90 interface code between the Quantum Espresso software and the Wannier90 code. 
Then, they used this implementation as the basis for a complete computational workflow for obtaining MLWFs and electronic properties based on Wannier interpolation of the Brillouin zone, starting only from the specification of the initial crystal structure. The workflow was implemented within the AiiDA materials informatics platform, and used to perform a high-throughput study on a dataset of 200 materials. This work was done in the context of a collaboration between E-CAM, the MaX Centre of Excellence and NCCR MARVEL, and it is described in detail in Ref. [9] and in a success story reported here.
• Modules for the quantum mechanical parameterisation of metal ions in proteins. In the context of the E-CAM Pilot Project on Quantum Mechanical Parameterisation of Metal Ions in Proteins, in collaboration with BiKi Technologies, we developed a new approach for the accurate parameterisation of metal ion-protein interactions in water using machine learning techniques. One quarter to one third of all proteins require metals to function, but the description of metal ions in standard force fields is still quite primitive. The training scheme combines classical simulation with electronic structure calculations to produce a force field comprising standard classical force fields with additional terms for the metal ion-water and metal ion-protein interactions. The approach allows simulations to run as fast as standard molecular dynamics codes, and is suitable for efficient scale-up to massively parallel runs.
The modules developed for this purpose are listed here. Furthermore, the metal-ion force field developed by E-CAM is freely available. This work was published in a paper [13] and described in a success story.
• Modules that are interfaces between the QMCPack package and Quantum Espresso. In the context of an E-CAM pilot project, we built interfaces between QMCPack and Quantum Espresso, a commonly used software package for electronic structure computations. Such interfaces can be used to establish an automated, black-box workflow for Quantum Monte Carlo (QMC) computations. QMC simulations can, for example, be used to benchmark and validate DFT calculations: such a procedure can be employed in the study of several physical systems of interest in condensed matter physics, chemistry or materials science, with applications in industry, e.g. in the study of metal-ion or water-carbon interfaces. The modules developed in the context of this pilot project are listed here.
• Modules that are tools to easily manipulate molecular geometries. A set of pre- and post-treatment Fortran codes that can be used to easily manipulate molecular geometries was created, making it possible to minimize the average energy obtained for a range of internuclear distances for the dimers of each element, and to decrease the computational cost of a DFT calculation. These modules were developed in the context of the Pilot Project on Calculations for Applications in Photovoltaic Devices, in collaboration with Merck. E-CAM scientists then used electronic structure calculations to study how a key quantity, the HOMO-LUMO band gap, changes with respect to the molecular disposition of the donor-acceptor molecule pair. The work was reported in a joint publication with Merck [14], and in this success story.
• Neural network models of condensed matter systems with PANNA.
PANNA (Properties from Artificial Neural Network Architectures) is a Python package for the design, implementation and deployment of deep neural networks for the study of the solid state, and especially of crystalline systems. The tools provided in this package allow users to input their own data to train and validate neural networks that model condensed matter systems; an interface with the TensorFlow package also allows efficient and user-friendly implementation and monitoring of the network. Energies and forces can be computed to perform structural optimisation, and an interface with the molecular dynamics software LAMMPS allows the network model to be used to build an effective interatomic potential. As open source software, PANNA aims to provide users in both academia and industry with a platform for realistic simulations in materials science, for example for the prediction of the equilibrium geometry of a solid or a molecule, or the computation of optical spectra and electronic properties of crystals (band gaps, Debye temperatures or the density of states). The work is described in Ref. [8], and the list of software modules developed in the context of the work can be found in our software library.
• Modules for Mass-Zero Constrained Dynamics for Orbital Free Density Functional Theory. The Mass-Zero (MaZe) package implements the orbital-free formulation of density functional theory, in which the optimisation of the energy functional is performed directly in terms of the electronic density without the use of Kohn-Sham orbitals. This feature avoids the need to satisfy the orthonormality constraint among orbitals and allows the computational complexity of the code to scale linearly with the dimensionality of the system. The main goal of the software is to produce particle trajectories to be analysed in post-processing by means of external software.
• Modules for calculating the vibrational free energy of periodic crystals.
Caesar is a utility for calculating the vibrational free energy of periodic crystals, and a number of related vibrational properties, using the vibrational self-consistent field approximation. Caesar is intended to provide a vibrational method which is more accurate than the widely used harmonic approximation and the more sophisticated effective harmonic approximation, but which is computationally inexpensive enough to be integrated into high-throughput workflows.
The full portfolio of modules developed under WP2 is accessible from the software library for WP2 at https://e-cam.readthedocs.io/en/latest/Electronic-Structure-Modules/index.html. 51% of the modules developed under WP2 arise from external contributions (i.e. participants in our ESDWs and collaborations which are not directly supported by the project), and the remainder from internal contributions (i.e. from Postdoctoral Research Associates (PDRAs)). The external contribution is notable and reveals the significant number of collaborations this WP has established. We have worked with different groups (e.g. the MaX CoE, NCCR MARVEL), community projects (e.g. the ESL) and several packages (e.g. Quantum Espresso, QMCPack), and the events organized within this WP (ESDWs and SAWs) have helped catalyse new developments in the field. In general terms, the work performed in WP2 of E-CAM is expected to have a significant impact on the community.
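For reference, the harmonic approximation that Caesar is intended to improve upon has a closed-form free energy: each vibrational mode of angular frequency ω contributes ħω/2 + k_B T ln(1 - exp(-ħω / k_B T)). The following is a minimal sketch of that baseline only. It does not reproduce Caesar's vibrational self-consistent field method or its interface, and the function name `harmonic_free_energy` and its arguments are illustrative assumptions.

```python
import math

# Physical constants in SI units (CODATA 2018 values).
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def harmonic_free_energy(omegas, temperature):
    """Helmholtz free energy (in J) of a set of independent harmonic modes.

    omegas: iterable of positive angular frequencies in rad/s.
    temperature: positive temperature in K.
    """
    total = 0.0
    for omega in omegas:
        x = HBAR * omega / (K_B * temperature)
        # Zero-point energy plus the thermal contribution of one mode.
        total += 0.5 * HBAR * omega + K_B * temperature * math.log1p(-math.exp(-x))
    return total
```

At low temperature the result reduces to the zero-point energy, and it decreases as the temperature rises. An anharmonic method would replace the per-mode expression while keeping the same overall structure of summing over modes.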
Dissemination and exploitation
12 scientific publications originated from the project in the area of Electronic Structure.
1. A Systematic Approach to Generating Accurate Neural Network Potentials: the Case of Carbon, Yusuf Shaidu et al., npj Computational Materials 2021, 7, 52. DOI: 10.1038/s41524-021-00508-6. Open access
2. Discovering the Elusive Global Minimum in a Ternary Chiral Cluster: Rotational Spectra of Propylene Oxide Trimer, Xie, F. et al., Angewandte Chemie (International ed. in English) 2020, 59(50), 22427–22430. DOI: https://doi.org/10.1002/anie.202010055. Open access
3. Quantum Monte Carlo determination of the principal Hugoniot of deuterium, Michele Ruggeri, Markus Holzmann, David M. Ceperley, and Carlo Pierleoni, Phys. Rev. B 2020, 102, 144108. DOI: https://doi.org/10.1103/PhysRevB.102.144108. Open access version here.
4. PANNA: Properties from Artificial Neural Network Architectures, Ruggero Lot, Franco Pellegrini, Yusuf Shaidu, Emine Küçükbenli, Comput. Phys. Commun. 2020, 256, 107402. DOI: https://doi.org/10.1016/j.cpc.2020.107402. Open access version here.
5. Automated high-throughput Wannierisation, Valerio Vitale, Giovanni Pizzi, Antimo Marrazzo, Jonathan R. Yates, Nicola Marzari and Arash A. Mostofi, npj Comput Mater 2020, 6, 66. DOI: https://doi.org/10.1038/s41524-020-0312-y. Open access
6. The CECAM Electronic Structure Library and the modular software development paradigm, Micael J. T. Oliveira et al., J. Chem. Phys. 2020, 153, 024117. DOI: https://doi.org/10.1063/5.0012901. Open access version here.
7. Gap variability upon packing in organic photovoltaics, D. López-Durán, Etienne Plésiat, Michal Krompiec and Emilio Artacho, PLoS ONE 2020, 15(6): e0234115. DOI: https://doi.org/10.1371/journal.pone.0234115. Open access
8. Wannier90 as a community code: new features and applications, G. Pizzi, V. Vitale, N. Marzari, D. Vanderbilt, I. Souza, A. A Mostofi, J. R Yates, et al., J. Phys.: Condens. Matter 2020, 32 165902. DOI: https://doi.org/10.1088/1361-648X/ab51ff. Open access
9. Force Field Parametrization of Metal Ions from Statistical Learning Techniques, Francesco Fracchia et al., J. Chem. Theory Comput. 2018, 14, 255-273. DOI: https://doi.org/10.1021/acs.jctc.7b00779. Open access version here.
10. Lithium Adsorption on Graphene at Finite Temperature, Yusuf Shaidu, Emine Küçükbenli, Stefano de Gironcoli, J. Phys. Chem. C 2018, 122(36), 20800-20808. DOI: https://doi.org/10.1021/acs.jpcc.8b05689. Open access version here
11. ζ-Glycine: insight into the mechanism of a polymorphic phase transition, Craig L. Bull et al., IUCrJ 2017, 4, 569–574. DOI: https://doi.org/10.1107/S205225251701096X. Open access version here
12. A parallel orbital-updating based plane-wave basis method for electronic structure calculations, Yan Pan, Xiaoying Dai, Stefano De Gironcoli et al., Journal of Computational Physics 2017, 348, 482-492. DOI: https://doi.org/10.1016/j.jcp.2017.07.033. Open access version here
We would like to note, in particular, that the 6th article on this list, relating to the ESL, was included in AIP Scilights. The software produced within this WP was also disseminated via conferences (e.g. APS meetings) and workshops organized by the members of the WP, and via the project website (e.g. in success stories, in the modules of the month category, in the newsletter, etc.). For an overview of the news items on our website that are associated with this WP see
here. Six deliverables produced during the project lifetime, listed here (the 2.x deliverables), provided further dissemination channels. The activities within this WP fostered several scientific collaborations, and we highlight collaborations with researchers at EPFL (S. Bonella, N. Marzari, G. Pizzi), SISSA (S. de Gironcoli, E. Küçükbenli), SNS di Pisa (F. Fracchia, G. Mancini), IIT Genova (W. Rocchia), University of Cambridge (M. Payne, V. Vitale, E. Artacho), Imperial College London (A. Mostofi), ENS Paris (R. Vuilleumier), Maison de la Simulation - Université Paris-Saclay (C. Pierleoni, M. Ruggeri), CIC Nanogune (D. Lopez), Simune Atomistics (Y. Pouillon) and MPSD Hamburg (M. Oliveira).
Industrial impact
Industry connection within this WP happened through two pilot projects in collaboration with industry:
• Calculations for Applications in Photovoltaic Devices, in collaboration with Merck. The objective of this work was to provide important insight into the fabrication of organic solar cells. The output of this work was documented in a scientific publication together with the industrial partners [13] and a success story here.
• Quantum Mechanical Parameterisation of Metal Ions in Proteins, in collaboration with BiKi Technologies. The objective of this work was to build up a suitable parameterisation of metal ions in protein-water systems using machine learning techniques. The output of this work was documented in a scientific publication together with the industrial partners [14] and a success story here.
We have held three workshops in WP2 that dealt with topics of interest for industry:
1. State-of-the-Art Workshop in Electronic Structure, 12 - 14 September 2016, Cranage Hall, United Kingdom. Workshop report.
2. State-of-the-Art Workshop: Improving the accuracy of ab-initio predictions for materials, 17 - 20 September 2018, CECAM-FR-MOSER, France. Workshop report.
3. Scoping Workshop: From the Atom to the Material, 18 - 20 September 2017, CECAM-UK-JCMAXWELL, University of Cambridge, United Kingdom. Workshop report.
The scoping workshop (number 3 on the list) was particularly successful with industry and was attended by 8 industrialists from 7 large companies.
Training
During the lifetime of the project we organized four ESDWs in the area of Electronic Structure:
• Electronic Structure Library Coding - Solvers, 6 - 17 July 2016, CECAM-ES, The Zaragoza Scientific Center for Advanced Modeling, Zaragoza, Spain. Workshop report.
• Wannier90 Software Development Workshop, 12 - 16 September 2016, San Sebastian, Spain. Workshop report.
• Scaling Electronic Structure Applications, 7 - 18 January 2019, CECAM-IRL, University College Dublin, Ireland. Workshop report.
• Integration of ESL modules into electronic-structure codes, 17 - 28 February 2020, CECAM-HQ-EPFL, Lausanne, Switzerland. Workshop page.
An ESDW postponed from the 2020 programme of events (due to the COVID-19 pandemic) will take place on 11 - 22 October 2021 at CECAM-HQ at EPFL. The website for the event "ESDW: Improving bundle libraries" is at https://www.cecam.org/workshop-details/23.
Societal impact
Developments in electronic structure theory have wide-reaching technological implications: they improve the understanding of materials and their properties, and can help the discovery of new materials for a large range of applications. Specific classes of material directly addressed by E-CAM developments include, but are not limited to, photovoltaic
materials, drug design, catalysts, novel superconductors, topological insulators and electronic devices. The methods developed can allow cheaper and more effective product development in a wide range of industries. This is supported by the work of the ESL. There is a significant level of academic/industrial interaction in the field of WP2, and studies point out that it is not unusual for industry to outsource electronic structure work to research groups in academia. In a report from Goldbeck Consulting on "Industry interactions of the electronic structure research community in Europe", results from a survey of more than 400 scientists from 33 different institutions in 12 European countries showed the significant number of collaborations (90% of the respondents) between academia and industry. The workshops organized by E-CAM (sec. 3.2) and the results from the E-CAM pilot projects are key to promoting collaborations and societal change in the field of WP2. In this context, E-CAM has also produced a number of case studies that report on successful academic/industrial interactions, specifically: Calculations for Applications in Photovoltaic Devices; Accelerating the design and discovery of materials with tailored properties using first principles high-throughput calculations and automated generation of Wannier functions; and The simulation of metal ions in protein-water systems using machine learning.
3.3 WP3, Quantum Dynamics
E-CAM’s WP3 "Quantum Dynamics" provides a means for academic and industrial users to address computational questions that involve quantum dynamics, in particular by developing scalable, open source software with verified quality standards, appropriate documentation and testing. This WP developed 32 software modules (against a target of 30), which were certified according to the E-CAM guidelines [10]. The modules developed are:
• Modules for quantum computing. Within the E-CAM Pilot Project on Quantum Computing, in collaboration with IBM Research in Zürich, we developed a toolbox of subroutines and functionalities that implement the Local Control Theory (LCT) used to construct control pulses for tuning universal logical quantum gates in quantum computers. These modules respond to the needs of the industrial partner, who is developing a universally programmable quantum computer based on superconducting transmon qubits. The modules developed for this purpose are listed here.
• Modules that expand the functionalities and the efficiency of the E-CAM WP3 flagship codes, PaPIM and QUANTICS.
– PaPIM is a high performance code for calculating the properties (observables) of equilibrated systems. Some properties can be obtained directly from the distribution function of the system, while properties that depend on the dynamics of the system, such as the structure factor, infrared spectrum or reaction rates, can be obtained from the calculation of appropriate time correlation functions. PaPIM samples either the quantum (Wigner) or classical (Boltzmann) density functions and computes approximate quantum and classical correlation functions. The code is highly parallelized and suitable for use on large HPC machines (see the code’s performance analysis). The code’s modular structure makes it easy to update or change any of its modules. Furthermore, the coded functionalities can be used independently of each other.
The different modules that make up the PaPIM code are available on our software repository here.
– QUANTICS is a package for molecular quantum dynamics simulations that simulates quantum nuclear motion by solving the time-dependent Schrödinger equation. The portfolio of modules developed for QUANTICS within E-CAM is composed of extensions to the programme’s functionalities (see the full list of modules developed for QUANTICS here).
• Additional modules that implement state-of-the-art algorithms in the domain of Quantum Dynamics and extend the scope of the E-CAM repository. These modules also correspond to users’ needs but do not belong to either of the categories above. Examples are the modules based on the general quantum dynamics code ElVibRot, used in the E-CAM Pilot project on code optimization for exact and linearized quantum dynamics and documented here, and the modules that provide useful tools for an efficient and easily accessible interface between quantum dynamics simulations and electronic structure calculations.
The full portfolio of modules developed under WP3 is accessible from the software library for WP3 at https://e-cam.readthedocs.io/en/latest/Quantum-Dynamics-Modules/index.html. 63% of the modules developed under WP3 arise from external contributions (i.e. participants in our ESDWs and collaborations which are not directly supported by the project), and the remainder arise from internal contributions (i.e. from PDRAs). These numbers reveal the significant impact E-CAM is having on this community. Due to the novelty and complexity of the area of quantum dynamics (both on the ground electronic state and nonadiabatic), no well-established community software exists, and E-CAM is playing an important role in creating a reference library and a set of well-maintained training and research tools. Our events in the area of WP3 (ESDWs and SAWs) help catalyse new developments (see for example Ref. [15]) and foster the transition from in-house codes to reliable, modular, scalable and well-documented community packages.
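The route described above for PaPIM, sampling initial conditions from a phase-space density and averaging a time correlation function over trajectories, can be illustrated on a toy model. The following is a minimal sketch assuming a one-dimensional harmonic oscillator sampled from the classical (Boltzmann) density and propagated with its exact solution. It is not PaPIM's interface or algorithm, and the function name `position_autocorrelation` and all parameters are illustrative assumptions.

```python
import math
import random


def position_autocorrelation(n_samples, times, mass=1.0, omega=1.0, kT=1.0, seed=0):
    """Estimate C(t) = <q(0) q(t)> for a 1D classical harmonic oscillator.

    Initial conditions (q0, p0) are drawn from the Boltzmann distribution,
    and each trajectory is propagated with the exact harmonic solution.
    """
    rng = random.Random(seed)
    sigma_q = math.sqrt(kT / (mass * omega ** 2))  # thermal width in position
    sigma_p = math.sqrt(mass * kT)                 # thermal width in momentum
    corr = [0.0] * len(times)
    for _ in range(n_samples):
        q0 = rng.gauss(0.0, sigma_q)
        p0 = rng.gauss(0.0, sigma_p)
        for i, t in enumerate(times):
            # Exact classical trajectory of the harmonic oscillator.
            qt = q0 * math.cos(omega * t) + p0 / (mass * omega) * math.sin(omega * t)
            corr[i] += q0 * qt
    # Average over the sampled initial conditions.
    return [c / n_samples for c in corr]
```

For this model the exact classical result is C(t) = kT / (m ω²) cos(ωt), which gives a simple check of the sampling. A quantum treatment would replace the Boltzmann sampling with sampling of the Wigner density.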
Dissemination and exploitation
5 publications in the area of Quantum Dynamics originated from the project.
1. Adiabatic motion and statistical mechanics via mass-zero constrained dynamics, S. Bonella, A. Coretti, R. Vuilleumier and G. Ciccotti, Phys. Chem. Chem. Phys. 2020, 22, 10775-10785, DOI: https://doi.org/10.1039/D0CP00163E. Open access version here.
2. A molecular perspective on Tully models for nonadiabatic dynamics, L. M. Ibele and B. F. E. Curchod, Phys. Chem. Chem. Phys. 2020, 22, 15183, DOI: https://doi.org/10.1039/D0CP01353F. Open access version here.
3. Local control theory for superconducting qubits, M. Mališ, P. Kl. Barkoutsos, M. Ganzhorn, S. Filipp, D. J. Egger, S. Bonella and I. Tavernelli, Phys. Rev. A 2019, 99, 052316, DOI: https://doi.org/10.1103/PhysRevA.99.052316. Open access version here.
4. The Fluctuation-Dissipation theorem as a diagnosis and cure for zero-Point energy leakage in quantum thermal bath simulations, E. Mangaud, S. Huppert, T. Plé, P. Depondt, S. Bonella, F. Finocchi, J. Chem. Theory Comput. 2019, 15, 2863-2880, DOI: https://doi.org/10.1021/acs.jctc.8b01164. Open access version here.
5. Sampling the thermal Wigner density via a generalized Langevin dynamics, T. Plé, S. Huppert, F. Finocchi, P. Depondt, and S. Bonella, J. Chem. Phys. 2019, 151, 114114, DOI: https://doi.org/10.1063/1.5099246. Open access version here
Publication 3 was realised with the industrial partner IBM. The software produced within this WP was also disseminated via conferences and workshops attended by the members of the WP, and via the project website (e.g. in success stories, in the modules of the month category, in the newsletter, etc.); an overview of our dissemination activities promoting the results of this WP is here. Six deliverables produced during the project lifetime, listed here (the 3.x deliverables), provided further dissemination channels. Several scientific collaborations have been fostered by the activities in this WP.
Collaborations among researchers at EPFL (Sara Bonella), Sorbonne University (Simon Huppert), Université Paris-Sud (Federica Agostini), ENS Paris (Rodolphe Vuilleumier), Durham University (Basile Curchod), UCL (Graham Worth) and IBM Research Zürich (Ivano Tavernelli), among others, have led to master’s theses, PhD theses and post-doctoral research projects which actively contribute to the outcomes of this WP. Furthermore, the interactions between researchers at these events and the output in the form of new methods and software have been crucial in building up other research projects (for example, the project funded through the Agence Nationale de la Recherche (ANR), which will use the module G-CTMQC, documented here, as the basis for future computational developments).
Industrial impact
Industry connection within this WP happened through the pilot project on Quantum Computing in collaboration with a theory and simulation group at IBM (see description and outcomes here). The aim of this pilot project was to develop a new method and dedicated software for designing control pulses to manipulate qubit systems based on the local control theory (LCT). The method is highly robust and requires only qubit parameters (frequencies, coupling
terms) as inputs, and can generate state preparation pulses between any number of qubits in just one evolution of the full system. E-CAM contributed to the theoretical developments in the project, and to the production of software for the efficient generation of state preparation pulses. Furthermore, we have held two workshops in WP3 that dealt with topics of interest for industry:
1. E-CAM State of the Art Workshop: Different Routes to Quantum Molecular Dynamics, 6 - 10 June 2016, CECAM-HQ, EPFL, Lausanne, Switzerland. Workshop report.
2. Recent developments in quantum dynamics, an E-CAM state-of-the-art workshop, 17 - 21 June 2019, CECAM-FR-RA, Lyon, France. Workshop report.
Training
During the lifetime of the project we organized four ESDWs in the area of Quantum Dynamics:
• ESDW in Quantum Mechanics and Electronic Structure, 27 June - 8 July 2016, CECAM-FR-MOSER, Maison de la Simulation, Saclay, France. Workshop report.
• ESDW in Quantum MD, 17-28 July 2017, CECAM-IRL, UCD, Dublin, Ireland. Workshop report.
• ESDW in Quantum Dynamics, 18-29 June 2018, CECAM-FR-MOSER, Maison de la Simulation, Saclay, France. Workshop report.
• ESDW in Quantum Dynamics, 8-19 July 2019, Durham University, UK. Workshop report.
Societal impact
As technology reaches smaller time and length scales, the quantum properties of matter gain relevance for society and industry. Furthermore, the ubiquitous presence of hydrogen in materials (e.g. impurities in steel) and devices for clean energy (proton-based batteries) makes these simulations relevant also in macroscopic devices and at ambient conditions. Finally, the recent interest in the potential revolution of quantum computing (both in terms of simulating devices that could be used as qubits, and developing quantum algorithms for application in areas ranging from chemistry to cryptography) has opened a whole new set of opportunities for interactions with hardware developers. Quantum dynamical simulations are therefore increasingly important in many industrial sectors, including hardware design (e.g. coherence and interference effects for quantum control or design of qubits), pharmaceutics (tunneling in enzymatic reactions), and energy production or storage (when light is used to induce quantum physical or chemical transformations). Collaborations in these fields are already active. In the context of E-CAM, new simulation methods and algorithms for quantum computing were developed in collaboration with IBM. Similar collaborations involve participants in this WP. Surface hopping and multiple spawning are methods of choice to simulate the excited-state dynamics of molecular systems. Applications of these techniques to dyes and emitters have been reported at WP workshops, and brought new insights for the design of molecules in domains such as dye-sensitized solar cells (collaborative projects, for example with Dyesol, Greatcell) or organic light emitting diodes (collaborative projects, for example with BASF, Novaled). Excited-state, and in particular non-adiabatic, dynamics is also potentially interesting for pharmaceutical companies, for example in connection with preventing photo-damage (leading to skin cancer).
Applications of quantum dynamics techniques are also central to revealing the reaction mechanisms of atmospheric molecules. Models for these reactions currently employ experimental data, but insufficient data usually have to be complemented with very simplified models, which are often not accurate enough. Using quantum dynamics to circumvent the problem of missing experimental data would lead to more accurate atmospheric composition models, with a direct societal impact related to the study of the effect of chemical reactions involving small molecules on current climate change. It is important to stress that, despite the relative novelty of the field of quantum dynamics compared to classical molecular dynamics or electronic structure calculations, the field is now approaching sufficient maturity to pursue industrial engagement more actively. To promote this, it is important to create communication channels between
academia and industry to disseminate the potential applications of the field, and the role of E-CAM is crucial in this regard.
3.4 WP4, Meso and Multi-scale Modelling
E-CAM’s WP4 focuses on the development of software for meso- and multi-scale modelling and the simulation of systems on the mesoscopic scale. This WP developed 57 software modules (against a target of 43), which were certified according to the E-CAM guidelines [10]. The modules developed are:
• Modules that contribute to the porting of DL_MESO to multi-GPUs. In collaboration with the UKRI STFC Daresbury Laboratory, E-CAM has developed a highly efficient version of DL_MESO (DPD version), a software package for mesoscale simulations developed at the UKRI STFC. This distributed GPU acceleration development is an extension of the DL_MESO package to MPI+CUDA that exploits the computational power of the latest NVIDIA cards on hybrid CPU-GPU architectures. The need to port DL_MESO to massively parallel computing platforms arose because real systems are often made of millions of particles, and small clusters are usually not sufficient to obtain results in a short time. Moreover, with the advent of hybrid architectures, updating the code is becoming an important software engineering step to allow scientists to continue their work. The modules associated with the GPU rewrite of DL_MESO_DPD are listed in our software library here. The work is described in a publication [3], as well as in success stories here and here.
• Modules that develop polarizable meso-scale models. Within a pilot project at partner UKRI STFC on Polarizable Mesoscale Models, in collaboration with Unilever, E-CAM researchers built a realistic model of water to be used in Dissipative Particle Dynamics (DPD) simulations, and developed the related utilities for the DL_MESO_DPD code. The modules for this purpose are listed here. A success story documenting this work is stored here.
• Modules that help to study the rheological properties of new composite materials.
In a pilot project in collaboration with Michelin, E-CAM researchers at the MPIP Mainz implemented a hierarchical equilibration strategy for polymer melts in the ESPResSo++ software package, which can help to accurately determine and predict properties of polymer materials (e.g. rheological properties). This is highly important to researchers and industry. The modules for this purpose are listed here. A success story documenting this work is stored here.
• Modules that extend the capabilities of the ParaDIS code. These modules extend the ParaDIS code for discrete dislocation dynamics with the inclusion of precipitate interactions. The ParaDIS code is scalable, and E-CAM work also focused on optimizing the code to run on HPC environments. The modules for this purpose are listed here.
• Modules that develop the GC-AdResS scheme. The goal of the E-CAM pilot project associated with this work was to establish GC-AdResS (Grand Canonical Adaptive Resolution Scheme) as a standard method, and to make it easier to use with any Molecular Dynamics engine. The new implementation was based on the existing GC-AdResS code in GROMACS version 5.1.0 to 5.1.5, but it can be employed in any other MD package, such as LAMMPS and ESPResSo++. The coupling to the continuum is also being done in the code HALMD. The software developed in the context of this work is listed here. A case study describing the work is here.
• Modules that implement the ALL load balancing library. The scalability of parallel applications depends on a number of characteristics, among which are efficient communication, equal distribution of work and efficient data layout.
Especially for methods based on domain decomposition, as it is standard for, e.g., molecular dynamics, dissipative particle dynamics or particle-in-cell methods, unequal load is to be expected for cases where particles are not distributed homogeneously, different costs of interaction calculations are present or heterogeneous architectures are invoked, to name a few. For these scenarios the code has to decide how to redistribute the work among processes according to a work sharing protocol or to dynamically adjust computational domains, to balance the workload. The “A Load Balancing Library” (ALL) developed within E-CAM at the Jülich Supercomputing Center aims to provide an easy and portable way to include dynamic domain-based load balancing
into particle based simulation codes. It provides several schemes to find the ideal split of the workload, from the simplest orthogonal non staggered domain decomposition, to the more complex Voronoi mesh scheme. Library documentation is available here. The latest release is 0.9.1 and can be found under the library GitLab page. A story reporting on this work is available at this location.
• Modules for the Mesoscale Modelling of phoretic phenomena in binary fluids. In the context of an E-CAM pilot project described here, E-CAM researchers study diffusiophoresis and diffusioosmosis phenomena in heterogeneous fluids with the Lattice-Boltzmann method. These phenomena are of particular interest to the industrial partner associated to this pilot project, Unilever, with whom E-CAM collaborates in the research of binary fluid flow through pores near the liquid-liquid critical point. The software developed in the context of this work is listed at this location.
The full portfolio of modules developed under WP4 is accessible from the software library for WP4 at https://e-cam.readthedocs.io/en/latest/Meso-Multi-Scale-Modelling-Modules/.
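The idea behind domain-based load balancing described above for ALL can be illustrated in one dimension: when particles are distributed inhomogeneously, equally sized domains hold very different numbers of particles, and moving the domain boundaries evens out the work. The following is a minimal sketch assuming that work is proportional to particle count and that cuts are placed at particle-count quantiles. It is not one of the schemes implemented in ALL and does not use its API; the function names are illustrative assumptions.

```python
def count_per_domain(positions, boundaries):
    """Count particles in each domain [boundaries[i], boundaries[i + 1]).

    The last domain also includes its upper boundary.
    """
    counts = [0] * (len(boundaries) - 1)
    for x in positions:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if boundaries[i] <= x < boundaries[i + 1] or (is_last and x == boundaries[-1]):
                counts[i] += 1
                break
    return counts


def balanced_boundaries(positions, n_domains, lower, upper):
    """Place interior boundaries so each domain holds about the same number
    of particles. Assumes len(positions) >= n_domains and distinct positions.
    """
    ordered = sorted(positions)
    n = len(ordered)
    bounds = [lower]
    for k in range(1, n_domains):
        idx = k * n // n_domains  # index of the first particle of domain k
        # Put the cut halfway between the two neighbouring particles.
        bounds.append(0.5 * (ordered[idx - 1] + ordered[idx]))
    bounds.append(upper)
    return bounds
```

In a production code the work per particle is generally not uniform and the boundaries are adjusted dynamically during the simulation, which is the situation a dedicated library addresses.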
Dissemination and exploitation
Ten scientific publications originated from the project in the area of meso- and multi-scale modelling.
1. Comparing equilibration schemes of high-molecular-weight polymer melts with topological indicators, Luca Tubiana et al., Journal of Physics: Condensed Matter 2021, 33(20), 204003. DOI: https://doi.org/10.1088/1361-648x/abf20c. Open access.
2. Towards blood flow in the virtual human: efficient self-coupling of HemeLB, J. W. S. McCullough et al., Interface Focus 2020, 11, 20190119. DOI: 10.1098/rsfs.2019.0119. Open access.
3. Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs, J. Castagna et al., Comp. Phys. Commun. 2020, 251, 107159. DOI: 10.1016/j.cpc.2020.107159. Open access.
4. ESPResSo++ 2.0: Advanced methods for multiscale molecular simulation, Horacio V. Guzman et al., Comp. Phys. Commun. 2019, 238, 66-76. DOI: 10.1016/j.cpc.2018.12.017. Open access version here.
5. Molecular Dynamics of Open Systems: Construction of a Mean-Field Particle Reservoir, Luigi Delle Site et al., Adv. Theory Simul. 2019, 1900014. DOI: 10.1002/adts.201900014. Open access version here.
6. Adaptive resolution molecular dynamics technique: Down to the essential, Christian Krekeler et al., J. Chem. Phys. 2018, 149, 024104. DOI: 10.1063/1.5031206. Open access version here.
7. Ionic Liquids Treated within the Grand Canonical Adaptive Resolution Molecular Dynamics Technique, B. Shadrack Jabes, Christian Krekeler, Computation 2018, 1, 23. DOI: 10.3390/computation6010023. Open access version here.
8. Probing spatial locality in ionic liquids with the grand canonical adaptive resolution molecular dynamics technique, B. Shadrack Jabes et al., J. Chem. Phys. 2018, 148, 193804. DOI: 10.1063/1.5009066. Open access version here.
9. Towards Open Boundary Molecular Dynamics Simulation of Ionic Liquids, Christian Krekeler and Luigi Delle Site, Phys. Chem. Chem. Phys. 2017, 19, 4701-4709. DOI: 10.1039/C6CP07489H. Open access version here.
10. Computational efficiency and Amdahl’s law for the adaptive resolution simulation technique, Christoph Junghans, Animesh Agarwal and Luigi Delle Site, Comp. Phys. Commun. 2017, 215, 20-25. DOI: 10.1016/j.cpc.2017.01.030. Open access version here.
The software produced within this WP was disseminated via the articles above, at conferences and workshops organized by the members of the WP, and via the project website (e.g. in success stories, in the modules of the month category, in the newsletter, etc.). For an overview of the news items on our website that are associated with this WP see here. Six deliverables produced during the project lifetime, listed here (the 4.x deliverables), provided further dissemination channels.
We would like to note that the 2nd article on this list relates to a collaboration between E-CAM and the CompBioMed Centre of Excellence that started in an E-CAM ESDW on the Load Balancing Library ALL. The purpose of this collaboration was to analyse and test whether the use of ALL could improve the existing scalability of CompBioMed’s flagship code HemeLB. The software module reporting this cooperation is at this location. Further developments on this topic are expected in the near future. The ALL library is also being exploited by other codes such as the Material Point Method (MPM) code GMPM-PoC and the multi-GPU version of the DL_MESO_DPD package (see here), showing a high-performance redistribution of the work load across GPUs.
The rewrite of the DL_MESO_DPD code reported in the 3rd article of this list allows the simulation of complex systems comprising billions of atoms on thousands of GPGPUs. This is necessary to simulate surfactants, key ingredients in personal care products, dish soaps, laundry detergents, etc., with great impact in several industries.
It is important to note that all the work done within this WP expands the capabilities of different mesoscale codes, allowing them to be used for further applications. This is complemented by the ALL library, which aims to improve the scalability of many of these codes to a larger number of cores on HPC systems, and thus to reduce the time-to-solution of the applications.
Industrial impact
Industry connection within this WP happened through four pilot projects in collaboration with industry. Connection with industry also arose from the work on the DL_MESO_DPD code. Within UKRI STFC, DL_MESO_DPD is involved in projects with Unilever, Syngenta, Infineum, IBM Research Europe, and the STFC spinout company Formeric. A training event dedicated to industry was organized on the DL_MESO_DPD code and the use of Dissipative Particle Dynamics (DPD) simulations to study complex systems (webpage for the event). Furthermore, we have held four workshops in WP4 that dealt with topics of interest for industry:
1. Challenges in Multiphase Flows, 9 - 12 December 2019, Monash University Prato Center, Italy, and CECAM-DE-SMSM, Mainz, Germany. Workshop report.
2. State-of-the-Art Workshop in Mesoscale and Multiscale Modelling, 29 May - 1 June 2017, CECAM-IRL, University College Dublin, Ireland. Workshop report.
3. Scoping Workshop: Dissipative particle dynamics: Where do we stand on predictive application, 24 - 26 April 2018, CECAM-UK-HARTREE, United Kingdom. Workshop report.
4. Scoping workshop: E-CAM perspectives on Simulation, Modelling and Data in Industry, 7 - 9 September 2016, CECAM-DE-SMSM, Mainz, Germany.
The scoping workshop number 3 on the list was particularly successful with industry and was attended by 12 industrialists from 6 large companies. The scoping workshop number 4 on the list was attended by 4 large companies and 2 SMEs.
Training
During the lifetime of the project we organized five ESDWs in the area of meso- and multi-scale modelling:
• ESDW on Meso and multiscale modeling, 18 - 29 September 2017, CECAM-DE-MMS, Freie Universität Berlin, Berlin, Germany. Workshop report.
• ESDW on Meso and Multiscale Methods, 3 - 14 July 2017, CECAM-ES, University of Barcelona, Spain. Workshop report.
• ESDW on Load Balancing for Particle Simulations, 6 - 7 September 2018 (1st part) and 3 - 7 June 2019 (2nd part), Forschungszentrum Jülich, Jülich Supercomputing Centre, Germany. Workshop report.
• ESDW on Mesoscopic simulation models and High-Performance Computing, 14 - 18 October 2019, Aalto University & CSC IT Center for Science, Finland. Workshop page.
• ESDW in HPC for mesoscale simulation, 18 - 22 January 2021, Online/CECAM-UK-DARESBURY, Daresbury Laboratory, UK. Workshop page.
A training session on the E-CAM load balancing library ALL took place on 11 December 2020, in the form of a webinar. Three key lectures have been recorded and stored on E-CAM’s online training portal here.
Societal impact
The societal benefits of the results achieved in WP4 of E-CAM are twofold. At the fundamental level, the software modules developed are expected to provide new avenues for material modelling and to have an impact on the prediction of the properties of new materials, and on materials science in general. This will be complemented by the tools developed within the project to improve the performance and scalability of the codes (such as ALL), with the potential to benefit the broader community. At an economic level, industry will benefit from software containing efficient and easy-to-use simulation and analysis tools to extract observables for applications in sectors such as pharma, materials or household products. These developments can provide support in applications including but not limited to:
• Drug handling: kinetics of biopolymers, proteins and membrane interactions;
• Food/dairy industry: protein aggregation, grain size in ice cream, food preservation, food stability and texture control;
• Tyre industry: development of novel composite materials, determination of rheological properties of materials;
• Materials science and daily products: surfactant kinetics, material stability, soft matter, self-assembly of nanomaterials, colloids, liquid crystal based materials, liquid-surface interactions;
• Oil industry: studies of flow in porous media.
Society can benefit greatly from the development of new computational algorithms and the corresponding software that make the development of new materials cheaper and improve food quality and products in daily use.
3.5 WP5, Training
The pillars of the training in E-CAM were the ESDWs, transversal training events in collaboration with PRACE, and the online training infrastructure. We held 30 training events during E-CAM (18 of which were ESDWs) and trained 474 participants at different stages of their careers at these events; ESDW participants opened 90 software modules that were certified. Lectures originating from these events were recorded and stored on our training portal at https://training.e-cam2020.eu/.
E-CAM’s online training portal is built on Clowder, which is developed at NCSA. Clowder is a research data management system designed to support any data format and multiple research domains. E-CAM has expanded the capabilities of Clowder to be able to, among other things:
• shrink the input video to just 1.2MB per minute for full HD video (roughly the same size as simple stereo audio files)
• extract the slides from the captured presentation and prepare them for the previewer
• create a navigation panel for the video allowing the user to easily jump between slides in the video (and allowing slide navigation to be auto-synchronised with the video)
• when URLs are given as teaching material, display previews of the target URL (as well as some additional site information)
• change the file format for the slide metadata images, saving up to 70% of the space required for these images.
So far, we have 129 datasets on the portal, with lectures captured at our training events. Lectures are given a license according to the speaker’s consent (through a consent form circulated at ESDWs). The vast majority of lectures are fully public. Lectures have tags, allowing users to aggregate results by scientific topic. A video tour of our online training portal is at this location.
Collaborations with other players are also being explored to ensure the sustainability of our training portal, which we believe is extremely valuable for the community, especially in a period where events are running mostly online and large numbers of lectures are being recorded and stored. More specifically, a collaboration will start between CECAM/E-CAM, MaX and the NCCR centre MARVEL, to build on our development efforts in the Clowder platform and set up a state-of-the-art web portal for training and dissemination material and beyond.
WP5 analysed the profile of the participants in our ESDWs and performed surveys, for events happening between 2016 and 2021. Below we report on the analysis of
1. the participants’ profile (country of residence, gender, qualification),
2. the satisfaction surveys,
3. training needs highlighted by the participants of our ESDWs.
Profile of ESDW participants
The country of residence of the people participating in our ESDWs is listed in Table 1. The United Kingdom, Germany and France are the most represented countries, followed by different countries spread around Europe and beyond. 8% of our participants come from countries outside the EU, including the US (4%). With respect to gender, female participation in our ESDWs is 16% (see Table 2).
Table 1: Country of residence for the participants of our ESDWs.
Country                            %
United Kingdom                    17
Germany                           16
France                            15
Ireland                            9
Netherlands                        6
Spain                              6
Switzerland                        5
Italy                              5
Austria                            5
United States                      4
Norway                             2
Other countries in the EU          3
EU-13 countries                    3
Other countries outside the EU     4
Table 2: Gender balance at our ESDWs.
Gender     %
Female    16
Male      84
The majority of people attending our ESDWs are senior scientists (professors, assistant professors, lecturers) or scientists (post-docs, researchers, programmers, research software engineers), followed by PhD students, master students and industrial researchers. See Figure 2 for more details on this distribution.
Figure 2: Position occupied by the people attending our ESDWs.
Participants surveys
E-CAM sends out surveys to the participants of its ESDWs at the end of each event. The survey is anonymous and helps E-CAM to improve the quality of its extended software development workshops by reviewing the participants’ replies to 15 key questions. There is also space for people to make additional comments if they wish. Figure 3 summarizes the answers from 73 participants in 11 of our ESDWs to the following questions selected from the survey:
1. Impact of the meeting on your research
2. Was there enough discussion time at the meeting
3. Evaluate the science presented
4. Evaluate the software developed
5. How much did you learn during the meeting
6. Evaluate the quality of the training material provided
7. Did you start the development of one or more software modules to include in the E-CAM repository
8. Was there enough support for the development of your modules.
The participation rate in the survey corresponded to 35% of the total number of participants in these workshops. The feedback that we got clearly showed that workshops will trigger interesting new approaches in people’s research (with 14% of the people expressing that it will have a major influence on their research). 91% of respondents acknowledged the time dedicated to discussions at the meeting, the guidance of discussions by a specific topic, and their relevance and topicality. The majority of participants evaluated the science as being of really good quality and 16% considered it to be leading edge. The software developed at the meeting was considered better than expected by 41% of the participants. It was confirmed that the majority of people learned software best practices at the meeting, and that the training material provided was above average to excellent. 74% of the participants in our ESDWs started the development of one or more software modules to include in the E-CAM Software repository, and almost all respondents judged that there was enough support for the development of software.
Figure 3: Results of the ESDW participants’ surveys.
Since the fall of 2019, surveys also ask participants about their training needs. The following items have been collected so far:
1. Hands-on code optimization
2. Code debugging
3. Code testing
4. Git, GitHub, GitLab training
5. Writing bindings (Python as a syntactic glue)
6. Include discussions on tools and tricks/hints that the software developers can use in their everyday workflows
7. Image processing and computer vision.
It is important to highlight that most of these topics have been integrated into our programme of events. Associated lectures were stored on our online training portal.
3.6 WP6, Software Infrastructure
WP6 is concerned with coding standards and their documentation, the development of interfaces from code to hardware, the implementation of low-level libraries, and the provision of the software tools of the CoE. These efforts have been described in deliverables from this WP, with D6.7: E-CAM Software Platform V [16] being the last iteration of the series of deliverables from WP6. This WP provided general development assistance to the software modules created by the postdoctoral fellows of E-CAM, the attendees of the ESDWs and the scientists within E-CAM.
E-CAM software is documented in our Software Library, deployed by WP6, which is divided into individual repositories for each of the current focal areas. In the table below we provide direct links to the rendered documentation websites for each of the scientific areas. A snapshot of the software library is in Fig. 4.
Scientific area                     Link to online documentation
Classical MD                        Online documentation
Electronic Structure                Online documentation
Quantum Dynamics                    Online documentation
Meso- and Multi-scale Modelling     Online documentation
Figure 4: Snapshot of the E-CAM Software Library.
A software module for E-CAM is any piece of software that could be of use to the E-CAM community and that encapsulates some additional functionality, enhanced performance or improved usability for people performing computational simulations in the domain areas of interest to us. A final E-CAM module adheres to current best-practice programming style conventions, is well documented and comes with either regression or unit tests (and any necessary associated data). E-CAM modules are written in such a way that they can potentially take advantage of anticipated hardware developments in the near future. The E-CAM programmers oversee and implement the different stages of software development, under the supervision of the Software Manager.
In addition, WP6 has contributed to the creation of EESSI (European Environment for Scientific Software Installations), its integration in Magic Castle (a cloud-based HPC training environment), and its availability as a GitHub Action for continuous integration (see footnote 2).
Footnote 2: GitHub Actions allow you to automate, customize, and execute your software development workflows in your GitHub repository.
3.7 WP7, Hardware considerations and the PRACE relationship
WP7 is concerned with the monitoring of hardware developments, liaising with PRACE, interfacing of codes to new hardware, porting, benchmarking and scaling of software on new architectures and support for the user community in the use of new hardware. Some of the most important contributions of WP7 are highlighted below:
• A High Throughput Computing (HTC) project, jobqueue_features, that recognises the role of ensemble calculations in the E-CAM community and provides an adaptive framework that efficiently maps them to extreme-scale resources. This project was initially carried out in collaboration with PRACE and is implemented as a Python library built upon the scalable analytics framework Dask.
• A Load Balancing Library (ALL), which aims to provide an easy way to include dynamic domain-based load balancing into particle-based simulation codes. The library is developed in the Simulation Laboratory Molecular Systems of the Jülich Supercomputing Centre at Forschungszentrum Jülich.
• WP7 is a frequent EasyBuild contributor and is now contributing to EESSI, which will port and optimise entire software stacks to new architectures in a way that provides a consistent computing environment on laptops, in the cloud and at the largest supercomputing sites. WP7 has shown that EESSI does not sacrifice hardware performance, and that both network and accelerator hardware can be used effectively in this approach.
• Development of the multi-GPU version of DL_MESO, now scaling with 85% efficiency to 4096 GPUs.
• WP7 has assisted in the creation of LearnHPC, which leverages EESSI and Magic Castle to create dynamic, event-specific HPC training clusters in public clouds (such as AWS, Azure, GCP and OpenStack). LearnHPC has been awarded resources from the Fenix Research Infrastructure and is in discussion with PRACE as a candidate remote learning resource for PRACE courses.
• WP7 has also engaged in a pilot project with the training WP of the Centre of Excellence (CoE) coordination project FocusCoE (where the E-CAM Software Manager is the training WP co-lead). The intention of the project was to create an environment and a protocol to develop specific training content on how to efficiently run a specific, heavily-utilised, community application at scale on European HPC resources. For the pilot, LAMMPS was chosen as the target application. This collaboration was conceived and designed during E-CAM. The outcome is available at Running LAMMPS on HPC systems.
• Implementation of a CUDA-based mesoscale bubble dynamics code capable of simulating foam coarsening and mechanics for systems consisting of millions of bubbles in 3D.
• WP7 has investigated, and promoted, the use of performance-portable programming models for accelerators. In particular we have included support for Kokkos in the n2p2 neural network potential library, and explored its performance as an alternative to CUDA in DL_MESO. We have also tested the use of HIP in the Aalto bubble code.
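The high-throughput pattern behind the jobqueue_features item in the list above (many independent ensemble members mapped onto a pool of workers, with results gathered afterwards) can be sketched as follows. This is only an illustration using Python's standard library: it does not use jobqueue_features or Dask, and `run_replica` and `run_ensemble` are hypothetical names, with a toy random walk standing in for a real simulation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_replica(seed, n_steps=1000):
    """Hypothetical stand-in for one ensemble member: a 1D random walk.

    Returns the squared end-to-end displacement after n_steps unit steps.
    """
    rng = random.Random(seed)  # private generator, so each replica is reproducible
    x = 0
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
    return x * x


def run_ensemble(seeds, max_workers=4):
    """Map independent replicas onto a worker pool and gather results in seed order."""
    # Threads keep this sketch portable; a real HTC framework would dispatch
    # each task to separate processes or cluster nodes instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_replica, seeds))


if __name__ == "__main__":
    results = run_ensemble(range(16))
    print("mean squared displacement:", sum(results) / len(results))
```

Because the replicas do not communicate, the ensemble scales simply by adding workers; the framework's job is to keep those workers busy as resources come and go.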
3.8 WP8, Engagement with Industry
WP8 is concerned with stimulating the broad and committed involvement of industry in the activities of E-CAM both to support and enable relevant commercial activities and to stimulate and grow sponsorship in the quest for sustainability. The pillars of E-CAM’s relationship with industry are: pilot projects focused on industrially oriented problems, state-of-the-art and scoping workshops, and training courses for industrialists.
Delivery of state-of-the-art and scoping workshops
At Scoping Workshop (SCOW) events industrialists and academics work together to sharpen and focus the work plan for E-CAM. Academics outline the major advances to be expected in each interest area, with an emphasis on applications, and industrial partners outline the areas they wish to see developed, and the kind of support that they require. State-of-the-art Workshop (SAW) events survey new methods and algorithmic developments in simulation, help to establish the codes and software modules that should be included in the E-CAM repository, and also serve to
disseminate current E-CAM produced software. They establish the current state-of-the-art and highlight the immediate developments required. During the lifetime of the project we organized 7 SCOWs and 8 SAWs, listed in the sections dedicated to each of the scientific WPs. Industry acted as co-organizer of 5 scoping workshops. In total, 66 industrial researchers had access to discussions and expertise at these events. Workshop reports from these events are stored on our website at this location.
Pilot projects
E-CAM supported academic and industrial research via a set of pilot projects focused on industrially oriented problems, sustained by funded PDRAs supervised by scientists in the team. During the EU funding period E-CAM ran 17 pilot projects, 10 of which included an industrial collaboration and 7 of which enabled HPC best practices with potential transfer to industry. The E-CAM website hosts a webpage for each of the pilot projects. These pages provide detailed information about the work description, list of tasks, list of developed modules, and all material disseminated in the context of the pilot projects. The PDRAs working on the pilot projects produced 116 software modules, accessible to users beyond the specific partners via the E-CAM website and our repository. Software is licensed for re-use and re-distribution, with specific restrictions, where necessary, dependent on the particular licence of the specific software. License information is available in the documentation of every software module.
Training courses for industrialists
In the original E-CAM proposal, two main vehicles to promote innovative and effective interactions with industries were indicated: collaborative pilot projects focused on industrially oriented problems and scoping workshops. To further expand the portfolio of activities targeted at industrialists, E-CAM has established a series of new events aimed at training interested industrial researchers in the simulation and modelling techniques implemented in specific codes and in the direct use of this software for their industrial applications. E-CAM wishes to provide an opportunity for large companies and SMEs to broaden their scientific expertise by organizing training courses for industrialists on specific families of computational methods, scientific software and HPC tools. In these training courses, E-CAM provides the experts who deliver training on the methods, combined with hands-on training on software development. The first event of this kind happened in April 2021, and focused on the area of meso- and multiscale simulations (WP 4) and on the flagship code DL_MESO (see event website).
3.9 WP9, Dissemination
WP9 is concerned with the promotion of E-CAM’s activities via all relevant media to both the academic and industrial communities. Dissemination efforts towards the general public have also been pursued. The public face of E-CAM is the E-CAM primary landing website. On our website we disseminate project results in the following formats:
• Success stories reporting successful industrial-academic collaboration;
• Case studies with software developed in E-CAM and their potential applications;
• News items reporting E-CAM activities and results;
• Interviews with industrialists and academics in the E-CAM community on topics of general interest for our target groups;
• Featured software in the "modules of the month" category;
• Results associated with our pilot projects, on the dedicated Pilot project webpages;
• Scientific publications;
• Scientific reports from our workshops;
• E-CAM events on our events calendar;
• E-CAM Newsletters disseminated (also via email) to our target groups.
In addition to the above, we highlight the following dissemination activities during E-CAM:
1. Comic book about E-CAM and how it enables modelling, simulation and HPC (Fig. 5);
2. E-CAM webpage on the CECAM website;
3. Article in the EU research Magazine.
The E-CAM issue of Comics&Science mentioned in item 1 is freely available on our website at https://www.e-cam2020.eu/ecam-issue-of-comics-science/ and is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
Figure 5: Cover of the E-CAM issue of Comics&Science.
4 Summary of Exploitable results
The exploitable results in E-CAM and the ways in which we currently exploit them are as follows:
• Scientific software
– Software produced in E-CAM uses, where possible, open source licences such as GPL, LGPL and BSD. Where the software developed is contributed to a pre-existing external software application, the specific contributor agreements and licence terms of the external software apply (see D11.5: Data Management Plan [17] (version 2.0));
– Software can be used by academics in further research activities (outside the action), and by industrialists for further development and to integrate other features important for industrial applications. Software documentation includes, where possible, a practical applications and exploitation section, as well as the deliverables reporting them;
– Documentation of the software modules developed within E-CAM is stored on the E-CAM Library repository, which is open to contributions from anyone in the E-CAM community;
– The publicly available E-CAM Software Library hosts the documentation for all the software modules produced by each E-CAM PDRA, the attendees of the ESDW events and the scientists within E-CAM;
– Software produced in E-CAM (for OPS, n2p2, E-CAM HTC library, ALL load balancing library, ESL, etc.) has been exploited at several ESDWs, where participants external to the project have applied it to their own use cases and applications.
• Innovative ideas / inventions
– Discoveries having a commercial value and associated data-sets are placed under intellectual property (IP) protection (e.g. through patents, copyright) before any dissemination activities (see D11.5: Data Management Plan [17] (version 2.0));
– Where an industrial partner is involved, special attention is given to the software license that is used, and we make an effort to clarify these issues from the start of the collaboration. The software developed may be kept under embargo until publication(s) leveraging the developments achieved. Such modules can be maintained as part of a private E-CAM repository until results are published and/or for intellectual property requirements, which is particularly relevant when dealing with industrial partners;
• Scientific publications. Open access versions of the documents are made available through the E-CAM repository on Zenodo or arxiv.org, allowing knowledge transfer. The publications are accessible through a dedicated page on the E-CAM website.
• Workshops scientific reports. These are stored on our website and disseminated to academia and industry through email messages and our newsletters. They can be used in further research activities, help to identify new directions, monitor developments, allow knowledge transfer to both industry and academia and assist in planning future workshops. The workshop scientific reports are available on a dedicated page on the E-CAM website.
• Deliverables. Non-confidential deliverables in E-CAM are made available in the E-CAM repository on Zenodo. They are also assigned a DOI, allowing them to be cited in scientific publications and presentations. The list of deliverables is available on a dedicated page on the E-CAM website;
• Online Training portal
– Material generated at our events is made available through our online training infrastructure. The training material can assist the participants of our ESDWs, our PDRAs and other interested groups to develop software for extreme-scale hardware. Access to the online training material is possible via registration to the E-CAM online infrastructure. Most of the lectures recorded at our events are public on our website;
– The portal also allows contributors to the repository to disseminate their training/expository material more widely for constructive use in future events;
• Case studies & success stories. A selection of case studies and success stories related to E-CAM’s pilot project activities are available on our website and on the FocusCoE website and licensed under the Creative Commons Attribution 4.0 International License. The case studies and success stories can be read on the dedicated page on the E-CAM website.
• Calendar of events. The E-CAM calendar of events is integrated in the CECAM website and the FocusCoE portal. All our events are open to the community external to E-CAM and in particular our target groups in industry and academia. Applications are made directly through the CECAM website.
• E-CAM Comics. The E-CAM issue of Comics&Science is freely available on our website and is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Everyone in the extended E-CAM community can use it as a vehicle to promote HPC and simulation and modelling to the general public.
5 Conclusions
E-CAM has sought to prepare the extended E-CAM community for the demands of upcoming exascale systems. We have addressed core cross-cutting topics such as
• Load balancing - through the development of the ALL load-balancing library,
• Intelligent High Throughput Computing - through the development of the jobqueue_features library,
• Accelerator portability - through the promotion of libraries such as Kokkos, and implementations in E-CAM codes,
• Portability of entire software stacks - through contributions to EasyBuild and, now, EESSI.
E-CAM has paved the way for the portability of entire application stacks and workflows to all the architectures that will be available through EuroHPC. These alone are major contributions to preparing the E-CAM community, providing them with some of the tools that they will need to address issues that are likely to occur at scale: hardware diversity, portability of workflows, load imbalance, resiliency, coordinating ensemble petascale calculations, and so on.
E-CAM has gone far beyond cross-cutting work, however, and has repeatedly worked with individual applications to evaluate and improve their performance, collaborating with other projects in the EuroHPC eco-system along the way. A primary example of this is given by the DL_MESO application which, as a result of E-CAM, has been shown to be scalable to 4096 GPUs (the largest GPU partition available in Europe at the time).
In addition, E-CAM has raised awareness of best practices in scientific computing, attempting to encourage people to consistently use
• version control,
• documentation,
• continuous integration,
• modular software development,
• configure/build/install installation processes and the tools that can support these,
• open source software licences.
E-CAM has trained hundreds of scientists and exposed them to High Performance Computing (HPC) resources and tools that they might never otherwise have known.
It has raised awareness of the EuroHPC eco-system among the thousands of scientists associated with CECAM, and trained the next generation of those scientists so that HPC is a go-to tool in their computational workflows.
The scientific output from E-CAM is remarkable, both in the development of new algorithms and methods and in the related scientific publications, with clear benefits for academic and industrial researchers. We produced 44 publications during E-CAM, and our articles have more than 1980 citations so far. E-CAM has injected about 5.5 M€ of funding into the community, to fund developments that are freely available in our software library, on the training portal and through open access scientific publications. Our software library is composed of 220 certified software modules, and ten more modules are in progress.
A set of actions (software development, new formats for workshops) has been initiated that has clearly proved its strategic interest for the community and will be maintained and further developed. Specific tools and initiatives will be continued and have funding for at least the next 48 months. These are listed below.
Workshops
• Extended Software Development Workshops:
– Improving bundle libraries, 11-22 October 2021, CECAM-HQ (EPFL). Workshop website.
– HPC for simulation of complex phenomena, 11-15 October 2021, IIT (Genova). Workshop website.
– Inverse Molecular Design & Inference: building a Molecular Foundry (2nd part), Fall 2021, NUID UCD (Dublin). Workshop website.
• Scoping workshop:
– Simulation of open systems in Chemistry, Pharma, Food Science and Immuno-diagnostics: Rare-event methods at constant chemical potentials including constant pH (2nd part), Fall 2021, NUID UCD (Dublin). Workshop website.
• State-of-the-art workshop:
– NOMAD/E-CAM workshop on Modeling materials at realistic space and time scales via optimal exploitation of exascale computers and AI, 1-3 November 2021, CECAM-HQ (EPFL). Workshop website.
• Industrial training event with the SME BiKi Technologies, 22 Nov 2021 (online).
Infrastructures
• The software library will continue to exist, supported by CECAM. In addition, several software development projects that originated in E-CAM and are stored in the software library will be completed during the sustainable phase (10 modules are currently in progress).
• Our online training portal will continue to exist, supported by CECAM. The training portal will be further developed by CECAM, MaX CoE and the NCCR MARVEL, who will expand our efforts on the Clowder platform and build a powerful, multipurpose repository with the potential to become a world reference for domain-specific and general HPC training for the broad simulation and modelling community.
• We have built a dedicated E-CAM section on the CECAM website, to ensure that E-CAM’s most important results, as well as the activities that we plan to run beyond March 2021, continue to be disseminated and communicated to the target groups after the EU funding period.
Software development projects
• Some of the work that originated in E-CAM will continue beyond the EU funding period. This is the case, for example, for the Molecular Biosensor work and for the pilot project Mesoscale Modelling of phoretic phenomena in binary fluids, both of which tackle clear industrial problems in collaboration with an industrial counterpart.
• Other software development projects strongly supported during E-CAM will continue to expand: OPS, n2p2, MaZe, ESL, GPU DL_MESO, etc.
The developments achieved in E-CAM have required a considerable effort, due both to the ambitious original goals of the project and to an increased focus on extreme-scale computing as the project progressed. In spite of this, all original goals of the project have been exceeded and new commitments have been successfully tackled with intelligence and determination. We are extremely grateful to all those who have contributed, in their different roles, to this success and very proud of the community for its dedication and effectiveness. The Horizon 2020 funding has covered only a fraction of the effort, material and in kind, of our team. The remarkable number of external contributions to our software library clearly demonstrates that the E-CAM approach has been appreciated and is well worth preserving. We will continue to work, motivated by our conviction that the legacy of our efforts will be felt in the community even more in the future, as the increasing amount of EuroHPC resources begins to materialise. E-CAM has contributed strongly to putting the community in a position where its entire software stack will be portable to these systems, and has created the tools for it to use these resources to do great science without an overwhelming technical burden.
References
Acronyms Used
CECAM Centre Européen de Calcul Atomique et Moléculaire
HPC High Performance Computing
PRACE Partnership for Advanced Computing in Europe
ESDW Extended Software Development Workshop
SAW State-of-the-art Workshop
SCOW Scoping Workshop
WP Work-Package
CoE Centre of Excellence
HTC High Throughput Computing
PDRA Postdoctoral Research Associate
QMC Quantum Monte Carlo
ESL Electronic Structure Library
MD Molecular Dynamics
OPS OpenPathSampling
IP Intellectual Property
ALL A Load-balancing Library
MLWF Maximally Localized Wannier Functions
DPD Dissipative Particle Dynamics
EESSI European Environment for Scientific Software Installations
MPM Material Point Method
Citations
[1] A. Ó Cais, D. Swenson, M. Uchronski, and A. Wlodarczyk, “Task Scheduling Library for Optimising Time-Scale Molecular Dynamics Simulations,” Aug. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3527643
[2] J. W. S. McCullough, R. A. Richardson, A. Patronis, R. Halver, R. Marshall, M. Ruefenacht, B. J. N. Wylie, T. Odaker, M. Wiedemann, B. Lloyd, E. Neufeld, G. Sutmann, A. Skjellum, D. Kranzlmüller, and P. V. Coveney, “Towards blood flow in the virtual human: efficient self-coupling of HemeLB,” Interface Focus, vol. 11, no. 1, p. 20190119, 2021. [Online]. Available: https://royalsocietypublishing.org/doi/abs/10.1098/rsfs.2019.0119
[3] J. Castagna, X. Guo, M. Seaton, and A. O’Cais, “Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs,” Computer Physics Communications, vol. 251, p. 107159, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0010465520300199
[4] M. J. T. Oliveira, N. Papior, Y. Pouillon, V. Blum, E. Artacho, D. Caliste, F. Corsetti, S. de Gironcoli, A. M. Elena, A. García, V. M. García-Suárez, L. Genovese, W. P. Huhn, G. Huhs, S. Kokott, E. Küçükbenli, A. H. Larsen, A. Lazzaro, I. V. Lebedeva, Y. Li, D. López-Durán, P. López-Tarifa, M. Lüders, M. A. L. Marques, J. Minar, S. Mohr, A. A. Mostofi, A. O’Cais, M. C. Payne, T. Ruh, D. G. A. Smith, J. M. Soler, D. A. Strubbe, N. Tancogne-Dejean, D. Tildesley, M. Torrent, and V. W.-z. Yu, “The CECAM electronic structure library and the modular software development paradigm,” The Journal of Chemical Physics, vol. 153, no. 2, p. 024117, 2020. [Online]. Available: https://doi.org/10.1063/5.0012901
[5] P. G. Bolhuis and D. W. H. Swenson, “Transition Path Sampling as Markov Chain Monte Carlo of Trajectories: Recent Algorithms, Software, Applications, and Future Outlook,” Advanced Theory and Simulations, vol. 4, no. 4, p. 2000237, 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/adts.202000237
[6] D. W. H. Swenson, J.-H. Prinz, F. Noe, J. D. Chodera, and P. G. Bolhuis, “OpenPathSampling: A Python Framework for Path Sampling Simulations. 1. Basics,” Journal of Chemical Theory and Computation, vol. 15, no. 2, pp. 813–836, 2019, PMID: 30336030. [Online]. Available: https://doi.org/10.1021/acs.jctc.8b00626
[7] D. W. H. Swenson, J.-H. Prinz, F. Noe, J. Chodera, and P. Bolhuis, “OpenPathSampling: A Python Framework for Path Sampling Simulations. 2. Building and Customizing Path Ensembles and Sample Schemes,” Journal of Chemical Theory and Computation, vol. 15, no. 2, pp. 837–856, 2019. [Online]. Available: https://doi.org/10.1021/acs.jctc.8b00627
[8] R. Lot, F. Pellegrini, Y. Shaidu, and E. Küçükbenli, “PANNA: Properties from artificial neural network architectures,” Computer Physics Communications, vol. 256, p. 107402, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0010465520301843
[9] V. Vitale, G. Pizzi, A. Marrazzo, J. Yates, N. Marzari, and A. Mostofi, “Automated high-throughput wannierisation,” npj Computational Materials, vol. 6, no. 1, p. 66, Jun. 2020. [Online]. Available: https://doi.org/10.1038/s41524-020-0312-y
[10] A. Ó Cais, “ESDW Technical Software Guidelines I,” Mar. 2016. [Online]. Available: https://doi.org/10.5281/zenodo.841735
[11] M. Bialczak, A. Ó Cais, M. Uchronski, and A. Wlodarczyk, “Intelligent HTC for Committor Analysis,” Nov. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.4382017
[12] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, and N. Marzari, “wannier90: A tool for obtaining maximally-localised Wannier functions,” Computer Physics Communications, vol. 178, no. 9, pp. 685–699, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0010465507004936
[13] F. Fracchia, G. D. Frate, G. Mancini, W. Rocchia, and V. Barone, “Force Field Parametrization of Metal Ions From Statistical Learning Techniques,” Journal of Chemical Theory and Computation, Nov. 2017. [Online]. Available: https://doi.org/10.1021/acs.jctc.7b00779
[14] D. Lopez Duran, E. Plesiat, M. Krompiec, and E. Artacho, “Gap variability upon packing in organic photovoltaics,” PLOS ONE, vol. 15, no. 6, pp. 1–18, Jun. 2020. [Online]. Available: https://doi.org/10.1371/journal.pone.0234115
[15] L. M. Ibele and B. F. E. Curchod, “A molecular perspective on Tully models for nonadiabatic dynamics,” Phys. Chem. Chem. Phys., vol. 22, pp. 15183–15196, 2020. [Online]. Available: http://dx.doi.org/10.1039/D0CP01353F
[16] A. Ó Cais, “D6.7: E-CAM Software Platform V,” Dec. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3598331
[17] K. Collins, A. O. Cais, D. Mackernan, and A. Mendonça, “Data Management Plan (version 2.0),” Oct. 2016. [Online]. Available: https://doi.org/10.5281/zenodo.3366123