FAIR4ChemNL - towards international, cross-institutional data management solutions across chemistry disciplines

If you have been following TDCC-NES activities, you might remember our excitement about potential future collaborations with Germany, especially with NFDI4Cat, after joining the CoRDI2023 Conference. In June, we took the first steps to concretize these opportunities, focusing on chemistry communities.

Workshop participants

Through an exciting 2-day workshop in Utrecht, we have aimed to establish the foundation for implementing the FAIR data principles and unified data management solutions across chemistry disciplines. The organisation of the workshop was a collaborative effort by the Dutch National Fundamentals and Methods in Chemistry (FMC) community and the NFDI4Cat consortium (Germany) with support from TDCC-NES. 

We brought together scientists, data stewards and data managers, research software engineers and IT specialists from Dutch and German academic institutions as well as the Netherlands eScience Center and SURF to discuss the challenges and solutions in chemistry digitalization and define the strategy towards setting common standards and processes to unite and strengthen the cross-disciplinary chemistry community. Our group of 40 participants included 13 presenters, 5 facilitators and more than 20 institutions, consortia and communities.

From knowledge exchange to aligned implementation

The 1st day of the workshop took the form of an engaging exchange of current best practices and available solutions, with lively discussions focusing on questions such as: What can we take to the next steps? How can we work together? What do we need to succeed?

Insights and knowledge from all phases of the research data management (RDM) cycle, from planning, collection and generation, through processing and analysis to storage and backup, were shared in 12 presentations.  

Prof. Norbert Kockmann condensed the key learnings from the 3 years of NFDI4Cat’s activities. NFDI4Cat is dedicated to enhancing RDM in catalysis-related sciences, adhering to the FAIR data principles (Findable, Accessible, Interoperable, and Reusable). To ease the transition for researchers, NFDI4Cat has been focusing on providing tools that simplify the shift to digital catalysis from day one, minimizing their workload through efficient and streamlined data handling practices. 

“Overcoming the barriers of catalysis research with open and FAIR data” was the topic addressed by Dr. Annette Trunschke. It showed how the implementation of a local data infrastructure and the connection to the open repository NOMAD, a web-based software for FAIR research data management in materials science, can contribute to consistent and reliable data sets and the storage of associated metadata. The presented digitalization concept was a fully automated process of data acquisition, subsequent standardized analysis, uploading to a local database and the generation of links between database entries, involving working according to machine-readable standardized operating protocols (SOPs). 

Dr. Nongnuch Artrith talked about “Computational Databases and Machine Learning for Applications in Energy Storage and Conversion”, discussing how computational databases and methods can optimize and design materials for energy storage and conversion. ML models, genetic algorithms, and molecular dynamics simulations, when combined, enable accurate simulations of realistic atomic structures, which demonstrates how FAIR data principles and unified data management solutions across the chemistry, materials, and physics disciplines could help accelerate materials discovery.

On behalf of SURF, dr. Maithili Kalamkar-Stam presented best practices and available services for data processing, analysis, backup and storage.

NFDI4Cat’s efforts in the area of infrastructure were presented by Dr. Thomas Bönisch and Prof. Sonja Schimmler. They presented the 4Cat Meta Portal which serves as the main entry point for exploring the entire knowledge base of the catalysis community utilizing semantic web technologies. By harvesting public metadata from all services within the infrastructure the Meta Portal provides a powerful tool to search for data fostering re-use and exchange of knowledge. the creation of a centralized storage solution. They also discussed the need for a centralized storage solution, and described the solution that is being developed within the NFDI4Cat consortia. 

Erik Bergman brought forth an approach from industry. He talked about High Throughput Catalyst Screening through parallel flow reactors, sharing learnings from 20+ years of experience in building and using a Data Processing platform for Flowrence®. 

The ongoing efforts within the DAEMON COST network were introduced by dr. Kevin Rossi. DAEMON COST is a pan-european network consisting of 200+ members from 40+ EU countries, and their objective is to democratize and accelerate the use of machine learning methods for chemistry and materials science. 

Dr. Rachit Khare talked about a research data management tool for laboratory courses - RDM4Lab. As part of NFDI4Cat, the Department of Chemistry at TUM has developed this RDM tool for systematically storing the data generated in the laboratory course in compliance with FAIR principles. The browser-based user interface allows the students to submit reports as well as raw data to a database, and the supervisors can then view both. It also features Python-based modules that can be used by the supervisors to perform automatic data analysis. 

Two specific challenges that NFDI4Cat researchers faced in achieving FAIR data were shared by dr. David Linke. He presented the VOC4Cat vocabulary that NFDI4Cat developed, and introduced a novel, schema-based Persistent Identifier (PID) system designed to simplify identification of resources like samples or devices throughout the research process. Building upon the shared vocabulary, interoperable data models can be developed that together with the PID system enable interlinked data graphs.

Dr. Jelte Nimoth talked about the experiences and lessons learned at the University of Groningen with setting up and working with electronic lab notebooks. He also presented the research data management system that is currently in place at the university with a particular focus on the workflows it facilitates.

Dr. Mark Doerr presented the LARAsuite (gitlab.com/larasuite), an open source and free research data management suite that natively generates and collects semantically annotated research data directly from lab devices during data generation. Experimental data is collected in an open format related to Fair Digital Objects (data containers with JSON-LD metadata). He demonstrated the whole infrastructure from start to end with some automated sensors, capturing and annotating time series data and querying the aggregated data.

Alexander Behr shared his work on methods for automated ontology development, focusing on the domain of Catalysis. The presentation also offered a brief introduction to workflows of automated ontology extension with Natural Language Processing.

Looking at the future, acting now

The 2nd day of the workshop was structured into two hands-on interactive sessions, focusing on synthesizing the next steps and creating a collaboration plan. To kick off the day the participants were divided into four groups, based on their expertise and interests, and they worked on the following topics: Data Collection and Generation (ELNs), Processing and Analysis (workflows, pipelines), Publishing and Archiving, Ontology and metadata development. 

Drawing from 1st day’s presentations each of the groups started by describing the state of the art in their specific topic. This was followed by a future outlook brainstorming session - thinking about where the chemistry community wants to be 5 years from now, what is needed to get there, what needs to/will change, which challenges will remain and which new challenges will be coming up. The session concluded with a closer look at the next 2 years focusing on the biggest bottlenecks that can be addressed in that period. The 4 groups discussed issues such as the lack of visibility and understanding of ontologies, the need to encourage the reporting of all data, including that from unsuccessful experiments, as something that could shift the publishing culture from one focused solely on success stories to one that values the entire research workflow and data integrity, or how to overcome the fact that reproducibility is not inherent to culture in chemistry which is primarily motivated by competition in the field.

In the second part of the day, one group of participants discussed the TDCC-NES project proposal in development -  FAIR4ChemNL: Unifying and managing chemical research data-flows, with prof. Evgeny Pidko as the lead applicant. The group focused on how to best design the projects and outline realistic work packages and tasks that will deliver the biggest impact with the limited timeline and budget, as well as specific project roles and partnering organisations. The experience NFDI4Cat consortia members shared in terms of how they structured their own work and what they’ve been able to optimise in the past 3 years has been especially valuable. This input is being implemented into a full project proposal. 

The rest of the participants focused on immediate actions they could jointly take to continue this fruitful exchange of experience and best practices to align their efforts. This included a start of planning a similar workshop, to happen next year, as a part of the closing conference of the 1st phase of NFDI activities, a potential collaboration on developing catalysis-focused RDM training materials, a start of a discussion on alignment of ontologies for the fields of chemistry and catalysis at the EU level, and several others. 

A Concrete Outcome: One of the main concrete outcomes of the workshop is the development of a whitepaper. Participants are working on this document to solidify the next steps and set a clear agenda for future collaboration. The whitepaper will serve as a comprehensive guide, outlining the strategic actions needed to implement FAIR data principles effectively across borders and enhance data management practices in chemistry everywhere.

In the coming months, you can expect more information about some of these collaborations, as well as the FAIR4ChemNL project. Follow updates from TDCC-NES, FMC, and NFDI4Cat to stay abreast of the latest developments.

Photographs of participants: L.Varat