TDCC-NES Challenge projects
The Challenge funding strand is intended for mid-sized projects with a duration of 24–30 months and a budget between €50,000 and €400,000. These projects are intended to respond to one or more challenges defined in the TDCC-NES roadmap:
- FAIR data
- Sustainable software and e-science
- Connections to international activities
- Long-term data archiving
- Locate computing capacity close to the storage
- Human capital
- Cross-field collaborations
The first call for these projects launched in November 2023 with an initial sum of €1.6 million per TDCC. The call will be open for twelve months, and the review process will be transparent, collaborative and community-led, organised in several submission cycles. For more details on the process and how to apply, read through the TDCC-NES Project Development Process.
Results from the last three submission cycles
We have received 22 submissions in total, 3 of which were resubmissions based on suggestions from the TDCC-NES Governing Board (GB). The GB has made consensus-based decisions to accept 5 project ideas for further development. Public summaries of the meetings where these decisions were made are available here under "Meetings". One project idea was withdrawn from further development at the applicants' own decision. Read more about the 4 project ideas under development below:
Geospatial machine learning (ML) models are widely used in scientific and (semi-)operational settings by geoscientists, ecologists, agronomists, engineers, spatial planners, public health specialists, and others. These models and the methods to develop them are evolving rapidly, making it difficult to keep up with them. While some researchers and practitioners are proficient in the development, application and (re)use of ML models, others lack the basic knowledge required to reap the benefits of geospatial ML models. Additionally, ML modelling remains an art, and modellers do not always document their creative process. To address these problems, we propose creating a geospatial ML course that increases geospatial ML literacy as well as the (re)usability of geospatial ML models.
The geospatial ML course would not only provide researchers with foundational knowledge and skills, but also with the opportunity to stay updated with the latest advancements. Although generic ML courses exist, using ML with geospatial data is different from other domains, as the spatial aspect introduces domain-specific challenges (e.g., ways to deal with spatial autocorrelation). Moreover, the variability, volume and dimensionality of geospatial data often bring data integration and processing challenges. In addition, modellers often look for geographical and physical consistency, whereas this is not automatically guaranteed by ML algorithms. Additional challenges include the selection of methods to properly evaluate geospatial ML models, the delineation of their domain of applicability so that they can be (re)used in a responsible manner, and the identification of suitable ways to combine geospatial ML and legacy (mechanistic, mathematical, process-based) models. In short, the proposed course will provide valuable insights into the development and application of ML concepts, while addressing the unique requirements of geospatial data.
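The spatial-autocorrelation issue mentioned above can be made concrete with a small sketch: under autocorrelation, a random cross-validation split places near-identical neighbouring samples in both the training and test folds, inflating accuracy estimates, whereas a spatial block split holds out whole regions together. The toy data, block size and fold count below are illustrative only, not material from the proposed course:

```python
import random

# Hypothetical toy setup: 200 sample locations on a 10 x 10 km area,
# given as (x_km, y_km) pairs.
random.seed(42)
points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]

def random_folds(points, k=5):
    """Standard k-fold assignment that ignores location entirely."""
    return [i % k for i in range(len(points))]

def spatial_block_folds(points, k=5, block_km=2.0):
    """Assign each point to a fold via the 2 x 2 km block it falls in,
    so every point in a block is held out together."""
    folds = []
    for x, y in points:
        block_id = int(x // block_km) + 5 * int(y // block_km)
        folds.append(block_id % k)
    return folds

rand = random_folds(points)
spatial = spatial_block_folds(points)
```

Evaluating a model on the spatial folds typically yields lower, but more realistic, scores than on the random folds whenever spatial autocorrelation is present.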
Finally, we highlight three hallmarks of the proposed course: 1) we will develop it using open-source tools and solutions, which will help scale up our work, allowing (sub)disciplines to reuse, expand and modify our materials; 2) we will explore and test ways to ensure model FAIRness and reproducibility by adopting and adapting open-source (MLOps) tools and solutions, leading to more transparent and (re)usable models that can better support policy and decision making; and 3) we will involve the research community from the beginning of lesson and educational material development, to adapt the material to their use cases and to continually gather their feedback.
Lead applicants: Prof. dr. Raul Zurita-Milla, University of Twente & dr. Claire Donnelly, NLeSC
Listed project partners: WUR, VU Amsterdam, TU Delft, LTER-LIFE, SENSE / WIMEK, OSCNL, NLRN
Data-centered approaches in chemistry hold the promise of uncovering groundbreaking solutions critical for advancing energy and material transitions, pharmaceutical innovations, and circular economy endeavors. In the realm of chemical research, the potential of data to drive discovery and innovation is vast yet underutilized, primarily due to disparate data management practices. Our project, "FAIR Data for Chemistry: Unifying and Managing Chemical Research Data-Flows," aims to bridge this gap by advocating for the adoption of FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles across the chemistry discipline. This initiative is poised to lay the groundwork for practical data management solutions that cater to the nuanced needs of cross-disciplinary chemical research.
To achieve this, we plan to create a cross-institutional workforce led by dedicated software engineers who, while embedded within the research environments of the academic partners, will collaborate closely to implement complementary and functional data-management solutions. Their work will initially focus on individual research groups, with the goal of scaling successful practices to the national level in later stages. This strategy ensures that the project's outcomes will be directly applicable and beneficial to the broader Dutch chemistry community.
Our collaborative project will include leading academic groups from Delft University of Technology, Utrecht University, and the University of Amsterdam, with additional support from the Netherlands eScience Center and SURF. By aligning our efforts with international activities, starting with NFDI4Chem in Germany and aiming for broader European collaboration, we position our project at the forefront of global data management innovation in chemistry.
Through illustrative case studies, the development of standards, and the implementation of effective workflows, we anticipate fostering widespread adoption of FAIR data practices. The success of this project will mark a significant shift towards a more collaborative, efficient, and innovative future in chemical research, characterized by enhanced data sharing and interoperability.
Lead applicant: Prof. dr. Evgeny Pidko, Delft University of Technology
Listed project partners: Utrecht University, Delft University of Technology, University of Amsterdam, Netherlands eScience Center, SURF, NFDI4Cat, Fundamentals & Methods of Chemistry advisory board, ENW NWO
Data accessibility plays a crucial role in modern research, and it is pivotal in the journey towards Open Science. However, it is still challenging to quickly access and efficiently process large datasets that are continuously growing in volume as data sources diversify and data collection frequency increases. Such data are mostly made available on the cloud, and cloud-native data access and processing is ramping up as a modern digital competence, bringing computation close to the data to increase efficiency and reduce research time. Although the related technologies have been available for a long time, the highly inefficient traditional approach of downloading data and exploring it locally unfortunately remains the standard practice for most researchers in the NES domain.
This project aims to facilitate the effective use of cloud-native tools and technologies to access and process research data, first by demonstrating to researchers how efficient cloud-native solutions are for common research workflows compared to traditional methods, and then by training them to develop and run such cloud-based workflows. For this purpose, we will use geospatial data as an example, as it is widely used in the NES domain. We will create a cloud-native public geodata repository with co-located data analysis capabilities on the Dutch infrastructure (SURF) and make selected datasets available on the repository that are relevant for the NES domain and currently not available in cloud-native formats (e.g. Dutch Public Services On the Map (PDOK) data). We will benchmark data access and analysis performance against the original formats and traditional methods, and share the results with the NES community to demonstrate and promote the benefits of cloud-native research in an evidence-based manner. To enable quick uptake of the cloud-native approach, we will develop open training material specifically targeting the NES domain and organize training workshops at different locations and prominent events (e.g. SURF Research Day, Open Science Festival), during which researchers can develop their skills through hands-on practice using the cloud-native geodata repository. Finally, we will share the lessons learned during the project with all relevant national and international stakeholders through a workshop.
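The efficiency gain behind cloud-native formats comes from reading only the byte ranges of the tiles a study needs (e.g. one tile of a Cloud-Optimized GeoTIFF via an HTTP Range request) instead of downloading the whole file first. The sketch below illustrates the arithmetic with hypothetical file sizes and tile layout; no specific repository or API from the project is implied:

```python
# Hypothetical layout: a raster stored as a 40 x 40 grid of
# 256 x 256 float32 tiles (real formats record this layout in a header).
TILE_SIZE = 256 * 256 * 4          # one tile, in bytes
TILES_X, TILES_Y = 40, 40
FILE_SIZE = TILE_SIZE * TILES_X * TILES_Y

def tile_byte_range(col, row):
    """Byte range of a single tile, assuming row-major tile storage.
    Inclusive end offset, as in an HTTP Range header."""
    offset = (row * TILES_X + col) * TILE_SIZE
    return offset, offset + TILE_SIZE - 1

# A study area covering only a 3 x 3 block of tiles:
needed = [tile_byte_range(c, r) for r in range(3) for c in range(3)]
bytes_cloud_native = sum(end - start + 1 for start, end in needed)

print(f"full download : {FILE_SIZE / 1e6:.1f} MB")        # 419.4 MB
print(f"range requests: {bytes_cloud_native / 1e6:.1f} MB")  # 2.4 MB
```

Benchmarks like the ones planned in the project compare exactly this: bytes moved and time spent for a typical analysis under traditional download versus range-based cloud-native access.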
Lead applicant: Dr. ing. Serkan Girgin, University of Twente
Listed project partners: University of Twente - Faculty of Geo-information Science and Earth Observation, Netherlands eScience Center, SURF, Geonovum
The exponential growth of data and increasingly sophisticated modelling is driving rising demand for computational resources across the NES domain. Meeting this demand requires high-performance, co-located compute and storage infrastructures (RM 2.5) and cultivating the expertise necessary for effectively (re-)designing scientific workflows for these facilities (RM 2.6). This (re-)design also offers unique opportunities for cross-discipline collaborations, sustainable software development and FAIR best practices (RM 2.1, 2.2, 2.7).
Together with NES researchers we have created the DAT framework (originally inspired by geoscience use cases, but generic in its applicability). DAT is centered on the established Python-Dask-Jupyter ecosystem that is emerging in the NES domain. DAT enables researchers to seamlessly scale their workflows to HPC resources and has been adopted by a number of research groups working on SURF facilities and the DelftBlue HPC system via Delft DCC.
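The computation style that the Python-Dask-Jupyter ecosystem brings to such workflows is, at its core, split-map-reduce over chunks: partition a large dataset, compute partial results in parallel, and combine them. The stdlib sketch below illustrates that pattern only; it is not DAT code, and `concurrent.futures` stands in for what Dask would do with a chunked array reduction on an HPC cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    """Split a flat list into roughly equal chunks."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_stats(chunk):
    """Per-chunk partial result a worker can compute independently."""
    return sum(chunk), len(chunk)

data = list(range(1_000_000))  # stands in for a large dataset

# Map the per-chunk work over a worker pool, then reduce the partials.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_stats, chunked(data, 8)))

total, count = map(sum, zip(*partials))
mean = total / count
print(mean)  # 499999.5
```

In Dask the same result is a one-liner over a chunked array, and the scheduler distributes the per-chunk tasks across HPC nodes, which is the step DAT packages up for researchers.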
We believe that there is an urgent need for the NES community to be informed and educated about the benefits (and drawbacks) of HPC. We also believe that HPC-DAT, through the TDCC-NES network and collaboration with DCCs, can provide a low-barrier entry point. Here we propose a two-pronged approach to address the roadmap challenges highlighted above:
A. Upskilling, knowledge sharing and expertise development
(A.1) Upskilling and expertise development (RM 2.6)
We will develop educational/training materials focusing on the use of the HPC-DAT framework, and host institutional workshops and discipline-overarching hackathons.
(A.2) Cross-discipline knowledge sharing (RM 2.7, 2.3)
We will bring together the emerging Dask communities in the NES domain and promote cross-discipline knowledge sharing to accelerate collaboration opportunities.
(A.3) User forum
We will establish a community user forum, formed around NES researchers who have already expressed their interest, to provide guidance and immediate feedback.
B. Refinement and expansion of the HPC-DAT framework.
(B.1) Workflow support (RM 2.7, 2.2, 2.5)
We will support more complex processing chains in HPC-DAT as well as add template workflows for new users to build on.
(B.2) Collaboration and efficiency (RM 2.5, 2.7)
We will improve the efficiency, accessibility, and ease-of-use of the framework, also integrating additional data management and authentication services.
(B.3) FAIRification (RM 2.1, 2.7)
We will add more support for FAIRification. Specifically, we will integrate data version control and promote best practices for finding and sharing workflow graphs to improve reproducibility.
Lead applicant: Dr. Raymond Oonk, SURF
Listed project partners: NLeSC, TU Delft, UTwente/ITC, UvA, Deltares, SRON, UU, Nikhef, ASTRON, Leiden University