LSH FAIR Fellow: Frans van der Kloet

Bio

After several jobs in commercial companies, I obtained my PhD in 2014 at Leiden University on quantitative aspects of high-resolution mass spectrometry data in metabolomics. In that same year I started as a postdoc in the BDA group, working on multi-block (multi-view) methods such as JIVE, DISCO and OnPLS and on incorporating these methods in the prediction and classification of in vivo transcriptome data. After another postdoctoral position at the Amsterdam Medical Center, I started as a data scientist in the BDA group in 2019. In that capacity my work ranges from writing software (mostly Python-based), with or without a data analysis component, to administering a local Galaxy server. I also support researchers in organizing their data and have a particular interest in storing metadata. To that end I’ve been responsible for the development of Metatree, a database that stores only the metadata, using ontologies where possible.

Use Case Title

Centralized metadata storage

Use Case Description

Too often, valuable annotated data is locked inside individual projects: limited in scope, hard to reuse, and burdened by cumbersome, IT‑centric annotation tools. I want to change that by focusing on rich, well-structured metadata rather than the raw data itself. Using a flexible ontology that captures common measurement concepts and can be easily extended with new subclasses, we can make datasets truly findable and, crucially, reusable for future statistical analyses. By storing this metadata in a central hub (for example, at SURF), institutes can discover each other’s measurements without ever exposing the underlying data files—only references to them. Simply knowing which samples have already been measured can spark new collaborations, prevent duplicate work, and open the door to entirely new research questions.

What are the biggest challenges you anticipate facing in your use case over the next months?

Making the metadata model flexible enough to cover many types of data while still being able to build a user-friendly upload tool for it.

What specific skills or knowledge do you hope to gain through the fellowship programme?

Applying large language models to my use case and learning about knowledge graphs.

What motivated you to apply for this TDCC LSH fellowship?

I expect that most of us are dealing with the same kinds of questions. I wanted to learn how others deal with them (or not) and to reuse what already exists wherever possible.

In one compelling sentence, why does your project matter?

Storing only the metadata, provided it is properly organized, is enough to spark new collaborations.

Link to Frans’s ORCID: https://orcid.org/0000-0002-8573-2651