LSH FAIR fellow: Kristoffer Basse

Bio

Dr. Kristoffer Basse is a research technician working primarily with mass spectrometry applications, operating instruments and providing bioinformatics support for various projects. He holds a PhD in nuclear magnetic resonance spectroscopy, which focussed on pulse sequence development, and for almost a decade he has maintained and used a wide range of analytical instrumentation and provided support for data analysis.

His current work involves every aspect of the proteomics pipeline from sample collection/preparation to mass spectrometry data acquisition and the associated data analysis. He is further in the process of setting up a mass spectrometry imaging capability in the lab to enable, e.g., drug and lipid distribution studies.

As a computer and bioinformatics expert, Dr. Basse has experience with a wide range of IT systems and bioinformatics tools and is involved in the development of several data analysis programs. He has experience in multiple programming and scripting languages, such as Python, Tcl, C++, Rust, and R, and acts as a data custodian for all the instruments under his purview.

Use Case Title

Development and implementation of PASTAQ for FAIR-oriented mass spectrometry signal quantification.

Use Case Description

The purpose of the use case is to continue the development of the PASTAQ software package to provide data pre-processing and signal quantification for proteomics and metabolomics mass spectrometry data in a wide range of data formats, and provide output in an open and interoperable format.

This will involve building data structures compatible with a wide range of available parameter sets, and writing analysis routines capable of handling both data obtained using data-dependent acquisition and data-independent acquisition, the latter of which produces much more complex output but in return removes the inherent bias involved in selecting precursors for fragmentation.

PASTAQ also needs to be able to extract peak information from data stored in both profile and centroided mode. These differ in the amount of stored detail, with profile mode providing full characterisation of peaks in the mass-to-charge (m/z) dimension, while centroided mode only stores information about the peak location and intensity, but not the width of the peak in the m/z dimension.

Analysis of centroided data may require that the data be transformed into pseudo-profile data through Gaussian splatting, using theoretical standard deviations for the peak width, which can be obtained from the resolution of the mass spectrometer. Apart from the development of PASTAQ itself, a downstream goal would be to facilitate the incorporation of PASTAQ into data analysis pipelines using workflow management systems like Nextflow to control the flow of data between pipeline components.
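
The Gaussian splatting step described above can be illustrated with a minimal Python sketch. It is not PASTAQ's implementation: the function names, the 4-sigma truncation, and the m/z grid are hypothetical choices made for illustration. It also assumes that resolution is defined as m/z divided by the full width at half maximum (FWHM) and is constant over the range shown; in practice the dependence of resolution on m/z differs between analyser types.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def theoretical_sigma(mz, resolution):
    """Peak standard deviation in m/z, assuming resolution = mz / FWHM."""
    return (mz / resolution) * FWHM_TO_SIGMA


def splat_centroids(centroids, grid, resolution):
    """Sum one Gaussian per (mz, intensity) centroid onto an m/z grid."""
    profile = [0.0] * len(grid)
    for mz, intensity in centroids:
        sigma = theoretical_sigma(mz, resolution)
        # Truncate each Gaussian at 4 sigma (an illustrative cut-off).
        lo, hi = mz - 4.0 * sigma, mz + 4.0 * sigma
        for i, x in enumerate(grid):
            if lo <= x <= hi:
                profile[i] += intensity * math.exp(-0.5 * ((x - mz) / sigma) ** 2)
    return profile


# Hypothetical example: one centroid at m/z 500 with intensity 1000,
# splatted onto a grid from 499.9 to 500.1 in steps of 0.0005.
grid = [499.9 + 0.0005 * i for i in range(401)]
centroids = [(500.0, 1000.0)]
profile = splat_centroids(centroids, grid, resolution=50_000)
```

With a resolution of 50,000 at m/z 500 the FWHM is 0.01, so the pseudo-profile peak falls to half its apex intensity about 0.005 m/z units either side of the centroid. The centroid intensity is used here as the apex height; treating it as a peak area instead would require normalising each Gaussian.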

Matched FAIR Fellowship Coach

Primary coach: Team Systems Genetics. Visit the profile here!

Secondary coach: Team Bio-Imaging and ACDC. Visit the profile here!

What are the biggest challenges you anticipate facing in your use case over the next months?

The biggest challenge is likely going to be the implementation of subroutines to analyse data-independent acquisition data. This type of data is much more complex than data-dependent acquisition data, and handling it will require research into efficient data structures, such as CPU/GPU cache-friendly block structures, as well as into peak deconvolution and spectral linking.

What specific skills or knowledge do you hope to gain through the fellowship programme?

The further development of PASTAQ is, of course, the primary task, and it speaks to the interoperability part of the FAIR principles. Beyond that, I hope to learn something more fundamental from the fellowship: how best to inspire and encourage people to adopt a FAIR mindset in their daily work with both data generation/management and bioinformatics development, in order to better promote open and reproducible science.

Making data FAIR-compliant is no small task, and it requires a significant time investment from each individual researcher, with very little immediately obvious benefit to themselves. Learning how best to communicate the advantages of FAIR-compliant data will make it easier to convince researchers to make that time investment.

What motivated you to apply for this TDCC LSH fellowship?

I was presented with the opportunity to join the fellowship due to my existing role as unofficial data custodian. As I already have an interest in open science and scientific transparency, it seemed like a good fit.

As data custodian, I see the broad range of approaches to data annotation taken by researchers in my lab, and while I am sure each individual researcher has a system to allow themselves to find their data and metadata later, not all systems are equally good at making the data findable and accessible for other researchers to reuse.

I am convinced that the workshops involved in the fellowship will help me set up guidelines for data annotation to improve the reusability of in-house generated data. The fact that the fellowship also gives me sufficient room to work on the further development of our analysis tools makes the opportunity all the more attractive to me.

In one compelling sentence, why does your project matter?

Mass spectrometry can be used to investigate the molecular mechanisms behind a wide range of healthy ageing processes and diseases, but artificial barriers and signal sensitivity issues hinder progress; hence a comprehensive tool is needed to circumvent those barriers and provide accurate quantification even at the edge of instrumental capabilities.

Want to connect with Kristoffer? Use LinkedIn or view the research profile on ORCID.