Introducing CHIMAERA: Bridging Cultural Heritage Data Through Interoperable Dataset Descriptions 

This post introduces the CHIMAERA project, an Interoperability Mission initiated by TDCC-SSH. The project works on exchanging essential dataset information between documentation formats.

Imagine that you are working on a research project about childhood through the ages, and so you are looking for images of children from different periods of history. You come across a dataset called ‘All project images’. The description tells you that the dataset contains thousands of archive images that were compiled as part of a project that ended in 2021. You are probably left with many questions: 

  • Do these images feature children? 
  • Which historical period is covered? 
  • Are you allowed to use the images for your purpose? 
  • Are they of good enough quality? 
  • Are there any sensitive images – for example images of persons still living, images taken without the person’s consent, images of an enslaved child? 
  • ...and many, many more 

Do you invest time and effort in trying to find out the answers for a dataset that may turn out to be unsuitable for your needs, or do you keep searching and perhaps miss out on a treasure trove of relevant images? 

This is a problem faced by dataset users across domains, but it is particularly challenging for the cultural heritage sector, where datasets are often complex and heterogeneous with convoluted histories, gaps, and biases. Unsurprisingly, many initiatives have sprung up to encourage better dataset documentation. There are several initiatives rather than one because the information users need varies greatly by domain and by type of dataset. Even within cultural heritage, a linguistic recordings dataset, a library web archive and an archival image bank will have quite different metadata and characteristics. This brings us to our next problem.

Imagine you are creating documentation for a dataset of archival images. You should, of course, use the documentation format your institute has chosen. But you know that machine learning users are also interested, and they use a different format. You would also like to upload the dataset to Zenodo and the NDE dataset register, each of which has its own metadata requirements. Your work has just multiplied.

At the start of March 2026, we kicked off the CHIMAERA project (Cultural Heritage Interoperability Mappings Across Existing Resource Architectures). In this 12-month project, we aim to make different forms of dataset documentation interoperable by working on a common model that supports conversions. The project covers three formats used in the Netherlands: Datasheets for Digital Cultural Heritage, Data-Envelopes and DCAT-3 (used for the NDE dataset register, among others). The project will deliver conversions between all three formats. The intention is that the approach can later be extended to more formats.
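To give a feel for what conversion through a common model can look like, here is a minimal sketch in Python. It is not the CHIMAERA model or code: the CommonDescription class, the convert function, the format labels and all field names are hypothetical placeholders chosen for illustration, and only two of the three formats are shown. Each format is mapped to and from one shared representation, so any conversion becomes "source to common, common to target".

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CommonDescription:
    """Hypothetical shared model; real dataset descriptions carry far more fields."""
    title: str
    description: str


# Illustrative mappings only: the keys below are assumptions, not the
# actual fields of any of the formats named in this post.
def datasheet_to_common(doc: dict) -> CommonDescription:
    return CommonDescription(title=doc["name"], description=doc["summary"])


def common_to_datasheet(common: CommonDescription) -> dict:
    return {"name": common.title, "summary": common.description}


def dcat_to_common(doc: dict) -> CommonDescription:
    return CommonDescription(title=doc["dct:title"], description=doc["dct:description"])


def common_to_dcat(common: CommonDescription) -> dict:
    return {"dct:title": common.title, "dct:description": common.description}


# One pair of mappings per format, registered under a format label.
TO_COMMON: Dict[str, Callable[[dict], CommonDescription]] = {
    "datasheet": datasheet_to_common,
    "dcat": dcat_to_common,
}
FROM_COMMON: Dict[str, Callable[[CommonDescription], dict]] = {
    "datasheet": common_to_datasheet,
    "dcat": common_to_dcat,
}


def convert(doc: dict, source: str, target: str) -> dict:
    """Convert a description by passing it through the common model."""
    return FROM_COMMON[target](TO_COMMON[source](doc))


datasheet = {"name": "All project images", "summary": "Archive images compiled by a project."}
as_dcat = convert(datasheet, source="datasheet", target="dcat")
```

In a design like this, supporting an extra format means writing one mapping to the common model and one back, rather than a separate converter to and from every format already supported. With three existing formats, that is two new mappings instead of six.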

The project is funded by TDCC-SSH as part of its Interoperability Missions, with a core project team from the Huygens Institute (KNAW) and TDCC-SSH. We will collaborate with the KB National Library of the Netherlands, NDE, Europeana and SSHOC-NL on development, and with a wider network across academia and the GLAM sector to gather requirements and feedback.

Together, we work towards saving dataset providers and users time and energy, so that they can concentrate on creating and using valuable datasets that are supplied together with the information they need.