Open Research Data Projects

Projects funded in the framework of the ORD Program

The joint ORD program of ETH Zurich, EPFL and the four research institutes of the ETH Domain has financially supported more than 60 research projects in the period 2020–2023. Funding supports researchers engaging in, or developing, ORD practices with and for their community and assists these researchers in becoming Open Research Data leaders in their field.

This page provides an overview of these projects. It highlights how researchers in the ETH Domain are currently applying ORD in exemplary ways. Some of the projects have already been completed; others are still in progress. The projects have been divided into three categories.

“Establish” projects help link existing ORD practices to a research agenda to establish them on a broader basis. They contribute to a shared and comprehensive understanding of ORD practices that can then become de facto standards.

“Explore” projects are the most extensive ventures in the program and are designed to explore and test early-stage ORD practices. The goal is to map out what an ORD practice might look like and to develop prototypes. Through these projects, new teams form across disciplines and institutions.

“Contribute” projects help scientists integrate their research data into existing, often international, infrastructures. By standardizing the processes and making them generally accessible, the data are validated, and their potential is considerably expanded.

A standardized database framework for synthetic carbon-based solar fuels

Category

Contribute

Institutions

EPFL

Data type

Solar Fuels

Field

Renewable Energy Science and Engineering

Researchers

Isaac Holmes-Gentle

Abstract

This proposal aims to significantly expand the Solar Fuels Database, which introduced a machine-readable framework for solar-to-fuel devices and an online data entry and visualization interface. The project's objective is to broaden the database to include additional solar fuels and technological pathways not covered previously. It will encompass various fuels such as carbon monoxide, syngas, formic acid, methane, ethanol, and extend to thermochemical redox cycles. Collaboration with global research communities will ensure comprehensive metadata capture. Standardized data reporting will enhance findability and dissemination of results. The project aligns with open-science objectives and encourages community contributions, resulting in a continually updated and openly accessible resource. This resource will consolidate prior work and offer a comprehensive overview to drive future research trends in advancing solar fuels for widespread implementation.

‘XYT’, a Python package to analyze activity-travel behaviors, organization and scheduling

Category

Contribute

Institutions

EPFL

Data type

Digital data from urban dynamics

Field

Urban dynamics

Researchers

Marc-Edouard Schultheiss

Abstract

Today, urban dynamics generate more and more digital data. Yet the generated data is extensive and heterogeneous. Datasets are large, multi-sourced, often noisy, and come in various formats and standards. In addition, urban data typically mixes different levels of access restriction: geolocation data is private and sensitive, whereas public transit schedules are open data. In this context, there is a need (i) to provide a framework that unifies the many formats of urban dynamics data, (ii) to articulate open and restricted data, (iii) to reconcile supply and demand data, and (iv) to keep track of a privacy metric. This project proposes to develop and release an open Python package to address these four needs and thereby contribute to Urban Mobility Open Research Data practices.

MiShMASh: Microbiome sequence and metadata availability standards

Category

Contribute

Institutions

ETH Zurich

Data type

Microbiome data

Field

Microbiome research

Researchers

Lina Kim

Abstract

Microbiome research relies on vast datasets, demanding unrestricted data access and consistent metadata. This project addresses two core issues: ineffective sequence data statements and inconsistent metadata standards. It proposes a dual solution: a tier-based FAIR ORD standard and compliance assessment software. The project contributes open resources for diverse users, including researchers, journals, and funders. The validation software allows users to evaluate adherence to data and metadata standards, enhancing data reporting for better accessibility, interoperability, and future reusability.

Enabling compliance with ORD standards for cutting-edge time-resolved experiments at high data-rates

Category

Contribute

Institutions

PSI

Data type

X-rays

Field

X-ray science

Researchers

Filip Leonarski

Abstract

The upcoming Swiss Light Source 2.0 machine upgrade and the advent of free electron lasers (SwissFEL) enable novel advancements in X-ray science. One emerging technique is time-resolved serial crystallography, providing insight into biomolecular processes at micro- and millisecond timescales but generating extensive data. A single experiment can produce a continuous stream of X-ray images at 2,000 images per second (17 GB/s), leading to terabytes of data. Managing such large datasets in public repositories is challenging. This project aims to enhance data accessibility by creating reduced datasets. Existing protein diffraction labeling algorithms will filter images to include only high-quality diffraction images, approximately 0.1-10% of the total. These reduced datasets will be available in the PSI public data repository, alongside the complete dataset, improving data interoperability and findability by adding labeling results to the metadata.
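The figures quoted in this abstract can be turned into a quick back-of-the-envelope estimate. The sketch below uses only the stated rate (2,000 images per second at 17 GB/s) and the stated retention range (0.1-10%). It assumes 1 GB = 10^9 bytes, and the 60-second acquisition is a hypothetical duration chosen for illustration, not a value from the project. Under these assumptions, each image is about 8.5 MB.

```python
# Rough data-volume estimate from the figures quoted in the abstract.
# Assumption: 1 GB = 10**9 bytes (decimal units).
IMAGES_PER_SECOND = 2_000
BYTES_PER_SECOND = 17e9  # 17 GB/s


def bytes_per_image() -> float:
    """Average size of one image implied by the quoted rates."""
    return BYTES_PER_SECOND / IMAGES_PER_SECOND


def raw_volume_tb(duration_s: float) -> float:
    """Raw data volume in terabytes for a continuous acquisition."""
    return BYTES_PER_SECOND * duration_s / 1e12


def reduced_volume_tb(duration_s: float, kept_fraction: float) -> float:
    """Volume in terabytes after keeping only a fraction of the images."""
    if not 0.0 <= kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be between 0 and 1")
    return raw_volume_tb(duration_s) * kept_fraction


if __name__ == "__main__":
    duration_s = 60  # hypothetical acquisition length, for illustration only
    print(f"Per image: {bytes_per_image() / 1e6:.1f} MB")
    print(f"Raw, {duration_s} s: {raw_volume_tb(duration_s):.2f} TB")
    for fraction in (0.001, 0.10):  # the 0.1% and 10% bounds from the abstract
        reduced = reduced_volume_tb(duration_s, fraction)
        print(f"Kept {fraction:.1%}: {reduced * 1000:.2f} GB")
```

Even at the upper bound of 10% retained images, the reduced dataset is an order of magnitude smaller than the raw stream, which is what makes it practical to host in a public repository.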

Pycsou FAIR: A Community Marketplace for Discovering and Sharing Image Reconstruction Plugins

Category

Contribute

Institutions

EPFL

Data type

Image reconstruction plugins

Field

Computational imaging

Researchers

Matthieu Simeoni

Abstract

Pycsou is an open-source Python computational imaging software framework. It natively supports hardware acceleration and distributed computing. Its microservice architecture ensures optimized and scalable computational imaging tools that are easy to share across modalities. The framework's domain-agnosticity keeps it lightweight, accessible, and portable. However, some imaging communities face challenges due to this generic nature. This project introduces Pycsou FAIR, a web platform, meta-programming framework, and interoperability protocol. It aims to enhance the discovery, development, and sharing of FAIR-compliant image reconstruction plugins at scale. Imaging scientists can readily integrate modern computational imaging methods into their processing pipelines.

Establishing Structures for an Efficient Management of Materials and Processing Data

Category

Contribute

Institutions

Empa

Data type

Coatings and thin films

Field

Coating Technologies (CT)

Researchers

Sebastian Siol

Abstract

Empa's Coating Technologies (CT) group excels in developing coatings and thin films for diverse applications. Our research focuses on advancing thin films through combinatorial techniques, yielding extensive datasets. However, many published datasets lack precise metadata. This project aims to automate metadata creation by developing software that interfaces with OpenBIS to record crucial parameters from deposition and measurement tools. Raw data will be stored on LAN servers with standardized file structures. Data analysis software (e.g., CombIgor) will be able to access both metadata and measurements. High-quality datasets in an open format can be used internally and published in open repositories like NOMAD or Zenodo. This solution benefits labs working on thin film deposition, with growing demand from partner organizations.

Standardizing Encoding Elements for Lensless Cameras

Category

Contribute

Institutions

EPFL

Data type

Open-source hardware and software toolkit

Field

Lensless imaging

Researchers

Eric Bezzam

Abstract

This project aims to expand LenslessPiCam, an open-source hardware and software toolkit for lensless imaging. Lensless imaging relies on an optical element, typically a thin mask, that serves as a substitute for a conventional lens. Although various methods for creating these masks exist, there is currently no standardized approach that ensures both design compatibility and reproducibility. The project will incorporate mask-design tools into LenslessPiCam, enhancing its relevance to the broader research community. Additionally, our methodology is influenced by the FAIR principles and utilizes readily available resources. This approach will enable LenslessPiCam to remain an affordable and high-performing toolkit for lensless imaging, catering to educators, hobbyists, and researchers.

Mitigating spaceborne radio frequency interference through satellite database

Category

Contribute

Institutions

ETH Zurich

Data type

Satellite signals

Field

Space Geodesy

Researchers

Matthias Schartner

Abstract

The tremendous growth of satellite mega-constellations like Starlink and OneWeb, emitting radio signals, poses a significant threat to radio astronomy. These satellite signals can lead to radio frequency interference (RFI) or oversaturation of the broad-frequency receivers in sensitive radio telescopes, causing data loss or observation failures. In response, the International Very Long Baseline Interferometry Service for Geodesy and Astronomy (IVS) community and the International Astronomical Union (IAU) have initiated working groups dedicated to measuring and mitigating satellite RFI. A promising strategy involves avoiding observations near potentially disruptive satellites. In this initiative, the plan is to contribute by establishing an open database containing satellite orbits and the associated frequency spectra emitted by satellites. This database will amalgamate existing orbit data with available frequency information and measurements obtained from observatories. It will be made openly accessible through a web interface and an application programming interface (API), ensuring easy integration into modern software workflows.
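The avoidance strategy mentioned above can be illustrated with a simple geometric check: given a telescope pointing direction and a list of predicted satellite positions in the sky, flag the observation if any satellite lies within a chosen angular distance. This is only a minimal sketch and not the project's method. The function names, the azimuth/elevation input format, and the 5-degree threshold are all hypothetical, and a real scheduler would obtain satellite positions from orbit data rather than hard-coded values.

```python
import math


def angular_separation_deg(az1: float, el1: float, az2: float, el2: float) -> float:
    """Angle in degrees between two sky directions given as azimuth/elevation in degrees."""
    az1, el1, az2, el2 = map(math.radians, (az1, el1, az2, el2))
    # Spherical law of cosines for two points on the unit sphere.
    cos_sep = (math.sin(el1) * math.sin(el2)
               + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_sep = max(-1.0, min(1.0, cos_sep))
    return math.degrees(math.acos(cos_sep))


def is_too_close(target, satellites, min_separation_deg=5.0) -> bool:
    """Return True if any satellite is within min_separation_deg of the target.

    target: (azimuth_deg, elevation_deg) of the planned observation.
    satellites: iterable of (azimuth_deg, elevation_deg) satellite positions.
    min_separation_deg: hypothetical avoidance radius, for illustration only.
    """
    return any(
        angular_separation_deg(target[0], target[1], az, el) < min_separation_deg
        for az, el in satellites
    )


if __name__ == "__main__":
    target = (120.0, 45.0)                         # hypothetical pointing direction
    satellites = [(121.0, 45.0), (300.0, 20.0)]    # hypothetical satellite positions
    print(is_too_close(target, satellites))
```

An open database with an API, as proposed in the abstract, would supply the satellite positions and emitted frequency ranges that such a check needs as input.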

Traceable thermodynamic datasets for chemical modelling

Category

Contribute

Institutions

PSI

Data type

Thermodynamic datasets

Field

Chemical thermodynamic modeling

Researchers

George-Dan Miron

Abstract

Currently, thermodynamic datasets lack adherence to ORD FAIR principles. The ThermoHub database offers access to meticulously curated and expert-documented thermodynamic datasets in an open-standard JSON format. This project's goal is to optimize ThermoHub and showcase its ORD capabilities by creating a comprehensive, user-ready database from various widely used thermodynamic datasets for chemical modeling. The project also strives to design and provide a documented semi-automated workflow for future expansion and maintenance. This endeavor will standardize and harmonize the chemical thermodynamic modeling workflow, enhancing data quality, reliability, and traceability. Offering FAIR-compliant datasets will simplify modeling, eliminating the need for researchers to manually collect thermodynamic values from extensive literature or develop complex scripts for data import. ThermoHub will enhance collaboration across Swiss, European, and international projects by providing traceable thermodynamic data for diverse modeling applications. Additionally, it will support ongoing work at PSI/LES, EPMA, and ETHZ on thermodynamic database and modeling code development.

Building Open-Source Tools for reproducible interaction with biological ORD databases

Category

Contribute

Institutions

ETH Zurich

Data type

NCBI, EBI, MGnify

Field

Life sciences

Researchers

Nicholas Bokulich

Abstract

The life sciences increasingly rely on centralized online databases for sharing biological information (e.g., NCBI, EBI, MGnify). These databases enable the exchange of primary and secondary biological datasets for downstream re-use. However, technical challenges for depositing and accessing open research data (ORD) and issues with traceability and reproducibility hinder scientific progress and adoption of ORD/FAIR practices in the life sciences. This project aims to develop software tools to facilitate remote, programmatic, and fully FAIR interactions with prominent ORD resources in the biological sciences. These tools will remove existing barriers, promote ORD sharing and re-use, and foster community engagement in ORD practices. While this project focuses on microbiome research, its multidisciplinary nature ensures its relevance across various research domains, aligning with the Contribute program's objectives of contributing software tools for established ORD databases.
