Open Research Data Projects
Projects funded in the framework of the ORD Program
The joint ORD program of ETH Zurich, EPFL and the four research institutes of the ETH Domain has financially supported more than 60 research projects in the period 2020–2023. Funding supports researchers engaging in, or developing, ORD practices with and for their community and assists these researchers in becoming Open Research Data leaders in their field.
This page provides an overview of these projects. It highlights how researchers in the ETH Domain are currently applying ORD in exemplary ways. Some of the projects have already been completed; others are still in progress. The projects have been divided into three categories.
“Establish” projects help link existing ORD practices to a research agenda to establish them on a broader basis. They contribute to a shared and comprehensive understanding of ORD practices that can then become de facto standards.
“Explore” projects are the most extensive ventures in the program and are designed to explore and test early-stage ORD practices. The goal is to map out what an ORD practice might look like and to develop prototypes. Through these projects, new teams form across disciplines and institutions.
“Contribute” projects help scientists integrate their research data into existing, often international, infrastructures. By standardizing the processes and making them generally accessible, the data are validated, and their potential is considerably expanded.
Category
Institutions
Data type
Field
Researchers
Abstract
Catalysts play a major role in the production of many chemicals that are used every day. Proper characterization makes it possible to understand their properties and therefore enables the rational design of better-performing catalysts. Large amounts of open-access catalyst characterization data will thus enable more efficient data-driven and machine-learning approaches to understanding catalysts at a higher level and further speed up the design of better-performing catalysts. However, although a wide variety of catalyst characterization data and open-access repositories are available, one missing link is a proper, standardized data processing step that complies with the FAIR principles prior to upload to open-access repositories.
At Swiss Cat+, an ETH Domain technology platform, large amounts of catalyst characterization data are generated with the help of automated high-throughput technologies. The aim of the proposal is therefore to implement additional components within the ETHZ Swiss Cat+ ORD Roadmap that provide streamlined and standardized processing and visualization tools for catalysis characterization data within and beyond Swiss Cat+. This implementation will fully unlock the potential of using catalysis characterization data that are generated globally.
Category
Institutions
Data type
Field
Researchers
Abstract
Life science generates vast amounts of next-generation sequencing (NGS) data, and there are well-established FAIR repositories for open access to these data. Still, depositing NGS data in these repositories poses challenges for life science researchers, which leads to data not being deposited and shared.
We propose to implement a web service that simplifies data deposition for life science. Our service will start from data and meta-information available in omics data management systems like B-Fabric. It will let the user review and curate the information to be uploaded and will then perform the upload. Our web service will directly help research groups make their data swiftly accessible in a well-defined and well-documented format in recognized repositories with worldwide visibility and accessibility. The service will significantly reduce the effort of making NGS data openly accessible, increase the quality of the openly accessible data, and make the originators of NGS data, the various research institutes in Switzerland, more visible.
Category
Institutions
Data type
Field
Researchers
Abstract
Wearable sensors offer vast potential for advancing healthcare through data-driven insights, but their integration into clinical trials and practice is hindered by a lack of interoperability. This project proposes the development of a standardized low-level communication protocol for wearable sensors to facilitate harmonized data collection across different platforms. By establishing a common standard for raw data transmission, the project aims to enable seamless aggregation of sensor data and foster collaboration among healthcare, research, and industry stakeholders. Through community-driven requirements gathering and standards development, the project seeks to address the current challenges in integrating wearable sensor data into healthcare practices. Inspired by successful standards in other domains, this initiative aims to catalyze a more interconnected digital health ecosystem where wearables play a pivotal role in personalized healthcare practices.
Category
Institutions
Data type
Field
Researchers
Abstract
The "Enhancing Global WASH Data Accessibility through Collaborative Initiatives" proposal, a joint effort by WASHWeb and openwashdata, aims to improve the global Water, Sanitation, and Hygiene (WASH) data ecosystem. The partnership, formed under shared goals for better data accessibility, usability, and representation, proposes a project divided into three work packages: Maintain, Extend, and Disseminate. The first package updates the Joint Monitoring Programme (JMP) dataset for broader use. The second aims to collaborate with data providers to create a new R dataset package, enhancing analyses of WASH investments and outcomes. The final package seeks to promote these open data packages through webinars, conference sessions, and online discussion groups, fostering a community around open WASH data.
Category
Institutions
Data type
Field
Researchers
Abstract
The OPEN-3D project revolutionizes urban mobility analytics by developing and enhancing four high-resolution traffic datasets derived from cutting-edge drone technology. This initiative, aligning with the ETH-Domain ORD Program, introduces innovative datasets and revitalizes existing ones with advanced tools, redefining traffic research practices. OPEN-3D meticulously constructs two new datasets and enriches two existing ones, all adhering to FAIR principles, supporting pivotal advancements across AI, computer vision (CV), and traffic management.
Imagine a central hub designed to facilitate global data exchange and interoperability. OPEN-3D turns this vision into reality, supporting a broad network of researchers and practitioners in traffic science, AI, and CV. This project offers unprecedented insights into urban and semi-urban traffic dynamics, enabling detailed analyses of vehicle trajectories and interactions, potentially transforming urban traffic management.
OPEN-3D not only contributes to traffic studies but also inspires global urban mobility innovation. By embracing open science principles, each dataset is accessible and poised to spur further research, setting new standards in data quality and transparency. This comprehensive approach exemplifies the potential of collaborative, data-driven innovation to enhance urban safety and efficiency, fostering sustainable urban development globally and catalyzing progress in traffic management technologies.
Category
Institutions
Data type
Field
Researchers
Abstract
The Groupe ACM (Gr-ACM) is a research group responsible for the collection of architecture heritage archives known as the Archives de la construction moderne (ACM). Digital content in architecture heritage archives is increasing and will continue to increase, both through donations of new born-digital archives and through digitization campaigns of paper archives. Through the CA ORD Project, the Gr-ACM aims to improve its capacity to preserve data and metadata from digital architecture archives (both born-digital and digitized from paper) and to make them available in line with the FAIR principles. To date, the Gr-ACM has collected 4 TB of digital data consisting of files in different formats (.dwg, .dxf, .pln, .jpg, .pdf, etc.) stored on external hard drives and CD-Rs, which are therefore unavailable for research.
The CA ORD Project aims to fill this gap. It involves installing, configuring, and running the open-source software Archivematica, an archiving system based on a multi-service architecture that allows for automation, extraction, and normalization of data and metadata. These outputs (METS files) will be made available on Morphé, the ACM’s existing AtoM-based ORD portal, since Archivematica and AtoM are interoperable.
Thanks to readily exploitable data, the CA ORD Project will enlarge the ACM user community (currently limited to historians) to include new researchers from the fields of architecture, engineering, and land management.
Category
Institutions
Data type
Field
Researchers
Abstract
Mechanistic ecohydrological models are essential tools to accurately simulate the impacts of climate change on the water, carbon, and nutrient cycles. However, very few models available to the community can holistically simulate such a wide range of processes, and most of them are written in low-level programming languages (e.g., C++ or FORTRAN), hindering model accessibility to new users. In this regard, Tethys and Chloris (T&C), a state-of-the-art ecohydrological model written in MATLAB, offers a strong foundation for creating an accessible community-driven model. TRANSCODE aims to transform T&C into a FAIR (Findable, Accessible, Interoperable, Reusable) model by redesigning its architecture for modularity and re-implementing it in Julia, an open-source language that combines the computational efficiency of low-level programming languages such as FORTRAN with the accessibility of high-level languages such as MATLAB. This translation will improve computational efficiency, foster open code contributions from the community, and facilitate interoperability with other models. Specifically, the project will create a modular, comprehensively tested, highly efficient, and easily accessible version of T&C, termed T&C-Julia. TRANSCODE has the potential to significantly benefit the Earth science community and advance the field of ecohydrological modelling by providing a versatile, state-of-the-art, and open-source modelling platform.
Category
Institutions
Data type
Field
Researchers
Abstract
Imaging science and computational microscopy are rapidly advancing, driven by novel interdisciplinary approaches involving deep learning algorithms. However, the increasing complexity and cost of these cutting-edge imaging systems and algorithms often make them inaccessible for non-experts, low-resource settings, and teaching applications. To address this challenge, we would like to organize a one-day workshop on open-source microscopy and AI to bring together the smart-microscopy community.
The workshop will showcase a full pipeline of open-source solutions for optical imaging, from hardware to computational reconstruction and deep learning-based analysis. It will provide hands-on experience for participants to assemble an open microscope (OpenSIM, OpenUC2), perform reconstructions (Pyxu), and analyze images (DeepImageJ). It aims to empower researchers and teachers to take full control of their imaging pipeline and iterate rapidly on new solutions. Beyond this event, the project seeks to drive a broader and lasting impact on the community. Educational resources such as Jupyter notebooks and hardware kits will be developed and made publicly available to support teaching at EPFL and beyond.
By showcasing a comprehensive open-source ecosystem for microscopy, this initiative aims to make state-of-the-art imaging technologies more accessible and to further catalyze the growth of an open, interdisciplinary microscopy community.
Category
Institutions
Data type
Field
Researchers
Abstract
Chemical pollution has exceeded planetary boundaries, requiring urgent solutions for chemical waste removal. Microbial biodegradation processes are crucial for breaking down chemical contaminants, yet the functions of microbial communities are often challenging to predict. To address this challenge, we aim to contribute a new pipeline, EDGEbp (Enabling Detection of metaGEnomic biodegradation potential), to advance the research capabilities of the ORD biodegradation prediction software, enviPathPlus. Specifically, in EDGEbp, we will build a Hidden Markov Model-based pipeline to identify biodegradation genes and pathways from total microbial community DNA (metagenomic) sequencing data. EDGEbp will output confidence scores to infer the ‘contaminant biodegradation potential’ of a given microbial community based on sequencing information. In other words, our aim is to build a tool that converts unintelligible DNA sequences into easily understood biodegradation confidence scores. This will help us infer the capabilities of a specific microbiome to transform chemical contaminants. This project will therefore advance the sustainable development goal of improving water quality by reducing chemical pollution through microbial biodegradation. Overall, we anticipate that EDGEbp will expand the cutting-edge functionalities of the ORD tool enviPathPlus to support its long-term preservation and promote community engagement in line with ORD principles.
Category
Institutions
Data type
Field
Researchers
Abstract
Mobile ground robots have become increasingly popular in academia and various industrial applications. However, unlike in other domains such as aerial robotics, autonomous driving, and construction, there is currently no high-quality, large-scale dataset or reliable benchmark established in this field, nor is the tooling available to create one. Creating such a dataset would be immensely valuable for researchers and developers, fostering research on robust and practical algorithms across diverse environments. Moreover, the development of a standardized benchmarking platform would promote fair comparisons between different approaches, fostering innovation and facilitating the rapid progress of mobile ground robot research. Motivated by this, we propose to collect and share a high-quality, versatile, large-scale robotic dataset, “GrandTour”, focusing on legged robots, along with a set of benchmarks and scalable, automated tooling.