Open Research Data Projects
Projects funded in the framework of the ORD Program
The joint ORD program of ETH Zurich, EPFL and the four research institutes of the ETH Domain has financially supported more than 60 research projects in the period 2020–2023. Funding supports researchers engaging in, or developing, ORD practices with and for their community and assists these researchers in becoming Open Research Data leaders in their field.
This page provides an overview of these projects. It highlights how researchers in the ETH Domain are currently applying ORD in exemplary ways. Some of the projects have already been completed, others are still in progress. The projects have been divided into three categories.
“Establish” projects help link existing ORD practices to a research agenda to establish them on a broader basis. They contribute to a shared and comprehensive understanding of ORD practices that can then become de facto standards.
“Explore” projects are the most extensive ventures in the program and are designed to explore and test early-stage ORD practices. The goal is to map out what an ORD practice might look like and to develop prototypes. Through these projects, new teams form across disciplines and institutions.
“Contribute” projects help scientists integrate their research data into existing, often international, infrastructures. Standardizing the processes and making them generally accessible validates the data and considerably expands their potential.
Stone masonry is an eco-friendly construction material, but its use has declined because of its vulnerability to earthquakes, which stems mainly from the poor arrangement of its microstructure. The microstructure includes the shape, size, and arrangement of stone units, which vary based on geographic, temporal, and material factors. Current building codes cannot fully account for this variability, and experimental studies are costly and impractical due to the diversity of masonry typologies. Numerical studies offer a solution, but creating realistic microstructures for modeling irregular stone masonry is complex and time-consuming. As a result, simulations often use simplified microstructures, which fail to capture the complexities of irregular masonry walls. To address this challenge, we have developed a database of 3D masonry microstructures that is ready for use in numerical simulations. To enhance accessibility and usability, this project aims to create a web-based platform hosting this curated database of 3D microstructures and their geometric indices. The platform will also feature a tool for evaluating masonry quality from 2D images using the Masonry Quality Index (MQI), promoting the preservation of historic structures and sustainable construction practices. Additionally, the platform will enable researchers to contribute and document new 3D microstructures, fostering collaboration and advancing numerical research on stone masonry.
In order to advance our understanding of the carbon cycle, it is essential to evaluate the spatiotemporal variations of carbon between river and marine environments and to gain insights into the pathways of carbon transfer from land to ocean. To do this, we need to work jointly with riverine and marine data, accounting for their temporal and spatial distribution. However, each of these systems has different data and metadata reporting strategies that need to be accounted for, which complicates their joint application. Efforts have been made to compile data from each of these systems into independent databases, but no attempt has yet been made to create a joint database for both systems that accounts for their different metadata. Hence, this project aims to bring riverine and marine data together in one database, the River to Ocean Geodatabase for Education and Research (ROGER), so that data from both systems can be queried easily. The database will be displayed in an interactive web interface that queries riverine and/or marine data, depending on the user’s requirements, through a REST API. Harnessing the advanced geographical functions of PostgreSQL, the REST API will include functions that allow users to geospatially integrate riverine and marine data. This new database will provide a crucial step forward in understanding the carbon cycle along the land-ocean continuum, while ensuring that the data comply with best Open Research Data practices.
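The kind of geospatial integration the REST API is meant to offer can be sketched in a few lines. The Python below is a minimal, hypothetical illustration and not ROGER's implementation: it pairs each riverine sample with the nearest marine sample within a search radius, using the haversine great-circle distance. The `Sample` fields, the dissolved organic carbon variable, and the pairing rule are assumptions; in a PostgreSQL/PostGIS backend, a query built on a function such as `ST_DWithin` would presumably do this work inside the database instead.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Sample:
    site_id: str
    system: str  # "river" or "marine"
    lat: float
    lon: float
    doc_umol_per_l: float  # hypothetical variable: dissolved organic carbon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pair_river_to_marine(samples, max_km):
    """Map each river site to its nearest marine site within max_km (None if none is close enough)."""
    rivers = [s for s in samples if s.system == "river"]
    marine = [s for s in samples if s.system == "marine"]
    pairs = {}
    for r in rivers:
        best, best_d = None, max_km
        for m in marine:
            d = haversine_km(r.lat, r.lon, m.lat, m.lon)
            if d <= best_d:
                best, best_d = m, d
        pairs[r.site_id] = best.site_id if best else None
    return pairs
```

With a river site at (45.0, -1.0) and marine sites at (45.0, -1.2) and (40.0, -5.0), a 50 km radius pairs the river site with the first marine site, which lies roughly 16 km away; a 5 km radius leaves it unpaired.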
Chronic cough is a common condition globally. While efforts are being made to develop wearables that detect and quantify cough events automatically, such monitoring devices have not yet been incorporated into routine clinical practice because their validation lacks consistency, resulting in slow progress and a lack of trust in reported results. We have identified three main reasons for this heterogeneity: 1) the clinical definition of different cough events, and especially the delimitation of their beginning and end, lacks standardization; 2) the data used are typically private and imbalanced, with inadequate labelling as a result of the first point; and 3) methodologies to assess the accuracy of event detection differ between research groups and are often inappropriate. This proposal builds on ORD datasets, community guidelines, and standards to propose a unified framework for validating cough event detection algorithms. The main objective is to develop standards that unify the workflow for validating respiratory event detection algorithms and ensure that data adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The framework will be distributed through a website that serves as a central hub and reference for standardized clinical definitions and methodologies, leading to a future benchmarking platform for respiratory event detection algorithms.
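To make the point about inconsistent accuracy assessment concrete, the Python sketch below shows one possible event-level validation scheme. Each detected cough is matched one-to-one to an unmatched reference event if their time intervals overlap with an intersection-over-union (IoU) of at least a threshold, and precision, recall, and F1 are computed from the resulting counts. The IoU criterion, the 0.5 default, and the greedy chronological matching are assumptions chosen for illustration, not the standard the project will define.

```python
def interval_iou(a, b):
    """Intersection-over-union of two (start_s, end_s) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_events(reference, detected, min_iou=0.5):
    """Greedy one-to-one matching of detected to reference events; returns counts and scores."""
    matched = set()  # indices of reference events that are already matched
    tp = 0
    for det in sorted(detected):  # process detections in chronological order
        best_idx, best_score = None, min_iou
        for idx, ref in enumerate(reference):
            if idx in matched:
                continue
            score = interval_iou(ref, det)
            if score >= best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    fp = len(detected) - tp   # detections without a matching reference event
    fn = len(reference) - tp  # reference events that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Lowering `min_iou` can turn a false positive into a true positive for the same recordings, which illustrates why results reported under different matching rules are hard to compare.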
Sharing of research data is often perceived as a burden by researchers, as it usually involves manual upload of data from data management systems like Electronic Lab Notebooks (ELN) to repositories. This project aims to contribute to a better integration between ELN systems and data repositories in the ETH Domain, by implementing open API-based interfaces between the SciCat data repository and three important ELNs in the ETH Domain (SciLog, openBIS, Heidi). By implementing seamless interfaces between these widely used solutions in the ETH Domain, the project will simplify existing ORD practices for researchers, thereby lowering the barrier for publication of high-quality FAIR datasets.
The adoption of Electronic Laboratory Notebooks (ELNs) in academic research settings is steadily increasing, gradually replacing traditional paper-based notebooks. However, transitioning to ELNs requires time and expertise. Complicating matters, the market offers numerous ELN solutions, each with its own data model, impeding seamless information exchange. In this project, we plan to address two critical aspects of ELN adoption in academia. First, we would like to broaden and strengthen our knowledge of the requirements for adopting ELN and data management solutions in academic research groups inside the ETH Domain. This will be a collaboration between ETH Scientific IT Services (SIS) and the School of Engineering of EPFL, drawing on SIS's extensive experience in deploying and providing its own software, openBIS, as an ELN and data management solution inside the ETH Domain (namely ETH Zurich, Empa and PSI) and beyond. Second, we aim to enhance interoperability by implementing a standard for data export from ELNs. To this end, we will explore and suggest enhancements to the RO-Crate format, which packages data with metadata and simplifies data sharing and preservation, ensuring reproducibility and long-term accessibility. By sharing our experiences and introducing openBIS while exploring data export standards, we aim to help streamline ELN adoption and foster data interoperability in academic research environments in the ETH Domain. Our approach will facilitate efficient collaboration, enhance research reproducibility, and promote the advancement of scientific knowledge.
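For readers unfamiliar with RO-Crate, the Python sketch below writes a minimal `ro-crate-metadata.json` in the shape described by the RO-Crate 1.1 specification: a JSON-LD `@graph` containing the metadata descriptor, the root `Dataset`, and one `File` entity per exported file. How an ELN entry maps onto these entities (the function names, arguments, and example values) is a hypothetical illustration; it does not reflect openBIS's exporter or the enhancements the project proposes.

```python
import json
from datetime import date
from pathlib import Path

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_ro_crate_metadata(name, description, license_url, files, published=None):
    """Return an RO-Crate 1.1 metadata document for an exported ELN entry (hypothetical mapping)."""
    published = published or date.today().isoformat()
    # Metadata descriptor: states which specification the crate conforms to.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_SPEC},
        "about": {"@id": "./"},
    }
    # Root data entity: describes the crate as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    # One File entity per exported file, identified by its relative path.
    file_entities = [
        {"@id": path, "@type": "File", "name": path.rsplit("/", 1)[-1]}
        for path in files
    ]
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *file_entities]}


def write_crate(crate, directory):
    """Write the metadata file into the crate's root directory and return its path."""
    path = Path(directory) / "ro-crate-metadata.json"
    path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return path
```

Because the export is plain JSON-LD with a fixed descriptor, a receiving system only needs to understand the shared vocabulary, not the originating ELN's internal data model.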
Authentication, authorization, and identity and access management (IAM) are central to interoperability between services. Currently, ETH services use a variety of identity providers, from federated services like SWITCH eduID to institute-specific Active Directory installations. Incompatible authentication and authorization mechanisms can be a major obstacle to interoperability between institutes. To mitigate this, we propose to draft a set of guidelines for IAM practices relating to ORD services. All M2 projects funded under this measure will be expected to follow the guidelines, ensuring that these services are interoperable. The guidelines will also be published on the Central Info Point website and disseminated to researchers, providing clear best practices for services outside the ETH ORD program to follow.
Access to data is a fundamental part of any scientific analysis and discovery, and a basic requirement for Open Research Data. Nevertheless, access to data can be limited by various barriers, including incompatible infrastructures and APIs, as highlighted in the Infrastructure report by the Expert Group Services & Infrastructures (EG-SI). This project aims to introduce a common Storage Access API based on industry standards and to apply it to high-impact use cases across the ETH Domain institutes. This will reduce the effort required both for accessing data and for developing general and domain-specific data-based tools, accelerating the path to scientific discoveries and leading to reusable tools across the scientific communities of the ETH Domain.
With this first-of-its-kind interoperability project, Eawag/LIB4RI, WSL and PSI aim to jointly define and implement a basic blueprint for metadata format and exchange, demonstrating that interoperability between data catalogues and repositories in the ETH Domain is feasible. We aim to define a common understanding of, and an agreed level of compliance with, national and international metadata standards relevant to ETH Domain data catalogues and repositories, and subsequently to take all steps required to implement and comply with them, in order to improve the visibility of datasets in the ETH Domain and beyond. Major deliverables will include improving the link between datasets and scientific publications, and connecting the involved repositories (EnviDat, SciCat, Materials Cloud Archive) with each other and with well-established central search portals (including DORA and the Lib4RI search tool). This project will also uncover hidden challenges and barriers to interoperability, paving the way for other repositories in the ETH Domain to join our interoperability efforts.
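One way to picture an agreed compliance level is a crosswalk from each repository's internal record to a shared target schema, plus a check for the properties that schema requires. The Python sketch below assumes DataCite as that target (the text does not name the standards) and uses DataCite-style names for its mandatory properties: identifier, creators, titles, publisher, publication year, and resource type. The local field names (`authors`, `year`, `paper_doi`) and the example values are hypothetical. The `relatedIdentifiers` entry with relation type `IsSupplementTo` illustrates how a dataset could be linked to the publication it supports.

```python
REQUIRED = ("doi", "creators", "titles", "publisher", "publicationYear", "types")


def to_datacite(local: dict):
    """Map a hypothetical local repository record to DataCite-style attributes.

    Returns the mapped record and the list of required properties that are missing.
    """
    record = {
        "doi": local.get("doi"),
        "creators": [{"name": n} for n in local.get("authors", [])],
        "titles": [{"title": local["title"]}] if local.get("title") else [],
        "publisher": local.get("publisher"),
        "publicationYear": local.get("year"),
        "types": {"resourceTypeGeneral": "Dataset"},
    }
    if local.get("paper_doi"):
        # Link the dataset to the article it supplements.
        record["relatedIdentifiers"] = [{
            "relatedIdentifier": local["paper_doi"],
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }]
    missing = [key for key in REQUIRED if not record.get(key)]
    return record, missing
```

Each repository would maintain only its own crosswalk, while the shared list of required properties defines what "compliant" means for all of them.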
We propose to build an integration between analysis platforms and research data repositories to allow for an exchange of information about data use and to facilitate data reuse. The analysis platforms support the reproducibility of data analyses by managing and tracking the relations between input data, algorithms (and their versions), and output data while repositories contain relevant research data. This project will provide a blueprint and a concrete implementation for integrating these two vital sides of ETH Domain ORD infrastructure.
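The relations such analysis platforms track can be represented as simple run records. The Python sketch below is a hypothetical illustration, not the project's blueprint: each run stores the repository identifiers of its inputs, the algorithm name and version, and a SHA-256 checksum of its output, and a `uses_of` query returns the runs that consumed a given dataset, which is the kind of data-use information a repository could receive in return. Identifiers such as `repo:ds-1` are placeholders.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Run:
    inputs: Tuple[str, ...]  # repository identifiers of the input datasets
    algorithm: str
    version: str
    output_sha256: str       # checksum identifying the exact output produced


@dataclass
class ProvenanceLog:
    runs: List[Run] = field(default_factory=list)

    def record(self, inputs, algorithm, version, output_bytes):
        """Store one analysis run, fingerprinting its output with SHA-256."""
        run = Run(tuple(inputs), algorithm, version,
                  hashlib.sha256(output_bytes).hexdigest())
        self.runs.append(run)
        return run

    def uses_of(self, dataset_id):
        """Runs that consumed a given dataset: the usage a repository could display."""
        return [r for r in self.runs if dataset_id in r.inputs]
```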
To keep data sustainably FAIR and research reproducible, lab data management (LDM) and electronic lab notebooks (ELNs) must not evolve as separate systems; rather, ELNs need to be integrated into LDM. To address this need, we propose Gatekeeper, an extension of the existing Renku platform. Gatekeeper is a centralized system that facilitates research data management across the complete project life cycle, including user access management, versatile integration of different sources, data sharing, archiving, and publication. This Renku extension acts as a middle layer that provides project-specific access to all connected data, independent of its source. In this proposal, we focus on extending Renku's connection and metadata management for ELNs and other sources used in our consortium. New data sources can be integrated in a modular fashion, ranging from simple, low-level linking of data to in-depth integration including data integrity and metadata sanity checks. This modular set-up allows community-driven dissemination of Renku extensions and refinements across future users.
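The modular range from simple linking to in-depth integration can be pictured as a connector interface with optional checks. The Python sketch below is hypothetical and does not use Renku's actual API: a link-only source merely lists URLs for a project, while a checked source also verifies a SHA-256 checksum and the presence of required metadata fields (assumed here to be `title` and `creator`). User access management, which Gatekeeper also covers, is left out.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List


class DataSource(ABC):
    """Hypothetical connector interface; not Renku's API."""
    name: str

    @abstractmethod
    def items(self, project: str) -> List[dict]:
        ...

    def check(self, item: dict) -> List[str]:
        """Return a list of problems; the default (link-only) level performs no checks."""
        return []


class LinkOnlySource(DataSource):
    def __init__(self, name: str, links: Dict[str, List[str]]):
        self.name = name
        self._links = links  # project -> list of URLs

    def items(self, project):
        return [{"url": u} for u in self._links.get(project, [])]


class CheckedSource(DataSource):
    REQUIRED_METADATA = ("title", "creator")

    def __init__(self, name: str, records: Dict[str, List[dict]]):
        self.name = name
        self._records = records  # project -> [{"data": bytes, "sha256": str, "metadata": dict}]

    def items(self, project):
        return list(self._records.get(project, []))

    def check(self, item):
        problems = []
        if hashlib.sha256(item["data"]).hexdigest() != item["sha256"]:
            problems.append("checksum mismatch")
        for key in self.REQUIRED_METADATA:
            if not item["metadata"].get(key):
                problems.append(f"missing metadata: {key}")
        return problems


class Gatekeeper:
    def __init__(self):
        self._sources: List[DataSource] = []

    def register(self, source: DataSource):
        self._sources.append(source)

    def project_view(self, project: str):
        """Collect (source name, item, problems) for every item of one project."""
        return [(src.name, item, src.check(item))
                for src in self._sources
                for item in src.items(project)]
```

New sources plug in by subclassing `DataSource`; deeper integration only means overriding `check`.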
The ORD Central Info Point (CIP) project will create an online resource portal where ETH researchers can navigate and orient themselves in the ETH ORD landscape at various stages of the Research Data Management (RDM) life cycle. These web pages provide a single entry point to promote ORD practices and increase the visibility of services, with the aim of lowering the barriers for researchers in identifying useful tools and services available in the ETH Domain. The ORD Central Info Point will foster increased access and interoperability and promote the availability of new and existing tools across institutions of the ETH Domain. It will outline infrastructures and services available in the ETH Domain, providing information on their respective purpose and use cases, access conditions and costs. The portal will not host any service but will instead direct users to the webpage of the relevant service. It will further provide curated information on policies and best practices in the ETH Domain. It will also be designed to include information on the training content from Measure 3 with direct links to these resources, as well as information on Measure 4. Technically, the ORD Central Info Point will be set up and run on the existing infrastructure of the ORD Program Website.
Explore a comprehensive suite of digital learning resources designed to support researchers, students, and staff across the ETH Domain in implementing best practices for Research Data Management (RDM) and Open Research Data (ORD). The learning modules let users learn at their own pace, cover a wide range of topics essential for effective management throughout the research data lifecycle, and are available as Open Educational Resources (OER).
Research increasingly relies on large amounts of data. To be successful, it needs to be paired with smart and efficient data management. The Data Stewardship Network Proposal of the Lib4RI, Empa, EPFL Library and ETH Library meets this need by:
- Facilitating knowledge exchange and best-practice workshops among persons with data-related roles. A list of active Data Stewards is to be shared with Measure 3 of the ETH Domain ORD program (subject to their consent) as a complementary activity, to encourage their involvement in the development of course material.
- Providing coordinated and pragmatic support for managing research data across the ETH Domain (e.g., developing a simplified data management plan template and interactive guides on archiving data)
- Suggesting improvements for data policies in the ETH Domain. This complements Measure 4 of the ETH Domain ORD program.
- Making the work and skill sets of Data Stewards and Research Software Engineers visible in the research community by choosing an existing communication platform and promoting its use by active Data Stewards and Research Software Engineers. This complements the service-related information which will be provided by the Central Info Point of Measure 2.
- Empowering the 4RI to catch up with ETH Zürich and EPF Lausanne in terms of data management support for researchers
Promoting the FAIR (Findable, Accessible, Interoperable, Reusable) data principles is not complete without considering the links between data and research software. Research software is an integral part of the entire data life cycle and is indispensable for data generation, collection, analysis, and archiving. Additionally, software itself, as a digital artifact, needs to be FAIR. Recently, FAIR principles for research software (FAIR4RS) have been proposed. Most research software is developed by research software engineers (RSEs), who are widely dispersed across the research landscape. To better promote FAIR and ORD (Open Research Data) principles and other best practices for sustainable software in this community, RSEs would benefit from a common platform for regular interaction and knowledge exchange. In many other countries, RSE communities have been established with great success and help promote the FAIR data principles and ORD. In this project, we propose to establish RSE communities at all institutions within the ETH Domain and to take the first steps towards building a Swiss-wide RSE community to promote best practices in research software engineering and the adoption of FAIR principles for data and software. We also propose to connect the emerging RSE communities with other relevant established communities in the ORD landscape to create synergies.
With a focus on data stewardship and other research data management specialists, ETH needs to consider whether the current role descriptions, functions, employment conditions and training are suitable for ORD and RDM specialists and, if necessary, develop proposals for future career paths and training programmes for ORD professionals.
The ETH Domain ORD Programme is currently addressing the career paths of ORD professionals. Within Measure 5 “Career Paths for Open Research Data Professionals”, a project will be launched under the direction of HR ETH Board together with the Heads of Human Resources of the ETH Domain, to
- identify and delineate RDM/ORD roles (i.e., with example job descriptions);
- estimate the FTE distribution of roles in each category across institutions and units;
- assess how roles are effectively defined in terms of written job descriptions; perception of roles by staff, their managers, and internal customers; and
- identify the drivers of staff hiring, retention, and job satisfaction/engagement.
The project will make recommendations to inform strategy and to provide operational guidance and advice on roles, career paths and training.
The Atlas of Regenerative Materials aims to be a non-profit website that assembles and interconnects knowledge about construction with bio- and geo-sourced building materials. The goal is to create an open tool that provides visibility to the entire value chain in sustainable construction, linking natural resources to exemplary buildings and involving the expert community. The Atlas includes information about natural resources, building materials, professionals in the sector, technical construction details, and buildings implementing sustainable construction. A critical factor for the success of this knowledge platform is the collaboration of researchers from academia and industry. The project seeks to gather research results and expertise from various sources to create a comprehensive and reliable repository. The involvement of renowned experts from academia and industry is seen as a key element in establishing a community-driven platform. The project will be a community-driven web repository gathering scientific knowledge and research on natural resources, building materials, technical constructions, professionals, and exemplary buildings. The project seeks to unite various stakeholders in the construction industry, providing academia with teaching support, the construction industry with a ready-to-use database, and third parties with insights into sustainable practices as alternatives to practices that deplete fossil resources.
The prototype of a new national database on intra-specific genetic diversity in populations of wild species in Switzerland, GenDiB, is currently being developed as part of a project co-financed by the Federal Office for the Environment (FOEN). With this ORD initiative proposal, we aim (i) to develop new tools that support simple procedures for dataset upload and download and to implement attractive visualization features, and (ii) to build a community among researchers and stakeholders through various interactive means of communication. Our dedicated activities, in parallel to further developing GenDiB as a beta version for subsequent permanent operation and maintenance, should promote the standardized use of GenDiB as the core repository for such datasets within the community of researchers and stakeholders in conservation management in Switzerland, and possibly beyond. Integrating GenDiB into the national network of databases holding species-level occurrence data (InfoSpecies) will complement these databases by covering the genetic level of biodiversity, which is fundamental for population and species persistence in the context of environmental change.
Highly detailed 3D characterization of trees and forests with close-range technologies offers great potential for modeling carbon, energy fluxes, habitat diversity and much more. Numerous research groups are collecting complex 3D data in forests, and some are making it available as open data to the community. However, there are currently very few open data repositories, or even metadata, for 3D forest data from laser scans that can be used by different groups on a European or global scale. In the Forest3DTwin project, we want to build a prototype for open storage of measured 3D data with reference data according to the FAIR principles (findability, accessibility, interoperability and reusability). We are convinced that this can be established at the Swiss Federal Institute for Forest Research WSL in the long term and lead to engagement and commitment of the European and global community to open 3D forest data.
Tree-ring research has played a pivotal role in unraveling past environmental conditions, understanding climate variability, and providing valuable insights into ecological changes over time. In the current era of digitalization, driven by technological advancements, the field is undergoing a profound transformation, delivering unprecedented detail crucial for enhanced comprehension and exploration across various research domains. However, the absence of a suitable repository and the emerging imbalance between resource-intensive data producers and users pose significant challenges to data-sharing practices. This project aims to address these challenges by showcasing an operational solution, exemplified through intra-annually resolved wood cell anatomical data and images, involving the establishment of a modern, robust, and flexible repository and a simultaneous redefinition of incentives for data producers. The Xcell Hub, through the creation of a community-specific, interactive, visualizable, and user-friendly online data repository, aims to foster Open Research Data (ORD) practices. This solution integrates modern open-source technologies, emphasizing decentralized interactivity, rewarding mechanisms, and transparent data assessment to secure data archiving, engage data producers, stimulate contributions, and enhance data quality.
Custom instruments designed by researchers in academia are a key resource for advancing discovery and technology. However, they are very difficult to share and are therefore rarely shared openly with the scientific community. In the previous Explore project, we built an OpenSPM ecosystem around our custom OpenSPM prototypes. Through the rapid growth and success of this project, we have identified what is required to sustain the growth of the OpenSPM ecosystem so that it becomes a global platform for future SPM research and development. In this second Explore grant, we will continue to increase the available open technology, focus on maintenance and long-term viability by developing a professional development framework, and expand the community through international outreach and expansion beyond the field of SPM. The result will be a long-term sustainable, globally distributed network of users and developers from academia and industry.
3D imaging is a cutting-edge method for digitizing natural history collections, offering immense potential for taxonomy, general biology, and education. By analyzing 3D models, specialists worldwide can instantly access rare reference objects from collections, aiding in field interpretation and various scientific and educational applications. As 3D scanning becomes more efficient and digitization initiatives invest heavily in generating 3D data, 3D models are anticipated to become widespread. Surprisingly, there is limited research on how these 3D data are being used for research and education in natural history collections. Initial comparisons suggest that current 3D models are complex datasets that lack suitable tools for analysis and modification and require links to additional metadata to be useful in taxonomic research and education. This proposal aims to establish additional standards for 3D data preparation and develop best practice guidelines to ensure the usability of natural history collections data. Rather than focusing on developing ready-to-use solutions, the emphasis will be on identifying needs, documenting recommendations, and testing them with expert user groups. The outcomes will directly impact data infrastructures in the U.S., Europe, and Switzerland, serving over 500 institutions. They will also enable expert groups worldwide, particularly in the Global South, to virtually access natural history collection specimens for various scientific purposes.
While ORD practices are becoming increasingly widespread, one area that remains a challenge is qualitative data. On the one hand, qualitative data is more difficult to process and make available in open practices. On the other hand, ethical norms in research practice require confidentiality of research subjects. Yet interview transcripts, workshop records or other kinds of qualitative data are difficult or sometimes impossible to anonymize. This challenge comes to the fore when conducting transdisciplinary (Td) research, where new forms of engagement between science and society co-produce problem framings and project outputs. In this context, questions of who processes and stores data are increasingly important. Td research makes frequent use of qualitative methods, especially interviews and workshops. Sharing of this data could allow for improved learning between Td processes and increase engagement between science and society. How might this data be shared according to FAIR principles? What are appropriate protocols for determining what data to share and how to navigate the ethical issues of research participant protection and the benefits of sharing qualitative data? In this project we will prototype tools for FAIR qualitative data within Td research projects, develop standards and guidelines for other Td researchers and build a dialogue and community of practice within the Td research community around FAIR practices for qualitative data.
Hybrid models, which combine physics and machine learning (ML) based models, are becoming increasingly popular in hydrology and the broader Earth Science community due to their potential for improved prediction and process representation. However, hybrid models pose unique challenges to open research practices, including the widely accepted FAIR principles. Unlike physics-based models, the reusability of hybrid models is hindered by the integration of ML models, which change dynamically with their training data. Furthermore, existing model and data repositories are not designed to host hybrid models, which contain code, ML models, and associated training data. To address these challenges, FRAME will collaboratively design, implement, and test a standardised FAIR protocol tailored for hydrological hybrid models. The protocol will consist of coding standards for interoperability between different model components, a unified metadata specification accounting for different types of physics and ML-based models, and a Python package leveraging existing model and data repositories widely used in the hydrology (HydroShare) and ML (DLHub) communities to share and retrieve hybrid models. To ensure a wider and long-term impact beyond the project's lifetime, the developed protocol will be actively used and improved by participating groups in the ETH Domain and Europe and will ultimately be transitioned to a community-driven protocol, inviting participation from the wider scientific community.
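A unified metadata specification for hybrid models has to record, per component, whether it is physics-based or ML-based and, for ML components, which training data they depend on, since that dependency is what limits reusability. The Python sketch below encodes this idea with dataclasses and a validation step. The field names, the validation rules, and the placeholder identifier are assumptions for illustration, not FRAME's specification, and the sketch does not interact with HydroShare or DLHub.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

KINDS = {"physics", "ml"}


@dataclass
class Component:
    name: str
    kind: str                            # "physics" or "ml"
    version: str
    training_data: Optional[str] = None  # persistent identifier of training data (ML only)


@dataclass
class HybridModel:
    name: str
    components: List[Component] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the metadata is complete."""
        errors = []
        for c in self.components:
            if c.kind not in KINDS:
                errors.append(f"{c.name}: unknown kind {c.kind!r}")
            elif c.kind == "ml" and not c.training_data:
                errors.append(f"{c.name}: ML component must reference its training data")
        kinds = {c.kind for c in self.components}
        if not {"physics", "ml"} <= kinds:
            errors.append("a hybrid model needs at least one physics and one ML component")
        return errors

    def to_json(self) -> str:
        """Serialize the metadata, including nested components, to JSON."""
        return json.dumps(asdict(self), indent=2)
```

Making the training-data reference mandatory for ML components is the design choice that ties a shared hybrid model back to the exact data it was trained on.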
Despite the presence of various Energy System Model (ESM) tools and platforms facilitating data sharing across models such as CROSSDat, the absence of a standardized structure poses a significant challenge, hindering the seamless reuse of data across different modules. MOTEL tackles this problem by applying Open Research Data (ORD) practices to technology-related data. The project aims to build an open-access database from unpublished data that is actively developed and used for internal model development by the Urban Energy Systems Lab (UESL) at Empa. This database will be supplemented with metadata to improve accessibility, interoperability, and transparency of the underlying parameters when used in ESM studies. To achieve interoperability, we will identify and adhere to established ontologies. By connecting each parameter to a mathematically formulated model, we aim to develop a standardized methodology that explicitly defines each parameter, reducing the need for manual preprocessing and expert knowledge when using the data for ESM studies. The database will be published and maintained on the lab's existing infrastructure. In addition, we will establish a reproducible workflow demonstrating how the structured, ORD-compliant data can be seamlessly integrated into ESM tools. This will be achieved through a Python-based model builder that connects to the database and interfaces with our optimization-based ESM platform, ehubX.
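Tying each database parameter to a mathematically formulated model can be illustrated with a simple conversion technology, assuming the formulation P_out = eta * P_in with 0 <= P_in <= P_max. The Python sketch below checks that every symbol in that formulation has a parameter with the expected unit before building a callable model, so a user does not need expert knowledge to know what `eta` means or which unit `P_max` is in. The formulation, units, and parameter values are hypothetical, and the sketch uses neither ehubX nor CROSSDat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    symbol: str       # symbol used in the mathematical formulation
    value: float
    unit: str         # "1" denotes a dimensionless quantity
    description: str


# Hypothetical formulation: P_out = eta * P_in, with 0 <= P_in <= P_max
FORMULATION = {"eta": "1", "P_max": "kW"}  # required symbol -> expected unit


def build_conversion_model(params):
    """Validate parameters against the formulation and return P_out as a function of P_in."""
    by_symbol = {p.symbol: p for p in params}
    for symbol, unit in FORMULATION.items():
        if symbol not in by_symbol:
            raise ValueError(f"missing parameter {symbol}")
        if by_symbol[symbol].unit != unit:
            raise ValueError(f"{symbol}: expected unit {unit}, got {by_symbol[symbol].unit}")
    eta = by_symbol["eta"].value
    p_max = by_symbol["P_max"].value

    def output_kw(p_in_kw):
        if not 0 <= p_in_kw <= p_max:
            raise ValueError("input power outside [0, P_max]")
        return eta * p_in_kw

    return output_kw
```

For a heat pump described by eta = 3.0 (a dimensionless coefficient of performance) and P_max = 10 kW, an input of 4 kW yields 12 kW of heat, while a record that stores P_max in MW is rejected before any model is built.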
Achieving FAIR open science requires reproducible scientific data to be associated with the publications that rely on them. While choosing the appropriate tools and storage facilities is already difficult, ensuring the quality of an open publication of datasets relies heavily on data curation procedures. Curation calls for proper data validation and annotation, which are clearly discipline-dependent. This proposal aims to explore extending Solidipes, a dataset curation software, to make it suitable for the needs of each scientific discipline and to broaden the range of platforms (repositories) Solidipes can interact with. For this project, researchers from computational mechanics and astrophysics team up with a member of the EPFL library to gather the experience necessary for this ambitious Explore project. If funded, the project team will collaborate with the teams of Episciences (French diamond open access platform) and Renku (Swiss platform deploying tools for reproducible and collaborative data analysis). This proposal describes how to fill the gaps towards a modern, multidisciplinary ecosystem for dataset+paper publications.
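Discipline-dependent validation can be organised as a set of common checks plus a registry of per-discipline checks. The Python sketch below is a hypothetical illustration of that structure and is not Solidipes's API; the dataset representation and the specific checks (a README and a license for every dataset, unit annotations for computational mechanics, a coordinate frame for astrophysics) are assumptions.

```python
from typing import Callable, Dict, List

Validator = Callable[[dict], List[str]]


def common_checks(ds: dict) -> List[str]:
    """Checks assumed to apply to every dataset regardless of discipline."""
    problems = []
    if "README.md" not in ds.get("files", []):
        problems.append("no README.md")
    if not ds.get("license"):
        problems.append("no license")
    return problems


def mechanics_checks(ds: dict) -> List[str]:
    if ds.get("annotations", {}).get("units"):
        return []
    return ["mechanics: units not annotated"]


def astro_checks(ds: dict) -> List[str]:
    if ds.get("annotations", {}).get("coordinate_frame"):
        return []
    return ["astrophysics: coordinate frame not annotated"]


# Registry: each discipline contributes its own validators.
VALIDATORS: Dict[str, List[Validator]] = {
    "mechanics": [mechanics_checks],
    "astrophysics": [astro_checks],
}


def curate(ds: dict, discipline: str) -> List[str]:
    """Run common checks, then the discipline's validators; return all problems found."""
    problems = common_checks(ds)
    for check in VALIDATORS.get(discipline, []):
        problems.extend(check(ds))
    return problems
```

Adding support for a new discipline then means registering new validators rather than changing the curation core.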