Research Data Management and Open Research Data Services and Infrastructure

Inventory of services and infrastructures that support ORD practices

As preliminary work, the Expert Group Services and Infrastructure has conducted an inventory of services and infrastructures that support ORD practices of researchers in the ETH Domain. The analysis was carried out along the research data life cycle and identifies which services and infrastructures are required to address specific issues that arise at different points in the research data management process.

Access the report here.

Coming up soon: Central Info Point

Over the next few months, this page will be transformed into a Central Info Point for Research Data Management for researchers in the entire ETH Domain. Initially, the Central Info Point will guide researchers to the ORD-related solutions available across the institutions of the ETH Domain.

The Central Info Point project is being carried out as part of Measure 2 “Improve the ecosystem of research data management (RDM) services and infrastructure that support ORD practices”. Within Measure 2, the following projects are also being financed:

API Interoperability Projects:
- Improving the interoperability of Electronic Lab Notebooks and data repositories by enhancing API integration
- Towards a common storage access API
- Interoperability between the ETH Domain Repositories
- Reproducibility and Interoperability in Active Data Publication
ELN Interoperability Projects:
- Promoting ELN Adoption and Streamlining Data Interoperability in the ETH Domain
- Gatekeeper – a comprehensive approach to connect data sources across the ETH Domain
ID / Access Management Project:
- ETH ORD Identity and Access Management Guidelines
Community Building Projects:
- Data Stewardship Network (DSN) in the ETH Domain
- RSE4ORD – building a research software engineering community to promote open science

ETH ORD Measure 2 Implementation Projects

API Interoperability Projects

Improving the interoperability of Electronic Lab Notebooks and data repositories by enhancing API integration

Institutions

Empa, ETH Zurich, PSI

Partners

Carlo Minotti, Omkar Zade, Frédéric Thomas Potier, Caterina Barillari, Juan Fuentes, Bernd Rinn, Simone Baffelli

Abstract

Sharing of research data is often perceived as a burden by researchers, as it usually involves manual upload of data from data management systems like Electronic Lab Notebooks (ELN) to repositories. This project aims to contribute to a better integration between ELN systems and data repositories in the ETH Domain, by implementing open API-based interfaces between the SciCat data repository and three important ELNs in the ETH Domain (SciLog, openBIS, Heidi). By implementing seamless interfaces between these widely used solutions in the ETH Domain, the project will simplify existing ORD practices for researchers, thereby lowering the barrier for publication of high-quality FAIR datasets.

Towards a common storage access API

Institutions

CSCS, Empa, ETH Zurich, PSI, WSL

Partners

Leonardo Sala, Omkar Zade, Frédéric Thomas Potier, Ionut Iosifescu Enescu, Yasmin Waldeck, Pablo Fernandez, Tilo Steiger, Anusch Bachofner

Abstract

Access to data is a fundamental part of any scientific analysis and discovery, and a basic requirement for Open Research Data. Nevertheless, access to data can be limited by various barriers, including incompatible infrastructures and APIs, as highlighted in the Infrastructure report by the Expert Group Services & Infrastructures (EG-SI). This project aims to introduce a common Storage Access API based on industry standards in high-impact use cases among the ETH Domain institutes. This will reduce the effort required both for accessing data and for developing general and domain-specific data-based tools, thus accelerating the path to scientific discoveries and leading to reusable tools across the ETH Domain scientific communities.

Interoperability between the ETH Domain Repositories

Institutions

Eawag / Lib4RI, PSI, WSL

Partners

Fabian Felder, Valeria Granata, Frank Hösli, Carlo Minotti, Omkar Zade, Frédéric Thomas Potier, Giovanni Pizzi, Ionut Iosifescu Enescu, Yasmin Waldeck

Abstract

With this first-of-its-kind interoperability project, Eawag/Lib4RI, WSL and PSI aim to jointly define and implement a basic blueprint for metadata format and exchange to demonstrate the feasibility of achieving interoperability between data catalogues and repositories in the ETH Domain. We aim to define a common understanding and an agreed compliance level with national and international metadata standards relevant to ETH Domain data catalogues and repositories, and subsequently take all required steps to implement and comply with them, in order to improve the visibility of datasets in the ETH Domain and beyond. Major deliverables will include improving the link between datasets and scientific publications, and connecting the involved repositories (EnviDat, SciCat, Materials Cloud Archive) with each other and with well-established central search portals (including DORA and the Lib4RI search tool). This project will also uncover hidden challenges and barriers to interoperability, paving the way for other repositories in the ETH Domain to join our interoperability efforts.

Reproducibility and Interoperability in Active Data Publication

Institutions

PSI, SDSC, WSL

Partners

Rok Roškar, Ionut Iosifescu Enescu, Yasmin Waldeck, Spencer Bliven, Omkar Zade, Frédéric Thomas Potier

Abstract

We propose to build an integration between analysis platforms and research data repositories to allow for an exchange of information about data use and to facilitate data reuse. The analysis platforms support the reproducibility of data analyses by managing and tracking the relations between input data, algorithms (and their versions), and output data while repositories contain relevant research data. This project will provide a blueprint and a concrete implementation for integrating these two vital sides of ETH Domain ORD infrastructure.
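The relation between input data, algorithm version and output data described in this abstract can be illustrated with a minimal sketch. It is not the project's implementation and does not use any analysis platform's API; the names `ProvenanceRecord`, `run_tracked` and `checksum` are hypothetical, and identifying data by a SHA-256 checksum is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass


def checksum(content: bytes) -> str:
    """Identify a piece of data by the SHA-256 digest of its bytes."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Links one analysis step to the data it consumed and produced."""
    algorithm: str
    algorithm_version: str
    inputs: tuple   # checksums of the input data
    outputs: tuple  # checksums of the output data


def run_tracked(algorithm, version, func, input_blobs):
    """Apply func to each input and record what went in and what came out."""
    output_blobs = [func(blob) for blob in input_blobs]
    record = ProvenanceRecord(
        algorithm=algorithm,
        algorithm_version=version,
        inputs=tuple(checksum(b) for b in input_blobs),
        outputs=tuple(checksum(b) for b in output_blobs),
    )
    return output_blobs, record


# Hypothetical analysis step: upper-case the raw bytes.
outputs, record = run_tracked(
    "uppercase", "1.0.0", bytes.upper, [b"sample a", b"sample b"]
)
print(record)
```

A repository that stores such records alongside its datasets could, in principle, answer which analyses used a given dataset, which is the kind of information exchange about data use that the abstract refers to.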

ELN Interoperability Projects

Promoting ELN Adoption and Streamlining Data Interoperability in the ETH Domain

Institutions

Empa, EPFL, ETH Zurich

Partners

Bernd Rinn, Caterina Barillari, Artur Pedziwilk, Juan Fuentes, Andreas Meier, Olli Salo, Jean-Michel Buemi, Juan Convers, Eleni Pratsini, Simone Baffelli, Pascal Su

Abstract

The adoption of Electronic Laboratory Notebooks (ELNs) in academic research settings is steadily increasing, and ELNs are gradually replacing traditional paper-based notebooks. However, transitioning to ELNs requires time and expertise. Complicating matters, the market offers numerous ELN solutions, each with its own data model, which impedes seamless information exchange. In this project, we plan to address two critical aspects of ELN adoption in academia. First, we would like to broaden and strengthen the knowledge of, and requirements for, the adoption of ELN and data management solutions in academic research groups inside the ETH Domain. This will be a collaboration between ETH Scientific IT Services (SIS) and the School of Engineering of EPFL, drawing on the extensive experience within SIS in deploying and providing its own software, openBIS, as an ELN and data management solution inside the ETH Domain (namely ETH Zurich, Empa and PSI) and beyond. Second, we aim to enhance interoperability by implementing a standard for data export from ELNs. To this end, we will explore and suggest enhancements to the RO-Crate format, which focuses on packaging data with metadata and simplifies data sharing and preservation, ensuring reproducibility and long-term accessibility. By sharing our experiences and introducing openBIS while exploring data export standards, we aim to contribute to streamlining ELN adoption and fostering data interoperability in academic research environments in the ETH Domain. Our approach will facilitate efficient collaboration, enhance research reproducibility, and promote the advancement of scientific knowledge.
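To illustrate what packaging data with metadata means in the RO-Crate format, the following sketch builds a minimal `ro-crate-metadata.json` document using only the Python standard library. It follows the general structure of RO-Crate 1.1 (a metadata descriptor, a root dataset and file entities), but it is not the export format of openBIS or of any other ELN; the function name `build_ro_crate_metadata`, the file names and the example values are assumptions made for illustration.

```python
import json


def build_ro_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    files is a list of (relative_path, human_readable_name) pairs.
    """
    # Descriptor entity: states which specification the crate conforms to.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: describes the package as a whole and lists its parts.
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path, _ in files],
    }
    # One entity per packaged file.
    file_entities = [
        {"@id": path, "@type": "File", "name": label} for path, label in files
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root_dataset, *file_entities],
    }


# Hypothetical ELN export: one notebook entry plus one raw data file.
crate = build_ro_crate_metadata(
    name="Example ELN export",
    description="Hypothetical notebook entry exported with its raw data.",
    date_published="2024-01-31",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=[("entry.html", "Notebook entry"), ("data/run1.csv", "Raw measurements")],
)
print(json.dumps(crate, indent=2))
```

Writing the resulting JSON to a file named `ro-crate-metadata.json` next to the listed files would turn the folder into a crate that other tools can read, regardless of which ELN produced it.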

Gatekeeper – a comprehensive approach to connect data sources across the ETH Domain

Institutions

ETH Zurich, PSI, SDSC

Partners

Daniel Stekhoven, Rok Roškar, Tanja Stadler, Niko Beerenwinkel, G.V. Shivashankar

Abstract

To ensure that data are sustainably FAIR and that research is reproducible, lab data management (LDM) and electronic lab notebooks (ELNs) must not evolve as separate systems; rather, ELNs need to be integrated into LDM. To address this need, we propose Gatekeeper, an extension of the existing Renku platform. Gatekeeper is a centralized system that facilitates research data management across the complete project life cycle, including user access management, versatile integration of different sources, data sharing, archiving, and publication. This Renku extension acts as a middle layer that allows project-specific access to all connected data, independent of its source. In this proposal we will focus on extending Renku with respect to the connection and metadata management for ELNs and other sources used in our consortium. New data sources can be integrated in a modular fashion, ranging from simple, low-level linking of data to in-depth integration including data integrity and metadata sanity checks. This modular set-up allows community-driven dissemination of Renku extensions and refinements among future users.

Central Info Point Project

ETH ORD Central Info Point

Institutions

Empa, EPFL, ETH Zurich, Lib4RI, WSL

Partners

Guilaine Baud-Vittoz, Chiara Gabella, Matthias Töwe, Urs Beyerle, Patrick Pedrioli, Spencer Bliven, Fabian Felder, Veruska Muccione, Ionut Iosifescu Enescu, Yasmin Waldeck, Anusch Bachofner

Abstract

The ORD Central Info Point (CIP) project will create an online resource portal where ETH researchers can navigate and orient themselves in the ETH ORD landscape at various stages of the Research Data Management (RDM) life cycle. These web pages will provide a single entry point to promote ORD practices and increase the visibility of services, with the aim of lowering the barriers for researchers in identifying useful tools and services available in the ETH Domain. The ORD Central Info Point will foster increased access and interoperability, and push towards increasing the availability of new and existing tools across institutions of the ETH Domain. It will outline the infrastructures and services available in the ETH Domain, providing information on their respective purpose and use cases, access conditions and costs. The portal will not host any service, but will instead direct users to the webpage of the relevant service. It will further provide curated information on policies and best practices in the ETH Domain. It will also be designed to include information on the training content from Measure 3, with direct links to these resources, and information on Measure 4. Technically, the ORD Central Info Point will be set up and run on the existing infrastructure of the ORD Program Website.

ID / Access Management Project

ETH ORD Identity and Access Management Guidelines

Institutions

Eawag, Empa, EPFL, ETH Zurich, PSI, WSL

Partners

Björn Erik Abt (PSI), Spencer Bliven (PSI), Matthias Gerber (WSL), Davor Kupresak (ETH Zürich), Pierre Mellier (EPFL), Simone Baffelli (Empa), Stuart Dennis (Eawag)

Abstract

Authentication, authorization, and identity and access management (IAM) are central to interoperability between services. Currently, ETH services use a variety of identity providers, from federated services like SWITCH eduID to institute-specific Active Directory installations. Incompatibilities between authentication and authorization systems can be a major obstacle to interoperability between institutes. To mitigate this, we propose to draft a set of guidelines for IAM practices relating to ORD services. All projects funded under Measure 2 will be expected to follow the guidelines, ensuring that these services are interoperable. The guidelines will also be published on the Central Info Point website and disseminated to researchers, providing clear best practices for services outside the ETH ORD program to follow.

Community Building Projects

Data Stewardship Network (DSN) in the ETH Domain

Institutions

Eawag / Lib4RI, Empa, EPFL, ETH Zurich

Partners

Fabian Felder, Moushumi Ulrich-Nath, Chiara Gabella, Julian Dederke, Stefanie Hauser

Abstract

Research increasingly relies on large amounts of data. To be successful, it needs to be paired with smart and efficient data management. The Data Stewardship Network proposal of Lib4RI, Empa, EPFL Library and ETH Library meets this need by:

- Facilitating knowledge exchange and best-practice workshops among persons with data-related roles. A list of active Data Stewards will be shared with Measure 3 of the ETH Domain ORD program (subject to their consent) as a complementary activity, to encourage their involvement in the development of course material.
- Providing coordinated and pragmatic support for managing research data across the ETH Domain (e.g., developing a simplified data management plan template and interactive guides on archiving data).
- Suggesting improvements for data policies in the ETH Domain. This complements Measure 4 of the ETH Domain ORD program.
- Making the work and skill sets of Data Stewards and Research Software Engineers visible in the research community by choosing an existing communication platform and promoting its use by active Data Stewards and Research Software Engineers. This complements the service-related information which will be provided by the Central Info Point of Measure 2.
- Empowering the 4RI to catch up with ETH Zürich and EPF Lausanne in terms of data management support for researchers.

RSE4ORD – building a research software engineering community to promote open science

Institutions

Eawag, Empa, EPFL, ETH Zurich, PSI, WSL

Partners

Tarun Chadha, Franziska Oschmann, Uwe Schmitt, Linus Gasser, Son Pham-Ba, Charlotte Weil, Nicolas Richart, Emmanuel Lanti, Elsa Germann, Achim Gsell, Anusch Bachofner, Ionut Iosifescu Enescu, Yasmin Waldeck, Stuart Dennis

Abstract

Promoting the FAIR (Findable, Accessible, Interoperable, Reusable) data principles is not complete without considering the links between data and research software. Research software is an integral part of the entire data life cycle and is indispensable for data generation, data collection, data analysis and data archiving. Additionally, software itself, as a digital artifact, needs to be FAIR. Recently, FAIR principles for research software (FAIR4RS) have been proposed. Most research software is developed by research software engineers (RSEs), who are dispersed widely across the research landscape. To better promote FAIR and ORD (Open Research Data) principles and other best practices for sustainable software in this community, RSEs would benefit from a common platform for regular interaction and knowledge exchange. In many other countries, RSE communities have been established with great success and help to promote the FAIR data principles and ORD. In this project, we propose to establish RSE communities at all institutions within the ETH Domain and to take the first steps towards building a Swiss-wide RSE community to promote best practices in research software engineering and the adoption of FAIR principles for data and software. We also propose to connect the emerging RSE communities with other relevant established communities in the ORD landscape in order to create synergies.