Starting a User Community for Cutting-Edge Sequence Search
Abstract
The exponential growth of biomedical sequencing data poses considerable challenges and open problems for genomic data management, limiting how efficiently this vast resource can be accessed and utilised. The Sequence Read Archive (SRA) exemplifies the scale of available data, housing over 40 petabases. However, current indexing methods rely on metadata rather than full-text search of the sequences themselves, which significantly limits the potential for research and discovery. The Biomedical Informatics lab at ETH Zürich has developed a computational framework capable of indexing whole sequence repositories at petabyte scale, compressing the data significantly while maintaining search efficiency. This framework, implemented in the MetaGraph software platform, represents a major technological advancement, enabling precise, large-scale genomic data analysis. The lab has applied it to over 4 PB of raw sequencing data and shares the generated indexes freely to promote open research. The proposal aims to establish MetaGraph as a leading open research data tool and to build a vibrant user community around it, enhancing the accessibility and utility of genomic data. The initiative seeks to break down barriers to data access and foster a more open, collaborative research environment. It further aims to expand the scope of MetaGraph beyond DNA to include non-DNA repositories, to address privacy and ethical considerations in data accessibility, and to contribute to the democratisation of genomic data.