GA4GH Experiments Metadata Standard

Purpose of the working group

Our main objective is to specify the minimum information needed to characterise a genomic experiment.

When a researcher downloads a genomic dataset, they typically get CRAM or VCF files, which are the results of a sequencing experiment. However, these files contain little information on the nature of the experiment itself: are the data from whole genome sequencing, transcriptomics, or another kind of experiment? Are the data from a bulk sequencing assay or a single-cell assay? Were techniques applied to target specific regions of the genome?

Without metadata explaining the context, researchers cannot make sense of results from experiments in genomics, epigenomics, and other -omics fields. The GA4GH Discovery Work Stream aims to produce a minimal checklist of the metadata needed to characterise -omics datasets. The Experiments Metadata Standard will provide a dictionary of properties that makes it easier to search for experiments and to understand their results for analysis.

For more information on our group, please visit our GA4GH web page.

Scope

While the term “metadata” can be very broad (data that describes data), this Discovery Work Stream subgroup focuses exclusively on the properties of the methodology and equipment used in a genomic experiment, and more precisely on library preparation and the instrument run. The standard provides context on how biological samples are prepared into libraries for a given laboratory experiment run, and on the execution context of that run. Interoperability with other GA4GH standards will be key to its adoption.

In the first phase, the group will focus exclusively on genomic sequencing instruments that generate reads (high-throughput sequencing experiments, such as WGS, RNA-Seq, and Methyl-Seq). Future specification updates may consider including other instruments, quality control metrics, and -omics data, such as genotyping arrays, proteomics, and metabolomics, based on the evolving needs of the genomics community. Follow this link to our current working document.

The following topics are therefore considered out of scope (and will remain so): clinical data, biological sample descriptors, downstream data processing, and analysis. The discussions revolve around the content of the checklist, rather than the formats, leaving the latter to the DaMaSC sub-working group.

Core Properties Checklist

Two documents are presented for this first version of the checklist:

Core: This checklist contains properties that are relevant to any sequencing assay.
Identifiers: This checklist contains identifiers that are relevant to include with a genomic dataset.
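
As a rough illustration of how such a checklist could be used, the sketch below models an experiment record and reports which properties are still unset. The three property names (assay_type, cell_resolution, targeted_regions) are hypothetical placeholders derived from the questions raised in the Purpose section; they are not the names defined in the Core or Identifiers checklists, and the helper functions are assumptions made for this example only.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExperimentRecord:
    # Hypothetical property names for illustration only; the real
    # properties are defined in the Core and Identifiers checklists.
    assay_type: Optional[str] = None          # e.g. "WGS", "RNA-Seq", "Methyl-Seq"
    cell_resolution: Optional[str] = None     # e.g. "bulk" or "single cell"
    targeted_regions: Optional[bool] = None   # were specific regions targeted?


def missing_properties(record: ExperimentRecord) -> List[str]:
    """Return the names of properties that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def find_by_assay(records: List[ExperimentRecord], assay_type: str) -> List[ExperimentRecord]:
    """Return the records whose assay_type matches exactly."""
    return [r for r in records if r.assay_type == assay_type]


if __name__ == "__main__":
    partial = ExperimentRecord(assay_type="WGS", cell_resolution="bulk")
    print(missing_properties(partial))
```

A record with every property set yields an empty list, while a partially described experiment is flagged before the dataset is shared. The same properties then double as search keys, as find_by_assay shows.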

Mappings / Implementations

This section provides a mapping of existing platforms and projects to the Experiments Metadata Checklist.


ga4gh/experiments-metadata

Folders and files

Latest commit

History

Repository files navigation

GA4GH Experiments Metadata Standard

Purpose of the working group

Scope

Core Properties Checklist

Mappings / Implementations

Useful documentation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Uh oh!

Contributors 3

Uh oh!