
The Experiments Metadata Standard Group focuses on defining a minimal set of metadata properties for genomic datasets, to make them usable primarily in a discovery context.


ga4gh/experiments-metadata


GA4GH Experiments Metadata Standard

Purpose of the working group

Our main objective is to specify the minimum information needed to characterise a genomic experiment.

When a researcher downloads a genomic dataset, they typically get CRAM or VCF files, which are the output of a sequencing experiment. However, these files carry little information about the experiment itself: do the data come from whole genome sequencing, transcriptomics, or another kind of experiment? Do they come from a bulk or a single-cell assay? Were techniques applied to target specific regions of the genome?

Without metadata explaining the context, researchers cannot make sense of results from experiments in genomics, epigenomics, and related fields. The GA4GH Discovery Work Stream aims to produce a minimal checklist of the metadata needed to characterise -omics datasets. The Experiments Metadata Standard will provide a dictionary of properties that makes it easier to search for experiments and to interpret their results for analysis.

For more information on our group, please visit our GA4GH web page.

Scope

[Figure: scope of the Experiments Metadata Standard (ga4gh_expmeta_scope.png)]

While the term “metadata” can be very broad (data that describes data), this Discovery Work Stream subgroup focuses exclusively on the methodology and equipment used in a genomic experiment, specifically library preparation and the instrument run. The standard describes how biological samples were prepared into libraries for a given laboratory run, and the context in which that run was executed. Interoperability with other GA4GH standards will be key to the standard's adoption.

In the first phase, the group will focus exclusively on genomic sequencing instruments that generate reads (high-throughput sequencing experiments such as WGS, RNA-Seq, and Methyl-Seq). Future specification updates may extend coverage to other instruments, quality control metrics, and other -omics data, such as genotyping arrays, proteomics, and metabolomics, based on the evolving needs of the genomics community. Follow this link to our current working document.

The following topics are therefore out of scope (and will remain so): clinical data, biological sample descriptors, downstream data processing, and analysis. Discussions focus on the content of the checklist rather than its format, which is left to the DaMaSC sub-working group.

Core Properties Checklist

This first version of the checklist consists of two documents:

  • Core: This checklist contains properties that are relevant to any sequencing assay.
  • Identifiers: This checklist contains identifiers that are relevant to include with a genomic dataset.
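To make the split between the two documents concrete, here is a minimal sketch of how one dataset's experiment metadata might be held in code and queried in a discovery setting, answering questions such as "which single-cell RNA-Seq datasets exist?". All field names below (`assay_type`, `single_cell`, `targeted`, `instrument_model`, `dataset_id`, `sample_ids`) and the `find_datasets` helper are hypothetical placeholders, not the checklist's actual property names; the Core and Identifiers documents define those.

```python
from dataclasses import dataclass, field


@dataclass
class CoreProperties:
    # Hypothetical core properties covering library preparation and the
    # instrument run; the real checklist defines its own property names.
    assay_type: str        # e.g. "WGS", "RNA-Seq", "Methyl-Seq"
    single_cell: bool      # False for a bulk assay, True for single-cell
    targeted: bool         # whether targeting of genomic regions was applied
    instrument_model: str  # sequencing instrument used for the run


@dataclass
class Identifiers:
    # Hypothetical identifiers that travel with the dataset.
    dataset_id: str
    sample_ids: list[str] = field(default_factory=list)


@dataclass
class ExperimentMetadata:
    core: CoreProperties
    identifiers: Identifiers


def find_datasets(records, assay_type, single_cell=None):
    """Return the dataset IDs whose core properties match the query.

    single_cell=None means "do not filter on bulk vs single-cell".
    """
    matches = []
    for record in records:
        if record.core.assay_type != assay_type:
            continue
        if single_cell is not None and record.core.single_cell != single_cell:
            continue
        matches.append(record.identifiers.dataset_id)
    return matches


if __name__ == "__main__":
    records = [
        ExperimentMetadata(CoreProperties("WGS", False, False, "instrument-A"),
                           Identifiers("DS-001")),
        ExperimentMetadata(CoreProperties("RNA-Seq", True, False, "instrument-B"),
                           Identifiers("DS-002")),
        ExperimentMetadata(CoreProperties("RNA-Seq", False, True, "instrument-B"),
                           Identifiers("DS-003")),
    ]
    print(find_datasets(records, "RNA-Seq", single_cell=True))  # ['DS-002']
```

Keeping the identifiers separate from the core properties mirrors the two-document split: the query filters on core properties only and returns identifiers, so a discovery service could expose one without the other.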

Mappings / Implementations

This section provides a mapping of existing platforms and projects to the Experiments Metadata Checklist.
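One way to picture such a mapping is as a crosswalk: a table from a platform's own field names to checklist property names, applied to each exported record. The sketch below assumes that model; the platform field names (`experiment_type`, `is_single_cell`, `sequencer`), the checklist names, and the helpers `map_record` and `missing_properties` are all hypothetical and not taken from the specification or from any listed platform.

```python
# Hypothetical crosswalk from one platform's field names to checklist
# property names. Neither side is taken from the actual specification.
PLATFORM_TO_CHECKLIST = {
    "experiment_type": "assay_type",
    "is_single_cell": "single_cell",
    "sequencer": "instrument_model",
}


def map_record(platform_record, crosswalk=PLATFORM_TO_CHECKLIST):
    """Rename mapped fields and keep unmapped fields separately.

    Returns (mapped, unmapped) so gaps in the mapping stay visible
    instead of being silently dropped.
    """
    mapped = {}
    unmapped = {}
    for key, value in platform_record.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


def missing_properties(mapped, required):
    """List the required checklist properties absent from a mapped record."""
    return sorted(p for p in required if p not in mapped)


if __name__ == "__main__":
    record = {"experiment_type": "WGS", "sequencer": "instrument-A", "lab_notes": "n/a"}
    mapped, unmapped = map_record(record)
    print(mapped)    # {'assay_type': 'WGS', 'instrument_model': 'instrument-A'}
    print(unmapped)  # {'lab_notes': 'n/a'}
    print(missing_properties(mapped, {"assay_type", "single_cell", "instrument_model"}))  # ['single_cell']
```

Reporting both unmapped platform fields and missing checklist properties makes it easy to see how completely a given platform covers the checklist.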

Useful documentation
