NCRI Informatics Initiative and data standards

The NCRI data sharing policy is raising questions about how data should be represented to facilitate sharing, making the need for data exchange standards critical and immediate.
The NCRI Informatics Initiative supports the development of standards for describing, formatting, submitting, and exchanging both data and metadata. We are thus working with the relevant communities to identify, evaluate and promote the adoption of common standards across the spectrum of cancer research.

The Cancer InfoMatrix

The Cancer InfoMatrix is one of a suite of features available through our web-based tool, the NCRI Oncology Information Exchange (ONIX). It aims to raise awareness of which standards exist, or are under development, across the entire spectrum of cancer research, from genomics to population studies. These standards support good data management and facilitate better interpretation, exchange and storage of data. The Cancer InfoMatrix provides a comprehensive view of the development status of these standards, with enough information to help researchers judge which standards are most appropriate for their use.

Please take a look at the Cancer InfoMatrix. If you know of any other relevant standards that should be included, or if you need further information, please contact us.

What are data standards?

Data standards are community-agreed specifications for how different data types should be represented and described.

Categories of data standards

It is important to distinguish between standards that specify how to actually do experiments and standards that specify how to describe experiments. We focus on standards that specify how to describe and communicate data and information including checklists (e.g. minimum reporting guidelines for metadata descriptions), syntax (data exchange languages) and semantics (data models, ontologies and controlled vocabularies).
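As a rough illustration of how the three categories fit together, the sketch below checks a record against a checklist and a controlled vocabulary before serialising it for exchange. The field names, vocabulary terms and the choice of JSON as the exchange syntax are all hypothetical and are not taken from any standard listed on this page.

```python
import json

# Checklist: fields a record must report (hypothetical minimum reporting guideline).
REQUIRED_FIELDS = ("sample_id", "tissue_type", "assay")

# Semantics: a controlled vocabulary restricting one field (hypothetical terms).
TISSUE_VOCABULARY = {"breast", "colon", "lung"}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record
    ]
    tissue = record.get("tissue_type")
    if tissue is not None and tissue not in TISSUE_VOCABULARY:
        problems.append(f"unknown tissue_type: {tissue}")
    return problems


def export(record):
    """Syntax: serialise a conforming record to JSON (the format assumed here)."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)
```

Calling validate on a record that omits the assay field and spells the tissue type "Lung" would report both problems, which is the kind of inconsistency that shared checklists and vocabularies are meant to catch before data is exchanged.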

Standards can be informal or formal. Informal standards are used widely within a community but have not gone through a certification process from a recognised institution. Formal standards are also used widely, but they have been defined by a recognised institution, which maintains them through formal maintenance procedures.

Why use data standards?

In this high-throughput, open-source era, access to data is taken for granted. However, data alone is of little use unless it is made available in a usable form, which depends on the development and global uptake of data standards. The adoption of common standards by a community provides a robust foundation for data portability, sharing, integration, interoperability and reuse. This is because standards ensure that data is clearly described and held in a widely compatible format, so it can be exchanged between different IT systems.

Useful resources

Several synergistic activities have begun that aim to foster the harmonisation and consolidation of data standards, including:

BRIDG – The Biomedical Research Integrated Domain Group Model

The BRIDG Model is a collaborative effort of stakeholders from the Clinical Data Interchange Standards Consortium (CDISC), the HL7 Regulated Clinical Research Information Management Technical Committee (RCRIM TC), the National Cancer Institute (NCI), and the US Food and Drug Administration (FDA) to produce a shared view of the dynamic and static semantics that collectively define a shared domain-of-interest, i.e. the domain of clinical and pre-clinical protocol-driven research and its associated regulatory artefacts.

EQUATOR Network – Enhancing the Quality and Transparency of Health Research

The EQUATOR Network is a new initiative that seeks to improve the quality of scientific publications by promoting transparent and accurate reporting of health research.

MIBBI – Minimum Information for Biological and Biomedical Investigations

The MIBBI project maintains a web-based communal resource designed to act as a one-stop shop for exploring the range of extant checklist projects and to foster collaborative, integrative development of checklists.

OBO Foundry – Open Biological Ontology Foundry

The OBO Foundry is a collaborative experiment involving developers of science-based ontologies* who are establishing a set of principles for ontology development. They aim to create a suite of interoperable reference ontologies in the biomedical domain to ensure consistency in the way that biomedical data and information are represented.

 * An ontology is a catalogue or model of the different types of information that exist in a domain and how the pieces of information relate to each other. Ontologies help in developing databases, websites and any other tools for displaying complex data and information.

International and national bodies that formally approve standards or provide a framework for standards development include:

ANSI – American National Standards Institute

ASTM International – American Society for Testing and Materials International

BSI – British Standards Institution

CEN – European Committee for Standardisation

GSC – Genomic Standards Consortium

HL7 – Health Level 7

HUPO PSI – Human Proteome Organisation Proteomics Standards Initiative

ISO – International Organization for Standardization

MGED – Microarray and Gene Expression Data Society

Last updated 19.03.2010. © Copyright NCRI Informatics Initiative 2010