
Dataset DRAFT Specification v. 0.2

Bioschemas specification for describing a dataset in the life sciences.


The following people have been involved in the creation of this specification document. They are all members of the Datasets group.

Group Leader(s)
Other team members


A guide for how to describe datasets in the life sciences using Schema.org annotation.

Schema.org hierarchy

This is a new Profile that fits into the Schema.org hierarchy as follows:

Thing > CreativeWork > Dataset

Key to specification table

Schema.org properties where the Expected Types have been changed, or new (i.e., Bioschemas created) properties/types are green.

Schema.org properties/types are red.

Pending Schema.org properties/types are blue.

External (i.e., from 3rd party ontology) properties/types are black.

CD = Cardinality

Columns: Property, Expected Type, Description, CD, Controlled Vocabulary
Marginality: Minimum.
description Text
Schema: A description of the item.
Bioschemas: A short summary describing a dataset.
identifier PropertyValue
Schema: The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.
CD: MANY
keywords Text
Schema: Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas.
Bioschemas: These keywords provide a summary of the dataset.
name Text
Schema: The name of the item.
Bioschemas: It is a descriptive name of the dataset.
rdf:type URL
Bioschemas: This is used by validation tools to identify the profile used. You must use the value specified in the Controlled Vocabulary column.
CD: ONE
Controlled Vocabulary: Missing!
url URL
Schema: URL of the item.
Bioschemas: It is the location of a page describing the dataset.
Marginality: Recommended.
citation CreativeWork
Schema: A citation or reference to another creative work, such as another publication, web page, scholarly article, etc.
Bioschemas: A citation for a publication that describes the dataset.
creator Organization
Schema: The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork.
Bioschemas: The name of the dataset creator (person or organization).
distribution DataDownload
Schema: A downloadable form of this dataset, at a specific location, in a specific format.
CD: ONE
includedInDataCatalog DataCatalog
Schema: A data catalog which contains this dataset.
CD: MANY
license CreativeWork
Schema: A license document that applies to this content, typically indicated by URL.
Bioschemas: A license under which the dataset is distributed.
measurementTechnique Text
Schema: A technique or technology used in a Dataset (or DataDownload, DataCatalog), corresponding to the method used for measuring the corresponding variable(s) (described using variableMeasured). This is oriented towards scientific and scholarly dataset publication but may have broader applicability; it is not intended as a full representation of measurement, but rather as a high level summary for dataset discovery.
For example, if variableMeasured is: molecule concentration, measurementTechnique could be: "mass spectrometry" or "nmr spectroscopy" or "colorimetry" or "immunofluorescence".
If the variableMeasured is "depression rating", the measurementTechnique could be "Zung Scale" or "HAM-D" or "Beck Depression Inventory".
If there are several variableMeasured properties recorded for some given data object, use a PropertyValue for each variableMeasured and attach the corresponding measurementTechnique.
variableMeasured PropertyValue
Schema: The variableMeasured property can indicate (repeated as necessary) the variables that are measured in some dataset, either described as text or as pairs of identifier and description using PropertyValue.
Bioschemas: What does the dataset measure? (e.g., temperature, pressure).
version Number
Schema: The version of the CreativeWork embodied by a specified resource.
Bioschemas: The version number for this dataset.
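
To make the table above more concrete, the following is a minimal sketch in Python of how a dataset description could be assembled as JSON-LD and checked for the minimum properties. It is an illustration under stated assumptions, not an official Bioschemas tool: the dataset name, identifier, URLs, and the helper names `build_dataset_markup` and `missing_minimum_properties` are all hypothetical. The profile URL for rdf:type is not given in this specification, so the sketch only sets the Schema.org type and leaves the profile value out.

```python
import json

# Minimum properties taken from the table above. rdf:type is not checked
# here because the specification does not yet give its required value.
MINIMUM_PROPERTIES = ("description", "identifier", "keywords", "name", "url")


def build_dataset_markup():
    """Return JSON-LD for a hypothetical dataset (all values are made up)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Minimum properties
        "name": "Example molecule concentration measurements",
        "description": "A hypothetical dataset used only to illustrate the markup.",
        "identifier": [
            # CD is MANY, so identifiers are kept in a list.
            {"@type": "PropertyValue", "propertyID": "example-id", "value": "EX-0001"}
        ],
        "keywords": "molecule concentration, mass spectrometry, example",
        "url": "https://example.org/datasets/EX-0001",
        # A few recommended properties
        "version": 1,
        "license": {"@type": "CreativeWork", "url": "https://example.org/license"},
        "variableMeasured": [
            # One PropertyValue per variable, each carrying its own technique.
            {
                "@type": "PropertyValue",
                "name": "molecule concentration",
                "measurementTechnique": "mass spectrometry",
            }
        ],
    }


def missing_minimum_properties(markup):
    """List the minimum properties that are absent or empty in the markup."""
    return [prop for prop in MINIMUM_PROPERTIES if not markup.get(prop)]


if __name__ == "__main__":
    markup = build_dataset_markup()
    print(json.dumps(markup, indent=2))
    print("Missing minimum properties:", missing_minimum_properties(markup))
```

The printed JSON can be embedded in a dataset landing page inside a `<script type="application/ld+json">` element. The check only tests that each minimum property is present and non-empty; it does not validate expected types or cardinality.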
