
DataRecord DRAFT Specification v. 0.1

Bioschemas specification describing a record in a dataset.


The following people have been involved in the creation of this specification document. They are all members of the Datasets group.

Group Leader(s)
Other team members


A DataRecord is itself a dataset, although it represents the smallest compact, complete, and self-describing unit of a dataset, i.e., a record.

Bioschemas usage

In Life Sciences, records will represent a BioChemEntity.

Schema.org hierarchy

This is a new Profile that fits into the hierarchy as follows:

Thing > CreativeWork > Dataset

Key to specification table

Schema.org properties where the Expected Types have been changed, and new (i.e., Bioschemas-created) properties/types, are green.

Schema.org properties/types are red.

Pending properties/types are blue.

External (i.e., from 3rd party ontology) properties/types are black.

CD = Cardinality

Property Expected Type Description CD Controlled Vocabulary
Marginality: Minimum
identifier PropertyValue
Schema: The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs, etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details. ONE
mainEntity Thing Schema: Indicates the primary entity described in some page or other CreativeWork. Inverse-property: mainEntityOfPage.
Bioschemas: Link to the BioChemEntity represented by this record.
rdf:type URL Bioschemas: This is used by validation tools to identify the profile used. You must use the value specified in the Controlled Vocabulary column. ONE Missing!
Marginality: Recommended
additionalType URL Schema: An additional type for the item, typically used for adding more specific types from external vocabularies in microdata syntax. This is a relationship between something and a class that the thing is in. In RDFa syntax, it is better to use the native RDFa syntax - the 'typeof' attribute - for multiple types. Schema.org tools may have only weaker understanding of extra types, in particular those defined externally.
Bioschemas: Although not required, additionalType can be used to specify the nature of the record. For instance, a UniProt protein record would have UP:Protein as type.
Marginality: Optional
additionalProperty PropertyValue
Schema: A property-value pair representing an additional characteristic of the entity, e.g. a product feature or another characteristic for which there is no matching property in Schema.org. Note: Publishers should be aware that applications designed to use specific properties will typically expect such data to be provided using those properties, rather than using the generic property/value mechanism.
Bioschemas: In addition to using name and description to describe this property in a human-readable way, additionalType should be used to specify the nature of the property/relation. For instance, if the property refers to a gene/protein disease association, you could use SIO:000983 (gene-disease association) as the additionalType for the additionalProperty.
citation CreativeWork
Schema: A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. MANY
dateCreated Date
Schema: The date on which the CreativeWork was created or the item was added to a DataFeed. ONE
dateModified Date
Schema: The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed. ONE
datePublished Date Schema: Date of first broadcast/publication. ONE
distribution DataDownload Schema: A downloadable form of this dataset, at a specific location, in a specific format. MANY
image ImageObject
Schema: An image of the item. This can be a URL or a fully described ImageObject. MANY
isBasedOn CreativeWork
Schema: A resource that was used in the creation of this resource. This term can be repeated for multiple sources.
Bioschemas: Whenever possible use Evidence Codes (ECO).
isBasisFor CreativeWork
Bioschemas: A resource for which this resource is the basis. Inverse property: isBasedOn.
Bioschemas DataRecord: Whenever possible use Evidence Codes (ECO).
keywords Text Schema: Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas. ONE
sameAs URL Schema: URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website. MANY
seeAlso Thing
Bioschemas: A pointer to any related Thing. Use it when the nature of the relation is unclear; otherwise, use more precise terms from existing controlled vocabularies. MANY
url URL Schema: URL of the item. MANY
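
To make the table more concrete, the following Python sketch builds one possible JSON-LD record and checks it against the Minimum properties and the CD column listed above. It is an illustration under stated assumptions, not part of the specification: the example.org identifiers, the EX-0001 value, the `check_record` helper, and the expanded IRI for SIO:000983 are all hypothetical. Because the Controlled Vocabulary value for rdf:type is marked as missing in the table, the parent type Dataset is used here purely as a placeholder.

```python
import json

# Properties listed under "Marginality: Minimum".
# rdf:type is written as "@type" in JSON-LD.
MINIMUM = ("identifier", "mainEntity", "@type")

# Properties whose CD column reads ONE; a list value is flagged for these.
SINGLE_VALUED = ("identifier", "@type", "dateCreated", "dateModified",
                 "datePublished", "keywords")


def check_record(record):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for prop in MINIMUM:
        if prop not in record:
            problems.append(f"missing minimum property: {prop}")
    for prop in SINGLE_VALUED:
        if isinstance(record.get(prop), list):
            problems.append(f"{prop} has cardinality ONE but holds a list")
    return problems


record = {
    "@context": "https://schema.org",
    # Placeholder: the profile-specific rdf:type value is not given in the table.
    "@type": "Dataset",
    # Hypothetical identifier value.
    "identifier": {"@type": "PropertyValue", "value": "EX-0001"},
    # Hypothetical link to the BioChemEntity this record represents.
    "mainEntity": {"@type": "BioChemEntity",
                   "@id": "https://example.org/entity/EX-0001"},
    "dateCreated": "2018-01-15",
    "sameAs": ["https://example.org/records/EX-0001"],
    "additionalProperty": {
        "@type": "PropertyValue",
        "name": "associated disease",
        "description": "Hypothetical gene-disease association for this record.",
        # Assumed expansion of the CURIE SIO:000983 mentioned in the table.
        "additionalType": "http://semanticscience.org/resource/SIO_000983",
    },
}

print(json.dumps(record, indent=2))
print(check_record(record))
```

The check covers only presence and the ONE/MANY distinction; it does not validate expected types or controlled vocabulary values, which a real validation tool would also need to handle.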
