Dataset DRAFT Profile
Version: 0.4-DRAFT (30 March 2021)
If you spot any errors or omissions in this type, please file an issue on our GitHub.
Contributors
The following people have been involved in the creation of this specification document. They are all members of the Datasets group.
Group Leader(s)
Other team members
Schema.org hierarchy
This Profile fits into the schema.org hierarchy as follows: Thing > CreativeWork > Dataset
Description
A guide to describing life-science datasets using Schema.org-like annotation.
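As a rough illustration of what such annotation can look like, the sketch below assembles a JSON-LD description using only Python's standard library. The property names are the ones this profile lists under Minimum marginality; every value (the name, summary, URLs, identifier prefix, and keyword term) is an invented placeholder rather than data from a real dataset, and the function name `minimal_dataset_markup` is hypothetical.

```python
import json


def minimal_dataset_markup() -> dict:
    """Build a JSON-LD dict that uses the profile's Minimum properties.

    All values are placeholders chosen for illustration only.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Placeholder expression dataset",
        "description": "A short placeholder summary of the dataset.",
        # Hypothetical CURIE ("example:0000") written as an Identifiers.org URL.
        "identifier": "https://identifiers.org/example:0000",
        # One keyword given as a DefinedTerm; the term URL is a placeholder.
        "keywords": [
            {
                "@type": "DefinedTerm",
                "name": "gene expression",
                "url": "https://example.org/vocab/term-0001",
            }
        ],
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "url": "https://example.org/datasets/0000",
    }


markup = minimal_dataset_markup()
# Serialise for embedding in a <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```

Keeping the markup as a plain dictionary until the final `json.dumps` call makes it easy to add Recommended or Optional properties later without touching the serialisation step.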
Summary of Changes
- Many: Other #473 – Updated properties used in the profile to be aligned with schema.org v12.0
- keywords: Other #311 – Updated guidance
- maintainer: Added – New property in schema.org that is of particular relevance for datasets
- identifier: Other #310 – Updated guidance and example
- alternateName: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- hasPart: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- isPartOf: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- license: Marginality Increase – Licenses are required to know what terms the dataset can be used under
- isAccessibleForFree: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- dateCreated: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- dateModified: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- datePublished: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- publisher: Added #312 – Adding terms used by other recommendations such as Google or FigShare as optional
- isBasedOn: Added #477 – Link a Dataset to a Study that produced it
Latest profiles
Latest release: 0.3-RELEASE-2019_06_14
Latest draft: 0.4-DRAFT
Previous profiles
Previous version: 0.3-RELEASE-2019_06_14
Previous release: 0.3-RELEASE-2019_06_14
You can read the release version of this specification here.
Key to specification table
- Green properties/types are proposed by Bioschemas, or indicate proposed changes by Bioschemas to Schema.org
- Red properties/types exist in the core of Schema.org
- Blue properties/types exist in the pending area of Schema.org
- Black properties/types are reused from external vocabularies/ontologies
CD = Cardinality
Property | Expected Type | Description | CD | Controlled Vocabulary | Example |
---|---|---|---|---|---|
Marginality: Minimum. | | | | | |
description | Text | Schema: A description of the item. Bioschemas: A short summary describing a dataset. | ONE | | |
identifier | PropertyValue, Text, URL | Schema: The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details. Bioschemas: CURIEs that can be resolved using Identifiers.org should be used. | MANY | | |
keywords | DefinedTerm, Text, URL | Schema: Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas. Bioschemas: Keywords should be drawn from a controlled vocabulary, e.g. EDAM, and supplied as a DefinedTerm list. | ONE | | |
license | CreativeWork, URL | Schema: A license document that applies to this content, typically indicated by URL. Bioschemas: A license under which the dataset is distributed. | ONE | | |
name | Text | Schema: The name of the item. Bioschemas: A descriptive name of the dataset. | ONE | | |
url | URL | Schema: URL of the item. Bioschemas: The location of a page describing the dataset. | ONE | | |
Marginality: Recommended. | | | | | |
alternateName | Text | Schema: An alias for the item. | MANY | | |
citation | CreativeWork, Text | Schema: A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. Bioschemas: A citation for a publication that describes the dataset. | MANY | | |
creator | Organization, Person | Schema: The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork. Bioschemas: The name of the dataset creator (person or organization). | MANY | | |
distribution | DataDownload | Schema: A downloadable form of this dataset, at a specific location, in a specific format. | ONE | | |
includedInDataCatalog | DataCatalog | Schema: A data catalog which contains this dataset. Supersedes includedDataCatalog, catalog. Inverse property: dataset | MANY | | |
isBasedOn | CreativeWork, Product, URL | Schema: A resource that was used in the creation of this resource. This term can be repeated for multiple sources. For example, http://example.com/great-multiplication-intro.html. Supersedes isBasedOnUrl. Bioschemas: Use to link a Dataset to the Study that it was generated from. | MANY | | |
measurementTechnique | Text, URL | Schema: A technique or technology used in a Dataset (or DataDownload, DataCatalog), corresponding to the method used for measuring the corresponding variable(s) (described using variableMeasured). This is oriented towards scientific and scholarly dataset publication but may have broader applicability; it is not intended as a full representation of measurement, but rather as a high level summary for dataset discovery. For example, if variableMeasured is: molecule concentration, measurementTechnique could be: “mass spectrometry” or “nmr spectroscopy” or “colorimetry” or “immunofluorescence”. If the variableMeasured is “depression rating”, the measurementTechnique could be “Zung Scale” or “HAM-D” or “Beck Depression Inventory”. If there are several variableMeasured properties recorded for some given data object, use a PropertyValue for each variableMeasured and attach the corresponding measurementTechnique. | MANY | | |
variableMeasured | PropertyValue, Text | Schema: The variableMeasured property can indicate (repeated as necessary) the variables that are measured in some dataset, either described as text or as pairs of identifier and description using PropertyValue. Bioschemas: What does the dataset measure? (e.g., temperature, pressure). | MANY | | |
version | Number, Text | Schema: The version of the CreativeWork embodied by a specified resource. Bioschemas: The version number for this dataset. | ONE | | |
Marginality: Optional. | | | | | |
dateCreated | Date, DateTime | Schema: The date on which the CreativeWork was created or the item was added to a DataFeed. | | | |
dateModified | Date, DateTime | Schema: The date on which the CreativeWork was most recently modified or when the item’s entry was modified within a DataFeed. | | | |
datePublished | Date | Schema: Date of first broadcast/publication. | | | |
hasPart | CreativeWork, Trip | Schema: Indicates an item or CreativeWork that is part of this item, or CreativeWork (in some sense). Inverse property: isPartOf | | | |
isAccessibleForFree | Boolean | Schema: A flag to signal that the item, event, or place is accessible for free. Supersedes free. | | | |
isPartOf | CreativeWork, URL | Schema: Indicates an item or CreativeWork that this item, or CreativeWork (in some sense), is part of. Inverse property: hasPart | | | |
maintainer | Organization, Person | Schema: A maintainer of a Dataset, software package (SoftwareApplication), or other Project. A maintainer is a Person or Organization that manages contributions to, and/or publication of, some (typically complex) artifact. It is common for distributions of software and data to be based on “upstream” sources. When maintainer is applied to a specific version of something e.g. a particular version or packaging of a Dataset, it is always possible that the upstream source has a different maintainer. The isBasedOn property can be used to indicate such relationships between datasets to make the different maintenance roles clear. Similarly in the case of software, a package may have dedicated maintainers working on integration into software distributions such as Ubuntu, as well as upstream maintainers of the underlying work. | MANY | | |
publisher | Organization, Person | Schema: The publisher of the creative work. | | | |
sameAs | URL | Schema: URL of a reference Web page that unambiguously indicates the item’s identity. E.g. the URL of the item’s Wikipedia page, Wikidata entry, or official website. | | | |
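
One way to use the marginality levels in the table is to check existing markup for gaps. The sketch below is a minimal illustration, not an official Bioschemas validator: the property lists are copied from the Minimum and Recommended rows, while the helper names (`is_present`, `missing_properties`), the rule that empty values count as absent, and the example markup are all assumptions made for this example. It does not check expected types or cardinality.

```python
from typing import Any, Dict, List

# Property names copied from the profile table, grouped by marginality.
MINIMUM = ("description", "identifier", "keywords", "license", "name", "url")
RECOMMENDED = (
    "alternateName", "citation", "creator", "distribution",
    "includedInDataCatalog", "isBasedOn", "measurementTechnique",
    "variableMeasured", "version",
)


def is_present(value: Any) -> bool:
    """Assumption: None, empty strings and empty containers count as absent."""
    return value is not None and value != "" and value != [] and value != {}


def missing_properties(markup: Dict[str, Any]) -> Dict[str, List[str]]:
    """List the Minimum and Recommended properties the markup lacks."""
    return {
        "minimum": [p for p in MINIMUM if not is_present(markup.get(p))],
        "recommended": [p for p in RECOMMENDED if not is_present(markup.get(p))],
    }


# Placeholder markup: no identifier, and an empty keywords value.
example = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Placeholder dataset",
    "description": "Placeholder summary.",
    "url": "https://example.org/datasets/0000",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": "",
}

report = missing_properties(example)
print("Missing minimum properties:", report["minimum"])
print("Missing recommended properties:", report["recommended"])
```

A publisher might treat missing Minimum properties as errors and missing Recommended properties as warnings; that policy is a design choice for the calling code and is not prescribed by the table.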