Dataset DRAFT Profile

Version: 0.4-DRAFT (30 March 2021)


If you spot any errors or omissions in this profile, please file an issue in our GitHub repository.


Key to specification table

CD = Cardinality

Fields: Property, Expected Type, Description, CD, Controlled Vocabulary, Example.
Marginality: Minimum.

@context
Expected Type: URL
Description: Used to provide the context (namespaces) for the JSON-LD file. Not needed in other serialisations.
CD: ONE

@type
Expected Type: Text
Description: Schema.org/Bioschemas class for the resource, declared using JSON-LD syntax. For other serialisations, use the appropriate mechanism. Multiple types are permitted, but a single type is preferred.
CD: MANY
Controlled Vocabulary: Schema.org, Bioschemas

@id
Expected Type: IRI
Description: Used to distinguish the resource being described in JSON-LD. For other serialisations, use the appropriate approach.
CD: ONE

dct:conformsTo
Expected Type: IRI
Description: Used to state the Bioschemas profile that the markup relates to. The versioned URL of the profile must be used. This table uses a CURIE, but the markup must use the full Dublin Core term URL (http://purl.org/dc/terms/conformsTo); see example.
CD: ONE
Controlled Vocabulary: Bioschemas profile versioned URL
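The CURIE-to-URL rule for dct:conformsTo can be sketched in Python. This is a minimal illustration, assuming a hand-written prefix map that contains only the `dct` prefix. The profile URL `https://bioschemas.org/profiles/Dataset/0.4-DRAFT` is an assumed value; check the exact versioned URL on the Bioschemas website before using it.

```python
# Assumed prefix map: only the Dublin Core "dct" prefix used in this table.
PREFIXES = {"dct": "http://purl.org/dc/terms/"}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a CURIE such as 'dct:conformsTo' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"Unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


# Assumed profile URL; verify the versioned URL before publishing markup.
PROFILE_URL = "https://bioschemas.org/profiles/Dataset/0.4-DRAFT"

# The markup uses the expanded IRI as the key, never the CURIE itself.
conforms_to = {expand_curie("dct:conformsTo"): {"@id": PROFILE_URL}}
print(conforms_to)
```

Expanding the key in code, rather than declaring a `dct` prefix in `@context`, keeps the markup readable by consumers that do not process custom prefixes.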

description
Expected Type: Text
Schema: A description of the item.
Bioschemas: A short summary describing a dataset.
CD: ONE

identifier
Expected Type: PropertyValue, Text, URL
Schema: The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.
Bioschemas: CURIEs that can be resolved using Identifiers.org should be used.
CD: MANY
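To illustrate the guidance on identifiers, the sketch below wraps a CURIE in a `PropertyValue` and derives an Identifiers.org resolver link of the form `https://identifiers.org/<prefix>:<accession>`. The helper name `identifier_value`, the choice of `propertyID` and `value` fields, and the CURIE `example:0001` are all hypothetical; use a prefix actually registered with Identifiers.org.

```python
def identifier_value(curie: str) -> dict:
    """Wrap a CURIE as a PropertyValue with an Identifiers.org resolver URL."""
    prefix, sep, accession = curie.partition(":")
    if not (sep and prefix and accession):
        raise ValueError(f"Not a CURIE: {curie!r}")
    return {
        "@type": "PropertyValue",
        "propertyID": prefix,  # the CURIE prefix (namespace)
        "value": curie,  # the full CURIE, as the profile recommends
        "url": f"https://identifiers.org/{curie}",  # resolver link
    }


# "example:0001" is a hypothetical CURIE; substitute a registered prefix.
identifier = identifier_value("example:0001")
print(identifier["url"])  # → https://identifiers.org/example:0001
```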

keywords
Expected Type: DefinedTerm, Text, URL
Schema: Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas.
Bioschemas: Keywords should be drawn from a controlled vocabulary, e.g. EDAM, and supplied as a list of DefinedTerm values.
CD: ONE
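A keyword supplied as a `DefinedTerm` might look like the following sketch. It assumes EDAM term IRIs of the form `http://edamontology.org/<term id>` and uses the Schema.org `DefinedTerm` properties `name`, `termCode` and `inDefinedTermSet`. The helper `edam_keyword` is hypothetical, and the term shown (`topic_0091`, Bioinformatics) should be confirmed in an EDAM browser before use.

```python
EDAM_BASE = "http://edamontology.org"  # assumed EDAM namespace


def edam_keyword(name: str, term_id: str) -> dict:
    """Build a schema.org DefinedTerm for one EDAM term."""
    return {
        "@type": "DefinedTerm",
        "name": name,  # human-readable label
        "termCode": term_id,  # local term identifier
        "inDefinedTermSet": EDAM_BASE,  # the vocabulary the term belongs to
        "url": f"{EDAM_BASE}/{term_id}",  # full term IRI
    }


# Illustrative term; check the identifier against EDAM.
keywords = [edam_keyword("Bioinformatics", "topic_0091")]
```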

license
Expected Type: CreativeWork, URL
Schema: A license document that applies to this content, typically indicated by URL.
Bioschemas: A license under which the dataset is distributed.
CD: ONE

name
Expected Type: Text
Schema: The name of the item.
Bioschemas: A descriptive name of the dataset.
CD: ONE

url
Expected Type: URL
Schema: URL of the item.
Bioschemas: The location of a page describing the dataset.
CD: ONE
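Putting the Minimum properties together, a Dataset record could be built as a Python dictionary and serialised to JSON-LD. Every value here is a hypothetical placeholder, including the `@id`, the identifier and the profile URL. Note that dct:conformsTo is written with the full `http://purl.org/dc/terms/conformsTo` key, as the table requires.

```python
import json

# All values are hypothetical placeholders, including the profile URL.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/ds-001",
    "http://purl.org/dc/terms/conformsTo": {
        "@id": "https://bioschemas.org/profiles/Dataset/0.4-DRAFT"
    },
    "name": "Example dataset",
    "description": "A short summary describing the example dataset.",
    "identifier": "example:ds-001",
    # Bioschemas prefers DefinedTerm objects here; a string keeps the sketch short.
    "keywords": ["example keyword"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/datasets/ds-001",
}

# The serialised text would go inside <script type="application/ld+json">.
markup = json.dumps(dataset, indent=2)
print(markup)
```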
Marginality: Recommended.

alternateName
Expected Type: Text
Schema: An alias for the item.
CD: MANY

citation
Expected Type: CreativeWork, Text
Schema: A citation or reference to another creative work, such as another publication, web page, scholarly article, etc.
Bioschemas: A citation for a publication that describes the dataset.
CD: MANY

creator
Expected Type: Organization, Person
Schema: The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork.
Bioschemas: The name of the dataset creator (person or organization).
CD: MANY

distribution
Expected Type: DataDownload
Schema: A downloadable form of this dataset, at a specific location, in a specific format.
CD: ONE
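A distribution entry is a `DataDownload`. The sketch below assumes the file is described by its direct link (`contentUrl`) and a MIME type (`encodingFormat`), both Schema.org properties; the helper name `data_download` and the file URL are hypothetical.

```python
def data_download(content_url: str, encoding_format: str) -> dict:
    """Describe one downloadable form of a dataset as a schema.org DataDownload."""
    return {
        "@type": "DataDownload",
        "contentUrl": content_url,  # direct link to the file itself
        "encodingFormat": encoding_format,  # MIME type (or a format URL)
    }


# Hypothetical file location.
distribution = data_download("https://example.org/files/ds-001.csv", "text/csv")
```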

includedInDataCatalog
Expected Type: DataCatalog
Schema: A data catalog which contains this dataset. Supersedes includedDataCatalog, catalog. Inverse property: dataset.
CD: MANY

isBasedOn
Expected Type: CreativeWork, Product, URL
Schema: A resource that was used in the creation of this resource. This term can be repeated for multiple sources. For example, http://example.com/great-multiplication-intro.html. Supersedes isBasedOnUrl.
Bioschemas: Use to link a Dataset to the Study that it was generated from.
CD: MANY

measurementTechnique
Expected Type: Text, URL
Schema: A technique or technology used in a Dataset (or DataDownload, DataCatalog), corresponding to the method used for measuring the corresponding variable(s) (described using variableMeasured). This is oriented towards scientific and scholarly dataset publication but may have broader applicability; it is not intended as a full representation of measurement, but rather as a high level summary for dataset discovery.

For example, if variableMeasured is: molecule concentration, measurementTechnique could be: “mass spectrometry” or “nmr spectroscopy” or “colorimetry” or “immunofluorescence”.

If the variableMeasured is “depression rating”, the measurementTechnique could be “Zung Scale” or “HAM-D” or “Beck Depression Inventory”.

If there are several variableMeasured properties recorded for some given data object, use a PropertyValue for each variableMeasured and attach the corresponding measurementTechnique.

CD: MANY
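The advice on pairing each variableMeasured with its measurementTechnique can be sketched as a list of `PropertyValue` objects, reusing the examples from the Schema.org description above. The helper `measured_variable` is hypothetical.

```python
def measured_variable(name: str, technique: str) -> dict:
    """Pair one measured variable with the technique used to measure it."""
    return {
        "@type": "PropertyValue",
        "name": name,  # what is measured
        "measurementTechnique": technique,  # how it is measured
    }


# One PropertyValue per variable, each carrying its own technique.
variables = [
    measured_variable("molecule concentration", "mass spectrometry"),
    measured_variable("depression rating", "Beck Depression Inventory"),
]
dataset_fragment = {"variableMeasured": variables}
```

Attaching the technique to each variable, rather than listing techniques once at the Dataset level, keeps it unambiguous which method produced which variable.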

variableMeasured
Expected Type: PropertyValue, Text
Schema: The variableMeasured property can indicate (repeated as necessary) the variables that are measured in some dataset, either described as text or as pairs of identifier and description using PropertyValue.
Bioschemas: What does the dataset measure? (e.g., temperature, pressure).
CD: MANY

version
Expected Type: Number, Text
Schema: The version of the CreativeWork embodied by a specified resource.
Bioschemas: The version number for this dataset.
CD: ONE
Marginality: Optional.

dateCreated
Expected Type: Date, DateTime
Schema: The date on which the CreativeWork was created or the item was added to a DataFeed.


dateModified
Expected Type: Date, DateTime
Schema: The date on which the CreativeWork was most recently modified or when the item’s entry was modified within a DataFeed.


datePublished
Expected Type: Date
Schema: Date of first broadcast/publication.
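The date properties take Schema.org `Date` or `DateTime` values, which are written as ISO 8601 strings. A minimal sketch using Python's `isoformat()`; the dates themselves are placeholders.

```python
from datetime import date, datetime, timezone


def schema_date(value: date) -> str:
    """Format a date or datetime as an ISO 8601 string."""
    # datetime is a subclass of date, so both are accepted here.
    return value.isoformat()


# Placeholder dates; datePublished uses a plain Date, as in the table.
dates = {
    "dateCreated": schema_date(date(2021, 3, 1)),
    "dateModified": schema_date(datetime(2021, 3, 30, 12, 0, tzinfo=timezone.utc)),
    "datePublished": schema_date(date(2021, 3, 30)),
}
print(dates["dateModified"])  # → 2021-03-30T12:00:00+00:00
```

Giving DateTime values an explicit time zone avoids ambiguity when harvesters compare modification times.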


hasPart
Expected Type: CreativeWork, Trip
Schema: Indicates an item or CreativeWork that is part of this item, or CreativeWork (in some sense). Inverse property: isPartOf.


isAccessibleForFree
Expected Type: Boolean
Schema: A flag to signal that the item, event, or place is accessible for free. Supersedes free.


isPartOf
Expected Type: CreativeWork, URL
Schema: Indicates an item or CreativeWork that this item, or CreativeWork (in some sense), is part of. Inverse property: hasPart.


maintainer
Expected Type: Organization, Person
Schema: A maintainer of a Dataset, software package (SoftwareApplication), or other Project. A maintainer is a Person or Organization that manages contributions to, and/or publication of, some (typically complex) artifact. It is common for distributions of software and data to be based on “upstream” sources. When maintainer is applied to a specific version of something e.g. a particular version or packaging of a Dataset, it is always possible that the upstream source has a different maintainer. The isBasedOn property can be used to indicate such relationships between datasets to make the different maintenance roles clear. Similarly in the case of software, a package may have dedicated maintainers working on integration into software distributions such as Ubuntu, as well as upstream maintainers of the underlying work.
CD: MANY

publisher
Expected Type: Organization, Person
Schema: The publisher of the creative work.


sameAs
Expected Type: URL
Schema: URL of a reference Web page that unambiguously indicates the item’s identity. E.g. the URL of the item’s Wikipedia page, Wikidata entry, or official website.
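As a rough completeness check, the sketch below reports which Minimum and Recommended properties are absent from a record. The property lists are copied from this table, with dct:conformsTo expanded to its full URL. The function name `missing_properties` is hypothetical, and the check looks only at top-level keys; it does not validate value types or cardinality.

```python
# Property lists copied from the Minimum and Recommended sections of the table.
MINIMUM = {
    "@context", "@type", "@id", "http://purl.org/dc/terms/conformsTo",
    "description", "identifier", "keywords", "license", "name", "url",
}
RECOMMENDED = {
    "alternateName", "citation", "creator", "distribution",
    "includedInDataCatalog", "isBasedOn", "measurementTechnique",
    "variableMeasured", "version",
}


def missing_properties(record: dict) -> dict[str, list[str]]:
    """Return the Minimum and Recommended properties absent from a record."""
    # Only top-level keys are inspected; values are not validated.
    return {
        "minimum": sorted(p for p in MINIMUM if p not in record),
        "recommended": sorted(p for p in RECOMMENDED if p not in record),
    }


report = missing_properties({"@type": "Dataset", "name": "Example dataset"})
print(report["minimum"])
```

Optional properties are deliberately not reported, since their absence does not make a record non-conformant.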