
DataCatalog DRAFT Specification v. 0.1

Bioschemas specification for describing data repositories and data catalogs in the life-sciences.


Contributors

The following people have been involved in the creation of this specification document. They are all members of the Data Repositories group.

Group Leader(s)

Henning Hermjakob

Other team members

Description

A guide for how to describe data catalogs/repositories in the life-sciences using Schema.org-like annotation.


Schema.org hierarchy

This is a new Profile that fits into the schema.org hierarchy as follows:

Thing > CreativeWork > DataCatalog



Key to specification table

Schema.org properties whose Expected Types have been changed, and new (i.e., Bioschemas-created) properties/types, are green.

Schema.org properties/types are red.

Pending Schema.org properties/types are blue.

External (i.e., from 3rd party ontology) properties/types are black.


CD = Cardinality


Property | Expected Type | Description | CD | Controlled Vocabulary
Marginality: Minimum
description | Text | Schema: A description of the item. | ONE |
keywords | Text | Schema: Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas. Bioschemas: Use terms from Controlled Vocabularies where possible. | ONE |
provider | Organization, Person | Schema: The service provider, service operator, or service performer; the goods producer. Another party (a seller) may offer those services or goods on behalf of the provider. A provider may also serve as the seller. Bioschemas: Contact information for this data repository/catalog. | MANY |
name | Text | Schema: The name of the item. | ONE |
rdf:type | URL | Bioschemas: This is used by validation tools to identify the profile used. You must use the value specified in the Controlled Vocabulary column. | ONE | Missing!
url | URL | Schema: URL of the item. | ONE |
Marginality: Recommended
alternateName | Text | Schema: An alias for the item. | MANY |
citation | CreativeWork, Text | Schema: A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. | MANY |
dateModified | Date, DateTime | Schema: The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed. Bioschemas: The date on which the data catalog/repository was most recently modified. | ONE |
dataset | Dataset | Schema: A dataset contained in this catalog. Inverse-property: includedInDataCatalog. | MANY |
identifier | PropertyValue, Text, URL | Schema: The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details. Bioschemas: Unique identifier for the data catalog. | ONE |
license | CreativeWork, URL | Schema: A license document that applies to this content, typically indicated by URL. | ONE |
publication | PublicationEvent | Schema: A publication event associated with the item. | MANY |
sourceOrganization | Organization | Schema: The Organization on whose behalf the creator was working. | MANY |
Marginality: Optional
datePublished | Date | Schema: Date of first broadcast/publication. | ONE |
fileFormat | Text, URL | Schema: Media type, typically MIME format (see IANA site) of the content e.g. application/zip of a SoftwareApplication binary. In cases where a CreativeWork has several media type representations, 'encoding' can be used to indicate each MediaObject alongside particular fileFormat information. Unregistered or niche file formats can be indicated instead via the most appropriate URL, e.g. defining Web page or a Wikipedia entry. | MANY |
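
As an illustration of the table above, the following Python sketch builds a JSON-LD record carrying the Minimum properties and checks their presence and cardinality. It is only a sketch under stated assumptions: the catalog values, the example.org URL, and the helper name check_minimum are invented for illustration, and rdf:type is mapped onto the JSON-LD keyword @type with the Schema.org DataCatalog type as a placeholder, because the Controlled Vocabulary value for rdf:type is still missing from this draft.

```python
import json

# Minimum properties and their cardinality (CD), copied from the table above.
# Assumption: "rdf:type" is expressed through the JSON-LD keyword "@type".
MINIMUM = {
    "description": "ONE",
    "keywords": "ONE",
    "provider": "MANY",
    "name": "ONE",
    "@type": "ONE",
    "url": "ONE",
}


def check_minimum(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for prop, cardinality in MINIMUM.items():
        if prop not in record:
            problems.append(f"missing minimum property: {prop}")
        elif cardinality == "ONE" and isinstance(record[prop], list):
            problems.append(f"{prop} has cardinality ONE but holds a list")
    return problems


# Hypothetical catalog; every value below is invented for illustration.
catalog = {
    "@context": "https://schema.org",
    # Placeholder: the profile's required rdf:type value is not yet specified.
    "@type": "DataCatalog",
    "name": "Example Protein Archive",
    "description": "A hypothetical repository of protein datasets.",
    # Schema.org keywords: multiple entries in one comma-delimited string.
    "keywords": "proteomics, mass spectrometry",
    "url": "https://example.org/catalog",
    # provider has cardinality MANY, so a list is used here.
    "provider": [
        {"@type": "Organization", "name": "Example Institute"},
    ],
}

print(check_minimum(catalog))
print(json.dumps(catalog, indent=2))
```

The check covers only the Minimum marginality level; Recommended and Optional properties could be handled the same way by extending the dictionary, with missing entries reported as warnings rather than errors.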
