
Bioschemas, what and why?

In this tutorial you will learn what Bioschemas is, what value it adds to schema.org, and what its main elements are.

Keywords: schemaorg, markup, structured data, bioschemas

Topics:

Audience:

  • People interested in introductory information to Bioschemas

Authors:

Contributors:

License: CC-BY 4.0

Version: 2.0

Last Modified: 17 February 2021


◀ Previous tutorial: Schema.org markup examples | Next tutorial: How to select the right profile

What is Bioschemas?

Bioschemas is a community project built on top of schema.org, aiming to improve interoperability in Life Sciences so resources can better communicate and work together by using a common markup on their websites.

Using schema.org markup on web pages enables the generation of ‘info box’ summaries in typical web search results pages, as exemplified by Google search results. Bioschemas aims to make similar summaries possible, but focused on Life Science resources such as Proteins, Samples, Beacons, Tools, Training, Life Science events and so on.
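
As a minimal sketch of what such markup can look like, the Python snippet below builds a schema.org Event description as JSON-LD and wraps it in the script tag that a web page would embed. The event name, dates and URL are invented for illustration, and the helper name to_jsonld_script is hypothetical; only the @context, @type and property names come from schema.org.

```python
import json

# Hypothetical training event; all values are placeholders for illustration.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Introduction to Bioschemas",
    "startDate": "2021-03-01",
    "endDate": "2021-03-02",
    "location": {"@type": "Place", "name": "Online"},
    "url": "https://example.org/events/intro-bioschemas",
}


def to_jsonld_script(markup: dict) -> str:
    """Wrap a JSON-LD dictionary in the script tag used to embed it in HTML."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(event))
```

The printed block can be pasted into the HTML of the page that describes the event, where harvesters and search engines that understand JSON-LD can read it.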

Imagine an insulin summary appearing within search results, but rather than pointing to Wikipedia, that summary would direct one to specialized resources such as Orphanet or CATH as seen in Figure 1. In this way you would get a quick overview while also being provided links to relevant resources, all in one search.

Figure 1. Insulin summary on a search engine

What are the benefits of Bioschemas?

Bioschemas inherits the benefits from schema.org, i.e., enabling machines to understand what your metadata is in advance, making it easier to find, integrate, and re-use your data. It also brings some benefits tailored to the Life Sciences community. In Figure 2, you can find a graphical summary of such benefits, which are explained in more detail in the paragraphs below.

Figure 2: Event profile provided by Bioschemas for the Event type in schema.org
Warning: Schema.org provides only 'types', while Bioschemas provides 'types' and 'profiles'. A profile is a customisation of a type, including important guidelines on how to use it within the Life Sciences domain. A profile can be used to define the semantics of a particular property, the valid value(s) and ranges that may be attributed to that property, and the cardinality with which that property may appear.
Disclaimer: Initially, Bioschemas types were developed with the aim to eventually mature those types and have them integrated for direct use in schema.org. While this remains desirable, it is not essential; community tools and resources are being developed to directly harvest this markup, and there are activities in progress to migrate Bioschemas markup from individual resources to the EOSC (European Open Science Cloud).
  • Bioschemas focuses on key properties prioritised as Minimum, Recommended and Optional based on community agreements and common practices
Info:
    • Minimum properties should be provided
    • Recommended properties should be provided whenever possible and available
    • Optional properties could be omitted unless important or relevant for your resource

e.g., for the Event case shown in Figure 2, endDate and location are minimum while organizer is recommended.
Reminder: a property helps you describe your resource
  • Bioschemas provides additional recommendations regarding property cardinality
Info: A property expects ONE or MANY elements
e.g., for the Event case, endDate should be ONE while organizer could be MANY
  • Bioschemas customises schema.org types (see previous tutorial) to better support the needs of the Life Sciences community
Info: Event already exists in schema.org. However, Bioschemas has added some new properties; for instance, "prerequisite" is commonly used in Life Sciences to list the skills required to attend an event.
  • Bioschemas reuses terms from well-known ontologies, thus avoiding reinventing the wheel
Info: Tools, a SoftwareApplication profile, recommends using terms from the EDAM ontology to specify, for instance, the expected input and output.
Protein, a BioChemEntity profile, includes some properties that come from well-known ontologies. For instance, associatedWith comes from the Sequence Ontology.
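
The Minimum/Recommended/Optional levels and the ONE/MANY cardinality described above can be sketched as a small check in Python. The profile table below is a hand-written, hypothetical excerpt: it encodes only the Event examples given in this tutorial (endDate and location as minimum, organizer as recommended, endDate as ONE, organizer as MANY). The cardinality of location and the function name check_markup are assumptions made for illustration, so the actual profile should be consulted before relying on any of these values.

```python
# Hypothetical excerpt of an Event profile: property -> (level, cardinality).
EVENT_PROFILE = {
    "endDate": ("minimum", "ONE"),
    "location": ("minimum", "MANY"),  # cardinality assumed for illustration
    "organizer": ("recommended", "MANY"),
}


def check_markup(markup: dict, profile: dict) -> dict:
    """Report missing properties and cardinality violations for some markup."""
    report = {
        "missing_minimum": [],
        "missing_recommended": [],
        "cardinality_errors": [],
    }
    for prop, (level, cardinality) in profile.items():
        if prop not in markup:
            if level == "minimum":
                report["missing_minimum"].append(prop)
            elif level == "recommended":
                report["missing_recommended"].append(prop)
            continue
        # A ONE property must not hold a list with several values.
        value = markup[prop]
        if cardinality == "ONE" and isinstance(value, list) and len(value) > 1:
            report["cardinality_errors"].append(prop)
    return report


markup = {
    "@context": "https://schema.org",
    "@type": "Event",
    "endDate": ["2021-03-02", "2021-03-03"],  # two values for a ONE property
    "organizer": [{"@type": "Organization", "name": "Example Org"}],
}

print(check_markup(markup, EVENT_PROFILE))
```

For this sample markup the report flags location as a missing minimum property and endDate as a cardinality error, while organizer passes because it is allowed to hold MANY values.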
