What is Bioschemas?
Bioschemas aims to improve the Findability on the Web of life sciences resources such as datasets, software, and training materials. It does this by encouraging people in the life sciences to use Schema.org markup in their websites so that the sites can be indexed by search engines and other services. Bioschemas also encourages consistent use of that markup, so that it is easier to consume across many sites. The resulting structured information makes it easier to discover, collate, and analyse distributed resources.
Bioschemas is making two main contributions:
- Proposing new types and properties to Schema.org to allow for the description of life science resources.
- Defining usage profiles over the Schema.org types that identify the essential properties to use in describing a resource.
Recognition of Bioschemas
Use of Bioschemas to make resources more discoverable has been endorsed by the European Research Council in their Open Research Data and Data Management Plans policy ('metadata' section, page 11). Including Bioschemas markup in a resource's metadata means that you meet some of the Findability criteria of the FAIR Data Principles.
The use of Bioschemas markup is also recommended by the International Society for Biocuration in order to help make resources more discoverable.
Schema.org is a community effort supported by the main search engines, and is already widely implemented across the web.
Schema.org provides a way to add semantic markup to web pages. It describes ‘types’ of information, which then have ‘properties’. The types are things that we can talk about and the properties are the things that we can say about the type. For example, Event is a type that has properties like startDate, endDate, and description.
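To make the type and property idea concrete, here is a minimal sketch in Python that builds markup for an Event and wraps it for embedding in a web page. It assumes JSON-LD as the serialisation (one of the formats Schema.org supports). The event details and the helper name to_jsonld_script are invented for illustration only.

```python
import json

# Hypothetical event details, invented purely for illustration.
event = {
    "@context": "https://schema.org",
    "@type": "Event",  # the type: the thing we are talking about
    # The properties: the things we can say about the type.
    "name": "Example life sciences workshop",
    "description": "A placeholder description of the event.",
    "startDate": "2024-05-01",
    "endDate": "2024-05-03",
}


def to_jsonld_script(markup: dict) -> str:
    """Wrap a markup dictionary in the script tag used to embed JSON-LD in HTML."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(event))
```

The printed snippet can be placed in the HTML of the page that describes the event, where search engines and other services can read it.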
Where types or properties needed in the life sciences are missing, Bioschemas develops proposals for new types and properties to be included in Schema.org.
To simplify the marking up of web resources, and to provide consistency of markup within the life sciences community, Bioschemas is defining profiles over types that state which properties must be used (minimum), should be used (recommended), and could be used (optional). The profiles also state the cardinality of each property, that is, whether it takes one value or many, and identify domain ontologies to use for property values.
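The way a profile constrains markup can be sketched in a few lines of Python. This is an illustrative toy and not Bioschemas tooling: the Profile class, the check function, and the example profile contents are all assumptions made up for this sketch, and they do not reproduce any real Bioschemas profile.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """A toy profile: property names grouped by how strongly they are required."""
    type_name: str
    minimum: set[str] = field(default_factory=set)
    recommended: set[str] = field(default_factory=set)
    optional: set[str] = field(default_factory=set)
    # Properties limited to a single value; all others may hold many values.
    single_valued: set[str] = field(default_factory=set)


def check(markup: dict, profile: Profile) -> dict[str, list[str]]:
    """Report missing properties and cardinality violations for one piece of markup."""
    present = {key for key in markup if not key.startswith("@")}
    return {
        "missing_minimum": sorted(profile.minimum - present),
        "missing_recommended": sorted(profile.recommended - present),
        "cardinality_errors": sorted(
            prop
            for prop in profile.single_valued & present
            if isinstance(markup[prop], list)
        ),
    }


# Hypothetical profile contents, invented for illustration only.
example_profile = Profile(
    type_name="Event",
    minimum={"name", "startDate"},
    recommended={"description", "endDate"},
    optional={"location"},
    single_valued={"startDate", "endDate"},
)

markup = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example workshop",
    "startDate": ["2024-05-01", "2024-05-02"],  # two values where one is allowed
}

report = check(markup, example_profile)
print(report)
```

In this sketch, missing minimum properties would be treated as errors and missing recommended properties as warnings, while optional properties are never reported.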
For example, the schema.org/Dataset type has over 100 properties available to use. The Bioschemas profile over Dataset brings this down to a more manageable number, with 5 mandatory properties and 8 recommended properties. Many of the other properties have little relevance for a Dataset. Using the dataset markup properties that Bioschemas specifies as mandatory will also make the marked-up datasets findable by Google's Dataset Search tool.
The Bioschemas community are defining profiles over relevant existing Schema.org types, e.g. DataCatalog, Course, and SoftwareApplication, and over the new types being defined for the life sciences, e.g. Gene, Protein, and Taxon.
The Bioschemas Community have received funding through the ELIXIR-EXCELERATE grant and ELIXIR Implementation Studies. Full details of funding can be found on our funding page.