Towards better findability: Bioschemas meets Schema.org
Published: 07 July 2021
On 7 July 2021, Schema.org included five Bioschemas types for representing life sciences data in its latest release. This update signifies an important step towards developing web markup in the life sciences. It will improve the findability of many web resources in the domain and ultimately translate into more efficient research.
Schema.org is an international community effort, with Google, Microsoft, Yahoo and Yandex adopting its vocabulary in their applications and search engines. It provides technical specifications that let major search engines distinguish and index different data objects on the internet.
In the life sciences, Bioschemas takes the lead, encouraging experts to use Schema.org markup on their websites. By extending Schema.org to cover labels specific to the life sciences, Bioschemas makes content easier for machines to process and easier to discover.
Recognising the value of Bioschemas’ work, the latest release of the Schema.org vocabulary includes types for providing high-level descriptions of genes, proteins, molecular entities, chemical substances and taxa.
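To give a sense of what such markup can look like in practice, the following is a minimal Python sketch that builds a JSON-LD description using the Schema.org Gene type and wraps it in the script tag commonly used to embed JSON-LD in a web page. The gene name, identifier and URL are hypothetical placeholders, and the helper functions `gene_markup` and `as_script_tag` are illustrative names rather than part of any Bioschemas or Schema.org tooling. Only the generic `name`, `identifier` and `url` properties are used here; a real deployment would likely follow the relevant Bioschemas recommendations and include more properties.

```python
import json


def gene_markup(name, identifier, url):
    """Return a JSON-LD string describing a gene with the Schema.org Gene type."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Gene",
        "name": name,
        "identifier": identifier,
        "url": url,
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical placeholder values, not a real gene record.
example = gene_markup(
    name="EXAMPLE1",
    identifier="example:0001",
    url="https://example.org/gene/EXAMPLE1",
)
print(as_script_tag(example))
```

A page carrying this snippet in its HTML exposes the gene description to any crawler that reads Schema.org markup, without changing what human visitors see.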
Although these have just been added to Schema.org, markup with these types was already extensively used in the life sciences, supported by Bioschemas’ endorsement. For example, sites such as Ensembl, HGNC, STRING, DisProt and GBIF already use it. In fact, Bioschemas recognises a long list of known deployments – including 70 different life sciences sites.
‘This is a fantastic achievement for the Bioschemas community and a major milestone for us. It will be the catalyst for further adoption of Bioschemas markup within resources, leading to them being more findable in a web-based context. It will also enable further innovation in the use of that markup. This would not have been possible without the seed funding from ELIXIR to support this grassroots initiative.’
Alasdair Gray, Chair of the Bioschemas Steering Council
Bioschemas creates new recommendations and enhances existing domain ontologies, which provide a formal and standardised naming of data categories, properties and relations. The approach also facilitates compliance with the FAIR (findable, accessible, interoperable and reusable) principles, a set of broadly recommended practices to improve the reusability of research data.
Over several years, constant collaboration within the Bioschemas community has been key to developing these types, alongside ELIXIR’s support. With funding, coordination and knowledge-sharing on governance strategies, ELIXIR played a critical role in establishing and sustaining the open and global Bioschemas community.
This article is also posted on the ELIXIR website