Bioschemas: marking up biodiversity websites to improve data discovery and web-scale integration

Date: 10 March 2021

Venue: Zoom

Contacts:

Franck Michel

Meeting page: TDWG Webinar

Slides

Although major aggregators such as GBIF are very successful in gathering data from multiple sources, simple websites (HTML without semantic markup) remain the most common way of sharing scientific data at low cost. To help search engines improve the findability, ranking and summarization of such pages, it is now common practice to annotate web pages with structured, semantic metadata using the Schema.org vocabulary. The Bioschemas community aims to extend Schema.org to support markup for life sciences websites. Its biodiversity group has proposed the Taxon type to support the annotation of any web page denoting taxa, the TaxonName type to support more specifically the annotation of taxonomic name registries, and guidelines describing how to leverage existing vocabularies such as Darwin Core or Wikidata. This is just a start. We wish to encourage the biodiversity community to adopt this practice and engage in the discussion about possible new terms related, e.g., to traits or collections.
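As a rough illustration of what such markup can look like, the sketch below builds a JSON-LD description of a taxon page and wraps it in the script element that is typically embedded in the HTML of the page. It is a minimal sketch, not the official Bioschemas profile: the helper names (taxon_jsonld, as_script_tag), the example.org URLs and the choice of properties (name, taxonRank, parentTaxon, sameAs, plus the Darwin Core term dwc:scientificName) are assumptions made for illustration, and the actual Taxon and TaxonName profiles should be checked for the recommended properties.

```python
import json


def taxon_jsonld(name, rank, parent_name, page_url, same_as):
    """Build a JSON-LD dictionary describing one taxon page.

    The property selection is an illustrative assumption, not the
    authoritative Bioschemas Taxon profile.
    """
    return {
        # Schema.org context, extended with a prefix for Darwin Core terms.
        "@context": [
            "https://schema.org",
            {"dwc": "http://rs.tdwg.org/dwc/terms/"},
        ],
        "@type": "Taxon",
        "@id": page_url,
        "name": name,
        "taxonRank": rank,
        "parentTaxon": {"@type": "Taxon", "name": parent_name},
        # Links to the same taxon in other registries (placeholders here).
        "sameAs": same_as,
        # Reuse of an existing vocabulary alongside Schema.org terms.
        "dwc:scientificName": name,
    }


def as_script_tag(doc):
    """Serialize the JSON-LD dictionary as an embeddable script element."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical page and registry URLs, for illustration only.
    doc = taxon_jsonld(
        name="Delphinus delphis",
        rank="species",
        parent_name="Delphinus",
        page_url="https://example.org/taxon/delphinus-delphis",
        same_as=["https://example.org/other-registry/delphinus-delphis"],
    )
    print(as_script_tag(doc))
```

Pasting the printed element into the page for that taxon is usually enough for crawlers that understand JSON-LD to pick up the description. Generating it from the same records that produce the HTML keeps the markup and the visible content in sync.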

We believe that generalizing the use of such markup across the many websites reporting checklists, museum collections, occurrences, etc., as well as project reports often referred to as grey literature, would be a major step towards the development of novel, web-scale biodiversity data integration scenarios. This presentation continues a talk given at TDWG 2020. Please note that, to proceed further, the biodiversity community must now demonstrate its interest in having these terms endorsed by Schema.org, through live markup deployments and the development of applications capable of exploiting this markup data.