04.10.2017 11:52

Measuring the impact of digital specimens: persistent identifiers in scientific publications need to be uniform

Digitizing specimens and measuring the impact of science are increasingly important in Open Science. ICEDIG and DiSSCo, two large new projects, join iDigBio in digitizing specimens in natural history collections in Europe and the US, helping create a global virtual natural history museum. This has the potential to tremendously increase the impact of taxonomic knowledge beyond its own domain, for example in conservation, and to make taxonomy a reproducible science, because all the original data will be accessible from the digitized specimens. CETAF is following suit by implementing persistent identifiers in its member institutions, especially since many of them signed the Bouchout Declaration on Open Biodiversity Knowledge Management (Güntsch et al. 2017).

Semantically enhanced publications, such as the Pensoft journals based on Taxpub/JATS, and citable subarticle elements, such as taxonomic treatments and figures, are a timely development linking publications to the specimens and vice versa. Thus, they serve as an extension of this virtual natural history museum, along with the content from their libraries. Including links to the digitized specimens, especially if persistent unique identifiers are used, allows measuring the usage of specimens, the impact of collections, and that of individual scientists. Moreover, stable PID URLs of specimens enable text mining and their inclusion in the OpenBiodiv Linked Open Data system built on top of the Biodiversity Knowledge Graph. The use of DOIs furthermore allows other actors, such as CrossRef or Altmetrics, to measure the impact of articles.


Figure 1. Example physical herbarium object and its stable HTTP URI identifier. A: collection-specific part of the URL. Source: DOI 10.5281/zenodo.1002026

However, providing access to the digitized specimens in publications, whether semantically enhanced or traditional, requires substantial behavioral changes by authors and publishers. For example, persistent identifiers minted by each natural history institution in its own style (see A in Figure 1), though created with good intentions, are difficult to discover programmatically and to resolve back to their originating institutions. This hinders automated discovery, extraction and reuse. Ideally, institutions should use DataCite DOIs, or their identifiers should share some elements that make them recognizable as belonging to this particular class of identifiers.
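The difference between uniform and institution-specific identifiers can be illustrated with a minimal sketch. It assumes a commonly used approximate pattern for DOIs (not a formal grammar) and a hypothetical registry of institutional URL prefixes; the example.org and example.net addresses and the institution names are placeholders, not real collections.

```python
import re

# Approximate pattern for modern DOIs; an assumption for illustration only.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

# Hypothetical registry: every institution-specific URI style needs its own entry,
# and the registry has to be maintained by whoever does the mining.
INSTITUTION_PREFIXES = {
    "http://herbarium.example.org/object/": "Example Herbarium",
    "https://specimens.example.net/id/": "Example Museum",
}


def classify_identifier(identifier):
    """Return a (kind, detail) tuple describing how the identifier was recognized."""
    # A DOI is recognizable by its shape alone, without knowing who minted it.
    match = DOI_PATTERN.search(identifier)
    if match:
        return ("doi", match.group(0))
    # An institutional URI is only recognizable if its prefix is already known.
    for prefix, institution in INSTITUTION_PREFIXES.items():
        if identifier.startswith(prefix):
            return ("institutional-uri", institution)
    return ("unknown", None)
```

In this sketch a single pattern covers every DOI, whereas an identifier from an institution missing from the registry falls through to "unknown", which is the discovery problem described above.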

Plazi’s efforts to liberate facts from taxonomic publications have enabled us to extract the Digital Object Identifiers (DOIs) of bibliographic references. Over the last three years, an average of 14% of the bibliographic references included DOIs (see stats), which allowed us to add them to the article's metadata in the Biodiversity Literature Repository based at ZENODO (CERN).
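How such a share can be computed is sketched below. This is not Plazi's extraction pipeline; it is a simplified illustration that assumes references are available as plain strings and that an approximate DOI pattern is good enough. The function names and the sample references are made up for the example.

```python
import re

# Approximate pattern for modern DOIs; an assumption for illustration only.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_doi(reference):
    """Return the first DOI found in a reference string, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Drop punctuation that usually belongs to the sentence, not to the DOI.
    return match.group(0).rstrip(".,;)")


def doi_share(references):
    """Return the fraction of references that contain a DOI."""
    if not references:
        return 0.0
    with_doi = sum(1 for ref in references if extract_doi(ref) is not None)
    return with_doi / len(references)


# Made-up sample references, used only to exercise the functions.
sample_references = [
    "Author A (2016) A title. Journal 1: 1-10. https://doi.org/10.1234/example.5678.",
    "Author B (1998) An older title. Journal 2: 11-20.",
]
share = doi_share(sample_references)
```

With the two made-up references, one of which carries a DOI, the computed share is one half.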