Why do we need BLR to store scholarly taxonomic articles?

December 14, 2018

When Jérôme Constant published his taxonomic article on eurybrachid planthoppers in the European Journal of Taxonomy yesterday, its data immediately became accessible as FAIR, open access data: in TreatmentBank as taxonomic treatments, and in the Biodiversity Literature Repository (BLR) as an article and its figures.

At the same moment, GBIF reuses this data to describe the dataset (that is, the article), to list all its material citations, and to provide the taxonomic treatments for the included species, which is especially valuable for species new to science or for an individual occurrence. Ocellus uses the figures to index journals and provide a novel way to access them. All the bibliographic references have been added to Refbank. Some of the references in the article include a Digital Object Identifier (DOI), but most do not.

In addition to providing access to data that has been made citable, findable and reusable, the article deposit in BLR provides related identifiers for all the data extracted. Each of them resolves to its respective digital copy, i.e. the figure, taxonomic treatment or article.
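As a rough illustration of how such related identifiers can link an article deposit to its extracted parts, here is a minimal Python sketch. It assumes metadata shaped like the `related_identifiers` list of Zenodo's deposit metadata, with `relation` and `identifier` keys. The function name and the DOIs are hypothetical placeholders, not the identifiers of the article discussed here, and the field names should be checked against the current Zenodo documentation.

```python
def build_related_identifiers(figure_dois, treatment_dois):
    """Link an article deposit to the figures and treatments extracted from it.

    Assumption: each entry is a dict with 'relation' and 'identifier' keys,
    and 'hasPart' expresses "this article contains that object".
    """
    related = []
    for doi in list(figure_dois) + list(treatment_dois):
        related.append({"relation": "hasPart", "identifier": doi})
    return related


# Hypothetical placeholder DOIs, for illustration only.
figures = ["10.5281/zenodo.0000001", "10.5281/zenodo.0000002"]
treatments = ["10.5281/zenodo.0000003"]

article_metadata = {
    "title": "Example taxonomic article",  # placeholder title
    "upload_type": "publication",
    "related_identifiers": build_related_identifiers(figures, treatments),
}

for entry in article_metadata["related_identifiers"]:
    print(entry["relation"], entry["identifier"])
```

Each figure or treatment deposit could carry the inverse relation (for example `isPartOf`) pointing back to the article, so that a reader or a machine can navigate in both directions.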

Because only a limited number of the bibliographic references include a DOI, most of them do not resolve to the digital object, which in most cases is a PDF. This is despite the fact that most authors have PDFs of all the articles they cite on their computers.
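The gap described above can be made concrete with a small Python sketch. It assumes each reference is a simple dict with a `title` and an optional `doi` key; that structure, the function name and the sample entries are hypothetical. A reference with a DOI can be turned into a resolvable link through the doi.org resolver, while one without a DOI offers nothing to follow.

```python
DOI_RESOLVER = "https://doi.org/"


def split_references(references):
    """Separate references that can be resolved via a DOI from those that cannot."""
    resolvable = []
    unresolvable = []
    for ref in references:
        doi = ref.get("doi")
        if doi:
            # A DOI becomes a link by prefixing it with the resolver URL.
            resolvable.append((ref["title"], DOI_RESOLVER + doi))
        else:
            unresolvable.append(ref["title"])
    return resolvable, unresolvable


# Hypothetical sample references, for illustration only.
sample = [
    {"title": "Reference A", "doi": "10.5281/zenodo.0000001"},
    {"title": "Reference B"},
    {"title": "Reference C", "doi": None},
]

with_doi, without_doi = split_references(sample)
print(len(with_doi), "resolvable;", len(without_doi), "without a DOI")
```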

Wouldn’t it be great if authors made all the referenced articles accessible to everybody by creating a DOI for those that are not yet accessible? This would save subsequent readers of their work a tremendous amount of time, since they would not have to find, digitize, or copy the articles again. It would raise the number of taxonomic articles and treatments accessible for use in Wikicite and Wikidata. It would contribute towards building corpora of knowledge for taxa for which the entire published record is digital.

BLR offers exactly such a service. It is part of Zenodo, one of the world’s leading repositories, especially for long tail scientific results that have no other home. This includes data that is underused because it is cumbersome to access, such as the millions of scientific figures or the taxonomic treatments that go largely unnoticed. Zenodo is operated by CERN, home to some of the largest science experiments, and runs on a very powerful and sustainable IT infrastructure.

Zenodo is not unique in the way it makes its deposits discoverable and citable. It uses DataCite data types and digital object identifiers (DOIs), which are a global standard and are becoming increasingly discoverable through a collaboration with CrossRef and ORCID that will be discussed early in the coming year.

Contributors to Zenodo can upload records individually or in batches. Ideally, though, access to scientific articles is provided by publishers, libraries, national services, or services like the Biodiversity Heritage Library or Archive.org. The challenge is that only an estimated 10% of the legacy literature is digital so far, and that no active program is scanning at a scale that will cover the remaining 90%. For that reason, alternatives like BLR are crucial.

Access to a PDF and the minting of a DOI are not the unique value BLR offers. Its unique value is the multitude of access points it provides to the articles: the focus on dissemination, findability and citability of the data within an article, and the provision of copies of the article in machine-readable formats that carry all these links.

Finally, having domain-specific corpora of articles in one place will make it easier to prioritize journals for processing so that their data are included too, because the respective digital copies are at hand. This ongoing project and service is a contribution to open science supported by Arcadia, Plazi and Zenodo.