The Swiss-based Plazi NGO has received a grant of EUR 1.5 million from Arcadia – a charitable fund of Lisbet Rausing and Peter Baldwin – to further develop its Biodiversity Literature Repository (BLR) established in collaboration with Zenodo, the open science repository hosted and managed by the European Organization for Nuclear Research (CERN), and the open-access scholarly publisher and technology provider Pensoft.
The Arcadia-supported project helps rediscover known biodiversity by liberating taxonomic treatments, material citations and images trapped in scholarly biodiversity publications, and making them FAIR and open. The project engages the community in the huge and decisive challenge to understand and preserve the biodiversity of our planet.
Our knowledge about biodiversity is largely imprisoned in a corpus of more than 500 million pages of scientific research publications that is growing daily. Many of these publications are only available in print, and others are PDFs behind a paywall. These data are not FAIR; they are not findable, accessible, interoperable, or reusable. They cannot be linked to new digital resources such as gene sequences, citizen science observations, taxonomic names, or specimens of digitized natural history collections. Extracting and using text and data from such PDFs comes at very high cost, if possible at all.
Through its TreatmentBank production service, Plazi is a leader in providing access to biodiversity data liberated from publications. Thanks to the Arcadia support and in collaboration with Pensoft, Zenodo and the Swiss Institute of Bioinformatics Literature Services (SIBiLS), Plazi provides access to over 750,000 taxonomic treatments, 450,000 figures and over 1.1 million material citations from over 53,000 publications in the BLR. Ian Engelbrecht from the South African National Biodiversity Institute highlights the value of this service: “Reliable, accessible resources for taxonomic data are scarce, and most online resources provide an interpretation of the scientific literature made by the people who built them. TreatmentBank and the BLR are different in that they go straight to the source, providing the data in a dynamic, accessible format exactly as in the original publications.”
“Having digital access to previously published species hypotheses in structured ways such as through TreatmentBank makes taxonomic research much more reproducible. Furthermore, this digital access to knowledge in a single portal informs new research in many ways as well as encourages and accelerates biodiversity/species discovery,” points out Torsten Dikow, Curator at the Smithsonian Institution (USA).
Published research data are among the best-curated data available. Linking extracted research data and connecting infrastructures so that researchers can access services across the data lifecycle is now part of the recently funded EU Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL). Together with 15 European and world-level research infrastructures, Plazi is a key participant in BiCIKL.
“Services provided by Plazi to liberate data from the precious legacy of generations of nature explorers are globally unique, given the level of automation and detail they provide,” says the BiCIKL coordinator and Pensoft founder Prof. Lyubomir Penev. “We should strive to radically change the way we publish new data and narratives, so that these can immediately become FAIR, saving the costs and efforts of their extraction and liberation.”
TreatmentBank and BLR are also integrated into the Swissuniversities-funded project eBioDiv to provide access to data about specimens in the Swiss Natural History collections.
In the previous Arcadia-funded project (2017-2020), Plazi built a now widely used infrastructure. This work included creating terminology to describe taxonomic treatments and material citations, both of which are fundamental to communicating biodiversity data, and making the Zenodo repository highly customizable. The infrastructure is now also implemented at the Global Biodiversity Information Facility (GBIF), where Plazi is the major data provider for over 90,000 species.
“GBIF data is greatly improved by the data flow provided by Plazi. Plazi liberates important data that is critical to the 64 member nations of the GBIF network as they work to provide answers to their biodiversity policy needs,” says Joe Miller, GBIF Executive Secretary.
With the help of the current award, Plazi will focus on liberating more data from a wider array of taxonomic journals. In collaboration with Data Futures, Plazi will also develop new services that enable the broader community to enrich and curate liberated data as part of their research and to preserve the annotations for the long term. Services and products to visualize and analyze target data, as well as metrics for measuring scientific output, will be provided. A series of joint training courses and accompanying training materials is also planned.
The open access to the liberated data will also serve as the basis for an analysis of the impact of the Bouchout Declaration on Open Biodiversity Knowledge Management, launched in 2014 and signed by more than 90 institutions and 200 individuals, to be presented at a conference in 2024.
To participate in the project or for further questions, please email Donat Agosti, President at Plazi.