Plazi and Pensoft launch an initiative to provide access to scholarly published data about Coronavirus hosts

Read the EurekAlert! and Knowledge Speak releases.

The COVID-19 pandemic presumably started with the escape of the coronavirus from its bat host to humans. To understand the original host, it is important to have access to relevant scientific knowledge about these organisms. The scientific results from charting the world’s biodiversity reside in a vast corpus, which is often “imprisoned” by paywalls and copyright laws, or trapped in formats unfavorable to text and data mining. For the majority of the world’s species, there exist only one or a few articles that describe the species or add further observations. Even for well-known groups such as birds and mammals, access to primary taxonomic literature requires extensive and time-consuming specialist searches. Bats, suspected hosts of SARS-CoV-2 and other viruses such as Ebola, are particularly poorly covered in the Catalogue of Life and ITIS, and most taxonomic information is locked within commercial closed-access books and scholarly articles.

The current COVID-19 pandemic is also just one of the many occasions in which rapid access to all possible data is crucial. There is already evidence for a possible escape of SARS-like coronaviruses from bats to humans. Potential hosts include a variety of animals, such as pangolins, bats, snakes and civets. The evidence supporting these claims spans from the early 2000s up to papers published shortly after the Wuhan outbreak (Li et al. 2005, Menachery et al. 2015, Hou et al. 2017, Zhou et al. 2020, Lam et al. 2020). Nonetheless, no dedicated large-scale study on potential hosts has been carried out, nor have efforts been made to mine data and compile the taxonomic information available for these known reservoirs.

For that reason, and in alignment with the recently announced DiSSCo and CETAF COVID-19 Task Force, intended to create an efficient network of taxonomists, collection curators and other experts from around the globe, Plazi together with Pensoft is launching an initiative to make taxonomic and other biological trait data about the hosts or vectors of SARS-CoV-2 and other coronaviruses broadly accessible. We will locate and acquire publications relating to the virus’s hosts and deposit them in a newly formed Coronavirus-Host Community, a repository hosted on the Zenodo platform, which will provide persistent open access to these publications, enhanced with taxonomy-specific data derived from the sources through text and data mining processes. Data currently available in the Biodiversity Literature Repository can be accessed here and will be shared with the Coronavirus-Host Community.

The liberated data are open access, will feed automatically into GBIF (see example), and can be reused through the APIs (see the Ocellus example).
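As a rough illustration of what reuse through an API could look like, the sketch below queries the public Zenodo records search endpoint, on the assumption that the community's deposits are reachable there like any other Zenodo community. The community identifier `example-community` is a placeholder, not the actual identifier of the Coronavirus-Host Community, and the function names are hypothetical helpers introduced only for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Zenodo search endpoint for published records.
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def build_search_url(community, query="", size=10):
    """Build a search URL restricted to one Zenodo community.

    `community` is the community identifier; the value used in the
    usage note is a placeholder, not the real Coronavirus-Host one.
    """
    params = {"communities": community, "size": size}
    if query:
        params["q"] = query
    return f"{ZENODO_RECORDS_API}?{urlencode(params)}"


def summarize_hits(payload):
    """Return (doi, title) pairs from a Zenodo search response.

    Assumes the usual response layout, with records listed under
    payload["hits"]["hits"]; missing fields become empty strings.
    """
    records = payload.get("hits", {}).get("hits", [])
    summary = []
    for record in records:
        metadata = record.get("metadata", {})
        doi = record.get("doi") or metadata.get("doi", "")
        summary.append((doi, metadata.get("title", "")))
    return summary


def fetch_records(community, query="", size=10):
    """Fetch and summarize records (performs a network request)."""
    url = build_search_url(community, query=query, size=size)
    with urlopen(url, timeout=30) as response:
        return summarize_hits(json.load(response))
```

Calling `fetch_records("example-community", query="Rhinolophus")` with the real community identifier substituted in would return DOI and title pairs for matching deposits. Building the URL and parsing the response are kept as separate functions so that each can be checked without network access.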

Contributions can be made at various levels, from suggesting articles to be added to the public Zotero bibliographic libraries on virus-host associations and on hosts’ taxonomy (such as bats, pangolins, snakes and others), to helping convert and FAIRize these articles. If you’re interested in collaborating, please email us at covihost@plazi.org.