Provenance – another look at taxonomic treatments and names

When an ‘ex-banker’ (Tim Robertson) meets an ‘ex-taxonomist’ (Donat Agosti), the result can be very inspiring, especially when they discover that they share far more than either of them expected. So, what is it?

Both share data lineages. The banker has bank accounts that are created, mutated and have a certain value at a given time. The taxonomist has species that are described, mutated and have a certain name at a given time. Both have provenance: a documented lineage of data that records where the data originated, what happened to it and where it moved over time, or a ledger in banker's terms. Provenance makes it possible to trace the history, and in taxonomy it ought to make it possible to reproduce that history.

In banking, each change is documented with bank records, now almost completely automated and well secured. Taxonomy is the contrast: accounts of species are not maintained in registries, and changes are not linked to an electronic registry but are published in parts of scientific publications that cannot even be cited.

In both systems there is a login mechanism. In banking it is a highly secured authentication mechanism to log into the respective bank account. In taxonomy, in contrast, it is a very loose mechanism whereby a transaction has to appear in a scholarly publication as defined by the Codes.

Both have requirements for creating a new account or a new taxon. Whilst an actual money transfer is essential to create an account or mutate its value, certain criteria must be fulfilled to create an available name for a new species or higher taxon, but one is left to believe that these conditions are fulfilled.

Here, then, the difference begins, and it is in fact mostly that taxonomy is many years behind banking in its technological development.

In taxonomy too, each transaction is documented, namely by a taxonomic treatment: a piece of text labelled with the respective accepted taxonomic name and published in a scholarly publication. Together, treatments fill hundreds of millions of pages of taxonomic literature. They often refer to scholarly illustrations and, in scientific tradition, cite earlier usage of the species in the literature, either in a highly implicit form that cites the first author and publication year, or by providing a complete history of subsequent taxonomic treatments.

Citations are typed. They can simply refer to a former usage and add more data; they can accept a change in the name by citing the taxonomic treatment where the change occurred; or they can argue for a change in the name, such as synonymizing a taxon with another or creating a new combination after it has been discovered that the species belongs in a different genus altogether.
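The typed citations described above can be read as entries in a ledger. The following is a minimal sketch in Python, not an existing implementation: the class names, the four citation types and the identifiers t1 to t3 are assumptions made for illustration, and "Aus bus" and "Cus bus" are placeholder names, not real taxa.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class CitationType(Enum):
    # Hypothetical labels for the kinds of citation described in the text.
    REFERS_TO = "refers to a former usage"
    ACCEPTS_CHANGE = "accepts a change made in the cited treatment"
    SYNONYMIZES = "synonymizes the cited taxon with another"
    NEW_COMBINATION = "moves the species to a different genus"


@dataclass
class Citation:
    cited_id: str          # identifier of the cited treatment (assumed)
    kind: CitationType


@dataclass
class Treatment:
    treatment_id: str
    name: str
    year: int
    citations: List[Citation] = field(default_factory=list)


def lineage(treatment_id: str, ledger: Dict[str, Treatment]) -> List[Treatment]:
    """Collect every treatment reachable through citations, oldest first."""
    seen: Dict[str, Treatment] = {}
    stack = [treatment_id]
    while stack:
        tid = stack.pop()
        if tid in seen or tid not in ledger:
            continue  # already visited, or cited treatment not digitized yet
        seen[tid] = ledger[tid]
        stack.extend(c.cited_id for c in ledger[tid].citations)
    return sorted(seen.values(), key=lambda t: t.year)


# Placeholder data: an original description, a later usage, a new combination.
ledger = {
    "t1": Treatment("t1", "Aus bus", 1900),
    "t2": Treatment("t2", "Aus bus", 1950,
                    [Citation("t1", CitationType.REFERS_TO)]),
    "t3": Treatment("t3", "Cus bus", 2000,
                    [Citation("t2", CitationType.NEW_COMBINATION)]),
}

history = [(t.year, t.name) for t in lineage("t3", ledger)]
```

Starting from the most recent treatment, the walk recovers the earlier usages of the name in chronological order, which is the same kind of trace a bank obtains from its transaction records.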

More importantly, the taxonomic treatment refers to the specimens used to create the new species and its name (with the holotype as the most decisive specimen). These references are in most cases implicit, and there is no electronic link to the respective physical object in our natural history collections.

The positive aspect is that we have provenance: a well-documented lineage for all the currently accepted names, held in a very simple but efficient corpus of linked taxonomic treatments. Another positive aspect is that the banks show us how to deal with large volumes of data.

Taxonomists can essentially copy what the banks do and produce a highly automated process to create an electronic version of the catalogue of life (a sort of central bank for taxonomic names). The currency would be the specimens, which can be cited because they have certified digital copies. Whether a new name can be created could be checked against the conditions set by the Codes. An additional effort is needed to digitize all the old records.
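An automated check of this kind could look roughly like the sketch below. It is only an illustration: the three conditions are simplified stand-ins drawn from this text (a scholarly publication, a cited holotype, a certified digital copy), not the actual requirements of the Codes, and all field names and identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameProposal:
    name: str
    publication_id: Optional[str]         # hypothetical identifier of the publication
    holotype_id: Optional[str]            # hypothetical identifier of the holotype
    holotype_digital_copy: Optional[str]  # hypothetical link to a certified digital copy


def unmet_conditions(proposal: NameProposal) -> List[str]:
    """Return the simplified conditions that this proposal does not meet."""
    problems = []
    if not proposal.publication_id:
        problems.append("no scholarly publication")
    if not proposal.holotype_id:
        problems.append("no holotype cited")
    elif not proposal.holotype_digital_copy:
        problems.append("holotype has no certified digital copy")
    return problems


# Placeholder proposals; "Aus bus" and "Aus cus" are not real names.
complete = NameProposal("Aus bus", "pub-1", "specimen-1", "copy-1")
incomplete = NameProposal("Aus cus", "pub-2", None, None)
```

A registry would accept a proposal only when the returned list is empty, much as a bank refuses to open an account without an actual transfer.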

Taxonomic treatments play a central role in this system, similar to the documented transactions in a bank: they allow each step to be recapitulated. Additionally, in taxonomy as a science, they allow the discovery of new species, or of changes, to be reproduced. The treatments summarize all the arguments that a scientist used in this process and that peer review, in many ways, accepted. And in good scientific tradition, taxonomic treatments, together with the treatment citations they include, contain all the information needed to build the catalogue of life.

Formally, most of the technical elements and the legal basis exist and are in operation, both for ongoing publications and for processing the overwhelming corpus of legacy publications. The Arcadia Fund is supporting Plazi, in collaboration with CERN/Zenodo and Pensoft; the European Journal of Taxonomy is collaborating to enhance this approach and make it more popular; and GBIF is a long-term user of taxonomic treatments.