18.01.2016 10:15

Open Access is the basis for access to our knowledge, not to single PDFs

Open Access in taxonomy is currently being discussed in an unusually long thread on Taxacom, which shows how poorly modern science communication, or rather cyberspace, and its application to our domain are understood in general.

I argue that this whole discussion is misguided. The Internet is not about articles as we knew them in the pre-digital age, and not about a PDF, even though we can send one by email or, in some cases, access it with a mouse click (Open Access).

The Internet is about linking data and building a knowledge management system, or knowledge graph. This goes well beyond the sum of the data in the articles. Paywalls are walls that inhibit building such a network. If we maintain them, we cannot make use of the new properties that the Internet offers us.

Open linked data also allows text and data mining over potentially the entire corpus, not only of taxonomic literature but well beyond.

Taxonomic articles can be very rich in data. Opening them up allows others to look at our contributions in ways we don't. Bibliographic citations make it possible to build citation networks and measure the use of our literature. Taxonomic treatments and the citations they include make it possible to build up the catalogue of life by machine. DNA barcodes and collection codes make it possible to understand who has used specimens from which collection, and where.

A network allows us to enter our knowledge from very different angles, such as a specimen, a location, a collector or a DNA sequence, and to ask questions such as: Who collected at location x in a given period? How widely are species distributed in a given area? Which species is the host of a particular species? All this comes in addition to being able to look at a single treatment, a single article or a single key.
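To make these questions concrete, here is a minimal sketch in Python of how linked records can be entered from different angles. Everything in it is an assumption made for illustration: the records, the identifiers (specimen:1, collector:A, taxon:T1 and so on), the predicate names and the helper functions are all invented, and a real system would use persistent identifiers and a proper graph store rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements, invented for illustration.
TRIPLES = [
    ("specimen:1", "collectedBy", "collector:A"),
    ("specimen:1", "collectedAt", "location:x"),
    ("specimen:1", "collectedIn", 1998),
    ("specimen:1", "identifiedAs", "taxon:T1"),
    ("specimen:2", "collectedBy", "collector:B"),
    ("specimen:2", "collectedAt", "location:x"),
    ("specimen:2", "collectedIn", 2011),
    ("specimen:2", "identifiedAs", "taxon:T2"),
    ("specimen:3", "collectedBy", "collector:A"),
    ("specimen:3", "collectedAt", "location:y"),
    ("specimen:3", "collectedIn", 2003),
    ("specimen:3", "identifiedAs", "taxon:T1"),
    ("taxon:T1", "hasHost", "taxon:H1"),
]


def build_index(triples):
    """Group statements by subject: subject -> {predicate: object}."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return index


def collectors_at(index, location, first_year, last_year):
    """Who collected at a location within a period (inclusive years)?"""
    found = set()
    for props in index.values():
        year = props.get("collectedIn")
        if (
            props.get("collectedAt") == location
            and year is not None
            and first_year <= year <= last_year
            and "collectedBy" in props
        ):
            found.add(props["collectedBy"])
    return sorted(found)


def locations_of(index, taxon):
    """At which locations have specimens of a taxon been collected?"""
    found = set()
    for props in index.values():
        if props.get("identifiedAs") == taxon and "collectedAt" in props:
            found.add(props["collectedAt"])
    return sorted(found)


def host_of(index, taxon):
    """What is the recorded host of a taxon, if any?"""
    return index.get(taxon, {}).get("hasHost")


INDEX = build_index(TRIPLES)
```

With this toy data, collectors_at(INDEX, "location:x", 1990, 2000) answers the first question, locations_of(INDEX, "taxon:T1") the second and host_of(INDEX, "taxon:T1") the third. The point is only that the same linked statements serve all three questions, which a pile of separate PDFs cannot do.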

Many of the elements for this have been developed outside our community, and we just need to make use of them. DOIs, supported by CrossRef and DataCite, are the unit for citing and identifying articles. Persistent identifiers are being used for authors (e.g. ORCID). Solutions adopted in our community exist for specimens (e.g. HTTP URIs in CETAF and among other subscribers to the Bouchout Declaration), barcodes, names (e.g. LSIDs in ZooBank) and treatments (HTTP URIs in Plazi). All of them are deployed and are crystallization points for the big network, because they are used to cross-reference, within the sciences and beyond. Wikidata makes use of PIDs from NCBI, ITIS, GBIF, Plazi and EOL, all of which hold data that originated from published records.
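As a small illustration of how such identifiers become links, the following Python sketch turns the identifiers attached to a record into resolvable web addresses. It assumes only the public resolver prefixes https://doi.org/ for DOIs and https://orcid.org/ for ORCID iDs; the record itself, its identifier values, the example.org address and the function names are hypothetical placeholders, not real entries.

```python
# Resolver prefixes for identifier schemes that are not already HTTP URIs.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
}


def to_uri(scheme, identifier):
    """Return a web address for an identifier of the given scheme."""
    if scheme == "uri":
        # HTTP URIs (e.g. for specimens or treatments) are already links.
        return identifier
    return RESOLVERS[scheme] + identifier


def link_record(record):
    """Map each role in a record to the address of its identifier."""
    return {role: to_uri(scheme, ident) for role, (scheme, ident) in record.items()}


# Hypothetical record: all identifier values are made-up placeholders.
EXAMPLE_RECORD = {
    "article": ("doi", "10.1234/example"),
    "author": ("orcid", "0000-0000-0000-0000"),
    "treatment": ("uri", "http://example.org/treatment/1"),
}

LINKS = link_record(EXAMPLE_RECORD)
```

Once every element of a publication carries such an address, any other database can point to it, which is what makes these identifiers crystallization points.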

But more importantly, we have one of the most advanced publication schemes in the sciences with the Biodiversity Data Journal, ZooKeys and the remainder of the Pensoft journals. These journals deliver more than a PDF or HTML version of the content: at the moment of publication, the data within is pushed either directly to GBIF, EOL or Plazi, or from the latter on to NCBI and Wikidata.


These Open Access journals, paid upfront, are much cheaper than the average, as Daniel listed below. More importantly, any PDF produced now needs somebody to extract the data within: add the names to dedicated databases, extract body lengths as traits, extract observation records, extract images or tables, and extract the treatments and bibliographic records, if we want to make this piece of knowledge accessible within the Internet and open to efficient data mining. So it is not just publishing or access costs that count, but the almost insurmountable costs of reuse that prevent our biodiversity knowledge from becoming part of the global knowledge graph, the cloud, or simply our cultural heritage.

There is no way around Open Access. If we don't do it, and our knowledge is really that relevant, we will pay dearly in the very near future: the big publishers will not only ask huge amounts of money for access to our journals or for producing them OA but, more importantly, will charge ridiculously high access fees for the knowledge base they create from all the data we deliver to them for free. And if you want to do science, you will depend on this access, which most of us will not be able to afford.

So the discussion must be about how we build a knowledge management system that makes us part of the bigger picture. And I think the dire state of biodiversity, under increasing pressure, and the exciting, rapidly developing genomic data make it imperative that we too use state-of-the-art tools to communicate our science and provide access to it - especially as our community is among the leaders in this area in the sciences.