TreatmentBank API

What is a treatment?

The Plazi TreatmentBank [1] deals with scientific, published, biosystematic literature: the literature documenting and describing all of the world’s ca. 1.9 million known species, in an estimated corpus of over 500 million published pages. The publications cited in Plazi are all available in the Biodiversity Literature Repository [2] at Zenodo/CERN.

Treatments are well-defined parts of articles that define the particular usage of a scientific name by an author at a given time (the publication) [3]. In other words, each scientific name has one or more treatments, depending on whether only an original description of the species exists or whether there are subsequent re-descriptions. Like bibliographic references, treatments can be cited, and subsequent usages of a name cite earlier treatments.

Treatments are a synthesis of the knowledge of a given species at a given time. They can be very rich in data, whether explicit or implicit, detailed or summarized, and include many references to external data sources, such as scientific names, collection codes, and DNA codes.

The data can be semantically enhanced and linked. Because treatments are parts of publications, they need to be extracted. More recently, treatments are tagged in electronic publications with TaxPub [3], an extension of the National Library of Medicine’s Journal Article Tag Suite (JATS), which allows automatic extraction. Still, the majority of the ca. 2,000 journals and books publishing treatments use the PDF format at best. Plazi has tools to extract treatments, enhance the embedded data, and import it into its SRS Treatment Search Portal for public online access.

The data, that is, treatments and observation data, can be viewed as HTML, XML, or RDF, or harvested with the protocols described below. For harvesting, the data is provided as Darwin Core Archives.

What is a Darwin Core Archive?

The Darwin Core Archive format is a simple and extensible schema for sharing biodiversity data, especially catalogue data, based on the ratified Darwin Core terms and the Darwin Core text guidelines [4]. Darwin Core is a standard, maintained by Biodiversity Information Standards (TDWG), for describing sample data in the biodiversity informatics community; the archive format was developed by the Global Biodiversity Information Facility (GBIF). Darwin Core Archives use a table-based, “spreadsheet-style” format that is comfortable and familiar to biologists. The format uses plain text files, but it is tied to processes that support consistency and stability.

Fig. Schematic representation of a Darwin Core Archive and its components [4]

The GBIF GNA format consists of a set of files in which one (or more) of the files represents the ‘core’ taxonomic data, with a single row representing a single taxon reference. The Darwin Core Taxon class provides most of the concepts supported in the format, enabling taxonomic and nomenclatural semantics and syntax (classification, taxonomic and nomenclatural synonymy, status, etc.) to be expressed.

Other files represent “extensions” to this core table and allow additional data elements to be linked to a taxon in the core table through a many-to-one relationship. The overall topology of one or more of these extensions around the core table is referred to as a “star schema” and provides a compromise between an overly simple flat-file representation of the data and more complex multi-related files. In addition, a descriptor file named “meta.xml” serves as a key to the other files. Collectively, these files can be zipped into a single compressed archive file for portability. This compressed file is known as a Darwin Core Archive (DwCA) file [4].
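Because meta.xml is the key to the star schema, a reader can discover the core file and its extensions without hard-coding file names. The Python sketch below assumes the element names of the Darwin Core text guidelines (core, extension, files/location, id, coreid) in the http://rs.tdwg.org/dwc/text/ namespace; describe_archive and _describe are hypothetical names.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace defined by the Darwin Core text guidelines for meta.xml
DWC_TEXT = "{http://rs.tdwg.org/dwc/text/}"


def _describe(element, id_tag):
    """Summarize one <core> or <extension> element of meta.xml."""
    id_element = element.find(DWC_TEXT + id_tag)
    return {
        "file": element.findtext(f"{DWC_TEXT}files/{DWC_TEXT}location"),
        "rowType": element.get("rowType"),
        # Column holding the row ID (core) or the reference to the core row (extension)
        "idIndex": int(id_element.get("index")) if id_element is not None else None,
    }


def describe_archive(zip_source):
    """Return (core, extensions) as declared in a DwC-A's meta.xml.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    core = _describe(root.find(DWC_TEXT + "core"), "id")
    extensions = [_describe(ext, "coreid") for ext in root.findall(DWC_TEXT + "extension")]
    return core, extensions
```

For a Plazi archive, core["file"] would be expected to be taxa.txt, with the other data files appearing as extensions whose idIndex points at the column referencing taxa.txt.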

The Darwin Core Archive used by Plazi

There is one archive per article stored in Plazi, containing the data from all the treatments in the article. Archives contain nine files:

  • meta.xml: description of the columns in the data files
  • eml.xml: archive metadata, i.e., the bibliographic citation of the article, etc.
  • taxa.txt: the archive core file, containing one row per taxon in the nomenclature section of a treatment, thus one or more rows per treatment; any rows after the first for a given treatment handle synonymizations
  • occurrences.txt: occurrence data, one row per materials citation, with an ID reference to taxa.txt
  • description.txt: description data, one row per descriptive treatment section, with an ID reference to taxa.txt
  • distribution.txt: general distribution data, one row per distribution statement, with an ID reference to taxa.txt
  • media.txt: full-text treatments with HTML markup and additional metadata such as a bibliographic citation, one row per treatment, with an ID reference to taxa.txt
  • references.txt: bibliographic references to individual treatments, one row per treatment, with an ID reference to taxa.txt
  • vernaculars.txt: vernacular names of treatment taxa; currently empty, as we do not have or mark up this kind of data

For a detailed description of the content of each file, see Appendix: Darwin Core Archive Content.

Treatment Data representation in Plazi

The treatment data is stored in the Treatment Search Portal as native, generic XML embedded in the tagged original publications. The tagged elements are (a) additionally stored in dedicated index structures to support search and (b) extracted and exported in several formats, including DwCA.

A treatment document includes two main elements: the header, which contains metadata based on the Metadata Object Description Schema (MODS), and the body.

    tax:taxonx
        tax:taxonxHeader
        tax:taxonxBody

The data XML can be converted via XSLT into HTML, TaxonX XML (a schema developed to model biosystematic legacy literature), and RDF:

HTML: (this is also the persistent HTTP URI used as the identifier for treatments)

Plain XML:

TaxonX XML:

RDF: or

The terms used in TaxonX and RDF are either imported from existing schemas (such as Darwin Core for observation records and MODS for bibliographic data) or, where no existing term is available, defined in schemas (TaxonX) or ontologies (RDF; in development).
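Since a treatment document consists of a tax:taxonx root holding a tax:taxonxHeader (MODS metadata) and a tax:taxonxBody, the two parts can be separated with a few lines of Python. This is a minimal sketch: it matches elements by local name so that it does not have to assume the TaxonX namespace URI, and split_treatment and local_name are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace-uri}' prefix so elements can be matched by local name."""
    return tag.rsplit("}", 1)[-1]


def split_treatment(xml_text):
    """Return the (header, body) child elements of a tax:taxonx document."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "taxonx":
        raise ValueError(f"unexpected root element: {root.tag}")
    children = {local_name(child.tag): child for child in root}
    # Either value is None if the document lacks that part
    return children.get("taxonxHeader"), children.get("taxonxBody")
```

The header element can then be handed to MODS-aware code, and the body to whatever processes the treatment text.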

Plazi API

Treatment data is open access and can be accessed via HTTP GET as described in detail below. The treatment data is provided in HTML, various XML flavors, and RDF.

Obtaining a list of all the treatments available from Plazi

Response (RSS, in Atom XML, encoded in UTF-8)

Entries of interest:

  • channel/item/link: the link to the XML treatment
  • channel/item/title: the taxon name and authority
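A minimal Python sketch for reading that list, assuming the response is an RSS document whose items sit at channel/item under the root element, as the entries above indicate. The feed URL is left to the caller, and parse_treatment_feed and fetch_treatment_feed are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def parse_treatment_feed(feed_xml):
    """Yield (title, link) pairs from the channel/item entries of an RSS feed."""
    root = ET.fromstring(feed_xml)
    # The root is assumed to be <rss>, with the items under <channel>
    for item in root.iterfind("channel/item"):
        yield item.findtext("title", default=""), item.findtext("link", default="")


def fetch_treatment_feed(feed_url):
    """Download the feed from a caller-supplied URL and return all (title, link) pairs."""
    with urlopen(feed_url) as response:
        return list(parse_treatment_feed(response.read()))
```

Each link points to the XML version of a treatment, and each title gives the taxon name and authority.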

Accessing a particular DwC-Archive

Replace with any UUID from the GBIF-provided listing (see below). Alternatively, the endpoint URL from that listing can be used directly.

Example: Response (ZIP archive containing XML and tab-separated TXT files, all encoded in UTF-8)

Entries of interest:

  • eml.xml: an XML file containing the metadata of the publication, in MODS format
  • taxa.txt: a tab-separated TXT file listing the taxa and treatments the DwC-Archive contains, plus higher taxonomy; the Identifier column takes the form <treatment UUID>.taxon, and the treatment UUID can be used to access the treatment on the Plazi servers (see below)
  • occurrences.txt: a tab-separated TXT file containing occurrence data; the TaxonID column references the Identifier column in taxa.txt, and the data column headers are DwC terms
  • media.txt: a tab-separated TXT file containing HTML versions of the treatments; the TaxonID column references the Identifier column in taxa.txt, and the HTML treatments are located in the Description column
  • references.txt: a detailed description of its contents can be found in Appendix: Darwin Core Archive Content
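The star schema can be followed from an extension back to the core in a few lines. This Python sketch groups the rows of occurrences.txt by treatment; it assumes the header row of taxa.txt labels the identifier column Identifier and that of occurrences.txt labels the reference column TaxonID, as described above. read_table and occurrences_by_treatment are hypothetical names.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_table(archive, name):
    """Read one tab-separated, UTF-8 data file from an open archive into dicts keyed by header."""
    with archive.open(name) as raw, io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
        return list(csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE))


def occurrences_by_treatment(zip_source):
    """Map each treatment UUID in taxa.txt to the occurrences.txt rows that reference it."""
    with zipfile.ZipFile(zip_source) as archive:
        taxa = read_table(archive, "taxa.txt")
        occurrences = read_table(archive, "occurrences.txt")
    by_taxon = defaultdict(list)
    for row in occurrences:
        # Assumed header name: TaxonID references the Identifier column of taxa.txt
        by_taxon[row["TaxonID"]].append(row)
    result = {}
    for taxon in taxa:
        identifier = taxon["Identifier"]
        # Keep only "<treatment UUID>.taxon" rows; ".syn" rows describe synonyms
        if identifier.endswith(".taxon"):
            result[identifier[: -len(".taxon")]] = by_taxon.get(identifier, [])
    return result
```

The same pattern applies to media.txt, whose TaxonID column also points back at taxa.txt.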

Accessing a particular treatment on the Plazi servers

HTTP GET<treatmentUUID>
Replace <treatmentUUID> with the actual treatment UUID from the taxa.txt file of a DwC-Archive.

Example: Response (HTML, encoded in UTF-8): a web page displaying the treatment

HTTP GET<treatmentUUID>
Replace <treatmentUUID> with the actual treatment UUID from the taxa.txt file of a DwC-Archive.

Example: Response (XML, encoded in UTF-8): the raw, generic XML version of the treatment, from which all other representations are generated

HTTP GET<treatmentUUID>
Replace <treatmentUUID> with the actual treatment UUID from the taxa.txt file of a DwC-Archive.

Example: Response (XML, encoded in UTF-8): a TaxonX XML version of the treatment
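All three requests follow the same pattern, so one helper can serve them. The Python sketch below assumes that the treatment UUID is simply appended to the request URL, as the <treatmentUUID> placeholder suggests, and that the Identifier column of taxa.txt has the form <treatment UUID>.taxon (or treatment ID + .syn for new junior synonyms, per the appendix). The base_url parameter and both function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen


def treatment_uuid(identifier):
    """Recover the treatment UUID from a taxa.txt Identifier value."""
    for suffix in (".taxon", ".syn"):
        if identifier.endswith(suffix):
            return identifier[: -len(suffix)]
    raise ValueError(f"unexpected identifier: {identifier!r}")


def fetch_treatment(base_url, identifier):
    """GET one representation (HTML, XML or TaxonX XML) of a treatment.

    base_url is the part of the request URL that precedes <treatmentUUID>.
    """
    url = base_url + quote(treatment_uuid(identifier))
    with urlopen(url) as response:
        # Responses are documented as UTF-8; honour an explicit charset if one is sent
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Choosing between the HTML, XML and TaxonX representations is then only a matter of which base_url is passed in.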

List of Plazi’s available DwC-Archives from the GBIF API

GBIF regularly harvests Plazi data and can be used as an alternative source.

Replace <20k> with any multiple of 20 (including 0) to page through the list. A limit other than 20 can also be used, in which case the offset must be a multiple of that limit.
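Paging through the listing can be automated by stepping the offset until the response reports endOfRecords as true. In this Python sketch the request itself is abstracted as a fetch_page(offset, limit) callable, so the listing URL can be supplied by the caller; iter_datasets and http_page_fetcher (which fills {offset} and {limit} into a caller-supplied URL template) are hypothetical names.

```python
import json
from urllib.request import urlopen


def iter_datasets(fetch_page, limit=20):
    """Yield every dataset record, stepping the offset in multiples of limit."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["results"]
        # Stop when GBIF reports the last page (or, defensively, returns nothing)
        if page["endOfRecords"] or not page["results"]:
            return
        offset += limit


def http_page_fetcher(url_template):
    """Build fetch_page from a URL template containing {offset} and {limit} fields."""
    def fetch_page(offset, limit):
        with urlopen(url_template.format(offset=offset, limit=limit)) as response:
            return json.load(response)
    return fetch_page
```

Keeping the transport separate from the paging loop makes the loop easy to test and lets the same code use any limit.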

Example (first 20 datasets):

Response (JSON)

    {
        "offset": 0,
        "limit": 1,
        "endOfRecords": false,
        "count": 1129,
        "results": [
            {
                "key": "3e8b196b-c482-47f1-9574-772141310c40",
                "installationKey": "7ce8aef1-9e92-11dc-8740-b8a03c50a999",
                "publishingOrganizationKey": "7ce8aef0-9e92-11dc-8738-b8a03c50a862",
                "external": false,
                "numConstituents": 0,
                "type": "CHECKLIST",
                "title": "Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae).",
                "description": "UNAVAILABLE",
                "language": "eng",
                "homepage": "",
                "citation": {
                    "text": " taxonomic treatments database: Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae)."
                },
                "rights": "No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation.",
                "lockedForAutoUpdate": false,
                "createdBy": "plazi",
                "modifiedBy": "",
                "created": "2014-06-28T12:55:54.089+0000",
                "modified": "2014-11-25T13:29:20.716+0000",
                "contacts": [...],
                "endpoints": [
                    {
                        "key": 45389,
                        "type": "DWC_ARCHIVE",
                        "url": "",
                        "createdBy": "plazi",
                        "modifiedBy": "plazi",
                        "created": "2014-06-28T12:55:54.604+0000",
                        "modified": "2014-06-28T12:55:54.604+0000",
                        "machineTags": []
                    }
                ],
                "machineTags": [...],
                "tags": [],
                "identifiers": [
                    {
                        "key": 23594,
                        "type": "UUID",
                        "identifier": "23A1465DDF212F7DA589F41341B83FCC",
                        "createdBy": "plazi",
                        "created": "2014-06-28T12:55:54.334+0000"
                    }
                ],
                "comments": [],
                "bibliographicCitations": [],
                "curatorialUnits": [],
                "taxonomicCoverages": [],
                "geographicCoverages": [],
                "temporalCoverages": [],
                "keywordCollections": [],
                "countryCoverage": [],
                "collections": [],
                "dataDescriptions": []
            }
        ]
    }

Entries of interest:

  • endOfRecords: if false, increasing the offset returns further datasets
  • count: total number of available Plazi datasets
  • results.endpoints.url: the URL of the DwC-Archive containing the data on
  • results.identifiers.identifier: the UUID of the dataset
  • results.homepage: the URL of an HTML page listing the taxonomic treatments whose data is contained in the DwC-Archive
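Collecting those entries from one response page is a small dictionary walk. This Python sketch assumes the field layout of the example response above (after decoding with json.loads) and picks the first DWC_ARCHIVE endpoint and the first UUID identifier of each dataset; archive_links is a hypothetical name.

```python
def archive_links(page):
    """Return identifier, archive URL and homepage for each dataset in one GBIF response page."""
    links = []
    for dataset in page.get("results", []):
        archive_urls = [e["url"] for e in dataset.get("endpoints", []) if e.get("type") == "DWC_ARCHIVE"]
        identifiers = [i["identifier"] for i in dataset.get("identifiers", []) if i.get("type") == "UUID"]
        links.append({
            "identifier": identifiers[0] if identifiers else None,
            "archive": archive_urls[0] if archive_urls else None,
            # An empty homepage string is reported as None
            "homepage": dataset.get("homepage") or None,
        })
    return links
```

The archive value is the endpoint URL that can be used directly to download the DwC-Archive.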


References

  1. Plazi
  2. Biodiversity Literature Repository
  3. Catapano T. 2010. TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions. Proceedings of the Journal Article Tag Suite Conference 2010 (PDF)
  4. Darwin Core Archive

Appendix: Darwin Core Archive Content

taxa.txt

  1. treatment UUID + ".taxon" for the taxon, treatment ID + ".syn" for new junior synonyms
  2. reference string of the original description
  3. blank, except for new junior synonyms
  4. blank
  5. blank
  6. taxon@kingdom
  7. taxon@phylum
  8. taxon@class
  9. taxon@order
  10. taxon@family
  11. taxon@genus
  12. taxon@rank
  13. taxon name
  14. blank, except for new junior synonyms, where "synonym", or "homotypicSynonym" if we have a syntype
  15. blank
  16. HTTP URI of the treatment

occurrences.txt

  1. treatment UUID + ".mc." + materials citation ID
  2. treatment UUID + ".taxon", referencing taxa.txt
  3. mc@specimenCode (explode to one record per specimen code if possible)
  4. mc@collectionCode (explode to one record per collection code if possible)
  5. blank
  6. mc@typeStatus (blank if none given)
  7. mc text
  8. mc@sex (also other specimen types like "queen", "worker", etc.)
  9. mc@specimenCount (explode things like "5 workers, 2 females" to one record per typified specimen count if possible)
  10. mc@collectingDate
  11. mc@collectorName
  12. blank
  13. mc@latitude
  14. mc@longitude
  15. mc@elevation, or mc@elevationMin if given
  16. mc@elevationMax if given
  17. mc@collectingCountry
  18. mc@stateProvince or mc@collectingRegion
  19. mc@collectingMunicipality
  20. mc@location
  21. HTTP URI of the treatment

description.txt

  1. treatment UUID + ".taxon", referencing taxa.txt
  2. subSubSection@type
  3. subSubSection text
  4. blank (except if we have language detection, which might be reusable from the spell checker)
  5. article citation

distribution.txt

  1. treatment UUID + "." + location UUID
  2. treatment UUID + ".taxon", referencing taxa.txt
  3. mc@collectingCountry
  4. mc@location
  5. mc@typeStatus

media.txt

  1. treatment UUID + ".text"
  2. treatment UUID + ".taxon", referencing taxa.txt
  3. ""
  4. text/html
  5. taxon + author + year
  6. treatment HTML
  7. treatment HTTP URI
  8. Public Domain
  9. No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation.
  10. blank
  11. ((Pensoft|Zootaxa) via )?Plazi
  12. author list, semicolon separated
  13. bibliographic reference string

references.txt

  1. treatment UUID + ".ref" for the article (treatment) reference; cited treatment ID (from treatmentCitation@httpUri) + ".ref" for the original description reference
  2. treatment ID + ".taxon", referencing taxa.txt
  3. bibRef@type
  4. reference text
  5. bibRef@title
  6. bibRef@journal or bibRef@volumeTitle
  7. blank
  8. treatment first page
  9. treatment last page
  10. bibRef@journal
  11. bibRef@part
  12. bibRef@publisher
  13. bibRef@author, semicolon separated
  14. bibRef@editor, semicolon separated
  15. bibRef@year
  16. blank
  17. bibRef@URL, if available
  18. bibRef@DOI, if available

vernaculars.txt

  1. treatment UUID + ".taxon", referencing taxa.txt
  2. en
  3. vernacular name
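Because every record ID is built from a treatment UUID plus a suffix, an ID can be taken apart again. The Python sketch below covers the suffixes listed above (".taxon", ".syn", ".mc.", ".text", ".ref") and reports anything else, such as the treatment UUID + "." + location UUID IDs of distribution.txt, as "other". It assumes the UUIDs themselves contain no dots; classify_record_id is a hypothetical name.

```python
SUFFIXES = (
    (".taxon", "taxon"),
    (".syn", "junior synonym"),
    (".text", "treatment text"),
    (".ref", "reference"),
)


def classify_record_id(record_id):
    """Split a record ID into (treatment or cited-treatment ID, kind, local part or None)."""
    if ".mc." in record_id:
        # occurrences.txt: treatment UUID + ".mc." + materials citation ID
        treatment, citation = record_id.split(".mc.", 1)
        return treatment, "materials citation", citation
    for suffix, kind in SUFFIXES:
        if record_id.endswith(suffix):
            return record_id[: -len(suffix)], kind, None
    # Anything else, e.g. treatment UUID + "." + location UUID
    treatment, _, rest = record_id.partition(".")
    return treatment, "other", rest or None
```

Such a split is handy when joining files whose ID columns share the treatment UUID but differ in suffix.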


  • Support and Questions: Please contact our support with any questions
  • Version: 20150223