Agile GBIF publishing

This post is part of the 2019 GBIF Ebbe Nielsen Challenge submission by EarthCape.

Introduction

GBIF publishing typically requires several steps that often prevent an individual researcher or research group from quickly publishing their data without interrupting their workflow. You can find a detailed guide to publishing datasets in GBIF here. In brief, the steps are:

  1. Make sure your organization has an account in GBIF
  2. Ask your organization's IT staff or management to set up an IPT (Integrated Publishing Toolkit) or other hosting for the data to be published
  3. Prepare the data in Darwin Core (DwC) or Darwin Core Archive (DwC-A) format
  4. Keep the exported files up to date as the primary data change
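Step 3 can be made concrete. A Darwin Core Archive is a zip file holding a data file, a meta.xml descriptor that maps its columns to Darwin Core terms, and an EML metadata document. The Python sketch below builds a minimal occurrence-only archive in memory. The column set, the file names and the deliberately thin eml.xml stub are assumptions made for illustration; a real GBIF submission needs much richer EML metadata.

```python
import io
import zipfile
from xml.sax.saxutils import escape

DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Assumed column set for illustration; each name is a Darwin Core term.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
          "eventDate", "decimalLatitude", "decimalLongitude"]


def _clean(value):
    """Tabs and newlines inside a value would break the tab-separated file."""
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def occurrence_txt(rows, fields=FIELDS):
    lines = ["\t".join(fields)]  # one header line, see ignoreHeaderLines="1"
    for row in rows:
        lines.append("\t".join(_clean(row.get(f, "")) for f in fields))
    return "\n".join(lines) + "\n"


def meta_xml(fields=FIELDS):
    """Descriptor telling readers how to parse occurrence.txt."""
    field_tags = "\n".join(
        f'    <field index="{i}" term="{DWC_NS}{name}"/>'
        for i, name in enumerate(fields))
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC_NS}Occurrence">\n'
        '    <files><location>occurrence.txt</location></files>\n'
        '    <id index="0"/>\n'
        f'{field_tags}\n'
        '  </core>\n'
        '</archive>\n')


def eml_xml(title, organization):
    """Deliberately minimal EML stub (assumption); GBIF expects far more."""
    t, o = escape(title), escape(organization)
    return (
        '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"'
        ' packageId="draft" system="http://gbif.org" scope="system">\n'
        f'  <dataset>\n    <title>{t}</title>\n'
        f'    <creator><organizationName>{o}</organizationName></creator>\n'
        f'    <contact><organizationName>{o}</organizationName></contact>\n'
        '  </dataset>\n</eml:eml>\n')


def build_dwca(rows, title, organization):
    """Return the zipped archive as bytes, ready to upload somewhere."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", meta_xml())
        zf.writestr("eml.xml", eml_xml(title, organization))
        zf.writestr("occurrence.txt", occurrence_txt(rows))
    return buf.getvalue()


archive = build_dwca(
    [{"occurrenceID": "obs-1", "basisOfRecord": "HumanObservation",
      "scientificName": "Heliconius erato", "eventDate": "2019-06-01"}],
    title="Demo dataset", organization="Demo organization")
```

Missing values simply become empty cells, so every row keeps the same number of columns as the header that meta.xml describes.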

We have built an experimental module for the EarthCape platform that allows one-click publishing to GBIF via Zenodo directly from a researcher's personal database, without the need to set up an IPT or any other software.

Given appropriate credentials from GBIF and Zenodo, a user can use the standard EarthCape Project and Dataset structure to perform the following procedures with the click of a button (via the Zenodo and GBIF APIs):

  1. publish the DwC-A to Zenodo (e.g. https://sandbox.zenodo.org/record/350436)
  2. publish to GBIF using the Zenodo endpoint and DOI (e.g. https://www.gbif-uat.org/dataset/b6121a24-ff7a-4807-a2dc-37ea51300be0)
  3. publish a new version to Zenodo and update the GBIF endpoint
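The three procedures above map onto a handful of REST calls. The Python sketch below is not EarthCape's code (EarthCape is closed-source .NET); it illustrates one plausible call sequence, assuming the Zenodo deposit API (create a deposition, upload to its bucket, set metadata, publish, create a new version) and the GBIF registry API (create a dataset, add a DWC_ARCHIVE endpoint). Authentication is left to a pluggable `send` function. The `record_file_url` URL pattern, the need for a GBIF installation key and the registry field names are assumptions to check against the API documentation, and the `DryRun` transport records calls instead of sending them.

```python
from typing import Any, Callable

# Sandbox bases, as in the demo; production would be
# https://zenodo.org/api and https://api.gbif.org/v1.
ZENODO = "https://sandbox.zenodo.org/api"
GBIF = "https://api.gbif-uat.org/v1"

# send(method, url, json=None, data=None) -> decoded response body.
# Auth (Zenodo token, GBIF credentials) is assumed to live inside send.
Send = Callable[..., Any]


def record_file_url(record_id: int, filename: str) -> str:
    # Hypothetical helper: assumes the /record/<id>/files/<name> pattern.
    return f"https://sandbox.zenodo.org/record/{record_id}/files/{filename}"


def publish_to_zenodo(send: Send, archive: bytes, filename: str, metadata: dict) -> dict:
    """Procedure 1: create a deposition, upload the DwC-A, set metadata, publish."""
    dep = send("POST", f"{ZENODO}/deposit/depositions", json={})
    send("PUT", f"{dep['links']['bucket']}/{filename}", data=archive)
    send("PUT", f"{ZENODO}/deposit/depositions/{dep['id']}", json={"metadata": metadata})
    return send("POST", f"{ZENODO}/deposit/depositions/{dep['id']}/actions/publish")


def add_endpoint(send: Send, dataset_key: str, file_url: str) -> Any:
    return send("POST", f"{GBIF}/dataset/{dataset_key}/endpoint",
                json={"type": "DWC_ARCHIVE", "url": file_url})


def register_on_gbif(send: Send, record: dict, filename: str, title: str,
                     org_key: str, installation_key: str) -> str:
    """Procedure 2: GBIF dataset with the concept DOI and a Zenodo endpoint."""
    key = send("POST", f"{GBIF}/dataset", json={
        "publishingOrganizationKey": org_key,
        "installationKey": installation_key,
        "type": "OCCURRENCE",
        "title": title,
        "doi": record["conceptdoi"],  # concept DOI resolves to all versions
    })
    add_endpoint(send, key, record_file_url(record["record_id"], filename))
    return key


def republish(send: Send, deposition_id: int, archive: bytes, filename: str,
              dataset_key: str, old_endpoint_key: int) -> dict:
    """Procedure 3: new Zenodo version, then point GBIF at the new file."""
    new = send("POST", f"{ZENODO}/deposit/depositions/{deposition_id}/actions/newversion")
    draft_url = new["links"]["latest_draft"]
    draft = send("GET", draft_url)
    for f in draft.get("files", []):  # the draft starts with the old files
        send("DELETE", f"{draft_url}/files/{f['id']}")
    send("PUT", f"{draft['links']['bucket']}/{filename}", data=archive)
    record = send("POST", f"{draft_url}/actions/publish")
    send("DELETE", f"{GBIF}/dataset/{dataset_key}/endpoint/{old_endpoint_key}")
    add_endpoint(send, dataset_key, record_file_url(record["record_id"], filename))
    return record


class DryRun:
    """Fake transport: records (method, url, json) and returns canned bodies."""

    def __init__(self):
        self.calls = []

    def __call__(self, method, url, json=None, data=None):
        self.calls.append((method, url, json))
        if url.endswith("/actions/newversion"):
            return {"links": {"latest_draft": f"{ZENODO}/deposit/depositions/2"}}
        if method == "GET":
            return {"files": [{"id": "f1"}], "links": {"bucket": f"{ZENODO}/files/b2"}}
        if url.endswith("/deposit/depositions"):
            return {"id": 1, "links": {"bucket": f"{ZENODO}/files/b1"}}
        if url.endswith("/actions/publish"):
            n = 2 if "/depositions/2/" in url else 1
            return {"record_id": n, "conceptdoi": "10.5072/zenodo.0"}
        if url.endswith("/dataset"):
            return "dataset-key"
        return None


send = DryRun()
meta = {"title": "Demo", "upload_type": "dataset",
        "description": "Demo DwC-A", "creators": [{"name": "Doe, Jane"}]}
rec = publish_to_zenodo(send, b"zip bytes", "dwca.zip", meta)
key = register_on_gbif(send, rec, "dwca.zip", "Demo", "org-key", "inst-key")
republish(send, 1, b"new zip bytes", "dwca.zip", key, old_endpoint_key=1)
```

Keeping the transport injectable means the same sequence can be replayed against the sandboxes, as in the demo, or checked offline as above.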

How to use this demo

There is a dedicated installer for this challenge submission: http://bit.ly/2ypYe2R
The installer contains a prepopulated SQLite database with a dataset that has already been published to the Zenodo and GBIF sandboxes.

  1. Download the installer.
  2. Install it on a Windows PC or a Mac running Windows (administrator permissions required).
  3. Run the application and log in as Admin/admin.
  4. Check the Settings/Settings section for the prepopulated API settings. Change them at your own risk.
  5. Go to Datasets.
  6. Create a new dataset.
  7. Add the minimum required data: Units (observations), Contribution, License, Project and Published Description.
  8. Click Publish on the Zenodo tab (the Zenodo fields should be populated if execution succeeds).
  9. Click Publish on the GBIF tab (the GBIF fields should be populated if execution succeeds).

Rationale

This development was driven by current users' requests for a way to publish data to GBIF from EarthCape databases, which are typically used for research data. Although this solution also works for server-hosted databases (e.g. https://heliconius.ecdb.io/) that can host the dataset endpoints themselves, we decided to investigate a more direct route for researchers' data from their desktops to GBIF.

It may well be that this way of publishing has not been properly evaluated by GBIF before and, although it does not currently break any rules set by GBIF, it may not be recommended.

I do believe that this removes several barriers to publishing research and collections data, not only at the level of the individual researcher but also at the institutional level, using the full power of the EarthCape solution.

The EarthCape Windows Client is available as a free download for “personal” use and, depending on GBIF’s feedback, these publishing features will eventually be released to the public.

Under the hood

  • EarthCape is built on the Microsoft .NET Framework (requirements).
  • The EarthCape platform consists of two applications: Windows (WinForms) and web (ASP.NET) clients.
  • EarthCape source code is currently closed.
  • The EarthCape Windows client is free to use with a local database (e.g. SQLite or MySQL).
  • EarthCape has an extensive predefined structure for managing specimen and observation data with the possibility of adding user fields.
  • Darwin Core mapping is preconfigured and can be adjusted by the user via the application model (more information on this process).
  • Taxonomic names are run through the GBIF Species API; their status, GBIF IDs, valid names and hierarchy are downloaded (more information).
  • Zenodo publishing:
    • Uses the Zenodo REST API.
    • A Zenodo account is required, and an API token has to be created and stored in the EarthCape database.
    • The demo uses the Zenodo sandbox and a pre-created token.
    • On successful publication, Zenodo record IDs, DOIs (both the base DOI covering all versions and the DOI of the latest version) and URLs are stored with the Dataset record in the database.
    • Re-publishing the dataset creates a new version in Zenodo and updates the GBIF endpoint if the dataset is also published there.
  • GBIF publishing:
    • Uses the GBIF REST API.
    • A GBIF user account has to be created, and permission to publish on behalf of the user’s organization has to be obtained.
    • The demo uses the GBIF sandbox and a pre-created user.
    • The Zenodo file URL is supplied as the endpoint. If Zenodo is not used to store the dataset's DwC Archive, the endpoint can instead be provided manually or created within a hosted EarthCape solution, if one is in place.
    • The DOI of the Zenodo record covering all versions is supplied as the DOI of the GBIF dataset, since GBIF does not track versions.
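The Species API step in the list above can be sketched in the same spirit. The GBIF species match service (https://api.gbif.org/v1/species/match) takes a name and returns its best match with a matchType, a status, a usageKey, an acceptedUsageKey for synonyms, and classification fields such as kingdom or family. The helper names and the NameCheck record below are hypothetical, and the field handling reflects my understanding of the response rather than EarthCape's implementation. Only URL building and response parsing are shown, so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlencode

MATCH_URL = "https://api.gbif.org/v1/species/match"
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def match_url(name: str, kingdom: Optional[str] = None) -> str:
    params = {"name": name}
    if kingdom:
        params["kingdom"] = kingdom  # optional hint that helps with homonyms
    return f"{MATCH_URL}?{urlencode(params)}"


@dataclass
class NameCheck:
    """Hypothetical record of what gets stored for one submitted name."""
    submitted: str
    match_type: str
    status: Optional[str] = None
    gbif_key: Optional[int] = None
    accepted_key: Optional[int] = None
    hierarchy: dict = field(default_factory=dict)

    @property
    def needs_accepted_lookup(self) -> bool:
        # A synonym points at a different accepted usage, whose name would be
        # fetched in a second request (e.g. GET /v1/species/{accepted_key}).
        return self.accepted_key is not None and self.accepted_key != self.gbif_key


def parse_match(submitted: str, body: dict) -> NameCheck:
    if body.get("matchType", "NONE") == "NONE":
        return NameCheck(submitted, "NONE")
    key = body.get("usageKey")
    return NameCheck(
        submitted=submitted,
        match_type=body["matchType"],
        status=body.get("status"),
        gbif_key=key,
        accepted_key=body.get("acceptedUsageKey", key),
        hierarchy={r: body[r] for r in RANKS if r in body},
    )


# Illustrative response bodies with made-up keys, not real GBIF data.
synonym = parse_match("Some synonym", {
    "matchType": "EXACT", "status": "SYNONYM", "usageKey": 111,
    "acceptedUsageKey": 222, "kingdom": "Animalia", "family": "Nymphalidae"})
accepted = parse_match("Heliconius erato", {
    "matchType": "EXACT", "status": "ACCEPTED", "usageKey": 333,
    "kingdom": "Animalia", "genus": "Heliconius"})
unmatched = parse_match("Nonexistus fakeus", {"matchType": "NONE"})
```

Treating an accepted name as its own accepted usage keeps the synonym check to a single comparison.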

Future plans

There are several other integrations in the lab: iNaturalist, GEOLocate, GenBank and the Biodiversity Heritage Library. We aim to turn this application into a working hub for researchers and curators that not only sends data but also synchronizes between services and moves feedback (identifications, comments, annotations, etc.) back to the operational database, whether for an individual researcher, a team or an institution.

Additional screenshots

iNaturalist dataset from my account used in the demo: https://www.inaturalist.org/observations/emeyke
iNaturalist data imported into EarthCape as a dataset
EarthCape Settings
Zenodo fields populated after a successful publication
Darwin Core Archive in Zenodo
GBIF fields populated after a successful publication
GBIF dataset (home page links to its Zenodo record)
Zenodo endpoint
Species names checked via the GBIF Species match API, with valid names and hierarchy downloaded