1 Import GBIF data in QGIS

The Global Biodiversity Information Facility (GBIF) is an international open data infrastructure that allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet. It offers free and open access to a tremendous amount of biodiversity data. As of the date of writing, GBIF provides access to 1,645,121,552 occurrence records from 55,482 datasets. The data has been collected over three centuries of natural history exploration and includes current observations from citizen scientists, researchers and automated monitoring programs. The data is widely used: more than 5,000 peer-reviewed papers draw on it.

There are various ways to import GBIF data: downloading it directly from the website as a comma-delimited file (CSV), using the GBIF API, or using, for example, R or Python packages. On the GBIF website you’ll find a long list of tools that work with GBIF data.
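To give an idea of what the API route looks like, the sketch below builds a request URL for the GBIF occurrence search endpoint using only the Python standard library. The endpoint and the `scientificName`, `hasCoordinate` and `limit` parameters are assumed to match the public GBIF API at the time you read this, so check the API documentation before relying on them. The helper name `build_search_url` is made up for this example, and the network call itself is left commented out.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (assumption: verify against the GBIF API docs).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(scientific_name, limit=20, has_coordinate=True):
    """Return a GBIF occurrence search URL for one species (illustrative helper)."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": str(has_coordinate).lower(),  # the API expects true/false
        "limit": limit,
    }
    return f"{GBIF_SEARCH}?{urlencode(params)}"


url = build_search_url("Ambrosia psilostachya", limit=5)
print(url)

# To actually fetch the records (requires internet access):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     data = json.load(response)
# print(data["count"], "records match the query")
```

The plugin described in this tutorial takes care of requests like this for you, so the sketch is only meant to show what happens behind the scenes.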

This tutorial shows how you can access GBIF data in QGIS with the GBIF Occurrences plugin, developed by Nicolas Noé from the Belgian Biodiversity Platform.

Figure 1.1: Ambrosia psilostachya, source: Salicyna, CC BY-SA 4.0 via Wikimedia Commons.

This tutorial shows how to download GBIF occurrence data of Ambrosia psilostachya in QGIS. The common name of this species is perennial ragweed. The species is native to North America. Outside North America, perennial ragweed has been introduced to many countries in Europe, Asia, Africa and Australia.1 Let’s see if the GBIF data confirms this.

2 Install the plugin

To install the plugin, launch QGIS and, in the main menu, go to Plugins -> Manage and install plugins.... In the All tab, search for ‘GBIF Occurrences’. Select the plugin and click ‘Install plugin’.

3 Use the plugin

You can open the plugin window from the main menu: Vector -> GBIF Occurrences -> Load GBIF Occurrences. Alternatively, you can use the Plugin icon in the toolbar (circled in red in the image below).

Next, fill in the details for your search. As a minimum, fill in the Scientific name of the species. Then click “Load occurrences” and the plugin will start to fetch your records.

Optionally, you can fill in other information to restrict your search. You can, for example, restrict your search to certain years or to a certain kind of observation.
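For comparison, the same kind of restrictions can be expressed as query parameters when talking to the GBIF API directly. The sketch below is an illustration only: the `year` and `basisOfRecord` parameter names and the `start,end` year range notation are assumed to follow the public GBIF occurrence search API, and `occurrence_filters` is a hypothetical helper, not part of the plugin.

```python
from urllib.parse import urlencode


def occurrence_filters(scientific_name, year_from=None, year_to=None, basis_of_record=None):
    """Build a dict of GBIF search parameters (hypothetical helper)."""
    params = {"scientificName": scientific_name}
    if year_from is not None and year_to is not None:
        # Assumption: the API accepts a year range written as "start,end".
        params["year"] = f"{year_from},{year_to}"
    if basis_of_record is not None:
        # For example HUMAN_OBSERVATION or PRESERVED_SPECIMEN.
        params["basisOfRecord"] = basis_of_record
    return params


filters = occurrence_filters("Ambrosia psilostachya", 2000, 2020, "HUMAN_OBSERVATION")
print(urlencode(filters))
```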

When the downloading is done, you’ll see a new QGIS layer with the occurrences. The attribute table of the point vector layer contains all details known by GBIF. This makes it possible to make further selections, or check the downloaded data.
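One simple check on the downloaded data is to count the records per country, for instance to see whether the species is indeed reported outside North America. The sketch below does this in plain Python on a handful of made-up records. The field name `countryCode` follows the GBIF API; the attribute names in the layer created by the plugin may differ, so treat them as an assumption.

```python
from collections import Counter

# Hypothetical sample records; real ones would come from the layer's attribute table.
records = [
    {"countryCode": "US", "year": 2015},
    {"countryCode": "US", "year": 1998},
    {"countryCode": "FR", "year": 2019},
    {"countryCode": "AU", "year": 2021},
    {"countryCode": None, "year": 2003},
]


def count_by_country(rows):
    """Count records per country code, grouping missing codes under 'unknown'."""
    return Counter(row.get("countryCode") or "unknown" for row in rows)


counts = count_by_country(records)
for code, n in counts.most_common():
    print(code, n)
```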

Note that the downloaded layer is a so-called temporary scratch layer. These are in-memory layers, meaning that they are not saved on disk and will be discarded when QGIS is closed. To avoid data loss, you should save the downloaded layer as a permanent vector layer. You can do this in any vector format supported by QGIS, using any of the following methods:

  • click the memory indicator icon next to the layer.
  • right-click on the layer name and, in the contextual menu, select the Make permanent entry or use the Export ► submenu.
  • in the menu, select Layer ► Save As….

Each of these commands opens the Save Vector Layer as dialog, described in the Creating new layers from an existing layer section of the QGIS documentation. The saved file replaces the temporary one in the Layers panel.

4 Limitations

Currently you can only select one item from each filter dropdown list. That is, it is not possible to select multiple items, nor to exclude certain items. So you can select all ‘human observations’, but you cannot filter all ‘human observations’ out. Also important to know: due to limitations of the GBIF API, searches are limited to 200,000 records. An alternative is to download the data directly from the GBIF website, which offers convenient and advanced ways of filtering the data.
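To make the record limit more concrete: a search API of this kind typically returns results page by page, and a cap on the total means that only so many pages can be requested. The sketch below computes the (offset, limit) pairs for such a paged download. The cap of 200,000 is the figure mentioned above; the page size of 300 is an assumption about the API's per-request maximum, and `page_offsets` is a hypothetical helper.

```python
def page_offsets(total_records, page_size=300, cap=200_000):
    """Yield (offset, limit) pairs covering the records reachable through paging."""
    reachable = min(total_records, cap)  # records beyond the cap cannot be requested
    for offset in range(0, reachable, page_size):
        yield offset, min(page_size, reachable - offset)


pages = list(page_offsets(total_records=1000))
print(len(pages), pages[0], pages[-1])
```

For a species with more records than the cap, the last page ends at the cap and the remaining records are simply never requested, which is why the website download is the better choice for large queries.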

The tool offers a very convenient way to quickly download data. However, it does not offer tools to help you check and clean the data. Yet errors in the GBIF data, e.g., problematic geographic coordinates or duplicate records, are quite common and need to be cleaned. You can of course do this manually. However, this requires expert knowledge and is only feasible on small taxonomic or geographic scales. It is furthermore time-consuming and difficult to reproduce.2,3 One tool that helps you automate part of the detection of problems and the cleaning of the data is the CoordinateCleaner toolset for R.4 After cleaning the data, you can export the cleaned data set as a GeoPackage for use in QGIS. See this tutorial for how to do this.
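To illustrate the kind of checks meant here, the sketch below flags a few obvious problems in plain Python: missing coordinates, coordinates outside the valid range, coordinates at exactly 0, 0, and exact duplicates. This is not how CoordinateCleaner works internally and it covers far fewer cases; it is only a minimal example. The sample records are made up, and the field names (`decimalLatitude`, `decimalLongitude`, `species`, `eventDate`) are assumed to follow the GBIF API.

```python
def flag_record(record, seen):
    """Return a list of problems for one record; `seen` holds keys of earlier records."""
    problems = []
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    elif lat == 0 and lon == 0:
        problems.append("zero coordinates")
    # Records with the same species, position and date are treated as duplicates.
    key = (record.get("species"), lat, lon, record.get("eventDate"))
    if key in seen:
        problems.append("duplicate")
    seen.add(key)
    return problems


# Hypothetical sample records for demonstration only.
name = "Ambrosia psilostachya"
records = [
    {"species": name, "decimalLatitude": 40.1, "decimalLongitude": -105.2, "eventDate": "2015-06-01"},
    {"species": name, "decimalLatitude": 40.1, "decimalLongitude": -105.2, "eventDate": "2015-06-01"},
    {"species": name, "decimalLatitude": 0, "decimalLongitude": 0, "eventDate": "2010-01-01"},
    {"species": name, "decimalLatitude": 123.4, "decimalLongitude": 10.0, "eventDate": "2012-03-04"},
    {"species": name, "decimalLatitude": None, "decimalLongitude": 5.0, "eventDate": "2018-07-09"},
]

seen = set()
flags = [flag_record(r, seen) for r in records]
for record, problems in zip(records, flags):
    print(record["eventDate"], problems or "ok")
```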

5 Reproducibility

If you use the data for analysis and want to publish or share the results, it is important that your analysis is reproducible. This means it should be perfectly clear what data you have used. GBIF data is constantly updated. In addition, there are many ways to filter the data. So just mentioning that you downloaded the data from GBIF isn’t really enough.

A clear advantage of downloading the data directly from the GBIF website is that it automatically produces a Digital Object Identifier (DOI), thus providing a persistent link to the data download from GBIF.org, including information about the search date and filters used.
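If you do use data fetched with the plugin, one option is to write down the search details yourself and keep them next to the saved layer. The sketch below shows one possible way to do that in Python; the structure of the record and the helper name `provenance_record` are only suggestions, and the date in the example is invented.

```python
import json
from datetime import date


def provenance_record(source, filters, retrieved_on=None, doi=None):
    """Bundle the details needed to describe a download (illustrative structure)."""
    return {
        "source": source,
        "filters": filters,
        "retrieved_on": (retrieved_on or date.today()).isoformat(),
        "doi": doi,  # fill in when the download comes with a DOI
    }


record = provenance_record(
    source="GBIF Occurrences plugin for QGIS",
    filters={"scientificName": "Ambrosia psilostachya"},
    retrieved_on=date(2021, 1, 15),  # example date, not a real download
)
print(json.dumps(record, indent=2))
```

Such a note does not replace a DOI, since the underlying data may change after your download, but it at least documents what you searched for and when.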

6 References

1. CABI. Ambrosia psilostachya (perennial ragweed). Invasive Species Compendium. Published online 2019. https://www.cabi.org/isc/datasheet/4692
2. Zizka A, Carvalho FA, Calvente A, et al. No one-size-fits-all solution to clean GBIF. PeerJ. 2020;8:e9916. doi:10.7717/peerj.9916
3. Jin J, Yang J. BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases. Global Ecology and Conservation. 2020;21:e00852. doi:10.1016/j.gecco.2019.e00852
4. Zizka A, Silvestro D, Andermann T, et al. CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution. 2019;10(5):744-751. doi:10.1111/2041-210X.13152