Access soilgrid data using the WCS protocol

osgeo
python
open data
grass gis
data import
wcs
The soil data layers from soilgrid.org provides access to grid layers representing the spatial distribution of soil properties across the globe. This post provides an example of how to use the OWSLib pacakge to access these soil grid layers for your area of interest. tags: soilgrid, OWSlib, open data, data import
Author

Paulo van Breugel

Published

November 25, 2020

Soilgrids

The website soilgrids.org provides access to grid layers representing the spatial distribution of soil properties across the globe. The SoilGrids maps are publicly available under the CC-BY 4.0 License. The soilgrids.org provides a convenient portal to explore the available data sets. The data can furthermore be accessed through various services, which are listed on the ISRIC site.

One of the available services is the Web Coverage Service (WCS), which is a standard issued by the Open Geospatial Consortium (OGC). It allows easy remote access to raster layers from a web browser or any other (GIS) program that supports the protocol. What makes it convenient is that one can choose what part of the available data to download based on spatial constraints and other query criteria.

QGIS has a native WCS provider. Downloading the data directly in QGIS is convenient if the goal is to visualize or explore few layers. But what if you use the data as input in e.g. a modeling pipeline in Python? In that case, being able to access the data from Python will greatly simplify your life. Below I’ll provide an example of how to access the data from Python using the OWSlib package.

Using OWSlib

In this post I’ll use the Python OWSlib package. Full documentation about this package can be found here, including installation instructions. The example below is based/inspired on the examples provided in this collection of Jupyter notebooks. Note that there are two versions of the WCS standard, the 1.0.0 and 2.0 versions. The latest version introduces some important improvements. However, I couldn’t get it to work with my dataset, so I’ll use version 1.0.0. As soon as I have figured out what I am doing wrong using version 2.0, I’ll update this post.

Available layers

For each different soil property predicted in SoilGrids there is an independent service. Check the web page maps.isric.org for all the services available. In this post, we’ll use the service for the soil pH as example. First step is to check which grid layers are available using the WebCoverageService function, which issues a GetCapabilities request. As input we provide the URL and the variable.

# Load python libraries
from owslib.wcs import WebCoverageService

# Coverage service
var = "phh2o"
url = "http://maps.isric.org/mapserv?map=/map/{}.map".format(var)
wcs = WebCoverageService(url, version='1.0.0')

This gives us a service connection. From this the available coverages (maps) can be accessed with the contents property. This returns a dictionary of 24 coverages, combining six standard depth intervals with four quantiles. Below, this dictionary is converted to a list for easier printing. Below only the first 6 layers are shown.

# Get the list of maps available for this the service
cov_list = list(wcs.contents)
print(cov_list)
>> ['phh2o_0-5cm_Q0.05', 'phh2o_0-5cm_Q0.5', 'phh2o_0-5cm_Q0.95',
    'phh2o_0-5cm_mean', 'phh2o_0-5cm_uncertainty',
    'phh2o_5-15cm_Q0.5', ...]

We can easily filter out specific layers. For example, with the code below we filter out the layers reporting the average predictions for each of the standard depth intervals.

# Get all layers with average predictions
mean_covs = [k for k in wcs.contents.keys() if k.find("mean") != -1]
print(mean_covs)
>> ['phh2o_0-5cm_mean', 'phh2o_5-15cm_mean', 'phh2o_15-30cm_mean',
    'phh2o_30-60cm_mean', 'phh2o_60-100cm_mean', 'phh2o_100-200cm_mean']

Download a layer

To download a layer for a specific region, we can use the getCoverage function. Note that we need to establish the coverage service connection first (we already did this above). The getCoverage function needs the following parameters: the variable (identifier), the coordinate reference systems (crs), the horizontal (resx) and vertical (resy) resolution, the bounding box (bbox) and the output format (e.g., GEOTIFF_INT16).

The format for the crs is defined as follows: urn:ogc:def:crs:EPSG::4326. So we only need to look up the EPSG identifier for the projection in which we want to have the data. The EPSG 4326 is the identifier for the World Geodetic System 1984 (in short WGS 84). For the Amersfoort / RD New projection, the EPSG identifier is 28992. So, we would use crs = 'urn:ogc:def:crs:EPSG::28992'.

The resolution needs to be in the projection’s unit. So, for EPSG 4326 you define the resolution in degrees, while for EPSG 28992 the resolution should be given in meters. The bounding box is defined by the coordinates of the lower left and upper right corner of your region of interest. This takes the following form: bbox = (minimum x, minimum y, maximum x, maximum y).

In the example below, we download the layer for a 16 hectare site in the municipality of Meierijstad in the south of the Netherlands. A new forest garden was established here a few years back, and our students of the Applied Geo-information Science program are involved in a monitoring program as described here. The bounding box of this region is (159750, 403750, 160500, 404750). we’ll download the data in RD New projection and 250 meter resolution.

# Download parameters
variable = 'phh2o_0-5cm_mean'
bbox = (159750, 403750, 160500, 404750)
crs='urn:ogc:def:crs:EPSG::28992'
resolution=250

# Get the data
response = wcs.getCoverage(
    identifier=variable, 
    crs=crs,
    bbox=bbox, 
    resx=resolution,
    resy=resolution, 
    format='GEOTIFF_INT16')

# Save the data as geotif
outputname = "{}.tif".format(variable)
with open(outputname, 'wb') as file:
    file.write(response.read())

Next step is to inspect and print the map. For that, we’ll use the rasterio library. We can open the file with rasterio.open() function and subsequently read the metadata with the profile method. By wrapping the latter in a with... statement, the statement is executed once the file is opened, and the file is closed when the context manager exits. This means there is no need to manually close the raster file. This use of a so-called context manager is further explained here.

# Import libraries
import rasterio
import rasterio.plot

# Read the metadata
filepath = outputname
with rasterio.open(filepath) as src:
    print(src.profile)
>> {'driver': 'GTiff', 'dtype': 'int16', 'nodata': None, 'width': 3,
    'height': 4, 'count': 1, 'crs': CRS.from_epsg(28992),
    'transform': Affine(250.0, 0.0, 159750.0, 0.0, -250.0, 404750.0),
    'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'deflate',
    'interleave': 'band'}

We can see that we are dealing with a single layer Geotiff (Gtiff) file, with a dimensions of 3x4 and that the projection is EPSG 28992. No surprise here of course. To learn more about getting raster properties and statistics, see here.

To plot the file, we can use the rasterio.plot.show() function of rasterio. We first import the show function. This time, we do not use the with statement. That means that when we are done, we need to close the connection explicitly with the close() method.

# Import the show function from rasterio
from rasterio.plot import show

# Open and plot the raster layer
raster = rasterio.open(filepath)
show(raster)
raster.close()

pH H20 for the top 5 cm layer, from soilgrid.org

pH H20 for the top 5 cm layer, from soilgrid.org

The show() function does not show the legend. If we want to plot the legend, we need to use matplotlib’s pyplot.imshow function to plot the raster layer, and use the pyplot.colorbar() to plot the legend.

# Import pyplot as plt
import matplotlib.pyplot as plt

# Print raster with legend
with rasterio.open(filepath) as src:
    lay = src.read(1)
plt.imshow(lay)
plt.colorbar()

The result is shows below. On the x- and y-axes the column and row numbers are given, rather than the coordinates. I haven’t really tried to get it to print with the proper coordinates, as for me this only serves for a quick look at the data anyway.

pH H20 (x 10) for the top 5 cm layer, from soilgrid.org

pH H20 (x 10) for the top 5 cm layer, from soilgrid.org

Import data in GRASS GIS

The advantage of scripting is that one can combine different tools in one pipeline. In the example below, the same code as above is used to download the pH layer. However, the region of interest and resolution are set using the g.region function from GRASS GIS. And the raster layer is imported in the GRASS GIS database.

# Load required libraries
import grass.script as gs
from grass.pygrass.modules import Module
from owslib.wcs import WebCoverageService
import os
import re

# Coverage service
var = "phh2o"
url = "http://maps.isric.org/mapserv?map=/map/{}.map".format(var)
wcs = WebCoverageService(url, version='1.0.0')

# Select the layer
cov_list = list(wcs.contents)
cov_id = cov_list[3]

# Download parameters
c = gs.region()
bbox = (c['w'], c['s'], c['e'], c['n'])
crs='urn:ogc:def:crs:EPSG::28992'

# Download layer and save as tif
response = wcs.getCoverage(
    identifier=cov_id, 
    crs=crs,
    bbox=bbox, 
    resx=c['nsres'],
    resy=c['ewres'], 
    format='GEOTIFF_INT16')
outputname = "{}.tif".format(cov_id)
with open(outputname, 'wb') as file:
    file.write(response.read())

# Import in grass and delete tif file
grasslayer = re.sub("-", "_", cov_id) 
Module("r.in.gdal", input=outputname, output=grasslayer)
os.remove(outputname)

Note that to run the examples above, you need to start GRASS GIS and use the integrated Python editor to run the code. Alternatively, you can start up your favorite Python IDE from the GRASS GIS console. This ensures that you can access the GRASS GIS functionality from within Python. For more information about using Python and GRASS GIS, see this wiki page. If you are new to GRASS GIS, you may want to start with this introduction.

pH H20 (x10) for the top 5 cm layer, from soilgrid.org

pH H20 (x10) for the top 5 cm layer, from soilgrid.org

See also this site for more information about how to access the soilgrid data in Python. Of, if you want to use the data in R, see see here) for an explanation how to download the data using R and gdal.