Access soilgrid data using the WCS protocol
Soilgrids
The website soilgrids.org provides access to grid layers representing the spatial distribution of soil properties across the globe. The SoilGrids maps are publicly available under the CC-BY 4.0 License. The soilgrids.org provides a convenient portal to explore the available data sets. The data can furthermore be accessed through various services, which are listed on the ISRIC site.
One of the available services is the Web Coverage Service (WCS), which is a standard issued by the Open Geospatial Consortium (OGC). It allows easy remote access to raster layers from a web browser or any other (GIS) program that supports the protocol. What makes it convenient is that one can choose what part of the available data to download based on spatial constraints and other query criteria.
QGIS has a native WCS provider. Downloading the data directly in QGIS is convenient if the goal is to visualize or explore few layers. But what if you use the data as input in e.g. a modeling pipeline in Python? In that case, being able to access the data from Python will greatly simplify your life. Below I’ll provide an example of how to access the data from Python using the OWSlib package.
Using OWSlib
In this post I’ll use the Python OWSlib package. Full documentation about this package can be found here, including installation instructions. The example below is based/inspired on the examples provided in this collection of Jupyter notebooks. Note that there are two versions of the WCS standard, the 1.0.0 and 2.0 versions. The latest version introduces some important improvements. However, I couldn’t get it to work with my dataset, so I’ll use version 1.0.0. As soon as I have figured out what I am doing wrong using version 2.0, I’ll update this post.
Available layers
For each different soil property predicted in SoilGrids there is an independent service. Check the web page maps.isric.org for all the services available. In this post, we’ll use the service for the soil pH as example. First step is to check which grid layers are available using the WebCoverageService
function, which issues a GetCapabilities
request. As input we provide the URL and the variable.
This gives us a service connection. From this the available coverages (maps) can be accessed with the contents
property. This returns a dictionary of 24 coverages, combining six standard depth intervals with four quantiles. Below, this dictionary is converted to a list for easier printing. Below only the first 6 layers are shown.
>> ['phh2o_0-5cm_Q0.05', 'phh2o_0-5cm_Q0.5', 'phh2o_0-5cm_Q0.95',
'phh2o_0-5cm_mean', 'phh2o_0-5cm_uncertainty',
'phh2o_5-15cm_Q0.5', ...]
We can easily filter out specific layers. For example, with the code below we filter out the layers reporting the average predictions for each of the standard depth intervals.
>> ['phh2o_0-5cm_mean', 'phh2o_5-15cm_mean', 'phh2o_15-30cm_mean',
'phh2o_30-60cm_mean', 'phh2o_60-100cm_mean', 'phh2o_100-200cm_mean']
Download a layer
To download a layer for a specific region, we can use the getCoverage
function. Note that we need to establish the coverage service connection first (we already did this above). The getCoverage
function needs the following parameters: the variable (identifier), the coordinate reference systems (crs), the horizontal (resx) and vertical (resy) resolution, the bounding box (bbox) and the output format (e.g., GEOTIFF_INT16).
The format for the crs is defined as follows: urn:ogc:def:crs:EPSG::4326
. So we only need to look up the EPSG identifier for the projection in which we want to have the data. The EPSG 4326 is the identifier for the World Geodetic System 1984 (in short WGS 84). For the Amersfoort / RD New projection, the EPSG identifier is 28992. So, we would use crs = 'urn:ogc:def:crs:EPSG::28992'
.
The resolution needs to be in the projection’s unit. So, for EPSG 4326 you define the resolution in degrees, while for EPSG 28992 the resolution should be given in meters. The bounding box is defined by the coordinates of the lower left and upper right corner of your region of interest. This takes the following form: bbox = (minimum x, minimum y, maximum x, maximum y)
.
In the example below, we download the layer for a 16 hectare site in the municipality of Meierijstad in the south of the Netherlands. A new forest garden was established here a few years back, and our students of the Applied Geo-information Science program are involved in a monitoring program as described here. The bounding box of this region is (159750, 403750, 160500, 404750). we’ll download the data in RD New projection and 250 meter resolution.
# Download parameters
variable = 'phh2o_0-5cm_mean'
bbox = (159750, 403750, 160500, 404750)
crs='urn:ogc:def:crs:EPSG::28992'
resolution=250
# Get the data
response = wcs.getCoverage(
identifier=variable,
crs=crs,
bbox=bbox,
resx=resolution,
resy=resolution,
format='GEOTIFF_INT16')
# Save the data as geotif
outputname = "{}.tif".format(variable)
with open(outputname, 'wb') as file:
file.write(response.read())
Next step is to inspect and print the map. For that, we’ll use the rasterio library. We can open the file with rasterio.open() function and subsequently read the metadata with the profile method. By wrapping the latter in a with...
statement, the statement is executed once the file is opened, and the file is closed when the context manager exits. This means there is no need to manually close the raster file. This use of a so-called context manager is further explained here.
>> {'driver': 'GTiff', 'dtype': 'int16', 'nodata': None, 'width': 3,
'height': 4, 'count': 1, 'crs': CRS.from_epsg(28992),
'transform': Affine(250.0, 0.0, 159750.0, 0.0, -250.0, 404750.0),
'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'deflate',
'interleave': 'band'}
We can see that we are dealing with a single layer Geotiff (Gtiff) file, with a dimensions of 3x4 and that the projection is EPSG 28992. No surprise here of course. To learn more about getting raster properties and statistics, see here.
To plot the file, we can use the rasterio.plot.show() function of rasterio. We first import the show function. This time, we do not use the with statement. That means that when we are done, we need to close the connection explicitly with the close()
method.
The show()
function does not show the legend. If we want to plot the legend, we need to use matplotlib’s pyplot.imshow function to plot the raster layer, and use the pyplot.colorbar() to plot the legend.
The result is shows below. On the x- and y-axes the column and row numbers are given, rather than the coordinates. I haven’t really tried to get it to print with the proper coordinates, as for me this only serves for a quick look at the data anyway.
Import data in GRASS GIS
The advantage of scripting is that one can combine different tools in one pipeline. In the example below, the same code as above is used to download the pH layer. However, the region of interest and resolution are set using the g.region
function from GRASS GIS. And the raster layer is imported in the GRASS GIS database.
# Load required libraries
import grass.script as gs
from grass.pygrass.modules import Module
from owslib.wcs import WebCoverageService
import os
import re
# Coverage service
var = "phh2o"
url = "http://maps.isric.org/mapserv?map=/map/{}.map".format(var)
wcs = WebCoverageService(url, version='1.0.0')
# Select the layer
cov_list = list(wcs.contents)
cov_id = cov_list[3]
# Download parameters
c = gs.region()
bbox = (c['w'], c['s'], c['e'], c['n'])
crs='urn:ogc:def:crs:EPSG::28992'
# Download layer and save as tif
response = wcs.getCoverage(
identifier=cov_id,
crs=crs,
bbox=bbox,
resx=c['nsres'],
resy=c['ewres'],
format='GEOTIFF_INT16')
outputname = "{}.tif".format(cov_id)
with open(outputname, 'wb') as file:
file.write(response.read())
# Import in grass and delete tif file
grasslayer = re.sub("-", "_", cov_id)
Module("r.in.gdal", input=outputname, output=grasslayer)
os.remove(outputname)
Note that to run the examples above, you need to start GRASS GIS and use the integrated Python editor to run the code. Alternatively, you can start up your favorite Python IDE from the GRASS GIS console. This ensures that you can access the GRASS GIS functionality from within Python. For more information about using Python and GRASS GIS, see this wiki page. If you are new to GRASS GIS, you may want to start with this introduction.
See also this site for more information about how to access the soilgrid data in Python. Of, if you want to use the data in R, see see here) for an explanation how to download the data using R and gdal.