The canopy height map by WRI and META

Download and import the CHM data in GRASS GIS

osgeo
open data
grass gis
data import
Meta and the World Resources Institute launched a global map of tree canopy height at a 1-meter resolution. This post shows how to download and import the data in GRASS GIS for a specific area.
Author

Paulo van Breugel

Published

May 4, 2024

A global canopy height map

Meta and the World Resources Institute launched a global map of tree canopy height at a 1-meter resolution, allowing the detection of single trees on a global scale. Both the canopy height data and the models used to create the data are free and publicly available (Tolan, Couprie, et al. 2024). The maps were created using machine learning models on high-resolution worldwide Maxar satellite imagery. The details of the model used to create the data set are described in Tolan et al. (2024).

The data is available on Google Earth Engine and on AWS as cloud-optimized GeoTIFF tiles. In this blog post, I describe how you can download the tiles that intersect with a user-defined area of interest, import the resulting tiles in GRASS GIS, and stitch them together. For some steps, both the command line code and the Python code are provided. For other steps, where results from one command serve as input for the next, only the Python code is provided.

Warning

Before you start, be advised that the global map has some potential issues. If you zoom in, you’ll see conspicuous sudden changes in tree heights. These are more likely to reflect differences in the dates of acquisition of the source imagery than actual differences on the ground. This is easy to see if you zoom in on the Amazon or Congo basins, for example.

Install the software

I assume you already have working knowledge of GRASS GIS. If not, you are advised to start familiarizing yourself with the tool (check out the various resources on the GRASS GIS website).

To download the data from AWS, you first need to download and install the AWS Command Line Interface (AWS CLI) tool. Go to https://aws.amazon.com/cli/, select the installer for your operating system, and follow the instructions.

Set the environment

Start GRASS GIS and create a new project (or location, as it was called prior to version 8.4) with the coordinate reference system (CRS) EPSG 3857. Of course, if you already have a project with the required CRS, you can use that.

g.proj epsg=3857 project=PseudoMercator
g.mapset -c mapset=CanopyHeight project=PseudoMercator
import grass.script as gs
gs.run_command("g.proj", epsg=3857, project="PseudoMercator")
gs.run_command("g.mapset", flags="c", mapset="CanopyHeight", 
               project="PseudoMercator")

The global map of tree canopy heights comes in the form of cloud-optimized GeoTIFF files. In the next section, you will select and download the tiles for your area of interest to your computer. To make things easier, create a new (temporary) directory to download the files to and make that the working directory.

Note that I am working on Linux. However, apart from the file paths, the commands should be the same on the Windows command line.

mkdir /home/paulo/data/canopyheight
cd /home/paulo/data/canopyheight
import os
os.mkdir('/home/paulo/data/canopyheight')
os.chdir('/home/paulo/data/canopyheight')
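Note that os.mkdir raises an error if you rerun the script and the directory already exists. A small sketch that avoids this, using os.makedirs with exist_ok=True (the path is just an example; adapt it to your system):

```python
import os

# Create the working directory, including any missing parents;
# exist_ok prevents an error when the directory is already there
workdir = os.path.expanduser("~/data/canopyheight")
os.makedirs(workdir, exist_ok=True)
os.chdir(workdir)
```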

List files and folders

The data is available on AWS. Besides the cloud-optimized GeoTIFF files, there is also a GeoJSON file with polygons showing the location of the tiles. You can use the aws-cli tool you installed earlier to get a list of the files and directories that are available.

aws s3 ls --no-sign-request s3://dataforgood-fb-data/forests/v1/
import subprocess

subprocess.run(
    [
        "aws",
        "s3",
        "ls",
        "--no-sign-request",
        "s3://dataforgood-fb-data/forests/v1/"
    ]
)

To list the content of a sub-folder, run the same code as above, but add the name of the sub-folder for which you want to list the content. For example, you can use the following code to list the content of the sub-folder alsgedi_global_v6_float.

aws s3 ls --no-sign-request s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/
import subprocess

subprocess.run(
    [
        "aws",
        "s3",
        "ls",
        "--no-sign-request",
        "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/"
    ]
)

The listing shows that the folder contains the tiles.geojson file and three sub-folders: chm, metadata, and msk.

The chm sub-folder contains the GeoTIFF tiles of the canopy height map. Their names consist of a numerical code (the QuadKey) followed by the .tif extension. The tiles.geojson file is a vector layer with polygons showing the location of each tile. For each polygon, the attribute table contains the QuadKey of the corresponding GeoTIFF file.

The metadata folder contains for each tile a corresponding GeoJSON file, [QuadKey].geojson, with polygons that represent the observation date of the input imagery for the area marked by each polygon. And lastly, the sub-folder msk contains cloud-cover masks accompanying the GeoTIFFs; their names consist of the QuadKey followed by .tif.msk.
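If you want to inspect tiles.geojson outside GRASS GIS, you can also read the QuadKeys directly with Python's json module. This is only a sketch; it assumes the standard GeoJSON layout, with the QuadKey stored in a tile property (the same column name queried later with v.db.select):

```python
import json

def list_quadkeys(geojson_path):
    """Return the QuadKey stored in the 'tile' property of each polygon."""
    with open(geojson_path) as f:
        data = json.load(f)
    return [feature["properties"]["tile"] for feature in data["features"]]
```

For example, list_quadkeys("tiles.geojson") would return the QuadKeys of all tiles in the file.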

Select the tiles

You now know in which folder to find what data. The next step is to identify the tiles covering the area of interest. For this tutorial, the area of interest is the Terai Arc Landscape (TAL) region, which straddles the border of Nepal and India (Figure 1).

Figure 1: The lowlands of the Terai Arc Landscape TAL (light green) and the protected areas (dark green), on the foot of the Himalayas.

You can download the geopackage with the boundaries of the region here and save it in the working directory. Replace it with the vector layer representing your own area of interest. Next, import it into the mapset you created earlier.

Note that the CRS of the file is latlong EPSG 4326. The v.import function reprojects the file ‘on the fly’ during import.

v.import input=TALregion.gpkg layer=TALregion output=TALregion


gs.run_command("v.import", input="TALregion.gpkg", 
               layer="TALregion", output="TALregion")

Now, download the tiles.geojson file and import it into GRASS GIS. Note that according to the documentation, the polygon layer has the same CRS (EPSG 3857) as the GeoTIFF files. However, in reality, the file has the EPSG 4326 CRS. So here, too, v.import takes care of the reprojection.

# Download the file to the working directory
aws s3 cp --no-sign-request s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/tiles.geojson tiles.geojson

# Import the layer in GRASS GIS. 
v.import input=tiles.geojson output=tiles
# Download the file to the working directory
subprocess.run(
    [
        "aws",
        "s3",
        "cp",
        "--no-sign-request",
        "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/tiles.geojson",
        "tiles.geojson",
    ]
)

# Import the layer in GRASS GIS. 
gs.run_command("v.import", input="tiles.geojson", output="tiles")

The next step is to identify the tiles that overlap with the TAL region, and to get the corresponding QuadKeys. Because you’ll need the list with QuadKeys later on, it is arguably easier to do this in Python.

# Select the tiles that overlap with the TAL region.
gs.run_command("v.select", ainput="tiles", binput="TALregion",
               output="tiles_TAL", operator="overlap")

# Get a list with the QuadKeys from the attribute table
qk = gs.read_command(
    "v.db.select",
    flags="c",
    map="tiles_TAL",
    columns="tile",
    format="plain",
    separator="comma",
).split("\n")

# Remove the empty strings from the list
qk = [_f for _f in qk if _f]
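The plain-format output of v.db.select is one QuadKey per line, with a trailing newline that leaves an empty string after splitting. The filtering step can be illustrated on a small sample (the QuadKey values below are made up for illustration):

```python
# Example output of v.db.select in plain format: one value per line
sample = "302113322\n302113323\n302113332\n"

# Splitting on newlines leaves a trailing empty string ...
qk = sample.split("\n")
assert qk[-1] == ""

# ... which the list comprehension removes
qk = [value for value in qk if value]
assert qk == ["302113322", "302113323", "302113332"]
```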

Download and import the tiles

Now you can download the tiles that overlap with the TAL region. The following code will download the tiles, import them in GRASS GIS (see footnote 1), and delete the GeoTIFF files afterwards.

# Download and import
from os import path

# Download and import the files
baseurl = "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/chm"
for i, quad in enumerate(qk):
    downloadlink = f"{path.join(baseurl, quad)}.tif"
    layername = f"tile_{quad}"
    subprocess.run(
        [
            "aws",
            "s3",
            "cp",
            "--no-sign-request",
            downloadlink,
            f"{layername}.tif",
        ]
    )
    gs.run_command(
        "r.in.gdal", input=f"{layername}.tif", output=layername, memory=40000
    )
    os.remove(f"{layername}.tif")
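One portability caveat: path.join uses the platform's path separator, so on Windows it would insert a backslash into the S3 URI. A sketch that builds the download link in a platform-independent way, using posixpath (same base URL; the QuadKey is the example tile shown later in Figure 5):

```python
import posixpath

baseurl = "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/chm"
quad = "313132000"  # example QuadKey

# posixpath always joins with forward slashes, also on Windows
downloadlink = f"{posixpath.join(baseurl, quad)}.tif"
```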

The last step is to patch the tiles together into one layer. This requires you to first set the region with g.region, using the following parameters: vector=TALregion sets the extent to match that of the TAL region; raster=qkl makes the resolution of the output layer match that of the input tiles; and align=qkl[0] cleanly aligns the resulting layer with the input tiles.

# Create the list of layers
qkl = [f"tile_{x}" for x in qk]

# Set the region to match the extent of the TAL region
gs.run_command("g.region", vector="TALregion", raster=qkl, align=qkl[0])

You are almost done; what remains is using r.patch to patch the tiles together. Set the -s flag to speed up the process: it disables the time-consuming reading and creation of support files, which means the output map will have no category labels and no explicit color table. That is fine, as tree height is a continuous variable, and you can generate the color table afterwards. Make sure to set the nprocs and memory parameters to fit the specifications of your system.

# Patch the layers together
gs.run_command("r.patch", flags="s", input=qkl, output="CHMtmp", nprocs=10, memory=40000)

# Set color
color_rules = {
    0: "247:252:245",
    3: "229:245:224",
    6: "199:233:192",
    9: "161:217:155",
    12: "116:196:118",
    15: "65:171:93",
    18: "35:139:69",
    21: "0:109:44",
    24: "0:68:27",
    95: "0:0:0",
}
rules_file = gs.tempfile()
with open(rules_file, "w") as f:
    for value, color in color_rules.items():
        f.write(f"{value} {color}\n")
gs.run_command('r.colors', map="CHMtmp", rules=rules_file)
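For reference, the rules file written above is plain text with one "value R:G:B" pair per line, the format that r.colors reads. A minimal sketch of what gets written, using the first few rules:

```python
# A few of the color rules used above (value -> R:G:B)
color_rules = {0: "247:252:245", 3: "229:245:224", 6: "199:233:192"}

# Each rule becomes one "value R:G:B" line in the rules file
rules_text = "".join(f"{value} {color}\n" for value, color in color_rules.items())
```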

An additional step is to clip the raster to the boundaries of your area of interest. There are different ways to do this, but in this example, the r.clip addon is used.

# Install the r.clip addon
gs.run_command("g.extension", extension="r.clip")

# Set the region to match the extent of the area of interest
gs.run_command("g.region", vector="TALregion", align="CHMtmp")

# Set a mask to mask out all areas outside the area of interest
gs.run_command("r.mask", vector="TALregion")

# Clip the map
gs.run_command("r.clip", input="CHMtmp", output="CHM")

And the figure below shows the canopy height map of the Terai Arc Landscape at approximately 1-meter resolution.

Figure 2: The canopy height map for the Terai Arc Landscape region in Nepal and India.

Acquisition dates

It is important to realize that the dataset is based on available satellite imagery from 2009 through 2020. The reason is that cloud cover and seasonality impose limitations on the analyzed image dates.

Figure 3: Crown height in part of the TAL region. If you look closely, you can see some sharp boundaries between areas with higher and lower crown heights.
Figure 4: These sharp boundaries are the result of the CHM being based on images taken at different times rather than reflecting real differences.

It is therefore good to know that for each tile you can download a corresponding GeoJSON file with the observation date of the input imagery.

Figure 5: For each tile [QuadKey].tiff, the metadata folder contains a corresponding Geojson file [QuadKey].geojson with polygons that represent the observation date of the input imagery for area marked by that polygon. In this figure, the observation dates for the input imagery used to create tile 313132000 are shown.

The example code below will download the GeoJSON files and patch them together. The assumption is that you already imported the vector layer with the tile polygons (tiles_TAL). First, get the list of QuadKeys.

qk = gs.read_command(
    "v.db.select",
    flags="c",
    map="tiles_TAL",
    columns="tile",
    format="plain",
    separator="comma",
).split("\n")
qk = [_f for _f in qk if _f]

With the list of quadkeys, you can download and import the geojson files that overlap with your area of interest.

baseurl = "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/metadata"
for i, quad in enumerate(qk):
    downloadlink = f"{path.join(baseurl, quad)}.geojson"
    layername = f"tile_{quad}"
    subprocess.run(
        [
            "aws",
            "s3",
            "cp",
            "--no-sign-request",
            downloadlink,
            f"{layername}.geojson",
        ]
    )
    gs.run_command("v.import", input=f"{layername}.geojson", output=layername)
    os.remove(f"{layername}.geojson")

qkl = [f"tile_{x}" for x in qk]

After downloading and importing the files, create a new layer by combining the imported vector tiles. When done, remove the intermediate layers.

gs.run_command("v.patch", flags="e", input=qkl, output="CHM_tmp1")
gs.run_command("g.remove", type="vector", pattern="tile_*", flags="f")

You can optionally merge all polygons with the same date, using the v.dissolve function. A caveat is that the column used to dissolve features must be of type integer or string. So, the first step is to copy the dates from the acq_date column to a new string column, acq_strdate.

# Convert the date column to string, as required by the v.dissolve function
gs.run_command("v.db.addcolumn", map="CHM_tmp1", columns="acq_strdate varchar(24)")
gs.run_command(
    "db.execute",
    sql="UPDATE CHM_tmp1 SET acq_strdate = CAST(acq_date AS VARCHAR(24))",
)
gs.run_command("v.db.dropcolumn", map="CHM_tmp1", columns="acq_date")
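By default, GRASS GIS stores vector attributes in an SQLite database, so the CAST in the db.execute statement behaves like this stand-alone sqlite3 sketch (the table name and date value are made up for illustration):

```python
import sqlite3

# In-memory database standing in for the GRASS attribute table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chm (acq_date DATE)")
con.execute("INSERT INTO chm VALUES ('2019-03-07')")

# CAST to VARCHAR returns the stored date as a plain string
(strdate,) = con.execute(
    "SELECT CAST(acq_date AS VARCHAR(24)) FROM chm"
).fetchone()
```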

Now, the polygons with the same dates can be dissolved, using the newly created column as the key column. Optionally, you can create a new column that holds the dates as type ‘date’ rather than as strings.

# Dissolve areas with same acquire date
gs.run_command("v.dissolve", input="CHM_tmp1", column="acq_strdate", 
               output="CHM_tmp2")

# Create a new column holding the dates as date
gs.run_command("v.db.addcolumn", map="CHM_tmp2", 
               columns="acq_date date")
gs.run_command("db.execute", 
               sql="UPDATE CHM_tmp2 SET acq_date = a_acq_strdate")
gs.run_command("v.db.dropcolumn", map="CHM_tmp2", 
                columns="a_acq_strdate")

One last optional step is to clip the polygon layer to the boundaries of the area of interest, in this example by clipping the CHM_tmp2 vector layer to the boundaries of the TAL region.

gs.run_command("v.clip", input="CHM_tmp2", clip="TALregion", 
               output="CHM_acq_dates")
gs.run_command("g.remove", type="vector", pattern="CHM_tmp*", flags="f")

It is advisable to use the CHM map in conjunction with the vector map of the acquisition dates of the source imagery. And for now, I would only compare tree heights based on the same source image.

Figure 6: The canopy height map for the Terai Arc Landscape region in Nepal and India, overlaid with the vector map of the acquisition dates of source imagery.
Figure 6: The canopy height map for the Terai Arc Landscape region in Nepal and India, overlaid with the vector map of the acquisition dates of source imagery.

References

Tolan, Jamie, Camille Couprie, John Brandt, Justine Spore, Tobias Tiecke, Tracy Johns, and Patrick Nease. 2024. “Using Artificial Intelligence to Map the Earth’s Forests.” Meta Sustainability. https://sustainability.fb.com/blog/2024/04/22/using-artificial-intelligence-to-map-the-earths-forests/.
Tolan, Jamie, Hung-I Yang, Benjamin Nosarzewski, Guillaume Couairon, Huy V. Vo, John Brandt, Justine Spore, et al. 2024. “Very High Resolution Canopy Height Maps from RGB Imagery Using Self-Supervised Vision Transformer and Convolutional Decoder Trained on Aerial Lidar.” Remote Sensing of Environment 300 (January): 113888. https://doi.org/10.1016/j.rse.2023.113888.

Footnotes

  1. An alternative way to deal with the tiles is using gdalbuildvrt to create a virtual mosaic, and r.external to import the virtual layer in GRASS GIS. See the GRASS GIS Wiki page on large data handling for the details.↩︎