The canopy height map by WRI and META
Download and import the CHM data in GRASS GIS
A global canopy height map
Meta and the World Resources Institute launched a global map of tree canopy height at a 1-meter resolution, allowing the detection of single trees on a global scale. Both the canopy height data and the models used to create the data are free and publicly available (Tolan, Couprie, et al. 2024). The maps were created using machine learning models on high-resolution worldwide Maxar satellite imagery. The details of the model used to create the data set are described in Tolan et al. (2024).
The data is available on Google Earth Engine and on AWS as cloud-optimized GeoTIF tiles. In this blog post, I describe how you can download the tiles that intersect with a user-defined area of interest, import the resulting tiles in GRASS GIS, and stitch them together. For some steps, the command line code () and Python code () are provided. For other steps, where results from one command serve as input for the next, only the Python code is provided.
Before you start, be advised that the global map has some potential issues. If you zoom in, you’ll see conspicuous sudden changes in tree heights. These are more likely to reflect differences in the dates of acquisition of the source imagery than actual differences on the ground. This is easy to see if you zoom in on the Amazon or Congo basins, for example.
Install the software
I assume you already have working knowledge of GRASS GIS. If not, you are advised to start familiarizing yourself with the tool (check out the various resources on the GRASS GIS website).
To download the data from AWS, you first need to download and install the AWS Command Line Interface (AWS CLI) tool. Go to https://aws.amazon.com/cli/, select the installer for your operating system, and follow the instructions.
Set the environment
Start GRASS GIS and create a new project (or locations as it was called prior to version 8.4) with the coordinate reference system (CRS) ESP 3857. Of course, if you already have a project with the required CRS, you can use that.
The global map of tree canopy heights comes in the form of cloud optimized GeoTIFF files. In the next section, you will select and download the tiles for your area of interest to your computer. To make things easier, create a new (temporary) directory to download the files to and make that the working directory.
List files and folders
The data is available on AWS. Besides the Cloud-optimized GeoTIFF files, there is also a Geojson file with the polygons showing the location of the tiles. You can use the asw-cli tool you installed earlier to get a list of the files and directories that are available. Click the
To list the content of a sub-folder, run the same code as above, but add the name of the sub-folder for which you want to list the content. For example, you can use the following code to list the content of the sub-folder alsgedi_global_v6_float.
If you click the
The chm sub-folder contains the Geotiff tiles of the canopy height map. Their names consist of a numerical code (the QuadKey) followed by the .tiff extension. The tiles.geojson file is a vector layer with polygons showing the location of each tile. For each polygon, the attribute table contains the QuadKey of the corresponding Geotiff file.
The metadata folder contains a corresponding Geojson file [QuadKey].geojson with polygons that represent the observation date of the input imagery for area marked by that polygon. And lastly, the sub-folder msk contains cloud-cover masks accompanying GeoTIFs. The names are composed of the QuadKey.tif.msk.
Select the tiles
You now know in which folder to find what data. The next step is to identify the tiles covering the area of interest. For this tutorial, the area of interest is the Terai Arc Landscape (TAL) region. This is a region that straddles the border of Nepal and India Figure 1.
You can download the geopackage with the boundaries of the region here in the working directory. Replace this with the vector layer representing your area of interest. Next, import it into the mapset you created earlier.
Now, download the tiles.geojson file and import it into GRASS GIS. Note that according to the documentation, the polygon layer has the same CRS (EPSG 3857) as the GeoTIF files. However, in reality, the file has the EPSG 4326 CRS. So also here, use the r.import
function.
The next step is to identify the tiles that overlap with the TAL region, and to get the corresponding QuadKeys. Because you’ll need the list with QuadKeys later on, it is arguably easier to do this in Python.
# Select the tiles that overlap with the TAL region.
gs.run_command("v.select", ainput="tiles", binput="TALregion",
output="tiles_TAL", operator="overlap")
# Get a list with the QuadKeys from the attribute table
qk = gs.read_command(
"v.db.select",
flags="c",
map="tiles_TAL",
columns="tile",
format="plain",
separator="comma",
).split("\n")
# Remove the empty strings and convert string to integer
qk = [_f for _f in qk if _f]
Download and import the tiles
Now you can download the tiles that overlap with the TAL region. The following code will download the tiles, import them in GRASS GIS 1, and delete the Geotiff.
# Download and import
from os import path
# Download and import the files
baseurl = "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/chm"
for i, quad in enumerate(qk):
downloadlink = f"{path.join(baseurl, quad)}.tif"
layername = f"tile_{quad}"
subprocess.run(
[
"aws",
"s3",
"cp",
"--no-sign-request",
downloadlink,
f"{layername}.tif",
]
)
gs.run_command(
"r.in.gdal", input=f"{layername}.tif", output=layername, memory=40000
)
os.remove(f"{layername}.tif")
Last step is to patch the tiles together into one layer. This requires you to set the region first, using g.region, using the following parameters: With vector = TALregion you set the extent to match that of the TAL region. With raster = qkl the resolution of the output layer will match that of the input tiles. And with align = qkl[0], the resulting layer will be cleanly aligned with the input tiles.
You are almost done, what remains is using r.patch
to patch the tiles together. Set the -s flag to speed up the process. It will disable the time consuming reading and creation of support files. It means that the output map will have no category labels and no explicit color table. Which is fine, as tree heights represent a continuous variable, and you can generate the color table afterwards. Make sure to set the nprocs and memory parameters to fit the specifications of your system.
# Patch the layers together
gs.run_command("r.patch", flags="s", input=qkl, output="CHMtmp", nprocs=10, memory=40000)
# Set color
color_rules = {
0: "247:252:245",
3: "229:245:224",
6: "199:233:192",
9: "161:217:155",
12: "116:196:118",
15: "65:171:93",
18: "35:139:69",
21: "0:109:44",
24: "0:68:27",
95: "0:0:0",
}
rules_file = gs.tempfile()
with open(rules_file, "w") as f:
for value, color in color_rules.items():
f.write(f"{value} {color}\n")
gs.run_command('r.colors', map="CHMtmp", rules=rules_file)
An additional step is to clip the raster to the boundaries of your area of interest. There are different ways to do this, but in this example, the r.clip is used.
# Install the r.clip addon
gs.run_command("g.extension", extension="r.clip")
# Set the region to match the extent of the area of interest
gs.run_command("g.region", vector="TAL_region", align="CHM")
# Set a mask to mask out all areas outside the areas of interest
gs.run_command("r.mask", vector="TAL_region")
# Clip the map
gs.run_command("r.clip", input="CHMtmp", output="CHM")
And the figure below shows the approximately 1 meter resolution canopy height map of the Terai Arc Landscape.
Acquisition dates
It is important to realize that the dataset is based on available satellite imagery from 2009 through 2020. The reason is that cloud cover and seasonality impose limitations on the analyzed image dates.
It is therefore good to know that you can download for each tile a corresponding Geojson file with the observation date of the input imagery.
The example code below will download the Geojson files and patch them together. The assumption is that you already imported the file with tiles (tiles_TAL). Now, first get the list of quadkeys
With the list of quadkeys, you can download and import the geojson files that overlap with your area of interest.
baseurl = "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/metadata"
for i, quad in enumerate(qk):
downloadlink = f"{path.join(baseurl, quad)}.geojson"
layername = f"tile_{quad}"
subprocess.run(
[
"aws",
"s3",
"cp",
"--no-sign-request",
downloadlink,
f"{layername}.geojson",
]
)
gs.run_command("v.import", input=f"{layername}.geojson", output=layername)
os.remove(f"{layername}.geojson")
qkl = [f"tile_{x}" for x in qk]
After downloading and importing the files, create a new layer by combining the imported vector tiles. When done, remove the intermediate layers.
You can optionally merge all polygons with the same date, using the v.dissolve function. A caveat is that the column used to dissolve features must be of type integer or string. So, the first step is to create a new column in which you copy the dates from the acq_date column to a new column acq_strdate as strings.
# Convert column with dates to string, required for the r.dissolve function
gs.run_command("v.db.addcolumn", map="CHM_tmp1", columns="acq_strdate varchar(24)")
gs.run_command(
"db.execute",
sql="UPDATE CHM_tmp1 SET acq_strdate = CAST(acq_date AS VARCHAR(24))",
)
gs.run_command("v.db.dropcolumn", map="CHM_tmp1", columns="acq_date")
Now, the polygons with the same dates can be dissolved, using the newly created column as the key column. Optionally, you can create a new column that holds the dates, using the type ‘date’, rather than ‘strings’.
# Dissolve areas with same acquire date
gs.run_command("v.dissolve", input="CHM_tmp1", column="acq_strdate",
output="CHM_tmp2")
# Create a new column holding the dates as date
gs.run_command("v.db.addcolumn", map="CHM_tmp2",
columns="acq_date date")
gs.run_command("db.execute",
sql="UPDATE CHM_tmp2 SET acq_date = a_acq_strdate")
gs.run_command("v.db.dropcolumn", map="CHM_tmp2",
columns="a_acq_strdate")
One last optional step is to clip the polygon layer to the boundaries of the area of interest. In this example, this is to clip the CHM_tmp3 vector layer to the boundaries of the TAL region.
It is advisable to use the CHM map in conjunction with the vector map of the acquisition dates of the source imagery. And for now, I would only compare tree heights based on the same source image.
References
Footnotes
An alternative way to deal with the tiles is using gdalbuildvrt to create a virtual mosaic, and r.external to import the virtual layer in GRASS GIS. See the GRASS GIS Wiki page on large data handling for the details.↩︎