# Compute the area per category

Exploring different methods to do this in GRASS GIS

## The question

Suppose I have a categorical raster layer. Now I want to create a second raster layer with values that represent the surface area of the categories in the first layer. Below, I test four different methods to see which of these is the fastest.

The first three methods result in a raster layer with values that represent the surface area of the categories (A in figure 1). The fourth option calculates the area for each group of cell that form physically discrete areas (B in figure 1). You can achieve the same with the first three options by first recategorizing the contiguous areas into unique categories. You can do that with the function r.clump.

I provide the Python code using the library, but of course, you can run the same from the command line or using the menu. For the examples, I use the landclass96 raster layer from the North Carolina dataset. You can download the dataset here.

## r.area & r.mapcalc

We can use the r.area addon to create a raster layer with values representing the size of the categories in the category_map in terms of number of cells. Next, we multiply this with the surface area per raster cell using r.mapcalc.

# Import the libraries
import grass.script as gs
import datetime

# Set the region
gs.run_command("g.region", raster="landclass96")

# Compute the area per category (repeat 25 times)
begin_time = datetime.datetime.now()

for i in range(1,25):
# Area per categories (number of cells)
output1 = "test0_{}a".format(i)
output2 = "test0_{}b".format(i)
gs.run_command("r.area", input="landclass96", output=output1)

# Convert number of cells to m2
expr = expression="{} = {} * area()".format(output2, output1)
gs.run_command("r.mapcalc", expression=expr)

# Print the runtime in seconds
runtime01 = (datetime.datetime.now() - begin_time).total_seconds()
print("The runtime is {} seconds".format(runtime01))

# Clean up
gs.run_command("g.remove", flags="f", type="raster", pattern="test0_*",
quiet=True)
The runtime is 5.03994 seconds

## r.stats and r.recode

We can use the r.stats function to compute for each raster category the area. Based on these values, we can then create a recode string and use this as input in the r.recode function to create the map with the surface area per category.

import grass.script as gs
import datetime

def surfaceArea(input, output):
""" Compute a raster layer with for each raster cell the surface area of the
category it belong to
"""
# Compute the area per category
separator=";").split("\n")
p = [i.replace('\r', '') for i in p]
p[:] = [x for x in p if x]
p = [i.split(";") for i in p]

# Create the recode rules
a = []
for i in range(0, len(p)):
sarea = float(p[i][1])
a.append("{0}:{0}:{1}".format(p[i][0], sarea))
rules = "\n".join(a)

# Recod input raster map based on recode rules.
gs.write_command("r.recode", input=input, output=output,
rules="-", stdin=rules)

# Compute the area per category (repeat 25 times)
begin_time = datetime.datetime.now()

for i in range(1,25):
# Area per categories (number of cells)
output1 = "test0_{}".format(i)
surfaceArea("landclass96", output1)

# Print the runtime in seconds
runtime01 = (datetime.datetime.now() - begin_time).total_seconds()
print("The runtime is {} seconds".format(runtime01))

# Clean up
gs.run_command("g.remove", flags="f", type="raster", pattern="test0_*",
quiet=True)
The runtime is 4.110168 seconds 

It runs faster than the previous options. I am not sure if the number of categories will make a difference though.

## r.mapcalc & r.stats.zonal

The third options uses the r.mapcalc and r.stats.zonal functions. First we compute the area of each raster cell. Next, we sum all these values per category.

# Import the libraries
import grass.script as gs
import datetime

# Set the region
gs.run_command("g.region", raster="landclass96")

# Compute the area per category (repeat 25 times)
begin_time = datetime.datetime.now()

for i in range(1,25):
# Area per categories (number of cells)
output1 = "test0_{}a".format(i)
output2 = "test0_{}b".format(i)
gs.run_command("r.mapcalc", expression = "{} = area()".format(output1))
gs.run_command("r.stats.zonal", base="landclass96", cover=output1,
method="sum", output=output2)

# Print the runtime in seconds
runtime01 = (datetime.datetime.now() - begin_time).total_seconds()
print("The runtime is {} seconds".format(runtime01))

# Clean up
gs.run_command("g.remove", flags="f", type="raster", pattern="test0_*",
quiet=True)
The runtime is 5.289945 seconds

Slightly slower than the first option. The difference is small, so I ran both options a number of times. Results showed that this option is consistently slower than the first option.

## r.to.vect & v.to.db

The fourth option is different, in that the raster layer is converted to a vector layer first, using r.to.vect. Then, the v.to.db function is used to calculate the area per polygon. These values will be used as source for the raster values when converting the vector layer back to a raster layer using the v.to.rast function. Note that this means that areas are calculated for the continuous cluster of cells with the same categories (B in figure 1).

# Import the libraries
import grass.script as gs
import datetime

# Set the region
gs.run_command("g.region", raster="landclass96")

# Compute the area per category (repeat 25 times)
begin_time = datetime.datetime.now()

for i in range(1,25):
# Area per categories (number of cells)
output1 = "test0_{}a".format(i)
output2 = "test0_{}b".format(i)
gs.run_command("r.to.vect", input="landclass96", output=output1,
type="area")
gs.run_command("v.to.db", map=output1, option="area", columns="area",
units="meters")
gs.run_command("v.to.rast", input=output1, output=output2,
use="attr", attribute_column="area", memory=2000)

# Print the runtime in seconds
runtime01 = (datetime.datetime.now() - begin_time).total_seconds()
print("The runtime is {} seconds".format(runtime01))

# Clean up
gs.run_command("g.remove", flags="f", type="all", pattern="test0_*",
quiet=True)
The runtime is 81.037101 seconds

This option is clearly much slower. However, when you need to compute multiple statistics per area, if you need to do some follow up calculations, or if you need a vector layer as output, this might still be a good option.

## Summary

The combination of r.stats and r.recode is the fastest option. It requires a bit more of coding, but that’s just part of the fun, isn’t it :-). The fourth option is clearly slower, which is largely due to/from the raster to vector conversion. Currently this Google of Summer project works on the parallelization of existing modules for GRASS GIS, including the r.to.* modules. So this may improve in the near feature.

##### Paulo van Breugel
###### Lecturer & researcher

My interests range from biodiversity and ecology to spatial data analysis. I am also what one could describe as a lifelong learner; I enjoy to learn new things and widen my horizon, both professionally and personally.