Linking spatial and non-spatial dataframes

Note, the order in which you link does matter

This is more of a quick note to self. In R, to link to two tables using the dplyr syntax, one can use e.g. a left- or right-join. The latter can be handy if you want to mutate a table, and subsequently want to link it to another table.

# Library
suppressPackageStartupMessages(
  library(dplyr)
)

# Create two tables
TableA <- data.frame(ID = c(1:4), y = rnorm(4))
TableB <- data.frame(ID = c(1:4), z = runif(4))

# Join the tables
TableC <- TableA %>%
  mutate(n = y^2) %>%
  right_join(., TableB, by = "ID")

# Result
print(TableC)
##   ID          y          n          z
## 1  1 -0.3432263 0.11780427 0.78805883
## 2  2  0.9680410 0.93710336 0.63066454
## 3  3 -1.4392307 2.07138486 0.36640277
## 4  4  0.1358082 0.01844387 0.08568824

Of course, this can also be done as:

TableT <- TableA %>%  mutate(n = y^2)
TableC <- left_join(TableB, TableT)

Mostly a matter of taste, whether you like to have one statement, or rather split it up. It does matter though, if one of the tables is a spatial dataframe (sf class). This is because depending on the order in which the tables are joined, the sf class is passed on or not.

In the example below, a new spatial data.frame of class sf is created.

# Library
suppressPackageStartupMessages(
  library(sf)
)

# Create spatial data.frame
TableB <- data.frame(ID = c(1:4), z = runif(4), 
                     x=seq(5.1, 5.4, 0.1),
                     y=rep(52, 4)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

Now let’s link the TableA and TableB using a left join.

# Link TableA and TableB using left_join
TableC <- left_join(TableB, TableA, by = "ID")
class(TableC)
## [1] "sf"         "data.frame"

As you see, the resulting data.frame has the class sf, i.e., it’s a spatial data.frame. And what if we use a right join?

TableC <- right_join(TableA, TableB,  by = "ID")
class(TableC)
## [1] "data.frame"

This results in a normal dataframe. This shows that when you link a dataframe to a spatial dataframe (in that order), you get a spatial object. If you do this the other way around, i.e., link the spatial dataframe to another dataframe, the sf class is dropped.

Note that in the second example, only the sf class is dropped. The geometry column is still there. This means that you can convert the dataframe to a spatial dataframe afterwards.

st_geometry(TableC) <- TableC$geometry
class(TableC)
## [1] "sf"         "data.frame"

But why not avoid that extra step? Just pay attention to the order in which the tables are linked.

Paulo van Breugel
Paulo van Breugel
Lecturer & researcher

My interests range from biodiversity and ecology to spatial data analysis. I am also what one could describe as a lifelong learner; I enjoy to learn new things and widen my horizon, both professionally and personally.

comments powered by Disqus
Next
Previous