Linking spatial and non-spatial dataframes

Note: the order in which you link does matter

data wrangling
When linking a spatial dataframe and a non-spatial dataframe in R using the dplyr syntax, note that the order in which you do this matters.

tags: sf, dplyr, SQL

Paulo van Breugel


December 6, 2020

This is more of a quick note to self. In R, to link two tables using the dplyr syntax, one can use, e.g., a left or a right join. The latter can be handy if you want to mutate a table and subsequently link it to another table in a single pipe.

# Library
library(dplyr)

# Create two tables
TableA <- data.frame(ID = c(1:4), y = rnorm(4))
TableB <- data.frame(ID = c(1:4), z = runif(4))

# Join the tables
TableC <- TableA %>%
  mutate(n = y^2) %>%
  right_join(., TableB, by = "ID")

# Result
  ID          y          n         z
1  1  1.0450788 1.09218965 0.6406591
2  2  0.1668342 0.02783364 0.1738163
3  3 -2.2805624 5.20096491 0.7133560
4  4 -0.9184664 0.84358051 0.8511371

Of course, this can also be done as:

TableT <- TableA %>% mutate(n = y^2)
TableC <- left_join(TableB, TableT, by = "ID")
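To check that the two versions really produce the same table, the sketch below builds both and compares them. It assumes a fixed seed (set.seed(1), not in the original) and uses the illustrative names C1 and C2. The only expected difference is the column order, because each join appends the columns of its second table to those of the first.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only to make the comparison reproducible
TableA <- data.frame(ID = 1:4, y = rnorm(4))
TableB <- data.frame(ID = 1:4, z = runif(4))

# Version 1: mutate and right join in one pipeline
C1 <- TableA %>%
  mutate(n = y^2) %>%
  right_join(TableB, by = "ID")

# Version 2: two statements, left join with TableB as the first table
TableT <- TableA %>% mutate(n = y^2)
C2 <- left_join(TableB, TableT, by = "ID")

names(C1)  # columns ID, y, n, z
names(C2)  # columns ID, z, y, n

# Same content once the columns are put in the same order
all.equal(C1, C2[, names(C1)])
```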

Whether you write this as one statement or split it up is mostly a matter of taste. It does matter, though, if one of the tables is a spatial dataframe (sf class), because depending on the order in which the tables are joined, the sf class is either passed on to the result or dropped.

In the example below, TableB is recreated as a spatial data.frame of class sf.

# Library
library(sf)

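# Create spatial data.frame; st_as_sf() turns x and y into point
# geometries and, by default (remove = TRUE), drops those two columns
TableB <- data.frame(ID = c(1:4), z = runif(4),
                     x = seq(5.1, 5.4, 0.1),
                     y = rep(52, 4)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)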

Now let's link TableA and TableB using a left join, with the spatial TableB as the first argument.

# Link TableA and TableB using left_join
TableC <- left_join(TableB, TableA, by = "ID")
class(TableC)
[1] "sf"         "data.frame"

As you can see, the resulting data.frame has the class sf, i.e., it is a spatial data.frame. And what if we use a right join, with the non-spatial TableA as the first argument?

TableC <- right_join(TableA, TableB, by = "ID")
class(TableC)
[1] "data.frame"

This results in a normal dataframe. In other words, the result takes on the class of the first table passed to the join (dplyr's x argument): if that is the spatial dataframe, you get a spatial object; if it is the ordinary dataframe, the sf class is dropped.
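The same rule should hold for other join verbs: dplyr documents that a join returns an object of the same type as x, and the sf package registers its own join methods for sf objects that keep the geometry and the sf class. The sketch below, assuming reasonably recent versions of dplyr and sf, swaps the arguments of left_join and inner_join and checks the class of each result (sf_left, sf_inner, df_left and df_inner are illustrative names).

```r
library(dplyr)
library(sf)

# Recreate the two tables used in this post
TableA <- data.frame(ID = 1:4, y = rnorm(4))
TableB <- data.frame(ID = 1:4, z = runif(4),
                     x = seq(5.1, 5.4, 0.1),
                     y = rep(52, 4)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Spatial table first (x = TableB): sf's join methods keep the sf class
sf_left  <- left_join(TableB, TableA, by = "ID")
sf_inner <- inner_join(TableB, TableA, by = "ID")

# Ordinary data.frame first (x = TableA): the result is a plain data.frame,
# with the geometry carried along as an ordinary column
df_left  <- left_join(TableA, TableB, by = "ID")
df_inner <- inner_join(TableA, TableB, by = "ID")

# Which results are sf objects?
sapply(list(sf_left, sf_inner, df_left, df_inner), inherits, what = "sf")
```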

Note that in the right join example, only the sf class is dropped; the geometry column is still there. This means that you can convert the dataframe back to a spatial dataframe afterwards:

st_geometry(TableC) <- TableC$geometry
class(TableC)
[1] "sf"         "data.frame"
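As an alternative to assigning the geometry with st_geometry<-, st_as_sf() can, as far as I know, also be applied directly to a data.frame that already contains a geometry (sfc) column; it then uses that column as the geometry. A minimal sketch under that assumption:

```r
library(dplyr)
library(sf)

# Recreate the two tables used in this post
TableA <- data.frame(ID = 1:4, y = rnorm(4))
TableB <- data.frame(ID = 1:4, z = runif(4),
                     x = seq(5.1, 5.4, 0.1),
                     y = rep(52, 4)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Data.frame first, so the sf class is dropped (the geometry column remains)
TableC <- right_join(TableA, TableB, by = "ID")
class(TableC)

# Assumption: st_as_sf() on a data.frame with an sfc column picks that
# column up as the geometry and restores the sf class
TableC <- st_as_sf(TableC)
class(TableC)
```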

But why not avoid that extra step? Just pay attention to the order in which the tables are linked.
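If you prefer the one-statement style of the first example, you can keep it and still end up with an sf object by starting the pipe from the spatial table and doing the mutate inside the join call. A sketch, with the tables recreated so the block runs on its own:

```r
library(dplyr)
library(sf)

# Recreate the two tables used in this post
TableA <- data.frame(ID = 1:4, y = rnorm(4))
TableB <- data.frame(ID = 1:4, z = runif(4),
                     x = seq(5.1, 5.4, 0.1),
                     y = rep(52, 4)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Start from the spatial table so that it is the x argument of the join;
# the mutate on the non-spatial table happens inside the join call
TableC <- TableB %>%
  left_join(TableA %>% mutate(n = y^2), by = "ID")

class(TableC)
```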