When linking a spatial dataframe and a non-spatial dataframe in R using the the dplyr syntax, note that the order in which you do this matters. tags: sf, dplyr, SQL
Author
Paulo van Breugel
Published
December 6, 2020
This is more of a quick note to self. In R, to link to two tables using the dplyr syntax, one can use e.g. a left- or right-join. The latter can be handy if you want to mutate a table, and subsequently want to link it to another table.
# LibrarysuppressPackageStartupMessages(library(dplyr))# Create two tablesTableA <-data.frame(ID =c(1:4), y =rnorm(4))TableB <-data.frame(ID =c(1:4), z =runif(4))# Join the tablesTableC <- TableA %>%mutate(n = y^2) %>%right_join(., TableB, by ="ID")# Resultprint(TableC)
ID y n z
1 1 -0.5577261 0.3110584 0.5944664
2 2 2.3929585 5.7262503 0.2024827
3 3 1.4856421 2.2071324 0.4099875
4 4 -1.4712446 2.1645607 0.3827962
Mostly a matter of taste, whether you like to have one statement, or rather split it up. It does matter though, if one of the tables is a spatial dataframe (sf class). This is because depending on the order in which the tables are joined, the sf class is passed on or not.
In the example below, a new spatial data.frame of class sf is created.
Now let’s link the TableA and TableB using a left join.
# Link TableA and TableB using left_joinTableC <-left_join(TableB, TableA, by ="ID")class(TableC)
[1] "sf" "data.frame"
As you see, the resulting data.frame has the class sf, i.e., it’s a spatial data.frame. And what if we use a right join?
TableC <-right_join(TableA, TableB, by ="ID")class(TableC)
[1] "data.frame"
This results in a normal dataframe. This shows that when you link a dataframe to a spatial dataframe (in that order), you get a spatial object. If you do this the other way around, i.e., link the spatial dataframe to another dataframe, the sf class is dropped.
Note that in the second example, only the sf class is dropped. The geometry column is still there. This means that you can convert the dataframe to a spatial dataframe afterwards.