Why this post?
I am writing this short explanation to show, and possibly explain, why merged data (shape files joined with other data) sometimes fails to generate maps and gives errors instead. I will also show how to write a shape file.
The libraries
We only need two libraries, tidyverse and sf.
library(tidyverse) ## manipulating data and plotting
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## Warning: package 'tibble' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(sf) ## read and write shape files
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
Reading in data
I have district level shape files of Bihar, a state in India.
We read these files using the read_sf function from the sf package and store the data in an object named data_shape.
read_sf("DistrictsClean.shp") -> data_shape
The data has 37 rows (the districts of Bihar) and 171 variables (other socioeconomic indicators for each district, which are not important for this exercise).
Let us look at the class of data_shape.
class(data_shape)
## [1] "sf" "tbl_df" "tbl" "data.frame"
A vector of length 4 is returned: data_shape belongs to all four of these classes. As I understand it, this means that any operation defined for objects of one of these classes can be performed on data_shape.
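As a small illustration of what the class vector means, here is a minimal sketch using a hypothetical one-polygon object (toy_shape is made up for this example and is not part of the Bihar data). R looks through the class vector from left to right when choosing a method, so sf methods are tried before the data.frame ones.

```r
library(sf)

# Hypothetical stand-in for data_shape: a single unit square named "A"
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_shape <- st_sf(NAME = "A", geometry = st_sfc(square))

class(toy_shape)                   # "sf" is listed before "data.frame"
inherits(toy_shape, "sf")          # TRUE
inherits(toy_shape, "data.frame")  # TRUE
```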
Basic plot
data_shape %>%
ggplot()+
geom_sf()
Notice that we did not define any aesthetics and yet the shapes are plotted. This is because stat_sf, which geom_sf uses by default, finds the geometry column of the sf object automatically.
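To see where that default comes from, the sketch below uses a hypothetical one-polygon object (toy_shape, not the Bihar data). An sf object records the name of its geometry column in the sf_column attribute, and mapping the geometry aesthetic by hand should give the same plot as leaving it out.

```r
library(sf)
library(ggplot2)

# Hypothetical stand-in for data_shape: a single unit square named "A"
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_shape <- st_sf(NAME = "A", geometry = st_sfc(square))

# The sf object knows which column holds the shapes
attr(toy_shape, "sf_column")

# Implicit and explicit versions of the same map
p_implicit <- ggplot(toy_shape) + geom_sf()
p_explicit <- ggplot(toy_shape) + geom_sf(aes(geometry = geometry))
```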
Simulating merging
I will create a dummy table to be merged with the shape file object (data_shape).
dummy_data <- tibble(
NAME = data_shape$NAME, ## using same names of districts to use as merging key
use_dummy_var = rnorm(n = 37,mean = 0,sd = 1) ## random variable to plot with map
)
Let us also look at the class of dummy_data.
class(dummy_data)
## [1] "tbl_df" "tbl" "data.frame"
This object belongs to all the classes that data_shape has, other than sf.
Merging data
Let us observe the class of the merged object obtained by merging dummy_data to data_shape.
data_shape %>%
left_join(dummy_data,by = "NAME") -> merged_1
class(merged_1)
## [1] "sf" "tbl_df" "tbl" "data.frame"
merged_1 belongs to all the same classes as data_shape. However, this changes when data_shape is merged to dummy_data. The new merged object belongs to the same classes as dummy_data, meaning it does not belong to the sf class.
dummy_data %>%
left_join(data_shape,by = "NAME") -> merged_2
class(merged_2)
## [1] "tbl_df" "tbl" "data.frame"
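The same behaviour can be reproduced on a tiny example. The sketch below assumes hypothetical objects toy_shape and toy_table (neither is part of the Bihar data). My understanding is that left_join picks its method from the class of its first argument, so the sf method only runs when the sf object comes first.

```r
library(sf)
library(dplyr)

# Hypothetical stand-ins for data_shape and dummy_data
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_shape <- st_sf(NAME = "A", geometry = st_sfc(square))
toy_table <- tibble(NAME = "A", value = 1.5)

sf_first    <- left_join(toy_shape, toy_table, by = "NAME")
table_first <- left_join(toy_table, toy_shape, by = "NAME")

inherits(sf_first, "sf")            # TRUE: sf class is kept
inherits(table_first, "sf")         # FALSE: result is a plain tibble
"geometry" %in% names(table_first)  # TRUE: the column is still there
```

Note that the geometry column itself survives in both cases. What is lost in the second join is only the sf class that tells plotting functions where to find it.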
Plotting merged data
If we try to plot the merged data, we will see that an error is generated for the merged_2 object. This is because it does not belong to the sf class.
This will work
merged_1 %>%
ggplot()+
geom_sf()+
geom_sf_text(aes(label = round(use_dummy_var,1)))
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
This will not work and generates an error
merged_2 %>%
ggplot()+
geom_sf()+
geom_sf_text(aes(label = round(use_dummy_var,1)))
Error: stat_sf requires the following missing aesthetics: geometry
Therefore, we should merge in a way that retains the sf class of the object. This is what was done when creating the object merged_1.
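If the join has already been done in the other order, an alternative that this post does not otherwise use is to convert the result back with st_as_sf from the sf package. This is a minimal sketch on hypothetical toy objects (toy_shape, toy_table), and it assumes the geometry column is still present in the joined table.

```r
library(sf)
library(dplyr)

# Hypothetical stand-ins for data_shape and dummy_data
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_shape <- st_sf(NAME = "A", geometry = st_sfc(square))
toy_table <- tibble(NAME = "A", value = 1.5)

# Table first: the result loses the sf class
table_first <- left_join(toy_table, toy_shape, by = "NAME")

# st_as_sf() detects the geometry column and restores the class
recovered <- st_as_sf(table_first)
class(recovered)
```

The same call on merged_2 should make it usable with geom_sf again, although merging with the sf object first avoids the extra step.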
Writing a Shape file
merged_1 has the columns of dummy_data as well. It retains the sf class because of the way it was merged. We can write merged_1 as a shape file for future use and safekeeping.
write_sf(merged_1,"newshapefiles.shp")
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver
The new shape file (a set of files sharing the name newshapefiles, such as .shp, .shx and .dbf) will be generated in the working directory.
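The warning above is worth checking after writing, because the shape file format limits field names to 10 characters. The sketch below writes a hypothetical toy object (toy_shape, with a made-up CRS of EPSG:4326) to a temporary file rather than the working directory and reads it back to inspect the names.

```r
library(sf)

# Hypothetical object with one long column name, as in merged_1
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_shape <- st_sf(
  NAME = "A",
  use_dummy_var = 1.5,
  geometry = st_sfc(square, crs = 4326)
)

# Write to a temporary location; expect the abbreviation warning
out_path <- tempfile(fileext = ".shp")
write_sf(toy_shape, out_path)

# Read it back and compare column names with the original
reread <- read_sf(out_path)
names(toy_shape)
names(reread)
```

If the original column names matter, it may be safer to rename long columns yourself before writing, so that the abbreviated names are predictable.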