Merged data and writing shape files

Why this post?

I am writing this short explanation to show why merged data (a shape file joined with other data) sometimes fails to generate a map and gives an error instead. I will also show how to write a shape file.

The libraries

We only need two packages, tidyverse and sf.

library(tidyverse) ## manipulating data and plotting

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0

## Warning: package 'tibble' was built under R version 4.0.3

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(sf) ## read and write shape files

## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

Reading in data

I have district level shape files of Bihar, a state in India.

We read these files using the read_sf function from the sf package and store the data in an object named data_shape.

read_sf("DistrictsClean.shp") -> data_shape

The data has 37 rows (districts of Bihar) and 171 variables (other socioeconomic indicators of the districts, which are not important for this exercise).

Let us look at the class of the data_shape.

class(data_shape)

## [1] "sf"         "tbl_df"     "tbl"        "data.frame"

A vector of length 4 is returned, so data_shape belongs to all four classes. As I understand it, this means that operations defined for objects of any of these classes (for example, dplyr verbs for tibbles and data frames) can also be applied to data_shape.
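A small sketch of what belonging to several classes means in practice. The object toy_sf and its two points are made up for illustration and are not part of the Bihar data; because it is built from a plain data frame, it has only the sf and data.frame classes.

```r
library(sf)

# Hypothetical toy data: two points standing in for district geometries
toy_sf <- st_sf(
  NAME = c("A", "B"),
  geometry = st_sfc(st_point(c(85, 25)), st_point(c(86, 26)), crs = 4326)
)

class(toy_sf)

# inherits() checks membership in one class at a time
inherits(toy_sf, "sf")
inherits(toy_sf, "data.frame")

# Ordinary data frame operations still work on an sf object
nrow(toy_sf)
```

Both inherits() calls should return TRUE, which is why functions written for data frames and functions written for sf objects can both be used on the same object.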

Basic plot

data_shape %>% 
  ggplot()+
  geom_sf()

Notice that we did not define any aesthetics and yet the shapes are plotted. This is because stat_sf, the stat used by geom_sf, takes the geometry column from the sf object by default.
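To see where the geometry comes from, we can inspect an sf object directly. This is only a sketch on a made-up object (toy_sf is hypothetical, not the Bihar data): an sf object records the name of its geometry column in the sf_column attribute, and st_geometry() returns that column.

```r
library(sf)

# Hypothetical toy data: two points standing in for district geometries
toy_sf <- st_sf(
  NAME = c("A", "B"),
  geometry = st_sfc(st_point(c(85, 25)), st_point(c(86, 26)), crs = 4326)
)

# Name of the active geometry column, stored as an attribute of the sf object
attr(toy_sf, "sf_column")

# The geometry list-column itself (an object of class sfc)
st_geometry(toy_sf)
```

A plain tibble or data frame carries no such attribute, which is relevant for the error shown later in this post.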

Simulating merging

I will create a dummy table to be merged with the shape file object (data_shape).

dummy_data <- tibble(  
  NAME = data_shape$NAME, ## using same names of districts to use as merging key
  use_dummy_var = rnorm(n = nrow(data_shape), mean = 0, sd = 1) ## random variable to plot with map, one value per district
)

Let us also look at the class of dummy_data.

class(dummy_data)

## [1] "tbl_df"     "tbl"        "data.frame"

This object belongs to all the classes that the object data_shape has, except sf.

Merging data

Let us observe what is the class of the merged object which is obtained by merging dummy_data to data_shape.

data_shape %>% 
  left_join(dummy_data,by = "NAME") -> merged_1

class(merged_1)

## [1] "sf"         "tbl_df"     "tbl"        "data.frame"

merged_1 belongs to all the same classes as data_shape.

However, this changes when the order is reversed and data_shape is joined to dummy_data. left_join returns an object of the same type as its first (left) argument, so the merged object has the same classes as dummy_data and no longer belongs to the sf class.

dummy_data %>% 
  left_join(data_shape,by = "NAME") -> merged_2

class(merged_2)

## [1] "tbl_df"     "tbl"        "data.frame"

Plotting merged data

If we try to plot the merged data, we will see that the merged_2 object generates an error. This is because it does not belong to the sf class.

This will work:

merged_1 %>% 
  ggplot()+
  geom_sf()+
  geom_sf_text(aes(label = round(use_dummy_var,1)))

## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data

This will not work and generates an error:

merged_2 %>% 
  ggplot()+
  geom_sf()+
  geom_sf_text(aes(label = round(use_dummy_var,1)))

Error: stat_sf requires the following missing aesthetics: geometry

Therefore, we should merge in a way that retains the sf class of the object, that is, with the sf object as the first argument of the join. This is what was done when creating the object merged_1.
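If the join has already been done the other way round, the sf class can usually be restored afterwards with st_as_sf(), provided the geometry column survived the join. The sketch below uses made-up objects (toy_sf and toy_table are hypothetical stand-ins for data_shape and dummy_data), and I have not run it on the Bihar data.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for data_shape
toy_sf <- st_sf(
  NAME = c("A", "B"),
  geometry = st_sfc(st_point(c(85, 25)), st_point(c(86, 26)), crs = 4326)
)

# Hypothetical stand-in for dummy_data
toy_table <- tibble(NAME = c("A", "B"), value = c(1.5, -0.3))

# Plain table on the left: the result takes its classes from toy_table
joined <- left_join(toy_table, toy_sf, by = "NAME")
inherits(joined, "sf")

# st_as_sf() rebuilds an sf object from the geometry column
restored <- st_as_sf(joined)
inherits(restored, "sf")
```

Keeping the sf object on the left of the join remains the simpler option, since it avoids the extra conversion step.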

Writing a Shape file

merged_1 has the columns of dummy_data as well. It retains the sf class because of the way it was merged. We can write merged_1 as a shape file for future use and safekeeping.

write_sf(merged_1,"newshapefiles.shp")

## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver

The new shape file (the .shp file together with its companion files, such as .shx and .dbf) will be generated in the working directory.
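The warning above says that field names were abbreviated. A quick way to see what that means is to write a small object and read it back. This is a sketch on made-up data (toy_sf and the file name toy_shapefile.shp are hypothetical), written to a temporary folder so that nothing in the working directory is touched.

```r
library(sf)

# Hypothetical toy data with one long column name
toy_sf <- st_sf(
  NAME = c("A", "B"),
  use_dummy_var = c(1.5, -0.3),
  geometry = st_sfc(st_point(c(85, 25)), st_point(c(86, 26)), crs = 4326)
)

# Write to a temporary folder
out_path <- file.path(tempdir(), "toy_shapefile.shp")
write_sf(toy_sf, out_path)

# A shape file is really several files sharing one base name
list.files(tempdir(), pattern = "^toy_shapefile\\.")

# Read it back and compare the column names with the original
read_back <- read_sf(out_path)
names(toy_sf)
names(read_back)
```

The shape file format limits field names to 10 characters, so a name like use_dummy_var comes back shortened. If the original names matter, it may be worth renaming columns before writing, or keeping a lookup table of old and new names.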