3 Plotting

Okay now we’re ready to plot. The packages that I tend to use are all loaded below. The list is rather long and so rather than explaining them here I’ll call all functions with the package::function notation. All of my colour palettes I pull from scico. There is a very good reason for this and I encourage you to have a read here and here about this collection of palettes.

library(dplyr)
library(tidyr)
library(ggplot2)
library(scico)
library(broom)
library(readr)
library(rgdal)
library(gganimate)
library(ggpubr)
library(sf)

We have our data loaded, now we just need a few other bits. First, lets load in the two csv files that map ward number to ward identifier and ward identifier to trust ID.

ward_lookup <- readr::read_csv(
  "data/plotting_model_runs/ward_lookup.csv"
) %>%
  dplyr::select(WD11CD, WD11NM, LAD11CD, LAD11NM, ward = FID)

to_trust <- readr::read_csv(
  "data/plotting_model_runs/WD11ToAcuteTrustIncWalesHB.csv"
) %>%
  as_tibble()

And lets load in our trust shapefile.

trust <- rgdal::readOGR(
  dsn = "data/plotting_model_runs/WD11-TRUST/WD11-TRUST.shp", 
  stringsAsFactors = F
)

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/lachlan/Documents/PostDoc/20200710_covid_plots/uq4covid.github.io/vignettes/data/plotting_model_runs/WD11-TRUST/WD11-TRUST.shp", layer: "WD11-TRUST"
## with 186 features
## It has 6 fields

The objects plot_data, ward_lookup, and to_trust are all standard tibbles. The object trust is a SpatialPolygonsDataFrame which has a standardised but rather complicated structure (try running str(trust)). As rgdal has told us, trust has information stored in 6 fields for 186 different features. The features correspond to each individual trust, and the fields to the data that are recorded on each trust. Let’s have a look

head(trust@data) %>% as_tibble()

## # A tibble: 6 x 6
##   trustId trustNm                              actBds_       area  count acutBds
##   <chr>   <chr>                                  <dbl>      <dbl>  <dbl>   <dbl>
## 1 7A1     Betsi Cadwaladr University Local He~     438     5.12e9 4.87e5     434
## 2 7A1     Betsi Cadwaladr University Local He~     434     5.12e9 4.87e5     434
## 3 7A1     Betsi Cadwaladr University Local He~     471     5.12e9 4.87e5     434
## 4 7A2     Hywel Dda University Local Health B~     325     8.14e9 4.08e5     224
## 5 7A2     Hywel Dda University Local Health B~     155     8.14e9 4.08e5     224
## 6 7A2     Hywel Dda University Local Health B~     212     8.14e9 4.08e5     224

Note that SpatialPolygonsDataFrame objects are a S4 class and so we subset components using @ rather than $.

We can leave trust as a spatial object or we can convert it to a standard tibble. I’ll go over plotting either data structure, so really it’s a horses for courses thing. Personally, I generally work with tibbles if I’m just messing around on my machine, and I work with spatial objects if I’m making nice plots for presentations and publications.

First lets mash together plot_data, ward_lookup, and to_trust to something useful. Obviously, select which model output you want to look at and summarise them as you will. This leaves us with our data indexed by trust and week.

plot_tmp <- plot_data %>%
  left_join(ward_lookup, by = "ward") %>%
  left_join(to_trust, by = "WD11CD") %>%
  dplyr::select(
    ward, week, Hprev, Cprev, Deaths, output, trustId
  ) %>%
  group_by(trustId, week) %>%
  summarise(H_sum = sum(Hprev)) %>%
  ungroup()

  print(plot_tmp)

## # A tibble: 2,160 x 3
##    trustId  week H_sum
##    <chr>   <dbl> <dbl>
##  1 7A1        11  10.3
##  2 7A1        12  19  
##  3 7A1        13  28.1
##  4 7A1        14  28.4
##  5 7A1        15  33.3
##  6 7A1        16  22.3
##  7 7A1        17  15.4
##  8 7A1        18  15.7
##  9 7A1        19  21.1
## 10 7A1        20  20.9
## # ... with 2,150 more rows

3.1 Plotting using tibbles

When doing vanilla plots that don’t need fancy spatial projections and features, I like casting SpatialPolygonsDataFrames to a regular tibble and going from there. This method is a bit less flexible, but the data is in a nice normal structure that we can do normal things with. Call me old fashioned, but in my humble opinion it covers about 90% of use cases with minimal effort. The package broom makes this really easy.

trust_df <- broom::tidy(trust, region = "trustId")
print(trust_df)

## # A tibble: 212,160 x 7
##     long   lat order hole  piece group id   
##    <dbl> <dbl> <int> <lgl> <chr> <chr> <chr>
##  1 -3.08  53.0     1 FALSE 1     7A1.1 7A1  
##  2 -3.08  53.0     2 FALSE 1     7A1.1 7A1  
##  3 -3.08  53.0     3 FALSE 1     7A1.1 7A1  
##  4 -3.08  53.0     4 FALSE 1     7A1.1 7A1  
##  5 -3.08  53.0     5 FALSE 1     7A1.1 7A1  
##  6 -3.08  53.0     6 FALSE 1     7A1.1 7A1  
##  7 -3.08  53.0     7 FALSE 1     7A1.1 7A1  
##  8 -3.09  53.0     8 FALSE 1     7A1.1 7A1  
##  9 -3.09  53.0     9 FALSE 1     7A1.1 7A1  
## 10 -3.09  53.0    10 FALSE 1     7A1.1 7A1  
## # ... with 212,150 more rows

Now we can link our data up to our trust shapes

plot_tibble <- right_join(plot_tmp, trust_df, by = c("trustId" = "id"))
print(plot_tibble)

## # A tibble: 3,394,560 x 9
##    trustId  week H_sum  long   lat order hole  piece group
##    <chr>   <dbl> <dbl> <dbl> <dbl> <int> <lgl> <chr> <chr>
##  1 7A1        11  10.3 -3.08  53.0     1 FALSE 1     7A1.1
##  2 7A1        12  19   -3.08  53.0     1 FALSE 1     7A1.1
##  3 7A1        13  28.1 -3.08  53.0     1 FALSE 1     7A1.1
##  4 7A1        14  28.4 -3.08  53.0     1 FALSE 1     7A1.1
##  5 7A1        15  33.3 -3.08  53.0     1 FALSE 1     7A1.1
##  6 7A1        16  22.3 -3.08  53.0     1 FALSE 1     7A1.1
##  7 7A1        17  15.4 -3.08  53.0     1 FALSE 1     7A1.1
##  8 7A1        18  15.7 -3.08  53.0     1 FALSE 1     7A1.1
##  9 7A1        19  21.1 -3.08  53.0     1 FALSE 1     7A1.1
## 10 7A1        20  20.9 -3.08  53.0     1 FALSE 1     7A1.1
## # ... with 3,394,550 more rows

and plot for a single week

week_select <- 11

ggplot(
  data = plot_tibble %>% dplyr::filter(week == week_select)
) + 
  geom_polygon( 
    aes(
      x = long, 
      y = lat, 
      group = group, 
      fill = H_sum
    ),
    colour = "grey", size = 0.1
  ) +
  coord_map() +
  theme_bw() +
  xlab("Longitude") +
  ylab("Latitude") +
  scale_fill_scico(
    "Hospital\nPrevalence", 
    direction = 1, 
    palette = "batlow"
  ) +
  ggtitle(paste0("Week: ", week_select))

3.2 Plotting using spatial objects

As discussed, spatial objects have a bit more built in functionality if we want to do snazzier things. The classes inherited from the sp package, such as SpatialPolygonsDataFrame, have become a standard for any spatial analysis in R. They’re clunky, non-intuitive, and require intimate knowledge of the sp package to do anything of worth. Luckily, the package sf (simple features) can solve much of our concerns without sacrificing functionality. To convert from the SpatialPolygonsDataFrame class to a sf class we use the function st_as_sf().

trust_sf <- sf::st_as_sf(trust)
print(trust_sf)

## Simple feature collection with 186 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -5.746848 ymin: 49.9588 xmax: 1.762916 ymax: 55.77176
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##   trustId                                       trustNm actBds_       area
## 0     7A1 Betsi Cadwaladr University Local Health Board     438 5118790505
## 1     7A1 Betsi Cadwaladr University Local Health Board     434 5118790505
## 2     7A1 Betsi Cadwaladr University Local Health Board     471 5118790505
## 3     7A2       Hywel Dda University Local Health Board     325 8142633264
## 4     7A2       Hywel Dda University Local Health Board     155 8142633264
## 5     7A2       Hywel Dda University Local Health Board     212 8142633264
## 6     7A2       Hywel Dda University Local Health Board     224 8142633264
## 7     7A3     Swansea Bay University Local Health Board     329  887171476
## 8     7A3     Swansea Bay University Local Health Board     340  887171476
## 9     7A3     Swansea Bay University Local Health Board     456  887171476
##      count acutBds                       geometry
## 0 486602.8     434 MULTIPOLYGON (((-4.802508 5...
## 1 486602.8     434 MULTIPOLYGON (((-4.802508 5...
## 2 486602.8     434 MULTIPOLYGON (((-4.802508 5...
## 3 408359.2     224 MULTIPOLYGON (((-4.923253 5...
## 4 408359.2     224 MULTIPOLYGON (((-4.923253 5...
## 5 408359.2     224 MULTIPOLYGON (((-4.923253 5...
## 6 408359.2     224 MULTIPOLYGON (((-4.923253 5...
## 7 433596.8     456 MULTIPOLYGON (((-4.205965 5...
## 8 433596.8     456 MULTIPOLYGON (((-4.205965 5...
## 9 433596.8     456 MULTIPOLYGON (((-4.205965 5...

All of our data is now in one clean data frame, with the polygon geometry stored as a MULTIPOLYGON in the geometry column. Now, as before, we join up our data and plot. Out of interest in saving space, I’m going to save this plot as a template object that we can alter individual layers of.

plot_sf <- left_join(trust_sf, plot_tmp, by = "trustId")

plot_template <- ggplot(
  data = plot_sf %>% dplyr::filter(week == week_select)
) + 
  geom_sf( 
    aes(
      geometry = geometry, 
      fill = H_sum
    ),
    colour = "grey", size = 0.1
  ) +
  coord_sf() +
  theme_bw() +
  xlab("Longitude") +
  ylab("Latitude") +
  scale_fill_scico(
    "Hospital\nPrevalence", 
    direction = 1, 
    palette = "batlow"
  ) +
  ggtitle(paste0("Week: ", week_select))

  print(plot_template)

Here we need to use the layers geom_sf and coord_sf. geom_sf handles the MULTIPOLYGON outputs, and coord_sf makes sure that everything is plotted using the same coordinate reference system (CRS). The method in Section 3.1 has no such checks and so allows one to do dumb things like plot layers with differing CRSs.

What advantages do we have with spatial objects? We can do things like change up the CRS in the data frame itself (for instance, using the UK projected coordinate system: st_transform(trust_sf, "+init=epsg:27700")). But we can also do this inside of the coord_sf() layer in ggplot.

plot_template +
  coord_sf(crs = "+init=epsg:27700")

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

We can also efficiently zoom in on regions without weirdly clipping polygons and creating odd boundary problems. Let’s look at Greater London.

plot_template +
  coord_sf(
    xlim = c(-0.489, 0.236), 
    ylim = c(51.28, 51.686)
  )

We can label trusts.

plot_template +
  geom_sf_label(aes(label = trustId)) +
  coord_sf(
    xlim = c(-0.489, 0.236), 
    ylim = c(51.28, 51.686)
  )

We can use ggpubr to show the zoom area on the original map.

plot_zoom1 <- plot_template +
  geom_rect(
    aes(xmin = -0.489, xmax = 0.236, ymin = 51.28, ymax = 51.686),
    colour = "red", fill = NA
  ) +
  theme(legend.position = "none")

plot_zoom2 <- plot_template +
  coord_sf(
    xlim = c(-0.489, 0.236), 
    ylim = c(51.28, 51.686)
  )

ggpubr::ggarrange(plot_zoom1, plot_zoom2, nrow = 1, ncol = 2)

We can facet different weeks together (we can also do this with standard tibbles).

week_select <- c(11, 25)
week_labs <- paste0("Week: ", week_select)
names(week_labs) <- week_select

ggplot(
  data = plot_sf %>% dplyr::filter(week %in% week_select)
) + 
  geom_sf( 
    aes(
      geometry = geometry, 
      fill = H_sum
    ),
    colour = "grey", size = 0.1
  ) +
  coord_sf() +
  theme_bw() +
  xlab("Longitude") +
  ylab("Latitude") +
  scale_fill_scico(
    "Hospital\nPrevalence", 
    direction = 1, 
    palette = "batlow"
  ) +
  facet_wrap(
    vars(week), nrow = 1, ncol = 2,
    labeller = labeller(week = week_labs)
  )

We can do lots. I’m going to keep on adding to this as I create more plots. If you need something not in here, or have a good idea for something not shown here, let me know and I’ll give it a crack.

Plotting Metawards Outputs in R

Lachlan Astfalck

2020-07-23

1 Introduction

2 Some example data

3 Plotting

3.1 Plotting using tibbles

3.2 Plotting using spatial objects