This is a vignette to demonstrate some plotting tips and tricks for Metawards model outputs. This adds to the vignette written by Stefan. I’ll go over a few of the differences between plotting different data structures, and include some nifty features.
Eventually I’ll include some methods for generating animations of model runs (everyone loves a movie!), and show how to the plot the data in a cartogram (maybe not scientific, but definitely fun!).
I’ve written a vignette (here) to demonstrate some methods for pulling data from the ensemble databases. The data I will be using herein was generated from the following command.
plot_data <- metawards %>%
filter(output == "Ens002q", replicate == 4) %>%
collect() %>%
mutate(
ward = as.numeric(ward),
week = as.numeric(week)
)
I have included it as an RDS
file in the data section of this vignette if you do not want to go through database interaction and just want to get tucked into plotting. To load this (with your working directory in the vignettes folder) run
Okay now we’re ready to plot. The packages that I tend to use are all loaded below. The list is rather long and so rather than explaining them here I’ll call all functions with the package::function
notation. All of my colour palettes I pull from scico
. There is a very good reason for this and I encourage you to have a read here and here about this collection of palettes.
library(dplyr)
library(tidyr)
library(ggplot2)
library(scico)
library(broom)
library(readr)
library(rgdal)
library(gganimate)
library(ggpubr)
library(sf)
We have our data loaded, now we just need a few other bits. First, lets load in the two csv files that map ward number to ward identifier and ward identifier to trust ID.
ward_lookup <- readr::read_csv(
"data/plotting_model_runs/ward_lookup.csv"
) %>%
dplyr::select(WD11CD, WD11NM, LAD11CD, LAD11NM, ward = FID)
to_trust <- readr::read_csv(
"data/plotting_model_runs/WD11ToAcuteTrustIncWalesHB.csv"
) %>%
as_tibble()
And lets load in our trust shapefile.
trust <- rgdal::readOGR(
dsn = "data/plotting_model_runs/WD11-TRUST/WD11-TRUST.shp",
stringsAsFactors = F
)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/lachlan/Documents/PostDoc/20200710_covid_plots/uq4covid.github.io/vignettes/data/plotting_model_runs/WD11-TRUST/WD11-TRUST.shp", layer: "WD11-TRUST"
## with 186 features
## It has 6 fields
The objects plot_data
, ward_lookup
, and to_trust
are all standard tibbles. The object trust
is a SpatialPolygonsDataFrame
which has a standardised but rather complicated structure (try running str(trust)
). As rgdal
has told us, trust
has information stored in 6 fields for 186 different features. The features correspond to each individual trust, and the fields to the data that are recorded on each trust. Let’s have a look
## # A tibble: 6 x 6
## trustId trustNm actBds_ area count acutBds
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 7A1 Betsi Cadwaladr University Local He~ 438 5.12e9 4.87e5 434
## 2 7A1 Betsi Cadwaladr University Local He~ 434 5.12e9 4.87e5 434
## 3 7A1 Betsi Cadwaladr University Local He~ 471 5.12e9 4.87e5 434
## 4 7A2 Hywel Dda University Local Health B~ 325 8.14e9 4.08e5 224
## 5 7A2 Hywel Dda University Local Health B~ 155 8.14e9 4.08e5 224
## 6 7A2 Hywel Dda University Local Health B~ 212 8.14e9 4.08e5 224
Note that SpatialPolygonsDataFrame
objects are a S4
class and so we subset components using @
rather than $
.
We can leave trust
as a spatial object or we can convert it to a standard tibble. I’ll go over plotting either data structure, so really it’s a horses for courses thing. Personally, I generally work with tibbles if I’m just messing around on my machine, and I work with spatial objects if I’m making nice plots for presentations and publications.
First lets mash together plot_data
, ward_lookup
, and to_trust
to something useful. Obviously, select which model output you want to look at and summarise them as you will. This leaves us with our data indexed by trust and week.
plot_tmp <- plot_data %>%
left_join(ward_lookup, by = "ward") %>%
left_join(to_trust, by = "WD11CD") %>%
dplyr::select(
ward, week, Hprev, Cprev, Deaths, output, trustId
) %>%
group_by(trustId, week) %>%
summarise(H_sum = sum(Hprev)) %>%
ungroup()
print(plot_tmp)
## # A tibble: 2,160 x 3
## trustId week H_sum
## <chr> <dbl> <dbl>
## 1 7A1 11 10.3
## 2 7A1 12 19
## 3 7A1 13 28.1
## 4 7A1 14 28.4
## 5 7A1 15 33.3
## 6 7A1 16 22.3
## 7 7A1 17 15.4
## 8 7A1 18 15.7
## 9 7A1 19 21.1
## 10 7A1 20 20.9
## # ... with 2,150 more rows
When doing vanilla plots that don’t need fancy spatial projections and features, I like casting SpatialPolygonsDataFrames to a regular tibble and going from there. This method is a bit less flexible, but the data is in a nice normal structure that we can do normal things with. Call me old fashioned, but in my humble opinion it covers about 90% of use cases with minimal effort. The package broom
makes this really easy.
## # A tibble: 212,160 x 7
## long lat order hole piece group id
## <dbl> <dbl> <int> <lgl> <chr> <chr> <chr>
## 1 -3.08 53.0 1 FALSE 1 7A1.1 7A1
## 2 -3.08 53.0 2 FALSE 1 7A1.1 7A1
## 3 -3.08 53.0 3 FALSE 1 7A1.1 7A1
## 4 -3.08 53.0 4 FALSE 1 7A1.1 7A1
## 5 -3.08 53.0 5 FALSE 1 7A1.1 7A1
## 6 -3.08 53.0 6 FALSE 1 7A1.1 7A1
## 7 -3.08 53.0 7 FALSE 1 7A1.1 7A1
## 8 -3.09 53.0 8 FALSE 1 7A1.1 7A1
## 9 -3.09 53.0 9 FALSE 1 7A1.1 7A1
## 10 -3.09 53.0 10 FALSE 1 7A1.1 7A1
## # ... with 212,150 more rows
Now we can link our data up to our trust shapes
## # A tibble: 3,394,560 x 9
## trustId week H_sum long lat order hole piece group
## <chr> <dbl> <dbl> <dbl> <dbl> <int> <lgl> <chr> <chr>
## 1 7A1 11 10.3 -3.08 53.0 1 FALSE 1 7A1.1
## 2 7A1 12 19 -3.08 53.0 1 FALSE 1 7A1.1
## 3 7A1 13 28.1 -3.08 53.0 1 FALSE 1 7A1.1
## 4 7A1 14 28.4 -3.08 53.0 1 FALSE 1 7A1.1
## 5 7A1 15 33.3 -3.08 53.0 1 FALSE 1 7A1.1
## 6 7A1 16 22.3 -3.08 53.0 1 FALSE 1 7A1.1
## 7 7A1 17 15.4 -3.08 53.0 1 FALSE 1 7A1.1
## 8 7A1 18 15.7 -3.08 53.0 1 FALSE 1 7A1.1
## 9 7A1 19 21.1 -3.08 53.0 1 FALSE 1 7A1.1
## 10 7A1 20 20.9 -3.08 53.0 1 FALSE 1 7A1.1
## # ... with 3,394,550 more rows
and plot for a single week
week_select <- 11
ggplot(
data = plot_tibble %>% dplyr::filter(week == week_select)
) +
geom_polygon(
aes(
x = long,
y = lat,
group = group,
fill = H_sum
),
colour = "grey", size = 0.1
) +
coord_map() +
theme_bw() +
xlab("Longitude") +
ylab("Latitude") +
scale_fill_scico(
"Hospital\nPrevalence",
direction = 1,
palette = "batlow"
) +
ggtitle(paste0("Week: ", week_select))
As discussed, spatial objects have a bit more built in functionality if we want to do snazzier things. The classes inherited from the sp
package, such as SpatialPolygonsDataFrame
, have become a standard for any spatial analysis in R. They’re clunky, non-intuitive, and require intimate knowledge of the sp
package to do anything of worth. Luckily, the package sf
(simple features) can solve much of our concerns without sacrificing functionality. To convert from the SpatialPolygonsDataFrame
class to a sf
class we use the function st_as_sf()
.
## Simple feature collection with 186 features and 6 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -5.746848 ymin: 49.9588 xmax: 1.762916 ymax: 55.77176
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## trustId trustNm actBds_ area
## 0 7A1 Betsi Cadwaladr University Local Health Board 438 5118790505
## 1 7A1 Betsi Cadwaladr University Local Health Board 434 5118790505
## 2 7A1 Betsi Cadwaladr University Local Health Board 471 5118790505
## 3 7A2 Hywel Dda University Local Health Board 325 8142633264
## 4 7A2 Hywel Dda University Local Health Board 155 8142633264
## 5 7A2 Hywel Dda University Local Health Board 212 8142633264
## 6 7A2 Hywel Dda University Local Health Board 224 8142633264
## 7 7A3 Swansea Bay University Local Health Board 329 887171476
## 8 7A3 Swansea Bay University Local Health Board 340 887171476
## 9 7A3 Swansea Bay University Local Health Board 456 887171476
## count acutBds geometry
## 0 486602.8 434 MULTIPOLYGON (((-4.802508 5...
## 1 486602.8 434 MULTIPOLYGON (((-4.802508 5...
## 2 486602.8 434 MULTIPOLYGON (((-4.802508 5...
## 3 408359.2 224 MULTIPOLYGON (((-4.923253 5...
## 4 408359.2 224 MULTIPOLYGON (((-4.923253 5...
## 5 408359.2 224 MULTIPOLYGON (((-4.923253 5...
## 6 408359.2 224 MULTIPOLYGON (((-4.923253 5...
## 7 433596.8 456 MULTIPOLYGON (((-4.205965 5...
## 8 433596.8 456 MULTIPOLYGON (((-4.205965 5...
## 9 433596.8 456 MULTIPOLYGON (((-4.205965 5...
All of our data is now in one clean data frame, with the polygon geometry stored as a MULTIPOLYGON
in the geometry column. Now, as before, we join up our data and plot. Out of interest in saving space, I’m going to save this plot as a template object that we can alter individual layers of.
plot_sf <- left_join(trust_sf, plot_tmp, by = "trustId")
plot_template <- ggplot(
data = plot_sf %>% dplyr::filter(week == week_select)
) +
geom_sf(
aes(
geometry = geometry,
fill = H_sum
),
colour = "grey", size = 0.1
) +
coord_sf() +
theme_bw() +
xlab("Longitude") +
ylab("Latitude") +
scale_fill_scico(
"Hospital\nPrevalence",
direction = 1,
palette = "batlow"
) +
ggtitle(paste0("Week: ", week_select))
print(plot_template)
Here we need to use the layers geom_sf
and coord_sf
. geom_sf
handles the MULTIPOLYGON
outputs, and coord_sf
makes sure that everything is plotted using the same coordinate reference system (CRS). The method in Section 3.1 has no such checks and so allows one to do dumb things like plot layers with differing CRSs.
What advantages do we have with spatial objects? We can do things like change up the CRS in the data frame itself (for instance, using the UK projected coordinate system: st_transform(trust_sf, "+init=epsg:27700")
). But we can also do this inside of the coord_sf()
layer in ggplot.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
We can also efficiently zoom in on regions without weirdly clipping polygons and creating odd boundary problems. Let’s look at Greater London.
We can label trusts.
plot_template +
geom_sf_label(aes(label = trustId)) +
coord_sf(
xlim = c(-0.489, 0.236),
ylim = c(51.28, 51.686)
)
We can use ggpubr
to show the zoom area on the original map.
plot_zoom1 <- plot_template +
geom_rect(
aes(xmin = -0.489, xmax = 0.236, ymin = 51.28, ymax = 51.686),
colour = "red", fill = NA
) +
theme(legend.position = "none")
plot_zoom2 <- plot_template +
coord_sf(
xlim = c(-0.489, 0.236),
ylim = c(51.28, 51.686)
)
ggpubr::ggarrange(plot_zoom1, plot_zoom2, nrow = 1, ncol = 2)
We can facet different weeks together (we can also do this with standard tibbles).
week_select <- c(11, 25)
week_labs <- paste0("Week: ", week_select)
names(week_labs) <- week_select
ggplot(
data = plot_sf %>% dplyr::filter(week %in% week_select)
) +
geom_sf(
aes(
geometry = geometry,
fill = H_sum
),
colour = "grey", size = 0.1
) +
coord_sf() +
theme_bw() +
xlab("Longitude") +
ylab("Latitude") +
scale_fill_scico(
"Hospital\nPrevalence",
direction = 1,
palette = "batlow"
) +
facet_wrap(
vars(week), nrow = 1, ncol = 2,
labeller = labeller(week = week_labs)
)
We can do lots. I’m going to keep on adding to this as I create more plots. If you need something not in here, or have a good idea for something not shown here, let me know and I’ll give it a crack.