library(MASS)
library(rgdal)

1 Introduction

The aim of this document is to think about how to calibrate COVID-19 simulations from the MetaWards model.

Click here to see details of key changes.

  • 04/06/2020

    • A separate vignette has been put together to describe creation of the calibration data.

    • MetaWards’ output is now post-processed to represent `detected’ cases, so that it better matches the Public Health England calibration data.

    • Calibration now separately considers cumulative cases between 1st Jan 2020 and 30th March 2020 and between 1st April 2020 and 31st May 2020.

    • More consideration has been given to the choice of discrepancy parameter, \(\lambda\).

    • Calibration by emulation uses mogp and the correctly-transformed posterior mean.

    • Further MetaWards runs have been performed and their actual predictive probabilities compared against those predicted by the emulator.

2 Calibration

We’ll load the calibration data, created as described here.

load("calibration_data.rda")
load("calibration_functions.rda")

2.1 Outline

Of the various ways to calibrate the MetaWards model, the approach of Oakley & Youngman (Technometrics, 2017) seems viable, and works as follows. Out of laziness, I’ll just refer to the MetaWards model as the `simulator’ from now on.

Consider a single UA. Let \(Z\) denote its observed number of cases and \(n\) its population size, i.e. the number of potential cases (on a given day). (This assumes someone can contract COVID-19 only once, which we’ll return to.) Then, of course, \[ Z \sim Binomial(n, \theta^*)\] where \(\theta^*\) is the probability of an individual contracting COVID-19.

Next, we’ll assume that the simulator takes input \(x\), and gives number of cases \(Y(x)\), where \[\begin{equation} Y(x) \sim Binomial(m, \theta(x)), \label{y} \end{equation}\] with \(\theta(x)\) the simulator’s probability that an individual contracts COVID-19. (I suppose \(m = n\), or at least they would be equal if they represented the same time point, but we’ll return to this.)

Forgetting discrepancy, for the time being, in rather simple form we want to assume that \(\theta^* = \theta(x^*)\) for some calibrated input \(x^*\). From \(\eqref{y}\), \(\pi(\theta(x) \mid y(x), m)\) is of \(Beta(1 + y(x), 1 + m - y(x))\) form, and hence \[\begin{align*} \pi(z \mid y(x), m, n) &= \int \pi(z \mid \theta(x), n) \pi(\theta(x) \mid y(x), m) d\theta(x) \\ &= \dfrac{{}^nC_z B(1 + y(x) + z, 1 + m - y(x) + n - z)}{B(1 + y(x), 1 + m - y(x))}, \end{align*}\] where \(B(\, , \,)\) is the Beta function.
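
As a quick numerical check of this closed form (with made-up counts, not the calibration data), we can compare it against direct integration over \(\theta(x)\):

# beta-binomial check with small illustrative values
z0 <- 3; n0 <- 20; y0 <- 5; m0 <- 40
closed <- exp(lchoose(n0, z0) + lbeta(1 + y0 + z0, 1 + m0 - y0 + n0 - z0) -
              lbeta(1 + y0, 1 + m0 - y0))
numerical <- integrate(function(theta)
  dbinom(z0, n0, theta) * dbeta(theta, 1 + y0, 1 + m0 - y0), 0, 1)$value
c(closed = closed, numerical = numerical)  # the two should agree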

2.2 Simulator discrepancy and measurement error

Oakley & Youngman’s approach to allowing for simulator discrepancy was to assume that, instead of simulations representing a sample of size \(m\), they are representative of a sample of size \(\lambda_y m\), \(0 < \lambda_y < 1\). This is perhaps quite an intuitive way of sensibly specifying discrepancy. For example, the average UA population is \(m \simeq 165,000\). Suppose the simulator has the perfect input. Would it get \(Y(x)\) perfectly? Of course not. What if \(m = 165\), i.e. \(\lambda_y = 0.001\)? I think we need to be thinking of very small values, \(\lambda_y \ll 0.001\).

Measurement error can be specified similarly with \(0 < \lambda_z < 1\). There are various sources of measurement error, in particular failing to observe the actual day on which a case occurs. Recording and testing errors might be expected to be much smaller.

The upshot of both these sources of error is that for calibration we’ll consider the predictive probability \[\begin{equation} \label{calib} \pi(z \mid y(x), m, n, \lambda_y, \lambda_z) = \dfrac{{}^{\lambda_z n}C_{\lambda_z z}\, B\big(1 + \lambda_y y(x) + \lambda_z z, 1 + \lambda_y[m - y(x)] + \lambda_z[n - z]\big)}{B\big(1 + \lambda_y y(x), 1 + \lambda_y[m - y(x)]\big)}, \end{equation}\] in which the observations, too, are treated as a sample of effective size \(\lambda_z n\) (with \(\lambda_z z\) rounded to an integer in the function below). This is all described rather more formally in Oakley & Youngman (2017).

# Log predictive probability of observations z (population sizes n) given
# simulator counts y (population sizes m), discounted by discrepancy
# parameter lambda_y and measurement-error parameter lambda_z; summing over
# the entries of z implements a product over UAs and time periods
pred_prob <- function(z, n, y, m, lambda_z = 1, lambda_y = 1, log = TRUE) {
out <- lchoose(lambda_z * n, round(lambda_z * z))
out <- out + lbeta(1 + lambda_y * y + lambda_z * z, 1 + lambda_y * (m - y) + lambda_z * (n - z))
out <- out - lbeta(1 + lambda_y * y, 1 + lambda_y * (m - y))
out <- sum(out)
if (!log)
  out <- exp(out)
out
}
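
As a quick illustration (with made-up counts, not the calibration data), we can compare the log predictive probability with and without discounting:

# a single UA: 50 observed cases out of 1e5, 400 simulated out of 1e5
pred_prob(z = 50, n = 1e5, y = 400, m = 1e5, lambda_y = 2e-6, lambda_z = 2e-4)
pred_prob(z = 50, n = 1e5, y = 400, m = 1e5)  # no discounting, for comparison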

2.3 Calibration over UAs and time periods

Now let \(y_{jt}(x)\) denote the simulated number of cases for UA \(j\), \(j=1, \ldots, J\), and time period \(t\), \(t = 1, \ldots, T\). Let \(m_j\) and \(n_j\) denote the simulator and observation populations for UA \(j\), respectively, which, for simplicity, assumes constant populations over time. For now we’ll assume independence between UAs and time periods, so that the overall predictive probability is \[\pi(z \mid n, y(x), m, \lambda_y, \lambda_z) = \prod_{j = 1}^J \prod_{t = 1}^T \pi(z_{jt} \mid y_{jt}(x), m_j, n_j, \lambda_y, \lambda_z),\] and includes a notational simplification that \(z = \{z_j\}_{j=1, \ldots, J}\) with \(\{z_j\} = \{z_{jt}\}_{t=1, \ldots, T}\) and \(y(x)\) defined similarly.

For brevity we’ll simply write \(\ell(z \mid x) = \log \pi(z \mid n, y(x), m, \lambda_y, \lambda_z)\), which can be evaluated with this function.

3 Calibration in action

3.1 Calculating some calibration quantities

We’ll calculate \(\ell(z \mid x_i)\) for each input \(x_i \in D\) for \(i = 1, \ldots, I\) where \(x_i = (x_{i1}, \ldots, x_{iK})\) and \(D = \{x_1, \ldots, x_I\}\) is our input design. Here \(I = 200\) and \(K = 5\).

First, we’ll create \(z\) and \(n\).

z <- obs
n <- lads$phe_population

This is a good opportunity to start to understand \(z\), with a histogram of \(z / n\), the observed proportion of cases per UA, which should give an idea of the ball park for numbers of cases that we should be targeting.

hist_z <- hist(z / n, 20, main = "Histogram of proportion of cases per UA", xlab = "Proportion", prob = TRUE)

Next we’ll put together the rest of the calibration data, i.e. that related to the simulator.

y <- outputs
n_input <- ncol(inputs)
m <- lads$population

Alas, some simulated counts exceed their population size, so we’ll cap them. Could this be people becoming infected multiple times? The cause of this should be investigated.

y <- lapply(y, function(x) pmin(x, m))

We can plot the overall proportion of cases per simulator run as a sanity check

par(mfrow = c(1, 2))
# overall proportion of simulated cases per run, for each time period
sim_prop_runs <- sapply(y, function(x) colSums(x) / sum(m))
hist_y <- apply(sim_prop_runs, 1, hist, breaks = 20,
  main = "Histogram of overall proportion of cases per run", xlab = "Proportion", prob = TRUE)

and we soon see that these are rather different from those observed.

3.2 Allowing for not all cases being detected

MetaWards outputs the number of cases per UA. However, the observations don’t reflect this: they reflect the number of detected cases per UA. Hence we need to adjust MetaWards’ output to give something calibratable. A starting point for this is to assume that, across UAs, a fixed proportion of cases, \(\tau\), is detected. Ultimately, \(\tau\) might want to be assumed to vary with time, and to be random with some prior, which, additionally, could be integrated out. Here, \(\tau = 0.06\) will be assumed. Given the introduction of \(\tau\), the calibration simulator output will be set as follows.

tau_0 <- .06
y_c <- lapply(y, function(x) tau_0 * x)

3.3 Calibration with aggregated numbers of cases

Now suppose that \(\theta(x) \mid y(x), m, \lambda_y\) refers to the overall proportion of cases, given a simulator run with input \(x\). We’ll start by identifying the simulator run with an overall proportion of detected cases closest to that observed.

nearest <- which.min(sapply(y_c, function(x) sum((x / m - z / n)^2)))

Then we’ll plot \(\theta(x) \mid y(x), m, \lambda_y\) for different \(\lambda_y\). We’ll explore \(\lambda_y \in [10^{-8}, 1]\) and represent the resulting posterior for \(\theta \in [0, 0.1]\).

lambda_seq <- 10^seq(-8, 0)
par(mfrow = n2mfrow(length(lambda_seq), asp = 1.3), mar = c(3, 3, 1, 1))
for (i in seq_along(lambda_seq)) {
  alpha <- 1 + lambda_seq[i] * y_c[[nearest]][1, 1]
  beta <- 1 + lambda_seq[i] * (m[1] - y_c[[nearest]][1, 1])
  curve(dbeta(x, alpha, beta), n = 1e3, from = 0, to = .1)
  title(paste("\u03BB = ", lambda_seq[i], "; Beta(", signif(alpha, 3), ", ", signif(beta, 3), ")", sep = ""))
}

At this point we’ll specify values for \(\lambda_y\) and \(\lambda_z\). Firstly, we assume that \(\lambda_y < \lambda_z\), by at least an order of magnitude. Then the values are chosen essentially by trial and error so that the criterion for identifying inputs corresponding to output not compatible with the observations is very conservative. The chosen values of \(\lambda_y\) and \(\lambda_z\) are specified with

lambda_y <- 2e-6
lambda_z <- 1e2 * lambda_y

and it should be possible to increase these sequentially with more comprehensive simulator run data. These choices give the following (log) predictive probabilities, which are shown in a histogram.

log_pi_z <- sapply(y_c, pred_prob, z = z, n = n, m = m, lambda_z = lambda_z, lambda_y = lambda_y)
pi_z <- exp(log_pi_z)
hist(log_pi_z, prob = TRUE, main = "Histogram of \u2113(z | x)", xlab = "\u2113(z | x)")
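
To see how conservative these choices are, one could vary \(\lambda_y\) (keeping \(\lambda_z = 100 \lambda_y\)) and count the design runs within \(\log(0.001)\) of the best, which is the implausibility criterion used below. Here is a minimal sketch:

# number of runs deemed not implausible as lambda_y varies
sapply(10^seq(-7, -5), function(ly) {
  lp <- sapply(y_c, pred_prob, z = z, n = n, m = m, lambda_z = 1e2 * ly, lambda_y = ly)
  sum(lp - max(lp) > log(0.001))
})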

3.4 An aside: is the `best’ simulator run any good?

The best simulator run is clearly that with input \(\{x_i : \ell(z \mid x_i) = \ell_0 = \max_{x_i \in D} \ell(z \mid x_i)\}\). So let’s set \(\ell_0\).

l_0 <- max(log_pi_z)

Click here to see a comparison between the observations and `best’ simulator output.

Now we’ll plot the observations and cases for the best simulator run. Let’s set up the data for this. We’re going to need some polygons representing the UAs. We can get these from this ONS link, from which we’ll choose the `Shapefile’ download. The following reads in the shapefile, and gets rid of any UAs not in our data.

#UAs <- readOGR("Local_Authority_Districts__December_2017__Boundaries_in_the_UK__WGS84_.shp")
#poly_match <- match(lads$LAD11CD, UAs$lad17cd)
UAs <- readOGR("/home/ben/Dropbox/Exeter/Research/covid/england_lad_2011_gen_clipped.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/ben/Dropbox/Exeter/Research/covid/england_lad_2011_gen_clipped.shp", layer: "england_lad_2011_gen_clipped"
## with 326 features
## It has 3 fields
poly_match <- match(lads$LAD11CD, UAs$code)
poly_okay <- is.finite(poly_match)
UAs <- UAs[poly_match[poly_okay], ]

Next we want to colour the polygons

obs_prop <- z[, 1] / n
sim_prop <- y_c[[which.max(log_pi_z)]][, 1] / m
prop_seq <- pretty(pmin(c(obs_prop, sim_prop), .06), 10)
prop_pal <- grey(rev(seq(0, 1, l = length(prop_seq) - 1)))
obs_cols <- prop_pal[as.integer(cut(obs_prop, prop_seq))]
sim_cols <- prop_pal[as.integer(cut(sim_prop, prop_seq))]

and then plot them together

layout(matrix(1:3, 1), widths = c(1, 1, .05))
par(mar = c(.5, .5, 3, 2), oma = c(0, 0, 0, 3))
plot(UAs, col = obs_cols[poly_okay], lwd = .4)
title("Observed")
plot(UAs, col = sim_cols[poly_okay], lwd = .4)
title("Best simulated")
par(plt = replace(par("plt"), 3:4, c(.2, .8)))
image(1, prop_seq, t(prop_seq[-1] - .5 * diff(prop_seq)), 
      col = prop_pal, breaks = prop_seq, axes = FALSE, xlab = "")
axis(side = 4, las = 2)
box()

which shows very little agreement in pattern. This is fine for now, since only aggregated numbers of cases have been considered: at this early stage the aim is just for overall proportions to be in the `right’ ball park, which could be a rather large ball park.

3.5 Identifying potentially bad input regions

What’s a bad input region? Let’s be conservative about this and assume that an input is bad if \(\ell(z \mid x_i) - \ell_0 < \log(0.001)\), for \(\ell_0\) defined above. Let \(\mathcal{D} = [-1, 1]^5\) denote the input domain; it is worth noting that \(\ell_0 \leq \max_{x \in \mathcal{D}} \ell(z \mid x)\), i.e. that extra conservatism arises from not knowing the true maximum.
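
In terms of quantities already computed, the design points ruled out by this criterion are simply the following, which gives a first sense of how much of the design survives.

bad <- (log_pi_z - l_0) < log(0.001)
sum(bad)  # number of the I = 200 design runs ruled out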

We’ll bin each input’s values, using six bin edges (i.e. five bins) per input, which is easily changed.

# number of bin edges for each input (giving one fewer bins)
n_bin <- rep(6, n_input)
# bin edges
brks <- lapply(n_bin, function(x) seq(-1, 1, length = x))
# bin mid-points (perhaps not used)
mids <- lapply(brks, function(x) x[-1] - .5 * diff(x))
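
For the default n_bin above, each input’s bin edges are

brks[[1]]
## [1] -1.0 -0.6 -0.2  0.2  0.6  1.0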

Then we’ll set up some (very inefficient, yet usefully verbose) functions to calculate maxima over bins for a given input and given pairwise combinations of inputs.

Click here to see functions.

# for each bin of x (edges x0), the max of z relative to top,
# returned on the exp scale by default
lazy_max_vec <- function(x, z, x0, top, exp = TRUE, bridge = 1) {
out <- rep(NA_real_, length(x0) - 1)
for (i in seq_along(x0[-1])) {
  idx <- x >= x0[i] & x < x0[i + 1]
  if (any(idx))
    out[i] <- max(z[idx])
}
out <- bridge * (out - top)
if (exp)
  out <- exp(out)
out
}

and similarly for pairwise combinations of inputs

# pairwise counterpart of lazy_max_vec, over two-dimensional bins
lazy_max_mat <- function(x, y, z, x0, y0, top, exp = TRUE, bridge = 1) {
out <- matrix(NA, length(x0) - 1, length(y0) - 1)
for (i in seq_along(x0[-1])) {
  idx <- x >= x0[i] & x < x0[i + 1]
  for (j in seq_along(y0[-1])) {
    idy <- y >= y0[j] & y < y0[j + 1]
    idxy <- idx & idy
    if (any(idxy))
      out[i, j] <- max(z[idxy])
  }
}
out <- bridge * (out - top)
if (exp)
  out <- exp(out)
out
}

We’ll use these functions to calculate \(\max_{x_{ik} \in [*,*]} \ell(z \mid x)\) for each input (i.e. the maximum log posterior predictive probability for \(x_{ik}\) in a given bin)

vecs <- list()
for (i in seq_len(n_input)) {
  vecs[[i]] <- lazy_max_vec(inputs[,i], log_pi_z, brks[[i]], l_0)
}

and its counterpart for two-dimensional bins for each pairwise combination of inputs.

mats <- lapply(seq_len(n_input), function(i) list())
for (i in 1:(n_input - 1)) {
  for (j in (i + 1):n_input) {
    mats[[i]][[j]] <- lazy_max_mat(inputs[,i], inputs[,j], 
                        log_pi_z, brks[[i]], brks[[j]], l_0)
  }
}

Now let’s plot something useful. We want a \(5 \times 5\) panel of plots (as we have five inputs). On its diagonal we’ll put the one-dimensional binned \(\ell(z \mid x)\); on its lower triangle we’ll put pairwise, binned \(\ell(z \mid x)\); and on its upper triangle we’ll put pairwise `raw’ \(\ell(z \mid x)\).

grid <- diag(1:5)
pairwise_index <- 1:sum(1:(n_input - 1))
grid[lower.tri(grid)] <- max(grid) + pairwise_index
grid[upper.tri(grid)] <- max(grid) + pairwise_index

First we’ll set up the colour scale palettes.

ratio <- exp(log_pi_z - max(log_pi_z))
ratio_seq_log10 <- pretty(log10(ratio))
ratio_mids_log10 <- ratio_seq_log10[-1] - .5 * diff(ratio_seq_log10)
ratio_seq <- 10^ratio_seq_log10
ratio_pal <- rev(grey(seq(0, 1, l = length(ratio_seq) - 1)))
ratio_col <- ratio_pal[as.integer(cut(ratio, ratio_seq))]

These lines produce the plots in the order of diagonals (i.e. binned one-dimensional), lower triangle (i.e. binned pairwise), upper triangle (i.e. raw pairwise), and ending with adding the colour scale key.

Click here to see the plotting code.

plot_max <- function() {
layout(cbind(grid, max(grid) + 1), width = c(rep(1, n_input), .2))
par(mar = rep(.5, 4), oma = c(3, 3, 0, 4))

# one-dimensional on diagonal
for (i in 1:n_input) {
  rep_x <- c(1, rep(2:(length(brks[[i]]) - 1), each = 2), length(brks[[i]]))
  plot(brks[[i]][rep_x], rep(vecs[[i]], each = 2), type = "l", lwd = 2, 
       axes = FALSE, ylim = range(unlist(vecs)), xlim = c(-1, 1))
  box()
  if (i == 1) {
    axis(side = 2, cex.axis = .7)
    mtext(side = 2, text = paste("input", i), line = 2, cex = .7)
  }
  if (i == n_input) {
    axis(side = 1, cex.axis = .7)
    mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
  }
}

# pairwise binned on lower triangle
for (i in 1:(n_input - 1)) {
  for (j in (i + 1):n_input) {
    plot_ij <- list(x = brks[[i]], y = brks[[j]], z = mats[[i]][[j]])
    image(plot_ij, col = ratio_pal, breaks = ratio_seq, axes = FALSE)
    box()
    if (i == 1) {
      axis(side = 2, cex.axis = .7)
      mtext(side = 2, text = paste("input", j), line = 2, cex = .7)
    }
    if (j == n_input) {
      axis(side = 1, cex.axis = .7)
      mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
    }
  }
}

# pairwise raw on upper triangle
for (i in 2:n_input) {
  for (j in 1:(i - 1)) {
    plot(inputs[, i], inputs[, j], bg = ratio_col, pch = 21, axes = FALSE)
    box()
  }
}

# color scale on right-hand side
key <- list(x = 1, y = ratio_seq_log10, z = t(ratio_mids_log10))
image(key, col = ratio_pal, breaks = ratio_seq_log10, axes = FALSE)
box()
axis(side = 4, las = 2, at = ratio_seq_log10, labels = ratio_seq)
}

3.6 NROY-ing some of the input space

I’ve not kept up with how this is done, so this might be rather crude. What we’ll consider here is ruling out parts of space for which \(\ell(z \mid x) - \ell_0 < \log(0.001)\). I don’t know if this is easily done in more than two dimensions, so a simple two-dimensional approach will be considered here. The idea will be to assume that, for each pair of inputs, the NROY region is convex and anything within its hull is not implausible.

We’ll start with a not-necessarily-convex approximation to this region.

threshold <- 1e-3
nroy <- lapply(mats, function(y) lapply(y, function(x) x > threshold))

Then we’ll write another (rather inefficient, yet verbose) function to identify convex hulls. (MASS::chull() will do this for a set of points, but given the rather coarse bins, the small number of inputs, and the high output variation, it’s been avoided here.)

Click here to see the function.

# iteratively fill gaps in each row and column until the flagged region is convex
make_convex <- function(x) {
if (is.matrix(x)) {
  x[!is.finite(x)] <- TRUE
  dim_x <- dim(x)
  cond <- TRUE
  while (cond) {
    x0 <- x
    for (i in 1:2) {
      pass <- suppressWarnings(apply(x, 1, function(x) range(which(x))))
      pass[!is.finite(pass)] <- 0
      x <- sapply(seq_len(dim_x[seq_len(2)[i]]), function(j) 
        replace(logical(dim_x[seq_len(2)[-i]]), pass[1, j]:pass[2, j], TRUE))
    }
    cond <- any(x != x0)
  }
  x <- apply(x, 2, as.integer)
} else {
  x <- NULL
}
x
}

Next we’ll convert the above NROY regions to be convex for each pairwise combination of inputs.

nroy <- lapply(nroy, function(y) lapply(y, function(x) make_convex(x)))
nroy <- unlist(nroy, recursive = FALSE)
nroy <- nroy[!sapply(nroy, is.null)]

Suppose we want a sample of inputs from NROY space. We’ll construct a function to identify points in NROY space.

# TRUE if input(s) x lie in every pairwise convex NROY region in y;
# nmid is the number of bins per input
keeper <- function(x, y, nmid) {
ix <- nmid * .5 * (x + 1)
ix <- pmin(floor(ix), nmid - 1) + 1
if (!inherits(ix, "matrix"))
  ix <- matrix(ix, 1)
ny <- length(y)
out <- vapply(y, function(w) w$z[ix[, c(w$idx, w$idy), drop = FALSE]], double(nrow(ix)))
drop(out %*% rep(1, ny) == ny)
}

Next we’ll tidy up the pairwise NROY data we’ve got

pairs <- combn(n_input, 2)
nroy_data <- lapply(seq_along(nroy), function(i) 
               list(z = nroy[[i]], idx = pairs[1, i], idy = pairs[2, i]))
nroy_data <- lapply(nroy_data, function(x) 
               append(list(x = brks[[x$idx]], y = brks[[x$idy]]), x))

which we can then use to sample 1000, say, inputs from NROY space

n_samp <- 1e3
samp <- matrix(nrow = n_samp, ncol = 5)
k <- 1
while (k <= n_samp) {
  trial <- runif(n_input, -1, 1)
  if (keeper(trial, nroy_data, n_bin - 1)) {
    samp[k, ] <- trial
    k <- k + 1
  }
}
head(samp)
##            [,1]        [,2]        [,3]       [,4]        [,5]
## [1,] 0.68400346 -0.48236414 -0.84217169 -0.4665035 -0.25186345
## [2,] 0.08299794  0.65953913 -0.84573984  0.3346530  0.51906677
## [3,] 0.17199236 -0.65464550  0.12503287  0.1614428 -0.01207468
## [4,] 0.52886143  0.04731404 -0.04611204  0.5092094 -0.03032292
## [5,] 0.32872049 -0.27893624  0.29723345 -0.8749857 -0.07459065
## [6,] 0.50804485  0.52437264 -0.11920640 -0.5267394  0.12895448

in a very slow way.

We can end with a sanity check viewing the NROY space’s convex pairwise regions (lower triangle), and the marginal (diagonal) and pairwise (upper triangle) samples from it.

Click here to see the plotting code.

plot_wave <- function() {
layout(grid)
par(mar = rep(.5, 4), oma = c(3, 3, 0, 4))

# histograms on diagonal
for (i in 1:n_input) {
  hist(samp[, i], axes = FALSE, probability = TRUE, main = "")
  box()
  if (i == 1) {
    axis(side = 2, cex.axis = .7)
    mtext(side = 2, text = paste("input", i), line = 2, cex = .7)
  }
  if (i == n_input) {
    axis(side = 1, cex.axis = .7)
    mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
  }
}

# pairwise NROY
k <- 1
for (i in 1:(n_input - 1)) {
  for (j in (i + 1):n_input) {
    plot_ij <- list(x = brks[[i]], y = brks[[j]], z = nroy[[k]])
    image(plot_ij, col = grey(1:0), breaks = seq(-.5, 1.5, by = 1), axes = FALSE)
    k <- k + 1
    box()
    if (i == 1) {
      axis(side = 2, cex.axis = .7)
      mtext(side = 2, text = paste("input", j), line = 2, cex = .7)
    }
    if (j == n_input) {
      axis(side = 1, cex.axis = .7)
      mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
    }
  }
}

# pairwise samples from NROY
for (i in 2:n_input) {
  for (j in 1:(i - 1)) {
    plot(samp[, i], samp[, j], pch = 20, axes = FALSE)
    box()
  }
}
}

4 Emulation

Obviously, we could — and perhaps eventually should — try to emulate the simulator’s outputs, e.g. the numbers of cases on a given day in a given UA, but this sounds like quite a challenge. Instead, to kick things off, we might consider the approach of Oakley & Youngman (2017), in which the idea is simply to emulate the (logarithm of the) predictive probability surface, given the simulator’s inputs. Let’s introduce some emulator-friendly notation. Consider \(f(x)\) and suppose \(f(x) \sim GP(m(x), v(x, \cdot)),\) where \(m(x) = h^T(x) \beta\), with \(h()\) comprising some basis functions.

We’ll use mogp to fit an emulator. Having installed mogp_emulator following this guide, we’ll build the emulator, based on this guide, using

mogp_dir <- "/path/to/mogp_emulator"
setwd('./path/to/ExeterUQ_MOGP')
source('BuildEmulator/BuildEmulator.R')

A useful thing to first consider is \(h()\)’s form. A priori, we might want to hope that the best inputs don’t lie on the boundary of the input space \(\mathcal{D} = [-1, 1]^5\). Hence, we should consider some form for \(h()\) that allows \(f()\) not to peak on the boundary of \(\mathcal{D}\). For example, \(h^T(x) = (1, x_1, \ldots, x_5, x_1^2, \ldots, x_5^2)\) meets this criterion, whereas \(h^T(x) = (1, x_1, \ldots, x_5)\) doesn’t; so we’ll choose the former. However, this doesn’t seem ideal, given the relatively small number of simulator runs at our disposal, as it gives \(\beta^T = (\beta_1, \ldots, \beta_q)\) with \(q = 11\); i.e. a perhaps slightly too large \(q\). Are there any more parsimonious forms for \(h()\)? Nonetheless, we’ll proceed with this \(h()\), and specify our mean function and priors in mogp format.

mean_func <- paste("y1 ~", 
                   paste(paste("x", seq_len(n_input), sep = ""), collapse = " + ")
                   , "+", 
                   paste(paste("I(x", seq_len(n_input), "^2)", sep = ""), collapse = " + "))

priors <- list(mogp_priors$NormalPrior(0., 1.),
               mogp_priors$InvGammaPrior(2., 1.), 
               mogp_priors$GammaPrior(1., 0.2))
priors <- priors[rep(1:3, c(1 + 3 * n_input, 1, 1))]
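
As a sanity check, the mean function string evaluates to the quadratic form above, with its \(q = 11\) coefficients:

cat(mean_func)
## y1 ~ x1 + x2 + x3 + x4 + x5 + I(x1^2) + I(x2^2) + I(x3^2) + I(x4^2) + I(x5^2)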

For \(v(x, \cdot)\) we’ll just use mogp’s default, allowing for a nugget since MetaWards is stochastic.

Rather more important is what’s emulated. Firstly, we’ll just consider building an emulator on the NROY region. So, let’s ditch those inputs outside of that region

in_nroy <- keeper(as.matrix(inputs), nroy_data, n_bin - 1)
D <- as.matrix(inputs)[in_nroy, ]
f_D <- log_pi_z[in_nroy]

which loses 104 simulator runs (and might be a few too many to drop, given that we’ve got a five-dimensional input space).

Then we’ll use a transformation of the response, based on some behind-the-scenes investigation, to stabilise its variance, which should improve the emulator’s fit. We’ll be emulating \(f(x) = -\sqrt{\ell_0 - \ell(z \mid x)}\).

trans <- function(x, x0) -sign(x0 - x) * sqrt(abs(x0 - x))
itrans <- function(x, x0) x0 - x^2
f_D <- trans(f_D, l_0)
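
A quick check that itrans() undoes trans() for values not exceeding \(\ell_0\), the only case that arises here since \(\ell(z \mid x_i) \leq \ell_0\) by construction:

x_chk <- l_0 + c(-9, -4, -1, 0)  # illustrative log predictive probabilities
all.equal(itrans(trans(x_chk, l_0), l_0), x_chk)
## [1] TRUE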

Next we’ll put our data in the right format for mogp.

em_data <- data.frame(D, f_D)
names(em_data) <- c(paste("x", seq_len(n_input), sep = ""), "y1")

target_list <- extract_targets(em_data, target_cols = c("y1"))
inputs <- target_list[[1]]
targets <- target_list[[2]]
inputdict <- target_list[[3]]

Then we can fit the emulator.

gp <- mogp_emulator$GaussianProcess(inputs, targets,
  mean = mean_func, priors = priors, nugget = "fit",
  inputdict = inputdict)
gp <- mogp_emulator$fit_GP_MAP(gp)

It’s convenient to now set up a wrapper for predictions of the emulator mean

post_mean <- function(x, gp) {
x <- as.matrix(x)
if (ncol(x) == 1)
  x <- t(x)
gp$predict(x)$mean
}

which we’ll use to give the posterior mean for the training data, which we can then check against the corresponding response values

mu_star <- post_mean(inputs, gp)
plot(mu_star, f_D, xlab = "Posterior mean", ylab = "f(D)", asp = 1)
title("Simulator output vs. emulator posterior mean")
abline(0, 1, lty = 2)

The agreement is not bad, especially since we’ve kept at least one redundant input: it looks like the emulator could help us rule out some more of input space. So, let’s give it a go.

We’ll use function gibbs, which is a basic Metropolis-within-Gibbs sampler (each input is updated in turn via a random-walk Metropolis step).

Click here to see code for the sampler.

# Metropolis-within-Gibbs: componentwise random-walk updates on log target f
gibbs <- function(x, f, ..., n = 1e2, sig = .1) {

p <- length(x)
if (length(sig) < p)
  sig <- rep(sig[1], p)
z <- matrix(x, p, n)
y <- numeric(n)

y[1] <- f0 <- f(x, ...)

for (i in 2:n) {
  z[, i] <- z[, i - 1]
  E <- -rexp(p)  # equivalent to log(runif(p)), for the accept/reject steps
  for (j in 1:p) {
    z1 <- z[, i]
    z1[j] <- z1[j] + rnorm(1, 0, sig[j])
    f1 <- f(z1, ...)
    if (f1 - f0 > E[j]) {
      z[j, i] <- z1[j]
      f0 <- f1
    }
    y[i] <- f0
  }
}

attr(z, "y") <- y
return(z)

}

First we’ll set up a function to give the emulated log predictive probability for input \(x\), thus undoing the variance-stabilising transformation.

fx <- function(x, gp, nroy, nmid, x0) {
if (any(abs(x) > 1)) 
  return(-1e20)
if (!keeper(x, nroy, nmid)) 
  return(-1e20)
out <- post_mean(x, gp)
itrans(out, x0)
}

Then we’ll run the sampler for nc iterations, thinning to every 10th,

nc <- 1e4
thin <- seq_len(nc) %% 10 == 0
new_X <- gibbs(numeric(5), fx, gp = gp, 
              nroy = nroy_data, nmid = n_bin - 1, x0 = l_0, 
              n = nc, sig = c(2, 2, 1, 2, 2))
new_mu <- attr(new_X, "y")[thin]
new_X <- new_X[, thin]

which is a little bit slow. We’ll save these.

save(new_mu, new_X, file = "valid.rda")

Finally, let’s take a look at the sampled inputs, with histograms on the diagonal, two-dimensional kernel density estimates on the lower triangle, and the raw inputs on the upper triangle.

Click here for the plotting code.

plot_em_wave <- function() {
layout(cbind(grid, max(grid) + 1), width = c(rep(1, n_input), .2))
par(mar = rep(.5, 4), oma = c(3, 3, 0, 4))
kde_seq <- c(t(outer(10^seq(-6, 0), c(1, 5))))
kde_seq_log10 <- log10(kde_seq)
kde_mids_log10 <- kde_seq_log10[-1] - .5 * diff(kde_seq_log10)
kde_pal <- rev(grey(seq(0, 1, l = length(kde_seq) - 1)^.5))

# histograms on diagonal
for (i in 1:n_input) {
  hist(new_X[i, ], breaks = seq(-1, 1, by = .2), 
                     main = "", axes = FALSE, prob = TRUE)
  box()
  if (i == 1) {
    axis(side = 2, cex.axis = .7)
    mtext(side = 2, text = paste("input", i), line = 2, cex = .7)
  }
  if (i == n_input) {
    axis(side = 1, cex.axis = .7)
    mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
  }
}

# pairwise two-dimensional kernel density estimates
for (i in 1:(n_input - 1)) {
  for (j in (i + 1):n_input) {
    k <- kde2d(new_X[i, ], new_X[j, ], n = 20, lims = c(-1, 1, -1, 1))
    image(k, col = kde_pal, breaks = kde_seq, axes = FALSE)
    box()
    if (i == 1) {
      axis(side = 2, cex.axis = .7)
      mtext(side = 2, text = paste("input", j), line = 2, cex = .7)
    }
    if (j == n_input) {
      axis(side = 1, cex.axis = .7)
      mtext(side = 1, text = paste("input", i), line = 2, cex = .7)
    }
  }
}

# pairwise samples based on emulator mean
for (i in 2:n_input) {
  for (j in 1:(i - 1)) {
    plot(new_X[i, ], new_X[j, ], pch = 20, axes = FALSE, col = grey(0, alpha = .3))
    box()
  }
}

# color scale on right-hand side
key <- list(x = 1, y = kde_seq_log10, z = t(kde_mids_log10))
image(key, col = kde_pal, breaks = kde_seq_log10, axes = FALSE)
box()
axis(side = 4, las = 2, at = kde_seq_log10, labels = kde_seq)
}

Perhaps most striking is that small values of input 3 appear best.

5 Emulation: potential next steps

Although the emulator appears to identify distinct parts of input space for which the simulator’s output is relatively good, it is important to establish whether the emulated predictive probabilities reasonably match the actual values.

We’ll take 40 inputs from the samples, and run five replicates for each. These are then input to MetaWards.

n_valid <- 40
thinner <- seq_len(ncol(new_X)) %% round(ncol(new_X) / n_valid) == 0
run2 <- as.data.frame(t(new_X[, thinner]))
target <- new_mu[thinner]
run2$repeats <- 5
write.csv(run2, file = "run2.csv", row.names = FALSE)
system("uq4metawards-uq3a run2.csv limits.csv run2_lims.csv -f")
system("uq4metawards-uq3b run2_lims.csv run2_metawards.csv -f")
system("metawards -d ncov -o raw_outputs --force-overwrite-output --input run2_metawards.csv -a ExtraSeedsLondon.dat --extractor only_i_per_ward_day152 -u lockdown_states.txt --iterator iterate --start-date 2020/01/01 --theme simple")

Following a similar sequence to that used to create the calibration data, we’ll load the new simulator runs.

next_trajectories <- list.files("raw_outputs/", pattern = "wards_trajectory_I.csv.bz2", 
                           recursive = TRUE, full.names = TRUE)
next_metawards_output <- lapply(next_trajectories, raw2tidy, starts = calib_starts, ends = calib_ends)

Then we’ll process the runs, calculate the predictive probabilities for each, and see how many exceed the previous best.

next_inputs <- t(sapply(next_metawards_output, function(x) unlist(x[[1]])))
limits <- read.csv("limits.csv")
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'limits.csv'
for (i in seq_len(ncol(next_inputs))) {
  next_inputs[, i] <- 2 * ((next_inputs[, i] - limits[1, i]) / diff(limits[, i]) - .5)
}
colnames(next_inputs) <- c("incubation_time", "infectious_time", "r_zero", "lock_1_restrict", "lock_2_release")

next_outputs_by_ward <- lapply(next_metawards_output, "[[", 2)
wards <- read.csv("Ward_Lookup.csv")
next_outputs <- lapply(next_outputs_by_ward, function(z) apply(z, 2, function(x) tapply(x, wards$LAD11CD, sum)))
next_outputs <- lapply(next_outputs, function(x) x[output_keep, ])

next_y <- next_outputs
next_y <- lapply(next_y, function(x) pmin(x, m))
next_y_c <- lapply(next_y, function(x) tau_0 * x)
next_log_pi_z <- sapply(next_y_c, pred_prob, z = z, n = n, m = m, lambda_z = lambda_z, lambda_y = lambda_y)

sum(next_log_pi_z > l_0)
## [1] 30

Then we’ll plot the (scaled) predictive probabilities against their emulator predictions.

# match rows of y to rows of x after rounding, to pair new runs with their inputs
matcher <- function(x, y) {
x2 <- apply(round(x, 2), 1, paste, collapse = ":")
y2 <- apply(round(y, 2), 1, paste, collapse = ":")
match(y2, x2)
}

given_inputs <- t(new_X[, thinner])
matched <- matcher(given_inputs, next_inputs)
new_mu2 <- new_mu[thinner][matched]
plot(trans(new_mu2, l_0), trans(next_log_pi_z, l_0), xlab = "Posterior mean", ylab = "f(D)", asp = 1)
title("New simulator output vs. emulator posterior mean")
abline(0, 1, lty = 2)

These look okay, but there are signs that the emulator’s predictive probability predictions might be slightly off in at least one part of input space. So we’ll build a new emulator that incorporates the new simulator runs.

new_log_pi_z <- c(log_pi_z[in_nroy], next_log_pi_z)
new_l_0 <- max(new_log_pi_z)
new_f_D <- trans(new_log_pi_z, new_l_0)
new_D <- rbind(D, given_inputs[matched, ])

new_em_data <- data.frame(new_D, new_f_D)
names(new_em_data) <- c(paste("x", seq_len(n_input), sep = ""), "y1")

new_target_list <- extract_targets(new_em_data, target_cols = c("y1"))
new_inputs <- new_target_list[[1]]
new_targets <- new_target_list[[2]]
new_inputdict <- new_target_list[[3]]

new_gp <- mogp_emulator$GaussianProcess(new_inputs, new_targets,
  mean = mean_func, priors = priors, nugget = "fit",
  inputdict = new_inputdict)
new_gp <- mogp_emulator$fit_GP_MAP(new_gp)

new_mu_star <- post_mean(new_inputs, new_gp)
plot(new_mu_star, new_f_D, xlab = "Posterior mean", ylab = "f(D)", asp = 1)
title("Simulator output vs. new emulator posterior mean")
abline(0, 1, lty = 2)

At this stage we could think about increasing \(\lambda_y\) and \(\lambda_z\), and looping through the above procedure.

6 Extensions and issues

6.1 Non-independence of cases in different UAs

It’s probably not okay to assume that, for the observations, numbers of COVID-19 cases in one UA are independent of those in another. Two partial solutions spring to mind: 1), based on Bayesian inference for misspecified models, to adjust the posterior predictive probabilities according to a power (where \(0 < \text{power} \leq 1\)), which can be calculated by considering Hessians and score statistic variances (which, for now, I won’t elaborate on); or 2) to assume some kind of dependence model, such as a Gaussian Markov random field. Looking at the output of the `best’ simulator run, dependence in the simulator output also appears present, and may be trickier to allow for. Option 1) is nearly equivalent to reducing the discrepancy parameter \(\lambda\) a bit.
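
For option 1), a minimal sketch, with a hypothetical power that would in practice be estimated from the Hessian and score-variance calculations:

kappa <- 0.5  # hypothetical power in (0, 1]
log_pi_z_tempered <- kappa * log_pi_z  # tempered log predictive probabilities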

6.2 Multiple COVID-19 contractions

Ultimately, whether a person contracts COVID-19 is assumed to follow a Bernoulli distribution. If a person can contract it multiple times, this assumption is clearly wrong. I don’t know whether the MetaWards model allows for this. But for some reason it did allow numbers of cases in UAs greater than the UA population (assuming that my matching of the two was correct). If the proportion of cases per UA doesn’t approach unity, I don’t think this is a big issue; but if it does, the distributional assumptions of the calibration model will need changing.

6.3 Wards and not UAs

I don’t know if the calibration data, which are at UA level, are available at ward level. If they are, the calibration methodology should readily transfer. Issues with between-ward dependence might become more pressing, though.

6.4 Direct MCMC

Oakley & Youngman (2017) weren’t able to calibrate their simulator by direct MCMC. However, this might be possible for MetaWards. I don’t know how long a MetaWards run takes, but if it’s not too long, then this could be considered. Alternatively, some sort of hybrid MCMC might make it possible.