This currently requires the MetaWards development branch.
Please download files for running the model from here.
The basic model structure is:
Each of these classes will be split into \(J\) age-classes, with interactions between the classes described in detail below.
The model structure above allows for different progression pathways. We also split the population up into different age demographics, to allow for variable mixing between the age-groups. We define a customised mover
function that is used to move individuals through different pathways, and mixer
functions that scale the force-of-infection (FOI) terms between the different age demographics (explained later). There are currently five pathways, which will be described below, but can be summarised as:
We are calibrating to hospitalised deaths, and so to simplify the model and aid calibration, we assume that the \(H\) class is non-infectious, but split the \(I\) class into two stages, \(I_1\) and \(I_2\). Therefore, on average individuals are infectious for longer if they do not go to hospital. Some evidence suggests that individuals can remain symptomatic for much longer than they remain infectious, but from a transmission perspective, once individuals cease to be infectious they are equivalent to “recovered” individuals. As such, the \(R_I\) class corresponds to recovered and non-infectious symptomatic individuals, and \(T_P + T_{I_1} + T_{I_2}\) to the average infectious period (not average symptomatic period). The incubation period is thus \(T_E + T_P\), and the time between symptom onset and admission to hospital is \(T_{I_1}\). We also make the simplifying assumption that asymptomatic infections last the same time as symptomatic infections.
We need to specify initial proportions of individuals in each age-class, which are provided in the data/populationByAge/Pop by CTRY.csv
file from Rob.
Note that the total number of individuals in the table below is different to the number of individuals in the commuter data that is used in MetaWards, since the commuter data come from the 2011 census, whereas the age-distributions above come from more recent population counts. Hence we use the more recent data to figure out the proportions of individuals in each age-category, which we pass to the MetaWards model.
However, we also need to specify the proportions of the workers and players that are in each age category. In this case we assume that all individuals \(<18\) years old or \(\geq 70\) years old are players only, and furthermore that school-age children don’t move from their home LADs, essentially assuming that most children go to school in their home LAD. (Note that the play movements of the \(<5\) year olds reflect the movements of their parents/carers.) To implement this in MetaWards we can set the proportions of workers/players in each age-class as described below, and then set an age-specific parameter static_play_at_home
for each age-class that we wish to restrict movements for. This latter part is done in the user input file here.
For the worker/player proportions, consider that if \(p_j\) is the proportion of individuals in age-class \(j\) in the population, then the proportion of workers in age-class \(j\) is: \[ p^W_j = \left\{\begin{array}{ll} 0 & \mbox{if $a_j < 18$},\\ \frac{p_j}{\sum_{\{k : 18 \leq a_k < 70\}} p_k} & \mbox{if $j$ s.t. $18 \leq a_j < 70$},\\ 0 & \mbox{if $a_j \geq 70$}. \end{array}\right. \] To generate the proportions of players in each age-class, we need to use the fact that the sum of workers and players in each age-class must equal the number in the population. Hence we require that \[\begin{align*} p^P_jN^P + p^W_j N^W &= p_j \left(N^P + N^W\right)\\ \Rightarrow p^P_j &= \frac{p_j \left(N^P + N^W\right) - p^W_j N^W}{N^P}, \end{align*}\] where \(N^W = 22,469,005\) and \(N^P = 33,613,072\) are the total number of workers and players in the population.
Age | Count | Prop. Pop. | Prop. Workers | Prop. Players |
---|---|---|---|---|
<5 | 3,346,727 | 0.060 | 0.000 | 0.1000 |
5-17 | 8,607,891 | 0.154 | 0.000 | 0.2570 |
18-29 | 8,615,247 | 0.154 | 0.235 | 0.0999 |
30-39 | 7,505,080 | 0.134 | 0.205 | 0.0865 |
40-49 | 7,189,826 | 0.128 | 0.195 | 0.0832 |
50-59 | 7,488,780 | 0.134 | 0.205 | 0.0865 |
60-69 | 5,866,967 | 0.105 | 0.160 | 0.0682 |
70+ | 7,356,660 | 0.131 | 0.000 | 0.2190 |
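The derivation above can be reproduced numerically. The following sketch hard-codes the population proportions and age-class lower bounds from the table, and recovers the worker and player proportions (\(N^W\) and \(N^P\) are the totals quoted above):

```python
import numpy as np

# proportions of the population in each age class (from the table above)
p = np.array([0.060, 0.154, 0.154, 0.134, 0.128, 0.134, 0.105, 0.131])
age_lo = np.array([0, 5, 18, 30, 40, 50, 60, 70])  # lower bound of each class

NW = 22469005  # total number of workers
NP = 33613072  # total number of players

# workers: zero outside 18 <= age < 70, renormalised within the working ages
working = (age_lo >= 18) & (age_lo < 70)
pW = np.where(working, p, 0.0)
pW = pW / pW.sum()

# players: chosen so that pP_j * NP + pW_j * NW = p_j * (NP + NW)
pP = (p * (NP + NW) - pW * NW) / NP
```

The resulting `pW` and `pP` vectors match the "Prop. Workers" and "Prop. Players" columns of the table (up to rounding).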
The demographics in MetaWards are set up using the demographics.json
file.
{"demographics" : ["age1", "age2", "age3", "age4", "age5", "age6", "age7", "age8"],
"work_ratios" : [0, 0, 0.235, 0.205, 0.195, 0.205, 0.16, 0],
"play_ratios" : [0.100108, 0.256943, 0.099855, 0.086539, 0.083213, 0.086539, 0.068235, 0.218568],
"diseases" : ["ncov", "ncov", "ncov", "ncov", "ncov", "ncov", "ncov", "ncov"]
}
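Since the work and play ratios are proportions across the demographics, each list should sum to one by construction. A quick sanity check, re-keying the JSON above, is:

```python
import json

# the contents of demographics.json, as shown above
demographics = json.loads("""
{"demographics" : ["age1", "age2", "age3", "age4", "age5", "age6", "age7", "age8"],
 "work_ratios" : [0, 0, 0.235, 0.205, 0.195, 0.205, 0.16, 0],
 "play_ratios" : [0.100108, 0.256943, 0.099855, 0.086539, 0.083213, 0.086539, 0.068235, 0.218568],
 "diseases" : ["ncov", "ncov", "ncov", "ncov", "ncov", "ncov", "ncov", "ncov"]}
""")

work_total = sum(demographics["work_ratios"])
play_total = sum(demographics["play_ratios"])
```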
The stages can be set with the MetaWards disease file, called ncov.json
here.
{ "name" : "SARS-Cov-2 (genpop)",
"stage" : ["E", "P", "I1", "I2", "RI", "DI", "A", "RA", "H", "RH", "DH"],
"beta" : [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
"progress" : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
"too_ill_to_move" : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
"contrib_foi" : [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
}
We will discuss these choices in more detail in the subsequent sections. Note that the progress
parameters are all set to zero here, since all movements will be controlled via the custom mover
function, and the beta
parameters are all set to one for infectious classes, but different values for different infectious classes will be passed in when the model is run, and the mixing between the age-classes will be controlled by a custom mixer
function.
Various functions require additional user inputs. These are stored in the inputs/user_inputs.txt
file. This contains information on the dates of lockdown and other parameters that are necessary to govern various custom functions as described in more detail in various sections of this vignette. For example, the input .nage
controls the number of age-classes in the model as described above.
# Normal status
.can_work[0] = True
# Lockdown 1
.can_work[1] = False
# Lockdown 2
.can_work[2] = True

# Lockdown dates
.lockdown_date_1_year = 2020
.lockdown_date_1_month = 3
.lockdown_date_1_day = 21
.lockdown_date_2_year = 2020
.lockdown_date_2_month = 5
.lockdown_date_2_day = 13

# contact matrix
.contact_matrix1_filename = "inputs/POLYMOD_matrix.csv"
.contact_matrix2_filename = "inputs/coMix_matrix.csv"

# number of age classes
.nage = 8

# names of input files for seeding
.age_seed_filename = "inputs/age_seeds.csv"
.ini_states_filename = "inputs/seeds.csv"

# turn off movements in school-age children
age2:static_play_at_home = 1.0
We seed infections proportionately to the age-structure of the population (described above). These are stored in a file called inputs/age_seeds.csv
, containing the probabilities below:
1,0.06
2,0.154
3,0.154
4,0.134
5,0.128
6,0.134
7,0.105
8,0.131
Here the first column is the age demographic and the second is the probability of seeding into that demographic. Since we do not know when or where initial importations of infection happened, we need some way to inform this. We want to avoid treating the number and timing of initial importations as additional unknown parameters in the model, since this vastly increases the volume of parameter space to search. Instead we have developed a data-driven prior for the states of the system at a given point in time, which depends on some data and also the parameters of the model.
The general approach is to take deaths observed in each Lower Tier Local Authority (LTLA) region on each day up to some fixed time point early on in the epidemic. Here we choose the 13th March 2020. We then fit a simplified version of the MetaWards model to each LTLA independently, where we assume a single population of individuals in each LTLA, and we remove age-structure from the model (taking a weighted average of the transition parameters across all the age-classes as inputs).
In addition to the transmission process, we also introduce a fixed parameter \(p_{\text{ini}}\), which corresponds to the probability of an introduction of infection on a given day, independent of the transmission process. For a fixed set of parameters, we fit the model from 1st January 2020–13th March 2020 using an individual forwards-filtering backwards sampling (iFFBS) algorithm1. Here the unknowns are the states of the system and the model is calibrated to the observed number of deaths in each LTLA. This provides a posterior distribution for the states of the system at each time point between 1st January 2020 and 13th March 2020, conditional on the data. We take random samples from this distribution to provide initial states for our MetaWards model runs. We use truncated sampling to ensure that at least one individual is either infectious or can progress to infectiousness, so that there is a non-zero probability of an outbreak in a given LTLA region. By separating the data used to generate the seeds from the data used to calibrate the model, we try to avoid the pitfalls of using the data twice.
We then take multinomial samples to decide how individuals are seeded into each age-class. Details and instructions can be found in data/seedDeaths/README.md
. This creates a file inputs/seedInfo.rds
, which contains the data necessary to fit the iFFBS model for a given set of parameters. The file R_tools/simulateSeedStates.R
will fit the iFFBS models according to each set of design points and generate a file called inputs/seeds.csv
that contains initial states for use in MetaWards. The iFFBS code is contained in the R_tools/iFFBS.cpp
file.
Note: we have made a few extensions to the standard iFFBS algorithm to make it more efficient for large populations with low numbers of infections.
Note: the user inputs file must contain the relative path names to the age and seed files—see here.
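Before turning to the implementation, the seeding logic described above can be sketched in miniature: sample an initial state vector for an LTLA (truncated so that at least one individual is, or can become, infectious), then assign the seeded individuals to age-classes with a multinomial draw. The state counts below are made up for illustration; the state ordering follows ncov.json:

```python
import numpy as np

rng = np.random.default_rng(1)

state_names = ["E", "P", "I1", "I2", "RI", "DI", "A", "RA", "H", "RH", "DH"]
can_transmit = {"E", "P", "I1", "I2", "A"}  # infectious, or may become so

# toy posterior samples of state counts for one LTLA (columns follow state_names)
posterior_samples = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # would be rejected by the truncation
    [2, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
])

# truncated sampling: redraw until the sample contains at least one
# individual who is, or can become, infectious
idx = [i for i, s in enumerate(state_names) if s in can_transmit]
while True:
    sample = posterior_samples[rng.integers(len(posterior_samples))]
    if sample[idx].sum() > 0:
        break

# age seeding probabilities from inputs/age_seeds.csv
age_probs = np.array([0.060, 0.154, 0.154, 0.134, 0.128, 0.134, 0.105, 0.131])

# assign the seeded individuals across the eight age demographics
age_seeds = rng.multinomial(sample.sum(), age_probs)
```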
To implement this probabilistic seeding, we need a custom iterator
function. The contents of the iterator.py
file are shown below; this includes code to initiate seeds and to deal with lockdown (see here).
from metawards.utils import Console
from metawards.iterators import advance_infprob
from metawards.iterators import advance_play
from metawards.iterators import advance_fixed
from metawards.iterators import advance_foi
from metawards.iterators import advance_recovery
from metawards.iterators import advance_foi_work_to_play
from metawards.iterators import advance_work_to_play
from metawards.movers import MoveGenerator, go_ward
from datetime import datetime
import numpy as np
import re

# use caching to store files read in from filename
seed_file_cache = {}

# read in age lookup and/or return from cache
def read_age_file(filename):
    global seed_file_cache
    if filename not in seed_file_cache:
        with open(filename, "r") as FILE:
            age_probs = [[num for num in line.split(',')] for line in FILE]
        Console.debug("Age-level seeding probabilities:", variables = [age_probs])

        # convert to correct format
        age_probs = np.array(age_probs)
        age_probs_ind = age_probs[:, 0].astype(int)
        age_probs = age_probs[:, 1].astype(float)

        # save in cache
        seed_file_cache[filename] = "STORED"
        seed_file_cache["age_probs_ind"] = age_probs_ind
        seed_file_cache["age_probs"] = age_probs

    return seed_file_cache["age_probs_ind"], seed_file_cache["age_probs"]

# read initial seeding states from file and/or return from cache
def read_states_file(filename):
    global seed_file_cache
    if filename not in seed_file_cache:
        with open(filename, "r") as FILE:
            ini_states = [[num for num in line.split(',')] for line in FILE]
        Console.debug("Initial states:", variables = [ini_states])

        # convert to correct format
        ini_states = np.array(ini_states)
        ini_states_output = ini_states[:, 0]
        ini_states_LAD = ini_states[:, 1].astype(int)
        ini_states = ini_states[:, 2:].astype(int)

        # save in cache
        seed_file_cache[filename] = "STORED"
        seed_file_cache["ini_states_output"] = ini_states_output
        seed_file_cache["ini_states_LAD"] = ini_states_LAD
        seed_file_cache["ini_states"] = ini_states

    return (seed_file_cache["ini_states_output"],
            seed_file_cache["ini_states_LAD"],
            seed_file_cache["ini_states"])

# determine the lockdown status based on the population and current network
def get_lock_down_vars(network, population):
    date = population.date
    params = network.params.user_params

    y1 = int(params["lockdown_date_1_year"])
    m1 = int(params["lockdown_date_1_month"])
    d1 = int(params["lockdown_date_1_day"])

    y2 = int(params["lockdown_date_2_year"])
    m2 = int(params["lockdown_date_2_month"])
    d2 = int(params["lockdown_date_2_day"])

    # lockdown dates
    lock_1 = datetime(y1, m1, d1).date()
    lock_2 = datetime(y2, m2, d2).date()

    state = 0
    rate = 1.0
    if date >= lock_1:
        state += 1
        rate = params["lock_1_restrict"]
    if date >= lock_2:
        state += 1
        rate = (1.0 - ((1.0 - params["lock_1_restrict"]) * params["lock_2_release"]))

    can_work = params["can_work"][state]
    return state, rate, can_work

# advance in a lockdown state on a weekday
def advance_lockdown_week(network, population, **kwargs):
    state, rate, can_work = get_lock_down_vars(network = network, population = population)

    advance_infprob(scale_rate = rate, network = network, population = population, **kwargs)
    advance_play(network = network, population = population, **kwargs)
    if can_work:
        advance_fixed(network = network, population = population, **kwargs)

# advance in a lockdown state at the weekend
def advance_lockdown_weekend(network, population, **kwargs):
    state, rate, can_work = get_lock_down_vars(network = network, population = population)

    ## update dyn_play_at_home
    dyn_play_at_home = []
    for i in range(0, len(network.subnets)):
        dyn_play_at_home.append(network.subnets[i].params.dyn_play_at_home)
        network.subnets[i].params.dyn_play_at_home = \
            network.subnets[i].params.user_params["p_home_weekend"]

    advance_infprob(scale_rate = rate, network = network, population = population, **kwargs)
    advance_work_to_play(network = network, population = population, **kwargs)

    ## reset dyn_play_at_home
    for i in range(0, len(network.subnets)):
        network.subnets[i].params.dyn_play_at_home = dyn_play_at_home[i]

# advance FOI at the weekend
def advance_foi_work_to_play_weekend(network, population, **kwargs):
    ## update dyn_play_at_home
    dyn_play_at_home = []
    for i in range(0, len(network.subnets)):
        dyn_play_at_home.append(network.subnets[i].params.dyn_play_at_home)
        network.subnets[i].params.dyn_play_at_home = \
            network.subnets[i].params.user_params["p_home_weekend"]

    advance_foi_work_to_play(network = network, population = population, **kwargs)

    ## reset dyn_play_at_home
    for i in range(0, len(network.subnets)):
        network.subnets[i].params.dyn_play_at_home = dyn_play_at_home[i]

# custom advance function to set up initial seeds
def advance_initial_seeds(output_dir, network, population, infections, profiler, rngs, **kwargs):
    # extract user parameters
    params = network.params

    # extract file names for initial seeding probabilities
    age_seed_filename = params.user_params["age_seed_filename"]
    ini_states_filename = params.user_params["ini_states_filename"]

    # start profiler
    p = profiler.start("additional_seeds")

    # set up lookups or read from cache
    age_probs_ind, age_probs = read_age_file(age_seed_filename)
    ini_states_output, ini_states_LAD, ini_states = read_states_file(ini_states_filename)

    # get output identifier for this run of MetaWards
    output = re.search(r"Ens([0-9a-z]*)", output_dir.output_dir())
    output = output.group(0)

    # set up vector of state names
    state_names = np.array(["E", "P", "I1", "I2", "RI", "DI", "A", "RA", "H", "RH", "DH"])
    state_nums = np.array(range(len(state_names)))

    # extract states for this run
    filter_states = [i == output for i in ini_states_output]
    if sum(filter_states) == 0:
        raise Exception(f"Cannot find {output} seeding information.")

    # filter seeding information relating to this run
    ini_states_LAD = ini_states_LAD[filter_states]
    ini_states = ini_states[filter_states, :]

    # loop over LADs
    for k in range(len(ini_states_LAD)):
        # extract states to seed in this LAD
        filter_LAD = [i == ini_states_LAD[k] for i in ini_states_LAD]
        ini_states_curr = ini_states[filter_LAD, :][0]

        # extract number of affected individuals
        nind = ini_states_curr.sum()

        if nind > 0:
            # select seeds in age-classes at random according to initial probabilities
            age_seeds = np.random.multinomial(nind, age_probs)

            # run over each age demographic
            for demographic in range(len(age_seeds)):
                # check if any seeding done in demographic
                while age_seeds[demographic] > 0:
                    # select a state for the given seed
                    state_probs = ini_states_curr / ini_states_curr.sum()
                    states = np.random.multinomial(1, state_probs)
                    filter_state = [i == 1 for i in states]
                    state = state_names[filter_state]
                    staten = state_nums[filter_state]

                    # remove selected state from remaining states
                    ini_states_curr[staten] -= 1

                    # generate move from S to the selected state
                    move = MoveGenerator(from_demographic = f'age{demographic + 1}',
                                         to_demographic = f'age{demographic + 1}',
                                         from_ward = ini_states_LAD[k],
                                         to_ward = ini_states_LAD[k],
                                         from_stage = "S",
                                         to_stage = state[0],
                                         number = 1)

                    go_ward(generator = move, output_dir = output_dir, network = network,
                            population = population, infections = infections,
                            profiler = profiler, rngs = rngs, **kwargs)

                    # update counter
                    age_seeds[demographic] -= 1

    # end profiler
    p.stop()

# custom iterator function
def iterate_lockdown(stage: str, population, **kwargs):
    # is this a weekday or a weekend?
    if population.date is None:
        # have to guess from the day - assume day zero was a Monday
        day = population.day % 7
        # 0-4 is a weekday, 5 and 6 are weekend
        is_weekend = (day >= 5)
    else:
        day = population.date.weekday()
        # 0-4 is a weekday, 5 and 6 are weekend
        is_weekend = (day >= 5)

    # need custom advance functions to scale rates after lockdown;
    # in the period before lockdown this treats workers as players
    # at weekends
    if population.day == 1:
        if stage == "setup":
            return [advance_initial_seeds]
        else:
            return []
    else:
        if stage == "foi":
            if is_weekend:
                return [advance_foi_work_to_play_weekend, advance_recovery]
            else:
                return [advance_foi, advance_recovery]
        elif stage == "infect":
            if is_weekend:
                return [advance_lockdown_weekend]
            else:
                return [advance_lockdown_week]
        else:
            # we don't do anything at the "analyse" or "finalise" stages
            return []
For generic states \(X\) and \(Y\) say, the progress of an individual in age-class \(j\) (\(j = 1, \dots, J\)) from state \(X\) to state \(Y\) in a given day is governed by probability \(q^j_{XY}\), where:
We also require inputs for \(R_0\), a parameter \(\nu_A\) which scales the force-of-infection between asymptomatics and the general population, and further scaling parameters that we use to model various lockdown measures (described below). We also need a parameter \(p_{\text{home/weekend}}\) which scales the number of movements at weekends.
From the model structure above, in a single population, the force-of-infection, \(\lambda^j\), acting on an individual in age-class \(j\) is: \[\begin{align*} \lambda^j &= \sum_{k=1}^J \frac{\beta^{kj}}{N^k}\left(P^k + I_1^k + I_2^k + \nu_A A^k\right) \end{align*}\] In practice we model the transmission parameters as: \[ \beta^{kj} = \nu c^{kj}, \] where \(0 < \nu < 1\) is a probability of infection per contact, and \(c^{kj}\) is the population contact rate that an individual in class \(j\) has with individuals in class \(k\). We use the population contact matrix \(C = \{c^{kj}; k = 1, \dots, J; j = 1, \dots, J\}\) from the POLYMOD survey in the early stages of the outbreak, and then the CoMix survey after the first lockdown.
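As a concrete illustration of the FOI calculation (all numbers below are placeholders, with two age classes for brevity and the convention `C[k, j]` \(= c^{kj}\)):

```python
import numpy as np

# placeholder values for a two-class illustration
nu, nu_A = 0.05, 0.5
C = np.array([[8.0, 3.0],       # C[k, j] = c^{kj}: contact rate with class k
              [3.0, 5.0]])      # for an individual in class j
N = np.array([1000.0, 2000.0])  # class sizes
P = np.array([5.0, 2.0]); I1 = np.array([3.0, 1.0])
I2 = np.array([1.0, 0.0]); A = np.array([4.0, 6.0])

# lambda^j = sum_k beta^{kj} / N^k * (P^k + I1^k + I2^k + nu_A * A^k),
# with beta^{kj} = nu * c^{kj}
pressure = (P + I1 + I2 + nu_A * A) / N
lam = nu * C.T @ pressure
```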
A challenge is that we parameterise our model in terms of \(R_0\), not \(\nu\), and thus we need a way to derive \(\nu\) from \(R_0\) for a given run. To this end, \(R_0\) can be derived as the maximum eigenvalue of the next-generation matrix (NGM). Firstly, let \(F_i(x)\) be the rate of new infections into infected class \(i\), and \(V_i(x)\) be the rate of transitions from infected class \(i\). Then define matrices: \[ \mathbf{F} = \left[\frac{\partial F_i\left(x_0\right)}{\partial x_j}\right] \quad \mbox{and} \quad \mathbf{V} = \left[\frac{\partial V_i\left(x_0\right)}{\partial x_j}\right] \quad \mbox{for}~1 \leq i, j \leq J, \] where \(x_0\) corresponds to the states at the disease-free equilibrium. Then, the NGM, \(\mathbf{K}\), is: \[ \mathbf{K} = \mathbf{FV}^{-1}, \] and \(R_0\) is the maximum eigenvalue of \(\mathbf{K}\).
From the model structure above, in a single population with age-structure, the infected classes are \(E^j\), \(A^j\), \(P^j\), \(I_1^j\) and \(I_2^j\) for age-classes \(j = 1, \dots, J\). In a deterministic model these would have rates-of-change of: \[\begin{align*} \frac{dE^j}{dt} &= S^j\left[\sum_{k=1}^J \frac{\beta^{kj}}{N^k}\left(P^k + I_1^k + I_2^k + \nu_A A^k\right) \right] - \gamma_E E^j\\ \frac{dA^j}{dt} &= \left(1 - p_{EP}^j\right)\gamma_E E^j - \gamma_A A^j\\ \frac{dP^j}{dt} &= p_{EP}^j\gamma_E E^j - \gamma_P P^j\\ \frac{dI_1^j}{dt} &= \gamma_P P^j - \gamma_{I_1} I_1^j\\ \frac{dI_2^j}{dt} &= \left(1 - p_{I_1H}^j - p_{I_1D}^j\right)\gamma_{I_1} I_1^j - \gamma_{I_2} I_2^j\\ \end{align*}\] Here \(\beta^{kj}\) is the transmission rate from class \(k\) to class \(j\) and \(N^k\) is the total number of individuals in class \(k\). The parameter \(\nu_A\) scales the force-of-infection from the \(A\) class relative to the other infectious classes.
The non-zero components needed to derive the NGM are: \[\begin{align*} \frac{\partial F_{E^i}\left(x_0\right)}{\partial E^j} &= 0 &\qquad \forall i, j,\\ \frac{\partial F_{E^i}\left(x_0\right)}{\partial A^j} &= \frac{\beta^{ji}\nu_A S^i}{N^j} &\qquad \forall i, j\\ \frac{\partial F_{E^i}\left(x_0\right)}{\partial P^j} &= \frac{\beta^{ji}S^i}{N^j} &\qquad \forall i, j\\ \frac{\partial F_{E^i}\left(x_0\right)}{\partial I_1^j} &= \frac{\beta^{ji}S^i}{N^j} &\qquad \forall i, j\\ \frac{\partial F_{E^i}\left(x_0\right)}{\partial I_2^j} &= \frac{\beta^{ji}S^i}{N^j} &\qquad \forall i, j\\ \end{align*}\] and \[\begin{align*} \frac{\partial V_{E^i}\left(x_0\right)}{\partial E^j} &= \left\{\begin{array}{cc} \gamma_E & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{A^i}\left(x_0\right)}{\partial E^j} &= \left\{\begin{array}{cc} -\left(1 - p_{EP}^i\right)\gamma_E & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{A^i}\left(x_0\right)}{\partial A^j} &= \left\{\begin{array}{cc} \gamma_A & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{P^i}\left(x_0\right)}{\partial E^j} &= \left\{\begin{array}{cc} -p_{EP}^i\gamma_E & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{P^i}\left(x_0\right)}{\partial P^j} &= \left\{\begin{array}{cc} \gamma_P & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{I_1^i}\left(x_0\right)}{\partial P^j} &= \left\{\begin{array}{cc} -\gamma_P & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{I_1^i}\left(x_0\right)}{\partial I_1^j} &= \left\{\begin{array}{cc} \gamma_{I_1} & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. 
&\\ \frac{\partial V_{I_2^i}\left(x_0\right)}{\partial I_1^j} &= \left\{\begin{array}{cc} -\left(1 - p_{I_1H}^i - p_{I_1D}^i\right)\gamma_{I_1} & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise,} \end{array}\right. &\\ \frac{\partial V_{I_2^i}\left(x_0\right)}{\partial I_2^j} &= \left\{\begin{array}{cc} \gamma_{I_2} & \mathrm{for}~i = j,\\ 0 & \mathrm{otherwise.} \end{array}\right. &\\ \end{align*}\] Since the non-zero components of \(\mathbf{F}\) all contain \(\beta^{ji}\), where \[ \beta^{ji} = \nu c^{ji}, \] we can thus write \[ \mathbf{K} = \nu\mathbf{G}\mathbf{V}^{-1} \] where \(\mathbf{G}\) is equivalent to replacing \(\beta^{ji}\) by \(c^{ji}\) in \(\mathbf{F}\). As such: \[\begin{align*} R_0 &= \nu~\mathrm{eig}_M\left(\mathbf{G}\mathbf{V}^{-1}\right)\\ \Rightarrow \nu &= \frac{R_0}{\mathrm{eig}_M\left(\mathbf{G}\mathbf{V}^{-1}\right)} \end{align*}\] where \(\mathrm{eig}_M(\mathbf{K})\) denotes the maximum eigenvalue of \(\mathbf{K}\).
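The derivation above can be checked numerically. The sketch below builds \(\mathbf{G}\) and \(\mathbf{V}\) for a two-class toy example (the contact matrix, class sizes, rates and pathway probabilities are all placeholder values, not calibrated inputs) and recovers \(\nu = R_0 / \mathrm{eig}_M(\mathbf{G}\mathbf{V}^{-1})\):

```python
import numpy as np

# placeholder inputs for J = 2 age classes
J = 2
C = np.array([[8.0, 3.0], [3.0, 5.0]])  # C[k, j] = c^{kj}
N = np.array([1000.0, 2000.0])
S0 = N.copy()                  # disease-free equilibrium: everyone susceptible
nu_A = 0.5
p_EP = np.array([0.6, 0.7])
p_I1H = np.array([0.05, 0.10]); p_I1D = np.array([0.01, 0.02])
g_E, g_A, g_P, g_I1, g_I2 = 1/2.0, 1/5.0, 1/2.0, 1/3.5, 1/0.5

# infected classes ordered E, A, P, I1, I2, each with J age classes
def idx(stage, j):
    return stage * J + j  # stage: 0=E, 1=A, 2=P, 3=I1, 4=I2

G = np.zeros((5 * J, 5 * J))
V = np.zeros((5 * J, 5 * J))
for i in range(J):
    for j in range(J):
        # dF_{E^i}/dA^j etc., with beta^{ji} replaced by c^{ji}
        G[idx(0, i), idx(1, j)] = C[j, i] * nu_A * S0[i] / N[j]
        for s in (2, 3, 4):  # P, I1, I2 contribute without the nu_A scaling
            G[idx(0, i), idx(s, j)] = C[j, i] * S0[i] / N[j]
for i in range(J):
    V[idx(0, i), idx(0, i)] = g_E
    V[idx(1, i), idx(0, i)] = -(1 - p_EP[i]) * g_E
    V[idx(1, i), idx(1, i)] = g_A
    V[idx(2, i), idx(0, i)] = -p_EP[i] * g_E
    V[idx(2, i), idx(2, i)] = g_P
    V[idx(3, i), idx(2, i)] = -g_P
    V[idx(3, i), idx(3, i)] = g_I1
    V[idx(4, i), idx(3, i)] = -(1 - p_I1H[i] - p_I1D[i]) * g_I1
    V[idx(4, i), idx(4, i)] = g_I2

R0 = 3.0
eig_M = max(abs(np.linalg.eigvals(G @ np.linalg.inv(V))))  # spectral radius
nu = R0 / eig_M
```

By construction, scaling \(\mathbf{G}\mathbf{V}^{-1}\) by the resulting `nu` gives an NGM whose maximum eigenvalue is the chosen \(R_0\).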
When using a mixture of contact matrices, we parameterise \(\nu\) using the NGM evaluated at the initial contact structure, which represents the contact matrix expected in a completely susceptible population at the start of the outbreak.
We have derived plausible parameter bounds from data and the literature. We note that the analyses below are not intended to produce accurate estimates of these parameters, but rather to establish plausible ranges.
\[\begin{align} \mbox{$R_0$}&: (2, 4.5) \qquad \mbox{from Leon and Rob}\\ T_E&: (0.1, 2)\\ T_P&: (1.2, 3)\\ T_{I_1}&: (2.8, 4.5)\\ T_{I_2}&: (0.0001, 0.5)\\ \alpha_{EP}, \alpha_{I_1H}, \alpha_{I_1D}, \alpha_{HD}, \eta&: \mbox{drawn from distribution as described below}\\ \alpha_{T_H}, \eta_{T_H}&: \mbox{drawn from distribution as described below}\\ \nu_A&: (0, 1)\\ p_{\text{home/weekend}}&: (0, 1) \end{align}\]
For the mean latent period, \(T_E\), we set the upper bound such that the latent period distribution has an upper tail probability that is consistent with the generation interval distribution from Challen et al. (2020). This provides a conservative upper bound, since the generation time is equivalent to \(T_E + T_P\) in our model. We choose a lower bound such that it is possible to have a short, but non-zero latent period. The figure below shows the latent period distributions at the lower and upper bounds of the parameter ranges for \(T_E\).
To generate ranges for the length of the pre-symptomatic period, \(T_P\), we note that the incubation period is equivalent to \(T_E + T_P\) in our model, and hence through simulation we generated incubation period distributions at the lower and upper bounds of \(T_E\), and chose bounds for \(T_P\) such that at the lower bounds of \(T_E\) and \(T_P\) the incubation period distribution looks like the lower-mean incubation period distribution in Challen et al. (2020), and at the upper bounds of \(T_E\) and \(T_P\) the incubation period distribution looks like the higher-mean incubation period distribution in Challen et al. (2020). The figure below shows the incubation period distributions at the lower and upper bounds of the parameter ranges for \(T_E\) and \(T_P\), against the reference distributions in Challen et al. (2020).
To generate ranges for the length of the infectious period, \(T_{I_1}\), we note that this is equivalent to the time-from-onset-to-admission in our model, and hence through simulation we chose bounds for \(T_{I_1}\) such that the distribution of \(T_{I_1}\) was approximately consistent with the time-from-onset-to-admission in Challen et al. (2020), and also that the time-from-infection-to-admission, \(T_E + T_P + T_{I_1}\), was consistent with the time-from-infection-to-admission in Challen et al. (2020), at both the lower and upper bounds for \(T_E\) and \(T_P\). The figure below shows these distributions at the lower and upper bounds of the parameter ranges, against the reference distributions in Challen et al. (2020).
For the remainder of infectivity time, \(T_{I_2}\), we noted that Cevik et al. (2020) found no live virus beyond day 9 of symptom onset. As such, we used simulation to find upper and lower bounds for \(T_{I_2}\). The upper bound of \(T_{I_2}\) was chosen so that when \(T_{I_1}\) was at its lower bound, the probability that the infectivity time post onset (\(T_{I_1} + T_{I_2}\)) was > 9 days was \(\approx 0.05\). Conversely, if \(T_{I_1}\) is set to its upper bound, then even for very low values of \(T_{I_2}\) we were unable to get upper tail probabilities for \(T_{I_1} + T_{I_2}\) of less than \(\approx 0.13\), hence we set a lower bound for \(T_{I_2}\) of 0.0001. At the upper bound for \(T_{I_1}\) and \(T_{I_2}\) these tail probabilities were \(\approx 0.15\).
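These tail probabilities can be reproduced with a short simulation, here assuming (for illustration) exponentially distributed stage lengths with the stated means:

```python
import numpy as np

rng = np.random.default_rng(42)

def tail_prob(T_I1, T_I2, threshold=9.0, n=200_000):
    """P(infectious time post onset exceeds `threshold` days), assuming
    exponential stage lengths with means T_I1 and T_I2."""
    post_onset = rng.exponential(T_I1, n) + rng.exponential(T_I2, n)
    return float((post_onset > threshold).mean())

# lower bound of T_I1 paired with the chosen upper bound of T_I2
p_lo = tail_prob(2.8, 0.5)
# upper bound of T_I1 paired with a negligible T_I2
p_hi = tail_prob(4.5, 0.0001)
```

Under these assumptions the simulated tail probabilities land close to the \(\approx 0.05\) and \(\approx 0.13\) values quoted above.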
For the hospital stay times we used data from the CHESS study which is not publicly available. Below we plot the distribution of stay times by age, along with the best fitting exponential model in each case.
If we plot the log(mean hospital stay time) from the fitted exponential models against the midpoints of each age-category, then we get an approximate increasing linear relationship with age. To capture estimation uncertainties more robustly, we built a hierarchical Bayesian model (fitted using MCMC implemented in NIMBLE) to estimate both the mean hospital stay times in each age category, and then simultaneously the slope and intercept from Gaussian linear regression model fitted to the log(mean hospital stay time) from the exponential models. Full details can be found in the data/hospitalStays/
folder.
As a comparison, below we show the log(mean hospital stay time) estimates from the non-Bayesian models (fitted independently to each age-distribution) against the fitted line from the hierarchical Bayesian model, with 99% credible intervals.
The intercept and slope of this line provide the parameters we use to determine the age-structured hospital stay times, and we’ve decided to draw from the posterior distribution using a space-filling design for these parameters. We do this by generating large numbers of samples from the posterior, and then sub-sampling these using a maximin design to get appropriate coverage. To avoid having to run the MCMC each time we want more design points, we approximate the posterior distribution using a truncated finite Gaussian mixture model, fitted using the mclust
package in R. The posterior distributions from the MCMC and the approximating FMM are shown below.
For the probabilities of transitioning along different pathways by age, we used two sources of information to generate plausible ranges to use as initial inputs into the calibration. We use model estimates of the infection fatality risk and the probability of hospitalisation given infection from Verity et al. (2020), and also some data from the CDC.
The estimates of the infection fatality risk and the probability of hospitalisation given infection from the CDC are obtained in the following way.
These estimates are provided in the data/pathways/CasesHospDeathInHospForUS_Update.xlsx
file.
If we plot the log(probability of hospitalisation given infection) and the log(infection fatality risk) against the midpoints of each age-category, then we get an approximately linear increasing relationship with age in all cases. After discussion, we decided to model the probabilities of transition along certain pathways as log-linear in age, with a common slope parameter, constrained such that the risk of more severe outcomes always increases with age. We allow the intercepts to vary between the different transition pathways. Hence we have five parameters to calibrate: the intercepts \(\alpha_{EP}\), \(\alpha_{I_1D}\), \(\alpha_{I_1H}\), \(\alpha_{HD}\) and the common slope \(\eta\).
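A minimal sketch of the shared-slope log-linear form, using two of the pathways (the intercepts and slope below are placeholder values, not calibrated estimates):

```python
import numpy as np

# midpoints of the eight age categories used above
age_mid = np.array([2.5, 11.5, 24.0, 34.5, 44.5, 54.5, 64.5, 75.0])

eta = 0.05          # common slope (placeholder)
alpha_I1H = -5.0    # intercept for the I1 -> H pathway (placeholder)
alpha_I1D = -7.0    # intercept for the I1 -> D pathway (placeholder)

# log-linear in age, shared slope across pathways
p_I1H = np.exp(alpha_I1H + eta * age_mid)
p_I1D = np.exp(alpha_I1D + eta * age_mid)

# the calibration must reject parameter combinations where the slope is
# non-positive or any transition probability exceeds one
valid = bool(np.all(p_I1H + p_I1D <= 1.0)) and eta > 0
```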
To derive plausible ranges for these parameters, we used an ad-hoc MCMC approach based on simulated data sets. Firstly, we took the infection fatality risks and probabilities of hospitalisation estimates from Verity et al. (2020), and for each age-class we generated a pseudo-data set by randomly sampling from a binomial distribution with 100 trials and a probability of success given by the relevant estimate. This gave us age-specific counts for hospitalisations (\(y_{iH}\)) and death given infection (\(y_{iD}\)) given \(n_i = 100\) “trials” for age-classes \(i\). We then fitted a model to this pseudo-data set, targeting a posterior of the form: \[\begin{align} & \pi\left(\alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta \mid \mathbf{y}, \mathbf{n}\right) \propto\\ & \qquad \pi\left(\mathbf{y} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta, \mathbf{n}\right) \pi\left(\alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta\right). \end{align}\] The likelihood component, \(\pi\left(\mathbf{y} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta, \mathbf{n}\right)\), depends on a set of latent variables, \(\mathbf{x}\), and can be written as: \[ \pi\left(\mathbf{y} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta, \mathbf{n}\right) = \int_{\mathcal{X}}\pi\left(\mathbf{y} \mid \mathbf{x}, \mathbf{n}\right)\pi\left(\mathbf{x} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta\right) d \mathcal{X}, \] where \(\mathcal{X}\) is the (multidimensional) parameter space for the latent states. Here the latent states correspond to the individuals transitioning down the different pathways in the model, and \(\pi\left(\mathbf{y} \mid \mathbf{x}, \mathbf{n}\right)\) is a binomial probability mass function based on probabilities derived from \(\mathbf{x}\) (here we assume that given \(\mathbf{x}\), the data \(y_{iH}\) and \(y_{iD}\) are independent). 
To approximate the integral we used Monte Carlo simulation, such that \[ \hat{\pi}\left(\mathbf{y} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta, \mathbf{n}\right) = \frac{1}{M}\sum_{j=1}^M \pi\left(\mathbf{y} \mid \mathbf{x}_j, \mathbf{n}\right), \] where (with a slight abuse of notation) \(\mathbf{x}_j \sim \pi\left(\mathbf{x} \mid \alpha_{EP}, \alpha_{I_1D}, \alpha_{I_1H}, \alpha_{HD}, \eta\right)\) gives the numbers of individuals out of 100,000 who end up running through each pathway, and for age-class \(i\) the count \(y_i\) is modelled as \(y_i \mid x_i, n_i \sim \mbox{Bin}\left(100, x_i / 100000\right)\), with \(x_i\) the relevant simulated count. We used this estimator in place of the true likelihood in a Metropolis-Hastings algorithm to obtain an approximate “posterior”. We found \(M = 1\) simulation was sufficient, but we had to run long chains to get good mixing.
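This pseudo-marginal scheme can be sketched as follows, with a deliberately simplified stand-in for the latent pathway simulation (two independent binomial pathways rather than the full model; all names here are hypothetical):

```python
import random
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability mass function."""
    return comb(n, y) * p ** y * (1.0 - p) ** (n - y)

def simulate_pathway_counts(p_hosp, p_death, size=100_000, rng=random):
    """Hypothetical stand-in for the pathway simulator: draw the latent
    counts x (hospitalisations and deaths) out of `size` infections."""
    x_h = sum(rng.random() < p_hosp for _ in range(size))
    x_d = sum(rng.random() < p_death for _ in range(size))
    return x_h, x_d

def likelihood_hat(y_h, y_d, n, p_hosp, p_death, M=1, rng=random):
    """Monte Carlo estimate of the likelihood: average the binomial pmf
    of the data over M simulated latent counts (M = 1 in the text)."""
    total = 0.0
    for _ in range(M):
        x_h, x_d = simulate_pathway_counts(p_hosp, p_death, rng=rng)
        total += binom_pmf(y_h, n, x_h / 100_000) * binom_pmf(y_d, n, x_d / 100_000)
    return total / M
```

Since the average is an unbiased estimate of the integrated likelihood, plugging it into a Metropolis-Hastings acceptance ratio still targets the correct posterior, which is why \(M = 1\) can suffice at the cost of slower mixing.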
We repeated this approach for 10 pseudo-data sets derived from the Verity et al. (2020) estimates, and also 10 pseudo-data sets derived from the CDC data. The union of these posterior samples across all 20 data sets provided our plausible ranges. Full details can be found in the data/pathways/
folder.
As a comparison, below we show the probability of hospitalisation given infection estimates from both the Verity and CDC data against the 99% prediction intervals generated from our union posterior samples.
As a comparison, below we show the probability of death given infection estimates from both the Verity and CDC data against the 99% prediction intervals generated from our union posterior samples.
For our input designs we draw from the union posterior distributions using a space-filling maximin sampling design over a large set of posterior samples. To avoid having to run the MCMC each time we want more design points, we approximate the posterior distribution using a truncated finite Gaussian mixture model, fitted using the mclust
package in R. The posterior distributions from the MCMC and the approximating FMM are shown below. In practice we require checks that \(\eta > 0\) and that for any combination of parameters the probabilities of transition in all age-classes are bounded by 1.
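These validity checks amount to a rejection step over candidate draws from the fitted mixture. A minimal sketch (where `draw_fn` is a hypothetical stand-in for sampling from the truncated FMM, and the age midpoints are illustrative):

```python
import math
import random

def valid_draw(alphas, eta, age_midpoints):
    """A draw is valid if the common slope is positive and every implied
    transition probability exp(alpha + eta * age) is bounded by 1."""
    if eta <= 0:
        return False
    return all(math.exp(a + eta * age) <= 1.0
               for a in alphas for age in age_midpoints)

def rejection_sample(draw_fn, n, age_midpoints, max_tries=100_000):
    """Keep drawing candidate parameter sets until n valid ones are found."""
    kept = []
    for _ in range(max_tries):
        alphas, eta = draw_fn()
        if valid_draw(alphas, eta, age_midpoints):
            kept.append((alphas, eta))
            if len(kept) == n:
                break
    return kept

# hypothetical sampler standing in for draws from the fitted mixture
def draw_fn():
    alphas = [random.uniform(-10.0, -4.0) for _ in range(4)]
    eta = random.uniform(-0.02, 0.05)
    return alphas, eta
```

The same filtering idea is what the limitFn argument to FMMmaximin() implements later on for the R design-generation code.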
MetaWards has a specific structure in terms of how it progresses movements between the stages. To get the correct splits between the pathways specified above we do all non-infection movements through a custom mover
function specified below. Note that in the ncov.json
file specified above, we set all progress
parameters to be 0. Thus, all transition probabilities other than new infections are driven by user-defined parameters that are passed to the mover
function, which we call move_pathways.py
here.
The mover function applies movements in order, and so it is important to get the order correct. In particular, we need to process the stage movements in reverse order (e.g. do movements out of the \(H\) stage before movements out of the \(I_1\) stage). This ensures that individuals who move from, say, \(I_1 \to H\) can’t then immediately move out of \(H\) in the same time step. The file move_pathways.py contains the code below.
Additional note: the functions in the mover file operate in turn, so the movement probabilities must be adjusted between each function in order to move the correct proportions. For example, suppose we have \(n\) individuals in the \(I_1\) class and we want to move a proportion \(p_{I_1}p_{I_1H}\) from \(I_1 \to H\), a proportion \(p_{I_1}p_{I_1D}\) from \(I_1 \to D_I\), and a proportion \(p_{I_1}\left(1 - p_{I_1H} - p_{I_1D}\right)\) from \(I_1 \to I_2\), where \(p_{I_1} = 1 - e^{-\gamma_{I_1}}\). In this case the first mover function takes a random binomial sample from the \(n\) individuals with probability \(p_{I_1}p_{I_1H}\) as requested, resulting in \(n_{I_1H}\) moves. However, the second mover function then operates on the remaining \(n - n_{I_1H}\) individuals, so its sampling probability must be adjusted to compensate: it needs to sample from the \(n - n_{I_1H}\) individuals with probability \(\frac{p_{I_1}p_{I_1D}}{1 - p_{I_1}p_{I_1H}}\) in order to generate the correct expected proportion of moves, resulting in \(n_{I_1D}\) moves. Similarly, the third mover function operates on the \(n - n_{I_1H} - n_{I_1D}\) remaining individuals, and thus the sampling probability must be adjusted to \(\frac{p_{I_1}\left(1 - p_{I_1H} - p_{I_1D}\right)}{1 - p_{I_1}\left(p_{I_1H} + p_{I_1D}\right)}\). Any individuals that are not moved remain in \(I_1\).
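This adjustment can be verified numerically. The sketch below (hypothetical helper names, not part of the MetaWards API) applies the three conditional fractions in sequence and recovers the intended unconditional proportions:

```python
def sequential_fractions(pI1, pI1H, pI1D):
    """Conditional fractions applied by each successive mover function."""
    f1 = pI1 * pI1H                               # I1 -> H (of all n)
    f2 = pI1 * pI1D / (1.0 - f1)                  # I1 -> DI (of remainder)
    f3 = (pI1 * (1.0 - pI1H - pI1D)
          / (1.0 - pI1 * (pI1H + pI1D)))          # I1 -> I2 (of remainder)
    return f1, f2, f3

def unconditional_proportions(f1, f2, f3):
    """Apply the fractions in turn to a unit mass of individuals."""
    m1 = f1
    m2 = (1.0 - m1) * f2
    m3 = (1.0 - m1 - m2) * f3
    return m1, m2, m3
```

For example, with \(p_{I_1} = 0.4\), \(p_{I_1H} = 0.2\) and \(p_{I_1D} = 0.1\), the sequential scheme moves exactly \(0.4 \times 0.2 = 0.08\) of the class to \(H\), \(0.4 \times 0.1 = 0.04\) to \(D_I\) and \(0.4 \times 0.7 = 0.28\) to \(I_2\), as required.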
from metawards.movers import go_stage
from metawards.utils import Console


def move_pathways(network, **kwargs):

    # extract user defined parameters
    params = network.params

    ## extract number of age-classes
    nage = params.user_params["nage"]

    ## moves out of E class
    pE = []
    pEP = []
    for j in range(nage):
        pE.append(params.user_params[f'pE_{j + 1}'])
        pEP.append(params.user_params[f'pEP_{j + 1}'])

    ## moves out of A class
    pA = []
    for j in range(nage):
        pA.append(params.user_params[f'pA_{j + 1}'])

    ## moves out of P class
    pP = []
    for j in range(nage):
        pP.append(params.user_params[f'pP_{j + 1}'])

    ## moves out of I1 class
    pI1 = []
    pI1H = []
    pI1D = []
    for j in range(nage):
        pI1.append(params.user_params[f'pI1_{j + 1}'])
        pI1H.append(params.user_params[f'pI1H_{j + 1}'])
        pI1D.append(params.user_params[f'pI1D_{j + 1}'])

    ## moves out of I2 class
    pI2 = []
    for j in range(nage):
        pI2.append(params.user_params[f'pI2_{j + 1}'])

    ## moves out of H class
    pH = []
    pHD = []
    for j in range(nage):
        pH.append(params.user_params[f'pH_{j + 1}'])
        pHD.append(params.user_params[f'pHD_{j + 1}'])

    func = []

    ## moves in reverse order through the stages to
    ## ensure correct mapping
    ## NOTE: looping over the age ranges doesn't work unless you use a
    ## default parameter k = j in the lambda - solution derived from:
    ## https://stackoverflow.com/questions/10452770/python-lambdas-binding-to-local-values

    #######################################################
    #########             H MOVES                 #########
    #######################################################

    ## move H genpop to DH genpop
    ## (clamping is a correction in case of rounding error)
    tpHD = []
    for j in range(nage):
        tpHD.append(pH[j] * pHD[j])
        tpHD[j] = 1.0 if tpHD[j] > 1.0 else tpHD[j]
        tpHD[j] = 0.0 if tpHD[j] < 0.0 else tpHD[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="H",
                                                     to_stage="DH",
                                                     fraction=tpHD[k],
                                                     **kwargs))

    ## move H genpop to R genpop
    ## (denominator adjustment is due to operating on remainder
    ## as described in the vignette, also includes correction
    ## in case of rounding error)
    tpHR = []
    for j in range(nage):
        tpHR.append(pH[j] * (1.0 - pHD[j]) / (1.0 - tpHD[j]))
        tpHR[j] = 0.0 if tpHD[j] == 1.0 else tpHR[j]
        tpHR[j] = 1.0 if tpHR[j] > 1.0 else tpHR[j]
        tpHR[j] = 0.0 if tpHR[j] < 0.0 else tpHR[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="H",
                                                     to_stage="RH",
                                                     fraction=tpHR[k],
                                                     **kwargs))

    #######################################################
    #########             I2 MOVES                #########
    #######################################################

    ## move I2 genpop to RI genpop
    tpI2R = []
    for j in range(nage):
        tpI2R.append(pI2[j])
        tpI2R[j] = 1.0 if tpI2R[j] > 1.0 else tpI2R[j]
        tpI2R[j] = 0.0 if tpI2R[j] < 0.0 else tpI2R[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="I2",
                                                     to_stage="RI",
                                                     fraction=tpI2R[k],
                                                     **kwargs))

    #######################################################
    #########             I1 MOVES                #########
    #######################################################

    ## move I1 genpop to H genpop
    tpI1H = []
    for j in range(nage):
        tpI1H.append(pI1[j] * pI1H[j])
        tpI1H[j] = 1.0 if tpI1H[j] > 1.0 else tpI1H[j]
        tpI1H[j] = 0.0 if tpI1H[j] < 0.0 else tpI1H[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="I1",
                                                     to_stage="H",
                                                     fraction=tpI1H[k],
                                                     **kwargs))

    ## move I1 genpop to DI genpop
    ## (denominator adjustment is due to operating on remainder
    ## as described in the vignette, also includes correction
    ## in case of rounding error)
    tpI1D = []
    for j in range(nage):
        tpI1D.append(pI1[j] * pI1D[j] / (1.0 - tpI1H[j]))
        tpI1D[j] = 0.0 if tpI1H[j] == 1.0 else tpI1D[j]
        tpI1D[j] = 1.0 if tpI1D[j] > 1.0 else tpI1D[j]
        tpI1D[j] = 0.0 if tpI1D[j] < 0.0 else tpI1D[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="I1",
                                                     to_stage="DI",
                                                     fraction=tpI1D[k],
                                                     **kwargs))

    ## move I1 genpop to I2 genpop
    ## (denominator adjustment is due to operating on remainder
    ## as described in the vignette, also includes correction
    ## in case of rounding error)
    tpI1I2 = []
    for j in range(nage):
        tpI1I2.append(pI1[j] * (1.0 - pI1H[j] - pI1D[j]) / (1.0 - pI1[j] * (pI1H[j] + pI1D[j])))
        tpI1I2[j] = 0.0 if (pI1[j] * (pI1H[j] + pI1D[j])) == 1.0 else tpI1I2[j]
        tpI1I2[j] = 1.0 if tpI1I2[j] > 1.0 else tpI1I2[j]
        tpI1I2[j] = 0.0 if tpI1I2[j] < 0.0 else tpI1I2[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="I1",
                                                     to_stage="I2",
                                                     fraction=tpI1I2[k],
                                                     **kwargs))

    #######################################################
    #########             P MOVES                 #########
    #######################################################

    ## move P genpop to I1 genpop
    tpPI1 = []
    for j in range(nage):
        tpPI1.append(pP[j])
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="P",
                                                     to_stage="I1",
                                                     fraction=tpPI1[k],
                                                     **kwargs))

    #######################################################
    #########             A MOVES                 #########
    #######################################################

    ## move A genpop to RA genpop
    tpAR = []
    for j in range(nage):
        tpAR.append(pA[j])
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="A",
                                                     to_stage="RA",
                                                     fraction=tpAR[k],
                                                     **kwargs))

    #######################################################
    #########             E MOVES                 #########
    #######################################################

    ## move E genpop to A genpop
    tpEA = []
    for j in range(nage):
        tpEA.append(pE[j] * (1.0 - pEP[j]))
        tpEA[j] = 1.0 if tpEA[j] > 1.0 else tpEA[j]
        tpEA[j] = 0.0 if tpEA[j] < 0.0 else tpEA[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="E",
                                                     to_stage="A",
                                                     fraction=tpEA[k],
                                                     **kwargs))

    ## move E genpop to P genpop
    ## (denominator adjustment is due to operating on remainder
    ## as described in the vignette, also includes correction
    ## in case of rounding error)
    tpEP = []
    for j in range(nage):
        tpEP.append(pE[j] * pEP[j] / (1.0 - tpEA[j]))
        tpEP[j] = 0.0 if tpEA[j] == 1.0 else tpEP[j]
        tpEP[j] = 1.0 if tpEP[j] > 1.0 else tpEP[j]
        tpEP[j] = 0.0 if tpEP[j] < 0.0 else tpEP[j]
        func.append(lambda k = j, **kwargs: go_stage(go_from=f'age{k + 1}',
                                                     go_to=f'age{k + 1}',
                                                     from_stage="E",
                                                     to_stage="P",
                                                     fraction=tpEP[k],
                                                     **kwargs))

    return func
We will use an interaction matrix to input the contact matrices that scale the force-of-infection (FOI) from each age-class to the other age-classes (see here). When using interaction matrices in this way, the MetaWards beta
parameters correspond to the probability of transmission given an infected contact, \(\nu\), and for asymptomatics we scale beta
by \(0 < \nu_A < 1\). Full details are given here.
The mix_pathways.py
file contains the custom mixer code. Note that we pass in the relative paths to the contact matrix filenames through the user inputs (see here), and we start with the POLYMOD contact matrix and then switch to the CoMix contact matrix after the first lockdown. The merge_matrix_multi_population
merge function is required to get the correct frequency-dependent mixing (see MetaWards tutorial section here).
from metawards.mixers import merge_matrix_multi_population
from metawards.utils import Console
from datetime import datetime

# use caching to store matrix read in from filename
file_cache = {}


# function to read matrix from file and/or return from cache
def read_file(filename, nage):
    global file_cache
    if filename not in file_cache:
        with open(filename, "r") as FILE:
            contact_matrix = [[num for num in line.split(',')] for line in FILE]
        # set up interaction matrix
        matrix = [[0.0 for i in range(nage)] for j in range(nage)]
        for i in range(nage):
            for j in range(nage):
                matrix[i][j] = float(contact_matrix[i][j])
        Console.debug("Interaction matrix", variables = [matrix])
        file_cache[filename] = matrix
    return file_cache[filename]


# determine the lock-down status based on the population and current network
def get_lock_down_status(network, population):
    date = population.date
    params = network.params.user_params

    y1 = int(params["lockdown_date_1_year"])
    m1 = int(params["lockdown_date_1_month"])
    d1 = int(params["lockdown_date_1_day"])

    y2 = int(params["lockdown_date_2_year"])
    m2 = int(params["lockdown_date_2_month"])
    d2 = int(params["lockdown_date_2_day"])

    # Lock down dates
    lock_1 = datetime(y1, m1, d1).date()
    lock_2 = datetime(y2, m2, d2).date()

    state = 0
    if date >= lock_1:
        state += 1
    if date >= lock_2:
        state += 1
    return state


# mixing matrix
def mix_pathways(network, population, **kwargs):

    # extract user parameters
    params = network.params

    # get lockdown status
    state = get_lock_down_status(network = network, population = population)

    # extract contact matrix and scaling information
    nage = params.user_params["nage"]
    if state == 0:
        contact_matrix_filename = params.user_params["contact_matrix1_filename"]
    else:
        contact_matrix_filename = params.user_params["contact_matrix2_filename"]

    # set up interaction matrix in cache or read from cache
    matrix = read_file(contact_matrix_filename, nage)

    network.demographics.interaction_matrix = matrix

    return [merge_matrix_multi_population]
Lockdown scales the FOI terms over different time periods, each of which represents a different stage of interventions. Furthermore, during full lockdown we switch off all worker movements. When partial relaxation of lockdown is introduced, we allow worker movements to begin again, but still scale the FOI (albeit less stringently than during full lockdown). We store this information in a custom iterator function, with the corresponding parameters in inputs/user_inputs.txt.
See here for full iterator function.
We also have a custom extractor function that saves the outputs as a compressed SQL database for each age-class, called age*.db.bz2 (where * varies by age-class). Each database for a given run contains a single table called compact. For efficiency we:
This also means that LADs and/or age-classes that are not infected do not return any results.
This trick hugely reduces the size of the output data, but means that we have to do some post-processing in order to extract quantities of interest. Since the data are stored as an SQL database, we can either query the database directly, or use some of the tools in e.g. R (see below) to interface with it and to reconstruct time-series counts if required.
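For example, a cumulative time-series for a single LAD can be reconstructed directly from the daily incidence with Python's built-in sqlite3 module (a sketch: cumulative_counts is a hypothetical helper, and it assumes the relevant database has already been decompressed from its .bz2 archive):

```python
import sqlite3

def cumulative_counts(db_path, lad, column="Einc"):
    """Sum a daily incidence column from the `compact` table to recover
    a cumulative time-series for one LAD."""
    assert column.isidentifier()  # column names can't be bound as parameters
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        f"select day, {column} from compact where LAD = ? order by day",
        (lad,)).fetchall()
    conn.close()
    out, total = [], 0
    for day, inc in rows:
        total += inc
        out.append((day, total))
    return out
```

Days with no rows for a given LAD simply contribute nothing to the running total, which is consistent with the sparse storage scheme (absent rows mean zero incidence).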
So, to clarify, the age*.db databases each contain a table called compact with entries: day, LAD, Einc, Pinc, I1inc, I2inc, RIinc, DIinc, Ainc, RAinc, Hinc, RHinc, DHinc.
We save this custom extractor
function in the file extractor.py
.
from metawards.extractors import extract_default
from metawards import Networks, OutputFiles, Population, Workspace
from metawards.utils import Console
from copy import deepcopy

_zero_crossings = {}


def create_tables(network: Networks):
    ## return a function that creates the tables
    ## for the specified number of disease classes
    ## Primary key needs to be composite here

    ## set specific column names
    col_names = ["Einc", "Pinc", "I1inc", "I2inc", "RIinc", "DIinc", "Ainc", "RAinc", "Hinc", "RHinc", "DHinc"]
    col_names = [f"{i} int" for i in col_names]
    col_str = ','.join(col_names)

    ## set specific column names
    col_names_ini = ["E", "P", "I1", "I2", "RI", "DI", "A", "RA", "H", "RH", "DH"]
    col_names_ini = [f"{i} int" for i in col_names_ini]
    col_str_ini = ','.join(col_names_ini)

    def initialise(conn):
        c = conn.cursor()
        c.execute(f"create table compact(day int not null, LAD int not null, {col_str}, "
                  f"primary key (day, LAD))")
        c.execute(f"create table compact_ini(day int not null, LAD int not null, {col_str_ini}, "
                  f"primary key (day, LAD))")
        conn.commit()

    return initialise


def output_db(population: Population, network: Networks,
              workspace: Workspace, output_dir: OutputFiles, **kwargs):
    Console.print(f"Calling output_db for a {network.__class__} object")

    ## extract number of age-classes
    nage = network.params.user_params["nage"]

    ## setup marker for previous day
    for i, subnet in enumerate(network.subnets):
        ## if first day, then create a copy of the ward data
        ## this should produce incidences of zero on day 1
        if not hasattr(workspace.subspaces[i], "output_previous"):
            workspace.subspaces[i].output_previous = deepcopy(workspace.subspaces[i].ward_inf_tot)

    if population.day == 1:
        ## open connection to databases
        conn = []
        c = []
        for i in range(nage):
            conn.append(output_dir.open_db(f"age{i + 1}.db", initialise = create_tables(network)))
            c.append(conn[i].cursor())

            ## get data
            ward_inf_tot = workspace.subspaces[i].ward_inf_tot

            ## extract initial counts
            E = ward_inf_tot[0]
            P = ward_inf_tot[1]
            I1 = ward_inf_tot[2]
            I2 = ward_inf_tot[3]
            RI = ward_inf_tot[4]
            DI = ward_inf_tot[5]
            A = ward_inf_tot[6]
            RA = ward_inf_tot[7]
            H = ward_inf_tot[8]
            RH = ward_inf_tot[9]
            DH = ward_inf_tot[10]

            ## loop over wards and write to file
            wards = range(0, workspace.subspaces[0].nnodes + 1)
            day = [population.day] * len(wards)

            ## set column names
            col_names = ["day", "LAD", "E", "P", "I1", "I2", "RI", "DI", "A", "RA", "H", "RH", "DH"]
            col_str = ','.join(col_names)

            ## write to file
            for day, ward, E, P, I1, I2, RI, DI, A, RA, H, RH, DH in\
                zip(day, wards, E, P, I1, I2, RI, DI, A, RA,\
                H, RH, DH):
                if ward not in _zero_crossings:
                    _zero_crossings[ward] = False

                ## set up list of changes
                val = [day, ward, E, P, I1, I2, RI, DI, A, RA, H, RH, DH]
                keeps_str = ",".join([str(v) for v in val])
                qstring = f"insert into compact_ini ({col_str}) values ({keeps_str}) "

                ## try to fudge a marker for first infections
                if any([v > 0 for v in val[2:]]) and _zero_crossings[ward] is False and ward != 0:
                    _zero_crossings[ward] = True
                    Console.print(f"Got first infection in LAD {ward}")

                ## check for any changes in ward
                if _zero_crossings[ward] is True and any([v > 0 for v in val[2:]]):
                    c[i].execute(qstring)
            conn[i].commit()

        ## save today's data so that it can be used tomorrow
        for i, subnet in enumerate(network.subnets):
            workspace.subspaces[i].output_previous = deepcopy(workspace.subspaces[i].ward_inf_tot)
    else:
        ## open connection to databases
        conn = []
        c = []
        for i in range(nage):
            conn.append(output_dir.open_db(f"age{i + 1}.db", initialise = create_tables(network)))
            c.append(conn[i].cursor())

        ## NEED TO DO FOLLOWING CALCULATIONS IN ORDER

        ## loop over age classes
        for j in range(nage):
            ## get data
            ward_inf_previous = workspace.subspaces[j].output_previous
            ward_inf_tot = workspace.subspaces[j].ward_inf_tot

            ## ASYMPTOMATICS
            ## calculate incidence
            RAprime = []
            for old, new in zip(ward_inf_previous[7], ward_inf_tot[7]):
                RAprime.append(new - old)
            Aprime = []
            for old, new, Rinc in zip(ward_inf_previous[6], ward_inf_tot[6], RAprime):
                Aprime.append(new - old + Rinc)

            ## HOSPITAL
            ## calculate incidence
            RHprime = []
            for old, new in zip(ward_inf_previous[9], ward_inf_tot[9]):
                RHprime.append(new - old)
            DHprime = []
            for old, new in zip(ward_inf_previous[10], ward_inf_tot[10]):
                DHprime.append(new - old)
            Hprime = []
            for old, new, Rinc, Dinc in zip(ward_inf_previous[8], ward_inf_tot[8], RHprime, DHprime):
                Hprime.append(new - old + Rinc + Dinc)

            ## GENPOP
            ## calculate incidence
            RIprime = []
            for old, new in zip(ward_inf_previous[4], ward_inf_tot[4]):
                RIprime.append(new - old)
            DIprime = []
            for old, new in zip(ward_inf_previous[5], ward_inf_tot[5]):
                DIprime.append(new - old)
            I2prime = []
            for old, new, Rinc in zip(ward_inf_previous[3], ward_inf_tot[3], RIprime):
                I2prime.append(new - old + Rinc)
            I1prime = []
            for old, new, I2inc, Dinc, Hinc in zip(ward_inf_previous[2], ward_inf_tot[2], I2prime, DIprime, Hprime):
                I1prime.append(new - old + I2inc + Dinc + Hinc)
            Pprime = []
            for old, new, I1inc in zip(ward_inf_previous[1], ward_inf_tot[1], I1prime):
                Pprime.append(new - old + I1inc)
            Eprime = []
            for old, new, Pinc, Ainc in zip(ward_inf_previous[0], ward_inf_tot[0], Pprime, Aprime):
                Eprime.append(new - old + Pinc + Ainc)

            ## loop over wards and write to file
            wards = range(0, workspace.subspaces[0].nnodes + 1)
            day = [population.day] * len(wards)

            ## set column names
            col_names = ["day", "LAD", "Einc", "Pinc", "I1inc", "I2inc", "RIinc", "DIinc", "Ainc", "RAinc", "Hinc", "RHinc", "DHinc"]
            col_str = ','.join(col_names)

            ## write to file
            for day, ward, Einc, Pinc, I1inc, I2inc, RIinc, DIinc, Ainc, RAinc, Hinc, RHinc, DHinc in\
                zip(day, wards, Eprime, Pprime, I1prime, I2prime, RIprime, DIprime, Aprime, RAprime,\
                Hprime, RHprime, DHprime):
                if ward not in _zero_crossings:
                    _zero_crossings[ward] = False

                ## try to fudge a marker for first infections
                if Einc != 0 and _zero_crossings[ward] is False and ward != 0:
                    _zero_crossings[ward] = True
                    Console.print(f"Got first infection in LAD {ward}")

                ## set up list of changes
                val = [day, ward, Einc, Pinc, I1inc, I2inc, RIinc, DIinc, Ainc, RAinc, Hinc, RHinc, DHinc]
                keeps_str = ",".join([str(v) for v in val])
                qstring = f"insert into compact ({col_str}) values ({keeps_str}) "

                ## check for any changes in ward
                if _zero_crossings[ward] is True and any([v > 0 for v in val[2:]]):
                    c[j].execute(qstring)
            conn[j].commit()

        ## save today's data so that it can be used tomorrow
        for i, subnet in enumerate(network.subnets):
            workspace.subspaces[i].output_previous = deepcopy(workspace.subspaces[i].ward_inf_tot)


def extract_db(**kwargs):
    funcs = []
    funcs.append(output_db)
    return funcs
To run designs, we need to generate a disease.csv
file containing different parameters to use for different runs. For consistency, we will define three spaces:
The input and design spaces are fairly trivial to convert between, but some more work has to be done to convert between the input space and the disease space. Thus we have parameter ranges as described above.
In R we can set up the input parameter ranges as follows:
## set up parameter ranges for uniform ranges
parRanges <- data.frame(
    parameter = c("R0", "TE", "TP", "TI1", "TI2",
                  "nuA", "lock_1_restrict", "lock_2_release", "p_home_weekend"),
    lower = c(2, 0.1, 1.2, 2.8, 0.0001, 0, 0, 0, 0),
    upper = c(4.5, 2, 3, 4.5, 0.5, 1, 1, 1, 1),
    stringsAsFactors = FALSE
)
Firstly we want a function to convert between the design and input spaces. A short R function called convertDesignToInput()
which does this is given below and can be found in the R_tools/dataTools.R
script file. This requires a design
data frame with columns denoting each input parameter in parRanges
and rows corresponding to design points. There should be an additional column called output
that defines a unique identifier for each design point, and a column called repeats
that contains the number of repeats for each design point. The convertDesignToInput()
function also requires the parRanges
data frame (defined above). We use the scale
argument to define whether the design is on the \((0, 1)\) (scale = "zero_one"
) or \((-1, 1)\) (scale = "negone_one"
) space.
## function to convert from design to input space
convertDesignToInput <- function(design, parRanges, scale = c("zero_one", "negone_one")) {
    
    require(tidyverse)
    
    ## convert from design space to input space
    input <- mutate(design, ind = 1:n()) %>%
        gather(parameter, value, -ind) %>%
        left_join(parRanges, by = "parameter")
    if(scale[1] == "negone_one") {
        input <- mutate(input, value = value / 2 + 1 / 2)
    }
    input <- mutate(input, value = value * (upper - lower) + lower) %>%
        dplyr::select(ind, parameter, value) %>%
        spread(parameter, value) %>%
        arrange(ind) %>%
        dplyr::select(-ind)
    
    ## return inputs
    input
}
Next we have to generate design points for the age-dependent parameters using the process described above. To do this we have generated two finite mixture models, stored in the files inputs/pathways.rds and inputs/hospStays.rds, and we use these to generate sets of design points, which we sub-sample using a maximin space-filling design. This approach is implemented in the FMMmaximin() function shown below. This takes an mclust model object containing the finite mixture model, a required number of design points (nsamp), and a number of points to pass to the maximin sampler (nseed). In some cases it is also useful to pass a limitFn argument to FMMmaximin(), which provides a function that assesses whether a set of inputs will convert to valid probabilities for MetaWards. We do not need this for hospital stay lengths, since these are rates that by design are always \(> 0\). However, for the pathway relationships we have a log-linear relationship between age and the probabilities of transition, and for some inputs this may result in probabilities that are \(> 1\). Passing a limitFn argument allows us to remove these inputs from the point cloud that the maximin design is taken over, ensuring valid inputs.
## function to generate maximin samples given an arbitrary FMM
FMMmaximin <- function(model, nsamp, limits, limitFn = NULL, nseed = 10000, ...) {
    
    ## check inputs and dependencies
    require(mclust)
    require(fields)
    stopifnot(!missing(model) & !missing(nsamp) & !missing(limits))
    stopifnot(class(model)[1] == "densityMclust")
    stopifnot(is.numeric(nsamp) & round(nsamp) == nsamp & nsamp > 1)
    stopifnot(is.numeric(nseed) & round(nseed) == nseed & nseed > 1 & nsamp < nseed)
    stopifnot(is.matrix(limits))
    stopifnot(ncol(limits) == 2 & nrow(limits) == nrow(model$parameters$mean))
    stopifnot(all(limits[, 1] < limits[, 2]))
    if(!is.null(limitFn)) {
        stopifnot(class(limitFn) == "function")
        stopifnot(formalArgs(limitFn)[1] == "x")
        cat("When using 'limitFn' ensure that function takes matrix as first argument 'x'
            and returns vector of logicals for inclusion of length 'nrow(x)'\n")
    }
    
    ## produce large number of samples from model and ensure they are
    ## consistent with limits
    sims <- matrix(NA, 1, 1)
    while(nrow(sims) < nseed) {
        ## sample from FMM
        sims1 <- sim(model$modelName, model$parameters, nseed)[, -1]
        
        ## check against limits
        for(i in 1:nrow(limits)) {
            sims1 <- sims1[sims1[, i] >= limits[i, 1], ]
            sims1 <- sims1[sims1[, i] <= limits[i, 2], ]
        }
        if(!is.null(limitFn)) {
            sims1 <- sims1[limitFn(sims1, ...), ]
        }
        if(nrow(sims1) > 0) {
            if(nrow(sims) == 1) {
                sims <- sims1
            } else {
                sims <- rbind(sims, sims1)
            }
        }
    }
    sims <- sims[1:nseed, ]
    
    ## rescale
    simsnorm <- apply(sims, 2, function(x) (x - min(x)) / diff(range(x)))
    
    ## sample initial choice at random
    simind <- sample.int(nrow(sims), 1)
    simrem <- c(1:nrow(sims))[-simind]
    
    ## choose other points using maximin
    while(length(simind) < nsamp) {
        dists <- rdist(simsnorm[simind, , drop = FALSE], simsnorm[-simind, , drop = FALSE])
        dists <- apply(dists, 2, min)
        dists <- which(dists == max(dists))
        simind <- c(simind, simrem[dists[1]])
        simrem <- simrem[-dists[1]]
    }
    
    ## return design
    sims[simind, ]
}
Note that we do not have to run the convertDesignToInput()
function on these design points, since they are already in the input space.
Once we have generated design points in this way, we need to transform from the input space to the disease space for MetaWards. A convertInputToDisease()
R function is given below and can be found in the R_tools/dataTools.R
script file. This requires an input
data frame, with columns denoting each input parameter (R0
, TE
, TP
, TI1
, TI2
, alphaEP
, alphaI1H
, alphaI1D
, alphaHD
, eta
, alphaTH
, etaTH
, nuA
, lock_1_restrict
, lock_2_release
) and rows corresponding to each input point, a number of repeats
and a column of unique identifiers (output
).
## function to convert from input to disease space
<- function(input, C, N, S0, ages) {
convertInputToDisease
require(tidyverse)
require(magrittr)
stopifnot(all(c("R0", "nuA", "TE", "TP", "TI1", "TI2", "alphaEP",
"alphaI1H", "alphaI1D", "alphaHD", "eta",
"alphaTH", "etaTH", "lock_1_restrict",
"lock_2_release", "p_home_weekend", "repeats", "output") %in% colnames(input)))
## check unique ID
stopifnot(length(unique(input$output)) == length(input$output))
## scaling for asymptomatics
<- select(input, nuA, output)
disease
## progressions out of the E class
for(j in 1:length(ages)) {
<- mutate(disease, ".pE_{j}" := 1 - exp(-1 / input$TE))
disease
}<- mutate(disease,
disease temp = map2(input$alphaEP, input$eta, function(alpha, eta, ages) {
exp(alpha + eta * ages) %>%
matrix(nrow = 1) %>%
set_colnames(paste0(".pEP_", 1:length(ages))) %>%
as_tibble()
ages = ages)) %>%
}, unnest(cols = temp)
## progressions out of the A class
for(j in 1:length(ages)) {
<- mutate(disease, ".pA_{j}" := 1 - exp(-1 / (input$TP + input$TI1 + input$TI2)))
disease
}
## progressions out of the P class
for(j in 1:length(ages)) {
<- mutate(disease, ".pP_{j}" := 1 - exp(-1 / input$TP))
disease
}
## progressions out of the I1 class
for(j in 1:length(ages)) {
<- mutate(disease, ".pI1_{j}" := 1 - exp(-1 / input$TI1))
disease
}<- mutate(disease,
disease temp = map2(input$alphaI1H, input$eta, function(alpha, eta, ages) {
exp(alpha + eta * ages) %>%
matrix(nrow = 1) %>%
set_colnames(paste0(".pI1H_", 1:length(ages))) %>%
as_tibble()
ages = ages)) %>%
}, unnest(cols = temp)
<- mutate(disease,
disease temp = map2(input$alphaI1D, input$eta, function(alpha, eta, ages) {
exp(alpha + eta * ages) %>%
matrix(nrow = 1) %>%
set_colnames(paste0(".pI1D_", 1:length(ages))) %>%
as_tibble()
ages = ages)) %>%
}, unnest(cols = temp)
## check for multinomial validity
<- select(disease, starts_with(".pI1")) %>%
temp select(-starts_with(".pI1_")) %>%
mutate(ind = 1:n()) %>%
gather(par, prob, -ind) %>%
separate(par, c("par", "age"), sep = "_") %>%
group_by(age, ind) %>%
summarise(prob = sum(prob)) %>%
pluck("prob")
if(any(temp < 0) | any(temp > 1)) {
stop("Some multinomial pI1* probs invalid")
}
## progressions out of the I2 class
for(j in 1:length(ages)) {
<- mutate(disease, ".pI2_{j}" := 1 - exp(-1 / input$TI2))
disease
}
## progressions out of the H class
<- mutate(disease,
disease temp = map2(input$alphaTH, input$etaTH, function(alpha, eta, ages) {
exp(alpha + eta * ages) %>%
1 - exp(-1 / .)} %>%
{matrix(nrow = 1) %>%
set_colnames(paste0(".pH_", 1:length(ages))) %>%
as_tibble()
ages = ages)) %>%
}, unnest(cols = temp)
<- mutate(disease,
disease temp = map2(input$alphaHD, input$eta, function(alpha, eta, ages) {
exp(alpha + eta * ages) %>%
matrix(nrow = 1) %>%
set_colnames(paste0(".pHD_", 1:length(ages))) %>%
as_tibble()
ages = ages)) %>%
}, unnest(cols = temp)
## lockdown scalings
$`.lock_1_restrict` <- input$lock_1_restrict
disease$`.lock_2_release` <- input$lock_2_release
disease
## add weekend movement scaling
$`.p_home_weekend` <- input$p_home_weekend
disease
## remove any invalid inputs
stopifnot(any(disease$nuA >= 0 | disease$nuA <= 1))
stopifnot(
select(disease, starts_with(".p")) %>%
summarise_all(~{all(. >= 0 & . <= 1)}) %>%
apply(1, all)
)
## set up data for calculating transmission parameter
<- select(disease, output, starts_with(".pEP"), starts_with(".pI1H"), starts_with(".pI1D")) %>%
temp gather(prob, value, -output) %>%
separate(prob, c("prob", "age"), sep = "_") %>%
spread(age, value) %>%
group_by(prob, output) %>%
nest() %>%
ungroup() %>%
spread(prob, data) %>%
mutate_at(vars(-output), ~purrr::map(., as.data.frame)) %>%
mutate_at(vars(-output), ~purrr::map(., unlist))
colnames(temp) <- gsub("\\.", "", colnames(temp))
<- select(input, output, R0, nuA, TE, TP, TI1, TI2) %>%
temp inner_join(temp, by = "output")
temp <- mutate(temp, nu = pmap_dbl(as.list(temp)[-1], function(R0, nuA, TE, TP, TI1, TI2, pEP, pI1H, pI1D, C, N, S0) {
    ## transformations
    gammaE <- rep(1 / TE, length(N))
    gammaP <- rep(1 / TP, length(N))
    gammaI1 <- rep(1 / TI1, length(N))
    gammaI2 <- rep(1 / TI2, length(N))
    gammaA <- rep(1 / (TP + TI1 + TI2), length(N))

    ## calculate nu from NGM
    NGM(R0 = R0, nu = NA, C, S0, N, nuA, gammaE, pEP, gammaA, gammaP, gammaI1,
        pI1H, pI1D, gammaI2)$nu
  }, C = C, N = N, S0 = S0))
disease <- inner_join(disease, select(temp, nu, output), by = "output")
## checks on nu
stopifnot(all(disease$nu > 0 & disease$nu < 1))
## finalise data set
disease <- mutate(disease, nuA = nuA * nu) %>%
  mutate(`beta[1]` = nu) %>%
  mutate(`beta[2]` = nu) %>%
  rename(`beta[3]` = nu) %>%
  rename(`beta[6]` = nuA) %>%
  inner_join(select(input, repeats, output), by = "output")
stopifnot(all(disease$`beta[6]` < disease$`beta[1]`))
print(paste0(nrow(input) - nrow(disease), " invalid inputs removed"))
print(paste0(nrow(disease), " samples remaining"))

## return disease file
disease
}
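Several of the pathway probabilities above take the form exp(alpha + eta * age), so that risk scales with age on the log scale. As a quick illustration of how these behave (the alpha and eta values below are made up for illustration, not calibrated ones):

```r
## illustrative (uncalibrated) values for an age-dependent
## probability of the form exp(alpha + eta * age)
ages <- c(2.5, 11, 23.5, 34.5, 44.5, 55.5, 65.5, 75.5)
alpha <- -5
eta <- 0.05
p <- exp(alpha + eta * ages)
round(p, 3)
## probabilities increase with age, and must remain in [0, 1]:
## this is exactly the condition the stopifnot() checks guard against
stopifnot(all(p >= 0 & p <= 1))
```

Because exp() is monotonic, a positive eta always gives probabilities that increase with age; the validity checks are needed because large alpha or eta can push the values above one.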
Also in R_tools/dataTools.R
is a function ensembleIDGen()
that creates unique IDs for each design point. An example of a quick LHS design, with a single replicate per design point, is given by the convertDesign.R
script file:
## load libraries
library(lhs)
library(mclust)
library(tidyverse)
## source dataTools
source("R_tools/dataTools.R")
## set up parameter ranges for uniform ranges
parRanges <- data.frame(
  parameter = c("R0", "TE", "TP", "TI1", "TI2",
                "nuA", "lock_1_restrict", "lock_2_release", "p_home_weekend"),
  lower = c(2, 0.1, 1.2, 2.8, 0.0001, 0, 0, 0, 0),
  upper = c(4.5, 2, 3, 4.5, 0.5, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
## read in contact matrix to use for NGM
C <- as.matrix(read.csv("inputs/POLYMOD_matrix.csv", header = FALSE))
## generate LHS design
ndesign <- 100
design <- randomLHS(ndesign, nrow(parRanges))
colnames(design) <- parRanges$parameter
design <- as_tibble(design)
## convert to input space
inputs <- convertDesignToInput(design, parRanges, "zero_one")
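convertDesignToInput() (in R_tools/dataTools.R) maps the [0, 1] LHS columns onto the ranges in parRanges. The core of that mapping amounts to a simple linear rescaling, sketched here for clarity (scaleToRange() is a hypothetical helper for illustration, not a function from dataTools.R):

```r
## hypothetical helper illustrating the [0, 1] -> [lower, upper] rescaling
scaleToRange <- function(u, lower, upper) {
  lower + u * (upper - lower)
}

## e.g. a unit-scale draw of 0.5 for R0 (range [2, 4.5]) maps to the midpoint
scaleToRange(0.5, 2, 4.5)   # 3.25
```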
## generate space-filling design for other parameters
## load FMM objects
hospStays <- readRDS("inputs/hospStays.rds")
pathways <- readRDS("inputs/pathways.rds")
## generate design points for hospital stay lengths
hospStaysInput <- FMMmaximin(
    hospStays,
    ndesign,
    matrix(c(-Inf, Inf, 0, Inf), ncol = 2, byrow = TRUE)
  ) %>%
  as_tibble() %>%
  rename(alphaTH = x1, etaTH = x2)
## generate design points for other transition probabilities
## this function checks validity of inputs
pathwaysLimitFn <- function(x, ages) {
  singleProbs <- apply(x, 1, function(x, ages) {
    eta <- x[5]
    alphas <- x[-5]
    y <- sapply(alphas, function(a, eta, ages) {
      y <- exp(a + eta * ages)
      all(y >= 0 & y <= 1)
    }, eta = eta, ages = ages)
    all(y)
  }, ages = ages)
  multiProbs <- apply(x[, -c(1, 3)], 1, function(x, ages) {
    alphaI1D <- x[1]
    alphaI1H <- x[2]
    eta <- x[3]
    pI1D <- exp(alphaI1D + eta * ages)
    pI1H <- exp(alphaI1H + eta * ages)
    p <- pI1D + pI1H
    all(p >= 0 & p <= 1)
  }, ages = ages)
  multiProbs & singleProbs
}
## produces design points subject to constraints
pathwaysInput <- FMMmaximin(
    pathways,
    ndesign,
    matrix(c(rep(c(-20, 0), times = 4), 0, 1), ncol = 2, byrow = TRUE),
    pathwaysLimitFn,
    ages = c(2.5, 11, 23.5, 34.5, 44.5, 55.5, 65.5, 75.5)
  ) %>%
  as_tibble() %>%
  rename(alphaEP = x1, alphaI1D = x2, alphaHD = x3, alphaI1H = x4, eta = x5)
## bind to design
inputs <- cbind(inputs, hospStaysInput, pathwaysInput)
## add unique hash identifier
## (at the moment don't use "a0" type ensembleID, because MetaWards
## parses to dates)
inputs$output <- ensembleIDGen(ensembleID = "Ens0", nrow(inputs))
inputs$repeats <- 1
## solution to round numbers preserving sum
## adapted from:
## https://stackoverflow.com/questions/32544646/round-vector-of-numerics-to-integer-while-preserving-their-sum
smart_round <- function(x) {
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y
}
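For example, naive flooring of the vector below loses one unit from the total, whereas smart_round() allocates the leftover units to the entries with the largest fractional parts, so the rounded vector still sums to the rounded total:

```r
## using smart_round() as defined above
x <- c(1.4, 1.4, 1.2)   # sums to 4, but floor(x) gives 1 + 1 + 1 = 3
y <- smart_round(x)
y        # one entry with the largest remainder is bumped up by 1
sum(y)   # 4, matching round(sum(x))
```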
## set up number of initial individuals in each age-class
N <- smart_round(read_csv("inputs/age_seeds.csv", col_names = FALSE)$X2 * 56082077)
S0 <- N - smart_round(read_csv("inputs/age_seeds.csv", col_names = FALSE)$X2 * 100)
ages <- c(2.5, 11, 23.5, 34.5, 44.5, 55.5, 65.5, 75.5)
ages
## convert input to disease
disease <- convertInputToDisease(inputs, C, N, S0, ages)
stopifnot(nrow(disease) == ndesign)
# ## plot inputs
# library(GGally)
# p <- select(inputs, -output, -repeats) %>%
# ggpairs(upper = "blank")
# ggsave("design.pdf", p, width = 10, height = 10)
## reorder samples
inputs <- arrange(inputs, output)
disease <- arrange(disease, output)
disease
## write text file for MetaWards
write.table(disease, "inputs/disease.dat", row.names = FALSE, sep = " ", quote = FALSE)
## save inputs for data for post-simulation runs
saveRDS(inputs, "inputs/inputs.rds")
saveRDS(parRanges, "inputs/parRanges.rds")
This produces a file inputs/disease.dat
that can be passed to MetaWards to run the model. The bash
script below provides the command line instructions needed to the model using these inputs for the start of the epidemic to the first lockdown. If you don’t run Linux, then the file should give you an idea of how to run the model on your own system.
#!/bin/bash
## set up outputs folder
mkdir -p raw_outputs
## run design code
R CMD BATCH --no-restore --slave --no-save convertDesign.R
# ## run seeding code
# cd data/seedDeaths
# R CMD BATCH --no-restore --slave --no-save '--args 2020-02-01 2020-03-11' seedDeaths.R
# cd ../..
## run seeding code
R CMD BATCH --no-restore --slave --no-save '--args 2020-02-01 2020-03-06' R_tools/simulateSeedStates.R
## set path to MetaWardsData repository
export METAWARDSDATA=$HOME/Documents/covid/MetaWardsData
## run model
metawards -d model_code/ncov.json -m 2019LADData\
-D model_code/demographics.json\
--mixer model_code/mix_pathways\
--mover model_code/move_pathways\
--input inputs/disease.dat\
--iterator model_code/iterator\
-u inputs/user_inputs.txt -o raw_outputs --force-overwrite-output \
--extractor model_code/extractor\
--start-date 2020/03/06 --theme simple --nsteps 15
As described above, each model run produces a series of files called age1.db.bz2
, age2.db.bz2
etc., which are compressed databases containing the outputs for each age-class. In practice, it is time-consuming to unzip and extract all of these outputs for all design points, and extract and collate the various quantities we need for our emulation and history matching process. To this end we have been kindly offered a processing and data storage solution on JASMIN.
The simulation outputs from our model are stored at the public repository:
https://gws-access.jasmin.ac.uk/public/covid19/
These are stored in the format:
wave
+-- inputs
+-- raw_outputs
| +-- EnsID
| | +-- age*.db
| | +-- weeksums_*.csv
| +-- EnsIDx002
| | +-- age*.db
| | +-- weeksums_*.csv
| +-- EnsIDx003
| | +-- age*.db
| | +-- weeksums_*.csv
| +-- ...
The age*.db
files are the unzipped raw outputs for that design point (described above). The weeksums_*.csv
files are weekly summaries of mean hospital prevalence (Hprev
column) and total deaths (Deaths
column) on a weekly basis for each LAD and week within each age-class. For a given ensemble ID (Ens0000
say), replicates are appended with the replicate number e.g. Ens0000x001
, Ens0000x002
and so on.
If you’ve run the model yourselves, or want to use the data differently, then as an example of how to manipulate the raw outputs, migrate to an output folder containing e.g. age3.db.bz2
(or download one from the repo above). If this is a zipped file, then you will have to unzip it. In Linux this can be done on the command line e.g.
bzip2 -dkf age3.db.bz2
You will notice that the unzipped age3.db
file is larger than the compressed version. As such, you might need to remove age3.db
at the end if you have limited hard drive space (note that the -k flag in the bzip2 -dkf
call above ensures that the original compressed file is not deleted during decompression).
The database contains a single table called compact
. To store the outputs efficiently, we have introduced various tricks, described above. To clarify, each age*.db
database contains a table called compact
with entries:
day, LAD, Einc, Pinc, I1inc, I2inc, RIinc, DIinc, Ainc, RAinc, Hinc, RHinc, DHinc
If you’re happy with SQL, you can query these directly with e.g. SQLite. If you are an R user, then the dplyr
package (or more specifically the dbplyr
package) provides some useful R tools for querying SQL databases using tidyverse
-type notation. More details about these tools can be found here.
If required, the script R_tools/dataTools.R
provides a function called reconstruct()
that is written using Rcpp
and can be used to reconstruct the time-series counts from the incidence data.
## write Rcpp function to reconstruct counts from incidence
## first element of Einc etc. must be initial states and the following
## must be the incidence at a given time
library(Rcpp)
cppFunction('IntegerMatrix reconstruct(IntegerVector Einc, IntegerVector Pinc,
IntegerVector I1inc, IntegerVector I2inc, IntegerVector RIinc, IntegerVector DIinc,
IntegerVector Ainc, IntegerVector RAinc,
IntegerVector Hinc, IntegerVector RHinc, IntegerVector DHinc) {
// extract sizes
int n = Einc.size();
// set up output matrix
IntegerMatrix output(n, 17);
// set up initial counts
output(0, 0) = 0;
output(0, 1) = Einc[0];
output(0, 2) = 0;
output(0, 3) = Pinc[0];
output(0, 4) = 0;
output(0, 5) = I1inc[0];
output(0, 6) = 0;
output(0, 7) = I2inc[0];
output(0, 8) = RIinc[0];
output(0, 9) = DIinc[0];
output(0, 10) = 0;
output(0, 11) = Ainc[0];
output(0, 12) = RAinc[0];
output(0, 13) = 0;
output(0, 14) = Hinc[0];
output(0, 15) = RHinc[0];
output(0, 16) = DHinc[0];
int E = Einc[0];
int P = Pinc[0];
int I1 = I1inc[0];
int I2 = I2inc[0];
int RI = RIinc[0];
int DI = DIinc[0];
int A = Ainc[0];
int RA = RAinc[0];
int H = Hinc[0];
int RH = RHinc[0];
int DH = DHinc[0];
// reconstruct counts
for(int i = 1; i < n; i++) {
E += Einc[i] - Pinc[i] - Ainc[i];
output(i, 0) = Einc[i];
output(i, 1) = E;
P += Pinc[i] - I1inc[i];
output(i, 2) = Pinc[i];
output(i, 3) = P;
I1 += I1inc[i] - I2inc[i] - Hinc[i] - DIinc[i];
output(i, 4) = I1inc[i];
output(i, 5) = I1;
I2 += I2inc[i] - RIinc[i];
output(i, 6) = I2inc[i];
output(i, 7) = I2;
RI += RIinc[i];
output(i, 8) = RI;
DI += DIinc[i];
output(i, 9) = DI;
A += Ainc[i] - RAinc[i];
output(i, 10) = Ainc[i];
output(i, 11) = A;
RA += RAinc[i];
output(i, 12) = RA;
H += Hinc[i] - RHinc[i] - DHinc[i];
output(i, 13) = Hinc[i];
output(i, 14) = H;
RH += RHinc[i];
output(i, 15) = RH;
DH += DHinc[i];
output(i, 16) = DH;
}
// return counts
return(output);
}')
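To see what reconstruct() is doing, here is the same recurrence written in plain R for the hospital compartment alone (a sketch for illustration only; the Rcpp version above handles all compartments at once). H[1] is the initial count, and thereafter H increases by new admissions and decreases by recoveries and deaths:

```r
## plain-R sketch of the H-compartment recurrence inside reconstruct():
## H[i] = H[i - 1] + Hinc[i] - RHinc[i] - DHinc[i], with H[1] = Hinc[1]
reconstructH <- function(Hinc, RHinc, DHinc) {
  H <- numeric(length(Hinc))
  H[1] <- Hinc[1]
  for (i in seq_along(Hinc)[-1]) {
    H[i] <- H[i - 1] + Hinc[i] - RHinc[i] - DHinc[i]
  }
  H
}

## two initial admissions, then three more (with one recovery),
## then one more (with one death)
reconstructH(c(2, 3, 1), c(0, 1, 0), c(0, 0, 1))   # 2 4 4
```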
As a quick example, imagine that we want to extract the cumulative hospital cases on say day 20 from R for the third age-class. (Note that we do not need the time-series counts for this, only the incidence data.) Here we will need to extract the new hospital cases from day 1–20, and then sum them up for each LAD. Therefore we need to extract day
, LAD
and Hinc
from the compact
table.
## load library
## (you might also need to install the 'RSQLite'
## and 'dbplyr' packages, which 'dplyr' calls)
library(dplyr)

## establish connection to database
con <- DBI::dbConnect(RSQLite::SQLite(), "age3.db")
## connect to the 'compact' table
compact <- tbl(con, "compact")
## examine
compact
## # Source: table<compact> [?? x 13]
## # Database: sqlite 3.33.0
## # [/home/tj/Documents/covid/uq4covid.github.io/vignettes/data/MetaWards/vignette/age3.db]
## day LAD Einc Pinc I1inc I2inc RIinc DIinc Ainc RAinc Hinc RHinc DHinc
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 2 10 1 0 0 0 0 0 0 0 0 0 0
## 2 2 12 1 0 0 0 0 0 1 1 0 0 0
## 3 2 18 0 0 0 0 0 0 0 1 0 0 0
## 4 2 23 0 0 0 0 0 0 1 0 0 0 0
## 5 2 27 1 0 0 0 0 0 0 0 0 0 0
## 6 2 28 1 0 0 0 0 0 0 0 0 0 0
## 7 2 29 0 0 0 0 0 0 0 1 0 0 0
## 8 2 35 0 0 0 0 0 0 1 0 0 0 0
## 9 2 38 1 0 0 0 0 0 0 0 0 0 0
## 10 2 39 1 0 0 0 0 0 0 0 0 0 0
## # … with more rows
By default, the package only pulls enough data from the database to produce a summary on the screen (notice that it prints the dimensions as ?? x 13
). If the database is small enough, then the collect()
function can be used to import tables directly into R as a tibble
. Alternatively, the dbplyr
package can convert tidyverse
-style commands into SQL and run these directly within the database. For large databases this is likely to be much more efficient, in terms of speed and memory usage. You can then collect()
the results of the query into R.
Note: if you need to reconstruct the time-series counts from the incidence data, then you will need to
collect()
the necessary information to R first, and then run the reconstruct()
function as shown below.
Querying databases
As an example of this latter approach, we will set up a query that sums the new hospital cases over the first 20 days in each LAD. Remember: for each LAD the database only contains days when some event happens. For cumulative incidence counts this is fine, since the missing data will be zero in each case, so we just need to filter first to remove all time points \(> 20\). To set up the query:
## Hinc contains the new cases, so sum these
## over each LAD for days 1--20
hosp_db <- filter(compact, day <= 20) %>%
  select(LAD, Hinc) %>%
  group_by(LAD) %>%
  summarise(Hcum = sum(Hinc))
This hasn’t run any commands yet; rather, hosp_db
contains a parsed SQL query that can be run through the database connection. If you like, you can view the generated SQL query using:
## view query
show_query(hosp_db)
## Warning: Missing values are always removed in SQL.
## Use `SUM(x, na.rm = TRUE)` to silence this warning
## This warning is displayed only once per session.
## <SQL>
## SELECT `LAD`, SUM(`Hinc`) AS `Hcum`
## FROM (SELECT `LAD`, `Hinc`
## FROM `compact`
## WHERE (`day` <= 20.0))
## GROUP BY `LAD`
Now let’s run the query and return the results to R by passing hosp_db
to the collect()
function:
## run query and pull to R
hosp <- collect(hosp_db)
## print to screen
hosp
## # A tibble: 336 × 2
## LAD Hcum
## <int> <int>
## 1 1 0
## 2 2 0
## 3 3 0
## 4 4 0
## 5 5 0
## 6 6 0
## 7 7 0
## 8 8 0
## 9 9 0
## 10 10 0
## # … with 326 more rows
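The same aggregation can also be issued as raw SQL through DBI if you prefer. The snippet below demonstrates this on a small in-memory SQLite database standing in for age3.db (the table contents are made up purely for illustration):

```r
library(DBI)

## in-memory SQLite database with a toy 'compact' table
## (toy values purely for illustration)
con2 <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con2, "compact",
  data.frame(day = c(1, 2, 25), LAD = c(1, 1, 1), Hinc = c(2, 3, 4)))

## sum new hospital cases over the first 20 days in each LAD
hosp_sql <- dbGetQuery(con2,
  "SELECT LAD, SUM(Hinc) AS Hcum FROM compact WHERE day <= 20 GROUP BY LAD")
hosp_sql   ## the day-25 row is excluded, so Hcum = 5 for LAD 1

dbDisconnect(con2)
```

This is exactly the SQL that show_query() generated above, written by hand; for anything more involved, composing the query with dplyr verbs and letting dbplyr translate is usually less error-prone.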
Now you can play with hosp
as much as you like. Note that hosp
here only contains information about LADs that have some infections, and only from the time since initial infection. Hence for calibration you might need to expand to fill in the missing LADs. Fortunately, R (especially tidyverse
) has lots of tools for doing this, and an example is given in the next section.
Collecting and reconstructing counts in R
Imagine now that we wish to extract the number of asymptomatic infections on each of the first 20 days in each LAD. Remember: for each LAD the database only contains days when some event happens, and only contains incidence counts, so we will have to collect the data into R, and then run the reconstruct()
function provided above on each LAD.
Note: the
reconstruct()
function requires that the data are in time order, but does not require time points to be expanded (since no events happen on days that are not present in the data set). For multiple LADs the reconstruct function has to run on each LAD separately; hence the code below first orders the data, nests it by LAD, runs the reconstruct function within each LAD, and then removes the nesting.
To set up the query:
## extract information for days 0-20 and collect to R
output <- filter(compact, day <= 20) %>%
  collect()
## now sort by day, group by LAD and run reconstruct()
output <- arrange(output, day) %>%
  group_by(LAD) %>%
  nest() %>%
  mutate(data = map(data, ~{
    output <- reconstruct(
        .$Einc, .$Pinc, .$I1inc, .$I2inc, .$RIinc,
        .$DIinc, .$Ainc, .$RAinc,
        .$Hinc, .$RHinc, .$DHinc
      ) %>%
      magrittr::set_colnames(c(
        "Einc", "E", "Pinc", "P", "I1inc", "I1", "I2inc", "I2",
        "RI", "DI",
        "Ainc", "A", "RA",
        "Hinc", "H", "RH", "DH"
      )) %>%
      as_tibble()
    output$day <- .$day
    output
  })) %>%
  unnest(cols = data) %>%
  ungroup() %>%
  select(day, LAD, everything())
## examine data frame (reordered to put A columns first for later comparison)
select(output, day, LAD, Ainc, A, RA, everything()) %>%
filter(LAD == 317) %>%
print(n = Inf)
## # A tibble: 10 × 19
## day LAD Ainc A RA Einc E Pinc P I1inc I1 I2inc I2
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 6 317 0 0 0 0 1 0 0 0 0 0 0
## 2 7 317 0 0 0 8 9 0 0 0 0 0 0
## 3 8 317 7 7 0 6 8 0 0 0 0 0 0
## 4 9 317 6 11 2 6 8 0 0 0 0 0 0
## 5 10 317 6 15 4 15 17 0 0 0 0 0 0
## 6 11 317 13 27 5 19 22 1 1 0 0 0 0
## 7 12 317 19 41 10 28 30 1 1 1 1 0 0
## 8 13 317 29 65 15 44 45 0 1 0 1 0 0
## 9 14 317 33 89 24 72 82 2 3 0 0 1 1
## 10 15 317 64 136 41 14 26 6 8 1 1 0 0
## # … with 6 more variables: RI <int>, DI <int>, Hinc <int>, H <int>, RH <int>,
## # DH <int>
At this point we have a set of time-series counts, which we will now expand to fill in the missing LAD / day combinations. (This can be done with complete()
, but we’ve found it’s generally faster by joining to a lookup table as below.)
## create lookup to expand LAD / time combinations
lookup <- expand.grid(LAD = 1:339, day = 0:20) %>%
  as_tibble()
## join to lookup, and fill in missing information correctly
asymp <- select(output, day, LAD, A) %>%
  full_join(lookup, by = c("day", "LAD")) %>%
  arrange(LAD, day) %>%
  mutate(A = ifelse(day == 0 & is.na(A), 0, A)) %>%
  group_by(LAD) %>%
  fill(A) %>%
  ungroup()
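For comparison, the complete() route mentioned above looks like this on a toy data frame (toy values for illustration; it is more concise, but we have found the lookup-join approach faster on large outputs):

```r
library(tidyr)
library(dplyr)

## toy reconstructed counts for a single LAD, with days 1 and 3 missing
toy <- tibble(LAD = 1L, day = c(0L, 2L), A = c(0, 5))
toy2 <- complete(toy, LAD = 1L, day = 0:3) %>%
  group_by(LAD) %>%
  fill(A) %>%
  ungroup()
toy2$A   # 0 0 5 5 after filling forwards
```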
## examine output
filter(asymp, LAD == 317) %>%
print(n = Inf)
## # A tibble: 21 × 3
## day LAD A
## <int> <int> <dbl>
## 1 0 317 0
## 2 1 317 0
## 3 2 317 0
## 4 3 317 0
## 5 4 317 0
## 6 5 317 0
## 7 6 317 0
## 8 7 317 0
## 9 8 317 7
## 10 9 317 11
## 11 10 317 15
## 12 11 317 27
## 13 12 317 41
## 14 13 317 65
## 15 14 317 89
## 16 15 317 136
## 17 16 317 136
## 18 17 317 136
## 19 18 317 136
## 20 19 317 136
## 21 20 317 136
Once you do not need access to the SQLite database any more, you should disconnect:
## disconnect from database
DBI::dbDisconnect(con)
You can very easily wrap these ideas into an R function that can scroll through the design IDs, extract relevant outputs and bind to the inputs.