When performing UQ, it can hardly be overstated how useful it can be to run the model you are analysing for yourself. Obviously, if you can, as with Metawards, this comes with its own problems and dangers, and you might run the model in such a way as not to get anything useful out. Thanks to the work of our team (TJ McKinley and Chris Fenton in particular), and with help from Metawards developer Christopher Woods, we have a model structure (see here) and a (relatively) stable version of the code for running it for UQ and extracting the data for yourselves!
So why am I writing a “for dummies” guide to using it and not Chris/TJ? Well, because I qualify as a member of the target audience! Today, I went through TJ’s instructions, failed in several places and figured out everything I needed to do to run it, so I figured I would pass that knowledge on.
As Mac uses Unix and I am on a Mac, the instructions are identical. There will need to be tweaks to the below for Windows users, but what is here should be enough to get you started.
The easiest thing to do is to open a terminal and type
I used pip3
, you may have Python 3 as default and can use pip
. The absolute best thing to do is go here and follow the instructions. They are really clear. For this part it doesn’t matter what directory you are in.
Now, you need to head to a directory where you want the Metawards data to live. If not familar with terminal navigation, I used
and you can check if you are happy with what’s in the directory with
Next install the metawards data files with
Now, we get our version of the model, which consists of a number of files needed to do the running, designing and extracting.
These can be downloaded as a .zip
file here.
Alternatively, these are also on the uq4covid.io
repository here, which you should clone onto your machine or pull if you have it already.
Once you have it, go to directory vignettes/data/MetaWards
and there is everything you need. To ensure we keep the repo version clean, it’s best to copy this folder out to somewhere else. This guide is going to use RStudio to do most things, so head to this folder in Rstudio and set it as your working directory.
The file convertDesign.R
creates new designs to your specification. To run a custom design, you need to open it and change a few things. I’ll highlight the blocks you need to look at now.
First the parameter ranges look like this
parRanges <- data.frame(
parameter = c("r_zero", "incubation_time", "infectious_time", "hospital_time",
"critical_time", "lock_1_restrict", "lock_2_release",
"pEA", "pIH", "pIRprime", "pHC", "pHRprime", "pCR",
"GP_A", "GP_H", "GP_C"),
lower = c(2.5, 4, 2, 4, 4, rep(0, 11)),
upper = c(4, 6, 4, 12, 12, rep(1, 11)),
stringsAsFactors = FALSE
)
You can alter lower
and upper
for any of the parameters (or even fix some if you like) here. See the model description to see what they all do.
Next, generate your own design. The default is a test 5 member LHC and is in the code like this
design <- randomLHS(5, nrow(parRanges))
colnames(design) <- parRanges$parameter
design <- as_tibble(design)
Obviously, change the design to something of your own choosing (remembering at all times how bad random Latin Hypercubes can be), replacing the first of these 3 lines with your own.
The next 2 lines are super important
The second gives the number of repeats per run so is fairly obvious. Our early experiments showed that at least 20 might be needed for some good techniques to work. The output argument is very important. This is a unique identifier for the ensemble. "Ens0"
is the first test ensemble we’ve run. "Ens1"
will be the next. Soon there will be "Ens9"
, "Ensa"
, "Ensb"
and so on. We use “Ens” as Metawards does funny things if you use numbers or some single letters. Ideally, you would let us know you are generating a big ensemble with the next ID for putting on the git page. This identifier will ultimately be responsible for identifying your ensemble from others, and for matching inputs to outputs during extraction. Change it now.
One you have made those changes, run the whole script and you are ready to run metawards!
There are 2 simple steps to this. First, you need to edit the file runscript.sh
. Simply open it with xcode, and alter the line
so that the directory is the place you cloned the Metawards data above. Now it’s run time! The default is to run for 100 days, but you will probably want longer to look at interesting things. The final part of the long line calling Metawards handles this
and you can alter it in obvious ways.
Open up a terminal and navigate via cd
to the folder where your runscript.sh
is. Once there type
and let the fun begin. Even this small test ensemble took about 10 mins on my machine, and it runs on all the cores, so don’t expect to be able to do a tonne whilst it’s running. TJ had a bigger job run all night.
Eventually the fans will cool down and the simulations will be complete. Now you will want to get at some of the outputs. The extractOutput.R
file lets you do this. Open this in RStudio.
If you have all the required libraries, you can just run the whole thing and, by default, you will get the total number of infections in hospitals per ward. There is a lot you can do to customise, and details can be found in the Model Description. To get you started, here are some things you can change.
The chunk
## Hinc contains the new cases, so sum these
## over each ward for days 1--100
hosp_db <- filter(compact, day <= 100) %>%
select(ward, Hinc) %>%
group_by(ward) %>%
summarise(Hcum = sum(Hinc))
## collect outcome of query
hosp <- collect(hosp_db)
is taking the database con
created from the raw outputs, is selecting the first 100 days from the Hinc
stage (which counts the new hospital cases at each day), and summing these up within each ward. Some reference to our model setup is required to see what to do, so maybe head there to see what the stages and demographics refer to in the diagram. Briefly, we have 4 demographics: genpop
, hospital
, critical
, and asymp
. The diagram in TJ’s instruction shows where they are, and also contains a description of the outputs. Note that columns with inc
appended to their names count the new entries into the class on that day. So Hinc
is the number of new hospital cases on a given day, and H
is the number of individuals in class \(H\) (including Hinc
). For brevity, extraneous stages are removed (for example, E
and I
only exist for the genpop
demographic, so they are not returned for any other demographics since all entries are zero in any case). Entries are only returned for a ward once infection is present, so not all wards will have outputs on day 1 say. See the Model Description for full details.
Having extracted data like this in tibble form, it is ready to be processed for building whatever emulators you like.
All of the files for running Metawards are in the different folders in our repo (that you will have cloned onto your machine), so you are free to change things for your own investigations. In the future, when we decide on different experiments we want to do, we will alter these files, but still be able to retain the same designing, running and processing experience.
Happy UQing!