Introduction

Using the UQ4covid initial design described here I will build some GPs and do some data analysis to illustrate challenges for emulating Metawards.

Preliminaries

I will explore using mogp_emulator and the ExeterUQ_MOGP implementation (info is available here. To do that I need to create a pointer to the directory of my mogp_emulator installation and then to source the BuildEmulator.R code from the ExeterUQ_MOGP repository https://github.com/BayesExeter/ExeterUQ_MOGP. For this code block, your directory strings will necessarily be different from mine.

mogp_dir <- "~/Dropbox/BayesExeter/mogp_emulator"
setwd("~/Dropbox/BayesExeter/ExeterUQ_MOGP")
source("~/Dropbox/BayesExeter/ExeterUQ_MOGP/BuildEmulator/BuildEmulator.R")
## far library : Modelization for Functional AutoRegressive processes
## version 0.6-4 (2014-12-07)
## Spam version 2.3-0 (2019-09-13) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## See https://github.com/NCAR/Fields for
##  an extensive vignette, other supplements and source code
## This is loo version 2.1.0.
## **NOTE: As of version 2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session. Visit mc-stan.org/loo/news for details on other changes.

Now we get some data from the uq4covid repository. I will look at the pre-lockdown infections at local authority level.

library(readr)
lads_by_day_cumulative_79 <- read_csv("../../data/metawards/initial_ensemble/data/pre_lockdown_1/lads_by_day_cumulative_79.csv")
## Parsed with column specification:
## cols(
##   .default = col_double()
## )
## See spec(...) for full column specifications.
names(lads_by_day_cumulative_79)
##   [1] "incubation_time"              "infectious_time"             
##   [3] "r_zero"                       "lock_1_restrict"             
##   [5] "lock_2_release"               "Adur"                        
##   [7] "Allerdale"                    "Amber Valley"                
##   [9] "Arun"                         "Ashfield"                    
##  [11] "Ashford"                      "Aylesbury Vale"              
##  [13] "Babergh"                      "Barking and Dagenham"        
##  [15] "Barnet"                       "Barnsley"                    
##  [17] "Barrow-in-Furness"            "Basildon"                    
##  [19] "Basingstoke and Deane"        "Bassetlaw"                   
##  [21] "Bath and North East Somerset" "Bedford"                     
##  [23] "Bexley"                       "Birmingham"                  
##  [25] "Blaby"                        "Blackburn with Darwen"       
##  [27] "Blackpool"                    "Blaenau Gwent"               
##  [29] "Bolsover"                     "Bolton"                      
##  [31] "Boston"                       "Bournemouth"                 
##  [33] "Bracknell Forest"             "Bradford"                    
##  [35] "Braintree"                    "Breckland"                   
##  [37] "Brent"                        "Brentwood"                   
##  [39] "Bridgend"                     "Brighton and Hove"           
##  [41] "Bristol, City of"             "Broadland"                   
##  [43] "Bromley"                      "Bromsgrove"                  
##  [45] "Broxbourne"                   "Broxtowe"                    
##  [47] "Burnley"                      "Bury"                        
##  [49] "Caerphilly"                   "Calderdale"                  
##  [51] "Cambridge"                    "Camden"                      
##  [53] "Cannock Chase"                "Canterbury"                  
##  [55] "Cardiff"                      "Carlisle"                    
##  [57] "Carmarthenshire"              "Castle Point"                
##  [59] "Central Bedfordshire"         "Ceredigion"                  
##  [61] "Charnwood"                    "Chelmsford"                  
##  [63] "Cheltenham"                   "Cherwell"                    
##  [65] "Cheshire East"                "Cheshire West and Chester"   
##  [67] "Chesterfield"                 "Chichester"                  
##  [69] "Chiltern"                     "Chorley"                     
##  [71] "Christchurch"                 "City of London"              
##  [73] "Colchester"                   "Conwy"                       
##  [75] "Copeland"                     "Corby"                       
##  [77] "Cornwall"                     "Cotswold"                    
##  [79] "County Durham"                "Coventry"                    
##  [81] "Craven"                       "Crawley"                     
##  [83] "Croydon"                      "Dacorum"                     
##  [85] "Darlington"                   "Dartford"                    
##  [87] "Daventry"                     "Denbighshire"                
##  [89] "Derby"                        "Derbyshire Dales"            
##  [91] "Doncaster"                    "Dover"                       
##  [93] "Dudley"                       "Ealing"                      
##  [95] "East Cambridgeshire"          "East Devon"                  
##  [97] "East Dorset"                  "East Hampshire"              
##  [99] "East Hertfordshire"           "East Lindsey"                
## [101] "East Northamptonshire"        "East Riding of Yorkshire"    
## [103] "East Staffordshire"           "Eastbourne"                  
## [105] "Eastleigh"                    "Eden"                        
## [107] "Elmbridge"                    "Enfield"                     
## [109] "Epping Forest"                "Epsom and Ewell"             
## [111] "Erewash"                      "Exeter"                      
## [113] "Fareham"                      "Fenland"                     
## [115] "Flintshire"                   "Forest Heath"                
## [117] "Forest of Dean"               "Fylde"                       
## [119] "Gateshead"                    "Gedling"                     
## [121] "Gloucester"                   "Gosport"                     
## [123] "Gravesham"                    "Great Yarmouth"              
## [125] "Greenwich"                    "Guildford"                   
## [127] "Gwynedd"                      "Hackney"                     
## [129] "Halton"                       "Hambleton"                   
## [131] "Hammersmith and Fulham"       "Harborough"                  
## [133] "Haringey"                     "Harlow"                      
## [135] "Harrogate"                    "Harrow"                      
## [137] "Hart"                         "Hartlepool"                  
## [139] "Hastings"                     "Havant"                      
## [141] "Havering"                     "Herefordshire, County of"    
## [143] "Hertsmere"                    "High Peak"                   
## [145] "Hillingdon"                   "Hinckley and Bosworth"       
## [147] "Horsham"                      "Hounslow"                    
## [149] "Huntingdonshire"              "Hyndburn"                    
## [151] "Ipswich"                      "Isle of Anglesey"            
## [153] "Isle of Wight"                "Isles of Scilly"             
## [155] "Islington"                    "Kensington and Chelsea"      
## [157] "Kettering"                    "King's Lynn and West Norfolk"
## [159] "Kingston upon Hull, City of"  "Kingston upon Thames"        
## [161] "Kirklees"                     "Knowsley"                    
## [163] "Lambeth"                      "Lancaster"                   
## [165] "Leeds"                        "Leicester"                   
## [167] "Lewes"                        "Lewisham"                    
## [169] "Lichfield"                    "Lincoln"                     
## [171] "Liverpool"                    "Luton"                       
## [173] "Maidstone"                    "Maldon"                      
## [175] "Malvern Hills"                "Manchester"                  
## [177] "Mansfield"                    "Medway"                      
## [179] "Melton"                       "Mendip"                      
## [181] "Merthyr Tydfil"               "Merton"                      
## [183] "Mid Devon"                    "Mid Suffolk"                 
## [185] "Mid Sussex"                   "Middlesbrough"               
## [187] "Milton Keynes"                "Mole Valley"                 
## [189] "Monmouthshire"                "Neath Port Talbot"           
## [191] "New Forest"                   "Newark and Sherwood"         
## [193] "Newcastle upon Tyne"          "Newcastle-under-Lyme"        
## [195] "Newham"                       "Newport"                     
## [197] "North Devon"                  "North Dorset"                
## [199] "North East Derbyshire"        "North East Lincolnshire"     
## [201] "North Hertfordshire"          "North Kesteven"              
## [203] "North Lincolnshire"           "North Norfolk"               
## [205] "North Somerset"               "North Tyneside"              
## [207] "North Warwickshire"           "North West Leicestershire"   
## [209] "Northampton"                  "Northumberland"              
## [211] "Norwich"                      "Nottingham"                  
## [213] "Nuneaton and Bedworth"        "Oadby and Wigston"           
## [215] "Oldham"                       "Oxford"                      
## [217] "Pembrokeshire"                "Pendle"                      
## [219] "Peterborough"                 "Plymouth"                    
## [221] "Poole"                        "Portsmouth"                  
## [223] "Powys"                        "Preston"                     
## [225] "Purbeck"                      "Reading"                     
## [227] "Redbridge"                    "Redcar and Cleveland"        
## [229] "Redditch"                     "Reigate and Banstead"        
## [231] "Rhondda Cynon Taf"            "Ribble Valley"               
## [233] "Richmond upon Thames"         "Richmondshire"               
## [235] "Rochdale"                     "Rochford"                    
## [237] "Rossendale"                   "Rother"                      
## [239] "Rotherham"                    "Rugby"                       
## [241] "Runnymede"                    "Rushcliffe"                  
## [243] "Rushmoor"                     "Rutland"                     
## [245] "Ryedale"                      "Salford"                     
## [247] "Sandwell"                     "Scarborough"                 
## [249] "Sedgemoor"                    "Sefton"                      
## [251] "Selby"                        "Sevenoaks"                   
## [253] "Sheffield"                    "Shepway"                     
## [255] "Shropshire"                   "Slough"                      
## [257] "Solihull"                     "South Bucks"                 
## [259] "South Cambridgeshire"         "South Derbyshire"            
## [261] "South Gloucestershire"        "South Hams"                  
## [263] "South Holland"                "South Kesteven"              
## [265] "South Lakeland"               "South Norfolk"               
## [267] "South Northamptonshire"       "South Oxfordshire"           
## [269] "South Ribble"                 "South Somerset"              
## [271] "South Staffordshire"          "South Tyneside"              
## [273] "Southampton"                  "Southend-on-Sea"             
## [275] "Southwark"                    "Spelthorne"                  
## [277] "St Albans"                    "St Edmundsbury"              
## [279] "St. Helens"                   "Stafford"                    
## [281] "Staffordshire Moorlands"      "Stevenage"                   
## [283] "Stockport"                    "Stockton-on-Tees"            
## [285] "Stoke-on-Trent"               "Stratford-on-Avon"           
## [287] "Stroud"                       "Suffolk Coastal"             
## [289] "Sunderland"                   "Surrey Heath"                
## [291] "Sutton"                       "Swale"                       
## [293] "Swansea"                      "Swindon"                     
## [295] "Tameside"                     "Tamworth"                    
## [297] "Tandridge"                    "Taunton Deane"               
## [299] "Teignbridge"                  "Telford and Wrekin"          
## [301] "Tendring"                     "Test Valley"                 
## [303] "Tewkesbury"                   "Thanet"                      
## [305] "The Vale of Glamorgan"        "Three Rivers"                
## [307] "Thurrock"                     "Tonbridge and Malling"       
## [309] "Torbay"                       "Torfaen"                     
## [311] "Torridge"                     "Tower Hamlets"               
## [313] "Trafford"                     "Tunbridge Wells"             
## [315] "Uttlesford"                   "Vale of White Horse"         
## [317] "Wakefield"                    "Walsall"                     
## [319] "Waltham Forest"               "Wandsworth"                  
## [321] "Warrington"                   "Warwick"                     
## [323] "Watford"                      "Waveney"                     
## [325] "Waverley"                     "Wealden"                     
## [327] "Wellingborough"               "Welwyn Hatfield"             
## [329] "West Berkshire"               "West Devon"                  
## [331] "West Dorset"                  "West Lancashire"             
## [333] "West Lindsey"                 "West Oxfordshire"            
## [335] "West Somerset"                "Westminster"                 
## [337] "Weymouth and Portland"        "Wigan"                       
## [339] "Wiltshire"                    "Winchester"                  
## [341] "Windsor and Maidenhead"       "Wirral"                      
## [343] "Woking"                       "Wokingham"                   
## [345] "Wolverhampton"                "Worcester"                   
## [347] "Worthing"                     "Wrexham"                     
## [349] "Wychavon"                     "Wycombe"                     
## [351] "Wyre"                         "Wyre Forest"                 
## [353] "York"

Let’s start by creating a data set for Exeter for use with MOGP.

N <- dim(lads_by_day_cumulative_79)[1]
Noise <- rnorm(N, mean=0, sd=0.45)
tData <- cbind(lads_by_day_cumulative_79[,1:3], Noise, 
               lads_by_day_cumulative_79[,which(names(lads_by_day_cumulative_79)=="Exeter")])
head(tData)
##   incubation_time infectious_time    r_zero       Noise Exeter
## 1       0.7373969      -0.7824042 0.1800362 -0.06143804   9980
## 2       0.7373969      -0.7824042 0.1800362  0.32344981   3428
## 3       0.7373969      -0.7824042 0.1800362  0.31633277   8528
## 4       0.7373969      -0.7824042 0.1800362  0.61149608   4564
## 5       0.7373969      -0.7824042 0.1800362 -0.16575026  10976
## 6      -0.5435754       0.8556900 0.6630966  0.50185219   6991

The Noise variable is used to prevent over fitting of mean functions.

A first Emulator

We know that the first 125 row of this design contain 5 repeats of 25 unique design points. Suppose we naively fit a GP to all of the data with our default GP priors and look at the leave one outs:

MetaEmulators1 <- BuildNewEmulators(tData, HowManyEmulators = 1, meanFun = "fitted", kernel="Matern52")
## [1] "Max reduction is 3504.99195080868 using r_zero"
## [1] "Max reduction is 4510.25786293291 using incubation_time"
## [1] "Max reduction is 5087.81728364426 using infectious_time"
## [1] "Max reduction is 1239.23933256318 using r_zero"
## [1] "Max reduction is 1868.49222712138 using Three Way Interactions with r_zero"
## [1] "Max reduction is 60.330911588544 using incubation_time"
## [1] "Max reduction is 6.96362286270596 using Three Way Interactions with incubation_time"
## [1] "Max reduction is 7.77763270965261 using r_zero"
## [1] "No permitted terms improve the fit"
## 
## Call:
## lm(formula = Exeter ~ r_zero + I(r_zero^2) + I(r_zero^3) + incubation_time + 
##     I(incubation_time^2) + infectious_time + I(incubation_time * 
##     r_zero) + I(infectious_time * r_zero) + I(infectious_time * 
##     incubation_time) + I(r_zero * incubation_time * r_zero) + 
##     I(r_zero * infectious_time * r_zero) + I(r_zero * infectious_time * 
##     incubation_time) + I(incubation_time * incubation_time * 
##     r_zero) + I(incubation_time * infectious_time * incubation_time), 
##     data = tData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54312  -2245   -955   1413  45139 
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                                6322       1371
## r_zero                                                    15289       3262
## I(r_zero^2)                                               15623       2336
## I(r_zero^3)                                                5208       4547
## incubation_time                                           -7823       1652
## I(incubation_time^2)                                       4087       2414
## infectious_time                                           -6548       2106
## I(incubation_time * r_zero)                              -22178       2248
## I(infectious_time * r_zero)                              -22263       2387
## I(infectious_time * incubation_time)                       9397       2222
## I(r_zero * incubation_time * r_zero)                     -19904       4158
## I(r_zero * infectious_time * r_zero)                     -20211       4316
## I(r_zero * infectious_time * incubation_time)             16200       4425
## I(incubation_time * incubation_time * r_zero)              6243       4658
## I(incubation_time * infectious_time * incubation_time)    -2116       4543
##                                                        t value Pr(>|t|)
## (Intercept)                                              4.611 7.45e-06
## r_zero                                                   4.687 5.37e-06
## I(r_zero^2)                                              6.688 2.59e-10
## I(r_zero^3)                                              1.145 0.253620
## incubation_time                                         -4.734 4.37e-06
## I(incubation_time^2)                                     1.693 0.092096
## infectious_time                                         -3.109 0.002174
## I(incubation_time * r_zero)                             -9.864  < 2e-16
## I(infectious_time * r_zero)                             -9.325  < 2e-16
## I(infectious_time * incubation_time)                     4.229 3.69e-05
## I(r_zero * incubation_time * r_zero)                    -4.787 3.46e-06
## I(r_zero * infectious_time * r_zero)                    -4.683 5.46e-06
## I(r_zero * infectious_time * incubation_time)            3.661 0.000328
## I(incubation_time * incubation_time * r_zero)            1.340 0.181798
## I(incubation_time * infectious_time * incubation_time)  -0.466 0.641915
##                                                           
## (Intercept)                                            ***
## r_zero                                                 ***
## I(r_zero^2)                                            ***
## I(r_zero^3)                                               
## incubation_time                                        ***
## I(incubation_time^2)                                   .  
## infectious_time                                        ** 
## I(incubation_time * r_zero)                            ***
## I(infectious_time * r_zero)                            ***
## I(infectious_time * incubation_time)                   ***
## I(r_zero * incubation_time * r_zero)                   ***
## I(r_zero * infectious_time * r_zero)                   ***
## I(r_zero * infectious_time * incubation_time)          ***
## I(incubation_time * incubation_time * r_zero)             
## I(incubation_time * infectious_time * incubation_time)    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9295 on 185 degrees of freedom
## Multiple R-squared:  0.8773, Adjusted R-squared:  0.868 
## F-statistic: 94.45 on 14 and 185 DF,  p-value: < 2.2e-16
tLOOs <- LOO.plot(Emulators = MetaEmulators1, which.emulator = 1,
                  ParamNames = names(tData)[1:3])

A few things to notice here. First, and most obvious is that we are missing predictions for large numbers of infections and we are too confident for low r_zero. This is a symptom of the major issue that the pure noise variance grows with r_zero (I will show this later) and our GP fits a constant nugget variance. The next is that the least squares estimates for the fitted regression are way beyond are subjective prior ranges (and I don’t believe them).

theta <- MetaEmulators1$mogp$emulators[[1]]$theta
ncls <- 3
np <- length(theta)-ncls-2
print("MAP Regression coefficients: ")
## [1] "MAP Regression coefficients: "
signif(theta[1:np], digits=3)
##  [1]  2.36000  0.02590  0.03130  0.03150 -0.01530  0.01270 -0.01990
##  [8] -0.02010 -0.02300  0.00954 -0.02510 -0.02420  0.01340  0.01910
## [15] -0.01240
print("LM coefficients:")
## [1] "LM coefficients:"
summary(MetaEmulators1$fitting.elements$lm.object[[1]]$linModel)$coefficients[,1]
##                                            (Intercept) 
##                                               6322.428 
##                                                 r_zero 
##                                              15289.104 
##                                            I(r_zero^2) 
##                                              15623.332 
##                                            I(r_zero^3) 
##                                               5207.570 
##                                        incubation_time 
##                                              -7822.812 
##                                   I(incubation_time^2) 
##                                               4087.324 
##                                        infectious_time 
##                                              -6547.520 
##                            I(incubation_time * r_zero) 
##                                             -22178.178 
##                            I(infectious_time * r_zero) 
##                                             -22262.976 
##                   I(infectious_time * incubation_time) 
##                                               9397.375 
##                   I(r_zero * incubation_time * r_zero) 
##                                             -19904.291 
##                   I(r_zero * infectious_time * r_zero) 
##                                             -20211.191 
##          I(r_zero * infectious_time * incubation_time) 
##                                              16199.700 
##          I(incubation_time * incubation_time * r_zero) 
##                                               6243.033 
## I(incubation_time * infectious_time * incubation_time) 
##                                              -2115.878

Showing that the MAP estimate is nowhere near the least squares fit so that the emulator is really driven by the correlation lengths:

print(paste("Half length correlations:", paste(signif(exp(-1/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = "")))
## [1] "Half length correlations: 0.4320.3890.349"
print(paste("Half length correlations:", paste(signif(exp(-0.25/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = "")))
## [1] "Half length correlations: 0.8110.790.769"
print(paste("Sigma^2 = ", signif(exp(theta[length(theta)-1]),digits=3)))
## [1] "Sigma^2 =  5.92e+08"
print(paste("Nugget = ", signif(exp(theta[length(theta)]),digits=3)))
## [1] "Nugget =  77500000"

These last 2 are perhaps better as standard deviations:

print(paste("Sigma = ", signif(sqrt(exp(theta[length(theta)-1])),digits=3)))
## [1] "Sigma =  24300"
print(paste("Nugget SD= ", signif(sqrt(exp(theta[length(theta)])),digits=3)))
## [1] "Nugget SD=  8810"

The Nugget problem

So what are the standard deviations of the repeat runs (which we would want to use to estimate the nugget)?

vecs <- 1:125
matvec <- matrix(vecs,nrow=5)
tsds <- sapply(1:25, function(k) sd(tData[matvec[,k],5]))
tsds
##  [1]  3335.07589  4633.91932 12603.77750    82.38507  1604.25911
##  [6]  5256.49365  2120.97082   363.43541   336.68427 35976.90673
## [11]  2566.63599   977.55010   581.58989 11850.37918  2298.83825
## [16]  1163.19852   939.54414  1129.84167 16513.44857  4972.26657
## [21]  1037.83703   946.64608  8387.66942 16681.39718   127.91091

We can plot these with the data

par(mfrow=c(1,3))
for(i in 1:3)
  plot(tData[matvec[1,],i],tsds,pch=16,xlab=names(tData)[i],ylab="SD infections")

Clearly a static nugget is not a good model and somehow an “average nugget” has been applied across parameter space, too large in places and not large enough in others.

There are things we should do to improve this initial emulation before exploring alternative Gaussian process models. First, we can relax the prior on the regression and on the variance terms. This type of data does not fit with our default prior specification (we should expect a larger nugget for one), and so it is worth seeing if the fit is an artifact of our subjective prior.

choicesOld <- choices.default
choices.default$NonInformativeRegression <- TRUE
choices.default$NonInformativeSigma <- TRUE
choices.default$NonInformativeNugget <- TRUE
MetaEmulators1 <- BuildNewEmulators(tData, HowManyEmulators = 1, meanFun = "fitted", kernel="Matern52")
## [1] "Max reduction is 3504.99195080868 using r_zero"
## [1] "Max reduction is 4510.25786293291 using incubation_time"
## [1] "Max reduction is 5087.81728364426 using infectious_time"
## [1] "Max reduction is 1239.23933256318 using r_zero"
## [1] "Max reduction is 1868.49222712138 using Three Way Interactions with r_zero"
## [1] "Max reduction is 60.330911588544 using incubation_time"
## [1] "Max reduction is 6.96362286270596 using Three Way Interactions with incubation_time"
## [1] "Max reduction is 7.77763270965261 using r_zero"
## [1] "No permitted terms improve the fit"
## 
## Call:
## lm(formula = Exeter ~ r_zero + I(r_zero^2) + I(r_zero^3) + incubation_time + 
##     I(incubation_time^2) + infectious_time + I(incubation_time * 
##     r_zero) + I(infectious_time * r_zero) + I(infectious_time * 
##     incubation_time) + I(r_zero * incubation_time * r_zero) + 
##     I(r_zero * infectious_time * r_zero) + I(r_zero * infectious_time * 
##     incubation_time) + I(incubation_time * incubation_time * 
##     r_zero) + I(incubation_time * infectious_time * incubation_time), 
##     data = tData)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54312  -2245   -955   1413  45139 
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                                6322       1371
## r_zero                                                    15289       3262
## I(r_zero^2)                                               15623       2336
## I(r_zero^3)                                                5208       4547
## incubation_time                                           -7823       1652
## I(incubation_time^2)                                       4087       2414
## infectious_time                                           -6548       2106
## I(incubation_time * r_zero)                              -22178       2248
## I(infectious_time * r_zero)                              -22263       2387
## I(infectious_time * incubation_time)                       9397       2222
## I(r_zero * incubation_time * r_zero)                     -19904       4158
## I(r_zero * infectious_time * r_zero)                     -20211       4316
## I(r_zero * infectious_time * incubation_time)             16200       4425
## I(incubation_time * incubation_time * r_zero)              6243       4658
## I(incubation_time * infectious_time * incubation_time)    -2116       4543
##                                                        t value Pr(>|t|)
## (Intercept)                                              4.611 7.45e-06
## r_zero                                                   4.687 5.37e-06
## I(r_zero^2)                                              6.688 2.59e-10
## I(r_zero^3)                                              1.145 0.253620
## incubation_time                                         -4.734 4.37e-06
## I(incubation_time^2)                                     1.693 0.092096
## infectious_time                                         -3.109 0.002174
## I(incubation_time * r_zero)                             -9.864  < 2e-16
## I(infectious_time * r_zero)                             -9.325  < 2e-16
## I(infectious_time * incubation_time)                     4.229 3.69e-05
## I(r_zero * incubation_time * r_zero)                    -4.787 3.46e-06
## I(r_zero * infectious_time * r_zero)                    -4.683 5.46e-06
## I(r_zero * infectious_time * incubation_time)            3.661 0.000328
## I(incubation_time * incubation_time * r_zero)            1.340 0.181798
## I(incubation_time * infectious_time * incubation_time)  -0.466 0.641915
##                                                           
## (Intercept)                                            ***
## r_zero                                                 ***
## I(r_zero^2)                                            ***
## I(r_zero^3)                                               
## incubation_time                                        ***
## I(incubation_time^2)                                   .  
## infectious_time                                        ** 
## I(incubation_time * r_zero)                            ***
## I(infectious_time * r_zero)                            ***
## I(infectious_time * incubation_time)                   ***
## I(r_zero * incubation_time * r_zero)                   ***
## I(r_zero * infectious_time * r_zero)                   ***
## I(r_zero * infectious_time * incubation_time)          ***
## I(incubation_time * incubation_time * r_zero)             
## I(incubation_time * infectious_time * incubation_time)    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9295 on 185 degrees of freedom
## Multiple R-squared:  0.8773, Adjusted R-squared:  0.868 
## F-statistic: 94.45 on 14 and 185 DF,  p-value: < 2.2e-16
tLOOs <- LOO.plot(Emulators = MetaEmulators1, which.emulator = 1,
                  ParamNames = names(tData)[1:3])

theta <- MetaEmulators1$mogp$emulators[[1]]$theta
ncls <- 3
np <- length(theta)-ncls-2
print("MAP Regression coefficients: ")
## [1] "MAP Regression coefficients: "
signif(theta[1:np], digits=3)
##  [1]  23800  13300  11800   9450 -10100  10200  -7930  -6380  -4540   3740
## [11]  -5920  -4130   2820   6160  -3900
print("LM coefficients:")
## [1] "LM coefficients:"
summary(MetaEmulators1$fitting.elements$lm.object[[1]]$linModel)$coefficients[,1]
##                                            (Intercept) 
##                                               6322.428 
##                                                 r_zero 
##                                              15289.104 
##                                            I(r_zero^2) 
##                                              15623.332 
##                                            I(r_zero^3) 
##                                               5207.570 
##                                        incubation_time 
##                                              -7822.812 
##                                   I(incubation_time^2) 
##                                               4087.324 
##                                        infectious_time 
##                                              -6547.520 
##                            I(incubation_time * r_zero) 
##                                             -22178.178 
##                            I(infectious_time * r_zero) 
##                                             -22262.976 
##                   I(infectious_time * incubation_time) 
##                                               9397.375 
##                   I(r_zero * incubation_time * r_zero) 
##                                             -19904.291 
##                   I(r_zero * infectious_time * r_zero) 
##                                             -20211.191 
##          I(r_zero * infectious_time * incubation_time) 
##                                              16199.700 
##          I(incubation_time * incubation_time * r_zero) 
##                                               6243.033 
## I(incubation_time * infectious_time * incubation_time) 
##                                              -2115.878
print(paste("Half length correlations:", paste(signif(exp(-1/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = "")))
## [1] "Half length correlations: 0.4120.3780.366"
print(paste("Half length correlations:", paste(signif(exp(-0.25/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = "")))
## [1] "Half length correlations: 0.8010.7840.778"
print(paste("Sigma = ", signif(sqrt(exp(theta[length(theta)-1])),digits=3)))
## [1] "Sigma =  16200"
print(paste("Nugget SD= ", signif(sqrt(exp(theta[length(theta)])),digits=3)))
## [1] "Nugget SD=  9160"

This has marginally inflated the nugget (though it’s still an average across parameter space) and confirms that whilst we see a regression looking a little like a taylor expansion of an exponential in r_zero (like we might expect), this is better modelled using the correlation function of the GP.

A stochastic emulator

The problem above is well known and there are a number of alternative emulators around to try. A nice review is given by (Baker et al. 2020). I’ll explore using the hetGP package to build emulators here.

require(hetGP)
## Loading required package: hetGP
tX <- as.matrix(lads_by_day_cumulative_79[,1:3])
tY <- lads_by_day_cumulative_79$Exeter
mod.het <- mleHetGP(X = as.matrix(tX), Z = tY, lower = 0.0001, upper = 10)

This fits quite rapidly, which is reassuring as we will need to build a lot of emulators for Metawards. The inbuilt leave one out function is unconvincing:

plot(mod.het)

It seems this plot is likely drawn incorrectly in the code. To give a better idea of the fit, let’s plot some predictions and error bars and overlay the data.

xnews <- 2*randomLHS(200,3) - 1
p.het <- predict(mod.het, xnews) #make predictions
pvar.het <- p.het$sd2 + p.het$nugs
par(mfrow=c(1,3))
errorbar(x=xnews[,1],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "incubation_time", ylab = "Infections",   
         main ="Stochastic GP Exeter")
points(tX[,1] , tY,col=4)
errorbar(x=xnews[,2],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "infectious_time", ylab = "Infections",   
         main ="Stochastic GP Exeter")
points(tX[,2] , tY,col=4)
errorbar(x=xnews[,3],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "r_zero", ylab = "Infections",   
         main ="Stochastic GP Exeter")
points(tX[,3] , tY,col=4)

Here the predictions and error bars show the emulator structure. The model runs are overlaid but the predictions are not at the model run locations (this is the idea of the leave one outs, but there seems to be some error there).

What we see is that this type of emulator is getting the variance structure right, but it is not as capable of reaching high values as our MOGP emulator was.

Scaling up?

How would hetGP do at multi-output emulation? If we can understand how to improve it’s predictions in some way, will it scale to all of the wards? As a quick test, I will emulate the South West.

tNames <- c("Bath and North East Somerset", "Bristol, City of", "Cornwall", "East Devon", "East Dorset", "Exeter", "Isles of Scilly", "Mid Devon", "North Devon", "Plymouth", "South Hams", "South Somerset", "Torbay", "West Devon", "West Dorset", "West Somerset")
SouthWest <- lads_by_day_cumulative_79[,which(names(lads_by_day_cumulative_79)%in%tNames)]
nEms <- length(tNames)

Timing fitting the emulators

system.time(EM.list <- lapply(1:nEms, function(i) mleHetGP(X = as.matrix(tX), Z = unlist(SouthWest[,i]), lower = 0.0001, upper = 10)))
##    user  system elapsed 
##   4.872   1.032   5.927

The speed looks promising. Each emulator is shown below

for(k in 1:nEms){
  par(mfrow=c(1,3))
  p.het <- predict(EM.list[[k]], xnews) #make predictions
pvar.het <- p.het$sd2 + p.het$nugs
par(mfrow=c(1,3))
errorbar(x=xnews[,1],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "incubation_time", ylab = "Infections",   
         main =tNames[k])
points(tX[,1] , unlist(SouthWest[,k]),col=4)
errorbar(x=xnews[,2],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "infectious_time", ylab = "Infections",   
         main =tNames[k])
points(tX[,2] , unlist(SouthWest[,k]),col=4)
errorbar(x=xnews[,3],y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black",lwd=0.5,xlab = "r_zero", ylab = "Infections",   
         main =tNames[k])
points(tX[,3] , unlist(SouthWest[,k]),col=4)
}

Innovation required

The hetGP fits are not great, though they capture the variance structure. There will be an issue with the low number of runs we have adding to this, but we may need to explore the choices in the covariance function.

There is a substantial issue with negative counts predicted. We might consider a negative binomial model with the GP being latent to capture this. We might also truncate, but investigation will be needed to see what the impact of that modelling is.

References

Baker, Evan, Pierre Barbillon, Arindam Fadikar, Robert Grammacy, Radu Herbei, David Higdon, Jiangeng Huang, et al. 2020. “Sochastic Simulators an Overview with Opportunities.” arXiv. https://arxiv.org/pdf/2002.01321.