Using the UQ4covid initial design described here, I will build some GPs and do some data analysis to illustrate the challenges of emulating MetaWards.
I will explore using mogp_emulator and the ExeterUQ_MOGP implementation (info is available here). To do that I need to create a pointer to the directory of my mogp_emulator
installation and then to source the BuildEmulator.R
code from the ExeterUQ_MOGP repository https://github.com/BayesExeter/ExeterUQ_MOGP. For this code block, your directory strings will necessarily be different from mine.
mogp_dir <- "~/Dropbox/BayesExeter/mogp_emulator"
setwd("~/Dropbox/BayesExeter/ExeterUQ_MOGP")
source("~/Dropbox/BayesExeter/ExeterUQ_MOGP/BuildEmulator/BuildEmulator.R")
## far library : Modelization for Functional AutoRegressive processes
## version 0.6-4 (2014-12-07)
## Spam version 2.3-0 (2019-09-13) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## See https://github.com/NCAR/Fields for
## an extensive vignette, other supplements and source code
## This is loo version 2.1.0.
## **NOTE: As of version 2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session. Visit mc-stan.org/loo/news for details on other changes.
Now we get some data from the uq4covid
repository. I will look at the pre-lockdown infections at local authority level.
library(readr)
lads_by_day_cumulative_79 <- read_csv("../../data/metawards/initial_ensemble/data/pre_lockdown_1/lads_by_day_cumulative_79.csv")
## Parsed with column specification:
## cols(
## .default = col_double()
## )
## See spec(...) for full column specifications.
names(lads_by_day_cumulative_79)
## [1] "incubation_time" "infectious_time"
## [3] "r_zero" "lock_1_restrict"
## [5] "lock_2_release" "Adur"
## [7] "Allerdale" "Amber Valley"
## [9] "Arun" "Ashfield"
## [11] "Ashford" "Aylesbury Vale"
## [13] "Babergh" "Barking and Dagenham"
## [15] "Barnet" "Barnsley"
## [17] "Barrow-in-Furness" "Basildon"
## [19] "Basingstoke and Deane" "Bassetlaw"
## [21] "Bath and North East Somerset" "Bedford"
## [23] "Bexley" "Birmingham"
## [25] "Blaby" "Blackburn with Darwen"
## [27] "Blackpool" "Blaenau Gwent"
## [29] "Bolsover" "Bolton"
## [31] "Boston" "Bournemouth"
## [33] "Bracknell Forest" "Bradford"
## [35] "Braintree" "Breckland"
## [37] "Brent" "Brentwood"
## [39] "Bridgend" "Brighton and Hove"
## [41] "Bristol, City of" "Broadland"
## [43] "Bromley" "Bromsgrove"
## [45] "Broxbourne" "Broxtowe"
## [47] "Burnley" "Bury"
## [49] "Caerphilly" "Calderdale"
## [51] "Cambridge" "Camden"
## [53] "Cannock Chase" "Canterbury"
## [55] "Cardiff" "Carlisle"
## [57] "Carmarthenshire" "Castle Point"
## [59] "Central Bedfordshire" "Ceredigion"
## [61] "Charnwood" "Chelmsford"
## [63] "Cheltenham" "Cherwell"
## [65] "Cheshire East" "Cheshire West and Chester"
## [67] "Chesterfield" "Chichester"
## [69] "Chiltern" "Chorley"
## [71] "Christchurch" "City of London"
## [73] "Colchester" "Conwy"
## [75] "Copeland" "Corby"
## [77] "Cornwall" "Cotswold"
## [79] "County Durham" "Coventry"
## [81] "Craven" "Crawley"
## [83] "Croydon" "Dacorum"
## [85] "Darlington" "Dartford"
## [87] "Daventry" "Denbighshire"
## [89] "Derby" "Derbyshire Dales"
## [91] "Doncaster" "Dover"
## [93] "Dudley" "Ealing"
## [95] "East Cambridgeshire" "East Devon"
## [97] "East Dorset" "East Hampshire"
## [99] "East Hertfordshire" "East Lindsey"
## [101] "East Northamptonshire" "East Riding of Yorkshire"
## [103] "East Staffordshire" "Eastbourne"
## [105] "Eastleigh" "Eden"
## [107] "Elmbridge" "Enfield"
## [109] "Epping Forest" "Epsom and Ewell"
## [111] "Erewash" "Exeter"
## [113] "Fareham" "Fenland"
## [115] "Flintshire" "Forest Heath"
## [117] "Forest of Dean" "Fylde"
## [119] "Gateshead" "Gedling"
## [121] "Gloucester" "Gosport"
## [123] "Gravesham" "Great Yarmouth"
## [125] "Greenwich" "Guildford"
## [127] "Gwynedd" "Hackney"
## [129] "Halton" "Hambleton"
## [131] "Hammersmith and Fulham" "Harborough"
## [133] "Haringey" "Harlow"
## [135] "Harrogate" "Harrow"
## [137] "Hart" "Hartlepool"
## [139] "Hastings" "Havant"
## [141] "Havering" "Herefordshire, County of"
## [143] "Hertsmere" "High Peak"
## [145] "Hillingdon" "Hinckley and Bosworth"
## [147] "Horsham" "Hounslow"
## [149] "Huntingdonshire" "Hyndburn"
## [151] "Ipswich" "Isle of Anglesey"
## [153] "Isle of Wight" "Isles of Scilly"
## [155] "Islington" "Kensington and Chelsea"
## [157] "Kettering" "King's Lynn and West Norfolk"
## [159] "Kingston upon Hull, City of" "Kingston upon Thames"
## [161] "Kirklees" "Knowsley"
## [163] "Lambeth" "Lancaster"
## [165] "Leeds" "Leicester"
## [167] "Lewes" "Lewisham"
## [169] "Lichfield" "Lincoln"
## [171] "Liverpool" "Luton"
## [173] "Maidstone" "Maldon"
## [175] "Malvern Hills" "Manchester"
## [177] "Mansfield" "Medway"
## [179] "Melton" "Mendip"
## [181] "Merthyr Tydfil" "Merton"
## [183] "Mid Devon" "Mid Suffolk"
## [185] "Mid Sussex" "Middlesbrough"
## [187] "Milton Keynes" "Mole Valley"
## [189] "Monmouthshire" "Neath Port Talbot"
## [191] "New Forest" "Newark and Sherwood"
## [193] "Newcastle upon Tyne" "Newcastle-under-Lyme"
## [195] "Newham" "Newport"
## [197] "North Devon" "North Dorset"
## [199] "North East Derbyshire" "North East Lincolnshire"
## [201] "North Hertfordshire" "North Kesteven"
## [203] "North Lincolnshire" "North Norfolk"
## [205] "North Somerset" "North Tyneside"
## [207] "North Warwickshire" "North West Leicestershire"
## [209] "Northampton" "Northumberland"
## [211] "Norwich" "Nottingham"
## [213] "Nuneaton and Bedworth" "Oadby and Wigston"
## [215] "Oldham" "Oxford"
## [217] "Pembrokeshire" "Pendle"
## [219] "Peterborough" "Plymouth"
## [221] "Poole" "Portsmouth"
## [223] "Powys" "Preston"
## [225] "Purbeck" "Reading"
## [227] "Redbridge" "Redcar and Cleveland"
## [229] "Redditch" "Reigate and Banstead"
## [231] "Rhondda Cynon Taf" "Ribble Valley"
## [233] "Richmond upon Thames" "Richmondshire"
## [235] "Rochdale" "Rochford"
## [237] "Rossendale" "Rother"
## [239] "Rotherham" "Rugby"
## [241] "Runnymede" "Rushcliffe"
## [243] "Rushmoor" "Rutland"
## [245] "Ryedale" "Salford"
## [247] "Sandwell" "Scarborough"
## [249] "Sedgemoor" "Sefton"
## [251] "Selby" "Sevenoaks"
## [253] "Sheffield" "Shepway"
## [255] "Shropshire" "Slough"
## [257] "Solihull" "South Bucks"
## [259] "South Cambridgeshire" "South Derbyshire"
## [261] "South Gloucestershire" "South Hams"
## [263] "South Holland" "South Kesteven"
## [265] "South Lakeland" "South Norfolk"
## [267] "South Northamptonshire" "South Oxfordshire"
## [269] "South Ribble" "South Somerset"
## [271] "South Staffordshire" "South Tyneside"
## [273] "Southampton" "Southend-on-Sea"
## [275] "Southwark" "Spelthorne"
## [277] "St Albans" "St Edmundsbury"
## [279] "St. Helens" "Stafford"
## [281] "Staffordshire Moorlands" "Stevenage"
## [283] "Stockport" "Stockton-on-Tees"
## [285] "Stoke-on-Trent" "Stratford-on-Avon"
## [287] "Stroud" "Suffolk Coastal"
## [289] "Sunderland" "Surrey Heath"
## [291] "Sutton" "Swale"
## [293] "Swansea" "Swindon"
## [295] "Tameside" "Tamworth"
## [297] "Tandridge" "Taunton Deane"
## [299] "Teignbridge" "Telford and Wrekin"
## [301] "Tendring" "Test Valley"
## [303] "Tewkesbury" "Thanet"
## [305] "The Vale of Glamorgan" "Three Rivers"
## [307] "Thurrock" "Tonbridge and Malling"
## [309] "Torbay" "Torfaen"
## [311] "Torridge" "Tower Hamlets"
## [313] "Trafford" "Tunbridge Wells"
## [315] "Uttlesford" "Vale of White Horse"
## [317] "Wakefield" "Walsall"
## [319] "Waltham Forest" "Wandsworth"
## [321] "Warrington" "Warwick"
## [323] "Watford" "Waveney"
## [325] "Waverley" "Wealden"
## [327] "Wellingborough" "Welwyn Hatfield"
## [329] "West Berkshire" "West Devon"
## [331] "West Dorset" "West Lancashire"
## [333] "West Lindsey" "West Oxfordshire"
## [335] "West Somerset" "Westminster"
## [337] "Weymouth and Portland" "Wigan"
## [339] "Wiltshire" "Winchester"
## [341] "Windsor and Maidenhead" "Wirral"
## [343] "Woking" "Wokingham"
## [345] "Wolverhampton" "Worcester"
## [347] "Worthing" "Wrexham"
## [349] "Wychavon" "Wycombe"
## [351] "Wyre" "Wyre Forest"
## [353] "York"
Let’s start by creating a data set for Exeter for use with MOGP.
N <- dim(lads_by_day_cumulative_79)[1]
Noise <- rnorm(N, mean=0, sd=0.45)
tData <- cbind(lads_by_day_cumulative_79[,1:3], Noise,
lads_by_day_cumulative_79[,which(names(lads_by_day_cumulative_79)=="Exeter")])
head(tData)
## incubation_time infectious_time r_zero Noise Exeter
## 1 0.7373969 -0.7824042 0.1800362 -0.06143804 9980
## 2 0.7373969 -0.7824042 0.1800362 0.32344981 3428
## 3 0.7373969 -0.7824042 0.1800362 0.31633277 8528
## 4 0.7373969 -0.7824042 0.1800362 0.61149608 4564
## 5 0.7373969 -0.7824042 0.1800362 -0.16575026 10976
## 6 -0.5435754 0.8556900 0.6630966 0.50185219 6991
The Noise variable is used to prevent overfitting of the mean functions.
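The idea behind the Noise column can be sketched with synthetic data (everything below is invented for illustration; it is not the ExeterUQ_MOGP selection code):

```r
# Sketch with synthetic data (not the ExeterUQ_MOGP selection code): a pure
# noise column acts as a yardstick in mean-function selection. A candidate
# term that explains no more than an independent N(0, 0.45^2) draw is
# indistinguishable from fitting noise, so selection should stop before it.
set.seed(1)
x <- runif(100, -1, 1)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)
Noise <- rnorm(100, mean = 0, sd = 0.45)
fit <- lm(y ~ x + Noise)
summary(fit)$coefficients  # the Noise row should be near zero and insignificant
```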
We know that the first 125 rows of this design contain 5 repeats of 25 unique design points. Suppose we naively fit a GP to all of the data with our default GP priors and look at the leave-one-outs:
MetaEmulators1 <- BuildNewEmulators(tData, HowManyEmulators = 1, meanFun = "fitted", kernel="Matern52")
## [1] "Max reduction is 3504.99195080868 using r_zero"
## [1] "Max reduction is 4510.25786293291 using incubation_time"
## [1] "Max reduction is 5087.81728364426 using infectious_time"
## [1] "Max reduction is 1239.23933256318 using r_zero"
## [1] "Max reduction is 1868.49222712138 using Three Way Interactions with r_zero"
## [1] "Max reduction is 60.330911588544 using incubation_time"
## [1] "Max reduction is 6.96362286270596 using Three Way Interactions with incubation_time"
## [1] "Max reduction is 7.77763270965261 using r_zero"
## [1] "No permitted terms improve the fit"
##
## Call:
## lm(formula = Exeter ~ r_zero + I(r_zero^2) + I(r_zero^3) + incubation_time +
## I(incubation_time^2) + infectious_time + I(incubation_time *
## r_zero) + I(infectious_time * r_zero) + I(infectious_time *
## incubation_time) + I(r_zero * incubation_time * r_zero) +
## I(r_zero * infectious_time * r_zero) + I(r_zero * infectious_time *
## incubation_time) + I(incubation_time * incubation_time *
## r_zero) + I(incubation_time * infectious_time * incubation_time),
## data = tData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54312 -2245 -955 1413 45139
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 6322 1371
## r_zero 15289 3262
## I(r_zero^2) 15623 2336
## I(r_zero^3) 5208 4547
## incubation_time -7823 1652
## I(incubation_time^2) 4087 2414
## infectious_time -6548 2106
## I(incubation_time * r_zero) -22178 2248
## I(infectious_time * r_zero) -22263 2387
## I(infectious_time * incubation_time) 9397 2222
## I(r_zero * incubation_time * r_zero) -19904 4158
## I(r_zero * infectious_time * r_zero) -20211 4316
## I(r_zero * infectious_time * incubation_time) 16200 4425
## I(incubation_time * incubation_time * r_zero) 6243 4658
## I(incubation_time * infectious_time * incubation_time) -2116 4543
## t value Pr(>|t|)
## (Intercept) 4.611 7.45e-06
## r_zero 4.687 5.37e-06
## I(r_zero^2) 6.688 2.59e-10
## I(r_zero^3) 1.145 0.253620
## incubation_time -4.734 4.37e-06
## I(incubation_time^2) 1.693 0.092096
## infectious_time -3.109 0.002174
## I(incubation_time * r_zero) -9.864 < 2e-16
## I(infectious_time * r_zero) -9.325 < 2e-16
## I(infectious_time * incubation_time) 4.229 3.69e-05
## I(r_zero * incubation_time * r_zero) -4.787 3.46e-06
## I(r_zero * infectious_time * r_zero) -4.683 5.46e-06
## I(r_zero * infectious_time * incubation_time) 3.661 0.000328
## I(incubation_time * incubation_time * r_zero) 1.340 0.181798
## I(incubation_time * infectious_time * incubation_time) -0.466 0.641915
##
## (Intercept) ***
## r_zero ***
## I(r_zero^2) ***
## I(r_zero^3)
## incubation_time ***
## I(incubation_time^2) .
## infectious_time **
## I(incubation_time * r_zero) ***
## I(infectious_time * r_zero) ***
## I(infectious_time * incubation_time) ***
## I(r_zero * incubation_time * r_zero) ***
## I(r_zero * infectious_time * r_zero) ***
## I(r_zero * infectious_time * incubation_time) ***
## I(incubation_time * incubation_time * r_zero)
## I(incubation_time * infectious_time * incubation_time)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9295 on 185 degrees of freedom
## Multiple R-squared: 0.8773, Adjusted R-squared: 0.868
## F-statistic: 94.45 on 14 and 185 DF, p-value: < 2.2e-16
tLOOs <- LOO.plot(Emulators = MetaEmulators1, which.emulator = 1,
ParamNames = names(tData)[1:3])
A few things to notice here. First, and most obviously, we are missing predictions for large numbers of infections and we are too confident for low r_zero. This is a symptom of the major issue: the pure noise variance grows with r_zero (I will show this later), yet our GP fits a constant nugget variance. The next is that the least squares estimates for the fitted regression are way beyond our subjective prior ranges (and I don’t believe them).
theta <- MetaEmulators1$mogp$emulators[[1]]$theta
ncls <- 3
np <- length(theta)-ncls-2
print("MAP Regression coefficients: ")
## [1] "MAP Regression coefficients: "
signif(theta[1:np], digits=3)
## [1] 2.36000 0.02590 0.03130 0.03150 -0.01530 0.01270 -0.01990
## [8] -0.02010 -0.02300 0.00954 -0.02510 -0.02420 0.01340 0.01910
## [15] -0.01240
print("LM coefficients:")
## [1] "LM coefficients:"
summary(MetaEmulators1$fitting.elements$lm.object[[1]]$linModel)$coefficients[,1]
## (Intercept)
## 6322.428
## r_zero
## 15289.104
## I(r_zero^2)
## 15623.332
## I(r_zero^3)
## 5207.570
## incubation_time
## -7822.812
## I(incubation_time^2)
## 4087.324
## infectious_time
## -6547.520
## I(incubation_time * r_zero)
## -22178.178
## I(infectious_time * r_zero)
## -22262.976
## I(infectious_time * incubation_time)
## 9397.375
## I(r_zero * incubation_time * r_zero)
## -19904.291
## I(r_zero * infectious_time * r_zero)
## -20211.191
## I(r_zero * infectious_time * incubation_time)
## 16199.700
## I(incubation_time * incubation_time * r_zero)
## 6243.033
## I(incubation_time * infectious_time * incubation_time)
## -2115.878
This shows that the MAP estimate is nowhere near the least squares fit, so the emulator is really driven by the correlation lengths:
print(paste("Half length correlations:", paste(signif(exp(-1/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = " ")))
## [1] "Half length correlations: 0.432 0.389 0.349"
print(paste("Half length correlations:", paste(signif(exp(-0.25/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = " ")))
## [1] "Half length correlations: 0.811 0.79 0.769"
print(paste("Sigma^2 = ", signif(exp(theta[length(theta)-1]),digits=3)))
## [1] "Sigma^2 = 5.92e+08"
print(paste("Nugget = ", signif(exp(theta[length(theta)]),digits=3)))
## [1] "Nugget = 77500000"
These last two are perhaps better viewed as standard deviations:
print(paste("Sigma = ", signif(sqrt(exp(theta[length(theta)-1])),digits=3)))
## [1] "Sigma = 24300"
print(paste("Nugget SD= ", signif(sqrt(exp(theta[length(theta)])),digits=3)))
## [1] "Nugget SD= 8810"
So what are the standard deviations of the repeat runs (which we would want to use to estimate the nugget)?
vecs <- 1:125
matvec <- matrix(vecs,nrow=5)
tsds <- sapply(1:25, function(k) sd(tData[matvec[,k],5]))
tsds
## [1] 3335.07589 4633.91932 12603.77750 82.38507 1604.25911
## [6] 5256.49365 2120.97082 363.43541 336.68427 35976.90673
## [11] 2566.63599 977.55010 581.58989 11850.37918 2298.83825
## [16] 1163.19852 939.54414 1129.84167 16513.44857 4972.26657
## [21] 1037.83703 946.64608 8387.66942 16681.39718 127.91091
We can plot these against the inputs:
par(mfrow=c(1,3))
for(i in 1:3)
plot(tData[matvec[1,],i],tsds,pch=16,xlab=names(tData)[i],ylab="SD infections")
Clearly a static nugget is not a good model: in effect an “average nugget” has been applied across parameter space, too large in places and not large enough in others.
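To put a number on that “average nugget”: the fitted nugget SD printed earlier (about 8810) can be set against the spread of the empirical repeat SDs (a quick sketch reusing tsds from above):

```r
# The single fitted nugget SD falls somewhere inside the empirical repeat SDs,
# which span from under 100 to over 35000 infections.
c(smallest = min(tsds), fitted.nugget.sd = 8810, largest = max(tsds))
```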
There are things we should do to improve this initial emulation before exploring alternative Gaussian process models. First, we can relax the prior on the regression and on the variance terms. This type of data does not fit with our default prior specification (we should expect a larger nugget for one), and so it is worth seeing if the fit is an artifact of our subjective prior.
choicesOld <- choices.default
choices.default$NonInformativeRegression <- TRUE
choices.default$NonInformativeSigma <- TRUE
choices.default$NonInformativeNugget <- TRUE
MetaEmulators1 <- BuildNewEmulators(tData, HowManyEmulators = 1, meanFun = "fitted", kernel="Matern52")
## [1] "Max reduction is 3504.99195080868 using r_zero"
## [1] "Max reduction is 4510.25786293291 using incubation_time"
## [1] "Max reduction is 5087.81728364426 using infectious_time"
## [1] "Max reduction is 1239.23933256318 using r_zero"
## [1] "Max reduction is 1868.49222712138 using Three Way Interactions with r_zero"
## [1] "Max reduction is 60.330911588544 using incubation_time"
## [1] "Max reduction is 6.96362286270596 using Three Way Interactions with incubation_time"
## [1] "Max reduction is 7.77763270965261 using r_zero"
## [1] "No permitted terms improve the fit"
##
## Call:
## lm(formula = Exeter ~ r_zero + I(r_zero^2) + I(r_zero^3) + incubation_time +
## I(incubation_time^2) + infectious_time + I(incubation_time *
## r_zero) + I(infectious_time * r_zero) + I(infectious_time *
## incubation_time) + I(r_zero * incubation_time * r_zero) +
## I(r_zero * infectious_time * r_zero) + I(r_zero * infectious_time *
## incubation_time) + I(incubation_time * incubation_time *
## r_zero) + I(incubation_time * infectious_time * incubation_time),
## data = tData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54312 -2245 -955 1413 45139
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 6322 1371
## r_zero 15289 3262
## I(r_zero^2) 15623 2336
## I(r_zero^3) 5208 4547
## incubation_time -7823 1652
## I(incubation_time^2) 4087 2414
## infectious_time -6548 2106
## I(incubation_time * r_zero) -22178 2248
## I(infectious_time * r_zero) -22263 2387
## I(infectious_time * incubation_time) 9397 2222
## I(r_zero * incubation_time * r_zero) -19904 4158
## I(r_zero * infectious_time * r_zero) -20211 4316
## I(r_zero * infectious_time * incubation_time) 16200 4425
## I(incubation_time * incubation_time * r_zero) 6243 4658
## I(incubation_time * infectious_time * incubation_time) -2116 4543
## t value Pr(>|t|)
## (Intercept) 4.611 7.45e-06
## r_zero 4.687 5.37e-06
## I(r_zero^2) 6.688 2.59e-10
## I(r_zero^3) 1.145 0.253620
## incubation_time -4.734 4.37e-06
## I(incubation_time^2) 1.693 0.092096
## infectious_time -3.109 0.002174
## I(incubation_time * r_zero) -9.864 < 2e-16
## I(infectious_time * r_zero) -9.325 < 2e-16
## I(infectious_time * incubation_time) 4.229 3.69e-05
## I(r_zero * incubation_time * r_zero) -4.787 3.46e-06
## I(r_zero * infectious_time * r_zero) -4.683 5.46e-06
## I(r_zero * infectious_time * incubation_time) 3.661 0.000328
## I(incubation_time * incubation_time * r_zero) 1.340 0.181798
## I(incubation_time * infectious_time * incubation_time) -0.466 0.641915
##
## (Intercept) ***
## r_zero ***
## I(r_zero^2) ***
## I(r_zero^3)
## incubation_time ***
## I(incubation_time^2) .
## infectious_time **
## I(incubation_time * r_zero) ***
## I(infectious_time * r_zero) ***
## I(infectious_time * incubation_time) ***
## I(r_zero * incubation_time * r_zero) ***
## I(r_zero * infectious_time * r_zero) ***
## I(r_zero * infectious_time * incubation_time) ***
## I(incubation_time * incubation_time * r_zero)
## I(incubation_time * infectious_time * incubation_time)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9295 on 185 degrees of freedom
## Multiple R-squared: 0.8773, Adjusted R-squared: 0.868
## F-statistic: 94.45 on 14 and 185 DF, p-value: < 2.2e-16
tLOOs <- LOO.plot(Emulators = MetaEmulators1, which.emulator = 1,
ParamNames = names(tData)[1:3])
theta <- MetaEmulators1$mogp$emulators[[1]]$theta
ncls <- 3
np <- length(theta)-ncls-2
print("MAP Regression coefficients: ")
## [1] "MAP Regression coefficients: "
signif(theta[1:np], digits=3)
## [1] 23800 13300 11800 9450 -10100 10200 -7930 -6380 -4540 3740
## [11] -5920 -4130 2820 6160 -3900
print("LM coefficients:")
## [1] "LM coefficients:"
summary(MetaEmulators1$fitting.elements$lm.object[[1]]$linModel)$coefficients[,1]
## (Intercept)
## 6322.428
## r_zero
## 15289.104
## I(r_zero^2)
## 15623.332
## I(r_zero^3)
## 5207.570
## incubation_time
## -7822.812
## I(incubation_time^2)
## 4087.324
## infectious_time
## -6547.520
## I(incubation_time * r_zero)
## -22178.178
## I(infectious_time * r_zero)
## -22262.976
## I(infectious_time * incubation_time)
## 9397.375
## I(r_zero * incubation_time * r_zero)
## -19904.291
## I(r_zero * infectious_time * r_zero)
## -20211.191
## I(r_zero * infectious_time * incubation_time)
## 16199.700
## I(incubation_time * incubation_time * r_zero)
## 6243.033
## I(incubation_time * infectious_time * incubation_time)
## -2115.878
print(paste("Half length correlations:", paste(signif(exp(-1/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = " ")))
## [1] "Half length correlations: 0.412 0.378 0.366"
print(paste("Half length correlations:", paste(signif(exp(-0.25/(exp(-theta[(np+1):(np+ncls)])^2)),digits=3),collapse = " ")))
## [1] "Half length correlations: 0.801 0.784 0.778"
print(paste("Sigma = ", signif(sqrt(exp(theta[length(theta)-1])),digits=3)))
## [1] "Sigma = 16200"
print(paste("Nugget SD= ", signif(sqrt(exp(theta[length(theta)])),digits=3)))
## [1] "Nugget SD= 9160"
This has marginally inflated the nugget (though it’s still an average across parameter space) and confirms that, whilst we see a regression looking a little like a Taylor expansion of an exponential in r_zero (as we might expect), this is better modelled using the correlation function of the GP.
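The Taylor-expansion remark can be made slightly more concrete. For an exponential c*exp(b*r), the polynomial coefficients are c*b^k/k!; with hypothetical values c = 6000 and b = 2 (invented purely for illustration) the first three terms have the same flavour as the fitted r_zero coefficients (15289, 15623, 5208): comparable linear and quadratic terms, then a drop at the cubic.

```r
# Hypothetical constants, chosen only to show the coefficient pattern of a
# Taylor expansion of c0 * exp(b * r) in r:
c0 <- 6000; b <- 2
c0 * b^(1:3) / factorial(1:3)  # 12000 12000 8000
```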
The problem above is well known and there are a number of alternative emulators around to try. A nice review is given by Baker et al. (2020). I’ll explore using the hetGP package to build emulators here.
require(hetGP)
## Loading required package: hetGP
tX <- as.matrix(lads_by_day_cumulative_79[,1:3])
tY <- lads_by_day_cumulative_79$Exeter
mod.het <- mleHetGP(X = as.matrix(tX), Z = tY, lower = 0.0001, upper = 10)
This fits quite rapidly, which is reassuring as we will need to build a lot of emulators for MetaWards. The inbuilt leave-one-out function is unconvincing:
plot(mod.het)
It seems this plot is likely drawn incorrectly in the code. To give a better idea of the fit, let’s plot some predictions and error bars and overlay the data.
xnews <- 2*randomLHS(200,3) - 1 # randomLHS() comes from the lhs package
p.het <- predict(mod.het, xnews) #make predictions
pvar.het <- p.het$sd2 + p.het$nugs
par(mfrow=c(1,3))
errorbar(x=xnews[,1], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
         xlab = "incubation_time", ylab = "Infections", main = "Stochastic GP Exeter")
points(tX[,1], tY, col=4)
errorbar(x=xnews[,2], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
         xlab = "infectious_time", ylab = "Infections", main = "Stochastic GP Exeter")
points(tX[,2], tY, col=4)
errorbar(x=xnews[,3], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
         xlab = "r_zero", ylab = "Infections", main = "Stochastic GP Exeter")
points(tX[,3], tY, col=4)
Here the predictions and error bars show the emulator structure. The model runs are overlaid, but the predictions are not at the model run locations (that was the idea of the leave-one-out plots, though there seems to be some error there).
What we see is that this type of emulator gets the variance structure right, but it is not as capable of reaching the high values as our MOGP emulator was.
How would hetGP do at multi-output emulation? If we can understand how to improve its predictions in some way, will it scale to all of the wards? As a quick test, I will emulate the South West.
tNames <- c("Bath and North East Somerset", "Bristol, City of", "Cornwall", "East Devon", "East Dorset", "Exeter", "Isles of Scilly", "Mid Devon", "North Devon", "Plymouth", "South Hams", "South Somerset", "Torbay", "West Devon", "West Dorset", "West Somerset")
SouthWest <- lads_by_day_cumulative_79[,which(names(lads_by_day_cumulative_79)%in%tNames)]
nEms <- length(tNames)
Timing the fitting of the emulators:
system.time(EM.list <- lapply(1:nEms, function(i) mleHetGP(X = as.matrix(tX), Z = unlist(SouthWest[,i]), lower = 0.0001, upper = 10)))
## user system elapsed
## 4.872 1.032 5.927
The speed looks promising. Each emulator is shown below.
for(k in 1:nEms){
  par(mfrow=c(1,3))
  p.het <- predict(EM.list[[k]], xnews) # make predictions
  pvar.het <- p.het$sd2 + p.het$nugs
  errorbar(x=xnews[,1], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
           xlab = "incubation_time", ylab = "Infections", main = tNames[k])
  points(tX[,1], unlist(SouthWest[,k]), col=4)
  errorbar(x=xnews[,2], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
           xlab = "infectious_time", ylab = "Infections", main = tNames[k])
  points(tX[,2], unlist(SouthWest[,k]), col=4)
  errorbar(x=xnews[,3], y=p.het$mean, yerr = 2*sqrt(pvar.het), pch=16, bar.col = "black", lwd=0.5,
           xlab = "r_zero", ylab = "Infections", main = tNames[k])
  points(tX[,3], unlist(SouthWest[,k]), col=4)
}
The hetGP fits are not great, though they do capture the variance structure. The low number of runs we have will be adding to this, but we may also need to explore the choices in the covariance function.
There is a substantial issue with negative counts being predicted. We might consider a negative binomial observation model with a latent GP to capture this. We might also truncate at zero, but investigation will be needed to see what the impact of that modelling choice is.
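A minimal sketch of the truncation option, reusing p.het and pvar.het from the Exeter predictions above (this only post-processes the Gaussian predictive; the negative binomial route would need a proper latent-GP implementation):

```r
# Summarise the predictive via Monte Carlo draws floored at zero, so that
# reported means and intervals can never go negative.
ndraw <- 1000
draws <- matrix(rnorm(ndraw * length(p.het$mean),
                      mean = rep(p.het$mean, each = ndraw),
                      sd   = rep(sqrt(pvar.het), each = ndraw)),
                nrow = ndraw)
draws[draws < 0] <- 0
trunc.mean  <- colMeans(draws)
trunc.lower <- apply(draws, 2, quantile, probs = 0.025)  # never negative now
```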
Baker, Evan, Pierre Barbillon, Arindam Fadikar, Robert B. Gramacy, Radu Herbei, David Higdon, Jiangeng Huang, et al. 2020. “Stochastic Simulators: An Overview with Opportunities.” arXiv. https://arxiv.org/pdf/2002.01321.