## Summary

Notes summarising a Gaussian process seminar series talk by [Marta Blangiardo · Imperial College London](http://www.envstats.org/) on estimating COVID-19 excess mortality. Detailed notes below. The full recorded talk is available from the seminar series website.

Seminar series: https://gp-seminar-series.github.io/current/2022/03/28/marta-blangiardo.html

Talk title: *Spatio-temporal modelling of COVID using Gaussian processes*

## Introduction

- Several studies have looked at COVID-19 excess mortality.
- Later papers have started to look at lower-level aggregations of the data, but these are generally still quite coarse.
- Used NUTS3 regions, which are standardised units across countries.
- Used a spatio-temporal modelling framework (INLA).
- Predicted population in 2021 using population statistics from 2015-2020. Ignored uncertainty.
- Wanted to account for factors that drive mortality. Used temperature data and holiday data.
- Analysis done separately for each country.
- Main analysis excludes younger age groups as they are tricky to model and contribute little to mortality.

## First paper focus: Regional excess mortality during the 2020 COVID-19 pandemic: a study of five European countries

Paper: https://www.nature.com/articles/s41467-022-28157-3

Code: https://github.com/gkonstantinoudis/ExcessDeathsCOVID/tree/rcode

Based on this Italy-specific paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0240286

Italy paper code: https://github.com/martablangiardo/ExcessDeathsItaly

## Model outline

![[2022-03-28-gp-talk-model.png]]

- Used an expected deaths offset to account for noisy, sparse reporting.
- BYM (Besag-York-Mollié) model for spatial variation. Mixture of unstructured random effects and spatially structured effects using a neighbourhood matrix.
- Use the model to predict out of sample (i.e. 2021) what expected mortality would have been.

## INLA implementation

- Latent Gaussian models
- Gaussian Markov random fields
- Makes use of the Laplace approximation
- Often we assume that the latent field has a conditional independence property.
- For the Laplace approximation to work well, the model needs fewer than approximately 20 parameters.
- INLA is fast because it is a deterministic approximation and assumes conditional independence.

## Model validation

- Performed cross-validation, leaving a year out each time.
- Predict the number of deaths in each held-out year.
- Repeat for different age and sex groups, and for countries modelled independently vs a joint model.
- Assess agreement based on the correlation between predicted and observed deaths, plus coverage.

## Results

- Excess mortality varied over time and across countries.
- Lots of other interesting details (see recording).

## Next paper: Community factors and excess mortality in three European countries

Paper: https://www.nature.com/articles/s41467-021-23935-x

Code (only snippets): https://github.com/smallAreaHealthStatisticsUnit/SAHSU-COVID-19/tree/v1.0.0

- Used consistent covariates across countries, covering deprivation, air pollution, living conditions, population density, and movement of people.
- Used a two-stage model
  - Modelled deaths in 2020 based on covariates
  - Excess deaths as the ratio with the corresponding model for 2015-2019
  - Second-stage models used posterior samples from the first model and then had their posteriors combined.
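As a rough sketch, the two-stage description can be written as a Poisson regression with an expected-deaths offset. The notation below is not from the talk: it is read off the R formulas recorded in these notes, and it assumes Stage 1 uses the same Poisson likelihood and offset as the Stage 2 `inla` call. Treat it as an assumption rather than the authors' specification.

$$
\begin{aligned}
y_{iat} &\sim \text{Poisson}(E_{iat}\,\rho_{iat}) \\
\text{Stage 1:}\quad \log \rho_{iat} &= \beta_0 + b_i + \gamma_t + \alpha_a \\
\text{Stage 2:}\quad \log \rho_{ia} &= \beta_0 + b_i + \alpha_a + \sum_k \delta_{k,\,q_k(i)} \\
b_i &= u_i + v_i
\end{aligned}
$$

Here $y$ is the death count and $E$ the expected deaths for area $i$ (`MSOAOrder`), age category $a$ (`AGECAT`, with the 80+ group as reference) and year $t$ (`YEAR`). The BYM term $b_i$ is the sum of a spatially structured effect $u_i$, defined through the neighbourhood graph, and an unstructured effect $v_i$. $\gamma_t$ is an independent year effect, and $\delta_{k,\,q_k(i)}$ is the effect of the quantile group $q_k(i)$ that area $i$ falls into for covariate $k$ (the `*_Q` variables).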
### Stage 1 model

```r
library(INLA)

# Set up the formula to use.
# IG (the neighbourhood graph) is assumed to be defined elsewhere.
formula = count ~ 1 +
  f(MSOAOrder, model = 'bym', graph = IG, scale.model = TRUE,
    hyper = list(prec.unstruct = list(param = c(1, 0.1)),
                 prec.spatial = list(param = c(1, 0.1)))) +
  f(YEAR, model = "iid") +
  relevel(as.factor(AGECAT), ref = 4) # use the highest age cat (80+) as the reference (most deaths)
```

### Stage 2 model

```r
# Same spatial and age terms as Stage 1, plus the covariate quantile groups.
formula = count ~
  f(MSOAOrder, model = 'bym', graph = IG, scale.model = TRUE,
    hyper = list(prec.unstruct = list(param = c(1, 0.1)),
                 prec.spatial = list(param = c(1, 0.1)))) +
  relevel(as.factor(AGECAT), ref = 4) +
  as.factor(OCR_Q) + as.factor(NWHITE_Q) + as.factor(INCOME_Q) +
  as.factor(NO2_Q) + as.factor(PD_Q) + as.factor(PM25_Q) +
  as.factor(CH_pop_Q)

cat("Starting INLA \n")
# Poisson model with expected deaths (Exp) as the offset.
result2020 = inla(formula, family = 'poisson', E = Exp, data = data,
                  control.compute = list(dic = TRUE, config = TRUE),
                  quantiles = c(0.025, 0.5, 0.975), verbose = FALSE)
```

## Conclusions

- Sub-national estimates are useful
- Spatio-temporal Bayesian models offer a flexible framework
  - stable estimates
  - able to account for correlations and non-linearities
  - naturally able to model and report uncertainty

## Helping dissemination

Shiny app: http://atlasmortalidad.uclm.es/excess/

- They put together a tutorial for estimating excess mortality using their approach (under review).
- Also made a Shiny app.

## Questions

- S: Mentioned you used cross-validation and correlation + coverage to validate the model. Did you consider time-aware cross-validation + proper scoring rules, and if so why did you choose to go with the approach you used?
  - No, they didn't.
- S: Model variants explored besides the one mentioned? In particular, did you explore alternatives to the BYM model?
  - See Q below. Did not explore in great detail.
- S: Code for the second paper presented and the tutorial for estimating excess mortality?
  - Email for the code. Tutorial is also under review and so no code yet.
- Others: Why BYM?
  - More flexible than CAR etc.
    Offers a mixture between spatial and non-spatial effects. This kind of neighbourhood-based model is easier to fit than a continuous spatial model.
- Roman: At which spatial support were the observations available?
  - In the first part, at the NUTS3 regions.
- Roman: Could change spatial support to a grid to better reflect spatial variation.
  - Bound by data availability.
- Adam: Why did you drop the data for under-40s? That was the area where the model was most required. Why make this decision given the model should have included the uncertainty?
  - Ran it on under-40s but excluded them as the model didn't appear to do that well.
- Alex: What are the bottlenecks? What constrains current modelling?
  - Smaller grids and more time.

## Discussions

- On the cusp of being able to expand the use of Gaussian processes to a much wider problem space.
- People at the talk were very interested in what research needs to be done.
- Discussed time-evolving spatial kernels as one area of focus.
- Also discussed enabling the embedding of GPs in more complex models.

## Tags

#people/marta-blangiardo #gaussian-processes #excess-mortality #inla #spatial-stats #seminars/gaussian-processes #blog/2022