INLA modeling • augury

In order to show the INLA modeling wrappers provided in augury, we will look at two datasets publicly available on the World Health Organization’s Global Health Observatory. These can be accessed using the ghost package, which provides an R interface for the GHO OData API.

Time series modeling

The first indicator will be on safe sanitation. We will also use the billionaiRe package to quickly transform the GHO data into the simple format used by augury, billionaiRe, and other packages.

library(augury)

df <- ghost::gho_data("WSH_SANITATION_SAFELY_MANAGED",
                      query = "$filter=Dim1 eq 'TOTL'") %>%
  billionaiRe::wrangle_gho_data(source = "WHO GHO",
                                type = "estimated")

head(df)
#> # A tibble: 6 × 13
#>   iso3   year ind   value lower upper use_dash use_calc source type  type_detail
#>   <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl>    <lgl>    <chr>  <chr> <chr>      
#> 1 AFG    2000 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> 2 AFG    2001 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> 3 AFG    2002 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> 4 AFG    2003 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> 5 AFG    2004 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> 6 AFG    2005 hpop…    NA    NA    NA TRUE     TRUE     WHO G… esti… NA         
#> # … with 2 more variables: other_detail <chr>, upload_detail <chr>

Now that we have the input data available from the GHO in an easy to use format, we can now join up with the covariates_df available in augury and run a time series model to predict sanitation out to 2023. For simplicity, let’s just look at Albania, with ISO3 code "ALB".

library(dplyr)

df <- left_join(covariates_df,
                df,
                by = c("iso3", "year")) %>%
  filter(iso3 == "ALB")

head(df)
#> # A tibble: 6 × 22
#>   iso3   year year_n region       un_subregion gbd_subregion sdg_subregion   sdi
#>   <chr> <int>  <dbl> <chr>        <chr>                <dbl> <chr>         <dbl>
#> 1 ALB    2000      1 Southern Eu… 39                      42 917           0.604
#> 2 ALB    2001      2 Southern Eu… 39                      42 917           0.61 
#> 3 ALB    2002      3 Southern Eu… 39                      42 917           0.614
#> 4 ALB    2003      4 Southern Eu… 39                      42 917           0.619
#> 5 ALB    2004      5 Southern Eu… 39                      42 917           0.625
#> 6 ALB    2005      6 Southern Eu… 39                      42 917           0.632
#> # … with 14 more variables: sdi_scaled <dbl>, e0 <dbl>, e0_scaled <dbl>,
#> #   ind <chr>, value <dbl>, lower <dbl>, upper <dbl>, use_dash <lgl>,
#> #   use_calc <lgl>, source <chr>, type <chr>, type_detail <chr>,
#> #   other_detail <chr>, upload_detail <chr>

Of course, the only “covariate” being used in this time series model is going to be year_n, but the rest are available if we want to expand to test other types of modeling. Let’s run the modeling now. We are going to scale the data and probit transform it before and after the modeling. We will use the predict_inla_ts() to fit a time series model to the data.

modeled_df <- df %>%
  scale_transform("value") %>%
  probit_transform("value") %>%
  predict_inla_ts(type_col = "type",
                  source_col = "source",
                  source = "augury modeling") %>%
  probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
  scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'

# Only look at recent years and relevant columns

modeled_df %>%
  filter(year > 2015) %>%
  select(iso3, year, value, pred, lower, upper, source, type)
#> # A tibble: 10 × 8
#>    iso3   year value  pred lower upper source          type     
#>    <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>           <chr>    
#>  1 ALB    2016  46.7  46.7    NA    NA WHO GHO         estimated
#>  2 ALB    2017  47.1  47.0    NA    NA WHO GHO         estimated
#>  3 ALB    2018  47.4  47.4    NA    NA WHO GHO         estimated
#>  4 ALB    2019  47.6  47.6    NA    NA WHO GHO         estimated
#>  5 ALB    2020  47.7  47.7    NA    NA WHO GHO         estimated
#>  6 ALB    2021  47.9  47.9    NA    NA augury modeling projected
#>  7 ALB    2022  48.1  48.1    NA    NA augury modeling projected
#>  8 ALB    2023  48.3  48.3    NA    NA augury modeling projected
#>  9 ALB    2024  48.5  48.5    NA    NA augury modeling projected
#> 10 ALB    2025  48.6  48.6    NA    NA augury modeling projected

And there we go, we have now fit a time series model to our data, provided new type and source, and merged this into our existing data frame. However, in this setup, the error calculations returned by predict_inla_ts() are calculated in the probit space. If we wanted to scale and probit transform the response variable prior to model fitting, but still calculate error metrics and automatically return the response and predicted values back in the original space, we can set scale = 100 and probit = TRUE within predict_inla_ts().

df %>%
  predict_inla_ts(scale = 100,
                  probit = TRUE,
                  type_col = "type",
                  source_col = "source",
                  source = "augury modeling") %>%
  filter(year > 2015) %>%
  select(iso3, year, value, pred, lower, upper, source, type)
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'
#> # A tibble: 10 × 8
#>    iso3   year value  pred lower upper source          type     
#>    <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>           <chr>    
#>  1 ALB    2016  46.7  46.7    NA    NA WHO GHO         estimated
#>  2 ALB    2017  47.1  47.0    NA    NA WHO GHO         estimated
#>  3 ALB    2018  47.4  47.4    NA    NA WHO GHO         estimated
#>  4 ALB    2019  47.6  47.6    NA    NA WHO GHO         estimated
#>  5 ALB    2020  47.7  47.7    NA    NA WHO GHO         estimated
#>  6 ALB    2021  47.9  47.9    NA    NA augury modeling projected
#>  7 ALB    2022  48.1  48.1    NA    NA augury modeling projected
#>  8 ALB    2023  48.3  48.3    NA    NA augury modeling projected
#>  9 ALB    2024  48.5  48.5    NA    NA augury modeling projected
#> 10 ALB    2025  48.6  48.6    NA    NA augury modeling projected

And we can see that the results here are the same as manually scaling and probit transforming the variables.

Mixed-effects modeling

Now we will look at another indicator, a composite of 13 International Health Regulations core capacity scores, SPAR version. Since countries only have two data points at most, we will use mixed-effects modeling to infill and project the data.

df <- ghost::gho_data("SDGIHR2018") %>%
  billionaiRe::wrangle_gho_data(source = "Electronic State Parties Self-Assessment Annual Reporting Tool (e-SPAR)",
                                type = "reported")

head(df)
#> # A tibble: 6 × 13
#>   iso3   year ind   value lower upper use_dash use_calc source type  type_detail
#>   <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl>    <lgl>    <chr>  <chr> <chr>      
#> 1 AFG    2018 espar    35    NA    NA TRUE     TRUE     Elect… repo… NA         
#> 2 AFG    2019 espar    43    NA    NA TRUE     TRUE     Elect… repo… NA         
#> 3 AFG    2020 espar    47    NA    NA TRUE     TRUE     Elect… repo… NA         
#> 4 AGO    2018 espar    59    NA    NA TRUE     TRUE     Elect… repo… NA         
#> 5 AGO    2019 espar    63    NA    NA TRUE     TRUE     Elect… repo… NA         
#> 6 AGO    2020 espar    65    NA    NA TRUE     TRUE     Elect… repo… NA         
#> # … with 2 more variables: other_detail <chr>, upload_detail <chr>

With this, let’s go straight into the modeling like last time, except we will now use predict_inla_me() for mixed-effects modeling using covariates found in covariates_df. This time, we want to model a first order auto-regressive process across time rather than a second-order random walk, so we use the "ar1" model available in INLA.

modeled_df <- df %>%
  right_join(covariates_df, by = c("iso3", "year")) %>%
  scale_transform("value") %>%
  probit_transform("value") %>%
  predict_inla_me(model = "ar1",
                  type_col = "type",
                  source_col = "source",
                  source = "WHO DDI Preliminary infilling and projections") %>%
  probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
  scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)
#> Warning in length(marginals.random) >= i && !is.na(marginals.random[[i]]):
#> 'length(x) = 19 > 1' in coercion to 'logical(1)'
#> Warning in is.null(x) || is.na(x): 'length(x) = 19 > 1' in coercion to
#> 'logical(1)'
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'

# Look at an example for Afghanistan

modeled_df %>%
  filter(year > 2017, iso3 == "AFG") %>%
  select(iso3, year, value, pred, lower, upper, source, type)
#> # A tibble: 8 × 8
#>   iso3   year value  pred lower upper source                               type 
#>   <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>                                <chr>
#> 1 AFG    2018  35    40.6    NA    NA Electronic State Parties Self-Asses… repo…
#> 2 AFG    2019  43    41.5    NA    NA Electronic State Parties Self-Asses… repo…
#> 3 AFG    2020  47    42.7    NA    NA Electronic State Parties Self-Asses… repo…
#> 4 AFG    2021  43.5  43.5    NA    NA WHO DDI Preliminary infilling and p… proj…
#> 5 AFG    2022  44.5  44.5    NA    NA WHO DDI Preliminary infilling and p… proj…
#> 6 AFG    2023  45.4  45.4    NA    NA WHO DDI Preliminary infilling and p… proj…
#> 7 AFG    2024  46.4  46.4    NA    NA WHO DDI Preliminary infilling and p… proj…
#> 8 AFG    2025  47.3  47.3    NA    NA WHO DDI Preliminary infilling and p… proj…

And exactly as we were able to do with the time series modeling, we now have infilled missing data for this indicator using mixed-effects modeling in INLA.

Building further

Building further on this work, you can tweak any of the arguments passed to these INLA models or use the base predict_inla() and other covariates to test and compare other models. There is much more functionality to test modeling accuracy and iteratively develop methods available in this package not shown here, so please continue to explore and play around.