In order to show the INLA modeling wrappers provided in augury, we will look at two datasets publicly available on the World Health Organization’s Global Health Observatory. These can be accessed using the ghost package, which provides an R interface for the GHO OData API.
Time series modeling
The first indicator will be on safe sanitation. We will also use the billionaiRe package to quickly transform the GHO data into the simple format used by augury, billionaiRe, and other packages.
library(augury)
df <- ghost::gho_data("WSH_SANITATION_SAFELY_MANAGED",
query = "$filter=Dim1 eq 'TOTL'") %>%
billionaiRe::wrangle_gho_data(source = "WHO GHO",
type = "estimated")
head(df)
#> # A tibble: 6 × 13
#> iso3 year ind value lower upper use_dash use_calc source type type_detail
#> <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <chr> <chr> <chr>
#> 1 AFG 2000 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> 2 AFG 2001 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> 3 AFG 2002 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> 4 AFG 2003 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> 5 AFG 2004 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> 6 AFG 2005 hpop… NA NA NA TRUE TRUE WHO G… esti… NA
#> # … with 2 more variables: other_detail <chr>, upload_detail <chr>
Now that we have the input data available from the GHO in an easy to use format, we can now join up with the covariates_df
available in augury and run a time series model to predict sanitation out to 2023. For simplicity, let’s just look at Albania, with ISO3 code "ALB"
.
library(dplyr)
df <- left_join(covariates_df,
df,
by = c("iso3", "year")) %>%
filter(iso3 == "ALB")
head(df)
#> # A tibble: 6 × 22
#> iso3 year year_n region un_subregion gbd_subregion sdg_subregion sdi
#> <chr> <int> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 ALB 2000 1 Southern Eu… 39 42 917 0.604
#> 2 ALB 2001 2 Southern Eu… 39 42 917 0.61
#> 3 ALB 2002 3 Southern Eu… 39 42 917 0.614
#> 4 ALB 2003 4 Southern Eu… 39 42 917 0.619
#> 5 ALB 2004 5 Southern Eu… 39 42 917 0.625
#> 6 ALB 2005 6 Southern Eu… 39 42 917 0.632
#> # … with 14 more variables: sdi_scaled <dbl>, e0 <dbl>, e0_scaled <dbl>,
#> # ind <chr>, value <dbl>, lower <dbl>, upper <dbl>, use_dash <lgl>,
#> # use_calc <lgl>, source <chr>, type <chr>, type_detail <chr>,
#> # other_detail <chr>, upload_detail <chr>
Of course, the only “covariate” being used in this time series model is going to be year_n
, but the rest are available if we want to expand to test other types of modeling. Let’s run the modeling now. We are going to scale the data and probit transform it before and after the modeling. We will use the predict_inla_ts()
to fit a time series model to the data.
modeled_df <- df %>%
scale_transform("value") %>%
probit_transform("value") %>%
predict_inla_ts(type_col = "type",
source_col = "source",
source = "augury modeling") %>%
probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'
# Only look at recent years and relevant columns
modeled_df %>%
filter(year > 2015) %>%
select(iso3, year, value, pred, lower, upper, source, type)
#> # A tibble: 10 × 8
#> iso3 year value pred lower upper source type
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 ALB 2016 46.7 46.7 NA NA WHO GHO estimated
#> 2 ALB 2017 47.1 47.0 NA NA WHO GHO estimated
#> 3 ALB 2018 47.4 47.4 NA NA WHO GHO estimated
#> 4 ALB 2019 47.6 47.6 NA NA WHO GHO estimated
#> 5 ALB 2020 47.7 47.7 NA NA WHO GHO estimated
#> 6 ALB 2021 47.9 47.9 NA NA augury modeling projected
#> 7 ALB 2022 48.1 48.1 NA NA augury modeling projected
#> 8 ALB 2023 48.3 48.3 NA NA augury modeling projected
#> 9 ALB 2024 48.5 48.5 NA NA augury modeling projected
#> 10 ALB 2025 48.6 48.6 NA NA augury modeling projected
And there we go, we have now fit a time series model to our data, provided new type and source, and merged this into our existing data frame. However, in this setup, the error calculations returned by predict_inla_ts()
are calculated in the probit space. If we wanted to scale and probit transform the response variable prior to model fitting, but still calculate error metrics and automatically return the response and predicted values back in the original space, we can set scale = 100
and probit = TRUE
within predict_inla_ts()
.
df %>%
predict_inla_ts(scale = 100,
probit = TRUE,
type_col = "type",
source_col = "source",
source = "augury modeling") %>%
filter(year > 2015) %>%
select(iso3, year, value, pred, lower, upper, source, type)
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'
#> # A tibble: 10 × 8
#> iso3 year value pred lower upper source type
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 ALB 2016 46.7 46.7 NA NA WHO GHO estimated
#> 2 ALB 2017 47.1 47.0 NA NA WHO GHO estimated
#> 3 ALB 2018 47.4 47.4 NA NA WHO GHO estimated
#> 4 ALB 2019 47.6 47.6 NA NA WHO GHO estimated
#> 5 ALB 2020 47.7 47.7 NA NA WHO GHO estimated
#> 6 ALB 2021 47.9 47.9 NA NA augury modeling projected
#> 7 ALB 2022 48.1 48.1 NA NA augury modeling projected
#> 8 ALB 2023 48.3 48.3 NA NA augury modeling projected
#> 9 ALB 2024 48.5 48.5 NA NA augury modeling projected
#> 10 ALB 2025 48.6 48.6 NA NA augury modeling projected
And we can see that the results here are the same as manually scaling and probit transforming the variables.
Mixed-effects modeling
Now we will look at another indicator, a composite of 13 International Health Regulations core capacity scores, SPAR version. Since countries only have two data points at most, we will use mixed-effects modeling to infill and project the data.
df <- ghost::gho_data("SDGIHR2018") %>%
billionaiRe::wrangle_gho_data(source = "Electronic State Parties Self-Assessment Annual Reporting Tool (e-SPAR)",
type = "reported")
head(df)
#> # A tibble: 6 × 13
#> iso3 year ind value lower upper use_dash use_calc source type type_detail
#> <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <chr> <chr> <chr>
#> 1 AFG 2018 espar 35 NA NA TRUE TRUE Elect… repo… NA
#> 2 AFG 2019 espar 43 NA NA TRUE TRUE Elect… repo… NA
#> 3 AFG 2020 espar 47 NA NA TRUE TRUE Elect… repo… NA
#> 4 AGO 2018 espar 59 NA NA TRUE TRUE Elect… repo… NA
#> 5 AGO 2019 espar 63 NA NA TRUE TRUE Elect… repo… NA
#> 6 AGO 2020 espar 65 NA NA TRUE TRUE Elect… repo… NA
#> # … with 2 more variables: other_detail <chr>, upload_detail <chr>
With this, let’s go straight into the modeling like last time, except we will now use predict_inla_me()
for mixed-effects modeling using covariates found in covariates_df
. This time, we want to model a first order auto-regressive process across time rather than a second-order random walk, so we use the "ar1"
model available in INLA.
modeled_df <- df %>%
right_join(covariates_df, by = c("iso3", "year")) %>%
scale_transform("value") %>%
probit_transform("value") %>%
predict_inla_me(model = "ar1",
type_col = "type",
source_col = "source",
source = "WHO DDI Preliminary infilling and projections") %>%
probit_transform(c("value", "pred", "upper", "lower"), inverse = TRUE) %>%
scale_transform(c("value", "pred", "upper", "lower"), divide = FALSE)
#> Warning in length(marginals.random) >= i && !is.na(marginals.random[[i]]):
#> 'length(x) = 19 > 1' in coercion to 'logical(1)'
#> Warning in is.null(x) || is.na(x): 'length(x) = 19 > 1' in coercion to
#> 'logical(1)'
#> Warning in is.null(x) || is.na(x): 'length(x) = 26 > 1' in coercion to
#> 'logical(1)'
# Look at an example for Afghanistan
modeled_df %>%
filter(year > 2017, iso3 == "AFG") %>%
select(iso3, year, value, pred, lower, upper, source, type)
#> # A tibble: 8 × 8
#> iso3 year value pred lower upper source type
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 AFG 2018 35 40.6 NA NA Electronic State Parties Self-Asses… repo…
#> 2 AFG 2019 43 41.5 NA NA Electronic State Parties Self-Asses… repo…
#> 3 AFG 2020 47 42.7 NA NA Electronic State Parties Self-Asses… repo…
#> 4 AFG 2021 43.5 43.5 NA NA WHO DDI Preliminary infilling and p… proj…
#> 5 AFG 2022 44.5 44.5 NA NA WHO DDI Preliminary infilling and p… proj…
#> 6 AFG 2023 45.4 45.4 NA NA WHO DDI Preliminary infilling and p… proj…
#> 7 AFG 2024 46.4 46.4 NA NA WHO DDI Preliminary infilling and p… proj…
#> 8 AFG 2025 47.3 47.3 NA NA WHO DDI Preliminary infilling and p… proj…
And exactly as we were able to do with the time series modeling, we now have infilled missing data for this indicator using mixed-effects modeling in INLA.
Building further
Building further on this work, you can tweak any of the arguments passed to these INLA models or use the base predict_inla()
and other covariates to test and compare other models. There is much more functionality to test modeling accuracy and iteratively develop methods available in this package not shown here, so please continue to explore and play around.