To explore forecasting methods for predicting data, we will again use the ghost package, which provides an R interface to the GHO OData API, to access data on blood pressure. We will initially load data for the USA and Great Britain, both of which have full time series from 1975 to 2015.
library(augury)
df <- ghost::gho_data("BP_04", query = "$filter=SpatialDim in ('USA', 'GBR') and Dim1 eq 'MLE' and Dim2 eq 'YEARS18-PLUS'") %>%
  billionaiRe::wrangle_gho_data() %>%
  dplyr::right_join(tidyr::expand_grid(iso3 = c("USA", "GBR"),
                                       year = 1975:2017))
#> Warning: Some of the rows are missing a source value.
#> Joining, by = c("iso3", "year")
head(df)
#> # A tibble: 6 × 13
#> iso3 year ind value lower upper use_dash use_calc source type type_detail
#> <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <chr> <chr>
#> 1 GBR 1975 bp 37.8 26.7 49.1 TRUE TRUE NA NA NA
#> 2 GBR 1976 bp 37.6 27.4 48 TRUE TRUE NA NA NA
#> 3 GBR 1977 bp 37.3 27.9 46.8 TRUE TRUE NA NA NA
#> 4 GBR 1978 bp 37.1 28.4 45.9 TRUE TRUE NA NA NA
#> 5 GBR 1979 bp 36.9 28.8 45.2 TRUE TRUE NA NA NA
#> 6 GBR 1980 bp 36.7 29.2 44.4 TRUE TRUE NA NA NA
#> # … with 2 more variables: other_detail <chr>, upload_detail <chr>
With this data, we can now use the predict_forecast() function, just like any of the other predict_... functions from augury, to forecast out to 2017. First, we will do this on the USA data only, using forecast::holt to forecast with exponential smoothing.
usa_df <- dplyr::filter(df, iso3 == "USA")
predict_forecast(usa_df,
                 forecast::holt,
                 "value",
                 sort_col = "year") %>%
  dplyr::filter(year >= 2012)
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> # A tibble: 6 × 16
#> iso3 year ind value lower upper use_dash use_calc source type type_detail
#> <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <chr> <chr>
#> 1 USA 2012 bp 15.7 11.7 20.3 TRUE TRUE NA NA NA
#> 2 USA 2013 bp 15.5 11.2 20.8 TRUE TRUE NA NA NA
#> 3 USA 2014 bp 15.4 10.8 21.3 TRUE TRUE NA NA NA
#> 4 USA 2015 bp 15.3 10.4 21.8 TRUE TRUE NA NA NA
#> 5 USA 2016 NA 15.2 NA NA NA NA NA NA NA
#> 6 USA 2017 NA 15.1 NA NA NA NA NA NA NA
#> # … with 5 more variables: other_detail <chr>, upload_detail <chr>, pred <dbl>,
#> # pred_upper <dbl>, pred_lower <dbl>
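For intuition about what Holt's method does, the recursion behind linear-trend exponential smoothing can be sketched in base R. This is a simplified illustration, not what forecast::holt runs internally: the function name holt_sketch is hypothetical, and the smoothing parameters alpha and beta are fixed by assumption here, whereas forecast::holt estimates its parameters from the data and also returns prediction intervals.

```r
# A minimal sketch of Holt's linear trend method with FIXED smoothing
# parameters (an assumption for illustration; forecast::holt estimates them).
holt_sketch <- function(y, h, alpha = 0.8, beta = 0.2) {
  stopifnot(length(y) >= 2)
  # Simple initialisation: first value as level, first difference as trend.
  level <- y[1]
  trend <- y[2] - y[1]
  for (t in 2:length(y)) {
    prev_level <- level
    # Update the level from the new observation and the previous projection.
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)
    # Update the trend from the change in level.
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # h-step-ahead forecasts extend the last level along the last trend.
  level + seq_len(h) * trend
}

linear_forecast <- holt_sketch(c(10, 12, 14, 16, 18), h = 2)
```

On a perfectly linear series such as this one, the level tracks the observations and the trend stays at 2, so the two forecasts continue the line to 20 and 22 (up to floating-point error).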
Of course, we might want to run these models for all countries at once, fitting each country individually. In this case, we can set the group_models = TRUE argument to perform the forecast separately by country. To save a bit of time, let’s use the wrapper predict_holt(), which automatically supplies forecast::holt as the forecasting function.
predict_holt(df,
             response = "value",
             group_col = "iso3",
             group_models = TRUE,
             sort_col = "year") %>%
  dplyr::filter(year >= 2014, year <= 2017)
#> # A tibble: 8 × 16
#> iso3 year ind value lower upper use_dash use_calc source type type_detail
#> <chr> <int> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <chr> <chr>
#> 1 GBR 2014 bp 18.5 14 23.3 TRUE TRUE NA NA NA
#> 2 GBR 2015 bp 17.9 13 23.2 TRUE TRUE NA NA NA
#> 3 GBR 2016 NA 17.3 NA NA NA NA NA NA NA
#> 4 GBR 2017 NA 16.7 NA NA NA NA NA NA NA
#> 5 USA 2014 bp 15.4 10.8 21.3 TRUE TRUE NA NA NA
#> 6 USA 2015 bp 15.3 10.4 21.8 TRUE TRUE NA NA NA
#> 7 USA 2016 NA 15.2 NA NA NA NA NA NA NA
#> 8 USA 2017 NA 15.1 NA NA NA NA NA NA NA
#> # … with 5 more variables: other_detail <chr>, upload_detail <chr>, pred <dbl>,
#> # pred_upper <dbl>, pred_lower <dbl>
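Conceptually, forecasting by group amounts to a split-apply-combine pattern: split the data by country, forecast each series on its own, and bind the results back together. The base R sketch below illustrates that idea only. It is not augury's internal implementation, the toy data and country codes are made up, and drift_forecast is a hypothetical stand-in forecaster (last value plus average change), not forecast::holt.

```r
# Toy data: two made-up countries with different linear trends.
toy <- data.frame(
  iso3  = rep(c("AAA", "BBB"), each = 4),
  year  = rep(2000:2003, times = 2),
  value = c(10, 9, 8, 7, 20, 22, 24, 26)
)

# Hypothetical stand-in forecaster (NOT forecast::holt):
# last observed value plus h times the average year-on-year change.
drift_forecast <- function(y, h) {
  y[length(y)] + seq_len(h) * mean(diff(y))
}

# Split by country, forecast each series separately, then recombine.
forecasts <- do.call(rbind, lapply(split(toy, toy$iso3), function(d) {
  d <- d[order(d$year), ]  # make sure each series is in time order
  data.frame(
    iso3 = d$iso3[1],
    year = max(d$year) + 1:2,
    pred = drift_forecast(d$value, h = 2)
  )
}))
```

Each country's forecast depends only on its own history, which is why the USA results above are unchanged when Great Britain is added to the data.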
Et voilà, we have the same results for the USA and have also run the forecast for Great Britain. However, you should be careful about the data supplied for forecasting. The forecast package functions default to using the longest contiguous stretch of non-missing data. augury instead automatically pulls the latest contiguous observed data, to ensure that older data is not prioritized over newer data. However, this means that any break in a time series prevents the data before it from being used. In the following example, only the two observations after the gap (3 and 2) inform the forecast.
bad_df <- dplyr::tibble(x = c(1:4, NA, 3:2, rep(NA, 4)))
predict_holt(bad_df, "x", group_col = NULL, sort_col = NULL, group_models = FALSE)
#> # A tibble: 11 × 6
#> x pred pred_upper pred_lower upper lower
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA NA NA NA NA
#> 2 2 NA NA NA NA NA
#> 3 3 NA NA NA NA NA
#> 4 4 NA NA NA NA NA
#> 5 NA NA NA NA NA NA
#> 6 3 NA NA NA NA NA
#> 7 2 NA NA NA NA NA
#> 8 1.17 1.17 2.55 -0.217 NA NA
#> 9 0.338 0.338 2.33 -1.66 NA NA
#> 10 -0.494 -0.494 2.14 -3.12 NA NA
#> 11 -1.32 -1.32 1.98 -4.63 NA NA
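To make the selection rule concrete, here is a base R sketch of how the latest contiguous run of observed values could be extracted from a series. The helper latest_contiguous is hypothetical and written only for illustration. It mirrors the behaviour described above but is not taken from augury's source.

```r
# Hypothetical helper: return the most recent unbroken run of
# non-missing values, ignoring any trailing NAs to be forecast.
latest_contiguous <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) {
    return(x[0])  # nothing observed, so nothing to forecast from
  }
  last <- max(observed)
  # Find the most recent gap before the last observation, if any.
  missing_before <- which(is.na(x[seq_len(last)]))
  start <- if (length(missing_before) == 0) 1 else max(missing_before) + 1
  x[start:last]
}

used <- latest_contiguous(c(1:4, NA, 3:2, rep(NA, 4)))
```

For the series in bad_df, this keeps only the values 3 and 2, so the increasing run 1 to 4 before the gap has no influence on the downward-trending forecast.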
It’s advisable to consider whether other infilling or imputation methods should be used to generate a full time series before forecasting, so that issues like the one above do not affect predictive accuracy.
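As one possible infilling approach, interior gaps can be filled by linear interpolation with stats::approx() from base R before forecasting. This is a minimal sketch under the assumption that linear interpolation is appropriate for the series. It is not a recommendation of a specific method, and the variable names are illustrative.

```r
# Observed part of the series from the example above, with an interior gap.
x <- c(1, 2, 3, 4, NA, 3, 2)

idx      <- seq_along(x)
observed <- !is.na(x)

# Linearly interpolate the missing positions from neighbouring observations.
filled <- x
filled[!observed] <- stats::approx(
  x    = idx[observed],
  y    = x[observed],
  xout = idx[!observed]
)$y
```

Here the missing fifth value is filled with 3.5, the midpoint of its neighbours 4 and 3, giving an unbroken series that lets all seven observations contribute to the forecast.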