Fit forecast model to averages and apply trend to original data

Used within predict_forecast_avg_trend(), this function fits the model to the data frame, either across the entire data frame or to each group individually. The data is filtered prior to fitting, the model(s) are fit, and fitted values are then generated on the original data.
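
As a rough illustration of the averaging that the title refers to, the base R sketch below computes a weighted average of a response within each grouping and year. The toy data, the column names (region, pop, value) and the use of stats::weighted.mean() are assumptions made for illustration only; they are not taken from the package's internals.

```r
# Toy data: three countries in two regions over two years (all values hypothetical)
df <- data.frame(
  iso3   = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  region = c("R1", "R1", "R1", "R1", "R2", "R2"),
  year   = c(2019, 2020, 2019, 2020, 2019, 2020),
  value  = c(10, 12, 20, 24, 5, 7),
  pop    = c(1, 1, 3, 3, 2, 2)
)

# Split by the averaging column (region) and year, then take a
# population-weighted mean of the response within each piece
groups <- split(df, list(df$region, df$year), drop = TRUE)
avg <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    region    = g$region[1],
    year      = g$year[1],
    avg_value = stats::weighted.mean(g$value, g$pop)
  )
}))
rownames(avg) <- NULL

# Order the averaged series so each region forms a contiguous time series
avg <- avg[order(avg$region, avg$year), ]
```

Here region plays the role of average_cols and pop the role of weight_col; the resulting avg_value series is the kind of input a forecast function would then be fit to.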

Usage

fit_forecast_average_model(
  df,
  forecast_function,
  response,
  average_cols,
  weight_col,
  ...,
  test_col,
  group_col,
  group_models,
  sort_col,
  sort_descending,
  pred_col,
  pred_upper_col,
  pred_lower_col,
  filter_na,
  ret
)

Arguments

df: Data frame of model data.
forecast_function: An R function that outputs a forecast object coming from the forecast package. You can directly pass forecast::forecast() to the function, or you can pass other wrappers to it such as forecast::holt() or forecast::ses().
response: Column name of response variable to be used as the input to the forecast function.
average_cols: Column name(s) of column(s) for use in grouping data for averaging, such as regions. If missing, uses global average of the data for infilling.
weight_col: Column name of column of weights to be used in averaging, such as country population.
...: Additional arguments passed to the forecast function.
test_col: Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See model_error() for details on the methods and metrics returned.
group_col: Column name(s) of group(s) to use in dplyr::group_by() when supplying type, calculating mean absolute scaled error on data involving time series, and if group_models, then fitting and predicting models too. If NULL, not used. Defaults to "iso3".
group_models: Logical. If TRUE, fits and predicts models individually for each group defined by group_col. If FALSE, a general model is fit across the entire data frame.
sort_col: Column name of column to arrange data by in dplyr::arrange(), prior to filtering for latest contiguous time series and producing the forecast. Not used if NULL, defaults to "year".
sort_descending: Logical value on whether the sorted values from sort_col should be sorted in descending order. Defaults to FALSE.
pred_col: Column name to store predicted value.
pred_upper_col: Column name to store upper bound of confidence interval generated by the predict_... function. This stores the full set of generated values for the upper bound.
pred_lower_col: Column name to store lower bound of confidence interval generated by the predict_... function. This stores the full set of generated values for the lower bound.
filter_na: Character value specifying how, if at all, to filter NA values from the dataset prior to applying the model. By default, all observations with missing values are removed. Alternatively, rows can be removed only if they have missing dependent or independent variables, or filtering can be skipped entirely.
ret: Character vector specifying what values the function returns. Defaults to returning a data frame, but can return a vector of model error, the model itself or a list with all 3 as components.
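
To make the forecast_function argument more concrete, the sketch below calls forecast::holt() directly on a small, made-up yearly series and pulls out the point forecast and interval bounds. It assumes the forecast package is installed; the series and the output column names (pred, pred_upper, pred_lower) are hypothetical and only suggest how the components of a forecast object could correspond to pred_col, pred_upper_col and pred_lower_col.

```r
library(forecast)

# Hypothetical yearly series standing in for an averaged response
y <- ts(c(10, 12, 13, 15, 18, 20, 21, 24, 26, 27, 30, 31), start = 2013)

# holt() returns a forecast object; h is the horizon and
# level the confidence level (in percent) for the interval
fc <- holt(y, h = 3, level = 95)

# Collect the point forecast and the interval bounds in a data frame
out <- data.frame(
  year       = as.numeric(time(fc$mean)),
  pred       = as.numeric(fc$mean),
  pred_upper = as.numeric(fc$upper),
  pred_lower = as.numeric(fc$lower)
)
```

Arguments such as h and level are the kind of values that would be forwarded through ... to the forecast function.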

Value

List of mdl (fitted model) and df (data frame with fitted values and confidence bounds generated from the model).

Details

If fitting models individually to each group, mdl will never be returned, as the result is a large collection of models rather than a single one. Otherwise, a list of mdl and df is returned and used within predict_forecast().
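
The grouped case can be pictured with the following sketch, which fits one simple exponential smoothing model per group and ends up with a named list of forecast objects rather than a single model. It assumes the forecast package is installed; the data and the split()/lapply() approach are illustrative assumptions, not the package's own implementation.

```r
library(forecast)

# Hypothetical data: two countries, ten years each
df <- data.frame(
  iso3  = rep(c("AAA", "BBB"), each = 10),
  year  = rep(2011:2020, times = 2),
  value = c(10, 11, 13, 12, 14, 15, 17, 16, 18, 19,
            30, 29, 31, 33, 32, 34, 36, 35, 37, 38)
)

# Fit one model per group: sort each group's rows by year,
# convert the response to a time series and forecast two steps ahead
models <- lapply(split(df, df$iso3), function(g) {
  g <- g[order(g$year), ]
  ses(ts(g$value, start = min(g$year)), h = 2)
})

# The result is a list with one forecast object per group
names(models)
```

With many groups this list grows accordingly, which is one reason a single mdl object is not a natural return value in the grouped case.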