Skip to contents

Merge average df with predictions with original data frame

Usage

merge_average_df(
  avg_df,
  df,
  response,
  average_cols,
  group_col,
  obs_filter,
  sort_col,
  pred_col,
  pred_upper_col,
  pred_lower_col,
  test_col
)

Arguments

avg_df

Data frame with average trends.

df

Data frame of model data.

response

Column name of response variable.

average_cols

Column name(s) of column(s) for use in grouping data for averaging, such as regions. If missing, uses global average of the data for infilling.

group_col

Column name(s) of group(s) to use in dplyr::group_by() when supplying type, calculating mean absolute scaled error on data involving time series, and if group_models, then fitting and predicting models too. If NULL, not used. Defaults to "iso3".

obs_filter

String value of the form "logical operator integer" that specifies the number of observations required to fit the model and replace observations with predicted values. This is done in conjunction with group_col. So, if group_col = "iso3" and obs_filter = ">= 5", then for this model, predictions will only be used for iso3 vales that have 5 or more observations. Possible logical operators to use are >, >=, <, <=, ==, and !=.

If `group_models = FALSE`, then `obs_filter` is only used to determine when
predicted values replace observed values but **is not** used to restrict values
from being used in model fitting. If `group_models = TRUE`, then a model
is only fit for a group if they meet the `obs_filter` requirements. This provides
speed benefits, particularly when running INLA time series using `predict_inla()`.
sort_col

Column name(s) to use to dplyr::arrange() the data prior to supplying type and calculating mean absolute scaled error on data involving time series. If NULL, not used. Defaults to "year".

pred_col

Column name to store predicted value.

pred_upper_col

Column name to store upper bound of confidence interval generated by the predict_... function. This stores the full set of generated values for the upper bound.

pred_lower_col

Column name to store lower bound of confidence interval generated by the predict_... function. This stores the full set of generated values for the lower bound.

test_col

Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See model_error() for details on the methods and metrics returned.

Value

Original data frame with new trend joined up.