`predict_average_fn()` does simple imputation and flat extrapolation using averages grouped by `average_cols`.
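As a rough illustration of the idea (not the package's implementation), the base R sketch below infills missing values with a weighted group average. It assumes that flat extrapolation means carrying the most recent available group average forward. The toy data, the `group_average()` helper, and the `value_filled` column are hypothetical names introduced only for this example.

```r
# Toy data: three countries over three years; NA marks values to infill.
df <- data.frame(
  iso3  = rep(c("AAA", "BBB", "CCC"), times = 3),
  year  = rep(2019:2021, each = 3),
  value = c(10, 20, NA, 12, NA, NA, NA, NA, NA),
  pop   = rep(c(1, 3, 2), times = 3)   # plays the role of weight_col
)

# Hypothetical helper: weighted mean of the observed values in one group,
# or NA when the group has no observations at all.
group_average <- function(x, w) {
  ok <- !is.na(x)
  if (!any(ok)) return(NA_real_)
  sum(x[ok] * w[ok]) / sum(w[ok])
}

# One average per year (the grouping column here stands in for average_cols).
years <- sort(unique(df$year))
avg <- vapply(
  years,
  function(y) group_average(df$value[df$year == y], df$pop[df$year == y]),
  numeric(1)
)
names(avg) <- years

# Assumed meaning of flat extrapolation: a group with no data reuses the
# latest average that is available before it.
for (i in seq_along(avg)[-1]) {
  if (is.na(avg[i])) avg[i] <- avg[i - 1]
}

# Store the group average as the prediction and use it only where the
# observed value is missing.
df$pred <- unname(avg[as.character(df$year)])
df$value_filled <- ifelse(is.na(df$value), df$pred, df$value)
```

In this toy data, 2021 has no observations at all, so its rows reuse the 2020 average. Without the carry-forward step they would stay `NA`.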
Usage
predict_average_fn(
  df,
  col,
  average_cols = NULL,
  weight_col = NULL,
  flat_extrap = TRUE,
  test_col = NULL,
  group_col = NULL,
  obs_filter = NULL,
  pred_col = "pred",
  sort_col = NULL,
  sort_descending = FALSE,
  error_correct = FALSE,
  error_correct_cols = NULL,
  shift_trend = FALSE
)
Arguments
- df
Data frame of model data.
- col
Name of column to extrapolate/interpolate.
- average_cols
Column name(s) of column(s) for use in grouping data for averaging, such as regions. If missing, uses global average of the data for infilling.
- weight_col
Column name of column of weights to be used in averaging, such as country population.
- flat_extrap
Logical value determining whether or not to flat extrapolate using the latest average for missing rows with no data available.
- test_col
Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If `NULL`, ignored. See `model_error()` for details on the methods and metrics returned.
- group_col
Column name(s) of group(s) to use in `dplyr::group_by()` when supplying type, when calculating mean absolute scaled error on data involving time series, and, if `group_models` is `TRUE`, when fitting and predicting models too. If `NULL`, not used. Defaults to `"iso3"`.
- obs_filter
String value of the form "logical operator integer" that specifies the number of observations required to fit the model and replace observations with predicted values. This is done in conjunction with `group_col`. So, if `group_col = "iso3"` and `obs_filter = ">= 5"`, then for this model, predictions will only be used for `iso3` values that have 5 or more observations. Possible logical operators to use are `>`, `>=`, `<`, `<=`, `==`, and `!=`.
If `group_models = FALSE`, then `obs_filter` is only used to determine when predicted values replace observed values but **is not** used to restrict values from being used in model fitting. If `group_models = TRUE`, then a model is only fit for a group if it meets the `obs_filter` requirements. This provides speed benefits, particularly when running INLA time series using `predict_inla()`.
- pred_col
Column name to store predicted value.
- sort_col
Column name(s) to use to `dplyr::arrange()` the data prior to supplying type and calculating mean absolute scaled error on data involving time series. If `NULL`, not used. Defaults to `"year"`.
- sort_descending
Logical value on whether the sorted values from `sort_col` should be sorted in descending order. Defaults to `FALSE`.
- error_correct
Logical value indicating whether or not mean error should be used to adjust predicted values. If `TRUE`, the mean error between observed and predicted data points will be used to adjust predictions. If `error_correct_cols` is not `NULL`, mean error will be used within those groups instead of overall mean error.
- error_correct_cols
Column names of data frame to group by when applying error correction to the predicted values.
- shift_trend
Logical value specifying whether or not to shift predictions so that the trend matches up to the last observation. If `error_correct` and `shift_trend` are both `TRUE`, `shift_trend` takes precedence.
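To make the `obs_filter` format described above more concrete, here is a minimal base R sketch of how a string such as `">= 5"` could be parsed and applied to per-group observation counts. The `passes_obs_filter()` helper and the toy data are hypothetical and are not taken from the package. The grouping by `iso3` mirrors the `group_col = "iso3"` example in the argument description.

```r
# Hypothetical helper: split an obs_filter string such as ">= 5" into an
# operator and an integer, then compare observation counts against it.
passes_obs_filter <- function(n_obs, obs_filter) {
  pattern <- "^[[:space:]]*(>=|<=|==|!=|>|<)[[:space:]]*([0-9]+)[[:space:]]*$"
  parts <- regmatches(obs_filter, regexec(pattern, obs_filter))[[1]]
  if (length(parts) != 3) {
    stop("obs_filter must look like '>= 5'")
  }
  op <- match.fun(parts[2])        # e.g. the `>=` function
  op(n_obs, as.integer(parts[3]))
}

# Toy data: AAA has 5 observed values, BBB has 2.
df <- data.frame(
  iso3  = c(rep("AAA", 6), rep("BBB", 3)),
  value = c(1, 2, NA, 4, 5, 6, 7, NA, 9)
)

# Count non-missing observations per group (iso3 stands in for group_col).
n_obs <- vapply(split(!is.na(df$value), df$iso3), sum, integer(1))

# Groups that meet the requirement; only these would have predictions used.
keep <- passes_obs_filter(n_obs, ">= 5")
eligible <- names(keep)[keep]
```

Validating the string against a fixed set of operators, rather than evaluating it as code, keeps the sketch safe for arbitrary input.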
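The difference between `error_correct` and `shift_trend` can be sketched on a single series. The base R example below is only an illustration under stated assumptions: mean error is taken as observed minus predicted and added to every prediction, and the trend shift adds the gap at the last observed point to every prediction. The `adjust_predictions()` function is a hypothetical name, and the package may apply these adjustments differently (for example, within the groups given by `error_correct_cols`).

```r
obs  <- c(8, 11, 17, NA, NA)    # observed series; the last two points are missing
pred <- c(11, 13, 15, 17, 19)   # raw predictions for every point

# Hypothetical helper showing the documented precedence: when both flags are
# TRUE, the trend shift is applied and the mean-error correction is not.
adjust_predictions <- function(obs, pred,
                               error_correct = FALSE,
                               shift_trend = FALSE) {
  if (shift_trend) {
    # Assumption: shift by the gap at the last observed point.
    last <- max(which(!is.na(obs)))
    pred + (obs[last] - pred[last])
  } else if (error_correct) {
    # Assumption: mean error = mean(observed - predicted) over observed points.
    pred + mean(obs - pred, na.rm = TRUE)
  } else {
    pred
  }
}

corrected <- adjust_predictions(obs, pred, error_correct = TRUE)
shifted   <- adjust_predictions(obs, pred, shift_trend = TRUE)
both      <- adjust_predictions(obs, pred, error_correct = TRUE, shift_trend = TRUE)
```

Here the mean error is -1, so `corrected` lowers every prediction by 1, while `shifted` raises every prediction by 2 so that the third prediction equals the last observed value of 17.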