get_model_data() ensures that only the variables necessary for the model are included in the dataset (when reduce_columns is TRUE), removes the test set if test_col is not NULL, and removes missing data. If filter_na is "all" (the default), any observation with NA values is removed using na.omit(). If filter_na is "response" or "predictors", only rows with missing dependent or independent variables, respectively, are removed. If it is "none", no filtering is done at all.
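The steps described above can be sketched in base R as follows. This is not the package's implementation. filter_model_data_sketch() is a hypothetical name, and the sketch assumes three things: the first element of formula_vars is the response, the remaining elements are predictors, and rows where test_col is TRUE make up the test set that gets dropped.

```r
# Hypothetical sketch of the filtering described above; not the package's code.
# Assumptions: formula_vars[1] is the response, formula_vars[-1] are predictors,
# and rows where test_col is TRUE are the test set to drop.
filter_model_data_sketch <- function(df,
                                     formula_vars,
                                     test_col = NULL,
                                     group_col = NULL,
                                     filter_na = c("all", "response", "predictors", "none"),
                                     reduce_columns = TRUE) {
  filter_na <- match.arg(filter_na)

  # Drop test-set rows first, while the test column is still present
  if (!is.null(test_col)) {
    df <- df[!(df[[test_col]] %in% TRUE), , drop = FALSE]
  }

  # Keep only model variables plus any grouping columns, in their original order
  if (reduce_columns) {
    df <- df[, intersect(names(df), c(formula_vars, group_col)), drop = FALSE]
  }

  response <- formula_vars[1]
  predictors <- formula_vars[-1]

  switch(filter_na,
    # Any NA in any remaining column removes the row
    all = stats::na.omit(df),
    # Only rows with a missing response are removed
    response = df[!is.na(df[[response]]), , drop = FALSE],
    # Only rows with a missing predictor are removed
    predictors = df[stats::complete.cases(df[, predictors, drop = FALSE]), , drop = FALSE],
    # No filtering
    none = df
  )
}
```

For example, with a response y, a predictor x, and a logical test column, filter_model_data_sketch(df, c("y", "x"), test_col = "test", filter_na = "response") keeps the non-test rows whose y is not missing and returns only the y and x columns.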
Usage
get_model_data(
df,
formula_vars,
test_col,
group_col = NULL,
filter_na,
reduce_columns = TRUE
)
Arguments
- df
Data frame of model data.
- formula_vars
Character vector of variables used in the model. Can be extracted from a formula using all.vars(fmla).
- test_col
Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See model_error() for details on the methods and metrics returned.
- group_col
Column name(s) of group(s) to use in dplyr::group_by() when supplying type, when calculating mean absolute scaled error on data involving time series, and, if group_models, when fitting and predicting models. If NULL, not used. In get_model_data() it defaults to NULL.
- filter_na
Character value specifying how, if at all, to filter NA values from the dataset before applying the model: one of "all", "response", "predictors", or "none". By default ("all"), all observations with missing values are removed; "response" and "predictors" remove only rows with missing dependent or independent variables, respectively, and "none" applies no filtering.
- reduce_columns
Logical indicating whether to reduce the columns in the data to only those necessary for modelling.
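As the formula_vars entry notes, the variable names can be pulled from a model formula with base R's all.vars(). The short example below shows that, and also builds a logical column that could be passed as test_col. The data frame and the choice to hold out the last two rows are purely illustrative assumptions.

```r
# Extract the variable names from a formula; function names such as log() are dropped
fmla <- y ~ x1 + log(x2)
formula_vars <- all.vars(fmla)
formula_vars  # c("y", "x1", "x2")

# Illustrative data with a hypothetical logical test column:
# TRUE marks the last two observations as held out for testing
df <- data.frame(y = 1:6, x1 = 6:1, x2 = c(2, 4, 6, 8, 10, 12))
df$test <- seq_len(nrow(df)) > nrow(df) - 2
```

Here formula_vars and the name "test" would be the values supplied to the formula_vars and test_col arguments.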