model_error() calculates modeling error using observed and fitted values from the data frame. If test_col is provided, the error is only calculated on observations that were excluded from modeling for testing purposes. Otherwise, the error is calculated for all non-missing values.
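The row-selection rule above can be illustrated with a small base R sketch. The helper select_error_rows() is hypothetical and not part of augury. It assumes that the response column still holds the observed values for held-out rows, that test_col is TRUE for those rows, and that "non-missing" means both the response and the prediction are present.

```r
# Hypothetical helper, NOT part of augury's API.
# Returns the rows on which error metrics would be computed:
# held-out rows only if test_col is given, otherwise every row
# with a non-missing response and prediction (assumption).
select_error_rows <- function(df, response, test_col = NULL, pred_col = "pred") {
  keep <- !is.na(df[[response]]) & !is.na(df[[pred_col]])
  if (!is.null(test_col)) {
    # Only observations excluded from modeling (test_col == TRUE) are scored;
    # NA in test_col is treated as FALSE.
    keep <- keep & df[[test_col]] %in% TRUE
  }
  df[keep, , drop = FALSE]
}

df <- data.frame(
  value = c(1, 2, NA, 4, 5),
  pred  = c(1.1, 1.9, 3.0, 4.2, 4.8),
  test  = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)
all_rows  <- select_error_rows(df, "value")                    # NA response dropped
test_rows <- select_error_rows(df, "value", test_col = "test") # held-out rows only
```

Here all_rows keeps four rows (only the missing response is dropped), while test_rows keeps the two held-out rows that have an observed value.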
Usage
model_error(
df,
response,
test_col = NULL,
test_period = NULL,
test_period_flex = FALSE,
group_col = NULL,
sort_col = NULL,
sort_descending = FALSE,
pred_col = "pred",
pred_upper_col = "pred_upper",
pred_lower_col = "pred_lower"
)
Arguments
- df
Data frame of model data.
- response
Column name of response variable.
- test_col
Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See Details for the methods and metrics returned.
- test_period
Length of period to test for RMChE. If NULL, the beginning and end points of each group in group_col are compared. Otherwise, test_period must be set to an integer n, and for each group, comparisons are made between the end point and n periods prior.
- test_period_flex
Logical value indicating whether change error should still be calculated for a group whose series is shorter than test_period. If TRUE, the beginning and end of that group's series are compared instead. Defaults to FALSE.
- group_col
Column name(s) of group(s) to use in dplyr::group_by() when supplying type, calculating mean absolute scaled error on data involving time series, and, if group_models, fitting and predicting models too. In model_error(), the groups also determine the within-group COR and the series compared for RMChE (see Details). If NULL, not used. Defaults to NULL.
- sort_col
Column name(s) to use to dplyr::arrange() the data prior to supplying type and calculating mean absolute scaled error on data involving time series. If NULL, not used. Defaults to NULL.
- sort_descending
Logical value indicating whether the data should be sorted by sort_col in descending order. Defaults to FALSE.
- pred_col
Column name containing the predicted (fitted) values.
- pred_upper_col
Column name containing the upper bound of the confidence interval generated by the predict_... function. This column holds the full set of generated values for the upper bound.
- pred_lower_col
Column name containing the lower bound of the confidence interval generated by the predict_... function. This column holds the full set of generated values for the lower bound.
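The change comparison controlled by test_period, test_period_flex, group_col, sort_col, and sort_descending can be sketched in base R as follows. change_errors() is a hypothetical helper, not augury's implementation. It assumes complete series (no missing values), that "n periods prior" means n rows earlier after sorting within each group, and that RMChE is the root mean square of the per-group change errors.

```r
# Hypothetical sketch, NOT augury's implementation.
# Assumptions: complete series (no NAs), and "n periods prior" means
# n rows earlier after sorting each group by sort_col.
change_errors <- function(df, response, pred_col = "pred",
                          group_col = NULL, sort_col = NULL,
                          sort_descending = FALSE,
                          test_period = NULL, test_period_flex = FALSE) {
  # One data frame per group (or the whole data if group_col is NULL).
  groups <- if (is.null(group_col)) list(all = df) else split(df, df[group_col], drop = TRUE)
  vapply(groups, function(g) {
    if (!is.null(sort_col)) {
      ord <- do.call(order, c(unname(as.list(g[sort_col])),
                              list(decreasing = sort_descending)))
      g <- g[ord, , drop = FALSE]
    }
    end <- nrow(g)
    # NULL test_period: compare the beginning and end of the series.
    start <- if (is.null(test_period)) 1L else end - test_period
    if (start < 1L) {
      # Series too short for test_period: skip it unless flex is allowed,
      # in which case fall back to beginning vs. end.
      if (!test_period_flex) return(NA_real_)
      start <- 1L
    }
    obs_change  <- g[[response]][end] - g[[response]][start]
    pred_change <- g[[pred_col]][end] - g[[pred_col]][start]
    obs_change - pred_change
  }, numeric(1))
}

ts_df <- data.frame(
  iso3  = rep(c("AAA", "BBB"), each = 4),
  year  = rep(2000:2003, times = 2),
  value = c(10, 12, 13, 15, 5, 5, 6, 8),
  pred  = c(10, 11, 13, 14, 5, 6, 6, 7)
)
errs  <- change_errors(ts_df, "value", group_col = "iso3", sort_col = "year")
rmche <- sqrt(mean(errs^2, na.rm = TRUE))
```

Both groups here have a change error of 1, so rmche is 1. With test_period = 5, each four-row group is too short and yields NA unless test_period_flex = TRUE.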
Details
The error metrics generated from model_error() are the following:
RMSE: root mean squared error
MAE: mean absolute error
MdAE: median absolute error
MASE: mean absolute scaled error. Only calculated if test_col is provided, as it is test error scaled by in-sample error.
CBA: confidence bound accuracy, the percentage of observations lying within the confidence bounds. Should be very near 95%. Only calculated if both pred_upper_col and pred_lower_col are provided.
R2: R-squared, or coefficient of determination. Calculated only on test values if test_col is provided. Due to the variety of models available within augury, as well as the predict_..._avg_trend() functions, adjusted R-squared is not currently available.
COR: Pearson correlation coefficient of fitted values to observations. Useful as a measure of general trend matching beyond the point error measurements above. If group_col is provided, correlation coefficients are calculated within each group and the average across all groups is returned. Calculated on all data, but be careful when interpreting it for non-time series data.
RMChE: root mean change error. Since the GPW13 infilling and projections are designed to estimate change over time, RMChE measures the accuracy of this change. It is calculated from the difference between the observed change between two time periods and the predicted change across those same time periods. If test_period is NULL, the two periods are the beginning and end of each group from group_col, sorted by sort_col. If test_period is provided as an integer n, change is instead compared between the end point and n periods prior. test_period_flex determines whether the change is still calculated when the full length of the series is less than test_period. If TRUE, change is compared between the beginning and end of the series for that group.
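The point metrics above follow standard definitions, so they can be sketched in base R on plain numeric vectors. point_metrics() and grouped_cor() are hypothetical helpers, not augury's code. They assume the rows have already been restricted as described at the top of this page, use the usual 1 - SS_res / SS_tot form of R-squared, and report CBA as a percentage. MASE is left out because the exact in-sample scaling augury applies is not specified here.

```r
# Hypothetical helpers, NOT augury's code. Standard metric definitions on
# numeric vectors; rows are assumed to be pre-filtered (e.g. test rows only).
# Note: augury computes COR on all data; here it uses whatever is passed in.
point_metrics <- function(obs, pred, lower = NULL, upper = NULL) {
  err <- obs - pred
  out <- c(
    RMSE = sqrt(mean(err^2)),                         # root mean squared error
    MAE  = mean(abs(err)),                            # mean absolute error
    MdAE = stats::median(abs(err)),                   # median absolute error
    R2   = 1 - sum(err^2) / sum((obs - mean(obs))^2), # 1 - SS_res / SS_tot
    COR  = stats::cor(obs, pred)                      # Pearson correlation
  )
  if (!is.null(lower) && !is.null(upper)) {
    # CBA: percentage of observations inside [lower, upper].
    out["CBA"] <- 100 * mean(obs >= lower & obs <= upper)
  }
  out
}

# COR with groups: Pearson correlation within each group, averaged across groups.
grouped_cor <- function(obs, pred, group) {
  per_group <- vapply(split(seq_along(obs), group),
                      function(i) stats::cor(obs[i], pred[i]),
                      numeric(1))
  mean(per_group, na.rm = TRUE)
}

obs  <- c(1, 2, 3, 4)
pred <- c(1, 2, 3, 5)
m <- point_metrics(obs, pred, lower = pred - 0.5, upper = pred + 0.5)
```

Only the last observation misses its prediction, by 1, giving RMSE 0.5, MAE 0.25, MdAE 0, R2 0.8, and CBA 75 (three of four observations fall inside their bounds).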