Use linear interpolation and flat extrapolation to infill data
Source: R/predict_simple.R
predict_simple.Rd
predict_simple() does simple linear interpolation and/or flat extrapolation on a column using zoo::na.approx(). Similar to other predict functions, it also allows filling in of type and source if necessary. However, unlike the other model-based predict_... functions, it does not provide confidence bounds on the estimates.
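The interpolation and extrapolation behaviour described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's implementation: it uses stats::approx(), which zoo::na.approx() builds on, to avoid a package dependency, and the toy year and value vectors are made up for the example.

```r
# Toy series (hypothetical data): gaps before, between and after observations.
year  <- 2000:2006
value <- c(NA, 10, NA, 14, 18, NA, NA)
obs   <- !is.na(value)

# Linear interpolation only: rule = 1 leaves points outside the
# observed range as NA, so only the interior gap (2002) is filled.
linear_interp <- approx(year[obs], value[obs], xout = year, rule = 1)$y
# NA 10 12 14 18 NA NA

# Interpolation plus flat extrapolation forward: rule = c(1, 2) keeps
# NA before the first observation and holds the last value constant after it.
forward <- approx(year[obs], value[obs], xout = year, rule = c(1, 2))$y
# NA 10 12 14 18 18 18

# Interpolation plus flat extrapolation in both directions: rule = 2
# holds the nearest observed value constant on each side.
both_sides <- approx(year[obs], value[obs], xout = year, rule = 2)$y
# 10 10 12 14 18 18 18
```

Interpolating against year rather than row position matters when the years are unevenly spaced, since the slope is then computed over the actual time gap.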
Usage
predict_simple(
  df,
  model = c("forward", "all", "flat_extrap", "linear_interp", "back_extrap",
    "both_extrap"),
  col = "value",
  ret = c("df", "all", "error"),
  test_col = NULL,
  test_period = NULL,
  test_period_flex = NULL,
  group_col = "iso3",
  obs_filter = NULL,
  sort_col = "year",
  sort_descending = FALSE,
  pred_col = "pred",
  type_col = NULL,
  types = c("imputed", "imputed", "projected"),
  source_col = NULL,
  source = NULL,
  scenario_detail_col = NULL,
  scenario_detail = NULL,
  replace_obs = c("missing", "none")
)
Arguments
- df
Data frame of model data.
- model
Type of simple extrapolation or interpolation to perform:
  - forward: Just flat_extrap and linear_interp. (default)
  - all: All of flat_extrap, linear_interp, and back_extrap.
  - flat_extrap: Flat extrapolation from latest observed point.
  - linear_interp: Linear interpolation between observed data points.
  - back_extrap: Flat extrapolation from first observed data point backwards.
  - both_extrap: Both flat_extrap and back_extrap.
- col
Name of column to extrapolate/interpolate.
- ret
Character vector specifying what values the function returns. Defaults to returning a data frame, but can return a vector of model error, the model itself or a list with all 3 as components.
- test_col
Name of logical column specifying which response values to remove for testing the model's predictive accuracy. If NULL, ignored. See model_error() for details on the methods and metrics returned.
- test_period
Length of period to test for RMChE. If NULL, beginning and end points of each group in group_col are compared. Otherwise, test_period must be set to an integer n and, for each group, comparisons are made between the end point and n periods prior.
- test_period_flex
Logical value indicating whether change error should still be calculated for a point when test_period is less than the full length of the series. Defaults to FALSE.
- group_col
Column name(s) of group(s) to use in dplyr::group_by() when supplying type, calculating mean absolute scaled error on data involving time series, and if group_models, then fitting and predicting models too. If NULL, not used. Defaults to "iso3".
- obs_filter
String value of the form "logical operator integer" that specifies the number of observations required to fit the model and replace observations with predicted values. This is done in conjunction with group_col. So, if group_col = "iso3" and obs_filter = ">= 5", then for this model, predictions will only be used for iso3 values that have 5 or more observations. Possible logical operators to use are >, >=, <, <=, ==, and !=.
If `group_models = FALSE`, then `obs_filter` is only used to determine when predicted values replace observed values but **is not** used to restrict values from being used in model fitting. If `group_models = TRUE`, then a model is only fit for a group if it meets the `obs_filter` requirements. This provides speed benefits, particularly when running INLA time series using `predict_inla()`.
- sort_col
Column name(s) to use to dplyr::arrange() the data prior to supplying type and calculating mean absolute scaled error on data involving time series. If NULL, not used. Defaults to "year". For predict_simple(), the first value in sort_col is passed to zoo::na.approx() as xout to ensure linear interpolation is based on sort_col indexing rather than default data frame indexing.
- sort_descending
Logical value on whether the sorted values from sort_col should be sorted in descending order. Defaults to FALSE.
- pred_col
Column name to store predicted value.
- type_col
Column name specifying data type.
- types
Vector of length 3 that provides the type to assign to data produced in the model. These values are only used to fill in type values where the dependent variable is missing. The first value is given to missing observations that precede the first observation, the second to those between the first and last observations, and the third to those following the final observation.
- source_col
Column name containing source information for the data frame. If provided, the argument in source is used to fill in where predictions have filled in missing data.
- source
Source to add to missing values.
- scenario_detail_col
Column name containing scenario_detail information for the data frame. If provided, the argument in scenario_detail is used to fill in where predictions have filled in missing data.
- scenario_detail
Scenario details to add to missing values (usually the name of the model being used to generate the projection, optionally with relevant parameters).
- replace_obs
Character value specifying how, if at all, observations should be replaced by infilled values. By default, replaces missing values in col, but if set to "none" then col is not changed.
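To make the obs_filter argument described above more concrete, here is a minimal base R sketch of how a string such as ">= 5" could be applied per group. It is an illustration only, not the package's internal code: the toy data frame and the helper names (parts, op, n_required, n_obs, keep) are all hypothetical.

```r
# Toy data (hypothetical): AFG has 5 observed values, BRA has 2.
df <- data.frame(
  iso3  = rep(c("AFG", "BRA"), each = 6),
  year  = rep(2000:2005, times = 2),
  value = c(1, NA, 3, 4, 5, 6, NA, NA, 2, NA, NA, 4)
)
obs_filter <- ">= 5"

# Split the string into its comparison operator and integer threshold.
pattern <- "^[[:space:]]*(>=|<=|==|!=|>|<)[[:space:]]*([0-9]+)[[:space:]]*$"
parts <- regmatches(obs_filter, regexec(pattern, obs_filter))[[1]]
op <- match.fun(parts[2])            # e.g. the function `>=`
n_required <- as.integer(parts[3])   # e.g. 5

# Count non-missing observations within each group.
n_obs <- vapply(split(!is.na(df$value), df$iso3), sum, integer(1))

# Groups passing the filter are the ones whose predictions would be used.
keep <- op(n_obs, n_required)
# AFG: TRUE, BRA: FALSE
```

With group_col = "iso3", only AFG satisfies ">= 5" in this toy data, so under the documented behaviour only its missing values would be replaced by predictions.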
Value
Depending on the value passed to ret, either a data frame with predicted data, a vector of errors from model_error(), a fitted model, or a list with all 3.