Forecast with multivariate models — forecast_multivariate

Estimates out-of-sample multivariate forecasts. Available methods are vector autoregression, linear regression, lasso regression, ridge regression, elastic net, random forest, tree-based gradient boosting machine, and single-layer neural network. The function takes a data frame containing the target variable, exogenous variables, and a 'date' column, and returns a data frame with a date column and one column per forecast method selected.

forecast_multivariate(
  Data,
  forecast.dates,
  target,
  horizon,
  method,
  rolling.window = NA,
  freq,
  lag.variables = NULL,
  lag.n = NULL,
  outlier.clean = FALSE,
  outlier.variables = NULL,
  outlier.bounds = c(0.05, 0.95),
  outlier.trim = FALSE,
  outlier.cross_section = FALSE,
  impute.missing = FALSE,
  impute.method = "kalman",
  impute.variables = NULL,
  impute.verbose = FALSE,
  reduce.data = FALSE,
  reduce.variables = NULL,
  reduce.ncomp = NULL,
  reduce.standardize = TRUE,
  parallel.dates = NULL,
  return.models = FALSE,
  return.data = FALSE
)

Arguments

Data	data.frame: data frame of target variable, exogenous variables, and observed date (named 'date'); may alternatively be a `ts`, `xts`, or `zoo` object to forecast
forecast.dates	date: vector of dates on which forecasts are created
target	string: column name in Data of variable to forecast
horizon	int: number of periods into the future to forecast
method	string or vector: methods to use; 'var', 'ols', 'ridge', 'lasso', 'elastic', 'RF', 'GBM', 'NN'
rolling.window	int: size of rolling window, NA if expanding window is used
freq	string: time series frequency; day, week, month, quarter, year
lag.variables	string: vector of variables to lag each time step; if lag.n is not NULL, the default is all non-date variables
lag.n	int: number of lags to create
outlier.clean	boolean: if TRUE then clean outliers
outlier.variables	string: vector of variables to clean of outliers, default is all but 'date' column
outlier.bounds	double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile)
outlier.trim	boolean: if TRUE then replace outliers with NA instead of the winsorizing bound
outlier.cross_section	boolean: if TRUE then remove outliers based on cross-section (row-wise) instead of historical data (column-wise)
impute.missing	boolean: if TRUE then impute missing values
impute.method	string: select which method to use from the imputeTS package; 'interpolation', 'kalman', 'locf', 'ma', 'mean', 'random', 'remove', 'replace', 'seadec', 'seasplit'
impute.variables	string: vector of variables to impute missing values, default is all numeric columns
impute.verbose	boolean: show start-up status of impute.missing.routine
reduce.data	boolean: if TRUE then reduce dimension
reduce.variables	string: vector of variables to reduce, default is all numeric columns
reduce.ncomp	int: number of factors to create
reduce.standardize	boolean: normalize variables (mean zero, variance one) before estimating factors
parallel.dates	int: the number of cores available for parallel estimation
return.models	boolean: if TRUE then return list of models estimated at each forecast.date
return.data	boolean: if TRUE then return list of information.set for each forecast.date
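
To make the outlier arguments concrete, the following is a minimal base R sketch of column-wise cleaning with percentile bounds. The helper `clean_outliers` is hypothetical and is not part of the documented function; it only illustrates what `outlier.bounds` and `outlier.trim` are described as doing, under the assumption that bounds are computed from the non-missing history of each variable.

```r
# Hypothetical helper (not part of the documented function): illustrates
# column-wise outlier handling with percentile bounds.
clean_outliers <- function(x, bounds = c(0.05, 0.95), trim = FALSE) {
  # Percentile cut-offs computed from the observed (non-missing) values
  limits <- stats::quantile(x, probs = bounds, na.rm = TRUE, names = FALSE)
  low  <- !is.na(x) & x < limits[1]
  high <- !is.na(x) & x > limits[2]
  if (trim) {
    # Trimming: outliers become missing
    x[low | high] <- NA
  } else {
    # Winsorizing: outliers are pulled back to the nearest bound
    x[low]  <- limits[1]
    x[high] <- limits[2]
  }
  x
}

x <- c(1:19, 1000)                         # one obvious outlier at the top
winsorized <- clean_outliers(x)            # analogous to outlier.trim = FALSE
trimmed    <- clean_outliers(x, trim = TRUE)  # analogous to outlier.trim = TRUE
```

Trimming leaves gaps in the series, so it pairs naturally with `impute.missing = TRUE`; winsorizing keeps the series complete.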

Value

data.frame with a date column and one column per forecast method selected

Examples

if (FALSE) {
forecast_multivariate(
 Data = data,
 forecast.dates = date.vector,
 target = 'UNRATE',
 horizon = 1,
 method = c('ols','lasso','ridge','elastic','GBM'),
 freq = 'month')}
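
The example above assumes that `data` and `date.vector` already exist. A minimal sketch of how such inputs might be built is shown below, using simulated values (the 'UNRATE' and 'x1' columns here are random numbers, not real observations). The `information_set` helper is hypothetical; it illustrates one plausible reading of `rolling.window`, where NA keeps all history up to the forecast date and an integer keeps only the most recent rows. The function's actual internals may differ.

```r
# Simulated monthly data; values are random, for illustration only
set.seed(1)
dates <- seq(as.Date("2015-01-01"), by = "month", length.out = 60)
data <- data.frame(
  date   = dates,
  UNRATE = cumsum(rnorm(60)),
  x1     = rnorm(60)
)

# Forecast origins: the last 12 observed months
date.vector <- tail(data$date, 12)

# Hypothetical helper: the data assumed to be visible at one forecast date
information_set <- function(Data, forecast.date, rolling.window = NA) {
  # Keep only observations dated on or before the forecast date
  history <- Data[Data$date <= forecast.date, , drop = FALSE]
  if (!is.na(rolling.window)) {
    # Rolling window: keep only the most recent rolling.window rows
    history <- tail(history, rolling.window)
  }
  history
}

expanding <- information_set(data, date.vector[1])
rolling   <- information_set(data, date.vector[1], rolling.window = 24)
```

With these objects in place, `data` and `date.vector` can be passed as `Data` and `forecast.dates` in the call shown above.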