forecast_multivariate.Rd
A function to estimate multivariate forecasts out-of-sample. Available methods are vector autoregression, linear regression, lasso regression, ridge regression, elastic net, random forest, tree-based gradient boosting machine, and single-layer neural network. The function takes a data frame containing the target variable, exogenous variables, and a 'date' column, and returns a data frame with a date column and one column per selected forecast method.
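The out-of-sample procedure can be sketched in base R for a single method. The helper `oos_ols_sketch` is hypothetical and is not the function's implementation. It assumes a direct h-step OLS projection (the target `horizon` periods ahead regressed on the current exogenous variables). At each forecast date it uses only rows dated on or before that date as the information set, and it trims that set to the last `rolling.window` rows when a rolling window is requested. Its output has the same shape the function returns: a date column plus one column per method (here only `ols`).

```r
# Hypothetical sketch, not the package's code: out-of-sample OLS forecasts
# with an expanding (rolling.window = NA) or rolling window.
oos_ols_sketch <- function(Data, forecast.dates, target, horizon,
                           rolling.window = NA) {
  Data <- Data[order(Data$date), ]
  regressors <- setdiff(names(Data), c("date", target))
  preds <- vapply(seq_along(forecast.dates), function(i) {
    # information set: only observations dated on or before the forecast date
    info <- Data[Data$date <= forecast.dates[i], ]
    # rolling window: keep only the most recent rolling.window rows
    if (!is.na(rolling.window)) info <- tail(info, rolling.window)
    n <- nrow(info)
    # direct projection (assumption): target at t + horizon on regressors at t
    train <- data.frame(
      target.lead = info[[target]][(1 + horizon):n],
      info[seq_len(n - horizon), regressors, drop = FALSE]
    )
    fit <- lm(target.lead ~ ., data = train)
    # forecast from the latest observed regressors
    unname(predict(fit, newdata = info[n, regressors, drop = FALSE]))
  }, numeric(1))
  data.frame(date = forecast.dates, ols = preds)
}

# Usage on simulated monthly data where y depends on last month's x1
set.seed(1)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 120)
x1 <- rnorm(120)
y <- c(0, 0.5 * head(x1, -1)) + rnorm(120, sd = 0.1)
Data <- data.frame(date = dates, y = y, x1 = x1)
expanding <- oos_ols_sketch(Data, tail(dates, 12), target = "y", horizon = 1)
rolling <- oos_ols_sketch(Data, tail(dates, 12), target = "y", horizon = 1,
                          rolling.window = 36)
```

An expanding window uses every observation available at each forecast date, while a rolling window discards older observations so that the model adapts faster to structural change at the cost of a smaller sample.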
forecast_multivariate( Data, forecast.dates, target, horizon, method, rolling.window = NA, freq, lag.variables = NULL, lag.n = NULL, outlier.clean = FALSE, outlier.variables = NULL, outlier.bounds = c(0.05, 0.95), outlier.trim = FALSE, outlier.cross_section = FALSE, impute.missing = FALSE, impute.method = "kalman", impute.variables = NULL, impute.verbose = FALSE, reduce.data = FALSE, reduce.variables = NULL, reduce.ncomp = NULL, reduce.standardize = TRUE, parallel.dates = NULL, return.models = FALSE, return.data = FALSE )
Argument | Description |
---|---|
Data | data.frame: data frame of target variable, exogenous variables, and observed date (named 'date'); may alternatively be a |
forecast.dates | date: vector of dates on which forecasts are created |
target | string: column name in Data of variable to forecast |
horizon | int: number of periods into the future to forecast |
method | string or vector: methods to use; 'var', 'ols', 'ridge', 'lasso', 'elastic', 'RF', 'GBM', 'NN' |
rolling.window | int: size of the rolling window; NA if an expanding window is used |
freq | string: time series frequency; day, week, month, quarter, year |
lag.variables | string: vector of variables to lag at each time step; if lag.n is not NULL, the default is all non-date variables |
lag.n | int: number of lags to create |
outlier.clean | boolean: if TRUE then clean outliers |
outlier.variables | string: vector of variables to clean of outliers, default is all but 'date' column |
outlier.bounds | double: vector of winsorizing minimum and maximum bounds, c(min percentile, max percentile) |
outlier.trim | boolean: if TRUE then replace outliers with NA instead of winsorizing bound |
outlier.cross_section | boolean: if TRUE then remove outliers based on cross-section (row-wise) instead of historical data (column-wise) |
impute.missing | boolean: if TRUE then impute missing values |
impute.method | string: select which method to use from the imputeTS package; 'interpolation', 'kalman', 'locf', 'ma', 'mean', 'random', 'remove', 'replace', 'seadec', 'seasplit' |
impute.variables | string: vector of variables to impute missing values, default is all numeric columns |
impute.verbose | boolean: show start-up status of impute.missing.routine |
reduce.data | boolean: if TRUE then reduce dimension |
reduce.variables | string: vector of variables to include in the dimension reduction, default is all numeric columns |
reduce.ncomp | int: number of factors to create |
reduce.standardize | boolean: normalize variables (mean zero, variance one) before estimating factors |
parallel.dates | int: the number of cores available for parallel estimation |
return.models | boolean: if TRUE then return list of models estimated at each forecast.date |
return.data | boolean: if TRUE then return list of information.set for each forecast.date |
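Several of the preprocessing arguments above can be illustrated with small base-R helpers. These are hypothetical sketches, not the package's internals. `add_lags` creates `lag.n` lagged copies of each variable in `lag.variables` (the `.l<k>` column naming is an assumption). `winsorize` clamps a column to the `outlier.bounds` quantiles of its own history, or sets values outside them to NA when `outlier.trim = TRUE`. `pca_factors` calls `stats::prcomp` to extract `reduce.ncomp` factors, standardizing first when `reduce.standardize = TRUE`; that the package builds its factors with principal components is likewise an assumption here.

```r
# Hypothetical helpers, not the package's internals.

# lag.variables / lag.n: add lagged copies named <variable>.l<k> (naming assumed)
add_lags <- function(df, variables, n) {
  for (v in variables) {
    for (k in seq_len(n)) {
      # shift the series down k rows and pad the start with NA
      df[[paste0(v, ".l", k)]] <- c(rep(NA, k), head(df[[v]], -k))
    }
  }
  df
}

# outlier.bounds / outlier.trim: column-wise (historical) winsorizing
winsorize <- function(x, bounds = c(0.05, 0.95), trim = FALSE) {
  q <- quantile(x, probs = bounds, na.rm = TRUE, names = FALSE)
  if (trim) {
    x[which(x < q[1] | x > q[2])] <- NA  # outlier.trim = TRUE: drop outliers
  } else {
    x <- pmin(pmax(x, q[1]), q[2])        # otherwise clamp to the bounds
  }
  x
}

# reduce.ncomp / reduce.standardize: principal-component factors (assumed method)
pca_factors <- function(df, ncomp, standardize = TRUE) {
  pc <- prcomp(df, center = TRUE, scale. = standardize)
  as.data.frame(pc$x[, seq_len(ncomp), drop = FALSE])
}

# Usage on simulated data
set.seed(2)
sim <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
lagged <- add_lags(sim, c("a", "b"), n = 2)
names(lagged)
# "a" "b" "c" "a.l1" "a.l2" "b.l1" "b.l2"
clamped <- winsorize(sim$a)
trimmed <- winsorize(sim$a, trim = TRUE)
factors <- pca_factors(sim, ncomp = 2)
```

Winsorizing keeps every observation but caps its influence, whereas trimming removes the extreme values and leaves gaps that the imputation options can then fill.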
data.frame with a date column and one column per forecast method selected
if (FALSE) { forecast_multivariate( Data = data, forecast.dates = date.vector, target = 'UNRATE', horizon = 1, method = c('ols','lasso','ridge','elastic','GBM'), freq = 'month')}
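The example relies on two objects it does not define, `data` and `date.vector`. A sketch of inputs with the layout the arguments describe, built from random numbers rather than real unemployment data (the regressor names `x1` and `x2` are placeholders):

```r
# Simulated inputs for the example: a 'date' column, the target 'UNRATE',
# and two placeholder exogenous regressors. Values are random, not real data.
set.seed(42)
n <- 240
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = n)
data <- data.frame(
  date   = dates,
  UNRATE = 5 + cumsum(rnorm(n, sd = 0.1)),
  x1     = rnorm(n),
  x2     = rnorm(n)
)
# forecast dates: the last 24 months of the sample
date.vector <- tail(dates, 24)
```

Each forecast date should fall inside the sample so that an information set of earlier observations exists for it.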