2_forecasting.Rmd
Now that we have properly cleaned our data, we next turn to demonstrating OOS forecasting routines. This section, covering the heart of OOS, will proceed by 1) creating forecasts with a collection of univariate time series models, 2) creating forecasts with a collection of multivariate time series models, and 3) combining both univariate and multivariate forecasting model outputs to create a collection of forecast combinations (ensemble forecasts in the nomenclature of machine learning).
We will first forecast using a collection of univariate time series models. As we have chosen to forecast a very clean series, we will not make use of OOS’s ability to clean outliers and impute missing values in a real-time fashion. That is, we only need to focus on the bare-bones requirements of OOS’s univariate forecasting routine, forecast_univariate:
a ts, xts, or zoo time series object, or a two-column data.frame with a date column and the time series to forecast
For our demonstration, we will simulate forecasts of the unemployment rate one month into the future, using a random walk, ARIMA, and exponential smoothing, over a five-year period from 2015 through 2019.
# run univariate forecasts
forecast.uni =
  forecast_univariate(
    # forecasting data
    Data = dplyr::select(Data, date, UNRATE),
    forecast.dates =
      seq.Date(from = as.Date('2015-01-01'),
               to = as.Date('2019-12-01'),
               by = 'month'),
    # forecast method and type
    method = c('naive','auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    # information set treatment
    rolling.window = NA,
    freq = 'month')
## <warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>
One may notice that after running forecast_univariate we received the warning:
<warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>
This warns us that we are using default parameters when training the random walk, ARIMA, and exponential smoothing models. More on this topic, including how a user may change the training parameters, will be covered in a separate vignette. As such, this type of warning will be suppressed for the remainder of the exercise. We next examine our output.
# view top of forecast output
head(forecast.uni)
##        model forecast        se forecast.date       date
## 1      naive 5.700000 0.1917706    2015-01-01 2015-02-01
## 2 auto.arima 5.653460 0.1796099    2015-01-01 2015-02-01
## 3        ets 5.653691 0.1790950    2015-01-01 2015-02-01
## 4      naive 5.500000 0.1917821    2015-02-01 2015-03-01
## 5 auto.arima 5.474782 0.1795762    2015-02-01 2015-03-01
## 6        ets 5.477820 0.1790102    2015-02-01 2015-03-01
As one can see, we now have five years’ worth of one-month-ahead forecasts for the US unemployment rate! The default output for OOS forecasting routines, forecast_univariate, forecast_multivariate, and forecast_combine, is a long-form data.frame with the columns shown in the printed output (here model, forecast, se, forecast.date, and date).
Note that when forecasts are generated recursively for a horizon greater than one, all forecasts between forecast.date and date will be provided in the forecast output as well as the declared horizon.
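To make the mechanics of a one-step-ahead out-of-sample forecast concrete, the following is a minimal base R sketch of an expanding-window random walk forecast. It is an illustration under stated assumptions, not a description of how forecast_univariate is implemented: the toy series is simulated, the object names (toy, naive.forecasts) are hypothetical, and the se column is omitted.

```r
# Toy monthly series (simulated; NOT the UNRATE data used in this vignette)
set.seed(1)
toy <- data.frame(
  date  = seq.Date(as.Date('2014-01-01'), as.Date('2015-06-01'), by = 'month'),
  value = 5 + cumsum(rnorm(18, sd = 0.1))
)

# dates on which forecasts are made
forecast.dates <- seq.Date(as.Date('2015-01-01'), as.Date('2015-03-01'), by = 'month')

naive.forecasts <- do.call(rbind, lapply(seq_along(forecast.dates), function(i) {
  fd <- forecast.dates[i]
  # information set: only data observed up to and including the forecast date
  info <- toy[toy$date <= fd, ]
  data.frame(
    model         = 'naive',
    # random walk: carry the last observed value forward
    forecast      = tail(info$value, 1),
    forecast.date = fd,
    # the date being forecasted, one month after the forecast date
    date          = seq.Date(fd, by = 'month', length.out = 2)[2]
  )
}))

print(naive.forecasts)
```

Each row is built only from data available on its forecast.date, which is what keeps the exercise out-of-sample.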
Having successfully created a pool of univariate forecasts, we next turn to a collection of multivariate forecasting models. While there are several similarities between univariate and multivariate forecasting in OOS, one will note four key differences between forecast_univariate and forecast_multivariate:
1. forecast_multivariate requires one to declare the name of the variable to be forecasted.
2. forecast_multivariate allows the user to create an arbitrary number of lags for a chosen set (although the default is all) of variables in the design matrix.
3. forecast_univariate allows the user to use direct projections or recursive forecasting, while forecast_multivariate only allows for direct projections (this may change in a future version of OOS).
4. forecast_multivariate allows users to perform dimension reduction on a chosen set (although the default is all) of variables in the design matrix, via principal components.
With these differences in mind, we will jump right into our multivariate forecasting with forecast_multivariate.
# create multivariate forecasts
forecast.multi =
  forecast_multivariate(
    Data = Data,
    forecast.date =
      seq.Date(from = as.Date('2015-01-01'),
               to = as.Date('2019-12-01'),
               by = 'month'),
    target = 'UNRATE',
    # forecast method and type
    horizon = 1,
    method = c('ols','elastic','RF'),
    # information set treatment
    rolling.window = NA,
    freq = 'month',
    lag.n = 1)
# view top of forecast output
head(forecast.multi)
##         date forecast.date   model forecast         se
## 1 2015-02-01    2015-01-01     ols 5.656417 0.01631269
## 2 2015-02-01    2015-01-01 elastic 5.708525         NA
## 3 2015-02-01    2015-01-01      RF 5.606573         NA
## 4 2015-03-01    2015-02-01     ols 5.830979 0.02033377
## 5 2015-03-01    2015-02-01 elastic 5.789285         NA
## 6 2015-03-01    2015-02-01      RF 5.776003         NA
And it appears that we again have been successful in forecasting the US unemployment rate!
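The lagging and direct-projection ideas mentioned above can be sketched in base R as follows. This is a rough illustration on simulated data, loosely analogous to calling forecast_multivariate with lag.n = 1 and method 'ols'. Whether OOS builds its design matrix in exactly this way is an assumption not confirmed here, and all object names (toy, design, fit) are hypothetical.

```r
# Simulated target y and one predictor x (NOT the vignette's data)
set.seed(2)
n <- 60
toy <- data.frame(
  date = seq.Date(as.Date('2010-01-01'), by = 'month', length.out = n),
  y    = as.numeric(arima.sim(list(ar = 0.8), n = n)) + 5,
  x    = rnorm(n)
)

h <- 1  # forecast horizon

# design matrix: variables observed at time t, plus one lag of each
design <- data.frame(
  y      = toy$y,
  x      = toy$x,
  y.lag1 = c(NA, head(toy$y, -1)),
  x.lag1 = c(NA, head(toy$x, -1))
)

# direct projection: the regression target is y at time t + h
design$target <- c(tail(toy$y, -h), rep(NA, h))

# estimate on rows where both predictors and target are observed
train <- design[complete.cases(design), ]
fit   <- lm(target ~ y + x + y.lag1 + x.lag1, data = train)

# forecast from the most recent row, whose target is not yet observed
forecast <- predict(fit, newdata = design[n, ])
print(forecast)
```

A direct projection estimates one model per horizon and predicts the target h steps ahead in a single step, rather than iterating a one-step model forward as a recursive forecast would.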
It is no secret that while one forecast can be good, several forecasts can be great. We next turn to combining forecasts through a series of (out-of-sample) forecast combination techniques using the OOS forecast_combine function.
To create a set of forecast combinations, we will first need to merge our existing forecasts into one data.frame. Additionally, as we will use combination methods that require minimizing a loss function, we will also merge in the true data realizations - although this is not necessary when a user relies on methods that do not need to learn (e.g. uniform weights or the forecast median).
# combine forecasts and add in observed values
forecasts =
  dplyr::bind_rows(
    forecast.uni,
    forecast.multi) %>%
  dplyr::left_join(
    dplyr::select(Data, date, observed = UNRATE),
    by = 'date')
Now that we have our forecasts all in one neat package, we may turn to creating forecast combinations.
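As a small illustration of the merge step, the sketch below stacks two toy forecast tables whose columns appear in different orders, as the univariate and multivariate outputs do, and then joins in realized values. All numbers and object names (uni, multi, observed, stacked) are made up for illustration. The point is that dplyr::bind_rows matches columns by name, so the differing column order is harmless.

```r
library(dplyr)

# toy univariate-style output (hypothetical values)
uni <- data.frame(
  model = 'naive', forecast = c(5.7, 5.5), se = c(0.19, 0.19),
  forecast.date = as.Date(c('2015-01-01', '2015-02-01')),
  date = as.Date(c('2015-02-01', '2015-03-01'))
)

# toy multivariate-style output: same columns, different order
multi <- data.frame(
  date = as.Date(c('2015-02-01', '2015-03-01')),
  forecast.date = as.Date(c('2015-01-01', '2015-02-01')),
  model = 'ols', forecast = c(5.66, 5.83), se = c(0.016, 0.020)
)

# toy realized values (hypothetical, not actual UNRATE observations)
observed <- data.frame(
  date = as.Date(c('2015-02-01', '2015-03-01')),
  observed = c(5.5, 5.4)
)

# stack by column name, then attach the realization for each forecasted date
stacked <- bind_rows(uni, multi) %>%
  left_join(observed, by = 'date')

print(stacked)
```

The result keeps one row per model and forecasted date, with the realized value repeated across the models that forecast the same date.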
Two things that a user may wish to note regarding the forecast_combine function are:
forecast_combine is designed specifically to take in output from forecast_univariate and forecast_multivariate; however, as long as a user has their data formatted in the same long-form style as the output of the OOS forecasting functions, they can use forecast_combine.
Bearing these notes in mind, we will combine our univariate and multivariate forecasts using a collection of naive methods (uniform weights, the median forecast, and a winsorized mean) as well as a collection of trained combination models (n.best, LASSO, and peLASSO), with a burn-in of 5 observations.
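As a rough illustration of what the naive combination methods compute, the base R sketch below combines five toy forecasts per date. The data and names (toy, combos) are hypothetical. Note that base R's mean(x, trim = ) drops the most extreme values, which is a trimmed mean rather than a strictly winsorized one; which of the two OOS computes under 'trimmed.mean' is not confirmed here.

```r
# five hypothetical model forecasts for each of two dates, each with one outlier
toy <- data.frame(
  date     = rep(as.Date(c('2015-02-01', '2015-03-01')), each = 5),
  model    = rep(c('a', 'b', 'c', 'd', 'e'), times = 2),
  forecast = c(5.4, 5.5, 5.6, 5.7, 6.8,
               5.3, 5.4, 5.5, 5.6, 4.0)
)

# group the forecasts by the date being forecasted
by.date <- split(toy$forecast, toy$date)

combos <- data.frame(
  date         = as.Date(names(by.date)),
  # equal weights on every model
  uniform      = sapply(by.date, mean),
  # middle forecast, insensitive to the outlier
  median       = sapply(by.date, median),
  # drop the lowest and highest 20% (one forecast each side here), then average
  trimmed.mean = sapply(by.date, mean, trim = 0.2)
)

print(combos)
```

None of these three methods needs the realized values, which is why the observed column is only required for the trained combinations.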
# forecast combinations
forecast.combo =
  forecast_combine(
    forecasts,
    method = c('uniform','median','trimmed.mean',
               'n.best','lasso','peLasso'),
    burn.in = 5,
    n.max = 2)
# merge forecast combinations back into forecasts
# (these will be used later)
forecasts =
  forecasts %>%
  dplyr::bind_rows(forecast.combo)
# view top of forecast output
head(forecast.combo)
##         date forecast         model se
## 1 2015-02-01 5.663111 uniform.combo NA
## 2 2015-03-01 5.641478 uniform.combo NA
## 3 2015-04-01 5.545682 uniform.combo NA
## 4 2015-05-01 5.449030 uniform.combo NA
## 5 2015-06-01 5.560600 uniform.combo NA
## 6 2015-07-01 5.569829 uniform.combo NA
We have forecast combinations for the US unemployment rate! This is nice progress, but now that we have all of these forecasts, how do we know which ones are good and which ones we should use to make decisions?
For that we next turn to OOS’s suite of forecast evaluation metrics.