Now that we have properly cleaned our data, we next turn to demonstrating OOS forecasting routines. This section, covering the heart of OOS, will proceed by 1) creating forecasts with a collection of univariate time series models, 2) creating forecasts with a collection of multivariate time series models, and 3) combining both univariate and multivariate forecasting model outputs to create a collection of forecast combinations (ensemble forecasts in the nomenclature of machine learning).
We will first forecast using a collection of univariate time series models. As we have chosen to forecast a very clean series, we will not make use of OOS’s ability to clean outliers and impute missing values in a real-time fashion. That is, we only need to focus on the bare-bones requirements of OOS’s univariate forecasting routine, forecast_univariate:
zoo time series object, or a two-column data.frame with a date column and the time series to forecast
For our demonstration, we will simulate forecasts of the unemployment rate one month into the future, using a random walk, ARIMA, and exponential smoothing, over a five-year period from 2015 through 2019.
# run univariate forecasts
forecast.uni =
  forecast_univariate(
    # forecasting data
    Data = dplyr::select(Data, date, UNRATE),
    forecast.dates = seq.Date(from = as.Date('2015-01-01'), to = as.Date('2019-12-01'), by = 'month'),
    # forecast method and type
    method = c('naive', 'auto.arima', 'ets'),
    horizon = 1,
    recursive = FALSE,
    # information set treatment
    rolling.window = NA,
    freq = 'month')
## <warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>
One may notice that after running forecast_univariate we received the warning:
<warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>
This warns us that we are using default parameters when training the random walk, ARIMA, and exponential smoothing models. More on this topic, including how a user may change the training parameters, will be covered in a separate vignette. As such, this type of warning will be suppressed for the remainder of the exercise. We next examine our output.
# view top of forecast output
head(forecast.uni)
##        model forecast        se forecast.date       date
## 1      naive 5.700000 0.1917706    2015-01-01 2015-02-01
## 2 auto.arima 5.653460 0.1796099    2015-01-01 2015-02-01
## 3        ets 5.653691 0.1790950    2015-01-01 2015-02-01
## 4      naive 5.500000 0.1917821    2015-02-01 2015-03-01
## 5 auto.arima 5.474782 0.1795762    2015-02-01 2015-03-01
## 6        ets 5.477820 0.1790102    2015-02-01 2015-03-01
As one can see, we now have five years' worth of one-month-ahead forecasts for the US unemployment rate! The default output for OOS forecasting routines, forecast_combine included, is a long-form matrix; the output above has the columns model, forecast, se, forecast.date, and date.
Note that when forecasts are generated recursively for a horizon greater than one, all intermediate forecasts between forecast.date and date will be provided in the forecast output, along with the forecast at the declared horizon.
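To make the recursive case concrete, the following is a minimal sketch in base R on simulated data. It does not call OOS and is not how OOS is implemented internally; the AR(1) model, the simulated series, and the object names are assumptions made purely for illustration. It shows how a recursive forecast with a horizon of three naturally produces one row per intermediate horizon in the same long form as the output above.

```r
# Illustrative sketch only: base R, simulated data, not the OOS internals.
set.seed(1)

# simulated monthly series (a stand-in for a series such as UNRATE)
dates <- seq.Date(from = as.Date("2010-01-01"), by = "month", length.out = 60)
y <- 5 + as.numeric(arima.sim(model = list(ar = 0.8), n = 60, sd = 0.2))

# fit an AR(1) on all data available at the forecast date
forecast.date <- dates[length(dates)]
fit <- arima(y, order = c(1, 0, 0))

# recursive forecasting: longer horizons are built from the shorter ones,
# so a horizon of 3 yields forecasts for horizons 1, 2, and 3
horizon <- 3
pred <- predict(fit, n.ahead = horizon)

# arrange the result in long form, one row per forecasted date
recursive.forecasts <- data.frame(
  model         = "ar1",
  forecast      = as.numeric(pred$pred),
  se            = as.numeric(pred$se),
  forecast.date = forecast.date,
  date          = seq.Date(from = forecast.date, by = "month",
                           length.out = horizon + 1)[-1]
)

print(recursive.forecasts)
```

Each row shares the same forecast.date but has a different date, which is how the intermediate horizons can be told apart in long-form output.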
Having successfully created a pool of univariate forecasts, we next turn to a collection of multivariate forecasting models. While there are several similarities in univariate and multivariate forecasting in OOS, one will note that there are four key differences:
forecast_multivariate requires one to declare the name of the variable to be forecasted.
forecast_multivariate allows the user to create an arbitrary number of lags for a chosen set (although the default is all) of variables in the design matrix.
forecast_univariate allows the user to use direct projections or recursive forecasting, while forecast_multivariate only allows for direct projections (this may change in a future version of OOS).
forecast_multivariate allows users to perform dimension reduction on a chosen set (although the default is all) of variables in the design matrix, via principal components.
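For readers less familiar with direct projections, the idea can be sketched in a few lines of base R. This is a hedged illustration on simulated data, not the OOS implementation: the predictor x, the data-generating process, and all object names are hypothetical. The target is shifted forward by the horizon, regressed on the information available today (here including one hand-built lag), and the fitted model is then applied to the most recent observation.

```r
# Illustrative sketch only: base R, simulated data, not the OOS internals.
set.seed(2)

n <- 120
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # hypothetical predictor
y <- numeric(n)
for (t in 2:n) y[t] <- 0.6 * y[t - 1] + 0.3 * x[t - 1] + rnorm(1, sd = 0.1)

horizon <- 1

# design matrix: today's values plus one lag of the predictor
design <- data.frame(
  y.lead = c(y[(1 + horizon):n], rep(NA, horizon)),  # target shifted forward by `horizon`
  y      = y,
  x      = x,
  x.lag1 = c(NA, x[-n])                              # one lag, built by hand
)

# estimate only on rows where the lag and the future target are both observed
train <- design[complete.cases(design), ]
fit <- lm(y.lead ~ y + x + x.lag1, data = train)

# direct projection: apply the fitted model to the latest observation
latest <- design[n, c("y", "x", "x.lag1")]
direct.forecast <- as.numeric(predict(fit, newdata = latest))

print(direct.forecast)
```

Unlike the recursive approach, a direct projection estimates a separate relationship for each horizon and never feeds its own predictions back in as inputs.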
With these differences in mind, we will jump right into our multivariate forecasting with forecast_multivariate.
# create multivariate forecasts
forecast.multi =
  forecast_multivariate(
    Data = Data,
    forecast.date = seq.Date(from = as.Date('2015-01-01'), to = as.Date('2019-12-01'), by = 'month'),
    target = 'UNRATE',
    # forecast method and type
    horizon = 1,
    method = c('ols', 'elastic', 'RF'),
    # information set treatment
    rolling.window = NA,
    freq = 'month',
    lag.n = 1)
# view top of forecast output
head(forecast.multi)
##         date forecast.date   model forecast         se
## 1 2015-02-01    2015-01-01     ols 5.656417 0.01631269
## 2 2015-02-01    2015-01-01 elastic 5.708525         NA
## 3 2015-02-01    2015-01-01      RF 5.606573         NA
## 4 2015-03-01    2015-02-01     ols 5.830979 0.02033377
## 5 2015-03-01    2015-02-01 elastic 5.789285         NA
## 6 2015-03-01    2015-02-01      RF 5.776003         NA
And it appears that we again have been successful in forecasting the US unemployment rate!
It is no secret that while one forecast can be good, several forecasts can be great. We next turn to combining forecasts through a series of (out-of-sample) forecast combination techniques using the OOS forecast_combine function.
To create a set of forecast combinations, we will first need to merge our existing forecasts into one data.frame. Additionally, as we will use combination methods that require minimizing a loss function, we will also merge in the true data realizations - although this is not necessary when a user relies on methods that do not need to learn (e.g. uniform weights or the forecast median).
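As a rough sketch of what such a merge might look like, the base R example below stacks two small long-form forecast tables and attaches the realized values by target date. The toy objects stand in for forecast.uni, forecast.multi, and Data; their forecast values are rounded for illustration, and the column name observed is an assumption rather than something confirmed by the text above.

```r
# Illustrative sketch only: toy stand-ins for forecast.uni, forecast.multi, and Data.
forecast.uni.toy <- data.frame(
  model         = c("naive", "ets"),
  forecast      = c(5.70, 5.65),
  se            = c(0.19, 0.18),
  forecast.date = as.Date("2015-01-01"),
  date          = as.Date("2015-02-01")
)

forecast.multi.toy <- data.frame(
  date          = as.Date("2015-02-01"),
  forecast.date = as.Date("2015-01-01"),
  model         = c("ols", "RF"),
  forecast      = c(5.66, 5.61),
  se            = c(0.02, NA)
)

Data.toy <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-02-01")),
  UNRATE = c(5.7, 5.5)
)

# stack the two long-form outputs; rbind matches data.frame columns by name
forecasts.toy <- rbind(forecast.uni.toy, forecast.multi.toy)

# attach realized values by target date; the name `observed` is an assumption
observed.toy  <- data.frame(date = Data.toy$date, observed = Data.toy$UNRATE)
forecasts.toy <- merge(forecasts.toy, observed.toy, by = "date", all.x = TRUE)

print(forecasts.toy)
```

The join is on date, the period being forecasted, so each forecast is paired with the realization it was trying to predict rather than with the value known when the forecast was made.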
Now that we have our forecasts all in one neat package, we may turn to creating forecast combinations.
Two things that a user may wish to note regarding the forecast_combine function are:
forecast_combine is designed specifically to take in output from forecast_multivariate; however, as long as a user has their data formatted in the same long-form style as the OOS forecasting functions, they can use forecast_combine as well.
Bearing these notes in mind, we will combine our univariate and multivariate forecasts using a collection of naive methods (uniform weights, the median forecast, and a winsorized mean) as well as a collection of trained combination models (n.best, LASSO, and peLASSO), with a burn-in of 5 observations.
# forecast combinations
forecast.combo =
  forecast_combine(
    forecasts,
    method = c('uniform', 'median', 'trimmed.mean', 'n.best', 'lasso', 'peLasso'),
    burn.in = 5,
    n.max = 2)

# merge forecast combinations back into forecasts
# (these will be used later)
forecasts =
  forecasts %>%
  dplyr::bind_rows(forecast.combo)
# view top of forecast output
head(forecast.combo)
##         date forecast         model se
## 1 2015-02-01 5.663111 uniform.combo NA
## 2 2015-03-01 5.641478 uniform.combo NA
## 3 2015-04-01 5.545682 uniform.combo NA
## 4 2015-05-01 5.449030 uniform.combo NA
## 5 2015-06-01 5.560600 uniform.combo NA
## 6 2015-07-01 5.569829 uniform.combo NA
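To build intuition for the simplest combination methods, which require no training, here is a small base R sketch of uniform weights and the median forecast computed by hand on toy long-form data. It is an illustration under stated assumptions, not the OOS implementation: the forecast values are made up, and the label median.combo is a guess patterned on the uniform.combo label seen in the output above.

```r
# Illustrative sketch only: toy long-form forecasts, three models, two target dates.
forecasts.toy <- data.frame(
  date     = rep(as.Date(c("2015-02-01", "2015-03-01")), each = 3),
  model    = rep(c("naive", "ols", "RF"), times = 2),
  forecast = c(5.70, 5.66, 5.61, 5.50, 5.83, 5.78)
)

# uniform weights: the simple average across models for each target date
uniform.toy <- aggregate(forecast ~ date, data = forecasts.toy, FUN = mean)
uniform.toy$model <- "uniform.combo"

# median forecast: the middle forecast across models for each target date
median.toy <- aggregate(forecast ~ date, data = forecasts.toy, FUN = median)
median.toy$model <- "median.combo"  # label is an assumption

# stack the combinations in the same long form as the individual forecasts
combos.toy <- rbind(uniform.toy, median.toy)

print(combos.toy)
```

Because neither method has weights to learn, neither needs the observed values or a burn-in period, in contrast to the trained methods.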
We have forecast combinations for the US unemployment rate! This is nice progress, but now that we have all of these forecasts, how do we know which ones are good and which ones we should use to make decisions?
For that, we next turn to OOS’s suite of forecast evaluation metrics.