Data Preparation Forecast Analysis

2. Forecasting

Now that we have properly cleaned our data, we next turn to demonstrating the OOS forecasting routines. This section, covering the heart of OOS, will proceed by 1) creating forecasts with a collection of univariate time series models, 2) creating forecasts with a collection of multivariate time series models, and 3) combining both univariate and multivariate forecasting model outputs to create a collection of forecast combinations (ensemble forecasts in the nomenclature of machine learning).

2.1 Univariate Forecasts

We will first forecast using a collection of univariate time series models. As we have chosen to forecast a very clean series, we will not make use of OOS’s ability to clean outliers and impute missing values in a real-time fashion. That is, we only need to focus on the bare-bones requirements of OOS’s univariate forecasting routine, forecast_univariate:

  1. Data: a ts, xts, or zoo time series object, or a two column data.frame with a date column and the time series to forecast
  2. forecast.dates: a vector of dates on which to simulate a historical forecast
  3. method: a vector of forecast method names (see the package website for a list of currently supported methods)
  4. horizon: an integer denoting how far into the future (measured in periods) to forecast
  5. recursive: a boolean denoting if forecasts to a horizon greater than one should be conducted recursively with one-step-ahead forecasts or via direct projection (the default is recursive forecasting)
  6. rolling.window: an integer denoting the number of backward looking periods to use when training the forecasting model (if NA then the entire available history will be used)
  7. freq: a string denoting the frequency of the time series being forecasted
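To make the forecast.dates and rolling.window arguments concrete, the following base R sketch shows which observations a model is allowed to see on a given forecast date. It is only an illustration of the idea, not the OOS implementation: the series is synthetic and the helper training_window is a hypothetical name introduced here.

```r
# Toy monthly series (synthetic values, for illustration only)
dates <- seq.Date(from = as.Date('2014-01-01'), by = 'month', length.out = 24)
y     <- seq(7.0, by = -0.05, length.out = 24)

# Hypothetical helper: the observations a model may see on a forecast date
training_window <- function(y, dates, forecast.date, rolling.window = NA) {
  seen <- y[dates <= forecast.date]            # never look past the forecast date
  if (!is.na(rolling.window)) {
    seen <- utils::tail(seen, rolling.window)  # keep only the most recent periods
  }
  seen
}

forecast.date <- as.Date('2015-06-01')

# rolling.window = NA: use the entire available history
expanding <- training_window(y, dates, forecast.date)

# rolling.window = 12: use only the last 12 months
rolling <- training_window(y, dates, forecast.date, rolling.window = 12)

# a random walk forecast is simply the last observed value
naive.forecast <- utils::tail(expanding, 1)
```

Repeating this for every entry in a vector of forecast dates is what produces a simulated history of out-of-sample forecasts.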

For our demonstration, we will simulate forecasts of the unemployment rate one month into the future, using a random walk, ARIMA, and exponential smoothing, over a five-year period from 2015 through 2019.

# run univariate forecasts
forecast.uni =
    forecast_univariate(
        # forecasting data
        Data = dplyr::select(Data, date, UNRATE),
        forecast.dates =
          seq.Date(from = as.Date('2015-01-01'),
                   to = as.Date('2019-12-01'),
                   by = 'month'),

        # forecast method and type
        method = c('naive','auto.arima', 'ets'),
        horizon = 1,
        recursive = FALSE,

        # information set treatment
        rolling.window = NA,
        freq = 'month')
## <warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>

One may notice that after running forecast_univariate we received the warning:

<warning: forecast_univariate.control_panel was instantiated and default values will be used for model estimation.>

This warns us that we are using default parameters when training the random walk, ARIMA, and exponential smoothing models. This topic, including how a user may change the training parameters, will be covered in a separate vignette; as such, this type of warning will be suppressed for the remainder of the exercise. We next examine our output.

# view top of forecast output
head(forecast.uni)
##        model forecast        se forecast.date       date
## 1      naive 5.700000 0.1917706    2015-01-01 2015-02-01
## 2 auto.arima 5.653460 0.1796099    2015-01-01 2015-02-01
## 3        ets 5.653691 0.1790950    2015-01-01 2015-02-01
## 4      naive 5.500000 0.1917821    2015-02-01 2015-03-01
## 5 auto.arima 5.474782 0.1795762    2015-02-01 2015-03-01
## 6        ets 5.477820 0.1790102    2015-02-01 2015-03-01

As one can see, we now have five years’ worth of one-month-ahead forecasts for the US unemployment rate! The default output for the OOS forecasting routines, forecast_univariate, forecast_multivariate, and forecast_combine, is a long-form data.frame with the columns:

  1. model: a string naming the model used to estimate the forecast
  2. forecast: a numeric forecast
  3. se: a numeric standard error, NA when models do not generate standard errors by default
  4. forecast.date: a date denoting the day the (simulated) forecast was made
  5. date: a date denoting the day being forecasted
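Because the output is in long form, standard base R tools can be used to filter or reshape it. The sketch below rebuilds the six rows printed above by hand (the object name forecast.uni.head is introduced here purely for illustration) and then keeps a single model and pivots the forecasts to one column per model.

```r
# The six rows shown in the printed output above, rebuilt by hand
forecast.uni.head <- data.frame(
  model         = rep(c('naive', 'auto.arima', 'ets'), times = 2),
  forecast      = c(5.700000, 5.653460, 5.653691, 5.500000, 5.474782, 5.477820),
  se            = c(0.1917706, 0.1796099, 0.1790950, 0.1917821, 0.1795762, 0.1790102),
  forecast.date = as.Date(rep(c('2015-01-01', '2015-02-01'), each = 3)),
  date          = as.Date(rep(c('2015-02-01', '2015-03-01'), each = 3))
)

# keep the forecasts of a single model
naive.only <- subset(forecast.uni.head, model == 'naive')

# pivot to wide form: one row per forecasted date, one column per model
wide <- reshape(
  forecast.uni.head[, c('date', 'model', 'forecast')],
  idvar = 'date', timevar = 'model', direction = 'wide')
```

The wide version is convenient for side-by-side comparison, while the long form is the shape the other OOS routines expect.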

Note that when forecasts are generated recursively for a horizon greater than one, all forecasts between forecast.date and date will be provided in the forecast output as well as the declared horizon.
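The difference between recursive forecasting and direct projection can be illustrated with a simple autoregression in base R. This is a sketch under stated assumptions, not how OOS estimates its models: the data are simulated, and the AR(1) regression via lm is chosen only because it is easy to read.

```r
set.seed(1)

# Synthetic autoregressive series (illustration only)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))
n <- length(y)
horizon <- 3

# Recursive: fit a one-step-ahead model, then feed each forecast back in
fit.one <- lm(now ~ lagged, data = data.frame(now = y[-1], lagged = y[-n]))
b <- coef(fit.one)
recursive <- numeric(horizon)
last <- y[n]
for (h in seq_len(horizon)) {
  last <- b[[1]] + b[[2]] * last   # forecast h steps ahead from the previous step
  recursive[h] <- last
}

# Direct: regress y[t] on y[t - horizon] and project once
fit.direct <- lm(
  now ~ lagged,
  data = data.frame(now = y[(horizon + 1):n], lagged = y[1:(n - horizon)]))
direct <- coef(fit.direct)[[1]] + coef(fit.direct)[[2]] * y[n]
```

The recursive approach yields a forecast for every step from 1 to horizon, which is why the intermediate dates appear in the output, whereas the direct projection yields a single forecast at the declared horizon.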

2.2 Multivariate Forecasts

Having successfully created a pool of univariate forecasts, we next turn to a collection of multivariate forecasting models. While there are several similarities between univariate and multivariate forecasting in OOS, one will note that there are four key differences between forecast_univariate and forecast_multivariate:

  1. forecast_multivariate requires one to declare the name of the variable to be forecasted.
  2. forecast_multivariate allows the user to create an arbitrary number of lags for a chosen set (although the default is all) of variables in the design matrix.
  3. forecast_univariate allows the user to use direct projections or recursive forecasting while forecast_multivariate only allows for direct projections (this may change in a future version of OOS).
  4. forecast_multivariate allows users to perform dimension reduction on a chosen set (although the default is all) of variables in the design matrix, via principal components.
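For readers unfamiliar with the second and fourth points, the base R sketch below shows what adding lags to a design matrix and reducing it with principal components look like in isolation. The data are synthetic, the helper lag_k is a hypothetical name, and none of this is meant to mirror the internals of forecast_multivariate.

```r
set.seed(42)

# Synthetic design matrix: two correlated predictors and one noise series
n <- 120
common <- cumsum(rnorm(n))
X <- data.frame(
  x1 = common + rnorm(n, sd = 0.1),
  x2 = common + rnorm(n, sd = 0.1),
  x3 = rnorm(n))

# Hypothetical helper: shift a vector back by k periods, padding with NA
lag_k <- function(x, k) c(rep(NA, k), x[seq_len(length(x) - k)])

# add one lag of every variable to the design matrix
lags <- as.data.frame(lapply(X, lag_k, k = 1))
names(lags) <- paste0(names(X), '.l1')
X.lagged <- cbind(X, lags)

# dimension reduction: keep the first two principal components
pc <- prcomp(stats::na.omit(X.lagged), center = TRUE, scale. = TRUE)
factors <- pc$x[, 1:2]
```

The first row is dropped before the principal components are computed because the lagged columns have no value there.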

With these differences in mind, we will jump right into our multivariate forecasting with forecast_multivariate.

# create multivariate forecasts
forecast.multi = 
    forecast_multivariate(
        Data = Data,           
        forecast.date = 
          seq.Date(from = as.Date('2015-01-01'),
                   to = as.Date('2019-12-01'),
                   by = 'month'),
        target = 'UNRATE',
        
        # forecast method and type
        horizon = 1,
        method = c('ols','elastic','RF'),

        # information set treatment       
        rolling.window = NA,    
        freq = 'month', 
        lag.n = 1)    
# view top of forecast output
head(forecast.multi)
##         date forecast.date   model forecast         se
## 1 2015-02-01    2015-01-01     ols 5.656417 0.01631269
## 2 2015-02-01    2015-01-01 elastic 5.708525         NA
## 3 2015-02-01    2015-01-01      RF 5.606573         NA
## 4 2015-03-01    2015-02-01     ols 5.830979 0.02033377
## 5 2015-03-01    2015-02-01 elastic 5.789285         NA
## 6 2015-03-01    2015-02-01      RF 5.776003         NA

And it appears that we again have been successful in forecasting the US unemployment rate!

2.3 Forecast Combinations

It is no secret that while one forecast can be good, several forecasts can be great. We next turn to combining forecasts through a series of (out-of-sample) forecast combination techniques using the OOS forecast_combine function.

To create a set of forecast combinations, we will first need to merge our existing forecasts into one data.frame. Additionally, as we will use combination methods that require minimizing a loss function, we will also merge in the true data realizations, although this is not necessary when a user relies on methods that do not need to learn (e.g. uniform weights or the forecast median).

# combine forecasts and add in observed values
forecasts = 
    dplyr::bind_rows(
        forecast.uni,
        forecast.multi) %>%
    dplyr::left_join( 
        dplyr::select(Data, date, observed = UNRATE),
        by = 'date')

Now that we have our forecasts all in one neat package, we may turn to creating forecast combinations.

Two things that a user may wish to note regarding the forecast_combine function are:

  1. forecast_combine is designed specifically to take in output from forecast_univariate and forecast_multivariate; however, as long as a user has their data formatted in the same long-form style as the output of the OOS forecasting functions, they can use forecast_combine.
  2. When using any methods that require training, a user must specify a number of burn-in observations, that is, the number of observations to use in the first model instantiation.
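To see why trained methods need a burn-in while untrained ones do not, consider the following base R sketch. It uses synthetic forecasts from three hypothetical models, and the "best n" rule shown is a simple stand-in written for illustration; it should not be read as the exact algorithm behind any forecast_combine method.

```r
set.seed(7)

# Synthetic example: 12 periods, 3 hypothetical models (illustration only)
n.periods <- 12
observed  <- 5 + cumsum(rnorm(n.periods, sd = 0.1))
fc <- cbind(
  m1 = observed + rnorm(n.periods, sd = 0.05),
  m2 = observed + rnorm(n.periods, sd = 0.20),
  m3 = observed + rnorm(n.periods, sd = 0.40))

# methods with nothing to learn are available from the first period
uniform.combo <- rowMeans(fc)
median.combo  <- apply(fc, 1, median)

# a trained method may only use errors observed before the period it forecasts
burn.in <- 5
n.max   <- 2
n.best.combo <- rep(NA_real_, n.periods)
for (t in (burn.in + 1):n.periods) {
  past <- seq_len(t - 1)
  mse  <- colMeans((fc[past, , drop = FALSE] - observed[past])^2)
  best <- names(sort(mse))[seq_len(n.max)]   # models with the lowest past error
  n.best.combo[t] <- mean(fc[t, best])
}
```

The first burn.in periods have no trained combination because there is not yet enough history to rank the models.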

Bearing these notes in mind, we will combine our univariate and multivariate forecasts using a collection of naive methods (uniform weights, the median forecast, and a winsorized mean, called trimmed.mean in the code below) as well as a collection of trained combination models (n.best, LASSO, and peLASSO) with a burn-in of 5 observations.

# forecast combinations 
forecast.combo = 
    forecast_combine(
        forecasts, 
        method = c('uniform','median','trimmed.mean',
                   'n.best','lasso','peLasso'), 
        burn.in = 5, 
        n.max = 2)

# merge forecast combinations back into forecasts
# (these will be used later)
forecasts = 
    forecasts %>%
    dplyr::bind_rows(forecast.combo)
# view top of forecast output
head(forecast.combo)
##         date forecast         model se
## 1 2015-02-01 5.663111 uniform.combo NA
## 2 2015-03-01 5.641478 uniform.combo NA
## 3 2015-04-01 5.545682 uniform.combo NA
## 4 2015-05-01 5.449030 uniform.combo NA
## 5 2015-06-01 5.560600 uniform.combo NA
## 6 2015-07-01 5.569829 uniform.combo NA

We have forecast combinations for the US unemployment rate! This is nice progress, but now that we have all of these forecasts, how do we know which ones are good and which ones we should use to make decisions?

For that we next turn to OOS’s suite of forecast evaluation metrics.
