[DS Code Review] ARIMAX - time series modeling with exogenous features
"Exogenous? More like exo-genius! Because sometimes you need some outside smarts to get the job done!"
Redditor Challenge Accepted:
In the last post we reviewed how ARIMA and XGBoost compare for time series forecasting.
Some people brought up MLForecast and Prophet, which I reviewed here. This post should give the 35 upvoting Redditors some peace of mind.
Learn from the experts
First of all, I'm not an ARIMAX expert, so learn from one on the subject instead.
ARIMA modeling with features
ARIMA (AutoRegressive Integrated Moving Average) modeling is a popular time series forecasting technique used to make predictions based on past patterns of data. Exogenous variables are external factors that can affect the outcome of a time series, but are not themselves affected by the time series.
To do ARIMA modeling with exogenous variables, you would use an extension of the basic ARIMA model called ARIMAX (AutoRegressive Integrated Moving Average with eXogenous variables). In an ARIMAX model, you include one or more exogenous variables in addition to the time series data.
Here are the steps to perform ARIMAX modeling with exogenous variables:
Collect your data: Gather the time series data you want to model and any exogenous variables you wish to include.
Preprocess your data: Check your data for missing values, outliers, and any other data quality issues. Consider transforming your data if it is non-stationary.
Select your model: Determine the order of differencing required to make the time series stationary. Determine the appropriate order of the ARIMA model (p, d, q) by analyzing the autocorrelation and partial autocorrelation functions of the time series.
Include exogenous variables: Determine which exogenous variables to include in your model. You can use regression analysis or other techniques to identify relevant exogenous variables.
Fit the model: Use the ARIMAX model to fit the time series data and exogenous variables.
Evaluate the model: Compare candidate models using information criteria such as the AIC or BIC (lower is better). Then use the model to make predictions on held-out data and compare them to the actual values.
To make the idea of an exogenous variable concrete: if you are modeling sales data for a product, an exogenous variable might be the price of a competing product, which could affect sales but is not itself affected by them. Including such variables in your ARIMAX model can improve the accuracy of your forecasts by accounting for these external factors.
ARIMA(p, d, q)
Here's a better overview of ARIMA models. Below is my attempt to explain it.
In the context of ARIMA models, p, d, and q are the three parameters that define the order of the model. Here's what each of them represents:
p: The autoregressive order. This is the number of lagged values of the dependent variable (i.e., the time series itself) that are included in the model. For example, an ARIMA(p, d, q) model with p=3 would include the three previous values of the time series in the model.
d: The differencing order. This is the number of times the time series is differenced in order to make it stationary. Stationarity means that the statistical properties of the time series (e.g., mean, variance) are constant over time. Differencing is a common way to transform a non-stationary time series into a stationary one.
q: The moving average order. This is the number of lagged values of the error term (i.e., the difference between the predicted values and the actual values) that are included in the model. For example, an ARIMA(p, d, q) model with q=2 would include the two previous values of the error term in the model.
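The differencing order d is the easiest of the three to see in action. Below is a tiny sketch using NumPy's diff on two toy series I made up: one with a linear trend and one with a quadratic trend. Differencing once flattens the linear trend, and differencing twice flattens the quadratic one.

```python
import numpy as np

t = np.arange(10, dtype=float)

# Toy series (hypothetical): deterministic trends with no noise.
linear = 5.0 + 3.0 * t
quadratic = 1.0 + 2.0 * t + 0.5 * t**2

# d=1: y_t - y_{t-1} removes a linear trend, leaving the constant slope.
diff_once = np.diff(linear, n=1)

# d=2: differencing the differences removes a quadratic trend.
diff_twice = np.diff(quadratic, n=2)

print(diff_once)   # every entry is 3.0
print(diff_twice)  # every entry is 1.0
```

Note that each round of differencing costs you one observation, so the outputs have 9 and 8 values respectively.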
These parameters are usually chosen by comparing candidate models with information criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), which trade off how well the model fits the data against how many parameters it uses.
So how do I include exogenous features in an ARIMAX(p,d,q) model?
An ARIMAX model is just an ARIMA model with additional exogenous features.
If you really want to know how to do this well, look at this detailed blog post by the folks building statsmodels.
ARIMA is not much different from OLS regression with some fancy features: the AR terms are lagged values of the series and the MA terms are lagged errors. An ARIMAX model with p=1, d=2, q=3 can be shown to be the model we know and love. Check out Hyndman's blog on this.
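For reference, here are the two ways I've seen "ARIMA with exogenous variables" written down, as I understand the distinction Hyndman's post draws. The notation is my own assumption: a single exogenous variable x_t, no differencing (d=0) to keep things short, and plus signs on the moving average terms (some texts use minus signs).

```latex
% Regression with ARMA errors: the ARMA structure lives in the error term n_t
\begin{aligned}
y_t &= \beta x_t + n_t, \\
n_t &= \phi_1 n_{t-1} + \dots + \phi_p n_{t-p}
       + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
\end{aligned}

% ARMAX: lagged values of y_t enter the equation directly
y_t = \beta x_t + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```

With d > 0, the same equations apply to the differenced series.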
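The coefficients of the ARIMAX model can be interpreted in much the same way as the coefficients of a multiple linear regression model. The coefficient of an exogenous variable indicates the change in the dependent variable for a unit change in the corresponding exogenous variable, holding all other variables constant. The coefficients of the lagged dependent variable and error terms indicate the effect of the previous values of the dependent variable and error terms on the current value of the dependent variable, holding all other variables constant. One caveat: this reading is cleanest when the model is specified as a regression with ARIMA errors, which is the form statsmodels' SARIMAX estimates. When lagged values of the dependent variable enter the equation directly, the exogenous coefficient is conditional on those lags and is harder to interpret.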
Other resources:
https://people.duke.edu/~rnau/411arim.htm
https://timeseriesreasoning.com/contents/regression-with-arima-errors-model/
Have some better tips/tricks? Drop them in the comments below!