Stata Tutorial

ARIMA Models in Stata - Part 1: Identification

Welcome to a new free Stata tutorial. In this tutorial, I will show you how to estimate ARIMA models in Stata. Be sure to download the free dataset and replicate the results I obtain in the tutorial. The FAQ section answers many common questions about ARIMA models. Let’s begin!

What is an ARIMA model?

ARIMA stands for Autoregressive Integrated Moving Average and is one of the most popular and widely used techniques for univariate time series forecasting. Some of the variables you can forecast with ARIMA models are GDP, the Consumer Price Index (CPI), and the prices of stocks or commodities.

univariate concept

We won’t try to forecast future values of a variable (e.g., inflation) by using many other regressors (e.g., GDP, money supply, interest rates). Instead, we will rely on past levels of inflation to forecast future levels of inflation. Knowing how the variable behaved in the past will allow us to predict where it will head in the future.

In this tutorial we apply the Box Jenkins method to select appropriate models and forecast future values of our variable of interest.

What is the Box Jenkins methodology?

The Box Jenkins methodology is named after its authors, George Box and Gwilym Jenkins, who proposed a three-step method to select appropriate ARIMA models to forecast economic variables. We will try to find a model that fits the data well and can produce reasonable forecasts. The method consists of three basic steps:

  • Stage 1: Identification
  • Stage 2: Estimation
  • Stage 3: Diagnostics and Forecasting

Some textbooks indicate that the Box Jenkins method has 4 stages

Don’t worry! The original book written by Box and Jenkins, entitled “Time Series Analysis: Forecasting and Control”, specifies only three steps. However, some textbooks split the method into four stages. In that version, Stage 3 is diagnostics and Stage 4 is forecasting. The analysis remains the same.

ARIMA models in Stata, Stage 1: Identification

Overview of Box Jenkins stage 1:

ARIMA is written as ARIMA(p,d,q), where “p” is the order of the autoregressive component, “d” is the number of times we need to difference the variable to achieve stationarity, and “q” is the order of the moving average component.

Y_t=c+\sum_{i=1}^p\alpha_i Y_{t-i}+\sum_{j=1}^q\theta_j E_{t-j}+E_t

Y_t = the variable of interest, after differencing it “d” times
c = constant
p = order of the autoregressive component
q = order of the moving average component
\alpha_i = coefficient on the i-th autoregressive lag
\theta_j = coefficient on the j-th moving average lag
E_t = error term
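
To make the notation concrete, here is how the general equation looks for the two candidate models we end up with later in this tutorial. This is only a sketch: I write X_t for CPI in levels (a symbol introduced here for illustration only), so the differenced series is Y_t = \Delta X_t = X_t - X_{t-1}.

```latex
% ARIMA(1,1,1): one AR lag and one MA lag on the first-differenced series
\Delta X_t = c + \alpha_1 \Delta X_{t-1} + \theta_1 E_{t-1} + E_t

% ARIMA(2,1,1): the same model with a second AR lag
\Delta X_t = c + \alpha_1 \Delta X_{t-1} + \alpha_2 \Delta X_{t-2} + \theta_1 E_{t-1} + E_t
```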

Stage 1 focuses on two aspects. First, we check whether our variable of interest is stationary. Next, we determine the order of the autoregressive and moving average components. In other words, in stage 1 we determine “p”, “d” and “q”.

In our example, we are trying to fit an ARIMA model for the series “consumer price index – USA”. We have to begin our analysis by checking for stationarity. Why? Our series needs to be stationary in order to forecast it. If our variable is non-stationary in levels, we need to apply the appropriate transformations (logs/differences) to make it stationary.

To check for stationarity, we look at:

  1. The Graph
  2. The correlogram
  3. Formal tests: Augmented Dickey Fuller, Phillips-Perron Test and KPSS test.

Please watch my stationarity tutorial if you need further clarification on the procedure.

In our example, we verified that CPI is non stationary in levels, but stationary in first differences. Consequently, we use the variable in first differences.
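
The three checks listed above can be sketched in a few Stata commands. The variable names are assumptions: I call the series cpi and the monthly date variable date, so replace them with the names in your dataset. The number of lags (24 for the correlogram, 4 for the Dickey Fuller test) is an arbitrary choice for illustration.

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* 1. Graph the series in levels and in first differences
tsline cpi, name(levels, replace)
tsline D.cpi, name(diffs, replace)

* 2. Correlogram of the series in levels
corrgram cpi, lags(24)

* 3. Formal unit root tests, in levels and in first differences
dfuller cpi, lags(4) trend
dfuller D.cpi, lags(4)
pperron cpi, trend
pperron D.cpi
```

The KPSS test is not part of official Stata. As far as I know, it is available as the user-written command kpss from SSC, so you would need to install it before using it.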

arima: how to determine the order of "p" and "q"

To identify the order of the autoregressive and moving average components, we will focus on the correlogram of “CPI” in the first differences. We are displaying the correlogram in the first differences because we have confirmed that “CPI” is stationary in the first differences. The aim of this step is to find all the possible models to estimate.

To determine the order of the autoregressive component (“p”), we look at the partial autocorrelation function (PACF). The PACF is plotted with a confidence band, and the lags that exceed the band suggest the possible order of the autoregressive component. Looking at the correlogram, the first two lags exceed the confidence band, so there are two possible AR components. We can try fitting an ARIMA with one or two AR components. For the purpose of this example, we will estimate both cases, so that later on we can compare them and decide which model is better.

[Figure: PACF of CPI in first differences]

Next, to determine the order of the moving average component (“q”), we look at the autocorrelation function (ACF). We can see that only lag 1 exceeds the confidence band. Consequently, there is one possible moving average component: MA(1).

[Figure: ACF of CPI in first differences]
Note: Only the first lag exceeds the confidence band, suggesting an MA order of 1.

NOTE: For this example we have identified two possible models: ARIMA(1,1,1) and ARIMA(2,1,1). We will estimate both in stage 2 and decide which model is better.
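
If you want to reproduce the correlograms yourself, the following is a minimal sketch. It assumes the series is called cpi and the monthly date variable is called date (both names are placeholders), and the choice of 24 lags is arbitrary.

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* ACF of the first difference: lags outside the band suggest the MA order (q)
ac D.cpi, lags(24) name(acf, replace)

* PACF of the first difference: lags outside the band suggest the AR order (p)
pac D.cpi, lags(24) name(pacf, replace)

* The same information as a table (AC and PAC columns)
corrgram D.cpi, lags(24)
```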

ARIMA Models in Stata - Part 2: Estimation

stage 2: estimation

Once we have identified the candidate ARIMA models, we need to estimate them and decide which one is the most appropriate. The two models we decided to estimate are:

  1. ARIMA(1,1,1)
  2. ARIMA(2,1,1)

In stage 2 of the Box Jenkins method, we:

  • Estimate the models we identified in stage 1
  • Compare the models based on the significance of the coefficient estimates
  • Compare the models based on information criteria such as Akaike and Bayesian
  • Select the model with the smallest information criteria values and the most significant coefficients

estimating the arima (1,1,1) in stata

Note: ARIMA(1,1,1) estimation output.

We can see that the constant as well as the AR and the MA components are statistically significant at the 5% level (p<0.05). Next we can produce the selection criteria output which will provide us some useful information to compare the models.

[Figure: ARIMA(1,1,1) selection criteria output]

Having reported the Akaike and Bayesian information criteria, we take note of the values so that we can compare the models.
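
As a sketch of the two steps just described (estimation, then selection criteria), assuming the series is called cpi and the monthly date variable is called date (placeholder names):

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* ARIMA(1,1,1): arima() takes p, d and q, so Stata differences cpi once for us
arima cpi, arima(1,1,1)

* Akaike (AIC) and Bayesian (BIC) information criteria for the model just fitted
estat ic
```

Writing arima D.cpi, ar(1) ma(1) fits the same model; the arima(1,1,1) shorthand simply keeps the notation close to the ARIMA(p,d,q) label.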

estimating the arima (2,1,1) in stata

[Figure: ARIMA(2,1,1) estimation output]

Again, we can see that the constant as well as the AR and the MA components are statistically significant at the 5% level (p<0.05). Next we can produce the selection criteria output, which gives us the information we need to compare the models.

[Figure: ARIMA(2,1,1) selection criteria output]
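
The second candidate follows the same pattern. Again, cpi and date are assumed variable names, not necessarily the ones in the dataset:

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* ARIMA(2,1,1): two AR lags, one difference, one MA lag
arima cpi, arima(2,1,1)

* Akaike (AIC) and Bayesian (BIC) information criteria for the model just fitted
estat ic
```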

How to select the most appropriate arima model

To select the most appropriate model, I recommend building a table like the one below and filling it in with the results we obtained in the previous section (estimated ARIMA models).

[Figure: ARIMA model selection table]
Note: the criteria allow us to select an appropriate model. Model B, the ARIMA(2,1,1), is preferred over model A, the ARIMA(1,1,1).

We need to check the following:

  1. Significance of the ARMA terms: select the model with the most significant terms (p-values<0.05)
  2. SigmaSQ: the estimated variance of the residuals, a measure of volatility. Select the smallest one
  3. Log likelihood: select the biggest value, since we are maximizing the log-likelihood function (in our case the biggest is the least negative value)
  4. Model selection criteria: select the model with the smallest Akaike and Bayesian values
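
Instead of copying the numbers by hand, you can let Stata build a comparison table. This is a sketch under a few assumptions: the variable names cpi and date are placeholders, and I store the ARIMA(1,1,1) as A and the ARIMA(2,1,1) as B to mirror the labels in the table.

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* Fit and store both candidates (A and B are labels chosen for this sketch)
quietly arima cpi, arima(1,1,1)
estimates store A
quietly arima cpi, arima(2,1,1)
estimates store B

* Log likelihood, AIC and BIC side by side
estimates stats A B

* Coefficients with significance stars, plus the same statistics
estimates table A B, star stats(N ll aic bic)
```

Note that Stata reports sigma (the standard deviation of the residuals) rather than sigma squared, so square it if you want to fill in the SigmaSQ row.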

Conclusion: Model B has a better fit than model A.

ARIMA Models in Stata - Part 3: Forecasting

stage 3: diagnostics and forecasting

In Part 3 you will learn:

  • Model Diagnostics
  • How to forecast with ARIMA models in Stata

We identified possible models in stage 1 and estimated them in stage 2. We also selected the most appropriate model based on several criteria. Now it is time to ensure the model satisfies the requirements to forecast future values!

In stage 3 of the Box Jenkins method, we:

  • Ensure the model satisfies the stability conditions
  • Ensure there is no autocorrelation in the residuals
  • If the above requirements are met, we can forecast!


We need to ensure that the residuals of the model are white noise. We check this with the Portmanteau test.

Null hypothesis: Residuals are white noise.

Note: If p>0.05, we cannot reject the null hypothesis, so we treat the residuals as white noise.

[Figure: Portmanteau test output]

Conclusion: Since 0.9235>0.05, the residuals are white noise.
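
A minimal sketch of this check, assuming the series is called cpi, the monthly date variable is called date, and resid is a free variable name (all three are placeholders):

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* Re-fit the selected model without showing the output
quietly arima cpi, arima(2,1,1)

* Save the residuals and run the Portmanteau (Q) test for white noise
predict resid, residuals
wntestq resid
```

The p-value to compare against 0.05 is the one reported as Prob > chi2.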

stability conditions

  1. The estimated model is covariance stationary: inverse AR roots should lie inside the unit circle
  2. The estimated process is invertible: inverse MA roots should lie inside the unit circle

Note: Both the AR and MA roots are inside the circle.

ARMA roots table: the values in the modulus column have to be <1 to satisfy the stability conditions.

We can see in the figure above that all the inverse roots lie inside the unit circle. Our ARIMA(2,1,1) satisfies the stability conditions and the error terms are white noise. We are now in a good position to forecast future values of the consumer price index. If the model you selected did not satisfy the stability conditions, you would need to repeat stages 2 and 3 with another candidate model.
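
The roots check can be sketched as follows, with cpi and date as assumed variable names:

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* Re-fit the selected model without showing the output
quietly arima cpi, arima(2,1,1)

* Table and graph of the inverse AR and MA roots with their moduli
estat aroots
```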

Forecasting with the ARIMA(2,1,1) model

We can now use our model to forecast. We will try to predict the values for the next 10 months!

[Figure: ARIMA(2,1,1) forecast]
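
One possible way to produce a 10-month forecast is sketched below. It is not necessarily the exact sequence used in the video. The names cpi, date and cpi_f are placeholders, and the local macro means the lines should be run together as one do-file.

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date
quietly arima cpi, arima(2,1,1)

* Remember the first month after the observed data, then add 10 empty months
quietly summarize date if !missing(cpi)
local first_fcst = r(max) + 1
tsappend, add(10)

* y returns the forecast in levels (not in differences); dynamic() switches
* to the model's own predictions from the first out-of-sample month onwards
predict cpi_f, y dynamic(`first_fcst')

* Plot the observed series together with the forecast
tsline cpi cpi_f
```

Inside the sample, cpi_f holds one-step-ahead predictions; only the last 10 values are true out-of-sample forecasts.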

ARIMA models - FAQ

Most frequent questions and answers

No. ARIMA models can only be estimated using stationary variables. The (I) stands for “Integrated” and reflects the order of integration, in other words, how many times you need to difference the variable for it to become stationary. If your variable is non-stationary in levels, you will have to use logs and/or first differences to achieve stationarity.

If your variables are stationary, you will be estimating an ARMA model. There is no need to apply differences.
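
The link between ARIMA and ARMA can be seen in a short sketch. The names cpi, date and ln_cpi are placeholders, and the orders are chosen only for illustration. The two arima commands below fit the same model: the first lets Stata difference the variable (d=1), the second is an ARMA(1,1) on a variable that has already been differenced (d=0).

```stata
* Assumed names: cpi (the CPI series) and date (a monthly date variable)
tsset date

* Log transformation of the series
generate ln_cpi = ln(cpi)

* Non-stationary in levels: ARIMA(1,1,1), Stata takes the first difference
arima ln_cpi, arima(1,1,1)

* Already differenced by hand: ARMA(1,1), no further differencing needed
arima D.ln_cpi, arima(1,0,1)
```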

No. ARIMA models are univariate models. In other words, you are using past information on a variable to predict its future values. For example, if I know the marks you got in your last 10 exams, I can use that past information to predict your next mark. If you want to estimate multivariate models, you need to estimate a VAR model (or a structural VAR). Be sure to watch the tutorials for VAR and SVAR models.

ARIMA models are widely used in finance. They are simple models and an effective way to forecast future stock values. However, they always work better for short-run predictions. The further ahead in time we predict, the higher the chance of getting inaccurate results.

SARIMA models incorporate a seasonal component. Some variables have a seasonal element. For example, hydro consumption commonly spikes in summer. You can identify the seasonal component by looking at the AC column in the correlogram. If you notice sharp spikes at regular intervals (for example, every 6 or 12 lags in monthly data), that is an indicator that your series has a seasonal component.
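
In Stata, the seasonal part is added through the sarima() option of the arima command. The sketch below is purely illustrative: cpi and date are placeholder names, and the orders and the seasonal period of 12 (monthly data with a yearly pattern) are assumptions, not results from this tutorial.

```stata
* Assumed names: cpi (the series) and date (a monthly date variable)
tsset date

* ARIMA(1,1,1) plus a seasonal AR(1) term at lag 12
arima cpi, arima(1,1,1) sarima(1,0,0,12)
```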

ARIMA models are popular for estimating the demand for products or services, sales, production quotas, financial stocks, and other economic variables such as CPI. Be aware that for most macroeconomic variables (e.g., inflation, money supply, GDP), ARIMA models can provide some insights, but there are more sophisticated models you can estimate, such as VAR models.

In general, yearly data is not recommended. Using higher frequency data allows us to identify seasonal components and see fluctuations. If you graph yearly data, you will notice the line is very smooth. However, when you graph monthly or quarterly data, you will notice some spikes in the line.

If you don’t have many observations, or the frequency of your data is low (e.g., yearly data), ARIMA models are not powerful. Think about it this way: ARIMA models use past data to forecast future data. If you don’t have a lot of past observations, ARIMA predictions will be poor.
