ARIMA and Neural Networks. An application to the real GNP growth rate and the unemployment rate of U.S.A.
Eleftherios Giovanis
Abstract
This paper examines the estimation and forecasting performance of ARIMA models in comparison with some of the most popular and common neural network models. Specifically, we provide the estimation results of the AR-GRNN (generalized regression neural network), the AR-RBF (radial basis function) and the AR-MLP (multilayer perceptron). We show that the neural network models outperform ARIMA in forecasting. We find that the best model in the case of real US GNP is the AR-GRNN, while for the US unemployment rate it is the AR-MLP.
Keywords: ARIMA; Radial basis function; Multilayer perceptron; Generalized regression neural networks; stationarity; unit root
1 Introduction

Artificial neural networks are computational networks which aim to simulate the nerve cells, or neurons, of the biological nervous system of humans or animals (Graupe, 2007). The difference between neural networks and other estimation and approximation methods is that neural networks include hidden layers, in which the input variables are transformed by special functions, such as the logistic or the negative exponential, among many others. With these hidden layers and synaptic functions the approach can prove very efficient for modelling and estimating nonlinear processes (McNelis, 2005). In this paper we deal with two macroeconomic series which are characterized by trend and cyclicality.
Aryal and Yao-Wu (2003) applied an MLP network with three hidden layers to forecast the Chinese construction industry and compared the forecasting performance of the MLP networks with that of ARIMA. They found that the RMSE of the MLP estimation is 49 per cent lower than the ARIMA counterpart. Maasoumi et al. (1996) applied a back-propagation ANN model to forecast, among other series, GDP and the unemployment rate; the network they apply is a single-hidden-layer feedforward network. Swanson and White (1997a, 1997b) applied neural networks to forecast nine seasonally adjusted US macroeconomic time series and found that neural networks generally outperform the linear models. Tkacz and Hu (1999) applied neural networks to forecast Canadian GDP growth at the 4-quarter horizon and found that the gain in forecast accuracy is statistically significant, while the performance at the 1-quarter horizon is poor. They also found that the best neural network models outperform the best linear models by 15 to 19 per cent at the 4-quarter horizon. Tkacz (2001) found that neural networks produce lower forecasting errors for the yearly growth rate of real Canadian GDP relative to linear and univariate models.
2 Data

The data concern quarterly series of the real gross national product (GNP) and the unemployment rate for the economy of the USA during the period 1948-2006. The data have been obtained from the Federal Reserve Bank of St. Louis.
3 Methodology

a. Autoregressive moving average

The first model we estimate is the ARMA process (Gujarati, 2004), defined as

$$y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (1)$$

This is the ARMA(p,q) process. If the series are not stationary in their levels, i.e. they are not I(0), then we have to estimate an ARIMA(p,d,q) process instead (Gujarati, 2004).
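As an illustration only, the following sketch shows how such an ARMA/ARIMA specification can be estimated and used for a short out-of-sample forecast with Python's statsmodels library; the file name, column name and order shown are placeholders rather than the paper's exact data handling.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical quarterly series; the paper uses US real GNP growth and the
# unemployment rate from the Federal Reserve Bank of St. Louis.
data = pd.read_csv("gnp_growth.csv", index_col="date", parse_dates=True)
y = data["gnp_growth"]

# ARIMA(p, d, q): here an ARMA(1, 0) on the level, i.e. d = 0,
# as selected later in the paper for real GNP growth.
model = ARIMA(y, order=(1, 0, 0)).fit()
print(model.summary())

# Five-quarter out-of-sample forecast (e.g. 2007:Q1-2008:Q1)
print(model.forecast(steps=5))
```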
b. Generalized Regression Neural Networks

The GRNN is defined as

$$E[y \mid x] = \frac{\displaystyle\int_{-\infty}^{\infty} y\, g(x,y)\, dy}{\displaystyle\int_{-\infty}^{\infty} g(x,y)\, dy} \qquad (2)$$

where $E[y \mid x]$ is the expected value of $y$ given $x$ and $g(x,y)$ is the Parzen probability density estimator. If $g(x,y)$ is unknown, it can be estimated from a sample of observations of $x$ and $y$. The predicted output obtained by the GRNN is

$$\hat{y}(x) = \frac{\displaystyle\sum_{i=1}^{n} y_i \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)}{\displaystyle\sum_{i=1}^{n} \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)} \qquad (3)$$
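To make equation (3) concrete, the following numpy sketch implements the GRNN predictor as a kernel-weighted average; the toy training arrays and the value of the smoothing parameter sigma are illustrative assumptions, not the configuration estimated below.

```python
import numpy as np

def grnn_predict(x, x_train, y_train, sigma=0.05):
    """GRNN prediction of eq. (3): a kernel-weighted average of the training outputs."""
    # Squared Euclidean distances ||x - x_i||^2 between the query and every training input
    d2 = np.sum((x_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer outputs theta_i
    return np.dot(w, y_train) / np.sum(w)     # summation layer: numerator / denominator

# Hypothetical AR(1) usage: inputs are lagged values y_{t-1}, output is y_t
y = np.array([0.5, 0.8, 0.6, 0.9, 0.7, 0.4])
x_train, y_train = y[:-1].reshape(-1, 1), y[1:]
print(grnn_predict(np.array([0.65]), x_train, y_train))
```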
Usually the GRNN consists of four layers. In the first layer, which receives the input data, the synaptic and activation functions are linear. In the second layer, the pattern layer, the synaptic function is radial and the activation function is the negative exponential. The third layer, the summation layer, has, like the first layer, linear synaptic and activation functions. The last layer, the output layer, has a division as synaptic function and a linear activation function. More specifically, the input layer receives the input vector $x$ and distributes the data to the pattern layer. Each neuron in the pattern layer generates an output

$$\theta_i = \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma_i^2}\right)$$

and presents the result to the summation layer. In this layer the numerator and denominator neurons compute the weighted and simple sums based on the values of $w$ and $\theta$: the numerator is $S_j = \sum_i w_{ij}\theta_i$ and the denominator is $S_d = \sum_i \theta_i$. In the output layer the outputs are computed as $Y_j = S_j / S_d$. We mention that the hidden layer consists of 24 units. The smoothing parameter for GNP is set at 0.01 and for the unemployment rate at 0.05, based on the lowest train and test errors. We propose the AR-GRNN model (Li et al., 2007), in which the output is the series $y_t$ and the inputs are its lagged values $y_{t-1}, y_{t-2}, \dots, y_{t-p}$. So the general form of the AR-GRNN is

$$y_t = F(y_{t-1}, y_{t-2}, \dots, y_{t-p}) \qquad (4)$$

where $F$ is the function produced by the GRNN network. In the case of unemployment we consider the first differences, because we suspect that the unemployment rate is probably not stationary, as the KPSS test indicates, so we apply the following AR(p) function

$$\Delta y_t = F(\Delta y_{t-1}, \Delta y_{t-2}, \dots, \Delta y_{t-p}) \qquad (5)$$

We apply relations (4) and (5) for all the neural network models, and specifically we apply an AR(1) for GNP and an AR(2) for the first differences of the unemployment rate. The technique is the following. Suppose that we have quarterly output data for a period, e.g. 1948:Q1-2006:Q4, which is the variable $y_t$. If we have an AR(1), then we obtain $y_{t-1}$, which is the output series with one lag. This lag refers to the same data over the period 1948:Q2-2007:Q1, which means that we do not discard the last observation but carry it forward to the next period. The same process is followed for the AR(2). So in this paper we estimate over the period 1948:Q1-2006:Q4 and then forecast the period 2007:Q1-2008:Q1. This definition is applied also for the other two neural network models. In all neural network estimations the training sample is the period 1948:Q1-1990:Q4 and the testing sample is the period 1991:Q1-2006:Q4. The general GRNN architecture is presented in figure 1.
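The lag construction and the training/testing split described above can be sketched as follows; the random placeholder series, the lag-column names and the use of pandas are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_ar_design(y: pd.Series, p: int):
    """Build the AR(p) design: inputs y_{t-1},...,y_{t-p}, output y_t."""
    X = pd.concat({f"lag{i}": y.shift(i) for i in range(1, p + 1)}, axis=1)
    return X.iloc[p:], y.iloc[p:]

# Hypothetical quarterly series over 1948:Q1-2006:Q4
idx = pd.period_range("1948Q1", "2006Q4", freq="Q")
y = pd.Series(np.random.default_rng(1).normal(size=len(idx)), index=idx)

X, target = make_ar_design(y, p=1)                        # AR(1) for GNP; p=2 for unemployment differences
train_X, train_y = X.loc[:"1990Q4"], target.loc[:"1990Q4"]   # training sample 1948:Q1-1990:Q4
test_X, test_y = X.loc["1991Q1":], target.loc["1991Q1":]     # testing sample 1991:Q1-2006:Q4
```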
Figure 1. General GRNN architecture
c. Radial Basis Function

The radial basis function network is defined (Bishop, 1995) as

$$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x) + w_{k0} \qquad (6)$$

where $w_{kj}$ are the weights, $w_{k0}$ are the biases and $\phi_j(x)$ is given by

$$\phi_j(x) = \exp\!\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right) \qquad (7)$$
The RBF network consists of three layers: the input layer, whose synaptic and activation functions are linear; the hidden layer, whose synaptic and activation functions are radial and negative exponential respectively; and the output layer, which, like the input layer, has linear synaptic and activation functions. The hidden layer in the RBF estimation has 11 units. The radial spread for both GNP and the unemployment rate has been set at 50, based on the lowest train and test errors, as in the GRNN estimation. The AR-RBF function is defined as in the AR-GRNN case. A general RBF architecture is presented in figure 2.
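As a minimal sketch of equations (6) and (7), the following numpy function computes the RBF network output; the centres, widths and weights are random placeholders rather than the fitted values used in the paper.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights, bias):
    """RBF network output of eqs. (6)-(7): weighted sum of Gaussian basis functions."""
    d2 = np.sum((centers - x) ** 2, axis=1)          # ||x - mu_j||^2 for every hidden unit
    phi = np.exp(-d2 / (2.0 * widths ** 2))          # eq. (7), hidden-layer activations
    return float(np.dot(weights, phi) + bias)        # eq. (6), linear output layer

rng = np.random.default_rng(0)
M = 11                                               # 11 hidden units, as in our estimation
centers = rng.normal(size=(M, 1))                    # centres mu_j (placeholders)
print(rbf_forward(np.array([0.3]), centers, widths=np.full(M, 0.5),
                  weights=rng.normal(size=M), bias=0.0))
```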
Figure 2. General RBF architecture

d. Multilayer perceptron

The last model we estimate is the multilayer perceptron (MLP), which differs from the RBF in two respects (McNelis, 2005). First, the RBF has at most one hidden layer, while the MLP can have more. Second, the activation function in the RBF computes the Euclidean distance between the input vector and the centre of the unit, while the MLP computes the inner product of the inputs and the weights of the unit. The first (input) layer of the MLP has linear synaptic and activation functions, as does the last (output) layer. The hidden layers, which in our case are three, have linear synaptic functions and hyperbolic tangent activation functions. For networks with binary units an MLP with one hidden layer has been shown to suffice, but in our case we have continuous data, so we prefer three hidden layers. In the first phase the back-propagation method is applied. Each layer consists of units which receive their input from the units of the layer directly below and send their output to the units of the layer directly above; the Ni inputs are fed into the first layer of Nh,1 hidden units (Krose & Smagt, 1996). The mathematical formulation of the back-propagation method is as follows. The output of unit $k$ for pattern $t$ is

$$y_k^t = F(s_k^t) \qquad (1)$$
where

$$s_k^t = \sum_j w_{jk}\, y_j^t + \theta_k \qquad (2)$$

is the total input to unit $k$. To get the delta rule we set

$$\Delta_t w_{jk} = -\gamma \frac{\partial E^t}{\partial w_{jk}} \qquad (3)$$

The error measure $E^t$ is defined as the total squared error for pattern $t$ at the output units,

$$E^t = \frac{1}{2}\sum_{i=1}^{N_o} (d_i^t - y_i^t)^2 \qquad (4)$$

where $d_i^t$ is the desired output for unit $i$ and pattern $t$. By the chain rule we can write

$$\frac{\partial E^t}{\partial w_{jk}} = \frac{\partial E^t}{\partial s_k^t}\,\frac{\partial s_k^t}{\partial w_{jk}} \qquad (5)$$

From equation (2) the second factor on the right-hand side of equation (5) equals

$$\frac{\partial s_k^t}{\partial w_{jk}} = y_j^t \qquad (6)$$

and we define the first factor as

$$\delta_k^t = -\frac{\partial E^t}{\partial s_k^t} \qquad (7)$$

so equation (3) can be written as

$$\Delta_t w_{jk} = \gamma\, \delta_k^t\, y_j^t \qquad (8)$$

To compute $\delta_k^t$ we apply the chain rule and write the partial derivative as the product of two factors: one reflecting the change in the error as a function of the output of the unit, and the other reflecting the change in the output as a function of changes in the input,

$$\delta_k^t = -\frac{\partial E^t}{\partial s_k^t} = -\frac{\partial E^t}{\partial y_k^t}\,\frac{\partial y_k^t}{\partial s_k^t} \qquad (9)$$

The second factor of (9) is

$$\frac{\partial y_k^t}{\partial s_k^t} = F'(s_k^t) \qquad (10)$$

which is the derivative of the activation function $F$ for the $k$-th unit. For the first factor, assume first that $k$ is an output unit, $k=i$. Then from (4)

$$\frac{\partial E^t}{\partial y_i^t} = -(d_i^t - y_i^t) \qquad (11)$$

and therefore

$$\delta_i^t = (d_i^t - y_i^t)\, F'(s_i^t) \qquad (12)$$

for any output unit $i$. Second, if $k$ is a hidden unit and not an output unit, $k=h$, the error measure can be written as a function of the net inputs from the hidden to the output layer, and using the chain rule

$$\frac{\partial E^t}{\partial y_h^t} = \sum_{o=1}^{N_o} \frac{\partial E^t}{\partial s_o^t}\, \frac{\partial s_o^t}{\partial y_h^t} = \sum_{o=1}^{N_o} \frac{\partial E^t}{\partial s_o^t}\, \frac{\partial}{\partial y_h^t} \sum_{j=1}^{N_h} w_{jo}\, y_j^t = -\sum_{o=1}^{N_o} \delta_o^t\, w_{ho} \qquad (13)$$

so that

$$\delta_h^t = F'(s_h^t) \sum_{o=1}^{N_o} \delta_o^t\, w_{ho} \qquad (14)$$
In the first phase we use the back-propagation method. In the second phase we use the Levenberg-Marquardt algorithm (Bishop, 1995). Suppose that we have the error function

$$E = \frac{1}{2}\sum_{n} \xi_n^2 \qquad (15)$$

where $\xi_n$ is the error for the $n$-th pattern. Let $W_A$ denote the old weight vector and $W_B$ the new weight vector. Then we can expand the error vector $\xi$ to first order in a Taylor series,

$$\xi(W_B) = \xi(W_A) + Z\,(W_B - W_A) \qquad (16)$$

where $Z$ is the matrix defined as

$$Z_{ni} = \frac{\partial \xi_n}{\partial w_i} \qquad (17)$$

So the error function (15) can be written as

$$E = \frac{1}{2}\,\big\|\xi(W_A) + Z\,(W_B - W_A)\big\|^2 \qquad (18)$$
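For the second training phase, a generic Levenberg-Marquardt least-squares fit of the kind described by equations (15)-(18) can be delegated to scipy, as in the sketch below; the residual function is a simple stand-in for the network's pattern errors, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical residual function xi_n(w) for a linear single-unit "network",
# standing in for the MLP pattern errors of eq. (15).
x = np.linspace(0.0, 1.0, 50)
d = 2.0 * x + 0.3 + np.random.default_rng(0).normal(scale=0.05, size=50)

def residuals(w):
    return w[0] * x + w[1] - d            # xi_n = y_n(w) - d_n

# method="lm" uses the Levenberg-Marquardt algorithm (eqs. 16-18)
fit = least_squares(residuals, x0=np.zeros(2), method="lm")
print(fit.x)
```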
In this paper we estimate an MLP network with three hidden layers of three units each. The learning rate is set at 0.01 and the momentum at 0.3. In the first phase the number of epochs is 100 and in the second phase it is 500. The AR-MLP is defined in the same way as the other two neural network models, the AR-GRNN and the AR-RBF. In figure 3 a general MLP architecture with three hidden layers is presented.
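To make the first (back-propagation) phase concrete, the sketch below implements the delta rule of equations (1)-(14) for a small MLP in numpy, using gradient descent with the learning rate 0.01 and momentum 0.3 mentioned above and tanh hidden units; the network sizes, the placeholder series and the omission of the Levenberg-Marquardt phase are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, hidden=(3, 3, 3), n_out=1):
    """Initialise weights for an MLP with tanh hidden layers and a linear output."""
    sizes = (n_in, *hidden, n_out)
    return [rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]

def forward(W, x):
    """Forward pass; returns the activations of every layer (eqs. 1-2, biases omitted)."""
    acts = [x]
    for i, w in enumerate(W):
        s = acts[-1] @ w                                  # net input s_k = sum_j w_jk y_j
        acts.append(np.tanh(s) if i < len(W) - 1 else s)  # tanh hidden layers, linear output
    return acts

def backprop_step(W, V, x, d, lr=0.01, momentum=0.3):
    """One delta-rule update (eqs. 8, 12, 14) with momentum; V holds the previous updates."""
    acts = forward(W, x)
    delta = d - acts[-1]                                  # output delta (eq. 12; linear output, F' = 1)
    for i in reversed(range(len(W))):
        grad = np.outer(acts[i], delta)                   # Delta w_jk = gamma * delta_k * y_j (eq. 8)
        if i > 0:                                         # delta for the layer below (eq. 14), tanh'(s) = 1 - y^2
            delta_below = (1.0 - acts[i] ** 2) * (W[i] @ delta)
        V[i] = lr * grad + momentum * V[i]
        W[i] += V[i]
        if i > 0:
            delta = delta_below
    return 0.5 * float(np.sum((d - acts[-1]) ** 2))       # eq. (4), evaluated before the update

# Hypothetical usage on an AR(1) series y_t = F(y_{t-1})
y = rng.normal(size=200)
W = init_mlp(n_in=1)
V = [np.zeros_like(w) for w in W]
for epoch in range(100):                                  # first-phase epochs, as in the text
    for t in range(1, len(y)):
        backprop_step(W, V, np.array([y[t - 1]]), np.array([y[t]]))
```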
Figure 3. MLP architecture with three hidden layers
We also apply unit root tests to examine whether the series are I(0), in other words whether they are stationary in levels or only in first differences and above. We apply these tests to decide whether we have an ARMA(p,q) or an ARIMA(p,d,q) process. We apply two tests, the DF GLS (Greene, 2003) and the KPSS (Kwiatkowski et al., 1992). For the DF GLS test we examine the regression with constant and trend,

$$y_t = \alpha + \delta t + \phi\, y_{t-1} + \varepsilon_t \qquad (19)$$

and we test the hypotheses

H0: $\phi = 1$, $\delta = 0$ $\Rightarrow$ $y_t \sim I(1)$ with drift
H1: $|\phi| < 1$ $\Rightarrow$ $y_t \sim I(0)$ with deterministic time trend

which means that if we accept the null hypothesis the series is non-stationary in levels, so it is I(1), while if we reject the null hypothesis the series is stationary, I(0). For the KPSS test the hypotheses are

H0: stationary
H1: non-stationary

The KPSS test is based on the residuals of the OLS regression of $y_t$ on the exogenous variables. Specifically,

$$y_t = \alpha + \beta t + \gamma Z_t + \varepsilon_t \qquad (20)$$

where $Z_t$ is a random walk component. If $\gamma$ equals zero, the process is stationary if $\beta = 0$ and trend-stationary if $\beta \neq 0$. Let $e_t$ denote the OLS residuals, $e_t = y_t - \hat{\alpha} - \hat{\beta} t$.
The KPSS statistic is

$$\text{KPSS} = \frac{1}{T^2} \frac{\sum_{t=1}^{T} S_t^2}{\hat{\lambda}^2}$$

where $S_t = \sum_{i=1}^{t} e_i$ and the long-run variance estimate is

$$\hat{\lambda}^2 = \hat{\gamma}_0 + 2 \sum_{s=1}^{l} \left(1 - \frac{s}{l+1}\right) \hat{\gamma}_s, \qquad \hat{\gamma}_s = \frac{1}{T} \sum_{t=s+1}^{T} e_t\, e_{t-s}$$
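A sketch of how such unit root tests can be run in Python is given below; statsmodels provides the ADF test (a close relative of the DF GLS regression in (19)) and the KPSS test, and the placeholder series and options shown are illustrative.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

# Placeholder series standing in for real GNP growth or the unemployment rate
y = np.cumsum(np.random.default_rng(0).normal(size=236))   # 236 quarters, 1948-2006

# ADF with constant and trend, analogous in spirit to regression (19)
adf_stat, adf_pvalue, *_ = adfuller(y, regression="ct", autolag="AIC")

# KPSS with a deterministic trend, matching regression (20)
kpss_stat, kpss_pvalue, _, kpss_crit = kpss(y, regression="ct", nlags="auto")

print(f"ADF:  {adf_stat:.3f} (p = {adf_pvalue:.3f})")
print(f"KPSS: {kpss_stat:.3f} (p = {kpss_pvalue:.3f}), critical values: {kpss_crit}")
```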
To compare the forecasting performance of the models we examine, we apply two statistical measures, the RMSE (root mean squared error) and the MAE (mean absolute error).
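These two measures can be computed directly from the actual and forecast values, as in this short sketch with placeholder numbers.

```python
import numpy as np

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

def mae(actual, predicted):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

actual = [0.2, 0.9, 1.4, 0.5, 0.1]        # placeholder actual values
forecast = [0.7, 0.8, 0.8, 0.8, 0.8]      # placeholder forecasts
print(rmse(actual, forecast), mae(actual, forecast))
```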
4 Results
Table 1 Unit root tests for real GNP and unemployment rate of USA

Test      Series     t-statistic / LM-stat    Critical values
DF GLS    GNP        -10.466                  -3.46 (1%), -2.92 (5%), -2.62 (10%)
DF GLS    Un. rate   -2.43                    -3.46 (1%), -2.92 (5%), -2.62 (10%)
KPSS      GNP        0.0299                   0.216 (1%), 0.146 (5%), 0.119 (10%)
KPSS      Un. rate   0.2375                   0.216 (1%), 0.146 (5%), 0.119 (10%)
From table 1 we conclude that real GNP is I(0), i.e. stationary in levels, according to both tests. For the unemployment rate the KPSS test indicates an I(1) series; table 2 shows that its first difference is stationary.
Table 2 KPSS unit root test for the first difference of the unemployment rate

Test    Series     LM-stat    Critical values
KPSS    Un. rate   0.0338     0.216 (1%), 0.146 (5%), 0.119 (10%)
According to the three information criteria (Akaike, Hannan-Quinn and Schwarz), we have an ARMA(1,0) process for GNP and an ARIMA(2,1,3) for the unemployment rate. So we apply an AR(1) for the three neural networks in the case of GNP and an AR(2) for the first differences of the unemployment rate. From table 3 we conclude that the neural network models perform better, with the AR-GRNN having the lowest RMSE and MAE, so we prefer the neural networks for forecasting the real GNP of the USA. Specifically, the forecasting RMSE of the neural network models is 7 to 17 per cent lower than the ARIMA counterpart and the MAE is 9 to 22 per cent lower than the MAE of ARIMA.

Table 3 Forecasting comparison between ARIMA and neural networks for the real GNP of USA for the period 2007:Q1-2008:Q1

Model        RMSE     MAE
ARMA(1,0)    0.554    0.502
GRNN         0.460    0.393
RBF          0.500    0.433
MLP          0.515    0.455
In table 4 the conclusions are almost the same as for the GNP results. The neural network models are again more reliable, presenting lower RMSE and MAE than the ARIMA(2,1,3); the AR-MLP and then the AR-GRNN are the best models. In the case of the unemployment rate the RMSE and MAE of the neural networks are, respectively, 45 to 62 and 56 to 67 per cent lower than the ARIMA counterparts. In table 5 we present the actual values of real US GNP and the predicted values generated by the four models.

Table 4 Forecasting comparison between ARIMA and neural networks for the unemployment rate of USA for the period 2007:Q1-2008:Q1

Model           RMSE     MAE
ARIMA(2,1,3)    0.217    0.202
GRNN            0.107    0.089
RBF             0.120    0.084
MLP             0.081    0.066
Table 5 Forecasting values for GNP with the four models

Period     Actual    ARMA(1,0)    GRNN     RBF      MLP
2007:Q1    0.164     0.76379      0.860    0.513    0.729
2007:Q2    0.983     0.80734      0.390    0.941    0.893
2007:Q3    1.411     0.82153      1.001    0.695    0.936
2007:Q4    0.462     0.82615      0.209    0.974    0.790
2008:Q1    0.044     0.82765      0.060    0.643    0.864
Table 6 Forecasting values for the first differences of the unemployment rate with ARIMA(2,1,3) and neural networks

Period     Actual    ARIMA(2,1,3)    GRNN      RBF       MLP
2007:Q1    0.567     0.390           0.659     0.629     0.550
2007:Q2    -0.367    -0.176          -0.300    -0.328    -0.280
2007:Q3    0.234     0.391           0.197     0.269     0.384
2007:Q4    -0.100    -0.232          0.100     0.155     -0.144
2008:Q1    0.700     0.342           0.749     0.722     0.667
In table 6 we present the actual and predicted first differences of US unemployment with the ARIMA(2,1,3) and the three neural network models. In figure 4 we present the forecasts for US real GNP during the period 2007:Q1-2008:Q1, while figure 5 presents the forecasting results for US unemployment over the same period.
Figure 4. Actual against forecast values for US GNP in the period 2007:Q1-2008:Q1 with: (a) ARMA(1,0), (b) GRNN, (c) RBF and (d) MLP
Figure 5. Actual against forecast values for the first differences of US unemployment in the period 2007:Q1-2008:Q1 with: (a) ARIMA(2,1,3), (b) GRNN, (c) RBF and (d) MLP
5 Conclusion

We examined the forecasting performance of a traditional time series method, the ARIMA process, in comparison with three neural network models. We considered three of the most common models: the generalized regression neural network (GRNN), the radial basis function (RBF) network and the multilayer perceptron (MLP). We used the autoregressive (AR) form of these neural models, in which the inputs are simply the lagged values of the output series. We configured the AR(p) order as indicated by the unit root tests and the information criteria, so we have an AR(1) for the real gross national product (GNP) and an AR(2) for the first differences of the unemployment rate of the USA. We show that all the neural models outperform the ARIMA process, so we conclude that traditional time series and econometric methods are not always the best or even the only choice; we should also look to more sophisticated modelling, such as neural networks, which is able to capture non-linear processes with great success.
REFERENCES

Aryal R.D. & Yao-Wu W. (2003). Neural Network Forecasting of the Production Level of Chinese Construction Industry. Journal of Comparative International Management, 29, 319-33.

Bishop C.M. (1995). Neural Networks for Pattern Recognition. pp. 164-170, 290-291. Oxford: Clarendon Press.

Graupe D. (2007). Principles of Artificial Neural Networks. 2nd Edition. USA: World Scientific Publishing.

Greene W.H. (2003). Econometric Analysis. Fifth Edition, pp. 637-640. New Jersey: Pearson Education.

Gujarati D. (2004). Basic Econometrics. Fourth Edition, pp. 839-840. USA: McGraw-Hill.

Krose B. & van der Smagt P. (1996). An Introduction to Neural Networks. Eighth Edition, pp. 33-37. The University of Amsterdam.

Kwiatkowski D., Phillips P.C.B., Schmidt P. & Shin Y. (1992). Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics, 54, 159-178.

Li W., Luo Y., Zhu Q., Liu J. & Le J. (2007). Applications of AR*-GRNN model for financial time series forecasting. Neural Computing & Applications. London: Springer.

Maasoumi E., Khotanzad A. & Abaye A. (1996). Artificial neural networks for some macroeconomic series: a first report. Econometric Reviews, 13(1), 105-122.

McNelis P.D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the Market. p. 21. USA: Elsevier Academic Press.

Swanson N.R. & White H. (1997a). A model selection approach to real time macroeconomic forecasting using linear models and artificial neural networks. Review of Economics and Statistics, 79, 540-50.

Swanson N.R. & White H. (1997b). Forecasting economic time series using adaptive versus non-adaptive and linear versus nonlinear econometric models. International Journal of Forecasting, 13, 439-61.

Tkacz G. & Hu S. (1999). Forecasting GDP Growth Using Artificial Neural Networks. Working Paper 99-3, Bank of Canada.

Tkacz G. (2001). Neural network forecasting of Canadian GDP growth. International Journal of Forecasting, 17, 57-69.