ARIMA and Neural Networks. An application to the real GNP growth rate and the unemployment rate of U.S.A.
Eleftherios Giovanis
Abstract
This paper examines the estimation and forecasting performance of ARIMA models in comparison with some of the most popular and common neural network models. Specifically, we provide the estimation results of the AR-GRNN (generalized regression neural network), the AR-RBF (radial basis function) and the AR-MLP (multilayer perceptron). We show that the neural network models outperform ARIMA in forecasting. We find that the best model in the case of real US GNP is the AR-GRNN, while for the US unemployment rate it is the AR-MLP.
Keywords: ARIMA; Radial basis function; Multilayer perceptron; Generalized regression neural networks; stationarity; unit root
1 Introduction

Artificial neural networks are computational networks which aim to simulate the nerve cells, or neurons, of the biological nervous system of humans or animals (Graupe, 2007). The difference between neural networks and other estimation and approximation methods is that neural networks include hidden layers, in which the input variables are transformed by special functions, such as the logistic or the negative exponential, among many others. With these hidden layers and synaptic functions the approach can prove very efficient for modelling and estimating nonlinear processes (McNelis, 2005). In this paper we deal with two macroeconomic series which are characterized by trend and cyclicality.
Aryal and Yao-Wu (2003) applied an MLP network with three hidden layers to forecast the Chinese construction industry and compared the forecasting performance of the MLP networks with that of ARIMA. They found that the RMSE of the MLP estimation is 49 per cent lower than the ARIMA counterpart. Maasoumi et al. (1996) applied a back-propagation ANN model to forecast, among other series, GDP and the unemployment rate; the network they apply is a single-hidden-layer feedforward network. Swanson and White (1997a, 1997b) applied neural networks to forecast nine seasonally adjusted US macroeconomic time series and found that neural networks generally outperform the linear models. Tkacz and Hu (1999) applied neural networks to forecast Canadian GDP growth at the 4-quarter horizon and found that the gain in forecast accuracy is statistically significant, while the performance at the 1-quarter horizon is poor. They also found that the best neural network models outperform the best linear models by 15 to 19 per cent at the 4-quarter horizon. Tkacz (2001) found that neural networks produce lower forecasting errors for the yearly growth rate of real Canadian GDP relative to linear and univariate models.
2 Data

The data concern quarterly series of the real gross national product (GNP) and the unemployment rate for the economy of the USA during the period 1948-2006. The data have been obtained from the Federal Reserve Bank of St. Louis.
3 Methodology

a. Autoregressive moving average

The first model we estimate is the ARMA process (Gujarati, 2004), defined as

$$y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (1)$$

This is the ARMA(p,q) process. If the series are not stationary in their levels, i.e. they are not I(0), then we have to estimate an ARIMA(p,d,q) process instead (Gujarati, 2004).
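As an illustration only, the following sketch shows how such an ARMA/ARIMA specification can be estimated and used for a short out-of-sample forecast with Python's statsmodels library; the file name, column name and order shown are placeholders rather than the paper's exact data handling.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical quarterly series; the paper uses US real GNP growth and the
# unemployment rate from the Federal Reserve Bank of St. Louis.
data = pd.read_csv("gnp_growth.csv", index_col="date", parse_dates=True)
y = data["gnp_growth"]

# ARIMA(p, d, q): here an ARMA(1, 0) on the level, i.e. d = 0,
# as selected later in the paper for real GNP growth.
model = ARIMA(y, order=(1, 0, 0)).fit()
print(model.summary())

# Five-quarter out-of-sample forecast (e.g. 2007:Q1-2008:Q1)
print(model.forecast(steps=5))
```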
b. Generalized Regression Neural Networks

The GRNN is defined as

$$E[y \mid x] = \frac{\displaystyle\int_{-\infty}^{\infty} y\, g(x,y)\, dy}{\displaystyle\int_{-\infty}^{\infty} g(x,y)\, dy} \qquad (2)$$

where $E[y \mid x]$ is the expected value of $y$ given $x$ and $g(x,y)$ is the Parzen probability density estimator. If $g(x,y)$ is unknown, it can be estimated from a sample of observations of $x$ and $y$. The predicted output obtained by the GRNN is

$$\hat{y}(x) = \frac{\displaystyle\sum_{i=1}^{n} y_i \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)}{\displaystyle\sum_{i=1}^{n} \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)} \qquad (3)$$
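To make equation (3) concrete, the following numpy sketch implements the GRNN predictor as a kernel-weighted average; the toy training arrays and the value of the smoothing parameter sigma are illustrative assumptions, not the configuration estimated below.

```python
import numpy as np

def grnn_predict(x, x_train, y_train, sigma=0.05):
    """GRNN prediction of eq. (3): a kernel-weighted average of the training outputs."""
    # Squared Euclidean distances ||x - x_i||^2 between the query and every training input
    d2 = np.sum((x_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer outputs theta_i
    return np.dot(w, y_train) / np.sum(w)     # summation layer: numerator / denominator

# Hypothetical AR(1) usage: inputs are lagged values y_{t-1}, output is y_t
y = np.array([0.5, 0.8, 0.6, 0.9, 0.7, 0.4])
x_train, y_train = y[:-1].reshape(-1, 1), y[1:]
print(grnn_predict(np.array([0.65]), x_train, y_train))
```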
Usually the GRNN consists of four layers. In the first layer, which receives the input data, the synaptic and activation functions are linear. In the second layer, the pattern layer, the synaptic function is radial and the activation function is the negative exponential. The third layer, the summation layer, has, like the first layer, linear synaptic and activation functions. The last layer, the output layer, has a division as synaptic function and a linear activation function. More specifically, the input layer receives the input vector $x$ and distributes the data to the pattern layer. Each neuron in the pattern layer generates an output

$$\theta_i = \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma_i^2}\right)$$

and presents the result to the summation layer. In this layer the numerator and denominator neurons compute the weighted and simple sums based on the values of $w$ and $\theta$: the numerator is $S_j = \sum_i w_{ij}\theta_i$ and the denominator is $S_d = \sum_i \theta_i$. In the output layer the outputs are computed as $Y_j = S_j / S_d$. We mention that the hidden layer consists of 24 units. The smoothing parameter for GNP is set at 0.01 and for the unemployment rate at 0.05, based on the lowest train and test errors. We propose the AR-GRNN model (Li et al., 2007), in which the output is the series $y_t$ and the inputs are its lagged values $y_{t-1}, y_{t-2}, \dots, y_{t-p}$. So the general form of the AR-GRNN is

$$y_t = F(y_{t-1}, y_{t-2}, \dots, y_{t-p}) \qquad (4)$$

where $F$ is the function produced by the GRNN network. In the case of unemployment we consider the first differences, because we suspect that the unemployment rate is probably not stationary, as the KPSS test indicates, so we apply the following AR(p) function

$$\Delta y_t = F(\Delta y_{t-1}, \Delta y_{t-2}, \dots, \Delta y_{t-p}) \qquad (5)$$

We apply relations (4) and (5) for all the neural network models, and specifically we apply an AR(1) for GNP and an AR(2) for the first differences of the unemployment rate. The technique is the following. Suppose that we have quarterly output data for a period, e.g. 1948:Q1-2006:Q4, which is the variable $y_t$. If we have an AR(1), then we obtain $y_{t-1}$, which is the output series with one lag. This lag refers to the same data over the period 1948:Q2-2007:Q1, which means that we do not discard the last observation but carry it forward to the next period. The same process is followed for the AR(2). So in this paper we estimate over the period 1948:Q1-2006:Q4 and then forecast the period 2007:Q1-2008:Q1. This definition is applied also for the other two neural network models. In all neural network estimations the training sample is the period 1948:Q1-1990:Q4 and the testing sample is the period 1991:Q1-2006:Q4. The general GRNN architecture is presented in figure 1.
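The lag construction and the training/testing split described above can be sketched as follows; the random placeholder series, the lag-column names and the use of pandas are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_ar_design(y: pd.Series, p: int):
    """Build the AR(p) design: inputs y_{t-1},...,y_{t-p}, output y_t."""
    X = pd.concat({f"lag{i}": y.shift(i) for i in range(1, p + 1)}, axis=1)
    return X.iloc[p:], y.iloc[p:]

# Hypothetical quarterly series over 1948:Q1-2006:Q4
idx = pd.period_range("1948Q1", "2006Q4", freq="Q")
y = pd.Series(np.random.default_rng(1).normal(size=len(idx)), index=idx)

X, target = make_ar_design(y, p=1)                        # AR(1) for GNP; p=2 for unemployment differences
train_X, train_y = X.loc[:"1990Q4"], target.loc[:"1990Q4"]   # training sample 1948:Q1-1990:Q4
test_X, test_y = X.loc["1991Q1":], target.loc["1991Q1":]     # testing sample 1991:Q1-2006:Q4
```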
Figure 1. General GRNN architecture
c. Radial Basis Function

The radial basis function network is defined (Bishop, 1995) as

$$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x) + w_{k0} \qquad (6)$$

where $w_{kj}$ are the weights, $w_{k0}$ are the biases and $\phi_j(x)$ is given by

$$\phi_j(x) = \exp\!\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right) \qquad (7)$$
The RBF network consists of three layers: the input layer, whose synaptic and activation functions are linear; the hidden layer, whose synaptic and activation functions are radial and negative exponential respectively; and the output layer, which, like the input layer, has linear synaptic and activation functions. The hidden layer in the RBF estimation has 11 units. The radial spread for both GNP and the unemployment rate has been set at 50, based on the lowest train and test errors, as in the GRNN estimation. The AR-RBF function is defined as in the AR-GRNN case. A general RBF architecture is presented in figure 2.
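As a minimal sketch of equations (6) and (7), the following numpy function computes the RBF network output; the centres, widths and weights are random placeholders rather than the fitted values used in the paper.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights, bias):
    """RBF network output of eqs. (6)-(7): weighted sum of Gaussian basis functions."""
    d2 = np.sum((centers - x) ** 2, axis=1)          # ||x - mu_j||^2 for every hidden unit
    phi = np.exp(-d2 / (2.0 * widths ** 2))          # eq. (7), hidden-layer activations
    return float(np.dot(weights, phi) + bias)        # eq. (6), linear output layer

rng = np.random.default_rng(0)
M = 11                                               # 11 hidden units, as in our estimation
centers = rng.normal(size=(M, 1))                    # centres mu_j (placeholders)
print(rbf_forward(np.array([0.3]), centers, widths=np.full(M, 0.5),
                  weights=rng.normal(size=M), bias=0.0))
```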
Figure 2. General RBF architecture

d. Multilayer perceptron

The last model we estimate is the multilayer perceptron (MLP), which differs from the RBF in two respects (McNelis, 2005). First, the RBF has at most one hidden layer, while the MLP can have more. Second, the activation function in the RBF computes the Euclidean distance between the input vector and the centre of the unit, while the MLP computes the inner product of the inputs and the weights of the unit. The first (input) layer of the MLP has linear synaptic and activation functions, as does the last (output) layer. The hidden layers, which in our case are three, have linear synaptic functions and hyperbolic tangent activation functions. For networks with binary units an MLP with one hidden layer has been shown to suffice, but in our case we have continuous data, so we prefer three hidden layers. In the first phase the back-propagation method is applied. Each layer consists of units which receive their input from the units of the layer directly below and send their output to the units of the layer directly above; the Ni inputs are fed into the first layer of Nh,1 hidden units (Krose & Smagt, 1996). The mathematical formulation of the back-propagation method is as follows. The output of unit $k$ for pattern $t$ is

$$y_k^t = F(s_k^t) \qquad (1)$$
where

$$s_k^t = \sum_j w_{jk}\, y_j^t + \theta_k \qquad (2)$$

is the total input to unit $k$. To get the delta rule we set

$$\Delta_t w_{jk} = -\gamma \frac{\partial E^t}{\partial w_{jk}} \qquad (3)$$

The error measure $E^t$ is defined as the total squared error for pattern $t$ at the output units,

$$E^t = \frac{1}{2}\sum_{i=1}^{N_o} (d_i^t - y_i^t)^2 \qquad (4)$$

where $d_i^t$ is the desired output for unit $i$ and pattern $t$. By the chain rule we can write

$$\frac{\partial E^t}{\partial w_{jk}} = \frac{\partial E^t}{\partial s_k^t}\,\frac{\partial s_k^t}{\partial w_{jk}} \qquad (5)$$

From equation (2) the second factor on the right-hand side of equation (5) equals

$$\frac{\partial s_k^t}{\partial w_{jk}} = y_j^t \qquad (6)$$

and we define the first factor as

$$\delta_k^t = -\frac{\partial E^t}{\partial s_k^t} \qquad (7)$$

so equation (3) can be written as

$$\Delta_t w_{jk} = \gamma\, \delta_k^t\, y_j^t \qquad (8)$$

To compute $\delta_k^t$ we apply the chain rule and write the partial derivative as the product of two factors: one reflecting the change in the error as a function of the output of the unit, and the other reflecting the change in the output as a function of changes in the input,

$$\delta_k^t = -\frac{\partial E^t}{\partial s_k^t} = -\frac{\partial E^t}{\partial y_k^t}\,\frac{\partial y_k^t}{\partial s_k^t} \qquad (9)$$

The second factor of (9) is

$$\frac{\partial y_k^t}{\partial s_k^t} = F'(s_k^t) \qquad (10)$$

which is the derivative of the activation function $F$ for the $k$-th unit. For the first factor, assume first that $k$ is an output unit, $k=i$. Then from (4)

$$\frac{\partial E^t}{\partial y_i^t} = -(d_i^t - y_i^t) \qquad (11)$$

and therefore

$$\delta_i^t = (d_i^t - y_i^t)\, F'(s_i^t) \qquad (12)$$

for any output unit $i$. Second, if $k$ is a hidden unit and not an output unit, $k=h$, the error measure can be written as a function of the net inputs from the hidden to the output layer, and using the chain rule

$$\frac{\partial E^t}{\partial y_h^t} = \sum_{o=1}^{N_o} \frac{\partial E^t}{\partial s_o^t}\, \frac{\partial s_o^t}{\partial y_h^t} = \sum_{o=1}^{N_o} \frac{\partial E^t}{\partial s_o^t}\, \frac{\partial}{\partial y_h^t} \sum_{j=1}^{N_h} w_{jo}\, y_j^t = -\sum_{o=1}^{N_o} \delta_o^t\, w_{ho} \qquad (13)$$

so that

$$\delta_h^t = F'(s_h^t) \sum_{o=1}^{N_o} \delta_o^t\, w_{ho} \qquad (14)$$
In the first phase we use the back-propagation method. In the second phase we use the Levenberg-Marquardt algorithm (Bishop, 1995). Suppose that we have the error function

$$E = \frac{1}{2}\sum_{n} \xi_n^2 \qquad (15)$$

where $\xi_n$ is the error for the $n$-th pattern. Let $W_A$ denote the old weight vector and $W_B$ the new weight vector. Then we can expand the error vector $\xi$ to first order in a Taylor series,

$$\xi(W_B) = \xi(W_A) + Z\,(W_B - W_A) \qquad (16)$$

where $Z$ is the matrix defined as

$$Z_{ni} = \frac{\partial \xi_n}{\partial w_i} \qquad (17)$$

So the error function (15) can be written as

$$E = \frac{1}{2}\,\big\|\xi(W_A) + Z\,(W_B - W_A)\big\|^2 \qquad (18)$$
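For the second training phase, a generic Levenberg-Marquardt least-squares fit of the kind described by equations (15)-(18) can be delegated to scipy, as in the sketch below; the residual function is a simple stand-in for the network's pattern errors, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical residual function xi_n(w) for a linear single-unit "network",
# standing in for the MLP pattern errors of eq. (15).
x = np.linspace(0.0, 1.0, 50)
d = 2.0 * x + 0.3 + np.random.default_rng(0).normal(scale=0.05, size=50)

def residuals(w):
    return w[0] * x + w[1] - d            # xi_n = y_n(w) - d_n

# method="lm" uses the Levenberg-Marquardt algorithm (eqs. 16-18)
fit = least_squares(residuals, x0=np.zeros(2), method="lm")
print(fit.x)
```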
In this paper we estimate an MLP network with three hidden layers of three units each. The learning rate is set at 0.01 and the momentum at 0.3. In the first phase the number of epochs is 100 and in the second phase it is 500. The AR-MLP is defined in the same way as the other two neural network models, the AR-GRNN and the AR-RBF. In figure 3 a general MLP architecture with three hidden layers is presented.
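To make the first (back-propagation) phase concrete, the sketch below implements the delta rule of equations (1)-(14) for a small MLP in numpy, using gradient descent with the learning rate 0.01 and momentum 0.3 mentioned above and tanh hidden units; the network sizes, the placeholder series and the omission of the Levenberg-Marquardt phase are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, hidden=(3, 3, 3), n_out=1):
    """Initialise weights for an MLP with tanh hidden layers and a linear output."""
    sizes = (n_in, *hidden, n_out)
    return [rng.normal(scale=0.1, size=(sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]

def forward(W, x):
    """Forward pass; returns the activations of every layer (eqs. 1-2, biases omitted)."""
    acts = [x]
    for i, w in enumerate(W):
        s = acts[-1] @ w                                  # net input s_k = sum_j w_jk y_j
        acts.append(np.tanh(s) if i < len(W) - 1 else s)  # tanh hidden layers, linear output
    return acts

def backprop_step(W, V, x, d, lr=0.01, momentum=0.3):
    """One delta-rule update (eqs. 8, 12, 14) with momentum; V holds the previous updates."""
    acts = forward(W, x)
    delta = d - acts[-1]                                  # output delta (eq. 12; linear output, F' = 1)
    for i in reversed(range(len(W))):
        grad = np.outer(acts[i], delta)                   # Delta w_jk = gamma * delta_k * y_j (eq. 8)
        if i > 0:                                         # delta for the layer below (eq. 14), tanh'(s) = 1 - y^2
            delta_below = (1.0 - acts[i] ** 2) * (W[i] @ delta)
        V[i] = lr * grad + momentum * V[i]
        W[i] += V[i]
        if i > 0:
            delta = delta_below
    return 0.5 * float(np.sum((d - acts[-1]) ** 2))       # eq. (4), evaluated before the update

# Hypothetical usage on an AR(1) series y_t = F(y_{t-1})
y = rng.normal(size=200)
W = init_mlp(n_in=1)
V = [np.zeros_like(w) for w in W]
for epoch in range(100):                                  # first-phase epochs, as in the text
    for t in range(1, len(y)):
        backprop_step(W, V, np.array([y[t - 1]]), np.array([y[t]]))
```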
Figure 3. MLP architecture with three hidden layers
We also apply unit root tests to examine whether the series are I(0), in other words whether they are stationary in levels or only in first differences and above. We apply these tests to decide whether we have an ARMA(p,q) or an ARIMA(p,d,q) process. We apply two tests, the DF GLS (Greene, 2003) and the KPSS (Kwiatkowski et al., 1992). For the DF GLS test we examine the regression with constant and trend,

$$y_t = \alpha + \delta t + \phi\, y_{t-1} + \varepsilon_t \qquad (19)$$

and we test the hypotheses

H0: $\phi = 1$, $\delta = 0$ $\Rightarrow$ $y_t \sim I(1)$ with drift
H1: $|\phi| < 1$ $\Rightarrow$ $y_t \sim I(0)$ with deterministic time trend

which means that if we accept the null hypothesis the series is non-stationary in levels, so it is I(1), while if we reject the null hypothesis the series is stationary, I(0). For the KPSS test the hypotheses are

H0: stationary
H1: non-stationary

The KPSS test is based on the residuals of the OLS regression of $y_t$ on the exogenous variables. Specifically,

$$y_t = \alpha + \beta t + \gamma Z_t + \varepsilon_t \qquad (20)$$

where $Z_t$ is a random walk component. If $\gamma$ equals zero, the process is stationary if $\beta = 0$ and trend-stationary if $\beta \neq 0$. Let $e_t$ denote the OLS residuals, $e_t = y_t - \hat{\alpha} - \hat{\beta} t$.
The KPSS statistic is

$$\text{KPSS} = \frac{1}{T^2} \frac{\sum_{t=1}^{T} S_t^2}{\hat{\lambda}^2}$$

where $S_t = \sum_{i=1}^{t} e_i$ and the long-run variance estimate is

$$\hat{\lambda}^2 = \hat{\gamma}_0 + 2 \sum_{s=1}^{l} \left(1 - \frac{s}{l+1}\right) \hat{\gamma}_s, \qquad \hat{\gamma}_s = \frac{1}{T} \sum_{t=s+1}^{T} e_t\, e_{t-s}$$
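A sketch of how such unit root tests can be run in Python is given below; statsmodels provides the ADF test (a close relative of the DF GLS regression in (19)) and the KPSS test, and the placeholder series and options shown are illustrative.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

# Placeholder series standing in for real GNP growth or the unemployment rate
y = np.cumsum(np.random.default_rng(0).normal(size=236))   # 236 quarters, 1948-2006

# ADF with constant and trend, analogous in spirit to regression (19)
adf_stat, adf_pvalue, *_ = adfuller(y, regression="ct", autolag="AIC")

# KPSS with a deterministic trend, matching regression (20)
kpss_stat, kpss_pvalue, _, kpss_crit = kpss(y, regression="ct", nlags="auto")

print(f"ADF:  {adf_stat:.3f} (p = {adf_pvalue:.3f})")
print(f"KPSS: {kpss_stat:.3f} (p = {kpss_pvalue:.3f}), critical values: {kpss_crit}")
```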
To compare the forecasting performance of the models we examine, we apply two statistical measures, the RMSE (root mean squared error) and the MAE (mean absolute error).
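These two measures can be computed directly from the actual and forecast values, as in this short sketch with placeholder numbers.

```python
import numpy as np

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

def mae(actual, predicted):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

actual = [0.2, 0.9, 1.4, 0.5, 0.1]        # placeholder actual values
forecast = [0.7, 0.8, 0.8, 0.8, 0.8]      # placeholder forecasts
print(rmse(actual, forecast), mae(actual, forecast))
```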
4 Results
Table 1 Unit root tests for real GNP and unemployment rate of USA

Test      Series     t-statistic / LM-stat    Critical values
DF GLS    GNP        -10.466                  -3.46 (1%), -2.92 (5%), -2.62 (10%)
DF GLS    Un. rate   -2.43                    -3.46 (1%), -2.92 (5%), -2.62 (10%)
KPSS      GNP        0.0299                   0.216 (1%), 0.146 (5%), 0.119 (10%)
KPSS      Un. rate   0.2375                   0.216 (1%), 0.146 (5%), 0.119 (10%)
From table 1 we conclude that real GNP is I(0), i.e. stationary in levels, according to both tests. For the unemployment rate the KPSS test indicates an I(1) series; table 2 shows that its first difference is stationary.
Table 2 KPSS unit root test for the first difference of the unemployment rate

Test    Series     LM-stat    Critical values
KPSS    Un. rate   0.0338     0.216 (1%), 0.146 (5%), 0.119 (10%)
According to the three information criteria (Akaike, Hannan-Quinn and Schwarz), we have an ARMA(1,0) process for GNP and an ARIMA(2,1,3) for the unemployment rate. So we apply an AR(1) for the three neural networks in the case of GNP and an AR(2) for the first differences of the unemployment rate. From table 3 we conclude that the neural network models perform better, with the AR-GRNN having the lowest RMSE and MAE, so we prefer the neural networks for forecasting the real GNP of the USA. Specifically, the forecasting RMSE of the neural network models is 7 to 17 per cent lower than the ARIMA counterpart and the MAE is 9 to 22 per cent lower than the MAE of ARIMA.

Table 3 Forecasting comparison between ARIMA and neural networks for the real GNP of USA for the period 2007:Q1-2008:Q1

Model        RMSE     MAE
ARMA(1,0)    0.554    0.502
GRNN         0.460    0.393
RBF          0.500    0.433
MLP          0.515    0.455
In table 4 the conclusions are almost the same as for the GNP results. The neural network models are again more reliable, presenting lower RMSE and MAE than the ARIMA(2,1,3); the AR-MLP and then the AR-GRNN are the best models. In the case of the unemployment rate the RMSE and MAE of the neural networks are, respectively, 45 to 62 and 56 to 67 per cent lower than the ARIMA counterparts. In table 5 we present the actual values of real US GNP and the predicted values generated by the four models.

Table 4 Forecasting comparison between ARIMA and neural networks for the unemployment rate of USA for the period 2007:Q1-2008:Q1

Model           RMSE     MAE
ARIMA(2,1,3)    0.217    0.202
GRNN            0.107    0.089
RBF             0.120    0.084
MLP             0.081    0.066
Table 5 Forecasting values for GNP with the four models

Period     Actual    ARMA(1,0)    GRNN     RBF      MLP
2007:Q1    0.164     0.76379      0.860    0.513    0.729
2007:Q2    0.983     0.80734      0.390    0.941    0.893
2007:Q3    1.411     0.82153      1.001    0.695    0.936
2007:Q4    0.462     0.82615      0.209    0.974    0.790
2008:Q1    0.044     0.82765      0.060    0.643    0.864
Table 6 Forecasting values for the first differences of the unemployment rate with ARIMA(2,1,3) and neural networks

Period     Actual    ARIMA(2,1,3)    GRNN      RBF       MLP
2007:Q1    0.567     0.390           0.659     0.629     0.550
2007:Q2    -0.367    -0.176          -0.300    -0.328    -0.280
2007:Q3    0.234     0.391           0.197     0.269     0.384
2007:Q4    -0.100    -0.232          0.100     0.155     -0.144
2008:Q1    0.700     0.342           0.749     0.722     0.667
In table 6 we present the actual and predicted first differences of US unemployment with the ARIMA(2,1,3) and the three neural network models. In figure 4 we present the forecasts for US real GNP during the period 2007:Q1-2008:Q1, while figure 5 presents the forecasting results for US unemployment over the same period.
Figure 4. Actual against forecast values for US GNP in the period 2007:Q1-2008:Q1 with: (a) ARMA(1,0), (b) GRNN, (c) RBF and (d) MLP
Figure 5. Actual against forecast values for the first differences of US unemployment in the period 2007:Q1-2008:Q1 with: (a) ARIMA(2,1,3), (b) GRNN, (c) RBF and (d) MLP
5 Conclusion

We examined the forecasting performance of a traditional time series method, the ARIMA process, in comparison with three neural network models. We considered three of the most common models: the generalized regression neural network (GRNN), the radial basis function (RBF) network and the multilayer perceptron (MLP). We used the autoregressive (AR) form of these neural models, in which the inputs are simply the lagged values of the output series. We configured the AR(p) order as indicated by the unit root tests and the information criteria, so we have an AR(1) for the real gross national product (GNP) and an AR(2) for the first differences of the unemployment rate of the USA. We show that all the neural models outperform the ARIMA process, so we conclude that traditional time series and econometric methods are not always the best or even the only choice; we should also look to more sophisticated modelling, such as neural networks, which is able to capture non-linear processes with great success.
REFERENCES

Aryal R.D. & Yao-Wu W. (2003). Neural Network Forecasting of the Production Level of Chinese Construction Industry. Journal of Comparative International Management, 29, 319-33.

Bishop C.M. (1995). Neural Networks for Pattern Recognition. pp. 164-170, 290-291. Oxford: Clarendon Press.

Graupe D. (2007). Principles of Artificial Neural Networks. 2nd Edition. USA: World Scientific Publishing.

Greene W.H. (2003). Econometric Analysis. Fifth Edition, pp. 637-640. New Jersey: Pearson Education.

Gujarati D. (2004). Basic Econometrics. Fourth Edition, pp. 839-840. USA: McGraw-Hill.

Krose B. & van der Smagt P. (1996). An Introduction to Neural Networks. Eighth Edition, pp. 33-37. The University of Amsterdam.

Kwiatkowski D., Phillips P.C.B., Schmidt P. & Shin Y. (1992). Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics, 54, 159-178.

Li W., Luo Y., Zhu Q., Liu J. & Le J. (2007). Applications of AR*-GRNN model for financial time series forecasting. Neural Computing & Applications. London: Springer.

Maasoumi E., Khotanzad A. & Abaye A. (1996). Artificial neural networks for some macroeconomic series: a first report. Econometric Reviews, 13(1), 105-122.

McNelis P.D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the Market. p. 21. USA: Elsevier Academic Press.

Swanson N.R. & White H. (1997a). A model selection approach to real time macroeconomic forecasting using linear models and artificial neural networks. Review of Economics and Statistics, 79, 540-50.

Swanson N.R. & White H. (1997b). Forecasting economic time series using adaptive versus non-adaptive and linear versus nonlinear econometric models. International Journal of Forecasting, 13, 439-61.

Tkacz G. & Hu S. (1999). Forecasting GDP Growth Using Artificial Neural Networks. Working Paper 99-3, Bank of Canada.

Tkacz G. (2001). Neural network forecasting of Canadian GDP growth. International Journal of Forecasting, 17, 57-69.