Time series analysis concerns the mathematical modeling of time varying phenomena, e.g., ocean waves, water levels in lakes and rivers, demand for electrical power, radar signals, muscular reactions, ECG-signals, or option prices at the stock market. This book gives a comprehensive presentation of stochastic models and methods in time series analysis. The book treats stochastic vectors and both univariate and multivariate stochastic processes, as well as how these can be used to identify suitable models for various forms of observations. Furthermore, different approaches such as least squares, the prediction error method, and maximum likelihood are treated in detail, together with results on the Cramér-Rao lower bound, dictating the theoretically possible estimation accuracy. Residual analysis and prediction of stochastic models are also treated, as well as how one may form time-varying models, including the recursive least squares and the Kalman filter. The book discusses how to implement the various methods using Matlab, and several Matlab functions and data sets are provided with the book. The book provides an introduction to time series modeling of various forms of measurements, focusing on how such models may be identified and detailed. It has a practical approach, and include several examples illustrating the theory.
| An Introduction to Time Series Modeling
An Introduction to Time Series Modeling
Andreas Jakobsson
Andreas Jakobsson received his M.Sc. from Lund Institute of Technology and his Ph.D. in Signal Processing from Uppsala University in 1993 and 2000, respectively. Since, he has held positions with Global IP Sound AB, the Swedish Royal Institute of Technology, King’s College London, and Karlstad University, held an Honorary Research Fellowship at Cardiff University, as well as acted as an expert for the IAEA. He is currently Professor of Mathematical Statistics at Lund University, Sweden. His research interests include statistical and array signal processing, detection and estimation theory, and related application in remote sensing, telecommunication and biomedicine.
The book is aimed at advanced undergraduate and junior graduate s tudents in statistics, mathematics, or engineering. Helpful prerequisites include courses in multivariate analysis, linear systems, basic probability, and stochastic processes. Art.nr 36415
Andra upplagan
2:a uppl.
An Introduction to
Time Series Modeling
Andreas Jakobsson
www.studentlitteratur.se
978-91-44-10836-0_01_cover.indd 1
2015-10-06 12:04
Copying prohibited All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. �e papers and inks used in this product are eco-friendly. Art. No ����� ���� ���-��-��-�����-� Edition �:� © �e Author och Studentlitteratur ���� www.studentlitteratur.se Studentlitteratur AB, Lund
Printed by Holmbergs i Malmö AB, Sweden 2015
October �, ���� – sida � – � �
CONTENTS
Preface � Abbreviations � Notational conventions ��
CHAPTER �
Introduction ��
CHAPTER �
Stochastic vectors ��
�.� �.� �.�.� �.�.� �.�.� �.�.� �.�
Introduction �� Stochastic vectors �� Properties and peculiarities �� Conditional expectations �� Normal distributed vectors �� Linear projections of Normal distributed vectors �� Exercises ��
CHAPTER �
�.� �.� �.�.� �.�.� �.� �.�
Stochastic processes ��
Introduction �� Properties and peculiarities �� Estimating the mean and the covariance sequence �� Vector representation �� �e power spectral density �� Filtering of a stochastic process ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
�
October �, ���� – sida � – � �
��������
�.� �.�.� �.�.� �.�.� �.�.� �.� �.�
�e basic linear processes �� �e moving average process �� �e autoregressive process �� �e Levinson-Durbin algorithm� �� �e ARMA process �� Estimating the power spectral density �� Exercises ��
CHAPTER �
�.� �.� �.�.� �.�.� �.�.� �.� �.�.� �.�.� �.�.� �.�.� �.� �.� �.� �.� �.�
Introduction �� Finding an appropriate model structure �� �e partial autocorrelation function �� �e inverse autocorrelation function� ��� �e extended sample autocorrelation function� ��� Data with trends and seasons ��� Deterministic trend ��� Stochastic trend ��� Constant trend ��� Seasonal trend ��� Using a transformation to stabilize the variance ��� Transfer function models ��� Intervention analysis� ��� Outliers and robust estimation� ��� Exercises ���
CHAPTER �
�.� �.� �.�.� �.�.� �.�.� �.�.� �.�.� �.� �.�.� �
Identi�cation and modeling ��
Estimation and testing ���
Introduction ��� Estimating the unknown parameters ��� Least squares estimation ��� Weighted least squares ��� Prediction error method ��� Maximum likelihood estimation ��� �e Cramér-Rao lower bound� ��� Estimating the model order ��� Information theoretic models ��� © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida � – � �
��������
�.� �.�.� �.�.� �.�.� �.�.� �.� �.� �.�
Residual analysis ��� Testing the estimated ACF and PACF ��� Testing the cumulative periodogram ��� Testing for sign changes ��� Testing if the residual is Normal distributed ��� Two modeling examples ��� Testing for periodicities� ��� Exercises ���
CHAPTER �
�.� �.� �.� �.� �.�
Introduction ��� Optimal linear prediction ��� Prediction of ARMA processes ��� Prediction of ARMAX processes ��� Exercises ���
CHAPTER �
�.� �.� �.� �.� �.� �.�.� �.�.� �.� �.� �.�
Multivariate processes ���
Introduction ��� Common multivariate processes ��� �e multivariate Yule-Walker equations ��� Identi�cation and estimation ��� Maximum likelihood estimation ��� �e case of a known covariance matrix ��� �e case of an unknown covariance matrix ��� Multivariate residual analysis ��� Robust covariance matrix estimation� ��� Exercises ���
CHAPTER �
�.� �.� �.� �.� �.�
Prediction of stochastic processes ���
Tracking dynamic systems ���
Introduction ��� Recursive least squares ��� Recursive PEM� ��� �e linear state space representation ��� �e Kalman �lter ���
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
�
October �, ���� – sida � – � �
��������
�.� �.�
Practical considerations� ��� Exercises ���
APPENDIX A
A.� A.� A.� A.�
Matrix inversion lemmas ��� Euler’s formula and trigonometric relations ��� Kronecker products and the vec operator ��� Cauchy-Schwarz inequality ���
APPENDIX B
B.� B.� B.� B.� B.� B.� B.� B.� B.� B.��
Some useful formulae ���
Probability distributions ���
�e Normal distributed vectors ��� �e χ � -distribution ��� �e Cauchy distribution ��� �e F-distribution ��� �e Rayleigh distribution ��� �e Rice distribution ��� �e Poisson distribution ��� �e Student’s t-distribution ��� �e binomial distribution ��� �e Wishart distribution ���
APPENDIX C
Matlab functions ���
APPENDIX D
Exercise solutions ���
Bibliography ��� Index ���
�
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
CHAPTER �
October �, ���� – sida �� – � ��
Stochastic processes
A thing of beauty is a joy for ever: Its loveliness increases; it will never Pass into nothingness; but still will keep A bower quiet for us, and a sleep Full of sweet dreams, and health, and quiet breathing. J��� K����
�.� Introduction We will now proceed to extend the earlier discussion on stochastic variables and vectors to also consider stochastic processes, which, in the discrete time case, essentially may be viewed as a sequence of random variables, which occasionally, for practical purposes, makes the di�erence to the notion of a stochastic vector a bit blurred. We will begin by introducing some basic de�nitions on dependency and stationarity, as well as discussing when, how, and how well the mean and correlations of a process may be estimated from a single (vector) observation. Clearly, this will only be possible if we impose some notable restrictions and assumptions on the process of which we treat the observation as a realization; such a process will need to exhibit some form of regularity to allow us to measure these quantities from just one possible outcome. To make this possible, we will make the assumptions that the process is both stationary and ergodic, where the �rst assumption basically assures that the statistics of the process will not vary over time, whereas the other that the characteristics of the process are measurable from only a single realization. Clearly, these are both quite strong assumptions, and one should keep this in mind in the continued modeling - if these assumptions © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
are violated, for example with a measurement that changes its characteristics over time (such as, for example, a speech signal), the rest of the modeling will be �awed and will likely not yield the desired results. �is typically implies that one has to examine relatively short segments of a time series, for which it may be reasonable to make the assumption that the process is at least reasonably stationary. Continuing, we will also discuss the frequency distribution of the process, and how one may estimate this. We will also examine the properties of a stochastic process that is �ltered through a linear (and time-invariant) �lter, and, using these results, introduce the commonly used moving average (MA), autoregressive (AR), and autoregressive moving average (ARMA) processes, as well as examine their properties. �ese basic processes will then make up our basic building blocks when we in the following chapters examine how to identify and model a time series. Later on, in Chapter �, we will then proceed to extend on the here introduced notions to also allow for multi-dimensional stochastic processes, allowing us to model dependencies between di�erent time series and more complicated processes. But let us begin; what is then a stochastic process? According to the formal de�nition: De�nition �.� A stochastic process is a family of random variables {x t , t ∈ T } de�ned on a probability space, for some index set T . �
As we are here mainly interested in using the notion of a stochastic process as a tool for signal modeling, forecasting, �ltering, and similar operations, we will not dwell on the more fundamental aspects of this de�nition, but rather examine the concept from a more applied point of view (the interested reader is referred to [Lin��] for a more careful treatment of the subject). In particular, we will here primarily be interested in Gaussian processes. Reminiscent to de�nition �.�, such a process is de�ned as De�nition �.� A Gaussian process is a stochastic process {x t , t ∈ T } for which any �nite linear combination will be normally distributed. �
With these basic de�nitions established, we now proceed to examine the properties of a stochastic process in some further detail. ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
�.� Properties and peculiarities To allow us a practical approach to the topic, we will here restrict our attention to only such processes that are stationary, which are processes for which the statistical properties of the process do not change over time. In particular, we will be interested in processes that exhibit so-called wide-sense (or weak) stationarity (WSS), which implies that: A stochastic process is wide-sense stationary (WSS) if and only if (i) �e mean of the process is constant and �nite. (ii) �e autocovariance C {y s , y ∗t } only depends on the di�erence (s − t), and not on the actual values of s and t. (iii) �e variance of the process is �nite, i.e., E ��y t �� � < ∞ . In order to measure the dependencies within and between WSS processes, we de�ne the autocovariance, crosscovariance, autocorrelation, and crosscorrelation functions for such a process as: De�nition �.� �e autocovariance function for y t is de�ned as r y (k) ≡ C {y t , y ∗t−k } △
∗
= E ��y t − m y � �y t−k − m y � �
= E {y t y∗t−k } − m y m∗y =
r ∗y (−k)
(�.�) (�.�) (�.�) (�.�)
where m y = E{y t } denotes the mean of y t . In particular, the variance of y t is given as V {y t } ≡ r y (�) = E {y t y ∗t } − m y m∗y
(�.�)
Similarly, the crosscovariance of the WSS processes x t and y t is de�ned as r x , y (k) = C {x t , y∗t−k }
∗
= E �[x t − m x ] �y t−k − m y � � = r ∗y,x (−k)
where m x denotes the mean of x t . © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.�) (�.�) �
��
October �, ���� – sida �� – � ��
� ���������� ���������
De�nition �.� �e autocorrelation function (ACF) of y t is de�ned as¹ ρ y (k) =
r y (k) r y (�)
(�.�)
and will therefore be bounded such that �ρ y (k)� ≤ �, with equality for k = �, as well as, possibly, for k = �, with � > �, if the signal is periodic with period � (see, e.g., Example �.�). Similarly, we de�ne the crosscorrelation of the processes x t and y t as ρ x , y (k) = �
r x , y (k)
(�.�)
r x (�)r y (�)
which will be bounded as �ρ x , y (k)� ≤ �.
�
Example �.� An o�en useful process is the so-called white noise process, which consists of a sequence of uncorrelated random variables from a given distribution, such that the mean of the process is zero and the variance is constant, i.e., m x = E{x t } = �, V {x t } = σx� , and C{x t , x t−k } = �, ∀k ≠ �. By de�nition, it follows that ∗ r x (k) = E {x t x t−k } − m x m∗x = �
σx� , �,
k=� k≠�
(�.��)
It can be noted that this process hardly exists in a real situation, but it is a most useful tool as a basic building block in constructing models of various forms of processes. � � It can be noted that a direct consequence of the de�nitions of the autocovariance and autocorrelation functions is that these functions must be positive semi-de�nite, in the sense N N that ∑ k=� a k a � r x (�t k − t � �) ≥ �, for any set of time points t � , t � , . . . , t N , and any real∑�=� N valued numbers a � , a � , . . . , a N . �is can be seen by letting y = ∑ k=� a k x t k . �en, (�.�) implies N N N N that � ≤ V {y} = ∑ k=� ∑�=� a k a � C{x t k , x t � } = ∑ k=� ∑�=� a k a � r x (�t k − t � �). �e result for the autocorrelation function follows similarly. Furthermore, it can also be shown that every positive semi-de�nite function will correspond to the autocovariance of some process [Lin��].
��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
Example �.� Consider a complex-valued sinusoidal signal, x t , with frequency ω � , i.e., x t = Ae i ω � t+i �
(�.��)
where � is a uniformly distributed random variable between (−π,π] and A is a (real-valued) scalar. �en,
and
m x = E �Ae i ω � t+i � � =
∫
π
−π
Ae i ω � t+i �
� d� = � �π
r x (k) = E �A� e i ω � t+i � e −i ω � (t−k)−i � � = A� e i ω � k
(�.��)
(�.��)
�us, the autocovariance function of a complex-valued sinusoid is also a sinusoid, both having the same frequency. Just as the white noise process discussed above, the complex-valued sinusoidal signal is a quite useful building block in many forms of models. � Example �.� Consider instead a real-valued sinusoidal signal x t = A cos(ω � t + �)
(�.��)
where � is a uniformly distributed random variable between (−π,π] and A is a (real-valued) scalar. One may then, for instance, use Euler’s formula (see also (A.�)), to rewrite x t as xt =
A i ω � t+i � + e −i tω � −i � � �e �
(�.��)
which, using steps similar to the ones in Example �.�, yields r x (k) =
A� cos(ω � k) �
(�.��)
Comparing with Example �.�, it may be noted that it can occasionally be simpler to work with complex-valued signals than working with their realvalued counterparts. � © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
Example �.� Figure �.�(a) shows an example of a real-valued periodic signal. �is signal is a voiced speech signal extracted from the utterance in Figure �.�, and is sampled at f s = ���� Hz. Clearly, the signal exhibits strong periodicities, and one may therefore conclude from Example �.� that the covariance function of this signal should contain the same periodicities as the actual signal. Figure �.�(b) illustrates this, as well as the fact that ρ y (k) is symmetric, and bounded as �ρ y (k)� ≤ �, with equality for k = �. It can further be noted that, as the signal does not contain a pure sinusoidal component, the ACF does not again reach unity for lags larger than zero, although the ACF is close to unity around lag ±��, suggesting that the signal has a notable periodicity with the corresponding frequency. � �e de�nition of WSS processes implies some very useful properties of the autocovariance function, namely: �e autocovariance function of a WSS process satis�es: (i) It is conjugate symmetric, i.e., r y (k) = r ∗y (−k). (ii) �e variance is always non-negative, i.e., r y (�) = E ��y t − m y �� � ≥ �. (iii) It takes its largest value at lag �, i.e., r y (�) ≥ �r y (k)� , ∀k. �ese properties are easily veri�ed for the above examples. Using (�.�), we similarly say that two processes are jointly stationary if △
r x , y (t � , t � ) = r x , y (t � − t � , �) = r x , y (τ)
with τ = t � − t � , noting that for such processes r x , y (τ) = r ∗x , y (−τ) � �r x , y (τ)� ≤ r x (�)r y (�) �r x , y (τ)� ≤ ��
� �r x (�) + r y (�)� �
(�.��)
(�.��) (�.��) (�.��)
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ��������� 0.25
1
0.2
0.8
0.15
0.6 0.4
0.05
Amplitude
Amplitude
0.1
0 −0.05
0 −0.2
−0.1
−0.4
−0.15
−0.6
−0.2 −0.25
0.2
0.69
0.7
0.71
0.72
0.73
0.74
−0.8 −60
−40
Time [s]
(a)
−20
0
20
40
60
Lag
(b)
An example of a real-valued periodic signal. The signal is a voiced speech signal extracted from the utterance in Figure �.�, together with the estimated correlation function of the signal. FIGURE �.�
and with r x , y (τ) = � if the processes are uncorrelated. In cases when E{x t y∗t−k } = �, we say that the processes are orthogonal, noting that for zero-mean processes this also implies that they are uncorrelated (although the reverse does not necessarily hold). �.�.� ESTIMATING THE MEAN AND THE COVARIANCE SEQUENCE
Here, we will typically assume that we have observed a single realization of the process, containing, say, N samples, measured over t = �, . . . ,N. In general, both the mean and the autocovariance function of such a process are unknown to us, and we will therefore need to estimate m y and r y (k) as accurately as possible from this one available (vector) observation. Starting with the mean, this is most naturally estimated as ˆy = m
� N � yt N t=�
(�.��)
which is an unbiased estimate of the true mean as ˆ y� = E �m
� N � E { yˆ t } = m y N t=�
(�.��)
To show that the estimate in (�.��) is also consistent, which implies that the estimate is (at least asymptotically) unbiased and that the variance shrinks © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
to zero as N grows without bound, we note that ˆ y� = V �m
r y (�) N−� N − �k� � N N r (t − s) = ρ y (k) � � � y N � t=� s=� N k=−N+� N
(�.��)
where we used (�.��) and (�.�), and set k = t − s (see also (A.��)). �us, if lim
N→∞
N−�
�
k=−N+�
N − �k� ρ y (k) N
(�.��)
ˆy is �nite, the variance will shrink to zero as N grows without bound, and m converge to m y (in the mean square sense). For such processes, we say that the process is ergodic in the mean, which, simply put, means that it is possible to form a consistent estimate of m y . A su�cient condition for this is that ρ y (k) → � as N → ∞, which implies that � N→∞ N lim
N−�
� ρ y (k) = �
(�.��)
k=−N+�
ˆ y } → � as N → ∞. Intuitively, this means that if t and s and thus that V {m are su�ciently far apart to ensure that y t and y s are (almost) uncorrelated, then such samples yield some new information which can be continuously added, making the mean estimate approach the true mean value. Using (�.��), we have thus shown the following useful result: �eorem �.� Let y t , for t = �, . . . ,N, be a realization of a Gaussian process with mean m y and covariance function r y (k). Assuming that the process is ergodic in the mean, and if the mean of the process is estimated using (�.��), the ˆ y } = m y , with variance resulting estimate is unbiased, i.e., E{m ∞
ˆ y } = � r y (k) lim NV {m
N→∞
(�.��)
k=−∞
ˆ y is a consistent estimate, converging (in probability and in the implying that m mean square sense) to m y . For large N, an o�en useful approximation is that ˆ y} ≈ V {m
� ∞ � r y (k) N k=−∞
(�.��)
ˆ y } ≈ r y (�)�N. or, in the special case of a white process, V {m ��
�
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
Proceeding with our discussion, we now examine the estimation of r y (k). �ere are two standard ways to estimate the autocovariance function, commonly termed the unbiased autocovariance estimate rˆuy (k) =
N � ∗ ˆ y ��y t−k − m ˆ y� � �y t − m N − k t=k+�
rˆby (k) =
� N ∗ ˆ y ��y t−k − m ˆ y� � �y t − m N t=k+�
(�.��)
and the biased autocovariance estimate
(�.��)
for � ≤ k ≤ N − �. �ese covariance estimates deserve some further commenting; �rstly, note that the sums in (�.��) and (�.��) both start at t = k + �. �is is due to the fact that the �rst available sample is y � , and any values t < k + � in the sums would thus use measurements y � , for � < �, which are not available. Secondly, note that the estimates only di�er in the normalization constant before the sum. Letting ψ k denote the unweighted sum, we note that △
N
∗
ˆ y ��y t−k − m ˆ y� ψ k = � �y t − m t=k+� N
∗
ˆ y − m y )��(y t−k − m y ) − (m ˆ y − m y )� = � �(y t − m y ) − (m t=k+� N
∗
∗
ˆ y − my� = � �y t − m y ��y t−k − m y � − �m t=k+�
N
∗
N
� �y t − m y �
t=k+�
ˆ y − m y � � �y t−k − m y � + �N − k��m ˆ y − my� − �m t=k+�
�
(�.��)
If we make use of the approximation N
N
t=k+�
t=k+�
ˆ y − my� � �y t − m y � ≈ � �y t−k − m y � ≈ (N − k)�m
(�.��)
then (�.��) can be approximated as N
∗
ˆ y − my� ψ k ≈ � �y t − m y ��y t−k − m y � − �N − k��m t=k+�
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
�
(�.��) ��
October �, ���� – sida �� – � ��
� ���������� ���������
ˆ y }�. �e mean of the “unbiimplying that E{ψ k } ≈ (N − k) �r y (k) − V {m ased” and the biased estimates are thus � ˆ y} E{ψ k } = r y (k) − V {m N−k � N−k ˆ y }� �r y (k) − V {m E{r by (k)} = E{ψ k } = N N k N−k ˆ y} = r y (k) − r y (k) − V {m N N
E{r uy (k)} =
(�.��)
(�.��)
Note that, in spite of the commonly used names, both estimates are thus biased, ˆ y , r uy (k) will become unbiased, although if one ignores the variance of m somewhat explaining the used nomenclature, whereas r by (k) will still be biased in this case. In general, the latter will have a larger bias than the former, especially when k is large with respect to N. In the special case of zero-mean processes, which o�en are of particular interest here, the result simpli�es notably, yielding N
E {ψ k } = � r y (k) = (N − k)r y (k) t=k+�
implying that rˆuy (k) is an unbiased estimate, justifying its name, i.e., E �ˆr uy (k)� = r y (k)
(�.��)
(�.��)
whereas r by (k) will result in an estimate that is only asymptotically unbiased, i.e., it is unbiased only as N → ∞. One may from this observation conclude that rˆuy (k) ought to be a more appropriate estimate than rˆby (k), but as will be discussed further in the following, this is, perhaps somewhat surprisingly, actually not the case. One can get an initial feeling for why the biased estimator may be more appropriate by considering the variance of the resulting estimated autocovariance. For high lags, the unbiased estimate will be formed by averaging only very few terms, thereby giving estimates which have a variance that is increasing with the lag number. On the other hand, the biased estimate, while still only being formed by averaging a few terms, will be normalized with N independently of the lag number, thereby notably reducing the value of the resulting estimates, and as a result the variance ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
of the �nal estimates. �is stabilization of the variance for higher lags is of notable importance, and in the following, we will in fact always use the biased estimate given by (�.��). �roughout this book, the covariance sequence will always, unless otherwise noted, be estimated using the biased estimate, i.e., rˆby (k) =
� N ∗ ˆ y ��y t−k − m ˆ y� � �y t − m N t=k+�
To simplify notation, this estimate will in the following only be denoted rˆy (k), whereas the unbiased estimate will still be referred to as rˆuy (k). We will similarly estimate the crosscovariance between the two jointly stationary processes, x t and y t , as rˆx , y (k) =
� N ∗ ˆ x ��y t−k − m ˆ y� � �x t − m N t=k+�
(�.��)
where the means of the processes have been estimated similar to (�.��). Note that one should here also use the biased form, normalizing with N. In both the estimation of rˆy (k) and rˆx , y (k), it is important to be aware of the di�culty to estimate these covariances accurately for higher order lags. Not only will the number of terms available for averaging decrease with the increasing lag number, but due to �nite sample e�ects, both these estimates can exhibit correlation with themselves, making it appear as there may be a correlation among higher lags that is not there. O�en, this correlation appears as a pattern at larger lags that is also seen in the lower lags. As a practical rule of thumb, one should at most calculate covariances only for lags up to N��. Obviously, this rule also holds for the corresponding correlation functions. Extending on the above results, we may �nd the asymptotic distribution of the estimated ACF. © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
�eorem �.�� Let e t be a realization of a zero-mean white Gaussian process with variance σ e� . If ρˆe (k) is estimated according to de�nition �.�, i.e., ρˆe (k) =
rˆe (k) rˆe (�)
(�.��)
where rˆe (k) is estimated using (�.��), then, for k ≠ �, E{ρˆe (k)} = � � V {ρˆe (k)} = N
(�.��) (�.��)
Furthermore, ρˆe (k) is asymptotically Normal distributed for k > �.
�
An important consequence of theorem �.�� is that the ��� (approximative) √ con�dence interval of ρˆe (k), for k ≠ �, is ±�.��� N, i.e., with ��� con�dence, � ρˆe (k) ≈ � ± √ , N
for k ≠ �
(�.��)
√ �is means that estimated correlation values such that �ρˆe (k)� < �� N cannot be deemed to be signi�cantly di�erent from zero with a �� signi�cance level, √ i.e., we are unable to tell the di�erence between zero and values within ±�� N, and should therefore treat all of them as being “zero”. �.�.� VECTOR REPRESENTATION
An o�en convenient way to represent a set of measurements is in vector form. To allow for further �exibility, we will here divide the observed data into a collection of subvectors y t , each containing L ≤ N samples of y t , i.e., yt = � y t
�
y t+L−� �
T
(�.��)
for t = �, . . . , M, where M = N − L + � denotes the number of available subvectors y t ; if considering cases when all the samples are gathered in a single vector, i.e., when L = N, we will however, for simplicity, commonly ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
just write y in place of y N . Following the discussion in Section �.�.�, we can form the covariance matrix of y t as △
Ry = E�y t y∗t �
� r y (�) r ∗y (�) � r ∗y (L − �) � � .. . � � r y (�) r y (�) . . . =� � .. .. � . . r ∗y (�) � � � r y (L − �) � r y (�) r y (�) �
� � � � � � � � � � � �
(�.��)
(�.��)
Similarly to the covariance matrix for a general stochastic vector, Ry will be a positive semi-de�nite Hermitian matrix. As can be seen from (�.��), the matrix will also have a Toeplitz structure, i.e., the matrix will have the same elements along each of the diagonals. Matrices with a Toeplitz structure are of particular interest as this form of very strong structure allows for the development of highly e�cient algorithms; we will illustrate an example of this in Section �.�.�. Example �.�� Let e t be a zero-mean white (real-valued) Gaussian process with variance σ e� . �en, as seen in Example �.�, r e (k) = σ e� δ K (k)
where δ K (k) is the Kronecker delta function, δ K (k) = �
�, �,
k=� k≠�
(�.��)
(�.��)
and the covariance matrix of the L-dimensional subvectors, e t , formed similar to (�.��), will be given by Re = σ e� I, where I is the L × L identity matrix. Figure �.� illustrates the estimated correlation for N = ��� samples of a white noise process. Looking at Figure �.�(a), showing ρˆe (k), for −�� ≤ k ≤ ��, it is clear that ρˆe (k) is a symmetric function taking its maximal value ρˆe (k) = �, for k = �. As expected from theorem �.��, we may also note that, counter to what could be expected from (�.��), ρˆe (k) ≠ � for k ≠ �
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.��) ��
October �, ���� – sida �� – � ��
� ���������� ��������� 1.2
1 1
0.8
Amplitude
Amplitude
0.8
0.6
0.4
0.6
0.4
0.2
0.2
0
0
−0.2 −60
−40
−20
0
20
40
60
−0.2 0
2
4
6
8
10
12
14
16
18
Lag
Lag
(a)
(b)
(a) The estimated correlation function for the white noise signal in Example �.��, as well √ as (b) a magni�ed version of parts of the �gure in (a). The dashed lines correspond to ±�� N. FIGURE �.�
where ρˆe (k) has here been estimated using (�.��). �is is due to the limited number of available samples; given N samples, one is simply not able to estimate r e (k), and thus ρ e (k), with better accuracy than this. However, if we keep in mind the accuracy of the estimate, we may still conclude that it is reasonable to assume that the realization is that of a white process; this can be seen in Figure �.�(b), which shows a close-up of the estimates for lags � ≤ k ≤ ��, with the corresponding con�dence intervals. � Example �.�� �e covariance matrix for the process in Example �.� is Rx = �A�� aL (ω � )a∗L (ω � )
(�.��)
where aL (ω) is a so-called Fourier vector, de�ned as △
aL (ω) = � � e i ω
� e i ω(L−�) �
T
(�.��)
�us, Rx is a rank-one positive semi-de�nite Toeplitz matrix. Example �.�� Considering instead a sum of sinusoidal signals, such that d
xt = � A� e i ω� t �=�
��
�
(�.��)
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
for t = �, . . . ,N, where A � denotes the �th complex-valued amplitude. �en, d
△
Ry = � �A � �� aL (ω � )a∗L (ω � ) = APA∗ �=�
(�.��)
with aL (ω � ) de�ned as in (�.��), and
A = � aL (ω � ) . . . aL (ω d ) � P = diag�� �A � ��
(�.��)
. . . �A d �� ��
(�.��)
where diag {x} denotes the diagonal matrix formed with the vector x along the diagonal. �e Vandermonde² matrix A is a so-called Fourier matrix, and will have rank d (assuming that d ≤ L). As P is full rank, APA∗ will also have rank d. �is observation is key to the so-called MUSIC algorithm for estimating the unknown frequencies of y t ; we will return to this beautiful algorithm later on in Example �.�. �
Similar to the above discussion for r y (k), we will generally need to estimate the covariance matrix Ry from the available measurements. �e de�nition ˆ y as the Toeplitz matrix conin (�.��) suggests the forming of the estimate R structed from the estimated rˆy (k), obtained using (�.��), such that � rˆy (�) rˆ∗y (�) � rˆ∗y (L − �) � � .. .. � � rˆy (�) . ˆ r (�) . y ˆy = � R � .. .. � . . rˆ∗y (�) � � � rˆy (L − �) � rˆy (�) rˆy (�) �
� � � � � � � � � � � �
(�.��)
An alternative estimate can be obtained by instead forming the outer-product estimate M ˆ y = � � y y∗ R M t=� t t
(�.��)
� A matrix A is said to have a Vandermonde structure if the (k,�)th element of A satis�es A k ,� = α �k−� , for all indexes k and �, for some set of scalars, α � . © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
which, due to �nite sample e�ects, typically will not exhibit a Toeplitz structure. Imposing the Toeplitz structure, as is done if using (�.��) o�en yields undesirable e�ects, and one therefore typically prefer using (�.��) instead. However, all Toeplitz matrices are also persymmetric³ (although the opposite is not true), implying that such a matrix, A, satis�es A = JAT J
(�.��)
where J is the L × L exchange (or reversal) matrix formed as � � � J=� � � � � �
..
.
� � � � � � � � �
(�.��)
where all the empty indices of the matrix are zero. �is implies that one further alternative to estimate the covariance matrix is to impose a persymmetric ˆ y , forming the so-called forward-backward averaged covariance structure on R matrix estimate ˆ yf b = � �R ˆ yT J� ˆ y + JR R �
(�.��) fb
ˆ y yields estimates that are superior ˆ y is formed using (�.��). O�en, R where R ˆ to Ry , and if not otherwise speci�ed, this should be our choice for estimating Ry . �e estimates in (�.��) and (�.��) can be computed using the provided Matlab function covM.
� It is worth noting that the inverse of a (per)symmetric matrix will also be (per)symmetric. Generally, the inverse of a Toeplitz matrix is not Toeplitz, but as all Toeplitz matrices are persymmetric, the inverse of a Toeplitz matrix will be persymmetric. Furthermore, the inverse of a symmetric Toeplitz matrix will be centrosymmetric, i.e., it is both symmetric and persymmetric. If A ∈ Cm×m , such that A = JA∗ J, we instead say that A is a perhermitian matrix; in this case, the inverse of a Hermitian Toeplitz matrix will instead be centrohermitian. �e interested reader is referred to, e.g., [HBW��b, HBW��a, Pre��] for a further discussion on these and other interesting properties of such matrices.
��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
�.� �e power spectral density An o�en convenient way to characterize a stochastic process is via its power spectral density (PSD), de�ned as the discrete-time Fourier transform (DFT) of the autocovariance function, i.e., for −π < ω ≤ π, △
∞
� y (ω) = � r y (k)e −i ωk
(�.��)
k=−∞
�e inverse transform recovers r y (k), r y (k) =
� �π
r y (�) =
� �π
∫
π
−π
� y (ω)e i ωk dω
(�.��)
from which we note that
∫
π
−π
� y (ω) dω
(�.��)
For a zero-mean process, r y (�) = E{�y t �� }
(�.��)
measures the power of y t , and the equality in (�.��) thus shows that � y (ω) is indeed correctly named a power spectral density as it is representing the distribution of the signal power over frequencies. Under weak assumptions�, it can be shown that (�.��) is equivalent to (see also [SM��]) �� � � � �� N � � y (ω) = lim E � �� y t e −i ωt � � � N→∞ � N � � t=� � �
(�.��)
Using the DFT,
N
YN (ω) = � y t e −i ωt
(�.��)
t=�
the PSD in (�.��) can therefore be expressed as � y (ω) = lim E � N→∞
� � �YN (ω)� � N
� �e function needs to decay su�ciently rapidly, so that lim N→∞
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.��) � N
N �k� �r y (k)� = �. ∑ k=−N
��
October �, ���� – sida �� – � ��
� ���������� ���������
which also suggests the most natural way to estimate the PSD, i.e., as the magnitude square of the DFT of the data vector, i.e., � � N � �ˆ y (ω) = �YN (ω)� = �� y t e −i ωt � N N t=�
�
(�.��)
�is estimate, termed the periodogram, was introduced in ���� by Sir Arthur Schuster�, who derived it to determine hidden periodicities (non-obvious periodic signals) in time series [Sch��, Sch��]. Since � y (ω) is a power density, it is natural to assume that it should be real-valued and non-negative. �is is indeed the case which can readily be seen from (�.��). �e PSD of a WSS process y t , de�ned using (�.��), will be real-valued and non-negative, i.e., � y (ω) ≥ �,
∀ω
(�.��)
Further, the power spectral density is always periodic, such that � y (ω) = � y (ω + �πk)
(�.��)
for any integer k. In the particular case when the process is real-valued, the PSD is symmetric, so that � y (ω) = � y (−ω). Otherwise, if the process is complex-valued, the PSD is non-symmetric. As an alternative, one could use the de�nition in (�.��) to instead form the estimate of the PSD as �ˆ cy (ω) =
N−�
�
k=−(N−�)
rˆy (k)e −i ωk
(�.��)
where rˆy (k) is the (biased) autocovariance estimate de�ned in (�.��). �e resulting estimate is commonly referred to as the correlogram. It is important � Schuster applied the periodogram to �nd hidden periodicities in the sunspot numbers for the years ���� to ����, yielding the classical estimate of ��.��� years for the sunspot cycle.
��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ��������� 9 40
8 20
7
0
Power [dB]
Magnitude
6 5 4
−20
−40
3 −60
2 −80
1 0 0
0.1
0.2
0.3
0.4
0.5
Absolute frequency
(a) FIGURE �.�
−100 0
0.1
0.2
0.3
0.4
0.5
Absolute frequency
(b)
The periodogram estimate of the white noise signal in Example �.��.
to note that it is the biased autocovariance estimate that should be used here; if using the unbiased estimate, the resulting PSD estimate may actually become negative, leaning further support to our preference for the biased autocovariance estimate. It can be shown that the estimates obtained using (�.��) will actually coincide with the estimate obtained using (�.��), as long as the latter is formed using the biased autocovariance estimate (see , e.g. [SM��]). �is is most convenient as it is o�en simpler to use (�.��) when analyzing the performance of the estimate, whereas it is computationally simpler to use (�.��) when actually computing the estimate. Example �.�� �e white process in Example �.�� has the PSD � e (ω) = σ e� , which, as expected, is real-valued and positive. Figure �.� illustrates the periodogram estimate of a ��� sample long realization of this process, with σ e� = �. Figure �.�(a) shows the (regular) periodogram estimate, whereas Figure �.�(b) instead plots the estimate in decibel (dB). �e second plot is obtained as �ˆ de B (ω) = �� log��ˆ e (ω)�
(�.��)
where �ˆ de B (ω) is the periodogram estimate expressed in dB, whereas �ˆ e (ω) is the periodogram estimate expressed in the regular (linear) domain. Plotting the signal in dB is o�en preferable as it allows us to easier see the full range of values, as even the relatively small values are visible, whereas if expressed in the regular domain, it would be hard to see these. Here, since the PSD is © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
symmetric (why?), the �gures have been restricted to only show the positive frequencies. As is clear from the �gures, the periodogram estimate seems to be unbiased, having a mean of about � dB (i.e., �), but exhibits a very large variance. We will return to this aspect, discussing it in further detail in Section �.�. �
Example �.�� Extending on Example �.��, let y t = x t + e t , for t = �, . . . ,N, where x t is as de�ned in (�.��), and e t is assumed to be a zero-mean white noise, with variance σ e� , being independent of x t . �en, Ry = APA∗ + σ e� I
(�.��)
�e PSD of y t is
d
� y (ω) = � �A � �� δ D (ω − ω � ) + σ e� �=�
where δ D (ω) is the Dirac delta, satisfying f (a) =
∫
f (x)δ D (x − a) dx
(�.��)
(�.��)
In the particular case when x t is a sum of real valued sinusoids, it is clear that the spectrum will be symmetric, whereas it will otherwise not be. � Example �.�� Figure �.� illustrates the periodogram estimate of the voiced speech signal in Example �.�. As is typical for voiced speech, the signal can be seen to contain several spectral peaks at frequencies that are an integer multiple of the �rst peak frequency, the so-called fundamental frequency, or pitch. One common model for such signals is (see also, e.g. [CJ��]) d
y t = � α k sin(ω k t + � k ) + e t k=�
(�.��)
where α k , ω k , and � k are the amplitude, frequency, and phase of the kth sinusoidal component, with e t denoting some additive noise, and the frequencies ω k = �πk f � , with f � being the fundamental frequency. Figure �.� ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ��������� 50
Power [dB]
0
−50
−100
−150
−200 0
500
1000
1500
2000
2500
3000
3500
4000
Frequency [Hz]
The periodogram estimate of the voiced speech signal in Example �.��. The fundamental frequency of the signal is about ��� Hz. FIGURE �.�
indicates that this periodicity should be about ��� Hz. Carefully examining Figure �.�(b) indicates that the strong periodicity seen in the ACF lies somewhere between lag �� and ��, which, as the signal is sampled with � kHz, corresponds to a frequency somewhere between ������� ≈ ��� and ������� ≈ ��� Hz, which well matches the observed fundamental frequency. �
�.� Filtering of a stochastic process
We are o�en particularly interested in the �ltering of a stochastic process through a stable linear system. Let h k denote the impulse response of a discrete-time linear time-invariant (LTI) system, which is bounded-input, bounded-output (BIBO) stable, i.e., the output of the system, y t , remains bounded for all bounded inputs, x t , such that that its impulse response is absolutely summable, i.e., ∑∞ k=−∞ �h k � < ∞. For example, a linear system with k h k = a u k , where u k is a unit step, i.e., u k = � if k ≥ �, and zero otherwise, is stable whenever �a� < �. �e z-transformation of such a system is ∞
H(z) = � h k z −k k=−∞
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.��)
��
October �, ���� – sida �� – � ��
� ���������� ���������
where z −� denotes the unit delay operator, de�ned as z −� x t = x t−�
(�.��)
�e output, y t , of this system is then formed as ∞
y t = � h k x t−k
(�.��)
k=−∞
or, alternatively, if expressed in the z-domain, Y(z) = H(z)X(z), where x t is the input to the system, and ∞
Y(z) = � y t z −t t=−∞
�en,
and
∞
t=−∞
∞
△
m y = E � � h k x t−k � = m x � h k = m x H(�) k=−∞
with
∞
X(z) = � x t z −t
k=−∞
∞
H(ω) = � h k e −i ωk
(�.��)
(�.��)
(�.��)
k=−∞
�e mean of the output process is thus the mean of the input process scaled with the gain of the �lter, i.e., the ampli�cation of the system. Comparing (�.��) and (�.��), it can be noted that we are here using the fact that z = e i ω . A notationally more correct notation would thus be H(e i ω ) = H(z), which is also commonly seen in the literature. Here, as is also commonly done, in order to simplify the notation, we prefer to use H(ω) in place of H(e i ω ). We now assume that the input process has zero-mean, and proceed to examine the covariance function of the output process. To do so, we begin by investigating the crosscorrelation between x t and y t , being ∞
r y,x (t + k,t) = E {y t+k x t∗ } = E � � h � x t+k−� x t∗ � ∞
�=−∞
= � h � E {x t+k−� x t∗ } �=−∞ ∞
= � h � r x (k − �) �=−∞
��
(�.��) (�.��) (�.��)
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
Note that the crosscovariance function will only depend on the time di�erence k, thus indicating that if x t is WSS, y t will in fact also be WSS. With this result, we proceed to examine the covariance function of the output process, which can then be expressed as ∞
r y (t + k,t) = E {y t+k y ∗t } = E �y t+k � x �∗ h ∗t−� � �=−∞
∞
= � h ∗t−� E {y t+k x �∗ } �=−∞ ∞
= � h ∗t−� r y,x (t + k − �) �=−∞
Changing the summation index in (�.��) by setting m = t − � yields ∞
r y (t + k,t) = � h∗m r y,x (m + k) m=−∞
(�.��) (�.��) (�.��)
(�.��)
which using (�.��) yields ∞
r y (k) = �
∞
∗ ∗ � h m h � r x (m + k − �) = r x (k) � h k � h−k
m=−∞ �=−∞
(�.��)
where � denotes the convolution operator, or, in the frequency domain, � y (ω) = �H(ω)� � x (ω) �
(�.��)
It is worth noting that this also implies that ∞
r y (�) = �
∞
∗ � h m h � r x (m − �)
m=−∞ �=−∞
(�.��)
which, for �nite length �lters, say, of length L, implies that
where
r y (�) = σ y� = h∗ Rx h h = � h�
...
hL �
(�.��)
T
(�.��)
and Rx is de�ned as in (�.��). © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
De�nition �.�� �e cross spectral density of two stationary processes, x t and y t , is de�ned as the DFT of the crosscovariance function, i.e., ∞
� x , y (ω) = � r x , y (k)e −i ωk k=−∞
(�.��)
where r x , y (k) is de�ned as in (�.�). In general, � x , y (ω) is complex-valued. �
Occasionally, it is also convenient to express the cross spectral density in terms of the cross-amplitude spectrum, A x , y (ω), and the phase spectrum, Φ x , y (ω), de�ned as � x , y (ω) = A x , y (ω)e iΦ x , y (ω)
(�.��)
where A x , y (ω) = A x , y (−ω) is real-valued and non-negative, whereas the phase spectrum Φ x , y (ω) = −Φ x , y (−ω) is de�ned on (−π��,π��] + �π, for an integer � (see also [CLV��]). De�nition �.�� �e (complex) coherence spectrum of the two stationary processes, x t and y t , is de�ned as C x , y (ω) = �
� x , y (ω)
� x (ω)� y (ω)
(�.��)
which in general is complex-valued. It should be noted that the magnitude coherence spectrum is bound as �C x , y (ω)� ≤ �
�
(�.��)
with equality, for all ω, if and only if, x t and y t are related via a linear �ltering, as de�ned in (�.��). From (�.��) and (�.��)–(�.��), it can be seen that the cross-spectrum of the input and output of a linear system is related via the so-called Wiener-Hopf equation � x , y (ω) = H(ω)� x (ω)
(�.��)
It is worth noting that this relationship allows for a way to estimate the impulse response from the auto- and cross-spectra. ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
When �ltering the WSS process x t through the stable linear timeinvariant system, h k , the output, y t , will satisfy: � y (ω) = �H(ω)� � x (ω) �
� x , y (ω) = H(ω)� x (ω) m y = m x H(�)
r y (�) = h∗ Rx h
with H(ω) and h being de�ned in (�.��) and (�.��), respectively. As an aside, we note that linear processes, may, in general, be written on the so-called random shock form, i.e., ∞
y t = m y + � h � e t−� �=�
(�.��)
where e t is a white noise process and m y is the mean of the process. Without loss of generality, one typically assumes that m y = �, and that the noise sequence is scaled such that h � = �. Expressed on a transfer function form, (�.��) can be written as △
∞
y t = H(z)e t = � � h � z −� � e t �=�
(�.��)
with H(z) being denoted the transfer function of the system. �en, if there exists an inverse transfer function, G(z), such that G(z)H(z) = �
or, equivalently, G(z) = H −� (z)
(�.���)
one speaks of G(z)y t = e t as the inverse form, with G(z) typically being determined as the Taylor series expansion of H −� (z). For the variance of y t to be �nite, such that the process is stationary, one requires that ∞
H(z) = � h � z −� �=�
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.���) ��
October �, ���� – sida �� – � ��
� ���������� ���������
converges for �z� ≥ �. Similarly, the process is invertible if and only if ∞
G(z) = � �� z −�
(�.���)
�=�
converges for �z� ≥ �.
�.� �e basic linear processes �.�.� THE MOVING AVERAGE PROCESS
We proceed to de�ne the �rst of two basic forms of linear �lters, namely: De�nition �.�� �e process y t is called a moving average process if △
y t = e t + c � e t−� + . . . + c q e t−q = C(z)e t
(�.���)
where C(z) is a monic� polynomial of order q (in z −� ), i.e., C(z) = � + c � z −� + . . . + c q z −q
(�.���)
where c q ≠ �, and e t is a zero-mean white noise process with variance σ e� . �e resulting MA(q) process is always stable, and is invertible if and only if all the zeros of the generating polynomial C(z) are strictly within the unit circle. Figure �.� illustrates the generation of an MA(q) process. � As seen in (�.���), the generating polynomial, C(z), de�ned as q
C(z) = � + c � z −� + . . . + c q z −q = � c k z −k k=�
where c � = �, allows us to express the MA(q) process as y t = C(z)e t
(�.���)
(�.���)
which suggests that the transfer function of the corresponding (linear) �lter is C(z). If the zeros of the polynomial C(z) are inside the unit circle, the � A monic polynomial has its �rst coe�cient equal to one.
��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
e t−� e t−� q q −� −� z z ? ? ⌫ ⌫ c� c�
q
FIGURE �.�
? ⌫ cq
XXX PP XXXPP XXP ? ⌧ XP XP XP q P X z - ∑ - yt
6
et
z −�
e t−q
Generation of an MA(q) process.
polynomial is invertible, allowing one to form the inverse �lter, i.e., one may form the (driving) noise process, e t , as et =
∞ � △ y t = G(z)y t = � �k y t−k C(z) k=�
(�.���)
where G(z) is the inverse �lter generating the noise process, as de�ned in (�.���). It is worth noting that G(z) will generally have an in�nite impulse response (IIR). From de�nition �.��, one can conclude that An MA(q) process will satisfy m y = E{C(z)e t } = �
r y (k) = �
�c k + c � c k+� + . . . + c q−k c q � if �k� ≤ q � if �k� > q σ e�
� y (ω) = σ e� �C(ω)�
�
(�.���) (�.���) (�.���)
where C(ω) indicates that the polynomial has been evaluated at frequency ω, i.e., z = e i ω . © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
Example �.�� Consider the (real-valued) MA(�) process y t = e t + c � e t−� , i.e., the process having the generating polynomial C(z) = � + c � z −� . �e autocovariance of y t is (cf. (�.���)) r y (�) = σ e� (� + c �� ) r y (�) =
with
σ e� c �
r y (k) = �,
(�.���) (�.���)
for �k� > �
(�.���)
r y (k) = r ∗y (−k) ∀k
(�.���)
To verify the above, as well as other similar cases, it is helpful to write out the covariances explicitly, i.e., for instance, for r y (�) (cf. (�.�)) r y (�) = E {[e t + c � e t−� ] [e t−� + c � e t−� ]}
= E �e t e t−� + c � e t e t−� + c � e t−� e t−� + = c � E {e t−� e t−� } =
c � σ e�
c �� e t−� e t−� �
(�.���)
(�.���) (�.���)
Similarly, the PSD of y t is
� y (ω) = σ e� �� + c � e −i ω �
�
(�.���)
= σ e� �c � e i ω + � + c �� + c � e −i ω � = σ e� �� + c �� + �c � cos(ω)�
(�.���) (�.���)
for ω = �π f , with −�.� < f ≤ �.�, with the last equality following from Euler’s formulas (see (A.�) in Appendix A.�). �
It is worth stressing that for an MA(q) process, r y (k) = �, for �k� > q. �is insight is quite important as it allows for a way to identify if a measurement may be well modeled as an MA process; if the estimated autocovariance, or similarly the estimated ACF, is zero for lags higher than �, it may be reasonable to model the measurement as a realization of an MA(�) process. We will return to this discussion further in Chapter �. ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ��������� 5
1.2
4 1
3 0.8
1
Amplitude
Amplitude
2
0 −1
0.6
0.4
−2
0.2
−3 0
−4 −5 0
100
200
300
400
500
−0.2 −60
−40
−20
Sample
0
20
40
60
Lag
(a)
(b)
40
1
Periodogram MA estimate
20
0.8 0.6
0
Imaginary Part
Power [dB]
0.4 −20 −40 −60
0.2 0 −0.2 −0.4
−80
−0.6 −0.8
−100
−1 −120 0
0.1
0.2
0.3
0.4
0.5
−1
Frequency [Hz]
(c)
−0.5
0
0.5
1
Real Part
(d)
FIGURE �.� The �gure illustrates the MA(�) process discussed in Example �.��, for � < ω ≤ π, with (a) showing a realization of the process, (b) the estimated correlation function, (c) the estimated power spectral density, and (d) the roots of the C(z)-polynomial.
Example �.�� Consider the MA(�) process formed using C(z) = � + �.�z −� + �.�z −� + �.�z −� + �.�z −�
(�.���)
Figure �.� illustrates a realization of this process together with the estimated correlation function, spectral density, and the roots of the C(z)-polynomial. �ese �gures deserve some further comments. Firstly, it is worth noting in Figure �.�(b) that the estimated ACF is not zero, as expected from (�.���), for lags higher than �. Similar to the discussion following Example �.��, this is due to the di�culty of estimating r y (k) accurately given a �nite amount of data. �is can be seen better in Figure �.�, which shows a closer look at the correlation function in Figure �.�(b), together with the corresponding con�dence intervals as given by theorem �.�� (below). Secondly, in Figure �.�(c), © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
1
Amplitude
0.8
0.6
0.4
0.2
0
−0.2 0
2
4
6
8
10
12
14
16
18
Lag
FIGURE �.� A closer look at the estimated correlation function for the MA(�) process in Example �.��. This is a magni�ed version of Figure �.�(b). The dashed lines correspond to the con�dence interval given by theorem �.��.
the periodogram estimate, as given in (�.��), is plotted together with the true PSD. As can be seen in the �gure, the spectrum contains two nulls, i.e., two frequencies for which the PSD has low power. �is can also be seen from the location of the roots of the C(z)-polynomial. �ese roots are shown in Figure �.�(d), and are (approximately) z � = −�.�� + �.��i, z � = −�.�� − �.��i, z � = �.�� + �.��i, and z � = �.�� − �.��i. If expressed using z = e i ω , with ω = �π f , this corresponds to the frequencies f � = �.��, f � = −�.��, f � = �.��, and f � = −�.��. �ese frequencies can therefore be viewed as the angles of the vectors pointing to the corresponding roots. �us, the angle of the vector for the root corresponding to f � , which is marked with an arrow in the �gure, will be ω � = �π ⋅ �.��. An important insight is that the spectrum is nothing but �C(ω)�� (up to a proportionality constant σ e� ) evaluated along the unit circle, which implies that the spectrum will have dips at the frequencies that corresponds to the angles of the roots z � , for � = �, . . . ,�. Moreover, the closer the actual root is to the unit circle, the deeper the null, with the spectrum being zero if the root is on the unit circle. Examining the root corresponding to f � , we note that this root is closer to the unit circle as compared to z � , and will thus exhibit a deeper null in the resulting spectrum as compared to the one at frequency f � , just as we see in Figure �.�(c). Figure �.� also shows the corresponding ��� con�dence interval, obtained using theorem �.�� detailed below. � ��
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
October �, ���� – sida �� – � ��
� ���������� ���������
It is o�en helpful to compute the roots of the generating polynomial as well as the angles of these roots. Using Matlab, this is done using the following lines of code: C = [ 1 .8 .5 .2 .6 ]; f = angle( roots(C) )/pi/2;
In Matlab, all vectors are indexed as starting at z � . �us, the �rst line will be interpreted by Matlab as forming the polynomial C(z) in (�.���). �e second line will compute the roots of C(z), followed by �nding the argument of the roots, and scaling these arguments with �π. As noted in the above discussion, we will need to formulate a generalization of theorem �.�� for MA processes, which is given as [Bar��]. �eorem �.�� Let y t , for t = �, . . . ,N, be a realization of an MA(q) process. If ρˆ y (k) is estimated according to de�nition �.�, then (for large N) E{ρˆ y (k)} = � � V {ρˆ y (k)} = �� + �(ρˆ�y (�) + . . . + ρˆ�y (q))� N
(�.���) (�.���)
for k = q + �, q + �, . . .. Furthermore, ρˆ y (k), for �k� > q, is asymptotically Normal distributed. � �is result implies that the con�dence interval introduced for a white noise process in (�.��) should for an MA process be extended to also incorporate the estimated correlation lags. We have thus concluded that: �e (approximative) ��� con�dence interval for an MA(q) process can be expressed as � � + �(ρˆ�y (�) + . . . + ρˆ�y (q) ρˆe (k) ≈ � ± � for �k� ≥ q + � N √ For white noise, i.e., q = �, this simpli�es to ρˆe (k) ≈ � ± �� N.
�e provided Matlab function acf computes an estimate of the ACF as well as the above con�dence interval. © T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
��
October �, ���� – sida �� – � ��
� ���������� ���������
-
et
q X y i P PX PX PX PX X 6 PX PPXXXX P X ⌫ ⌫ ⌫ ? −a p −a � −a �
∑
6
y t−p
FIGURE �.�
⌧
z −�
6 q z −�
y t−�
- yt
6 q z −�
y t−�
Generation of an AR(p) process.
�.�.� THE AUTOREGRESSIVE PROCESS
We proceed to de�ne the second basic linear process, namely: De�nition �.�� �e process y t is called an autoregressive (AR) process if △
A(z)y t = y t + a � y t−� + . . . + a p y t−p = e t
(�.���)
where A(z) is a monic polynomial of order p, i.e., A(z) = � + a � z −� + . . . + a p z −p
(�.���)
where a p ≠ �, and e t is a zero-mean white noise process with variance σ e� , being uncorrelated with y t−� , for � > �. �e resulting AR(p) process is stationary (and thus an AR process) if and only if all the zeros of the generating polynomial A(z) are strictly within the unit circle, but is always invertible. Figure �.� illustrates the generation of an AR(p) process. �
The mean of an AR process is found by taking the expectation of both sides of (�.���), i.e.,

E{y_t + a_1 y_{t−1} + . . . + a_p y_{t−p}} = E{e_t} = 0   (�.���)
Thus, m_y(1 + a_1 + . . . + a_p) = m_y A(1) = 0, which implies that m_y = 0, as all the zeros of A(z) are strictly within the unit circle, implying that A(1) ≠ 0. From (�.�) and (�.���), as well as using that m_y = 0, one may also find the covariance function of the process by post-multiplying the process with y*_{t−k} and taking the expectation, i.e.,

E{e_t y*_{t−k}} = E{y_t y*_{t−k} + a_1 y_{t−1} y*_{t−k} + . . . + a_p y_{t−p} y*_{t−k}}   (�.���)
             = r_y(k) + a_1 r_y(k − 1) + . . . + a_p r_y(k − p)   (�.���)

Since e_t is uncorrelated with y_{t−ℓ}, for ℓ > 0, E{e_t y*_{t−k}} = σ_e² δ_K(k), with δ_K(k) defined as in (�.��), implying that

r_y(k) + a_1 r_y(k − 1) + . . . + a_p r_y(k − p) = σ_e² δ_K(k)   (�.���)

which are known as the Yule-Walker (YW) equations. Expressed in matrix form for k = 0, . . . , n, for some chosen n, (�.���) implies that

\begin{bmatrix}
r_y(0) & r_y(-1) & \cdots & r_y(-n) \\
r_y(1) & r_y(0)  & \ddots & \vdots  \\
\vdots & \ddots  & \ddots & r_y(-1) \\
r_y(n) & \cdots  & r_y(1) & r_y(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} σ_e^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}   (�.���)

where a_k = 0 for k > p. Introducing

θ = [ a_1  . . .  a_n ]^T   (�.���)

and, by using all but the first row of (�.���), yields

\begin{bmatrix} r_y(1) \\ \vdots \\ r_y(n) \end{bmatrix}
+
\begin{bmatrix}
r_y(0)   & \cdots & r_y(-n+1) \\
\vdots   & \ddots & \vdots    \\
r_y(n-1) & \cdots & r_y(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}   (�.���)

or, with obvious definitions,

r_n + R_n θ = 0   (�.���)
implying that

θ̂ = −R_n^{−1} r_n   (�.���)
which directly yields an estimate of the AR coefficients. We will here refer to this as the Yule-Walker estimate of the AR coefficients (see also Example �.�). It is worth again noting that R_n is a Toeplitz matrix, a fact that we will shortly make good use of.

Example �.�� Consider the real-valued AR(1) process formed using

y_t + a_1 y_{t−1} = e_t   (�.���)
Clearly, if |a_1| > 1, then y_t = e_t − a_1 y_{t−1} will grow exponentially as t grows, and y_t will therefore not be a stationary process, confirming that the roots of the A(z)-polynomial need to be strictly inside the unit circle for the process to be an AR process. Using (�.���) implies that

r_y(0) + a_1 r_y(1) = σ_e^2   (�.���)
r_y(1) + a_1 r_y(0) = 0   (�.���)

where we have exploited that r_y(k) = r*_y(−k). Clearly, this allows us to estimate a_1 as a function of r_y(0) and r_y(1), i.e.,

a_1 = − \frac{r_y(1)}{r_y(0)}   (�.���)

σ_e^2 = r_y(0) + a_1 r_y(1) = \frac{r_y^2(0) − r_y^2(1)}{r_y(0)}   (�.���)

Alternatively, we may assume that we know a_1 and instead solve for r_y, i.e.,

r_y(0) = \frac{σ_e^2}{1 − a_1^2}   (�.���)

r_y(1) = −a_1 r_y(0) = −a_1 \frac{σ_e^2}{1 − a_1^2}   (�.���)

As r_y(k) + a_1 r_y(k − 1) = 0, we may extend this to a general k as

r_y(k) = (−a_1)^{|k|} \frac{σ_e^2}{1 − a_1^2}   (�.���)

where we have again exploited the symmetry of r_y(k). The power spectrum of y_t can be found by expressing the process as formed by filtering white noise with variance σ_e^2 through the first-order all-pole filter

H(z) = \frac{1}{1 + a_1 z^{−1}} ≡ \frac{1}{A(z)}   (�.���)

which, using (�.��), yields

φ_y(ω) = \frac{σ_e^2}{[1 + a_1 e^{iω}][1 + a_1 e^{−iω}]} = \frac{σ_e^2}{1 + a_1^2 + 2 a_1 \cos ω}   (�.���)
From the expression for φ_y(ω), it is worth noting that the power in y_t will be concentrated at low frequencies if a_1 < 0, and the process is therefore referred to as a lowpass process, with the power being more concentrated close to ω = 0 the closer a_1 is to −1 (recall that |a_1| < 1 to ensure stability), whereas if a_1 > 0, the power will instead be concentrated at high frequencies, and the process is then called a highpass process. □

Generalizing the formulation of φ_y(ω) in Example �.�� to an AR(p) process, we find that an AR(p) process will satisfy

m_y = 0   (�.���)

r_y(k) = σ_e^2 δ_K(k) − \sum_{\ell=1}^{p} a_\ell r_y(k − \ell)   (�.���)

φ_y(ω) = \frac{σ_e^2}{|A(ω)|^2}   (�.���)

with A(ω) indicating that the polynomial has been evaluated at ω.
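As a minimal illustration of these relations (the AR polynomial and sample size below are assumptions chosen for the sketch, not values from the text), one may simulate an AR(p) process by filtering white noise through 1/A(z) and compare its periodogram with σ_e²/|A(ω)|²:

% Simulate an AR(p) process and compare its periodogram with sigma_e^2/|A(w)|^2.
A  = [ 1 -1.5 0.7 ];                % assumed AR(2) polynomial, zeros inside the unit circle
N  = 1024;
e  = randn(N,1);                    % white noise, sigma_e^2 = 1
y  = filter(1, A, e);               % AR realization
Phat  = abs(fft(y)).^2 / N;         % periodogram
f     = (0:N-1)'/N;
Ptrue = 1 ./ abs(fft(A, N).').^2;   % sigma_e^2 / |A(w)|^2 on the same grid
semilogy(f, Phat, f, Ptrue)
xlabel('Normalized frequency'), legend('Periodogram','True PSD')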
FIGURE �.�  The figure illustrates the AR(�) process discussed in Example �.��, with (a) showing a realization of the process, (b) the estimated correlation function, (c) the estimated power spectral density, and (d) the roots of the A(z)-polynomial.
Example �.�� Consider the AR(4) process formed using

A(z) = 1 + �.�z^{−1} + �.�z^{−2} + �.�z^{−3} + �.�z^{−4}   (�.���)
Figure �.� illustrates a realization of this process together with the estimated correlation function, spectral density, and the roots of the A(z)-polynomial. Reminiscent of the discussion in Example �.��, it can be seen that the locations of the roots of the A(z)-polynomial will also here dictate the behavior of the power spectrum of the process, although now with the difference that if a root is close to the unit circle, it will, due to the inverse in (�.���), create a more pronounced peak as compared to a root located further from the unit circle. It can also be seen that a root closer to the origin will yield a broader peak as compared to one close to the unit circle; this is due to the fact that the modes closer to the origin will decay more rapidly, essentially behaving as a damped
sinusoidal component, which yields a widening of the resulting spectral line. From the figure, it can also be seen that the ACF of an AR process exhibits a ringing behavior. This is quite characteristic for AR processes and will be one of the features we will later on exploit to identify signals that can be well modeled using an AR model. □

FIGURE �.��  (a) The estimated power spectral density, and (b) the roots of the estimated A(z)-polynomial using an AR(��) model for the signal in Example �.��.
Example �.�� As mentioned in Example �.��, given on page ��, voiced speech is often modeled using a sinusoidal model. As a sum of d sinusoids may be perfectly modeled using an AR(2d) model [CLP��], a common alternative is to instead use an AR model. Examining the periodogram estimate indicates that a model using d = �� sinusoids might be reasonable, thereby suggesting that an AR(��) model ought to be appropriate. Regrettably, the presence of the additive white noise will make the AR(2d) model match poorly with the observed spectrum (in fact, the noise will make the model instead correspond to an ARMA(2d, 2d) model; see Section �.�.� for more details on ARMA processes), which can also be noted in Figure �.��(a), showing the match between an AR(��) model and the periodogram estimate. As can be seen in the figure, the spectrum matches well at the lower frequencies, but fails to give more than a rough model of the higher order peaks. The reason for this can also be seen in Figure �.��(b), showing the roots of the resulting A(z)-polynomial. If no noise were present, all the roots should be gathered along the right half circle, with only four of the roots appearing in the left half circle, and then also at low angles (this as the peaks in the periodogram appear only in the lower part of the spectrum, with only two peaks being higher than � kHz). However, the additive noise has the effect of corrupting this distribution, instead making the root distribution more uniform around the entire unit circle, thereby causing the poor spectral fit at the higher frequencies (see also [vM��]). In order to handle the effect of the noise when using an AR model, one needs to extend the used model further; we will return to this problem in Example �.��, given on page ���. □

�.�.� THE LEVINSON-DURBIN ALGORITHM
In this section, we will discuss a computationally efficient method for computing the Yule-Walker (YW) estimate of the AR coefficients, as given in (�.���). The computation of (�.���) is computationally expensive, requiring O(n³) operations, meaning that the cost can be written as c_3 n³ + c_2 n² + c_1 n + c_0, for some constants c_ℓ, for ℓ = 0, . . . , 3, i.e., the operation has a complexity of order n³. Fortunately, this complexity can be drastically reduced by exploiting the Toeplitz structure of the covariance matrix. Recall the YW equations in (�.���),

\begin{bmatrix}
r_y(0) & r_y(-1) & \cdots & r_y(-n) \\
r_y(1) & r_y(0)  & \ddots & \vdots  \\
\vdots & \ddots  & \ddots & r_y(-1) \\
r_y(n) & \cdots  & r_y(1) & r_y(0)
\end{bmatrix}
\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} σ_n^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}   (�.���)

or, using matrix notation,

R_{n+1} \begin{bmatrix} 1 \\ θ_n \end{bmatrix} = \begin{bmatrix} σ_n^2 \\ 0 \end{bmatrix}   (�.���)

where 0 denotes a column vector with elements 0 of appropriate dimension, and where we now use the notation σ_n² and θ_n in place of σ_e² and θ, respectively, to stress the order n of the nested structure. Using this structure and that
r_y(k) = r*_y(−k), we may form the vector

R_{n+2} \begin{bmatrix} 1 \\ θ_n \\ 0 \end{bmatrix}
= \begin{bmatrix} R_{n+1} & \begin{matrix} r_y^*(n+1) \\ \tilde{r}_n \end{matrix} \\ \begin{matrix} r_y(n+1) & \tilde{r}_n^* \end{matrix} & r_y(0) \end{bmatrix}
\begin{bmatrix} 1 \\ θ_n \\ 0 \end{bmatrix}
= \begin{bmatrix} σ_n^2 \\ 0 \\ α_n \end{bmatrix}   (�.���)

where r̃_n indicates that the vector r_n has been ordered in the opposite direction, i.e., (cf. (�.���)–(�.���))

r̃_n = [ r_y(n)  . . .  r_y(1) ]^T   (�.���)

and where

α_n = r_y(n + 1) + r̃*_n θ_n   (�.���)

is obtained from the bottom row. Thus, if α_n could be nulled, (�.���) would be the counterpart of (�.���), with n increased by one. To achieve this, we introduce the reflection coefficient k_{n+1}, defined as

k_{n+1} ≜ − \frac{α_n}{σ_n^2}   (�.���)

and form

R_{n+2} \left( \begin{bmatrix} 1 \\ θ_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} 0 \\ θ̃_n \\ 1 \end{bmatrix} \right)
= \begin{bmatrix} σ_n^2 \\ 0 \\ α_n \end{bmatrix} + k_{n+1} \begin{bmatrix} α_n^* \\ 0 \\ σ_n^2 \end{bmatrix}
= \begin{bmatrix} σ_n^2 + k_{n+1} α_n^* \\ 0 \\ 0 \end{bmatrix}   (�.���)

where we have made use of the fact that, for any Hermitian Toeplitz matrix,

y = Rx  ⇔  ỹ = Rx̃   (�.���)
The Levinson-Durbin algorithm

Initialization:

θ_1 = − \frac{r_y(1)}{r_y(0)} = k_1
σ_1^2 = r_y(0) − \frac{|r_y(1)|^2}{r_y(0)}

Then, for iteration n = 1, . . . , n_max,

k_{n+1} = − \frac{r_y(n+1) + r̃*_n θ_n}{σ_n^2}
σ_{n+1}^2 = σ_n^2 \big(1 − |k_{n+1}|^2\big)
θ_{n+1} = \begin{bmatrix} θ_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} θ̃_n \\ 1 \end{bmatrix}
where, as before, x̃ indicates that the vector x has been ordered in the opposite direction. The expression in (�.���) has the same form as (�.���), with n increased by one, i.e.,

R_{n+2} \begin{bmatrix} 1 \\ θ_{n+1} \end{bmatrix} = \begin{bmatrix} σ_{n+1}^2 \\ 0 \end{bmatrix}   (�.���)

with θ_{n+1} therefore being the parameter vector corresponding to the YW equations of the extended dimension. This suggests that we may compute an order-recursive estimate of θ as

θ_{n+1} = \begin{bmatrix} θ_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} θ̃_n \\ 1 \end{bmatrix}   (�.���)
σ_{n+1}^2 = σ_n^2 \big(1 − |k_{n+1}|^2\big)   (�.���)

The initialization is straightforward, following from (�.���) and (�.���), and the algorithm can be summarized as in the above table. As can be seen from the table, the Levinson-Durbin algorithm will reduce the complexity
of computing θ to O(n²) operations, which is a substantial computational reduction, particularly important for larger values of n. The estimate can be computed in Matlab using the function levinson. One should note that an estimate of r_y(k) is needed prior to computing the Levinson-Durbin estimate. As this is also a relatively computationally expensive estimate, an algorithm that could estimate θ directly from the measurements y_t, without the need of first computing r_y(k), would clearly be preferable. Such algorithms exist and work exceedingly well (see, e.g., Example �.�). The most well-known of these are the so-called Burg algorithm [Bur��] and the modified covariance method [Mar��], where the latter is generally perceived to be the method of choice for estimating θ both efficiently and accurately. If using Matlab, these estimates can be found by using the functions arburg and armcov. The reader is referred to [Mar��, SM��, WM��] for a further discussion of these algorithms. It is worth remarking that one may also rewrite the Levinson-Durbin algorithm in a parallel form, such that it can be implemented efficiently in a pipeline structure (similar to the structure used by the Burg algorithm). The resulting so-called Schur recursion is numerically preferable to the Levinson-Durbin algorithm and, if implemented on a parallel architecture, reduces the required complexity somewhat further (although, similar to the Levinson-Durbin algorithm, it still requires estimates of r_y(k)). The interested reader is referred to [Hay��] for a nice treatment of the topic. It is also worth stressing that the Levinson-Durbin algorithm will produce an exact solution of (�.���). If one allows for an approximate solution, one can achieve further substantial computational reductions using, for instance, the preconditioned conjugate gradient algorithm, which only requires O(�n log(�n)) operations (see, e.g., [CJ��]).
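As a minimal illustration (the model, data, and order below are assumptions for the sketch; levinson and arburg are part of the Signal Processing Toolbox), the Yule-Walker estimate may be computed via the Levinson-Durbin recursion and compared with the Burg estimate:

% Estimate AR coefficients from a simulated AR(2) realization.
A  = [ 1 -1.5 0.7 ];                 % assumed true AR polynomial
N  = 1000;
y  = filter(1, A, randn(N,1));
n  = 2;                              % model order
r  = zeros(n+1,1);
for k = 0:n                          % biased covariance estimates r_y(0), ..., r_y(n)
    r(k+1) = (y(1:N-k)' * y(k+1:N)) / N;
end
[Ayw, sigma2] = levinson(r, n);      % Yule-Walker estimate via Levinson-Durbin
Aburg = arburg(y, n);                % Burg estimate, computed directly from the data
disp([A; Ayw; Aburg])                % compare true and estimated polynomials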
FIGURE �.��  Generation of an ARMA(p,q) process.
�.�.� THE ARMA PROCESS
We now proceed to combine the two basic processes to form an autoregressive moving average (ARMA) process:

Definition �.�� The process y_t is called an ARMA process if

A(z) y_t = C(z) e_t   (�.���)

where A(z) and C(z) are monic polynomials of order p and q, respectively,

A(z) = 1 + a_1 z^{−1} + . . . + a_p z^{−p}   (�.���)
C(z) = 1 + c_1 z^{−1} + . . . + c_q z^{−q}   (�.���)

and e_t is a zero-mean white noise process with variance σ_e². The process is stationary if the roots of A(z) = 0 lie within the unit circle, and invertible if the roots of C(z) = 0 do. Figure �.�� illustrates the generation of an ARMA(p, q) process. □
Combining the above results, one may conclude that the PSD of an ARMA process is a rational function formed as the ratio between the two characteristic polynomials, i.e.,

φ_y(ω) = \frac{|C(ω)|^2}{|A(ω)|^2} σ_e^2   (�.���)

with A(ω) and C(ω) indicating that the polynomials have been evaluated at frequency ω, i.e., at z = e^{iω}. Alternatively, using the z-notation, (�.���) may be expressed as

φ_y(z) = \frac{C(z) C^*(1/z^*)}{A(z) A^*(1/z^*)} σ_e^2   (�.���)

where

A(z) = 1 + a_1 z^{−1} + . . . + a_p z^{−p}   (�.���)
A^*(1/z^*) = [A(1/z^*)]^* = 1 + a_1^* z + . . . + a_p^* z^p   (�.���)

with C(z) and C^*(1/z^*) defined similarly. This result is often termed the spectral factorization theorem. From (�.���), one may note that the poles and zeros of φ_y(z) will appear in symmetric pairs about the unit circle, such that if z_k = r_k e^{iθ_k} is a pole (zero), then 1/z_k^* = r_k^{−1} e^{iθ_k} is also a pole (zero). According to Weierstrass' theorem, any continuous PSD can be approximated arbitrarily closely by a rational PSD of the form (�.���), provided that the degrees p and q are sufficiently large. This fact makes ARMA processes particularly interesting for many forms of modeling, although it should be noted that "sufficiently large" may well mean that, for some processes, the polynomial orders need to be very high to allow for a reasonably accurate model. Rewriting (�.���) as
q
�=�
�=�
y t + � a � y t−� = � c � e t−�
(�.���)
with c � = �, multiplying with y∗t−k , and taking the expectation yields p
q
�=�
�=�
r y (k) + � a � r y (k − �) = � c � E{e t−� y ∗t−k }
© T H E A U T H O R A N D S T U D E N T L I T T E R AT U R
(�.���) ��
October �, ���� – sida �� – � ��
� ���������� ���������
where the expectation of e t−� y∗t−k needs to be further resolved. Using that (�.���) can be expressed as the signal obtained by �ltering the white noise process e t through the asymptotically stable and causal �lter H(z), i.e.,
where
yt =
∞ C(z) △ e t = � h � e t−� = H(z)e t A(z) �=�
(�.���)
∞
H(z) = � h � z −�
(�.���)
�=�
with h � = �. �us, the term E{e t−� y∗t−k } may be expressed as ∞
E{e t−� y ∗t−k } = E �e t−� � h∗s e ∗t−k−s � ∞
s=�
= σ e� � h∗s δ K (� − k − s) = σ e� h ∗�−k s=�
with h k = � for k < �, and, as a result, p
q
�=�
�=�
r y (k) + � a � r y (k − �) = σ e� � c � h∗�−k
(�.���) (�.���)
(�.���)
Generally, the �lter h k is a complicated function of the {a � } and {c � } coef�cients, but for �k� ≥ q + �, (�.���) reduces to p
r y (k) + � a � r y (k − �) = � for �k� > q �=�
(�.���)
�is allows for an extensions of the above discussed Yule-Walker equations, such that these are reformulated for �k� = q + �, q + �, . . ., and thereby allowing for the estimation of the AR part of the process using a scheme reminiscent to the one discussed above for AR processes. Using these parameters, one may then form an estimated MA process by inverse �ltering y t through the obtained AR polynomial. �e resulting residual can then be modeled as an MA process, and an estimate of the MA part of y t formed accordingly (this is, in itself, a challenging task; see also Section �.�.�). We refer the reader to [SM��] for further details on the resulting modi�ed Yule-Walker method. ��
In summary, the ARMA(p,q) process will satisfy

m_y = 0   (�.���)

φ_y(ω) = \frac{|C(ω)|^2}{|A(ω)|^2} σ_e^2   (�.���)

r_y(k) + \sum_{\ell=1}^{p} a_\ell r_y(k − \ell) = 0,   for |k| > q   (�.���)
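These relations are straightforward to evaluate numerically; the following minimal sketch (the polynomials and sample size are assumptions) simulates an ARMA process and forms its rational PSD on a frequency grid:

% Simulate an ARMA(p,q) process and evaluate its PSD |C(w)|^2/|A(w)|^2 * sigma_e^2.
A = [ 1 -1.5 0.7 ];                  % assumed A(z)
C = [ 1 0.5 ];                       % assumed C(z)
N = 2048;
y = filter(C, A, randn(N,1));        % ARMA realization (sigma_e^2 = 1)
Phat  = abs(fft(y)).^2 / N;          % periodogram of the realization
Ptrue = abs(fft(C, N).').^2 ./ abs(fft(A, N).').^2;   % rational PSD on the grid
f = (0:N-1)'/N;
semilogy(f, Phat, f, Ptrue), xlim([0 0.5])
xlabel('Normalized frequency'), legend('Periodogram','ARMA PSD')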
Example �.�� Consider a real-valued ARMA(1,1) process, defined as

y_t + a_1 y_{t−1} = e_t + c_1 e_{t−1}   (�.���)

where e_t is a zero-mean white noise process. The autocovariance of y_t may then be formed by multiplying with y_{t−k} and taking the expectation, i.e.,

E{y_t y_{t−k} + a_1 y_{t−1} y_{t−k}} = E{e_t y_{t−k} + c_1 e_{t−1} y_{t−k}}   (�.���)

implying that

r_y(0) + a_1 r_y(1) = E{e_t y_t} + c_1 E{e_{t−1} y_t}
  = E{e_t (−a_1 y_{t−1} + e_t + c_1 e_{t−1})} + c_1 E{e_{t−1} y_t}
  = σ_e^2 + c_1 E{e_{t−1} (−a_1 y_{t−1} + e_t + c_1 e_{t−1})}
  = (1 + c_1^2 − c_1 a_1) σ_e^2   (�.���)

r_y(1) + a_1 r_y(0) = E{e_t y_{t−1}} + c_1 E{e_{t−1} y_{t−1}} = c_1 σ_e^2   (�.���)

and thus r_y(k) = −a_1 r_y(k − 1), for k ≥ 2. Inserting r_y(1) from (�.���) into (�.���) yields

r_y(0) = −a_1 \big[ c_1 σ_e^2 − a_1 r_y(0) \big] + (1 + c_1^2 − c_1 a_1) σ_e^2
       = \frac{1 + c_1^2 − 2 c_1 a_1}{1 − a_1^2} σ_e^2   (�.���)

and, as a result,

r_y(1) = \Big( c_1 − a_1 \frac{1 + c_1^2 − 2 c_1 a_1}{1 − a_1^2} \Big) σ_e^2 = \frac{(1 − c_1 a_1)(c_1 − a_1)}{1 − a_1^2} σ_e^2   (�.���)
Normalizing with r_y(0), the ACF is obtained as

ρ_y(1) = \frac{(1 − c_1 a_1)(c_1 − a_1)}{1 + c_1^2 − 2 c_1 a_1}   (�.���)

ρ_y(k) = (−a_1)^{k−1} ρ_y(1),   for k ≥ 1   (�.���)

and will therefore exhibit an exponential decay for lags larger than one. For an ARMA(p,q) process, this exponential decay will start after lag |q − p|. □
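As a quick numerical sanity check (the parameter values below are assumptions chosen for illustration), the theoretical ACF may be compared with a sample estimate:

% Compare the theoretical ARMA(1,1) ACF with a sample estimate.
a1 = -0.7; c1 = 0.4;                        % assumed parameters
N  = 5000;
y  = filter([1 c1], [1 a1], randn(N,1));
maxLag = 10;
r = zeros(maxLag+1,1);
for k = 0:maxLag
    r(k+1) = (y(1:N-k)' * y(k+1:N)) / N;    % biased covariance estimates
end
rhoHat  = r / r(1);                          % sample ACF
k       = (1:maxLag)';
rho1    = (1 - c1*a1)*(c1 - a1) / (1 + c1^2 - 2*c1*a1);
rhoTrue = [1; (-a1).^(k-1) * rho1];          % theoretical ACF
disp([rhoHat rhoTrue])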
�.� Estimating the power spectral density
When computing the periodogram using a computer, one is obviously unable to compute the required Fourier transform over a continuum of frequencies, and is therefore forced to restrict the evaluation of the spectral estimate to a set of discrete frequencies. To make this explicit, we form the DFT of the sequence {y_t}_{t=1}^{N} as

Y_N(ω_\ell) = \sum_{t=1}^{N} y_t e^{−i 2π \ell t / N}   (�.���)

for the discrete frequencies ω_\ell = 2π\ell/N, for \ell = 0, . . . , N − 1, with the corresponding inverse DFT (IDFT) formed as

y_t = \frac{1}{N} \sum_{\ell=0}^{N−1} Y_N(ω_\ell) e^{i 2π \ell t / N}   (�.���)
The coefficient Y_N(0) = \sum_{t=1}^{N} y_t is often referred to as the DC offset (the term originates from electronics, where it refers to the direct current; the concept has since been extended to the mean of any waveform, and if the waveform is zero-mean, one commonly says it has a zero DC offset). The equations (�.���) and (�.���) represent the link between the time and frequency representations of a finite length signal y_t. Two often useful connections between these representations are Plancherel's theorem

\sum_{t=1}^{N} x_t y_t^* = \frac{1}{N} \sum_{\ell=0}^{N−1} X_N(ω_\ell) Y_N^*(ω_\ell)   (�.���)
where X_N(ω_\ell) and Y_N(ω_\ell) denote the DFT of x_t and y_t, respectively, and Parseval's theorem, which is a special case of Plancherel's theorem, stating that

\sum_{t=1}^{N} |x_t|^2 = \frac{1}{N} \sum_{\ell=0}^{N−1} |X_N(ω_\ell)|^2   (�.���)
Using (�.���), the periodogram estimate can be expressed as

φ̂_y(ω_\ell) = \frac{1}{N} |Y_N(ω_\ell)|^2 = \frac{1}{N} \Big| \sum_{t=1}^{N} y_t e^{−i 2π \ell t / N} \Big|^2   (�.���)
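In Matlab, this estimate is conveniently computed with the FFT; a minimal sketch (the example data vector is an assumption):

% Periodogram of a data vector y, evaluated at the DFT grid w_l = 2*pi*l/N.
y    = filter(1, [1 -0.9], randn(512,1));   % assumed example data
N    = length(y);
Phat = abs(fft(y)).^2 / N;                  % periodogram estimate
f    = (0:N-1)'/N;                          % normalized frequencies
plot(f(1:N/2), 10*log10(Phat(1:N/2)))
xlabel('Normalized frequency'), ylabel('Power [dB]')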
To determine the performance of this estimator, it is helpful to note that the correlogram estimate, defined in (�.��) on page ��, will coincide with the periodogram estimate in (�.���). As a result, the bias of both estimators can be determined as

E{φ̂_y(ω_\ell)} = E{φ̂^c_y(ω_\ell)} = \sum_{k=−(N−1)}^{N−1} E{r̂_y(k)} e^{−i 2π \ell k / N}   (�.���)

Using the biased autocovariance estimate defined in (�.��), (�.��) implies that

E{r̂_y(k)} = \frac{N − |k|}{N} r_y(k)   (�.���)

and hence

E{φ̂_y(ω_\ell)} = \sum_{k=−(N−1)}^{N−1} w_B(k) r_y(k) e^{−i 2π \ell k / N}   (�.���)

where w_B(k) is the so-called triangular window (also known as the Bartlett window), defined as

w_B(k) = \begin{cases} 1 − \dfrac{|k|}{N}, & k = 0, ±1, . . . , ±(N − 1) \\ 0, & \text{otherwise} \end{cases}   (�.���)
Noting that the transform of the product of two sequences is equal to the convolution of their transforms, (�.���) can alternatively be expressed as (see, e.g., [SM��])

E{φ̂_y(ω_\ell)} = \frac{1}{2π} \int_{−π}^{π} φ_y(ψ) W_B(ω_\ell − ψ) \, dψ   (�.���)
FIGURE �.��  The normalized Fejér kernel, W_B(ω)/W_B(0), for N = ��.
where W_B(ω_\ell) denotes the DFT of w_B(k), the so-called Fejér kernel,

W_B(ω_\ell) = \sum_{k=−(N−1)}^{N−1} \frac{N − |k|}{N} e^{−i ω_\ell k}   (�.���)
           = \frac{1}{N} \sum_{t=1}^{N} \sum_{s=1}^{N} e^{−i ω_\ell (t−s)} = \frac{1}{N} \Big| \sum_{t=1}^{N} e^{−i ω_\ell t} \Big|^2   (�.���)
           = \frac{1}{N} \Big| \frac{e^{−i ω_\ell N} − 1}{e^{−i ω_\ell} − 1} \Big|^2   (�.���)
           = \frac{1}{N} \Big[ \frac{\sin(ω_\ell N/2)}{\sin(ω_\ell /2)} \Big]^2   (�.���)

which is illustrated in Figure �.��. In the fourth equality, we have used that the sum is a geometric series, allowing it to be expressed as (A.�), given on page ���. From (�.���), it is clear that φ̂_y(ω_\ell) will only be unbiased if the window function W_B(ω_\ell) is a Dirac impulse, as defined in (�.��), which it is clearly not. One may therefore conclude that the periodogram estimate is biased. Indeed, as seen in Figure �.��, W_B(ω_\ell) is clearly a poor approximation of a Dirac impulse for short data lengths, N. However, as seen from (�.���),
the periodogram is an asymptotically unbiased spectral estimator, i.e.,

\lim_{N→∞} E{φ̂_y(ω_\ell)} = φ_y(ω_\ell)   (�.���)
Figure �.�� also illustrates the source of the periodogram's bias; the main lobe of W_B(ω_\ell) will, due to the convolution in (�.���), smooth the estimated spectrum, so that any spectral components that are closer than 1/N (in absolute frequency) will appear as a single broad peak in the resulting spectral estimate. This effect is termed smearing and is of great importance for the ability to resolve closely spaced spectral components. Similarly, the side lobes will leak power from adjacent frequency bands, making frequencies that should contain low power have higher power than anticipated. This so-called leakage effect makes it difficult to estimate weak signal components, especially if there are spectral components with large power close by. However, more problematic than the bias is the fact that the periodogram is not a consistent estimator. In fact, it can be shown that:

Theorem �.�� Assume that y_t can be expressed as y_t = \sum_{k=0}^{\infty} h_k e_{t−k}, where {h_k} is a stable linear filter and e_t is a Gaussian circularly symmetric white noise (the real and imaginary parts of a circularly symmetric process are uncorrelated), i.e., satisfying E{e_t e_s^*} = σ_e², if t = s, and zero otherwise, and E{e_t e_s} = 0, ∀ s and t. Then it holds that

\lim_{N→∞} E\big\{ [φ̂_y(ω_\ell) − φ_y(ω_\ell)][φ̂_y(ω_p) − φ_y(ω_p)] \big\} =
\begin{cases} φ_y^2(ω_\ell), & ω_\ell = ω_p \\ 0, & ω_\ell ≠ ω_p \end{cases}   (�.���)

We refer the reader to [SM��] for a proof of this result. □
As can be seen from the above theorem, the periodogram estimates at different frequency grid points are thus uncorrelated with each other, and the variance of the periodogram estimate will equal the square of the true spectrum. As one is often particularly interested in the peaks of the spectrum, this is quite problematic, and implies that the periodogram estimate has a large variance and is rather inaccurate. Recalling (�.��) on page ��, it is worth noting that
one further consequence of (�.���) is that the sum of the periodogram is a quite poor estimate of the variance of the signal. Nevertheless, due to its computational and conceptual simplicity, the periodogram is one of the most frequently used spectral estimators in practice.

ZERO PADDING AND WINDOWING
It is important to realize that the periodogram estimate of the sequence {y_t}_{t=1}^{N} in (�.���) only yields a spectral estimate at the discrete frequency grid points ω_\ell = 2π\ell/N, for \ell = 0, . . . , N − 1. Often, one is interested in evaluating the spectral estimate over a finer grid than that. This can be achieved simply by appending zeros to the measured sequence, such that the periodogram is instead computed on the sequence {y_1, . . . , y_N, 0, 0, . . . , 0}, where P − N zeros have been appended. The resulting spectral estimate is then found as

φ̂_y(ω_\ell) = \frac{1}{N} \Big| \sum_{t=1}^{N} y_t e^{−i 2π \ell t / P} \Big|^2   (�.���)

for the discrete frequencies ω_\ell = 2π\ell/P, for \ell = 0, . . . , P − 1. Appending zeros to the sequence can clearly not provide additional information, and the resulting spectral estimate will therefore have the same frequency content as the one in (�.���), with the exception that the former is evaluated on a finer frequency grid. Using Matlab, this so-called zero padding is achieved by using the command fft(yt,P), which computes the DFT of the vector yt using the Fast Fourier Transform (FFT) algorithm.

Example �.�� People have always been fascinated by the sun, and as early as ��� BCE, the Chinese astronomer Gan De noted that the sun has spots. The number and size of these sunspots vary over time and are due to magnetic activity causing large areas of the sun's surface to be cooler than the surrounding areas, making these areas appear as dark spots. It was in an attempt to determine the periodicity of these sunspots that Sir Arthur Schuster introduced the periodogram [Sch��, Sch��]. Figure �.�� illustrates the annual sunspot numbers for the years ����-����. Figure �.��(a) illustrates the periodogram estimate of the sunspot data (after removing the mean), with the circles marking the computed estimates, for frequency grid points with spacing
FIGURE �.��  The annual sunspot numbers for the years ����-����.
∆f = 1/N = ����� (in absolute frequencies). The dominant peak appears to be at f = �.��, suggesting that the sunspots contain a cycle of about 1/�.�� ≈ �.� years. Figure �.��(b) shows the estimate obtained if instead using zero padding, so that the periodogram estimate is evaluated over ���� grid points, such that the frequency spacing is ∆f = ������. As is clear from the figure, the finer frequency grid allows us to find a more accurate estimate of the frequency of the dominant peak, which now appears to be at f = �.����, suggesting a cycle of about ��.� years. It is worth noting that the finer grid also allows for a more accurate estimation of the first peak; when using the coarse grid, this peak is represented using just two frequency grid points, and as neither of these is close to the frequency of the peak, the line appears much lower than it should. However, it should be stressed that both figures contain the same information; clearly, no additional information can be obtained by appending zeros to the measurements. The second figure is just easier for us to interpret, and allows us to better see the information in the spectral estimate. The data is provided in the file dataSunspots. □

FIGURE �.��  Periodogram estimate of the sunspot data discussed in Example �.�� with and without zero padding.
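A minimal sketch of such a zero-padded periodogram (the data vector below is a placeholder; the variable name inside the provided dataSunspots file is not assumed):

% Zero-padded periodogram, evaluated on a finer frequency grid.
% Here, y is the mean-removed data vector (e.g., the sunspot numbers loaded
% from the provided dataSunspots file).
y = randn(94,1);                  % placeholder data vector
y = y - mean(y);
N = length(y);
P = 2048;                         % number of grid points after zero padding
Phat = abs(fft(y, P)).^2 / N;     % periodogram on the grid f_l = l/P
f = (0:P-1)'/P;
plot(f(1:P/2), Phat(1:P/2))
xlabel('Absolute frequency'), ylabel('Power')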
As shown in theorem �.��, the periodogram estimate has a very large variance, in particular for the parts of the spectrum that contain notable power. As one is often especially interested in these areas of the spectrum, it is important
to find ways to reduce the variance of the spectral estimate. One way to do so is to modify the estimate to use another window than the one naturally occurring in (�.���), by forming the spectral estimate as

φ̂_{BT}(ω_\ell) = \sum_{k=−(N−1)}^{N−1} w_k r̂_y(k) e^{−i 2π \ell k / N}   (�.���)

where w_k is an even function such that w_k = w_{−k}, w_0 = 1, and w_k = 0, for |k| > M, where M < N, and w_k decays smoothly to zero with growing |k|. The resulting spectral estimator is termed the Blackman-Tukey estimator, and the function w_k is called a lag window, as it weights the lags of the estimated autocovariance. Many other approaches exist to improve the spectral estimate, and the reader is referred to [SM��] for an in-depth treatment of this fascinating topic.
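A minimal sketch of a Blackman-Tukey estimate (the data, the truncation lag, and the choice of a triangular lag window are assumptions made for illustration):

% Blackman-Tukey estimate with a triangular lag window (real-valued data).
y = filter(1, [1 -0.9], randn(1024,1));     % assumed example data
N = length(y);  M = 64;                     % truncation lag, M < N
r = zeros(M+1,1);
for k = 0:M
    r(k+1) = (y(1:N-k)' * y(k+1:N)) / N;    % biased covariance estimates
end
w = 1 - (0:M)'/M;                           % triangular lag window, w_0 = 1
f = (0:511)'/1024;                          % frequency grid
Pbt = r(1) + 2*cos(2*pi*f*(1:M)) * (w(2:end).*r(2:end));  % r(0) + 2*sum_k w_k r(k) cos(2*pi*f*k)
plot(f, 10*log10(Pbt))
xlabel('Normalized frequency'), ylabel('Power [dB]')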