ListofFigures
1.1Mappingbetweenabstractgroups
2.1Lagrangemultipliersinconstrainedoptimization
2.7ConvergenceofNewton’smethodwithandwithoutline search
2.8Piecewiselinearobjectivefunction(1D)
4.1Overfittingandunderfitting:polynomialmodels
4.2Minimumdescriptionlength(MDL)
4.5ConvergencepropertiesofMetropolis-Hastings(MH)sampling
5.1Simple5-nodeprobabilisticgraphicalmodel(PGM)
5.2Notationforrepeatedconnectivityingraphicalmodels
5.3Somecomplexgraphicalmodels
5.4Threebasicprobabilisticgraphicalmodelpatterns
6.1GaussianmixturemodelforGammarayintensitystrati-
6.2Convergenceofexpectation-maximization(E-M)forthe
6.3Classifierdecisionboundaries
6.14Principalcomponentsanalysis(PCA):trafficcountsignals
6.16Locallylinearembedding(LLE):chirpsignals
7.8Computationalcomplexityoflong-durationFIRimplementations
7.9Impulseresponseandtransferfunctionofdiscretizedkernelregression
7.11Variousnonparametricpowerspectraldensityestimators 234
7.12Linearpredictivepowerspectraldensityestimators 236
7.13Regularizedlinearpredictionanalysismodelselection 238
7.14Wienerfilteringofhumanwalkingsignals 239
7.15Sinusoidalsubspaceprincipalcomponentsanalysis(PCA) filtering
7.16SubspaceMUSICfrequencyestimation
7.17Typical2DKalmanfiltertrajectories 243
7.18KalmanfiltersmoothingbyTikhonovregularization 252
7.19Short-timeFouriertransformanalysis 254
7.20Time-frequencytiling:Fourierversuswaveletanalysis 256
7.21Aselectionofwaveletfunctions 261
7.22VanishingmomentsoftheDaubechieswavelet 262
7.23Discretewavelettransformwaveletshrinkage 263
8.1DigitalMEMSsensors
8.2Discrete-timesamplinghardwareblockdiagram
8.3Non-uniformanduniformandbandlimitedsampling 268
8.4Shannon-Whittakeruniformsamplinginthefrequency domain 269
8.5DigitalGammarayintensitysignalfromadrillhole 271
8.6QuadraticB-splineuniformsampling 272
8.7Rate-distortioncurvesforani.i.d.Gaussiansource 277
8.8Lloyd-Maxscalarquantization 280
8.9Entropy-constrainedscalarquantization 282
8.10Shannon-Whittakerreconstructionofdensityfunctions 283
8.11Quantizationanddithering 285
8.12Scalarvesusvectorquantization 286
8.13Compandingquantization 287
8.14Linearpredictivecodingfordatacompression 289
8.15Transformcoding:bitcountversuscoefficientvariance 291
8.16Transformcodingofmusicalaudiosignals 292
8.17Compressivesensing:discretecosinetransformbasis 295
8.18Randomdemodulationcompressivesensing 297
9.1Quantilerunningfilterforexoplanetarylightcurves 301
9.2First-orderlog-normalrecursivefilter:dailyrainfallsignals 302
9.3Totalvariationdenoising(TVD)ofpowerspectraldensity timeseries 303
9.4Bilateralfilteringofhumanwalkingdata 304
9.5HiddenMarkovmodelling(HMM)ofpowerspectraltime seriesdata 306
9.6Cepstralanalysisofvoicesignals 312
10.1Infiniteexchangeabilityinprobabilisticgraphicalmodel form 315
10.2Linkingkernelandregularizedbasisfunctionregression 319
10.3Randomfunctiondrawsfromthelinearbasisregression model
10.4Gaussianprocess(GP)functiondrawswithvariouscovariancefunctions
10.6Gaussianprocess(GP)regressionforclassification
10.7DrawsfromtheDirichletdistributionfor K =3
10.8TheDirichletprocess(DP)
10.10ComparingpartitionsizesofDirichletversusPitman-Yor processes
10.11Parametricversusnonparametricestimationoflinearpredictivecoding(LPC)coefficients
10.12DP-meansforsignalnoisereduction:exoplanetlightcurves341
10.13Probabilisticgraphicalmodel(PGM)fortheBayesian mixturemodel
10.14PerformanceofMAP-DPversusDP-meansnonparametricclustering
Mathematicalfoundations 1
Statisticalmachinelearningandsignalprocessingaretopicsinapplied mathematics,whicharebaseduponmanyabstractmathematicalconcepts.Definingtheseconceptsclearlyisthemostimportantfirststepin thisbook.Thepurposeofthischapteristointroducethesefoundational mathematicalconcepts.Italsojustifiesthestatementthatmuchofthe artofstatisticalmachinelearningasappliedtosignalprocessing,liesin thechoiceofconvenientmathematicalmodelsthathappentobeuseful inpractice.Convenientinthiscontextmeansthatthealgebraicconsequencesofthechoiceofmathematicalmodelingassumptionsareinsome sensemanageable.Theseedsofthismanageabilityaretheelementary mathematicalconceptsuponwhichthesubjectisbuilt.
1.1Abstractalgebras
Wewilltakethesimpleviewinthisbookthatmathematicsisbasedon logicappliedto sets:asetisanunorderedcollectionofobjects,often realnumbers suchastheset {π, 1,e} (whichhasthree elements),orthe setofallrealnumbers R (withaninfinitenumberofelements).From thismodestoriginitisaremarkablefactthatwecanbuildtheentirety ofthemathematicalmethodsweneed.Wefirststartbyreviewingsome elementaryprinciplesof(abstract) algebras.
Groups
An algebra isastructurethatdefinestherulesofwhathappenswhen pairsofelementsofasetareacteduponby operations.Akindof algebraknownasa group (+, R) istheusualnotionofadditionwith pairsofrealnumbers.Itisagroupbecauseithasan identity,the numberzero(whenzeroisaddedtoanynumberitremainsunchanged, i.e.a+0=0+a = a,andeveryelementinthesethasan inverse (forany number a,thereisaninverse a whichmeansthat a+( a)=0.Finally, theoperationis associative,whichistosaythatwhenoperatingonthree ormorenumbers,additiondoesnotdependontheorderinwhichthe numbersareadded(i.e. a +(b + c)=(a + b)+ c).Additionalsohas theintuitivepropertythat a + b = b + a,i.e.itdoesnotmatterifthe numbersareswapped:theoperatoriscalled commutative,andthegroup isthencalledan Abeliangroup.Mirroringadditionismultiplication actingonthesetofrealnumberswithzeroremoved (×, R −{0}),which isalsoanAbeliangroup.Theidentityelementis1,andtheinverses
Fig.1.1:Illustratingabstractgroups andmappingbetweenthem.Shown arethetwocontinuousgroupsofreal numbersunderadditionis(leftcolumn)andmultiplication(rightcolumn),withidentities 0 and 1 respectively.Thehomomorphismof exponentiationmapsadditiononto multiplication(lefttorightcolumn), andtheinverse,thelogarithm,maps multiplicationbackontoaddition (righttoleftcolumn).Therefore, thesetwogroupsarehomomorphic.
arethereciprocalsofeachnumber.Multiplicationisalsoassociative, andcommutative.Notethatwecannotincludezerobecausethiswould requiretheinclusionoftheinverseofzero 1/0,whichdoesnotexist (Figure1.1).
Groupsarenaturallyassociatedwith symmetries.Forexample,the setofrigidgeometrictransformationsofarectanglethatleavetherectangleunchangedinthesameposition,formagrouptogetherwithcompositionsofthesetransformations(thereareflipsalongthehorizontal andverticalmidlines,oneclockwiserotationthrough180° aboutthe centre,andtheidentitytransformationthatdoesnothing).Thisgroup canbedenotedas V4 =(◦, {e,h,v,r}),where e istheidentity, h thehorizontalflip, v theverticalflip,and r therotation,withthecomposition operation ◦.Fortherectangle,wecanseethat h◦v = r,i.e.ahorizontal followedbyaverticalflipcorrespondstoa180° rotation(Figure1.2).
Veryoften,thefactthatweareabletomakesomeconvenientalgebraiccalculationsinstatisticalmachinelearningandsignalprocessing, canbetracedtotheexistenceofoneormoresymmetrygroupsthatarise duetothechoiceofmathematicalassumptions,andwewillencounter manyexamplesofthisphenomenainlaterchapters,whichoftenleadto significantcomputationalefficiencies.Astrikingexampleoftheconsequencesofgroupsinclassicalalgebraistheexplanationforwhythere arenosolutionsthatcanbewrittenintermsofaddition,multiplicationandroots,tothegeneralpolynomialequation N i=0 aixi =0 when N ≥ 5.Thisfacthasmanypracticalconsequences,forexample,itis possibletofindtheeigenvaluesofageneralmatrixofsize N × N using simpleanalyticalcalculationswhen N< 5 (althoughtheanalyticalcalculationsdobecomeprohibitivelycomplex),butthereisnopossibility ofusingsimilaranalyticaltechniqueswhen N ≥ 5,andonemustresort tonumericalmethods,andthesemethodssometimescannotguarantee tofindallsolutions!
Fig.1.2:Thegroupofsymmetries oftherectangle, V4 =(◦, {e,h,v,r}). Itconsistsofhorizontalandvertical flips,andarotationof180° aboutthe centre.Thisgroupisisomorphicto thegroup M8 =(×8, {1, 3, 5, 7}) (see Figure1.3).
Manysimplegroupswiththesamenumberofelementsare isomorphic toeachother,thatis,thereisauniquefunctionthatmapstheelements ofonegrouptotheelementsoftheother,suchthattheoperations canbeappliedconsistentlytothemappedelements.Intuitivelythen, theidentityinonegroupismappedtothatoftheothergroup.For example,therotationgroup V4 aboveisisomorphictothegroup M8 = (×8, {1, 3, 5, 7}),where×8 indicatesmultiplicationmodulo8(thatis, takingtheremainderofthemultiplicationondivisionby8,seeFigure 1.3).
Whilsttwogroupsmightnotbeisomorphic,theyaresometimes homomorphic:thereisafunctionbetweenonegroupandtheotherthat mapseachelementinthefirstgrouptooneormoreelementsinthe secondgroup,butthemappingisstillconsistentunderthesecondoperation.Averyimportantexampleistheexponentialmap, exp(x),that convertsadditionoverthesetofrealnumbers,tomultiplicationover thesetofpositiverealnumbers: ea+b = eaeb;apowerfulvariantof thismapiswidelyusedinstatisticalinferencetosimplifyandstabilize calculationsinvolvingtheprobabilitiesofindependentstatisticalevents,
byconvertingthemintocalculationswiththeirassociatedinformation content.Thisisthenegativelogarithmicmap ln(x),thatconverts probabilitiesundermultiplication,intoentropiesunderaddition.This mapisverywidelyusedinstatisticalinferenceasweshallsee.
Foramoredetailedbutaccessiblebackgroundtogrouptheory,read Humphreys(1996).
Rings
Whilstgroupsdealwithoneoperationonasetofnumbers, rings area slightlymorecomplexstructurethatoftenariseswhentwooperations areappliedtothesameset.Themostimmediatelytangibleexample istheoperationsofadditionandmultiplicationonthesetofintegers Z (thepositiveandnegativewholenumberswithzero).Usingthedefinitionabove,thesetofintegersunderadditionformanAbeliangroup, whereasundermultiplicationtheintegersformasimplestructureknown asa monoid –agroupwithoutinverses.Multiplicationwiththeintegers isassociative,andthereisanidentity(thepositivenumberone),butthe multiplicativeinversesarenotintegers(theyarefractionssuchas 1/2, 1/5 etc.)Finally,incombination,integermultiplication distributes overintegeraddition: a × (b + c)= a × b + a × c =(b + c) × a.These propertiesdefinearing:ithasoneoperationthattogetherwiththeset formsanAbeliangroup,andanotheroperationthat,withtheset,forms amonoid,andthesecondoperationdistributesoverthefirst.Aswith integers,thesetofrealnumbersundertheusualadditionandmultiplicationalsohasthestructureofaring.Anotherveryimportantexample isthesetof squarematrices allofsize N × N withrealelementsundernormalmatrixadditionandmultiplication.Herethemultiplicative identityelementisthe identitymatrix ofsize N × N ,andtheadditive identityelementisthesamesizesquarematrixwithallzeroelements.
Ringsarepowerfulstructuresthatcanleadtoverysubstantialcomputationalsavingsformanystatisticalmachinelearningandsignalprocessingproblems.Forexample,ifweremovetheconditionthatthe additiveoperationmusthaveinverses,thenwehaveapairofmonoids thataredistributive.Thisstructureisknownasa semiring or semifield anditturnsoutthattheexistenceofthisstructureinmanymachine learningandsignalprocessingproblemsmakestheseotherwisecomputationallyintractableproblemsfeasible.Forexample,theclassical Viterbi algorithm fordeterminingthemostlikelysequenceofhiddenstatesina HiddenMarkovModel (HMM)isanapplicationofthe max-sumsemifield onthe dynamicBayesiannetwork thatdefinesthestochasticdependenciesinthemodel.
BothDummitandFoote(2004)andRotman(2000)containdetailed introductionstoabstractalgebraincludinggroupsandrings.
Fig.1.3:Thetableforthesymmetrygroup V4 =(◦, {e,h,v,r}) (top), andthegroup M8 =(×8, {1, 3, 5, 7}) (bottom),showingtheisomorphism betweenthemobtainedbymapping e → 1, h → 3, v → 5 and r → 7.
Fig.1.4:Metric2Dcircles d (x, 0)= c forvariousdistancemetrics.From toptobottom,Euclideandistance, absolutedistance,thedistance d (x, y)= D i=1 |xi yi|0 3 0 3 1 , andtheMahalanobisdistancefor Σ11 =Σ22 =1 0, Σ12 =Σ21 = 0 5. Thecontoursare c =1 (redlines) and c =0 5 (bluelines).
1.2Metrics
Distanceisafundamentalconceptinmathematics.Distancefunctions playakeyroleinmachinelearningandsignalprocessing,particularly asmeasuresofsimilaritybetweenobjects,forexample,digitalsignals encodedasitemsofaset.Wewillalsoseethatastatisticalmodeloften impliestheuseofaparticularmeasureofdistance,andthismeasure determinesthepropertiesofstatisticalinferencesthatcanbemade.
A geometry isobtainedbyattachinganotionofdistancetoaset:it becomesa metricspace.A metric takestwopointsinthesetandreturns asingle(usuallyreal)valuerepresentingthedistancebetweenthem.A metricmusthavethefollowingpropertiestosatisfyintuitivenotionsof distance:
(1) Non-negativity: d (x,y) ≥ 0, (2) Symmetry: d (x,y)= d (y,x) , (3) Coincidence: d (x,x)=0,and (4) Triangleinequality: d (x,z) ≤ d (x,y)+ d (y,z).
Respectively,theserequirementsarethat(1)distancecannotbenegative,(2)thedistancegoingfrom x to y isthesameasthatfrom y to x,(3)onlypointslyingontopofeachotherhavezerodistancebetween them,and(4)thelengthofanyonesideofatriangledefinedbythree pointscannotbegreaterthanthesumofthelengthoftheothertwo sides.Forexample,the Euclideanmetric ona D-dimensionalsetis:
Thisrepresentsthenotionofdistancethatweexperienceineveryday geometry.Thedefiningpropertiesofdistanceleadtoavastrangeof possiblegeometries,forexample,the city-blockgeometry isdefinedby theabsolutedistancemetric: d (x, y)= D i=1 |xi yi| (1.2)
City-blockdistanceissonamedbecauseitmeasuresdistancesona gridparalleltotheco-ordinateaxes.Distanceneednottakeonreal values,forexample,the discretemetric isdefinedas d (x,y)=0 if x = y and d (x,y)=1 otherwise.Anotherveryimportantmetricisthe Mahalanobisdistance: d (x, y)= (x y)T Σ 1 (x y) (1.3)
Thisdistancemaynotbeaxis-aligned:itcorrespondstothefinding theEuclideandistanceafterapplyinganarbitrarystretchorcompression alongeachaxisfollowedbyanarbitrary D-dimensionalrotation(indeed if Σ = I,theidentitymatrix,thisisidenticaltotheEuclideandistance).
Figure1.4showsplotsof2D circles d (x, 0)= c forvariousmetrics,in particular c =1 whichisknownasthe unitcircle inthatmetricspace. Forfurtherreading,metricspacesareintroducedinSutherland(2009) inthecontextofrealanalysisandtopology.
1.3Vectorspaces
A space isjustthenamegiventoasetendowedwithsomeadditional mathematicalstructure.A(real) vectorspace isthekeystructureof linearalgebrathatisacentraltopicinmostofclassicalsignalprocessing—alldigitalsignalsarevectors,forexample.Thedefinition ofa(finite)vectorspacebeginswithanorderedset(oftenwrittenas acolumn)of N realnumberscalleda vector,andasinglerealnumbercalleda scalar.Tothatvectorweattachtheadditionoperation whichisbothassociativeandcommutative,thatsimplyaddseverycorrespondingelementofthenumbersineachvectortogether,writtenas v + u.Theidentityforthisoperationisthevectorwith N zeros, 0.
Additionally,wedefineascalarmultiplicationoperationthatmultiplieseachelementofthevectorwithascalar, λ.Usingthescalar multiplicationby λ = 1,wecanthenforminversesofanyvector. Scalarmultiplicationshouldnotmatterinwhichordertwoscalarmultiplicationsoccur,e.g. λ (µv)=(λµ) v =(µλ) v = µ (λv).We alsorequirethatscalarmultiplicationdistributesovervectoraddition, λ (v + u)= λv + λu,andscalaradditiondistributesoverscalarmultiplication, (λ + µ) v = λv + µv.
Everyvectorspacehasatleastone basis forthespace:thisisasetof linearlyindependentvectors,suchthateveryvectorinthevectorspace canbewrittenasauniquelinearcombinationofthesebasisvectors (Figure1.5).Sinceourvectorshave N entries,therearealways N vectorsinthebasis.Thus, N isthe dimension ofthevectorspace. Thesimplestbasisistheso-called standardbasis,consistingofthe N vectors e1 =(1, 0,..., 0)T , e2 =(0, 1,..., 0)T etc.Itiseasytoseethat avector v =(v1,v2,...,vN )T canbeexpressedintermsofthisbasisas v = v1e1 + v2e2 + ··· + vN eN
Byattachinga norm toavectorspace(seebelow),wecanmeasure thelengthofanyvector,thevectorspaceisthenreferredtoasa normed space.Tosatisfyintuitivenotionsoflength,anorm V (u) musthave thefollowingproperties:
(1) Non-negativity: V (u) ≥ 0,
(2) Positivescalability: V (αu)= |α| V (u),
(3) Separation:If V (u)=0 then u = 0,and
(4) Triangleinequality: V (u + v) ≤ V (u)+ V (v).
Often,thenotation u isused.ProbablymostfamiliaristheEuclideannorm u 2 = N i=1 u2 i ,butanothernormthatgetsheavyuse instatisticalmachinelearningisthe Lp-norm:
Fig.1.5:Importantconceptsin2D vectorspaces.Thestandardbasis (e1, e2) isalignedwiththeaxes.Two othervectors (v1, v2) canalsobeused asabasisforthespace,allthatisrequiredistheyarelinearlyindependentofeachother(inthe2Dcase, theyarenotsimplyscalarmultiples ofeachother).Then x canberepresentedineitherbasis.Thedot productbetweentwovectorsisproportionaltothecosineoftheanglebetweenthem: cos(θ) ∝ v1, v2 . Because (e1, e2) areatmutualright angles,theyhavezerodotproduct andthereforethebasisisorthogonal.Additionally,itisanorthonormalbasisbecausetheyareunitnorm (length) ( e1 = e2 =1).Theother basisvectorsareneitherorthogonal norunitnorm.
ofwhichtheEuclidean(p =2)andcity-block(p =1)normsarespecial cases.Alsoofimportanceisthemax-norm u ∞ =maxi=1...N |ui|, whichisjustthelengthofthelargestco-ordinate.
Thereareseveralwaysinwhichaproductofvectorscanbeformed. Themostimportantinourcontextisthe innerproduct betweentwo vectors:
Thisissometimesalsodescribedasthe dotproduct u v.Forcomplex vectors,thisisdefinedas:
where a isthecomplexconjugateof a ∈ C. Wewillseelaterthatthedotproductplaysacentralroleinthestatisticalnotionofcorrelation.Whentwovectorshavezeroinnerproduct,theyaresaidtobe orthogonal;geometricallytheymeetatarightangle.Thisalsohasastatisticalinterpretation:forcertainrandomvariables,orthogonalityimpliesstatisticalindependence.Thus,orthogonalityleadstosignificantsimplificationsincommoncalculationsinclassical DSP.
Aspecialandveryusefulkindofbasisisan orthogonalbasis where theinnerproductbetweeneverypairofdistinctbasisvectorsiszero:
(1.7)
Inaddition,thebasisis orthonormal ifeverybasisvectorhasunit norm vi =1 –thestandardbasisisorthonormal,forexample(Figure 1.5).Orthogonality/orthonormalitydramaticallysimplifiesmanycalculationsovervectorspaces,partlybecauseitisstraightforwardtofind the N scalarcoefficients ai ofanarbitraryvector u inthisbasisusing theinnerproduct:
(1.8) whichsimplifiesto ai = u, vi intheorthonormalcase.Orthonormal basesarethebackboneofmanymethodsinDSPandmachinelearning. WecanexpresstheEuclideannormusingtheinnerproduct: u 2 = √u u.Aninnerproductsatisfiesthefollowingproperties:
(1) Non-negativity: u v ≥ 0, (2) Symmetry: u v = v u, and
(3) Linearity: (αu) · v = α (u · v).
Thereisanintuitiveconnectionbetweendistanceandlength:assuming thatthemetricishomogeneous d (αu,αv)= |α| d (u, v) andtranslationinvariant d (u, v)= d (u + a, v + a),anormcanbedefinedasthe distancetotheorigin u = d (0, u).Acommonlyoccurringexample ofthistheso-called squared L2 weightednorm u 2 A =uT Au whichis justthesquaredMahalanobisdistance d (0, u)2 discussedearlierwith Σ 1 = A.
Ontheotherhand,thereisonesenseinwhicheverynorm induces an associatedmetricwiththeconstruction d (u, v)= u v .ThisconstructionenjoysextensiveuseinmachinelearningandstatisticalDSPto quantifythe“discrepancy”or“error”betweentwosignals.Infact,since normsare convex (discussedlater),itfollowsthatmetricsconstructed thiswayfromnorms,arealsoconvex,afactofcrucialimportancein practice.
Afinalproductwewillhaveneedforinlaterchaptersisthe elementwiseproduct w = u ◦ v whichisobtainedbymultiplyingeachelement inthevectortogether wn = unvn
Linearoperators
Alinearoperatoror map actsonvectorstocreateothervectors,and whiledoingso,preservetheoperationsofvectoradditionandscalar multiplication.Theyarehomomorphismsbetweenvectorspaces.Linearoperatorsarefundamentaltoclassicaldigitalsignalprocessingand statistics,andsofindheavyuseinmachinelearning.Linearoperators L havethe linearcombination property:
Whatthissaysisthattheoperatorcommuteswithscalarmultiplicationandvectoraddition:wegetthesameresultifwefirstscale,then addthevectors,andthenapplytheoperatortotheresult,or,applythe operatortoeachvector,thenscalethem,andthemadduptheresults (Figure1.6).
Matrices(whichwediscussnext),differentiationandintegration,and expectationinprobabilityareallexamplesoflinearoperators.The linearityofintegrationanddifferentiationarestandardruleswhichcan bederivedfromthebasicdefinitions.Linearmapsintwo-dimensional spacehaveanicegeometricinterpretation:straightlinesinthevector spacearemappedontootherstraightlines(orontoapointiftheyare degenerate maps).Thisideaextendstohigherdimensionalvectorspaces inthenaturalway.
Matrixalgebra
Whenvectorsare‘stackedtogether’theyformapowerfulstructure
Fig.1.6:A‘flowdiagram’depicting linearoperators.Alllinearoperatorssharethepropertythattheoperator L appliedtothescaledsum of(twoormore)vectors α1u1 + α2u2 (toppanel),isthesameasthescaled sumofthesameoperatorappliedto eachofthesevectorsfirst(bottom panel).Inotherwords,itdoesnot matterwhethertheoperatorisappliedbeforeorafterthescaledsum.
Fig.1.7:Adepictionofthegeometriceffectofinvertibleandnoninvertiblesquarematrices.Theinvertiblesquarematrix A mapsthe triangleatthebottomtothe‘thinner’triangle(forexample,bytransformingthevectorforeachvertex). Itscalestheareaofthetriangleby thedeterminant |A| =0.However, thenon-invertiblesquarematrix B collapsesthetriangleontoasingle pointwithnoareabecause |B| =0. Therefore, A 1 iswell-defined,but B 1 isnot.
whichisacentraltopicofmuchofsignalprocessing,statisticsandmachinelearning: matrixalgebra.A matrix isa‘rectangular’arrayof N × M elements,forexample,the 3 × 2 matrix A is:
Thiscanbeseentobetwolengththreevectorsstackedside-by-side. Theelementsofamatrixareoftenwrittenusingthesubscriptnotation aij where i =1, 2,...,N and j =1, 2,...,M .Matrixadditionoftwo matrices,iscommutative: C = A + B = B+A,wheretheadditionis element-by-elementi.e. cij = aij + bij .
Aswithvectors,therearemanypossiblewaysinwhichmatrixmultiplicationcouldbedefined:theonemostcommonlyencounteredisthe row-by-columninnerproduct.Fortwomatrices A ofsize N × M and B ofsize M × P ,theproduct C = A × B isanewmatrixofsize N × P definedas:
Thiscanbeseentobethematrixofallpossibleinnerproductsofeach rowof A byeachcolumnof B.Notethatthenumberofcolumnsofthe lefthandmatrixmustmatchthenumberofrowsoftherighthandone. Matrixmultiplicationisassociative,itdistributesovermatrixaddition, anditiscompatiblewithscalarmultiplication: αA = B simplygivesthe newmatrixwithentries bij = αaij ,i.e.itisjustcolumnwiseapplication ofvectorscalarmultiplication.Matrixmultiplicationis,however, noncommutative:itisnottrueingeneralthat A × B givesthesameresult as B × A
Ausefulmatrixoperatoristhe transpose thatswapsrowswith columns;if A isan N × M matrixthen AT = B isthe M × N matrix bji = aij .Someofthepropertiesofthetransposeare:itisself-inverse AT T = A;respectsaddition (A + B)T = AT + BT ;anditreverses theorderoffactorsinmultiplication (AB)T = BT AT
Squareandinvertiblematrices
Sofar,wehavenotdiscussedhowtosolvematrixequations.Thecase ofadditioniseasybecausewecanusescalarmultiplicationtoformthe negativeofamatrix,i.e.given C = A + B,finding B requiresusto calculate B = C A = C +( 1) A.Inthecaseofmultiplication,we needtofindthe“reciprocal”ofamatrix,e.g.tosolve C = AB for B we wouldnaturallycalculate A 1C = A 1AB = B bytheusualalgebraic rules.However,thingsbecomemorecomplicatedbecause A 1 doesnot existingeneral.Wewilldiscusstheconditionsunderwhichamatrix doeshaveamultiplicativeinversenext.
All squarematrices ofsize N × N canbesummedormultipliedto-
getherinanyorder.Asquarematrix A withallzeroelementsexcept forthe maindiagonal,i.e. aij =0 unless i = j iscalleda diagonalmatrix.Aspecialdiagonalmatrix, I,isthe identitymatrix wherethemain diagonalentries aii =1.Then,iftheequality AB = I = BA holds,the The N × N identitymatrixisdenoted IN orsimply I whenthe contextisclearandthesizecan beomitted.
matrix B mustbewell-defined(andunique)anditistheinverseof A, i.e. B = A 1.Wethensaythat A is invertible; ifitisnotinvertible thenitis degenerate or singular. Therearemanyequivalentconditionsformatrixinvertibility,forexample,theonlysolutiontotheequation Ax = 0 isthevector x = 0 orthecolumnsof A arelinearlyindependent.Butoneparticularly importantwaytotesttheinvertibilityofamatrixistocalculatethe determinant |A|:ifthematrixissingular,thedeterminantiszero.It followsthatallinvertiblematriceshave |A| =0.Thedeterminantcalculationisquiteelaborateforageneralsquarematrix,formulasexistbut geometricintuitionhelpstounderstandthesecalculations:whenalinear mapdefinedbyamatrixactsonageometricobjectinvectorspacewith acertainvolume,thedeterminantisthe scalingfactor ofthemapping. Volumesundertheactionofthemaparescaledbythemagnitudeof thedeterminant.Ifthedeterminantisnegative,the orientation ofany geometricobjectisreversed.Therefore,invertibletransformationsare thosethatdonotcollapsethevolumeofanyobjectinthevectorspace tozero(Figure1.7).
Anothermatrixoperatorwhichfindssignificantuseisthe tracetr (A) ofasquarematrix:thisisjustthesumofthediagonals,e.g.tr (A)= N i=1 aii.Thetraceisinvarianttoadditiontr (A + B)= tr (A)+tr (B), transposetr AT = tr (A) andmultiplicationtr (AB)= tr (BA).With productsofthreeormorematricesthetraceisinvarianttocyclicpermutations,withthreematrices:tr (ABC)= tr (CAB)= tr (BCA)
Eigenvaluesandeigenvectors
Aubiquitouscomputationthatarisesinconnectionwithalgebraicproblemsinvectorspacesisthe eigenvalueproblem forthegiven N ×N square matrix A:
Anynon-zero N × 1 vector v whichsolvesthisequationisknownas an eigenvector of A,andthescalarvalue λ isknownastheassociated eigenvalue.Eigenvectorsarenotunique:theycanbemultipliedbyany non-zeroscalarandstillremaineigenvectorswiththesameeigenvalues. Thus,oftentheunitlengtheigenvectorsaresoughtasthesolutionsto (1.12).
Itshouldbenotedthat(1.12)arisesforvectorspacesingenerale.g. linearoperators.Animportantexampleoccursinthevectorspaceof functions f (x) withdifferentialtheoperator L = d dx .Here,thecorrespondingeigenvalueproblemisthedifferentialequation L [f (x)]= λf (x),forwhichthesolutionis f (x)= aeλx forany(non-zero)scalar value a.Thisisknownasan eigenfunction ofthedifferentialoperator
Fig.1.8:Anexampleofdiagonalizingamatrix.Thediagonalizable squarematrix A hasdiagonalmatrix D containingtheeigenvalues,and transformationmatrix P containing theeigenbasis,so A = PDP 1 . A mapstherotatedsquare(top),to therectangleinthesameorientation (atleft).Thisisequivalenttofirst ‘unrotating’thesquare(theeffect of P 1)suchthatitisalignedwith theco-ordinateaxes,thenstretching/compressingthesquarealong eachaxis(theeffectof D),andfinally rotatingbacktotheoriginalorientation(theeffectof P).
Av = λv (1.12)
LIftheyexist,theeigenvectorsandeigenvaluesofasquarematrix A can befoundbyobtainingallscalarvalues λ suchthat|(A λI)| =0.This holdsbecause Av λv = 0 ifandonlyif |(A λI)| =0.Expandingout thisdeterminantequationleadstoan N -thorderpolynomialequation in λ,namely aN λN + aN 1λN 1 + ··· + a0 =0,andtherootsofthis equationaretheeigenvalues.
Thispolynomialisknownasthe characteristicpolynomial for A and determinestheexistenceofasetofeigenvectorsthatisalsoabasisfor thespace,inthefollowingway.Thefundamentaltheoremofalgebra statesthatthispolynomialhasexactly N roots,butsomemaybe repeated (i.e.occurmorethanonce).Iftherearenorepeatedrootsofthe characteristicpolynomial,thentheeigenvaluesarealldistinct,andso thereare N eigenvectorswhicharealllinearlyindependent.Thismeans thattheyformabasisforthevectorspace,whichisthe eigenbasis for thematrix.
Notallmatriceshaveaneigenbasis.Howevermatricesthatdoare also diagonalizable,thatis,theyhavethesamegeometriceffectasa diagonalmatrix,butinadifferentbasisotherthanthestandardone. Thisbasiscanbefoundbysolvingtheeigenvalueproblem.Placingall theeigenvectorsintothecolumnsofamatrix P andallthecorresponding eigenvaluesintoadiagonalmatrix D,thenthematrixcanberewritten:
SeeFigure1.8.Adiagonalmatrixsimplyscalesallthecoordinatesof thespacebyadifferent,fixedamount.Theyareverysimpletodealwith, andhaveimportantapplicationsinsignalprocessingandmachinelearning.Forexample,theGaussiandistributionovermultiplevariables,one ofthemostimportantdistributionsinpracticalapplications,encodes theprobabilisticrelationshipbetweeneachvariableintheproblemwith the covariancematrix.Bydiagonalizingthismatrix,onecanfindalinearmappingwhichmakesallthevariablesstatisticallyindependentof eachother:thisdramaticallysimplifiesmanysubsequentcalculations. Despitethecentralimportanceoftheeigenvectorsandeigenvaluesa linearproblem,itisgenerallynotpossibletofindalltheeigenvaluesby analyticalcalculation.Thereforeonegenerallyturnstoiterativenumericalalgorithmstoobtainananswertoacertainprecision.
Specialmatrices
Beyondwhathasbeenalreadydiscussed,thereisnotthatmuchmoreto besaidaboutgeneralmatriceswhichhave N × M degreesoffreedom. Specialmatriceswithfewerdegreesoffreedomhaveveryinteresting propertiesandoccurfrequentlyinpractice.
Someofthemostinterestingspecialmatricesare symmetricmatrices withrealentries –self-transpose andsosquarebydefinition,i.e. AT = A.Thesematricesarealwaysdiagonalizable,andhaveanorthogonal eigenbasis.Theeigenvaluesarealwaysreal.Iftheinverseexists,itis
A = PDP 1 (1.13)
alsosymmetric.Asymmetricmatrixhas 1 2 N (N +1) uniqueentries,on theorderofhalfthe N 2 entriesofanarbitrarysquarematrix.
Positive-definitematrices areaspecialkindofsymmetricmatrixfor which vT Av > 0 foranynon-zerovector v.Alltheeigenvaluesare positive.Takeanyreal,invertiblematrix B (sothat Bv = 0 forall such v)andlet A = BT B,then vT BT B
making A positive-definite.Aswillbedescribedinthenextsection, thesekindsofmatricesareveryimportantinmachinelearningandsignal processingbecausethecovariancematrixofasetofrandomvariablesis positive-definiteforexactlythisreason.
Orthonormalmatrices haveallcolumnswhicharevectorsthatform anorthonormalbasisforthespace.Thedeterminantofthesematrices iseither +1 or 1.Likesymmetricmatricestheyarealwaysdiagonalizable,althoughtheeigenvectorsaregenerallycomplexwithmodulus 1.Anorthonormalmatrixisalwaysinvertible,theinverseisalsoorthonormalandequaltothetranspose, AT = A 1.Thesubsetwith determinant +1,correspondto rotations inthevectorspace.
For upper (lower) triangularmatrices,thediagonalandtheentries above(below)thediagonalarenon-zero,therestzero.Thesematrices oftenoccurwhensolvingmatrixproblemssuchas Av = b,becausethe matrixequation Lv = b issimpletosolveby forwardsubstitution if L islower-triangular.Forwardsubstitutionisastraightforwardsequential procedurewhichfirstobtains v1 intermsof b1 and l11,then v2 interms of b1, l21 and l22 etc.Thesameholdsforuppertriangularmatricesand backwardsubstitution.Becauseofthesimplicityofthesesubstitution procedures,thereexistmethodsfordecomposingamatrixintoaproduct ofupperorlowertriangularmatricesandacompanionmatrix.
Toeplitzmatrices arematriceswith 2N 1 degreesoffreedomthat haveconstantdiagonals,thatis,theelementsof A haveentries aij = ci j .All discreteconvolutions canberepresentedasToeplitzmatrices, andaswewilldiscusslater,thismakesthemoffundamentalimportance inDSP.Becauseofthereduceddegreesoffreedomandspecialstructure ofthematrix,aToeplitzmatrixproblem Ax = b iscomputationally easiertosolvethanageneralmatrixproblem:amethodknownasthe Levinsonrecursiondramaticallyreducesthenumberofarithmeticoperationsneeded.
Circulantmatrices areToeplitzmatriceswhereeachrowisobtained fromtherowabovebyrotatingitoneelementtotheright.Withonly N degreesoffreedomtheyarehighlystructuredandcanbeunderstood asdiscretecircularconvolutions.Theeigenbasiswhichdiagonalizesthe matrixisthe discreteFourierbasis whichisoneofthecornerstonesof classicalDSP.Itfollowsthatanycirculantmatrixproblemcanbevery efficientlysolvedusingthe fastFouriertransform (FFT).
DummitandFoote(2004)containsanin-depthexpositionofvector spacesfromanabstractpointofview,whereasKayeandWilson(1998) isanaccessibleandmoreconcreteintroduction.