Instant ebooks textbook Machine learning for signal processing: data science, algorithms, and comput by Education Libraries

Visit to download the full and correct content document: https://ebookmass.com/product/machine-learning-for-signal-processing-data-sciencealgorithms-and-computational-statistics-max-a-little/

More products digital (pdf, epub, mobi) instant download maybe you interests ...

Machine Learning Algorithms for Signal and Image Processing Suman Lata Tripathi

https://ebookmass.com/product/machine-learning-algorithms-forsignal-and-image-processing-suman-lata-tripathi/

Introduction to algorithms for data mining and machine learning Yang

https://ebookmass.com/product/introduction-to-algorithms-fordata-mining-and-machine-learning-yang/

Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms,

https://ebookmass.com/product/fundamentals-of-machine-learningfor-predictive-data-analytics-algorithms/

Introduction to Algorithms for Data Mining and Machine Learning Xin-She Yang

https://ebookmass.com/product/introduction-to-algorithms-fordata-mining-and-machine-learning-xin-she-yang/

Machine Learning for Biometrics: Concepts, Algorithms and Applications (Cognitive Data Science in Sustainable Computing) Partha Pratim Sarangi

https://ebookmass.com/product/machine-learning-for-biometricsconcepts-algorithms-and-applications-cognitive-data-science-insustainable-computing-partha-pratim-sarangi/

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services Dr. Shitalkumar R. Sukhdeve

https://ebookmass.com/product/google-cloud-platform-for-datascience-a-crash-course-on-big-data-machine-learning-and-dataanalytics-services-dr-shitalkumar-r-sukhdeve/

Data Science With Rust: A Comprehensive Guide - Data Analysis, Machine Learning, Data Visualization & More Van Der Post

https://ebookmass.com/product/data-science-with-rust-acomprehensive-guide-data-analysis-machine-learning-datavisualization-more-van-der-post/

Machine Learning for Planetary Science Joern Helbert

https://ebookmass.com/product/machine-learning-for-planetaryscience-joern-helbert/

Signal processing for multistatic radar systems: adaptive waveform selection, optimal geometries and pseudolinear tracking algorithms Do■ançay

https://ebookmass.com/product/signal-processing-for-multistaticradar-systems-adaptive-waveform-selection-optimal-geometries-andpseudolinear-tracking-algorithms-dogancay/

MachineLearningforSignal Processing

GreatClarendonStreet,Oxford,OX26DP, UnitedKingdom

OxfordUniversityPressisadepartmentoftheUniversityofOxford. ItfurtherstheUniversity’sobjectiveofexcellenceinresearch,scholarship, andeducationbypublishingworldwide.Oxfordisaregisteredtrademarkof OxfordUniversityPressintheUKandincertainothercountries

©MaxA.Little2019

Themoralrightsoftheauthorhavebeenasserted

FirstEditionpublishedin2019

Impression:1

Allrightsreserved.Nopartofthispublicationmaybereproduced,storedin aretrievalsystem,ortransmitted,inanyformorbyanymeans,withoutthe priorpermissioninwritingofOxfordUniversityPress,orasexpresslypermitted bylaw,bylicenceorundertermsagreedwiththeappropriatereprographics rightsorganization.Enquiriesconcerningreproductionoutsidethescopeofthe aboveshouldbesenttotheRightsDepartment,OxfordUniversityPress,atthe addressabove

Youmustnotcirculatethisworkinanyotherform andyoumustimposethissameconditiononanyacquirer

PublishedintheUnitedStatesofAmericabyOxfordUniversityPress 198MadisonAvenue,NewYork,NY10016,UnitedStatesofAmerica

BritishLibraryCataloguinginPublicationData

Dataavailable

LibraryofCongressControlNumber:2019944777

ISBN978–0–19–871493–4 DOI:10.1093/oso/9780198714934.001.0001

Printedandboundby CPIGroup(UK)Ltd,Croydon,CR04YY

Preface

Digitalsignalprocessing(DSP)isoneofthe‘foundational’butsomewhatinvisible,engineeringtopicsofthemodernworld,withoutwhich, manyofthetechnologieswetakeforgranted:thedigitaltelephone, digitalradio,television,CDandMP3players,WiFi,radar,toname justafew,wouldnotbepossible.Arelativenewcomerbycomparison, statisticalmachinelearningisthetheoreticalbackboneofexcitingtechnologiesthatarebynowstartingtoreachalevelofubiquity,suchas automatictechniquesforcarregistrationplaterecognition,speechrecognition,stockmarketprediction,defectdetectiononassemblylines,robot guidanceandautonomouscarnavigation.Statisticalmachinelearning hasoriginsintherecentmergingofclassicalprobabilityandstatistics withartiﬁcialintelligence,whichexploitstheanalogybetweenintelligent informationprocessinginbiologicalbrainsandsophisticatedstatistical modellingandinference.

DSPandstatisticalmachinelearningareofsuchwideimportanceto theknowledgeeconomythatbothhaveundergonerapidchangesand seenradicalimprovementsinscopeandapplicability.BothDSPand statisticalmachinelearningmakeuseofkeytopicsinappliedmathematicssuchasprobabilityandstatistics,algebra,calculus,graphsand networks.Therefore,intimateformallinksbetweenthetwosubjects existandbecauseofthis,anemergingconsensusviewisthatDSPand statisticalmachinelearningshouldnotbeseenasseparatesubjects.The manyoverlapsthatexistbetweenthetwosubjectscanbeexploitedto producenewdigitalsignalprocessingtoolsofsurprisingutilityandeﬃciency,andwideapplicability,highlysuitedtothecontemporaryworld ofpervasivedigitalsensorsandhigh-poweredandyetcheap,computinghardware.Thisbookgivesasolidmathematicalfoundationtothe topicofstatisticalmachinelearningforsignalprocessing,includingthe contemporaryconceptsofthe probabilisticgraphicalmodel (PGM)and nonparametricBayes,conceptswhichhaveonlymorerecentlyemerged asimportantforsolvingDSPproblems.

Thebookisaimedatadvancedundergraduatesorfirst-yearPhDstudentsaswellasresearchersandpractitioners.Itaddressesthefoundationalmathematicalconcepts,illustratedwithpertinentandpractical examplesacrossarangeofproblemsinengineeringandscience.Theaim istoenablestudentswithanundergraduatebackgroundinmathematics,statisticsorphysics,fromawiderangeofquantitativedisciplines, togetquicklyuptospeedwiththelatesttechniquesandconceptsin thisfast-movingfield.Theaccompanyingsoftwarewillenablereaders totestoutthetechniquestotheirownsignalanalysisproblems.The presentationofthemathematicsismuchalongthelinesofastandard undergraduatephysicsorstatisticstextbook,freeofdistractingtechnicalcomplexitiesandjargon,whilenotsacrificingrigour.Itwouldbean excellenttextbookforemergingcoursesinmachinelearningforsignals.

4Statisticalmodellingandinference93

4.1Statisticalmodels

4.2Optimalprobabilityinferences

4.4Distributionsassociatedwithmetricsandnorms

4.5Theexponentialfamily(EF)

4.6Distributionsdeﬁnedthroughquantiles

4.7Densitiesassociatedwithpiecewiselinearlossfunctions

5Probabilisticgraphicalmodels133

5.1StatisticalmodellingwithPGMs

5.3InferenceonPGMs

7.3LTIsignalprocessing

7.4Exploitingstatisticalstabilityforlinear-GaussianDSP

Discrete-timeGaussianprocesses(GPs)andDSP

7.5TheKalmanﬁlter(KF)

7.6Time-varyinglinearsystems

8Discretesignals:sampling,quantizationandcoding265

8.1Discrete-timesampling

8.2Quantization

ListofAlgorithms

2.1Iterativelyreweightedleastsquares(IRLS)......... 52

2.2Gradientdescent........................ 53

2.3Backtrackinglinesearch.................... 54

2.4Goldensectionsearch..................... 55

2.5Newton’smethod....................... 56

2.6Thesubgradientmethod................... 61

2.7Aprimal-dualinterior-pointmethodforlinearprogramming(LP)........................... 64

2.8Greedysearchfordiscreteoptimization........... 67

2.9(Simple)tabusearch...................... 68

2.10Simulatedannealing...................... 69

2.11Randomrestarting....................... 70

3.1Newton’smethodfornumericalinversetransformsampling. 73

3.2Adaptiverejectionsampling(ARS)forlog-concavedensities.............................. 76

3.3Sequentialsearchinversion(simpliﬁedversion)....... 80

3.4Binary(subdivision)searchsampling............ 81

3.5Gibb’sMarkov-ChainMonteCarlo(MCMC)sampling... 83

3.6Markov-ChainMonteCarlo(MCMC)Metropolis-Hastings (MH)sampling......................... 86

5.1Thejunctiontreealgorithmfor(semi-ring)marginalizationinferenceofasinglevariableclique........... 142

5.2Iterativeconditionalmodes(ICM).............. 144

5.3 Expectation-maximization (E-M)............... 146

6.1Expectation-maximization(E-M)forgenerali.i.d.mixturemodels........................... 153

6.2The K-meansalgorithm.................... 172

7.1Recursive,Cooley-Tukey,radix-2,decimationintime,fast Fouriertransform(FFT)algorithm............. 210

7.2Overlap-addFIRconvolution................. 224

8.1TheLloyd-Maxalgorithmforﬁxed-ratequantizerdesign. 279

8.2The K-meansalgorithmforﬁxed-ratequantizerdesign.. 280

8.3Iterativevariable-rateentropy-constrainedquantizerdesign.281

9.1 Baum-Welch expectation-maximization(E-M)forhidden Markovmodels(HMMs)................... 308

9.2 Viterbitraining forhiddenMarkovmodels(HMMs).... 310

10.1Gaussianprocess(GP) informativevectormachine (IVM) regression............................ 325

10.2Dirichletprocessmixturemodel(DPMM)Gibbssampler. 337

10.3Dirichletprocessmeans(DP-means)algorithm....... 340

10.4Maximuma-posterioriDirichletprocessmixturecollapsed (MAP-DP)algorithmforconjugateexponentialfamily distributions.......................... 342

ListofFigures

1.1Mappingbetweenabstractgroups

2.1Lagrangemultipliersinconstrainedoptimization

2.7ConvergenceofNewton’smethodwithandwithoutline search

2.8Piecewiselinearobjectivefunction(1D)

4.1Overﬁttingandunderﬁtting:polynomialmodels

4.2Minimumdescriptionlength(MDL)

4.5ConvergencepropertiesofMetropolis-Hastings(MH)sampling

5.1Simple5-nodeprobabilisticgraphicalmodel(PGM)

5.2Notationforrepeatedconnectivityingraphicalmodels

5.3Somecomplexgraphicalmodels

5.4Threebasicprobabilisticgraphicalmodelpatterns

6.1GaussianmixturemodelforGammarayintensitystrati-

6.2Convergenceofexpectation-maximization(E-M)forthe

6.3Classiﬁerdecisionboundaries

6.14Principalcomponentsanalysis(PCA):traﬃccountsignals

6.16Locallylinearembedding(LLE):chirpsignals

7.8Computationalcomplexityoflong-durationFIRimplementations

7.9Impulseresponseandtransferfunctionofdiscretizedkernelregression

7.11Variousnonparametricpowerspectraldensityestimators 234

7.12Linearpredictivepowerspectraldensityestimators 236

7.13Regularizedlinearpredictionanalysismodelselection 238

7.14Wienerﬁlteringofhumanwalkingsignals 239

7.15Sinusoidalsubspaceprincipalcomponentsanalysis(PCA) ﬁltering

7.16SubspaceMUSICfrequencyestimation

7.17Typical2DKalmanﬁltertrajectories 243

7.18KalmanﬁltersmoothingbyTikhonovregularization 252

7.19Short-timeFouriertransformanalysis 254

7.20Time-frequencytiling:Fourierversuswaveletanalysis 256

7.21Aselectionofwaveletfunctions 261

7.22VanishingmomentsoftheDaubechieswavelet 262

7.23Discretewavelettransformwaveletshrinkage 263

8.1DigitalMEMSsensors

8.2Discrete-timesamplinghardwareblockdiagram

8.3Non-uniformanduniformandbandlimitedsampling 268

8.4Shannon-Whittakeruniformsamplinginthefrequency domain 269

8.5DigitalGammarayintensitysignalfromadrillhole 271

8.6QuadraticB-splineuniformsampling 272

8.7Rate-distortioncurvesforani.i.d.Gaussiansource 277

8.8Lloyd-Maxscalarquantization 280

8.9Entropy-constrainedscalarquantization 282

8.10Shannon-Whittakerreconstructionofdensityfunctions 283

8.11Quantizationanddithering 285

8.12Scalarvesusvectorquantization 286

8.13Compandingquantization 287

8.14Linearpredictivecodingfordatacompression 289

8.15Transformcoding:bitcountversuscoeﬃcientvariance 291

8.16Transformcodingofmusicalaudiosignals 292

8.17Compressivesensing:discretecosinetransformbasis 295

8.18Randomdemodulationcompressivesensing 297

9.1Quantilerunningﬁlterforexoplanetarylightcurves 301

9.2First-orderlog-normalrecursiveﬁlter:dailyrainfallsignals 302

9.3Totalvariationdenoising(TVD)ofpowerspectraldensity timeseries 303

9.4Bilateralﬁlteringofhumanwalkingdata 304

9.5HiddenMarkovmodelling(HMM)ofpowerspectraltime seriesdata 306

9.6Cepstralanalysisofvoicesignals 312

10.1Inﬁniteexchangeabilityinprobabilisticgraphicalmodel form 315

10.2Linkingkernelandregularizedbasisfunctionregression 319

10.3Randomfunctiondrawsfromthelinearbasisregression model

10.4Gaussianprocess(GP)functiondrawswithvariouscovariancefunctions

10.6Gaussianprocess(GP)regressionforclassiﬁcation

10.7DrawsfromtheDirichletdistributionfor K =3

10.8TheDirichletprocess(DP)

10.10ComparingpartitionsizesofDirichletversusPitman-Yor processes

10.11Parametricversusnonparametricestimationoflinearpredictivecoding(LPC)coeﬃcients

10.12DP-meansforsignalnoisereduction:exoplanetlightcurves341

10.13Probabilisticgraphicalmodel(PGM)fortheBayesian mixturemodel

10.14PerformanceofMAP-DPversusDP-meansnonparametricclustering

Mathematicalfoundations 1

Statisticalmachinelearningandsignalprocessingaretopicsinapplied mathematics,whicharebaseduponmanyabstractmathematicalconcepts.Definingtheseconceptsclearlyisthemostimportantfirststepin thisbook.Thepurposeofthischapteristointroducethesefoundational mathematicalconcepts.Italsojustifiesthestatementthatmuchofthe artofstatisticalmachinelearningasappliedtosignalprocessing,liesin thechoiceofconvenientmathematicalmodelsthathappentobeuseful inpractice.Convenientinthiscontextmeansthatthealgebraicconsequencesofthechoiceofmathematicalmodelingassumptionsareinsome sensemanageable.Theseedsofthismanageabilityaretheelementary mathematicalconceptsuponwhichthesubjectisbuilt.

1.1Abstractalgebras

Wewilltakethesimpleviewinthisbookthatmathematicsisbasedon logicappliedto sets:asetisanunorderedcollectionofobjects,often realnumbers suchastheset {π, 1,e} (whichhasthree elements),orthe setofallrealnumbers R (withaninﬁnitenumberofelements).From thismodestoriginitisaremarkablefactthatwecanbuildtheentirety ofthemathematicalmethodsweneed.Weﬁrststartbyreviewingsome elementaryprinciplesof(abstract) algebras.

Groups

An algebra isastructurethatdeﬁnestherulesofwhathappenswhen pairsofelementsofasetareacteduponby operations.Akindof algebraknownasa group (+, R) istheusualnotionofadditionwith pairsofrealnumbers.Itisagroupbecauseithasan identity,the numberzero(whenzeroisaddedtoanynumberitremainsunchanged, i.e.a+0=0+a = a,andeveryelementinthesethasan inverse (forany number a,thereisaninverse a whichmeansthat a+( a)=0.Finally, theoperationis associative,whichistosaythatwhenoperatingonthree ormorenumbers,additiondoesnotdependontheorderinwhichthe numbersareadded(i.e. a +(b + c)=(a + b)+ c).Additionalsohas theintuitivepropertythat a + b = b + a,i.e.itdoesnotmatterifthe numbersareswapped:theoperatoriscalled commutative,andthegroup isthencalledan Abeliangroup.Mirroringadditionismultiplication actingonthesetofrealnumberswithzeroremoved (×, R −{0}),which isalsoanAbeliangroup.Theidentityelementis1,andtheinverses

Fig.1.1:Illustratingabstractgroups andmappingbetweenthem.Shown arethetwocontinuousgroupsofreal numbersunderadditionis(leftcolumn)andmultiplication(rightcolumn),withidentities 0 and 1 respectively.Thehomomorphismof exponentiationmapsadditiononto multiplication(lefttorightcolumn), andtheinverse,thelogarithm,maps multiplicationbackontoaddition (righttoleftcolumn).Therefore, thesetwogroupsarehomomorphic.

arethereciprocalsofeachnumber.Multiplicationisalsoassociative, andcommutative.Notethatwecannotincludezerobecausethiswould requiretheinclusionoftheinverseofzero 1/0,whichdoesnotexist (Figure1.1).

Groupsarenaturallyassociatedwith symmetries.Forexample,the setofrigidgeometrictransformationsofarectanglethatleavetherectangleunchangedinthesameposition,formagrouptogetherwithcompositionsofthesetransformations(thereareflipsalongthehorizontal andverticalmidlines,oneclockwiserotationthrough180° aboutthe centre,andtheidentitytransformationthatdoesnothing).Thisgroup canbedenotedas V4 =(◦, {e,h,v,r}),where e istheidentity, h thehorizontalflip, v theverticalflip,and r therotation,withthecomposition operation ◦.Fortherectangle,wecanseethat h◦v = r,i.e.ahorizontal followedbyaverticalflipcorrespondstoa180° rotation(Figure1.2).

Veryoften,thefactthatweareabletomakesomeconvenientalgebraiccalculationsinstatisticalmachinelearningandsignalprocessing, canbetracedtotheexistenceofoneormoresymmetrygroupsthatarise duetothechoiceofmathematicalassumptions,andwewillencounter manyexamplesofthisphenomenainlaterchapters,whichoftenleadto significantcomputationalefficiencies.Astrikingexampleoftheconsequencesofgroupsinclassicalalgebraistheexplanationforwhythere arenosolutionsthatcanbewrittenintermsofaddition,multiplicationandroots,tothegeneralpolynomialequation N i=0 aixi =0 when N ≥ 5.Thisfacthasmanypracticalconsequences,forexample,itis possibletofindtheeigenvaluesofageneralmatrixofsize N × N using simpleanalyticalcalculationswhen N< 5 (althoughtheanalyticalcalculationsdobecomeprohibitivelycomplex),butthereisnopossibility ofusingsimilaranalyticaltechniqueswhen N ≥ 5,andonemustresort tonumericalmethods,andthesemethodssometimescannotguarantee tofindallsolutions!

Fig.1.2:Thegroupofsymmetries oftherectangle, V4 =(◦, {e,h,v,r}). Itconsistsofhorizontalandvertical ﬂips,andarotationof180° aboutthe centre.Thisgroupisisomorphicto thegroup M8 =(×8, {1, 3, 5, 7}) (see Figure1.3).

Manysimplegroupswiththesamenumberofelementsare isomorphic toeachother,thatis,thereisauniquefunctionthatmapstheelements ofonegrouptotheelementsoftheother,suchthattheoperations canbeappliedconsistentlytothemappedelements.Intuitivelythen, theidentityinonegroupismappedtothatoftheothergroup.For example,therotationgroup V4 aboveisisomorphictothegroup M8 = (×8, {1, 3, 5, 7}),where×8 indicatesmultiplicationmodulo8(thatis, takingtheremainderofthemultiplicationondivisionby8,seeFigure 1.3).

Whilsttwogroupsmightnotbeisomorphic,theyaresometimes homomorphic:thereisafunctionbetweenonegroupandtheotherthat mapseachelementintheﬁrstgrouptooneormoreelementsinthe secondgroup,butthemappingisstillconsistentunderthesecondoperation.Averyimportantexampleistheexponentialmap, exp(x),that convertsadditionoverthesetofrealnumbers,tomultiplicationover thesetofpositiverealnumbers: ea+b = eaeb;apowerfulvariantof thismapiswidelyusedinstatisticalinferencetosimplifyandstabilize calculationsinvolvingtheprobabilitiesofindependentstatisticalevents,

byconvertingthemintocalculationswiththeirassociatedinformation content.Thisisthenegativelogarithmicmap ln(x),thatconverts probabilitiesundermultiplication,intoentropiesunderaddition.This mapisverywidelyusedinstatisticalinferenceasweshallsee.

Foramoredetailedbutaccessiblebackgroundtogrouptheory,read Humphreys(1996).

Rings

Whilstgroupsdealwithoneoperationonasetofnumbers, rings area slightlymorecomplexstructurethatoftenariseswhentwooperations areappliedtothesameset.Themostimmediatelytangibleexample istheoperationsofadditionandmultiplicationonthesetofintegers Z (thepositiveandnegativewholenumberswithzero).Usingthedefinitionabove,thesetofintegersunderadditionformanAbeliangroup, whereasundermultiplicationtheintegersformasimplestructureknown asa monoid –agroupwithoutinverses.Multiplicationwiththeintegers isassociative,andthereisanidentity(thepositivenumberone),butthe multiplicativeinversesarenotintegers(theyarefractionssuchas 1/2, 1/5 etc.)Finally,incombination,integermultiplication distributes overintegeraddition: a × (b + c)= a × b + a × c =(b + c) × a.These propertiesdefinearing:ithasoneoperationthattogetherwiththeset formsanAbeliangroup,andanotheroperationthat,withtheset,forms amonoid,andthesecondoperationdistributesoverthefirst.Aswith integers,thesetofrealnumbersundertheusualadditionandmultiplicationalsohasthestructureofaring.Anotherveryimportantexample isthesetof squarematrices allofsize N × N withrealelementsundernormalmatrixadditionandmultiplication.Herethemultiplicative identityelementisthe identitymatrix ofsize N × N ,andtheadditive identityelementisthesamesizesquarematrixwithallzeroelements.

Ringsarepowerfulstructuresthatcanleadtoverysubstantialcomputationalsavingsformanystatisticalmachinelearningandsignalprocessingproblems.Forexample,ifweremovetheconditionthatthe additiveoperationmusthaveinverses,thenwehaveapairofmonoids thataredistributive.Thisstructureisknownasa semiring or semifield anditturnsoutthattheexistenceofthisstructureinmanymachine learningandsignalprocessingproblemsmakestheseotherwisecomputationallyintractableproblemsfeasible.Forexample,theclassical Viterbi algorithm fordeterminingthemostlikelysequenceofhiddenstatesina HiddenMarkovModel (HMM)isanapplicationofthe max-sumsemifield onthe dynamicBayesiannetwork thatdefinesthestochasticdependenciesinthemodel.

BothDummitandFoote(2004)andRotman(2000)containdetailed introductionstoabstractalgebraincludinggroupsandrings.

Fig.1.3:Thetableforthesymmetrygroup V4 =(◦, {e,h,v,r}) (top), andthegroup M8 =(×8, {1, 3, 5, 7}) (bottom),showingtheisomorphism betweenthemobtainedbymapping e → 1, h → 3, v → 5 and r → 7.

Fig.1.4:Metric2Dcircles d (x, 0)= c forvariousdistancemetrics.From toptobottom,Euclideandistance, absolutedistance,thedistance d (x, y)= D i=1 |xi yi|0 3 0 3 1 , andtheMahalanobisdistancefor Σ11 =Σ22 =1 0, Σ12 =Σ21 = 0 5. Thecontoursare c =1 (redlines) and c =0 5 (bluelines).

1.2Metrics

Distanceisafundamentalconceptinmathematics.Distancefunctions playakeyroleinmachinelearningandsignalprocessing,particularly asmeasuresofsimilaritybetweenobjects,forexample,digitalsignals encodedasitemsofaset.Wewillalsoseethatastatisticalmodeloften impliestheuseofaparticularmeasureofdistance,andthismeasure determinesthepropertiesofstatisticalinferencesthatcanbemade.

A geometry isobtainedbyattachinganotionofdistancetoaset:it becomesa metricspace.A metric takestwopointsinthesetandreturns asingle(usuallyreal)valuerepresentingthedistancebetweenthem.A metricmusthavethefollowingpropertiestosatisfyintuitivenotionsof distance:

(1) Non-negativity: d (x,y) ≥ 0, (2) Symmetry: d (x,y)= d (y,x) , (3) Coincidence: d (x,x)=0,and (4) Triangleinequality: d (x,z) ≤ d (x,y)+ d (y,z).

Respectively,theserequirementsarethat(1)distancecannotbenegative,(2)thedistancegoingfrom x to y isthesameasthatfrom y to x,(3)onlypointslyingontopofeachotherhavezerodistancebetween them,and(4)thelengthofanyonesideofatriangledeﬁnedbythree pointscannotbegreaterthanthesumofthelengthoftheothertwo sides.Forexample,the Euclideanmetric ona D-dimensionalsetis:

Thisrepresentsthenotionofdistancethatweexperienceineveryday geometry.Thedeﬁningpropertiesofdistanceleadtoavastrangeof possiblegeometries,forexample,the city-blockgeometry isdeﬁnedby theabsolutedistancemetric: d (x, y)= D i=1 |xi yi| (1.2)

City-blockdistanceissonamedbecauseitmeasuresdistancesona gridparalleltotheco-ordinateaxes.Distanceneednottakeonreal values,forexample,the discretemetric isdeﬁnedas d (x,y)=0 if x = y and d (x,y)=1 otherwise.Anotherveryimportantmetricisthe Mahalanobisdistance: d (x, y)= (x y)T Σ 1 (x y) (1.3)

Thisdistancemaynotbeaxis-aligned:itcorrespondstotheﬁnding theEuclideandistanceafterapplyinganarbitrarystretchorcompression alongeachaxisfollowedbyanarbitrary D-dimensionalrotation(indeed if Σ = I,theidentitymatrix,thisisidenticaltotheEuclideandistance).

Figure1.4showsplotsof2D circles d (x, 0)= c forvariousmetrics,in particular c =1 whichisknownasthe unitcircle inthatmetricspace. Forfurtherreading,metricspacesareintroducedinSutherland(2009) inthecontextofrealanalysisandtopology.

1.3Vectorspaces

A space isjustthenamegiventoasetendowedwithsomeadditional mathematicalstructure.A(real) vectorspace isthekeystructureof linearalgebrathatisacentraltopicinmostofclassicalsignalprocessing—alldigitalsignalsarevectors,forexample.Thedeﬁnition ofa(ﬁnite)vectorspacebeginswithanorderedset(oftenwrittenas acolumn)of N realnumberscalleda vector,andasinglerealnumbercalleda scalar.Tothatvectorweattachtheadditionoperation whichisbothassociativeandcommutative,thatsimplyaddseverycorrespondingelementofthenumbersineachvectortogether,writtenas v + u.Theidentityforthisoperationisthevectorwith N zeros, 0.

Additionally,wedeﬁneascalarmultiplicationoperationthatmultiplieseachelementofthevectorwithascalar, λ.Usingthescalar multiplicationby λ = 1,wecanthenforminversesofanyvector. Scalarmultiplicationshouldnotmatterinwhichordertwoscalarmultiplicationsoccur,e.g. λ (µv)=(λµ) v =(µλ) v = µ (λv).We alsorequirethatscalarmultiplicationdistributesovervectoraddition, λ (v + u)= λv + λu,andscalaradditiondistributesoverscalarmultiplication, (λ + µ) v = λv + µv.

Everyvectorspacehasatleastone basis forthespace:thisisasetof linearlyindependentvectors,suchthateveryvectorinthevectorspace canbewrittenasauniquelinearcombinationofthesebasisvectors (Figure1.5).Sinceourvectorshave N entries,therearealways N vectorsinthebasis.Thus, N isthe dimension ofthevectorspace. Thesimplestbasisistheso-called standardbasis,consistingofthe N vectors e1 =(1, 0,..., 0)T , e2 =(0, 1,..., 0)T etc.Itiseasytoseethat avector v =(v1,v2,...,vN )T canbeexpressedintermsofthisbasisas v = v1e1 + v2e2 + ··· + vN eN

Byattachinga norm toavectorspace(seebelow),wecanmeasure thelengthofanyvector,thevectorspaceisthenreferredtoasa normed space.Tosatisfyintuitivenotionsoflength,anorm V (u) musthave thefollowingproperties:

(1) Non-negativity: V (u) ≥ 0,

(2) Positivescalability: V (αu)= |α| V (u),

(3) Separation:If V (u)=0 then u = 0,and

(4) Triangleinequality: V (u + v) ≤ V (u)+ V (v).

Often,thenotation u isused.ProbablymostfamiliaristheEuclideannorm u 2 = N i=1 u2 i ,butanothernormthatgetsheavyuse instatisticalmachinelearningisthe Lp-norm:

Fig.1.5:Importantconceptsin2D vectorspaces.Thestandardbasis (e1, e2) isalignedwiththeaxes.Two othervectors (v1, v2) canalsobeused asabasisforthespace,allthatisrequiredistheyarelinearlyindependentofeachother(inthe2Dcase, theyarenotsimplyscalarmultiples ofeachother).Then x canberepresentedineitherbasis.Thedot productbetweentwovectorsisproportionaltothecosineoftheanglebetweenthem: cos(θ) ∝ v1, v2 . Because (e1, e2) areatmutualright angles,theyhavezerodotproduct andthereforethebasisisorthogonal.Additionally,itisanorthonormalbasisbecausetheyareunitnorm (length) ( e1 = e2 =1).Theother basisvectorsareneitherorthogonal norunitnorm.

ofwhichtheEuclidean(p =2)andcity-block(p =1)normsarespecial cases.Alsoofimportanceisthemax-norm u ∞ =maxi=1...N |ui|, whichisjustthelengthofthelargestco-ordinate.

Thereareseveralwaysinwhichaproductofvectorscanbeformed. Themostimportantinourcontextisthe innerproduct betweentwo vectors:

Thisissometimesalsodescribedasthe dotproduct u v.Forcomplex vectors,thisisdeﬁnedas:

where a isthecomplexconjugateof a ∈ C. Wewillseelaterthatthedotproductplaysacentralroleinthestatisticalnotionofcorrelation.Whentwovectorshavezeroinnerproduct,theyaresaidtobe orthogonal;geometricallytheymeetatarightangle.Thisalsohasastatisticalinterpretation:forcertainrandomvariables,orthogonalityimpliesstatisticalindependence.Thus,orthogonalityleadstosigniﬁcantsimpliﬁcationsincommoncalculationsinclassical DSP.

Aspecialandveryusefulkindofbasisisan orthogonalbasis where theinnerproductbetweeneverypairofdistinctbasisvectorsiszero:

(1.7)

Inaddition,thebasisis orthonormal ifeverybasisvectorhasunit norm vi =1 –thestandardbasisisorthonormal,forexample(Figure 1.5).Orthogonality/orthonormalitydramaticallysimplifiesmanycalculationsovervectorspaces,partlybecauseitisstraightforwardtofind the N scalarcoefficients ai ofanarbitraryvector u inthisbasisusing theinnerproduct:

(1.8) whichsimpliﬁesto ai = u, vi intheorthonormalcase.Orthonormal basesarethebackboneofmanymethodsinDSPandmachinelearning. WecanexpresstheEuclideannormusingtheinnerproduct: u 2 = √u u.Aninnerproductsatisﬁesthefollowingproperties:

(1) Non-negativity: u v ≥ 0, (2) Symmetry: u v = v u, and

(3) Linearity: (αu) · v = α (u · v).

Thereisanintuitiveconnectionbetweendistanceandlength:assuming thatthemetricishomogeneous d (αu,αv)= |α| d (u, v) andtranslationinvariant d (u, v)= d (u + a, v + a),anormcanbedeﬁnedasthe distancetotheorigin u = d (0, u).Acommonlyoccurringexample ofthistheso-called squared L2 weightednorm u 2 A =uT Au whichis justthesquaredMahalanobisdistance d (0, u)2 discussedearlierwith Σ 1 = A.

Ontheotherhand,thereisonesenseinwhicheverynorm induces an associatedmetricwiththeconstruction d (u, v)= u v .ThisconstructionenjoysextensiveuseinmachinelearningandstatisticalDSPto quantifythe“discrepancy”or“error”betweentwosignals.Infact,since normsare convex (discussedlater),itfollowsthatmetricsconstructed thiswayfromnorms,arealsoconvex,afactofcrucialimportancein practice.

Aﬁnalproductwewillhaveneedforinlaterchaptersisthe elementwiseproduct w = u ◦ v whichisobtainedbymultiplyingeachelement inthevectortogether wn = unvn

Linearoperators

Alinearoperatoror map actsonvectorstocreateothervectors,and whiledoingso,preservetheoperationsofvectoradditionandscalar multiplication.Theyarehomomorphismsbetweenvectorspaces.Linearoperatorsarefundamentaltoclassicaldigitalsignalprocessingand statistics,andsoﬁndheavyuseinmachinelearning.Linearoperators L havethe linearcombination property:

Whatthissaysisthattheoperatorcommuteswithscalarmultiplicationandvectoraddition:wegetthesameresultifweﬁrstscale,then addthevectors,andthenapplytheoperatortotheresult,or,applythe operatortoeachvector,thenscalethem,andthemadduptheresults (Figure1.6).

Matrices(whichwediscussnext),differentiationandintegration,and expectationinprobabilityareallexamplesoflinearoperators.The linearityofintegrationanddifferentiationarestandardruleswhichcan bederivedfromthebasicdefinitions.Linearmapsintwo-dimensional spacehaveanicegeometricinterpretation:straightlinesinthevector spacearemappedontootherstraightlines(orontoapointiftheyare degenerate maps).Thisideaextendstohigherdimensionalvectorspaces inthenaturalway.

Matrixalgebra

Whenvectorsare‘stackedtogether’theyformapowerfulstructure

Fig.1.6:A‘ﬂowdiagram’depicting linearoperators.Alllinearoperatorssharethepropertythattheoperator L appliedtothescaledsum of(twoormore)vectors α1u1 + α2u2 (toppanel),isthesameasthescaled sumofthesameoperatorappliedto eachofthesevectorsﬁrst(bottom panel).Inotherwords,itdoesnot matterwhethertheoperatorisappliedbeforeorafterthescaledsum.

Fig.1.7:Adepictionofthegeometriceﬀectofinvertibleandnoninvertiblesquarematrices.Theinvertiblesquarematrix A mapsthe triangleatthebottomtothe‘thinner’triangle(forexample,bytransformingthevectorforeachvertex). Itscalestheareaofthetriangleby thedeterminant |A| =0.However, thenon-invertiblesquarematrix B collapsesthetriangleontoasingle pointwithnoareabecause |B| =0. Therefore, A 1 iswell-deﬁned,but B 1 isnot.

whichisacentraltopicofmuchofsignalprocessing,statisticsandmachinelearning: matrixalgebra.A matrix isa‘rectangular’arrayof N × M elements,forexample,the 3 × 2 matrix A is:

Thiscanbeseentobetwolengththreevectorsstackedside-by-side. Theelementsofamatrixareoftenwrittenusingthesubscriptnotation aij where i =1, 2,...,N and j =1, 2,...,M .Matrixadditionoftwo matrices,iscommutative: C = A + B = B+A,wheretheadditionis element-by-elementi.e. cij = aij + bij .

Aswithvectors,therearemanypossiblewaysinwhichmatrixmultiplicationcouldbedeﬁned:theonemostcommonlyencounteredisthe row-by-columninnerproduct.Fortwomatrices A ofsize N × M and B ofsize M × P ,theproduct C = A × B isanewmatrixofsize N × P deﬁnedas:

Thiscanbeseentobethematrixofallpossibleinnerproductsofeach rowof A byeachcolumnof B.Notethatthenumberofcolumnsofthe lefthandmatrixmustmatchthenumberofrowsoftherighthandone. Matrixmultiplicationisassociative,itdistributesovermatrixaddition, anditiscompatiblewithscalarmultiplication: αA = B simplygivesthe newmatrixwithentries bij = αaij ,i.e.itisjustcolumnwiseapplication ofvectorscalarmultiplication.Matrixmultiplicationis,however, noncommutative:itisnottrueingeneralthat A × B givesthesameresult as B × A

Ausefulmatrixoperatoristhe transpose thatswapsrowswith columns;if A isan N × M matrixthen AT = B isthe M × N matrix bji = aij .Someofthepropertiesofthetransposeare:itisself-inverse AT T = A;respectsaddition (A + B)T = AT + BT ;anditreverses theorderoffactorsinmultiplication (AB)T = BT AT

Squareandinvertiblematrices

Sofar,wehavenotdiscussedhowtosolvematrixequations.Thecase ofadditioniseasybecausewecanusescalarmultiplicationtoformthe negativeofamatrix,i.e.given C = A + B,ﬁnding B requiresusto calculate B = C A = C +( 1) A.Inthecaseofmultiplication,we needtoﬁndthe“reciprocal”ofamatrix,e.g.tosolve C = AB for B we wouldnaturallycalculate A 1C = A 1AB = B bytheusualalgebraic rules.However,thingsbecomemorecomplicatedbecause A 1 doesnot existingeneral.Wewilldiscusstheconditionsunderwhichamatrix doeshaveamultiplicativeinversenext.

All squarematrices ofsize N × N canbesummedormultipliedto-

getherinanyorder.Asquarematrix A withallzeroelementsexcept forthe maindiagonal,i.e. aij =0 unless i = j iscalleda diagonalmatrix.Aspecialdiagonalmatrix, I,isthe identitymatrix wherethemain diagonalentries aii =1.Then,iftheequality AB = I = BA holds,the The N × N identitymatrixisdenoted IN orsimply I whenthe contextisclearandthesizecan beomitted.

matrix B mustbewell-deﬁned(andunique)anditistheinverseof A, i.e. B = A 1.Wethensaythat A is invertible; ifitisnotinvertible thenitis degenerate or singular. Therearemanyequivalentconditionsformatrixinvertibility,forexample,theonlysolutiontotheequation Ax = 0 isthevector x = 0 orthecolumnsof A arelinearlyindependent.Butoneparticularly importantwaytotesttheinvertibilityofamatrixistocalculatethe determinant |A|:ifthematrixissingular,thedeterminantiszero.It followsthatallinvertiblematriceshave |A| =0.Thedeterminantcalculationisquiteelaborateforageneralsquarematrix,formulasexistbut geometricintuitionhelpstounderstandthesecalculations:whenalinear mapdeﬁnedbyamatrixactsonageometricobjectinvectorspacewith acertainvolume,thedeterminantisthe scalingfactor ofthemapping. Volumesundertheactionofthemaparescaledbythemagnitudeof thedeterminant.Ifthedeterminantisnegative,the orientation ofany geometricobjectisreversed.Therefore,invertibletransformationsare thosethatdonotcollapsethevolumeofanyobjectinthevectorspace tozero(Figure1.7).

Anothermatrixoperatorwhichﬁndssigniﬁcantuseisthe tracetr (A) ofasquarematrix:thisisjustthesumofthediagonals,e.g.tr (A)= N i=1 aii.Thetraceisinvarianttoadditiontr (A + B)= tr (A)+tr (B), transposetr AT = tr (A) andmultiplicationtr (AB)= tr (BA).With productsofthreeormorematricesthetraceisinvarianttocyclicpermutations,withthreematrices:tr (ABC)= tr (CAB)= tr (BCA)

Eigenvaluesandeigenvectors

Aubiquitouscomputationthatarisesinconnectionwithalgebraicproblemsinvectorspacesisthe eigenvalueproblem forthegiven N ×N square matrix A:

Anynon-zero N × 1 vector v whichsolvesthisequationisknownas an eigenvector of A,andthescalarvalue λ isknownastheassociated eigenvalue.Eigenvectorsarenotunique:theycanbemultipliedbyany non-zeroscalarandstillremaineigenvectorswiththesameeigenvalues. Thus,oftentheunitlengtheigenvectorsaresoughtasthesolutionsto (1.12).

Itshouldbenotedthat(1.12)arisesforvectorspacesingenerale.g. linearoperators.Animportantexampleoccursinthevectorspaceof functions f (x) withdifferentialtheoperator L = d dx .Here,thecorrespondingeigenvalueproblemisthedifferentialequation L [f (x)]= λf (x),forwhichthesolutionis f (x)= aeλx forany(non-zero)scalar value a.Thisisknownasan eigenfunction ofthedifferentialoperator

Fig.1.8:Anexampleofdiagonalizingamatrix.Thediagonalizable squarematrix A hasdiagonalmatrix D containingtheeigenvalues,and transformationmatrix P containing theeigenbasis,so A = PDP 1 . A mapstherotatedsquare(top),to therectangleinthesameorientation (atleft).Thisisequivalenttofirst ‘unrotating’thesquare(theeffect of P 1)suchthatitisalignedwith theco-ordinateaxes,thenstretching/compressingthesquarealong eachaxis(theeffectof D),andfinally rotatingbacktotheoriginalorientation(theeffectof P).

Av = λv (1.12)

LIftheyexist,theeigenvectorsandeigenvaluesofasquarematrix A can befoundbyobtainingallscalarvalues λ suchthat|(A λI)| =0.This holdsbecause Av λv = 0 ifandonlyif |(A λI)| =0.Expandingout thisdeterminantequationleadstoan N -thorderpolynomialequation in λ,namely aN λN + aN 1λN 1 + ··· + a0 =0,andtherootsofthis equationaretheeigenvalues.

Thispolynomialisknownasthe characteristicpolynomial for A and determinestheexistenceofasetofeigenvectorsthatisalsoabasisfor thespace,inthefollowingway.Thefundamentaltheoremofalgebra statesthatthispolynomialhasexactly N roots,butsomemaybe repeated (i.e.occurmorethanonce).Iftherearenorepeatedrootsofthe characteristicpolynomial,thentheeigenvaluesarealldistinct,andso thereare N eigenvectorswhicharealllinearlyindependent.Thismeans thattheyformabasisforthevectorspace,whichisthe eigenbasis for thematrix.

Notallmatriceshaveaneigenbasis.Howevermatricesthatdoare also diagonalizable,thatis,theyhavethesamegeometriceﬀectasa diagonalmatrix,butinadiﬀerentbasisotherthanthestandardone. Thisbasiscanbefoundbysolvingtheeigenvalueproblem.Placingall theeigenvectorsintothecolumnsofamatrix P andallthecorresponding eigenvaluesintoadiagonalmatrix D,thenthematrixcanberewritten:

SeeFigure1.8.Adiagonalmatrixsimplyscalesallthecoordinatesof thespacebyadifferent,fixedamount.Theyareverysimpletodealwith, andhaveimportantapplicationsinsignalprocessingandmachinelearning.Forexample,theGaussiandistributionovermultiplevariables,one ofthemostimportantdistributionsinpracticalapplications,encodes theprobabilisticrelationshipbetweeneachvariableintheproblemwith the covariancematrix.Bydiagonalizingthismatrix,onecanfindalinearmappingwhichmakesallthevariablesstatisticallyindependentof eachother:thisdramaticallysimplifiesmanysubsequentcalculations. Despitethecentralimportanceoftheeigenvectorsandeigenvaluesa linearproblem,itisgenerallynotpossibletofindalltheeigenvaluesby analyticalcalculation.Thereforeonegenerallyturnstoiterativenumericalalgorithmstoobtainananswertoacertainprecision.

Specialmatrices

Beyondwhathasbeenalreadydiscussed,thereisnotthatmuchmoreto besaidaboutgeneralmatriceswhichhave N × M degreesoffreedom. Specialmatriceswithfewerdegreesoffreedomhaveveryinteresting propertiesandoccurfrequentlyinpractice.

Someofthemostinterestingspecialmatricesare symmetricmatrices withrealentries –self-transpose andsosquarebydeﬁnition,i.e. AT = A.Thesematricesarealwaysdiagonalizable,andhaveanorthogonal eigenbasis.Theeigenvaluesarealwaysreal.Iftheinverseexists,itis

A = PDP 1 (1.13)

alsosymmetric.Asymmetricmatrixhas 1 2 N (N +1) uniqueentries,on theorderofhalfthe N 2 entriesofanarbitrarysquarematrix.

Positive-deﬁnitematrices areaspecialkindofsymmetricmatrixfor which vT Av > 0 foranynon-zerovector v.Alltheeigenvaluesare positive.Takeanyreal,invertiblematrix B (sothat Bv = 0 forall such v)andlet A = BT B,then vT BT B

making A positive-deﬁnite.Aswillbedescribedinthenextsection, thesekindsofmatricesareveryimportantinmachinelearningandsignal processingbecausethecovariancematrixofasetofrandomvariablesis positive-deﬁniteforexactlythisreason.

Orthonormalmatrices haveallcolumnswhicharevectorsthatform anorthonormalbasisforthespace.Thedeterminantofthesematrices iseither +1 or 1.Likesymmetricmatricestheyarealwaysdiagonalizable,althoughtheeigenvectorsaregenerallycomplexwithmodulus 1.Anorthonormalmatrixisalwaysinvertible,theinverseisalsoorthonormalandequaltothetranspose, AT = A 1.Thesubsetwith determinant +1,correspondto rotations inthevectorspace.

For upper (lower) triangularmatrices,thediagonalandtheentries above(below)thediagonalarenon-zero,therestzero.Thesematrices oftenoccurwhensolvingmatrixproblemssuchas Av = b,becausethe matrixequation Lv = b issimpletosolveby forwardsubstitution if L islower-triangular.Forwardsubstitutionisastraightforwardsequential procedurewhichﬁrstobtains v1 intermsof b1 and l11,then v2 interms of b1, l21 and l22 etc.Thesameholdsforuppertriangularmatricesand backwardsubstitution.Becauseofthesimplicityofthesesubstitution procedures,thereexistmethodsfordecomposingamatrixintoaproduct ofupperorlowertriangularmatricesandacompanionmatrix.

Toeplitzmatrices arematriceswith 2N 1 degreesoffreedomthat haveconstantdiagonals,thatis,theelementsof A haveentries aij = ci j .All discreteconvolutions canberepresentedasToeplitzmatrices, andaswewilldiscusslater,thismakesthemoffundamentalimportance inDSP.Becauseofthereduceddegreesoffreedomandspecialstructure ofthematrix,aToeplitzmatrixproblem Ax = b iscomputationally easiertosolvethanageneralmatrixproblem:amethodknownasthe Levinsonrecursiondramaticallyreducesthenumberofarithmeticoperationsneeded.

Circulantmatrices areToeplitzmatriceswhereeachrowisobtained fromtherowabovebyrotatingitoneelementtotheright.Withonly N degreesoffreedomtheyarehighlystructuredandcanbeunderstood asdiscretecircularconvolutions.Theeigenbasiswhichdiagonalizesthe matrixisthe discreteFourierbasis whichisoneofthecornerstonesof classicalDSP.Itfollowsthatanycirculantmatrixproblemcanbevery eﬃcientlysolvedusingthe fastFouriertransform (FFT).

DummitandFoote(2004)containsanin-depthexpositionofvector spacesfromanabstractpointofview,whereasKayeandWilson(1998) isanaccessibleandmoreconcreteintroduction.