Natural Language Processing

Jacob Eisenstein

October 15, 2018

Under contract with MIT Press, shared under CC-BY-NC-ND license.
Contents

1 Introduction ..... 1
1.1 Natural language processing and its neighbors ..... 1
1.2 Three themes in natural language processing ..... 6
1.2.1 Learning and knowledge ..... 6
1.2.2 Search and learning ..... 7
1.2.3 Relational, compositional, and distributional perspectives ..... 9
I Learning ..... 11

2 Linear text classification ..... 13
2.1 The bag of words ..... 13
2.2 Naïve Bayes ..... 17
2.2.1 Types and tokens ..... 19
2.2.2 Prediction ..... 20
2.2.3 Estimation ..... 21
2.2.4 Smoothing ..... 22
2.2.5 Setting hyperparameters ..... 23
2.3 Discriminative learning ..... 24
2.3.1 Perceptron ..... 25
2.3.2 Averaged perceptron ..... 27
2.4 Loss functions and large-margin classification ..... 27
2.4.1 Online large margin classification ..... 30
2.4.2 *Derivation of the online support vector machine ..... 32
2.5 Logistic regression ..... 35
2.7 *Additional topics in classification ..... 41
2.7.1 Feature selection by regularization ..... 41
2.7.2 Other views of logistic regression ..... 41
2.8 Summary of learning algorithms ..... 43
3 Nonlinear classification ..... 47
3.3.1 Backpropagation ..... 55
3.3.2 Regularization and dropout ..... 57
4 Linguistic applications of classification ..... 69
4.1 Sentiment and opinion analysis ..... 69
4.1.1 Related problems ..... 70
4.1.2 Alternative approaches to sentiment analysis ..... 72
4.2 Word sense disambiguation ..... 73
4.2.1 How many word senses? ..... 74
4.2.2 Word sense disambiguation as classification ..... 75
4.3 Design decisions for text classification ..... 76
4.3.1 What is a word? ..... 76
4.3.2 How many words? ..... 79
4.3.3 Count or binary? ..... 80
4.4 Evaluating classifiers ..... 80
4.4.1 Precision, recall, and F-measure ..... 81
4.4.2 Threshold-free metrics ..... 83
4.4.3 Classifier comparison and statistical significance ..... 84
4.4.4 *Multiple comparisons ..... 87
4.5 Building datasets ..... 88
4.5.1 Metadata as labels ..... 88
4.5.2 Labeling data ..... 88
5 Learning without supervision ..... 95
5.1 Unsupervised learning ..... 95
5.1.1 K-means clustering ..... 96
5.1.2 Expectation-Maximization (EM) ..... 98
5.1.3 EM as an optimization algorithm ..... 102
5.1.4 How many clusters? ..... 103
5.2 Applications of expectation-maximization ..... 104
5.2.1 Word sense induction ..... 104
5.2.2 Semi-supervised learning ..... 105
5.2.3 Multi-component modeling ..... 106
5.3 Semi-supervised learning ..... 107
5.3.1 Multi-view learning ..... 108
5.3.2 Graph-based algorithms ..... 109
5.4 Domain adaptation ..... 110
5.4.1 Supervised domain adaptation ..... 111
5.4.2 Unsupervised domain adaptation ..... 112
5.5 *Other approaches to learning with latent variables ..... 114
5.5.1 Sampling ..... 115
5.5.2 Spectral learning ..... 117
II Sequences and trees ..... 123

6 Language models ..... 125
6.1 N-gram language models ..... 126
6.2 Smoothing and discounting ..... 129
6.2.1 Smoothing ..... 129
6.2.2 Discounting and backoff ..... 130
6.2.3 *Interpolation ..... 131
6.2.4 *Kneser-Ney smoothing ..... 133
6.3 Recurrent neural network language models ..... 133
6.3.1 Backpropagation through time ..... 136
6.3.2 Hyperparameters ..... 137
6.3.3 Gated recurrent neural networks ..... 137
6.4 Evaluating language models ..... 139
6.4.1 Held-out likelihood ..... 139
6.4.2 Perplexity ..... 140
6.5 Out-of-vocabulary words ..... 141
7 Sequence labeling ..... 145
7.1 Sequence labeling as classification ..... 145
7.2 Sequence labeling as structure prediction ..... 147
7.3 The Viterbi algorithm ..... 149
7.3.1 Example ..... 152
7.3.2 Higher-order features ..... 153
7.4 Hidden Markov Models ..... 153
7.4.1 Estimation ..... 155
7.4.2 Inference ..... 155
7.5 Discriminative sequence labeling with features ..... 157
7.5.1 Structured perceptron ..... 160
7.5.2 Structured support vector machines ..... 160
7.5.3 Conditional random fields ..... 162
7.6 Neural sequence labeling ..... 167
7.6.1 Recurrent neural networks ..... 167
7.6.2 Character-level models ..... 169
7.6.3 Convolutional neural networks for sequence labeling ..... 170
7.7 *Unsupervised sequence labeling ..... 170
7.7.1 Linear dynamical systems ..... 172
7.7.2 Alternative unsupervised learning methods ..... 172
7.7.3 Semiring notation and the generalized Viterbi algorithm ..... 172
8 Applications of sequence labeling ..... 175
8.1 Part-of-speech tagging ..... 175
8.1.1 Parts-of-speech ..... 176
8.1.2 Accurate part-of-speech tagging ..... 180
8.2 Morphosyntactic attributes ..... 182
8.3 Named entity recognition ..... 183
8.4 Tokenization ..... 185
8.5 Code switching ..... 186
8.6 Dialogue acts ..... 187
9 Formal language theory ..... 191
9.1 Regular languages ..... 192
9.1.1 Finite state acceptors ..... 193
9.1.2 Morphology as a regular language ..... 194
9.1.3 Weighted finite state acceptors ..... 196
9.1.4 Finite state transducers ..... 201
9.1.5 *Learning weighted finite state automata ..... 206
9.2 Context-free languages ..... 207
9.2.1 Context-free grammars ..... 208
9.2.2 Natural language syntax as a context-free language ..... 211
9.2.3 A phrase-structure grammar for English ..... 213
9.2.4 Grammatical ambiguity ..... 218
9.3 *Mildly context-sensitive languages ..... 218
9.3.1 Context-sensitive phenomena in natural language ..... 219
9.3.2 Combinatory categorial grammar ..... 220
10 Context-free parsing ..... 225
10.1 Deterministic bottom-up parsing ..... 226
10.1.1 Recovering the parse tree ..... 227
10.1.2 Non-binary productions ..... 227
10.1.3 Complexity ..... 229
10.2 Ambiguity ..... 229
10.2.1 Parser evaluation ..... 230
10.2.2 Local solutions ..... 231
10.3 Weighted context-free grammars ..... 232
10.3.1 Parsing with weighted context-free grammars ..... 234
10.3.2 Probabilistic context-free grammars ..... 235
10.3.3 *Semiring weighted context-free grammars ..... 237
10.4 Learning weighted context-free grammars ..... 238
10.4.1 Probabilistic context-free grammars ..... 238
10.4.2 Feature-based parsing ..... 239
10.4.3 *Conditional random field parsing ..... 240
10.4.4 Neural context-free grammars ..... 242
10.5 Grammar refinement ..... 242
10.5.1 Parent annotations and other tree transformations ..... 243
10.5.2 Lexicalized context-free grammars ..... 244
10.5.3 *Refinement grammars ..... 248
10.6 Beyond context-free parsing ..... 250
10.6.1 Reranking ..... 250
10.6.2 Transition-based parsing ..... 251
11 Dependency parsing ..... 257
11.1 Dependency grammar ..... 257
11.1.1 Heads and dependents ..... 258
11.1.2 Labeled dependencies ..... 259
11.1.3 Dependency subtrees and constituents ..... 260
11.2 Graph-based dependency parsing ..... 262
11.2.1 Graph-based parsing algorithms ..... 264
11.2.2 Computing scores for dependency arcs ..... 265
11.2.3 Learning ..... 267
11.3 Transition-based dependency parsing ..... 268
11.3.1 Transition systems for dependency parsing ..... 269
11.3.2 Scoring functions for transition-based parsers ..... 273
11.3.3 Learning to parse ..... 274
11.4 Applications ..... 277
III Meaning ..... 283

12 Logical semantics ..... 285
12.1 Meaning and denotation ..... 286
12.2 Logical representations of meaning ..... 287
12.2.1 Propositional logic ..... 287
12.2.2 First-order logic ..... 288
12.3 Semantic parsing and the lambda calculus ..... 291
12.3.1 The lambda calculus ..... 292
12.3.2 Quantification ..... 293
12.4 Learning semantic parsers ..... 296
12.4.1 Learning from derivations ..... 297
12.4.2 Learning from logical forms ..... 299
12.4.3 Learning from denotations ..... 301
13 Predicate-argument semantics ..... 305
13.1 Semantic roles ..... 307
13.1.1 VerbNet ..... 308
13.1.2 Proto-roles and PropBank ..... 309
13.1.3 FrameNet ..... 310
13.2 Semantic role labeling ..... 312
13.2.1 Semantic role labeling as classification ..... 312
13.2.2 Semantic role labeling as constrained optimization ..... 315
13.2.3 Neural semantic role labeling ..... 317
13.3 Abstract Meaning Representation ..... 318
13.3.1 AMR parsing ..... 321
14 Distributional and distributed semantics ..... 325
14.1 The distributional hypothesis ..... 325
14.2 Design decisions for word representations ..... 327
14.2.1 Representation ..... 327
14.2.2 Context ..... 328
14.2.3 Estimation ..... 329
14.3 Latent semantic analysis ..... 329
14.4 Brown clusters ..... 331
14.5 Neural word embeddings ..... 334
14.5.1 Continuous bag-of-words (CBOW) ..... 334
14.5.2 Skipgrams ..... 335
14.5.3 Computational complexity ..... 335
14.5.4 Word embeddings as matrix factorization ..... 337
14.6 Evaluating word embeddings ..... 338
14.6.1 Intrinsic evaluations ..... 339
14.6.2 Extrinsic evaluations ..... 339
14.6.3 Fairness and bias ..... 340
14.7 Distributed representations beyond distributional statistics ..... 341
14.7.1 Word-internal structure ..... 341
14.7.2 Lexical semantic resources ..... 343
14.8 Distributed representations of multiword units ..... 344
14.8.1 Purely distributional methods ..... 344
14.8.2 Distributional-compositional hybrids ..... 345
14.8.3 Supervised compositional methods ..... 346
14.8.4 Hybrid distributed-symbolic representations ..... 346
15 Reference resolution ..... 351
15.1 Forms of referring expressions ..... 352
15.1.1 Pronouns ..... 352
15.1.2 Proper nouns ..... 357
15.1.3 Nominals ..... 357
15.2 Algorithms for coreference resolution ..... 358
15.2.1 Mention-pair models ..... 359
15.2.2 Mention-ranking models ..... 360
15.2.3 Transitive closure in mention-based models ..... 361
15.2.4 Entity-based models ..... 362
15.3 Representations for coreference resolution ..... 367
15.3.1 Features ..... 367
15.3.2 Distributed representations of mentions and entities ..... 370
15.4 Evaluating coreference resolution ..... 373
16 Discourse ..... 379
16.1 Segments ..... 379
16.1.1 Topic segmentation ..... 380
16.1.2 Functional segmentation ..... 381
16.2 Entities and reference ..... 381
16.2.1 Centering theory ..... 382
16.2.2 The entity grid ..... 383
16.2.3 *Formal semantics beyond the sentence level ..... 384
16.3 Relations ..... 385
16.3.1 Shallow discourse relations ..... 385
16.3.2 Hierarchical discourse relations ..... 389
16.3.3 Argumentation ..... 392
16.3.4 Applications of discourse relations ..... 393
IV Applications ..... 401

17 Information extraction ..... 403
17.1 Entities ..... 405
17.1.1 Entity linking by learning to rank ..... 406
17.1.2 Collective entity linking ..... 408
17.2 Relations ..... 411
17.2.1 Pattern-based relation extraction ..... 412
17.2.2 Relation extraction as a classification task ..... 413
17.2.3 Knowledge base population ..... 416
17.2.4 Open information extraction ..... 419
17.3 Events ..... 420
17.4 Hedges, denials, and hypotheticals ..... 422
17.5 Question answering and machine reading ..... 424
17.5.1 Formal semantics ..... 424
17.5.2 Machine reading ..... 425
18 Machine translation ..... 431
18.1 Machine translation as a task ..... 431
18.1.1 Evaluating translations ..... 433
18.1.2 Data ..... 435
18.2 Statistical machine translation ..... 436
18.2.1 Statistical translation modeling ..... 437
18.2.2 Estimation ..... 438
18.2.3 Phrase-based translation ..... 439
18.2.4 *Syntax-based translation ..... 441
18.3 Neural machine translation ..... 442
18.3.1 Neural attention ..... 444
18.3.2 *Neural machine translation without recurrence ..... 446
18.3.3 Out-of-vocabulary words ..... 448
18.4 Decoding ..... 449
18.5 Training towards the evaluation metric ..... 451
19 Text generation ..... 457
19.1 Data-to-text generation ..... 457
19.1.1 Latent data-to-text alignment ..... 459
19.1.2 Neural data-to-text generation ..... 460
19.2 Text-to-text generation ..... 464
19.2.1 Neural abstractive summarization ..... 464
19.2.2 Sentence fusion for multi-document summarization ..... 466
19.3 Dialogue ..... 467
19.3.1 Finite-state and agenda-based dialogue systems ..... 467
19.3.2 Markov decision processes ..... 468
19.3.3 Neural chatbots ..... 470
A Probability ..... 475
A.1 Probabilities of event combinations ..... 475
A.1.1 Probabilities of disjoint events ..... 476
A.1.2 Law of total probability ..... 477
A.2 Conditional probability and Bayes' rule ..... 477
A.3 Independence ..... 479
A.4 Random variables ..... 480
A.5 Expectations ..... 481
A.6 Modeling and estimation ..... 482
B Numerical optimization ..... 485
B.1 Gradient descent ..... 486
B.2 Constrained optimization ..... 486
B.3 Example: Passive-aggressive online learning ..... 487
Bibliography ..... 489
Preface
The goal of this text is to focus on a core subset of natural language processing, unified by the concepts of learning and search. A remarkable number of problems in natural language processing can be solved by a compact set of methods:

Search. Viterbi, CKY, minimum spanning tree, shift-reduce, integer linear programming, beam search.

Learning. Maximum-likelihood estimation, logistic regression, perceptron, expectation-maximization, matrix factorization, backpropagation.

This text explains how these methods work, and how they can be applied to a wide range of tasks: document classification, word sense disambiguation, part-of-speech tagging, named entity recognition, parsing, coreference resolution, relation extraction, discourse analysis, language modeling, and machine translation.
Background
Because natural language processing draws on many different intellectual traditions, almost everyone who approaches it feels underprepared in one way or another. Here is a summary of what is expected, and where you can learn more:
Mathematics and machine learning. The text assumes a background in multivariate calculus and linear algebra: vectors, matrices, derivatives, and partial derivatives. You should also be familiar with probability and statistics. A review of basic probability is found in Appendix A, and a minimal review of numerical optimization is found in Appendix B. For linear algebra, the online course and textbook from Strang (2016) provide an excellent review. Deisenroth et al. (2018) are currently preparing a textbook on Mathematics for Machine Learning; a draft can be found online.[1] For an introduction to probabilistic modeling and estimation, see James et al. (2013); for a more advanced and comprehensive discussion of the same material, the classic reference is Hastie et al. (2009).

[1] https://mml-book.github.io/
Linguistics. This book assumes no formal training in linguistics, aside from elementary concepts like nouns and verbs, which you have probably encountered in the study of English grammar. Ideas from linguistics are introduced throughout the text as needed, including discussions of morphology and syntax (chapter 9), semantics (chapters 12 and 13), and discourse (chapter 16). Linguistic issues also arise in the application-focused chapters 4, 8, and 18. A short guide to linguistics for students of natural language processing is offered by Bender (2013); you are encouraged to start there, and then pick up a more comprehensive introductory textbook (e.g., Akmajian et al., 2010; Fromkin et al., 2013).
Computer science. The book is targeted at computer scientists, who are assumed to have taken introductory courses on the analysis of algorithms and complexity theory. In particular, you should be familiar with asymptotic analysis of the time and memory costs of algorithms, and with the basics of dynamic programming. The classic text on algorithms is offered by Cormen et al. (2009); for an introduction to the theory of computation, see Arora and Barak (2009) and Sipser (2012).
How to use this book
After the introduction, the textbook is organized into four main units:
Learning. This section builds up a set of machine learning tools that will be used throughout the other sections. Because the focus is on machine learning, the text representations and linguistic phenomena are mostly simple: "bag-of-words" text classification is treated as a model example. Chapter 4 describes some of the more linguistically interesting applications of word-based text analysis.
Sequences and trees. This section introduces the treatment of language as a structured phenomenon. It describes sequence and tree representations and the algorithms that they facilitate, as well as the limitations that these representations impose. Chapter 9 introduces finite state automata and briefly overviews a context-free account of English syntax.
Meaning. This section takes a broad view of efforts to represent and compute meaning from text, ranging from formal logic to neural word embeddings. It also includes two topics that are closely related to semantics: resolution of ambiguous references, and analysis of multi-sentence discourse structure.
Applications. The final section offers chapter-length treatments on three of the most prominent applications of natural language processing: information extraction, machine translation, and text generation. Each of these applications merits a textbook-length treatment of its own (Koehn, 2009; Grishman, 2012; Reiter and Dale, 2000); the chapters here explain some of the best-known systems using the formalisms and methods built up earlier in the book, while introducing methods such as neural attention.
Each chapter contains some advanced material, which is marked with an asterisk. This material can be safely omitted without causing misunderstandings later on. But even without these advanced sections, the text is too long for a single semester course, so instructors will have to pick and choose among the chapters.
Chapters 1-3 provide building blocks that will be used throughout the book, and chapter 4 describes some critical aspects of the practice of language technology. Language models (chapter 6), sequence labeling (chapter 7), and parsing (chapters 10 and 11) are canonical topics in natural language processing, and distributed word embeddings (chapter 14) have become ubiquitous. Of the applications, machine translation (chapter 18) is the best choice: it is more cohesive than information extraction, and more mature than text generation. Many students will benefit from the review of probability in Appendix A.
• A course focusing on machine learning should add the chapter on unsupervised learning (chapter 5). The chapters on predicate-argument semantics (chapter 13), reference resolution (chapter 15), and text generation (chapter 19) are particularly influenced by recent progress in machine learning, including deep neural networks and learning to search.
• A course with a more linguistic orientation should add the chapters on applications of sequence labeling (chapter 8), formal language theory (chapter 9), semantics (chapters 12 and 13), and discourse (chapter 16).
• For a course with a more applied focus, I recommend the chapters on applications of sequence labeling (chapter 8), predicate-argument semantics (chapter 13), information extraction (chapter 17), and text generation (chapter 19).
Acknowledgments
Several colleagues, students, and friends read early drafts of chapters in their areas of expertise, including Yoav Artzi, Kevin Duh, Heng Ji, Jessy Li, Brendan O'Connor, Yuval Pinter, Shawn Ling Ramirez, Nathan Schneider, Pamela Shapiro, Noah A. Smith, Sandeep Soni, and Luke Zettlemoyer. I also thank the anonymous reviewers, particularly reviewer 4, who provided detailed line-by-line edits and suggestions. The text benefited from high-level discussions with my editor Marie Lufkin Lee, as well as Kevin Murphy, Shawn Ling Ramirez, and Bonnie Webber. In addition, there are many students, colleagues, friends, and family who found mistakes in early drafts, or who recommended key references.
These include: Parminder Bhatia, Kimberly Caras, Jiahao Cai, Justin Chen, Murtaza Dhuliawala, Yantao Du, Barbara Eisenstein, Luiz C. F. Ribeiro, Chris Gu, Joshua Killingsworth, Jonathan May, Taha Merghani, Gus Monod, Raghavendra Murali, Nidish Nair, Brendan O'Connor, Brandon Peck, Yuval Pinter, Nathan Schneider, Jianhao Shen, Zhewei Sun, Rubin Tsui, Ashwin Cunnapakkam Vinjimur, Denny Vrandečić, William Yang Wang, Clay Washington, Ishan Waykul, Xavier Yao, Yuyu Zhang, and also some anonymous commenters. Clay Washington tested several of the programming exercises.
Most of the book was written while I was at Georgia Tech's School of Interactive Computing. I thank the School for its support of this project, and I thank my colleagues there for their help and support at the beginning of my faculty career. I also thank (and apologize to) the many students in Georgia Tech's CS 4650 and 7650 who suffered through early versions of the text. The book is dedicated to my parents.
Notation
As a general rule, words, word counts, and other types of observations are indicated with Roman letters ($a, b, c$); parameters are indicated with Greek letters ($\alpha, \beta, \theta$). Vectors are indicated with bold script for both random variables $\boldsymbol{x}$ and parameters $\boldsymbol{\theta}$. Other useful notations are indicated in the table below.
Basics

$\exp x$   the base-2 exponent, $2^x$
$\log x$   the base-2 logarithm, $\log_2 x$
$\{x_n\}_{n=1}^N$   the set $\{x_1, x_2, \ldots, x_N\}$
$x_i^j$   $x_i$ raised to the power $j$
$x_i^{(j)}$   indexing by both $i$ and $j$
Linear algebra

$\boldsymbol{x}^{(i)}$   a column vector of feature counts for instance $i$, often word counts
$\boldsymbol{x}_{j:k}$   elements $j$ through $k$ (inclusive) of a vector $\boldsymbol{x}$
$[\boldsymbol{x}; \boldsymbol{y}]$   vertical concatenation of two column vectors
$[\boldsymbol{x}, \boldsymbol{y}]$   horizontal concatenation of two column vectors
$\boldsymbol{e}_n$   a "one-hot" vector with a value of 1 at position $n$, and zero everywhere else
$\boldsymbol{\theta}^\top$   the transpose of a column vector $\boldsymbol{\theta}$
$\boldsymbol{\theta} \cdot \boldsymbol{x}^{(i)}$   the dot product $\sum_{j=1}^{N} \theta_j \times x_j^{(i)}$
$\mathbf{X}$   a matrix
$x_{i,j}$   row $i$, column $j$ of matrix $\mathbf{X}$
$\text{Diag}(\boldsymbol{x})$   a matrix with $\boldsymbol{x}$ on the diagonal, e.g., $\begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_3 \end{bmatrix}$
$\mathbf{X}^{-1}$   the inverse of matrix $\mathbf{X}$
Text datasets

$w_m$   word token at position $m$
$N$   number of training instances
$M$   length of a sequence (of words or tags)
$V$   number of words in vocabulary
$y^{(i)}$   the true label for instance $i$
$\hat{y}$   a predicted label
$\mathcal{Y}$   the set of all possible labels
$K$   number of possible labels, $K = |\mathcal{Y}|$
$\square$   the start token
$\blacksquare$   the stop token
$\boldsymbol{y}^{(i)}$   a structured label for instance $i$, such as a tag sequence
$\mathcal{Y}(\boldsymbol{w})$   the set of possible labelings for the word sequence $\boldsymbol{w}$
$\lozenge$   the start tag
$\blacklozenge$   the stop tag
Probabilities

$\Pr(A)$   probability of event $A$
$\Pr(A \mid B)$   probability of event $A$, conditioned on event $B$
$p_B(b)$   the marginal probability of random variable $B$ taking value $b$; written $p(b)$ when the choice of random variable is clear from context
$p_{B \mid A}(b \mid a)$   the probability of random variable $B$ taking value $b$, conditioned on $A$ taking value $a$; written $p(b \mid a)$ when clear from context
$A \sim p$   the random variable $A$ is distributed according to distribution $p$. For example, $X \sim \mathcal{N}(0, 1)$ states that the random variable $X$ is drawn from a normal distribution with zero mean and unit variance.
$A \mid B \sim p$   conditioned on the random variable $B$, $A$ is distributed according to $p$
Machine learning

$\Psi(\boldsymbol{x}^{(i)}, y)$   the score for assigning label $y$ to instance $i$
$f(\boldsymbol{x}^{(i)}, y)$   the feature vector for instance $i$ with label $y$
$\boldsymbol{\theta}$   a (column) vector of weights
$\ell^{(i)}$   loss on an individual instance $i$
$L$   objective function for an entire dataset
$\mathcal{L}$   log-likelihood of a dataset
$\lambda$   the amount of regularization
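To make this notation concrete, here is a minimal Python sketch of the bag-of-words representation and the linear scoring function: f(x, y) conjoins word counts with a candidate label, and Ψ(x, y) = θ · f(x, y). The feature names and toy weights are invented for illustration; chapter 2 develops this model properly.

```python
from collections import Counter

def f(x, y):
    """Feature vector f(x, y): conjoin each word count in the bag
    of words x with the candidate label y."""
    return Counter({(y, word): count for word, count in x.items()})

def score(theta, x, y):
    """The linear score Psi(x, y) = theta . f(x, y)."""
    return sum(theta.get(feat, 0.0) * value
               for feat, value in f(x, y).items())

# A toy instance: x is a bag of word counts; theta is a sparse
# weight vector stored as a dictionary. Both are invented here.
x = Counter("it was a bright cold day".split())
theta = {("POS", "bright"): 1.5, ("NEG", "cold"): 0.9}
print(score(theta, x, "POS"))  # 1.5: only ("POS", "bright") fires
print(score(theta, x, "NEG"))  # 0.9: only ("NEG", "cold") fires
```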
Chapter1 Introduction
Natural language processing is the set of methods for making human language accessible to computers. In the past decade, natural language processing has become embedded in our daily lives: automatic machine translation is ubiquitous on the web and in social media; text classification keeps emails from collapsing under a deluge of spam; search engines have moved beyond string matching and network analysis to a high degree of linguistic sophistication; dialog systems provide an increasingly common and effective way to get and share information.
These diverse applications are based on a common set of ideas, drawing on algorithms, linguistics, logic, statistics, and more. The goal of this text is to provide a survey of these foundations. The technical fun starts in the next chapter; the rest of this current chapter situates natural language processing with respect to other intellectual disciplines, identifies some high-level themes in contemporary natural language processing, and advises the reader on how best to approach the subject.
1.1 Natural language processing and its neighbors
Natural language processing draws on many other intellectual traditions, from formal linguistics to statistical physics. This section briefly situates natural language processing with respect to some of its closest neighbors.
Computational Linguistics. Most of the meetings and journals that host natural language processing research bear the name "computational linguistics", and the terms may be thought of as essentially synonymous. But while there is substantial overlap, there is an important difference in focus. In linguistics, language is the object of study. Computational methods may be brought to bear, just as in scientific disciplines like computational biology and computational astronomy, but they play only a supporting role. In contrast, natural language processing is focused on the design and analysis of computational algorithms and representations for processing natural human language. The goal of natural language processing is to provide new computational capabilities around human language: for example, extracting information from texts, translating between languages, answering questions, holding a conversation, taking instructions, and so on. Fundamental linguistic insights may be crucial for accomplishing these tasks, but success is ultimately measured by whether and how well the job gets done.
Machine Learning. Contemporary approaches to natural language processing rely heavily on machine learning, which makes it possible to build complex computer programs from examples. Machine learning provides an array of general techniques for tasks like converting a sequence of discrete tokens in one vocabulary to a sequence of discrete tokens in another vocabulary — a generalization of what one might informally call "translation." Much of today's natural language processing research can be thought of as applied machine learning. However, natural language processing has characteristics that distinguish it from many of machine learning's other application domains.
• Unlike images or audio, text data is fundamentally discrete, with meaning created by combinatorial arrangements of symbolic units. This is particularly consequential for applications in which text is the output, such as translation and summarization, because it is not possible to gradually approach an optimal solution.
• Although the set of words is discrete, new words are always being created. Furthermore, the distribution over words (and other linguistic elements) resembles that of a power law[1] (Zipf, 1949): there will be a few words that are very frequent, and a long tail of words that are rare; the sketch after this list illustrates the resulting rank-frequency pattern. A consequence is that natural language processing algorithms must be especially robust to observations that do not occur in the training data.
• Language is compositional: units such as words can combine to create phrases, which can combine by the very same principles to create larger phrases. For example, a noun phrase can be created by combining a smaller noun phrase with a prepositional phrase, as in the whiteness of the whale. The prepositional phrase is created by combining a preposition (in this case, of) with another noun phrase (the whale). In this way, it is possible to create arbitrarily long phrases, such as,

(1.1) ...huge globular pieces of the whale of the bigness of a human head.[2]

The meaning of such a phrase must be analyzed in accord with the underlying hierarchical structure. In this case, huge globular pieces of the whale acts as a single noun phrase, which is conjoined with the prepositional phrase of the bigness of a human head. The interpretation would be different if instead, huge globular pieces were conjoined with the prepositional phrase of the whale of the bigness of a human head, implying a disappointingly small whale. Even though text appears as a sequence, machine learning methods must account for its implicit recursive structure.

[1] Throughout the text, boldface will be used to indicate keywords that appear in the index.
[2] Throughout the text, this notation will be used to introduce linguistic examples.
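The rank-frequency pattern from the second point above can be checked directly. The following Python sketch assumes a plain-text corpus at the placeholder path corpus.txt; under Zipf's law, frequency falls off roughly as the inverse of rank, so the product of rank and frequency should stay roughly constant at the top of the list, while a long tail of words appears only once.

```python
from collections import Counter

# Placeholder path: any sufficiently large plain-text corpus will do.
with open("corpus.txt") as corpus:
    counts = Counter(corpus.read().lower().split())

# Zipf's law predicts frequency roughly proportional to 1/rank, so
# the rank * frequency column should stay roughly constant at the top.
for rank, (word, count) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>4}  {word:<12} {count:>8} {rank * count:>10}")

# The long tail: many word types are observed exactly once.
singletons = sum(1 for c in counts.values() if c == 1)
print(f"{singletons} of {len(counts)} word types appear exactly once")
```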
Artificial Intelligence. The goal of artificial intelligence is to build software and robots with the same range of abilities as humans (Russell and Norvig, 2009). Natural language processing is relevant to this goal in several ways. On the most basic level, the capacity for language is one of the central features of human intelligence, and is therefore a prerequisite for artificial intelligence.[3] Second, much of artificial intelligence research is dedicated to the development of systems that can reason from premises to a conclusion, but such algorithms are only as good as what they know (Dreyfus, 1992). Natural language processing is a potential solution to the "knowledge bottleneck", by acquiring knowledge from texts, and perhaps also from conversations. This idea goes all the way back to Turing's 1950 paper Computing Machinery and Intelligence, which proposed the Turing test for determining whether artificial intelligence had been achieved (Turing, 2009).
Conversely, reasoning is sometimes essential for basic tasks of language processing, such as resolving a pronoun. Winograd schemas are examples in which a single word changes the likely referent of a pronoun, in a way that seems to require knowledge and reasoning to decode (Levesque et al., 2011). For example,

(1.2) The trophy doesn't fit into the brown suitcase because it is too [small/large].

When the final word is small, then the pronoun it refers to the suitcase; when the final word is large, then it refers to the trophy. Solving this example requires spatial reasoning; other schemas require reasoning about actions and their effects, emotions and intentions, and social conventions.
Such examples demonstrate that natural language understanding cannot be achieved in isolation from knowledge and reasoning. Yet the history of artificial intelligence has been one of increasing specialization: with the growing volume of research in subdisciplines such as natural language processing, machine learning, and computer vision, it is difficult for anyone to maintain expertise across the entire field. Still, recent work has demonstrated interesting connections between natural language processing and other areas of AI, including computer vision (e.g., Antol et al., 2015) and game playing (e.g., Branavan et al., 2009). The dominance of machine learning throughout artificial intelligence has led to a broad consensus on representations such as graphical models and computation graphs, and on algorithms such as backpropagation and combinatorial optimization. Many of the algorithms and representations covered in this text are part of this consensus.

[3] This view is shared by some, but not all, prominent researchers in artificial intelligence. Michael Jordan, a specialist in machine learning, has said that if he had a billion dollars to spend on any large research project, he would spend it on natural language processing (https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/). On the other hand, in a public discussion about the future of artificial intelligence in February 2018, computer vision researcher Yann LeCun argued that despite its many practical applications, language is perhaps "number 300" in the priority list for artificial intelligence research, and that it would be a great achievement if AI could attain the capabilities of an orangutan, which do not include language (http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html).
Computer Science. The discrete and recursive nature of natural language invites the application of theoretical ideas from computer science. Linguists such as Chomsky and Montague have shown how formal language theory can help to explain the syntax and semantics of natural language. Theoretical models such as finite-state and pushdown automata are the basis for many practical natural language processing systems. Algorithms for searching the combinatorial space of analyses of natural language utterances can be analyzed in terms of their computational complexity, and theoretically motivated approximations can sometimes be applied.
The study of computer systems is also relevant to natural language processing. Large datasets of unlabeled text can be processed more quickly by parallelization techniques like MapReduce (Dean and Ghemawat, 2008; Lin and Dyer, 2010); high-volume data sources such as social media can be summarized efficiently by approximate streaming and sketching techniques (Goyal et al., 2009). When deep neural networks are implemented in production systems, it is possible to eke out speed gains using techniques such as reduced-precision arithmetic (Wu et al., 2016). Many classical natural language processing algorithms are not naturally suited to graphics processing unit (GPU) parallelization, suggesting directions for further research at the intersection of natural language processing and computing hardware (Yi et al., 2011).
Speech Processing. Natural language is often communicated in spoken form, and speech recognition is the task of converting an audio signal to text. From one perspective, this is a signal processing problem, which might be viewed as a preprocessing step before natural language processing can be applied. However, context plays a critical role in speech recognition by human listeners: knowledge of the surrounding words influences perception and helps to correct for noise (Miller et al., 1951). For this reason, speech recognition is often integrated with text analysis, particularly with statistical language models, which quantify the probability of a sequence of text (see chapter 6). Beyond speech recognition, the broader field of speech processing includes the study of speech-based dialogue systems, which are briefly discussed in chapter 19. Historically, speech processing has often been pursued in electrical engineering departments, while natural language processing has been the purview of computer scientists. For this reason, the extent of interaction between these two disciplines is less than it might otherwise be.
Ethics. As machine learning and artificial intelligence become increasingly ubiquitous, it is crucial to understand how their benefits, costs, and risks are distributed across different kinds of people. Natural language processing raises some particularly salient issues around ethics, fairness, and accountability:
Access. Who is natural language processing designed to serve? For example, whose language is translated from, and whose language is translated to?

Bias. Does language technology learn to replicate social biases from text corpora, and does it reinforce these biases as seemingly objective computational conclusions?

Labor. Whose text and speech comprise the datasets that power natural language processing, and who performs the annotations? Are the benefits of this technology shared with all the people whose work makes it possible?

Privacy and internet freedom. What is the impact of large-scale text processing on the right to free and private communication? What is the potential role of natural language processing in regimes of censorship or surveillance?
This text lightly touches on issues related to fairness and bias in subsection 14.6.3 and subsection 18.1.1, but these issues are worthy of a book of their own. For more from within the field of computational linguistics, see the papers from the annual workshop on Ethics in Natural Language Processing (Hovy et al., 2017; Alfano et al., 2018). For an outside perspective on ethical issues relating to data science at large, see boyd and Crawford (2012).
Others. Natural language processing plays a significant role in emerging interdisciplinary fields like computational social science and the digital humanities. Text classification (chapter 4), clustering (chapter 5), and information extraction (chapter 17) are particularly useful tools; another is probabilistic topic models (Blei, 2012), which are not covered in this text. Information retrieval (Manning et al., 2008) makes use of similar tools, and conversely, techniques such as latent semantic analysis (section 14.3) have roots in information retrieval. Text mining is sometimes used to refer to the application of data mining techniques, especially classification and clustering, to text. While there is no clear distinction between text mining and natural language processing (nor between data mining and machine learning), text mining is typically less concerned with linguistic structure, and more interested in fast, scalable algorithms.
1.2 Three themes in natural language processing
Natural language processing covers a diverse range of tasks, methods, and linguistic phenomena. But despite the apparent incommensurability between, say, the summarization of scientific articles (section 16.3.4) and the identification of suffix patterns in Spanish verbs (section 9.1.4), some general themes emerge. The remainder of the introduction focuses on these themes, which will recur in various forms through the text. Each theme can be expressed as an opposition between two extreme viewpoints on how to process natural language. The methods discussed in the text can usually be placed somewhere on the continuum between these two extremes.
1.2.1 Learning and knowledge
A recurring topic of debate is the relative importance of machine learning and linguistic knowledge. On one extreme, advocates of "natural language processing from scratch" (Collobert et al., 2011) propose to use machine learning to train end-to-end systems that transmute raw text into any desired output structure: e.g., a summary, database, or translation. On the other extreme, the core work of natural language processing is sometimes taken to be transforming text into a stack of general-purpose linguistic structures: from subword units called morphemes, to word-level parts-of-speech, to tree-structured representations of grammar, and beyond, to logic-based representations of meaning. In theory, these general-purpose structures should then be able to support any desired application.
The end-to-end approach has been buoyed by recent results in computer vision and speech recognition, in which advances in machine learning have swept away expert-engineered representations based on the fundamentals of optics and phonology (Krizhevsky et al., 2012; Graves and Jaitly, 2014). But while machine learning is an element of nearly every contemporary approach to natural language processing, linguistic representations such as syntax trees have not yet gone the way of the visual edge detector or the auditory triphone. Linguists have argued for the existence of a "language faculty" in all human beings, which encodes a set of abstractions specially designed to facilitate the understanding and production of language. The argument for the existence of such a language faculty is based on the observation that children learn language faster and from fewer examples than would be possible if language was learned from experience alone.[4] From a practical standpoint, linguistic structure seems to be particularly important in scenarios where training data is limited.

[4] The Language Instinct (Pinker, 2003) articulates these arguments in an engaging and popular style. For arguments against the innateness of language, see Elman et al. (1998).

There are a number of ways in which knowledge and learning can be combined in natural language processing. Many supervised learning systems make use of carefully engineered features, which transform the data into a representation that can facilitate learning. For example, in a task like search, it may be useful to identify each word's stem, so that a system can more easily generalize across related terms such as whale, whales, whalers, and whaling. (This issue is relatively benign in English, as compared to the many other languages which include much more elaborate systems of prefixes and suffixes.) Such features could be obtained from a hand-crafted resource, like a dictionary that maps each word to a single root form. Alternatively, features can be obtained from the output of a general-purpose language processing system, such as a parser or part-of-speech tagger, which may itself be built on supervised machine learning.
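As an illustration of this kind of feature engineering, here is a minimal sketch of a deliberately crude suffix-stripping stemmer. The suffix list is an invented toy, not a published algorithm such as the Porter stemmer, and a real system would use curated rules or a root-form dictionary as described above.

```python
def stem(word):
    """Strip a few common English suffixes. A toy heuristic: real
    systems use curated rules or a dictionary of root forms."""
    for suffix in ("ing", "ers", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["whale", "whales", "whalers", "whaling"]])
# ['whale', 'whale', 'whal', 'whal']: imperfect, but the four
# related surface forms now collapse onto two stems instead of four.
```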
Another synthesis of learning and knowledge is in model structure: building machine learning models whose architectures are inspired by linguistic theories. For example, the organization of sentences is often described as compositional, with meaning of larger units gradually constructed from the meaning of their smaller constituents. This idea can be built into the architecture of a deep neural network, which is then trained using contemporary deep learning techniques (Dyer et al., 2016).
The debate about the relative importance of machine learning and linguistic knowledge sometimes becomes heated. No machine learning specialist likes to be told that their engineering methodology is unscientific alchemy;[5] nor does a linguist want to hear that the search for general linguistic principles and structures has been made irrelevant by big data. Yet there is clearly room for both types of research: we need to know how far we can go with end-to-end learning alone, while at the same time, we continue the search for linguistic representations that generalize across applications, scenarios, and languages. For more on the history of this debate, see Church (2011); for an optimistic view of the potential symbiosis between computational linguistics and deep learning, see Manning (2015).

[5] Ali Rahimi argued that much of deep learning research was similar to "alchemy" in a presentation at the 2017 conference on Neural Information Processing Systems. He was advocating for more learning theory, not more linguistics.
1.2.2 Search and learning
Many natural language processing problems can be written mathematically in the form of optimization,[6]

$$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta) \qquad [1.1]$$

where,

• $x$ is the input, which is an element of a set $\mathcal{X}$;
• $y$ is the output, which is an element of a set $\mathcal{Y}(x)$;
• $\Psi$ is a scoring function (also called the model), which maps from the set $\mathcal{X} \times \mathcal{Y}$ to the real numbers;
• $\theta$ is a vector of parameters for $\Psi$;
• $\hat{y}$ is the predicted output, which is chosen to maximize the scoring function.

[6] Throughout this text, equations will be numbered by square brackets, and linguistic examples will be numbered by parentheses.
This basic structure can be applied to a huge range of problems. For example, the input $x$ might be a social media post, and the output $y$ might be a labeling of the emotional sentiment expressed by the author (chapter 4); or $x$ could be a sentence in French, and the output $y$ could be a sentence in Tamil (chapter 18); or $x$ might be a sentence in English, and $y$ might be a representation of the syntactic structure of the sentence (chapter 10); or $x$ might be a news article and $y$ might be a structured record of the events that the article describes (chapter 17).
This formulation reflects an implicit decision that language processing algorithms will have two distinct modules, sketched in code after the two definitions below:
Search. The search module is responsible for computing the argmax of the function $\Psi$. In other words, it finds the output $\hat{y}$ that gets the best score with respect to the input $x$. This is easy when the search space $\mathcal{Y}(x)$ is small enough to enumerate, or when the scoring function $\Psi$ has a convenient decomposition into parts. In many cases, we will want to work with scoring functions that do not have these properties, motivating the use of more sophisticated search algorithms, such as bottom-up dynamic programming (section 10.1) and beam search (section 11.3.1). Because the outputs are usually discrete in language processing problems, search often relies on the machinery of combinatorial optimization.
Learning. The learning module is responsible for finding the parameters $\theta$. This is typically (but not always) done by processing a large dataset of labeled examples, $\{(x^{(i)}, y^{(i)})\}_{i=1}^N$. Like search, learning is also approached through the framework of optimization, as we will see in chapter 2. Because the parameters are usually continuous, learning algorithms generally rely on numerical optimization to identify vectors of real-valued parameters that optimize some function of the model and the labeled data. Some basic principles of numerical optimization are reviewed in Appendix B.
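Here is a minimal sketch of this decomposition for a bag-of-words classifier with an enumerable label set. The search module computes the argmax by brute-force enumeration; the learning module uses perceptron updates as a stand-in for the optimization methods of chapter 2. The toy dataset and all names are invented for this example.

```python
def score(theta, x, y):
    """Linear model: Psi(x, y) sums weights on (label, word) pairs."""
    return sum(theta.get((y, word), 0.0) * count
               for word, count in x.items())

def predict(theta, x, labels):
    """Search module: compute the argmax of Psi by enumerating Y(x)."""
    return max(labels, key=lambda y: score(theta, x, y))

def learn(train, labels, epochs=5):
    """Learning module: set the parameters theta from labeled examples,
    here with simple perceptron updates (developed in chapter 2)."""
    theta = {}
    for _ in range(epochs):
        for x, y in train:
            y_hat = predict(theta, x, labels)
            if y_hat != y:  # on an error, shift weight toward the truth
                for word, count in x.items():
                    theta[(y, word)] = theta.get((y, word), 0.0) + count
                    theta[(y_hat, word)] = theta.get((y_hat, word), 0.0) - count
    return theta

# Toy sentiment data: bags of words paired with discrete labels.
train = [({"great": 2, "fun": 1}, "POS"), ({"dull": 1, "slow": 2}, "NEG")]
theta = learn(train, labels=["POS", "NEG"])
print(predict(theta, {"fun": 1, "slow": 1}, ["POS", "NEG"]))  # NEG
```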
The division of natural language processing into separate modules for search and learning makes it possible to reuse generic algorithms across many tasks and models. Much of the work of natural language processing can be focused on the design of the model $\Psi$ — identifying and formalizing the linguistic phenomena that are relevant to the task at hand — while reaping the benefits of decades of progress in search, optimization, and learning. This textbook will describe several classes of scoring functions, and the corresponding algorithms for search and learning.
When a model is capable of making subtle linguistic distinctions, it is said to be expressive. Expressiveness is often traded off against efficiency of search and learning. For example, a word-to-word translation model makes search and learning easy, but it is not expressive enough to distinguish good translations from bad ones. Many of the most important problems in natural language processing seem to require expressive models, in which the complexity of search grows exponentially with the size of the input. In these models, exact search is usually impossible. Intractability threatens the neat modular decomposition between search and learning: if search requires a set of heuristic approximations, then it may be advantageous to learn a model that performs well under these specific heuristics. This has motivated some researchers to take a more integrated approach to search and learning, as briefly mentioned in chapters 11 and 15.
1.2.3 Relational, compositional, and distributional perspectives
Any element of language — a word, a phrase, a sentence, or even a sound — can be described from at least three perspectives. Consider the word journalist. A journalist is a subcategory of a profession, and an anchorwoman is a subcategory of journalist; furthermore, a journalist performs journalism, which is often, but not always, a subcategory of writing. This relational perspective on meaning is the basis for semantic ontologies such as WordNet (Fellbaum, 2010), which enumerate the relations that hold between words and other elementary semantic units. The power of the relational perspective is illustrated by the following example:
(1.3) Umashanthi interviewed Ana. She works for the college newspaper.
Who works for the college newspaper? The word journalist, while not stated in the example, implicitly links the interview to the newspaper, making Umashanthi the most likely referent for the pronoun. (A general discussion of how to resolve pronouns is found in chapter 15.)
Yet despite the inferential power of the relational perspective, it is not easy to formalize computationally. Exactly which elements are to be related? Are journalists and reporters distinct, or should we group them into a single unit? Is the kind of interview performed by a journalist the same as the kind that one undergoes when applying for a job? Ontology designers face many such thorny questions, and the project of ontology design hearkens back to Borges' (1993) Celestial Emporium of Benevolent Knowledge, which divides animals into:

(a) belonging to the emperor; (b) embalmed; (c) tame; (d) suckling pigs; (e) sirens; (f) fabulous; (g) stray dogs; (h) included in the present classification; (i) frenzied; (j) innumerable; (k) drawn with a very fine camel hair brush; (l) et cetera; (m) having just broken the water pitcher; (n) that from a long way off resemble flies.
Difficulties in ontology construction have led some linguists to argue that there is no task-independent way to partition up word meanings (Kilgarriff, 1997).
Some problems are easier. Each member in a group of journalists is a journalist: the -s suffix distinguishes the plural meaning from the singular in most of the nouns in English. Similarly, a journalist can be thought of, perhaps colloquially, as someone who produces or works on a journal. (Taking this approach even further, the word journal derives from the French jour+nal, or day+ly=daily.) In this way, the meaning of a word is constructed from the constituent parts — the principle of compositionality. This principle can be applied to larger units: phrases, sentences, and beyond. Indeed, one of the great strengths of the compositional view of meaning is that it provides a roadmap for understanding entire texts and dialogues through a single analytic lens, grounding out in the smallest parts of individual words.
But alongside journalists and anti-parliamentarians, there are many words that seem to be linguistic atoms: think, for example, of whale, blubber, and Nantucket. Idiomatic phrases like kick the bucket and shoot the breeze have meanings that are quite different from the sum of their parts (Sag et al., 2002). Composition is of little help for such words and expressions, but their meanings can be ascertained — or at least approximated — from the contexts in which they appear. Take, for example, blubber, which appears in such contexts as:

(1.4) a. The blubber served them as fuel.
b. ...extracting it from the blubber of the large fish...
c. Amongst oily substances, blubber has been employed as a manure.

These contexts form the distributional properties of the word blubber, and they link it to words which can appear in similar constructions: fat, pelts, and barnacles. This distributional perspective makes it possible to learn about meaning from unlabeled data alone; unlike relational and compositional semantics, no manual annotation or expert knowledge is required. Distributional semantics is thus capable of covering a huge range of linguistic phenomena. However, it lacks precision: blubber is similar to fat in one sense, to pelts in another sense, and to barnacles in still another. The question of why all these words tend to appear in the same contexts is left unanswered.
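As a minimal sketch of the distributional perspective, the following code represents each word by the counts of words appearing within two positions of it, and compares words by the cosine similarity of those context vectors. The sentences echo the examples in (1.4), and the window size of two is an arbitrary illustrative choice.

```python
from collections import Counter, defaultdict
import math

sentences = [
    "the blubber served them as fuel",
    "extracting it from the blubber of the large fish",
    "amongst oily substances blubber has been employed as a manure",
]

# Represent each word by the counts of its neighbors within two positions.
contexts = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                contexts[word][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

# Words that share contexts with "blubber" get nonzero similarity.
print(cosine(contexts["blubber"], contexts["oily"]))
```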
The relational, compositional, and distributional perspectives all contribute to our understanding of linguistic meaning, and all three appear to be critical to natural language processing. Yet they are uneasy collaborators, requiring seemingly incompatible representations and algorithmic approaches. This text presents some of the best known and most successful methods for working with each of these representations, but future research may reveal new ways to combine them.
PartI Learning