Natural Language Processing

Jacob Eisenstein

October 15, 2018

Under contract with MIT Press, shared under CC-BY-NC-ND license.
Contents

1 Introduction ..... 1
1.1 Natural language processing and its neighbors ..... 1
1.2 Three themes in natural language processing ..... 6
1.2.1 Learning and knowledge ..... 6
1.2.2 Search and learning ..... 7
1.2.3 Relational, compositional, and distributional perspectives ..... 9
I Learning ..... 11

2 Linear text classification ..... 13
2.1 The bag of words ..... 13
2.2 Naïve Bayes ..... 17
2.2.1 Types and tokens ..... 19
2.2.2 Prediction ..... 20
2.2.3 Estimation ..... 21
2.2.4 Smoothing ..... 22
2.2.5 Setting hyperparameters ..... 23
2.3 Discriminative learning ..... 24
2.3.1 Perceptron ..... 25
2.3.2 Averaged perceptron ..... 27
2.4 Loss functions and large-margin classification ..... 27
2.4.1 Online large margin classification ..... 30
2.4.2 *Derivation of the online support vector machine ..... 32
2.5 Logistic regression ..... 35
2.7 *Additional topics in classification ..... 41
2.7.1 Feature selection by regularization ..... 41
2.7.2 Other views of logistic regression ..... 41
2.8 Summary of learning algorithms ..... 43
3 Nonlinear classification ..... 47
3.3.1 Backpropagation ..... 55
3.3.2 Regularization and dropout ..... 57
4 Linguistic applications of classification ..... 69
4.1 Sentiment and opinion analysis ..... 69
4.1.1 Related problems ..... 70
4.1.2 Alternative approaches to sentiment analysis ..... 72
4.2 Word sense disambiguation ..... 73
4.2.1 How many word senses? ..... 74
4.2.2 Word sense disambiguation as classification ..... 75
4.3 Design decisions for text classification ..... 76
4.3.1 What is a word? ..... 76
4.3.2 How many words? ..... 79
4.3.3 Count or binary? ..... 80
4.4 Evaluating classifiers ..... 80
4.4.1 Precision, recall, and F-measure ..... 81
4.4.2 Threshold-free metrics ..... 83
4.4.3 Classifier comparison and statistical significance ..... 84
4.4.4 *Multiple comparisons ..... 87
4.5 Building datasets ..... 88
4.5.1 Metadata as labels ..... 88
4.5.2 Labeling data ..... 88
5 Learning without supervision ..... 95
5.1 Unsupervised learning ..... 95
5.1.1 K-means clustering ..... 96
5.1.2 Expectation-Maximization (EM) ..... 98
5.1.3 EM as an optimization algorithm ..... 102
5.1.4 How many clusters? ..... 103
5.2 Applications of expectation-maximization ..... 104
5.2.1 Word sense induction ..... 104
5.2.2 Semi-supervised learning ..... 105
5.2.3 Multi-component modeling ..... 106
5.3 Semi-supervised learning ..... 107
5.3.1 Multi-view learning ..... 108
5.3.2 Graph-based algorithms ..... 109
5.4 Domain adaptation ..... 110
5.4.1 Supervised domain adaptation ..... 111
5.4.2 Unsupervised domain adaptation ..... 112
5.5 *Other approaches to learning with latent variables ..... 114
5.5.1 Sampling ..... 115
5.5.2 Spectral learning ..... 117
II Sequences and trees ..... 123

6 Language models ..... 125
6.1 N-gram language models ..... 126
6.2 Smoothing and discounting ..... 129
6.2.1 Smoothing ..... 129
6.2.2 Discounting and backoff ..... 130
6.2.3 *Interpolation ..... 131
6.2.4 *Kneser-Ney smoothing ..... 133
6.3 Recurrent neural network language models ..... 133
6.3.1 Backpropagation through time ..... 136
6.3.2 Hyperparameters ..... 137
6.3.3 Gated recurrent neural networks ..... 137
6.4 Evaluating language models ..... 139
6.4.1 Held-out likelihood ..... 139
6.4.2 Perplexity ..... 140
6.5 Out-of-vocabulary words ..... 141
7 Sequence labeling ..... 145
7.1 Sequence labeling as classification ..... 145
7.2 Sequence labeling as structure prediction ..... 147
7.3 The Viterbi algorithm ..... 149
7.3.1 Example ..... 152
7.3.2 Higher-order features ..... 153
7.4 Hidden Markov Models ..... 153
7.4.1 Estimation ..... 155
7.4.2 Inference ..... 155
7.5 Discriminative sequence labeling with features ..... 157
7.5.1 Structured perceptron ..... 160
7.5.2 Structured support vector machines ..... 160
7.5.3 Conditional random fields ..... 162
7.6 Neural sequence labeling ..... 167
7.6.1 Recurrent neural networks ..... 167
7.6.2 Character-level models ..... 169
7.6.3 Convolutional neural networks for sequence labeling ..... 170
7.7 *Unsupervised sequence labeling ..... 170
7.7.1 Linear dynamical systems ..... 172
7.7.2 Alternative unsupervised learning methods ..... 172
7.7.3 Semiring notation and the generalized Viterbi algorithm ..... 172
8 Applications of sequence labeling ..... 175
8.1 Part-of-speech tagging ..... 175
8.1.1 Parts-of-speech ..... 176
8.1.2 Accurate part-of-speech tagging ..... 180
8.2 Morphosyntactic attributes ..... 182
8.3 Named entity recognition ..... 183
8.4 Tokenization ..... 185
8.5 Code switching ..... 186
8.6 Dialogue acts ..... 187
9 Formal language theory ..... 191
9.1 Regular languages ..... 192
9.1.1 Finite state acceptors ..... 193
9.1.2 Morphology as a regular language ..... 194
9.1.3 Weighted finite state acceptors ..... 196
9.1.4 Finite state transducers ..... 201
9.1.5 *Learning weighted finite state automata ..... 206
9.2 Context-free languages ..... 207
9.2.1 Context-free grammars ..... 208
9.2.2 Natural language syntax as a context-free language ..... 211
9.2.3 A phrase-structure grammar for English ..... 213
9.2.4 Grammatical ambiguity ..... 218
9.3 *Mildly context-sensitive languages ..... 218
9.3.1 Context-sensitive phenomena in natural language ..... 219
9.3.2 Combinatory categorial grammar ..... 220
10 Context-free parsing ..... 225
10.1 Deterministic bottom-up parsing ..... 226
10.1.1 Recovering the parse tree ..... 227
10.1.2 Non-binary productions ..... 227
10.1.3 Complexity ..... 229
10.2 Ambiguity ..... 229
10.2.1 Parser evaluation ..... 230
10.2.2 Local solutions ..... 231
10.3 Weighted context-free grammars ..... 232
10.3.1 Parsing with weighted context-free grammars ..... 234
10.3.2 Probabilistic context-free grammars ..... 235
10.3.3 *Semiring weighted context-free grammars ..... 237
10.4 Learning weighted context-free grammars ..... 238
10.4.1 Probabilistic context-free grammars ..... 238
10.4.2 Feature-based parsing ..... 239
10.4.3 *Conditional random field parsing ..... 240
10.4.4 Neural context-free grammars ..... 242
10.5 Grammar refinement ..... 242
10.5.1 Parent annotations and other tree transformations ..... 243
10.5.2 Lexicalized context-free grammars ..... 244
10.5.3 *Refinement grammars ..... 248
10.6 Beyond context-free parsing ..... 250
10.6.1 Reranking ..... 250
10.6.2 Transition-based parsing ..... 251
11 Dependency parsing ..... 257
11.1 Dependency grammar ..... 257
11.1.1 Heads and dependents ..... 258
11.1.2 Labeled dependencies ..... 259
11.1.3 Dependency subtrees and constituents ..... 260
11.2 Graph-based dependency parsing ..... 262
11.2.1 Graph-based parsing algorithms ..... 264
11.2.2 Computing scores for dependency arcs ..... 265
11.2.3 Learning ..... 267
11.3 Transition-based dependency parsing ..... 268
11.3.1 Transition systems for dependency parsing ..... 269
11.3.2 Scoring functions for transition-based parsers ..... 273
11.3.3 Learning to parse ..... 274
11.4 Applications ..... 277
III Meaning ..... 283

12 Logical semantics ..... 285
12.1 Meaning and denotation ..... 286
12.2 Logical representations of meaning ..... 287
12.2.1 Propositional logic ..... 287
12.2.2 First-order logic ..... 288
12.3 Semantic parsing and the lambda calculus ..... 291
12.3.1 The lambda calculus ..... 292
12.3.2 Quantification ..... 293
12.4 Learning semantic parsers ..... 296
12.4.1 Learning from derivations ..... 297
12.4.2 Learning from logical forms ..... 299
12.4.3 Learning from denotations ..... 301
13 Predicate-argument semantics ..... 305
13.1 Semantic roles ..... 307
13.1.1 VerbNet ..... 308
13.1.2 Proto-roles and PropBank ..... 309
13.1.3 FrameNet ..... 310
13.2 Semantic role labeling ..... 312
13.2.1 Semantic role labeling as classification ..... 312
13.2.2 Semantic role labeling as constrained optimization ..... 315
13.2.3 Neural semantic role labeling ..... 317
13.3 Abstract Meaning Representation ..... 318
13.3.1 AMR parsing ..... 321
14 Distributional and distributed semantics ..... 325
14.1 The distributional hypothesis ..... 325
14.2 Design decisions for word representations ..... 327
14.2.1 Representation ..... 327
14.2.2 Context ..... 328
14.2.3 Estimation ..... 329
14.3 Latent semantic analysis ..... 329
14.4 Brown clusters ..... 331
14.5 Neural word embeddings ..... 334
14.5.1 Continuous bag-of-words (CBOW) ..... 334
14.5.2 Skipgrams ..... 335
14.5.3 Computational complexity ..... 335
14.5.4 Word embeddings as matrix factorization ..... 337
14.6 Evaluating word embeddings ..... 338
14.6.1 Intrinsic evaluations ..... 339
14.6.2 Extrinsic evaluations ..... 339
14.6.3 Fairness and bias ..... 340
14.7 Distributed representations beyond distributional statistics ..... 341
14.7.1 Word-internal structure ..... 341
14.7.2 Lexical semantic resources ..... 343
14.8 Distributed representations of multiword units ..... 344
14.8.1 Purely distributional methods ..... 344
14.8.2 Distributional-compositional hybrids ..... 345
14.8.3 Supervised compositional methods ..... 346
14.8.4 Hybrid distributed-symbolic representations ..... 346
15 Reference resolution ..... 351
15.1 Forms of referring expressions ..... 352
15.1.1 Pronouns ..... 352
15.1.2 Proper nouns ..... 357
15.1.3 Nominals ..... 357
15.2 Algorithms for coreference resolution ..... 358
15.2.1 Mention-pair models ..... 359
15.2.2 Mention-ranking models ..... 360
15.2.3 Transitive closure in mention-based models ..... 361
15.2.4 Entity-based models ..... 362
15.3 Representations for coreference resolution ..... 367
15.3.1 Features ..... 367
15.3.2 Distributed representations of mentions and entities ..... 370
15.4 Evaluating coreference resolution ..... 373
16 Discourse ..... 379
16.1 Segments ..... 379
16.1.1 Topic segmentation ..... 380
16.1.2 Functional segmentation ..... 381
16.2 Entities and reference ..... 381
16.2.1 Centering theory ..... 382
16.2.2 The entity grid ..... 383
16.2.3 *Formal semantics beyond the sentence level ..... 384
16.3 Relations ..... 385
16.3.1 Shallow discourse relations ..... 385
16.3.2 Hierarchical discourse relations ..... 389
16.3.3 Argumentation ..... 392
16.3.4 Applications of discourse relations ..... 393
IV Applications ..... 401

17 Information extraction ..... 403
17.1 Entities ..... 405
17.1.1 Entity linking by learning to rank ..... 406
17.1.2 Collective entity linking ..... 408
17.2 Relations ..... 411
17.2.1 Pattern-based relation extraction ..... 412
17.2.2 Relation extraction as a classification task ..... 413
17.2.3 Knowledge base population ..... 416
17.2.4 Open information extraction ..... 419
17.3 Events ..... 420
17.4 Hedges, denials, and hypotheticals ..... 422
17.5 Question answering and machine reading ..... 424
17.5.1 Formal semantics ..... 424
17.5.2 Machine reading ..... 425
18 Machine translation ..... 431
18.1 Machine translation as a task ..... 431
18.1.1 Evaluating translations ..... 433
18.1.2 Data ..... 435
18.2 Statistical machine translation ..... 436
18.2.1 Statistical translation modeling ..... 437
18.2.2 Estimation ..... 438
18.2.3 Phrase-based translation ..... 439
18.2.4 *Syntax-based translation ..... 441
18.3 Neural machine translation ..... 442
18.3.1 Neural attention ..... 444
18.3.2 *Neural machine translation without recurrence ..... 446
18.3.3 Out-of-vocabulary words ..... 448
18.4 Decoding ..... 449
18.5 Training towards the evaluation metric ..... 451
19 Text generation ..... 457
19.1 Data-to-text generation ..... 457
19.1.1 Latent data-to-text alignment ..... 459
19.1.2 Neural data-to-text generation ..... 460
19.2 Text-to-text generation ..... 464
19.2.1 Neural abstractive summarization ..... 464
19.2.2 Sentence fusion for multi-document summarization ..... 466
19.3 Dialogue ..... 467
19.3.1 Finite-state and agenda-based dialogue systems ..... 467
19.3.2 Markov decision processes ..... 468
19.3.3 Neural chatbots ..... 470
A Probability ..... 475
A.1 Probabilities of event combinations ..... 475
A.1.1 Probabilities of disjoint events ..... 476
A.1.2 Law of total probability ..... 477
A.2 Conditional probability and Bayes' rule ..... 477
A.3 Independence ..... 479
A.4 Random variables ..... 480
A.5 Expectations ..... 481
A.6 Modeling and estimation ..... 482
B Numerical optimization ..... 485
B.1 Gradient descent ..... 486
B.2 Constrained optimization ..... 486
B.3 Example: Passive-aggressive online learning ..... 487
Bibliography ..... 489
Preface
The goal of this text is to focus on a core subset of natural language processing, unified by the concepts of learning and search. A remarkable number of problems in natural language processing can be solved by a compact set of methods:

Search. Viterbi, CKY, minimum spanning tree, shift-reduce, integer linear programming, beam search.

Learning. Maximum-likelihood estimation, logistic regression, perceptron, expectation-maximization, matrix factorization, backpropagation.

This text explains how these methods work, and how they can be applied to a wide range of tasks: document classification, word sense disambiguation, part-of-speech tagging, named entity recognition, parsing, coreference resolution, relation extraction, discourse analysis, language modeling, and machine translation.
Background
Because natural language processing draws on many different intellectual traditions, almost everyone who approaches it feels underprepared in one way or another. Here is a summary of what is expected, and where you can learn more:
Mathematics and machine learning. The text assumes a background in multivariate calculus and linear algebra: vectors, matrices, derivatives, and partial derivatives. You should also be familiar with probability and statistics. A review of basic probability is found in Appendix A, and a minimal review of numerical optimization is found in Appendix B. For linear algebra, the online course and textbook from Strang (2016) provide an excellent review. Deisenroth et al. (2018) are currently preparing a textbook on Mathematics for Machine Learning; a draft can be found online.[1] For an introduction to probabilistic modeling and estimation, see James et al. (2013); for a more advanced and comprehensive discussion of the same material, the classic reference is Hastie et al. (2009).

[1] https://mml-book.github.io/
Linguistics. This book assumes no formal training in linguistics, aside from elementary concepts like nouns and verbs, which you have probably encountered in the study of English grammar. Ideas from linguistics are introduced throughout the text as needed, including discussions of morphology and syntax (chapter 9), semantics (chapters 12 and 13), and discourse (chapter 16). Linguistic issues also arise in the application-focused chapters 4, 8, and 18. A short guide to linguistics for students of natural language processing is offered by Bender (2013); you are encouraged to start there, and then pick up a more comprehensive introductory textbook (e.g., Akmajian et al., 2010; Fromkin et al., 2013).
Computer science. The book is targeted at computer scientists, who are assumed to have taken introductory courses on the analysis of algorithms and complexity theory. In particular, you should be familiar with asymptotic analysis of the time and memory costs of algorithms, and with the basics of dynamic programming. The classic text on algorithms is offered by Cormen et al. (2009); for an introduction to the theory of computation, see Arora and Barak (2009) and Sipser (2012).
How to use this book
After the introduction, the textbook is organized into four main units:
Learning. This section builds up a set of machine learning tools that will be used throughout the other sections. Because the focus is on machine learning, the text representations and linguistic phenomena are mostly simple: "bag-of-words" text classification is treated as a model example. Chapter 4 describes some of the more linguistically interesting applications of word-based text analysis.
Sequences and trees. This section introduces the treatment of language as a structured phenomenon. It describes sequence and tree representations and the algorithms that they facilitate, as well as the limitations that these representations impose. Chapter 9 introduces finite state automata and briefly overviews a context-free account of English syntax.
Meaning. This section takes a broad view of efforts to represent and compute meaning from text, ranging from formal logic to neural word embeddings. It also includes two topics that are closely related to semantics: resolution of ambiguous references, and analysis of multi-sentence discourse structure.
Applications. The final section offers chapter-length treatments on three of the most prominent applications of natural language processing: information extraction, machine translation, and text generation. Each of these applications merits a textbook-length treatment of its own (Koehn, 2009; Grishman, 2012; Reiter and Dale, 2000); the chapters here explain some of the best-known systems using the formalisms and methods built up earlier in the book, while introducing methods such as neural attention.
Each chapter contains some advanced material, which is marked with an asterisk. This material can be safely omitted without causing misunderstandings later on. But even without these advanced sections, the text is too long for a single semester course, so instructors will have to pick and choose among the chapters.
Chapters 1-3 provide building blocks that will be used throughout the book, and chapter 4 describes some critical aspects of the practice of language technology. Language models (chapter 6), sequence labeling (chapter 7), and parsing (chapters 10 and 11) are canonical topics in natural language processing, and distributed word embeddings (chapter 14) have become ubiquitous. Of the applications, machine translation (chapter 18) is the best choice: it is more cohesive than information extraction, and more mature than text generation. Many students will benefit from the review of probability in Appendix A.
• A course focusing on machine learning should add the chapter on unsupervised learning (chapter 5). The chapters on predicate-argument semantics (chapter 13), reference resolution (chapter 15), and text generation (chapter 19) are particularly influenced by recent progress in machine learning, including deep neural networks and learning to search.
• A course with a more linguistic orientation should add the chapters on applications of sequence labeling (chapter 8), formal language theory (chapter 9), semantics (chapters 12 and 13), and discourse (chapter 16).
• For a course with a more applied focus, I recommend the chapters on applications of sequence labeling (chapter 8), predicate-argument semantics (chapter 13), information extraction (chapter 17), and text generation (chapter 19).
Acknowledgments
Several colleagues, students, and friends read early drafts of chapters in their areas of expertise, including Yoav Artzi, Kevin Duh, Heng Ji, Jessy Li, Brendan O'Connor, Yuval Pinter, Shawn Ling Ramirez, Nathan Schneider, Pamela Shapiro, Noah A. Smith, Sandeep Soni, and Luke Zettlemoyer. I also thank the anonymous reviewers, particularly reviewer 4, who provided detailed line-by-line edits and suggestions. The text benefited from high-level discussions with my editor Marie Lufkin Lee, as well as Kevin Murphy, Shawn Ling Ramirez, and Bonnie Webber. In addition, there are many students, colleagues, friends, and family who found mistakes in early drafts, or who recommended key references.
These include: Parminder Bhatia, Kimberly Caras, Jiahao Cai, Justin Chen, Murtaza Dhuliawala, Yantao Du, Barbara Eisenstein, Luiz C. F. Ribeiro, Chris Gu, Joshua Killingsworth, Jonathan May, Taha Merghani, Gus Monod, Raghavendra Murali, Nidish Nair, Brendan O'Connor, Brandon Peck, Yuval Pinter, Nathan Schneider, Jianhao Shen, Zhewei Sun, Rubin Tsui, Ashwin Cunnapakkam Vinjimur, Denny Vrandečić, William Yang Wang, Clay Washington, Ishan Waykul, Xavier Yao, Yuyu Zhang, and also some anonymous commenters. Clay Washington tested several of the programming exercises.
Most of the book was written while I was at Georgia Tech's School of Interactive Computing. I thank the School for its support of this project, and I thank my colleagues there for their help and support at the beginning of my faculty career. I also thank (and apologize to) the many students in Georgia Tech's CS 4650 and 7650 who suffered through early versions of the text. The book is dedicated to my parents.
Notation
As a general rule, words, word counts, and other types of observations are indicated with Roman letters ($a, b, c$); parameters are indicated with Greek letters ($\alpha, \beta, \theta$). Vectors are indicated with bold script for both random variables $\boldsymbol{x}$ and parameters $\boldsymbol{\theta}$. Other useful notations are indicated in the table below.
Basics

$\exp x$   the base-2 exponent, $2^x$
$\log x$   the base-2 logarithm, $\log_2 x$
$\{x_n\}_{n=1}^N$   the set $\{x_1, x_2, \ldots, x_N\}$
$x_i^j$   $x_i$ raised to the power $j$
$x_i^{(j)}$   indexing by both $i$ and $j$
Linear algebra

$\boldsymbol{x}^{(i)}$   a column vector of feature counts for instance $i$, often word counts
$\boldsymbol{x}_{j:k}$   elements $j$ through $k$ (inclusive) of a vector $\boldsymbol{x}$
$[\boldsymbol{x}; \boldsymbol{y}]$   vertical concatenation of two column vectors
$[\boldsymbol{x}, \boldsymbol{y}]$   horizontal concatenation of two column vectors
$\boldsymbol{e}_n$   a "one-hot" vector with a value of 1 at position $n$, and zero everywhere else
$\boldsymbol{\theta}^\top$   the transpose of a column vector $\boldsymbol{\theta}$
$\boldsymbol{\theta} \cdot \boldsymbol{x}^{(i)}$   the dot product $\sum_{j=1}^{N} \theta_j \times x_j^{(i)}$
$\mathbf{X}$   a matrix
$x_{i,j}$   row $i$, column $j$ of matrix $\mathbf{X}$
$\text{Diag}(\boldsymbol{x})$   a matrix with $\boldsymbol{x}$ on the diagonal, e.g., $\begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_2 & 0 \\ 0 & 0 & x_3 \end{bmatrix}$
$\mathbf{X}^{-1}$   the inverse of matrix $\mathbf{X}$
Text datasets

$w_m$   word token at position $m$
$N$   number of training instances
$M$   length of a sequence (of words or tags)
$V$   number of words in vocabulary
$y^{(i)}$   the true label for instance $i$
$\hat{y}$   a predicted label
$\mathcal{Y}$   the set of all possible labels
$K$   number of possible labels, $K = |\mathcal{Y}|$
$\square$   the start token
$\blacksquare$   the stop token
$\boldsymbol{y}^{(i)}$   a structured label for instance $i$, such as a tag sequence
$\mathcal{Y}(\boldsymbol{w})$   the set of possible labelings for the word sequence $\boldsymbol{w}$
$\lozenge$   the start tag
$\blacklozenge$   the stop tag
Probabilities

$\Pr(A)$   probability of event $A$
$\Pr(A \mid B)$   probability of event $A$, conditioned on event $B$
$p_B(b)$   the marginal probability of random variable $B$ taking value $b$; written $p(b)$ when the choice of random variable is clear from context
$p_{B \mid A}(b \mid a)$   the probability of random variable $B$ taking value $b$, conditioned on $A$ taking value $a$; written $p(b \mid a)$ when clear from context
$A \sim p$   the random variable $A$ is distributed according to distribution $p$. For example, $X \sim \mathcal{N}(0, 1)$ states that the random variable $X$ is drawn from a normal distribution with zero mean and unit variance.
$A \mid B \sim p$   conditioned on the random variable $B$, $A$ is distributed according to $p$
Machine learning

$\Psi(\boldsymbol{x}^{(i)}, y)$   the score for assigning label $y$ to instance $i$
$f(\boldsymbol{x}^{(i)}, y)$   the feature vector for instance $i$ with label $y$
$\boldsymbol{\theta}$   a (column) vector of weights
$\ell^{(i)}$   loss on an individual instance $i$
$L$   objective function for an entire dataset
$\mathcal{L}$   log-likelihood of a dataset
$\lambda$   the amount of regularization
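To make this notation concrete, here is a minimal Python sketch of the bag-of-words representation and the linear scoring function: f(x, y) conjoins word counts with a candidate label, and Ψ(x, y) = θ · f(x, y). The feature names and toy weights are invented for illustration; chapter 2 develops this model properly.

```python
from collections import Counter

def f(x, y):
    """Feature vector f(x, y): conjoin each word count in the bag
    of words x with the candidate label y."""
    return Counter({(y, word): count for word, count in x.items()})

def score(theta, x, y):
    """The linear score Psi(x, y) = theta . f(x, y)."""
    return sum(theta.get(feat, 0.0) * value
               for feat, value in f(x, y).items())

# A toy instance: x is a bag of word counts; theta is a sparse
# weight vector stored as a dictionary. Both are invented here.
x = Counter("it was a bright cold day".split())
theta = {("POS", "bright"): 1.5, ("NEG", "cold"): 0.9}
print(score(theta, x, "POS"))  # 1.5: only ("POS", "bright") fires
print(score(theta, x, "NEG"))  # 0.9: only ("NEG", "cold") fires
```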
Chapter1 Introduction
Natural language processing is the set of methods for making human language accessible to computers. In the past decade, natural language processing has become embedded in our daily lives: automatic machine translation is ubiquitous on the web and in social media; text classification keeps emails from collapsing under a deluge of spam; search engines have moved beyond string matching and network analysis to a high degree of linguistic sophistication; dialog systems provide an increasingly common and effective way to get and share information.
These diverse applications are based on a common set of ideas, drawing on algorithms, linguistics, logic, statistics, and more. The goal of this text is to provide a survey of these foundations. The technical fun starts in the next chapter; the rest of this current chapter situates natural language processing with respect to other intellectual disciplines, identifies some high-level themes in contemporary natural language processing, and advises the reader on how best to approach the subject.
1.1 Natural language processing and its neighbors
Natural language processing draws on many other intellectual traditions, from formal linguistics to statistical physics. This section briefly situates natural language processing with respect to some of its closest neighbors.
Computational Linguistics. Most of the meetings and journals that host natural language processing research bear the name "computational linguistics", and the terms may be thought of as essentially synonymous. But while there is substantial overlap, there is an important difference in focus. In linguistics, language is the object of study. Computational methods may be brought to bear, just as in scientific disciplines like computational biology and computational astronomy, but they play only a supporting role. In contrast, natural language processing is focused on the design and analysis of computational algorithms and representations for processing natural human language. The goal of natural language processing is to provide new computational capabilities around human language: for example, extracting information from texts, translating between languages, answering questions, holding a conversation, taking instructions, and so on. Fundamental linguistic insights may be crucial for accomplishing these tasks, but success is ultimately measured by whether and how well the job gets done.
Machine Learning. Contemporary approaches to natural language processing rely heavily on machine learning, which makes it possible to build complex computer programs from examples. Machine learning provides an array of general techniques for tasks like converting a sequence of discrete tokens in one vocabulary to a sequence of discrete tokens in another vocabulary — a generalization of what one might informally call "translation." Much of today's natural language processing research can be thought of as applied machine learning. However, natural language processing has characteristics that distinguish it from many of machine learning's other application domains.
• Unlike images or audio, text data is fundamentally discrete, with meaning created by combinatorial arrangements of symbolic units. This is particularly consequential for applications in which text is the output, such as translation and summarization, because it is not possible to gradually approach an optimal solution.
• Although the set of words is discrete, new words are always being created. Furthermore, the distribution over words (and other linguistic elements) resembles that of a power law[1] (Zipf, 1949): there will be a few words that are very frequent, and a long tail of words that are rare; the sketch after this list illustrates the resulting rank-frequency pattern. A consequence is that natural language processing algorithms must be especially robust to observations that do not occur in the training data.
• Language is compositional: units such as words can combine to create phrases, which can combine by the very same principles to create larger phrases. For example, a noun phrase can be created by combining a smaller noun phrase with a prepositional phrase, as in the whiteness of the whale. The prepositional phrase is created by combining a preposition (in this case, of) with another noun phrase (the whale). In this way, it is possible to create arbitrarily long phrases, such as,

(1.1) ...huge globular pieces of the whale of the bigness of a human head.[2]

The meaning of such a phrase must be analyzed in accord with the underlying hierarchical structure. In this case, huge globular pieces of the whale acts as a single noun phrase, which is conjoined with the prepositional phrase of the bigness of a human head. The interpretation would be different if instead, huge globular pieces were conjoined with the prepositional phrase of the whale of the bigness of a human head, implying a disappointingly small whale. Even though text appears as a sequence, machine learning methods must account for its implicit recursive structure.

[1] Throughout the text, boldface will be used to indicate keywords that appear in the index.
[2] Throughout the text, this notation will be used to introduce linguistic examples.
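The rank-frequency pattern from the second point above can be checked directly. The following Python sketch assumes a plain-text corpus at the placeholder path corpus.txt; under Zipf's law, frequency falls off roughly as the inverse of rank, so the product of rank and frequency should stay roughly constant at the top of the list, while a long tail of words appears only once.

```python
from collections import Counter

# Placeholder path: any sufficiently large plain-text corpus will do.
with open("corpus.txt") as corpus:
    counts = Counter(corpus.read().lower().split())

# Zipf's law predicts frequency roughly proportional to 1/rank, so
# the rank * frequency column should stay roughly constant at the top.
for rank, (word, count) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>4}  {word:<12} {count:>8} {rank * count:>10}")

# The long tail: many word types are observed exactly once.
singletons = sum(1 for c in counts.values() if c == 1)
print(f"{singletons} of {len(counts)} word types appear exactly once")
```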
Artificial Intelligence. The goal of artificial intelligence is to build software and robots with the same range of abilities as humans (Russell and Norvig, 2009). Natural language processing is relevant to this goal in several ways. On the most basic level, the capacity for language is one of the central features of human intelligence, and is therefore a prerequisite for artificial intelligence.[3] Second, much of artificial intelligence research is dedicated to the development of systems that can reason from premises to a conclusion, but such algorithms are only as good as what they know (Dreyfus, 1992). Natural language processing is a potential solution to the "knowledge bottleneck", by acquiring knowledge from texts, and perhaps also from conversations. This idea goes all the way back to Turing's 1950 paper Computing Machinery and Intelligence, which proposed the Turing test for determining whether artificial intelligence had been achieved (Turing, 2009).
Conversely, reasoning is sometimes essential for basic tasks of language processing, such as resolving a pronoun. Winograd schemas are examples in which a single word changes the likely referent of a pronoun, in a way that seems to require knowledge and reasoning to decode (Levesque et al., 2011). For example,

(1.2) The trophy doesn't fit into the brown suitcase because it is too [small/large].

When the final word is small, then the pronoun it refers to the suitcase; when the final word is large, then it refers to the trophy. Solving this example requires spatial reasoning; other schemas require reasoning about actions and their effects, emotions and intentions, and social conventions.
Such examples demonstrate that natural language understanding cannot be achieved in isolation from knowledge and reasoning. Yet the history of artificial intelligence has been one of increasing specialization: with the growing volume of research in subdisciplines such as natural language processing, machine learning, and computer vision, it is difficult for anyone to maintain expertise across the entire field. Still, recent work has demonstrated interesting connections between natural language processing and other areas of AI, including computer vision (e.g., Antol et al., 2015) and game playing (e.g., Branavan et al., 2009). The dominance of machine learning throughout artificial intelligence has led to a broad consensus on representations such as graphical models and computation graphs, and on algorithms such as backpropagation and combinatorial optimization. Many of the algorithms and representations covered in this text are part of this consensus.

[3] This view is shared by some, but not all, prominent researchers in artificial intelligence. Michael Jordan, a specialist in machine learning, has said that if he had a billion dollars to spend on any large research project, he would spend it on natural language processing (https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/). On the other hand, in a public discussion about the future of artificial intelligence in February 2018, computer vision researcher Yann LeCun argued that despite its many practical applications, language is perhaps "number 300" in the priority list for artificial intelligence research, and that it would be a great achievement if AI could attain the capabilities of an orangutan, which do not include language (http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html).
Computer Science. The discrete and recursive nature of natural language invites the application of theoretical ideas from computer science. Linguists such as Chomsky and Montague have shown how formal language theory can help to explain the syntax and semantics of natural language. Theoretical models such as finite-state and pushdown automata are the basis for many practical natural language processing systems. Algorithms for searching the combinatorial space of analyses of natural language utterances can be analyzed in terms of their computational complexity, and theoretically motivated approximations can sometimes be applied.
The study of computer systems is also relevant to natural language processing. Large datasets of unlabeled text can be processed more quickly by parallelization techniques like MapReduce (Dean and Ghemawat, 2008; Lin and Dyer, 2010); high-volume data sources such as social media can be summarized efficiently by approximate streaming and sketching techniques (Goyal et al., 2009). When deep neural networks are implemented in production systems, it is possible to eke out speed gains using techniques such as reduced-precision arithmetic (Wu et al., 2016). Many classical natural language processing algorithms are not naturally suited to graphics processing unit (GPU) parallelization, suggesting directions for further research at the intersection of natural language processing and computing hardware (Yi et al., 2011).
Speech Processing. Natural language is often communicated in spoken form, and speech recognition is the task of converting an audio signal to text. From one perspective, this is a signal processing problem, which might be viewed as a preprocessing step before natural language processing can be applied. However, context plays a critical role in speech recognition by human listeners: knowledge of the surrounding words influences perception and helps to correct for noise (Miller et al., 1951). For this reason, speech recognition is often integrated with text analysis, particularly with statistical language models, which quantify the probability of a sequence of text (see chapter 6). Beyond speech recognition, the broader field of speech processing includes the study of speech-based dialogue systems, which are briefly discussed in chapter 19. Historically, speech processing has often been pursued in electrical engineering departments, while natural language processing has been the purview of computer scientists. For this reason, the extent of interaction between these two disciplines is less than it might otherwise be.
Ethics. As machine learning and artificial intelligence become increasingly ubiquitous, it is crucial to understand how their benefits, costs, and risks are distributed across different kinds of people. Natural language processing raises some particularly salient issues around ethics, fairness, and accountability:
Access. Who is natural language processing designed to serve? For example, whose language is translated from, and whose language is translated to?

Bias. Does language technology learn to replicate social biases from text corpora, and does it reinforce these biases as seemingly objective computational conclusions?

Labor. Whose text and speech comprise the datasets that power natural language processing, and who performs the annotations? Are the benefits of this technology shared with all the people whose work makes it possible?

Privacy and internet freedom. What is the impact of large-scale text processing on the right to free and private communication? What is the potential role of natural language processing in regimes of censorship or surveillance?
This text lightly touches on issues related to fairness and bias in subsection 14.6.3 and subsection 18.1.1, but these issues are worthy of a book of their own. For more from within the field of computational linguistics, see the papers from the annual workshop on Ethics in Natural Language Processing (Hovy et al., 2017; Alfano et al., 2018). For an outside perspective on ethical issues relating to data science at large, see boyd and Crawford (2012).
Others. Natural language processing plays a significant role in emerging interdisciplinary fields like computational social science and the digital humanities. Text classification (chapter 4), clustering (chapter 5), and information extraction (chapter 17) are particularly useful tools; another is probabilistic topic models (Blei, 2012), which are not covered in this text. Information retrieval (Manning et al., 2008) makes use of similar tools, and conversely, techniques such as latent semantic analysis (section 14.3) have roots in information retrieval. Text mining is sometimes used to refer to the application of data mining techniques, especially classification and clustering, to text. While there is no clear distinction between text mining and natural language processing (nor between data mining and machine learning), text mining is typically less concerned with linguistic structure, and more interested in fast, scalable algorithms.
1.2 Three themes in natural language processing
Natural language processing covers a diverse range of tasks, methods, and linguistic phenomena. But despite the apparent incommensurability between, say, the summarization of scientific articles (section 16.3.4) and the identification of suffix patterns in Spanish verbs (section 9.1.4), some general themes emerge. The remainder of the introduction focuses on these themes, which will recur in various forms through the text. Each theme can be expressed as an opposition between two extreme viewpoints on how to process natural language. The methods discussed in the text can usually be placed somewhere on the continuum between these two extremes.
1.2.1 Learning and knowledge
A recurring topic of debate is the relative importance of machine learning and linguistic knowledge. On one extreme, advocates of "natural language processing from scratch" (Collobert et al., 2011) propose to use machine learning to train end-to-end systems that transmute raw text into any desired output structure: e.g., a summary, database, or translation. On the other extreme, the core work of natural language processing is sometimes taken to be transforming text into a stack of general-purpose linguistic structures: from subword units called morphemes, to word-level parts-of-speech, to tree-structured representations of grammar, and beyond, to logic-based representations of meaning. In theory, these general-purpose structures should then be able to support any desired application.
The end-to-end approach has been buoyed by recent results in computer vision and speech recognition, in which advances in machine learning have swept away expert-engineered representations based on the fundamentals of optics and phonology (Krizhevsky et al., 2012; Graves and Jaitly, 2014). But while machine learning is an element of nearly every contemporary approach to natural language processing, linguistic representations such as syntax trees have not yet gone the way of the visual edge detector or the auditory triphone. Linguists have argued for the existence of a "language faculty" in all human beings, which encodes a set of abstractions specially designed to facilitate the understanding and production of language. The argument for the existence of such a language faculty is based on the observation that children learn language faster and from fewer examples than would be possible if language was learned from experience alone.[4] From a practical standpoint, linguistic structure seems to be particularly important in scenarios where training data is limited.

[4] The Language Instinct (Pinker, 2003) articulates these arguments in an engaging and popular style. For arguments against the innateness of language, see Elman et al. (1998).

There are a number of ways in which knowledge and learning can be combined in natural language processing. Many supervised learning systems make use of carefully engineered features, which transform the data into a representation that can facilitate learning. For example, in a task like search, it may be useful to identify each word's stem, so that a system can more easily generalize across related terms such as whale, whales, whalers, and whaling. (This issue is relatively benign in English, as compared to the many other languages which include much more elaborate systems of prefixes and suffixes.) Such features could be obtained from a hand-crafted resource, like a dictionary that maps each word to a single root form. Alternatively, features can be obtained from the output of a general-purpose language processing system, such as a parser or part-of-speech tagger, which may itself be built on supervised machine learning.
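As an illustration of this kind of feature engineering, here is a minimal sketch of a deliberately crude suffix-stripping stemmer. The suffix list is an invented toy, not a published algorithm such as the Porter stemmer, and a real system would use curated rules or a root-form dictionary as described above.

```python
def stem(word):
    """Strip a few common English suffixes. A toy heuristic: real
    systems use curated rules or a dictionary of root forms."""
    for suffix in ("ing", "ers", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["whale", "whales", "whalers", "whaling"]])
# ['whale', 'whale', 'whal', 'whal']: imperfect, but the four
# related surface forms now collapse onto two stems instead of four.
```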
Another synthesis of learning and knowledge is in model structure: building machine learning models whose architectures are inspired by linguistic theories. For example, the organization of sentences is often described as compositional, with meaning of larger units gradually constructed from the meaning of their smaller constituents. This idea can be built into the architecture of a deep neural network, which is then trained using contemporary deep learning techniques (Dyer et al., 2016).
The debate about the relative importance of machine learning and linguistic knowledge sometimes becomes heated. No machine learning specialist likes to be told that their engineering methodology is unscientific alchemy;[5] nor does a linguist want to hear that the search for general linguistic principles and structures has been made irrelevant by big data. Yet there is clearly room for both types of research: we need to know how far we can go with end-to-end learning alone, while at the same time, we continue the search for linguistic representations that generalize across applications, scenarios, and languages. For more on the history of this debate, see Church (2011); for an optimistic view of the potential symbiosis between computational linguistics and deep learning, see Manning (2015).

[5] Ali Rahimi argued that much of deep learning research was similar to "alchemy" in a presentation at the 2017 conference on Neural Information Processing Systems. He was advocating for more learning theory, not more linguistics.
1.2.2 Search and learning
Many natural language processing problems can be written mathematically in the form of optimization,[6]

$$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta) \qquad [1.1]$$

where,

• $x$ is the input, which is an element of a set $\mathcal{X}$;
• $y$ is the output, which is an element of a set $\mathcal{Y}(x)$;
• $\Psi$ is a scoring function (also called the model), which maps from the set $\mathcal{X} \times \mathcal{Y}$ to the real numbers;
• $\theta$ is a vector of parameters for $\Psi$;
• $\hat{y}$ is the predicted output, which is chosen to maximize the scoring function.

[6] Throughout this text, equations will be numbered by square brackets, and linguistic examples will be numbered by parentheses.
This basic structure can be applied to a huge range of problems. For example, the input $x$ might be a social media post, and the output $y$ might be a labeling of the emotional sentiment expressed by the author (chapter 4); or $x$ could be a sentence in French, and the output $y$ could be a sentence in Tamil (chapter 18); or $x$ might be a sentence in English, and $y$ might be a representation of the syntactic structure of the sentence (chapter 10); or $x$ might be a news article and $y$ might be a structured record of the events that the article describes (chapter 17).
This formulation reflects an implicit decision that language processing algorithms will have two distinct modules, sketched in code after the two definitions below:
Search. The search module is responsible for computing the argmax of the function $\Psi$. In other words, it finds the output $\hat{y}$ that gets the best score with respect to the input $x$. This is easy when the search space $\mathcal{Y}(x)$ is small enough to enumerate, or when the scoring function $\Psi$ has a convenient decomposition into parts. In many cases, we will want to work with scoring functions that do not have these properties, motivating the use of more sophisticated search algorithms, such as bottom-up dynamic programming (section 10.1) and beam search (section 11.3.1). Because the outputs are usually discrete in language processing problems, search often relies on the machinery of combinatorial optimization.
Learning. The learning module is responsible for finding the parameters $\theta$. This is typically (but not always) done by processing a large dataset of labeled examples, $\{(x^{(i)}, y^{(i)})\}_{i=1}^N$. Like search, learning is also approached through the framework of optimization, as we will see in chapter 2. Because the parameters are usually continuous, learning algorithms generally rely on numerical optimization to identify vectors of real-valued parameters that optimize some function of the model and the labeled data. Some basic principles of numerical optimization are reviewed in Appendix B.
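Here is a minimal sketch of this decomposition for a bag-of-words classifier with an enumerable label set. The search module computes the argmax by brute-force enumeration; the learning module uses perceptron updates as a stand-in for the optimization methods of chapter 2. The toy dataset and all names are invented for this example.

```python
def score(theta, x, y):
    """Linear model: Psi(x, y) sums weights on (label, word) pairs."""
    return sum(theta.get((y, word), 0.0) * count
               for word, count in x.items())

def predict(theta, x, labels):
    """Search module: compute the argmax of Psi by enumerating Y(x)."""
    return max(labels, key=lambda y: score(theta, x, y))

def learn(train, labels, epochs=5):
    """Learning module: set the parameters theta from labeled examples,
    here with simple perceptron updates (developed in chapter 2)."""
    theta = {}
    for _ in range(epochs):
        for x, y in train:
            y_hat = predict(theta, x, labels)
            if y_hat != y:  # on an error, shift weight toward the truth
                for word, count in x.items():
                    theta[(y, word)] = theta.get((y, word), 0.0) + count
                    theta[(y_hat, word)] = theta.get((y_hat, word), 0.0) - count
    return theta

# Toy sentiment data: bags of words paired with discrete labels.
train = [({"great": 2, "fun": 1}, "POS"), ({"dull": 1, "slow": 2}, "NEG")]
theta = learn(train, labels=["POS", "NEG"])
print(predict(theta, {"fun": 1, "slow": 1}, ["POS", "NEG"]))  # NEG
```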
The division of natural language processing into separate modules for search and learning makes it possible to reuse generic algorithms across many tasks and models. Much of the work of natural language processing can be focused on the design of the model $\Psi$ — identifying and formalizing the linguistic phenomena that are relevant to the task at hand — while reaping the benefits of decades of progress in search, optimization, and learning. This textbook will describe several classes of scoring functions, and the corresponding algorithms for search and learning.
When a model is capable of making subtle linguistic distinctions, it is said to be expressive. Expressiveness is often traded off against efficiency of search and learning. For example, a word-to-word translation model makes search and learning easy, but it is not expressive enough to distinguish good translations from bad ones. Many of the most important problems in natural language processing seem to require expressive models, in which the complexity of search grows exponentially with the size of the input. In these models, exact search is usually impossible. Intractability threatens the neat modular decomposition between search and learning: if search requires a set of heuristic approximations, then it may be advantageous to learn a model that performs well under these specific heuristics. This has motivated some researchers to take a more integrated approach to search and learning, as briefly mentioned in chapters 11 and 15.
1.2.3 Relational, compositional, and distributional perspectives
Any element of language — a word, a phrase, a sentence, or even a sound — can be described from at least three perspectives. Consider the word journalist. A journalist is a subcategory of a profession, and an anchorwoman is a subcategory of journalist; furthermore, a journalist performs journalism, which is often, but not always, a subcategory of writing. This relational perspective on meaning is the basis for semantic ontologies such as WordNet (Fellbaum, 2010), which enumerate the relations that hold between words and other elementary semantic units. The power of the relational perspective is illustrated by the following example:
(1.3) Umashanthi interviewed Ana. She works for the college newspaper.
Who works for the college newspaper? The word journalist, while not stated in the example, implicitly links the interview to the newspaper, making Umashanthi the most likely referent for the pronoun. (A general discussion of how to resolve pronouns is found in chapter 15.)
Yet despite the inferential power of the relational perspective, it is not easy to formalize computationally. Exactly which elements are to be related? Are journalists and reporters distinct, or should we group them into a single unit? Is the kind of interview performed by a journalist the same as the kind that one undergoes when applying for a job? Ontology designers face many such thorny questions, and the project of ontology design hearkens back to Borges' (1993) Celestial Emporium of Benevolent Knowledge, which divides animals into:

(a) belonging to the emperor; (b) embalmed; (c) tame; (d) suckling pigs; (e) sirens; (f) fabulous; (g) stray dogs; (h) included in the present classification; (i) frenzied; (j) innumerable; (k) drawn with a very fine camel hair brush; (l) et cetera; (m) having just broken the water pitcher; (n) that from a long way off resemble flies.
Difficulties in ontology construction have led some linguists to argue that there is no task-independent way to partition up word meanings (Kilgarriff, 1997).
Some problems are easier. Each member in a group of journalists is a journalist: the -s suffix distinguishes the plural meaning from the singular in most of the nouns in English. Similarly, a journalist can be thought of, perhaps colloquially, as someone who produces or works on a journal. (Taking this approach even further, the word journal derives from the French jour+nal, or day+ly=daily.) In this way, the meaning of a word is constructed from the constituent parts — the principle of compositionality. This principle can be applied to larger units: phrases, sentences, and beyond. Indeed, one of the great strengths of the compositional view of meaning is that it provides a roadmap for understanding entire texts and dialogues through a single analytic lens, grounding out in the smallest parts of individual words.
But alongside journalists and anti-parliamentarians, there are many words that seem to be linguistic atoms: think, for example, of whale, blubber, and Nantucket. Idiomatic phrases like kick the bucket and shoot the breeze have meanings that are quite different from the sum of their parts (Sag et al., 2002). Composition is of little help for such words and expressions, but their meanings can be ascertained — or at least approximated — from the contexts in which they appear. Take, for example, blubber, which appears in such contexts as:

(1.4) a. The blubber served them as fuel.
b. ...extracting it from the blubber of the large fish...
c. Amongst oily substances, blubber has been employed as a manure.

These contexts form the distributional properties of the word blubber, and they link it to words which can appear in similar constructions: fat, pelts, and barnacles. This distributional perspective makes it possible to learn about meaning from unlabeled data alone; unlike relational and compositional semantics, no manual annotation or expert knowledge is required. Distributional semantics is thus capable of covering a huge range of linguistic phenomena. However, it lacks precision: blubber is similar to fat in one sense, to pelts in another sense, and to barnacles in still another. The question of why all these words tend to appear in the same contexts is left unanswered.
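As a minimal sketch of the distributional perspective, the following code represents each word by the counts of words appearing within two positions of it, and compares words by the cosine similarity of those context vectors. The sentences echo the examples in (1.4), and the window size of two is an arbitrary illustrative choice.

```python
from collections import Counter, defaultdict
import math

sentences = [
    "the blubber served them as fuel",
    "extracting it from the blubber of the large fish",
    "amongst oily substances blubber has been employed as a manure",
]

# Represent each word by the counts of its neighbors within two positions.
contexts = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                contexts[word][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

# Words that share contexts with "blubber" get nonzero similarity.
print(cosine(contexts["blubber"], contexts["oily"]))
```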
The relational, compositional, and distributional perspectives all contribute to our understanding of linguistic meaning, and all three appear to be critical to natural language processing. Yet they are uneasy collaborators, requiring seemingly incompatible representations and algorithmic approaches. This text presents some of the best known and most successful methods for working with each of these representations, but future research may reveal new ways to combine them.
PartI Learning