CS61B Reader
(Seventh Edition)
Data Structures (Into Java)
Paul N. Hilfinger
University of California, Berkeley
Acknowledgments. Thanks to the following individuals for finding many of the errors in earlier editions: Dan Bonachea, Michael Clancy, Dennis Hall, Joseph Hui, Yina Jin, Zhi Lin, Amy Mok, Barath Raghavan, Yingssu Tsai, Emily Watt, and Zihan Zhou.
Copyright © 2000, 2001, 2002, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012, 2013 by Paul N. Hilfinger. All rights reserved.
Contents

1 Algorithmic Complexity
    1.1 Asymptotic complexity analysis and order notation
    1.2 Examples
        1.2.1 Demonstrating "Big-Ohness"
    1.3 Applications to Algorithm Analysis
        1.3.1 Linear search
        1.3.2 Quadratic example
        1.3.3 Explosive example
        1.3.4 Divide and conquer
        1.3.5 Divide and fight to a standstill
    1.4 Amortization
    1.5 Complexity of Problems
    1.6 Some Properties of Logarithms
    1.7 A Note on Notation

2 Data Types in the Abstract
    2.1 Iterators
        2.1.1 The Iterator Interface
        2.1.2 The ListIterator Interface
    2.2 The Java Collection Abstractions
        2.2.1 The Collection Interface
        2.2.2 The Set Interface
        2.2.3 The List Interface
        2.2.4 Ordered Sets
    2.3 The Java Map Abstractions
        2.3.1 The Map Interface
        2.3.2 The SortedMap Interface
    2.4 An Example
    2.5 Managing Partial Implementations: Design Options

3 Meeting a Specification
    3.1 Doing it from Scratch
    3.2 The AbstractCollection Class
    3.3 Implementing the List Interface
        3.3.1 The AbstractList Class
        3.3.2 The AbstractSequentialList Class
    3.4 The AbstractMap Class
    3.5 Performance Predictions

4 Sequences and Their Implementations
    4.1 Array Representation of the List Interface
    4.2 Linking in Sequential Structures
        4.2.1 Singly Linked Lists
        4.2.2 Sentinels
        4.2.3 Doubly Linked Lists
    4.3 Linked Implementation of the List Interface
    4.4 Specialized Lists
        4.4.1 Stacks
        4.4.2 FIFO and Double-Ended Queues
    4.5 Stack, Queue, and Deque Implementation

5 Trees
    5.1 Expression trees
    5.2 Basic tree primitives
    5.3 Representing trees
        5.3.1 Root-down pointer-based binary trees
        5.3.2 Root-down pointer-based ordered trees
        5.3.3 Leaf-up representation
        5.3.4 Array representations of complete trees
        5.3.5 Alternative representations of empty trees
    5.4 Tree traversals
        5.4.1 Generalized visitation
        5.4.2 Visiting empty trees
        5.4.3 Iterators on trees

6 Search Trees
    6.1 Operations on a BST
        6.1.1 Searching a BST
        6.1.2 Inserting into a BST
        6.1.3 Deleting items from a BST
        6.1.4 Operations with parent pointers
        6.1.5 Degeneracy strikes
    6.2 Implementing the SortedSet interface
    6.3 Orthogonal Range Queries
    6.4 Priority queues and heaps
        6.4.1 Heapify Time
    6.5 Game Trees
        6.5.1 Alpha-beta pruning
        6.5.2 A game-tree search algorithm

7 Hashing
    7.1 Chaining
    7.2 Open-address hashing
    7.3 The hash function
    7.4 Performance

8 Sorting and Selecting
    8.1 Basic concepts
    8.2 A Little Notation
    8.3 Insertion sorting
    8.4 Shell's sort
    8.5 Distribution counting
    8.6 Selection sort
    8.7 Exchange sorting: Quicksort
    8.8 Merge sorting
        8.8.1 Complexity
    8.9 Speed of comparison-based sorting
    8.10 Radix sorting
        8.10.1 LSD-first radix sorting
        8.10.2 MSD-first radix sorting
    8.11 Using the library
    8.12 Selection

9 Balanced Searching
    9.1 Balance by Construction: B-Trees
        9.1.1 B-tree Insertion
        9.1.2 B-tree deletion
        9.1.3 Red-Black Trees: Binary Search Trees as (2,4) Trees
    9.2 Tries
        9.2.1 Tries: basic properties and algorithms
        9.2.2 Tries: Representation
        9.2.3 Table compression
    9.3 Restoring Balance by Rotation
        9.3.1 AVL Trees
    9.4 Splay Trees
        9.4.1 Analyzing splay trees
    9.5 Skip Lists

10 Concurrency and Synchronization
    10.1 Synchronized Data Structures
    10.2 Monitors and Orderly Communication
    10.3 Message Passing

11 Pseudo-Random Sequences
    11.1 Linear congruential generators
    11.2 Additive Generators
    11.3 Other distributions
        11.3.1 Changing the range
        11.3.2 Non-uniform distributions
        11.3.3 Finite distributions
    11.4 Random permutations and combinations

12 Graphs
    12.1 A Programmer's Specification
    12.2 Representing graphs
        12.2.1 Adjacency Lists
        12.2.2 Edge sets
        12.2.3 Adjacency matrices
    12.3 Graph Algorithms
        12.3.1 Marking
        12.3.2 A general traversal schema
        12.3.3 Generic depth-first and breadth-first traversal
        12.3.4 Topological sorting
        12.3.5 Minimum spanning trees
        12.3.6 Single-source shortest paths
        12.3.7 A* search
        12.3.8 Kruskal's algorithm for MST
Chapter 1

Algorithmic Complexity
The obvious way to answer the question "How fast does such-and-such a program run?" is to use something like the UNIX time command to find out directly. There are various possible objections to this easy answer. The time required by a program is a function of the input, so presumably we have to time several instances of the command and extrapolate the result. Some programs, however, behave fine for most inputs, but sometimes take a very long time; how do we report (indeed, how can we be sure to notice) such anomalies? What do we do about all the inputs for which we have no measurements? How do we validly apply results gathered on one machine to another machine?

The trouble with measuring raw time is that the information is precise, but limited: the time for this input on this configuration of this machine. On a different machine whose instructions take different absolute or relative times, the numbers don't necessarily apply. Indeed, suppose we compare two different programs for doing the same thing on the same inputs and the same machine. Program A may turn out faster than program B. This does not imply, however, that program A will be faster than B when they are run on some other input, or on the same input, but some other machine.
In mathematese, we might say that a raw time is the value of a function $C_r(I, P, M)$ for some particular input $I$, some program $P$, and some "platform" $M$ (platform here is a catchall term for a combination of machine, operating system, compiler, and runtime library support). I've invented the function $C_r$ here to mean "the raw cost of...." We can make the figure a little more informative by summarizing over all inputs of a particular size

$$C_w(N, P, M) = \max_{|I| = N} C_r(I, P, M),$$

where $|I|$ denotes the "size" of input $I$. How one defines the size depends on the problem: if $I$ is an array to be sorted, for example, $|I|$ might denote I.length. We say that $C_w$ measures worst-case time of a program. Of course, since the number of inputs of a given size could be very large (the number of arrays of 5 ints, for example, is $2^{160} > 10^{48}$), we can't directly measure $C_w$, but we can perhaps estimate it with the help of some analysis of $P$. By knowing worst-case times, we can make conservative statements about the running time of a program: if the worst-case time for input of size $N$ is $T$, then we are guaranteed that $P$ will consume no more than time $T$ for any input of size $N$.
But of course, it is always possible that our program will work fine on most inputs, but take a really long time on one or two (unlikely) inputs. In such cases, we might claim that $C_w$ is too harsh a summary measure, and we should really look at an average time. Assuming all values of the input, $I$, are equally likely, the average time is

$$C_a(N, P, M) = \frac{\sum_{|I| = N} C_r(I, P, M)}{N}.$$

Fair this may be, but it is usually very hard to compute. In this course, therefore, I will say very little about average cases, leaving that to your next course on algorithms.

We've summarized over inputs by considering worst-case times; now let's consider how we can summarize over machines. Just as summarizing over inputs required that we give up some information (namely, performance on particular inputs), so summarizing over machines requires that we give up information on precise performance on particular machines. Suppose that two different models of computer are running (different translations of) the same program, performing the same steps in the same order. Although they run at different speeds, and possibly execute different numbers of instructions, the speeds at which they perform any particular step tend to differ by some constant factor. By taking the largest and smallest of these constant factors, we can put bounds around the difference in their overall execution times. (The argument is not really this simple, but for our purposes here, it will suffice.) That is, the timings of the same program on any two platforms will tend to differ by no more than some constant factor over all possible inputs. If we can nail down the timing of a program on one platform, we can use it for all others, and our results will "only be off by a constant factor."

But of course, 1000 is a constant factor, and you would not normally be insensitive to the fact that the Brand X program is 1000 times slower than Brand Y. There is, however, an important case in which this sort of characterization is useful: namely, when we are trying to determine or compare the performance of algorithms (idealized procedures for performing some task). The distinction between algorithm and program (a concrete, executable procedure) is somewhat vague. Most higher-level programming languages allow one to write programs that look very much like the algorithms they are supposed to implement. The distinction lies in the level of detail. A procedure that is cast in terms of operations on "sets," with no specific implementation given for these sets, probably qualifies as an algorithm. When talking about idealized procedures, it doesn't make a great deal of sense to talk about the number of seconds they take to execute. Rather, we are interested in what I might call the shape of an algorithm's behavior: such questions as "If we double the size of the input, what happens to the execution time?" Given that kind of question, the particular units of time (or space) used to measure the performance of an algorithm are unimportant: constant factors don't matter.
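The doubling question can be explored without a stopwatch by counting steps instead of seconds. The following is only an illustrative sketch: the class DoublingShape and its two methods are invented here, and the loops merely stand in for procedures whose costs have a linear and a quadratic shape.

```java
public class DoublingShape {

    /** Number of steps taken by a single pass over N items
     *  (a stand-in for any linear-shaped procedure). */
    static long linearSteps(int n) {
        long steps = 0;
        for (int i = 0; i < n; i += 1) {
            steps += 1;
        }
        return steps;
    }

    /** Number of steps taken by a doubly nested pass over N items
     *  (a stand-in for any quadratic-shaped procedure). */
    static long quadraticSteps(int n) {
        long steps = 0;
        for (int i = 0; i < n; i += 1) {
            for (int j = 0; j < n; j += 1) {
                steps += 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // The units of "steps" are irrelevant; only the ratios matter.
        for (int n = 100; n <= 800; n *= 2) {
            double linearRatio = (double) linearSteps(2 * n) / linearSteps(n);
            double quadraticRatio =
                (double) quadraticSteps(2 * n) / quadraticSteps(n);
            System.out.printf("N=%d: linear ratio %.1f, quadratic ratio %.1f%n",
                              n, linearRatio, quadraticRatio);
        }
    }
}
```

Multiplying either step count by any constant (a faster machine, a different unit of time) leaves both ratios unchanged, which is the sense in which constant factors do not affect the shape.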
If we only care about characterizing the speed of an algorithm to within a constant factor, other simplifications are possible. We need no longer worry about the timing of each little statement in the algorithm, but can measure time using any convenient "marker step." For example, to do decimal multiplication in the standard way, you multiply each digit of the multiplicand by each digit of the multiplier, and perform roughly one one-digit addition with carry for each of these one-digit multiplications. Counting just the one-digit multiplications, therefore, will give you the time within a constant factor, and these multiplications are very easy to count (the product of the numbers of digits in the operands).
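As a rough illustration of a marker step, here is a sketch of the standard multiplication procedure with a counter on the one-digit multiplications. The class name MarkerStep, the counter, and the little-endian digit-array representation are assumptions made for this example only.

```java
import java.util.Arrays;

public class MarkerStep {

    /** Number of one-digit multiplications performed by the most
     *  recent call to multiply. */
    static long digitMultiplications;

    /** The product of X and Y, where each number is an array of decimal
     *  digits, least significant digit first. */
    static int[] multiply(int[] x, int[] y) {
        digitMultiplications = 0;
        int[] result = new int[x.length + y.length];
        for (int i = 0; i < x.length; i += 1) {
            int carry = 0;
            for (int j = 0; j < y.length; j += 1) {
                digitMultiplications += 1;      // the marker step
                int t = result[i + j] + x[i] * y[j] + carry;
                result[i + j] = t % 10;
                carry = t / 10;
            }
            result[i + y.length] += carry;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] product = multiply(new int[] { 3, 2, 1 },   // 123
                                 new int[] { 5, 4 });     // 45
        System.out.println(Arrays.toString(product));
        System.out.println(digitMultiplications + " one-digit multiplications");
    }
}
```

For a 3-digit and a 2-digit operand the counter ends at 3 × 2 = 6, matching the "product of the numbers of digits" rule; the additions and carries are ignored on the grounds that they add only a constant amount of work per counted step.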
Another characteristic assumption in the study of algorithmic complexity (i.e., the time or memory consumption of an algorithm) is that we are interested in typical behavior of an idealized program over the entire set of possible inputs. Idealized programs, of course, being ideal, can operate on inputs of any possible size, and most "possible sizes" in the ideal world of mathematics are extremely large. Therefore, in this kind of analysis, it is traditional not to be interested in the fact that a particular algorithm does very well for small inputs, but rather to consider its behavior "in the limit" as input gets very large. For example, suppose that one wanted to analyze algorithms for computing π to any given number of decimal places. I can make any algorithm look good for inputs up to, say, 1,000,000 by simply storing the first 1,000,000 digits of π in an array and using that to supply the answer when 1,000,000 or fewer digits are requested. If you paid any attention to how my program performed for inputs up to 1,000,000, you could be seriously misled as to the cleverness of my algorithm. Therefore, when studying algorithms, we look at their asymptotic behavior: how they behave as the input size goes to infinity.

The result of all these considerations is that in considering the time complexity of algorithms, we may choose any particular machine and count any convenient marker step, and we try to find characterizations that are true asymptotically, out to infinity. This implies that our typical complexity measure for algorithms will have the form $C_w(N, A)$, meaning "the worst-case time over all inputs of size $N$ of algorithm $A$ (in some units)." Since the algorithm will be understood in any particular discussion, we will usually just write $C_w(N)$ or something similar. So the first thing we need to describe algorithmic complexity is a way to characterize the asymptotic behavior of functions.
1.1 Asymptotic complexity analysis and order notation
As it happens, there is a convenient notational tool, known collectively as order notation for "order of growth," for describing the asymptotic behavior of functions. It may be (and is) used for any kind of integer- or real-valued function, not just complexity functions. You've probably seen it used in calculus courses, for example.

We write

$$f(n) \in O(g(n))$$

(aloud, this is "$f(n)$ is in big-Oh of $g(n)$") to mean that the function $f$ is eventually bounded by some multiple of $|g(n)|$. More precisely, $f(n) \in O(g(n))$ iff

$$|f(n)| \le K \cdot |g(n)|, \quad \text{for all } n > M,$$

for some constants $K > 0$ and $M$. That is, $O(g(n))$ is the set of functions that "grow no more quickly than" $|g(n)|$ does as $n$ gets sufficiently large. Somewhat confusingly, $f(n)$ here does not mean "the result of applying $f$ to $n$," as it usually does. Rather, it is to be interpreted as the body of a function whose parameter is $n$. Thus, we often write things like $O(n^2)$ to mean "the set of all functions that grow no more quickly than the square of their argument¹." Figure 1.1a gives an intuitive idea of what it means to be in $O(g(n))$.

¹ If we wanted to be formally correct, we'd use lambda notation to represent functions (such as Scheme uses) and write instead $O(\lambda n.\, n^2)$, but I'm sure you can see how such a degree of rigor would become tedious very soon.

[Figure 1.1, graphs not reproduced. Panel (a) plots $2|g(n)|$, $0.5|g(n)|$, $|f(n)|$, and $|h(n)|$ against $n$, with the point $n = M$ marked; panel (b) plots $|g'(n)|$ and $|h'(n)|$ against $n$.]

Figure 1.1: Illustration of big-Oh notation. In graph (a), we see that $|f(n)| \le 2|g(n)|$ for $n > M$, so that $f(n) \in O(g(n))$ (with $K = 2$). Likewise, $h(n) \in O(g(n))$, illustrating that $g$ can be a very over-cautious bound. The function $f$ is also bounded below by both $g$ (with, for example, $K = 0.5$ and $M$ any value larger than 0) and by $h$. That is, $f(n) \in \Omega(g(n))$ and $f(n) \in \Omega(h(n))$. Because $f$ is bounded above and below by multiples of $g$, we say $f(n) \in \Theta(g(n))$. On the other hand, $h(n) \notin \Omega(g(n))$. In fact, assuming that $g$ continues to grow as shown and $h$ to shrink, $h(n) \in o(g(n))$. Graph (b) shows that $o(\cdot)$ is not simply the set complement of $\Omega(\cdot)$; $h'(n) \notin \Omega(g'(n))$, but $h'(n) \notin o(g'(n))$, either.

Saying that $f(n) \in O(g(n))$ gives us only an upper bound on the behavior of $f$. For example, the function $h$ in Figure 1.1a (and for that matter, the function that is 0 everywhere) are both in $O(g(n))$, but certainly don't grow like $g$. Accordingly, we define $f(n) \in \Omega(g(n))$ iff $|f(n)| \ge K|g(n)|$ for all $n > M$, for some constants $K > 0$ and $M$. That is, $\Omega(g(n))$ is the set of all functions that "grow at least as fast as" $g$ beyond some point. A little algebra suffices to show the relationship between $O(\cdot)$ and $\Omega(\cdot)$:
$$|f(n)| \ge K|g(n)| \;\equiv\; |g(n)| \le (1/K) \cdot |f(n)|$$

so

$$f(n) \in \Omega(g(n)) \iff g(n) \in O(f(n)).$$

Because of our cavalier treatment of constant factors, it is possible for a function $f(n)$ to be bounded both above and below by another function $g(n)$: $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$. For brevity, we write $f(n) \in \Theta(g(n))$, so that $\Theta(g(n)) = O(g(n)) \cap \Omega(g(n))$.

Just because we know that $f(n) \in O(g(n))$, we don't necessarily know that $f(n)$ gets much smaller than $g(n)$, or even (as illustrated in Figure 1.1a) that it is ever smaller than $g(n)$. We occasionally do want to say something like "$h(n)$ becomes negligible compared to $g(n)$." You sometimes see the notation $h(n) \ll g(n)$, meaning "$h(n)$ is much smaller than $g(n)$," but this could apply to a situation where $h(n) = 0.001 g(n)$. Not being interested in mere constant factors like this, we need something stronger. A traditional notation is "little-oh," defined as follows.

$$h(n) \in o(g(n)) \iff \lim_{n \to \infty} h(n)/g(n) = 0.$$

It's easy to see that if $h(n) \in o(g(n))$, then $h(n) \notin \Omega(g(n))$; no constant $K$ can work in the definition of $\Omega(\cdot)$. It is not the case, however, that all functions that are outside of $\Omega(g(n))$ must be in $o(g(n))$, as illustrated in Figure 1.1b.
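The definition of $O(\cdot)$ asks for two witnesses, $K$ and $M$. A finite loop cannot prove an asymptotic claim, but it can sanity-check a proposed pair of witnesses over a range of $n$. The sketch below assumes a made-up helper, boundedBy, and uses $5n^2 + 10\sqrt{n}$ against $n^2$ purely as sample functions.

```java
import java.util.function.DoubleUnaryOperator;

public class BigOhCheck {

    /** True iff |f(n)| <= K * |g(n)| for every integer n with
     *  M < n <= LIMIT.  A finite spot check of proposed witnesses K and M,
     *  not a proof that f(n) is in O(g(n)). */
    static boolean boundedBy(DoubleUnaryOperator f, DoubleUnaryOperator g,
                             double K, int M, int limit) {
        for (int n = M + 1; n <= limit; n += 1) {
            if (Math.abs(f.applyAsDouble(n)) > K * Math.abs(g.applyAsDouble(n))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = n -> 5.0 * n * n + 10.0 * Math.sqrt(n);
        DoubleUnaryOperator g = n -> n * n;

        // K = 6 with M = 5 holds across the tested range ...
        System.out.println(boundedBy(f, g, 6.0, 5, 100_000));
        // ... but the same K fails if M is too small (n = 1 breaks it).
        System.out.println(boundedBy(f, g, 6.0, 0, 100_000));
        // No fixed K keeps n^2 below K * n for long.
        System.out.println(boundedBy(g, n -> n, 1000.0, 0, 100_000));
    }
}
```

The second call shows why the definition needs $M$ as well as $K$: the bound is only required to hold eventually.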
1.2 Examples
You may have seen the big-Oh notation already in calculus courses. For example, Taylor's theorem tells us² that (under appropriate conditions)

$$f(x) = \underbrace{\frac{x^n}{n!} f^{[n]}(y)}_{\text{error term}} + \underbrace{\sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!}}_{\text{approximation}}$$

for some $y$ between 0 and $x$, where $f^{[k]}$ represents the $k$th derivative of $f$. Therefore, if $g(x)$ represents the maximum absolute value of $f^{[n]}$ between 0 and $x$, then we could also write the error term as

$$f(x) - \sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!} \in O\!\left(\frac{x^n}{n!}\, g(x)\right) = O(x^n g(x))$$

² Yes, I know it's a Maclaurin series here, but it's still Taylor's theorem.
f(n)                  Is contained in                              Is not contained in
--------------------  -------------------------------------------  ------------------------------------
1, 1 + 1/n            O(10000), O(√n), O(n), O(n^2), O(lg n),      O(1/n), O(e^(-n))
                      O(1 - 1/n)
                      Ω(1), Ω(1/n), Ω(1 - 1/n)                     Ω(n), Ω(√n), Ω(lg n), Ω(n^2)
                      Θ(1), Θ(1 - 1/n)                             Θ(n), Θ(n^2), Θ(lg n), Θ(√n)
                      o(n), o(√n), o(n^2)                          o(100 + e^(-n)), o(1)

log_k n, ⌊log_k n⌋,   O(n), O(n^ε), O(√n), O(log_k' n),            O(1)
⌈log_k n⌉             O(⌊log_k' n⌋), O(n / log_k' n)
                      Ω(1), Ω(log_k' n), Ω(⌊log_k' n⌋)             Ω(n^ε), Ω(√n)
                      Θ(log_k' n), Θ(⌊log_k' n⌋),                  Θ((log_k' n)^2), Θ(log_k' n + n)
                      Θ(log_k' n + 1000)
                      o(n), o(n^ε)

n, 100n + 15          O(.0005n - 1000), O(n^2), O(n lg n)          O(10000), O(lg n),
                                                                   O(n - n^2/10000), O(√n)
                      Ω(50n + 1000), Ω(√n), Ω(n + lg n),           Ω(n^2), Ω(n lg n)
                      Ω(1/n)
                      Θ(50n + 100), Θ(n + lg n)                    Θ(n^2), Θ(1)
                      o(n^3), o(n lg n)                            o(1000n), o(n^2 sin n)

n^2, 10n^2 + n        O(n^2 + 2n + 12), O(n^3), O(n^2 + √n)        O(n), O(n lg n), O(1)
                      Ω(n^2 + 2n + 12), Ω(n), Ω(1), Ω(n lg n)      Ω(n^3), Ω(n^2 lg n)
                      Θ(n^2 + 2n + 12), Θ(n^2 + lg n)              Θ(n), Θ(n sin n)
                                                                   o(50n^2 + 1000)

n^p                   O(p^n), O(n^p + 1000n^(p-1))                 O(n^(p-1)), O(1)
                      Ω(n^(p-ε))                                   Ω(n^(p+ε)), Ω(p^n)
                      Θ(n^p + n^(p-ε))                             Θ(n^(p+ε)), Θ(1)
                      o(p^n), o(n!), o(n^(p+ε))                    o((n + k)^p)

2^n, 2^n + n^p        O(n!), O(2^n - n^p), O(3^n), O(2^(n+p))      O(n^p), O((2 - δ)^n)
                      Ω(n^p), Ω((2 - δ)^n)                         Ω((2 + ε)^n), Ω(n!)
                      Θ(2^n + n^p)                                 Θ(2^(2n))
                      o(n 2^n), o(n!), o((2 + ε)^n)                o(2^(n+ε))

Table 1.1: Some examples of order relations. In the above, names other than n represent constants, with ε > 0, 0 < δ ≤ 1, p > 1, and k, k' > 1.
This bound on the error term holds for fixed $n$. It is, of course, a much weaker statement than the original (it allows the error to be much bigger than it really is).

You'll often see statements like this written with a little algebraic manipulation:

$$f(x) \in \sum_{0 \le k < n} f^{[k]}(0) \frac{x^k}{k!} + O(x^n g(x)).$$

To make sense of this sort of statement, we define addition (and so on) between functions ($a$, $b$, etc.) and sets of functions ($A$, $B$, etc.):

$$a + b = \lambda x.\, a(x) + b(x)$$
$$A + B = \{a + b \mid a \in A,\ b \in B\}$$
$$A + b = \{a + b \mid a \in A\}$$
$$a + B = \{a + b \mid b \in B\}$$

Similar definitions apply for multiplication, subtraction, and division. So if $a$ is $\sqrt{x}$ and $b$ is $\lg x$, then $a + b$ is a function whose value is $\sqrt{x} + \lg x$ for every (positive) $x$. $O(a(x)) + O(b(x))$ (or just $O(a) + O(b)$) is then the set of functions you can get by adding a member of $O(\sqrt{x})$ to a member of $O(\lg x)$. For example, $O(a)$ contains $5\sqrt{x} + 3$ and $O(b)$ contains $0.01 \lg x - 16$, so $O(a) + O(b)$ contains $5\sqrt{x} + 0.01 \lg x - 13$, among many others.
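The pointwise sum $a + b = \lambda x.\, a(x) + b(x)$ translates almost directly into Java lambdas. This is just a sketch of the definition; the class FunctionSum and the helpers plus and lg are names chosen for the example, and the two sample functions are the members of $O(\sqrt{x})$ and $O(\lg x)$ mentioned above.

```java
import java.util.function.DoubleUnaryOperator;

public class FunctionSum {

    /** The function a + b, defined pointwise: (a + b)(x) = a(x) + b(x). */
    static DoubleUnaryOperator plus(DoubleUnaryOperator a, DoubleUnaryOperator b) {
        return x -> a.applyAsDouble(x) + b.applyAsDouble(x);
    }

    /** Base-2 logarithm of X. */
    static double lg(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator inOa = x -> 5 * Math.sqrt(x) + 3;    // a member of O(sqrt x)
        DoubleUnaryOperator inOb = x -> 0.01 * lg(x) - 16;       // a member of O(lg x)
        DoubleUnaryOperator sum = plus(inOa, inOb);              // a member of O(a) + O(b)

        // At x = 16 this is 5*4 + 0.01*4 - 13, i.e. about 7.04.
        System.out.println(sum.applyAsDouble(16.0));
    }
}
```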
1.2.1 Demonstrating "Big-Ohness"

Suppose we want to show that $5n^2 + 10\sqrt{n} \in O(n^2)$. That is, we need to find $K$ and $M$ so that

$$|5n^2 + 10\sqrt{n}| \le |Kn^2|, \quad \text{for } n > M.$$

We realize that $n^2$ grows faster than $\sqrt{n}$, so it eventually gets bigger than $10\sqrt{n}$ as well. So perhaps we can take $K = 6$ and find $M > 0$ such that

$$5n^2 + 10\sqrt{n} \le 5n^2 + n^2 = 6n^2.$$

To get $10\sqrt{n} < n^2$, we need $10 < n^{3/2}$, or $n > 10^{2/3} \approx 4.64$. So choosing $M > 5$ certainly works.

1.3 Applications to Algorithm Analysis

In this course, we will usually deal with integer-valued functions arising from measuring the complexity of algorithms. Table 1.1 gives a few common examples of orders that we deal with and their containment relations, and the sections below give examples of simple algorithmic analyses that use them.
1.3.1 Linear search
Let's apply all of this to a particular program. Here's a tail-recursive linear search for seeing if a particular value is in a sorted array:

    /** True iff X is one of A[k]...A[A.length-1].
     *  Assumes A is increasing, k>=0. */
    static boolean isIn(int[] A, int k, int X) {
        if (k >= A.length)
            return false;
        else if (A[k] > X)
            return false;
        else if (A[k] == X)
            return true;
        else
            return isIn(A, k+1, X);
    }

This is essentially a loop. As a measure of its complexity, let's define $C_{isIn}(N)$ as the maximum number of instructions it executes for a call with k = 0 and A.length = $N$. By inspection, you can see that such a call will execute the first if test up to $N + 1$ times, the second and third up to $N$ times, and the tail-recursive call on isIn up to $N$ times. With one compiler³, each recursive call of isIn executes at most 14 instructions before returning or tail-recursively calling isIn. The initial call executes 18. That gives a total of at most $14N + 18$ instructions. If instead we count the number of comparisons k>=A.length, we get at most $N + 1$. If we count the number of comparisons against X or the number of fetches of A[k], we get at most $2N$. We could therefore say that the function giving the largest amount of time required to process an input of size $N$ is either in $O(14N + 18)$, $O(N + 1)$, or $O(2N)$. However, these are all the same set, and in fact all are equal to $O(N)$. Therefore, we may throw away all those messy integers and describe $C_{isIn}(N)$ as being in $O(N)$, thus illustrating the simplifying power of ignoring constant factors.

³ A version of gcc with the -O option, generating SPARC code for a Sun Sparcstation IPC workstation.

This bound is a worst-case time. For all arguments in which X<=A[0], the isIn function runs in constant time. That time bound (the best-case bound) is seldom very useful, especially when it applies to so atypical an input.

Giving an $O(\cdot)$ bound to $C_{isIn}(N)$ doesn't tell us that isIn must take time proportional to $N$ even in the worst case, only that it takes no more. In this particular case, however, the argument used above shows that the worst case is, in fact, at least proportional to $N$, so that we may also say that $C_{isIn}(N) \in \Omega(N)$. Putting the two results together, $C_{isIn}(N) \in \Theta(N)$.

In general, then, asymptotic analysis of the space or time required for a given algorithm involves the following.

• Deciding on an appropriate measure for the size of an input (e.g., length of an array or a list).

• Choosing a representative quantity to measure: one that is proportional to the "real" space or time required.

• Coming up with one or more functions that bound the quantity we've decided to measure, usually in the worst case.

• Possibly summarizing these functions by giving $O(\cdot)$, $\Omega(\cdot)$, or $\Theta(\cdot)$ characterizations of them.
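The steps above can be tried out on isIn itself by picking the comparison k >= A.length as the quantity to measure and counting it directly. The sketch below is an illustration only: the class LinearSearchCount, the counter field, and the helper count (which searches the array 0, 1, ..., N-1) are assumptions added for the experiment.

```java
public class LinearSearchCount {

    /** Number of times the test k >= A.length has been executed
     *  since the counter was last reset. */
    static int lengthTests = 0;

    /** True iff X is one of A[k]...A[A.length-1].
     *  Assumes A is increasing, k >= 0. */
    static boolean isIn(int[] A, int k, int X) {
        lengthTests += 1;               // the quantity we chose to measure
        if (k >= A.length)
            return false;
        else if (A[k] > X)
            return false;
        else if (A[k] == X)
            return true;
        else
            return isIn(A, k + 1, X);
    }

    /** The number of k >= A.length tests performed when searching for X
     *  in the increasing array 0, 1, ..., N-1. */
    static int count(int N, int X) {
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = i;
        }
        lengthTests = 0;
        isIn(A, 0, X);
        return lengthTests;
    }

    public static void main(String[] args) {
        for (int N = 1; N <= 1000; N *= 10) {
            // X = N is larger than every element (worst case);
            // X = -1 is smaller than A[0] (best case).
            System.out.println("N=" + N + ": worst " + count(N, N)
                               + ", best " + count(N, -1));
        }
    }
}
```

Searching for a value larger than every element forces N + 1 tests, while a value below A[0] needs only one, which lines up with the worst-case and best-case discussion in this section.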
1.3.2 Quadratic example

Here is a bit of code for sorting integers:

    static void sort(int[] A) {
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0 && x < A[j-1]; j -= 1)
                A[j] = A[j-1];
            A[j] = x;
        }
    }

If we define $C_{sort}(N)$ as the worst-case number of times the comparison x<A[j-1] is executed for $N$ = A.length, we see that for each value of i from 1 to A.length-1, the program executes the comparison in the inner loop (on j) at most i times. Therefore,

$$C_{sort}(N) = 1 + 2 + \cdots + (N - 1) = N(N - 1)/2 \in \Theta(N^2).$$

This is a common pattern for nested loops.
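The count $N(N-1)/2$ can be observed by instrumenting the sort. In this sketch the loop is restructured slightly so that each evaluation of x < A[j-1] can be counted; the class InsertionSortCount and the helper descending are invented for the illustration, and a strictly decreasing array is assumed to be a worst-case input.

```java
public class InsertionSortCount {

    /** Sort A into increasing order, returning the number of times the
     *  comparison x < A[j-1] was evaluated. */
    static long sort(int[] A) {
        long comparisons = 0;
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j;
            for (j = i; j > 0; j -= 1) {
                comparisons += 1;           // one evaluation of x < A[j-1]
                if (!(x < A[j - 1])) {
                    break;
                }
                A[j] = A[j - 1];
            }
            A[j] = x;
        }
        return comparisons;
    }

    /** The array N, N-1, ..., 1. */
    static int[] descending(int N) {
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = N - i;
        }
        return A;
    }

    public static void main(String[] args) {
        for (int N = 10; N <= 1000; N *= 10) {
            System.out.println("N=" + N + ": " + sort(descending(N))
                               + " comparisons, N(N-1)/2 = "
                               + ((long) N * (N - 1) / 2));
        }
    }
}
```

On an array that is already sorted, the same method reports only N - 1 comparisons, a reminder that the quadratic figure describes the worst case.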
1.3.3 Explosive example

Consider a function with the following form.

    static int boom(int M, int X) {
        if (M == 0)
            return H(X);
        return boom(M-1, Q(X))
            + boom(M-1, R(X));
    }

and suppose we want to compute $C_{boom}(M)$, the number of times Q is called for a given $M$ in the worst case. If $M = 0$, this is 0. If $M > 0$, then Q gets executed once in computing the argument of the first recursive call, and then it gets executed however many times the two inner calls of boom with arguments of $M - 1$ execute it. In other words,

$$C_{boom}(0) = 0$$
$$C_{boom}(i) = 2C_{boom}(i - 1) + 1$$

A little mathematical massage:

$$\begin{aligned}
C_{boom}(M) &= 2C_{boom}(M - 1) + 1, && \text{for } M \ge 1 \\
            &= 2(2C_{boom}(M - 2) + 1) + 1, && \text{for } M \ge 2 \\
            &\;\;\vdots \\
            &= \underbrace{2(\cdots(2}_{M} \cdot 0 + 1) + 1) \cdots + 1 \\
            &= \sum_{0 \le j \le M - 1} 2^j \\
            &= 2^M - 1
\end{aligned}$$

and so $C_{boom}(M) \in \Theta(2^M)$.

1.3.4 Divide and conquer

Things become more interesting when the recursive calls decrease the size of parameters by a multiplicative rather than an additive factor. Consider, for example, binary search.

    /** Returns true iff X is one of
     *  A[L]...A[U]. Assumes A increasing,
     *  L>=0, U-L < A.length. */
    static boolean isInB(int[] A, int L, int U, int X) {
        if (L > U)
            return false;
        else {
            int m = (L+U)/2;
            if (A[m] == X)
                return true;
            else if (A[m] > X)
                return isInB(A, L, m-1, X);
            else
                return isInB(A, m+1, U, X);
        }
    }

The worst-case time here depends on the number of elements of A under consideration, $U - L + 1$, which we'll call $N$. Let's use the number of times the first line is executed as the cost, since if the rest of the body is executed, the first line also had to have been executed⁴. If $N > 1$, the cost of executing isInB is 1 comparison of L and U followed by the cost of executing isInB either with $\lfloor (N - 1)/2 \rfloor$ or with $\lceil (N - 1)/2 \rceil$ as the new value of $N$⁵. Either quantity is no more than $\lceil (N - 1)/2 \rceil$. If $N \le 1$, there are two comparisons against $N$ in the worst case.

⁴ For those of you seeking arcane knowledge, we say that the test L>U dominates all other statements.

Therefore, the following recurrence describes the cost, $C_{isInB}(i)$, of executing this function when $U - L + 1 = i$.

$$C_{isInB}(1) = 2$$
$$C_{isInB}(i) = 1 + C_{isInB}(\lceil (i - 1)/2 \rceil), \quad i > 1.$$

This is a bit hard to deal with, so let's again make the reasonable assumption that the value of the cost function, whatever it is, must increase as $N$ increases. Then we can compute a cost function, $C'_{isInB}$, that is slightly larger than $C_{isInB}$, but easier to compute.

$$C'_{isInB}(1) = 2$$
$$C'_{isInB}(i) = 1 + C'_{isInB}(i/2), \quad i > 1 \text{ a power of 2.}$$

This is a slight over-estimate of $C_{isInB}$, but that still allows us to compute upper bounds. Furthermore, $C'_{isInB}$ is defined only on powers of two, but since isInB's cost increases as $N$ increases, we can still bound $C_{isInB}(N)$ conservatively by computing $C'_{isInB}$ of the next higher power of 2. Again with the massage:

$$\begin{aligned}
C'_{isInB}(N) &= 1 + C'_{isInB}(N/2), && N > 1 \text{ a power of 2} \\
              &= 1 + 1 + C'_{isInB}(N/4), && N > 2 \text{ a power of 2} \\
              &\;\;\vdots \\
              &= \underbrace{1 + \cdots + 1}_{\lg N} + 2
\end{aligned}$$

The quantity $\lg N$ is the logarithm of $N$ base 2, or roughly "the number of times one can divide $N$ by 2 before reaching 1." In summary, we can say $C_{isInB}(N) \in O(\lg N)$. Similarly, one can in fact derive that $C_{isInB}(N) \in \Theta(\lg N)$.
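Both closed forms, $2^M - 1$ calls to Q for boom and at most $\lg N + 2$ executions of the test L > U for isInB, can be compared against direct counts. The sketch below is a hedged illustration: H, Q, and R are given arbitrary stand-in bodies (the text leaves them unspecified), and the class RecursionCounts, its counters, and the helpers countQ and worstTests are all made up for the experiment.

```java
public class RecursionCounts {

    /** Calls to Q since the counter was last reset. */
    static long qCalls;
    /** Executions of the test L > U since the counter was last reset. */
    static int boundsTests;

    // Arbitrary stand-ins for the unspecified functions H, Q, and R.
    static int H(int x) { return x; }
    static int Q(int x) { qCalls += 1; return x + 1; }
    static int R(int x) { return x - 1; }

    static int boom(int M, int X) {
        if (M == 0)
            return H(X);
        return boom(M - 1, Q(X)) + boom(M - 1, R(X));
    }

    static boolean isInB(int[] A, int L, int U, int X) {
        boundsTests += 1;                   // marker step: the test L > U
        if (L > U)
            return false;
        int m = (L + U) / 2;
        if (A[m] == X)
            return true;
        else if (A[m] > X)
            return isInB(A, L, m - 1, X);
        else
            return isInB(A, m + 1, U, X);
    }

    /** Number of calls to Q made while evaluating boom(M, 0). */
    static long countQ(int M) {
        qCalls = 0;
        boom(M, 0);
        return qCalls;
    }

    /** Largest number of L > U tests over all searches of the N-element
     *  array 0, 2, ..., 2(N-1) for the values -1, 0, ..., 2N. */
    static int worstTests(int N) {
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = 2 * i;
        }
        int worst = 0;
        for (int X = -1; X <= 2 * N; X += 1) {
            boundsTests = 0;
            isInB(A, 0, N - 1, X);
            worst = Math.max(worst, boundsTests);
        }
        return worst;
    }

    public static void main(String[] args) {
        for (int M = 0; M <= 10; M += 1) {
            System.out.println("M=" + M + ": Q called " + countQ(M)
                               + " times, 2^M - 1 = " + ((1L << M) - 1));
        }
        for (int N = 1; N <= 1024; N *= 2) {
            int lgN = Integer.numberOfTrailingZeros(N);   // exact for powers of 2
            System.out.println("N=" + N + ": worst count " + worstTests(N)
                               + ", lg N + 2 = " + (lgN + 2));
        }
    }
}
```

Adding one to M doubles the work for boom, whereas doubling N adds only one more test for isInB, which is the contrast between the additive and multiplicative shrinking of the parameter described above.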
⁵ The notation $\lfloor x \rfloor$ means the result of rounding $x$ down (toward $-\infty$) to an integer, and $\lceil x \rceil$ means the result of rounding $x$ up to an integer.

1.3.5 Divide and fight to a standstill

Consider now a subprogram that contains two recursive calls.

    static void mung(int[] A, int L, int U) {
        if (L < U) {
            int m = (L+U)/2;
            mung(A, L, m);
            mung(A, m+1, U);
        }
    }

We can approximate the arguments of both of the internal calls by $N/2$ as before, ending up with the following approximation, $C_{mung}(N)$, to the cost of calling mung with argument $N = U - L + 1$ (we are counting the number of times the test in the first line executes).

$$C_{mung}(1) = 3$$
$$C_{mung}(i) = 1 + 2C_{mung}(i/2), \quad i > 1 \text{ a power of 2.}$$

So,

$$\begin{aligned}
C_{mung}(N) &= 1 + 2(1 + 2C_{mung}(N/4)), && N > 2 \text{ a power of 2} \\
            &\;\;\vdots \\
            &= 1 + 2 + 4 + \cdots + N/2 + N \cdot 3
\end{aligned}$$

This is a sum of a geometric series ($1 + r + r^2 + \cdots + r^m$), with a little extra added on. The general rule for geometric series is

$$\sum_{0 \le k \le m} r^k = (r^{m+1} - 1)/(r - 1)$$

so, taking $r = 2$,

$$C_{mung}(N) = 4N - 1$$

or $C_{mung}(N) \in \Theta(N)$.

1.4 Amortization

So far, we have considered the time spent by individual operations, or individual calls on a certain function of interest. Sometimes, however, it is fruitful to consider the cost of a whole sequence of calls, especially when each call affects the cost of later calls.

Consider, for example, a simple binary counter. Incrementing this counter causes it to go through a sequence like this:

    00000
    00001
    00010
    00011
    00100
      ···
    01111
    10000
      ···

Each step consists of flipping a certain number of bits, converting bit $b$ to $1 - b$. More precisely, the algorithm for going from one step to another is

Increment: Flip the bits of the counter from right to left, up to and including the first 0-bit encountered (if any).

Clearly, if we are asked to give a worst-case bound on the cost of the increment operation for an $N$-bit counter (in number of flips), we'd have to say that it is $\Theta(N)$: all the bits can be flipped. Using just that bound, we'd then have to say that the cost of performing $M$ increment operations is $\Theta(M \cdot N)$.

But the costs of consecutive increment operations are related. For example, if one increment flips more than one bit, the next increment will always flip exactly one (why?). In fact, if you consider the pattern of bit changes, you'll see that the units (rightmost) bit flips on every increment, the 2's bit on every second increment, the 4's bit on every fourth increment, and in general, the $2^k$'s bit on every $(2^k)$th increment. Therefore, over any sequence of $M$ consecutive increments, starting at 0, there will be

$$\underbrace{M}_{\text{unit's flips}} + \underbrace{\lfloor M/2 \rfloor}_{\text{2's flips}} + \underbrace{\lfloor M/4 \rfloor}_{\text{4's flips}} + \cdots + \underbrace{\lfloor M/2^n \rfloor}_{2^n\text{'s flips}}, \quad \text{where } n = \lfloor \lg M \rfloor,$$

$$\le M \left(1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^n}\right) = M (2 - 2^{-n}) < 2M \text{ flips.}$$

In other words, this is the same result we would get if we performed $M$ increments each of which had a worst-case cost of 2 flips, rather than $N$. We call 2 flips the amortized cost of an increment. To amortize in the context of algorithms is to treat the cost of each individual operation in a sequence as if it were spread out among all the operations in the sequence⁶. Any particular increment might take up to $N$ flips, but we treat that as $N/M$ flips credited to each increment operation in the sequence (and likewise count each increment that takes only one flip as $1/M$ flip for each increment operation). The result is that we get a more realistic idea of how much time the entire program will take; simply multiplying the ordinary worst-case time by $M$ gives us a very loose and pessimistic estimate. Nor is amortized cost the same as average cost; it is a stronger measure. If a certain operation has a given average cost, that leaves open the possibility that there is some unlikely sequence of inputs that will make it look bad. A bound on amortized worst-case cost, on the other hand, is guaranteed to hold regardless of input.

⁶ The word amortize comes from an Old French word meaning "to death." The original meaning from which the computer-science usage comes (introduced by Sleator and Tarjan), is "to gradually write off the initial cost of something."

⁷ Also due to D. Sleator.

Another way to reach the same result uses what is called the potential method⁷. The idea here is that we associate with our data structure (our bit sequence in this case) a non-negative potential that represents work we wish to spread out over several operations. If $c_i$ represents the actual cost of the $i$th operation on our data structure,
we define the amortized cost of the ith operation, a_i, so that

\[ a_i = c_i + \Phi_{i+1} - \Phi_i, \tag{1.1} \]

where Φ_i denotes the saved-up potential before the ith operation. That is, we give ourselves the choice of increasing Φ a little on any given operation and charging this increase against a_i, causing a_i > c_i when Φ increases. Alternatively, we can also decrease a_i below c_i by having an operation reduce Φ, in effect using up previously saved increases. Assuming we start with Φ_0 = 0, the total cost of n operations is

\[ \sum_{0 \le i < n} c_i \;\le\; \sum_{0 \le i < n} (a_i + \Phi_i - \Phi_{i+1}) \;=\; \Bigl(\sum_{0 \le i < n} a_i\Bigr) + \Phi_0 - \Phi_n \;=\; \Bigl(\sum_{0 \le i < n} a_i\Bigr) - \Phi_n \;\le\; \sum_{0 \le i < n} a_i, \tag{1.2} \]

since we require that Φ_i ≥ 0. These a_i therefore provide conservative estimates of the cumulative cost of the operations at each point.

Returning to our bit-flipping example, we'll define Φ_i as the total number of 1-bits before the ith operation. The cost of the ith increment is always one plus the number of 1-bits that flip back to 0, which, because of how we've defined it, can never be more than Φ_i (which of course is never negative). An increment that flips k 1-bits back to 0 costs c_i = 1 + k and changes Φ by 1 - k, so c_i + Φ_{i+1} - Φ_i = 2. So defining a_i = 2 for every operation satisfies Equation 1.1, again proving that we can bound the amortized cost of an increment by 2 bit-flips.

I fudged a bit here by assuming that our bit counter always starts at 0. If it started instead at N_0 > 0, and we stopped after a single increment, then the total cost (in bit flips) could be as much as 1 + ⌊lg(N_0 + 1)⌋. Since we want to ensure that the inequality 1.2 holds for any n, we'll have to do some adjusting to handle this case. A simple trick is to redefine Φ_0 = 0, keep other values of the Φ_i the same (the number of 1-bits before the ith operation), and finally define a_0 = c_0 + Φ_1. In effect, we charge a_0 with the start-up costs of our counting sequence. Of course, this means a_0 can be arbitrarily large, but that merely reflects reality; the remaining a_i are still constant.

1.5 Complexity of Problems

So far, I have discussed only the analysis of an algorithm's complexity. An algorithm, however, is just a particular way of solving some problem. We might therefore consider asking for bounds on the problem's complexity. That is, can we bound the complexity of the best possible algorithm? Obviously, if we have a particular algorithm and its time complexity is O(f(n)), where n is the size of the input, then the complexity of the best possible algorithm must also be O(f(n)). We call f(n), therefore, an upper bound on the (unknown) complexity of the best-possible algorithm. But this tells us nothing about whether the best-possible algorithm is any faster than this; it puts no lower bound on the time required for the best algorithm. For example, the worst-case time for isIn is Θ(N). However, isInB is much faster. Indeed, one can show that if the only knowledge the algorithm can have is the result of comparisons between X and elements of the array, then isInB has the best possible bound (it is optimal), so that the entire problem of finding an element in an ordered array has worst-case time Θ(lg N).
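The gap between Θ(N) and Θ(lg N) is easy to observe by counting how many array elements each kind of search examines. The sketch below does not reproduce the text's isIn and isInB; linearProbes and binaryProbes are hypothetical stand-ins, written here only for illustration, that search a sorted int array and report the number of elements they look at.

```java
/** Hypothetical stand-ins for a linear and a binary search. Each returns
 *  the number of array elements examined while looking for X in sorted A. */
class ProbeCount {

    /** Elements examined by a left-to-right scan of A for X. */
    static int linearProbes(int[] A, int X) {
        int probes = 0;
        for (int v : A) {
            probes += 1;
            if (v == X)
                break;
        }
        return probes;
    }

    /** Elements examined by a binary search of sorted A for X. */
    static int binaryProbes(int[] A, int X) {
        int probes = 0;
        int lo = 0, hi = A.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes += 1;
            if (A[mid] == X)
                break;
            else if (A[mid] < X)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return probes;
    }

    public static void main(String[] args) {
        int N = 1024;
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1)
            A[i] = 2 * i;              // sorted even numbers
        int missing = 2 * N - 1;       // larger than every element of A
        System.out.println(linearProbes(A, missing));
        System.out.println(binaryProbes(A, missing));
    }
}
```

With N = 1024 and a key larger than every element, the scan examines all 1024 elements while the binary search examines 11, which is ⌊lg N⌋ + 1.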
Putting an upper bound on the time required to solve some problem simply involves finding an algorithm for the problem. By contrast, putting a good lower bound on the required time is much harder. We essentially have to prove that no algorithm can have a better execution time than our bound, regardless of how much smarter the algorithm designer is than we are. Trivial lower bounds, of course, are easy: every problem's worst-case time is Ω(1), and the worst-case time of any problem whose answer depends on all the data is Ω(N), assuming that one's idealized machine is at all realistic. Better lower bounds than those, however, require quite a bit of work. All the better to keep our theoretical computer scientists employed.
1.6 Some Properties of Logarithms

Logarithms occur frequently in analyses of complexity, so it might be useful to review a few facts about them. In most math courses, you encounter the natural logarithm, ln x = log_e x, but computer scientists tend to use the base-2 logarithm, lg x = log_2 x, and in general this is what I mean when I say “logarithm.” Of course, all logarithms are related by a constant factor: since by definition a^{log_a x} = x = b^{log_b x}, it follows that

\[ \log_a x = \log_a b^{\log_b x} = (\log_a b)\,\log_b x. \]

Their connection to the exponential dictates their familiar properties:

\[ \lg xy = \lg x + \lg y \]
\[ \lg x/y = \lg x - \lg y \]
\[ \lg x^p = p \lg x \]

In complexity arguments, we are often interested in inequalities. The logarithm is a very slow-growing function:

\[ \lim_{x \to \infty} \lg x / x^p = 0, \quad \text{for all } p > 0. \]

It is strictly increasing and strictly concave, meaning that its values lie above any line segment joining points (x, lg x) and (z, lg z). To put it algebraically, if 0 < x < y < z, then

\[ \lg y > \frac{z - y}{z - x} \lg x + \frac{y - x}{z - x} \lg z. \]

Therefore, if x, y > 0 and x + y ≤ k, the value of lg x + lg y is maximized when x = y = k/2.
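These facts are easy to spot-check numerically. The following is only a sketch: the class LogFacts and its methods lg and chord are names invented here, lg is computed from the natural logarithm by the change-of-base rule, and chord evaluates at y the line segment joining (x, lg x) and (z, lg z).

```java
/** Hypothetical helper for spot-checking properties of lg numerically. */
class LogFacts {

    /** Base-2 logarithm of X, by change of base from the natural log. */
    static double lg(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** The height at Y of the line segment joining (X, lg X) and
     *  (Z, lg Z), assuming 0 < X < Y < Z. */
    static double chord(double x, double y, double z) {
        return (z - y) / (z - x) * lg(x) + (y - x) / (z - x) * lg(z);
    }

    public static void main(String[] args) {
        // lg xy = lg x + lg y, up to rounding error.
        System.out.println(lg(6.0) - (lg(2.0) + lg(3.0)));
        // Concavity: lg lies above the chord between 1 and 4.
        System.out.println(lg(2.0) > chord(1.0, 2.0, 4.0));
        // With x + y = 8, the balanced split x = y = 4 beats x = 3, y = 5.
        System.out.println(2 * lg(4.0) > lg(3.0) + lg(5.0));
    }
}
```

Because floating-point arithmetic rounds, the identities hold only approximately, so comparisons against exact values should allow a small tolerance.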
1.7 A Note on Notation

Other authors use notation such as f(n) = O(n^2) rather than f(n) ∈ O(n^2). I don't because I consider it nonsensical. To justify the use of ‘=’, one either has to think of f(n) as a set of functions (which it isn't), or think of O(n^2) as a single function that differs with each separate appearance of O(n^2) (which is bizarre). I can see no disadvantages to using ‘∈’, which makes perfect sense, so that's what I use.
Exercises

1.1. Demonstrate the following, or give counter-examples where indicated. Showing that a certain O(·) formula is true means producing suitable K and M for the definition at the beginning of §1.1. Hint: sometimes it is useful to take the logarithms of two functions you are comparing.

a. O(max(|f_0(n)|, |f_1(n)|)) = O(f_0(n)) + O(f_1(n)).

b. If f(n) is a polynomial in n, then lg f(n) ∈ O(lg n).

c. O(f(n) + g(n)) = O(f(n)) + O(g(n)). This is a bit of a trick question, really, to make you look at the definitions carefully. Under what conditions is the equation true?

d. There is a function f(x) > 0 such that f(x) ∈ O(x) and f(x) ∈ Ω(x).

e. There is a function f(x) such that f(0) = 0, f(1) = 100, f(2) = 10000, f(3) = 10^6, but f(n) ∈ O(n).

f. n^3 lg n ∈ O(n^{3.0001}).

g. There is no constant k such that n^3 lg n ∈ Θ(n^k).

1.2. Show each of the following false by exhibiting a counterexample. Assume that f and g are any real-valued functions.

a. O(f(x) · s(x)) = o(f(x)), assuming lim_{x→∞} s(x) = 0.

b. If f(x) ∈ O(x^3) and g(x) ∈ O(x) then f(x)/g(x) ∈ O(x^2).

c. If f(x) ∈ Ω(x) and g(x) ∈ Ω(x) then f(x) + g(x) ∈ Ω(x).

d. If f(100) = 1000 and f(1000) = 1000000 then f cannot be O(1).

e. If f_1(x), f_2(x), ... are a bunch of functions that are all in Ω(1), then

\[ F(N) = \sum_{1 \le i \le N} |f_i(x)| \in \Omega(N) \]
Chapter 2

Data Types in the Abstract

Most of the “classical” data structures covered in courses like this represent some sort of collection of data. That is, they contain some set or multiset¹ of values, possibly with some ordering on them. Some of these collections of data are associatively indexed; they are search structures that act like functions mapping certain indexing values (keys) into other data (such as names into street addresses).

We can characterize the situation in the abstract by describing sets of operations that are supported by different data structures, that is, by describing possible abstract data types. From the point of view of a program that needs to represent some kind of collection of data, this set of operations is all that one needs to know.

For each different abstract data type, there are typically several possible implementations. Which you choose depends on how much data your program has to process, how fast it has to process the data, and what constraints it has on such things as memory space. It is a dirty little secret of the trade that for quite a few programs, it hardly matters what implementation you choose. Nevertheless, the well-equipped programmer should be familiar with the available tools.

I expect that many of you will find this chapter frustrating, because it will talk mostly about interfaces to data types without talking very much at all about the implementations behind them. Get used to it. After all, the standard library behind any widely used programming language is presented to you, the programmer, as a set of interfaces: directions for what parameters to pass to each function and some commentary, generally in English, about what it does. As a working programmer, you will in turn spend much of your time producing modules that present the same features to your clients.

2.1 Iterators
If we are to develop some general notion of a collection of data, there is at least one generic question we'll have to answer: how are we going to get items out of such a collection? You are familiar with one kind of collection already: an array. Getting items out of an array is easy; for example, to print the contents of an array, you might write

    for (int i = 0; i < A.length; i += 1)
        System.out.print(A[i] + ", ");

¹ A multiset or bag is like a set except that it may contain multiple copies of a particular data value. That is, each member of a multiset has a multiplicity: a number of times that it appears.
Arrays have a natural notion of an nth element, so such loops are easy. But what about other collections? Which is the “first penny” in a jar of pennies? Even if we do arbitrarily choose to give every item in a collection a number, we will find that the operation “fetch the nth item” may be expensive (consider lists of things such as in Scheme).

The problem with attempting to impose a numbering on every collection of items as a way to extract them is that it forces the implementor of the collection to provide a more specific tool than our problem may require. It's a classic engineering trade-off: satisfying one constraint (that one be able to fetch the nth item) may have other costs (fetching all items one by one may become expensive).
So the problem is to provide the items in a collection without relying on indices, or possibly without relying on order at all. Java provides two conventions, realized as interfaces. The interface java.util.Iterator provides a way to access all the items in a collection in some order. The interface java.util.ListIterator provides a way to access items in a collection in some specific order, but without assigning an index to each item.²

² The library also defines the interface java.util.Enumeration, which is essentially an older version of the same idea. We won't talk about that interface here, since the official position is that Iterator is preferred for new programs.

2.1.1 The Iterator Interface

The Java library defines an interface, java.util.Iterator, shown in Figure 2.1, that captures the general notion of “something that sequences through all items in a collection” without any commitment to order. This is only a Java interface; there is no implementation behind it. In the Java library, the standard way for a class that represents a collection of data items to provide a way to sequence through those items is to define a method such as

    Iterator<SomeType> iterator() { ... }

that allocates and returns an Iterator (Figure 3.3 includes an example). Often the actual type of this iterator will be hidden (even private); all the user of the class needs to know is that the object returned by iterator provides the operations hasNext and next (and sometimes remove). For example, a general way to print all elements of a collection of Strings (analogous to the previous array printer) might be

    for (Iterator<String> i = C.iterator(); i.hasNext(); )
        System.out.print(i.next() + " ");

The programmer who writes this loop needn't know what gyrations the object i has to go through to produce the requested elements; even a major change in how C represents its collection requires no modification to the loop.

    package java.util;

    /** An object that delivers each item in some collection of items
     *  each of which is a T. */
    public interface Iterator<T> {

        /** True iff there are more items to deliver. */
        boolean hasNext();

        /** Advance THIS to the next item and return it. */
        T next();

        /** Remove the last item delivered by next() from the collection
         *  being iterated over. Optional operation: may throw
         *  UnsupportedOperationException if removal is not possible. */
        void remove();
    }

Figure 2.1: The java.util.Iterator interface.

This particular kind of for loop is so common and useful that in Java 2, version 1.5, it has its own “syntactic sugar,” known as an enhanced for loop. You can write

    for (String i : C)
        System.out.print(i + " ");

to get the same effect as the previous for loop. Java will insert the missing pieces, turning this into

    for (Iterator<String> ρ = C.iterator(); ρ.hasNext(); ) {
        String i = ρ.next();
        System.out.print(i + " ");
    }

where ρ is some new variable introduced by the compiler and unused elsewhere in the program, and whose type is taken from that of C.iterator(). This enhanced for loop will work for any object C whose type implements the interface java.lang.Iterable, defined simply

    public interface Iterable<T> {
        Iterator<T> iterator();
    }

Thanks to the enhanced for loop, simply by implementing Iterable (that is, by defining an iterator method) on a type you define, you provide a very convenient way to sequence through any subparts that objects of that type might contain.

Well, needless to say, having introduced this convenient shorthand for Iterators, Java's designers were suddenly in the position that iterating through the elements of an array was much clumsier than iterating through those of a library class. So they extended the enhanced for statement to encompass arrays. So, for example, these two methods are equivalent:
    /** The sum of the elements of A. */
    int sum(int[] A) {
        int S;
        S = 0;
        for (int x : A)
            S += x;
        return S;
    }

        ⇒

    /** The sum of the elements of A. */
    int sum(int[] A) {
        int S;
        S = 0;
        for (int κ = 0; κ < A.length; κ++) {
            int x = A[κ];
            S += x;
        }
        return S;
    }

where κ is a new variable introduced by the compiler.
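To make the point about Iterable concrete, here is a minimal sketch of a user-defined type that works with the enhanced for loop. The class IntRange is hypothetical (it is not part of the Java library or of this text); it represents the integers LOW through HIGH-1 and supplies its own iterator, choosing not to support the optional remove operation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical example type: the integers LOW, LOW+1, ..., HIGH-1. */
class IntRange implements Iterable<Integer> {
    private final int low, high;

    IntRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    /** An iterator that yields the members of THIS in increasing order. */
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            /** The next value to deliver. */
            private int cur = low;

            @Override
            public boolean hasNext() {
                return cur < high;
            }

            @Override
            public Integer next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                cur += 1;
                return cur - 1;
            }

            @Override
            public void remove() {
                // Optional operation; this sketch does not support it.
                throw new UnsupportedOperationException();
            }
        };
    }

    /** The sum of the members of R, computed with an enhanced for loop. */
    static int sum(IntRange r) {
        int s = 0;
        for (int x : r)
            s += x;
        return s;
    }
}
```

The loop in sum never mentions the iterator; the compiler calls r.iterator() and the hasNext and next methods behind the scenes, just as in the expansion shown earlier.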
2.1.2 The ListIterator Interface

Some collections do have a natural notion of ordering, but it may still be expensive to extract an arbitrary item from the collection by index. For example, you may have seen linked lists in the Scheme language: given an item in the list, it requires n operations to find the nth succeeding item (in contrast to a Java array, which requires only one Java operation or a few machine operations to retrieve any item).

The standard Java library contains the interface java.util.ListIterator, which captures the idea of sequencing through an ordered sequence without fetching each item explicitly by number. It is summarized in Figure 2.2. In addition to the “navigational” methods and the remove method of Iterator (which it extends), the ListIterator interface provides operations for inserting new items or replacing items in a collection.
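As an illustration of these extra operations, the sketch below edits a list in a single pass using only a ListIterator and no numeric indices. The class ListIteratorDemo and its method fixUp are hypothetical names chosen for this example; the list passed in is assumed to support the optional set and add operations, as java.util.LinkedList does.

```java
import java.util.List;
import java.util.ListIterator;

/** Hypothetical demonstration of ListIterator's set and add operations. */
class ListIteratorDemo {

    /** Replace each negative item of L with 0, and insert a copy of each
     *  positive item immediately after it, in one pass over L. */
    static void fixUp(List<Integer> L) {
        for (ListIterator<Integer> it = L.listIterator(); it.hasNext(); ) {
            int x = it.next();
            if (x < 0)
                it.set(0);   // replaces the item just returned by next()
            else if (x > 0)
                it.add(x);   // goes in before the cursor, so is not revisited
        }
    }
}
```

Applied to a LinkedList containing 3, -1, 0, 2, this leaves 3, 3, 0, 0, 2, 2. On a linked list each step takes constant time, whereas the same edits written with get and indexed add would have to walk the list from one end on every call.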
2.2 The Java Collection Abstractions

The Java library (beginning with JDK 1.2) provides a hierarchy of interfaces representing various kinds of collection, plus a hierarchy of abstract classes to help programmers provide implementations of these interfaces, as well as a few actual (“concrete”) implementations. These classes are all found in the package java.util. Figure 2.4 illustrates the hierarchy of classes and interfaces devoted to collections.

2.2.1 The Collection Interface

The Java library interface java.util.Collection, whose methods are summarized in Figures 2.5 and 2.6, is supposed to describe data structures that contain collections of values, where each value is a reference to some Object (or null). The term “collection” as opposed to “set” is appropriate here, because Collection is supposed to be able to describe multisets (bags) as well as ordinary mathematical sets.
    package java.util;

    /** Abstraction of a position in an ordered collection. At any
     *  given time, THIS represents a position (called its cursor)
     *  that is just after some number of items of type T (0 or more) of
     *  a particular collection, called the underlying collection. */
    public interface ListIterator<T> extends Iterator<T> {

        /* Exceptions: Methods that return items from the collection throw
         * NoSuchElementException if there is no appropriate item. Optional
         * methods throw UnsupportedOperationException if the method is not
         * supported. */

        /* Required methods: */

        /** True unless THIS is past the last item of the collection. */
        boolean hasNext();

        /** True unless THIS is before the first item of the collection. */
        boolean hasPrevious();

        /** Returns the item immediately after the cursor, and
         *  moves the current position to just after that item.
         *  Throws NoSuchElementException if there is no such item. */
        T next();

        /** Returns the item immediately before the cursor, and
         *  moves the current position to just before that item.
         *  Throws NoSuchElementException if there is no such item. */
        T previous();

        /** The number of items before the cursor. */
        int nextIndex();

        /** nextIndex() - 1. */
        int previousIndex();

        /* Optional methods: */

        /** Insert item X into the underlying collection immediately before
         *  the cursor (X will be returned by previous()). */
        void add(T x);

        /** Remove the item returned by the most recent call to .next()
         *  or .previous(). There must not have been a more recent
         *  call to .add(). */
        void remove();

        /** Replace the item returned by the most recent call to .next()
         *  or .previous() with X in the underlying collection.
         *  There must not have been a more recent call to .add() or
         *  .remove(). */
        void set(T x);
    }

Figure 2.2: The java.util.ListIterator interface.

[Diagram not reproduced. Interfaces: Map, SortedMap. Abstract class: AbstractMap. Concrete classes: HashMap, WeakHashMap, TreeMap.]

Figure 2.3: The Java library's Map-related types (from java.util). Ellipses represent interfaces; dashed boxes are abstract classes, and solid boxes are concrete (non-abstract) classes. Solid arrows indicate extends relationships, and dashed arrows indicate implements relationships. The abstract classes are for use by implementors wishing to add new collection classes; they provide default implementations of some methods. Clients apply new to the concrete classes to get instances, and (at least ideally) use the interfaces as formal parameter types so as to make their methods as widely applicable as possible.

[Diagram not reproduced. Interfaces: Collection, List, Set, SortedSet. Abstract classes: AbstractCollection, AbstractList, AbstractSequentialList, AbstractSet. Concrete classes: ArrayList, Vector, LinkedList, Stack, HashSet, TreeSet.]

Figure 2.4: The Java library's Collection-related types (from java.util). See Figure 2.3 for the notation.
Since this is an interface, the documentation comments describing the operations need not be accurate; an inept or mischievous programmer can write a class that implements Collection in which the add method removes values. Nevertheless, any decent implementor will honor the comments, so that any method that accepts a Collection, C, as an argument can expect that, after executing C.add(x), the value x will be in C.

Not every kind of Collection needs to implement every method (specifically, not the optional methods in Figure 2.6), but may instead choose to raise the standard exception UnsupportedOperationException. See §2.5 for a further discussion of this particular design choice. Classes that implement only the required methods are essentially read-only collections; they can't be modified once they are created.
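The library itself contains read-only collections of this kind. As a small sketch, the hypothetical helper rejectsAdd below (the class and method names are invented for this example) probes whether a given collection supports the optional add operation; Collections.unmodifiableList is a standard library method that wraps a list in a read-only view.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical probe for the optional add operation. */
class ReadOnlyDemo {

    /** True iff C refuses add by throwing UnsupportedOperationException.
     *  Note that if C is modifiable, X may be added to it as a side effect. */
    static boolean rejectsAdd(Collection<String> C, String x) {
        try {
            C.add(x);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<String>();
        List<String> readOnly = Collections.unmodifiableList(plain);
        System.out.println(rejectsAdd(readOnly, "x"));   // true
        System.out.println(rejectsAdd(plain, "x"));      // false
    }
}
```

The compiler accepts both calls to add, because both objects have type List<String>; whether the operation is supported is discovered only at execution time. This is the cost of the design choice discussed in §2.5.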
The comment concerning constructors in Figure 2.5 is, of course, merely a comment. Java interfaces do not have constructors, since they do not represent specific types of concrete object. Nevertheless, you ultimately need some constructor to create a Collection in the first place, and the purpose of the comment is to suggest some useful uniformity.

At this point, you may well be wondering of what possible use the Collection interface might be, inasmuch as it is impossible to create one directly (it is an interface), and you are missing details about what its members do (for example, can a given Collection have two equal elements?). The point is that any function that you can write using just the information provided in the Collection interface will work for all implementations of Collection.
For example, here is a simple method to determine if the elements of one Collection are a subset of another:

    /** True iff C0 is a subset of C1, ignoring repetitions. */
    public static boolean subsetOf(Collection<?> C0, Collection<?> C1) {
        for (Object i : C0)
            if (!C1.contains(i))
                return false;
        // Note: equivalent to
        //   for (Iterator<?> iter = C0.iterator(); iter.hasNext(); ) {
        //       Object i = iter.next();
        //       ...
        return true;
    }

We have no idea what kinds of objects C0 and C1 are (they might be completely different implementations of Collection), in what order their iterators deliver elements, or whether they allow repetitions. This method relies solely on the properties described in the interface and its comments, and therefore always works (assuming, as always, that the programmers who write classes that implement Collection do their jobs). We don't have to rewrite it for each new kind of Collection we implement.
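As a usage sketch, the subsetOf method can be applied to two unrelated implementations of Collection without change. The wrapper class SubsetDemo is a name invented here so that the example is self-contained; the choice of ArrayList and HashSet as arguments is likewise just for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

/** Hypothetical wrapper class exercising subsetOf on different kinds
 *  of Collection. */
class SubsetDemo {

    /** True iff C0 is a subset of C1, ignoring repetitions. */
    static boolean subsetOf(Collection<?> C0, Collection<?> C1) {
        for (Object i : C0)
            if (!C1.contains(i))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // A list with a repeated element and a set with an extra one.
        Collection<String> list = new ArrayList<String>(Arrays.asList("a", "b", "a"));
        Collection<String> set = new HashSet<String>(Arrays.asList("a", "b", "c"));
        System.out.println(subsetOf(list, set));   // true
        System.out.println(subsetOf(set, list));   // false: "c" is missing
    }
}
```

How fast each call runs does depend on the argument types, since contains on an ArrayList scans the list while contains on a HashSet does not, but the result is the same for any correct implementation.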
    package java.util;

    /** A collection of values, each an Object reference. */
    public interface Collection<T> extends Iterable<T> {

        /* Constructors. Classes that implement Collection should
         * have at least two constructors:
         *     CLASS(): Constructs an empty CLASS
         *     CLASS(C): Where C is any Collection, constructs a CLASS that
         *               contains the same elements as C. */

        /* Required methods: */

        /** The number of values in THIS. */
        int size();

        /** True iff size() == 0. */
        boolean isEmpty();

        /** True iff THIS contains X: that is, if for some z in
         *  THIS, either z and X are null, or z.equals(X). */
        boolean contains(Object x);

        /** True iff contains(x) for all elements x in C. */
        boolean containsAll(Collection<?> c);

        /** An iterator that yields all the elements of THIS, in some
         *  order. */
        Iterator<T> iterator();

        /** A new array containing all elements of THIS. */
        Object[] toArray();

        /** Assuming ANARRAY has dynamic type T[] (where T is some
         *  reference type), the result is an array of type T[] containing
         *  all elements of THIS. The result is ANARRAY itself, if all of
         *  these elements fit (leftover elements of ANARRAY are set to null).
         *  Otherwise, the result is a new array. It is an error if not
         *  all items in THIS are assignable to T. */
        <T> T[] toArray(T[] anArray);

Figure 2.5: The interface java.util.Collection, required members.
        // Interface java.util.Collection, continued.

        /* Optional methods. Any of these may do nothing except to
         * throw UnsupportedOperationException. */

        /** Cause X to be contained in THIS. Returns true if the Collection
         *  changes as a result. */
        boolean add(T x);

        /** Cause all members of C to be contained in THIS. Returns true
         *  if the object THIS changes as a result. */
        boolean addAll(Collection<? extends T> c);

        /** Remove all members of THIS. */
        void clear();

        /** Remove an object .equal to X from THIS, if one exists,
         *  returning true iff the object THIS changes as a result. */
        boolean remove(Object X);

        /** Remove all elements, x, such that C.contains(x) (if any
         *  are present), returning true iff there were any
         *  objects removed. */
        boolean removeAll(Collection<?> c);

        /** Intersection: Remove all elements, x, such that C.contains(x)
         *  is false, returning true iff any items were removed. */
        boolean retainAll(Collection<?> c);
    }

Figure 2.6: Optional members of the interface java.util.Collection.
2.2.2 The Set Interface

In mathematics, a set is a collection of values in which there are no duplicates. This is the idea also for the interface java.util.Set. Unfortunately, this provision is not directly expressible in the form of a Java interface. In fact, as far as the Java compiler is concerned, the following serves as a perfectly good definition:

    package java.util;

    public interface Set<T> extends Collection<T> { }

The methods, that is, are all the same. The differences are all in the comments. The one-copy-of-each-element rule is reflected in more specific comments on several methods. The result is shown in Figure 2.7. In this definition, we also include the methods equals and hashCode. These methods are automatically part of any interface, because they are defined in the Java class java.lang.Object, but I included them here because their semantic specification (the comment) is more stringent than for the general Object. The idea, of course, is for equals to denote set equality. We'll return to hashCode in Chapter 7.
2.2.3 The List Interface

As the term is used in the Java libraries, a list is a sequence of items, possibly with repetitions. That is, it is a specialized kind of Collection, one in which there is a sequence to the elements (a first item, a last item, an nth item) and items may be repeated (it can't be considered a Set). As a result, it makes sense to extend the interface (relative to Collection) to include additional methods that make sense for well-ordered sequences. Figure 2.8 displays the interface.

A great deal of functionality here is wrapped up in the listIterator method and the object it returns. As you can see from the interface descriptions, you can insert, add, remove, or sequence through items in a List either by using methods in the List interface itself, or by using listIterator to create a list iterator with which you can do the same. The idea is that using the listIterator to process an entire list (or some part of it) will generally be faster than using get and other methods of List that use numeric indices to denote items of interest.
Views

The subList method is particularly interesting. A call such as L.subList(i,j) is supposed to produce another List (which will generally not be of the same type as L) consisting of the ith through the (j-1)th items of L. Furthermore, it is to do this by providing a view of this part of L; that is, an alternative way of accessing the same data containers. The idea is that modifying the sublist (using methods such as add, remove, and set) is supposed to modify the corresponding portion of L as well. For example, to remove all but the first k items in list L, you might write

    L.subList(k, L.size()).clear();
    package java.util;

    /** A Collection that contains at most one null item and in which no
     *  two distinct non-null items are .equal. The effects of modifying
     *  an item contained in a Set so as to change the value of .equal
     *  on it are undefined. */
    public interface Set<T> extends Collection<T> {

        /* Constructors. Classes that implement Set should
         * have at least two constructors:
         *     CLASS(): Constructs an empty CLASS
         *     CLASS(C): Where C is any Collection, constructs a CLASS that
         *               contains the same elements as C, with duplicates
         *               removed. */

        /** Cause X to be contained in THIS. Returns true iff X was
         *  not previously a member. */
        boolean add(T x);

        /** True iff S is a Set (instanceof Set) and is equal to THIS as a
         *  set (size() == S.size() and each item in S is contained in
         *  THIS). */
        boolean equals(Object S);

        /** The sum of the values of x.hashCode() for all x in THIS, with
         *  the hashCode of null taken to be 0. */
        int hashCode();

        /* Other methods inherited from Collection:
         * size, isEmpty, contains, containsAll, iterator, toArray,
         * addAll, clear, remove, removeAll, retainAll */
    }

Figure 2.7: The interface java.util.Set. Only methods with comments that are more specific than those of Collection are shown.
    package java.util;

    /** An ordered sequence of items, indexed by numbers 0 .. N-1,
     *  where N is the size() of the List. */
    public interface List<T> extends Collection<T> {

        /* Required methods: */

        /** The Kth element of THIS, where 0 <= K < size(). Throws
         *  IndexOutOfBoundsException if K is out of range. */
        T get(int k);

        /** The first value k such that get(k) is null if X == null,
         *  X.equals(get(k)) otherwise, or -1 if there is no such k. */
        int indexOf(Object x);

        /** The largest value k such that get(k) is null if X == null,
         *  X.equals(get(k)) otherwise, or -1 if there is no such k. */
        int lastIndexOf(Object x);

        /* NOTE: The methods iterator, listIterator, and subList produce
         * views that become invalid if THIS is structurally modified by
         * any other means (see text). */

        /** An iterator that yields all the elements of THIS, in proper
         *  index order. (NOTE: it is always valid for iterator() to
         *  return the same value as would listIterator, below.) */
        Iterator<T> iterator();

        /** A ListIterator that yields the elements K, K+1, ..., size()-1
         *  of THIS, in that order, where 0 <= K <= size(). Throws
         *  IndexOutOfBoundsException if K is out of range. */
        ListIterator<T> listIterator(int k);

        /** Same as listIterator(0). */
        ListIterator<T> listIterator();

        /** A view of THIS consisting of the elements L, L+1, ..., U-1,
         *  in that order. Throws IndexOutOfBoundsException unless
         *  0 <= L <= U <= size(). */
        List<T> subList(int L, int U);

        /* Other methods inherited from Collection:
         * add, addAll, size, isEmpty, contains, containsAll, remove, toArray */

        /* Optional methods: */

        /** Cause item K of THIS to be X, and items K+1, K+2, ... to contain
         *  the previous values of get(K), get(K+1), .... Throws
         *  IndexOutOfBoundsException unless 0 <= K <= size(). */
        void add(int k, T x);

        /** Same effect as add(size(), x); always returns true. */
        boolean add(T x);

        /** If the elements returned by C.iterator() are x0, x1, ..., in
         *  that order, then perform the equivalent of add(K, x0),
         *  add(K+1, x1), ..., returning true iff there was anything to
         *  insert. Throws IndexOutOfBoundsException unless
         *  0 <= K <= size(). */
        boolean addAll(int k, Collection<? extends T> c);

        /** Same as addAll(size(), c). */
        boolean addAll(Collection<? extends T> c);

        /** Remove item K, moving items K+1, ... down one index position,
         *  and returning the removed item. Throws
         *  IndexOutOfBoundsException if there is no item K. */
        T remove(int k);

        /** Remove the first item equal to X, if any, moving subsequent
         *  elements one index position lower. Return true iff anything
         *  was removed. */
        boolean remove(Object x);

        /** Replace get(K) with X, returning the initial (replaced) value of
         *  get(K). Throws IndexOutOfBoundsException if there is no item K. */
        T set(int k, T x);

        /* Other methods inherited from Collection: removeAll, retainAll */
    }

Figure 2.8: Required and optional methods of interface java.util.List, beyond those inherited from Collection.
As a result, there are a lot of possible operations on List that don't have to be defined, because they fall out as a natural consequence of operations on sublists. There is no need for a version of remove that deletes items i through j of a list, or for a version of indexOf that starts searching at item k.
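Both of those "missing" operations can be sketched in a line or two on top of subList. The names removeRange and indexOfFrom below are hypothetical helpers written for this example, and removeRange follows the half-open convention of subList, deleting items I through J-1.

```java
import java.util.List;

/** Hypothetical helpers built from sublist views. */
class SubListDemo {

    /** Remove items I through J-1 of L. */
    static void removeRange(List<?> L, int i, int j) {
        L.subList(i, j).clear();   // clearing the view shrinks L itself
    }

    /** The index in L of the first item .equal to X at or after
     *  position K, or -1 if there is none. */
    static int indexOfFrom(List<?> L, Object x, int k) {
        int r = L.subList(k, L.size()).indexOf(x);
        return r < 0 ? -1 : r + k;   // translate back to an index in L
    }
}
```

For example, removing the range 1 to 3 from the list a, b, c, d, e leaves a, d, e, and searching x, y, x, z for "x" starting at position 1 finds it at index 2.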
Iterators (including ListIterators) provide another example of a view of Collections. Again, you can access or (sometimes) modify the current contents of a Collection through an iterator that its methods supply. For that matter, any Collection is itself a view: the “identity view,” if you want.

Whenever there are two possible views of the same entity, there is a possibility that using one of them to modify the entity will interfere with the other view. It's not just that changes in one view are supposed to be seen in other views (as in the example of clearing a sublist, above), but straightforward and fast implementations of some views may malfunction when the entity being viewed is changed by other means. What is supposed to happen when you call remove on an iterator, but the item that is supposed to be removed (according to the specification of Iterator) has already been removed directly (by calling remove on the full Collection)? Or suppose you have a sublist containing items 2 through 4 of some full list. If the full list is cleared, and then 3 items are added to it, what is in the sublist view?

Because of these quandaries, the full specifications of many view-producing methods (in the List interface, these are iterator, listIterator, and subList) have a provision that the view becomes invalid if the underlying List is structurally modified (that is, if items are added or removed) through some means other than that view. Thus, the result of L.iterator() becomes invalid if you perform L.add(...), or if you perform remove on some other Iterator or sublist produced from L. By contrast, we will also encounter views, such as those produced by the values method on Map (see Figure 2.12), that are supposed to remain valid even when the underlying object is structurally modified; it is an obligation on the implementors of new kinds of Map that they see that this is so.
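The library's own ArrayList shows what an invalidated view can look like in practice. Its iterators are documented as fail-fast: on a best-effort basis, they throw ConcurrentModificationException when used after the list has been structurally modified behind their backs. The sketch below relies on that behavior of ArrayList specifically (other implementations of List need not behave this way); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demonstration of an invalidated iterator on ArrayList. */
class InvalidViewDemo {

    /** True iff an iterator over an ArrayList throws
     *  ConcurrentModificationException when used after the list is
     *  modified directly. */
    static boolean iteratorInvalidated() {
        List<String> L = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = L.iterator();
        it.next();
        L.add("d");   // structural modification by means other than IT
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** A list a, b, c from which b has been removed through the iterator
     *  itself, which keeps that iterator valid. */
    static List<String> removeViaIterator() {
        List<String> L = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = L.iterator(); it.hasNext(); )
            if (it.next().equals("b"))
                it.remove();
        return L;
    }
}
```

The exception is a debugging aid rather than a guarantee. The specification says only that the view is invalid, so a program should not depend on being told about its mistake.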
2.2.4 Ordered Sets

The List interface describes data types that represent sequences in which the programmer explicitly determines the order of items in the sequence by the order or place in which they are added to the sequence. By contrast, the SortedSet interface is intended to describe sequences in which the data determine the ordering according to some selected relation. Of course, this immediately raises a question: in Java, how do we represent this “selected relation” so that we can specify it? How do we make an ordering relation a parameter?

Orderings: the Comparable and Comparator Interfaces
There are various ways for functions to define an ordering over some set of objects. One way is to define boolean operations equals, less, greater, etc., with the obvious meanings. Libraries in the C family of languages (which includes Java) tend to combine all of these into a single function that returns an integer whose sign denotes the relation. For example, on the type String, x.compareTo("cat") returns an integer that is zero, negative, or positive, depending on whether x equals "cat", comes before it in lexicographic order, or comes after it. Thus, the ordering x ≤ y on Strings corresponds to the condition x.compareTo(y) <= 0.

For the purposes of the SortedSet interface, this ≤ (or ≥) ordering represented by compareTo (or compare, described below) is intended to be a total ordering. That is, it is supposed to be transitive (x ≤ y and y ≤ z implies x ≤ z), reflexive (x ≤ x), and antisymmetric (x ≤ y and y ≤ x implies that x equals y). Also, for all x and y in the function's domain, either x ≤ y or y ≤ x.

Some classes (such as String) define their own standard comparison operation. The standard way to do so is to implement the Comparable interface, shown in Figure 2.9. However, not all classes have such an ordering, nor is the natural ordering necessarily what you want in any given case. For example, one can sort Strings in dictionary order, reverse dictionary order, or case-insensitive order.

    package java.lang;

    /** Describes types that have a natural ordering. */
    public interface Comparable<T> {
        /** Returns
         *    * a negative value iff THIS < Y under the natural ordering;
         *    * a positive value iff THIS > Y;
         *    * 0 iff THIS and Y are "equivalent".
         *  Throws ClassCastException if THIS and Y are incomparable. */
        int compareTo(T y);
    }

Figure 2.9: The interface java.lang.Comparable, which marks classes that define a natural ordering.

In the Scheme language, there is no particular problem: an ordering relation is just a function, and functions are perfectly good values in Scheme. To a certain extent, the same is true in languages like C and Fortran, where functions can be used as arguments to subprograms, but unlike Scheme, have access only to global variables (what are called static fields or class variables in Java). Java does not directly support functions as values, but it turns out that this is not a limitation. The Java standard library defines the Comparator interface (Figure 2.10) to represent things that may be used as ordering relations.

The methods provided by both of these interfaces are supposed to be proper total orderings. However, as usual, none of the conditions can actually be enforced by the Java language; they are just conventions imposed by comment. The programmer who violates these assumptions may cause all kinds of unexpected behavior. Likewise, nothing can keep you from defining a compare operation that is inconsistent with the .equals function. We say that compare (or compareTo) is consistent with equals if x.equals(y) iff C.compare(x,y)==0. It's generally good practice to maintain this consistency in the absence of a good reason to the contrary.
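To illustrate an ordering passed as a parameter, and what consistency with equals buys you, here is a small sketch. The comparator ByLength and the helper distinct are hypothetical names written for this example; String.CASE_INSENSITIVE_ORDER and java.util.TreeSet (a concrete sorted set that accepts a Comparator in its constructor) are standard library members.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparator: orders Strings by length, breaking ties by
 *  natural (lexicographic) order, so that it is consistent with equals. */
class ByLength implements Comparator<String> {
    @Override
    public int compare(String x, String y) {
        if (x.length() != y.length())
            return x.length() < y.length() ? -1 : 1;
        return x.compareTo(y);
    }
}

/** Hypothetical helper that uses a Comparator as a parameter. */
class ComparatorDemo {

    /** The number of distinct items among ITEMS, where two items count
     *  as the same iff ORDER.compare returns 0 on them. */
    static int distinct(Comparator<String> order, String... items) {
        Set<String> s = new TreeSet<String>(order);
        for (String item : items)
            s.add(item);
        return s.size();
    }
}
```

With the arguments "cat", "CAT", and "a", the call distinct(new ByLength(), ...) counts three items, while distinct(String.CASE_INSENSITIVE_ORDER, ...) counts two: the case-insensitive comparator returns 0 for "cat" and "CAT" even though they are not .equal, so a set ordered by it treats them as duplicates. That is the kind of surprise an ordering inconsistent with equals can cause.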
    package java.util;

    /** An ordering relation on certain pairs of objects. */
    public interface Comparator<T> {
        /** Returns
         *    * a negative value iff X < Y according to THIS ordering;
         *    * a positive value iff X > Y;
         *    * 0 iff X and Y are "equivalent" under the order.
         *  Throws ClassCastException if X and Y are incomparable. */
        int compare(T x, T y);

        /** True if ORD is the "same" ordering as THIS. It is legal to return
         *  false (conservatively) even if ORD does define the same ordering,
         *  but should return true only if ORD.compare(X, Y) and
         *  THIS.compare(X, Y) always have the same value. */
        boolean equals(Object ord);
    }

Figure 2.10: The interface java.util.Comparator, which represents ordering relations between Objects.

The SortedSet Interface

The SortedSet interface shown in Figure 2.11 extends the Set interface so that its iterator method delivers an Iterator that sequences through its contents “in order.” It also provides additional methods that make sense only when there is such an order. There are intended to be two ways to define this ordering: either the programmer supplies a Comparator when constructing a SortedSet that defines the order, or else the contents of the set must be Comparable, and their natural order is used.

    package java.util;

    public interface SortedSet<T> extends Set<T> {

        /* Constructors. Classes that implement SortedSet should define
         * at least the constructors
         *     CLASS(): An empty set ordered by natural order (compareTo).
         *     CLASS(CMP): An empty set ordered by the Comparator CMP.
         *     CLASS(C): A set containing the items in Collection C, in
         *               natural order.
         *     CLASS(S): A set containing a copy of SortedSet S, with the
         *               same order.
         */

        /** The comparator used by THIS, or null if natural ordering used. */
        Comparator<? super T> comparator();

        /** The first (smallest) item in THIS according to its ordering. */
        T first();

        /** The last (largest) item in THIS according to its ordering. */
        T last();

        /* NOTE: The methods headSet, tailSet, and subSet produce
         * views that become invalid if THIS is structurally modified by
         * any other means. */

        /** A view of all items in THIS that are strictly less than X. */
        SortedSet<T> headSet(T x);

        /** A view of all items in THIS that are >= X. */
        SortedSet<T> tailSet(T x);

        /** A view of all items, y, in THIS such that X0 <= y < X1. */
        SortedSet<T> subSet(T X0, T X1);
    }

Figure 2.11: The interface java.util.SortedSet.

2.3 The Java Map Abstractions

The term map or mapping is used in computer science and elsewhere as a synonym for function in the mathematical sense: a correspondence between items in some set (the domain) and another set (the codomain) in which each item of the domain corresponds to (is mapped to) a single item of the codomain.³

It is typical among programmers to take a rather operational view, and say that a map-like data structure “looks up” a given key (domain value) to find the associated value (codomain value). However, from a mathematical point of view, a perfectly good interpretation is that a mapping is a set of pairs, (d, c), where d is a member of the domain, and c of the codomain.

³ Any number of members of the domain, including zero, may correspond to a given member of the codomain. The subset of the codomain that is mapped to by some member of the domain is called the range of the mapping, or the image of the domain under the mapping.
2.3.1 The Map Interface

The standard Java library uses the java.util.Map interface, displayed in Figures 2.12 and 2.13, to capture these notions of “mapping.” This interface provides both the view of a map as a look-up operation (with the method get) and the view of a map as a set of ordered pairs (with the method entrySet). This in turn requires some representation for “ordered pair,” provided here by the nested interface Map.Entry. A programmer who wishes to introduce a new kind of map therefore defines not only a concrete class to implement the Map interface, but another one to implement Map.Entry.

2.3.2 The SortedMap Interface

An object that implements java.util.SortedMap is supposed to be a Map in which the set of keys is ordered. As you might expect, the operations are analogous to those of the interface SortedSet, as shown in Figure 2.15.
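As a small illustration of these operations, the following sketch uses the library class java.util.TreeMap, which implements SortedMap. The class name SortedMapViews and the sample data are made up for this example; only the SortedMap methods themselves come from the library.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMapViews {
    /** A sample map from names to ages whose keys are kept in natural
     *  (alphabetical) order. The data are invented for illustration. */
    static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> ages = new TreeMap<>();
        ages.put("Tom", 31);
        ages.put("Ann", 25);
        ages.put("John", 40);
        ages.put("George", 52);
        return ages;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> ages = sample();
        System.out.println(ages.firstKey());        // smallest key
        System.out.println(ages.lastKey());         // largest key
        System.out.println(ages.headMap("John"));   // keys strictly less than "John"
        System.out.println(ages.tailMap("John"));   // keys >= "John"
        System.out.println(ages.subMap("B", "K"));  // keys k with "B" <= k < "K"
    }
}
```

The three view methods mirror headSet, tailSet, and subSet on SortedSet: each takes keys as bounds, with the lower bound inclusive and the upper bound exclusive.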
2.4 An Example

Consider the problem of reading in a sequence of pairs of names, (n_i, m_i). We wish to create a list of all the first members, n_i, in alphabetical order, and, for each of them, a list of all names m_i that are paired with them, with each m_i appearing once, and listed in the order of first appearance. Thus, the input

    John Mary George Jeff Tom Bert George Paul John Peter
    Tom Jim George Paul Ann Cyril John Mary George Eric

might produce the output

    Ann: Cyril
    George: Jeff Paul Eric
    John: Mary Peter
    Tom: Bert Jim

We can use some kind of SortedMap to handle the n_i and for each, a List to handle the m_i. A possible method (taking a Reader as a source of input and a PrintWriter as a destination for output) is shown in Figure 2.16.
    package java.util;
    public interface Map<Key, Val> {
        /* Constructors: Classes that implement Map should
         * have at least two constructors:
         *   CLASS():  Constructs an empty CLASS
         *   CLASS(M): Where M is any Map, constructs a CLASS that
         *             denotes the same abstract mapping as M. */

        /* Required methods: */

        /** The number of keys in the domain of THIS map. */
        int size();
        /** True iff size() == 0 */
        boolean isEmpty();

        /* NOTE: The methods keySet, values, and entrySet produce views
         * that remain valid even if THIS is structurally modified. */

        /** The domain of THIS. */
        Set<Key> keySet();
        /** The range of THIS. */
        Collection<Val> values();
        /** A view of THIS as the set of all its (key, value) pairs. */
        Set<Map.Entry<Key, Val>> entrySet();

        /** True iff keySet().contains(KEY) */
        boolean containsKey(Object key);
        /** True iff values().contains(VAL). */
        boolean containsValue(Object val);
        /** The value mapped to by KEY, or null if KEY is not
         *  in the domain of THIS. */
        Val get(Object key);

        /** True iff M is a Map and THIS and M represent the same mapping. */
        boolean equals(Object M);
        /** The sum of the hashCode values of all members of entrySet(). */
        int hashCode();

        static interface Entry { ... // See Figure 2.14
        }

Figure 2.12: Required methods of the interface java.util.Map.
    // Interface java.util.Map, continued

        /* Optional methods: */

        /** Set the domain of THIS to the empty set. */
        void clear();
        /** Cause get(KEY) to yield VAL, without disturbing other values. */
        Val put(Key key, Val val);
        /** Add all members of M.entrySet() to the entrySet() of THIS. */
        void putAll(Map<? extends Key, ? extends Val> M);
        /** Remove KEY from the domain of THIS. */
        Val remove(Object key);
    }

Figure 2.13: Optional methods of the interface java.util.Map.

    /** Represents a (key, value) pair from some Map. In general, an Entry
     *  is associated with a particular underlying Map value. Operations that
     *  change the Entry (specifically setValue) are reflected in that
     *  Map. Once an entry has been removed from a Map as a result of
     *  remove or clear, further operations on it may fail. */
    static interface Entry<Key, Val> {
        /** The key part of THIS. */
        Key getKey();
        /** The value part of THIS. */
        Val getValue();
        /** Cause getValue() to become VAL, returning the previous value. */
        Val setValue(Val val);
        /** True iff E is a Map.Entry and both represent the same (key, value)
         *  pair (i.e., keys are both null, or are .equal, and likewise for
         *  values). */
        boolean equals(Object e);
        /** An integer hash value that depends only on the hashCode values
         *  of getKey() and getValue() according to the formula:
         *     (getKey() == null ? 0 : getKey().hashCode())
         *     ^ (getValue() == null ? 0 : getValue().hashCode()) */
        int hashCode();
    }

Figure 2.14: The nested interface java.util.Map.Entry, which is nested within the java.util.Map interface.
    package java.util;
    public interface SortedMap<Key, Val> extends Map<Key, Val> {
        /* Constructors: Classes that implement SortedMap should
         * have at least four constructors:
         *   CLASS():    An empty map whose keys are ordered by natural order.
         *   CLASS(CMP): An empty map whose keys are ordered by the Comparator CMP.
         *   CLASS(M):   A map that is a copy of Map M, with keys ordered
         *               in natural order.
         *   CLASS(S):   A map containing a copy of SortedMap S, with
         *               keys obeying the same ordering.
         */

        /** The comparator used by THIS, or null if natural ordering used. */
        Comparator<? super Key> comparator();

        /** The first (smallest) key in the domain of THIS according to
         *  its ordering */
        Key firstKey();

        /** The last (largest) key in the domain of THIS according to
         *  its ordering */
        Key lastKey();

        /* NOTE: The methods headMap, tailMap, and subMap produce views
         * that remain valid even if THIS is structurally modified. */

        /** A view of THIS consisting of the restriction to all keys in the
         *  domain that are strictly less than KEY. */
        SortedMap<Key, Val> headMap(Key key);

        /** A view of THIS consisting of the restriction to all keys in the
         *  domain that are greater than or equal to KEY. */
        SortedMap<Key, Val> tailMap(Key key);

        /** A view of THIS restricted to the domain of all keys, y,
         *  such that KEY0 <= y < KEY1. */
        SortedMap<Key, Val> subMap(Key key0, Key key1);
    }

Figure 2.15: The interface java.util.SortedMap, showing methods not included in Map.
    import java.util.*;
    import java.io.*;

    class Example {
        /** Read (n_i, m_i) pairs from INP, and summarize all
         *  pairings for each n_i in order on OUT. */
        static void correlate(Reader inp, PrintWriter out)
            throws IOException
        {
            Scanner scn = new Scanner(inp);
            SortedMap<String, List<String>> associatesMap
                = new TreeMap<String, List<String>>();
            while (scn.hasNext()) {
                String n = scn.next();
                // Scanner.next never returns null, so a missing second
                // name is detected with hasNext.
                if (!scn.hasNext())
                    throw new IOException("bad input format");
                String m = scn.next();
                List<String> associates = associatesMap.get(n);
                if (associates == null) {
                    associates = new ArrayList<String>();
                    associatesMap.put(n, associates);
                }
                if (!associates.contains(m))
                    associates.add(m);
            }
            // Entries come out in key order because the map is sorted.
            for (Map.Entry<String, List<String>> e : associatesMap.entrySet()) {
                out.format("%s:", e.getKey());
                for (String s : e.getValue())
                    out.format(" %s", s);
                out.println();
            }
        }
    }

Figure 2.16: An example using SortedMaps and Lists.
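To check the approach against the sample input given earlier in this section, here is a compact, self-contained variant. It is only a sketch: the class name CorrelateDemo and the method summarize are invented for this example, it takes its input as a String and returns the summary as a String instead of using a Reader and a PrintWriter, and it uses Map.computeIfAbsent (Java 8 and later) in place of the explicit null test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;

class CorrelateDemo {
    /** Return a summary of the whitespace-separated (n, m) name pairs in
     *  TEXT: one line per distinct n, in alphabetical order, listing each
     *  paired m once, in order of first appearance. */
    static String summarize(String text) {
        SortedMap<String, List<String>> associatesMap = new TreeMap<>();
        Scanner scn = new Scanner(text);
        while (scn.hasNext()) {
            String n = scn.next();
            if (!scn.hasNext())
                throw new IllegalArgumentException("bad input format");
            String m = scn.next();
            // Create the list for n on first sight, then reuse it.
            List<String> associates =
                associatesMap.computeIfAbsent(n, k -> new ArrayList<>());
            if (!associates.contains(m))
                associates.add(m);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : associatesMap.entrySet()) {
            out.append(e.getKey()).append(':');
            for (String s : e.getValue())
                out.append(' ').append(s);
            out.append('\n');
        }
        return out.toString();
    }
}
```

The sorted order of the first names comes entirely from the TreeMap; the order of first appearance within each line comes from appending to an ArrayList only when the name is not already present.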
2.5 Managing Partial Implementations: Design Options

Throughout the Collection interfaces, you saw (in comments) that certain operations were “optional.” Their specifications gave the implementor leave to use

    throw new UnsupportedOperationException();

as the body of the operation. This provides an elegant enough way not to implement something, but it raises an important design issue. Throwing an exception is a dynamic action. In general, the compiler will have no comment about the fact that you have written a program that must inevitably throw such an exception; you will discover only upon testing the program that the implementation you have chosen for some data structure is not sufficient.

An alternative design would split the interfaces into smaller pieces, like this:
    public interface ConstantIterator<T> {
        Required methods of Iterator
    }
    public interface Iterator<T> extends ConstantIterator<T> {
        void remove();
    }
    public interface ConstantCollection<T> {
        Required methods of Collection
    }
    public interface Collection<T> extends ConstantCollection<T> {
        Optional methods of Collection
    }
    public interface ConstantSet<T> extends ConstantCollection<T> {
    }
    public interface Set<T> extends ConstantSet<T>, Collection<T> {
    }
    public interface ConstantList<T> extends ConstantCollection<T> {
        Required methods of List
    }
    public interface List<T> extends Collection<T>, ConstantList<T> {
        Optional methods of List
    }
    etc....
With such a design the compiler could catch attempts to call unsupported methods, so that you wouldn’t need testing to discover a gap in your implementation.

However, such a redesign would have its own costs. It’s not quite as simple as the listing above makes it appear. Consider, for example, the subList method in ConstantList. Presumably, this would most sensibly return a ConstantList, since if you are not allowed to alter a list, you cannot be allowed to alter one of its views. That means, however, that the type List would need two subList methods (with differing names), the one inherited from ConstantList, and a new one that produces a List as its result, which would allow modification. Similar considerations apply to the results of the iterator method; there would have to be two: one to return a ConstantIterator, and the other to return Iterator. Furthermore, this proposed redesign would not deal with an implementation of List that allowed one to add items, or clear all items, but not remove individual items. For that, you would either still need the UnsupportedOperationException or an even more complicated nest of classes.

Evidently, the Java designers decided to accept the cost of leaving some problems to be discovered by testing in order to simplify the design of their library. By contrast, the designers of the corresponding standard libraries in C++ opted to distinguish operations that work on any collections from those that work only on “mutable” collections. However, they did not design their library out of interfaces; it is awkward at best to introduce new kinds of collection or map in the C++ library.
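The dynamic nature of optional operations discussed in this section can be seen in a few lines. In the sketch below, the class name OptionalOperationDemo and the helper tryAdd are invented for illustration; Collections.unmodifiableList is a standard library method that wraps a list in a read-only view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OptionalOperationDemo {
    /** Try to add X to L. Returns true if the add went through, and false
     *  if L does not support the optional add operation. */
    static boolean tryAdd(List<String> L, String x) {
        try {
            L.add(x);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>();
        List<String> readOnly = Collections.unmodifiableList(mutable);
        // Both calls compile: the static type is List<String> in each case,
        // so the compiler cannot tell which list lacks add.
        System.out.println(tryAdd(mutable, "a"));
        System.out.println(tryAdd(readOnly, "b"));
    }
}
```

Only running the program reveals that the second list rejects add, which is exactly the cost the Java designers accepted.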
Chapter 3

Meeting a Specification

In Chapter 2, we saw and exercised a number of abstract interfaces, abstract in the sense that they describe the common features, the method signatures, of whole families of types without saying anything about the internals of those types and without providing a way to create any concrete objects that implement those interfaces.

In this chapter, we get a little closer to concrete representations, by showing one way to fill in the blanks. In one sense, these won’t be serious implementations; they will use “naive,” rather slow data structures. Our purpose, rather, will be one of exercising the machinery of object-oriented programming to illustrate ideas that you can apply elsewhere.

To help implementors who wish to introduce new implementations of the abstract interfaces we’ve covered, the Java standard library provides a parallel collection of abstract classes with some methods filled in. Once you’ve supplied a few key methods that remain unimplemented, you get all the rest “for free”. These partial implementation classes are not intended to be used directly in most ordinary programs, but only as implementation aids for library writers. Here is a list of these classes and the interfaces they partially implement (all from the package java.util):
    Abstract Class            Interfaces
    AbstractCollection        Collection
    AbstractSet               Collection, Set
    AbstractList              Collection, List
    AbstractSequentialList    Collection, List
    AbstractMap               Map
The idea of using partial implementations in this way is an instance of a design pattern called Template Method. The term design pattern in the context of object-oriented programming has come to mean “the core of a solution to a particular commonly occurring problem in program design.”[1] The Abstract... classes are used as templates for real implementations. Using method overriding, the implementor fills in a few methods; everything else in the template uses those methods.[2] In the sections to follow, we’ll look at how these classes are used and we’ll look at some of their internals for ideas about how to use some of the features of Java classes. But first, let’s have a quick look at the alternative.

[1] The seminal work on the topic is the excellent book by E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995. This group and their book are often referred to as “The Gang of Four.”

    import java.util.*;
    import java.lang.reflect.Array;

    public class ArrayCollection<T> implements Collection<T> {
        private T[] data;

        /** An empty Collection */
        public ArrayCollection() { data = (T[]) new Object[0]; }

        /** A Collection consisting of the elements of C */
        public ArrayCollection(Collection<? extends T> C) {
            data = C.toArray((T[]) new Object[C.size()]);
        }

        /** A Collection consisting of a view of the elements of A. */
        public ArrayCollection(T[] A) { data = A; }

        public int size() { return data.length; }

        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int k = 0;
                public boolean hasNext() { return k < size(); }
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    k += 1;
                    return data[k-1];
                }
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }

        public boolean isEmpty() { return size() == 0; }

        public boolean contains(Object x) {
            for (T y : this) {
                if (x == null && y == null
                    || x != null && x.equals(y))
                    return true;
            }
            return false;
        }

Figure 3.1: Implementation of a new kind of read-only Collection “from scratch.”

        public boolean containsAll(Collection<?> c) {
            for (Object x : c)
                if (!contains(x))
                    return false;
            return true;
        }

        public Object[] toArray() { return toArray(new Object[size()]); }

        public <E> E[] toArray(E[] anArray) {
            if (anArray.length < size()) {
                Class<?> typeOfElement = anArray.getClass().getComponentType();
                anArray = (E[]) Array.newInstance(typeOfElement, size());
            }
            // Copy from the stored data into the result array.
            System.arraycopy(data, 0, anArray, 0, size());
            return anArray;
        }

        private boolean UNSUPPORTED() {
            throw new UnsupportedOperationException();
        }

        public boolean add(T x) { return UNSUPPORTED(); }
        public boolean addAll(Collection<? extends T> c) { return UNSUPPORTED(); }
        public void clear() { UNSUPPORTED(); }
        public boolean remove(Object x) { return UNSUPPORTED(); }
        public boolean removeAll(Collection<?> c) { return UNSUPPORTED(); }
        public boolean retainAll(Collection<?> c) { return UNSUPPORTED(); }
    }

Figure 3.1, continued: Since this is a read-only collection, the methods for modifying the collection all throw UnsupportedOperationException, the standard way to signal unsupported features.
3.1 Doing it from Scratch

For comparison, let’s suppose we wanted to introduce a simple implementation that simply allowed us to treat an ordinary array of Objects as a read-only Collection. The direct way to do so is shown in Figure 3.1. Following the specification of Collection, the first two constructors for ArrayCollection provide ways of forming an empty collection (not terribly useful, of course, since you can’t add to it) and a copy of an existing collection. The third constructor is specific to the new class, and provides a view of an array as a Collection; that is, the items in the Collection are the elements of the array, and the operations are those of the Collection interface. Next come the required methods. The Iterator that is returned by iterator has an anonymous type; no user of ArrayCollection can create an object of this type directly. Since this is a read-only collection, the optional methods (which modify collections) are all unsupported.

A Side Excursion on Reflection. The implementation of the second toArray method is rather interesting, in that it uses a fairly exotic feature of the Java language: reflection. This term refers to language features that allow one to manipulate constructs of a programming language within the language itself. In English, we employ reflection when we say something like “The word ‘hit’ is a verb.” The specification of toArray calls for us to produce an array of the same dynamic type as the argument. To do so, we first use the method getClass, which is defined on all Objects, to get a value of the built-in type java.lang.Class that stands for (reflects) the dynamic type of the anArray argument. One of the operations on type Class is getComponentType, which, for an array type, fetches the Class that reflects the type of its elements. Finally, the newInstance method (defined in the class java.lang.reflect.Array) creates a new array object, given its size and the Class for its component type.
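The three reflective steps just described (getClass, getComponentType, Array.newInstance) can be isolated in a small helper. This is a sketch only: the class ReflectiveCopy and the method copyInto are hypothetical names chosen for this example and are not part of the Java library.

```java
import java.lang.reflect.Array;

class ReflectiveCopy {
    /** Copy all of SRC into MODEL if it fits; otherwise copy into a new
     *  array that has the same dynamic component type as MODEL and length
     *  SRC.length. Returns the array that received the copy. */
    @SuppressWarnings("unchecked")
    static <E> E[] copyInto(Object[] src, E[] model) {
        if (model.length < src.length) {
            // Step 1 and 2: find the Class that reflects MODEL's element type.
            Class<?> elementType = model.getClass().getComponentType();
            // Step 3: create a fresh array of that element type.
            model = (E[]) Array.newInstance(elementType, src.length);
        }
        System.arraycopy(src, 0, model, 0, src.length);
        return model;
    }
}
```

Passing new String[0] as the model, for instance, yields a result whose dynamic type is String[] even though the source array is declared as Object[]. Without reflection, generic code could only allocate an Object[] here.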
[2] While the name Template Method may be appropriate for this design pattern, I must admit that it has some unfortunate clashes with other uses of the terminology. First, the library defines whole classes, while the name of the pattern focuses on individual methods within that class. Second, the term template has another meaning within object-oriented programming; in C++ (and apparently in upcoming revisions of Java), it refers to a particular language construct.

3.2 The AbstractCollection Class

The implementation of ArrayCollection has an interesting feature: the methods starting with isEmpty make no mention of the private data of ArrayCollection, but instead rely entirely on the other (public) methods. As a result, they could be employed verbatim in the implementation of any Collection class. The standard Java library class AbstractCollection exploits this observation (see Figure 3.2). It is a partially implemented abstract class that new kinds of Collection can extend. At a bare minimum, an implementor can override just the definitions of iterator and size to get a read-only collection class. For example, Figure 3.3 shows an easier re-write of ArrayCollection. If, in addition, the programmer overrides the add method, then AbstractCollection will automatically provide addAll as well. Finally, if the iterator method returns an Iterator that supports the remove method, then AbstractCollection will automatically provide clear, remove, removeAll, and retainAll.

In programs, the idea is to use AbstractCollection only in an extends clause. That is, it is simply a utility class for the benefit of implementors creating new kinds of Collection, and should not generally be used to specify the type of a formal parameter, local variable, or field. This, by the way, is the explanation for declaring the constructor for AbstractCollection to be protected; that keyword emphasizes the fact that only extensions of AbstractCollection will call it.

You’ve already seen five examples of how AbstractCollection might work in Figure 3.1: methods isEmpty, contains, containsAll, and the two toArray methods. Once you get the general idea, it is fairly easy to produce such method bodies. The exercises ask you to produce a few more.
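To see how little an implementor has to write, here is a sketch of a read-only collection built on java.util.AbstractCollection. The class IntRange is a made-up example, not a library class; it supplies only size and iterator, and inherits contains, containsAll, isEmpty, toArray, and toString.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** A read-only Collection of the integers LOW, LOW+1, ..., HIGH-1.
 *  (Hypothetical example class.) */
class IntRange extends AbstractCollection<Integer> {
    private final int low, high;

    IntRange(int low, int high) {
        this.low = low;
        this.high = Math.max(low, high);   // an inverted range is empty
    }

    @Override
    public int size() { return high - low; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int cur = low;
            @Override public boolean hasNext() { return cur < high; }
            @Override public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                cur += 1;
                return cur - 1;
            }
            @Override public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

With these two methods in place, new IntRange(3, 7).contains(5) and new IntRange(3, 7).toString() work without further code, while add still throws UnsupportedOperationException because the default implementation was not overridden.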
3.3 Implementing the List Interface

The abstract classes AbstractList and AbstractSequentialList are specialized extensions of the class AbstractCollection provided by the Java standard library to help define classes that implement the List interface. Which you choose depends on the nature of the representation used for the concrete list type being implemented.

3.3.1 The AbstractList Class

The abstract implementation of List, AbstractList, sketched in Figure 3.4 is intended for representations that provide fast (generally constant time) random access to their elements; that is, representations with a fast implementation of get and (if supplied) remove. Figure 3.5 shows how listIterator works, as a partial illustration. There are a number of interesting techniques illustrated by this class.

Protected methods. The method removeRange is not part of the public interface. Since it is declared protected, it may only be called within other classes in the package java.util, and within the bodies of extensions of AbstractList. Such methods are implementation utilities for use in the class and its extensions. In the standard implementation of AbstractList, removeRange is used to implement clear (which might not sound too important until you remember that L.subList(k0, k1).clear() is how one removes an arbitrary section of a List).
    package java.util;
    public abstract class AbstractCollection<T> implements Collection<T> {
        /** The empty Collection. */
        protected AbstractCollection() { }

        /** Unimplemented methods that must be overridden in any
         *  non-abstract class that extends AbstractCollection */

        /** The number of values in THIS. */
        public abstract int size();

        /** An iterator that yields all the elements of THIS, in some
         *  order. If the remove operation is supported on this iterator,
         *  then remove, removeAll, clear, and retainAll on THIS will work. */
        public abstract Iterator<T> iterator();

        /** Override this default implementation to support adding */
        public boolean add(T x) {
            throw new UnsupportedOperationException();
        }

        Default, general-purpose implementations of
            contains(Object x), containsAll(Collection c), isEmpty(),
            toArray(), toArray(Object[] A),
            addAll(Collection c), clear(), remove(Object x),
            removeAll(Collection c), and retainAll(Collection c)

        /** A String representing THIS, consisting of a comma-separated
         *  list of the values in THIS, as returned by its iterator,
         *  surrounded by square brackets ([]). The elements are
         *  converted to Strings by String.valueOf (which returns "null"
         *  for the null pointer and otherwise calls the .toString() method). */
        public String toString() { ... }
    }

Figure 3.2: The abstract class java.util.AbstractCollection, which may be used to help implement new kinds of Collection. All the methods behave as specified in the specification of Collection. Implementors must fill in definitions of iterator and size, and may either override the other methods, or simply use their default implementations (not shown here).
    import java.util.*;

    /** A read-only Collection whose elements are those of an array. */
    public class ArrayCollection<T> extends AbstractCollection<T> {
        private T[] data;

        /** An empty Collection */
        public ArrayCollection() {
            data = (T[]) new Object[0];
        }

        /** A Collection consisting of the elements of C */
        public ArrayCollection(Collection<? extends T> C) {
            data = C.toArray((T[]) new Object[C.size()]);
        }

        /** A Collection consisting of a view of the elements of A. */
        public ArrayCollection(T[] A) {
            data = A;
        }

        public int size() { return data.length; }

        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int k = 0;
                public boolean hasNext() { return k < size(); }
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    k += 1;
                    return data[k-1];
                }
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

Figure 3.3: Re-implementation of ArrayCollection, using the default implementations from java.util.AbstractCollection.
The default implementation of removeRange simply calls remove(k) repeatedly and so is not particularly fast. But if a particular List representation allows some better strategy, then the programmer can override removeRange, getting better performance for clear (that’s why the default implementation of the method is not declared final, even though it is written to work for any representation of List).

Checking for Invalidity. As we discussed in §2.2.3, the iterator, listIterator, and subList methods of the List interface produce views of a list that “become invalid” if the list is structurally changed. Implementors of List are under no particular obligation to do anything sensible for the programmer who ignores this provision; using an invalidated view may produce unpredictable results or throw an unexpected exception, as convenient. Nevertheless, the AbstractList class goes to some trouble to provide a way to explicitly check for this error, and immediately throw a specific exception, ConcurrentModificationException, if it happens. The field modCount (declared protected to indicate it is intended for List implementors, not users) keeps track of the number of structural modifications to an AbstractList. Every call to add or remove (either on the List directly or through a view) is supposed to increment it. Individual views can then keep track of the last value they “saw” for the modCount field of their underlying List and throw an exception if it seems to have changed in the interim. We’ll see an example in Figure 3.5.

Helper Classes. The subList method of AbstractList (at least in Sun’s implementation) uses a non-public utility type java.util.SubList to produce its result. Because it is not public, java.util.SubList is in effect private to the java.util package, and is not an official part of the services provided by that package. However, being in the same package, it is allowed to access the non-public fields (modCount) and utility methods (removeRange) of AbstractList. This is an example of Java’s mechanism for allowing “trusted” classes (those in the same package) access to the internals of a class while excluding access from other “untrusted” classes.
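The invalidity check described above is easy to observe with java.util.ArrayList, whose iterators perform this kind of modification-count check. The class InvalidViewDemo and its two methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class InvalidViewDemo {
    /** Returns true iff an iterator that is used after its list has been
     *  structurally modified behind its back throws
     *  ConcurrentModificationException. */
    static boolean staleIteratorFails() {
        List<String> L = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = L.iterator();
        it.next();
        L.add("d");              // structural change not made through IT
        try {
            it.next();           // the iterator notices the changed count
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removing through the iterator itself keeps the iterator valid.
     *  Returns the list [a, b, c] with "b" removed during iteration. */
    static List<String> removeViaIterator() {
        List<String> L = new ArrayList<>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = L.iterator(); it.hasNext(); ) {
            if (it.next().equals("b"))
                it.remove();
        }
        return L;
    }
}
```

The second method succeeds because the iterator updates its own record of the modification count after each change it makes itself.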
3.3.2 The AbstractSequentialList Class

The second abstract implementation of List, AbstractSequentialList (Figure 3.6), is intended for use with representations where random access is relatively slow, but the next operation of the list iterator is still fast.

The reason for having a distinct class for this case becomes clear when you consider the implementations of get and the next methods of the iterators. If we assume a fast get method, then it is easy to implement the iterators to have fast next methods, as was shown in Figure 3.5. If get is slow (specifically, if the only way to retrieve item k of the list is to sequence through the preceding k items), then implementing next as in that figure would be disastrous; it would require Θ(N^2) operations to iterate through an N-element list. So using get to implement the iterators is not always a good idea.
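The quadratic cost can be checked by counting. The sketch below uses a deliberately minimal singly linked list, CountingList (a hypothetical class written only for this illustration), that counts how many nodes are visited. Under this model, get(k) visits k + 1 nodes, so iterating with get visits 1 + 2 + ... + N = N(N + 1)/2 nodes, while following next links visits N.

```java
class CountingList {
    private static class Node {
        final int item;
        final Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }

    private Node head;
    private final int size;
    private long steps;      // number of nodes visited so far

    /** A list containing 0, 1, ..., N-1. */
    CountingList(int n) {
        for (int k = n - 1; k >= 0; k -= 1)
            head = new Node(k, head);
        size = n;
    }

    /** Item K, found by sequencing from the front of the list. */
    int get(int k) {
        Node p = head;
        steps += 1;
        for (int i = 0; i < k; i += 1) {
            p = p.next;
            steps += 1;
        }
        return p.item;
    }

    /** Nodes visited when every item is fetched with get. */
    long stepsUsingGet() {
        steps = 0;
        for (int k = 0; k < size; k += 1)
            get(k);
        return steps;
    }

    /** Nodes visited when the list is traversed by following links. */
    long stepsUsingNext() {
        steps = 0;
        for (Node p = head; p != null; p = p.next)
            steps += 1;
        return steps;
    }
}
```

For N = 100 the two traversals visit 5050 and 100 nodes respectively, which is the gap between Θ(N^2) and Θ(N).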
    package java.util;

    public abstract class AbstractList<T>
        extends AbstractCollection<T> implements List<T> {

        /** Construct an empty list. */
        protected AbstractList() { modCount = 0; }

        public abstract T get(int index);
        public abstract int size();

        public T set(int k, T x) { return (T) UNSUPPORTED(); }
        public void add(int k, T x) { UNSUPPORTED(); }
        public T remove(int k) { return (T) UNSUPPORTED(); }

        Default, general-purpose implementations of
            add(x), addAll, clear, equals, hashCode, indexOf, iterator,
            lastIndexOf, listIterator, set, and subList

        /** The number of times THIS has had elements added or removed. */
        protected int modCount;

        /** Remove from THIS all elements with indices in the
         *  range K0 .. K1-1. */
        protected void removeRange(int k0, int k1) {
            ListIterator<T> i = listIterator(k0);
            for (int k = k0; k < k1 && i.hasNext(); k += 1) {
                i.next(); i.remove();
            }
        }

        private Object UNSUPPORTED()
        { throw new UnsupportedOperationException(); }
    }

Figure 3.4: The abstract class AbstractList, used as an implementation aid in writing implementations of List that are intended for random access. See Figure 3.5 for the inner class ListIteratorImpl.
    public ListIterator<T> listIterator(int k0) {
        return new ListIteratorImpl(k0);
    }

    private class ListIteratorImpl implements ListIterator<T> {
        ListIteratorImpl(int k0)
        { lastMod = modCount; k = k0; lastIndex = -1; }

        public boolean hasNext() { return k < size(); }
        public boolean hasPrevious() { return k > 0; }

        public T next() {
            check(0, size());
            lastIndex = k; k += 1; return get(lastIndex);
        }

        public T previous() {
            check(1, size()+1);
            k -= 1; lastIndex = k; return get(k);
        }

        public int nextIndex() { return k; }
        public int previousIndex() { return k-1; }

        public void add(T x) {
            check(); lastIndex = -1;
            k += 1; AbstractList.this.add(k-1, x);
            lastMod = modCount;
        }

        public void remove() {
            checkLast(); AbstractList.this.remove(lastIndex);
            // If the removed item preceded the current position (it was
            // returned by next), the current position moves back by one.
            if (lastIndex < k) k -= 1;
            lastIndex = -1; lastMod = modCount;
        }

        public void set(T x) {
            checkLast(); AbstractList.this.set(lastIndex, x);
            lastIndex = -1; lastMod = modCount;
        }

Figure 3.5: Part of a possible implementation of AbstractList, showing the inner class providing the value of listIterator.
        // Class AbstractList.ListIteratorImpl, continued.

        /* Private definitions */

        /** modCount value expected for underlying list. */
        private int lastMod;
        /** Current position. */
        private int k;
        /** Index of last result returned by next or previous. */
        private int lastIndex;

        /** Check that there has been no concurrent modification. Throws
         *  appropriate exception if there has. */
        private void check() {
            if (modCount != lastMod) throw new ConcurrentModificationException();
        }

        /** Check that there has been no concurrent modification and that
         *  the current position, k, is in the range K0 <= k < K1. Throws
         *  appropriate exception if either test fails. */
        private void check(int k0, int k1) {
            check();
            if (k < k0 || k >= k1)
                throw new NoSuchElementException();
        }

        /** Check that there has been no concurrent modification and that
         *  there is a valid "last element returned by next or previous".
         *  Throws appropriate exception if either test fails. */
        private void checkLast() {
            check();
            if (lastIndex == -1) throw new IllegalStateException();
        }
    }

Figure 3.5, continued: Private representation of the ListIterator.
    public abstract class AbstractSequentialList<T> extends AbstractList<T> {
        /** An empty list */
        protected AbstractSequentialList() { }

        public abstract int size();
        public abstract ListIterator<T> listIterator(int k);

        Default implementations of
            add(k,x), addAll(k,c), get, iterator, remove(k), set

        From AbstractList, inherited implementations of
            add(x), clear, equals, hashCode, indexOf, lastIndexOf,
            listIterator(), removeRange, subList

        From AbstractCollection, inherited implementations of
            addAll(), contains, containsAll, isEmpty, remove(), removeAll,
            retainAll, toArray, toString
    }

Figure 3.6: The class AbstractSequentialList.

On the other hand, if we were always to implement get(k) by iterating over the preceding k items (that is, use the Iterator’s methods to implement get rather than the reverse), we would obviously lose out on representations where get is fast.

3.4 The AbstractMap Class

The AbstractMap class shown in Figure 3.7 provides a template implementation for the Map interface. Overriding just the entrySet method to provide a read-only Set gives a read-only Map. Additionally overriding the put method gives an extendable Map, and implementing the remove method for entrySet().iterator() gives a fully modifiable Map.

3.5 Performance Predictions

At the beginning of Chapter 2, I said that there are typically several implementations for a given interface. There are several possible reasons one might need more than one. First, special kinds of stored items, keys, or values might need special handling, either for speed, or because there are extra operations that make sense only for these special kinds of things. Second, some particular Collections or Maps may need a special implementation because they are part of something else, such as the subList or entrySet views. Third, one implementation may perform
better than another in some circumstances, but not in others. Finally, there may be time-vs.-space tradeoffs between different implementations, and some applications may have particular need for a compact (space-efficient) representation.

    package java.util;
    public abstract class AbstractMap<Key, Val> implements Map<Key, Val> {
        /** An empty map. */
        protected AbstractMap() { }

        /** A view of THIS as the set of all its (key, value) pairs.
         *  If the resulting Set’s iterator supports remove, then THIS
         *  map will support the remove and clear operations. */
        public abstract Set<Entry<Key, Val>> entrySet();

        /** Cause get(KEY) to yield VAL, without disturbing other values. */
        public Val put(Key key, Val val) {
            throw new UnsupportedOperationException();
        }

        Default implementations of
            clear, containsKey, containsValue, equals, get, hashCode,
            isEmpty, keySet, putAll, remove, size, values

        /** Print a String representation of THIS, in the form
         *     {KEY0=VALUE0, KEY1=VALUE1, ...}
         *  where keys and values are converted using String.valueOf(...). */
        public String toString() { ... }
    }

Figure 3.7: The class AbstractMap.
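As an illustration of the AbstractMap template from §3.4, the following sketch defines a read-only map by overriding entrySet alone. The class SquaresMap is a made-up example; AbstractSet and AbstractMap.SimpleImmutableEntry are library classes used here for convenience, so this version does not supply its own implementation of Map.Entry.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** A read-only Map from each integer k, 0 <= k < N, to k*k.
 *  (Hypothetical example class.) */
class SquaresMap extends AbstractMap<Integer, Integer> {
    private final int n;

    SquaresMap(int n) { this.n = n; }

    @Override
    public Set<Map.Entry<Integer, Integer>> entrySet() {
        return new AbstractSet<Map.Entry<Integer, Integer>>() {
            @Override public int size() { return n; }

            @Override public Iterator<Map.Entry<Integer, Integer>> iterator() {
                return new Iterator<Map.Entry<Integer, Integer>>() {
                    private int k = 0;
                    @Override public boolean hasNext() { return k < n; }
                    @Override public Map.Entry<Integer, Integer> next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        k += 1;
                        return new AbstractMap.SimpleImmutableEntry<Integer, Integer>(
                            k - 1, (k - 1) * (k - 1));
                    }
                    @Override public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Everything else (get, containsKey, size, toString, and so on) comes from the defaults, which all work by scanning the entry set. Since put is not overridden, attempts to modify the map throw UnsupportedOperationException.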
We can’t make specific claims about the performance of the Abstract... family of classes described here because they are templates rather than complete implementations. However, we can characterize their performance as a function of the methods that the programmer fills in. Here, let’s consider two examples: the implementation templates for the List interface.

AbstractList. The strategy behind AbstractList is to use the methods size, get(k), add(k,x), set(k,x), and remove(k) supplied by the extending type to implement everything else. The listIterator method returns a ListIterator that uses get to implement next and previous, add (on AbstractList) to implement the iterator’s add, and remove (on AbstractList) to implement the iterator’s remove. The cost of the additional bookkeeping done by the iterator consists of incrementing or decrementing an integer variable, and is therefore a small constant. Thus, we can easily relate the costs of the iterator functions directly to those of the supplied methods, as shown in the following table. To simplify matters, we take the time costs of the size operation and the equals operation on individual items to be constant. The values of the “plugged-in” methods are given names of the form C_α; the size of this (the List) is N, and the size of the other Collection argument (denoted c, which we’ll assume is the same kind of List, just to be able to say more) is M.

    Costs of AbstractList Implementations

    List                                       ListIterator
    Method            Time as Θ(·)             Method      Time as Θ(·)
    add(k,X)          C_a                      add         C_a
    get(k)            C_g                      remove      C_r
    remove(k)         C_r                      next        C_g
    set               C_s                      previous    C_g
    remove(X)         C_r + N·C_g              set         C_s
    indexOf           N·C_g                    hasNext     1
    lastIndexOf       N·C_g
    listIterator(k)   1
    iterator()        1
    subList           1
    size              1
    isEmpty           1
    contains          N·C_g
    containsAll(c)    N·M·C_g
    addAll(c)         M·C_g + (N + M)·C_a
    toArray           N·C_g
AbstractSequentialList. Let’s now compare the AbstractList implementation with AbstractSequentialList, which is intended to be used with representations that don’t have cheap get operations, but still do have cheap iterators. In this case, the get(k) operation is implemented by creating a ListIterator and performing a next operation on it k times. We get the following table:

    Costs of AbstractSequentialList Implementations

    List                                       ListIterator
    Method            Time as Θ(·)             Method      Time as Θ(·)
    add(k,X)          C_a + k·C_n              add         C_a
    get(k)            k·C_n                    remove      C_r
    remove(k)         C_r + k·C_n              next        C_n
    set(k,X)          C_s + k·C_n              previous    C_p
    remove(X)         C_r + N·C_g              set         C_s
    indexOf           N·C_n                    hasNext     1
    lastIndexOf       N·C_p
    listIterator(k)   k·C_n
    iterator()        1
    subList           1
    size              1
    isEmpty           1
    contains          N·C_n
    containsAll(c)    N·M·C_n
    addAll(c)         M·C_n + N·C_a
    toArray           N·C_n

Exercises

3.1. Provide a body for the addAll method of AbstractCollection. It can assume that add will either throw UnsupportedOperationException if adding to the Collection is not supported, or will add an element.

3.2. Provide a body for the removeAll method of AbstractCollection. You may assume that, if removal is supported, the remove operation of the result of iterator works.

3.3. Provide a possible implementation of the java.util.SubList class. This utility class implements List and has one constructor:

    /** A view of items K0 through K1-1 of THELIST. Subsequent
     *  modifications to THIS also modify THELIST. Any structural
     *  modification to THELIST other than through THIS and any
     *  iterators or sublists derived from it renders THIS invalid.
     *  Operations on an invalid SubList throw
     *  ConcurrentModificationException. */
    SubList(AbstractList theList, int k0, int k1) { ... }

3.4. For class AbstractSequentialList, provide possible implementations of add(k,x) and get. Arrange the implementation so that performing a get of an element at or near either end of the list is fast.

3.5. Extend the AbstractMap class to produce a full implementation of Map. Try to leave as much as possible up to AbstractMap, implementing just what you need. For a representation, provide an implementation of Map.Entry and then use the existing implementation of Set provided by the Java library, HashSet. Call the resulting class SimpleMap.

3.6. In §3.5, we did not talk about the performance of operations on the Lists returned by the subList method. Provide these estimates for both AbstractList and AbstractSequentialList. For AbstractSequentialList, the time requirement for the get method on a sublist must depend on the first argument to subList (the starting point). Why is this? What change to the definition of ListIterator could make the performance of get (and other operations) on sublists independent of where in the original list the sublist comes from?
64
CHAPTER3.MEETINGASPECIFICATION
Chapter 4

Sequences and Their Implementations
In Chapters 2 and 3, we saw quite a bit of the List interface and some skeleton implementations. Here, we review the standard representations (concrete implementations) of this interface, and also look at interfaces and implementations of some specialized versions, the queue data structures.
4.1 Array Representation of the List Interface
Most "production" programming languages have some built-in data structure like the Java array: a random-access sequence of variables, indexed by integers. The array data structure has two main performance advantages. First, it is a compact (space-efficient) representation for a sequence of variables, typically taking little more space than the constituent variables themselves. Second, random access to any given variable in the sequence is a fast, constant-time operation. The chief disadvantage is that changing the size of the sequence represented is slow (in the worst case). Nevertheless, with a little care, we will see that the amortized cost of operations on array-backed lists is constant.
One of the built-in Java types is java.util.ArrayList, which has, in part, the implementation shown in Figure 4.1 [1]. So, you can create a new ArrayList with its constructors, optionally choosing how much space it initially has. Then you can add items (with add), and the array holding these items will be expanded as needed.
What can we say about the cost of the operations on ArrayList? Obviously, get and size are Θ(1); the interesting one is add. As you can see, the capacity of an ArrayList is always positive. The implementation of add uses ensureCapacity whenever the array pointed to by data needs to expand, and it requests that the capacity of the ArrayList (the size of data) should double whenever it needs to expand. Let's look into the reason behind this design choice. We'll consider just the call A.add(x), which calls A.add(A.size(), x).

[1] The Java standard library type java.util.Vector provides essentially the same representation. It predates ArrayList and the introduction of Java's standard Collection classes, and was "retrofitted" to meet the List interface. As a result, many existing Java programs tend to use Vector, and tend to use its (now redundant) pre-List operations, such as elementAt and removeAllElements (same as get and clear). The Vector class has another difference: it is synchronized, whereas ArrayList is not. See §10.1 for further discussion.
Suppose that we replace the lines

    if (count + 1 > data.length)
        ensureCapacity(data.length * 2);

with the alternative minimal expansion:

    ensureCapacity(count + 1);

In this case, once the initial capacity is exhausted, each add operation will expand the array data. Let's measure the cost of add in number of assignments to array elements [why is that reasonable?]. In Java, we can take the cost of the expression new Object[K] to be Θ(K). This does not change when we add in the cost of copying elements from the previous array into it (using System.arraycopy). Therefore, the worst-case cost, $C_i$, of executing A.add(x) using our simple increment-size-by-1 scheme is

\[
C_i(K, M) =
\begin{cases}
\alpha_1, & \text{if } M > K; \\
\alpha_2 (K + 1), & \text{if } M = K
\end{cases}
\]

where $K$ is A.size(), $M \ge K$ is A's current capacity (that is, A.data.length), and the $\alpha_i$ are some constants. So, conservatively speaking, we can just say that $C_i(K, M) \in O(K)$.

Now let's consider the cost, $C_d$, of the implementation as shown in Figure 4.1, where we always double the capacity when it must be increased. This time, we get

\[
C_d(K, M) =
\begin{cases}
\alpha_1, & \text{if } M > K; \\
\alpha_3 (2K + 1), & \text{if } M = K
\end{cases}
\]

The worst-case cost looks identical; the factor of two increase in size simply changes the constant factor, and we can still use the same formula as before: $C_d(K, M) \in O(K)$.
So from this naïve worst-case asymptotic point of view, it would appear the two alternative strategies have identical costs. Yet we ought to be suspicious. Consider an entire series of add operations together, rather than just one. With the increment-size-by-1 strategy, we expand every time. By contrast, with the size-doubling strategy, we expand less and less often as the array grows, so that most calls to add complete in constant time. So is it really accurate to characterize them as taking time proportional to $K$?
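One way to test this suspicion is simply to count. The following sketch is not part of the ArrayList implementation; it simulates N appends under each expansion policy and tallies element assignments (one for each item stored, plus one for each item copied during an expansion). The class name GrowthCost and the method assignments are invented for the illustration.

```java
/** Counts array-element assignments for N appends under two
 *  growth policies. Illustrative only; the names are hypothetical. */
class GrowthCost {
    /** Total assignments for N appends starting from capacity M0,
     *  doubling the capacity on overflow if DOUBLING, and otherwise
     *  growing it to exactly the size needed. */
    static long assignments(int n, int m0, boolean doubling) {
        long total = 0;
        int capacity = m0;
        for (int count = 0; count < n; count += 1) {
            if (count + 1 > capacity) {
                capacity = doubling ? capacity * 2 : count + 1;
                total += count;      // copy the existing items over
            }
            total += 1;              // store the new item
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println(n + " appends: increment-by-1 = "
                + assignments(n, 8, false)
                + ", doubling = " + assignments(n, 8, true));
        }
    }
}
```

If the suspicion is right, each doubling of N should roughly quadruple the first count while only about doubling the second.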
Consider a series of $N$ calls to A.add(x), starting with A an empty ArrayList with initial capacity of $M_0 < N$. With the increment-by-1 strategy, call number $M_0$ (numbering from 0), $M_0 + 1$, $M_0 + 2$, etc. will cost time proportional to $M_0 + 1$, $M_0 + 2$, ..., respectively. Therefore, the total cost, $C_{incr}$, of $N > M_0$ operations beginning with an empty list of initial capacity $M_0$ will be

\[
\begin{aligned}
C_{incr} &\in \Theta(M_0 + (M_0 + 1) + \cdots + N) \\
         &= \Theta((N + M_0) \cdot N / 2) \\
         &= \Theta(N^2)
\end{aligned}
\]
    package java.util;

    /** A List with a constant-time get operation. At any given time,
     *  an ArrayList has a capacity, which indicates the maximum
     *  size() for which the one-argument add operation (which adds to
     *  the end) will execute in constant time. The capacity expands
     *  automatically as needed so as to provide constant amortized
     *  time for the one-argument add operation. */
    public class ArrayList extends AbstractList implements Cloneable {

        /** An empty ArrayList whose capacity is initially at least
         *  CAPACITY. */
        public ArrayList(int capacity) {
            data = new Object[Math.max(capacity, 2)]; count = 0;
        }

        public ArrayList() { this(8); }

        public ArrayList(Collection c) {
            this(c.size()); addAll(c);
        }

        public int size() { return count; }

        public Object get(int k) {
            check(k, count); return data[k];
        }

        public Object remove(int k) {
            Object old = data[k];
            removeRange(k, k + 1);
            return old;
        }

        public Object set(int k, Object x) {
            check(k, count);
            Object old = data[k];
            data[k] = x;
            return old;
        }

        public void add(int k, Object obj) {
            check(k, count + 1);
            if (count + 1 > data.length)
                ensureCapacity(data.length * 2);
            System.arraycopy(data, k, data, k + 1, count - k);
            data[k] = obj; count += 1;
        }

        /* Cause the capacity of this ArrayList to be at least N. */
        public void ensureCapacity(int N) {
            if (N <= data.length)
                return;
            Object[] newData = new Object[N];
            System.arraycopy(data, 0, newData, 0, count);
            data = newData;
        }

        /** A copy of THIS (overrides method in Object). */
        public Object clone() { return new ArrayList(this); }

        protected void removeRange(int k0, int k1) {
            if (k0 >= k1)
                return;
            check(k0, count); check(k1, count + 1);
            System.arraycopy(data, k1, data, k0, count - k1);
            count -= k1 - k0;
        }

        private void check(int k, int limit) {
            if (k < 0 || k >= limit)
                throw new IndexOutOfBoundsException();
        }

        private int count;      /* Current size */
        private Object[] data;  /* Current contents */
    }

Figure 4.1: Implementation of the class java.util.ArrayList.
for fixed $M_0$. The cost, in other words, is quadratic in the number of items added.

Now consider the doubling strategy. We'll analyze it using the potential method from §1.4 to show that we can choose a constant value for $a_i$, the amortized cost of the $i$th operation, by finding a suitable potential $\Phi \ge 0$ so that (from Equation 1.1),

\[
a_i = c_i + \Phi_{i+1} - \Phi_i,
\]

where $c_i$ denotes the actual cost of the $i$th addition. In this case, a suitable potential is

\[
\Phi_i = 4i - 2S_i + 2S_0
\]

where $S_i$ is the capacity (the size of the array) before the $i$th operation. Before the first doubling, $S_i = S_0$ and so $\Phi_i = 4i \ge 0$. After the first doubling, we always have $2i \ge S_i$, so that $\Phi_i \ge 0$ for all $i$.

We can take the number of items in the array before the $i$th addition as $i$, assuming as usual that we number additions from 0. The actual cost, $c_i$, of the $i$th addition is either 1 time unit, if $i < S_i$, or else (when $i = S_i$) the cost of allocating a doubled array, copying all the existing items, and then adding one more, which we can take as $2S_i$ time units (with suitable choice of "time unit," of course). When $i < S_i$, the capacity does not change ($S_{i+1} = S_i$), and therefore we have

\[
\begin{aligned}
a_i &= c_i + \Phi_{i+1} - \Phi_i \\
    &= 1 + 4(i+1) - 2S_{i+1} + 2S_0 - (4i - 2S_i + 2S_0) \\
    &= 1 + 4(i+1) - 2S_i + 2S_0 - (4i - 2S_i + 2S_0) \\
    &= 5
\end{aligned}
\]

and when $i = S_i$, the capacity doubles ($S_{i+1} = 2S_i$), and we have

\[
\begin{aligned}
a_i &= c_i + \Phi_{i+1} - \Phi_i \\
    &= 2S_i + 4(i+1) - 2S_{i+1} + 2S_0 - (4i - 2S_i + 2S_0) \\
    &= 2S_i + 4(i+1) - 4S_i + 2S_0 - (4i - 2S_i + 2S_0) \\
    &= 4
\end{aligned}
\]

So $a_i \le 5$ in either case, showing that the amortized cost of adding to the end of an array under the doubling strategy is indeed constant.

4.2 Linking in Sequential Structures

The term linked structure refers generally to composite, dynamically growable data structures comprising small objects for the individual members connected together by means of pointers (links).

4.2.1 Singly Linked Lists

The Scheme language has one pervasive compound data structure, the pair or cons cell, which can serve to represent just about any data structure one can imagine. Perhaps its most common use is in representing lists of things, as illustrated in Figure 4.2a. Each pair consists of two containers, one of which is used to store (a pointer to) the data item, and the second a pointer to the next pair in the list, or a null pointer at the end. In Java, a rough equivalent to the pair is a class such as the following:
    class Entry {
        Entry(Object head, Entry next) {
            this.head = head; this.next = next;
        }
        Object head;
        Entry next;
    }

We call lists formed from such pairs singly linked, because each pair carries one pointer (link) to another pair.

Changing the structure of a linked list (its set of containers) involves what is colloquially known as "pointer swinging." Figure 4.2 reviews the basic operations for insertion and deletion on pairs used as lists.

4.2.2 Sentinels

As Figure 4.2 illustrates, the procedure for inserting or deleting at the beginning of a linked list differs from the procedure for middle items, because it is the variable L, rather than a next field, that gets changed:

    L = L.next;                      // Remove first item of linked list pointed to by L
    L = new Entry("aardvark", L);    // Add item to front of list.

We can avoid this recourse to special cases for the beginning of a list by employing a clever trick known as a sentinel node.

The idea behind a sentinel is to use an extra object, one that does not carry one of the items of the collection being stored, to avoid having any special cases. Figure 4.3 illustrates the resulting representation.

Use of sentinels changes some tests. For example, testing to see if linked list L is empty without a sentinel is simply a matter of comparing L to null, whereas the test for a list with a sentinel compares L.next to null.
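As a rough illustration of how the sentinel removes the special case, the following sketch inserts and removes at any position, including the front, with the same two lines of pointer swinging. The class SentinelList and its helper nodeBefore are invented for this example and are not taken from the Java library.

```java
/** A tiny singly linked list of Strings with a sentinel node.
 *  A sketch only; the names SentinelList and Node are invented. */
class SentinelList {
    private static class Node {
        String head;
        Node next;
        Node(String head, Node next) { this.head = head; this.next = next; }
    }

    /** The sentinel carries no item and is never replaced. */
    private final Node sentinel = new Node(null, null);

    boolean isEmpty() { return sentinel.next == null; }

    /** Insert X so that it becomes item number K (0 is the front).
     *  Assumes 0 <= K <= number of items. */
    void add(int k, String x) {
        Node p = nodeBefore(k);
        p.next = new Node(x, p.next);
    }

    /** Remove and return item number K. Assumes that item exists. */
    String remove(int k) {
        Node p = nodeBefore(k);
        String old = p.next.head;
        p.next = p.next.next;
        return old;
    }

    /** The node whose next field refers to item K; for K == 0
     *  this is the sentinel itself. */
    private Node nodeBefore(int k) {
        Node p = sentinel;
        for (int i = 0; i < k; i += 1)
            p = p.next;
        return p;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node p = sentinel.next; p != null; p = p.next) {
            sb.append(p.head);
            if (p.next != null) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```

Neither add nor remove ever tests whether it is working at the front of the list, because the item at the front always has a predecessor: the sentinel.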
4.2.3 Doubly Linked Lists

Singly linked lists are simple to create and manipulate, but they are at a disadvantage for fully implementing the Java List interface. One obvious problem is that the previous operation on list iterators has no fast implementation on singly linked structures. One is pretty much forced to return to the start of the list and follow an appropriate number of next fields to almost but not quite return to the current position, requiring time proportional to the size of the list. A somewhat more subtle annoyance comes in the implementation of the remove operation on the list iterator. To remove an item p from a singly linked list, you need a pointer to the item before p, because it is the next field of that object that must be modified.
[Figure 4.2 diagrams, each showing the list pointed to by L: (a) Original list (ax, bat, syzygy); (b) After removing bat with L.next = L.next.next; (c) After adding balance with L.next = new Entry("balance", L.next); (d) After deleting ax with L = L.next.]

Figure 4.2: Common operations on the singly linked list representation. Starting from an initial list, we remove object β, and then insert in its place a new one. Next we remove the first item in the list. The objects removed become "garbage," and are no longer reachable via L.
Both problems are easily solved by adding a predecessor link to the objects in our list structure, making both the items before and after a given item in the list equally accessible. As with singly linked structures, the use of front and end sentinels further simplifies operations by removing the special cases of adding to or removing from the beginning or end of a list. A further device in the case of doubly linked structures is to make the entire list circular, that is, to use one sentinel as both the front and back of the list. This cute trick saves the small amount of space otherwise wasted by the prev link of the front sentinel and the next link of the last. Figure 4.4 illustrates the resulting representation and the principal operations upon it.

[Figure 4.3 diagrams: (a) Three-item list L (ax, bat, syzygy) preceded by a sentinel; (b) Empty list E, consisting of the sentinel alone.]

Figure 4.3: Singly linked lists employing sentinel nodes. The sentinels contain no useful data. They allow all items in the list to be treated identically, with no special case for the first node. The sentinel node is typically never removed or replaced while the list is in use.

[Figure 4.4 diagrams: (a) Initial list (ax, bat, syzygy); (b) After deleting item γ (bat); (c) After adding item ε (balance); (d) After removing all items, and removing garbage.]

Figure 4.4: Doubly linked lists employing a single sentinel node to mark both front and back. Shaded item is garbage.

4.3 Linked Implementation of the List Interface

The doubly linked structure supports everything we need to do to implement the Java List interface. The type of the links (LinkedList.Entry) is private to the implementation. A LinkedList object itself contains just a pointer to the list's sentinel (which never changes, once created) and an integer variable containing the number of items in the list. Technically, of course, the latter is redundant, since one can always count the number of items in the list, but keeping this variable allows size to be a constant-time operation. Figure 4.5 illustrates the three main data structures involved: LinkedList, LinkedList.Entry, and the iterator LinkedList.LinkedIter.

[Figure 4.5 diagram: the data structure after executing

    L = new LinkedList<String>();
    L.add("axolotl");
    L.add("kludge");
    L.add("xerophyte");
    I = L.listIterator();
    I.next();

showing the list L (size 3) with entries axolotl, kludge, and xerophyte, and the iterator I with its fields LinkedList.this, lastReturned, here, and nextIndex (1).]

Figure 4.5: A typical LinkedList (pointed to by L) and a list iterator (pointed to by I). Since the iterator belongs to an inner class of LinkedList, it contains an implicit private pointer (LinkedList.this) that points back to the LinkedList object from which it was created.
    package java.util;

    public class LinkedList<T> extends AbstractSequentialList<T>
            implements Cloneable {

        public LinkedList() {
            sentinel = new Entry<T>();
            size = 0;
        }

        public LinkedList(Collection<? extends T> c) {
            this();
            addAll(c);
        }

        public ListIterator<T> listIterator(int k) {
            if (k < 0 || k > size)
                throw new IndexOutOfBoundsException();
            return new LinkedIter(k);
        }

        public Object clone() {
            return new LinkedList<T>(this);
        }

        public int size() { return size; }

        private static class Entry<E> {
            E data;
            Entry<E> prev, next;
            Entry(E data, Entry<E> prev, Entry<E> next) {
                this.data = data; this.prev = prev; this.next = next;
            }
            Entry() { data = null; prev = next = this; }
        }

        private class LinkedIter implements ListIterator<T> {
            // See Figure 4.7.
        }

        private final Entry<T> sentinel;
        private int size;
    }

Figure 4.6: The class LinkedList.
    package java.util;

    public class LinkedList<T> extends AbstractSequentialList<T>
            implements Cloneable {

        . . .

        private class LinkedIter implements ListIterator<T> {
            Entry<T> here, lastReturned;
            int nextIndex;

            /** An iterator whose initial next element is item
             *  K of the containing LinkedList. */
            LinkedIter(int k) {
                if (k > size - k) {   // Closer to the end
                    here = sentinel; nextIndex = size;
                    while (k < nextIndex) previous();
                } else {
                    here = sentinel.next; nextIndex = 0;
                    while (k > nextIndex) next();
                }
                lastReturned = null;
            }

            public boolean hasNext() { return here != sentinel; }
            public boolean hasPrevious() { return here.prev != sentinel; }

            public T next() {
                check(here);
                lastReturned = here;
                here = here.next; nextIndex += 1;
                return lastReturned.data;
            }

            public T previous() {
                check(here.prev);
                lastReturned = here = here.prev;
                nextIndex -= 1;
                return lastReturned.data;
            }

            public void add(T x) {
                lastReturned = null;
                Entry<T> ent = new Entry<T>(x, here.prev, here);
                nextIndex += 1;
                here.prev.next = here.prev = ent;
                size += 1;
            }

            public void set(T x) {
                checkReturned();
                lastReturned.data = x;
            }

            public void remove() {
                checkReturned();
                lastReturned.prev.next = lastReturned.next;
                lastReturned.next.prev = lastReturned.prev;
                if (lastReturned == here)
                    here = lastReturned.next;
                else
                    nextIndex -= 1;
                lastReturned = null;
                size -= 1;
            }

            public int nextIndex() { return nextIndex; }
            public int previousIndex() { return nextIndex - 1; }

            void check(Object p) {
                if (p == sentinel) throw new NoSuchElementException();
            }

            void checkReturned() {
                if (lastReturned == null) throw new IllegalStateException();
            }
        }
    }

Figure 4.7: The inner class LinkedList.LinkedIter. This version does not check for concurrent modification of the underlying List.
[Figure 4.8 diagrams: (a) Stack, with operations push and pop at one end; (b) (FIFO) Queue, with add at one end and removeFirst at the other; (c) Deque, with addFirst and removeFirst at one end and add and removeLast at the other.]

Figure 4.8: Three varieties of queues: sequential data structures manipulated only at their ends.

4.4 Specialized Lists

A common use for lists is in representing sequences of items that are manipulated and examined only at one or both ends. Of these, the most familiar are

• The stack (or LIFO queue, for "Last-In First Out"), which supports only adding and deleting items at one end;

• The queue (or FIFO queue, for "First-In First Out"), which supports adding at one end and deletion from the other; and

• The deque or double-ended queue, which supports addition and deletion from either end,

whose operations are illustrated in Figure 4.8.

4.4.1 Stacks

Java provides a type java.util.Stack as an extension of the type java.util.Vector (itself an older variation of ArrayList):

    package java.util;

    public class Stack<T> extends Vector<T> {
        /** An empty Stack. */
        public Stack() { }

        public boolean empty() { return isEmpty(); }
        public T peek() { check(); return get(size() - 1); }
        public T pop() { check(); return remove(size() - 1); }
        public T push(T x) { add(x); return x; }
        public int search(Object x) {
            int r = lastIndexOf(x);
            return r == -1 ? -1 : size() - r;
        }
        private void check() {
            if (empty()) throw new EmptyStackException();
        }
    }

However, because it is one of the older types in the library, java.util.Stack does not fit in as well as it might. In particular, there is no separate interface describing "stackness." Instead there is just the Stack class, inextricably combining an interface with an implementation. Figure 4.9 shows how a Stack interface (in the Java sense) might be designed.

    package ucb.util;
    import java.util.*;

    /** A LIFO queue of T's. */
    public interface Stack<T> {
        /** True iff THIS is empty. */
        boolean isEmpty();

        /** Number of items in the stack. */
        int size();

        /** The last item inserted in the stack and not yet removed. */
        T top();

        /** Remove and return the top item. */
        T pop();

        /** Add X as the last item of THIS. */
        void push(T x);

        /** The index of the most-recently inserted item that is .equal to
         *  X, or -1 if it is not present. Item 0 is the least recently
         *  pushed. */
        int lastIndexOf(Object x);
    }

Figure 4.9: A possible definition of the abstract type Stack as a Java interface. This is not part of the Java library, but its method names are more traditional than those of Java's official java.util.Stack type. It is designed, furthermore, to fit in with implementations of the List interface.

Stacks have numerous uses, in part because of their close relationship to recursion and backtracking search. Consider, for example, a simple-minded strategy for finding an exit to a maze. We assume some Maze class, and a Position class that represents a position in the maze. From any position in the maze, you may be able to move in up to four different directions (represented by numbers 0–3, standing perhaps for the compass points north, east, south, and west). The idea is that we leave bread crumbs to mark each position we've already visited. From each position we visit, we try stepping in each of the possible directions and continuing from that point. If we find that we have already visited a position, or we run out of directions to go from some position, we backtrack to the last position we visited before that and continue with the directions we haven't tried yet from that previous position, stopping when we get to an exit (see Figure 4.10). As a program (using method names that I hope are suggestive), we can write this in two equivalent ways. First, recursively:
    /** Find an exit from M starting from PLACE. */
    void findExit(Maze M, Position place) {
        if (M.isAnExit(place))
            M.exitAt(place);
        if (!M.isMarkedAsVisited(place)) {
            M.markAsVisited(place);
            for (int dir = 0; dir < 4; dir += 1)
                if (M.isLegalToMove(place, dir))
                    findExit(M, place.move(dir));
        }
    }

Second, an iterative version:

    import ucb.util.Stack;
    import ucb.util.ArrayStack;

    /** Find an exit from M starting from PLACE0. */
    void findExit(Maze M, Position place0) {
        Stack<Position> toDo = new ArrayStack<Position>();
        toDo.push(place0);
        while (!toDo.isEmpty()) {
            Position place = toDo.pop();
            if (M.isAnExit(place))
                M.exitAt(place);
            if (!M.isMarkedAsVisited(place)) {
                M.markAsVisited(place);
                for (int dir = 3; dir >= 0; dir -= 1)
                    if (M.isLegalToMove(place, dir))
                        toDo.push(place.move(dir));
            }
        }
    }

where ArrayStack is an implementation of ucb.util.Stack (see §4.5).

The idea behind the iterative version of findExit is that the toDo stack keeps track of the values of place that appear as arguments to findExit in the recursive version. Both versions visit the same positions in the same order (which is why the loop runs backwards in the iterative version). In effect, toDo plays the role of the call stack in the recursive version. Indeed, typical implementations of recursive procedures also use a stack for this purpose, although it is invisible to the programmer.
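For readers who want something to run, here is a self-contained sketch of the iterative search. It does not use the text's Maze and Position classes; instead it assumes a rectangular grid of characters in which '#' is a wall and 'E' is an exit, takes directions 0 to 3 to mean up, right, down, and left, and uses java.util.ArrayDeque as the stack. It also stops at the first exit it reaches. All of these choices, and the name GridMaze, are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Backtracking maze search with an explicit stack. A sketch:
 *  the grid encoding ('#' wall, 'E' exit) is an assumption made
 *  for this example, not part of the text's Maze class. */
class GridMaze {
    // Row and column offsets for directions 0-3: up, right, down, left.
    private static final int[] DR = { -1, 0, 1, 0 };
    private static final int[] DC = { 0, 1, 0, -1 };

    /** Returns {row, col} of the first exit reached from (R0, C0),
     *  or null if no exit is reachable. GRID must be rectangular. */
    static int[] findExit(String[] grid, int r0, int c0) {
        boolean[][] visited = new boolean[grid.length][grid[0].length()];
        Deque<int[]> toDo = new ArrayDeque<>();
        toDo.push(new int[] { r0, c0 });
        while (!toDo.isEmpty()) {
            int[] place = toDo.pop();
            int r = place[0], c = place[1];
            if (grid[r].charAt(c) == 'E')
                return place;
            if (!visited[r][c]) {
                visited[r][c] = true;     // drop a bread crumb
                // Push in reverse so that direction 0 is tried first.
                for (int dir = 3; dir >= 0; dir -= 1) {
                    int nr = r + DR[dir], nc = c + DC[dir];
                    if (isLegal(grid, nr, nc))
                        toDo.push(new int[] { nr, nc });
                }
            }
        }
        return null;
    }

    /** True iff (R, C) is inside GRID and is not a wall. */
    private static boolean isLegal(String[] grid, int r, int c) {
        return r >= 0 && r < grid.length
            && c >= 0 && c < grid[r].length()
            && grid[r].charAt(c) != '#';
    }
}
```

A call such as GridMaze.findExit(new String[] { "..#E", ".##.", "...." }, 2, 0) starts in the lower-left corner and has to work its way around the walls to reach the exit in the top-right corner.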
Figure 4.10: Example of searching a maze using backtracking search (the findExit procedure from the text). We start in the lower-left corner. The exit is the dark square on the right. The lightly shaded squares are those visited by the algorithm, assuming that direction 0 is up, 1 is right, 2 is down, and 3 is left. The numbers in the squares show the order in which the algorithm first visits them.

4.4.2 FIFO and Double-Ended Queues

A first-in, first-out queue is what we usually mean by queue in informal English (or line in American English): people or things join a queue at one end, and leave it at the other, so that the first to arrive (or enqueue) are the first to leave (or dequeue).

Queues appear extensively in programs, where they can represent such things as sequences of requests that need servicing. The Java library (as of Java 2, version 1.5) provides a standard FIFO queue interface, but it is intended specifically for uses in which a program might have to wait for an element to get added to the queue. Figure 4.11 shows a more "classic" possible interface.

The deque, which is the most general, double-ended queue, probably sees rather little explicit use in programs. It uses even more of the List interface than does the FIFO queue, and so the need to specialize is not particularly acute. Nevertheless, for completeness, I have included a possible interface in Figure 4.12.

4.5 Stack, Queue, and Deque Implementation

We could implement a concrete stack class for our ucb.util.Stack interface as in Figure 4.13: as an extension of ArrayList just as java.util.Stack is an extension of java.util.Vector. As you can see, the names of the Stack interface methods are such that we can simply inherit implementations of size, isEmpty, and lastIndexOf from ArrayList.

But let's instead spice up our implementation of ArrayStack with a little generalization. Figure 4.14 illustrates an interesting kind of class known as an adapter or wrapper (another of the design patterns introduced at the beginning of Chapter 3). The class StackAdapter shown there will make any List object look like a stack. The figure also shows an example of using it to make a concrete stack representation out of the ArrayList class.
Likewise, given any implementation of the List interface, we can easily provide implementations of Queue or Deque, but there is a catch. Both array-based and linked-list-based implementations of List will support our Stack interface equally well, giving push and pop methods that operate in constant amortized time. However, using an ArrayList in the same naïve fashion to implement either the Queue or Deque interface gives very poor performance. The problem is obvious: as we've seen, we can add or remove from the end (high index) of an array quickly, but removing from the other (index 0) end requires moving over all the elements of the array, which takes time Θ(N), where N is the size of the queue. Of course, we can simply stick to LinkedLists, which don't have this problem, but there is also a clever trick that makes it possible to represent general queues efficiently with an array.

    package ucb.util;

    /** A FIFO queue */
    public interface Queue<T> {
        /** True iff THIS is empty. */
        boolean isEmpty();

        /** Number of items in the queue. */
        int size();

        /** The first item inserted in the queue and not yet removed.
         *  Requires !isEmpty(). */
        T first();

        /** Remove and return the first item. Requires !isEmpty(). */
        T removeFirst();

        /** Add X as the last item of THIS. */
        void add(T x);

        /** The index of the first (least-recently inserted) item that is
         *  .equal to X, or -1 if it is not present. Item 0 is first. */
        int indexOf(Object x);

        /** The index of the last (most-recently inserted) item that is
         *  .equal to X, or -1 if it is not present. Item 0 is first. */
        int lastIndexOf(Object x);
    }

Figure 4.11: A possible FIFO (First In, First Out) queue interface.

    package ucb.util;

    /** A double-ended queue */
    public interface Deque<T> extends Queue<T> {
        /** The last inserted item in the sequence. Assumes !isEmpty(). */
        T last();

        /** Insert X at the beginning of the sequence. */
        void addFirst(T x);

        /** Remove the last item from the sequence. Assumes !isEmpty(). */
        T removeLast();

        /* Plus inherited definitions of isEmpty, size, first, add,
         * removeFirst, indexOf, and lastIndexOf */
    }

Figure 4.12: A possible Deque (double-ended queue) interface.

    public class ArrayStack<T>
            extends java.util.ArrayList<T> implements Stack<T> {
        /** An empty Stack. */
        public ArrayStack() { }

        public T top() { check(); return get(size() - 1); }
        public T pop() { check(); return remove(size() - 1); }
        public void push(T x) { add(x); }

        private void check() {
            if (isEmpty()) throw new java.util.EmptyStackException();
        }
    }

Figure 4.13: An implementation of ArrayStack as an extension of ArrayList.

    package ucb.util;
    import java.util.*;

    public class StackAdapter<T> implements Stack<T> {
        public StackAdapter(List<T> rep) { this.rep = rep; }

        public boolean isEmpty() { return rep.isEmpty(); }
        public int size() { return rep.size(); }
        public T top() { return rep.get(rep.size() - 1); }
        public T pop() { return rep.remove(rep.size() - 1); }
        public void push(T x) { rep.add(x); }
        public int lastIndexOf(Object x) { return rep.lastIndexOf(x); }

        /** The List that holds the items of this stack. */
        private final List<T> rep;
    }

    public class ArrayStack<T> extends StackAdapter<T> {
        public ArrayStack() { super(new ArrayList<T>()); }
    }

Figure 4.14: An adapter class that makes any List look like a Stack, and an example of using it to create an array-based implementation of the ucb.util.Stack interface.
Instead of shifting over the items of a queue when we remove the first, let's instead just change our idea of where in the array the queue starts. We keep two indices into the array, one pointing to the first enqueued item, and one to the last. These two indices "chase each other" around the array, circling back to index 0 when they pass the high-index end, and vice-versa. Such an arrangement is known as a circular buffer. Figure 4.15 illustrates the representation. Figure 4.16 shows part of a possible implementation.
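The wrap-around arithmetic is easiest to see in a stripped-down setting. The sketch below is a fixed-capacity FIFO queue of ints that never resizes, so it is a simplification and not the text's ArrayDeque. It also tracks the index of the first item together with a count, rather than separate first and last indices, which is a common variant of the same idea. The name RingQueue is hypothetical.

```java
import java.util.NoSuchElementException;

/** Fixed-capacity FIFO queue of ints kept in a circular buffer.
 *  A sketch; RingQueue and its method names are hypothetical. */
class RingQueue {
    private final int[] data;
    private int first = 0;   // index of the oldest item
    private int size = 0;    // number of items currently stored

    RingQueue(int capacity) { data = new int[capacity]; }

    boolean isEmpty() { return size == 0; }

    /** Add X at the end, wrapping around to index 0 if necessary. */
    void add(int x) {
        if (size == data.length)
            throw new IllegalStateException("queue full");
        data[(first + size) % data.length] = x;
        size += 1;
    }

    /** Remove and return the oldest item without moving the others. */
    int removeFirst() {
        if (size == 0)
            throw new NoSuchElementException();
        int val = data[first];
        first = (first + 1) % data.length;
        size -= 1;
        return val;
    }
}
```

Both operations take constant time, since removing the first item only advances an index instead of shifting the remaining items.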
[Figure 4.15 diagrams: seven-element array in states (a) through (g), each with the first and last indices marked. Array contents as shown: (a) empty; (b) B C D E; (c) C D E; (d) F C D E I H G; (e) F I H G; (f) I H G; (g) empty.]

Figure 4.15: Circular-buffer representation of a deque with N==7. Part (a) shows an initial empty deque. In part (b), we've inserted four items at the end. Part (c) shows the result of removing the first item. Part (d) shows the full deque resulting from adding four items to the front. Removing the last three items gives (e), and after removing one more we have (f). Finally, removing the rest of the items from the end gives the empty deque shown in (g).

    class ArrayDeque<T> implements Deque<T> {
        /** An empty Deque. */
        public ArrayDeque(int N) {
            first = 0; last = N; size = 0;
        }

        public int size() {
            return size;
        }

        public boolean isEmpty() {
            return size == 0;
        }

        public T first() {
            return data.get(first);
        }

        public T last() {
            return data.get(last);
        }

        public void add(T x) {
            size += 1;
            resize();
            last = (last + 1 == data.size()) ? 0 : last + 1;
            data.set(last, x);
        }

        public void addFirst(T x) {
            size += 1;
            resize();
            first = (first == 0) ? data.size() - 1 : first - 1;
            data.set(first, x);
        }

        public T removeLast() {
            T val = last();
            last = (last == 0) ? data.size() - 1 : last - 1;
            size -= 1;
            return val;
        }

        public T removeFirst() {
            T val = first();
            first = (first + 1 == data.size()) ? 0 : first + 1;
            size -= 1;
            return val;
        }

        private int first, last;
        private final ArrayList<T> data = new ArrayList<T>();
        private int size;

        /** Insure that DATA has at least size elements. */
        private void resize() {
            // left to the reader
        }

        // etc.
    }

Figure 4.16: Implementation of Deque interface using a circular buffer.
Exercises
4.1. Implement a type Deque, as an extension of java.util.Vector. To the operations required by java.util.AbstractList, add first, last, insertFirst, insertLast, removeFirst, removeLast, and do this in such a way that all the operations on Vector continue to work (e.g., get(0) continues to get the same element as first()), and such that the amortized cost for all these operations remains constant.

4.2. Implement a type of List with the constructor:

    public ConcatList(List<T> L0, List<T> L1) { ... }

that does not support the optional operations for adding and removing objects, but gives a view of the concatenation of L0 and L1. That is, get(i) on such a list gives element i in the concatenation of L0 and L1 at the time of the get operation (that is, changes to the lists referenced by L0 and L1 are reflected in the concatenated list). Be sure also to make iterator and listIterator work.

4.3. A singly linked list structure can be circular. That is, some element in the list can have a tail (next) field that points to an item earlier in the list (not necessarily to the first element in the list). Come up with a way to detect whether there is such a circularity somewhere in a list. Do not, however, use any destructive operations on any data structure. That is, you can't use additional arrays, lists, Vectors, hash tables, or anything like them to keep track of items in the list. Use just simple list pointers without changing any fields of any list. See CList.java in the hw5 directory.

4.4. The implementations of LinkedList in Figure 4.6 and LinkedList.LinkedIter in Figure 4.7 do not provide checking for concurrent modification of the underlying list. As a result, a code fragment such as

    for (ListIterator<Object> i = L.listIterator(); i.hasNext(); ) {
        if (bad(i.next()))
            L.remove(i.previousIndex());
    }

can have unexpected effects. What is supposed to happen, according to the specification for LinkedList, is that i becomes invalid as soon as you call L.remove, and subsequent calls on methods of i will throw ConcurrentModificationExceptions.

a. For the LinkedList class, what goes wrong with the loop above, and why?

b. Modify our LinkedList implementation to perform the check for concurrent modification (so that the loop above throws ConcurrentModificationException).

4.5. Devise a DequeAdapter class analogous to StackAdapter, that allows one to create deques (or queues) from arbitrary List objects.

4.6. Provide an implementation for the resize method of ArrayDeque (Figure 4.16). Your method should double the size of the ArrayList being used to represent the circular buffer if expansion is needed. Be careful! You have to do more than simply increase the size of the array, or the representation will break.
Chapter 5
Trees
In this chapter, we'll take a break from the definition of interfaces to libraries and look at one of the basic data-structuring tools used in representing searchable collections of objects, expressions, and other hierarchical structures, the tree. The term tree refers to several different variants of what we will later call a connected, acyclic, undirected graph. For now, though, let's not call it that and instead concentrate on two varieties of rooted tree. First,

Definition: an ordered tree consists of

    A. A node [1], which may contain a piece of data known as a label. Depending on the application, a node may stand for any number of things and the data labeling it may be arbitrarily elaborate. The node part of a tree is known as its root node or root.

    B. A sequence of 0 or more trees, whose root nodes are known as the children of the root. Each node in a tree is the child of at most one node, its parent. The children of any node are siblings of each other [2].

The number of children of a node is known as the degree of that node. A node with no children is called a leaf (node), external node, or terminal node; all other nodes are called internal or non-terminal nodes.

We usually think of there being connections called edges between each node and its children, and often speak of traversing or following an edge from parent to child or back. Starting at any node, r, there is a unique, non-repeating path or sequence of edges leading from r to any other node, n, in the tree with r as root. All nodes along that path, including r and n, are called descendents of r, and ancestors of n. A descendent of r is a proper descendent if it is not r itself; proper ancestors are defined analogously. Any node in a tree is the root of a subtree of that tree. Again, a proper subtree of a tree is one that is not equal to (and is therefore smaller than) the tree itself.

[1] The term vertex may also be used, as it is with other graphs, but node is traditional for trees.

[2] The word father has been used in the past for parent and son for child. Of course, this is no longer considered quite proper, although I don't believe that the freedom to live one's life as a tree was ever an official goal of the women's movement.
Thedistancefromanode, n,totheroot, r,ofatree—thenumberofedgesthat mustbefollowedtogetfrom n to r—isthe level (or depth)ofthatnodeinthetree. Themaximumlevelofallnodesinatreeiscalledthe height ofthetree.Thesum ofthelevelsofallnodesinatreeisthe pathlength ofthetree.Wealsodefineof the internal(external)pathlength asthesumofthelevelsofallinternal(external) nodes.Figure5.1illustratesthesedefinitions.Allthelevelsshowninthatfigure arerelativetonode0.Italsomakessensetotalkabout“thelevelofnode7inthe treerootedatnode1,”whichwouldbe2.
Ifyoulookcloselyatthedefinitionoforderedtree,you’llseethatithastohave atleastonenode,sothatthereisnosuchthingasanemptyorderedtree.Thus, childnumber j ofanodewith k>j childrenisalwaysnon-empty.Itiseasyenough tochangethedefinitiontoallowforemptytrees:
Definition: A positionaltree iseither
A.Empty(missing),or
B.Anode(generallylabeled)and,foreverynon-negativeinteger, j, apositionaltree—the jth child.
Thedegreeofanodeisthenumberofnon-emptychildren.Ifallnodes inatreehavechildrenonlyinpositions <k,wesayitisa k-arytree. Leafnodesarethosewithnonon-emptychildren;allothersareinternal nodes.
Perhapsthemostimportantkindofpositionaltreeisthe binarytree,inwhich k =2.Forbinarytrees,wegenerallyrefertochild0andchild1 asthe left and right children,respectively.
A full k-arytree isoneinwhichallinternalnodesexceptpossiblytherightmost bottomonehavedegree k.Atreeis complete ifitisfullandallitsleafnodesoccur lastwhenreadtoptobottom,lefttoright,asinFigure5.2c. Completebinarytrees areofinterestbecausetheyare,insomesense,maximally“bushy”;foranygiven numberofinternalnodes,theyminimizetheinternalpathlengthofthetree,which
92 CHAPTER5.TREES 0 1 2 3 4 5 6 7 8 9 height=3 Level0Pathlength:18 Level1Externalpathlength:12 Level2Internalpathlength:6 Level3
Figure5.1: Anillustrativeorderedtree.Theleafnodesaresquares.As istradtional, thetree“grows”downward.
thetree.Anysetofdisjointtrees(suchasthetreesrootedatallthechildrenofa node)iscalleda forest
isinterestingbecauseitisproportionaltothetotaltimerequiredtoperformthe operationofmovingfromtheroottoaninternalnodeoncefor eachinternalnode inthetree.
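The definitions of level, height, and path length translate almost directly into recursive code. The following is a minimal sketch, not part of the text: the classes OrderedNode and TreeMetrics are hypothetical, and the ten-node sample tree is an assumed shape chosen only so that its totals agree with the numbers quoted for Figure 5.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A node of an ordered tree with an integer label (hypothetical helper). */
class OrderedNode {
    final int label;
    final List<OrderedNode> children = new ArrayList<>();

    OrderedNode(int label, OrderedNode... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }
}

class TreeMetrics {
    /** The maximum level of any node in T, taking T's root as level 0. */
    static int height(OrderedNode t) {
        int h = 0;
        for (OrderedNode c : t.children)
            h = Math.max(h, 1 + height(c));
        return h;
    }

    /** The sum of the levels of the leaves of T (if LEAVES) or of the
     *  internal nodes of T (if !LEAVES), where T's root is at level LEVEL. */
    static int pathLength(OrderedNode t, int level, boolean leaves) {
        int sum = (t.children.isEmpty() == leaves) ? level : 0;
        for (OrderedNode c : t.children)
            sum += pathLength(c, level + 1, leaves);
        return sum;
    }

    /** A ten-node sample tree; its exact shape is an assumption. */
    static OrderedNode sample() {
        OrderedNode n4 = new OrderedNode(4, new OrderedNode(7), new OrderedNode(8));
        OrderedNode n1 = new OrderedNode(1, n4, new OrderedNode(5));
        OrderedNode n3 = new OrderedNode(3, new OrderedNode(6, new OrderedNode(9)));
        return new OrderedNode(0, n1, new OrderedNode(2), n3);
    }

    public static void main(String[] args) {
        OrderedNode t = sample();
        int external = pathLength(t, 0, true);
        int internal = pathLength(t, 0, false);
        System.out.println("height = " + height(t));
        System.out.println("external = " + external + ", internal = " + internal
                           + ", total = " + (external + internal));
    }
}
```

For the sample tree, the height is 3, the external path length is 12, and the internal path length is 6, so the path length is 18.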
[Figure 5.2 (drawings not reproduced): three binary trees, (a), (b), and (c), whose nodes are numbered in breadth-first order.]
Figure 5.2: A forest of binary trees: (a) is not full; (b) is full, but not complete; (c) is complete. Tree (c) would still be complete if node 10 were missing, but not if node 9 were missing.
5.1 Expression trees
Trees are generally interesting when one is trying to represent a recursively-defined type. A familiar example is the expression tree, which represents an expression recursively defined as
• An identifier or constant, or
• An operator (which stands for some function of k arguments) and k expressions (which are its operands).
Given this definition, expressions are conveniently represented by trees whose internal nodes contain operators and whose external nodes contain identifiers or constants. Figure 5.3 shows a representation of the expression x*(y+3)-z.
[Figure 5.3 (drawing not reproduced): a tree whose root is -, with children * and z; * has children x and +; + has children y and 3.]
Figure 5.3: An expression tree for x*(y+3)-z.
As an illustration of how one deals with trees, consider the evaluation of expressions. As often happens, the definition of the value denoted by an expression corresponds closely to the structure of an expression:
• The value of a constant is the value it denotes as a numeral. The value of a variable is its currently-defined value.
• The value of an expression consisting of an operator and operand expressions is the result of applying the operator to the values of the operand expressions.
This definition immediately suggests a program.

    /** The value currently denoted by the expression E (given
     *  current values of any variables).  Assumes E represents
     *  a valid expression tree, and that all variables
     *  contained in it have values. */
    static int eval(Tree E)
    {
        if (E.isConstant())
            return E.valueOf();
        else if (E.isVar())
            return currentValueOf(E.variableName());
        else
            return perform(E.operator(),
                           eval(E.left()), eval(E.right()));
    }

Here, we posit the existence of a definition of Tree that provides operators for detecting whether E is a leaf representing a constant or variable, for extracting the data (values, variable names, or operator names) stored at E, and for finding the left and right children of E (for internal nodes). We assume also that perform takes an operator name (e.g., "+") and two integer values, and performs the indicated computation on those integers. The correctness of this program follows immediately by using induction on the structure of trees (the trees rooted at a node’s children are always subtrees of the node’s tree), and by observing the match between the definition of the value of an expression and the program.
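To see the evaluator run, one needs some concrete stand-in for the posited Tree type. The sketch below is one possibility, under stated assumptions: the class Expr, its factory methods, and the use of a Map in place of currentValueOf are all hypothetical choices made for illustration, not definitions from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** A node of a binary expression tree (hypothetical stand-in for Tree). */
class Expr {
    private final String op;    // operator name; null for leaves
    private final String var;   // variable name; null unless a variable
    private final int value;    // meaningful only for constants
    private final Expr left, right;

    private Expr(String op, String var, int value, Expr left, Expr right) {
        this.op = op; this.var = var; this.value = value;
        this.left = left; this.right = right;
    }

    static Expr constant(int v) { return new Expr(null, null, v, null, null); }
    static Expr variable(String name) { return new Expr(null, name, 0, null, null); }
    static Expr apply(String op, Expr l, Expr r) { return new Expr(op, null, 0, l, r); }

    boolean isConstant() { return op == null && var == null; }
    boolean isVar() { return var != null; }
    int valueOf() { return value; }
    String variableName() { return var; }
    String operator() { return op; }
    Expr left() { return left; }
    Expr right() { return right; }
}

class ExprEval {
    /** The result of applying the operator named OP to A and B. */
    static int perform(String op, int a, int b) {
        switch (op) {
        case "+": return a + b;
        case "-": return a - b;
        case "*": return a * b;
        case "/": return a / b;
        default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    /** The value of E, taking the values of variables from ENV. */
    static int eval(Expr e, Map<String, Integer> env) {
        if (e.isConstant())
            return e.valueOf();
        else if (e.isVar())
            return env.get(e.variableName());
        else
            return perform(e.operator(),
                           eval(e.left(), env), eval(e.right(), env));
    }

    /** The tree for x*(y+3)-z. */
    static Expr sample() {
        return Expr.apply("-",
            Expr.apply("*", Expr.variable("x"),
                       Expr.apply("+", Expr.variable("y"), Expr.constant(3))),
            Expr.variable("z"));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 2); env.put("y", 4); env.put("z", 5);
        System.out.println(eval(sample(), env));
    }
}
```

With x = 2, y = 4, and z = 5, the sample tree evaluates to 2*(4+3)-5 = 9.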
5.2 Basic tree primitives
There are a number of possible sets of operations one might define on trees, just as there were for sequences. Figure 5.4 shows one possible class (with labels of a generic type T). Typically, only some of the operations shown would actually be provided in a given application. For binary trees, we can be a bit more specialized, as shown in Figure 5.5. In practice, we don’t often define BinaryTree as an extension of Tree, but I’ve done so here just as an illustration.

    /** A positional tree with labels of type T.  The empty tree is null. */
    class Tree<T> {
        /** A leaf node with given LABEL */
        public Tree(T label) ...

        /** An internal node with given label, and K empty children */
        public Tree(T label, int k) ...

        /** The label of this node. */
        public T label() ...

        /** The number of non-empty children of this node. */
        public int degree() ...

        /** Number of children (argument K to the constructor) */
        public int numChildren() ...

        /** Child number K of this. */
        public Tree<T> child(int k) ...

        /** Set child number K of this to C, 0 <= K < numChildren().
         *  C must not already be in this tree, or vice-versa. */
        public void setChild(int k, Tree<T> C) ...
    }

Figure 5.4: A class representing positional tree nodes.

    class BinaryTree<T> extends Tree<T> {
        public BinaryTree(T label,
                          BinaryTree<T> left, BinaryTree<T> right) {
            super(label, 2);
            setChild(0, left); setChild(1, right);
        }

        public BinaryTree<T> left() { return (BinaryTree<T>) child(0); }
        public void setLeft(BinaryTree<T> C) { setChild(0, C); }
        public BinaryTree<T> right() { return (BinaryTree<T>) child(1); }
        public void setRight(BinaryTree<T> C) { setChild(1, C); }
    }

Figure 5.5: A possible class representing binary trees.
The operations so far all assume “root-down” processing of the tree, in which it is necessary to proceed from parent to child. When it is more appropriate to go the other way, the following operations are useful as an addition to (or substitute for) the constructors and child methods of Tree.

    /** The parent of this node, if any (otherwise null). */
    public Tree<T> parent() ...

    /** Sets parent() to P. */
    public void setParent(Tree<T> P);

    /** A leaf node with label L and parent P */
    public Tree(T L, Tree<T> P);

5.3 Representing trees
As usual, the representation one uses for a tree depends in large part upon the uses one has for it.
5.3.1 Root-down pointer-based binary trees
For doing the traversals on binary trees described below (§5.4), a straightforward transcription of the recursive definition is often appropriate, so that the fields are

    T L;                        /* Data stored at node. */
    BinaryTree<T> left, right;  /* Left and right children */

As I said about the sample definition of BinaryTree, this specialized representation is in practice more common than simply re-using the implementation of Tree. If the parent operation is to be supported, of course, we can add an additional pointer:

    BinaryTree<T> parent;  // or Tree, as appropriate

5.3.2 Root-down pointer-based ordered trees
The fields used for BinaryTree are also useful for certain non-binary trees, thanks to the leftmost-child, right-sibling representation. Assume that we are representing an ordered tree in which each internal node may have any number of children. We can have left for any node point to child #0 of the node and have right point to the next sibling of the node (if any), as illustrated in Figure 5.6.
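The leftmost-child, right-sibling idea is easiest to appreciate in code. Here is a minimal sketch under stated assumptions: the class LcrsNode and its methods are hypothetical names introduced for illustration, and labels are plain ints to keep the example short.

```java
/** A node of an ordered tree of arbitrary degree, stored with only two
 *  pointers per node (hypothetical class, for illustration only). */
class LcrsNode {
    final int label;
    LcrsNode left;    // first (leftmost) child, or null
    LcrsNode right;   // next sibling, or null

    LcrsNode(int label) { this.label = label; }

    /** Add CHILD after any existing children of this node; returns this. */
    LcrsNode addChild(LcrsNode child) {
        if (left == null) {
            left = child;
        } else {
            LcrsNode last = left;
            while (last.right != null)
                last = last.right;
            last.right = child;
        }
        return this;
    }

    /** The number of children of this node. */
    int degree() {
        int n = 0;
        for (LcrsNode c = left; c != null; c = c.right)
            n += 1;
        return n;
    }

    /** Child number K of this node, or null if there is none. */
    LcrsNode child(int k) {
        LcrsNode c = left;
        for (int i = 0; i < k && c != null; i += 1)
            c = c.right;
        return c;
    }

    /** The sum of the labels in the tree rooted at this node. */
    int sum() {
        int s = label;
        for (LcrsNode c = left; c != null; c = c.right)
            s += c.sum();
        return s;
    }
}
```

Note the trade-off: child(k) now takes time proportional to k rather than constant time, in exchange for a fixed two pointers per node regardless of degree.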
[Figure 5.6 (drawing not reproduced): nodes 0 through 9 linked by left (first-child) and right (next-sibling) pointers.]
Figure 5.6: Using a binary tree representation to represent an ordered tree of arbitrary degree. The tree represented is the one from Figure 5.1. Left (down) links point to the first child of a node, and right links to the next sibling.
A small example might be in order. Consider the problem of computing the sum of all the node values in a tree whose nodes contain integers (for which we’ll use the library class Integer, since our labels have to be Objects). The sum of all nodes in a tree is the sum of the value in the root plus the sum of the values in all children. We can write this as follows:

    /** The sum of the values of all nodes of T, assuming T is an
     *  ordered tree with no missing children. */
    static int treeSum(Tree<Integer> T)
    {
        int S;
        S = T.label();
        for (int i = 0; i < T.degree(); i += 1)
            S += treeSum(T.child(i));
        return S;
    }

(Java’s unboxing operations are silently at work here turning Integer labels into ints.)
An interesting sidelight is that the inductive proof of this program contains no obvious base case. The program above is nearly a direct transcription of “the sum of the value in the root plus the sum of the values in all children.” (The base case is hidden in the loop: for a leaf, degree() is 0, so the loop body never executes.)
5.3.3 Leaf-up representation
For applications where parent is the important operation, and child is not, a different representation is useful.

    T label;
    Tree<T> parent;  /* Parent of current node */

Here, child is an impossible operation.
The representation has a rather interesting advantage: it uses less space; there is one fewer pointer in each node. If you think about it, this might at first seem odd, since each representation requires one pointer per edge. The difference is that the “parental” representation does not need null pointers in all the external nodes, only in the root node. We will see applications of this representation in later chapters.
5.3.4 Array representations of complete trees
When a tree is complete, there is a particularly compact representation using an array. Consider the complete tree in Figure 5.2c. The parent of any node numbered k > 0 in that figure is node number ⌊(k − 1)/2⌋ (or in Java, (k-1)/2 or (k-1)>>1); the left child of node k is 2k + 1 and the right is 2k + 2. Had we numbered the nodes from 1 instead of 0, these formulae would have been even simpler: ⌊k/2⌋ for the parent, 2k for the left child, and 2k + 1 for the right. As a result, we can represent such complete trees as arrays containing just the label information, using indices into the array as pointers. Both the parent and child operations become simple. Of course, one must be careful to maintain the completeness property, or gaps will develop in the array (indeed, for certain incomplete trees, it can require an array with 2^h − 1 elements to represent a tree with h nodes).
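The index arithmetic is simple enough to check mechanically. The following sketch (the class and method names are hypothetical) collects both the 0-based and the 1-based formulas, and also computes where the last node of a right-leaning chain of h nodes would have to be stored, which illustrates the 2^h − 1 worst case mentioned above.

```java
class CompleteTreeIndex {
    // Numbering nodes from 0.
    static int parent(int k) { return (k - 1) / 2; }   // intended for k > 0
    static int left(int k) { return 2 * k + 1; }
    static int right(int k) { return 2 * k + 2; }

    // Numbering nodes from 1.
    static int parent1(int k) { return k / 2; }        // intended for k > 1
    static int left1(int k) { return 2 * k; }
    static int right1(int k) { return 2 * k + 1; }

    /** The 0-based index of the last node in a chain of H >= 1 nodes in
     *  which every node but the last has only a right child. */
    static int lastOnRightChain(int h) {
        int k = 0;
        for (int i = 1; i < h; i += 1)
            k = right(k);
        return k;
    }

    public static void main(String[] args) {
        System.out.println("children of 3: " + left(3) + ", " + right(3));
        System.out.println("last index for a 4-node right chain: "
                           + lastOnRightChain(4));
    }
}
```

For a chain of 4 nodes the last index is 14, so the array needs 15 = 2^4 − 1 elements, of which only 4 are in use.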
Unfortunately, the headers needed to effect this representation differ slightly from the ones above, since accessing an element of a tree represented by an array in this way requires three pieces of information (an array, an upper bound, and an index) rather than just a single pointer. In addition, we presumably want routines for allocating space for a new tree, specifying in advance its size. Here is an example, with some of the bodies supplied as well.

    /** A BinaryTree2<T> is an entire binary tree with labels of type T.
     *  The nodes in it are denoted by their breadth-first number in
     *  a complete tree. */
    class BinaryTree2<T> {
        protected T[] label;
        protected int size;

        /** A new BinaryTree2 with room for N labels. */
        public BinaryTree2(int N) {
            label = (T[]) new Object[N]; size = 0;
        }

        public int currentSize() { return size; }
        public int maxSize() { return label.length; }

        /** The label of node K in breadth-first order.
         *  Assumes 0 <= k < size. */
        public T label(int k) { return label[k]; }

        /** Cause label(K) to be VAL. */
        public void setLabel(int k, T val) { label[k] = val; }

        public int left(int k) { return 2*k+1; }
        public int right(int k) { return 2*k+2; }
        public int parent(int k) { return (k-1)/2; }

        /** Add one more node to the tree, the next in breadth-first
         *  order.  Assumes currentSize() < maxSize(). */
        public void extend(T label) {
            this.label[size] = label; size += 1;
        }
    }

We will see this array representation later, when we deal with heap data structures. The representation of the binary tree in Figure 5.2c is illustrated in Figure 5.7.
[Figure 5.7 (drawing not reproduced): an array whose elements at indices 0 through 10 hold the labels 0 through 10.]
Figure 5.7: The binary tree in Figure 5.2c, represented with an array. The labels of the nodes happen to equal their breadth-first positions. In that figure, nodes 7 and 8 are the left and right children of node 3. Therefore, their labels appear at positions 7 (3 · 2 + 1) and 8 (3 · 2 + 2).
5.3.5 Alternative representations of empty trees
In our representations, the empty tree tends to have a special status. For example, we can formulate a method to access the left node of a tree, T, with the syntax T.left(), but we can’t write a method such that T.isEmpty() is true iff T references the empty tree. Instead, we must write T == null. The reason, of course, is that we represent the empty tree with the null pointer, and no instance methods are defined on the null pointer’s dynamic type (more concretely, we get a NullPointerException if we try). If the null tree were represented by an ordinary object pointer, it wouldn’t need a special status.
For example, we could extend our definition of Tree from the beginning of §5.2 as follows:

    class Tree<T> {
        ...

        /** The one empty tree.  It is static (shared by all Trees);
         *  as an instance field, it would cause each new Tree,
         *  including each new EmptyTree, to create another EmptyTree. */
        private static final Tree<?> EMPTY = new EmptyTree<Object>();

        /** The empty tree, for use with labels of type T. */
        @SuppressWarnings("unchecked")
        public static <T> Tree<T> empty() { return (Tree<T>) EMPTY; }

        /** True iff THIS is the empty tree. */
        public boolean isEmpty() { return false; }

        private static class EmptyTree<T> extends Tree<T> {
            /** The empty tree */
            private EmptyTree() { }

            public boolean isEmpty() { return true; }
            public int degree() { return 0; }
            public int numChildren() { return 0; }

            /** The kth child (always an error). */
            public Tree<T> child(int k) {
                throw new IndexOutOfBoundsException();
            }

            /** The label of THIS (always an error). */
            public T label() {
                throw new IllegalStateException();
            }
        }
    }

There is only one empty tree (guaranteed because the EmptyTree class is private to the Tree class, an example of the Singleton design pattern), but this tree is a full-fledged object, and we will have less need to make special tests for null to avoid exceptions. We’ll be extending this representation further in the discussions of tree traversals (see §5.4.2).
5.4 Tree traversals
The function eval in §5.1 traverses (or walks) its argument; that is, it processes each node in the tree. Traversals are classified by the order in which they process the nodes of a tree. In the program eval, we first traverse (i.e., evaluate in this case) any children of a node, and then perform some processing on the results of these traversals and other data in the node. The latter processing is known generically as visiting the node. Thus, the pattern for eval is “traverse the children of the node, then visit the node,” an order known as postorder. One could also use postorder traversal for printing out the expression tree in reverse Polish form, where visiting a node means printing its contents (the tree in Figure 5.3 would come out as “x y 3 + * z -”). If the primary processing for each node (the “visitation”) occurs before that of the children, giving the pattern “visit the node, then traverse its children,” we get what is known as preorder traversal. Finally, the nodes in Figures 5.1 and 5.2 are all numbered in level order or breadth-first order, in which all nodes at a given level of the tree are visited before any nodes at the next.
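The three orders just described can be compared side by side on the expression tree for x*(y+3)-z. This is a minimal sketch, not the text's own code: StrNode and Traversals are hypothetical names, and "visiting" a node here simply means appending its label to a string.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** A binary tree node with a String label (hypothetical helper). */
class StrNode {
    final String label;
    final StrNode left, right;

    StrNode(String label, StrNode left, StrNode right) {
        this.label = label; this.left = left; this.right = right;
    }

    StrNode(String label) { this(label, null, null); }
}

class Traversals {
    /** Traverse the children, then visit the node. */
    static String postorder(StrNode t) {
        if (t == null) return "";
        return join(join(postorder(t.left), postorder(t.right)), t.label);
    }

    /** Visit the node, then traverse the children. */
    static String preorder(StrNode t) {
        if (t == null) return "";
        return join(t.label, join(preorder(t.left), preorder(t.right)));
    }

    /** Visit all nodes on one level before any on the next, using a queue. */
    static String levelOrder(StrNode t) {
        String out = "";
        Queue<StrNode> toDo = new ArrayDeque<>();
        if (t != null) toDo.add(t);
        while (!toDo.isEmpty()) {
            StrNode n = toDo.remove();
            out = join(out, n.label);
            if (n.left != null) toDo.add(n.left);
            if (n.right != null) toDo.add(n.right);
        }
        return out;
    }

    /** A and B separated by a blank, unless one of them is empty. */
    private static String join(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a + " " + b;
    }

    /** The tree for x*(y+3)-z. */
    static StrNode sample() {
        StrNode sum = new StrNode("+", new StrNode("y"), new StrNode("3"));
        StrNode prod = new StrNode("*", new StrNode("x"), sum);
        return new StrNode("-", prod, new StrNode("z"));
    }
}
```

On the sample tree, postorder yields "x y 3 + * z -" (the reverse Polish form), preorder yields "- * x + y 3 z", and level order yields "- * z x + y 3". The recursive versions rely on the call stack, whereas the level-order version needs an explicit queue, since nodes on the same level are not related by recursion.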
All of the traversal orders so far make sense for any kind of tree we’ve considered. There is one other standard traversal ordering that applies exclusively to binary trees: the inorder or symmetric traversal. Here, the pattern is “traverse the left child of the node, visit the node, and then traverse the right child.” In the case of expression trees, for example, such an order would reproduce the represented expression in infix order. Actually, that’s not quite accurate, because to get the expression properly parenthesized, the precise operation would have to be something like “write a left parenthesis, then traverse the left child, then write the operator, then traverse the right child, then write a right parenthesis,” in which the node seems to be visited several times. However, although such examples have led to at least one attempt to introduce a more general notation for traversals³, we usually just classify them approximately in one of the categories described above and leave it at that.
³ For example, Wulf, Shaw, Hilfinger, and Flon used such a classification scheme in Fundamental Structures of Computer Science (Addison-Wesley, 1980). Under that system, a preorder traversal is NLR (for visit Node, traverse Left, traverse Right), postorder is LRN, inorder is LNR, and the true order for writing parenthesized expressions is NLNRN. This nomenclature seems not to have caught on.
Figure 5.8 illustrates the nodes of several binary trees numbered in the order they would be visited by preorder, inorder, and postorder tree traversals.
[Figure 5.8 (drawings not reproduced): three seven-node binary trees, labeled Postorder, Preorder, and inorder.]
Figure 5.8: Order of node visitation in postorder, preorder, and inorder traversals.
5.4.1 Generalized visitation
I’ve been deliberately vague about what “visiting” means, since tree traversal is a general concept that is not specific to any particular action at the tree nodes. In fact, it is possible to write a general definition of traversal that takes as a parameter the action to be “visited upon” each node of the tree. In languages that, like Scheme, have function closures, we simply make the visitation parameter be a function parameter, as in

    ;; Visit the nodes of TREE, applying VISIT to each in inorder
    (define (inorder-walk tree visit)
      (if (not (null? tree))
          (begin (inorder-walk (left tree) visit)
                 (visit tree)
                 (inorder-walk (right tree) visit))))

so that printing all the nodes of a tree, for example, is just

    (inorder-walk myTree (lambda (x) (display (label x)) (newline)))

Rather than using functions, in Java we use objects (as for the java.util.Comparator interface in §2.2.4). For example, we can define interfaces such as

    public interface TreeVisitor<T> {
        void visit(Tree<T> node);
    }

    public interface BinaryTreeVisitor<T> {
        void visit(BinaryTree<T> node);
    }

The inorder-walk procedure above becomes

    static <T> BinaryTreeVisitor<T>
    inorderWalk(BinaryTree<T> tree, BinaryTreeVisitor<T> visitor)
    {
        if (tree != null) {
            inorderWalk(tree.left(), visitor);
            visitor.visit(tree);
            inorderWalk(tree.right(), visitor);
        }
        return visitor;
    }

and our sample call is

    inorderWalk(myTree, new PrintNode());

where myTree is, let’s say, a BinaryTree<String> and we have defined

    class PrintNode implements BinaryTreeVisitor<String> {
        public void visit(BinaryTree<String> node) {
            System.out.println(node.label());
        }
    }

Clearly, the PrintNode class could also be used with other kinds of traversals. Alternatively, we can leave the visitor anonymous, as it was in the original Scheme program:

    inorderWalk(myTree,
                new BinaryTreeVisitor<String>() {
                    public void visit(BinaryTree<String> node) {
                        System.out.println(node.label());
                    }
                });

The general idea of encapsulating an operation as we’ve done here and then carrying it to each item in a collection is another design pattern known simply as Visitor.
By adding state to a visitor, we can use it to accumulate results:

    /** A TreeVisitor that concatenates the labels of all nodes it
     *  visits. */
    public class ConcatNode implements BinaryTreeVisitor<String> {
        private StringBuffer result = new StringBuffer();

        public void visit(BinaryTree<String> node) {
            if (result.length() > 0)
                result.append(", ");
            result.append(node.label());
        }

        public String toString() { return result.toString(); }
    }

With this definition, we can print a comma-separated list of the items in myTree in inorder:

    System.out.println(inorderWalk(myTree, new ConcatNode()));

(This example illustrates why I had inorderWalk return its visitor argument. I suggest that you go through all the details of why this example works.)
5.4.2 Visiting empty trees
I defined the inorderWalk method of §5.4.1 to be a static (class) method rather than an instance method in part to make the handling of null trees clean. If we use the alternative empty-tree representation of §5.3.5, on the other hand, we can avoid special-casing the null tree and make traversal methods be part of the Tree class. For example, here is a possible preorder-walk method:

    class Tree<T> {
        ···
        public TreeVisitor<T> preorderWalk(TreeVisitor<T> visitor) {
            visitor.visit(this);
            for (int i = 0; i < numChildren(); i += 1)
                child(i).preorderWalk(visitor);
            return visitor;
        }
        ···
        private static class EmptyTree<T> extends Tree<T> {
            ···
            public TreeVisitor<T> preorderWalk(TreeVisitor<T> visitor) {
                return visitor;
            }
        }
    }

Here you see that there are no explicit tests for the empty tree at all; everything is implicit in which of the two versions of preorderWalk gets called.
5.4.3 Iterators on trees
Recursion fits tree data structures perfectly, since they are themselves recursively defined data structures. The task of providing a non-recursive traversal of a tree using an iterator, on the other hand, is rather more troublesome than was the case for sequences.
One possible approach is to use a stack and simply transform the recursive structure of a traversal in the same manner we showed for the findExit procedure in §4.4.1. We might get an iterator like that for BinaryTrees shown in Figure 5.9.
Another alternative is to use a tree data structure with parent links, as shown for binary trees in Figure 5.10. As you can see, this implementation keeps track of the next node to be visited (in postorder) in the field next. It finds the node to visit after next by looking at the parent, and deciding what to do based on whether next is the left or right child of its parent. Since this iterator does postorder traversal, the node after next is next’s parent if next is a right child (or an only child), and otherwise it is the deepest, leftmost descendent of the right child of the parent.

    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.Stack;

    public class PreorderIterator<T> implements Iterator<T> {
        private Stack<BinaryTree<T>> toDo = new Stack<BinaryTree<T>>();

        /** An Iterator that returns the labels of TREE in
         *  preorder. */
        public PreorderIterator(BinaryTree<T> tree) {
            if (tree != null)
                toDo.push(tree);
        }

        public boolean hasNext() {
            return !toDo.empty();
        }

        public T next() {
            if (toDo.empty())
                throw new NoSuchElementException();
            BinaryTree<T> node = toDo.pop();
            // Push the right child first, so that the left is popped first.
            if (node.right() != null)
                toDo.push(node.right());
            if (node.left() != null)
                toDo.push(node.left());
            return node.label();
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

Figure 5.9: An iterator for preorder binary-tree traversal using a stack to keep track of the recursive structure.

    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class PostorderIterator<T> implements Iterator<T> {
        private BinaryTree<T> next;

        /** An Iterator that returns the labels of TREE in
         *  postorder.  Assumes TREE is null or has no parent. */
        public PostorderIterator(BinaryTree<T> tree) {
            next = first(tree);
        }

        /** The first node of T in postorder (null if T is null): descend,
         *  to the left wherever possible, until reaching a leaf. */
        private BinaryTree<T> first(BinaryTree<T> t) {
            while (t != null && (t.left() != null || t.right() != null))
                t = (t.left() != null) ? t.left() : t.right();
            return t;
        }

        public boolean hasNext() {
            return next != null;
        }

        public T next() {
            if (next == null)
                throw new NoSuchElementException();
            T result = next.label();
            BinaryTree<T> p = next.parent();
            if (p == null || p.right() == next || p.right() == null)
                // Have just finished the root, or the last child of p.
                next = p;
            else
                next = first(p.right());
            return result;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

Figure 5.10: An iterator for postorder binary-tree traversal using parent links in the tree to keep track of the recursive structure.
Exercises
5.1. Implement an Iterator that enumerates the labels of a tree’s nodes in inorder, using a stack as in Figure 5.9.
5.2. Implement an Iterator that enumerates the labels of a tree’s nodes in inorder, using parent links as in Figure 5.10.
5.3. Implement a preorder Iterator that operates on the general type Tree (rather than BinaryTree).
Chapter 6
Search Trees
A rather important use of trees is in searching. The task is to find out whether some target value is present in a data structure that represents a set of data, and possibly to return some auxiliary information associated with that value. In all these searches, we perform a number of steps until we either find the value we’re looking for, or exhaust the possibilities. At each step, we eliminate some part of the remaining set from further consideration. In the case of linear searches (see §1.3.1), we eliminate one item at each step. In the case of binary searches (see §1.3.4), we eliminate half the remaining data at each step.
The problem with binary search is that the set of search items is difficult to change; adding a new item, unless it is larger than all existing data, requires that we move some portion of the array over to make room for the new item. The worst-case cost of this operation rises proportionately with the size of the set. Changing the array to a list solves the insertion problem, but the crucial operation of a binary search (finding the middle of a section of the array) becomes expensive.
Enter the tree. Let’s suppose that we have a set of data values, that we can extract from each data value a key, and that the set of possible keys is totally ordered; that is, we can always say that one key is either less than, greater than, or equal to another. What these mean exactly depends on the kind of data, but the terms are supposed to be suggestive. We can approximate binary search by having these data values serve as the labels of a binary search tree (or BST), which is defined to be a binary tree having the following property:
Binary-Search-Tree Property. For every node, x, of the tree, all nodes in the left subtree of x have keys that are less than or equal to the key of x and all nodes in the right subtree of x have keys that are greater than or equal to the key of x. ✷
Figure 6.1a is an example of a typical BST. In that example, the labels are integers, the keys are the same as the labels, and the terms “less than,” “greater than,” and “equal to” have their usual meanings.
The keys don’t have to be integers. In general, we can organize a set of values into a BST using any total ordering on the keys. A total ordering, let’s call it ‘⪯’, has the following properties:
• Completeness: For any values x and y, either x ⪯ y or y ⪯ x, or both;
• Transitivity: If x ⪯ y and y ⪯ z, then x ⪯ z, and
• Anti-symmetry: If x ⪯ y and y ⪯ x, then x = y.
For example, the keys can be integers, and greater than, etc., can have their usual meanings. Or the data and keys can be strings, with the ordering being dictionary order. Or the data can be pairs, (a, b), and the keys can be the first items of the pairs. A dictionary is like that: it is ordered by the words being defined, regardless of their meanings. This last order is an example where one might expect to have several distinct items in the search tree with equal keys.
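In Java, an ordering on keys is conveniently packaged as a java.util.Comparator. The sketch below illustrates the dictionary case, under stated assumptions: DictEntry and KeyOrders are hypothetical names, and the sample entries are made up for illustration. The comparator looks only at the key (the word), so two distinct entries with the same word compare as equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A dictionary entry: a (word, meaning) pair (hypothetical helper). */
class DictEntry {
    final String word, meaning;

    DictEntry(String word, String meaning) {
        this.word = word; this.meaning = meaning;
    }
}

class KeyOrders {
    /** Orders entries by their keys (the words) alone. */
    static final Comparator<DictEntry> BY_WORD = new Comparator<DictEntry>() {
        public int compare(DictEntry a, DictEntry b) {
            return a.word.compareTo(b.word);
        }
    };

    /** The meanings of a few sample entries, after sorting by word. */
    static List<String> sortedMeanings() {
        List<DictEntry> entries = new ArrayList<>();
        entries.add(new DictEntry("tree", "a woody plant"));
        entries.add(new DictEntry("leaf", "part of a plant"));
        entries.add(new DictEntry("tree", "a connected, acyclic graph"));
        entries.sort(BY_WORD);   // List.sort is stable: ties keep their order
        List<String> result = new ArrayList<>();
        for (DictEntry e : entries)
            result.add(e.meaning);
        return result;
    }
}
```

The ordering is total on the keys, but not on the entries themselves: the two "tree" entries are distinct items with equal keys, which is exactly the situation described above.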
An important property of BSTs, which follows immediately from their definition, is that traversing a BST in inorder visits its nodes in ascending (more precisely, non-decreasing) order of their keys. This leads to a simple algorithm for sorting known as “tree sort.”
[Figure 6.1 (drawings not reproduced): tree (a) contains the labels 42, 19, 16, 25, 30, 60, 50, and 91; tree (b) contains 16, 19, 25, ...]
Figure 6.1: Two binary search trees. Tree (b) is a right-leaning linear tree.

    /** Permute the elements of A into non-decreasing order.  Assumes
     *  the elements of A have an order on them. */
    static void sort(SomeType[] A) {
        int i;
        BST T;
        T = null;
        for (i = 0; i < A.length; i += 1) {
            insert A[i] into search tree T.
        }
        i = 0;
        traverse T in inorder, where visiting a node, Q, means
            A[i] = Q.label(); i += 1;
    }

The array contains elements of type SomeType, by which I intend to denote a type that has less-than and equals operators on it, as required by the definition of a BST.
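For concreteness, here is one way the tree-sort outline might be filled in for arrays of ints. It is a sketch under stated assumptions: the nested Node class and the helpers insert and fill are hypothetical, and the insertion routine simply sends equal labels to the right.

```java
import java.util.Arrays;

class TreeSort {
    /** A BST node with an int label (a hypothetical, minimal version). */
    private static class Node {
        int label;
        Node left, right;
        Node(int label) { this.label = label; }
    }

    /** Insert LABEL into T, returning the modified tree. */
    private static Node insert(Node t, int label) {
        if (t == null)
            return new Node(label);
        if (label < t.label)
            t.left = insert(t.left, label);
        else
            t.right = insert(t.right, label);
        return t;
    }

    /** Store the labels of T, in inorder, into A starting at index I,
     *  returning the index after the last one stored. */
    private static int fill(Node t, int[] a, int i) {
        if (t == null)
            return i;
        i = fill(t.left, a, i);
        a[i] = t.label;
        return fill(t.right, a, i + 1);
    }

    /** Permute the elements of A into non-decreasing order. */
    static void sort(int[] a) {
        Node t = null;
        for (int x : a)
            t = insert(t, x);
        fill(t, a, 0);
    }

    public static void main(String[] args) {
        int[] a = { 42, 19, 60, 16, 25, 50, 91, 30, 19 };
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Because both helpers are recursive, their depth of recursion equals the height of the tree, so input that is already sorted produces the deepest recursion; this sketch is meant for small examples only.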
6.1 Operations on a BST
A BST is simply a binary tree, and therefore we can use the representation from §5.2, giving the class in Figure 6.2. For now, I will use the type int for labels, and we’ll assume that labels are the same as keys.
Since it is possible to have more than one instance of a label in this particular version of binary search tree, I have to specify carefully what it means to remove that label or to find a node that contains it. Here I have chosen the “highest” node containing the label, the one nearest the root. [Why will this always be unique? That is, why can’t there be two highest nodes containing a label, equally near the root?]
One problematic feature of this particular BST definition is that the data structure is relatively unprotected. As the comment on insert indicates, it is possible to “break” a BST by inserting something injudicious into one of its children, as with BST.insert(T.left(), 42), when T.label() is 20. When we incorporate the representation into a full-fledged implementation of SortedSet (see §6.2), we’ll protect it against such abuse.

    /** A binary search tree. */
    class BST {
        protected int label;
        protected BST left, right;

        /** A leaf node with given LABEL */
        public BST(int label) { this(label, null, null); }

        /** Fetch the label of this node. */
        public int label() ...

        /** Fetch the left (right) child of this. */
        public BST left() ...
        public BST right() ...

        /** The highest node in T that contains the
         *  label L, or null if there is none. */
        public static BST find(BST T, int L) ...

        /** True iff label L is in T. */
        public static boolean isIn(BST T, int L)
        { return find(T, L) != null; }

        /** Insert the label L into T, returning the modified tree.
         *  The nodes of the original tree may be modified.  If
         *  T is a subtree of a larger BST, T', then insertion into
         *  T will render T' invalid due to violation of the binary-
         *  search-tree property if L > T'.label() and T is in
         *  T'.left() or L < T'.label() and T is in T'.right(). */
        public static BST insert(BST T, int L) ...

        /** Delete the instance of label L from T that is closest
         *  to the root and return the modified tree.  The nodes of
         *  the original tree may be modified. */
        public static BST remove(BST T, int L) ...

        /* This constructor is private to force all BST creation
         * to be done by the insert method. */
        private BST(int label, BST left, BST right) {
            this.label = label; this.left = left; this.right = right;
        }
    }

Figure 6.2: A BST representation.
6.1.1 Searching a BST
Searching a BST is very similar to binary search in an array, with the root of the tree corresponding to the middle of the array.

    /** The highest node in T that contains the
     *  label L, or null if there is none. */
    public static BST find(BST T, int L)
    {
        if (T == null || L == T.label)
            return T;
        else if (L < T.label)
            return find(T.left, L);
        else
            return find(T.right, L);
    }

6.1.2 Inserting into a BST
As promised, the advantage of using a tree is that it is relatively cheap to add things to it, as in the following routine.

    /** Insert the label L into T, returning the modified tree.
     *  The nodes of the original tree may be modified.... */
    static BST insert(BST T, int L)
    {
        if (T == null)
            return new BST(L, null, null);
        if (L < T.label)
            T.left = insert(T.left, L);
        else
            T.right = insert(T.right, L);
        return T;
    }

Because of the particular way that I have written this, when I insert multiple copies of a value into the tree, they always go “to the right” of all existing copies. I will preserve this property in the delete operation.
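A small, self-contained experiment makes two of these points visible: duplicates pile up to the right, so that find returns the copy nearest the root, and the shape of the tree depends heavily on the order of insertion. The class IntBST below is a hypothetical, stripped-down stand-in for BST; the convention that an empty tree has height -1 is likewise an assumption made here for convenience.

```java
/** A stripped-down BST with int labels (hypothetical stand-in for BST). */
class IntBST {
    int label;
    IntBST left, right;

    IntBST(int label) { this.label = label; }

    /** Insert L into T, returning the modified tree. */
    static IntBST insert(IntBST t, int l) {
        if (t == null)
            return new IntBST(l);
        if (l < t.label)
            t.left = insert(t.left, l);
        else
            t.right = insert(t.right, l);
        return t;
    }

    /** The highest node in T containing L, or null if there is none. */
    static IntBST find(IntBST t, int l) {
        if (t == null || l == t.label)
            return t;
        else if (l < t.label)
            return find(t.left, l);
        else
            return find(t.right, l);
    }

    /** The height of T, with the empty tree taken to have height -1. */
    static int height(IntBST t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    /** The tree resulting from inserting LABELS, in order, into the
     *  empty tree. */
    static IntBST build(int... labels) {
        IntBST t = null;
        for (int l : labels)
            t = insert(t, l);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(height(build(42, 19, 60, 16, 25, 50, 91, 30)));
        System.out.println(height(build(16, 19, 25, 30, 42, 50, 60, 91)));
    }
}
```

The same eight labels give a tree of height 3 in the first order of insertion and a right-leaning chain of height 7 in ascending order, which is the kind of tree shown in Figure 6.1(b).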
6.1.3 Deleting items from a BST
Deletion is quite a bit more complex, since when one removes an internal node, one can’t just let its children fall off, but must re-attach them somewhere in the tree. Obviously, deletion of an external node is easy; just replace it with the null tree (see Figure 6.3(a)). It’s also easy to remove an internal node that is missing one child: just have the other child commit patricide and move up (Figure 6.3(b)).
When neither child is empty, we can find the successor of the node we want to remove: the first node in the right subtree, when it is traversed in inorder. Now that node will contain the smallest key in the right subtree. Furthermore, because it is the first node in inorder, its left child will be null [why?]. Therefore, we can replace that node with its right child and move its key to the node we are removing, as shown in Figure 6.3(c).
[Figure 6.3 (drawings not reproduced): the trees that result from three removals: remove 30, remove 25, and remove 42.]
Figure 6.3: Three possible deletions, each starting from the tree in Figure 6.1.
A possible set of subprograms for deletion from a BST appears in Figure 6.4. The auxiliary routine swapSmallest is an additional method private to BST, and defined as follows.

    /** Move the label from the first node in T (in an inorder
     *  traversal) to node R (over-writing the current label of R),
     *  remove the first node of T from T, and return the resulting tree.
     */
    private static BST swapSmallest(BST T, BST R) {
        if (T.left == null) {
            R.label = T.label;
            return T.right;
        } else {
            T.left = swapSmallest(T.left, R);
            return T;
        }
    }

    /** Delete the instance of label L from T that is closest
     *  to the root and return the modified tree.  The nodes of
     *  the original tree may be modified. */
    public static BST remove(BST T, int L) {
        if (T == null)
            return null;
        if (L < T.label)
            T.left = remove(T.left, L);
        else if (L > T.label)
            T.right = remove(T.right, L);
        // Otherwise, we've found L
        else if (T.left == null)
            return T.right;
        else if (T.right == null)
            return T.left;
        else
            T.right = swapSmallest(T.right, T);
        return T;
    }

Figure 6.4: Removing items from a BST without parent pointers.
staticBSTinsert(BSTT,intL){ BSTnewNode; if(T==null)
returnnewBST(L,null,null); if(L<T.label)
T.left=newNode=insert(T.left,L); else
T.right=newNode=insert(T.right,L); newNode.parent=T; return T; }
6.1.4Operationswithparentpointers
IfwerevisetheBSTclasstoprovidea parent operation,andaddacorresponding parent fieldtotherepresentation,theoperationsbecomemorecomplex,butprovide abitmoreflexibility.Itisprobablywise not toprovidea setParent operationfor BST,sinceitisparticularlyeasytodestroythebinary-search-treepropertywiththis operation,andaclientof BST wouldbeunlikelytoneeditinanycase,giventhe existenceof insert and remove operations.
Theoperation find operationisunaffected,sinceitignoresparentnodes.When insertingina BST,ontheotherhand,lifeiscomplicatedbythefactthat insert mustsettheparentofanynodeinserted.Figure6.5showsone way.Finally,removal froma BST withparentpointers—showninFigure6.6—istrickiestofall,asusual.
6.1.5Degeneracystrikes
Unfortunately,allisnotroses.ThetreeinFigure6.1(b)is theresultofinserting nodesintoatreeinascendingorder(obviously,thesametreecanresultfromappropriatedeletionsfromalargertreeaswell).Youshouldbeabletoseethatdoinga searchorinsertiononthistreeisjustlikedoingasearchor insertiononalinkedlist; it is alinkedlist,butwithextrapointersineachelementthatarealwaysnull.This treeisnot balanced:itcontainssubtreesinwhichleftandrightchildrenhavemuch differentheights.WewillreturntothisquestioninChapter 9,afterdevelopinga bitmoremachinery.
6.2ImplementingtheSortedSetinterface
ThestandardJavalibraryinterface SortedSet (see §2.2.4)providesakindof Collection thatsupports rangequeries. Thatis,aprogramcanusetheinterface tofindallitemsinacollectionthatarewithinacertainrangeofvalues,according tosomeorderingrelation.Searchingforasinglespecificvalueissimplyaspecial caseinwhichtherangecontainsjustonevalue.Itisfairlyeasytoimplementthis
6.2.IMPLEMENTINGTHESORTEDSETINTERFACE 113
Figure6.5:InsertionintoaBSTthathasparentpointers.
    /** Delete the instance of label L from T that is closest to
     *  the root and return the modified tree. The nodes of
     *  the original tree may be modified. */
    public static BST remove(BST T, int L) {
        if (T == null)
            return null;
        BST newChild;
        newChild = null;
        if (L < T.label)
            T.left = newChild = remove(T.left, L);
        else if (L > T.label)
            T.right = newChild = remove(T.right, L);
        // Otherwise, we've found L
        else if (T.left == null)
            return T.right;
        else if (T.right == null)
            return T.left;
        else
            T.right = newChild = swapSmallest(T.right, T);
        if (newChild != null)
            newChild.parent = T;
        return T;
    }

    private static BST swapSmallest(BST T, BST R) {
        if (T.left == null) {
            R.label = T.label;
            return T.right;
        } else {
            T.left = swapSmallest(T.left, R);
            if (T.left != null)
                T.left.parent = T;
            return T;
        }
    }
Figure 6.6: Removing items from a BST with parent pointers.
interface using a binary search tree as the representation; we’ll call the result a BSTSet.
Let’s plan ahead a little. Among the operations we’ll have to support are headSet, tailSet, and subSet, which return views of some underlying set that consist of a subrange of that set. The values returned will be full-fledged SortedSets in their own right, modifications to which are supposed to modify the underlying set as well (and vice-versa). Since a full-fledged set can also be thought of as a view of a range in which the bounds are “infinitely small” to “infinitely large,” we might look for a representation that supports both sets created “fresh” from a constructor, and those that are views of other sets. This suggests a representation for our set that contains a pointer to the root of a BST, and two bounds indicating the largest and smallest members of the set, with null indicating a missing bound.
We make the root of the BST a (permanent) sentinel node for an important reason. We will use the same tree for all views of the set. If our representation simply pointed at a root of the tree that contained data, then this pointer would have to change whenever that node of the tree was removed. But then, we would have to make sure to update the root pointer in all other views of the set as well, since they are also supposed to reflect changes in the set. By introducing the sentinel node, shared by all views and never deleted, we make the problem of keeping them all up to date trivial. This is a typical example of the old computer-science maxim: Most technical problems can be solved by introducing another level of indirection.
Assuming we use parent pointers, an iterator through a set can consist of a pointer to the next node whose label is to be returned, a pointer to the last node whose label was returned (for implementing remove) and a pointer to the BSTSet being iterated over (conveniently provided in Java by making the iterator an inner class). The iterator will proceed in inorder, skipping over portions of the tree that are outside the bounds on the set. See also Exercise 5.2 concerning iterating using a parent pointer.
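The inorder step that such an iterator needs can be sketched as follows. This is only an illustration under stated assumptions: ParentWalk, link, leftmost, successor, range, and sample are hypothetical names, all data is assumed to hang off the left link of the sentinel (consistent with treating the sentinel as larger than every label), and, unlike the iterator described above, range simply walks from the smallest label rather than skipping out-of-bounds subtrees. The tree built by sample is one arrangement of the seven labels and is not claimed to be the arrangement drawn in Figure 6.8.

```java
import java.util.ArrayList;
import java.util.List;

class ParentWalk {
    static class Node {
        String label;
        Node left, right, parent;
        Node(String label) { this.label = label; }
    }

    /** Attach CHILD as the left (if LEFT) or right child of P. */
    static Node link(Node p, Node child, boolean left) {
        if (left) p.left = child; else p.right = child;
        child.parent = p;
        return child;
    }

    /** The leftmost node in the subtree rooted at N. */
    static Node leftmost(Node n) {
        while (n.left != null)
            n = n.left;
        return n;
    }

    /** The node after N in inorder. The successor of the largest
     *  data node is the sentinel itself. */
    static Node successor(Node n) {
        if (n.right != null)
            return leftmost(n.right);
        // Climb until we arrive at a parent from its left subtree.
        while (n.parent != null && n == n.parent.right)
            n = n.parent;
        return n.parent;
    }

    /** Labels L with LOW <= L < HIGH, in order; null means unbounded. */
    static List<String> range(Node sentinel, String low, String high) {
        List<String> out = new ArrayList<>();
        for (Node n = leftmost(sentinel); n != sentinel; n = successor(n)) {
            if (high != null && n.label.compareTo(high) >= 0)
                break;
            if (low == null || n.label.compareTo(low) >= 0)
                out.add(n.label);
        }
        return out;
    }

    /** A sample tree (hypothetical layout) under a sentinel. */
    static Node sample() {
        Node sent = new Node(null);
        Node h = link(sent, new Node("hartebeest"), true);
        Node dog = link(h, new Node("dog"), true);
        link(dog, new Node("axolotl"), true);
        Node elk = link(dog, new Node("elk"), false);
        Node duck = link(elk, new Node("duck"), true);
        link(duck, new Node("elephant"), false);
        link(elk, new Node("gnu"), false);
        return sent;
    }
}
```

Because the loop ends when it reaches the sentinel, no special case is needed for an empty tree or for the largest label.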
Figure 6.8 illustrates a BSTSet, showing the major elements of the representation: the original set, the BST that contains its data, a view of the same set, and an iterator over this view. The sets all contain space for a Comparator (see §2.2.4) to allow the user of the set to specify an ordering; in Figure 6.8, we use the natural ordering, which on strings gives us lexicographical order. Figure 6.7 contains a sketch of the corresponding Java declarations for the representation.
6.3 Orthogonal Range Queries
Binary search trees divide data (ideally) into halves, using a linear ordering on the data. The divide-and-conquer idea, however, does not require that the factor be two. Suppose we are dealing with keys that have more structure. For example, consider a collection of items that have locations on, say, some two-dimensional area. In some cases, we may wish to find items in this collection based on their location; their keys are their locations. While it is possible to impose a linear ordering on such keys, it is not terribly useful. For example, we could use a lexicographic ordering, and define $(x_0, y_0) > (x_1, y_1)$ iff $x_0 > x_1$, or $x_0 = x_1$ and $y_0 > y_1$. However, with
public class BSTSet<T> extends AbstractSet<T> {

    /** The empty set, using COMP as the ordering. */
    public BSTSet(Comparator<T> comp) {
        comparator = comp;
        low = high = null;
        sent = new BST<T>();
    }

    /** The empty set, using natural ordering. */
    public BSTSet() { this((Comparator<T>) null); }

    /** The set initialized to the contents of C, with natural order. */
    public BSTSet(Collection<? extends T> c) { this(); addAll(c); }

    /** The set initialized to the contents of S, same ordering. */
    public BSTSet(SortedSet<? extends T> s) {
        this(s.comparator()); addAll(s);
    }

    /** Value of comparator(); null if naturally ordered. */
    private Comparator<T> comparator;

    /** Bounds on elements in this class, null if no bounds. */
    private T low, high;

    /** Sentinel of BST containing data. */
    private final BST<T> sent;
Figure 6.7: Java representation for BSTSet class, showing only constructors and instance variables.
    /** Used internally to form views. */
    private BSTSet(BSTSet<T> set, T low, T high) {
        comparator = set.comparator();
        this.low = low; this.high = high;
        this.sent = set.sent;
    }

    /** An iterator over BSTSet. */
    private class BSTIter implements Iterator<T> {
        /** Next node in iteration to yield. Equals the sentinel node
         *  when done. */
        BST<T> next;

        /** Node last returned by next(), or null if none, or if remove()
         *  has intervened. */
        BST<T> last;

        BSTIter() {
            last = null;
            next = first node that is in bounds, or sent if none;
        }
        ···
    }

    /** A node in the BST */
    private static class BST<T> {
        T label;
        BST<T> left, right, parent;

        /** A sentinel node */
        BST() { label = null; parent = null; }

        BST(T label, BST<T> left, BST<T> right) {
            this.label = label; this.left = left; this.right = right;
        }
    }
}
Figure 6.7, continued: Private nested classes used in implementation.
[Figure 6.8 diagram: the sets fauna and subset, the iterator I (fields BSTSet.this, last, next), and a shared BST whose sentinel root is labeled ∞, with nodes hartebeest, dog, axolotl, elk, duck, elephant, and gnu.]
Figure 6.8: A BSTSet, fauna, a view, subset, formed from fauna.subSet("dog", "gnu"), and an iterator, I, over subset. The BST part of the representation is shared between fauna and subset. Triangles represent whole subtrees, and rounded rectangles represent individual nodes. Each set contains a pointer to the root of the BST (a sentinel node, whose label is considered larger than any value in the tree), plus lower and upper bounds on the values (null means unbounded), and a Comparator (in this case, null, indicating natural order). The iterator contains a pointer to subset, which it is iterating over, a pointer (next) to the node containing the next label in sequence (“duck”) and another pointer (last) to the node containing the label in the sequence that was last delivered by I.next(). The dashed regions of the BST are skipped entirely by the iterator. The “hartebeest” node is not returned by the iterator, but the iterator does have to pass through it to get down to the nodes it does return.
that definition, the set of all objects between points A and B consists of all those objects whose horizontal position lies between those of A and B, but whose vertical position is arbitrary (a long vertical strip). Half the information is unused.
The term quadtree (or quad tree) refers to a class of search tree structure that better exploits two-dimensional location data. Each step of a search divides the remaining data into four groups, one for each of four quadrants of a rectangle about some interior point. This interior dividing point can be the center (so that the quadrants are equal) giving a PR quadtree (also called a point-region quadtree or just region quadtree), or it can be one of the points that is stored in the tree, giving a point quadtree.
Figure 6.9 illustrates the idea behind the two types of quadtree. Each node of the tree corresponds to a rectangular region (possibly infinite in the case of point quadtrees). Any region may be subdivided into four rectangular subregions to the northwest, northeast, southeast, and southwest of some interior dividing point. These subregions are represented by children of the tree node that corresponds to the dividing point. For PR quadtrees, these dividing points are the centers of rectangles, while for point quadtrees, they are selected from the data points, just as the dividing values in a binary search tree are selected from the data stored in the tree.
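To make the point-quadtree idea concrete, here is a minimal sketch. The class PointQuadTree and its methods are hypothetical names introduced for illustration, and the tie-breaking rule (points on a dividing line go to the east or north side) is an assumption, since the text does not specify one. The range count shows how a rectangular query visits only those quadrants that can overlap the query rectangle.

```java
class PointQuadTree {
    private static class Node {
        final double x, y;
        final Node[] kids = new Node[4];   // 0 = NW, 1 = NE, 2 = SE, 3 = SW
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;
    private int size;

    /** Quadrant of N (0 = NW, 1 = NE, 2 = SE, 3 = SW) containing (X, Y).
     *  Assumed convention: ties go east and north. */
    private static int quadrant(Node n, double x, double y) {
        boolean east = x >= n.x, north = y >= n.y;
        if (north)
            return east ? 1 : 0;
        return east ? 2 : 3;
    }

    int size() { return size; }

    /** Add (X, Y); return false iff it was already present. */
    boolean add(double x, double y) {
        if (root == null) {
            root = new Node(x, y);
            size += 1;
            return true;
        }
        Node n = root;
        while (true) {
            if (n.x == x && n.y == y)
                return false;
            int q = quadrant(n, x, y);
            if (n.kids[q] == null) {
                n.kids[q] = new Node(x, y);
                size += 1;
                return true;
            }
            n = n.kids[q];
        }
    }

    boolean contains(double x, double y) {
        Node n = root;
        while (n != null) {
            if (n.x == x && n.y == y)
                return true;
            n = n.kids[quadrant(n, x, y)];
        }
        return false;
    }

    /** Number of points in the rectangle [XLO, XHI] x [YLO, YHI]. */
    int countInRange(double xlo, double ylo, double xhi, double yhi) {
        return count(root, xlo, ylo, xhi, yhi);
    }

    private static int count(Node n, double xlo, double ylo,
                             double xhi, double yhi) {
        if (n == null)
            return 0;
        int c = (xlo <= n.x && n.x <= xhi && ylo <= n.y && n.y <= yhi) ? 1 : 0;
        // Descend only into quadrants that can intersect the query.
        boolean west = xlo < n.x, east = xhi >= n.x;
        boolean south = ylo < n.y, north = yhi >= n.y;
        if (north && west) c += count(n.kids[0], xlo, ylo, xhi, yhi);
        if (north && east) c += count(n.kids[1], xlo, ylo, xhi, yhi);
        if (south && east) c += count(n.kids[2], xlo, ylo, xhi, yhi);
        if (south && west) c += count(n.kids[3], xlo, ylo, xhi, yhi);
        return c;
    }
}
```

As with a binary search tree, the shape of a point quadtree depends on insertion order, so this sketch can degenerate in the same way.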
6.4 Priority queues and heaps
Suppose that we are faced with a different problem. Instead of being able to search quickly for the presence of any element in a set, let us restrict ourselves to searching for the largest (by flipping everything in the following discussion around in the obvious way, we can search for smallest elements instead). Finding the largest in a BST is reasonably easy [how?], but we still have to deal with the imbalance problem described above. By restricting ourselves to the operations of inserting an element, and finding and deleting the largest element, we can avoid the balancing problem easily. A data structure supporting just those operations is called a priority queue, because we remove items from it in the order of their values, regardless of arrival order.
In Java, we could simply make a class that implements SortedSet and that was particularly fast at the operations first and remove(x), when x happens to be the first element of the set. But of course, the user of such a class might be surprised to find how slow it is to iterate through an entire set. Therefore, we might specialize a bit, as shown in Figure 6.10.
A convenient data structure for representing priority queues is the heap (not to be confused with the large area of storage from which new allocates memory, an unfortunate but traditional clash of nomenclature). A heap is simply a positional tree (usually binary) satisfying the following property.
Heap Property. The label at any node in the tree is greater than or equal to the label of any descendant of that node.
[Figure 6.9 diagram: geometry and tree drawings for a PR quadtree (top, with levels labeled by square edge sizes 200, 100, 50, and 25) and a point quadtree (bottom), each holding the points A through G.]
Figure 6.9: Illustration of two kinds of quadtree for the same set of data. On top is a four-level PR quadtree, using square regions for simplicity. Below is a corresponding point quadtree (there are many, depending on which points are used to divide the data). In each, the left diagram shows the geometry; the dots represent the positions (the keys) of seven data items at (40, 30), ( 30, 10), (20, 90), (30, 60), (10, 70), (70, 70), and (80, 20). On the right, we see the corresponding tree data structures. For the PR quadtree, each level of the tree contains nodes that represent squares with the same size of edge (shown at the right). For the point quadtree, each point is the root of a subtree that divides a rectangular region into four, generally unequal, regions. The four children of each node represent the upper-left, upper-right, lower-left, and lower-right quadrants of their common parent node, respectively. To simplify the drawing, we have not shown the children of a node when they are all empty.
    interface PriorityQueue<T extends Comparable<T>> {
        /** Insert item L into this queue. */
        public void insert(T L);

        /** True iff this queue is empty. */
        public boolean isEmpty();

        /** The largest element in this queue. Assumes !isEmpty(). */
        public T first();

        /** Remove and return an instance of the largest element (there may
         *  be more than one; removes only one). Assumes !isEmpty(). */
        public T removeFirst();
    }
Figure 6.10: A possible interface to priority queues.
Since the order of the children is immaterial, there is more freedom in how to arrange the values in the heap, making it easy to keep a heap bushy. Accordingly, when we use the unqualified term “heap” in this context, we will mean a complete tree with the heap property. This speeds up all the manipulations of heaps, since the time required to do insertions and deletions is proportional to the height of the heap. Figure 6.11 illustrates a typical heap.
Implementing the operation of finding the largest value is obviously easy. To delete the largest element, while keeping both the heap property and the bushiness of the tree, we first move the “last” item on the bottom level of the heap to the root of the tree, replacing and deleting the largest element, and then “reheapify” to re-establish the heap property. Figure 6.11b–d illustrates the process. It is typical to do this with a binary tree represented as an array, as in the class BinaryTree2 of §5.3. Figure 6.12 gives a possible implementation.
By repeatedly finding the largest element, of course, we can sort an arbitrary set of objects:
    /** Sort the elements of A in ascending order. */
    static void heapSort(Integer[] A) {
        if (A.length <= 1)
            return;
        Heap<Integer> H = new Heap<Integer>(A.length);
        H.setHeap(A, 0, A.length);
        for (int i = A.length - 1; i >= 0; i -= 1)
            A[i] = H.removeFirst();
    }
The process is illustrated in Figure 6.13.
[Figure 6.11 diagrams; node labels in level order: (a) 91 60 30 42 5 4 2; (b) 2 60 30 42 5 4 −∞; (c) 60 2 30 42 5 4 −∞; (d) 60 42 30 2 5 4 −∞.]
Figure 6.11: Illustrative heap (a). The sequence (b)–(d) shows steps in the deletion of the largest item. The last (bottommost, rightmost) label is first moved up to overwrite that of the root. It is then “sifted down” until the heap property is restored. The shaded nodes show where the heap property is violated during the process.
class Heap<T extends Comparable<T>>
    extends BinaryTree2<T> implements PriorityQueue<T> {

    /** A heap containing up to N > 0 elements. */
    public Heap(int N) { super(N); }

    /** The minimum label value (written −∞). */
    static final int MIN = Integer.MIN_VALUE;

    /** Insert item L into this queue. */
    public void insert(T L) {
        extend(L);
        reHeapifyUp(currentSize() - 1);
    }

    /** True iff this queue is empty. */
    public boolean isEmpty() { return currentSize() == 0; }

    /** The largest element in this queue. Assumes !isEmpty(). */
    public T first() { return label(0); }

    /** Remove and return an instance of the largest element (there may
     *  be more than one; removes only one). Assumes !isEmpty(). */
    public T removeFirst() {
        T result = label(0);
        setLabel(0, label(currentSize() - 1));
        size -= 1;
        reHeapifyDown(0);
        return result;
    }
Figure 6.12: Implementation of a common kind of priority queue: the heap.
    /** Restore the heap property in this tree, assuming that only
     *  NODE may have a label larger than that of its parent. */
    protected void reHeapifyUp(int node) {
        if (node <= 0)
            return;
        T x = label(node);
        while (node != 0 && label(parent(node)).compareTo(x) < 0) {
            setLabel(node, label(parent(node)));
            node = parent(node);
        }
        setLabel(node, x);
    }

    /** Restore the heap property in this tree, assuming that only
     *  NODE may have a label smaller than those of its children. */
    protected void reHeapifyDown(int node) {
        T x = label(node);
        while (true) {
            if (left(node) >= currentSize())
                break;
            int largerChild =
                (right(node) >= currentSize()
                 || label(right(node)).compareTo(label(left(node))) <= 0)
                ? left(node) : right(node);
            if (x.compareTo(label(largerChild)) >= 0)
                break;
            setLabel(node, label(largerChild));
            node = largerChild;
        }
        setLabel(node, x);
    }

    /** Set the labels in this Heap to A[off], A[off+1], ...
     *  A[off+len-1]. Assumes that LEN <= maxSize(). */
    public void setHeap(T[] A, int off, int len) {
        for (int i = 0; i < len; i += 1)
            setLabel(i, A[off + i]);
        size = len;
        heapify();
    }

    /** Turn label(0)..label(size-1) into a proper heap. */
    protected void heapify() { ... }
}
Figure 6.12, continued.
[Figure 6.13 diagram; array contents at each step, with | marking the gap: (a) 19 0 -1 7 23 2 42; (b) 42 23 19 7 0 2 -1; (c) 23 7 19 -1 0 2 | 42; (d) 19 7 2 -1 0 | 23 42; (e) 7 0 2 -1 | 19 23 42; (f) 2 0 -1 | 7 19 23 42; (g) 0 -1 | 2 7 19 23 42; (h) -1 | 0 2 7 19 23 42.]
Figure 6.13: An example of heap sort. The original array is in (a); (b) is the result of setHeap; (c)–(h) are the results of successive iterations. Each shows the active part of the heap array and the portion of the output array that has been set, separated by a gap.
We could simply implement heapify like this:
    protected void heapify() {
        for (int i = 1; i < size; i += 1)
            reHeapifyUp(i);
    }
Interestingly enough, however, this implementation is not quite as fast as it could be, and it is faster to perform the operation by a different method, in which we work from the leaves back up. That is, in reverse level order, we swap each node with its parent, if it is larger, and then, as for reHeapifyDown, continue moving the parent’s value down the tree until heapness is restored. It might seem that this is no different from repeated insertion, but we will see later that it is.
    protected void heapify() {
        for (int i = size/2 - 1; i >= 0; i -= 1)
            reHeapifyDown(i);
    }
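The two ways of heapifying can be tried side by side on a plain int array. The sketch below is self-contained and does not use the Heap or BinaryTree2 classes; HeapDemo, siftUp, siftDown, and the comparisons counter are hypothetical names for illustration. Two simplifications are assumed: heapSort here works in place within one array rather than through a separate Heap object, and the counter tallies every label comparison (including the one between siblings), which is not exactly the counting convention used in the analysis of the text.

```java
class HeapDemo {
    /** Number of label comparisons performed so far. */
    static long comparisons = 0;

    /** Move A[NODE] down within A[0..SIZE-1] until neither child is larger. */
    static void siftDown(int[] a, int node, int size) {
        int x = a[node];
        while (true) {
            int left = 2 * node + 1, right = left + 1;
            if (left >= size)
                break;
            int larger = left;
            if (right < size) {
                comparisons += 1;
                if (a[right] > a[left])
                    larger = right;
            }
            comparisons += 1;
            if (x >= a[larger])
                break;
            a[node] = a[larger];
            node = larger;
        }
        a[node] = x;
    }

    /** Move A[NODE] up until its parent is at least as large. */
    static void siftUp(int[] a, int node) {
        int x = a[node];
        while (node != 0) {
            int parent = (node - 1) / 2;
            comparisons += 1;
            if (a[parent] >= x)
                break;
            a[node] = a[parent];
            node = parent;
        }
        a[node] = x;
    }

    /** Heapify by repeated insertion (first method). */
    static void heapifyByInsertion(int[] a) {
        for (int i = 1; i < a.length; i += 1)
            siftUp(a, i);
    }

    /** Heapify from the leaves back up (second method). */
    static void heapifyBottomUp(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i -= 1)
            siftDown(a, i, a.length);
    }

    /** Sort A in ascending order, in place. */
    static void heapSort(int[] a) {
        heapifyBottomUp(a);
        for (int size = a.length; size > 1; size -= 1) {
            int largest = a[0];
            a[0] = a[size - 1];        // Last label overwrites the root...
            siftDown(a, 0, size - 1);  // ...and is sifted down.
            a[size - 1] = largest;     // Output grows from the right.
        }
    }
}
```

On the array of Figure 6.13(a), heapifyBottomUp produces the arrangement shown in part (b) of that figure. Resetting comparisons to 0 before each call and heapifying an ascending array of 1023 elements both ways shows the bottom-up method doing far fewer comparisons.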
6.4.1 Heapify Time
If we measure the time requirements for sorting N items with heapSort, we see that it is the cost of “heapifying” N elements plus the time required to extract N items. The worst-case cost of extracting N items from the heap, $C_e(N)$, is dominated by the cost of reHeapifyDown, starting at the top node of the heap. If we count comparisons of parent labels against child labels, you can see that the worst-case cost here is proportional to the current height of the heap. Suppose the initial height of the heap is $k$ (and that $N = 2^{k+1} - 1$). It stays that way until $2^k$ items have been extracted (removing the whole bottom row), and then becomes $k - 1$. It stays at $k - 1$ for the next $2^{k-1}$ items, and so forth. Thus, the total time spent extracting items is
$$C_e(N) = C_e(2^{k+1} - 1) = 2^k \cdot k + 2^{k-1} \cdot (k-1) + \cdots + 2^0 \cdot 0.$$
If we write $2^k \cdot k$ as $\underbrace{2^k + \cdots + 2^k}_{k}$ and re-arrange the terms, we get
$$\begin{aligned}
C_e(2^{k+1} - 1) &= 2^k \cdot k + 2^{k-1} \cdot (k-1) + \cdots + 2^0 \cdot 0 \\
&= 2^1 + 2^2 + \cdots + 2^{k-1} + 2^k \\
&\quad {} + 2^2 + \cdots + 2^{k-1} + 2^k \\
&\quad {} + \cdots \\
&\quad {} + 2^{k-1} + 2^k \\
&\quad {} + 2^k \\
&= (2^{k+1} - 2) + (2^{k+1} - 4) + \cdots + (2^{k+1} - 2^{k-1}) + (2^{k+1} - 2^k) \\
&= k 2^{k+1} - (2^{k+1} - 2) \\
&\in \Theta(k 2^{k+1}) = \Theta(N \lg N)
\end{aligned}$$
Now let’s consider the cost of heapifying N elements. If we do it by inserting the N elements one by one and performing reHeapifyUp, then we get a cost like that of extracting N elements: For the first insertion, we do 0 label comparisons; for the next 2, we do 1; for the next 4, we do 2; etc., or
$$C^u_h(2^{k+1} - 1) = 2^0 \cdot 0 + 2^1 \cdot 1 + \cdots + 2^k \cdot k,$$
where $C^u_h(N)$ is the worst-case cost of heapifying N elements by repeated reHeapifyUps. This is the same as the one we just did, giving
$$C^u_h(N) \in \Theta(N \lg N).$$
But suppose we heapify by performing the second algorithm at the end of §6.4, performing a reHeapifyDown on all the items of the array starting at item $\lfloor N/2 \rfloor - 1$ and going toward item 0. The cost of reHeapifyDown depends on the distance to the deepest level. For the last $2^k$ items in the heap, this cost is 0 (which is why we skip them). For the preceding $2^{k-1}$, the cost is 1, etc. This gives
$$C^d_h(N) = C^d_h(2^{k+1} - 1) = 2^{k-1} \cdot 1 + 2^{k-2} \cdot 2 + \cdots + 2^0 \cdot k.$$
Using the same trick as before,
$$\begin{aligned}
C^d_h(2^{k+1} - 1) &= 2^{k-1} \cdot 1 + 2^{k-2} \cdot 2 + \cdots + 2^0 \cdot k \\
&= 2^0 + 2^1 + \cdots + 2^{k-2} + 2^{k-1} \\
&\quad {} + 2^0 + 2^1 + \cdots + 2^{k-2} \\
&\quad {} + \cdots \\
&\quad {} + 2^0 \\
&= (2^k - 1) + (2^{k-1} - 1) + \cdots + (2^1 - 1) \\
&= 2^{k+1} - 2 - k \\
&\in \Theta(N)
\end{aligned}$$
So this second heapification method runs considerably faster (asymptotically) than the obvious repeated-insertion method. Of course, since the cost of extracting N elements is still Θ(N lg N) in the worst case, the overall worst-case cost of heap sort is still Θ(N lg N). However, this does lead you to expect that for big enough N, there will be some constant-factor advantage to using the second form of heapification, and that’s an important practical consideration.
6.5 Game Trees
Consider the problem of finding the best move in a two-person game with perfect information (i.e., no element of chance). Naively, you could do this by enumerating all possible moves available to the player whose turn it is from the current position, somehow assigning a score to each, and then picking the move with the highest score. For example, you might score a position by counting material, that is, by comparing the
number of your pieces against those of your opponent. But such a score would be misleading. A move might give more pieces, but set up a devastating response from your opponent. So, for each move, you should also consider all your opponent’s possible moves, assume he picks the best one for him, and use that as the value.
But what if you have a great response to his response? How can we organize this search sensibly?
A typical approach is to think of the space of possible continuations of a game as a tree, appropriately known as a game tree. Each node in the tree is a position in the game; each edge is a move. Figure 6.14 illustrates the kind of thing we mean. Each node is a position; the children of a node are the possible next positions. The numbers on each node are values you guess for the positions (where larger means better for you). The question is how to get these numbers.
Let’s consider the problem recursively. Given that it is your move in a certain position, represented by node P, you presumably will choose the move that gives you the best score; that is, you will choose the child of P with the maximum score. Therefore, it is reasonable to assign the score of that child as the score of P itself.
Contrariwise, if node Q represents a position in which it is the opponent’s turn to move, the opponent will presumably do best by choosing the child of Q that gives the minimum score (since minimum for you means best for the opponent). Thus, the appropriate value to assign to Q is that of the smallest child. The numbers on the illustrative game tree in Figure 6.14 conform to this rule for assigning scores, which is known as the minimax algorithm. The starred nodes in the diagram indicate which nodes (and therefore moves) you and your opponent would consider to be best given these scores.
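The scoring rule just described can be sketched in a few lines. The sketch assumes, purely for illustration, a complete binary game tree whose leaf scores are stored left to right in an array; Minimax and value are hypothetical names. The leaf scores used in the usage note are taken to be the bottom row of numbers in Figure 6.14.

```java
class Minimax {
    /** The minimax value of the complete binary game tree whose leaf
     *  scores, left to right, are LEAVES[LO..HI-1] (HI - LO a power of
     *  two). MAXIMIZING is true iff it is your move at the root of
     *  this subtree. */
    static int value(int[] leaves, int lo, int hi, boolean maximizing) {
        if (hi - lo == 1)
            return leaves[lo];          // A leaf: use its given score.
        int mid = (lo + hi) / 2;
        // The other player moves at the children.
        int left = value(leaves, lo, mid, !maximizing);
        int right = value(leaves, mid, hi, !maximizing);
        return maximizing ? Math.max(left, right) : Math.min(left, right);
    }
}
```

With leaf scores -30, -5, 5, 15, -20, -30, 9, 10 and your move at the root, value gives -5 for the root, with -5 and -20 for its two children, the same numbers that appear on the upper levels of Figure 6.14.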
This procedure explains how to assign scores to inner nodes, but it doesn’t help with the leaves (the base case of the recursion). If our tree is complete in the sense that each leaf node represents a final position in the game, it’s easy to assign leaf values. You can choose some positive value for positions in which you have won, some negative value for positions in which your opponent has won, and zero for ties. With such a tree, if you have the first move, then you know that you can force a win if the root node has a positive value (just choose a child with that value), force a tie if the top node has 0 value (likewise), and that you will always suffer defeat (against a perfect opponent) if the top node has a negative value.
[Figure 6.14 diagram; node scores by level: your move: -5; opponent’s move: -5, -20; your move: -5, 15, -20, 10; opponent’s move: -30, -5, 5, 15, -20, -30, 9, 10.]
Figure 6.14: A game tree. Nodes are positions, edges are moves, and numbers are scores that estimate “goodness” of each position for you. Stars indicate which child would be chosen from the position above it.
However, for most interesting games, the game tree is too big either to store or even to compute, except near the very end of the game. So we cut off computation at some point, even though the resulting leaf positions are not final positions. Typically, we choose a maximum depth, and use some heuristic to compute the value for the leaf based just on the position itself (called a static valuation). A slight variation is to use iterative deepening: repeating the search at increasing depths until we reach some time limit, and taking the best result found so far.
6.5.1 Alpha-beta pruning
As with any tree search, game-tree searches are exponential in the depth of the tree (the number of moves (or ply) one looks ahead). Furthermore, game trees can have fairly substantial branching factors (the term used for the average number of next positions, or children, of a node). It’s easy to see that if one has 16 choices for each move, one will not be able to look very many moves ahead. We can mitigate this problem somewhat by pruning the game tree as we search it.
One technique, known as alpha-beta pruning, is based on a simple observation: if I have already calculated that moving to a certain position, P, will get me a score of at least α, and I have partially evaluated some other possible position, Q, to the point that I know its value will be < α, then I can cease any further computation of Q (pruning its unexplored branches), knowing that I will never choose it. Likewise, when computing values for the opponent, if I determine that a
[Figure 6.15 diagram; node scores by level: your move: -5; opponent’s move: -5, ≤-20; your move: -5, ≥5, -20; opponent’s move: -30, -5, 5, -20, -30.]
Figure 6.15: Alpha-beta pruning applied to the game tree from Figure 6.14. Missing subtrees have been pruned.
/** A legal move for WHO that either has an estimated value >= CUTOFF
 *  or that has the best estimated value for player WHO, starting from
 *  position START, and looking up to DEPTH moves ahead. */
Move findBestMove(Player who, Position start, int depth, double cutoff)
{
    if (start is a won position for who) return WON_GAME;         /* Value=∞ */
    else if (start is a lost position for who) return LOST_GAME;  /* Value=−∞ */
    else if (depth == 0) return guessBestMove(who, start, cutoff);

    Move bestSoFar = Move.REALLY_BAD_MOVE;
    for (each legal move, M, for who from position start) {
        Position next = start.makeMove(M);
        /* Negate here and below because best for opponent = worst for WHO */
        Move response = findBestMove(who.opponent(), next,
                                     depth-1, -bestSoFar.value());
        if (-response.value() > bestSoFar.value()) {
            Set M's value to -response.value();
            bestSoFar = M;
            if (M.value() >= cutoff) break;
        }
    }
    return bestSoFar;
}

/** Static evaluation function. */
Move guessBestMove(Player who, Position start, double cutoff)
{
    Move bestSoFar;
    bestSoFar = Move.REALLY_BAD_MOVE;
    for (each legal move, M, for who from position start) {
        Position next = start.makeMove(M);
        Set M's value to heuristic guess of value to who of next;
        if (M.value() > bestSoFar.value()) {
            bestSoFar = M;
            if (M.value() >= cutoff)
                break;
        }
    }
    return bestSoFar;
}
Figure 6.16: Game-tree search with alpha-beta pruning.
certain position will yield a value no more than β (bigger scores are better for me, worse for the opponent), then I can stop computation on any other position for the opponent whose value is known to be > β. This observation leads to the technique of alpha-beta pruning.
For example, consider Figure 6.15. At the ‘≥ 5’ position, I know that the opponent will not choose to move here (since he already has a −5 move). At the ‘≤ −20’ position, my opponent knows that I will never choose to move here (since I already have a −5 move).
Alpha-beta pruning is by no means the only way to speed up search through a game tree. Much more sophisticated search strategies are possible, and are covered in AI courses.
6.5.2 A game-tree search algorithm
The pseudocode in Figure 6.16 summarizes the discussion in this section. If you examine the figure, you’ll see that the game tree we’ve been talking about in this section never actually materializes. Instead, we generate the children of a node as we need them, and throw them away when no longer needed. Indeed, there is no tree data structure present at all; the trees shown in Figures 6.14 and 6.15 are conceptual, or if you prefer, they describe computations rather than data structures.
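For readers who want something runnable, the following sketch applies alpha-beta pruning to a small complete binary game tree. It is not a transcription of Figure 6.16: it uses the common two-bound formulation (explicit alpha and beta, with separate maximizing and minimizing cases) instead of the figure's negated single cutoff, and the leaf array stands in for a static evaluation function. AlphaBeta, value, and leavesVisited are hypothetical names. As in the figure, no node objects are ever built; the tree exists only as the pattern of recursive calls.

```java
class AlphaBeta {
    /** Number of leaf positions evaluated so far. */
    static int leavesVisited = 0;

    /** Minimax value of the complete binary game tree over
     *  LEAVES[LO..HI-1], given that the maximizing player is already
     *  assured at least ALPHA and the minimizing player at most BETA.
     *  The result is exact whenever it lies strictly between ALPHA
     *  and BETA. */
    static int value(int[] leaves, int lo, int hi, boolean maximizing,
                     int alpha, int beta) {
        if (hi - lo == 1) {
            leavesVisited += 1;
            return leaves[lo];
        }
        int mid = (lo + hi) / 2;
        int[][] halves = { {lo, mid}, {mid, hi} };
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int[] h : halves) {
            int v = value(leaves, h[0], h[1], !maximizing, alpha, beta);
            if (maximizing) {
                best = Math.max(best, v);
                alpha = Math.max(alpha, best);
            } else {
                best = Math.min(best, v);
                beta = Math.min(beta, best);
            }
            if (alpha >= beta)
                break;      // The remaining children cannot matter.
        }
        return best;
    }
}
```

Called on the leaf scores -30, -5, 5, 15, -20, -30, 9, 10 with your move at the root and the widest possible initial bounds (Integer.MIN_VALUE and Integer.MAX_VALUE), this returns -5 after evaluating only five of the eight leaves, which agrees with the five bottom-level positions that remain in Figure 6.15.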
Exercises
6.1. Fill in a concrete implementation for the type QuadTree that has the following constructor:
    /** An initially empty quadtree that is restricted to contain points
     *  within the W x H rectangle whose center is at (X0, Y0). */
    public QuadTree(double x0, double y0, double w, double h) ...
and no other constructors.
6.2. Fill in a concrete implementation for the type QuadTree that has the following constructor:
    /** An initially empty quadtree. */
    public QuadTree() ...
and no other constructors. This problem is more difficult than the preceding exercise, because there is no a priori limit on the boundaries of the entire region. While you could simply use the maximum and minimum floating-point numbers for these bounds, the result would in general be a wasteful tree structure with many useless levels. Therefore, it makes sense to grow the region covered, as necessary, starting from some arbitrary initial size.
6.3. Suppose that we introduce a new kind of removal operation for BSTs that have parent pointers (see §6.1.4):
    /** Delete the label T.label() from the BST containing T, assuming
     *  that the parent of T is not null. The nodes of the original tree
     *  will be modified. */
    public static BST remove(BST T) { ··· }
Give an implementation of this operation.
6.4. The implementation of BSTSet in §6.2 left out one detail: the implementation of the size method. Indeed, the representation given in Figure 6.7 provides no way to compute it other than to count the nodes in the underlying BST each time. Show how to augment the representation of BSTSet or its nested classes as necessary so as to allow a constant-time implementation of size. Remember that the size of any view of a BSTSet might change when you add or remove elements from any other view of the same BSTSet.
6.5. Assume that we have a heap that is stored with the largest element at the root. To print all elements of this heap that are greater than or equal to some key X, we could perform the removeFirst operation repeatedly until we get something less than X, but this would presumably take worst-case time Θ(k lg N), where N is the number of items in the heap and k is the number of items greater than or equal to X. Furthermore, of course, it changes the heap. Show how to perform this operation in Θ(k) time without modifying the heap.
Chapter 7 Hashing
Sorted arrays and binary search trees all allow fast queries of the form “is there something larger (smaller) than X in here?” Heaps allow the query “what is the largest item in here?” Sometimes, however, we are interested in knowing only whether some item is present; in other words, only in equality.
Consider again the isIn procedure from §1.3.1, a linear search in a sorted array. This algorithm requires an amount of time at least proportional to N, the number of items stored in the array being searched. If we could reduce N, we would speed up the algorithm. One way to reduce N is to divide the set of keys being searched into some number, say M, of disjoint subsets and to then find some fast way of choosing the right subset. By dividing the keys more-or-less evenly among subsets, we can reduce the time required to find something to be proportional, on the average, to N/M. This is what binary search does recursively (isInB from §1.3.4), with M = 2. If we could go even further and choose a value for M that is comparable to N, then the time required to find a key becomes almost constant.
The problem is to find a way (preferably fast) of picking subsets (bins) in which to put keys to be searched for. This method must be consistent, since whenever we are asked to search for something, we must go to the subset we originally selected for it. That is, there must be a function, known as a hashing function, that maps keys to be searched for into the range of values 0 to M − 1.
7.1 Chaining
Once we have this hashing function, we must also have a representation of the set of bins. Perhaps the simplest scheme is to use linked lists to represent the bins, a practice known as chaining in the hash-table literature. The standard Java library class HashSet uses just such a strategy, illustrated in Figure 7.1. More usually, hash tables appear as mappings, such as implementations of the standard Java interface java.util.Map. The representation is the same, except that the entries in the bins carry not only keys, but also the additional information that is supposed to be indexed by those keys. Figure 7.2 shows part of a possible implementation of the standard Java class java.util.HashMap, which is itself an implementation of the Map interface.
The HashMap class shown in Figure 7.2 uses the hashCode method defined for all Java Objects to select a bin number for any key. If this hash function is a good one, the bins will receive roughly equal numbers of items (see §7.3 for more discussion). We can decide on an a priori limit on the average number of items per bin, and then grow the table whenever that limit is exceeded. This is the purpose of the loadFactor field and constructor argument. It’s natural to ask whether we might use a “faster” data structure (such as a binary search tree) for the bins. However, if we really do choose reasonable values for the size of the table, so that each bin contains only a few items, that clearly won’t gain us much. Growing the bins array when it exceeds our chosen limit is like growing an ArrayList (§4.1). For good asymptotic time performance, we roughly double its size each time it becomes necessary to grow the table. We have to remember, in addition, that the bin numbers of most items will change, so that we’ll have to move them.
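The interplay of bins, load factor, and growth can be seen in a much smaller setting than a full Map. The following sketch is a chained hash set restricted to int keys; ChainedIntSet and its methods are hypothetical names, the hash function is x mod M computed with Math.floorMod (so that negative keys get non-negative bin numbers), and the growth rule of 2M + 1 bins is an arbitrary choice for this illustration rather than the prime-seeking rule of Figure 7.2.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedIntSet {
    private List<LinkedList<Integer>> bins;
    private int size;
    private final double loadFactor;

    /** An empty set with INITIALBINS bins that keeps the average bin
     *  size at or below LOADFACTOR. */
    ChainedIntSet(int initialBins, double loadFactor) {
        this.bins = newBins(initialBins);
        this.loadFactor = loadFactor;
    }

    private static List<LinkedList<Integer>> newBins(int n) {
        List<LinkedList<Integer>> result = new ArrayList<>(n);
        for (int i = 0; i < n; i += 1)
            result.add(new LinkedList<>());
        return result;
    }

    /** Bin number for KEY: always in 0 .. numBins()-1. */
    private int hash(int key) { return Math.floorMod(key, bins.size()); }

    int size() { return size; }
    int numBins() { return bins.size(); }
    int binSize(int i) { return bins.get(i).size(); }

    boolean contains(int key) { return bins.get(hash(key)).contains(key); }

    /** Add KEY; return false iff it was already present. */
    boolean add(int key) {
        if (contains(key))
            return false;
        bins.get(hash(key)).add(key);
        size += 1;
        if (size > bins.size() * loadFactor)
            grow();
        return true;
    }

    /** Roughly double the number of bins and move every key, since
     *  most bin numbers change when the table size changes. */
    private void grow() {
        List<LinkedList<Integer>> old = bins;
        bins = newBins(2 * old.size() + 1);
        for (LinkedList<Integer> bin : old)
            for (int key : bin)
                bins.get(hash(key)).add(key);
    }
}
```

With 11 bins and a load factor of 2.0, the table does not grow until a 23rd key arrives, at which point every key is rehashed into 23 bins.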
7.2 Open-address hashing
In the Good Old Days, the overhead of “all those link fields” and the expense of “all those new operations” led people to consider ways of avoiding linked lists for representing the contents of bins. The open-addressing schemes put the entries directly into the bins (one per bin). If a bin is already full, then subsequent entries that have the same hash value overflow into other, unused entries according to some systematic scheme. As a result, the put operation from Figure 7.2 would look something like this:
    public Val put(Key key, Val value) {
        int h = hash(key);
        while (bins.get(h) != null && !bins.get(h).key.equals(key))
            h = nextProbe(h);
        if (bins.get(h) == null) {
            bins.set(h, new entry);
            size += 1;
            if ((float) size / bins.size() > loadFactor)
                resize bins;
            return null;
        } else
            return bins.get(h).setValue(value);
    }
and get would be similarly modified.
The function nextProbe provides another value in the index range of bins for use when it turns out that the table is already occupied at position h (a situation known as a collision).
In the simplest case nextProbe(h) simply returns (h+1) % bins.size(), an instance of what is known as linear probing. More generally, linear probing
[Figure 7.1 diagram: nums points to a table with size: 17, loadFactor: 2.0, and bins 0 to 10 holding, respectively: 0, 22; 23; (empty); (empty); 26, 81; 5, 82; 38, 83; 39, 84; 40, 63, -3; 9, 86; 65.]
Figure 7.1: Illustration of a simple hash table with chaining, pointed to by the variable nums. The table contains 11 bins, each containing a pointer to a linked list of the items (if any) in that bin. This particular table represents the set {81, 22, 38, 26, 86, 82, 0, 23, 39, 65, 83, 40, 9, −3, 84, 63, 5}. The hash function is simply h(x) = x mod 11 on integer keys. (The mathematical operation a mod b is defined to yield a − b⌊a/b⌋ when b ≠ 0. Therefore, it is always non-negative if b > 0.) The current load factor in this set is 17/11 ≈ 1.5, against a maximum of 2.0 (the loadFactor field), although as you can see, the bin sizes vary from 0 to 3.
package java.util;

public class HashMap<Key,Val> extends AbstractMap<Key,Val> {

    /** A new, empty mapping using a hash table that initially has
     *  INITIALBINS bins, and maintains a load factor <= LOADFACTOR. */
    public HashMap(int initialBins, float loadFactor) {
        if (initialBins < 1 || loadFactor <= 0.0)
            throw new IllegalArgumentException();
        bins = new ArrayList<Entry<Key,Val>>(initialBins);
        bins.addAll(Collections.nCopies(initialBins, null));
        size = 0; this.loadFactor = loadFactor;
    }

    /** An empty map with INITIALBINS initial bins and load factor 0.75. */
    public HashMap(int initialBins) { this(initialBins, 0.75f); }

    /** An empty map with default initial bins and load factor 0.75. */
    public HashMap() { this(127, 0.75f); }

    /** A mapping that is a copy of M. */
    public HashMap(Map<Key,Val> M) { this(M.size(), 0.75f); putAll(M); }

    public Val get(Object key) {
        Entry<Key,Val> e = find(key, bins.get(hash(key)));
        return (e == null) ? null : e.value;
    }

    /** Cause get(KEY) == VALUE. Returns the previous get(KEY). */
    public Val put(Key key, Val value) {
        int h = hash(key);
        Entry<Key,Val> e = find(key, bins.get(h));
        if (e == null) {
            bins.set(h, new Entry<Key,Val>(key, value, bins.get(h)));
            size += 1;
            if (size > bins.size() * loadFactor) grow();
            return null;
        } else
            return e.setValue(value);
    }
    ···
Figure 7.2: Part of an implementation of class java.util.HashMap, a hash-table-based implementation of the java.util.Map interface.
    private static class Entry<K,V> implements Map.Entry<K,V> {
        K key; V value;
        Entry<K,V> next;
        Entry(K key, V value, Entry<K,V> next)
            { this.key = key; this.value = value; this.next = next; }
        public K getKey() { return key; }
        public V getValue() { return value; }
        public V setValue(V x)
            { V old = value; value = x; return old; }
        public int hashCode() { see Figure 2.14 }
        public boolean equals(Object x) { see Figure 2.14 }
    }

    private ArrayList<Entry<Key,Val>> bins;
    private int size;   /** Number of items currently stored */
    private float loadFactor;

    /** Increase number of bins. */
    private void grow() {
        HashMap<Key,Val> newMap
            = new HashMap<Key,Val>(primeAbove(bins.size()*2), loadFactor);
        newMap.putAll(this); copyFrom(newMap);
    }

    /** Return a value in the range 0 .. bins.size()-1, based on
     *  the hash code of KEY. */
    private int hash(Object key) {
        return (key == null) ? 0
            : (0x7fffffff & key.hashCode()) % bins.size();
    }

    /** Set THIS to the contents of S, destroying the previous
     *  contents of THIS, and invalidating S. */
    private void copyFrom(HashMap<Key,Val> S)
        { size = S.size; bins = S.bins; loadFactor = S.loadFactor; }

    /** The Entry in the list BIN whose key is KEY, or null if none. */
    private Entry<Key,Val> find(Object key, Entry<Key,Val> bin) {
        for (Entry<Key,Val> e = bin; e != null; e = e.next)
            if (key == null ? e.key == null : key.equals(e.key))
                return e;
        return null;
    }

    private int primeAbove(int N) { return a prime number ≥ N; }
}
Figure 7.2, continued: Private declarations for HashMap.
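adds a positive constant that is relatively prime to the table size bins.size() [why relatively prime?]. If we take the 17 keys of Figure 7.1, in the order in which they are listed alongside the table for this example, and insert them into an array of size 23 using linear probing with increment 1 and x mod 23 as the hash function, the array of bins will contain the keys shown in that table.

As you can see, several keys are displaced from their natural positions. For example, 84 mod 23 = 15 and 63 mod 23 = 17, yet 84 ends up in bin 21 and 63 in bin 2.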
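The displacement is easy to reproduce. The following is a minimal sketch, not part of the HashMap class of Figure 7.2: the class name LinearProbingDemo and its methods are invented for illustration, keys are plain ints, and null marks an empty bin.

```java
final class LinearProbingDemo {
    /** The 17 keys of the example, in insertion order. */
    static final int[] KEYS =
        { 81, 22, 38, 26, 86, 82, 0, 23, 39, 65, 83, 40, 9, -3, 84, 63, 5 };

    /** Bin in which to start looking for KEY: the non-negative x mod size. */
    static int hash(int key, int size) {
        return Math.floorMod(key, size);
    }

    /** Insert KEYS, in order, into a fresh table of SIZE bins (null = empty).
     *  Assumes fewer than SIZE distinct keys, so an empty bin always exists. */
    static Integer[] build(int[] keys, int size) {
        Integer[] table = new Integer[size];
        for (int key : keys) {
            int i = hash(key, size);
            while (table[i] != null) {
                i = (i + 1) % size;          // linear probing, increment 1
            }
            table[i] = key;
        }
        return table;
    }

    /** Number of bins examined before KEY is found or an empty bin is hit. */
    static int probes(Integer[] table, int key) {
        int i = hash(key, table.length), n = 1;
        while (table[i] != null && table[i] != key) {
            i = (i + 1) % table.length;
            n += 1;
        }
        return n;
    }
}
```

Calling build(KEYS, 23) places 63 in bin 2 after nine probes (bins 17 through 22, then 0, 1, and 2), which is the long "chain" discussed in the text that follows.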
There is a clustering phenomenon associated with linear probing. The problem is simple to see with reference to the chaining method. If the sequence of entries examined in searching for some key is, say, b_0, b_1, ..., b_n, and if any other key should hash to one of these b_i, then the sequence of entries examined in searching for it will be part of the same sequence, b_i, b_{i+1}, ..., b_n, even when the two keys have different hash values. In effect, what would be two distinct lists under chaining are merged together under linear probing, as much as doubling the effective average size of the bins for those keys. The longest chain for our set of integers (see Figure 7.1) was only 3 long. In the open-address example above, the longest chain is 9 items long (look at 63), even though only two other keys (86 and 40) have the same hash value.

By having nextProbe increment the value by different amounts, depending on the original key (a technique known as double hashing), we can ameliorate this effect.

Deletion from an open-addressed hash table is non-trivial. Simply marking an entry as "unoccupied" can break the chain of colliding entries, and delete more than the desired item from the table [why?]. If deletion is necessary (often, it is not), we have to be more subtle. The interested reader is referred to volume 3 of Knuth, The Art of Computer Programming.

The problem with open-address schemes in general is that keys that would be in separate bins under the chaining scheme can compete with each other. Under the chaining scheme, if all entries are full and we search for a key that is not in the table, the search requires only as many probes (i.e., tests for equality) as there are keys in the table that have the same hashed value. Under any open-addressing scheme, it would require N probes to find that the key is not in the table. In my experience, the cost of the extra link nodes required for chaining is relatively unimportant, and for most purposes, I recommend using chaining rather than open-address schemes.
7.3 The hash function

This leaves the question of what to use for the function hash, used to choose the bin in which to place a key. In order for the map or set we are implementing to work properly, it is first important that our hash function satisfy two constraints:
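Keys of the open-addressing example of §7.2, in insertion order:

    {81, 22, 38, 26, 86, 82, 0, 23, 39, 65, 83, 40, 9, −3, 84, 63, 5}

Resulting array of 23 bins (linear probing with increment 1, hash function x mod 23); bins not listed are empty:

    index:   0   1   2   3   5   9  12  13  14  15  16  17  18  19  20  21  22
    key:     0  23  63  26   5   9  81  82  83  38  39  86  40  65  −3  84  22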
1. For any key value, K, the value of hash(K) must remain constant while K is in the table (or the table must be reconstructed if hash(K) changes) during the execution of the program.

2. If two keys are equal (according to the equals method, or whatever equality test the hash table is using), then their hash values must be equal.

If either condition is violated, a key can effectively disappear from the table. On the other hand, it is not generally necessary for the value of hash to be constant from one execution of a program to the next, nor is it necessary that unequal keys have unequal hash values (although performance will clearly suffer if too many keys have the same hash value).
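To see a key "disappear," it is enough to change a key's hash value while it sits in a table. The sketch below uses the standard java.util.HashSet with a mutable List as the key; the class name VanishingKeyDemo is made up for the illustration, and the behavior described is what one observes with the usual OpenJDK implementation rather than something the text above specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VanishingKeyDemo {
    /** Returns { found before mutation, found after mutation }. */
    static boolean[] run() {
        Set<List<Integer>> set = new HashSet<>();
        List<Integer> key = new ArrayList<>(Arrays.asList(1, 2));
        set.add(key);
        boolean before = set.contains(key);   // the key is filed under hash([1, 2])

        key.add(3);                            // hashCode() of the key changes in place
        boolean after = set.contains(key);     // the lookup now uses hash([1, 2, 3])

        return new boolean[] { before, after };
    }
}
```

The list is still stored in the set (its size is still 1), but the lookup uses the new hash value and so fails to match the entry that was filed under the old one.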
If the keys are simply non-negative integers, a simple and effective function is to use the remainder modulo the table size:

    hash(X) == X % bins.size();

For integers that might be negative, we have to make some provision. For example,

    hash(X) = (X & 0x7fffffff) % bins.size();

has the effect of adding $2^{31}$ to any negative value of X first [why?]. Alternatively, if bins.size() is odd, then

    hash(X) = X % ((bins.size()+1) / 2) + bins.size() / 2;

will also work [why?].
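Both provisions can be checked mechanically. This is a small sketch under the assumption of 32-bit two's-complement ints, as in Java; the names IntHashDemo, hashMask, and hashShift are illustrative only. Clearing the sign bit of a negative int removes a bit of weight −2^31, which is the same as adding 2^31. For an odd size 2k+1, Java's % yields a value between −k and k when the divisor is k+1, so adding k lands in 0..2k.

```java
final class IntHashDemo {
    /** Clear the sign bit, then take the remainder. */
    static int hashMask(int x, int size) {
        return (x & 0x7fffffff) % size;
    }

    /** For odd SIZE = 2k+1: x % (k+1) lies in -k..k, so adding k gives 0..2k. */
    static int hashShift(int x, int size) {
        return x % ((size + 1) / 2) + size / 2;
    }
}
```

Note that the two functions generally send the same key to different bins; all that matters is that each is consistent and stays in range.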
Handling non-numeric key values requires a bit more work. All Java objects have defined on them a hashCode method that we have used to convert Objects into integers (whence we can apply the procedure on integers above). The default implementation of x.equals(y) on Object is x==y, that is, that x and y are references to the same object. Correspondingly, the default implementation of x.hashCode() supplied by Object simply returns an integer value that is derived from the address of the object pointed to by x, that is, by the pointer value x treated as an integer (which is all it really is, behind the scenes). This default implementation is not suitable for cases where we want to consider two different objects to be the same. For example, the two Strings computed by

    String h = "Hello,";
    String s1 = "Hello, world!", s2 = h + " " + "world!";

will have the property that s1.equals(s2), but s1 != s2 (that is, they are two different String objects that happen to contain the same sequence of characters). (The variable h is there to force the concatenation to happen at run time; a concatenation consisting only of literals is a compile-time constant, which Java folds into the very same String object as s1.) Hence, the default hashCode operation is not suitable for String, and therefore the String class overrides the default definition with its own.

For converting to an index into bins, we used the remainder operation. This obviously produces a number in range; what is not so obvious is why we chose the table sizes we did (primes not close to a power of 2). Suffice it to say that other choices of size tend to produce unfortunate results. For example, using a power of 2 means that the high-order bits of X.hashCode() get ignored.
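If keys are not simple integers (strings, for example), a workable strategy is to first mash them into integers and then apply the remaindering method above. Here is a representative string-hashing function that does well empirically, taken from a C compiler by P. J. Weinberger¹. It assumes 8-bit characters and 32-bit ints.

    static int hash(String S)
    {
        int h;
        h = 0;
        for (int p = 0; p < S.length(); p += 1) {
            h = (h << 4) + S.charAt(p);
            // Fold the top four bits back in. The shift is >>> rather than >>
            // so that, as with C's unsigned ints, no sign bits are smeared in.
            h = (h ^ ((h & 0xf0000000) >>> 24)) & 0x0fffffff;
        }
        return h;
    }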
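As a quick check on the string hashing discussed in this section, here is a small self-contained sketch. The class and method names (StringHashDemo, pjwHash, polyHash) are invented for the illustration. pjwHash repeats the function above; polyHash evaluates the base-31 polynomial that the Java library documents for String.hashCode, using Horner's rule in ordinary wrapping int arithmetic.

```java
final class StringHashDemo {
    /** The Weinberger-style hash from the text; the result fits in 28 bits. */
    static int pjwHash(String s) {
        int h = 0;
        for (int p = 0; p < s.length(); p += 1) {
            h = (h << 4) + s.charAt(p);
            h = (h ^ ((h & 0xf0000000) >>> 24)) & 0x0fffffff;
        }
        return h;
    }

    /** Sum over i of c_i * 31^(n-i-1), computed by Horner's rule with
     *  wrapping (modular) int arithmetic. */
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i += 1) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    /** Index of the bin for S in a table of SIZE bins. */
    static int bin(String s, int size) {
        return pjwHash(s) % size;     // pjwHash is never negative
    }
}
```

For any string s, polyHash(s) agrees with s.hashCode(). It may be negative, which is why the hash method of Figure 7.2 masks off the sign bit before taking the remainder.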
The Java String type has a different function for hashCode, which computes

$$\sum_{0 \le i < n} c_i \cdot 31^{\,n-i-1}$$

using modular int arithmetic to get a result in the range $-2^{31}$ to $2^{31}-1$. Here, $c_i$ denotes the i-th character in the String and n is its length.

7.4 Performance

Assuming the keys are evenly distributed, a hash table will do retrieval in constant time, regardless of N, the number of items contained. As indicated in the analysis we did in §4.1 about growing ArrayLists, insertion also has constant amortized cost (i.e., cost averaged over all insertions). Of course, if the keys are not evenly distributed, then we can see Θ(N) cost.

If there is a possibility that one hash function will sometimes have bad clustering problems, a technique known as universal hashing can help. Here, you choose a hash function at random from some carefully chosen set. On average over all runs of your program, your hash function will then perform well.

Exercises

7.1. Give an implementation for the iterator function over the HashMap representation given in §7.1, and the Iterator class it needs. Since we have chosen rather simple linked lists, you will have to use care in getting the remove operation right.

¹ The version here is adapted from Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986, p. 436.
Chapter 8

Sorting and Selecting

At least at one time, most CPU time and I/O bandwidth was spent sorting (these days, I suspect more may be spent rendering MPEG files). As a result, sorting has been the subject of extensive study and writing. We will hardly scratch the surface here.

8.1 Basic concepts

The purpose of any sort is to permute some set of items that we'll call records so that they are sorted according to some ordering relation. In general, the ordering relation looks at only part of each record, the key. The records may be sorted according to more than one key, in which case we refer to the primary key and to secondary keys. This distinction is actually realized in the ordering function: record A comes before B iff either A's primary key comes before B's, or their primary keys are equal and A's secondary key comes before B's. One can extend this definition in an obvious way to hierarchies of multiple keys. For the purposes of this book, I'll usually assume that records are of some type Record and that there is an ordering relation on the records we are sorting. I'll write before(A, B) to mean that the key of A comes before that of B in whatever order we are using.

Although conceptually we move around the records we are sorting so as to put them in order, in fact these records may be rather large. Therefore, it is often preferable to keep around pointers to the records and exchange those instead. If necessary, the real data can be physically re-arranged as a last step. In Java, this is very easy of course, since "large" data items are always referred to by pointers.

Stability. A sort is called stable if it preserves the original order of records that have equal keys. Any sort can be made stable by (in effect) adding the original record position as a final secondary key, so that the list of keys (Bob, Mary, Bob, Robert) becomes something like (Bob.1, Mary.2, Bob.3, Robert.4).
Inversions. For some analyses, we need to have an idea of how out-of-order a given sequence of keys is. One useful measure is the number of inversions in the sequence: in a sequence of keys $k_0, \ldots, k_{N-1}$, this is the number of pairs of integers, (i, j), such that i < j and $k_i > k_j$. For example, there are two inversions in the sequence of words

    Charlie, Alpha, Bravo

and three inversions in

    Charlie, Bravo, Alpha.

When the keys are already in order, the number of inversions is 0, and when they are in reverse order, so that every pair of keys is in the wrong order, the number of inversions is N(N−1)/2, which is the number of pairs of keys. When all keys are originally within some distance D of their correct positions in the sorted permutation, we can establish a pessimistic upper bound of DN inversions in the original permutation.
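The definition translates directly into code. The following is a deliberately naive sketch (quadratic time), with the hypothetical name InversionDemo and with keys compared by compareTo rather than by the before function used elsewhere in this chapter.

```java
final class InversionDemo {
    /** Number of pairs (i, j) with i < j and keys[i] > keys[j]. */
    static <T extends Comparable<T>> int inversions(T[] keys) {
        int count = 0;
        for (int i = 0; i < keys.length; i += 1) {
            for (int j = i + 1; j < keys.length; j += 1) {
                if (keys[i].compareTo(keys[j]) > 0) {
                    count += 1;
                }
            }
        }
        return count;
    }
}
```

Applied to the two word sequences above it yields 2 and 3, and for 16 keys in reverse order it yields 16 · 15 / 2 = 120.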
Internal vs. external sorting. A sort that is carried out entirely in primary memory is known as an internal sort. Those that involve auxiliary disks (or, in the old days especially, tapes) to hold intermediate results are called external sorts. The sources of input and output are irrelevant to this classification (one can have internal sorts on data that comes from an external file; it's just the intermediate files that matter).

8.2 A Little Notation

Many of the algorithms in these notes deal with (or can be thought of as dealing with) arrays. In describing or commenting them, we sometimes need to make assertions about the contents of these arrays. For this purpose, I am going to use a notation used by David Gries to make descriptive comments about my arrays. The notation

        +---------------+
        |       P       |
        +---------------+
         a             b

denotes a section of an array whose elements are indexed from a to b and that satisfies property P. It also asserts that a ≤ b+1; if a > b, then the segment is empty. I can also write

          +-----------+
        c |     P     | d
          +-----------+

to describe an array segment in which items c+1 to d−1 satisfy P, and that c < d. By putting these segments together, I can describe an entire array. For example,

           +---------------+---------------+
        A: |    ordered    |               |
           +---------------+---------------+
            0               i               N

is true if the array A has N elements, elements 0 through i−1 are ordered, and 0 ≤ i ≤ N. A notation such as

        +-----+
        |  P  |
        +-----+
           j

denotes a 1-element array segment whose index is j and whose (single) value satisfies P. Finally, I'll occasionally need to have simultaneous conditions on nested pieces of an array. For example,

        +----------+---------------+
        |    Q     |               |
        +----------+---------------+
        |            P             |
        +--------------------------+
         0          i               N

refers to an array segment in which items 0 to N−1 satisfy P, items 0 to i−1 satisfy Q, 0 ≤ N, and 0 ≤ i ≤ N.
8.3 Insertion sorting

One very simple sort (and quite adequate for small applications, really) is the straight insertion sort. The name comes from the fact that at each stage, we insert an as-yet-unprocessed record into a (sorted) list of the records processed so far, as illustrated in Figure 8.2. The algorithm is shown in Figure 8.1.
A common way to measure the time required to do a sort is to count the comparisons of keys (for Figure 8.1, the calls to before). The total (worst-case) time required by insertionSort is $\sum_{0 < i < N} C_{IL}(i)$, where $C_{IL}(m)$ is the cost of the inner (j) loop when i = m, and N is the size of A. Examination of the inner loop shows that the number of comparisons required is equal to the number of records numbered 0 to i-1 whose keys are larger than that of x, plus one if there is at least one smaller key. Since A[0..i-1] is sorted, it contains no inversions, and therefore, the number of elements after X in the sorted part of A happens to be equal to the number of inversions in the sequence A[0], ..., A[i] (since X is A[i]). When X is inserted correctly, there will be no inversions in the resulting sequence. It is fairly easy to work out from that point that the running time of insertionSort, measured in key comparisons, is bounded by I + N − 1, where I is the total number of inversions in the original argument to insertionSort. Thus, the more sorted an array is to begin with, the faster insertionSort runs.
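The bound I + N − 1 can be observed by instrumenting the sort. The sketch below is an assumption-laden stand-in for Figure 8.1: it sorts plain ints, uses < in place of before, and the names InsertionSortCount, sortCounting, and inversions are invented here.

```java
final class InsertionSortCount {
    /** Sort A in place by straight insertion; return the number of key
     *  comparisons performed. */
    static long sortCounting(int[] A) {
        long comparisons = 0;
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i];
            int j = i;
            while (j > 0) {
                comparisons += 1;            // stands for one call of before(x, A[j-1])
                if (!(x < A[j - 1])) {
                    break;
                }
                A[j] = A[j - 1];
                j -= 1;
            }
            A[j] = x;
        }
        return comparisons;
    }

    /** Number of inversions in A (naive quadratic count). */
    static long inversions(int[] A) {
        long count = 0;
        for (int i = 0; i < A.length; i += 1)
            for (int j = i + 1; j < A.length; j += 1)
                if (A[i] > A[j]) count += 1;
        return count;
    }
}
```

For 16 keys in reverse order there are 120 inversions and the sort makes 120 comparisons; for keys already in order it makes N − 1 comparisons, one per insertion.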
8.4 Shell's sort

The problem with insertion sort can be seen by examining the worst case, where the array is initially in reverse order. The keys are a great distance from their final resting places, and must be moved one slot at a time until they get there. If keys could be moved great distances in little time, it might speed things up a bit.
    /** Permute the elements of A to be in ascending order. */
    static void insertionSort(Record[] A) {
        int N = A.length;
        for (int i = 1; i < N; i += 1) {
            /* A: items 0..i-1 are ordered; 0 <= i <= N. */
            Record x = A[i];
            int j;
            for (j = i; j > 0 && before(x, A[j-1]); j -= 1) {
                /* A: items j+1..i are > x; items 0..i are
                 *    ordered except at j. */
                A[j] = A[j-1];
            }
            /* A: items 0..j-1 are <= x; items j+1..i are > x;
             *    items 0..i are ordered except at j. */
            A[j] = x;
        }
    }

Figure 8.1: Program for performing insertion sort on an array. The before function is assumed to embody the desired ordering relation.
    13 |  9  10   0  22  12   4
     9  13 | 10   0  22  12   4
     9  10  13 |  0  22  12   4
     0   9  10  13 | 22  12   4
     0   9  10  13  22 | 12   4
     0   9  10  12  13  22 |  4
     0   4   9  10  12  13  22

Figure 8.2: Example of insertion sort, showing the array before each call of insertElement. The gap at each point (marked | here) separates the portion of the array known to be sorted from the unprocessed portion.
    /** Permute the elements of KEYS, which must be distinct,
     *  into ascending order. */
    static void distributionSort1(int[] keys) {
        int N = keys.length;
        int L = min(keys), U = max(keys);
        java.util.BitSet b = new java.util.BitSet();
        for (int i = 0; i < N; i += 1)
            b.set(keys[i] - L);
        for (int i = L, k = 0; i <= U; i += 1)
            if (b.get(i - L)) {
                keys[k] = i; k += 1;
            }
    }

Figure 8.3: Sorting distinct keys from a reasonably small and dense set. Here, assume that the functions min and max return the minimum and maximum values in an array. Their values are arbitrary if the arrays are empty.
This is the idea behind Shell's sort¹. We choose a diminishing sequence of strides, $s_0 > s_1 > \ldots > s_{m-1}$, typically choosing $s_{m-1} = 1$. Then, for each j, we divide the N records into the $s_j$ interleaved sequences

$$\begin{array}{l}
R_0,\; R_{s_j},\; R_{2s_j},\; \ldots, \\
R_1,\; R_{s_j+1},\; R_{2s_j+1},\; \ldots \\
\cdots \\
R_{s_j-1},\; R_{2s_j-1},\; \ldots
\end{array}$$

and sort each of these using insertion sort. Figure 8.4 illustrates the process with a vector in reverse order (requiring a total of 49 comparisons as compared with 120 comparisons for straight insertion sort).

A good sequence of $s_j$ turns out to be $s_j = \lfloor 2^{m-j} - 1 \rfloor$, where $m = \lfloor \lg N \rfloor$. With this sequence, it can be shown that the number of comparisons required is $O(N^{1.5})$, which is considerably better than $O(N^2)$. Intuitively, the advantage of such a sequence, in which the successive $s_j$ are relatively prime, is that on each pass, each position of the vector participates in a sort with a new set of other positions. The sorts get "jumbled" and get more of a chance to improve the number of inversions for later passes.

    Stride    Array after the pass                               #I    #C
    (start)   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   120     0
    15         0 14 13 12 11 10  9  8  7  6  5  4  3  2  1 15    91     1
    7          0  7  6  5  4  3  2  1 14 13 12 11 10  9  8 15    42     9
    3          0  1  3  2  4  6  5  7  8 10  9 11 13 12 14 15     4    20
    1          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15     0    19

Figure 8.4: An illustration of Shell's sort, starting with a vector in reverse order. The increments are 15, 7, 3, and 1. The column marked #I gives the number of inversions remaining in the array, and the column marked #C gives the number of key comparisons required to obtain each line from its predecessor. The arcs underneath the arrays indicate which subsequences of elements are processed at each stage.

¹ Also known as "shellsort." Knuth's reference: Donald L. Shell, in the Communications of the ACM 2 (July, 1959), pp. 30–32.
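The #I column of Figure 8.4 can be reproduced with a short program. This is only a sketch under stated assumptions: it sorts ints, each pass is the usual "gapped" insertion sort (which sorts all of the interleaved subsequences for one stride together), the strides are $2^{m-j}-1$ as suggested above, and the names ShellSortDemo, pass, and sort are made up for the example. It does not attempt to reproduce the #C column, since comparison counts depend on details of how each subsequence is sorted.

```java
import java.util.ArrayList;
import java.util.List;

final class ShellSortDemo {
    /** Number of inversions in A (naive quadratic count). */
    static int inversions(int[] A) {
        int count = 0;
        for (int i = 0; i < A.length; i += 1)
            for (int j = i + 1; j < A.length; j += 1)
                if (A[i] > A[j]) count += 1;
        return count;
    }

    /** One pass: insertion-sort each of the S interleaved subsequences
     *  A[r], A[r+S], A[r+2S], ... for 0 <= r < S. */
    static void pass(int[] A, int s) {
        for (int i = s; i < A.length; i += 1) {
            int x = A[i], j;
            for (j = i; j >= s && x < A[j - s]; j -= s) {
                A[j] = A[j - s];
            }
            A[j] = x;
        }
    }

    /** Sort A using strides 2^(m-j) - 1 for j = 0..m-1, where
     *  m = floor(lg N). Returns the number of inversions remaining
     *  after each pass. */
    static List<Integer> sort(int[] A) {
        List<Integer> remaining = new ArrayList<>();
        int m = 31 - Integer.numberOfLeadingZeros(Math.max(A.length, 1));
        for (int j = 0; j < m; j += 1) {
            pass(A, (1 << (m - j)) - 1);
            remaining.add(inversions(A));
        }
        return remaining;
    }
}
```

For the 16-element reversed vector of the figure, the strides come out as 15, 7, 3, 1, and the returned list is [91, 42, 4, 0].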
8.5 Distribution counting

When the range of keys is restricted, there are a number of optimizations possible. In Column #1 of his book Programming Pearls², Jon Bentley gives a simple algorithm for the problem of sorting N distinct keys, all of which are in a range of integers so limited that the programmer can build a vector of bits indexed by those integers. In the program shown in Figure 8.3, I use a Java BitSet, which is abstractly a set of non-negative integers (implemented as a packed array of 1-bit quantities).

Let's consider a more general technique we can apply even when there are multiple records with the same key. Assume that the keys of the records to be sorted are in some reasonably small range of integers. Then the function distributionSort2 shown in Figure 8.6 sorts N records stably, moving them from an input array (A) to a different output array (B). It computes the correct final position in B for each record in A. To do so, it uses the fact that the position of any record in B is supposed to be the number of records that either have smaller keys than it has, or that have the same key, but appear before it in A. Figure 8.5 contains an example of the program in operation.
8.6 Selection sort

In insertion sort, we determine an item's final position piecemeal. Another way to proceed is to place each record in its final position in one move by selecting the smallest (or largest) key at each step. The simplest implementation of this idea is straight selection sorting, as follows.

    static void selectionSort(Record[] A)
    {
        int N = A.length;
        for (int i = 0; i < N-1; i += 1) {
            /* A: items 0..i-1 are ordered, and items i..N-1
             *    are all >= items 0..i-1. */
            int m, j;
            for (j = i+1, m = i; j < N; j += 1)
                if (before(A[j], A[m])) m = j;
            /* Now A[m] is the smallest element in A[i..N-1] */
            swap(A, i, m);
        }
    }

² Addison-Wesley, 1986. By the way, that column is a very nice "consciousness-raising" column on the subject of appropriately-engineered solutions. I highly recommend both this book and his More Programming Pearls, Addison-Wesley, 1988.

Here, swap(A,i,m) is assumed to swap elements i and m of A. This sort is not stable; the swapping of records prevents stability. On the other hand, the program can be
    A:       3/A  2/B  2/C  1/D  4/E  2/F  3/G
    count1:  0  1  3  2  1
    count2:  0  1  4  6  7
    B0:       ·    ·    ·    ·   3/A   ·    ·      count:  0  1  5  6  7
    B1:       ·   2/B   ·    ·   3/A   ·    ·      count:  0  2  5  6  7
    B2:       ·   2/B  2/C   ·   3/A   ·    ·      count:  0  3  5  6  7
    B3:      1/D  2/B  2/C   ·   3/A   ·    ·      count:  1  3  5  6  7
    B4:      1/D  2/B  2/C   ·   3/A   ·   4/E     count:  1  3  5  7  7
    B5:      1/D  2/B  2/C  2/F  3/A   ·   4/E     count:  1  4  5  7  7
    B6:      1/D  2/B  2/C  2/F  3/A  3/G  4/E     count:  1  4  6  7  7

Figure 8.5: Illustration of the distributionSort2 program. The values to be sorted are shown in the array marked A. The keys are the numbers to the left of the slashes. The data are sorted into the array B, shown at various points in the algorithm. The labels at the left refer to points in the program in Figure 8.6. Each point Bk indicates the situation at the end of the last loop where i = k. The role of array count changes. First (at count1) count[k-1] contains the number of instances of key (k-1)-1. Next (at count2), it contains the number of instances of keys less than k-1. In the Bi lines, count[k-1] indicates the position (in B) at which to put the next instance of key k. (It's k-1 in these places, rather than k, because 1 is the smallest key.)
    /** Assuming that A and B are not the same array and are of
     *  the same size, sort the elements of A stably into B. */
    void distributionSort2(Record[] A, Record[] B)
    {
        int N = A.length;
        int L = min(A), U = max(A);

        /* count[i-L] will contain the number of items < i */
        // NOTE: count[U-L+1] is not terribly useful, but is
        // included to avoid having to test for i == U in
        // the first i loop below.
        int[] count = new int[U-L+2];

        // Clear count: Not really needed in Java, but a good habit
        // to get into for other languages (e.g., C, C++).
        for (int j = L; j <= U+1; j += 1)
            count[j-L] = 0;

        for (int i = 0; i < N; i += 1)
            count[key(A[i]) - L + 1] += 1;
        /* Now count[i-L] == # of records whose key is equal to i-1 */
        // See Figure 8.5, point count1.

        for (int j = L+1; j <= U; j += 1)
            count[j-L] += count[j-L-1];
        /* Now count[k-L] == # of records whose key is less than k,
         * for all k, L <= k <= U. */
        // See Figure 8.5, point count2.

        for (int i = 0; i < N; i += 1) {
            /* Now count[k-L] == # of records whose key is less than k,
             * or whose key is k and have already been moved to B. */
            B[count[key(A[i]) - L]] = A[i];
            count[key(A[i]) - L] += 1;
            // See Figure 8.5, points B0–B6.
        }
    }

Figure 8.6: Distribution Sorting. This program assumes that key(R) is an integer.
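To try the algorithm of Figure 8.6 on the data of Figure 8.5, one needs concrete records. The sketch below assumes, purely for illustration, that each record is a String of the form "key/label" and that key extracts the integer before the slash; the class name DistributionSortDemo and this record format are not from the text.

```java
final class DistributionSortDemo {
    /** The integer key of a record written as "key/label" (an assumed format). */
    static int key(String record) {
        return Integer.parseInt(record.substring(0, record.indexOf('/')));
    }

    /** Return a new array containing the records of A, stably sorted by key. */
    static String[] distributionSort(String[] A) {
        int N = A.length;
        String[] B = new String[N];
        if (N == 0) {
            return B;
        }
        int L = key(A[0]), U = L;
        for (String r : A) {
            L = Math.min(L, key(r));
            U = Math.max(U, key(r));
        }
        int[] count = new int[U - L + 2];
        for (String r : A) {
            count[key(r) - L + 1] += 1;      // count[k-L+1] = # of records with key k
        }
        for (int j = 1; j <= U - L; j += 1) {
            count[j] += count[j - 1];        // count[k-L] = # of records with key < k
        }
        for (String r : A) {
            B[count[key(r) - L]] = r;        // next free slot for this key
            count[key(r) - L] += 1;
        }
        return B;
    }
}
```

With the input 3/A 2/B 2/C 1/D 4/E 2/F 3/G, the result is 1/D 2/B 2/C 2/F 3/A 3/G 4/E, matching line B6 of Figure 8.5. Records with equal keys keep their original relative order, which is the stability property claimed for distributionSort2.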
modified to produce its output in a separate output array, and then it is relatively easy to maintain stability [how?].

It should be clear that the algorithm above is insensitive to the data. Unlike insertion sort, it always takes the same number of key comparisons: N(N−1)/2. Thus, in this naive form, although it is very simple, it suffers in comparison to insertion sort (at least on a sequential machine).
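A counting version makes the insensitivity concrete. As before, this is a sketch on plain ints with an invented class name (SelectionSortCount); the comparison A[j] < A[m] stands in for before(A[j], A[m]).

```java
final class SelectionSortCount {
    /** Sort A in place by straight selection; return the number of key
     *  comparisons performed. */
    static long sortCounting(int[] A) {
        long comparisons = 0;
        int N = A.length;
        for (int i = 0; i < N - 1; i += 1) {
            int m = i;
            for (int j = i + 1; j < N; j += 1) {
                comparisons += 1;
                if (A[j] < A[m]) {
                    m = j;
                }
            }
            int t = A[i]; A[i] = A[m]; A[m] = t;   // swap(A, i, m)
        }
        return comparisons;
    }
}
```

For N = 7 the count is 21 = 7 · 6 / 2 whether the input is sorted, reversed, or scrambled.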
On the other hand, we have seen another kind of selection sort before: heapsort (from §6.4) is a form of selection sort that (in effect) keeps around information about the results of comparisons from each previous pass, thus speeding up the minimum selection considerably.

8.7 Exchange sorting: Quicksort

One of the most popular methods for internal sorting was developed by C. A. R. Hoare³. Evidently much taken with the technique, he named it "quicksort." The name is actually quite appropriate. The basic algorithm is as follows.
    static final int K = ...;

    void quickSort(Record A[])
    {
        quickSort(A, 0, A.length-1);
        insertionSort(A);
    }

    /* Permute A[L..U] so that all records are < K away from their */
    /* correct positions in sorted order. Assumes K > 0. */
    void quickSort(Record[] A, int L, int U)
    {
        if (U-L+1 > K) {
            Choose Record T = A[p], where L ≤ p ≤ U;
            P: Set i and permute A[L..U] to establish the partitioning
               condition:

                   +----------------+---+----------------+
                   |  key ≤ key(T)  | T |  key ≥ key(T)  |
                   +----------------+---+----------------+
                    0                 i                    N
               ;
            quickSort(A, L, i-1); quickSort(A, i+1, U);
        }
    }

³ Knuth's reference: Computing Journal 5 (1962), pp. 10–15.

Here, K is a constant value that can be adjusted to tune the speed of the sort. Once the approximate sort gets all records within a distance K-1 of their final locations, the final insertion sort proceeds in O(KN) time. If T can be chosen so that its key is near the median key for the records in A, then we can compute roughly that the time in key comparisons required for performing quicksort on N records is approximated by C(N), defined as follows.

$$C(K) = 0$$
$$C(N) = N - 1 + 2C(\lfloor N/2 \rfloor)$$

This assumes that we can partition an N-element array in N − 1 comparisons, which we'll see to be possible. We can get a sense for the solution by considering the case $N = 2^m K$:

$$\begin{aligned}
C(N) &= 2^m K - 1 + 2C(2^{m-1}K) \\
     &= 2^m K - 1 + 2^m K - 2 + 4C(2^{m-2}K) \\
     &= \underbrace{2^m K + \cdots + 2^m K}_{m} - 1 - 2 - 4 - \cdots - 2^{m-1} + 2^m C(K) \\
     &= m 2^m K - 2^m + 1 \\
     &\in \Theta(m 2^m K) = \Theta(N \lg N)
\end{aligned}$$

(since $\lg(2^m K) = m + \lg K$).

Unfortunately, in the worst case (where the partition T has the largest or smallest key), quicksort is essentially a straight selection sort, with running time Θ(N²). Thus, we must be careful in the choice of the partitioning element. One technique is to choose a random record's key for T. This is certainly likely to avoid the bad cases. A common choice for T is the median of A[L], A[(L+U)/2], and A[U], which is also unlikely to fail.

Partitioning. This leaves the small loose end of how to partition the array at each stage (step P in the program above). There are many ways to do this. Here is one due to Nico Lomuto: not the fastest, but simple.
    P:  swap(A, L, p);
        i = L;
        for (int j = L+1; j <= U; j += 1) {
            /* A[L..U]: A[L] is T; A[L+1..i] < T; A[i+1..j-1] ≥ T */
            if (before(A[j], T)) {
                i += 1;
                swap(A, j, i);
            }
        }
        /* A[L..U]: A[L] is T; A[L+1..i] < T; A[i+1..U] ≥ T */
        swap(A, L, i);
        /* A[L..U]: A[L..i-1] < T; A[i] is T; A[i+1..U] ≥ T */

Some authors go to the trouble of developing non-recursive versions of quicksort, evidently under the impression that they are thereby vastly improving its performance. This view of the cost of recursion is widely held, so I suppose I can't be surprised. However, a quick test using a C version indicated about a 3% improvement from using the iterative version. This is hardly worth obscuring one's code to obtain.
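Putting the pieces of this section together gives something runnable. The following is one possible assembly, not the definitive one: it sorts ints rather than Records, picks the median-of-three pivot mentioned above, uses the Lomuto partitioning step, and fixes K at 4 arbitrarily. The names QuickSortDemo, medianIndex, and partition are invented for the sketch.

```java
final class QuickSortDemo {
    static final int K = 4;     // cutoff; an arbitrary choice for this sketch

    static void sort(int[] A) {
        quickSort(A, 0, A.length - 1);   // leaves every item < K from its place
        insertionSort(A);                // cheap final clean-up
    }

    static void quickSort(int[] A, int L, int U) {
        if (U - L + 1 > K) {
            int p = medianIndex(A, L, (L + U) >>> 1, U);
            int i = partition(A, L, U, p);
            quickSort(A, L, i - 1);
            quickSort(A, i + 1, U);
        }
    }

    /** Lomuto partitioning of A[L..U] around T = A[p]; returns the final
     *  index i of T, with A[L..i-1] < T and A[i+1..U] >= T. */
    static int partition(int[] A, int L, int U, int p) {
        swap(A, L, p);
        int T = A[L], i = L;
        for (int j = L + 1; j <= U; j += 1) {
            if (A[j] < T) {
                i += 1;
                swap(A, j, i);
            }
        }
        swap(A, L, i);
        return i;
    }

    /** Index (a, b, or c) of the median of A[a], A[b], A[c]. */
    static int medianIndex(int[] A, int a, int b, int c) {
        if (A[a] < A[b]) {
            return A[b] < A[c] ? b : (A[a] < A[c] ? c : a);
        } else {
            return A[a] < A[c] ? a : (A[b] < A[c] ? c : b);
        }
    }

    static void insertionSort(int[] A) {
        for (int i = 1; i < A.length; i += 1) {
            int x = A[i], j;
            for (j = i; j > 0 && x < A[j - 1]; j -= 1) {
                A[j] = A[j - 1];
            }
            A[j] = x;
        }
    }

    static void swap(int[] A, int i, int j) {
        int t = A[i]; A[i] = A[j]; A[j] = t;
    }
}
```

Because the partition step makes exactly U − L comparisons against T, it matches the N − 1 comparisons assumed in the recurrence for C(N).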
8.8 Merge sorting

Quicksort was a kind of divide-and-conquer algorithm⁴ that we might call "try to divide-and-conquer," since it is not guaranteed to succeed in dividing the data evenly. An older technique, known as merge sorting, is a form of divide-and-conquer that does guarantee that the data are divided evenly.

At a high level, it goes as follows.

    /** Sort items A[L..U]. */
    static void mergeSort(Record[] A, int L, int U)
    {
        if (L >= U)
            return;
        mergeSort(A, L, (L+U)/2);
        mergeSort(A, (L+U)/2+1, U);
        merge(A, L, (L+U)/2, A, (L+U)/2+1, U, A, L);
    }

The merge program has the following specification:

    /** Assuming V0[L0..U0] and V1[L1..U1] are each sorted in
     *  ascending order by keys, set V2[L2..U2] to the sorted contents
     *  of V0[L0..U0], V1[L1..U1]. (U2 = L2+U0+U1-L0-L1+1). */
    void merge(Record[] V0, int L0, int U0, Record[] V1, int L1, int U1,
               Record[] V2, int L2)

Since V0 and V1 are in ascending order already, it is easy to do this in Θ(N) time, where N = U2 − L2 + 1, the combined size of the two arrays. Merging progresses through the arrays from left to right. That makes it well-suited for computers with small memories and lots to sort. The arrays can be on secondary storage devices that are restricted to sequential access, i.e., that require reading or writing the arrays in increasing (or decreasing) order of index⁵.

⁴ The term divide-and-conquer is used to describe algorithms that divide a problem into some number of smaller problems, and then combine the answers to those into a single result.
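One way to meet the merge specification is sketched below for int arrays. Two points are assumptions of the sketch rather than anything stated above: ties are taken from V0 first (which keeps the sort stable), and mergeSort merges into a temporary array and copies back, since a simple left-to-right merge cannot safely write its output over its own inputs. The class name MergeSortDemo is invented.

```java
final class MergeSortDemo {
    /** Assuming V0[L0..U0] and V1[L1..U1] are each in ascending order,
     *  set V2[L2..] to their merged contents. V2 must not overlap the
     *  parts of V0 and V1 still to be read. */
    static void merge(int[] V0, int L0, int U0, int[] V1, int L1, int U1,
                      int[] V2, int L2) {
        int i = L0, j = L1, k = L2;
        while (i <= U0 && j <= U1) {
            if (V1[j] < V0[i]) {
                V2[k++] = V1[j++];
            } else {
                V2[k++] = V0[i++];       // on ties, take from V0 first
            }
        }
        while (i <= U0) V2[k++] = V0[i++];
        while (j <= U1) V2[k++] = V1[j++];
    }

    /** Sort items A[L..U]. */
    static void mergeSort(int[] A, int L, int U) {
        if (L >= U) {
            return;
        }
        int mid = L + (U - L) / 2;
        mergeSort(A, L, mid);
        mergeSort(A, mid + 1, U);
        int[] tmp = new int[U - L + 1];
        merge(A, L, mid, A, mid + 1, U, tmp, 0);
        System.arraycopy(tmp, 0, A, L, tmp.length);
    }
}
```

Each element is read once and written once per merge, always in increasing index order, which is the sequential-access property noted above.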
The real work is done by the merging process, of course. The pattern of these merges is rather interesting. For simplicity, consider the case where N is a power of two. If you trace the execution of mergeSort, you'll see the following pattern of calls on merge.

    Call #   V0         V1
    0.       A[0]       A[1]
    1.       A[2]       A[3]
    2.       A[0..1]    A[2..3]
    3.       A[4]       A[5]
    4.       A[6]       A[7]
    5.       A[4..5]    A[6..7]
    6.       A[0..3]    A[4..7]
    7.       A[8]       A[9]
    etc.

We can exploit this pattern to good advantage when trying to do merge sorting on linked lists of elements, where the process of dividing the list in half is not as easy as it is for arrays. Assume that records are linked together into Lists. The program below shows how to perform a merge sort on these lists; Figure 8.7 illustrates the process. The program maintains a binomial comb of sorted sublists, comb[0..M-1], such that the list in comb[i] is either null or has length $2^i$.

    /** Permute the Records in List A so as to be sorted by key. */
    static void mergeSort(List<Record> A)
    {
        int M = a number such that 2^M − 1 ≥ length of A;
        // Java does not allow "new List<Record>[M]"; create the array
        // with a raw element type and cast it.
        @SuppressWarnings("unchecked")
        List<Record>[] comb = (List<Record>[]) new List[M];
        for (int i = 0; i < M; i += 1)
            comb[i] = new LinkedList<Record>();
        for (Record R : A)
            addToComb(comb, R);
        A.clear();
        for (List<Record> L : comb)
            mergeLists(A, L);
    }

⁵ A familiar movie cliché of decades past was spinning tape units to indicate that some piece of machinery was a computer (also operators flipping console switches, something one almost never really did during normal operation). When those images came from footage of real computers, the computer was most likely sorting.
At each point, the comb contains sorted lists that are to be merged. We first build up the comb one new item at a time, and then take a final pass through it, merging all its lists. To add one element to the comb, we have

    /** Assuming that each C[i] is a sorted list whose length is either 0
     *  or 2^i elements, adds P to the items in C so as to
     *  maintain this same condition. */
    static void addToComb(List<Record> C[], Record p)
    {
        if (C[0].size() == 0) {
            C[0].add(p);
            return;
        } else if (before(C[0].get(0), p))
            C[0].add(p);
        else
            C[0].add(0, p);      // insert P at the front (index first, then item)
        // Now C[0] contains 2 items
        int i;
        for (i = 1; C[i].size() != 0; i += 1)
            mergeLists(C[i], C[i-1]);
        C[i] = C[i-1]; C[i-1] = new LinkedList<Record>();
    }
I leave to you the mergeLists procedure:

    /** Merge L1 into L0, producing a sorted list containing all the
     *  elements originally in L0 and L1. Assumes that L0 and L1 are
     *  each sorted initially (according to the before ordering).
     *  The result ends up in L0; L1 becomes empty. */
    static void mergeLists(List<Record> L0, List<Record> L1)
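For readers who want to run the comb-based sort before writing their own mergeLists, here is one possible sketch. It is an illustration, not the intended solution to the exercise: records are Integers, the comb is a List of lists (which sidesteps Java's restriction on generic arrays), M is taken to be the number of bits in the list's length, and the names CombMergeSortDemo and buildComb are invented.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class CombMergeSortDemo {
    /** Merge L1 into L0; both must be sorted. L1 becomes empty. */
    static void mergeLists(List<Integer> L0, List<Integer> L1) {
        List<Integer> result = new LinkedList<>();
        while (!L0.isEmpty() && !L1.isEmpty()) {
            if (L1.get(0) < L0.get(0)) {
                result.add(L1.remove(0));
            } else {
                result.add(L0.remove(0));
            }
        }
        result.addAll(L0);
        result.addAll(L1);
        L0.clear();
        L1.clear();
        L0.addAll(result);
    }

    /** Add P to comb C, where each C[i] holds either 0 or 2^i sorted items. */
    static void addToComb(List<List<Integer>> C, Integer p) {
        List<Integer> c0 = C.get(0);
        if (c0.isEmpty()) {
            c0.add(p);
            return;
        } else if (c0.get(0) < p) {
            c0.add(p);
        } else {
            c0.add(0, p);
        }
        // Now C[0] contains 2 items; carry it upward like a binary increment.
        int i;
        for (i = 1; !C.get(i).isEmpty(); i += 1) {
            mergeLists(C.get(i), C.get(i - 1));
        }
        C.set(i, C.get(i - 1));
        C.set(i - 1, new LinkedList<>());
    }

    /** The comb that results from adding the items of A one at a time. */
    static List<List<Integer>> buildComb(List<Integer> A) {
        int M = 32 - Integer.numberOfLeadingZeros(A.size());   // 2^M - 1 >= size
        List<List<Integer>> comb = new ArrayList<>();
        for (int i = 0; i < M; i += 1) {
            comb.add(new LinkedList<>());
        }
        for (Integer r : A) {
            addToComb(comb, r);
        }
        return comb;
    }

    /** Permute the items of A (a modifiable list) into ascending order. */
    static void mergeSort(List<Integer> A) {
        List<List<Integer>> comb = buildComb(A);
        A.clear();
        for (List<Integer> L : comb) {
            mergeLists(A, L);
        }
    }
}
```

For the 11-element list of Figure 8.7, buildComb produces the lists (8), (2, 20), an empty list, and (−1, 0, 3, 5, 6, 9, 10, 15), which is the final comb shown in the figure.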
8.8.1 Complexity

The optimistic time estimate for quicksort applies in the worst case to merge sorting, because merge sorts really do divide the data in half with each step (and merging of two lists or arrays takes linear time). Thus, merge sorting is a Θ(N lg N) algorithm, with N the number of records. Unlike quicksort or insertion sort, merge sorting as I have described it is generally insensitive to the ordering of the data. This changes somewhat when we consider external sorting, but O(N lg N) comparisons remains an upper bound.

8.9 Speed of comparison-based sorting

I've presented a number of algorithms and have claimed that the best of them require Θ(N lg N) comparisons in the worst case. There are several obvious questions to
    Elements     Remaining list L                          Comb (bit, list)
    processed
     0           (9, 15, 5, 3, 0, 6, 10, −1, 2, 20, 8)     0: 0    1: 0    2: 0    3: 0
     1           (15, 5, 3, 0, 6, 10, −1, 2, 20, 8)        0: 1 (9)    1: 0    2: 0    3: 0
     2           (5, 3, 0, 6, 10, −1, 2, 20, 8)            0: 0    1: 1 (9, 15)    2: 0    3: 0
     3           (3, 0, 6, 10, −1, 2, 20, 8)               0: 1 (5)    1: 1 (9, 15)    2: 0    3: 0
     4           (0, 6, 10, −1, 2, 20, 8)                  0: 0    1: 0    2: 1 (3, 5, 9, 15)    3: 0
     6           (10, −1, 2, 20, 8)                        0: 0    1: 1 (0, 6)    2: 1 (3, 5, 9, 15)    3: 0
    11           ()                                        0: 1 (8)    1: 1 (2, 20)    2: 0    3: 1 (−1, 0, 3, 5, 6, 9, 10, 15)

Figure 8.7: Merge sorting of lists, showing the state of the "comb" after various numbers of items from the list L have been processed. The final step is to merge the lists remaining in the comb after all 11 elements from the original list have been added to it. The 0s and 1s in the small boxes are decorations to illustrate the pattern of merges that occurs. Each empty box has a 0 and each non-empty box has a 1. If you read the contents of the four boxes as a single binary number, units bit on top, it equals the number of elements processed.
ask about this bound. First, how do "comparisons" translate into "instructions"? Second, can we do better than N lg N?

The point of the first question is that I have been a bit dishonest to suggest that a comparison is a constant-time operation. For example, when comparing strings, the size of the strings matters in the time required for comparison in the worst case. Of course, on the average, one expects not to have to look too far into a string to determine a difference. Still, this means that to correctly translate comparisons into instructions, we should throw in another factor of the length of the key. Suppose that the N records in our set all have distinct keys. This means that the keys themselves have to be Ω(lg N) long. Assuming keys are no longer than necessary, and assuming that comparison time goes up proportionally to the size of a key (in the worst case), this means that sorting really takes Θ(N (lg N)²) time (assuming that the time required to move one of these records is at worst proportional to the size of the key).

As to the question about whether it is possible to do better than Θ(N lg N), the answer is that if the only information we can obtain about keys is how they compare to each other, then we cannot do better than Θ(N lg N). That is, Θ(N lg N) comparisons is a lower bound on the worst case of all possible sorting algorithms that use comparisons.

The proof of this assertion is instructive. A sorting program can be thought of as first performing a sequence of comparisons, and then deciding how to permute its inputs, based only on the information garnered by the comparisons. The two operations actually get mixed, of course, but we can ignore that fact here. In order for the program to "know" enough to permute two different inputs differently, these inputs must cause different sequences of comparison results. Thus, we can represent this idealized sorting process as a tree in which the leaf nodes are permutations and the internal nodes are comparisons, with each left child containing the comparisons and permutations that are performed when the comparison turns out true and the right child containing those that are performed when the comparison turns out false.

Figure 8.8 illustrates this for the case N = 3. The height of this tree corresponds to the number of comparisons performed. Since the number of possible permutations (and thus leaves) is N!, and the minimal height of a binary tree with M leaves is ⌈lg M⌉, the minimal height of the comparison tree for N records is roughly lg(N!).
Figure 8.8: A comparison tree for N = 3. The three values being sorted are A, B, and C. Each internal node indicates a test. The left children indicate what happens when the test is successful (true), and the right children indicate what happens if it is unsuccessful. The leaf nodes (rectangular) indicate the ordering of the three values that is uniquely determined by the comparison results that lead down to them. We assume here that A, B, and C are distinct. This tree is optimal, demonstrating that three comparisons are needed in the worst case to sort three items.

Now

$$\lg N! = \lg N + \lg(N-1) + \cdots + 1 \le \lg N + \lg N + \cdots + \lg N = N \lg N \in O(N \lg N)$$

and also (taking N to be even)

$$\lg N! \ge \lg N + \lg(N-1) + \cdots + \lg(N/2) \ge (N/2+1)\lg(N/2) \in \Omega(N \lg N)$$

so that

$$\lg N! \in \Theta(N \lg N).$$

Thus any sorting algorithm that uses only (true/false) key comparisons to get information about the order of its input's keys requires Θ(N lg N) comparisons in the worst case to sort N keys.
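The quantity ⌈lg N!⌉ is easy to tabulate exactly. The sketch below (class and method names invented) uses BigInteger to avoid overflow, together with the fact that for an integer f ≥ 1, ⌈lg f⌉ equals the number of bits in f − 1.

```java
import java.math.BigInteger;

final class SortLowerBoundDemo {
    /** The information-theoretic minimum, ceil(lg(n!)), on the worst-case
     *  number of comparisons needed to sort n distinct keys. */
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int k = 2; k <= n; k += 1) {
            factorial = factorial.multiply(BigInteger.valueOf(k));
        }
        // For f >= 1, ceil(lg f) is the bit length of f - 1.
        return factorial.subtract(BigInteger.ONE).bitLength();
    }
}
```

For N = 3 the result is 3, in agreement with the height of the tree in Figure 8.8; for N = 4 it is 5, and for N = 12 it is 29.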
[Figure 8.8, comparison tree; the first branch under each test is taken when the test is true, the second when it is false:

    A<B
        B<C
            (A,B,C)
            A<C
                (A,C,B)
                (C,A,B)
        A<C
            (B,A,C)
            B<C
                (B,C,A)
                (C,B,A)
]

8.10 Radix sorting

To get the result in §8.9, we assumed that the only examination of keys available was comparing them for order. Suppose that we are not restricted to simply comparing keys. Can we improve on our O(N lg N) bounds? Interestingly enough, we can, sort of. This is possible by means of a technique known as radix sort.

Most keys are actually sequences of fixed-size pieces (characters or bytes, in particular) with a lexicographic ordering relation; that is, the key $k_0 k_1 \cdots k_{n-1}$ is less than $k'_0 k'_1 \cdots k'_{n-1}$ if $k_0 < k'_0$ or $k_0 = k'_0$ and $k_1 \cdots k_{n-1}$ is less than $k'_1 \cdots k'_{n-1}$ (we can always treat the keys as having equal length by choosing a suitable padding character for the shorter string). Just as in a search trie we used successive characters in a set of keys to distribute the strings amongst subtrees, we can use successive characters of keys to sort them. There are basically two varieties of algorithm: one that works from least significant to most significant digit (LSD-first) and one that works from most significant to least significant digit (MSD-first). I use "digit" here as a generic term encompassing not only decimal digits, but also alphabetic characters, or whatever is appropriate to the data one is sorting.
8.10.1LSD-firstradixsorting
TheideaoftheLSD-firstalgorithmistofirstusetheleastsignificantcharacterto orderallrecords,thenthesecond-leastsignificant,andso forth.Ateachstage,we performa stable sort,sothatifthe k mostsignificantcharactersoftworecords areidentical,theywillremainsortedbytheremaining,leastsignificant,characters. Becausecharactershavealimitedrangeofvalues,itiseasy tosorttheminlinear time(using,forexample, distributionSort2,or,iftherecordsarekeptinalinked list,bykeepinganarrayoflistheaders,oneforeachpossiblecharactervalue). Figure8.9illustratestheprocess.
LSD-first radix sort is precisely the algorithm used by card sorters. These machines had a series of bins and could be programmed (using plugboards) to drop cards from a feeder into bins depending on what was punched in a particular column. By repeating the process for each column, one ended up with a sorted deck of cards.

Each distribution of a record to a bin takes (about) constant time (assuming we use pointers to avoid moving large amounts of data around). Thus, the total time is proportional to the total amount of key data, which is the total number of bytes in all keys. In other words, radix sorting is O(B) where B is the total number of bytes of key data. If keys are K bytes long, then B = NK, where N is the number of records. Since merge sorting, heap sorting, etc., require O(N lg N) comparisons, each requiring in the worst case K time, we get a total time of O(NK lg N) = O(B lg N) time for these sorts. Even if we assume constant comparison time, if keys are no longer than they have to be (in order to provide N different keys we must have K ≥ log_C N, where C is the number of possible characters), then radix sorting is also O(N lg N).
Thus, relaxing the constraint on what we can do to keys yields a fast sorting procedure, at least in principle. As usual, the Devil is in the details. If the keys are considerably longer than log_C N, as they very often are, the passes made on the last characters will typically be largely wasted. One possible improvement, which Knuth credits to M. D. Maclaren, is to use LSD-first radix sort on the first two characters, and then finish with an insertion sort (on the theory that things will almost be in order after the radix sort). We must fudge the definition of “character” for this purpose, allowing characters to grow slightly with N. For example, when N = 100000, Maclaren's optimal procedure is to sort on the first and second 10-bit segments of the key (on an 8-bit machine, these 20 bits are the first 2.5 characters). Of course, this technique can, in principle, make no guarantees of O(B) performance.
Initial: set, cat, cad, con, bat, can, be, let, bet

  First pass (third character):
    ‘⊔’: be
    ‘d’: cad
    ‘n’: con, can
    ‘t’: set, cat, bat, let, bet
  After first pass: be, cad, con, can, set, cat, bat, let, bet

  Second pass (second character):
    ‘a’: cad, can, cat, bat
    ‘e’: be, set, let, bet
    ‘o’: con
  After second pass: cad, can, cat, bat, be, set, let, bet, con

  Final pass (first character):
    ‘b’: bat, be, bet
    ‘c’: cad, can, cat, con
    ‘l’: let
    ‘s’: set
  After final pass: bat, be, bet, cad, can, cat, con, let, set

Figure 8.9: An example of an LSD-first radix sort. Each pass sorts by one character, starting with the last. Sorting consists of distributing the records to bins indexed by characters, and then concatenating the bins' contents together. Only non-empty bins are shown.

8.10.2 MSD-first radix sorting

Performing radix sort starting at the most significant digit probably seems more natural to most of us. We sort the input by the first (most-significant) character into C (or fewer) subsequences, one for each starting character (that is, the first character of all the keys in any given subsequence is the same). Next, we sort each of the subsequences that has more than one key individually by its second character, yielding another group of subsequences in which all keys in any given subsequence agree in their first two characters. This process continues until all subsequences are of length 1. At each stage, we order the subsequences, so that one subsequence precedes another if all its strings precede all those in the other. When we are done, we simply write out all the subsequences in the proper order.

The tricky part is keeping track of all the subsequences so that they can be output in the proper order at the end and so that we can quickly find the next subsequence of length greater than one. Here is a sketch of one technique for sorting an array; it is illustrated in Figure 8.10.

    static final int ALPHA = size of alphabet of digits;

    /** Sort A[L..U] stably, ignoring the first k characters in each key. */
    static void MSDradixSort(Record[] A, int L, int U, int k) {
        int[] countLess = new int[ALPHA+1];

        Sort A[L..U] stably by the kth character of each key, and for each
        digit, c, set countLess[c] to the number of records in A[L..U]
        whose kth character comes before c in alphabetical order
        (so that countLess[ALPHA] is the total number of records).

        for (int i = 0; i < ALPHA; i += 1)
            if (countLess[i+1] - countLess[i] > 1)
                MSDradixSort(A, L + countLess[i],
                             L + countLess[i+1] - 1, k+1);
    }

    A                                                       posn
    ⋆ set, cat, cad, con, bat, can, be, let, bet            0
    ⋆ bat, be, bet / cat, cad, con, can / let / set         1
    bat / ⋆ be, bet / cat, cad, con, can / let / set        2
    bat / be / bet / ⋆ cat, cad, con, can / let / set       1
    bat / be / bet / ⋆ cat, cad, can / con / let / set      2
    bat / be / bet / cad / can / cat / con / let / set

Figure 8.10: An example of an MSD radix sort on the same data as in Figure 8.9. The first line shows the initial contents of A and the last shows the final contents. Partially-sorted segments that agree in their initial characters are separated by single slash (/) characters. The ⋆ character indicates the segment of the array that is about to be sorted and the posn column shows which character position is about to be used for the sort.
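One way to fill in the sketch is shown below. It is a hedged illustration rather than the text's own code: it sorts Strings instead of Records, assumes all characters are below 256, pads short keys with character 0, uses a counting pass with a temporary array for the stable sort, and adds a maxLen cutoff (not in the sketch above) so that groups of identical keys stop recursing. All names are made up for the example.

```java
import java.util.Arrays;

public class MsdRadixSortSketch {
    static final int ALPHA = 256;   // assumed alphabet size

    /** Character K of KEY, or 0 (the assumed pad) if K is off the end. */
    static int digit(String key, int k) {
        return k < key.length() ? key.charAt(k) : 0;
    }

    /** Sort all of A, MSD-first. */
    public static void sort(String[] A) {
        int maxLen = 0;
        for (String s : A) {
            maxLen = Math.max(maxLen, s.length());
        }
        msdSort(A, 0, A.length - 1, 0, maxLen);
    }

    /** Sort A[L..U] stably, ignoring the first K characters of each key. */
    static void msdSort(String[] A, int L, int U, int k, int maxLen) {
        if (U <= L || k >= maxLen) {
            return;
        }
        int[] countLess = new int[ALPHA + 1];
        for (int i = L; i <= U; i += 1) {
            countLess[digit(A[i], k) + 1] += 1;
        }
        for (int c = 0; c < ALPHA; c += 1) {
            countLess[c + 1] += countLess[c];
        }
        // Now countLess[c] is the number of keys in A[L..U] whose kth
        // character is less than c, and countLess[ALPHA] is U - L + 1.
        String[] sorted = new String[U - L + 1];
        int[] next = Arrays.copyOf(countLess, ALPHA);
        for (int i = L; i <= U; i += 1) {
            int d = digit(A[i], k);
            sorted[next[d]] = A[i];
            next[d] += 1;
        }
        System.arraycopy(sorted, 0, A, L, sorted.length);
        for (int c = 0; c < ALPHA; c += 1) {
            if (countLess[c + 1] - countLess[c] > 1) {
                msdSort(A, L + countLess[c], L + countLess[c + 1] - 1,
                        k + 1, maxLen);
            }
        }
    }
}
```

The temporary array costs extra space on every call; an in-place distribution would avoid that at the price of more intricate code.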
8.11 Using the library

Notwithstanding all the trouble we've taken in this chapter to look at sorting algorithms, in most programs you shouldn't even think about writing your own sorting subprogram! Good libraries provide them for you. The Java standard library has a class called java.util.Collections, which contains only static definitions of useful utilities related to Collections. For sorting, we have

    /** Sort L stably into ascending order, as defined by C. L must
     *  be modifiable, but need not be expandable. */
    public static <T> void sort(List<T> L, Comparator<? super T> c) { ··· }

    /** Sort L into ascending order, as defined by the natural ordering
     *  of the elements. L must be modifiable, but need not be expandable. */
    public static <T extends Comparable<T>> void sort(List<T> L) { ··· }
These two methods use a form of mergesort, guaranteeing O(N lg N) worst-case performance. Given these definitions, you should not generally need to write your own sorting routine unless the sequence to be sorted is extremely large (in particular, if it requires external sorting), if the items to be sorted have primitive types (like int), or you have an application where it is necessary to squeeze every single microsecond out of the algorithm (a rare occurrence).
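A brief usage sketch of both methods follows. The class LibrarySortDemo and its helper methods are illustrative; only Collections.sort and Comparator.comparingInt come from the Java library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LibrarySortDemo {
    /** A sorted copy of WORDS, using the natural ordering of Strings. */
    public static List<String> byNaturalOrder(List<String> words) {
        List<String> result = new ArrayList<>(words);
        Collections.sort(result);
        return result;
    }

    /** A copy of WORDS sorted by length; equal-length words keep their
     *  original relative order because the sort is stable. */
    public static List<String> byLength(List<String> words) {
        List<String> result = new ArrayList<>(words);
        Collections.sort(result, Comparator.comparingInt(String::length));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("set", "cat", "be", "axolotl", "a");
        System.out.println(byNaturalOrder(words));
        System.out.println(byLength(words));
    }
}
```

Copying into an ArrayList first keeps the caller's list untouched and guarantees that the list being sorted is modifiable.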
8.12 Selection

Consider the problem of finding the median value in an array: a value in the array with as many array elements less than it as greater than it. A brute-force method of finding such an element is to sort the array and choose the middle element (or a middle element, if the array has an even number of elements). However, we can do substantially better.

The general problem is selection: given a (generally unsorted) sequence of elements and a number k, find the kth value in the sorted sequence of elements. Finding a median, maximum, or minimum value is a special case of this general problem. Perhaps the easiest efficient method is the following simple adaptation of Hoare's quicksort algorithm.
    /** Assuming 0<=k<N, return a record of A whose key is kth smallest
     *  (k=0 gives the smallest, k=1, the next smallest, etc.). A may
     *  be permuted by the algorithm. */
    Record select(Record[] A, int L, int U, int k) {
        Record T = some member of A[L..U];
        Permute A[L..U] and find p to establish the partitioning condition:

            A[L..p-1]: key ≤ key(T);   A[p] = T;   A[p+1..U]: key ≥ key(T);

        if (p-L == k)
            return T;
        if (p-L < k)
            return select(A, p+1, U, k - p + L - 1);
        else
            return select(A, L, p-1, k);
    }
The key observation here is that when the array is partitioned as for quicksort, the value T is the (p − L)st smallest element (counting from 0); the p − L smallest record keys will be in A[L..p-1]; and the larger record keys will be in A[p+1..U]. Hence, if k < p − L, the kth smallest key is in the left part of A, and if k > p − L, it must be the (k − p + L − 1)st smallest key in the right part.
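A runnable version of this idea on an array of ints might look like the following. The choices here are assumptions for illustration, not the text's: the pivot is picked at random, and the partitioning step is a simple one-pass scheme that puts strictly smaller keys to the left of the pivot. The class name SelectSketch is made up.

```java
import java.util.Random;

public class SelectSketch {
    private static final Random RANDOM = new Random();

    private static void swap(int[] A, int i, int j) {
        int t = A[i];
        A[i] = A[j];
        A[j] = t;
    }

    /** The kth smallest value in A[L..U], where 0 <= k <= U - L
     *  (k = 0 gives the smallest). May permute A[L..U]. */
    public static int select(int[] A, int L, int U, int k) {
        // Pick some member of A[L..U] as T and park it at A[L].
        swap(A, L, L + RANDOM.nextInt(U - L + 1));
        int T = A[L];
        int p = L;
        for (int i = L + 1; i <= U; i += 1) {
            if (A[i] < T) {
                p += 1;
                swap(A, p, i);
            }
        }
        swap(A, L, p);
        // Now A[L..p-1] < T, A[p] == T, and A[p+1..U] >= T.
        if (p - L == k) {
            return T;
        }
        if (p - L < k) {
            return select(A, p + 1, U, k - p + L - 1);
        } else {
            return select(A, L, p - 1, k);
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        System.out.println(select(data, 0, data.length - 1, data.length / 2));
    }
}
```

Whatever pivots are drawn, the result is the same; only the running time varies.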
Optimistically, assuming that each partition divides the array in half, the recurrence governing cost here (measured in number of comparisons) is

\[
\begin{aligned}
C(1) &= 0 \\
C(N) &= N + C(\lceil N/2 \rceil)
\end{aligned}
\]

where N = U − L + 1. The N comes from the cost of partitioning, and the C(⌈N/2⌉) from the recursive call. This differs from the quicksort and mergesort recurrences by the fact that the multiplier of C(···) is 1 rather than 2. For N = 2^m we get

\[
\begin{aligned}
C(N) &= 2^m + C(2^{m-1}) \\
     &= 2^m + 2^{m-1} + C(2^{m-2}) \\
     &= 2^m + 2^{m-1} + \cdots + 2 + C(1) \\
     &= 2^{m+1} - 2 = 2N - 2 \in \Theta(N)
\end{aligned}
\]

This algorithm is only probabilistically good, just as was quicksort. There are selection algorithms that guarantee linear bounds, but we'll leave them to a course on algorithms.

Exercises

8.1. You are given two sets of keys (i.e., so that neither contains duplicate keys), S0 and S1, both represented as arrays. Assuming that you can compare keys for “greater than or equal to,” how would you compute the intersection of S0 and S1, and how long would it take?

8.2. Given a large list of words, how would you quickly find all anagrams in the list? (An anagram here is a word in the list that can be formed from another word on the list by rearranging its letters, as in “dearth” and “thread”.)
8.3. Suppose that we have an array, D, of N records. Without modifying this array, I would like to compute an N-element array, P, containing a permutation of the integers 0 to N − 1 such that the sequence D[P[0]], D[P[1]], ..., D[P[N − 1]] is sorted stably. Give a general method that works with any sorting algorithm (stable or not) and doesn't require any additional storage (other than that normally used by the sorting algorithm).

8.4. A very simple spelling checker simply removes all ending punctuation from its words and looks up each in a dictionary. Compare ways of doing this from the classes in the Java library: using an ArrayList to store the words in sorted order, a TreeSet, and a HashSet. There's little programming involved, aside from learning to use the Java library.

8.5. I am given a list of ranges of numbers, [x_i, x′_i], each with 0 ≤ x_i < x′_i ≤ 1. I want to know all the ranges of values between 0 and 1 that are not covered by one of these ranges of numbers. So, if the only input is [0.25, 0.5], then the output would be [0.0, 0.25] and [0.5, 1.0] (never mind the end points). Show how to do this quickly.
Chapter 9

Balanced Searching
We've seen that binary search trees have a weakness: a tendency to become unbalanced, so that they are ineffective in dividing the set of data they represent into two substantially smaller parts. Let's consider what we can do about this.

Of course, we could always rebalance an unbalanced tree by simply laying all the keys out in order and then re-inserting them in such a way as to keep the tree balanced. That operation, however, requires time linear in the number of keys in the tree, and it is difficult to see how to avoid having a Θ(N²) factor creep into the time required to insert N keys. By contrast, only O(N lg N) time is required to make N insertions if the data happen to be presented in an order that keeps the tree bushy. So let's first look at operations to re-balance a tree (or keep it balanced) without taking it apart and reconstructing it.
9.1 Balance by Construction: B-Trees

Another way to keep a search tree balanced is to be careful always to “insert new keys in a good place” so that the tree remains bushy by construction. The database community has long used a data structure that does exactly this: the B-tree [1]. We will describe the data structure and operations abstractly here, rather than give code, since in practice there is a whole raft of devices one uses to gain speed.

A B-tree of order m is a positional tree with the following properties:
1. All nodes have m or fewer children.

2. All nodes other than the root have at least m/2 children (we can also say that each node other than the root contains at least ⌈m/2⌉ children [2]).

3. A node with children C_0, C_1, ..., C_{n−1} is labeled with keys K_1, ..., K_{n−1} (think of key K_i as resting “between” C_{i−1} and C_i), with K_1 < K_2 < ··· < K_{n−1}.

4. A B-tree is a search tree: For any node, all keys in the subtree rooted at C_i are strictly less than K_{i+1}, and (for i > 0), strictly greater than K_i.

5. All the empty children occur at the same level of the tree.

[1] D. Knuth's reference: R. Bayer and E. McCreight, Acta Informatica (1972), 173–189, and also unpublished independent work by M. Kaufman.

[2] The notation ⌈x⌉ means “the smallest integer that is ≥ x.”
Figure 9.1 contains an example of an order-4 tree. In real implementations, B-trees tend to be kept on secondary storage (disks and the like), with their nodes being read in as needed. We choose m so as to make the transfer of data from secondary storage as fast as possible. Disks in particular tend to have minimum transfer times for each read operation, so that for a wide range of values of m, there is little difference in the time required to read in a node. Making m too small in that case is an obviously bad idea.
We'll represent the nodes of a B-tree with a structure we'll call a BTreeNode, for which we'll use the following terminology:

    B.child(i)   Child number i of B-tree node B, where 0 ≤ i < m.
    B.key(i)     Key number i of B-tree node B, where 1 ≤ i < m.
    B.parent()   The parent node of B.
    B.index()    The integer, i, such that B == B.parent().child(i).
    B.arity()    The number of children in B.

An entire B-tree, then, would consist of a pointer to the root, with perhaps some extra useful information, such as the current size of the B-tree.

Because of properties (2) and (5), a B-tree containing N keys must have O(log_{m/2} N) levels. Because of property (1), searching a single node's keys takes O(1) time (we assume m is fixed). Therefore, searching a B-tree by the following obvious recursive algorithm is an O(log_m N) = O(lg N) operation:
    boolean search(BTreeNode B, Key X) {
        if (B is the empty tree)
            return false;
        else {
            Find largest c such that B.key(i) ≤ X, for all 1 ≤ i ≤ c.
            if (c > 0 && X.equals(B.key(c)))
                return true;
            else
                return search(B.child(c), X);
        }
    }
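To make the search concrete, here is a small self-contained sketch with integer keys. It is an assumption-laden simplification, not the BTreeNode interface above: a node holds plain arrays, keys[j] plays the role of B.key(j + 1), and null stands for the empty tree. The sample tree is our reading of the order-4 tree in Figure 9.1.

```java
public class BTreeSearchSketch {
    static final class Node {
        final int[] keys;        // keys[j] plays the role of B.key(j + 1)
        final Node[] children;   // children[j] is B.child(j); null = empty
        Node(int[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    /** A bottom-level node with the given keys and all-empty children. */
    static Node leaf(int... keys) {
        return new Node(keys, new Node[keys.length + 1]);
    }

    static Node inner(int[] keys, Node... children) {
        return new Node(keys, children);
    }

    /** True iff X is a key in the B-tree rooted at B. */
    public static boolean search(Node B, int X) {
        if (B == null) {
            return false;
        }
        int c = 0;   // number of keys in B that are <= X
        while (c < B.keys.length && B.keys[c] <= X) {
            c += 1;
        }
        if (c > 0 && B.keys[c - 1] == X) {
            return true;
        }
        return search(B.children[c], X);
    }

    /** The tree of Figure 9.1, as we read it. */
    public static Node sampleTree() {
        return inner(new int[] {115},
                     inner(new int[] {25, 55, 90},
                           leaf(10, 20), leaf(30, 40, 50), leaf(60), leaf(95, 100)),
                     inner(new int[] {125},
                           leaf(120), leaf(130, 140, 150)));
    }

    public static void main(String[] args) {
        System.out.println(search(sampleTree(), 95));
        System.out.println(search(sampleTree(), 35));
    }
}
```

The linear scan within a node is O(1) for fixed m; with large m, a binary search over the node's keys would be the natural refinement.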
Figure 9.1: Example of a B-tree of order 4 with integer keys. Circles represent empty nodes, which appear all at the same level. Each node has two to four children, and one to three keys. Each key is greater than all keys in the children to its left and less than all keys in the children to its right.
9.1.1 B-tree Insertion

Initially, we insert into the bottom of a B-tree, just as for binary search trees. However, we avoid “scrawny” trees by filling our nodes up and splitting them, rather than extending them down. The idea is simple: we find an appropriate place at the bottom of the tree to insert a given key, and perform the insertion (also adding an additional empty child). If this makes the node too big (so that it has m keys and m + 1 (empty) children), we split the node, as in the code in Figure 9.2. Figure 9.3 illustrates the process.

9.1.2 B-tree deletion

Deleting from a B-tree is generally more complicated than insertion, but not too bad. As usual, real, production implementations introduce numerous intricacies for speed. To keep things simple, I'll just describe a straightforward, idealized method. Taking our cue from the way that insertion works, we will first move the key to be deleted down to the bottom of the tree (where deletion is straightforward). Then, if deletion has made the original node too small, we merge it with a sibling, pulling down the key that used to separate the two from the parent. The pseudocode in Figure 9.4 describes the process, which is also illustrated in Figure 9.5.
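[Figure 9.1, tree contents: the root holds key 115. Its two children hold (25 55 90) and (125). The bottom-level nodes under (25 55 90) hold (10 20), (30 40 50), (60), and (95 100); those under (125) hold (120) and (130 140 150).]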
    /** Split B-tree node B, which has from m+1 to 2m+1
     *  children. */
    void split(BTreeNode B) {
        int k = B.arity() / 2;
        Key X = B.key(k);
        BTreeNode B2 = a new BTree node;
        move B.child(k) through B.child(m) and
             B.key(k+1) through B.key(m) out of B and into B2;
        remove B.key(k) from B;
        if (B was the root) {
            create a new root with children B and B2 and with key X;
        } else {
            BTreeNode P = B.parent();
            int c = B.index();
            insert child B2 at position c+1 in P, and key X
                at position c+1 in P, moving subsequent
                children and keys of P over as needed;
            if (P.arity() > m)
                split(P);
        }
    }

Figure 9.2: Splitting a B-tree node. Figure 9.3 contains illustrations.
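    (a) Insert 15: the node (10 20) becomes (10 15 20).
    (b) Insert 145: the nodes (120) and (130 140 150) under parent (125)
        become (120), (130), and (145 150) under parent (125 140).
    (c) Insert 35: the nodes (10 15 20), (30 40 50), (60), and (95 100)
        under parent (25 55 90) and root (115) become (10 15 20) and (30)
        under (25), and (40 50), (60), and (95 100) under (55 90), with
        root (35 115).

Figure 9.3: Inserting into a B-tree. The examples modify the tree in Figure 9.1 by inserting 15, 145, and then 35.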
    /** Delete B.key(i) from the BTree containing B. */
    void deleteKey(BTreeNode B, int i) {
        if (B's children are all empty)
            remove B.key(i), moving over remaining keys;
        else {
            int n = B.child(i-1).arity();
            merge(B, i);
            // The key we want to delete is now #n of child #i-1.
            deleteKey(B.child(i-1), n);
        }
        if (B.arity() > m)   // Happens only on recursive calls
            split(B);
        regroup(B);
    }

    /** Move B.key(i) and the contents of B.child(i) into B.child(i-1),
     *  after its existing keys and children. Remove B.key(i) and
     *  B.child(i) from B, moving over the remaining contents.
     *  (Temporarily makes B.child(i-1) too large. The former
     *  B.child(i) becomes garbage). */
    void merge(BTreeNode B, int i) {
        implementation straightforward
    }

    /** If B has too few children, regroup the B-tree to re-establish
     *  the B-tree conditions. */
    void regroup(BTreeNode B) {
        if (B is the root && B.arity() == 1)
            make B.child(0) the new root;
        else if (B is not the root && B.arity() < m/2) {
            if (B.index() == 0)
                merge(B.parent(), 1);
            else
                merge(B.parent(), B.index());
            regroup(B.parent());
        }
    }

Figure 9.4: Deleting from a B-tree node. See Figure 9.5 for an illustration.
    (a) Steps in removing 25: the nodes (10 15 20) and (30) under parent (25)
        are merged into (10 15 20 25 30); removing 25 and splitting at 15
        gives (10) and (20 30) under parent (15).
    (b) Removing 10: the nodes (10), (20 30), (40 50), (60), and (95 100)
        under parents (15) and (55 90) and root (35 115) become (15 20 30),
        (40 50), (60), and (95 100) under parent (35 55 90) and root (115).

Figure 9.5: Deletion from a B-tree. The examples start from the final tree in Figure 9.3c. In (a), we remove 25. First, merge to move 25 to the bottom. Then remove it and split the resulting node (which is too big), at 15. Next, (b) shows deletion of 10 from the tree produced by (a). Deleting 10 from its node at the bottom makes that node too small, so we merge it, moving 15 down from the parent. That in turn makes the parent too small, so we merge it, moving 35 down from the root, giving the final tree.
9.1.3 Red-Black Trees: Binary Search Trees as (2,4) Trees

General B-trees use nodes that are, in effect, ordered arrays of keys. Order 4 B-trees (also known as (2,4) trees) lend themselves to an alternative representation, known as red-black trees. Every (2,4) tree maps onto a particular binary search tree in such a way that each (2,4) node corresponds to a small cluster of 1–3 binary search-tree nodes. As a result, the binary search tree is roughly balanced, in the sense that all paths from root to leaves have lengths that differ by at most a factor of 2. Figure 9.6 shows the representations of each possible (2,4) node. In a full tree, we can indicate the boundaries of the clusters with a boolean quantity that is set only in the root node of each cluster. Traditionally, however, we describe nodes in which this boolean is true (and also the null leaf nodes) as “black” and the other nodes as “red.”
By considering Figure 9.6 and the structure of (2,4) trees, you can derive that red-black trees are binary search trees that additionally obey the following constraints (which in standard treatments of red-black trees serve as their definition):

A. The root node and all (null) leaves are black.

B. Every child of a red node is black.

C. Any path from the root to a leaf traverses the same number of black nodes.

Again, properties B and C together imply that red-black trees are “bushy.”

Search, of course, is ordinary binary-tree search. Because of the mapping between (2,4) trees and red-black trees shown in the figure, the algorithms for insertion and deletion are derivable from those for order-4 B-trees. The usual procedures for manipulating red-black trees don't use this correspondence directly, but formulate the necessary manipulations as ordinary binary search-tree operations followed by rebalancing rotations (see §9.3) and recolorings of nodes that are guided by the colors of nodes and their neighbors. We won't go into details here.
9.2 Tries

Loosely speaking, balanced (maximally bushy) binary search trees containing N keys require Θ(lg N) time to find a key. This is not entirely accurate, of course, because it neglects the possibility that the time required to compare against a key depends on the key. For example, the time required to compare two strings depends on the length of the shorter string. Therefore, in all the places I've said “Θ(lg N)” before, I really meant “Θ(L lg N)” for L a bound on the number of bytes in the key. In most applications, this doesn't much matter, since L tends to increase very slowly, if at all, as N increases. Nevertheless, we come to an interesting question: we evidently can't get rid of the factor of L too easily (after all, you have to look at the key you're searching for), but can we get rid of the factor of lg N?
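    (2,4) node                       Corresponding binary search tree(s)
    keys k1; children A, B           black k1 with children A and B
    keys k1 k2; children A, B, C     black k1 with left child A and red right child k2
                                     (children B, C), or black k2 with red left child
                                     k1 (children A, B) and right child C
    keys k1 k2 k3; children          black k2 with red children k1 (children A, B)
      A, B, C, D                     and k3 (children C, D)

Figure 9.6: Representations of (2,4) nodes with “red” and “black” binary tree nodes. On the left are the three possible cases for a single (2,4) node (one to three keys, or two to four children). On the right are the corresponding binary search trees. In each case, the top binary node is colored black and the others are red.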
9.2.1 Tries: basic properties and algorithms

It turns out that we can avoid the lg N factor, using a data structure known as a trie [3]. A pure trie is a kind of tree that represents a set of strings from some alphabet of fixed size, say A = {a_0, ..., a_{M−1}}. One of the characters is a special delimiter that appears only at the ends of words, ‘✷’. For example, A might be the set of printable ASCII characters, with ✷ represented by an unprintable character, such as '\000' (NUL). A trie, T, may be abstractly defined by the following recursive definition [4]: A trie, T, is either
• empty, or

• a leaf node containing a string, or

• an internal node containing M children that are also tries. The edges leading to these children are labeled by the characters of the alphabet, a_i, like this: C_{a_0}, C_{a_1}, ..., C_{a_{M−1}}.

We can think of a trie as a tree whose leaf nodes are strings. We impose one other condition:

• If by starting at the root of a trie and following edges labeled s_0, s_1, ..., s_{h−1}, we reach a string, then that string begins s_0 s_1 ··· s_{h−1}.

Therefore, you can think of every internal node of a trie as standing for some prefix of all the strings in the leaves below it: specifically, an internal node at level k stands for the first k characters of each string below it.

A string S = s_0 s_1 ··· s_{m−1} is in T if by starting at the root of T and following 0 or more edges labeled s_0 ··· s_j, we arrive at the string S. We will pretend that all strings in T end in ✷, which appears only as the last character of a string.
[3] How is it pronounced? I have no idea. The word was suggested by E. Fredkin in 1960, who derived it from the word “retrieval.” Despite this etymology, I usually pronounce it like “try” to avoid verbal confusion with “tree.”

[4] This version of the trie data structure is described in D. E. Knuth, The Art of Computer Programming, vol. 3, which is the standard reference on sorting and searching. The original data structure, proposed in 1959 by de la Briandais, was slightly different.
    (root)
      a: [a]
        ✷: a✷
        b: [ab]
          a: [aba]
            s: [abas]
              e: abase✷
              h: abash✷
            t: abate✷
          b: abbas✷
        x: [ax]
          e: axe✷
          o: axolotl✷
      f: [f]
        a: [fa]
          b: fabric✷
          c: facet✷

Figure 9.7: A trie containing the set of strings {a, abase, abash, abate, abbas, axe, axolotl, fabric, facet}. The internal nodes are labeled to show the string prefixes to which they correspond. (Each line above gives an edge label followed by the node it leads to; bracketed entries are internal nodes.)

    (root)
      a: [a]
        ✷: a✷
        b: [ab]
          a: [aba]
            s: [abas]
              e: abase✷
              h: abash✷
            t: abate✷
          b: abbas✷
        x: [ax]
          e: axe✷
          o: axolotl✷
      b: bat✷
      f: [f]
        a: [fa]
          b: fabric✷
          c: [fac]
            e: [face]
              p: faceplate✷
              t: facet✷

Figure 9.8: Result of inserting the strings “bat” and “faceplate” into the trie in Figure 9.7.
Figure 9.7 shows a trie that represents a small set of strings. To see if a string is in the set, we start at the root of the trie and follow the edges (links to children) marked with the successive characters in the string we are looking for (including the imaginary ✷ at the end). If we succeed in finding a string somewhere along this path and it equals the string we are searching for, then the string we are searching for is in the trie. If we don't, it is not in the trie. For each word, we need internal nodes only as far down as there are multiple words stored that start with the characters traversed to that point. The convention of ending everything with a special character allows us to distinguish between a situation in which the trie contains two words, one of which is a prefix of the other (like “a” and “abate”), from the situation where the trie contains only one long word.
From a trie user's point of view, it looks like a kind of tree with String labels:

    public abstract class Trie {
        /** The empty Trie. */
        public static final Trie EMPTY = new EmptyTrie();

        /** The label at this node. Defined only on leaves. */
        abstract public String label();

        /** True if X is in this Trie. */
        public boolean isIn(String x) ...

        /** The result of inserting X into this Trie, if it is not
         *  already there, and returning this. This trie is
         *  unchanged if X is in it already. */
        public Trie insert(String x) ...

        /** The result of removing X from this Trie, if it is present.
         *  The trie is unchanged if X is not present. */
        public Trie remove(String x) ...

        /** True if this Trie is a leaf (containing a single String). */
        abstract public boolean isLeaf();

        /** True if this Trie is empty. */
        abstract public boolean isEmpty();

        /** The child numbered with character K. Requires that this node
         *  not be empty. Child 0 corresponds to ✷. */
        abstract public Trie child(int k);

        /** Set the child numbered with character K to C. Requires that
         *  this node not be empty. (Intended only for internal use.) */
        abstract protected void child(int k, Trie C);
    }
The following algorithm describes a search through a trie.

    /** True if X is in this Trie. */
    public boolean isIn(String x) {
        Trie P = longestPrefix(x, 0);
        return P.isLeaf() && x.equals(P.label());
    }

    /** The node representing the longest prefix of X.substring(K) that
     *  matches a String in this trie. */
    private Trie longestPrefix(String x, int k) {
        if (isEmpty() || isLeaf())
            return this;
        int c = nth(x, k);
        if (child(c).isEmpty())
            return this;
        else
            return child(c).longestPrefix(x, k+1);
    }

    /** Character K of X, or ✷ if K is off the end of X. */
    static char nth(String x, int k) {
        if (k >= x.length())
            return (char) 0;
        else
            return x.charAt(k);
    }
It should be clear from following this procedure that the time required to find a key is proportional to the length of the key. In fact, the number of levels of the trie that need to be traversed can be considerably less than the length of the key, especially when there are few keys stored. However, if a string is in the trie, you will have to look at all its characters, so isIn has a worst-case time of Θ(x.length).

To insert a key X in a trie, we again find the longest prefix of X in the trie, which corresponds to some node P. Then, if P is a leaf node, we insert enough internal nodes to distinguish X from P.label(). Otherwise, we can insert a leaf for X in the appropriate child of P. Figure 9.8 illustrates the results of adding “bat” and “faceplate” to the trie in Figure 9.7. Adding “bat” simply requires adding a leaf to an existing node. Adding “faceplate” requires inserting two new nodes first. The method insert below performs the trie insertion.
    /** The result of inserting X into this Trie, if it is not
     *  already there, and returning this. This trie is
     *  unchanged if X is in it already. */
    public Trie insert(String X)
    {
        return insert(X, 0);
    }

    /** Assumes this is a level L node in some Trie. Returns the
     *  result of inserting X into this Trie. Has no effect (returns
     *  this) if X is already in this Trie. */
    private Trie insert(String X, int L)
    {
        if (isEmpty())
            return new LeafTrie(X);
        int c = nth(X, L);
        if (isLeaf()) {
            if (X.equals(label()))
                return this;
            else if (c == nth(label(), L))
                return new InnerTrie(c, insert(X, L+1));
            else {
                Trie newNode = new InnerTrie(c, new LeafTrie(X));
                newNode.child(nth(label(), L), this);
                return newNode;
            }
        } else {
            child(c, child(c).insert(X, L+1));
            return this;
        }
    }

Here, the constructor for InnerTrie(c, T), described later, gives us a Trie for which child(c) is T and all other children are empty. The leaf's own character at level L is fetched with nth rather than charAt, since the label may be shorter than L characters (as when “abate” is inserted into a trie containing just “a”).

Deleting from a trie just reverses this process. Whenever a trie node is reduced to containing a single leaf, it may be replaced by that leaf. The following program indicates the process.

    public Trie remove(String x)
    {
        return remove(x, 0);
    }

    /** Remove x from this Trie, which is assumed to be level L, and
     *  return the result. */
    private Trie remove(String x, int L)
    {
        if (isEmpty())
            return this;
        if (isLeaf()) {
            if (x.equals(label()))
                return EMPTY;
            else
                return this;
        }
        int c = nth(x, L);
        child(c, child(c).remove(x, L+1));
        int d = onlyMember();
        if (d >= 0)
            return child(d);
        return this;
    }

    /** If this Trie contains a single string, which is in
     *  child(K), return K. Otherwise returns -1. */
    private int onlyMember() { /* Left to the reader. */ }
9.2.2 Tries: Representation

We are left with the question of how to represent these tries. The main problem of course is that the nodes contain a variable number of children. If the number of children in each node is small, a linked tree representation like those described in §5.2 will work. However, for fast access, it is traditional to use an array to hold the children of a node, indexed by the characters that label the edges. This leads to something like the following:
    class EmptyTrie extends Trie {
        public boolean isEmpty() { return true; }
        public boolean isLeaf() { return false; }
        public String label() { throw new Error(...); }
        public Trie child(int c) { throw new Error(...); }
        protected void child(int c, Trie T) { throw new Error(...); }
    }

    class LeafTrie extends Trie {
        private String L;

        /** A Trie containing just the string S. */
        LeafTrie(String s) { L = s; }

        public boolean isEmpty() { return false; }
        public boolean isLeaf() { return true; }
        public String label() { return L; }
        public Trie child(int c) { return EMPTY; }
        protected void child(int c, Trie T) { throw new Error(...); }
    }

    class InnerTrie extends Trie {
        // ALPHABETSIZE has to be defined somewhere
        private Trie[] kids = new Trie[ALPHABETSIZE];

        /** A Trie with child(K) == T and all other children empty. */
        InnerTrie(int k, Trie T) {
            for (int i = 0; i < kids.length; i += 1)
                kids[i] = EMPTY;
            child(k, T);
        }

        public boolean isEmpty() { return false; }
        public boolean isLeaf() { return false; }
        public String label() { throw new Error(...); }
        public Trie child(int c) { return kids[c]; }
        protected void child(int c, Trie T) { kids[c] = T; }
    }
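For readers who want something to run, the following condenses the search and insertion ideas into one small class. It deliberately departs from the class hierarchy above and should be read as a sketch under assumptions: null stands for the empty trie, a single Node class serves as both leaf (label set) and inner node (kids set), keys are 7-bit ASCII strings containing no character 0, and character 0 plays the role of ✷. Removal is omitted.

```java
public class TrieSketch {
    static final int ALPHA = 128;   // assumed alphabet: 7-bit ASCII

    /** Character K of X, or 0 (standing in for the end marker). */
    static int nth(String x, int k) {
        return k >= x.length() ? 0 : x.charAt(k);
    }

    /** A leaf has a non-null label; an inner node has non-null kids. */
    static final class Node {
        String label;
        Node[] kids;

        Node(String label) {
            this.label = label;
        }

        Node(int c, Node child) {
            kids = new Node[ALPHA];
            kids[c] = child;
        }
    }

    /** True iff X is in the trie T (null means the empty trie). */
    public static boolean isIn(Node t, String x) {
        int k = 0;
        while (t != null && t.kids != null) {
            t = t.kids[nth(x, k)];
            k += 1;
        }
        return t != null && x.equals(t.label);
    }

    /** The result of inserting X into T. */
    public static Node insert(Node t, String x) {
        return insert(t, x, 0);
    }

    /** The result of inserting X into T, a level-L node. */
    private static Node insert(Node t, String x, int L) {
        if (t == null) {
            return new Node(x);
        }
        int c = nth(x, L);
        if (t.kids == null) {                 // T is a leaf
            if (x.equals(t.label)) {
                return t;
            }
            int d = nth(t.label, L);
            if (c == d) {
                return new Node(c, insert(t, x, L + 1));
            }
            Node inner = new Node(c, new Node(x));
            inner.kids[d] = t;
            return inner;
        }
        t.kids[c] = insert(t.kids[c], x, L + 1);
        return t;
    }
}
```

A typical use starts from null and reassigns the root on each call: `root = TrieSketch.insert(root, "abate");`.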
9.2.3 Table compression

Actually, our alphabet is likely to have “holes” in it: stretches of encodings that don't correspond to any character that will appear in the Strings we insert. We could cut down on the size of the inner nodes (the kids arrays) by performing a preliminary mapping of chars into a compressed encoding. For example, if the only characters in our strings are the digits 0–9, then we could re-do InnerTrie as follows:
    class InnerTrie extends Trie {
        private static char[] charMap = new char['9'+1];

        static {
            charMap[0] = 0;
            charMap['0'] = 1; charMap['1'] = 2; ...
        }

        public Trie child(int c) { return kids[charMap[c]]; }
        protected void child(int c, Trie T) { kids[charMap[c]] = T; }
    }
This helps, but even so, arrays that may be indexed by all characters valid in a key are likely to be relatively large (for a tree node), say on the order of M = 60 bytes even for nodes that can contain only digits (assuming 4 bytes per pointer, 4 bytes overhead for every object, 4 bytes for a length field in the array). If there is a total of N characters in all keys, then the space needed is bounded by about NM/2. The bound is reached only in a highly pathological case (where the trie contains only two very long strings that are identical except in their last characters). Nevertheless, the arrays that arise in tries can be quite sparse.
One approach to solving this is to compress the tables. This is especially applicable when there are few insertions once some initial set of strings is accommodated. By the way, the techniques described below are generally applicable to any such sparse array, not just tries.

The basic idea is that sparse arrays (i.e., those that mostly contain empty or “null” entries) can be overlaid on top of each other by making sure that the non-null entries in one fall on top of null entries in the others. We allocate all the arrays in a single large one, and store extra information with each entry so that we can tell which of the overlaid arrays that entry belongs to. Figure 9.9 shows an appropriate alternative data structure.

The idea is that we store everybody's array of kids in one place, and store an edge label that tells us what character is supposed to correspond to each kid. That allows us to distinguish between a slot that contains somebody else's child (which means that I have no child for that character), and a slot that contains one of my children. We arrange that the me field for every node is unique by making sure that the 0th child (corresponding to ✷) is always full.

As an example, Figure 9.10 shows the ten internal nodes of the trie in Figure 9.8 overlaid on top of each other. As the figure illustrates, this representation can be very compact. The number of extra empty entries that are needed on the right (to prevent indexing off the end of the array) is limited to M − 1, so that it becomes negligible when the array is large enough. (Aside: When dealing with a set of arrays that one wishes to compress in this way, it is best to allocate the fullest (least sparse) first.)

Such close packing comes at a price: insertions are expensive. When one adds a new child to an existing node, the necessary slot may already be used by some other array, making it necessary to move the node to a new location by (in effect) first erasing its non-null entries from the packed storage area, finding another spot for it and moving its entries there, and finally updating the pointer to the node being moved in its parent. There are ways to mitigate this, but we won't go into them here.
9.3RestoringBalancebyRotation
AnotherapproachistofindanoperationthatchangesthebalanceofaBST— choosinganewrootthatmoveskeysfromadeepsidetoashallowside—while preservingthebinarysearchtreeproperty.Thesimplestsuchoperationsarethe rotations ofatree.Figure9.11showstwoBSTsholdingidenticalsetsofkeys. Considertherightrotationfirst(theleftisamirrorimage).First,therotation preservesthebinarysearchtreeproperty.Intheunrotated tree,thenodesin A are
9.3.RESTORINGBALANCEBYROTATION
abstractclassTrie{
staticprotectedTrie[]allKids; staticprotectedchar[]edgeLabels; staticfinalcharNOEDGE=/*Somecharthatisn’tused.*/ static{
allKids=newTrie[INITIAL_SPACE]; edgeLabels=newchar[INITIAL_SPACE]; for(inti=0;i<INITIAL_SPACE;i+=1){ allKids[i]=EMPTY;edgeLabels[i]=NOEDGE;
classInnerTrieextendsTrie{
/*Positionofmychild0inallKids.Mykthchild,if *non-empty,isatallKids[me+k].Ifmykthchildis *notempty,thenedgeLabels[me+k]==k.edgeLabels[me]
*isalways0(✷).*/ privateintme;
/**ATriewithchild(K)==Tandallotherchildrenempty.*/ InnerTrie(intk,TrieT){
//SetmesuchthatedgeLabels[me+k].isEmpty().*/ child(0,EMPTY); child(k,T);
publicTriechild(intc){
if(edgeLabels[me+c]==c) returnallKids[me+c]; else returnEMPTY;
protectedvoidchild(intc,TrieT){
if(edgeLabels[me+c]!=NOEDGE&& edgeLabels[me+c]!=c){
//Movemykidstoanewlocation,andpointmeatit.
allKids[me+c]=T; edgeLabels[me+c]=c;
Figure9.9:DatastructuresusedwithcompressedTries.
182
CHAPTER9.BALANCEDSEARCHING
} }
}
...
}
}
}
} }
Figure 9.10: A packed version of the trie from Figure 9.8. Each of the trie nodes from that figure is represented as an array of children indexed by character; the character that is the index of a child is stored in the upper row (which corresponds to the array edgeLabels). The pointer to the child itself is in the lower row (which corresponds to the allKids array). Empty boxes on top indicate unused locations (the NOEDGE value). To compress the diagram, I've changed the character set encoding so that ✷ is 0, 'a' is 1, 'b' is 2, etc. The crossed boxes in the lower row indicate empty nodes. There must also be an additional 24 empty entries on the right (not shown) to account for the c–z entries of the rightmost trie node stored. The search algorithm uses edgeLabels to determine when an entry actually belongs to the node it is currently examining. For example, the root node is supposed to contain entries for 'a', 'b', and 'f'. And indeed, if you count 1, 2, and 6 over from the "root" box above, you'll find entries whose edge labels are 'a', 'b', and 'f'. If, on the other hand, you count over 3 from the root box, looking for the non-existent 'c' edge, you find instead an edge label of 'e', telling you that the root node has no 'c' edge.
exactly the ones less than B, as they are on the right; D is greater, as on the right; and subtree C is greater, as on the right. You can also assure yourself that the nodes under D in the rotated tree bear the proper relation to it.
Turning to height, let's use the notation $H_A$, $H_C$, $H_E$, $H_B$, and $H_D$ to denote the heights of subtrees A, C, and E and of the subtrees whose roots are nodes B and D. Any of A, C, or E can be empty; we'll take their heights in that case to be $-1$. The height of the tree on the left is $1 + \max(H_E, 1 + H_A, 1 + H_C)$. The height of the tree on the right is $1 + \max(H_A, 1 + H_C, 1 + H_E)$. Therefore, as long as $H_A > \max(H_C + 1, H_E)$ (as would happen in a left-leaning tree, for example), the height of the right-hand tree will be less than that of the left-hand tree. One gets a similar situation in the other direction.
In fact, it is possible to convert any BST into any other that contains the same keys by means of rotations. This amounts to showing that by rotation, we can move any node of a BST to the root of the tree while preserving the binary search tree
[Figure 9.10 diagram: the edgeLabels (upper row) and allKids (lower row) arrays of the packed trie, holding the words a, abase, abash, abate, abbas, axe, axolotl, bat, fabric, facet, and faceplate.]
Figure 9.11: Rotations in a binary search tree. Triangles represent subtrees and circles represent individual nodes. The binary search tree relation is maintained by both operations, but the levels of various nodes are affected.
property [why is this sufficient?]. The argument is an induction on the structure of trees.

• It is clearly possible for empty or one-element trees.

• Suppose we want to show it for a larger tree, assuming (inductively) that all smaller trees can be rotated to bring any of their nodes to their root. We proceed as follows:

  – If the node we want to make the root is already there, we're done.

  – If the node we want to make the root is in the left child, rotate the left child to make it the root of the left child (inductive hypothesis). Then perform a right rotation on the whole tree.

  – Similarly if the node we want is in the right child.
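The inductive argument translates almost directly into a recursive procedure. The following is only a sketch under assumptions of my own: the classes RotNode and RotateToRoot, the int keys, and the method names are hypothetical and are not the Tree type used elsewhere in this text. If the key is absent, the last node on the search path ends up at the root instead.

```java
/** A minimal BST node with int keys (hypothetical; for illustration only). */
class RotNode {
    int key;
    RotNode left, right;

    RotNode(int key, RotNode left, RotNode right) {
        this.key = key; this.left = left; this.right = right;
    }
}

class RotateToRoot {
    /** Right rotation: the left child of T becomes the root of the result. */
    static RotNode rotateRight(RotNode t) {
        RotNode b = t.left;
        t.left = b.right;
        b.right = t;
        return b;
    }

    /** Left rotation: the mirror image of rotateRight. */
    static RotNode rotateLeft(RotNode t) {
        RotNode d = t.right;
        t.right = d.left;
        d.left = t;
        return d;
    }

    /** The tree T rearranged by rotations so that KEY is at the root.
     *  Follows the induction: first bring KEY to the root of the
     *  appropriate child, then rotate the whole tree once. */
    static RotNode moveToRoot(RotNode t, int key) {
        if (t == null || t.key == key)
            return t;                          // already there (or empty)
        if (key < t.key) {
            if (t.left == null)
                return t;                      // KEY is absent
            t.left = moveToRoot(t.left, key);  // inductive hypothesis
            return rotateRight(t);
        } else {
            if (t.right == null)
                return t;                      // KEY is absent
            t.right = moveToRoot(t.right, key);
            return rotateLeft(t);
        }
    }

    /** Append the keys of T to OUT in inorder, separated by blanks. */
    static void inorder(RotNode t, StringBuilder out) {
        if (t == null)
            return;
        inorder(t.left, out);
        out.append(t.key).append(' ');
        inorder(t.right, out);
    }
}
```

Because only rotations are used, the inorder sequence of keys is the same before and after the call; only the shape of the tree changes.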
9.3.1 AVL Trees

Of course, knowing that it is possible to re-arrange a BST by means of rotation doesn't tell us which rotations to perform. The AVL tree is an example of a technique for keeping track of the heights of subtrees and performing rotations when they get too far out of line. An AVL tree⁵ is simply a BST that satisfies the

AVL Property: the heights of the left and right subtrees of every node differ by at most one.

Adding or deleting a node at the bottom of such a tree (as happens with the simple BST insertion and deletion algorithms from §6.1) may invalidate the AVL property, but it may be restored by working up toward the root from the point of the insertion or deletion and performing certain selected rotations, depending on the nature of the imbalance that needs to be corrected. In the following diagrams, the expressions in the subtrees indicate their heights. An unbalanced subtree having the form
⁵ The name is taken from the names of the two discoverers, Adel'son-Vel'skiĭ and Landis.
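[Figure 9.11 diagram: in the left tree, the root D has left child B (with subtrees A and C) and right subtree E; in the right tree, the root B has left subtree A and right child D (with subtrees C and E). D.rotateRight() turns the left tree into the right one, and B.rotateLeft() turns it back.]

[Diagram: the unbalanced subtree, with its subtrees labeled by heights expressed in terms of h.]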
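can be rebalanced with a single left rotation, giving an AVL tree of the form:

[Diagram: the rebalanced subtree.]

Finally, consider an unbalanced tree of the form

[Diagram: an unbalanced tree whose subtree heights include h′ and h″.]

where at least one of h′ and h″ is h and the other is either h or h − 1. Here, we can rebalance by performing two rotations, first a right rotation on C, and then a left rotation on A, giving the correct AVL tree

[Diagram: the rebalanced tree.]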
The other possible cases of imbalance that result from adding or removing a single node are mirror images of these.

Thus, if we keep track of the heights of all subtrees, we can always restore the AVL property, starting from the point of insertion or deletion at the base of the tree and proceeding upwards. In fact, it turns out that it isn't necessary to know the precise heights of all subtrees, but merely to keep track of the three cases at each node: the two subtrees have the same height, the height of the left subtree is greater by 1, and the height of the right subtree is greater by 1.
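To make the rebalancing rule concrete, here is a minimal sketch of AVL insertion that stores a full height in every node (rather than the more economical three-case balance indicator just mentioned). The class names AvlNode and AvlSketch and all method names are assumptions of mine for illustration; the text itself does not give an implementation.

```java
/** A node that records the height of the subtree it roots (hypothetical). */
class AvlNode {
    int key;
    int height;            // a single node has height 0; an empty tree, -1
    AvlNode left, right;

    AvlNode(int key) { this.key = key; }
}

class AvlSketch {
    static int height(AvlNode t) { return t == null ? -1 : t.height; }

    static void update(AvlNode t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    static AvlNode rotateRight(AvlNode t) {
        AvlNode b = t.left;
        t.left = b.right;
        b.right = t;
        update(t); update(b);      // t is now below b, so fix t first
        return b;
    }

    static AvlNode rotateLeft(AvlNode t) {
        AvlNode d = t.right;
        t.right = d.left;
        d.left = t;
        update(t); update(d);
        return d;
    }

    /** Restore the AVL property at T, assuming both subtrees satisfy it
     *  and their heights differ by at most 2. */
    static AvlNode rebalance(AvlNode t) {
        update(t);
        int diff = height(t.right) - height(t.left);
        if (diff > 1) {                          // right side too tall
            if (height(t.right.left) > height(t.right.right))
                t.right = rotateRight(t.right);  // double-rotation case
            return rotateLeft(t);
        } else if (diff < -1) {                  // mirror image
            if (height(t.left.right) > height(t.left.left))
                t.left = rotateLeft(t.left);
            return rotateRight(t);
        }
        return t;
    }

    /** Insert KEY into T (if absent), rebalancing on the way back up. */
    static AvlNode insert(AvlNode t, int key) {
        if (t == null)
            return new AvlNode(key);
        if (key < t.key)
            t.left = insert(t.left, key);
        else if (key > t.key)
            t.right = insert(t.right, key);
        return rebalance(t);
    }

    /** The true height of T, or -2 if some node violates the AVL property. */
    static int checkedHeight(AvlNode t) {
        if (t == null)
            return -1;
        int l = checkedHeight(t.left), r = checkedHeight(t.right);
        if (l == -2 || r == -2 || Math.abs(l - r) > 1)
            return -2;
        return 1 + Math.max(l, r);
    }
}
```

Inserting keys in increasing order, which would produce a linear chain in a plain BST, here yields a tree whose height stays logarithmic in the number of keys.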
9.4 Splay Trees

Rotations allow us to move any node of a binary search tree as close as we want to the root of the tree, all the while maintaining the binary search tree property. At the very least, therefore, we could use rotations in an unbalanced tree to make commonly searched-for keys quick to find. It turns out we can do better. A splay tree⁶ is a form of self-adjusting binary search tree, one in which even operations that don't change the content of the tree can nevertheless adjust its structure to speed up subsequent operations. This data structure has the interesting property that some individual operations may take O(N) time (for N items in the tree), but the amortized cost (see §1.4) of each operation in a whole sequence of K operations (including the insertions that build the tree) is still O(lg K). It is, moreover, a particularly simple modification of the basic (unbalanced) binary search tree.
Unsurprisingly, the defining operation in this tree is splaying, rotating a particular node to the root in a certain way. Splaying a node means applying a sequence of splaying steps so as to bring the node to the top of the tree. There are three types of splaying step:

1. Given a node t and one of its children, y, rotate y around t (that is, rotate t left or right, as appropriate, to bring y to the top). The original paper calls this a "zig" step.

2. Given a node t, one of its children, y, and the child, z, of y that is on the same side of y as y is of t, rotate y around t, and then rotate z around y (a "zig-zig" step).

3. Given a node t, one of its children, y, and the child, z, of y that is on the opposite side of y as y is of t, rotate z around y and then around t (a "zig-zag" step).

The nodes that we subject to this operation are those on the path from the root that we would normally follow to find a given value in a binary search tree. To get some intuition into the motivation behind this particular operation, consider Figure 9.13. The tree on the left of the figure is a typical worst-case binary search tree. After splaying node 0, we get the tree on the right of the figure, which has roughly half the height of the former. It's true that we have to do 7 rotations to splay this node, but to create the tree on the left, we did 8 constant-time insertions, so that (so far), the amortized costs of all 9 operations (8 insertions plus one splay) are only about 2 each.

The operations of searching, inserting, and deleting from an ordinary binary search tree all involve searching for a node that contains a particular value, or one
⁶ D. D. Sleator and R. E. Tarjan, "Self-Adjusting Binary Search Trees," Journal of the ACM 32(3), July 1985, pp. 652–686.

Figure 9.12 illustrates these three basic steps.

[Figure 9.12 diagram: three rows showing the "zig", "zig-zig", and "zig-zag" steps applied to nodes t, y, and z with subtrees A through D, followed by "Before" and "After" trees for a complete splay of node 3 in a tree with keys 0 through 9.]

Figure 9.12: The basic splaying steps. There are mirror image cases when y is on the other side of t. The last row illustrates a complete splaying operation (on node 3). Starting at the bottom, we perform a "zig" step, followed by a "zig-zig," and finally a "zig-zag" all on node 3, ending up with the tree on the right.
that is as close as possible to it. In splay trees, after finding this node, we splay it, bringing it to the root of the tree and reorganizing the rest of the tree. For this purpose, it is convenient to modify the usual algorithm for searching for a value in a BST so that it splays at the same time. Figure 9.15 shows one possible implementation.⁷ It operates on trees of the type BST, given in Figure 9.14. This particular type provides operations for rotating and replacing a child that will perform either left or right operations, allowing us to collapse a nest of cases into a few.

The splayFind procedure is a tool that we can use to implement the usual operations on binary search trees, as shown in Figure 9.16 and illustrated in Figure 9.17.

9.4.1 Analyzing splay trees

It is quite easy to create very unbalanced splay trees. Inserting items into a tree in order will do it. So will searching for all items in the tree in order. So the cost of any particular operation in isolation is Θ(N), if N is the number of nodes (and therefore keys) in the tree. But you never perform a single operation on a large tree; after all, you had to build the tree in the first place, and that certainly had to take time at least proportional to its size. Therefore, we might expect to get different results if we ask for the amortized time of operations over an entire sequence. In this section, we'll show that in fact the amortized time bound for the operations of search, insertion, and deletion on a splay tree is O(lg N), just like the worst-case

⁷ The splayFind procedure here performs zig-zig and zig-zag steps, after first possibly performing a zig step at the bottom of the search. This is one of many possible variations on splaying. The original paper by Sleator and Tarjan shows how to perform splaying steps from the top down, or from the bottom up, but with the zig at the top of the tree rather than the bottom, or with a simplified version of the zig-zag step. These all result in slightly different trees, but all have essentially the same amortized performance. The version here is not the most efficient (being a linear recursion instead of an iterative process), but I find it convenient for analysis.
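[Figure 9.13 diagram: on the left, a linear chain of the nodes 7, 6, ..., 0 with 7 at the root and 0 at the bottom; on the right, the tree that results from splaying node 0, which now has 0 at the root.]

Figure 9.13: Splaying node 0 in a completely unbalanced tree. The resulting tree has about half the height of the original, speeding up subsequent searches.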
    public static class BST {
        public BST(int label, BST left, BST right) {
            this.label = label;
            this.left = left; this.right = right;
        }

        public BST left, right;
        public int label;

        /** Rotate CHILD left or right around me, as appropriate,
         *  returning CHILD.  CHILD must be one of my children. */
        BST rotate(BST child) {
            if (child == right) {
                right = child.left;
                child.left = this;
            } else {
                left = child.right;
                child.right = this;
            }
            return child;
        }

        /** Replace CHILD with NEWCHILD as one of my children.  CHILD
         *  must be either my (initial) left or right child. */
        void replace(BST child, BST newChild) {
            if (child == right)
                right = newChild;
            else
                left = newChild;
        }
    }

Figure 9.14: The binary search-tree structure used for our splay trees. This is just an ordinary BST that supplies unified operations for rotating or replacing children.
    /** Reorganize T, maintaining the BST property, so that its root is
     *  either V or the next value larger or smaller than V in T.  Returns
     *  null only if T is empty. */
    private static BST splayFind(BST t, int v)
    {
        BST y, z;

        if (t == null || v == t.label)
            return t;

        y = v < t.label ? t.left : t.right;
        if (y == null)
            return t;
        else if (v == y.label)
            return t.rotate(y);                  /* zig */
        else if (v < y.label)
            z = y.left = splayFind(y.left, v);
        else
            z = y.right = splayFind(y.right, v);

        if (z == null)
            return t.rotate(y);                  /* zig */
        else if ((v < t.label) == (v < y.label)) {   /* zig-zig */
            t.rotate(y);
            y.rotate(z);
            return z;
        } else {                                 /* zig-zag */
            t.replace(y, y.rotate(z));
            t.rotate(z);
            return z;
        }
    }

Figure 9.15: The splayFind procedure for finding and splaying a node. Used by insertion, deletion, and search.
    public class IntSplayTree {
        private BST root = null;

        private static BST splayFind(BST t, int v) { /* See Figure 9.15. */ }

        /** Insert V into me iff not already present.  Returns true
         *  iff V was added. */
        public boolean add(int v) {
            root = splayFind(root, v);
            if (root == null)
                root = new BST(v, null, null);
            else if (v == root.label)
                return false;
            else if (v < root.label) {
                root = new BST(v, root.left, root);
                root.right.left = null;
            } else {
                root = new BST(v, root, root.right);
                root.left.right = null;
            }
            return true;
        }

        /** Delete V from me iff present.  Returns true iff V was deleted. */
        public boolean remove(int v) {
            root = splayFind(root, v);
            if (root == null || v != root.label)
                return false;
            if (root.left == null)
                root = root.right;
            else {
                BST r = root.right;
                root = splayFind(root.left, v);
                root.right = r;
            }
            return true;
        }

        /** True iff I contain V. */
        public boolean contains(int v) {
            root = splayFind(root, v);
            // splayFind returns null for an empty tree, so guard the access.
            return root != null && v == root.label;
        }
    }

Figure 9.16: Standard collection operations on a splay tree. The interface is in the style of the Java collections classes. Figure 9.17 illustrates these methods.
[Figure 9.17 diagram: four trees. (a) holds the keys 0, 2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 36, and 38; (b) has 24 at the root; (c) has 21 at the root; (d) has 20 at the root.]

Figure 9.17: Basic BST operations on a splay tree. (a) is the original tree. (b) is the result of performing a splayFind on either of the values 21 or 24. (c) is the result of adding the value 21 into tree (a); the first step is to create the splayed tree (b). (d) is the result of removing 24 from the original tree (a); again the first step is to create (b), after which we splay the left child of 24 for the value 24, which is guaranteed to be larger than any value in that child.
bounds for other balanced binary trees.

To do so, we first define a potential function on our trees, as described in §1.4, which will keep track of how many cheap (and unbalancing) operations we have performed, and thus indicate how much time we can afford to spend in an expensive operation while still guaranteeing that the total cumulative cost of a sequence of operations stays appropriately bounded. As we did there (Equation 1.1), we define the amortized cost, $a_i$, of the $i$th operation in a sequence to be

\[ a_i = c_i + \Phi_{i+1} - \Phi_i, \]

where $c_i$ is the actual cost and $\Phi_k$ is the amount of "stored potential" in the data structure just before the $k$th operation. For our $c_i$, it's convenient to use the number of rotations performed, or 1 if an operation involves no rotation. That gives us a value for $c_i$ that is proportional to the real amount of work. The challenge is to find a $\Phi$ that allows us to absorb the spikes in $c_i$; when $a_i > c_i$, we save up "operation credit" in $\Phi$ and release it (by causing $\Phi_{i+1} < \Phi_i$) on steps where $c_i$ becomes large. To be suitable, we must make sure that $\Phi_i \ge \Phi_0$ at all times.

For a splay tree containing a set of nodes, $T$, we'll use as our potential function

\[ \Phi = \sum_{x \in T} r(x) = \sum_{x \in T} \lg s(x), \]

where $s(x)$ is the size of (the number of nodes in) the subtree rooted at $x$. The value $r(x) = \lg s(x)$ is called the rank of $x$. Thus, for the completely linear tree on the left in Figure 9.13, $\Phi = \sum_{1 \le i \le 8} \lg i = \lg 8! \approx 15.3$, while the tree on the right of that figure has $\Phi = 4 \lg 1 + \lg 3 + \lg 5 + \lg 7 + \lg 8 \approx 9.7$, indicating that the cost of splaying 0 is largely offset by decreasing the potential.

All but a constant amount of work in each operation (search, insert, or delete) takes place in splayFind, so it will suffice to analyze that. I claim that the amortized cost of finding and splaying a node $x$ in a tree whose root is $t$ is $\le 3(r(t) - r(x)) + 1$. Since $t$ is the root, we know that $r(t) \ge r(x) \ge 0$. Furthermore, since $s(t) = N$, the number of nodes in the tree, proving this claim will prove that the amortized cost of splaying must be $O(\lg N)$, as desired.⁸

Let's let $C(t, x)$ represent the amortized cost of finding and splaying a node $x$ in a tree rooted at $t$. That is,

\[ C(t, x) = \max(1, \text{number of rotations performed}) + \text{final potential of tree} - \text{initial potential of tree}. \]

We proceed recursively, following the structure of the program in Figure 9.15, to show that

\[ C(t, x) \le 3(r(t) - r(x)) + 1 = 3 \lg(s(t)/s(x)) + 1. \qquad (9.1) \]

It's convenient to use the notation $s'(z)$ to mean "the value of $s(z)$ at the end of a splay step," and $r'(z)$ to mean "the value of $r(z)$ at the end of a splay step."

⁸ My treatment here is adapted from Lemma 1 and its proof in the Sleator and Tarjan paper.
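The numbers quoted for Figure 9.13 can be checked mechanically. The sketch below re-creates the rotate, replace, and splayFind operations of Figures 9.14 and 9.15 inside a hypothetical class PotentialDemo and adds three helpers of my own (a rotation counter, size, and potential) that are not part of the text's code. It builds the eight-node chain, splays node 0, and reports the actual cost, both potentials, and the resulting amortized cost.

```java
class PotentialDemo {
    static int rotations = 0;    // instrumentation added for this sketch

    static class Node {
        int label;
        Node left, right;

        Node(int label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }

        /** As in Figure 9.14, but also counts the rotation. */
        Node rotate(Node child) {
            rotations += 1;
            if (child == right) { right = child.left; child.left = this; }
            else { left = child.right; child.right = this; }
            return child;
        }

        void replace(Node child, Node newChild) {
            if (child == right) right = newChild; else left = newChild;
        }
    }

    /** As in Figure 9.15. */
    static Node splayFind(Node t, int v) {
        if (t == null || v == t.label) return t;
        Node y = v < t.label ? t.left : t.right;
        if (y == null) return t;
        if (v == y.label) return t.rotate(y);                  // zig
        Node z;
        if (v < y.label) z = y.left = splayFind(y.left, v);
        else z = y.right = splayFind(y.right, v);
        if (z == null) return t.rotate(y);                     // zig
        if ((v < t.label) == (v < y.label)) {                  // zig-zig
            t.rotate(y); y.rotate(z);
        } else {                                               // zig-zag
            t.replace(y, y.rotate(z)); t.rotate(z);
        }
        return z;
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** The sum of lg s(x) over all nodes x of T. */
    static double potential(Node t) {
        if (t == null) return 0.0;
        return Math.log(size(t)) / Math.log(2.0)
            + potential(t.left) + potential(t.right);
    }

    /** A linear tree with N-1 at the root and 0 at the bottom. */
    static Node chain(int n) {
        Node t = null;
        for (int k = 0; k < n; k += 1)
            t = new Node(k, t, null);
        return t;
    }

    public static void main(String[] args) {
        Node t = chain(8);
        double before = potential(t);
        rotations = 0;
        t = splayFind(t, 0);
        double after = potential(t);
        System.out.printf("rotations=%d before=%.1f after=%.1f amortized=%.1f%n",
                          rotations, before, after, rotations + after - before);
    }
}
```

With 7 rotations and a drop in potential from about 15.3 to about 9.7, the amortized cost of this splay comes to roughly 1.4, comfortably below the claimed bound 3(lg 8 − lg 1) + 1 = 10.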
1. When $t$ is the empty tree or $v$ is at its root, there are no rotations, the potential is unchanged, and we take the real cost to be 1. Assertion 9.1 is obviously true in this case.

2. When $x = y$ is a child of $t$ (the "zig" case, shown at the top of Figure 9.12), we perform one rotation, for a total actual cost of 1. To compute the change in potential, we first notice that the only nodes we have to consider are $t$ and $x$, because the ranks of all other nodes do not change, and thus cancel out when we subtract the new potential from the old one. Thus, the change in potential is

\[ \begin{aligned}
r'(t) + r'(x) - r(t) - r(x) &= r'(t) - r(x), && \text{since } r'(x) = r(t) \\
&< r(t) - r(x), && \text{since } r'(t) < r(t) \\
&< 3(r(t) - r(x)), && \text{since } r(t) - r(x) > 0
\end{aligned} \]

and therefore, adding in the cost of one rotation, the amortized cost is $< 3(r(t) - r(x)) + 1$.

3. In the zig-zig case, the cost consists of first splaying $x$ up to be a grandchild of $t$ (node $z$ in the second row of Figure 9.12), and then performing two rotations. By assumption, the amortized cost of the first splay step is $C(z, x) \le 3(r(z) - r(x)) + 1$ ($r(z)$ is the rank of $x$ after it is splayed to the former position of $z$, since splaying does not change the rank of the root of a tree. We'll abuse notation a bit and refer to the node $x$ after this splaying as $z$ so that we can still use $r(x)$ as the original rank of $x$). The cost of the rotations is 2, and the change in the potential caused by these two rotations depends only on the changes it causes in the ranks of $t$, $y$, and $z$. Summing these up, the amortized cost for this case is

\[ \begin{aligned}
C(t, x) &= 2 + r'(t) + r'(y) + r'(z) - r(t) - r(y) - r(z) + C(z, x) \\
&= 2 + r'(t) + r'(y) - r(y) - r(z) + C(z, x), && \text{since } r'(z) = r(t) \\
&\le 2 + r'(t) + r'(y) - r(y) - r(z) + 3(r(z) - r(x)) + 1 && \text{by the inductive hypothesis} \\
&= 3(r(t) - r(x)) + 1 + 2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t)
\end{aligned} \]

so the result we want follows if

\[ 2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) \le 0. \qquad (9.2) \]

We can show 9.2 as follows:

\[ \begin{aligned}
2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) &\le 2 + r'(t) + r(z) - 2r(t) && \text{since } r(y) > r(z) \text{ and } r(t) > r'(y) \\
&= 2 + \lg(s'(t)/s(t)) + \lg(s(z)/s(t)) && \text{by the definition of } r \text{ and properties of } \lg.
\end{aligned} \]

Now if you examine the trees in the zig-zig case of Figure 9.12, you can see that $s'(t) + s(z) + 1 = s(t)$, so that $s'(t)/s(t) + s(z)/s(t) < 1$. Because lg is a concave, increasing function, this in turn tells us that (as discussed in §1.6),

\[ 2 + \lg(s'(t)/s(t)) + \lg(s(z)/s(t)) \le 2 + 2\lg(1/2) = 0. \]

4. Finally, in the zig-zag case, we again have that the desired result follows if we can demonstrate the inequality 9.2 above. This time, we have $s'(y) + s'(t) + 1 = s(t)$, so we can proceed

\[ \begin{aligned}
2 + r'(t) + r'(y) - r(y) + 2r(z) - 3r(t) &\le 2 + r'(t) + r'(y) - 2r(t) && \text{since } r(y) > r(z) \text{ and } r(t) > r(z) \\
&= 2 + \lg(s'(t)/s(t)) + \lg(s'(y)/s(t))
\end{aligned} \]

and the result follows by the same reasoning as in the zig-zig case.

Thus ends the demonstration.

The operations of insertion and search add a constant time to the time of splaying, and deletion adds a constant and a constant factor of 2 (since it involves two splaying operations). Therefore, all operations on splay trees have O(lg N) amortized time (using the maximum value for N for any given sequence of operations).

This bound is actually pessimistic. Inorder tree traversals, as we've seen, take linear time in the size of the tree. Since a splay tree is just a BST, we can get the same bound. If we were to splay each node to the root as we traversed them (which might seem to be natural for splay trees), our amortized bound is O(N lg N) rather than O(N). Not only that, but after the traversal, our tree will have been converted to a "stringy" linked list. Oddly enough, however, it is possible to show that the cost of an inorder traversal of a BST in which each node is splayed as it is traversed is actually O(N) (amortized cost O(1) for each item traversed, in other words). However, the author thinks he has beaten this subject into the ground and will spare you the details.

9.5 Skip Lists

The B-tree was an example of a search tree in which nodes had variable numbers of children, with each child representing some ordered set of keys. It speeds up searches as does a vanilla binary search tree by subdividing the keys at each node into disjoint ranges of keys, and contrives to keep these sequences of comparable length, balancing the tree. Here we look at another structure that does much the same thing, except that it uses randomization rather than rotation to approximately balance the tree, and it merely achieves this balance with high probability, rather than with certainty. Consider the same set of integer keys from Figure 9.1, arranged into a search tree where each node has one key and any number of children, and the children of any node all have keys that are at least as large as that of their parent. Figure 9.18 shows a possible arrangement. The maximum heights at which the keys appear are chosen independently according to a rule that gives a probability of $(1 - p)p^k$ of a key's being at height $k$ (0 being the bottom). That is, $0 < p < 1$ is an arbitrary constant that represents the approximate proportion of all nodes at height $\ge e$ that have height $> e$. We add a minimal ($-\infty$) key at the left with sufficient height to serve as a root for the whole tree.
Figure 9.18 shows an example, created using p = 0.5. To look for a key, we can scan this tree from left to right starting at any level and working downwards. Starting at the bottom (level 0) just gives us a simple linear search. At higher levels, we search a forest of trees, choosing which tree to examine more closely on the basis of the value of its root node. To see if 127 is a member, for example, we can look at

• the first 15 entries of level 0 (not including −∞) [15 entries]; or

• the first 7 level-1 entries, and then the 2 level-0 items below the key 120 [9 entries]; or

• the first 3 level-2 entries, then the level-1 entry 140, and then the 2 level-0 items below 120 [6 entries]; or

• the level-3 entry 90, then the level-2 entry 120, then the level-1 entry 140, and then the 2 level-0 items below 120 [5 entries].

We can represent this tree as a kind of linear list of nodes in in-order (see Figure 9.19) in which the nodes have random numbers of next links, and the ith next link in each (numbering from 0 as usual) is connected to the next node that has at least i + 1 links. This list-like representation, with some links "skipping" arbitrary numbers of list elements, explains the name given to this data structure: the skip list.⁹
Searching is very simple. If we denote the value at one of these nodes as L.value (here, we'll use integer keys) and the next pointer at height k as L.next[k], then:

    /** True iff X is in the skip list beginning at node L at
     *  a height <= K, where K >= 0. */
    static boolean contains(SkipListNode L, int k, int x) {
        if (x == L.next[k].value)
            return true;
        else if (x > L.next[k].value)
            return contains(L.next[k], k, x);
        else if (k > 0)
            return contains(L, k-1, x);
        else
            return false;
    }

⁹ William Pugh, "Skip lists: A probabilistic alternative to balanced trees," Comm. of the ACM, 33, 6 (June, 1990) pp. 668–676.
[Figure 9.18 diagram: a tree rooted at −∞. Level 0 holds the keys 10, 20, 25, 30, 40, 50, 55, 60, 90, 95, 100, 115, 120, 125, 130, 140, and 150; the keys 20, 25, 55, 90, 95, 120, and 140 also appear at level 1; 20, 90, and 120 at level 2; and 90 at level 3.]

Figure 9.18: An abstract view of a skip list, showing its relationship to a (non-binary) search tree. Each key other than −∞ is duplicated to a random height. We can search this structure beginning at any level. In the best case, to search (unsuccessfully) for the target value 127, we need only look at the keys in the shaded nodes. Darker shaded nodes indicate keys larger than 127 that bound the search.

[Figure 9.19 diagram: the list −∞, 10, 20, 25, 30, 40, 50, 55, 60, 90, 95, 100, 115, 120, 125, 130, 140, 150, ∞, with next links at levels 0 through 3.]

Figure 9.19: The skip list from Figure 9.18, showing a possible representation. The data structure is an ordered list whose nodes contain random numbers of pointers to later nodes (which allow intervening items in the list to be skipped during a search; hence the name). If a node has at least k pointers, then it contains a pointer to the next node that has at least k pointers. A node for ∞ at the right allows us to avoid tests for null. Again, the nodes looked at during a search for 127 are shaded; the darker shading indicates nodes that limit the search.

[Figure 9.20 diagram: the list −∞, 10, 25, 30, 40, 50, 55, 60, 90, 95, 100, 115, 120, 125, 126, 127, 130, 140, 150, ∞.]

Figure 9.20: The skip list from Figure 9.19 after inserting 127 and 126 (in either order), and deleting 20. Here, the 127 node is randomly given a height of 5, and the 126 node a height of 1. The shaded nodes show which previously existing nodes need to change. For the two insertions, the nodes needing change are the same as the light-shaded nodes that were examined to search for 127 (or 126), plus the ±∞ nodes at the ends (if they need to be heightened).
We can start the search at any level k ≥ 0 up to the height of the tree. It turns out that a reasonable place to start for a list containing N nodes is at level $\log_{1/p} N$, as explained below.

To insert or delete into the list, we find the position of the node to be added or deleted by the process above, keeping track of the nodes we traverse to do so. When the item is added or deleted, these are the nodes whose pointers may need to be updated. When we insert nodes, we choose a height for them randomly in such a way that the proportion of nodes with height at least k is roughly $p^k$, where p is some probability (typical values for which might be 0.5 or 0.25). That is, if we are shooting for a roughly n-ary search tree, we let p = 1/n. A suitable procedure might look like this:
    /** A random integer, h, in the range 0 .. MAX such that
     *  Pr(h >= k) = P^k, 0 <= k <= MAX. */
    static int randomHeight(double p, int max, Random r) {
        int h;
        h = 0;
        while (h < max && r.nextDouble() < p)
            h += 1;
        return h;
    }
In general, it is pointless to accommodate arbitrarily large heights, so we impose some maximum, generally the logarithm (base 1/p) of the maximum number of keys one expects to need.

Intuitively, any sequence of M inserted nodes each of whose heights is at least k will be randomly broken about every 1/p nodes by one whose height is strictly greater than k. Likewise, for nodes of height at least k + 1, and so forth. So, if our list contains N items, and we start looking at level $\log_{1/p} N$, we'd expect to look at most at roughly $(1/p) \log_{1/p} N$ keys (that is, 1/p keys at each of $\log_{1/p} N$ levels). In other words, Θ(lg N) on average, which is what we want. Admittedly, this analysis is a bit handwavy, but the true bound is not significantly larger. Since inserting and deleting consists of finding the node, plus some insertion or deletion time proportional to the node's height, we actually have Θ(lg N) expected bounds on search, insertion, and deletion.
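Putting the pieces of this section together, the following is a small sketch of a skip list of ints supporting search and insertion. The class SkipListSketch, its field names, the fixed cap MAX, and the use of Integer.MIN_VALUE and Integer.MAX_VALUE as the −∞ and ∞ sentinels (so those two values cannot themselves be stored) are all assumptions made for illustration, not part of the text's code. Search always starts at the top level of the header node, and insertion records the last node visited at each level, since those are the nodes whose pointers may change.

```java
import java.util.Random;

class SkipListSketch {
    static final int MAX = 16;           // hypothetical cap on heights

    static class Node {
        final int value;
        final Node[] next;               // next[k] is the link at height k

        Node(int value, int height) {
            this.value = value;
            this.next = new Node[height + 1];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX);  // -infinity
    private final Node tail = new Node(Integer.MAX_VALUE, MAX);  // +infinity
    private final Random r;
    private final double p;

    SkipListSketch(double p, long seed) {
        this.p = p;
        this.r = new Random(seed);
        for (int k = 0; k <= MAX; k += 1)
            head.next[k] = tail;
    }

    /** A random height h with Pr(h >= k) = p^k, 0 <= k <= MAX. */
    private int randomHeight() {
        int h = 0;
        while (h < MAX && r.nextDouble() < p)
            h += 1;
        return h;
    }

    /** True iff X is in this list (the recursive search of this section). */
    boolean contains(int x) {
        return contains(head, MAX, x);
    }

    private static boolean contains(Node L, int k, int x) {
        if (x == L.next[k].value)
            return true;
        else if (x > L.next[k].value)
            return contains(L.next[k], k, x);
        else if (k > 0)
            return contains(L, k - 1, x);
        else
            return false;
    }

    /** Insert X if absent.  Returns true iff X was added. */
    boolean add(int x) {
        Node[] pred = new Node[MAX + 1];  // last node before X at each level
        Node L = head;
        for (int k = MAX; k >= 0; k -= 1) {
            while (L.next[k].value < x)
                L = L.next[k];
            pred[k] = L;
        }
        if (pred[0].next[0].value == x)
            return false;
        Node n = new Node(x, randomHeight());
        for (int k = 0; k < n.next.length; k += 1) {
            n.next[k] = pred[k].next[k];  // splice n in at level k
            pred[k].next[k] = n;
        }
        return true;
    }
}
```

Deletion would use the same pred array: after locating the node, each pred[k].next[k] that points to it is redirected to the node's own next[k].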
Exercises
9.1. Fill in the following to agree with its comments:

    /** Return a modified version of T containing the same nodes
     *  with the same inorder traversal, but with the node containing
     *  label X at the root.  Does not create any new Tree nodes. */
    static Tree rotateUp(Tree T, Object X) {
        // FILL IN
    }
9.2. What is the maximum height of an order-5 B-tree containing N nodes? What is the minimum height? What sequences of keys give the maximum height (that is, give a general characterization of such sequences)? What sequences of keys give the minimum height?

9.3. The splayFind algorithm given in Figure 9.15 is hardly the most efficient version one could imagine of this procedure. The original paper has an iterative version of the same function that uses constant extra space instead of the linear recursion of our version of splayFind. It keeps track of two trees: L, containing nodes that are less than v, and R, containing nodes greater than v. As it progresses iteratively down the tree from the root, it adds subtrees of the current node to L and R until it reaches the node, x, that it is seeking. At that point, it finishes by attaching the left and right subtrees of x to L and R respectively, and then making L and R its new children. During this process, subtrees get attached to L in order of increasing labels, and to R in order of decreasing labels. Rewrite splayFind to use this strategy.

9.4. Write a non-recursive version of the contains function for skip lists (§9.5).

9.5. Define an implementation of the SortedSet interface that uses a skip list representation.
Chapter 10

Concurrency and Synchronization

An implicit assumption in everything we've done so far is that a single program is modifying our data structures. In Java, one can have the effect of multiple programs modifying an object, due to the existence of threads.

Although the language used to describe threads suggests that their purpose is to allow several things to happen simultaneously, this is a somewhat misleading impression. Even the smallest Java application running on Sun's JDK platform, for example, has five threads, and that's only if the application has not created any itself, and even if the machine on which the program runs consists of a single processor (which can only execute one instruction at a time). The four additional "system threads" perform a number of tasks (such as "finalizing" objects that are no longer reachable by the program) that are logically independent of the rest of the program. Their actions could usefully occur at any time relative to the rest of the program. Sun's Java runtime system, in other words, is using threads as an organizational tool for its system. Threads abound in Java programs that use graphical user interfaces (GUIs). One thread draws or redraws the screen. Another responds to events such as the clicking of a mouse button at some point on the screen. These are related, but largely independent activities: objects must be redrawn, for example, whenever a window becomes invisible and uncovers them, which happens independently of any calculations the program is doing.

Threads violate our implicit assumption that a single program operates on our data, so that even an otherwise perfectly implemented data structure, with all of its instance variables private, can become corrupted in rather bizarre ways. The existence of multiple threads operating on the same data objects also raises the general problem of how these threads are to communicate with each other in an orderly fashion.
10.1 Synchronized Data Structures

Consider the ArrayList implementation from §4.1. In the method ensureCapacity, we find

    public void ensureCapacity(int N) {
        if (N <= data.length)
            return;
        Object[] newData = new Object[N];
        System.arraycopy(data, 0, newData, 0, count);
        data = newData;
    }

    public Object set(int k, Object x) {
        check(k, count);
        Object old = data[k];
        data[k] = x;
        return old;
    }
Suppose one program executes ensureCapacity while another is executing set on the same ArrayList object. We could see the following interleaving of their actions:

    /* Program 1 executes: */ newData = new Object[N];
    /* Program 1 executes: */ System.arraycopy(data, 0, newData, 0, count);
    /* Program 2 executes: */ data[k] = x;
    /* Program 1 executes: */ data = newData;

Thus, we lose the value that Program 2 set, because it puts this value into the old value of data after data's contents have been copied to the new, expanded array.
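Real thread interleavings are hard to reproduce on demand, but the effect of this particular one can be replayed deterministically in a single thread. The sketch below uses plain local arrays in a hypothetical class LostUpdateReplay; it is not the ArrayList code itself, and the initial contents, index, and sizes are made-up values chosen only to show the outcome.

```java
class LostUpdateReplay {
    /** Replays the four interleaved steps in order and returns the
     *  value finally found at index K of the list's array. */
    static Object replay() {
        Object[] data = { "a", "b" };   // the list's current array
        int count = 2, k = 1;

        Object[] newData = new Object[4];               // Program 1
        System.arraycopy(data, 0, newData, 0, count);   // Program 1
        data[k] = "x";                                  // Program 2: writes the OLD array
        data = newData;                                 // Program 1

        return data[k];
    }

    public static void main(String[] args) {
        System.out.println(replay());   // prints "b": Program 2's "x" is gone
    }
}
```

The store of "x" lands in an array that is about to be discarded, so the list afterwards still holds "b" at that position.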
To solve the simple problem presented by ArrayList, threads can arrange to access any particular ArrayList in mutual exclusion, that is, in such a way that only one thread at a time operates on the object. Java's synchronized statement provides mutual exclusion, allowing us to produce synchronized (or thread-safe) data structures. Here is part of an example, showing both the use of the synchronized method modifier and equivalent use of the synchronized statement:
    public class SyncArrayList<T> extends ArrayList<T> {

        public void ensureCapacity(int n) {
            synchronized (this) {
                super.ensureCapacity(n);
            }
        }

        public synchronized T set(int k, T x) {
            return super.set(k, x);
        }
        ...
    }

The process of providing such wrapper functions for all methods of a List is sufficiently tedious that the standard Java library class java.util.Collections provides the following method:
    /** A synchronized (thread-safe) view of the list L, in which only
     *  one thread at a time executes any method.  To be effective,
     *  (a) there should be no subsequent direct use of L,
     *  and (b) the returned List must be synchronized upon
     *  during any iteration, as in
     *
     *     List aList = Collections.synchronizedList(new ArrayList());
     *     ...
     *     synchronized (aList) {
     *         for (Iterator i = aList.iterator(); i.hasNext(); )
     *             foo(i.next());
     *     }
     */
    public static <T> List<T> synchronizedList(List<T> L) { ... }

Unfortunately, there is a time cost associated with synchronizing on every operation, which is why the Java library designers decided that Collection and most of its subtypes would not be synchronized. On the other hand, StringBuffers and Vectors are synchronized, and cannot be corrupted by simultaneous use.

10.2 Monitors and Orderly Communication

The objects returned by the synchronizedList method are examples of the simplest kind of monitor. This term refers to an object (or type of object) that controls ("monitors") concurrent access to some data structure so as to make it work correctly. One function of a monitor is to provide mutually exclusive access to the operations of the data structure, where needed. Another is to arrange for synchronization between threads, so that one thread can wait until an object is "ready" to provide it with some service.

Monitors are exemplified by one of the classic examples: the shared buffer or mailbox. A simple version of its public specification looks like this:

    /** A container for a single message (an arbitrary Object).  At any
     *  time, a SmallMailbox is either empty (containing no message) or
     *  full (containing one message). */
    public class SmallMailbox {
        /** When THIS is empty, set its current message to MESSAGE, making
         *  it full. */
        public synchronized void deposit(Object message)
            throws InterruptedException { ... }

        /** When THIS is full, empty it and return its current message. */
        public synchronized Object receive()
            throws InterruptedException { ... }
    }

Since the specifications suggest that either method might have to wait for a new message to be deposited or an old one to be received, we specify both as possibly throwing an InterruptedException, which is the standard Java way to indicate that while we were waiting, some other thread interrupted us. The SmallMailbox specification illustrates the features of a typical monitor:
• None of the modifiable state variables (i.e., fields) are exposed.

• Accesses from separate threads that make any reference to modifiable state are mutually excluded; only one thread at a time holds a lock on a SmallMailbox object.

• A thread may relinquish a lock temporarily and await notification of some change. But changes in the ownership of a lock occur only at well-defined points in the program.
The internal representation is simple:

    private Object message;
    private boolean amFull;

The implementations make use of the primitive Java features for "waiting until notified:"

    public synchronized void deposit(Object message)
        throws InterruptedException
    {
        while (amFull)
            wait();     // Same as this.wait();
        this.message = message; this.amFull = true;
        notifyAll();    // Same as this.notifyAll()
    }

    public synchronized Object receive()
        throws InterruptedException
    {
        while (!amFull)
            wait();
        amFull = false;
        notifyAll();
        return message;
    }

The methods of SmallMailbox allow other threads in only at carefully controlled points: the calls to wait. For example, the loop in deposit means "If there is still old unreceived mail, wait until some other thread receives it and wakes me up again (with notifyAll) and I have managed to lock this mailbox again." From the point of view of a thread that is executing deposit or receive, each call to wait has the effect of causing some change to the instance variables of this: some change, that is, that could be effected by other calls to deposit or receive.
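To see the mailbox at work, here is a small sketch that assembles the pieces above into a complete SmallMailbox class and runs one depositing thread and one receiving thread against it. The driver class MailboxDemo, its run method, and the choice of sending the integers 1 through n are my own illustration and not part of the text.

```java
class SmallMailbox {
    private Object message;
    private boolean amFull;

    public synchronized void deposit(Object message)
            throws InterruptedException {
        while (amFull)
            wait();
        this.message = message; this.amFull = true;
        notifyAll();
    }

    public synchronized Object receive() throws InterruptedException {
        while (!amFull)
            wait();
        amFull = false;
        notifyAll();
        return message;
    }
}

class MailboxDemo {
    /** Sends 1..N through one mailbox and returns the sum received. */
    static int run(int n) throws InterruptedException {
        SmallMailbox box = new SmallMailbox();
        int[] sum = { 0 };              // written only by the consumer thread

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i += 1)
                    box.deposit(i);     // waits whenever the box is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i += 1)
                    sum[0] += (Integer) box.receive();  // waits when empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();  // join makes sum[0] safe to read
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100));   // prints 5050
    }
}
```

However the two threads happen to be scheduled, every message is received exactly once, because each thread proceeds only when the mailbox is in the state it needs.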
As long as the threads of a program are careful to protect all their data in monitors in this fashion, they will avoid the sorts of bizarre interaction described at the beginning of §10.1. Of course, there is no such thing as a free lunch; the use of locking can lead to the situation known as deadlock in which two or more threads wait for each other indefinitely, as in this artificial example:

    class Communicate {
        static SmallMailbox
            box1 = new SmallMailbox(),
            box2 = new SmallMailbox();
    }

    // Thread #1:                        | // Thread #2:
    m1 = Communicate.box1.receive();     | m2 = Communicate.box2.receive();
    Communicate.box2.deposit(msg1);      | Communicate.box1.deposit(msg2);

Since neither thread sends anything before trying to receive a message from its box, both threads wait for each other (the problem could be solved by having one of the two threads reverse the order in which it receives and deposits).
10.3 Message Passing

Monitors provide a disciplined way for multiple threads to access data without stumbling over each other. Lurking behind the concept of a monitor is a simple idea:

    Thinking about multiple programs executing simultaneously is hard, so don’t do it! Instead, write a bunch of one-thread programs, and have them exchange data with each other.

In the case of general monitors, “exchanging data” means setting variables that each can see. If we take the idea further, we can instead define “exchanging data” as “reading input and writing output.” We get a concurrent programming discipline called message passing.

In the message-passing world, threads are independent sequential programs that send each other messages. They read and write messages using methods that correspond to read on Java Readers, or print on Java PrintStreams. As a result, one thread is affected by another only when it bothers to “read its messages.”

We can get the effect of message passing by writing our threads to perform all interaction with each other by means of mailboxes. That is, the threads share some set of mailboxes, but share no other modifiable objects or variables (unmodifiable objects, like Strings, are fine to share).
Exercises

10.1. Give a possible implementation for the Collections.synchronizedList static method in §10.1.
Chapter 11

Pseudo-Random Sequences

Random sequences of numbers have a number of uses in simulation, game playing, cryptography, and efficient algorithm development. The term “random” is rather difficult to define. For most of our purposes, we really don’t need to answer the deep philosophical questions, since our needs are generally served by sequences that display certain statistical properties. This is a good thing, because truly “random” sequences in the sense of “unpredictable” are difficult to obtain quickly, and programmers generally resort, therefore, to pseudo-random sequences. These are generated by some formula, and are therefore predictable in principle. Nevertheless, for many purposes, such sequences are acceptable, if they have the desired statistics.

We commonly use sequences of integers or floating-point numbers that are uniformly distributed throughout some interval; that is, if one picks a number (truly) at random out of the sequence, the probability that it is in any set of numbers from the interval is proportional to the size of that set. It is relatively easy to arrange that a sequence of integers in some interval has this particular property: simply enumerate a permutation of the integers in that interval over and over. Each integer is enumerated once per repetition, and so the sequence is uniformly distributed. Of course, having described it like this, it becomes even more apparent that the sequence is anything but “random” in the informal sense of this term. Nevertheless, when the interval of integers is large enough, and the permutation “jumbled” enough, it is hard to tell the difference. The rest of this Chapter will deal with generating sequences of this sort.
11.1 Linear congruential generators

Perhaps the most common pseudo-random-number generators use the following recurrence.

$$X_n = (aX_{n-1} + c) \bmod m, \qquad (11.1)$$

where $X_n \ge 0$ is the $n$th integer in the sequence, and $a, m > 0$ and $c \ge 0$ are integers. The seed value, $X_0$, may be any value such that $0 \le X_0 < m$. When $m$ is a power of two, the $X_n$ are particularly easy to compute, as in the following Java class.
    /** A generator of pseudo-random numbers in the range 0 .. 2^31 - 1. */
    class Random1 {
        private int randomState;
        static final int
            a = ...,
            c = ...;

        Random1(int seed) { randomState = seed; }

        int nextInt() {
            randomState = (a * randomState + c) & 0x7fffffff;
            return randomState;
        }
    }

Here, $m$ is $2^{31}$. The ‘&’ operation computes mod $2^{31}$ [why?]. The result can be any non-negative integer. If we change the calculation of randomState to

    randomState = a * randomState + c;

then the computation is implicitly done modulo $2^{32}$, and the results are integers in the range $-2^{31}$ to $2^{31} - 1$.

The question to ask now is how to choose $a$ and $c$ appropriately. Considerable analysis has been devoted to this question¹. Here, I’ll just summarize. I will restrict the discussion to the common case of $m = 2^w$, where $w > 2$ is typically the word size of the machine (as in the Java code above). The following criteria for $a$ and $c$ are desirable.

¹ For details, see D. E. Knuth, Seminumerical Algorithms (The Art of Computer Programming, volume 2), second edition, Addison-Wesley, 1981.

1. In order to get a sequence that has maximum period, that is, one that cycles through all integers between 0 and $m - 1$ (or in our case, $-m/2$ to $m/2 - 1$), it is necessary and sufficient that $c$ and $m$ be relatively prime (have no common factors other than 1), and that $a$ have the form $4k + 1$ for some integer $k$.
2. A very low value of $a$ is easily seen to be undesirable (the resulting sequence will show a sort of sawtooth behavior). It is desirable for $a$ to be reasonably large relative to $m$ (Knuth, for example, suggests a value between $0.01m$ and $0.99m$) and have no “obvious pattern” to its binary digits.
3. It turns out that values of $a$ that display low potency (defined as the minimal value of $s$ such that $(a - 1)^s$ is divisible by $m$) are not good. Since $a - 1$ must be divisible by 4 (see item 1 above), the best we can do is to ensure that $(a - 1)/4$ is not even; that is, $a \bmod 8 = 5$.
4. Under the conditions above, $c = 1$ is a suitable value.
5. Finally, although most arbitrarily-chosen values of $a$ satisfying the above conditions work reasonably well, it is generally preferable to apply various statistical tests (see Knuth) just to make sure.

For example, when $m = 2^{32}$, some good choices for $a$ are 1566083941 (which Knuth credits to Waterman) and 1664525 (credited to Lavaux and Janssens).
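The criteria can be checked mechanically on a small scale. The sketch below uses $a = 1664525$ and $c = 1$ from the text; the class names Lcg32 and LcgPeriod, and the choice of a scaled-down 16-bit modulus for the exhaustive period check, are assumptions made here so that the check runs quickly.

```java
/** Linear congruential generator with m = 2^32 (sketch). */
class Lcg32 {
    static final int A = 1664525;   // A mod 8 == 5, as criterion 3 asks
    static final int C = 1;         // odd, hence relatively prime to 2^32
    private int state;

    Lcg32(int seed) { state = seed; }

    /** Next value, in the range -2^31 .. 2^31 - 1. */
    int nextInt() {
        state = A * state + C;      // int arithmetic wraps, i.e. works mod 2^32
        return state;
    }
}

/** Hypothetical helper for measuring periods with a small modulus. */
class LcgPeriod {
    /** Number of steps until x -> (a*x + c) mod 2^w first returns to
     *  SEED, or -1 if it has not returned within 2^w steps. */
    static long period(long a, long c, int w, long seed) {
        long m = 1L << w;
        long x = seed;
        for (long steps = 1; steps <= m; steps += 1) {
            x = (a * x + c) & (m - 1);
            if (x == seed)
                return steps;
        }
        return -1;
    }

    public static void main(String[] args) {
        // With m = 2^16, c odd, and a of the form 4k+1, every value appears.
        System.out.println(period(1664525, 1, 16, 0));
    }
}
```

With $c$ odd and $a$ of the form $4k + 1$, the measured period equals the modulus, as criterion 1 predicts.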
There are also bad choices of parameters, of which the most famous is one that was part of the IBM FORTRAN library for some time: RANDU, which had $m = 2^{31}$, $X_0$ odd, $c = 0$, and $a = 65539$. This does not have maximum period, of course (it skips all even numbers). Moreover, if you take the numbers three at a time and consider them as points in space, the set of points is restricted to a relatively few widely-spaced planes: strikingly bad behavior.

The Java library has a class java.util.Random similar to Random1. It takes $m = 2^{48}$, $a = 25214903917$, and $c = 11$ to generate long quantities in the range 0 to $2^{48} - 1$, which doesn’t quite satisfy Knuth’s criterion 2. I haven’t checked to see how good it is. There are two ways to initialize a Random: either with a specific “seed” value, or with the current value of the system timer (which on UNIX systems gives a number of milliseconds since some time in 1970), a fairly common way to get an unpredictable starting value. It’s important to have both: for games or encryption, unpredictability is useful. The first constructor, however, is also important because it makes it possible to reproduce results.
11.2 Additive Generators

One can get very long periods, and avoid multiplications (which can be a little expensive for Java long quantities) by computing each successive output, $X_n$, as a sum of selected previous outputs: $X_{n-k}$ for several fixed values of $k$. Here’s an instance of this scheme that apparently works quite well²:

$$X_n = (X_{n-24} + X_{n-55}) \bmod m, \quad \text{for } n \ge 55 \qquad (11.2)$$

where $m = 2^e$ for some $e$. We initially choose some “random” seed values for $X_0$ to $X_{54}$. This has a large period of $2^f(2^{55} - 1)$ for some $0 \le f < e$. That is, although numbers it produces must repeat before then (since there are only $2^e$ of them, and $e$ is typically something like 32), they won’t repeat in the same pattern.

² Knuth credits this to unpublished work of G. J. Mitchell and D. P. Moore in 1958.

Implementing this scheme gives us another nice opportunity to illustrate the circular buffer (see §4.5). Keep your eye on the array state in the following:
    class Random2 {
        /** state[k] will hold X_k, X_{k+55}, X_{k+110}, ... */
        private int[] state = new int[55];
        /** nm will hold n mod 55 after each call to nextInt.
         *  Initially n = 55. */
        private int nm;

        public Random2(...) {
            initialize state[0..54] to values for X_0 to X_54;
            nm = -1;
        }

        public int nextInt() {
            nm = mod55(nm + 1);
            int k24 = mod55(nm - 24);
            // Now state[nm] is X_{n-55} and state[k24] is X_{n-24}.
            return state[nm] += state[k24];
            // Now state[nm] (just returned) represents X_n.
        }

        private int mod55(int x) {
            return (x >= 55) ? x - 55 : (x < 0) ? x + 55 : x;
        }
    }

Other values than 24 and 55 will also produce pseudo-random streams with good characteristics. See Knuth.

11.3 Other distributions

11.3.1 Changing the range

The linear congruential generators above give us pseudo-random numbers in some fixed range. Typically, we are really interested in some other, smaller, range of numbers instead. Let’s first consider the case where we want a sequence, $Y_i$, of integers uniformly distributed in a range 0 to $m' - 1$ and are given pseudo-random integers, $X_i$, in the range 0 to $m - 1$, with $m > m'$. A possible transformation is

$$Y_i = \left\lfloor \frac{m'}{m} X_i \right\rfloor,$$

which results in numbers that are reasonably evenly distributed as long as $m \gg m'$. From this, it is easy to get a sequence of pseudo-random integers evenly distributed in the range $L \le Y'_i < U$:

$$Y'_i = L + \left\lfloor \frac{U - L}{m} X_i \right\rfloor$$
It might seem that

$$Y_i = X_i \bmod m' \qquad (11.3)$$

is a more obvious formula for $Y_i$. However, it has problems when $m'$ is a small power of two and we are using a linear congruential generator as in Equation 11.1, with $m$ a power of 2. For such a generator, the last $k$ bits of $X_i$ have a period of $2^k$ [why?], and thus so will $Y_i$. Equation 11.3 works much better if $m'$ is not a power of 2.
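The two range transformations, and the short period of the low-order bits, can be observed directly. This is a sketch under stated assumptions: the class RangeDemo and its method names are hypothetical, the generator parameters $a = 1664525$, $c = 1$, $m = 2^{32}$ are borrowed from §11.1, and the scaling is done in long arithmetic on the assumption that the product of $m'$ and $X_i$ fits in 64 bits.

```java
/** Hypothetical helpers illustrating range changes for generator output. */
class RangeDemo {
    /** floor((mPrime / m) * x) for 0 <= x < m, computed exactly in longs.
     *  Assumes mPrime * x does not overflow a long. */
    static long scale(long x, long m, long mPrime) {
        return (mPrime * x) / m;
    }

    /** Maps x in 0 .. m-1 to a value y with lo <= y < hi. */
    static long inRange(long x, long m, long lo, long hi) {
        return lo + scale(x, m, hi - lo);
    }

    /** The low-order BITS bits of the first COUNT outputs of the
     *  generator x -> (a*x + c) mod 2^32, starting from SEED. */
    static int[] lowBits(int a, int c, int seed, int bits, int count) {
        int[] out = new int[count];
        int x = seed;
        for (int i = 0; i < count; i += 1) {
            x = a * x + c;                    // wraps mod 2^32
            out[i] = x & ((1 << bits) - 1);   // same as x mod 2^bits
        }
        return out;
    }

    public static void main(String[] args) {
        // The lowest bit of this generator simply alternates.
        System.out.println(java.util.Arrays.toString(
            lowBits(1664525, 1, 0, 1, 6)));
    }
}
```

Taking the outputs mod 2 or mod 4 therefore yields sequences that repeat after 2 or 4 steps, whereas the scaling transformation depends mostly on the high-order bits.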
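The nextInt method in the class java.util.Random produces its 32-bit result from a 48-bit state by dividing by $2^{16}$ (shifting right by 16 binary places), which gets converted to an int in the range $-2^{31}$ to $2^{31} - 1$. The nextLong method produces a 64-bit result by calling nextInt twice (the cast to long is needed so that the shift is performed on a 64-bit quantity):

    ((long) nextInt() << 32) + nextInt();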
11.3.2 Non-uniform distributions

So far, we have discussed only uniform distributions. Sometimes that isn’t what we want. In general, assume that we want to pick a number $Y$ in some range $u_l$ to $u_h$ so that³

$$\Pr[Y \le y] = P(y),$$

where $P$ is the desired distribution function; that is, it is a non-decreasing function with $P(y) = 0$ for $y < u_l$ and $P(y) = 1$ for $y \ge u_h$. The idea of what we must do is illustrated in Figure 11.1, which shows a graph of a distribution $P$. The key observation is that the desired probability of $Y$ being no greater than $y_0$, $P(y_0)$, is the same as the probability that a uniformly distributed random number $X$ on the interval 0 to 1 is less than or equal to $P(y_0)$. Suppose, therefore, that we had an inverse function $P^{-1}$ so that $P(P^{-1}(x)) = x$. Then,

$$\Pr[P^{-1}(X) \le y] = \Pr[X \le P(y)] = P(y).$$

In other words, we can define

$$Y = P^{-1}(X)$$

as the desired random variable.

All of this is straightforward when $P$ is strictly increasing. However, we have to exercise care when $P$ is not invertible, which happens when $P$ does not strictly increase (i.e., it has “plateaus” where its value does not change). If $P(y)$ has a constant value between $y_0$ and $y_1$, this means that the probability that $Y$ falls between these two values is 0. Therefore, we can uniquely define $P^{-1}(x)$ as the smallest $y$ such that $P(y) \ge x$.
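As an illustration of $Y = P^{-1}(X)$, consider a distribution whose inverse has a closed form. The exponential distribution used below is an example chosen here, not one taken from the text, and the class and method names are hypothetical.

```java
import java.util.Random;

/** Sketch of inverse-transform sampling for an assumed example
 *  distribution, P(y) = 1 - e^(-rate*y) for y >= 0. */
class InverseTransform {
    /** The distribution function P(y). */
    static double distribution(double y, double rate) {
        return y < 0 ? 0.0 : 1.0 - Math.exp(-rate * y);
    }

    /** Its inverse: the y >= 0 for which P(y) = x, where 0 <= x < 1. */
    static double inverseP(double x, double rate) {
        return -Math.log(1.0 - x) / rate;
    }

    /** One sample Y = inverseP(X), with X uniform on [0, 1). */
    static double sample(Random r, double rate) {
        return inverseP(r.nextDouble(), rate);
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        for (int i = 0; i < 5; i += 1)
            System.out.println(sample(r, 2.0));
    }
}
```

Since this $P$ is strictly increasing on $y \ge 0$, no special treatment of plateaus is needed here.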
Unfortunately, inverting a continuous distribution (that is, one in which $Y$ ranges, at least ideally, over some interval of real numbers) is not always easy to do. There are various tricks; as usual, the interested reader is referred to Knuth for details. In particular, Java uses one of his algorithms (the polar method of Box, Muller, and Marsaglia) to implement the nextGaussian method in java.util.Random, which returns normally distributed values (i.e., the “bell curve” density) with a mean value of 0 and standard deviation of 1.

³ The notation $\Pr[E]$ means “the probability that situation $E$ (called an event) is true.”

Figure 11.1: A typical non-uniform distribution, illustrating how to convert a uniformly distributed random variable into one governed by an arbitrary distribution, $P(y)$. The probability that $y$ is less than $y_0$ is the same as the probability that a uniformly distributed random variable on the interval 0 to 1 is less than or equal to $P(y_0)$.
11.3.3 Finite distributions

There is a simpler common case: that in which $Y$ is to range over a finite set, say the integers from 0 to $u$, inclusive. We are typically given the probabilities $p_i = \Pr[Y = i]$. In the interesting case, the distribution is not uniform, and hence the $p_i$ are not necessarily all $1/(u + 1)$. The relationship between these $p_i$ and $P(i)$ is

$$P(i) = \Pr[Y \le i] = \sum_{0 \le k \le i} p_k.$$

The obvious technique for computing the inverse $P^{-1}$ is to perform a lookup on a table representing the distribution $P$. To compute a random $i$ satisfying the desired conditions, we choose a random $X$ in the range 0–1, and return the first $i$ such that $X \le P(i)$. This works because we return $i$ iff $P(i - 1) < X \le P(i)$ (taking $P(-1) = 0$). The distance between $P(i - 1)$ and $P(i)$ is $p_i$, and since $X$ is uniformly distributed across 0 to 1, the probability of getting a point in this interval is equal to the size of the interval, $p_i$.

For example, if 1/12 of the time we want to return 0, 1/2 the time we want to return 1, 1/3 of the time we want to return 2, and 1/12 of the time we want to return 3, we return the index of the first element of table $PT$ that is greater than or equal to a random $X$ chosen uniformly on the interval 0 to 1, where $PT$ is defined to have $PT[0] = 1/12$, $PT[1] = 7/12$, $PT[2] = 11/12$, and $PT[3] = 1$.
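The table lookup for this example might be sketched as follows. The class name TableLookup and the use of a linear scan are assumptions made for illustration; a binary search would serve equally well for large tables.

```java
import java.util.Random;

/** Sketch of sampling a finite distribution by table lookup. */
class TableLookup {
    /** PT[i] = p_0 + ... + p_i for the probabilities 1/12, 1/2, 1/3, 1/12. */
    static final double[] PT = { 1.0 / 12, 7.0 / 12, 11.0 / 12, 1.0 };

    /** Index of the first entry of TABLE that is >= X, where
     *  0 <= X <= 1 and the last entry of TABLE is 1. */
    static int lookup(double[] table, double x) {
        for (int i = 0; i < table.length - 1; i += 1)
            if (x <= table[i])
                return i;
        return table.length - 1;
    }

    /** A random index distributed according to PT. */
    static int next(Random r) {
        return lookup(PT, r.nextDouble());
    }
}
```

Each call costs time proportional to the table size in the worst case, which is what motivates the faster method described next.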
Oddly enough, there is a faster way of doing this computation for large $u$, discovered by A. J. Walker⁴. Imagine the numbers between 0 and $u$ as labels on $u + 1$ beakers, each of which can contain $1/(u + 1)$ units of liquid. Imagine further that we have $u + 1$ vials of colored liquids, also numbered 0 to $u$, each of a different color and all immiscible in each other; we’ll use the integer $i$ as the name of the color in vial number $i$. The total amount of liquid in all the vials is 1 unit, but the vials may contain different amounts. These amounts correspond to the desired probabilities of picking the numbers 0 through $u$.

⁴ Knuth’s citations are Electronics Letters 10, 8 (1974), 127–128 and ACM Transactions on Mathematical Software, 3 (1976), 253–256.

Suppose that we can distribute the liquid from the vials to the beakers so that

• Beaker number $i$ contains two colors of liquid (the quantity of one of the colors, however, may be 0), and
• One of the colors of liquid in beaker $i$ is color number $i$.

Then we can pick a number from 0 to $u$ with the desired probabilities by the following procedure.

• Pick a random floating-point number, $X$, uniformly in the range $0 \le X < u + 1$. Let $K$ be the integer part of this number and $F$ the fractional part, so that $K + F = X$, $F < 1$, and $K, F \ge 0$.
• If the amount of liquid of color $K$ in beaker $K$ (measured as a fraction of the beaker’s capacity) is greater than or equal to $F$, then return $K$. Otherwise return the number of the other color in beaker $K$.

A little thought should convince you that the probability of picking color $i$ under this scheme is proportional to the amount of liquid of color $i$. The number $K$ represents a randomly-chosen beaker, and $F$ represents a randomly-chosen point along the side of that beaker. We choose the color we find at this randomly chosen point. We can represent this selection process with two tables indexed by $K$: $Y_K$ is the color of the other liquid in beaker $K$ (i.e., besides color $K$ itself), and $H_K$ is the height of the liquid with color $K$ in beaker $K$ (as a fraction of the distance to the top gradation of the beaker).

For example, consider the probabilities given previously; an appropriate distribution of liquid is illustrated in Figure 11.2. The tables corresponding to this figure are $Y = [1, 2, -, 1]$ ($Y_2$ doesn’t matter in this case), and $H = [0.3333, 0.6667, 1.0, 0.3333]$.

The only remaining problem is to perform the distribution of liquids to beakers, for which the following procedure suffices (in outline):
    /** S is a set of integers that are the names of beakers and
     *  vial colors. Assumes that all the beakers named in S are
     *  empty and have equal capacity, and the total contents of the vials
     *  named in S is equal to the total capacity of the beakers in
     *  S. Fills the beakers in S from the vials in S so that
     *  each beaker contains liquid from no more than two vials and the
     *  beaker named s contains liquid of color s. */
    void fillBeakers(SetOfIntegers S)
    {
        if (S is empty)
            return;

        v0 = the color of a vial in S with the least liquid;
        Pour the contents of vial v0 into beaker v0;
        /* The contents must fit in the beaker, because since v0
         * contains the least fluid, it must have no more than the
         * capacity of a single beaker. Vial v0 is now empty. */

        v1 = the color of a vial in S with the most liquid;
        Fill beaker v0 the rest of the way from vial v1;
        /* If |S| = 1 so that v0 = v1, this is the null operation.
         * Otherwise, v0 != v1 and vial v1 must contain at
         * least as much liquid as each beaker can contain. Thus, beaker
         * v0 is filled by this step. (NOTE: |S| is the
         * cardinality of S.) */

        fillBeakers(S − {v0});
    }

The action of “pouring the contents of vial $v_0$ into beaker $v_0$” corresponds to setting $H_{v_0}$ to the ratio between the amount of liquid in vial $v_0$ and the capacity of beaker $v_0$. The action of “filling beaker $v_0$ the rest of the way from vial $v_1$” corresponds to setting $Y_{v_0}$ to $v_1$.

Figure 11.2: An example dividing probabilities (colored liquids) into beakers. Each beaker holds 1/4 unit of liquid. There is 1/12 unit of 0-colored liquid, 1/2 unit of 1-colored liquid, 1/3 unit of 2-colored liquid, and 1/12 unit of 3-colored liquid.
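The outline above can be turned into running code along the following lines. This is a sketch, not Walker’s published algorithm verbatim: the class name AliasTable, the iterative (rather than recursive) filling loop, the quadratic-time search for the fullest and emptiest vials, and the method implied (used only for checking) are all choices made here for brevity.

```java
import java.util.Random;

/** Sketch of the beaker-and-vial tables H and Y and their use. */
class AliasTable {
    final double[] H;   // H[k]: fraction of beaker k filled with color k
    final int[] Y;      // Y[k]: the other color in beaker k

    /** Builds the tables for probabilities P[0..u], which must sum to 1. */
    AliasTable(double[] p) {
        int n = p.length;
        double capacity = 1.0 / n;
        double[] vial = p.clone();          // liquid remaining in each vial
        boolean[] filled = new boolean[n];  // beakers already removed from S
        H = new double[n];
        Y = new int[n];
        for (int step = 0; step < n; step += 1) {
            int v0 = -1, v1 = -1;
            for (int v = 0; v < n; v += 1) {
                if (filled[v]) continue;
                if (v0 < 0 || vial[v] < vial[v0]) v0 = v;   // least liquid
                if (v1 < 0 || vial[v] > vial[v1]) v1 = v;   // most liquid
            }
            H[v0] = Math.min(1.0, vial[v0] / capacity);
            Y[v0] = v1;
            vial[v1] -= capacity - vial[v0];  // top up beaker v0 from vial v1
            filled[v0] = true;
        }
    }

    /** A random color. Uses F < H[K] rather than F <= H[K] so that a
     *  color whose height is 0 is never chosen from its own beaker. */
    int next(Random r) {
        double x = r.nextDouble() * H.length;
        int k = (int) x;
        double f = x - k;
        return f < H[k] ? k : Y[k];
    }

    /** The probability of color I implied by the tables (for checking). */
    double implied(int i) {
        double total = H[i];
        for (int k = 0; k < H.length; k += 1)
            if (k != i && Y[k] == i)
                total += 1.0 - H[k];
        return total / H.length;
    }
}
```

Once the tables are built, each call to next takes constant time regardless of $u$, in contrast to the table search of the previous method.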
11.4 Random permutations and combinations

Given a set of $N$ values, consider the problem of selecting a random sequence without replacement of length $M$ from the set. That is, we want a random sequence of $M$ values from among these $N$, where each value occurs in the sequence no more than once. By “random sequence” we mean that all possible sequences are equally likely⁵. If we assume that the original values are stored in an array, then the following is a very simple way of obtaining such a sequence.

    /** Permute A so as to randomly select M of its elements,
     *  placing them in A[0] .. A[M-1], using R as a source of
     *  random numbers. */
    static void selectRandomSequence(SomeType[] A, int M, Random1 R)
    {
        int N = A.length;
        for (int i = 0; i < M; i += 1)
            swap(A, i, R.randInt(i, N-1));
    }

Here, we assume swap(V, j, k) exchanges the values of V[j] and V[k], and that R.randInt(lo, hi) returns an integer chosen uniformly from lo to hi, inclusive.

For example, if DECK[0] is A♣, DECK[1] is 2♣, ..., and DECK[51] is K♠, then

    selectRandomSequence(DECK, 52, new Random());

shuffles the deck of cards.
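A runnable variant of this procedure, specialized to arrays of int and to java.util.Random, might look like the following. The class name PartialShuffle and the helper shuffledDeck are hypothetical; the expression i + r.nextInt(n - i) stands in for the assumed randInt(i, N-1).

```java
import java.util.Random;

/** Sketch: select a random sequence by partially shuffling an array. */
class PartialShuffle {
    /** Permutes A so that a[0..m-1] is a random selection of M of its
     *  elements, using R as a source of random numbers. */
    static void selectRandomSequence(int[] a, int m, Random r) {
        int n = a.length;
        for (int i = 0; i < m; i += 1) {
            int j = i + r.nextInt(n - i);   // uniform in i .. n-1
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    /** A shuffled deck of card numbers 0 .. 51. */
    static int[] shuffledDeck(long seed) {
        int[] deck = new int[52];
        for (int i = 0; i < 52; i += 1)
            deck[i] = i;
        selectRandomSequence(deck, 52, new Random(seed));
        return deck;
    }
}
```

Supplying a fixed seed makes the shuffle reproducible, which is convenient for testing.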
This technique works, but if $M \ll N$, it is not a terribly efficient use of space, at least when the contents of the array A is something simple, like the integers between 0 and $N - 1$. For that case, we can better use some algorithms due to Floyd (names of types and functions are meant to make them self-explanatory).

⁵ Here, I’ll assume that the original set contains no duplicate values. If it does, then we have to treat the duplicates as if they were all different. In particular, if there are $k$ duplicates of a value in the original set, it may appear up to $k$ times in the selected sequence.
    /** Returns a random sequence of M distinct integers from 0 .. N-1,
     *  with all possible sequences equally likely. Assumes 0 <= M <= N. */
    static SequenceOfIntegers selectRandomIntegers(int N, int M, Random1 R)
    {
        SequenceOfIntegers S = new SequenceOfIntegers();
        for (int i = N-M; i < N; i += 1) {
            int s = R.randInt(0, i);
            if (s ∈ S)
                insert i into S after s;
            else
                prefix s to the front of S;
        }
        return S;
    }

This procedure produces all possible sequences with equal probability because every possible sequence of values for s generates a distinct value of S, and all such sequences are equally probable.

Sanity check: the number of ways to select a sequence of $M$ objects from a set of $N$ objects is

$$\frac{N!}{(N - M)!}$$

and the number of possible sequences of values for s is equal to the number of possible values of R.randInt(0,N-M) times the number of possible values of R.randInt(0,N-M+1), etc., which is

$$(N - M + 1)(N - M + 2) \cdots N = \frac{N!}{(N - M)!}.$$

By replacing the SequenceOfIntegers with a set of integers, and replacing “prefix” and “insert” with simply adding to a set, we get an algorithm for selecting combinations of $M$ numbers from the first $N$ integers (i.e., where order doesn’t matter).

The Java standard library provides two static methods in the class java.util.Collections for randomly permuting an arbitrary List:

    /** Permute L, using R as a source of randomness. As a result,
     *  calling shuffle twice with values of R that produce identical
     *  sequences will give identical permutations. */
    public static void shuffle(List<?> L, Random R) { ··· }

    /** Same as shuffle(L, D), where D is a default Random value. */
    public static void shuffle(List<?> L) { ··· }

This takes linear time if the list supports fast random access.
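Floyd’s sequence-selection procedure can be made concrete with standard library types. In the sketch below, the class name FloydSample is hypothetical, an ArrayList stands in for SequenceOfIntegers, and r.nextInt(i + 1) stands in for R.randInt(0, i). The linear-time indexOf is used only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of Floyd's algorithm for a random sequence of distinct integers. */
class FloydSample {
    /** Returns a random sequence of M distinct integers from 0 .. N-1.
     *  Assumes 0 <= M <= N. */
    static List<Integer> selectRandomIntegers(int n, int m, Random r) {
        List<Integer> seq = new ArrayList<>();
        for (int i = n - m; i < n; i += 1) {
            int s = r.nextInt(i + 1);       // uniform in 0 .. i
            int pos = seq.indexOf(s);
            if (pos >= 0)
                seq.add(pos + 1, i);        // insert i into seq after s
            else
                seq.add(0, s);              // prefix s to the front of seq
        }
        return seq;
    }
}
```

The value i can never already be in the sequence when it is inserted, since every earlier element is smaller than i; this is what keeps the selected integers distinct.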
Chapter 12

Graphs

When the term is used in computer science, a graph is a data structure that represents a mathematical relation. It consists of a set of vertices (or nodes) and a set of edges, which are pairs of vertices¹. These edge pairs may be unordered, in which case we have an undirected graph, or they may be ordered, in which case we have a directed graph (or digraph) in which each edge leaves, exits, or is out of one vertex and enters or is into the other. For vertices $v$ and $w$ we denote a general edge between $v$ and $w$ as $(v, w)$, or $\{v, w\}$ if we specifically want to indicate an undirected edge, or $[v, w]$ if we specifically want to indicate a directed edge that leaves $v$ and enters $w$. An edge $(v, w)$ is said to be incident on its two ends, $v$ and $w$; if $(v, w)$ is undirected, we say that $v$ and $w$ are adjacent vertices. The degree of a vertex is the number of edges incident on it. For a vertex in a directed graph, the in-degree is the number of edges that enter it and the out-degree is the number that leave. Usually, the ends of an edge will be distinct; that is, there will be no reflexive edges from a vertex to itself.

A subgraph of a graph $G$ is simply a graph whose vertices and edges are subsets of the vertices and edges of $G$.
A path of length $k \ge 0$ in a graph from vertex $v$ to vertex $v'$ is a sequence of vertices $v_0, v_1, \ldots, v_k$ with $v = v_0$, $v' = v_k$, with all the $(v_i, v_{i+1})$ edges being in the graph. This definition applies both to directed and undirected graphs; in the case of directed graphs, the path has a direction. The path is simple if there are no repetitions of vertices in it. It is a cycle if $k > 1$ and $v = v'$, and a simple cycle if $v_0, \ldots, v_{k-1}$ are distinct; in an undirected graph, a cycle must additionally not follow the same edge twice. A graph with no cycles is called acyclic.

If there is a path from $v$ to $v'$, then $v'$ is said to be reachable from $v$. In an undirected graph, a connected component is a set of vertices from the graph and all edges incident on those vertices such that each vertex is reachable from any given vertex, and no other vertex from the graph is reachable from any vertex in the set. An undirected graph is connected if it contains exactly one connected component (containing all vertices in the graph).
¹ Definitions in this section are taken from Tarjan, Data Structures and Network Algorithms, SIAM, 1983.

In a directed graph, the connected components contain the same sets of vertices that you would get by replacing all directed edges by undirected ones. A subgraph of a directed graph in which every vertex can be reached from every other is called a strongly connected component. Figures 12.1 and 12.2 illustrate these definitions.

A free tree is a connected, undirected, acyclic graph (which implies that there is exactly one simple path from any node to any other). An undirected graph is biconnected if there are at least two simple paths between any two nodes.

For some applications, we associate information with the edges of a graph. For example, if vertices represent cities and edges represent roads, we might wish to associate distances with the edges. Or if vertices represent pumping stations and edges represent pipelines, we might wish to associate capacities with the edges. We’ll call numeric information of this sort weights.

12.1 A Programmer’s Specification

There isn’t an obvious single class specification that one might give for programs dealing with graphs, because variations in what various algorithms need can have a profound effect on the appropriate representations and what operations those representations conveniently support. For instructional use, however, Figure 12.3 gives a sample “one-size-fits-all” abstraction for general directed graphs, and Figure 12.4 does the same for undirected graphs. The idea is that vertices and edges are identified by non-negative integers. Any additional data that one wants to associate with a vertex or edge, such as a more informative label or a weight, can be added “on the side” in the form of additional arrays indexed by vertex or edge number.

Figure 12.1: An undirected graph. The starred edge is incident on vertices 1 and 2. Vertex 4 has degree 0; 3, 7, 8, and 9 have degree 1; 1, 2 and 6 have degree 2; and 0 and 5 have degree 3. The dashed lines surround the connected components; since there is more than one, the graph is unconnected. The sequence [2,1,0,3] is a path from vertex 2 to vertex 3. The path [2,1,0,2] is a cycle. The only path involving vertex 4 is the 0-length path [4]. The rightmost connected component is acyclic, and is therefore a free tree.
Figure 12.2: A directed graph. The dashed circles show connected components. Nodes 5, 6 and 7 form a strongly connected component. The other strongly connected components are the remaining individual nodes. The left component is acyclic. Nodes 0 and 4 have an in-degree of 0; nodes 1, 2, and 5–8 have an in-degree of 1; and node 3 has an in-degree of 3. Nodes 3 and 8 have out-degrees of 0; 1, 2, 4, 5, and 7 have out-degrees of 1; and 0 and 6 have out-degrees of 2.

12.2 Representing graphs

Graphs have numerous representations, all tailored to the operations that are critical to some application.

12.2.1 Adjacency Lists

If the operations succ, pred, leaving, and entering for directed graphs are important to one’s problem (or incident and adjacent for undirected graphs), then it may be convenient to associate a list of predecessors, successors, or neighbors with each vertex: an adjacency list. There are many ways to represent such things, as a linked list, for example. Figure 12.5 shows a method that uses arrays in such a way as to allow a programmer both to sequence easily through the neighbors of a vertex in a directed graph, and to sequence through the set of all edges. I’ve included only a couple of indicative operations to show how the data structure works. It is essentially a set of linked list structures implemented with arrays and integers instead of objects containing pointers. Figure 12.6 shows an example of a particular directed graph and the data structures that would represent it.

Another variation on essentially the same structure is to introduce separate types for vertices and edges. Vertices and Edges would then contain fields such as those of the classes Vertex and Edge that follow Figure 12.6.
    /** A general directed graph. For any given concrete extension of this
     *  class, a different subset of the operations listed will work. For
     *  uniformity, we take all vertices to be numbered with integers
     *  between 0 and N-1. */
    public interface Digraph {
        /** Number of vertices. Vertices are labeled 0 .. numVertices()-1. */
        int numVertices();
        /** Number of edges. Edges are numbered 0 .. numEdges()-1. */
        int numEdges();

        /** The vertices that edge E leaves and enters. */
        int leaves(int e);
        int enters(int e);

        /** True iff [v0,v1] is an edge in this graph. */
        boolean isEdge(int v0, int v1);

        /** The out-degree and in-degree of vertex #V. */
        int outDegree(int v);
        int inDegree(int v);

        /** The number of the Kth edge leaving vertex V, 0<=K<outDegree(V). */
        int leaving(int v, int k);
        /** The number of the Kth edge entering vertex V, 0<=K<inDegree(V). */
        int entering(int v, int k);

        /** The Kth successor of vertex V, 0<=K<outDegree(V). It is intended
         *  that succ(v,k) = enters(leaving(v,k)). */
        int succ(int v, int k);
        /** The Kth predecessor of vertex V, 0<=K<inDegree(V). It is intended
         *  that pred(v,k) = leaves(entering(v,k)). */
        int pred(int v, int k);

        /** Add M initially unconnected vertices to this graph. */
        void addVertices(int M);
        /** Add an edge from V0 to V1. */
        void addEdge(int v0, int v1);

        /** Remove all edges incident on vertex V from this graph. */
        void removeEdges(int v);
        /** Remove edge (v0, v1) from this graph */
        void removeEdge(int v0, int v1);
    }

Figure 12.3: A sample abstract directed-graph interface in Java.
    /** A general undirected graph. For any given concrete extension of
     *  this class, a different subset of the operations listed will work.
     *  For uniformity, we take all vertices to be numbered with integers
     *  between 0 and N-1. */
    public interface Graph {
        /** Number of vertices. Vertices are labeled 0 .. numVertices()-1. */
        int numVertices();
        /** Number of edges. Edges are numbered 0 .. numEdges()-1. */
        int numEdges();

        /** The vertices on which edge E is incident. node0 is the
         *  smaller-numbered vertex. */
        int node0(int e);
        int node1(int e);

        /** True iff vertices V0 and V1 are adjacent. */
        boolean isEdge(int v0, int v1);

        /** The number of edges incident on vertex #V. */
        int degree(int v);

        /** The number of the Kth edge incident on V, 0<=K<degree(V). */
        int incident(int v, int k);
        /** The Kth node adjacent to V, 0<=K<degree(V). It is
         *  intended that adjacent(v,k) = either node0(incident(v,k))
         *  or node1(incident(v,k)). */
        int adjacent(int v, int k);

        /** Add M initially unconnected vertices to this graph. */
        void addVertices(int M);
        /** Add an (undirected) edge between V0 and V1. */
        void addEdge(int v0, int v1);

        /** Remove all edges involving vertex V from this graph. */
        void removeEdges(int v);
        /** Remove the (undirected) edge (v0, v1) from this graph. */
        void removeEdge(int v0, int v1);
    }

Figure 12.4: A sample abstract undirected-graph class.
    import java.util.Arrays;

    /** A digraph */
    public class AdjGraph implements Digraph {
        /** A new Digraph with N unconnected vertices */
        public AdjGraph(int N) {
            numVertices = N; numEdges = 0;
            enters = new int[N*N]; leaves = new int[N*N];
            nextOutEdge = new int[N*N]; nextInEdge = new int[N*N];
            edgeOut0 = new int[N]; edgeIn0 = new int[N];
            // No vertex has any edges yet; -1 marks an empty list.
            Arrays.fill(edgeOut0, -1); Arrays.fill(edgeIn0, -1);
        }

        /** The vertices that edge E leaves and enters. */
        public int leaves(int e) { return leaves[e]; }
        public int enters(int e) { return enters[e]; }

        /** Add an edge from V0 to V1. */
        public void addEdge(int v0, int v1) {
            if (numEdges >= enters.length)
                expandEdges();  // Expand all edge-indexed arrays
            enters[numEdges] = v1; leaves[numEdges] = v0;
            nextInEdge[numEdges] = edgeIn0[v1];
            edgeIn0[v1] = numEdges;
            nextOutEdge[numEdges] = edgeOut0[v0];
            edgeOut0[v0] = numEdges;
            numEdges += 1;
        }

        /** The number of the Kth edge leaving vertex V, 0<=K<outDegree(V). */
        public int leaving(int v, int k) {
            int e;
            for (e = edgeOut0[v]; k > 0; k -= 1)
                e = nextOutEdge[e];
            return e;
        }

        /* Private section */

        private int numVertices, numEdges;

        /* The following are indexed by edge number */
        private int[]
            enters, leaves,
            nextOutEdge,    /* The # of sibling outgoing edge, or -1 */
            nextInEdge;     /* The # of sibling incoming edge, or -1 */

        /* edgeOut0[v] is # of first edge leaving v, or -1. */
        private int[] edgeOut0;
        /* edgeIn0[v] is # of first edge entering v, or -1. */
        private int[] edgeIn0;
    }

Figure 12.5: Adjacency-list implementation for a directed graph. Only a few representative operations are shown.
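To see the linked structure at work, the arrays can be built for a concrete graph. The sketch below is a stripped-down, self-contained version of the same idea: the class name AdjListDemo, the fixed edge capacity, and the method successors are assumptions introduced here, and the edge list is the one given by the leaves and enters rows of Figure 12.6 (vertices A to H numbered 0 to 7).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the array-based adjacency-list structure of Figure 12.5. */
class AdjListDemo {
    final int[] enters, leaves, nextOutEdge, nextInEdge, edgeOut0, edgeIn0;
    int numEdges = 0;

    AdjListDemo(int numVertices, int maxEdges) {
        enters = new int[maxEdges];
        leaves = new int[maxEdges];
        nextOutEdge = new int[maxEdges];
        nextInEdge = new int[maxEdges];
        edgeOut0 = new int[numVertices];
        edgeIn0 = new int[numVertices];
        Arrays.fill(edgeOut0, -1);      // -1 marks an empty list
        Arrays.fill(edgeIn0, -1);
    }

    /** Adds an edge from V0 to V1 at the front of both of its lists. */
    void addEdge(int v0, int v1) {
        enters[numEdges] = v1;
        leaves[numEdges] = v0;
        nextInEdge[numEdges] = edgeIn0[v1];
        edgeIn0[v1] = numEdges;
        nextOutEdge[numEdges] = edgeOut0[v0];
        edgeOut0[v0] = numEdges;
        numEdges += 1;
    }

    /** The successors of V, most recently added edge first. */
    List<Integer> successors(int v) {
        List<Integer> result = new ArrayList<>();
        for (int e = edgeOut0[v]; e != -1; e = nextOutEdge[e])
            result.add(enters[e]);
        return result;
    }

    /** The graph of Figure 12.6, with edges added in edge-number order. */
    static AdjListDemo figureGraph() {
        int[][] edges = {
            {0, 1}, {0, 3}, {1, 3}, {1, 5}, {7, 6}, {2, 6}, {3, 2},
            {3, 4}, {5, 3}, {6, 3}, {7, 2}, {3, 7}, {7, 1}
        };
        AdjListDemo g = new AdjListDemo(8, edges.length);
        for (int[] e : edges)
            g.addEdge(e[0], e[1]);
        return g;
    }
}
```

Because addEdge links each new edge at the front of its lists, the successors of a vertex come out in reverse order of insertion.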
    Vertex         A   B   C   D   E   F   G   H
    edgeOut0       1   3   5  11  -1   8   9  12
    edgeIn0       -1  12  10   9   7   3   5  11

    Edge           0   1   2   3   4   5   6   7   8   9  10  11  12
    nextOutEdge   -1   0  -1   2  -1  -1  -1   6  -1  -1   4   7  10
    nextInEdge    -1  -1   1  -1  -1   4  -1  -1   2   8   6  -1   0
    enters         1   3   3   5   6   6   2   4   3   3   2   7   1
    leaves         0   0   1   1   7   2   3   3   5   6   7   3   7

Figure 12.6: A graph and one form of adjacency list representation. The lists in this case are arrays. The lower four arrays are indexed by edge number, and the first two by vertex number. The array nextOutEdge forms linked lists of out-going edges for each vertex, with roots in edgeOut0. Likewise, nextInEdge and edgeIn0 form linked lists of incoming edges for each vertex. The enters and leaves arrays give the incident vertices for each edge.

    class Vertex {
        private int num;                  /* Number of this vertex */
        private Edge edgeOut0, edgeIn0;   /* First outgoing & incoming edges. */
    }

    class Edge {
        private int num;                  /* Number of this edge */
        private Vertex enters, leaves;
        private Edge nextOutEdge, nextInEdge;
    }

12.2.2 Edge sets

If all we need to do is enumerate the edges and tell what nodes they are incident on, we can simplify the representation in §12.2.1 quite a bit by throwing out fields edgeOut0, edgeIn0, nextOutEdge, and nextInEdge. We will see one algorithm where this is useful.
12.2.3 Adjacency matrices

If one’s graphs are dense (many of the possible edges exist) and if the important operations include “Is there an edge from $v$ to $w$?” or “The weight of the edge between $v$ and $w$,” then we can use an adjacency matrix. We number the vertices 0 to $|V| - 1$ (where $|V|$ is the size of the set $V$ of vertices), and then set up a $|V| \times |V|$ matrix with entry $(i, j)$ equal to 1 if there is an edge from the vertex numbered $i$ to the one numbered $j$ and 0 otherwise. For weighted edges, we can let entry $(i, j)$ be the weight of the edge between $i$ and $j$, or some special value if there is no edge (this would be an extension of the specifications of Figure 12.3). When a graph is undirected, the matrix will be symmetric. Figure 12.7 illustrates two unweighted graphs, directed and undirected, and their corresponding adjacency matrices.
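A minimal sketch of this representation for unweighted graphs follows. The class name AdjMatrix and its methods are hypothetical and implement only a few of the operations of Figure 12.3.

```java
/** Sketch of an adjacency-matrix representation of an unweighted graph. */
class AdjMatrix {
    private final int[][] m;    // m[i][j] is 1 iff there is an edge i -> j

    /** A graph with NUMVERTICES vertices and no edges. */
    AdjMatrix(int numVertices) {
        m = new int[numVertices][numVertices];
    }

    /** Adds a directed edge from V0 to V1. */
    void addEdge(int v0, int v1) {
        m[v0][v1] = 1;
    }

    /** Adds an undirected edge, keeping the matrix symmetric. */
    void addUndirectedEdge(int v0, int v1) {
        m[v0][v1] = 1;
        m[v1][v0] = 1;
    }

    /** True iff there is an edge from V0 to V1; takes constant time. */
    boolean isEdge(int v0, int v1) {
        return m[v0][v1] == 1;
    }

    /** Number of edges leaving V; takes time proportional to |V|. */
    int outDegree(int v) {
        int count = 0;
        for (int j = 0; j < m.length; j += 1)
            count += m[v][j];
        return count;
    }
}
```

The trade-off is visible in the code: isEdge is a single array access, while the storage is always quadratic in the number of vertices, however few edges there are.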
Adjacency matrices for unweighted graphs have a rather interesting property. Take, for example, the top matrix in Figure 12.7, and consider the result of multiplying this matrix by itself. We define the product of any matrix $X$ with itself as

$$(X \cdot X)_{ij} = \sum_{0 \le k < |V|} X_{ik} \cdot X_{kj}.$$

Translating this, we see that $(M \cdot M)_{ij}$ is equal to the number of vertices, $k$, such that there is an edge from vertex $i$ to vertex $k$ ($M_{ik} = 1$) and there is also an edge from vertex $k$ to vertex $j$ ($M_{kj} = 1$). For any other vertex, one of $M_{ik}$ or $M_{kj}$ will be 0. It should be easy to see, therefore, that $M^2_{ij}$ is the number of paths following exactly two edges from $i$ to $j$. Likewise, $M^3_{ij}$ represents the number of paths that are exactly three edges long between $i$ and $j$. If we use boolean arithmetic instead (where 0 + 1 = 1 + 1 = 1), we instead get 1’s in all positions where there is at least one path of length exactly two between two vertices.

For the example in question, we get

    M^2 =    A B C D E F G H        M^3 =    A B C D E F G H
         A   0 0 1 1 1 1 0 1             A   0 1 2 1 1 0 2 1
         B   0 0 1 1 1 0 0 1             B   0 1 2 0 1 0 2 1
         C   0 0 0 1 0 0 0 0             C   0 0 1 0 1 0 0 1
         D   0 1 1 0 0 0 2 0             D   0 0 0 3 0 1 1 0
         E   0 0 0 0 0 0 0 0             E   0 0 0 0 0 0 0 0
         F   0 0 1 0 1 0 0 1             F   0 1 1 0 0 0 2 0
         G   0 0 1 0 1 0 0 1             G   0 1 1 0 0 0 2 0
         H   0 0 0 2 0 1 1 0             H   0 0 2 2 2 0 0 2

    M =      A B C D E F G H        M' =     A B C D E F G H
         A   0 1 0 1 0 0 0 0             A   0 1 0 1 0 0 0 0
         B   0 0 0 1 0 1 0 0             B   1 0 0 1 0 1 0 1
         C   0 0 0 0 0 0 1 0             C   0 0 0 1 0 0 1 1
         D   0 0 1 0 1 0 0 1             D   1 1 1 0 1 1 1 1
         E   0 0 0 0 0 0 0 0             E   0 0 0 1 0 0 0 0
         F   0 0 0 1 0 0 0 0             F   0 1 0 1 0 0 0 0
         G   0 0 0 1 0 0 0 0             G   0 0 1 1 0 0 0 1
         H   0 1 1 0 0 0 1 0             H   0 1 1 1 0 0 1 0

Figure 12.7: Top: a directed graph and corresponding adjacency matrix ($M$). Bottom: an undirected variant of the graph and adjacency matrix ($M'$).

Adjacency matrices are not good for sparse graphs (those where the number of edges is much smaller than $|V|^2$). It should be obvious also that they present problems when one wants to add and subtract vertices dynamically.

12.3 Graph Algorithms

Many interesting graph algorithms involve some sort of traversal of the vertices or edges of a graph. Exactly as for trees, one can traverse a graph in either depth-first or breadth-first fashion (intuitively, walking away from the starting vertex as quickly or as slowly as possible).

12.3.1 Marking.

However, in graphs, unlike trees, one can get back to a vertex by following edges away from it, making it necessary to keep track of what vertices have already been visited, an operation I’ll call marking the vertices. There are several ways to accomplish this.

Mark bits. If vertices are represented by objects, as in the class Vertex illustrated in §12.2.1, we can keep a bit in each vertex that indicates whether the vertex has been visited. These bits must initially all be on (or off) and are then flipped when a vertex is first visited. Similarly, we could do this for edges instead.

Mark counts. A problem with mark bits is that one must be sure they are all set the same way at the beginning of a traversal. If traversals may get cut short, causing mark bits to have arbitrary settings after a traversal, one may be able to use a larger mark instead. Give each traversal a number in increasing sequence (the first traversal is number 1, the second is 2, etc.). To visit a node, set its mark count to the current traversal number. Each new traversal is guaranteed to have a number contained in none of the mark fields (assuming the mark fields are initialized appropriately, say to 0).
Bitvectors. If,asinourabstractions,verticeshavenumbers,onecankeepabit vector, M ,ontheside,where M [i]is1iffvertexnumber i hasbeenvisited. Bitvectorsareeasytoresetatthebeginningofatraversal.
Adhoc. Sometimes,theparticulartraversalbeingperformedprovidesawayof recognizingavisitedvertex.Onecan’tsayanythinggeneralaboutthis,of course.
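Of the marking schemes above, mark counts are perhaps the least obvious, so here is a minimal sketch of one way to arrange them. The class MarkCounts and its method names are hypothetical, and the sketch assumes vertices are numbered from 0 (so that the counts can live in an array on the side rather than in Vertex objects).

```java
/** Sketch: mark counts for vertices numbered 0 .. N-1. */
class MarkCounts {
    /** mark[v] is the number of the last traversal that visited v,
     *  or 0 if v has never been visited. */
    private final int[] mark;
    /** The number of the current traversal (the first is number 1). */
    private int traversal = 1;

    MarkCounts(int numVertices) {
        mark = new int[numVertices];   // all initially 0: nothing is marked
    }

    /** Begin a new traversal. No array needs to be cleared, even if the
     *  previous traversal was cut short. */
    void newTraversal() {
        traversal += 1;
    }

    /** True iff V has been visited during the current traversal. */
    boolean isMarked(int v) {
        return mark[v] == traversal;
    }

    /** Record that V has been visited during the current traversal. */
    void mark(int v) {
        mark[v] = traversal;
    }
}
```

Starting a new traversal costs constant time here, whereas mark bits or a bit vector must be reset vertex by vertex.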
12.3.2 A general traversal schema.
Many graph algorithms have the following general form. Italicized capital-letter names must be replaced according to the application.
    /* GENERAL GRAPH-TRAVERSAL SCHEMA */
    COLLECTION_OF_VERTICES fringe;

    fringe = INITIAL_COLLECTION;
    while (!fringe.isEmpty()) {
        Vertex v = fringe.REMOVE_HIGHEST_PRIORITY_ITEM();

        if (!MARKED(v)) {
            MARK(v);
            VISIT(v);
            For each edge (v, w) {
                if (NEEDS_PROCESSING(w))
                    Add w to fringe;
            }
        }
    }
In the following sections, we look at various algorithms that fit this schema.²
² In this context, a schema (plural schemas or schemata) is a template, containing some pieces that must be replaced. Logical systems, for example, often contain axiom schemata such as
    (∀x P(x)) ⊃ P(y),
where P may be replaced by any logical formula with a distinguished free variable (well, roughly).
12.3.3 Generic depth-first and breadth-first traversal
Depth-first traversal in graphs is essentially the same as in trees, with the exception of the check for “already visited.” To implement
    /** Perform the operation VISIT on each vertex reachable from V
     *  in depth-first order. */
    void depthFirstVisit(Vertex v)
we use the general graph-traversal schema with the following replacements.
COLLECTION_OF_VERTICES is a stack type.
INITIAL_COLLECTION is the set {v}.
REMOVE_HIGHEST_PRIORITY_ITEM pops and returns the top.
MARK and MARKED set and check a mark bit (see discussion above).
NEEDS_PROCESSING means “not MARKED.”
Here, as is often the case, we could dispense with NEEDS_PROCESSING (make it always TRUE). The only effect would be to increase the size of the stack somewhat.
Breadth-first search is nearly identical. The only differences are as follows.
COLLECTION_OF_VERTICES is a (FIFO) queue type.
REMOVE_HIGHEST_PRIORITY_ITEM is to remove and return the first (least-recently-added) item in the queue.
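To see the two sets of replacements side by side, here is a minimal runnable sketch of the schema. It makes several assumptions that are not in the text: vertices are the integers 0 to N-1, the graph is an array of successor lists (adj[v] lists the vertices w with an edge (v, w)), VISIT simply records the vertex, and the class and method names are hypothetical. A java.util.ArrayDeque serves as the fringe in both cases; only the end from which items are removed differs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Sketch: the traversal schema specialized to depth-first and
 *  breadth-first order. */
class Traversals {
    /** Return the vertices reachable from START in the order visited.
     *  The fringe acts as a stack if DEPTHFIRST, else as a FIFO queue. */
    static List<Integer> traverse(int[][] adj, int start, boolean depthFirst) {
        boolean[] marked = new boolean[adj.length];      // mark bits
        List<Integer> order = new ArrayList<>();
        ArrayDeque<Integer> fringe = new ArrayDeque<>();
        fringe.addLast(start);                           // INITIAL_COLLECTION
        while (!fringe.isEmpty()) {
            // REMOVE_HIGHEST_PRIORITY_ITEM: most recently added item for a
            // stack, least recently added item for a queue.
            int v = depthFirst ? fringe.removeLast() : fringe.removeFirst();
            if (!marked[v]) {
                marked[v] = true;                        // MARK
                order.add(v);                            // VISIT
                for (int w : adj[v]) {
                    if (!marked[w]) {                    // NEEDS_PROCESSING
                        fringe.addLast(w);
                    }
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical digraph with edges 0->1, 0->2, 1->3, 2->3.
        int[][] adj = {{1, 2}, {3}, {3}, {}};
        System.out.println(traverse(adj, 0, true));    // prints [0, 2, 3, 1]
        System.out.println(traverse(adj, 0, false));   // prints [0, 1, 2, 3]
    }
}
```

Note that vertex 3 is added to the fringe twice in the breadth-first run; the MARKED test discards the second copy when it is removed.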
12.3.4 Topological sorting.
A topological sort of a directed graph is a listing of its vertices in such an order that if vertex w is reachable from vertex v, then w is listed after v. Thus, if we think of a graph as representing an ordering relation on the vertices, a topological sort is a linear ordering of the vertices that is consistent with that ordering relation. A cyclic directed graph has no topological sort. For example, topological sort is the operation that the UNIX make utility implicitly performs to find an order for executing commands that brings every file up to date before it is needed in a subsequent command.
To perform a topological sort, we associate a count with each vertex of the number of incoming edges from as-yet unprocessed vertices. For the version below, I use an array to keep these counts. The algorithm for topological sort now looks like this.
    /** An array of the vertices in G in topologically sorted order.
     *  Assumes G is acyclic. */
    static int[] topologicalSort(Digraph G)
    {
        int[] count = new int[G.numVertices()];
        int[] result = new int[G.numVertices()];
        int k;

        for (int v = 0; v < G.numVertices(); v += 1)
            count[v] = G.inDegree(v);

        Graph-traversal schema replacement for topological sorting;

        return result;
    }
The schema replacement for topological sorting is as follows.
COLLECTION_OF_VERTICES can be any set, multiset, list, or sequence type for vertices (stacks, queues, etc., etc.).
INITIAL_COLLECTION is the set of all v with count[v] = 0.
REMOVE_HIGHEST_PRIORITY_ITEM can remove any item.
MARKED and MARK can be trivial (i.e., always return FALSE and do nothing, respectively).
VISIT(v) makes v the next unfilled element of result and decrements count[w] for each edge (v, w) in G.
NEEDS_PROCESSING is true if count[w] == 0.
Figure 12.8 illustrates the algorithm.
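Filling in the schema gives something like the following runnable sketch. It is not the Digraph-based method above: as a simplifying assumption, the graph is passed as an array of successor lists, the in-degrees are computed directly from it, and the fringe is a FIFO queue (any collection would do). The class name TopSort and the cycle check at the end are additions of mine; the text simply assumes an acyclic graph.

```java
import java.util.ArrayDeque;

/** Sketch: topological sort by repeatedly removing vertices that have no
 *  unprocessed predecessors. */
class TopSort {
    /** Return the vertices 0 .. N-1 in topologically sorted order, where
     *  ADJ[v] lists the vertices w with an edge (v, w). Throws
     *  IllegalArgumentException if the graph has a cycle. */
    static int[] topologicalSort(int[][] adj) {
        int n = adj.length;
        int[] count = new int[n];          // incoming edges from unprocessed vertices
        for (int v = 0; v < n; v += 1) {
            for (int w : adj[v]) {
                count[w] += 1;
            }
        }
        ArrayDeque<Integer> fringe = new ArrayDeque<>();
        for (int v = 0; v < n; v += 1) {   // INITIAL_COLLECTION
            if (count[v] == 0) {
                fringe.add(v);
            }
        }
        int[] result = new int[n];
        int k = 0;                         // next unfilled element of result
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            result[k] = v;                 // VISIT
            k += 1;
            for (int w : adj[v]) {
                count[w] -= 1;
                if (count[w] == 0) {       // NEEDS_PROCESSING
                    fringe.add(w);
                }
            }
        }
        if (k != n) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return result;
    }
}
```

Each vertex enters the fringe exactly once (when its count reaches 0), which is why MARK and MARKED can be trivial here.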
Figure 12.8: The input to a topological sort (upper left) and three stages in its computation. The shaded nodes are those that have been processed and moved to the result. The starred nodes are the ones in the fringe. Subscripts indicate count fields. A possible final sequence of nodes, given this start, is A, C, F, D, B, E, G, H.
12.3.5 Minimum spanning trees
Consider a connected undirected graph with edge weights. A minimum(-weight) spanning tree (or MST for short) is a tree that is a subgraph of the given graph, contains all the vertices of the given graph, and minimizes the sum of its edge weights. For example, we might have a bunch of cities that we wish to connect up with telephone lines so as to provide a path between any two, all at minimal cost. The cities correspond to vertices and the possible connections between cities correspond to edges.³ Finding a minimal set of possible connections is the same as finding a minimum spanning tree (there can be more than one). To do this, we make use of a useful fact about MSTs.
³ It turns out that to get really lowest costs, you want to introduce strategically placed extra “cities” to serve as connecting points. We’ll ignore that here.
FACT: If the vertices of a connected graph G are divided into two disjoint nonempty sets, $V_0$ and $V_1$, then any MST for G will contain one of the edges running between a vertex in $V_0$ and a vertex in $V_1$ that has minimal weight.
Proof. It’s convenient to use a proof by contradiction. Suppose that some MST, T, doesn’t contain any of the edges between $V_0$ and $V_1$ with minimal weight. Consider the effect of adding to T an edge from $V_0$ to $V_1$, e, that does have minimal weight, thus giving T′ (there must be such an edge, since otherwise T would be unconnected). Since T was a tree, the result of adding this new edge must have a cycle involving e (since it adds a new path between two nodes that already had a path between them in T). This is only possible if the cycle contains another edge from T, e′, that also runs between $V_0$ and $V_1$. By hypothesis, e has weight less than e′. If we remove e′ from T′, we get a tree once again, but since we have substituted e for e′, the sum of the edge weights for this new tree is less than that for T, a contradiction of T’s minimality. Therefore, it was wrong to assume that T contained no minimal-weight edges from $V_0$ to $V_1$. (End of Proof)
We use this fact by taking $V_0$ to be a set of processed (marked) vertices for which we have selected edges that form a tree, and taking $V_1$ to be the set of all other vertices. By the Fact above, we may safely add to the tree any minimal-weight edge from the marked vertices to an unmarked vertex.
This gives what is known as Prim’s algorithm. This time, we introduce two extra pieces of information for each node, dist[v] (a weight value), and parent[v] (a Vertex). At each point in the algorithm, the dist value for an unprocessed vertex (still in the fringe) is the minimal distance (weight) between it and a processed vertex, and the parent value is the processed vertex that achieves this minimal distance.
    /** For all vertices v in G, set PARENT[v] to be the parent of v in
     *  a MST of G. For each v in G, DIST[v] may be altered arbitrarily.
     *  Assumes that G is connected. WEIGHT[e] is the weight of edge e. */
    static void MST(Graph G, int[] weight, int[] parent, int[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }

        Let r be an arbitrary vertex in G;
        dist[r] = 0;

        Graph-traversal schema replacement for MST;
    }
The appropriate “settings” for the graph-traversal schema are as follows.
COLLECTION_OF_VERTICES is a priority queue of vertices ordered by dist values, with smaller dists having higher priorities.
INITIAL_COLLECTION contains all the vertices of G.
REMOVE_HIGHEST_PRIORITY_ITEM removes the first item in the priority queue.
VISIT(v): for each edge (v, w) with weight n, if w is unmarked, and dist[w] > n, set dist[w] to n and set parent[w] to v. Reorder fringe as needed.
NEEDS_PROCESSING(v) is always false.
Figure 12.9 illustrates this algorithm in action.
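Here is a minimal runnable sketch of Prim’s algorithm under some assumptions of my own. The graph is given as a vertex count plus an array of undirected edges {v, w, weight}, the root r is vertex 0, and Integer.MAX_VALUE stands in for ∞. One technique is swapped in: java.util.PriorityQueue has no operation for reordering an item whose priority changes, so instead of placing every vertex in the fringe and reordering, the sketch adds a fresh entry each time dist[w] shrinks and lets the MARKED test skip the stale entries. The class name PrimSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: Prim's algorithm with a lazily cleaned priority queue. */
class PrimSketch {
    /** Set PARENT[v] to the parent of v in an MST rooted at vertex 0 (or -1
     *  for the root) and DIST[v] to the weight of the edge from v to its
     *  parent. EDGES holds undirected edges {v, w, weight} of a connected
     *  graph with vertices 0 .. NUMVERTICES-1. */
    static void mst(int numVertices, int[][] edges, int[] parent, int[] dist) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {                     // store both directions
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        boolean[] marked = new boolean[numVertices];
        Arrays.fill(dist, Integer.MAX_VALUE);       // stands in for infinity
        Arrays.fill(parent, -1);
        dist[0] = 0;

        // Fringe entries are {dist, vertex} pairs, smallest dist first.
        PriorityQueue<int[]> fringe =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        fringe.add(new int[] {0, 0});
        while (!fringe.isEmpty()) {
            int v = fringe.remove()[1];
            if (marked[v]) {
                continue;                           // stale entry
            }
            marked[v] = true;
            for (int[] e : adj.get(v)) {            // VISIT
                int w = e[0], n = e[1];
                if (!marked[w] && dist[w] > n) {
                    dist[w] = n;
                    parent[w] = v;
                    fringe.add(new int[] {n, w});   // in place of reordering
                }
            }
        }
    }
}
```

When the method returns, the sum of the dist values is the total weight of the tree, since each dist[v] ends up as the weight of the edge joining v to its parent.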
Figure 12.9: Prim’s algorithm for minimum spanning tree. Vertex r is A. The numbers in the nodes denote dist values. Dashed edges denote parent values; they form a MST after the last step. Unshaded nodes are in the fringe. The last two steps (which don’t change parent pointers) have been collapsed into one.
12.3.6 Single-source shortest paths
Suppose that we are given a weighted graph (directed or otherwise) and we want to find the shortest paths from some starting node to every reachable node. A succinct presentation of the results of this algorithm is known as a shortest-path tree. This is a (not necessarily minimum) spanning tree for the graph with the desired starting node as the root such that the path from the root to each other node in the tree is also a path of minimal total weight in the full graph.
A common algorithm for doing this, known as Dijkstra’s algorithm, looks almost identical to Prim’s algorithm for MSTs. We have the same PARENT and DIST data as before. However, whereas in Prim’s algorithm, DIST gives the shortest distance from an unmarked vertex to the marked vertices, in Dijkstra’s algorithm it gives the length of the shortest path known so far that leads to it from the starting node.
    /** For all vertices v in G reachable from START, set PARENT[v]
     *  to be the parent of v in a shortest-path tree from START in G. For
     *  all vertices in this tree, DIST[v] is set to the distance from START.
     *  WEIGHT[e] are non-negative edge weights. Assumes that vertex
     *  START is in G. */
    static void shortestPaths(Graph G, int start, int[] weight,
                              int[] parent, double[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }
        dist[start] = 0;

        Graph-traversal schema replacement for shortest-path tree;
    }
where we substitute into the schema as follows:
COLLECTION_OF_VERTICES is a priority queue of vertices ordered by dist values, with smaller dists having higher priorities.
INITIAL_COLLECTION contains all the vertices of G.
REMOVE_HIGHEST_PRIORITY_ITEM removes the first item in the priority queue.
MARKED and MARK can be trivial (return false and do nothing, respectively).
VISIT(v): for each edge (v, w) with weight n, if dist[w] > n + dist[v], set dist[w] to n + dist[v] and set parent[w] to v. Reorder fringe as needed.
NEEDS_PROCESSING(v) is always false.
Figure 12.10 illustrates Dijkstra’s algorithm in action.
Because of their very similar structure, the asymptotic running times of Dijkstra’s and Prim’s algorithms are similar. We visit each vertex once (removing an item from the priority queue), and reorder the priority queue at most once for each edge. Hence, if V is the number of vertices of G and E is the number of edges, we get an upper bound on the time required by these algorithms of O((V + E) lg V).
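A runnable sketch of Dijkstra’s algorithm follows, again under assumptions that are mine rather than the text’s: the graph is a vertex count plus an array of directed edges {from, to, weight} with non-negative weights, and the class and field names are hypothetical. As before, java.util.PriorityQueue cannot reorder an item in place, so the sketch adds a new fringe entry whenever dist[w] decreases and discards entries whose recorded distance is out of date. This lazy variant can hold up to E entries in the queue, which leaves the O((V + E) lg V) bound intact because lg E is at most 2 lg V.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: Dijkstra's algorithm with a lazily cleaned priority queue. */
class DijkstraSketch {
    /** A fringe entry: a vertex and the value of its dist when added. */
    private static class Entry {
        final double dist;
        final int vertex;

        Entry(double dist, int vertex) {
            this.dist = dist;
            this.vertex = vertex;
        }
    }

    /** For each vertex v reachable from START, set DIST[v] to the length of
     *  a shortest path from START and PARENT[v] to v's predecessor on it.
     *  Unreachable vertices keep DIST infinite and PARENT -1. EDGES holds
     *  directed edges {from, to, weight} with non-negative weights. */
    static void shortestPaths(int numVertices, int[][] edges, int start,
                              int[] parent, double[] dist) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
        }
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        dist[start] = 0;

        PriorityQueue<Entry> fringe =
            new PriorityQueue<>((a, b) -> Double.compare(a.dist, b.dist));
        fringe.add(new Entry(0, start));
        while (!fringe.isEmpty()) {
            Entry top = fringe.remove();
            int v = top.vertex;
            if (top.dist > dist[v]) {
                continue;                  // stale: a shorter path was found
            }
            for (int[] e : adj.get(v)) {   // VISIT
                int w = e[0];
                double candidate = dist[v] + e[1];
                if (dist[w] > candidate) {
                    dist[w] = candidate;
                    parent[w] = v;
                    fringe.add(new Entry(candidate, w));
                }
            }
        }
    }
}
```

For the directed edges 0→1 (4), 0→2 (1), 2→1 (2), 1→3 (1), and 2→3 (5) with start vertex 0, the resulting distances are 0, 3, 1, and 4, and the shortest-path tree has 2 as the parent of 1 and 1 as the parent of 3.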
Figure 12.10: Dijkstra’s algorithm for shortest paths. The starting node is A. Numbers in nodes represent minimum distance to node A so far found (dist). Dashed arrows represent parent pointers; their final values show the shortest-path tree. The last three steps have been collapsed to one.
12.3.7 A* search
Dijkstra’s algorithm efficiently finds all shortest paths from a single starting point (source) in a graph. Suppose, however, that you are only interested in a single shortest path from one source to one destination. We could handle this by modifying the VISIT step in Dijkstra’s algorithm:
VISIT(v): [Single destination] If v is the destination node, exit the algorithm. Otherwise, for each edge (v, w) with weight n, if dist[w] > n + dist[v], set dist[w] to n + dist[v] and set parent[w] to v. Reorder fringe as needed.
This avoids computations of paths farther from the source than is the destination, but Dijkstra’s algorithm can still do a great deal of unnecessary work.
Suppose, for example, that you want to find a shortest path by road from Denver to New York City. True, we are guaranteed that when we select New York from the priority queue, we can stop the algorithm. Unfortunately, before the algorithm considers a single Manhattan street, it will have found the shortest path from Denver to nearly every destination on the west coast (aside from Alaska), Mexico, and the western provinces of Canada, all of which are in the wrong direction!
Intuitively, we might improve the situation by considering nodes in a different order, one biased toward our intended destination. It turns out that the necessary adjustment is easy. The resulting algorithm is called A* search:⁴
    /** For all vertices v in G along a shortest path from START to END,
     *  set PARENT[v] to be the predecessor of v in the path, and set
     *  DIST[v] to the distance from START. WEIGHT[e] are
     *  non-negative edge weights. H[v] is a consistent heuristic
     *  estimate of the distance from v to END. Assumes that vertex START
     *  is in G, and that END is in G and reachable from START. */
    static void shortestPath(Graph G, int start, int end, int[] weight,
                             int[] h, int[] parent, double[] dist)
    {
        for (int v = 0; v < G.numVertices(); v += 1) {
            dist[v] = ∞;
            parent[v] = -1;
        }
        dist[start] = 0;

        Graph-traversal schema replacement for A* search;
    }
⁴ Discovered by Nils Nilsson and Bertram Raphael in 1968. Peter Hart demonstrated optimality.
The schema for A* search is identical to that for Dijkstra’s algorithm, except that the VISIT step is modified to the Single Destination version above, and we replace the definition of COLLECTION_OF_VERTICES with the following.
COLLECTION_OF_VERTICES [A* search] is a priority queue of vertices ordered by the value of dist[v] + h[v], with smaller values having higher priorities.
The difference, in other words, is that we consider nodes in order of our current best estimate of the minimum distance to the destination on a path that goes through the node. Put another way, Dijkstra’s algorithm is essentially the same, but uses h[v] = 0.
For optimal and correct behavior, we need some restrictions on h, the heuristic distance estimate. As indicated in the comment, we require that h be consistent. This means first that it must be admissible: h[v] must not overestimate the actual shortest-path length from v to the destination. Second, we require that if (v, w) is an edge, then
$$h[v] \le \mathrm{weight}[(v, w)] + h[w].$$
This is a version of the familiar triangle inequality: the length of any side of a triangle must be less than or equal to the sum of the lengths of the other two. Under these conditions, the A* algorithm is optimal in the sense that no other algorithm that uses the same heuristic information (i.e., h) can visit fewer nodes (some qualification is needed if there are multiple paths with the same weight).
Considering again route-planning from Denver, we can use distance to New York “as the crow flies” as our heuristic, since these distances satisfy the triangle inequality and are no greater than the length of any combination of road segments between two points. In real-life applications, however, the general practice is to do a great deal of preprocessing of the data so that actual queries don’t need to do a full search and can thus operate quickly.
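The effect of the heuristic can be seen in a small runnable sketch. The assumptions here are mine: the graph is a vertex count plus directed edges {from, to, weight}, h is an array of doubles, the method returns the number of vertices it visits (purely so that the savings can be observed), and stale fringe entries are skipped rather than reordered, as java.util.PriorityQueue offers no reordering operation. The names AStarSketch and Entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: A* search with a consistent heuristic H. */
class AStarSketch {
    /** A fringe entry: a vertex, its dist when added, and dist + h. */
    private static class Entry {
        final double priority;
        final double dist;
        final int vertex;

        Entry(double priority, double dist, int vertex) {
            this.priority = priority;
            this.dist = dist;
            this.vertex = vertex;
        }
    }

    /** Find a shortest path from START to END, setting DIST and PARENT for
     *  the vertices examined. Returns the number of vertices visited (END
     *  included). EDGES holds directed edges {from, to, weight} with
     *  non-negative weights; H must be consistent. */
    static int shortestPath(int numVertices, int[][] edges, int start, int end,
                            double[] h, int[] parent, double[] dist) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v += 1) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
        }
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        dist[start] = 0;

        // Ordered by dist[v] + h[v] rather than by dist[v] alone.
        PriorityQueue<Entry> fringe =
            new PriorityQueue<>((a, b) -> Double.compare(a.priority, b.priority));
        fringe.add(new Entry(h[start], 0, start));
        int visited = 0;
        while (!fringe.isEmpty()) {
            Entry top = fringe.remove();
            int v = top.vertex;
            if (top.dist > dist[v]) {
                continue;                  // stale entry
            }
            visited += 1;
            if (v == end) {
                break;                     // single-destination VISIT
            }
            for (int[] e : adj.get(v)) {
                int w = e[0];
                double candidate = dist[v] + e[1];
                if (dist[w] > candidate) {
                    dist[w] = candidate;
                    parent[w] = v;
                    fringe.add(new Entry(candidate + h[w], candidate, w));
                }
            }
        }
        return visited;
    }
}
```

As a usage example, take five vertices in a row, 0 through 4, with an edge of weight 1 in each direction between neighbors, and search from 2 to 4. With h[v] = 4 − v (the true remaining distance, which is consistent), the sketch visits only vertices 2, 3, and 4. With h identically 0 it behaves like Dijkstra’s algorithm and also visits vertex 1, in the wrong direction, before reaching 4.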
12.3.8 Kruskal’s algorithm for MST
Just so you don’t get the idea that our graph traversal schema is the only possible way to go, we’ll consider a “classical” method for forming a minimum spanning tree, known as Kruskal’s algorithm. This algorithm relies on a union-find structure. At any time, this structure contains a partition of the vertices: a collection of disjoint sets of vertices that includes all of the vertices. Initially, each vertex is alone in its own set. The idea is that we build up an MST one edge at a time. We repeatedly choose an edge of minimum weight that joins vertices in two different sets, add that edge to the MST we are building, and then combine (union) the two sets of vertices into one set. This process continues until all the sets have been combined into one (which must contain all the vertices). At any point, each set is a bunch of vertices that are all reachable from each other via the edges so far added to the MST. When there is only one set, it means that all of the vertices are reachable, and so we have a set of edges that spans the graph. It follows from the Fact in §12.3.5 that if we always add the minimally weighted edge that connects two of the disjoint sets of vertices, that edge can always be part of a MST, so the final result must also be a MST. Figure 12.11 illustrates the idea.
Figure 12.11: Kruskal’s algorithm. The numbers in the vertices denote sets: vertices with the same number are in the same set. Dashed edges have been added to the MST. This is different from the MST found in Figure 12.9.
For the program, I’ll assume we have a type, UnionFind, representing sets of sets of vertices. We need two operations on this type: an inquiry S.sameSet(v, w) that tells us whether vertices v and w are in the same set in S, and an operation S.union(v, w) that combines the sets containing vertices v and w into one. I’ll also assume a “set of edges” type (EdgeSet in the code below) to contain the result.
    /** Return a subset of edges of G forming a minimum spanning tree for G.
     *  G must be a connected undirected graph. WEIGHT gives edge weights. */
    EdgeSet MST(Graph G, int[] weight)
    {
        UnionFind S;
        EdgeSet E;
        // Initialize S to { {v} | v is a vertex of G };
        S = new UnionFind(G.numVertices());
        E = new EdgeSet();

        For each edge (v, w) in G in order of increasing weight {
            if (!S.sameSet(v, w)) {
                Add (v, w) to E;
                S.union(v, w);
            }
        }

        return E;
    }
The tricky part is this union-find bit. From what you know, you might well guess that each sameSet operation will require time Θ(N lg N) in the worst case (look in each of up to N sets each of size up to N). Interestingly enough, there is a better way. Let’s assume (as in this problem) that the sets contain integers from 0 to N − 1. At any given time, there will be up to N disjoint sets; we’ll give them names (well, numbers really) by selecting a single representative member of each set and using that member (a number between 0 and N − 1) to identify the set. Then, if we can find the current representative member of the set containing any vertex, we can tell if two vertices are in the same set by seeing if their representative members are the same. One way to do this is to represent each disjoint set as a tree of vertices, but with children pointing at parents (you may recall that I said such a structure would eventually be useful). The root of each tree is the representative member, which we may find by following parent links. For example, we can represent the set of sets
    {{1, 2, 4, 6, 7}, {0, 3, 5}, {8, 9, 10}}
with a forest of trees, one tree for each set.
[Figure: a forest of three trees, with nodes 1, 7, 2, 4, 6 in the first, 3, 0, 5 in the second, and 8, 10, 9 in the third.]
We represent all this with a single integer array, parent, with parent[v] containing the number of the parent node of v, or −1 if v has no parent (i.e., is a representative member). The union operation is quite simple: to compute S.union(v, w), we find the roots of the trees containing v and w (by following the parent chain) and then make one of the two roots the child of the other. So, for example, we could compute S.union(6, 0) by finding the representative member for 6 (which is 1), and for 0 (which is 3) and then making 3 point to 1:
[Figure: the combined tree, rooted at 1, containing nodes 1, 7, 2, 4, 6, 3, 0, 5.]
For best results, we should make the tree of lesser “rank” (roughly, height) point to the one of larger rank.⁵
⁵ We’re cheating a bit in this section to make the effects of the optimizations we describe a bit clearer. We could not have constructed the “stringy” trees in these examples had we always made the lesser-rank tree point to the greater. So in effect, our examples start from union-find trees that were constructed in haphazard fashion, and we show what happens if we start doing things right from then on.
However, while we’re at it, let’s throw in a twist. After we traverse the paths from 6 up to 1 and from 0 to 3, we’ll re-organize the tree by having every node in those paths point directly at node 1 (in effect “memoizing” the result of the operation of finding the representative member). Thus, after finding the representative member for 6 and 0 and unioning, we will have the following, much flatter tree:
[Figure: the flattened tree, rooted at 1, containing nodes 1, 7, 2, 4, 6, 0, 3, 5.]
This re-arrangement, which is called path compression, causes subsequent inquiries about vertices 6, 4, and 0 to be considerably faster than before. It turns out that with this trick (and the heuristic of making the shallower tree point at the deeper in a union), any sequence of M union and sameSet operations on a set of sets containing a total of N elements can be performed in time O(α(M, N) M). Here, α(M, N) is an inverse of Ackermann’s function. Specifically, α(M, N) is defined as the minimum i such that A(i, ⌊M/N⌋) > lg N, where
$$\begin{aligned} A(1, j) &= 2^j, && \text{for } j \ge 1,\\ A(i, 1) &= A(i-1, 2), && \text{for } i \ge 2,\\ A(i, j) &= A(i-1, A(i, j-1)), && \text{for } i, j \ge 2. \end{aligned}$$
Well, this is all rather complicated, but suffice it to say that A grows monumentally fast, so that α grows with subglacial slowness, and is for all mortal purposes ≤ 4. In short, the amortized cost of M operations (union and sameSet in any combination) is roughly constant per operation. Thus, the time required for Kruskal’s algorithm is dominated by the sorting time for the edges, and is asymptotically O(E lg E), for E the number of edges. This in turn equals O(E lg V) for a connected graph, where V is the number of vertices.
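Putting the pieces of this section together gives something like the following runnable sketch. The UnionFind class follows the description above: a parent array with −1 marking representatives, path compression when finding a representative, and union by rank. The find method, the rank array, the class KruskalSketch, and the representation of the graph as an array of {v, w, weight} edges are assumptions of this sketch, not the EdgeSet and Graph types assumed earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a partition of the integers 0 .. N-1 into disjoint sets. */
class UnionFind {
    private final int[] parent;   // parent[v] == -1 iff v is a representative
    private final int[] rank;     // upper bound on the height of v's tree

    /** Initially, each of 0 .. N-1 is alone in its own set. */
    UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        Arrays.fill(parent, -1);
    }

    /** Return the representative member of the set containing V, making
     *  every node on the path to it point directly at it. */
    int find(int v) {
        int root = v;
        while (parent[root] != -1) {
            root = parent[root];
        }
        while (v != root) {       // path compression
            int next = parent[v];
            parent[v] = root;
            v = next;
        }
        return root;
    }

    /** True iff V and W are in the same set. */
    boolean sameSet(int v, int w) {
        return find(v) == find(w);
    }

    /** Combine the sets containing V and W into one. */
    void union(int v, int w) {
        int rv = find(v), rw = find(w);
        if (rv == rw) {
            return;
        }
        if (rank[rv] < rank[rw]) {    // make rv the root of greater rank
            int tmp = rv;
            rv = rw;
            rw = tmp;
        }
        parent[rw] = rv;              // lesser rank points to greater
        if (rank[rv] == rank[rw]) {
            rank[rv] += 1;
        }
    }
}

/** Sketch: Kruskal's algorithm on an array of edges {v, w, weight}. */
class KruskalSketch {
    /** Return the edges of an MST of the connected undirected graph with
     *  vertices 0 .. NUMVERTICES-1 and the given EDGES. */
    static List<int[]> mst(int numVertices, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));
        UnionFind sets = new UnionFind(numVertices);
        List<int[]> result = new ArrayList<>();
        for (int[] e : sorted) {      // in order of increasing weight
            if (!sets.sameSet(e[0], e[1])) {
                result.add(e);
                sets.union(e[0], e[1]);
            }
        }
        return result;
    }
}
```

The sort dominates the running time, as described above; the loop that follows performs at most 2E sameSet and union operations.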
Exercises
12.1. A borogove and a snark find themselves in a maze of twisty little passages that connect numerous rooms, one of which is the maze exit. The snark, being a boojum, finds borogoves especially tasty after a long day of causing people to softly and silently vanish away. Unfortunately for the snark (and contrariwise for his prospective snack), borogoves can run twice as fast as snarks and have an uncanny ability of finding the shortest route to the exit. Fortunately for the snark, his preternatural senses tell him precisely where the borogove is at any time, and he knows the maze like the back of his, er, talons. If he can arrive at the exit or in any of the rooms in the borogove’s path before the borogove does (strictly before, not at the same time), he can catch it. The borogove is not particularly intelligent, and will always take the shortest path, even if the snark is waiting on it.
Thus, for example, in the following maze, the snark (starting at ‘S’) will dine in the shaded room, which he reaches in 6 time units, and the borogove (starting at ‘B’) in 7. The numbers on the connecting passages indicate distances (the numbers inside rooms are just labels). The snark travels at 0.5 units/hour, and the borogove at 1 unit/hour.
[Figure: a maze of ten rooms, labeled S, 3, 4, 5, 6, 7, B, 8, 9, and E, joined by passages labeled with distances.]
Write a program to read in a maze such as the above, and print one of two messages: Snark eats, or Borogove escapes, as appropriate. Place your answer in a class Chase (see the templates in ~cs61b/hw/hw7).
The input is as follows.
• A positive integer N ≥ 3 indicating the number of rooms. You may assume that N < 1024. The rooms are assumed to be numbered from 0 to N − 1. Room 0 is always the exit. Initially room 1 contains the borogove and room 2 contains the snark.
• A sequence of edges, each consisting of two room numbers (the order of the room numbers is immaterial) followed by an integer distance.
Assume that whenever the borogove has a choice between passages to take (i.e., all lead to a shortest path), he chooses the one to the room with the lowest number. For the maze shown, a possible input is as follows.
    10
    2 3 2  2 4 2  3 5 4  3 6 1  4 1 3
    5 6 6  5 8 2  5 9 1  6 7 2  6 9 3
    7 0 1  7 9 8  1 8 1
    8 9 7
Index A*search,232
AbstractCollectionclass,50–52
AbstractCollectionmethods add,52 iterator,52 size,52 toString,52
AbstractListclass,51–55,60
AbstractListmethods add,55 get,55 listIterator,55,56 remove,55 removeRange,55 set,55 size,55
AbstractList.ListIteratorImplclass,56,57
AbstractList.modCountfield,55
AbstractList.modCountfields modCount,55
AbstractMapclass,58,59
AbstractMapmethods clear,59 containsKey,59 containsValue,59 entrySet,59 equals,59 get,59 hashCode,59 isEmpty,59 keySet,59 put,59 putAll,59 remove,59 size,59 toString,59 values,59
AbstractSequentialListclass,54–58,61
AbstractSequentialListmethods listIterator,58 size,58 acyclicgraph,215 adapterpattern,79 add(AbstractCollection),52 add(AbstractList),55 add(ArrayList),66 add(Collection),30 add(LinkedIter),75 add(List),34 add(ListIterator),25 add(Queue),80 add(Set),32 addAll(Collection),30 addAll(List),34 addFirst(Deque),80 additivegenerator,207 adjacencylist,217 adjacencymatrix,223 adjacentvertex,215 AdjGraphclass,220 admissibledistanceestimate,234 algorithmiccomplexity,5–20 alpha-betapruning,127–129 amortizedcost,16–18,63 ancestor(oftreenode),89
Arrayclass,50
Arraymethods newInstance,50
ArrayDequeclass,84
ArrayListclass,63–65
ArrayListmethods add,66 check,66 ensureCapacity,66
243
get,65 remove,65 removeRange,66 set,65 size,65
ArrayStackclass,81
asymptoticcomplexity,7–9
averagetime,6
AVLtree,182–184
B-tree,163–170
backtracking,77
biconnectedgraph,216
Big-Ohnotation definition,7
Big-Omeganotation definition,9
Big-Thetanotation definition,9
bin,131
binarysearchtree(BST),105
binarytree,90,91
binary-search-treeproperty,105
BinaryTree,93
BinaryTreemethods left,93 right,93 setLeft,93
setRight,93
binomialcomb,152
breadth-firsttraversal,98,226
BST, see binarysearchtree deletingfrom,109
searching,107
BSTclass,108
BSTmethods find,107 insert,109,111 remove,110
swapSmallest,110
BSTSetclass,113,114
callstack,78
chainedhashtables,131
check(ArrayList),66
child(Tree),93
children(intree),89 circularbuffer,82 Classclass,50 Classmethods getComponentType,50 clear(AbstractMap),59 clear(Collection),30 clear(Map),41 clone(LinkedList),73 codomain,37
Collectionclass,29,30
Collectionhierarchy,27
Collectioninterface,24–28
Collectionmethods add,30 addAll,30 clear,30 contains,29 containsAll,29 isEmpty,29 iterator,29 remove,30 removeAll,30 retainAll,30 size,29 toArray,29
Collectionsclass,160,200,214
Collectionsmethods shuffle,214 sort,160 synchronizedList,200 collision(inhashtable),132
Comparableclass,36
Comparablemethods compareTo,36 comparator(SortedMap),42 comparator(SortedSet),38
Comparatorclass,37
Comparatormethods compare,37 equals,37 compare(Comparator),37 compareTo(Comparable),36 completetree,90,91 complexity,5–20
INDEX
244
compressingtables,179
concave,19
concurrency,199–203
ConcurrentModificationExceptionclass,54 connectedcomponent,215
connectedgraph,215
consistencywith .equals,36
consistentdistanceestimate,234 contains(Collection),29
containsAll(Collection),29 containsKey(AbstractMap),59 containsValue(AbstractMap),59
cycleinagraph,215
deadlock,203
degree(Tree),93
degreeofavertex,215
degreeofnode,89
deletingfromaBST,109
depthoftreenode,90
depth-firsttraversal,225
Dequeclass,80
dequedatastructure,76
Dequemethods
addFirst,80
last,80
removeLast,80
descendent(oftreenode),89
designpattern adapter,79
definition,47
Singleton,98
TemplateMethod,47
Visitor,100
digraph, see directedgraph
Digraphclass,218
Dijkstra’salgorithm,230
directedgraph,215
distributioncountingsort,146
domain,37
doublehashing,136
doublelinking,68–71
double-endedqueue,76
edge,89
edge,inagraph,215
edge-setgraphrepresentation,222 enhancedforloop,23 ensureCapacity(ArrayList),66 Entryclass,73 entrySet(AbstractMap),59 entrySet(Map),40 Enumerationclass,22 .equals,consistentwith,36 equals(AbstractMap),59 equals(Comparator),37 equals(Map),40 equals(Map.Entry),41 equals(Set),32 expressiontree,91 externalnode,89 externalpathlength,90 externalsorting,140
FIFOqueue,76 find(BST),107 findExitprocedure,78 first(PriorityQueue),119 first(Queue),80 first(SortedSet),38 firstKey(SortedMap),42 forloop,enhanced,23 forest,90 freetree,216 fulltree,90,91
gametrees,125–129 Gamma,Erich,47 get(AbstractList),55 get(AbstractMap),59 get(ArrayList),65 get(HashMap),134 get(List),33 get(Map),40 getClass(Object),50 getComponentType(Class),50 getKey(Map.Entry),41 getValue(Map.Entry),41 graph
acyclic,215 biconnected,216 breadth-firsttraversal,226
INDEX 245
connected,215
depth-firsttraversal,225 directed,215 path,215
traversal,general,225 undirected,215
Graphclass,219
graphs,215–239
hashCode(AbstractMap),59
hashCode(Map),40
hashCode(Map.Entry),41
hashCode(Object),132,137
hashCode(Set),32
hashCode(String),138
hashing,131–138
hashingfunction,131,136–138
HashMapclass,134
HashMapmethods get,134
put,134
hasNext(LinkedIter),74
hasNext(ListIterator),25
hasPrevious(LinkedIter),74
hasPrevious(ListIterator),25
headMap(SortedMap),42
headSet(SortedSet),38
heap,117–125
heightoftree,90
Helm,Richard,47
image,37
in-degree,215
incidentedge,215
indexOf(List),33
indexOf(Queue),80
indexOf(Stack),77
indexOf(StackAdapter),81
inordertraversal,98
insert(BST),109,111
insert(PriorityQueue),119
insertionsort,141
insertionSort,142
internalnode,89
internalpathlength,90
internalsorting,140
InterruptedExceptionclass,202 inversion,140
isEmpty(AbstractMap),59
isEmpty(Collection),29
isEmpty(Map),40
isEmpty(PriorityQueue),119
isEmpty(Queue),80
isEmpty(Stack),77
isEmpty(StackAdapter),81
Iterableclass,23
Iterablemethods
iterator,23
iterativedeepening,127 iterator,22
iterator(AbstractCollection),52
iterator(Collection),29 iterator(Iterable),23
iterator(List),33
Iteratorinterface,22–24
java.langclasses
Class,50
Comparable,36
InterruptedException,202 Iterable,23
java.lang.reflectclasses
Array,50
java.utilclasses
AbstractCollection,50–52
AbstractList,51–55,60
AbstractList.ListIteratorImpl,56,57
AbstractMap,58,59
AbstractSequentialList,54–58,61
ArrayList,63–65
Collection,29,30
Collections,160,200,214
Comparator,37
ConcurrentModificationException,54
Enumeration,22
HashMap,134
LinkedList,73
List,31,33,34
ListIterator,25
Map,39–41
Map.Entry,41
246 INDEX
Random,207,209
Set,31–32
SortedMap,39,42
SortedSet,37,38,111–113
Stack,76
UnsupportedOperationException,28
java.utilinterfaces
Collection,24–28
Iterator,22–24
ListIterator,24
java.util.LinkedListclasses
Entry,73
LinkedIter,73,74
Johnson,Ralph,47
key,105
key,insorting,139
keySet(AbstractMap),59
keySet(Map),40
Kruskal’salgorithm,235
label(Tree),93
last(Deque),80
last(SortedSet),38
lastIndexOf(List),33
lastIndexOf(Queue),80
lastKey(SortedMap),42
leafnode,89
left(BinaryTree),93
leveloftreenode,90
LIFOqueue,76
linearcongruentialgenerator,205–207
linearprobes,132
link,67
linkedstructure,67
LinkedIterclass,73,74
LinkedItermethods
LinkedListclass,73
LinkedListmethods clone,73 listIterator,73 Listclass,31,33,34
Listmethods add,34 addAll,34 get,33 indexOf,33 iterator,33 lastIndexOf,33 listIterator,33 remove,34 set,34 subList,33
listIterator(AbstractList),55,56 listIterator(AbstractSequentialList),58 listIterator(LinkedList),73 listIterator(List),33
ListIteratorclass,25 ListIteratorinterface,24
ListIteratormethods add,25 hasNext,25 hasPrevious,25 next,25 nextIndex,25 previous,25 previousIndex,25 remove,25 set,25
Little-ohnotation definition,9 logarithm,propertiesof,19 Lomuto,Nico,150
LSD-firstradixsorting,157
Mapclass,39–41
Maphierarchy,26
Mapmethods clear,41 entrySet,40 equals,40 get,40
INDEX 247
add,75 hasNext,74 hasPrevious,74 next,74 nextIndex,75 previous,74 previousIndex,75 remove,75 set,75
hashCode,40
isEmpty,40
keySet,40 put,41 putAll,41 remove,41 size,40 values,40
Map.Entryclass,41
Map.Entrymethods equals,41
getKey,41
getValue,41
hashCode,41
setValue,41
mapping,37
markingvertices,224 mergesorting,151
message-passing,203
minimaxalgorithm,126
minimumspanningtree,227,235 mod,133
modCount(field),55 monitor,201–203
MSD-firstradixsorting,157 mutualexclusion,200
naturalordering,36 newInstance(Array),50 next(LinkedIter),74 next(ListIterator),25
nextIndex(LinkedIter),75
nextIndex(ListIterator),25
nextInt(Random),209
nodeoftree,89
node,inagraph,215
non-terminalnode,89
nulltreerepresentation,97 numChildren(Tree),93
O( ), see Big-Ohnotation
o( ), see Little-ohnotation
Objectmethods
getClass,50
hashCode,132,137
Ω( ), see Big-Omeganotation
open-addresshashtable,132–136 ordernotation,7–9 orderedtree,89,90 ordering,natural,36 ordering,total,36 orthogonalrangequery,113 out-degree,215
parent(Tree),94 partitioning(forquicksort),150 pathcompression,238 pathinagraph,215 pathlengthintree,90 performance of AbstractList,60 of AbstractSequentialList,61 pointquadtree,117 point-regionquadtree,117 pop(Stack),77 pop(StackAdapter),81 positionaltree,90 postordertraversal,98 potentialmethod,17–18 PRquadtree,117 preordertraversal,98 previous(LinkedIter),74 previous(ListIterator),25 previousIndex(LinkedIter),75 previousIndex(ListIterator),25 Prim’salgorithm,229 primarykey,139 priorityqueue,117–125
PriorityQueueclass,119
PriorityQueuemethods
first,119 insert,119 isEmpty,119 removeFirst,119 properancestor,89 properdescendent,89 propersubtree,89 protectedconstructor,useof,51 protectedmethods,useof,51 pseudo-randomnumbergenerators,205–214
248 INDEX
  additive, 207
  arbitrary ranges, 208
  linear congruential, 205–207
  non-uniform, 209–212
push(Stack), 77
push(StackAdapter), 81
put(AbstractMap), 59
put(HashMap), 134
put(Map), 41
putAll(AbstractMap), 59
putAll(Map), 41
quadtree, 117
Queue class, 80
queue data type, 76
Queue methods
  add, 80
  first, 80
  indexOf, 80
  isEmpty, 80
  lastIndexOf, 80
  removeFirst, 80
  size, 80
quicksort, 149
radix sorting, 156
“random” number generation, see pseudo-random number generators
random access, 51
Random class, 207, 209
Random methods
  nextInt, 209
random sequences, 212
range, 37
range queries, 111
range query, orthogonal, 113
reachable vertex, 215
record, in sorting, 139
recursion
  and stacks, 77
Red-black tree, 170
reflection, 50
reflexive edge, 215
region quadtree, 117
remove(AbstractList), 55
remove(AbstractMap), 59
remove(ArrayList), 65
remove(BST), 110
remove(Collection), 30
remove(LinkedIter), 75
remove(List), 34
remove(ListIterator), 25
remove(Map), 41
removeAll(Collection), 30
removeFirst(PriorityQueue), 119
removeFirst(Queue), 80
removeLast(Deque), 80
removeRange(AbstractList), 55
removeRange(ArrayList), 66
removing from a BST, 109
retainAll(Collection), 30
right(BinaryTree), 93
root node, 89
rooted tree, 89
rotation of a tree, 179
searching a BST, 107
secondary key, 139
selection, 160–161
selection sort, 146
sentinel node, 68, 70
set(AbstractList), 55
set(ArrayList), 65
set(LinkedIter), 75
set(List), 34
set(ListIterator), 25
Set class, 31–32
Set methods
  add, 32
  equals, 32
  hashCode, 32
setChild(Tree), 93
setLeft(BinaryTree), 93
setParent(Tree), 94
setRight(BinaryTree), 93
setValue(Map.Entry), 41
Shell’s sort (shellsort), 141
shortest path
  single destination, 232
  single-source, all paths, 230
  tree, 230
shuffle(Collections), 214
single linking, 67–68
Singleton pattern, 98
size(AbstractCollection), 52
size(AbstractList), 55
size(AbstractMap), 59
size(AbstractSequentialList), 58
size(ArrayList), 65
size(Collection), 29
size(Map), 40
size(Queue), 80
size(Stack), 77
size(StackAdapter), 81
skip list, 193–196
sort(Collections), 160
SortedMap class, 39, 42
SortedMap methods
  comparator, 42
  firstKey, 42
  headMap, 42
  lastKey, 42
  subMap, 42
  tailMap, 42
SortedSet class, 37, 38, 111–113
SortedSet methods
  comparator, 38
  first, 38
  headSet, 38
  last, 38
  subSet, 38
  tailSet, 38
sorting, 139–160
  distribution counting, 146
  exchange, 149
  insertion, 141
  merge, 151
  quicksort, 149
  radix, 156
  Shell’s, 141
  straight selection, 146
sparse arrays, 179
splay tree, 184–193
splayFind, 188
stable sort, 139
stack
  and recursion, 77
Stack class, 76, 77, 79
stack data type, 76
Stack methods
  indexOf, 77
  isEmpty, 77
  pop, 77
  push, 77
  size, 77
  top, 77
StackAdapter class, 79, 81
StackAdapter methods
  indexOf, 81
  isEmpty, 81
  pop, 81
  push, 81
  size, 81
  top, 81
static valuation, 127
straight insertion sort, 141
straight selection sort, 146
String methods
  hashCode, 138
structural modification, 35
subgraph, 215
subList(List), 33
subMap(SortedMap), 42
subSet(SortedSet), 38
subtree, 89
swapSmallest(BST), 110
symmetric traversal, 98
synchronization, 199–203
synchronized keyword, 200
synchronizedList(Collections), 200
tailMap(SortedMap), 42
tailSet(SortedSet), 38
Template Method pattern, 47
terminal node, 89
Θ(·), see Big-Theta notation
thread-safe, 200
threads, 199–203
toArray(Collection), 29
top(Stack), 77
top(StackAdapter), 81
topological sorting, 226
toString(AbstractCollection), 52
toString(AbstractMap), 59
total ordering, 36
traversal of tree, 98–99
traversing an edge, 89
Tree, 93
tree, 89–103
  array representation, 96–97
  balanced, 163
  binary, 90, 91
  complete, 90, 91
  edge, 89
  free, 216
  full, 90, 91
  height, 90
  leaf-up representation, 95
  node, 89
  ordered, 89, 90
  positional, 90
  root, 89
  root-down representation, 94
  rooted, 89
  rotation, 179
  traversal, 98–99
Tree methods
  child, 93
  degree, 93
  label, 93
  numChildren, 93
  parent, 94
  setChild, 93
  setParent, 94
tree node, 89
Trie, 170–179
ucb.util classes
  AdjGraph, 220
  ArrayDeque, 84
  ArrayStack, 81
  BSTSet, 114
  Deque, 80
  Digraph, 218
  Graph, 219
  Queue, 80
  Stack, 77, 79
  StackAdapter, 79, 81
unbalanced search tree, 111
undirected graph, 215
union-find algorithm, 237–239
UnsupportedOperationException class, 28
values(AbstractMap), 59
values(Map), 40
vertex, in a graph, 215
views, 31
visiting a node, 98
Visitor pattern, 100
Vlissides, John, 47
worst-case time, 5