DSA
DataStructuresandAlgorithms: AnnotatedReferencewithExamples
FirstEdition
Copyright c GranvilleBarnett,andLucaDelTongo2008.
ThisbookismadeexclusivelyavailablefromDotNetSlackers (http://dotnetslackers.com/) the placefor.NETarticles,andnewsfrom someoftheleadingmindsinthesoftwareindustry.
Preface
Everybookhasastoryastohowitcameaboutandthisoneisnodifferent, althoughwewouldbelyingifwesaiditsdevelopmenthadnotbeensomewhat impromptu.Putsimplythisbookistheresultofaseriesofemailssentback andforthbetweenthetwoauthorsduringthedevelopmentofalibraryfor the.NETframeworkofthesamename(withtheomissionofthesubtitleof course!).Theconversationstartedoffsomethinglike,“Whydon’twecreate amoreaestheticallypleasingwaytopresentourpseudocode?”Afterafew weeksthisnewpresentationstylehadinfactgrownintopseudocodelistings withchunksoftextdescribinghowthedatastructureoralgorithminquestion worksandvariousotherthingsaboutit.Atthispointwethought,“Whatthe heck,let’smakethisthingintoabook!”Andso,inthesummerof2008we beganworkonthisbooksidebysidewiththeactuallibraryimplementation.
Whenwestartedwritingthisbooktheonlythingsthatweweresureabout withrespecttohowthebookshouldbestructuredwere:
1. alwaysmakeexplanationsassimpleaspossiblewhilemaintainingamoderatelyfinedegreeofprecisiontokeepthemoreeagermindedreaderhappy; and
2. injectdiagramstodemystifyproblemsthatareevenmoderatlychallenging tovisualise(...andsowecouldrememberhowourownalgorithmsworked whenlookingbackatthem!);andfinally
3. presentconciseandself-explanatorypseudocodelistingsthatcanbeported easilytomostmainstreamimperativeprogramminglanguageslikeC++, C#,andJava.
Akeyfactorofthisbookanditsassociatedimplementationsisthatall algorithms(unlessotherwisestated)weredesignedbyus,usingthetheoryof thealgorithminquestionasaguideline(forwhichweareeternallygratefulto theiroriginalcreators).Thereforetheymaysometimesturnouttobeworse thanthe“normal”implementations—andsometimesnot.Wearetwofellows oftheopinionthatchoiceisagreatthing.Readourbook,readseveralothers onthesamesubjectandusewhatyouseefitfromeach(ifanything)when implementingyourownversionofthealgorithmsinquestion.
Throughthisbookwehopethatyouwillseetheabsolutenecessityofunderstandingwhichdatastructureoralgorithmtouseforacertainscenario.Inall projects,especiallythosethatareconcernedwithperformance(hereweapply anevengreateremphasisonreal-timesystems)theselectionofthewrongdata structureoralgorithmcanbethecauseofagreatdealofperformancepain.
Thereforeitisabsolutelykeythatyouthinkabouttheruntimecomplexityand spacerequirementsofyourselectedapproach.Inthisbookweonlyexplainthe theoreticalimplicationstoconsider,butthisisforagoodreason:compilersare verydifferentinhowtheywork.OneC++compilermayhavesomeamazing optimisationphasesspecificallytargetedatrecursion,anothermaynot,forexample.Ofcoursethisisjustanexamplebutyouwouldbesurprisedbyhow manysubtledifferencestherearebetweencompilers.Thesedifferenceswhich maymakeafastalgorithmslow,andviceversa.Wecouldalsofactorinthe sameconcernsaboutlanguagesthattargetvirtualmachines,leavingallthe actualvariousimplementationissuestoyougiventhatyouwillknowyourlanguage’scompilermuchbetterthanus...wellinmostcases.Thishasresultedin amoreconcisebookthatfocusesonwhatwethinkarethekeyissues.
Onefinalnote:nevertakethewordsofothersasgospel;verifyallthatcan befeasiblyverifiedandmakeupyourownmind.
Wehopeyouenjoyreadingthisbookasmuchaswehaveenjoyedwritingit.
GranvilleBarnett
LucaDelTongo
Acknowledgements
Writingthisshortbookhasbeenafunandrewardingexperience.Wewould liketothank,innoparticularorderthefollowingpeoplewhohavehelpedus duringthewritingofthisbook.
SonuKapoorgenerouslyhostedourbookwhichwhenwereleasedthefirst draftreceivedoverthirteenthousanddownloads,withouthisgenerositythis bookwouldnothavebeenabletoreachsomanypeople.JonSkeetprovidedus withanalarmingnumberofsuggestionsthroughoutforwhichweareeternally grateful.Jonalsoeditedthisbookaswell.
Wewouldalsoliketothankthosewhoprovidedtheoddsuggestionviaemail tous.Allfeedbackwaslistenedtoandyouwillnodoubtseesomecontent influencedbyyoursuggestions.
Aspecialthankyoualsogoesouttothosewhohelpedpublicisethisbook fromMicrosoft’sChannel9weeklyshow(thanksDan!)tothemanybloggers whohelpedspreadtheword.Yougaveusanaudienceandforthatweare extremelygrateful.
Thankyoutoallwhocontributedinsomewaytothisbook.Theprogrammingcommunityneverceasestoamazeusinhowwillingitsconstituentsareto givetimetoprojectssuchasthisone.Thankyou.
AbouttheAuthors GranvilleBarnett
GranvilleiscurrentlyaPh.DcandidateatQueenslandUniversityofTechnology (QUT)workingonparallelismattheMicrosoftQUTeResearchCentre1.Healso holdsadegreeinComputerScience,andisaMicrosoftMVP.Hismaininterests areinprogramminglanguagesandcompilers.Granvillecanbecontactedvia oneoftwoplaces:eitherhispersonalwebsite(http://gbarnett.org)orhis blog(http://msmvps.com/blogs/gbarnett).
LucaDelTongo
LucaiscurrentlystudyingforhismastersdegreeinComputerScienceatFlorence.Hismaininterestsvaryfromwebdevelopmenttoresearchfieldssuchas dataminingandcomputervision.LucaalsomaintainsanItalianblogwhich canbefoundat http://blogs.ugidotnet.org/wetblog/
1 http://www.mquter.qut.edu.au/
Chapter1
Introduction
1.1Whatthisbookis,andwhatitisn’t
Thisbookprovidesimplementationsofcommonanduncommonalgorithmsin pseudocodewhichislanguageindependentandprovidesforeasyportingtomost imperativeprogramminglanguages.Itisnotadefinitivebookonthetheoryof datastructuresandalgorithms.
Forthemostpartthisbookpresentsimplementationsdevisedbytheauthors themselvesbasedontheconceptsbywhichtherespectivealgorithmsarebased uponsoitismorethanpossiblethatourimplementationsdifferfromthose consideredthenorm.
Youshouldusethisbookalongsideanotheronthesamesubject,butone thatcontainsformalproofsofthealgorithmsinquestion.Inthisbookweuse theabstractbigOhnotationtodepicttheruntimecomplexityofalgorithms sothatthebookappealstoalargeraudience.
1.2Assumedknowledge
Wehavewrittenthisbookwithfewassumptionsofthereader,butsomehave beennecessaryinordertokeepthebookasconciseandapproachableaspossible. Weassumethatthereaderisfamiliarwiththefollowing:
1. BigOhnotation
2. Animperativeprogramminglanguage
3. Objectorientedconcepts
1.2.1BigOhnotation
ForruntimecomplexityanalysisweusebigOhnotationextensivelysoitisvital thatyouarefamiliarwiththegeneralconceptstodeterminewhichisthebest algorithmforyouincertainscenarios.WehavechosentousebigOhnotation forafewreasons,themostimportantofwhichisthatitprovidesanabstract measurementbywhichwecanjudgetheperformanceofalgorithmswithout usingmathematicalproofs.
Figure1.1showssomeoftheruntimestodemonstratehowimportantitisto chooseanefficientalgorithm.Forthesanityofourgraphwehaveomittedcubic
O(n3),andexponential O(2n)runtimes.Cubicandexponentialalgorithms shouldonlyeverbeusedforverysmallproblems(ifever!);avoidthemiffeasibly possible.
ThefollowinglistexplainssomeofthemostcommonbigOhnotations:
O(1) constant:theoperationdoesn’tdependonthesizeofitsinput,e.g.adding anodetothetailofalinkedlistwherewealwaysmaintainapointerto thetailnode.
O(n) linear:theruntimecomplexityisproportionatetothesizeof n
O(logn) logarithmic:normallyassociatedwithalgorithmsthatbreaktheproblem intosmallerchunkspereachinvocation,e.g.searchingabinarysearch tree.
O(nlogn) just nlogn:usuallyassociatedwithanalgorithmthatbreakstheproblem intosmallerchunkspereachinvocation,andthentakestheresultsofthese smallerchunksandstitchesthembacktogether,e.g.quicksort.
O(n2) quadratic:e.g.bubblesort.
O(n3) cubic:veryrare.
O(2n) exponential:incrediblyrare.
Ifyouencountereitherofthelattertwoitems(cubicandexponential)thisis reallyasignalforyoutoreviewthedesignofyouralgorithm.Whileprototypingalgorithmdesignsyoumayjusthavetheintentionofsolvingtheproblem irrespectiveofhowfastitworks.Wewouldstronglyadvisethatyoualways reviewyouralgorithmdesignandoptimisewherepossible—particularlyloops
andrecursivecalls—sothatyoucangetthemostefficientruntimesforyour algorithms.
ThebiggestassetthatbigOhnotationgivesusisthatitallowsustoessentiallydiscardthingslikehardware.Ifyouhavetwosortingalgorithms,one withaquadraticruntime,andtheotherwithalogarithmicruntimethenthe logarithmicalgorithmwillalwaysbefasterthanthequadraticonewhenthe datasetbecomessuitablylarge.Thisapplieseveniftheformerisranonamachinethatisfarfasterthanthelatter.Why?BecausebigOhnotationisolates akeyfactorinalgorithmanalysis:growth.Analgorithmwithaquadraticrun timegrowsfasterthanonewithalogarithmicruntime.Itisgenerallysaidat somepointas n →∞ thelogarithmicalgorithmwillbecomefasterthanthe quadraticalgorithm.
BigOhnotationalsoactsasacommunicationtool.Picturethescene:you arehavingameetingwithsomefellowdeveloperswithinyourproductgroup. Youarediscussingprototypealgorithmsfornodediscoveryinmassivenetworks. Severalminuteselapseafteryouandtwoothershavediscussedyourrespective algorithmsandhowtheywork.Doesthisgiveyouagoodideaofhowfasteach respectivealgorithmis?No.Theresultofsuchadiscussionwilltellyoumore aboutthehighlevelalgorithmdesignratherthanitsefficiency.Replaythescene backinyourhead,butthistimeaswellastalkingaboutalgorithmdesigneach respectivedeveloperstatestheasymptoticruntimeoftheiralgorithm.Using thelatterapproachyounotonlygetagoodgeneralideaaboutthealgorithm design,butalsokeyefficiencydatawhichallowsyoutomakebetterchoices whenitcomestoselectinganalgorithmfitforpurpose.
Somereadersmayactuallyworkinaproductgroupwheretheyaregiven budgetsperfeature.Eachfeatureholdswithitabudgetthatrepresentsitsuppermosttimebound.Ifyousavesometimeinonefeatureitdoesn’tnecessarily giveyouabufferfortheremainingfeatures.Imagineyouareworkingonan application,andyouareintheteamthatisdevelopingtheroutinesthatwill essentiallyspinupeverythingthatisrequiredwhentheapplicationisstarted. Everythingisgreatuntilyourbosscomesinandtellsyouthatthestartup timeshouldnotexceed n ms.Theefficiencyofeveryalgorithmthatisinvoked duringstartupinthisexampleisabsolutelykeytoasuccessfulproduct.Even ifyoudon’thavethesebudgetsyoushouldstillstriveforoptimalsolutions. Takingaquantitativeapproachformanysoftwaredevelopmentproperties willmakeyouafarsuperiorprogrammer-measuringone’sworkiscriticalto success.
1.2.2Imperativeprogramminglanguage
Allexamplesaregiveninapseudo-imperativecodingformatandsothereader mustknowthebasicsofsomeimperativemainstreamprogramminglanguage toporttheexampleseffectively,wehavewrittenthisbookwiththefollowing targetlanguagesinmind:
Thereasonthatweareexplicitinthisrequirementissimple—allourimplementationsarebasedonanimperativethinkingstyle.Ifyouareafunctional programmeryouwillneedtoapplyvariousaspectsfromthefunctionalparadigm toproduceefficientsolutionswithrespecttoyourfunctionallanguagewhether itbeHaskell,F#,OCaml,etc.
Twoofthelanguagesthatwehavelisted(C#andJava)targetvirtual machineswhichprovidevariousthingslikesecuritysandboxing,andmemory managementviagarbagecollectionalgorithms.Itistrivialtoportourimplementationstotheselanguages.WhenportingtoC++youmustrememberto usepointersforcertainthings.Forexample,whenwedescribealinkedlist nodeashavingareferencetothenextnode,thisdescriptionisinthecontext ofamanagedenvironment.InC++youshouldinterpretthereferenceasa pointertothenextnodeandsoon.Forprogrammerswhohaveafairamount ofexperiencewiththeirrespectivelanguagethesesubtletieswillpresentnoissue,whichiswhywereallydoemphasisethatthereadermustbecomfortable withatleastoneimperativelanguageinordertosuccessfullyportthepseudoimplementationsinthisbook.
Itisessentialthattheuserisfamiliarwithprimitiveimperativelanguage constructsbeforereadingthisbookotherwiseyouwilljustgetlost.Somealgorithmspresentedinthisbookcanbeconfusingtofollowevenforexperienced programmers!
1.2.3Objectorientedconcepts
Forthemostpartthisbookdoesnotusefeaturesthatarespecifictoanyone language.Inparticular,weneverprovidedatastructuresoralgorithmsthat workongenerictypes—thisisinordertomakethesamplesaseasytofollow aspossible.However,toappreciatethedesignsofourdatastructuresyouwill needtobefamiliarwiththefollowingobjectoriented(OO)concepts:
1. Inheritance
2. Encapsulation
3. Polymorphism
ThisisespeciallyimportantifyouareplanningonlookingattheC#target thatwehaveimplemented(moreonthatin §1.7)whichmakesextensiveuse oftheOOconceptslistedabove.Asafinalnoteitisalsodesirablethatthe readerisfamiliarwithinterfacesastheC#targetusesinterfacesthroughout thesortingalgorithms.
1.3Pseudocode
Throughoutthisbookweusepseudocodetodescribeoursolutions.Forthe mostpartinterpretingthepseudocodeistrivialasitlooksverymuchlikea moreabstractC++,orC#,butthereareafewthingstopointout:
1. Pre-conditionsshouldalwaysbeenforced
2. Post-conditionsrepresenttheresultofapplyingalgorithm a todatastructure d
3. Thetypeofparametersisinferred
4. Allprimitivelanguageconstructsareexplicitlybegunandended Ifanalgorithmhasareturntypeitwilloftenbepresentedinthepostcondition,butwherethereturntypeissufficientlyobviousitmaybeomitted forthesakeofbrevity.
Mostalgorithmsinthisbookrequireparameters,andbecauseweassignno explicittypetothoseparametersthetypeisinferredfromthecontextsinwhich itisused,andtheoperationsperformeduponit.Additionally,thenameof theparameterusuallyactsasthebiggestcluetoitstype.Forinstance n isa pseudo-nameforanumberandsoyoucanassumeunlessotherwisestatedthat n translatestoanintegerthathasthesamenumberofbitsasaWORDona 32bitmachine,similarly l isapseudo-nameforalistwherealistisaresizeable array(e.g.avector).
Thelastmajorpointofreferenceisthatwealwaysexplicitlyendalanguage construct.Forinstanceifwewishtoclosethescopeofa for loopwewill explicitlystate endfor ratherthanleavingtheinterpretationofwhenscopes areclosedtothereader.Whileimplicitscopeclosureworkswellinsimplecode, incomplexcasesitcanleadtoambiguity.
Thepseudocodestylethatweusewithinthisbookisratherstraightforward. Allalgorithmsstartwithasimplealgorithmsignature,e.g.
1) algorithm AlgorithmName(arg1, arg2,..., argN )
2)...
n) end AlgorithmName
Immediatelyafterthealgorithmsignaturewelistany Pre or Post conditions.
1) algorithm AlgorithmName(n)
2) Pre: n isthevaluetocomputethefactorialof
3) n ≥ 0
4) Post: thefactorialof n hasbeencomputed
5)//...
n) end AlgorithmName
Theexampleabovedescribesanalgorithmbythenameof AlgorithmName, whichtakesasinglenumericparameter n.Thepreandpostconditionsfollow thealgorithmsignature;youshouldalwaysenforcethepre-conditionsofan algorithmwhenportingthemtoyourlanguageofchoice.
Normallywhatislistedasapre-coniditioniscriticaltothealgorithmsoperation.Thismaycoverthingsliketheactualparameternotbeingnull,orthatthe collectionpassedinmustcontainatleast n items.Thepost-conditionmainly describestheeffectofthealgorithmsoperation.Anexampleofapost-condition mightbe“Thelisthasbeensortedinascendingorder”
Becauseeverythingwedescribeislanguageindependentyouwillneedto makeyourownminduponhowtobesthandlepre-conditions.Forexample, intheC#targetwehaveimplemented,weconsidernon-conformancetopreconditionstobeexceptionalcases.Weprovideamessageintheexceptionto tellthecallerwhythealgorithmhasfailedtoexecutenormally.
1.4Tipsforworkingthroughtheexamples
Aswithmostbooksyougetoutwhatyouputinandsowerecommendthatin ordertogetthemostoutofthisbookyouworkthrougheachalgorithmwitha penandpapertotrackthingslikevariablenames,recursivecallsetc.
Thebestwaytoworkthroughalgorithmsistosetupatable,andinthat tablegiveeachvariableitsowncolumnandcontinuouslyupdatethesecolumns. Thiswillhelpyoukeeptrackofandvisualisethemutationsthatareoccurring throughoutthealgorithm.Oftenwhileworkingthroughalgorithmsinsuch awayyoucanintuitivelymaprelationshipsbetweendatastructuresrather thantryingtoworkoutafewvaluesonpaperandtherestinyourhead.We suggestyouputeverythingonpaperirrespectiveofhowtrivialsomevariables andcalculationsmaybesothatyoualwayshaveapointofreference.
Whendealingwithrecursivealgorithmtraceswerecommendyoudothe sameastheabove,butalsohaveatablethatrecordsfunctioncallsandwho theyreturnto.Thisapproachisafarcleanerwaythandrawingoutanelaborate mapoffunctioncallswitharrowstooneanother,whichgetslargequicklyand simplymakesthingsmorecomplextofollow.Trackeverythinginasimpleand systematicwaytomakeyourtimestudyingtheimplementationsfareasier.
1.5Bookoutline
Wehavesplitthisbookintotwoparts:
Part1: Providesdiscussionandpseudo-implementationsofcommonanduncommondatastructures;and
Part2: Providesalgorithmsofvaryingpurposesfromsortingtostringoperations.
Thereaderdoesn’thavetoreadthebooksequentiallyfrombeginningto end:chapterscanbereadindependentlyfromoneanother.Wesuggestthat inpart1youreadeachchapterinitsentirety,butinpart2youcangetaway withjustreadingthesectionofachapterthatdescribesthealgorithmyouare interestedin.
Eachofthechaptersondatastructurespresentinitiallythealgorithmsconcernedwith:
Thepreviouslistrepresentswhatwebelieveinthevastmajorityofcasesto bethemostimportantforeachrespectivedatastructure.
Forallreaderswerecommendthatbeforelookingatanyalgorithmyou quicklylookatAppendixEwhichcontainsatablelistingthevarioussymbols usedwithinouralgorithmsandtheirmeaning.Onekeywordthatwewouldlike topointouthereis yield.Youcanthinkof yield inthesamelightas return The return keywordcausesthemethodtoexitandreturnscontroltothecaller, whereas yield returnseachvaluetothecaller.With yield controlonlyreturns tothecallerwhenallvaluestoreturntothecallerhavebeenexhausted.
1.6Testing
Allthedatastructuresandalgorithmshavebeentestedusingaminimisedtest drivendevelopmentstyleonpapertofleshoutthepseudocodealgorithm.We thentranscribethesetestsintounittestssatisfyingthemonebyone.When allthetestcaseshavebeenprogressivelysatisfiedweconsiderthatalgorithm suitablytested.
Forthemostpartalgorithmshavefairlyobviouscaseswhichneedtobe satisfied.Somehoweverhavemanyareaswhichcanprovetobemorecomplex tosatisfy.Withsuchalgorithmswewillpointoutthetestcaseswhicharetricky andthecorrespondingportionsofpseudocodewithinthealgorithmthatsatisfy thatrespectivecase.
Asyoubecomemorefamiliarwiththeactualproblemyouwillbeableto intuitivelyidentifyareaswhichmaycauseproblemsforyouralgorithmsimplementation.Thisinsomecaseswillyieldanoverwhelminglistofconcernswhich willhinderyourabilitytodesignanalgorithmgreatly.Whenyouarebombardedwithsuchavastamountofconcernslookattheoverallproblemagain andsub-dividetheproblemintosmallerproblems.Solvingthesmallerproblems andthencomposingthemisafareasiertaskthancloudingyourmindwithtoo manylittledetails.
Theonlytypeoftestingthatweuseintheimplementationofallthatis providedinthisbookareunittests.Becauseunittestscontributesuchacore pieceofcreatingsomewhatmorestablesoftwareweinvitethereadertoview AppendixDwhichdescribestestinginmoredepth.
1.7WherecanIgetthecode?
Thisbookdoesn’tprovideanycodespecificallyalignedwithit,howeverwedo activelymaintainanopensourceproject1 thathousesaC#implementationof allthepseudocodelisted.Theprojectisnamed DataStructuresandAlgorithms (DSA)andcanbefoundat http://codeplex.com/dsa.
1.8Finalmessages
Wehavejustafewfinalmessagestothereaderthatwehopeyoudigestbefore youembarkonreadingthisbook:
1. Understandhowthealgorithmworksfirstinanabstractsense;and
2. Alwaysworkthroughthealgorithmsonpapertounderstandhowthey achievetheiroutcome
Ifyoualwaysfollowthesekeypoints,youwillgetthemostoutofthisbook.
1 Allreadersareencouragedtoprovidesuggestions,featurerequests,andbugssowecan furtherimproveourimplementations.
PartI DataStructures
Chapter2
LinkedLists
Linkedlistscanbethoughtoffromahighlevelperspectiveasbeingaseries ofnodes.Eachnodehasatleastasinglepointertothenextnode,andinthe lastnode’scaseanullpointerrepresentingthattherearenomorenodesinthe linkedlist.
InDSAourimplementationsoflinkedlistsalwaysmaintainheadandtail pointerssothatinsertionateithertheheadortailofthelistisaconstant timeoperation.Randominsertionisexcludedfromthisandwillbealinear operation.Assuch,linkedlistsinDSAhavethefollowingcharacteristics:
1. Insertionis O(1)
2. Deletionis O(n)
3. Searchingis O(n)
Outofthethreeoperationstheonethatstandsoutisthatofinsertion.In DSAwechosetoalwaysmaintainpointers(ormoreaptlyreferences)tothe node(s)attheheadandtailofthelinkedlistandsoperformingatraditional insertiontoeitherthefrontorbackofthelinkedlistisan O(1)operation.An exceptiontothisruleisperforminganinsertionbeforeanodethatisneither theheadnortailinasinglylinkedlist.Whenthenodeweareinsertingbefore issomewhereinthemiddleofthelinkedlist(knownasrandominsertion)the complexityis O(n).Inordertoaddbeforethedesignatednodeweneedto traversethelinkedlisttofindthatnode’scurrentpredecessor.Thistraversal yieldsan O(n)runtime.
Thisdatastructureistrivial,butlinkedlistshaveafewkeypointswhichat timesmakethemveryattractive:
1. thelistisdynamicallyresized,thusitincursnocopypenaltylikeanarray orvectorwouldeventuallyincur;and
2. insertionis O(1).
2.1SinglyLinkedList
Singlylinkedlistsareoneofthemostprimitivedatastructuresyouwillfindin thisbook.Eachnodethatmakesupasinglylinkedlistconsistsofavalue,and areferencetothenextnode(ifany)inthelist.
2.1.1Insertion
Ingeneralwhenpeopletalkaboutinsertionwithrespecttolinkedlistsofany formtheyimplicitlyrefertotheaddingofanodetothetailofthelist.When youuseanAPIlikethatofDSAandyouseeageneralpurposemethodthat addsanodetothelist,youcanassumethatyouareaddingthenodetothetail ofthelistnotthehead.
Addinganodetoasinglylinkedlisthasonlytwocases:
1. head = ∅ inwhichcasethenodeweareaddingisnowboththe head and tail ofthelist;or
2. wesimplyneedtoappendournodeontotheendofthelistupdatingthe tail referenceappropriately.
1) algorithm Add(value)
2) Pre: value isthevaluetoaddtothelist
3) Post: value hasbeenplacedatthetailofthelist
4) n ← node(value)
5) if head = ∅ 6) head ← n
Asanexampleofthepreviousalgorithmconsideraddingthefollowingsequenceofintegerstothelist:1,45,60,and12,theresultinglististhatof Figure2.2.
2.1.2Searching
Searchingalinkedlistisstraightforward:wesimplytraversethelistchecking thevaluewearelookingforwiththevalueofeachnodeinthelinkedlist.The algorithmlistedinthissectionisverysimilartothatusedfortraversalin §2.1.4.
1) algorithm Contains(head, value)
2) Pre: head istheheadnodeinthelist
3) value isthevaluetosearchfor
4) Post: theitemiseitherinthelinkedlist,true;otherwisefalse
5) n ← head
6) while n = ∅ and n.Value = value
7) n ← n.Next 8) endwhile 9) if n = ∅
returnfalse
endif
returntrue
end Contains
2.1.3Deletion
Deletinganodefromalinkedlistisstraightforwardbutthereareafewcases weneedtoaccountfor:
1. thelistisempty;or
2. thenodetoremoveistheonlynodeinthelinkedlist;or
3. weareremovingtheheadnode;or
4. weareremovingthetailnode;or
5. thenodetoremoveissomewhereinbetweentheheadandtail;or
6. theitemtoremovedoesn’texistinthelinkedlist
Thealgorithmwhosecaseswehavedescribedwillremoveanodefromanywherewithinalistirrespectiveofwhetherthenodeisthe head etc.Ifyouknow thatitemswillonlyeverberemovedfromthe head or tail ofthelistthenyou cancreatemuchmoreconcisealgorithms.Inthecaseofalwaysremovingfrom thefrontofthelinkedlistdeletionbecomesan O(1)operation.
1) algorithm Remove(head, value)
2) Pre: head istheheadnodeinthelist
3) value isthevaluetoremovefromthelist
4) Post: value isremovedfromthelist,true;otherwisefalse
5) if head = ∅ 6)//case1
7) returnfalse
8) endif 9) n ← head 10) if n.Value= value 11) if head = tail
13) head ←∅
tail ←∅ 15) else
17) head ← head.Next
endif
returntrue 20) endif
while n.Next = ∅ and n.Next.Value = value
n ← n.Next
endwhile
if n.Next = ∅ 25) if n.Next= tail
27) tail ← n
endif 29)//thisisonlycase5iftheconditionalonline25was false
n.Next ← n.Next.Next
returnfalse
end Remove
2.1.4Traversingthelist
Traversingasinglylinkedlististhesameasthatoftraversingadoublylinked list(definedin §2.2).Youstartattheheadofthelistandcontinueuntilyou comeacrossanodethatis ∅.Thetwocasesareasfollows:
1. node = ∅,wehaveexhaustedallnodesinthelinkedlist;or
2. wemustupdatethe node referencetobe node.Next.
Thealgorithmdescribedisaverysimpleonethatmakesuseofasimple while looptocheckthefirstcase.
1) algorithm Traverse(head)
2) Pre: head istheheadnodeinthelist
3) Post: theitemsinthelisthavebeentraversed
4) n ← head
5) while n =0
6) yield n.Value
7) n ← n.Next 8)
2.1.5Traversingthelistinreverseorder
Traversingasinglylinkedlistinaforwardmanner(i.e.lefttoright)issimple asdemonstratedin §2.1.4.However,whatifwewantedtotraversethenodesin thelinkedlistinreverseorderforsomereason?Thealgorithmtoperformsuch atraversalisverysimple,andjustlikedemonstratedin §2.1.3wewillneedto acquireareferencetothepredecessorofanode,eventhoughthefundamental characteristicsofthenodesthatmakeupasinglylinkedlistmakethisan expensiveoperation.Foreachnode,findingitspredecessorisan O(n)operation, sooverthecourseoftraversingthewholelistbackwardsthecostbecomes O(n2). Figure2.3depictsthefollowingalgorithmbeingappliedtoalinkedlistwith theintegers5,10,1,and40.
1) algorithm ReverseTraversal(head, tail)
2) Pre: head and tail belongtothesamelist
3) Post: theitemsinthelisthavebeentraversedinreverseorder 4) if tail = ∅ 5) curr ← tail
Thisalgorithmisonlyofrealinterestwhenweareusingsinglylinkedlists, asyouwillsoonseethatdoublylinkedlists(definedin §2.2)makereverselist traversalsimpleandefficient,asshownin §2.2.3.
2.2DoublyLinkedList
Doublylinkedlistsareverysimilartosinglylinkedlists.Theonlydifferenceis thateachnodehasareferencetoboththenextandpreviousnodesinthelist.
Thefollowingalgorithmsforthedoublylinkedlistareexactlythesameas thoselistedpreviouslyforthesinglylinkedlist:
1. Searching(definedin §2.1.2)
2. Traversal(definedin §2.1.4)
2.2.1Insertion
Theonlymajordifferencebetweenthealgorithmin §2.1.1isthatweneedto remembertobindthepreviouspointerof n totheprevioustailnodeif n was notthefirstnodetobeinsertedintothelist.
1) algorithm Add(value)
2) Pre: value isthevaluetoaddtothelist
3) Post: value hasbeenplacedatthetailofthelist
4) n ← node(value)
5) if head = ∅
6) head ← n
7) tail ← n
8) else
9) n.Previous ← tail
10) tail.Next ← n 11) tail ← n
12) endif
13) end Add
Figure2.5showsthedoublylinkedlistafteraddingthesequenceofintegers definedin §2.1.1.
2.2.2Deletion
Asyoumayofguessedthecasesthatweusefordeletioninadoublylinked listareexactlythesameasthosedefinedin §2.1.3.Likeinsertionwehavethe addedtaskofbindinganadditionalreference(Previous)tothecorrectvalue.
1) algorithm Remove(head, value)
2) Pre: head istheheadnodeinthelist
3) value isthevaluetoremovefromthelist
4) Post: value isremovedfromthelist,true;otherwisefalse
Singlylinkedlistshaveaforwardonlydesign,whichiswhythereversetraversal algorithmdefinedin §2.1.5requiredsomecreativeinvention.Doublylinkedlists makereversetraversalassimpleasforwardtraversal(definedin §2.1.4)except thatwestartatthetailnodeandupdatethepointersintheoppositedirection. Figure2.6showsthereversetraversalalgorithminaction.
2.3Summary
Linkedlistsaregoodtousewhenyouhaveanunknownnumberofitemsto store.Usingadatastructurelikeanarraywouldrequireyoutospecifythesize upfront;exceedingthatsizeinvolvesinvokingaresizingalgorithmwhichhas alinearruntime.Youshouldalsouselinkedlistswhenyouwillonlyremove nodesateithertheheadortailofthelisttomaintainaconstantruntime. Thisrequiresmaintainingpointerstothenodesattheheadandtailofthelist butthememoryoverheadwillpayforitselfifthisisanoperationyouwillbe performingmanytimes.
Whatlinkedlistsarenotverygoodforisrandominsertion,accessingnodes byindex,andsearching.Attheexpenseofalittlememory(inmostcases4 byteswouldsuffice),andafewmoreread/writesyoucouldmaintaina count variablethattrackshowmanyitemsarecontainedinthelistsothataccessing suchaprimitivepropertyisaconstantoperation-youjustneedtoupdate count duringtheinsertionanddeletionalgorithms.
Singlylinkedlistsshouldbeusedwhenyouareonlyperformingbasicinsertions.Ingeneraldoublylinkedlistsaremoreaccommodatingfornon-trivial operationsonalinkedlist.
Werecommendtheuseofadoublylinkedlistwhenyourequireforwards andbackwardstraversal.Forthemostcasesthisrequirementispresent.For example,consideratokenstreamthatyouwanttoparseinarecursivedescent fashion.Sometimesyouwillhavetobacktrackinordertocreatethecorrect parsetree.Inthisscenarioadoublylinkedlistisbestasitsdesignmakes bi-directionaltraversalmuchsimplerandquickerthanthatofasinglylinked
Chapter3
BinarySearchTree
Binarysearchtrees(BSTs)areverysimpletounderstand.Westartwitharoot nodewithvalue x,wheretheleftsubtreeof x containsnodeswithvalues <x andtherightsubtreecontainsnodeswhosevaluesare ≥ x.Eachnodefollows thesameruleswithrespecttonodesintheirleftandrightsubtrees.
BSTsareofinterestbecausetheyhaveoperationswhicharefavourablyfast: insertion,lookup,anddeletioncanallbedonein O(logn)time.Itisimportant tonotethatthe O(logn)timesfortheseoperationscanonlybeattainedif theBSTisreasonablybalanced;foratreedatastructurewithselfbalancing propertiesseeAVLtreedefinedin §7).
Inthefollowingexamplesyoucanassume,unlessusedasaparameteralias that root isareferencetotherootnodeofthetree.
3.1Insertion
Asmentionedpreviouslyinsertionisan O(logn)operationprovidedthatthe treeismoderatelybalanced.
1) algorithm Insert(value)
2) Pre: value haspassedcustomtypechecksfortype T
3) Post: value hasbeenplacedinthecorrectlocationinthetree
4) if root = ∅
5) root ← node(value)
6) else
7)InsertNode(root, value)
8) endif
9) end Insert
1) algorithm InsertNode(current, value)
2) Pre: current isthenodetostartfrom
3) Post: value hasbeenplacedinthecorrectlocationinthetree
4) if value<current.Value
5) if current.Left= ∅
6) current.Left ← node(value)
7) else
8)InsertNode(current.Left, value)
9) endif
10) else
11) if current.Right= ∅
12) current.Right ← node(value) 13) else
14)InsertNode(current.Right, value) 15) endif 16) endif 17) end InsertNode
Theinsertionalgorithmissplitforagoodreason.Thefirstalgorithm(nonrecursive)checksaverycorebasecase-whetherornotthetreeisempty.If thetreeisemptythenwesimplycreateourrootnodeandfinish.Inallother casesweinvoketherecursive InsertNode algorithmwhichsimplyguidesusto thefirstappropriateplaceinthetreetoput value.Notethatateachstagewe performabinarychop:weeitherchoosetorecurseintotheleftsubtreeorthe rightbycomparingthenewvaluewiththatofthecurrentnode.Foranytotally orderedtype,novaluecansimultaneouslysatisfytheconditionstoplaceitin bothsubtrees.
3.2Searching
SearchingaBSTisevensimplerthaninsertion.Thepseudocodeisself-explanatory butwewilllookbrieflyatthepremiseofthealgorithmnonetheless. Wehavetalkedpreviouslyaboutinsertion,wegoeitherleftorrightwiththe rightsubtreecontainingvaluesthatare ≥ x where x isthevalueofthenode weareinserting.Whensearchingtherulesaremadealittlemoreatomicand atanyonetimewehavefourcasestoconsider:
1. the root = ∅ inwhichcase value isnotintheBST;or
2. root.Value= value inwhichcase value isintheBST;or
3. value<root.Value,wemustinspecttheleftsubtreeof root for value;or
4. value>root.Value,wemustinspecttherightsubtreeof root for value.
1) algorithm Contains(root, value)
2) Pre: root istherootnodeofthetree, value iswhatwewouldliketolocate
3) Post: value iseitherlocatedornot 4) if root = ∅
3.3Deletion
RemovinganodefromaBSTisfairlystraightforward,withfourcasestoconsider:
1. thevaluetoremoveisaleafnode;or
2. thevaluetoremovehasarightsubtree,butnoleftsubtree;or
3. thevaluetoremovehasaleftsubtree,butnorightsubtree;or
4. thevaluetoremovehasbothaleftandrightsubtreeinwhichcasewe promotethelargestvalueintheleftsubtree.
Thereisalsoanimplicitfifthcasewherebythenodetoberemovedisthe onlynodeinthetree.Thiscaseisalreadycoveredbythefirst,butshouldbe notedasapossibilitynonetheless.
OfcourseinaBSTavaluemayoccurmorethanonce.Insuchacasethe firstoccurrenceofthatvalueintheBSTwillberemoved.
The Remove algorithmgivenbelowreliesontwofurtherhelperalgorithms named FindParent,and FindNode whicharedescribedin §3.4and §3.5respectively.
1) algorithm Remove(value)
2) Pre: value isthevalueofthenodetoremove, root istherootnodeoftheBST
3)CountisthenumberofitemsintheBST
3) Post: nodewith value isremovediffoundinwhichcaseyieldstrue,otherwisefalse
value
3.4Findingtheparentofagivennode
Thepurposeofthisalgorithmissimple-toreturnareference(orpointer)to theparentnodeoftheonewiththegivenvalue.Wehavefoundthatsuchan algorithmisveryuseful,especiallywhenperformingextensivetreetransformations.
1) algorithm FindParent(value, root)
2) Pre: value isthevalueofthenodewewanttofindtheparentof
3) root istherootnodeoftheBSTandis!= ∅
4) Post: areferencetotheparentnodeof value iffound;otherwise ∅
5) if value = root.Value
6) return ∅ 7) endif
8) if value<root.Value
9) if root.Left= ∅
10) return ∅
11) elseif root.Left.Value= value
12) return root
13) else
14) return FindParent(value, root.Left)
15) endif 16) else
17) if root.Right= ∅ 18) return ∅ 19) elseif root.Right.Value= value
20) return root 21) else
22) return FindParent(value, root.Right)
23) endif 24) endif 25) end FindParent
Aspecialcaseintheabovealgorithmiswhenthespecifiedvaluedoesnot existintheBST,inwhichcasewereturn ∅.Callerstothisalgorithmmusttake accountofthispossibilityunlesstheyarealreadycertainthatanodewiththe specifiedvalueexists.
3.5Attainingareferencetoanode
Thisalgorithmisverysimilarto §3.4,butinsteadofreturningareferencetothe parentofthenodewiththespecifiedvalue,itreturnsareferencetothenode itself.Again, ∅ isreturnedifthevalueisn’tfound.
1) algorithm FindNode(root, value)
2) Pre: value isthevalueofthenodewewanttofindtheparentof
3) root istherootnodeoftheBST
areferencetothenodeof value iffound;otherwise ∅
Astutereaderswillhavenoticedthatthe FindNode algorithmisexactlythe sameasthe Contains algorithm(definedin §3.2)withthemodificationthat wearereturningareferencetoanodenot true or false.Given FindNode, theeasiestwayofimplementing Contains istocall FindNode andcomparethe returnvaluewith ∅.
3.6Findingthesmallestandlargestvaluesin thebinarysearchtree
TofindthesmallestvalueinaBSTyousimplytraversethenodesintheleft subtreeoftheBSTalwaysgoingleftuponeachencounterwithanode,terminatingwhenyoufindanodewithnoleftsubtree.Theoppositeisthecasewhen findingthelargestvalueintheBST.Bothalgorithmsareincrediblysimple,and arelistedsimplyforcompleteness.
Thebasecaseinboth FindMin,and FindMax algorithmsiswhentheLeft (FindMin),orRight(FindMax)nodereferencesare ∅ inwhichcasewehave reachedthelastnode.
1) algorithm FindMin(root)
2) Pre: root istherootnodeoftheBST
3) root = ∅
4) Post: thesmallestvalueintheBSTislocated 5) if root.Left= ∅ 6) return root.Value 7) endif 8)FindMin(root.Left) 9) end FindMin
1) algorithm FindMax(root)
2) Pre: root istherootnodeoftheBST
3) root = ∅
4) Post: thelargestvalueintheBSTislocated
5) if root.Right= ∅
6) return root.Value
7) endif
8)FindMax(root.Right)
9) end FindMax
3.7TreeTraversals
Therearevariousstrategieswhichcanbeemployedtotraversetheitemsina tree;thechoiceofstrategydependsonwhichnodevisitationorderyourequire. InthissectionwewilltouchonthetraversalsthatDSAprovidesonalldata structuresthatderivefrom BinarySearchTree
3.7.1Preorder
Whenusingthepreorderalgorithm,youvisittherootfirst,thentraversetheleft subtreeandfinallytraversetherightsubtree.Anexampleofpreordertraversal isshowninFigure3.3.
1) algorithm Preorder(root)
2) Pre: root istherootnodeoftheBST
3) Post: thenodesintheBSThavebeenvisitedinpreorder
4) if root = ∅
5) yield root.Value
6)Preorder(root.Left)
7)Preorder(root.Right)
8) endif
9) end Preorder
3.7.2Postorder
Thisalgorithmisverysimilartothatdescribedin §3.7.1,howeverthevalue ofthenodeisyieldedaftertraversingbothsubtrees.Anexampleofpostorder traversalisshowninFigure3.4.
1) algorithm Postorder(root)
2) Pre: root istherootnodeoftheBST
3) Post: thenodesintheBSThavebeenvisitedinpostorder
4) if root = ∅
5)Postorder(root.Left)
6)Postorder(root.Right)
7) yield root.Value
8) endif
9) end Postorder
3.7.3Inorder
Anothervariationofthealgorithmsdefinedin §3.7.1and §3.7.2isthatofinorder traversalwherethevalueofthecurrentnodeisyieldedinbetweentraversing theleftsubtreeandtherightsubtree.Anexampleofinordertraversalisshown inFigure3.5.
Figure3.5:Inordervisitbinarysearchtreeexample
1) algorithm Inorder(root)
2) Pre: root istherootnodeoftheBST
3) Post: thenodesintheBSThavebeenvisitedininorder
4) if root = ∅
5)Inorder(root.Left)
6) yield root.Value
7)Inorder(root.Right)
8) endif
9) end Inorder
Oneofthebeautiesofinordertraversalisthatvaluesareyieldedintheir comparisonorder.Inotherwords,whentraversingapopulatedBSTwiththe inorderstrategy,theyieldedsequencewouldhaveproperty xi ≤ xi+1∀i.
3.7.4BreadthFirst
Traversingatreeinbreadthfirstorderyieldsthevaluesofallnodesofaparticulardepthinthetreebeforeanydeeperones.Inotherwords,givenadepth d wewouldvisitthevaluesofallnodesat d inalefttorightfashion,thenwe wouldproceedto d +1andsoonuntilwehadenomorenodestovisit.An exampleofbreadthfirsttraversalisshowninFigure3.6.
Traditionallybreadthfirsttraversalisimplementedusingalist(vector,resizeablearray,etc)tostorethevaluesofthenodesvisitedinbreadthfirstorder andthenaqueuetostorethosenodesthathaveyettobevisited.
1) algorithm BreadthFirst(root)
2) Pre: root istherootnodeoftheBST
3) Post: thenodesintheBSThavebeenvisitedinbreadthfirstorder
4) q ← queue
5) while root = ∅
6) yield root.Value
7) if root.Left = ∅
8) q.Enqueue(root.Left)
9) endif 10) if root.Right = ∅ 11) q.Enqueue(root.Right) 12) endif 13) if !q.IsEmpty()
14) root ← q.Dequeue()
15) else
16) root ←∅
17) endif 18) endwhile 19) end BreadthFirst
3.8Summary
Abinarysearchtreeisagoodsolutionwhenyouneedtorepresenttypesthatare orderedaccordingtosomecustomrulesinherenttothattype.Withlogarithmic insertion,lookup,anddeletionitisveryeffecient.Traversalremainslinear,but therearemanywaysinwhichyoucanvisitthenodesofatree.Treesare recursivedatastructures,sotypicallyyouwillfindthatmanyalgorithmsthat operateonatreearerecursive.
Theruntimespresentedinthischapterarebasedonaprettybigassumption -thatthebinarysearchtree’sleftandrightsubtreesarereasonablybalanced. Wecanonlyattainlogarithmicruntimesforthealgorithmspresentedearlier whenthisistrue.Abinarysearchtreedoesnotenforcesuchaproperty,and theruntimesfortheseoperationsonapathologicallyunbalancedtreebecome linear:suchatreeiseffectivelyjustalinkedlist.Laterin §7wewillexamine anAVLtreethatenforcesself-balancingpropertiestohelpattainlogarithmic runtimes.
Chapter4
Heap
Aheapcanbethoughtofasasimpletreedatastructure,howeveraheapusually employsoneoftwostrategies:
1. minheap;or
2. maxheap
Eachstrategydeterminesthepropertiesofthetreeanditsvalues.Ifyou weretochoosetheminheapstrategytheneachparentnodewouldhaveavalue thatis ≤ thanitschildren.Forexample,thenodeattherootofthetreewill havethesmallestvalueinthetree.Theoppositeistrueforthemaxheap strategy.Inthisbookyoushouldassumethataheapemploystheminheap strategyunlessotherwisestated.
Unlikeothertreedatastructuresliketheonedefinedin §3aheapisgenerally implementedasanarrayratherthanaseriesofnodeswhicheachhavereferencestoothernodes.Thenodesareconceptuallythesame,however,havingat mosttwochildren.Figure4.1showshowthetree(notaheapdatastructure) (127(32)6(9))wouldberepresentedasanarray.ThearrayinFigure4.1isa resultofsimplyaddingvaluesinatop-to-bottom,left-to-rightfashion.Figure 4.2showsarrowstothedirectleftandrightchildofeachvalueinthearray.
Thischapterisverymuchcentredaroundthenotionofrepresentingatreeas anarrayandbecausethispropertyiskeytounderstandingthischapterFigure 4.3showsastepbystepprocesstorepresentatreedatastructureasanarray. InFigure4.3youcanassumethatthedefaultcapacityofourarrayiseight.
Usingjustanarrayisoftennotsufficientaswehavetobeupfrontaboutthe sizeofthearraytousefortheheap.Oftentheruntimebehaviourofaprogram canbeunpredictablewhenitcomestothesizeofitsinternaldatastructures, soweneedtochooseamoredynamicdatastructurethatcontainsthefollowing properties:
1. wecanspecifyaninitialsizeofthearrayforscenarioswhereweknowthe upperstoragelimitrequired;and
2. thedatastructureencapsulatesresizingalgorithmstogrowthearrayas requiredatruntime
Figure4.1doesnotspecifyhowwewouldhandleaddingnullreferencesto theheap.Thisvariesfromcasetocase;sometimesnullvaluesareprohibited entirely;inothercaseswemaytreatthemasbeingsmallerthananynon-null value,orindeedgreaterthananynon-nullvalue.Youwillhavetoresolvethis ambiguityyourselfhavingstudiedyourrequirements.Forthesakeofclaritywe willavoidtheissuebyprohibitingnullvalues.
Becauseweareusinganarrayweneedsomewaytocalculatetheindexofa parentnode,andthechildrenofanode.Therequiredexpressionsforthisare definedasfollowsforanodeat index:
1. (index 1)/2(parentindex)
2. 2 ∗ index +1(leftchild)
3. 2 ∗ index +2(rightchild)
InFigure4.4a)representsthecalculationoftherightchildof12(2 ∗ 0+2); andb)calculatestheindexoftheparentof3((3 1)/2).
4.1Insertion
Designinganalgorithmforheapinsertionissimple,butwemustensurethat heaporderispreservedaftereachinsertion.Generallythisisapost-insertion operation.Insertingavalueintothenextfreeslotinanarrayissimple:wejust needtokeeptrackofthenextfreeindexinthearrayasacounter,andincrement itaftereachinsertion.Insertingourvalueintotheheapisthefirstpartofthe algorithm;thesecondisvalidatingheaporder.Inthecaseofmin-heapordering thisrequiresustoswapthevaluesofaparentanditschildifthevalueofthe childis < thevalueofitsparent.Wemustdothisforeachsubtreecontaining thevaluewejustinserted.
Theruntimeefficiencyforheapinsertionis O(logn).Theruntimeisa byproductofverifyingheaporderasthefirstpartofthealgorithm(theactual insertionintothearray)is O(1).
Figure4.5showsthestepsofinsertingthevalues3,9,12,7,and1intoa min-heap.
1) algorithm Add(value)
2) Pre: value isthevaluetoaddtotheheap
3)Countisthenumberofitemsintheheap
4) Post: thevaluehasbeenaddedtotheheap
5) heap[Count] ← value
6)Count ← Count+1
7)MinHeapify()
8) end Add
1) algorithm MinHeapify()
2) Pre: Countisthenumberofitemsintheheap
3) heap isthearrayusedtostoretheheapitems
4) Post: theheaphaspreservedminheapordering
5) i ← Count 1
6) while i> 0 and heap[i] <heap[(i 1)/2]
7)Swap(heap[i], heap[(i 1)/2]
8) i ← (i 1)/2
9) endwhile
10) end MinHeapify
Thedesignofthe MaxHeapify algorithmisverysimilartothatofthe MinHeapify algorithm,theonlydifferenceisthatthe < operatorinthesecond conditionofenteringthewhileloopischangedto >
4.2Deletion
Justasforinsertion,deletinganiteminvolvesensuringthatheaporderingis preserved.Thealgorithmfordeletionhasthreesteps:
1. findtheindexofthevaluetodelete
2. putthelastvalueintheheapattheindexlocationoftheitemtodelete
3. verifyheaporderingforeachsubtreewhichusedtoincludethevalue
1) algorithm Remove(value)
2) Pre: value isthevaluetoremovefromtheheap
3) left,and right areupdatedalias’for2 ∗ index +1,and2 ∗ index +2respectively
4)Countisthenumberofitemsintheheap
5) heap isthearrayusedtostoretheheapitems
6) Post: value islocatedintheheapandremoved,true;otherwisefalse
7)//step1
8) index ← FindIndex(heap, value)
9) if index< 0
10) returnfalse
11) endif
12)Count ← Count 1
13)//step2
14) heap[index] ← heap[Count]
15)//step3
16) while left< Count and heap[index] >heap[left] or heap[index] >heap[right]
17)//promotesmallestkeyfromsubtree
18) if heap[left] <heap[right]
19)Swap(heap, left, index)
20) index ← left
21) else
22)Swap(heap, right, index)
23) index ← right
24) endif
25) endwhile
26) returntrue
27) end Remove
Figure4.6showsthe Remove algorithmvisually,removing1fromaheap containingthevalues1,3,9,12,and13.InFigure4.6youcanassumethatwe havespecifiedthatthebackingarrayoftheheapshouldhaveaninitialcapacity ofeight.
Pleasenotethatinourdeletionalgorithmthatwedon’tdefaulttheremoved valueinthe heap array.Ifyouareusingaheapforreferencetypes,i.e.objects thatareallocatedonaheapyouwillwanttofreethatmemory.Thisisimportant inbothunmanaged,andmanagedlanguages.Inthelatterwewillwanttonull thatemptyholesothatthegarbagecollectorcanreclaimthatmemory.Ifwe weretonotnullthatholethentheobjectcouldstillbereachedandthuswon’t begarbagecollected.
4.3Searching
Searchingaheapismerelyamatteroftraversingtheitemsintheheaparray sequentially,sothisoperationhasaruntimecomplexityof O(n).Thesearch canbethoughtofasonethatusesabreadthfirsttraversalasdefinedin §3.7.4 tovisitthenodeswithintheheaptocheckforthepresenceofaspecifieditem.
Theproblemwiththepreviousalgorithmisthatwedon’ttakeadvantage ofthepropertiesinwhichallvaluesofaheaphold,thatisthepropertyofthe heapstrategybeingused.Forinstanceifwehadaheapthatdidn’tcontainthe value4wewouldhavetoexhaustthewholebackingheaparraybeforewecould determinethatitwasn’tpresentintheheap.Factoringinwhatweknowabout theheapwecanoptimisethesearchalgorithmbyincludinglogicwhichmakes useofthepropertiespresentedbyacertainheapstrategy.
Optimisingtodeterministicallystatethatavalueisintheheapisnotthat straightforward,howevertheproblemisaveryinterestingone.Asanexample consideramin-heapthatdoesn’tcontainthevalue5.Wecanonlyrulethatthe valueisnotintheheapif5 > theparentofthecurrentnodebeinginspected and < thecurrentnodebeinginspected ∀ nodesatthecurrentlevelweare traversing.Ifthisisthecasethen5cannotbeintheheapandsowecan provideananswerwithouttraversingtherestoftheheap.Ifthispropertyis notsatisfiedforanylevelofnodesthatweareinspectingthenthealgorithm willindeedfallbacktoinspectingallthenodesintheheap.Theoptimisation thatwepresentcanbeverycommonandsowefeelthattheextralogicwithin theloopisjustifiedtopreventtheexpensiveworsecaseruntime.
Thefollowingalgorithmisspecificallydesignedforamin-heap.Totailorthe algorithmforamax-heapthetwocomparisonoperationsinthe elseif condition withintheinner while loopshouldbeflipped.
1) algorithm Contains(value)
2) Pre: value isthevaluetosearchtheheapfor
3)Countisthenumberofitemsintheheap
4) heap isthearrayusedtostoretheheapitems
5) Post: value islocatedintheheap,inwhichcasetrue;otherwisefalse
6) start ← 0
7) nodes ← 1
Thenew Contains algorithmdeterminesifthe value isnotintheheapby checkingwhether count = nodes.Insuchaneventwherethisistruethenwe canconfirmthat ∀ nodes n atlevel i : value> Parent(n), value<n thusthere isnopossiblewaythat value isintheheap.AsanexampleconsiderFigure4.7. Ifwearesearchingforthevalue10withinthemin-heapdisplayeditisobvious thatwedon’tneedtosearchthewholeheaptodetermine9isnotpresent.We canverifythisaftertraversingthenodesinthesecondleveloftheheapasthe previousexpressiondefinedholdstrue.
4.4Traversal
Asmentionedin §4.3traversalofaheapisusuallydonelikethatofanyother arraydatastructurewhichourheapimplementationisbasedupon.Asaresult youtraversethearraystartingattheinitialarrayindex(0inmostlanguages) andthenvisiteachvaluewithinthearrayuntilyouhavereachedtheupper boundoftheheap.Youwillnotethatinthesearchalgorithmthatweuse Count asthisupperboundratherthantheactualphysicalboundoftheallocated array. Count isusedtopartitiontheconceptualheapfromtheactualarray implementationoftheheap:weonlycareabouttheitemsintheheap,notthe wholearray—thelattermaycontainvariousotherbitsofdataasaresultof heapmutation.
Ifyouhavefollowedtheadvicewegaveinthedeletionalgorithmthena heapthathasbeenmutatedseveraltimeswillcontainsomeformofdefault valueforitemsnolongerintheheap.Potentiallyyouwillhaveatmost LengthOf (heapArray) Count garbagevaluesinthebackingheaparraydata structure.Thegarbagevaluesofcoursevaryfromplatformtoplatform.To makethingssimplethegarbagevalueofareferencetypewillbesimple ∅ and0 foravaluetype.
Figure4.8showsaheapthatyoucanassumehasbeenmutatedmanytimes. Forthisexamplewecanfurtherassumethatatsomepointtheitemsinindexes 3 5actuallycontainedreferencestoliveobjectsoftype T .InFigure4.8 subscriptisusedtodisambiguateseparateobjectsof T
Fromwhatyouhavereadthusfaryouwillmostlikelyhavepickedupthat traversingtheheapinanyotherorderwouldbeoflittlebenefit.Theheap propertyonlyholdsforthesubtreeofeachnodeandsotraversingaheapin anyotherfashionrequiressomecreativeintervention.Heapsarenotusually traversedinanyotherwaythantheoneprescribedpreviously.
4.5Summary
Heapsaremostcommonlyusedtoimplementpriorityqueues(see §6.2fora sampleimplementation)andtofacilitateheapsort.Asdiscussedinboththe insertion §4.1anddeletion §4.2sectionsaheapmaintainsheaporderaccording totheselectedorderingstrategy.Thesestrategiesarereferredtoasmin-heap,
andmaxheap.Theformerstrategyenforcesthatthevalueofaparentnodeis lessthanthatofeachofitschildren,thelatterenforcesthatthevalueofthe parentisgreaterthanthatofeachofitschildren.
Whenyoucomeacrossaheapandyouarenottoldwhatstrategyitenforces youshouldassumethatitusesthemin-heapstrategy.Iftheheapcanbe configuredotherwise,e.g.tousemax-heapthenthiswilloftenrequireyouto statethisexplicitly.Theheapabidesprogressivelytoastrategyduringthe invocationoftheinsertion,anddeletionalgorithms.Thecostofsuchapolicyis thatuponeachinsertionanddeletionweinvokealgorithmsthathavelogarithmic runtimecomplexities.Whilethecostofmaintainingthestrategymightnot seemoverlyexpensiveitdoesstillcomeataprice.Wewillalsohavetofactor inthecostofdynamicarrayexpansionatsomestage.Thiswilloccurifthe numberofitemswithintheheapoutgrowsthespaceallocatedintheheap’s backingarray.Itmaybeinyourbestinteresttoresearchagoodinitialstarting sizeforyourheaparray.Thiswillassistinminimisingtheimpactofdynamic arrayresizing.
Chapter5
Sets
Asetcontainsanumberofvalues,innoparticularorder.Thevalueswithin thesetaredistinctfromoneanother.
Generallysetimplementationstendtocheckthatavalueisnotintheset beforeaddingit,avoidingtheissueofrepeatedvaluesfromeveroccurring.
Thissectiondoesnotcoversettheoryindepth;ratheritdemonstratesbriefly thewaysinwhichthevaluesofsetscanbedefined,andcommonoperationsthat maybeperformeduponthem.
Thenotation A = {4, 7, 9, 12, 0} definesaset A whosevaluesarelistedwithin thecurlybraces.
Giventheset A definedpreviouslywecansaythat4isamemberof A denotedby4 ∈ A,andthat99isnotamemberof A denotedby99 / ∈ A
Oftendefiningasetbymanuallystatingitsmembersistiresome,andmore importantlythesetmaycontainalargenumberofvalues.Amoreconciseway ofdefiningasetanditsmembersisbyprovidingaseriesofpropertiesthatthe valuesofthesetmustsatisfy.Forexample,fromthedefinition A = {x|x> 0,x %2=0} theset A containsonlypositiveintegersthatareeven. x isan aliastothecurrentvalueweareinspectingandtotherighthandsideof | are thepropertiesthat x mustsatisfytobeintheset A.Inthisexample, x must be > 0,andtheremainderofthearithmeticexpression x/2mustbe0.Youwill beabletonotefromthepreviousdefinitionoftheset A thatthesetcancontain aninfinitenumberofvalues,andthatthevaluesoftheset A willbealleven integersthatareamemberofthenaturalnumbersset N,where N = {1, 2, 3,...}.
Finallyinthisbriefintroductiontosetswewillcoversetintersectionand union,bothofwhichareverycommonoperations(amongstmanyothers)performedonsets.Theunionsetcanbedefinedasfollows A ∪ B = {x | x ∈ Aorx ∈ B},andintersection A ∩ B = {x | x ∈ Aandx ∈ B}.Figure5.1 demonstratessetintersectionanduniongraphically.
Giventhesetdefinitions A = {1, 2, 3},and B = {6, 2, 9} theunionofthetwo setsis A ∪ B = {1, 2, 3, 6, 9},andtheintersectionofthetwosetsis A ∩ B = {2}
Bothsetunionandintersectionaresometimesprovidedwithintheframeworkassociatedwithmainstreamlanguages.Thisisthecasein.NET3.51 wheresuchalgorithmsexistasextensionmethodsdefinedinthetype System.Linq.Enumerable 2,asaresultDSAdoesnotprovideimplementationsof
1 http://www.microsoft.com/NET/
2 http://msdn.microsoft.com/en-us/library/system.linq.enumerable_members.aspx
thesealgorithms.Mostofthealgorithmsdefinedin System.Linq.Enumerable dealmainlywithsequencesratherthansetsexclusively.
Setunioncanbeimplementedasasimpletraversalofbothsetsaddingeach itemofthetwosetstoanewunionset.
1) algorithm Union(set1, set2)
2) Pre: set1,and set2 = ∅
3) union isaset
3) Post: Aunionof set1,and set2hasbeencreated
4) foreach item in set1
5) union.Add(item)
6) endforeach
7) foreach item in set2
8) union.Add(item)
9) endforeach
10) return union
11) end Union
Theruntimeofour Union algorithmis O(m + n)where m isthenumber ofitemsinthefirstsetand n isthenumberofitemsinthesecondset.This runtimeappliesonlytosetsthatexhibit O(1)insertions.
Setintersectionisalsotrivialtoimplement.Theonlymajorthingworth pointingoutaboutouralgorithmisthatwetraversethesetcontainingthe fewestitems.Wecandothisbecauseifwehaveexhaustedalltheitemsinthe smallerofthetwosetsthentherearenomoreitemsthataremembersofboth sets,thuswehavenomoreitemstoaddtotheintersectionset.
1) algorithm Intersection(set1, set2)
2) Pre: set1,and set2 = ∅
3) intersection,and smallerSet aresets
3) Post: Anintersectionof set1,and set2hasbeencreated
4) if set1.Count <set2.Count
5) smallerSet ← set1
6) else
7) smallerSet ← set2
8) endif
9) foreach item in smallerSet
10) if set1.Contains(item) and set2.Contains(item)
11) intersection.Add(item)
12) endif 13) endforeach 14) return intersection 15) end Intersection
Theruntimeofour Intersection algorithmis O(n)where n isthenumber ofitemsinthesmallerofthetwosets.Justlikeour Union algorithmalinear runtimecanonlybeattainedwhenoperatingonasetwith O(1)insertion.
5.1Unordered
Setsinthegeneralsensedonotenforcetheexplicitorderingoftheirmembers.Forexamplethemembersof B = {6, 2, 9} conformtonoorderingscheme becauseitisnotrequired.
MostlibrariesprovideimplementationsofunorderedsetsandsoDSAdoes not;wesimplymentionitheretodisambiguatebetweenanunorderedsetand orderedset.
Wewillonlylookatinsertionforanunorderedsetandcoverbrieflywhya hashtableisanefficientdatastructuretouseforitsimplementation.
5.1.1Insertion
Anunorderedsetcanbeefficientlyimplementedusingahashtableasitsbacking datastructure.Asmentionedpreviouslyweonlyaddanitemtoasetifthat itemisnotalreadyintheset,sothebackingdatastructureweusemusthave aquicklookupandinsertionruntimecomplexity.
Ahashmapgenerallyprovidesthefollowing:
1. O(1)forinsertion
2. approaching O(1)forlookup
Theabovedependsonhowgoodthehashingalgorithmofthehashtable is,butmosthashtablesemployincrediblyefficientgeneralpurposehashing algorithmsandsotheruntimecomplexitiesforthehashtableinyourlibrary ofchoiceshouldbeverysimilarintermsofefficiency.
5.2Ordered
Anorderedsetissimilartoanunorderedsetinthesensethatitsmembersare distinct,butanorderedsetenforcessomepredefinedcomparisononeachofits memberstoproduceasetwhosemembersareorderedappropriately.
InDSA0.5andearlierweusedabinarysearchtree(definedin §3)asthe internalbackingdatastructureforourorderedset.Fromversions0.6onwards wereplacedthebinarysearchtreewithanAVLtreeprimarilybecauseAVLis balanced.
Theorderedsethasitsorderrealisedbyperforminganinordertraversal uponitsbackingtreedatastructurewhichyieldsthecorrectorderedsequence ofsetmembers.
BecauseanorderedsetinDSAissimplyawrapperforanAVLtreethat additionallyensuresthatthetreecontainsuniqueitemsyoushouldread §7to learnmoreabouttheruntimecomplexitiesassociatedwithitsoperations.
5.3Summary
Setsprovideawayofhavingacollectionofuniqueobjects,eitherorderedor unordered.
Whenimplementingaset(eitherorderedorunordered)itiskeytoselect thecorrectbackingdatastructure.Aswediscussedin §5.1.1becausewecheck firstiftheitemisalreadycontainedwithinthesetbeforeaddingitweneed thischecktobeasquickaspossible.Forunorderedsetswecanrelyontheuse ofahashtableandusethekeyofanitemtodeterminewhetherornotitis alreadycontainedwithintheset.Usingahashtablethischeckresultsinanear constantruntimecomplexity.Orderedsetscostalittlemoreforthischeck, howeverthelogarithmicgrowththatweincurbyusingabinarysearchtreeas itsbackingdatastructureisacceptable.
Anotherkeypropertyofsetsimplementedusingtheapproachwedescribeis thatbothhavefavourablyfastlook-uptimes.Justlikethecheckbeforeinsertion,forahashtablethisruntimecomplexityshouldbenearconstant.Ordered setsasdescribedin3performabinarychopateachstagewhensearchingfor theexistenceofanitemyieldingalogarithmicruntime.
Wecanusesetstofacilitatemanyalgorithmsthatwouldotherwisebealittle lessclearintheirimplementation.Forexamplein §11.4weuseanunordered settoassistintheconstructionofanalgorithmthatdeterminesthenumberof repeatedwordswithinastring.
Chapter6
Queues
Queuesareanessentialdatastructurethatarefoundinvastamountsofsoftwarefromusermodetokernelmodeapplicationsthatarecoretothesystem. Fundamentallytheyhonourafirstinfirstout(FIFO)strategy,thatistheitem firstputintothequeuewillbethefirstserved,theseconditemaddedtothe queuewillbethesecondtobeservedandsoon.
Atraditionalqueueonlyallowsyoutoaccesstheitematthefrontofthe queue;whenyouaddanitemtothequeuethatitemisplacedatthebackof thequeue.
Historicallyqueuesalwayshavethefollowingthreecoremethods:
Enqueue: placesanitematthebackofthequeue;
Dequeue: retrievestheitematthefrontofthequeue,andremovesitfromthe queue;
Peek: 1 retrievestheitematthefrontofthequeuewithoutremovingitfrom thequeue
Asanexampletodemonstratethebehaviourofaqueuewewillwalkthrough ascenariowherebyweinvokeeachofthepreviouslymentionedmethodsobservingthemutationsuponthequeuedatastructure.Thefollowinglistdescribes theoperationsperformeduponthequeueinFigure6.1:
1. Enqueue(10) 2. Enqueue(12) 3. Enqueue(9) 4. Enqueue(8) 5. Enqueue(3) 6. Dequeue() 7. Peek()6.1Astandardqueue
Aqueueisimplicitlylikethatdescribedpriortothissection.InDSAwedon’t provideastandardqueuebecausequeuesaresopopularandsuchacoredata structurethatyouwillfindprettymucheverymainstreamlibraryprovidesa queuedatastructurethatyoucanusewithyourlanguageofchoice.Inthis sectionwewilldiscusshowyoucan,ifrequired,implementanefficientqueue datastructure.
Themainpropertyofaqueueisthatwehaveaccesstotheitematthe frontofthequeue.Thequeuedatastructurecanbeefficientlyimplemented usingasinglylinkedlist(definedin §2.1).Asinglylinkedlistprovides O(1) insertionanddeletionruntimecomplexities.Thereasonwehavean O(1)run timecomplexityfordeletionisbecauseweonlyeverremoveitemsfromthefront ofqueues(withtheDequeueoperation).Sincewealwayshaveapointertothe itemattheheadofasinglylinkedlist,removalissimplyacaseofreturning thevalueoftheoldheadnode,andthenmodifyingtheheadpointertobethe nextnodeoftheoldheadnode.Theruntimecomplexityforsearchingaqueue remainsthesameasthatofasinglylinkedlist: O(n).
6.2PriorityQueue
Unlikeastandardqueuewhereitemsareorderedintermsofwhoarrivedfirst, apriorityqueuedeterminestheorderofitsitemsbyusingaformofcustom comparertoseewhichitemhasthehighestpriority.Otherthantheitemsina priorityqueuebeingorderedbypriorityitremainsthesameasanormalqueue: youcanonlyaccesstheitematthefrontofthequeue.
Asensibleimplementationofapriorityqueueistouseaheapdatastructure (definedin §4).Usingaheapwecanlookatthefirstiteminthequeuebysimply returningtheitematindex0withintheheaparray.Aheapprovidesuswiththe abilitytoconstructapriorityqueuewheretheitemswiththehighestpriority areeitherthosewiththesmallestvalue,orthosewiththelargest.
6.3DoubleEndedQueue
Unlikethequeueswehavetalkedaboutpreviouslyinthischapteradouble endedqueueallowsyoutoaccesstheitemsatboththefront,andbackofthe queue.Adoubleendedqueueiscommonlyknownasadequewhichisthename wewillhereoninrefertoitas.
Adequeappliesnoprioritizationstrategytoitsitemslikeapriorityqueue does,itemsareaddedinordertoeitherthefrontofbackofthedeque.The formerpropertiesofthedequearedenotedbytheprogrammerutilisingthedata structuresexposedinterface.
Deque’sprovidefrontandbackspecificversionsofcommonqueueoperations, e.g.youmaywanttoenqueueanitemtothefrontofthequeueratherthan thebackinwhichcaseyouwoulduseamethodwithanamealongthelines of EnqueueFront.Thefollowinglistidentifiesoperationsthatarecommonly supportedbydeque’s:
• EnqueueFront
• EnqueueBack
• DequeueFront
• DequeueBack
• PeekFront
• PeekBack
Figure6.2showsadequeaftertheinvocationofthefollowingmethods(inorder):
1. EnqueueBack(12)
2. EnqueueFront(1)
3. EnqueueBack(23)
4. EnqueueFront(908)
5. DequeueFront()
6. DequeueBack()
Theoperationshaveaone-to-onetranslationintermsofbehaviourwith thoseofanormalqueue,orpriorityqueue.Insomecasesthesetofalgorithms thataddanitemtothebackofthedequemaybenamedastheyarewith normalqueues,e.g. EnqueueBack maysimplybecalled Enqueue ansoon.Some frameworksalsospecifyexplicitbehaviour’sthatdatastructuresmustadhereto. Thisiscertainlythecasein.NETwheremostcollectionsimplementaninterface whichrequiresthedatastructuretoexposeastandard Add method.Insuch ascenarioyoucansafelyassumethatthe Add methodwillsimplyenqueuean itemtothebackofthedeque.
Withrespecttoalgorithmicruntimecomplexitiesadequeisthesameas anormalqueue.Thatisenqueueinganitemtothebackofathequeueis O(1),additionallyenqueuinganitemtothefrontofthequeueisalsoan O(1) operation.
Adequeisawrapperdatastructurethatuseseitheranarray,oradoubly linkedlist.Usinganarrayasthebackingdatastructurewouldrequiretheprogrammertobeexplicitaboutthesizeofthearrayupfront,thiswouldprovide anobviousadvantageiftheprogrammercoulddeterministicallystatethemaximumnumberofitemsthedequewouldcontainatanyonetime.Unfortunately inmostcasesthisdoesn’thold,asaresultthebackingarraywillinherently incurtheexpenseofinvokingaresizingalgorithmwhichwouldmostlikelybe an O(n)operation.Suchanapproachwouldalsoleavethelibrarydeveloper
tolookatarrayminimizationtechniquesaswell,itcouldbethatafterseveral invocationsoftheresizingalgorithmandvariousmutationsonthedequelater thatwehaveanarraytakingupaconsiderableamountofmemoryyetweare onlyusingafewsmallpercentageofthatmemory.Analgorithmdescribed wouldalsobe O(n)yetitsinvocationwouldbehardertogaugestrategically.
Tobypassalltheaforementionedissuesadequetypicallyusesadoubly linkedlistasitsbakingdatastructure.Whileanodethathastwopointers consumesmorememorythanitsarrayitemcounterpartitmakesredundantthe needforexpensiveresizingalgorithmsasthedatastructureincreasesinsize dynamically.Withalanguagethattargetsagarbagecollectedvirtualmachine memoryreclamationisanopaqueprocessasthenodesthatarenolongerreferencedbecomeunreachableandarethusmarkedforcollectionuponthenext invocationofthegarbagecollectionalgorithm.WithC++oranyotherlanguagethatusesexplicitmemoryallocationanddeallocationitwillbeuptothe programmertodecidewhenthememorythatstorestheobjectcanbefreed.
6.4Summary
Withnormalqueueswehaveseenthatthosewhoarrivefirstaredealtwithfirst; thatistheyaredealtwithinafirst-in-first-out(FIFO)order.Queuescanbe eversouseful;forexampletheWindowsCPUschedulerusesadifferentqueue foreachpriorityofprocesstodeterminewhichshouldbethenextprocessto utilisetheCPUforaspecifiedtimequantum.Normalqueueshaveconstant insertionanddeletionruntimes.Searchingaqueueisfairlyunusual—typically youareonlyinterestedintheitematthefrontofthequeue.Despitethat, searchingisusuallyexposedonqueuesandtypicallytheruntimeislinear.
Inthischapterwehavealsoseenpriorityqueueswherethoseatthefront ofthequeuehavethehighestpriorityandthosenearthebackhavethelowest. Oneimplementationofapriorityqueueistouseaheapdatastructureasits backingstore,sotheruntimesforinsertion,deletion,andsearchingarethe sameasthoseforaheap(definedin §4).
Queuesareaverynaturaldatastructure,andwhiletheyarefairlyprimitive theycanmakemanyproblemsalotsimpler.Forexamplethebreadthfirst searchdefinedin §3.7.4makesextensiveuseofqueues.
Chapter7
AVLTree
Intheearly60’sG.M.Adelson-VelskyandE.M.Landisinventedthefirstselfbalancingbinarysearchtreedatastructure,callingitAVLTree.
AnAVLtreeisabinarysearchtree(BST,definedin §3)withaself-balancing conditionstatingthatthedifferencebetweentheheightoftheleftandright subtreescannotbenomorethanone,seeFigure7.1.Thiscondition,restored aftereachtreemodification,forcesthegeneralshapeofanAVLtree.Before continuing,letusfocusonwhybalanceissoimportant.Considerabinary searchtreeobtainedbystartingwithanemptytreeandinsertingsomevalues inthefollowingorder1,2,3,4,5.
TheBSTinFigure7.2representstheworstcasescenarioinwhichtherunningtimeofallcommonoperationssuchassearch,insertionanddeletionare O(n).Byapplyingabalanceconditionweensurethattheworstcaserunning timeofeachcommonoperationis O(logn).TheheightofanAVLtreewith n nodesis O(logn)regardlessoftheorderinwhichvaluesareinserted.
TheAVLbalancecondition,knownalsoasthenodebalancefactorrepresents anadditionalpieceofinformationstoredforeachnode.Thisiscombinedwith atechniquethatefficientlyrestoresthebalanceconditionforthetree.Inan AVLtreetheinventorsmakeuseofawell-knowntechniquecalledtreerotation. h h+1
7.1TreeRotations
Atreerotationisaconstanttimeoperationonabinarysearchtreethatchanges theshapeofatreewhilepreservingstandardBSTproperties.Thereareleftand rightrotationsbothofthemdecreasetheheightofaBSTbymovingsmaller subtreesdownandlargersubtreesup.
1) algorithm LeftRotation(node)
2) Pre: node.Right!= ∅
3) Post: node.Rightisthenewrootofthesubtree,
4) node hasbecome node.Right’sleftchildand, 5)BSTpropertiesarepreserved
6) RightNode ← node.Right
7) node.Right ← RightNode.Left
8) RightNode.Left ← node
9) end LeftRotation
1) algorithm RightRotation(node)
2) Pre: node.Left!= ∅
3) Post: node.Leftisthenewrootofthesubtree,
4) node hasbecome node.Left’srightchildand,
5)BSTpropertiesarepreserved
6) LeftNode ← node.Left
7) node.Left ← LeftNode.Right
8) LeftNode.Right ← node
9) end RightRotation
Therightandleftrotationalgorithmsaresymmetric.Onlypointersare changedbyarotationresultinginan O(1)runtimecomplexity;theotherfields presentinthenodesarenotchanged.
7.2TreeRebalancing
Thealgorithmthatwepresentinthissectionverifiesthattheleftandright subtreesdifferatmostinheightby1.Ifthispropertyisnotpresentthenwe performthecorrectrotation.
Noticethatweusetwonewalgorithmsthatrepresentdoublerotations. Thesealgorithmsarenamed LeftAndRightRotation,and RightAndLeftRotation Thealgorithmsareselfdocumentingintheirnames,e.g. LeftAndRightRotation firstperformsaleftrotationandthensubsequentlyarightrotation.
1) algorithm CheckBalance(current)
2) Pre: current isthenodetostartfrombalancing
3) Post: current heighthasbeenupdatedwhiletreebalanceisifneeded
4)restoredthroughrotations
5) if current.Left= ∅ and current.Right= ∅
6) current.Height=-1;
7) else
8) current.Height=Max(Height(current.Left),Height(current.Right))+1
9) endif
10) if Height(current.Left)- Height(current.Right) > 1
11) if Height(current.Left.Left)- Height(current.Left.Right) > 0
12)RightRotation(current)
13) else
14)LeftAndRightRotation(current)
15) endif
16) elseif Height(current.Left)- Height(current.Right) < 1
17) if Height(current.Right.Left)- Height(current.Right.Right) < 0
18)LeftRotation(current)
19) else
20)RightAndLeftRotation(current)
21) endif
22) endif
23) end CheckBalance
7.3Insertion
AVLinsertionoperatesfirstbyinsertingthegivenvaluethesamewayasBST insertionandthenbyapplyingrebalancingtechniquesifnecessary.Thelatter isonlyperformediftheAVLpropertynolongerholds,thatistheleftandright subtreesheightdifferbymorethan1.EachtimeweinsertanodeintoanAVL tree:
1. Wegodownthetreetofindthecorrectpointatwhichtoinsertthenode, inthesamemannerasforBSTinsertion;then
2. wetravelupthetreefromtheinsertednodeandcheckthatthenode balancingpropertyhasnotbeenviolated;ifthepropertyhasn’tbeen violatedthenweneednotrebalancethetree,theoppositeistrueifthe balancingpropertyhasbeenviolated.
1) algorithm Insert(value)
2) Pre: value haspassedcustomtypechecksfortype T
3) Post: value hasbeenplacedinthecorrectlocationinthetree
4) if root = ∅
5) root ← node(value)
6) else
7)InsertNode(root, value)
8) endif
9) end Insert
1) algorithm InsertNode(current, value)
2) Pre: current isthenodetostartfrom
3) Post: value hasbeenplacedinthecorrectlocationinthetreewhile
4)preservingtreebalance
5) if value<current.Value
6) if current.Left= ∅
7) current.Left ← node(value)
8) else
9)InsertNode(current.Left, value)
10) endif
11) else
12) if current.Right= ∅
13) current.Right ← node(value)
14) else
15)InsertNode(current.Right, value)
16) endif
17) endif
18)CheckBalance(current)
19) end InsertNode
7.4Deletion
OurbalancingalgorithmisliketheonepresentedforourBST(definedin §3.3). Themajordifferenceisthatwehavetoensurethatthetreestilladherestothe AVLbalancepropertyaftertheremovalofthenode.Ifthetreedoesn’tneed toberebalancedandthevalueweareremovingiscontainedwithinthetree thennofurthersteparerequired.However,whenthevalueisinthetreeand itsremovalupsetstheAVLbalancepropertythenwemustperformthecorrect rotation(s).
1) algorithm Remove(value)
2) Pre: value isthevalueofthenodetoremove, root istherootnode
3)oftheAvl
4) Post: nodewith value isremovedandtreerebalancediffoundinwhich
5)caseyieldstrue,otherwisefalse
6) nodeToRemove ← root
7) parent ←∅
8) Stackpath ← root 9) while nodeToRemove = ∅ and nodeToRemove.Value = Value
parent = nodeToRemove
if value<nodeToRemove.Value
nodeToRemove ← nodeToRemove.Right
endif 16)path.Push(nodeToRemove) 17) endwhile
if nodeToRemove = ∅ 19) returnfalse //valuenotinAvl 20) endif
parent ← FindParent(value)
if count =1// count keepstrackofthe#ofnodesintheAvl
root ←∅ //weareremovingtheonlynodeintheAvl
elseif nodeToRemove.Left= ∅ and nodeToRemove.Right= null
51) endwhile
52)//settheparents’Rightpointerof largestValue to ∅
53)FindParent(largestValue.Value).Right ←∅
54) nodeToRemove.Value ← largestValue.Value
55) endif
56) while path.Count> 0
57)CheckBalance(path.Pop())//wetrackbacktotherootnodecheckbalance
58) endwhile
59) count ← count 1
60) returntrue
61) end Remove
7.5Summary
TheAVLtreeisasophisticatedselfbalancingtree.Itcanbethoughtofas thesmarter,youngerbrotherofthebinarysearchtree.Unlikeitsolderbrother theAVLtreeavoidsworstcaselinearcomplexityruntimesforitsoperations.
TheAVLtreeguaranteesviatheenforcementofbalancingalgorithmsthatthe leftandrightsubtreesdifferinheightbyatmost1whichyieldsatmosta logarithmicruntimecomplexity.
PartII Algorithms
Chapter8
Sorting
Allthesortingalgorithmsinthischapterusedatastructuresofaspecifictype todemonstratesorting,e.g.a32bitintegerisoftenusedasitsassociated operations(e.g. <, >,etc)areclearintheirbehaviour.
Thealgorithmsdiscussedcaneasilybetranslatedintogenericsortingalgorithmswithinyourrespectivelanguageofchoice.
8.1BubbleSort
Oneofthemostsimpleformsofsortingisthatofcomparingeachitemwith everyotheriteminsomelist,howeverasthedescriptionmayimplythisform ofsortingisnotparticularlyeffecient O(n2).Init’smostsimpleformbubble sortcanbeimplementedastwoloops.
1) algorithm BubbleSort(list)
2) Pre: list = ∅
3) Post: list hasbeensortedintovaluesofascendingorder
4) for i ← 0to list.Count 1
5) for j ← 0to list.Count 1
6) if list[i] <list[j]
7) Swap(list[i],list[j])
8) endif
9) endfor 10) endfor 11) return list
12) end BubbleSort
8.2MergeSort
MergesortisanalgorithmthathasafairlyefficientspacetimecomplexityO(nlogn)andisfairlytrivialtoimplement.Thealgorithmisbasedonsplitting alist,intotwosimilarsizedlists(left,and right)andsortingeachlistandthen mergingthesortedlistsbacktogether.
Note:thefunctionMergeOrderedsimplytakestwoorderedlistsandmakes themone.
1) algorithm Mergesort(list)
2) Pre: list = ∅
3) Post: list hasbeensortedintovaluesofascendingorder
4) if list.Count=1//alreadysorted
5) return list
6) endif
7) m ← list.Count / 2
8) left ← list(m)
9) right ← list(list.Count m)
10) for i ← 0to left.Count 1
11) left[i] ← list[i]
12) endfor
13) for i ← 0to right.Count 1
14) right[i] ← list[i]
15) endfor
16) left ← Mergesort(left)
17) right ← Mergesort(right)
18) return MergeOrdered(left, right)
19) end Mergesort
8.3QuickSort
Quicksortisoneofthemostpopularsortingalgorithmsbasedondivideet imperastrategy,resultinginan O(nlogn)complexity.Thealgorithmstartsby pickinganitem,calledpivot,andmovingallsmalleritemsbeforeit,whileall greaterelementsafterit.Thisisthemainquicksortoperation,calledpartition, recursivelyrepeatedonlesserandgreatersublistsuntiltheirsizeisoneorzero -inwhichcasethelistisimplicitlysorted.
Choosinganappropriatepivot,asforexamplethemedianelementisfundamentalforavoidingthedrasticallyreducedperformanceof O(n2).
1) algorithm QuickSort(list)
2) Pre: list = ∅
3) Post: list hasbeensortedintovaluesofascendingorder
4) if list.Count=1//alreadysorted
5) return list
6) endif
7) pivot ←MedianValue(list)
8) for i ← 0to list.Count 1
9) if list[i]= pivot
10) equal.Insert(list[i])
11) endif
12) if list[i] <pivot
13) less.Insert(list[i])
14) endif
15) if list[i] >pivot
16) greater.Insert(list[i])
17) endif
18) endfor
19) return Concatenate(QuickSort(less), equal,QuickSort(greater))
20) end Quicksort
8.4InsertionSort
Insertionsortisasomewhatinterestingalgorithmwithanexpensiveruntimeof O(n2).Itcanbebestthoughtofasasortingschemesimilartothatofsorting ahandofplayingcards,i.e.youtakeonecardandthenlookattherestwith theintentofbuildingupanorderedsetofcardsinyourhand.
1) algorithm Insertionsort(list)
2) Pre: list = ∅
3) Post: list hasbeensortedintovaluesofascendingorder
4) unsorted ← 1
5) while unsorted<list.Count
6) hold ← list[unsorted]
7) i ← unsorted 1
8) while i ≥ 0 and hold<list[i]
9) list[i +1] ← list[i]
10) i ← i 1
11) endwhile
12) list[i +1] ← hold
13) unsorted ← unsorted +1
14) endwhile 15) return list
16) end Insertionsort
8.5ShellSort
Putsimplyshellsortcanbethoughtofasamoreefficientvariationofinsertion sortasdescribedin §8.4,itachievesthismainlybycomparingitemsofvarying distancesapartresultinginaruntimecomplexityof O(nlog2 n).
Shellsortisfairlystraightforwardbutmayseemsomewhatconfusingat firstasitdiffersfromothersortingalgorithmsinthewayitselectsitemsto compare.Figure8.5showsshellsortbeingranonanarrayofintegers,thered colouredsquareisthecurrentvalueweareholding.
1) algorithm ShellSort(list)
2) Pre: list = ∅
3) Post: list hasbeensortedintovaluesofascendingorder
4) increment ← list.Count / 2
5) while increment =0
6) current ← increment
7) while current<list.Count
8) hold ← list[current]
9) i ← current increment 10) while i ≥ 0 and hold<list[i] 11) list[i + increment] ← list[i] 12) i = increment 13) endwhile 14) list[i + increment] ← hold 15) current ← current +1 16) endwhile 17) increment/ =2 18) endwhile 19) return list 20) end ShellSort
8.6RadixSort
Unlikethesortingalgorithmsdescribedpreviouslyradixsortusesbucketsto sortitems,eachbucketholdsitemswithaparticularpropertycalledakey. Normallyabucketisaqueue,eachtimeradixsortisperformedthesebuckets areemptiedstartingthesmallestkeybuckettothelargest.Whenlookingat itemswithinalisttosortwedosobyisolatingaspecifickey,e.g.intheexample weareabouttoshowwehaveamaximumofthreekeysforallitems,thatis thehighestkeyweneedtolookatishundreds.Becausewearedealingwith,in thisexamplebase10numberswehaveatanyonepoint10possiblekeyvalues 0..9eachofwhichhastheirownbucket.Beforeweshowyouthisfirstsimple versionofradixsortletusclarifywhatwemeanbyisolatingkeys.Giventhe number102ifwelookatthefirstkey,theonesthenwecanseewehavetwoof them,progressingtothenextkey-tenswecanseethatthenumberhaszero ofthem,finallywecanseethatthenumberhasasinglehundred.Thenumber usedasanexamplehasintotalthreekeys:
Forfurtherclarificationwhatifwewantedtodeterminehowmanythousands thenumber102has?Clearlytherearenone,butoftenlookingatanumberas finallikeweoftendoitisnotsoobvioussowhenaskedthequestionhowmany thousandsdoes102haveyoushouldsimplypadthenumberwithazerointhat location,e.g.0102hereitismoreobviousthatthekeyvalueatthethousands locationiszero.
Thelastthingtoidentifybeforeweactuallyshowyouasimpleimplementationofradixsortthatworksononlypositiveintegers,andrequiresyouto specifythemaximumkeysizeinthelististhatweneedawaytoisolatea specifickeyatanyonetime.Thesolutionisactuallyverysimple,butitsnot oftenyouwanttoisolateakeyinanumbersowewillspellitoutclearly here.Akeycanbeaccessedfromanyintegerwiththefollowingexpression: key ← (number/keyToAccess)%10.Asasimpleexampleletssaythatwe wanttoaccessthetenskeyofthenumber1290,thetenscolumniskey10and soaftersubstitutionyields key ← (1290 / 10)%10=9.Thenextkeyto lookatforanumbercanbeattainedbymultiplyingthelastkeybytenworking lefttorightinasequentialmanner.Thevalueof key isusedinthefollowing algorithmtoworkouttheindexofanarrayofqueuestoenqueuetheiteminto.
1) algorithm Radix(list, maxKeySize)
2) Pre: list = ∅
3) maxKeySize ≥ 0andrepresentsthelargestkeysizeinthelist
4) Post: list hasbeensorted
5) queues ← Queue[10]
6) indexOfKey ← 1
7) fori ← 0 to maxKeySize 1
8) foreach item in list
9) queues[GetQueueIndex(item, indexOfKey)].Enqueue(item)
10) endforeach
11) list ← CollapseQueues(queues)
12)ClearQueues(queues)
13) indexOfKey ← indexOfKey ∗ 10
14) endfor
15) return list
16) end Radix
Figure8.6showsthemembersof queues fromthealgorithmdescribedabove operatingonthelistwhosemembersare90, 12, 8, 791, 123, and61,thekeywe areinterestedinforeachnumberishighlighted.OmittedqueuesinFigure8.6 meanthattheycontainnoitems.
8.7Summary
Throughoutthischapterwehaveseenmanydifferentalgorithmsforsorting lists,someareveryefficient(e.g.quicksortdefinedin §8.3),somearenot(e.g.
Selectingthecorrectsortingalgorithmisusuallydenotedpurelybyefficiency, e.g.youwouldalwayschoosemergesortovershellsortandsoon.Thereare alsootherfactorstolookatthoughandthesearebasedontheactualimplementation.Somealgorithmsareverynicelyexpressedinarecursivefashion, howeverthesealgorithmsoughttobeprettyefficient,e.g.implementingalinear, quadratic,orsloweralgorithmusingrecursionwouldbeaverybadidea.
Ifyouwanttolearnmoreaboutwhyyoushouldbevery,verycarefulwhen implementingrecursivealgorithmsseeAppendixC.
Chapter9
Numeric
Unlessstatedotherwisethealias n denotesastandard32bitinteger.
9.1PrimalityTest
Asimplealgorithmthatdetermineswhetherornotagivenintegerisaprime number,e.g.2,5,7,and13are all primenumbers,however6isnotasitcan betheresultoftheproductoftwonumbersthatare < 6.
Inanattempttoslowdowntheinnerloopthe √n isusedastheupper bound.
1) algorithm IsPrime(n)
2) Post: n isdeterminedtobeaprimeornot
3) for i ← 2 to n do
4) for j ← 1 to sqrt(n) do
5) if i ∗ j = n
6) return false
7) endif
8) endfor
9) endfor
10) end IsPrime
9.2Baseconversions
DSAcontainsanumberofalgorithmsthatconvertabase10numbertoits equivalentbinary,octalorhexadecimalform.Forexample7810 hasabinary representationof10011102
Table9.1showsthealgorithmtracewhenthenumbertoconverttobinary is74210
9.3Attainingthegreatestcommondenominatoroftwonumbers
Afairlyroutineprobleminmathematicsisthatoffindingthegreatestcommon denominatoroftwointegers,whatweareessentiallyafteristhegreatestnumber whichisamultipleofboth,e.g.thegreatestcommondenominatorof9,and 15is3.OneofthemostelegantsolutionstothisproblemisbasedonEuclid’s algorithmthathasaruntimecomplexityof O(n2).
9.4ComputingthemaximumvalueforanumberofaspecificbaseconsistingofNdigits
Thisalgorithmcomputesthemaximumvalueofanumberforagivennumber ofdigits,e.g.usingthebase10systemthemaximumnumberwecanhave madeupof4digitsisthenumber999910.Similarlythemaximumnumberthat consistsof4digitsforabase2numberis11112 whichis1510
Theexpressionbywhichwecancomputethismaximumvaluefor N digits is: BN 1.Inthepreviousexpression B isthenumberbase,and N isthe numberofdigits.Asanexampleifwewantedtodeterminethemaximumvalue forahexadecimalnumber(base16)consistingof6digitstheexpressionwould beasfollows:166 1.Themaximumvalueofthepreviousexamplewouldbe representedas FFFFFF16 whichyields1677721510
Inthefollowingalgorithm numberBase shouldbeconsideredrestrictedto thevaluesof2,8,9,and16.Forthisreasoninouractualimplementation numberBase hasanenumerationtype.The Base enumerationtypeisdefined as:
Base = {Binary ← 2,Octal ← 8,Decimal ← 10,Hexadecimal ← 16}
Thereasonweprovidethedefinitionof Base istogiveyouanideahowthis algorithmcanbemodelledinamorereadablemannerratherthanusingvarious checkstodeterminethecorrectbasetouse.Forourimplementationwecastthe valueof numberBase toaninteger,assuchweextractthevalueassociatedwith therelevantoptioninthe Base enumeration.Asanexampleifweweretocast theoption Octal toanintegerwewouldgetthevalue8.Inthealgorithmlisted belowthecastisimplicitsowejustusetheactualargument numberBase
1) algorithm MaxValue(numberBase, n)
2) Pre: numberBase isthenumbersystemtouse, n isthenumberofdigits
3) Post: themaximumvaluefor numberBase consistingof n digitsiscomputed
4) return Power(numberBase,n) 1
5) end MaxValue
9.5Factorialofanumber
Attainingthefactorialofanumberisaprimitivemathematicaloperation.Many implementationsofthefactorialalgorithmarerecursiveastheproblemisrecursiveinnature,howeverherewepresentaniterativesolution.Theiterative solutionispresentedbecauseittooistrivialtoimplementanddoesn’tsuffer fromtheuseofrecursion(formoreonrecursionsee §C).
Thefactorialof0and1is0.Theaforementionedactsasabasecasethatwe willbuildupon.Thefactorialof2is2∗ thefactorialof1,similarlythefactorial of3is3∗ thefactorialof2andsoon.Wecanindicatethatweareafterthe factorialofanumberusingtheform N !where N isthenumberwewishto attainthefactorialof.Ouralgorithmdoesn’tusesuchnotationbutitishandy toknow.
1) algorithm Factorial(n)
2) Pre: n ≥ 0, n isthenumbertocomputethefactorialof
3) Post: thefactorialof n iscomputed
4) if n< 2
9.6Summary
Inthischapterwehavepresentedseveralnumericalgorithms,mostofwhich aresimplyherebecausetheywerefuntodesign.Perhapsthemessagethat thereadershouldgainfromthischapteristhatalgorithmscanbeappliedto severaldomainstomakeworkinthatrespectivedomainattainable.Numeric algorithmsinparticulardrivesomeofthemostadvancedsystemsontheplanet computingsuchdataasweatherforecasts.
Chapter10
Searching 10.1SequentialSearch
Asimplealgorithmthatsearchforaspecificiteminsidealist.Itoperates loopingoneachelement O(n)untilamatchoccursortheendisreached.
1) algorithm SequentialSearch(list, item)
2) Pre: list = ∅
3) Post: return index ofitemiffound,otherwise 1
4) index ← 0
5) while index<list.Count and list[index] = item
6) index ← index +1
7) endwhile
8) if index<list.Count and list[index]= item
9) return index 10) endif 11) return 1
12) end SequentialSearch
10.2ProbabilitySearch
Probabilitysearchisastatisticalsequentialsearchingalgorithm.Inadditionto searchingforanitem,ittakesintoaccountitsfrequencybyswappingitwith it’spredecessorinthelist.Thealgorithmcomplexitystillremainsat O(n)but inanon-uniformitemssearchthemorefrequentitemsareinthefirstpositions, reducinglistscanningtime.
Figure10.1showstheresultingstateofalistaftersearchingfortwoitems, noticehowthesearcheditemshavehadtheirsearchprobabilityincreasedafter eachsearchoperationrespectively.
Figure10.1:a)Search(12),b)Search(101)
1) algorithm ProbabilitySearch(list, item)
2) Pre: list = ∅
3) Post: abooleanindicatingwheretheitemisfoundornot; intheformercaseswapfoundeditemwithitspredecessor
4) index ← 0
5) while index<list.Count and list[index] = item
6) index ← index +1
7) endwhile
8) if index ≥ list.Count or list[index] = item
9) return false
10) endif
11) if index> 0
12) Swap(list[index],list[index 1])
13) endif
14) return true
15) end ProbabilitySearch
10.3Summary
Inthischapterwehavepresentedafewnovelsearchingalgorithms.Wehave presentedmoreefficientsearchingalgorithmsearlieron,likeforinstancethe logarithmicsearchingalgorithmthatAVLandBSTtree’suse(definedin §3.2). Wedecidednottocoverasearchingalgorithmknownasbinarychop(another nameforbinarysearch,binarychopusuallyreferstoitsarraycounterpart)as
thereaderhasalreadyseensuchanalgorithmin §3. Searchingalgorithmsandtheirefficiencylargelydependsontheunderlying datastructurebeingusedtostorethedata.Forinstanceitisquickertodeterminewhetheranitemisinahashtablethanitisanarray,similarlyitisquicker tosearchaBSTthanitisalinkedlist.Ifyouaregoingtosearchfordatafairly oftenthenwestronglyadvisethatyousitdownandresearchthedatastructures availabletoyou.Inmostcasesusingalistoranyotherprimarilylineardata structureisdowntolackofknowledge.Modelyourdataandthenresearchthe datastructuresthatbestfityourscenario.
Chapter11
Strings
Stringshavetheirownchapterinthistextpurelybecausestringoperations andtransformationsareincrediblyfrequentwithinprograms.Thealgorithms presentedarebasedonproblemstheauthorshavecomeacrosspreviously,or wereformulatedtosatisfycuriosity.
11.1Reversingtheorderofwordsinasentence
Definingalgorithmsforprimitivestringoperationsissimple,e.g.extractinga sub-stringofastring,howeversomealgorithmsthatrequiremoreinventiveness canbealittlemoretricky.
Thealgorithmpresentedheredoesnotsimplyreversethecharactersina string,ratheritreversestheorderofwordswithinastring.Thisalgorithm worksontheprincipalthatwordsarealldelimitedbywhitespace,andusinga fewmarkerstodefinewherewordsstartandendwecaneasilyreversethem.
1) algorithm ReverseWords(value)
2) Pre: value = ∅, sb isastringbuffer
3) Post: thewordsin value havebeenreversed
4) last ← value.Length 1
5) start ← last
6) while last ≥ 0
7)//skipwhitespace
8) while start ≥ 0 and value[start]=whitespace
9) start ← start 1
10) endwhile
11) last ← start
12)//marchdowntotheindexbeforethebeginningoftheword
13) while start ≥ 0 and start =whitespace
14) start ← start 1
15) endwhile
16)//appendcharsfrom start +1to length +1tostringbuffersb
17) for i ← start +1 to last
18) sb.Append(value[i])
19) endfor
20)//ifthisisn’tthelastwordinthestringaddsomewhitespaceafterthewordinthebuffer
21) if start> 0
22) sb.Append(‘’)
23) endif
24) last ← start 1
25) start ← last 26) endwhile
27)//checkifwehaveaddedonetoomanywhitespaceto sb
28) if sb[sb.Length 1]=whitespace
29)//cutthewhitespace
30) sb.Length ← sb.Length 1
31) endif
32) return sb
33) end ReverseWords
11.2Detectingapalindrome
Althoughnotafrequentalgorithmthatwillbeappliedinreal-lifescenarios detectingapalindromeisafun,andasitturnsoutprettytrivialalgorithmto design.
Thealgorithmthatwepresenthasa O(n)runtimecomplexity.Ouralgorithmusestwopointersatoppositeendsofstringwearecheckingisapalindrome ornot.Thesepointersmarchintowardseachotheralwayscheckingthateach charactertheypointtoisthesamewithrespecttovalue.Figure11.1showsthe IsPalindrome algorithminoperationonthestring“WasitEliot’stoiletIsaw?” Ifyouremoveallpunctuation,andwhitespacefromtheaforementionedstring youwillfindthatitisavalidpalindrome.
Figure11.1:
1) algorithm IsPalindrome(value)
2) Pre: value = ∅
3) Post: value isdeterminedtobeapalindromeornot
4) word ← value.Strip().ToUpperCase()
5) left ← 0
6) right ← word.Length 1
7) while word[left]= word[right] and left<right
8) left ← left +1
9) right ← right 1
10) endwhile
11) return word[left]= word[right]
12) end IsPalindrome
Inthe IsPalindrome algorithmwecallamethodbythenameof Strip.This algorithmdiscardspunctuationinthestring,includingwhitespace.Asaresult word containsaheavilycompactedrepresentationoftheoriginalstring,each characterofwhichisinitsuppercaserepresentation.
Palindromesdiscardwhitespace,punctuation,andcasemakingthesechanges allowsustodesignasimplealgorithmwhilemakingouralgorithmfairlyrobust withrespecttothepalindromesitwilldetect.
11.3Countingthenumberofwordsinastring
Countingthenumberofwordsinastringcanseemprettytrivialatfirst,however thereareafewcasesthatweneedtobeawareof:
1. trackingwhenweareinastring
2. updatingthewordcountatthecorrectplace
3. skippingwhitespacethatdelimitsthewords
Asanexampleconsiderthestring“Benatehay”Clearlythisstringcontains threewords,eachofwhichdistinguishedviawhitespace.Allofthepreviously listedpointscanbemanagedbyusingthreevariables:
1. index
2. wordCount
3. inWord
Ofthepreviouslylisted index keepstrackofthecurrentindexweareatin thestring, wordCount isanintegerthatkeepstrackofthenumberofwordswe haveencountered,andfinally inWord isaBooleanflagthatdenoteswhether ornotatthepresenttimewearewithinaword.Ifwearenotcurrentlyhitting whitespaceweareinaword,theoppositeistrueifatthepresentindexweare hittingwhitespace.
Whatdenotesaword?Inouralgorithmeachwordisseparatedbyoneor moreoccurrencesofwhitespace.Wedon’ttakeintoaccountanyparticular splittingsymbolsyoumayuse,e.g.in.NET String.Split 1 cantakeachar(or arrayofcharacters)thatdeterminesadelimitertousetosplitthecharacters withinthestringintochunksofstrings,resultinginanarrayofsub-strings.
InFigure11.2wepresentastringindexedasanarray.Typicallythepattern isthesameformostwords,delimitedbyasingleoccurrenceofwhitespace. Figure11.3showsthesamestring,withthesamenumberofwordsbutwith varyingwhitespacesplittingthem.
1 http://msdn.microsoft.com/en-us/library/system.string.split.aspx
1) algorithm WordCount(value)
2) Pre: value = ∅
3) Post: thenumberofwordscontainedwithin value isdetermined
4) inWord ← true
5) wordCount ← 0
6) index ← 0
7)//skipinitialwhitespace
8) while value[index]=whitespace and index<value.Length 1
9) index ← index +1
10) endwhile
11)//wasthestringjustwhitespace?
12) if index = value.Length and value[index]=whitespace 13) return 0 14) endif 15) while index<value.Length 16) if value[index]=whitespace 17)//skipallwhitespace
18) while value[index]=whitespace and index<value.Length 1
index ← index +1
inWord ← false
wordCount ← wordCount +1
11.4Determiningthenumberofrepeatedwords withinastring
Withthehelpofanunorderedset,andanalgorithmthatcansplitthewords withinastringusingaspecifieddelimiterthisalgorithmisstraightforwardto implement.Ifwesplitallthewordsusingasingleoccurrenceofwhitespace asourdelimiterwegetallthewordswithinthestringbackaselementsof anarray.Thenifweiteratethroughthesewordsaddingthemtoasetwhich containsonlyuniquestringswecanattainthenumberofuniquewordsfromthe string.Allthatislefttodoissubtracttheuniquewordcountfromthetotal numberofstingscontainedinthearrayreturnedfromthesplitoperation.The splitoperationthatwerefertoisthesameasthatmentionedin §11.3.
Figure11.4:a)Undesired
1) algorithm RepeatedWordCount(value)
2) Pre: value = ∅
3) Post: thenumberofrepeatedwordsin value isreturned
4) words ← value.Split(’’)
5) uniques ← Set
6) foreach word in words
7) uniques.Add(word.Strip())
8) endforeach
9) return words.Length uniques.Count
10) end RepeatedWordCount
Youwillnoticeinthe RepeatedWordCount algorithmthatweusethe Strip methodwereferredtoearlierin §11.1.Thissimplyremovesanypunctuation froma word.Thereasonweperformthisoperationoneach word issothat wecanbuildamoreaccurateuniquestringcollection,e.g.“test”,and“test!” arethesamewordminusthepunctuation.Figure11.4showstheundesiredand desiredsetsforthe unique setrespectively.
11.5Determiningthefirstmatchingcharacter betweentwostrings
Thealgorithmtodeterminewhetheranycharacterofastringmatchesanyofthe charactersinanotherstringisprettytrivial.Putsimply,wecanparsethestrings consideredusingadoubleloopandcheck,discardingpunctuation,theequality betweenanycharactersthusreturninganon-negativeindexthatrepresentsthe locationofthefirstcharacterinthematch(Figure11.5);otherwisewereturn -1ifnomatchoccurs.Thisapproachexhibitaruntimecomplexityof O(n2).
1) algorithm Any(word,match)
2) Pre: word,match = ∅
3) Post: index representingmatchlocationifoccured, 1otherwise
4) for i ← 0to word.Length 1
5) while word[i]=whitespace
6) i ← i +1
7) endwhile
8) for index ← 0to match.Length 1
9) while match[index]=whitespace
10) index ← index +1
11) endwhile
12) if match[index]= word[i]
13) return index 14) endif 15) endfor 16) endfor 17) return 1 18) end Any
11.6Summary
Wehopethatthereaderhasseenhowfunalgorithmsonstringdatatypes are.Stringsareprobablythemostcommondatatype(anddatastructurerememberwearedealingwithanarray)thatyouwillworkwithsoitsimportant thatyoulearntobecreativewiththem.Weforonefindstringsfascinating.A simpleGooglesearchonstringnuancesbetweenlanguagesandencodingswill provideyouwithagreatnumberofproblems.Nowthatwehavespurredyou alongalittlewithourintroductoryalgorithmsyoucandevisesomeofyourown.
AppendixA
AlgorithmWalkthrough
Learninghowtodesigngoodalgorithmscanbeassistedgreatlybyusinga structuredapproachtotracingitsbehaviour.Inmostcasestracinganalgorithm onlyrequiresasingletable.Inmostcasestracingisnotenough,youwillalso wanttouseadiagramofthedatastructureyouralgorithmoperateson.This diagramwillbeusedtovisualisetheproblemmoreeffectively.Seeingthings visuallycanhelpyouunderstandtheproblemquicker,andbetter.
Thetracetablewillstoreinformationaboutthevariablesusedinyouralgorithm.Thevalueswithinthistableareconstantlyupdatedwhenthealgorithm mutatesthem.Suchanapproachallowsyoutoattainahistoryofthevarious valueseachvariablehasheld.Youmayalsobeabletoinferpatternsfromthe valueseachvariablehascontainedsothatyoucanmakeyouralgorithmmore efficient.
Wehavefoundthisapproachbothsimple,andpowerful.Bycombininga visualrepresentationoftheproblemaswellashavingahistoryofpastvalues generatedbythealgorithmitcanmakeunderstanding,andsolvingproblems mucheasier.
Inthischapterwewillshowyouhowtoworkthroughbothiterative,and recursivealgorithmsusingthetechniqueoutlined.
A.1Iterativealgorithms
Wewilltracethe IsPalindrome algorithm(definedin §11.2)asourexample iterativewalkthrough.Beforeweevenlookatthevariablesthealgorithmuses, firstwewilllookattheactualdatastructurethealgorithmoperateson.It shouldbeprettyobviousthatweareoperatingonastring,buthowisthis represented?Astringisessentiallyablockofcontiguousmemorythatconsists ofsomechardatatypes,oneaftertheother.Eachcharacterinthestringcan beaccessedviaanindexmuchlikeyouwoulddowhenaccessingitemswithin anarray.Thepictureshouldbepresentingitself-astringcanbethoughtofas anarrayofcharacters.
Forourexamplewewilluse IsPalindrome tooperateonthestring“Never oddoreven”Nowweknowhowthestringdatastructureisrepresented,and thevalueofthestringwewilloperateonlet’sgoaheadanddrawitasshown inFigureA.1.
The IsPalindrome algorithmusesthefollowinglistofvariablesinsomeform throughoutitsexecution:
Havingidentifiedthevaluesofthevariablesweneedtokeeptrackofwe simplycreateacolumnforeachinatableasshowninTableA.1.
Now,usingthe IsPalindrome algorithmexecuteeachstatementupdating thevariablevaluesinthetableappropriately.TableA.2showsthefinaltable valuesforeachvariableusedin IsPalindrome respectively.
Whilethisapproachmaylookalittlebloatedinprint,onpaperitismuch morecompact.Wherewehavethestringsinthetableyoushouldannotate thesestringswitharrayindexestoaidthealgorithmwalkthrough.
Thereisoneotherpointthatweshouldclarifyatthistime-whetherto includevariablesthatchangeonlyafewtimes,ornotatallinthetracetable. InTableA.2wehaveincludedboththe value,and word variablesbecauseit wasconvenienttodoso.Youmayfindthatyouwanttopromotethesevalues toalargerdiagram(likethatinFigureA.1)andonlyusethetracetablefor variableswhosevalueschangeduringthealgorithm.Werecommendthatyou promotethecoredatastructurebeingoperatedontoalargerdiagramoutside ofthetablesothatyoucaninterrogateitmoreeasily.
Wecannotstressenoughhowimportantsuchtracesarewhendesigning youralgorithm.Youcanusethesetracetablestoverifyalgorithmcorrectness. Atthecostofasimpletable,andquicksketchofthedatastructureyouare operatingonyoucandevisecorrectalgorithmsquicker.Visualisingtheproblem domainandkeepingtrackofchangingdatamakesproblemsaloteasiertosolve. Moreoveryoualwayshaveapointofreferencewhichyoucanlookbackon.
A.2RecursiveAlgorithms
Forthemostpartworkingthroughrecursivealgorithmsisassimpleaswalking throughaniterativealgorithm.Oneofthethingsthatweneedtokeeptrack ofthoughiswhichmethodcallreturnstowho.Mostrecursivealgorithmsare muchsimpletofollowwhenyoudrawouttherecursivecallsratherthanusing atablebasedapproach.Inthissectionwewillusearecursiveimplementation ofanalgorithmthatcomputesanumberfromtheFiboncaccisequence.
1) algorithm Fibonacci(n)
2) Pre: n isthenumberinthefibonaccisequencetocompute
3) Post: thefibonaccisequencenumber n hasbeencomputed
4) if n< 1
5) return 0
6) elseif n< 2
7) return 1
8) endif
9) return Fibonacci(n 1)+Fibonacci(n 2)
10) end Fibonacci
Beforewejumpintoshowingyouadiagrammticrepresentationofthealgorithmcallsforthe Fibonacci algorithmwewillbrieflytalkaboutthecasesof thealgorithm.Thealgorithmhasthreecasesintotal:
Thefirsttwoitemsinthepreceedinglistarethebasecasesofthealgorithm. Untilwehitoneofourbasecasesinourrecursivemethodcalltreewewon’t returnanything.Thethirditemfromthelistisourrecursivecase.
Witheachcalltotherecursivecaseweetcheverclosertooneofourbase cases.FigureA.2showsadiagrammticrepresentationoftherecursivecallchain. InFigureA.2theorderinwhichthemethodsarecalledarelabelled.Figure A.3showsthecallchainannotatedwiththereturnvaluesofeachmethodcall aswellastheorderinwhichmethodsreturntotheircallers.InFigureA.3the returnvaluesarerepresentedasannotationstotheredarrows.
Itisimportanttonotethateachrecursivecallonlyeverreturnstoitscaller uponhittingoneofthetwobasecases.Whenyoudoeventuallyhitabasecase thatbranchofrecursivecallsceases.Uponhittingabasecaseyougobackto
thecallerandcontinueexecutionofthatmethod.Executioninthecalleris contiuedatthenextstatement,orexpressionaftertherecursivecallwasmade.
Inthe Fibonacci algorithms’recursivecasewemaketworecursivecalls. Whenthefirstrecursivecall(Fibonacci(n 1))returnstothecallerwethen executethethesecondrecursivecall(Fibonacci(n 2)).Afterbothrecursive callshavereturnedtotheircaller,thecallercanthensubesequentlyreturnto itscallerandsoon.
Recursivealgorithmsaremucheasiertodemonstratediagrammaticallyas FigureA.2demonstrates.Whenyoucomeacrossarecursivealgorithmdraw methodcalldiagramstounderstandhowthealgorithmworksatahighlevel.
A.3Summary
Understandingalgorithmscanbehardattimes,particularlyfromanimplementationperspective.Inordertounderstandanalgorithmtryandworkthrough itusingtracetables.Incaseswherethealgorithmisalsorecursivesketchthe recursivecallsoutsoyoucanvisualisethecall/returnchain.
Inthevastmajorityofcasesimplementinganalgorithmissimpleprovided thatyouknowhowthealgorithmworks.Masteringhowanalgorithmworks fromahighleveliskeyfordevisingawelldesignedsolutiontotheproblemin hand.
AppendixB
TranslationWalkthrough
Theconversionfrompseudotoanactualimperativelanguageisusuallyvery straightforward,toclarifyanexampleisprovided.Inthisexamplewewill convertthealgorithmin §9.1totheC#language.
1)publicstaticboolIsPrime(intnumber)
2) {
3)if(number < 2)
4) { 5)returnfalse;
6) }
7)intinnerLoopBound=(int)Math.Floor(Math.Sqrt(number)); 8)for(inti=1;i < number;i++)
9) { 10)for(intj=1;j <=innerLoopBound;j++)
11) { 12)if(i ∗ j==number)
13) { 14)returnfalse;
15) }
16) }
17) } 18)returntrue;
19) }
Forthemostparttheconversionisastraightforwardprocess,howeveryou mayhavetoinjectvariouscallstootherutilityalgorithmstoascertainthe correctresult.
Aconsiderationtotakenoteofisthatmanyalgorithmshavefairlystrict preconditions,ofwhichtheremaybeseveral-inthesescenariosyouwillneed toinjectthecorrectcodetohandlesuchsituationstopreservethecorrectnessof thealgorithm.Mostofthepreconditionscanbesuitablyhandledbythrowing thecorrectexception.
B.1Summary
Asyoucanseefromtheexampleusedinthischapterwehavetriedtomakethe translationofourpseudocodealgorithmstomainstreamimperativelanguages assimpleaspossible.
Wheneveryouencounterakeywordwithinourpseudocodeexamplesthat youareunfamiliarwithjustbrowsetoAppendixEwhichdescirbeseachkeyword.
AppendixC
RecursiveVs.Iterative Solutions
Oneofthemostsuccinctpropertiesofmodernprogramminglanguageslike C++,C#,andJava(aswellasmanyothers)isthattheselanguagesallow youtodefinemethodsthatreferencethemselves,suchmethodsaresaidtobe recursive.Oneofthebiggestadvantagesrecursivemethodsbringtothetableis thattheyusuallyresultinmorereadable,andcompactsolutionstoproblems.
Arecursivemethodthenisonethatisdefinedintermsofitself.Generally arecursivealgorithmshastwomainproperties:
1. Oneormorebasecases;and
2. Arecursivecase
Fornowwewillbrieflycoverthesetwoaspectsofrecursivealgorithms.With eachrecursivecallweshouldbemakingprogresstoourbasecaseotherwisewe aregoingtorunintotrouble.Thetroublewespeakofmanifestsitselftypically asastackoverflow,wewilldescribewhylater.
Nowthatwehavebrieflydescribedwhatarecursivealgorithmisandwhy youmightwanttousesuchanapproachforyouralgorithmswewillnowtalk aboutiterativesolutions.Aniterativesolutionusesnorecursionwhatsoever. Aniterativesolutionreliesonlyontheuseofloops(e.g.for,while,do-while, etc).Thedownsidetoiterativealgorithmsisthattheytendnottobeasclear astotheirrecursivecounterpartswithrespecttotheiroperation.Themajor advantageofiterativesolutionsisspeed.Mostproductionsoftwareyouwill finduseslittleornorecursivealgorithmswhatsoever.Thelatterpropertycan sometimesbeacompaniesprerequisitetocheckingincode,e.g.uponchecking inastaticanalysistoolmayverifythatthecodethedeveloperischeckingin containsnorecursivealgorithms.Normallyitissystemslevelcodethathasthis zerotolerancepolicyforrecursivealgorithms.
Usingrecursionshouldalwaysbereservedforfastalgorithms,youshould avoiditforthefollowingalgorithmruntimedeficiencies:
Ifyouuserecursionforalgorithmswithanyoftheaboveruntimeefficiency’s youareinvitingtrouble.Thegrowthrateofthesealgorithmsishighandin mostcasessuchalgorithmswillleanveryheavilyontechniqueslikedivideand conquer.Whileconstantlysplittingproblemsintosmallerproblemsisgood practice,inthesecasesyouaregoingtobespawningalotofmethodcalls.All thisoverhead(methodcallsdon’tcome that cheap)willsoonpileupandeither causeyouralgorithmtorunalotslowerthanexpected,orworse,youwillrun outofstackspace.Whenyouexceedtheallottedstackspaceforathreadthe processwillbeshutdownbytheoperatingsystem.Thisisthecaseirrespective oftheplatformyouuse,e.g..NET,ornativeC++etc.Youcanaskforabigger stacksize,butyoutypicallyonlywanttodothisifyouhaveaverygoodreason todoso.
C.1ActivationRecords
Anactivationrecordiscreatedeverytimeyouinvokeamethod.Putsimply anactivationrecordissomethingthatisputonthestacktosupportmethod invocation.Activationrecordstakeasmallamountoftimetocreate,andare prettylightweight.
Normallyanactivationrecordforamethodcallisasfollows(thisisvery general):
• Theactualparametersofthemethodarepushedontothestack
• Thereturnaddressispushedontothestack
• Thetop-of-stackindexisincrementedbythetotalamountofmemory requiredbythelocalvariableswithinthemethod
• Ajumpismadetothemethod
Inmanyrecursivealgorithmsoperatingonlargedatastructures,oralgorithmsthatareinefficientyouwillrunoutofstackspacequickly.Consideran algorithmthatwheninvokedgivenaspecificvalueitcreatesmanyrecursive calls.Insuchacaseabigchunkofthestackwillbeconsumed.Wewillhaveto waituntiltheactivationrecordsstarttobeunwoundafterthenestedmethods inthecallchainexitandreturntotheirrespectivecaller.Whenamethodexits it’sactivationrecordisunwound.Unwindinganactivationrecordresultsin severalsteps:
1. Thetop-of-stackindexisdecrementedbythetotalamountofmemory consumedbythemethod
2. Thereturnaddressispoppedoffthestack
3. Thetop-of-stackindexisdecrementedbythetotalamountofmemory consumedbytheactualparameters
Whileactivationrecordsareanefficientwaytosupportmethodcallsthey canbuildupveryquickly.Recursivealgorithmscanexhaustthestacksize allocatedtothethreadfairlyfastgiventhechance.
Justaboutnowweshouldbedustingthecobwebsofftheageoldexampleof aniterativevs.recursivesolutionintheformoftheFibonaccialgorithm.This isafamousexampleasithighlightsboththebeautyandpitfallsofarecursive algorithm.Theiterativesolutionisnotaspretty,norselfdocumentingbutit doesthejobalotquicker.IfweweretogivetheFibonaccialgorithmaninput ofsay60thenwewouldhavetowaitawhiletogetthevaluebackbecauseit hasan O(gn)runtime.Theiterativeversionontheotherhandhasa O(n) runtime.Don’tletthisputyouoffrecursion.Thisexampleismainlyused toshockprogrammersintothinkingabouttheramificationsofrecursionrather thanwarningthemoff.
C.2Someproblemsarerecursiveinnature
Somethingthatyoumaycomeacrossisthatsomedatastructuresandalgorithmsareactuallyrecursiveinnature.Aperfectexampleofthisisatreedata structure.Acommontreenodeusuallycontainsavalue,alongwithtwopointerstotwoothernodesofthesamenodetype.Asyoucanseetreeisrecursive initsmakeupwiteachnodepossiblypointingtotwoothernodes.
Whenusingrecursivealgorithmsontree’sitmakessenseasyouaresimply adheringtotheinherentdesignofthedatastructureyouareoperatingon.Of courseitisnotallgoodnews,afterallwearestillboundbythelimitationswe havementionedpreviouslyinthischapter.
Wecanalsolookatsortingalgorithmslikemergesort,andquicksort.Both ofthesealgorithmsarerecursiveintheirdesignandsoitmakessensetomodel themrecursively.
C.3Summary
Recursionisapowerfultool,andonethatallprogrammersshouldknowof. Oftensoftwareprojectswilltakeatradebetweenreadability,andefficiencyin whichcaserecursionisgreatprovidedyoudon’tgoanduseittoimplement analgorithmwithaquadraticruntimeorhigher.Ofcoursethisisnotarule ofthumb,thisisjustusthrowingcautiontothewind.Defensivecodingwill alwaysprevail.
Manytimesrecursionhasanaturalhomeinrecursivedatastructuresand algorithmswhicharerecursiveinnature.Usingrecursioninsuchscenariosis perfectlyacceptable.Usingrecursionforsomethinglikelinkedlisttraversalis alittleoverkill.Itsiterativecounterpartisprobablylesslinesofcodethanits recursivecounterpart.
Becausewecanonlytalkabouttheimplicationsofusingrecursionfroman abstractpointofviewyoushouldconsultyourcompilerandruntimeenvironmentformoredetails.Itmaybethecasethatyourcompilerrecognisesthings liketailrecursionandcanoptimisethem.Thisisn’tunheardof,infactmost commercialcompilerswilldothis.Theamountofoptimisationcompilerscan
dothoughissomewhatlimitedbythefactthatyouarestillusingrecursion. You,asthedeveloperhavetoacceptcertainaccountability’sforperformance.
AppendixD
Testing
Testingisanessentialpartofsoftwaredevelopment.Testinghasoftenbeen discardedbymanydevelopersinthebeliefthattheburdenofproofoftheir softwareisonthosewithinthecompanywhoholdtestcentricroles.This couldn’tbefurtherfromthetruth.Asadeveloperyoushouldatleastprovide asuiteofunitteststhatverifycertainboundaryconditionsofyoursoftware.
Agreatthingabouttestingisthatyoubuildupprogressivelyasafetynet.If youaddortweakalgorithmsandthenrunyoursuiteoftestsyouwillbequickly alertedtoanycasesthatyouhavebrokenwithyourrecentchanges.Suchasuite oftestsinanysizeableprojectisabsolutelyessentialtomaintainingafairlyhigh barwhenitcomestoquality.Ofcourseinordertoattainsuchastandardyou needtothinkcarefullyabouttheteststhatyouconstruct.
Unittestingwhichwillbethesubjectofthevastmajorityofthischapter arewidelyavailableonmostplatforms.MostmodernlanguageslikeC++,C#, andJavaofferanimpressivecatalogueoftestingframeworksthatyoucanuse forunittesting.
Thefollowinglistidentifiestestingframeworkswhicharepopular:
JUnit: TargetedatJav., http://www.junit.org/
NUnit: CanbeusedwithlanguagesthattargetMicrosoft’sCommonLanguage Runtime. http://www.nunit.org/index.php
BoostTestLibrary: TargetedatC++.Thetestlibrarythatshipswiththeincrediblypopular Boostlibraries. http://www.boost.org.Adirectlinktothelibrariesdocumentation http://www.boost.org/doc/libs/1_36_0/libs/test/doc/ html/index.html
CppUnit: TargetedatC++. http://cppunit.sourceforge.net/
Don’tworryifyouthinkthatthelistisverysparse,therearefarmoreon offerthanthosethatwehavelisted.Theoneslistedarethetestingframeworks thatwebelievearethemostpopularforC++,C#,andJava.
D.1Whatconstitutesaunittest?
Aunittestshouldfocusonasingleatomicpropertyofthesubjectbeingtested. Donottryandtestmanythingsatonce,thiswillresultinasuiteofsomewhat
unstructuredtests.Asanexampleifyouwerewantingtowriteatestthat verifiedthataparticularvalue V isreturnedfromaspecificinput I thenyour testshoulddothesmallestamountofworkpossibletoverifythat V iscorrect given I.Aunittestshouldbesimpleandselfdescribing.
Aswellasaunittestbeingrelativelyatomicyoushouldalsomakesurethat yourunittestsexecutequickly.Ifyoucanimagineinthefuturewhenyoumay haveatestsuiteconsistingofthousandsoftestsyouwantthoseteststoexecute asquicklyaspossible.Failuretoattainsuchagoalwillmostlikelyresultin thesuiteoftestsnotbeingranthatoftenbythedevelopersonyourteam.This canoccurforanumberofreasonsbutthemainonewouldbethatitbecomes incrediblytediouswaitingseveralminutestoruntestsonadeveloperslocal machine.
Buildingupatestsuitecanhelpgreatlyinateamscenario,particularly whenusingacontinuousbuildserver.Insuchascenarioyoucanhavethesuite oftestsdevisedbythedevelopersandtestersranaspartofthebuildprocess.
Employingsuchstrategiescanhelpyoucatchnigglinglittleerrorcasesearly ratherthanviayourcustomerbase.Thereisnothingmoreembarrassingfora developerthantohaveaverytrivialbugintheircodereportedtothemfroma customer.
D.2WhenshouldIwritemytests?
Asourceofgreatdebatewouldbeanunderstatementtopersonifysuchaquestionasthis.Inrecentyearsatestdrivenapproachtodevelopmenthasbecome verypopular.Suchanapproachisknownastestdrivendevelopment,ormore commonlytheacronymTDD.
OneofthefoundingprinciplesofTDDistowritetheunittestfirst,watch itfailandthenmakeitpass.Thepremisebeingthatyouonlyeverwrite enoughcodeatanyonetimetosatisfythestatebasedassertionsmadeinaunit test.Wehavefoundthisapproachtoprovideamorestructuredintenttothe implementationofalgorithms.Atanyonestageyouonlyhaveasinglegoal,to makethefailingtestpass.BecauseTDDmakesyouwritethetestsupfrontyou neverfindyourselfinasituationwhereyouforget,orcan’tbebotheredtowrite testsforyourcode.Thisisoftenthecasewhenyouwriteyourtestsafteryou havecodedupyourimplementation.We,astheauthorsofthisbookourselves useTDDasourpreferredmethod.
AswehavealreadymentionedthatTDDisourfavouredapproachtotesting itwouldbesomewhatofaninjusticetonotlist,anddescribethemantrathat isoftenassociatewithit:
Red: Signifiesthatthetesthasfailed.
Green: Thefailingtestnowpasses.
Refactor: Canwerestructureourprogramsoitmakesmoresense,andeasierto maintain?
Thefirstpointoftheabovelistalwaysoccursatleastonce(moreifyoucount thebuilderror)inTDDinitially.Yourtaskatthisstageissolelytomakethe testpass,thatistomaketherespectivetestgreen.Thelastitemisbasedaround
therestructuringofyourprogramtomakeitasreadableandmaintainableas possible.ThelastpointisveryimportantasTDDisaprogressivemethodology tobuildingasolution.Ifyouadheretoprogressiverevisionsofyouralgorithm restructuringwhenappropriateyouwillfindthatusingTDDyoucanimplement verycleanlystructuredtypesandsoon.
D.3HowseriouslyshouldIviewmytestsuite?
Yourtestsareamajorpartofyourprojectecosystemandsotheyshouldbe treatedwiththesameamountofrespectasyourproductioncode.Thisranges fromcorrect,andcleancodeformatting,tothetestingcodebeingstoredwithin asourcecontrolrepository.
EmployingamethodologylikeTDD,ortestingafterimplementingyouwill findthatyouspendagreatamountoftimewritingtestsandthustheyshould betreatednodifferentlytoyourproductioncode.Alltestsshouldbeclearly named,andfullydocumentedastotheirintent.
D.4ThethreeA’s
Nowthatyouhaveasenseoftheimportanceofyourtestsuiteyouwillinevitably wanttoknowhowtoactuallystructureeachblockofimperativeswithinasingle unittest.Apopularapproach-thethreeA’sisdescribedinthefollowinglist:
Assemble: Createtheobjectsyourequireinordertoperformthestatebasedassertions.
Act: Invoketherespectiveoperationsontheobjectsyouhaveassembledto mutatethestatetothatdesiredforyourassertions.
Assert: Specifywhatyouexpecttoholdaftertheprevioustwosteps.
Thefollowingexampleshowsasimpletestmethodthatemploysthethree A’s:
D.5Thestructuringoftests
Structuringtestscanbevieweduponasbeingthesameasstructuringproductioncode,e.g.allunittestsfora Person typemaybecontainedwithin
a PersonTest type.Typicallyalltestsareabstractedfromproductioncode. Thatisthatthetestsaredisjointfromtheproductioncode,youmayhavetwo dynamiclinklibraries(dll);thefirstcontainingtheproductioncode,thesecond containingyourtestcode.
Wecanalsousethingslikeinheritanceetcwhendefiningclassesoftests. Thepointbeingthatthetestcodeisverymuchlikeyourproductioncodeand youshouldapplythesameamountofthoughttoitsstructureasyouwoulddo theproductioncode.
D.6CodeCoverage
Somethingthatyoucangetasaproductofunittestingarecodecoverage statistics.Codecoverageismerelyanindicatorastotheportionsofproduction codethatyourunitstestscover.UsingTDDitislikelythatyourcodecoverage willbeveryhigh,althoughitwillvarydependingonhoweasyitistouseTDD withinyourproject.
D.7Summary
Testingiskeytothecreationofamoderatelystableproduct.Moreoverunit testingcanbeusedtocreateasafetyblanketwhenaddingandremovingfeatures providinganearlywarningforbreakingchangeswithinyourproductioncode.
AppendixE
SymbolDefinitions
Throughoutthepseudocodelistingsyouwillfindseveralsymbolsused,describes themeaningofeachofthosesymbols.
Symbol Description
← Assignment.
= Equality.
≤ Lessthanorequalto.
< Lessthan.*
≥ Greaterthanorequalto.
> Greaterthan.*
= Inequality.
∅ Null. and Logicaland. or Logicalor. whitespace Singleoccurrenceofwhitespace. yield Like return butbuildsasequence.
*Thissymbolhasadirecttranslationwiththevastmajorityofimperative counterparts.
TableE.1:Pseudosymboldefinitions