
Tilmann Rabl · Raghunath Nambiar

Chaitanya Baru · Milind Bhandarkar

Meikel Poess · Saumyadipta Pyne (Eds.)

Big Data Benchmarking

6th International Workshop, WBDB 2015 Toronto, ON, Canada, June 16–17, 2015 and 7th International Workshop, WBDB 2015 New Delhi, India, December 14–15, 2015 Revised Selected Papers

Lecture Notes in Computer Science 10044

Commenced Publication in 1973

Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison
Lancaster University, Lancaster, UK

Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA

Josef Kittler
University of Surrey, Guildford, UK

Jon M. Kleinberg
Cornell University, Ithaca, NY, USA

Friedemann Mattern
ETH Zurich, Zurich, Switzerland

John C. Mitchell
Stanford University, Stanford, CA, USA

Moni Naor
Weizmann Institute of Science, Rehovot, Israel

C. Pandu Rangan
Indian Institute of Technology, Madras, India

Bernhard Steffen
TU Dortmund University, Dortmund, Germany

Demetri Terzopoulos
University of California, Los Angeles, CA, USA

Doug Tygar
University of California, Berkeley, CA, USA

Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7409


Editors

Tilmann Rabl
Technical University of Berlin
Berlin, Germany

Raghunath Nambiar
Cisco Systems, Inc.
San Jose, CA, USA

Chaitanya Baru
University of California at San Diego
La Jolla, CA, USA

Milind Bhandarkar
Ampool, Inc.
Santa Clara, CA, USA

Meikel Poess
Oracle Corporation
Redwood Shores, CA, USA

Saumyadipta Pyne
Indian Institute of Public Health
Hyderabad, India

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science

ISBN 978-3-319-49747-1, ISBN 978-3-319-49748-8 (eBook)
DOI 10.1007/978-3-319-49748-8

Library of Congress Control Number: 2016958990

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Big data processing has evolved from a buzz-word to a pervasive technology shift in only a few years. The plethora of systems developed or extended for big data processing makes big data benchmarking more important than ever.

Formed in 2012, the Big Data Benchmarking Community (BDBC) represented a major step in facilitating the development of benchmarks for objective comparisons of hardware and software systems dealing with emerging big data applications. Led by Chaitanya Baru, Tilmann Rabl, Milind Bhandarkar, Raghunath Nambiar, and Meikel Poess, the BDBC has successfully conducted seven international Workshops on Big Data Benchmarking (WBDB) in only 4 years, bringing industry experts and researchers together to present and debate challenges, ideas, and methodologies to benchmark big data systems. In 2015, the BDBC joined the SPEC Research Group to further organize and structure the benchmarking efforts collected under the BDBC umbrella. The Big Data Working Group¹ meets regularly as a platform for discussion and development of big data benchmarks.

While being a platform of discussion, the BDBC and SPEC Research Group also demonstrated practical impact in the field of big data benchmarking. In 2015, BigBench, one of the benchmark efforts started at WBDB 2012.us, was adopted by the Transaction Processing Performance Council to become the first industry-standard end-to-end benchmark for big data analysis. The benchmark was officially released under the name TPCx-BB in January 2016.²

This book contains the joint proceedings of the 6th and 7th Workshop on Big Data Benchmarking. The 6th WBDB was held in Toronto, Canada, during June 16–17, 2015, hosted by the University of Toronto. The 7th WBDB was held in New Delhi, India, during December 14–15, 2015, at the India Habitat Centre. Both workshops were well attended by industry and academia alike and featured SPEC Research Big Data Working Group meetings. Keynote speakers in WBDB 2015.ca were Tamer Özsu (University of Waterloo) and Anil Goel (SAP). Tamer Özsu presented benchmarks for graph management and Anil Goel gave an overview of the SAP big data infrastructure. In WBDB 2015.in, Michael Franklin (UC Berkeley) presented emerging trends in big data software, Geoffrey Fox (Indiana University Bloomington) discussed the convergence of big data and high-performance computing (HPC), and Mukund Desphande (Persistent Systems) introduced the use of Web technologies in database architecture.

In this book, we have collected recent trends in big data and HPC convergence, new proposals for big data benchmarking, as well as tooling and performance results. In the first part of the book, an overview of challenges and opportunities of the convergence of HPC and big data is given. The second part presents two novel benchmarks, one based on the Indian Aadhaar database that contains biometric information of hundreds of millions of people and one for array database systems. The third part contains experiences from a benchmarking framework for Hadoop setups. In the final part, several performance results from extensive experiments on Cassandra, Spark, and Hadoop are presented.

1 See https://research.spec.org/working-groups/big-data-working-group

2 See http://www.tpc.org/tpcx-bb/default.asp

The seven papers in this book were selected out of a total of 39 presentations in WBDB 2015.ca and WBDB 2015.in. All papers were reviewed in two rounds. We thank the sponsors, members of the Program Committee, authors, and participants for their contributions to these workshops. The hard work and close cooperation of a number of people have been critical to the success of the WBDB workshop series.

August 2016
Tilmann Rabl

WBDB 2015 Organization

General Chairs

Tilmann Rabl, Technische Universität Berlin, Germany
Chaitanya Baru, San Diego Supercomputer Center, USA

Program Committee Chairs

WBDB 2015.ca

Tilmann Rabl, Technische Universität Berlin, Germany

WBDB 2015.in

Saumyadipta Pyne, Indian Institute of Public Health, Hyderabad, India

Program Committee

Nelson Amaral, University of Alberta, Canada
Amitabha Bagchi, IIT Delhi, India
Milind Bhandarkar, Ampool, USA
Tobias Bürger, Payback, Germany
Ann Cavoukian, Ryerson University, Canada
Fei Chiang, McMaster University, Canada
Pedro Furtado, University of Coimbra, Portugal
Pulak Ghosh, IIM Bangalore, India
Boris Glavic, IIT Chicago, USA
Parke Godfrey, York University, Canada
Anil Goel, SAP, Canada
Bhaskar Gowda, Intel, USA
Chetan Gupta, Hitachi, USA
Rajendra Joshi, C-DAC, Pune, India
Patrick Hung, University of Ontario Institute of Technology, Canada
Arnab Laha, IIM Ahmedabad, India
Asterios Katsifodimos, TU Berlin, Germany
Jian Li, Huawei, Canada
Serge Mankovskii, CA, USA
Tridib Mukherjee, Xerox Research Center India
Raghunath Nambiar, Cisco, USA
Glenn Paulley, Conestoga College, Canada
Scott Pearson, Cray, USA
Meikel Poess, Oracle, USA
Nicolas Poggi, Barcelona Supercomputer Center, Spain
Saumyadipta Pyne, CR Rao AIMSCS, Hyderabad, India
Francois Raab, InfoSizing, USA
V.C.V. Rao, C-DAC, Pune, India
Ken Salem, University of Waterloo, Canada
Berni Schiefer, IBM, USA
Yogesh Simmhan, IISc, USA
Raghavendra Singh, IBM, India
Dinkar Sitaram, PES University, Bangalore, India
Chiranjib Sur, Shell, India
Ali Tizghadam, Telus, Canada
Jonas Traub, Technische Universität Berlin, Germany
Matthias Uflacker, Hasso Plattner Institut, Germany
Herna Viktor, University of Ottawa, Canada
Anil Vullikanti, Virginia Bioinformatics Institute, USA
Jianfeng Zhan, Chinese Academy of Sciences, China
Roberto V. Zicari, Frankfurt Big Data Lab, Goethe University Frankfurt, Germany

WBDB 2015 Sponsors

WBDB 2015.ca Sponsors

Platinum Sponsor: Intel
Gold Sponsor: IBM
Silver Sponsor: Cray

WBDB 2015.in Sponsors

Gold Sponsors: Intel, NSF
Silver Sponsors: Ampool, Government of India
Bronze Sponsors: Persistent
Dinner Sponsor: Nvidia

Future Challenges

Big Data, Simulations and HPC Convergence
Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake, and Supun Kamburugamuve

New Benchmark Proposals

Benchmarking Fast-Data Platforms for the Aadhaar Biometric Database
Yogesh Simmhan, Anshu Shukla, and Arun Verma

Towards a General Array Database Benchmark: Measuring Storage Access
George Merticariu, Dimitar Misev, and Peter Baumann

Supporting Technology

ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis
Nicolas Poggi, Josep Ll. Berral, and David Carrera

Experimental Results

Benchmarking the Availability and Fault Tolerance of Cassandra
Marten Rosselli, Raik Niemann, Todor Ivanov, Karsten Tolle, and Roberto V. Zicari

Performance Evaluation of Spark SQL Using BigBench
Todor Ivanov and Max-Georg Beer

Accelerating BigBench on Hadoop
Yan Tang, Gowda Bhaskar, Jack Chen, Xin Hao, Yi Zhou, Yi Yao, and Lifeng Wang

Author Index

Future Challenges

Big Data, Simulations and HPC Convergence

Geoffrey Fox¹ (✉), Judy Qiu¹, Shantenu Jha², Saliya Ekanayake¹, and Supun Kamburugamuve¹

1 School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
gcf@indiana.edu

2 ECE, Rutgers University, Piscataway, NJ 08854, USA

Abstract. Two major trends in computing systems are the growth in high performance computing (HPC) with, in particular, an international exascale initiative, and big data with an accompanying cloud infrastructure of dramatic and increasing size and sophistication. In this paper, we study an approach to convergence for software and applications/algorithms and show what hardware architectures it suggests. We start by dividing applications into data plus model components and classifying each component (whether from Big Data or Big Compute) in the same way. This leads to 64 properties divided into 4 views, which are Problem Architecture (Macro pattern); Execution Features (Micro patterns); Data Source and Style; and finally the Processing (runtime) View. We discuss convergence software built around HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack) and show how one can merge Big Data and HPC (Big Simulation) concepts into a single stack and discuss appropriate hardware.

Keywords: Big Data · HPC · Simulations

1 Introduction

Two major trends in computing systems are the growth in high performance computing (HPC) with an international exascale initiative, and the big data phenomenon with an accompanying cloud infrastructure of well publicized dramatic and increasing size and sophistication. There has been substantial discussion of the convergence of big data analytics, simulations and HPC [1, 11–13, 29, 30] highlighted by the Presidential National Strategic Computing Initiative [5]. In studying and linking these trends and their convergence, one needs to consider multiple aspects: hardware, software, applications/algorithms and even broader issues like business model and education. Here we focus on software and applications/algorithms and make comments on the other aspects. We discuss applications/algorithms in Sect. 2, software in Sect. 3 and link them and other aspects in Sect. 4.

© Springer International Publishing AG 2016. T. Rabl et al. (Eds.): WBDB 2015, LNCS 10044, pp. 3–17, 2016. DOI: 10.1007/978-3-319-49748-8_1

2 Applications and Algorithms

We extend the analysis given by us [18, 21], which used ideas in earlier parallel computing studies [8, 9, 31] to build a set of Big Data application characteristics with 50 features, called facets, divided into 4 views. As it incorporated the approach of the Berkeley dwarfs [8] and included features from the NRC Massive Data Analysis Report's Computational Giants [27], we termed these characteristics Ogres. Here we generalize the approach to integrate Big Data and Simulation applications into a single classification that we call convergence diamonds, with a total of 64 facets split between the same 4 views. The four views are Problem Architecture (Macro pattern, abbreviated PA); Execution Features (Micro patterns, abbreviated EF); Data Source and Style (abbreviated DV); and finally the Processing (runtime, abbreviated Pr) View.

The central idea is that any problem, whether Big Data or Simulation, and whether HPC or cloud-based, can be broken up into Data plus Model. The DDDAS approach is an example where this idea is explicit [3]. In a Big Data problem, the Data is large and needs to be collected, stored, managed and accessed. Then one uses Data Analytics to compare some Model with this data. The Model could be small, such as coordinates of a few clusters, or large, as in a deep learning network; almost by definition the Data is large!

On the other hand, for simulations the model is nearly always big, as in values of fields on a large space-time mesh. The Data could be small: it is essentially zero for Quantum Chromodynamics simulations and corresponds to the typically small boundary conditions for many simulations; however, climate and weather simulations can absorb large amounts of assimilated data. Remember Big Data has a model, so there are model diamonds for big data. The diamonds and their facets are given in a table put in the appendix. They are summarized in Fig. 1.

Comparing Big Data and simulations is not so clear; however, comparing the model in simulations and the model in Big Data is straightforward, while the data in both cases can be treated similarly. This simple idea lies at the heart of our approach to Big Data-Simulation convergence. In the convergence diamonds given in the table presented in the Appendix, one divides the facets into three types:

1. Facet n (without D or M) refers to a facet of the system including both data and model: 16 in total.

2. Facet nD is a Data-only facet: 16 in total.

3. Facet nM is a Model-only facet: 32 in total.
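The naming rule in the list above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original classification: the regular expression and the `facet_type` helper are hypothetical, and the totals are simply the three counts stated in the list.

```python
import re

# Hypothetical pattern: VIEW-number with an optional D or M suffix,
# e.g. "PA-3", "EF-4D", "Pr-12M". The view prefixes are those named in the text.
FACET_ID = re.compile(r"^(PA|EF|DV|Pr)-(\d+)([DM]?)$")

# Suffix -> facet type, following the three-way division in the list above.
SUFFIX_TO_TYPE = {"": "system", "D": "data", "M": "model"}

# Totals as stated in the text for each facet type.
STATED_TOTALS = {"system": 16, "data": 16, "model": 32}
TOTAL_FACETS = sum(STATED_TOTALS.values())


def facet_type(facet_id: str) -> str:
    """Classify a facet identifier as a system, data or model facet."""
    match = FACET_ID.match(facet_id)
    if match is None:
        raise ValueError(f"not a facet identifier: {facet_id!r}")
    return SUFFIX_TO_TYPE[match.group(3)]
```

For instance, `facet_type("EF-4D")` gives `"data"` and `facet_type("EF-4M")` gives `"model"`, matching the Data Volume and Model Size pair discussed in the text, and the stated totals add up to the 64 facets.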

The increase in total facets and the large number of model facets correspond mainly to adding Simulation facets to the Processing View of the Diamonds. Note we have included characteristics (facets) present in the Berkeley Dwarfs and NAS Parallel Benchmarks as well as the NRC Massive Data Analysis Computational Giants. For some facets there are separate data and model facets. A good example in “Convergence Diamond Micropatterns or Execution Features” is that EF-4D is Data Volume and EF-4M is Model Size.

Fig. 1. Summary of the 64 facets in the convergence diamonds

The views Problem Architecture; Execution Features; Data Source and Style; and Processing (runtime) are respectively mainly system facets; a mix of system, model and data facets; mainly data facets; with the final view entirely model facets. The facets tell us how to compare diamonds (instances of big data and simulation applications) and see which system architectures are needed to support each diamond and which architectures work across multiple diamonds, including those from both simulation and big data areas.

In several papers [17, 33, 34] we have looked at the model in big data problems and studied the model performance on both cloud and HPC systems. We have shown similarities and differences between models in the simulation and big data areas. In particular, the latter often need HPC hardware and software enhancements to get good performance. There are special features of each class; for example, simulations often have local connections between model points corresponding either to the discretization of a differential operator or a short range force. Big data problems sometimes involve fully connected sets of points, and these formulations have similarities to long range force problems in simulation. In both regimes we often see linear algebra kernels, but the sparseness structure is rather different. Graph data structures are present in both cases, but those in simulations tend to have more structure. The linkage between people in the Facebook social network is less structured than the linkage between molecules in a complex biochemistry simulation. However, both are graphs with some long range but many short range interactions. Simulations nearly always involve a mix of point to point messaging and collective operations like broadcast, gather, scatter and reduction. Big data problems sometimes are dominated by collectives as opposed to point to point messaging, and this motivates the map collective problem architecture facet PA-3 above. In simulations and big data, one sees a similar BSP (loosely synchronous, PA-8), SPMD (PA-7), iterative (EF-11M) structure, and this motivates the Spark [32], Flink [7], Twister [15, 16] approach. Note that pleasingly parallel (PA-1) local (Pr-2M) structure is often seen in both simulations and big data.
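The iterative map-collective structure described above (PA-3 with SPMD and BSP phases) can be sketched with a toy one-dimensional k-means in plain Python. This is only an illustration under stated assumptions: the "workers" are list partitions processed in a loop rather than real processes, and `local_map`, `allreduce` and `kmeans_map_collective` are hypothetical names, not the API of Spark, Flink, Twister or MPI.

```python
from typing import List, Tuple

Partial = List[Tuple[float, int]]  # per-center (sum of points, count)


def local_map(points: List[float], centers: List[float]) -> Partial:
    """Map step on one worker: accumulate partial sums for the nearest center."""
    partial = [(0.0, 0) for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        total, count = partial[nearest]
        partial[nearest] = (total + x, count + 1)
    return partial


def allreduce(partials: List[Partial]) -> Partial:
    """Collective step: element-wise sum of every worker's partial results."""
    n_centers = len(partials[0])
    return [
        (sum(p[k][0] for p in partials), sum(p[k][1] for p in partials))
        for k in range(n_centers)
    ]


def kmeans_map_collective(partitions: List[List[float]],
                          centers: List[float],
                          iterations: int = 10) -> List[float]:
    """Iterate map + collective phases; every worker runs the same program."""
    for _ in range(iterations):
        partials = [local_map(part, centers) for part in partitions]  # compute phase
        totals = allreduce(partials)                                  # communication phase
        centers = [t / c if c else old for (t, c), old in zip(totals, centers)]
    return centers
```

With two partitions `[[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]` and starting centers `[0.0, 12.0]`, the loop settles on centers `[2.0, 10.0]`. The only communication is the reduction over partial sums, which is the sense in which such problems are dominated by collectives rather than point to point messages.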

In [33, 34] we introduce Harp as a plug-in to Hadoop with scientific data abstractions, support of iterations and high quality communication primitives. This runs with good performance on several important data analytics including Latent Dirichlet Allocation (LDA), clustering and dimension reduction. Note LDA has a nontrivial sparse structure coming from an underlying bag of words model for documents. In [17], we look at performance in great detail, showing excellent data analytics speedup on an Infiniband-connected HPC cluster using MPI. Deep Learning [14, 24] has clearly shown the importance of HPC and uses many ideas originally developed for simulations.

Above we discuss models in the big data and simulation regimes; what about the data? Here we see the issue as perhaps less clear, but convergence does not seem difficult technically. Given models can be executed on HPC systems when needed, it appears reasonable to use a different architecture for the data, with the big data approach of hosting data on clouds quite attractive. HPC has tended not to use big data management tools but rather to host data on shared file systems like Lustre. We expect this to change with object stores and HDFS approaches gaining popularity in the HPC community. It is not clear if HDFS will run on HPC systems or instead on co-located clouds supporting the rich object, SQL, NoSQL and NewSQL paradigms. This co-location strategy can also work for streaming data within the traditional Apache Storm-Kafka map streaming model (PA-5), buffering data with Kafka on a cloud and feeding that data to Apache Storm that may need HPC hardware for complex analytics (running on bolts in Storm). In this regard we have introduced HPC enhancements to Storm [26].
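The buffer-then-process arrangement of the map streaming model (PA-5) can be sketched with in-memory stand-ins. The classes and functions below are hypothetical and use only the Python standard library; they do not reproduce the Kafka or Storm APIs, and serve only to show where buffering ends and analytics begins.

```python
from collections import deque


class TopicBuffer:
    """Stand-in for a message buffer such as a Kafka topic (not the Kafka API)."""

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        """Append an event to the end of the buffer."""
        self._events.append(event)

    def poll(self, max_events):
        """Return up to max_events buffered events in arrival order."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def mean_bolt(state, event):
    """Stand-in for a Storm bolt: update a running (count, total) state."""
    count, total = state
    return (count + 1, total + event)


def run_stream(buffer, bolt, state, batch_size=2):
    """Drain the buffer in small batches and feed each event to the bolt."""
    while True:
        batch = buffer.poll(batch_size)
        if not batch:
            return state
        for event in batch:
            state = bolt(state, event)
```

In this sketch the buffer side could live on a cloud system while the `bolt` function is the part that might need HPC hardware when the analytics become expensive; the split between the two is the design point the text describes.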

We believe there is an immediate need to investigate the overlap of application characteristics and classification from the high-end computing and big data ends of the spectrum. Here we have shown how initial work [21] to classify big data applications can be extended to include traditional high-performance applications. Can traditional classifications for high-performance applications [8] be extended in the opposite direction to incorporate big data applications? And if so, is the end result similar to, overlapping with or very distinct from the preliminary classification proposed here? Such understanding is critical in order to eventually have a common set of benchmark applications and suites [10] that will guide the development of future systems that must have a design point that provides balanced performance.

Note applications are instances of Convergence Diamonds. Each instance will exhibit some but not all of the facets of Fig. 1. We can give an example of the NAS Parallel Benchmark [4] LU (Lower-Upper symmetric Gauss Seidel) using MPI. This would be a diamond with facets PA-4, PA-7, PA-8; Pr-3M, Pr-16M with its size specified in EF-4M. PA-4 would be replaced by PA-2 if one used (unwisely) MapReduce for this problem. Further, if you read initial data from MongoDB, the data facet DV-1D would be added. Many other examples are given in Sect. 3 of [18]. For example, non-vector clustering in Table 1 of that section is a nice data analytics example. It exhibits Problem Architecture view PA-3, PA-7, and PA-8; Execution Features EF-9D (Static), EF-10D (Regular), EF-11M (iterative), EF-12M (bag of items), EF-13D (Non-metric), EF-13M (Non-metric), and EF-14M (O(N²) algorithm); Processing view Pr-3M, Pr-9M (Machine learning and Expectation maximization), and Pr-12M (Full matrix, Conjugate Gradient).
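The two examples above can be written down as sets of facet identifiers, which makes the substitutions and comparisons in the text concrete. This is a minimal sketch: representing a diamond as a `frozenset` and the helper names `swap_facet` and `shared_facets` are assumptions made for illustration, while the facet lists themselves are taken from the paragraph above.

```python
# Facets of NAS LU with MPI, as listed in the text.
lu_mpi = frozenset({"PA-4", "PA-7", "PA-8", "Pr-3M", "Pr-16M", "EF-4M"})

# Facets of the non-vector clustering example, as listed in the text.
nonvector_clustering = frozenset({
    "PA-3", "PA-7", "PA-8",
    "EF-9D", "EF-10D", "EF-11M", "EF-12M", "EF-13D", "EF-13M", "EF-14M",
    "Pr-3M", "Pr-9M", "Pr-12M",
})


def swap_facet(diamond, old, new):
    """Return a new diamond with one facet replaced by another."""
    if old not in diamond:
        raise KeyError(f"{old} is not a facet of this diamond")
    return (diamond - {old}) | {new}


def shared_facets(a, b):
    """Facets that two diamonds have in common."""
    return a & b


# Using MapReduce instead of MPI replaces PA-4 by PA-2.
lu_mapreduce = swap_facet(lu_mpi, "PA-4", "PA-2")

# Reading the initial data from MongoDB adds the data facet DV-1D.
lu_mpi_mongodb = lu_mpi | {"DV-1D"}
```

Here `shared_facets(lu_mpi, nonvector_clustering)` contains PA-7, PA-8 and Pr-3M, a small instance of comparing a simulation diamond with a data analytics diamond through their facets.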

3 HPC-ABDS Convergence Software

In previous papers [20, 25, 28], we introduced the software stack HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack) shown online [4] and in Figs. 2 and 3. These were combined with the big data application analysis [6, 19, 21] in terms of Ogres that motivated the extended convergence diamonds in Sect. 2. We also use Ogres and HPC-ABDS to suggest a systematic approach to benchmarking [18, 22]. In [23] we described the software model of Fig. 2, while further details of the stack can be found in an online course [2] that includes a section with about one slide (and associated lecture video) for each entry in Fig. 2.

Figure 2 collects together much existing relevant systems software coming from either HPC or commodity sources. The software is broken up into layers so software systems are grouped by functionality. The layers where there is especial opportunity to integrate HPC and ABDS are colored green in Fig. 2. This is termed HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack) as many critical core components of the commodity stack (such as Spark and Hbase) come from open source projects while HPC is needed to bring performance and other parallel computing capabilities [23]. Note that Apache is the largest but not only source of open source software; we believe that the Apache Foundation is a critical leader in the Big Data open source software movement and use it to designate the full big data software ecosystem. The figure also includes proprietary systems as they illustrate key capabilities and often motivate open source equivalents. We built this picture for big data problems but it also applies to big simulation with the caveat that we need to add more high level software at the library level and more high level tools like Global Arrays. This will become clearer in the next section when we discuss Fig. 2 in more detail.

Fig. 2. Big Data and HPC Software subsystems arranged in 21 layers. Green layers have a significant HPC integration (Color figure online)

Fig. 3. Comparison of Big Data and HPC simulation software stacks

The essential idea of our Big Data HPC convergence for software is to make use of ABDS software where possible as it offers richness in functionality, a compelling open-source community sustainability model and typically attractive user interfaces. ABDS has a good reputation for scale but often does not give good performance. We suggest augmenting ABDS with HPC ideas, especially in the green layers of Fig. 2. We have illustrated this with Hadoop [33, 34], Storm [26] and the basic Java environment [17]. We suggest using the resultant HPC-ABDS for both big data and big simulation applications. In the language of Fig. 2, we use the stack on the left enhanced by the high performance ideas and libraries of the classic HPC stack on the right. As one example, we recommend using enhanced MapReduce (Hadoop, Spark, Flink) for parallel programming for simulations and big data where it's the model (data analytics) that has similar requirements to simulations. We have shown how to integrate HPC technologies into MapReduce to get the performance expected in HPC [34] and that, on the other hand, if the user interface is not critical, one can use a simulation technology (MPI) to drive excellent data analytics performance [17]. A byproduct of these studies is that classic HPC clusters make excellent data analytics engines. One can use the convergence diamonds to quantify this result. These define properties of applications between both data and simulations and allow one to specify hardware and software requirements uniformly over these two classes of applications.

4 Convergence Systems

Figure 3 contrasts modern ABDS and HPC stacks, illustrating most of the 21 layers and labelling on the left with the layer number used in Fig. 2. The omitted layers in Fig. 2 are Interoperability, DevOps, Monitoring and Security (layers 7, 6, 4, 3), which are all important and clearly applicable to both HPC and ABDS. We also add in Fig. 3 an extra layer corresponding to programming language, a feature that is not discussed in Fig. 2. Our suggested approach is to build around the stacks of Fig. 2, taking the best approach at each layer, which may require merging ideas from ABDS and HPC. This converged stack is still emerging but we have described some features in the previous section. Then this stack would do both big data and big simulation as well as the data aspects (store, manage, access) of the data in the “data plus model” framework. Although the stack architecture is uniform, it will have different emphases in hardware and software that will be optimized using the convergence diamond facets. In particular, the data management will usually have a different optimization from the model computation.

Thus we propose a canonical “dual” system architecture sketched in Fig. 4 with data management on the left side and model computation on the right. As drawn, the systems are the same size, but this of course need not be true. Further, we depict data rich nodes on the left to support HDFS, but that also might not be correct: maybe both systems are disk rich, or maybe we have a classic Lustre style system on the model side to mimic current HPC practice. Finally, the systems may in fact be coincident, with data management and model computation on the same nodes. The latter is perhaps the canonical big data approach, but we see many big data cases where the model will require hardware optimized for performance, with for example high speed internal networking or GPU enhanced nodes. In this case the data may be more effectively handled by a separate cloud-like cluster. This depends on properties recorded in the facets of the Convergence Diamonds for application suites. These ideas are built on substantial experimentation but still need significant testing as they have not been looked at systematically.

Fig. 4. Dual convergence architecture

We suggested using the same software stack for both systems in the dual Convergence system. Now that means we pick and choose from HPC-ABDS on both machines, but we needn't make the same choice on both systems; obviously the data management system would stress software in layers 10 and 11 of Fig. 2 while the model computation would need libraries (layer 16) and programming plus communication (layers 13–15).
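The split of emphasis between the two sides can be recorded as a small configuration. The sketch below is illustrative only: the layer numbers are those cited from Fig. 2 in the paragraph above, while the role names and the `roles_for_layer` helper are hypothetical labels.

```python
# Layer numbers as cited in the text; role names are hypothetical labels.
DUAL_SYSTEM_LAYERS = {
    "data_management": {10, 11},            # layers stressed on the data side
    "model_computation": {13, 14, 15, 16},  # programming, communication, libraries
}


def roles_for_layer(layer):
    """Return the sides of the dual system that emphasize a given layer."""
    return sorted(
        role for role, layers in DUAL_SYSTEM_LAYERS.items() if layer in layers
    )
```

For example, `roles_for_layer(11)` yields `["data_management"]`, whereas a layer such as 7 (Interoperability) is emphasized by neither side in this sketch and yields an empty list.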

Acknowledgments. This work was partially supported by NSF CIF21 DIBBS 1443054, NSF OCI 1149432 CAREER, and AFOSR FA9550-13-1-0225 awards. We thank Dennis Gannon for comments on an early draft.

Appendix: Convergence Diamonds with 64 Facets

These are discussed in Sect. 2 and summarized in Fig. 1.

Table 1. Convergence Diamonds and their Facets.

Facet and View | Comments

PA: Problem Architecture View of Diamonds (Meta or Macro Patterns) | Nearly all are the system of Data and Model

PA-1 Pleasingly Parallel | As in BLAST, Protein docking. Includes Local Analytics or Machine Learning (ML) or filtering pleasingly parallel, as in bio-imagery, radar images (pleasingly parallel but sophisticated local analytics)

PA-2 Classic MapReduce | Search, Index and Query and Classification algorithms like collaborative filtering

PA-3 Map-Collective | Iterative maps + communication dominated by “collective” operations as in reduction, broadcast, gather, scatter. Common data mining pattern but also seen in simulations

PA-4 Map Point-to-Point | Iterative maps + communication dominated by many small point to point messages as in graph algorithms and simulations

PA-5 Map Streaming | Describes streaming, steering and assimilation problems

PA-6 Shared memory (as opposed to distributed parallel algorithm) | Corresponds to problem where shared memory implementations important. Tend to be dynamic and asynchronous

PA-7 SPMD | Single Program Multiple Data, common parallel programming feature

PA-8 Bulk Synchronous Processing (BSP) | Well-defined compute-communication phases

PA-9 Fusion | Full applications often involve fusion of multiple methods. Only present for composite Diamonds

PA-10 Dataflow | Important application features often occurring in composite Diamonds

PA-11M Agents | Modelling technique used in areas like epidemiology (swarm approaches)

PA-12 Orchestration (workflow) | All applications often involve orchestration (workflow) of multiple components

EF: Diamond Micropatterns or Execution Features

EF-1 Performance Metrics | Result of Benchmark

EF-2 Flops/byte (Memory or I/O). Flops/watt (power) | I/O not needed for “pure in memory” benchmark

EF-3 Execution Environment | Core libraries needed: matrix-matrix/vector algebra, conjugate gradient, reduction, broadcast; Cloud, HPC, threads, message passing etc. Could include details of machine used for benchmarking here

EF-4D Data Volume; EF-4M Model Size | Property of a Diamond Instance. Benchmark measure

EF-5D Data Velocity | Associated with streaming facet but value depends on particular problem. Not applicable to model

EF-6D Data Variety; EF-6M Model Variety | Most useful for composite Diamonds. Applies separately for model and data

EF-7 Veracity | Most problems would not discuss but potentially important

EF-8M Communication Structure | Interconnect requirements; Is communication BSP, Asynchronous, Pub-Sub, Collective, Point to Point? Distribution and Synch

EF-9D D=Dynamic or S=Static Data | Clear qualitative properties. Importance familiar from parallel computing and important separately for data and model

EF-9M D=Dynamic or S=Static Model | Clear qualitative properties. Importance familiar from parallel computing and important separately for data and model

EF-10D R=Regular or I=Irregular Data |

EF-10M R=Regular or I=Irregular Model |

EF-11M Iterative or not? | Clear qualitative property of Model. Highlighted by Iterative MapReduce and always present in classic parallel computing

EF-12D Data Abstraction | e.g. key-value, pixel, graph, vector, bags of words or items. Clear quantitative property although important data abstractions not agreed upon. All should be supported by Programming model and runtime

EF-12M Model Abstraction | e.g. mesh points, finite element, Convolutional Network

EF-13D Data in Metric Space or not? | Important property of data

EF-13M Model in Metric Space or not? | Often driven by data but model and data can be different here

EF-14M O(N²) or O(N) Complexity? | Property of Model algorithm


DV: Data Source and Style View of Diamonds (No model involvement except in DV-9)

DV-1D SQL/NoSQL/NewSQL? Can add NoSQL sub-categories such as key-value, graph, document, column, triple store

DV-2D Enterprise data model: e.g. warehouses. Property of data model highlighted in database community/industry benchmarks

DV-3D Files or Objects? Clear qualitative property of data model where files important in science; objects in industry

DV-4D File or Object System: HDFS/Lustre/GPFS. Note HDFS important in Apache stack but not much used in science

DV-5D Archived or Batched or Streaming: Streaming is incremental update of datasets with new algorithms to achieve real-time response. Before data gets to the compute system, there is often an initial data gathering phase which is characterized by a block size and timing. Block size varies from month (Remote Sensing, Seismic) to day (genomic) to seconds or lower (Real time control, streaming)

Streaming Category S1): Set of independent events where precise time sequencing unimportant

Streaming Category S2): Time series of connected small events where time ordering important

Streaming Category S3): Set of independent large events where each event needs parallel processing with time sequencing not critical

Streaming Category S4): Set of connected large events where each event needs parallel processing with time sequencing critical

Streaming Category S5): Stream of connected small or large events to be integrated in a complex way

DV-6D Shared and/or Dedicated and/or Transient and/or Permanent: Clear qualitative property of data whose importance is not well studied. Other characteristics may be needed for auxiliary datasets and these could be interdisciplinary, implying nontrivial data movement/replication

DV-7D Metadata and Provenance: Clear qualitative property but not for kernels as important aspect of data collection process

DV-8D Internet of Things: Dominant source of commodity data in future. 24 to 50 Billion devices on Internet by 2020

DV-9 HPC Simulations generate Data: Important in science research especially at exascale

DV-10D Geographic Information Systems: Geographical Information Systems provide attractive access to geospatial data


Pr: Processing (runtime) View of Diamonds. Useful for Big data and Big simulation

Pr-1M Micro-benchmarks: Important subset of small kernels

Pr-2M Local Analytics or Informatics or Simulation: Executes on a single core or perhaps node and overlaps Pleasingly Parallel

Pr-3M Global Analytics or Informatics or simulation: Requiring iterative programming models across multiple nodes of a parallel system

Pr-12M Linear Algebra Kernels: Important property of some analytics. Many important subclasses: Conjugate Gradient, Krylov, Arnoldi iterative subspace methods; Full Matrix; Structured and unstructured sparse matrix methods

Pr-13M Graph Algorithms: Clear important class of algorithms often hard especially in parallel

Pr-14M Visualization: Clearly important aspect of analysis in simulations and big data analyses

Pr-15M Core Libraries: Functions of general value such as Sorting, Math functions, Hashing

Big Data Processing Diamonds

Pr-4M Base Data Statistics: Describes simple statistical averages needing simple MapReduce in problem architecture

Pr-5M Recommender Engine: Clear type of big data machine learning of especial importance commercially

Pr-6M Data Search/Query/Index: Clear important class of algorithms especially in commercial applications

Pr-7M Data Classification: Clear important class of big data algorithms

Pr-8M Learning: Includes deep learning as category

Pr-9M Optimization Methodology: Includes Machine Learning, Nonlinear Optimization, Least Squares, expectation maximization, Dynamic Programming, Linear/Quadratic Programming, Combinatorial Optimization

Pr-10M Streaming Data Algorithms: Clear important class of algorithms associated with Internet of Things. Can be called DDDAS (Dynamic Data-Driven Application Systems)

Pr-11M Data Alignment: Clear important class of algorithms as in BLAST to align genomic sequences

Simulation (Exascale) Processing Diamonds

Pr-16M Iterative PDE Solvers: Jacobi, Gauss Seidel etc.

Pr-17M Multiscale Method? Multigrid and other variable resolution approaches

Pr-18M Spectral Methods: Fast Fourier Transform

Pr-19M N-body Methods: Fast multipole, Barnes-Hut

Pr-20M Particles and Fields: Particle in Cell

Pr-21M Evolution of Discrete Systems: Electrical Grids, Chips, Biological Systems, Epidemiology. Needs ODE solvers

Pr-22M Nature of Mesh if used: Structured, Unstructured, Adaptive

References

1. Big Data and Extreme-scale Computing (BDEC). http://www.exascale.org/bdec/. Accessed 29 Jan 2016

2. Data Science Curriculum: Indiana University Online Class: Big Data Open Source Software and Projects (2014). http://bigdataopensourceprojects.soic.indiana.edu/. Accessed 11 Dec 2014

3. DDDAS Dynamic Data-Driven Applications System Showcase. http://www.1dddas.org/. Accessed 22 July 2015

4. HPC-ABDS Kaleidoscope of over 350 Apache Big Data Stack and HPC Technologies. http://hpc-abds.org/kaleidoscope/

5. NSCI: Executive Order - creating a National Strategic Computing Initiative, 29 July 2015. https://www.whitehouse.gov/the-press-office/2015/07/29/executive-order-creating-national-strategic-computing-initiative

6. NIST Big Data Use Case & Requirements. V1.0 Final Version 2015, January 2016. http://bigdatawg.nist.gov/V1_output_docs.php

7. Apache Software Foundation: Apache Flink open source platform for distributed stream and batch data processing. https://flink.apache.org/. Accessed 16 Jan 2016

8. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., et al.: The landscape of parallel computing research: a view from Berkeley. Tech. rep., UCB/EECS-2006-183, EECS Department, University of California, Berkeley (2006). http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

9. Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Dagum, L., Fatoohi, R.A., Frederickson, P.O., Lasinski, T.A., Schreiber, R.S., et al.: The NAS parallel benchmarks. Int. J. High Perform. Comput. Appl. 5(3), 63–73 (1991)

10. Baru, C., Rabl, T.: Tutorial 4 "Big Data Benchmarking" at 2014 IEEE International Conference on Big Data (2014). http://cci.drexel.edu/bigdata/bigdata2014/tutorial.htm. Accessed 2 Jan 2015

11. Baru, C.: Big Data Top 100 List. http://www.bigdatatop.100.org/. Accessed Jan 2016

12. Bryant, R.E.: Data-Intensive Supercomputing: The case for DISC, 10 May 2007. http://www.cs.cmu.edu/bryant/pubdir/cmu-cs-07-128.pdf

13. Bryant, R.E.: Supercomputing & Big Data: A Convergence. https://www.nitrd.gov/nitrdgroups/images/5/5e/SC15panel_RandalBryant.pdf. Supercomputing (SC) 15 Panel - Supercomputing and Big Data: From Collision to Convergence, Nov 18 2015, Austin, Texas. https://www.nitrd.gov/apps/hecportal/index.php?title=Events#Supercomputing_.28SC.29_15_Panel

14. Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., Andrew, N.: Deep learning with COTS HPC systems. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1337–1345 (2013)

15. Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S.H., Qiu, J., Fox, G.: Twister: a runtime for iterative mapreduce. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 810–818. ACM (2010)

16. Ekanayake, J., Pallickara, S., Fox, G.: Mapreduce for data intensive scientific analyses. In: IEEE Fourth International Conference on eScience (eScience 2008), pp. 277–284. IEEE (2008)

17. Ekanayake, S., Kamburugamuve, S., Fox, G.: SPIDAL: high performance data analytics with Java and MPI on large multicore HPC clusters, Technical report, January 2016. http://dsc.soic.indiana.edu/publications/hpc2016-spidal-high-performance-submit-18-public.pdf

18. Fox, G., Jha, S., Qiu, J., Ekanayake, S., Luckow, A.: Towards a comprehensive set of big data benchmarks. In: Big Data and High Performance Computing, vol. 26, p. 47, February 2015. http://grids.ucs.indiana.edu/ptliupages/publications/OgreFacetsv9.pdf

19. Fox, G., Chang, W.: Big data use cases and requirements. In: 1st Big Data Interoperability Framework Workshop: Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data, pp. 18–21 (2014)

20. Fox, G., Qiu, J., Jha, S.: High performance high functionality big data software stack. In: Big Data and Extreme-scale Computing (BDEC) (2014). http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/fox.pdf

21. Fox, G.C., Jha, S., Qiu, J., Luckow, A.: Towards an understanding of facets and exemplars of big data applications. In: 20 Years of Beowulf: Workshop to Honor Thomas Sterling's 65th Birthday, Annapolis, 14 October 2014. http://dx.doi.org/10.1145/2737909.2737912

22. Fox, G.C., Jha, S., Qiu, J., Luckow, A.: Ogres: a systematic approach to big data benchmarks. In: Big Data and Extreme-scale Computing (BDEC), pp. 29–30 (2015)

23. Fox, G.C., Qiu, J., Kamburugamuve, S., Jha, S., Luckow, A.: HPC-ABDS high performance computing enhanced apache big data stack. In: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 1057–1066. IEEE (2015)

24. Iandola, F.N., Ashraf, K., Moskewicz, M.W., Keutzer, K.: FireCaffe: near-linear acceleration of deep neural network training on compute clusters. arXiv preprint arXiv:1511.00175 (2015)

25. Jha, S., Qiu, J., Luckow, A., Mantha, P., Fox, G.C.: A tale of two data-intensive paradigms: applications, abstractions, and architectures. In: 2014 IEEE International Congress on Big Data (BigData Congress), pp. 645–652. IEEE (2014)

26. Kamburugamuve, S., Ekanayake, S., Pathirage, M., Fox, G.: Towards high performance processing of streaming data in large data centers, Technical report (2016). http://dsc.soic.indiana.edu/publications/high_performance_processing_stream.pdf

27. National Research Council: Frontiers in Massive Data Analysis. The National Academies Press, Washington (2013)

28. Qiu, J., Jha, S., Luckow, A., Fox, G.C.: Towards HPC-ABDS: an initial high performance big data stack. In: Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data, pp. 18–21 (2014). http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf

29. Reed, D.A., Dongarra, J.: Exascale computing and big data. Commun. ACM 58(7), 56–68 (2015)

30. Trader, T.: Toward a converged exascale-big data software stack, 28 January 2016. http://www.hpcwire.com/2016/01/28/toward-a-converged-software-stack-for-extreme-scale-computing-and-big-data/

31. Van der Wijngaart, R.F., Sridharan, S., Lee, V.W.: Extending the BT NAS parallel benchmark to exascale computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 94. IEEE Computer Society Press (2012)

32. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, vol. 10, p. 10 (2010)

33. Zhang, B., Peng, B., Qiu, J.: Parallel LDA through synchronized communication optimizations. Technical report (2015). http://dsc.soic.indiana.edu/publications/LDA_optimization_paper.pdf

34. Zhang, B., Ruan, Y., Qiu, J.: Harp: collective communication on hadoop. In: IEEE International Conference on Cloud Engineering (IC2E) Conference (2014)

New Benchmark Proposals

Benchmarking Fast-Data Platforms for the Aadhaar Biometric Database

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
simmhan@cds.iisc.ac.in, shukla@ssl.serc.iisc.in, arun.verma100@gmail.com

Abstract. Aadhaar is the world's largest biometric database with a billion records, being compiled as an identity platform to deliver social services to residents of India. Aadhaar processes streams of biometric data as residents are enrolled and updated. Besides ∼1 million enrollments and updates per day, up to 100 million daily biometric authentications are expected during delivery of various public services. These form critical Big Data applications, with large volumes and high velocity of data. Here, we propose a stream processing workload, based on the Aadhaar enrollment and authentication applications, as a Big Data benchmark for distributed stream processing systems. We describe the application composition, and characterize their task latencies and selectivity, and data rate and size distributions, based on real observations. We also validate this benchmark on Apache Storm using synthetic streams and simulated application logic. This paper offers a unique glimpse into an operational national identity infrastructure, and proposes a benchmark for "fast data" platforms to support such eGovernance applications.

1 Introduction

The Unique Identification Authority of India (UIDAI) manages the national identity infrastructure for India, and provides a biometrics-based unique 12-digit identifier for each resident of India, called Aadhaar (which means foundation, in Sanskrit). Aadhaar was conceived as a means to identify the social services and entitlements that each resident is eligible for, and to ensure transparent and accountable public service delivery by various government agencies.

The scope of UIDAI is itself narrow. It maintains a database of unique residents in India, with uniqueness guaranteed by their 10 fingerprints and iris scan; assigns a 12-digit Aadhaar number to the resident; and, as an authentication service, validates whether a given biometric matches a given Aadhaar number by checking its database. Other authorized government and private agencies can use this UIDAI authentication service to ensure that a specific person requesting service is who they claim to be, and use their Aadhaar number as a primary key for determining service entitlements and delivery. However, to guarantee the privacy of residents, UIDAI does not permit a general lookup of the database based on just the biometrics, to locate an arbitrary person of interest.
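The service boundary described above (verification of a biometric against a stated number, with no search by biometric alone) can be sketched as a small interface. This is only an illustration under stated assumptions: the class name `IdentityStore`, its methods, and the use of a SHA-256 hash as a stand-in for a biometric template are all hypothetical. Real biometric matching is approximate rather than exact, and nothing here reflects the actual UIDAI API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IdentityStore:
    """Toy stand-in for a resident database keyed by a 12-digit number."""

    _templates: dict = field(default_factory=dict, repr=False)

    @staticmethod
    def _template(biometric: bytes) -> bytes:
        # Assumption: a hash stands in for a real biometric feature template.
        return hashlib.sha256(biometric).digest()

    def enroll(self, number: str, biometric: bytes) -> None:
        """Register a resident under a 12-digit identifier."""
        if len(number) != 12 or not number.isdigit():
            raise ValueError("identifier must be exactly 12 digits")
        self._templates[number] = self._template(biometric)

    def authenticate(self, number: str, biometric: bytes) -> bool:
        """1:1 check: does this biometric match this specific number?"""
        stored = self._templates.get(number)
        return stored is not None and stored == self._template(biometric)

    # Deliberately no identify(biometric) method: a 1:N search by biometric
    # alone is the general lookup that the text says is not permitted.


store = IdentityStore()
store.enroll("123456789012", b"sample-fingerprint")
print(store.authenticate("123456789012", b"sample-fingerprint"))
print(store.authenticate("123456789012", b"another-fingerprint"))
```

The design point is that the identifier acts as the primary key, so every request touches one record rather than scanning the repository.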

© Springer International Publishing AG 2016. T. Rabl et al. (Eds.): WBDB 2015, LNCS 10044, pp. 21–39, 2016. DOI: 10.1007/978-3-319-49748-8_2

Y. Simmhan et al.

UIDAI holds the world's largest biometric repository. As of writing, it has enrolled 981 M of the 1,211 M residents of India in its database (see footnote 1). It continues to voluntarily register pending and newly eligible ones, at an operational cost of about USD 1 per person. It also currently performs up to 2 M authentications per day to support a few social services, and this is set to grow to 100 M per day as the use of Aadhaar becomes pervasive. The Aadhaar database is 5× larger than the next publicly-known biometric repository, the US Homeland Security's OBIM (VISIT) program, which stores 176 M records on fingerprints, and processes 40× fewer transactions at 88.73 M authentications each year (or 245 K per day), as of latest data available from 2014 [10].
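To put the volumes quoted above into per-second terms, the following back-of-the-envelope sketch converts the daily figures into average rates. It uses only the numbers in the paragraph. The variable names are illustrative, and the results are averages under the assumption of a uniform load over 24 hours, so real peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the text (counts in millions unless noted).
enrolled_m = 981
population_m = 1211
obim_records_m = 176
obim_auth_per_year = 88.73e6
auth_per_day_now = 2e6
auth_per_day_target = 100e6


def per_second(per_day: float) -> float:
    """Average events per second, assuming uniform load over a day."""
    return per_day / SECONDS_PER_DAY


coverage = enrolled_m / population_m       # fraction of residents enrolled
size_ratio = enrolled_m / obim_records_m   # repository size vs. OBIM
obim_per_day = obim_auth_per_year / 365    # close to the quoted 245 K per day
current_rate = per_second(auth_per_day_now)
target_rate = per_second(auth_per_day_target)

print(f"enrollment coverage: {coverage:.1%}")
print(f"size relative to OBIM: {size_ratio:.1f}x")
print(f"OBIM authentications per day: {obim_per_day:,.0f}")
print(f"current average rate: {current_rate:.1f} authentications/s")
print(f"target average rate: {target_rate:.1f} authentications/s")
```

The target of 100 M authentications per day corresponds to an average above a thousand requests per second, which is the scale at which the choice of stream processing platform starts to matter.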

Clearly, the Aadhaar repository offers a unique Big Data challenge, both in general and especially for the public sector. The present software architecture of UIDAI is based on contemporary open source enterprise solutions that are designed to scale out on commodity hardware [3]. Specifically, it uses a Staged Event-Driven Architecture (SEDA) [12] in which a publish-subscribe mechanism coordinates the execution flow of logical application stages over batches of incoming requests. Within each stage, Big Data technologies optimized for volume, such as HBase and Solr, and even traditional relational databases like MySQL, are used. As part of the constant evolution of the UIDAI architecture, one of the goals is to reduce the latency for enrollment of new residents or updating of their details, which takes between 3–30 days now, to something that can be done interactively. Another is to ensure the scalability of the authentication transactions as the requests grow to hundreds of millions per day, and to evaluate the applicability of emerging Big Data platforms to achieve the same.
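The staged, event-driven style described above can be illustrated with a minimal sketch in which each stage runs in its own thread, takes batches from an inbox queue, and hands its output to the next stage. This is an assumption-laden toy, not the UIDAI implementation: the names `Stage` and `run_pipeline` are hypothetical, the handlers are placeholders, and plain in-process FIFO queues stand in for a real publish-subscribe broker.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


class Stage(threading.Thread):
    """One logical stage: consumes batches, applies a handler to each item,
    and publishes the resulting batch to the next stage's inbox."""

    def __init__(self, name, handler, inbox, outbox):
        super().__init__(name=name)
        self.handler = handler
        self.inbox = inbox
        self.outbox = outbox

    def run(self):
        while True:
            batch = self.inbox.get()
            if batch is STOP:
                self.outbox.put(STOP)  # propagate shutdown downstream
                return
            self.outbox.put([self.handler(item) for item in batch])


def run_pipeline(batches, handlers):
    """Chain one stage per handler and push every batch through them."""
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    stages = [
        Stage(f"stage-{i}", handler, queues[i], queues[i + 1])
        for i, handler in enumerate(handlers)
    ]
    for stage in stages:
        stage.start()
    for batch in batches:
        queues[0].put(batch)
    queues[0].put(STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for stage in stages:
        stage.join()
    return results


# Placeholder handlers standing in for real application stages.
print(run_pipeline([[" a ", "b "], ["c"]], [str.strip, str.upper]))
```

Because stages are decoupled by queues, a slow stage delays the batches behind it without blocking upstream stages from accepting work, which is the property that makes per-stage latency a natural quantity to characterize.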

Distributed stream processing systems such as Apache Storm and Apache Spark Streaming have gained traction, of late, in managing data velocity. Such "fast data" systems offer dataflow or declarative composition semantics and process data at high input rates, with low latency, on distributed commodity clusters. In this context, this paper proposes benchmark workloads, motivated by realistic national-scale eGovernance applications, to evaluate the quality of service and the host efficiency that are provided by Fast Data platforms to process high-velocity data streams.
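Dataflow composition, as mentioned above, amounts to chaining tasks so that the output stream of one becomes the input stream of the next. The sketch below shows this with Python generators and tracks, for each task, the ratio of messages emitted to messages received. Treating that ratio as the task's selectivity is an assumption made here for illustration, and the names `Task` and `compose` are hypothetical rather than taken from Storm or Spark Streaming.

```python
class Task:
    """Wraps a function that maps one input message to zero or more outputs."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.received = 0
        self.emitted = 0

    def process(self, stream):
        """Lazily consume a stream and yield this task's output messages."""
        for msg in stream:
            self.received += 1
            for out in self.fn(msg):
                self.emitted += 1
                yield out

    @property
    def selectivity(self):
        """Messages emitted per message received (assumed definition)."""
        return self.emitted / self.received if self.received else 0.0


def compose(source, tasks):
    """Chain tasks into a linear dataflow over the source stream."""
    stream = iter(source)
    for task in tasks:
        stream = task.process(stream)
    return stream


tasks = [
    Task("keep_even", lambda x: [x] if x % 2 == 0 else []),
    Task("duplicate", lambda x: [x, x]),
]
outputs = list(compose(range(10), tasks))
for task in tasks:
    print(task.name, task.received, task.emitted, task.selectivity)
```

A filtering task has selectivity below one and a fan-out task above one, so the product of selectivities along a path gives a rough estimate of how the input rate is scaled by the time it reaches a downstream task.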

More generally, this paper highlights the growing importance of eGovernance workloads rather than just enterprise or scientific workloads [2]. As emerging economies like China, India and Brazil with large populations start to digitize their governance platforms and citizen-service delivery, massive online applications can pose unique challenges to Big Data platforms. For example, technology challenges with the HealthCare.gov insurance exchange website in the US to support the Affordable Care Act are well known (see footnote 2). There are inadequate benchmarks and workloads to help evaluate Big Data platforms for such public sector

1 India 2011 Census, and live statistics from https://portal.uidai.gov.in.

2 Healthcare.gov: CMS Has Taken Steps to Address Problems, but Needs to Further Implement Systems Development Best Practices, www.gao.gov/products/GAO-15-238


was no good. I’ll bring you another quarter as soon as I can.”

Dan determined to do the best he could to fulfill his errand, even if it was not his fault that the storekeeper would not trust the old farmer.

“Wa’ll, I s’pose that’ll have to do.”

The truth was Mr. Lee did not like to lose trade, and, though Mr. Savage owed him quite a bill, he knew he would be paid in time. If he did not accommodate the farmer’s wife they might take their trade elsewhere.

“You tell Peter Savage he’d better be careful how he circulates counterfeit money,” went on Mr. Lee, as he gave Dan back the quarter, and passed over the jug of molasses. “An’ be sure an’ bring me a good quarter as soon as ye kin.”

“Yes, sir,” replied Dan. “I wonder where that bad money came from?” he thought as he started for home.

Though he had done the errand in much less time than usual, because of the ride Mr. Harrison had given him, Dan was scolded by Mrs. Savage when he got in.

“I never see such a lazy boy!” she exclaimed. “Did ye stop t’ make th’ molasses?”

“I came pretty near not getting it,” said Dan, and he told her of the incident of the bad quarter. As he had anticipated, that caused Mrs. Savage more astonishment than did the reported refusal of Mr. Lee to extend any more credit, for though Mr. Savage was wealthy he delayed as long as he could the paying of debts.

“That quarter bad!” she exclaimed, as Dan handed it to her. “I don’t believe it!”

But when she tested it carefully she had to admit that it was.

“Wa’al, of all things!” she cried. “That’s a swindle, and I’ll have th’ law on that man!”

“Who gave it to Mr. Savage?” asked Dan.

“Who? Why that pesky book agent that was askin’ ye so many questions. Pa—which was what she called her husband—give it to me fer grocery money. Now it’s bad! Wa’ll ye’ll have t’ make it up, that’s all. It was your fault that we got it.”

“I don’t see how you can say that.”

“Ye don’t, eh? well I say it, jest th’ same, an’ I tell ye what, ye’ve got t’ pay us back in some way.”

“What’s th’ matter?” inquired Mr. Savage, coming in from the barn. “Hey, Dan! Ain’t ye back at shellin’ corn yet? Land sakes! I never see sech a lazy boy!”

“Look a-here!” exclaimed Mrs. Savage. “This quarter’s a counterfeit,” and she quickly told the story of it.

“Land o’ Tunket!” cried Mr. Savage. “I’ve been swindled, that’s what I have. That was a slick feller, what was talkin’ t’ ye that day, Dan. It’s all your fault, too. Ef ye hadn’t been so ready t’ answer his questions he’d a gone on about his business, an’ I wouldn’t be out a quarter. It’s all your fault.”

As the quarter had been given for a few minutes of Dan’s not very valuable time, it was hard to understand where Mr. Lee was out of pocket. But, like most unreasonable men, if he once believed a thing, he seldom changed his mind.

“Shall I take a good quarter back to Mr. Lee?” asked Dan. “I promised him I would.”

“Wa’al, I guess not. Ye’ve loafed enough fer one day. I’ll take him a quarter when I git good an’ ready. Now ye’d better start t’ weedin’ th’ onion patch. It needs it, an’ don’t be all day about it, nuther. Step lively, I never see sech a lazy boy!”

Dan did not reply, but, as he went to the big onion patch, which he hated almost as much as he did the corn sheller, he could not help thinking of the man who had given his employer the counterfeit money.

“There’s something queer about that man,” thought Dan. “He isn’t what he seems to be, a book agent. I wonder if he can be a

CHAPTER VII

A TELEPHONE MESSAGE

Several days after this, during which time Dan had been kept very busy about the farm, and helping Mrs. Savage with the housework, a neighbor called to him as he was hoeing the potatoes in the garden patch.

“Say, Dan, is Mr. Savage in the house?”

“I think so, sir; did you want to see him?”

“Well, not exactly, but some one has just called him up on my telephone, and they want to talk to him. I don’t see why he doesn’t get a telephone of his own. Too mean, I suppose. I had to leave off fixing a set of harness and come running over to tell him.”

“It’s too bad you had that trouble,” responded Dan. “I will call him.”

Dan dropped his hoe and started for the house.

“Tell him to hurry,” advised the man. “Whoever it is that’s calling him has to hold the wire, and that costs money.”

“Mr. Savage will not like that.”

“Oh, well, he doesn’t have to pay for the time. The person who called him up has to foot the bill.”

“Then I’ll tell him to hurry so as to save all the expense he can.”

“Very well, I’ll go back to my work.”

Dan found his employer trying to sew up a rip in one of his shoes, for Mr. Savage hated to spend money on anything, and repairing his own footwear was one of the ways he endeavored to save.

“Some one wants to speak to you on the telephone, Mr. Savage.”

“Telephone? What telephone?”

“The one in Mr. Lane’s house.”

“Wants to speak to me on th’ telephone? I wonder who it can be? Nobody would want to telephone me.”

“You had better hurry,” advised Dan. “It costs money to hold the wire.”

“Hold th’ wire? Costs money? Say, ef anybody thinks I’m goin’ t’ waste money t’ talk over one of them things, they’re mighty much mistaken. I ain’t got no money t’ throw away on sech foolishness,” and he was about to proceed with his shoe mending.

“No, no, it doesn’t cost you anything,” explained Dan. “The person who called you up has to pay.”

“Oh, that’s different. Wa’al, I guess I might as well go then,” and he leisurely laid aside the shoe, the piece of leather and needle.

“I think you had better hurry,” said Dan respectfully.

“Hurry? What fer? Didn’t ye say th’ other fellow was payin’ fer it? I ain’t got no call t’ hurry. It don’t cost me nothin’, an’ ef folks calls me up they has t’ wait till I git good an’ ready t’ answer.”

There was no use trying to combat such a mean argument as Dan felt this to be, so he said nothing, but went back to resume his hoeing, before Mr. Savage would have a chance to scold him for being lazy.

“I wish some one would call me up on the telephone, and tell me they had a good job for me—somewhere else besides on a farm,” thought Dan, as he bent to his work. “I wonder how it seems to talk over a wire. It must be queer.”

When he found out he did not have to pay for the telephone message Mr. Savage proceeded slowly down the road to Mr. Lane’s house. Though he pretended he was not anxious, the old miserly farmer was, nevertheless, quite excited in wondering who could want to talk to him.

“Maybe it’s a message from th’ police, to say that th’ feller what give me that bad quarter has been arrested,” he murmured as he approached the house. “I hope it is. I’d like t’ see him git ten years. It was a mean trick t’ play on me.”

“You’d better hurry,” advised Mr. Lane, as he saw Mr. Savage coming up the front walk. “The party on the other end of the wire is getting impatient.”

“Wa’al, folks what bothers me at my work has t’ wait,” spoke up Mr. Savage, in rather a surly tone, and he did not thank Mr. Lane for his trouble in calling him to the telephone.

Mr. Savage took up the receiver, and fairly shouted into the transmitter:

“Hello: Who be ye? What ye want?”

“Not so loud,” cautioned Mr. Lane. “They can hear you better if you speak lower.”

But Mr. Savage paid no attention. However he ceased to shout, as the person on the other end of the wire was talking. The farmer listened intently. Then he began to reply:

“Yes, yes,” he said. “I s’pose I kin do it, but it takes my time. Yes, ye can tell her I’ll send it. Guess I’ll have t’ let Dan bring it over. I’ll send him t’night so’s t’ save time.”

Then he hung up the receiver.

“No bad news, I hope?” asked Mrs. Lane politely.

“Yes, ’tis,” replied Mr. Savage. “My sister Lucy, over t’ Pokeville, was took suddenly sick this mornin’. That was her husband telephonin’ t’ me. Lucy wants a bottle of our old family pain-killer. I ain’t got none in th’ house, nuther, an’ I’ve got t’ go t’ th’ village an’ git it. That’s goin’ t’ take a lot of time.”

“Is she very sick?”

“Pretty bad, I guess, th’ way her husband talked,” but Mr. Savage did not seem to regard this so much as he did the loss of his time.

“He didn’t say nothin’ about pay, nuther,” he went on. “I’ve got t’ put out my money fer th’ medicine, an’ goodness knows when I’ll get it back. Then I’ve got t’ let Dan Hardy take th’ medicine over. Land sakes! But sickness is a dreadful nuisance! I don’t see what wimen is allers gittin’ sick fer.”

“Probably she couldn’t help it,” said Mr. Lane, who was indignant at the lack of feeling on the part of Mr. Savage.

“Wa’al, mebby not. But it takes my time an’ money just th’ same. But I’ll make Dan travel by night, so he won’t lose any time.”

“No, it would be too bad if he had a few hours off,” said Mr. Lane in a sarcastic manner, which, however, was lost on Mr. Savage.

“I’m—I’m much obliged fer callin’ me,” said the old farmer as he started away. It seemed as if it hurt him to say those words, so crusty was he.

“Oh, you’re welcome,” replied Mr. Lane coldly.

Mr. Savage found his wife much excited, waiting for him to return and tell her about the message, for she had learned from Dan where her husband had gone.

“Has any one died an’ left ye money?” was her first question.

“No sech luck,” replied Mr. Savage. “I’ve got t’ spend money as ’tis.”

“Can’t ye send the medicine by express, collect?” asked Mrs. Savage when her husband told of his sister’s illness.

“Wish I could, but there ain’t no express that goes that way. No, I’ve got t’ send Dan.”

Mr. Savage began his preparations to go to the village after the old family pain-killer, a mixture that had been used by himself and his relatives as far back as he could remember. It was taken for almost every thing, as few people care to call in a doctor every time they feel ill.

“You’ll have t’ go over t’ Pokeville t’night, Dan,” said the farmer, as he was driving to the village, and he explained the reason. “Ye kin go hoss back, an’ that’ll be quicker,” he added.

“It will take nearly all night to go there and return, Mr. Savage.”

“Wa’al, I can’t help it. Ye ought t’ be back here by five o’clock, an’ ye kin git an hour’s sleep, an’ start in t’ work. Boys don’t need much rest.”

“No, nor not much else, if you had your way,” thought Dan. “Oh, I’ve a good notion to run away, only I don’t know where to run to. I must see Mr. Harrison, and ask his advice,” and, pondering over his hard lot, Dan continued to hoe the potatoes.

CHAPTER VIII

A MIDNIGHT RIDE

Mr. Savage took his time in going to the village, and did not come back until after the dinner hour.

“Did ye git th’ medicine?” asked Mrs. Savage of her husband.

“Yep, an’ it cost a dollar.”

“Tell Dan t’ ask yer sister fer it ’fore he delivers th’ bottle,” advised the farmer’s wife.

“I guess that would be a good way,” decided Mr. Savage, brightening up. “I kin charge fer my time, too. I’ll have him ask a dollar an’ a half. That won’t be so bad,” and he rubbed his hands in satisfaction over the idea.

It was getting dusk when Dan started away on the back of the mare Bess. He had been obliged to do all his chores before starting, for Mr. Savage never thought of such a thing as omitting any of the farm work for the sake of his sick sister.

“Now don’t ride th’ mare t’ death,” cautioned Mr. Savage as Dan rode out of the barn. “Ye’ve got all night, an’ th’ trip’ll do ye good.”

Dan had his doubts about this, but he said nothing.

“I wonder if Mrs. Randall is very sick?” the boy said to himself as he was going along the highway toward Pokeville in the dusk. “If I had a sister who was ill I wouldn’t wait until night to send her medicine. Pokeville is so far away from any stores and doctors she might die before help could reach her. I wonder what time I’ll get there? I don’t know the roads very well, and, from what the hired men said, they don’t either. Seems to me Mr. Savage could have driven over this afternoon by daylight. But I suppose he did not want to waste the time.”

Thus pondering, Dan rode on. It was quite lonesome, for in the country, people do not travel after dark unless they have to, and there was no occasion for many to be out this evening.

Dan met a few farmers whom he knew, and spoke to them, but, after a while, it got so dark he could not distinguish faces, even had he passed any one, which did not happen once he got well outside of Hayden.

The mare plodded on, Dan stopping now and then to look at sign boards to make sure he was on the right road. About ten o’clock he came to a place where four highways met. He got off the mare’s back and lighted matches to read what the signs might have on them. In the uncertain light he thought he read, “Pokeville three miles,” with a finger pointing to the east.

“Well, that’s not so bad,” he murmured. “I can do three miles in half an hour, and I may get back home in time to sleep.”

He mounted the mare again, and started off at a smart trot. He soon began to look for some signs of a village, but after he had ridden nearly an hour, and all there was to be seen was the dark road, he began to have his doubts.

“I wonder if I am going the right direction?” he asked himself. “Bess, don’t you know?”

The mare whinnied an answer, but what it was, Dan, of course, could not tell.

“I’ll keep on until I come to a house,” he decided, “and then I’ll ask my way.”

About ten minutes after this he came to a lonely residence, standing beside the road. It was all dark, and the boy disliked to knock and arouse the inmates, but he felt it was a case of necessity. Dismounting, and taking care not to smash the bottle of medicine in his coat pocket, he pounded on the door. The persons inside must have been sound sleepers, for it was nearly five minutes before a window was raised, and a man, thrusting his head out, asked:

“Well, what’s the matter? What do you want?”

“I am sorry to trouble you,” said Dan, “but is this the road to Pokeville?”

“Land sakes no!” exclaimed the man. “You’re five miles out of your way. What’s the trouble?”

“I am taking some medicine to a sick woman over there, and I looked at a sign post some five miles back. I thought it said Pokeville was in this direction.”

“I see, you made the same mistake other persons make. There’s a town called Hokeville, which is on this road. Lots of people look at the sign in a hurry, and think it says Pokeville.”

“I should think they’d make the letters big enough so people could read them easily,” remarked Dan.

“That’s what they ought to.”

“What’s the matter? Is the house afire?” demanded a woman sticking her head out beside that of the man.

“No, no, Mandy. Every time there’s any excitement at night you think there’s a fire.”

“Well, the chimbley was on fire onct!”

“Yes, but it isn’t now. This is a boy inquiring his way to Pokeville. Some one is sick and he’s taking her some medicine.”

“Who’s sick, boy?” asked the woman, determined not to miss a chance to hear some news, even if she had been awakened from sleep.

“Mrs. Randall.”

“What, not Lucy Randall, she that was a Savage?”

“She’s Mr. Savage’s sister.”

“Do tell! What’s the matter with her?”

“I don’t know, ma’am.”

“If you’ll go back about a mile, an’ take the first road you come to, an’ then go along that until ye come to an’ old grist mill, an’ then take th’ first turn to the left, an’ the second to the right after that, ye’ll git on th’ right road to Pokeville,” said the man.

“I’m afraid I can’t remember all that,” replied Dan.

The man repeated it for him, and at last the boy thought he could find his way. He thanked his informant and started off, the woman calling after him:

“Tell Lucy that Mandy Perkins, her that married a Linton, was askin’ fer her.”

“I will,” promised Dan.

He hurried Bess along through the dark night, and along the unfamiliar road. He got as far as the grist mill, which he could dimly distinguish, and then he was at a loss. He might have taken the wrong road again, but, fortunately a farmer unexpectedly came along and directed him.

It was past midnight when Dan rode into the village of Pokeville. It seemed as if every one was asleep as he trotted through the lonely main street. Now and then a dog barked, or a rooster crowed, but that was the only sign of life. Dan knew his way now, for Mr. Savage had given him directions how to find the house of Mrs. Randall. It was close to one o’clock when the youth knocked at the door. He expected to find the house lighted up, since some one in it was ill.

“I wonder if she can be—worse or dead,” he murmured. “It seems very quiet.”

Some one must have been up, however, for, a few moments after his pounding on the front door had awakened the echoes of the silent house, an upper window was raised and a head thrust out.

“Everyone seems afraid to open their front doors after dark,” thought the lad.

“Well, what’s wanted?” asked a man’s voice.

“Here is the medicine from Mr. Savage.”

“Oh! All right. I’ll be right down,” and the man drew in his head. A woman’s at once replaced it.

“Who is it?” she asked.

“I’m Dan Hardy. I work for Mr. Savage. He sent me with the medicine for his sister.”

“I’m his sister, but a lot of good the medicine will do me now. I’m all over my pain. What kept you so long?”

“I lost my way in the dark.”

“Humph! That’s a fine excuse! More likely you stopped to play. You ought to have been here at nine o’clock.”

“I did not start until nearly seven, madam.”

“I don’t care. You were so long that my pain all went away, and I don’t need the medicine now. You’re a lazy boy, Dan Hardy, my brother says so! The idea of keeping me suffering while you loitered on the road.”

“I did not loiter. I came as quickly as I could.”

“I don’t believe you. You did it on purpose to make me suffer. I’ll tell my brother.”

By this time the man had opened the door, and had taken the bottle of medicine.

“Mr. Savage says it will be a dollar and a half,” said Dan, giving the message he had been instructed to deliver.

“What’s that?” inquired Mrs. Randall from above. “A dollar and a half? You can take it right back again. I’ll never pay a dollar and a half for medicine, and you can tell my brother so. Give it right back to him, Sam, I insist on it.”

The man did not seem to know what to do, but stood holding the medicine in one hand, and a lamp in the other.

“Do you hear?” called Mrs. Randall. “Give it right back to that lazy boy, and he can take it home again. If it had come in time to do me any good I’d a kept it, but I won’t now. My pain is all gone, and I’m not going to waste a dollar and a half. Give it back, I say.”

Looking rather ashamed the man handed the bottle to Dan.

“I guess you’ll have to take it back,” he said.

CHAPTER IX

THE MYSTERIOUS MEN

There was no help for it. Dan could not make the Randalls take the medicine, though he knew when he got back to Mr. Savage, he would be blamed for their failure to keep it.

“Come on in and shut the door, Sam,” commanded the woman sharply. “First thing you know you’ll catch cold, and then you’ll have to take medicine.”

“If he does I hope I don’t have to bring it,” thought Dan.

“Good night,” said Mr. Randall, as he closed the door and locked it. Overhead Mrs. Randall slammed down the window.

“And they didn’t even say thank you,” mused Dan as he put the bottle into his pocket and mounted Bess.

Certainly it was a very mean way to treat him, for he had done his best. He was glad that Mrs. Randall had recovered, but he knew Mr. Savage would think less of that than the fact that he had spent a dollar for medicine, and was not likely to get it back.

“Maybe he’ll keep the bottle for himself,” thought Dan. “In that case he’ll not be so angry at me for bringing it back.” But this was a remote possibility.

The only consolation the boy had was that he had done his best. That he was late was no fault of his, for any one might have been deceived about the roads.

Tired and hungry, for the ride gave him a hearty appetite, Dan started back. He thought his employer’s sister might at least have asked him to come in after his long journey, and have given him something to eat. But Mrs. Randall was not that kind of a person.

Her husband was a goodhearted man, but his wife ruled him, and he was somewhat afraid of her.

“Well, I don’t believe I’ll get lost going back, anyhow,” thought the boy. “That’s one comfort.”

Dan rode on for several miles. He passed through the town of Marsden, and knew he was within about an hour’s ride of home. It was still very dark, and the sky was overcast with clouds.

“I guess I’ll get a chance to sleep before I have to do the chores,” the lad reasoned. “Maybe I’ll stay out in the barn after I put Bess up. I can snuggle down in the hay, and I’ll not disturb Mr. Savage. I’d never get any sleep if I told him his sister wouldn’t take the medicine.”

Deciding on this plan Dan called to Bess to quicken her pace. As he went around a turn in the road, the soft dust deadening the hoofbeats of the mare, the boy saw a mass of dark shadows just ahead of him.

“Looks like a wagon,” he decided. “Probably some farmer around here making an early start for market.”

Bess settled into a slow walk, for the mare was tired, and Dan did not urge her. Just then the moon, which had risen late, shone dimly through a rift in the clouds. Dan was in the shadow cast by some willow trees, and he could see quite plainly now, that there were several men and a wagon, just ahead of him.

The wagon stood in the middle of the road, which at that point, was lined with woods on either side.

[Illustration: “They’re hiding stuff,” Dan reasoned.]

“Maybe they’ve had a break-down,” said Dan to himself. “Perhaps I might be able to help them.”

He was about to ride forward when he saw one of the men take a bundle from the wagon, run across the road with it, and disappear in the woods.

“That’s funny,” thought the boy. “What can he be doing?”

A moment later another man did the same thing, carrying what seemed to be a heavy bundle. A third man remained on the wagon.

Dan saw that by keeping on one side of the road he would be in the shadow and could remain hidden until quite close to the mysterious men. Softly he urged Bess forward, the horse making no sound.

Once more he saw the men approach the wagon, take something out, and disappear in the woods with it.

“They’re hiding stuff,” Dan reasoned. “I wonder if they’re chicken thieves?”

Several henhouses had been robbed of late in the vicinity of Hayden, and Dan was on the alert.

“If it was hens they had,” he went on, “I could hear some noise made by the fowls cackling or flapping their wings. Whatever they have doesn’t make any noise.”

For perhaps the fourth time the mysterious men made a trip from the wagon into the woods. When the two returned the man on the wagon seat inquired:

“Well, got it all planted?”

“Yes, and planted good and deep,” was the reply.

“Come on then,” and the two who had carried the bundles got into the wagon, the horse being whipped to a gallop at once.

“Well, that’s queer,” decided Dan, as he saw the wagon vanish amid the shadows. “They were planting stuff, but what sort of stuff would they plant at night, and where could they plant it in the woods? There’s something queer back of this. I wish I could find out what it was, but I can’t stop now, or I’ll not get back to the farm before sunrise.”

He looked to make sure none of the men remained behind, and then he rode up to where the wagon had halted. There were visible in the dust by the dim moonlight, the marks of many feet, and the bushes on one side of the road were trampled down, showing where the men had entered the woods.

A gleam of something shining in the dust attracted Dan’s attention. He got off the mare and picked it up.

“A silver spoon!” he exclaimed as he examined it. “Perhaps those fellows were silverware peddlers and were hiding their stock in trade until they could come for it. Well, it doesn’t concern me, and I guess I’d better be going. I’ll keep this spoon for a souvenir.”

He thrust it into his pocket, jumped on the back of Bess, and was off down the road. He kept a lookout for the wagon containing the men, but it seemed to have turned off down a side road.

Dan reached the farm just as it was getting dawn. He put the mare in her stall, after watering and feeding her, and then crawled up into the hay-mow, hoping to get a little sleep before it would be time to start on his day’s work.

He must have slept quite soundly, for the sun was shining into the barn when he was awakened by hearing Mr. Savage exclaim:

“Wa’al, I wonder what keeps that lazy boy?” Then the farmer caught sight of the mare in her stall. “Why, he’s back! I wonder where’s he hidin’? Loafin’ again, that’s what he is!”

“Here I am, Mr. Savage,” spoke Dan, sliding down from the pile of hay.

“What in th’ name a’ Tunket be ye doin’ thar?”

“I did not want to disturb you by coming into the house, so I slept out here.”

“Thought ye’d git a longer chance t’ lay abed, I s’pose. Did ye git th’ dollar an’ a half?”

Mr. Savage thought more of the money than he did of the condition of his sister, as was evidenced by his question.

“No, I didn’t get it,” answered Dan.

“Ye don’t mean t’ say ye lost it? Lost a dollar an’ a half! If ye have—”

“They wouldn’t give it to me. She did not need the medicine, as she was better.”

“Didn’t want th’ medicine, after I bought it fer her? Why, good land a’ Tunket! She has t’ take it!”

“She said it came too late.”

“Then it’s your fault, Dan Hardy! Ye loitered on th’ road, until my sister got better. Now I’m out a dollar an’ a half through you! I’ll make ye pay—”

Then Mr. Savage seemed to remember that Dan never had any money, and could not pay for the medicine.

“I’ll—I’ll get it outer ye somehow,” he threatened. “Th’ idea of wastin’ time, an’ makin’ me throw away a dollar an’ a half!”

“Can’t you keep the medicine until you or Mrs. Savage need it?” asked Dan, trying to see a way out of the difficulty.

“Keep it? Why I might have it in th’ house six months an’ never need it. I ain’t never sick, an’ there is my money all tied up in a bottle of medicine. It’s a shame, that’s what ’tis! Wimmin folks hadn’t oughter git sick! Now ye’d better git t’ work. Them onions need weedin’ agin.”

“I haven’t had my breakfast, Mr. Savage.”

“Wa’al, ye don’t deserve any, but I s’pose ye got t’ have it. Go int’ th’ house, but hustle.”

The boy, who resented this way of being talked to, started for the kitchen, where he knew he was likely to meet sour words and angry looks from Mrs. Savage.

“I s’pose I’d better put this bottle away where it won’t git busted,” murmured Mr. Savage, as he followed Dan. “I’m goin’ t’ write t’ Lucy, an’ see if I can’t make her pay me fer half of it anyhow.”

Dan thought his employer would have hard work making Mrs. Randall pay.
