![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/81a461e78ed60969a4d5d9988f5ac382.jpeg)
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/d422d89c2b5e0b072568afb16c5d55e9.jpeg)
This research is part of the 'LINC project'. This project is co-financed by the European Regional and Development Fund through the Urban Innovative Actions Initiative. The LINC project consists of a larger consortium led by Gate21. The consortium includes Municipality of Albertslund, Municipality of Gladsaxe, Nobina Danmark A/S, IBM Danmark ApS, Roskilde University (RUC) and The Technical University of Denmark (DTU). The project is funded by project partners and the EU programme Urban Innovative Actions (UIA), which is supporting the project with 25 million DKK.
Valentino Servizi, Department of Technology, Management and Economics, Technical University of Denmark (DTU), valse@dtu.dk
Dan R. Persson, Department of Applied Mathematics and Computer Science, DTU
Per Bækgaard, Department of Applied Mathematics and Computer Science, DTU
Francisco C. Pereira, Department of Technology, Management and Economics, DTU
Jeppe Rich, Department of Technology, Management and Economics, DTU
Hannah Villadsen, Department of People and Technology, Roskilde University, Denmark
Otto A. Nielsen, Department of Technology, Management and Economics, DTU
February 20, 2022
ABSTRACT
Intelligent Transportation Systems (ITS) underpin the concept of Mobility as a Service (MaaS), which requires universal and seamless users' access across multiple public and private transportation systems while allowing operators' proportional revenue sharing. Current user sensing technologies such as Walk-in/Walk-out (WIWO) and Check-in/Check-out (CICO) have limited scalability for large-scale deployments. These limitations prevent ITS from supporting analysis, optimization, calculation of revenue sharing, and control of MaaS comfort, safety, and efficiency. We focus on the concept of implicit Be-in/Be-out (BIBO) smartphone-sensing and classification.
To close the gap and enhance smartphones towards MaaS, we developed a proprietary smartphone-sensing platform collecting contemporaneous Bluetooth Low Energy (BLE) signals from BLE devices installed on buses and Global Positioning System (GPS) locations of both buses and smartphones. To enable the training of a model based on GPS features against the BLE pseudo-label, we propose the Cause-Effect Multitask Wasserstein Autoencoder (CEMWA). CEMWA combines and extends several frameworks around Wasserstein autoencoders and neural networks. As a dimensionality reduction tool, CEMWA obtains an auto-validated representation of a latent space describing users' smartphones within the transport system. This representation allows BIBO clustering via DBSCAN. We perform an ablation study of CEMWA's alternative architectures and benchmark against the best available supervised methods. We analyze the sensitivity of performance to label quality. Under the naïve assumption of accurate ground truth, XGBoost outperforms CEMWA. Although XGBoost and Random Forest prove to be tolerant to label noise, CEMWA is agnostic to label noise by design and provides the best performance with an 88% F1 score.
Keywords: Device-to-device · Sensor-to-sensor · Ground-truth-validation · Wasserstein-auto-encoders · Autonomous vehicles
1 Introduction
Tracking passenger movements through the public transport network, seamlessly and without direct human interaction, requires accurate models and methods to discriminate between passengers that are using the public transport network and anyone else outside the transport network. While the accurate solution of such an implicit Be-In/Be-Out (BIBO) classification problem [Narzt et al., 2015] is directly relevant as a means to collect important data from the public transport system, e.g., Check-in/Check-out or Walk-in/Walk-out statistics, it is relevant for other areas as well. These include, for example, tracking persons entering buildings to comply with safety measures, and registering and tracking people in supermarkets to support crew management in different parts of the supermarket. However, tracking public transport users represents a more complex problem in that buses and passengers move in space and time. As a result, we will argue that the ability to provide robust solutions for public transport applications is a stepping-stone for these other relevant applications.
Solving the aforementioned classification problem is important for several reasons. Firstly, on the very practical side, it provides a means to collect valuable data about passenger flows that would otherwise have been lost for users paying by cash or accidentally traveling without checking in. Secondly, it would enable context-aware surveying and services while lifting the burden of explicit interaction from passengers. Thirdly, for planning optimal departure times and routes of a trip through the public network, it would support personalized dynamic recommendations.
In a wider perspective, the presented methodology can be seen as an important component in Mobility-as-a-Service (MaaS) systems. MaaS combines multiple transport modes as transport services (e.g., car, bus, bike, scooter) offered through a single interface and paid with the same unique subscription, like media content on "Netflix" [Hietanen, 2014, Hensher and Mulley, 2021]. Hence, MaaS is essentially "a data-driven, user-centered paradigm, powered by the growth of smartphones" [Goodall et al., 2017]. Regardless of the perspective, the ultimate goal of MaaS is to enable a door-to-door public service that is attractive for passengers and competitive with, e.g., privately owned cars. In this context, the ability to accurately track passengers while traveling would underpin efficient capacity planning for a dynamic, responsive, and intelligent public transport paradigm.
In the MaaS context, smartphone-based automatic fare collection systems (AFCS) with BIBO could allow the integration of public service ticketing, automatic price calculation, and a fair cost split across multiple operators. The latter point includes emerging providers of, e.g., car- and bike-sharing services. Compared to CICO and WIWO, BIBO offers at least two advantages: (i) increased comfort of public transit for passengers [Wirtz and Klähr, 2019]; and (ii) operational integration that is mostly software, with a negligible impact on new physical infrastructure. The second point means potentially lower access barriers to MaaS for emerging transport service providers. For the first, we refer to the passengers' increased comfort with the term ticketless. Ticketless identifies the perspective of a system able to flexibly adapt the transport service bill to the user's journey(s) across multiple service providers, as opposed to the perspective of multiple tickets from multiple service providers being necessary for the same journey.
From the Big Data perspective, handling this binary classification problem with supervised machine learning methods presents the following challenges:
1. Controlling noise in the labels;
2. Operating at a sustainable label collection cost;
3. Minimizing the impact of sensors and data collection on the battery; and
4. Minimizing the users' privacy exposure.
These challenges involve the service operator's perspective in the first case and the smartphone user's perspective in the others.
Although from a ticketing perspective there should be no noise, since one should only be charged when he or she uses a transport service, when using tickets as labels to train machine learning algorithms, the assumption of possible undetected ticketing errors from both sides (passenger and service provider) seems more than reasonable.
Mining transport behavior from smartphone data relies, among other sensors, on Global Positioning System (GPS), Inertial Navigation System (INS), and Bluetooth Low Energy (BLE) signals [Servizi et al., 2021a]. In urban areas, where 80% of public transport demand occurs [Baescu and Christiansen, 2020] (e.g., in Denmark), the classification of sensors' observations is complex. With GPS, any transportation mode looks the same due to a combination of factors, such as GPS errors in urban canyons, proximity between pedestrians and buses, and vehicles' low speeds in congested traffic [Cui and Ge, 2003]. With INS, multiple habits, each corresponding to how one carries a smartphone, e.g., in the pocket or the bag, determine different sensor patterns [Wang et al., 2019]; in addition, the integral of any noise included in the sensors' signal often leads to unmanageable error drifts [Foxlin, 1996]. The BLE signal, which is extensively studied for indoor tracking, presents an excellent potential for proximity sensing and battery efficiency [Bjerre-Nielsen et al., 2020]. However, smartphones' signal records of BLE devices in proximity suffer from signal gaps [Malmberg, 2014]; a higher spatial density of BLE devices allows good indoor-tracking performance, but such a density is not scalable at a city scale. In contrast, the scaling potential of GPS and INS comes with a heavy impact on the smartphones' battery [Servizi et al., 2021a]. In the first case, the sensor is directly responsible for the energy consumption. In the second case, the sensors' energy consumption is sustainable as long as the signals are classified online within the smartphone. Yet, due to the high sampling rate necessary for achieving acceptable classification performance, > 20 Hz, data consumption outside the smartphone would imply high network energy consumption for data transfer [Servizi et al., 2021a]. Even under the assumption of training a supervised machine learning algorithm with high-quality labels, BIBO binary classification in the urban context seems a difficult task. When labels' quality degrades, we face another limitation, as the measured performance of classifiers can be highly biased; consequently, decisions would be based on scores looking high when they are low in reality and vice versa [Servizi et al., 2021b].
To overcome the limitations mentioned above, in this work, we rely on a unique dataset collected during three months of autonomous buses' operations across a local public network in Denmark. The dataset includes the GPS and BLE trajectories collected from buses and passengers' smartphones through a proprietary smartphone-sensing platform, including 300 BLE devices installed in buildings near the bus network, in the buses, and at bus stops. Another set of the data provides high-quality ground truth collected by users that followed precise instructions on individual sequences of origins and destinations within the bus network, along specific routes [Shankari et al., 2020].
1.1 Literature Review
The solution we propose for the BIBO classification problem involves the implicit interaction of passenger smartphones, buses, and bus network [Servizi et al., 2021b, Narzt et al., 2015]. Therefore, it falls within the intersection of several disciplines converging around smartphone-based travel surveys and smartphone indoor tracking with BLE network interaction. In the first case, leveraging smartphone onboard sensors, we are interested in the limitations of the methods for mode detection in general and bus detection in particular [Wirtz and Klähr, 2019]; in the second case, we are interested in how to deal with BLE signals [Servizi et al., 2021b].
The literature on mode detection from smartphone data is extensive. GPS and INS sensors are the most used, also to provide location- and person-agnostic mode classification. GPS and INS systems generate very different trajectories. The first system provides a geospatial time series with a sampling rate ≥ 1 Hz [Servizi et al., 2020, Dabiri and Heaslip, 2018]; the second system, a three-dimensional time series along the three axes of the smartphone's reference frame, with a sampling rate ≥ 20 Hz [Servizi et al., 2021a, Cornacchia et al., 2017]. To prepare the data for the classification, the steps one follows to clean and segment these trajectories differ too. However, the best-performing classification methods consist of two main groups. The first group includes supervised methods, such as decision trees, random forest, and XGBoost [Koushik et al., 2020]; the second group has various configurations of artificial neural networks (ANN), both supervised and semi-supervised. Unsupervised methods based on clustering are applied directly to features extracted from GPS and INS, but their performance seems below the supervised and semi-supervised methods mentioned above. The blooming literature on both GPS- and INS-based mode detection proposes very effective methodologies, equally accurate when datasets include urban and outskirt areas and multiple transportation targets [Servizi et al., 2021a]. However, at low speeds, state-of-the-art INS-based online classifiers available on the leading smartphone operating systems seem unable to discriminate between bus and walk mode. In contrast, GPS and BLE classifiers show higher performance [Servizi et al., 2021b].
Among the studies focusing on mode detection and public transportation, specifically buses, the most promising consider the interaction between users and the transport network. This interaction could be expressed as the time series of the distances between each point of a smartphone's GPS trajectory and each point of interest (PoI) extracted from the infrastructure mapped on GIS [Semanjski et al., 2017]. The classification could be point-based, thus relying on short segments. Another approach, which we define as segment-based [Servizi et al., 2021a], could look at longer trip segments and the periodicity of stops typical of any bus operation [Zhang et al., 2011]. However, while the first approach suffers from GPS error in dense urban areas, the second approach seems ineffective for short trips.
Literature focusing on BLE and WiFi signals (both based on the same communication frequency, with protocols sharing some similarities) converges between indoor tracking and mode detection. The traditional methodologies leverage the Friis equation and trilateration [Kotanen et al., 2003, Subhan et al., 2011]. However, machine learning methods such as random forests and Gaussian processes are effective in BLE or WiFi fingerprint classification and spatial signal mapping [Chen et al., 2015, Subhan et al., 2013, Pérez Iglesias et al., 2012]. On how to allow optimal BIBO sensing and classification with BLE devices, we find no clear contributions on the minimum spatial density of BLE devices, nor on how to cover the scale of a city [Servizi et al., 2021b]. Therefore, we rely on literature about indoor tracking [Yassin et al., 2017] and preliminary BIBO experiments with BLE signals [Servizi et al., 2021b], suggesting that BLE devices installed in buses and bus stops could offer a coverage sufficient for classification. Consequently, such a configuration would have the potential to cover the entire city at a reasonable cost.
The parallel growth of computation power and data volume kept in check the tradeoff between computational capacity and classification performance. On the one hand, Central Processing Units (CPU) and Graphics Processing Units (GPU) have created sizeable extra computation potential. On the other hand, the pursuit of better accuracy leveraging, for example, the pervasive introduction of cheap sensors and rich Geographic Information Systems (GIS), immediately absorbed this additional capacity. Overall, transportation mode classifiers deployed on data from urban and densely populated areas did not increase their performance proportionally with the data consumption. Therefore, statistical methods developed before the Big Data paradigm [Schuessler and Axhausen, 2009], and machine learning methods developed after [Koushik et al., 2020], may still compete. A factor emerging from the literature is that methods still depend heavily on labels. Even though some semi-supervised configurations of artificial neural networks exist in this field and reduce the need for labels in the classifier's training phase, filtering a subset of high-quality labels from Big Data sets is still very challenging and hardly scalable. For example, continuous disruptions of transport operations due to road work or special events would also disrupt any classifier trained with labels that no longer reflect the transport network [Petersen et al., 2021]. Even under the assumption of operations stability, the impact of flipping and overlaying labels, potentially present due to human collection errors, seems still critical. Supervised classifiers deployed on time series, e.g., for the BIBO task, could deliver biased classifications and threaten the system's sustainability at scale.
The problem deserves more attention in this field, and for time series requires at least the same attention granted to independent and identically distributed data. Systematic studies and appropriate methodologies exist in the second case, such as for image classification. However, for time series classification these contributions are only partially applicable. Furthermore, existing preliminary studies about the impact of flipping labels on time series classification show that severe bias on the measurements of these classifiers' performance is present when just 10% of the labels are wrong. In such a case, although the classifiers might be resilient to label noise, analysts and practitioners would base their decisions on a biased performance evaluation, simply because the error rate in human-validated labels is unknown [Servizi et al., 2021b].
1.2 Contribution of the Paper
This paper focuses on the combined use of GPS and BLE signals for unsupervised auto-validated BIBO classification of bus passengers. Representing the user via the smartphone and the bus via a BLE device, we use sensor signals as pseudo-labels to learn to discriminate when a user is inside (BI) or outside (BO) the bus.
The central intuition is that when the user is inside the bus (BI), the distance between smartphone and bus should be close to zero, and the proximity to BLE devices installed in the bus would cause the highest signal strength. Vice versa, when the user is outside the bus (BO), the considerable distance between the user and the BLE device should cause the lowest signal strength or no signal at all.
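As a rough illustration of this intuition, the sketch below uses the log-distance path-loss model, a common textbook approximation that is not part of the method proposed in this paper. The function name `expected_rssi`, the reference power of -59 dBm at 1 m, and the path-loss exponent of 2.0 are illustrative assumptions, not values measured in this study.

```python
import math


def expected_rssi(distance_m, rssi_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Approximate RSSI (dBm) at a given distance with the log-distance model.

    rssi_at_1m_dbm and path_loss_exponent are assumed calibration values;
    real BLE beacons and environments need their own calibration.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return rssi_at_1m_dbm - 10.0 * path_loss_exponent * math.log10(distance_m)


# A passenger next to the on-board beacon versus a pedestrian 30 m away.
rssi_inside = expected_rssi(1.0)
rssi_outside = expected_rssi(30.0)
print(f"inside: {rssi_inside:.1f} dBm, outside: {rssi_outside:.1f} dBm")
```

Under this simplified model the signal weakens monotonically with distance, which is the qualitative behavior the cause-effect formulation relies on; real RSSI traces are noisier and contain gaps.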
To learn the cause-effect relationship between smartphone-bus proximity and BLE signal strength, we implement two parallel Wasserstein Autoencoders (WAE). One learns how to reconstruct the time series of the BLE signal (effect) given the smartphone-bus proximity (cause). Given the BLE signal strength (effect), the other learns to rebuild the smartphone-bus distance (cause). We define this configuration as a cause-effect multi-task Wasserstein Auto-encoder (CEMWA). From the unsupervised training of this CEMWA, we learn to reduce the description of the interaction between passengers and buses to only four dimensions. In this 4-dimensional latent space, the observations self-organize such that discrimination between BI and BO classes is possible through unsupervised clustering with Density-based spatial clustering of applications with noise (DBSCAN).
CEMWA combines and extends the following frameworks: (i) Split-brain Auto-encoder configuration by Zhang et al. [2016]; (ii) Deep clustering for unsupervised learning [Caron et al., 2018]; (iii) Multi-task formulation of the objective function by Kendall et al. [2018]; (iv) Maximum Mean Discrepancy (MMD) formulation of the objective function for generative models by Gretton et al. [2008]; and (v) MMD extension to Wasserstein Auto-encoders by Tolstikhin et al. [2017].
The resulting architecture solves the scalability problem related to noise in labels. We perform an ablation study including traditional WAE architectures and supervised methods. Results show that our unsupervised classifier overcomes the negative impact of the label-induced bias affecting supervised classifiers. Moreover, the architecture we propose embodies a solution for signal data imputation, which is generally a critical and separate step necessary to perform good classification. Finally, since the method relies only on the interaction between smartphone and bus, temporary or permanent disruptions of the network would not affect the classification task.
Figure 1: Cause-effect Multi-task Wasserstein Auto-encoder (CEMWA): independent cross-reconstruction of $X_1$, $X_2$ minimizing (7), and clustering of the resulting latent space; 5028 parameters. [Diagram blocks: inputs $X_1$ and $X_2$, 1D Conv, Dense, pseudo-label, backpropagation, clustering.]
Figure 2: Multi-task Wasserstein Auto-encoder (MWA): independent reconstruction of $(X_1, X_2)$ minimizing (3), with $c = L_{WAE}$, and clustering of the resulting latent space; 5028 parameters.
Figure 3: Wasserstein Auto-encoder (WA): reconstruction of $X = (X_1, X_2)$ minimizing (1), and clustering of the resulting latent space; 4932 parameters.
2 Methods and Materials
This section presents a number of frameworks supporting our goal of substituting ordinary labels for training supervised or semi-supervised artificial neural networks specialized in processing GPS signal. There are three main steps behind the intuition. Firstly, instead of labels we leverage an independent sensor time series (BLE) for representation learning of the cause-effect relationship between GPS and BLE. Secondly, to avoid confounding correlations between the two sensors' signals, we design and fine-tune a specific encoder-decoder architecture based on a general formulation of regularized auto-encoders. Lastly, with DBSCAN, we turn into classes the representations learned via the independent sensor time series (GPS and BLE).
Following the notation of Tolstikhin et al. [2017], we identify sets with calligraphic letters (i.e., $\mathcal{X}$), random variables with capital letters (i.e., $X$), and values with lowercase letters (i.e., $x$).
Let $X \in \mathbb{R}^{t \times d}$ be the tensor describing the smartphone/bus interaction in a time window of $t$ observations, expressed through $d$ independent feature channels such that: $X_1 \in \mathbb{R}^{t \times d_1}$ represents the channels deriving from the GPS sensors; $X_2 \in \mathbb{R}^{t \times d_2}$, from the BLE devices network; where $(X_1, X_2) = X$ and $\mathcal{D}_1 \cup \mathcal{D}_2 \subseteq \mathcal{D}$, with $|\mathcal{D}| = d$. We would like to learn a representation for $X$ solving the prediction problem $\hat{X} = (\hat{X}_1, \hat{X}_2)$, where $\hat{X}_1 = F_1(X_2)$ and $\hat{X}_2 = F_2(X_1)$. $F_1$ learns the cause-effect relationship between smartphone-bus proximity and BLE signal strength, while $F_2$ learns the inverse cause-effect relationship of the same interaction between smartphone and bus.
$\mathcal{F}$ represents a class of non-random generative encoder/decoder models deterministically mapping input points to the latent space with a convolutional neural network (CNN) via the encoder, and latent codes to output points with a transpose CNN via the decoder. To learn $\mathcal{F}$, we minimize the Wasserstein optimal transport cost (1) between the true but unknown data distribution $P_X$ and the latent variable model $P_G$ specified by the prior distribution $P_Z$ of latent codes $Z \in \mathcal{Z}$ and the generative model $P_G(X|Z)$ of the data points $X \in \mathcal{X}$ given $Z$ [Tolstikhin et al., 2017]. Eq. (1) shows that while the decoder pursues the reconstruction of the encoded training examples at the minimal cost $c$, the encoder pursues two conflicting goals at the same time: (i) match the encoded distribution $Q_Z$ to the prior distribution $P_Z$, where $Q_Z := \mathbb{E}_{P_X}[Q(Z|X)]$; (ii) ensure that the latent representation for the decoder allows accurate reconstruction of the encoded training examples.
In this two-step procedure, first $Z$ is sampled from a fixed distribution $P_Z$ on a latent space $\mathcal{Z}$, and then $Z$ is mapped to $X = G(Z)$ for a given map $G : \mathcal{Z} \to \mathcal{X}$, where $X \in \mathcal{X} = \mathbb{R}^{t \times d}$.
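For reference, the penalized optimal transport objective of Tolstikhin et al. [2017], which is the form the cost (1) builds on, can be written as below. The set of probabilistic encoders $\mathcal{Q}$ and the regularization coefficient $\lambda > 0$ follow the notation of that work and are not otherwise introduced in this section.

```latex
D_{\mathrm{WAE}}(P_X, P_G) :=
  \inf_{Q(Z|X) \in \mathcal{Q}}
  \mathbb{E}_{P_X} \mathbb{E}_{Q(Z|X)} \big[ c\big(X, G(Z)\big) \big]
  + \lambda \cdot D_Z(Q_Z, P_Z)
```

The first term is the reconstruction cost pursued by the decoder; the second term penalizes the mismatch between the encoded distribution $Q_Z$ and the prior $P_Z$.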
This task formulation extends the Split-brain Autoencoder proposed by Zhang et al. [2016]. We share the intuition, and the goal of achieving a representation containing high-level abstraction and semantics of the smartphone-bus interaction registered independently by GPS and BLE sensors. In contrast with Zhang, we aim at learning the cause-effect function and its inverse, separately, and not merely as a "pretext". However, to keep up with the Big Data scale, Zhang's approach brings some limitations with the objective function in Eq. (2): (i) For weighting the multi-task cost $O$, Zhang introduces the hyperparameter $\hat{\lambda}$ that requires a dedicated optimization process. (ii) To learn the cause-effect relationship and its inverse, we do not want to include the full signal $c((F_1(X_2), F_2(X_1)), X)$ in the multi-task objective function $O$. (iii) The use of a classical unregularized auto-encoder, which minimizes only the reconstruction cost $c$ between $X$ and $\hat{X}$, prevents taking full advantage of representation learning for this problem, facilitating model over-fitting instead of generalization power.
In the following sections, we look at how we extended Zhang's work to address the aforementioned limitations and enable clustering.
2.1 Extension Towards Multi-task Self-learned Cost Weights
In a multi-task setting, Kendall shows that when a task's uncertainty depends on its unit of measure, homoscedastic uncertainty is an effective basis for weighting multiple losses [Kendall et al., 2018]. This fits exactly with our problem, where the proximity between smartphone and bus is measured in meters on one hand, and in Received Signal Strength Indicator (RSSI) on the other hand. Eq. (3) represents the multi-task loss formulation for our problem, according to Kendall. The main difference between (2) and (3) is that in the second case the two weighting parameters can be "learned" by leveraging the ANN backpropagation algorithm while learning the parameters of $\mathcal{F}$ during the training phase. When training on large datasets, this is an advantage.
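For two regression tasks, the homoscedastic-uncertainty weighting of Kendall et al. [2018] takes the form below. This restates their general result and is not necessarily the exact notation of (3); the symbols $\sigma_1$, $\sigma_2$ (learnable noise parameters), $W$ (network weights), and $\mathcal{L}_1$, $\mathcal{L}_2$ (per-task losses) are the ones used in that work.

```latex
\mathcal{L}(W, \sigma_1, \sigma_2) =
  \frac{1}{2\sigma_1^2}\,\mathcal{L}_1(W)
  + \frac{1}{2\sigma_2^2}\,\mathcal{L}_2(W)
  + \log \sigma_1 + \log \sigma_2
```

A large $\sigma_i$ down-weights the loss of task $i$, while the $\log \sigma_i$ terms prevent the trivial solution of letting both noise parameters grow without bound.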
2.2 Extension Towards Regularized Auto-encoder
WAE represent a class of generative models resting on the optimal transport cost derived from Villani [2003] and expressed in (1). This class underpins our extension: in contrast to Zhang's work [Zhang et al., 2016], which studies unregularized costs $c$, such as regression and cross-entropy, we add to the regression cost a regularization term, i.e., the maximum mean discrepancy (MMD), $D_Z = \mathrm{MMD}_k(P_Z, Q_Z)$. Eq. (4) expresses the MMD, where $k : \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$ is a positive-definite reproducing kernel, and $\mathcal{H}_k$ is the reproducing kernel Hilbert space (RKHS) of real-valued functions mapping $\mathcal{Z}$ to $\mathbb{R}$ [Gretton et al., 2008].
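For reference, the standard definition of the MMD between the prior and the encoded distribution, as used for WAE by Tolstikhin et al. [2017], can be restated in the notation above as:

```latex
\mathrm{MMD}_k(P_Z, Q_Z) =
  \left\| \int_{\mathcal{Z}} k(z, \cdot)\, dP_Z(z)
        - \int_{\mathcal{Z}} k(z, \cdot)\, dQ_Z(z) \right\|_{\mathcal{H}_k}
```

Each integral is the kernel mean embedding of a distribution in $\mathcal{H}_k$, so the MMD is the RKHS distance between the two embeddings.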
Similarly to variational auto-encoders (VAE) [Kingma and Welling, 2013], this WAE-MMD formulation uses artificial neural networks (ANN) to parametrize encoder and decoder. However, to allow back-propagation throughout decoder and encoder, the re-parametrization trick [Kingma and Welling, 2013] "forces $Q(Z|X = x)$ to match $P_Z$ for all the different samples $x$ drawn from $P_X$. In contrast, WAE forces the continuous mixture $Q_Z := \int Q(Z|X)\,dP_X$ to match $P_Z$" [Tolstikhin et al., 2017]. Consequently, WAE allow a better organization of the latent space, which we leverage for clustering. Alternative formulations of the penalty term, such as the Generative Adversarial Networks (GAN) [Makhzani et al., 2015], or in general the WAE-GAN [Tolstikhin et al., 2017], where $D_Z$ in (1) is the Jensen-Shannon divergence, show slightly better reconstruction performance for $\hat{X}$ in the literature, but at the heavy cost of an additional network and possibly complex and multi-modal distributions for $P_Z$. Since our problem is simple in principle, we opt for simplicity, thus for MMD.
If $k$ is characteristic¹, MMD represents a divergence measure [Sriperumbudur et al., 2011].
We try both the alternative kernels $k$ proposed for Wasserstein auto-encoders (WAE) [Tolstikhin et al., 2017]: the radial basis function (RBF) kernel (5) and the inverse multiquadratics kernel (6).
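The following pure-Python sketch shows how a sample-based MMD penalty with either kernel could be computed; it is an illustration, not the implementation used in this work. It uses the unbiased estimator form given in Tolstikhin et al. [2017]; the kernel parameterizations (`sigma2` for the RBF bandwidth, `c` for the inverse multiquadratics scale), the function names, and the toy 4-dimensional samples are assumptions made for the example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rbf_kernel(a, b, sigma2=1.0):
    """RBF kernel; sigma2 is an assumed bandwidth parameter."""
    return math.exp(-sq_dist(a, b) / (2.0 * sigma2))


def imq_kernel(a, b, c=1.0):
    """Inverse multiquadratics kernel; c is an assumed scale parameter."""
    return c / (c + sq_dist(a, b))


def mmd2(xs, ys, kernel):
    """Unbiased estimate of the squared MMD between two equally sized samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size n >= 2")
    within_x = sum(kernel(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    within_y = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    cross = sum(kernel(x, y) for x in xs for y in ys)
    return (within_x + within_y) / (n * (n - 1)) - 2.0 * cross / (n * n)


# Toy check in a 4-dimensional latent space: codes that match the prior
# should yield a smaller penalty than codes shifted away from it.
rng = random.Random(0)
prior = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
matched = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
shifted = [[rng.gauss(3.0, 1.0) for _ in range(4)] for _ in range(100)]
for name, kernel in (("RBF", rbf_kernel), ("IMQ", imq_kernel)):
    print(name, mmd2(prior, matched, kernel), mmd2(prior, shifted, kernel))
```

The double loops make the estimator quadratic in the batch size, which is acceptable for a sketch; a tensor library would vectorize the pairwise distances.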
The resulting architecture consists of two independent encoder/decoder maps $F_1, F_2 \in \mathcal{F}$ such that $\hat{X}_1 = F_1(X_2)$ and $\hat{X}_2 = F_2(X_1)$. Each map's encoder consists of 1D convolutions, and each decoder of 1D transpose convolutions. As described in Fig. 1, maps are learned using back-propagation to minimize the multi-task formulation of our objective function (7), where we set $c = \|X - \hat{X}\|_2^2$ and $D_Z = \mathrm{MMD}_k$. To find optimal relative weights between tasks, we leverage the same back-propagation algorithm.
2.3 Extension of Deep Clustering Architecture
To allow unsupervised classification of images, Caron et al. propose a plain ANN predicting cluster assignments as pseudo-labels [Caron et al., 2018], and iterate between clustering with k-means [Likas et al., 2003] and back-propagation to update the network's weights after the cluster assignment. The intuition is that clustering provides an alternative and meaningful reference to labels. Therefore, the loss function is computed against clusters instead of known labels. However, since we collect two independent measures of the same event, by design, we tweak the process using these two signals as reciprocal pseudo-labels instead. When back-propagation converges, we perform clustering of the data representation in the latent space with DBSCAN [Khan et al., 2014]. Fig. 1, 2 and 3 show the architectures tested within our ablation study: the first leverages the known cause-effect relationship between GPS and BLE signal; the second, the multi-task independent reconstruction of the two signals; the last shares parameters within the same network to reconstruct a tensor where multiple channels contain each available signal.
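The final clustering step can be sketched with a minimal pure-Python DBSCAN, shown below for illustration only; a library implementation would normally be preferred. The neighbor search is a brute-force scan, and the values of `eps` and `min_samples`, as well as the toy 4-dimensional latent codes, are assumptions chosen for the example rather than the settings used in the experiments.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster id per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing
    return labels


# Toy latent codes: two dense groups and one isolated point.
group_a = [(0.1 * k, 0.0, 0.0, 0.0) for k in range(6)]
group_b = [(5.0 + 0.1 * k, 5.0, 5.0, 5.0) for k in range(6)]
latent_codes = group_a + group_b + [(20.0, 20.0, 20.0, 20.0)]
cluster_ids = dbscan(latent_codes, eps=0.35, min_samples=3)
print(cluster_ids)
```

In the setting described above, the resulting cluster ids would then be interpreted as BI or BO groups; how clusters map to classes is outside the scope of this sketch.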
2.4 Final Model Formulation
Fig. 1 presents the final structure of our CEMWA model, resulting from the Split-brain architecture extensions described in Sec. 2.1, 2.2 and 2.3.
¹ Given $k : \mathcal{Z}^{+} \to \mathbb{R}$, $k$ is injective; $\mathcal{Z}^{+}$ is positive and represents the set of probability measures on $\mathcal{Z}^{+}$.
We will argue as follows: (i) CEMWA has the ability to learn the cause-effect relationship between GPS and BLE signals recording smartphone-bus interactions. (ii) Learning such a relationship allows the exposure of self-validated features characterizing the BIBO status of users with respect to buses. (iii) These self-validated features allow unsupervised classification of users' trajectories, where smartphones identify users and BLE devices identify buses. (iv) Alternative unsupervised architectures leveraging the correlation instead of cause/effect between the GPS and BLE signals, such as those described in Fig. 2 and 3, are unable to perform self-validated unsupervised BIBO classification. (v) In case of label noise, CEMWA significantly outperforms the most accurate supervised classifiers, such as random forest or XGBoost (extreme gradient boosting). (vi) Regardless of the classification performance, CEMWA embodies both a data imputation and a validation mechanism, while supervised classifiers or alternative unsupervised architectures should rely on dedicated processes, such as an exponential weighted moving average for BLE or GPS imputation [Osman et al., 2018], and user validation for BIBO labels [Servizi et al., 2021b,a].
To substantiate our hypotheses through the following experiments, we consistently designed and deployed a specific sensing architecture and collected high-quality ground truth.
2.4.1 Ground truth collection, data cleansing, and preparation
CEMWA's architecture mirrors the smartphone sensing platform we designed and deployed to track the activity of three autonomous buses operating an experimental public service in Denmark, between two extremes of the Lyngby campus where the Technical University of Denmark is located.
During operations these buses are tracked via GPS available from the bus telemetry, while test passengers recruited for the experiment are tracked via smartphones. The sensing platform collected GPS signals that both smartphones and buses generate. GPS collection was strictly limited to the operations area using a geo-fence [Almomani et al., 2011]. In the same area, we deployed 300 BLE devices: one on each bus and bus stop, plus one at the entrance(s) of each building on the campus.
To become a test passenger, each user provided explicit agreement to terms and conditions presented in compliance with the General Data Protection Regulation². The sensing platform supports both Android and iOS devices, and the apps are published on Google Play³ and App Store⁴ respectively. This project is a social science study, includes data and numbers only, is not a health science project, and does not include human biological material nor medical devices. Consequently, in Denmark, where the data collection took place, the Health Research Ethics Act provides a dispensation from notification to any research ethics committee.
When the smartphone is within the relevant geo-fence, in optimal conditions, the platform collects GPS with 1 s resolution. Simultaneously, with the same resolution, the platform samples the RSSI signal strength of BLE devices "visible" in the range of each smartphone.
We extracted the trajectories of both test passengers and buses between 1st April and 1st July. 134 users generated a total of 4,584,000 GPS observations; three buses, 1,162,000 GPS observations, for a total of approximately 940 h of bus operations (see Fig. 7).
From the remaining set of data we extracted the subset of observations containing at least one BLE observation, for a total of 195,000 GPS observations (see Fig. 6). This set presents the maximum BLE resolution available, while the corresponding GPS resolution is below the maximum resolution available within the dataset. No labels are available for this set. Fig. 4 depicts the speed distribution of different transportation modes present in this subset. To highlight the differences in speed between different transport modes, we applied the exponential transformation. However, the flat black color shows that the speed distribution seems to be the same in all the cases, except for some cars (see black magnified detail).
Outside the passengers' set, we generated a set of records counting 59,000 observations, which are part of a specific experiment where seven members of the project's staff collected via smartphone a high-quality set of BIBO labels and observations (see Fig. 5), following the same methodology of Shankari et al. for the MobilityNet dataset collection [Shankari et al., 2020]. Thus, to avoid bias in the labels, we provided instructions on precise origin-destination sequences, divided into three different trip-groups. Each staff member was randomly assigned to a trip-group. After watch synchronization, during the experiment, each staff member annotated the hour and minute each time s/he boarded or alighted a bus.
² Information provided to users before recruitment, accessed on 03-09-2021
³ LINC DTU at Google Play, accessed on 03-09-2021
⁴ LINC DTU at App Store, accessed on 03-09-2021
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/8859cc297c5efe36ab98e3a220184539.jpeg)
Figure 4: Subset of GPS points presenting at least one BLE device reading; the color map based on $e^{\mathrm{speed}}$ shows that buses and other modes in the area (i.e., walk and bike) have the same speed distribution; the few trajectories recorded from cars are the only exception.
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/cf06dcb9f192c764d5e010075a15f3a4.jpeg)
Figure 5: GPS points from smartphones; the color map based on spatial density shows bus stops and the bus depot.
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/8d0836312f0e0d54c890005833ff0d5b.jpeg)
Figure 6: Subset of GPS points presenting at least one BLE device reading; the spatial distribution of points shows higher density at the bus stops, the bus depot, and some buildings.
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/29d45bd7bcc10663cfe4a6a7b1995c27.jpeg)
Figure 7: GPS points from buses; the spatial distribution shows higher density at the bus stops and the bus depot.
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/90eea65ff246cbef79fc568f3106290a.jpeg)
Figure 8: Be-In (BI) clusters identified on smartphone data by clustering the CEMWA latent space with DBSCAN, colored with ground truth labels. Red depicts users inside the bus; blue, users outside the bus.
![](https://assets.isu.pub/document-structure/250117110952-4e2b9ddddfb1d05465135029b4716fc4/v1/fa958596d5aae9563470eaea5fc0c5d1.jpeg)
Figure 9: Be-Out (BO) clusters identified on smartphone data by clustering the CEMWA latent space with DBSCAN, colored with ground truth labels. Red depicts users inside the bus; blue, users outside the bus.
2.4.2 Experiment setup
Table 1 describes the experimental setup for the evaluation of the supervised baselines, for the ablation study of the unsupervised architectures, and for the model we propose in this work. We segment each trajectory by treating any pair of consecutive points separated by more than 120 s, or implying a speed (space variation over time variation) above 120 m/s, as the end of one segment and the beginning of the next. After segmentation, we apply to each segment a sliding window of 9 consecutive points with a stride of 1 step. CEMWA, MWA, and WA process the resulting tensor directly, using convolutions. In contrast, Random Forest and XGBoost require an intermediate step that extracts traditional features from each 9-step window of each segment, computed at each slide with the same stride of 1 step.

We set up the same conditions for both the baselines and the proposed method. Comparing supervised and unsupervised classifiers in this setting is constrained by the limited size of the labeled dataset. Because we want to report performance distributions rather than point estimates, we apply leave-one-out validation to the supervised methods and a hold-out method to the unsupervised methods. In the first case, we train the model on all users in the labeled observations except one, who represents the test set, and we rotate every available user through the test set. Thus, the main scores can be presented as mean ± standard deviation. In the second case, we train the model on the unlabeled observations, without performing DBSCAN clustering. We then use the model, including DBSCAN, to classify the labeled observations out of sample. Similarly, we can present the main scores as mean ± standard deviation. Consequently, we can compare these scores even though the training processes differ substantially.
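The segmentation and windowing steps described above can be sketched as follows. This is a minimal illustration and not the project's code: the function names, the `(t, lat, lon)` tuple layout, and the use of the haversine distance are assumptions. The text gives 120 m/s for the speed threshold while Table 1 lists 45 m/s, so the threshold is left as a parameter.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def segment(points, max_gap_s=120.0, max_speed_ms=120.0):
    """Split a time-ordered list of (t, lat, lon) tuples into segments.

    A new segment starts when the time gap, or the speed implied by two
    consecutive points, exceeds its threshold (both thresholds are parameters).
    """
    segments, current = [], []
    for p in points:
        if current:
            t0, lat0, lon0 = current[-1]
            dt = p[0] - t0
            dist = haversine_m(lat0, lon0, p[1], p[2])
            too_fast = dt > 0 and dist / dt > max_speed_ms
            if dt > max_gap_s or too_fast:
                segments.append(current)
                current = []
        current.append(p)
    if current:
        segments.append(current)
    return segments

def sliding_windows(seg, size=9, stride=1):
    """Return all windows of `size` consecutive points, moving by `stride`."""
    return [seg[i:i + size] for i in range(0, len(seg) - size + 1, stride)]
```

A segment shorter than the window size yields no windows, which is in line with the data cleansing rule of Table 1 that discards short segments.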
This setup assumes that the ground truth quality is stable and high. As mentioned, the label collection method we used can guarantee a higher quality level of the labels. Unlike the case where ground truth is collected from passengers, the project's staff followed instructions, was not subject to, e.g., recall bias, and was less likely to suffer systematic and random distractions. Nevertheless, to provide an exhaustive picture of performance, we also train the supervised methods with noise added to the training set, i.e., flipping a controlled percentage of labels. We sample the number of errors per user from a Poisson distribution and flip labels accordingly. The test set is not affected. Applying a Monte Carlo evaluation based on 100 loops per experiment, on the same setup described in Table 1, we can then estimate the sensitivity to label noise. This problem does not affect the unsupervised methods, which use the Bluetooth RSSI signal as pseudo-labels instead (see Table 1, Signals row).
3 Results and Discussion
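The label-noise injection can be illustrated with a short sketch. It assumes binary labels (1 for BI, 0 for BO) stored per user and flips individual labels; whether the original experiment flips points or whole segments is not stated here, and the function names and the Poisson sampler (Knuth's method) are illustrative choices rather than the authors' implementation.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def flip_labels(labels_by_user, lam, rng):
    """Return a noisy copy: for each user, flip a Poisson(lam) number of labels.

    Only the training labels should be passed in; the test set stays untouched.
    """
    noisy = {}
    for user, labels in labels_by_user.items():
        labels = list(labels)  # copy, so the clean labels are preserved
        n_flips = min(sample_poisson(lam, rng), len(labels))
        for i in rng.sample(range(len(labels)), n_flips):
            labels[i] = 1 - labels[i]
        noisy[user] = labels
    return noisy
```

In a Monte Carlo evaluation, a function like this would be called once per loop with a fresh random draw before retraining the supervised classifier.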
After a manual optimization process of CEMWA, MWA, and WA, we obtain the best performance with the combination of hyperparameters described in Table 2. In contrast to CEMWA, MWA and WA converge to a relatively lower loss, and their overfitting is higher. Although the three models have the same number of parameters, we record differing computation times for the training phase (which might be explained by concurrent processing on the GPU). Compared to MWA and WA, CEMWA achieves substantially better scores, with a higher mean and a lower standard deviation. Eq. (5) yields the results we present, while Eq. (6) does not seem effective in this use case. We apply the same penalization across all three models during back-propagation to rebalance the BI and BO classes when computing the WAE loss within the optimizer. Rather than the Precision score, the Recall score of the BI class seems to provide an essential contribution to the overall superior performance of CEMWA.
The supervised methods we evaluate perform very well. XGBoost presents a slightly higher score than CEMWA but with a slightly larger standard deviation. The two models seem to have comparable performance in terms of computation time. The following differences emerge. In optimal conditions and with optimal ground truth quality, XGBoost appears to record a substantially higher precision score but a lower recall score than CEMWA. Under the same conditions, Random Forest seems comparable with MWA and WA, or better. However, we should not forget the impact of wrong labels on the training process of supervised methods such as XGBoost and Random Forest. This problem does not affect unsupervised methods like CEMWA.
To test the sensitivity of XGBoost and Random Forest to noise in the labels, we run a Monte Carlo evaluation. Results show that flipping more than 10% of the labels during training leads to substantial performance degradation. This rapid degradation is of critical importance when labels are collected directly from passengers. Consequently, the trade-off between the cost and the quality of label collection critically impacts the scalability potential of supervised methods. Figure 10 depicts the impact of wrong labels on the classifiers' performance: when users provide wrong labels for less than 1 segment on average, where a segment is defined according to the GPS Trajectory Segmentation of Table 1, the performance of supervised classifiers drops dramatically compared to CEMWA.
This configuration provides potential for enhancing smartphone battery efficiency and user privacy, because: (i) smartphones would listen to Bluetooth while keeping GPS up at minimum resolution, just enough to avoid a GPS cold start; (ii) Bluetooth in proximity would trigger higher-resolution GPS only when necessary.
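The two-step sensing policy in (i) and (ii) can be sketched as a small decision function. This is only an illustration: the function name, the mode labels, and the RSSI threshold of -90 dBm are hypothetical and are not values reported in this work.

```python
def gps_mode(rssi_readings_dbm, rssi_threshold_dbm=-90.0):
    """Choose the GPS sampling mode from the latest bus-beacon RSSI readings.

    Returns "high" when at least one beacon is heard at or above the
    (hypothetical) threshold, otherwise "low", i.e., the minimum resolution
    that keeps GPS warm.
    """
    in_proximity = any(r >= rssi_threshold_dbm for r in rssi_readings_dbm)
    return "high" if in_proximity else "low"
```

For example, `gps_mode([-95.0, -80.0])` selects the high-resolution mode, while an empty list of readings keeps GPS at low resolution.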
Figure 10: Impact of wrong labels on supervised classifiers training (F1 score macro average).
Figure 11: Impact of wrong labels on supervised classifiers training (F1 score weighted average).
Figure 12: Impact of wrong labels on supervised classifiers training (AUC ROC).
Figure 13: Impact of wrong labels on supervised classifiers training (Accuracy).
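Figures 10 and 11 report the F1 score under two averaging schemes. As a reminder of how they differ, the sketch below computes both from lists of true and predicted labels. The function names are illustrative, and the standard definitions are assumed: the macro average weights each class equally, while the weighted average weights each class by its number of true samples.

```python
def f1_per_class(y_true, y_pred, classes=(0, 1)):
    """Return per-class F1 scores and per-class support (true sample counts)."""
    scores, supports = {}, {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
        supports[c] = tp + fn
    return scores, supports

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores, _ = f1_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred, classes=(0, 1)):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores, supports = f1_per_class(y_true, y_pred, classes)
    total = sum(supports.values())
    return sum(scores[c] * supports[c] for c in scores) / total if total else 0.0
```

With imbalanced BI and BO classes the two averages can diverge, which is why both are reported.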
Table 1: Experiment Setup

| | Supervised Baseline: XGBoost, Random Forest | Unsupervised Baseline: MWA (Fig. 2), WA (Fig. 3); CEMWA (Fig. 1) |
|---|---|---|
| Smartphone Set (GPS+BLE, Android+iOS) | 59,000 labelled observations; 7 users | 328,000 total observations, 59,000 labelled; 134 total users |
| Buses set | 1,162,000 observations; 940 h bus; 3 buses | 1,162,000 observations; 940 h bus; 3 buses |
| Signals | Speed, Longitude, Latitude, Timestamp from GPS | Speed, Longitude, Latitude, Timestamp from GPS; RSSI and Timestamp from BLE devices |
| Use of Ground Truth Labels | For training and evaluation | For evaluation only |
| GPS Trajectory Segmentation | Time gap between points > 120 s determines a new segment; points representing speed > 45 m/s determine a new segment | Time gap between points > 120 s determines a new segment; points representing speed > 45 m/s determine a new segment |
| Data Cleansing | Segments < 10 consecutive points are discarded | Segments < 10 consecutive points are discarded |
| Observation Imputation | Imputation with Exponential Weighted Moving Average and Masking | Masking Only |
| Basic Feature Extraction | Time-gap, space-gap, and bearing between each pair of GPS points; GPS distance between smartphone and buses within 1 s range | Time-gap, space-gap, and bearing between each pair of GPS points; GPS distance between smartphone and buses within 1 s range |
| Time Series Sliding Window | Moving window of 9 consecutive steps per segment, and 1 step stride | Moving window of 9 consecutive steps per segment, and 1 step stride |
| Feature Extraction on Sliding Window | Mean value; Max value; Min value; Position of the minimum value; Position of the maximum value; Amplitude between min and max value; Number of points beyond one std dev.; Number of points below one std dev.; Number of points above one std dev.; Number of peaks in the moving window; Number of peaks in half sliding window; Number of peaks above one std dev.; Peak distance within sliding window; Slope | None. ANN performs feature extraction. Encoder, 1 convolutional neural network. Decoder, 1 transposed convolutional neural network. Convolution Kernel: 3; λ ∈ [10^-4, 1]; Batch Size: ∈ [16, 1024]; True sample size: ∈ [10, 100]; Learning Rate: ∈ [10^-5, 10^-1]; Epochs: ∈ [10, 100] |
| Performance Evaluation Method | Leave-one-out: one user in the test set; the training set is the complementary set; repeated rotating each user in the test set | Hold-out: training and validation set from the unlabelled set; test set corresponding to the labelled set |
| Method performance distribution | Given by performance on individual users of the whole labelled set | Given by performance on individual users of the whole labelled set |
| Performance Metric | AUC ROC, F1-score, Precision, Recall, Accuracy | AUC ROC, F1-score, Precision, Recall, Accuracy |
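For the supervised baselines, Table 1 lists hand-crafted features computed on each 9-step window. A possible sketch of a subset of them is given below for one signal (e.g., speed). The feature names, the strict-peak definition, and the least-squares slope over the step index are assumptions made for illustration, since the exact definitions are not given in the table.

```python
import statistics

def window_features(values):
    """Compute a subset of the Table 1 window features for one signal."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    vmin, vmax = min(values), max(values)
    # Least-squares slope of the values against the step index 0..n-1.
    x_mean = (n - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - mean) for i, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    # A peak is a point strictly greater than both of its neighbours.
    peaks = sum(1 for i in range(1, n - 1) if values[i - 1] < values[i] > values[i + 1])
    return {
        "mean": mean,
        "max": vmax,
        "min": vmin,
        "argmin": values.index(vmin),
        "argmax": values.index(vmax),
        "amplitude": vmax - vmin,
        "n_above_1std": sum(1 for v in values if v > mean + std),
        "n_below_1std": sum(1 for v in values if v < mean - std),
        "n_peaks": peaks,
        "slope": slope,
    }
```

Applying this function to every window of every segment would produce the tabular input that tree-based classifiers such as Random Forest and XGBoost expect.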
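Table 2: Encoder/Decoder CNN architecture hyperparameters, final configuration for CEMWA, MWA, and WA.

| Component | Hyperparameter | Value |
|---|---|---|
| Encoder | Convolutional Neural Network (CNN) Layers | 1 |
| Encoder | Activation Function | Rectified Linear Unit |
| Encoder | Fully connected Layers | 0 |
| Encoder | Dropout | 0.25 |
| Decoder | Transposed CNN Layers | 1 |
| Decoder | Activation Function | Leaky Rectified Linear Unit |
| Decoder | Fully connected Layers | 0 |
| Decoder | Dropout | 0.25 |
| | Optimizer | Adam |
| | Epochs | 50 |
| | Batch Size | 32 |
| | Learning Rate | 10^-4 |
| | Dropout | 0.25 |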
In practice, after cause-effect training with the encoder-decoder architecture and clustering, where GPS compression is trained by reconstructing BLE and vice versa, CEMWA could be deployed as follows. During operations, one CEMWA encoder compresses GPS, while a separate encoder compresses Bluetooth. The two independent compressed representations are joined into one. The proximity between the resulting representation and the clusters determines whether the observation belongs to the BI or BO class.
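The deployment step can be sketched as a nearest-cluster rule. This is a simplified illustration under stated assumptions: the two latent vectors are joined by concatenation, each cluster is summarized by a centroid, and proximity is the Euclidean distance. DBSCAN itself does not produce centroids, so the centroid summary and the names below are hypothetical and not the authors' procedure.

```python
import math

def classify(z_gps, z_ble, centroids):
    """Assign the joined latent representation to the nearest cluster label.

    z_gps, z_ble: latent vectors from the GPS and Bluetooth encoders.
    centroids: list of (label, vector) pairs, e.g. [("BI", [...]), ("BO", [...])],
               where each vector has len(z_gps) + len(z_ble) entries.
    """
    z = list(z_gps) + list(z_ble)  # join the two representations (assumed: concatenation)
    label, _ = min(centroids, key=lambda item: math.dist(z, item[1]))
    return label
```

For instance, with centroids `[("BI", [0, 0, 0, 0]), ("BO", [5, 5, 5, 5])]`, an observation whose joined representation lies near the origin is assigned to the BI class.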
For applications where disruptions are unlikely, and where we therefore expect a process that is stable in time, the amortization of high-quality ground truth could rely on a longer time horizon. An established metro line, for example, is unlikely to experience frequent changes. In contrast, bus services are subject to continuous disruptions, e.g., road works and traffic congestion. Therefore, a supervised BIBO classifier could be a good choice in the first case, whereas the unsupervised BIBO classifier seems better in the second case. Results rely mainly on the smartphone-bus distance. This feature can be challenging to compute off-line, especially when a large number of passengers and vehicles are active. However, a federated-learning design [3rd Generation Partnership Project (3GPP), 2021] would solve the problem and allow the computation of features online.
Assuming that the future market penetration of smartphones remains stable, and relying on adversarial sensor architectures, we show an approach to substitute manually collected labels. This approach has vast potential; for example, BLE beacons set against GPS within a CEMWA architecture would enable ticketless transit across any public transportation system, and large-scale deployment, even for applications subject to frequent disruptions. In addition to the aforementioned use case, we suggest road and bridge tolls or shared mobility services such as cars, bikes, or scooters. A BIBO system could also help visually impaired people choose the right bus to board from the bus stop or alight at the right stop from the bus. It could facilitate integration across multiple service providers, operating mostly on software instead of physical infrastructure, and even integrating with existing CICO and WIWO systems.
Table 3: Results with optimal Ground Truth for method evaluation and training of supervised algorithms
4 Conclusion
This paper focuses on an implicit tracking system to detect whether a passenger is inside or outside the transport network. To avoid using labels in the classifier training, we leverage a novel artificial neural network architecture that learns the cause-effect relationship between two independent sensors measuring the same event. We call this approach CEMWA. In optimal conditions and with high-quality ground truth, CEMWA's performance is comparable to or better than both supervised and unsupervised baselines. The performance of CEMWA and XGBoost, evaluated with optimal knowledge of the BIBO ground truth, seems promising for public transport ticketing in general. In situations with noisy ground truth, such as transport services subject to disruption or surveys where passengers lack the ticket payment as an incentive to provide exact ground truth, we show that the performance of supervised classifiers degrades. The tolerance of supervised methods to noisy labels is case specific. However, the issue does not affect CEMWA by design. Consequently, this unsupervised method is both scalable and suited to use cases where, e.g., frequent service disruptions may lead to the need for regular label collection. Future research will investigate a few directions: (i) the extension of sensor-to-sensor validation to new signals and neural network architectures, and the sensitivity to labeling noise; (ii) the introduction of sensitivity to noise as a performance index to evaluate and compare supervised methods; and (iii) the connection between the dry machine learning scores of our BIBO classifier and the key performance indexes assessing automatic fare collection systems with BIBO.
Acknowledgment
This project is co-financed by the European Regional Development Fund through the Urban Innovative Actions Initiative.
References
Wolfgang Narzt, Stefan Mayerhofer, Otto Weichselbaum, Stefan Haselbock, and Niklas Hofler. Be-in/be-out with bluetooth low energy: Implicit ticketing for public transportation systems. IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, 2015-:7313345, 2015. ISSN 21530017, 21530009. doi:10.1109/ITSC.2015.253.
Sampo Hietanen. Mobility as a service. the new transport model, 12(2):2–4, 2014.
David A. Hensher and Corinne Mulley. Mobility bundling and cultural tribalism - might passenger mobility plans through maas remain niche or are they truly scalable? Transport Policy, 100:172–175, 2021. ISSN 0967-070X. doi: https://doi.org/10.1016/j.tranpol.2020.11.003. URL https://www.sciencedirect.com/science/article/pii/S0967070X20309203
Warwick Goodall, Tiffany Dovey, Justine Bornstein, and Brett Bonthron. The rise of mobility as a service. Deloitte Rev, 20:112–129, 2017.
Matthias H. Wirtz and J. A. N. Klähr. Smartphone based in/out ticketing systems: A new generation of ticketing in public transport and its performance testing. WIT Transactions on the Built Environment, 182:351–359, 2019. ISSN 17464498, 17433509. doi:10.2495/UT180321.
Valentino Servizi, Camara Francisco Pereira, Karen Marie Anderson, and Anker Otto Nielsen. Transport behavior-mining from smartphones: a review. European Transport Research Review, 2021a. doi:10.1186/s12544-021-00516-z. URL https://doi.org/10.1186/s12544-021-00516-z.
Oana Baescu and Hjalmar Christiansen. The Danish National Travel Survey Annual Statistical Report TU0619v2. DTU Management, 2020. doi:10.11581/dtu:00000034.
Youjing Cui and Shuzhi Sam Ge. Autonomous vehicle positioning with gps in urban canyon environments. IEEE Transactions on Robotics and Automation, 19(1):15–25, 2003. ISSN 2374958x, 1042296x. doi:10.1109/TRA.2002.807557.
Lin Wang, Hristijan Gjoreski, Mathias Ciliberto, Sami Mekki, Stefan Valentin, and Daniel Roggen. Enabling reproducible research in sensor-based transportation mode recognition with the sussex-huawei dataset. IEEE Access, 2019. ISSN 21693536. doi:10.1109/ACCESS.2019.2890793.
E. Foxlin. Inertial head-tracker sensor fusion by a complementary separate-bias kalman filter. In Proceedings of the IEEE 1996 Virtual Reality Annual International Symposium, pages 185–194, 1996. doi:10.1109/VRAIS.1996.490527.
Andreas Bjerre-Nielsen, Kelton Minor, Piotr Sapiezynski, Sune Lehmann, and David Dreyer Lassen. Inferring transportation mode from smartphone sensors: Evaluating the potential of wi-fi and bluetooth. PLOS ONE, 15(7), 2020. ISSN 19326203.
Ivan Malmberg. An analysis of iBeacons and critical minimum distances in device placement. PhD thesis, 2014. URL http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187925
Valentino Servizi, Dan Roland Persson, Per Bækgaard, Hannah Villadsen, Inon Peled, Jeppe Rich, Francisco C Pereira, and Otto A Nielsen. Context-aware sensing and implicit ground truth collection: Building a foundation for event triggered surveys on autonomous shuttles: Artikel. In Proceedings from the Annual Transport Conference at Aalborg University, volume 28, 2021b.
Kalyanaraman Shankari, Jonathan Fuerst, Mauricio Fadel Argerich, Eleftherios Avramidis, and Jesse Zhang. Mobilitynet: Towards a public dataset for multi-modal mobility research. Climate Change AI, 2020.
Valentino Servizi, Niklas Christoffer Petersen, Francisco Camara Pereira, and Otto Anker Nielsen. Stop detection for smartphone-based travel surveys using geo-spatial context and artificial neural networks. Transportation Research Part C: Emerging Technologies, 121:102834, 12 2020. ISSN 0968090X. doi:10.1016/j.trc.2020.102834. URL https://linkinghub.elsevier.com/retrieve/pii/S0968090X20307385
Sina Dabiri and Kevin Heaslip. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies, 86(November 2017):360–371, 2018. ISSN 0968090X. doi:10.1016/j.trc.2017.11.021. URL https://doi.org/10.1016/j.trc.2017.11.021
Maria Cornacchia, Koray Ozcan, Yu Zheng, and Senem Velipasalar. A survey on activity detection and classification using wearable sensors. IEEE Sensors Journal, 17(2):7742959, 2017. ISSN 15581748, 1530437x, 23799153. doi:10.1109/JSEN.2016.2628346.
Anil NP Koushik, M. Manoj, and N. Nezamuddin. Machine learning applications in activity-travel behaviour research: a review. Transport Reviews, 0(0):1–24, 2020. doi:10.1080/01441647.2019.1704307. URL https://doi.org/10.1080/01441647.2019.1704307
Ivana Semanjski, Sidharta Gautama, Rein Ahas, and Frank Witlox. Spatial context mining approach for transport mode recognition from mobile sensed big data. Computers, Environment and Urban Systems, 66:38–52, 2017. ISSN 01989715. doi:10.1016/j.compenvurbsys.2017.07.004.
Lijuan Zhang, Sagi Dalyot, Daniel Eggert, and Monika Sester. Multi-stage approach to travel-mode segmentation and classification of gps traces. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences: [Geospatial Data Infrastructure: From Data Acquisition And Updating To Smarter Services] 38-4 (2011), Nr. W25, 38(W25):87–93, 2011.
A. Kotanen, M. Hännikäinen, H. Leppäkoski, and T. D. Hämäläinen. Experiments on local positioning with bluetooth. Proceedings ITCC 2003, International Conference on Information Technology: Computers and Communications, page 1197544, 2003. doi:10.1109/ITCC.2003.1197544.
Fazli Subhan, Halabi Hasbullah, Azat Rozyyev, and Sheikh Tahir Bakhsh. Indoor positioning in bluetooth networks using fingerprinting and lateration approach. 2011 International Conference on Information Science and Applications, ICISA 2011, page 5772436, 2011. doi:10.1109/ICISA.2011.5772436.
Liang Chen, Heidi Kuusniemi, Yuwei Chen, Jingbin Liu, Ling Pei, Laura Ruotsalainen, and Ruizhi Chen. Constraint kalman filter for indoor bluetooth localization. 2015 23rd European Signal Processing Conference, EUSIPCO 2015, page 7362717, 2015. ISSN 20761465. doi:10.1109/EUSIPCO.2015.7362717.
Fazli Subhan, Halabi Hasbullah, and Khalid Ashraf. Kalman filter-based hybrid indoor position estimation technique in bluetooth networks. International Journal of Navigation and Observation, 2013:570964, 2013. ISSN 16876008, 16875990. doi:10.1155/2013/570964.
Héctor José Pérez Iglesias, Valentín Barral, and Carlos J. Escudero. Indoor person localization system through rssi bluetooth fingerprinting. 2012 19th International Conference on Systems, Signals and Image Processing, IWSSIP 2012, page 6208163, 2012. ISSN 21578672.
Ali Yassin, Youssef Nasser, Mariette Awad, Ahmed Al-Dubai, Ran Liu, Chau Yuen, Ronald Raulefs, and Elias Aboutanios. Recent advances in indoor localization: A survey on theoretical approaches and applications. IEEE Communications Surveys and Tutorials, 19(2):7762095, 2017. ISSN 1553877x, 2373745x. doi:10.1109/COMST.2016.2632427.
Nadine Schuessler and Kay W. Axhausen. Processing raw data from global positioning systems without additional information. Transportation Research Record, 2105(1):28–36, 2009. doi:10.3141/2105-04. URL https://doi.org/10.3141/2105-04
Niklas Christoffer Petersen, Anders Parslov, and Filipe Rodrigues. Short-term bus travel time prediction for transfer synchronization with intelligent uncertainty handling. arXiv preprint arXiv:2104.06819, 2021.
Richard Zhang, Phillip Isola, and Alexei A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction, 2016.
Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018.
Arthur Gretton, Karsten Borgwardt, Malte J. Rasch, Bernhard Scholkopf, and Alexander J. Smola. A kernel method for the two-sample problem, 2008.
Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.
Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
Bharath K Sriperumbudur, Kenji Fukumizu, and Gert RG Lanckriet. Universality, characteristic kernels and rkhs embedding of measures. Journal of Machine Learning Research, 12(7), 2011.
Aristidis Likas, Nikos Vlassis, and Jakob J. Verbeek. The global k-means clustering algorithm. Pattern Recognition, 36(2):451–461, 2003. ISSN 0031-3203. doi: https://doi.org/10.1016/S0031-3203(02)00060-2. URL https://www.sciencedirect.com/science/article/pii/S0031320302000602. Biometrics.
Kamran Khan, Saif Ur Rehman, Kamran Aziz, Simon Fong, and S. Sarasvady. Dbscan: Past, present and future. In The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), pages 232–238, 2014. doi:10.1109/ICADIWT.2014.6814687.
Muhammad S. Osman, Adnan M. Abu-Mahfouz, and Philip R. Page. A survey on data imputation techniques: Water distribution system as a use case. IEEE Access, 6:63279–63291, 2018. doi:10.1109/ACCESS.2018.2877269.
Iman M. Almomani, Nour Y. Alkhalil, Enas M. Ahmad, and Rania M. Jodeh. Ubiquitous gps vehicle tracking and management system. In 2011 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), pages 1–6, 2011. doi:10.1109/AEECT.2011.6132526.
3rd Generation Partnership Project (3GPP). Study on traffic characteristics and performance requirements for AI/ML model transfer, 22.874, 2021. URL https://portal.3gpp.org