
A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories

Jennifer A. Miller

2015

UT CID Report #1512

This UT CID research was supported in part by the following organizations:  

identity.utexas.edu  


A COMPUTATIONAL MOVEMENT ANALYSIS FRAMEWORK FOR EXPLORING ANONYMITY IN HUMAN MOBILITY TRAJECTORIES

Background

Advancements in tracking technologies such as global positioning systems (GPS), radio frequency identification (RFID), cellular phone networks, and WiFi hotspots have resulted in significant increases in the availability of highly accurate data on moving objects, with unprecedented high spatial and temporal resolution. Within geographic information science (GIScience), ‘computational movement analysis’ (CMA) has recently emerged as a subfield that focuses on the development and application of computational techniques for collecting, managing, and analyzing movement data in order to better understand the processes that are associated with them (Gudmundsson et al. 2012). As these technologies facilitate the collection of near-seamless (in some cases sub-second) movement tracks, the ‘spatiotemporal footprint’ of an individual’s movement can be explored using CMA techniques.

These location data are often studied as ‘trajectories’, comprised of a series of time-stamped sequential locations. Depending upon the collection method, the location information can be represented by precise latitude and longitude coordinates (e.g., GPS data from a smartphone or other device) or the unique catchment area of a single cellular tower (e.g., call detail records from cellular phones). These relatively low cost location data are used to explore human mobility patterns related to, for example, urban planning, transportation infrastructure, disaster planning/evacuation strategies, potential disease spread, and many other applications (Becker et al. 2013).

The ability to study human mobility and issues related to interaction with the environment or other individuals, and the behaviors these interactions suggest, has been greatly enhanced by technological advancements that facilitate the collection of high quality location data at unprecedented spatial and temporal resolutions. However, as often happens with technological advancements, the collection of these data has preceded extensive study on how and what they can (or should) be used for, as well as the privacy implications associated with distributing information on an individual’s location. The research presented here explores issues related to privacy and identity associated with more recently available high resolution GPS location data. The analysis focuses on using methods from movement pattern analysis and spatial statistical methods to address the following issues:

• Can activity “hotspots” be identified from movement data and how can their spatiotemporal structure be explored?

• How “unique” are anonymized movement trajectories? How is their uniqueness affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize trajectories?

Using movement pattern analysis to identify potential activity “hotspots” from GPS trajectory data: a case study using taxicab data in San Francisco.

An important application using location data involves exploring the spatiotemporal pattern of activity they represent. Previous examples focused predominantly on call detail records (CDR) that were aggregated to their nearest cellular tower (see Gao 2015 for review). Spatial autocorrelation analysis was used to identify “source” areas with more outgoing calls and “sink” areas, where more incoming calls occurred. More recently, GPS data have been used to explore movement activity of taxis in Shanghai (Deng and Ji 2011), taxis in New York City (Qian et al. 2015), and cement trucks in Athens (Orellana et al. 2010). While there are certain predictable spatial patterns of taxicab location and movement related to city structure (e.g., greater activity in the central business district, or CBD) or time of day (e.g., towards and away from the CBD in morning and evening, respectively), there are also stochastic elements associated with other factors that can often be related to ephemeral activities and passenger behaviors.

I hypothesized that the spatiotemporal structure of the collective movement of the taxicabs could be used to infer points-of-interest (POI) or activity “hotspots”, and that some hotspots would emerge or disappear depending on the time of day.

Research questions: How can movement pattern analysis and spatial statistics be used to identify collective points of interest from GPS location data? How can the spatiotemporal structure of these movement activities be explicitly analyzed and visualized?

Data

San Francisco Cab Dataset (http://crawdad.org/epfl/mobility/20090224/). I used 40 cabs and extracted data for one weekday (Wednesday June 4, 2008) to examine how movement analysis and spatial statistics can be used to explore potential points of interest (POI). The temporal resolution was approximately 1 minute. GPS locations for each of the 40 cabs were partitioned into one of three temporal bins: morning (7-10 am, n=4634), afternoon (4-7 pm, n=6009), and evening (9 pm-12 midnight, n=6087).
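The partitioning step can be sketched roughly as follows. This is only an illustration: the record layout (cab id, latitude, longitude, local-time `datetime`) and the names `BINS`, `assign_bin`, and `partition` are assumptions made for the example, not the format of the CRAWDAD files or the code used in the report.

```python
from collections import defaultdict
from datetime import datetime

# Hour ranges [start, end) for the three temporal bins described in the text.
BINS = {"morning": (7, 10), "afternoon": (16, 19), "evening": (21, 24)}

def assign_bin(timestamp):
    """Return the bin name for a local-time datetime, or None if it falls outside all bins."""
    for name, (start, end) in BINS.items():
        if start <= timestamp.hour < end:
            return name
    return None

def partition(fixes):
    """Group (cab_id, lat, lon, timestamp) tuples by temporal bin (hypothetical layout)."""
    groups = defaultdict(list)
    for fix in fixes:
        name = assign_bin(fix[3])
        if name is not None:
            groups[name].append(fix)
    return dict(groups)
```

Fixes recorded outside the three windows (for example at midday) are simply dropped, which matches the idea of comparing only the three periods of interest.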

Methods

Two different methods were used to explore hotspot activities: the first method involved aggregating the taxi locations to a 250 meter x 250 meter square (size was selected because it is greater than 1 city block but less than 2 blocks) for a subset of downtown San Francisco (peak activity). The number of taxi locations for each square and for each of the three time periods (morning, afternoon, evening) was counted and analyzed using global and local Moran’s I.
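For reference, global Moran’s I is conventionally written as below. This is the standard textbook form, given here on the assumption that it is the statistic applied to the grid counts; n denotes the number of grid cells.

```latex
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \,
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i} (x_i - \bar{x})^2}
```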


In the global Moran’s I statistic, x is the count of taxicabs and w_ij is the spatial weights matrix used to represent what is “near”. I used both 1st and 2nd order (row-standardized) contiguity for the spatial weights matrix here. Moran’s I ranges from -1, indicating extreme negative spatial autocorrelation, to +1, indicating extreme positive spatial autocorrelation, with values near 0 indicating no autocorrelation.
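A minimal sketch of this calculation on a regular grid is shown below. It assumes first-order rook contiguity (cells sharing an edge); the report does not say whether rook or queen contiguity was used, and the function names are hypothetical. The permutation routine gives a one-sided pseudo p-value of the kind obtained from Monte Carlo permutations.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized first-order rook contiguity matrix for a regular grid."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    # Divide each row by its sum so that every row of weights sums to 1.
    return w / w.sum(axis=1, keepdims=True)

def morans_i(x, w):
    """Global Moran's I for values x (one per cell) and weights matrix w."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(x, w, n_perm=499, seed=0):
    """One-sided pseudo p-value for positive autocorrelation from random permutations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)
```

With row-standardized weights a perfect checkerboard of counts gives I = -1, while counts that are high on one side of the grid and low on the other give a positive value.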

Anselin (1995) introduced a local statistic that decomposes the global Moran’s I into a local measure (LISA, local indicator of spatial association).
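A common way of writing the local Moran statistic for observation i is given below. It is the usual textbook form, assumed here to correspond to the measure used; with row-standardized weights, the mean of the local values equals the global statistic.

```latex
I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n} \sum_{i} (x_i - \bar{x})^2
```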

With LISA, a value is calculated for each observation. A single statistic is no longer reported, but the values can be mapped and the spatial distribution of spatial autocorrelation can be explored.
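The local values and the four cluster/outlier types that are typically mapped can be sketched as follows. This is an illustrative implementation under the assumption of a row-standardized weights matrix `w`; the function name is hypothetical, and significance testing (which decides which cells are actually drawn on a LISA map) is omitted.

```python
import numpy as np

def local_morans(x, w):
    """Return local Moran's I_i and a quadrant label for each observation."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()             # deviation of each cell from the mean
    m2 = (z @ z) / len(x)        # second moment used for scaling
    lag = w @ z                  # weighted average deviation of the neighbors
    local_i = z * lag / m2
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "high-high", "high-low"),
        np.where(lag >= 0, "low-high", "low-low"),
    )
    return local_i, labels

# Four cells in a row; each cell's neighbors are the adjacent cells.
w_line = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
values, quadrants = local_morans([10, 9, 1, 0], w_line)
```

In this toy example the two left-hand cells fall in the high-high quadrant and the two right-hand cells in the low-low quadrant.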

Figures 2-4 show the local spatial autocorrelation of the taxicabs for the morning (fig. 2), afternoon (fig. 3), and evening (fig. 4) time periods, along with the raw counts. There is a core of relatively high counts in the upper right of the study area that is maintained for all time periods, but the magnitude of this core is different for each time period, ranging from a small cluster of high-high values in the morning (fig. 2b) to the largest cluster for the afternoon (fig. 3b). While the global Moran’s I was positive and statistically significant for all time periods (using Monte Carlo permutations, n=499), indicating that overall, nearby values were similar to each other, there were outliers for each time period. There were 7 high-low cells in the morning (fig. 2b), that is, cells that had a high count surrounded by neighbors with low counts, which could indicate an isolated area of high activity. Additionally, a single statistically significant positive value for (global) Moran’s I indicates overall positive spatial autocorrelation, but cannot differentiate between clusters of high values and clusters of low values. Mapping the local I_i values illustrates that, in addition to the core of high taxi activity, there is a core of low taxi activity in the bottom left for all time periods, as well as pockets of negative spatial autocorrelation (high-low and low-high).


The local statistic LISA can be further extended to measure cross-correlation between the value of a variable for a target cell compared to the lagged value of a different variable for its neighbors (in equation 2.0 above, the x variables on the right side of the equation would represent a different variable). Bivariate LISA statistics are particularly useful for studying the change in a variable across time periods (Anselin et al. 2007). Figure 5a shows the bivariate LISA for morning counts as the target compared to the neighboring cells’ counts for the afternoon. High-high and low-low cells would be interpreted as areas of high or low activity, respectively, across both time periods, while a low-high would identify a cell that had low activity in the morning compared to high activity among its neighbors in the afternoon. Conversely, a high-low would indicate a cell that had high morning activity compared to the low afternoon activity of its neighbors. The low-high cells fringing the CBD show that the area of activity increases from morning to afternoon.
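The bivariate variant can be sketched by pairing each cell’s standardized value in one period with the spatial lag of the standardized values in another period. The code below is an illustration under that assumption; the function name and the toy weights are hypothetical, and no significance test is included.

```python
import numpy as np

def bivariate_local_morans(x, y, w):
    """Local cross-correlation of x at each cell with the neighbors' values of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()   # target variable, e.g. morning counts
    zy = (y - y.mean()) / y.std()   # lagged variable, e.g. afternoon counts
    lag_y = w @ zy                  # neighbors' average standardized y
    stat = zx * lag_y
    labels = np.where(
        zx >= 0,
        np.where(lag_y >= 0, "high-high", "high-low"),
        np.where(lag_y >= 0, "low-high", "low-low"),
    )
    return stat, labels

# Four cells in a row with row-standardized adjacency weights.
w_line = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
morning = [1, 1, 9, 9]
afternoon = [9, 9, 9, 1]
stat, labels = bivariate_local_morans(morning, afternoon, w_line)
```

Here the first two cells are low-high: quiet in the morning but surrounded by cells that are busy in the afternoon, the same pattern described for the cells fringing the CBD.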

Figure 6a shows the afternoon-evening pattern, where a hotspot emerges in the southeastern part of the study area near a major freeway (Bayshore). Figure 7a compares morning counts to evening, and this area is a large hotspot, confirming that it is an area of high activity in the morning and evening, but relatively low activity in the afternoon. A high-low cell here represents an area of high activity in the morning that is less active in the evening, and low-high cells are those for which activity is higher in the evening compared to morning.

A more recent extension to bivariate LISA is the directional Moran scatterplot, which allows for better visualization of the dynamics between changing spatial patterns across time periods (Rey 2014). The directional LISA shows the movement of the statistics across two time periods, and therefore incorporates information from two different Moran scatterplots. For example, figure 5b shows the change in LISA statistic for each cell from morning to afternoon: each vector ‘starts’ in its position for morning (from figure 2b) and ‘ends’ in its position for afternoon (from figure 3b). The small arrows in the top left low-high quadrant represent cells that were low activity surrounded by high activity in both morning and afternoon. Figure 6b shows that there was much more variation in LISA statistics in afternoon and evening. The vector that is highlighted with a yellow star represents a cell that was a ‘coldspot’, or area of low activity in the afternoon, but became a hotspot in the evening.

In addition to measuring the spatial pattern of aggregated counts to identify likely activity hotspots, a more novel method involves measuring the spatial autocorrelation of movement parameters, specifically speed. Orellana and Wachowicz (2011) used LISA to analyze pedestrian movement in order to uncover “movement suspension” (low-low clusters) they suggested would indicate points of interest or activity hotspots. After testing different nearest neighbor spatial weights matrices, one that considers only the 10 closest neighbors to be “near” was used to measure spatial autocorrelation for all points within each of the three time periods. As the variable of interest is now speed, a low-low could be used to suggest an area of interest or a hotspot, while high-high points would most likely be associated with freeways. Low-highs and high-lows would likely be difficult to disentangle from variable traffic patterns (e.g., “stop and go” traffic).

Figure 8a shows the pattern for the morning period (fig. 8b is zoomed in to downtown San Francisco). Figures 9a and 9b are for the afternoon period and figures 10a and 10b are evening. The spatial distribution of these ‘suspended movement’ areas represented by low-lows varies with the time of day. In order to more easily visualize the differences in the spatial distribution of low-lows, kernel density of just the low-low points was calculated and a ‘home range’, or area where most of them occurred, was extracted. While the afternoon and evening low-low clusters are in similar places, the morning low-low clusters extend farther downtown. Also, the afternoon and evening clusters at the bottom right represent the San Francisco Airport, where there was less slow speed among taxicabs in the morning period.
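The 10-nearest-neighbor weights matrix mentioned above could be built along the following lines. This is a brute-force sketch that assumes the points are already in planar (projected) coordinates in meters; the function name is hypothetical, and for many thousands of points a spatial index would be a more practical choice than a full distance matrix.

```python
import numpy as np

def knn_weights(coords, k=10):
    """Row-standardized weights in which each point's k nearest points count as 'near'."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest points
    w = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(coords)), k)
    w[rows, nearest.ravel()] = 1.0 / k           # equal weight for each neighbor
    return w
```

The resulting matrix can be used with speed as the variable of interest in any local Moran’s I routine; note that nearest-neighbor weights are generally not symmetric.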

Examining the ‘uniqueness’ of human mobility trajectories: a case study using smartphone data from Beijing

Aggregating counts of GPS locations to a larger area approximates the spatial resolution available when these studies were done with cell phone towers and call counts were aggregated to the Voronoi polygon drawn around each cellular tower. However, there are important privacy issues associated with dealing with actual GPS locations that are often overlooked. Location data are often released after they have been ‘anonymized’, which means that the trajectory has been stripped of any obvious identifying information such as name, address, phone number, etc. However, personal points of interest (home, work) can still be identified by mining trajectory data for movement patterns, and these points of interest are often associated with unique individuals. Additional locations may be resolved that could have negative implications (e.g., repeated visits to a medical clinic may be a cause for concern for employers).

Due to data availability, most of the previous work on “unicity”, or measuring the uniqueness of movement traces or trajectories, has been with much coarser scaled cell phone data. Surprisingly, even relatively coarse spatial resolution location data such as that associated with call detail records (CDR), where ‘location’ is an area defined by its proximity to a specific cell phone tower, can be used to uniquely identify an individual. Locations of cell phone towers or antennae are based on population density and the area associated with each one varies considerably. In their study in a small European country, de Montjoye et al. (2013) found that the reception or catchment area for an antenna ranged from 0.15 km² in urban areas to 15 km² in rural areas.

Zang and Bolot (2011) used anonymized CDR from 25 million individuals across the U.S. to determine the “top N” locations at which calls were recorded for each of three months. They found that when N=2 (typically corresponding to work and home), up to 35% of the individuals could be uniquely identified. When N=3 (they suggested the 3rd location typically represented a school or shopping related location), 50% could be uniquely identified.

In their seminal study, de Montjoye et al. (2013) used fifteen months of anonymized mobile phone data (CDR) for 1.5 million individuals in a western European country and found that four randomly selected spatiotemporal points were sufficient to uniquely identify 95% of the individuals. Perhaps more troubling, they found that over 50% of individuals were uniquely identifiable from just two randomly selected locations (typically also corresponding to home and work). Song et al. (2014) found similar results with a dataset of one week of mobility data for 1.14 million people (total 56 million records): with just two random points, 60% of the trajectories were unique.

It is important to note that uniqueness does not equate to re-identifiability and the objectives of these studies were to examine how unique individual trajectories were, not to actually deanonymize them or re-attach an individual’s information to a unique trajectory. However, the ability to determine uniqueness of trajectories is an important prerequisite for re-identification (which would involve correlation with an ancillary dataset) and therefore represents a potential threat to individual privacy.

The degree of uniqueness of trajectories can vary as a function of factors such as typical commuting patterns, transportation modes, and geographical region (which affects commuting patterns and transportation modes). There have been several methods proposed to quantify the ‘anonymity’ of a database. The most commonly used method, k-anonymity, was introduced by Sweeney (2002) as a measure to increase anonymity for non-spatial databases. When applied to spatial databases, it ensures that any set of records (locations) for an individual is the same as that of at least k-1 other individuals. Generally, k=2, ensuring that at least two trajectories are equivalent, but as k increases, so too does the anonymity. Extensions of k-anonymity include l-diversity and t-closeness (Li et al. 2007). These measures are generally used to manage trajectory datasets (i.e., data would be manipulated so that the level of anonymity reached the reported k level), but in order to quantify the actual level of anonymity of trajectory datasets, a rigorous analysis comparing random points from each trajectory to all other trajectories has to be conducted. With trajectory datasets now available at one second intervals, the volume of these data can result in computationally intensive analysis.
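As a small illustration of the k-anonymity idea, the sketch below reports the largest k for which a set of records is k-anonymous, i.e. the size of the smallest group of identical records. Representing each trajectory as a tuple of generalized locations (for example grid-cell labels) is an assumption made for the example, as are the function names.

```python
from collections import Counter

def k_anonymity(records):
    """Return the largest k such that every record is shared by at least k individuals."""
    counts = Counter(records)
    return min(counts.values())

def violating_records(records, k):
    """Return the distinct records that occur fewer than k times."""
    counts = Counter(records)
    return sorted(r for r, c in counts.items() if c < k)

# Each tuple is one individual's (hypothetical) sequence of generalized locations.
trajectories = [
    ("cell_A", "cell_B"),
    ("cell_A", "cell_B"),
    ("cell_C", "cell_D"),
    ("cell_C", "cell_D"),
    ("cell_C", "cell_D"),
]
```

For this toy dataset every record is shared by at least two individuals, so it is 2-anonymous; adding a single trajectory that nobody else shares would reduce k to 1.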

De Montjoye et al. (2013) measured ‘unicity’ as the percentage of 2500 random traces that were unique given p random points (p ranged from 2 to 5). Song et al. (2014) defined uniqueness of trajectories as the percentage of all available trajectories that were uniquely associated with p random points, which they varied from 2 to 6. While anonymity (or lack thereof) has been studied with CDR data, as the previous examples show, it has not yet been addressed with the finer spatiotemporal resolution available as GPS locations from, e.g., smartphones. These datasets could potentially be far more unique and therefore more difficult to anonymize.

In this study, I conduct an extensive analysis of the unicity of GPS movement trajectories, testing the effect of spatial resolution and temporal resolution. In addition to location, I also explore how effective movement parameters such as speed could be for uniquely identifying a trajectory. This is one of the first studies to measure unicity of trajectories composed of GPS locations.


I hypothesized that the unicity of trajectories will be greater for GPS locations than the coarser scaled CDR locations. I also hypothesized that the attenuating effects of coarsening spatial and temporal resolution will have less impact than they would with CDR locations. I expect that speed will also be effective for uniquely identifying a trajectory when several data points are used.

Research questions: How can anonymity be quantified for different types of trajectories and how is it affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize a trajectory in the absence of actual locations?

Data

Microsoft GeoLife Trajectories (http://research.microsoft.com/en-us/downloads/b16d359dd164-469e-9fd4-daa38f2b2e13/). This is an extremely dense dataset, with temporal resolution of ~1-5 seconds and spatial resolution of ~5-10 meters. We used only one year of data (January 2009-December 2009) and used a spatial mask of Beijing (39.6° to 40.2° N latitude, 116° to 116.8° E longitude) to remove users who traveled outside of the city during this time period. This resulted in 71 users who had a total of 7,243 daily trajectories (number of locations visited within trajectories varied but the mean was 1600).

Methods

The basis of our unicity test involved extracting 500 sets of points of size n from each user and counting how many other trajectories they are found in. The percentage of 500 sets of points that matched only one trajectory was calculated and this was done for each of 71 users for four different point sizes (n=2, 3, 4, and 5). Our measure of unicity, u, is the percentage of 500 random point sets of size n that are contained in only one trajectory, averaged across all 71 users. A unicity value close to 100 indicates a highly unique trajectory that could theoretically be deanonymized, or re-connected with identifying user information, more easily; a low unicity value suggests that the random set of points is contained in several different trajectories and therefore would make de-anonymizing trajectories far more challenging. The amount of information from each point was varied: we used just location (x and y), location and time (x, y, and t), and the absolute angle (the absolute angle for point i is measured between the x direction and the step built by relocations i and i+1).
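The unicity test described above can be sketched as follows. This is one plausible reading of the procedure, not the code used for the report: the data layout (a dictionary mapping each user to a list of trajectories, each a list of hashable points such as `(x, y)` or `(x, y, t)` tuples) and the function name are assumptions, and exact equality of points is used for matching.

```python
import random

def unicity(trajectories, n_points, n_sets=500, seed=0):
    """Mean and per-user percentage of random point sets found in exactly one trajectory."""
    rng = random.Random(seed)
    # Remove repeated points within each trajectory while keeping their order.
    deduped = {
        user: [list(dict.fromkeys(t)) for t in trajs]
        for user, trajs in trajectories.items()
    }
    all_sets = [set(t) for trajs in deduped.values() for t in trajs]
    per_user = {}
    for user, trajs in deduped.items():
        candidates = [t for t in trajs if len(t) >= n_points]
        if not candidates:
            continue  # this user has no trajectory long enough to sample from
        hits = 0
        for _ in range(n_sets):
            sample = set(rng.sample(rng.choice(candidates), n_points))
            # Count every trajectory (including the source) that contains the sample.
            if sum(1 for s in all_sets if sample <= s) == 1:
                hits += 1
        per_user[user] = 100.0 * hits / n_sets
    mean = sum(per_user.values()) / len(per_user) if per_user else float("nan")
    return mean, per_user
```

Because the source trajectory always contains its own sample, a point set is counted as unique exactly when no other trajectory contains it. Comparing every sample against every trajectory is what makes the analysis computationally intensive for dense GPS data.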

The original latitude and longitude coordinates for these locations have a spatial precision of six decimal places (~0.1 m). In order to test how spatial and temporal resolution affected measurement of unicity, the geographic coordinates were coarsened first to four decimal places (~10 m) and the temporal resolution was coarsened to 30 seconds, then further coarsened to three decimal places (~100 m) and 60 seconds. Additionally, the precision of the absolute angle measure was decreased from the original (five decimal places) to three decimal places.
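The coarsening step and the absolute angle can be illustrated with the short sketch below. Rounding the coordinates (rather than truncating them) and flooring timestamps to the start of their interval are assumptions, since the report does not state how the coarsening was done; the angle function assumes planar x/y coordinates and returns radians. All names are hypothetical.

```python
import math

def coarsen_fix(lat, lon, t_seconds, decimals=4, time_step=30):
    """Round coordinates to `decimals` places and floor time to a multiple of `time_step`."""
    coarse_t = int(t_seconds // time_step) * time_step
    return (round(lat, decimals), round(lon, decimals), coarse_t)

def absolute_angles(xy, decimals=5):
    """Angle between the x direction and each step from point i to point i+1 (radians)."""
    return [
        round(math.atan2(y1 - y0, x1 - x0), decimals)
        for (x0, y0), (x1, y1) in zip(xy, xy[1:])
    ]
```

For example, `coarsen_fix(39.984702, 116.318417, 47)` maps a fix to the 30-second interval starting at 30 with coordinates at four decimal places, and calling `absolute_angles` with `decimals=3` mimics the reduced angle precision.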

Figure 12 displays the trajectories for two different users for a single day. The zoomed in subset helps to illustrate the importance of location precision: there are several locations that would be the same for both users if the location precision was coarsened ten- or 100-fold. Figure 13 shows three different users’ daily trajectories in a space-time cube. The red and green users overlap in time, but not space, while the red and blue users overlap in space but not time. The use of all three pieces of coordinate information (x, y, and t) can be extremely important for uniquely associating a single trajectory.

Results

Table 1 shows the unicity values associated with the size of each random point set. The mean was the average unicity across all 71 users, while the minimum and maximum show the variation in unicity among users. In general, the locations of points on a trajectory were highly unique. 90% of the random sets of just two points composed of only location (no timestamp) were associated with only one trajectory. Adding the timestamp increased the unicity of two points to 97%. When five points with location and timestamp were used, the unicity increased to almost 99%. Somewhat surprisingly, the angle of movement alone has fairly high unicity: when the angles of three points are tested, the unicity is similar to the unicity of location for CDR as found in de Montjoye et al. (2013) and Song et al. (2014). Five angle values could uniquely identify a trajectory 73% of the time.

Table 1: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Means and ranges are reported for 500 sets of random points for each of 71 users.

Unicity values for coarsened location, time, and absolute angle are shown in table 2. When just two points (no timestamp) are used at the coarser resolution (spatial precision reduced to ~10 m), only 68.5% of the time are the points associated with a unique trajectory. When the less precise (30-second) timestamp is added, the unicity is similar to the original resolution, and with five points with location and timestamp, unicity increases to ~94%. The unicity of the absolute angle degrades substantially: even using a set of five points results in less than 5% unicity.


Table 2: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Spatial resolution has been coarsened to 4 decimal places; temporal resolution has been coarsened to 30 seconds; and absolute angle coarsened to three decimal places. Means and ranges are reported for 500 sets of random points for each of 71 users.

Table 3 shows unicity values for the coarsest location coordinates: the spatial resolution of an x/y pair is now ~100 m and the temporal resolution was coarsened to one minute. The spatial resolution here is closer to the resolution of the antenna reception areas used in the de Montjoye et al. (2013) paper (where spatial resolution ranged from 115 m to 15 km), but the coarsened temporal resolution is still much more precise than the one used in the CDR studies. As a result, using location and time for just two points still results in a high unicity (mean 80.3%), while five points increases the mean unicity to almost 88%. Using just location (no timestamp), the unicity degrades to 32% for two points and 66% for five points.

Table 3: unicity results for location (x, y), location and time (x, y, t). Spatial resolution has been coarsened to 3 decimal places; temporal resolution has been coarsened to 60 seconds. Means and ranges are reported for 500 sets of random points for each of 71 users.

The mean unicity values for location and location + time, with different levels of coarsening, are summarized in figure 14. With the much higher precision and spatial resolution of GPS data currently available, two x/y locations are sufficient to be uniquely associated with a single trajectory 90% of the time; adding the timestamp matches a single trajectory 97% of the time. The three pieces of information (x, y, time) are so specific that increasing the number of points to match to five increases the unicity very little because it is already so high using just two points. The first level of coarsening for x, y, t (~10 m spatial, 30 seconds temporal) has similar unicity to the original resolution for just x, y coordinates, and when four or five points are used, the coarsened x, y, t has slightly higher mean unicity. The most coarsened level for x, y, t (~100 m, 60 seconds) still has a high unicity (80% for two points). The x, y coordinates (no timestamp) show the greatest increase in unicity when more points are used for matching. This suggests that there is a trade-off between location resolution and amount of information (location points) available.

Discussion/Future Work

While each of the issues addressed here focuses on a single dataset for the case study, I would expect the results to be generally applicable to other similar mobility datasets. The hotspot analysis illustrated that local spatial statistics can be used to identify “hotspots” of movement activity, and spatial statistics visualization tools are useful for exploring how these hotspots change through time. The spatial statistics used here were all extensions of Moran’s I index, which requires a variable of interest that is measured on a ratio or interval scale, and locations of points do not meet this condition. Therefore, the points were aggregated to polygons and the counts were used as the variable of interest. In the second part of this study, speed (m/sec) was calculated for each point (based on the distance from and time since the previous location) and used as the variable of interest. Spatial autocorrelation of the speed associated with each point was classified into high-high (likely associated with highways), low-high, high-low, and low-low, the last of which was used here to indicate potential areas of interest.
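The speed calculation described above (distance from, and time since, the previous location) might look like the sketch below. Using the haversine great-circle distance on a spherical Earth is an assumption, as the report does not say how distances were computed, and the function names and the `(lat, lon, t_seconds)` layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def speeds(fixes):
    """Speed in m/s at each fix after the first, for time-sorted (lat, lon, t_seconds) fixes."""
    result = []
    for (lat0, lon0, t0), (lat1, lon1, t1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt > 0:
            result.append(haversine_m(lat0, lon0, lat1, lon1) / dt)
        else:
            result.append(float("nan"))  # duplicate or out-of-order timestamp
    return result
```

The first fix of each track has no previous location and therefore no speed, so the returned list is one element shorter than the input.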

A better understanding of the spatiotemporal structure of human mobility could also increase the predictability of movement. For example, a pattern of high activity or relatively slow movement in certain locations at certain times of the day could be used to infer future movement at the same locations.

In both of the above examples, location and relative speed, respectively, were used as proxies for behavior. It is also important to note that these variables represented collective behavior, as all points for all 40 taxicabs were considered together. There are interesting future directions for this research, particularly comparing the utility of associating relative speed with points-of-interest for different types of moving entities. Automobiles, and taxicabs in particular, move differently (and sometimes slow down for different reasons) compared to pedestrians and wildlife, and even regular vehicles. It would be interesting to also test how useful other movement parameters such as relative and absolute angle and step length would be for identifying points-of-interest. This is only applicable for entities that can move more freely across space and are less confined to street networks or sidewalks.


The unicity study has particularly important implications for privacy and the increasing availability of ‘anonymized’ trajectory datasets. This is one of the first studies to explore unicity and anonymity with higher resolution GPS data and it should be troubling how unique a set of two location points can be. Decreasing the spatial and temporal resolution reduces the unicity, but five points with x, y coordinates at the coarsest resolution tested here were still uniquely associated with a single trajectory more than 60% of the time.

Movement parameters such as speed, angle, and step length have not been tested as potential identifiers of trajectories, but the case study here focusing on absolute angle highlights their potential importance. Five absolute angle data points were uniquely associated with a single trajectory 72% of the time. This suggests that individual movement, irrespective of absolute geographic location, can be identifiable with a sufficient level of precision of angle measurements and data points. Future work should focus specifically on how movement parameters could be used singly or together to identify a trajectory.

It is also important to note here that the focus of this study was not to re-attach user information to trajectories; it was just to examine how unique trajectories were based on different factors. The privacy issues associated with higher quality GPS location data should be addressed with the assumption that if a trajectory can be uniquely described with 2-5 GPS points, the trajectory could eventually be de-anonymized.

References

Anselin L (1995) Local Indicators of Spatial Association—LISA. Geographical Analysis, 27(2), 93–115.

Anselin L, Sridharan S and Gholston S (2007) Using Exploratory Spatial Data Analysis to Leverage Social Indicator Databases: The Discovery of Interesting Patterns. Social Indicators Research, 82(2), 287–309.

Becker R, Cáceres R, Hanson K, et al. (2013) Human Mobility Characterization from Cellular Network Data. Commun. ACM, 56(1), 74–82.

de Montjoye Y-A, Hidalgo CA, Verleysen M, et al. (2013) Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3.

Deng Z and Ji M (2011) Spatiotemporal structure of taxi services in Shanghai: Using exploratory spatial data analysis. In: IEEE, pp. 1–5.

Gao S (2015) Spatio-Temporal Analytics for Exploring Human Mobility Patterns and Urban Dynamics in the Mobile Age. Spatial Cognition & Computation, 15(2), 86–114.

Gudmundsson J, Laube P and Wolle T (2012) Computational movement analysis. In: Kresse W and Danko DM (eds), Springer Handbook of Geographic Information, Springer Berlin Heidelberg, pp. 423–438.


Li N, Li T and Venkatasubramanian S (2007) t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In: IEEE 23rd International Conference on Data Engineering, 2007. ICDE 2007, pp. 106–115.

Orellana D and Wachowicz M (2011) Exploring Patterns of Movement Suspension in Pedestrian Mobility. Geographical Analysis, 43(3), 241–260.

Orellana DA, Wachowicz M, Knegt de HJ, et al. (2010) Uncovering patterns of suspension of movement.

Piorkowski, M., Sarafijanovic-Djukic, N., and Grossglauser, M. CRAWDAD dataset epfl/mobility (v. 2009-02-24), downloaded from http://crawdad.org/epfl/mobility/20090224, doi:10.15783/C7J010, Feb 2009.

Qian X, Zhan X and Ukkusuri SV (2015) Characterizing Urban Dynamics Using Large Scale Taxicab Data. Springer International Publishing.

Rey SJ (2014) Spatial Dynamics and Space-Time Data Analysis. Springer Berlin Heidelberg.

Sweeney L (2002) k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557–570.

Zang H and Bolot J (2011) Anonymization of Location Data Does Not Work: A Large-scale Measurement Study. In: Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, MobiCom ’11, New York, NY, USA: ACM, pp. 145–156, Available from: http://doi.acm.org.ezproxy.lib.utexas.edu/10.1145/2030613.2030630 (accessed 15 May 2015).

Zheng, Y., Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid Spain. ACM Press: 791-800.

Zheng, Y., Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.

Zheng, Y., Xing Xie, Wei-Ying Ma, GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.


Appendix:


© 2015 Proprietary, The University of Texas at Austin, All Rights Reserved.

For more information on Center for Identity research, resources and information, visit identity.utexas.edu.
