SHW Engaging Data Final

Apr 06, 2018

Kipp Jones
  • 8/3/2019 SHW Engaging Data Final

    1/22

    Skyhook Wireless 2009

    Modeling Social Behavior with Aggregated Location Requests

    Engaging Data, October 2009

    First International Forum on the Application and Management of Personal Electronic Information

  • 2/22

    What is Skyhook?

    Location Technology

    Smartphones, netbooks, tablets, laptops, digital cameras

    Hybrid of GPS, cellular positioning and Wi-Fi localization

    iPhone, iPod, Mac OS

    Dell netbooks, laptops

    Android handsets

  • 3/22

    How Skyhook Builds Coverage

    400-500 full-time drivers

    Scanning every street with Skyhook equipment

    Automatic data capture and processing

    110 million Wi-Fi APs

    1+ million cell towers

  • 4/22

    WPS Constellation

  • 5/22

    US & European Coverage

  • 6/22

    Research Background

    50+ million devices around the globe

    Billions of anonymous location requests every month

    20+ months of cumulative data

    Smartphone mobile users

    Using London and Manhattan as examples

    Questions:

    What does aggregate user behavior tell us? About users? Groups? Activities? Events? Locations?

    Can we discern patterns in the data?

    Can we classify time/space/frequency/phase based on these patterns?

    How can this information be leveraged? Operational? Applications?

  • 7/22

    Citywide Analysis and Comparisons

    London Sunday | London Monday

    Consistent intensity on a weekly basis

    Quite different, Sunday vs. Monday

    Monday has approx. 2x the request intensity of Sunday

    Magenta area = 1000 requests per square km per day

  • 8/22

    Classification

    Emergence bursts (South Station): aggregated use pattern where users emerge into a context of uncertainty

    Impedance clustering (Accident on Mass Pike): pattern where multiple users are trying to navigate around traffic blockages or unanticipated impediments

    Social affinity/tribal clustering (Presidential Inauguration): groups of users have gathered together voluntarily around or in anticipation of a cultural event

    Arterial accumulation (Commonwealth Ave.): commuting pathways or pedestrian routes, generally occurring in temporal pulses

    Institutional nucleation (Longwood Medical): usage clusters identified within the confines of academic campuses or hospital facilities

  • 9/22

    Activity-Based Analysis

    Time/space-based analysis: 'heat' or aggregate request analysis based on different scales in time and space

    Frequency/phase domain: find temporal pulses (hourly, daily, weekly, etc.); group areas with like frequency and phase activity; can be applied at different spatial dimensions

    Baseline/anomaly detection: use a training data set to compute a baseline and noise threshold for a spatial region; run data against the baseline and detect anomalies; classify anomalies using the methods above
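    The time/space 'heat' analysis above looks at the same requests at different scales. A minimal sketch of that re-binning step, assuming hourly request counts are already held in a NumPy array indexed by (hour, tile row, tile column); the array layout and the name `aggregate` are hypothetical, not taken from the slides:

```python
import numpy as np


def aggregate(counts: np.ndarray, hours: int, tiles: int) -> np.ndarray:
    """Sum hourly per-tile counts into coarser time and space bins.

    counts: array of shape (T, R, C) = (hours, tile rows, tile columns).
    hours:  number of consecutive hours per output time bin.
    tiles:  each output cell is a `tiles` x `tiles` block of input tiles.
    """
    t, r, c = counts.shape
    # Trim trailing hours/rows/columns so every axis divides evenly.
    counts = counts[: t - t % hours, : r - r % tiles, : c - c % tiles]
    t, r, c = counts.shape
    blocks = counts.reshape(t // hours, hours, r // tiles, tiles, c // tiles, tiles)
    # Sum within each time bin and each spatial block.
    return blocks.sum(axis=(1, 3, 5))


# Two days of uniform activity (1 request per tile-hour) on a 4 x 4 grid,
# coarsened to daily totals over 2 x 2 tile blocks.
hourly = np.ones((48, 4, 4))
daily = aggregate(hourly, hours=24, tiles=2)
print(daily.shape)  # (2, 2, 2)
print(daily[0, 0, 0])  # 96.0 (24 hours x 4 tiles)
```

    Reshaping and summing keeps the computation vectorized; the same function serves every scale simply by changing `hours` and `tiles`.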

  • 10/22

    Large-Scale Events

    Recurring Affinity Cluster

    Event Viewing and Impedance Clustering

    Control Sample

    'Control' day versus St. Patrick's Day Parade

    >2.7x baseline control average over the area

  • 11/22

    Local Intensity

    Boston Sunday

    Yankee Stadium, No Game vs. Game

    Magenta = 1000 requests per square km per day

    10 Monday games over a 30-week sample

  • 12/22

    Tile Size

    ~400 m × ~400 m

    ~1000 tiles in Manhattan

    ~4B tiles cover the Earth
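    To make these tile counts concrete, here is a hypothetical tiling that is NOT Skyhook's actual scheme: an equal-angle grid of 2^16 latitude rows by 2^16 longitude columns. It yields 2^32 (about 4.3 billion) tiles, the same order as the ~4B above, with 8-hex-digit IDs in the same format as the sample tile IDs used later. Its tiles are not square: at Manhattan's latitude they come out at a few hundred metres on a side, taller than wide in neither direction equally. The grid constants and function names are assumptions for illustration.

```python
import math

# Hypothetical equal-angle grid (an illustration, not Skyhook's scheme):
# 2**16 rows of latitude x 2**16 columns of longitude = 2**32 tiles.
BITS = 16
N = 1 << BITS
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def tile_id(lat: float, lon: float) -> str:
    """Return an 8-hex-digit ID: row in the high 16 bits, column in the low 16."""
    row = min(int((lat + 90.0) / 180.0 * N), N - 1)
    col = min(int((lon + 180.0) / 360.0 * N), N - 1)
    return f"{(row << BITS) | col:08X}"


def tile_size_m(lat: float) -> tuple:
    """Approximate (east-west, north-south) size of one tile in metres at `lat`."""
    rad_per_deg = math.pi / 180.0
    north_south = EARTH_RADIUS_M * (180.0 / N) * rad_per_deg
    east_west = EARTH_RADIUS_M * math.cos(lat * rad_per_deg) * (360.0 / N) * rad_per_deg
    return east_west, north_south


print(N * N)  # 4294967296 tiles in total
ew, ns = tile_size_m(40.75)  # roughly Manhattan's latitude
print(round(ew), round(ns))
print(tile_id(40.7589, -73.9851))
```

    An equal-angle grid keeps IDs trivial to compute, at the cost of tiles shrinking east-west toward the poles; a scheme aiming for uniform ~400 m squares would need a projection-aware grid instead.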

  • 13/22

    Baseline for Monday

  • 14/22

    Saturday | Monday (2 AM)

    Daily Comparison: Red squares measure activity. Comparison of Manhattan at 2 AM on Saturday versus Monday.

  • 15/22

    Request Intensity Versus Variance

    Monday 6 PM | Monday 3 AM

    Activity and Consistency: Red squares measure activity (request counts per hour per tile); yellow circles measure variance. Where red dominates, usage is consistent for that tile-hour.
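    One way to compute the two quantities on this slide for a single tile: take the mean (activity) and the variance of each hour-of-week slot across weeks. This sketch assumes the tile's hourly counts start at the same hour of the week for every tile and span whole weeks. The `cv_max` cutoff used to call a tile-hour "consistent" is a hypothetical choice; the slide does not state one.

```python
import numpy as np

HOURS_PER_WEEK = 168


def weekly_stats(hourly, cv_max=0.5):
    """Mean and variance of a tile's request count for each hour of the week.

    hourly: 1-D sequence of hourly counts; a partial final week is dropped.
    Returns (mean, variance, consistent), each of length 168, where
    `consistent` marks tile-hours whose standard deviation is below
    cv_max times the mean (a hypothetical threshold).
    """
    x = np.asarray(hourly, dtype=float)
    weeks = len(x) // HOURS_PER_WEEK
    by_week = x[: weeks * HOURS_PER_WEEK].reshape(weeks, HOURS_PER_WEEK)
    mean = by_week.mean(axis=0)
    var = by_week.var(axis=0)
    consistent = np.sqrt(var) < cv_max * mean
    return mean, var, consistent
```

    Comparing the standard deviation rather than the variance against the mean keeps both sides in requests per hour, so the cutoff does not depend on the tile's overall traffic level.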

  • 16/22

    Sample Location of Tiles

    Midtown (48th & 8th)

    Washington Square / Greenwich Village

    Houston

  • 17/22

    Tile Activity Counts

    [Figure: Tile Activity Counts, number of requests vs. time (hrs), Feb 24, 2009 to Sep 12, 2009, for sample tiles B73AC00B, B73A8C84 and B739BFBF (Midtown, Houston, Greenwich Village)]

    Raw request logs

    Bin requests by hour

    Discretize into 400 m square 'tiles'

    Hourly requests per tile for 3 sample tiles
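    The processing steps listed above (raw logs, hourly bins, tiles, per-tile series) can be sketched in plain Python. The sketch assumes each log record has already been reduced to a (Unix timestamp in seconds, tile ID) pair; the real log format is not described in the slides, and the function names and toy records are hypothetical.

```python
from collections import Counter


def hourly_counts(requests):
    """Count requests per (tile, hour) from (unix_seconds, tile_id) records."""
    counts = Counter()
    for ts, tile in requests:
        hour = int(ts // 3600)  # whole hours since the Unix epoch
        counts[(tile, hour)] += 1
    return counts


def tile_series(counts, tile, start_hour, end_hour):
    """Dense hourly series for one tile; hours with no requests count as 0."""
    return [counts.get((tile, h), 0) for h in range(start_hour, end_hour)]


# Toy records for two hypothetical tiles "A" and "B".
log = [(0, "A"), (1800, "A"), (3600, "A"), (3700, "B")]
counts = hourly_counts(log)
print(tile_series(counts, "A", 0, 3))  # [2, 1, 0]
```

    Keeping the counts sparse until a dense series is requested avoids storing zeros for the many tile-hours with no traffic; the explicit zero-fill matters for the frequency analysis, which assumes evenly spaced samples.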

  • 18/22

    Frequency Analysis

    Intensity in frequency domain

    Identify periodic patterns of activity

    Project future based on past dependable patterns

    Largest spike shows at the 24-hour cycle

    Fits intuition regarding daily cycle

    Not the only cycles that can be found

    A normalized frequency of 0.5 cycles per hour is the highest frequency we can analyze, because the sample interval is 1 hour; it corresponds to the shortest resolvable period, 2 hours.

    0.04167 is 1/12th of the maximum frequency, i.e. a period of 24 hours.

    [Figure: Tile Spectra, intensity (×10^4) vs. normalized frequency (0 to 0.5) for tiles B73AC00B, B73A8C84 and B739BFBF, with 2 hr, 4 hr, 8 hr, 12 hr, 16 hr, 24 hr, 1 wk and 1 mo periods marked]
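    A minimal sketch of this frequency analysis with NumPy's real FFT, assuming a tile's hourly series with no gaps. Because samples are 1 hour apart, `np.fft.rfftfreq(n, d=1.0)` returns frequencies in cycles per hour from 0 up to 0.5, the normalized axis used in the plot, and a daily cycle appears at 1/24 ≈ 0.04167. The function names and the synthetic series are assumptions, not data from the study.

```python
import numpy as np


def tile_spectrum(hourly):
    """Magnitude spectrum of an hourly request series.

    Returns (freqs, magnitude) with freqs in cycles per hour, 0 to 0.5.
    """
    x = np.asarray(hourly, dtype=float)
    x = x - x.mean()  # remove the constant (zero-frequency) level
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # sample spacing d = 1 hour
    return freqs, magnitude


def dominant_period_hours(hourly):
    """Period, in hours, of the strongest non-zero frequency component."""
    freqs, magnitude = tile_spectrum(hourly)
    k = 1 + int(np.argmax(magnitude[1:]))  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic 8-week series: strong daily cycle plus a weaker weekly one.
t = np.arange(8 * 168)
series = 50 + 30 * np.sin(2 * np.pi * t / 24) + 5 * np.sin(2 * np.pi * t / 168)
print(dominant_period_hours(series))
```

    Real series with missing hours would need zero-filling or interpolation first, and when cycles do not fit a whole number of times into the series, a window such as `np.hanning` reduces spectral leakage.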

  • 19/22

    24-Hour Frequency and Phase

    Left shows intensity at the 24-hour cycle

    All sample tiles show strong affinity for daily activity

    Right shows phase of peak

    Green tile peaks approximately 5 hours before the red tile every day

    Can help partition and classify tiles based on their phase

    [Figure, left: Tile Spectra, intensity (×10^4) vs. normalized frequency around the 24-hour component; right: Tile Phases, phase (cycles, -0.5 to 0.5) vs. normalized frequency; tiles B73AC00B, B73A8C84 and B739BFBF]

  • 20/22

    Tile Spectra

    Only one of our example tiles shows strong monthly periodicity: the Greenwich Village tile

    Pattern is consistent month after month

    [Figure: Tile Spectra, intensity (×10^4) vs. normalized frequency (0 to 4 × 10^-3) for tiles B73AC00B, B73A8C84 and B739BFBF, with 1 wk and 1 mo periods marked]

  • 21/22

    90th Percentile Baseline By Hour

    Outlier detection using training data

    Use 90th percentile hourly behavior per tile for a week

    Figure shows 7 (daily) peaks for all tiles

    Red curve peaks several hours behind green curve each day

    Calculated this data for all tiles in Manhattan

    [Figure: Baseline Week: 90th Percentile, requests vs. time (hours) for tiles B73AC00B, B73A8C84 and B739BFBF]

    Monday 19:00 EST
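    A minimal sketch of the 90th-percentile baseline and the outlier check, assuming each tile's training data is an hourly series of whole weeks that starts at the same hour of the week as the week being tested. The slides do not say how far above the baseline an hour must be to count as an outlier, so this sketch flags any hour that exceeds it; function names and the toy data are hypothetical.

```python
import numpy as np

HOURS_PER_WEEK = 168


def weekly_baseline(training, q=90):
    """q-th percentile of a tile's training counts for each hour of the week."""
    x = np.asarray(training, dtype=float)
    weeks = len(x) // HOURS_PER_WEEK
    by_week = x[: weeks * HOURS_PER_WEEK].reshape(weeks, HOURS_PER_WEEK)
    return np.percentile(by_week, q, axis=0)


def outlier_hours(week, baseline):
    """Hour-of-week indices where a new week's counts exceed the baseline."""
    return np.flatnonzero(np.asarray(week, dtype=float) > baseline)


# Ten identical training weeks, then a test week with a spike at hour 19.
training = np.tile(np.arange(HOURS_PER_WEEK, dtype=float), 10)
baseline = weekly_baseline(training)
week = np.arange(HOURS_PER_WEEK, dtype=float)
week[19] += 50
print(outlier_hours(week, baseline))  # [19]
```

    A high percentile rather than the mean makes the baseline tolerant of ordinary busy hours, so only activity beyond what the tile usually reaches at that hour of the week gets flagged; the flagged hours could then be passed to the classification step.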

  • 22/22

    What's Next

    Explore other analysis techniques, e.g. Eigenplaces (Francesco Calabrese, Carlo Ratti 2009)

    Systematize processing and classification

    Determine real-world activities associated with virtual analysis

    Push towards real-time analysis