- 1. Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti
A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a
Location Predictor on Trajectory Pattern Mining. KDD 2009
Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa)
www-kdd.isti.cnr.it
2.
- Wireless network infrastructures are the nerves of our
territory
- Besides offering their services, they gather highly
informative traces about human mobile activities
- Miniaturization, wearability, pervasiveness will produce traces
of increasing
3.
- From the analysis of the traces of our mobile phones it is
possible to reconstruct our mobile behaviour, the way we
collectively move
- This knowledge may help us improve decision-making in many
mobility-related issues:
  - Planning traffic and public mobility systems in metropolitan
  areas
  - Planning physical communication networks
  - Forecasting traffic-related phenomena
  - Organizing logistics systems
4.
5.
- Predicting the next location of a trajectory can improve a
large set of services such as:
- Location-based advertising.
[Figure: three locations marked "?" with values .4, .8 and .35]
6.
- How to realize this idea:
- Extract patterns from all the available movements in a certain
area instead of from the individual history of an object;
- Use these local movement patterns as predictive rules;
- Build a prediction tree as a global model.
[Figure labels: Trajectory dataset, Local patterns, Prediction Tree]
7.
- Select the set of interesting trajectories
- Extract T-Patterns (a set of local models)
- Merge T-Patterns (global model)
- Use the condensed model as predictor
[Diagram labels: Validation, Evaluation]
8.
- The local pattern we use is the T-Pattern. It describes the
common behavior of a group of users in space and time.
F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory
pattern mining. KDD 2007: 330-339.
9.
- Generating all rules from each T-pattern and using them to build
a classifier is too expensive.
[Figure: T-Patterns 1, 2 and 3, each shown with rules R1 to R4]
10.
- To avoid generating the rules, the T-Pattern set is organized as
a prefix tree.
- For each node v:
  - Id identifies the node v
  - Region is a spatial component of the T-Pattern
  - Support is the support of the T-Pattern
  - [a,b] corresponds to the time interval of the T-Pattern
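The prefix-tree organization above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `Node`, `new_tree` and `insert` are hypothetical, a T-Pattern is assumed to be a list of (time interval, region) steps, two steps are merged only when interval and region are exactly equal, and a shared node keeps the largest support seen. The slides do not state the actual merge rule, so all of these are assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]         # [a, b]: transition-time interval
Step = Tuple[Optional[Interval], str]  # (interval leading to the region, region id)

_ids = count()  # hypothetical id generator for nodes


@dataclass
class Node:
    id: int
    region: Optional[str]  # None for the root
    support: float
    children: Dict[Step, "Node"] = field(default_factory=dict)


def new_tree() -> Node:
    """Create an empty tree (a root without region)."""
    return Node(id=next(_ids), region=None, support=0.0)


def insert(root: Node, pattern: List[Step], support: float) -> None:
    """Insert one T-Pattern; shared prefixes reuse existing nodes."""
    node = root
    for step in pattern:
        child = node.children.get(step)
        if child is None:
            child = Node(id=next(_ids), region=step[1], support=support)
            node.children[step] = child
        else:
            # Assumption: a shared node keeps the largest support seen.
            child.support = max(child.support, support)
        node = child


root = new_tree()
insert(root, [(None, "A"), ((5, 10), "B")], support=0.3)
insert(root, [(None, "A"), ((5, 10), "C")], support=0.5)
print(len(root.children))                      # one shared prefix node for A
print(len(root.children[(None, "A")].children))  # two branches: B and C
```

Both patterns share the prefix A, so the tree stores that region once and branches only where the patterns diverge, which is what removes the need to enumerate every rule.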
11.
How to compute the Best Match?
[Figure labels: Best Match, Prediction]
12.
- The spatio-temporal distance is computed between the trajectory
segment (bounded in time using the previous transition time) and
the current node of the path.
- Case a: the trajectory segment intersects the region of the node
- Case b: the enlarged trajectory segment intersects the region
- Case c: the enlarged trajectory segment doesn't intersect the
region
Where th_t is the time tolerance window defined by the user.
13.
- The path score is the aggregation of all punctual scores along
a path.
- The Best Match is the path having:
  - at least one admissible prediction.
[Figure: example path with transition times 10 min, 15 min, 8 min,
10 min, 11 min, 16 min; punctual scores 1, .58 and .8; path score .79]
14.
- Average generalizes distances between the trajectory and each
node
- Sum is based on the concept of depth
- Max is the optimistic one: the best punctual score is selected
as path score
- Context-dependent aggregations can take into consideration other
aspects of the problem.
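A small sketch of the three aggregations, using the punctual scores 1, .58 and .8 from the example path. The function name `path_score` and the string keys are illustrative choices, not names from the original system.

```python
from statistics import mean

# Hypothetical mapping from aggregation name to function.
AGGREGATIONS = {
    "avg": mean,  # generalizes over all nodes of the path
    "sum": sum,   # grows with the depth of the path
    "max": max,   # optimistic: best punctual score wins
}


def path_score(punctual_scores, agg="avg"):
    """Aggregate the punctual scores collected along one path."""
    if not punctual_scores:
        raise ValueError("a path needs at least one punctual score")
    return AGGREGATIONS[agg](punctual_scores)


scores = [1.0, 0.58, 0.8]
print(round(path_score(scores, "avg"), 2))  # 0.79, the path score in the example
print(round(path_score(scores, "sum"), 2))  # 2.38
print(path_score(scores, "max"))            # 1.0
```

With Average, the three punctual scores give the .79 path score shown in the example; Sum rewards longer paths, while Max ignores every node except the best one.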
15.
- The WhereNext algorithm can be tuned using its parameters:
  - th_t: time window tolerance
  - th_s: space window tolerance
  - th_score: minimum prediction score threshold
  - th_agg: the aggregation function used to compute the path
  score (Avg, Sum or Max)
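As an illustration of how these parameters could be grouped and how th_score could filter candidates, here is a minimal sketch. `WhereNextParams` and `choose_prediction` are hypothetical names, and the rule "return the region of the highest-scoring path that reaches th_score" is an assumption about how the threshold is applied, not a description of the actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WhereNextParams:
    th_t: float          # time window tolerance
    th_s: float          # space window tolerance
    th_score: float      # minimum prediction score threshold
    th_agg: str = "avg"  # aggregation for the path score: avg, sum or max


def choose_prediction(
    candidates: List[Tuple[float, str]], params: WhereNextParams
) -> Optional[str]:
    """candidates: (path score, predicted region) pairs.

    Assumption: a prediction is admissible when its path score reaches
    th_score; None means that no prediction is made.
    """
    admissible = [c for c in candidates if c[0] >= params.th_score]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c[0])[1]


params = WhereNextParams(th_t=2.0, th_s=100.0, th_score=0.5)
print(choose_prediction([(0.79, "R1"), (0.40, "R2")], params))  # R1
```

Raising th_score makes the predictor answer less often but only when the match is strong, which is the trade-off between accuracy and prediction rate.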
16.
- It is very hard to tell which set of T-patterns is the best one
to build our model:
  - a big set of T-patterns: very slow prediction
  - a small set of T-patterns: gaps in coverage
- For this reason we have defined a way to measure the prediction
power of a T-Pattern set.
17.
- An evaluation function is defined to estimate the predictive
power of a T-Pattern set.
  - SpatialCoverage: the space coverage of the regions contained
  in the T-Pattern set;
  - DatasetCoverage: measures how much the T-Pattern set
  represents the trajectories;
  - RegionSeparation: the precision of the regions in the
  T-Pattern set.
[Figure: Model 1 and Model 2, testing the a priori evaluation]
18.
You are here
19.
- The results are evaluated using the following measures:
  - Accuracy: the number of correctly predicted locations (in space
  and time) divided by the total number of trajectories to be
  predicted.
  - Average Error: the average distance between the real
  trajectories in the predicted interval and the region
  predicted.
  - Prediction rate: the number of trajectories which have a
  prediction divided by the total number of trajectories to be
  predicted.
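The three measures can be computed as in the sketch below. The encoding of the outcomes (True for a correct prediction, False for a wrong one, None when no prediction is made) and the function names are illustrative assumptions; how the distance for the average error is measured is left to the caller.

```python
from typing import List, Optional


def accuracy(outcomes: List[Optional[bool]]) -> float:
    """Correct predictions divided by all trajectories to be predicted."""
    if not outcomes:
        raise ValueError("no trajectories to evaluate")
    return sum(1 for o in outcomes if o is True) / len(outcomes)


def prediction_rate(outcomes: List[Optional[bool]]) -> float:
    """Trajectories that received a prediction divided by all trajectories."""
    if not outcomes:
        raise ValueError("no trajectories to evaluate")
    return sum(1 for o in outcomes if o is not None) / len(outcomes)


def average_error(distances: List[float]) -> float:
    """Mean distance between real trajectory and predicted region."""
    if not distances:
        raise ValueError("no distances to average")
    return sum(distances) / len(distances)


outcomes = [True, False, None, True]  # four test trajectories
print(accuracy(outcomes))         # 0.5
print(prediction_rate(outcomes))  # 0.75
```

Note that both accuracy and prediction rate use the same denominator, so a trajectory without a prediction lowers accuracy as well.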
[Figure labels: Original, Cut, Predicted Location, Error]
20.
- We used a real-life GPS dataset obtained from 17,000 vehicles in
the urban area of the city of Milan.
Training set: 4000 trajectories between 7 am and 10 am on
Wednesday
Test set: 500 trajectories between 7 am and 10 am on
Thursday.
21.
[Plot: Average Error vs th_space]
22.
[Plot: Single Users, Accuracy and Prediction rate]
23.
- A visual example of the application on Milan mobility data. The
context is traffic management and we want to predict how the
traffic will move in the city center.
- We have built a predictor on a good set of T-patterns
which include the city gates of Milan.
Part of the GeoPKDD integrated platform. F. Giannotti, D.
Pedreschi, et al. GeoPKDD: Geographic privacy-aware knowledge
discovery and delivery (European project), 2008.
24.
- A new technique to predict the next locations of a trajectory
based on the previous movements of all the objects, without
considering any information about the users.
- The time information is used not only to order the events but is
intrinsically embedded in the T-Patterns used to build the
Prediction tree.
- The user can tune the method to obtain a good accuracy and
prediction rate.
- We are experimenting with the method in real-world
applications.
25.
26.
[Figure labels: Trajectories Dataset, Regions of Interest, T-PATTERNS]
27.
28.
- The exact same spatial location (x,y) almost never occurs
twice
- The exact same transition times usually do not occur twice
- Solution: allow approximation
  - a notion of spatial neighborhood
  - a notion of temporal tolerance
29.
- Two points match if one falls within a spatial neighborhood
N() of the other
- Two transition times match if their temporal difference is
within the temporal tolerance
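A minimal sketch of the two approximate matching tests. The slides leave the neighborhood N() general; here it is assumed to be a circle of radius `eps`, and `tau` is a hypothetical name for the temporal tolerance.

```python
import math


def points_match(p, q, eps):
    """True if q lies in the neighborhood of p.

    Assumption: the neighborhood is a circle of radius eps around p.
    """
    return math.dist(p, q) <= eps


def transitions_match(dt1, dt2, tau):
    """True if two transition times differ by at most the tolerance tau."""
    return abs(dt1 - dt2) <= tau


print(points_match((0.0, 0.0), (3.0, 4.0), eps=5.0))  # True: distance is 5
print(transitions_match(10.0, 13.0, tau=2.0))         # False: difference is 3
```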
30.
31.
32.
- T-pattern mining can be mapped to a density estimation problem
over R^(3n-1)
  - 2 dimensions for each (x,y) in the pattern (2n)
  - 1 dimension for each transition (n-1)
  - mapping each sub-sequence of n points of each input trajectory
  to R^(3n-1)
  - drawing an influence area for each point (composition of the
  spatial neighborhood N() and the temporal tolerance)
- Too computationally expensive, heuristics needed
- Our solution: a combination of sequential pattern mining and
density-based clustering
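To make the R^(3n-1) mapping concrete, the sketch below embeds a sub-sequence of n points as 2n coordinates followed by n-1 transition times. The names `embed` and `embed_all` are hypothetical, and trajectories are assumed to be lists of (x, y, t) tuples ordered by time. Enumerating every order-preserving sub-sequence, as done here, also shows why the direct approach is too expensive.

```python
from itertools import combinations


def embed(points):
    """Map n points (x, y, t) to a vector with 3n-1 components."""
    if not points:
        raise ValueError("need at least one point")
    coords = [c for (x, y, _) in points for c in (x, y)]  # 2n values
    transitions = [t2 - t1                                # n-1 values
                   for (_, _, t1), (_, _, t2) in zip(points, points[1:])]
    return coords + transitions


def embed_all(trajectory, n):
    """Embed every order-preserving sub-sequence of n points."""
    return [embed(sub) for sub in combinations(trajectory, n)]


trajectory = [(0, 0, 0), (1, 2, 5), (3, 4, 9)]
print(embed(trajectory[:2]))          # [0, 0, 1, 2, 5]
print(len(embed_all(trajectory, 2)))  # 3 sub-sequences, each in R^5
```

The number of sub-sequences grows as the binomial coefficient of trajectory length over n, so even short trajectories produce many points in the embedded space.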