Computational Movement Analysis Lecture 2: Clustering Joachim Gudmundsson
Feb 23, 2016
Fundamental tools: clustering
Clustering: Group similar objects into clusters.
Clustering: Group similar (sub)curves into clusters. Similarity measure: Fréchet distance
Question: Do we need any constraints on a cluster? Constraints on subcurves in a cluster?
Aim: Cluster subcurves
Cluster of subcurves
Subtrajectory clustering
Recall: Fréchet Distance
Fréchet Distance measures the similarity of two curves.
Dog walking example:
- A person is walking a dog (the person on one curve and the dog on the other).
- Both are allowed to control their speeds but not allowed to go backwards!
- Fréchet distance of the curves: the minimal leash length necessary for both to walk the curves from beginning to end.
Input: Two polygonal chains P = p1, … , pn and Q = q1, … , qm in R^d.
The Fréchet distance between P and Q is:
$$\delta_F(P,Q) = \inf_{\alpha,\beta} \max_{t \in [0,1]} \lVert \alpha(t) - \beta(t) \rVert$$
where α: [0,1] → P and β: [0,1] → Q range over all continuous non-decreasing reparametrizations.
Note that α(0)=p1, α(1)=pn, β(0)=q1 and β(1)=qm.
Well-suited for the comparison of curves since it takes the continuity of the curves into account.
Decision algorithm: compute path
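Algorithm:
1. Compute the free space diagram: mn cells, O(mn) time.
2. Compute an xy-monotone path (non-decreasing in both x and y) from (q1,p1) to (qm,pn): build the network in O(mn) time, then find a path in O(mn) time.
[Figure: free space diagram of P and Q with a monotone path from (q1,p1) to (qm,pn)]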
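The decision step can be sketched in a few lines for the discrete Fréchet distance, where the free space diagram collapses to a boolean grid over vertex pairs. This is a simplification, not the continuous algorithm from the slides: the function name `discrete_frechet_decision` and the vertex-only model are assumptions made for illustration.

```python
from math import dist


def discrete_frechet_decision(P, Q, r):
    """Return True if the discrete Frechet distance of P and Q is at most r.

    P and Q are non-empty lists of points (tuples of coordinates).
    """
    n, m = len(P), len(Q)
    # free[i][j]: vertices P[i] and Q[j] are within distance r ("free" cell).
    free = [[dist(p, q) <= r for q in Q] for p in P]
    # reach[i][j]: cell (i, j) can be reached from (0, 0) by a path that
    # never decreases i or j and only visits free cells.
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if not free[i][j]:
                continue
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = (
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n - 1][m - 1]


P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet_decision(P, Q, 1.0))  # leash of length 1 suffices
print(discrete_frechet_decision(P, Q, 0.9))  # leash of length 0.9 does not
```

Both the grid and the reachability pass take O(mn) time, matching the bound stated for the continuous decision algorithm.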
Cluster
Input: A polygonal curve T, an integer m>1 and a distance d.
Cluster: m subcurves T1, … , Tm of T with distance at most d between any two subcurves.
Constraints?
Constraint 1: subcurves are pairwise disjoint
More constraints?
For a given d there can be an infinite number of clusters.
Constraint 2: cluster has to be of maximal “length”
Decision Problem
Given a curve T, a subcurve cluster SC(m,l,d) of T consists of at least m subcurves T1, … , Tm of T such that:
1. the subcurves are pairwise disjoint,
2. the distance between any two subcurves is at most d, and
3. at least one subcurve has length l.
The length of a subcurve cluster is assumed to be maximal.
Decision Problem
Given a trajectory T, a subtrajectory cluster SC(m,l,d) of T consists of at least m subtrajectories T1, … , Tm of T such that:
1. the subtrajectories are pairwise disjoint,
2. the distance between any two subtrajectories is at most d, and
3. at least one subtrajectory has length l.
The length of a subtrajectory cluster is assumed to be maximal.
Problem
Decision version: Subtrajectory cluster SC(m,l,d)
Given a trajectory T, is there a subtrajectory cluster with parameters m, l and d?
Optimisation versions: SC(m,max,d) – maximise length of cluster
Hardness results
Theorem 1: Finding any approximation of the SC(m,max,d) problem is 3SUM-hard.
Theorem 2: The decision problem SC(m,l,d) is NP-complete.
Theorem 3: The problem of computing a (2-ε)-distance approximation of the SC(m,max,d)-problem is NP-hard.
[Gudmundsson & van Kreveld’08]
Proof idea for Theorem 2: reduction from MaxClique.
MaxClique: Is there a clique of size k in a given graph G=(V,E)?
[Figure: a graph containing a clique of size 4]
Longest subtrajectory cluster: NP-complete
Problem: SC(m,l=n,d).
[Figure: a MaxClique instance on the vertices a, b, c, d, e and the trajectory constructed from it, whose parts are labelled with vertex groups such as “a | b,c,d | e” and “b | a,c,e | d”]
SC(m,l=n,d) exists ⇔ clique of size m in G. The problem is as hard as MaxClique!
Hardness results
Theorem 3: The problem of computing a (2-ε)-distance approximation of the SC(m,max,d)-problem is NP-hard.
Corollary 1: The problem of computing a (2-ε)-distance approximation of SC(max, l, r), for any constant 0 < ε < 1, is at least as hard as approximating MaxClique.
Can we find a 2-distance approximation in polynomial time?
Fréchet distance between m curves
Input: Set of m polygonal curves F = {F1, …, Fm} with |Fi| = ni
The Fréchet distance of F can be computed by computing the Fréchet distance between every pair of curves.
Time: $O\left(\sum_{i,j} n_i n_j \log(n_i n_j)\right)$
If |Fi| = n/m for all i, there are O(m^2) pairs, each costing O((n/m)^2 log(n/m)), so the total is O(n^2 log(n/m)).
Observation: Given F1, F2 and F3, we have: δ_F(F1,F3) ≤ δ_F(F1,F2) + δ_F(F2,F3).
[Dumitrescu & Rote’04]
Can we use this observation to get an approximation?
[Figure: three curves; two pairs are at distance a and b, so the third pair is at distance at most a+b]
Idea: Select a representative curve F1 of F.
Compute the maximum Fréchet distance D between F1 and all other curves in F.
Observation: This gives a 2-approximation: D ≤ δ_F(F) ≤ 2D.
Time: $O\left(\sum_{i} n_1 n_i \log(n_1 n_i)\right)$
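A small sketch of the representative-curve idea, using the discrete Fréchet distance as a stand-in for the continuous one (it also satisfies the triangle inequality). The function names are illustrative, and the "Fréchet distance of F" is taken here to be the maximum over all pairs, which is an assumption about the intended definition.

```python
from math import dist


def discrete_frechet(P, Q):
    """Discrete Frechet distance between two non-empty point lists."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def representative_bound(curves):
    """D: largest distance from the representative curves[0] to any other curve."""
    rep = curves[0]
    return max(discrete_frechet(rep, c) for c in curves[1:])


def all_pairs_distance(curves):
    """Exact value for comparison: largest distance over all pairs."""
    return max(
        discrete_frechet(a, b)
        for i, a in enumerate(curves)
        for b in curves[i + 1:]
    )


# Three parallel curves at heights 0, 1 and -1; the first is the representative.
curves = [[(x, y) for x in range(3)] for y in (0, 1, -1)]
D = representative_bound(curves)
exact = all_pairs_distance(curves)
print(D, exact, D <= exact <= 2 * D)
```

The representative bound needs only m-1 distance computations instead of one per pair, at the price of the factor 2.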
Decision algorithm: compute path
Recall: Whether the Fréchet distance between two curves P and Q is at most r can be decided in O(mn) time.
The Fréchet distance between two polygonal curves P and Q can be computed in O(mn log mn) time using parametric search.
[Figure: free space diagram of P and Q with a path from (q1,p1) to (qm,pn)]
Recall the problem
Given a trajectory T, a subtrajectory cluster SC(m,l,d) of T consists of at least m subtrajectories T1, … , Tm of T such that:
1. the subtrajectories are pairwise disjoint,
2. the distance between any two subtrajectories is at most d, and
3. at least one subtrajectory has length l.
Constraint: For simplicity we will assume that all subtrajectories in a cluster have to start and end at a vertex.
Input: A trajectory T with n points, an integer m>1 and a real value d>0.
Output: SC(m,max,d)
Idea: Create a free space diagram describing the distance between T and T.
Free space diagram of T
[Figure: free space diagram of T against itself, with T on both axes]
[Figure: three subtrajectories A, B and C marked in the diagram]
D(A,C) ≤ d and D(B,C) ≤ d imply D(A,B) ≤ 2d.
C: representative trajectory
The length of the SC {A,B,C} is the length of the representative trajectory.
Approximation algorithm
1. Sweep the free space diagram from left to right with two vertical lines (L and R).
2. At each event point decide if there are m monotone curves between L and R:
a) If “yes” then move R to the right.
b) If “no” and R-L=1 then move R to the right.
c) If “no” and R-L>1 then move L to the right.
While sweeping, maintain a network of critical points.
[Figure sequence: the sweep lines L and R advancing across the free space diagram]
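The control flow of the sweep can be sketched as a two-pointer loop. The predicate `has_m_curves(L, R)` is a hypothetical oracle standing in for the path queries between the two lines, and the window width R-L in vertex indices is used here as a simple proxy for the length of the representative trajectory; both are assumptions for illustration.

```python
def sweep(n, has_m_curves):
    """Sweep lines L < R over the columns 0..n-1 of the free space diagram.

    has_m_curves(L, R) is a caller-supplied (hypothetical) oracle that says
    whether m monotone curves exist between columns L and R.
    Returns ((best_width, best_L, best_R), number_of_events).
    """
    L, R = 0, 1
    best = (0, None, None)
    events = 0
    while R < n:
        events += 1
        if has_m_curves(L, R):
            # Rule a): record the window and try to extend it.
            if R - L > best[0]:
                best = (R - L, L, R)
            R += 1
        elif R - L == 1:
            # Rule b): the window cannot shrink further, so advance R.
            R += 1
        else:
            # Rule c): shrink the window from the left.
            L += 1
    return best, events


# Toy oracle: pretend m curves exist whenever the window spans at most 3 columns.
best, events = sweep(10, lambda L, R: R - L <= 3)
print(best, events)
```

Every iteration moves L or R one step to the right, so the loop runs at most 2n times, which is where the O(n) bound on the number of event points comes from.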
Data structures
Number of event points? O(n)
Two types of events:
1. L moves to the right
2. R moves to the right
How to handle an event?
Decide if there are m non-overlapping xy-monotone paths between L and R.
Handle event
Start with the top-most corner u on R.
Find the top-most corner u’ on L that can be reached by an xy-monotone path P.
Observation: No point on R below u can reach a point on L above u’ with an xy-monotone path.
Next take the top-most corner v on R below u’. Find the top-most corner v’ on L that can be reached by an xy-monotone path.
Continue until:
1. m curves found, or
2. no more corners on R.
[Figure: sweep lines L and R with the path P between u’ on L and u on R, and a second path between v’ and v]
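The greedy selection above can be sketched as follows. Corners are represented only by their heights, and `top_reachable(u)` is a hypothetical path-query oracle that returns the height of the top-most corner on L connected to u by an xy-monotone path, or None if there is none; the path query itself is not implemented here.

```python
def greedy_paths(r_corners, top_reachable, m):
    """Greedily pick up to m non-overlapping paths between L and R.

    r_corners: heights of the corners on R.
    top_reachable: hypothetical oracle mapping a corner height u on R to the
        height u' of the top-most corner on L joined to u by an xy-monotone
        path, or None when no such corner exists.
    Returns a list of (u_prime, u) pairs, top-most path first.
    """
    paths = []
    limit = float("inf")  # the next corner on R must lie strictly below this
    for u in sorted(r_corners, reverse=True):
        if u >= limit:
            continue  # u is not below the previous u', so it would overlap
        u_prime = top_reachable(u)
        if u_prime is None:
            continue  # no monotone path ends at u
        paths.append((u_prime, u))
        limit = u_prime
        if len(paths) == m:
            break  # stopping rule 1: m curves found
    return paths  # stopping rule 2: corners on R exhausted


# Toy oracle given as a lookup table (heights are made-up values).
table = {9: 7, 8: 6, 6: 4, 5: None, 3: 1}
print(greedy_paths(table.keys(), table.get, 2))
print(greedy_paths(table.keys(), table.get, 5))
```

Taking the top-most reachable corner on L keeps each path's vertical extent as small as possible, which leaves the most room for the remaining paths.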
Path Query in the Free Space diagram
How do we perform a path query?
Recall querying for a path in lecture 1: O(n^2) time per query.
In the worst case the algorithm performs n path queries.
O(n) events, n points on R. Total: O(n^3 w) time and O(nw) space, where w = max (R-L).
Can it be improved?
O(n^2 w) time and O(nw) space
Extension: The algorithm can be modified to handle the case when only the “reference” trajectory needs to start and end at a vertex.
Approximation algorithm
Theorem: A 2-distance approximation of the SC(m,max,d) problem can be computed in O(n^2 + nmw) time and O(nw) space using the discrete Fréchet distance.
Theorem: A 2-distance approximation of the SC(m,max,d) problem can be computed in O(n^2 w) time using the continuous Fréchet distance if the reference trajectory starts and ends at a vertex.
Theorem: A 2-distance approximation of the SC(m,max,d) problem can be computed in time O(n^3 m 2(n/m) (log^2 n + m)) using the continuous Fréchet distance.
[Joint work: Buchin, Buchin, Löffler and Luo’10]
Experimental Results (continuous!)
Note: Continuous model; input data can be simplified!
[Joint work with Nacho Valladares’13]
i5-200 CPU with an Nvidia GTX 580
Open Problems
1. Can we cluster faster?
2. Can a c-approximate Fréchet clustering be computed faster?
3. Can we cluster faster for special cases?
4. What should we report?
5. Cluster using other distance measures? For example using [Sankararaman et al. 2013]?
References
• K. Buchin, M. Buchin, J. Gudmundsson, M. Löffler and J. Luo. Detecting Commuting Patterns by Clustering Subtrajectories. International Journal of Computational Geometry and Applications, 2011.
• N. Valladares and J. Gudmundsson. A GPU approach to subtrajectory clustering using the Fréchet distance. ACM SIGSPATIAL, 2012.
• A. Dumitrescu and G. Rote. On the Fréchet distance of a set of curves. Proceedings of the Sixteenth Canadian Conference on Computational Geometry, 2004.
• S. Sankararaman, P. K. Agarwal, T. Mølhave, J. Pan and A. P. Boedihardjo. Model-driven matching and segmentation of trajectories. ACM SIGSPATIAL, 2013.