Better Visualization of Trips Through Agglomerative Clustering
Anbarasan S
February 2, 2016
1. Problem Statement:
Visualizing flow/mobility data on a map typically gives a cluttered view, even for small datasets. Hence it is difficult to derive any insights or make decisions from it.
2. Solution:
Devise a methodology to group or aggregate similar flows.
3. Methodology:
Step 1: K-Nearest Neighbours
1.a. Find the k nearest neighbours of the origin location of each trip/flow.
1.b. Find the k nearest neighbours of the destination location of each trip/flow.
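Step 1 can be sketched in base R as below. This is a minimal illustration, not the author's code: it assumes origins and destinations are planar (projected) x/y coordinates so that Euclidean distance is meaningful, it builds a brute-force distance matrix, and it counts each flow as its own first neighbour (the source does not say whether a flow belongs to its own k-NN set). The helper `knn_flows` and the toy coordinates are hypothetical.

```r
# Hypothetical helper (not the author's code): for each flow, return the
# indices of its k nearest flows, measured between the given points
# (origins or destinations). The flow itself is counted as its own first
# neighbour, which is an assumption. Brute force: O(n^2) distance matrix.
knn_flows <- function(coords, k) {
  stopifnot(k >= 1, k <= nrow(coords))
  d <- as.matrix(dist(coords))            # pairwise Euclidean distances
  lapply(seq_len(nrow(d)), function(i) {
    others <- setdiff(order(d[i, ]), i)   # all other flows, nearest first
    c(i, others[seq_len(k - 1)])          # the flow itself + its k - 1 nearest
  })
}

# Toy data: 4 flows with planar (x, y) origins and destinations
origins      <- matrix(c(0, 0,   0, 1,   5, 5,  5, 6), ncol = 2, byrow = TRUE)
destinations <- matrix(c(10, 10, 10, 11, 20, 0, 21, 0), ncol = 2, byrow = TRUE)

knn_o <- knn_flows(origins, k = 2)       # 1.a: neighbours of each origin
knn_d <- knn_flows(destinations, k = 2)  # 1.b: neighbours of each destination
print(knn_o[[1]])                        # flow 1's origin neighbourhood: 1 2
```

The full distance matrix keeps the sketch dependency-free; for a dataset the size of a taxi-trip log, a spatial index (for example from the RANN or FNN packages) would be the more practical choice.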
Step 2: Contiguous Flows
Two flows (trips) are said to be contiguous if and only if they satisfy both of the following conditions:
1. The k-NN of Origin1 (Trip1) overlaps with the k-NN of Origin2 (Trip2).
2. The k-NN of Destination1 (Trip1) overlaps with the k-NN of Destination2 (Trip2).
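The two conditions can be checked with set intersections, reading "overlaps" as "has a non-empty intersection". This is a sketch under that assumption; the k-NN lists are hypothetical toy values standing in for the output of Step 1, and `is_contiguous` is an illustrative name.

```r
# Hypothetical k-NN lists (k = 2) for 4 flows, as produced by Step 1;
# each flow is counted as its own first neighbour (an assumption).
knn_o <- list(c(1, 2), c(2, 1), c(3, 4), c(4, 3))
knn_d <- list(c(1, 2), c(2, 3), c(3, 2), c(4, 3))

# Two flows are contiguous iff their origin neighbourhoods overlap AND
# their destination neighbourhoods overlap.
is_contiguous <- function(a, b, knn_o, knn_d) {
  length(intersect(knn_o[[a]], knn_o[[b]])) > 0 &&
    length(intersect(knn_d[[a]], knn_d[[b]])) > 0
}

is_contiguous(1, 2, knn_o, knn_d)  # TRUE: overlap at both ends
is_contiguous(2, 3, knn_o, knn_d)  # FALSE: destinations overlap, origins do not

# All contiguous pairs among the 4 flows
pairs <- combn(4, 2)
keep <- apply(pairs, 2, function(p) is_contiguous(p[1], p[2], knn_o, knn_d))
pairs[, keep, drop = FALSE]  # columns (1, 2) and (3, 4)
```

Flows 2 and 3 illustrate why both conditions are required: they end in the same area but start in different ones, so they are not contiguous.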
Step 3: Agglomerative Clustering
Flows are merged in an agglomerative (bottom-up) fashion, based on a distance measure defined by the number of nearest neighbours the two flows share:
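$$\mathrm{Dist}(\mathrm{Trip}_1, \mathrm{Trip}_2) = 1 - \frac{|\mathrm{KNN}(O_1) \cap \mathrm{KNN}(O_2)|}{k} \cdot \frac{|\mathrm{KNN}(D_1) \cap \mathrm{KNN}(D_2)|}{k}$$
O1, O2: origins of Trip1 and Trip2, respectively
D1, D2: destinations of Trip1 and Trip2, respectively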
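A value of Dist close to 0 indicates that the flows are very close. Since each shared fraction lies between 0 and 1, Dist never exceeds 1; it equals 1 when the two flows share no origin neighbours or share no destination neighbours, which means they cannot be clustered together. Hence Dist < 1 exactly when the two flows are contiguous in the sense of Step 2.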
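The distance and the clustering step can be sketched with base R's `hclust()`. The source does not name a linkage method or a cut-off, so complete linkage and a cut height just below 1 are assumptions here; with complete linkage, that cut guarantees every pair of flows inside a cluster has Dist < 1. The k-NN lists and `flow_dist_matrix` are hypothetical.

```r
# Hypothetical k-NN lists (k = 3) for 6 flows; each flow is counted as its
# own first neighbour (an assumption). Flows 1-3 and 4-6 start in two
# different areas.
k <- 3
knn_o <- list(c(1, 2, 3), c(2, 1, 3), c(3, 1, 2), c(4, 5, 6), c(5, 4, 6), c(6, 4, 5))
knn_d <- list(c(1, 2, 3), c(2, 1, 3), c(3, 2, 4), c(4, 5, 3), c(5, 4, 6), c(6, 5, 4))

# Dist(a, b) = 1 - (shared origin neighbours / k) * (shared destination neighbours / k)
flow_dist_matrix <- function(knn_o, knn_d, k) {
  n <- length(knn_o)
  m <- matrix(0, n, n)
  for (a in seq_len(n)) {
    for (b in seq_len(n)) {
      shared_o <- length(intersect(knn_o[[a]], knn_o[[b]]))
      shared_d <- length(intersect(knn_d[[a]], knn_d[[b]]))
      m[a, b] <- 1 - (shared_o / k) * (shared_d / k)
    }
  }
  m
}

m <- flow_dist_matrix(knn_o, knn_d, k)
round(m[1, 3], 3)  # 0.333: all origin neighbours shared, 2 of 3 destination neighbours
m[3, 4]            # 1: destinations overlap, but no origin neighbours are shared

# Agglomerative clustering. Complete linkage and the cut height just below 1
# are assumptions: together they ensure that every pair of flows inside a
# cluster has Dist < 1.
hc <- hclust(as.dist(m), method = "complete")
clusters <- cutree(hc, h = 0.999)
clusters  # 1 1 1 2 2 2
```

Single or average linkage could chain together flows that are not pairwise contiguous, so the linkage choice directly affects how strictly the "cannot be clustered together" rule is enforced.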
Step 4: Visualization
When the agglomerative clusters are projected onto a map, they give a far less cluttered view from which meaningful insights can be drawn.
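The source does not say how a cluster is drawn. One plausible approach, sketched here as an assumption, replaces each cluster with a single representative flow from its mean origin to its mean destination, drawn with a width proportional to its trip count. The trips and cluster labels are hypothetical toy values; on a real map these arrows would be drawn over a basemap rather than on blank axes.

```r
# Hypothetical trips with planar coordinates and the cluster label
# assigned to each trip in Step 3
trips <- data.frame(
  o_x = c(0, 1, 2, 10, 11, 12), o_y = c(1, 0, 2, 11, 10, 12),
  d_x = c(5, 6, 7, 0, 1, 2),    d_y = c(6, 5, 7, 20, 21, 22),
  cluster = c(1, 1, 1, 2, 2, 2)
)

# One representative flow per cluster: mean origin, mean destination
rep_flows <- aggregate(cbind(o_x, o_y, d_x, d_y) ~ cluster, data = trips, FUN = mean)
# Number of trips in each cluster (table() sorts by cluster, like aggregate())
rep_flows$n_trips <- as.vector(table(trips$cluster))
print(rep_flows)

# Draw one arrow per cluster, width proportional to its trip count,
# instead of one arrow per trip
plot(range(c(trips$o_x, trips$d_x)), range(c(trips$o_y, trips$d_y)),
     type = "n", xlab = "x", ylab = "y")
arrows(rep_flows$o_x, rep_flows$o_y, rep_flows$d_x, rep_flows$d_y,
       lwd = rep_flows$n_trips, length = 0.1)
```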
4. Implementation:
Dataset: Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015 Data Set
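Before Step 1 can run, each trip has to be reduced to an origin and a destination point. The sketch below assumes, as in the challenge's training CSV, that a trip's GPS trace is stored in a POLYLINE column as a list of [longitude, latitude] pairs; the first pair is taken as the origin and the last as the destination. `polyline_od` and the sample strings are hypothetical, and a JSON parser would be a sturdier choice than the regular expression used here.

```r
# Hypothetical helper: origin (first point) and destination (last point) of a
# trip, assuming POLYLINE strings of [longitude, latitude] pairs such as
# "[[-8.618643,41.141412],[-8.618499,41.141376]]".
polyline_od <- function(polyline) {
  nums <- as.numeric(regmatches(polyline, gregexpr("-?[0-9]+(\\.[0-9]+)?", polyline))[[1]])
  n <- length(nums)
  if (n < 2) {  # empty trace, e.g. "[]"
    return(c(o_lon = NA_real_, o_lat = NA_real_, d_lon = NA_real_, d_lat = NA_real_))
  }
  c(o_lon = nums[1], o_lat = nums[2], d_lon = nums[n - 1], d_lat = nums[n])
}

# Applied to a vector of POLYLINE strings: one row per trip
polylines <- c("[[-8.6,41.1],[-8.5,41.2],[-8.4,41.3]]", "[]")
od <- t(sapply(polylines, polyline_od, USE.NAMES = FALSE))
od
```

Trips with an empty trace come back as NA rows and can be dropped before computing nearest neighbours.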
library(dplyr)  # filter(), pull() and the %>% pipe
# i: FlowId of the flow being processed (set by the enclosing loop)
o_NN <- filter(origin_NN, FlowId == i) %>% pull(Neighbour_Flows)  # nearest neighbours of the given origin
d_NN <- filter(dest_NN, FlowId == i) %>% pull(Neighbour_Flows)    # nearest neighbours of the given destination
NN_matches <- o_NN %in% d_NN  # flows that are neighbours at both the origin and the destination

# Contiguous/nearest flows for a given flow:
# two flows are said to be contiguous if they are among the nearest
# neighbours of both the origin and the destination of the given flow
contiguous_flows <- o_NN[NN_matches]

# Distance between flows
# Arguments passed:
#   1. list of flows found to be contiguous to the given flow
#   2. flow_distance - function to calculate the distance between two flows
if (length(contiguous_flows) != 0) {
  snn_flow_distance <-