Solution Architect Jay Runkel @jayrunkel Time Series Data: Aggregations in Action
May 12, 2015
Agenda
• Review Traffic Use Case
• Review Schema Design
• Document Retention Model
• Aggregation Queries
• Map Reduce
• Hadoop
Use Case Review
We need to prepare for this
Develop a nationwide traffic monitoring system
Traffic sensors to monitor interstate conditions
• 16,000 sensors
• Measure at one minute intervals
  • Speed
  • Travel time
  • Weather, pavement, and traffic conditions
• Support desktop, mobile, and car navigation systems
What we want from our data
Charting and Trending
What we want from our data
Historical & Predictive Analysis
What we want from our data
Real Time Traffic Dashboard
Review Schema Design
Document Structure
{ _id: ObjectId("5382ccdd58db8b81730344e2"),
linkId: 900006,
date: ISODate("2014-03-12T17:00:00Z"),
data: [
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
...
],
conditions: {
status: "Snow / Ice Conditions",
pavement: "Icy Spots",
weather: "Light Snow"
}
}
Sample Document Structure
Compound, unique index identifies the individual document
{ _id: ObjectId("5382ccdd58db8b81730344e2"),
linkId: 900006,
date: ISODate("2014-03-12T17:00:00Z"),
data: [
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
...
],
conditions: {
status: "Snow / Ice Conditions",
pavement: "Icy Spots",
weather: "Light Snow"
}
}
Sample Document Structure
Saves an extra index
{ _id: "900006:14031217",
data: [
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
...
],
conditions: {
status: "Snow / Ice Conditions",
pavement: "Icy Spots",
weather: "Light Snow"
}
}
Sample Document Structure
Range queries: /^900006:1403/
Regex must be left-anchored & case-sensitive
{ _id: "900006:14031217",
data: [
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
{ speed: NaN, time: NaN },
...
],
conditions: {
status: "Snow / Ice Conditions",
pavement: "Icy Spots",
weather: "Light Snow"
}
}
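A minimal sketch of how such keys and prefix queries could be built. It assumes the _id layout is linkId:YYMMDDHH in UTC, which is inferred from the sample value 900006:14031217; the helper names makeDocId and prefixRegex are hypothetical and not part of the original deck.

```javascript
// Hypothetical helpers; the key layout linkId:YYMMDDHH (UTC) is an assumption
// inferred from the sample _id "900006:14031217".
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Build the _id of the hourly document that holds a given reading.
function makeDocId(linkId, date) {
  return (
    linkId + ":" +
    pad2(date.getUTCFullYear() % 100) +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    pad2(date.getUTCHours())
  );
}

// Left-anchored, case-sensitive regex so the _id index can serve the range scan.
function prefixRegex(linkId, datePrefix) {
  return new RegExp("^" + linkId + ":" + datePrefix);
}

const id = makeDocId(900006, new Date("2014-03-12T17:00:00Z"));
const march2014 = prefixRegex(900006, "1403");
console.log(id, march2014.test(id));
// In the shell this could be used as: db.linkData.find({ _id: march2014 })
```

Because the regex starts with ^ and has no case-insensitive flag, the query can walk a contiguous range of the _id index instead of scanning every key.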
Sample Document Structure
Pre-allocated, 60-element array of per-minute data
Advantages
1. In-place updates are efficient
2. Dashboards need only simple queries
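The pre-allocation and in-place update idea can be sketched roughly as follows. The helper names newHourDoc and minuteUpdate are hypothetical, and the collection name linkData is taken from the queries elsewhere in the deck; treat this as an illustration rather than the presenter's actual loader.

```javascript
// Hypothetical helpers illustrating the pre-allocated hourly document.
function newHourDoc(id) {
  const data = [];
  for (let minute = 0; minute < 60; minute++) {
    data.push({ speed: NaN, time: NaN }); // placeholder until the reading arrives
  }
  return { _id: id, data: data };
}

// Build the $set document that overwrites one minute slot in place.
function minuteUpdate(minute, speed, time) {
  const set = {};
  set["data." + minute + ".speed"] = speed;
  set["data." + minute + ".time"] = time;
  return { $set: set };
}

const doc = newHourDoc("900006:14031217");
const update = minuteUpdate(17, 52, 237);
console.log(doc.data.length, JSON.stringify(update));
// In the shell: db.linkData.update({ _id: doc._id }, update)
```

Since every slot already exists, each per-minute write only replaces values inside the array and does not grow the document.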
Dashboards
[Chart: single series, Mon Mar 10 2014 04:57 to Tue Mar 11 2014 21:59 (PDT), y-axis 0 to 70]
db.linkData.find({_id : /^20484087:2014031/})
Supporting Queries From Navigation Systems
Navigation System Queries
What is the average speed for the last 10 minutes on 50 upcoming road segments?
Current Real-Time Conditions
Last ten minutes of speeds and times
{ _id : "I-87:10656",
description : "NYS Thruway Harriman Section Exits 14A - 16",
update : ISODate("2013-10-10T23:06:37.000Z"),
speeds : [ 52, 49, 45, 51, ... ],
times : [ 237, 224, 246, 233, ... ],
pavement: "Wet Spots",
status: "Wet Conditions",
weather: "Light Rain",
averageSpeed: 50.23,
averageTime: 234,
maxSafeSpeed: 53.1,
location : {
"type" : "LineString",
"coordinates" : [
[ -74.056, 41.098 ],
[ -74.077, 41.104 ] ] }
}
Current Real-Time Conditions
Pre-aggregated metrics
Current Real-Time Conditions
Geo-spatially indexed road segment
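One way a navigation system might look up the segments along a planned route is a $geoIntersects query against the location field. The sketch below only builds the query document; it assumes a 2dsphere index on location, and the helper name routeQuery and the limit of 50 are illustrative choices, not something the deck specifies.

```javascript
// Hypothetical helper: build a query for segments that intersect a planned route.
// Assumes a 2dsphere index exists on the location field.
function routeQuery(routeCoordinates) {
  return {
    location: {
      $geoIntersects: {
        $geometry: { type: "LineString", coordinates: routeCoordinates },
      },
    },
  };
}

const query = routeQuery([
  [-74.056, 41.098],
  [-74.077, 41.104],
]);
// Return only the pre-aggregated metrics the navigation system needs.
const projection = { averageSpeed: 1, averageTime: 1 };
console.log(JSON.stringify(query));
// In the shell: db.linksAvg.find(query, projection).limit(50)
```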
db.linksAvg.update(
{"_id" : linkId},
{ "$set" : {"lUpdate" : date},
"$push" : {
"times" : { "$each" : [ time ], "$slice" : -10 },
"speeds" : {"$each" : [ speed ], "$slice" : -10}
}
})
Maintaining the current conditions
Each update pushes the new value onto the end of the array, and $slice: -10 keeps only the 10 most recent values, dropping the oldest
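The effect of $push with $each and a negative $slice can be modeled in plain JavaScript. This is only an in-memory illustration of the semantics; pushAndSlice and average are hypothetical helper names.

```javascript
// Plain-JavaScript model of { $push: { speeds: { $each: [speed], $slice: -10 } } }.
function pushAndSlice(array, value, keep) {
  const next = array.concat([value]); // $each: [ value ] appends at the end
  return next.slice(-keep);           // $slice: -keep retains the newest entries
}

function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

let speeds = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59];
speeds = pushAndSlice(speeds, 60, 10);
console.log(speeds, average(speeds));
```

The array never grows past 10 elements, so the document stays a fixed size and averages over it are always "last ten minutes" figures.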
Document Retention
Document retention
• Doc per hour: retained for 2 weeks
• Doc per day: retained for 2 months
• Doc per month: retained for 1 year
Rollup – 1 day
// daily document
// retained for 2 months
{ _id: "link:date",
  // 24 element array
  hourly: [
    { speed: { sum: , count: }, time: { sum: , count: } },
    { speed: { sum: , count: }, time: { sum: , count: } }
  ]
}
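A rollup from an hourly document into one entry of the daily hourly array might look like the sketch below. The function name rollupHour is hypothetical, and skipping NaN placeholders is an assumption about how unfilled minutes should be treated.

```javascript
// Hypothetical rollup: fold one hourly document's per-minute readings into
// the { sum, count } pairs stored in the daily document's hourly array.
function rollupHour(hourDoc) {
  const entry = { speed: { sum: 0, count: 0 }, time: { sum: 0, count: 0 } };
  for (const sample of hourDoc.data) {
    if (!Number.isNaN(sample.speed)) {
      entry.speed.sum += sample.speed;
      entry.speed.count += 1;
    }
    if (!Number.isNaN(sample.time)) {
      entry.time.sum += sample.time;
      entry.time.count += 1;
    }
  }
  return entry;
}

const hourDoc = {
  _id: "900006:14031217",
  data: [
    { speed: 50, time: 230 },
    { speed: NaN, time: NaN }, // minute with no reading
    { speed: 54, time: 240 },
  ],
};
const entry = rollupHour(hourDoc);
console.log(entry, entry.speed.sum / entry.speed.count);
```

Storing sum and count rather than a ready-made average lets the daily entries be combined again into monthly documents without losing accuracy.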
Analysis With The Aggregation Framework
Pipelining operations
grep | sort | uniq
Piping command line operations
Pipelining operations
$match | $group | $sort
Piping aggregation operations
Stream of documents in, result document out
What is the average speed for a given road segment?
> db.linkData.aggregate(
    { $match: { "_id" : /^20484097:/ } },
    { $project: { "data.speed": 1, linkId: 1 } },
    { $unwind: "$data" },
    { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
  );
{ "_id" : 20484097, "ave" : 47.067650676506766 }
The pipeline, stage by stage
• $match: select documents on the target segment
• $project: keep only the fields we really need
• $unwind: loop over the array of data points
• $group: use the handy $avg operator
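To make the stages concrete, here is a rough plain-JavaScript model of the same $match, $project, $unwind, and $group steps running over an in-memory array. It is not how the server executes the pipeline, and the sample documents and the function name averageSpeed are made up for illustration.

```javascript
// Plain-JavaScript model of $match -> $project -> $unwind -> $group with $avg.
function averageSpeed(docs, linkId) {
  const prefix = new RegExp("^" + linkId + ":");
  const speeds = docs
    .filter((d) => prefix.test(d._id))       // $match on the _id prefix
    .map((d) => d.data.map((p) => p.speed))  // $project: keep data.speed only
    .reduce((all, s) => all.concat(s), []);  // $unwind: one entry per data point
  const sum = speeds.reduce((a, b) => a + b, 0);
  return { _id: linkId, ave: sum / speeds.length }; // $group with $avg
}

// Made-up sample documents: two hours for one link, one hour for another.
const sampleDocs = [
  { _id: "20484097:14031217", data: [{ speed: 40 }, { speed: 50 }] },
  { _id: "20484097:14031218", data: [{ speed: 60 }] },
  { _id: "900006:14031217", data: [{ speed: 10 }] },
];
const result = averageSpeed(sampleDocs, 20484097);
console.log(result);
```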
More Sophisticated Pipelines: average speed with variance
{ "$project" : {
    mean: "$meanSpd",
    spdDiffSqrd : {
      "$map" : {
        "input": {
          "$map" : {
            "input" : "$speeds",
            "as" : "samp",
            "in" : { "$subtract" : [ "$$samp", "$meanSpd" ] }
          }
        },
        as: "df",
        in: { $multiply: [ "$$df", "$$df" ] }
      }
    }
} },
{ $unwind: "$spdDiffSqrd" },
// one result per source document
{ $group: { _id: "$_id", mean: { $first: "$mean" }, variance: { $avg: "$spdDiffSqrd" } } }
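The nested $map stages compute squared deviations from the mean, and the final $avg turns them into a population variance. A small plain-JavaScript equivalent, with a hypothetical function name and made-up sample values, could look like this:

```javascript
// Plain-JavaScript equivalent of the two $map stages followed by $unwind and $avg.
function variance(speeds, mean) {
  const diffs = speeds.map((s) => s - mean);  // inner $map: $subtract
  const squared = diffs.map((d) => d * d);    // outer $map: $multiply
  const total = squared.reduce((a, b) => a + b, 0);
  return total / squared.length;              // $avg over the unwound values
}

const sampleSpeeds = [48, 52, 50, 50];
const sampleMean = 50;
const v = variance(sampleSpeeds, sampleMean);
console.log(v);
```

Note that dividing by the number of samples gives the population variance; a sample variance would divide by one less.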
Analysis With MapReduce
Historic Analysis
How do weather and road conditions affect traffic?
The Ask: what are the average speeds per weather, status, and pavement condition?
MapReduce
function map() {
  for( var i = 0; i < this.data.length; i++ ) {
    emit( this.conditions.weather, { speed : this.data[i].speed } );
    emit( this.conditions.status, { speed : this.data[i].speed } );
    emit( this.conditions.pavement, { speed : this.data[i].speed } );
  }
}
“Snow”, 34
“Icy spots”, 34
“Delays”, 34
MapReduce
Weather: “Rain”, speed: 44
Weather: “Rain”, speed: 39
Weather: “Rain”, speed: 46
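function reduce ( key, values ) {
  var result = { count : 0, speedSum : 0 };
  values.forEach( function( v ) {
    // v is either a raw { speed } emit or the result of an earlier reduce call
    result.count += ( v.count === undefined ) ? 1 : v.count;
    result.speedSum += ( v.speedSum === undefined ) ? v.speed : v.speedSum;
  });
  return result;
}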
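The map and reduce flow can be tried out with a small in-memory stand-in. This is a sketch, not the MongoDB implementation: runMapReduce is a hypothetical driver, emit is passed in as a parameter instead of being a global, and, unlike the slide's map, this map emits values already in the { count, speedSum } shape so that reduce can be re-applied to its own output.

```javascript
// In-memory stand-in for mapReduce; a hypothetical driver, not the server's logic.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit)); // "this" is the current document
  const results = [];
  groups.forEach((values, key) => {
    results.push({ _id: key, value: reduce(key, values) });
  });
  return results;
}

// Assumption: emit the same shape that reduce returns (differs from the slide).
function map(emit) {
  for (const sample of this.data) {
    emit(this.conditions.weather, { count: 1, speedSum: sample.speed });
  }
}

function reduce(key, values) {
  const result = { count: 0, speedSum: 0 };
  values.forEach((v) => {
    result.count += v.count;
    result.speedSum += v.speedSum;
  });
  return result;
}

// Made-up documents reusing the speeds shown on the slides (44, 39, 46).
const rainDocs = [
  { conditions: { weather: "Rain" }, data: [{ speed: 44 }, { speed: 39 }] },
  { conditions: { weather: "Rain" }, data: [{ speed: 46 }] },
];
const results = runMapReduce(rainDocs, map, reduce);
console.log(JSON.stringify(results));
```

Keeping the emitted and reduced shapes identical matters because reduce may be called more than once for the same key, with earlier reduce results mixed into the values array.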
Results
results: [
  { "_id" : "Generally Clear and Dry Conditions", "value" : { "count" : 902, "speedSum" : 45100 } },
  { "_id" : "Icy Spots", "value" : { "count" : 242, "speedSum" : 9438 } },
  { "_id" : "Light Snow", "value" : { "count" : 122, "speedSum" : 7686 } },
  { "_id" : "No Report", "value" : { "count" : 782, "speedSum" : NaN } }
]
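The results hold counts and sums, so the average speed per condition still has to be derived. One possible post-processing step is sketched below using the figures from the results above; the function name withAverage and the avgSpeed field are hypothetical, and the same division could equally live in a mapReduce finalize function.

```javascript
// Hypothetical post-processing: derive the average speed from count and speedSum.
function withAverage(result) {
  return {
    _id: result._id,
    avgSpeed: result.value.speedSum / result.value.count, // NaN stays NaN
  };
}

// Figures copied from the results shown above.
const mrResults = [
  { _id: "Generally Clear and Dry Conditions", value: { count: 902, speedSum: 45100 } },
  { _id: "Icy Spots", value: { count: 242, speedSum: 9438 } },
  { _id: "Light Snow", value: { count: 122, speedSum: 7686 } },
  { _id: "No Report", value: { count: 782, speedSum: NaN } },
];
const averages = mrResults.map(withAverage);
console.log(averages);
```

The "No Report" row stays NaN because at least one of its readings was a NaN placeholder; filtering those out in map would be one way to avoid that.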
Analysis With Hadoop (using the MongoDB Connector)
Processing Large Data Sets
• Need to break data into smaller pieces
• Process data across multiple nodes
[Diagram: data set split across a cluster of Hadoop nodes]
Benefits of the Hadoop Connector
• Increased parallelism
• Access to analytics libraries
• Separation of concerns
• Integrates with existing tool chains
MongoDB Hadoop Connector
Operational
• Online, real-time
• High concurrency & HA
• Live analytics
Post Processing and
• Multi-source analytics
• Interactive & batch
• Data lake
MongoDB Connector for Hadoop
Sign up for our “Path to Proof” Program and get expert advice on implementation, architecture, and configuration.
www.mongodb.com/lp/contact/path-proof-program
HVDF: https://github.com/10gen-labs/hvdf
Hadoop Connector: https://github.com/mongodb/mongo-hadoop
Consulting Engineer, MongoDB Inc.
Bryan Reinero
Thank You