Solution Architect Jay Runkel @jayrunkel Time Series Data: Aggregations in Action

MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

May 12, 2015

Transcript
Page 1: MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

Solution Architect

Jay Runkel

@jayrunkel

Time Series Data: Aggregations in Action

Page 2

Agenda

• Review Traffic Use Case

• Review Schema Design

• Document Retention Model

• Aggregation Queries

• Map Reduce

• Hadoop

Page 3

Use Case Review

Page 4

We need to prepare for this

Page 5

Develop a nationwide traffic monitoring system

Page 7

Traffic sensors to monitor interstate conditions

• 16,000 sensors

• Measure at one-minute intervals:

  • Speed

  • Travel time

  • Weather, pavement, and traffic conditions

• Support desktop, mobile, and car navigation systems
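A back-of-the-envelope check on the volumes these figures imply (16,000 sensors, one reading per minute). The per-day document count assumes one document per sensor per hour, the bucketing used on the schema slides that follow.

```javascript
// Sizing sketch using only the figures on this slide.
const SENSORS = 16000;        // traffic sensors
const READINGS_PER_HOUR = 60; // one measurement per minute

// Individual readings (speed, travel time, conditions) written per day.
const readingsPerDay = SENSORS * READINGS_PER_HOUR * 24;

// Documents created per day, assuming one document per sensor per hour.
const hourlyDocsPerDay = SENSORS * 24;

// Average write rate if every reading is a separate update.
const updatesPerSecond = SENSORS / 60;

console.log(readingsPerDay);               // 23040000
console.log(hourlyDocsPerDay);             // 384000
console.log(updatesPerSecond.toFixed(1));  // 266.7
```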

Page 8

What we want from our data

Charting and Trending

Page 9

What we want from our data

Historical & Predictive Analysis

Page 10

What we want from our data

Real Time Traffic Dashboard

Page 11

Review Schema Design

Page 12

Document Structure

{ _id: ObjectId("5382ccdd58db8b81730344e2"),
  linkId: 900006,
  date: ISODate("2014-03-12T17:00:00Z"),
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}

Page 13

Sample Document Structure

A compound, unique index on linkId and date identifies the individual document


Page 14

Sample Document Structure

Saves an extra index: the link ID and hour are encoded in _id, which is always indexed

{ _id: "900006:14031217",
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}
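The _id above packs the link ID and the hour into one string. A minimal sketch of building it, assuming the format is `<linkId>:<YYMMDDHH>` in UTC, as the example `900006:14031217` for 2014-03-12T17:00Z suggests; `hourBucketId` is a hypothetical helper name, not part of any driver.

```javascript
// Hypothetical helper: build a "<linkId>:<YYMMDDHH>" _id for the hour
// bucket that contains `date`, using UTC fields.
function hourBucketId(linkId, date) {
  const pad2 = (n) => String(n).padStart(2, "0");
  const yy = pad2(date.getUTCFullYear() % 100);
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const dd = pad2(date.getUTCDate());
  const hh = pad2(date.getUTCHours());
  return `${linkId}:${yy}${mm}${dd}${hh}`;
}

// Every minute between 17:00 and 17:59 UTC maps to the same document.
console.log(hourBucketId(900006, new Date("2014-03-12T17:42:00Z"))); // 900006:14031217
```

Because every date field is zero-padded and fixed-width, the ids for one link sort chronologically as plain strings, which is what makes the prefix range queries on the next slide work.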

Page 15

Sample Document Structure

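Range queries: /^900006:1403/ selects every hour of March 2014 for link 900006

The regex must be left-anchored and case-sensitive so that MongoDB can turn it into a range scan on the _id index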
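Why left-anchored and case-sensitive? A prefix such as `^900006:1403` corresponds to one contiguous range of sorted `_id` strings, so an index can be scanned between two bounds instead of testing every key. The sketch below models that idea in plain JavaScript; `prefixBounds` is a hypothetical helper, not a MongoDB API, and it assumes the prefix ends in an ordinary character such as a digit.

```javascript
// Hypothetical helper: the half-open string range [lower, upper) that
// contains exactly the strings starting with `prefix`.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

const ids = [
  "900006:14022823",
  "900006:14030100",
  "900006:14031217",
  "900006:14033123",
  "900006:14040100",
  "900007:14031217",
].sort();

const { lower, upper } = prefixBounds("900006:1403"); // upper is "900006:1404"

// Range scan over the sorted keys: everything in [lower, upper).
const inRange = ids.filter((id) => id >= lower && id < upper);

// Same answer as testing each key against the left-anchored regex.
const byRegex = ids.filter((id) => /^900006:1403/.test(id));

console.log(inRange); // the three March 2014 ids for link 900006
```

An unanchored pattern such as /1403/ has no single range like this, so every key would have to be examined.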

Page 16

Sample Document Structure

Pre-allocated, 60-element array of per-minute data
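A minimal sketch of the pre-allocation idea, assuming the document shape from the earlier slides (`linkId`, `date`, a `data` array): the hour document is created once with 60 placeholder entries, and each minute's reading is then written to a fixed array position with `$set` on a dotted path such as `data.42.speed`, so the document never grows. Both helper names are hypothetical; the update object uses the standard `$set` operator and would be passed to an update call.

```javascript
// Hypothetical helper: an hour document with 60 pre-allocated slots,
// one per minute, filled with NaN placeholders as on the slide.
function preallocatedHourDoc(linkId, hourStart) {
  const data = [];
  for (let m = 0; m < 60; m++) data.push({ speed: NaN, time: NaN });
  return { linkId, date: hourStart, data };
}

// Hypothetical helper: the $set update that writes one reading into the
// slot for its minute; the array index is the minute of the hour.
function minuteUpdate(readingDate, speed, time) {
  const minute = readingDate.getUTCMinutes();
  return {
    $set: {
      [`data.${minute}.speed`]: speed,
      [`data.${minute}.time`]: time,
    },
  };
}

const doc = preallocatedHourDoc(900006, new Date("2014-03-12T17:00:00Z"));
const update = minuteUpdate(new Date("2014-03-12T17:42:00Z"), 52, 237);

console.log(doc.data.length);         // 60
console.log(JSON.stringify(update));  // {"$set":{"data.42.speed":52,"data.42.time":237}}
```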

Page 17

Advantages

1. In-place updates are efficient

2. Dashboards need only simple queries

Page 18

Dashboards

[Chart: time series from Mon Mar 10 2014 04:57 to Tue Mar 11 2014 21:59 (PDT), values 0 to 70]

db.linkData.find({_id : /^20484087:2014031/})

Page 19

Supporting Queries From Navigation Systems

Page 20

Navigation System Queries

What is the average speed for the last 10 minutes on 50 upcoming road segments?

Page 21

Current Real-Time Conditions

Last ten minutes of speeds and times

{ _id : "I-87:10656",
  description : "NYS Thruway Harriman Section Exits 14A - 16",
  update : ISODate("2013-10-10T23:06:37.000Z"),
  speeds : [ 52, 49, 45, 51, ... ],
  times : [ 237, 224, 246, 233, ... ],
  pavement: "Wet Spots",
  status: "Wet Conditions",
  weather: "Light Rain",
  averageSpeed: 50.23,
  averageTime: 234,
  maxSafeSpeed: 53.1,
  "location" : {
    "type" : "LineString",
    "coordinates" : [
      [ -74.056, 41.098 ],
      [ -74.077, 41.104 ]
    ]
  }
}

Page 22

Current Real-Time Conditions

Pre-aggregated metrics (averageSpeed, averageTime, maxSafeSpeed)

Page 23

Current Real-Time Conditions

Geospatially indexed road segment (the location LineString)

Page 24

db.linksAvg.update(
  { "_id" : linkId },
  { "$set" : { "lUpdate" : date },
    "$push" : {
      "times" : { "$each" : [ time ], "$slice" : -10 },
      "speeds" : { "$each" : [ speed ], "$slice" : -10 }
    }
  })

Maintaining the current conditions

Each update appends the new value to the end of each array, and $slice: -10 then trims the array to its last ten elements, dropping the oldest reading
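The effect of `$push` with `$each` and `$slice: -10` can be modeled on a plain array: append the new value, then keep only the last ten elements. The sketch also recomputes a mean over the window; the deck does not show how `averageSpeed` is maintained, so that part is an assumption.

```javascript
// Models {$push: {speeds: {$each: [speed], $slice: -10}}} on a plain array:
// append the value, then keep only the last `windowSize` elements.
function pushWithSlice(arr, value, windowSize = 10) {
  return arr.concat([value]).slice(-windowSize);
}

// Assumed companion metric: mean of the values currently in the window.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

let speeds = [];
for (const s of [50, 52, 49, 45, 51, 53, 48, 47, 50, 52, 55]) {
  speeds = pushWithSlice(speeds, s);
}

console.log(speeds);        // the 10 most recent readings; the first 50 was dropped
console.log(mean(speeds));  // 50.2
```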

Page 25

Document Retention

Page 26

Document retention

• Doc per hour: kept for 2 weeks

• Doc per day: kept for 2 months

• Doc per month: kept for 1 year

Page 27

Rollup – 1 day

// daily document
// retained for 2 months
{ _id: "link:date",
  // 24-element array, one entry per hour
  hourly: [
    { speed: { sum: , count: }, time: { sum: , count: } },
    { speed: { sum: , count: }, time: { sum: , count: } }
  ]
}
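A minimal sketch of the rollup step, assuming each hourly document has the `data` array of per-minute `{ speed, time }` readings shown earlier and the daily document stores one `{ speed: { sum, count }, time: { sum, count } }` entry per hour. Skipping `NaN` placeholders (minutes with no reading) is an assumption; `summarizeHour` is a hypothetical helper.

```javascript
// Hypothetical helper: collapse one hour of per-minute readings into the
// { sum, count } pair stored in the daily document's hourly[] entry.
function summarizeHour(data) {
  const entry = { speed: { sum: 0, count: 0 }, time: { sum: 0, count: 0 } };
  for (const reading of data) {
    for (const field of ["speed", "time"]) {
      const v = reading[field];
      if (Number.isNaN(v)) continue; // assumed: NaN marks a missing reading
      entry[field].sum += v;
      entry[field].count += 1;
    }
  }
  return entry;
}

const hourData = [
  { speed: 52, time: 237 },
  { speed: 48, time: 250 },
  { speed: NaN, time: NaN }, // empty pre-allocated slot
];
const entry = summarizeHour(hourData);

console.log(JSON.stringify(entry));
// {"speed":{"sum":100,"count":2},"time":{"sum":487,"count":2}}
```

Storing sum and count rather than an average keeps daily entries combinable into monthly documents: add the sums and counts, and divide once when reading.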

Page 28

Analysis With The Aggregation Framework

Page 29

Pipelining operations

grep | sort | uniq

Piping command line operations

Page 30

Pipelining operations

$match | $group | $sort

Piping aggregation operations

Stream of documents in, result document out
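The pipeline idea can be sketched in plain JavaScript as a chain of functions, each taking an array of documents and returning a new array, in the same way each aggregation stage feeds the next. The stage helpers below are illustrative stand-ins, not how the server implements them.

```javascript
// Each "stage" maps an array of documents to a new array of documents.
const match = (pred) => (docs) => docs.filter(pred);
const group = (keyFn, accFn) => (docs) => {
  const groups = new Map();
  for (const d of docs) {
    const k = keyFn(d);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(d);
  }
  return [...groups].map(([k, members]) => ({ _id: k, ...accFn(members) }));
};
const sort = (cmp) => (docs) => [...docs].sort(cmp);

// Run the stages in order, like $match | $group | $sort.
const pipeline = (...stages) => (docs) => stages.reduce((acc, stage) => stage(acc), docs);

const readings = [
  { link: "A", speed: 50 }, { link: "B", speed: 30 },
  { link: "A", speed: 40 }, { link: "B", speed: 0 },
];

const result = pipeline(
  match((d) => d.speed > 0),                        // drop zero readings
  group((d) => d.link, (ms) => ({ n: ms.length })), // count per link
  sort((a, b) => b.n - a.n)                         // most readings first
)(readings);

console.log(JSON.stringify(result)); // [{"_id":"A","n":2},{"_id":"B","n":1}]
```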

Page 31

What is the average speed for a given road segment?

> db.linkData.aggregate([
    { $match: { "_id": /^20484097:/ } },
    { $project: { "data.speed": 1, linkId: 1 } },
    { $unwind: "$data" },
    { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
  ]);
{ "_id" : 20484097, "ave" : 47.067650676506766 }

Page 32

$match: select the documents for the target segment

Page 33

$project: keep only the fields we really need

Page 34

$unwind: produce one document per element of the data array

Page 35

$group: use the handy $avg operator to average the speeds for each linkId
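The four stages of the average-speed query can be traced in plain JavaScript on a few in-memory documents. This assumes, as the `{ "_id" : 20484097, ... }` output implies, that each hourly document carries a numeric `linkId` alongside its string `_id`; the sample values are made up for illustration.

```javascript
// Sample hourly documents (illustrative values; real documents hold 60 readings).
const linkData = [
  { _id: "20484097:14031217", linkId: 20484097, data: [{ speed: 50, time: 230 }, { speed: 46, time: 250 }] },
  { _id: "20484097:14031218", linkId: 20484097, data: [{ speed: 45, time: 255 }] },
  { _id: "20484098:14031217", linkId: 20484098, data: [{ speed: 60, time: 200 }] },
];

// $match: keep documents whose _id starts with the segment prefix.
const matched = linkData.filter((d) => /^20484097:/.test(d._id));

// $project: keep linkId and data.speed (the real stage also keeps _id by default).
const projected = matched.map((d) => ({
  linkId: d.linkId,
  data: d.data.map((r) => ({ speed: r.speed })),
}));

// $unwind: one output document per element of the data array.
const unwound = projected.flatMap((d) => d.data.map((r) => ({ linkId: d.linkId, data: r })));

// $group + $avg: average data.speed per linkId.
const acc = new Map();
for (const d of unwound) {
  const s = acc.get(d.linkId) || { total: 0, n: 0 };
  s.total += d.data.speed;
  s.n += 1;
  acc.set(d.linkId, s);
}
const grouped = [...acc].map(([id, s]) => ({ _id: id, ave: s.total / s.n }));

console.log(JSON.stringify(grouped)); // [{"_id":20484097,"ave":47}]
```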

Page 36

More Sophisticated Pipelines: average speed with variance

{ "$project" : {
    mean: "$meanSpd",
    spdDiffSqrd : { "$map" : {
      "input" : { "$map" : {
        "input" : "$speeds",
        "as" : "samp",
        "in" : { "$subtract" : [ "$$samp", "$meanSpd" ] }
      } },
      "as" : "df",
      "in" : { "$multiply" : [ "$$df", "$$df" ] }
    } }
} },
{ "$unwind" : "$spdDiffSqrd" },
{ "$group" : { _id : "$_id", mean : { "$first" : "$mean" }, variance : { "$avg" : "$spdDiffSqrd" } } }
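The same computation in plain JavaScript, useful for checking pipeline output: the inner `$map` subtracts the mean from each sample, the outer `$map` squares each difference, and averaging the squares gives the population variance (dividing by n, not n - 1). The sample speeds are illustrative.

```javascript
// Population variance computed the way the pipeline does it:
// deviations from the mean, squared, then averaged.
function populationVariance(samples) {
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const sqDiffs = samples.map((s) => s - mean).map((df) => df * df);
  return { mean, variance: sqDiffs.reduce((a, b) => a + b, 0) / sqDiffs.length };
}

console.log(populationVariance([2, 4, 4, 4, 5, 5, 7, 9])); // { mean: 5, variance: 4 }
```

The standard deviation is the square root of the variance (2 here). MongoDB 3.2 and later also provide the `$stdDevPop` and `$stdDevSamp` accumulators.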

Page 37

Analysis With MapReduce

Page 38

Historic Analysis

How do weather and road conditions affect traffic?

The ask: what is the average speed for each weather, status, and pavement condition?

Page 39

MapReduce

function map() {
  // One emit per reading for each condition field. The value has the same
  // { count, speedSum } shape that reduce returns, so reduce can be re-run
  // on its own partial results.
  for (var i = 0; i < this.data.length; i++) {
    var speed = this.data[i].speed;
    emit(this.conditions.weather, { count: 1, speedSum: speed });
    emit(this.conditions.status, { count: 1, speedSum: speed });
    emit(this.conditions.pavement, { count: 1, speedSum: speed });
  }
}

Page 40

emit("Snow", { count: 1, speedSum: 34 })

Page 41

emit("Icy spots", { count: 1, speedSum: 34 })

Page 42

emit("Delays", { count: 1, speedSum: 34 })

Page 43

MapReduce

Page 44

MapReduce

Weather: "Rain", speed: 44

Page 45

MapReduce

Weather: "Rain", speed: 39

Page 46

MapReduce

Weather: "Rain", speed: 46

Page 47

MapReduce

function reduce(key, values) {
  // Start from zero and add each value's own count and sum, so the result
  // stays correct when values are themselves earlier reduce outputs.
  var result = { count: 0, speedSum: 0 };
  values.forEach(function (v) {
    result.speedSum += v.speedSum;
    result.count += v.count;
  });
  return result;
}
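MongoDB may call `reduce` more than once for the same key, passing earlier reduce results back in as values, and it skips `reduce` entirely for a key with a single emitted value; that is why emitted values and the reduce output need the same shape. The sketch below is a small in-memory stand-in for the map/group/reduce cycle. Here `map` takes the document and an `emit` callback as parameters instead of using `this` and a global `emit`, and `runMapReduce` is a hypothetical helper; the speeds 44, 39, and 46 come from the "Rain" slides.

```javascript
// Map: one { count, speedSum } value per reading, per condition field.
function map(doc, emit) {
  const c = doc.conditions;
  for (const reading of doc.data) {
    for (const key of [c.weather, c.status, c.pavement]) {
      emit(key, { count: 1, speedSum: reading.speed });
    }
  }
}

// Reduce: same shape in and out, so it can be applied to partial results.
function reduce(key, values) {
  const result = { count: 0, speedSum: 0 };
  for (const v of values) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  }
  return result;
}

// Hypothetical in-memory driver: group emitted values by key, then reduce.
function runMapReduce(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    map(doc, (k, v) => {
      if (!buckets.has(k)) buckets.set(k, []);
      buckets.get(k).push(v);
    });
  }
  return [...buckets].map(([k, vs]) => ({ _id: k, value: reduce(k, vs) }));
}

const docs = [
  { data: [{ speed: 44 }, { speed: 39 }],
    conditions: { weather: "Rain", status: "Wet Conditions", pavement: "Wet Spots" } },
  { data: [{ speed: 46 }],
    conditions: { weather: "Rain", status: "Wet Conditions", pavement: "Dry" } },
];
console.log(JSON.stringify(runMapReduce(docs)));

// Re-reducing a partial result gives the same answer as reducing everything at once.
const partial = reduce("Rain", [{ count: 1, speedSum: 44 }, { count: 1, speedSum: 39 }]);
console.log(reduce("Rain", [partial, { count: 1, speedSum: 46 }])); // { count: 3, speedSum: 129 }
```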

Page 49

Results

results: [
  { "_id" : "Generally Clear and Dry Conditions", "value" : { "count" : 902, "speedSum" : 45100 } },
  { "_id" : "Icy Spots", "value" : { "count" : 242, "speedSum" : 9438 } },
  { "_id" : "Light Snow", "value" : { "count" : 122, "speedSum" : 7686 } },
  { "_id" : "No Report", "value" : { "count" : 782, "speedSum" : NaN } }
]
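The results hold sums and counts rather than averages, so one division answers the original question. MongoDB's mapReduce accepts a `finalize` function for this kind of last step; the sketch below applies the same division in plain JavaScript to the four results above. The "No Report" row shows how a single `NaN` placeholder (a missing reading) turns a whole sum into `NaN`, which suggests skipping `NaN` speeds in `map`.

```javascript
// The four results from the slide above.
const results = [
  { _id: "Generally Clear and Dry Conditions", value: { count: 902, speedSum: 45100 } },
  { _id: "Icy Spots", value: { count: 242, speedSum: 9438 } },
  { _id: "Light Snow", value: { count: 122, speedSum: 7686 } },
  { _id: "No Report", value: { count: 782, speedSum: NaN } },
];

// A body that a finalize(key, reducedValue) function could use.
function finalize(key, value) {
  return { count: value.count, avgSpeed: value.speedSum / value.count };
}

for (const r of results) {
  console.log(r._id, finalize(r._id, r.value).avgSpeed);
}
// Generally Clear and Dry Conditions 50
// Icy Spots 39
// Light Snow 63
// No Report NaN
```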

Page 50

Analysis With Hadoop (using the MongoDB Connector)

Page 51

Processing Large Data Sets

• Need to break data into smaller pieces

• Process data across multiple nodes

[Diagram: the data set split across many Hadoop nodes]

Page 52

Benefits of the Hadoop Connector

• Increased parallelism

• Access to analytics libraries

• Separation of concerns

• Integrates with existing tool chains

Page 53

MongoDB Hadoop Connector

Operational (MongoDB):

• Online, real-time

• High concurrency & HA

• Live analytics

Post processing (Hadoop):

• Multi-source analytics

• Interactive & batch

• Data lake

The two sides are connected by the MongoDB Connector for Hadoop.

Page 54

Questions?


Part 3 - July 16th, 2:00 PM EST

Page 55

Sign up for our "Path to Proof" Program and get expert advice on implementation, architecture, and configuration.

www.mongodb.com/lp/contact/path-proof-program

Page 57

HVDF: https://github.com/10gen-labs/hvdf

Hadoop Connector: https://github.com/mongodb/mongo-hadoop

Page 58

Consulting Engineer, MongoDB Inc.

Bryan Reinero


Thank You