Time Series Data in MongoDB Senior Solutions Architect, MongoDB Inc. Massimo Brignoli #mongodb

Dec 11, 2015

Gerard Case

Time Series Data in MongoDB

Senior Solutions Architect, MongoDB Inc.

Massimo Brignoli

#mongodb

Agenda

• What is time series data?

• Schema design considerations

• Broader use case: operational intelligence

• MMS Monitoring schema design

• Thinking ahead

• Questions

What is time series data?

Time Series Data is Everywhere

• Financial markets pricing (stock ticks)

• Sensors (temperature, pressure, proximity)

• Industrial fleets (location, velocity, operational)

• Social networks (status updates)

• Mobile devices (calls, texts)

• Systems (server logs, application logs)

Time Series Data at a Higher Level

• Widely applicable data model

• Applies to several different “data use cases”

• Various schema and modeling options

• Application requirements drive schema design

Time Series Data Considerations

• Resolution of raw events

• Resolution needed to support:
  – Applications
  – Analysis
  – Reporting

• Data retention policies:
  – Data ages out
  – Retention

Schema Design Considerations

Designing For Writing and Reading

• Document per event

• Document per minute (average)

• Document per minute (by second)

• Document per hour

Document Per Event

{
  server: "server1",
  load: 92,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}

• Relational-centric approach

• Insert-driven workload

• Aggregations computed at the application level
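
With one document per event, any per-minute figure has to be computed by the application after it reads the raw events. Below is a minimal Node.js sketch of that application-level aggregation. It assumes the events have already been fetched into plain objects shaped like the document above. The sample values and the `averageLoadPerMinute` helper are hypothetical.

```javascript
// Truncate a Date to the start of its minute (UTC).
function startOfMinute(date) {
  const d = new Date(date.getTime());
  d.setUTCSeconds(0, 0);
  return d;
}

// Group per-event documents by server and minute, then average the load
// within each group. The per-event schema leaves this work to the app.
function averageLoadPerMinute(events) {
  const groups = new Map();
  for (const e of events) {
    const key = `${e.server}|${startOfMinute(e.ts).toISOString()}`;
    const g = groups.get(key) || { sum: 0, count: 0 };
    g.sum += e.load;
    g.count += 1;
    groups.set(key, g);
  }
  const averages = {};
  for (const [key, g] of groups) {
    averages[key] = g.sum / g.count;
  }
  return averages;
}

// Hypothetical events; 03:07:38Z is the slide's 22:07:38 -0500 in UTC.
const events = [
  { server: "server1", load: 92, ts: new Date("2013-10-17T03:07:38Z") },
  { server: "server1", load: 88, ts: new Date("2013-10-17T03:07:39Z") },
  { server: "server1", load: 40, ts: new Date("2013-10-17T03:08:01Z") },
];

console.log(averageLoadPerMinute(events)); // averages: 90 for 03:07, 40 for 03:08
```

Every read of an aggregate repeats this pass over the raw events. That cost is what the pre-aggregated layouts on the following slides try to avoid.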

Document Per Minute (Average)

{
  server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Pre-aggregate to compute average per minute more easily

• Update-driven workload

• Resolution at the minute level
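
The per-minute document can be maintained with a single upserted `$inc` per sample. The first sample in a minute creates the document, and later samples increment the counters. The following is a minimal sketch that only builds the filter, update, and options and does not talk to a server. The helper names are hypothetical. In the mongo shell the pieces would be passed to `db.collection.update(filter, update, options)`.

```javascript
// Truncate a Date to the start of its minute (UTC).
function startOfMinute(date) {
  const d = new Date(date.getTime());
  d.setUTCSeconds(0, 0);
  return d;
}

// Filter/update/options for recording one load sample. With upsert, the
// first sample of a minute inserts the document; later ones increment it.
function minuteAverageUpdate(server, ts, load) {
  return {
    filter: { server: server, ts: startOfMinute(ts) },
    update: { $inc: { load_num: 1, load_sum: load } },
    options: { upsert: true },
  };
}

// The per-minute average falls out of the two pre-aggregated counters.
function minuteAverage(doc) {
  return doc.load_num > 0 ? doc.load_sum / doc.load_num : null;
}

const op = minuteAverageUpdate("server1", new Date("2013-10-17T03:07:38Z"), 92);
console.log(op.filter.ts.toISOString()); // 2013-10-17T03:07:00.000Z
console.log(minuteAverage({ load_num: 92, load_sum: 4500 })); // about 48.91
```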

Document Per Minute (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}

• Store per-second data at the minute level

• Update-driven workload

• Pre-allocate structure to avoid document moves
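
Pre-allocating means inserting the full 60-slot document before the minute starts, so each per-second write is an in-place `$set` on a field that already exists. The minimal sketch below builds both pieces. The placeholder value `0` and the helper names are assumptions. A real schema might use `null` instead to distinguish "no sample yet".

```javascript
// Truncate a Date to the start of its minute (UTC).
function startOfMinute(date) {
  const d = new Date(date.getTime());
  d.setUTCSeconds(0, 0);
  return d;
}

// Skeleton document with every per-second slot present up front.
function preallocatedMinute(server, minuteStart) {
  const load = {};
  for (let s = 0; s < 60; s++) {
    load[s] = 0; // placeholder until the real sample arrives
  }
  return { server: server, load: load, ts: minuteStart };
}

// Overwrite one slot in place, addressed with dot notation ("load.<second>").
function setSecond(server, ts, value) {
  return {
    filter: { server: server, ts: startOfMinute(ts) },
    update: { $set: { [`load.${ts.getUTCSeconds()}`]: value } },
  };
}

const ts = new Date("2013-10-17T03:07:38Z");
const skeleton = preallocatedMinute("server1", startOfMinute(ts));
const op = setSecond("server1", ts, 92);
console.log(Object.keys(skeleton.load).length); // 60
console.log(Object.keys(op.update.$set)); // [ 'load.38' ]
```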

Document Per Hour (By Second)

{
  server: "server1",
  load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating the last second requires 3599 steps (one per preceding field)
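
The 3599 figure comes from field order. To reach the last field of a flat 3600-field sub-document, the server has to walk past every field before it. The sketch below shows the flat hourly addressing. The helper names and the `fieldsSkipped` counter are hypothetical. Timestamps are handled in UTC, which gives the same hour boundaries as the slide because -0500 is a whole-hour offset.

```javascript
// Seconds elapsed since the top of the hour (0..3599).
function secondOfHour(ts) {
  return ts.getUTCMinutes() * 60 + ts.getUTCSeconds();
}

// Update for the flat layout: the field name is the second-of-hour, and the
// same number counts the fields that precede it in "load".
function flatHourUpdate(server, ts, value) {
  const hourStart = new Date(ts.getTime());
  hourStart.setUTCMinutes(0, 0, 0);
  const offset = secondOfHour(ts);
  return {
    filter: { server: server, ts: hourStart },
    update: { $set: { [`load.${offset}`]: value } },
    fieldsSkipped: offset,
  };
}

const last = flatHourUpdate("server1", new Date("2013-10-17T03:59:59Z"), 40);
console.log(Object.keys(last.update.$set)); // [ 'load.3599' ]
console.log(last.fieldsSkipped); // 3599
```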

Document Per Hour (By Second)

{
  server: "server1",
  load: {
    0: { 0: 15, …, 59: 45 },
    …
    59: { 0: 25, …, 59: 75 }
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}

• Store per-second data at the hourly level with nesting

• Update-driven workload

• Pre-allocate structure to avoid document moves

• Updating the last second requires only 59 + 59 steps
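
Nesting splits the walk into two short scans: past up to 59 minute sub-documents, then past up to 59 seconds inside the chosen one. The minimal sketch below builds the pre-created nested skeleton and the `load.<minute>.<second>` update path. The helper names and the `fieldsSkipped` counter are hypothetical and simplify how the server locates the field.

```javascript
// Update for the nested layout: "load.<minute>.<second>". Reaching the
// target means walking past `minute` outer fields, then `second` inner ones.
function nestedHourUpdate(server, ts, value) {
  const hourStart = new Date(ts.getTime());
  hourStart.setUTCMinutes(0, 0, 0);
  const minute = ts.getUTCMinutes();
  const second = ts.getUTCSeconds();
  return {
    filter: { server: server, ts: hourStart },
    update: { $set: { [`load.${minute}.${second}`]: value } },
    fieldsSkipped: minute + second,
  };
}

// Skeleton with 60 minute sub-documents of 60 seconds each, pre-created.
function preallocatedNestedHour(server, hourStart) {
  const load = {};
  for (let m = 0; m < 60; m++) {
    load[m] = {};
    for (let s = 0; s < 60; s++) load[m][s] = 0; // placeholder values
  }
  return { server: server, load: load, ts: hourStart };
}

const last = nestedHourUpdate("server1", new Date("2013-10-17T03:59:59Z"), 75);
console.log(Object.keys(last.update.$set)); // [ 'load.59.59' ]
console.log(last.fieldsSkipped); // 118 (59 + 59), versus 3599 for the flat layout
```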

Characterizing Write Differences

• Example: data generated every second

• Capturing data per minute requires:
  – Document per event: 60 writes
  – Document per minute: 1 write, 59 updates

• Transition from insert-driven to update-driven:
  – Individual writes are smaller
  – Performance and concurrency benefits
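
The counts above generalize to any bucket size. The sketch below makes three assumptions: samples start on a bucket boundary, the first sample in each bucket creates the document (an insert or upsert), and every later sample is an update. `writeOps` is a hypothetical helper.

```javascript
// Write operations needed to record `samples` consecutive samples when each
// document holds `samplesPerDoc` of them.
function writeOps(samples, samplesPerDoc) {
  const inserts = Math.ceil(samples / samplesPerDoc);
  return { inserts: inserts, updates: samples - inserts };
}

console.log(writeOps(60, 1));  // document per event: 60 inserts, 0 updates
console.log(writeOps(60, 60)); // document per minute: 1 insert, 59 updates
```

If the bucket documents are pre-created ahead of time, the first sample also becomes an update. In that case all 60 samples in a minute are updates.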

Characterizing Read Differences

• Example: data generated every second

• Reading data for a single hour requires:
  – Document per event: 3600 reads
  – Document per minute: 60 reads

• Read performance is greatly improved:
  – Optimal with tuned block sizes and read-ahead
  – Fewer disk seeks
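
The read side follows the same arithmetic: the number of documents is the window length divided by the span each document covers. This sketch assumes the window starts on a bucket boundary and every bucket in it exists. `docsToRead` is a hypothetical helper.

```javascript
// Documents that must be fetched to cover a time window.
function docsToRead(windowSeconds, secondsPerDoc) {
  return Math.ceil(windowSeconds / secondsPerDoc);
}

console.log(docsToRead(3600, 1));    // document per event: 3600
console.log(docsToRead(3600, 60));   // document per minute: 60
console.log(docsToRead(3600, 3600)); // document per hour: 1
```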

MMS Monitoring Schema Design

MMS Monitoring

• MongoDB Management Service (MMS) Monitoring

• Available in two flavors:
  – Free cloud-hosted monitoring
  – On-premises with MongoDB Enterprise

• Monitor single node, replica set, or sharded cluster deployments

• Metric dashboards and custom alert triggers

MMS Application Requirements

• Resolution defines granularity of stored data

• Range controls the retention policy, e.g. after 24 hours only 5-minute resolution

• Display dictates the stored pre-aggregations, e.g. total and count
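
The "Range" rule can be read as a policy function plus a downsampling step. The minimal sketch below assumes that 1-minute points are kept for the first 24 hours and older data is rolled into 5-minute points. The point shape `{ ts, total, count }` and all helper names are hypothetical. Storing total and count rather than an average keeps both "total" and "average" displayable after the rollup.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Resolution (in seconds) to keep for data of a given age.
function resolutionForAge(ageMs) {
  return ageMs <= 24 * HOUR_MS ? 60 : 300;
}

// Merge points into buckets of `bucketSeconds`, summing totals and counts.
function downsample(points, bucketSeconds) {
  const bucketMs = bucketSeconds * 1000;
  const buckets = new Map();
  for (const p of points) {
    const start = Math.floor(p.ts.getTime() / bucketMs) * bucketMs;
    const b = buckets.get(start) || { ts: new Date(start), total: 0, count: 0 };
    b.total += p.total;
    b.count += p.count;
    buckets.set(start, b);
  }
  return [...buckets.values()].sort((x, y) => x.ts - y.ts);
}

// Ten hypothetical 1-minute points starting at 23:00Z, one sample each.
const base = Date.parse("2013-10-10T23:00:00Z");
const minutePoints = Array.from({ length: 10 }, (_, i) => ({
  ts: new Date(base + i * 60000),
  total: 100 + i,
  count: 1,
}));

const fiveMinute = downsample(minutePoints, resolutionForAge(48 * HOUR_MS));
console.log(fiveMinute.length); // 2
```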

Monitoring Schema Design

• Per-minute document model

• Documents store individual metrics and counts

• Supports “total” and “avg/sec” display

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: { 0: 999999, …, 59: 1800000 }
}
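
Each per-minute document carries `num_samples` and `total_samples`, so the "total" and average displays can be derived without reading the 60 individual values. The small sketch below assumes the average is simply `total_samples / num_samples`. The slide does not spell out how "avg/sec" is computed, so treat that formula as an assumption. `displayValues` is hypothetical.

```javascript
// Derive display figures from the pre-aggregated counters of one document.
function displayValues(doc) {
  return {
    total: doc.total_samples,
    average: doc.num_samples > 0 ? doc.total_samples / doc.num_samples : null,
  };
}

// Counters copied from the example document above (values sub-document omitted).
const doc = {
  timestamp_minute: new Date("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
};

console.log(displayValues(doc));
```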

Monitoring Data Updates

• Single update required to add new data and increment associated counts

db.metrics.update(
  { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" },
  {
    $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 }
  }
)

Monitoring Data Management

• Data stored at different granularity levels for read performance

• Collections are organized into specific intervals

• Retention is managed by simply dropping collections as they age out

• Document structure is pre-created to maximize write performance
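
Tying retention to collection boundaries means expired data can be discarded with one drop instead of many individual deletes. The minimal sketch below assumes a hypothetical one-collection-per-UTC-day naming scheme such as `metrics_20131010`. The names it returns could then be dropped in the shell with `db.getCollection(name).drop()`.

```javascript
// Hypothetical naming scheme: one collection per UTC day.
function collectionNameFor(date, prefix = "metrics") {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}_${y}${m}${d}`;
}

// Names whose entire day ended before the retention cutoff.
function collectionsToDrop(names, now, retentionDays, prefix = "metrics") {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  const pattern = new RegExp(`^${prefix}_(\\d{4})(\\d{2})(\\d{2})$`);
  return names.filter((name) => {
    const match = pattern.exec(name);
    if (!match) return false; // not a daily metrics collection
    // Date.UTC normalizes day overflow, so "day + 1" is the next midnight.
    const dayEnd = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]) + 1);
    return dayEnd <= cutoff;
  });
}

const now = new Date("2013-10-10T12:00:00Z");
console.log(collectionNameFor(now)); // metrics_20131010
console.log(collectionsToDrop(
  ["metrics_20131007", "metrics_20131008", "metrics_20131010", "users"],
  now,
  2
)); // [ 'metrics_20131007' ]
```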

Use Case: Operational Intelligence

What is Operational Intelligence?

• Storing log data
  – Capturing application- and/or server-generated events

• Hierarchical aggregation
  – Rolling approach to generate rollups, e.g. hourly > daily > weekly > monthly

• Pre-aggregated reports
  – Processing raw events to generate reports

Storing Log Data

{
  _id: ObjectId("4f442120eb03305789000000"),
  host: "127.0.0.1",
  user: "frank",
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  request: "GET /apache_pb.gif HTTP/1.0",
  status: 200,
  response_size: 2326,
  referrer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
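
Turning the raw line into the document above is a parsing step done before the insert. The minimal sketch below handles the Apache combined log format shown here. Several choices are assumptions: the regular expression, the helper names, and treating a `-` response size as 0. The sketch leaves out the `_id` field because the driver or server assigns an ObjectId on insert.

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// "10/Oct/2000:13:55:36 -0700" -> Date (stored as UTC).
function parseApacheTime(text) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(text);
  if (!m || !(m[2] in MONTHS)) throw new Error(`unrecognized timestamp: ${text}`);
  // Interpret the wall-clock fields as if they were UTC, then remove the offset.
  const wallClockMs = Date.UTC(Number(m[3]), MONTHS[m[2]], Number(m[1]),
                               Number(m[4]), Number(m[5]), Number(m[6]));
  const sign = m[7] === "+" ? 1 : -1;
  const offsetMs = sign * (Number(m[8]) * 60 + Number(m[9])) * 60 * 1000;
  return new Date(wallClockMs - offsetMs);
}

// host ident user [time] "request" status size "referrer" "user agent"
const COMBINED = /^(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

function parseLogLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  const [, host, user, time, request, status, size, referrer, userAgent] = m;
  return {
    host: host,
    user: user,
    time: parseApacheTime(time),
    path: request.split(" ")[1],
    request: request,
    status: Number(status),
    response_size: size === "-" ? 0 : Number(size),
    referrer: referrer,
    user_agent: userAgent,
  };
}

const line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" ' +
  '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
console.log(parseLogLine(line).time.toISOString()); // 2000-10-10T20:55:36.000Z
```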

Pre-Aggregation

• Analytics across raw events can involve many reads

• Alternative schemas can improve read and write performance

• Data can be organized into more coarse buckets

• Transition from insert-driven to update-driven workloads

Pre-Aggregated Log Data

{
  timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
  resource: "/index.html",
  page_views: { 0: 50, …, 59: 250 }
}

• Leverage time-series style bucketing

• Track individual metrics (e.g. page views)

• Improve performance for reads/writes

• Minimal processing overhead
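
With this layout each hit becomes one upserted `$inc` against its minute's bucket. The minimal sketch below assumes the `page_views` keys are seconds within `timestamp_minute`; the slide does not say so explicitly. `pageViewUpdate` is hypothetical and only builds the arguments for an update with `upsert: true`.

```javascript
// Filter/update/options that count one page view in its minute bucket.
function pageViewUpdate(resource, time) {
  const minute = new Date(time.getTime());
  minute.setUTCSeconds(0, 0);
  return {
    filter: { timestamp_minute: minute, resource: resource },
    update: { $inc: { [`page_views.${time.getUTCSeconds()}`]: 1 } },
    options: { upsert: true },
  };
}

const op = pageViewUpdate("/index.html", new Date("2000-10-10T20:55:36Z"));
console.log(op.filter.timestamp_minute.toISOString()); // 2000-10-10T20:55:00.000Z
console.log(Object.keys(op.update.$inc)); // [ 'page_views.36' ]
```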

Hierarchical Aggregation

• Analytical approach as opposed to schema approach
  – Leverage built-in Aggregation Framework or MapReduce

• Execute multiple tasks sequentially to aggregate at varying levels

• Raw events > Hourly > Weekly > Monthly

• Rolling approach distributes the aggregation workload
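
In a rolling hierarchy each level reads only the output of the level below, so no job rescans the raw events. The minimal in-memory sketch below shows one step, hourly to daily, for page-view counts. The document shape `{ resource, ts, page_views }` and the helpers are hypothetical. Inside MongoDB the same grouping would typically be expressed with the Aggregation Framework's `$group` stage.

```javascript
// Start of the UTC day containing `date`.
function startOfUtcDay(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Roll finer-grained documents into coarser buckets by summing page views.
// `truncate` maps a timestamp to the start of its coarser bucket.
function rollup(docs, truncate) {
  const out = new Map();
  for (const doc of docs) {
    const ts = truncate(doc.ts);
    const key = `${doc.resource}|${ts.toISOString()}`;
    const agg = out.get(key) || { resource: doc.resource, ts: ts, page_views: 0 };
    agg.page_views += doc.page_views;
    out.set(key, agg);
  }
  return [...out.values()];
}

// Hypothetical hourly rollups for one resource.
const hourly = [
  { resource: "/index.html", ts: new Date("2000-10-10T20:00:00Z"), page_views: 5 },
  { resource: "/index.html", ts: new Date("2000-10-10T21:00:00Z"), page_views: 7 },
  { resource: "/index.html", ts: new Date("2000-10-11T00:00:00Z"), page_views: 3 },
];

const daily = rollup(hourly, startOfUtcDay);
console.log(daily.map((d) => d.page_views)); // [ 12, 3 ]
```

A further level would pass `daily` back into `rollup` with a coarser truncation function.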

Thinking Ahead

Before You Start

• What are the application requirements?

• Is pre-aggregation useful for your application?

• What are your retention and age-out policies?

• What are the gotchas?
  – Pre-create document structure to avoid fragmentation and performance problems
  – Organize your data for growth: time series data grows fast!

Down The Road

• Scale-out considerations
  – Vertical vs. horizontal (with sharding)

• Understanding the data
  – Aggregation
  – Analytics
  – Reporting

• Deeper data analysis
  – Patterns
  – Predictions

Scaling Time Series Data in MongoDB

• Vertical growth
  – Larger instances with more CPU and memory
  – Increased storage capacity

• Horizontal growth
  – Partitioning data across many machines
  – Dividing and distributing the workload

Time Series Sharding Considerations

• What are the application requirements?
  – Primarily collecting data
  – Primarily reporting data
  – Both

• Map those back to:
  – Write performance needs
  – Read/write query distribution
  – Collection organization (see MMS Monitoring)

• Example shard key: { metric name, coarse timestamp }
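
One way to apply this example is to store the metric name and a coarsened timestamp on every document and shard on both. The minimal sketch below makes these assumptions: a hypothetical namespace `monitoring.metrics`, the field names `type` and `timestamp_day`, and day-level coarsening. The command document is what would be passed to `db.adminCommand(...)` in the shell. The intent of leading with the metric name is to spread different metrics across shards, while the coarse timestamp keeps one metric's data in contiguous ranges for time-range queries.

```javascript
// Coarse timestamp: start of the UTC day (granularity is an assumption).
function coarseTimestamp(ts) {
  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
}

// Shard-key fields to add to each metrics document (field names hypothetical).
function shardKeyFields(metricName, ts) {
  return { type: metricName, timestamp_day: coarseTimestamp(ts) };
}

// Command document for sharding on the compound key { metric, coarse time }.
const shardCommand = {
  shardCollection: "monitoring.metrics",
  key: { type: 1, timestamp_day: 1 },
};

const fields = shardKeyFields("memory_used", new Date("2013-10-10T23:06:00Z"));
console.log(fields.timestamp_day.toISOString()); // 2013-10-10T00:00:00.000Z
console.log(Object.keys(shardCommand.key)); // [ 'type', 'timestamp_day' ]
```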

Aggregates, Analytics, Reporting

• Aggregation Framework can be used for analysis
  – Does it work with the chosen schema design?
  – What sorts of aggregations are needed?

• Reporting can be done on predictable, rolling basis
  – See "Hierarchical Aggregation"

• Consider secondary reads for analytical operations
  – Minimize load on production primaries

Deeper Data Analysis

• Leverage MongoDB-Hadoop connector
  – Bi-directional support for reading/writing
  – Works with online and offline data (e.g. backup files)

• Compute using MapReduce
  – Patterns
  – Recommendations
  – Etc.

• Explore data
  – Pig
  – Hive

Questions?

Resources

• Schema Design for Time Series Data in MongoDB: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

• Operational Intelligence Use Case: http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence

• Data Modeling in MongoDB: http://docs.mongodb.org/manual/data-modeling/

• Schema Design (webinar): http://www.mongodb.com/events/webinar/schema-design-oct2013