Harmony in Tune Philip (flip) Kromer Huston Hoburg infochimps.com Feb 15 2013 How we Refactored Cube to Terabyte Scale

Mar 22, 2017

Page 1: Harmony intune final

Harmony in Tune

Philip (flip) Kromer, Huston Hoburg, infochimps.com

Feb 15 2013

How we Refactored Cube to Terabyte Scale

Page 2: Harmony intune final

Big Data for All

Page 4: Harmony intune final

why dashboards?

Page 5: Harmony intune final

Lightweight Dashboards

• Understand what’s happening

• Understand data in context

• NOT exploratory analytics

• real-time insight...but not just about real-time

mainline: j.mp/sqcube

hi-scale branch: j.mp/icscube

Page 6: Harmony intune final

The “Church of Graphs”

Page 7: Harmony intune final

Predictive Kvetching

Page 8: Harmony intune final

Lightweight Dashboards

Page 9: Harmony intune final

Approach to Tuning

• Measure: “Why can’t it be faster?”

• Harmonize: “Use it right”

• Tune: “Align it to production resources”

Page 10: Harmony intune final

cube is awesome

Page 11: Harmony intune final

What’s so great?

• Streaming, real-time

• Ad-hoc data: write whatever you want

• Ad-hoc queries: make up new queries whenever

• Efficient (“pyramidal”) calculations

Page 12: Harmony intune final

Event Stream

• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }

Page 13: Harmony intune final

Events vs Metrics

Event:

• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }

Metrics:

• “# of tweets in 10s bucket at 1:02:10 on 2013-02-15”

• “# of non-english-language tweets in 1hr bucket at ...”

Page 14: Harmony intune final

Events vs Metrics

Event:

• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

Metrics:

• “# of requests in 10s bucket at 3:05:10 on 2013-02-15”

• “Average duration of requests with 4xx status in the 5 minute bucket at 3:05:00 on 2013-02-15”

Page 15: Harmony intune final

Events vs Metrics

• Events:

• baskets of facts

• narcissistic

• LOTS AND LOTS

{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

Page 16: Harmony intune final

Events vs Metrics

• Events:

• baskets of facts

• narcissistic

• LOTS AND LOTS

• Metrics:

• a timestamped number

• look like the graph

• one per time bucket

{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }

{ time: "2013-02-15T01:02:03Z", value: 90 }
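The step from events to metrics can be sketched in a few lines. This is a minimal illustration, not Cube's evaluator: `bucketStart`, `metric`, and the sample events are all made up for the example, and only the event shape and the two metric descriptions come from the slides.

```javascript
// Hypothetical helper: floor a timestamp to the start of its time bucket.
function bucketStart(time, bucketMs) {
  const t = new Date(time).getTime();
  return new Date(Math.floor(t / bucketMs) * bucketMs).toISOString();
}

// Reduce a pile of events to one timestamped number per bucket.
// `filter` picks events, `valueOf` extracts a number, `reduce` combines them.
function metric(events, bucketMs, filter, valueOf, reduce) {
  const buckets = new Map();
  for (const ev of events) {
    if (!filter(ev)) continue;
    const key = bucketStart(ev.time, bucketMs);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(valueOf(ev));
  }
  return [...buckets].map(([time, values]) => ({ time, value: reduce(values) }));
}

const sumOf = (xs) => xs.reduce((a, b) => a + b, 0);
const meanOf = (xs) => sumOf(xs) / xs.length;

// Made-up sample events in the slides' shape.
const sampleEvents = [
  { time: "2013-02-15T03:05:11Z", type: "webreq", data: { duration: 50, status: 400 } },
  { time: "2013-02-15T03:05:14Z", type: "webreq", data: { duration: 30, status: 200 } },
  { time: "2013-02-15T03:07:42Z", type: "webreq", data: { duration: 70, status: 404 } },
];

// "# of requests in 10s bucket"
const requestCounts = metric(sampleEvents, 10e3, (ev) => ev.type === "webreq", () => 1, sumOf);

// "Average duration of requests with 4xx status in the 5 minute bucket"
const avg4xx = metric(
  sampleEvents,
  5 * 60e3,
  (ev) => ev.type === "webreq" && ev.data.status >= 400 && ev.data.status < 500,
  (ev) => ev.data.duration,
  meanOf
);

console.log(requestCounts, avg4xx);
```

Each result has the `{ time, value }` shape shown above: many events go in, one number per bucket comes out.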

Page 17: Harmony intune final

billions and billions

Page 18: Harmony intune final

3000 events/second

Page 19: Harmony intune final

tuning methodology

Page 20: Harmony intune final

Monkey See Monkey Do

Google for the #s the cool kids use

Page 21: Harmony intune final

Spinal Tap

Turn everything to 11!!!!

Page 22: Harmony intune final

Hillbilly Mechanic

Rewrite for memcached HBase on Cassandra!!!

Page 23: Harmony intune final

Moneybags

SSD plz

Moar CPU

Moar RAM

Moar Replica

Page 24: Harmony intune final

Tuning: How to do it

• Measure: “Why can’t it be faster?”

• Harmonize: “Use it right”

• Tune: “Align it to production resources”

Page 25: Harmony intune final

see through the magic

Page 26: Harmony intune final

• Why can’t it be faster than it is now?

Page 27: Harmony intune final

• dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5

• htop

• mongostat

Page 28: Harmony intune final

Grok: client-side

• Made a sprayer to inject data

• invalidate a time range at max speed

• writes variously-shaped data: noise, ramp, sine, etc

• Or just reach into the DB and poke

• delete range of metrics, leave events

• delete range of events, leave metrics

Page 29: Harmony intune final

Fault injection

• raise when packet comes in with certain flag

• { time: "2013...", data: {...}, _raise:"db_write" }

• (only in development mode, obvs.)
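A fault-injection hook like this might look as follows. It is a sketch under assumptions: `makeCheckpoint`, `collect`, and the stage names are hypothetical, and only the `_raise` flag and the development-only rule come from the slide.

```javascript
// Hypothetical wrapper: each pipeline stage calls checkpoint(stage, event).
// In development, an event carrying `_raise: stage` blows up at that stage.
function makeCheckpoint(env) {
  return function checkpoint(stage, event) {
    if (env === "development" && event && event._raise === stage) {
      throw new Error("injected fault at " + stage);
    }
  };
}

const devCheckpoint = makeCheckpoint("development");

// Hypothetical collector path with two named stages.
function collect(event, sink, checkpoint) {
  checkpoint("parse", event);
  checkpoint("db_write", event); // fires for { _raise: "db_write" }
  sink.push(event);
}

const sink = [];
collect({ time: "2013-02-15T01:02:03Z", data: {} }, sink, devCheckpoint); // stored normally

let injected = null;
try {
  collect({ time: "2013-02-15T01:02:04Z", data: {}, _raise: "db_write" }, sink, devCheckpoint);
} catch (err) {
  injected = err.message; // the fault surfaced at the flagged stage
}

// Outside development the flag is ignored and the event is stored.
collect({ time: "2013-02-15T01:02:05Z", data: {}, _raise: "db_write" }, sink, makeCheckpoint("production"));

console.log(injected, sink.length);
```

Naming the stage in the flag lets one test event exercise one specific failure path without touching the database.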

Page 30: Harmony intune final

app-side tracing

• “Metalog” announces lifecycle progress:

• writes to log...

• ... or as cube metrics!

metalog.event('connect', { method: 'ws', ip: connection.remoteAddress, path: request.url }, 'minor');
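For illustration only, a stand-in with the same call shape could be as small as the sketch below. `makeMetalog`, `writeLog`, `putEvent`, and the `cube_` type prefix are assumptions made up for this example, not the branch's actual Metalog. Here every step is logged, and is also reported as an event when a sink is configured.

```javascript
// Hypothetical stand-in for Metalog: every name here is illustrative.
function makeMetalog({ writeLog, putEvent }) {
  return {
    event(name, fields, level = "info") {
      // Always leave a line in the log...
      writeLog(level + "\t" + name + "\t" + JSON.stringify(fields));
      // ...and, when a sink is configured, report the same step as an event.
      if (putEvent) {
        putEvent({ time: new Date().toISOString(), type: "cube_" + name, data: fields });
      }
    },
  };
}

const logLines = [];
const selfEvents = [];
const demoMetalog = makeMetalog({
  writeLog: (line) => logLines.push(line),
  putEvent: (ev) => selfEvents.push(ev),
});

demoMetalog.event("connect", { method: "ws", path: "/example" }, "minor");

console.log(logLines[0], selfEvents[0].type);
```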

Page 31: Harmony intune final

app-side tracing

Page 32: Harmony intune final

fits on machine

Page 33: Harmony intune final

3000 events/second

• Rate:

• 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk

• Expensive. Difficult.

• 250 GB accumulated per day (@1000 bytes/ev)

• 95 TB accumulated per year (@1000 bytes/ev)
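The arithmetic behind these figures, as a quick back-of-envelope check. The only inputs are the slide's 3000 events/second and 1000 bytes/event, using decimal GB and TB.

```javascript
const eventsPerSecond = 3000;
const bytesPerEvent = 1000; // the slide's assumed event size
const SECONDS_PER_DAY = 24 * 60 * 60;

const eventsPerDay = eventsPerSecond * SECONDS_PER_DAY; // 259,200,000 (≈ 250 M)
const eventsPerWeek = eventsPerDay * 7;                 // 1,814,400,000 (≈ 2 billion)
const gbPerDay = (eventsPerDay * bytesPerEvent) / 1e9;  // about 259 GB
const tbPerYear = (gbPerDay * 365) / 1e3;               // about 95 TB

console.log({ eventsPerDay, eventsPerWeek, gbPerDay, tbPerYear });
```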

Page 34: Harmony intune final

Metrics

• Rate:

• 3M ten-second buckets/year (π · 10^7 sec/year)

• < 100 bytes/metric ...

• Manageable!

• a 30 metric dashboard is ~ 10 GB/year @10sec

• a 30 metric dashboard is ~ 170 MB/year @ 5min
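The same kind of estimate for metrics. This sketch takes the slide's "< 100 bytes/metric" as an upper bound, so the results are ceilings rather than exact sizes, and the slide's rounded figures need not match them exactly.

```javascript
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000, close to pi * 1e7
const maxBytesPerMetric = 100;               // upper bound from the slide
const metricsOnDashboard = 30;

// Upper bound on storage for one dashboard at a given bucket width (decimal GB).
function maxGbPerYear(bucketSeconds) {
  const bucketsPerYear = SECONDS_PER_YEAR / bucketSeconds;
  return (bucketsPerYear * metricsOnDashboard * maxBytesPerMetric) / 1e9;
}

const tenSecondBucketsPerYear = SECONDS_PER_YEAR / 10; // 3,153,600 (≈ 3M)
const at10s = maxGbPerYear(10);                        // about 9.5 GB
const at5min = maxGbPerYear(5 * 60);                   // about 0.3 GB

console.log({ tenSecondBucketsPerYear, at10s, at5min });
```

Either way the total is gigabytes per year, against terabytes per month for raw events.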

Page 35: Harmony intune final

20% gains are boring

At scale, your first barriers are either:

• Easy

• Impossible

Metrics: 10 GB/year

Events: 10 TB/month

Page 36: Harmony intune final

Scalability sí, Performance no

Page 37: Harmony intune final

Still CPU and Memory Use

• Problem

• Mongo seems to be working

• but high resident memory and fault rate

• Memory-mapped Files

• 1 TB of data served by 4 GB of RAM is no good

Page 38: Harmony intune final

Capped Collections

[Diagram: circular queue holding records A to F]

• Fixed size circular queue

• records are in order of insertion

• oldest records are discarded when full

[Diagram: the queue after newer records G and H arrive]
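The behavior described above can be mimicked with a small in-memory ring buffer. This is a sketch of the idea only, not how MongoDB implements capped collections, and `CappedBuffer` is a name made up for the example.

```javascript
// Minimal fixed-size circular queue: the oldest record is overwritten when full.
class CappedBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.inserted = 0; // total records ever written
  }

  insert(record) {
    this.slots[this.inserted % this.capacity] = record; // reuse the oldest slot
    this.inserted += 1;
  }

  // Records in insertion order, oldest surviving record first.
  toArray() {
    const n = Math.min(this.inserted, this.capacity);
    const out = [];
    for (let i = this.inserted - n; i < this.inserted; i++) {
      out.push(this.slots[i % this.capacity]);
    }
    return out;
  }
}

const capped = new CappedBuffer(6);
for (const record of ["A", "B", "C", "D", "E", "F", "G", "H"]) capped.insert(record);
const surviving = capped.toArray(); // A and B have been discarded

console.log(surviving.join(" "));
```

Writes never search for free space and never grow the structure, which is where the write efficiency comes from.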

Page 39: Harmony intune final

Capped Collections

• Extremely efficient on write

• Extremely efficient for insertion-order reads

• Very efficient if queries are ‘local’

• events in same timebucket typically arrived at nearby times and so are nearby on disk

[Diagram: circular queue holding records A to F]

Page 40: Harmony intune final

don’t like the answer?

change the question.

Page 41: Harmony intune final

uncapped events

capped metrics:

metrics are a view on data

mainline

Page 42: Harmony intune final

capped events

uncapped metrics:

events are ephemeral

hi-scale branch

Page 43: Harmony intune final

Harmony

• Make your pattern of access match your system’s strengths and rhythm

Page 44: Harmony intune final

Validate Mental Model

Page 45: Harmony intune final

Easy fixes

• Duplicate requests = duplicate calculations

• Cube patch for request queues exists

• Easy fix!

• Non-pyramidal are inefficient

• Remove until things are under control

• ( solve paralyzing problems first )
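One way to keep duplicate requests from triggering duplicate calculations is to let later identical requests wait on the one already in flight. The sketch below shows that general idea. It is not the Cube patch mentioned above, and `coalesce`, `slowCompute`, and the request key are hypothetical.

```javascript
// Wrap an expensive `compute(key, done)` so identical in-flight requests share one run.
function coalesce(compute) {
  const waiting = new Map(); // key -> callbacks awaiting the result
  return function request(key, callback) {
    if (waiting.has(key)) { // same request already in flight: just join it
      waiting.get(key).push(callback);
      return;
    }
    waiting.set(key, [callback]);
    compute(key, (result) => {
      const callbacks = waiting.get(key);
      waiting.delete(key);
      for (const cb of callbacks) cb(result);
    });
  };
}

// Stand-in for a metric calculation; it finishes only when we call finish().
let calculations = 0;
const unfinished = [];
const slowCompute = (key, done) => {
  calculations += 1;
  unfinished.push(() => done(key.length)); // placeholder result
};

const getMetric = coalesce(slowCompute);
const answers = [];
getMetric("sum(webreq)", (v) => answers.push(v));
getMetric("sum(webreq)", (v) => answers.push(v)); // joins the first request
unfinished.forEach((finish) => finish());

console.log(calculations, answers);
```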

Page 46: Harmony intune final

cube 101

Page 47: Harmony intune final

Cube Systems

Page 48: Harmony intune final

Collector

• Receives events

• writes to MongoDB

• marks metrics for re-calculation (“invalidates”)

Page 49: Harmony intune final

Evaluator

• receives, parses requests for metrics

• calculates metrics “pyramidally”

• then stores them, cached

Page 50: Harmony intune final

Pyramidal Aggregation

5min: 90

1min: 10 20 15 25 10 10

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1 | 6 0 0 1 0 3

events: ev ev ev ev ev ev ...

Page 51: Harmony intune final

Pyramidal Aggregation

5min: (no values shown)

1min: (no values shown)

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2

events: ev ev ev ev ev ev ...

Page 52: Harmony intune final

Uses Cached Results

5min: (no values shown)

1min: 10 20 15 25 10

10s: 1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1

events: ev ev ev ev ev ev ...

Page 53: Harmony intune final

Pyramidal Aggregation

5 min

1 min

10 sec

ev ev ev ev ev....

• calculates metrics...

• from metrics and constants ... from metrics ...

• from events

• (then stores them, cached)
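The roll-up can be checked against the numbers in the earlier diagram. This is a minimal sketch for sums only, and `rollUp` is a name made up here. In the diagram the top tier, labeled 5min, is the sum of all six one-minute values shown, so the code does the same.

```javascript
// Sum each run of `groupSize` finer buckets into one coarser bucket.
function rollUp(values, groupSize) {
  const out = [];
  for (let i = 0; i < values.length; i += groupSize) {
    out.push(values.slice(i, i + groupSize).reduce((a, b) => a + b, 0));
  }
  return out;
}

// 10-second counts from the slide's diagram, six per minute.
const tenSecond = [
  1, 5, 2, 0, 2, 0,
  6, 4, 7, 1, 0, 2,
  2, 3, 2, 4, 2, 2,
  5, 5, 4, 6, 4, 1,
  2, 7, 0, 0, 0, 1,
  6, 0, 0, 1, 0, 3,
];

const oneMinute = rollUp(tenSecond, 6);              // the diagram's 1min tier
const topTier = rollUp(oneMinute, oneMinute.length); // the diagram's top tier

console.log(oneMinute, topTier);
```

Sums compose this way, as do min and max, so each tier is built from the cached tier below instead of rescanning events.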

Page 54: Harmony intune final

fast writes

Page 55: Harmony intune final

how fast can we write?

Page 56: Harmony intune final

how fast can we write?

FAST

streaming writes: way efficient

Page 57: Harmony intune final

locked out

Page 58: Harmony intune final

Writes and Invalidations

Page 59: Harmony intune final

Inserts Stop Every 5s

• working

• working

• ANGRY

• ANGRY

• working

• working

Page 60: Harmony intune final

Thanks, mongostat!

• working

• working

• ANGRY

• ANGRY

• working

• working

...

(simulated)

Page 61: Harmony intune final

Inserts Stop Every 5s

Events Collection: hi-speed writes, localized reads

Metrics Collection: randomish reads, hi-speed deletes, updates

Page 63: Harmony intune final

Inserts Stop Every 5s

• What’s really going on?

• Database write locks

• Events and metrics have conflicting locks

• Solution: split the databases

Events Collection: hi-speed writes, localized reads

Metrics Collection: randomish reads, hi-speed deletes

Page 64: Harmony intune final

fast reads

Page 65: Harmony intune final

Pre-cache Metrics

• Keep metrics fresh (Warmer)

• Only calculate recent updates (Horizons)
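One possible reading of these two ideas, sketched with hypothetical names: `insideHorizon` and `warm` are not the branch's API, and the one-hour horizon is an arbitrary example value. A warmer pass recomputes dashboard metrics ahead of any request, and a horizon keeps it from touching old buckets.

```javascript
// Hypothetical reading of "Horizons": ignore buckets older than a cutoff.
function insideHorizon(bucketStartMs, nowMs, horizonMs) {
  return bucketStartMs >= nowMs - horizonMs;
}

// Hypothetical "Warmer" pass: recompute only recent buckets of each expression
// so that reads hit the cache. `recompute` is supplied by the caller.
function warm(expressions, bucketStarts, nowMs, horizonMs, recompute) {
  let recalculated = 0;
  for (const expression of expressions) {
    for (const start of bucketStarts) {
      if (!insideHorizon(start, nowMs, horizonMs)) continue; // too old: skip
      recompute(expression, start);
      recalculated += 1;
    }
  }
  return recalculated;
}

const nowMs = Date.UTC(2013, 1, 15, 1, 0, 0); // 2013-02-15T01:00:00Z
const bucketStarts = [nowMs - 3 * 3600e3, nowMs - 60e3, nowMs - 10e3];
const warmed = [];
const warmedCount = warm(["sum(webreq)"], bucketStarts, nowMs, 3600e3, (expr, start) => {
  warmed.push([expr, start]);
});

console.log(warmedCount, warmed);
```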

Page 66: Harmony intune final

fancy metrics

Page 67: Harmony intune final

Non-pyramidal Aggregates

• Can’t calculate from warmed metrics

• Store values with counts in metrics

• Counts can be vivified for aggregations

• Smaller footprint than full events

• Works best for dense, finite values
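As an example of the value-with-count idea, a median cannot be rolled up from per-bucket medians, but it can be computed from merged per-bucket value counts. The sketch below uses made-up names and data and is not the branch's storage format.

```javascript
// Merge per-bucket histograms of the form { value: count } into one Map.
function mergeCounts(histograms) {
  const merged = new Map();
  for (const histogram of histograms) {
    for (const [value, count] of Object.entries(histogram)) {
      const v = Number(value);
      merged.set(v, (merged.get(v) || 0) + count);
    }
  }
  return merged;
}

// Lower median from value counts, without expanding back into single values.
function medianFromCounts(counts) {
  const sorted = [...counts].sort((a, b) => a[0] - b[0]);
  const total = sorted.reduce((n, pair) => n + pair[1], 0);
  const rank = Math.ceil(total / 2); // position of the lower median
  let seen = 0;
  for (const [value, count] of sorted) {
    seen += count;
    if (seen >= rank) return value;
  }
  return undefined; // no data
}

// Two buckets of request durations (ms), stored as value -> count.
const bucketA = { 10: 2, 20: 1 };
const bucketB = { 20: 2, 50: 1 };
const mergedCounts = mergeCounts([bucketA, bucketB]);
const medianDuration = medianFromCounts(mergedCounts);

console.log(medianDuration);
```

The footprint stays small only while the set of distinct values is small, which matches the "dense, finite values" caveat above.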

Page 68: Harmony intune final

finally, scaling

Page 69: Harmony intune final

Multicore

• MongoDB

• Writes limited to single core

• Requires sharding for multicore

Page 70: Harmony intune final

Multicore

• Cube (node.js)

• Concurrent, but not multi-threaded

• Easy solution

• Multiple collectors on different ports

• Produces redundant invalidations

• Requires external load balancing

Page 71: Harmony intune final

Multicore

Page 72: Harmony intune final

Hardware

• High Memory

• Capped events size scales with memory

• CPU

• Mongo / cube not optimized for multicore

• Faster cores

• EC2 Best value: m2.2xlarge

• < $700/mo, 34.2GB RAM, 13 bogo-hertz

Page 73: Harmony intune final

Cloud helps

• Tune machines to application

• Dedicating databases for each application makes life a lot easier

Page 74: Harmony intune final

Cloud helps

• Tune machines to application

Page 76: Harmony intune final

good ideas that didn’t help

Page 77: Harmony intune final

Queues

• Different queueing methods

• Should optimize metric calculations

• No significant improvement

Page 78: Harmony intune final

Locks: update VS remove

• Uncapped metrics allow ‘remove’ as invalidation option

• Remove doesn’t help with database locks

• It was a stupid idea anyway: that’s OK

• “Hey, poke it and see what happens!”

Page 79: Harmony intune final

Mongo Aggregations

• Mongo has aggregations!

• Node ends up working better

• Mongo aggregations aren’t faster

• Less flexible

• Would require query language rewrite

Page 80: Harmony intune final

Why not Graphite?

• Data model

• Metrics-centric vs Events-centric (metrics code not intertwingled with app code)

• Environment familiarity

• Cube: d3, node.js, mongo

• Graphite: Django, Whisper, C