Mongodb beijingconf yottaa_3.3

Yottaa Inc.
2 Canal Park, 5th Floor
Cambridge MA 02141
http://www.yottaa.com

MongoDB In Production: Yottaa Practice

Xiang Jun Wu
System Engineer

[email protected]

http://www.yottaa.com/

Overview

• About Yottaa

• Engineering challenges

• System Architecture

• Collection Design

• Production environment

• Lessons Learned

• Q&A


What is Yottaa?


We Monitor More Sites Than Anyone Else


Demo


We are recruiting!

http://www.yottaa.com/about#jobs

Engineering Challenges

• We collect lots of data

• 27,000+ URLs monitored

• ~300 samples per URL per day

• Some samples are > 1 MB (Firebug)

• Missing a sample isn’t a big deal

• Collect over 10 kinds of metrics: DNS lookup, time to display, time to interactive, Firebug, YSlow and so forth

• We try to make everything real time

• No batch jobs, everything displayed as it happens

• “Check Now” button runs tests on demand
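As a rough sanity check on the volume above, the slide's own figures imply a sustained sample rate. The sketch below is only arithmetic on those numbers (27,000 URLs, about 300 samples per URL per day); the helper name is illustrative and not part of Yottaa's code.

```javascript
// Back-of-the-envelope ingest rate from the slide's figures.
// samplesPerSecond is a hypothetical helper name used only for this sketch.
function samplesPerSecond(urls, samplesPerUrlPerDay) {
  const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400
  return (urls * samplesPerUrlPerDay) / SECONDS_PER_DAY;
}

const samplesPerDay = 27000 * 300;              // 8,100,000 samples per day
const ratePerSecond = samplesPerSecond(27000, 300); // 93.75 samples per second

console.log(samplesPerDay, ratePerSecond);
```

So "lots of data" here means roughly 8 million samples a day, or a little under 100 per second on average.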


Engineering Challenges

• Small engineering team

• Started with a team of 2

• Must be Agile

• We didn’t know exactly what features we’d need

• Requirements change daily

• Limited operations budget

• No full-time operations staff

• 100% in the cloud: EC2, Voxel, Linode, Rackspace and so forth

[System architecture diagram. Components shown: users, load balancers, Collection and Reporting app servers (Nginx + Passenger), mongos routers, mongod instances, a data source, and the cloud providers. Callouts: “Sharding!”, “High Concurrency, Scale-Out”, “Easy as Rails!”]

Database Architecture

The primary data store is broken into 5 parts:

• Users - user-related data.

• Web metrics - DNS lookup, HTTP connection, Firebug and so forth. Web metrics are stored at different scales (daily, monthly) to speed up reports on the frontend; raw data is kept for detailed queries.

• Alerts - track whether a website/URL has a performance degradation.

• Summary - the most frequently read URL information. Also used as a message queue from which workers fetch URL access tasks.

• URL optimization logic - optimization switches: enable CDN, enable compression, CSS minify and so forth.



MongoDB has other use cases:

• System metrics - CPU/memory/network

• Application metrics - cache hits, processing speed, health

• All log information - logstash (http://code.google.com/p/logstash/) feeds and stores logs from the different components in MongoDB. Logs are searched via Rails; we plan to add a Sinatra interface for both log feed and query.


Thinking in rows

Columns: URL, Location, Connect, First Byte, Last Byte, Timestamp

{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

{ url: 'www.google.com', location: 'Shanghai', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }


Thinking in rows

Columns: URL, Location, Connect, First Byte, Last Byte, Timestamp

What was the average connect time for Google on Friday?

• From Beijing?

• From Shanghai?

• Between 1 AM and 2 AM?


Thinking in rows

Columns: URL, Location, Connect, First Byte, Last Byte, Timestamp

[Diagram: the raw rows for Day 1, Day 2 and Day 3 are each averaged (AVG) and combined into the chart result.]

• Up to 100s of samples per URL per day

• 30-day average query range

• An “average” chart had to hit 3000 rows
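To make the cost of the row model concrete, here is a minimal in-memory sketch of the kind of average the chart needs. Field names follow the sample documents on the earlier slide; the third row, the timestamp range and the helper name `averageConnect` are illustrative assumptions, not Yottaa's code.

```javascript
// Row-per-sample model: every chart query has to read each matching raw sample.
// The third row below is made-up data, added only so the averages differ.
const rows = [
  { url: 'www.google.com', location: 'Beijing', connect: 23, timestamp: 1234 },
  { url: 'www.google.com', location: 'Shanghai', connect: 23, timestamp: 2345 },
  { url: 'www.google.com', location: 'Beijing', connect: 31, timestamp: 3456 },
];

// Average connect time for one URL in [from, to), optionally for one location.
function averageConnect(rows, url, from, to, location) {
  let sum = 0;
  let rowsRead = 0;
  for (const row of rows) {
    if (row.url !== url) continue;
    if (row.timestamp < from || row.timestamp >= to) continue;
    if (location && row.location !== location) continue;
    sum += row.connect;
    rowsRead += 1;
  }
  return { average: rowsRead ? sum / rowsRead : null, rowsRead };
}

const allLocations = averageConnect(rows, 'www.google.com', 0, 4000);
const beijingOnly = averageConnect(rows, 'www.google.com', 0, 4000, 'Beijing');
console.log(allLocations, beijingOnly);
```

The work grows with the number of samples in the range, which is why a 30-day chart ends up touching thousands of rows.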


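Thinking in Documents

[Diagram: one document for URL www.google.com. Under the Last Byte metric it holds an overall Sum of 2312 and per-location entries (the one location label visible is SFO) with Sums of 1200 and 1112.]

Callouts on the diagram:

• This document contains all data for www.google.com collected during 9/20/2010

• This tells us the average value for this metric for this URL / time period

• Average value from Beijing

• Average value from Shanghai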


More efficient charts

Columns: URL, Day, <data>

[Diagram: one document per day (Day 1, Day 2, Day 3), each already holding its averages (AVG), feeds the chart result.]

• 1 document per URL per day

• 30 days == 30 documents

• An average chart hits 30 documents: 100x fewer
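Reading the chart value back from the per-day documents can be sketched as follows. The nested `connect.sum` / `connect.count` / `connect.bj.*` layout is taken from the update call on the "Storing a sample" slide; the numbers in the two example documents and the helper name `averageFromDailies` are assumptions made for illustration.

```javascript
// One pre-aggregated document per URL per day. Values are illustrative.
const dailies = [
  { url: 'www.google.com', day: '2010-09-20',
    connect: { sum: 2312, count: 100, bj: { sum: 1200, count: 50 } } },
  { url: 'www.google.com', day: '2010-09-21',
    connect: { sum: 1000, count: 100, bj: { sum: 400, count: 50 } } },
];

// Combine sums and counts across days, then divide once.
// Pass a location code (e.g. 'bj') to use the per-location counters instead.
function averageFromDailies(docs, metric, location) {
  let sum = 0;
  let count = 0;
  for (const doc of docs) {
    const metricData = doc[metric];
    const bucket = metricData && (location ? metricData[location] : metricData);
    if (!bucket) continue; // no data for this metric/location on this day
    sum += bucket.sum;
    count += bucket.count;
  }
  return count === 0 ? null : sum / count;
}

const overallAvg = averageFromDailies(dailies, 'connect');
const beijingAvg = averageFromDailies(dailies, 'connect', 'bj');
console.log(overallAvg, beijingAvg);
```

Storing a sum and a count, rather than a ready-made average, is what lets several days be combined into one correct average.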


Storing a sample

db.metrics.dailies.update(
  // Which document we're updating
  { url: 'www.google.com', day: new Date(2010, 9, 2) },
  // Atomically update the document
  { '$inc': {
      'connect.sum': 1234,      // update the aggregate value
      'connect.count': 1,
      'connect.bj.sum': 1234,   // update the location-specific value
      'connect.bj.count': 1
  } },
  true // upsert: create the document if it doesn't already exist
);
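The same pattern can be generated for every metric in a sample. This is a sketch under stated assumptions: the metric list comes from the sample document shown earlier, while the helper name `buildDailyUpsert`, passing the day in explicitly, and the `'bj'` location code for Beijing are choices made for the example.

```javascript
// Metrics present in the sample documents on the earlier slides.
const METRICS = ['connect', 'first_byte', 'last_byte'];

// Build the query and $inc update for one sample (hypothetical helper).
function buildDailyUpsert(sample, day, locationCode) {
  const inc = {};
  for (const metric of METRICS) {
    const value = sample[metric];
    if (typeof value !== 'number') continue; // metric missing from this sample
    inc[`${metric}.sum`] = value;                  // aggregate across locations
    inc[`${metric}.count`] = 1;
    inc[`${metric}.${locationCode}.sum`] = value;  // location-specific counters
    inc[`${metric}.${locationCode}.count`] = 1;
  }
  return {
    query: { url: sample.url, day: day }, // which document to update
    update: { $inc: inc },                // applied atomically by the server
    upsert: true,                         // create the document if missing
  };
}

const sample = { url: 'www.google.com', location: 'Beijing',
                 connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 };
const op = buildDailyUpsert(sample, new Date(Date.UTC(2010, 8, 20)), 'bj');
console.log(JSON.stringify(op, null, 2));
```

The result maps directly onto the call shown on the slide: `db.metrics.dailies.update(op.query, op.update, op.upsert)`.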


Putting it together

{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

For each incoming sample:

1. Atomically update the daily data

2. Atomically update the weekly data

3. Atomically update the monthly data
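Each of the three updates needs a key identifying the period the sample falls into. The slides do not show how those keys are derived, so the following is only one plausible sketch: it assumes UTC boundaries and weeks that start on Monday, and `bucketStarts` is a hypothetical helper name.

```javascript
// Compute the start of the day, week and month containing a sample time.
// Assumptions for this sketch: UTC boundaries, weeks starting on Monday.
function bucketStarts(date) {
  const y = date.getUTCFullYear();
  const m = date.getUTCMonth();
  const d = date.getUTCDate();

  const daily = new Date(Date.UTC(y, m, d));
  const daysSinceMonday = (daily.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  // Date.UTC normalizes day values below 1 into the previous month.
  const weekly = new Date(Date.UTC(y, m, d - daysSinceMonday));
  const monthly = new Date(Date.UTC(y, m, 1));

  return { daily, weekly, monthly };
}

// A sample taken on Wednesday 22 September 2010, 13:30 UTC.
const buckets = bucketStarts(new Date(Date.UTC(2010, 8, 22, 13, 30)));
console.log(buckets.daily.toISOString(),
            buckets.weekly.toISOString(),
            buckets.monthly.toISOString());
```

Each key would then be used in the query part of the corresponding upsert, in place of `day` for the weekly and monthly documents.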


MongoDB in Production

• EC2-based large servers: 2 CPUs, 8 GB memory

• 4 MongoDB servers in 3 DB clusters

• Master/slave setup in the same datacenter

• One master and one slave for the core database

• Back up the entire database every day

• Restore the full backup to a new MongoDB server to check data integrity

• Save MongoDB logs for slow query/ops analysis

• After 120 days, we have > 500 GB of data

• Adding about 5 GB / day today

• 101 reads/s, 70.96 writes/s

• Global lock rate 34.9%


Production: Sharding

• Scale out architecture

MongoDB auto-sharding allows us to “just add servers” at the Rails and DB tiers. Right now, no sharding is used.

[Diagram: the Collection Server and Reporting Server sit in front of Shard 1 to Shard 4; URL 1 to URL 8 are spread across the shards.]

• Shard by URL

• Write load evenly distributed

• Most reads hit a single shard
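The two properties claimed for sharding by URL can be illustrated with a toy router. This is not how MongoDB does it: real auto-sharding is handled by mongos using the shard key's chunk ranges. The hash function, `shardFor`, and the shard count of 4 (taken from the diagram) are simplifications for the sketch.

```javascript
// Simple deterministic string hash, kept in the unsigned 32-bit range.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Hypothetical router: a given URL always maps to the same shard index.
function shardFor(url, shardCount) {
  return hashString(url) % shardCount;
}

const SHARD_COUNT = 4; // matches the four shards in the diagram
const urls = ['www.google.com', 'www.yottaa.com', 'www.example.com', 'www.example.org'];

// Writes for different URLs spread over the shards...
const placement = urls.map((url) => ({ url, shard: shardFor(url, SHARD_COUNT) }));
console.log(placement);

// ...while every read for one URL goes to one shard only.
const readShard = shardFor('www.google.com', SHARD_COUNT);
console.log(readShard === placement[0].shard);
```

Because a chart query is always for a single URL, routing on the URL keeps that query on one shard while new samples for different URLs land on different shards.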


Production: Monitor

• A RESTful API sends mongostat metrics to our own monitoring system, so we can watch MongoDB performance in real time.

Lessons Learned

• Consider collection sharding from the first day

• Preallocate the oplog before starting MongoDB if you are using an ext3/ext2 file system; ext4/xfs perform better and don’t need oplog preallocation.

• Review all slow queries and add proper indexes in the staging environment.

• Be careful when adding indexes in production; try to build indexes in the background or during off-peak time.

• Avoid slow write operations and holding locks for too long

• Watch MongoDB logs after a new deployment


We Are Hiring!

MongoDB, Ruby, Web and Java talent

http://www.yottaa.com/about#jobs

Thank you for viewing
