Yottaa Inc., 2 Canal Park, 5th Floor, Cambridge, MA 02141, http://www.yottaa.com
MongoDB In Production: Yottaa Practice
Xiang Jun Wu, System Engineer
[email protected]

Mongodb beijingconf yottaa_3.3

Jan 15, 2015



Yottaa Inc

Yottaa's MongoDB production practice, presented at the MongoDB Beijing conference on 2011.3.3 (March 3, 2011).
Transcript

Overview

• About Yottaa

• Engineering challenges

• System Architecture

• Collection Design

• Production environment

• Lessons Learned

• Q&A


What is Yottaa?


We Monitor More Sites Than Anyone Else


Demo


We are recruiting! http://www.yottaa.com/about#jobs


Engineering Challenges

• We collect lots of data

• 27,000+ URLs monitored

• ~300 samples per URL per day

• Some samples are >1 MB (Firebug data)

• Missing a sample isn’t a big deal

• We collect more than 10 kinds of metrics: DNS lookup, time to display, time to interactive, Firebug, YSlow, and so forth

• We try to make everything real time

• No batch jobs; everything is displayed as it happens

• “Check Now” button runs tests on demand
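A back-of-the-envelope estimate from the figures above (27,000+ URLs, about 300 samples per URL per day) gives the order of magnitude of the daily write volume. Both inputs are approximations, so treat the result as a rough average rather than a measured rate:

```latex
27{,}000 \ \text{URLs} \times 300 \ \frac{\text{samples}}{\text{URL} \cdot \text{day}}
  = 8{,}100{,}000 \ \frac{\text{samples}}{\text{day}}
  \approx \frac{8{,}100{,}000}{86{,}400 \ \text{s}}
  \approx 94 \ \frac{\text{samples}}{\text{s}}
```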


Engineering Challenges

• Small engineering team

• Started with a team of 2

• Must be agile

• We didn't know exactly what features we'd need

• Requirements change daily

• Limited operations budget

• No full-time operations staff

• 100% in the cloud: EC2, Voxel, Linode, Rackspace, and so forth


[Figure: system architecture. Components shown: users, load balancers, app servers running Nginx + Passenger (collection servers and reporting servers), mongos routers, MongoD servers, and a data source. Callouts: "Easy as Rails!", "High Concurrency, Scale-Out", "Sharding!"]


Database Architecture

The primary data store is broken into 5 parts:

• Users - user-related data.

• Web metrics - DNS lookup, HTTP connection, Firebug data, etc.

Web metrics data is stored at different scales (daily, monthly) to speed up reports on the front end; raw data is kept for detailed queries.

• Alerts - monitor whether a website/URL's performance degrades.

• Summary - stores the most frequently read URL information; it is also used as a message queue from which workers fetch URL access tasks.

• URL optimization logic - stores optimization switches: enable CDN, enable compression, minify CSS, and so forth.


Database Architecture

MongoDB has other use cases:

• System metrics - CPU/memory/network

• Application metrics - cache hits, processing speed, health

• All log information - we use logstash (http://code.google.com/p/logstash/) to feed the logs of different components into MongoDB and store them there.

Logs are searched via Rails. We plan to add a Sinatra interface for both log feeding and querying.


Database Architecture


Thinking in rows

URL | Location | Connect | First Byte | Last Byte | Timestamp

{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

{ url: 'www.google.com', location: 'Shanghai', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }


Thinking in rows

URL | Location | Connect | First Byte | Last Byte | Timestamp

What was the average connect time for Google on Friday?

From Beijing? From Shanghai? Between 1 AM and 2 AM?


Thinking in rows

[Figure: each day's rows (columns URL, Location, Connect, First Byte, Last Byte, Timestamp) are averaged (AVG) separately for Day 1, Day 2 and Day 3, and those averages make up the result.]

Up to 100s of samples per URL per day!

Average query range: 30 days

An “average” chart had to hit 3000 rows
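To make the cost concrete, here is a minimal sketch of the row-oriented approach in plain JavaScript. It assumes the row documents sit in an in-memory array rather than in MongoDB, and `averageFromRows` is a hypothetical helper, not part of any driver. The point it illustrates is that every sample in the range has to be read, so a 30-day chart over hundreds of samples per day touches thousands of rows.

```javascript
// Hypothetical helper: average one metric over "row" documents
// (one document per sample), optionally restricted to a single location.
function averageFromRows(rows, metric, location) {
  let sum = 0;
  let count = 0;
  for (const row of rows) {
    // Every row is visited, including the ones the filter then skips.
    if (location !== undefined && row.location !== location) continue;
    sum += row[metric];
    count += 1;
  }
  return count === 0 ? null : sum / count;
}

// The first two rows are the example documents; the third is made up for illustration.
const rows = [
  { url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 },
  { url: 'www.google.com', location: 'Shanghai', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 },
  { url: 'www.google.com', location: 'Beijing', connect: 31, first_byte: 130, last_byte: 250, timestamp: 3456 },
];

console.log(averageFromRows(rows, 'connect', 'Beijing')); // 27
```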


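Thinking in Documents

[Figure: a single document for URL www.google.com. Under Last Byte it holds an overall Sum of 2312 and two per-location sums, 1200 and 1112 (one location label reads SFO).]

This document contains all data for www.google.com collected during 9/20/2010.

The overall entry tells us the average value of this metric for this URL / time period.

The per-location entries give the average value from Beijing and the average value from Shanghai.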
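For comparison, a minimal sketch of reading averages from pre-aggregated documents in plain JavaScript. It assumes a hypothetical document shape modeled on the `$inc` update on the "Storing a sample" slide: each metric holds an overall `sum` and `count` plus a `sum`/`count` per location key, where `bj` and `sh` are assumed to stand for Beijing and Shanghai. The sums 2312, 1200 and 1112 come from the slide; all counts and the second day are invented. Sums and counts are pooled before dividing, so a multi-day average weights each day by its number of samples.

```javascript
// Read { sum, count } for one metric, overall or for one location key.
// A missing metric or location counts as "no samples".
function metricStats(doc, metric, location) {
  const m = doc[metric];
  const s = m === undefined ? undefined : location === undefined ? m : m[location];
  return s ? { sum: s.sum, count: s.count } : { sum: 0, count: 0 };
}

// Average over any number of daily documents: pool sums and counts first,
// then divide once (not an average of per-day averages).
function averageFromDailies(docs, metric, location) {
  let sum = 0;
  let count = 0;
  for (const doc of docs) {
    const s = metricStats(doc, metric, location);
    sum += s.sum;
    count += s.count;
  }
  return count === 0 ? null : sum / count;
}

// Illustrative documents: the 9/20/2010 sums are from the slide; counts and 9/21 are invented.
const dailies = [
  { url: 'www.google.com', day: '2010-09-20',
    last_byte: { sum: 2312, count: 8, bj: { sum: 1200, count: 4 }, sh: { sum: 1112, count: 4 } } },
  { url: 'www.google.com', day: '2010-09-21',
    last_byte: { sum: 688, count: 2, bj: { sum: 400, count: 1 }, sh: { sum: 288, count: 1 } } },
];

console.log(averageFromDailies(dailies, 'last_byte'));       // 300
console.log(averageFromDailies(dailies, 'last_byte', 'bj')); // 320
```

Two documents answer the question here, however many samples they summarize.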


More efficient charts

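[Figure: the same chart computed from one document per URL per day (fields URL, Day, <data>); each day's document yields its AVG values directly for Day 1, Day 2 and Day 3, and these combine into the result.]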

1 Document per URL per Day

30 days == 30 documents

Average chart hits 30 documents.

100x fewer


Storing a sample

db.metrics.dailies.update(
  // Which document we're updating: one per URL per day.
  // JavaScript months are 0-based, so this is October 2, 2010.
  { url: 'www.google.com', day: new Date(2010, 9, 2) },
  // Atomically update the document.
  { '$inc': {
      'connect.sum': 1234,      // update the aggregate value
      'connect.count': 1,
      'connect.bj.sum': 1234,   // update the location-specific value (bj: Beijing)
      'connect.bj.count': 1
  } },
  true  // upsert: create the document if it doesn't already exist
);
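To show what the upsert does to the stored document, here is a small in-memory simulation of `update(query, { $inc: ... }, true)` in plain JavaScript. It illustrates the semantics only and is not MongoDB's implementation: queries are equality-only, `day` is a string (so `===` can compare it), the helper name `incUpsert` is hypothetical, and there is no concurrency, whereas the real server applies all increments of one update to a document atomically. The second sample (Shanghai, assumed key `sh`) is invented.

```javascript
// In-memory stand-in for update(query, { $inc: inc }, true) on one collection.
// Illustration only: equality-only queries, no concurrency, not MongoDB's code.
function incUpsert(collection, query, inc) {
  // Find the first document whose fields equal every query field.
  let doc = collection.find((d) => Object.keys(query).every((k) => d[k] === query[k]));
  if (!doc) {
    // Upsert: nothing matched, so start a new document from the query fields.
    doc = { ...query };
    collection.push(doc);
  }
  for (const [path, amount] of Object.entries(inc)) {
    // Walk a dotted path such as 'connect.bj.sum', creating sub-documents as needed.
    const keys = path.split('.');
    let target = doc;
    for (const key of keys.slice(0, -1)) {
      if (target[key] === undefined) target[key] = {};
      if (typeof target[key] !== 'object' || target[key] === null) {
        throw new Error(`cannot $inc inside non-document field '${key}'`);
      }
      target = target[key];
    }
    const last = keys[keys.length - 1];
    // Like $inc, a missing field is treated as 0.
    target[last] = (target[last] ?? 0) + amount;
  }
  return doc;
}

// Two samples for the same URL and day land in the same document.
const dailies = [];
const which = { url: 'www.google.com', day: '2010-10-02' };
incUpsert(dailies, which, { 'connect.sum': 1234, 'connect.count': 1, 'connect.bj.sum': 1234, 'connect.bj.count': 1 });
incUpsert(dailies, which, { 'connect.sum': 1000, 'connect.count': 1, 'connect.sh.sum': 1000, 'connect.sh.count': 1 });
console.log(JSON.stringify(dailies[0]));
```

Because every counter is a plain `$inc`, collectors never need a read-modify-write cycle in the application: each sample is a single blind update, whatever order samples arrive in.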


Putting it together

{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

1. Atomically update the daily data

2. Atomically update the weekly data

3. Atomically update the monthly data
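A minimal sketch of the three steps above: turning one sample into three upsert specs, one per time bucket. Several things here are assumptions not stated in the deck: `timestamp` is Unix seconds, buckets are computed in UTC, weeks start on Monday, and the collection names `metrics.weeklies`/`metrics.monthlies` and the `week`/`month` query fields are hypothetical (only `metrics.dailies` and `day` appear on the slides). Each spec would then be issued as `update(query, update, true)`.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Start of the UTC day, Monday-based week, and month containing a Unix-seconds timestamp.
function bucketStarts(timestampSec) {
  const t = new Date(timestampSec * 1000);
  const day = new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate()));
  // getUTCDay(): 0 = Sunday ... 6 = Saturday; shift so Monday is day 0 of the week.
  const daysSinceMonday = (day.getUTCDay() + 6) % 7;
  const week = new Date(day.getTime() - daysSinceMonday * DAY_MS);
  const month = new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), 1));
  return { day, week, month };
}

// Build the three upserts for one sample; locationKey is e.g. 'bj'.
function updatesForSample(sample, locationKey) {
  const inc = {};
  for (const metric of ['connect', 'first_byte', 'last_byte']) {
    inc[`${metric}.sum`] = sample[metric];
    inc[`${metric}.count`] = 1;
    inc[`${metric}.${locationKey}.sum`] = sample[metric];
    inc[`${metric}.${locationKey}.count`] = 1;
  }
  const { day, week, month } = bucketStarts(sample.timestamp);
  // Same $inc document for all three buckets; only the target document differs.
  return [
    { collection: 'metrics.dailies', query: { url: sample.url, day }, update: { $inc: inc } },
    { collection: 'metrics.weeklies', query: { url: sample.url, week }, update: { $inc: inc } },
    { collection: 'metrics.monthlies', query: { url: sample.url, month }, update: { $inc: inc } },
  ];
}

// The slide's sample values, with a made-up timestamp (Wednesday 2010-09-22 13:00 UTC).
const specs = updatesForSample(
  { url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245,
    timestamp: Date.UTC(2010, 8, 22, 13) / 1000 },
  'bj');
for (const s of specs) console.log(s.collection, JSON.stringify(s.query));
```

Each of the three upserts is atomic on its own, but MongoDB of this era has no multi-document transactions, so a failure between steps can leave, say, the daily document updated and the weekly one not. Given that missing a sample isn't a big deal here, that trade-off seems acceptable.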


MongoDB In Production

• EC2-based large servers: 2 CPUs, 8 GB memory

• 4 MongoDB servers in 3 DB clusters

• Master/slave setup in the same datacenter

• One master and one slave for the core database

• Back up the entire database every day

• Restore the full backup to a new MongoDB server to verify data integrity

• Save MongoDB logs for slow query/operation analysis

• After 120 days, we have >500 GB of data

• Currently adding about 5 GB per day

• 101 reads/s, 70.96 writes/s

• Global lock rate 34.9%


Production: Sharding

• Scale-out architecture

Mongo auto-sharding allows us to "just add servers" at the Rails and DB tiers. Right now, no sharding is used.

[Diagram: a collection server and a reporting server in front of Shard 1 to Shard 4, with URL 1 to URL 8 distributed across the shards.]

Shard by URL:

• Write load evenly distributed

• Most reads hit a single shard
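MongoDB's auto-sharding splits a collection into chunks, each covering a range of shard-key values, and places chunks on shards; when a query pins the shard key to one value, mongos can route it to the single shard that owns it. The toy model below, in plain JavaScript, illustrates why sharding by URL gives the two properties above. It is a sketch, not the mongos implementation: the chunk boundaries and shard names are made up, and real chunks are split and migrated automatically.

```javascript
// Each chunk covers shard-key values in [min, max); max === null means unbounded.
const chunks = [
  { min: '', max: 'g', shard: 'shard1' },
  { min: 'g', max: 'n', shard: 'shard2' },
  { min: 'n', max: 't', shard: 'shard3' },
  { min: 't', max: null, shard: 'shard4' },
];

// Find the shard owning the chunk that contains this URL.
function shardFor(url) {
  const chunk = chunks.find((ch) => url >= ch.min && (ch.max === null || url < ch.max));
  if (!chunk) throw new Error(`no chunk covers ${url}`);
  return chunk.shard;
}

// Queries that fix the shard key go to one shard; others must ask every shard.
function shardsForQuery(query) {
  if (typeof query.url === 'string') return [shardFor(query.url)];
  return [...new Set(chunks.map((ch) => ch.shard))];
}

// Writes for different URLs spread out across shards...
for (const url of ['apple.com', 'mongodb.org', 'nytimes.com', 'www.google.com']) {
  console.log(url, '->', shardFor(url));
}
// ...while a chart for one URL reads from a single shard.
console.log(shardsForQuery({ url: 'www.google.com', day: '2010-09-20' }));
```

On a real cluster the shard key is declared once with the shardCollection command, for example `sh.shardCollection("<db>.metrics.dailies", { url: 1 })` in shells that provide the `sh` helpers; the database name is left as a placeholder.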


Production: Monitor

• We send mongostat metrics through a RESTful API to our own monitoring system, so we can watch MongoDB performance in real time.
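mongostat reports per-second rates by sampling the server's cumulative counters and dividing the change by the sampling interval. A minimal sketch of that calculation in plain JavaScript, assuming two snapshots of `db.serverStatus().opcounters` (whose fields include `insert`, `query`, `update`, `delete`, `getmore` and `command`) taken a known number of seconds apart. The `buildPayload` JSON shape is hypothetical; the deck does not describe Yottaa's monitoring API, and the snapshot numbers are made up.

```javascript
// Cumulative counters reported by db.serverStatus().opcounters.
const OP_FIELDS = ['insert', 'query', 'update', 'delete', 'getmore', 'command'];

// Per-second rates from two snapshots taken intervalSec apart.
function opRates(prev, curr, intervalSec) {
  if (!(intervalSec > 0)) throw new Error('intervalSec must be positive');
  const rates = {};
  for (const f of OP_FIELDS) {
    // Counters restart from zero when mongod restarts; clamp a negative delta to 0.
    const delta = Math.max(0, (curr[f] || 0) - (prev[f] || 0));
    rates[f] = delta / intervalSec;
  }
  return rates;
}

// Hypothetical JSON body for an in-house monitoring endpoint.
function buildPayload(host, rates, takenAt) {
  return JSON.stringify({ host, takenAt: takenAt.toISOString(), metrics: rates });
}

// Made-up snapshots taken 10 seconds apart.
const prev = { insert: 100, query: 500, update: 200, delete: 0, getmore: 10, command: 1000 };
const curr = { insert: 130, query: 600, update: 260, delete: 5, getmore: 10, command: 1100 };
const rates = opRates(prev, curr, 10);
console.log(buildPayload('db1', rates, new Date()));
```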


Lessons Learned

• Consider collection sharding from day one

• Preallocate the oplog before starting MongoDB if you are using an ext2/ext3 file system; ext4/XFS perform better and do not need this step.

• Review all slow queries and add proper indexes in the staging environment

• Be careful when adding indexes in production; try to build them in the background or during off-peak hours

• Avoid slow write operations and holding locks for too long

• Watch MongoDB logs after new deployment
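One way to act on the slow-query lessons: with the profiler on (for example `db.setProfilingLevel(1, 100)`, which records operations slower than 100 ms), each slow operation is written to the `system.profile` collection with fields such as `op`, `ns` and `millis`. The sketch below, in plain JavaScript, groups such entries by namespace and operation so the most expensive ones can be indexed first in staging. The function name is hypothetical, and the sample entries and the `yottaa` database name are made up.

```javascript
// Group profiler-style entries ({ op, ns, millis }) by namespace + operation,
// keeping only those at or above thresholdMs, most total time first.
function summarizeSlowOps(entries, thresholdMs) {
  const groups = new Map();
  for (const e of entries) {
    if (e.millis < thresholdMs) continue;
    const key = `${e.ns} ${e.op}`;
    const g = groups.get(key) || { ns: e.ns, op: e.op, count: 0, totalMs: 0, maxMs: 0 };
    g.count += 1;
    g.totalMs += e.millis;
    g.maxMs = Math.max(g.maxMs, e.millis);
    groups.set(key, g);
  }
  return [...groups.values()].sort((a, b) => b.totalMs - a.totalMs);
}

// Made-up entries for illustration.
const profile = [
  { op: 'query', ns: 'yottaa.metrics.dailies', millis: 450 },
  { op: 'query', ns: 'yottaa.metrics.dailies', millis: 300 },
  { op: 'update', ns: 'yottaa.summary', millis: 120 },
  { op: 'query', ns: 'yottaa.users', millis: 40 },
];
const summary = summarizeSlowOps(profile, 100);
console.log(summary);
```

Ranking by total time rather than by the single slowest call tends to surface frequent, moderately slow queries, which are often the better indexing targets.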


We Are Hiring! MongoDB, Ruby, Web and Java talent

http://www.yottaa.com/about#jobs

Thank you for viewing