MongoDB In Production: Yottaa Practice
Xiang Jun Wu, System Engineer, [email protected]
Yottaa Inc., 2 Canal Park, 5th Floor, Cambridge MA 02141
http://www.yottaa.com
Jan 15, 2015
Overview
• About Yottaa
• Engineering challenges
• System Architecture
• Collection Design
• Production environment
• Lessons Learned
• Q&A
What is Yottaa?
We Monitor More Sites Than Anyone Else
Demo
We are recruiting! http://www.yottaa.com/about#jobs
Engineering Challenges
• We collect lots of data
  • 27,000+ URLs monitored
  • ~300 samples per URL per day
  • Some samples are >1 MB (Firebug)
  • Missing a sample isn’t a big deal
  • We collect over 10 kinds of metrics: DNS lookup, time to display, time to interactive, Firebug, YSlow and so forth
• We try to make everything real time
  • No batch jobs; everything is displayed as it happens
  • “Check Now” button runs tests on demand
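The volume these figures imply can be worked out directly. The sketch below only multiplies the two numbers from the slide; it assumes every monitored URL really is sampled 300 times a day, which the "~" and "+" on the slide suggest is an approximation.

```javascript
// Back-of-the-envelope sample volume, using the figures from the slide.
const urlsMonitored = 27000;      // "27,000+ URLs monitored"
const samplesPerUrlPerDay = 300;  // "~300 samples per URL per day"

const samplesPerDay = urlsMonitored * samplesPerUrlPerDay;
const samplesPerSecond = samplesPerDay / (24 * 60 * 60);

console.log(samplesPerDay);     // 8100000
console.log(samplesPerSecond);  // 93.75
```

Roughly 8 million samples a day, or on the order of 100 per second, is the load the rest of the design has to absorb without batch jobs.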
Engineering Challenges
• Small engineering team
  • Started with a team of 2
• Must be Agile
  • We didn’t know exactly what features we’d need
  • Requirements change daily
• Limited operations budget
  • No full-time operations staff
  • 100% in the cloud: EC2, Voxel, Linode, Rackspace and so forth
[Figure: system architecture diagram. Components shown: User, Load Balancer, Nginx + Passenger app servers (Reporting and Collection), Mongos, three MongoD servers, Data Source, cloud provider. Callouts: “Sharding!”, “High Concurrency”, “Scale-Out”, “Easy as Rails!”]
Database Architecture
The primary data store is broken into 5 parts:
• Users - user-related data.
• Web metrics - stores DNS lookup, HTTP connection, Firebug data, etc. Web metrics data is kept at different scales (daily, monthly) to speed up reports in the frontend. Raw data is kept for detailed queries.
• Alerts - monitors whether a website/URL has a performance degradation.
• Summary - stores the most frequently read URL information. Also used as a message queue from which workers fetch URL access tasks.
• URL optimization logic - stores optimization switches: enable CDN, enable compression, CSS minify and so forth.
Database Architecture
MongoDB has other use cases:
• System metrics - CPU/memory/network
• Application metrics - cache hits, processing speed, health
• All log information - we use logstash (http://code.google.com/p/logstash/) to feed and store logs from different components in MongoDB. Logs are searched via Rails. We plan to add a Sinatra interface for both log feed and query.
Thinking in rows
URL | Location | Connect | First Byte | Last Byte | Timestamp
{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
{ url: 'www.google.com', location: 'Shanghai', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }
Thinking in rows
URL | Location | Connect | First Byte | Last Byte | Timestamp
What was the average connect time for Google on Friday?
• From Beijing?
• From Shanghai?
• Between 1 AM and 2 AM?
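With one record per sample, each of these questions becomes a filter over the raw rows followed by an average. A minimal in-memory sketch, assuming made-up sample values and a hypothetical `averageOf` helper (neither is from the deck):

```javascript
// Row-style samples: one record per measurement (hypothetical data).
const rows = [
  { url: 'www.google.com', location: 'Beijing',  connect: 20, timestamp: 1000 },
  { url: 'www.google.com', location: 'Shanghai', connect: 30, timestamp: 2000 },
  { url: 'www.google.com', location: 'Beijing',  connect: 46, timestamp: 3000 },
];

// Average of `metric` over every row that passes the filter.
// Each question (per location, per time window) scans the raw rows again.
function averageOf(rows, metric, filter) {
  const matching = rows.filter(filter);
  if (matching.length === 0) return null; // no samples in range
  const total = matching.reduce((sum, row) => sum + row[metric], 0);
  return total / matching.length;
}

const overall = averageOf(rows, 'connect', () => true);
const fromBeijing = averageOf(rows, 'connect', r => r.location === 'Beijing');
const inWindow = averageOf(rows, 'connect', r => r.timestamp >= 1500 && r.timestamp < 3500);

console.log(overall, fromBeijing, inWindow); // 32 33 38
```

The cost of every answer grows with the number of raw samples in the queried range.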
Thinking in rows
URL | Location | Connect | First Byte | Last Byte | Timestamp
[Figure: the rows for Day 1, Day 2 and Day 3 are each averaged (AVG) and combined into the Result.]
• Up to 100s of samples per URL per day
• 30-day average query range
• An “average” chart had to hit 3000 rows
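Thinking in Documents
[Figure: a single document for URL www.google.com. Its “Last Byte” field holds an overall Sum of 2312 plus per-location entries (one labeled SFO) with Sum 1200 and Sum 1112.]
• This document contains all data for www.google.com collected during 9/20/2010
• The top-level entry tells us the average value for this metric for this URL / time period
• The per-location entries give the average value from Beijing and the average value from Shanghai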
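One possible shape for such a document is sketched below. The sums are the ones on the slide; the counts, the `day` value format and the `sh` location key are assumptions added for illustration (`bj` follows the key used in the deck's update statement).

```javascript
// Hypothetical shape of one daily document: per metric, an overall
// sum/count plus a sum/count per location. The counts are assumed here;
// the slide only shows the sums.
const daily = {
  url: 'www.google.com',
  day: '2010-09-20',
  last_byte: {
    sum: 2312, count: 10,
    bj: { sum: 1200, count: 5 },
    sh: { sum: 1112, count: 5 },
  },
};

// An average is never stored; it is derived from sum and count on read.
function average(bucket) {
  return bucket.count > 0 ? bucket.sum / bucket.count : null;
}

const overallAvg = average(daily.last_byte);
const beijingAvg = average(daily.last_byte.bj);
const shanghaiAvg = average(daily.last_byte.sh);
```

Storing sum and count rather than the average itself is what lets new samples be folded in with simple increments.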
More efficient charts
URL | Day | <data>
[Figure: the documents for Day 1, Day 2 and Day 3 each carry their own AVG values, which are combined into the Result.]
• 1 document per URL per day
• 30 days == 30 documents
• An average chart hits 30 documents
• 100x fewer
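The 30-document read can be sketched as follows. The documents are generated in memory with made-up values (100 samples per day, matching the 3000-row figure), and `rangeAverage` is a hypothetical helper, not code from the deck.

```javascript
// Build 30 hypothetical daily documents, each summarising 100 samples.
const dailies = [];
for (let day = 1; day <= 30; day++) {
  dailies.push({ url: 'www.google.com', day, connect: { sum: 2300, count: 100 } });
}

// A range average must weight each day by its sample count,
// so add up sums and counts separately and divide once at the end.
function rangeAverage(docs, metric) {
  let sum = 0;
  let count = 0;
  for (const doc of docs) {
    sum += doc[metric].sum;
    count += doc[metric].count;
  }
  return count > 0 ? sum / count : null;
}

const documentsRead = dailies.length;
const samplesCovered = dailies.reduce((n, d) => n + d.connect.count, 0);
const avgConnect = rangeAverage(dailies, 'connect');

console.log(documentsRead, samplesCovered, avgConnect); // 30 3000 23
```

Thirty reads cover the same 3000 samples that the row layout had to scan one by one.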
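Storing a sample
• Which document we’re updating: the { url, day } selector
• Update the aggregate value: connect.sum and connect.count
• Update the location-specific value: connect.bj.sum and connect.bj.count
• Create the document if it doesn’t already exist: upsert
• Atomically update the document: one update() call with $inc
db.metrics.dailies.update(
  { url: 'www.google.com', day: new Date(2010, 9, 2) },  // which document we're updating
  { '$inc': { 'connect.sum': 1234, 'connect.count': 1,   // update the aggregate value
              'connect.bj.sum': 1234, 'connect.bj.count': 1 } },  // update the location-specific value
  true  // upsert: create the document if it doesn't already exist
);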
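To see what the upsert with `$inc` does to a document, here is a small in-memory imitation that runs without a database. It is not the MongoDB driver or shell: `store`, `incPath`, `upsertInc` and `recordConnect` are hypothetical names, and it mimics only the two behaviours used above (create if missing, add to dotted paths).

```javascript
// In-memory stand-in for a collection, keyed by url + day.
// This is NOT MongoDB; it only mimics upsert + $inc semantics.
const store = new Map();

// Add `amount` at a dotted path such as 'connect.bj.sum',
// creating intermediate objects and treating missing numbers as 0.
function incPath(doc, path, amount) {
  const keys = path.split('.');
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  const last = keys[keys.length - 1];
  node[last] = (node[last] || 0) + amount;
}

// Upsert: create the document if it is missing, then apply every increment.
function upsertInc(selector, increments) {
  const id = `${selector.url}|${selector.day}`;
  if (!store.has(id)) store.set(id, { ...selector });
  const doc = store.get(id);
  for (const [path, amount] of Object.entries(increments)) {
    incPath(doc, path, amount);
  }
  return doc;
}

// Record one connect-time sample from a location code such as 'bj'.
function recordConnect(url, day, location, connect) {
  return upsertInc({ url, day }, {
    'connect.sum': connect,
    'connect.count': 1,
    [`connect.${location}.sum`]: connect,
    [`connect.${location}.count`]: 1,
  });
}

recordConnect('www.google.com', '2010-10-02', 'bj', 1234);
const stored = recordConnect('www.google.com', '2010-10-02', 'bj', 766);
console.log(stored.connect.sum, stored.connect.count); // 2000 2
```

The first call creates the document and the second only increments it, so the writer never needs to read before writing.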
Putting it together
{ url: 'www.google.com', location: 'Beijing', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
For each incoming sample:
1. Atomically update the daily data
2. Atomically update the weekly data
3. Atomically update the monthly data
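The three updates differ only in which time bucket they target. A sketch of deriving those buckets from one sample, under several assumptions not stated in the deck: timestamps are milliseconds since the epoch, buckets are aligned to UTC, weeks start on Monday, and the collection names other than `metrics.dailies` are invented.

```javascript
// Start of the UTC day, week (Monday) and month containing `timestampMs`.
function bucketStarts(timestampMs) {
  const d = new Date(timestampMs);
  const y = d.getUTCFullYear();
  const m = d.getUTCMonth();
  const day = d.getUTCDate();
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return {
    daily: new Date(Date.UTC(y, m, day)),
    weekly: new Date(Date.UTC(y, m, day - daysSinceMonday)),
    monthly: new Date(Date.UTC(y, m, 1)),
  };
}

// Turn one raw sample into the three updates listed above.
// Collection names other than metrics.dailies are assumptions.
function updatesFor(sample) {
  const starts = bucketStarts(sample.timestamp);
  const inc = { 'connect.sum': sample.connect, 'connect.count': 1 };
  return [
    { collection: 'metrics.dailies', selector: { url: sample.url, day: starts.daily }, inc },
    { collection: 'metrics.weeklies', selector: { url: sample.url, week: starts.weekly }, inc },
    { collection: 'metrics.monthlies', selector: { url: sample.url, month: starts.monthly }, inc },
  ];
}

// Wednesday 22 September 2010, 13:30 UTC (JavaScript months are zero-based).
const sample = { url: 'www.google.com', connect: 23, timestamp: Date.UTC(2010, 8, 22, 13, 30) };
const updates = updatesFor(sample);
```

Each entry would then be applied as its own upsert with `$inc`, so one sample costs three small writes and no reads.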
MongoDB In Production
• EC2-based large servers: 2 CPUs, 8 GB memory
• 4 MongoDB servers in 3 DB clusters
• Master/slave setup in the same datacenter
  • One master and one slave for the core database
• Back up the entire database every day
  • Restore the entire data set to a new MongoDB server to check data integrity
• Save the MongoDB log for slow query/ops analysis
• After 120 days, we have >500 GB of data
• Adding about 5 GB/day today
• 101 reads/s, 70.96 writes/s
• Global lock rate 34.9%
Production: Sharding
• Scale-out architecture
  • MongoDB auto-sharding allows us to “just add servers” at the Rails and DB tiers. Right now, no sharding is used.
[Figure: the Reporting Server and Collection Server sit in front of Shard 1 to Shard 4; URL 1 to URL 8 are spread across the shards.]
• Shard by URL
• Write load is evenly distributed
• Most reads hit a single shard
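Why sharding by URL gives both properties can be shown with a deliberately simplified router. This is not how MongoDB places data (it splits the shard key range into chunks and balances them across shards); the hash function, `shardFor` and the URL list are assumptions used only to illustrate the idea.

```javascript
// Simple deterministic string hash (djb2 variant), for illustration only.
function hashString(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Every sample for a given URL maps to the same shard index.
function shardFor(url, shardCount) {
  return hashString(url) % shardCount;
}

const shardCount = 4;
const urls = ['www.google.com', 'www.yottaa.com', 'www.mongodb.org', 'www.example.com'];
const placement = urls.map(url => ({ url, shard: shardFor(url, shardCount) }));
console.log(placement);
```

Because a URL always lands on one shard, a report for that URL reads from a single shard, while writes for many different URLs spread across all of them.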
Production: Monitor
• We use a RESTful API to send mongostat metrics to our own monitoring system, so we can watch MongoDB performance in real time.
Lessons Learned
• Consider collection sharding from the first day
• Preallocate the oplog before starting MongoDB if you are using an ext3/ext2 file system; ext4/xfs perform better and do not need oplog preallocation
• Review all slow queries and add proper indexes in the staging environment
• Be careful when adding indexes in production; try to add indexes in the background or during off-peak time
• Avoid slow write operations and holding locks for too long
• Watch the MongoDB logs after a new deployment
We Are Hiring! MongoDB, Ruby, Web and Java talent
http://www.yottaa.com/about#jobs
Thank you for viewing