Pinterest Engineering Scaling Pinterest Yash Nelapati Ascii Artist Saturday, August 31, 13

Scaling Pinterest - QConSP · Pinterest Engineering April 2012 Growth Scaling Pinterest Mar 2010 · Amazon EC2 + S3 + Edge Cast · 135 Web Engines + 75 API Engines · 10 Service Instances

May 21, 2020

Transcript
Page 1

Pinterest Engineering

Scaling Pinterest

Yash Nelapati, Ascii Artist

Saturday, August 31, 2013

Page 2

Pinterest is an online pinboard to organize and share what inspires you.

Page 6

March 2010 Growth

[Chart: page views per day, Mar 2010 to May 2012]

Page 8

March 2010 Growth

· Rackspace

· 1 small Web Engine

· 1 small MySQL DB

· 1 Engineer + 2 Founders

[Chart: page views per day, Mar 2010 to May 2012]

Page 9

March 2010 Growth

Page 11

January 2011 Growth

[Chart: page views per day, Mar 2010 to Jan 2012]

Page 13

January 2011 Growth

· Amazon EC2 + S3 + CloudFront

· 1 Nginx, 4 Web Engines

· 1 MySQL DB + 1 Read Slave

· 1 Task Queue + 2 Task Processors

· 1 MongoDB

· 2 Engineers + 2 Founders

[Chart: page views per day, Mar 2010 to Jan 2012]

Page 15

September 2011 Growth

[Chart: page views per day, Mar 2010 to May 2012]

Page 17

September 2011 Growth

· Amazon EC2 + S3 + CloudFront

· 2 Nginx, 16 Web Engines + 2 API Engines

· 5 Functionally Sharded MySQL DBs + 9 read slaves

· 4 Cassandra Nodes

· 15 Membase Nodes (3 separate clusters)

· 8 Memcache Nodes

· 10 Redis Nodes

· 3 Task Routers + 4 Task Processors

· 4 Elasticsearch Nodes

· 3 Mongo Clusters

· 3 Engineers (8 Total)

[Chart: page views per day, Mar 2010 to May 2012]

Page 18

It will fail. Keep it simple.

Page 19

April 2012 Growth

[Chart: page views per day, Mar 2010 to May 2012]

Page 21

April 2012 Growth

· Amazon EC2 + S3 + EdgeCast

· 135 Web Engines + 75 API Engines

· 10 Service Instances

· 80 MySQL DBs (m1.xlarge) + 1 slave each

· 110 Redis Instances

· 60 Memcache Instances

· 2 Redis Task Manager + 60 Task Processors

· 3rd party sharded Solr

· 15 Engineers (25 Total)

[Chart: page views per day, Mar 2010 to May 2012]

Page 22

January 2012 Growth

Page 24

August 2013 Growth

[Chart: page views per day, April 2012 to August 2013]

Page 27

August 2013 Growth

· Amazon EC2 + S3 + EdgeCast

· 400+ Web Engines + 400+ API Engines

· 70+ MySQL DBs (hi1.4xlarge on SSDs) + 1 slave each

· 100+ Redis Instances

· 230+ Memcache Instances

· 10 Redis Task Manager + 500 Task Processors

· 70+ Engineers (130+ Total)

· 6 services (80 instances)

· Sharded Solr

· 20 HBase

· 12 Kafka + Azkaban

· 8 Zookeeper Instances

· 12 Varnish

[Chart: page views per day, April 2012 to August 2013]

Page 28

[Diagram: system architecture] Components shown:

· ELB

· Routing & Filtering (Varnish)

· API App (Python)

· Web App (Python)

· Task Processing (Python/Pyres)

· Task Queue (Redis)

· MySQL Service (Java/Finagle)

· Memcache Mux (Nutcracker)

· Feed Service (Python/Thrift)

· Follower Service (Python/Thrift)

· Images (S3 + CDN)

· Data stores: MySQL, Memcache, Redis, HBase

· All connection pairings managed by ZooKeeper

· Ops tooling: Puppet, StatsD, Monit, Ganglia

Page 29

[Diagram: data pipeline] Components shown:

· Web App (Python)

· API (Python)

· Task Processing (Python/Pyres)

· Kafka

· S3 Copier

· S3

· EMR

· Hive

· Azkaban

· Tripwire

Page 31

Technologies

Page 32

Choosing Your Tech

Questions to ask

• Does it meet your needs?

• How mature is the product?

• Is it commonly used? Can you hire people who have used it?

• Is the community active?

• How robust is it to failure?

• How well does it scale? Will you be the biggest user?

• Does it have good debugging tools? Profiler? Backup software?

• Is the cost justified?

Page 33

Hosting

Why Amazon Web Services?

• Variety of servers running Linux

• Very good peripherals, such as load balancing, DNS, map reduce, basic firewalls, and more

• Good reliability (don’t throw tomatoes at me!)

• Very active dev community

• Not cheap, but...

• New instances ready in seconds

Page 34

Code

Why Python?

• Extremely mature

• Well known and well liked

• Solid active community

• Very good libraries specifically targeted to web development

• Effective rapid prototyping

• Free

Page 35

Production Data

Why MySQL and Memcache?

• Extremely mature

• Well known and well liked

• Catastrophic data loss is rare

• Response time increases linearly with request rate

• Very good software support: XtraBackup, Innotop, Maatkit

• Solid active community

• Free

Page 36

Why Redis?

• Well known and well liked

• Active community

• Consistently good performance

• Variety of convenient and efficient data structures

• 3 Flavors of Persistence: Now, Snapshot, Never

• Free

Production Data

Page 37

Why HBase? (Why not MySQL?)

• Efficient storage

• Handles large write throughput

• Solid Hadoop interface

• Maturing quickly, used by Facebook

• Built on HDFS

• Free

Production Data

Page 38

What happened to Cassandra, Mongo, ES, and Membase?

Production Data

• Does it meet your needs?

• How mature is the product?

• Is it commonly used? Can you hire people who have used it?

• Is the community active?

• How robust is it to failure?

• How well does it scale? Will you be the biggest user?

• Does it have good debugging tools? Profiler? Backup software?

• Is the cost justified?

Page 39

If you’re the biggest user of a technology, the challenges will be greatly amplified.

Page 40

What’s happening now?

Page 41

Challenge: One Codebase + Lots of Engineers = Deploy Hell

• Major bugs and performance issues stall deploys

• Performance issues creep in under the radar

• 7+ development teams, 1 ops team

• Workload changing more rapidly and less predictably

• Want developers to not fear moving fast

Employee Growth

Page 42

Solution: Deploy Checkpoints

• Aggressive unit tests (careful! don’t erase your DB!)

• Rings of deployment

• Canary, employees only, 5% of user base, etc.

• Continuous deployment

• Production integration tests

Employee Growth
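A minimal sketch of how the "rings of deployment" above could gate traffic, assuming a hypothetical per-request check, `sees_new_build`: canary hosts always run the new build, employees see it once the employee ring is reached, and a stable 5% of users is chosen by hashing the user ID. The ring names, host names, and hashing scheme are illustrative assumptions, not Pinterest's deploy tooling.

```python
import hashlib

# Hypothetical rings, innermost first; a build is promoted outward one ring at a time.
RINGS = ["canary", "employees", "five_percent", "everyone"]


def user_bucket(user_id: int, buckets: int = 100) -> int:
    """Deterministically map a user to a bucket in [0, buckets)."""
    digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def sees_new_build(active_ring: str, *, host: str, user_id: int,
                   is_employee: bool, canary_hosts: frozenset) -> bool:
    """True if this request should be served by the new build.

    `active_ring` is the outermost ring the new build has been promoted to.
    """
    reached = RINGS.index(active_ring)
    if host in canary_hosts:
        return True  # canary hosts run the newest build for all their traffic
    if is_employee and reached >= RINGS.index("employees"):
        return True
    if user_bucket(user_id) < 5 and reached >= RINGS.index("five_percent"):
        return True  # buckets 0-4 of 100 = 5% of users
    return reached >= RINGS.index("everyone")
```

Hashing the user ID, rather than sampling per request, keeps each user in the same ring across requests, so nobody flips between builds while a deploy is being promoted.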

Page 43

Challenge: Increase Availability, Decrease Latency

• Push for better uptime and lower latency

• Initially, most uptime and latency issues due to DB + caching

• Fewer instances => fewer but bigger failures

• More instances => more, smaller failures + more complexity

• How aggressively can you retry without hurting the system?

Uptime & Latency
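One common answer to "how aggressively can you retry?" is a small, fixed number of attempts with capped exponential backoff and random jitter. The sketch below assumes a hypothetical `call_with_retries` helper wrapping a single backend call. The attempt count and delays are placeholder values, not numbers from the talk.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], *, attempts: int = 3,
                      base_delay: float = 0.05, max_delay: float = 1.0,
                      retry_on: tuple = (ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with capped exponential backoff.

    Full jitter (a random delay between 0 and the current cap) spreads
    retries out so that clients that failed together do not retry together.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("attempts must be at least 1")
```

Jitter matters most during a partial outage. Without it, every client that failed at the same moment retries at the same moment, recreating the load spike that caused the failure.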

Page 44

Solution: Metrics Dashboard and Alerts

• Create dashboard + alerts, and review response times weekly

• When? Soon after launch, at the latest

• Profile everything

• MySQL - Maatkit, InnoTop

• Memcache - Maatkit

• Frontend - New Relic

• General Ops - StatsD, Nagios / Monit, Ganglia

Uptime & Latency
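StatsD, listed above, accepts plain-text metrics over UDP. A timer is sent as `<name>:<milliseconds>|ms`, and 8125 is the conventional StatsD port. The class below is a hypothetical minimal client for timing metrics only. Production code would normally use an existing StatsD client library.

```python
import socket
import time
from contextlib import contextmanager
from typing import Iterator


class StatsdTimer:
    """Minimal StatsD client that only sends timing metrics.

    UDP is fire-and-forget, so a missing or slow StatsD daemon never
    blocks the request being measured.
    """

    def __init__(self, host: str = "127.0.0.1", port: int = 8125,
                 prefix: str = "") -> None:
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def timing(self, name: str, millis: float) -> None:
        packet = f"{self.prefix}{name}:{millis:.0f}|ms"
        try:
            self.sock.sendto(packet.encode("ascii"), self.addr)
        except OSError:
            pass  # metrics must never take down the caller

    @contextmanager
    def timer(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timing(name, (time.perf_counter() - start) * 1000)
```

Wrapping each MySQL and Memcache call in `timer(...)` gives the per-backend response times that the weekly review above relies on. The metric names are up to you.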

Page 45

Solution: Configuration Manager and Failover

• Provides load balancing and automatic connection reconfiguration

• When? 30+ caches / DBs

• One option: Intermediate load balancers

• Example: HAProxy, Nginx, Varnish

• Extra latency hop

• More complication

• Configuration hassle (1 LB / 7 services?)

Uptime & Latency

Page 46

Co-ordination

Solution: ZooKeeper

• Centralized configuration management

• Used for service discovery

• Notifies of service failures

• WATCH and its callback are pretty reliable

• Experiment framework
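The register-and-watch pattern can be sketched with an in-memory stand-in. The `ToyRegistry` class is hypothetical and is not the ZooKeeper or kazoo API. It only mimics two ideas: ephemeral registrations that disappear when their session ends, and watches that fire when a path's members change. Classic ZooKeeper watches are one-shot and must be re-armed; client recipes such as kazoo's `ChildrenWatch` handle that for you.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Watcher = Callable[[List[str]], None]


class ToyRegistry:
    """In-memory stand-in for a ZooKeeper-style service registry."""

    def __init__(self) -> None:
        # path -> {member: owning session}
        self._children: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._watchers: Dict[str, List[Watcher]] = defaultdict(list)

    def register(self, path: str, member: str, session: str) -> None:
        """A service announces itself under `path` (like an ephemeral node)."""
        self._children[path][member] = session
        self._notify(path)

    def expire_session(self, session: str) -> None:
        """Simulate a crashed server: drop every registration it owned."""
        for path, members in self._children.items():
            dead = [m for m, s in members.items() if s == session]
            for m in dead:
                del members[m]
            if dead:
                self._notify(path)

    def watch(self, path: str, callback: Watcher) -> None:
        self._watchers[path].append(callback)
        callback(self.members(path))  # deliver the current view immediately

    def members(self, path: str) -> List[str]:
        return sorted(self._children[path])

    def _notify(self, path: str) -> None:
        for cb in self._watchers[path]:
            cb(self.members(path))
```

An app would keep the latest member list from the callback and pick a backend from it, so a crashed service drops out of rotation as soon as its session expires.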

Page 48

Co-ordination

[Diagram: Services Register with ZooKeeper; the app sets a WATCH on ZooKeeper]


Page 49

MySQL Failover

Part 1: Configuration Manager and Failover

[Diagram: App; MySQL hosts A and B; ZooKeeper holds {"master": "A"}; readonly=True]

Page 50

MySQL Failover

Part 2: Configuration Manager and Failover

[Diagram: App; MySQL hosts A and B; ZooKeeper holds {"master": "B"}; readonly=True]

Page 51

MySQL Failover

Part 2: Configuration Manager and Failover

[Diagram: App; MySQL hosts A and B; ZooKeeper holds {"master": "B"}; readonly=False]
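The three failover diagrams suggest an ordering: point the configuration at B while B is still read-only, then make B writable. The sketch below encodes that order with in-memory stand-ins. `MySQLHost` and `fail_over` are hypothetical names, and fencing the old master is an assumed extra step that the slides do not show. On a real server the flag would be changed with `SET GLOBAL read_only = ON` (or `OFF`).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MySQLHost:
    """Stand-in for a MySQL server; read_only mirrors the server's flag."""
    name: str
    read_only: bool = True
    reachable: bool = True


def fail_over(config: Dict[str, str], hosts: Dict[str, MySQLHost],
              new_master: str) -> None:
    """Move the master role to `new_master` in the order the diagrams suggest."""
    old = hosts[config["master"]]
    # Assumed extra step: fence the old master if it is still up, so that
    # writes never land on both sides of the master-master pair.
    if old.reachable:
        old.read_only = True
    # 1. Flip the pointer in the configuration manager; apps watching it
    #    reconnect to the new master, which still rejects writes.
    config["master"] = new_master
    # 2. Only then let the new master accept writes.
    hosts[new_master].read_only = False
```

Flipping the pointer before enabling writes means that, for a brief window, writes fail on B instead of being accepted by two masters at once.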

Page 52

Challenge: Number of Connections Rising

• Initially, entire app tier connected to all Memcache, Redis, MySQL

• On Memcache...

• 20k connections * 10kB / connection = 195MB / Memcache

• 40 Memcaches means 7.6 GB used on connections

• Connection space is not allocated from slab memory!

• Can eventually push the Memcache process into swap

• On MySQL

• At least 256 kB / connection

Connections
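The memory figures above check out if kB and MB are read as binary units (KiB and MiB). Here is a quick check using only the numbers on the slide:

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3

connections = 20_000        # connections per Memcache (from the slide)
per_connection = 10 * KIB   # ~10 kB of buffers per connection (from the slide)
servers = 40                # number of Memcache servers (from the slide)

per_server = connections * per_connection
total = servers * per_server

print(f"per Memcache: {per_server / MIB:.0f} MiB")  # per Memcache: 195 MiB
print(f"across {servers}: {total / GIB:.1f} GiB")   # across 40: 7.6 GiB
```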

Page 53

Solution: Connection Pooling and Multiplexing

• Data Services, Nutcracker

• When? Once any service gets close to 10k connections

• Success: Memcache

• Once was >20k connections

• Now 1.3k connections

• But, aggressive fan-out causes...

• Network contention

• Incast congestion

Connections
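Nutcracker (twemproxy) handles pooling as a separate proxy process that multiplexes many client connections onto a few backend connections. The sketch below shows only the simpler in-process idea, a fixed-size pool. `ConnectionPool` and its `factory` argument are hypothetical, and any connection object works.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Fixed-size pool: callers borrow a connection and always return it.

    The pool size caps how many connections this process opens to the
    backend, however many threads want to use it.
    """

    def __init__(self, factory: Callable[[], C], size: int,
                 timeout: float = 1.0) -> None:
        self._idle: "queue.Queue[C]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front
        self._timeout = timeout

    @contextmanager
    def connection(self) -> Iterator[C]:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)
```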

Page 55

Memcache Failures

[Diagram: App and Nutcracker; the Ketama ring is adjusted when a Memcache node fails]
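Ketama is a consistent-hashing scheme. Each Memcache server is placed at many points on a hash ring, and a key goes to the next server point clockwise from the key's hash, so removing a failed server moves only that server's keys. The sketch below is a simplified ketama-style ring that roughly follows libketama: MD5 digests are split into four 32-bit points, giving 160 points per server. It is not Nutcracker's exact implementation, and the server names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


class KetamaRing:
    """Simplified ketama-style consistent hash ring."""

    HASHES_PER_SERVER = 40  # 40 MD5 digests x 4 points = 160 points per server

    def __init__(self, servers: List[str]) -> None:
        self._ring: Dict[int, str] = {}  # ring point -> server
        self._points: List[int] = []     # sorted ring points
        for server in servers:
            self.add(server)

    @staticmethod
    def _md5_points(label: str) -> List[int]:
        """Split an MD5 digest into four little-endian 32-bit ring points."""
        digest = hashlib.md5(label.encode("utf-8")).digest()
        return [int.from_bytes(digest[i:i + 4], "little") for i in range(0, 16, 4)]

    def add(self, server: str) -> None:
        for i in range(self.HASHES_PER_SERVER):
            for point in self._md5_points(f"{server}-{i}"):
                self._ring[point] = server
        self._points = sorted(self._ring)

    def remove(self, server: str) -> None:
        """Drop a failed server; only keys that mapped to it will move."""
        self._ring = {p: s for p, s in self._ring.items() if s != server}
        self._points = sorted(self._ring)

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        h = self._md5_points(key)[0]
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, h) % len(self._points)
        return self._ring[self._points[i]]
```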

Page 56

Finagle

• RPC for high concurrency

• Built at Twitter

• Completely asynchronous

• Previous experience with Finagle

• Lots of compatible libraries

• JVM

• Lots of bells and whistles: Ostrich, Zipkin, Iago

Why Java Over Python?

Page 57

How did you shard?

Page 58

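How We Sharded

[Diagram: shard databases db00001 to db04096, grouped into ranges of 512 per server]

· db00001, db00002, ... db00512

· db00513, db00514, ... db01024

· ...

· db03073, db03074, ... db03584

· db03585, db03586, ... db04096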

Page 59

High Availability

[Diagram: the same shard layout, db00001 through db04096]

Master Master Replication

Page 60

Increased Load on DB?

[Diagram: a server holding db00001 ... db00512 is split into one holding db00001 ... db00256 and another holding db00257 ... db00512]

Split your Shards
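A split like this works because logical shards map to physical hosts by range, so a split only changes that map. Below is a minimal sketch with a hypothetical `ShardMap` class and made-up host names. It does not cover how the shard databases themselves are copied to the new host.

```python
import bisect
from typing import List, NamedTuple


class ShardRange(NamedTuple):
    first: int  # first logical shard on the host (inclusive)
    last: int   # last logical shard on the host (inclusive)
    host: str


class ShardMap:
    """Maps logical shard IDs to physical hosts by contiguous ranges."""

    def __init__(self, ranges: List[ShardRange]) -> None:
        self.ranges = sorted(ranges)

    def host_for(self, shard_id: int) -> str:
        starts = [r.first for r in self.ranges]
        i = bisect.bisect_right(starts, shard_id) - 1
        if i < 0 or shard_id > self.ranges[i].last:
            raise KeyError(f"shard {shard_id} is not mapped")
        return self.ranges[i].host

    def split(self, host: str, new_host: str) -> None:
        """Move the upper half of `host`'s shard range to `new_host`.

        Only the mapping changes; shard IDs, and so object IDs, stay the same.
        """
        i = next(i for i, r in enumerate(self.ranges) if r.host == host)
        r = self.ranges[i]
        mid = (r.first + r.last) // 2
        self.ranges[i:i + 1] = [ShardRange(r.first, mid, host),
                                ShardRange(mid + 1, r.last, new_host)]
```

Splitting 1 to 512 this way yields 1 to 256 and 257 to 512, matching the slide.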

Page 61

ID Structure (64 bits)

Shard ID | Type | Local ID

· A lookup data structure maps each physical server to its shard ID range (cached by each app server process)

· Shard ID denotes which shard

· Type denotes object type (e.g., pins)

· Local ID denotes position in table
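The slide does not give the width of each field. The sketch below assumes 16 bits of shard ID, 10 bits of type, and 36 bits of local ID, which is 62 bits in total and leaves the top two bits unused. The widths and the `make_id` / `split_id` names are illustrative assumptions.

```python
from typing import Tuple

SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36  # assumed widths; top 2 bits unused


def make_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Pack shard ID, object type and local ID into one 64-bit integer."""
    if not (0 <= shard_id < 1 << SHARD_BITS
            and 0 <= type_id < 1 << TYPE_BITS
            and 0 <= local_id < 1 << LOCAL_BITS):
        raise ValueError("field out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(object_id: int) -> Tuple[int, int, int]:
    """Inverse of make_id: return (shard_id, type_id, local_id)."""
    shard_id = (object_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    type_id = (object_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    local_id = object_id & ((1 << LOCAL_BITS) - 1)
    return shard_id, type_id, local_id
```

Because the shard ID is embedded in the object ID, an app server can route a request using only its cached shard-to-server map, with no extra lookup.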

Page 62

Objects & Mappings

Object tables (e.g., pin, board, user, comment)

· Local ID → MySQL blob (JSON / serialized Thrift)

Mapping tables (e.g., user has boards, pin has likes)

· Full ID → Full ID (+ timestamp)

· Naming schema is noun_verb_noun

· Queries are PK or index lookups (no joins)

· Data DOES NOT MOVE

· All tables exist on all shards

· No schema changes required (index = new table)
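Here is a sketch of one object table and one noun_verb_noun mapping table. It uses Python's built-in `sqlite3` as a stand-in for a MySQL shard so it runs anywhere. The table and column names (`pins`, `board_has_pins`) are hypothetical.

```python
import json
import sqlite3
import time
from typing import List, Optional

# sqlite3 stands in for one MySQL shard; table and column names are hypothetical.
db = sqlite3.connect(":memory:")

# Object table: local ID -> opaque blob (JSON here).
db.execute("CREATE TABLE pins (local_id INTEGER PRIMARY KEY, data TEXT NOT NULL)")

# Mapping table named noun_verb_noun: full ID -> full ID, plus a timestamp.
db.execute("""CREATE TABLE board_has_pins (
    board_id INTEGER NOT NULL,
    pin_id INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    PRIMARY KEY (board_id, pin_id))""")


def save_pin(local_id: int, fields: dict) -> None:
    db.execute("REPLACE INTO pins (local_id, data) VALUES (?, ?)",
               (local_id, json.dumps(fields)))


def get_pin(local_id: int) -> Optional[dict]:
    # Primary-key lookup; the app, not the database, interprets the blob.
    row = db.execute("SELECT data FROM pins WHERE local_id = ?",
                     (local_id,)).fetchone()
    return json.loads(row[0]) if row else None


def add_pin_to_board(board_id: int, pin_id: int) -> None:
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    db.execute("INSERT OR IGNORE INTO board_has_pins VALUES (?, ?, ?)",
               (board_id, pin_id, int(time.time())))


def pins_on_board(board_id: int) -> List[int]:
    # Primary-key prefix scan, no joins; loading each pin is a separate lookup.
    rows = db.execute("SELECT pin_id FROM board_has_pins WHERE board_id = ? "
                      "ORDER BY pin_id", (board_id,))
    return [pin_id for (pin_id,) in rows]
```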

Page 63

Looking Forward

• Continually improve Pinner experience

• Better uptime and lower latency

• Help Pinners discover more of the things they love

• Reduce spam and abuse

• Continually collaborate and build bigger, better, faster products

• 140 Pinployees and beyond

• MySQL 5.6

What’s next?
