2TB of RAM ought to be enough for anybody PG Nordic Day 2014

May 10, 2015

Renaud Bruyeron

Online classifieds site Leboncoin.fr is one of the success stories of the French Web: one third of France's internet population uses the site each month. The growth has been spectacular and swift, and was made possible by a robust, high-performance software platform. At the heart of the platform is a large PostgreSQL infrastructure, part of it running on some of the largest PC-class hardware available. In this presentation, we show how we have grown our infrastructure. In particular, the amazing vertical scalability of PostgreSQL is showcased with hard numbers (IOPS, transactions per second, etc.). We also cover some of the hard lessons we have learned along the way, including near-disasters. Finally, we look at how innovative features from the PostgreSQL ecosystem enable new approaches to our scalability challenge.
Transcript
Page 1: Pg nordic-day-2014-2 tb-enough

2TB of RAM ought to be enough for anybody
PG Nordic Day 2014

Page 2: Pg nordic-day-2014-2 tb-enough

The Presenter

Renaud Bruyeron - @brew_your_own
Paris, France

Page 3: Pg nordic-day-2014-2 tb-enough

The Presenter – short bio

• 1998-1999: W3C @ MIT

• 1999-2013: FullSIX
  • Top 3 Web agency in FR
  • Strong focus on PHP, Java, and Oracle…

• 2013-present: CTO of

Page 4: Pg nordic-day-2014-2 tb-enough

Schibsted Classified Media
From 1 to 30+ countries in 7 years

Page 6: Pg nordic-day-2014-2 tb-enough

Templated deployment in 30 countries w/ shared technology

• Technology originally inherited from Blocket.se

• Has since evolved to power all 30 sites

• Focus on performance and ease of local modifications

[Architecture diagram: PostgreSQL; Middleware (data access, business rules); Index; Search; Presentation layer; APIs; Web. Language labels on the diagram: C, PHP, C/PHP, PL/SQL]

Page 7: Pg nordic-day-2014-2 tb-enough

PostgreSQL in the SCM Platform

100+ servers running PostgreSQL

8TB of data

50+ million classified ads

Page 8: Pg nordic-day-2014-2 tb-enough

Schibsted Classified Media & PostgreSQL: married… and in love

Page 9: Pg nordic-day-2014-2 tb-enough

#1 Classified Web site in France

Page 10: Pg nordic-day-2014-2 tb-enough

History

Project initiated in 2006

Site launch: early 2007

Based on technology

From 1 to 230 people, and from challenger to #1, in 7 years

Page 14: Pg nordic-day-2014-2 tb-enough

Explosive growth…with a few bumps along the way!

Page 15: Pg nordic-day-2014-2 tb-enough

Big Audience

250M page views / day

5M unique visitors / day

18M UV / month
That’s 1/3 of the French internet population…

600,000+ new ads / day
25M live ads

#7 most visited Website in France

Page 16: Pg nordic-day-2014-2 tb-enough

Big Ops

300+ servers in 2 DCs

20 servers hosting PG databases (in production)

Page 17: Pg nordic-day-2014-2 tb-enough

Built on SCM Technology

[Architecture diagram: PostgreSQL; Middleware (data access, business rules); Index; Search; Presentation layer; APIs; Web]

Page 18: Pg nordic-day-2014-2 tb-enough

Built on SCM Technology

[Architecture diagram: Website Data (Master; Slave, hot standby; Slave, short queries; Slave, long queries); Middleware (data access, business rules); Index; Search; Presentation layer; APIs; Web]

Page 19: Pg nordic-day-2014-2 tb-enough

Built on SCM Technology

[Architecture diagram: Website Data (Master; Slave, hot standby; Slave, short queries; Slave, long queries; +15 support DBs); Middleware (data access, business rules); Index; Search; Presentation layer; APIs; Web]

Page 20: Pg nordic-day-2014-2 tb-enough

We use PostgreSQL everywhere!

[Architecture diagram: Website Data (Master; Slave, hot standby; Slave, short queries; Slave, long queries; +15 support DBs); Backoffice & Analytics (ODS; OLAP; BI & CRM); Middleware (data access, business rules); Index; Search; Presentation layer; APIs; Web]

Page 21: Pg nordic-day-2014-2 tb-enough

We try to limit writes!

• Index/Search acting as a structured cache

• Master DB workload = 70% writes
  • Slaves used to offload read queries

• Main database = 6TB on disk…
  • +4TB archived away…

• 20K LOCs of PL/SQL

[Architecture diagram: Website Data (Master; Slave, hot standby; Slave, short queries; Slave, long queries; +15 support DBs); Index; Search]
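The split between a write-heavy master and read slaves for short and long queries can be illustrated with a small routing function. This is only a sketch of the idea, not the platform's actual middleware: the `Query` type, the `expected_ms` estimate, and the one-second threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE_SHORT = "slave_short_queries"
    SLAVE_LONG = "slave_long_queries"


@dataclass(frozen=True)
class Query:
    sql: str
    is_write: bool
    expected_ms: int  # hypothetical cost estimate supplied by the caller


# Assumed cut-off between "short" and "long" reads; not taken from the slides.
LONG_QUERY_THRESHOLD_MS = 1000


def route(query: Query) -> Role:
    """Send writes to the master and reads to one of the read slaves."""
    if query.is_write:
        return Role.MASTER
    if query.expected_ms >= LONG_QUERY_THRESHOLD_MS:
        return Role.SLAVE_LONG
    return Role.SLAVE_SHORT
```

The hot standby is deliberately left out of the routing table here, on the assumption that it is reserved for failover rather than for serving traffic.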

Page 22: Pg nordic-day-2014-2 tb-enough

PostgreSQL works beautifully as a DW!

[Architecture diagram: Website Data (Master; Slave, hot standby; Slave, short queries; Slave, long queries); Backoffice & Analytics (ODS; OLAP)]

ETL moving 500GB / day into the DW

2.2TB

Adding 6GB/day

Page 23: Pg nordic-day-2014-2 tb-enough

Big Iron: HP DL980, 2TB of RAM, 64-80 cores

Page 24: Pg nordic-day-2014-2 tb-enough

HTOP on the Master

Page 25: Pg nordic-day-2014-2 tb-enough

Physical storage for the main PostgreSQL instances

[Storage diagram: DC 1 and DC 2 each show two DL980 servers, Fusion-IO, and a 3Par V800 (SAN) attached over FiberChannel. Roles shown across the two DCs: Master, Slave, Slave, Slave]

Page 26: Pg nordic-day-2014-2 tb-enough

Big Iron: 3Par V800 SAN

Page 27: Pg nordic-day-2014-2 tb-enough

Big Iron: thin provisioning with mix of SSD and FC disks

Page 28: Pg nordic-day-2014-2 tb-enough

Big Iron: high performance…

Page 29: Pg nordic-day-2014-2 tb-enough

Bragging about it ;)

Page 30: Pg nordic-day-2014-2 tb-enough

Despite the caches and the search engine, we get impressive workloads on the master DB

600 tx/sec

Page 31: Pg nordic-day-2014-2 tb-enough

How did we get there?

2x DL980 w/ Fusion-IO

2x 3Par V800

2x DL980 w/ FC to the V800s

Page 32: Pg nordic-day-2014-2 tb-enough

We did go through growing pains and near disasters

The « Big One »

Page 33: Pg nordic-day-2014-2 tb-enough

Our own « worst day of our lives »: March 1st 2013 (1/2)

Master DB is slowing down dramatically

We find that Slony replication is the culprit

We don’t know what to do…

Until we find a solution on the net that involves cleaning up Slony metadata…

…(you know where this is going)…

We fumble. We notice. The slave is borked.

Rebuilding the Slave with Slony brings the Master down. Oh. God…

We take the Master off the stack, and start rebuilding the slave w/ Slony

…5 days later, we are done (!)…

Page 34: Pg nordic-day-2014-2 tb-enough

Our own « worst day of our lives »: March 1st 2013 (2/2)

…but we are not out of the woods yet!...

Pent up demand is bringing the site down!

We decide to switch to native replication!

…but the network cards are maxed out by the replication data…

…triggering a kernel bug…

…(Murphy, could you please step out of the room?)…

We implement network card bonding, and start moving support tables off the main instance

…and we are done!

Page 35: Pg nordic-day-2014-2 tb-enough

What’s Next?

Page 36: Pg nordic-day-2014-2 tb-enough

Vertical Scalability has limits!

We are already running on the biggest HW money can buy

Past certain volume levels, execution plans can change radically

Huge instances are difficult to maintain & back up safely
Rebuilding the slave in March 2013 took a full 5 days…

Although we are maxing out the HW available to us, especially on the writes/s, we are committed to PostgreSQL at the heart of our platform

Page 37: Pg nordic-day-2014-2 tb-enough

Key ideas to take our platform to the next level

Sharding: spread the reads & writes horizontally

Unbundling of the schema: move parts of the schema (that can be decoupled) to other instances. Spread the workload

PGQ: reduce application-level transaction interleaving by moving parts of the transactions to asynchronous workers
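Two of these ideas, sharding and deferring work to asynchronous workers, can be sketched in a few lines. The slides do not say how either would be implemented, so everything here is an assumption for illustration: the shard names, the choice of a hash on the ad id, and the `EventQueue` class, which is an in-memory stand-in for a queue and not the PGQ API.

```python
import hashlib
from collections import deque

# Hypothetical shard names; the slides do not say how data would be split.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]


def shard_for(ad_id: int, shards=SHARDS) -> str:
    """Map an ad id to a shard with a stable hash (same id, same shard)."""
    digest = hashlib.sha256(str(ad_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


class EventQueue:
    """In-memory stand-in for a queue table; this is not the PGQ API."""

    def __init__(self):
        self._events = deque()

    def enqueue(self, event: dict) -> None:
        # The web transaction only records the event and returns quickly.
        self._events.append(event)

    def next_batch(self, max_events: int) -> list:
        # A worker later pulls events in batches, in insertion order.
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def publish_ad(ad_id: int, queue: EventQueue) -> str:
    """Pick the target shard synchronously, defer the rest to a worker."""
    target = shard_for(ad_id)
    queue.enqueue({"type": "ad_published", "ad_id": ad_id, "shard": target})
    return target
```

A worker process would call `next_batch` in a loop and perform the slower follow-up steps outside the user-facing transaction. Note that a plain modulo over the shard count forces most keys to move when shards are added, which a real design would need to address.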
