Big Bad PostgreSQL: A Case Study. Moving a “large,” “complicated,” and mission-critical data warehouse from Oracle to PostgreSQL for cost control.

Moving ,” and - Percona · Moving a “large,” “ ... relatively low barrier to entry, almost anyone can register a domain name today and potentially ... Data Exporter OLTP warm

Apr 17, 2018


vankhue
Big Bad PostgreSQL: A Case Study

Moving a “large,” “complicated,” and mission-critical data warehouse from Oracle to PostgreSQL for cost control.

About the Speaker

Scalable Internet Architectures

With an estimated one billion users worldwide, the Internet today is nothing less than a global subculture with immense diversity, incredible size, and wide geographic reach. With a relatively low barrier to entry, almost anyone can register a domain name today and potentially provide services to people around the entire world tomorrow. But easy entry to web-based commerce and services can be a double-edged sword. In such a market, it is typically much harder to gauge interest in advance, and the negative impact of unexpected customer traffic can turn out to be devastating for the unprepared.

In Scalable Internet Architectures, renowned software engineer and architect Theo Schlossnagle outlines the steps and processes organizations can follow to build online services that can scale well with demand, both quickly and economically. By making intelligent decisions throughout the evolution of an architecture, scalability can be a matter of engineering rather than redesign, costly purchasing, or black magic.

Filled with numerous examples, anecdotes, and lessons gleaned from the author’s years of experience building large-scale Internet services, Scalable Internet Architectures is both thought-provoking and instructional. Readers are challenged to understand first, before they start a large project, how what they are building will be used, so that from the beginning they can design for scalability those parts which need to scale. With the right approach, it should take no more effort to design and implement a solution that scales than it takes to build something that will not, and if this is the case, Schlossnagle writes, respect yourself and build it right.


Theo Schlossnagle is a principal at OmniTI Computer Consulting, where he provides expert consulting services related to scalable Internet architectures, database replication, and email infrastructure. He is the creator of the Backhand Project and the Ecelerity MTA, and spends most of his time solving the scalability problems that arise in high performance and highly distributed systems.

• Principal @ OmniTI

• Open Source

mod_backhand, spreadlogd, OpenSSH+SecurID, Daiquiri, Wackamole, libjlog, Spread, Reconnoiter, etc.

• Closed Source

Message Systems MTA, Message Central

• Author

Scalable Internet Architectures

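Overall Architecture

[Diagram. The box-to-label mapping did not survive extraction; labels are listed in extraction order.]

• Systems: Datawarehouse, Log Importer, Data Exporter, OLTP warm backup, OLTP

• Database and storage labels: Oracle 8i (0.5 TB Hitachi, 0.25 TB JBOD); Oracle 8i (0.75 TB JBOD); MySQL 4.1 (1.2 TB IDE RAID); MySQL log importer (1.2 TB SATA RAID); Oracle 8i (0.5 TB Hitachi, 1.5 TB MTI)

• Role labels: OLTP instance: drives the site; Warm spare; bulk selects / data exports; Log import and processing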

Database Situation

• The problems:

• The database is growing.

• The OLTP and ODS/warehouse are too slow.

• A lot of application code against the OLTP system.

• Minimal application code against the ODS system.

• Oracle:

• Licensed per processor.

• Really, really, really expensive on a large scale.

• PostgreSQL:

• No licensing costs.

• Good support for complex queries.

Database Choices

• Must keep Oracle on OLTP

• Complex, Oracle-specific web application.

• Need more processors.

• ODS: Oracle not required.

• Complex queries from limited sources.

• Needs more space and power.

• Result:

• Move ODS Oracle licenses to OLTP

• Run PostgreSQL on ODS

PostgreSQL gotchas

• For an OLTP system that does thousands of updates per second, vacuuming is a hassle.

• No upgrades?!

• Less community experience with large databases.

• Replication features less evolved.

PostgreSQL ♥ ODS

• Mostly inserts.

• Updates/Deletes controlled, not real-time.

• pl/perl (leverage DBI/DBD for remote database connectivity).

• Monster queries.

• Extensible.

Choosing Linux

• Popular, liked, good community support.

• Chronic problems:

• kernel panics

• filesystems remounting read-only

• filesystems don’t support snapshots

• LVM is clunky on enterprise storage

• 20 outages in 4 months

Choosing Solaris 10

• Switched to Solaris 10

• No crashes, better system-level tools.

• prstat, iostat, vmstat, smf, fault-management.

• ZFS

• snapshots (persistent), BLI backups.

• Excellent support for enterprise storage.

• DTrace.

• Free (too).

Oracle features we need

• Partitioning

• Statistics and Aggregations

• rank over partition, lead, lag, etc.

• Large selects (100GB)

• Autonomous transactions

• Replication from Oracle (to Oracle)

Partitioning

For large data sets:

pgods=# select count(1) from ods.ods_tblpick_super;
   count
------------
 1790994512
(1 row)

• Next biggest tables: 850m, 650m, 590m

• Allows us to cluster data over specific ranges (by date in our case)

• Simple, cheap archiving and removal of data.

• Can put ranges used less often in different tablespaces (slower, cheaper storage)

Partitioning PostgreSQL style

• PostgreSQL doesn’t support partitioning...

• It supports inheritance... (what’s this?)

• some crazy object-relational paradigm.

• We can use it to implement partitioning:

• One master table with no rows.

• Child tables that have our partition constraints.

• Rules on the master table for insert/update/delete.
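
The three pieces listed above (an empty master, constrained children, and rules on the master) can be sketched in SQL as follows. This is a minimal illustration under stated assumptions, not the schema from the talk: the table `picks`, its columns, and the monthly ranges are all hypothetical. Only INSERT rules are shown; UPDATE and DELETE rules would follow the same pattern. PostgreSQL 10 and later also offer declarative partitioning, which postdates this approach.

```sql
-- Hypothetical master table: holds no rows itself, only defines the shape.
CREATE TABLE picks (
    pick_id   bigint    NOT NULL,
    pick_time timestamp NOT NULL,
    payload   text
);

-- Child tables inherit the columns and carry the partition constraint.
CREATE TABLE picks_2007_01 (
    CHECK (pick_time >= '2007-01-01' AND pick_time < '2007-02-01')
) INHERITS (picks);

CREATE TABLE picks_2007_02 (
    CHECK (pick_time >= '2007-02-01' AND pick_time < '2007-03-01')
) INHERITS (picks);

-- Rules on the master redirect each insert to the matching child.
CREATE RULE picks_insert_2007_01 AS
    ON INSERT TO picks
    WHERE (NEW.pick_time >= '2007-01-01' AND NEW.pick_time < '2007-02-01')
    DO INSTEAD
    INSERT INTO picks_2007_01 (pick_id, pick_time, payload)
    VALUES (NEW.pick_id, NEW.pick_time, NEW.payload);

CREATE RULE picks_insert_2007_02 AS
    ON INSERT TO picks
    WHERE (NEW.pick_time >= '2007-02-01' AND NEW.pick_time < '2007-03-01')
    DO INSTEAD
    INSERT INTO picks_2007_02 (pick_id, pick_time, payload)
    VALUES (NEW.pick_id, NEW.pick_time, NEW.payload);

-- Applications keep writing to the master.
INSERT INTO picks VALUES (1, '2007-01-15 10:00', 'a');
INSERT INTO picks VALUES (2, '2007-02-03 09:30', 'b');

-- Reading the master scans the children as well.
SELECT count(*) FROM picks;

-- ONLY restricts the scan to the master itself, which should stay empty.
SELECT count(*) FROM ONLY picks;
```

A row whose `pick_time` matches none of the rules would land in the master table itself, so in practice new partitions and their rules need to exist before data for that range arrives.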

Partitioning PostgreSQL realized

• Cheaply add new empty partitions

• Cheaply remove old partitions

• Migrate less-often-accessed partitions to slower storage

• Different index strategies per partition

• PostgreSQL 8.1 and later supports constraint checking (constraint exclusion) on inherited tables.

• smarter planning

• smarter executing
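
The maintenance operations listed above might look like the following. This is a sketch under assumptions: the `events` table, its columns, the partition names, and the `slow_disks` tablespace are all hypothetical, and `ALTER TABLE ... NO INHERIT` requires PostgreSQL 8.2 or later.

```sql
-- Hypothetical master and two existing monthly partitions.
CREATE TABLE events (
    event_id   bigint NOT NULL,
    event_date date   NOT NULL
);

CREATE TABLE events_2007_01 (
    CHECK (event_date >= DATE '2007-01-01' AND event_date < DATE '2007-02-01')
) INHERITS (events);

CREATE TABLE events_2007_02 (
    CHECK (event_date >= DATE '2007-02-01' AND event_date < DATE '2007-03-01')
) INHERITS (events);

-- Cheaply add a new empty partition: catalog work only, no data is moved.
CREATE TABLE events_2007_03 (
    CHECK (event_date >= DATE '2007-03-01' AND event_date < DATE '2007-04-01')
) INHERITS (events);

-- Indexes are per child, so each partition can have its own strategy.
CREATE INDEX events_2007_03_date_idx ON events_2007_03 (event_date);

-- Cheaply remove an old partition: detach it, then drop (or archive) it.
ALTER TABLE events_2007_01 NO INHERIT events;
DROP TABLE events_2007_01;

-- Migrate a less-often-accessed partition to slower storage.
-- Commented out because the tablespace name is hypothetical and must exist:
-- ALTER TABLE events_2007_02 SET TABLESPACE slow_disks;

-- Let the planner use the CHECK constraints to skip irrelevant children.
SET constraint_exclusion = on;

EXPLAIN
SELECT count(*)
  FROM events
 WHERE event_date >= DATE '2007-03-01'
   AND event_date <  DATE '2007-03-15';
```

With constraint exclusion enabled, the plan should mention the master and `events_2007_03` but not `events_2007_02`, since the February CHECK constraint contradicts the WHERE clause.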

RANK OVER PARTITION

• In Oracle:

select userid, email
  from ( select u.userid, u.email,
                row_number() over (partition by u.email order by userid desc) as position
           from (...) )
 where position = 1

• In PostgreSQL:

-- v_last_email must start out non-NULL (for example '') or the first comparison never succeeds
FOR v_row IN select u.userid, u.email from (...) order by email, userid desc
LOOP
  IF v_row.email != v_last_email THEN
    RETURN NEXT v_row;
    v_last_email := v_row.email;
    v_rownum := v_rownum + 1;
  END IF;
END LOOP;

With 8.4, we have windowing functions
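
For comparison, here is a self-contained sketch of both approaches: the loop-based workaround wrapped in a complete PL/pgSQL function, and the window-function query that PostgreSQL 8.4 and later accept directly. The table `users_demo` and the function name are hypothetical stand-ins for the elided `(...)` subquery. `IS DISTINCT FROM` is a substitution for the slide's `!=`, used here so that the initial NULL needs no special handling.

```sql
-- Hypothetical data standing in for the elided subquery.
CREATE TABLE users_demo (
    userid integer PRIMARY KEY,
    email  text NOT NULL
);

INSERT INTO users_demo VALUES (1, 'a@example.com');
INSERT INTO users_demo VALUES (2, 'b@example.com');
INSERT INTO users_demo VALUES (3, 'a@example.com');

-- Loop-based approach: walk rows sorted by email, keep the first row per email.
CREATE OR REPLACE FUNCTION latest_user_per_email()
RETURNS SETOF users_demo AS $$
DECLARE
    v_row        users_demo%ROWTYPE;
    v_last_email text;  -- NULL until the first row has been emitted
BEGIN
    FOR v_row IN
        SELECT userid, email FROM users_demo ORDER BY email, userid DESC
    LOOP
        -- IS DISTINCT FROM treats NULL as an ordinary, comparable value.
        IF v_row.email IS DISTINCT FROM v_last_email THEN
            RETURN NEXT v_row;
            v_last_email := v_row.email;
        END IF;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM latest_user_per_email();

-- Window-function approach (PostgreSQL 8.4 and later): same result set.
SELECT userid, email
  FROM ( SELECT userid, email,
                row_number() OVER (PARTITION BY email ORDER BY userid DESC) AS rn
           FROM users_demo ) AS ranked
 WHERE rn = 1;
```

Both queries should return the highest `userid` for each distinct `email`. Row order from the second query is not guaranteed without an outer ORDER BY.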

Large SELECTs

• Application code does:

select u.*, b.browser, m.lastmess
  from ods.ods_users u,
       ods.ods_browsers b,
       ( select userid, min(senddate) as senddate
           from ods.ods_maillog
          group by userid ) m,
       ods.ods_maillog l
 where u.userid = b.userid
   and u.userid = m.userid
   and u.userid = l.userid
   and l.senddate = m.senddate;

• The width of these rows is about 2k

• 50 million row return set

• > 100 GB of data

The Large SELECT Problem

• libpq will buffer the entire result in memory.

• This affects language bindings (DBD::Pg).

• This is an utterly deficient default behavior.

• This can be avoided by using cursors

• Requires the app to be PostgreSQL specific.

• You open a cursor.

• Then FETCH the row count you desire.

Page 82

Big SELECTs the Postgres way

The previous “big” query becomes:

DECLARE bigdump CURSOR FOR
select u.*, b.browser, m.senddate as lastmess
  from ods.ods_users u,
       ods.ods_browsers b,
       ( select userid, min(senddate) as senddate
           from ods.ods_maillog
          group by userid ) m,
       ods.ods_maillog l
 where u.userid = b.userid
   and u.userid = m.userid
   and u.userid = l.userid
   and l.senddate = m.senddate;

Then, in a loop:

FETCH FORWARD 10000 FROM bigdump;
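
The DECLARE / FETCH loop can be driven from application code. Below is a minimal Python sketch, assuming a DB-API 2.0 cursor on a PostgreSQL connection that is in autocommit mode. The function name fetch_in_batches and its parameters are illustrative and not from the talk. The explicit BEGIN is there because a cursor declared without WITH HOLD exists only inside a transaction block.

```python
from typing import Any, Iterator, Sequence


def fetch_in_batches(cur: Any, query: str, name: str = "bigdump",
                     batch_size: int = 10000) -> Iterator[Sequence[Any]]:
    """Yield the rows of `query` one by one, pulling them from the
    server `batch_size` rows at a time through a named cursor.

    Assumption: `cur` is a DB-API 2.0 cursor on a PostgreSQL
    connection in autocommit mode, so BEGIN/ROLLBACK below really
    delimit the transaction.
    """
    cur.execute("BEGIN")
    cur.execute(f"DECLARE {name} CURSOR FOR {query}")
    try:
        while True:
            cur.execute(f"FETCH FORWARD {batch_size} FROM {name}")
            rows = cur.fetchall()
            if not rows:          # an empty batch means the cursor is exhausted
                break
            yield from rows
    finally:
        # The work is read-only, so ROLLBACK is as good as COMMIT here.
        # It also works after an error and closes the cursor.
        cur.execute("ROLLBACK")
```

Only one batch is held in client memory at a time, which is the point of the exercise: the client never buffers the full 50 million row result.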

Page 87

Autonomous Transactions

• In Oracle we have over 2000 custom stored procedures.

• During these procedures, we like to:

• COMMIT incrementally. Useful for long transactions (update/delete) that need not be atomic.

• Start a new top-level txn that can COMMIT. Useful for logging progress in a stored procedure, so that you know how far you progressed and how long each step took even if it rolls back.

Page 91

PostgreSQL shortcoming

• PostgreSQL simply does not support autonomous transactions and, to quote core developers, “that would be hard.”

• When in doubt, use brute force.

• Use pl/perl to use DBD::Pg to connect to ourselves (a new backend) and execute a new top-level transaction.
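
The talk does this in pl/perl with DBD::Pg. The same brute-force idea can be sketched in Python, purely for illustration: open a second connection to the same database, write the progress row there, and commit it independently of the caller's transaction. The connect factory, the progress_log table, and the %s placeholder style (as used by psycopg2) are all assumptions and not part of the original setup.

```python
from typing import Any, Callable


def log_progress(connect: Callable[[], Any], step: str) -> None:
    """Record `step` in its own top-level transaction.

    Assumptions: `connect` is a hypothetical factory returning a NEW
    DB-API 2.0 connection to the same database, and `progress_log`
    is a hypothetical table with a single text column `step`.
    """
    conn = connect()      # a new backend, separate from the caller's session
    try:
        cur = conn.cursor()
        cur.execute("INSERT INTO progress_log (step) VALUES (%s)", (step,))
        conn.commit()     # survives even if the caller later rolls back
    finally:
        conn.close()
```

The cost is a fresh connection per call, which is why the slide calls it brute force.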

Page 98

Replication

• Cross-vendor database replication isn’t too difficult.

• Helps a lot when you can do it inside the database.

• Using dbi-link (based on pl/perl and DBI) we can.

• We can connect to any remote database.

• INSERT into local tables directly from remote SELECT statements. [snapshots]

• LOOP over remote SELECT statements and process them row-by-row. [replaying remote DML logs]
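
The two techniques in the list, snapshots and DML log replay, can be sketched outside the database as well. The Python below is a toy illustration and not dbi-link: two in-memory SQLite databases stand in for the remote (Oracle) and local (PostgreSQL) sides, and the users table and the dml_log layout (seq, tbl, op, id, name) are hypothetical.

```python
import sqlite3


def snapshot(remote, local, table):
    """Snapshot: replace the local table with the result of a remote SELECT."""
    rows = remote.execute(f"SELECT id, name FROM {table}").fetchall()
    with local:  # one local transaction: committed on success
        local.execute(f"DELETE FROM {table}")
        local.executemany(
            f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)


def replay(remote, local, table, last_seq):
    """DML replay: apply remote log entries newer than `last_seq`,
    row by row, and return the new high-water mark."""
    log = remote.execute(
        "SELECT seq, op, id, name FROM dml_log "
        "WHERE tbl = ? AND seq > ? ORDER BY seq", (table, last_seq))
    with local:
        for seq, op, row_id, name in log:
            if op == "I":
                local.execute(
                    f"INSERT INTO {table} (id, name) VALUES (?, ?)",
                    (row_id, name))
            elif op == "U":
                local.execute(
                    f"UPDATE {table} SET name = ? WHERE id = ?",
                    (name, row_id))
            elif op == "D":
                local.execute(
                    f"DELETE FROM {table} WHERE id = ?", (row_id,))
            last_seq = seq
    return last_seq


# Toy setup: both sides have a users table; the remote also keeps a DML log.
remote = sqlite3.connect(":memory:")
local = sqlite3.connect(":memory:")
for db in (remote, local):
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
remote.execute("CREATE TABLE dml_log (seq INTEGER PRIMARY KEY, "
               "tbl TEXT, op TEXT, id INTEGER, name TEXT)")
remote.executemany("INSERT INTO users VALUES (?, ?)",
                   [(1, "ann"), (2, "bob")])

snapshot(remote, local, "users")

remote.executemany(
    "INSERT INTO dml_log (tbl, op, id, name) VALUES ('users', ?, ?, ?)",
    [("I", 3, "cat"), ("U", 1, "anne"), ("D", 2, None)])
mark = replay(remote, local, "users", 0)
result = local.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

A snapshot is simple but copies everything each time, while replay moves only the changes and needs the source to record them, so the choice per table is a trade-off between table size and change rate.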

Page 101

Replication (really)

• Through a combination of snapshotting and DML replay we:

• replicate into over 2000 tables in PostgreSQL from Oracle

• snapshot replication of 200 tables

• DML replay logs for 1800 tables

• PostgreSQL to Oracle is a bit harder

• out-of-band exports and imports

Page 102

New Architecture

• Master: Sun v890 and Hitachi AMS + warm standby running Oracle (1TB)

• Logs: several customs running MySQL instances (2TB each)

• ODS BI: 2x Sun v40 running PostgreSQL 8.3 (6TB on Sun JBODs on ZFS each)

• ODS archive: 2x custom running PostgreSQL 8.3 (14TB internal storage on ZFS each)

Page 103

PostgreSQL is Lacking

• No upgrades (AYFKM).

• pg_dump is too intrusive.

• Poor system-level instrumentation.

• Poor methods to determine specific contention.

• It relies on the operating system’s filesystem cache (which makes PostgreSQL inconsistent across its supported OS base).

Page 104

Enter Solaris

• Solaris is a UNIX from Sun Microsystems.

• Is it different than other UNIX/UNIX-like systems?

• Mostly it isn’t different (hence the term UNIX)

• It does have extremely strong ABI backward compatibility.

• It’s stable and works well on large machines.

• Solaris 10 shakes things up a bit:

• DTrace

• ZFS

• Zones

Page 105

Solaris / ZFS

• ZFS: Zettabyte Filesystem.

• 2^64 snapshots, 2^48 files/directory, 2^64 bytes/filesystem, 2^78 (256 ZiB) bytes in a pool, 2^64 devices/pool, 2^64 pools/system

• Extremely cheap differential backups.

• I have a 5 TB database, I need a backup!

• No rollback in your database? What is this? MySQL?

• No rollback in your filesystem?

• ZFS has snapshots, rollback, clone and promote.

• OMG! Life altering features.

• Caveat: ZFS is slower than alternatives, by about 10% with tuning.

Page 106

Solaris / Zones

• Zones: Virtual Environments.

• Shared kernel.

• Can share filesystems.

• Segregated processes and privileges.

• No big deal for databases, right?

But Wait!

Page 107

Solaris / ZFS + Zones = Magic Juju

• ZFS snapshot, clone, delegate to zone, boot and run.

• When done, halt zone, destroy clone.

• We get a point-in-time copy of our PostgreSQL database:

• read-write,

• low disk-space requirements,

• NO LOCKS! Welcome back pg_dump, you don’t suck (as much) anymore.

• Fast snapshot to usable copy time:

• On our 20 GB database: 1 minute.

• On our 1.2 TB database: 2 minutes.

https://labs.omniti.com/trac/pgsoltools/browser/trunk/pitr_clone/clonedb_startclone.sh
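
The linked script is the real implementation. As a rough illustration of the snapshot, clone, delegate, boot sequence and the halt, destroy teardown, here is a Python sketch that only builds the command lines and does not run them. The dataset, zone, and tag names are hypothetical, the clone naming scheme is an assumption, and the zone is assumed to be already configured and installed.

```python
from typing import List


def clone_plan(dataset: str, zone: str, tag: str) -> List[List[str]]:
    """Commands to get a point-in-time, read-write copy into a zone.

    Assumptions: `dataset` holds the whole PostgreSQL data directory,
    `zone` already exists and is halted, and the clone is named
    `<dataset>_<tag>` (a made-up convention for this sketch).
    """
    snap = f"{dataset}@{tag}"
    clone = f"{dataset}_{tag}"
    return [
        ["zfs", "snapshot", snap],            # atomic point-in-time image
        ["zfs", "clone", snap, clone],        # writable copy, shares blocks
        ["zonecfg", "-z", zone,               # delegate the clone to the zone
         f"add dataset; set name={clone}; end"],
        ["zoneadm", "-z", zone, "boot"],
    ]


def teardown_plan(dataset: str, zone: str, tag: str) -> List[List[str]]:
    """Commands for "when done, halt zone, destroy clone"."""
    clone = f"{dataset}_{tag}"
    return [
        ["zoneadm", "-z", zone, "halt"],
        ["zfs", "destroy", clone],
    ]
```

Each inner list could be handed to subprocess.run on a Solaris host. Keeping the plan as data makes it easy to review before anything touches a production pool.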

Page 108

ZFS: how I saved my soul.

• Database crash. Bad. 1.2 TB of data... busted. (The reason Robert Treat looks a bit older than he should.)

• xlogs corrupted. catalog indexes corrupted.

• Fault? PostgreSQL bug? Bad memory? Who knows?

• Trial & error on a 1.2 TB data set is a cruel experience.

• In real life, most recovery actions are destructive actions.

• PostgreSQL is no different.

• Rollback to last checkpoint (ZFS), hack postgres code, try, fail, repeat.

Page 109

Let DTrace open your eyes

• DTrace: Dynamic Tracing

• Dynamically instrument “stuff” in the system:

• system calls (like strace/truss/ktrace).

• process/scheduler activity (on/off cpu, semaphores, conditions).

• see signals sent and received.

• trace kernel functions, networking.

• watch I/O down to the disk.

• user-space processes, each function... each machine instruction!

• Add probes into apps where it makes sense to you.

Page 110

Can you see what I see?

• There is EXPLAIN... when that isn’t enough...

• There is EXPLAIN ANALYZE... when that isn’t enough.

• There is DTrace.

; dtrace -q -n '
postgresql*:::statement-start
{
    self->query = copyinstr(arg0);
    self->ok = 1;
}
io:::start
/self->ok/
{
    @[self->query,
      args[0]->b_flags & B_READ ? "read" : "write",
      args[1]->dev_statname] = sum(args[0]->b_bcount);
}'
dtrace: description 'postgres*:::statement-start' matched 14 probes
^C

select count(1) from c2w_ods.tblusers where zipcode between 10000 and 11000;
    read sd1 16384
select division, sum(amount), avg(amount) from ods.billings where txn_timestamp between '2006-01-01 00:00:00' and '2006-04-01 00:00:00' group by division;
    read sd2 71647232

Page 111

OmniTI Labs / pgsoltools

• https://labs.omniti.com/trac/pgsoltools

• Where we stick our PostgreSQL on Solaris goodies...

• like pg_file_stress

FILENAME/DBOBJECT                                    READS                   WRITES
                                                 #   min   avg   max     #   min   avg   max
alldata1__idx_remove_domain_external             1    12    12    12   398     0     0     0
slowdata1__pg_rewrite                            1    12    12    12     0     0     0     0
slowdata1__pg_class_oid_index                    1     0     0     0     0     0     0     0
slowdata1__pg_attribute                          2     0     0     0     0     0     0     0
alldata1__mv_users                               0     0     0     0     4     0     0     0
slowdata1__pg_statistic                          1     0     0     0     0     0     0     0
slowdata1__pg_index                              1     0     0     0     0     0     0     0
slowdata1__pg_index_indexrelid_index             1     0     0     0     0     0     0     0
alldata1__remove_domain_external                 0     0     0     0   502     0     0     0
alldata1__promo_15_tb_full_2                    19     0     0     0    11     0     0     0
slowdata1__pg_class_relname_nsp_index            2     0     0     0     0     0     0     0
alldata1__promo_177intaoltest_tb                 0     0     0     0  1053     0     0     0
slowdata1__pg_attribute_relid_attnum_index       2     0     0     0     0     0     0     0
alldata1__promo_15_tb_full_2_pk                  2     0     0     0     0     0     0     0
alldata1__all_mailable_2                      1403     0     0   423     0     0     0     0
alldata1__mv_users_pkey                          0     0     0     0     4     0     0     0

Page 117

Results

• Move ODS Oracle licenses to OLTP

• Run PostgreSQL on ODS

• Save $800k in license costs.

• Spend $100k in labor costs.

• Learn a lot.

Page 118

Thanks!

• Thank you.

• http://omniti.com/does/postgresql

• We’re hiring, but only if you love:

• lots of data on lots of disks on lots of big boxes

• smart people

• hard problems

• more than one database technology (including PostgreSQL)

• responsibility