Honey, I Shrunk the Database

For Test and Development Environments

Postgres Open, September 2011

Vanessa Hurst, Paperless Post

User Data

Why Shrink?

Accuracy

You don’t truly know how your app will behave in production unless you use real data.

Production data is the ultimate in accuracy.

Freshness

New data should be available regularly.

Full database refreshes should be timely.

Resource Limitations

Staging and developer machines cannot handle production load.

Data Protection

Limit spread of sensitive user or client data.

Case Study: Paperless Post

Requirements
Freshness – Daily, on command for non-developers
Shrinkage – Slices, Mutations

Resources
Source – extra disk space, RAM, and CPUs
Destination – limited, often entirely unoptimized
Development – constrained DBA resources

Shrink Strategies

Copies

Restored backups or live replicas of entire production database

Slices

Select portions of exact data

Mutations

Sanitized, anonymized, or otherwise changed data

Assumptions

Seed databases, fixtures, test data

Slices

Vertical Slice
Difficult to obtain a valid, useful subset of data.
Example: Include some entire tables, exclude others

Horizontal Slice
Difficult to write and maintain.
Example: SQL or application code to determine subset of data

PG Tools – Vertical Slice

Flexibility at Source (Production)

pg_dump
Include data only [-a --data-only]
Include table schema only [-s --schema-only]
Select tables [-t table1 -t table2 --table=table1 --table=table2]
Select schemas [-n schema --schema=schema]
Exclude schemas [-N schema --exclude-schema=schema]
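
The pg_dump switches listed above can be assembled programmatically. The following is a minimal sketch, not the speaker's tooling: `build_pg_dump_command` and its parameters are hypothetical names, and only the emitted flags are real pg_dump options. It builds an argument list and prints it rather than running anything.

```python
import shlex


def build_pg_dump_command(database, tables=(), schemas=(), exclude_schemas=(),
                          data_only=False, schema_only=False):
    """Return a pg_dump argument list for a vertical slice.

    Hypothetical helper: the function and parameter names are illustrative,
    only the flags it emits are real pg_dump options.
    """
    if data_only and schema_only:
        raise ValueError("--data-only and --schema-only cannot be combined")
    args = ["pg_dump"]
    if data_only:
        args.append("--data-only")
    if schema_only:
        args.append("--schema-only")
    for schema in schemas:
        args.append(f"--schema={schema}")
    for schema in exclude_schemas:
        args.append(f"--exclude-schema={schema}")
    for table in tables:
        # pg_dump expects one --table switch per table.
        args.append(f"--table={table}")
    args.append(database)
    return args


# Data-only dump of one static table from the "public" schema of db-01.
command = build_pg_dump_command("db-01", tables=["cards"], schemas=["public"],
                                data_only=True)
print(shlex.join(command))
```

Returning a list instead of a string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.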

Flexibility at Destination (Staging, Development)

pg_restore
Include data only [-a --data-only]
Select indexes [-I index --index=index]
Tune processing [-j number-of-jobs --jobs=number-of-jobs]
Select schemas [-n schema --schema=schema]
Select triggers [-T trigger --trigger=trigger]
Exclude privileges [-x --no-privileges --no-acl]

Mutations

External Data Protection
HIPAA Regulations
PCI Compliance
API Terms of Use

Internal Data Protection
Protecting your users’ personal data
Protecting your users from accidents, e.g. staging emails
Your Terms of Service

User Data

Case Study: Paperless Post

Composite Slice including

Vertical Slice – All application object schemas

Vertical Slice – Entire tables of static content

Horizontal Slice – Subset of users and their data

Mutation – Changed user email addresses

Vertical Slice – All application object schemas

pg_dump --clean --schema-only --schema public db-01 > slice.sql

Vertical Slice – Entire tables of static content

pg_dump --data-only --schema public -t cards db-01 >> slice.sql

Horizontal Slice – Subset of users and their data

Mutation – Changed user email addresses

CREATE SCHEMA staging;

Horizontal Slice
Custom SQL

SELECT * INTO staging.users
FROM users
WHERE EXISTS (subset of users);

Dynamic relative to full data set or newly created slice

SELECT * INTO staging.stuff
FROM stuff
WHERE EXISTS (stuff per staging.users);
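
To make the two-step horizontal slice concrete, here is a small runnable sketch. It makes several assumptions: it uses Python's built-in SQLite rather than PostgreSQL, so `CREATE TABLE ... AS SELECT` stands in for `SELECT ... INTO staging.*`, and the table columns and the "admins plus user 3" subset rule are invented for illustration. The point it shows is that the second slice is defined relative to the first, so dependent rows follow the chosen users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Tiny stand-in for the production tables (columns are hypothetical).
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_admin INTEGER);
    CREATE TABLE stuff (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'a@example.com', 1),
                             (2, 'b@example.com', 0),
                             (3, 'c@example.com', 0);
    INSERT INTO stuff VALUES (10, 1, 'kept'), (11, 2, 'dropped'), (12, 3, 'kept');

    -- Slice 1: pick the subset of users (hypothetical rule: admins plus user 3).
    CREATE TABLE staging_users AS
        SELECT * FROM users u WHERE u.is_admin = 1 OR u.id = 3;

    -- Slice 2: defined relative to the slice just created, not the full data set.
    CREATE TABLE staging_stuff AS
        SELECT * FROM stuff s
        WHERE EXISTS (SELECT 1 FROM staging_users su WHERE su.id = s.user_id);
""")

sliced_user_ids = [row[0] for row in conn.execute("SELECT id FROM staging_users ORDER BY id")]
sliced_stuff_ids = [row[0] for row in conn.execute("SELECT id FROM staging_stuff ORDER BY id")]
print(sliced_user_ids, sliced_stuff_ids)
```

Keying every dependent table off the already-sliced users keeps foreign-key relationships intact without repeating the user-selection rule in each query.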

Mutations

Email Addresses
Use regular expressions to clean non-admin addresses, e.g. [email protected] => [email protected]

Cached Data
Clear cached short link from link-shortening API
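
One way the email mutation might look is sketched below. The exact rewrite rule used at Paperless Post is not given here, so everything specific is an assumption: the admin domain, the catch-all mailbox, and the choice to fold the original address into a "+" suffix so that rewritten addresses stay unique per user.

```python
import re

# All three values are hypothetical placeholders, not taken from the talk.
ADMIN_DOMAIN = "example.com"      # addresses at this domain are left untouched
STAGING_MAILBOX = "staging"       # catch-all mailbox that receives staging mail
STAGING_DOMAIN = "example.com"


def sanitize_email(address):
    """Rewrite a non-admin address so staging mail never reaches a real user."""
    local, _, domain = address.rpartition("@")
    if domain.lower() == ADMIN_DOMAIN:
        return address
    # Collapse every run of non-alphanumeric characters into one underscore,
    # keeping the original address recognizable and unique.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", f"{local}_{domain}").strip("_")
    return f"{STAGING_MAILBOX}+{safe}@{STAGING_DOMAIN}"


print(sanitize_email("jane.doe@gmail.com"))
print(sanitize_email("ops@example.com"))
```

In PostgreSQL itself the same idea could be applied in a single UPDATE using `regexp_replace`, which avoids pulling user rows out of the database.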

Horizontal Slice – Subset of users and their data

Mutation – Changed user email addresses

pg_dump --data-only --schema staging db-01 >> slice.sql

Rebuild
Prepare new database as standby
Gracefully close connections
Rotate by renaming databases

Security
Dedicated database build user
Membership in application user role
Application user role & privileges remain
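
The rebuild steps above (standby, close connections, rotate by renaming) could be scripted roughly as follows. This is a sketch under stated assumptions, not the speaker's script: the helper names and the database names are hypothetical, the `pid` column of `pg_stat_activity` assumes PostgreSQL 9.2 or later (older releases call it `procpid`), and the code only generates SQL text rather than executing it.

```python
def quote_ident(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def rotation_statements(live, standby, retired):
    """Return SQL that swaps `standby` in for `live`, keeping the old copy as `retired`.

    Run these while connected to a different database (for example postgres):
    PostgreSQL will not rename a database that has open connections.
    """
    return [
        f"DROP DATABASE IF EXISTS {quote_ident(retired)};",
        # Fallback for sessions that did not close gracefully.
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        f"WHERE datname IN ({quote_literal(live)}, {quote_literal(standby)}) "
        "AND pid <> pg_backend_pid();",
        f"ALTER DATABASE {quote_ident(live)} RENAME TO {quote_ident(retired)};",
        f"ALTER DATABASE {quote_ident(standby)} RENAME TO {quote_ident(live)};",
    ]


# Hypothetical names: "app_staging" is the database the application connects to.
statements = rotation_statements("app_staging", "db-new", "app_staging_old")
print("\n".join(statements))
```

Because the renames are metadata-only operations, the window in which the application sees no database is short, and the retired copy remains available for a quick rollback.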

Rebuild
$ bzcat slice.sql.bz2 | psql db-new
Staging schema has not been created, so all data loads to default schema

We hacked our rebuild by importing across schemas!

Now our sequences are wrong, causing duplicate data errors every time we try to insert into tables.

Secret Weapon

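-- Updates all serial sequences for ID columns only.
-- The slide shows only the loop; the CREATE FUNCTION / DECLARE wrapper is an
-- assumed reconstruction, named after the update_id_sequences() call used in the rebuild.
CREATE OR REPLACE FUNCTION update_id_sequences() RETURNS void AS $$
DECLARE
    table_record RECORD;
    table_name   text;
BEGIN
    -- Every ordinary table that has a column named "id"
    FOR table_record IN
        SELECT pc.relname
        FROM pg_class pc
        WHERE pc.relkind = 'r'
          AND EXISTS (SELECT 1 FROM pg_attribute pa
                      WHERE pa.attname = 'id' AND pa.attrelid = pc.oid)
    LOOP
        table_name := table_record.relname::text;
        -- Point the table's id sequence at the current MAX(id)
        EXECUTE 'SELECT setval(pg_get_serial_sequence('
            || quote_literal(table_name) || ', ' || quote_literal('id')::text
            || '), MAX(id)) FROM ' || table_name
            || ' WHERE EXISTS (SELECT 1 FROM ' || table_name || ')';
    END LOOP;
END;
$$ LANGUAGE plpgsql;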

Case Study: Paperless Post

Rebuild
echo "select 1 from update_id_sequences();" >> slice.sql
Vacuum
Reindex

Security
Database build user: CREATEDB privileges, member of Application user role
Application user remains database owner
Application user privileges remain limited
Build only works in predetermined environments

Questions?

Postgres Open, September 2011

Vanessa Hurst, Paperless Post

@DBNess

More Tools

Copies -- LVM Snapshots
See talk by Jon Erdman at PG Conf EU
Great for all reads
Data stays virtualized & doesn’t take up space until changed
Ideal for DDL changes without actual data changes

Copies, Slices -- pg_staging by dimitri
http://github.com/dimitri/pg_staging
Simple -- pauses pgbouncer & restores backup
Efficient -- leverages bulk loading
Flexible -- supports varying psql files
Custom -- limited

Slices -- replicate by rtomayko of GitHub
http://github.com/rtomayko/replicate
Simple -- preserves object relations via ActiveRecord
Inefficient -- creates text-based .dump
Inflexible -- corrupts id sequences on data insert
Custom -- highly