DB Migrations = Pain

Apr 14, 2017

Eugen Oskin

Context

● Look is an application for live video streaming

● Backend, iOS and Android clients, admin page, frontend for customers

● Good management

● Good architecture

Context

● 3 environments: develop, QA, production (and local)

● 3 core services:
  – web (aka API)
  – rtmp (video streaming)
  – cent (real-time messaging)

Context

● There are 2 backend developers

● We care about code quality:
  – very strict linter
  – tests: unit and behave (BDD)
  – deploy in 1 command

Story

● Deployment after 3 months of development

● DB redesign: changed one of the core models to fit the business logic
  – Schema migration
  – Data migration

● Statistics on the admin page

● Successfully deployed to develop and QA

Story

● The data migration ran for 40 minutes
  – I was prepared for that

● Production was down for 5 hours
  – Kernel Panic!

● I deployed the previous version and restored the DB from a snapshot
  – lost the last 3 hours of data

Plan

● Analyze

● Fix

● Learn the lesson

What were the symptoms?

● Django was not responding to requests at all

● Memory usage was fine

● CPU was fine

● Network was fine

● Actually, Django was responding, but with HUGE latency
  – 5 minutes in the best case, for the simplest request!

How did we investigate?

● Find the bottlenecks:
  – analyze latencies locally (django-silk is the best tool for this)

● Fix them one by one

● Test the fixes on the develop environment

How did we fix it?

● Speed up the data migration: 40 minutes → 7 minutes
  – select_related

● Move all long-running tasks to Celery tasks

● To keep Celery and Django from competing with each other, we run them on separate instances
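
Django's `select_related` fetches a foreign-key target in the same SQL query through a JOIN, instead of issuing one extra query per row the first time the related object is accessed. The sketch below shows that difference with plain `sqlite3` so it runs without Django; the `users`/`streams` tables are hypothetical, and the slides don't say which relation the data migration actually followed.

```python
import sqlite3


def make_db(n_users: int = 100, n_streams: int = 1000) -> sqlite3.Connection:
    """In-memory DB with hypothetical tables: every stream belongs to one user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE streams (id INTEGER PRIMARY KEY,
                              user_id INTEGER NOT NULL REFERENCES users(id));
        """
    )
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, n_users + 1)])
    conn.executemany("INSERT INTO streams (id, user_id) VALUES (?, ?)",
                     [(i, i % n_users + 1) for i in range(1, n_streams + 1)])
    conn.commit()
    return conn


def count_statements(conn: sqlite3.Connection, func):
    """Run func(conn); return (result, number of SQL statements SQLite executed)."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = func(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)


def owners_per_row(conn):
    # Like reading stream.user in a loop without select_related:
    # one query for the streams, then one more query per stream.
    rows = conn.execute("SELECT id, user_id FROM streams ORDER BY id").fetchall()
    return [conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()[0]
            for _, user_id in rows]


def owners_joined(conn):
    # Like select_related("user"): the owner comes back in the same query via a JOIN.
    rows = conn.execute(
        "SELECT users.name FROM streams JOIN users ON users.id = streams.user_id "
        "ORDER BY streams.id"
    ).fetchall()
    return [name for (name,) in rows]


conn = make_db()
per_row, n_per_row = count_statements(conn, owners_per_row)
joined, n_joined = count_statements(conn, owners_joined)
print(per_row == joined, n_per_row, n_joined)
```

Both functions return the same owners; the per-row version needs one statement for the streams plus one per stream, while the joined version needs a single statement. How much time that saves depends on the data and on network round trips to the real database.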

How did we fix it?

● Simplify the admin page
  – Calculate metrics in a periodic Celery task
    ● every 10 minutes, with a 1-hour timeout
  – Keep them in the DB
  – Join with the metric table
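
One way to read the "keep in DB, join with the metric table" bullets: a periodic job writes the expensive aggregates into their own table, and the admin page only joins against that table. A minimal sketch with `sqlite3`, assuming hypothetical `users`, `streams` and `user_metrics` tables; the scheduling itself (every 10 minutes, 1-hour timeout) would be handled by Celery beat and is not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE streams (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
-- Hypothetical metric table: one precomputed row per user.
CREATE TABLE user_metrics (
    user_id INTEGER PRIMARY KEY,
    stream_count INTEGER NOT NULL,
    computed_at TEXT NOT NULL
);
"""


def refresh_metrics(conn: sqlite3.Connection) -> None:
    """Body of the periodic job: recompute the aggregate and store it."""
    with conn:  # commits both statements together, or rolls back on error
        conn.execute("DELETE FROM user_metrics")
        conn.execute(
            """
            INSERT INTO user_metrics (user_id, stream_count, computed_at)
            SELECT u.id, COUNT(s.id), datetime('now')
            FROM users u LEFT JOIN streams s ON s.user_id = u.id
            GROUP BY u.id
            """
        )


def admin_rows(conn: sqlite3.Connection) -> list:
    """What the admin page reads: a cheap join with stored metrics, no aggregation."""
    return conn.execute(
        """
        SELECT u.name, COALESCE(m.stream_count, 0)
        FROM users u LEFT JOIN user_metrics m ON m.user_id = u.id
        ORDER BY u.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO streams (user_id) VALUES (?)", [(1,), (1,), (1,)])
conn.commit()
refresh_metrics(conn)
print(admin_rows(conn))  # [('alice', 3), ('bob', 0)]
```

The admin page now shows numbers that can be up to one refresh period old, and the cost of the aggregate moves into the periodic job rather than disappearing.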

What do we need to do?

● Zero-downtime deployment, aka Continuous Deployment

Continuous Deployment

● Blue-Green Deployment

Our way

● Use 2 web instances:
  – Current
  – Staging

● Use 2 DB instances:
  – Current
  – Staging

Our way

● Deployment steps:
  – Deploy to staging
  – Run migrations
  – Wait
  – Swap the DNS
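
The four deployment steps, modeled as a small state machine so their order is explicit. This is a sketch: the `BlueGreen` class and the `run_migrations`/`is_ready` callables are hypothetical stand-ins for real tooling, and the model does not cover how data written to the current DB reaches the staging DB, which the slides don't describe.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BlueGreen:
    """Two slots ("blue" and "green"), each a web instance with its own DB instance.
    `current` serves traffic; `staging` receives the next release. Hypothetical model."""
    current: str = "blue"
    staging: str = "green"
    log: List[str] = field(default_factory=list)

    def deploy(self, version: str,
               run_migrations: Callable[[str], None],
               is_ready: Callable[[str], bool]) -> bool:
        # 1. Deploy to staging; the current slot keeps serving users.
        self.log.append(f"deploy {version} to {self.staging}")
        # 2. Run migrations against the staging slot's DB only.
        run_migrations(self.staging)
        self.log.append(f"migrate {self.staging}")
        # 3. Wait: do not switch traffic until staging reports it is ready.
        if not is_ready(self.staging):
            self.log.append(f"abort: {self.current} stays current")
            return False
        # 4. Swap the DNS: staging becomes current, the old current becomes staging.
        self.current, self.staging = self.staging, self.current
        self.log.append(f"dns -> {self.current}")
        return True


bg = BlueGreen()
migrated = []
bg.deploy("v2", migrated.append, lambda slot: True)
print(bg.current, bg.staging, migrated)  # green blue ['green']
```

Because the swap only happens after the wait step succeeds, a slow or failing migration in this model delays the release instead of taking the current slot down.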

Deploying the fixes

● Production was down for 4 hours
  – Panic!

● The same symptoms!

The guess

● Look at the whole stack:
  – the DB was filling up the disk
  – the free disk space metric had a reverse sawtooth shape

● Super hot fix: turn off the metrics task
  – the free disk space metric had the same period as the periodic task that calculates metrics
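
The slides tie the disk sawtooth to the metrics task's 10-minute period. Assuming each run lasted close to its 1-hour timeout and nothing stopped a new run from starting while earlier ones were still going (the slides don't say either way), the number of runs alive at once follows from simple arithmetic. The simulation below is illustrative; `peak_runs_in_flight` is a hypothetical helper.

```python
import math


def peak_runs_in_flight(period_min: int, runtime_min: int, horizon_min: int) -> int:
    """Simulate a job started every `period_min` minutes where each run lasts
    `runtime_min` minutes and nothing stops a new run while old ones are alive.
    Return the largest number of runs alive during the same minute."""
    starts = range(0, horizon_min, period_min)
    return max(
        sum(1 for s in starts if s <= t < s + runtime_min)
        for t in range(horizon_min)
    )


# Metrics task: every 10 minutes; assume each run hits the 1-hour timeout.
print(peak_runs_in_flight(10, 60, 6 * 60))  # 6
print(math.ceil(60 / 10))                    # 6, the same bound in closed form
```

Under those assumptions, up to six copies of the heavy query could be competing for the same disk at once.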

Investigation

● Use a clone of the production DB

● Run the raw query that collects the metrics
  – it ran for 1 hour!

● That was the cause!

How did we fix it?

● The raw query looked like this:
  – SELECT DISTINCT
  – 8 LEFT OUTER JOINs
  – 5 COUNTs
  – 3 CASEs
  – GROUP BY user.id

● Use EXPLAIN
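
The slides don't show the query text, so this only illustrates one common way such a query gets slow: each LEFT OUTER JOIN to another one-to-many table multiplies the rows per user before GROUP BY collapses them, and deduplication (for example `COUNT(DISTINCT ...)`) is then needed to get correct counts back. With two joins of 100 rows each, one user already becomes 10,000 intermediate rows, and every extra one-to-many join multiplies that again. `sqlite3` stands in for the production database (the slides don't name it), the tables are hypothetical, and `EXPLAIN QUERY PLAN` is SQLite's variant of EXPLAIN, whose output format varies between SQLite versions.

```python
import sqlite3


def build(n_streams: int, n_comments: int) -> sqlite3.Connection:
    """One user with n_streams streams and n_comments comments (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE streams (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        INSERT INTO users (id) VALUES (1);
        """
    )
    conn.executemany("INSERT INTO streams (user_id) VALUES (?)", [(1,)] * n_streams)
    conn.executemany("INSERT INTO comments (user_id) VALUES (?)", [(1,)] * n_comments)
    conn.commit()
    return conn


# Two independent one-to-many LEFT JOINs from the same parent row.
JOINED = """
    SELECT u.id, COUNT(*) AS joined_rows,
           COUNT(DISTINCT s.id) AS streams, COUNT(DISTINCT c.id) AS comments
    FROM users u
    LEFT OUTER JOIN streams s ON s.user_id = u.id
    LEFT OUTER JOIN comments c ON c.user_id = u.id
    GROUP BY u.id
"""

conn = build(100, 100)
print(conn.execute(JOINED).fetchone())  # (1, 10000, 100, 100)

# Ask the database how it plans to run the query; the last column is the text detail.
for row in conn.execute("EXPLAIN QUERY PLAN " + JOINED):
    print(row[-1])
```

PostgreSQL and MySQL both provide `EXPLAIN` as well, which is the tool the slide refers to.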

How did we fix it?

● We did not experiment with the raw query inside Django
  – there was no reason to do so

● Attempts:
  – Remove the metrics that require CASEs
  – Reduce the number of COUNTs and JOINs
  – Remove DISTINCT
  – Fetch row by row
  – Use one query for each metric

How did we fix it?

● The fix: use one query for each metric

● It gave the best performance on the production data
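
A sketch of the fix, assuming hypothetical `users`, `streams` and `comments` tables: each metric is its own small `GROUP BY` over one table, and the results are merged in Python, so no single query builds the joined row product. Only the "one query per metric" idea comes from the slides; `METRIC_QUERIES` and `collect_metrics` are illustrative names.

```python
import sqlite3

# Hypothetical metrics: each one is a single small aggregate over one table,
# instead of one big query that joins every table at once.
METRIC_QUERIES = {
    "stream_count": "SELECT user_id, COUNT(*) FROM streams GROUP BY user_id",
    "comment_count": "SELECT user_id, COUNT(*) FROM comments GROUP BY user_id",
}


def collect_metrics(conn: sqlite3.Connection) -> dict:
    """Run one query per metric and merge the results per user id (0 when absent)."""
    metrics = {uid: dict.fromkeys(METRIC_QUERIES, 0)
               for (uid,) in conn.execute("SELECT id FROM users ORDER BY id")}
    for name, sql in METRIC_QUERIES.items():
        for user_id, value in conn.execute(sql):
            if user_id in metrics:
                metrics[user_id][name] = value
    return metrics


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE streams (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO streams (user_id) VALUES (1), (1), (1);
    INSERT INTO comments (user_id) VALUES (1), (2);
    """
)
print(collect_metrics(conn))
# {1: {'stream_count': 3, 'comment_count': 1}, 2: {'stream_count': 0, 'comment_count': 1}}
```

Each query stays proportional to the size of its own table; the price is a few extra round trips, which is typically small next to one query that multiplies rows across many joins.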

Did it help?

Yes

The lesson

● Good management and good architecture matter

● Deploy more frequently

● Do not use data migrations as is
  – use management commands

● Django admin is not efficient for aggregation queries

● Analysis and synthesis matter

A proof

● I refactored another core model:
  – A schema migration
  – A command for the data migration

● I deployed it without downtime

● The Look production environment is still alive
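
The schema-migration-plus-command split, sketched with `sqlite3`: a fast schema-only step that adds a nullable column, then a separate data step that can run after the deploy. The `streams.title`/`streams.slug` columns are hypothetical, and processing in batches with a commit per batch is an assumption about how such a command might be written, not something the slides state. In Django the data step would typically live in a custom management command (a `BaseCommand` subclass with a `handle()` method).

```python
import sqlite3


def schema_migration(conn: sqlite3.Connection) -> None:
    # Schema-only step: adding a nullable column is quick and touches no rows.
    conn.execute("ALTER TABLE streams ADD COLUMN slug TEXT")
    conn.commit()


def backfill_slugs(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Data step, run separately from the schema migration (e.g. as a command).
    Updates rows in small batches and commits after each one; only rows whose
    slug is still NULL are touched, so the step can be stopped and rerun.
    batch_size stays under 999, the bound-parameter limit of older SQLite builds."""
    total = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM streams WHERE slug IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        )]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        # COALESCE turns NULL titles into '' so every selected row leaves the NULL set.
        conn.execute(
            "UPDATE streams SET slug = lower(replace(COALESCE(title, ''), ' ', '-')) "
            f"WHERE id IN ({placeholders})",
            ids,
        )
        conn.commit()
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO streams (title) VALUES (?)",
                 [(f"My Stream {i}",) for i in range(1200)])
conn.commit()
schema_migration(conn)
print(backfill_slugs(conn))  # 1200
print(backfill_slugs(conn))  # 0, nothing left to do
```

Keeping the data step out of the migration means the deploy only waits for the quick schema change, while the backfill runs in short transactions alongside live traffic.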

Summary

● Analyze

● Fix

● Learn the lesson

References

● https://crystalnix.com/works/look/

● http://martinfowler.com/bliki/BlueGreenDeployment.html

● https://gist.github.com/EvgeneOskin/99880b7b7e0cd2d0115f87b7eeb5ae57
