Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety
Jan 04, 2016

Peter Jefferson
Transcript
Page 1: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Database Replication in Tashkent

CSEP 545 Transaction Processing

Sameh Elnikety

Page 2: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Replication for Performance

Expensive, limited scalability

Page 3: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

DB Replication is Challenging

• Single database system
  – Large, persistent state
  – Transactions
  – Complex software
• Replication challenges
  – Maintain consistency
  – Middleware replication

Page 4: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Background

Replica 1: Standalone DBMS

Page 5: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Background

[Diagram: a load balancer with Replica 1, Replica 2, and Replica 3]

Page 6: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Read Tx

[Diagram: a load balancer with Replica 1, Replica 2, and Replica 3; read transaction T]

Read tx does not change DB state

Page 7: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Tx 1/2

[Diagram: a load balancer with Replica 1, Replica 2, and Replica 3; update transaction T produces a writeset (ws)]

Update tx changes DB state

Page 8: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Tx 1/2

[Diagram: a load balancer with Replica 1, Replica 2, and Replica 3; the writeset (ws) of update transaction T is sent to every replica]

Update tx changes DB state

Apply (or commit) T everywhere

Example: T1: { set x = 1 }

Page 9: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Tx 2/2

[Diagram: an ordering component, a load balancer, and Replica 1, Replica 2, and Replica 3; two update transactions T each produce a writeset (ws)]

Update tx changes DB state

Page 10: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Tx 2/2

[Diagram: an ordering component, a load balancer, and Replica 1, Replica 2, and Replica 3; the writesets (ws) of two update transactions T are delivered to every replica]

Update tx changes DB state

Commit updates in order

Example: T1: { set x = 1 }, T2: { set x = 7 }
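To make the ordering requirement concrete, here is a minimal sketch (not taken from the slides) that replays the T1/T2 example. It assumes a writeset can be modeled as a plain dictionary of new values; the helper name `apply_writesets` is hypothetical.

```python
def apply_writesets(state, writesets):
    """Apply writesets in the given order; a later write overwrites an earlier one."""
    result = dict(state)
    for ws in writesets:
        result.update(ws)
    return result


# Writesets from the slide's example, modeled as dictionaries (an assumption).
T1 = {"x": 1}
T2 = {"x": 7}

replica_a = apply_writesets({}, [T1, T2])  # agreed order: T1 then T2
replica_b = apply_writesets({}, [T1, T2])  # same order, so same final state
replica_c = apply_writesets({}, [T2, T1])  # different order, so it diverges

print(replica_a, replica_b, replica_c)
```

Replicas that apply the same writesets in the same order end with x = 7, while the replica that swaps the order ends with x = 1, which is why every replica must follow one commit order.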

Page 11: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Sub-linear Scalability Wall

[Diagram: an ordering component, a load balancer, and Replica 1, Replica 2, Replica 3, and Replica 4; the writesets (ws) of update transactions T are delivered to every replica]

Page 12: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

This Talk

• General scaling techniques
  – Address fundamental bottlenecks
  – Synergistic, implemented in middleware
  – Evaluated experimentally

Page 13: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Super-linear Scalability

[Bar chart: throughput in TPS (axis 0 to 120) for five configurations]

Single: 1 X
Base: 7 X
United: 12 X
MALB: 25 X
UF: 37 X

Page 14: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Big Picture: Let’s Oversimplify

Standalone DBMS: R (reading), U (update, logging)

Page 15: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Big Picture: Let’s Oversimplify

Standalone DBMS: R (reading), U (update, logging)

Replica 1/N (traditional): system load N.R and N.U; work at each replica R, U, (N-1).ws

Page 16: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Big Picture: Let’s Oversimplify

Standalone DBMS: R (reading), U (update, logging)

Replica 1/N (traditional): system load N.R and N.U; work at each replica R, U, (N-1).ws

Replica 1/N (optimized): system load N.R and N.U; work at each replica R*, U*, (N-1).ws*

Page 17: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Big Picture: Let’s Oversimplify

Standalone DBMS: R (reading), U (update, logging)

Replica 1/N (traditional): system load N.R and N.U; work at each replica R, U, (N-1).ws

Replica 1/N (optimized): system load N.R and N.U; work at each replica R*, U*, (N-1).ws*

Techniques: MALB, Update Filtering, Uniting O & D
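The oversimplified picture can be turned into a small back-of-the-envelope model. The sketch below is one reading of the slide, not the author's analysis: it assumes each replica does its own reads (R), its own updates (U), and applies the writesets of the other N-1 replicas, and that the costs `r`, `u`, and `ws` are abstract units. The function names are hypothetical.

```python
def replica_work(n, r, u, ws):
    """Work done by one of n replicas: R + U + (N-1).ws, as labeled on the slide."""
    return r + u + (n - 1) * ws


def estimated_speedup(n, r, u, ws):
    """Throughput of n replicas relative to a standalone DBMS.

    Assumption: the standalone system spends r + u per unit of load, and each
    replica has the same capacity as the standalone system.
    """
    return n * (r + u) / replica_work(n, r, u, ws)


if __name__ == "__main__":
    # Illustrative costs only; they are not measurements from the talk.
    for n in (1, 4, 16):
        print(n, round(estimated_speedup(n, r=1.0, u=1.0, ws=0.5), 2))
```

Under these assumptions the speedup equals N only when applying a writeset costs nothing; any nonzero writeset cost bends the curve below linear, which is the wall the optimized R*, U*, and ws* terms are meant to push back.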

Page 18: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Key Points

1. Commit updates in order
   – Perform serial synchronous disk writes
   – Unite ordering and durability
2. Load balancing
   – Optimize for equal load: memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Propagate updates everywhere
   – Update filtering: propagate to where needed

Page 19: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Roadmap

[Diagram: Tx A, a load balancer, an ordering component, and Replica 1, Replica 2, and Replica 3]

1. Ordering: commit updates in order
2. Load balancing
3. Update propagation

Page 20: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Key Idea

• Traditionally:
  – Commit ordering and durability are separated
• Key idea:
  – Unite commit ordering and durability

Page 21: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

All Replicas Must Agree

• All replicas agree on
  – which update tx commit
  – their commit order
• Total order
  – Determined by middleware
  – Followed by each replica

[Diagram: Tx A and Tx B; Replica 1, Replica 2, and Replica 3, each with its own durability]

Page 22: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Order Outside DBMS

[Diagram: Tx A and Tx B pass through an ordering component outside the DBMS; Replica 1, Replica 2, and Replica 3 each have their own durability]

Page 23: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Order Outside DBMS

[Diagram: the ordering component outside the DBMS fixes the order A B for Tx A and Tx B; Replica 1, Replica 2, and Replica 3 each make A B durable in that order]

Page 24: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Enforce External Commit Order

[Diagram: Replica 3. Ordering decides A B. A proxy submits Tx A and Tx B to the DBMS through the SQL interface, where they run as Task A and Task B; the DBMS provides durability.]

Page 25: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Enforce External Commit Order

[Diagram: Replica 3. Ordering decides A B. A proxy submits Tx A and Tx B to the DBMS through the SQL interface, where they run as Task A and Task B; the DBMS durability shows the order B A.]

Page 26: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Enforce External Commit Order

[Diagram: Replica 3. Ordering decides A B. A proxy submits Tx A and Tx B to the DBMS through the SQL interface, where they run as Task A and Task B; the DBMS durability shows the order B A.]

Cannot commit A & B concurrently!

Page 27: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Enforce Order = Serial Commit

[Diagram: Replica 3. Ordering decides A B. The proxy submits Tx A and Tx B through the SQL interface (Task A, Task B); the DBMS durability holds A.]

Page 28: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Enforce Order = Serial Commit

[Diagram: Replica 3. Ordering decides A B. The proxy submits Tx A and Tx B through the SQL interface (Task A, Task B); the DBMS durability holds A B.]

Page 29: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Commit Serialization is Slow

[Timeline: ordering fixes the commit order A B C. The proxy issues Commit A, Commit B, and Commit C one at a time, each followed by its Ack. For every commit the DBMS does CPU work and then a durability write, so the durable log grows from A to A B to A B C.]

Page 30: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Commit Serialization is Slow

[Timeline: ordering fixes the commit order A B C. The proxy issues Commit A, Commit B, and Commit C one at a time, each followed by its Ack. For every commit the DBMS does CPU work and then a durability write, so the durable log grows from A to A B to A B C.]

Problem: Durability & ordering separated → serial disk writes

Page 31: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Unite D. & O. in Middleware

[Timeline: the middleware handles ordering (A B C) and durability (A B C) together, and durability at the DBMS is OFF. The proxy issues Commit A, Commit B, and Commit C, each followed by its Ack; the DBMS handles each commit with CPU work only.]

Page 32: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Unite D. & O. in Middleware

[Timeline: the middleware handles ordering (A B C) and durability (A B C) together, and durability at the DBMS is OFF. The proxy issues Commit A, Commit B, and Commit C, each followed by its Ack; the DBMS handles each commit with CPU work only.]

Solution: Move durability to MW
Durability & ordering in middleware → group commit

Page 33: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Implementation: Uniting D & O in MW

• Middleware logs tx effects
  – Durability of update tx
    • Guaranteed in middleware
    • Turn durability off at database
• Middleware performs durability & ordering
  – United → group commit → fast
• Database commits update tx serially
  – Commit = quick main memory operation
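The difference between serial synchronous writes and group commit can be sketched as follows. This is an illustration under stated assumptions, not the Tashkent implementation: writesets are modeled as byte strings, the log is an ordinary file, and the function names `serial_commit` and `group_commit` are hypothetical. Only standard-library calls (`open`, `flush`, `os.fsync`) are used.

```python
import os
import tempfile


def serial_commit(log_path, writesets):
    """Separated ordering and durability: one synchronous disk write per commit."""
    syncs = 0
    with open(log_path, "ab") as log:
        for ws in writesets:
            log.write(ws + b"\n")
            log.flush()
            os.fsync(log.fileno())  # wait for the disk before the next commit
            syncs += 1
    return syncs


def group_commit(log_path, writesets):
    """United ordering and durability: log the ordered batch with a single sync."""
    with open(log_path, "ab") as log:
        for ws in writesets:  # the batch is already in commit order
            log.write(ws + b"\n")
        log.flush()
        os.fsync(log.fileno())  # one disk write covers the whole group
    return 1


if __name__ == "__main__":
    batch = [b"A: x=1", b"B: x=7", b"C: y=3"]
    with tempfile.TemporaryDirectory() as d:
        print(serial_commit(os.path.join(d, "serial.log"), batch), "syncs")
        print(group_commit(os.path.join(d, "group.log"), batch), "sync")
```

Both functions leave the same log contents in the same order; the grouped version simply pays for one synchronous write instead of one per transaction, which is the effect the slide summarizes as "United → group commit → fast".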

Page 34: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Uniting Improves Throughput

• Metric
  – Throughput
• Workload
  – TPC-W Ordering (50% updates)
• System
  – Linux cluster
  – PostgreSQL
  – 16 replicas
  – Serializable exec.

[Bar chart: TPC-W throughput in TPS (axis 0 to 40)]

Single: 1 X
Base: 7 X
United: 12 X

Page 35: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Roadmap

[Diagram: Tx A, a load balancer, an ordering component, and Replica 1, Replica 2, and Replica 3]

1. Ordering: commit updates in order
2. Load balancing
3. Update propagation

Page 36: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Key Idea

[Diagram: a load balancer with Replica 1 and Replica 2, each with memory (Mem) and disk]

Equal load on replicas

Page 37: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Key Idea

[Diagram: a load balancer with Replica 1 and Replica 2, each with memory (Mem) and disk]

Equal load on replicas

MALB (Memory-Aware Load Balancing): optimize for in-memory execution

Page 38: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

How Does MALB Work?

[Diagram: a database with tables 1, 2, 3, and a memory (Mem)]

Workload:
A → tables 1, 2
B → tables 2, 3

Page 39: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

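Read Data From Disk

[Diagram: a least-loaded load balancer receives the stream A, B, A, B and sends a mixed stream A, B, A, B to each of Replica 1 and Replica 2. Each replica has tables 1, 2, 3 on disk and a memory (Mem) shown holding tables 3 and 1.]

A → tables 1, 2
B → tables 2, 3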

Page 40: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

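Read Data From Disk

[Diagram: a least-loaded load balancer receives the stream A, B, A, B and sends a mixed stream A, B, A, B to each of Replica 1 and Replica 2. Each replica has tables 1, 2, 3 on disk and a memory (Mem) shown holding tables 3 and 1.]

A → tables 1, 2
B → tables 2, 3

Slow (both replicas)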

Page 41: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

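Data Fits in Memory

[Diagram: MALB receives the stream A, B, A, B and sends A, A, A, A to one replica and B, B, B, B to the other. Replica 1 and Replica 2 each have a memory (Mem) and tables 1, 2, 3 on disk.]

A → tables 1, 2
B → tables 2, 3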

Page 42: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

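Data Fits in Memory

[Diagram: MALB receives the stream A, B, A, B and sends A, A, A, A to Replica 1 and B, B, B, B to Replica 2. Replica 1 holds tables 1 and 2 in memory, Replica 2 holds tables 2 and 3 in memory, and both have tables 1, 2, 3 on disk.]

A → tables 1, 2
B → tables 2, 3

Fast (both replicas)

Memory info? Many tx and replicas?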

Page 43: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Estimate Tx Memory Needs

• Exploit tx execution plan
  – Which tables & indices are accessed
  – Their access pattern
    • Linear scan, direct access
• Metadata from database
  – Sizes of tables and indices
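A rough sketch of such an estimate is shown below. It is illustrative only: the slide lists the inputs (accessed tables and indices, access pattern, sizes) but not a formula, so the rule used here is an assumption. A linear scan is charged the full size of the relation and a direct access is charged a caller-supplied fraction of it. The names `Access`, `estimate_memory_mb`, and `direct_fraction` are hypothetical, as are the relation names and sizes in the usage example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    relation: str  # table or index name taken from the execution plan
    pattern: str   # "scan" for linear scan, "direct" for direct access


def estimate_memory_mb(plan, sizes_mb, direct_fraction):
    """Estimate the memory a tx type needs, in MB.

    Assumed rule (not from the slides): a scanned relation counts in full,
    a directly accessed relation counts as direct_fraction of its size.
    A relation accessed several times is counted once, at its largest need.
    """
    needs = {}
    for acc in plan:
        size = sizes_mb[acc.relation]
        need = size if acc.pattern == "scan" else size * direct_fraction
        needs[acc.relation] = max(needs.get(acc.relation, 0.0), need)
    return sum(needs.values())


if __name__ == "__main__":
    sizes = {"item": 100.0, "orders": 400.0}  # made-up metadata
    plan = [Access("item", "scan"), Access("orders", "direct")]
    print(estimate_memory_mb(plan, sizes, direct_fraction=0.25))
```

The per-relation sizes would come from database metadata and the access list from the execution plan, as the slide describes; only the weighting of direct access is invented for the example.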

Page 44: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Grouping Transactions

• Objective
  – Construct tx groups that fit together in memory
• Bin packing
  – Item: tx memory needs
  – Bin: memory of replica
  – Heuristic: Best Fit Decreasing
• Allocate replicas to tx groups
  – Adjust for group loads
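The Best Fit Decreasing heuristic named on the slide can be sketched as follows. This is a generic textbook version, not the MALB code: it treats each tx type's memory need as a single number and adds needs together, ignoring any tables shared between tx types. The function name and the sizes in the usage example are hypothetical.

```python
def best_fit_decreasing(needs, capacity):
    """Group tx types so that each group's total memory need fits in one replica.

    needs: dict mapping tx type name to estimated memory need.
    capacity: memory of one replica (same unit as needs).
    Returns a list of groups, each a list of tx type names.
    """
    bins = []  # each entry is [remaining_capacity, [tx type names]]
    # Decreasing: place the largest items first.
    for name, need in sorted(needs.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{name} does not fit in one replica's memory")
        # Best fit: among bins with room, pick the one left with the least slack.
        candidates = [b for b in bins if b[0] >= need]
        if candidates:
            target = min(candidates, key=lambda b: b[0])
        else:
            target = [capacity, []]
            bins.append(target)
        target[0] -= need
        target[1].append(name)
    return [names for _, names in bins]


if __name__ == "__main__":
    # Made-up memory needs (MB) for six tx types and a 1000 MB replica.
    needs = {"A": 900, "B": 500, "C": 400, "D": 300, "E": 300, "F": 200}
    print(best_fit_decreasing(needs, capacity=1000))
```

With these made-up numbers the heuristic yields three groups, A alone, B with C, and D with E and F. Replicas would then be allocated to groups according to group load, a step this sketch leaves out.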

Page 45: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB in Action

[Diagram: tx types A, B, C, D, E, F arrive at MALB]

Page 46: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB in Action

[Diagram: tx types A, B, C, D, E, F arrive at MALB]

Memory needs for A, B, C, D, E, F

Page 47: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB in Action

[Diagram: tx types A, B, C, D, E, F arrive at MALB, which forms three groups]

Memory needs for A, B, C, D, E, F

Group A
Group B C
Group D E F

Page 48: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

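MALB in Action

[Diagram: tx types A, B, C, D, E, F arrive at MALB, which forms three groups and assigns each group to a replica with its own disk]

Memory needs for A, B, C, D, E, F

Group A → Replica running A
Group B C → Replica running B C
Group D E F → Replica running D E F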

Page 49: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB Summary

• Objective
  – Optimize for in-memory execution
• Method
  – Estimate tx memory needs
  – Construct tx groups
  – Allocate replicas to tx groups

Page 50: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Experimental Evaluation

• Implementation
  – No change in consistency
  – Still middleware
• Compare
  – United: efficient baseline system
  – MALB: exploits working set information
• Same environment
  – Linux cluster running PostgreSQL
  – Workload: TPC-W Ordering (50% update txs)

Page 51: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB Doubles Throughput

TPC-W Ordering, 16 replicas

[Bar chart: throughput in TPS (axis 0 to 120)]

Single: 1 X
Base: 7 X
United: 12 X
MALB: 25 X (105% over United)

Page 52: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB Doubles Throughput

[Bar chart: throughput in TPS (axis 0 to 120)]

Single: 1 X
Base: 7 X
United: 12 X
MALB: 25 X (105% over United)

[Bar chart: read I/O, normalized (axis 0.0 to 1.0), for United and MALB; values are not recoverable from the transcript]

Page 53: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

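Big Gains with MALB

[Grid: gains with MALB for combinations of MemSize and DBSize, each ranging between Big and Small; the row and column labels are not recoverable from the transcript]

4%     0%     29%
48%    105%   45%
182%   75%    12%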

Page 54: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

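Big Gains with MALB

[Grid: gains with MALB for combinations of MemSize and DBSize, each ranging between Big and Small; the row and column labels are not recoverable from the transcript. Two regions are annotated "Run from memory" and "Run from disk".]

4%     0%     29%
48%    105%   45%
182%   75%    12%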

Page 55: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Roadmap

[Diagram: Tx A, a load balancer, an ordering component, and Replica 1, Replica 2, and Replica 3]

1. Ordering: commit updates in order
2. Load balancing
3. Update propagation

Page 56: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Key Idea

• Traditional:
  – Propagate updates everywhere
• Update Filtering:
  – Propagate updates to where they are needed
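A minimal sketch of the filtering decision follows, using the table assignment from the Update Filtering Example slides (tx type A uses tables 1 and 2, tx type B uses tables 2 and 3). It is an illustration, not the middleware's code: the function name, the replica names, and the idea of keeping a per-replica set of tables are assumptions.

```python
def replicas_needing_update(updated_table, replica_tables):
    """Return the replicas whose assigned tx groups access the updated table."""
    return sorted(
        replica
        for replica, tables in replica_tables.items()
        if updated_table in tables
    )


# Tables accessed by the tx group assigned to each replica (from the example).
replica_tables = {
    "replica1": {1, 2},  # runs group A
    "replica2": {2, 3},  # runs group B
}

if __name__ == "__main__":
    for table in (1, 2, 3):
        print(table, replicas_needing_update(table, replica_tables))
```

Here an update to table 1 goes only to the replica running group A, an update to table 3 goes only to the replica running group B, and an update to the shared table 2 still goes to both.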

Page 57: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Replica 1 and Replica 2 each have a memory (Mem) and tables 1, 2, 3 on disk.]

A → tables 1, 2
B → tables 2, 3

Page 58: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

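Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 with tables 1 and 2 in memory; Group B runs on Replica 2 with tables 2 and 3 in memory. Both replicas have tables 1, 2, 3 on disk.]

A → tables 1, 2
B → tables 2, 3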

Page 59: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 is shown.]

A → tables 1, 2
B → tables 2, 3

Page 60: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 is shown.]

A → tables 1, 2
B → tables 2, 3

Page 61: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 is shown.]

A → tables 1, 2
B → tables 2, 3

Page 62: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 and an update to table 3 are shown.]

A → tables 1, 2
B → tables 2, 3

Page 63: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 and an update to table 3 are shown.]

A → tables 1, 2
B → tables 2, 3

Page 64: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering Example

[Diagram: MALB with UF receives the stream A, B, A, B. Group A runs on Replica 1 (tables 1 and 2); Group B runs on Replica 2 (tables 2 and 3). An update to table 1 and an update to table 3 are shown.]

A → tables 1, 2
B → tables 2, 3

Page 65: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering in Action

[Diagram: update filtering (UF)]

Page 66: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering in Action

[Diagram: update filtering (UF)]

Update to red table

Page 67: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering in Action

[Diagram: update filtering (UF)]

Update to red table

Update to green table

Page 68: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering in Action

[Diagram: update filtering (UF)]

Update to red table

Update to green table

Page 69: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Update Filtering in Action

[Diagram: update filtering (UF)]

Update to red table

Update to green table

Page 70: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB+UF Triples Throughput

TPC-W Ordering, 16 replicas

[Bar chart: throughput in TPS (axis 0 to 120)]

Single: 1 X
Base: 7 X
United: 12 X
MALB: 25 X
UF: 37 X (49% over MALB)

Page 71: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

MALB+UF Triples Throughput

[Bar chart: throughput in TPS (axis 0 to 120)]

Single: 1 X
Base: 7 X
United: 12 X
MALB: 25 X
UF: 37 X (49% over MALB)

[Bar chart: propagated updates (axis 0 to 14)]

MALB: 15
UF: 7

Page 72: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Filtering Opportunities

[Two bar charts: ratio MALB+UF / MALB (axis 0 to 2), with MALB as the baseline]

Ordering Mix (50% updates): 1.49
Browsing Mix (5% updates): 1.02

Page 73: Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety.

Conclusions

1. Commit updates in order
   – Perform serial synchronous disk writes
   – Unite ordering and durability
2. Load balancing
   – Optimize for equal load: memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Propagate updates everywhere
   – Update filtering: propagate to where needed