ICDE2010 Nb-GCLOCK


DESCRIPTION

Makoto Yui, Jun Miyazaki, Shunsuke Uemura and Hayato Yamana. "Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK", Proc. ICDE, March 2010.

Transcript

Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK

Makoto YUI (1), Jun MIYAZAKI (2), Shunsuke UEMURA (3) and Hayato YAMANA (4)

1. Research fellow, JSPS (Japan Society for the Promotion of Science) / Visiting Postdoc at Waseda University, Japan and CWI, Netherlands
2. Nara Institute of Science and Technology
3. Nara Sangyo University
4. Waseda University / National Institute of Informatics

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion


Background – Recent trends in CPU development

[Figure: timeline of CPU development, roughly 1990 onward – single-core CPUs (Pentium), multi-core CPUs (Power4, Core2, Nehalem), and many-core CPUs (UltraSparc T2, Azul Vega, Larrabee?)]

The number of CPU cores in a chip is doubling in two-year cycles – the many-core era is coming.

- Niagara T2 – 8 cores x 8 SMT = 64 processors
- Azul Vega3 – 54 cores x 16 chips = 864 processors

Background – CPU Scalability of open source DBs

Open source DBs have faced CPU scalability problems.

Ryan Johnson et al., "Shore-MT: A Scalable Storage Manager for the Multicore Era", In Proc. EDBT, 2009.

[Figure: microbenchmark on UltraSparc T1 (32 processors) – throughput (normalized, 0–10) of PostgreSQL, MySQL and BDB at 1, 4, 8, 12, 16, 24 and 32 concurrent threads]

Gain after 16 threads is less than 5%.

You might think… What about TPC-C?

10

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), “Scaling PostgreSQL on SMP Architectures”, PGCON 2007.

TPC-C benchmark result on a high-end Linux machine of Unisys

(Xeon-SMP 32 CPUs, Memory 16GB, EMC RAID10 Storage)

11

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), “Scaling PostgreSQL on SMP Architectures”, PGCON 2007.

TPC-C benchmark result on a high-end Linux machine of Unisys

(Xeon-SMP 32 CPUs, Memory 16GB, EMC RAID10 Storage)

Version 8.0

Version 8.1

Version 8.2

TPS

CPU cores

12

Gain after 16 CPU cores is less than 5%

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), “Scaling PostgreSQL on SMP Architectures”, PGCON 2007.

TPC-C benchmark result on a high-end Linux machine of Unisys

(Xeon-SMP 32 CPUs, Memory 16GB, EMC RAID10 Storage)

Version 8.0

Version 8.1

Version 8.2

TPS

CPU cores

13

Gain after 16 CPU cores is less than 5%

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), “Scaling PostgreSQL on SMP Architectures”, PGCON 2007.

TPC-C benchmark result on a high-end Linux machine of Unisys

(Xeon-SMP 32 CPUs, Memory 16GB, EMC RAID10 Storage)

Version 8.0

Version 8.1

Version 8.2

TPS

CPU cores Q. What PostgreSQL community did?

14

Gain after 16 CPU cores is less than 5%

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), “Scaling PostgreSQL on SMP Architectures”, PGCON 2007.

TPC-C benchmark result on a high-end Linux machine of Unisys

(Xeon-SMP 32 CPUs, Memory 16GB, EMC RAID10 Storage)

Version 8.0

Version 8.1

Version 8.2

TPS

CPU cores Q. What PostgreSQL community did?

Revised their synchronization mechanisms in the buffer management module

[1] Ryan Johnson, Ippokratis Pandis, Anastassia Ailamaki: “Critical Sections: Re-emerging Scalability Concerns for Database Storage Engines”, In Proc. DaMoN, 2008. [2] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker: OLTP Through the Looking Glass, and What We Found There, In Proc.SIGMOD, 2008.

Synchronization in Buffer Management Module

Several empirical studies have revealed that the largest bottleneck is synchronization in the buffer management module.

[1] Ryan Johnson, Ippokratis Pandis, Anastassia Ailamaki: "Critical Sections: Re-emerging Scalability Concerns for Database Storage Engines", In Proc. DaMoN, 2008.
[2] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker: "OLTP Through the Looking Glass, and What We Found There", In Proc. SIGMOD, 2008.

[Figure: the buffer manager sits between the CPU/memory and the database files on disk; it reduces disk access by caching database pages]

[Figure: a page request is served by (1) looking up the hash table – hits are returned from the buffer, while misses invoke (2) the page replacement algorithm and a read from the database files]

Naive buffer management schemes

[Figure: two buffer manager layouts side by side – PostgreSQL 8.0 (a single lookup hash table plus an LRU page replacement list) and PostgreSQL 8.1 (the lookup hash table split into buckets with striped locks, still in front of an LRU list)]

- PostgreSQL 8.0: one giant lock over the whole lookup structure ("Giant lock sucks!") – did not scale at all.
- PostgreSQL 8.1: striped the lock into hash buckets (see the sketch below), but the LRU list always needs to be locked when it is accessed – scales up to 8 processors.
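A minimal sketch of the lock-striping idea used by PostgreSQL 8.1 (an illustration only, not the PostgreSQL source; names such as stripe_of and lookup are made up here): one mutex per hash bucket replaces the single giant lock, so lookups of pages that hash to different buckets proceed in parallel.

    /* Lock striping: one mutex per bucket instead of one giant lock. */
    #include <pthread.h>
    #include <stddef.h>

    #define N_STRIPES 16

    typedef struct entry {
        int page_id;
        void *frame;
        struct entry *next;
    } entry_t;

    typedef struct {
        pthread_mutex_t locks[N_STRIPES];   /* one lock per bucket */
        entry_t *buckets[N_STRIPES];
    } striped_table_t;

    static size_t stripe_of(int page_id) { return (size_t)page_id % N_STRIPES; }

    /* Look up a page: only its own bucket is locked. */
    void *lookup(striped_table_t *t, int page_id) {
        size_t s = stripe_of(page_id);
        void *frame = NULL;
        pthread_mutex_lock(&t->locks[s]);
        for (entry_t *e = t->buckets[s]; e != NULL; e = e->next) {
            if (e->page_id == page_id) { frame = e->frame; break; }
        }
        pthread_mutex_unlock(&t->locks[s]);
        return frame;
    }

The LRU list, however, remains a single shared structure, which is why 8.1 still stops scaling at around 8 processors.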

Less naive buffer management schemes

[Figure: PostgreSQL 8.1 (striped hash buckets + LRU replacement) versus PostgreSQL 8.2 (striped hash buckets + CLOCK replacement)]

- PostgreSQL 8.1: the LRU list always needs to be locked when it is accessed – scales up to 8 processors.
- PostgreSQL 8.2: CLOCK does not require a lock when an entry is touched (see the sketch below) – scales up to 16 processors.
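Why CLOCK needs no lock on a buffer hit: touching a frame only bumps its per-frame usage counter, which is a single atomic update, whereas LRU must unlink and relink a list node under the list lock. A minimal C11 sketch (assumptions: a hypothetical frame_t whose usage counter saturates at MAX_USAGE, as in a GCLOCK weight):

    #include <stdatomic.h>

    #define MAX_USAGE 5            /* hypothetical cap on the usage weight */

    typedef struct {
        atomic_int usage;          /* inspected and decremented by the clock hand */
        /* ... page id, frame contents, dirty flag, ... */
    } frame_t;

    /* Called on a buffer hit: bump the usage counter, saturating at MAX_USAGE.
     * No list manipulation and no lock - just one atomic read-modify-write. */
    static void touch_on_hit(frame_t *f) {
        int u = atomic_load_explicit(&f->usage, memory_order_relaxed);
        while (u < MAX_USAGE &&
               !atomic_compare_exchange_weak_explicit(&f->usage, &u, u + 1,
                                                      memory_order_relaxed,
                                                      memory_order_relaxed))
            ;                      /* u is refreshed on failure; retry */
    }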

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion


Core idea of our approach

[Figure: previous approaches versus our optimistic approach – in both, the buffer manager serves page requests between the CPU/memory and the database files on disk]

Previous approaches:
○ Reduce disk I/Os
× Locks are contended

Intuition: there are enough processors, yet the disk bandwidth is not utilized.

Our optimistic approach: reduce the lock granularity to one CPU instruction and remove the bottleneck.
△ # of I/Os slightly increases
○ No contention on locks

Major Difference to Previous Approaches

(Recap – previous approaches: ○ reduce disk I/Os, × locks are contended. Our optimistic approach: △ # of I/Os slightly increases, ○ no contention on locks.)

Their goal is to improve buffer hit rates in order to reduce I/Os. This has been the sole goal for many decades – but is it still valid in the many-core era? There are also SSDs now.

Our goal is to improve throughput by utilizing (many) CPUs. Use non-blocking synchronization instead of acquiring locks!

What’s non-blocking and lock-free?

Formally: stopping one thread will not prevent global progress; individual threads make progress without waiting.

Less formally: no thread 'locks' any resource – no 'critical sections', locks, mutexes, spin-locks, etc.

Lock-free: every successful step makes global progress and completes within finite time (ensuring liveness).

Wait-free: every step makes global progress and completes within finite time (ensuring fairness).

Non-blocking synchronization

A synchronization method that does not acquire any lock, enabling concurrent accesses to shared resources.

- Utilize atomic CPU primitives: CAS (compare-and-swap), cmpxchg on x86
- Utilize memory barriers

Blocking:

    acquire_lock(lock);
    counter++;
    release_lock(lock);

Non-blocking:

    int old;
    do {
        old = *counter;
    } while (!CAS(counter, old, old + 1));

counter is incremented only if its current value is still equal to old; otherwise the CAS fails and the loop retries.
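A self-contained C11 rendering of the non-blocking counter above (a sketch; the slide's CAS() maps onto atomic_compare_exchange, which on x86 compiles down to cmpxchg):

    #include <stdatomic.h>

    static atomic_int counter;     /* statically zero-initialized */

    void increment(void) {
        int old = atomic_load(&counter);
        /* Retry until no other thread changed counter between our load and the CAS.
         * On failure, atomic_compare_exchange_weak reloads old with the current value. */
        while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
            ;
    }

No thread ever blocks: a CAS can only fail because some other thread's increment succeeded, so the system as a whole makes progress – exactly the lock-free property defined on the previous slide.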

Making the buffer manager non-blocking

[Figure: striped hash buckets in front of the GCLOCK page replacement algorithm; in the baseline, a miss reads the page from the database files under "lock; lseek; read; unlock"]

1. Utilized an existing lock-free hash table.
2. Removed the locks on cache misses (fig. 6 in the paper).
3. Need to keep consistency between the lookup hash table and GCLOCK (the right half of fig. 3): immediately after the page allocation of a buffer frame changes, the reference in the buffer lookup table may still carry a different page identifier.
4. Avoided locks on I/Os by utilizing pread, CAS, and memory barriers (fig. 5); a sketch of this idea follows below.
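A hedged sketch of the "optimistic I/O" idea in point 4 (an illustration of the technique, not the paper's exact protocol; page_slot_t and read_page_optimistic are hypothetical names): pread() takes an explicit offset, so no lock is needed around lseek+read, and the loaded frame is published with a single CAS. A thread that loses the race simply discards its copy – which is why the number of I/Os can increase slightly while lock contention disappears.

    #include <stdatomic.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define PAGE_SIZE 8192

    typedef struct {
        _Atomic(void *) frame;     /* NULL until some thread has loaded the page */
    } page_slot_t;

    void *read_page_optimistic(page_slot_t *slot, int fd, off_t offset) {
        void *cur = atomic_load(&slot->frame);
        if (cur != NULL)
            return cur;                        /* already loaded by someone else */

        void *buf = malloc(PAGE_SIZE);
        if (buf == NULL || pread(fd, buf, PAGE_SIZE, offset) != PAGE_SIZE) {
            free(buf);
            return NULL;                       /* error handling elided */
        }

        void *expected = NULL;
        if (atomic_compare_exchange_strong(&slot->frame, &expected, buf))
            return buf;                        /* we published the frame */

        free(buf);                             /* lost the race: a duplicate read */
        return expected;                       /* the winner's frame */
    }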

State Machine-based Reasoning for selecting a replacement victim

Construct the algorithm from many small 'steps' – build a state machine to ensure global progress.

[Figure: state machine for victim selection. States: Select a frame, Check whether Evicted, Check whether Pinned, Try to decrement the refcount, Try to evict, Fix in pool. Entry actions ("E:"): move the clock hand, try next entry, decrement the refcount, CAS value, evict. Transitions: null / !null, evicted / !evicted, pinned / !pinned, --refcount > 0 / --refcount <= 0, swapped / !swapped, continue, success.]

Reading the machine from left to right:
- Start finding a replacement victim (select a frame).
- Decrement the weight count of the buffer page under the clock hand.
- If the weight is not yet exhausted, advance the CLOCK hand and check the next candidate.
- Otherwise, try to evict and return the frame as the replacement victim.

Two threads (Thread A and Thread B) can run the machine concurrently, so a candidate may be intercepted ("Oops! Candidate is intercepted."): another thread evicts the frame first, and the thread that lost the CAS simply continues scanning. Every transition is a bounded atomic step, so a stalled thread never blocks the others.
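A condensed C11 sketch of this state machine (illustrative only; field names such as refcount, pinned and evicted follow the diagram, and select_victim is a hypothetical signature). Each transition is one atomic step, so a thread that is preempted between steps cannot prevent other threads from finding a victim.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        atomic_int  refcount;   /* GCLOCK weight, decremented by the clock hand */
        atomic_bool pinned;     /* frame currently fixed by some thread */
        atomic_bool evicted;    /* frame already claimed as a victim */
    } frame_t;

    frame_t *select_victim(frame_t *frames, int nframes, atomic_uint *clock_hand) {
        for (;;) {
            /* E: move the clock hand (atomic increment, wrap around). */
            unsigned pos = atomic_fetch_add(clock_hand, 1) % (unsigned)nframes;
            frame_t *f = &frames[pos];

            if (atomic_load(&f->evicted) || atomic_load(&f->pinned))
                continue;                                  /* E: try next entry */

            /* E: decrement the refcount with a CAS loop. */
            int rc = atomic_load(&f->refcount);
            while (rc > 0 &&
                   !atomic_compare_exchange_weak(&f->refcount, &rc, rc - 1))
                ;                                          /* rc refreshed on failure */
            if (rc > 1)
                continue;                                  /* --refcount > 0: keep scanning */

            /* E: try to evict; only one thread can flip evicted from false to true. */
            bool expected = false;
            if (atomic_compare_exchange_strong(&f->evicted, &expected, true))
                return f;                                  /* we own the victim */
            /* Candidate was intercepted by another thread ("Oops!"); keep scanning. */
        }
    }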

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion


Experimental settings

- Workload: Zipf 80/20 distribution (a famous power law), containing 20% sequential scans; the dataset is 32GB in total.
- Machine: UltraSPARC T2, 64 processors.

We also performed evaluations on various x86 settings in the paper.
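For concreteness, one way to drive such a workload (a sketch under the assumption that "80/20" means 80% of requests hit the hottest 20% of pages; the paper's generator and its sequential-scan component are not reproduced here):

    #include <stdlib.h>

    /* Return the next requested page id out of npages pages. */
    long next_page(long npages) {
        long hot = npages / 5;                        /* the hottest 20% of pages */
        if (rand() < RAND_MAX / 5 * 4)                /* ~80% of requests */
            return rand() % hot;
        return hot + rand() % (npages - hot);         /* the remaining pages */
    }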

Performance comparison on moderate I/Os (fig. 9)

[Figure: throughput (normalized by LRU, 0–6) versus number of processors (8, 16, 32, 64) for LRU, GCLOCK and Nb-GCLOCK]

CPU utilization – previous approaches: low, about 20%; Nb-GCLOCK: high, more than 95%.

An even larger difference in CPU time can be expected as the number of CPUs increases ➜ we expect higher throughput.

Maximum throughput to processors

Scalability to processors when all pages are resident in memory, intended to show the scalability limit of each algorithm.

Throughput (operations/sec; plotted on a log scale) versus processors (cores):

    Processors (cores)    8 (1)       16 (2)      32 (4)       64 (8)
    2Q                      890,992     819,975     866,009      662,782
    GCLOCK                1,758,605   1,912,000   1,931,268    1,817,748
    Nb-GCLOCK             3,409,819   7,331,722  14,245,524   25,834,449

Nb-GCLOCK achieved almost linear scalability, at least up to 64 processors! This is the first attempt that removed locks from buffer management.

It is also interesting that GCLOCK hits a CPU-scalability limit at around 16 processors – caching solutions built on GCLOCK have their scalability limit there.

Max throughput (operations/sec) evaluation

Workload is Zipf 80/20, evaluated on UltraSPARC T2 (64 procs). Accesses are issued from 64 threads for 60 seconds, so ideally 64 x 60 = 3,840 seconds of CPU time can be used.

- Nb-GCLOCK: most of the CPU time is used, because Nb-GCLOCK is non-blocking.
- Previous schemes: only about 10-20% of the CPU time is used.

The gap in CPU utilization widens as the number of processors grows, because the locks cause ever more contention.

TPC-C evaluation using Apache Derby

[Figure: tpmC (transactions per minute, 800–1400) versus number of terminals (threads): 8, 16, 32, 64, 128 for the original Derby and for Nb-GCLOCK]

The original scheme of Derby (CLOCK) decreased in throughput as terminals were added; our scheme showed a better result.

With the contention in the buffer management module removed, the remaining bottleneck is the latch on the root page of the B+-tree ➜ a concurrent B+-tree would be required (see OLFIT).

Sang Kyun Cha et al., "Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems", In Proc. VLDB, 2001.

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion


Bp-Wrapper

Xiaoning Ding, Song Jiang, and Xiaodong Zhang: "BP-Wrapper: A System Framework Making Any Replacement Algorithms (Almost) Lock Contention Free", In Proc. ICDE, 2009.

[Figure: page requests first pass through an access-recording stage in front of the lookup hash buckets and an arbitrary page replacement algorithm; hits and misses then proceed to the database files as usual]

Bp-Wrapper eliminates lock contention on buffer hits by using a batching and prefetching technique – called lazy synchronization in the literature: it postpones the physical work (adjusting the buffer replacement list) and immediately returns the logical operation.

Pros:
- Works with any page replacement algorithm.

Cons:
- Does not increase the throughput of CLOCK variants, because CLOCK does not require locks on buffer hits.
- Cache misses involve replaying the batch; the longer lock-holding time causes more contention.
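A hedged sketch of the lazy-synchronization idea behind Bp-Wrapper (illustrative only; BATCH_SIZE, record_hit and replay_access are made-up names, not the paper's API): on a hit the access is merely appended to a thread-local batch, and the replacement list's lock is taken once per batch to replay all recorded accesses.

    #include <pthread.h>

    #define BATCH_SIZE 64

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for adjusting the shared replacement list (LRU, 2Q, ...). */
    static void replay_access(int page_id) { (void)page_id; }

    typedef struct { int page_ids[BATCH_SIZE]; int n; } access_batch_t;
    static _Thread_local access_batch_t batch;     /* one batch per thread */

    void record_hit(int page_id) {
        batch.page_ids[batch.n++] = page_id;
        if (batch.n == BATCH_SIZE) {
            pthread_mutex_lock(&list_lock);        /* one acquisition per batch */
            for (int i = 0; i < batch.n; i++)
                replay_access(batch.page_ids[i]);
            pthread_mutex_unlock(&list_lock);
            batch.n = 0;
        }
    }

This amortizes the lock over BATCH_SIZE hits; as the slide notes, it does not help CLOCK variants, which already avoid the lock on hits.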

Conclusions

Proposed a lock-free variant of the GCLOCK page replacement algorithm, named Nb-GCLOCK.

- Almost linear scalability up to (at least) 64 processors, while existing locking-based schemes do not scale beyond 16 processors.
- The first attempt to introduce non-blocking synchronization into database buffer management; optimistic I/Os using pread, CAS and memory barriers.
- Linearizability and lock-freedom are proven in the paper. Lock-freedom guarantees a certain throughput: any active thread taking a bounded number of steps ensures global progress.
- This work is also useful for any caching solution that requires high throughput (e.g., C10K accesses).

Thank you for your attention!
