Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK
Makoto YUI (1), Jun MIYAZAKI (2), Shunsuke UEMURA (3) and Hayato YAMANA (4)
1. Research Fellow, JSPS (Japan Society for the Promotion of Science) / Visiting Postdoc at Waseda University, Japan, and CWI, Netherlands; 2. Nara Institute of Science and Technology; 3. Nara Sangyo University; 4. Waseda University / National Institute of Informatics

ICDE2010 Nb-GCLOCK

May 10, 2015


Makoto Yui

Makoto Yui, Jun Miyazaki, Shunsuke Uemura and Hayato Yamana, "Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK",
Proc. ICDE, March 2010.
Transcript
Page 1: ICDE2010 Nb-GCLOCK

Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK

Makoto YUI (1), Jun MIYAZAKI (2), Shunsuke UEMURA (3) and Hayato YAMANA (4)

1. Research Fellow, JSPS (Japan Society for the Promotion of Science) / Visiting Postdoc at Waseda University, Japan, and CWI, Netherlands
2. Nara Institute of Science and Technology
3. Nara Sangyo University
4. Waseda University / National Institute of Informatics

Page 2: ICDE2010 Nb-GCLOCK

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion

2

Page 4: ICDE2010 Nb-GCLOCK

Background – Recent trends in CPU development

[Timeline figure: single-core CPUs (Pentium, 1990s), multi-core CPUs (Power4, Core2, Nehalem, 2000s), many-core CPUs (UltraSPARC T2, Azul Vega, Larrabee?)]

- Niagara T2: 8 cores x 8 SMT = 64 processors
- Azul Vega3: 54 cores x 16 chips = 864 processors

The number of CPU cores per chip is doubling roughly every two years.

The many-core era is coming.

Page 9: ICDE2010 Nb-GCLOCK

Background – CPU Scalability of open source DBs

Open source DBs have faced CPU scalability problems.

Ryan Johnson et al., "Shore-MT: A Scalable Storage Manager for the Multicore Era", In Proc. EDBT, 2009.

[Figure: normalized throughput (0 to 10) vs. concurrent threads (1 to 32) for PostgreSQL, MySQL, and BDB; microbenchmark on UltraSparc T1 (32 procs)]

Gain after 16 threads is less than 5%.

You might think: what about TPC-C?

Page 14: ICDE2010 Nb-GCLOCK

CPU scalability of PostgreSQL

Doug Tolbert, David Strong, Johney Tsai (Unisys), "Scaling PostgreSQL on SMP Architectures", PGCON 2007.

TPC-C benchmark result on a high-end Unisys Linux machine (Xeon-SMP 32 CPUs, 16 GB memory, EMC RAID10 storage).

[Figure: TPS vs. CPU cores for PostgreSQL versions 8.0, 8.1, and 8.2]

Gain after 16 CPU cores is less than 5%.

Q. What did the PostgreSQL community do?
A. They revised the synchronization mechanisms in the buffer management module.

Page 19: ICDE2010 Nb-GCLOCK

Synchronization in Buffer Management Module

Several empirical studies have revealed that the largest bottleneck is synchronization in the buffer management module.

[1] Ryan Johnson, Ippokratis Pandis, Anastassia Ailamaki, "Critical Sections: Re-emerging Scalability Concerns for Database Storage Engines", In Proc. DaMoN, 2008.
[2] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker, "OLTP Through the Looking Glass, and What We Found There", In Proc. SIGMOD, 2008.

[Diagram: the buffer manager sits between the CPU/memory and the database files on disk, and reduces disk access by caching database pages]

A page request goes through two steps:
(1) looking up the hash table (hits are served from the buffer), and
(2) running the page replacement algorithm on misses, which go to the database files.

Page 24: ICDE2010 Nb-GCLOCK

Naive buffer management schemes

Both PostgreSQL 8.0 and 8.1 pair a buffer lookup hash table with LRU page replacement; the LRU list always needs to be locked when it is accessed.

PostgreSQL 8.0: a single giant lock protects the whole lookup table. Giant lock sucks! It did not scale at all.

PostgreSQL 8.1: striped the lock across hash buckets. Scales up to 8 processors.
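The 8.1-style fix can be sketched in C with pthreads. This is an illustrative sketch of lock striping in general, not PostgreSQL's actual code; the stripe count and names are assumptions.

```c
#include <pthread.h>
#include <stdint.h>

#define N_STRIPES 16  /* illustrative stripe count, not PostgreSQL's */

static pthread_mutex_t stripes[N_STRIPES];

/* Call once at startup: give every stripe its own mutex. */
static void init_stripes(void) {
    for (int i = 0; i < N_STRIPES; i++)
        pthread_mutex_init(&stripes[i], NULL);
}

/* Each page id hashes to one stripe, so two lookups contend
 * only when their pages land on the same stripe. */
static pthread_mutex_t *lock_for(uint64_t page_id) {
    return &stripes[page_id % N_STRIPES];
}
```

With one giant lock every lookup serializes; with stripes, only lookups that hash to the same stripe do.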

Page 26: ICDE2010 Nb-GCLOCK

Less naive buffer management schemes

PostgreSQL 8.1: hash buckets + LRU page replacement. The LRU list always needs to be locked when it is accessed. Scales up to 8 processors.

PostgreSQL 8.2: hash buckets + CLOCK page replacement. CLOCK does not require a lock when an entry is touched. Scales up to 16 processors.
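The reason CLOCK needs no lock on a hit can be sketched in C11 atomics: a hit only sets a per-frame reference flag, whereas LRU must re-link a shared list. The struct and function names below are illustrative, not PostgreSQL's.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative frame: CLOCK keeps one reference flag per frame. */
struct frame {
    atomic_bool referenced;
};

/* On a buffer hit, CLOCK only marks the frame; there is no list to
 * re-link, so no lock is required. */
void clock_touch(struct frame *f) {
    atomic_store_explicit(&f->referenced, true, memory_order_relaxed);
}

/* The clock hand gives each frame a "second chance": it clears the
 * flag and treats a frame as an eviction candidate only if the flag
 * was already clear. */
bool clock_sweep_candidate(struct frame *f) {
    return !atomic_exchange_explicit(&f->referenced, false,
                                     memory_order_relaxed);
}
```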

Page 27: ICDE2010 Nb-GCLOCK

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion

27

Page 34: ICDE2010 Nb-GCLOCK

Core idea of our approach

Previous approaches: the buffer manager reduces disk I/Os (good), but its locks are contended (bad). The intuition: with enough processors, threads pile up on the locks while disk bandwidth goes underutilized.

Our optimistic approach: reduce lock granularity to a single CPU instruction and remove the bottleneck. The number of I/Os slightly increases, but there is no contention on locks.

Page 39: ICDE2010 Nb-GCLOCK

Major Difference to Previous Approaches

Previous approaches: reduce disk I/Os, but locks are contended.
Our optimistic approach: the number of I/Os slightly increases, but there is no contention on locks.

Their goal: improve buffer hit rates to reduce I/Os. This has been the single goal for many decades, but is it still valid in the many-core era? There are also SSDs.

Our goal: improve throughput by utilizing (many) CPUs. Use non-blocking synchronization instead of acquiring locks!

Page 45: ICDE2010 Nb-GCLOCK

What's non-blocking and lock-free?

Formally: stopping one thread will not prevent global progress; individual threads make progress without waiting.

Less formally: no thread 'locks' any resource; there are no 'critical sections', locks, mutexes, spin-locks, etc.

Lock-free: every successful step makes global progress and completes within finite time (ensuring liveness).

Wait-free: every step makes global progress and completes within finite time (ensuring fairness).

Page 49: ICDE2010 Nb-GCLOCK

Non-blocking synchronization

A synchronization method that does not acquire any lock, enabling concurrent access to shared resources. It utilizes atomic CPU primitives, such as CAS (compare-and-swap, cmpxchg on x86), and memory barriers.

Blocking:

acquire_lock(lock);
counter++;
release_lock(lock);

Non-blocking:

int old;
do {
    old = *counter;
} while (!CAS(counter, old, old+1));

The counter is incremented only if its value still equals old; otherwise the loop retries.
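The non-blocking variant above maps directly onto C11 atomics. A minimal sketch (the function name is mine): atomic_compare_exchange_weak rewrites `old` with the current value when it fails, so the loop simply retries.

```c
#include <stdatomic.h>

/* Non-blocking increment: no thread ever holds a lock; a thread that
 * loses the CAS race retries with the freshly observed value. */
void nb_increment(atomic_int *counter) {
    int old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* CAS failed: 'old' now holds the current value; retry. */
    }
}
```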

Page 52: ICDE2010 Nb-GCLOCK

Making the buffer manager non-blocking

[Diagram: hash buckets feed the GCLOCK page replacement algorithm; hits are served from the buffer, misses go to the database files via "lock; lseek; read; unlock"]

1. Utilized an existing lock-free hash table

2. Removed locks on cache misses (in fig. 6)

Page 56: ICDE2010 Nb-GCLOCK

Making the buffer manager non-blocking (cont.)

3. Needed to keep consistency between the lookup hash table and GCLOCK (in the right half of fig. 3): immediately after changing the page allocation of a buffer frame, the reference in the buffer lookup table still holds a different page identifier.

4. Avoided locks on I/Os by utilizing pread, CAS, and memory barriers (in fig. 5)
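Point 4 rests on the fact that pread carries its own offset. A sketch (the page size and function names are illustrative assumptions, not the paper's code):

```c
#include <unistd.h>
#include <sys/types.h>

#define PAGE_SIZE 8192  /* illustrative page size */

/* Offset arithmetic for a fixed-size page file. */
static off_t page_offset(off_t page_no) {
    return page_no * PAGE_SIZE;
}

/* lseek()+read() share the per-descriptor file offset, so the pair
 * must be wrapped in a critical section (the slide's
 * "lock; lseek; read; unlock").  pread() takes the offset as an
 * explicit argument and never touches the shared offset, so
 * concurrent readers need no lock at all. */
static ssize_t read_page(int fd, void *buf, off_t page_no) {
    return pread(fd, buf, PAGE_SIZE, page_offset(page_no));
}
```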

Page 57: ICDE2010 Nb-GCLOCK

State Machine-based Reasoning for selecting a replacement victim

Construct the algorithm from many small 'steps': build a state machine that ensures global progress.

Page 66: ICDE2010 Nb-GCLOCK

[State machine diagram; E: denotes the action taken on entering a state]

- Select a frame: start finding a replacement victim. If the slot is null, continue to the next slot; otherwise check whether it is evicted.
- Check whether evicted: if already evicted, try the next entry; if not, check whether it is pinned.
- Check whether pinned: if pinned, move the clock hand to the next candidate; if not, try to decrement the refcount.
- Try to decrement the refcount: decrement the weight count of the buffer page with a CAS. If the value was not swapped, retry; on success, if --refcount > 0, move the clock hand; if --refcount <= 0, try to evict.
- Try to evict: if eviction succeeds, fix the frame in the pool and return it as the replacement victim; if not, continue from frame selection.

Threads A and B can walk this machine concurrently. Oops! A candidate found by thread A may be intercepted by thread B; in that case A simply continues sweeping.
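The walk described above can be sketched in C11 atomics. This is an illustrative reduction of the slide's state machine, not the paper's actual code: the frame fields, the simplified clock hand, and the assumption that at least one evictable frame exists are all mine.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative frame: refcount is the GCLOCK weight, pinned marks
 * frames in use, evicted marks frames already claimed. */
struct frame {
    atomic_int  refcount;
    atomic_bool pinned;
    atomic_bool evicted;
};

/* Sweep the clock hand: skip evicted or pinned frames, CAS-decrement
 * the weight of the rest, and claim the first frame whose weight has
 * reached zero.  The CAS on 'evicted' makes interception by another
 * thread (the "Oops!" case) safe: exactly one thread wins the frame.
 * The sketch assumes at least one evictable frame exists. */
struct frame *find_victim(struct frame *frames, size_t n, size_t *hand) {
    for (;;) {
        struct frame *f = &frames[*hand % n];
        *hand += 1;                          /* advance the clock hand */
        if (atomic_load(&f->evicted) || atomic_load(&f->pinned))
            continue;                        /* try the next entry */
        int old = atomic_load(&f->refcount);
        while (old > 0 &&
               !atomic_compare_exchange_weak(&f->refcount, &old, old - 1))
            ;                                /* CAS-decrement the weight */
        if (old > 1)
            continue;                        /* still warm: move on */
        bool expected = false;               /* try to claim the frame */
        if (atomic_compare_exchange_strong(&f->evicted, &expected, true))
            return f;                        /* we won the victim */
        /* else: intercepted by another thread; keep sweeping */
    }
}
```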

Page 68: ICDE2010 Nb-GCLOCK

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion

68

Page 70: ICDE2010 Nb-GCLOCK

Experimental settings

Workload: Zipf 80/20 distribution (a famous power law), containing 20% sequential scans; the dataset is 32 GB in total.
Machine used: UltraSPARC T2 (64 processors).

We also performed evaluations on various x86 settings in the paper.
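For intuition, the 80/20 skew can be sketched as a deterministic mapping from two uniform draws to a page id: 80% of requests land in the "hot" first 20% of the pages. This coarse stand-in, and the function name, are mine; the paper uses a true Zipf generator.

```c
#include <stdint.h>

/* Illustrative 80/20 sampler: u1 decides hot vs. cold, u2 picks a
 * page within the chosen set; both are uniform in [0,1). */
uint64_t zipf_80_20(double u1, double u2, uint64_t n_pages) {
    uint64_t hot = n_pages / 5;              /* hot set: 20% of pages */
    if (u1 < 0.8)
        return (uint64_t)(u2 * hot);         /* 80% of requests: hot pages */
    return hot + (uint64_t)(u2 * (n_pages - hot));  /* the rest: cold */
}
```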

Page 73: ICDE2010 Nb-GCLOCK

Performance comparison on moderate I/Os (fig. 9)

[Figure: throughput (normalized by LRU, 0.0 to 6.0) vs. processors (8, 16, 32, 64) for LRU, GCLOCK, and Nb-GCLOCK]

CPU utilization: previous approaches are low, about 20%; Nb-GCLOCK is high, more than 95%.

An even larger difference in CPU time can be expected as the number of CPUs increases, so we expect still higher throughput.

Page 77: ICDE2010 Nb-GCLOCK

Maximum throughput to processors

Scalability to processors when all pages are resident in memory, intended to show the scalability limit implied by each algorithm.

Throughput (operations/sec; plotted on a log scale) by processors (cores):

             8 (1)       16 (2)      32 (4)       64 (8)
2Q           890,992     819,975     866,009      662,782
GCLOCK       1,758,605   1,912,000   1,931,268    1,817,748
Nb-GCLOCK    3,409,819   7,331,722   14,245,524   25,834,449

Nb-GCLOCK achieved almost linear scalability, at least up to 64 processors! This is the first attempt to remove locks from buffer management.

Interestingly, GCLOCK hits a CPU-scalability limit at around 16 processors; caching solutions built on GCLOCK share that limit.

Page 82: ICDE2010 Nb-GCLOCK

82

Nb-GCLOCK uses most of the available CPU time because it is non-blocking, while the lock-based schemes use only about 10-20% of it. This gap in CPU utilization would widen further as the number of processors grows, since locking causes ever more contention.

Page 83: ICDE2010 Nb-GCLOCK

TPC-C evaluation using Apache Derby

[Chart: transactions per minute (tpmC, roughly 800 to 1,400) against the number of terminals (threads: 8, 16, 32, 64, 128), comparing stock Derby with Nb-GCLOCK]

83

Sang Kyun Cha et al. Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems. In Proc. VLDB, 2001.

Page 84: ICDE2010 Nb-GCLOCK

The original Derby scheme (CLOCK) decreased in throughput as terminals were added; our scheme, in contrast, showed better results.

84

Page 85: ICDE2010 Nb-GCLOCK

With buffer management no longer the bottleneck, throughput is now limited by the latch on the root page of the B+-tree ➜ a concurrent B+-tree (see OLFIT) would be required.

85

Page 86: ICDE2010 Nb-GCLOCK

Outline

• Background

• Our approach

– Non-Blocking Synchronization

– Nb-GCLOCK

• Experimental Evaluation

• Related Work

• Conclusion

86

Page 87: ICDE2010 Nb-GCLOCK

87

Bp-wrapper

[Diagram: page requests flow through hash buckets into the page replacement algorithm (any); hits are served from the buffer, misses go to the database files; accesses are recorded for the replacement algorithm]

Xiaoning Ding, Song Jiang, and Xiaodong Zhang. BP-Wrapper: A System Framework Making Any Replacement Algorithms (Almost) Lock Contention Free. In Proc. ICDE, 2009.

BP-Wrapper eliminates lock contention on buffer hits by using a batching and prefetching technique.

Page 88: ICDE2010 Nb-GCLOCK

88

This is called lazy synchronization in the literature: the physical work (adjusting the buffer replacement list) is postponed, and the logical operation returns immediately.
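The lazy-synchronization idea can be sketched in Java as follows. This is only an illustrative sketch, not BP-Wrapper's actual code: the class and method names are hypothetical, and the prefetching side of the technique is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of BP-Wrapper-style lazy synchronization: each thread
// records page accesses in a private batch and only takes the
// replacement-policy lock once the batch is full, amortizing one
// lock acquisition over BATCH_SIZE accesses.
public class BatchedRecorder {
    private static final int BATCH_SIZE = 64;

    private final ReentrantLock policyLock = new ReentrantLock();
    // Stands in for the shared replacement list (LRU, 2Q, ...).
    private final List<Integer> committed = new ArrayList<>();

    private final ThreadLocal<List<Integer>> batch =
            ThreadLocal.withInitial(() -> new ArrayList<>(BATCH_SIZE));

    // Called on every buffer hit: the logical operation returns
    // immediately; the physical work is postponed.
    public void recordAccess(int pageId) {
        List<Integer> b = batch.get();
        b.add(pageId);
        if (b.size() >= BATCH_SIZE) {
            flush(b);
        }
    }

    // Physical work: replay the whole batch against the replacement
    // policy under a single lock acquisition.
    private void flush(List<Integer> b) {
        policyLock.lock();
        try {
            committed.addAll(b);
        } finally {
            policyLock.unlock();
        }
        b.clear();
    }

    public int committedCount() {
        policyLock.lock();
        try {
            return committed.size();
        } finally {
            policyLock.unlock();
        }
    }
}
```

Note that accesses sit in the thread-local batch until it fills, which is exactly the postponed physical work described above.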

Page 89: ICDE2010 Nb-GCLOCK

89

Pros: works with any page replacement algorithm.

Cons: does not increase the throughput of CLOCK variants, because CLOCK requires no locks on buffer hits. Moreover, cache misses involve batching, and the longer lock holding time causes more contention.
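The reason CLOCK variants need no lock on a buffer hit is that recording a hit is just an atomic counter update on the frame, with no shared list to reorder. A minimal illustrative sketch (the class and field names are ours, not from any real buffer manager):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// In GCLOCK, each frame carries a reference counter (weight).
// A buffer hit only increments that counter atomically -- no list
// reordering and hence no lock is needed, unlike LRU or 2Q, which
// must move the page within a shared list on every hit.
public class GClockHitPath {
    private final AtomicIntegerArray weights;

    public GClockHitPath(int numFrames) {
        this.weights = new AtomicIntegerArray(numFrames);
    }

    // Called on every buffer hit for the frame holding the page.
    public void onHit(int frame) {
        weights.incrementAndGet(frame); // single atomic RMW, lock-free
    }

    public int weight(int frame) {
        return weights.get(frame);
    }
}
```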


Page 93: ICDE2010 Nb-GCLOCK

93

Conclusions

Proposed Nb-GCLOCK, a lock-free variant of the GCLOCK page replacement algorithm.

Achieved almost linear scalability up to 64 processors, while existing lock-based schemes do not scale beyond 16 processors.

This is the first attempt to introduce non-blocking synchronization into database buffer management: optimistic I/O using pread, CAS, and memory barriers.

Linearizability and lock-freedom are proven in the paper. Lock-freedom guarantees a certain throughput: some active thread always completes its operation within a bounded number of steps, ensuring global progress.

This work is also useful for any caching solution that requires high throughput (e.g., handling C10K-scale access rates).
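The core CAS-based victim selection can be sketched as below. This is only an illustrative sketch under simplifying assumptions (a weight per frame, an atomic clock hand, victims claimed by CAS-ing weight 0 to -1); the actual Nb-GCLOCK additionally handles pinned frames, optimistic reads with pread, and memory barriers, and its lock-freedom is proven in the paper.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of a lock-free GCLOCK sweep: the clock hand is an
// AtomicInteger and each frame's weight is updated with CAS.
// A thread claims a victim by CAS-ing its weight from 0 to -1,
// so two threads can never evict the same frame.
public class NbGClockSketch {
    private final AtomicInteger hand = new AtomicInteger(0);
    private final AtomicIntegerArray weights;
    private final int numFrames;

    public NbGClockSketch(int numFrames) {
        this.numFrames = numFrames;
        this.weights = new AtomicIntegerArray(numFrames);
    }

    // Buffer hit: a single lock-free atomic increment.
    public void onHit(int frame) {
        weights.incrementAndGet(frame);
    }

    // Sweep until some frame's weight reaches 0 and we win the CAS
    // that claims it. Every CAS failure means another thread made
    // progress, which is the essence of lock-freedom.
    public int evict() {
        while (true) {
            int frame = Math.floorMod(hand.getAndIncrement(), numFrames);
            int w = weights.get(frame);
            if (w > 0) {
                // Decrement the weight; losing this CAS is harmless,
                // it just means a concurrent thread changed it first.
                weights.compareAndSet(frame, w, w - 1);
            } else if (w == 0 && weights.compareAndSet(frame, 0, -1)) {
                return frame; // claimed as the victim
            }
        }
    }

    // After reloading the frame with a new page, make it evictable again.
    public void release(int frame) {
        weights.set(frame, 0);
    }
}
```

For example, with weights [2, 1, 0] the sweep decrements frames 0 and 1 and claims frame 2, all without ever blocking a concurrent hit or eviction.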

Page 94: ICDE2010 Nb-GCLOCK

94

Thank you for your attention!