Scalable Locality-Conscious Multithreaded Memory Allocation. Scott Schneider, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos. The College of William and Mary. The 2006 International Symposium on Memory Management, June 10, 2006.
Page 1:

Scalable Locality-Conscious Multithreaded

Memory Allocation

Scott Schneider

Christos D. Antonopoulos

Dimitrios S. Nikolopoulos

The College of William and Mary

The 2006 International Symposium on Memory Management

June 10, 2006

Page 2:

Outline

Introduction
Related Work
Streamflow design: data structures and operations
Experimental Evaluation
Conclusions

Page 3:

Introduction

Multithreading is becoming more common
Sophistication of system software trails hardware
Synchronization mechanisms used in system software can greatly affect performance

Page 4:

Related Work

Hoard
  Emery Berger et al., ASPLOS 2000
  Lock based, per-processor and global heaps
Michael's
  Maged Michael, PLDI 2004
  Lock-free
Tcmalloc
  Sanjay Ghemawat, part of Google's perftools
  Lock based

Page 5:

Streamflow

Promote scalability and reduce latency
  Lock-free algorithms and data structures
  Synchronization-free in the common case
  Decoupled remote object deallocation
Promote locality
  Favors locally recycled objects in private heaps
  Thread-local heaps reduce false sharing
  Removing object headers
  Custom page manager
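The slide lists "Removing object headers" without saying how an object is matched to its metadata. One common way to do this, shown here only as a minimal sketch, is to keep the metadata in a pageblock header and recover it from the object address by masking. The sketch assumes pageblocks are aligned to a power-of-two size; the 16 KB size, the struct fields and the name `pageblock_of` are hypothetical and are not taken from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pageblock size; it must be a power of two for the mask to work. */
#define PAGEBLOCK_SIZE ((uintptr_t)16 * 1024)

typedef struct pageblock {
    unsigned owner_id;     /* thread that owns this pageblock (assumed field) */
    size_t   object_size;  /* size of every object in the block (assumed field) */
} pageblock_t;

/* Round an object address down to the start of its enclosing pageblock.
 * Only valid if every pageblock starts on a PAGEBLOCK_SIZE boundary. */
static pageblock_t *pageblock_of(const void *obj)
{
    return (pageblock_t *)((uintptr_t)obj & ~(PAGEBLOCK_SIZE - 1));
}
```

With this layout a free(ptr) call needs no per-object header: the size class and the owner are read from the block that `pageblock_of(ptr)` returns.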

Pages 6-13:

Design: Data Structures

[Figure, built up over eight slides. Each thread (Thread 1 ... Thread n) owns a heap that serves malloc/free and is indexed by object size class: 1-4, 5-8, 9-12, 13-16, ..., 2045-2048. Each size class keeps a list of pageblocks (Page blk 1, Page blk 2, ..., Page blk k) with Active Head and Active Tail pointers. Each pageblock holds freed and unallocated objects and carries the fields Next, Prev, Remotely Freed and ID. The labels "heaps" and "pageblocks" mark the two halves of the diagram.]
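The labels in the figure can be written down as C types. This is a sketch, not the authors' code: the field names follow the figure labels, while the `unallocated` pointer, the type names and `class_index` are assumptions made for illustration. The size-class arithmetic follows directly from the ranges shown (1-4, 5-8, ..., 2045-2048), which give 512 classes of width 4.

```c
#include <stddef.h>

#define CLASS_WIDTH 4       /* each class spans four sizes: 1-4, 5-8, ... */
#define MAX_SMALL   2048    /* largest size shown in the figure */
#define NUM_CLASSES (MAX_SMALL / CLASS_WIDTH)   /* 512 classes */

typedef struct pageblock {
    struct pageblock *next, *prev;  /* links in the size class's list */
    void *freed;                    /* objects freed by the owning thread */
    void *remotely_freed;           /* objects freed by other threads */
    char *unallocated;              /* start of never-allocated space (assumed) */
    unsigned id;                    /* identifies the owning thread */
} pageblock_t;

typedef struct size_class {
    pageblock_t *active_head, *active_tail;
} size_class_t;

typedef struct heap {               /* one per thread */
    size_class_t classes[NUM_CLASSES];
} heap_t;

/* Map a request size to its class index, or -1 if it is not a small object. */
static int class_index(size_t size)
{
    if (size == 0 || size > MAX_SMALL)
        return -1;
    return (int)((size - 1) / CLASS_WIDTH);
}
```

For example, requests of 5 to 8 bytes all map to index 1, and a request of 2049 bytes falls outside the small-object classes.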

Pages 14-16:

Design: Allocation

[Figure, animated over three slides: the same heap and pageblock diagram as on the Data Structures slides (per-thread heaps, size classes 1-4 through 2045-2048, Active Head and Active Tail pageblock lists, pageblock fields Next, Prev, Remotely Freed and ID), here illustrating an allocation. No additional text survives in the transcript.]
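The allocation slides survive only as a figure, so the following is a guess at the common path rather than a transcription of it: reuse a freed object from the pageblock if one exists, otherwise carve a new object from the unallocated region. It is single-threaded on purpose, because the owning thread is the only one that touches these fields. The `end` and `object_size` fields and the name `pageblock_alloc` are hypothetical.

```c
#include <stddef.h>

typedef struct pageblock {
    void  *freed;        /* singly linked LIFO list of freed objects */
    char  *unallocated;  /* next never-used byte in the block */
    char  *end;          /* one past the last usable byte (assumed field) */
    size_t object_size;  /* fixed object size, at least sizeof(void *) */
} pageblock_t;

/* Returns an object from pb, or NULL if the pageblock is exhausted. */
static void *pageblock_alloc(pageblock_t *pb)
{
    if (pb->freed != NULL) {            /* 1. reuse a freed object */
        void *obj = pb->freed;
        pb->freed = *(void **)obj;      /* the next pointer lives inside the object */
        return obj;
    }
    if ((size_t)(pb->end - pb->unallocated) >= pb->object_size) {
        void *obj = pb->unallocated;    /* 2. carve from untouched space */
        pb->unallocated += pb->object_size;
        return obj;
    }
    return NULL;                        /* caller must try another pageblock */
}
```

Serving freed objects first matches the stated goal of favoring locally recycled objects, since those are the ones most likely to still be in cache.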

Pages 17-19:

Design: Local Free

[Figure, animated over three slides: the same heap and pageblock diagram as on the Data Structures slides, here illustrating a local free.]

Case shown: the pageblock belongs to the current thread.
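The local-free case can be sketched as an ownership test followed by an unsynchronized push. This assumes the pageblock's ID field identifies the owning thread; the function name, the `my_id` parameter and the boolean return value are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct pageblock {
    void    *freed;  /* objects freed by the owning thread */
    unsigned id;     /* ID of the owning thread */
} pageblock_t;

/* Returns true if obj was recycled locally; false means the caller
 * must take the remote-free path instead. */
static bool local_free(pageblock_t *pb, void *obj, unsigned my_id)
{
    if (pb->id != my_id)
        return false;
    *(void **)obj = pb->freed;  /* link the object in front of the list */
    pb->freed = obj;            /* no atomics: only the owner touches this list */
    return true;
}
```

Because the list is private to the owner, this path needs no locks and no atomic instructions, which is what "synchronization-free in the common case" suggests.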

Pages 20-21:

Design: Remote Free

[Figure, animated over two slides: the same heap and pageblock diagram as on the Data Structures slides, here illustrating a remote free.]

Case shown: the pageblock does not belong to the current thread.
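A remote free has to hand the object back to a pageblock owned by another thread. The sketch below shows one lock-free way to do that with C11 atomics: non-owners push onto the pageblock's remotely-freed list with a compare-and-swap loop, and the owner later detaches the whole list in a single atomic exchange. The slides do not show the actual algorithm, so the function names and the reclaim policy are assumptions.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct pageblock {
    _Atomic(void *) remotely_freed;  /* lock-free LIFO shared by all threads */
    void *freed;                     /* private list, touched by the owner only */
} pageblock_t;

/* Called by a non-owning thread: push obj with a compare-and-swap loop. */
static void remote_free(pageblock_t *pb, void *obj)
{
    void *head = atomic_load(&pb->remotely_freed);
    do {
        *(void **)obj = head;  /* link to the head we expect to replace */
        /* on failure, head is refreshed with the current list head */
    } while (!atomic_compare_exchange_weak(&pb->remotely_freed, &head, obj));
}

/* Called by the owner: detach the whole remote list in one atomic step
 * and adopt it as the private free list. Assumes pb->freed is empty. */
static void reclaim_remote(pageblock_t *pb)
{
    pb->freed = atomic_exchange(&pb->remotely_freed, (void *)NULL);
}
```

Pushing with compare-and-swap and popping everything at once sidesteps the ABA problem that affects lock-free stacks with single-element pops, and it keeps remote frees off the owner's private list.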

Page 22:

Design: Page Manager

Manages pageblocks
Implemented using superpages; 4 MB vs. 4 KB pages
  Allows Streamflow to allocate pageblocks in contiguous physical memory regions
  Reduces TLB misses and minor page faults
Superpage headers are managed similarly to small objects
Pageblocks are allocated within a superpage using buddy allocation
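The arithmetic behind buddy allocation inside a superpage can be sketched as follows. The 4 MB and 4 KB figures come from the slide; the function names are hypothetical, and the sketch covers only the size rounding and buddy-address computation, not the free lists a full buddy allocator would keep.

```c
#include <stddef.h>

#define PAGE_SIZE      ((size_t)4 * 1024)          /* 4 KB base page */
#define SUPERPAGE_SIZE ((size_t)4 * 1024 * 1024)   /* 4 MB superpage */

/* Smallest power-of-two block, in bytes, that can hold `bytes`.
 * Returns 0 if the request does not fit in one superpage. */
static size_t buddy_block_size(size_t bytes)
{
    size_t block = PAGE_SIZE;
    if (bytes > SUPERPAGE_SIZE)
        return 0;
    while (block < bytes)
        block <<= 1;   /* double until the request fits */
    return block;
}

/* Offset of the buddy of the block that starts at `offset`.
 * Offsets are relative to the superpage start, and `offset`
 * must be a multiple of `block`. */
static size_t buddy_of(size_t offset, size_t block)
{
    return offset ^ block;   /* flip the bit that distinguishes the pair */
}
```

A superpage holds 1024 base pages, so a pageblock request is rounded up to one of eleven block sizes between 4 KB and 4 MB. When a block is released, it can be merged with the block at `buddy_of(offset, block)` if that buddy is also free.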

Page 23:

Evaluation: System

4-processor Dell PowerEdge 6650
  Hyper-Threaded Intel Xeon processors at 2.0 GHz
  2 GB RAM
Suse Linux 9.1 with kernel 2.6.13.4 and glibc 2.3.3
Hoard version 3.3.0
Tcmalloc version 0.4
Custom 32-bit implementation of Michael's

Page 24:

Evaluation: Benchmarks

Sequential
  Parser: SPECINT2000 English parser
Multithreaded
  Synthetic
    Recycle: stresses local allocation and frees
    Larson: server simulator; stresses remote frees
    Consume: producer-consumer
  Applications
    MPCDM: multithreaded mesh generation

Page 25:

Evaluation: Sequential

[Figure: bar chart "Parser". Y axis: execution time (seconds), 0 to 700. Group labels: sequential, Streamflow, multithreaded. Bars: glibc sequential, Vam, Hoard sequential, Streamflow headers, Streamflow without headers, Streamflow super, glibc MT, Hoard MT, Michael, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 26:

Evaluation: Multithreaded

[Figure: plot "Recycle". X axis: threads, 1 to 8. Y axis: execution time (sec.), 0 to 70. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 27:

Evaluation: Multithreaded

[Figure: plot "Larson". X axis: threads, 1 to 8. Y axis: throughput (Mops/sec), 0 to 16. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 28:

Evaluation: Multithreaded

[Figure: plot "Consume". X axis: threads, 1 to 8. Y axis: execution time (sec.), 0 to 400. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 29:

Evaluation: Multithreaded

[Figure: plot "MPCDM". X axis: threads, 1 to 8. Y axis: execution time (sec.), 0 to 45. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 30:

Conclusions

Presented a new memory allocator design
  Uses lock-free algorithms and data structures
  Synchronization-free in the common case
  Promotes locality at multiple levels
Experimental evaluation shows the design performs well in practice

http://www.cs.wm.edu/streamflow

Page 31:

Evaluation: Multithreaded

[Figure: plot "Knary". X axis: threads, 1 to 8. Y axis: execution time (sec.), 0 to 40. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]

Page 32:

Evaluation: Multithreaded

[Figure: plot "Barnes". X axis: threads, 1 to 8. Y axis: execution time (sec.), 0 to 70. Series: Streamflow headers, Streamflow without headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc. The plotted values are not recoverable from the transcript.]