A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research

Jan 02, 2016


Conrad Sparks
Page 1: A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization

David Bacon, Perry Cheng (presenting), V.T. Rajan

IBM T.J. Watson Research

Page 2: Roadmap

What is Real-time Garbage Collection?
– Pause Time, CPU utilization (MMU), and Space Usage

Heap Architecture
– Types of Fragmentation
– Incremental Compaction
– Read Barriers
– Barrier Performance

Scheduling: Time-Based vs. Work-Based

Empirical Results
– Pause Time Distribution
– Minimum Mutator Utilization (MMU)
– Pause Times

Summary and Conclusion

Page 3: Problem Domain

Real-time Embedded Systems
– Memory usage important
– Uniprocessor

Page 4: 3 Styles of Uniprocessor Garbage Collection: Stop-the-World vs. Incremental vs. Real-Time

[Figure: execution timelines for STW, Inc, and RT; horizontal axis: time]

Page 5: Pause Times (Average and Maximum)

[Figure: pause timelines for the three collectors]
– STW: pauses of 1.5 s and 1.7 s (labeled 1.6 s)
– Inc: pauses of 0.5 s, 0.7 s, 0.3 s, 0.5 s, 0.9 s, 0.3 s (labeled 0.5 s)
– RT: pauses of 0.15 - 0.19 s (labeled 0.18 s)

Page 6: Coarse-Grained Utilization vs. Time

[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 2.0 s window]

Page 7: Fine-Grained Utilization vs. Time

[Figure: utilization (0 to 1) vs. time (0 to 8 s) for STW, Inc, and RT, measured over a 0.4 s window]

Page 8: Minimum Mutator Utilization (MMU)

[Figure: MMU (0 to 100) vs. window size (s, logarithmic scale) for STW, Inc, and RT]
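To make the metric concrete: for a window size w, MMU is the smallest fraction of any w-long window in which the program (mutator) runs rather than the collector. The sketch below computes it from a list of collector pauses. The function names and the sample trace are illustrative assumptions, not taken from the slides; pauses are assumed not to overlap and the window is assumed to be no longer than the run.

```python
def mutator_utilization(pauses, start, window):
    """Fraction of [start, start + window] in which the mutator runs."""
    end = start + window
    paused = sum(max(0.0, min(end, p_end) - max(start, p_start))
                 for p_start, p_end in pauses)
    return 1.0 - paused / window


def mmu(pauses, total_time, window):
    """Minimum mutator utilization over every window of the given size."""
    last_start = total_time - window
    # The worst window either starts where a pause starts or ends where a
    # pause ends, so only those positions (plus the two edges) are checked.
    candidates = {0.0, last_start}
    for p_start, p_end in pauses:
        candidates.add(p_start)
        candidates.add(p_end - window)
    return min(
        mutator_utilization(pauses, min(max(c, 0.0), last_start), window)
        for c in candidates
    )


# Hypothetical trace: two collector pauses during an 8 s run.
pauses = [(1.0, 1.5), (3.0, 3.7)]
print(mmu(pauses, 8.0, 2.0))
print(mmu(pauses, 8.0, 0.4))
```

For this made-up trace a 2.0 s window gives an MMU of about 0.65, while a 0.4 s window fits entirely inside a pause and gives roughly 0, which is why the same collector can look acceptable at coarse resolution and poor at fine resolution.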

Page 9: Space Usage over Time

[Figure: used space (Mb, 0 to 80) vs. time (0 to 8 s) for STW, Inc, and RT; annotated with max live, trigger, and 2 X max live]

Page 10: Problems with Existing RT Collectors

[Figure: three panels: space (Mb, 0 to 100) vs. time (0 to 8 s) for a Non-moving Collector; an MMU plot (0 to 100); space (Mb, 0 to 100) vs. time (0 to 8 s) for a Replicating Collector. The space plots mark max live, 2 X max live, 3 X max live, and 4 X max live.]

– Not fully incremental
– Tight coupling
– Work-based scheduling

Page 11: Our Collector

Goals and Results
– Real-Time: ~10 ms
– Low Space Overhead: ~2X
– Good Utilization during GC: ~40%

Solution
– Incremental Mark-Sweep Collector
– Write barrier – snapshot-at-the-beginning [Yuasa]
– Segregated free list heap architecture
– Read Barrier – to support defragmentation [Brooks]
– Incremental defragmentation
– Segmented arrays – to bound fragmentation
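As an illustration of the snapshot-at-the-beginning idea, here is a minimal sketch of a Yuasa-style write barrier in a toy object model. While marking is in progress, every pointer store first shades the value being overwritten, so anything reachable when marking began is still marked. The classes `Obj` and `Collector` and all method names are assumptions made for this example, not the collector's actual code.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}      # named reference fields
        self.marked = False


class Collector:
    def __init__(self):
        self.marking = False  # True while an incremental mark phase is active
        self.mark_stack = []  # gray objects: marked but not yet scanned

    def shade(self, obj):
        if obj is not None and not obj.marked:
            obj.marked = True
            self.mark_stack.append(obj)

    def write_field(self, obj, field, new_value):
        """Pointer store with a snapshot-at-the-beginning write barrier."""
        if self.marking:
            # Keep the snapshot intact: the overwritten target stays marked
            # for this cycle even if this was the last pointer to it.
            self.shade(obj.fields.get(field))
        obj.fields[field] = new_value

    def mark_step(self, budget):
        """Scan at most `budget` gray objects, then yield to the mutator."""
        while budget > 0 and self.mark_stack:
            obj = self.mark_stack.pop()
            for child in obj.fields.values():
                self.shade(child)
            budget -= 1
        return not self.mark_stack  # True once marking has finished


# The mutator hides `b` behind an already-scanned object mid-collection.
gc = Collector()
root, a, b = Obj("root"), Obj("a"), Obj("b")
root.fields["x"] = a
a.fields["y"] = b
gc.marking = True
gc.shade(root)
gc.mark_step(1)                # scans root, shades a
gc.write_field(root, "z", b)   # root was scanned already
gc.write_field(a, "y", None)   # barrier shades b before the pointer is lost
while not gc.mark_step(1):
    pass
```

Without the barrier, `b` would be reachable only through the already-scanned `root` and would be swept while still live.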

Page 12: Roadmap (same outline as Page 2)

Page 13: Fragmentation and Compaction

Intuitively: available but unusable memory
– avoidance and coalescing - no guarantees
– compaction

[Figure: heap diagram; legend: used, needed, free]

Page 14: Heap Architecture

Segregated Free Lists
– heap divided into pages
– each page has equally-sized blocks (1 object per block)
– Large arrays are segmented

[Figure: pages of size-24 and size-32 blocks, marked used or free, illustrating external, internal, and page-internal fragmentation]
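The following sketch shows one way a segregated free list heap might be organized, and where the three kinds of fragmentation appear. The page size, the list of size classes beyond 24 and 32, and the class `SegregatedHeap` are assumptions chosen for illustration; segmented large arrays are not modeled.

```python
PAGE_SIZE = 16 * 1024            # assumed page size in bytes
SIZE_CLASSES = [24, 32, 48, 64]  # assumed block sizes; the slide shows 24 and 32


class SegregatedHeap:
    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        # One free list of (page, offset) blocks per size class.
        self.free_lists = {size: [] for size in SIZE_CLASSES}

    def size_class(self, nbytes):
        for size in SIZE_CLASSES:
            if nbytes <= size:
                return size
        raise ValueError("large objects would be segmented; not modeled here")

    def allocate(self, nbytes):
        # Rounding up to the block size wastes size - nbytes bytes:
        # this is internal fragmentation.
        size = self.size_class(nbytes)
        blocks = self.free_lists[size]
        if not blocks:
            if not self.free_pages:
                raise MemoryError("no free block or page for this size class")
            page = self.free_pages.pop()
            # Carve the page into equally sized blocks; the leftover
            # PAGE_SIZE % size bytes are page-internal fragmentation.
            blocks.extend((page, off)
                          for off in range(0, PAGE_SIZE - size + 1, size))
        return size, blocks.pop()

    def free(self, size, block):
        self.free_lists[size].append(block)


heap = SegregatedHeap(num_pages=1)
size, block = heap.allocate(20)   # served from a size-24 block
```

In this model, a later `heap.allocate(30)` fails even though hundreds of size-24 blocks are free, because the only page is committed to another size class. That is one way external fragmentation can show up, and it is what defragmentation is meant to relieve.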

Page 15: Controlling Internal and Page-Internal Fragmentation

Choose page size ($page$) and block sizes ($s_k$)

If $s_k = s_{k-1}(1 + \rho)$, internal fragmentation $\le \rho$

page-internal fragmentation $\le s_{max} / page$

E.g. if $page$ = 16K, $\rho$ = 1/8, $s_{max}$ = 2K, maximum non-external fragmentation is limited to 12.5%.
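The numbers in the example can be checked with a short calculation. The sketch below writes ρ for the growth ratio between consecutive block sizes, builds size classes for ρ = 1/8 up to a 2K maximum, and measures the worst-case internal and page-internal waste for a 16K page. The smallest block size of 16 bytes, the rounding rule, and the function names are assumptions; the slide does not specify them.

```python
from fractions import Fraction
import math


def size_classes(smallest, largest, rho):
    """Block sizes with s_k <= s_{k-1} * (1 + rho), in whole bytes."""
    sizes = [smallest]
    while sizes[-1] < largest:
        nxt = max(sizes[-1] + 1, math.floor(sizes[-1] * (1 + rho)))
        sizes.append(min(largest, nxt))
    return sizes


def worst_internal(sizes):
    """Largest wasted fraction of a block: the object is one byte too big
    for the previous size class."""
    return max(Fraction(big - (small + 1), big)
               for small, big in zip(sizes, sizes[1:]))


def worst_page_internal(sizes, page):
    """Largest fraction of a page left over after carving it into blocks."""
    return max(Fraction(page % s, page) for s in sizes)


page, rho, s_max = 16 * 1024, Fraction(1, 8), 2 * 1024
sizes = size_classes(16, s_max, rho)
print(float(worst_internal(sizes)), float(worst_page_internal(sizes, page)))
```

Both measured values stay below 1/8 = 12.5%, matching the two bounds: internal waste is limited by ρ, and page-internal waste by the largest block size divided by the page size.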

Page 16: Fragmentation - small heap (ρ = 1/8 vs. ρ = 1/2)

[Figure: stacked bars (0 to 1) per benchmark (db, jack, javac, jess, mtrt, mpegaudio, compress), one bar for ρ = 1/8 and one for ρ = 1/2; legend: Internal, Page-Internal, External, Recently Dead, Live]

Page 17: Incremental Compaction

Compact only a part of the heap
– Requires knowing what to compact ahead of time

Key Problems
– Popular objects
– Determining references to moved objects

[Figure: heap diagram; legend: used]

Page 18: Incremental Compaction: Redirection

Access all objects via per-object redirection pointers

Redirection is initially self-referential

Move an object by updating ONE redirection pointer

[Figure: an original object and its replica]
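A minimal sketch of the redirection scheme, assuming a toy object model: every object carries a `redirect` pointer that initially points to itself, every access goes through it, and moving an object amounts to copying it and updating that one pointer. The class and function names are illustrative only.

```python
class HeapObject:
    def __init__(self, fields):
        self.fields = dict(fields)
        self.redirect = self       # initially self-referential


def move(obj):
    """Copy obj and publish the replica by updating one redirection pointer."""
    replica = HeapObject(obj.fields)
    obj.redirect = replica         # the single pointer update
    return replica


def read(ref, field):
    return ref.redirect.fields[field]


def write(ref, field, value):
    ref.redirect.fields[field] = value


original = HeapObject({"v": 1})
alias = original                   # some other reference to the same object
replica = move(original)
write(alias, "v", 2)               # lands in the replica, not the original
```

Existing references such as `alias` need no immediate update: they still name the original, but every access is routed to the replica.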

Page 19: Consistency via Read Barrier [Brooks]

Correctness requires always using the replica

E.g. field selection must be modified

normal access: x[offset]

read barrier access: x[redirect][offset]

Page 20: Some Important Details

Our read barrier is decoupled from collection

Complication: In Java, any reference might be null
– actual read barrier for GetField(x,offset) must be augmented:
    tmp = x[offset];
    return (tmp == null) ? null : tmp[redirect]
– CSE, code motion (LICM and sinking), null-check combining

Barrier Variants - when to redirect
– lazy - easier for collector
– eager - better for optimization
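The two variants can be contrasted in a small sketch. It assumes one common reading of the terms: an eager barrier forwards a reference as soon as it is loaded, as in the GetField pseudocode above, while a lazy barrier forwards only at the point of use, so a loaded reference may still name the original. `Node` and both function names are hypothetical.

```python
class Node:
    def __init__(self, **fields):
        self.fields = fields       # reference fields; a value may be None
        self.redirect = self


def get_field_eager(x, name):
    """Eager: forward the loaded reference before handing it out.
    Assumes x was itself forwarded when it was loaded."""
    tmp = x.fields[name]
    return None if tmp is None else tmp.redirect


def get_field_lazy(x, name):
    """Lazy: forward x only now, at the point of use; the result is
    returned as stored and may still name a moved original."""
    return x.redirect.fields[name]


child = Node()
parent = Node(child=child, other=None)
replica = Node()
child.redirect = replica           # the collector has moved child
```

Here `get_field_eager(parent, "child")` yields the replica and handles the null field explicitly, whereas `get_field_lazy(parent, "child")` yields the stale original, which must be forwarded again when it is used.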

Page 21: Barrier Overhead to Mutator

Conventional wisdom says read barriers are too expensive
– Studies found overhead of 20-40% (Zorn, Nielsen)

Our barrier has 4-6% overhead with optimizations

[Figure: bar chart of barrier overhead (0 to 12) for compress, jess, db, javac, mpegaudio, mtrt, jack, and Geo. Mean; series: Lazy, Eager]

Page 22: Program Start

[Figure: Stack and Heap (one size only)]

Page 23: Program is allocating

[Figure: Stack and Heap; legend: free, allocated]

Page 24: GC starts

[Figure: Stack and Heap; legend: free, unmarked]

Page 25: Program allocating and GC marking

[Figure: Stack and Heap; legend: free, unmarked, marked or allocated]

Page 26: Sweeping away blocks

[Figure: Stack and Heap; legend: free, unmarked, marked or allocated]

Page 27: GC moving objects and installing redirection

[Figure: Stack and Heap; legend: free, allocated, evacuated]

Page 28: 2nd GC starts tracing and redirection fixup

[Figure: Stack and Heap; legend: free, unmarked, evacuated, marked or allocated]

Page 29: 2nd GC complete

[Figure: Stack and Heap; legend: free, allocated]
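The last steps of the animation, tracing with redirection fixup, can be sketched as follows. As the next collection marks the heap, each reference it visits is replaced by the target's redirection pointer, so evacuated originals end the cycle unreferenced and unmarked. This is a simplified, non-incremental model with assumed names (`Obj`, `trace_with_fixup`), not the collector's actual algorithm.

```python
class Obj:
    def __init__(self, **refs):
        self.refs = refs           # named reference fields
        self.redirect = self       # self-referential until the object moves
        self.marked = False


def trace_with_fixup(roots):
    """Mark everything reachable, replacing stale references on the way."""
    roots = [r.redirect for r in roots]
    work = []
    for r in roots:
        if not r.marked:
            r.marked = True
            work.append(r)
    while work:
        obj = work.pop()
        for name, target in obj.refs.items():
            if target is None:
                continue
            target = target.redirect   # follow the redirection pointer
            obj.refs[name] = target    # fix up the stale reference
            if not target.marked:
                target.marked = True
                work.append(target)
    return roots


evacuated = Obj()
replica = Obj()
evacuated.redirect = replica       # installed by the previous collection
holder = Obj(child=evacuated)
roots = trace_with_fixup([holder])
```

After the trace, `holder` points directly at the replica and the evacuated original is unmarked, so a following sweep can return its block to the free list.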

Page 30: Roadmap (same outline as Page 2)

Page 31: Scheduling the Collector

Scheduling Issues
– bad CPU utilization and space usage
– loose program and collector coupling

Time-Based
– Trigger the collector to run for CT seconds whenever the program runs for QT seconds

Work-Based
– Trigger the collector to collect CW work whenever the program allocates QW bytes
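A back-of-the-envelope model shows why the two policies behave differently. Under time-based scheduling, steady-state utilization depends only on QT and CT. Under work-based scheduling, the time the program takes to allocate QW bytes depends on its allocation rate, so utilization drops during allocation bursts. The rates `alloc_rate` and `collect_rate` and all numeric values are assumptions introduced for this sketch.

```python
def time_based_utilization(q_t, c_t):
    """Mutator runs q_t seconds, then the collector runs c_t seconds."""
    return q_t / (q_t + c_t)


def work_based_utilization(q_w, c_w, alloc_rate, collect_rate):
    """Collector does c_w units of work per q_w bytes allocated.

    alloc_rate: bytes the mutator allocates per second (assumed)
    collect_rate: units of collector work done per second (assumed)
    """
    mutator_time = q_w / alloc_rate
    collector_time = c_w / collect_rate
    return mutator_time / (mutator_time + collector_time)


# Hypothetical parameters.
steady = time_based_utilization(q_t=0.010, c_t=0.010)
slow_alloc = work_based_utilization(100, 200, alloc_rate=1_000,
                                    collect_rate=20_000)
burst_alloc = work_based_utilization(100, 200, alloc_rate=100_000,
                                     collect_rate=20_000)
print(steady, slow_alloc, burst_alloc)
```

With these made-up numbers the time-based figure stays at 0.5 regardless of allocation behavior, while the work-based figure falls from about 0.91 to about 0.09 when the allocation rate rises a hundredfold.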

Page 32: Time-Based Scheduling

Trigger the collector to run for CT seconds whenever the program runs for QT seconds

[Figure: two plots. Space (Mb, 0 to 100) vs. Time (s), with curves for Smooth Alloc, Uneven Alloc, and High Alloc. MMU (CPU Utilization, 0 to 1) vs. Window Size (s), with a single curve labeled Any.]

Page 33: Work-Based Scheduling

Trigger the collector to collect CW bytes whenever the program allocates QW bytes

[Figure: two plots. MMU (CPU Utilization, 0 to 1) vs. Window Size (s), with curves for Smooth Alloc, Uneven Alloc, and High Alloc. Space (Mb, 0 to 100) vs. Time (s), with a single curve labeled Any.]

Page 34: Roadmap (same outline as Page 2)

Page 35: Pause Time Distribution for javac (Time-Based vs. Work-Based)

[Figure: pause time distributions for the two schedulers; each panel is annotated 12 ms]

Page 36: Utilization vs. Time for javac (Time-Based vs. Work-Based)

[Figure: utilization (0 to 1.0) vs. Time (s), one panel per scheduler; one annotation reads 0.45]

Page 37: Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

Page 38: Space Usage for javac (Time-Based vs. Work-Based)

Page 39: Intrinsic Tradeoff

3 inter-related factors:
– Space Bound (tradeoff)
– Utilization (tradeoff)
– Allocation Rate (lower is better)

Other factors
– Collection rate (higher is better)
– Pointer density (lower is better)

Page 40: Summary: Mostly Non-moving RT GC

Read Barriers
– Permit incremental defragmentation
– Overhead is 4-6% with compiler optimizations

Low Space Overhead
– Space usage is only about 2 X max live data
– Fragmentation still bounded

Consistent Utilization
– Always at least 45% at 12 ms resolution

Page 41: Conclusions

Real-time GC is real

There are tradeoffs just like in traditional GC

Scheduling should be primarily time-based
– Fallback to work-based due to user’s incorrect parameter estimations

Incremental defragmentation is possible

Compiler support is important!

Page 42: Future Work

Lowering the real-time resolution
– Sub-millisecond worst-case pause
– Main issue: breaking up stack scan

Segmented array optimizations
– Reduce segmented array cost below ~2%
– Opportunistic contiguous layout
– Type-based specialization with invalidation
– Strip-mining