Garbage Collection Introduction and Overview Christian Schulte Programming Systems Lab Universität des Saarlandes, Germany [email protected]
Mar 26, 2015
Purpose of Talk
Explaining basic concepts and terminology (never to be explained again)
Garbage collection…
  …is simple
  …can be explained at a high level
Organization
Overview
What is garbage collection
  objects of interest
  principal notions
  classic examples with assumptions and properties
Discussion
  software engineering issues
  typical cost
  areas of usage
  why knowledge is profitable
Organizational
  Material
  Requirements
Garbage Collection…
…is concerned with the automatic reclamation of dynamically allocated memory after its last use by a program
Garbage collection…
Dynamically allocated memory
Last use by a program
Examples for automatic reclamation
Kinds of Memory Allocation
static int i;                   /* static allocation */
void foo(void) {
  int j;                        /* automatic allocation */
  int* p = (int*) malloc(…);    /* p points to dynamically allocated memory */
}
Static Allocation
By compiler (in the static data area)
Available through entire runtime
Fixed size
Automatic Allocation
Upon procedure call (on stack)
Available during execution of call
Fixed size
Dynamic Allocation
Dynamically allocated at runtime (on heap)
Available until explicitly deallocated
Dynamically varying size
Dynamically Allocated Memory
Also: heap-allocated memory
Allocation: malloc, new, …
  before first usage
Deallocation: free, delete, dispose, …
  after last usage
Needed for
  C++, Java: objects
  SML: datatypes, procedures
  anything that outlives procedure call
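A minimal sketch of "anything that outlives procedure call", written in C. The names make_int and use_and_release are hypothetical and only serve the illustration: the block is allocated before its first usage inside one procedure and deallocated after its last usage in another.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns heap-allocated storage that
   stays valid after make_int itself has returned. */
int* make_int(int value) {
    int* p = malloc(sizeof *p);   /* allocation: before first usage */
    if (p != NULL)
        *p = value;
    return p;                     /* the caller owns the block now */
}

/* Uses the block and releases it after its last usage. */
int use_and_release(int value) {
    int* p = make_int(value);
    if (p == NULL)
        return -1;                /* allocation failed */
    int result = *p + 1;          /* last usage of *p */
    free(p);                      /* deallocation: after last usage */
    return result;
}
```

An automatic variable could not be returned this way, as its storage ends with the call that created it.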
Getting it Wrong
Forget to free (memory leak)
  program eventually runs out of memory
  long running programs: OSs, servers, …
Free too early (dangling pointer)
  lucky: illegal access detected by OS
  horror: memory reused, in simultaneous use
    programs can behave arbitrarily
    crashes might happen much later
Estimates of effort
  Up to 40%! [Rovner, 1985]
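The "forget to free" case can be made visible with a small counting wrapper. This is only a sketch: counted_malloc, counted_free, leaky and tidy are hypothetical names, and the counter is a debugging aid, not a garbage collector.

```c
#include <assert.h>
#include <stdlib.h>

static long live_blocks = 0;      /* blocks allocated but not yet freed */

void* counted_malloc(size_t size) {
    void* p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counted_free(void* p) {
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Forgets to free: one block leaks per call. */
void leaky(void) {
    int* p = counted_malloc(sizeof *p);
    if (p != NULL)
        *p = 1;
}   /* p is gone here, but the block is still allocated */

/* Frees after the last usage: no leak. */
void tidy(void) {
    int* p = counted_malloc(sizeof *p);
    if (p != NULL)
        *p = 1;
    counted_free(p);
}
```

After a call to tidy the counter is back at its previous value, while every call to leaky leaves it one higher (assuming the allocation succeeded). The dangling-pointer case is deliberately not shown as running code, since accessing freed memory has undefined behavior in C.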
Nodes and Pointers
Node n
  Memory block, cell
Pointer p
  Link to node
  Node access: *p
Children children(n)
  set of pointers to nodes referred to by n
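One possible C rendering of these notions, as a sketch under the assumption that every node has a fixed number of child fields (MAX_CHILDREN and child_count are hypothetical names, not part of the slides):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 2            /* assumption: fixed fan-out */

typedef struct node {
    int value;                           /* payload of the memory block */
    struct node* child[MAX_CHILDREN];    /* children(n); NULL = no child */
} node;

/* Counts the non-NULL pointers in children(n). */
size_t child_count(const node* n) {
    size_t count = 0;
    for (size_t i = 0; i < MAX_CHILDREN; i++)
        if (n->child[i] != NULL)
            count++;
    return count;
}
```

With a pointer p to such a node, node access *p yields the node itself, and the child array plays the role of children(n).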
Mutator
Abstraction of program
  introduces new nodes with pointers
  redirects pointers, creating garbage
Shared Nodes
Nodes referred to by several pointers
Makes manual deallocation hard
  local decision impossible
  respect other pointers to node
Cycles
  instance of sharing
Last Use by a Program
Question: When is node M no longer used by the program?
Let P be any program not using M
New program sketch:
  Execute P;
  Use M;
Hence:
  M used iff P terminates
We are doomed: halting problem!
So “last use” is undecidable!
Safe Approximation
Decidable and also simple
What does safe mean?
  only unused nodes freed
What does approximation mean?
  some unused nodes might not be freed
Idea
  nodes that can be accessed by mutator
Reachable Nodes
Reachable from root set
  processor registers
  static variables
  automatic variables (stack)
Reachable from reachable nodes
Summary: Reachable Nodes
A node n is reachable, iff
  n is element of the root set, or
  n is element of children(m) and m is reachable
Reachable node also called “live”
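The two rules of the definition can be turned into a small worklist computation. The sketch below assumes a tiny hypothetical heap of five nodes identified by index, with at most two children each; the graph and all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NODE_COUNT 5
#define NO_CHILD (-1)

/* children(n) for a tiny hypothetical heap, as node indices.
   0 -> 1, 1 -> 2, 2 -> 0 form a cycle; 3 -> 4 is not linked from 0. */
static const int children[NODE_COUNT][2] = {
    { 1, NO_CHILD }, { 2, NO_CHILD }, { 0, NO_CHILD },
    { 4, NO_CHILD }, { NO_CHILD, NO_CHILD }
};

/* Sets reachable[n] for every node n reachable from the root set. */
void compute_reachable(const int* roots, size_t root_count,
                       bool reachable[NODE_COUNT]) {
    int worklist[NODE_COUNT];     /* each node is pushed at most once */
    size_t top = 0;

    for (size_t n = 0; n < NODE_COUNT; n++)
        reachable[n] = false;

    /* Rule 1: elements of the root set are reachable. */
    for (size_t i = 0; i < root_count; i++) {
        if (!reachable[roots[i]]) {
            reachable[roots[i]] = true;
            worklist[top++] = roots[i];
        }
    }
    /* Rule 2: children of reachable nodes are reachable. */
    while (top > 0) {
        int m = worklist[--top];
        for (size_t k = 0; k < 2; k++) {
            int n = children[m][k];
            if (n != NO_CHILD && !reachable[n]) {
                reachable[n] = true;
                worklist[top++] = n;
            }
        }
    }
}
```

With node 0 as the only root, nodes 0, 1 and 2 come out as live although they form a cycle, while nodes 3 and 4 do not, even though node 3 still points to node 4.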
MyGarbageCollector
Compute set of reachable nodes
Free nodes known to be not reachable
Known as mark-sweep
  in a second…
Reachability: Safe Approximation
Safe
  access to not reachable node impossible
  depends on language semantics
  but C/C++? later…
Approximation
  reachable node might never be accessed
  programmer must know about this!
  have you been aware of this?
Example Garbage Collectors
Mark-Sweep
Others
  Mark-Compact
  Reference Counting
  Copying
  skipped here
  read Chapters 1 and 2 of [Jones&Lins, 96]
The Mark-Sweep Collector
Compute reachable nodes: Mark
  tracing garbage collector
Free not reachable nodes: Sweep
Run when out of memory: Allocation
First used with LISP [McCarthy, 1960]
Allocation
node* new() {
  if (free_pool is empty)
    mark_sweep();
  return allocate();
}
The Garbage Collector
void mark_sweep() {
  for (r in roots)
    mark(r);
  /* all live nodes marked */
  …
Recursive Marking
void mark(node* n) {
  if (!is_marked(n)) {
    set_mark(n);
    /* i-th recursion: nodes on path with length i marked */
    for (m in children(n))
      mark(m);
  }
}
/* nodes reachable from n marked */
The Garbage Collector
void mark_sweep() {
  for (r in roots)
    mark(r);
  sweep();
  /* all nodes on heap live and not marked */
  …
Eager Sweep
void sweep() {
  node* n = heap_bottom;
  while (n < heap_top) {
    if (is_marked(n)) clear_mark(n);
    else free(n);
    /* advance by the size of one node in bytes;
       n += sizeof(*n) would skip sizeof(*n) nodes */
    n = (node*) ((char*) n + sizeof(*n));
  }
}
The Garbage Collector
void mark_sweep() {
  for (r in roots)
    mark(r);
  sweep();
  if (free_pool is empty)
    abort("Memory exhausted");
}
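The pieces above (allocation, recursive marking, eager sweep) can be combined into one compilable sketch. Several things here are assumptions made for illustration and are not taken from the slides: the heap is a static array of HEAP_SIZE fixed-size nodes, each node has two child fields, the root set is an explicit roots array, the free pool is a list linked through child[0], and new_node returns NULL where the slides abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 4               /* assumption: tiny fixed heap */
#define MAX_ROOTS 4

typedef struct node {
    bool marked;                  /* mark bit */
    bool in_use;                  /* false while on the free pool */
    struct node* child[2];        /* child fields known to the collector */
} node;

static node heap[HEAP_SIZE];      /* contiguous heap, fixed node size */
static node* free_pool = NULL;    /* free nodes linked through child[0] */
static node* roots[MAX_ROOTS];    /* explicit stand-in for the root set */
static bool initialized = false;

static void release(node* n) {    /* plays the role of free(n) */
    n->in_use = false;
    n->child[0] = free_pool;
    n->child[1] = NULL;
    free_pool = n;
}

static void mark(node* n) {
    if (n != NULL && !n->marked) {
        n->marked = true;
        for (int i = 0; i < 2; i++)
            mark(n->child[i]);
    }
}

static void sweep(void) {
    for (node* n = heap; n < heap + HEAP_SIZE; n++) {
        if (n->marked)
            n->marked = false;    /* live: clear mark for the next cycle */
        else if (n->in_use)
            release(n);           /* garbage: back to the free pool */
    }
}

static void mark_sweep(void) {
    for (int i = 0; i < MAX_ROOTS; i++)
        mark(roots[i]);
    sweep();
}

/* Returns a fresh node, or NULL when memory is exhausted. */
node* new_node(void) {
    if (!initialized) {
        for (int i = 0; i < HEAP_SIZE; i++)
            release(&heap[i]);
        initialized = true;
    }
    if (free_pool == NULL)
        mark_sweep();             /* collect only when out of memory */
    if (free_pool == NULL)
        return NULL;              /* the slides abort here instead */
    node* n = free_pool;
    free_pool = n->child[0];
    n->in_use = true;
    n->marked = false;
    n->child[0] = n->child[1] = NULL;
    return n;
}
```

Because the root set is explicit in this sketch, a node held only in a C local variable is not seen by the collector and may be reclaimed by the next collection. A real collector has to find such pointers in registers and on the stack.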
Assumptions
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known!
Assumptions: Realistic
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
Assumptions: Conservative
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
Mark-Sweep Properties
Covers cycles and sharing
Time depends on
  live nodes (mark)
  live and garbage nodes (sweep)
Computation must be stopped
  non-interruptible
  stop/start collector
  long pause
Nodes remain unchanged (as not moved)
Heap remains fragmented
Variations of Mark-Sweep
In your talk…
Implementation
In your talk…
Efficiency Analysis
In your talk…
Comparison
In your talk…
Application
In your talk…
Software Engineering Issues
Design goal in SE: decompose systems into orthogonal components
Clashes with letting each component do its own memory management
  liveness is global property
  leads to “local leaks”
  lacking power of modern gc methods
Typical Cost
Early systems (LISP)
  up to 40% [Steele,75] [Gabriel,85]
  “garbage collection is expensive” myth
Well-engineered systems of today
  10% of entire runtime [Wilson, 94]
Areas of Usage
Programming languages and systems
  Java, C#, Smalltalk, …
  SML, Lisp, Scheme, Prolog, …
  Modula 3, Microsoft .NET
Extensions
  C, C++ (Conservative)
Other systems
  Adobe Photoshop
  Unix filesystem
  Many others in [Wilson, 1996]
Understanding Garbage Collection: Benefits
Programming garbage collection
  programming systems
  operating systems
Understand systems with garbage collection (e.g. Java)
  memory requirements of programs
  performance aspects of programs
  interfacing with garbage collection (finalization)
Material
Garbage Collection. Richard Jones and Rafael Lins, John Wiley & Sons, 1996.
Uniprocessor garbage collection techniques. Paul R. Wilson, ACM Computing Surveys. To appear.
Extended version of IWMM 92, St. Malo.
Organization
Requirements
Talk
  duration 45 min (excluding discussion)
Attendance
  including discussion
Written summary
  10 pages
  to be submitted in PDF until Mar 31st, 2002
Schedule
  weekly starting Nov 14th, 2001
  next on Dec 5th, 2001
Topics For You!
The classical methods
  Copying 1. [Brunklaus, Guido Tack]
  Mark-Sweep 2. [Schulte, Hagen Böhm]
  Mark-Compact 3. [Schulte, Jens Regenberg]
  Reference Counting 6. [Brunklaus, Regis Newo]
Advanced
  Generational 4. [Brunklaus, Mirko Jerrentrup]
  Conservative (C/C++) 5. [Schulte, Stephan Lesch]
  Incremental & Concurrent 7. [Brunklaus, Uwe Kern]
Invariants
Only nodes with rc zero are freed
RC always positive
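Assuming rc stands for the reference count of a node, the two invariants might be checked in a minimal C sketch like the following. The names rc_new, rc_retain and rc_release are hypothetical, nodes carry no child pointers here, and freed_nodes exists only to observe the behavior.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rc_node {
    unsigned rc;                  /* number of pointers to this node */
    int value;
} rc_node;

static int freed_nodes = 0;       /* only for observing the sketch */

/* Creates a node with one reference, held by the caller. */
rc_node* rc_new(int value) {
    rc_node* n = malloc(sizeof *n);
    if (n != NULL) {
        n->rc = 1;
        n->value = value;
    }
    return n;
}

/* Registers one more pointer to n. */
rc_node* rc_retain(rc_node* n) {
    assert(n->rc > 0);            /* invariant: RC always positive */
    n->rc++;
    return n;
}

/* Drops one pointer to n; frees n only when rc reaches zero. */
void rc_release(rc_node* n) {
    assert(n->rc > 0);            /* invariant: RC always positive */
    if (--n->rc == 0) {
        freed_nodes++;
        free(n);                  /* invariant: only rc zero is freed */
    }
}
```

A node shared by two pointers survives the first rc_release and is freed by the second.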