Incremental Garbage Collection I
BACHELOR'S THESIS (Seminar in Software Development: Garbage
Collection)
submitted for the academic degree of
Bakkalaureus/Bakkalaurea der technischen Wissenschaften (Bachelor of Technical Sciences)
in the field of study
COMPUTER SCIENCE
Submitted by: WIRTH Christian, 0355354
Prepared at: Institut für Systemsoftware (Institute for System Software)
Supervisor: Prof. Dr. Hanspeter Mössenböck
Pasching/Linz, January 2006
Abstract--Garbage collection systems relieve programmers of the burden of managing memory manually. There is no longer any need to keep track of used and unused memory objects; the automatic collector takes over that work.
Many garbage collection algorithms have been introduced, and most of them have to suspend the main program in order to identify and release unused nodes in memory.
In this paper I describe methods that allow the garbage collector to identify garbage incrementally without having to suspend the program, letting the collector run concurrently with it.
This paper is the first part of a three-paper series on incremental garbage collection. Its main intention is to give a general overview of the motivation for, the fundamental principles of, and some basic algorithms used in this advanced type of garbage collection system.
Index Terms--Garbage collection, incremental garbage collection, tricolor marking, write barrier, read barrier, Baker's algorithm.
I. NOMENCLATURE
Several terms and definitions used in this paper deserve a separate explanation:
Mutator: This is an arbitrary program that mutates (allocates, uses, abandons) data in memory. It is the task of the (garbage) collector to find and free data in the system memory once the mutator no longer needs it, without the mutator having to request this explicitly.
Collector: Short for garbage collector. This is the process that, using techniques described in this paper, identifies memory no longer used by the mutator and releases it.
Floating Garbage: This term is used for unreferenced objects that the algorithm has not (yet) recognized as garbage. Usually it takes another run of the garbage collector for floating garbage to be recognized as such.
Conservativity: The conservativity of an algorithm describes how cautious it is about freeing potential garbage. The more conservative an algorithm is, the higher the chance that garbage is not recognized immediately and thus not freed (during the current run). This is independent of the correctness of the algorithm. Even the least conservative algorithms still have to be correct, meaning they may free only real garbage.
Fromspace, Tospace: Copying algorithms (like Baker's algorithm) use two separate memory blocks. During one run of the collector, all used objects are copied from the currently used block (Fromspace) to the other one (Tospace). All unreferenced objects remain in Fromspace and are freed when the whole block is freed.
Flip: When using a copying collector with a Fromspace/Tospace model, flip describes the act of swapping Tospace with Fromspace. This is done at the start of each run of the collector. After a flip all objects are in Fromspace, and only those that are at least indirectly reachable from the root set will be copied to Tospace and thus survive the current run.
Root set: The root set is an (abstract) set of pointers. Objects on the heap are only visible to the program if they are (at least indirectly) referenced by a pointer that is available to the program. The root set consists of all those pointers that the program can access directly, without dereferencing any other pointer. This includes global variables, the stack and the registers.
II. INTRODUCTION
Incremental garbage collection is a method that allows the garbage collection run to take place simultaneously with the mutator process(es). In doing so, the collector has to take care of changes made by the mutator.
Garbage collection in general provides automatic memory management. Without the assistance of such a system, the application programmer has to keep track of all the memory he uses. He has to return unused objects to the operating system so that his program's memory consumption does not rise perpetually. This causes considerable extra work when writing applications, while the programmer should rather concentrate on implementing the system he needs.
Incremental Garbage Collection: The Basic Algorithms
Christian Wirth
The first automatic garbage collection system was introduced in 1960 by John McCarthy, the inventor of the programming language LISP. From then on, programmers using such a system could concentrate on implementing their algorithms, freed from laborious memory management.
A. Drawbacks of simple algorithms
Traditional, simple garbage collection methods have one major drawback. They usually have to stop the mutator process in order to get a consistent view of the state of the process. They are thus often called stop-the-world collectors. Only once the garbage collector has finished its run may the mutator proceed to manipulate the data in memory. If it did so while the collector was still running, the two could interfere badly with each other, resulting in memory being released that was still in use.
Incremental garbage collection tries to solve this problem. It enables the mutator to continue its work even during the run of the garbage collector. In most cases the mutator still has to be suspended for a short period of time, but it may then proceed while the collector is doing its job.
The collector can either be implemented as a separate thread or process, or it can be embedded into the mutator process and incrementally do a few steps of work every now and then. It is even possible for the collector to execute on a completely separate processor in a multiprocessor environment, only sharing the same memory. This, however, increases the need for synchronization between the collector and the mutators.
This paper gives an overview of the motivation for using an incremental garbage collector. It presents some of the basic algorithms, especially methods to adapt the three basic garbage collection algorithms (mark-sweep, copying collectors and reference counting) for incremental or concurrent use, and it discusses the advantages and disadvantages of this approach.
Based on those fundamentals, my colleagues Schatz [11] and Würthinger [12] address some special features and more sophisticated algorithms in further detail in their separate papers.
III. GENERAL ADVANTAGES AND DISADVANTAGES
It is hard to state the pros and cons exactly for a big and diverse class of algorithms like incremental garbage collectors. Depending on the actual implementation, the hardware they run on and other factors, different advantages and sources of trouble are inherent in them.
As their purpose suggests, their biggest general advantage is that they allow the mutator to continue its computations smoothly while the collector does its work at virtually the same time. The algorithms try to minimize the length of the pauses forced on the mutator. This is especially important for interactive systems and other real-time systems that rely heavily on low latency.
However, those algorithms tend to be slightly more expensive overall than others. This is due to the extra overhead of keeping track of the changes the mutator makes while the collector is already working on the mutator's data. With simple algorithms the collector is the only one manipulating data during its run; with incremental techniques this is no longer true. Additional routines have to be used to guarantee safe but fast synchronization. This can easily get quite tricky, for example when several processors have to access the same memory space concurrently.
IV. INCREMENTAL VERSUS CONCURRENT GARBAGE COLLECTION
The terms incremental, parallel and concurrent are sometimes
inaccurately used synonymously to describe incremental garbage
collection.
A. Incremental or parallel garbage collection
To be precise, incremental or parallel garbage collection describes any collecting system that runs in parallel with the mutator. It is the general term for the class of algorithms described in this paper, hence its title.
One possible method to implement such a system is to interweave the collector directly with the mutator. A commonly used method, for example, is to do a few steps of the collection task every time new memory is requested. This can be done by inserting a few lines of code before every malloc (in C) or new (in Java or C++). Whenever such a memory request is made, the collector does a few steps of its job before actually serving the mutator's request for new memory. An algorithm using this method is Baker's copying collector, which will be explained in more detail later.
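The allocation-driven interleaving just described can be illustrated with a minimal Python sketch. It assumes a toy heap in which objects are integer ids and `refs` maps each object to the objects it references; the class `IncrementalHeap` and the parameter `work_per_alloc` are hypothetical names chosen for illustration. The sketch only performs marking (no sweeping or copying) and deliberately omits the barriers discussed later, so on its own it is not safe against mutator changes.

```python
from collections import deque


class IncrementalHeap:
    """Toy heap whose allocator does a bounded amount of marking work per request.

    Hypothetical model: objects are integer ids and refs[obj] lists the ids obj
    references. Only marking is modelled; no sweep, copying or barriers.
    """

    def __init__(self, work_per_alloc=2):
        self.refs = {}             # object id -> list of referenced object ids
        self.roots = set()
        self.marked = set()        # objects found reachable in the current cycle
        self.grey = deque()        # found but not yet scanned
        self.marking = False
        self.work_per_alloc = work_per_alloc
        self.next_id = 0

    def start_cycle(self):
        self.marked = set(self.roots)
        self.grey = deque(self.roots)
        self.marking = True

    def collector_step(self):
        """Scan one pending object; return False once marking has finished."""
        if not self.grey:
            self.marking = False
            return False
        obj = self.grey.popleft()
        for child in self.refs[obj]:
            if child not in self.marked:
                self.marked.add(child)
                self.grey.append(child)
        return True

    def new(self):
        # Do a few steps of collector work before serving the request.
        if self.marking:
            for _ in range(self.work_per_alloc):
                if not self.collector_step():
                    break
        obj = self.next_id
        self.next_id += 1
        self.refs[obj] = []
        if self.marking:
            self.marked.add(obj)   # one possible policy: new objects survive this cycle
        return obj
```

The constant `work_per_alloc` trades pause length against cycle length: a larger value finishes marking after fewer allocations but makes each allocation slower.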
B. Concurrent garbage collection
Concurrent garbage collection, on the other hand, is just another way to implement an incremental collector. It describes a garbage collector that is executed in a separate process or thread. The collector could use time-sharing techniques to run quasi-parallel to the mutator. In a multiprocessor environment it can even be executed on another processor. Especially when running on a separate processor, it is truly able to fulfil its task concurrently with the mutator thread.
Note that the authors in the literature do not always follow this proposed use of the three terms. It does, however, seem to be the most logical one and also the one used most often.
V. INCREMENTAL MARK AND SWEEP
A well-known algorithm for garbage collection is the mark and sweep scheme. The following paragraphs analyze how this method can be used for incremental garbage collection.
A. Tricolor marking
A basic technique to identify the garbage during the marking run is the so-called tricolor marking. For simple, non-incremental mark and sweep methods a two-color marking scheme usually suffices. The need to identify mutator changes during the marking run makes it necessary to use three colors here.
The collector starts with the root set. Following the pointers from there, it visits the nodes in the heap recursively. In doing so it eventually visits all nodes in memory that the mutator can access. Every node carries one of the three colors black, grey and white, according to the following rules:
Black: The node and all of its direct descendants (meaning all the objects it directly references) have been visited. The node has been fully processed and does not need to be visited again during that run.
Grey: The node has already been found by the marking run but has not yet been processed. It needs to be visited again before the tricolor marking is complete.
White: The node has not been visited yet. If it stays white until the end of the marking run, it cannot be reached by the mutator and can thus be considered garbage.
Initially, all nodes are white. For each referenced node the collector finds, it has to decide how to color it. A node is colored black once its treatment is completed, that is, when it no longer references white nodes because all its direct descendants are at least grey. Until then, the found but not yet treated node is colored grey, marking it as reachable but not yet finished.
In an abstract view of this process, the collector pushes a wave-front of grey nodes through memory, turning white nodes into black ones. At the end of this process, that is, when there are no grey nodes left, all nodes are either black or white. The collector has visited all nodes the mutator can possibly reach and has marked them black. The remaining white nodes can be considered garbage since they cannot be reached by the mutator.
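The following minimal Python sketch shows this stop-the-world tricolor marking. It assumes the heap is given as a dictionary `refs` that maps each node to the list of nodes it references, which is a hypothetical representation chosen for illustration. It stores an explicit color per node (the two-color-bit variant described below) and finds the next grey node by scanning all nodes, which is simple but slow.

```python
from enum import Enum


class Color(Enum):
    WHITE = "white"   # not visited yet
    GREY = "grey"     # found, but children not yet examined
    BLACK = "black"   # node and all direct children visited


def tricolor_mark(refs, roots):
    """Stop-the-world tricolor marking over refs: node -> list of children."""
    color = {node: Color.WHITE for node in refs}   # initially all nodes are white
    for root in roots:
        color[root] = Color.GREY
    while True:
        # Find any grey node by scanning the whole heap (simple but slow).
        grey = next((n for n, c in color.items() if c is Color.GREY), None)
        if grey is None:
            break                                   # the wave-front has vanished
        for child in refs[grey]:
            if color[child] is Color.WHITE:
                color[child] = Color.GREY           # advance the wave-front
        color[grey] = Color.BLACK                   # all children are at least grey now
    return color
```

With `refs = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"], "E": ["F"], "F": ["E"]}` and root `"A"`, the nodes A, B and C end up black, while D, E and F stay white and are garbage, even though E and F reference each other.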
This algorithm was first introduced by E. W. Dijkstra [7] to describe incremental garbage collection. In a static system the algorithm is quite easy to implement: first pause the mutator, then let the collector perform its tricolor marking run. Once all the garbage has been identified, it is removed. Only then may the mutator proceed with its work.
Obviously, such a long interruption of the mutator cannot always be tolerated. With incremental garbage collection this pause is eliminated or at least significantly shortened.
B. Implementation of tricolor marking
There are two possibilities to implement tricolor marking.
The obvious first one is to add two color bits to every object.
The other possibility is to use a mark bit for each node and a mark stack. Unmarked nodes are considered white. Marked nodes are black unless they are on the mark stack, in which case they are grey. This method makes it easier to find the next grey node during the marking run.
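The mark-bit variant can be sketched in Python as follows. This is a minimal sketch assuming a hypothetical `refs` dictionary that maps each node to the list of nodes it references; a Python set stands in for the per-object mark bit. Popping the stack yields the next grey node in constant time instead of searching the whole heap for one.

```python
def mark_with_stack(refs, roots):
    """Tricolor marking with one mark bit per node plus an explicit mark stack.

    white = unmarked; grey = marked and on the stack; black = marked, off the stack.
    A Python set stands in for the per-object mark bits.
    """
    marked = set()
    stack = []
    for root in roots:
        if root not in marked:
            marked.add(root)
            stack.append(root)          # root becomes grey
    while stack:
        node = stack.pop()              # next grey node in O(1); it turns black
        for child in refs[node]:
            if child not in marked:     # white child
                marked.add(child)
                stack.append(child)     # child becomes grey
    return marked
```

Everything not in the returned set is white and can be swept.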
C. Mutator changes during the collecting run
The described marking algorithm can be used in a system that allows the mutator to continue changing memory. A few additional provisions have to be made, though. Consider the following example.
Fig. 1 Possible failure of tricolor marking system
Of the three nodes (A, B, C), the objects A and B are in the root set, and B points to C. Now the collector starts its run and finds A. Since A has no descendants, it is colored black. Unfortunately, the collector is then stopped in favour of the mutator, which edits both A and B so that now only A points to C. Only after that may the collector proceed, and it visits the only node left in the root set, B. Since B no longer has any descendants, it is marked black, too.
C has never been visited, so it is still white. There are neither grey nodes nor unvisited links from the root set left, so the marking run is finished. Because of its color, C is considered garbage. That is wrong, of course, since it is still referenced by A.
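The interleaving of Fig. 1 can be replayed in a few lines of Python to make the failure concrete. This is an illustrative sketch only: the node names follow the example, the function `fig1_without_barrier` is hypothetical, and the collector and mutator steps run in a fixed order rather than as real threads.

```python
def fig1_without_barrier():
    """Replay the interleaving of Fig. 1 with no barrier; return colors and refs."""
    refs = {"A": [], "B": ["C"], "C": []}   # A and B are roots, B -> C
    color = {node: "white" for node in refs}

    def scan(node):
        # Collector visits a node: shade white children grey, then blacken it.
        for child in refs[node]:
            if color[child] == "white":
                color[child] = "grey"
        color[node] = "black"

    scan("A")              # collector: A has no children, so A turns black
    refs["A"].append("C")  # mutator: A now points to C ...
    refs["B"].remove("C")  # ... and B no longer does (no barrier intervenes)
    scan("B")              # collector resumes: B has no children, turns black
    # No grey nodes remain, so the marking run ends here.
    return color, refs
```

After the call, C is still white although `refs["A"]` contains it, so a subsequent sweep would free a live object.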
The cause of trouble in this example is that the mutator was allowed to change a black node (A). Concretely, during the marking run the mutator must not set up a pointer from a black node to a white node that is not referenced from anywhere else. The example demonstrates this: C is referenced only from the black node A. There is no way C will be visited during the current run, since C's only parent node has already been visited. As a result, C is wrongly treated as garbage.
No other change the mutator could make can harm the system. The worst that can happen due to a change is that a node that has become garbage is overlooked in the current run. This floating garbage has to wait for the next run to be treated correctly. How often this happens depends on the conservativity of the algorithm.
To express it formally, two preconditions must both hold at some point during the marking phase to cause trouble:
1. A pointer to a white node is written into a black one.
2. That link is the only one referencing the white node.
If both hold for some (white) node, this node will incorrectly be treated as garbage: since no other node references it, the collector could only reach it via the black node. The collector does not do so because it does not visit the black node or its children again. So the node stays white, which marks it as garbage.
By preventing (at least) one of the two conditions above, an algorithm can ensure that the system does not fail. This is usually done with a barrier that controls the mutator's access to memory.
VI. BARRIER METHODS
Two methods exist to prevent the mutator from linking a white node from a black one. Read barriers supervise read access to nodes, while write barriers do so for write access. The mutator may access the nodes only once the barrier has ensured that the mutator cannot possibly set up a harmful link from a black to a white node. The code of the barrier is usually placed in the mutator itself, for example around every pointer access (and, for some algorithms, in the memory allocation routines).
A. Read barrier
A read barrier ensures that the mutator never sees a white node. Before the barrier allows the mutator to access a white node, the node is treated by the collector. Painting it grey, for example, ensures that the node will be visited again later. That way no node that is still referenced will be treated as garbage.
Depending on the algorithm's policy, the barrier may color such nodes grey or visit them immediately (and possibly their descendants, too), coloring them black, and only then give the mutator access to the node. In a Tospace/Fromspace model, the barrier ensures that the mutator only accesses nodes in Tospace.
A read barrier is usually implemented by inserting a few lines of code before each pointer read instruction. Depending on the number of inserted instructions and on the proportion of pointer reads among all instructions, this can significantly increase the length of the code. It also slows down execution. Methods to handle this overhead are shown later.
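A minimal Python sketch of a read barrier for incremental tricolor marking follows, assuming the mutator obtains pointers only through the hypothetical `read(node, index)` method and that `refs` maps each node to a list of children. It is not Baker's copying collector; it only shows the non-copying "paint it grey" policy described above.

```python
class ReadBarrierHeap:
    """Incremental tricolor marking protected by a read barrier (toy model).

    refs maps node -> list of children; the mutator obtains pointers only via
    read(), so it never holds a white node.
    """

    def __init__(self, refs, roots):
        self.refs = refs
        self.color = {node: "white" for node in refs}
        self.grey = []
        for root in roots:
            self.shade(root)

    def shade(self, node):
        if self.color[node] == "white":
            self.color[node] = "grey"
            self.grey.append(node)

    def read(self, node, index):
        """Barrier on a pointer load: shade the target before handing it out."""
        target = self.refs[node][index]
        self.shade(target)
        return target

    def collector_step(self):
        """Blacken one grey node; return False when marking is complete."""
        if not self.grey:
            return False
        node = self.grey.pop()
        for child in self.refs[node]:
            self.shade(child)
        self.color[node] = "black"
        return True
```

Replaying the interleaving of Fig. 1 against this class, the mutator has to read C out of B before it can store C into A, and that read already shades C grey, so C survives the run.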
B. Write barrier
A write barrier records when the mutator writes changes to an object. Whenever the mutator tries to create a link from a black to a white node, this operation is supervised. One or possibly even both nodes (parent and child) have to be visited, or marked to be visited again later in this run.
To implement a write barrier, some instructions have to be added to each pointer write operation in the mutator's code. Whenever a pointer is changed, one of the affected objects is also colored grey; the order of these two steps depends on the algorithm, as discussed for Dijkstra's algorithm below.
There are two types of write barriers.
1) Snapshot-at-the-beginning
As the name indicates, these algorithms conceptually take a snapshot of the state of the heap at the beginning of the marking run. Whenever the barrier detects a write operation, the originally referenced and now possibly unlinked object is colored grey.
In practice, this is usually done by coloring the original target of a pointer grey before the pointer is changed to its new target. Creating a copy-on-write virtual copy of the active data structure at the beginning of the collection cycle would be too expensive and is not used for this. However, Furusou's algorithm, which will be discussed later, makes use of such a technique.
Note that this kind of algorithm does not prohibit generating black-to-white links. Instead, it guarantees that the second precondition for a failure never holds. The old link to the object is not simply lost: it is used to color the object grey, which guarantees that the object is not treated as garbage during this run of the collector.
As a consequence, those algorithms are very conservative. The unlinked object is kept no matter whether it is garbage or not. Likewise, objects that are created during the marking phase are colored black, even though it is quite probable that such objects are relinquished before the end of the current cycle. This method is presented in further detail when Yuasa's algorithm is discussed.
2) Incremental-update
Algorithms of this type record potentially harmful pointer changes and color one of the two involved objects grey, depending on the actual algorithm. They are much less conservative than snapshot-at-the-beginning algorithms because they leave the object that lost its reference white. That node can be reclaimed in the same run of the collector if no other non-garbage node points to it.
The algorithm's conservativeness also depends on which node it shades grey. It can choose either the black parent node or the new white child node. Choosing the child node is much more conservative, because that node can then no longer become garbage during that run. If the collector shaded the parent node grey instead, the mutator could still unlink the (possibly still white) child node, allowing this node to be removed during that run.
Barriers in general can be implemented in either software or hardware. Earlier hardware architectures like Symbolics, Explorer and SPUR provided hardware support, but modern general-purpose machines usually do not. Software read barriers are usually quite expensive. They increase the size of the generated code, slow down execution and complicate caching techniques.
VII. EXAMPLE MARK-SWEEP COLLECTORS
Copying collectors have to protect the mutator from the changes made by copying the objects. This is usually done with the help of a read barrier. This method is quite expensive, though, and is rarely used for non-copying collectors, which rather use write barriers to synchronize the mutator and the collector. Probably the best-known non-copying collectors are those of the mark-sweep type.
In this section some common mark-sweep algorithms are compared. They all use an incremental-update write-barrier method, except for Yuasa's algorithm, which uses a snapshot-at-the-beginning technique:
1. Steele's Multiprocessing, Compactifying algorithm
2. Dijkstra's On-the-Fly collector
3. Kung and Song's four-color algorithm
4. Yuasa's sequential algorithm
The following example operation is used to explain how they work.
Fig. 2 Mutator updates pointer from B to C
The mutator updates A so that it points to C instead of B. It is left open whether there is another pointer to B. No matter which other objects point to B and C, both have to be treated correctly (as garbage or not). Since C existed before and was accessible to the mutator, there was at least one link to it before the update operation.
A. Yuasa's algorithm
Yuasa's algorithm is a snapshot-at-the-beginning algorithm using a write barrier. The tricolor marking is implemented using a mark bit and a stack.
shade(P) =
    if not marked(P)
        mark_bit(P) = marked
        gcpush(P, mark_stack)

Update(A, C) =
    if phase == mark_phase
        shade(*A)
    *A = C
Algorithm 1 Yuasa's write-barrier update operation
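Algorithm 1 can be turned into a runnable Python sketch. It assumes a pointer slot `*A` is modelled as a named field of an object, so `refs` maps each node to a dictionary of field names and targets (or `None`). The class `YuasaHeap` and its method names are hypothetical, and a set stands in for the mark bits.

```python
class YuasaHeap:
    """Snapshot-at-the-beginning marking with Yuasa's write barrier (toy model).

    refs maps node -> {field name: target node or None}. A set stands in for
    the mark bits; mark_stack holds the grey nodes.
    """

    def __init__(self, refs):
        self.refs = refs
        self.marked = set()
        self.mark_stack = []
        self.phase = "idle"

    def shade(self, p):
        if p is not None and p not in self.marked:
            self.marked.add(p)
            self.mark_stack.append(p)

    def update(self, a, field, c):
        """Algorithm 1: shade the *old* target, then overwrite the pointer."""
        if self.phase == "mark":
            self.shade(self.refs[a][field])
        self.refs[a][field] = c

    def start_marking(self, roots):
        self.phase = "mark"
        for root in roots:
            self.shade(root)

    def mark_step(self):
        """Scan one grey node; return False when marking is complete."""
        if not self.mark_stack:
            self.phase = "sweep"
            return False
        node = self.mark_stack.pop()
        for target in self.refs[node].values():
            self.shade(target)
        return True
```

Replaying Fig. 1, the black-to-white store into A goes through unhindered, but removing B's pointer shades C via its old parent. In the Fig. 2 situation, B is kept as floating garbage even if A held its only reference.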
During the marking phase the barrier traps pointer updates and shades the object that was pointed to before (B in the example) grey. As a result, that object is kept regardless of whether the unlink operation made it garbage, which makes the algorithm a very conservative one. Likewise, newly allocated objects are colored grey immediately. Chances are high that a new object does not survive even a single run of the collector, since many allocated objects are used only temporarily and are not stored for a longer time. Nevertheless, such an object can become garbage and be released at the end of the next run at the earliest.
Fig. 3 Yuasa's snapshot write barrier
What might look strange at first is the fact that Yuasa's algorithm allows a black-to-white link (such as the one from A to C in the example). Nevertheless, it ensures that only garbage will be removed. Unlike most other algorithms, it does not rely on averting the first but the second of the preconditions we defined as necessary for a problem to arise. While it allows black-to-white links (first precondition), it ensures that every time an object is unlinked, possibly losing its last link, this object is colored grey. This prevents the second precondition by using the old link to the node to color it and thus ensure its survival.
That the black-to-white link is set up is no problem either. As the figure shows, there has to be another way to reach C, since only then can the mutator set up a link from A to C. C will obviously not be visited via A, but it should be visited via the other link. Yuasa's algorithm guarantees this: if that other link is destroyed too, C is shaded grey via its former parent node, just as B was.
That way C will always either be reached via the other node referencing it, or be colored grey when that link is deleted, which also ensures it is not treated as garbage.
This behaviour results from the snapshot-at-the-beginning type of algorithm. A static view of the memory is assumed, and all nodes reachable in this static image are assumed not to be garbage. Even if they are unlinked (like B), they will not be released, since they are reachable in the image. Such nodes have to wait as floating garbage for the next run, in which they are no longer linked in the original image and will be released. The black-to-white link is no problem here because there has to be another way to reach the white node. If this other link is deleted, the object is colored grey, so it is not lost.
B. Dijkstra's algorithm
This is the most conservative of the incremental-update algorithms. It was introduced by Dijkstra, Lamport, et al. in 1976 and 1978.
It colors white cells grey whenever a reference to them is created, no matter what color the parent node has. That grey is used and not black is obvious: the white node could have white children, so coloring it black could violate the rule that there must be no black-to-white pointers. Unlike in Yuasa's algorithm, however, white cells that are unlinked can be released in the same run.
Fig. 4 Dijkstra's write-barrier
Dijkstra's main concern when researching this algorithm was its correctness. He concentrated on a correct design that is easy to prove. That accounted for some of the features that lead to its conservativeness. For example, new objects are marked black or grey but never white.
An interesting and very important detail in the implementation of this algorithm's write barrier is that when a pointer is updated, the pointer has to be changed first, and only after that may the new target be shaded grey. At first sight this seems the wrong way round. It temporarily creates a link from a black to a white node, which is supposed to be prohibited for this algorithm. Nevertheless, it is the correct way of treating these operations, as we will soon discuss.
shade(P) =
    if white(P)
        color(P) = grey

Update(A, C) =
    *A = C
    shade(C)
Algorithm 2 Dijkstra's write-barrier update operation
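A runnable Python sketch of Algorithm 2 follows, under the same hypothetical modelling assumptions: a pointer slot is a named field in `refs[node]`, colors are stored explicitly, and the collector searches the heap for grey nodes. `DijkstraHeap` and its method names are illustrative only, and the single-threaded sketch does not model the atomicity requirement discussed next.

```python
class DijkstraHeap:
    """Incremental-update marking with Dijkstra's write barrier (toy model).

    refs maps node -> {field name: target node or None}; colors are stored
    explicitly, and the collector searches the heap for grey nodes.
    """

    def __init__(self, refs):
        self.refs = refs
        self.color = {node: "white" for node in refs}

    def shade(self, p):
        if p is not None and self.color[p] == "white":
            self.color[p] = "grey"

    def update(self, a, field, c):
        """Algorithm 2: store the pointer first, then shade the new target."""
        self.refs[a][field] = c
        self.shade(c)

    def start_marking(self, roots):
        for root in roots:
            self.shade(root)

    def mark_step(self):
        """Blacken one grey node; return False when none is left."""
        for node, col in self.color.items():
            if col == "grey":
                for target in self.refs[node].values():
                    self.shade(target)
                self.color[node] = "black"
                return True
        return False
```

In the Fig. 1 interleaving, the store into the already black A immediately shades C, so C survives. In the Fig. 2 situation, B stays white and can be reclaimed in the same run if A held its only reference.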
There is no need to lock during the Update operation. The system just has to ensure that each of the two lines of code executes as an atomic operation that cannot be interrupted halfway.
Now back to the order of the operations. If the order of assignment and shading were swapped, a permanent black-to-white link could be created. Assume the mutator updates an already black A to point to the still white C. The write barrier would trap this operation and shade C grey first. Suppose the mutator is suspended right then and the collector is allowed to proceed. It reaches C (because it is grey) and treats it correctly.
Once the collector has finished its run, it starts a new one, reaches A and marks it black, while C is not visited via A since it is not yet a descendant of A. Only now is the mutator scheduled to proceed and set up the link. This results in a black-to-white link from A to C! If C is unlucky enough not to have any other parent node referencing it, it will be treated as garbage although it is not.
This is why the counter-intuitive order of instructions in the update operation is correct. With the correct order, it suffices that both instructions are atomic operations; there is no need to lock the whole block.
Since the parent node is left black and only the descendant is colored grey, the child will be kept even if the link is broken again during the run. This makes the algorithm even more conservative.
1) Proof of correctness
As mentioned above, the main concern of Dijkstra and his colleagues was to find a correctly working algorithm, without caring so much about efficiency. It took them quite some effort to succeed in this task. Dijkstra is quoted as saying:
"Our exercise has not only been very instructive, but at times even humiliating, as we have fallen into nearly every logical trap possible . . . It was only too easy to design what looked sometimes even for weeks and to many people like a perfectly valid solution, until the effort to prove it correct revealed a (sometimes deep) bug."
When they finally succeeded, they had proven that their algorithm fulfils two criteria:
Safety: No accessible node is ever appended to the free list.
Liveness: Every garbage node is eventually collected.
Both assertions have to be true for any correctly working garbage collection system.
What Dijkstra and his colleagues still had to prove manually has since been automated. Russinoff [4] used an automatic proof system to prove the correctness of an algorithm similar to Dijkstra's: Ben-Ari's incremental garbage collection algorithm, which is derived from Dijkstra's but uses only two colors (black and white), making it even simpler to prove.
C. Steele's algorithm
Unlike Dijkstra, Steele was mainly concerned with efficiency. His variant of an incremental-update algorithm is less conservative.
Its major difference compared to Dijkstra's algorithm is that it shades A grey instead of C, leaving the child object white for the moment. That way C gets a chance to be unlinked by the mutator and become garbage even during the same run of the collector. The tricolor marking is implemented with a mark bit and a stack, as in Yuasa's algorithm.
Fig. 5 Steele's write-barrier
shade(P) =
    mark_bit(P) = unmarked
    gcpush(P)

Update(A, C) =
    LOCK gcstate
    *A = C
    if phase == marking_phase
        if marked(A) and unmarked(C)
            shade(A)
    UNLOCK gcstate
Algorithm 3 Steele's write-barrier update operation
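Algorithm 3 can be sketched in Python as follows. This is a hypothetical single-threaded model: pointer slots are named fields in `refs[node]`, a set of black nodes stands in for the mark bits, a node counts as grey while it is on `mark_stack`, and `threading.Lock` plays the role of `LOCK gcstate`. `SteeleHeap` and its method names are illustrative, not Steele's original code.

```python
import threading


class SteeleHeap:
    """Incremental-update marking with Steele's write barrier (toy model).

    refs maps node -> {field name: target node or None}. A set stands in for
    the mark bits; a node is grey while it sits on mark_stack.
    """

    def __init__(self, refs):
        self.refs = refs
        self.marked = set()
        self.mark_stack = []
        self.phase = "idle"
        self.gcstate = threading.Lock()   # plays the role of LOCK gcstate

    def shade(self, p):
        # Steele un-marks the node and pushes it, so it will be scanned again.
        self.marked.discard(p)
        self.mark_stack.append(p)

    def update(self, a, field, c):
        """Algorithm 3: re-shade the black *parent* if it now points to an unmarked child."""
        with self.gcstate:
            self.refs[a][field] = c
            # An unmarked child may also be grey here; shading A is then merely redundant.
            if self.phase == "mark" and c is not None:
                if a in self.marked and c not in self.marked:
                    self.shade(a)

    def start_marking(self, roots):
        self.phase = "mark"
        self.mark_stack.extend(roots)

    def mark_step(self):
        """Pop one grey node and scan it; return False when marking is done."""
        with self.gcstate:
            if not self.mark_stack:
                self.phase = "sweep"
                return False
            node = self.mark_stack.pop()
            if node not in self.marked:          # skip duplicates already scanned
                self.marked.add(node)
                for child in self.refs[node].values():
                    if child is not None and child not in self.marked:
                        self.mark_stack.append(child)
            return True
```

Because the child stays white until A is rescanned, a mutator that unlinks C again before then lets C be reclaimed in the same run, which Dijkstra's barrier would not allow.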
By coloring the parent node rather than the child node, the algorithm makes the grey wave-front retreat instead of advancing it, as Dijkstra's algorithm does. It is also more selective, as it only shades black parents of white children. In all other cases no shading is needed: if the child is not white, it has already been visited or marked grey, and if the parent is not yet black, it will still be scanned later (if it is reachable at all), so the child will be found in either case. While this leads to some extra tests and accesses to A, it reduces the amount of floating garbage at the end of the collection cycle.
Unlike Dijkstra's, Steele's algorithm needs a lock to ensure that the collector does not finish its marking phase while the mutator is working inside a barrier, which could otherwise corrupt the system.
D. Kung and Song's algorithm
This algorithm is basically an improved version of Dijkstra's. Kung and Song paint free objects with a fourth color, off-white, when they are released in the sweep phase, unlike Dijkstra, who colors free nodes white. In doing so they get rid of some of the trouble Dijkstra's algorithm has in distinguishing between free nodes and nodes marked white.
Another difference is that they use a deque (double-ended queue) instead of a mark stack. This reduces the need for critical sections in their concurrent implementation.
New() =
    temp = allocate()
    if phase == mark_phase
        color(temp) = black
    return temp

shade(P) =
    if white(P) or off-white(P)
        color(P) = grey
        gcpush(P, mutator_end_of_queue)

Update(A, C) =
    *A = C
    if phase == mark_phase
        shade(C)
Algorithm 4 Kung and Song mutator code
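The mutator code of Algorithm 4, together with a matching collector side, can be sketched in Python. This is a single-threaded toy model under stated assumptions: cells are numbered `0..size-1`, pointer slots are named fields in `refs[cell]`, the left end of a `collections.deque` is used by the mutator and the right end by the collector, and `KungSongHeap`, `mark_step` and `sweep` are hypothetical names. Real concurrency and root-stack treatment are not modelled.

```python
from collections import deque

WHITE, OFF_WHITE, GREY, BLACK = "white", "off-white", "grey", "black"


class KungSongHeap:
    """Four-color marking after Kung and Song (toy, single-threaded model).

    Free cells are off-white, so they cannot be confused with white cells.
    The deque's left end is used by the mutator, its right end by the collector.
    """

    def __init__(self, size):
        self.refs = {n: {} for n in range(size)}
        self.color = {n: OFF_WHITE for n in range(size)}   # every cell starts free
        self.queue = deque()
        self.phase = "idle"

    def allocate(self):
        for n, col in self.color.items():
            if col == OFF_WHITE:
                self.color[n] = WHITE
                self.refs[n] = {}
                return n
        raise MemoryError("no free cell left")

    def new(self):
        temp = self.allocate()
        if self.phase == "mark":
            self.color[temp] = BLACK      # as in Algorithm 4
        return temp

    def shade(self, p):
        if p is not None and self.color[p] in (WHITE, OFF_WHITE):
            self.color[p] = GREY
            self.queue.appendleft(p)      # mutator end of the deque

    def update(self, a, field, c):
        self.refs[a][field] = c
        if self.phase == "mark":
            self.shade(c)

    def mark_step(self):
        """Collector: scan one grey cell taken from its own end of the deque."""
        if not self.queue:
            return False
        node = self.queue.pop()
        for child in self.refs[node].values():
            if child is not None and self.color[child] == WHITE:
                self.color[child] = GREY
                self.queue.append(child)  # collector end of the deque
        self.color[node] = BLACK
        return True

    def sweep(self):
        for n, col in self.color.items():
            if col == WHITE:
                self.color[n] = OFF_WHITE  # reclaimed cells become off-white
            elif col == BLACK:
                self.color[n] = WHITE      # survivors are reset for the next cycle
        self.phase = "idle"
```

A cell allocated during marking comes out black and therefore survives the cycle even if nothing references it, which illustrates the conservativeness of allocating new cells black discussed in the next section.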
VIII. TREATMENT OF NEW OBJECTS
The treatment of objects allocated during the run of the collector has a major influence on the conservativity of an algorithm. New cells have a high chance of dying very soon after being allocated.
Of course, this depends on the mutator's behaviour, especially on the task the mutator has to fulfil and how it is implemented. Ideally, such allocated-and-released nodes should be freed during the same run of the collector, without requiring too much additional computation time to identify them.
There are two possibilities for treating new cells. The conservative method is to color them grey, or even black by visiting them immediately. This way they cannot be reclaimed in the same run of the collector. Example algorithms using this method are Yuasa's and Dijkstra's. Yuasa's is slightly less conservative, as it allows nodes that are allocated and relinquished outside a marking phase to be reclaimed during the next one.
The other possibility is to allocate new cells white. This has the advantage that objects can be allocated, relinquished and reclaimed in the same cycle, which reduces the overhead for nodes that are frequently allocated and freed within one cycle. On the other hand, it has the drawback that all surviving new nodes have to be traversed, since they are not yet colored.
Steele's as well as Kung and Song's algorithms are inherently less conservative. To further support that property, they take a closer look at the nodes to decide which color allocated cells should be given. Kung and Song color new cells grey, or white outside a marking phase. Steele colors new nodes grey during a marking phase as well, but uses a sophisticated heuristic to determine the color outside a marking phase. This way most objects are allocated white, while some are allocated black.
IX. INITIALIZATION
Another point of concern is when to start a garbage collection run. The next run can start immediately after the last one has finished, or it can wait until the mutator runs out of free memory.
The first possibility can cause too much overhead when the mutator generates only little garbage, making the collector waste its effort on just a few freed nodes. In contrast, the second way can cause mutator starvation: the mutator runs out of free memory and has to wait for the collector to catch up, free some garbage and thus provide free memory. This degrades the response time of the mutator, since it can do nothing but wait.
As usual, the best method lies somewhere in between. The next run should not be started immediately but after some delay. On the other hand, it should not wait until no free memory is left, but start when the number of free nodes falls below a certain threshold. Yuasa, for example, suggested starting his collector once only about 22 percent of the available heap space is left free.
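A threshold trigger of this kind can be sketched in Python as follows. The class name `ThresholdTrigger`, its `allocate` method and the integer-percent representation are assumptions made for illustration; only the 22 percent figure comes from Yuasa's suggestion above.

```python
class ThresholdTrigger:
    """Hypothetical allocator hook: start a cycle when the free share gets small."""

    def __init__(self, heap_cells, threshold_percent=22):
        self.heap_cells = heap_cells
        self.free_cells = heap_cells
        self.threshold_percent = threshold_percent  # Yuasa's suggested value
        self.collecting = False

    def allocate(self, n=1):
        if n > self.free_cells:
            # Having to wait here is the mutator starvation described above
            raise MemoryError("no free cells left")
        self.free_cells -= n
        # Integer arithmetic avoids floating-point rounding at the boundary
        if not self.collecting and self.free_cells * 100 <= self.threshold_percent * self.heap_cells:
            self.collecting = True  # an incremental collection cycle would start here
        return self.collecting
```

With a 100-cell heap, the first allocation of 77 cells leaves 23 free and does not trigger a cycle; one more allocation brings the free share down to 22 percent and starts one.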
A. User stack treatment by Dijkstra and Kung and Song
Of further interest is how to treat the user stack efficiently. A simple sequential implementation would suspend the mutator and shade all the roots grey in one operation.
For an incremental algorithm, neither Dijkstra nor Kung and Song give any hint how to do so. Dijkstra only suggests shading all the roots, without giving further details on how to shorten the necessary pause of the mutator. Kung and Song add all the roots to a marking queue at the start of the collection cycle.
B. User stack treatment by Yuasa
Yuasa optimizes the solution by copying the entire program stack into a save stack using a block-copying operation. Only the registers and global variables are pushed directly onto the mark stack, because he assumes there is only a very limited number of such root nodes. The entries of the save stack are then transferred to the mark stack in small batches whenever it is in danger of running empty, so that the mark stack does not grow too deep. The mark phase is finished when both the mark stack and the save stack are empty.
On this basis Yuasa claims that his algorithm is real-time. That is only true in the sense that the time complexity of the write barrier and of the allocation routine is bounded by several constants. He also relies on the existence of a fast block-copying operation. Depending on the actual hardware and software his algorithm is implemented on, those bounding constants can grow unacceptably high.
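Yuasa's root handling can be sketched in Python as below. The class `YuasaRootScanner` and the tuning knobs `low_water` and `batch` are hypothetical; the sketch only shows the order in which roots reach the marker, not the marking itself.

```python
class YuasaRootScanner:
    """Sketch of Yuasa's root handling at the start of a marking phase."""

    def __init__(self, program_stack, registers_and_globals, low_water=2, batch=4):
        self.save_stack = list(program_stack)          # one block copy of the stack
        self.mark_stack = list(registers_and_globals)  # few roots, pushed directly
        self.low_water = low_water                     # hypothetical refill trigger
        self.batch = batch                             # hypothetical batch size

    def refill(self):
        # Move a few saved roots over only when the mark stack is nearly empty,
        # so the mark stack never grows too deep.
        if len(self.mark_stack) < self.low_water:
            for _ in range(min(self.batch, len(self.save_stack))):
                self.mark_stack.append(self.save_stack.pop())

    def next_root(self):
        self.refill()
        return self.mark_stack.pop() if self.mark_stack else None

    def done(self):
        # Marking ends only when both the mark stack and the save stack are empty
        return not self.mark_stack and not self.save_stack
```

Because the save stack is a copy taken at the start of the cycle, later changes to the real program stack do not affect it, which fits the snapshot character of Yuasa's write barrier.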
C. User stack treatment by Steele
Steele suggests first marking all objects reachable from the roots, tracing one root after the other. Because of its volatility, the stack is left untouched until the very end of the marking phase; it is then traced one element at a time. Unfortunately, the mutator can still push objects onto the stack after that, so once the collector has finished tracing the program stack, the mutator has to push such objects onto the marking stack as well.
When used in a concurrent system, the collector locks the marking stack. If it then finds the stack empty, the mark phase can be considered complete; otherwise the lock is released and marking continues.
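The locked termination test can be sketched in Python with a `threading.Lock`. The names `MarkingStack`, `mark_loop` and the `mark` callback are hypothetical; the `marking` flag, which makes the mutator stop pushing once termination has been decided, is an assumption of this sketch.

```python
import threading


class MarkingStack:
    """Marking stack shared by collector and mutator (sketch of Steele's scheme)."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()
        self.marking = True

    def push(self, obj):
        # Used by the marker and by the mutator after the program stack was traced
        with self._lock:
            if self.marking:  # after termination nothing needs shading any more
                self._items.append(obj)

    def pop(self):
        with self._lock:
            return self._items.pop() if self._items else None

    def try_terminate(self):
        # The collector holds the lock: an empty stack means marking is complete;
        # otherwise the lock is released and marking continues.
        with self._lock:
            if self._items:
                return False
            self.marking = False
            return True


def mark_loop(stack, mark):
    """`mark` is a hypothetical callback that blackens obj and pushes its children."""
    while True:
        obj = stack.pop()
        if obj is not None:
            mark(obj)
        elif stack.try_terminate():
            return
```

Because the emptiness check and the end of marking happen under the same lock, a mutator push cannot slip in between them.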
X. TERMINATION
Terminating the marking phase is quite expensive in Dijkstra's algorithm. It scans the heap sequentially for grey nodes over and over again; when it completes a full pass through the heap without finding a grey node, the marking phase is finished. This method has quadratic worst-case time complexity, and unfortunately it is easy to construct realistic examples that exhibit it.
Fig. 6 Dijkstra's quadratic complexity
One simple example is a linked list in an otherwise empty heap. The list is laid out on the heap in reverse order: its start lies near the end of the heap, and each element points to the one stored directly before it. I leave aside the fact that the example shows a LISP implementation using two pointers and a data field for each cell, and simply treat each list element as one object.
There is a link from the root set to the start of the list, i.e. the element stored last in the heap. So every pass through the heap finds just one grey element, colors it black and colors its successor, which lies just before it, grey. The collector then has to travel almost one full round through the heap to find the next grey node. This continues until the element stored first in the heap has been found and colored black. The result is n passes through the heap (where n is the number of elements in the linked list), each checking n nodes for greyness but treating only one node. The time complexity is thus O(n^2), i.e. quadratic.
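The example can be checked with a small Python simulation of the repeated sequential heap scan. The function name `dijkstra_mark_scans` and the list-of-indices heap model are assumptions; the sketch counts passes and node inspections and does not model the mutator.

```python
def dijkstra_mark_scans(n):
    """Count heap passes and node inspections for the reversed-list example (n >= 1).

    Heap cell i holds a list element whose successor is cell i - 1; the root
    points to cell n - 1, the start of the list.
    """
    successor = [i - 1 if i > 0 else None for i in range(n)]
    color = ["white"] * n
    color[n - 1] = "grey"                  # shaded from the root set
    passes = inspections = 0
    while True:
        passes += 1
        found_grey = False
        for i in range(n):                 # one sequential pass over the heap
            inspections += 1
            if color[i] == "grey":
                found_grey = True
                child = successor[i]
                if child is not None and color[child] == "white":
                    color[child] = "grey"  # lies behind the current scan position
                color[i] = "black"
        if not found_grey:                 # a full pass without grey nodes: done
            return passes, inspections
```

For a list of n elements the simulation needs n + 1 passes (the last one only confirms that no grey node is left), i.e. n(n + 1) inspections: `dijkstra_mark_scans(4)` returns `(5, 20)`.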
XI. PARALLEL EXECUTION OF MARK AND SWEEP
A way of improving mark-and-sweep algorithms has been introduced by Lamport and Queinnec et al. [5]. They have shown that the mark phase and the sweep phase can run concurrently. More precisely: while the sweep phase of the nth cycle is in progress, the marking of the (n+1)th cycle may already be under way.
To make this possible, each element carries two color fields: cycles (both mark and sweep) with an even number access the first color field, those with an odd number the second one.
There is a hidden trap, though. Since Dijkstra's algorithm marks both used and free memory, care must be taken not to violate the rule that no black-to-white pointers may be created. When a node is released, and thus added to the pool of free nodes, while its color (as seen by the current cycle) is white, then its other color has to be set to grey. Otherwise a black-to-white link could be created.
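The two color fields and the release rule can be sketched in Python as follows. `Cell`, `color`, `set_color` and `release` are hypothetical names; the sketch covers only the color bookkeeping, not the marker or sweeper loops.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Cell:
    def __init__(self):
        self.colors = [WHITE, WHITE]   # field 0: even cycles, field 1: odd cycles


def color(cell, cycle):
    """Color of `cell` as seen by collection cycle number `cycle`."""
    return cell.colors[cycle % 2]


def set_color(cell, cycle, value):
    cell.colors[cycle % 2] = value


def release(cell, sweep_cycle, free_list):
    """Sweeper of cycle n frees `cell` while the marker of cycle n+1 may be running."""
    free_list.append(cell)
    if color(cell, sweep_cycle) == WHITE:
        # Free cells are colored too (as in Dijkstra's scheme), so the concurrent
        # marker must not see this cell as white: shade its other color field.
        set_color(cell, sweep_cycle + 1, GREY)
```

A cell released by the sweeper of cycle 4 thus stays white in the even field but becomes grey in the odd field used by the marker of cycle 5.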
A. Very Concurrent Garbage Collector
A similar approach is taken by Huelsbergen and Winterbottom [9]. Their Very Concurrent Garbage Collector runs the mutator, the marker and the sweeper as three separate, autonomous threads. Mutating, marking and sweeping proceed in consecutive phases: while the marker marks the nth generation of data, the sweeper releases data identified as garbage during the (n-1)th run of the marker. In parallel, the mutator may already alter data marked in the nth generation. This approach should not be confused with generational garbage collection, though.
All this can be done with virtually no synchronization between the threads. Only the marker and the sweeper have to wait for each other to complete their phases; of course, the sweeper may only begin sweeping out the garbage of the nth phase once the marker has finished marking it. There is no need to synchronize the mutator with the two collector threads while the collector is running. Only when a new phase starts does the mutator have to be suspended for a short time so that the roots are marked correctly. For more details on this algorithm I refer to the paper Mark and Sweep [10] of this seminar.
XII. VIRTUAL MEMORY TECHNIQUES
A software write-barrier has the disadvantage of adding overhead to every pointer update done by the mutator. This overhead can often be reduced with the assistance of virtual memory.
An example is the Boehm-Demers-Shenker collector, which marks objects incrementally with the help of the operating system. It uses the dirty bits of virtual memory pages for synchronization. When the algorithm needs to terminate, all mutator threads are suspended and the dirty bits of all pages are checked. For all dirty pages, marking is restarted, beginning with the roots and the shaded objects on those pages. Once this marking run is finished, the collector again tries to terminate.
This method needs only little overhead in the paging code, which is moreover only executed when the program pages. On the other hand, a single dirty bit per memory page is extremely coarse: many different objects can be placed on one page, yet altering just one of them sets the dirty bit and forces another marking run for the page.
Since this method has to suspend the mutators every time it tries to terminate, and examining the dirty bits and scanning the pages can be quite expensive, it is usually not the best choice. Its big advantage, on the other hand, is that it can be implemented without modifying the compiler. It can thus easily be adapted to support various new languages.
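The stop-the-world termination step can be sketched in Python over plain data structures. The function `finish_marking` and all its parameters are hypothetical stand-ins for the real page tables and heap; reading actual dirty bits requires operating-system support that is not modelled here.

```python
def finish_marking(pages, dirty, marked, children, roots):
    """Stop-the-world termination step of a mostly-parallel marker (sketch).

    pages:    page id -> objects stored on that page
    dirty:    ids of pages whose dirty bit is set
    marked:   set of objects marked during the concurrent phase (updated in place)
    children: function returning the objects an object points to
    roots:    current root set
    """
    # Mutators are assumed to be suspended, so no further pages become dirty.
    # Marked objects on dirty pages may have gained new pointers: rescan them.
    work = [obj for page in dirty for obj in pages[page] if obj in marked]
    for root in roots:
        marked.add(root)
        work.append(root)
    while work:
        obj = work.pop()
        for child in children(obj):
            if child not in marked:
                marked.add(child)
                work.append(child)
    return marked
```

Only marked objects on dirty pages are rescanned, which is why a single write anywhere on a page forces work for every marked object on it.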
Another possibility is to use virtual memory to implement a snapshot-at-the-beginning write-barrier. An actual snapshot can be created incrementally with copy-on-write pages. This technique was used by Furusou et al. to implement a concurrent conservative collector for object-oriented languages.
The biggest advantage of this method is that it has the best pause-time behaviour of all the algorithms utilizing virtual memory. The mutator threads only have to be suspended for a short time while the virtual memory tables are prepared for copy-on-write. From then on, the mutators need not be stopped again.
Once the snapshot is taken, the collector marks using this static image of the heap, while the mutators use (and probably modify) the real data. Furusou actually used Yuasa's algorithm on the copy of the heap. When the collector removes garbage, it does so in the actual data structures. There is no need for further synchronization between the mutators and the collector.
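The copy-on-write snapshot can be imitated in Python with shared page objects. In this sketch `CowHeap` is a hypothetical class and Python lists stand in for virtual memory pages; a real implementation relies on page protection in the operating system instead.

```python
class CowHeap:
    """Heap pages shared with a snapshot until the mutator first writes to them."""

    def __init__(self, pages):
        self.pages = [list(p) for p in pages]
        self.snapshot = None

    def take_snapshot(self):
        # Short pause: only the page table is copied, not the pages themselves
        self.snapshot = list(self.pages)

    def write(self, page, offset, value):
        if self.snapshot is not None and self.pages[page] is self.snapshot[page]:
            self.pages[page] = list(self.pages[page])   # copy on the first write only
        self.pages[page][offset] = value
```

The collector would mark using `snapshot`, which keeps the heap as it was at the start of the cycle, while the mutator keeps working on `pages`; untouched pages are never copied.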
Tested in practice, however, Furusou's collector proved quite disappointing. It was hard to avoid additional copying operations, since copy-on-write copies a page only the first time it is written to.
Additionally, memory management turned out to be highly inefficient. The allocator could deliver only a few thousand objects per second, while it would need to deliver several million to be of reasonable use in a concurrent object-oriented system. Furusou et al. suggested assigning memory to the mutators in units of whole pages.
XIII. CONCURRENT REFERENCE COUNTING
Reference counting might seem an appropriate algorithm for an incremental garbage collecting system, since both techniques rely on a tight interconnection between mutator and collector. Apart from its well-known drawbacks (inability to detect cyclic garbage, computational expense and close coupling to the user program), however, reference counting is particularly ill-suited to multiprocessor environments. Updating a reference count has to be synchronized between all threads that may access the object. This must be an atomic operation, requiring locks on all affected objects, which further increases the already high cost of pointer assignment.
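The cost of synchronized counting can be seen in a small Python sketch of a pointer assignment that updates shared counts under per-object locks. `RcObject` and `rc_assign` are hypothetical names, and the sketch assumes the slot being written is owned by the calling thread.

```python
import threading


class RcObject:
    """Heap object with a shared reference count guarded by its own lock."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()


def rc_assign(slot, index, new, freed):
    """Perform slot[index] = new while keeping shared reference counts consistent.

    Assumes the calling thread owns `slot`; only the counts are shared.
    """
    if new is not None:
        with new.lock:              # increment first so `new` cannot die meanwhile
            new.count += 1
    old, slot[index] = slot[index], new
    if old is not None:
        with old.lock:
            old.count -= 1
            dead = old.count == 0
        if dead:
            freed.append(old)       # a real collector would now release old's children
```

Every pointer assignment thus acquires up to two locks, which is exactly the extra cost described above.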
Fig. 7 Modula-2+ concurrent reference counting architecture
One implementation of this approach is the Modula-2+ collector by Rovner, later improved by DeTreville. Mutator and collector communicate via a transaction queue. The mutator does not keep track of the reference counts but logs every pointer change it makes in the transaction log. From time to time, the collector uses this logged data to calculate the reference count of every object. Objects whose count drops to zero are destroyed.
Update(A, C) =
    LOCK mutex
        insert(A, C, tq)
        if tq is full
            notify_collector(tq)
            tq = get_next_block()
        *A = C
Algorithm 5 Mutator code for Modula-2+ collector
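Algorithm 5 together with a simple collector side can be sketched in Python as follows. `TransactionQueue`, `apply_block`, the block size and the rule of recovering old values from earlier log entries are assumptions made for this sketch, not details of Rovner's implementation.

```python
import threading
from collections import Counter


class TransactionQueue:
    """Mutator side of a deferred reference counter in the style of Algorithm 5."""

    def __init__(self, block_size, notify_collector):
        self.mutex = threading.Lock()
        self.block = []                       # the current block, tq
        self.block_size = block_size          # hypothetical block capacity
        self.notify_collector = notify_collector

    def update(self, heap, address, value):
        with self.mutex:                                # LOCK mutex
            self.block.append((address, value))         # insert(A, C, tq)
            if len(self.block) == self.block_size:      # if tq is full
                self.notify_collector(self.block)       # notify_collector(tq)
                self.block = []                         # tq = get_next_block()
            heap[address] = value                       # *A = C


def apply_block(block, last_value, counts):
    """Collector side: turn logged (address, value) pairs into count changes.

    The previous value of each address is recovered from earlier log entries.
    """
    for address, value in block:
        old = last_value.get(address)
        if old is not None:
            counts[old] -= 1
        if value is not None:
            counts[value] += 1
        last_value[address] = value
```

The mutator never touches a reference count; all counting is deferred to the collector, which processes whole blocks at a time.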
This can be further optimized by distinguishing between variables that only the current thread can access (like those on the stack or in a register) and variables accessible by all threads (like heap data or global variables). In fact, the collector only reference-counts shared pointer-valued variables. This of course complicates the collector, since the stored reference count now only represents a lower bound of the actual count and not the full, correct value.
The following scheme is used to determine whether an object may be freed. Once the transaction block has been filled, it is transferred to the collector; this block holds the information up to some time t0. The collector now scans the state of each thread, one after the other, holding the mutex so that the thread is not inside an Update operation. The time when all threads have been scanned is t1. After that, the reference counts of all variables are recomputed using the data of the transaction block. Variables with a reference count of zero are placed in a Zero-Count List (ZCL).
If an object had a reference count of zero at t0, was not visible in any thread's state between t0 and t1, and does not appear in the transaction queue between t0 and t1, it is garbage and can be freed.
The last structure to be considered is the ZCL. If an object in the ZCL now has a shared reference count higher than zero, it is removed from the list. If it was found in a thread's state, it is left in the ZCL and will eventually be freed. Otherwise the object is removed from the list and freed.
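The decision for each ZCL entry can be sketched as a single Python function. `process_zcl` and its parameters are hypothetical; the sketch combines the ZCL rules above with the t0/t1 condition from the previous paragraph.

```python
def process_zcl(zcl, shared_counts, seen_in_thread_state, logged_between):
    """One pass over the Zero-Count List after the t0..t1 thread scan (sketch).

    shared_counts:        object -> reference count from shared variables
    seen_in_thread_state: objects found in some thread's registers or stack
    logged_between:       objects mentioned in the transaction queue between t0 and t1
    Returns (objects that stay on the ZCL, objects that are freed).
    """
    keep, freed = [], []
    for obj in zcl:
        if shared_counts.get(obj, 0) > 0:
            continue                 # shared references exist again: drop from the ZCL
        if obj in seen_in_thread_state or obj in logged_between:
            keep.append(obj)         # may still be in use: reconsider next time
        else:
            freed.append(obj)        # zero count and invisible to all threads: garbage
    return keep, freed
```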
This algorithm was not ideal in many ways. As a reference counter, it was not able to reclaim cyclic structures. The cost of assignments to shared references was quite high. Other problems encountered were working-set size, locality and a tendency of the collector to fall behind the mutator. After several different algorithms had been tested, a combination of reference counting and mark-sweep collection was used.
A. Combining reference counting with mark-sweep collecting
An algorithm of this type was first introduced by Deutsch and Bobrow in 1976 [2]. By combining garbage collection with reference counting, they tried to eliminate the shortcomings of both systems.
They stated that while reference counting has an overhead proportional to the number of transactions, garbage collection (mark-sweep) has an overhead proportional to the size of allocated memory, which led them to suggest reference counting as the preferred method for reclaiming unused space. Since cyclic garbage cannot be detected using this method, they added a mark-sweep collector that ran much less frequently but guaranteed that all garbage was found.
Their reference counter also made use of a transaction log like the one described above. Every operation that may affect the accessibility of allocated data is logged: the allocation of new memory, the creation of a pointer to an object and the destruction of a pointer to an object. They assumed that a reference count of 1 is the most common one, so they only recorded objects with a different count. Objects with a count of 2 or higher are stored in a hash table named Multi Reference Table (MRT); those with a count of zero are stored in a Zero Count Table (ZCT), analogous to the ZCL described above. Each transaction stored in the log is then inspected and treated as follows:
Allocate transactions create an entry in the ZCT, since there are no pointers to the new cell yet.
Create-pointer transactions delete the referenced object from the ZCT if it is found there (its count is now 1), increase its count if it is in the MRT, or put it into the MRT with a count of 2 if it is found in neither table.
Delete-pointer transactions whose referenced object is found in the MRT decrement its count in the MRT, unless the count has already reached its maximum (in which case it stays there) or it is 2, in which case the entry is deleted from the MRT. If the entry is not found in the MRT, the object had a count of 1, so it is now put into the ZCT.
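The three transaction rules can be sketched in Python with a dict as MRT and a set as ZCT. The class `RefTables` and the saturation value `MAX_COUNT = 255` are assumptions; Deutsch and Bobrow's actual maximum is not given here. The `reclaimable` method applies the VCT check described in the next paragraph.

```python
MAX_COUNT = 255  # hypothetical saturation value of a stored count


class RefTables:
    """Multi Reference Table and Zero Count Table; count 1 is implicit (sketch)."""

    def __init__(self):
        self.mrt = {}      # object -> count >= 2
        self.zct = set()   # objects with count 0

    def allocate(self, obj):
        self.zct.add(obj)                 # no pointers to the new cell yet

    def create_pointer(self, obj):
        if obj in self.zct:
            self.zct.remove(obj)          # 0 -> 1: now in neither table
        elif obj in self.mrt:
            if self.mrt[obj] < MAX_COUNT:
                self.mrt[obj] += 1        # a saturated count stays at its maximum
        else:
            self.mrt[obj] = 2             # implicit 1 -> 2

    def delete_pointer(self, obj):
        if obj in self.mrt:
            count = self.mrt[obj]
            if count == MAX_COUNT:
                return                    # saturated: left for the tracing collector
            if count == 2:
                del self.mrt[obj]         # back to the implicit count 1
            else:
                self.mrt[obj] = count - 1
        else:
            self.zct.add(obj)             # implicit 1 -> 0

    def reclaimable(self, vct):
        """ZCT entries not referenced from the stack (the VCT) may be reclaimed."""
        return {obj for obj in self.zct if obj not in vct}
```

Objects with the common count of 1 never appear in either table, which is what keeps the tables small.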
Objects listed in the ZCT that are not referenced by any variable can now be deleted. This is checked using a hash table named Variable Count Table (VCT), which contains all pointers from the stack: entries that are in the ZCT but not in the VCT can be reclaimed. Of course, the reference counts of the objects referenced by such a deleted object then have to be decremented. This can be done by recording the deletion of the object in the normal transaction log. Alternatively, a possibly large structure could be reclaimed at once. However, this could hurt the mutator through a huge number of disk accesses if the structure occupies space on many pages.
A first simple optimization is to ignore allocate and create-pointer operations that directly follow each other and reference the same object. Such a pair would put the object into the ZCT and remove it again right away, so it can be ignored.
A further optimization is to store a block of the transaction log as a hash table and find patterns of operations on the same object that cause no change in the MRT or ZCT, such as the allocate/create-pointer combination mentioned above or a create-pointer/delete-pointer combination. Deutsch and Bobrow argue that even though such an optimization sounds too expensive to be worthwhile, it should give good performance, especially if one block of the transaction log contains only a few thousand entries and can be kept in core memory.
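A block-level version of this optimization can be sketched in Python by grouping the transactions of one block per object. The function `compress_block` is hypothetical, and it simplifies by assuming that no count reaches its saturation maximum within the block.

```python
from collections import Counter


def compress_block(block):
    """Drop transactions in one log block that leave the MRT and ZCT unchanged.

    block: list of (op, obj) pairs, op being "alloc", "create" or "delete".
    Assumes no count saturates at its maximum within the block.
    """
    allocated = set()
    net = Counter()          # net number of pointers created per object
    order = []               # first appearance, for a deterministic result
    for op, obj in block:
        if obj not in allocated and obj not in net:
            order.append(obj)
        if op == "alloc":
            allocated.add(obj)
        elif op == "create":
            net[obj] += 1
        elif op == "delete":
            net[obj] -= 1
    result = []
    for obj in order:
        delta = net[obj]
        if obj in allocated:
            if delta == 1:
                continue     # alloc + create: ends at the implicit count 1
            result.append(("alloc", obj))
        op = "create" if delta > 0 else "delete"
        result.extend([(op, obj)] * abs(delta))
    return result
```

For example, an allocate/create-pointer pair on one object and a create/delete pair on another both disappear, while an unmatched delete-pointer survives.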
In addition to the reference counting scheme described above, they use a garbage collector. The task of that collector is not only to reclaim cyclic garbage but also to compact and defragment the memory in use; it is not primarily used for the dynamic reclamation of abandoned data that garbage collectors usually perform. The collector they investigated is based on Minsky's tracing technique for linearization and copies the live data into a linearized form as described by Fenichel and Yochelson.
The whole system they describe can be executed on a separate processor as a truly concurrent collector. The mutator only has to generate and send the transaction log but need not be able to access the collector. The collector receives the log and calculates the reference counts (that is, the MRT and ZCT) from that data. The still unsolved problem of recursively reclaiming freed objects can simply be postponed to the next run of the collector if treated as described above.
Deutsch and Bobrow are confident that their system could meet the requirements of a real-time system even when run on a single processor. They do not provide any proof, though.
XIV. BAKER'S ALGORITHM
The incremental copying collector of Baker [1], first presented in 1978, is one of the best-known algorithms using this scheme. Because of this, standard copying collectors have sometimes erroneously been referred to as Baker's. It should not be confused with Baker's Treadmill algorithm, though, which is described in the paper by Schatz [11].
Fig. 8 Baker's Tospace layout
Baker adapted Cheney's copying collector so that it can run in parallel with the mutator. Tospace is organized so that all the old surviving data is compacted at the bottom end (B), while newly created objects are put at the top end (T).
Consequently, all new objects are allocated black. This means the collector does not have to scan the new objects. Even if they have been initialized with links to objects in Fromspace, this does no harm, since the read barrier takes care of it. The drawback is that new objects can only be reclaimed during the next cycle of the collector, even if they were relinquished later in the same run. This makes Baker's algorithm more conservative than incremental-update write-barriers (like Steele's) but less so than snapshot-at-the-beginning algorithms (like Yuasa's).
New(n) =
    if B > T - n                        -- flip phase
        if scan < B
            abort "Could not finish"
        flip()
        for R in Roots
            R = copy(R)
    repeat k times                      -- scan a little
        if scan < B
            for P in Children(scan)
                *P = copy(*P)
            scan = scan + size(scan)
    if B == T
        abort "Heap full"
    T = T - n
    return T

read(T) =
    T' = copy(T)
    return T'
Algorithm 6 Baker's incremental copying algorithm
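A toy Python model of Algorithm 6 is shown below. The classes `Obj` and `BakerHeap` are hypothetical; the model ignores object sizes (every object counts as one unit), keeps the B and T regions as two Python lists, and therefore omits the "Heap full" check.

```python
class Obj:
    def __init__(self, fields, space):
        self.fields = list(fields)  # references to other Obj instances or None
        self.space = space          # "from" or "to"
        self.forward = None         # forwarding pointer once evacuated


class BakerHeap:
    """Toy model of Algorithm 6; every object counts as one unit of space."""

    def __init__(self, k=2):
        self.k = k                # objects scanned per allocation
        self.roots = []
        self.bottom = []          # evacuated objects, region growing from B
        self.top = []             # objects allocated during this cycle, region at T
        self.scan = 0             # index of the next grey object in self.bottom

    def done(self):
        return self.scan == len(self.bottom)

    def copy(self, obj):
        if obj is None or obj.space == "to":
            return obj
        if obj.forward is None:              # evacuate: the copy is grey
            obj.forward = Obj(obj.fields, "to")
            self.bottom.append(obj.forward)
        return obj.forward

    def flip(self):
        if not self.done():
            raise RuntimeError("could not finish")
        # Everything in the current Tospace becomes Fromspace for the new cycle
        for obj in self.bottom + self.top:
            obj.space, obj.forward = "from", None
        self.bottom, self.top, self.scan = [], [], 0
        self.roots = [self.copy(r) for r in self.roots]

    def scan_some(self):
        for _ in range(self.k):              # repeat k times
            if self.done():
                return
            obj = self.bottom[self.scan]     # grey object at the scan pointer
            obj.fields = [self.copy(f) for f in obj.fields]
            self.scan += 1                   # obj is now black

    def new(self, fields=()):
        self.scan_some()
        # Allocated black at T; fields pass through copy() the way the read
        # barrier would have handed them to the mutator.
        obj = Obj([self.copy(f) for f in fields], "to")
        self.top.append(obj)
        return obj

    def read(self, obj, i):
        """Read barrier: a Fromspace object is evacuated before the mutator sees it."""
        return self.copy(obj.fields[i])
```

After a flip only objects reachable from the roots are ever copied; an unreachable object allocated in the previous cycle is simply left behind in Fromspace.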
As usual for a copying collector, Baker's algorithm starts a cycle with a flip of Tospace and Fromspace. This can happen, for example, once the B and T pointers meet. Waiting for that has the advantage of giving objects the maximum time to die, which minimizes copying, since fewer objects have to be copied to Tospace and more garbage is reclaimed by a single flip. The disadvantage is that more heap memory is allocated, and it is possibly quite fragmented too, which increases the number of paging requests to be fulfilled. So it might be better to start the next run as soon as possible, guaranteeing lower fragmentation and less paging but increasing the amount of copying from Fromspace to Tospace.
The flip also ensures that Tospace offers sufficient space and expands it if necessary.
Each time the mutator allocates new memory, up to k objects are scanned by the collector: the Fromspace objects they reference are copied to Tospace (and thereby colored grey), and the scanned objects become black. When a bigger structure is allocated, this number is increased to n*k (where n is the size of the structure in words).
The read-barrier only affects pointer load operations, while write operations may be done without the collector or the barrier noticing. This allows white objects to die and be reclaimed during the same cycle of the collector, because they do not have to be copied (and colored grey or black) merely because they are written to. The read-barrier only monitors access to white nodes, copying them and coloring them grey before allowing access to them. Access to grey nodes is allowed freely, which avoids having to visit evacuated nodes and their descendants immediately.
A. Drawbacks of Baker's algorithm
This basic version of the algorithm has a major drawback, as it has to scan the whole root set at flip time. If the root set is big, for example when it includes a program stack, the latency can grow quite high. Another problem can be the cost of evacuating larger objects. The use of Baker's algorithm is also limited by its close coupling to the mutator, which makes it hard to predict when and for how long the mutator is paused: it cannot be predicted which allocation operation will have to flip and thus cause a longer pause for the mutator thread.
The first problem can be solved by incrementally scavenging a number (k) of stack cells at each allocation. This spreads the overhead over a larger number of operations, eliminating the single long pause for the mutator. The value of k is recomputed every time, so that the ratio of k to the number of stack cells equals the ratio of stack cells to heap cells.
Incrementally scavenging the stack of course complicates the routines accessing the stack. First, each pop operation may have to adjust the stack's scan pointer. Second, the read-barrier needs to be extended so that it also intercepts objects taken from the stack, not only those taken from the heap. The push operation needs no special attention, since the read-barrier already ensures that only grey or black objects (which thus reside in Tospace) can be accessed.
Scanning the stack can begin either at its bottom or at its top. While Baker favours starting at the top, others (Brooks, Steele) suggest starting at the bottom: objects nearer to the bottom change much less often, so starting there reduces the chance of copying garbage into Tospace.
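Incremental stack scavenging from the bottom, as suggested by Brooks and Steele, can be sketched in Python. `ScavengedStack` is a hypothetical class, and `evacuate` is a caller-supplied stand-in for Baker's copy(), assumed to return Tospace pointers unchanged.

```python
class ScavengedStack:
    """Stack scanned incrementally from the bottom, k cells per step (sketch).

    Cells below `scanned` already hold Tospace pointers; cells at or above it
    may still hold Fromspace pointers.
    """

    def __init__(self, cells, evacuate):
        self.cells = list(cells)
        self.scanned = 0
        self.evacuate = evacuate    # hypothetical stand-in for copy()

    def scavenge(self, k):
        end = min(self.scanned + k, len(self.cells))
        for i in range(self.scanned, end):
            self.cells[i] = self.evacuate(self.cells[i])
        self.scanned = end

    def push(self, value):
        # The read barrier already guarantees that value lies in Tospace
        self.cells.append(value)

    def pop(self):
        value = self.cells.pop()
        if self.scanned > len(self.cells):
            self.scanned = len(self.cells)   # popped cell was scanned: move the pointer
            return value
        return self.evacuate(value)          # read barrier on an unscanned cell
```

In a usage example where lowercase strings play Fromspace pointers and `str.upper` plays evacuation, popping an unscanned cell returns its evacuated form, while popping the last scanned cell lowers the scan pointer.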
The solution to the second problem is to copy large objects lazily. This means the whole object is not copied at once but incrementally in parts. This requires an additional pointer in both the Fromspace and the Tospace object, called the forwarding address and the backward link respectively. When the object is evacuated, the necessary space for the object (plus this pointer) is reserved in Tospace. The forwarding address in the Fromspace object is set to point to the copy in Tospace, and the backward link in Tospace points back to the original version of the object in Fromspace. Scanning then continues on the Tospace version, with the object's data being copied incrementally from Fromspace as the scan proceeds.
Fig. 9 Lazy scanning using forwarding address and backward link
Every write access to the object has to be intercepted. If the address of the field being written is higher than the position of scan, the data is stored in the original copy in Fromspace; it will be copied to Tospace anyway when the scanning wavefront reaches this address. If the address is lower than scan, the Tospace version is altered, since scan has already handled that region and the data has already been copied to the Tospace object. When the scanning of the object is finished (meaning scan is higher than the address of the object's last field in Tospace), the backward link is cleared (set to nil).
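Lazy copying of one large object, including the write rule above, can be sketched in Python. `LazyCopy` is a hypothetical class; the forwarding address is not modelled, field values are plain data (so the child evacuation a real scanner performs is omitted), and the `read` rule is an assumption mirroring the write rule.

```python
class LazyCopy:
    """Incremental copying of one large object (sketch).

    `original` stands for the Fromspace object, `tospace` for the space reserved
    in Tospace; `backward_link` points from the copy back to the original.
    """

    def __init__(self, original):
        self.original = original
        self.tospace = [None] * len(original)   # reserved, not yet filled
        self.backward_link = original
        self.scan = 0                           # fields below scan are copied

    def scan_step(self, k):
        end = min(self.scan + k, len(self.tospace))
        for i in range(self.scan, end):
            self.tospace[i] = self.original[i]
        self.scan = end
        if self.scan == len(self.tospace):
            self.backward_link = None           # copying finished: clear the link

    def write(self, i, value):
        if i >= self.scan:
            self.original[i] = value    # ahead of scan: the wavefront will copy it
        else:
            self.tospace[i] = value     # behind scan: update the Tospace copy

    def read(self, i):
        # Assumption: reads follow the same rule as writes
        return self.tospace[i] if i < self.scan else self.original[i]
```

Writes made while the object is half-copied end up in the right place: after the scan completes, the Tospace version contains both the early and the late updates.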
Since the algorithm is so closely interwoven with the mutator, it relies on hardware support to achieve full efficiency. Wholey and Fahlman predicted in 1984 that a micro-coded read-barrier as suggested by Baker, without any special hardware support, would cost about 30% overhead. Later research by Zorn in 1990 [8], on the other hand, showed that a well-coded software read-barrier can achieve much better results; he suggests that an overhead of 2% to 6% is possible on modern systems when appropriate methods are used.
Another problem is that the cost of accessing an object depends on whether it is found in Fromspace or in Tospace. This makes the behaviour of the read-barrier highly unpredictable, since the cost of an actual access depends on whether the object has already been visited by the collector and transferred to Tospace in the current run, which can hardly be predicted.
The last drawback of Baker's algorithm is that it conservatively allocates new objects black, making it impossible for them to be released before the next run of the collector. This has already been discussed in detail for other algorithms.
B. Variations on Baker's algorithm
1) Brooks's indirection pointers
Brooks suggested adding the indirection field used for lazy scanning above to all objects. This somewhat simplifies the read-barrier. As long as the object has not been copied, both indirection fields (the one in the original Fromspace version and the one in the Tospace copy) point to the Fromspace object. Once the object has been copied, both indirection pointers reference the Tospace version of the object.
To ensure that no black-to-white pointers are created, destructive operations like update have to forward their second argument before installing it, since the mutator can see both the Fromspace and the Tospace version.
Fig. 10 Brooks's forwarding pointers
In fact, this method more closely resembles an incremental-update write-barrier algorithm. Its major drawback is the large space overhead for storing the indirection pointer: applied to LISP cons cells, it means 50% more space. Another consequence is the time penalty of following the indirection pointer on every object access. However, this is partially compensated by the fact that the write-barrier is triggered significantly less often than a read-barrier would be.
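Brooks's indirection can be sketched in Python as follows. `BrooksObj`, `forward`, `evacuate`, `read_field` and `update` are hypothetical names. The sketch simplifies the indirection field of an uncopied object to point to the object itself; it also lets `update` evacuate its second argument rather than merely forward it, which is an assumption made so that a copied object never gains a pointer to an uncopied one.

```python
class BrooksObj:
    """Object with a Brooks indirection field: it points to the object itself
    until the object is copied, afterwards to the Tospace copy."""

    def __init__(self, fields, in_tospace=False):
        self.fields = list(fields)
        self.in_tospace = in_tospace
        self.indirect = self


def forward(obj):
    # Unconditional indirection replaces Baker's conditional read-barrier check
    return None if obj is None else obj.indirect


def evacuate(obj):
    """Collector copies obj to Tospace; both indirection fields then name the copy."""
    obj = forward(obj)
    if obj is not None and not obj.in_tospace:
        obj.indirect = BrooksObj(obj.fields, in_tospace=True)
        obj = obj.indirect
    return obj


def read_field(obj, i):
    return forward(obj).fields[i]


def update(obj, i, value):
    # Write barrier: the stored value is forwarded before it is installed.
    # Assumption: it is also evacuated, so a Tospace object never gains a
    # pointer to an uncopied Fromspace object.
    forward(obj).fields[i] = evacuate(value)
```

Every access pays for one extra indirection, but no test is needed to decide whether an object has already been copied.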
2) North and Reppy
A further improved version of Brooks's variation was introduced by North and Reppy in 1987. They use Brooks's indirection pointers but store them all together in a separate data space. This reduces the space overhead, since each indirection pointer is stored only once, while the original method stores it twice. The time overhead increases slightly, though, because this pointer space itself has to be garbage collected.
3) Dawson
Dawson tries to reduce the conservativeness of Baker's algorithm, in which newly allocated objects are colored black. Dawson's improvement is to allocate new objects in Fromspace whenever possible and to color them white. The next cycle of the collector is also started as soon as possible, right after the last one has finished.
Fig. 11 Heap layout for Dawson's collector
4) MultiLisp implementation by Halstead
Halstead tried to adapt Baker's algorithm so that it could be used in a multi-processor environment with Concert MultiLisp. Several processors can access the same memory. Each processor has its own garbage collection thread and its own Tospace and Fromspace regions. Since they all share the same memory, garbage collection has to be synchronized between all collectors. Halstead's solution adds a lock bit to each pointer and each object, allowing all the collectors to run consistently.
XV. DYNAMIC REGROUPING
Static regrouping is used to compact the data, bringing related objects nearer to each other. This should improve runtime efficiency, since paging should be needed less often. With an incremental garbage collector, a dynamic regrouping strategy becomes conceivable as an additional improvement; it can execute quite efficiently, since it can be interwoven with the rest of the garbage collector.
Dynamic regrouping is a technique proposed by Baker and Dawson. It is another improved version of Baker's copying collector that additionally tries to reorganize the data dynamically to increase the locality of reference.
The original algorithm is adapted so that the mutator always puts its objects (newly allocated elements as well as those evacuated from Fromspace by the read-barrier) at the top end of Tospace, at position T. The scanner uses an additional pointer, previousB (see Figure 11 above for the layout of Tospace). It scans at that position whenever possible instead of at scanB. That way, data structures are traversed in a depth-first manner, which linearizes the data.
This is especially useful for LISP programs that make heavy use of lists, for which the cdr pointer is followed first and the car pointer only later. With dynamic regrouping, LISP programs can have their lists linearized, which should improve execution speed significantly, while the technique described here adds little overhead. It is simply a by-product of Baker's algorithm: the only major difference is that the scanner has to decide whether to continue scanning at previousB or at scanB. That is just one additional comparison, and it has the further advantage of not affecting the mutator in any way.
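The effect of depth-first traversal on a list can be illustrated in Python by comparing breadth-first and depth-first visit orders. The function `copy_order` is hypothetical and simplifies matters by assuming an object is placed in Tospace when it is visited; it approximates the scanB and previousB strategies rather than modelling the pointers themselves.

```python
from collections import deque


def copy_order(root, children, depth_first):
    """Order in which objects are visited, taken as their order in Tospace.

    children(obj) lists obj's pointer fields in scanning order (cdr before car).
    Breadth-first approximates scanning at scanB, depth-first scanning at previousB.
    """
    order, seen = [], {root}
    pending = deque([root])
    while pending:
        obj = pending.pop() if depth_first else pending.popleft()
        order.append(obj)
        kids = [c for c in children(obj) if c is not None and c not in seen]
        seen.update(kids)
        # For depth-first, push in reverse so the first child (cdr) is visited next
        pending.extend(reversed(kids) if depth_first else kids)
    return order
```

For a three-cell list c1, c2, c3 whose cars point to atoms a1, a2, a3, breadth-first traversal yields c1, c2, a1, c3, a2, a3, while depth-first traversal places the whole spine together: c1, c2, c3, a3, a2, a1.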
Another approach was taken by Court. His Temporal Garbage Collector used the fact that even when a program uses a large amount of heap space, only a smaller fraction of this data is heavily used during a typical section of its run. Using a simple regrouping strategy, he aimed to place that fraction together while more or less ignoring the part of memory that is used less often or not at all.
He did so by letting the user run a training session of the most frequently used functions of the system. That session told the collector which data was really used and which was not. During the session, a read-barrier copied all used objects into Tospace while the unused ones stayed in Fromspace; the scanner was turned off and did not evacuate any objects. At the end of the training run, all the data used was in Tospace, while the unused data still resided in Fromspace. The objects in Tospace were marked as static and were not copied during subsequent collection runs. This training set was also stored on disk for use in future sessions.
While the idea behind it is quite good, its major drawback is of course that it is still a simple static method. It neither employs a dynamic regrouping strategy nor adapts to changes in the user's behaviour, and it relies strictly on data that may have been collected years earlier. Nonetheless, Court observed that paging time for his programs was reduced by 30 to 50 percent when applying this technique.
He later combined this method with an adaptive training strategy
that used a series of shorter training sessions, one at each start
of the collector. First, a small amount of data was trained without
the scanner running; the rest of the collection then continued as
usual. This generated a smaller set of frequently used data that
was, however, more closely related to the functions currently in
use. By also applying further improvements, he reached an overall
decrease in paging time of 75 to 80 percent for his programs. These
results were later confirmed by Johnson and Llames, although not as
dramatically as described by Court.
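The adaptive variant can be sketched the same way, again as a hypothetical Python model rather than Court's code: heap maps each object to the objects it references, reads lists the objects the mutator accesses at the start of a collection, and budget is an assumed limit on how many objects the short training phase may evacuate. The trained objects land at the front of Tospace; afterwards the scanner runs as in an ordinary Cheney-style copying collection, which also scans the trained objects.

```python
def collect_adaptive(heap, roots, reads, budget):
    """One collection that starts with a short training phase.
    Returns the Tospace layout as a list of objects."""
    to_space, copied = [], set()

    def evacuate(obj):
        if obj not in copied:        # copy each object only once
            copied.add(obj)
            to_space.append(obj)

    # Phase 1: scanner off; the read-barrier evacuates objects the mutator reads
    # until the training budget is used up. (Assumes every read object is live.)
    for obj in reads:
        if len(to_space) >= budget:
            break
        evacuate(obj)

    # Phase 2: the rest of the collection continues as usual: evacuate the
    # roots, then scan Tospace from the start, including the trained objects.
    for root in roots:
        evacuate(root)
    scan = 0
    while scan < len(to_space):
        for ref in heap[to_space[scan]]:
            evacuate(ref)
        scan += 1
    return to_space
```

For the heap `{'r': ['a', 'b'], 'a': ['c'], 'b': ['d'], 'c': [], 'd': []}` with root r, a budget of 2 and the mutator reading d, c and a, the sketch places d and c at the start of Tospace, followed by r, a and b in scan order. With a budget of 0 it degenerates to a plain breadth-first copy.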
LIST OF FIGURES
1. Possible failure of tricolor marking system
2. Mutator updates pointer from B to C
3. Yuasa's snapshot write-barrier
4. Dijkstra's write-barrier
5. Steele's write-barrier
6. Dijkstra's quadratic complexity
7. Modula-2+ concurrent reference counting architecture
8. Baker's Tospace layout
9. Lazy scanning
10. Brooks's forwarding pointers
11. Heap layout for Dawson's collector
REFERENCES
[1] H. G. Baker, "List processing in real time on a serial computer," Commun. ACM, vol. 21, no. 4, pp. 280-294, 1978.
[2] L. P. Deutsch and D. G. Bobrow, "An efficient incremental automatic garbage collector," Commun. ACM, vol. 19, no. 9, pp. 522-526, Sept. 1976.
[3] H. G. Baker, "List processing in real-time on a serial computer," Commun. ACM, vol. 21, no. 4, pp. 280-294, 1978.
[4] D. M. Russinoff, "A mechanically verified incremental garbage collector," Formal Aspects of Computing, vol. 6, pp. 359-390, 1994.
[5] C. Queinnec et al., "Mark DURING Sweep rather than Mark THEN Sweep," Lecture Notes in Computer Science, vol. 365, 1989.
[6] G. L. Steele, "Multiprocessing compactifying garbage collection," Commun. ACM, vol. 18, no. 9, pp. 495-508, Sept. 1975.
[7] E. W. Dijkstra et al., "On-the-fly garbage collection: An exercise in cooperation," Commun. ACM, vol. 21, no. 11, pp. 965-975, Nov. 1978.
[8] B. Zorn, "Barrier methods for garbage collection," University of Colorado at Boulder, Tech. Rep. CU-CS-494-90, 1990.
[9] L. Huelsbergen and P. Winterbottom, "Very concurrent mark-&-sweep garbage collection without fine-grain synchronization," in Proc. Symposium on Memory Management, Vancouver, 1998, pp. 166-175.
[10] Schartner, "Mark & Sweep," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.
[11] R. Schatz, "Incremental Garbage Collection II," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.
[12] T. Würthinger, "Incremental Garbage Collection III," in Seminar aus Softwareentwicklung: Garbage Collection, 2006.