CSEP505: Programming Languages
Lecture 10: OOP; Memory Mgmt; Wrap-Up
Dan Grossman
Winter 2009
12 March 2009 CSE P505 Winter 2009 Dan Grossman 2
Last time
• Key novelty / semantic difference of OOP is dynamic dispatch
  – Defined by self mapping to “whole current object”
    • The method’s “receiver”
• Investigating the “extensibility problem” with canonical example:
  – Abstract class Exp with subclasses IntExp, AddExp, …
  – Exp has methods for interp, typecheck, toInt, …
The Grid
            interp   typecheck   toInt   …
IntExp      Code     Code        Code    Code
AddExp      Code     Code        Code    Code
MultExp     Code     Code        Code    Code
…           Code     Code        Code    Code
1 new function: a new column
1 new class: a new row
Back to MultExp
• Even in OOP, MultExp is easy to add, but you’ll copy the typecheck method of AddExp
• Or maybe AddExp extends MultExp, but it’s a kludge
• Or maybe refactor into BinaryExp with subclasses AddExp and MultExp
  – So much for not changing existing code
  – Fairly heavyweight approach to a helper function
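A minimal Python sketch of the BinaryExp refactoring, to make the trade-off concrete. Only the class and method names Exp, IntExp, AddExp, MultExp, BinaryExp, interp and typecheck come from the slides; the string "int" as a type and the constructor shapes are assumptions for illustration.

```python
from abc import ABC, abstractmethod

class Exp(ABC):
    @abstractmethod
    def interp(self) -> int: ...
    @abstractmethod
    def typecheck(self) -> str: ...

class IntExp(Exp):
    def __init__(self, n: int):
        self.n = n
    def interp(self) -> int:
        return self.n
    def typecheck(self) -> str:
        return "int"

class BinaryExp(Exp):
    """Holds the typecheck code that AddExp and MultExp would otherwise copy."""
    def __init__(self, left: Exp, right: Exp):
        self.left, self.right = left, right
    def typecheck(self) -> str:
        if self.left.typecheck() != "int" or self.right.typecheck() != "int":
            raise TypeError("operands must be int")
        return "int"

class AddExp(BinaryExp):
    def interp(self) -> int:
        return self.left.interp() + self.right.interp()

class MultExp(BinaryExp):
    def interp(self) -> int:
        return self.left.interp() * self.right.interp()

e = AddExp(IntExp(2), MultExp(IntExp(3), IntExp(4)))
```

Note that introducing BinaryExp meant editing AddExp's superclass, which is exactly the “changing existing code” cost the slide points out.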
Remaining OO plan
• Meaning of type-safety for OO
• Why are subtyping and subclassing separate concepts worth keeping separate?
• Multiple inheritance; multiple interfaces
• Static overloading
• Multimethods
• Revenge of bounded polymorphism
Typechecking
We were sloppy: talked about types without “what are we preventing”
1. In pure OO, stuck if we need to interpret v.m(v1,…,vn) and v has no m method (taking n args)
   • “No such method” error
2. Also if ambiguous: multiple methods with same name and there is no “best choice”
   • “No best match” error
   • Will arise with static overloading and multimethods
Subtyping vs. subclassing
• Often convenient confusion: C a subtype of D if and only if C a subclass of D
  – But self is covariant; the key type system difference
• But more subtypes are sound
  – If A has every field and method that B has (at appropriate types), then can subsume A to B
  – Interfaces help, but require explicit annotation
• And fewer subtypes could allow more code reuse…
Non-subtyping example
Pt2 ≤ Pt1 is unsound here:
class Pt1 extends Object {
  int x;
  int get_x() { x }
  bool compare(Pt1 p) { p.get_x() == self.get_x() }
}
class Pt2 extends Pt1 {
  int y;
  int get_y() { y }
  bool compare(Pt2 p) { // override
    p.get_x() == self.get_x() && p.get_y() == self.get_y()
  }
}
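A rough Python transliteration of the example, showing how treating Pt2 as a subtype of Pt1 gets a program stuck. Python is dynamically typed, so the “no such method” error appears as an AttributeError at run time; the helper same_as_origin is a hypothetical client, not part of the slides.

```python
class Pt1:
    def __init__(self, x):
        self.x = x
    def get_x(self):
        return self.x
    def compare(self, p):
        return p.get_x() == self.get_x()

class Pt2(Pt1):
    def __init__(self, x, y):
        super().__init__(x)
        self.y = y
    def get_y(self):
        return self.y
    def compare(self, p):  # override: silently assumes p is a Pt2
        return p.get_x() == self.get_x() and p.get_y() == self.get_y()

def same_as_origin(p):
    """Hypothetical client written against Pt1: hands a Pt1 to p.compare."""
    return p.compare(Pt1(0))

ok = same_as_origin(Pt1(0))       # fine: Pt1.compare only needs get_x
try:
    same_as_origin(Pt2(0, 0))     # Pt2.compare sends get_y to a Pt1
    stuck = False
except AttributeError:
    stuck = True
```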
What happened
• Could inherit code without being a subtype
• Cannot always do this
  – What if get_x called self.compare with a Pt1?
  Possible solutions:
  – Re-typecheck get_x in subclass
  – Use a really fancy type system
  – Don’t override compare
• Moral: Not suggesting “subclassing not subtyping” is useful, but the concepts of inheritance and subtyping are orthogonal
Multiple inheritance
Why not allow C extends C1,…,Cn {…}
  – and C≤C1, …, C≤Cn
What everyone agrees on: C++ has it, Java doesn’t
We’ll just consider some problems it introduces and how (multiple) interfaces avoid some of them
Problem sources:
1. Class hierarchy is a dag, not a tree
2. Type hierarchy is a dag, not a tree
Diamonds
• If C extends C1, C2 and C1, C2 have a common (transitive) superclass D, we have a diamond
  – Always have one with multiple inheritance and a topmost class (Object)
• If D has a field f, does C have one field or two?
  – C++ answer: yes
• If D has a method m, C1 and C2 will have a clash
  – Also possible without a diamond
• If subsumption is coercive (changing method-lookup), then how we subsume from C to D affects run-time behavior (incoherent)
Diamonds are common, largely due to types like Object with methods like equals
Method-name clash
What if C extends C1, C2 which both define m?
Possibilities:
1. Reject declaration of C
   • Too restrictive with diamonds
2. Require C overrides m
   • Possibly with directed resends
3. “Left-side” (C1) wins
   • Question: does cast to C2 change what m means?
4. C gets both methods (implies incoherent subtyping)
5. Other?
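As one data point, Python's multiple inheritance picks option 3 by default and allows option 2. The sketch below uses the slide's names C, C1, C2 and m; COverride is a made-up name for the overriding variant, and calling C1.m(self) explicitly stands in for a “directed resend”.

```python
class C1:
    def m(self):
        return "C1.m"

class C2:
    def m(self):
        return "C2.m"

class C(C1, C2):           # option 3: the left-most parent wins in Python's lookup order
    pass

class COverride(C1, C2):   # option 2: override m, resending to each parent by name
    def m(self):
        return C1.m(self) + "+" + C2.m(self)

lookup_order = [k.__name__ for k in C.__mro__]
```

Python has no casts that change method lookup, so the slide's question about casting to C2 does not arise there; other languages answer it differently.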
Implementation issues
• Multiple-inheritance semantics often muddied by wanting “efficient member lookup”
  – If “efficient” is compile-time offset from self pointer, then multiple inheritance means subsumption must “bump the pointer”
  – Roughly why C++ has different sorts of casts
• Preaching: Better to think
  – semantically first: how should subsumption affect the behavior of method-lookup
  – implementationally second: what can I optimize based on the class/type hierarchy
Digression: casts
A “cast” can mean too many different things (cf. C++):
Language-level:
• Upcast: no run-time effect
• Downcast: failure or no run-time effect
• Conversion: key question is round-tripping
• “Reinterpret bits”: not well-defined
Implementation level:
• Upcast: usually no run-time effect, but see multiple inheritance
• Downcast: check the tag, maybe fail, but see multiple inheritance
• Conversion: same as at language level
• “Reinterpret bits”: no effect (by definition)
Least supertypes
[Related to homework 4 challenge problem]
For e1 ? e2 : e3
  – e2 and e3 need the same type
  – But that just means a common supertype
  – But which one? (The least one)
    • But multiple inheritance means may not exist!
Common solution:
• Reject without explicit cast on e2 and/or e3
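A small Python sketch of why a least supertype may not exist. The classes I, J, A, B and both helper functions are hypothetical; the point is only that A and B share two incomparable supertypes, so neither is “least”.

```python
class I: pass
class J: pass
class A(I, J): pass
class B(I, J): pass

def common_supertypes(s, t):
    """All classes that are superclasses of both s and t."""
    return [c for c in s.__mro__ if issubclass(t, c)]

def minimal_common_supertypes(s, t):
    """Common supertypes that have no strictly smaller common supertype."""
    common = common_supertypes(s, t)
    return [c for c in common
            if not any(d is not c and issubclass(d, c) for d in common)]

# If e2 has type A and e3 has type B, there are two equally good candidates.
candidates = minimal_common_supertypes(A, B)
```

With single inheritance the list always has exactly one element, which is why the problem only shows up with multiple inheritance or multiple interfaces.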
Multiple inheritance summary
1. Diamond issues (coherence issues, shared (?) fields)
2. Method clashes (what does inheriting m mean)
3. Implementation issues (slower method lookup)
4. Least supertypes (may not exist)
Multiple interfaces have issues (3) and (4)
  – Again, an interface is just a named type
  – Provides no implementation (method or field definition)
Static overloading
• So far: Assume every method name unique
  – Same name in subclass meant override
• Many OO languages allow same name, different argument types:
    A f(B b) {…}
    C f(D d, E e) {…}
    F f(G g, H h) {…}
• Changes method-lookup definition for e.m(e1,…,en)
  – Old: method-lookup a (meta)function of the class of the object e evaluates to (at run-time)
  – New: method-lookup a (meta)function of the class of the object e evaluates to (at run-time) and the types of e1,…,en (at compile-time)
Ambiguity
Because of subtyping, multiple methods can match!
“Best match” rules are complicated. One rough idea:
  – Fewer subsumptions is better match
  – If tied, subsume to immediate supertypes & recur
Ambiguities remain (no best match)
1. A f(B) or C f(B) (usually disallowed)
2. A f(B) or A f(C) and f(e) where e has a subtype of B and C but B and C are incomparable
3. A f(B,C) or A f(C,B) and f(e1,e2) where e1 and e2 have type B and B ≤ C
Multimethods
Static overloading mostly saves keystrokes
  – Shorter method names
  – Name-mangling on par with syntactic sugar
  – But sometimes can comment out a method and program still type-checks with different run-time behavior due to different compile-time method resolution
Multiple (dynamic) dispatch (a.k.a. multimethods) much more interesting: Method lookup for e.m(e1,…,en) is a (meta)function of the classes of the objects that e and e1,…,en evaluate to (at run-time)
A natural generalization: “receiver” no longer special
So may as well write m(e1,…,en) instead of e1.m(e2,…,en)
Multimethods example
class A { int f; }
class B extends A { int g; }
bool compare(A x, A y) { x.f==y.f }
bool compare(B x, B y) { x.f==y.f && x.g==y.g }
bool f(A x, A y, A z) { compare(x,y) && compare(y,z) }
• compare(x,y) calls first version unless both arguments are Bs
  – Could add “one of each” methods if you want different behavior
• f has fairly surprising behavior
  – But still more useful than with static overloading?
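Python has no built-in multimethods, so the sketch below hand-rolls a tiny dispatcher to mimic the example. The MultiMethod class, its register decorator, and the NoBestMatch exception are assumptions made for illustration; only A, B, compare and f come from the slide.

```python
class A:
    def __init__(self, f):
        self.f = f

class B(A):
    def __init__(self, f, g):
        super().__init__(f)
        self.g = g

class NoBestMatch(Exception):
    pass

class MultiMethod:
    """Dispatches on the run-time classes of all arguments."""
    def __init__(self):
        self.cases = []                    # list of (parameter classes, function)

    def register(self, *types):
        def add(fn):
            self.cases.append((types, fn))
            return fn
        return add

    def __call__(self, *args):
        applicable = [(ts, fn) for ts, fn in self.cases
                      if len(ts) == len(args)
                      and all(isinstance(a, t) for a, t in zip(args, ts))]
        for ts, fn in applicable:
            # Best match: at least as specific as every applicable case.
            if all(all(issubclass(t, u) for t, u in zip(ts, us))
                   for us, _ in applicable):
                return fn(*args)
        raise NoBestMatch(args)

compare = MultiMethod()

@compare.register(A, A)
def _compare_a_a(x, y):
    return x.f == y.f

@compare.register(B, B)
def _compare_b_b(x, y):
    return x.f == y.f and x.g == y.g

def f(x, y, z):
    return compare(x, y) and compare(y, z)
```

The “surprising” behavior of f shows up when the middle argument is a plain A: f(B(1, 2), A(1), B(1, 3)) uses the first compare twice and returns True even though the two Bs differ in g.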
Pragmatics; UW
Not clear where multimethods should be defined
• No longer “belong to a class” because receiver not special
Multimethods are “more OO” because dynamic-dispatch is the essence of OO
Multimethods are “less OO” because without distinguished receiver the “analogy to physical objects” is reduced
A couple papers:
  – Millstein got a UW PhD around multimethods for Java
    • UW a long-time multimethods leader
  – Nice summary and “where really used” Noble OOPSLA08
Revenge of ambiguity
• Like static overloading, multimethods have “no best match” problems
• Unlike static overloading, the problem does not arise until run-time!
Possible solutions:
1. Run-time exception
2. Always define a best-match (e.g., Dylan)
3. A conservative type system
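A sketch of solution 1, assuming a hypothetical best_match helper that picks the most specific applicable signature from the run-time classes of the arguments. The classes B and C with B ≤ C mirror the earlier f(B,C) / f(C,B) ambiguity; everything else is made up for illustration.

```python
class C: pass
class B(C): pass          # B ≤ C

def best_match(cases, args):
    """Return the unique most-specific applicable signature, or raise."""
    applicable = [ts for ts in cases
                  if all(isinstance(a, t) for a, t in zip(args, ts))]
    best = [ts for ts in applicable
            if all(all(issubclass(t, u) for t, u in zip(ts, us))
                   for us in applicable)]
    if len(best) != 1:
        raise LookupError("no best match")
    return best[0]

cases = [(B, C), (C, B)]
fine = best_match(cases, (B(), C()))       # only (B, C) applies

try:
    best_match(cases, (B(), B()))          # both apply, neither is more specific
    ambiguous = False
except LookupError:
    ambiguous = True

# Adding a (B, B) case gives the ambiguous call a best match.
resolved = best_match(cases + [(B, B)], (B(), B()))
```

The failure depends on the classes of the actual arguments, which is why a call site can work for a long time before the exception ever appears.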
Still want generics
OO subtyping no replacement for parametric polymorphism
So have both
Example:
/* 3 type constructors (e.g., Int Set a type) */
interface ’a Comparable { Int f(’a,’a); }
interface ’a Predicate { Bool f(’a); }
class ’a Set {
  …
  constructor(’a Comparable x) {…}
  unit add(’a x) {…}
  ’a Set functional_add(’a x) {…}
  ’a find(’a Predicate x) {…}
}
Worse ambiguity
“Interesting” interaction with overloading or multimethods
class B {
  Int f(Int C x) {1}
  Int f(String C x) {2}
  Int g(’a x) { self.f(x) }
}
Whether match is found depends on instantiation of ’a
Cannot resolve static overloading at compile-time without code duplication
At run-time, need run-time type information
  – Including instantiation of type constructors
  – Or restrict overloading enough to avoid it
Wanting bounds
As expected, with subtyping and generics, want bounded polymorphism
Example:
interface Printable { unit print(); }
class (’a ≤ Printable) Logger {
  ’a item;
  ’a get() { item.print(); item }
}
w/o polymorphism, get would return a Printable (not useful)
w/o the bound, get could not send print to item
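The same Logger idea can be approximated with Python's typing module, where a TypeVar with bound= plays the role of ’a ≤ Printable. This is only a sketch: the bound is enforced by a static checker such as mypy, not by the Python run-time, and the Report class is a made-up client.

```python
from typing import Generic, Protocol, TypeVar

class Printable(Protocol):
    def print(self) -> None: ...

T = TypeVar("T", bound=Printable)      # stands in for 'a ≤ Printable

class Logger(Generic[T]):
    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        self.item.print()              # allowed only because of the bound
        return self.item               # static type is T, not merely Printable

class Report:                          # hypothetical class that has a print method
    def __init__(self, title: str) -> None:
        self.title = title
        self.printed = 0
    def print(self) -> None:
        self.printed += 1
    def pages(self) -> int:
        return 3

r = Logger(Report("q1")).get()
num_pages = r.pages()                  # type-checks because get returned a Report
```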
Fancy example
With forethought, can use bounds to avoid some subtyping limitations
(Example lifted from Abadi/Cardelli text; I would have never thought of this)
/* Herbivore1 ≤ Omnivore1 unsound */
interface Omnivore1 { unit eat(Food); }
interface Herbivore1 { unit eat(Veg); }
/* T Herbivore2 ≤ T Omnivore2 sound for any T */
interface (’a≤Food) Omnivore2 { unit eat(’a); }
interface (’a≤Veg) Herbivore2 { unit eat(’a); }
/* subtyping lets us pass herbivores to feed but only if food is a Veg */
unit feed(’a food, ’a Omnivore animal) {
  animal.eat(food);
}
You have grading to do…
I am going to distribute course evaluation forms so you may rate the quality of this course. Your participation is voluntary, and you may omit specific items if you wish. To ensure confidentiality, do not write your name on the forms. There is a possibility your handwriting on the yellow written comment sheet will be recognizable; however, I will not see the results of this evaluation until after the quarter is over and you have received your grades. Please be sure to use a No. 2 PENCIL ONLY on the scannable form.
I have chosen _______ to distribute and collect the forms. When you are finished, he/she will collect the forms, put them into an envelope and mail them to the Office of Educational Assessment. If there are no questions, I will leave the room and not return until all the questionnaires have been finished and collected. Thank you for your participation.
From the beginning
Problem:
1. Why do we need memory management?
   • Same reason for any finite reusable resource
2. What does safety mean? (What is guaranteed?)
3. What is drag?
Solutions:
1. How does tracing garbage collection (GC) work?
2. What other ways for safe memory management?
   a. Unique pointers
   b. (Automatic) reference-counting
   c. Regions
Why reuse?
• Values/objects/code take up space
• Using too much space slows down programs
  – Eventually they stop (memory exhaustion)
• Optimal space: reclaim immediately after last use
  – Earlier is incorrect (dangling-pointer dereference)
  – Drag is time between last use and reclamation
• But:
  – Last-use undecidable
  – Batched reclamation can gain time for space
The view from C/C++
• Stack objects reclaimed at end of block/function
• Heap objects reclaimed with call to free/delete
  – Drag can still exist
• Dangling-pointers fine; dereferencing them unsafe
  – “Double-free” also unsafe
• Unreclaimed objects that become unreachable will:
  – Never be used
  – Never be reclaimed
  – So drag until termination (“space leak”)
Reachability
Reachability soundly approximates “may be used again”
Inductive definition (transitive “points to”):
• Global variables reachable
• Unreclaimed stack objects reachable
  – Liveness analysis can do a bit better
• Objects pointed to by reachable objects are reachable
C: Avoid leaks by freeing before unreachable
Garbage-collected language: Make things unreachable
Reachability is an approximation that works well in practice
Reachability and leaks
• GC’d languages reclaim unreachable objects
  – So by some definitions “leaks are impossible”
  – Like by some definitions deadlock with atomic is impossible
• But “infinite drag times” are possible
  – Example: large unused data structure in a global
• Programming for space in GC’d languages
  – Usually ignore the issue
  – Set pointers to null when done with them
    • Error-prone!
  – Use weak pointers where appropriate
    • Provided as a language feature, dereference can fail
• Compiler-writer should also consider if optimizations are “safe for space”
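Python's standard weakref module is one example of weak pointers as a language feature. The sketch below assumes CPython, where dropping the last strong reference reclaims the object promptly; the explicit gc.collect() call is there as a precaution, and the Cache class is a placeholder.

```python
import gc
import weakref

class Cache:
    """Placeholder for a large data structure we do not want to keep alive."""
    pass

big = Cache()
weak = weakref.ref(big)            # a weak pointer: does not keep `big` reachable

alive_before = weak() is big       # dereference succeeds while a strong pointer exists

big = None                         # drop the only strong pointer
gc.collect()

alive_after = weak() is not None   # dereference now "fails" by returning None
```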
Where are we
Problem:
1. Why do we need memory management?
   • Same reason for any finite reusable resource
2. What does safety mean?
3. What is drag?
Solutions:
1. How does garbage collection (GC) work?
2. What other ways for safe memory management?
   a. Unique pointers
   b. (Automatic) reference-counting
   c. Regions
Reachability, cont’d
Algorithm sketch to find all reachable objects:
• Start at roots (globals and stack objects)
• Follow all pointers, but do not go around cycles
Problems:
• Find all pointers in pointed-to object
  – How big is the object?
  – What fields are integers?
• Avoid cycles (solution depends on GC technique)
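The algorithm sketch can be written as a short worklist traversal. This is a toy model, not a real collector: the heap is assumed to be a dictionary from object names to the names they point to, which sidesteps the “find all pointers” problems listed above.

```python
def reachable(roots, heap):
    """heap maps each object id to the list of ids it points to."""
    seen = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in seen:            # already visited: do not go around cycles
            continue
        seen.add(obj)
        worklist.extend(heap[obj]) # follow all pointers out of obj
    return seen

# a -> b -> c -> a is a cycle; d points into it but nothing points to d or e.
heap = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"], "e": []}
live = reachable(["a"], heap)
```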
Finding sizes
Garbage collector must know an object’s size
  – free/delete need to know too!
Solutions:
• A header word (e.g., before object) with the size
  – Class pointer can “serve double-duty”
• Size segregation and a global table of “page to size”
Bottom line:
• Allocator and/or compiler must collaborate with GC
Finding pointers
Does the GC know which fields/roots are pointers?
• Yes: accurate GC
• No: conservative GC
Theory: With conservative GC, “one unlucky int” could keep huge amount of data
Practice: Conservative GC tends to work
Accurate GC techniques:
• Class-pointer can “serve triple-duty”
• Low-order bit tricks (e.g., Caml ints are 31-bits)
Conservative GC for C
Yes, you can (conservatively) GC a C program
• The Boehm-Demers-Weiser conservative collector
2 of many interesting details:
• Use collector’s malloc (so GC knows the size)
• Possible b/c C bans code most people think is legal:
void f() {
  int * p = malloc(100*sizeof(int));
  int * q = p + 1000; // not allowed
  q[-950] = 17;
  int * r = p + 100;  /* allowed */
  r[-50] = 17;
}
Compile-time flag to “add a byte or keep 2 objects”
Semispace copying collection
• Divide memory into 2 equal-size contiguous pieces
• Allocate objects into one space until full
  – Easy and fast: “bump an allocation-pointer”
• Now have a full from-space & an empty to-space
  – Copy reachable objects into end of to-space
  – Set allocation-pointer just past them in to-space
  – Restart the program (semispaces reversed roles)
Wait a minute
Skimmed over key details
• We moved objects; must update all pointers to them
• Must avoid cycles
• The GC can run without much extra space (good)
How:
• “Cheney queue” just two pointers in to-space
  – Objects to scan (update pointers and maybe add pointed-to objects to queue)
• Cycle avoidance: forwarding-pointers in from-space
  – Easy to tell what space is pointed-to
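A toy simulation of a copying collection with a Cheney queue and forwarding pointers. The representation is an assumption made for illustration: each space is a Python list, an object is a list of fields, a pointer field is a Ptr holding an index into the current space, and any other field is a plain integer.

```python
from typing import NamedTuple

class Ptr(NamedTuple):
    addr: int        # index of an object in the current space

class Forward(NamedTuple):
    addr: int        # left behind in from-space: where the object moved to

def collect(from_space, roots):
    """Copy everything reachable from roots; return (to_space, new_roots)."""
    to_space = []                            # allocation pointer is len(to_space)

    def forward(p):
        obj = from_space[p.addr]
        if isinstance(obj, Forward):         # already copied: reuse the new address
            return Ptr(obj.addr)
        new_addr = len(to_space)
        to_space.append(list(obj))           # copy fields (still old pointers)
        from_space[p.addr] = Forward(new_addr)
        return Ptr(new_addr)

    new_roots = [forward(r) for r in roots]
    scan = 0                                 # the queue is to_space[scan:]
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ptr):       # update pointer, maybe enqueue target
                obj[i] = forward(field)
        scan += 1
    return to_space, new_roots

# Objects 0 and 1 form a reachable cycle; 2 and 3 are garbage (3 points to itself).
from_space = [[1, Ptr(1)], [2, Ptr(0)], [99], [3, Ptr(3)]]
to_space, new_roots = collect(from_space, [Ptr(1)])
```

The two “pointers in to-space” from the slide are scan and the end of the list: everything before scan has had its fields updated, everything after it is still waiting.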
Mark-sweep collection
• Allocate objects until you have almost no room left
• Mark all reachable objects (bit in header word)
  – Avoid cycle by checking bit
• Sweep through memory
  – If object unmarked, reclaim it
  – If object marked, unmark it
No 2x space and no moving objects, but…
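A toy mark-sweep pass in the same spirit. The Obj class, with a marked attribute standing in for the header bit, and the list used as the heap are assumptions for illustration; a real sweep would return unmarked space to the allocator rather than drop list entries.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []        # pointers to other Obj instances
        self.marked = False     # stands in for the mark bit in the header word

def mark(roots):
    stack = list(roots)         # explicit stack instead of recursion
    while stack:
        obj = stack.pop()
        if obj.marked:          # checking the bit also avoids going around cycles
            continue
        obj.marked = True
        stack.extend(obj.fields)

def sweep(heap):
    """Return the surviving objects, unmarked and ready for the next collection."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        # Unmarked objects are dropped; their space would be reused.
    return survivors

a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.fields.append(b); b.fields.append(a)    # reachable cycle
c.fields.append(d); d.fields.append(c)    # unreachable cycle
heap = [a, b, c, d]
mark([a])
heap = sweep(heap)
names = [o.name for o in heap]
```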
Wait another minute
• In practice, if more than 2/3 of objects or so are reachable, you spend lots of time in GC
• Allocation is complicated
  – Must find enough space for the new object
  – Fragmentation can hurt performance
    • Or exhaust memory before copying GC does
• No “Cheney” queue, so GC needs an explicit stack or low-level cleverness to run in little space
Generational
Copying and mark-sweep from about 1960
Generational GC a key mid-80s optimization because
• Most objects die young
• Most old objects never get mutated to point to young
How:
• Allocate in a nursery
• Empty nursery has no pointers into it!
• Fill nursery like in copying collection
• Also track mutations to record pointers into nursery
  – Yet another reason to avoid mutation (slower)
• To collect nursery, ignore rest of heap except recorded pointers
Some more terms
Just sketched the basics of copying and mark-sweep
And the orthogonal issue of generations
Some other terms worth knowing:
• Incremental GC: do a little bit on each allocation
  – Avoid large pause times
• Concurrent GC (collector thread in parallel with the program)
• Parallel GC (multiple collector threads)
• Lots of other important tricks:
  – lazy-sweeping, large-object spaces, …
GC Summary
Great survey paper: Paul R. Wilson. Uniprocessor Garbage Collection Techniques. International Workshop on Memory Management 1992
• Programmer must know about reachability, that objects may move, that mutation may cost, etc.
• GC implementor must try to do well without knowing the application’s memory behavior
  – But done by memory-system experts!
  – One-size-fits-most
Now forget GC
Idioms that avoid dangling-pointer dereferences
  – And languages and/or types to enforce them!
  – A language can have more than one
  – More work than GC, but safer than unchecked malloc/free
Worth knowing just for the idioms
Unique pointers
• If p is the only pointer to o, then free(p) can’t lead to dangling-pointer dereferences provided *p is not used afterwards
• Unique-pointers allow only trees (no dags or cycles)
• Maintaining uniqueness invariant
  – Dynamic: destructive-reads
    • p=q and free(q) set q to null
  – Static: linear type systems and/or flow analysis
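The dynamic approach can be modeled in a few lines. Python is garbage-collected, so this only illustrates the destructive-read discipline; the Unique class and its take and deref methods are hypothetical names.

```python
class Unique:
    """A slot holding the only pointer to its target (or None)."""
    def __init__(self, target=None):
        self._target = target

    def take(self):
        """Destructive read: hand the pointer out and null this slot."""
        target, self._target = self._target, None
        return target

    def deref(self):
        if self._target is None:
            raise ValueError("pointer was moved or freed")
        return self._target

q = Unique([1, 2, 3])
p = Unique(q.take())       # models p = q: afterwards q is null, p is the only pointer

value = p.deref()
try:
    q.deref()              # a checked failure instead of a dangling dereference
    q_failed = False
except ValueError:
    q_failed = True
```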
Reference-counting
(Dynamic) ref-counting basics:
• Store number of pointers to object with object
• If count goes to zero, free it
Can automate this easily enough
But:
• Cycles never get reclaimed unless programmer breaks the cycle
  – Or cycles are eventually detected via other techniques
• Expensive without tricks (e.g., “deferred ref-counting”)
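A hand-rolled reference-counting model showing both the basic scheme and the cycle problem. The Cell class and the inc, dec and set_next helpers are assumptions for illustration; “freeing” is modeled by appending the cell's name to a log.

```python
class Cell:
    def __init__(self, name, freed):
        self.name = name
        self.count = 0           # number of pointers to this cell
        self.next = None
        self.freed = freed       # shared log of reclaimed cell names

def inc(cell):
    if cell is not None:
        cell.count += 1

def dec(cell):
    if cell is None:
        return
    cell.count -= 1
    if cell.count == 0:          # last pointer gone: free it, then release its field
        cell.freed.append(cell.name)
        child, cell.next = cell.next, None
        dec(child)

def set_next(cell, target):
    inc(target)
    old, cell.next = cell.next, target
    dec(old)

freed = []

# Acyclic list a -> b with one root pointer to a.
a, b = Cell("a", freed), Cell("b", freed)
inc(a); set_next(a, b)
dec(a)                           # drop the root: both cells are reclaimed

# Cycle c -> d -> c with one root pointer to c.
c, d = Cell("c", freed), Cell("d", freed)
inc(c); set_next(c, d); set_next(d, c)
dec(c)                           # drop the root: counts stay at 1, nothing is freed
leaked = list(freed)

set_next(d, None)                # programmer breaks the cycle by hand
```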
Regions
• A decades-old idiom also known as zones, arenas, …
• Partition memory into regions; every object in one region
• API basics
  – new_region returns a handle
  – new_object takes a handle
  – free_region takes a handle
• No free_object
What did we do
• Accomplished nothing if we put every object in a different region
• But now intra-region pointers “can’t go wrong”
  – Programmer puts objects with similar lifetimes in same region
  – To avoid leaks, just don’t lose the handle
• For inter-region pointers, options:
  – Dynamic ref-count (see RC or RTSJ)
  – Type-system to restrict “what points where” and when pointers can be dereferenced (see Cyclone)
A common idiom
• Far too painful in C: caller knows lifetime of result, callee knows size and structure of result
  – Leads to evil stack-allocated buffers
• Region solution: a region-handle argument
  – Easy even if result is some complicated graph
result_t g(handle_t, …);
void f() {
  handle_t h = new_region();
  result_t r = g(h,…);
  /* compute with r */
  free_region(h);
}
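The same idiom modeled in Python, purely to show the shape of the API. Python manages memory itself, so free_region here only marks the region dead and drops its object list; the Region class, the dictionary-shaped objects, and the second parameter of g are assumptions for illustration.

```python
class Region:
    def __init__(self):
        self.objects = []
        self.live = True

def new_region():
    return Region()

def new_object(handle, value):
    if not handle.live:
        raise RuntimeError("region already freed")
    handle.objects.append(value)
    return value

def free_region(handle):
    handle.objects.clear()       # every object in the region goes at once
    handle.live = False

def g(handle, n):
    """Callee decides size and structure; the caller's region decides lifetime."""
    return [new_object(handle, {"id": i}) for i in range(n)]

def f():
    h = new_region()
    r = g(h, 3)
    total = sum(node["id"] for node in r)    # compute with r
    free_region(h)
    return total, h.live
```

There is deliberately no free_object: the only way to reclaim anything is to free the whole region through its handle.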
Course summary
• Defining languages is hard but worth it
  – Interpretation vs. translation
  – Inference rules vs. a PL for the metalanguage
• Essential features we investigated
  – Mutable variables (and loops)
  – Higher-order functions, scope
  – Pairs and sums
  – Threads (and locks and channels)
  – Objects
• Types restrict programs (that’s a good thing!)
  – But want polymorphism for reuse and abstraction
Penultimate slide
• We avoided:
  – Subjective non-science (“I like curly braces”)
  – Real-world issues (“cool libraries/tricks in language X”)
• Focused on:
  – Concepts that almost every language has, including the next fad that doesn’t exist yet
  – Connections (objects and closures are different, but not totally different)
  – Reference implementations, not fast or industrial-strength ones