CSEP505: Programming Languages Lecture 10: OOP; Memory Mgmt; Wrap-Up Dan Grossman Winter 2009.


12 March 2009 CSE P505 Winter 2009 Dan Grossman

Last time

• Key novelty / semantic difference of OOP is dynamic dispatch
  – Defined by self mapping to “whole current object”
    • The method’s “receiver”

• Investigating the “extensibility problem” with canonical example:
  – Abstract class Exp with subclasses IntExp, AddExp, …
  – Exp has methods for interp, typecheck, toInt, …


The Grid

           interp   typecheck   toInt   …
IntExp     Code     Code        Code    Code
AddExp     Code     Code        Code    Code
MultExp    Code     Code        Code    Code
…          Code     Code        Code    Code

1 new function = 1 new column

1 new class = 1 new row


Back to MultExp

• Even in OOP, MultExp is easy to add, but you’ll copy the typecheck method of AddExp

• Or maybe AddExp extends MultExp, but it’s a kludge

• Or maybe refactor into BinaryExp with subclasses AddExp and MultExp
  – So much for not changing existing code
  – Fairly heavyweight approach to a helper function


Remaining OO plan

• Meaning of type-safety for OO

• Why subtyping and subclassing are separate concepts worth keeping separate

• Multiple inheritance; multiple interfaces

• Static overloading

• Multimethods

• Revenge of bounded polymorphism


Typechecking

We were sloppy: talked about types without saying “what are we preventing”

1. In pure OO, stuck if we need to interpret v.m(v1,…,vn) and v has no m method (taking n args)
   • “No such method” error

2. Also if ambiguous: multiple methods with same name and there is no “best choice”
   • “No best match” error
   • Will arise with static overloading and multimethods


Subtyping vs. subclassing

• Often a convenient confusion: C is a subtype of D if and only if C is a subclass of D
  – But self is covariant; the key type-system difference

• But more subtypes are sound
  – If A has every field and method that B has (at appropriate types), then subsume A to B
  – Interfaces help, but require explicit annotation

• And fewer subtypes could allow more code reuse…


Non-subtyping example

Pt2 ≤ Pt1 is unsound here:

class Pt1 extends Object {
  int x;
  int get_x() { x }
  bool compare(Pt1 p) { p.get_x() == self.get_x() }
}
class Pt2 extends Pt1 {
  int y;
  int get_y() { y }
  bool compare(Pt2 p) { // override
    p.get_x() == self.get_x() && p.get_y() == self.get_y()
  }
}


What happened

• Could inherit code without being a subtype
• Cannot always do this
  – What if get_x called self.compare with a Pt1?

Possible solutions:
  – Re-typecheck get_x in subclass
  – Use a really fancy type system
  – Don’t override compare

• Moral: Not suggesting “subclassing not subtyping” is useful, but the concepts of inheritance and subtyping are orthogonal





Multiple inheritance

Why not allow C extends C1,…,Cn {…}
  – and C ≤ C1, …, C ≤ Cn

What everyone agrees on: C++ has it, Java doesn’t

We’ll just consider some problems it introduces and how (multiple) interfaces avoid some of them

Problem sources:

1. Class hierarchy is a dag, not a tree

2. Type hierarchy is a dag, not a tree


Diamonds

• If C extends C1, C2 and C1, C2 have a common (transitive) superclass D, we have a diamond
  – Always have one with multiple inheritance and a topmost class (Object)
• If D has a field f, does C have one field or two?
  – C++ answer: yes (one if C1 and C2 inherit from D virtually, two otherwise)
• If D has a method m, C1 and C2 will have a clash
  – Also possible without a diamond
• If subsumption is coercive (changing method-lookup), then how we subsume from C to D affects run-time behavior (incoherent)

Diamonds are common, largely due to types like Object with methods like equals


Method-name clash

What if C extends C1, C2 which both define m?

Possibilities:

1. Reject declaration of C
   • Too restrictive with diamonds

2. Require C to override m
   • Possibly with directed resends

3. “Left-side” (C1) wins
   • Question: does cast to C2 change what m means?

4. C gets both methods (implies incoherent subtyping)

5. Other?


Implementation issues

• Multiple-inheritance semantics often muddied by wanting “efficient member lookup”
  – If “efficient” is compile-time offset from self pointer, then multiple inheritance means subsumption must “bump the pointer”
  – Roughly why C++ has different sorts of casts

• Preaching: Better to think
  – semantically first: how should subsumption affect the behavior of method-lookup
  – implementationally second: what can I optimize based on the class/type hierarchy


Digression: casts

A “cast” can mean too many different things (cf. C++):

Language-level:
• Upcast: no run-time effect
• Downcast: failure or no run-time effect
• Conversion: key question is round-tripping
• “Reinterpret bits”: not well-defined

Implementation level:
• Upcast: usually no run-time effect, but see multiple inheritance
• Downcast: check the tag, maybe fail, but see multiple inheritance
• Conversion: same as at language level
• “Reinterpret bits”: no effect (by definition)


Least supertypes

[Related to homework 4 challenge problem]

For e1 ? e2 : e3
  – e2 and e3 need the same type
  – But that just means a common supertype
  – But which one? (The least one)

• But with multiple inheritance, a least supertype may not exist!

Common solution:
• Reject without explicit cast on e2 and/or e3


Multiple inheritance summary

1. Diamond issues (coherence issues, shared (?) fields)

2. Method clashes (what does inheriting m mean)

3. Implementation issues (slower method lookup)

4. Least supertypes (may not exist)

Multiple interfaces have issues (3) and (4)
  – Again, an interface is just a named type
  – Provides no implementation (method or field definition)





Static overloading

• So far: Assume every method name unique
  – Same name in subclass meant override

• Many OO languages allow same name, different argument types:
    A f(B b) {…}
    C f(D d, E e) {…}
    F f(G g, H h) {…}

• Changes method-lookup definition for e.m(e1,…,en)
  – Old: method-lookup a (meta)function of the class of the object e evaluates to (at run-time)
  – New: method-lookup a (meta)function of the class of the object e evaluates to (at run-time) and the types of e1,…,en (at compile-time)


Ambiguity

Because of subtyping, multiple methods can match!

“Best match” rules are complicated. One rough idea:
  – Fewer subsumptions is better match
  – If tied, subsume to immediate supertypes & recur

Ambiguities remain (no best match):

1. A f(B) or C f(B) (usually disallowed)

2. A f(B) or A f(C) and f(e) where e has a subtype of B and C but B and C are incomparable

3. A f(B,C) or A f(C,B) and f(e1,e2) where e1 and e2 have type B and B ≤ C


Multimethods

Static overloading mostly saves keystrokes
  – Shorter method names
  – Name-mangling on par with syntactic sugar
  – But sometimes can comment out a method and program still type-checks with different run-time behavior due to different compile-time method resolution

Multiple (dynamic) dispatch (a.k.a. multimethods) much more interesting: method lookup for e.m(e1,…,en) is a (meta)function of the classes of the objects e and e1,…,en evaluate to (at run-time)

A natural generalization: “receiver” no longer special

So may as well write m(e1,…,en) instead of e1.m(e2,…,en)


Multimethods example

• compare(x,y) calls first version unless both arguments are Bs
  – Could add “one of each” methods if you want different behavior

• f has fairly surprising behavior
  – But still more useful than with static overloading?

class A { int f; }
class B extends A { int g; }
bool compare(A x, A y) { x.f==y.f }
bool compare(B x, B y) { x.f==y.f && x.g==y.g }
bool f(A x, A y, A z) { compare(x,y) && compare(y,z) }


Pragmatics; UW

Not clear where multimethods should be defined
• No longer “belong to a class” because receiver not special

Multimethods are “more OO” because dynamic dispatch is the essence of OO

Multimethods are “less OO” because without a distinguished receiver the “analogy to physical objects” is reduced

A couple papers:
  – Millstein got a UW PhD around multimethods for Java
    • UW a long-time multimethods leader
  – Nice summary and “where really used”: Noble, OOPSLA08


Revenge of ambiguity

• Like static overloading, multimethods have “no best match” problems

• Unlike static overloading, the problem does not arise until run-time!

Possible solutions:

1. Run-time exception

2. Always define a best-match (e.g., Dylan)

3. A conservative type system





Still want generics

OO subtyping is no replacement for parametric polymorphism
So have both

Example:

/* 3 type constructors (e.g., Int Set a type) */
interface ’a Comparable { Int f(’a,’a); }
interface ’a Predicate { Bool f(’a); }
class ’a Set {
  …
  constructor(’a Comparable x){…}
  unit add (’a x) {…}
  ’a Set functional_add(’a x) {…}
  ’a find (’a Predicate x) {…}
}


Worse ambiguity

“Interesting” interaction with overloading or multimethods

class B {
  Int f(Int C x){1}
  Int f(String C x){2}
  Int g(’a x) { self.f(x) }
}

Whether a match is found depends on the instantiation of ’a

Cannot resolve static overloading at compile-time without code duplication

At run-time, need run-time type information
  – Including instantiation of type constructors
  – Or restrict overloading enough to avoid it


Wanting bounds

As expected, with subtyping and generics, want bounded polymorphism

Example:

interface Printable { unit print(); }
class (’a ≤ Printable) Logger {
  ’a item;
  ’a get() { item.print(); item }
}

w/o polymorphism, get would return a Printable (not useful)

w/o the bound, get could not send print to item


Fancy example

With forethought, can use bounds to avoid some subtyping limitations

(Example lifted from Abadi/Cardelli text; I would have never thought of this)

/* Herbivore1 ≤ Omnivore1 unsound */
interface Omnivore1 { unit eat(Food); }
interface Herbivore1 { unit eat(Veg); }
/* T Herbivore2 ≤ T Omnivore2 sound for any T */
interface (’a≤Food) Omnivore2 { unit eat(’a); }
interface (’a≤Veg) Herbivore2 { unit eat(’a); }
/* subtyping lets us pass herbivores to feed but only if food is a Veg */
unit feed(’a food, ’a Omnivore2 animal) {
  animal.eat(food);
}


You have grading to do…

I am going to distribute course evaluation forms so you may rate the quality of this course. Your participation is voluntary, and you may omit specific items if you wish. To ensure confidentiality, do not write your name on the forms. There is a possibility your handwriting on the yellow written comment sheet will be recognizable; however, I will not see the results of this evaluation until after the quarter is over and you have received your grades. Please be sure to use a No. 2 PENCIL ONLY on the scannable form.

I have chosen _______ to distribute and collect the forms. When you are finished, he/she will collect the forms, put them into an envelope and mail them to the Office of Educational Assessment. If there are no questions, I will leave the room and not return until all the questionnaires have been finished and collected. Thank you for your participation.


From the beginning

Problem:
1. Why do we need memory management?
   • Same reason as for any finite reusable resource
2. What does safety mean? (What is guaranteed?)
3. What is drag?

Solutions:
1. How does tracing garbage collection (GC) work?
2. What other ways are there for safe memory management?
   a. Unique pointers
   b. (Automatic) reference-counting
   c. Regions


Why reuse?

• Values/objects/code take up space

• Using too much space slows down programs
  – Eventually they stop (memory exhaustion)

• Optimal space: reclaim immediately after last use
  – Earlier is incorrect (dangling-pointer dereference)
  – Drag is the time between last use and reclamation

• But:
  – Last use is undecidable
  – Batched reclamation can buy time at the cost of space


The view from C/C++

• Stack objects reclaimed at end of block/function

• Heap objects reclaimed with call to free/delete
  – Drag can still exist

• Dangling pointers are fine; dereferencing them is unsafe
  – “Double-free” also unsafe

• Unreclaimed objects that become unreachable will:
  – Never be used
  – Never be reclaimed
  – So drag until termination (“space leak”)


Reachability

Reachability soundly approximates “may be used again”

Inductive definition (transitive “points to”):
• Global variables are reachable
• Unreclaimed stack objects are reachable
  – Liveness analysis can do a bit better
• Objects pointed to by reachable objects are reachable

C: Avoid leaks by freeing objects before they become unreachable

Garbage-collected language: Make things unreachable

Reachability is an approximation that works well in practice


Reachability and leaks

• GC’d languages reclaim unreachable objects
  – So by some definitions “leaks are impossible”
  – Like by some definitions deadlock with atomic is impossible

• But “infinite drag times” are possible
  – Example: large unused data structure in a global

• Programming for space in GC’d languages
  – Usually ignore the issue
  – Set pointers to null when done with them
    • Error-prone!
  – Use weak pointers where appropriate
    • Provided as a language feature; dereference can fail

• Compiler-writer should also consider if optimizations are “safe for space”




Reachability, cont’d

Algorithm sketch to find all reachable objects:
• Start at roots (globals and stack objects)
• Follow all pointers, but do not go around cycles

Problems:
• Find all pointers in pointed-to object
  – How big is the object?
  – What fields are integers?
• Avoid cycles (solution depends on GC technique)


Finding sizes

Garbage collector must know an object’s size
  – free/delete need to know too!

Solutions:
• A header word (e.g., before object) with the size
  – Class pointer can “serve double-duty”
• Size segregation and a global table of “page to size”

Bottom line:
• Allocator and/or compiler must collaborate with GC


Finding pointers

Does the GC know which fields/roots are pointers?
• Yes: accurate GC
• No: conservative GC

Theory: With conservative GC, “one unlucky int” could keep a huge amount of data alive

Practice: Conservative GC tends to work

Accurate GC techniques:
• Class pointer can “serve triple-duty”
• Low-order bit tricks (e.g., Caml ints are 31 bits on 32-bit machines)


Conservative GC for C

Yes, you can (conservatively) GC a C program
• The Boehm-Demers-Weiser conservative collector

2 of many interesting details:
• Use the collector’s malloc (so the GC knows the size)
• Possible b/c C bans code most people think is legal:

#include <stdlib.h>
void f() {
  int * p = malloc(100*sizeof(int));
  int * q = p + 1000;  // not allowed: more than one past the end
  q[-950] = 17;
  int * r = p + 100;   /* allowed: exactly one past the end */
  r[-50] = 17;
}

Compile-time flag to “add a byte or keep 2 objects” (a one-past-the-end pointer like r may also point at the start of the next object)


Semispace copying collection

• Divide memory into 2 equal-size contiguous pieces
• Allocate objects into one space until full
  – Easy and fast: “bump an allocation-pointer”
• Now have a full from-space & an empty to-space
  – Copy reachable objects into end of to-space
  – Set allocation-pointer just past them in to-space
  – Restart the program (semispaces reversed roles)


Wait a minute

Skimmed over key details:
• We moved objects; must update all pointers to them
• Must avoid cycles
• The GC can run without much extra space (good)

How:
• “Cheney queue”: just two pointers in to-space
  – Objects to scan (update pointers and maybe add pointed-to objects to queue)
• Cycle avoidance: forwarding-pointers in from-space
  – Easy to tell what space is pointed-to


Mark-sweep collection

• Allocate objects until you have almost no room left
• Mark all reachable objects (bit in header word)
  – Avoid cycles by checking bit
• Sweep through memory
  – If object unmarked, reclaim it
  – If object marked, unmark it

No 2x space and no moving objects, but…


Wait another minute

• In practice, if more than 2/3 of objects or so are reachable, you spend lots of time in GC

• Allocation is complicated
  – Must find enough space for the new object
  – Fragmentation can hurt performance
    • Or exhaust memory before copying GC does

• No “Cheney” queue, so GC needs an explicit stack or low-level cleverness to run in little space


Generational

Copying and mark-sweep from about 1960
Generational GC a key mid-80s optimization because
• Most objects die young
• Most old objects never get mutated to point to young

How:
• Allocate in a nursery
• Empty nursery has no pointers into it!
• Fill nursery like in copying collection
• Also track mutations to record pointers into nursery
  – Yet another reason to avoid mutation (slower)
• To collect nursery, ignore rest of heap except recorded pointers


Some more terms

Just sketched the basics of copying and mark-sweep

And the orthogonal issue of generations

Some other terms worth knowing:
• Incremental GC: do a little bit on each allocation
  – Avoid large pause times
• Concurrent GC (collector thread in parallel with the program)
• Parallel GC (multiple collector threads)
• Lots of other important tricks:
  – lazy-sweeping, large-object spaces, …


GC Summary

Great survey paper: Paul R. Wilson. Uniprocessor Garbage Collection Techniques. International Workshop on Memory Management, 1992

• Programmer must know about reachability, that objects may move, that mutation may cost, etc.

• GC implementor must try to do well without knowing the application’s memory behavior
  – But done by memory-system experts!
  – One-size-fits-most




Now forget GC

Idioms that avoid dangling-pointer dereferences
  – And languages and/or types to enforce them!
  – A language can have more than one
  – More work than GC, but safer than unchecked malloc/free

Worth knowing just for the idioms


Unique pointers

• If p is the only pointer to o, then free(p) can’t lead to dangling-pointer dereferences provided *p is not used afterwards

• Unique pointers allow only trees (no dags or cycles)

• Maintaining the uniqueness invariant
  – Dynamic: destructive reads
    • p=q and free(q) set q to null
  – Static: linear type systems and/or flow analysis


Reference-counting

(Dynamic) ref-counting basics:
• Store number of pointers to object with object
• If count goes to zero, free it

Can automate this easily enough

But:
• Cycles never get reclaimed unless programmer breaks the cycle
  – Or cycles are eventually detected via other techniques
• Expensive without tricks (e.g., “deferred ref-counting”)


Regions

• A decades-old idiom also known as zones, arenas, …

• Partition memory into regions; every object is in one region

• API basics
  – new_region returns a handle
  – new_object takes a handle
  – free_region takes a handle

• No free_object


What did we do

• Accomplished nothing if we put every object in a different region
• But now intra-region pointers “can’t go wrong”
  – Programmer puts objects with similar lifetimes in same region
  – To avoid leaks, just don’t lose the handle
• For inter-region pointers, options:
  – Dynamic ref-count (see RC or RTSJ)
  – Type system to restrict “what points where” and when pointers can be dereferenced (see Cyclone)


A common idiom

• Far too painful in C: caller knows lifetime of result, callee knows size and structure of result
  – Leads to evil stack-allocated buffers

• Region solution: a region-handle argument
  – Easy even if result is some complicated graph

result_t g(handle_t, …);
void f() {
  handle_t h = new_region();
  result_t r = g(h,…);
  /* compute with r */
  free_region(h);
}


Course summary

• Defining languages is hard but worth it
  – Interpretation vs. translation
  – Inference rules vs. a PL for the metalanguage

• Essential features we investigated
  – Mutable variables (and loops)
  – Higher-order functions, scope
  – Pairs and sums
  – Threads (and locks and channels)
  – Objects

• Types restrict programs (that’s a good thing!)
  – But want polymorphism for reuse and abstraction


Penultimate slide

• We avoided:
  – Subjective non-science (“I like curly braces”)
  – Real-world issues (“cool libraries/tricks in language X”)

• Focused on:
  – Concepts that almost every language has, including the next fad that doesn’t exist yet
  – Connections (objects and closures are different, but not totally different)
  – Reference implementations, not fast or industrial-strength ones


Questions?
