Binary Trees 15-211 Fundamental Data Structures and Algorithms Peter Lee January 23, 2002
Plan
Today: Review of binary trees, and some analysis
Reading: For today: Chapter 19.1-19.3. For next time: Chapter 19.4, 20
Reminder: HW1 due on Monday!
Trees are Everywhere
CS is upside down
root
leaves
Trees
a
b c d
e f
depth=2(“height”)
root
nodesnode label
parent
children leaves
Trees, more abstractly
A tree is a graph (usually directed) with the following characteristics:
There is a distinguished node called the root node.
Every non-root node has exactly one parent node (the root has none).
Unique parents
a
b c d
e f
root
Trees are everywhere
Tree structures are everywhere in life.
As a result, in computer programs, trees turn out to be one of the most commonly used data structures.
Arithmetic Expressions
+
* 5
2 7
Organization charts
Origins of natural languages
Origins of life
(figure: phylogenetic tree of the animal phyla, including Porifera (sponges), Cnidaria (jellyfish, anemones, corals), Ctenophora (comb-jellies), Arthropoda, Annelida, Mollusca, Platyhelminthes (flatworms), Chordata (vertebrates and relatives), Echinodermata, and others)
Taxonomies
Rectangle
Square
Parallelogram Ellipse
Circle
Shape
Triangle
Tournament structure
Game trees
Directory structure
/afs
cs andrew
acs course usr
15 18
113 211
usr
Trees, inductively
A tree is
• empty, or
• a node containing a set of trees (its children)
Alternatively, a tree is
• a leaf node (no children), or
• a node containing a set of trees (its children)
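The inductive definition translates almost directly into a recursive class. The following is a minimal sketch, not code from the course: the names TreeNode, add, size, and depth are made up for illustration, and it assumes that e and f in the earlier picture are children of b.

```java
import java.util.ArrayList;
import java.util.List;

// A general (non-binary) tree node: a label plus a list of child subtrees.
class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }

    // Adds a child and returns this node, so calls can be chained.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }

    // A node counts itself plus the sizes of all of its subtrees.
    int size() {
        int n = 1;
        for (TreeNode c : children) n += c.size();
        return n;
    }

    // Depth counted in edges, so a single leaf has depth 0.
    int depth() {
        int d = 0;
        for (TreeNode c : children) d = Math.max(d, 1 + c.depth());
        return d;
    }
}

class TreeDemo {
    // Root a with children b, c, d; b has children e and f (assumed placement).
    static TreeNode example() {
        TreeNode b = new TreeNode("b").add(new TreeNode("e")).add(new TreeNode("f"));
        return new TreeNode("a").add(b).add(new TreeNode("c")).add(new TreeNode("d"));
    }

    public static void main(String[] args) {
        TreeNode t = example();
        System.out.println("size = " + t.size() + ", depth = " + t.depth());
    }
}
```

Both size and depth follow the shape of the definition: one case for the node itself, one recursive call per child.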
Binary trees
Let's focus on binary trees
A binary tree is either
• empty (we'll write nil for clarity), or
• looks like (x,L,R) where x is an object, and L, R are binary subtrees
In pictures
x
LR
nil
Binary tree nodes in Java
class BinaryNode {
    private Object element;
    private BinaryNode left, right;

    public BinaryNode() {…}

    public Object getElement() {…}
    public BinaryNode getLeft() {…}
    public BinaryNode getRight() {…}
    public void setElement(Object x) {…}
    public void setLeft(BinaryNode t) {…}
    public void setRight(BinaryNode t) {…}
    …
}
From Weiss, pg 579:
Quick exercise
flat(T) = e,b,f,a,d,e
a
b d
e f d
T
Write the code for a new “flat” method (also called “inorder”):
public LinkedList flat(T);
(you may assume a “join” operation on LinkedLists)
A solution
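public LinkedList flat(BinaryNode t) {
    if (t == null)                   // empty tree: base case
        return new LinkedList();     // an empty list, so join never sees null
    LinkedList l = flat(t.getLeft());
    LinkedList r = flat(t.getRight());
    l.add(t.getElement());           // the root goes between the two sublists
    return l.join(r);                // "join" is the assumed concatenation operation
}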
Remember to think inductively!
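For a version that compiles against the standard library, here is a hedged sketch that uses java.util.LinkedList, with addAll standing in for the assumed "join". FlatNode and Flatten are illustrative names, not the Weiss BinaryNode class.

```java
import java.util.LinkedList;

// Minimal immutable binary tree node (illustrative, not the Weiss class).
class FlatNode {
    final Object element;
    final FlatNode left, right;

    FlatNode(Object element, FlatNode left, FlatNode right) {
        this.element = element;
        this.left = left;
        this.right = right;
    }
}

class Flatten {
    // Inorder traversal: left subtree, then the root, then the right subtree.
    static LinkedList<Object> flat(FlatNode t) {
        LinkedList<Object> result = new LinkedList<>();
        if (t == null) return result;      // empty tree gives an empty list
        result.addAll(flat(t.left));
        result.add(t.element);
        result.addAll(flat(t.right));      // addAll plays the role of "join"
        return result;
    }

    public static void main(String[] args) {
        FlatNode t = new FlatNode("a",
                new FlatNode("b", new FlatNode("e", null, null), new FlatNode("f", null, null)),
                new FlatNode("d", null, null));
        System.out.println(flat(t));
    }
}
```

Returning an empty list rather than null in the base case keeps the inductive step free of special cases.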
Binary search trees (BSTs)
A binary tree T is a binary search tree iff
flat(T) is an ordered sequence.
Equivalently, in (x,L,R) all the nodes in L are less than x, and all the nodes in R are larger than x.
flat(T) = 2,3,4,5,6,7,9
Example
        5
      /   \
     3     7
    / \   / \
   2   4 6   9
Binary search
How does one search in a BST?
Inductively:
search(x,nil) = false
search(x,(x,L,R)) = true
search(x,(a,L,R)) = search(x,L) if x<a
search(x,(a,L,R)) = search(x,R) if x>a
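The four equations map onto a short recursive method. This is a sketch under simplifying assumptions: keys are plain ints, and IntNode and BstSearch are illustrative names rather than course code.

```java
// BST node holding an int key (illustrative).
class IntNode {
    final int key;
    final IntNode left, right;

    IntNode(int key, IntNode left, IntNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstSearch {
    static boolean search(int x, IntNode t) {
        if (t == null) return false;       // search(x,nil) = false
        if (x == t.key) return true;       // search(x,(x,L,R)) = true
        return x < t.key
                ? search(x, t.left)        // x < a: only L can contain x
                : search(x, t.right);      // x > a: only R can contain x
    }

    // The example BST whose inorder sequence is 2,3,4,5,6,7,9.
    static IntNode example() {
        return new IntNode(5,
                new IntNode(3, new IntNode(2, null, null), new IntNode(4, null, null)),
                new IntNode(7, new IntNode(6, null, null), new IntNode(9, null, null)));
    }

    public static void main(String[] args) {
        System.out.println(search(6, example()));
        System.out.println(search(8, example()));
    }
}
```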
Bentley on Binary Search
Quote from Jon Bentley, "Programming Pearls", page 35, 36 (slightly re-worded to use Java syntax):
“Given a sorted array A[0] <= A[1] <=...<= A[n-1], we want to determine if a given element T is in the array. Binary search solves the problem by keeping track of a range within the array in which T must be if it is anywhere in the array. Initially the range is the entire array. The range is shrunk by comparing its middle element to T, and then discarding half the range. The process continues until T is found, or until the range in which it must lie is known to be empty. In an n-element table, the search uses roughly log2(n) comparisons.
Bentley, cont’d
“I've assigned this problem [binary search] in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert the above description [of binary search] into a program in the language of their choice... At the end of the period, most programmers reported that they had written correct code for the task. We would then take 30 minutes to examine their code... In several classes, and with over 100 programmers, the results varied little: 90% of the programmers found bugs in their programs.
Bentley, cont’d
“I was amazed: given ample time, only about 10 percent of professional programmers were able to get this small program right. But they aren't the only ones....Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962.”
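For reference, one way to turn Bentley's description into code is sketched below. It is an illustration only, not Bentley's program; ArraySearch and contains are made-up names, and the array is assumed to hold ints in nondecreasing order.

```java
class ArraySearch {
    // Returns true if target occurs in the sorted array a.
    static boolean contains(int[] a, int target) {
        int lo = 0, hi = a.length - 1;     // invariant: target, if present, lies in a[lo..hi]
        while (lo <= hi) {                 // the range is non-empty
            int mid = lo + (hi - lo) / 2;  // midpoint without overflowing lo + hi
            if (a[mid] == target) return true;
            if (a[mid] < target) lo = mid + 1;   // discard the lower half
            else hi = mid - 1;                   // discard the upper half
        }
        return false;                      // the range became empty
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 4, 5, 6, 7, 9};
        System.out.println(contains(a, 6));
        System.out.println(contains(a, 8));
    }
}
```

The usual trouble spots are the loop condition and the two range updates; stating the invariant in a comment makes them easier to check.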
Searching
Search for 6 in the example tree:
• = 6? at the root 5: no, and 6 > 5, so go right
• = 6? at 7: no, and 6 < 7, so go left
• = 6? at 6: yes, found
Decisions, decisions
The tree just implements the possible sequences of decisions we would have made in ordinary binary search on an array.
5
3 7
< >=
It's binary since the = part causes immediate termination.
Binary search
search(x,nil) = false
search(x,(x,L,R)) = true
search(x,(a,L,R)) = search(x,L) if x<a
search(x,(a,L,R)) = search(x,R) if x>a
should return something useful…
What is the important “step” to count?
Real life
In most applications, we do not store simple elements, but pairs (key,value).
We search for a key, and want the corresponding value returned if the key is found, and "No" otherwise.
The ADT is called a dictionary.
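A dictionary built on a BST might look like the sketch below. Everything here is an assumption made for illustration: int keys, String values, the names DictNode, BstDictionary, put, and get, null standing in for "No", and overwriting the value when a key is inserted twice.

```java
// Node storing a (key,value) pair, ordered by key.
class DictNode {
    final int key;
    String value;
    DictNode left, right;

    DictNode(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class BstDictionary {
    private DictNode root;

    void put(int key, String value) {
        root = put(root, key, value);
    }

    private DictNode put(DictNode t, int key, String value) {
        if (t == null) return new DictNode(key, value);   // new leaf
        if (key < t.key) t.left = put(t.left, key, value);
        else if (key > t.key) t.right = put(t.right, key, value);
        else t.value = value;      // existing key: overwrite (an assumption)
        return t;
    }

    // Returns the value stored under key, or null if the key is absent.
    String get(int key) {
        DictNode t = root;
        while (t != null) {
            if (key == t.key) return t.value;
            t = key < t.key ? t.left : t.right;
        }
        return null;               // the "No" answer
    }

    public static void main(String[] args) {
        BstDictionary d = new BstDictionary();
        d.put(5, "five");
        d.put(3, "three");
        d.put(7, "seven");
        System.out.println(d.get(3));
        System.out.println(d.get(4));
    }
}
```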
Correctness
Clearly, search() can never return a false positive answer. But search() only walks down one branch, so how do we know we don't get false negative answers?
Suppose T is a BST that contains x. Claim: search(x,T) properly returns "true".
Proof (by induction)
T cannot be nil, so suppose T = (a,L,R).
Case 1: x = a: done.
Case 2: x < a: Since T is a BST, x must be in L.
But by induction (on trees), search(x,L) returns true. Done.
Case 3: x > a: same as case 2.
Insertions
Insertions in a BST are very similar to searching: find the right spot, and then put down the new element as a new leaf.
We will not allow multiple insertions of the same element, so there is always exactly one place for the new guy.
Deletions
How should we perform deletions?
Next time…
How Many?
How many decisions do we have to make before we have either found the element, or know it's not in the tree?
Why do we care?
versus
How Many?
So the number of decisions seems to be the important “step” to count.
In the worst case, the number of such steps is related to the depth of the tree.
Good Tree
But in a "good" BST we have depth of T = O( log # nodes )
Unfortunately, in the easiest implementation, BSTs aren’t always good.
We’ll see next week how to maintain “goodness” in our BSTs.
Insertion, inductively
Insertion into a BST:
insert(x,nil) = (x,nil,nil)
insert(x,(y,L,R)) =
    (y,insert(x,L),R), if x<y
    (y,L,insert(x,R)), if x>y
    (y,L,R), if x=y
In what kinds of situations can insert create “bad” BSTs?
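The sketch below follows the inductive definition literally, returning a new tree instead of mutating the old one, and measures depth to compare insertion orders. InsNode, BstInsert, build, and depth are illustrative names; depth is counted in edges, with the empty tree given depth -1 as a convention chosen for this sketch.

```java
// Immutable BST node with an int key (illustrative).
class InsNode {
    final int key;
    final InsNode left, right;

    InsNode(int key, InsNode left, InsNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstInsert {
    static InsNode insert(int x, InsNode t) {
        if (t == null) return new InsNode(x, null, null);                     // insert(x,nil)
        if (x < t.key) return new InsNode(t.key, insert(x, t.left), t.right); // x<y
        if (x > t.key) return new InsNode(t.key, t.left, insert(x, t.right)); // x>y
        return t;                                                             // x=y: no duplicates
    }

    // Depth in edges; the empty tree has depth -1 by the convention used here.
    static int depth(InsNode t) {
        return t == null ? -1 : 1 + Math.max(depth(t.left), depth(t.right));
    }

    // Inserts the keys one at a time, in the order given.
    static InsNode build(int... keys) {
        InsNode t = null;
        for (int k : keys) t = insert(k, t);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(depth(build(4, 2, 6, 1, 3, 5, 7)));   // mixed order
        System.out.println(depth(build(1, 2, 3, 4, 5, 6, 7)));   // sorted order
    }
}
```

With seven keys, the mixed order gives depth 2, while the already-sorted order gives depth 6: every new key goes to the right of the previous one, so the tree degenerates into a list.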
Why do we care?
versus
What is the height?
Logarithms and exponents
Logarithms and exponents are everywhere in algorithm analysis
log_b(a) = c if a = b^c
Logarithms and exponents
Usually will leave off the base b when b=2, so for example
log 1024 = 10
Some useful equalities
log_b(ac) = log_b(a) + log_b(c)
log_b(a/c) = log_b(a) - log_b(c)
log_b(a^c) = c log_b(a)
log_b(a) = (log_c(a)) / (log_c(b))
(b^a)^c = b^(ac)
b^a b^c = b^(a+c)
b^a / b^c = b^(a-c)
Logarithms and trees
In a “perfect” BST containing n nodes…
• What is the height?
• How many nodes are there at level i, at each height?
• How many steps does it take to search?
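One way to work these out, assuming that height is counted in edges (as in the depth=2 picture earlier), that the root is at level 0, and that "perfect" means every level is completely full:

```latex
\begin{align*}
\text{nodes at level } i &= 2^i \qquad (i = 0, 1, \dots, h) \\
n &= \sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \\
h &= \log_2(n+1) - 1 \\
\text{nodes visited by a search, worst case} &= h + 1 = \log_2(n+1)
\end{align*}
```

For example, the seven-node tree with root 5 has h = 2, and a search inspects at most 3 nodes.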
Constant factors
“My computer is 4 times faster than yours.”
So what?
“Big-Oh” notation
(figure: running time plotted against N; the curve cf(N) lies above T(N) for all N beyond n0)
T(N) = O(f(N))
“T(N) is order f(N)”
“Big-Oh” notation
Given a function T(N): T(N) = O(f(N)) if
there is a positive real constant c and an integer constant n0 such that T(N) ≤ cf(N) for all N ≥ n0.
c is called the constant factor.
Big-Oh
When T(N) = O(f(N)), we are saying that T(N) grows no faster than f(N).
I.e., f(N) describes an upper bound on T(N).
Put another way: For “large enough” inputs, cf(N) always dominates T(N).
Called the asymptotic behavior
Big-O characteristic
If T(N) = cf(N) then T(N) = O(f(N))
Constant factors “don’t matter”
Because of this, when T(N) = O(cg(N)), we usually drop the constant and just say O(g(N))
Big-O characteristic
Suppose T(N) = k, for some constant k
Then T(N) = O(1)
Why?
because c*1 > k, for some c
Big-O characteristic
More interesting:
Suppose T(N) = 20N^3 + 10N log N + 5
Then T(N) = O(N^3)
Lower-order terms “don’t matter”
Question: What constants c and n0 can be used to show that the above is true?
Answer: c=35, n0=1
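A short check of why these constants work, assuming base-2 logarithms and using the facts that log N ≤ N^2 and 1 ≤ N^3 whenever N ≥ 1:

```latex
\begin{align*}
T(N) &= 20N^3 + 10N\log N + 5 \\
     &\le 20N^3 + 10N \cdot N^2 + 5N^3 && \text{for all } N \ge 1 \\
     &= 35N^3
\end{align*}
```

So T(N) ≤ cN^3 holds with c = 35 for every N ≥ n0 = 1. Other pairs of constants would work as well; the definition only asks that some pair exists.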
From last time…
We calculated this running time for reverse()
So, in big-Oh terms: O(n^2)
Big-O characteristic
If T1(N) = O(f(N)) and T2(N) = O(g(N)) then
T1(N) + T2(N) = max(O(f(N)), O(g(N))).
The bigger task always dominates eventually.
Also: T1(N) * T2(N) = O(f(N) * g(N)).
Some common functions
(chart: 10N, 100 log N, 5N^2, N^3, and 2^N plotted for N from 1 to 10)
Big-Oh is imprecise
Let T(N) = 100 log N
Then T(N) = O(log N)
And T(N) = O(N^2)
And T(N) = O(N^3)
And T(N) = O(2^N)
Tight bounds
Because of this imprecision, we normally try to find the “tightest” bound on the number of steps.
So, while it is true that reverse() is O(2^n), it is more useful to use the tighter bound of O(n^2).
Big-O characteristics
log^k(N) = O(N) for any constant k.
I.e., logarithms grow very slowly.
(chart: log N and N plotted for N from 1 to 10)
Note
There is a bit of a mismatch because we are counting “steps”, which are always whole numbers, but logarithms are real numbers
We will take the floor or ceiling of any real numbers
Usually this is implicitly done
How Many?
How many decisions do we have to make before we have either found the element, or know it's not in the binary search tree?
Why do we care?
versus
What is the height?
How Many?
How many decisions do we have to make before we have either found the element, or know it's not in the binary search tree?
We walk down a branch in the tree, so the worst case RT for search is
O( depth of T ) = O(# nodes )
Good Tree
But in a "good" BST we have depth of T = O( log # nodes )
We’ll see next week how to maintain “goodness” in our BSTs.
Inheritance: Taxonomy metaphor
(figure: class taxonomy, each class linked to its parent by “extends”)
Animal
    Reptile
    Mammal
        Human
            Professor
            Student
        Canine
            Dog
            Wolf
        Lorises
Classes vs instances
Every object is an instance of a class.
The characteristics of an object are defined by its class.
An object inherits characteristics from all of its superclasses.
Classes vs instances Example:
Peter Lee is an instance of the Professor class.
He is therefore also an instance of the Human, Mammal, and Animal classes.
Sometimes we say that Peter Lee “is a” Professor (or Human or Mammal…)
Peter also “has a” wife and son, who are also instances of the Human class.
In Java:
public class Animal { … }
public class Mammal extends Animal { … }
public class Human extends Mammal { … }
public class Professor extends Human { … }
public class MyClass {
    public static void main(String[] args) {
        Professor danny = new Professor();
        …
    }
}
Implicitly extends class Object
In a bit more detail…
public class Animal {
    private int age;
    …
}
public class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;
    private List children;
    …
}
public class Human extends Mammal {
    private String name;
    private Race race;
    private boolean hasMate;
    …
}
public class Professor extends Human {
    private Department dept;
    …
}
public class MyClass {
    public static void main(String[] args) {
        Professor danny = new Professor();
        …
    }
}
Instance variables
Constructor methods
Each class defines one or more constructor methods for initializing an object’s instance variables.
Question: What are the instance variables for an instance of the Professor class?
Sample constructors
public class Animal {
    private int age;

    public Animal (int howOld) {
        age = howOld;
    }
}

public class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;

    public Mammal(Mammal dad, Mammal mom, int years) {
        super(years);
        father = dad;
        mother = mom;
    }
}
Why can’t we simply write
age = years;
instead of this?
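A compilable sketch of the two classes may help with this question. The accessor getAge is hypothetical, added only so the result can be observed; the class and field names follow the slides, with access modifiers trimmed so the sketch fits in one file.

```java
class Animal {
    private int age;               // private: not accessible from subclasses

    Animal(int howOld) {
        age = howOld;
    }

    int getAge() {                 // hypothetical accessor, not in the slides
        return age;
    }
}

class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;

    Mammal(Mammal dad, Mammal mom, int years) {
        super(years);              // lets Animal's constructor set its own private field
        father = dad;
        mother = mom;
        // Writing "age = years;" here would not compile: age is private in Animal.
    }
}

class ConstructorDemo {
    public static void main(String[] args) {
        Mammal m = new Mammal(null, null, 3);
        System.out.println(m.getAge());
    }
}
```

There is a second reason as well: Animal declares only a one-argument constructor, so a Mammal constructor that omitted the explicit super(years) call would get an implicit call to super(), which does not exist, and would fail to compile.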
Java class hierarchy (excerpt)
Object
Number, Math, Compiler, ClassLoader, Class, Character, Boolean
Byte, Short, Integer, Float, Double
All Classes in Java Ultimately Inherit from Object
java.lang
Object
Number, Math, Compiler, ClassLoader, Class, Character, Boolean
Byte, Short, Integer, Float, Double
Inheritance is transitive: Short IS-A Number IS-A Object, therefore Short IS-A Object
Another Taxonomy - Shapes
Rectangle
Square
Ellipse
Circle
Shape
Triangle
Interface classes
Rectangle
Square
Ellipse
Circle
Triangle
Shape
public interface Shape {
    public void draw();
    public double area();
    public Point upperLeft();
    public void moveTo(Point p);
    public void setColor(Color c);
    public double perimeter();
}
The interface defines all of the methods that are required in any implementation of Shape.
Some rules about interfaces
• No instance variables, no constructor methods.
• No code. All methods must be “abstract”.
• Subclasses that implement the interface say “implements”.
• A class can inherit from (i.e., implement) multiple interfaces.
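The rules above can be seen in a small sketch. The Shape interface here is cut down to two methods so that the example does not depend on Point or Color, and the second interface, Named, is hypothetical, introduced only to show a class implementing more than one interface.

```java
// Cut-down version of the Shape interface: method signatures only, no code.
interface Shape {
    double area();
    double perimeter();
}

// Hypothetical second interface.
interface Named {
    String name();
}

// A class says "implements" and may list several interfaces.
class Circle implements Shape, Named {
    private final double radius;   // instance variables live in the class, not the interface

    Circle(double radius) {
        this.radius = radius;
    }

    public double area() {
        return Math.PI * radius * radius;
    }

    public double perimeter() {
        return 2 * Math.PI * radius;
    }

    public String name() {
        return "circle";
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        Shape s = new Circle(1.0);           // a Circle can be used wherever a Shape is expected
        System.out.println(s.area());
        System.out.println(((Named) s).name());
    }
}
```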
Abstract classes
Rectangle
Square
Ellipse
Circle
Triangle
Shape
public abstract class Shape {
    abstract public void draw();
    abstract public double area();
    abstract public Point upperLeft();
    abstract public void moveTo(Point p);
    abstract public void setColor(Color c);
    abstract public double perimeter();
}
The abstract class defines all of the methods that are required in any implementation of Shape.
Rules about abstract classes
• May have instance variables.
• Can’t invoke “new” on an abstract class.
• Abstract methods must be declared “abstract”.
• Code can be supplied for other methods.
• Subclasses that inherit from an abstract class say “extends”.
• A class may inherit from only one abstract class.
Abstract vs interface classes
When does it make sense to use one over the other?
Interfaces are simpler and allow multiple inheritance.
But sometimes it is handy to have code and instance variables, and so in these cases abstract classes work better.
Abstract class example
abstract class Shape {
    abstract public void draw();
    abstract public double area();
    abstract public Point upperLeft();
    abstract public void moveTo(Point p);
    abstract public void setColor(Color c);
    abstract public double perimeter();

    public double semiperimeter() {
        return perimeter() / 2;
    }
}
Rectangle
Square
Parallelogram Ellipse
Circle
Triangle
Shape
Defining the subclasses
public class Rectangle extends Shape {
    public Rectangle(Point ul, Point lr, Color c) { … }

    public void draw() { … }
    public double area() { … }
    public Point upperLeft() { … }
    public void moveTo(Point p) { … }
    public void setColor(Color c) { … }
    public double perimeter() { … }

    private Point ul;
    private Color color;
    private Point lr;
}
Note that semiperimeter() is inherited from Shape.
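To see the inherited method at work, here is a runnable sketch. It is simplified on purpose: the abstract class keeps only area and perimeter, and Rectangle is described by a width and height instead of two Point corners, so those constructor parameters are assumptions rather than the slides' design.

```java
abstract class Shape {
    abstract double area();
    abstract double perimeter();

    // Concrete code in the abstract class, shared by every subclass.
    double semiperimeter() {
        return perimeter() / 2;
    }
}

class Rectangle extends Shape {
    private final double width, height;    // simplified: no Point or Color

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    double area() {
        return width * height;
    }

    double perimeter() {
        return 2 * (width + height);
    }
}

// Square sits below Rectangle in the taxonomy and reuses all of its code.
class Square extends Rectangle {
    Square(double side) {
        super(side, side);
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape r = new Rectangle(3, 4);
        System.out.println(r.semiperimeter());
        System.out.println(new Square(2).area());
    }
}
```

Rectangle never mentions semiperimeter, yet the call works because the method body lives in Shape and calls whichever perimeter the concrete subclass supplies.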
Defining a subclass
Rectangle
Shape parent class/superclass
child class/subclass
class child extends parent {...}