Binary Trees 15-211 Fundamental Data Structures and Algorithms Peter Lee January 23, 2002.

Binary Trees

15-211 Fundamental Data Structures and Algorithms

Peter Lee, January 23, 2002

Plan

Today: Review of binary trees, and some analysis

Reading:
For today: Chapter 19.1-19.3
For next time: Chapter 19.4, 20

Reminder: HW1 due on Monday!

Trees are Everywhere

CS is upside down

root

leaves

Trees

a

b c d

e f

depth=2(“height”)

root

nodes

node label

parent

children leaves

Trees, more abstractly

A tree is a graph (usually directed) with the following characteristics:

There is a distinguished node called the root node.

Every non-root node has exactly one parent node (the root has none).

Unique parents

a

b c d

e f

root

Trees are everywhere

Tree structures are everywhere in life.

As a result, in computer programs, trees turn out to be one of the most commonly used data structures.

Arithmetic Expressions

+

* 5

2 7

Organization charts


Origins of natural languages

Origins of life

[Figure: phylogenetic tree of the animal phyla. Branches, top to bottom: Porifera (sponges); Cnidaria (jellyfish, anemones, corals, etc.); Ctenophora (comb-jellies); Arthropoda (insects, spiders, crabs, etc.); Onychophora (velvet worms); Tardigrada (water bears); Annelida (segmented worms); Pogonophora; Vestimentifera; Echiura; Mollusca (snails, clams, squids, etc.); Sipuncula; Nemertea (ribbon worms); Platyhelminthes (flatworms); Chordata (vertebrates and relatives); Hemichordata; lophophorates; Chaetognatha; Echinodermata (starfish, urchins, sea cucumbers, etc.); pseudocoelomates; Placozoa; Monoblastozoa; Rhombozoa; Orthonectida.]


Taxonomies

Rectangle

Square

Parallelogram Ellipse

Circle

Shape

Triangle

Tournament structure

Game trees

Directory structure

/afs

cs andrew

acs course usr

15 18

113 211

usr

Trees, inductively

A tree is
• empty, or
• a node containing a set of trees (its children)

Alternatively, a tree is
• a leaf node (no children), or
• a node containing a set of trees (its children)

Binary trees

Let's focus on binary trees

A binary tree is either
• empty (we'll write nil for clarity), or
• looks like (x,L,R) where x is an object, and L, R are binary subtrees

In pictures

x

L R

nil

Binary tree nodes in Java

class BinaryNode {
    private Object element;
    private BinaryNode left, right;

    public BinaryNode() {…}

    public Object getElement() {…}
    public BinaryNode getLeft() {…}
    public BinaryNode getRight() {…}
    public void setElement(Object x) {…}
    public void setLeft(BinaryNode t) {…}
    public void setRight(BinaryNode t) {…}
    …
}

From Weiss, pg 579:

Quick exercise

flat(T) = e,b,f,a,d,e

a

b d

e f d

T

Write the code for a new “flat” method (also called “inorder”):

public LinkedList flat(T);

(you may assume a “join” operation on LinkedLists)

A solution

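public LinkedList flat(BinaryNode t) {
    if (t == null)                  // empty tree: base case
        return new LinkedList();    // an empty list, so that join() is safe
    LinkedList l = flat(t.getLeft());
    LinkedList r = flat(t.getRight());
    l.addLast(t.getElement());      // the root goes between the two halves
    return l.join(r);               // "join" is the assumed concatenation
}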

Remember to think inductively!
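A minimal runnable sketch of the same inorder idea. It assumes java.util.LinkedList, whose addAll stands in for the assumed "join", and a stripped-down node class named Node that is hypothetical (it is not the Weiss BinaryNode).

```java
import java.util.LinkedList;

// Hypothetical minimal node; stands in for BinaryNode. null plays the role of nil.
class Node {
    final Object element;
    final Node left, right;

    Node(Object element, Node left, Node right) {
        this.element = element;
        this.left = left;
        this.right = right;
    }
}

class Flatten {
    // Inorder traversal: left subtree, then the root, then the right subtree.
    static LinkedList<Object> flat(Node t) {
        LinkedList<Object> result = new LinkedList<>();
        if (t == null) {
            return result;              // empty tree: an empty list, not null
        }
        result.addAll(flat(t.left));    // addAll plays the role of "join"
        result.add(t.element);
        result.addAll(flat(t.right));
        return result;
    }

    public static void main(String[] args) {
        // Root b with leaves a and c.
        Node t = new Node("b", new Node("a", null, null), new Node("c", null, null));
        System.out.println(flat(t));    // prints [a, b, c]
    }
}
```

The base case returns an empty list rather than null, so the inductive step never has to special-case a missing subtree.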

Binary search trees (BSTs)

A binary tree T is a binary search tree iff

flat(T) is an ordered sequence.

Equivalently, in (x,L,R) all the nodes in L are less than x, and all the nodes in R are larger than x.

flat(T) = 2,3,4,5,6,7,9

Example

[Tree: root 5; its left child 3 has children 2 and 4; its right child 7 has children 6 and 9.]

Binary search

How does one search in a BST?

Inductively:

search(x,nil) = false
search(x,(x,L,R)) = true
search(x,(a,L,R)) = search(x,L), if x<a
search(x,(a,L,R)) = search(x,R), if x>a
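The inductive definition of search translates almost line for line into Java. This is a sketch under assumptions: the IntNode class and its int keys are hypothetical, and nil is represented by null.

```java
// Hypothetical node with int keys; null plays the role of nil.
class IntNode {
    final int key;
    final IntNode left, right;

    IntNode(int key, IntNode left, IntNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstSearch {
    static boolean search(int x, IntNode t) {
        if (t == null) return false;              // search(x,nil) = false
        if (x == t.key) return true;              // search(x,(x,L,R)) = true
        if (x < t.key) return search(x, t.left);  // x<a: look only in L
        return search(x, t.right);                // x>a: look only in R
    }

    public static void main(String[] args) {
        // The example tree: 5 at the root, 3 and 7 below it, leaves 2, 4, 6, 9.
        IntNode t = new IntNode(5,
            new IntNode(3, new IntNode(2, null, null), new IntNode(4, null, null)),
            new IntNode(7, new IntNode(6, null, null), new IntNode(9, null, null)));
        System.out.println(search(6, t));   // prints true
        System.out.println(search(8, t));   // prints false
    }
}
```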

Bentley on Binary Search

Quote from Jon Bentley, "Programming Pearls", page 35, 36 (slightly re-worded to use Java syntax):

“Given a sorted array A[0] <= A[1] <=...<= A[n-1], we want to determine if a given element T is in the array. Binary search solves the problem by keeping track of a range within the array in which T must be if it is anywhere in the array. Initially the range is the entire array. The range is shrunk by comparing its middle element to T, and then discarding half the range. The process continues until T is found, or until the range in which it must lie is known to be empty. In an n-element table, the search uses roughly log2(n) comparisons.

Bentley, cont’d

“I've assigned this problem [binary search] in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert the above description [of binary search] into a program in the language of their choice.....at the end of the period, most programmers reported that they had written correct code for the task. We would then take 30 minutes to examine their code.... In several cases, and with over 100 programmers, the results varied little. 90% of the programmers found bugs in their programs.

Bentley, cont’d

“I was amazed: given ample time, only about 10 percent of professional programmers were able to get this small program right. But they aren't the only ones....Knuth points out that while the first binary search was published in 1946, the first published binary search without bugs did not appear until 1962.”
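For reference, one possible rendering of the quoted description in Java. This is a sketch, not Bentley's code; the class and method names are hypothetical. The comments state the range invariant that the description relies on.

```java
class ArraySearch {
    // Returns true iff target occurs in the sorted array a.
    static boolean contains(int[] a, int target) {
        int lo = 0;                 // invariant: if target is in a, it is in a[lo..hi]
        int hi = a.length - 1;
        while (lo <= hi) {          // the range is not yet empty
            int mid = lo + (hi - lo) / 2;   // middle element; avoids overflow of lo + hi
            if (a[mid] == target) {
                return true;        // found
            } else if (a[mid] < target) {
                lo = mid + 1;       // discard the lower half, including mid
            } else {
                hi = mid - 1;       // discard the upper half, including mid
            }
        }
        return false;               // the range is empty, so target is not present
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 4, 5, 6, 7, 9};
        System.out.println(contains(a, 6));   // prints true
        System.out.println(contains(a, 8));   // prints false
    }
}
```

The usual places for bugs are the loop condition (lo <= hi, not lo < hi) and forgetting to exclude mid when shrinking the range.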



Searching

Search for 6:

[Figure sequence: the example tree is searched for 6. Compare with 5 (6 > 5, go right), then with 7 (6 < 7, go left), then with 6: found.]

Decisions, decisions

The tree just implements the possible sequences of decisions we would have made in ordinary binary search on an array.

[Figure: node 5 with a "<" branch to 3 and a ">" branch to 7; "=" stops at 5.]

It's binary since the = part causes immediate termination.



Binary search

should return something useful…

What is the important “step” to count?

Real life

In most applications, we do not store simple elements, but pairs (key,value).

We search for a key, and want the corresponding value returned if the key is found, and "No" otherwise.

The ADT is called a dictionary.
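A small sketch of a dictionary built on a BST, assuming int keys and String values. The class names Entry and BstDictionary are hypothetical, and null stands in for the "No" answer.

```java
// Hypothetical (key,value) node.
class Entry {
    final int key;
    final String value;
    Entry left, right;

    Entry(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class BstDictionary {
    private Entry root;

    // Adds the pair unless the key is already present.
    void put(int key, String value) {
        root = put(root, key, value);
    }

    private static Entry put(Entry t, int key, String value) {
        if (t == null) return new Entry(key, value);
        if (key < t.key) t.left = put(t.left, key, value);
        else if (key > t.key) t.right = put(t.right, key, value);
        return t;                        // equal keys: keep the existing entry
    }

    // Returns the value stored under key, or null ("No") if the key is absent.
    String get(int key) {
        Entry t = root;
        while (t != null) {
            if (key == t.key) return t.value;
            t = (key < t.key) ? t.left : t.right;
        }
        return null;
    }

    public static void main(String[] args) {
        BstDictionary d = new BstDictionary();
        d.put(211, "Fundamental Data Structures and Algorithms");
        System.out.println(d.get(211));   // prints the course title
        System.out.println(d.get(113));   // prints null
    }
}
```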

Correctness

Clearly, search() can never return a false positive answer. But search() only walks down one branch, so how do we know we don't get false negative answers?

Suppose T is a BST that contains x. Claim: search(x,T) properly returns "true".

Proof (by induction)

T cannot be nil, so suppose T = (a,L,R).

Case 1: x = a: done.

Case 2: x < a: Since T is a BST, x must be in L.

But by induction (on trees), search(x,L) returns true. Done.

Case 3: x > a: same as case 2.

Insertions

Insertions in a BST are very similar to searching: find the right spot, and then put down the new element as a new leaf.

We will not allow multiple insertions of the same element, so there is always exactly one place for the new guy.

Deletions

How should we perform deletions?

Next time…

How Many?

How many decisions do we have to make before we have either found the element, or know it's not in the tree?

Why do we care?

versus

How Many?

So the number of decisions seems to be the important “step” to count.

In the worst case, the number of such steps is related to the depth of the tree.

Good Tree

But in a "good" BST we have

depth of T = O( log # nodes )

Unfortunately, in the easiest implementation, BSTs aren’t always good.

We’ll see next week how to maintain “goodness” in our BSTs.

Insertion, inductively

Insertion into a BST:

insert(x,nil) = (x,nil,nil)
insert(x,(y,L,R)) =
    (y,insert(x,L),R), if x<y
    (y,L,insert(x,R)), if x>y
    (y,L,R), if x=y

In what kinds of situations can insert create “bad” BSTs?
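One way to experiment with this question is to code the inductive insert directly and measure the height for different insertion orders. The sketch below assumes int keys; TreeNode, build, and height are hypothetical helper names, and height counts edges (an empty tree has height -1).

```java
// Hypothetical immutable node; null plays the role of nil.
class TreeNode {
    final int key;
    final TreeNode left, right;

    TreeNode(int key, TreeNode left, TreeNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstInsert {
    // Mirrors the inductive definition: rebuilds the path from the root to the new leaf.
    static TreeNode insert(int x, TreeNode t) {
        if (t == null) return new TreeNode(x, null, null);                      // insert(x,nil)
        if (x < t.key) return new TreeNode(t.key, insert(x, t.left), t.right);  // x<y
        if (x > t.key) return new TreeNode(t.key, t.left, insert(x, t.right));  // x>y
        return t;                                                               // x=y
    }

    // Height in edges; the empty tree has height -1.
    static int height(TreeNode t) {
        if (t == null) return -1;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    // Inserts the keys one at a time, in the order given.
    static TreeNode build(int[] keys) {
        TreeNode t = null;
        for (int k : keys) t = insert(k, t);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(height(build(new int[] {5, 3, 7, 2, 4, 6, 9})));  // prints 2
        System.out.println(height(build(new int[] {2, 3, 4, 5, 6, 7, 9})));  // prints 6
    }
}
```

The same seven keys give height 2 in one order and height 6 when inserted in sorted order, where every new key becomes the right child of the previous one.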

Why do we care?

versus

What is the height?

Logarithms and exponents

Logarithms and exponents are everywhere in algorithm analysis

log_b(a) = c if a = b^c

Logarithms and exponents

Usually will leave off the base b when b=2, so for example

log 1024 = 10

Some useful equalities

log_b(ac) = log_b(a) + log_b(c)
log_b(a/c) = log_b(a) - log_b(c)
log_b(a^c) = c log_b(a)
log_b(a) = log_c(a) / log_c(b)
(b^a)^c = b^(ac)
b^a b^c = b^(a+c)
b^a / b^c = b^(a-c)

Logarithms and trees

In a “perfect” BST containing n nodes…
• What is the height?
• How many nodes are there at level i, at each height?
• How many steps does it take to search?
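A short worked derivation for these questions, under two stated assumptions: the root is at level 0, and the height h counts edges on the longest root-to-leaf path (so a three-level tree has height 2, as in the earlier picture).

```latex
\begin{align*}
\text{nodes at level } i &= 2^i \\
n &= \sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \\
h &= \log_2(n+1) - 1 \\
\text{comparisons in a search} &\le h + 1 = \log_2(n+1)
\end{align*}
```

For example, the perfect tree with n = 7 nodes has h = log 8 - 1 = 2, and a search inspects at most 3 nodes.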

Constant factors

“My computer is 4 times faster than yours.”

So what?

“Big-Oh” notation

[Figure: running time plotted against N; the curve cf(N) lies above T(N) for all N ≥ n0.]

T(N) = O(f(N))

“T(N) is order f(N)”

“Big-Oh” notation

Given a function T(N):

T(N) = O(f(N)) if there is a positive real constant c and an integer constant n0 such that T(N) ≤ cf(N) for all N ≥ n0.

c is called the constant factor.

Big-Oh

When T(N) = O(f(N)), we are saying that T(N) grows no faster than f(N).

I.e., f(N) describes an upper bound on T(N).

Put another way: For “large enough” inputs, cf(N) always dominates T(N).

This is called the asymptotic behavior.

Big-O characteristic

If T(N) = cf(N) then T(N) = O(f(N))

Constant factors “don’t matter”

Because of this, when T(N) = O(cg(N)), we usually drop the constant and just say O(g(N))

Big-O characteristic

Suppose T(N) = k, for some constant k

Then T(N) = O(1)

Why? Because c*1 > k, for some c

Big-O characteristic

More interesting:

Suppose T(N) = 20n^3 + 10n log n + 5

Then T(N) = O(n^3)

Lower-order terms “don’t matter”

Question: What constants c and n0 can be used to show that the above is true?

Answer: c=35, n0=1
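One way to check the answer is to bound each lower-order term by a multiple of n^3. The step below assumes logarithms base 2, as elsewhere in these slides.

```latex
\begin{align*}
T(n) &= 20n^3 + 10n\log n + 5 \\
     &\le 20n^3 + 10n^3 + 5n^3
        && \text{since } \log n \le n^2 \text{ and } 1 \le n^3 \text{ for } n \ge 1 \\
     &= 35n^3
\end{align*}
```

So T(n) ≤ 35n^3 for all n ≥ 1, which is the definition with c = 35 and n0 = 1.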

From last time…

We calculated this running time for reverse()

So, in big-Oh terms: O(n^2)

Big-O characteristic

If T1(N) = O(f(N)) and T2(N) = O(g(N)) then

T1(N) + T2(N) = max(O(f(N)), O(g(N))).

The bigger task always dominates eventually.

Also:

T1(N) * T2(N) = O(f(N) * g(N)).

Some common functions

[Figure: plot of 10N, 100 log N, 5N^2, N^3, and 2^N for N = 1 to 10; vertical axis from 0 to 1200.]

Big-Oh is imprecise

Let T(N) = 100 log N

Then T(N) = O(log N)

And T(N) = O(N^2)
And T(N) = O(N^3)
And T(N) = O(2^N)

Tight bounds

Because of this imprecision, we normally try to find the “tightest” bound on the number of steps.

So, while it is true that reverse() is O(2^n), it is more useful to use the tighter bound of O(n^2).

Big-O characteristics

log^k(N) = O(N) for any constant k.

I.e., logarithms grow very slowly.

[Figure: plot of log N and N for N = 1 to 10; vertical axis from 0 to 12.]

Note

There is a bit of a mismatch because we are counting “steps”, which are always whole numbers, but logarithms are real numbers

We will take the floor or ceiling of any real numbers.

Usually this is implicitly done.


How Many?

How many decisions do we have to make before we have either found the element, or know it's not in the binary search tree?

We walk down a branch in the tree, so the worst case RT for search is

O( depth of T ) = O(# nodes )

Good Tree

But in a "good" BST we have

depth of T = O( log # nodes )

We’ll see next week how to maintain “goodness” in our BSTs.


Inheritance: Taxonomy metaphor

[Figure: class hierarchy in which each arrow is labeled “extends”. Reptile and Mammal extend Animal; Human, Canine, and Lorises extend Mammal; Professor and Student extend Human; Dog and Wolf extend Canine.]

Classes vs instances

Every object is an instance of a class.

The characteristics of an object are defined by its class.

An object inherits characteristics from all of its superclasses.

Classes vs instances Example:

Peter Lee is an instance of the Professor class.

He is therefore also an instance of the Human, Mammal, and Animal classes.

Sometimes we say that Peter Lee “is a” Professor (or Human or Mammal…)

Peter also “has a” wife and son, who are also instances of the Human class.

In Java:

public class Animal { … }    // implicitly extends class Object

public class Mammal extends Animal { … }

public class Human extends Mammal { … }

public class Professor extends Human { … }

public class MyClass {
    public static void main(String[] args) {
        Professor danny = new Professor();
        …
    }
}

In a bit more detail…

public class Animal {
    private int age;
    …
}

public class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;
    private List children;
    …
}

public class Human extends Mammal {
    private String name;
    private Race race;
    private boolean hasMate;
    …
}

public class Professor extends Human {
    private Department dept;
    …
}

public class MyClass {
    public static void main(String[] args) {
        Professor danny = new Professor();
        …
    }
}

Instance variables

Constructor methods

Each class defines one or more constructor methods for initializing an object’s instance variables.

Question: What are the instance variables for an instance of the Professor class?

Sample constructors

public class Animal {
    private int age;

    public Animal(int howOld) {
        age = howOld;
    }
}

public class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;

    public Mammal(Mammal dad, Mammal mom, int years) {
        super(years);
        father = dad;
        mother = mom;
    }
}

Why can’t we simply write age = years; instead of super(years);?
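A compilable sketch that makes the question concrete. The classes are declared without "public" so they fit in one file, and the accessors getAge() and getFather() are hypothetical additions for the demonstration. Two Java rules are at work: a private field is accessible only inside the class that declares it, and because Animal declares no no-argument constructor, Mammal's constructor must call super(...) explicitly.

```java
class Animal {
    private int age;                 // private: accessible only inside Animal

    Animal(int howOld) {
        age = howOld;
    }

    int getAge() {                   // hypothetical accessor, added for the demo
        return age;
    }
}

class Mammal extends Animal {
    private Mammal father;
    private Mammal mother;

    Mammal(Mammal dad, Mammal mom, int years) {
        super(years);                // lets Animal initialize its own private field
        // age = years;              // would not compile: age has private access in Animal
        father = dad;
        mother = mom;
    }

    Mammal getFather() {             // hypothetical accessor, added for the demo
        return father;
    }
}

class ConstructorDemo {
    public static void main(String[] args) {
        Mammal dad = new Mammal(null, null, 30);
        Mammal kid = new Mammal(dad, null, 3);
        System.out.println(kid.getAge());               // prints 3
        System.out.println(kid.getFather().getAge());   // prints 30
    }
}
```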

Java class hierarchy (excerpt)


Object

Number, Math, Compiler, ClassLoader, Class, Character, Boolean

Byte, Short, Integer, Float, Double

All Classes in Java Ultimately Inherit from Object

http://java.sun.com/j2se/1.3/docs/api/index.html


java.lang.Object

Number, Math, Compiler, ClassLoader, Class, Character, Boolean

Byte, Short, Integer, Float, Double

Inheritance is transitive: Short IS-A Number IS-A Object, therefore Short IS-A Object


Another Taxonomy - Shapes

Rectangle

Square

Ellipse

Circle

Shape

Triangle


Interface classes

Rectangle

Square

Ellipse

Circle

Triangle

Shape

public interface Shape {
    public void draw();
    public double area();
    public Point upperLeft();
    public void moveTo(Point p);
    public void setColor(Color c);
    public double perimeter();
}

The interface defines all of the methods that are required in any implementation of Shape.

Some rules about interfaces

No instance variables, no constructor methods.
No code. All methods must be “abstract”.
Subclasses that implement the interface say “implements”.
A class can inherit from (i.e., implement) multiple interfaces.
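A small sketch of the last rule: one class implementing two interfaces. The interfaces HasArea and Named and the class Circle are hypothetical and much smaller than the Shape interface above.

```java
// Hypothetical interfaces: method signatures only, no code and no instance variables.
interface HasArea {
    double area();
}

interface Named {
    String name();
}

// One class may implement several interfaces at once.
class Circle implements HasArea, Named {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    public double area() {           // interface methods are implicitly public
        return Math.PI * radius * radius;
    }

    public String name() {
        return "circle";
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        Circle c = new Circle(1.0);
        HasArea a = c;               // a Circle can be used wherever a HasArea is expected
        Named n = c;                 // and wherever a Named is expected
        System.out.println(n.name() + " has area " + a.area());
    }
}
```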


Abstract classes

Rectangle

Square

Ellipse

Circle

Triangle

Shape

public abstract class Shape {
    abstract public void draw();
    abstract public double area();
    abstract public Point upperLeft();
    abstract public void moveTo(Point p);
    abstract public void setColor(Color c);
    abstract public double perimeter();
}

The abstract class defines all of the methods that are required in any implementation of Shape.

Rules about abstract classes

May have instance variables.
Can’t invoke “new” on an abstract class.
Abstract methods must be declared “abstract”.
Code can be supplied for other methods.
Subclasses that inherit from an abstract class say “extends”.
A class may inherit from only one abstract class.

Abstract vs interface classes

When does it make sense to use one over the other?

Interfaces are simpler and allow multiple inheritance.

But sometimes it is handy to have code and instance variables, and so in these cases abstract classes work better.


Abstract class example

abstract class Shape {
    abstract public void draw();
    abstract public double area();
    abstract public Point upperLeft();
    abstract public void moveTo(Point p);
    abstract public void setColor(Color c);
    abstract public double perimeter();

    public double semiperimeter() {
        return perimeter() / 2;
    }
}


Rectangle

Square

Parallelogram Ellipse

Circle

Triangle

Shape

Defining the subclasses

public class Rectangle extends Shape {
    public Rectangle(Point ul, Point lr, Color c) { … }

    public void draw() { … }
    public double area() { … }
    public Point upperLeft() { … }
    public void moveTo(Point p) { … }
    public void setColor(Color c) { … }
    public double perimeter() { … }

    private Point ul;
    private Color color;
    private Point lr;
}

Note that semiperimeter() is inherited from Shape.
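A cut-down, runnable sketch of the same pattern. It assumes a simplified Shape with only perimeter() and area(), and a Rectangle described by a width and a height instead of two corner points; both simplifications are hypothetical, made so the example compiles on its own.

```java
abstract class Shape {
    abstract double perimeter();     // each subclass must supply these
    abstract double area();

    // Concrete code in the abstract class, shared by every subclass.
    double semiperimeter() {
        return perimeter() / 2;
    }
}

class Rectangle extends Shape {
    private final double width, height;   // simplification of the ul/lr corner points

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    double perimeter() {
        return 2 * (width + height);
    }

    double area() {
        return width * height;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape s = new Rectangle(3, 4);
        System.out.println(s.semiperimeter());   // prints 7.0
        // Shape bad = new Shape();              // would not compile: Shape is abstract
    }
}
```

Rectangle never mentions semiperimeter(), yet the call works because the method is inherited and dispatches to Rectangle's own perimeter().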


Defining a subclass

Rectangle

Shape parent class/superclass

child class/subclass

class child extends parent {...}
