ID3 Algorithm

CS 157B: Spring 2010

Meg Genoar

Iterative Dichotomiser 3

Ross Quinlan – 1987

C4.5 Precursor

Decision Tree Generation

Ross Quinlan

Computer Scientist – UW 1968

Data Mining & Decision Theory

AI: Data Mining

ID3, C4.5, & C5.0

RuleQuest Research

Max-Gain Split

Most Useful Attribute

Highest Information Gain

Best Attribute

Measure of Uncertainty

Randomness

Efficient Separation of Decision Tree Elements

ID3 & Entropy

Entropy

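$\mathrm{Entropy}(S) = -P_{\text{positive}}\log_2 P_{\text{positive}} - P_{\text{negative}}\log_2 P_{\text{negative}}$

$P_{\text{positive}}$: proportion of positive examples in S

$P_{\text{negative}}$: proportion of negative examples in S

By convention $0 \cdot \log_2 0 = 0$, so a set containing only one class has entropy 0, and an even 50/50 split has entropy 1.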

Example…

A collection S consists of 20 data examples:

13 Yes : 7 No

$\mathrm{Entropy}(S) = -\tfrac{13}{20}\log_2\tfrac{13}{20} - \tfrac{7}{20}\log_2\tfrac{7}{20} \approx 0.934$
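A minimal Python sketch of the two-class entropy formula, assuming the set is summarized by its counts of positive and negative examples (the function name `entropy` is our own). Zero counts are skipped, which implements the convention 0 · log2 0 = 0:

```python
from math import log2

def entropy(positive: int, negative: int) -> float:
    """Two-class entropy of a set with the given class counts.

    Zero counts are skipped, which implements the convention 0 * log2(0) = 0.
    """
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

# The 13 Yes : 7 No collection from the example above.
print(round(entropy(13, 7), 3))  # 0.934
```

The same function gives 1.0 for an even 5:5 split and 0.0 for a pure set.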

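Information Gain

Gain: how much splitting on an attribute reduces entropy, i.e. where to split the tree

Higher gain is better: choose the attribute with the highest gain

The highest-gain attribute goes at the top of the tree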

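$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$

where $S_v$ is the subset of S whose value of attribute A is v, so each child's entropy is weighted by its share of the examples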
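The gain formula can be sketched the same way, assuming each set is summarized by its class counts, for example `(successes, failures)`. The names `entropy` and `information_gain` are illustrative, not part of any library:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of each child subset.

    `parent` and every element of `children` are class-count tuples.
    """
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

print(information_gain((2, 2), [(2, 0), (0, 2)]))  # 1.0
print(information_gain((2, 2), [(1, 1), (1, 1)]))  # 0.0
```

Splitting a 2:2 set into two pure children gains the full 1 bit, while a split that keeps the 50/50 mix in every child gains nothing.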

Movie Example

Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     United States      No        Comedy           False
3     United States      Yes       Comedy           True
4     Europe             No        Comedy           True
5     Europe             Yes       Science Fiction  False
6     Europe             Yes       Romance          False
7     Rest of World      Yes       Comedy           False
8     Rest of World      No        Science Fiction  False
9     Europe             Yes       Comedy           True
10    United States      Yes       Comedy           True

Entropy of Table

Is the Film a Success?

$\mathrm{Entropy}(\text{Success}) = \mathrm{Entropy}(\text{5 Yes, 5 No}) = -\tfrac{5}{10}\log_2\tfrac{5}{10} - \tfrac{5}{10}\log_2\tfrac{5}{10} = 1$
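To follow the rest of the example in code, the movie table can be encoded as a list of tuples; this encoding and the helper `label_entropy` are assumptions of this sketch. Counting the Success column reproduces the entropy of 1:

```python
from collections import Counter
from math import log2

# Movie table encoded as (country_of_origin, big_star, genre, success), films 1-10
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def label_entropy(rows):
    """Entropy of the Success field (last element of each row)."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

print(label_entropy(FILMS))  # 1.0
```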

Split – Country of Origin

United States:
Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     United States      No        Comedy           False
3     United States      Yes       Comedy           True
4     United States      Yes       Comedy           True

Europe:
Film  Country of Origin  Big Star  Genre            Success
1     Europe             No        Comedy           True
2     Europe             Yes       Science Fiction  False
3     Europe             Yes       Romance          False
4     Europe             Yes       Comedy           True

Rest of World:
Film  Country of Origin  Big Star  Genre            Success
1     Rest of World      Yes       Comedy           False
2     Rest of World      No        Science Fiction  False
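Splitting on an attribute simply groups the rows by that attribute's value. A short sketch with a hypothetical `split` helper, assuming the same tuple encoding of the movie table:

```python
from collections import defaultdict

COLUMNS = ("Origin", "Star", "Genre", "Success")
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def split(rows, column):
    """Group rows by their value in `column`, keeping the original row order."""
    index = COLUMNS.index(column)
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return dict(groups)

for origin, rows in split(FILMS, "Origin").items():
    successes = sum(row[-1] for row in rows)  # True counts as 1
    print(f"{origin}: {len(rows)} films, {successes} successes")
# United States: 4 films, 3 successes
# Europe: 4 films, 2 successes
# Rest of World: 2 films, 0 successes
```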

Gain – Country of Origin

Where is the film from?

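$\mathrm{Entropy}(\text{USA}) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$

$\mathrm{Entropy}(\text{Europe}) = -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1$

$\mathrm{Entropy}(\text{Rest of World}) = -\tfrac{0}{2}\log_2\tfrac{0}{2} - \tfrac{2}{2}\log_2\tfrac{2}{2} = 0$

$\mathrm{Gain}(\text{Origin}) = 1 - \left(\tfrac{4}{10}\cdot 0.811 + \tfrac{4}{10}\cdot 1 + \tfrac{2}{10}\cdot 0\right) \approx 0.276$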
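The Origin gain can be checked from the (success, failure) counts of the three tables. This is a sketch with illustrative helper names:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

def gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child subsets."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

# (successes, failures): whole table, then United States, Europe, Rest of World
origin_gain = gain((5, 5), [(3, 1), (2, 2), (0, 2)])
print(round(origin_gain, 4))  # 0.2755
```

At full precision the gain is about 0.2755; the 0.276 on the slide comes from rounding Entropy(USA) to 0.811 before subtracting. The ranking of attributes is the same either way.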

Split – Big Star

Big Star = Yes:
Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     United States      Yes       Comedy           True
3     Europe             Yes       Science Fiction  False
4     Europe             Yes       Romance          False
5     Rest of World      Yes       Comedy           False
6     Europe             Yes       Comedy           True
7     United States      Yes       Comedy           True

Big Star = No:
Film  Country of Origin  Big Star  Genre            Success
1     United States      No        Comedy           False
2     Europe             No        Comedy           True
3     Rest of World      No        Science Fiction  False

Gain – Big Star

Is there a Big Star in the film?

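$\mathrm{Entropy}(\text{Yes}) = -\tfrac{4}{7}\log_2\tfrac{4}{7} - \tfrac{3}{7}\log_2\tfrac{3}{7} \approx 0.985$

$\mathrm{Entropy}(\text{No}) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918$

$\mathrm{Gain}(\text{Star}) = 1 - \left(\tfrac{7}{10}\cdot 0.985 + \tfrac{3}{10}\cdot 0.918\right) \approx 0.0351$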
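The small gap between the slide's 0.0351 and the full-precision value comes from rounding the two entropies first. A quick check, assuming the counts read off the Big Star tables:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

# Big Star = Yes: 4 successes, 3 failures; Big Star = No: 1 success, 2 failures
exact = 1 - (7 / 10 * entropy((4, 3)) + 3 / 10 * entropy((1, 2)))
# Same expression with the entropies rounded to three places, as on the slide
rounded = 1 - (7 / 10 * 0.985 + 3 / 10 * 0.918)
print(round(exact, 4), round(rounded, 4))  # 0.0349 0.0351
```

Either way, Big Star barely reduces the uncertainty about success.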

Split – Genre

Science Fiction:
Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     Europe             Yes       Science Fiction  False
3     Rest of World      No        Science Fiction  False

Comedy:
Film  Country of Origin  Big Star  Genre            Success
1     United States      No        Comedy           False
2     United States      Yes       Comedy           True
3     Europe             No        Comedy           True
4     Rest of World      Yes       Comedy           False
5     Europe             Yes       Comedy           True
6     United States      Yes       Comedy           True

Romance:
Film  Country of Origin  Big Star  Genre            Success
1     Europe             Yes       Romance          False

Gain – Genre

What genre is the film?

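$\mathrm{Entropy}(\text{SciFi}) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918$

$\mathrm{Entropy}(\text{Com}) = -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.918$

$\mathrm{Entropy}(\text{Rom}) = -\tfrac{0}{1}\log_2\tfrac{0}{1} - \tfrac{1}{1}\log_2\tfrac{1}{1} = 0$

$\mathrm{Gain}(\text{Genre}) = 1 - \left(\tfrac{3}{10}\cdot 0.918 + \tfrac{6}{10}\cdot 0.918 + \tfrac{1}{10}\cdot 0\right) \approx 0.1738$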
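Science Fiction (1:2) and Comedy (4:2) have the same success-to-failure ratio, so their entropies match, and the one-film Romance set is pure. A sketch that checks this and recomputes the gain from counts (the `genres` mapping is our own encoding of the three tables):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

# (successes, failures) per genre, out of 10 films in total
genres = {"Science Fiction": (1, 2), "Comedy": (4, 2), "Romance": (0, 1)}
for name, counts in genres.items():
    print(name, round(entropy(counts), 3))
# Science Fiction 0.918
# Comedy 0.918
# Romance 0.0

remainder = sum(sum(c) / 10 * entropy(c) for c in genres.values())
print(round(1 - remainder, 4))  # 0.1735
```

Without intermediate rounding the gain is about 0.1735 rather than 0.1738.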

Compare Gains…

Gain(Origin) = 0.276

Gain(Star) = 0.0351

Gain(Genre) = 0.1738

First Split: Origin
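The whole selection step, computing the gain of every attribute on the full table and keeping the maximum, fits in a few lines. The tuple encoding and helper names are assumptions of this sketch:

```python
from collections import Counter
from math import log2

COLUMNS = ("Origin", "Star", "Genre")  # the Success label is the last tuple field
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def entropy(rows):
    """Entropy of the Success label over the given rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def gain(rows, column):
    """Information gain of splitting rows on the named column."""
    i = COLUMNS.index(column)
    remainder = 0.0
    for value in {row[i] for row in rows}:
        subset = [row for row in rows if row[i] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

gains = {col: gain(FILMS, col) for col in COLUMNS}
best = max(gains, key=gains.get)  # max-gain split
print({col: round(g, 4) for col, g in gains.items()})
# {'Origin': 0.2755, 'Star': 0.0349, 'Genre': 0.1735}
print("First split:", best)  # First split: Origin
```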

[Figure: first split. All Movies branches on Country of Origin into United States, Europe, and Rest of World, each leading to a new table.]

New Table – United States

Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     United States      No        Comedy           False
3     United States      Yes       Comedy           True
4     United States      Yes       Comedy           True

$\mathrm{Entropy}(\text{Success}) = \mathrm{Entropy}(\text{3 Yes, 1 No}) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$
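From here each branch is treated as a new data set and the same steps repeat. A sketch of the entropy of the United States table, assuming the Origin column is dropped because every row shares the same value:

```python
from collections import Counter
from math import log2

# United States table encoded as (big_star, genre, success)
US_FILMS = [
    ("Yes", "Science Fiction", True),
    ("No", "Comedy", False),
    ("Yes", "Comedy", True),
    ("Yes", "Comedy", True),
]

def entropy(rows):
    """Entropy of the Success label (last field) over the given rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

print(round(entropy(US_FILMS), 3))  # 0.811
```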

Split – Big Star

Big Star = Yes:
Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True
2     United States      Yes       Comedy           True
3     United States      Yes       Comedy           True

Big Star = No:
Film  Country of Origin  Big Star  Genre            Success
1     United States      No        Comedy           False

Gain – Big Star

Is there a Big Star in the film?

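$\mathrm{Entropy}(\text{Yes}) = -\tfrac{3}{3}\log_2\tfrac{3}{3} - \tfrac{0}{3}\log_2\tfrac{0}{3} = 0$

$\mathrm{Entropy}(\text{No}) = -\tfrac{0}{1}\log_2\tfrac{0}{1} - \tfrac{1}{1}\log_2\tfrac{1}{1} = 0$

$\mathrm{Gain}(\text{Star}) = 0.811 - \left(\tfrac{3}{4}\cdot 0 + \tfrac{1}{4}\cdot 0\right) = 0.811$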
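Both Big Star subsets are pure, so the weighted remainder is 0 and the gain equals the whole entropy of the table. A rows-based sketch; the `gain` helper and the field positions are assumptions of this example:

```python
from collections import Counter
from math import log2

# United States table encoded as (big_star, genre, success)
US_FILMS = [
    ("Yes", "Science Fiction", True),
    ("No", "Comedy", False),
    ("Yes", "Comedy", True),
    ("Yes", "Comedy", True),
]

def entropy(rows):
    """Entropy of the Success label (last field) over the given rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def gain(rows, index):
    """Information gain of splitting rows on the field at position `index`."""
    n = len(rows)
    remainder = 0.0
    for value in {row[index] for row in rows}:
        subset = [row for row in rows if row[index] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

print(round(gain(US_FILMS, 0), 3))  # 0.811 (split on Big Star)
```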

Split – Genre

Science Fiction:
Film  Country of Origin  Big Star  Genre            Success
1     United States      Yes       Science Fiction  True

Comedy:
Film  Country of Origin  Big Star  Genre            Success
1     United States      No        Comedy           False
2     United States      Yes       Comedy           True
3     United States      Yes       Comedy           True

Gain – Genre

What genre is the film?

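$\mathrm{Entropy}(\text{SciFi}) = -\tfrac{1}{1}\log_2\tfrac{1}{1} - \tfrac{0}{1}\log_2\tfrac{0}{1} = 0$

$\mathrm{Entropy}(\text{Com}) = -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.918$

$\mathrm{Gain}(\text{Genre}) = 0.811 - \left(\tfrac{1}{4}\cdot 0 + \tfrac{3}{4}\cdot 0.918\right) \approx 0.1225$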
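Within the United States table, the Genre split still leaves one impure subset (Comedy, 2 successes to 1 failure). A count-based sketch with illustrative names:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

parent = (3, 1)  # United States: 3 successes, 1 failure
children = {"Science Fiction": (1, 0), "Comedy": (2, 1)}
remainder = sum(sum(c) / sum(parent) * entropy(c) for c in children.values())
genre_gain = entropy(parent) - remainder
print(round(genre_gain, 4))  # 0.1226
```

That is about 0.1226 at full precision (the 0.1225 above reflects rounding), well below the 0.811 gain for Big Star.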

Compare Gains…

Gain(Star) = 0.811

Gain(Genre) = 0.1225

Split: Star

[Figure: decision tree after the Big Star split. All Movies branches on Country of Origin into United States, Europe, and Rest of World; under United States, Star leads to Success and No Star leads to Failure. Other node labels in the figure: New Table, Sci-Fi, Comedy, Success, Failure.]
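Repeating the max-gain split on every new table until each one is pure gives the full recursive procedure. The following is a minimal ID3-style sketch under simplifying assumptions (categorical attributes only, a majority vote when attributes run out, a nested dict as the tree); it is an illustration, not Quinlan's original code. Run on the ten films, it also settles the branches the slides leave open: the Rest of World table is already pure (both failures), and for the Europe table this sketch picks Genre, whose gain there is 1.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Origin", "Star", "Genre"]
ROWS = [  # (origin, big_star, genre, success) for films 1-10
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]
FILMS = [dict(zip(ATTRIBUTES + ["Success"], row)) for row in ROWS]

def entropy(rows):
    counts = Counter(r["Success"] for r in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def gain(rows, attr):
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    """Return a leaf label (True/False) or a nested dict {attr: {value: subtree}}."""
    labels = Counter(r["Success"] for r in rows)
    if len(labels) == 1:  # pure table: stop with a leaf
        return next(iter(labels))
    if not attrs:  # nothing left to split on: majority vote
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))  # max-gain split
    rest = [a for a in attrs if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], rest)
                   for value in sorted({r[best] for r in rows})}}

tree = id3(FILMS, ATTRIBUTES)
print(tree)
```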


Comedy from the US, with a big star… follows United States → Star in the tree and is classified as a Success.
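Classifying a new film just walks the tree from the root to a leaf. A sketch, assuming a hypothetical nested-dict encoding that contains only the branches derived above (Origin at the root, Big Star under United States); `classify` and `TREE` are illustrative names:

```python
# Hypothetical encoding of the derived branches: {attribute: {value: subtree_or_leaf}}
TREE = {"Origin": {"United States": {"Star": {"Yes": True, "No": False}}}}

def classify(tree, film):
    """Follow the branch matching each tested attribute; None if a branch is missing."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches.get(film[attribute])
    return tree

film = {"Origin": "United States", "Star": "Yes", "Genre": "Comedy"}
print(classify(TREE, film))  # True (Success)
```

Genre is never consulted on this path, because the United States branch is already resolved by Big Star.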
