
Machine Learning meets the Real World: Successes and new research directions

Andrea Pohoreckyj Danyluk

Department of Computer Science

Williams College, Williamstown, MA

October 11, 2002

Data, data everywhere...

• Scientific: data collection routinely produces gigabytes of data per day

• Telecommunications: AT&T produces 275 million call records per day

• Web: Google handles 70 million searches per day

• Retail: WalMart records 20 million sales transactions per day

A wealth of information

• Scientific data
  – Detection of oil spills from satellite images
  – Prediction of molecular bioactivity for drug design

• Telecommunications
  – Fraud detection to distinguish between “bad” and normal usage of cell phones

A wealth of information

• Web mining
  – Characterize killer pages

• Retail
  – Determine better product placement

• Direct mail
  – Predict who is most likely to donate to a charity

Machine learning success (machine learning is ubiquitous)

• Scientific discovery
  – Detection of oil spills from satellite images

• Telecommunications
  – Diagnosis of problems in the local loop

• Printing
  – Determine causes of banding (printing cylinder problems)

• Control
  – Self-steering vehicles

Why research in machine learning is so good today

Research in machine learning benefits from

• Abundant data

• Interest in fielding new applications
  – Even more data
  – Push on the limits of our understanding, technology, etc.

Plan for this talk

Original

• Discuss success stories and failures

• Failures help identify new areas of research

New plan

• One success story in detail

• Lesson learned: can identify new areas of research even when we succeed

Induction of decision trees

• Not the only (or even the “hottest”) algorithm

• Have been used in many contexts

• Important for understanding our success story: local-loop network diagnosis

Inductive learning

Given a collection of observations of the form (x, f(x))

Find g(x) that approximates f(x)

Sample data

            Outlook   Temp  Humidity  Wind   Go to class?
Student 1   Sunny     Hot   High      False  Yes
Student 2   Sunny     Hot   High      True   No
Student 3   Overcast  Hot   High      False  Yes
Student 4   Rainy     Mild  High      False  Yes
Student 5   Rainy     Cool  High      True   No
Student 6   Rainy     Cool  Normal    False  No
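
For concreteness, here is the same table written as (x, f(x)) pairs in Python (a minimal sketch; the attribute names follow the table headers, and the later sketches in this transcript reuse this format):

```python
# Training set from the table: each example is (x, f(x)), where x maps
# attribute names to values and f(x) is the class label.
examples = [
    ({"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": False}, "Yes"),
    ({"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": True},  "No"),
    ({"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "High",   "Wind": False}, "Yes"),
    ({"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "High",   "Wind": True},  "No"),
    ({"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "Normal", "Wind": False}, "No"),
]
```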

Predictive model (i.e., g(x))

[Decision tree figure: the root tests Outlook; the Overcast branch is a Yes leaf; the Sunny branch tests Wind; the Rainy branch tests Temp (mild vs. cold); the leaves carry Yes/No decisions.]

Learning objectives

• Learn a tree that is correct

• Learn a tree that is compact

• At every level in the tree, select a test that best differentiates examples of one class from another

TDIDT

• If all examples are from the same class
  – The tree is a leaf with that class name

• Else
  – Pick a test to make
  – Construct one edge for each possible test outcome
  – Partition the examples by test outcome
  – Build subtrees recursively

(A sketch of this recursion in code follows.)
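
A minimal sketch of TDIDT in Python, assuming the (attribute-dict, label) example format from the sample-data sketch; the choose_test argument (e.g., the gain criterion defined later) and the majority-class fallback when no attributes remain are assumptions, not part of the slide:

```python
from collections import Counter

def tdidt(examples, attributes, choose_test):
    """Top-down induction of a decision tree.
    examples: list of (attribute-dict, label) pairs."""
    labels = [label for _, label in examples]
    # If all examples are from the same class, the tree is a leaf with that class name.
    if len(set(labels)) == 1:
        return labels[0]
    # If no tests remain, fall back to the majority class (an added assumption).
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick a test to make (e.g., the attribute with greatest information gain).
    attr = choose_test(examples, attributes)
    tree = {"test": attr, "branches": {}}
    # One edge for each possible test outcome; partition the examples by outcome.
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        remaining = [a for a in attributes if a != attr]
        # Build subtrees recursively.
        tree["branches"][value] = tdidt(subset, remaining, choose_test)
    return tree
```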

Which is better?

20 examples: 10 Yes, 10 No

Split on Humid:   High → 5 Y, 5 N      Normal → 5 Y, 5 N
Split on Outlook: Sunny → 2 Y, 10 N    Overcast → 8 Y, 0 N

The Gain Criterion

• Measure the information of the collection

• Measure the information of each possible split

• Choose the split with greatest information gain

Information (Entropy)

• Let T be a set of examples

• Let C1, C2, …, Cn be class labels

• freq(Ci,T) = number of examples in T that belong to class Ci.

• |T| = number of examples in T

• Select an example at random and announce its class Ci: the information conveyed is info = −log2(freq(Ci, T) / |T|)

Information (Entropy)

• Let T be a set of examples

• Info(T) = −Σi (freq(Ci, T) / |T|) · log2(freq(Ci, T) / |T|), summed over the n classes

Entropy after a split

• Let X be an attribute with n possible values.

• Let Tj be the examples that have the value j for attribute X.

Average entropy that results from making split on X:

infoX(T) = Σj (|Tj| / |T|) · info(Tj), summed over the n possible values of X
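
A minimal sketch of these two quantities in Python, assuming the (attribute-dict, label) example format used earlier:

```python
from math import log2

def info(examples):
    """Info(T): entropy of the class distribution in T."""
    total = len(examples)
    freqs = {}
    for _, label in examples:
        freqs[label] = freqs.get(label, 0) + 1
    return -sum((f / total) * log2(f / total) for f in freqs.values())

def info_x(examples, attr):
    """info_X(T): weighted average entropy after splitting on attribute X."""
    total = len(examples)
    result = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        result += (len(subset) / total) * info(subset)
    return result
```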

Information Gain

• Compute infoX(T) for every attribute

• Select the attribute X that maximizes info(T) − infoX(T)

Which is better?

20 examples: 10 Yes, 10 No

Split on Humid:   High → 5 Y, 5 N      Normal → 5 Y, 5 N
Split on Outlook: Sunny → 2 Y, 10 N    Overcast → 8 Y, 0 N
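
Applying the gain criterion to this question, a small sketch working directly from the class counts shown above (the joint assignment of Humid and Outlook values to the 20 examples is not given, so raw counts are used rather than the example lists from the earlier sketch):

```python
from math import log2

def entropy(*counts):
    """Entropy of a class distribution given raw counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

info_T = entropy(10, 10)                                             # 10 Yes, 10 No -> 1.0 bit

# Split on Humid: High (5 Y, 5 N) and Normal (5 Y, 5 N)
info_humid = (10 / 20) * entropy(5, 5) + (10 / 20) * entropy(5, 5)       # = 1.0

# Split on Outlook: Sunny (2 Y, 10 N) and Overcast (8 Y, 0 N)
info_outlook = (12 / 20) * entropy(2, 10) + (8 / 20) * entropy(8, 0)     # ~ 0.39

print("gain(Humid)   =", info_T - info_humid)       # 0.0  -> the split tells us nothing
print("gain(Outlook) =", info_T - info_outlook)     # ~ 0.61 -> Outlook is the better split
```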

Scrubber (the success story)

• Diagnoses problems in the local loop

• Problem may be due to trouble in:
  – Customer premises equipment
  – Facilities connecting customer to cable
  – Cable
  – Central office

• Millions of “troubles” reported annually

MAX, 1990

• Acts as Maintenance Administrator (MA)

• Sequence of action:
  – Customer calls
  – Rep takes information; initiates tests
  – Trouble report sent to MA
  – MA puts trouble in dispatch queue for specific type of technician

Scrubber 2

• Performed a task at a later point in the pipeline

• Surveys dispatch queues to determine whether a dispatch is appropriate
  – Dispatch is not immediate
  – Many problems are resolved exogenously

Scrubber 3

• Scrubber 2 for new application platform

• Centralized knowledge server

• Cover twice as large a network

Implementation difficulties

• Original expert system shell no longer supported

• Knowledge base had evolved into opacity
  – Many tweaks over a decade
  – Many knowledge engineers
  – Most not available to work on Scrubber 3

Requirements

• Level of performance at least as good as prior system
  – Overall accuracy
  – False positives and false negatives in range

• Comprehensible
  – For understanding and acceptance by experts

Additional requirements (ours)

• Improved performance

• Improved extensibility

Phase I: Modeling Scrubber 2

• Applied a decision tree learning algorithm

• Input data:
  – Trouble reports
  – Scrubber 2 diagnoses

Data

26,000 trouble reports

• 40 attributes (1/2 continuous; 1/2 symbolic)

• Two classes
  – Dispatch
  – Don’t dispatch, i.e., call the customer to verify the line is OK

Background knowledge

• C4.5 selected

• 17 of 40 attributes used
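
A minimal sketch of this Phase I workflow, using scikit-learn's DecisionTreeClassifier as a stand-in for the C4.5 system the project actually used; the file name and the diagnosis column name are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical file of ~26,000 labeled trouble reports: ~40 attribute columns
# plus the Scrubber 2 diagnosis (dispatch vs. don't dispatch) as the class label.
reports = pd.read_csv("trouble_reports.csv")
X = pd.get_dummies(reports.drop(columns=["diagnosis"]))   # one-hot encode the symbolic attributes
y = reports["diagnosis"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Stand-in for C4.5: scikit-learn's CART-style decision tree learner.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
print("held-out accuracy:", tree.score(X_test, y_test))
```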

Phase I results

• Decision trees with predictive accuracy of .99, with as few as 10,000 examples

• Less than two days of work (easy!)

Phase II: Acceptance

• Comprehensibility requires readability
  – Need to observe rationality in learned knowledge
  – Original trees were on the order of 1,000 nodes

• The simpler the model, the better it can be understood

Comprehensibility = Readability + Simplicity + Fidelity

Trading off simplicity and correctness

• Pruning nodes sacrifices correctness

• Appropriate when comprehensibility is an issue

• Langley and Schwabacher, 2001

• Note: not pruning to avoid overfitting
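
A sketch of this trade-off, continuing the hypothetical scikit-learn stand-in from the Phase I sketch: capping tree size (the max_depth and ccp_alpha values here are illustrative assumptions) sacrifices some accuracy in exchange for a far smaller, more readable tree.

```python
# Grow a deliberately small tree and compare it with the unpruned one above.
small_tree = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.001, random_state=0)
small_tree.fit(X_train, y_train)

print("unpruned: %d nodes, accuracy %.3f"
      % (tree.tree_.node_count, tree.score(X_test, y_test)))
print("pruned:   %d nodes, accuracy %.3f"
      % (small_tree.tree_.node_count, small_tree.score(X_test, y_test)))
```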

Phase II results

• Used only the two most prominent attributes

• New decision trees created

• Still fell into acceptable zone

Phase III: Working toward extensibility

• Hoped to gain flexibility for
  – Local modifiability
  – Additional attribute values

• Moved toward probabilistic decision tree
  – Leaves labeled with probability estimates, not decisions
  – Stubby trees easy to represent in tabular form
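
A sketch of this idea under the same hypothetical scikit-learn stand-in (the two column names are invented placeholders for the two most prominent attributes): grow a stubby tree, read probability estimates off its leaves, and present the result as a small table.

```python
# Keep only the two most prominent attributes (column names are invented placeholders).
X2 = pd.get_dummies(reports[["mlt_test_code", "initial_status"]])
stub = DecisionTreeClassifier(max_depth=2, random_state=0)
stub.fit(X2, y)

# Each leaf carries a probability estimate rather than a hard decision;
# a stubby tree like this is easy to present in tabular form.
leaves = stub.apply(X2)                  # leaf id reached by every example
probs = stub.predict_proba(X2)[:, 0]     # probability of stub.classes_[0] at that leaf
table = pd.DataFrame({"leaf": leaves, "p_class0": probs}).drop_duplicates("leaf")
print(table)
```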

Phase IIIb: More data

• Focus on two attributes gave us access to an extensive data set
  – Many more trouble reports
  – Abridged (two-attribute) form had not been considered useful earlier

Phase III results

• Simple diagnostic model

• Greater empirical confidence -- important due to the small disjunct problem
  – “Big” general rules cover approximately 50% of the data
  – Remaining 50% covered by small disjuncts

Summarizing the success story

• C4.5 applied to induce Scrubber 2 model

• Pruned model for comprehensibility/simplicity

• Converted new model into probabilistic one

• Used newly gained data for additional tuning and confidence

• Small(?), simple model in very short time

Lessons can be learned from success

Lesson 1: the importance of comprehensibility
  – Rationality
  – Readability
  – Simplicity

Lessons can be learned from success

Lesson 2: the need for algorithms to handle small data sets
  – Creative ways to engineer interesting features from few
  – Openness to alternative sources of data
  – Algorithms specifically tuned to handle small data sets

Langley has noted this to be an issue for scientific data -- but it is true for industrial data as well

Lessons can be learned from success

Lesson 3: the need to think about systematic error
  – Locally systematic error only looks like noise with enough data
  – Clearly related to the problem of small data sets
  – How do our algorithms hold up?

Lessons can be learned from success

Lesson 4: the need to think about the future
  – Learning results put into practice will be modified and extended
  – Must new models be learned?
  – Can improvement be incremental?

Lessons can be learned from success

Lesson 5: creative uses of the technology
  – Learning for the purposes of re-engineering isn’t “standard”
  – New applications will serve to fuel new research

Further reading and acknowledgements

• Carla Brodley et al., American Scientist, Jan./Feb. 1999

• Pat Langley, various publications

• Thanks to Foster Provost and many others at Nynex / Bell Atlantic
