Page 1: L16 Machine Learning


241-320 Design Architecture and Engineering for Intelligent System

Suntorn Witosurapot

Contact Address: Phone: 074 287369 or

Email: [email protected]

January 2010

Page 2: L16 Machine Learning


Lecture 16: 

Machine Learning - Part 1 -

(Learning from Observations) 

Page 3: L16 Machine Learning


Motivation

An AI agent operating in a complex world requires an awful lot of knowledge:

 – state representations, constraints, action descriptions, heuristics, probabilities, ...

More and more, AI agents are designed to acquire knowledge through learning

Page 4: L16 Machine Learning


Outline

What is Learning?

Learning Agents

Introduction to inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 5: L16 Machine Learning


What’s Learning?

Learning is essential for unknown environments

 – i.e., when the designer lacks omniscience

Learning is useful as a system-construction method

 – i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance

Page 6: L16 Machine Learning


Outline

What is Learning?

Learning Agents

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 7: L16 Machine Learning


Learning agents

(figure not reproduced)

Page 8: L16 Machine Learning


Learning agents (cont.)

Main idea:

 – agents should use their percepts not only for acting, but also for improving their future ability to act

Wide range of methods

A major design issue is the type of feedback that will be available to the agent

Page 9: L16 Machine Learning


Types of learning from feedback

Supervised learning

 – Given a set of example inputs and outputs

 – Goal is to learn a function relating the two

(e.g., the Decision Tree technique)

Unsupervised learning

 – Given inputs, but no outputs

 – Goal is to group input into different classes

(i.e., separating data into groups, e.g., the Nearest Neighbor technique)

Page 10: L16 Machine Learning


Examples

Supervised learning

 – Taxi learning to brake with instructor

 – Spam filter

Unsupervised learning

 – Market research

 – Data mining

Page 11: L16 Machine Learning


Other factors affecting learning

Representation of learned information

Availability of prior knowledge

Page 12: L16 Machine Learning


Outline

Why learning?

Types of learning

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 13: L16 Machine Learning


Inductive Learning

Learning from events or feedback in which only partial data or truth values are known, while trying to find or estimate the true (or close-to-true) values of the remaining data.

Ex: how a salesperson learns:

 – by studying the customer's behavior, personality, and interests during a product demonstration,

 – the salesperson may discover the customer's real needs, and so succeed in making the sale.

Page 14: L16 Machine Learning


Inductive learning

Simplest form: learn a function from examples

Let's call an example a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x

Pure inductive inference:

 – Given a collection of examples (aka a training set) of f, return a function h that approximates f

 – h is called a hypothesis

How can we tell if a hypothesis is good?
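The definitions above can be made concrete in a few lines. This is a minimal sketch (function and variable names hypothetical): a candidate hypothesis h agrees with f on the training set if it reproduces every (x, f(x)) pair.

```python
# Hypothetical helper: h is consistent with the training set if it agrees
# with f on every example (x, f(x)).
def is_consistent(h, examples):
    return all(h(x) == y for x, y in examples)

# Samples of an unknown f; here f happens to be x**2.
examples = [(0, 0), (1, 1), (2, 4), (3, 9)]

print(is_consistent(lambda x: x * x, examples))  # True
print(is_consistent(lambda x: 3 * x, examples))  # False (disagrees at x=1)
```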

Page 15: L16 Machine Learning


Example

Construct/adjust h  to agree with f  on training set

(h is consistent if it agrees with f on all examples)

 

E.g., curve fitting (figure not reproduced):

Pages 16-18 repeat the same "Example" slide text over successive curve-fitting figures (not reproduced), showing several hypotheses of increasing complexity, each constructed to agree with f on the same training set.

Q: How do we decide among these hypotheses that all agree with our data?

Page 19: L16 Machine Learning


What we desire from a hypothesis

Since we will use the hypothesis h most often to predict the output of f(x) on examples we haven't seen yet, we want it to do well on

these. We call this generalization

Ideally we would like to find an h such that

h = f 

Page 20: L16 Machine Learning


Tradeoff: complexity vs data-fit

Generally, the larger and more complex the hypothesis is, the better we can fit our data

However, we need to take into account the computational complexity of learning

 – Fitting straight lines = easy

 – Fitting high-degree polynomials = harder

Also want to take into account how hard it is to use h.

 – Prefer fast computation time

Learning typically focuses on “simple” representations
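The "easy" end of this tradeoff can be sketched concretely: fitting a straight line h(x) = a·x + b by closed-form least squares. The data below are made-up toy values chosen to lie exactly on a line.

```python
# Minimal least-squares line fit (the "fitting straight lines = easy" case).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n                     # mean of inputs
    my = sum(ys) / n                     # mean of outputs
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)   # slope
    b = my - a * mx                      # intercept
    return a, b

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]        # toy data, exactly on y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)              # 2.0 1.0
```

Fitting a high-degree polynomial would instead require solving a larger linear system, one illustration of why complex hypothesis classes cost more to learn.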

Page 21: L16 Machine Learning


Outline

Why learning?

Basic Ideas

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 22: L16 Machine Learning


Logic-Based Inductive Learning: Decision Tree Method

It is a supervised learning technique

 – used to predict or forecast events that will happen, under the various situations that arise, using the outcomes of decisions made by following a tree-structured chart of the data

Widely used algorithm (even in our daily life)

Structure of Decision Tree:

 – Root & leaves connected by branches

 – Searching along any branch depends on the situation

Page 23: L16 Machine Learning


Ex: A More Complex Decision Tree

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
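One way to encode a single training example from this domain is a mapping from attribute names to values, plus the target label. The particular values below are illustrative only, not taken from the (unreproduced) example table.

```python
# One hypothetical training example: the ten attributes from the slide,
# plus the classification WillWait that the learner must predict.
example = {
    "Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
    "Patrons": "Some", "Price": "$$$", "Raining": False,
    "Reservation": True, "Type": "French", "WaitEstimate": "0-10",
}
will_wait = True   # target label: positive (T)

print(len(example))   # 10 attributes
```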

Page 24: L16 Machine Learning


Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous), e.g., situations where I will/won't wait for a table (table not reproduced):

Classification of examples is positive (T) or negative (F)

Page 25: L16 Machine Learning


Decision trees

One possible representation for hypotheses

E.g., here is the “true” tree for deciding whether to wait (figure not reproduced):

Page 26: L16 Machine Learning


Decision trees

Another possible representation for hypotheses

This decision tree looks less complex and more realistic than the one on the previous slide

 – It considers hungriness, rather than the estimated waiting time

Page 27: L16 Machine Learning


Decision trees

Occam’s razor: prefer the simplest hypothesis consistent with data

By the criterion of “Occam's razor” above, the smallest decision tree should be the best one.

But the process of building a decision tree grows much more complex as the number of nodes involved increases (hypothesis spaces).

How many distinct decision trees with n Boolean attributes?

= number of Boolean functions = number of distinct truth tables with 2^n rows = 2^(2^n)

 

E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
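The count is easy to reproduce: each of the 2^n truth-table rows can be labeled True or False independently, which gives 2^(2^n) distinct Boolean functions.

```python
def num_boolean_functions(n):
    # 2**n truth-table rows, each independently True or False.
    return 2 ** (2 ** n)

print(num_boolean_functions(6))   # 18446744073709551616, matching the slide
```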

Page 28: L16 Machine Learning


Expressiveness

Decision trees can express any function of the input attributes.

E.g., for Boolean functions, truth table row → path to leaf:

Trivially, there is a consistent decision tree for any training set with one path to a leaf for each example, but it probably won't generalize to new examples

Prefer to find more compact decision trees

Page 29: L16 Machine Learning


Decision tree learning

Aim: find a small tree consistent with the training examples

Idea: (recursively) choose the "most significant" attribute as the root of each (sub)tree
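The recursive idea can be sketched as follows. This is a simplified skeleton, not the full textbook DTL algorithm: the attribute chooser is left pluggable, examples are (attribute-dict, label) pairs, and all names and data are hypothetical.

```python
from collections import Counter

def learn_tree(examples, attributes, choose_attribute):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:             # all examples agree: make a leaf
        return labels[0]
    if not attributes:                    # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(examples, attributes)   # "most significant"
    rest = [a for a in attributes if a != best]
    subtree = {}
    for v in {x[best] for x, _ in examples}:        # one branch per value
        subset = [(x, y) for x, y in examples if x[best] == v]
        subtree[v] = learn_tree(subset, rest, choose_attribute)
    return (best, subtree)

# Toy run with a trivial chooser that just takes the first attribute:
data = [({"Hungry": "Yes"}, True), ({"Hungry": "No"}, False)]
tree = learn_tree(data, ["Hungry"], lambda ex, attrs: attrs[0])
print(tree)   # ('Hungry', {...}) with branches Yes -> True, No -> False
```

In the real algorithm, choose_attribute would pick the attribute with the largest information gain, as the following slides develop.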

Page 30: L16 Machine Learning


Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Patrons? is a better choice

Page 31: L16 Machine Learning


Choosing an attribute via Information Theory

To implement Choose-Attribute in the DTL algorithm,

we need to compute the Information Content (Entropy):

I(P(v1), … , P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)

i.e., the average entropy of a data set = the sum, over each possible value, of −P(value) × log2(P(value))

This entropy value is used to assess whether various sets of "information content" are similar or different

 – It helps decide whether identical branches can be removed when converting the data table into a tree diagram

 – See the coin-toss example on the next slide

Page 32: L16 Machine Learning


Ex: Coin-toss data

Data set (M) = {heads, tails}

Probabilities of getting heads and tails = P(heads), P(tails) respectively

Average entropy of this data set = I(M)

Note that when the tosses come up all heads or all tails, the entropy is zero; it then rises steadily, peaking when heads and tails are equally likely. Therefore,

a low entropy indicates that the items in a data set differ little (or may all be of one kind), while in the opposite case of high entropy the items differ greatly.
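The coin-toss numbers can be checked directly against the entropy formula from the previous slide (a small sketch; the 0.9/0.1 biased coin is an added illustrative case):

```python
from math import log2

def I(probs):
    """Information content I(P(v1), ..., P(vn)) = sum of -P(vi)*log2 P(vi)."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(I([1.0, 0.0]))            # 0.0   -> all heads: no uncertainty
print(I([0.5, 0.5]))            # 1.0   -> fair coin: maximum entropy
print(round(I([0.9, 0.1]), 3))  # 0.469 -> biased coin: in between
```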

Page 33: L16 Machine Learning


Information gain

A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values.

Information Gain (IG), or reduction in entropy from the attribute test:

 – computed as the overall entropy minus the entropy that remains after one attribute is chosen as the root

Choose the attribute with the largest IG

 – to serve as the "root" for the next stage of the decision

remainder(A) = Σ_{i=1..v} [ (p_i + n_i) / (p + n) ] · I( p_i/(p_i + n_i), n_i/(p_i + n_i) )

IG(A) = I( p/(p + n), n/(p + n) ) − remainder(A)

Page 34: L16 Machine Learning


Information gain

For the training set,  p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

Since Patrons has the highest IG of all attributes, it is chosen by the DTL algorithm as the root

IG(Patrons) = 1 − [ (2/12) I(0, 1) + (4/12) I(1, 0) + (6/12) I(2/6, 4/6) ] ≈ 0.541 bits

IG(Type) = 1 − [ (2/12) I(1/2, 1/2) + (2/12) I(1/2, 1/2) + (4/12) I(2/4, 2/4) + (4/12) I(2/4, 2/4) ] = 0 bits
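These two results can be reproduced with a short script (a sketch: here I(p, n) takes positive/negative counts rather than probabilities, and each subset of the split is given as a (p_i, n_i) count pair):

```python
from math import log2

def I(p, n):
    """Entropy (bits) of a set with p positive and n negative examples."""
    return sum(-c / (p + n) * log2(c / (p + n)) for c in (p, n) if c > 0)

def IG(p, n, subsets):
    """IG(A) = I(p, n) - remainder(A), with subsets = [(p_i, n_i), ...]."""
    remainder = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in subsets)
    return I(p, n) - remainder

# Patrons splits the 12 examples into None=(0,2), Some=(4,0), Full=(2,4):
print(round(IG(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))              # 0.541
# Type splits them into four subsets that are each half positive:
print(round(IG(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))      # 0.0
```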

Page 35: L16 Machine Learning


Example (cont.)

Decision tree learned from the 12 examples (figure not reproduced):

Substantially simpler than the “true” tree – a more complex hypothesis isn't justified by the small amount of data

Page 36: L16 Machine Learning


Performance measurement

Q: How do we know that h ≈ f ?

1. Use theorems of computational/statistical learning theory

2. Try h on a new test set of examples

(use same distribution over example space as training set)

Learning curve = % correct on the test set as a function of training set size
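The mechanics of a learning curve can be sketched with a toy majority-class learner (all names and data hypothetical; a real curve would use the decision-tree learner, average over many random train/test splits, and typically rise toward an asymptote):

```python
from collections import Counter

def fit_majority(train):
    """Toy learner: always predict the training set's majority label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def accuracy(h, test):
    return sum(h(x) == y for x, y in test) / len(test)

def learning_curve(train, test, sizes):
    """% correct on a fixed test set as a function of training-set size."""
    return [accuracy(fit_majority(train[:m]), test) for m in sizes]

train = [(i, i % 3 != 0) for i in range(12)]       # label True unless i%3==0
test = [(10, True), (11, True), (0, False), (4, True)]
print(learning_curve(train, test, [1, 6, 12]))     # [0.25, 0.75, 0.75]
```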

Page 37: L16 Machine Learning


Summary

Learning is needed for unknown environments, and for lazy designers

Learning agent = feedback + learning element +

performance element

For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples

Decision tree learning using information gain

Learning performance = prediction accuracymeasured on test set

Page 38: L16 Machine Learning


Reading

Chapter 10: Machine Learning

Chapter 6: Machine Learning (pp. 153 - 163)