Page 1: L16 Machine Learning


241-320 Design Architecture and Engineering for Intelligent System

Suntorn Witosurapot

Contact Address: Phone: 074 287369 or

Email: [email protected]

January 2010

Page 2: L16 Machine Learning


Lecture 16: 

Machine Learning - Part 1 -

(Learning from Observations) 

Page 3: L16 Machine Learning


Motivation

An AI agent operating in a complex world requires an awful lot of knowledge:

 – state representations, constraints, action descriptions, heuristics, probabilities, ...

More and more, AI agents are designed to acquire knowledge through learning

Page 4: L16 Machine Learning


Outline

What is Learning?

Learning Agents

Introduction to inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 5: L16 Machine Learning


What’s Learning?

Learning is essential for unknown environments

 – i.e., when the designer lacks omniscience

Learning is useful as a system-construction method

 – i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance

Page 6: L16 Machine Learning


Outline

What is Learning?

Learning Agents

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 7: L16 Machine Learning


Learning agents

(figure not reproduced)

Page 8: L16 Machine Learning


Learning agents (cont.)

Main idea:

 – agents should use their percepts not only for acting, but also for improving their future ability to act

Wide range of methods

A major design issue is the type of feedback that will be available to the agent

Page 9: L16 Machine Learning


Types of learning from feedback

Supervised learning

 – Given a set of example inputs and outputs

 – Goal is to learn a function relating the two

(e.g., the Decision Tree technique)

Unsupervised learning

 – Given inputs, but no outputs

 – Goal is to group input into different classes

(i.e., separating data into groups, e.g., the Nearest Neighbor technique)

Page 10: L16 Machine Learning


Examples

Supervised learning

 – Taxi learning to brake with instructor

 – Spam filter

Unsupervised learning

 – Market research

 – Data mining

Page 11: L16 Machine Learning


Other factors affecting learning

Representation of learned information

Availability of prior knowledge

Page 12: L16 Machine Learning


Outline

Why learning?

Types of learning

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 13: L16 Machine Learning


Inductive Learning

Learning from events or feedback in which only partial data or truth values are known, while trying to find or estimate the true (or close-to-true) values of the remaining data.

Ex: how a salesperson learns:

 – by studying the customer's behavior, personality, and interests during a product demonstration,

 – the salesperson may discover the customer's real needs, and so succeed in making the sale.

Page 14: L16 Machine Learning


Inductive learning

Simplest form: learn a function from examples

Let's call an example a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x

Pure inductive inference:

 – Given a collection of examples (aka a training set) of f, return a function h that approximates f

 – h is called a hypothesis

How can we tell if a hypothesis is good?
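The definitions above can be made concrete in a few lines. This is a minimal sketch (function and variable names hypothetical): a candidate hypothesis h agrees with f on the training set if it reproduces every (x, f(x)) pair.

```python
# Hypothetical helper: h is consistent with the training set if it agrees
# with f on every example (x, f(x)).
def is_consistent(h, examples):
    return all(h(x) == y for x, y in examples)

# Samples of an unknown f; here f happens to be x**2.
examples = [(0, 0), (1, 1), (2, 4), (3, 9)]

print(is_consistent(lambda x: x * x, examples))  # True
print(is_consistent(lambda x: 3 * x, examples))  # False (disagrees at x=1)
```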

Page 15: L16 Machine Learning


Example

Construct/adjust h  to agree with f  on training set

(h is consistent if it agrees with f on all examples)

 

E.g., curve fitting (figure not reproduced):

Pages 16-18 repeat the same "Example" slide text over successive curve-fitting figures (not reproduced), showing several hypotheses of increasing complexity, each constructed to agree with f on the same training set.

Q: How do we decide among these hypotheses that all agree with our data?

Page 19: L16 Machine Learning


What we desire from a hypothesis

Since we will use the hypothesis h most often to predict the output of f(x) on examples we haven't seen yet, we want it to do well on

these. We call this generalization

Ideally we would like to find an h such that

h = f 

Page 20: L16 Machine Learning


Tradeoff: complexity vs data-fit

Generally, the larger and more complex the hypothesis is, the better we can fit our data

However, we need to take into account the computational complexity of learning

 – Fitting straight lines = easy

 – Fitting high-degree polynomials = harder

Also want to take into account how hard it is to use h.

 – Prefer fast computation time

Learning typically focuses on “simple” representations
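The "easy" end of this tradeoff can be sketched concretely: fitting a straight line h(x) = a·x + b by closed-form least squares. The data below are made-up toy values chosen to lie exactly on a line.

```python
# Minimal least-squares line fit (the "fitting straight lines = easy" case).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n                     # mean of inputs
    my = sum(ys) / n                     # mean of outputs
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)   # slope
    b = my - a * mx                      # intercept
    return a, b

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]        # toy data, exactly on y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)              # 2.0 1.0
```

Fitting a high-degree polynomial would instead require solving a larger linear system, one illustration of why complex hypothesis classes cost more to learn.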

Page 21: L16 Machine Learning


Outline

Why learning?

Basic Ideas

Inductive learning

Logic-based inductive learning:

 – Decision-tree induction

Function-based inductive learning

 – Neural nets

Page 22: L16 Machine Learning


Logic-Based Inductive Learning: Decision Tree Method

It is a supervised learning technique

 – used to predict or forecast events that will happen, under the various situations that arise, using the outcomes of decisions made by following a tree-structured chart of the data

Widely used algorithm (even in our daily life)

Structure of Decision Tree:

 – Root & leaves connected by branches

 – Searching along any branch depends on the situation

Page 23: L16 Machine Learning


Ex: A More Complex Decision Tree

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
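One way to encode a single training example from this domain is a mapping from attribute names to values, plus the target label. The particular values below are illustrative only, not taken from the (unreproduced) example table.

```python
# One hypothetical training example: the ten attributes from the slide,
# plus the classification WillWait that the learner must predict.
example = {
    "Alternate": True, "Bar": False, "Fri/Sat": False, "Hungry": True,
    "Patrons": "Some", "Price": "$$$", "Raining": False,
    "Reservation": True, "Type": "French", "WaitEstimate": "0-10",
}
will_wait = True   # target label: positive (T)

print(len(example))   # 10 attributes
```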

Page 24: L16 Machine Learning


Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous), e.g., situations where I will/won't wait for a table (table not reproduced):

Classification of examples is positive (T) or negative (F)

Page 25: L16 Machine Learning


Decision trees

One possible representation for hypotheses

E.g., here is the “true” tree for deciding whether to wait (figure not reproduced):

Page 26: L16 Machine Learning


Decision trees

Another possible representation for hypotheses

This decision tree looks less complex and more realistic than the one on the previous slide

 – It considers hungriness, rather than the estimated waiting time

Page 27: L16 Machine Learning


Decision trees

Occam’s razor: prefer the simplest hypothesis consistent with data

By the criterion of “Occam's razor” above, the smallest decision tree should be the best one.

But the process of building a decision tree grows much more complex as the number of nodes involved increases (hypothesis spaces).

How many distinct decision trees with n Boolean attributes?

= number of Boolean functions = number of distinct truth tables with 2^n rows = 2^(2^n)

 

E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees
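The count is easy to reproduce: each of the 2^n truth-table rows can be labeled True or False independently, which gives 2^(2^n) distinct Boolean functions.

```python
def num_boolean_functions(n):
    # 2**n truth-table rows, each independently True or False.
    return 2 ** (2 ** n)

print(num_boolean_functions(6))   # 18446744073709551616, matching the slide
```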

Page 28: L16 Machine Learning


Expressiveness

Decision trees can express any function of the input attributes.

E.g., for Boolean functions, truth table row → path to leaf:

Trivially, there is a consistent decision tree for any training set with one path to a leaf for each example, but it probably won't generalize to new examples

Prefer to find more compact decision trees

Page 29: L16 Machine Learning


Decision tree learning

Aim: find a small tree consistent with the training examples

Idea: (recursively) choose the "most significant" attribute as the root of each (sub)tree
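The recursive idea can be sketched as follows. This is a simplified skeleton, not the full textbook DTL algorithm: the attribute chooser is left pluggable, examples are (attribute-dict, label) pairs, and all names and data are hypothetical.

```python
from collections import Counter

def learn_tree(examples, attributes, choose_attribute):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:             # all examples agree: make a leaf
        return labels[0]
    if not attributes:                    # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(examples, attributes)   # "most significant"
    rest = [a for a in attributes if a != best]
    subtree = {}
    for v in {x[best] for x, _ in examples}:        # one branch per value
        subset = [(x, y) for x, y in examples if x[best] == v]
        subtree[v] = learn_tree(subset, rest, choose_attribute)
    return (best, subtree)

# Toy run with a trivial chooser that just takes the first attribute:
data = [({"Hungry": "Yes"}, True), ({"Hungry": "No"}, False)]
tree = learn_tree(data, ["Hungry"], lambda ex, attrs: attrs[0])
print(tree)   # ('Hungry', {...}) with branches Yes -> True, No -> False
```

In the real algorithm, choose_attribute would pick the attribute with the largest information gain, as the following slides develop.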

Page 30: L16 Machine Learning


Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

Patrons? is a better choice

Page 31: L16 Machine Learning


Choosing an attribute via Information Theory

To implement Choose-Attribute in the DTL algorithm,

we need to compute the Information Content (Entropy):

I(P(v1), … , P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)

i.e., the average entropy of a data set = the sum, over each possible value, of −P(value) × log2(P(value))

This entropy value is used to assess whether various sets of "information content" are similar or different

 – It helps decide whether identical branches can be removed when converting the data table into a tree diagram

 – See the coin-toss example on the next slide

Page 32: L16 Machine Learning


Ex: Coin-toss data

Data set (M) = {heads, tails}

Probabilities of getting heads and tails = P(heads), P(tails) respectively

Average entropy of this data set = I(M)

Note that when the tosses come up all heads or all tails, the entropy is zero; it then rises steadily, peaking when heads and tails are equally likely. Therefore,

a low entropy indicates that the items in a data set differ little (or may all be of one kind), while in the opposite case of high entropy the items differ greatly.
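The coin-toss numbers can be checked directly against the entropy formula from the previous slide (a small sketch; the 0.9/0.1 biased coin is an added illustrative case):

```python
from math import log2

def I(probs):
    """Information content I(P(v1), ..., P(vn)) = sum of -P(vi)*log2 P(vi)."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(I([1.0, 0.0]))            # 0.0   -> all heads: no uncertainty
print(I([0.5, 0.5]))            # 1.0   -> fair coin: maximum entropy
print(round(I([0.9, 0.1]), 3))  # 0.469 -> biased coin: in between
```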

Page 33: L16 Machine Learning


Information gain

A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values.

Information Gain (IG), or reduction in entropy from the attribute test:

 – computed as the overall entropy minus the entropy that remains after one attribute is chosen as the root

Choose the attribute with the largest IG

 – to serve as the "root" for the next stage of the decision

remainder(A) = Σ_{i=1..v} [ (p_i + n_i) / (p + n) ] · I( p_i/(p_i + n_i), n_i/(p_i + n_i) )

IG(A) = I( p/(p + n), n/(p + n) ) − remainder(A)

Page 34: L16 Machine Learning


Information gain

For the training set,  p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

Since Patrons has the highest IG of all attributes, it is chosen by the DTL algorithm as the root

IG(Patrons) = 1 − [ (2/12) I(0, 1) + (4/12) I(1, 0) + (6/12) I(2/6, 4/6) ] ≈ 0.541 bits

IG(Type) = 1 − [ (2/12) I(1/2, 1/2) + (2/12) I(1/2, 1/2) + (4/12) I(2/4, 2/4) + (4/12) I(2/4, 2/4) ] = 0 bits
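These two results can be reproduced with a short script (a sketch: here I(p, n) takes positive/negative counts rather than probabilities, and each subset of the split is given as a (p_i, n_i) count pair):

```python
from math import log2

def I(p, n):
    """Entropy (bits) of a set with p positive and n negative examples."""
    return sum(-c / (p + n) * log2(c / (p + n)) for c in (p, n) if c > 0)

def IG(p, n, subsets):
    """IG(A) = I(p, n) - remainder(A), with subsets = [(p_i, n_i), ...]."""
    remainder = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in subsets)
    return I(p, n) - remainder

# Patrons splits the 12 examples into None=(0,2), Some=(4,0), Full=(2,4):
print(round(IG(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))              # 0.541
# Type splits them into four subsets that are each half positive:
print(round(IG(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))      # 0.0
```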

Page 35: L16 Machine Learning


Example (cont.)

Decision tree learned from the 12 examples (figure not reproduced):

Substantially simpler than the “true” tree – a more complex hypothesis isn't justified by the small amount of data

Page 36: L16 Machine Learning


Performance measurement

Q: How do we know that h ≈ f ?

1. Use theorems of computational/statistical learning theory

2. Try h on a new test set of examples

(use same distribution over example space as training set)

Learning curve = % correct on the test set as a function of training set size
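The mechanics of a learning curve can be sketched with a toy majority-class learner (all names and data hypothetical; a real curve would use the decision-tree learner, average over many random train/test splits, and typically rise toward an asymptote):

```python
from collections import Counter

def fit_majority(train):
    """Toy learner: always predict the training set's majority label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def accuracy(h, test):
    return sum(h(x) == y for x, y in test) / len(test)

def learning_curve(train, test, sizes):
    """% correct on a fixed test set as a function of training-set size."""
    return [accuracy(fit_majority(train[:m]), test) for m in sizes]

train = [(i, i % 3 != 0) for i in range(12)]       # label True unless i%3==0
test = [(10, True), (11, True), (0, False), (4, True)]
print(learning_curve(train, test, [1, 6, 12]))     # [0.25, 0.75, 0.75]
```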

Page 37: L16 Machine Learning


Summary

Learning is needed for unknown environments, and for lazy designers

Learning agent = feedback + learning element +

performance element

For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples

Decision tree learning using information gain

Learning performance = prediction accuracymeasured on test set

Page 38: L16 Machine Learning


Reading

Chapter 10: Machine Learning

Chapter 6: Machine Learning (pp. 153 - 163)