Page 1

CS 461: Machine Learning, Lecture 4 (Winter 2009)

Dr. Kiri Wagstaff, wkiri@wkiri.com

Page 2

Plan for Today

- Solution to HW 2
- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Multilayer Perceptrons

Page 3

Review from Lecture 3

- Decision trees: regression trees, pruning, extracting rules
- Evaluation: comparing two classifiers with McNemar's test
- Support Vector Machines: classification
  - Linear discriminants, maximum margin
  - Learning (optimization): gradient descent, QP

Page 4

Neural Networks

Chapter 11

It Is Pitch Dark


Page 5

Perceptron

[Alpaydin 2004 The MIT Press]

Graphical: [Figure: perceptron, with inputs $x_1, \ldots, x_d$ (plus a bias input fixed at 1) feeding a single output $y$]

Math: the perceptron computes a weighted sum of its inputs,

$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$

where $\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$ and $\mathbf{x} = [1, x_1, \ldots, x_d]^T$ (the leading 1 feeds the bias weight $w_0$).
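To make the math concrete, here is a minimal NumPy sketch of the perceptron computation; the weights and input below are invented for illustration, not taken from the slides:

```python
import numpy as np

def perceptron_output(w, x):
    """Perceptron output y = w^T x; x carries a leading 1 for the bias w0."""
    return np.dot(w, x)

# Hypothetical weights [w0, w1, w2] and augmented input [1, x1, x2].
w = np.array([-0.5, 1.0, 1.0])
x = np.array([1.0, 0.2, 0.7])
y = perceptron_output(w, x)            # -0.5 + 0.2 + 0.7 = 0.4
print("choose C1" if y > 0 else "choose C2")
```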

Page 6

“Smooth” Output: Sigmoid Function

$y = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x}) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

1. Calculate $g(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$ and choose $C_1$ if $g(\mathbf{x}) > 0$, or
2. Calculate $y = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x})$ and choose $C_1$ if $y > 0.5$

Why?

• Converts output to probability!

• Less “brittle” boundary
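A minimal sketch of the sigmoid version of the same decision (weights and input are again invented):

```python
import numpy as np

def sigmoid(z):
    """Squash w^T x into (0, 1) so it can be read as P(C1 | x)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([-0.5, 1.0, 1.0])    # hypothetical weights, bias first
x = np.array([1.0, 0.2, 0.7])
y = sigmoid(np.dot(w, x))         # sigmoid(0.4) ~ 0.599
print("choose C1" if y > 0.5 else "choose C2")
```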

Page 7

K Outputs

Regression:

$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}$, i.e., $\mathbf{y} = \mathbf{W}\mathbf{x}$

Classification (softmax):

$o_i = \mathbf{w}_i^T \mathbf{x}$

$y_i = \dfrac{\exp o_i}{\sum_k \exp o_k}$

choose $C_i$ if $y_i = \max_k y_k$

[Alpaydin 2004 The MIT Press]
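A small NumPy sketch of the softmax classification rule; the weight matrix W and the input are made up for illustration:

```python
import numpy as np

def softmax(o):
    """Turn K linear scores o_i = w_i^T x into probabilities that sum to 1."""
    e = np.exp(o - o.max())            # shift by max for numerical stability
    return e / e.sum()

W = np.array([[ 0.1,  1.0, -0.5],     # one row [w_i0, w_i1, w_i2] per class
              [-0.3,  0.2,  0.8],
              [ 0.0, -1.0,  0.4]])
x = np.array([1.0, 0.5, 0.5])         # leading 1 feeds the bias weights
y = softmax(W @ x)                    # probabilities for C1, C2, C3
print("choose C%d" % (np.argmax(y) + 1))
```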

Page 8

Training a Neural Network

1. Randomly initialize weights
2. Update = Learning rate * (Desired - Actual) * Input

$\Delta w_j^t = \eta\,(y^t - \hat{y}^t)\,x_j^t$
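In code, one application of this update might look like the following sketch (eta is the learning rate; all values are placeholders):

```python
import numpy as np

def perceptron_update(w, x, y, y_hat, eta=0.1):
    """Apply delta_w_j = eta * (y - y_hat) * x_j to every weight at once."""
    return w + eta * (y - y_hat) * x

w = np.array([0.0, 0.0, 0.0])   # weights after step 1 (random in practice;
x = np.array([1.0, 1.0, 0.0])   # zeros here so the effect is easy to see)
w = perceptron_update(w, x, y=1.0, y_hat=0.5)
print(w)                        # [0.05 0.05 0.  ]
```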

Page 9

Learning Boolean AND

[Alpaydin 2004 The MIT Press]

$\Delta w_j^t = \eta\,(y^t - \hat{y}^t)\,x_j^t$

Perceptron demo
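As a stand-in for the demo, here is a sketch (my own, not the class demo) that trains a sigmoid perceptron on Boolean AND with the update rule above; the learning rate and epoch count are guesses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs with a leading 1 for the bias, and AND targets.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
Y = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)      # 1. randomly initialize weights
eta = 0.5

for epoch in range(1000):              # 2. repeatedly apply the update rule
    for x, y in zip(X, Y):
        y_hat = sigmoid(np.dot(w, x))
        w += eta * (y - y_hat) * x

print([int(sigmoid(np.dot(w, x)) > 0.5) for x in X])   # [0, 0, 0, 1]
```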

Page 10

Multilayer Perceptrons = MLP = ANN

[Alpaydin 2004 The MIT Press]

$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$

$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \dfrac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$
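A forward-pass sketch of these two equations; the layer sizes and random weights are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(W, V, x):
    """z_h = sigmoid(w_h^T x); y_i = v_i^T z, with bias units prepended."""
    z = sigmoid(W @ np.concatenate(([1.0], x)))    # hidden layer, H values
    return V @ np.concatenate(([1.0], z))          # linear output layer

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))   # H=3 hidden units, rows [w_h0, w_h1, w_h2]
V = rng.normal(size=(1, 4))   # K=1 output, row [v_0, v_1, v_2, v_3]
print(mlp_forward(W, V, np.array([0.5, -0.2])))
```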

Page 11

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)

[Alpaydin 2004 The MIT Press]
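One way to see this with an MLP: hand-pick hidden units for the two AND terms and an output unit for the OR. The weights below are my own choice, not necessarily those in Alpaydin's figure:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_mlp(x1, x2):
    z1 = sigmoid(-10 + 20 * x1 - 20 * x2)    # ~ (x1 AND NOT x2)
    z2 = sigmoid(-10 - 20 * x1 + 20 * x2)    # ~ (NOT x1 AND x2)
    return sigmoid(-10 + 20 * z1 + 20 * z2)  # ~ (z1 OR z2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_mlp(x1, x2)))   # prints 0, 1, 1, 0
```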

Page 12

Examples

- Digit Recognition
- Ball Balancing

Page 13

ANN vs. SVM

- SVM with sigmoid kernel = 2-layer MLP
- Parameters
  - ANN: # hidden layers, # nodes
  - SVM: kernel, kernel params, C
- Optimization
  - ANN: local minimum (gradient descent)
  - SVM: global minimum (QP)
- Interpretability? About the same... So why SVMs?
  - Sparse solution, geometric interpretation, less likely to overfit the data

Page 14

Summary: Key Points for Today

- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Sigmoid
  - Training by gradient descent
  - Multilayer Perceptrons
- ANN vs. SVM

Page 15

Next Time

Midterm Exam!
- 9:10 – 10:40 a.m.
- Open book, open notes (no computer)
- Covers all material through today

Neural Networks (read Ch. 11.1-11.8)
- Questions to answer from the reading
- Posted on the website (calendar)
- Three volunteers?