CS 2750 Machine Learning
Lecture 8

Classification: Logistic regression. Generative classification model.

Milos Hauskrecht
[email protected]
5329 Sennott Square

Binary classification

• Two classes: Y = {0, 1}
• Our goal is to learn to classify correctly two types of examples
  – Class 0 – labeled as 0
  – Class 1 – labeled as 1
• We would like to learn f: X → {0, 1}
• Zero-one error (loss) function:

    Error_1(x_i, y_i) = 1  if f(x_i, w) ≠ y_i
    Error_1(x_i, y_i) = 0  if f(x_i, w) = y_i

• Error we would like to minimize: E_(x,y)[Error_1(x, y)]
• First step: we need to devise a model of the function f
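The zero-one loss above can be sketched in a few lines of Python; the classifier `f` and the data here are illustrative toys, not from the lecture:

```python
def zero_one_error(f, xs, ys):
    """Average zero-one loss: counts 1 for each misclassified example, 0 otherwise."""
    return sum(1 if f(x) != y else 0 for x, y in zip(xs, ys)) / len(ys)

# Toy classifier: predict class 1 when the single feature is positive.
f = lambda x: 1 if x > 0 else 0
print(zero_one_error(f, [-2.0, -0.5, 1.0, 3.0], [0, 1, 1, 1]))  # → 0.25
```

Only the example at x = −0.5 is misclassified, so the average loss is 1/4.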
Discriminant functions

• One way to represent a classifier is by using
  – Discriminant functions
• Works for binary and multi-way classification
• Idea:
  – For every class i = 0, 1, …, k define a function g_i(x) mapping X → R
  – When the decision on input x should be made, choose the class with the highest value of g_i(x)
• So what happens with the input space? Assume a binary case.
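The "choose the class with the highest value of g_i(x)" rule can be sketched as follows; the three discriminant functions are made up for illustration:

```python
def classify(x, discriminants):
    """Return the index i of the discriminant function with the largest value g_i(x)."""
    scores = [g(x) for g in discriminants]
    return scores.index(max(scores))

# Three illustrative discriminants on a 1-D input.
g0 = lambda x: -x    # favors negative inputs
g1 = lambda x: x     # favors positive inputs
g2 = lambda x: 0.5   # constant score
print(classify(2.0, [g0, g1, g2]))   # → 1
print(classify(-3.0, [g0, g1, g2]))  # → 0
```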
Discriminant functions

[Figure: 2-D input space with two classes of points; the marked region satisfies g_1(x) ≥ g_0(x).]
Discriminant functions

[Figure: the input space is split into a region where g_1(x) ≥ g_0(x) (decide class 1) and a region where g_1(x) ≤ g_0(x) (decide class 0).]
Discriminant functions

• Define decision boundary: the set of points where g_1(x) = g_0(x)

[Figure: the two regions g_1(x) ≥ g_0(x) and g_1(x) ≤ g_0(x), separated by the decision boundary g_1(x) = g_0(x).]
Quadratic decision boundary

[Figure: a quadratic decision boundary g_1(x) = g_0(x) separating the region where g_1(x) ≥ g_0(x) from the region where g_1(x) ≤ g_0(x).]
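A quadratic boundary like the one in the figure can arise from a discriminant difference that is linear in quadratically expanded features; a minimal sketch with weights chosen by hand so the boundary is the unit circle:

```python
# Difference of discriminants on expanded features (1, x1, x2, x1^2, x2^2):
# g_1(x) - g_0(x) = w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2
# With these illustrative weights the boundary g_1(x) = g_0(x)
# is the circle x1^2 + x2^2 = 1.
w = [-1.0, 0.0, 0.0, 1.0, 1.0]

def g_diff(x1, x2):
    return w[0] + w[1] * x1 + w[2] * x2 + w[3] * x1**2 + w[4] * x2**2

print(g_diff(0.0, 0.0) < 0)  # inside the circle: class 0 region → True
print(g_diff(2.0, 0.0) > 0)  # outside the circle: class 1 region → True
```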
Logistic regression model

• Defines a linear decision boundary
• Discriminant functions:

    g_1(x) = g(w^T x)        g_0(x) = 1 − g(w^T x)

• where g(z) = 1 / (1 + e^(−z)) is a logistic function

    f(x, w) = g_1(x) = g(w^T x)

[Figure: network diagram. Inputs 1, x_1, x_2, …, x_d with weights w_0, w_1, w_2, …, w_d feed a summation unit z = w^T x, followed by the logistic function producing f(x, w).]
Logistic function

• Is also referred to as a sigmoid function
• Replaces the hard threshold function with smooth switching
• Takes a real number and outputs a number in the interval [0, 1]

    g(z) = 1 / (1 + e^(−z))

[Figure: plot of g(z) over z ∈ [−20, 20], rising smoothly from 0 to 1 with g(0) = 0.5.]
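The logistic function g(z) = 1 / (1 + e^(−z)) in code, with its switching behavior at z = 0:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # → 0.5 (the switching point)
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```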
Logistic regression model

• Discriminant functions:

    g_1(x) = g(w^T x)        g_0(x) = 1 − g(w^T x)

• Values of discriminant functions vary in [0, 1]
  – Probabilistic interpretation:

    f(x, w) = p(y = 1 | x, w) = g_1(x) = g(w^T x)

[Figure: the same network diagram, with the output now interpreted as p(y = 1 | x, w).]
Logistic regression

• We learn a probabilistic function f: X → [0, 1]

    f(x, w) = g(w^T x) = p(y = 1 | x, w)

  – where f describes the probability of class 1 given x
• Note that: p(y = 0 | x, w) = 1 − p(y = 1 | x, w)
• Transformation to binary class values:
  – If p(y = 1 | x) ≥ 1/2 then choose 1, else choose 0
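The probabilistic output and the 1/2-threshold decision rule can be sketched as follows; the weight vector here is illustrative, not learned:

```python
import math

def p_class1(x, w):
    """p(y = 1 | x, w) = g(w^T x); x and w include the bias term (x[0] = 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w):
    """Choose class 1 when p(y = 1 | x, w) >= 1/2, else choose class 0."""
    return 1 if p_class1(x, w) >= 0.5 else 0

w = [0.5, 2.0, -1.0]                 # illustrative weights [w0, w1, w2]
print(predict([1.0, 1.0, 0.0], w))   # z = 2.5 > 0 → 1
print(predict([1.0, -1.0, 1.0], w))  # z = -2.5 < 0 → 0
```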
Linear decision boundary

• Logistic regression model defines a linear decision boundary
• Why?
• Answer: Compare the two discriminant functions.
• Decision boundary: g_1(x) = g_0(x)
• For the boundary it must hold:

    log [ g_1(x) / g_0(x) ]
      = log [ (1 / (1 + exp(−w^T x))) / (exp(−w^T x) / (1 + exp(−w^T x))) ]
      = log exp(w^T x)
      = w^T x = 0
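A quick numeric check of the derivation: any x with w^T x = 0 gives g_1(x) = g_0(x) = 1/2, so the boundary is exactly the hyperplane w^T x = 0. The weights and the point below are chosen for illustration:

```python
import math

def g(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

w = [1.0, 2.0, -1.0]  # illustrative weights
x = [0.0, 0.5, 1.0]   # chosen so that w^T x = 0
z = sum(wi * xi for wi, xi in zip(w, x))
g1, g0 = g(z), 1.0 - g(z)
print(z, g1, g0)      # → 0.0 0.5 0.5, i.e. x lies on the decision boundary
```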
Logistic regression model. Decision boundary

• LR defines a linear decision boundary
• Example: 2 classes (blue and red points)

[Figure: scatter plot of blue and red points in 2-D, separated by a linear decision boundary.]
Likelihood of outputs

• Let μ_i = p(y_i = 1 | x_i, w) = g(w^T x_i)
• Then the likelihood of the outputs is

    L(D, w) = ∏_{i=1..n} μ_i^(y_i) (1 − μ_i)^(1 − y_i)

• Find weights w that maximize the likelihood of outputs
  – Apply the log-likelihood trick: the optimal weights are the same for both the likelihood and the log-likelihood
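The log-likelihood for the Bernoulli output model can be sketched as below; the weights and data are fixed for illustration, not the result of any optimization:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, xs, ys):
    """log L(D, w) = sum_i [ y_i * log(mu_i) + (1 - y_i) * log(1 - mu_i) ],
    where mu_i = p(y_i = 1 | x_i, w) = g(w^T x_i)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        mu = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        ll += y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    return ll

xs = [[1.0, -2.0], [1.0, -0.5], [1.0, 1.0], [1.0, 3.0]]  # bias feature + one input
ys = [0, 0, 1, 1]
print(log_likelihood([0.0, 1.0], xs, ys))  # a better fit gives a larger value
print(log_likelihood([0.0, 0.0], xs, ys))  # w = 0 gives 4 * log(1/2)
```

Maximizing this quantity over w (e.g. by gradient ascent) gives the same optimum as maximizing the likelihood itself, since the logarithm is monotone.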