IEOR165 Discussion Week 7 Sheng Liu University of California, Berkeley Mar 3, 2016
Outline
1 Nadaraya-Watson Estimator
2 Partially Linear Model
3 Support Vector Machine
Nadaraya-Watson Estimator: Motivation
Given the data {(X1, Y1), . . . , (Xn, Yn)}, we want to find the relationship between Yi (response) and Xi (predictor):
Yi = g(Xi) + εi
Basically, we want to find g(x), which is
g(x) = E(Yi|Xi = x)
where we assume E(εi|Xi) = 0.
Parametric methods: assume the form of g(x), e.g. linear, polynomial, exponential... Then we only need to estimate the parameters that characterize g(x).
Nonparametric methods: make no assumptions about the form of g(x), which implies the number of parameters is infinite...
An Intuitive Nonparametric Method
K-nearest neighbor average: let $N_k(x)$ denote the set of the $k$ points $X_i$ closest to $x$. Then

$$\hat{g}(x) = \frac{1}{k} \sum_{X_i \in N_k(x)} Y_i = \sum_{i=1}^{n} \frac{I(X_i \in N_k(x))}{k} \cdot Y_i = \sum_{i=1}^{n} \frac{I(X_i \in N_k(x))}{\sum_{i=1}^{n} I(X_i \in N_k(x))} \cdot Y_i = \frac{\sum_{i=1}^{n} I(X_i \in N_k(x)) \cdot Y_i}{\sum_{i=1}^{n} I(X_i \in N_k(x))}$$
You can compare this with our intuitive method for estimating a pdf
Not continuous and not differentiable
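As a toy illustration, the k-nearest neighbor average takes only a few lines. This is a sketch on simulated data; the `knn_regress` name and the sin-plus-noise example are assumptions for illustration only.

```python
import numpy as np

def knn_regress(x, X, Y, k=5):
    """k-nearest-neighbor average: mean of the Y_i over the k closest X_i to x."""
    idx = np.argsort(np.abs(X - x))[:k]  # indices of the k nearest neighbors
    return Y[idx].mean()

# simulated data: Y = sin(X) + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 100)
Y = np.sin(X) + 0.1 * rng.normal(size=100)
print(knn_regress(3.0, X, Y, k=10))  # roughly sin(3)
```

Evaluating this estimate on a fine grid of x values makes the step-like, non-differentiable behavior visible.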
Nadaraya-Watson Estimator
To achieve smoothness, replace the indicator function with a kernel function:

$$\hat{g}(x) = \frac{\sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right) \cdot Y_i}{\sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right)}$$
Then we get the Nadaraya-Watson Estimator.
Weighted average: kernel weights
Relation to kernel density estimate
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right), \qquad \hat{f}(x, y) = \frac{1}{nh^2} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right) K\left(\frac{Y_i - y}{h}\right)$$

And by the formula

$$g(x) = E(Y \mid X = x) = \int y f(y \mid x) \, dy = \int y \frac{f(x, y)}{f(x)} \, dy$$

Substituting the kernel density estimates into this formula recovers the Nadaraya-Watson estimator.
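A minimal sketch of the estimator with a Gaussian kernel (the simulated data and the bandwidth h = 0.3 are assumptions for illustration):

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Nadaraya-Watson: kernel-weighted average of the Y_i."""
    w = np.exp(-0.5 * ((X - x) / h) ** 2)  # Gaussian kernel (normalizing constant cancels)
    return np.sum(w * Y) / np.sum(w)

# simulated data: Y = sin(X) + noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, 200)
Y = np.sin(X) + 0.1 * rng.normal(size=200)
print(nw_estimate(3.0, X, Y, h=0.3))  # roughly sin(3)
```

Unlike the k-NN average, this estimate varies smoothly in x, since the Gaussian weights are smooth functions of x.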
Partially Linear Model
A compromise between parametric and nonparametric models
Yi = Xiβ + g(Zi) + εi
Predictor: (Xi, Zi)
A parametric part: linear model
A nonparametric part: g(Zi)
Estimation: taking conditional expectations given Zi,

$$E(Y_i \mid Z_i) = E(X_i \mid Z_i)\beta + g(Z_i)$$

$$\Rightarrow \quad Y_i - E(Y_i \mid Z_i) = (X_i - E(X_i \mid Z_i))\beta + \varepsilon_i$$

For E(Yi|Zi) and E(Xi|Zi), use Nadaraya-Watson estimators; then estimate β by least squares on the residuals
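A minimal sketch of this two-stage procedure, assuming g(Z) = sin(Z), a Gaussian-kernel Nadaraya-Watson smoother, and scalar X and Z (all names and parameter values are illustrative assumptions):

```python
import numpy as np

def nw(z0, Z, V, h):
    """Nadaraya-Watson estimate of E(V | Z = z0) with a Gaussian kernel."""
    w = np.exp(-0.5 * ((Z - z0) / h) ** 2)
    return np.sum(w * V) / np.sum(w)

rng = np.random.default_rng(2)
n = 300
Z = rng.uniform(0, 3, n)
X = rng.normal(size=n) + 0.5 * Z              # X correlated with Z
beta = 2.0
Y = X * beta + np.sin(Z) + 0.1 * rng.normal(size=n)  # g(Z) = sin(Z)

h = 0.2
Ey = np.array([nw(z, Z, Y, h) for z in Z])    # estimate of E(Y_i | Z_i)
Ex = np.array([nw(z, Z, X, h) for z in Z])    # estimate of E(X_i | Z_i)

# least squares of (Y - E(Y|Z)) on (X - E(X|Z)) recovers beta
ry, rx = Y - Ey, X - Ex
beta_hat = np.sum(rx * ry) / np.sum(rx * rx)
print(beta_hat)  # close to the true beta = 2.0
```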
Classification
When our response is qualitative, or categorical, predicting the response is actually doing classification.
Predictor: Income and Balance (X)
Response: Default or not (Y)
Classification
How to classify? Draw a hyperplane X′β + β0 = 0
Support Vector Machine
How to find the optimal hyperplane X′β + β0 = 0 when the data are perfectly separable?
Objective: maximize the margin = minimize ||β||
Constraints: separate the data correctly
Support Vector Machine
$$\text{If } X_i'\beta + \beta_0 \geq 1, \; y_i = 1; \qquad \text{if } X_i'\beta + \beta_0 \leq -1, \; y_i = -1$$

So the constraint is

$$y_i(X_i'\beta + \beta_0) \geq 1 \quad \forall i = 1, \dots, n$$

The optimization problem is

$$\min_{\beta, \beta_0} \; \|\beta\| \quad \text{s.t.} \quad y_i(X_i'\beta + \beta_0) \geq 1 \quad \forall i = 1, \dots, n$$

or

$$\min_{\beta, \beta_0} \; \|\beta\|^2 \quad \text{s.t.} \quad y_i(X_i'\beta + \beta_0) \geq 1 \quad \forall i = 1, \dots, n$$
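The hard-margin problem is a small quadratic program. As a sketch, it can be handed to a generic constrained solver; the toy 2-D data and the choice of SciPy's SLSQP method are illustrative assumptions, not the dedicated QP solvers used in practice.

```python
import numpy as np
from scipy.optimize import minimize

# toy separable data in R^2: label +1 above the line x1 + x2 = 2, else -1
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],
              [0.0, 0.0], [1.0, 0.5], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# decision variables v = (beta_1, beta_2, beta_0); minimize ||beta||^2
def objective(v):
    return v[0] ** 2 + v[1] ** 2

# one inequality per point: y_i (x_i' beta + beta_0) - 1 >= 0
cons = [{"type": "ineq",
         "fun": (lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1.0)}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=cons)
beta, beta0 = res.x[:2], res.x[2]
print(beta, beta0)  # all margins y_i(X_i'beta + beta0) end up >= 1
```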
Support Vector Machine
When the data are not perfectly separable:
Objective: minimize ||β|| and the classification errors
Constraints: separate the data correctly, allowing some errors
Support Vector Machine
When not perfectly separable, the optimization problem is

$$\min_{\beta, \beta_0, \xi} \; \|\beta\|^2 + \lambda \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(X_i'\beta + \beta_0) \geq 1 - \xi_i, \;\; \xi_i \geq 0 \quad \forall i = 1, \dots, n$$
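For a quick check of the soft-margin formulation, one can fit a linear SVM with scikit-learn on simulated overlapping classes. Note that scikit-learn's `SVC` minimizes ½‖β‖² + C Σξi, so its C parameter plays the role of λ here, up to a constant scaling; the data below are an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# two overlapping Gaussian classes (not perfectly separable)
X_pos = rng.normal(loc=[2, 2], scale=1.0, size=(50, 2))
X_neg = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1.0)  # C penalizes the slack, like lambda above
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # fitted beta and beta_0
print(clf.score(X, y))             # training accuracy (below 1 due to overlap)
```

A larger C (larger λ) tolerates less slack and hugs the training data more closely; a smaller C allows a wider, softer margin.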
How to find the dual function? Recall the Lagrangian dual and KKT conditions.
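As a sketch of where this leads (standard steps, written here for the ‖β‖² scaling used above rather than the more common ½‖β‖²):

```latex
% Lagrangian with multipliers \alpha_i \ge 0, \mu_i \ge 0:
L = \beta'\beta + \lambda \sum_i \xi_i
  + \sum_i \alpha_i \bigl(1 - \xi_i - y_i(X_i'\beta + \beta_0)\bigr)
  - \sum_i \mu_i \xi_i
% Stationarity (KKT):
%   \partial_\beta:      2\beta = \textstyle\sum_i \alpha_i y_i X_i
%   \partial_{\beta_0}:  \textstyle\sum_i \alpha_i y_i = 0
%   \partial_{\xi_i}:    \lambda - \alpha_i - \mu_i = 0
%                        \;\Rightarrow\; 0 \le \alpha_i \le \lambda
% Substituting back gives the dual problem:
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, X_i' X_j
\quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0,
\;\; 0 \le \alpha_i \le \lambda
```

The factor 1/4 (rather than the textbook 1/2) comes from using ‖β‖² instead of ½‖β‖² in the primal objective.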