
Logistic Regression

Jan 24, 2015


Introduces logistic regression, inference by maximum likelihood with gradient descent, a comparison of L1 and L2 regularization, and generalized linear models.
Transcript
Page 1: Logistic Regression

Machine learning workshop
[email protected]

Machine learning introduction
Logistic regression
Feature selection
Boosting, tree boosting

See more machine learning posts: http://dongguo.me

Page 2: Logistic Regression

Overview of machine learning

Machine Learning
• Unsupervised Learning
• Semi-supervised Learning
• Supervised Learning
  – Classification (logistic regression belongs here)
  – Regression

Page 3: Logistic Regression

How to choose a suitable model?

Characteristic                            Naïve Bayes   Trees   K Nearest Neighbor   Logistic Regression   Neural Networks   SVM
Computational scalability                 3             3       1                    3                     1                 1
Interpretability                          2             2       1                    2                     1                 1
Predictive power                          1             1       3                    2                     3                 3
Natural handling of "mixed"-type data     1             3       1                    1                     1                 1
Robustness to outliers in input space     3             3       3                    3                     1                 1

(3 = good, 1 = poor; adapted from The Elements of Statistical Learning, 2nd ed., p. 351)

Page 4: Logistic Regression

Why a model can't perform perfectly on unseen data

• Expected risk
• Empirical risk
• Choice of function family for prediction functions
• Error
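A standard statement of these quantities, with loss function L, prediction function f from a chosen family F, and (unknown) data distribution P(x, y):

\[ R(f) = \mathbb{E}_{(x,y) \sim P}\left[ L(f(x), y) \right] \quad \text{(expected risk)} \]
\[ R_n(f) = \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i) \quad \text{(empirical risk)} \]

We can only minimize R_n over the chosen family F, not R over all functions, so the error on unseen data splits into an approximation part (how well F can represent the truth) and an estimation part (the gap between R_n and R on a finite sample); neither is zero in practice.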

Page 5: Logistic Regression

Logistic regression

Page 6: Logistic Regression

Outline

• Introduction
• Inference
• Regularization
• Experiments
• More
  – Multinomial LR
  – Generalized linear model
• Application

Page 7: Logistic Regression

Logit function and logistic function

• Logit function
• Logistic function: the inverse of the logit
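In the usual notation, for a probability p in (0, 1):

\[ \mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad \sigma(z) = \mathrm{logit}^{-1}(z) = \frac{1}{1+e^{-z}} \]

The logit maps (0, 1) onto the whole real line; the logistic function maps it back, which is what lets a linear score act as a probability.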

Page 8: Logistic Regression

Logistic regression

• Prediction function
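For a feature vector x and weight vector w, the prediction function is the logistic function applied to a linear score:

\[ P(y = 1 \mid x; w) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}, \qquad P(y = 0 \mid x; w) = 1 - \sigma(w^\top x) \]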

Page 9: Logistic Regression

Inference by maximum likelihood (1)

• Likelihood
• Inference
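With labels \(y_i \in \{0, 1\}\) and \(p_i = \sigma(w^\top x_i)\), the likelihood and log-likelihood over n training samples are:

\[ L(w) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{\,1 - y_i}, \qquad \ell(w) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \]

Inference picks \(\hat{w} = \arg\max_w \ell(w)\); there is no closed-form solution, hence the gradient methods on the next slide.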

Page 10: Logistic Regression

Inference by maximum likelihood (2)

• Inference (cont.)
• Gradient descent
• Stochastic gradient descent
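The gradient has the simple form \(\nabla_w \ell(w) = \sum_i (y_i - p_i)\, x_i\), giving (with step size \(\eta\)):

\[ \text{batch:} \quad w \leftarrow w + \eta \sum_{i=1}^{n} \left( y_i - \sigma(w^\top x_i) \right) x_i \qquad \text{stochastic:} \quad w \leftarrow w + \eta \left( y_i - \sigma(w^\top x_i) \right) x_i \]

Strictly this is gradient ascent on \(\ell\) (equivalently, descent on \(-\ell\)); the stochastic version updates on one sample at a time, which scales to large training sets.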

Page 11: Logistic Regression

Regularization

• Penalize large weights to avoid overfitting
  – L2 regularization
  – L1 regularization
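The two penalized objectives, with regularization strength \(\lambda\):

\[ \ell_{L2}(w) = \ell(w) - \frac{\lambda}{2} \lVert w \rVert_2^2, \qquad \ell_{L1}(w) = \ell(w) - \lambda \lVert w \rVert_1 \]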

Page 12: Logistic Regression

Regularization: maximum a posteriori

• MAP
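MAP inference treats w as a random variable with a prior P(w) and maximizes the posterior instead of the bare likelihood:

\[ \hat{w}_{\mathrm{MAP}} = \arg\max_w P(w \mid D) = \arg\max_w \left[ \log P(D \mid w) + \log P(w) \right] \]

The next two slides show that a Gaussian prior recovers L2 regularization and a Laplace prior recovers L1.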

Page 13: Logistic Regression

L2 regularization: Gaussian prior

• Gaussian prior
• MAP
• Gradient descent step
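With an independent zero-mean Gaussian prior on each weight, \(P(w_j) \propto \exp(-w_j^2 / 2\sigma^2)\), the log-prior is \(-\lVert w \rVert_2^2 / 2\sigma^2\) up to a constant, so MAP is exactly L2-regularized maximum likelihood with \(\lambda = 1/\sigma^2\). The per-sample gradient step becomes:

\[ w_j \leftarrow w_j + \eta \left[ (y_i - p_i)\, x_{ij} - \lambda\, w_j \right] \]

which is the update the L2 code on the Implementation slide computes (step = \(\eta\), reguParam = \(\lambda\)).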

Page 14: Logistic Regression

L1 regularization: Laplace prior

• Laplace prior
• MAP
• Gradient descent step
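With a zero-mean Laplace prior \(P(w_j) \propto \exp(-\beta |w_j|)\), the log-prior is \(-\beta \lVert w \rVert_1\) up to a constant, so MAP is L1-regularized maximum likelihood with \(\lambda = \beta\). Since \(|w_j|\) is not differentiable at zero, the step uses the subgradient:

\[ w_j \leftarrow w_j + \eta \left[ (y_i - p_i)\, x_{ij} - \lambda\, \mathrm{sign}(w_j) \right] \]

The implementation on the next slide additionally clips a weight to zero when the penalty step pushes it across zero; that clipping is what makes L1 actually produce sparse weight vectors.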

Page 15: Logistic Regression

Implementation

• L2 LR gradient step:

// L2 penalty: shrink the weight toward zero in proportion to its size
_weightOfFeatures[fea] += step * (feaValue * error - reguParam * _weightOfFeatures[fea]);

• L1 LR gradient step:

// L1 penalty: subtract a constant-size step toward zero, and clip the
// weight to exactly zero if the penalty pushes it across zero; this
// clipping is what produces sparse weight vectors.
if (_weightOfFeatures[fea] > 0) {
    _weightOfFeatures[fea] += step * (feaValue * error) - step * reguParam;
    if (_weightOfFeatures[fea] < 0)
        _weightOfFeatures[fea] = 0;
} else if (_weightOfFeatures[fea] < 0) {
    _weightOfFeatures[fea] += step * (feaValue * error) + step * reguParam;
    if (_weightOfFeatures[fea] > 0)
        _weightOfFeatures[fea] = 0;
} else {
    _weightOfFeatures[fea] += step * (feaValue * error);
}

Page 16: Logistic Regression

L2 vs. L1

• L2 regularization
  – Almost all weights end up non-zero
  – Less suitable when training samples are scarce
• L1 regularization
  – Produces sparse parameter vectors
  – More suitable when most features are irrelevant
  – Handles scarce training samples better

Page 17: Logistic Regression

Experiments

• Dataset
  – Goal: gender prediction
  – Train samples: 431k; test samples: 167k
• Comparison algorithms
  – A: gradient descent with L1 regularization
  – B: gradient descent with L2 regularization
  – C: OWL-QN (L-BFGS-based optimization with L1 regularization)
• Parameter choices
  – Regularization value
  – Step (learning rate)
  – Decay ratio
  – Stopping condition: max iterations reached (50) OR AUC change <= 0.0005

Page 18: Logistic Regression

Experiments (cont.)

• Experiment results

Parameters and metrics         GD with L1      GD with L2      OWL-QN
'Best' regularization term     0.001~0.005     0.0002~0.001    1
Best step                      0.05            0.02~0.05       -
Best decay ratio               0.85            0.85            -
Iterations                     26              20~26           48
Non-zero / all features        10492/10938     10938/10938     6629/10938
AUC                            0.8470          0.8463          0.8467

Page 19: Logistic Regression

Multinomial logistic regression

• Prediction function
• Inference by maximum likelihood
• Gradient descent step (L2)
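For K classes with one weight vector \(w_k\) per class, the standard softmax form of these three items is:

\[ P(y = k \mid x) = \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)}, \qquad w_k \leftarrow w_k + \eta \left[ \left( \mathbb{1}\{ y_i = k \} - P(y_i = k \mid x_i) \right) x_i - \lambda\, w_k \right] \]

With K = 2 this reduces to binary logistic regression.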

Page 20: Logistic Regression

More link functions

• Inference by maximum likelihood
• Link function
• Link functions for the binomial distribution
  – Logit function
  – Probit function
  – Log-log function
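The three links, each mapping the success probability p to the linear predictor \(w^\top x\) (here \(\Phi\) is the standard normal CDF):

\[ g_{\text{logit}}(p) = \log\frac{p}{1-p}, \qquad g_{\text{probit}}(p) = \Phi^{-1}(p), \qquad g_{\text{log-log}}(p) = -\log(-\log p) \]

(The complementary log-log, \(\log(-\log(1-p))\), is another common choice for binomial data.)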

Page 21: Logistic Regression

Generalized linear model

• What is a GLM
  – A generalization of linear regression
  – Connects the linear model to the response variable through a link function
  – Allows more distributions for the response variable
• Typical GLMs
  – Linear regression, logistic regression, Poisson regression
• Overview (sketched below)
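In the standard formulation, a GLM ties the mean of an exponential-family response to the linear predictor through the link function g:

\[ g\left( \mathbb{E}[y \mid x] \right) = w^\top x \]

Identity link with a Gaussian response gives linear regression; logit link with a Bernoulli response gives logistic regression; log link with a Poisson response gives Poisson regression.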

   

Page 22: Logistic Regression

Application

• Yahoo
  – 'Personalized Click Prediction in Sponsored Search', WSDM '10
• Microsoft
  – 'Scalable Training of L1-Regularized Log-Linear Models', ICML '07
• Baidu
  – Contextual ads CTR prediction
  – http://www.docin.com/p-376254439.html
• Hulu
  – Demographic targeting
  – Other ad-targeting projects
  – Customer churn prediction
  – More...

Page 23: Logistic Regression

Reference

• 'Scalable Training of L1-Regularized Log-Linear Models', ICML '07
  – http://www.docin.com/p-376254439.html
• 'Generative and Discriminative Classifiers: Naïve Bayes and Logistic Regression', Tom Mitchell
• 'Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance', ICML '04

Page 24: Logistic Regression

Recommended resources

• Machine Learning open class by Andrew Ng
  – //10.20.0.130/TempShare/Machine-Learning Open Class
• http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
• Logistic regression implementation [link]
  – //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
  – Supports binomial and multinomial LR with L1 and L2 regularization
• OWL-QN
  – //10.20.0.130/TempShare/guodong/OWL-QN/

Page 25: Logistic Regression

Thanks