
The Group Lasso for Logistic Regression Lukas Meier, Sara van de Geer and Peter Bühlmann Presenter: Lu Ren ECE Dept., Duke University Sept. 19, 2008.


Page 1

The Group Lasso for Logistic Regression

Lukas Meier, Sara van de Geer and Peter Bühlmann

Presenter: Lu Ren

ECE Dept., Duke University

Sept. 19, 2008

Page 2

Outline

• From lasso to group lasso

• Logistic group lasso

• Algorithms for the logistic group lasso

• Logistic group lasso-ridge hybrid

• Simulation and application to splice site detection

• Discussion

Page 3

Lasso

A popular model selection and shrinkage estimation method.

In a linear regression set-up:

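• $Y$: continuous response

• $X$ ($n \times p$): design matrix

• $\beta$: parameter vector

The lasso estimator is then defined as:

$$\hat{\beta}_\lambda = \arg\min_{\beta} \Big( \|Y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big)$$

where $\|u\|_2^2 = \sum_{i=1}^{n} u_i^2$, and larger $\lambda$ sets some $\hat{\beta}_j$ exactly to 0.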
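As a minimal sketch of why a larger penalty sets coefficients exactly to 0, assume the special case of an orthonormal design ($X^T X = I$; this assumption is not from the slides). The lasso criterion then separates by coordinate and is minimized by soft-thresholding $z_j = x_j^T Y$ at $\lambda/2$. All function names are illustrative.

```python
def soft_threshold(z, t):
    """Shrink z toward 0 by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(y, X, beta, lam):
    """||Y - X beta||_2^2 + lam * sum_j |beta_j| for a list-of-rows X."""
    rss = 0.0
    for yi, row in zip(y, X):
        fit = sum(xij * bj for xij, bj in zip(row, beta))
        rss += (yi - fit) ** 2
    return rss + lam * sum(abs(b) for b in beta)


def lasso_orthonormal(y, X, lam):
    """Lasso solution, valid only under the assumption X^T X = I."""
    n, p = len(X), len(X[0])
    z = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return [soft_threshold(zj, lam / 2.0) for zj in z]


X = [[1.0, 0.0], [0.0, 1.0]]   # orthonormal toy design
y = [3.0, 0.5]
beta_hat = lasso_orthonormal(y, X, lam=2.0)   # second coefficient is thresholded to 0
```

For a general design no closed form exists and an iterative solver is needed; the sketch only illustrates the thresholding effect.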

Page 4

Group Lasso

When categorical predictors (factors) are present alongside continuous ones, the lasso solution is not satisfactory: it selects individual dummy variables rather than whole factors.

Extended from the lasso penalty, the group lasso estimator is:

$$\hat{\beta}_\lambda = \arg\min_{\beta} \Big( \|Y - X\beta\|_2^2 + \lambda \sum_{g=1}^{G} \|\beta_{I_g}\|_2 \Big)$$

$I_g$: the index set belonging to the $g$th group of variables.

The penalty does variable selection at the group level and is intermediate between the $l_1$- and $l_2$-type penalties.

It encourages that either $\hat{\beta}_g = 0$ or $\hat{\beta}_{g,j} \neq 0$ for all $j \in \{1, \dots, df_g\}$.
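The all-or-nothing behaviour per group can be illustrated with a short sketch. It assumes an orthonormal design (an assumption made here, not stated on the slide), in which case each group of the estimator above is obtained by shrinking $z_g = X_{I_g}^T Y$ toward 0 as a whole, with threshold $t = \lambda/2$. The function names are illustrative.

```python
import math


def group_soft_threshold(z_g, t):
    """Scale the whole group by (1 - t/||z_g||)_+ : all entries survive or none do."""
    norm = math.sqrt(sum(v * v for v in z_g))
    if norm <= t:
        return [0.0] * len(z_g)
    scale = 1.0 - t / norm
    return [scale * v for v in z_g]


def group_lasso_penalty(beta, groups, lam):
    """lam * sum_g ||beta_{I_g}||_2, with groups given as lists of indices."""
    return lam * sum(math.sqrt(sum(beta[j] ** 2 for j in idx)) for idx in groups)


kept = group_soft_threshold([3.0, 4.0], 2.5)      # norm 5 > 2.5: group shrunk, every entry non-zero
dropped = group_soft_threshold([3.0, 4.0], 6.0)   # norm 5 <= 6: whole group set to 0
```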

Page 5

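Connection

Consider a case with two factors: $\beta_1 = (\beta_{11}, \beta_{12})'$ and $\beta_2$. Observe the contours of the penalty functions:

$l_1$: $|\beta_{11}| + |\beta_{12}| + |\beta_2| = 1$

Group lasso: $\|\beta_1\|_2 + |\beta_2| = 1$

$l_2$: $\|(\beta_1', \beta_2)'\|_2 = 1$

The $l_1$-penalty treats the three co-ordinate directions differently from other directions, which encourages sparsity in individual coefficients, while the $l_2$-penalty treats all directions equally and does not encourage sparsity.

Ref: Ming Yuan and Yi Lin, Model selection and estimation in regression with grouped variables, J. R. Statist. Soc. B, 2006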
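A small numerical sketch of the same comparison: the three points below all have unit Euclidean length, so the $l_2$ penalty cannot tell them apart. The group lasso penalty treats a direction inside the first factor like a co-ordinate axis, but charges more for a direction that mixes the two factors. The point labels and function names are illustrative.

```python
import math


def l1_penalty(b11, b12, b2):
    return abs(b11) + abs(b12) + abs(b2)


def group_penalty(b11, b12, b2):
    # ||beta_1||_2 + |beta_2| with beta_1 = (b11, b12)
    return math.hypot(b11, b12) + abs(b2)


def l2_penalty(b11, b12, b2):
    return math.sqrt(b11 ** 2 + b12 ** 2 + b2 ** 2)


r = 1.0 / math.sqrt(2.0)
points = {
    "axis": (1.0, 0.0, 0.0),
    "within group": (r, r, 0.0),     # both non-zero entries belong to factor 1
    "across groups": (r, 0.0, r),    # mixes factor 1 and factor 2
}
# Each entry is (l1, group lasso, l2) evaluated at that point.
table = {name: (l1_penalty(*p), group_penalty(*p), l2_penalty(*p))
         for name, p in points.items()}
```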

Page 6

Logistic Group Lasso

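Independent and identically distributed observations $(x_i, y_i)$, $i = 1, \dots, n$

$x_i = (x_{i,1}^T, \dots, x_{i,G}^T)^T$: $p$-dimensional vector of predictors in $G$ groups

$y_i \in \{0, 1\}$: a binary response variable; $df_g$: degrees of freedom of group $g$

The conditional probability $p_\beta(x_i) = P_\beta(Y = 1 \mid x_i)$ is modelled by

$$\log\Big\{ \frac{p_\beta(x_i)}{1 - p_\beta(x_i)} \Big\} = \eta_\beta(x_i) \quad \text{with} \quad \eta_\beta(x_i) = \beta_0 + \sum_{g=1}^{G} x_{i,g}^T \beta_g$$

The estimator $\hat{\beta}_\lambda$ is given by the minimizer of the convex function:

$$S_\lambda(\beta) = -l(\beta) + \lambda \sum_{g=1}^{G} s(df_g) \|\beta_g\|_2$$

$$l(\beta) = \sum_{i=1}^{n} \big[ y_i \eta_\beta(x_i) - \log(1 + \exp\{\eta_\beta(x_i)\}) \big]$$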

Page 7

Logistic Group Lasso

$\lambda \geq 0$ controls the amount of penalization.

$s(df_g) = df_g^{1/2}$ rescales the penalty with respect to the dimensionality of $\beta_g$.
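The criterion can be written down directly. The following is a minimal sketch that evaluates $S_\lambda(\beta)$ with $s(df_g) = df_g^{1/2}$; the data layout (rows of `X` as lists, groups as lists of column indices, a separate intercept `beta0`) and the function names are assumptions made for illustration.

```python
import math


def log1p_exp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def log_likelihood(X, y, beta0, beta):
    """l(beta) = sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = beta0 + sum(xij * bj for xij, bj in zip(row, beta))
        total += yi * eta - log1p_exp(eta)
    return total


def objective(X, y, beta0, beta, groups, lam):
    """S_lambda(beta) = -l(beta) + lam * sum_g sqrt(df_g) * ||beta_g||_2."""
    penalty = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        penalty += math.sqrt(len(idx)) * norm   # s(df_g) = df_g^{1/2}
    return -log_likelihood(X, y, beta0, beta) + lam * penalty
```

The intercept is left unpenalized, matching the sum over $g = 1, \dots, G$ in the penalty.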

Page 8

Optimization Algorithms

1. Block co-ordinate descent

Cycle through the parameter groups and minimize the objective function $S_\lambda(\cdot)$, keeping all except the current group fixed.

Page 9

Optimization Algorithms

• $\beta_{-g}$: the parameter vector $\beta$ with $\beta_g$ set to 0 while all other components remain unchanged

• $\{\hat{\beta}^{(t)}\}_{t \geq 0}$: the parameter vector after $t$ block updates; it can be shown that every limit point of the sequence is a minimum point of $S_\lambda(\cdot)$

• Blockwise minimizations of the active groups must be performed numerically; this is sufficiently fast for small group sizes and dimensions.

2. Block co-ordinate gradient descent

Combine a quadratic approximation of the log-likelihood with an additional line search.

Page 10

Optimization Algorithms

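Armijo rule: an inexact line search. Let $\alpha^{(t)}$ be the largest value in a decreasing sequence of candidate step sizes so that the objective $S_\lambda$ decreases sufficiently.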
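A generic backtracking sketch of an Armijo-type rule follows. The candidate step sizes `alpha0 * delta**l` and the constants `alpha0`, `delta` and `sigma` are assumptions chosen for illustration, since the slide's own set and inequality did not survive in this transcript. `predicted_change` stands for the (negative) change in the objective predicted by a local approximation along the direction `d`.

```python
def armijo_step(f, x, d, predicted_change, alpha0=1.0, delta=0.5, sigma=0.1,
                max_backtracks=50):
    """Return the largest alpha in {alpha0 * delta**l} with
    f(x + alpha*d) - f(x) <= alpha * sigma * predicted_change."""
    fx = f(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) - fx <= alpha * sigma * predicted_change:
            return alpha
        alpha *= delta          # shrink the step and try again
    return 0.0                  # no acceptable step found: do not move


def square(v):
    return v[0] ** 2


# Toy problem: f(x) = x^2 at x = 1 with a deliberately too long direction d = -4.
# The gradient is 2, so the linear prediction of the change along d is 2 * (-4) = -8.
alpha = armijo_step(square, [1.0], [-4.0], predicted_change=-8.0)
```

In this toy run the full step and the half step are rejected, and the quarter step, which lands on the minimizer, is accepted.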

Page 11

Optimization Algorithms

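• Minimization with respect to the $g$th parameter group depends on $H_{gg}^{(t)}$ only; here define $H_{gg}^{(t)} = h_g^{(t)} I_{df_g}$. A proper choice of $h_g^{(t)}$ uses $c_*$ as a lower bound to ensure convergence.

• To calculate the solutions $\hat{\beta}_\lambda$ on a grid of the penalty parameter $0 \leq \lambda_K < \dots < \lambda_1 \leq \lambda_{\max}$ we can start at $\lambda_{\max}$. We use $\hat{\beta}_{\lambda_k}$ as a starting value for $\hat{\beta}_{\lambda_{k+1}}$ and proceed iteratively until $\hat{\beta}_{\lambda_K}$ with $\lambda_K$ equal or close to 0.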
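The two points above can be sketched together: block updates whose curvature matrix is a multiple of the identity, and a warm-started pass over a decreasing grid of penalties. This is not the algorithm from the slides. As a simplification it swaps in a fixed curvature bound (one quarter of the sum of squared entries of the block's columns) in place of a Hessian-based $h_g^{(t)}$ with Armijo search, so that every block step decreases the objective without a line search. The value used for $\lambda_{\max}$ is derived here from the condition that every group stays at 0 in the intercept-only fit; the equally spaced grid and all names are likewise assumptions.

```python
import math


def sigmoid(eta):
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def residuals(X, y, beta0, beta):
    """y_i - p(x_i) for every observation."""
    return [yi - sigmoid(beta0 + sum(x * b for x, b in zip(row, beta)))
            for row, yi in zip(X, y)]


def lambda_max(X, y, groups):
    """Smallest penalty for which the intercept-only model is optimal."""
    ybar = sum(y) / len(y)
    best = 0.0
    for idx in groups:
        grad = [sum(row[j] * (yi - ybar) for row, yi in zip(X, y)) for j in idx]
        best = max(best, math.sqrt(sum(v * v for v in grad)) / math.sqrt(len(idx)))
    return best


def fit(X, y, groups, lam, beta0, beta, sweeps=200):
    """Cycle through the groups with block updates, starting from (beta0, beta)."""
    n = len(y)
    beta = list(beta)
    for _ in range(sweeps):
        # Unpenalized intercept: gradient step with curvature bound n/4.
        beta0 += sum(residuals(X, y, beta0, beta)) / (n / 4.0)
        for idx in groups:
            r = residuals(X, y, beta0, beta)
            # Assumed curvature: upper bound on the block Hessian of -l.
            h = 0.25 * sum(row[j] ** 2 for row in X for j in idx) + 1e-12
            # Gradient step on -l for this block, then group-wise shrinkage.
            u = [beta[j] + sum(row[j] * ri for row, ri in zip(X, r)) / h for j in idx]
            norm = math.sqrt(sum(v * v for v in u))
            t = lam * math.sqrt(len(idx)) / h
            scale = 0.0 if norm <= t else 1.0 - t / norm
            for j, v in zip(idx, u):
                beta[j] = scale * v
    return beta0, beta


def path(X, y, groups, n_lambdas=5):
    """Solve on a decreasing grid, reusing each solution as the next start."""
    lam_max = lambda_max(X, y, groups)
    ybar = sum(y) / len(y)
    beta0, beta = math.log(ybar / (1.0 - ybar)), [0.0] * len(X[0])
    out = []
    for k in range(n_lambdas):
        lam = lam_max * (1.0 - k / n_lambdas)   # lam_1 = lam_max > ... > lam_K > 0
        beta0, beta = fit(X, y, groups, lam, beta0, beta)   # warm start
        out.append((lam, beta0, list(beta)))
    return out
```

Calling `path(X, y, groups)` with `groups` as a list of column-index lists returns one `(lam, beta0, beta)` triple per grid point. At the first grid point all groups should be zero up to rounding, and groups enter the model as the penalty decreases.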

Page 12

Hybrid Methods

• Logistic group lasso-ridge hybrid

The models selected by the group lasso are large compared with the underlying true models.

For the ordinary lasso, good prediction with smaller models can be obtained by using the lasso with relaxation.

Define the index set of predictors selected by the group lasso with penalty parameter $\lambda$, together with the set of possible parameter vectors of the corresponding submodel.

The group lasso-ridge hybrid estimator refits the model over this submodel with a ridge-type penalty.

A ridge penalty parameter of 0 is a special case called the group lasso-MLE hybrid.
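The two-stage idea can be sketched as follows: take the columns of the groups that the group lasso left non-zero, then refit a logistic model on those columns only. The exact ridge penalty of the hybrid estimator is not preserved in this transcript, so the sketch assumes a plain penalty `kappa * sum(beta_j ** 2)` and uses gradient descent with a fixed step; both choices and all names are illustrative.

```python
import math


def sigmoid(eta):
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def selected_columns(beta_hat, groups):
    """Columns of all groups whose group lasso coefficients are not all zero."""
    return [j for idx in groups if any(beta_hat[j] != 0.0 for j in idx) for j in idx]


def ridge_refit(X, y, selected, kappa, steps=500):
    """Minimize -l(beta) + kappa * sum_{j in selected} beta_j^2 over the submodel."""
    n, p = len(y), len(X[0])
    beta0, beta = 0.0, [0.0] * p      # unselected entries are never updated
    # Step size 1/L, with L an upper bound on the curvature of the criterion.
    L = 0.25 * (n + sum(row[j] ** 2 for row in X for j in selected)) + 2.0 * kappa
    for _ in range(steps):
        r = [yi - sigmoid(beta0 + sum(row[j] * beta[j] for j in selected))
             for row, yi in zip(X, y)]
        g0 = -sum(r)
        g = {j: -sum(row[j] * ri for row, ri in zip(X, r)) + 2.0 * kappa * beta[j]
             for j in selected}
        beta0 -= g0 / L
        for j in selected:
            beta[j] -= g[j] / L
    return beta0, beta
```

With `kappa=0` the refit is an unpenalized maximum likelihood fit on the selected columns, which corresponds to the MLE special case mentioned above.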

Page 13

Simulation

First sample $n_{\text{train}}$ instances of a nine-dimensional multivariate normal distribution with mean 0 and a given covariance matrix.

Each component is then transformed into a four-valued categorical variable by using the quartiles of the standard normal.

The group coefficients are simulated as independent standard normal variables.

Four different cases are studied.
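The predictor-generating step can be sketched as below. The covariance matrix is not preserved in this transcript, so the sketch assumes a correlation of `rho ** abs(i - j)` between components `i` and `j` (with `rho = 0.5` as an arbitrary default), which an autoregressive recursion produces directly. The function name and the level coding 0 to 3 are also illustrative.

```python
import random
from statistics import NormalDist


def simulate_factors(n_train, rho=0.5, n_vars=9, seed=0):
    """Draw correlated normals and cut each at the standard normal quartiles."""
    rng = random.Random(seed)
    cuts = [NormalDist().inv_cdf(q) for q in (0.25, 0.5, 0.75)]
    data = []
    for _ in range(n_train):
        z = rng.gauss(0.0, 1.0)
        row = []
        for k in range(n_vars):
            if k > 0:
                # AR(1) step: keeps unit variance, correlation rho^|i-j| (assumed).
                z = rho * z + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            row.append(sum(z > c for c in cuts))   # factor level in {0, 1, 2, 3}
        data.append(row)
    return data


factors = simulate_factors(n_train=200)
```

Each four-level factor would then be expanded into dummy variables that form one group in the model.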

Page 14

Observations:

The group lasso seems to select unnecessarily large models with many noise variables;

The group lasso-MLE hybrid is very conservative in selecting terms;

The group lasso-ridge hybrid seems to be the best compromise and has the best prediction performance in terms of the log-likelihood score.

Page 15
Page 16

Application Experiment

Splice sites: the regions between coding (exons) and non-coding (introns) DNA segments.

Two training data sets: 5610 true and 5610 false donor sites; 2805 true and 59804 false donor sites

Test set: 4208 true and 89717 false donor sites.

For a threshold $\tau \in (0,1)$ we assign observation $i$ to class 1 if its estimated probability $p_{\hat{\beta}}(x_i)$ exceeds $\tau$, and to class 0 otherwise.

Performance is measured by the Pearson correlation between the true class membership and the predicted class membership.
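A minimal sketch of this evaluation: classify at a threshold, compute the Pearson correlation between true and predicted labels, and take the maximum over a grid of thresholds. The grid of 99 equally spaced thresholds, the convention of returning 0 when one of the label vectors is constant, and the function names are assumptions made here.

```python
import math


def classify(probs, tau):
    """Assign class 1 when the estimated probability exceeds tau, else class 0."""
    return [1 if p > tau else 0 for p in probs]


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def max_correlation(y, probs, n_grid=99):
    """Best correlation over thresholds k/(n_grid+1), returned with its threshold."""
    taus = [(k + 1) / (n_grid + 1) for k in range(n_grid)]
    return max((pearson(y, classify(probs, t)), t) for t in taus)


rho_best, tau_best = max_correlation([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.4])
```

On strongly unbalanced data such as the test set above, a correlation-based score of this kind is less dominated by the majority class than plain accuracy.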

Page 17

The corresponding values of the maximal correlation $\rho_{\max}$ on the test set are 0.6593, 0.6569 and 0.6541, respectively.

Whereas the group lasso solution has some active three-way interactions, the group lasso-ridge hybrid and the group lasso-MLE hybrid contain only two-way interactions.

The three-way interactions of the group lasso solution seem to be very weak.

The best model with respect to the log-likelihood score on the validation set is the group lasso estimator.

Page 18
Page 19

Conclusions

• Study the group lasso for logistic regression

• Present efficient algorithm (automatic and much faster)

• Propose the group lasso-ridge hybrid method

• Apply to short DNA motif modelling and splice site detection