
Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection Takafumi Kanamori Shohei Hido NIPS 2008.

Dec 28, 2015

Page 1: Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection Takafumi Kanamori Shohei Hido NIPS 2008.

Efficient Direct Density Ratio Estimation for

Non-stationarity Adaptation and Outlier Detection

Takafumi Kanamori

Shohei Hido

NIPS 2008

Page 2:

Outline

• Motivation

• Importance Estimation

• Direct Importance Estimation

• Approximation Algorithm

• Experiments

• Conclusions

Page 3:

Motivation

• Importance Sampling

• Covariate Shift

• Outlier Detection

Page 4:

Importance Sampling

• Rather than sampling from the distribution p, importance sampling draws samples from another distribution q, chosen to reduce the variance of the estimate Ê[f(X)]. Hence the name: samples from q can be more "important" for estimating the integral.

• Other reasons to sample from q include difficulty in drawing samples from p, or efficiency considerations.

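$$E[f(X) \mid p] = \int f(x)\,p(x)\,dx$$

Sampling according to the distribution p: $x_1, x_2, \ldots, x_n$

$$P_n(x) = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}(x)$$

Monte Carlo estimate:

$$\hat{E}_n[f] = \int f(x)\,dP_n(x) = \frac{1}{n}\sum_{i=1}^{n} f(x_i)$$

$$E[f(X) \mid p] = \frac{\int f(x)\,w(x)\,q(x)\,dx}{\int w(x)\,q(x)\,dx}$$

where the weight w(x) is proportional to p(x)/q(x).

Sampling according to the distribution q: $x_1, x_2, \ldots, x_n$

$$\hat{E}_{n,q}[f] = \frac{\frac{1}{n}\sum_{i=1}^{n} f(x_i)\,w(x_i)}{\frac{1}{n}\sum_{i=1}^{n} w(x_i)}$$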

[2] R. Srinivasan, Importance sampling - Applications in communications and detection, Springer-Verlag, Berlin, 2002.

[3] P. J. Smith, M. Shafi, and H. Gao, "Quick simulation: A review of importance sampling techniques in communication systems," IEEE J. Select. Areas Commun., vol. 15, pp. 597-613, May 1997.
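The ratio estimator on this slide can be tried numerically. The following is a minimal sketch, assuming a Gaussian target p = N(1, 1) and a Gaussian proposal q = N(0, 2²); these densities and the helper name `self_normalized_is` are illustrative choices, not part of the original slides.

```python
import numpy as np

def self_normalized_is(f, log_p, log_q, samples):
    """Estimate E_p[f(X)] from samples drawn from q.

    log_p and log_q may each omit an additive constant, because dividing
    by sum(w) cancels any constant factor in w = p / q.
    """
    log_w = log_p(samples) - log_q(samples)
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    return np.sum(w * f(samples)) / np.sum(w)

rng = np.random.default_rng(0)

# Assumed toy setting: target p = N(1, 1), proposal q = N(0, 2^2).
log_p = lambda x: -0.5 * (x - 1.0) ** 2
log_q = lambda x: -0.5 * (x / 2.0) ** 2
x_q = rng.normal(loc=0.0, scale=2.0, size=100_000)

estimate = self_normalized_is(lambda x: x, log_p, log_q, x_q)
print(estimate)  # should be close to the mean of p, which is 1
```

Because the weights are normalized by their sum, p and q only need to be known up to constant factors, which is why the log-densities above drop their normalizing constants.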

Page 5:

Covariate Shift

Covariate shift is compensated for by weighting the training samples according to their importance, the ratio of the test input density to the training input density.

$$P_{train}(Y \mid X = x) = P_{test}(Y \mid X = x) \quad \text{for all } x \in \mathcal{X}$$

but

$$P_{train}(X) \neq P_{test}(X)$$

$$\frac{P_{test}(x, y)}{P_{train}(x, y)} = \frac{P_{test}(x)\,P_{test}(y \mid x)}{P_{train}(x)\,P_{train}(y \mid x)} = \frac{P_{test}(x)}{P_{train}(x)}$$

The input distribution differs between the training and test sets, while the conditional distribution of the output given the input is unchanged. Under this shift, standard learning techniques such as MLE or CV are biased.

[4] Jiayuan Huang, Alexander J. Smola, Arthur Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006.
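To see why importance weighting removes the bias, the test-time expectation of any loss can be rewritten as a weighted training-time expectation. The derivation below is a sketch: the generic loss symbol \ell is introduced here for illustration, and it assumes P_train(x) > 0 wherever P_test(x) > 0.

```latex
\begin{aligned}
\mathbb{E}_{(x,y)\sim P_{test}}\bigl[\ell(x,y)\bigr]
  &= \iint \ell(x,y)\,P_{test}(x,y)\,dx\,dy \\
  &= \iint \ell(x,y)\,\frac{P_{test}(x)}{P_{train}(x)}\,P_{train}(x,y)\,dx\,dy \\
  &= \mathbb{E}_{(x,y)\sim P_{train}}\bigl[w(x)\,\ell(x,y)\bigr],
  \qquad w(x) = \frac{P_{test}(x)}{P_{train}(x)} .
\end{aligned}
```

The second line uses the identity from this slide, P_test(x, y) / P_train(x, y) = P_test(x) / P_train(x).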

Page 6:

Outlier Detection

• The importance values of regular samples are close to one, while those of outliers tend to deviate significantly from one.

• The values of the importance could be used as an index of the degree of outlyingness.
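As a toy illustration of using the importance as an outlier score, the sketch below evaluates the ratio with known densities instead of estimating it. Everything here is an assumption made for the example: the ratio is taken as the density of an inlier-only reference set over the density of the evaluated set, the two densities are 1-D Gaussians and a Gaussian mixture, and the threshold 0.5 is arbitrary.

```python
import numpy as np

def normal_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance(x):
    # Assumed toy densities (not from the slides):
    #   reference set:  N(0, 1), inliers only
    #   evaluated set:  95% N(0, 1) inliers + 5% N(5, 1) outliers
    p_ref = normal_pdf(x, 0.0)
    p_eval = 0.95 * normal_pdf(x, 0.0) + 0.05 * normal_pdf(x, 5.0)
    return p_ref / p_eval

points = np.array([0.0, 1.0, 5.0])
scores = importance(points)
is_outlier = np.abs(scores - 1.0) > 0.5   # hypothetical threshold
print(scores, is_outlier)
```

The two points near the inlier mode get a score close to one, while the point at 5 gets a score near zero and is flagged.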

Page 7:

Related Works

• Kernel Density Estimation

• Kernel Mean Matching

$\Phi : \mathcal{X} \to \mathcal{F}$, a map into the feature space

$\mu : \mathcal{P} \to \mathcal{F}$, the expectation operator

$$\mu(\Pr) := E_{x \sim \Pr(x)}[\Phi(x)]$$

[4] Jiayuan Huang, Alexander J. Smola, Arthur Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006.

Page 8:

Direct Importance Estimation

Page 9:

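Least-squares Approach

• Model w(x) with a linear model:

$$\hat{w}(x) = \alpha^T \varphi(x) = (\alpha_1, \alpha_2, \ldots, \alpha_b)\,(\varphi_1(x), \varphi_2(x), \ldots, \varphi_b(x))^T$$

• Determine the parameter α so that the squared error on the training samples is minimized. The part of this error that does not depend on α is the constant

$$C = \frac{1}{2}\int w(x)\,p_{test}(x)\,dx$$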
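The squared-error formula itself did not survive in this transcript. As a sketch of where the constant C comes from, assume the error is the p_train-weighted integral J_0 below (the form used in the original paper); expanding the square and using w(x) p_train(x) = p_test(x) gives:

```latex
\begin{aligned}
J_0(\alpha)
  &= \frac{1}{2}\int \bigl(\hat{w}(x) - w(x)\bigr)^2 p_{train}(x)\,dx \\
  &= \frac{1}{2}\int \hat{w}(x)^2\,p_{train}(x)\,dx
     - \int \hat{w}(x)\,w(x)\,p_{train}(x)\,dx
     + \frac{1}{2}\int w(x)^2\,p_{train}(x)\,dx \\
  &= \frac{1}{2}\,\alpha^T H \alpha - h^T \alpha + C ,
\end{aligned}
```

where H and h are the matrix and vector defined on the following slide. Only the first two terms depend on α, so C can be dropped from the minimization.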

Page 10:

Least-Squares Importance Fitting (LSIF)

$$\min_{\alpha} J(\alpha), \qquad J(\alpha) = J_0(\alpha) - C = \frac{1}{2}\,\alpha^T H \alpha - h^T \alpha$$

$$H = \int \varphi(x)\,\varphi(x)^T\,p_{train}(x)\,dx, \qquad h = \int \varphi(x)\,p_{test}(x)\,dx$$

Empirical estimation:

$$\Rightarrow \quad \min_{\alpha \in \mathbb{R}^b} \left[ \frac{1}{2}\,\alpha^T \hat{H} \alpha - \hat{h}^T \alpha + \lambda\,1_b^T \alpha \right] \quad \text{s.t. } \alpha \geq 0_b$$

$$\hat{H} = \frac{1}{n_{train}} \sum_{i=1}^{n_{train}} \varphi(x_i^{train})\,\varphi(x_i^{train})^T, \qquad \hat{h} = \frac{1}{n_{test}} \sum_{j=1}^{n_{test}} \varphi(x_j^{test})$$

The term $\lambda\,1_b^T \alpha$ is a regularization term to avoid over-fitting.
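The constrained problem on this slide is a quadratic program. The sketch below solves it with plain projected gradient descent, which is swapped in here for simplicity and is not the solver used by the authors. The Gaussian basis functions, the kernel width, the number of centers and the toy data are all assumptions made for the example.

```python
import numpy as np

def gaussian_design(x, centers, sigma):
    """Matrix with entries exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lsif_projected_gradient(x_train, x_test, centers, sigma, lam, n_iter=2000):
    phi_train = gaussian_design(x_train, centers, sigma)
    phi_test = gaussian_design(x_test, centers, sigma)
    H_hat = phi_train.T @ phi_train / len(x_train)   # empirical H
    h_hat = phi_test.mean(axis=0)                    # empirical h
    step = 1.0 / np.linalg.eigvalsh(H_hat).max()     # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(len(centers))
    for _ in range(n_iter):
        grad = H_hat @ alpha - h_hat + lam           # lam stands for lam * 1_b
        alpha = np.maximum(0.0, alpha - step * grad) # project onto alpha >= 0
    return alpha, H_hat, h_hat

def objective(alpha, H_hat, h_hat, lam):
    return 0.5 * alpha @ H_hat @ alpha - h_hat @ alpha + lam * alpha.sum()

# Assumed toy data: 1-D Gaussians with shifted means.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(200, 1))
x_test = rng.normal(0.5, 1.0, size=(300, 1))
centers = x_test[:20]
alpha, H_hat, h_hat = lsif_projected_gradient(
    x_train, x_test, centers, sigma=1.0, lam=0.01)
print(objective(alpha, H_hat, h_hat, 0.01), alpha.min())
```

The projection step `np.maximum(0.0, ...)` is what enforces the non-negativity constraint; every iterate stays feasible.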

Page 11:

Model Selection for LSIF

• Model: the regularization parameter λ and the basis functions φ

• Model selection: cross-validation

Page 12:

Heuristics for Basis Function Design

• Gaussian kernel centered at the test samples

$$\hat{w}(x) = \sum_{l=1}^{n_{test}} \alpha_l\,K_\sigma(x, x_l^{test}),$$

$$\text{where } K_\sigma(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$

Page 13:

Unconstrained Least-squares Approach (uLSIF)

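• Ignore the non-negativity constraints:

$$\min_{\beta \in \mathbb{R}^b} \left[ \frac{1}{2}\,\beta^T \hat{H} \beta - \hat{h}^T \beta + \frac{\lambda}{2}\,\beta^T \beta \right]$$

• Learned parameters could be negative

• To compensate for the approximation error, modify the solution:

$$\hat{\beta} = \max(0_b, \tilde{\beta}), \qquad \tilde{\beta} = (\hat{H} + \lambda I_b)^{-1}\,\hat{h}$$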
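The closed-form solution above needs only one linear solve. The following is a minimal sketch, assuming Gaussian basis functions centered at a subset of the test samples; the kernel width, λ, the number of centers and the toy data are illustrative values, and the function names are hypothetical.

```python
import numpy as np

def gaussian_design(x, centers, sigma):
    """Matrix with entries exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def ulsif_fit(x_train, x_test, centers, sigma, lam):
    phi_train = gaussian_design(x_train, centers, sigma)
    phi_test = gaussian_design(x_test, centers, sigma)
    H_hat = phi_train.T @ phi_train / len(x_train)
    h_hat = phi_test.mean(axis=0)
    # beta_tilde = (H_hat + lam * I)^{-1} h_hat, then clip negatives to zero.
    beta_tilde = np.linalg.solve(H_hat + lam * np.eye(len(centers)), h_hat)
    return np.maximum(0.0, beta_tilde)

def ulsif_predict(x, centers, sigma, beta_hat):
    return gaussian_design(x, centers, sigma) @ beta_hat

# Assumed toy data: p_train = N(0, 1), p_test = N(1, 1).
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(500, 1))
x_test = rng.normal(1.0, 1.0, size=(500, 1))
centers = x_test[:50]

beta_hat = ulsif_fit(x_train, x_test, centers, sigma=1.0, lam=0.1)
w_hat = ulsif_predict(np.array([[-1.5], [1.5]]), centers, 1.0, beta_hat)
print(w_hat)  # the estimate at 1.5 should exceed the estimate at -1.5
```

Since the clipped coefficients and the kernel values are both non-negative, the estimated importance is non-negative everywhere even though the constraint was dropped during fitting.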

Page 14:

Efficient Computation of LOOCV

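• Samples: $\{x_i^{train}\}_{i=1}^{n_{train}}$ and $\{x_j^{test}\}_{j=1}^{n_{test}}$, with $n_{train} < n_{test}$

• $\hat{\beta}_\lambda^{(i)}$: learned without $x_i^{train}$ and $x_i^{test}$

• LOOCV score:

$$\frac{1}{n_{train}} \sum_{i=1}^{n_{train}} \left[ \frac{1}{2}\left(\varphi(x_i^{train})^T \hat{\beta}_\lambda^{(i)}\right)^2 - \varphi(x_i^{test})^T \hat{\beta}_\lambda^{(i)} \right]$$

• According to the Sherman-Woodbury-Morrison formula, the matrix inverse needs to be computed only once.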
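The reason one inverse suffices is that removing a sample changes the regularized matrix by a rank-one term. The sketch below checks the rank-one (Sherman-Morrison) case numerically on a random design matrix. It is a simplified illustration, not the paper's exact LOOCV expressions: it keeps the 1/n normalization after removing a sample and removes only a training sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b, lam = 50, 5, 0.1
Phi = rng.normal(size=(n, b))          # hypothetical design matrix, rows phi(x_i)^T

A = Phi.T @ Phi / n + lam * np.eye(b)  # regularized matrix using all n samples
A_inv = np.linalg.inv(A)               # the only explicit inverse

def inverse_without_sample(i):
    """Inverse of A - u u^T with u = phi(x_i) / sqrt(n), via Sherman-Morrison."""
    u = Phi[i] / np.sqrt(n)
    Au = A_inv @ u
    return A_inv + np.outer(Au, Au) / (1.0 - u @ Au)

i = 3
fast = inverse_without_sample(i)

# Direct computation for comparison (what the formula lets us avoid).
Phi_rest = np.delete(Phi, i, axis=0)
direct = np.linalg.inv(Phi_rest.T @ Phi_rest / n + lam * np.eye(b))
print(np.allclose(fast, direct))
```

Each leave-one-out inverse then costs a matrix-vector product and an outer product instead of a fresh inversion.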

Page 15:

Experiments: Importance Estimation

• $p_{train}$ is the d-dimensional normal distribution with mean zero and identity covariance.

• $p_{test}$ is the d-dimensional normal distribution with mean $(1, 0, \ldots, 0)^T$ and identity covariance.

• Normalized mean squared error
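For these two Gaussians the true importance has a closed form, which is what makes an error against the ground truth computable. A short worked derivation, writing x^{(1)} for the first coordinate of x and e_1 = (1, 0, …, 0)^T (notation introduced here):

```latex
w(x) = \frac{p_{test}(x)}{p_{train}(x)}
     = \frac{\exp\!\left(-\tfrac{1}{2}\|x - e_1\|^2\right)}{\exp\!\left(-\tfrac{1}{2}\|x\|^2\right)}
     = \exp\!\left(x^{(1)} - \tfrac{1}{2}\right)
```

The normalizing constants (2π)^{-d/2} cancel, and the ratio depends only on the first coordinate regardless of the dimension d.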

Page 16:
Page 17:

Covariate Shift Adaptation in Classification and Regression

• Given the training samples, the test samples, and the outputs of the training samples:

$$\{x_i^{train}\}_{i=1}^{n_{train}}, \qquad \{x_j^{test}\}_{j=1}^{n_{test}}, \qquad \{y_i^{train}\}_{i=1}^{n_{train}}$$

• The task is to predict the outputs for the test samples.

$$f(x; \theta) = \sum_{l=1}^{t} \theta_l\,K_h(x, m_l)$$

Importance-weighted regularized least-squares (IWRLS):

$$\min_{\theta} \left[ \sum_{i=1}^{n_{tr}} \hat{w}(x_i^{tr}) \left( f(x_i^{tr}; \theta) - y_i^{tr} \right)^2 + \gamma \|\theta\|^2 \right]$$
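The IWRLS objective is quadratic in θ, so it has a closed-form minimizer obtained from the normal equations. The sketch below assumes K_h is a Gaussian kernel of width h, uses the true importance of a toy Gaussian shift in place of an estimate, and picks the centers m_l at random; none of these choices come from the slides.

```python
import numpy as np

def gaussian_kernel(x, centers, h):
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * h ** 2))

def iwrls_fit(x_tr, y_tr, w_tr, centers, h, gamma):
    """Minimise sum_i w_i (f(x_i; theta) - y_i)^2 + gamma * ||theta||^2."""
    K = gaussian_kernel(x_tr, centers, h)
    KW = K.T * w_tr                       # K^T W with W = diag(w_tr)
    # Normal equations: (K^T W K + gamma I) theta = K^T W y
    return np.linalg.solve(KW @ K + gamma * np.eye(len(centers)), KW @ y_tr)

def iwrls_predict(x, centers, h, theta):
    return gaussian_kernel(x, centers, h) @ theta

# Assumed toy problem: training inputs from N(0, 1), test inputs from N(1, 1).
rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, size=(200, 1))
y_tr = np.sin(x_tr[:, 0]) + 0.1 * rng.normal(size=200)
w_tr = np.exp(x_tr[:, 0] - 0.5)           # true p_test / p_train for this toy shift
centers = rng.normal(1.0, 1.0, size=(10, 1))

theta = iwrls_fit(x_tr, y_tr, w_tr, centers, h=1.0, gamma=0.1)
print(iwrls_predict(np.array([[1.0]]), centers, 1.0, theta))
```

In practice `w_tr` would be replaced by an estimated importance evaluated at the training inputs.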

Page 18:

Experimental Description

• Divide the training samples into R disjoint subsets $\{Z_r^{tr}\}_{r=1}^{R}$.

• The function $\hat{f}_r(x)$ is learned using $\{Z_j^{tr}\}_{j \neq r}$ by IWRLS, and its mean test error for the remaining samples $Z_r^{tr}$ is computed:

$$\frac{1}{|Z_r^{tr}|} \sum_{(x, y) \in Z_r^{tr}} \hat{w}(x)\,\mathrm{loss}\left(\hat{f}_r(x), y\right)$$

where $\mathrm{loss}(\hat{y}, y)$ is $(\hat{y} - y)^2$ in regression and $\frac{1}{2}\left(1 - \mathrm{sign}(\hat{y}\,y)\right)$ in classification.
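The procedure above can be sketched as a small function. Two things here are assumptions rather than statements from the slide: the per-fold errors are averaged into a single score, and the learner is passed in as generic `fit` / `predict` callables (hypothetical names). The usage example plugs in a weighted straight-line fit instead of IWRLS to keep the block short.

```python
import numpy as np

def loss(y_hat, y, task):
    if task == "regression":
        return (y_hat - y) ** 2
    if task == "classification":
        return 0.5 * (1.0 - np.sign(y_hat * y))
    raise ValueError(f"unknown task: {task}")

def importance_weighted_cv(x, y, w, fit, predict, n_folds, task, seed=0):
    """Average over folds of the importance-weighted held-out loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = []
    for r in range(n_folds):
        held_out = folds[r]
        rest = np.concatenate([folds[j] for j in range(n_folds) if j != r])
        model = fit(x[rest], y[rest], w[rest])
        y_hat = predict(model, x[held_out])
        scores.append(np.mean(w[held_out] * loss(y_hat, y[held_out], task)))
    return float(np.mean(scores))

# Assumed toy usage: weighted linear fit as a stand-in learner.
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y = 2.0 * x + rng.normal(scale=0.1, size=60)
w = np.exp(x - 0.5)                       # toy importance values
# np.polyfit applies its weights to unsquared residuals, hence the square root.
fit = lambda x_, y_, w_: np.polyfit(x_, y_, deg=1, w=np.sqrt(w_))
predict = lambda coef, x_: np.polyval(coef, x_)

score = importance_weighted_cv(x, y, w, fit, predict, n_folds=5, task="regression")
print(score)
```

Running this for several candidate hyperparameter settings and keeping the one with the lowest score is the usual way such a criterion is used for model selection.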

Page 19:

Covariate shift adaptation

Page 20:

Experiment: Outlier Detection

Page 21:

Conclusions

• Applications
– Covariate shift adaptation
– Outlier detection
– Feature selection
– Conditional distribution estimation
– ICA
– ……

Page 22:

References

• [1] Takafumi Kanamori, Shohei Hido. Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection, NIPS 2008.

• [2] R. Srinivasan, Importance sampling - Applications in communications and detection, Springer-Verlag, Berlin, 2002.

• [3] P. J. Smith, M. Shafi, and H. Gao, "Quick simulation: A review of importance sampling techniques in communication systems," IEEE J. Select. Areas Commun., vol. 15, pp. 597-613, May 1997.

• [4] Jiayuan Huang, Alexander J. Smola, Arthur Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006.

• [5] Jing Jiang. A Literature Survey on Domain Adaptation of Statistical Classifiers.