
Factorization Machines

Jan 27, 2022

Transcript
Page 1: Factorization Machines

Factorization Machines

Jakub Pachocki

10-805 class talk

Page 2: Factorization Machines

FACTORIZATION MACHINES

• A beautiful cross between Matrix Factorization and SVMs

• Introduced by Rendle in 2010

Page 3: Factorization Machines

KAGGLE DOMINANCE

[Figure: Kaggle competition results with top entries labeled "(FM)"]

Page 4: Factorization Machines

AD CLASSIFICATION

Clicked?  Country  Day     Ad_type
1         USA      3/3/15  Movie
0         China    1/7/14  Game
1         China    3/3/15  Game

Page 5: Factorization Machines

ONE-HOT ENCODING

Clicked?  Country=USA  Country=China  Day=3/3/15  Day=1/7/14  Ad_type=Movie  Ad_type=Game
1         1            0              1           0           1              0
0         0            1              0           1           0              1
1         0            1              1           0           0              1
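
Not on the slide: a minimal pandas sketch that reproduces the encoding above; the frame and column names simply mirror the slide's table.

```python
import pandas as pd

# Raw ad-click data from the "AD CLASSIFICATION" slide
ads = pd.DataFrame({
    "Clicked": [1, 0, 1],
    "Country": ["USA", "China", "China"],
    "Day":     ["3/3/15", "1/7/14", "3/3/15"],
    "Ad_type": ["Movie", "Game", "Game"],
})

# One-hot encode the categoricals: each (column, value) pair
# becomes its own 0/1 feature, as in the slide's table
X = pd.get_dummies(ads, columns=["Country", "Day", "Ad_type"])
print(X)
```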

Page 6: Factorization Machines

AD CLASSIFICATION

• Very large feature space

• Very sparse samples

• Should we run SGD now?

Page 7: Factorization Machines

POLY-2 KERNEL

• Often features are more important in pairs

• e.g. "Country=USA" ∧ "Day=Thanksgiving"

• Create a new feature for every pair of features (the resulting model is written out below)

• Feature space: insanely large

• Samples: still sparse
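
Written out, this is the standard degree-2 polynomial model (not spelled out on the slide), with an independent weight for every feature pair:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{i,j} \, x_i x_j$$

The pairwise term is where the O(n^2) blow-up in parameters comes from.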

Page 8: Factorization Machines

SHARPENING OCCAM’S RAZOR

• We cannot learn a weight for every possible pair of features because of memory constraints

• Even if we could (SVMs?), we might overfit massively

Page 9: Factorization Machines

FACTORIZATION MACHINES

• Let w_{i,j} be the weight assigned to feature pair (i, j)

• Key idea: set w_{i,j} = <v_i, v_j> (full model below)

• The v_i are vectors in k-dimensional space
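
The resulting model (this is the degree-2 Factorization Machine from Rendle's 2010 paper):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j, \qquad v_i \in \mathbb{R}^k$$

Identical to the Poly-2 model except that each pairwise weight w_{i,j} is replaced by <v_i, v_j>: O(nk) parameters instead of O(n^2).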

Page 10: Factorization Machines

FACTORIZATION MACHINES

• Let w_{i,j} be the weight assigned to feature pair (i, j)

• Key idea: set w_{i,j} = <v_i, v_j>

[Figure: latent vectors v_Country=USA, v_Country=China, v_Day=Chinese_New_Year, v_Day=Thanksgiving drawn in the shared k-dimensional space]

Page 11: Factorization Machines

SVMS MEET FACTORIZATION

• The idea is that weights between different pairs of features are not entirely independent

• Their dependence is described by latent factors

[Figure: the same latent vectors v_Country=USA, v_Country=China, v_Day=Chinese_New_Year, v_Day=Thanksgiving]

Page 12: Factorization Machines

MATRIX FACTORIZATION RECAP

[Figure: an n × m matrix R with entries r_11 … r_nm; the n rows are users, the m columns are movies]

R[i,j] = user i's rating of movie j

Page 13: Factorization Machines

MATRIX FACTORIZATION RECAP

[Figure: R is approximated by the product of two low-rank factors: an n × 2 matrix whose rows are user vectors (x_i, y_i) and a 2 × m matrix whose columns are movie vectors (a_j, b_j)]

R[i,j] = user i's rating of movie j

Page 14: Factorization Machines

MATRIX FACTORIZATION RECAP

[Figure: the same factorization, now annotated: the left factor holds low-dimensional vectors for users, the right factor holds low-dimensional vectors for movies]

R[i,j] = user i's rating of movie j
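
In symbols (the picture uses k = 2; in general the vectors are k-dimensional):

$$r_{ij} \approx x_i a_j + y_i b_j = \langle u_i, m_j \rangle, \qquad u_i = (x_i, y_i), \; m_j = (a_j, b_j)$$

i.e. each rating is approximated by the inner product of user i's latent vector with movie j's latent vector.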

Page 15: Factorization Machines

FMS AND MATRIX FACTORIZATION

Rating?  User=Alice  User=Bob  User=Jane  Movie=Titanic  Movie=Avatar
1        1           0         0          0              1
0        0           1         0          1              0
1        0           0         1          1              0

Page 16: Factorization Machines

FMS AND MATRIX FACTORIZATION

Rating?  User=Alice  User=Bob  User=Jane  Movie=Titanic  Movie=Avatar
1        1           0         0          0              1
0        0           1         0          1              0
1        0           0         1          1              0

• Equivalent! The latent factors for the user and movie feature weights yield the factorization. (See the reduction below.)
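
Why: with this encoding exactly one user feature u and one movie feature m are active per row, so the FM prediction collapses to

$$\hat{y} = w_0 + w_u + w_m + \langle v_u, v_m \rangle$$

which is matrix factorization with user and movie biases.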

Page 17: Factorization Machines

FMS AND SVMS

Clicked?  Country=USA  Country=China  Day=3/3/15  Day=1/7/14  Ad_type=Movie  Ad_type=Game
1         1            0              1           0           1              0
0         0            1              0           1           0              1
1         0            1              1           0           0              1

Page 18: Factorization Machines

FMS AND SVMS

• What if we set k very large?

Clicked?  Country=USA  Country=China  Day=3/3/15  Day=1/7/14  Ad_type=Movie  Ad_type=Game
1         1            0              1           0           1              0
0         0            1              0           1           0              1
1         0            1              1           0           0              1

Page 19: Factorization Machines

FMS AND SVMS

• Equivalent! For k large enough we can express any pairwise interactions between features. (A short justification follows the table.)

Clicked?  Country=USA  Country=China  Day=3/3/15  Day=1/7/14  Ad_type=Movie  Ad_type=Game
1         1            0              1           0           1              0
0         0            1              0           1           0              1
1         0            1              1           0           0              1
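
A one-line justification (following the expressiveness argument in Rendle's paper): the model only uses the off-diagonal entries w_{i,j} = <v_i, v_j> with i < j, and any symmetric interaction matrix W becomes positive semidefinite after adding a large enough constant to its diagonal; a PSD matrix factors as W = V Vᵀ with V ∈ R^{n×k}, k ≤ n, and the diagonal change never affects predictions.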

Page 20: Factorization Machines

FACTORIZATION MACHINES

• Generalize SVMs and Matrix Factorization (and many other models)

• Can be learned in linear time using SGD (see the identity and sketch below)
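
The linear-time claim rests on an identity from Rendle's paper: the pairwise sum can be rewritten so that prediction costs O(kn) instead of O(kn^2):

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 \, x_i^2 \right]$$

A minimal numpy sketch of the prediction rule using this identity (variable names are illustrative, not from the talk):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction in O(kn) time.

    x  : (n,) feature vector (one-hot/sparse in practice, dense here)
    w0 : global bias (scalar)
    w  : (n,) linear weights
    V  : (n, k) latent factor matrix; row i is v_i
    """
    linear = w0 + w @ x
    s = V.T @ x                            # s[f] = sum_i V[i, f] * x[i]
    sq = (V ** 2).T @ (x ** 2)             # sq[f] = sum_i V[i, f]^2 * x[i]^2
    pairwise = 0.5 * np.sum(s ** 2 - sq)   # Rendle's O(kn) identity
    return linear + pairwise
```

Since only the per-factor sums s touch all features, the SGD gradient ∂ŷ/∂v_{i,f} = x_i (s_f − v_{i,f} x_i) is equally cheap, which is what makes linear-time training possible.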

Page 21: Factorization Machines

BONUS: BUT HOW DO I WIN AT KAGGLE?

• (other than months of feature engineering)

• Use knowledge of the original fields the features come from!

• e.g. Country might have a different relationship to Date than to Ad_type

Page 22: Factorization Machines

FIELD-AWARE FACTORIZATION MACHINES

• Learn a different set of latent factors for every pair of fields

• Instead of <v_i, v_j> we use <v_{i,f(j)}, v_{j,f(i)}>, where f(i) is the field feature i comes from (see the sketch below)
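
A minimal sketch of the field-aware pairwise term for one sample with binary features; the (feature, field)-keyed layout is an assumption made for illustration, not something from the talk:

```python
import numpy as np

def ffm_pairwise(active, V):
    """Field-aware FM pairwise term for one sample.

    active : list of (field, feature) index pairs whose value is 1
    V      : dict mapping (feature j, field f) -> k-dim np.ndarray,
             so V[j, f] is v_{j,f}: feature j's vector used when
             interacting with field f (hypothetical layout)
    """
    total = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            f_i, i = active[a]
            f_j, j = active[b]
            # FFM: <v_{i,f(j)}, v_{j,f(i)}> instead of <v_i, v_j>
            total += V[i, f_j] @ V[j, f_i]
    return total
```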

Page 23: Factorization Machines

MISCELLANEOUS

• Use the hashing trick! (see the sketch below)

• ... and regularization

• Generating more features: GBDT, neural nets, etc.
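
A minimal sketch of the hashing trick: map arbitrary "field=value" strings into a fixed-size index space so the weight and factor arrays stay bounded no matter how many distinct feature values appear. The bucket count and names here are illustrative:

```python
import hashlib

def hashed_index(field: str, value: str, n_buckets: int = 2 ** 20) -> int:
    """Map a feature like Country=USA to one of n_buckets indices
    without storing a dictionary of all feature names."""
    key = f"{field}={value}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % n_buckets

# e.g. w[hashed_index("Country", "USA")] looks up the linear weight directly
```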

Page 24: Factorization Machines

THANK YOU!

Questions?