Discovery of Latent Factors in High-dimensional Data Using Tensor Methods
Furong Huang, University of California, Irvine
Machine Learning Conference 2016, New York City

Furong Huang, Ph.D. Candidate, UC Irvine at MLconf NYC - 4/15/16

Jan 19, 2017

Transcript
Page 1:

Discovery of Latent Factors in High-dimensional Data Using Tensor Methods

Furong Huang

University of California, Irvine

Machine Learning Conference 2016, New York City

Page 2:

Machine Learning - Modern Challenges

Big Data; Challenging Tasks

Success of Supervised Learning

Image classification; speech recognition; text processing

Computation power growth

Enormous labeled data

Pages 3-4:

Machine Learning - Modern Challenges

Big Data; Challenging Tasks

Real AI requires Unsupervised Learning

Filter bank learning; feature extraction; embeddings; topics

Summarize key features in data: Machines vs Humans

Foundation for successful supervised learning

Pages 5-6:

Unsupervised Learning with Big Data

Information Extraction

High-dimension observation vs low-dimension representation

Cell Types

Topics

Communities

Finding Needle In the Haystack Is Challenging

Page 7:

Unsupervised Learning with Big Data

Information Extraction

Solution for Unsupervised Learning

A Unified Tensor Decomposition Framework

Page 8:

Automated Categorization of Documents

Mixed topics

Topics: Education, Crime, Sports

Page 9:

Community Extraction From Connectivity Graph

Mixed memberships

Pages 10-12:

Tensor Methods Compared with Variational Inference

PubMed on Spark: 8 million docs
[Figure: perplexity (10^3 to 10^5) vs running time in seconds (0 to 10x10^4), tensor vs variational]

Facebook: n ∼ 20k; Yelp: n ∼ 40k; DBLP: n ∼ 1 million
[Figure: error per group (10^-2 to 10^1) and running time in seconds (10^2 to 10^6) on FB, YP, DBLPsub, DBLP]

Orders of Magnitude Faster & More Accurate

"Online Tensor Methods for Learning Latent Variable Models", F. Huang, U. N. Niranjan, M. Hakeem, A. Anandkumar, JMLR 2014.
"Tensor Methods on Apache Spark", F. Huang, A. Anandkumar, Oct. 2015.

Page 13:

Cataloging Neuronal Cell Types In the Brain

Page 14:

Cataloging Neuronal Cell Types In the Brain

Our method vs average expression level [Grange '14]
[Figure: spatial point process (ours) vs average expression level (previous), over k from 0.5 to 2.5]

Recovered known cell types:

1 astrocytes

2 interneurons

3 oligodendrocytes

"Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model", F. Huang, A. Anandkumar, C. Borgs, J. Chayes, E. Fraenkel, M. Hawrylycz, E. Lein, A. Ingrosso, S. Turaga, NIPS 2015 BigNeuro workshop.

Pages 15-16:

Word Sequence Embedding Extraction

Word Embedding: football, soccer, tree

Word Sequence Embedding:
"The weather is good."
"Her life spanned years of incredible change for women." / "Mary lived through an era of liberating reform for women."

Paraphrase Detection

MSR paraphrase data: 5800 pairs of sentences

Method                          | Outside Information   | F score
Vector Similarity (Baseline)    | word similarity       | 75.3%
Convolutional Tensor (Proposed) | none                  | 80.7%
Skip-thought (NIPS'15)          | train on large corpus | 81.9%

"Convolutional Dictionary Learning through Tensor Factorization", F. Huang, A. Anandkumar, JMLR Workshop and Conference Proceedings, vol. 44, Dec 2015.

Page 17:

Human Disease Hierarchy Discovery

CMS: 1.6 million patients, 168 million diagnostic events, 11k diseases.

"Scalable Latent Tree Model and its Application to Health Analytics", F. Huang, U. N. Niranjan, I. Perros, R. Chen, J. Sun, A. Anandkumar, NIPS 2015 MLHC workshop.

Pages 18-21:

Unsupervised Learning via Probabilistic Models

Graphical model: a choice variable h selects among topics k1, ..., k5, and each topic generates words (life, gene, data, DNA, RNA) through the topic-word matrix A.

Unlabeled data → probabilistic admixture model → inference

MCMC: random sampling, slow
◮ Exponential mixing time

Likelihood: non-convex, not scalable
◮ Exponential critical points

Solution

A unified tensor decomposition framework

Pages 22-23:

Unsupervised Learning via Probabilistic Models

Unlabeled data → probabilistic admixture model → tensor decomposition → inference

A tensor decomposed as a sum of rank-1 components ("= + +" on the slide); tensor decomposition → correct model

Contributions

Guaranteed online algorithm with global convergence guarantee

Highly scalable, highly parallel, random projection

Tensor library on CPU/GPU/Spark

Interdisciplinary applications

Extension to models with group invariance
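The "= + +" pictogram above is a CP (rank-1 sum) decomposition of a third-order tensor. A minimal NumPy sketch with arbitrary illustrative vectors (not anything from the talk):

```python
import numpy as np

# A third-order tensor written as a sum of rank-1 terms a_j (x) b_j (x) c_j,
# matching the "= + +" picture. The factor vectors are made up for illustration.
rng = np.random.default_rng(0)
a, b, c = rng.random((3, 4)), rng.random((3, 4)), rng.random((3, 4))

# T[i1, i2, i3] = sum_j a[j, i1] * b[j, i2] * c[j, i3]
T = np.einsum('ji,jk,jl->ikl', a, b, c)

# Equivalent rank-1-at-a-time construction.
T_sum = sum(np.einsum('i,j,k->ijk', a[j], b[j], c[j]) for j in range(3))
assert np.allclose(T, T_sum)
```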

Page 24:

What is a tensor?

Matrix: Second Order Moments

M2: pair-wise relationship.

[x ⊗ x]_{i1, i2} = x_{i1} x_{i2} → [M2]_{i1, i2}

Tensor: Third Order Moments

M3: triple-wise relationship.

[x ⊗ x ⊗ x]_{i1, i2, i3} = x_{i1} x_{i2} x_{i3} → [M3]_{i1, i2, i3}
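The slide's moment formulas can be computed empirically as averages of outer products; a small sketch on random illustrative data:

```python
import numpy as np

# Empirical versions of the slide's moments: M2 averages x (x) x (pairwise
# products x_{i1} x_{i2}), M3 averages x (x) x (x) x (triple products).
# The data here is random and purely illustrative.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))  # 1000 observations x, each of dimension 5

M2 = np.einsum('ni,nj->ij', X, X) / len(X)          # [M2]_{i1,i2}
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / len(X)   # [M3]_{i1,i2,i3}

# Both moments are symmetric under permuting their indices.
assert np.allclose(M2, M2.T)
assert np.allclose(M3, M3.transpose(1, 0, 2))
```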

Pages 25-28:

Why are tensors powerful?

Matrix Orthogonal Decomposition

Not unique without an eigenvalue gap:

[1 0; 0 1] = e1 e1^⊤ + e2 e2^⊤ = u1 u1^⊤ + u2 u2^⊤

with the standard basis e1, e2 and the rotated basis

u1 = [√2/2, −√2/2], u2 = [√2/2, √2/2]

Tensor Orthogonal Decomposition

Unique: eigenvalue gap not needed

Slice of tensor has eigenvalue gap
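A quick NumPy check of the slide's claim, using the slide's own e and u vectors: both bases give the same matrix, but different third-order tensors.

```python
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
s = np.sqrt(2) / 2
u1, u2 = np.array([s, -s]), np.array([s, s])

# Two different orthonormal bases give the SAME matrix decomposition...
M_e = np.outer(e1, e1) + np.outer(e2, e2)
M_u = np.outer(u1, u1) + np.outer(u2, u2)
assert np.allclose(M_e, M_u)      # matrix case: decomposition is not unique

# ...but DIFFERENT third-order tensors, so the tensor pins down the components.
T_e = np.einsum('i,j,k->ijk', e1, e1, e1) + np.einsum('i,j,k->ijk', e2, e2, e2)
T_u = np.einsum('i,j,k->ijk', u1, u1, u1) + np.einsum('i,j,k->ijk', u2, u2, u2)
assert not np.allclose(T_e, T_u)  # tensor case: components are identifiable
```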

Pages 29-30:

Outline

1 Introduction

2 LDA and Community Models
  From Data Aggregates to Model Parameters
  Guaranteed Online Algorithm

3 Conclusion

Pages 31-33:

Probabilistic Topic Models - LDA

Bag of words; topics; topic proportion

Words: campus, police, witness. Topics: Crime, Sports, Education.

Goal: recover the topic-word matrix P[word = i | topic = j]
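The topic-word matrix and the admixture generation step can be made concrete with a toy example. The numbers below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Toy topic-word matrix over the slide's vocabulary; column j is
# P[word = i | topic = j] for topics Crime, Sports, Education.
vocab = ["campus", "police", "witness"]
A = np.array([[0.1, 0.3, 0.8],
              [0.5, 0.4, 0.1],
              [0.4, 0.3, 0.1]])   # each column sums to 1
assert np.allclose(A.sum(axis=0), 1.0)

# LDA-style generation: draw a topic proportion h, then words from the mixture.
rng = np.random.default_rng(0)
h = rng.dirichlet(np.ones(3))     # topic proportion for one document
word_dist = A @ h                 # P[word] under this document's mixture
doc = rng.choice(vocab, size=10, p=word_dist)
```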

Pages 34-41:

Mixture Form of Moments

Goal: linearly independent topic-word table (columns for Crime, Sports, Education over the words campus, police, witness)

M1: Occurrence of Words

M1 is a weighted sum of the topic-word columns: no unique decomposition of vectors

M2: Modified Co-occurrence of Word Pairs

Matrix decomposition recovers the subspace, not the actual model

Find a W such that M2(W, W) = W^⊤ M2 W = I: many such W's; find one, project the data with W

M3: Modified Co-occurrence of Word Triplets

Apply W to each mode of M3: unique orthogonal tensor decomposition; project the result back with W† (the pseudo-inverse)

Tensor decomposition uniquely discovers the correct model

Learning Topic Models through Matrix/Tensor Decomposition
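The whiten-then-decompose pipeline on the preceding slides can be illustrated end to end in NumPy. This is a hedged toy reconstruction, not the talk's implementation: exact moments of a synthetic model (hypothetical components A_true and weights w) stand in for real document data.

```python
import numpy as np

# Sketch of the pipeline: whiten M2 with W, run the tensor power method on
# the whitened triple moments M3(W, W, W), then un-whiten with pinv(W^T).
rng = np.random.default_rng(1)
d, k = 6, 3
A_true = np.linalg.qr(rng.standard_normal((d, k)))[0]  # topic-word columns (orthonormal for simplicity)
w = np.array([0.5, 0.3, 0.2])                          # topic probabilities

M2 = (A_true * w) @ A_true.T                           # sum_j w_j a_j a_j^T
M3 = np.einsum('j,aj,bj,cj->abc', w, A_true, A_true, A_true)

# Whitening: W such that W^T M2 W = I_k
U, S, _ = np.linalg.svd(M2)
W = U[:, :k] / np.sqrt(S[:k])
T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)        # orthogonally decomposable

# Tensor power iteration with deflation: each run converges to one whitened
# component v_j with eigenvalue lam_j = 1/sqrt(w_j).
A_est, w_est = [], []
for _ in range(k):
    v = rng.standard_normal(k)
    v /= np.linalg.norm(v)
    for _ in range(100):
        v = np.einsum('ijk,j,k->i', T, v, v)
        lam = np.linalg.norm(v)
        v /= lam
    A_est.append(lam * np.linalg.pinv(W.T) @ v)        # un-whiten: recover a_j
    w_est.append(1.0 / lam**2)
    T = T - lam * np.einsum('i,j,k->ijk', v, v, v)     # deflate the found component

# Each estimated column matches some true component up to sign.
match = np.abs(np.array(A_est) @ A_true)               # |cosine| similarities
assert np.allclose(sorted(w_est, reverse=True), w)
assert (match.max(axis=1) > 0.99).all()
```

Note the design choice from the slides: uniqueness comes from the third-order tensor; M2 alone only fixes the subspace, which is why W is computed from M2 but the components come out of M3.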

Pages 42-44:

Mixed Membership Community Models

Mixed memberships

What ensures guaranteed learning?

[Figure: a membership graph linking Alice, Bob, Charlie, David, Ellen, Frank, Grace, Jack, and Kathy to the communities Mathematicians, Vegetarians, and Musicians]

Page 45:

Outline

1 Introduction

2 LDA and Community Models
  From Data Aggregates to Model Parameters
  Guaranteed Online Algorithm

3 Conclusion

Pages 46-53:

How to do tensor decomposition?

The model is uniquely identifiable, but how do we identify it? Finding the components is a non-convex optimization problem!

Objective Function

Theorem: We propose an objective function with equivalent local optima.

Saddle point: enemy of SGD

A saddle point has 0 gradient

Non-degenerate saddle: the Hessian has both + and − eigenvalues

Negative eigenvalue: direction of escape

Guaranteed Globally Convergent Online Tensor Decomposition

Theorem: For smooth functions with non-degenerate saddle points, noisy SGD converges to a local minimum in a polynomial number of steps.

Noise could help!

"Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition", R. Ge, F. Huang, C. Jin, Y. Yuan, COLT 2015.
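The escape mechanism can be seen on the simplest non-degenerate saddle. This toy demo is ours, not from the cited paper: it only shows the escape step, on f(x, y) = x² − y², whose origin has Hessian eigenvalues +2 and −2.

```python
import numpy as np

# Plain gradient descent started exactly at the saddle (0, 0) never moves,
# because the gradient there is 0. Injecting noise lets the iterate fall
# into the negative-curvature (-y^2) escape direction.
rng = np.random.default_rng(0)

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])   # gradient of f(x, y) = x^2 - y^2

p_plain = np.zeros(2)
p_noisy = np.zeros(2)
eta = 0.1
for _ in range(200):
    p_plain -= eta * grad(p_plain)                        # stuck at the saddle
    p_noisy -= eta * (grad(p_noisy) + rng.normal(0, 0.01, 2))

assert np.allclose(p_plain, 0.0)   # plain GD never left the saddle
assert abs(p_noisy[1]) > 0.5       # noisy GD escaped along the -y^2 direction
```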

Page 54:

Outline

1 Introduction

2 LDA and Community Models
  From Data Aggregates to Model Parameters
  Guaranteed Online Algorithm

3 Conclusion

Pages 55-56:

Contributions

Spectral methods reveal hidden structure

Text/image processing

Social networks

Neuroscience, healthcare, ...

Versatile for latent variable models

Flat model → hierarchical model

Sparse coding → convolutional model

Efficient, with convergence guarantees

Page 57:

Thank You

Collaborators

Anima Anandkumar, UC Irvine
Rong Ge, Duke University
Srini Turaga, Janelia Research
Chi Jin, UC Berkeley
Jennifer Chayes, MSR
Christian Borgs, MSR
Ernest Fraenkel, MIT
Yang Yuan, Cornell U
UN Niranjan, UC Irvine