The Canonical Tensor Decomposition and Its Applications to Social Network Analysis
Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy, Sandia National Labs
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is the Canonical Tensor Decomposition?

The CANDECOMP/PARAFAC (CP) model [Hitchcock '27; Harshman '70; Carroll & Chang '70] approximates an I x J x K tensor by a sum of R rank-one components:

X ≈ a_1 ∘ b_1 ∘ c_1 + … + a_R ∘ b_R ∘ c_R

[Figure: a three-way tensor written as a sum of R rank-one terms.]
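The CP model above can be sketched in a few lines: each tensor entry is a sum of R rank-one contributions, x_ijk ≈ sum_r a_ir * b_jr * c_kr. The factor matrices here are made-up toy data, not anything from the talk.

```python
def cp_entry(A, B, C, i, j, k):
    """Reconstruct one entry of a 3-way tensor from CP factor matrices
    A (I x R), B (J x R), C (K x R)."""
    R = len(A[0])
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))

# Toy rank-2 example with I = J = K = 2.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[2.0, 1.0], [0.0, 3.0]]
C = [[1.0, 1.0], [1.0, 2.0]]

# Entry (0, 0, 0): 1*2*1 + 0*1*1
print(cp_entry(A, B, C, 0, 0, 0))  # -> 2.0
```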
Experimental setup: 180 random tensors, 20 triplets per configuration, 360 tests in total.
• Step 1: Generate random factor matrices A, B, C with R_true = 3 or 5 columns each and collinearity set to 0.5.
• Step 3: Use each algorithm to extract factors, using both R_true and R_true + 1 components. Compare against the factors generated in Step 1.
Implementation Details
• All experiments were performed in MATLAB on a Linux workstation (Quad-Core Intel Xeon 2.50 GHz, 9 GB RAM).
• Methods:
– CPALS: Alternating least squares. Used parafac_als in the Tensor Toolbox (Bader & Kolda).
– CPNLS: Nonlinear least squares. Used PARAFAC3W, which implements Levenberg-Marquardt (necessary due to the scaling ambiguity), by Tomasi and Bro.
– CPOPT: Optimization. Used routines in the Tensor Toolbox to compute function values and gradients. Optimization via the Nonlinear Conjugate Gradient (NCG) method with the Hestenes-Stiefel update, using Poblano (in-house code to be released soon).
– CPOPTR: Optimization with regularization. Same as above, with regularization parameter 0.02.
CPOPT is Fast and Accurate
Generated 360 dense test problems (with ranks 3 and 5) and factorized each with R equal to the correct number of components and to one more than that, for a total of 720 tests per entry below.
Tensor size: K x K x K; R = number of components. Per-iteration cost is O(RK^3) for CPALS, CPOPT, and CPOPTR, and O(R^3 K^3) for CPNLS.
Overfactoring has a significant impact; CPOPT is robust to overfactoring.
Application: Link Prediction

Link Prediction on Bibliometric Data
[Figure: an authors x conferences x years tensor; training years 1991-2004, test years 2005-2007.]
Question 1: Can we use tensor decompositions to model the data and extract meaningful factors?
Tensor entry x_ijk = number of papers by the ith author at the jth conference in year k.
Question 2: Can we predict who is going to publish at which conferences in the future?
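A minimal sketch of how such a count tensor can be assembled from (author, conference, year) publication records, stored sparsely. The names and records below are invented for illustration; the real data is the DBLP set used in the talk.

```python
from collections import defaultdict

def build_tensor(records):
    """Map (author, conference, year) -> number of papers, stored sparsely."""
    counts = defaultdict(int)
    for author, conf, year in records:
        counts[(author, conf, year)] += 1
    return counts

# Hypothetical records: two papers by the same author at one venue/year.
records = [
    ("smith", "CARS", 2003),
    ("smith", "CARS", 2003),
    ("jones", "IJCAI", 2004),
]
X = build_tensor(records)
print(X[("smith", "CARS", 2003)])  # -> 2
```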
Components make sense!
Fitting the CP model X ≈ a_1 ∘ b_1 ∘ c_1 + … + a_R ∘ b_R ∘ c_R to the authors x conferences x years DBLP tensor yields interpretable components (a_r: author mode, b_r: conference mode, c_r: time mode).
[Figure: coefficients of one component across the three modes. Time mode: coefficients over 1992-2004. Conference mode: peaks at BILDMED, CARS, and DAGM. Author mode: peaks at Hans Peter Meinzer, Heinrich Niemann, and Thomas Martin Lehmann.]
Components make sense!
[Figure: another component of the same CP model. Time mode: coefficients over 1992-2004. Conference mode: peaks at IJCAI. Author mode: peaks at Craig Boutilier and Daphne Koller.]
Link Prediction Problem
TRAIN: the authors x conferences x years tensor for 1991-2004, fit with the CP model X ≈ a_1 ∘ b_1 ∘ c_1 + … + a_R ∘ b_R ∘ c_R.
TEST: authors x conferences data for 2005-2007.
• ~60K links out of 19 million possible <author, conf> pairs (~0.3% dense); ~32K links are previously unseen in the training set.
• <author_i, conf_j> = 1 if the ith author publishes at the jth conference; <author_i, conf_j> = 0 otherwise.
Score for <author_i, conf_j>
• Sign ambiguity: a rank-one component is unchanged if two of its factor vectors flip sign, so the signs of individual vectors are not determined.
• Fix signs using the signs of the maximum-magnitude entries, and then compute a score for each author-conference pair using the information from the time mode.
[Figure: temporal profiles c_1 and c_2 of two components, used to weight the corresponding a_r, b_r contributions.]
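One way the scoring step could look in code: collapse the time mode into a per-component weight and score each <author_i, conf_j> pair as a weighted inner product over components. Averaging the last few temporal coefficients is one plausible weighting, not necessarily the talk's exact rule; the toy factor matrices are invented.

```python
def pair_score(A, B, C, i, j, last=3):
    """Score pair (author i, conference j): sum over components of
    (temporal weight) * a_ir * b_jr, with the weight taken as the mean
    of each component's last few time-mode coefficients."""
    R = len(A[0])
    score = 0.0
    for r in range(R):
        recent = [C[k][r] for k in range(len(C) - last, len(C))]
        gamma = sum(recent) / last  # temporal weight for component r
        score += gamma * A[i][r] * B[j][r]
    return score

# Toy data: one author, one conference, two components, three years.
A = [[1.0, 2.0]]
B = [[0.5, 1.0]]
C = [[0.0, 0.0], [0.0, 0.0], [3.0, 3.0]]
print(pair_score(A, B, C, 0, 0))  # -> 2.5
```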
Performance Measure: AUC
• s contains the scores for all possible pairs, e.g., ~19 million entries.
• Sort the scores in decreasing order and lay out the corresponding 0/1 labels in the sorted order, where <author_i, conf_j> = 1 if the ith author publishes at the jth conference and 0 otherwise.
• With N = number of 1's and M = number of 0's, AUC is the area under the ROC curve traced by this sorted label sequence.
[Figure: the score vector s = (s_11, s_12, …, s_IJ), its sorted version, and the corresponding label vector.]
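A minimal sketch of the AUC computation just described: with N ones and M zeros, the area under the ROC curve equals the fraction of (positive, negative) pairs that the scores rank correctly.

```python
def auc(scores, labels):
    """Rank-based AUC over 0/1 labels; ties count as half a win."""
    ones = [s for s, y in zip(scores, labels) if y == 1]
    zeros = [s for s, y in zip(scores, labels) if y == 0]
    n, m = len(ones), len(zeros)
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in ones for q in zeros)
    return wins / (n * m)

# Positives scored 0.9 and 0.4; negatives 0.8 and 0.2:
# 3 of the 4 (positive, negative) pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))  # -> 0.75
```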
CP-WOPT: Handling Missing Data

Missing Data Examples
Missing data arises in many disciplines due to loss of information, machine failures, different sampling frequencies, or experimental set-ups:
• Chemometrics (e.g., emission-excitation fluorescence data [Tomasi & Bro '05])
• Biomedical signal processing (e.g., EEG)
• Network traffic analysis (e.g., packet drops)
• Computer vision (e.g., occlusions)
• …
[Figure: an EEG tensor (channels x time-frequency x subjects 1 … N) approximated by a CP model despite missing entries.]
Modify the Objective for CP

Optimization Problem
Objective function: no missing data vs. handling missing data.

Our approach: CP-WOPT

Objective and Gradient
Objective function and its gradient (for r = 1, …, R; i = 1, …, I; j = 1, …, J; k = 1, …, K).
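The equations themselves appear as images on the slides; a hedged reconstruction consistent with the weighted CP formulation is given below, where W is the 0/1 indicator tensor with w_ijk = 1 wherever x_ijk is known:

```latex
% CP-WOPT weighted objective: unknown entries are ignored via the weights
f_{\mathcal{W}}(\mathbf{A},\mathbf{B},\mathbf{C})
  = \frac{1}{2} \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}
    w_{ijk}\Bigl( x_{ijk} - \sum_{r=1}^{R} a_{ir}\,b_{jr}\,c_{kr} \Bigr)^{2}

% Gradient entries (for r = 1,\dots,R; i = 1,\dots,I; j = 1,\dots,J; k = 1,\dots,K)
\frac{\partial f_{\mathcal{W}}}{\partial a_{ir}}
  = -\sum_{j,k} w_{ijk}\Bigl( x_{ijk} - \sum_{s=1}^{R} a_{is}\,b_{js}\,c_{ks} \Bigr) b_{jr}\,c_{kr}
```

and symmetrically for the entries of the gradient with respect to B and C.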
Experimental setup: 20 triplets per configuration.
• Step 1: Generate random factor matrices A, B, C with R = 5 or 10 columns each and collinearity set to 0.5.
• Step 2: Construct a tensor from the factor matrices and add noise (2% homoscedastic noise).
• Step 3: Set some entries to missing. Percentage of missing data: 10%, 40%, or 70%; missing patterns: individual entries or whole fibers.
• Step 4: Use each algorithm to extract R factors. Compare against the factors generated in Step 1.
CP-WOPT is Accurate!
Generated 40 test problems (with ranks 5 and 10) and factorized each with an R-component CP model. Each entry below is the percentage of correctly recovered solutions, reported against the ratio of known data entries to the number of variables.
CPNLS: nonlinear least squares. Used INDAFAC, which implements Levenberg-Marquardt [Tomasi & Bro '05]. Another alternative is ALS-based imputation (for comparisons, see Tomasi & Bro '05).
CP-WOPT is Fast!
Generated 60 test problems (with M = 10%, 40%, and 70% missing data) and factorized each with an R-component CP model. Each entry below reports the average and standard deviation over the CP models that successfully recover the underlying factors.
CP-WOPT is useful for real data!
GOAL: to differentiate between left- and right-hand stimulation.
[Figure: a channels x time-frequency x subjects EEG tensor factorized with a CP model, for both the complete data and incomplete data (with missing entries).]
Thanks to Morten Mørup!
Summary & Future Work
• New CPOPT method: accurate & scalable.
• Extended CPOPT to CP-WOPT to handle missing data: accurate & scalable.
• More open questions: starting point? tuning the optimization; regularization; exploiting sparsity; nonnegativity.
• Application to link prediction: ongoing work comparing to other methods.
Thank you!
More on tensors and tensor models:
• Survey: E. Acar and B. Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.
• CPOPT: E. Acar, T. G. Kolda and D. M. Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, submitted for publication.
• CP-WOPT: E. Acar, T. G. Kolda, D. M. Dunlavy and M. Mørup, Tensor Factorizations with Missing Data, submitted for publication.
• Link Prediction: E. Acar, T. G. Kolda and D. M. Dunlavy, Link Prediction on Evolving Data, in