Weighted Representational Component Models I
Jörn Diedrichsen
Motor Control Group Institute of Cognitive Neuroscience
University College London
Weighting of different features

[Figure: a simple model with feature patterns f_1, f_2, ..., f_k, versus a weighted component model with weighted feature patterns w_1 f_1, w_2 f_2, ..., w_k f_k.]

Examples:
• Different aspects of objects: color, shape
• Different aspects of movements: force, sequence, timing
• Different layers of a computational vision model
Overview

Representational component modelling:
• Features or groups of features (components) can be differently weighted
• Component weights can be estimated from the data
• Inferences can be made directly on component weights, or
• Model fit can be assessed using cross-validation
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
Pattern distance
d_{i,j} = G_{i,i} + G_{j,j} − G_{i,j} − G_{j,i}

Pattern covariance
G = UU^T
Covariances and Distances

U is the K conditions × P voxels matrix of activity patterns; row u_i is the pattern for condition i.

Distances (LDC):
d_{i,j} = (u_i − u_j)(u_i − u_j)^T = u_i u_i^T + u_j u_j^T − 2 u_i u_j^T

Inner product of patterns:
G_{i,j} = u_i u_j^T
Covariances and Distances

Pattern covariance matrix (inner-product matrix), contains baseline information:
G = UU^T

Mahalanobis-distance matrix, contains no baseline information:
D_{i,j} = G_{i,i} + G_{j,j} − G_{i,j} − G_{j,i}

The (centered) covariance matrix can be recovered from the distances:
G = −(1/2) H D H^T, with the centering matrix H = I_K − (1/K) 1_K 1_K^T

Assuming the baseline is the mean of all patterns, the column and row means of G are zero.
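A minimal sketch of these conversions in Python (names and toy sizes are my own, not from the slides): compute G and D from a pattern matrix U, then recover the doubly centered G from D.

```python
import numpy as np

K, P = 5, 100                        # K conditions, P voxels (toy sizes)
rng = np.random.default_rng(0)
U = rng.standard_normal((K, P))      # activity patterns (conditions x voxels)

G = U @ U.T                          # inner-product (covariance) matrix
g = np.diag(G)
D = g[:, None] + g[None, :] - G - G.T    # D_ij = G_ii + G_jj - G_ij - G_ji

# Recover the centered G: G_c = -1/2 * H D H^T with H = I_K - (1/K) 1 1^T
H = np.eye(K) - np.ones((K, K)) / K
G_centered = -0.5 * H @ D @ H.T
assert np.allclose(G_centered, H @ G @ H.T)   # same G, baseline removed
```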
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
Features and representational components

The pattern for each condition is caused by different features, each associated with a feature pattern.

[Figure: condition patterns plotted in the space of Feature 1 vs. Feature 2; the contribution of the first feature is its feature pattern scaled by the feature value, f_1 w_1.]

U = ∑_{h=1}^{H} f_h w_h

where U (K conditions × P voxels) contains the condition patterns, f_h (K × 1) is the feature, and w_h (1 × P) is the corresponding feature pattern.
Features and representational components

The pattern for each condition is caused by different features, each associated with a feature pattern:

U = ∑_{h=1}^{H} f_h w_h

Covariance matrix:

G = UU^T = (∑_{h=1}^{H} f_h w_h)(∑_{h=1}^{H} w_h^T f_h^T)

Assuming independence of the feature patterns (w_i w_j^T = 0 for i ≠ j):

G = ∑_{h=1}^{H} (w_h w_h^T)(f_h f_h^T) = ∑_{h=1}^{H} ω_h G_h

The component weight ω_h = w_h w_h^T is the variance or power of the feature pattern; G_h = f_h f_h^T is the component matrix. The same weights apply to the distance matrix:

D = ∑_{h=1}^{H} ω_h D_h
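A small simulation sketch (my own toy example, not from the slides) illustrating that with independent random feature patterns the pattern covariance approaches the weighted sum of component matrices. Note that I normalise by P, so ω_h here is the per-voxel variance of each feature pattern rather than its total power.

```python
import numpy as np

rng = np.random.default_rng(1)
K, P, H = 4, 10000, 2
f = [rng.standard_normal((K, 1)) for _ in range(H)]   # fixed feature vectors
omega = [2.0, 0.5]                                    # component weights

# Feature patterns: independent across components, variance omega_h per voxel
U = sum(np.sqrt(om) * fh @ rng.standard_normal((1, P))
        for om, fh in zip(omega, f))

G_empirical = U @ U.T / P                             # per-voxel covariance
G_model = sum(om * (fh @ fh.T) for om, fh in zip(omega, f))
print(np.max(np.abs(G_empirical - G_model)))          # small for large P
```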
Features and representational components

Most often we do not weight single features, but groups of features: representational components.

U = ∑_{h=1}^{H} F_h W_h

where F_h (K conditions × Q features) contains the features of component h and W_h (Q features × P voxels) the corresponding feature patterns.

Covariance matrix:

G = UU^T = (∑_{h=1}^{H} F_h W_h)(∑_{h=1}^{H} W_h^T F_h^T)

Assuming independence of the feature patterns across components (W_i W_j^T = 0 for i ≠ j) and a covariance of the feature patterns within a component of W_h W_h^T = V_h ω_h:

G = ∑_{h=1}^{H} F_h W_h W_h^T F_h^T = ∑_{h=1}^{H} ω_h (F_h V_h F_h^T) = ∑_{h=1}^{H} ω_h G_h

The component weight ω_h is the variance or power of the feature patterns; G_h = F_h V_h F_h^T is the component matrix. Correspondingly, for the distance matrix:

D = ∑_{h=1}^{H} ω_h D_h
Features and representational components

The feature vectors F_h and the feature correlations V_h define the covariance component

G_h = F_h V_h F_h^T

and the distance component

D_{i,j} = G_{i,i} + G_{j,j} − 2 G_{i,j}

[Figure: example feature sets and feature correlation matrices V_h, e.g. a single feature (V_h = 1), independent indicator features (V_h = I), a constant feature, unequal feature variances (V_h = diag(1/2, 3/2)), and correlated features (V_h = [1 0.5; 0.5 1]). The feature sets are not unique; the resulting component matrix G_h is unique.]
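A short sketch of this non-uniqueness (the "representational similarity trick"); the specific feature matrices and the rotation are my own illustration. Rotating the feature space changes the features but leaves the component matrix G_h unchanged.

```python
import numpy as np

F1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])        # 3 conditions x 2 features
V1 = np.eye(2)                     # independent features

theta = 0.7                        # rotate the feature space: F2 = F1 R
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
F2, V2 = F1 @ R, np.eye(2)

G1 = F1 @ V1 @ F1.T                # component matrix G_h = F_h V_h F_h^T
G2 = F2 @ V2 @ F2.T
assert np.allclose(G1, G2)         # different features, identical G_h
```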
Features and representational components: Estimation

How do we estimate the component weights?

D = ∑_{h=1}^{H} ω_h D_h

Vectorise the measured distance matrix, D → d, and the predicted component distance matrices, D_h → d_h. Build the component design matrix

X = [d_1, d_2, ...]

and estimate the weights by ordinary least squares:

ω = (X^T X)^{-1} X^T d = X^+ d

[Figure: measured distances (D) plotted against predicted distances (D_h).]
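A minimal Python sketch of this OLS step (function names are my own); the vectorisation keeps the lower-triangular, off-diagonal entries of each distance matrix.

```python
import numpy as np

def vectorise_rdm(D):
    """Lower-triangular (off-diagonal) entries of a K x K distance matrix."""
    return D[np.tril_indices(D.shape[0], k=-1)]

def estimate_weights_ols(D_measured, D_components):
    d = vectorise_rdm(D_measured)                    # measured distances
    X = np.column_stack([vectorise_rdm(Dh) for Dh in D_components])
    omega, *_ = np.linalg.lstsq(X, d, rcond=None)    # (X'X)^{-1} X'd
    return omega
```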
Features and representational components

• Features are variables encoded in neuronal elements
• Groups of features with similar encoding strength form a representational component
• Features of different components are assumed to be mutually independent
• Many feature sets can lead to the same representational component
• Models are uniquely specified via their component matrices (representational similarity trick)
• Component weights estimate the variance (or power) of representations
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
Integrated vs. independent encoding

• So far we have assumed that different components are encoded independently in the brain
• This does not mean that they are encoded in different regions / voxels: only that their patterns are unrelated to each other
• BUT: can we test this?
Integrated vs. independent encoding: factorial models

• Are two groups of features (variables) encoded independently or dependently?
• Vary the 2 factors in a fully crossed design:
– Condition (see / do) x Action (3 gestures)
– Rhythm x Spatial sequence
– Reach directions (3) x Grasps (3) ...
• Where is factor A encoded, where is B encoded?
• Are A and B encoded in an integrated or independent fashion?
• Is factor B consistently encoded across levels of factor A ("cross-decoding")?
Factorial representational models (MANOVA)

Example design: Factor A (see / do) fully crossed with Factor B (grasp / pinch / grip).

Pipeline: raw data Y → first-level GLM → regression coefficients B → pattern estimates U (searchlight, noise normalisation, crossvalidation) → dissimilarity matrix D.

The component design matrix X contains the vectorised component distance matrices for Factor A, Factor B, and their interaction (features coded 0 / 1, predicted distances 0 / 2). The component weights are estimated by regression:

ω = (X^T X)^{-1} X^T vec(D)

The rows of (X^T X)^{-1} X^T act as contrasts over the measured distances (e.g. weights of −1/6 and +1/6), so the main-effect weights correspond to cross "decoding" / pattern consistency (Allefeld et al., 2013).

Example component weight estimates: ω_A = 0.005, ω_B = 0, ω_I = 0.01.
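A sketch (my own construction, feeding into the estimator above) of the component distance matrices for such a 2 × 3 factorial design; here the interaction is coded simply as one indicator feature per cell, whereas the actual analysis may orthogonalise it against the main effects.

```python
import numpy as np

def component_rdm(F):
    """Squared-distance matrix implied by feature matrix F (with V_h = I)."""
    G = F @ F.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G

A  = np.repeat(np.eye(2), 3, axis=0)   # Factor A level per condition (6 x 2)
B  = np.tile(np.eye(3), (2, 1))        # Factor B level per condition (6 x 3)
AB = np.eye(6)                         # interaction: one feature per cell

D_A, D_B, D_AB = component_rdm(A), component_rdm(B), component_rdm(AB)
# Stack the vectorised D_A, D_B, D_AB as columns of X and regress vec(D)
```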
Factorial representational models (MANOVA)

• Factorial models can reveal main encoding effects and interactions
• Component weight estimates are unbiased and can be directly tested in a group analysis
• Main effects are assessed by pattern consistency across the levels of the other variable (replaces cross-classification)
• Mathematically identical to the approach suggested by Allefeld et al. (2013)
Factorial MANOVA designs (example)

[Figure: overall encoding maps for the contralateral and ipsilateral hemispheres (lateral and medial views). Kornysheva et al. (2014), eLife.]
Factorial representational models (MANOVA)

Linearity assumption: the patterns for different components overlap linearly

• if the relationship between neural activity and BOLD is approximately linear, AND
• if the components engage independent neuronal subpopulations, or combine linearly to determine firing rate.

Experimental conditions should therefore be similar in overall activity.
Note: mean value subtraction in the analysis does not fix this!
Key insights I

D1. Pattern covariance matrices and squared Euclidean distance matrices capture the same information, but the former retain the baseline.
D2. A representational component (RC) is a group of representational features.
D3. A representation can be modelled as a weighted combination of RCs (one weight per RC).
D4. Weighted combinations of RCs correspond to weighted combinations of representational distance matrices.
D5. Component weights can be estimated using regression and tested directly (against zero) in group analyses.
Weighted Representational Component Models II
Jörn Diedrichsen
Motor Control Group Institute of Cognitive Neuroscience
University College London
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
Linear representational models

At what level are sequences represented? Example: a sequence consisting of chunks (Yokoi et al., in prep).

[Figure: the features and representational components at each level, with the corresponding covariance and distance matrices. Yokoi et al. (in prep).]
Linear representational models

The use of simple regression (OLS) assumes that the distances are i.i.d., i.e. that they:
• are independent
• have equal variance
• are ~ normally distributed

If this is violated, we still have an unbiased estimator, but not the best linear unbiased estimator (BLUE).
Linear representational models: variance of distances

Differences between patterns are measured with noise: the true difference is δ_{ij} = u_i − u_j, the estimate is δ_{ij} + ε.

Crossvalidated squared distances use the pattern differences from two independent partitions m and n:

d = δ^{(m)} δ^{(n)T} = (δ + ε^{(m)})(δ + ε^{(n)})^T = δδ^T + ε^{(m)}δ^T + δε^{(n)T} + ε^{(m)}ε^{(n)T}

Squared distances are thus a sum of inner products: signal with signal, signal with noise, and noise with noise.

In the expected value, the inner products containing noise drop out:

E(d) = δδ^T

For the variance, we obtain one part that depends on the distances and one part that depends only on the noise:

var(d) = 0 + σ_ε² δδ^T + σ_ε² δδ^T + σ_ε⁴ P    (distance dependent + constant)

The covariance of the distances under exhaustive crossvalidation over R partitions:

var(d) = (4/R) Δ ∘ Σ + (2P/(R(R−1))) Σ ∘ Σ

where ∘ denotes element-by-element multiplication, Δ contains the true inner products δδ^T, and Σ the within-run covariances of the δ estimates. The first term is distance dependent, the second constant.
Linear representational models

Taking this covariance of the distances into account, we should do better than OLS:

var(d) = (4/R) Δ ∘ Σ + (2P/(R(R−1))) Σ ∘ Σ    (distance dependent + constant)
Linear representational models: IRLS estimation

1. Start with an initial guess of ω.
2. Predict the distances from the model: d = Xω.
3. Calculate the variance-covariance matrix of d:
V = (4/R) Δ ∘ Σ + (2P/(R(R−1))) Σ ∘ Σ
4. Use V in the estimation (generalised least squares):
ω = (X^T V^{-1} X)^{-1} X^T V^{-1} d
Repeat steps 2-4 until convergence.
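A sketch of this IRLS loop (the interfaces are my own assumptions): X holds the vectorised component distance matrices, d the measured distances, Sigma the within-run covariance of the pattern differences, R the number of partitions, P the number of voxels, and `inner_products_from` is a hypothetical helper mapping predicted distances to the matrix Δ of predicted inner products.

```python
import numpy as np

def estimate_weights_irls(X, d, Sigma, R, P, inner_products_from,
                          n_iter=20, tol=1e-8):
    omega, *_ = np.linalg.lstsq(X, d, rcond=None)    # 1. initial guess (OLS)
    for _ in range(n_iter):
        d_pred = X @ omega                           # 2. predict distances
        Delta = inner_products_from(d_pred)          # 3. covariance of d
        V = (4 / R) * Delta * Sigma \
            + (2 * P / (R * (R - 1))) * Sigma * Sigma
        Vinv = np.linalg.inv(V)
        omega_new = np.linalg.solve(X.T @ Vinv @ X,
                                    X.T @ Vinv @ d)  # 4. GLS re-estimate
        if np.max(np.abs(omega_new - omega)) < tol:
            return omega_new
        omega = omega_new
    return omega
```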
Linear representational models

[Figure: OLS vs. IRLS estimates and the improvement in SD for the Sequence, Large chunk, Chunk, and 2-Finger components.]

OLS is unbiased, but suboptimal. IRLS can do better, but by how much depends on the model structure:
→ for factorial designs it does not matter.
Linear representational models: Estimation

How do we best estimate the component weights?

Ordinary least-squares (OLS) / Iteratively reweighted least-squares (IRLS):
• Unbiased estimates
• Can become negative
• Allow direct testing of parameters

Non-negative least-squares / Maximum likelihood (Diedrichsen et al., 2011; Khaligh-Razavi & Kriegeskorte, 2014):
• Positive estimates
• Biased
• Model testing by crossvalidation (train ω, then test); tight link to encoding models
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
Non-linear representational models

Models in which the component matrices are non-linear functions of parameters.

[Figure: tuning curves over targets for different tuning widths (σ = 1, 5, 15), the resulting activity features, and the corresponding distance matrices.]
Non-linear representational models

• Sometimes good linear approximations can be found (example: AR estimation in first-level SPMs)
• Otherwise, estimate the nonlinear parameters θ to optimise the log-likelihood, where d̂ = d̂(θ) are the predicted distances:

log p(d | θ) ∝ −(1/2) (d − d̂)^T V(d̂)^{-1} (d − d̂)
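A numerical-optimisation sketch of this fit; the function names `predict_distances` and `covariance_of_d` are hypothetical, model-specific stand-ins (e.g. distances implied by tuning curves of width θ).

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, d, predict_distances, covariance_of_d):
    d_hat = predict_distances(theta)          # model-implied distances
    V = covariance_of_d(d_hat)                # variance-covariance of d
    resid = d - d_hat
    return 0.5 * resid @ np.linalg.solve(V, resid)

# Example call (with d, predict_distances, covariance_of_d defined):
# res = minimize(neg_log_likelihood, x0=np.array([5.0]),
#                args=(d, predict_distances, covariance_of_d),
#                method="Nelder-Mead")
```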
Overview

• Covariances and Distances
• Features and representational components
• Factorial models (MANOVA)
• Linear representational models
• Non-linear representational models
• Summary
The whole process

Raw data Y → (first-level GLM) → regression coefficients B → (searchlight, prewhitening) → pattern estimates U → (distance / covariance estimation) → covariance matrix G / RDM (LDC) D → (representational model fit) → component weights ω → (group analysis) → group map.
Group analysis
• The model coefficients ω can be directly tested (unbiased)
• Sometimes it is more sensible to use the signed square root, ssqrt(ω) (SD vs. variance)
Representational component models

Representational component models assume:
• independence of the data across partitions → (a zero distance is meaningful)
• independence of the feature patterns across components
• linear overlap of patterns (within a small range of variation)

Representational component models do NOT assume:
• normality of the data
• independence of the distance estimates
• a linear relationship between psychological variables and BOLD
Representational component models

Representational component model:
• The intercept is not included in the fitting; it needs to be explicitly modeled
• Predictions on a ratio scale
• Non-linearity of the distances is removed; non-linearity can be modeled
• Flexible factorial and combined models

Rank-based RSA:
• The intercept is implicitly removed; it does not contribute to model comparison
• Predictions on an ordinal scale
• Linearity assumption (narrow range)
• Single models
Representational component models: Key insights II

E1. Large (squared Euclidean) distances are estimated with larger variability than smaller distances.
E2. Distance estimates are statistically dependent in a way that is determined by the true distance structure.
E3. Component weights can be estimated using iteratively reweighted least squares (IRLS), which yields better estimates than ordinary least squares (OLS) in some cases.