798.01 / Y5
Understanding neural representations in early visual areas using convolutional neural networks
Yimeng Zhang^a, Corentin Massot^a, Tiancheng Zhi^b, George Papandreou^c, Alan Yuille^c, Tai Sing Lee^a
^a Center for the Neural Basis of Cognition and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA; ^b Peking University, Beijing, China; ^c UCLA, Los Angeles, CA
Motivations
• Convolutional neural networks (CNNs) have feature representations like those in higher layers of the primate and human visual cortex (Agrawal et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014; Yamins et al., 2014).
• Recent data on V1 neurons suggest that they may encode much more complex features (see poster 798.03/Y7).
• CNNs might therefore be a useful tool for understanding the encoding of complex features in lower layers (V1/V2) of visual cortex as well.
Images and neural data
• Responses of 286 V1 and 390 V2 neurons in 2 monkeys to 150 stimuli were recorded using multi-electrode arrays.
• The 150 stimuli comprise 3 subsets of 50: Edge (E), Appearance (A), and Exemplar (EX), at increasing levels of complexity.
[Figure: example Edge, Appearance, and Exemplar stimuli with responses, and tuning curves (panels a–c): firing rate (spk/s) vs. ranked stimuli for each subset (E, A, EX).]
• Responses of around 3000 V1 neurons in 3 monkeys to 2250 natural images were recorded using calcium imaging (see poster 798.03/Y7).
Computer vision models
[Diagrams: CNN "AlexNet": 227 × 227 × 3 input → conv1 (convolve + threshold, 55 × 55 × 96) → norm1 (normalize, 55 × 55 × 96) → pool1 (pool, 27 × 27 × 96) → 6 other layers → pool5 (6 × 6 × 256) → fc6 (product + threshold, 1 × 1 × 2048) → fc7 (product + threshold, 1 × 1 × 2048). V1like / V1likeSC: convolve (Gabor filters for V1like, learned filters for V1likeSC) → threshold & saturate → normalize → pool.]
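As a rough illustration of this kind of V1like-style front end (a Gabor filter bank followed by threshold & saturate, normalization, and pooling, in the spirit of Pinto et al., 2008), the sketch below uses illustrative parameters (filter frequencies, clipping range, pooling size); these are assumptions and not the settings of the actual models compared on this poster.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.filters import gabor_kernel

def v1like_features(image, frequencies=(0.1, 0.2), n_orientations=4):
    """Toy V1like-style features: Gabor filtering, threshold & saturate,
    divisive normalization across the filter bank, and 8x8 average pooling."""
    maps = []
    for freq in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kern = np.real(gabor_kernel(freq, theta=theta))
            resp = convolve2d(image, kern, mode="same")
            maps.append(np.clip(resp, 0.0, 1.0))        # threshold & saturate
    maps = np.stack(maps)                                # (n_filters, H, W)
    maps /= np.sqrt((maps ** 2).sum(axis=0)) + 1e-6      # normalize across filters
    h, w = (maps.shape[1] // 8) * 8, (maps.shape[2] // 8) * 8
    pooled = maps[:, :h, :w].reshape(maps.shape[0], h // 8, 8, w // 8, 8).mean(axis=(2, 4))
    return pooled.ravel()                                # pooled feature vector

print(v1like_features(np.random.rand(64, 64)).shape)     # (512,) for this toy setup
```

The V1likeSC variant would swap the Gabor bank for learned filters while keeping the rest of the pipeline.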
Model comparison using Representational Similarity Analysis
• RSA (Kriegeskorte et al., 2008) was used to compare model and neural representations, i.e., RDM(model) vs. RDM(area), where area can be V1, V2, etc.
• Similarity between representations is defined as the Spearman's correlation coefficient of their representational dissimilarity matrices (RDMs), each of which captures pairwise distances between images for a given representation.
[Figure: RDM(V1) computed from 30 neurons and RDM(model) computed from 20 units, each over the same 50 images.]
RDM(φ(x))_ij = 1 − ρ(φ(x_i), φ(x_j)), where ρ denotes Pearson's correlation and φ(x_i) is either the neural representation V1(x_i) or the model representation model(x_i) of image x_i, i = 1, ..., 50.
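The construction above can be summarized in a few lines. The sketch below assumes stand-in response matrices (50 stimuli × 30 neurons and 50 stimuli × 20 model units, matching the schematic); all names and data are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(responses):
    """RDM over stimuli: 1 - Pearson correlation between the response
    patterns (rows) of every pair of stimuli."""
    return 1.0 - np.corrcoef(responses)

def rsa_similarity(rdm_a, rdm_b):
    """Spearman correlation of the two RDMs' upper triangles."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearmanr(rdm_a[iu], rdm_b[iu]).correlation

rng = np.random.default_rng(0)
neural = rng.standard_normal((50, 30))   # stand-in for 30 V1 neurons
model = rng.standard_normal((50, 20))    # stand-in for 20 model units
print(rsa_similarity(rdm(neural), rdm(model)))
```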
Results
[Figure: (left) model vs. neural data on the 150 stimuli set: Spearman correlation with the neural RDM, by stimulus subset (E, A, EX) and area (V1, V2); (top right) CNN vs. neural data on the 150 stimuli set, by layer: normalized similarity; (bottom right) CNN vs. neural data on the 2250 stimuli set, by layer: similarity to V1.]
• Left: comparison of models on the 150 stimulus set. Top right: all CNN layers on the 150 stimulus set. Bottom right: all CNN layers on the 2250 stimulus set.
• Horizontal lines estimate the achievable similarity, computed as the similarity of feature representations among different monkeys; this is analogous to "explainable variance" (a minimal sketch of this estimate follows these bullets).
• 150 set: CNN > V1likeSC > V1like, especially on complex stimuli (EX). The best matching CNN layer is stimulus dependent: simpler stimuli (E) are best explained by lower layers, and complex stimuli (EX) by higher layers.
• 2250 set: the CNN is far from the achievable similarity, suggesting missing constraints in the CNN.
• Higher layers in the CNN can match V1/V2 as well as (or even better than) lower layers, suggesting complex coding in V1/V2 neurons.
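A minimal sketch of the cross-monkey estimate of achievable similarity mentioned above, assuming one response matrix per monkey over the same stimuli; the data shapes and names are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def rdm(responses):
    # 1 - Pearson correlation between the response patterns of stimulus pairs.
    return 1.0 - np.corrcoef(responses)

def achievable_similarity(responses_by_monkey):
    """Mean Spearman correlation between RDMs computed from different
    monkeys, used as a ceiling on the similarity a model could reach."""
    rdms = [rdm(r) for r in responses_by_monkey.values()]
    iu = np.triu_indices_from(rdms[0], k=1)
    return float(np.mean([spearmanr(a[iu], b[iu]).correlation
                          for a, b in combinations(rdms, 2)]))

rng = np.random.default_rng(0)
monkeys = {m: rng.standard_normal((150, 100)) for m in ("m1", "m2", "m3")}
print(achievable_similarity(monkeys))   # near 0 for this random stand-in data
```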
Neuron matching and visualization
[Figure: distributions of the best matching CNN layer (conv1, norm1, pool1, conv2, norm2, pool2, conv3, conv4, conv5, pool5, fc6, fc7) for each individual neuron, as normalized proportions, shown separately for V1 and V2 and for the E, A, and EX stimulus subsets.]
[Figure: deconvolution visualizations of some best matching units (V1, pool1 and V2, pool2); panels a–f are labeled with their stimulus subset: a, E; b, E; c, EX; d, EX; e, E; f, EX.]
e, E
• Single neuron matching results were consistent with RSA: V1 matched better to pool1, V2 to pool2.Complex stimuli (EX) shifted to higher layers compared to simple stimuli (E). V2 neurons werealso more correlated to higher layer CNN units than V1 neurons.
• While some neurons have visualizations consistent with the existing literature (a,b,c,e), someneurons preferred more complex features (d,f).
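A minimal sketch of the single-neuron matching used above: correlate each neuron's responses across the stimulus set with every CNN unit and record the layer of the best-matching unit. The inputs and layer names below are illustrative stand-ins.

```python
import numpy as np

def corr_with_units(y, feats):
    """Pearson correlation of a neuron's responses y (n_stimuli,) with every
    column of a layer's feature matrix feats (n_stimuli, n_units)."""
    yc = y - y.mean()
    fc = feats - feats.mean(axis=0)
    num = fc.T @ yc
    den = np.sqrt((fc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return num / den

def best_matching_layers(neuron_responses, layer_features):
    """For each neuron (a column of neuron_responses), return the name of
    the layer containing its best-correlated CNN unit."""
    best = []
    for i in range(neuron_responses.shape[1]):
        y = neuron_responses[:, i]
        scores = {layer: corr_with_units(y, feats).max()
                  for layer, feats in layer_features.items()}
        best.append(max(scores, key=scores.get))
    return best

rng = np.random.default_rng(0)
neurons = rng.standard_normal((150, 20))              # 150 stimuli x 20 neurons
layers = {"pool1": rng.standard_normal((150, 96)),    # stand-in CNN features
          "pool2": rng.standard_normal((150, 256))}
print(best_matching_layers(neurons, layers))
```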
Why CNN performs better
• Network effects. Without normalization and pooling, V1like performed worse (not shown).
• Diverse filter shapes. V1likeSC and pool1 are better than V1like partially due to their learned, diverse filters, compared to the Gabor filters in V1like.
• Network architecture might contribute as well. On the 2250 stimulus set, higher CNN layers performed better than lower layers even with all network parameters being random (see the sketch below).
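The random-parameter comparison can be sketched with torchvision's AlexNet left at its random initialization; the layer indices, input size, and use of torchvision are assumptions about one convenient implementation, not the poster's exact setup.

```python
import torch
import torchvision.models as models

# Untrained AlexNet: architecture only, parameters at random initialization.
model = models.alexnet(weights=None)
model.eval()

features = {}
def save(name):
    def hook(module, inputs, output):
        features[name] = output.detach().flatten(start_dim=1)  # (n_stimuli, n_units)
    return hook

# Indices refer to this torchvision implementation (an assumption).
model.features[0].register_forward_hook(save("conv1"))
model.features[2].register_forward_hook(save("pool1"))
model.features[10].register_forward_hook(save("conv5"))

with torch.no_grad():
    stimuli = torch.rand(50, 3, 224, 224)   # stand-in for the stimulus images
    model(stimuli)

for name, feat in features.items():
    print(name, tuple(feat.shape))   # feed these matrices into the RDM analysis
```

Repeating the RSA comparison with these features tests whether the layer-wise trend depends on learned weights or on the architecture itself.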
Conclusion
• Some V1/V2 neurons may encode more complex features than previously thought.
• CNNs are a good approximate model for understanding and visualizing V1 and V2 neurons.
• Future work: (1) add more biological constraints to CNN models so that they explain the neural data better; (2) explore CNNs with heterogeneous layers, each layer containing units of different complexities.
References
• Agrawal, P., et al. 2014, arXiv.org.
• Khaligh-Razavi, S.-M. & Kriegeskorte, N. 2014, PLoS Computational Biology, 10, e1003915.
• Kriegeskorte, N., Mur, M., & Bandettini, P. A. 2008, Frontiers in Systems Neuroscience, 2.
• Krizhevsky, A., Sutskever, I., & Hinton, G. E. 2012, in NIPS 25, 1097–1105.
• Olshausen, B. A. 2013, in IS&T/SPIE Electronic Imaging.
• Pinto, N., Cox, D. D., & DiCarlo, J. J. 2008, PLoS Computational Biology, 4, e27.
• Yamins, D. L. K., et al. 2014, Proceedings of the National Academy of Sciences, 111, 8619.
Acknowledgments
This research is supported by IARPA MICRONScontract #D16PC00007, and NIH R01 EY022247.
Thanks to Professor Shiming Tang for thecalcium imaging monkey data.
Contact: yimengzh@cs.cmu.edu