Transcript
Page 1:

Learning to distinguish cognitive subprocesses based on fMRI

Tom M. Mitchell
Center for Automated Learning and Discovery

Carnegie Mellon University

Collaborators: Luis Barrios, Rebecca Hutchinson, Marcel Just, Francisco Pereira, Jay Pujara, John Ramish, Indra Rustandi

Page 2:

Can we distinguish brief cognitive processes using fMRI?

E.g., does the subject find the sentence ambiguous or not?

Page 3:

Can we classify/track multiple overlapping processes?

[Figure: trial timeline showing the observed fMRI and the observed button press, with overlapping processes: view picture, read sentence, decide whether consistent.]

Page 4:

Mental Algebra Task

[Anderson, Qin, & Sohn, 2002]


Page 5:

[Anderson, Qin, & Sohn, 2002]

Activity Predicted by ACT-R Model

Typical ACT-R rule:

IF “_ op a = b”
THEN “_ = <b <inv op> a>”

Page 6:

[Anderson, Qin,& Sohn, 2002]

Page 7:

Outline

• Training classifiers for short cognitive processes
  – Examples
  – Classifier learning algorithms
  – Feature selection
  – Training across multiple subjects

• Simultaneously classifying multiple overlapping processes
  – Linear model and classification
  – Hidden processes and EM

Page 8:

Training “Virtual Sensors” of Cognitive Processes

Train classifiers of form: fMRI(t, t+δ) → CognitiveProcess

e.g., fMRI(t, t+δ) → {ReadSentence, ViewPicture}

• Fixed set of cognitive processes

• Fixed time interval [t, t+δ]

Page 9:

Study 1: Pictures and Sentences

• Subject answers whether sentence describes picture by pressing button.

• 13 subjects, TR=500msec

[Trial timeline: at t=0, view picture or read sentence (4 sec); then read sentence or view picture (8 sec); press button; then rest/fixation.]

Data from [Keller et al., 2001]

Page 10:

It is not true that the star is above the plus.

Page 11:

Page 12:

[Picture stimulus: plus and star symbols arranged vertically.]

Page 13:


Page 14:

• Learn fMRI(t, t+8) → {Picture, Sentence}, for t = 0, 8

[Trial timeline as on the previous slide, with a "picture or sentence?" classification window at t=0 and at t=8.]

Difficulties:
• only 8 seconds of very noisy data
• overlapping hemodynamic responses
• additional cognitive processes occurring simultaneously

Page 15:

Learning task formulation:

• Learn fMRI(t, …, t+8) → {Picture, Sentence}
  – 40 trials (40 pictures and 40 sentences)
  – fMRI(t, …, t+8) = voxels × time (~32,000 features)
  – Train separate classifier for each of 13 subjects
  – Evaluate cross-validated prediction accuracy

• Learning algorithms:
  – Gaussian Naïve Bayes
  – Linear Support Vector Machine (SVM)
  – k-Nearest Neighbor
  – Artificial Neural Networks

• Feature selection/abstraction
  – Select subset of voxels (by signal, by anatomy)
  – Select subinterval of time
  – Summarize by averaging voxel activities over space, time
  – …

(a rough code sketch of this pipeline follows)
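The following is an illustrative sketch (Python with scikit-learn, not the code used in the study) of the formulation above: flattened voxel-by-time features, a simple univariate feature-selection step kept inside the cross-validation loop, and leave-one-out evaluation of GNB and a linear SVM. All array shapes and the random data are placeholders.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import LinearSVC
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    n_trials, n_voxels, n_timepoints = 80, 1000, 16            # placeholders, not the study's sizes
    X = rng.normal(size=(n_trials, n_voxels * n_timepoints))   # fMRI(t, ..., t+8) flattened per trial
    y = np.repeat([0, 1], n_trials // 2)                        # 0 = Picture, 1 = Sentence

    for name, clf in [("GNB", GaussianNB()), ("linear SVM", LinearSVC(dual=False))]:
        # Feature selection is fit inside each cross-validation fold to avoid peeking at test trials.
        pipe = make_pipeline(SelectKBest(f_classif, k=500), clf)
        acc = cross_val_score(pipe, X, y, cv=LeaveOneOut()).mean()
        print(f"{name}: leave-one-out accuracy = {acc:.2f}")

With real data one would replace the random X with the concatenated voxel time courses per trial; the design point carried over from the slide is that feature selection happens inside each training fold of the cross-validation.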

Page 16:

Learning a Gaussian Naïve Bayes (GNB) classifier for <f1, …, fn> → C

For each class value ci:

1. Estimate the class prior P(C = ci)

2. For each feature fj, estimate P(fj | C = ci), modeling the distribution for each ci, fj as a Gaussian N(μji, σji)

Applying the GNB classifier to a new instance <f1, …, fn>:

c ← argmax over ci of  P(ci) ∏j P(fj | ci)

[Figure: naïve Bayes network with class node C and feature nodes f1, f2, …, fn as children.]
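As a concrete illustration of the estimation and classification rule above, here is a minimal from-scratch GNB in Python (illustrative only; the data are synthetic placeholders).

    import numpy as np

    def fit_gnb(X, y):
        """Estimate class priors P(c_i) and per-feature Gaussian parameters (mu, sigma)."""
        classes = np.unique(y)
        priors = np.array([(y == c).mean() for c in classes])
        mus = np.array([X[y == c].mean(axis=0) for c in classes])
        sigmas = np.array([X[y == c].std(axis=0) + 1e-6 for c in classes])  # avoid zero variance
        return classes, priors, mus, sigmas

    def predict_gnb(x, classes, priors, mus, sigmas):
        """Return argmax_i of log P(c_i) + sum_j log N(x_j; mu_ij, sigma_ij)."""
        log_post = np.log(priors) - 0.5 * np.sum(
            np.log(2 * np.pi * sigmas ** 2) + (x - mus) ** 2 / sigmas ** 2, axis=1)
        return classes[np.argmax(log_post)]

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(1.0, 1.0, (20, 5))])
    y = np.repeat([0, 1], 20)
    params = fit_gnb(X, y)
    print(predict_gnb(np.full(5, 0.9), *params))   # usually prints 1 for this toy data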

Page 17:

Support Vector Machines [Vapnik et al. 1992]

• Method for learning classifiers corresponding to linear decision surface in high dimensional spaces

• Chooses maximum margin decision surface

• Useful in many high-dimensional domains
  – Text classification
  – Character recognition
  – Microarray analysis

Page 18:

Support Vector Machines (SVM)

Page 19:

Linear SVM

Page 20:

Page 21:

Non-linear Support Vector Machines

• Based on applying kernel functions to data points

– Equivalent to projecting data into higher dimensional space, then finding linear decision surface

– Select kernel complexity (H) to minimize ‘structural risk’

Structural risk bound (informally): true error rate ≤ error on training data + a variance term that grows with kernel complexity H and shrinks with the number of training examples m.
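A small illustrative sketch of choosing kernel complexity empirically: cross-validated accuracy is used here as a stand-in for the structural-risk trade-off between training error and the complexity/variance term. The dataset and parameter grid are arbitrary.

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
    # Candidate kernels and regularization settings of increasing complexity.
    param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]}
    search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
    print("selected kernel/complexity:", search.best_params_)
    print("cross-validated accuracy:  ", round(float(search.best_score_), 2))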

Page 22:

Generative vs. Discriminative Classifiers

Goal: learn P(C | data), or equivalently a classifier f: data → C

Discriminative classifier:

• Learn P(C | data) directly

Generative classifier:

• Learn P(data | C) and P(C)

• Classify using Bayes rule: P(C | data) ∝ P(data | C) P(C)

Page 23:

Generative vs. Discriminative Classifiers

                                      Discriminative                   Generative
What they estimate:                   P(C | data)                      P(data | C)
Examples:                             SVMs, Artificial Neural Nets     Naïve Bayes, Bayesian networks
Robustness to modeling errors:        Typically more robust            Less robust
Criterion for estimating parameters:  Minimize classification error    Maximize data likelihood

Page 24:

GNB vs. Logistic regression [Ng, Jordan NIPS03]

Gaussian naïve Bayes

• Model P(X|C) as a class-conditional Gaussian

• Decision surface: hyperplane

• Learning converges in O(log(n)) examples, where n is number of data attributes

Logistic regression

• Model P(C|X) as a logistic function

• Decision surface: hyperplane

• Learning converges in O(n) examples

• Asymptotic error less than or equal to that of GNB
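An illustrative way (not from the cited paper) to see the point empirically: train both classifiers on increasing amounts of data and compare held-out accuracy; GNB typically does relatively better with few examples, logistic regression asymptotically. The outcome depends on the data, which here are synthetic.

    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=30, n_informative=10, random_state=1)
    X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=2000, random_state=1)

    for n in [20, 100, 1000, 3000]:                  # growing training-set sizes
        gnb = GaussianNB().fit(X_pool[:n], y_pool[:n])
        lr = LogisticRegression(max_iter=1000).fit(X_pool[:n], y_pool[:n])
        print(f"n={n:5d}  GNB acc={gnb.score(X_test, y_test):.2f}  "
              f"LR acc={lr.score(X_test, y_test):.2f}")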

Page 25:

Accuracy of Trained Pict/Sent Classifier

• Results (leave-one-out cross-validation)
  – Guessing: 50% accuracy
  – SVM: 91% mean accuracy
    • Single-subject accuracies ranged from 75% to 98%
  – GNB: 84% mean accuracy
  – Feature selection step important for both
    • ~10,000 voxels × 16 time samples = 160,000 features
    • Selected only 240 voxels × 16 time samples

Page 26:

Can We Train Subject-Independent Classifiers?

Page 27:

Training Cross-Subject Classifiers for Picture/Sentence

• Approach 1: define “supervoxels” based on anatomically defined brain regions
  – Abstract to seven brain-region supervoxels
  – Each supervoxel spans 100s to 1000s of voxels

• Train on n-1 subjects, test on the nth subject

• Result: 75% prediction accuracy on subjects outside the training set
  – Compared to 91% average single-subject accuracy
  – Significantly better than 50% guessing accuracy

[Wang, Hutchinson, Mitchell. NIPS03]
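A minimal sketch, under made-up data shapes and random ROI labels, of the supervoxel abstraction and leave-one-subject-out evaluation described above (not the NIPS03 implementation): average each subject's voxels within anatomically defined regions, train on n-1 subjects, and test on the held-out subject.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    n_subjects, n_trials, n_voxels, n_rois = 13, 80, 5000, 7
    roi_of_voxel = rng.integers(0, n_rois, size=n_voxels)      # hypothetical anatomical labels

    def to_supervoxels(trials):                                 # trials: (n_trials, n_voxels)
        return np.column_stack([trials[:, roi_of_voxel == r].mean(axis=1)
                                for r in range(n_rois)])

    subj_X = [to_supervoxels(rng.normal(size=(n_trials, n_voxels))) for _ in range(n_subjects)]
    subj_y = [rng.integers(0, 2, size=n_trials) for _ in range(n_subjects)]

    accs = []
    for held_out in range(n_subjects):                          # leave-one-subject-out
        X_tr = np.vstack([x for i, x in enumerate(subj_X) if i != held_out])
        y_tr = np.concatenate([y for i, y in enumerate(subj_y) if i != held_out])
        clf = GaussianNB().fit(X_tr, y_tr)
        accs.append(clf.score(subj_X[held_out], subj_y[held_out]))
    print("mean cross-subject accuracy:", round(float(np.mean(accs)), 2))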

Page 28:

Study 2: Semantic Word Categories

Word categories:
• Fish
• Trees
• Vegetables
• Tools
• Dwellings
• Building parts

Experimental setup:
• Block design
• Two blocks per category
• Each block begins by presenting the category name, then 20 words
• Subject indicates whether word fits category

[Francisco Pereira]

Page 29:

Learning task formulation

• Learn fMRI(t, …, t+32) → WordCategory
  – fMRI(t, …, t+32) represented by the mean fMRI image
  – Train on presentation 1, test on presentation 2 (and vice versa)

• Learning algorithm:
  – 1-Nearest Neighbor, based on spatial correlation [after Haxby]

• Feature selection/abstraction
  – Select the most ‘object selective’ voxels, based on multiple regression on boxcars convolved with a gamma function
  – 300 voxels in ventral temporal cortex produced the greatest accuracy
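A rough sketch of the 1-nearest-neighbour classification step described above: each test example (a mean fMRI image over selected voxels) is assigned the training category whose exemplar has the highest spatial correlation with it. All data below are synthetic.

    import numpy as np

    def correlation(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nn_by_correlation(test_img, train_imgs, train_labels):
        scores = [correlation(test_img, img) for img in train_imgs]
        return train_labels[int(np.argmax(scores))]

    rng = np.random.default_rng(0)
    categories = ["fish", "trees", "vegetables", "tools", "dwellings", "building parts"]
    train_imgs = [rng.normal(size=300) for _ in categories]     # 300 selected voxels each
    test_img = train_imgs[3] + 0.5 * rng.normal(size=300)       # noisy "tools" image
    print(nn_by_correlation(test_img, train_imgs, categories))  # expected: "tools"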

Page 30:

Results predicting word semantic category

Mean pairwise prediction accuracy averaged over 8 subjects:

• Ventral temporal: 77% (low: 57%, high: 88%)
• Parietal: 70%
• Frontal: 67%

Random guess: 50%

Page 31:

Mean Activation per Voxel for Word Categories: P(fMRI | WordCategory)

[Figure: mean activation maps for Tools, Dwellings, and Vegetables; one horizontal slice, ventral temporal cortex.]

[Pereira, et al., 2004]

Page 32:

Plot of single-voxel classification accuracies for a Gaussian naïve Bayes classifier (yellow and red are most predictive). Images from three different subjects (Subject 1, Subject 2, Subject 3) show similar regions with highly informative voxels.

Page 33:

Single-voxel GNB classification error vs. p value from T-statistic

Examples: N = 10^6, P < 0.0001, Error = 0.51;  N = 10^3, P < 0.0001, Error = 0.01

Cross-validated prediction error is an unbiased estimate of the Bayes-optimal error – the area under the intersection of the two class-conditional distributions.
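An illustrative single-voxel sketch of this point: for Gaussian class-conditional distributions with equal priors, the Bayes-optimal error is the area under the intersection of the prior-weighted densities, and the cross-validated error of a single-voxel GNB estimates roughly the same quantity. The parameters below are arbitrary.

    import numpy as np
    from scipy.stats import norm
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    mu0, mu1, sigma = 0.0, 1.0, 1.0
    grid = np.linspace(-6.0, 7.0, 20001)
    dx = grid[1] - grid[0]
    # Bayes error (equal priors) = integral of min(0.5*p0, 0.5*p1): area under the intersection.
    bayes_err = np.sum(np.minimum(0.5 * norm.pdf(grid, mu0, sigma),
                                  0.5 * norm.pdf(grid, mu1, sigma))) * dx

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(mu0, sigma, 500), rng.normal(mu1, sigma, 500)])[:, None]
    y = np.repeat([0, 1], 500)
    cv_err = 1.0 - cross_val_score(GaussianNB(), x, y, cv=10).mean()
    print(f"Bayes error ~= {bayes_err:.3f}   cross-validated error ~= {cv_err:.3f}")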

Page 34:

Question:

Do different people’s brains ‘encode’ semantic categories using the same spatial patterns?

No.

But, there are cross-subject regularities in “distances” between categories, as measured by classifier error rates.

Page 35:

Six-Category Study: Pairwise Classification Errors (ventral temporal cortex)

         Fish    Vegetables   Tools    Dwellings   Trees    Bldg Parts
Subj1    .20     .55 *        .20      .15         .15      .05 *
Subj2    .10 *   .55 *        .35      .20         .10 *    .30
Subj3    .20     .35 *        .15 *    .20         .20      .20
Subj4    .15     .45 *        .15      .15         .25      .05 *
Subj5    .60 *   .55          .25      .20         .15 *    .15 *
Subj6    .20     .25          .00 *    .30 *       .30 *    .05
Subj7    .15     .55 *        .15      .25         .15      .05 *
Mean     .23     .46          .18      .21         .19      .12

* marks each subject's worst and best (highest and lowest) pairwise error

Page 36:

LDA classification of semantic categories of photographs.

[Carlson, et al., J. Cog. Neurosci, 2003]

Page 37:

Cox & Savoy, Neuroimage 2003

Trained SVM and LDA classifiers for semantic photo categories.

Classifiers applied to the same subject a week later were equally accurate.

Page 38:

Lessons Learned

Yes, one can train machine learning classifiers to distinguish a variety of cognitive processes:
– Comprehend picture vs. sentence
– Read ambiguous sentence vs. unambiguous
– Read noun vs. verb
– Read nouns about “tools” vs. “building parts”

Failures too:
– True vs. false sentences
– Negative vs. affirmative sentences

Page 39:

Which Machine Learning Method Works Best?

• GNB and SVM tend to outperform kNN
• Feature selection important

[Figure: average per-subject classification error for each study, with (Yes) and without (No) feature selection.]

Page 40:

Which Feature Selection Works Best?

• Conventional wisdom: pick features xi that best distinguish between classes A and B
  – E.g., sort the xi by mutual information with the class and choose the top n (sketched below)

• Surprise: an alternative strategy worked much better

Wish to learn F: <x1, x2, …, xn> → {A, B}
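A small sketch of the "conventional wisdom" baseline in the first bullet: score each feature by its estimated mutual information with the class label and keep the top n. The feature matrix and labels are synthetic placeholders.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 1000))           # trials x voxels
    y = rng.integers(0, 2, size=80)           # class A / class B
    X[y == 1, :20] += 1.0                     # make the first 20 voxels informative

    scores = mutual_info_classif(X, y, random_state=0)
    top_n = np.argsort(scores)[::-1][:20]     # indices of the n highest-scoring features
    print("selected voxels:", sorted(top_n.tolist()))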

Page 41:

The learning setting

[Figure: distributions of voxel activity for Class A, Class B, and Rest/Fixation, illustrating voxel discriminability.]

Page 42:

GNB Classifier Errors: Feature Selection

(rows: feature selection method; columns: fMRI study)

Feature selection method      Word Categories   Nouns vs. Verbs   Syntactic Ambiguity   Picture Sentence
ROI Active Average            NA                .23               .27                   .21
ROI Active                    .09               .31               .27                   .18
Active                        .08               .34               .25                   .16
Discriminate target classes   .10               .36               .34                   .26
All features                  .10               .36               .43                   .29

Page 43:

“Zero Signal” learning setting

Goal: learn f: X → Y or P(Y|X)

Given:

1. Training examples <Xi, Yi> where Xi = Si + Ni, with signal Si ~ P(S | Y = Yi) and noise Ni ~ Pnoise

2. Observed noise with zero signal: Z = N0 ~ Pnoise

[Figure: example observations for zero signal (fixation), Class 1 (X1 = S1 + N1), and Class 2 (X2 = S2 + N2).]

Select features based on discrim(X1, X2) or discrim(Z, Xi)?

Page 44:

“Zero Signal” learning setting

Conjecture: feature selection using discrim(Z,Xi) will improve relative to discrim(X1,X2) as:

• # of features increases

• # of training examples decreases

• signal/noise ratio decreases

• fraction of relevant features decreases
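A hedged simulation sketch of this conjecture (an invented generative setup, not the study's data): select features either by how well they separate the two classes, discrim(X1, X2), or by how much they deviate from the zero-signal fixation data, discrim(Z, Xi), and compare downstream GNB accuracy. Which strategy wins depends on the regime (number of features and examples, signal-to-noise ratio, fraction of relevant features).

    import numpy as np
    from scipy.stats import ttest_ind
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_per_class, n_feat, n_relevant, noise = 20, 2000, 50, 3.0
    signal = np.zeros(n_feat); signal[:n_relevant] = 1.0                   # relevant features respond
    X1 = 1.0 * signal + noise * rng.normal(size=(n_per_class, n_feat))     # class 1: S1 + N
    X2 = 1.5 * signal + noise * rng.normal(size=(n_per_class, n_feat))     # class 2: S2 + N
    Z = noise * rng.normal(size=(100, n_feat))                             # fixation: noise only

    X = np.vstack([X1, X2]); y = np.repeat([0, 1], n_per_class)

    def accuracy_with(selected):
        return cross_val_score(GaussianNB(), X[:, selected], y, cv=5).mean()

    k = 50
    t_classes = np.abs(ttest_ind(X1, X2, axis=0).statistic)                    # discrim(X1, X2)
    t_baseline = np.abs(ttest_ind(np.vstack([X1, X2]), Z, axis=0).statistic)   # discrim(Z, Xi)
    print("select by class discrimination :", round(accuracy_with(np.argsort(t_classes)[-k:]), 2))
    print("select by activity vs. fixation:", round(accuracy_with(np.argsort(t_baseline)[-k:]), 2))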

Page 45:

2. Can we classify/track multiple overlapping processes?

[Figure: trial timeline showing the input stimuli (?), the observed fMRI, and the observed button press, with overlapping processes: view picture, read sentence, decide whether consistent.]

Page 46:

Bayes Net related state-space models: HMMs, DBNs, etc.; e.g., [Ghahramani, 2001]

[Figure: dynamic Bayes net with cognitive subprocess / state variables over time generating the observed fMRI; see [Hojen-Sorensen et al., NIPS99].]

Page 47:

Hidden Process Model

Each process is defined by:
– ProcessID: e.g., <comprehend sentence>
– Maximum HDR duration: R
– Emission distribution: [ W(v,t) ]

Interpretation Z of the data: a set of process instances { <ProcessIDi, StartTimei> }
– Desire the maximum-likelihood interpretation
– where the data likelihood models the observed fMRI as the sum of the active process instances' responses W plus noise

Generative model for classifying overlapping hidden processes

[with Rebecca Hutchinson]

Page 48:

Classifying Processes with HPMs

Start time known: choose the process whose learned response gives the observed data the highest likelihood.

Start time unknown: consider candidate start times S, scoring each (process, start time) combination by its data likelihood.

Page 49:

GNB classifier is a special case of HPM classifier

[Figure: the same trial timeline (16 sec total), comparing how the two models segment it. GNB: the two fixed "picture or sentence?" windows are classified independently. HPM: the trial is modeled as overlapping processes with their own start times.]

Page 50:

Learning HPMs

• Known start times: least squares regression, e.g., see Dale [HBM, 1999]

• Unknown start times: EM algorithm
  – Repeat:
    • E step: estimate P(S | Y, W)
    • M step: W′ ← argmax over W of the expected log-likelihood (expectation taken under P(S | Y, W))
  – The M step is currently implemented with gradient ascent
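A minimal simulation sketch (synthetic one-voxel data, made-up onsets and response shapes) of the known-start-time case: with onsets given, each process's response signature W can be recovered by ordinary least squares against a 0/1 design matrix, in the spirit of the simulated examples on the next two slides.

    import numpy as np

    rng = np.random.default_rng(0)
    T, R = 120, 9                        # timepoints, max response duration (samples)
    true_w = np.array([[0, .25, .5, .75, 1, .75, .5, .25, 0],   # process 1 response
                       [0, .5, .5, .5, .5, .5, .5, .5, .5]])    # process 2 response

    onsets = {0: [5, 40, 80], 1: [12, 44, 90]}   # known, overlapping start times
    D = np.zeros((T, 2 * R))                     # design matrix: one column per (process, lag)
    for p, starts in onsets.items():
        for s in starts:
            for lag in range(R):
                if s + lag < T:
                    D[s + lag, p * R + lag] = 1.0

    y = D @ true_w.reshape(-1) + 0.05 * rng.normal(size=T)   # one voxel's time series
    w_hat, *_ = np.linalg.lstsq(D, y, rcond=None)            # OLS estimate of both signatures
    print(np.round(w_hat.reshape(2, R), 2))

Varying the relative offsets of the two processes across trials keeps the design matrix well conditioned; identical offsets on every trial would make the overlapping columns collinear.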

Page 51:

OLS learns 2 processes, overlapping in time, 1 voxel, zero noise, start times known, 10 trials

Estimates:

Process 1: [-0, 0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25, 3.5108e-17]
Process 2: [-4.7535e-17, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

[Indra Rustandi]

[Figure panels: observed data, reconstructed data, learned process 1, learned process 2.]

Page 52:

OLS learns 2 processes, overlapping in time, 1 voxel, noise 0.2, start times known, 10 trials

Estimates:

Process 1: [0.0054956, 0.32446, 0.48847, 0.83317, 0.99872, 0.86555, 0.55624, 0.23633, -0.050592]
Process 2: [-0.017376, 0.36435, 0.36134, 0.4856, 0.60143, 0.46168, 0.54137, 0.47466, 0.52419]

[Indra Rustandi]

[Figure panels: observed data, reconstructed data, learned process 1, learned process 2.]

Page 53:

Phase II, Words every 3 seconds. Mean LFEF, subj 08179

Estimate Noun and Verb impulse responses

[Figure: verb impulse response estimated as above vs. the “ground truth” verb impulse response from non-overlapping stimuli.]

[Indra Rustandi]

Page 54:

Can we classify/track multiple overlapping processes?

[Figure: trial timeline showing the observed fMRI and the observed button press, with overlapping processes: view picture, read sentence, decide whether consistent.]

Page 55:

Learned HPM with 3 processes (S,P,D), and R=13sec (TR=500msec).

[Figure: learned response models for the S, P, and D processes, and observed vs. reconstructed data for trials containing P and S instances plus a hypothesized D instance; the D start time is picked to be trialStart + 18.]

Page 56:

Initial results: HPM’s on PictSent

• EM chooses start time = 18 for hidden D process

• Classification accuracy for held-out PS/SP trials = 15/20 = 0.75

• Held-out classification accuracy is the same for the 2-process (P, S) and 3-process (P, S, D) models

• Data likelihood on held-out data is slightly better for the 3-process (P, S, D) model

Page 57:

Further reading

• Carlson, et al. J. Cog. Neurosci., 2003.

• Cox, D.D. and R.L. Savoy. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19, pp. 261-270, 2003.

• Kjems, U., L. Hansen, J. Anderson, S. Frutiger, S. Muley, J. Sidtis, D. Rottenberg, and S.C. Strother. The quantitative evaluation of functional neuroimaging experiments: mutual information learning curves. NeuroImage 15, pp. 772-786, 2002.

• Mitchell, T.M., R. Hutchinson, M. Just, S.R. Niculescu, F. Pereira, X. Wang. Classifying Instantaneous Cognitive States from fMRI Data. Proceedings of the 2003 American Medical Informatics Association Annual Symposium, Washington D.C., November 2003.

• Mitchell, T.M., R. Hutchinson, S.R. Niculescu, F. Pereira, X. Wang, M. Just, S. Newman. Learning to Decode Cognitive States from Brain Images. Machine Learning, 2004.

• Strother, S.C., J. Anderson, L. Hansen, U. Kjems, R. Kustra, J. Sidtis, S. Frutiger, S. Muley, S. LaConte, and D. Rottenberg. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS data analysis framework. NeuroImage 15, pp. 747-771, 2002.

• Wang, X., R. Hutchinson, and T.M. Mitchell. Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects. Proceedings of the 2003 Conference on Neural Information Processing Systems, Vancouver, December 2003.

