Multiple comparison problem
C. Phillips, Centre de Recherches du Cyclotron, ULg, Belgium
DISCOS SPM course, CRC, Liège, 2009
Based on slides from: T. Nichols

Contents
• Recap & Introduction
• Inference & multiple comparison
• « Take home » message

[Figure: SPM analysis pipeline, with panels labelled: image data, realignment & motion correction, smoothing (kernel), normalisation (anatomical reference), General Linear Model (design matrix), model fitting, parameter estimates, statistic image, Statistical Parametric Map, correction for multiple comparisons, corrected p-values. Further analyses: random effect analysis, dynamic causal modelling, functional & effective connectivity, PPI, ...]

Voxel by voxel statistics
[Figure: single voxel time series (intensity vs. time), model specification, parameter estimation, hypothesis test, statistic, statistic image or SPM]
Inference at a single voxel
[Figure: t-distribution density (vertical axis 0 to 0.4) plotted for t from −6 to 6, with threshold u = 2]
α = p(t > u | H)
NULL hypothesis, H: activation is zero
We can choose u to ensure a voxel-wise significance level of α.
This is called an ‘uncorrected’ p-value, for reasons we’ll see later.
We can then plot a map of above-threshold voxels.
What we’d like
• Don’t threshold, model the signal!
  – Signal location?
    • Estimates and CI’s on (x,y,z) location
  – Signal magnitude?
    • CI’s on % change
  – Spatial extent?
    • Estimates and CI’s on activation volume
    • Robust to choice of cluster definition
• ...but this requires an explicit spatial model
[Figure: signal plotted over space, annotated with Loc. θ, Ext. θ and Mag. θ]
What we need
• Need an explicit spatial model
• No routine spatial modeling methods exist
  – High-dimensional mixture modeling problem
  – Activations don’t look like Gaussian blobs
  – Need realistic shapes, sparse representation
• Some work by Hartvig et al., Penny et al.
Real-life inference: What we get
• Signal location
  – Local maximum – no inference
  – Center-of-mass – no inference
    • Sensitive to blob-defining-threshold
• Signal magnitude
  – Local maximum intensity – P-values (& CI’s)
• Spatial extent
  – Cluster volume – P-value, no CI’s
    • Sensitive to blob-defining-threshold
• Gives best spatial specificity
  – The null hypothesis at a single voxel can be rejected
• Count number of blobs c
  – Minimum blob size k
• Worst spatial specificity
  – Only can reject global null hypothesis
[Figure: statistic plotted over space with cluster-defining threshold u_clus. Here c = 1; only 1 cluster larger than k.]
Sensitivity and Specificity

                      ACTION
TRUTH                 Don’t Reject    Reject
H True (o)            TN              FP
H False (x)           FN              TP

E.g. t-scores from regions that truly do (x) and do not (o) activate, with two candidate thresholds u1 and u2:

o o o o o o o | x x x o o | x x x o x x x x
              u1          u2

At u1: Sens = 10/10 = 100%, Spec = 7/10 = 70%
At u2: Sens = 7/10 = 70%, Spec = 9/10 = 90%

Sensitivity = TP/(TP+FN) = β
Specificity = TN/(TN+FP) = 1 − α
FP = Type I error or ‘error’
FN = Type II error
α = p-value/FP rate/error rate/significance level
β = power
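The two sensitivity/specificity pairs in the o/x example can be checked with a few lines of Python. This is only a sketch: the function name `sens_spec` is hypothetical, and the threshold positions (u1 after the 7th score, u2 after the 12th) are an assumption inferred from the percentages quoted on the slide.

```python
# 'o' = region that truly does not activate, 'x' = region that truly activates,
# listed in the order shown on the slide (increasing t-score).
scores = "o o o o o o o x x x o o x x x o x x x x".split()


def sens_spec(labels, n_below):
    """Sensitivity and specificity when the first n_below scores fall below the threshold."""
    below, above = labels[:n_below], labels[n_below:]
    tp = above.count("x")  # truly active, rejected
    fn = below.count("x")  # truly active, not rejected
    tn = below.count("o")  # truly inactive, not rejected
    fp = above.count("o")  # truly inactive, rejected
    return tp / (tp + fn), tn / (tn + fp)


# Assumed threshold positions: u1 after 7 scores, u2 after 12 scores.
sens_u1, spec_u1 = sens_spec(scores, 7)
sens_u2, spec_u2 = sens_spec(scores, 12)
print(f"u1: sens={sens_u1:.0%} spec={spec_u1:.0%}")
print(f"u2: sens={sens_u2:.0%} spec={spec_u2:.0%}")
```

Raising the threshold from u1 to u2 trades sensitivity for specificity, which is the point of the example.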
fMRI Multiple Comparisons Problem
→ voxel level inference
– big suprathreshold clusters → cluster level inference
– many suprathreshold clusters → set level inference
• Power & localisation
  → sensitivity
  → spatial specificity
Solutions for Multiple Comparison Problem
• An MCP solution must control “False Positives”
  – How to measure multiple false positives?
• Familywise Error Rate (FWER)
  – Chance of any false positives
  – Controlled by Bonferroni, Random Field Methods, non-parametric method (SnPM).
• False Discovery Rate (FDR)
  – Proportion of false positives among rejected tests
• The 100(1−α)%ile of the max distribution controls FWE:
  $\mathrm{FWE} = P(\max_i T_i \ge u_\alpha \mid H_o) = \alpha$
  – where $u_\alpha = F_{\max}^{-1}(1-\alpha)$
[Figure: distribution of the maximum statistic, with threshold $u_\alpha$ and upper tail area α marked]
Example: Experiment with 100,000 « voxels » and 40 d.f.
type I error α = 0.05 (5% risk) ⇒ tα = 1.68
100,000 t values ⇒ 5000 t values > 1.68 just by chance!
Familywise Error test, PFWE: find threshold tα such that, in a family of 100,000 t statistics, there is only a 5% probability of one or more t values above that threshold
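The "5000 by chance" figure is simply N × α, and a small simulation makes it tangible. The sketch below is an illustration under stated assumptions: it draws standard normal (z) values rather than t values with 40 d.f., and the function name `count_null_exceedances` and the seed are hypothetical.

```python
import random
from statistics import NormalDist


def count_null_exceedances(n_tests=100_000, alpha=0.05, seed=0):
    """Count null statistics above the uncorrected threshold (z used in place of t)."""
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(1 - alpha)  # one-sided uncorrected threshold
    return sum(1 for _ in range(n_tests) if rng.gauss(0.0, 1.0) > u)


n_false = count_null_exceedances()
print(f"expected about {100_000 * 0.05:.0f} false positives, simulated {n_false}")
```

Every one of these exceedances is a false positive, since no signal was simulated.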
Bonferroni correction: simple method to find the new threshold
Random field theory: more accurate for functional imaging
Given
• a family of N independent voxels and
• a voxel-wise error rate v
The probability that, when Ho is true, all tests are below the threshold: $(1-v)^N$
The Family-Wise Error rate (FWE) or ‘corrected’ error rate α is
$\alpha = 1 - (1-v)^N \approx Nv$ (for small v)
Therefore, to ensure a particular FWE we choose $v = \alpha/N$
A Bonferroni correction is appropriate for independent tests.
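To see how good the approximation α ≈ Nv is, the exact family-wise rate for independent tests can be evaluated directly. This is a minimal sketch; the function name `fwe_independent` is hypothetical, and N = 100,000 with α = 0.05 is taken from the running example.

```python
import math


def fwe_independent(v, n):
    """Exact FWE for n independent tests at voxel-wise rate v: 1 - (1 - v)**n."""
    # log1p/expm1 keep precision when v is tiny.
    return -math.expm1(n * math.log1p(-v))


alpha, n = 0.05, 100_000
v_bonf = alpha / n                        # Bonferroni voxel-wise rate
fwe_bonf = fwe_independent(v_bonf, n)     # slightly below alpha
fwe_uncorrected = fwe_independent(alpha, n)

print(f"v = {v_bonf:.1e} gives FWE = {fwe_bonf:.5f}")
print(f"v = {alpha} gives FWE = {fwe_uncorrected:.5f}")
```

With v = α/N the exact FWE lands just under α, so the Bonferroni choice is slightly conservative even when the tests really are independent.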
The “Bonferroni” correction…
Experiment with N = 100,000 « voxels » and 40 d.f.
– v = unknown voxel-wise probability threshold,
– find v such that family-wise error rate α = 0.05
Bonferroni correction:
– probability that all tests are below the threshold,
– Use v = α / N
– Here v = 0.05/100000 = 0.0000005 ⇒ threshold t = 5.77
Interpretation: the Bonferroni procedure gives a corrected p value, i.e. for a t statistic of 5.77,
– uncorrected p value = 0.0000005
– corrected p value = 0.05
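The two t thresholds quoted for 40 d.f. (1.68 uncorrected, 5.77 Bonferroni-corrected) can be reproduced with the standard library alone. The sketch below integrates the Student t density numerically and inverts the tail probability by bisection; the function names are hypothetical, and the fixed integration width assumes a moderately large d.f. so that the far tail is negligible.

```python
import math


def t_pdf(x, df):
    """Student t density, computed on the log scale for stability."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, width=60.0, n=2000):
    """P(T > t) by Simpson's rule on [t, t + width]; n must be even."""
    h = width / n
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3


def t_threshold(p, df, lo=0.0, hi=50.0):
    """Bisection for the t value whose one-sided tail probability is p."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_uncorrected = t_threshold(0.05, 40)
t_bonferroni = t_threshold(0.05 / 100_000, 40)
print(f"uncorrected: {t_uncorrected:.2f}, Bonferroni: {t_bonferroni:.2f}")
```

In practice a statistics library would be used for the t quantile; the hand-rolled integration is only there to keep the sketch dependency-free.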
The “Bonferroni” correction…
100 by 100 voxels, with a z value.
10000 independent measures
Fix the PFWE = 0.05, z threshold?
Bonferroni: v = 0.05/10000 = 0.000005 ⇒ threshold z = 4.42

100 by 100 voxels, with a z value.
How many independent measures?
Fix the PFWE = 0.05, z threshold? Bonferroni?
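For the 100 by 100 z image, the Bonferroni threshold follows from the inverse normal CDF, and it depends on how many independent measures are assumed. The sketch below reproduces z = 4.42 for 10,000 measures; the second call, with only 100 independent measures, is a purely hypothetical number chosen to show the direction of the effect, not a value from the slides.

```python
from statistics import NormalDist


def bonferroni_z(alpha, n_tests):
    """One-sided z threshold giving FWE alpha over n_tests independent tests."""
    return NormalDist().inv_cdf(1 - alpha / n_tests)


z_10000 = bonferroni_z(0.05, 10_000)  # every voxel treated as independent
z_100 = bonferroni_z(0.05, 100)       # hypothetical: far fewer independent measures
print(f"N=10000: z={z_10000:.2f}   N=100: z={z_100:.2f}")
```

If the image is smooth, the effective number of independent measures is smaller than the voxel count, so a Bonferroni threshold based on the voxel count is too strict.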
Random Field Theory
• Consider a statistic image as a lattice representation of a continuous random field
• Use results from continuous random field theory
Unified Theory
• General form for expected Euler characteristic
  • χ², F, & t fields
  • restricted search regions
$\alpha = \sum_d R_d(\Omega)\,\rho_d(u)$
$R_d(\Omega)$, RESEL count, depends on:
• the search region
  – how big, how smooth, what shape?
$\rho_d(u)$, EC density, depends on:
• type of field (e.g. Gaussian, t)
• the threshold, u.
Worsley et al. (1996), HBM
[Figure: search region Ω containing the set labelled A_u]

$R_d(\Omega)$, d-dimensional Minkowski functional of Ω
• $R_0(\Omega) = \chi(\Omega)$, Euler characteristic of Ω
• $R_1(\Omega)$ = resel diameter
• $R_2(\Omega)$ = resel surface area
• $R_3(\Omega)$ = resel volume
$\rho_d(u)$, d-dimensional EC density. E.g. Gaussian RF:
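The Gaussian EC densities themselves did not survive in this transcript. As an illustration only, the sketch below uses the standard closed forms for a unit-variance Gaussian field in resel units (the forms usually attributed to Worsley et al., 1996) and solves Σ R_d(Ω) ρ_d(u) = α for u by bisection. The function names and both sets of resel counts are hypothetical, and the search assumes the threshold lies above u = 2, where the sum is decreasing in u.

```python
import math

FOUR_LN2 = 4.0 * math.log(2.0)


def gaussian_ec_density(d, u):
    """EC density rho_d(u) of a unit-variance Gaussian field, in resel units."""
    e = math.exp(-0.5 * u * u)
    if d == 0:
        return 0.5 * math.erfc(u / math.sqrt(2.0))  # upper tail of the normal
    if d == 1:
        return math.sqrt(FOUR_LN2) / (2.0 * math.pi) * e
    if d == 2:
        return FOUR_LN2 / (2.0 * math.pi) ** 1.5 * u * e
    if d == 3:
        return FOUR_LN2 ** 1.5 / (2.0 * math.pi) ** 2 * (u * u - 1.0) * e
    raise ValueError("d must be 0, 1, 2 or 3")


def expected_ec(u, resels):
    """Sum over d of R_d * rho_d(u); resels = (R0, R1, R2, R3)."""
    return sum(r * gaussian_ec_density(d, u) for d, r in enumerate(resels))


def rft_threshold(alpha, resels, lo=2.0, hi=10.0):
    """Bisection for u with expected EC equal to alpha (assumes u > 2)."""
    if expected_ec(lo, resels) < alpha:
        raise ValueError("threshold lies below the search interval")
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if expected_ec(mid, resels) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


small_region = (1.0, 10.0, 30.0, 30.0)      # hypothetical resel counts
large_region = (1.0, 30.0, 300.0, 1000.0)   # hypothetical resel counts
u_small = rft_threshold(0.05, small_region)
u_large = rft_threshold(0.05, large_region)
print(f"small region: u={u_small:.2f}   large region: u={u_large:.2f}")
```

A larger search region, measured in resels, needs a higher threshold for the same α.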
Random Field Theory: Smoothness Parameterization
• $R = \lambda(\Omega)\sqrt{|\Lambda|} = (4\log 2)^{3/2}\,\lambda(\Omega) / (\mathrm{FWHM}_x \times \mathrm{FWHM}_y \times \mathrm{FWHM}_z)$
• Volume of search region in units of smoothness
• Beware RESEL misinterpretation
  – RESELs are not the “number of independent ‘things’ in the image”
    • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
  – Pc increases (more severe MCP)
• Smoothness increases (roughness $|\Lambda|^{1/2}$ decreases)
  – Pc decreases (less severe MCP)
Small Volume Correction
SVC = correction for multiple comparisons in a user-defined volume ‘of interest’.
Shape and size of the volume become important for a small or oddly shaped volume!
Example of SVC (900 voxels)
• compact volume: samples from maximum 16 resels
• spread volume: samples from up to 36 resels
⇒ threshold higher for spread volume than compact volume.
Resel Counts for Brain Structures
FWHM = 20 mm
(1) Threshold depends on Search Volume
(2) Surface area makes a large contribution
Summary
• We should correct for multiple comparisons
  – We can use Random Field Theory (RFT) or other methods
• RFT requires
  – a good lattice approximation to underlying multivariate Gaussian fields,
  – that these fields are continuous with a twice differentiable correlation function
• To a first approximation, RFT is a Bonferroni correction using RESELS.
• We only need to correct for the volume of interest.
• Depending on nature of signal we can trade-off anatomical specificity for signal sensitivity with the use of cluster-level inference.
Contents
• Recap & Introduction
• Inference & multiple comparison
• Single/multiple voxel inference
• Family wise error rate (FWER)
• Bonferroni correction/Random Field Theory
• Non-parametric approach
Permutation Test : Toy Example
• Under Ho
  – Consider all equivalent relabelings
  – Compute all possible statistic values
  – Find 95%ile of permutation distribution
[Figure: permutation distribution of the statistic, horizontal axis from −8 to 8]
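The three steps of the toy example (enumerate relabelings, compute the statistic for each, locate the observed value in the resulting distribution) can be sketched as follows. The data values, the ABABAB labelling and the choice of mean difference as the statistic are hypothetical, invented for illustration; they are not the numbers from the slides.

```python
from itertools import combinations

data = [10.0, 2.0, 9.0, 1.0, 8.0, 3.0]  # hypothetical scan values
labels = "ABABAB"                        # hypothetical condition labels


def mean_diff(values, a_idx):
    """Mean of the scans labelled A minus mean of the scans labelled B."""
    a = [values[i] for i in a_idx]
    b = [v for i, v in enumerate(values) if i not in a_idx]
    return sum(a) / len(a) - sum(b) / len(b)


observed_idx = tuple(i for i, lab in enumerate(labels) if lab == "A")
observed = mean_diff(data, observed_idx)

# Under Ho every choice of which 3 of the 6 scans are called "A" is equally likely.
perm_dist = sorted(
    mean_diff(data, idx)
    for idx in combinations(range(len(data)), len(observed_idx))
)

# Permutation p-value: fraction of relabelings at least as extreme as observed.
p_value = sum(stat >= observed for stat in perm_dist) / len(perm_dist)
print(f"{len(perm_dist)} relabelings, observed = {observed}, p = {p_value}")
```

With 6 scans there are only 20 relabelings, so the smallest attainable p-value is 1/20 = 0.05, which is what this made-up data set reaches.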
Controlling FWER: Permutation Test
• Parametric methods
  – Assume distribution of max statistic under null hypothesis
• Nonparametric methods
  – Use data to find distribution of max statistic under null hypothesis
  – Again, any max statistic!
[Figure: two panels, “Parametric Null Max Distribution” and “Nonparametric Null Max Distribution”, each with the upper 5% tail marked]
Permutation Test & Exchangeability
• Exchangeability is fundamental
  – Def: Distribution of the data unperturbed by permutation
  – Under Ho, exchangeability justifies permuting data
  – Allows us to build permutation distribution
• Subjects are exchangeable
  – Under Ho, each subject’s A/B labels can be flipped
• Are fMRI scans exchangeable under Ho?
  – If no signal, can we permute over time?
Permutation Test & Exchangeability
• fMRI scans are not exchangeable
  – Permuting disrupts order, temporal autocorrelation
• Intrasubject fMRI permutation test
  – Must decorrelate data, model before permuting
  – What is correlation structure?
    • Usually must use parametric model of correlation
  – E.g. Use wavelets to decorrelate
    • Bullmore et al 2001, HBM 12:61-78
• Intersubject fMRI permutation test
  – Create difference image for each subject
  – For each permutation, flip sign of some subjects
Permutation Test : Example
• fMRI Study of Working Memory
  – 12 subjects, block design (Marshuetz et al., 2000)
  – Item Recognition
    • Active: View five letters, 2s pause,
• Second Level RFX
  – Difference image, A-B, constructed for each subject
  – One sample, smoothed variance t test
[Figure: task schematic. Active: UBKDA ... D ... yes. Baseline: XXXXX ... N ... no.]
Permutation Test : Example
• Permute!
  – $2^{12}$ = 4,096 ways to flip 12 A/B labels
  – For each, note maximum of t image.
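The sign-flipping scheme can be sketched on synthetic data: enumerate all 2^12 sign patterns, recompute the t image for each, and keep its maximum. Everything specific below is an assumption for illustration: the data are random numbers with an artificial effect at voxel 0, there are only 20 "voxels", and an ordinary one-sample t is used instead of the smoothed variance t mentioned on the slides.

```python
import itertools
import math
import random


def max_t_sign_flip(diffs):
    """diffs[s][v] is the A-B difference for subject s at voxel v.

    Returns the observed maximum t and the sorted maxima over all sign flips.
    """
    n, n_vox = len(diffs), len(diffs[0])
    # Sums of squares do not change under sign flips, so compute them once.
    sum_sq = [sum(row[v] ** 2 for row in diffs) for v in range(n_vox)]

    def max_t(signs):
        best = -math.inf
        for v in range(n_vox):
            mean = sum(s * row[v] for s, row in zip(signs, diffs)) / n
            var = (sum_sq[v] - n * mean * mean) / (n - 1)
            best = max(best, mean / math.sqrt(var / n))
        return best

    observed = max_t((1,) * n)
    null_max = sorted(max_t(s) for s in itertools.product((1, -1), repeat=n))
    return observed, null_max


rng = random.Random(0)
n_subjects, n_voxels = 12, 20
diffs = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)] for _ in range(n_subjects)]
for row in diffs:
    row[0] += 4.0  # hypothetical true effect at voxel 0

observed, null_max = max_t_sign_flip(diffs)
threshold = null_max[math.ceil(0.95 * len(null_max)) - 1]  # 95%ile of max t
p_fwe = sum(m >= observed for m in null_max) / len(null_max)
print(f"{len(null_max)} flips, threshold={threshold:.2f}, "
      f"observed max={observed:.2f}, corrected p={p_fwe:.4f}")
```

Because the threshold comes from the distribution of the image-wide maximum, any voxel exceeding it is significant with FWE control, with no assumption about smoothness.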
• Adaptive
  – Larger the signal, the lower the threshold
  – Larger the signal, the more false positives
    • False positives constant as fraction of rejected tests
    • Not a problem with imaging’s sparse signals
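The adaptive behaviour of an FDR threshold can be illustrated with the Benjamini-Hochberg step-up procedure. Note the assumption: the slides as transcribed do not name a specific FDR procedure, so Benjamini-Hochberg is swapped in here as one common choice, and both lists of p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Largest p-value cut-off that controls FDR at level q (0.0 if none qualifies)."""
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p(rank) with p(rank) <= q * rank / m.
        if p <= q * rank / m:
            threshold = p
    return threshold


# Hypothetical uncorrected p-values: little signal vs. more signal.
weak = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
strong = [0.001, 0.002, 0.003, 0.004, 0.005, 0.02, 0.03, 0.5, 0.6, 0.7]

p_thr_weak = benjamini_hochberg(weak)
p_thr_strong = benjamini_hochberg(strong)
n_rejected_weak = sum(p <= p_thr_weak for p in weak)
n_rejected_strong = sum(p <= p_thr_strong for p in strong)
print(p_thr_weak, n_rejected_weak, p_thr_strong, n_rejected_strong)
```

With more signal the p-value cut-off is more lenient, which corresponds to a lower threshold on the statistic.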