q-value - CBCBcbcb.umd.edu/~hcorrada/CFG/lectures/student_presentations/qvalue.pdf · set q-value cutoff at .05, and be sure that only 5% of the significant genes found are likely

q-value

Tiffany Chao, Beth Johnson, Steven Lee

Hypothesis testing

● Test for each gene

○ null hypothesis: no differential expression

● Two kinds of errors

○ type I error (false positive): say that a gene is differentially expressed when it actually isn't; wrongly reject a true null hypothesis

○ type II error (false negative): say that a gene isn't differentially expressed when it actually is; fail to reject a false null hypothesis

Thinking about p-values

● Probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming the null hypothesis is true

● Minimum false positive rate at which an observed statistic can be called significant

● If the null hypothesis is simple, then a null p-value is uniformly distributed on [0, 1]

Multiple comparison problem

● Even if we have useful approximations for our p-values, we still face the multiple comparison problem

● When performing many independent tests, p-values no longer have the same interpretation: with m true-null tests at cutoff α, about α × m false positives are expected by chance alone
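As a rough illustration of why the interpretation changes, the sketch below assumes m independent tests whose null hypotheses are all true, and a fixed per-test cutoff alpha. The function names are made up for this example.

```python
def expected_false_positives(m, alpha):
    """Expected number of true nulls called significant at cutoff alpha."""
    return m * alpha


def prob_at_least_one_false_positive(m, alpha):
    """Pr(at least one false positive) for m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


# Same cutoff, very different consequences as m grows.
for m in (1, 20, 10000):
    print(m,
          expected_false_positives(m, 0.05),
          round(prob_at_least_one_false_positive(m, 0.05), 4))
```

With a single test the chance of a false positive is 5%; with 20 independent null tests the chance of at least one is already about 64%.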

Not only in genomics!

● "Statistical Comparisons of Classifiers over Multiple Datasets", Demsar, JMLR 2006

● "Permutation Tests for Studying Classifier Performance", Ojala, JMLR 2010

● "On Comparing Classifiers: Pitfalls to avoid and a recommended approach", Salzberg, 1997, Data Mining and Knowledge Discovery

Suppose we care about p-values ≤ 0.05?

Multiple hypothesis testing

                    Called significant   Called not significant   Total
Null true           F                    m0 – F                   m0
Alternative true    T                    m1 – T                   m1
Total               S                    m – S                    m

Error rates (more on this later)

● Per comparison error rate (PCER)

○ E[F] / m

● Per family error rate (PFER)

○ E[F]

● Family-wise error rate (FWER)

○ Pr(F ≥ 1)

● False discovery rate (FDR)*

○ E[F/S] (and set F/S = 0 when S = 0) = E[F/S | S > 0] Pr(S > 0)

● Positive false discovery rate (pFDR)*

○ E[F/S | S > 0]
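To make the definitions concrete, here is a minimal sketch that computes empirical versions of these error rates from a list of (F, S) outcomes, one pair per repeated experiment. The function name, the dictionary keys, and the toy outcomes are assumptions for illustration only; real error rates are expectations over the population, not averages over four made-up experiments.

```python
def error_rates(outcomes, m):
    """Empirical error rates from repeated experiments.

    outcomes: list of (F, S) pairs, one per experiment
              (F = false positives, S = features called significant).
    m:        number of hypotheses tested in each experiment.
    """
    n = len(outcomes)
    mean_f = sum(f for f, _ in outcomes) / n
    # F/S with the convention F/S = 0 when S = 0
    ratios = [f / s if s > 0 else 0.0 for f, s in outcomes]
    # F/S restricted to experiments with S > 0
    positive = [f / s for f, s in outcomes if s > 0]
    return {
        "PCER": mean_f / m,
        "PFER": mean_f,
        "FWER": sum(1 for f, _ in outcomes if f >= 1) / n,
        "FDR": sum(ratios) / n,
        # pFDR is undefined if S is never positive
        "pFDR": sum(positive) / len(positive) if positive else None,
        "Pr(S>0)": len(positive) / n,
    }


rates = error_rates([(0, 0), (1, 2), (0, 4), (2, 4)], m=10)
print(rates)
```

On the toy data, FDR equals pFDR × Pr(S > 0), matching the identity in the FDR definition.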

MHT error controlling procedure

● Suppose you test m hypotheses and get m p-values: p_1, p_2, p_3, ..., p_m

● A multiple hypothesis test error controlling procedure is a function T(p; α) such that rejecting all nulls with p_i ≤ T(p; α) implies that Error ≤ α

● Error is a population quantity (not random)

Weak and strong control

● Weak: T(p; α) is such that Error ≤ α only when m0 = m

● Strong: T(p; α) is such that Error ≤ α for any value of m0

○ note that m0 is not an argument for T(p; α)!

Bonferroni correction

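● Reject all nulls with p_i ≤ α / m, i.e. T(p; α) = α / m

● provides strong control of the FWER: Pr(F ≥ 1) ≤ α for any value of m0

● but too restrictive: when m is large the cutoff α / m is so small that many truly alternative features are missed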
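A minimal sketch of the Bonferroni rule, assuming the usual form in which each p-value is compared with alpha / m. The function name and the example p-values are made up for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if p_i <= alpha / m."""
    m = len(pvalues)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in pvalues]


pvalues = [0.001, 0.02, 0.04, 0.3]
# With m = 4 the per-test cutoff is 0.05 / 4 = 0.0125
print(bonferroni_reject(pvalues))
```

Three of the four p-values fall below .05, but only one survives the corrected cutoff, which shows how quickly the rule becomes restrictive.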

Why FDR and q-value?

● To help us interpret these values, two pieces of information would be useful

○ Estimate of the overall proportion of features that are truly alternative (even if they cannot be precisely identified)

○ Measure of significance that can be associated with each feature so that thresholding the numbers at a particular value has an easy interpretation

FDR

● Would like an error measure that provides a balance between

○ Number of false positive features (F)

○ Number of true positive features (T)

FDR

● The false discovery rate is the expected value of the proportion of false positive features among all those called significant: FDR = E[F/S]

*S = 0 is possible, so the definition of FDR has to be adjusted (set F/S = 0 when S = 0)

Estimating FDR

● F and S both depend on the threshold t used to determine significance, so the FDR does too: FDR(t) = E[F(t) / S(t)]

Estimating FDR

● Because we are considering many features (m is very large), we can approximate FDR(t) = E[F(t) / S(t)] ≈ E[F(t)] / E[S(t)]

Estimating FDR

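● We now need to approximate E[S(t)] and E[F(t)]

● To illustrate how FDR is determined: for m genes we have m p-values, denoted p_1, p_2, …, p_m

● Define F(t) and S(t)

○ F(t) = #{null p_i ≤ t}

○ S(t) = #{p_i ≤ t}

● S(t) can be counted directly for a given t, so the observed S(t) serves as the estimate of E[S(t)]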

Estimating FDR

● Approximating F(t) is more difficult because we do not know how many of the values called significant were truly null

● Assuming null p-values are uniformly distributed, Pr(null p ≤ t) = t

● So E[F(t)] = m0 × t (# of null features × probability that a null feature is called significant)

Estimating FDR

● We do not know the true value of m0 (# of null features), so we must estimate it

● Equivalently, we can estimate the proportion of features that are truly null (denoted by π0 = m0 / m)

● Assuming a uniform distribution for null p-values, we can estimate this quantity using a histogram

Estimating π0

Find where p-values look like a uniform distribution and set λ

Estimating π0

π0(λ) = #{p_i > λ} / (m (1 − λ))

Note: π0 does not depend on t
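The histogram-based estimate can be sketched as below, assuming the usual form π0(λ) = #{p_i > λ} / (m (1 − λ)): count the p-values above λ and divide by the number expected there if every feature were null. The function name, the default λ = 0.5, and the cap at 1 are assumptions made for this example.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of truly null features from p-values > lam."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    pi0 = above / (m * (1.0 - lam))
    # Cap at 1 because a proportion cannot exceed 1 (assumption of this sketch)
    return min(pi0, 1.0)


pvalues = [0.001, 0.004, 0.01, 0.02, 0.1, 0.3, 0.55, 0.7, 0.8, 0.95]
print(estimate_pi0(pvalues, lam=0.5))  # 4 p-values above 0.5 out of 10
```

Only p-values and λ enter the calculation; the significance threshold t plays no role.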

Estimating π0

Can also fit a cubic function to the π0 vs λ data to determine π0(1) (because “most” of the p-values at 1 would be expected to be null)

FDR

● Estimate for False Discovery Rate is FDR(t) = π0 × m × t / S(t) = π0 × m × t / #{p_i ≤ t}
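A minimal sketch of this estimate, assuming it takes the form π0 × m × t / #{p_i ≤ t}, which follows from E[F(t)] = m0 × t and m0 = π0 × m. The function name is made up, and returning 0 when nothing is called significant is a convention chosen here to match F/S = 0 when S = 0.

```python
def estimate_fdr(pvalues, t, pi0=1.0):
    """Estimated FDR when every feature with p <= t is called significant."""
    m = len(pvalues)
    s_t = sum(1 for p in pvalues if p <= t)  # S(t): number called significant
    if s_t == 0:
        return 0.0  # nothing called significant, so no false discoveries
    return pi0 * m * t / s_t  # estimated E[F(t)] divided by observed S(t)


pvalues = [0.001, 0.004, 0.01, 0.02, 0.1, 0.3, 0.55, 0.7, 0.8, 0.95]
print(estimate_fdr(pvalues, t=0.02, pi0=0.8))
```

Setting pi0 = 1.0 gives the most conservative version, which treats every feature as potentially null.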

Graphical Interpretation

q-value definition

● for a given feature, the q-value is the minimum FDR incurred when that feature is called significant

○ (every other feature with p_j <= p_i is then also called significant)

● in practical terms: a q-value threshold is the "proportion of significant features that turn out to be false leads"



q-value

● a measure of each feature's significance

p-value is in terms of the false positive rate

vs

q-value is in terms of the FDR

○ this takes into account that thousands of features are simultaneously being tested (via FDR)

■ uses a better model of where the significant features are likely to be

p vs q

● Example:

○ m = 10000

● p-values:

○ a cutoff of .01 means you expect about 100 false positives (.01 × 10000, if nearly all features are null)

○ a cutoff of .0001 means you expect only about 1 false positive, but at what cost?

● q-values:

○ set the q-value cutoff at .05, and expect that only about 5% of the genes called significant are false positives

Algorithm for Determining q-Values

● Compute the test statistic and p-value for each of the m genes

● Estimate π0

○ Using histogram

■ Find region where p-values are uniform + set λ

■ Count p-values > λ and divide by (1-λ)m (the number expected above λ if every feature were null)

○ Using cubic spline

● For each p-value

○ calculate FDR for each threshold t >= p

■ only the unique p-values in the gene set need to be tried as t

○ choose minimum FDR as q-value
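The last step of the algorithm can be sketched as follows, assuming π0 has already been estimated and that the FDR at threshold t is estimated by π0 × m × t / #{p_i ≤ t}. The function name is hypothetical and the cap at 1 is an assumption of this sketch. Sorting the p-values and taking a running minimum from the largest p-value downward gives, for each feature, the minimum estimated FDR over all thresholds t >= p.

```python
def q_values(pvalues, pi0=1.0):
    """q-value for each p-value: min over t >= p of the estimated FDR(t)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # cap q-values at 1 (assumption of this sketch)
    # Walk from the largest p-value to the smallest
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        # At threshold t = p_(rank), rank features are called significant
        fdr = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


print(q_values([0.01, 0.04, 0.03, 0.5]))
```

In the example, the feature with p = .03 gets the same q-value as the one with p = .04, because raising the threshold to .04 yields a lower estimated FDR. With pi0 = 1.0 the result coincides with Benjamini-Hochberg adjusted p-values.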

q-value (cutoff)

q-value accuracy

● assumes that the dependence between features is generally weak

○ genes are actually dependent in pathways, which can be modeled as blocks

● if so, when m is large, calling all features with q <= alpha significant implies FDR <= alpha

● the estimated q-value of each feature is greater than or equal to its true q-value

○ conservative is desirable

q-value summary

● A standard measure of significance that can be universally interpreted between studies

● better than using just p-values

○ the choice of alpha is arbitrary, and choosing it so that the expected number of false positives is < 1 throws away too many features that are likely truly significant

Questions?

FDR plug-in

● Create K permutations of the data, producing statistics t_jk for features j = 1, ..., M and permutations k = 1, ..., K.

● For a range of cutoffs C, let R_obs = Σ_j I(|t_j| > C) and E[V] = (1/K) Σ_j Σ_k I(|t_jk| > C)

● Estimate the FDR by FDR = E[V] / R_obs
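A minimal sketch of a permutation-based plug-in estimate for a single cutoff C. It assumes the standard plug-in form: R_obs counts the observed statistics with |t_j| > C, E[V] is the average number of permuted statistics with |t_jk| > C per permutation, and the estimate is E[V] / R_obs. The function name, the data layout (one list of M statistics per permutation), and the toy numbers are made up for illustration.

```python
def plug_in_fdr(observed, permuted, cutoff):
    """Plug-in FDR estimate at one cutoff.

    observed: list of M observed statistics t_j.
    permuted: list of K lists, each holding M permuted statistics t_jk.
    cutoff:   threshold C applied to the absolute statistics.
    """
    # Number of features called significant in the real data
    r_obs = sum(1 for t in observed if abs(t) > cutoff)
    if r_obs == 0:
        return 0.0  # convention of this sketch: no discoveries, FDR is 0
    k = len(permuted)
    # Average number of exceedances per permutation estimates false positives
    e_v = sum(1 for perm in permuted for t in perm if abs(t) > cutoff) / k
    return e_v / r_obs


observed = [3.0, -2.5, 0.5, 1.0]
permuted = [[0.2, 2.1, -0.3, 0.4],
            [-2.2, 0.1, 0.9, 2.4]]
print(plug_in_fdr(observed, permuted, cutoff=2.0))
```

In practice the function would be evaluated over a range of cutoffs, and K would be much larger than the two permutations used here.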
