Finding associated genes in large collections of microarrays

Page 2: Finding associated genes in large collections of microarrays.

Produce hypotheses about functional relations between genes

• Positive correlation: Co-regulated genes or positive modulator

• Negative correlation: Co-regulated genes or inhibitor.

• Used to derive networks of gene interactions.

Page 3: Finding associated genes in large collections of microarrays.

4 simple ways of finding association

• Pearson correlation coefficient.

• Spearman’s rank correlation coefficient.

• Probabilistic approach (Present/Absent).

• Mutual information (Present/Absent)

Page 4: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

• Varies between -1 and 1:

– Between 0.6 and 1: strong positive correlation.

– Between -0.6 and -1: strong negative correlation.

– -1 is perfect negative correlation.

– 1 is perfect positive correlation.

• Assumes linear relation between variables.

Page 5: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

• Step 1: Prepare data.

• Step 2: Compute Pearson coefficient between pairs of probes of interest.

• Step 3: Assess significance.

• Step 4: Multiple testing correction.

Page 6: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

• Step 1: Prepare data:

– Chips are normalized with MAS 5.0 or another procedure.

– Scale the probes in each chip by dividing by the chip mean.

– Center and standardize each probe distribution: z-scores.
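
The scaling and standardization steps can be sketched as follows. This is only an illustration under stated assumptions: the chips are taken as already normalized (MAS 5.0 or similar), the data values are made up, the helper names `scale_chip` and `z_scores` are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed for the z-scores.

```python
from statistics import mean, stdev

def scale_chip(chip):
    """Divide every probe value on one chip by that chip's mean."""
    m = mean(chip)
    return [v / m for v in chip]

def z_scores(values):
    """Center and standardize one probe's values across chips."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in values]

# Made-up example: rows are chips, columns are probes.
chips = [
    [120.0, 80.0, 100.0],
    [240.0, 150.0, 210.0],
    [90.0, 70.0, 80.0],
    [200.0, 100.0, 300.0],
]
scaled = [scale_chip(c) for c in chips]   # each chip now has mean 1
probes = list(zip(*scaled))               # transpose: one tuple per probe
z = [z_scores(p) for p in probes]         # one z-score vector per probe
```

After this step every probe vector has mean 0 and standard deviation 1 across chips, which is what the pre-computed z-score form of the Pearson coefficient expects.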

Page 7: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

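• Step 2: Compute Pearson coefficient between pairs of probes. When z-scores are pre-computed:

$$\rho_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i}\, z_{y,i}$$

n: number of chips

z_x, z_y: z-scores of probes x and y (n - 1 applies when they are standardized with the sample standard deviation)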
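
A minimal sketch of the z-score form of the coefficient, assuming the z-scores were computed with the sample standard deviation so that the sum of products is divided by n - 1. The function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize with the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_from_z(zx, zy):
    """Pearson coefficient from two pre-computed z-score vectors."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Made-up expression values for two probes over five chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_from_z(z_scores(x), z_scores(y))
```

Because the z-scores are computed once per probe, each additional pair of probes costs only one dot product.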

Page 8: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

• Step 3: Assess significance:

– Randomize if possible. Good for fewer than 20 chips, or

– Use Student's t distribution with n-2 degrees of freedom:

$$t = \rho \sqrt{\frac{n-2}{1-\rho^{2}}}$$

ρ: correlation coefficient

n: number of chips
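
Both options can be sketched with the standard library. The randomization scheme below (shuffling one probe across chips, two-sided, 2000 shuffles) is an assumption, since the slide does not say how to randomize. The function names and data are hypothetical. The t statistic still has to be looked up in a t table with n - 2 degrees of freedom.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)); undefined for |rho| = 1."""
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling y; the +1 terms avoid p = 0."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up values for two probes over eight chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.4]
rho = pearson(x, y)
t = t_statistic(rho, len(x))
p_perm = permutation_p_value(x, y)
```

With few chips the shuffled correlations give an empirical null distribution without assuming normality, which is why randomization is attractive for small collections.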

Page 9: Finding associated genes in large collections of microarrays.

Pearson correlation coefficient

• Step 4: Multiple testing correction
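
The slide does not name a correction method. As an illustration only, here is a sketch of two common choices, Bonferroni and Benjamini-Hochberg, applied to a list of p-values (one per tested probe pair). The choice of these two methods and the function names are assumptions.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four probe pairs.
p_values = [0.01, 0.04, 0.03, 0.005]
p_bonf = bonferroni(p_values)
p_bh = benjamini_hochberg(p_values)
```

Bonferroni controls the chance of any false positive and is conservative when many probe pairs are tested. Benjamini-Hochberg controls the expected fraction of false positives among the pairs reported.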

Page 10: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Non-parametric method:

– Less power but more robust.

– Does not assume normal distribution.

• Also varies between -1 and 1

Page 11: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Step 1: Prepare data.

• Step 2: Compute Spearman’s rank correlation coefficient between probe of interest and the rest.

• Step 3: Assess significance.

• Step 4: Multiple test correction.

Page 12: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Step 1: Prepare data:

– Same as Pearson.

– Order the values of the probes by increasing hybridization values.

– Construct the rank vectors.
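
Constructing a rank vector can be sketched as below. Giving tied values the average of their ranks is an assumption, since the slide does not say how ties are handled. The function name is hypothetical.

```python
def rank_vector(values):
    """Ranks 1..n by increasing value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Made-up hybridization values for one probe over four chips.
ranks = rank_vector([310.0, 95.0, 310.0, 120.0])
```

The rank at position i tells where chip i falls when the probe's values are sorted in increasing order.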

Page 13: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Step 2: Compute coefficient between probe sets of interest:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}$$

d: differences between the ranks of the two probes

n: number of chips
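
A short sketch of the rank-difference formula, 1 - 6 Σd² / (n(n² - 1)), taking two rank vectors as input. The function name and the example ranks are hypothetical, and the formula is exact only when there are no tied ranks.

```python
def spearman_from_ranks(rx, ry):
    """Spearman coefficient from two rank vectors without ties."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Made-up ranks of two probes over five chips.
rx = [1, 2, 3, 4, 5]
ry = [2, 1, 4, 3, 5]
rho = spearman_from_ranks(rx, ry)  # sum of d^2 is 4, so rho = 1 - 24/120 = 0.8
```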

Page 14: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Step 3: Assess significance: Same as Pearson.

– Randomize if possible. Good for fewer than 20 chips, or

– Use Student's t distribution with n-2 degrees of freedom:

$$t = \rho \sqrt{\frac{n-2}{1-\rho^{2}}}$$

ρ: correlation coefficient

n: number of chips

Page 15: Finding associated genes in large collections of microarrays.

Spearman’s rank correlation coefficient

• Step 4: Multiple testing correction.

Page 16: Finding associated genes in large collections of microarrays.

Binary probabilistic approach based on Present/Absent

• Approach adapted from:

“Computational methods for the identification of differential and coordinated gene expression.”

Claverie JM. Hum Mol Genet. 1999;8(10):1821-32.

• Use MAS 5.0 calls of Present-Marginal-Absent for each probe.

• Good for heterogeneous microarray collections.

Page 17: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 1: Prepare data.

• Step 2: Compute p-value of # of observed matches.

• Step 3: Multiple test correction.

Page 18: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 1: Obtain P/M/A calls for probes:

– Each call is associated with a p-value. A filter can be applied.

– Codify P/M/A calls as binary vectors: encode P as 1 and M/A as 0.
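
Encoding the calls can be sketched as follows. The function name and the `marginal_as_present` switch are hypothetical; the default follows the encoding stated above (P as 1, M/A as 0). The optional p-value filter is not shown.

```python
def encode_calls(calls, marginal_as_present=False):
    """Turn a sequence of 'P'/'M'/'A' calls into a 0/1 vector."""
    present = {"P", "M"} if marginal_as_present else {"P"}
    return [1 if c in present else 0 for c in calls]

# Made-up detection calls for one probe over six chips.
calls = ["P", "P", "A", "M", "A", "P"]
bits = encode_calls(calls)
```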

Page 19: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 2: Compute p-value of # of matches

probe x: 1 1 0 0 0 1 1 0 1 0 0 0

probe y: 1 1 0 0 0 0 1 0 1 0 0 0

probe z: 0 0 1 1 1 1 0 0 0 1 1 1

Find an improbably high number of matches (or mismatches).

probe x & y: 11 out of 12 matches

probe x & z: 10 out of 12 mismatches
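
Counting matches for the three example vectors takes a few lines. The function name is hypothetical and the vectors are copied from the slide. For the vectors as listed, x and y agree at 11 of 12 positions, and x and z agree at 2 positions (10 mismatches).

```python
def count_matches(a, b):
    """Number of positions where two binary vectors agree."""
    return sum(1 for u, v in zip(a, b) if u == v)

x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
z = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]

matches_xy = count_matches(x, y)
matches_xz = count_matches(x, z)
mismatches_xz = len(x) - matches_xz
```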

Page 20: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 2: Compute the probability of observing by chance k matches or more from the binomial distribution B(n,p). First, the probability of a match:

$$p_{match} = p_x\, p_y + (1 - p_x)(1 - p_y)$$

p_x: fraction of 1s (Present) in probe x.

p_y: fraction of 1s (Present) in probe y.
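
The chance probability of a match is the probability that both probes are 1 plus the probability that both are 0, treating the two probes as independent. A minimal sketch with a hypothetical function name and made-up binary vectors:

```python
def match_probability(a, b):
    """p_match = p_a * p_b + (1 - p_a) * (1 - p_b) for two 0/1 vectors."""
    n = len(a)
    p_a = sum(a) / n  # fraction of 1s (Present) in the first probe
    p_b = sum(b) / n  # fraction of 1s (Present) in the second probe
    return p_a * p_b + (1 - p_a) * (1 - p_b)

# Made-up vectors: 5 of 12 and 4 of 12 chips are Present.
a = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
p_match = match_probability(a, b)  # 20/144 + 56/144 = 76/144
```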

Page 21: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 2: Compute the probability of observing by chance k matches or more from the binomial distribution:

$$B(n,\, p_{match})$$

n: number of chips.

• For n large one can use the normal distribution:

$$N\big(n\,p_{match},\; n\,p_{match}\,(1 - p_{match})\big) \quad \text{when } n\,p_{match} \ge 5 \text{ and } n\,(1 - p_{match}) \ge 5$$
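
The tail probability can be sketched both ways: an exact binomial sum, and a normal approximation with mean n·p_match and variance n·p_match·(1 - p_match). The function names are hypothetical, and the continuity correction (k - 0.5) is an assumption that the slide does not mention.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(k, n, p):
    """Exact P(K >= k) for K ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def normal_upper_tail(k, n, p):
    """Normal approximation to P(K >= k), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return 1.0 - NormalDist(mu, sigma).cdf(k - 0.5)

# 11 matches out of 12 chips with a chance match probability of 76/144.
p_exact = binomial_upper_tail(11, 12, 76 / 144)

# A larger, made-up collection where the approximation is reasonable.
p_big_exact = binomial_upper_tail(60, 100, 0.5)
p_big_normal = normal_upper_tail(60, 100, 0.5)
```

For an improbably high number of mismatches, the same functions apply with the mismatch count and 1 - p_match in place of the match count and p_match.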

Page 22: Finding associated genes in large collections of microarrays.

Binary approach based on Present/Absent

• Step 3: Multiple test correction.

Page 23: Finding associated genes in large collections of microarrays.

Mutual information based on Present/Absent

• Step 1: Prepare data.

• Step 2: Compute MI value for pairs of probes.

• Step 3: Use of a threshold for MI

Page 24: Finding associated genes in large collections of microarrays.

Mutual information based on Present/Absent

• Step 1: Obtain P/M/A calls for probes:

– Each call is associated with a p-value. A filter can be applied.

– Codify P/M/A calls as binary vectors:

• Encode P/M as 1 and A as 0, OR

• Encode P as 1 and M/A as 0

Page 25: Finding associated genes in large collections of microarrays.

Mutual information based on Present/Absent

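• Step 2: Compute MI value for probes X and Y:

$$MI(X, Y) = \sum_{x \in \{P, A\}} \sum_{y \in \{P, A\}} p(x, y)\, \log \frac{p(x, y)}{p(x)\, p(y)}$$

p(.): frequencies of observed Ps and As

p(x,y): frequencies of the joint distribution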
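
A sketch of mutual information estimated from observed frequencies, using the standard definition (sum over joint outcomes of p(x,y) · log[p(x,y) / (p(x) p(y))]). The natural logarithm is an assumption, as the slide does not give the base, and the function name is hypothetical. Joint outcomes that never occur contribute nothing to the sum.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """MI in nats between two equal-length 0/1 vectors."""
    n = len(a)
    joint = Counter(zip(a, b))   # counts of each observed (x, y) pair
    count_a = Counter(a)         # marginal counts for the first probe
    count_b = Counter(b)         # marginal counts for the second probe
    mi = 0.0
    for (u, v), c in joint.items():
        p_uv = c / n
        mi += p_uv * math.log(p_uv / ((count_a[u] / n) * (count_b[v] / n)))
    return mi

# Made-up binary vectors over four chips.
same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
opposite = mutual_information([0, 1, 0, 1], [1, 0, 1, 0])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike a correlation coefficient, MI has no sign: perfectly matching and perfectly opposite probes both give the maximum value.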

Page 26: Finding associated genes in large collections of microarrays.

Mutual information based on Present/Absent

• Step 3: Use a threshold: probes X and Y are correlated if:

$$MI(X, Y) > \frac{1}{n} \log\frac{1}{P}$$

n: number of chips.

P: 1/p^2 (with p the number of probes).

“A simple method for reverse engineering causal networks”

M. Andrecut and S. A. Kauffman

J. Phys. A: Math. Gen. 39 No 46.
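
With P = 1/p², the threshold (1/n)·log(1/P) simplifies to 2·log(p)/n. A minimal sketch with a hypothetical function name and made-up collection sizes; the natural logarithm is assumed, and the same base must be used when computing MI.

```python
import math

def mi_threshold(n_chips, n_probes):
    """(1/n) * log(1/P) with P = 1 / p^2, i.e. 2 * log(p) / n."""
    big_p = 1.0 / n_probes**2
    return math.log(1.0 / big_p) / n_chips

# Made-up sizes: 100 chips, 1000 probes.
threshold = mi_threshold(100, 1000)
```

In nats, the MI between two binary vectors cannot exceed log 2 ≈ 0.693. With few chips and many probes the threshold can exceed that bound, in which case no pair can pass it.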

Page 27: Finding associated genes in large collections of microarrays.

Try Pearson method in Stembase!

Implemented by Reatha Sandie