Finding associated genes in large collections of microarrays.


Produce hypotheses of functional relations between genes:

• Positive correlation: co-regulated genes or a positive modulator.

• Negative correlation: co-regulated genes or an inhibitor.

• Used to derive networks of gene interactions.

Four simple ways of finding associations

• Pearson correlation coefficient.

• Spearman’s rank correlation coefficient.

• Probabilistic approach (Present/Absent).

• Mutual information (Present/Absent).

Pearson correlation coefficient

• Varies between -1 and 1:

– 1 is perfect positive correlation.

– Between 0.6 and 1: strong positive correlation.

– Between -0.6 and -1: strong negative correlation.

– -1 is perfect negative correlation.

• Assumes a linear relation between variables.


• Step 1: Prepare data.

• Step 2: Compute Pearson coefficient between pairs of probes of interest.

• Step 3: Assess significance.

• Step 4: Multiple testing correction.


• Step 1: Prepare data:

– Normalize the chips with MAS 5.0 or another procedure.

– Scale the probes on each chip by dividing by the chip mean.

– Center and standardize each probe’s distribution across chips (z-scores).
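A minimal Python sketch of the preparation step, assuming the chips are already normalized (for example by MAS 5.0) and arrive as a list of chips, each a list of probe values in the same probe order. The function names and the use of the sample standard deviation (n - 1 denominator) are illustrative assumptions.

```python
import statistics


def scale_chip(chip):
    """Divide every probe value on one chip by that chip's mean intensity."""
    m = statistics.fmean(chip)
    return [v / m for v in chip]


def prepare(chips):
    """chips: list of chips, each a list of probe values (same probe order).

    Returns one z-score vector per probe, computed across chips.
    """
    scaled = [scale_chip(chip) for chip in chips]
    n_probes = len(scaled[0])
    zscores = []
    for j in range(n_probes):
        values = [chip[j] for chip in scaled]  # probe j across all chips
        mu = statistics.fmean(values)
        sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
        zscores.append([(v - mu) / sd for v in values])
    return zscores
```

Each returned vector has mean 0 and sample standard deviation 1, which is what the z-score form of the Pearson coefficient in Step 2 relies on.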


• Step 2: Compute the Pearson coefficient between pairs of probes. When z-scores are pre-computed:

$$\rho_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} z_{x,i}\, z_{y,i}$$

n: number of chips
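A small Python sketch of the z-score form ρ = Σ z_x z_y / (n - 1), assuming the z-scores use the sample standard deviation; with that convention the result equals the usual Pearson coefficient. The helper names are hypothetical.

```python
import statistics


def zscore(values):
    """Center and standardize one probe across chips (sample SD)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, matching the formula
    return [(v - mu) / sd for v in values]


def pearson_from_z(zx, zy):
    """Pearson coefficient from pre-computed z-scores: sum(z_x * z_y) / (n - 1)."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_from_z(zscore(x), zscore(y))
```

Pre-computing the z-scores once per probe turns every pairwise coefficient into a single dot product, which is cheap when one probe is compared with thousands of others.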


• Step 3: Assess significance:

– Randomize (permute) if possible; this is good for fewer than 20 chips, or

– Use Student’s t-distribution with n-2 degrees of freedom:

$$t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}$$

ρ: correlation coefficient

n: number of chips
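A standard-library Python sketch of both options, under stated assumptions: the permutation test shuffles one probe's values across chips (the number of permutations, the seed and the add-one estimate are illustrative choices), and the t statistic follows the formula for n - 2 degrees of freedom. If SciPy is available, a two-sided p-value for t is `2 * scipy.stats.t.sf(abs(t), n - 2)`; it is left out here to keep the block dependency-free.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)); undefined when |rho| == 1."""
    return rho * ((n - 2) / (1 - rho ** 2)) ** 0.5


def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-sided p-value: share of shuffles of y with |r| at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so float noise does not hide ties with the observed value
        if abs(pearson(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # add-one estimate: counts the observed arrangement, so p is never exactly 0
    return (hits + 1) / (n_perm + 1)
```

With few chips the number of distinct permutations is small, so exact enumeration or a large `n_perm` is affordable; this is why randomization is the preferred option below 20 chips.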


• Step 4: Multiple testing correction.
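The slides do not say which correction is used; as an assumption, here are two common choices in Python: Benjamini-Hochberg (controls the false discovery rate) and Bonferroni (controls the family-wise error rate). Both take the list of p-values from all probe pairs tested.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p to the smallest
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)  # keep q-values monotone in p
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

When one probe is compared with every other probe on the chip, m is in the tens of thousands, so Bonferroni is very conservative; an FDR procedure is often the more practical choice.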

Spearman’s rank correlation coefficient

• Nonparametric method:

– Less power, but more robust.

– Does not assume a normal distribution.

• Also varies between -1 and 1.



• Step 1: Prepare data.

• Step 2: Compute Spearman’s rank correlation coefficient between the probe of interest and the rest.

• Step 3: Assess significance.

• Step 4: Multiple testing correction.


• Step 1: Prepare data:

– Same as Pearson.

– Order the values of each probe by increasing hybridization value.

– Construct the rank vectors.
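A Python sketch of building a rank vector for one probe across chips. The slides do not say how ties are handled; giving tied values their average rank is an assumption (it is the usual convention).

```python
def rank_vector(values):
    """Ranks 1..n by increasing value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, `rank_vector([5, 1, 5, 3])` gives `[3.5, 1.0, 3.5, 2.0]`: the two chips with value 5 share positions 3 and 4.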


• Step 2: Compute the coefficient between the probe sets of interest:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)}$$

d: differences between the ranks of the two probes

n: number of chips
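A Python sketch of the rank-difference formula. It assumes no tied values, because the formula is exact only in that case; with ties, computing the Pearson coefficient on average ranks is the usual substitute. The helper names are hypothetical.

```python
def simple_ranks(values):
    """Ranks 1..n by increasing value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def spearman(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank differences per chip."""
    rx, ry = simple_ranks(x), simple_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Because only ranks enter the formula, any monotone relation gives ρ = 1: `spearman([1, 2, 3, 4], [1, 8, 27, 64])` is 1.0 even though the relation is not linear.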


• Step 3: Assess significance, as for Pearson:

– Randomize if possible (fewer than 20 chips), or

– Use Student’s t-distribution with n-2 degrees of freedom:

$$t = \rho\sqrt{\frac{n-2}{1-\rho^{2}}}$$

ρ: correlation coefficient

n: number of chips


• Step 4: Multiple testing correction.

Binary probabilistic approach based on Present/Absent

• Approach adapted from:

“Computational methods for the identification of differential and coordinated gene expression.”

Claverie JM. Hum Mol Genet. 1999;8(10):1821-32.

• Use the MAS 5.0 Present/Marginal/Absent (P/M/A) call for each probe.

• Good for heterogeneous microarray collections.



• Step 1: Obtain P/M/A calls for probes.

• Step 2: Compute the p-value of the number of observed matches.



• Step 1: Obtain P/M/A calls for probes:

– Each call is associated with a p-value, so a filter can be applied.

– Encode the P/M/A calls as binary vectors: P as 1 and M/A as 0.
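A Python sketch of the encoding. The `marginal_as_present` flag covers the alternative P/M as 1, A as 0 encoding listed in the mutual information section. The p-value filter is a hypothetical form of "a filter can be applied": a call whose detection p-value exceeds `max_p` is encoded as 0.

```python
def encode_calls(calls, marginal_as_present=False, pvalues=None, max_p=None):
    """Turn MAS 5.0 detection calls ('P', 'M', 'A') into a 0/1 vector.

    marginal_as_present=False: P -> 1, M/A -> 0.
    marginal_as_present=True:  P/M -> 1, A -> 0.
    Hypothetical filter: if pvalues and max_p are given, a call whose
    detection p-value exceeds max_p is encoded as 0.
    """
    present = {"P", "M"} if marginal_as_present else {"P"}
    vector = []
    for k, call in enumerate(calls):
        bit = 1 if call in present else 0
        if pvalues is not None and max_p is not None and pvalues[k] > max_p:
            bit = 0
        vector.append(bit)
    return vector
```

For example, `encode_calls(list("PPMAP"))` gives `[1, 1, 0, 0, 1]`.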


• Step 2: Compute the p-value of the number of matches.

probe x: 1 1 0 0 0 1 1 0 1 0 0 0

probe y: 1 1 0 0 0 0 1 0 1 0 0 0

probe z: 0 0 1 1 1 1 0 0 0 1 1 1

Find an improbably high number of matches (or mismatches):

probe x & y: 11 out of 12 matches

probe x & z: 10 out of 12 mismatches
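Counting matches and mismatches on the three example vectors, as a short Python check (the function name is illustrative):

```python
def count_matches(a, b):
    """Number of chips where both probes have the same binary call."""
    return sum(1 for u, v in zip(a, b) if u == v)


x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
z = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]

print(count_matches(x, y))           # 11
print(len(x) - count_matches(x, z))  # 10
```

x and z agree only on chips 6 and 8, so they show 10 mismatches out of 12.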


• Step 2: Compute the probability of observing k or more matches by chance from the binomial distribution B(n, p_match). First, compute the probability of a match:

$$p_{match} = p_x\,p_y + (1-p_x)(1-p_y)$$

p_x: fraction of 1s (Present) for probe x.

p_y: fraction of 1s (Present) for probe y.
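A Python sketch of the exact binomial tail, assuming chips are independent so that each chip is a match with probability p_match. For an improbably high number of mismatches, the same tail applies with p_mismatch = 1 - p_match and k counting mismatches.

```python
import math


def match_probability(a, b):
    """p_match = p_x * p_y + (1 - p_x) * (1 - p_y), from the fractions of 1s."""
    px = sum(a) / len(a)
    py = sum(b) / len(b)
    return px * py + (1 - px) * (1 - py)


def binomial_tail(k, n, p):
    """P(K >= k) for K ~ B(n, p): chance of k or more matches out of n chips."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
p_match = match_probability(x, y)   # (5/12)(4/12) + (7/12)(8/12) = 76/144
p_value = binomial_tail(11, 12, p_match)
```

Note that p_match depends on how often each probe is Present: two probes that are almost always Present match often by chance, which this baseline accounts for.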


• Step 2: Compute the probability of observing k or more matches by chance from the binomial distribution B(n, p_match), n: number of chips.

• For large n, one can use the normal approximation (mean, variance), valid when n p_match > 5 and n(1 - p_match) > 5:

$$B(n, p_{match}) \approx N\big(n\,p_{match},\; n\,p_{match}(1-p_{match})\big)$$
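A Python sketch of the normal approximation to the upper tail. The continuity correction (using k - 0.5) is an added assumption that is not on the slide; it usually improves agreement with the exact binomial. The guard mirrors the n p_match > 5 and n(1 - p_match) > 5 rule.

```python
import math


def normal_tail(k, n, p):
    """Approximate P(K >= k), K ~ B(n, p), by N(n*p, n*p*(1-p)).

    Uses a continuity correction (k - 0.5), an assumption not on the slide.
    """
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation not advised: need n*p > 5 and n*(1-p) > 5")
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
```

The exact sum is cheap for a few hundred chips, so the approximation matters mainly when many probe pairs are screened over large collections.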



Mutual information based on Present/Absent


• Step 1: Obtain P/M/A calls for probes.

• Step 2: Compute the MI value for pairs of probes.

• Step 3: Use a threshold for MI.


• Step 1: Obtain P/M/A calls for probes:

– Each call is associated with a p-value, so a filter can be applied.

– Encode the P/M/A calls as binary vectors, either P/M as 1 and A as 0, or P as 1 and M/A as 0.


• Step 2: Compute the MI value for probes X and Y:

$$MI(X,Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

p(.): frequencies of observed Ps and As

p(x,y): frequencies of the joint distribution
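A Python sketch of MI estimated from observed frequencies of two binary vectors. The natural logarithm is an assumption (the slides do not give a base); cells of the joint table that never occur contribute nothing and are skipped.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """MI(X, Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y))), natural log.

    a, b: binary vectors (one entry per chip); unobserved (x, y) cells are skipped.
    """
    n = len(a)
    px = Counter(a)
    py = Counter(b)
    pxy = Counter(zip(a, b))
    mi = 0.0
    for (u, v), count in pxy.items():
        joint = count / n
        mi += joint * math.log(joint / ((px[u] / n) * (py[v] / n)))
    return mi
```

Unlike the match count, MI is symmetric in matches and mismatches: a probe and its exact complement have the same MI as a probe with itself.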


• Step 3: Use a threshold: probes X and Y are associated if

$$MI(X,Y) > \frac{1}{n}\log\frac{1}{P}$$

n: number of chips.

P: 1/p² (with p the number of probes).
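A Python sketch of the threshold test with P = 1/p², so that log(1/P) = 2 log p. The natural logarithm is assumed, matching the MI estimate; the function names are hypothetical.

```python
import math


def mi_threshold(n_chips, n_probes):
    """(1/n) * log(1/P) with P = 1/p^2, i.e. (2/n) * log(p); natural log assumed."""
    P = 1.0 / n_probes ** 2
    return math.log(1.0 / P) / n_chips


def associated(mi_value, n_chips, n_probes):
    """True if the MI of a probe pair exceeds the threshold."""
    return mi_value > mi_threshold(n_chips, n_probes)
```

One consequence worth checking before applying it: for binary vectors MI is at most log 2, so under this threshold no pair can pass unless n > 2 log₂ p. With about 20,000 probes that means roughly 29 chips or more, whatever base is used for the logarithm.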

“A simple method for reverse engineering causal networks.”

Andrecut M, Kauffman SA. J Phys A: Math Gen. 39(46).

Try the Pearson method in Stembase!

Implemented by Reatha Sandie
