MINING THE GENE EXPRESSION MATRIX: INFERRING GENE RELATIONSHIPS FROM LARGE SCALE GENE EXPRESSION DATA
Patrik D'haeseleer, Xiling Wen, Stefanie Fuhrman, and Roland Somogyi
Information Processing in Cells and Tissues, pp. 203-212, 1998
Presented by Bin He
Motivations
- It is necessary to determine large-scale temporal gene expression patterns
- To decipher the logic of gene regulation, we should aim to be able to monitor the expression level of all genes simultaneously
Gene time series
- Assay the expression levels of large numbers of genes in a tissue at different time points
- The relative amounts of mRNA produced at these time points provide a gene expression time series for each gene
and Somogyi, R., 1997, Large-scale temporal gene expression mapping of CNS development, Proc. Natl. Acad. Sci., in press
Previous Approach
- Euclidean distance and information-theoretic measures to cluster the genes into related expression time series
- A significant problem with this approach is the variety of measures that can be used
- Each measure produces a unique clustering of gene expression patterns
Contributions
- Determining significant relationships between individual genes, based on:
  - linear correlation
  - rank correlation
  - information theory
Linear correlation ------positive correlation
[Figure: positive linear correlation]
Linear correlation ------negative correlation
[Figure: negative linear correlation]
Linear correlation ------restriction
- For 112 different genes, 112 x 111 / 2 = 6216 pairs of expression time series need to be examined
- To restrict the number of relationships, we might want to test which correlations are significantly larger than a certain value
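The pair count above is the number of unordered pairs that can be drawn from 112 genes. A minimal Python sketch that checks the arithmetic; the gene labels are placeholders invented for illustration, not names from the paper:

```python
from itertools import combinations
from math import comb

N_GENES = 112  # number of genes stated on the slide

# Placeholder labels (hypothetical); the real gene names are not given here.
genes = [f"gene_{i}" for i in range(N_GENES)]

# Every unordered pair (a, b) is one candidate relationship to examine.
pairs = list(combinations(genes, 2))

# Closed form: N * (N - 1) / 2
n_pairs = comb(N_GENES, 2)

print(len(pairs), n_pairs)  # 6216 6216
```

Because the count grows quadratically with the number of genes, some significance filter is needed before the relationships can be inspected by hand.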
Linear correlation ------restriction
- For instance, to find those relationships in which at least 50% of the variance is explained by the correlation, i.e. rho^2 > 0.5, we need |r| > 0.96 to reject, at the 1% significance level, the null hypothesis that |rho| < 0.7071
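The slide does not say which test produces the 0.96 cutoff. One common way to test a sample correlation against a nonzero value is the Fisher z-transformation, sketched below in Python. The function name `critical_r`, the choice of a two-sided critical value, and the series length `n = 9` are all assumptions made for illustration; none of them is stated in the slides.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def critical_r(rho0, n, alpha=0.01, two_sided=True):
    """Smallest |r| that rejects H0: |rho| <= rho0 under the Fisher z approximation.

    rho0  : boundary value of the null hypothesis
    n     : number of time points per series (assumed, not from the slides)
    alpha : significance level
    """
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z0 = atanh(rho0)
    se = 1.0 / sqrt(n - 3)
    tail = alpha / 2 if two_sided else alpha
    z_crit = NormalDist().inv_cdf(1.0 - tail)
    # Map the critical point back from z space to a correlation.
    return tanh(z0 + z_crit * se)


rho0 = sqrt(0.5)                   # rho^2 = 0.5  ->  |rho| = 0.7071
threshold = critical_r(rho0, n=9)  # n = 9 is an assumed series length
print(round(threshold, 2))
```

Under these assumptions the threshold comes out close to the 0.96 quoted on the slide. Longer series would give a lower cutoff, since the standard error shrinks with n.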
Linear correlation ------visualization
- Residual variance based distance measurement: d = 1 - r^2
- d = 0 if perfectly correlated, d = 1 if uncorrelated
- Multidimensional scaling: map time series into a two-dimensional plane
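The distance d = 1 - r^2 can be sketched in a few lines of Python. The three toy series below are made up for illustration and are not data from the paper. The multidimensional scaling step itself is not implemented here; the sketch only builds the pairwise distance matrix that such a method would take as input.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance(x, y):
    """Residual-variance distance d = 1 - r^2."""
    r = pearson_r(x, y)
    return 1.0 - r * r


# Toy expression series (hypothetical values).
series = {
    "rising": [1.0, 2.0, 3.0, 4.0, 5.0],
    "falling": [10.0, 8.0, 6.0, 4.0, 2.0],  # r = -1 against "rising"
    "u_shaped": [4.0, 1.0, 0.0, 1.0, 4.0],  # r = 0 against "rising"
}

names = list(series)
# Symmetric distance matrix with zeros on the diagonal.
dist = [[distance(series[a], series[b]) for b in names] for a in names]

for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

Note that d depends on r^2, so a perfectly anticorrelated pair ("rising" and "falling") has distance 0 just like a perfectly correlated pair.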
Linear correlation ------visualization Multidimensional scaling of 34 time