Gene expression Network dynamics: from microarray data to gene-gene connectivity reconstruction. Reconstruction of c-MYC proto-oncogene regulated genetic network G. C.Castellani, D.Remondini, N.Intrator, B. O’Connell, JM Sedivy Centro L.Galvani Biofisica Bioinformatica e Biocomplessità Università Bologna and Physics Department Bologna Institute for Brain and Neural System Brown University Providence RI

Mar 27, 2015

Page 1

Gene expression Network dynamics: from microarray data to gene-gene connectivity reconstruction. Reconstruction of c-MYC proto-oncogene regulated genetic network

G. C. Castellani, D. Remondini, N. Intrator, B. O’Connell, J. M. Sedivy

Centro L. Galvani Biofisica Bioinformatica e Biocomplessità, Università Bologna, and Physics Department, Bologna

Institute for Brain and Neural Systems, Brown University, Providence RI

Page 2

• Gene significance

• Temporal structure

• Gene clustering

• Model validation

Page 3

Complex Network Theory and its application to cellular networks

Complex Network theory is a rapidly growing field of contemporary interdisciplinary research. Its applications range from Mathematics to Physics to Biology. The classical mathematical theory of Random Graphs was developed (1957-1960) by Erdős and Rényi. Some physical problems related to this approach are Percolation, Bose-Einstein Condensation and the Simon problem. Recent applications to Biology focus on Neural Networks, Immune Networks, Protein Folding, Proteomics and Genomics, mainly on the large-scale organization of Biological Networks.

Page 4

One of the most recent theories shown to have promising applications in the Biological Sciences is the so-called Theory of Complex Networks, which has been applied to protein-protein interaction and to metabolic networks (Jeong and Barabási).

Page 5

Classical Random Graphs

A Random Graph is a set of nodes and edges connecting them. The number of edges and the nodes they attach to are chosen randomly with a certain probability p.

It has been demonstrated that there exists a critical probability $p_c$ for the appearance of a giant cluster (phase transition), $p_c \sim N^{-1}$. Another Erdős-Rényi result is that the degree distribution (the number of edges of each node) follows Poisson statistics.
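As a minimal sketch of these two Erdős-Rényi results (it is not part of the original slides), the Python code below builds a random graph in which each pair of nodes is linked with probability p, then measures the largest connected cluster below and above p ~ 1/N. The graph size, the two values of p and the helper names are illustrative assumptions.

```python
import numpy as np

def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix: each node pair is linked with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)   # no self-loops
    return (upper | upper.T).astype(int)

def largest_component_size(adj):
    """Size of the largest connected cluster, found by depth-first search."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        stack, size = [start], 0
        seen[start] = True
        while stack:
            node = stack.pop()
            size += 1
            for nb in np.flatnonzero(adj[node]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        best = max(best, size)
    return best

rng = np.random.default_rng(0)
n = 400                                   # assumed graph size
sparse = random_graph(n, 0.2 / n, rng)    # p well below 1/N
dense = random_graph(n, 4.0 / n, rng)     # p well above 1/N
giant_sparse = largest_component_size(sparse)
giant_dense = largest_component_size(dense)
degrees = dense.sum(axis=1)               # degree of every node
```

For a Poisson distribution the mean and the variance coincide, so comparing `degrees.mean()` with `degrees.var()` on the denser graph gives a quick check of the degree statistics.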

Page 6

Extension to Random Graph Theory

During the last years considerable effort has gone into further analyzing the statistics of Random Graphs. The major results are summarized by the so-called “Small World” and “Scale Free” graphs. The “Small World” graphs interpolate between regular lattices and Random Graphs. The “Scale Free” networks are created by two simple rules: Network Growth and Preferential Attachment (the most connected nodes are the most probable sites of attachment). Both models give a non-Poisson degree distribution, a Power Law:

$$P(k) \sim k^{-\gamma}$$

$$P(k) \sim (k + k_0)^{-\gamma} \, e^{-(k + k_0)/k_c}$$

Moreover, this type of distribution has been observed in real networks such as the Internet, the C. elegans brain and Metabolic Networks, with exponent $2 < \gamma < 3$ and various values for the exponential cutoff $k_c$ and for $k_0$.
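The two rules named above, growth and preferential attachment, can be illustrated with a short simulation. This is a generic sketch, not code from the slides: the function name, the network size and the number of links per new node (`m`) are assumptions chosen for illustration.

```python
import numpy as np

def preferential_attachment_degrees(n_nodes, m, rng):
    """Grow a network node by node. Each new node links to m distinct
    existing nodes, picked with probability proportional to their degree."""
    degree = np.zeros(n_nodes, dtype=int)
    degree[: m + 1] = m                      # fully connected starting core
    for new in range(m + 1, n_nodes):
        prob = degree[:new] / degree[:new].sum()
        targets = rng.choice(new, size=m, replace=False, p=prob)
        degree[targets] += 1                 # chosen nodes gain one link each
        degree[new] = m                      # the new node starts with m links
    return degree

degree = preferential_attachment_degrees(2000, 2, np.random.default_rng(1))
```

In such a run a few early nodes typically end up with degrees far above the average, which is the heavy tail that a Poisson degree distribution does not produce.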

Page 7

Inadequacy of complete connectivity

Complete connectivity, as well as random connectivity, is not biologically plausible. Connectivity changes as a consequence of developmental changes (i.e. learning, ageing) appear more appropriate.

Comparison between experimental and theoretical results on the number of virgin cells during the lifespan. The number of stable states (which we identify with memory capacity and with memory cells) increases as a function of age. We found similar results (an increase in the number of stable states through connectivity changes) for the BCM model as well, but the biological interpretation is less clear.

Page 8

The John Sedivy Lab at Brown University has designed a new generation of microarrays that cover approximately one half of the whole rat genome (roughly 9000 genes). The array construction aims at a precise targeting of the proto-oncogene c-MYC. This gene encodes a transcriptional regulator that is correlated with a wide array of human malignancies, cellular growth and cell cycle progression.

The database is organized in 81 arrays obtained by hybridisation with a cell line of rat fibroblasts. These gene expression measurements were performed in triplicate for better statistical significance. The complete data set is divided into three separate experiments, each of which addresses a specific problem.

Experiment 1: Comparison of different cell lines where c-MYC is expressed at various degrees (null, moderate, over-expressed). This experiment can reveal the total number of genes that respond to a sustained loss of c-MYC as well as those genes that respond to c-MYC over-expression.

Experiment 2: Analysis of those cell lines that over-express c-MYC following stimulation with Tamoxifen (a drug that has been used to treat both advanced and early-stage breast cancer). These data were collected during a 16-hour time course. This experiment reveals the kinetics of the response to c-MYC activation and may lead to the identification of the early-responding genes.

Experiment 3: Analysis of the time course of induction with Tamoxifen when it was performed in the presence of Cycloheximide (a protein synthesis inhibitor). This experiment reveals a subset of direct transcriptional targets of c-MYC.

Page 9

Our approach to the determination of the c-MYC regulated network can be summarized in 3 points:

1) List of genes based on significance analysis over time points, between MYC and control and within time points (between groups and within groups (time)).

2) Time translation matrix calculated on microarrays treated with Tamoxifen (T) and not treated (NT)

- T and NT raw data

The resulting time translation matrix will be used to reconstruct the connectivity matrix between genes.

3) Model validation for determination of the error model

Page 10

Significance Analysis

$$d_{between}(i) = \frac{x_T(i) - x_{NT}(i)}{s(i) + s_0}, \qquad s(i) = \sqrt{\frac{\sigma_T^2(i)}{n_1} + \frac{\sigma_{NT}^2(i)}{n_2}}$$

$$d_{within}(i) = \frac{x_{T/NT}(t+1) - x_{T/NT}(t)}{s(i) + s_0}, \qquad s(i) = \sqrt{\frac{\sigma_{T/NT}^2(t+1)}{n_1} + \frac{\sigma_{T/NT}^2(t)}{n_2}}$$

$s_0$ is an appropriate regularizing factor. Interesting genes are chosen as the union of the genes selected with the above methods.

• With this SA we obtain 776 significant genes (p<0.05) if we require significance on 1 time point
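The significance score for a gene is a difference between two group values divided by a spread term plus the regularizing factor s0. The sketch below shows one way to compute such a score. It rests on assumptions the slide does not spell out: x is taken as the mean over replicates, and s(i) as the square root of the summed per-group variances divided by the replicate counts. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def regularized_d(group_a, group_b, s0):
    """Regularized difference score for each gene.

    group_a, group_b: arrays of shape (genes, replicates).
    Between groups: pass T and NT replicates at one time point.
    Within a group: pass the replicates at time t+1 and at time t.
    """
    n1, n2 = group_a.shape[1], group_b.shape[1]
    # assumed form of s(i): pooled standard error of the two group means
    s = np.sqrt(group_a.var(axis=1, ddof=1) / n1 + group_b.var(axis=1, ddof=1) / n2)
    return (group_a.mean(axis=1) - group_b.mean(axis=1)) / (s + s0)

# toy triplicate data: gene 0 changes strongly, gene 1 does not change
treated = np.array([[10.0, 11.0, 12.0], [1.0, 2.0, 3.0]])
untreated = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
d = regularized_d(treated, untreated, s0=0.1)
```

The constant s0 keeps the score from blowing up for genes whose replicate variance happens to be very small.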

Page 11

Step 2: Linear “Markov” Model

The selected genes are used for step 2 of our analysis:

$$x(t+1) = A \, x(t)$$

The x(t) are the gene expressions at time t and A is the unknown matrix that we estimate from the time course (0, 2, 4, 8, 16) of microarray data (T and NT separately, giving An and At). This is a so-called inverse problem because the matrix is recovered from time-dependent data.

-> With appropriate thresholding on the A’s we can recover the connectivity matrix between the genes.
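A minimal sketch of this inverse problem follows, assuming a least-squares estimate of A through the pseudo-inverse; the slides do not state which estimator was used. Consecutive time points are treated as single steps, and the toy matrix, the threshold value and the function names are assumptions.

```python
import numpy as np

def estimate_transition_matrix(x):
    """x: array (genes, time points). Least-squares A with x[:, t+1] ~ A @ x[:, t]."""
    past, future = x[:, :-1], x[:, 1:]
    return future @ np.linalg.pinv(past)

def connectivity(a, threshold):
    """Binary gene-gene connectivity: 1 where |A_ij| exceeds the threshold."""
    c = (np.abs(a) > threshold).astype(int)
    np.fill_diagonal(c, 0)                  # ignore self-interactions
    return c

# toy system: 3 genes, 5 time points generated from a known matrix
a_true = np.array([[0.9, 0.5, 0.0],
                   [0.0, 0.8, 0.0],
                   [0.3, 0.0, 0.7]])
x = np.empty((3, 5))
x[:, 0] = [1.0, 2.0, 3.0]
for t in range(4):
    x[:, t + 1] = a_true @ x[:, t]

a_est = estimate_transition_matrix(x)
links = connectivity(a_est, threshold=0.1)
```

With many more genes than time points the problem is underdetermined, and the pseudo-inverse then returns the minimum-norm solution rather than a unique matrix, which is one reason a validation step matters.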

Page 12

Network topology

[Figure: network topology, panels: No Tamoxifen, With Tamoxifen]

Page 13

Model validation

The different models (data preprocessing, modeling of gene dynamics, clustering techniques) have been validated mathematically by means of

- residual analysis (errors)

The residuals are small. The Markov matrix we use is not the original one (computed over 5 time steps) but the validated one: we compute the matrix on 4 time steps and validate it on the subsequent one by comparison with the real data.
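The hold-out idea described here (fit on the first 4 time steps, compare the prediction for the next one with the real data) can be sketched as follows. The pseudo-inverse estimator, the relative-error measure and the toy data are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def holdout_residual(x):
    """Fit A on all time points except the last, predict the last one,
    and return the relative prediction error."""
    train = x[:, :-1]
    a = train[:, 1:] @ np.linalg.pinv(train[:, :-1])
    predicted = a @ x[:, -2]
    return np.linalg.norm(predicted - x[:, -1]) / np.linalg.norm(x[:, -1])

# toy system: 3 genes, 5 time points that follow a linear rule exactly
a_true = np.array([[0.9, 0.5, 0.0],
                   [0.0, 0.8, 0.0],
                   [0.3, 0.0, 0.7]])
x = np.empty((3, 5))
x[:, 0] = [1.0, 2.0, 3.0]
for t in range(4):
    x[:, t + 1] = a_true @ x[:, t]

residual = holdout_residual(x)   # close to zero for exactly linear data
```

On real, noisy data the residual will not vanish; its size relative to the measurement noise indicates how far the linear model can be trusted.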

Page 14

Changing databases

In order to have a better understanding of the results, both in terms of network topology and connectivity distribution, we generated 2 databases:

1) One small database with those genes that were without any doubt affected by Tamoxifen (50 genes)

2) One larger database with all the genes that give 2 P on 3 experiments, i.e. those genes for which we have good measurements (3444 genes)

Page 15

50 genes database

[Figure: panels NT and T]

Page 16

Results

For each of the 50 genes, we computed the connectivity and the clustering coefficient, which expresses whether the gene is connected to highly connected or poorly connected genes.

The treatment with Tamoxifen causes a decrease in clustering in the network, so it seems that the network becomes “less scale free”. This is confirmed by the network clustering coefficient:

N Overall graph clustering coefficient: 0.840

T Overall graph clustering coefficient: 0.241

The individual connectivity and clustering changes are summarized in this table: [Table]
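For reference, the sketch below computes the connectivity (degree) and the clustering coefficient of each node from a binary, symmetric connectivity matrix. It uses the standard definition of the local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked) and takes the overall value as the mean of the local ones; both choices are assumptions, since the slides do not say which software or definition produced the numbers above.

```python
import numpy as np

def degree_and_clustering(adj):
    """Degree and local clustering coefficient of each node.

    adj: symmetric 0/1 adjacency matrix with zero diagonal.
    """
    adj = np.asarray(adj)
    k = adj.sum(axis=1)
    triangles = np.diag(adj @ adj @ adj) / 2      # triangles through each node
    pairs = k * (k - 1) / 2                       # possible neighbor pairs
    c = np.zeros(len(k), dtype=float)
    mask = pairs > 0
    c[mask] = triangles[mask] / pairs[mask]       # nodes with < 2 neighbors get 0
    return k, c

# toy graph: a triangle (nodes 0, 1, 2) plus node 3 attached to node 0
adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
k, c = degree_and_clustering(adj)
overall = c.mean()                                # assumed "overall" coefficient
```

Running the same function on the N and T connectivity matrices and subtracting the results gives the per-gene changes in connectivity and clustering.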

Page 17

The 3444 genes database

This large database is used in order to have better statistics and possibly a distribution fit.

[Figure: panels N and T]

Clearly these distributions are not Poisson and seem to be Power Law with an exponential tail.

Page 18

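Fitting the distributions

We fitted the distributions with a generalized power law:

$$P(k) \sim (k + k_0)^{-\gamma} \, e^{-(k + k_0)/k_c}$$

[Figure: fitted distributions, panels N and T]

Fitted expressions for the two networks (N and T):

-2.50398 (-0.401309 + x) + 4.80736 Log[-0.401309 + x]

-2.34483 (-1.04328 + x) + 3.66552 Log[-1.04328 + x]

Fit parameters:

$k_0 = 0.4$, $\gamma = 4.8$, $k_c = 2.5$

$k_0 = 1.04$, $\gamma = 3.66$, $k_c = 2.3$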
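One possible way to fit a generalized power law of the form P(k) ~ (k + k0)^(-gamma) exp(-(k + k0)/kc) is sketched below. It relies on the observation that, for a fixed k0, log P(k) is linear in gamma and 1/kc, so an ordinary least-squares fit can be repeated over a grid of candidate k0 values. This is an illustrative assumption, not the fitting procedure behind the numbers above; the function name, the grid and the synthetic data are hypothetical.

```python
import numpy as np

def fit_generalized_power_law(k, p, k0_grid):
    """Fit log P(k) = c - gamma*log(k + k0) - (k + k0)/kc.

    For each candidate k0 the model is linear in (c, gamma, 1/kc);
    the k0 with the smallest squared error is kept.
    """
    log_p = np.log(p)
    best = None
    for k0 in k0_grid:
        shifted = k + k0
        design = np.column_stack([np.ones_like(shifted), -np.log(shifted), -shifted])
        coef, *_ = np.linalg.lstsq(design, log_p, rcond=None)
        err = np.sum((design @ coef - log_p) ** 2)
        if best is None or err < best[0]:
            best = (err, k0, coef[1], 1.0 / coef[2])
    _, k0, gamma, kc = best
    return k0, gamma, kc

# synthetic, noise-free distribution with known parameters
k = np.arange(1, 41, dtype=float)
p = (k + 1.0) ** -2.5 * np.exp(-(k + 1.0) / 20.0)
k0_fit, gamma_fit, kc_fit = fit_generalized_power_law(k, p, np.linspace(0.0, 2.0, 21))
```

On empirical degree histograms the tail bins are sparse and noisy, so binning choices and the fitting range can shift the estimated parameters noticeably.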

Page 19

Network Structure (3444 genes)

N Overall graph clustering coefficient: 0.902

T Overall graph clustering coefficient: 0.893

From these results and from the fit parameters it seems that the N-Network is less scale free, but these results are strongly affected by noise.

We have looked at the individual connectivity and clustering coefficient, and at their variation between N and T. The results are encouraging: among the genes whose connectivity changed significantly there are c-MYC targets.

Page 20

Network Structure (3444 genes)

As an example we report some connectivity changes in c-MYC target genes (gene index, probe ID, description, and two connectivity values):

2379   rc_AI178135_at   complement component 1, q subcomponent binding protein   3     272
2796   U09256_at        transketolase                                            13    39
2772   U02553cds_s_at   protein tyrosine phosphatase, non-receptor type 16       133   146
390    D10853_at        phosphoribosyl pyrophosphate amidotransferase            0     7
933    M58040_at        transferrin receptor                                     1     27

Page 21

Conclusions

We have tested the hypothesis that treatment with Tamoxifen, which in these engineered cells leads to c-MYC activation, can be related to connectivity changes between genes.

Connectivity is a very important parameter for both Physical and Biological systems. Connectivity (coupling) changes are the basis for Phase Transitions and developmental changes (ageing, learning and response to external stimuli).

Our results show that, within the framework of scale-free networks, there are changes in gene-gene connectivity.

The connectivity distributions of N and T are far from Poisson, with parameters similar to those found for other systems that show a scale-free distribution with an exponential tail.

Page 22

Conclusions

If we look at the individual gene connectivity, or at the smaller database, we observe significant changes induced by the treatment. For example, the clustering coefficient changes, and some c-MYC targets show changes in connectivity and clustering coefficient.

One clear result is that the global gene degree distribution follows a power law both without and with Tamoxifen. This result seems to indicate that this type of behaviour is very general.

These results need to be confirmed and further analyzed but, to our knowledge, this is the first attempt to monitor the network connectivity changes induced by c-MYC activation in comparison with a basal level.

Page 23

Conclusions

Some points need further analysis, such as the correlation between connectivity changes and c-MYC targets. Our method is not a significance test: it can only help to view gene activity as the result of interactions between genes at the previous time step.

The MARKOV approach to gene-gene connectivity reconstruction is not new (Maritan 2001), but we have introduced matrix validation, rigorous data discretization and normalization that can improve the model robustness.

Finally, we will further improve the model robustness by time reshuffling and will try to test its predictive performance.