ARCHITECTURE OF THE HUMAN REGULATORY NETWORK DERIVED FROM ENCODE DATA Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N,Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P,Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M,Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M. Paper Presentation | Physiology |M.Sc., ITMB UoA Anaxagoras Fotopoulos – Thanos Nature, 489(7414):91-100, 2012

Architecture of the human regulatory network derived from encode data

Jan 26, 2015



Anax Fotopoulos

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of
these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the
principles of the human transcriptional regulatory network, we determined the genomic binding information of
119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of
transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations.
In particular, there are significant differences in the binding proximal and distal to genes. We organized all the
transcription factor binding into a hierarchy and integrated it with other genomic information (for example,
microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for
instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate
targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched
network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components
are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the
two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome
sequences and understanding basic principles of human biology and disease.
Transcript
Page 1: Architecture of the human regulatory network derived from encode data

ARCHITECTURE OF THE HUMAN REGULATORY NETWORK DERIVED FROM ENCODE DATA

Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M, Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M.

Paper Presentation | Physiology | M.Sc., ITMB UoA

Anaxagoras Fotopoulos – Thanos Papathanasiou | 2014

Nature, 489(7414):91-100, 2012

Page 2: Architecture of the human regulatory network derived from encode data

INTRODUCTION

▪ System-wide analyses of transcription-factor-binding patterns have been performed in unicellular model organisms, such as Escherichia coli.

▪ For humans, systems-level analyses have been a challenge due to the size of the transcription factor repertoire and genome.

▪ Large-scale data from the ENCODE project begin to enable such analyses.

▪ The genome-wide binding profiles of 119 transcription-related factors, derived from over 450 distinct experiments, are analyzed to find correlations and multi-transcription-factor motifs.

▪ The results are integrated with other genomic information to form a multi-level meta-network in which different levels have distinct properties.

▪ Information obtained in this study will be crucial to interpreting variants in the many personal genome sequences expected in the future and understanding basic principles of human biology and disease.

Page 3: Architecture of the human regulatory network derived from encode data

Data & Methods Analysis

[Flow chart, summarized as steps]

• ENCODE: ChIP-seq data sets for 119 TFs over five main cell lines.

• Peak detection.

• Generation of co-binding maps: for every peak of a focus TF (e.g. GATA1), find the intensities of the overlapping peaks of all other factors. This map is the positive set.

• Negative set: a randomized co-binding map, created by independently shuffling the peak intensity values in each row of the co-binding map.

• RuleFit algorithm: combinations of factors in the co-binding map (positive set) are compared to the randomized co-binding map (negative set).

• Outputs per focus-factor context: Relative Importance and Co-association score.

• Aggregation across all focus-factor contexts: importance correlation matrix and maximal co-association matrix.
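The first steps of this pipeline (co-binding map and shuffled negative set) can be sketched in Python. This is only an illustration under stated assumptions: peaks are simplified to (start, end, intensity) tuples on a single chromosome, the function names `cobinding_map` and `shuffled_negative` are hypothetical, and when several partner peaks overlap a focus peak the strongest one is taken, which the slide does not specify. The RuleFit step itself is not shown.

```python
import numpy as np

def cobinding_map(focus_peaks, partner_peaks):
    """Build a co-binding map for one focus factor.

    focus_peaks: list of (start, end, intensity) on one chromosome.
    partner_peaks: dict mapping partner name -> list of (start, end, intensity).
    Returns (partner names, matrix) with one row per focus peak and one
    column per partner; each entry is the strongest overlapping partner
    peak intensity, or 0.0 when nothing overlaps (an assumption).
    """
    partners = sorted(partner_peaks)
    cmap = np.zeros((len(focus_peaks), len(partners)))
    for i, (f_start, f_end, _) in enumerate(focus_peaks):
        for j, name in enumerate(partners):
            hits = [
                intensity
                for p_start, p_end, intensity in partner_peaks[name]
                if min(f_end, p_end) > max(f_start, p_start)  # intervals overlap
            ]
            cmap[i, j] = max(hits, default=0.0)
    return partners, cmap

def shuffled_negative(cmap, seed=0):
    """Negative set: shuffle the intensities within each row independently."""
    rng = np.random.default_rng(seed)
    negative = cmap.copy()
    for row in negative:      # each row is a view, so the shuffle is in place
        rng.shuffle(row)
    return negative

# Toy data (made up for illustration, not ENCODE values).
gata1_peaks = [(100, 200, 5.0), (300, 400, 3.0)]
others = {"TAL1": [(150, 250, 2.0)], "GATA2": [(350, 360, 4.0), (390, 500, 1.0)]}
partners, positive = cobinding_map(gata1_peaks, others)
negative = shuffled_negative(positive)
```

A classifier trained to separate rows of `positive` from rows of `negative` would then play the role that RuleFit plays on the slide.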

Page 4: Architecture of the human regulatory network derived from encode data

Example of GATA1 Relative Importance & Co-association scores

[Figure labels: POL2-(4H8), TAL1, GATA2]

Relative Importance (RI) gives the overall importance of each transcription factor in the model. It reflects the ‘size’ of the biclusters to which a particular transcription factor belongs, and it is related to the number of co-binding factors and the fraction of peak locations involved.

In the GATA1 context, the primary partners POL2, TAL1 and GATA2, as well as the local partners MAX and JUN, have high RI scores.

Co-association scores measure the impact of the co-dependency implicit in a particular pair on the model as a whole, and they more directly probe the co-occupancy of transcription factors in the focus factor context than does the RI score.

• Expected pairings: MYC–MAX–E2F6

• Novel pair: CCNT2–HMGN3

Many genes that lie near clusters of co-associated factors are enriched for specific biological functions. For example:

• Bicluster {E2F6–GATA1–GATA2–TAL1} was enriched for genes related to myeloid differentiation.

• Bicluster {E2F6–SP1–SP2–FOS–IRF1} was enriched for genes involved in the DNA damage response.

Distinct combinations of factors regulate specific types of genes.

Page 5: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {1/7} WITH DISTAL EDGES

Distal edges have a different degree distribution than proximal ones.

Some transcription factors have low in-degree values in the proximal network but high in-degree values in the distal one, indicating that they are heavily regulated through enhancers.

[Figure: hierarchy with top, middle and bottom levels; the legend distinguishes downward-pointing and upward-pointing edges]

Page 6: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {2/7} WITHIN THE PROXIMAL NETWORK

• Upper-level transcription factors tend to have more targets than lower-level ones (less shaded TFs).

• Middle-level transcription factors concentrate much of the in-degree and out-degree information flow, acting as a bottleneck between the top and bottom levels.

[Figure: hierarchy with top, middle and bottom levels; the legend distinguishes downward-pointing and upward-pointing edges]
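The quantities named on this slide (in-degree, out-degree, downward and upward edges) can be computed from an edge list once each factor has been assigned a level. The sketch below assumes the levels are already known; how the hierarchy itself is derived is not described on the slides. The function name `degree_summary` and the toy factors A to D are hypothetical.

```python
from collections import Counter

def degree_summary(edges, level):
    """Summarize a directed regulatory network.

    edges: iterable of (regulator, target) pairs.
    level: dict mapping each factor to an integer, larger meaning higher
           in the hierarchy (assumed to be given).
    Returns (out_degree, in_degree, direction) as Counters.
    """
    out_degree, in_degree, direction = Counter(), Counter(), Counter()
    for regulator, target in edges:
        out_degree[regulator] += 1
        in_degree[target] += 1
        if level[regulator] > level[target]:
            direction["downward"] += 1
        elif level[regulator] < level[target]:
            direction["upward"] += 1
        else:
            direction["same_level"] += 1
    return out_degree, in_degree, direction

# Toy network: A is top level, B and C are middle level, D is bottom level.
level = {"A": 3, "B": 2, "C": 2, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "B"), ("B", "C")]
out_degree, in_degree, direction = degree_summary(edges, level)
```

In this toy network the middle-level factor B has both incoming and outgoing edges, which is the kind of position the slide calls a bottleneck.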

Page 7: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {3/7} WITH PROTEIN INTERACTIONS AND THE PHOSPHORYLOME

Top-level transcription factors tend to have more partners in the protein–interaction network than do lower-level ones.

Kinases at the bottom tend not to phosphorylate transcription factors.

Kinases at the bottom tend to be regulated by transcription factors.

The phosphorylome is the entire set of phosphoproteins in the proteome.

Page 8: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {4/7} WITH ncRNAs

Highly connected transcription factors tend to regulate more miRNAs and to be more regulated by them.

Top-level and middle-level transcription factors have the highest total number of ncRNA targets.

[Figure legend: enriched for miRNA → TF edges; balanced number of edges; enriched for TF → miRNA edges]

Page 9: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {5/7} WITH FAMILIES AND FUNCTIONAL CATEGORIES

• Transcription factors at the top of the hierarchy tend to have more general functions, and those at the bottom tend to have more specific functions.

• Chromatin-related factors are enriched at the top of the hierarchy.

• Sequence-specific transcription factors (TFSSs) are enriched in the middle.

• TFSSs show a greater degree of tissue specificity and are more highly regulated by miRNAs than the general and chromatin-related factors.

Page 10: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {6/7} WITH GENE EXPRESSION

• Highly connected factors tend to be highly expressed.

• Top and middle levels show a greater correlation.

• More ‘influential’ transcription factors tend to be better connected and higher in the hierarchy.

• A model that integrates the binding–expression relationships of the highly connected transcription factors predicts expression about as well as one built from the less connected factors, whose individual binding–expression relationships are weak.

• Top–middle and middle–middle transcription factor pairs influence gene expression cooperatively.

Page 11: Architecture of the human regulatory network derived from encode data

CORRELATIONS OF TRANSCRIPTION FACTORS {7/7} WITH NETWORK DYNAMICS

Transcription factors change their binding patterns among different cell types.

Targets of lower-level transcription factors tend to change more between cell types, consistent with their role in more specialized processes.

‘Rewiring score’ is negatively correlated with hierarchy level

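The rewiring score quantifies the difference between the two sets of binding targets of a TF in two cell lines (GM12878 and K562):

$$\text{rewiring score} = 1 - \frac{|Bs_1 \cap Bs_2|}{|Bs_1 \cup Bs_2|}$$

where $Bs_1$ and $Bs_2$ are binding set 1 and binding set 2, and $Bs_1 \cap Bs_2$ is the set of common binding sites.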
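The rewiring score on this slide is one minus the ratio of shared binding targets to all binding targets, which is the Jaccard distance between the two sets. A minimal Python sketch follows; the function name `rewiring_score` and the target identifiers are hypothetical, and raising an error for two empty sets is an assumption, since the slide does not cover that case.

```python
def rewiring_score(bs1, bs2):
    """Return 1 - |bs1 ∩ bs2| / |bs1 ∪ bs2| for two collections of targets."""
    bs1, bs2 = set(bs1), set(bs2)
    union = bs1 | bs2
    if not union:
        # Assumption: the score is treated as undefined when both sets are empty.
        raise ValueError("both binding sets are empty; the score is undefined")
    return 1 - len(bs1 & bs2) / len(union)

# Toy target sets for one TF in two cell lines (made-up identifiers).
targets_gm12878 = {"g1", "g2", "g3"}
targets_k562 = {"g2", "g3", "g4"}
score = rewiring_score(targets_gm12878, targets_k562)  # 2 shared of 4 total -> 0.5
```

A score of 0 means the factor binds exactly the same targets in both cell lines, and a score of 1 means the two target sets share nothing.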

Page 12: Architecture of the human regulatory network derived from encode data

ENRICHED NETWORK MOTIFS {1/4} AUTO-REGULATOR MOTIFS

Network motifs are small connectivity patterns that carry out canonical functions.

Motifs, as broad template patterns, can be over- or under-represented relative to a random control.

• The human regulatory network is enriched with auto-regulators.

• Auto-regulators tend to be repressors, representing a well-known design principle for maintaining steady state.

• Auto-regulators have more ncRNAs as their targets.

The auto-regulator is a simple but important motif that is commonly found in networks exhibiting multistability.

28 TFs are auto-regulators.

90 TFs are non-auto-regulators.
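In a directed network, an auto-regulator corresponds to a self-loop, that is, an edge whose regulator and target are the same factor. The sketch below splits a set of factors on that basis; the function name `split_auto_regulators` and the toy factors are hypothetical, and the criterion used in the paper to call a binding event auto-regulation is not given on the slide.

```python
def split_auto_regulators(edges, factors):
    """Partition factors into (auto-regulators, non-auto-regulators).

    edges: iterable of (regulator, target) pairs.
    factors: iterable of all factor names under consideration.
    """
    factors = set(factors)
    auto = {regulator for regulator, target in edges if regulator == target}
    auto &= factors  # ignore self-loops on nodes outside the factor set
    return auto, factors - auto

# Toy network with one self-loop on factor A.
edges = [("A", "A"), ("A", "B"), ("B", "C")]
auto, non_auto = split_auto_regulators(edges, ["A", "B", "C"])
```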

Page 13: Architecture of the human regulatory network derived from encode data

ENRICHED NETWORK MOTIFS {2/4} THREE TRANSCRIPTION FACTOR MOTIFS

• The most enriched three-transcription-factor motif in the proximal network is the feed-forward loop (FFL).

• Across many tissues, the expression levels of the genes in many FFLs were positively correlated.

• Other enriched three-transcription-factor motifs contain an additional regulation on top of that in an FFL. This creates a mutual regulation between a pair of transcription factors, instantiating a toggle switch, which has an essential role in the determination of the cell.

[Figure legend: enrichment; depletion]
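A feed-forward loop is a triple of factors X, Y, Z with edges X → Y, Y → Z and X → Z. The sketch below enumerates such triples by brute force, which is adequate for a network of roughly a hundred factors. The function name `feed_forward_loops` is hypothetical, and the enrichment test (comparing the count with counts in randomized networks) is not shown.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Return all ordered triples (x, y, z) with x->y, y->z and x->z."""
    edge_set = {(a, b) for a, b in edges if a != b}  # self-loops are ignored
    nodes = sorted({node for edge in edge_set for node in edge})
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    ]

# Toy network: A regulates B and C, and B regulates C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ffls = feed_forward_loops(edges)

# Adding C -> B makes B and C regulate each other (the extra regulation
# described on the slide), so the same three factors now form two FFLs.
ffls_with_toggle = feed_forward_loops(edges + [("C", "B")])
```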

Page 14: Architecture of the human regulatory network derived from encode data

ENRICHED NETWORK MOTIFS {3/4} PPI-MIMs MOTIFS

• Co-regulating transcription factors are likely to interact physically, indicating that they work together as a complex.

• The motif ranking second in enrichment consists of a distal regulatory relationship, a promoter regulatory relationship, and a protein–protein interaction. It corresponds to a DNA loop, with an interacting complex of transcription factors binding to the promoter and enhancer simultaneously.

Possible Multiple-Input-Modules involving promoter and distal regulation and a Protein–Protein Interaction (PPI-MIMs)

Page 15: Architecture of the human regulatory network derived from encode data

ENRICHED NETWORK MOTIFS {4/4} miRNA REGULATION MOTIFS

• miRNAs are more likely to regulate a pair of physically interacting factors.

• To avoid unwanted cross-talk, a miRNA tends to shut down an entire functional unit (a transcription factor complex) rather than just a single component.

• miRNAs tend to target a pair of transcription factors binding both proximally and distally. This suggests that a miRNA represses the expression of both promoter and distal regulators to shut down a target completely.

Page 16: Architecture of the human regulatory network derived from encode data

ALLELIC BEHAVIOR IN A NETWORK FRAMEWORK

• The degree of allele-specific behaviour of each transcription factor can be quantified by a statistic that we call ‘allelicity’.

• Of the 4,798 allele-specific binding cases (paternal or maternal targets) of a single transcription factor, 57% showed coordinated allelic binding and expression.

• As the degree of combinatorial regulation increases, the relationship between expressed and regulated alleles becomes progressively stronger.

• Small insertions and deletions in TF sequences cause more allelic behavior than SNPs.

[Figure: examining relationships between sequence variation and transcription factor regulation. TFs are linked to paternal (Pat) or maternal (Mat) targets; every line denotes allele-specific binding.]

Page 17: Architecture of the human regulatory network derived from encode data

CONCLUSIONS

• Human transcription factors co-associate in a combinatorial and context-specific fashion.

• Different combinations of factors bind near different targets, and the binding of one factor often affects the preferred binding partners of others.

• Transcription factors often show different co-association patterns in gene-proximal and distal regions.

• Different parts of the hierarchical transcription factor network exhibit distinct properties.

• The network contains a number of motifs in which two genes co-regulated by a factor are bridged by a protein–protein interaction or a regulating miRNA.

• Highly connected transcription factors and targets are under strong evolutionary selection and exhibit stronger allele-specific activity; however, elements with allelic activity are under weaker selection than non-allelic ones.

Page 18: Architecture of the human regulatory network derived from encode data

Thank you

National & Kapodistrian University of Athens, Department of Informatics

Technological Education Institute of Athens, Department of Biomedical Engineering

Biomedical Research Foundation, Academy of Athens

Demokritos National Center for Scientific Research

Physiology | Information Technologies in Medicine and Biology