Subtypes of Associated Protein-DNA (Transcription Factor-Transcription Factor Binding Site) Patterns
Subtypes of Associated Protein-DNA (TF-TFBS) Patterns
Finding associated patterns on both sides has been shown to be promising when many diverse binding sequences are available (e.g. TRANSFAC). Associated TF-TFBS patterns found from sequences…
• 7664 TF sequences in TRANSFAC; 408 AAs on average
• 26786 bound TFBSs, 1225 matrices in TRANSFAC; 25 bp on average
Associated pattern discovery
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Tak-Ming Chan et al., Discovering approximate-associated sequence patterns for protein-DNA interactions. Bioinformatics, 2011, 27(4)
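As a rough illustration of what "associated patterns on both sides" means, the sketch below counts how often a k-mer on the TF side co-occurs with a k-mer on the TFBS side across TF-TFBS records. It uses exact k-mer matching only, which is a simplification and not the approximate association method of the cited work. The function names, the toy records, and the parameters `k_tf`, `k_dna`, and `min_support` are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def associated_pairs(records, k_tf=5, k_dna=5, min_support=2):
    """Count (TF k-mer, TFBS k-mer) pairs that co-occur in the same record.

    Each record is (TF sequence, list of TFBS sequences bound by that TF).
    A pair is counted at most once per record.
    """
    support = Counter()
    for tf_seq, tfbs_seqs in records:
        dna_kmers = set()
        for site in tfbs_seqs:
            dna_kmers |= kmers(site, k_dna)
        for tf_kmer in kmers(tf_seq, k_tf):
            for dna_kmer in dna_kmers:
                support[(tf_kmer, dna_kmer)] += 1
    # Keep only pairs seen in at least min_support records
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy records (made up for illustration, not TRANSFAC entries)
records = [
    ("MKNRIAAQ", ["GGTGACAT"]),
    ("LSNRIAAK", ["CTGACAGG"]),
    ("QQNREAAT", ["ATGTGACC"]),
]
pairs = associated_pairs(records)
print(pairs)
```

With these toy records, only the NRIAA/TGACA pair is shared by two records, so it is the only one that survives the support threshold.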
Introduction
Finding associated patterns on both sides has been shown to be promising when many diverse binding sequences are available (e.g. TRANSFAC). Associated TF-TFBS patterns found from sequences are verified on 3D structures to be binding cores!
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Verified on 3D structures (binding cores < 3.5 Å)
• 40222 binding pairs from 1290 PDB protein-DNA complexes
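The 3.5 Å criterion can be pictured with a small distance check. The sketch below assumes that a binding pair is a residue and a nucleotide with at least one pair of atoms closer than the cutoff; the slide does not say exactly which atoms are compared, so that choice, the function name, and the toy coordinates are all assumptions.

```python
import math

CUTOFF = 3.5  # angstroms, the threshold quoted on the slide


def binding_pairs(protein_atoms, dna_atoms, cutoff=CUTOFF):
    """Return sorted (residue, nucleotide) pairs with an atom pair closer than cutoff.

    Each atom is given as (owner id, (x, y, z)) with coordinates in angstroms.
    """
    pairs = set()
    for res_id, p_xyz in protein_atoms:
        for nt_id, d_xyz in dna_atoms:
            if math.dist(p_xyz, d_xyz) < cutoff:
                pairs.add((res_id, nt_id))
    return sorted(pairs)


# Toy coordinates (hypothetical, not taken from a PDB entry)
protein_atoms = [("ARG45", (0.0, 0.0, 0.0)), ("ALA46", (10.0, 0.0, 0.0))]
dna_atoms = [("G7", (3.0, 0.0, 0.0)), ("T8", (10.0, 3.0, 2.0))]

contacts = binding_pairs(protein_atoms, dna_atoms)
print(contacts)
```

Here ARG45 and G7 are 3.0 Å apart and count as a binding pair, while ALA46 and T8 are about 3.6 Å apart and do not.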
Introduction—Motivations
We can go further with these promising associated TF-TFBS patterns: discovering and analyzing the binding variances (subtypes).
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Subtypes may
• Lead to changed binding preferences
• Distinguish conserved from flexible binding residues
• Reveal novel binding mechanisms
Methods & Materials
Both the L2 distance and the p-value of a chi-squared test are used to shortlist subtypes (3rd position: G-C; 4th position: G/C-G).
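One possible reading of this shortlisting step is sketched below: for each aligned TFBS position, compare the nucleotide distributions of two candidate subtypes with an L2 distance on frequencies and a chi-squared test of homogeneity on counts. The slide does not give the exact definitions or thresholds, so the formulation, the function names, and the toy sites are assumptions. The p-value uses closed forms for 1 to 3 degrees of freedom, which is enough for a 2 x 4 nucleotide table, and it assumes both subtypes have at least one site.

```python
import math

NUCLEOTIDES = "ACGT"


def column_counts(sites, pos):
    """Count A, C, G, T at one 0-based position across aligned sites."""
    return [sum(1 for s in sites if s[pos] == n) for n in NUCLEOTIDES]


def l2_distance(counts_a, counts_b):
    """Euclidean distance between the two normalized frequency vectors."""
    fa = [c / sum(counts_a) for c in counts_a]
    fb = [c / sum(counts_b) for c in counts_b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution, df in {1, 2, 3}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("df must be 1, 2 or 3")


def chi2_p_value(counts_a, counts_b):
    """Chi-squared test of homogeneity on a 2 x k table of nucleotide counts."""
    # Drop nucleotides absent from both subtypes (expected count would be 0)
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    if len(cols) < 2:
        return 1.0  # identical single-nucleotide columns: no evidence of difference
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    total = total_a + total_b
    stat = 0.0
    for a, b in cols:
        exp_a = total_a * (a + b) / total
        exp_b = total_b * (a + b) / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2_sf(stat, len(cols) - 1)


# Toy aligned sites for two candidate subtypes (hypothetical data)
sites_a = ["CAGCTG"] * 8 + ["CAGGTG"] * 2
sites_b = ["CACGTG"] * 10

for pos in range(6):
    ca, cb = column_counts(sites_a, pos), column_counts(sites_b, pos)
    print(pos + 1, round(l2_distance(ca, cb), 3), chi2_p_value(ca, cb))
```

In this toy data only the 3rd and 4th positions differ between the two groups, so only those positions get a non-zero L2 distance and a small p-value. A real shortlist would also need cutoffs for both measures, which the slide does not state.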
Results
Sample results from http://www.cse.cuhk.edu.hk/~tmchan/subtypes/
Results
Subtypes with evidence of changed binding preferences: more than 70% of subtypes (and pairs) reflect changed binding preferences according to PDB structure evidence.
Results
Subtype clusters show that the more conserved (invariant) residues are important for protein-DNA interactions, while the variant residues show specific properties.
Results
A case study shows subtypes that are potentially critical for regulation through dimerization and thus TF-TFBS binding.
PKVEIL-CAGCTG: myogenic regulatory factor (MRF) family, PDB 1MDY
PKVVIL-CACGTG: Myc family (oncogene), PDB 1NKP
PKVEIL appears in TFs of MRF4, Myf-5, Myf-6, MyoD… in TRANSFAC
PKVVIL appears in TFs of c/L/v-Myc in TRANSFAC
• The subtypes are discovered without family information while reflecting strong familial specificity
• Wet-lab literature supports that if V is mutated to an amino acid similar to E (MycV394D), the dimerization of Myc-Max will be abolished (Miz1 binding deficient)
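To make the difference between the two subtypes in this case study explicit, the short sketch below lists the positions at which two equal-length patterns differ. The helper name `variant_positions` is hypothetical; the patterns are the ones quoted on the slide.

```python
def variant_positions(a, b):
    """Return (1-based position, symbol in a, symbol in b) wherever a and b differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


tf_diff = variant_positions("PKVEIL", "PKVVIL")
dna_diff = variant_positions("CAGCTG", "CACGTG")
print(tf_diff)   # [(4, 'E', 'V')]
print(dna_diff)  # [(3, 'G', 'C'), (4, 'C', 'G')]
```

The TF patterns differ only at the fourth residue (E versus V), and the TFBS patterns differ at the two central positions, while the flanking residues and bases are invariant.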
Discussion
Further applications
• Application to TFBS (motif) matching by adding TF-associated subtype information
• Extension of the method to high-throughput data (e.g. ChIP-Seq, Protein Binding Microarrays)
• Integration of other information to enhance TF-TFBS prediction
• Incorporation of 3D homology modeling to better model protein-DNA interactions
• Analysis of regulatory mechanisms with other data, e.g. allele-specific mRNA data, to reveal more detailed regulatory mechanisms