CSCE555 Bioinformatics
Lecture 11: Promoter Prediction
Meeting: MW 4:00PM-5:15PM, SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering, 2008
www.cse.sc.edu
HAPPY CHINESE NEW YEAR
Outline
• Introduction to DNA Motif
• Motif Representations (Recap)
• Motif database search
• Algorithms for motif discovery
Search Space
N sequences, each of length L; motif width W
Size of search space = $(L - W + 1)^N$
For L = 100, W = 15, N = 10: size = $86^{10} \approx 2.2 \times 10^{19}$
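The search-space count can be checked numerically. This is a minimal Python sketch; the function name `search_space_size` is illustrative and does not come from the lecture.

```python
def search_space_size(num_seqs: int, seq_len: int, width: int) -> int:
    """Number of ways to place one motif of the given width in every sequence."""
    # Each sequence offers (L - W + 1) possible start positions,
    # and the choices in different sequences are independent.
    positions_per_seq = seq_len - width + 1
    return positions_per_seq ** num_seqs


size = search_space_size(num_seqs=10, seq_len=100, width=15)
print(f"{size:.1e}")  # 2.2e+19
```

Even for these modest inputs, the count is far too large to enumerate exhaustively.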
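Worked Example
For N aligned sites of width W, let $c_{ik}$ be the count of letter $k \in \{a, c, g, t\}$ in motif column $i$, and let $p_k$ be the background frequency of $k$:
$$\text{score} = \sum_{i=1}^{W} \ln \frac{3!\,\prod_{k} c_{ik}!}{(N+3)!\,\prod_{k} p_k^{c_{ik}}}$$
Counts (W = 4 columns, N = 4 sites):
      1  2  3  4
  a   0  2  0  3
  c   4  0  2  1
  g   0  1  2  0
  t   0  1  0  0
With N = 4 and $p_k = 1/4$, every column has $\prod_k p_k^{c_{ik}} = (1/4)^4 = 1/256$. Also $3! = 6$ and $(N+3)! = 7! = 5040$.
Column 1, for example: $\ln \frac{3!\,4!}{7! \cdot (1/256)} = \ln \frac{256}{35} \approx 1.99$
Score = 1.99 - 0.50 + 0.20 + 0.60 = 2.29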
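The worked-example score can be reproduced in a few lines. This is a minimal Python sketch that assumes the column formula above with a uniform background. The names `column_score` and `motif_score` are illustrative, and factorials are computed through `math.lgamma` so that large N does not overflow.

```python
from math import lgamma, log


def column_score(counts, background=(0.25, 0.25, 0.25, 0.25)):
    """Score of one motif column.

    counts: (a, c, g, t) counts in the column; background: p_k for the same letters.
    Returns ln[ 3! * prod_k c_k! / ((N + 3)! * prod_k p_k**c_k) ].
    """
    n = sum(counts)
    # lgamma(x + 1) == ln(x!), which avoids overflow for large N.
    log_numerator = lgamma(4) + sum(lgamma(c + 1) for c in counts)
    log_denominator = lgamma(n + 4) + sum(c * log(p) for c, p in zip(counts, background))
    return log_numerator - log_denominator


def motif_score(columns, background=(0.25, 0.25, 0.25, 0.25)):
    """Total score: the sum of the column scores over the motif width W."""
    return sum(column_score(col, background) for col in columns)


# The worked example, one (a, c, g, t) tuple per motif column.
columns = [(0, 4, 0, 0), (2, 0, 1, 1), (0, 2, 2, 0), (3, 1, 0, 0)]
print([round(column_score(col), 2) for col in columns])  # [1.99, -0.5, 0.2, 0.6]
print(round(motif_score(columns), 2))                     # 2.3
```

Summing the unrounded column scores gives about 2.30. The slide's 2.29 is the sum of the rounded terms.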
Gibbs Sampling Search
(Figure: a 2D search space with dimensions 1 and 2 and a starting point X.)
Suppose the search space is a 2D rectangle. (Typically, more than 2 dimensions!)
Start at a random point X.
Randomly pick a dimension.
Look at all points along this dimension.
Move to one of them randomly, proportional to its score π.
Repeat.
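The generic procedure can be sketched on a small 2D grid. This minimal Python sketch assumes a hypothetical score surface `pi` with strictly positive entries, because `random.choices` rejects all-zero weights. It picks a dimension at random on each step (random-scan Gibbs sampling), and the function name `gibbs_2d` is illustrative.

```python
import random
from collections import Counter


def gibbs_2d(pi, steps, seed=0):
    """Random-scan Gibbs sampling on a 2D grid of strictly positive scores pi[x][y]."""
    rng = random.Random(seed)
    nx, ny = len(pi), len(pi[0])
    x, y = rng.randrange(nx), rng.randrange(ny)  # start at a random point X
    visited = []
    for _ in range(steps):
        if rng.random() < 0.5:
            # Dimension 1: look at every point (i, y) and move proportional to pi.
            x = rng.choices(range(nx), weights=[pi[i][y] for i in range(nx)])[0]
        else:
            # Dimension 2: look at every point (x, j) and move proportional to pi.
            y = rng.choices(range(ny), weights=[pi[x][j] for j in range(ny)])[0]
        visited.append((x, y))
    return visited


# Hypothetical score surface with a single peak at (1, 1).
pi = [[1, 1, 1],
      [1, 20, 1],
      [1, 1, 1]]
visits = Counter(gibbs_2d(pi, steps=5000))
print(visits.most_common(1))
```

In the long run, each point is visited with a frequency proportional to π. Here the peak should be visited about 20/28 of the time.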
Gibbs Sampling for Motif Search
Choose a random starting state.
Randomly pick a sequence.
Look at all motif positions in this sequence.
Pick one randomly proportional to exp(score).
Repeat.
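The motif-search steps above can be sketched in Python. This is a minimal sketch, not a published Gibbs sampler implementation. It assumes the Worked Example score with a uniform background (p = 1/4), sequences containing only a/c/g/t, and exactly one site per sequence. For clarity rather than speed, it rescores the whole alignment for every candidate position. The function names are illustrative.

```python
import random
from math import exp, lgamma, log

BASES = "acgt"


def alignment_score(seqs, starts, width, p=0.25):
    """Worked-example score of the motif sites at `starts`, uniform background p."""
    total = 0.0
    n = len(seqs)
    for i in range(width):
        column = [s[start + i] for s, start in zip(seqs, starts)]
        counts = [column.count(b) for b in BASES]
        # ln[3! prod c_k! / ((n+3)! p^n)], since the counts sum to n.
        total += (lgamma(4) + sum(lgamma(c + 1) for c in counts)
                  - lgamma(n + 4) - n * log(p))
    return total


def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Return one motif start position per sequence after `iterations` Gibbs updates."""
    rng = random.Random(seed)
    seqs = [s.lower() for s in seqs]
    # Random starting state: one motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        k = rng.randrange(len(seqs))                   # randomly pick a sequence
        positions = range(len(seqs[k]) - width + 1)    # all motif positions in it
        scores = []
        for pos in positions:
            starts[k] = pos
            scores.append(alignment_score(seqs, starts, width))
        top = max(scores)
        # exp(score - top) is proportional to exp(score) but cannot overflow.
        weights = [exp(s - top) for s in scores]
        starts[k] = rng.choices(positions, weights=weights)[0]
    return starts


# Hypothetical sequences, each containing "tgcatgca" somewhere.
seqs = ["gatcctgcatgcaagtt", "ttgcatgcacgatcaga",
        "cagtcagttgcatgcat", "tgcatgcagtcaacgtc"]
starts = gibbs_motif_search(seqs, width=8, iterations=300, seed=1)
print([s[p:p + 8] for s, p in zip(seqs, starts)])
```

Like any stochastic search, a single run can get stuck. In practice, several runs with different seeds are compared.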
Does it Work in Practice?
• Only successful cases get published!
• Seems more successful in microbes (bacteria & yeast) than in animals.
• The search algorithm seems to work quite well; the problem is the scoring scheme: real motifs often don't have higher scores than you would find in random sequences by chance, i.e. the needle looks like hay.
• Attempts to deal with this:
◦ Assume the motif is an inverted palindrome (they often are).
◦ Only analyze sequence regions that are conserved in another species (e.g. human vs. mouse).
• As usual, repetitive sequences cause problems.
• More powerful algorithm: MEME
1. Go to our MEME server:
http://molgen.biol.rug.nl/meme/website/meme.html
2. Fill in your email address and a description of the sequences.
3. Open the FASTA-formatted file you just saved with Genome2d (click "Browse").
4. Select the number of motifs, the number of sites, and the optimum width of the motif.
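If your sequences come from somewhere other than Genome2d, you can produce a FASTA file for the upload form with a few lines of Python. This is a minimal sketch. The function name `to_fasta`, the record names, and the sequences are hypothetical, and the 60-character line width is just a common convention.

```python
def to_fasta(records, line_width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping long sequences."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    return "\n".join(lines) + "\n"


# Hypothetical upstream regions; replace with your own promoter sequences.
records = [("promoter_1", "ttgacagcttatcgatgcatataatgcg"),
           ("promoter_2", "ttgactcaggatcggtaggcttataatc")]
print(to_fasta(records), end="")
```

To get a file to upload, write the returned text to disk, e.g. with `open("promoters.fasta", "w").write(...)`.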
RNA polymerase recognizes promoter sequences located very close to, and on the 5’ side ("upstream") of, the initiation site.
The RNA polymerase complex binds directly to these, with no requirement for "transcription factors".
Prokaryotic promoter sequences are highly conserved.
(Figure: prokaryotic promoter with the -10 region and the -35 region labeled.)
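Because the -35 and -10 regions are conserved, simple pattern scanning is feasible in bacteria. This is a minimal Python sketch. It assumes the E. coli σ70 consensus boxes TTGACA (-35) and TATAAT (-10), a spacer of 15 to 19 bp, at most one mismatch per box, and the forward strand only. These parameters are illustrative choices, not values from the lecture, and real promoters rarely match both boxes perfectly.

```python
def mismatches(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_boxes(seq, box35="TTGACA", box10="TATAAT",
                        max_mm=1, spacers=range(15, 20)):
    """Return (start_35, start_10, spacer) for box pairs with <= max_mm mismatches each."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        if mismatches(seq[i:i + len(box35)], box35) > max_mm:
            continue
        for gap in spacers:
            j = i + len(box35) + gap          # candidate start of the -10 box
            if j + len(box10) > len(seq):
                break                         # longer spacers run off the end too
            if mismatches(seq[j:j + len(box10)], box10) <= max_mm:
                hits.append((i, j, gap))
    return hits


# Synthetic sequence: consensus boxes separated by a 17 bp spacer.
seq = "GGG" + "TTGACA" + "G" * 17 + "TATAAT" + "GGG"
print(find_promoter_boxes(seq))  # [(3, 26, 17)]
```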
What signals are there? Complex ones in eukaryotes
Eukaryotic genes are transcribed by 3 different RNA polymerases, which recognize different types of promoters & enhancers.
Eukaryotic promoters & enhancers
• Promoters are located "relatively" close to the initiation site (but can be located within the gene, rather than upstream!)
• Enhancers are also required for regulated transcription (these control expression in specific cell types, at specific developmental stages, and in response to the environment)
RNA polymerase complexes do not specifically recognize promoter sequences directly
Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes
Eukaryotic transcription factors
• Transcription factors (TFs) are DNA-binding proteins that also interact with the RNA polymerase complex to activate or repress transcription
• TFs recognize specific short DNA sequence motifs ("transcription factor binding sites")
◦ Several databases for these, e.g. TRANSFAC http://www.generegulation.com/cgibin/pub/databases/transfac
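Scanning a sequence with a weight matrix is the basic operation behind searches for TF binding sites using databases such as TRANSFAC. This is a minimal Python sketch. It assumes log-odds scoring with a pseudocount of 1, a uniform background, the forward strand only, and sequences containing only A/C/G/T. The matrix reuses the counts from the Worked Example rather than a real TRANSFAC entry, and the function names are illustrative.

```python
from math import log

# Count matrix from the Worked Example table (one dict per motif column),
# used here only as a stand-in for a real TF binding-site matrix.
COUNTS = [
    {"A": 0, "C": 4, "G": 0, "T": 0},
    {"A": 2, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 2, "G": 2, "T": 0},
    {"A": 3, "C": 1, "G": 0, "T": 0},
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert column counts to log-odds scores ln(f_k / p_k), with pseudocounts."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: log((column[b] + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix


def scan(seq, matrix):
    """Score every window of the sequence (forward strand, A/C/G/T only)."""
    seq = seq.upper()
    width = len(matrix)
    return [(i, sum(col[b] for col, b in zip(matrix, seq[i:i + width])))
            for i in range(len(seq) - width + 1)]


pwm = log_odds(COUNTS)
best = max(scan("TTTTCAGATTTT", pwm), key=lambda hit: hit[1])
print(best[0], round(best[1], 2))  # 4 2.42
```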
• For comparative (phylogenetic) methods:
◦ Must choose appropriate species
◦ Different genomes evolve at different rates
◦ Classical alignment methods have trouble with translocations and inversions in the order of functional elements
◦ If the entire region is highly conserved (high background conservation), comparison is useless
◦ Not enough data (Prokaryotes >>> Eukaryotes)
• Biology is complex: many (most?) regulatory elements are not conserved across species!
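Comparative (phylogenetic footprinting) methods often summarize a cross-species alignment as percent identity in sliding windows and flag the windows above a threshold. If every window clears the threshold, the comparison cannot single out regulatory elements; this is the background-conservation problem listed above. This is a minimal Python sketch. The window size, the 75% threshold, and the toy "human"/"mouse" alignment are all hypothetical.

```python
def window_identity(aln_a, aln_b, window=20):
    """Percent identity of two gapped, aligned sequences in each sliding window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    profile = []
    for start in range(len(aln_a) - window + 1):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        # A column counts as identical only if both residues match and neither is a gap.
        same = sum(a == b and a != "-" for a, b in pairs)
        profile.append((start, 100.0 * same / window))
    return profile


def conserved_starts(profile, threshold=75.0):
    """Window start positions whose identity reaches the threshold."""
    return [start for start, pct in profile if pct >= threshold]


# Toy alignment (hypothetical): only the ends are conserved.
human = "ACGTACGTAC"
mouse = "ACGTTTTTAC"
profile = window_identity(human, mouse, window=4)
print([pct for _, pct in profile])  # [100.0, 75.0, 50.0, 25.0, 25.0, 50.0, 75.0]
print(conserved_starts(profile))    # [0, 1, 6]
```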
3: Promoter Prediction: Co-expression based algorithms
• Alignments of co-regulated genes should highlight elements involved in regulation
Problems:
• Need sets of co-regulated genes
• Genes experimentally determined to be co-regulated (using microarrays??). Careful: how do we determine co-regulation?
Algorithms: MEME, AlignACE, PhyloCon
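One common way to pick candidate co-regulated genes from microarray data is to group genes with highly correlated expression profiles. Correlated expression is only a proxy: it does not prove co-regulation. This is a minimal Python sketch using the Pearson correlation. The gene names, expression values, and the 0.9 cutoff are hypothetical, and each profile is assumed to be non-constant (a constant profile would divide by zero).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_with(gene, profiles, min_r=0.9):
    """Genes whose profile correlates with `gene` at r >= min_r."""
    ref = profiles[gene]
    return sorted(other for other, prof in profiles.items()
                  if other != gene and pearson(ref, prof) >= min_r)


# Hypothetical expression levels across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.0, 1.2, 0.9],
}
print(coexpressed_with("geneA", profiles))  # ['geneB']
```

The promoters of the genes returned would then be the input set for MEME or AlignACE.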
Examples of promoter prediction/characterization software
MATCH, MatInspector
TRANSFAC
MEME & MAST
BLAST, etc.
Others?
FIRST EF
Dragon Promoter Finder (these are links in PPTs)
Also see Dragon Genome Explorer (has specialized promoter software for GC-rich DNA, finding CpG islands, etc.)
JASPAR
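The following is not Dragon Genome Explorer's algorithm. It is a common sliding-window heuristic for flagging CpG islands, with thresholds that are widely used: a window of 200 bp, GC content of at least 50%, and an observed/expected CpG ratio of at least 0.6. This is a minimal Python sketch, and the thresholds are adjustable assumptions.

```python
def cpg_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows meeting common CpG-island thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        gc_fraction = (c + g) / window
        # Observed/expected CpG ratio = (#CpG * window length) / (#C * #G).
        obs_exp = w.count("CG") * window / (c * g) if c and g else 0.0
        if gc_fraction >= min_gc and obs_exp >= min_obs_exp:
            hits.append(start)
    return hits


# Synthetic sequence: 300 bp of CG repeats followed by 300 bp of AT repeats.
seq = "CG" * 150 + "AT" * 150
print(cpg_windows(seq, step=100))  # [0, 100, 200]
```

Overlapping hit windows would normally be merged into islands. That step is omitted here.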
TRANSFAC matrix entry for TATA box
Fields:
• Accession & ID
• Brief description
• TFs associated with this entry
• Weight matrix
• Number of sites used to build (How many here?)
• Other info
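If a matrix stores raw counts and each site contributes one base per position, the "number of sites used to build" it equals the sum of any column. Column information content is a related summary of how strongly each position is conserved. This is a minimal Python sketch, assuming a raw-count matrix, a uniform background, and no small-sample correction. The counts are the Worked Example table, not the actual TRANSFAC TATA-box matrix, and the function name is illustrative.

```python
from math import log2


def matrix_summary(rows):
    """Sites per column and information content (bits) for a base -> counts matrix.

    Assumes raw counts, a uniform background and no small-sample correction.
    """
    width = len(rows["A"])
    columns = [[rows[b][i] for b in "ACGT"] for i in range(width)]
    sites = [sum(col) for col in columns]
    info = []
    for col, n in zip(columns, sites):
        entropy = -sum((c / n) * log2(c / n) for c in col if c > 0)
        info.append(2.0 - entropy)  # 2 bits = maximum for a 4-letter alphabet
    return sites, info


# Illustrative counts (the Worked Example table), not a real TRANSFAC entry.
rows = {"A": [0, 2, 0, 3], "C": [4, 0, 2, 1], "G": [0, 1, 2, 0], "T": [0, 1, 0, 0]}
sites, info = matrix_summary(rows)
print(sites)                        # [4, 4, 4, 4]
print([round(x, 2) for x in info])  # [2.0, 0.5, 1.0, 1.19]
```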
Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS)
Check out optional review & try associated tutorial:
Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html
D Dobbs ISU - BCB 444/544X: Promoter Prediction (really!)
Check this out: http://www.phylofoot.org/NRG_testcases/
Annotated lists of promoter databases & promoter prediction software
• URLs from Mount Chp 9, available online: Table 9.12 http://www.bioinformaticsonline.org/links/ch_09_t_2.html