http://cs173.stanford.edu [BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 15: TF Motifs (Harendra)

Dec 14, 2015

Angelica Bilson
Page 1:

CS173

Lecture 15: TF Motifs (Harendra)

Page 2:

• Project milestones due Today


Announcements

Page 3:

Review: Transcriptional regulation of genes

Figure labels: transcription start site (TSS); enhancer (cis-regulatory module, CRM).

Thousands of transcription factor-CRM interactions control gene expression in each cell type.

Page 4:

Last time: ChIP-seq, a first glimpse of the regulatory genome in action

Cis-regulatory peak


Peak Calling

Page 5:

Figure legend: gene transcription start site; SRF binding ChIP-seq peak; ontology term (e.g. ‘actin cytoskeleton’)


Last Time: Infer functions of ChIP-seq binding profile using GREAT

GREAT = Genomic Regions Enrichment of Annotations Tool

n = 6 genomic regions (SRF ChIP-seq peaks)
k = 5 genomic regions hit the annotation
p = 0.33 of the genome is annotated with the ontology term

P = Pr_binom(k ≥ 5 | n = 6, p = 0.33)
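The binomial tail probability on this slide can be reproduced directly. This is a minimal Python sketch that assumes only the three numbers given on the slide (n, k and p); it does not reproduce how GREAT itself measures p, the fraction of the genome associated with a term.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Slide example: n = 6 peaks, k = 5 of them hit the annotation,
# and the annotation covers p = 0.33 of the genome.
p_value = binomial_upper_tail(k=5, n=6, p=0.33)
print(f"P = {p_value:.4f}")  # prints P = 0.0170
```

Five hits out of six regions against a 33% background gives P ≈ 0.017.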

Page 6:

GREAT gives you a table of functions

Top GREAT enrichments of SRF (ontology: term; # genes; binomial p-value; experimental support*)

Gene Ontology: actin cytoskeleton; 30 genes; 7x10^-9; Miano et al. 2007
Gene Ontology: actin binding; 31 genes; 5x10^-5; Miano et al. 2007
Pathway Commons: TRAIL signaling; 32 genes; 5x10^-7; Bertolotto et al. 2000
Pathway Commons: Class I PI3K signaling; 26 genes; 2x10^-6; Poser et al. 2000
TreeFam: FOS gene family; 1x10^-85; Chai & Tarnawski 2002
TF Targets: Targets of SRF (5x10^-76), Targets of GABP (4x10^-9), Targets of YY1 (1x10^-6), Targets of EGR1 (2x10^-4); support: positive control (SRF), ChIP-seq support, Natesan & Gilman 1995

* Known from the literature: the function is known and SOME of the genes are known, but the binding sites highlighted are NOT.

Page 7:

Figure legend: gene transcription start site; SRF binding ChIP-seq peak; ontology term (e.g. ‘actin binding’)


Last Time: Infer functions of ChIP-seq binding profile using GREAT

GREAT = Genomic Regions Enrichment of Annotations Tool

n = 6 genomic regions (SRF ChIP-seq peaks)
k = 4 genomic regions hit the annotation
p = 0.5 of the genome is annotated with the ontology term

P = Pr_binom(k ≥ 4 | n = 6, p = 0.5)
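Worked by hand, the second example shows that 4 hits out of 6 are unremarkable when half the genome carries the annotation. With p = 0.5 all 2^6 = 64 hit/miss patterns of the six regions are equally likely, so the tail probability is simply a count of patterns:

$$
\Pr_{\text{binom}}(k \ge 4 \mid n = 6,\ p = 0.5)
  = \sum_{i=4}^{6} \binom{6}{i} (0.5)^{6}
  = \frac{15 + 6 + 1}{64}
  = \frac{22}{64} \approx 0.34
$$

A probability this large is far from significant.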


Page 10:

But doing the experiment is the hard part!

• Hard or impossible to get the required cells
  • Some cells don’t occur in enough quantity to ChIP
  • Others are hard to dissect
  • Certain human tissues are hard to obtain
• Hard to get a good antibody
  • Example: we have ChIP results for a factor in brain, but we have not been able to repeat the experiment since we can’t find the same antibody
• Lots of time and money to do one experiment
• Only information for one context (cell type or time)

Can we computationally predict the binding sites for many contexts and factors?

Page 11:

Recall: TFBS Position Weight Matrix (PWM)

Alignment (count) matrix:
A    9    0    0    1    0    8    0    0
C    0    1    1    1    7    0    3    0
G    0    2    7    8    1    2    0    8
T    1    7    2    0    2    0    7    2

Frequency weight matrix:
A  0.9  0.0  0.0  0.1  0.0  0.8  0.0  0.0
C  0.0  0.1  0.1  0.1  0.7  0.0  0.3  0.0
G  0.0  0.2  0.7  0.8  0.1  0.2  0.0  0.8
T  0.1  0.7  0.2  0.0  0.2  0.0  0.7  0.2

Consensus: A T G G C A T G

Experimentally determined sites:
ATGGCATG
AGGGTGCG
ATCGCATG
TTGCCACG
ATGGTATT
ATTCGACG
AGGGCGTT
ATGACATG
ATGGCATG
ACTGGATG

Can we use a PWM to predict where the TF will bind in the genome (without doing ChIP-seq)?

Page 12:

Binding Site Prediction using Match

Problem: High number of false positives.
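A genome scan with a PWM can be sketched as a sliding-window log-odds score. This is a generic illustration, not the scoring used by Match: the pseudocount (0.5), the uniform 0.25 background, the forward-strand-only scan and the threshold are all assumptions chosen for the example. It reuses the count matrix from the PWM slide.

```python
import math

# Count matrix from the PWM slide (rows A, C, G, T; 8 motif columns).
COUNTS = {
    "A": [9, 0, 0, 1, 0, 8, 0, 0],
    "C": [0, 1, 1, 1, 7, 0, 3, 0],
    "G": [0, 2, 7, 8, 1, 2, 0, 8],
    "T": [1, 7, 2, 0, 2, 0, 7, 2],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2(p / background) scores; pseudocounts avoid log(0)."""
    width = len(counts["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        for base in "ACGT":
            p = (counts[base][i] + pseudocount) / total
            pwm[base].append(math.log2(p / background))
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in pwm for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
print(scan("GGATGGCATGCC", pwm, threshold=8.0))  # prints [(2, 'ATGGCATG', 11.46)]
```

Every position of a genome is a candidate window, so even a fairly strict threshold leaves many chance hits for a short motif, which is the false-positive problem on this slide.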

Page 13:

Recall: TFBS Position Weight Matrix (PWM)


Information content of each column of the PWM:
1.2  0.7  0.7  0.7  0.6  1.0  0.8  1.0

Information content of a motif = sum over all columns = 1.2 + 0.7 + 0.7 + 0.7 + 0.6 + 1.0 + 0.8 + 1.0 = 6.7
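The frequency matrix, consensus and per-column information content can all be computed from the count matrix. This sketch uses the textbook definition IC = 2 − H (uniform 0.25 background, entropy H in bits, no pseudocounts). Under that definition the columns come out higher than the slide's values (1.53 bits rather than 1.2 for column 1, about 8.8 bits in total), so the slide presumably applies a correction it does not spell out.

```python
import math

# Count matrix from the PWM slide (rows A, C, G, T; 8 motif columns).
COUNTS = {
    "A": [9, 0, 0, 1, 0, 8, 0, 0],
    "C": [0, 1, 1, 1, 7, 0, 3, 0],
    "G": [0, 2, 7, 8, 1, 2, 0, 8],
    "T": [1, 7, 2, 0, 2, 0, 7, 2],
}


def frequencies(counts):
    """Column-normalise counts into a frequency weight matrix."""
    width = len(counts["A"])
    freqs = {base: [] for base in "ACGT"}
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT")
        for base in "ACGT":
            freqs[base].append(counts[base][i] / total)
    return freqs


def consensus(freqs):
    """Most frequent base in each column."""
    width = len(freqs["A"])
    return "".join(max("ACGT", key=lambda b: freqs[b][i]) for i in range(width))


def column_information(freqs):
    """IC per column = 2 - entropy (bits), using the convention 0 * log2(0) = 0."""
    width = len(freqs["A"])
    ics = []
    for i in range(width):
        probs = [freqs[b][i] for b in "ACGT"]
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        ics.append(2.0 - entropy)
    return ics


freqs = frequencies(COUNTS)
ics = column_information(freqs)
print(consensus(freqs))            # prints ATGGCATG
print([round(x, 2) for x in ics])  # prints [1.53, 0.84, 0.84, 1.08, 0.84, 1.28, 1.12, 1.28]
print(round(sum(ics), 2))          # prints 8.81
```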

Page 14:

Information content is a measure of motif specificity

Example motifs: SRF, REST and SPIB (IC ~ 12, ~ 5 and ~ 25 bits)

How do these compare to a library of many PWMs?

Page 15:

PWMs have a range of information content

Figure: information content across a library of many PWMs, with SRF, REST and SPIB marked.


Page 17:

Information content determines how accurately we can predict the binding site

• Information content is a measure of motif specificity
• Scanning the genome with the SRF motif gives about 2 million matches, but ChIP-seq and other estimates suggest ≈ 10,000 actual binding sites
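A back-of-the-envelope check on the 2 million figure, assuming the common rule of thumb that a motif carrying IC bits matches random sequence about once every 2^IC positions per strand. The genome length (3.1 billion bases) and the SRF information content (about 12 bits, the value the slides appear to give for SRF) are assumptions for illustration.

```python
def expected_chance_matches(genome_length: float, ic_bits: float, strands: int = 2) -> float:
    """Rule-of-thumb count of chance motif matches: one per 2**ic_bits positions per strand."""
    return genome_length * strands / 2**ic_bits


hits = expected_chance_matches(3.1e9, 12)
print(f"{hits:,.0f}")  # prints 1,513,672
print(f"{hits / 10_000:.0f}x the ~10,000 sites suggested by ChIP-seq")
```

The estimate lands in the same range as the slide's 2 million matches, roughly two orders of magnitude more than the number of real sites.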


Can we do better?

Page 18:

Use excess conservation to improve prediction accuracy

Aaron Shoa

Wenger et al., PRISM offers a comprehensive genomic approach to transcription factor function prediction. 2013

Page 19:

Use shuffled motifs to calculate confidence of excess conservation binding site prediction


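Figure: fraction conserved vs. branch length (subst / site), for the real motif and for shuffled versions of it.

Confidence is the fraction conserved in excess.

Example: total = 0.32 (fraction of real-motif instances conserved), excess = 0.12 (the part above the shuffled-motif level)
confidence = excess / total = 0.12 / 0.32 ≈ 0.38

Workflow:
Transcription factor motif → genome-wide binding site predictions
10 shuffled transcription factor motifs → genome-wide binding site predictions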
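The confidence calculation can be sketched as follows. The inputs are assumptions for illustration: each predicted instance carries a conservation score (the slide measures conservation as a branch length in substitutions per site), the background is the mean over the shuffled motifs, and a negative excess is clamped to zero.

```python
from statistics import mean


def fraction_conserved(scores, threshold):
    """Fraction of motif instances whose conservation score is >= threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def confidence(real_scores, shuffled_score_sets, threshold):
    """(total - background) / total, where background averages the shuffled motifs."""
    total = fraction_conserved(real_scores, threshold)
    background = mean(fraction_conserved(s, threshold) for s in shuffled_score_sets)
    if total == 0:
        return 0.0
    excess = max(0.0, total - background)
    return excess / total


# Toy data: 32% of real instances pass the threshold; the shuffles average 20%.
real = [2.0] * 32 + [0.5] * 68
shuffles = [[2.0] * 18 + [0.5] * 82, [2.0] * 22 + [0.5] * 78]
print(round(confidence(real, shuffles, threshold=1.5), 3))  # prints 0.375
```

With total = 0.32 and excess = 0.12 this reproduces the slide's example.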

Page 20:

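Probabilistic interpretation

• Confidence is the probability that a motif instance is functional given its observed conservation.

Notation: R = real motif, S = average shuffled motif, F = functional, C = conservation of a motif instance (branch length, subst / site). Example from the plot: Pr_R(C ≥ 1.5) = 0.2.

$$
\begin{aligned}
\Pr_R(F \mid C \ge c)
  &= 1 - \Pr_R(\lnot F \mid C \ge c) \\
  &= 1 - \frac{\Pr_R(C \ge c \mid \lnot F)\,\Pr_R(\lnot F)}{\Pr_R(C \ge c)} \\
  &= 1 - \frac{\Pr_S(C \ge c)\,\Pr_R(\lnot F)}{\Pr_R(C \ge c)} \\
  &= \frac{\Pr_R(C \ge c) - \Pr_S(C \ge c)\,\Pr_R(\lnot F)}{\Pr_R(C \ge c)} \\
  &\approx \frac{\Pr_R(C \ge c) - \Pr_S(C \ge c)}{\Pr_R(C \ge c)}
   = \frac{\text{excess}}{\text{total}}
\end{aligned}
$$

The second line is Bayes' rule. The third line assumes that non-functional instances of the real motif are conserved like instances of shuffled motifs, Pr_R(C ≥ c | ¬F) = Pr_S(C ≥ c). The approximation assumes most instances are not functional, Pr_R(¬F) ≈ 1; since Pr_R(¬F) ≤ 1, it can only underestimate the confidence.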

Page 21:

Excess conservation score defined by genomic background


Page 22:

Excess conservation score also defined by motif


Page 23:

ARE THE PREDICTIONS ANY GOOD?

Perform genome-wide binding site predictions…


Page 24:

Use ChIP-seq overlap as a measure of sensitivity

Genome-wide binding site predictions for one factor (e.g. E2F4) are compared with ChIP-seq for the same factor (e.g. E2F4).

Sensitivity = overlapping ChIP-seq peaks / total ChIP-seq peaks

But how do you assess whether your overlap is good? Compare to the best tool out there (or to all the tools, if there is no “best”).
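The sensitivity measure reduces to an interval-overlap count. A minimal sketch, assuming peaks and predicted sites are (chromosome, start, end) tuples with half-open coordinates (as in BED files) and that an overlap of at least one base counts; real pipelines usually rely on dedicated interval tools such as BEDTools instead.

```python
from bisect import bisect_right
from collections import defaultdict


def sensitivity(peaks, predictions):
    """Fraction of ChIP-seq peaks overlapped by at least one predicted binding site.

    Both inputs are lists of (chrom, start, end) with half-open [start, end) coordinates.
    """
    if not peaks:
        return 0.0
    # Index predicted sites per chromosome, sorted by start.
    by_chrom = defaultdict(list)
    for chrom, start, end in predictions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, sites in by_chrom.items():
        sites.sort()
        longest = max(end - start for start, end in sites)
        index[chrom] = ([start for start, _ in sites], sites, longest)

    overlapping = 0
    for chrom, p_start, p_end in peaks:
        if chrom not in index:
            continue
        starts, sites, longest = index[chrom]
        # A site can only overlap if it starts after p_start - longest and before p_end.
        i = bisect_right(starts, p_start - longest)
        while i < len(sites) and sites[i][0] < p_end:
            if sites[i][1] > p_start:
                overlapping += 1
                break
            i += 1
    return overlapping / len(peaks)


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50), ("chr3", 10, 20)]
predictions = [("chr1", 190, 198), ("chr1", 650, 658), ("chr2", 45, 53)]
print(sensitivity(peaks, predictions))  # prints 0.5
```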

Page 25:

Excess conservation binding site prediction is more accurate than existing methods


(prior state of the art)

Page 26:

Excess conservation captures binding site profile similar to ChIP-seq

Figure tracks: ChIP-seq, MotifMap, PRISM, conservation (% identity)


Page 27:

• Now we have good genome-wide binding site predictions for many factors

Let's submit them to GREAT and find out what they are doing…

Submit predictions to GREAT

Page 28:

Comparing binding site prediction to ChIP-seq

Format: ontology: top-ranked biological context (GREAT rank for ChIP-seq; experimental support)

GABPA
  GO Biological Process: translation (2; Genuario and Perry, 1996)
  GO Cellular Component: membrane coat (14; Novel)
  GO Molecular Function: translation initiation factor activity (4; Genuario and Perry, 1996)
  Mouse Phenotypes: increased single-positive T cell number (None; Yu et al., 2010)
  PANTHER Pathway: general transcription by RNA polymerase I (1; Hauck et al., 2002)
  Pathway Commons: transcription (3; Hauck et al., 2002)

REST (NRSF)
  GO Biological Process: neurotransmitter transport (1; Schoenherr et al., 1996)
  GO Cellular Component: neuronal cell body (None; Schoenherr et al., 1996)
  GO Molecular Function: cation channel activity (1; Schoenherr et al., 1996)
  Mouse Phenotypes: abnormal synaptic transmission (1; Schoenherr et al., 1996)
  PANTHER Pathway: synaptic vesicle trafficking (2; Schoenherr et al., 1996)
  Pathway Commons: transmission across chemical synapses (3; Schoenherr et al., 1996)

SRF (in Jurkat)
  GO Biological Process: muscle structure development (None; Miano et al., 2007)
  GO Cellular Component: actin cytoskeleton (1; Miano et al., 2007)
  GO Molecular Function: structural constituent of muscle (None; Miano et al., 2007)
  Mouse Phenotypes: dilated heart ventricles (None; Parlakian et al., 2004)
  PANTHER Pathway: cytoskeletal regulation by Rho GTPase (None; Hill et al., 1995)
  Pathway Commons: regulation of insulin secretion by acetylcholine (None; Novel)

STAT3 (in mESC)
  GO Biological Process: negative regulation of signal transduction (None; Naka et al., 1997)
  GO Molecular Function: transforming growth factor beta binding (None; Kinjyo et al., 2006)
  Mouse Phenotypes: abnormal spleen B cell follicle morphology (None; Schmidlin et al., 2009)
  Pathway Commons: Signaling events mediated by TCPTP (None; Yamamoto et al., 2002)

Page 29:

PRISM re-discovers known functions

TF     Function                             p-value       Target genes
SRF    muscle structure development         7.43×10^-41   157
GLI2   skeletal system development          7.07×10^-48   192
CRX    retinal photoreceptor degeneration   1.30×10^-10   34
AR     abnormal spermiogenesis              1.19×10^-6    26

Is the number of re-discovered known functions impressive?

Page 30:

Evaluate re-discovery of known function using “closed loops”

How can we assess whether the functional associations predicted by PRISM for a particular TF are reasonable without reading a lot of papers? One way is to check whether the TF itself is annotated with the predicted function, forming a “closed loop”.

Example: PRISM associates SRF with genes involved in “muscle structure development”. Is SRF itself annotated with the term “muscle structure development”? Yes, so this is a “closed loop”.
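Counting closed loops only needs the predicted TF-term associations and each gene's own annotations. In this sketch the annotation dictionary is a hypothetical stand-in for a real source such as Gene Ontology gene annotations, and the toy entries are made up to mirror the SRF example.

```python
def closed_loops(associations, annotations):
    """Return the (tf, term) associations in which the TF gene itself carries the term.

    associations: iterable of (tf, term) pairs predicted for each TF
    annotations: dict mapping gene symbol -> set of ontology terms annotated to that gene
    """
    return [(tf, term) for tf, term in associations if term in annotations.get(tf, set())]


# Toy inputs (hypothetical annotation sets, for illustration only).
associations = [
    ("SRF", "muscle structure development"),
    ("SRF", "actin cytoskeleton"),
    ("GATA6", "abnormal pancreas development"),
]
annotations = {"SRF": {"muscle structure development"}, "GATA6": set()}
loops = closed_loops(associations, annotations)
print(loops)  # prints [('SRF', 'muscle structure development')]
print(f"closed loop fraction: {len(loops) / len(associations):.2f}")
```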

Page 31:

PRISM predictions are consistent with known transcription factor biology


Null model: how many closed loops do we get using 50,000 randomly shuffled PWM libraries?
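Comparing the observed number of closed loops against the shuffled libraries amounts to an empirical p-value. A minimal sketch using the common add-one convention (so the p-value is never exactly zero); the slide does not say which convention was used, and the null counts below are randomly generated placeholders.

```python
import random


def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value: (#null >= observed + 1) / (#null + 1)."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# Hypothetical null: closed-loop counts from 50,000 shuffled PWM libraries.
rng = random.Random(0)
null_counts = [rng.randint(20, 60) for _ in range(50_000)]
print(empirical_p_value(120, null_counts))  # no shuffle reaches 120, so 1 / 50001
```

With 50,000 libraries, the smallest p-value this convention can report is 1/50,001.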

Page 32:

Many non-closed loops are still true

1. Incomplete annotation
2. “Regulation of” annotation

TF      Function                          p-value        Target genes
GATA6   abnormal pancreas development     5.69×10^-13    23
SRF     actin cytoskeleton                4.84×10^-58    142

Nature Genetics, December 2011.

SRF acts in the nucleus, where it regulates actin cytoskeleton genes.

Page 33:

• Now we have good genome-wide binding site predictions for many factors

• AND we have functional predictions without ChIP-seq

Was it as easy as predicting binding sites and submitting the results to GREAT?

…not quite…

Raw GREAT results need cleaning for conserved TFBS

Page 34:

Shuffled motifs also give GREAT enrichments


Transcription factor motif → genome-wide binding site predictions → run GREAT and observe biological function
10 shuffled transcription factor motifs → genome-wide binding site predictions → run GREAT and observe biological function

Examine both sets of results closely, then filter them to obtain the PRISM results.

Page 35:

Shuffled motifs are used to create an “E-value” metric to blacklist enrichments that also show up for shuffled motifs

Stage 1: GREAT on binding site predictions
Stage 2: top significant GREAT terms
Stage 3: PRISM terms (via blacklisting)

# TF-term associations: 31,946 (Stage 1), 7,529 (Stage 2), 1,658 (Stage 3); PRISM keeps 5.2% of the GREAT predictions (kept = PRISM, obtained = GREAT)
TF-term FDR (estimated from shuffles): 50.5%, 49.5%, 16.4%; FDR improvement 308%
Closed loop %: 3.3%, 5.3%, 10.9%; fraction of closed loops improvement 329%
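The blacklisting idea can be sketched as a filter that drops any enrichment that shuffled motifs reproduce. This is a hypothetical stand-in, not PRISM's published E-value: here a term's score is simply the number of shuffled motifs whose GREAT p-value for that term is at least as small as the real motif's, and any term reached by a shuffle is removed. The term names and p-values are toy inputs.

```python
def shuffle_support(real_pvalues, shuffled_pvalues):
    """For each term, count shuffled motifs that match or beat the real p-value.

    real_pvalues: dict term -> p-value from GREAT on the real motif's predictions
    shuffled_pvalues: list with one dict (term -> p-value) per shuffled motif;
                      a term missing from a shuffle's dict is treated as p = 1
    """
    return {
        term: sum(1 for shuf in shuffled_pvalues if shuf.get(term, 1.0) <= p)
        for term, p in real_pvalues.items()
    }


def blacklist_filter(real_pvalues, shuffled_pvalues, max_support=0):
    """Keep only terms that at most max_support shuffled motifs reproduce."""
    support = shuffle_support(real_pvalues, shuffled_pvalues)
    return {t: p for t, p in real_pvalues.items() if support[t] <= max_support}


real = {"actin cytoskeleton": 1e-9, "neuron projection": 1e-6}
shuffles = [{"neuron projection": 1e-7}, {"actin cytoskeleton": 1e-3}, {}]
print(blacklist_filter(real, shuffles))  # prints {'actin cytoskeleton': 1e-09}
```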

What are all the terms we are throwing away?

Page 36:

GREAT enrichments from shuffles are due to conservation bias

GREAT enrichments (Venn diagram): shuffled motifs 2,488; random CNEs 2,279; 1,733 shared, 755 shuffle-only, 546 CNE-only.

• Create 10,000 random sets of random conserved non-coding regions (CNEs)
• Run GREAT
• How do the enrichments compare to those from shuffled motifs?

Pro: the E-value helps us get more accurate predictions by removing false predictions.
Con: the conservation bias filter causes us to lose potentially real enrichments in systems that are more often conserved.

Page 37:

• “Excess conservation”
  • advanced the state of the art for binding site prediction
• “PRISM pipeline”
  • combined accurate binding site prediction with GREAT
• Publicly offered as a web application
  • bejerano.stanford.edu/prism


So far…

Page 38:

The rest of the talk includes pre-publication work