Queensland University of Technology CRICOS No. 00213J Using a Beagle to sniff for Bacterial Promoters Stefan R. Maetschke, Michael Towsey and James M. Hogan Queensland University of Technology

Jan 12, 2016


Queensland University of Technology

CRICOS No. 00213J

Using a Beagle to sniff for Bacterial Promoters

Stefan R. Maetschke, Michael Towsey and James M. Hogan

Queensland University of Technology


An Agenda

• Bacterial Promoters
  – The domain and the motifs
  – Earlier approaches, including ours
• Why dumber is better
  – Not quite, but flexibility before sophistication
  – Exploiting new features as they are identified
• Results


Upstream from a Bacterial Gene

[Figure: promoter region upstream of a gene. Labels: promoter, RNA polymerase, TSS, transcription, GSS, gene]

• Search for ‘conserved’ -10 and -35 hexamers
  – Except they’re not really conserved
  – Plagued by massive false positive rates

• But this is the Reader’s Digest version
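To give a rough feel for why a bare hexamer search produces so many false positives, the sketch below computes how often a fixed hexamer would match by chance. It assumes independent, uniformly distributed bases, which real genomes do not satisfy, and the function names `p_match` and `expected_hits` are illustrative only, not part of Beagle.

```python
from math import comb

def p_match(length=6, max_mismatch=0, p_base=0.25):
    """Chance that a random site matches a fixed consensus of `length`
    bases with at most `max_mismatch` mismatches.
    Assumption: i.i.d. bases, each consensus base hit with prob. p_base."""
    return sum(
        comb(length, k) * (1 - p_base) ** k * p_base ** (length - k)
        for k in range(max_mismatch + 1)
    )

def expected_hits(seq_len, length=6, max_mismatch=0):
    """Expected number of chance matches in one strand of `seq_len` bases."""
    positions = seq_len - length + 1
    return positions * p_match(length, max_mismatch)

if __name__ == "__main__":
    # Exact matches are rare, but tolerating mismatches (needed because
    # the hexamers are poorly conserved) makes chance hits common.
    for k in range(4):
        print(k, round(p_match(6, k), 4), round(expected_hits(250, 6, k), 2))
```

Under these assumptions, allowing two mismatches already gives several chance hits per 250 bp region, so a weakly conserved hexamer on its own discriminates poorly.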


Previous Work

• Mainly in the E. coli system
• PWMs – simple, but poor discrimination
  – Good performance if compound structure used
  – (Collado-Vides et al.: state of the art pre-2006)
• HMMs – less successful than in eukaryotes
• TDNNs – boosted by GSS offset distribution
• SVMs – spectrum kernel ensemble
  – (Gordon et al. (us): state of the art, but at a price)

σ70


Beagle

• Principled and rapid inclusion of motifs as they are discovered or hypothesised
  – Prior to the Gordon et al. paper, a TP:FP ratio of 1:300 was considered good.
  – But this was based solely on -10 and -35 motifs
• A model description language and parser
  – Less sophisticated than it sounds, but sufficient

• Iterative refinement of the model


Upstream from a Bacterial Gene

[Figure: -35 element (TTGACA), -10 element (TATAAT), TSS, GSS (ATG)]

Core Enzyme:

Specific sigma controls binding at -10, -35 elements

But binding probability varies enormously

Compensate when hexamers are weak

“It has long been known that domains 2 and 4 … bind to the strongly conserved -10 and -35 boxes”. Except when they don’t because they aren’t…


Upstream from a Bacterial Gene

[Figure: -35 element (TTGACA), extended -10 element (TRTG, TATAAT), TSS, GSS (ATG)]

Simple extended -10: TG. Discovered in B. subtilis; found in 20% of promoters in E. coli.

-16 hypothesised to be important in E. coli; TRTG or T(A/G)TG consensus

But even the alpha units aren’t what they seem…


Upstream from a Bacterial Gene

[Figure: distal and proximal UP elements, -35 element, extended -10 element (TRTG at -16, TGTATAAT), TSS and GSS (ATG), with RNA polymerase domains CTD1, CTD2, NTD1, NTD2. Sequence label as extracted: TTGACAAAAAAARNRAWWWWWTTTTT]

CTDs are carboxy terminal domains, binding to UP elements

AT-rich region, proximal element more important


The Data

• E. coli and B. subtilis
• Confirmed TSS locations within 250 bp of the nearest gene start
  – No overlapping reading frames
• N = 492 (E. coli), 205 (B. subtilis)
• 250 bp USRs available


Beagle algorithm

• Define a consensus promoter
  – e.g. <TTGACA (15, 21) TATAAT (4, 13) TSS>
  – Ordered pairs specify gap ranges
• Parse the description and define PWMs and weighted gaps
  – Initially trivial

• Refine using the confirmed TSS locations
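As an illustration of the first two steps, here is a minimal sketch of parsing a description such as `<TTGACA (15, 21) TATAAT (4, 13) TSS>` and building "trivial" starting parameters. The slides do not show Beagle's actual grammar or data structures, so the `Motif`, `Gap` and `Anchor` types, the `strength` value and the uniform gap weights are all assumptions made for this example; only plain A/C/G/T consensus strings are handled.

```python
import re
from dataclasses import dataclass

@dataclass
class Motif:
    consensus: str      # e.g. "TTGACA"

@dataclass
class Gap:
    min_len: int        # shortest allowed spacer, in bases
    max_len: int        # longest allowed spacer, in bases

@dataclass
class Anchor:
    name: str           # e.g. "TSS"

# Either an ordered pair "(lo, hi)" or a word of letters.
TOKEN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)|([A-Za-z]+)")

def parse_pattern(description):
    """Turn '<TTGACA (15, 21) TATAAT (4, 13) TSS>' into a list of elements."""
    body = description.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("pattern must be enclosed in < >")
    elements = []
    for lo, hi, word in TOKEN.findall(body[1:-1]):
        if word:
            word = word.upper()
            elements.append(Anchor(word) if word == "TSS" else Motif(word))
        else:
            lo, hi = int(lo), int(hi)
            if lo > hi:
                raise ValueError("gap range must satisfy lo <= hi")
            elements.append(Gap(lo, hi))
    return elements

def initial_pwm(consensus, strength=0.85):
    """Trivial starting PWM: most probability mass on the consensus base.
    `strength` is an arbitrary illustrative value."""
    rest = (1.0 - strength) / 3
    return [{b: (strength if b == c else rest) for b in "ACGT"}
            for c in consensus]

def initial_gap_weights(gap):
    """Trivial starting gap model: every allowed length equally likely."""
    lengths = range(gap.min_len, gap.max_len + 1)
    return {n: 1.0 / len(lengths) for n in lengths}

if __name__ == "__main__":
    for element in parse_pattern("<TTGACA (15, 21) TATAAT (4, 13) TSS>"):
        print(element)
```

The refinement step then replaces these starting values with estimates from the confirmed TSS locations.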


Beagle algorithm

• For each USR in the training set:
  – Anchor the pattern to the known TSS location
  – Determine the best match based on the current model
• Find the MLE of the model parameters based on the best matches from the training data.
• Test the refined definition on unseen data
  – 10 repeats x 10 fold cross validation
  – Essentially TSS prediction

• Iterate until improvement ceases.
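One refinement iteration of the loop above might look like the following sketch for the canonical two-box pattern. It is not the authors' implementation: the model layout (a dict with keys `pwm35`, `pwm10`, `gap1`, `gap2`), the log-probability scoring and the pseudocount are assumptions, sequences are assumed to contain only A, C, G and T, and at least one training sequence must yield a match. With a pseudocount the estimate is a smoothed maximum-likelihood estimate rather than a pure MLE.

```python
import math
from collections import Counter

BASES = "ACGT"

def log_score(pwm, site):
    """Sum of log-probabilities of the site's bases under the PWM."""
    return sum(math.log(col[b]) for col, b in zip(pwm, site))

def best_match(seq, tss, model):
    """Best placement of -35 and -10 boxes, anchored at index `tss`.
    Returns (score, start35, start10, gap1, gap2) or None."""
    pwm35, pwm10 = model["pwm35"], model["pwm10"]
    n35, n10 = len(pwm35), len(pwm10)
    best = None
    for g2, w2 in model["gap2"].items():        # -10 box to TSS
        s10 = tss - g2 - n10
        for g1, w1 in model["gap1"].items():    # -35 box to -10 box
            s35 = s10 - g1 - n35
            if s35 < 0:
                continue                        # pattern runs off the USR
            score = (log_score(pwm35, seq[s35:s35 + n35])
                     + log_score(pwm10, seq[s10:s10 + n10])
                     + math.log(w1) + math.log(w2))
            if best is None or score > best[0]:
                best = (score, s35, s10, g1, g2)
    return best

def estimate_pwm(sites, pseudo=1.0):
    """Column-wise base frequencies with a pseudocount."""
    total = len(sites) + 4 * pseudo
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudo) / total for b in BASES})
    return pwm

def estimate_gap(lengths, allowed, pseudo=1.0):
    """Frequencies of observed gap lengths over the allowed lengths."""
    counts = Counter(lengths)
    total = len(lengths) + pseudo * len(allowed)
    return {g: (counts[g] + pseudo) / total for g in allowed}

def refine(training, model):
    """One iteration; `training` is a list of (sequence, tss_index)."""
    hits = [(seq, best_match(seq, tss, model)) for seq, tss in training]
    hits = [(seq, m) for seq, m in hits if m is not None]
    n35, n10 = len(model["pwm35"]), len(model["pwm10"])
    return {
        "pwm35": estimate_pwm([s[m[1]:m[1] + n35] for s, m in hits]),
        "pwm10": estimate_pwm([s[m[2]:m[2] + n10] for s, m in hits]),
        "gap1": estimate_gap([m[3] for _, m in hits], list(model["gap1"])),
        "gap2": estimate_gap([m[4] for _, m in hits], list(model["gap2"])),
    }
```

Calling `refine` repeatedly, and checking accuracy on held-out folds after each call, gives the iterate-until-no-improvement loop described on the slide.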


TSS recognition (% accuracy)

Pattern                        E. coli         B. subtilis
Canonical -35, -10 boxes       37.5 ± 1.4 %    61.6 ± 1.8 %
Canonical + distance to GSS    43.3 ± 1.2 %    61.2 ± 1.7 %

Guess which promoter boxes are more strongly conserved…


Including UP elements

• NNW15NN
  – AT rich region
• NNAAAWWTWTTNNAAANNN
  – Estrem et al 1998
• NNAAAWWTWTTN – A6RNR
  – Gourse et al 2000
  – distal - proximal motif
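The UP-element patterns above use IUPAC ambiguity codes (N = any base, W = A or T, R = A or G). The sketch below shows one way to turn such a consensus into a regular expression. It assumes that a number after a letter is a repeat count, so that W15 means fifteen W positions and A6 means AAAAAA; the helper names `expand` and `to_regex` are illustrative, not Beagle's.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def expand(motif):
    """Expand repeat counts: 'NNW15NN' -> 'NN' + 'W' * 15 + 'NN'.
    Assumption: a number after a letter is a repeat count."""
    return re.sub(r"([A-Z])(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  motif)

def to_regex(motif):
    """Compile an IUPAC consensus into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in expand(motif)))

if __name__ == "__main__":
    print(expand("A6RNR"))             # AAAAAARNR
    print(len(expand("NNW15NN")))      # 19
    print(bool(to_regex("TRTG").fullmatch("TGTG")))  # True
```

A hard regular-expression match is stricter than the PWM scoring Beagle refines from such a consensus; it only illustrates what the notation denotes.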


TSS recognition (% accuracy)

Pattern                                         E. coli         B. subtilis
Canonical boxes + distance to GSS               43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + Estrem UP         41.4 ± 1.2 %    62.0 ± 1.7 %
Canonical + distance to GSS + AT rich region    47.3 ± 1.2 %    64.8 ± 1.8 %


Comparing E. coli and B. subtilis promoters

[Figure panels: B. subtilis -35 element, B. subtilis -10 element, E. coli -10 element, E. coli -35 element]

E. coli has 7 known sigmas; B. subtilis 18…


Motifs ‘in the Gap’

• Extended -10 element
  – Consensus TGTATAAT
  – Strongly implicated in B. subtilis
  – Hypothesised as significant in 20% of E. coli promoters
• Extended -16 element
  – Consensus TRTG


TSS recognition (% accuracy)

Pattern                                            E. coli         B. subtilis
Canonical boxes + distance to GSS                  43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + TG extended -10      41.6 ± 1.3 %    62.5 ± 1.8 %
Canonical + distance to GSS + TRTG extended -10    37.6 ± 1.3 %    62.6 ± 1.8 %


The Complete Picture

[Figure: RNA polymerase on the promoter. Labels as extracted: -10, -35, CTD, CTDII, NTD, 70; positions -40.5, -52, -62, -72; UP element / AT rich region, variable location]


TSS recognition (% accuracy)

Pattern                                                             E. coli         B. subtilis
Canonical boxes + distance to GSS                                   43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + TG extended -10 + AT rich region      48.3 ± 1.5 %    68.8 ± 1.6 %
Canonical + distance to GSS + TRTG extended -10 + AT rich region    40.5 ± 1.4 %    71.2 ± 1.7 %


TSS recognition (% accuracy)

E. coli
  Canonical + distance to GSS    43.3%
  + AT rich                      47.3%
  + TG                           41.6%
  + TG + AT rich                 48.3%

B. subtilis
  Canonical + distance to GSS    61.2%
  + AT rich                      64.8%
  + TRTG                         62.6%
  + TRTG + AT rich               71.2%


Conclusions

• Beagle provides a simple bridge between experiment and computational discovery
  – Is the extended -16 motif really important in E. coli?
  – (Well, not in any general sense)
• Fast, robust and flexible
• Extensions
  – Combination of model organisms
  – Comparative genomics & regulation