Inferring an Origin of Replication With Computational Methods
Brian Smith
Dec 29, 2015
Pseudomonas as a pathogen
A cryptic megaplasmid found in Pseudomonas syringae
Phenotypic costs associated with large scale HGT
Introduction
P. aeruginosa
P. syringae pv. aesculi
The Problem
Conjugation of pMP into P. aeruginosa has failed.
In other pseudomonads, the pMP transfers successfully and in high numbers.
Several explanations are possible, including host range or genes on the pMP that may elicit resistance in P. aeruginosa
The pMP has been sequenced, but most of the annotations are ‘hypothetical protein’
Various methods exist for predicting bacterial chromosomal origins of replication
I wanted to see if similar methods would work on this plasmid
GC skew
Repetitive Motifs
Searching for the Origin
A dramatic shift in GC skew has previously been shown to be associated with the chromosome origin and terminus (Lobry 1996).
GC Skew
E. coli
I used seqinR and an R script to calculate GC skew across the pMP
GC Skew
library(seqinr)
myseq <- read.fasta(file = "Desktop/pMP.fasta", as.string = FALSE, forceDNAtolower = TRUE, set.attributes = FALSE, seqonly = TRUE, strip.desc = TRUE)
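The windowed GC skew calculation can be sketched as follows. This is a minimal Python illustration rather than the R script used in the talk; the function names `gc_skew` and `sign_changes`, the window size, and the toy sequence are all assumptions made for the example. Skew is taken as (G - C) / (G + C) per window, and a sign flip between neighbouring windows is flagged as a candidate origin or terminus.

```python
def gc_skew(seq, window=1000, step=1000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = seq.lower()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("g")
        c = chunk.count("c")
        # Guard against windows that contain no G or C at all
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


def sign_changes(skews):
    """Return the window starts where skew flips sign versus the previous window."""
    flips = []
    for (_, prev), (pos, cur) in zip(skews, skews[1:]):
        if prev * cur < 0:
            flips.append(pos)
    return flips


# Hypothetical toy sequence: a G-rich half followed by a C-rich half
toy = "ggggc" * 4 + "ccccg" * 4
skews = gc_skew(toy, window=10, step=10)
flips = sign_changes(skews)
```

On a real plasmid the window and step would need tuning to the sequence length, and a noisy skew curve may produce several sign flips that have to be judged by eye.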
Origin of Replication
oriFinder
DnaA boxes vary in size
The E. coli DnaA box is TTATCCACA
Programs like this require that you know your motif
I built a custom Python script from scratch
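When the motif is already known, as with the E. coli DnaA box above, the search reduces to exact matching on both strands. The sketch below is only an illustration of that idea and is not how oriFinder works internally; `reverse_complement`, `find_motif`, and the example sequence are hypothetical names and data introduced here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become 'n')."""
    comp = {"a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(comp.get(base, "n") for base in reversed(seq.lower()))


def find_motif(seq, motif):
    """Return sorted (position, strand) pairs for exact matches, 0-based.

    Overlapping matches are reported. A palindromic motif would be
    reported once per strand at the same position.
    """
    seq = seq.lower()
    motif = motif.lower()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical example: one forward box and one reverse-complement box
example = "gg" + "ttatccaca" + "cc" + "tgtggataa" + "gg"
hits = find_motif(example, "TTATCCACA")
```

Exact matching will miss boxes that differ from the query by even one base, which is one reason a motif-agnostic approach is useful for a plasmid whose boxes are unknown.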
Repetitive Motifs
Input: FASTA file
Output to terminal:
Sequence selection
Min count
# of motifs found
Top 10 most common motifs found
Output to file: dictionary of dictionaries containing motif, count, and sequence position
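A motif counter along these lines can be sketched as below. This is not the script from the talk: the fixed motif length `k`, the function names `count_motifs` and `top_motifs`, and the exact layout of the dictionary of dictionaries (motif mapped to a count and a list of positions) are assumptions made for illustration. The input is taken to be a sequence string already read from the FASTA file.

```python
def count_motifs(seq, k=9, min_count=2):
    """Count every k-mer in seq and keep those seen at least min_count times.

    Returns a dictionary of dictionaries:
        {motif: {"count": int, "positions": [0-based start, ...]}}
    """
    seq = seq.lower()
    motifs = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        entry = motifs.setdefault(kmer, {"count": 0, "positions": []})
        entry["count"] += 1
        entry["positions"].append(i)
    return {m: d for m, d in motifs.items() if d["count"] >= min_count}


def top_motifs(motifs, n=10):
    """Return the n most frequent motifs, ties broken alphabetically."""
    ranked = sorted(motifs.items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    return ranked[:n]


# Hypothetical toy sequence with three copies of a 9-mer
toy = "ttatccacaggttatccacaccttatccaca"
found = count_motifs(toy, k=9, min_count=2)
print("# of motifs found:", len(found))
for motif, info in top_motifs(found):
    print(motif, info["count"], info["positions"])
```

Because DnaA boxes vary in size, a real run would likely repeat the count over a range of `k` values and compare where the most frequent motifs cluster.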
Repetitive Motifs
BLAST the unknown protein sequences in this region to see whether they are involved in replication
Engineer smaller plasmids containing the minimal tools for replication using restriction enzymes
Attempt to conjugate new minimalist plasmids into P. aeruginosa
Current/Future Goals