A Model of Bacterial Chromosome Architecture
Matthew Wright, Daniel Segre, George Church
Dec 19, 2015
Can we understand the 3-d structure of the chromosome?
How optimal is the spatial organization of DNA for the cell?
Can we link function and chromosome structure?
Mycoplasma pneumoniae
• 816 kbp
• 90% Coding
• 688 Genes
• 110 Membrane Proteins
• 52 Ribosomal Proteins
• No Active Transport
• No Regulation
• Limited Metabolism
• Few DNA Binding Proteins
A Model System
0.5 µm diameter
0.06 µm³ volume
8000 ribosomes would fill the cell
Extended DNA is 80 µm in diameter, over 100 times the cell diameter
“Nose” polarity
Features
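The size figures above can be checked with a few lines of arithmetic. This is a rough sketch only: it treats the cell as a sphere and the relaxed chromosome as a perfect circle, and it assumes a B-form rise of 0.34 nm per base pair, which is not stated on the slide.

```python
import math

CELL_DIAMETER_UM = 0.5    # from the slide
GENOME_BP = 816_000       # from the slide (816 kbp)
RISE_NM_PER_BP = 0.34     # assumption: B-form DNA rise per base pair


def sphere_volume(diameter):
    """Volume of a sphere with the given diameter."""
    return math.pi * diameter ** 3 / 6.0


def relaxed_circle_diameter_um(n_bp, rise_nm=RISE_NM_PER_BP):
    """Diameter of the circle formed by a fully extended circular genome."""
    contour_um = n_bp * rise_nm / 1000.0   # contour length in micrometres
    return contour_um / math.pi


cell_volume = sphere_volume(CELL_DIAMETER_UM)
dna_diameter = relaxed_circle_diameter_um(GENOME_BP)
ratio = dna_diameter / CELL_DIAMETER_UM

print(f"cell volume: {cell_volume:.3f} cubic micrometres")
print(f"extended DNA diameter: {dna_diameter:.0f} micrometres")
print(f"DNA diameter / cell diameter: {ratio:.0f}")
```

Under these assumptions the volume comes out near 0.06 µm³ and the extended circle is of the same order as the 80 µm quoted, well over 100 cell diameters.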
Microscopy Cross-linking Loop Patterns
Tom Knight; Gasser et al. Science 2002 296; Dekker et al. Science 2002 295
Empirical Constraints
Transmembrane Proteins: 110 genes (Potter MD, Nicchitta CV. J Biol Chem. 2002 Jun 28;277(26))
RNA and/or Protein Complexes: 52 genes
Metabolism
DNA Structural Forces (Tobias I et al. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 2000 Jan;61(1))
Replication
Theoretical Constraints
Symmetry Constraints
Symmetric Replication
If polymerases replicate at a constant rate, sites symmetric about the origin are close together when replicated
Flattened Circle (figure labels: O, T)
General Helix Parameters: a (rise)
Supercoil Parameters: w (frequency), Ac (amplitude of cos), As (amplitude of sin)
Radial Parameters: R (maximum large radius), d (frequency of large radial oscillations)
Helix Parameters
[Plot: energy (0 to 1800) versus time steps (0 to 500)]
Energy Decreases
Begin With Optimization in Helical Parameter Space
Then Perform Random Walk of Genome for Secondary Optimization
Generate Relatively Ordered Structures while allowing Local Disorder to Meet Constraints
Combine Both Methods
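The two-stage scheme above can be sketched as follows. Everything here is a toy stand-in, not the authors' implementation: the cost function, the constrained locus pairs, the bond length, and the greedy acceptance rule are all hypothetical choices made only to show a coarse search over helix parameters followed by a random-walk refinement of individual points.

```python
import numpy as np


def helix(params, n):
    """Points on a plain helix; params = (radius, rise)."""
    radius, rise = params
    t = np.linspace(0.0, 4.0 * np.pi, n)
    return np.column_stack((radius * np.cos(t), radius * np.sin(t), rise * t))


def cost(points, pairs, bond=0.2, cell_radius=1.0):
    """Toy cost (hypothetical): pair proximity + chain spacing + confinement."""
    i, j = pairs[:, 0], pairs[:, 1]
    pair_term = np.sum((points[i] - points[j]) ** 2)
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    chain_term = np.sum((steps - bond) ** 2)
    outside = np.maximum(np.linalg.norm(points, axis=1) - cell_radius, 0.0)
    return pair_term + chain_term + np.sum(outside ** 2)


def optimize_parameters(pairs, n, rng, steps=200, scale=0.1):
    """Stage 1: greedy random search in helical parameter space."""
    params = np.array([1.0, 0.5])
    best = cost(helix(params, n), pairs)
    history = [best]
    for _ in range(steps):
        trial = params + rng.normal(0.0, scale, size=2)
        c = cost(helix(trial, n), pairs)
        if c < best:
            params, best = trial, c
        history.append(best)
    return params, history


def random_walk_refine(points, pairs, rng, steps=2000, scale=0.02):
    """Stage 2: move one point at a time, keeping only improving moves."""
    points = points.copy()
    best = cost(points, pairs)
    history = [best]
    for _ in range(steps):
        k = rng.integers(len(points))
        old = points[k].copy()
        points[k] += rng.normal(0.0, scale, size=3)
        c = cost(points, pairs)
        if c < best:
            best = c
        else:
            points[k] = old          # reject the move
        history.append(best)
    return points, history


rng = np.random.default_rng(0)
n = 60
pairs = np.array([[5, 50], [10, 40], [0, 59]])   # hypothetical constraints
params, h1 = optimize_parameters(pairs, n, rng)
refined, h2 = random_walk_refine(helix(params, n), pairs, rng)
print(f"start {h1[0]:.3f}, after stage 1 {h1[-1]:.3f}, after stage 2 {h2[-1]:.3f}")
```

Stage 1 keeps the structure globally ordered because only two parameters vary; stage 2 lets individual loci drift to satisfy constraints the helix alone cannot meet.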
[Plot: cost (0.5 to 1.4) versus time steps (0 to 10000)]
Energy
Preliminary data are promising
Incorporate Distance Geometry
Need to calculate statistics
Gather experimental data: predict and test
Incorporate Replication and Dynamics
Current
Distance Geometry
• Represent structure in terms of distances
• Constraints fit into a single matrix
• Matrix with “bounds” defines all possible configurations
• Can find inconsistencies in constraints
• Rotationally invariant
Basis
• Cholesky or eigenvalue decomposition of inner product matrix, M
• Can get M from D, matrix of distances by defining an origin
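$XX^T = M = LL^T$ (Cholesky)
$XX^T = M = S \Lambda S^T$ (eigenvalue)
$d_{0i}^2 = \frac{1}{N}\sum_{j=1}^{N} d_{ij}^2 - \frac{1}{N^2}\sum_{j<k} d_{jk}^2$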
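A minimal sketch of this basis, assuming the origin is the centroid of the points and that the inner-product matrix is built from the law of cosines, $M_{ij} = (d_{0i}^2 + d_{0j}^2 - d_{ij}^2)/2$ (a standard relation, not shown on the slide). The function name `embed_from_distances` is illustrative.

```python
import numpy as np


def embed_from_distances(D, dim=3):
    """Recover coordinates (up to rotation/reflection) from a distance matrix."""
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    # Squared distance of each point from the centroid, used as the origin.
    d0_sq = D2.sum(axis=1) / n - D2[np.triu_indices(n, k=1)].sum() / n ** 2
    # Inner-product matrix from the law of cosines.
    M = 0.5 * (d0_sq[:, None] + d0_sq[None, :] - D2)
    # Eigenvalue decomposition M = S Lambda S^T; keep the largest `dim` modes.
    evals, evecs = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:dim]
    lam = np.clip(evals[top], 0.0, None)
    return evecs[:, top] * np.sqrt(lam)


rng = np.random.default_rng(1)
true_points = rng.normal(size=(8, 3))
D = np.linalg.norm(true_points[:, None, :] - true_points[None, :, :], axis=-1)
X = embed_from_distances(D)
D_back = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print("max distance error:", np.abs(D - D_back).max())
```

Because only distances enter, the recovered coordinates differ from the originals by a rigid rotation or reflection, which is the rotational invariance noted earlier.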
Additional Cost Terms
Proximity of Enzymes during Metabolism
Stoichiometric Matrix
Curvature
Replication
Incorporate Forces on DNA by Using Elastic Rod Model
If constraints based on function predict structure, then structure and function are related at genome scale
Potential new class of model
Conclusions
Method
• Place constraints in matrix
• Solve for upper and lower bounds from triangle inequalities
• Randomly choose a configuration within these bounds
• Embed in 3 dimensions
• Minimize error
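The first three steps above can be sketched as follows; embedding and error minimization are left out. The bound values in the example (a five-point chain with unit spacing, a 0.5 minimum separation, and a loose upper bound of 10) are hypothetical, and the smoothing shown is a single Floyd-style pass per bound type rather than a tuned implementation.

```python
import numpy as np


def smooth_bounds(lower, upper):
    """Tighten distance bounds using the triangle inequalities."""
    L = lower.astype(float).copy()
    U = upper.astype(float).copy()
    n = len(U)
    for k in range(n):                      # u_ij <= u_ik + u_kj
        U = np.minimum(U, U[:, k:k + 1] + U[k:k + 1, :])
    for k in range(n):                      # l_ij >= l_ik - u_kj, and mirror
        L = np.maximum(L, L[:, k:k + 1] - U[k:k + 1, :])
        L = np.maximum(L, L[k:k + 1, :] - U[:, k:k + 1])
    if np.any(L > U + 1e-9):
        raise ValueError("constraints are inconsistent")
    return L, U


def sample_distances(L, U, rng):
    """Pick a symmetric trial distance matrix uniformly within the bounds."""
    r = np.triu(rng.uniform(size=L.shape), 1)
    r = r + r.T
    return L + r * (U - L)


# Step 1: place constraints in lower/upper bound matrices (hypothetical values).
n = 5
lower = np.full((n, n), 0.5)
upper = np.full((n, n), 10.0)
np.fill_diagonal(lower, 0.0)
np.fill_diagonal(upper, 0.0)
for i in range(n - 1):                      # neighbours are exactly 1 apart
    lower[i, i + 1] = lower[i + 1, i] = 1.0
    upper[i, i + 1] = upper[i + 1, i] = 1.0

# Step 2: triangle-inequality smoothing. Step 3: random choice within bounds.
L, U = smooth_bounds(lower, upper)
trial = sample_distances(L, U, np.random.default_rng(2))
print("upper bound between chain ends:", U[0, n - 1])
```

A lower bound that exceeds the smoothed upper bound raises an error, which is how inconsistent constraints show up in this formulation.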
Model for nose replication
Seto S, Layh-Schmitt G, Kenri T, Miyata M. J Bacteriol 2001 Mar;183(5):1621-30 Visualization of the attachment organelle and cytadherence proteins of Mycoplasma pneumoniae by immunofluorescence microscopy.
Bidirectional
2 Polymerase Complexes Remain Attached
Daughter DNA Separate Sides
Causes Minimal Entanglement
Allows for Multiple Firing of Origins
Paired fork model
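$x_{frame} = \left(R\cos(dt)\cos(t),\; R\cos(dt)\sin(t),\; at\right)$
$\mathbf{t} = \dot{x}/|\dot{x}|$, $\mathbf{n} = \dot{\mathbf{t}}/|\dot{\mathbf{t}}|$, $\mathbf{b} = \mathbf{t} \times \mathbf{n}$
$x_{local} = [\mathbf{t}\ \mathbf{n}\ \mathbf{b}]\,\left(A_c\cos(wt),\; A_s\sin(wt),\; 0\right)^T$
$x = x_{frame} + x_{local}$
Frenet Frame on Helix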
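A minimal numerical sketch of this construction, assuming the Frenet frame is taken along the large-scale curve $x_{frame}$ and that derivatives are approximated by finite differences. The function name and the parameter values in the example are illustrative only.

```python
import numpy as np


def supercoiled_helix(t, R, d, a, w, Ac, As):
    """Large-scale helix plus a local supercoil expressed in its Frenet frame."""
    x_frame = np.column_stack((
        R * np.cos(d * t) * np.cos(t),
        R * np.cos(d * t) * np.sin(t),
        a * t,
    ))

    def unit(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)

    # Finite-difference tangent, normal and binormal along x_frame.
    tangent = unit(np.gradient(x_frame, t, axis=0))
    normal = unit(np.gradient(tangent, t, axis=0))
    binormal = np.cross(tangent, normal)

    # Local offset with components (Ac cos(wt), As sin(wt), 0) in (t, n, b).
    x_local = ((Ac * np.cos(w * t))[:, None] * tangent
               + (As * np.sin(w * t))[:, None] * normal
               + 0.0 * binormal)
    return x_frame + x_local, (tangent, normal, binormal)


t = np.linspace(0.0, 4.0 * np.pi, 2001)
x, frame = supercoiled_helix(t, R=1.0, d=0.25, a=0.2, w=20.0, Ac=0.05, As=0.05)
print(x.shape)
```

The frame is least accurate at the two ends of the curve, where one-sided differences are used, and it is undefined wherever the curvature of $x_{frame}$ vanishes.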
Melting Temperature
• Short Duplex ($C$ is the total concentration of single strands):
$T_m = \frac{\Delta H}{R \log C + \Delta S}$
• Long Duplex:
$T_m = 81.5 + 16.6 \log[\mathrm{Na}] + 41\,(G+C)/l - 500/l$
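Both formulas are simple enough to evaluate directly. The sketch below assumes the logarithm in the short-duplex formula is natural (giving kelvin when $\Delta H$ is in cal/mol and $\Delta S$ in cal/(mol K)), and that the long-duplex formula uses a base-10 logarithm of the molar sodium concentration and returns degrees Celsius. The thermodynamic values and the sequence in the example are made up for illustration.

```python
import math

R_GAS = 1.987  # gas constant in cal / (mol K)


def tm_short_duplex(delta_h, delta_s, conc):
    """Tm = dH / (R ln C + dS); assumes natural log, result in kelvin."""
    return delta_h / (R_GAS * math.log(conc) + delta_s)


def tm_long_duplex(seq, na_molar):
    """Tm = 81.5 + 16.6 log10[Na] + 41 (G+C)/l - 500/l; assumed degrees C."""
    length = len(seq)
    gc = sum(base in "GCgc" for base in seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 41.0 * gc / length - 500.0 / length


# Hypothetical inputs, for illustration only.
short_tm_kelvin = tm_short_duplex(delta_h=-80000.0, delta_s=-220.0, conc=1e-6)
long_tm_celsius = tm_long_duplex("ATGC" * 10, na_molar=0.1)
print(f"short duplex: {short_tm_kelvin - 273.15:.1f} C")
print(f"long duplex:  {long_tm_celsius:.1f} C")
```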
Wordsize (a digression)
• BLAST seeds on a string of at least 7 identical bases
• Want to find all alignments with at most 20 mismatches
• What is the probability of finding a stretch of 7 identities in a string of length 70 with 20 mismatches?
Marbles
• Maps into the problem of partitioning a string of length 70 into 21 bins
• Total number of ways: $\binom{70}{20}$
11101110111101001101011101111111010101111011 etc
Counting
• Now count the fraction with at least a stretch of 7: $\binom{21}{1}\binom{63}{20}$
• But over-counting is a problem
Correcting
• The cases where 2 bins each have a 7-mer are counted twice, so subtract this number once:
$\binom{21}{1}\binom{63}{20} - \binom{21}{2}\binom{56}{20}$
• Problem with the cases where there are 3 bins with a 7-mer
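Correction Continued
$\binom{21}{1}\binom{63}{20} - \binom{21}{2}\binom{56}{20} + \binom{21}{3}\binom{49}{20}$
(each 3-bin case is counted 3 times by the first term and subtracted $\binom{3}{2} = 3$ times by the second, so it is added back once)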
Principle of inclusion-exclusion
$\sum_{l=1}^{7} (-1)^{l+1} \binom{21}{l}\binom{70-7l}{20}$
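The sum can be evaluated exactly with integer arithmetic and divided by the total number of arrangements to get the probability. The sketch below is one way to do that; the function names are illustrative, and the brute-force enumerator is included only as a check on small cases.

```python
from itertools import combinations
from math import comb


def p_run_at_least(length, mismatches, word):
    """Probability that some run of matches has length >= word."""
    bins = mismatches + 1                   # runs of matches between mismatches
    total = comb(length, mismatches)
    hits = 0
    for l in range(1, bins + 1):
        remaining = length - word * l       # positions left after reserving l words
        if remaining < mismatches:
            break
        hits += (-1) ** (l + 1) * comb(bins, l) * comb(remaining, mismatches)
    return hits / total


def brute_force(length, mismatches, word):
    """Enumerate every mismatch placement; feasible only for small inputs."""
    hits = total = 0
    for pos in combinations(range(length), mismatches):
        total += 1
        edges = (-1,) + pos + (length,)
        if any(b - a - 1 >= word for a, b in zip(edges, edges[1:])):
            hits += 1
    return hits / total


print("P(run of 7 | length 70, 20 mismatches) =", p_run_at_least(70, 20, 7))
print("small-case check:", p_run_at_least(12, 4, 3), brute_force(12, 4, 3))
```

For length 70 the loop stops after $l = 7$, because reserving eight 7-mers would leave fewer than 20 positions for the mismatches.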
Extension
• Coefficients for at least m bins of wordsize l
• m=2
– 1,-2, 3,-4 …
$\binom{21}{2}\binom{56}{20} - 2\binom{21}{3}\binom{49}{20} + 3\binom{21}{4}\binom{42}{20} - 4\cdots$
• m=3
– 1, -3, 6, -10 …
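A sketch of the extension, using the standard generalized inclusion-exclusion coefficient $(-1)^{k-m}\binom{k-1}{m-1}$ on the term for $k$ bins. That closed form is not given on the slides and is supplied here as an assumption; the function names are illustrative and the brute-force count is only a small-case check.

```python
from itertools import combinations
from math import comb


def coefficients(m, terms=4):
    """Leading coefficients for 'at least m bins', from (-1)^(k-m) C(k-1, m-1)."""
    return [(-1) ** (k - m) * comb(k - 1, m - 1) for k in range(m, m + terms)]


def count_at_least_m_bins(length, mismatches, word, m):
    """Number of arrangements with at least m runs of matches of length >= word."""
    bins = mismatches + 1
    total = 0
    for k in range(m, bins + 1):
        remaining = length - word * k
        if remaining < mismatches:
            break
        coeff = (-1) ** (k - m) * comb(k - 1, m - 1)
        total += coeff * comb(bins, k) * comb(remaining, mismatches)
    return total


def brute_force_count(length, mismatches, word, m):
    """Direct enumeration; feasible only for small inputs."""
    count = 0
    for pos in combinations(range(length), mismatches):
        edges = (-1,) + pos + (length,)
        runs = sum(b - a - 1 >= word for a, b in zip(edges, edges[1:]))
        if runs >= m:
            count += 1
    return count


print("m=2 coefficients:", coefficients(2))
print("small-case check:", count_at_least_m_bins(14, 4, 3, 2),
      brute_force_count(14, 4, 3, 2))
```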