Model for Evaluation of DNA Synthesis
Created by: Ori Kaplan, Gilad Myerson
Supervised by: Gregory Linshiz, Weizmann Institute; Prof. Udi Shapiro, Weizmann Institute
Jan 06, 2018
Synthesizing DNA
Currently, there are few successful ways of synthesizing DNA.
Most common: Assembly PCR.
Methods are costly and slow (about 3 weeks from order to delivery of a DNA strand).
[Figure: ABI 3900 and Mer-Made 6 DNA synthesizers]
New Approach
Prof. Udi Shapiro / Gregory Linshiz: new confidential method of in-vitro DNA molecule synthesis.
Goal: synthesize DNA quicker, easier and cheaper.
Part of this method involves elongation of oligonucleotides.
Elongation success rate (until now) ≈ 80-90%.
New Approach
Elongation of DNA includes…
Since the elongation of oligonucleotides in vitro follows the pattern of synthetic DNA strand synthesis, we give a brief explanation of synthetic oligonucleotide synthesis.

Oligonucleotide synthesis is a remarkably simple process with far-reaching implications, and it is extremely useful in laboratory procedures. It is used to make primers, which are crucial in methods such as PCR replication. Custom oligonucleotides are additionally useful because they bind only to the region of DNA that is complementary to their sequence. This allows specific segments of DNA to be amplified. In addition, custom oligonucleotide synthesis allows other sequences, such as restriction sites, to be added to the desired oligonucleotide. Custom oligonucleotides are generally 50 bases in length, which can limit how many additional sequences can be added to the desired primer sequence.

Oligonucleotides are synthesized using DNA phosphoramidite monomer bases as building blocks. The active sites of the monomer bases are all chemically blocked in such a way that they can be unblocked at will by use of unblocking solutions. Oligonucleotide synthesis involves 4 stages:

Stage 1: Deblocking
The first base, which is attached to the solid support, is at first inactive because all of its active sites have been blocked or protected. To add the next base, the DMT group protecting the 5'-hydroxyl group must be removed. This is done by adding an acid. The 5'-hydroxyl group is now the only reactive group on the base monomer, which ensures that the next base will bind only to that site.

Stage 2: Base condensation
The next base monomer cannot be added until it has been activated. This is achieved by adding tetrazole to the base. The active 5'-hydroxyl group of the preceding base and the newly activated phosphorus bind, loosely joining the two bases together.

Stage 3: Capping
Any 5'-hydroxyl group that is still unbound and active is capped with a protective group, which prevents that strand from growing again. This is done by adding acetic anhydride and N-methylimidazole to the reaction column.

Stage 4: Oxidation
In order to stabilize the phosphate linkage, a solution of dilute iodine in water, pyridine, and tetrahydrofuran is added to the reaction column, oxidizing and strengthening the linkage.
Top Secret
Sequencing
After the DNA synthesis procedure, sequencing the new molecules will indicate whether the right molecule was synthesized.
A chromatogram of DNA synthesis:
Chromatogram
What does a chromatogram portray?
“Clean” chromatogram: all molecules are identical (figure label: All A).
“Noisy” chromatogram: ambiguous (figure label: Some A, Some T).
The problem
Let's assume this is the sequencing result:
I. Is the experiment successful?
II. What needs to be changed in order to improve the method? pH, temperature, polymerase, dNTPs, concentrations…
[Figure label: Noise]
The problem contd.
Which result is better…?
Conventional Analysis
Clone to understand the sequencing.
Isolation cloning:
Isolate single molecules and read their exact sequence.
Cloning several oligos gives insight into the method's degree of success.
Theoretically, clone all molecules in order to see whether the experiment was successful.
Weizmann’s request
Cloning: very long, hard and expensive.
Please try to figure out a way to assess the degree of success “visually” using the chromatogram…
OK…
If we smile, will they think we understand???
We'll try anyway.
OK…
I've got it, I've got it, I've got it...
A Solution ???
Let's treat the graph like LEGO© and see what we can do with the pieces…
Perfect Sequencing
A C T G
C A C T G A C A C G C T T A C T G C C G
10 molecules
Mutations occur
“Dirty” chromatogram
[Figure labels: deletion (×3), insertion, substitution]
Two ways to try to understand the graph
Sequence every single oligonucleotide (isolation cloning): impossible.
Sample sequencing and assessment of the result: statistically inaccurate.
Another Option
Mathematically “build” oligonucleotide molecules in such a way that the accumulated graph of those molecules is identical to the chromatogram.
Graph → Table
A1018271361
G936136136
C1010282646113613971
T1013961361
If I had 10 molecules, each 20 nucleotides long, how many bases of each kind would I have at each position?
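The counting question above can be sketched in a few lines. This is a minimal illustration only: the function name `count_table` and the toy molecules are assumptions, not part of the project's code or data.

```python
BASES = "AGCT"  # row order used in the table on this slide

def count_table(molecules):
    """Return {base: [count at position 0, 1, ...]} for equal-length molecules."""
    length = len(molecules[0])
    if any(len(m) != length for m in molecules):
        raise ValueError("all molecules must have the same length")
    table = {base: [0] * length for base in BASES}
    for molecule in molecules:
        for position, base in enumerate(molecule):
            table[base][position] += 1
    return table

# 10 toy molecules (hypothetical, not experimental data)
molecules = ["ACTG"] * 8 + ["ACTT", "GCTG"]
table = count_table(molecules)
for base in BASES:
    print(base, table[base])
```

Each column of the table sums to the number of molecules, which is what lets a normalized chromatogram be read as such a table.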
Table → Molecules
A1018271361
G936136136
C1010282646113613971
T1013961361
Random procedure
Molecules → Graph
New Problem
How do we choose the 100 molecules that build the graph?
Linear search: too many options to check, O($4^n$)!
Choose 100 from $4^n$.
If the oligo is 100 nucleotides long, n = 100: choose 100 molecules from $4^{100} \approx 1.6 \times 10^{60}$.
$\binom{n}{k} = \binom{1.6 \times 10^{60}}{100} \approx \dots$
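To get a feel for the size of this search space, the arithmetic can be checked with exact integers. This is a quick sketch under the slide's numbers (n = 100, k = 100); the variable names are illustrative.

```python
import math

n = 100                  # oligo length from the slide
k = 100                  # number of molecules to choose
candidates = 4 ** n      # every possible sequence of length n

# math.log10 accepts arbitrarily large Python integers,
# so the magnitudes can be read off without overflow.
log10_candidates = math.log10(candidates)
log10_choices = math.log10(math.comb(candidates, k))

print(f"4^{n} is about 10^{log10_candidates:.1f}")
print(f"C(4^{n}, {k}) is about 10^{log10_choices:.0f}")
```

The second number has thousands of digits, which is why an exhaustive search is out of the question.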
OK…
Smile, maybe they'll give us 100.
We'll try...
OK…
I've got it, I've got it, I've got it...
The problem
Don't choose from all possibilities: assume that each molecule has only one mutation (Edit Distance 1).
Reduced molecules: $4^n$ → 8n
Select 100 molecules from 800 (instead of $1.6 \times 10^{60}$).
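A bank of Edit Distance 1 molecules could be generated roughly as follows. This is a sketch, not the project's implementation: the function name is hypothetical, and duplicates are collapsed with a set, so the number of distinct molecules can come out below the slide's 8n estimate.

```python
BASES = "ACGT"

def edit_distance_1_bank(target):
    """Return the set of distinct sequences exactly one mutation away from target."""
    bank = set()
    for i in range(len(target)):
        bank.add(target[:i] + target[i + 1:])                 # deletion at i
        for base in BASES:
            if base != target[i]:
                bank.add(target[:i] + base + target[i + 1:])  # substitution at i
    for i in range(len(target) + 1):
        for base in BASES:
            bank.add(target[:i] + base + target[i:])          # insertion before i
    return bank

bank = edit_distance_1_bank("ACGT")  # toy target sequence
print(len(bank), sorted(bank)[:5])
```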
Still a problem
How do we choose 100 molecules from 800?
Linear search: $\binom{n}{k} = \frac{n!}{k!(n-k)!} = \binom{800}{100} \approx 3 \times 10^{129}$ possibilities
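The count of ways to pick 100 molecules out of 800 can be checked exactly with the standard library:

```python
import math

choices = math.comb(800, 100)  # exact integer value of "800 choose 100"
print(f"{choices:.1e}")        # printed in scientific notation
```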
Genetic Algorithm
Define initial mutation rates: deletions, insertions (?), substitutions (?).
Normalize the graph and convert it to a matrix (4×n).
Build a molecule bank of “Edit Distance 1”.
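The slides do not say how the graph is normalized. One plausible reading, shown here purely as an assumption, is to scale the four channel intensities at each position so that they sum to 1. The function name and the toy trace values are hypothetical.

```python
BASES = "ACGT"

def normalize_traces(traces):
    """Scale the four channel values at each position so that they sum to 1.

    traces maps each base to a list of intensities, one per position.
    Returns a 4 x n matrix as a list of rows in BASES order.
    """
    n = len(traces[BASES[0]])
    matrix = [[0.0] * n for _ in BASES]
    for j in range(n):
        total = sum(traces[base][j] for base in BASES)
        if total == 0:
            continue  # no signal at this position; leave the column at zero
        for i, base in enumerate(BASES):
            matrix[i][j] = traces[base][j] / total
    return matrix

# Toy intensities for 3 positions (hypothetical values)
toy_traces = {"A": [900, 10, 0], "C": [50, 800, 0],
              "G": [50, 90, 400], "T": [0, 100, 100]}
matrix = normalize_traces(toy_traces)
for base, row in zip(BASES, matrix):
    print(base, [round(value, 2) for value in row])
```

Multiplying each column by the number of molecules (100 here) would put the matrix on the same scale as a count table.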
Population
There is a population of 100; each entity in the population represents a single result.
Each result consists of 100 molecules (from the ED1 bank) that build up a graph.
The population is initialized using the mutation rate.
[Figure labels: 100, 100, One result]
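Initialization could look something like the sketch below. Two things are assumed here and not stated on the slide: the mutation rate is treated as a single probability that a molecule is drawn from the ED1 bank, and unmutated molecules are taken to be the target sequence itself. All names and the toy bank are illustrative.

```python
import random

def init_population(target, bank, mutation_rate,
                    pop_size=100, result_size=100, seed=None):
    """Build pop_size results, each a list of result_size molecules."""
    rng = random.Random(seed)
    bank = sorted(bank)  # fixed order so a given seed is reproducible
    population = []
    for _ in range(pop_size):
        result = [rng.choice(bank) if rng.random() < mutation_rate else target
                  for _ in range(result_size)]
        population.append(result)
    return population

target = "ACGT"                    # toy target sequence
bank = ["CGT", "AGGT", "ACGTT"]    # toy ED1 molecules (hypothetical)
population = init_population(target, bank, mutation_rate=0.15, seed=1)
print(len(population), len(population[0]))
```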
Evaluation function
The current evaluation function is:
$F(e) = \sum_{i,j} |M_{ij} - R_{ij}|$
where the sum runs over the entries of the 4×n matrices of the experiment and of the result e.
In the future the function will take the number of substitutions into consideration.
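The evaluation function is a sum of absolute differences between two matrices, which can be sketched directly. The function name and the toy matrices are assumptions; since the absolute value is symmetric, it does not matter which argument is the experiment and which is the candidate result.

```python
def evaluate(measured, result):
    """Sum of absolute entrywise differences between two equally sized matrices."""
    return sum(abs(m - r)
               for measured_row, result_row in zip(measured, result)
               for m, r in zip(measured_row, result_row))

# Toy 4 x 2 count matrices (rows A, C, G, T), hypothetical values
measured = [[10, 0], [0, 9], [0, 1], [0, 0]]
candidate = [[9, 0], [0, 10], [1, 0], [0, 0]]
print(evaluate(measured, candidate))
```

A score of 0 means the candidate molecules rebuild the measured graph exactly; lower is better.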
Generation
Generation policy (current):
Replication: always replicate the best 10.
Crossover: biased choice of entities for crossover.
Mutations: (i) mutate the best 10; (ii) randomly mutate the whole population.
Local minimum policy: after 20 generations without improvement, shake the population.
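The policy above could be sketched as follows. Many details are assumptions rather than the project's actual choices: rank-based weights for the biased crossover, single-point crossover, the mutation probabilities, and a "shake" that heavily mutates everything except the best 10. The toy fitness function only stands in for the real evaluation function.

```python
import random

def mutate(result, bank, rng, prob):
    """Replace each molecule with a random bank molecule with probability prob."""
    return [rng.choice(bank) if rng.random() < prob else m for m in result]

def next_generation(population, fitness, bank, rng, elite=10, prob=0.05):
    """Build one new generation; lower fitness is better. Best results come first."""
    ranked = sorted(population, key=fitness)
    size = len(ranked)
    best = ranked[:elite]
    new_pop = [list(r) for r in best]                      # replication of the best 10
    new_pop += [mutate(r, bank, rng, prob) for r in best]  # mutated copies of the best 10
    weights = [size - rank for rank in range(size)]        # biased parent choice
    while len(new_pop) < size:
        a, b = rng.choices(ranked, weights=weights, k=2)
        cut = rng.randrange(1, len(a))                     # single-point crossover
        new_pop.append(mutate(a[:cut] + b[cut:], bank, rng, prob))
    return new_pop[:size]

def shake(population, bank, rng, elite=10, prob=0.5):
    """Local-minimum policy: heavily mutate all but the first `elite` results."""
    return population[:elite] + [mutate(r, bank, rng, prob) for r in population[elite:]]

rng = random.Random(0)
bank = ["ACGT", "CGT", "AGGT", "ACGTT"]   # toy molecule bank (hypothetical)

def fitness(result):
    """Toy goal: a result should contain exactly 30 copies of 'CGT'."""
    return abs(result.count("CGT") - 30)

population = [[rng.choice(bank) for _ in range(100)] for _ in range(100)]
initial_best = min(map(fitness, population))
best, stale = initial_best, 0
for _ in range(50):
    population = next_generation(population, fitness, bank, rng)
    current = min(map(fitness, population))
    best, stale = (current, 0) if current < best else (best, stale + 1)
    if stale >= 20:                        # 20 generations without improvement
        population, stale = shake(population, bank, rng), 0
print(initial_best, best)
```

Because the best 10 are always copied unchanged, the best score can never get worse from one generation to the next, even after a shake.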
File Handling
Sequence data is initially in *.ab1 files. In order to utilize the data:
Retranslate the *.ab1 file: Sequencing Analysis
Convert *.ab1 → *.txt: BioEdit
Manage *.txt: Excel (also calculate the deletion rate)
Genetic Algorithm
[Figure: No mutations - before]
[Figure: No mutations - after]
[Figures (4 slides): 10*del at 1, 10*del at 9]
[Figure: 15 scattered subs - before]
[Figure: 15 scattered subs]
Setbacks
ED1: the result will never be 100% correct.
Genetic algorithm setbacks: heuristic, different final results, local minima, evaluation function…
No indication if results are correct.
Algorithm deals with successful experiments.
Data input – noise interpretation, normalized data.
Advantages
New method of sequencing analysis.
Potentially save many hours of isolation cloning.
Mathematically – result is correct.
Development potential for different areas of research.
Personal View
Thrown into deep water, and swam.
Idea will (hopefully) be practical and useful.
Learned a great deal – new programs, languages, methods.
Mathematical analysis of chromatogram sequencing: has it ever been done before?
Thank you