Bioinformatics and Graphical Models: Computation, approximation, and their value
MSR: Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman
UW: Jim Mullins, Mark Jensen, Jerry Learn
Overview
• Computational cost of usual algorithms
  – State of the art
  – Phylogeny + alignment
  – Phylogeny + sequence modeling
  – Approximations and their pitfalls
• Recombination
  – Analogy to other ML domains
  – Graphical model
  – Experiments and computational cost
• Value of the computation
  – Potential applications
  – Drug discovery cycle
  – Value of time and clinical success
  – Market size and growth
• Discussion
Rational vaccine design (Jim Mullins et al.)
• Rational design
  – Analysis of sequences to form a model of virus evolution (phylogenies, etc.)
  – Develop vaccines that target as much variability as possible
• Traditional design
  – Trial and error
  – Educated guesses
State of the art sequence analysis programs
• Example:
  – Rational AIDS vaccine design
  – Analysis of the envelope gene from a single patient in one visit
  – 200 sequences with 600 base pairs each
  – Overnight to align
  – 1-2 hours to 2-3 days to build a tree, depending on how much search you are willing to do
  – This does not include modeling the inter-sequence dependencies or coupling alignment and tree search, and it ignores recombination
• The total length of the HIV genome is about 10,000 base pairs, and the number of samples is practically limited only by cost
Computational cost of a slightly more detailed analysis
• Metropolis search over all trees on 400 sequences of the full genome (10k base pairs) would take around 2 years on one machine
• Exact search intractable!
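To give a sense of why exact search is out of reach, the sketch below counts the unrooted binary tree topologies on n labeled sequences using the standard combinatorial result (2n-5)!!. It counts topologies only (no branch lengths, alignments, or recombination), and the function name is a hypothetical helper, not something taken from the talk.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # one or two leaves admit a single topology
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 200, 400):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} sequences: a topology count with {digits} decimal digits")
```

Even before branch lengths or alignment are considered, the count grows faster than exponentially in the number of sequences, which is why the search has to be stochastic or approximate.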
Approximation
• Free energy as a bound on negative log-likelihood
• Computation and approximation of the free energy:
  – Iterative conditional modes
  – Mean-field method
  – Structured variational techniques
  – (Loopy) belief propagation
  – Sampling techniques
• How tight is the bound?
• What does the looseness translate to?
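The bound can be illustrated on a toy model with one binary hidden variable h and a fixed observation x. The sketch below computes the free energy F(q) = sum_h q(h) [log q(h) - log p(h, x)] and compares it with the negative log-likelihood -log p(x). The joint probabilities are made-up numbers chosen only for illustration, and the function and variable names are assumptions.

```python
import math


def free_energy(q, joint):
    """F(q) = sum_h q(h) * (log q(h) - log p(h, x)) for a fixed observation x."""
    return sum(
        qh * (math.log(qh) - math.log(ph))
        for qh, ph in zip(q, joint)
        if qh > 0  # terms with q(h) = 0 contribute nothing
    )


# Hypothetical joint values p(h=0, x) and p(h=1, x) for one observed x.
joint = [0.3, 0.1]
neg_log_lik = -math.log(sum(joint))

# The exact posterior p(h | x) makes the bound tight.
posterior = [p / sum(joint) for p in joint]

# A deliberately poor approximate posterior gives a loose bound.
loose_q = [0.5, 0.5]
gap = free_energy(loose_q, joint) - neg_log_lik

if __name__ == "__main__":
    print("negative log-likelihood:", neg_log_lik)
    print("free energy at exact posterior:", free_energy(posterior, joint))
    print("free energy at loose q:", free_energy(loose_q, joint))
    print("gap (KL divergence from q to the posterior):", gap)
```

In this setting the looseness is exactly the KL divergence between the approximate posterior q and the true posterior, so a restricted family of q (as in mean-field) pays for its speed with a gap of that size.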
An example of the approximation issues
An example of the approximation issues: Tightness of the bounds
Figure panels: Variational technique; Exact EM algorithm
Recombination
• In HIV, the rate of recombination has recently been estimated to be ¼ of the rate of mutation!
• Combinatorial explosion in inference
Similar situations in other domains where graphical models work well
• Occlusion in video
• Source interaction in audio
• Composition of images
“Occlusion” in audio
Figure: Speaker1 * M + Speaker2 * (1-M) = mixture; outputs: Retrieved Speaker1, Retrieved Speaker2
Epitome of an image
Input image
A set of image patches
Epitome
Layers from a single photograph
Figure: graphical model with variables em, es, s1, s2, M, x
Modeling alignment and recombination by learning a library of gene patterns
Graphical model variables (sequence j, position i):
s^j_{i-1}, s^j_i, s^j_{i+1}
c^j_{i-1}, c^j_i, c^j_{i+1}
x^j_{i-1}, x^j_i, x^j_{i+1}

• s: pattern position
• c = 1: copy letter (with possible mutation)
• c = 0: draw letter from a distribution unrelated to the patterns

Conditionals:
p(x^j_i | s^j_i = (1,2), c = 1) = f(x^j_i, r1(2)) = f(x^j_i, C)
p(x^j_i | s, c = 0) = g(x^j_i)

Example:
Patterns:
r1 = {ACTGTCAGT}
r2 = {ACGATC}
Observations and a hidden variable assignment:
s1 = {(1,1), (1,2), (1,3), (1,3), (1,3), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3)}
c1 = { 1 1 1 0 0 0 1 1 1 1 1 1 }
x1 = { A C T C A T G T A A C G }
s2 = {(2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,4), (1,5), (1,6)}
c2 = { 1 1 1 1 1 1 1 1 1 }
x2 = { A C G A T C G T C }
Figure callouts: copy pattern 1, position 2 (letter C); insertion mutation
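The example can be checked with a short sketch that evaluates the emission conditionals for a given hidden assignment. The patterns, positions, copy flags, and observed letters are copied from the slide; the specific forms of f and g (a uniform substitution model with an assumed mutation rate, and a uniform background distribution) are illustrative assumptions, since the talk does not define them. Transition terms over s and c are left out.

```python
import math

# Patterns and hidden assignments copied from the slide example.
patterns = {1: "ACTGTCAGT", 2: "ACGATC"}
s1 = [(1, 1), (1, 2), (1, 3), (1, 3), (1, 3), (1, 3),
      (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3)]
c1 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
x1 = "ACTCATGTAACG"
s2 = [(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (1, 4), (1, 5), (1, 6)]
c2 = [1] * 9
x2 = "ACGATCGTC"

MUTATION_RATE = 0.05  # assumed value, not from the talk


def f(x, r):
    """Assumed copy model: keep the pattern letter or mutate uniformly."""
    return 1 - MUTATION_RATE if x == r else MUTATION_RATE / 3


def g(x):
    """Assumed background model: uniform over the four nucleotides."""
    return 0.25


def emission_log_lik(x, s, c):
    """Sum of log p(x_i | s_i, c_i) over positions (emission terms only)."""
    total = 0.0
    for letter, (k, pos), copy in zip(x, s, c):
        if copy == 1:
            total += math.log(f(letter, patterns[k][pos - 1]))  # 1-based positions
        else:
            total += math.log(g(letter))
    return total


def count_mutations(x, s, c):
    """Count copied positions whose letter differs from the pattern letter."""
    return sum(
        1 for letter, (k, pos), copy in zip(x, s, c)
        if copy == 1 and letter != patterns[k][pos - 1]
    )


if __name__ == "__main__":
    print("x1 mutations:", count_mutations(x1, s1, c1),
          "log-lik:", emission_log_lik(x1, s1, c1))
    print("x2 mutations:", count_mutations(x2, s2, c2),
          "log-lik:", emission_log_lik(x2, s2, c2))
```

Under this assignment x1 is explained by copying from pattern 1, three inserted letters, one mutated copy, and a switch to pattern 2, while x2 is a clean copy of pattern 2 followed by a segment of pattern 1, which is the kind of recombination the library of patterns is meant to capture.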
Experimental results
Value of computation (from Tufts Center)
Growth
• Human viruses
  – West Nile
  – SARS
  – Hepatitis C
  – Polio
  – …
• Animal viruses
  – FIV
  – Pig, chicken and cow viruses
• Most bacterial diseases
• Parasitic diseases
• The first sign of success of rational design might trigger a great increase in the number of diseases tackled
How can MS/MSR be involved?
• MS: Architecture, platform, tools
  – Storage, transmission, computation
  – E.g., parallelizable computation on a single machine; peer-to-peer networks for parallel computation on multiple machines
• MSR:
  – Helping to speed up the scientific progress leading to new opportunities for growth
  – Advising MS on the research direction in the community and the future requirements for the platform