Top Banner
Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007
39

Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Dec 18, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Understanding Science Through the Lens of Computation

Richard M. Karp

Nov. 3, 2007

Page 2: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

The Power of the Computational Perspective

• Exposes the computational nature of natural processes and provides a language for their description.

• Brings to bear fundamental algorithmic concepts: adversarial and probabilistic models, asymptotic analysis, intractability, computational learning theory, threshold behavior, fault tolerance, …

• Alters the worldviews of many scientific fields.

Page 3: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Computational Processes in the Sciences

• Regulation of protein production, metabolism and embryonic development

• Mechanisms of learning

• Molecular self-assembly

• Strategic behavior of companies

Page 4: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

A Computational View of Quantum Mechanics

• QM is the right setting for studying computation at subatomic levels.

• Theoretical computer science provides a mathematical model for a quantum computer, in which the unit of information is the qubit, a superposition of the values 0 and 1; n qubits exist in 2n states at the same time.

• There is evidence that quantum computers are more powerful than classical computers.

• The quest to realize quantum computation will test the foundations of QM.

Page 5: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

“Quantum Computation is as much about testing Quantum Physics as it is about building

powerful computers”Umesh Vazirani

Page 6: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Links Between Statistical Physics and Computer Science

• Both fields study how macroscopic properties of large systems arise from local interactions.– Statistical physics: properties of water and

magnetic materials– Computer science: global properties of World

Wide Web, structure of complex combinatorial problems

Page 7: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Similarities of Models and Methods

• Probabilistic models capture statistical behavior of large, complex, heterogeneous and incompletely known systems.

• Phase transitions in statistical physics have close parallels with sharp thresholds in computer science.

Page 8: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Areas of Convergence

• Constraint satisfaction problems

• Belief propagation and error-correcting codes

• Markov Chain Monte Carlo

• Percolation and sensor networks

Page 9: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Computational Models of the Web

• The Internet and the Web are simultaneously computational, social and economic. They support new modes of interaction.

• Novel algorithmic problems: ranking methods of search engines, reputation systems, recommendation systems, mechanism design.

Page 10: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Social Sciences and the Web

• The Web is a powerful laboratory for studying social and economic systems as computational processes.

• Insights from algorithmic game theory are indispensable for understanding the new markets and economic mechanisms that the Internet has spawned.

Page 11: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

“The Internet is an equilibrium, we just have to identify the

game.”

Scott Shenker

Page 12: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Game Theory

Payoff Matrix

Left Middle Right

Top 0/0 1,2 2,1

Middle 2,1 0.0 1,2

Bottom 1,2 2,1 0,0

• Nash equilibrium: each player chooses each pure strategy with probability 1/3

Page 13: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

The Computational Lens on Game Theory

• Finding a Nash equilibrium is an intractable problem.

• But computational tractability is an important modeling prerequisite.

• “If your laptop can’t find it, then neither can the market.” Kamal Jain

Page 14: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Computational Processes in Biology

• Learning in neural networks• Response of immune system to an invading microbe• Specialization of cells during embryonic

development• Collective behavior of animal communities:

flocking of birds, self-organization of ant colonies• Design of sensor-actuator control systems for

regulation of biological processes• Evolution of species

Page 15: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Algorithmic Challenges in Computational Molecular

Biology

Page 16: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Revolution in Biology

• Advances in computation and instrumentation enable a quantitative characterization of biological systems.

• Opportunity to advance understanding of molecular processes of life and change the ways we diagnose and treat disease.

• Multidisciplinary field: involves the biological, physical, engineering and mathematical sciences.

Page 17: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

The eukaryotic cell

Biological background

Page 18: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Goals of Computational Molecular Biology

• Sequence and compare the genomes of many organisms.

• Identify the genes and determine the functions of the proteins they encode.

• Understand how genes, proteins and other molecules act in concert to control cellular processes.

Page 19: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Goals of Computational Molecular Biology

• Trace the evolutionary history and evolutionary relationships of existing species.

• Understand the structure, function and evolutionary history of proteins.

• Identify the associations between genetic mutations and disease.

Page 20: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Regulation of Gene Expression

• Animals can be viewed as highly complex, precisely regulated spatial and temporal arrays of differential gene expression.

• Gene expression is regulated by a complex network of interactions among proteins, genomic DNA, RNA and chemicals within the cell.

Page 21: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Levels of Regulation

• Genome: spells the names of the proteins.• Transcription of genes to mRNA: regulated by

binding of transcription factors to DNA in control regions of genes.

• Translation of mRNA into functioning proteins, regulated by complex networks of protein-protein and protein-RNA interactions, and by post-translational modifications of proteins.

Page 22: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Levels of Regulation (Cont.)

• Regulation of metabolic processes: complex network of chemical reactions catalyzed by enzymes.

• Global phenotype such as disease: regulated by interaction of many metabolic processes.

Page 23: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Regulatory Networks

“We can approach understanding how the whole genome works by breaking it down into groups of genes that interact strongly with each other. Once researchers identify and understand these network modules, the next step will be to figure out the interactions within networks of networks, and so on until we eventually understand how the whole genome works, many years from now. ”

Garrett Odell

Page 24: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Key Research Areas

• Analysis of protein-DNA interactions: breaking the cis-regulatory code.

“ Regulatory interactions mandated by circuitry encoded in the genome determine whether each gene is expressed in each cell, throughout developmental space and time, and, if so, at what amplitude.” Eric Davidson

• Analysis of protein-protein interactions: identification of molecular machines and signal transduction cascades.

Page 25: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Tools for Analysis

• Measurement of protein-DNA and protein-protein interactions, and of mRNA production under selected perturbed conditions.

• DNA sequence analysis to identify genes, their regulatory regions and the transcription factor binding sites within them.

• Phylogenetic analysis to identify regulatory structures conserved across species.

• Classification of proteins according to structure and function.

Page 26: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Regulatory Control of Transcription

• E.H. Davidson Genomic Regulatory Systems

• Regulatory interactions mandated by circuitry encoded in the genome determine whether each gene is expressed in each cell, throughout developmental space and time and, if so, at what amplitude.

Page 27: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

The Ultimate Goal

• ``Portions of the endo16 cis-regulatory system of Strongylocentrotus are to date the most extensively explored of any, with respect to the functional meaning of each interaction that takes place within them. What emerges is almost astounding: a network of logic interactions programmed into the DNA sequence that amounts essentially to a hardwired biological computational device.

Page 28: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Analysis of Protein-DNA Interactions

• Transcription is regulated by proteins called transcription factors that bind to DNA near the transcription start site of a gene and influence the rate of transcription.

• Goals: identify the transcription factors, characterize the sites they bind to in the genome, and determine how they act in combination to enhance or inhibit transcription. This information is referred to as the cis-regulatory code.

Page 29: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Transcription Factors and Binding Sites

Page 30: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Binding Motifs

• Motif: short sequence pattern recognized by a transcription factor.

• Occurs repeatedly in the genome, but with considerable stochastic variation. Some positions are highly conserved, others exhibit great variation.

• Certain combinations of motifs occur repeatedly in clusters.

Page 31: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Occurrences of A Motif5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT

5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG

5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT

5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC

5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA

5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA

5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

1: AAAAGAGTCA

2: AAATGACTCA

. AAGTGAGTCA

. AAAAGAGTCA

. GGATGAGTCA

. AAATGAGTCA

. GAATGAGTCA

M: AAAAGAGTCA

…HIS7

…ARO4

…ILV6

…THR4

…ARO1

…HOM2

…PRO3

1YSA

Info

rma

tiven

ess:

2+

∑bp b

llog 2

p bl

Page 32: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Regulatory Module

• Set of mutually cooperating transcription factors that can bind to the control regions of many genes to enhance or inhibit transcription.

• Given a database of transcription factors and their binding motifs, can identify such modules by searching for sets of transcription factors whose binding sites tend to co-occur in control regions.

Page 33: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Discovery of Conserved Protein Networks

Signaling and regulatory pathways and protein complexes consist of genes, proteins and small molecules ``wired’’ together in a complex network of intermolecular interactions.

Two proteins interact if they bind to each other or one of them binds to the control regions of the other’s gene.

Page 34: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Conserved Complexes and Pathways

Many complexes and pathways are conserved – they have evolved over evolutionary time and occur, in modified forms, in many organisms.

Our goal: using databases of protein-protein interactions in several species, in conjunction with data about protein sequence, structure, function and expression, identify conserved protein complexes and signaling pathways.

Discovery of conserved pathways and complexes allows transfer of functional annotation and prediction of interactions from one species tto another.

Page 35: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Baker’s yeast(Saccharomyes

cerevisiae)

Bacterial pathogen

(Helicobacter pylori) Kelley et al. PNAS

2003http://www.pathblast.org

Sharan et al. RECOMB 2004

Pairwise Network Alignment

Page 36: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

3-way Comparison

S. cerevisiae• 4389 proteins• 14319 interactions

C. elegans• 2718 proteins• 3926 interactions

D. melanogaster• 7038 proteins• 20720 interactions

Page 37: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Putative Complexes Conserved in Yeast, Worm and Fly

173 complexes associated with: DNA, RNA and phosphorus metabolism intracellular transport regulation of transcription protein folding, synthesis and degradation homeostasiscell proliferation, development and growthRNA localization

Page 38: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.
Page 39: Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007.

Future Work

• Identifying the proteins that interact to form a molecular machine is only a first step. The next, and far more difficult step, is to construct detailed predictive dynamical models of the behavior of molecular machines, under both normal and perturbed conditions.