Bioinformatics A Biologist’s perspective Rob Rutherford
Dec 19, 2015
1. The Biologist’s perspective
2. A survey of tools
3. Training students for the future
If the biota, in the course of eons, has built something … who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering.
-Aldo Leopold (1887 - 1948)
Figure 1.18 Careful observation and measurement provide the raw data for science
PubMed added 400,000 new research articles in 2002.
NCBI-NLM, 2003
Productive Tinkerers
NIH-NLM 2003
(Cockerill 2003)
“If your experiment needs statistics, you ought to have done a better experiment.”
-Rutherford (the other one)
R. A. Fisher, 1956 (University of Adelaide Archives)
“To consult the statistician after an experiment is finished is … to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
Two Wings of Bioinformatics
Housekeeping bioinformatics: representation, storage, and distribution of data
Analytical bioinformatics: new tools for the discovery of knowledge in data
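Much of the housekeeping wing starts with plain-text formats for storing and exchanging sequences. Below is a minimal sketch of reading FASTA, one widely used format; `parse_fasta` is a hypothetical helper, and the sketch assumes the record ID is the first word of each header line and ignores rarer format variants. The two records reuse sequence fragments shown on the ribosome slide.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    The record ID is the first word after '>'; sequence lines are joined,
    blank lines are skipped, and lines before the first header are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>frag1 ribosome slide, first line
ggcacgaggc
acggctgtgc
>frag2 ribosome slide, second line
atctgcacgt
ggttatgctg
"""
records = parse_fasta(example.splitlines())
print(records)  # {'frag1': 'ggcacgaggcacggctgtgc', 'frag2': 'atctgcacgtggttatgctg'}
```

Splitting each record over two lines shows why the parser joins sequence lines rather than treating each line as a record.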
Part 2
A Survey of Problems/Opportunities
“The Central Dogma”
DNA: information warehouse (4 nucleic acid letters: a, t, g, c)
RNA: temporary copy of a gene
Protein: working cellular machine (20 amino acid letters)
RNA polymerase (PDB)
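The flow of information on this slide can be made concrete with a minimal sketch of transcription and translation using the standard genetic code. It assumes the input is the coding strand of an intron-free gene read in frame from its first base, and it stops at the first stop codon; real gene models are considerably more complicated. The function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) in the conventional
# U, C, A, G order: the first codon position varies slowest. '*' marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("atggccttttaa")
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```

Four DNA letters become 64 three-letter codons, which the table maps onto the 20 amino acid letters plus stop signals.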
A Survey of Problems
• Finding genes and understanding genes
• Protein structure and function
• Gene expression
• Networks
• Other areas
Finding and Understanding Genes
Receptors-GPCR (767)
Receptors-NHR (56)
Integrins (33)
Ion Channels (313)
Kinases (713)
Phosphatases (274)
Phosphodiesterases (58)
Neurotrans. transporters (34)
P450s (59)
Proteases (527)
Secreted (3621)
Other (53076)
Human genes: estimated gene number ~59,538
Rutherford
                      10        20        30        40
              ....*....|....*....|....*....|....*....|
consen     1  SPKNTPVVLIPKKGPGKYRPISlvDYKILNKATKKrFSpp  40
1MML      83  SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--  117
1HNI_B    54  NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev  90
1MU2_B    49  NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei  85
1D1U_A    69  SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--  103

                      50        60        70        80
              ....*....|....*....|....*....|....*....|
consen    41  qPGFRPGRSLLNKLKGS-KWFLKLDLKKAFDSIPHDPLLR  79
1MML     118  -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ  156
1HNI_B    91  qLGIPHPAGL-----KKKKSVTVLDVGDAYFSVPLDEDFR  125
1MU2_B    86  qLGIPHPAGL---AKK--RRITVLDVGDAYFSIPLHEDFR  120
1D1U_A   104  -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ  142
CnD3 HIV
Finding Conserved Regions/Domains
HIV protein
Comparing your sequence against models derived from curated families of known proteins
Thanks to Porterfield
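One common kind of family model is a position-specific scoring matrix (PSSM) built from aligned family members. The toy sketch below builds one from the first 20 aligned columns of the four structures in the alignment above and scores ungapped queries. Assumptions: a uniform background frequency of 1/20, a pseudocount of 1, and no gaps in the query; real domain searches use curated background frequencies, sequence weighting, and gapped alignment.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log-odds PSSM: log2(observed frequency / background frequency).

    A pseudocount is added for every residue so unseen residues get a finite
    (negative) score. Assumes a uniform 1/20 background; gaps ('-') are skipped.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned[0])):
        residues = [seq[col].upper() for seq in aligned if seq[col] != "-"]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], query: str) -> float:
    """Sum the column scores for an ungapped query of the profile's length."""
    if len(query) != len(pssm):
        raise ValueError("query length must equal profile length")
    return sum(column[aa] for column, aa in zip(pssm, query.upper()))


# First 20 aligned columns of the four structures in the alignment above.
family = [
    "SPWNTPLLPVKKPGTNDYRP",  # 1MML
    "NPYNTPVFAIKKKDSTKWRK",  # 1HNI_B
    "NPYNTPTFAIKKKDKNKWRM",  # 1MU2_B
    "SPWNTPLLPVKKPGTNDYRP",  # 1D1U_A
]
pssm = build_pssm(family)
print(score(pssm, "SPKNTPVVLIPKKGPGKYRP"))  # consensus segment: positive score
print(score(pssm, "MEEPQSDPSVEPPLSQETFS"))  # arbitrary unrelated string: negative score
```

A positive total means the query looks more like the family than like random sequence under these assumptions; a negative total means the opposite.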
Phylogenetics and Evolution
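Distance-based tree methods (UPGMA and neighbor joining, for example) begin with pairwise distances between aligned sequences. A minimal sketch of the simplest such distance, the p-distance (fraction of differing sites), using three rows of the alignment above; it skips gapped columns and applies no correction for multiple substitutions, and it stops short of building the tree itself.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


aligned = {
    "1MML": "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",
    "1HNI_B": "NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev",
    "1MU2_B": "NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei",
}
for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(seq1, seq2):.2f}")
```

Pairs with smaller distances would be joined first by a clustering method such as UPGMA.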
Protein Structure
Imaging Experimental X-ray diffraction data
Predicting structure in silico from sequence
Experimental structures in the Protein Data Bank
HIV reverse transcriptase
Goodsell, PDB
DNA (human genome)
RNA (HIV virus)
Protein
Structure is Function
Goodsell, PDB
Figure 17.0 Ribosome
Structural Predictions just from raw protein sequence?
1 ggcacgaggc acggctgtgc aggcacgcat gcaggccagc …
1 atctgcacgt ggttatgctg ccggagtttg ggccgccact …
CASP: Critical Assessment of Techniques for Protein Structure Prediction
A community-wide contest, held every two years, that tests protein structure prediction from primary sequence
An example:
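Judging a prediction means comparing predicted atom positions with the experimentally determined structure. CASP assessors rely on several measures (GDT_TS is a central one); the simplest to illustrate is root-mean-square deviation (RMSD). This sketch assumes the two coordinate sets are already superimposed and paired atom by atom; a real comparison would first compute the optimal superposition (for example with the Kabsch algorithm). The coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], experimental: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired atom coordinates (in angstroms).

    Assumes the structures are already superimposed and atoms are paired in
    order (e.g. one C-alpha per residue); no fitting is done here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = 0.0
    for p, e in zip(predicted, experimental):
        total += sum((pi - ei) ** 2 for pi, ei in zip(p, e))
    return math.sqrt(total / len(predicted))


# Hypothetical three-residue backbone: every predicted atom sits 1 A too high.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(predicted, experimental))  # 1.0
```

Lower is better; RMSD is sensitive to a few badly placed regions, which is one reason contest assessors also use other measures.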
Gene Expression
Sequencing RNA (ESTs)
Sequencing short tags from transcripts (SAGE)
Automation of in situ methods: DNA microarray technology
One spot for each gene
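In SAGE, each transcript is represented by a short sequence tag, classically the 10 bases just 3' of the 3'-most CATG site (the NlaIII recognition sequence), and tag counts estimate transcript abundance. A simplified sketch, assuming sense-strand cDNA strings and ignoring the linker, ditag, and sequencing-error steps of the real protocol; the transcript strings are invented.

```python
from collections import Counter
from typing import Iterable, Optional

ANCHOR = "CATG"  # NlaIII site, the anchoring enzyme in classic SAGE
TAG_LENGTH = 10


def sage_tag(transcript: str) -> Optional[str]:
    """Return the 10 bases after the 3'-most CATG, or None if no full tag exists."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(transcripts: Iterable[str]) -> Counter:
    """Tally tags; each count stands in for the abundance of one transcript."""
    return Counter(tag for tag in map(sage_tag, transcripts) if tag is not None)


transcripts = [
    "GGACATGAAACCCGGGTTTAAAA",
    "GGACATGAAACCCGGGTTTAAAA",
    "TTCATGCCCCATGGGGGAAAAATTTT",
    "AAAATTTT",  # no CATG site, so no tag
]
print(count_tags(transcripts))
```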
MicroArray
Microarray Expression Analysis
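Reference mixture vs. specific organ
Figure: expression map of genes (gene axis 0 to 400) across experimental conditions (H2O2, SDS, diamide, iron, NO, NO, SigE, SigH, IdeR, NrpR), colored from gene turned on to gene turned off; a cluster of low-O2 dormancy genes is marked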
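In a two-channel experiment like this, each spot yields a sample intensity (the specific organ or condition) and a reference intensity (the mixture), and the log2 ratio says whether the gene is turned on or off. A minimal sketch, assuming already-normalized intensities and a two-fold cutoff (a common convention, not taken from the slide); gene names and values are hypothetical, and real analyses also use replicates and statistical tests.

```python
import math
from typing import Dict


def call_changes(sample: Dict[str, float], reference: Dict[str, float],
                 fold_threshold: float = 2.0) -> Dict[str, str]:
    """Label each gene 'on', 'off', or 'unchanged' from log2(sample / reference).

    'on' if the ratio is at least fold_threshold, 'off' if it is at most
    1 / fold_threshold, otherwise 'unchanged'.
    """
    cutoff = math.log2(fold_threshold)
    calls = {}
    for gene, ref_value in reference.items():
        log_ratio = math.log2(sample[gene] / ref_value)
        if log_ratio >= cutoff:
            calls[gene] = "on"
        elif log_ratio <= -cutoff:
            calls[gene] = "off"
        else:
            calls[gene] = "unchanged"
    return calls


sample = {"geneA": 800.0, "geneB": 50.0, "geneC": 210.0}      # specific organ
reference = {"geneA": 100.0, "geneB": 400.0, "geneC": 200.0}  # reference mixture
print(call_changes(sample, reference))
# {'geneA': 'on', 'geneB': 'off', 'geneC': 'unchanged'}
```

Working in log2 makes up- and down-regulation symmetric: an 8-fold increase is +3 and an 8-fold decrease is -3.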
Figure 1.3 Some properties of life
Figure 1.23x1 Biotechnology laboratory
Metabolic Pathway Map
Building Transcriptional Network Map
Networks
• Biochemical pathways
• Signaling networks
• Transcriptional networks
• Computational neuroscience
Scientific American 2001
Microarrays uncover networks of interactions…
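Once interactions are known, a transcriptional network map can be stored as a directed graph (regulator to target), and "which genes does this regulator influence, directly or indirectly?" becomes a graph search. A minimal sketch using breadth-first search over an adjacency list; the network and gene names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical transcriptional network: regulator -> genes it directly controls.
NETWORK: Dict[str, List[str]] = {
    "regA": ["regB", "gene1"],
    "regB": ["gene2", "gene3"],
    "gene1": [],
    "gene2": ["gene4"],
}


def downstream(network: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search for every gene reachable from `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(downstream(NETWORK, "regA")))  # ['gene1', 'gene2', 'gene3', 'gene4', 'regB']
```

The `seen` set keeps the search from looping forever if the network contains feedback cycles.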
Other Opportunities
• Organismal physiology
• Populations
• Communities
• Ecosystems
Same issues in “Macro” Biology
Long history of mathematical modeling
Huge datasets from:
• GPS/GIS
• Remote sensing
If the biota, in the course of eons, has built something … who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering.
-Aldo Leopold (1887 - 1948)
Where is all this leading to?
Part 3
How do we prepare our students for this future?
Dr. Peter Munson
Head of the Mathematical and Statistical Computing Laboratory Division of Computational Biosciences National Institutes of Health
Ole’ pre 1976
The Tool Builders
• Excellent mathematical skills (algorithms, linear algebra, data structures)
• Comfort in a Linux/Unix environment and knowledge of Perl and C/C++
• A deep background in two or more advanced areas of biology, with chemistry prerequisites
• Graduate training
The Systems Biologist
A biologist who is an intelligent and skeptical consumer of large data sets
• Probability and statistics
• SQL and database basics
• Equilibrium and rates of change (calculus)
• Exposure to system-level data
And who knows how and when to collaborate(!)
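As a taste of "SQL and database basics", here is a minimal sketch that stores expression results in a table and asks a biological question in SQL. It uses SQLite through Python's built-in `sqlite3` module; the table name, columns, and values are hypothetical.

```python
import sqlite3

# In-memory database holding a hypothetical expression table (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT, condition TEXT, log2_ratio REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("geneA", "low_O2", 2.5),
        ("geneA", "iron", 0.1),
        ("geneB", "low_O2", -1.8),
        ("geneC", "low_O2", 1.2),
    ],
)

# Genes at least two-fold up (log2 ratio >= 1) under low oxygen, strongest first.
rows = conn.execute(
    "SELECT gene, log2_ratio FROM expression "
    "WHERE condition = ? AND log2_ratio >= 1 "
    "ORDER BY log2_ratio DESC",
    ("low_O2",),
).fetchall()
print(rows)  # [('geneA', 2.5), ('geneC', 1.2)]
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is the safe habit to teach from the start.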
end