Bioinformatics & Computational Biology
Podcast for Frontiers in Biology - ISU 7/13/06
Thanks to Mark Gerstein (Yale) & Eric Green (NIH) for many borrowed & modified PPTs
Drena Dobbs
Genetics, Development and Cell Biology
Bioinformatics & Computational Biology
Iowa State University
What is Bioinformatics? (& What is Computational Biology?)
Wikipedia: • Bioinformatics & computational biology involve the use of techniques from mathematics, informatics, statistics, and computer science (& engineering) to solve biological problems
Gerstein: • (Molecular) Bioinformatics is conceptualizing biology in terms of molecules & applying “informatics” techniques - derived from disciplines such as mathematics, computer science, and statistics - to organize and understand information associated with these molecules, on a large scale
Modified from Mark Gerstein
What is the Information? Biological Sequences, Structures, Processes
Central Dogma of Molecular Biology
• DNA sequence -> RNA -> Protein -> Phenotype
• Molecules: Sequence, Structure, Function
• Processes: Mechanism, Specificity, Regulation
Central Paradigm for Bioinformatics
• Genomic (DNA) Sequence -> mRNA & other RNA sequences -> Protein sequence -> RNA & Protein Structure -> RNA & Protein Function -> Phenotype
• Large amounts of information: standardized, statistical
(Idea from D Brutlag, Stanford; graphics from S Strobel.) Modified from Mark Gerstein
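The first two arrows of the paradigm (DNA -> mRNA -> protein sequence) are simple enough to sketch in code. This is a minimal illustration, not a gene-finding tool: it assumes the input is the coding (sense) strand of an intron-free gene read from its first base, so transcription reduces to replacing T with U, and translation uses the standard genetic code (NCBI translation table 1). The names `transcribe` and `translate` are hypothetical helpers chosen for this example.

```python
# Standard genetic code (NCBI translation table 1).
# Codons are ordered by first, second, third base, each cycling through U, C, A, G;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base; stop at the first stop codon.

    Raises ValueError if a codon contains a letter other than U, C, A, G.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        index = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
        amino_acid = AMINO_ACIDS[index]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```

Real genes add complications this sketch ignores: introns must be spliced out, the reading frame must be found, and some organisms and organelles use variant genetic codes.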
Explosion of "Omes" & "Omics"! Genome, Transcriptome, Proteome
• Genome - the complete collection of DNA (genes and "non-genes") of an organism
• Transcriptome - the complete collection of RNAs (mRNAs & others) expressed in an organism *
• Proteome - the complete collection of proteins expressed in an organism *
* Note: the set of specific RNAs or proteins expressed varies greatly in different cells and tissues -- and critically depends on the age, developmental stage, disease state, etc. of the organism
Molecular Biology Information: DNA & RNA Sequences
Functions:
• Genetic material
• Information transfer (mRNA)
• Protein synthesis (tRNA/mRNA)
• Catalytic & regulatory activities (some very new!)
Information:
• 4-letter alphabet (DNA nucleotides: A, G, C, T)
• ~1,000 base pairs in a small gene
• ~3 x 10^9 bp in a genome (human)
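The two numbers on this slide combine into a quick storage estimate. With a 4-letter alphabet each base carries log2(4) = 2 bits, so ~3 x 10^9 bp is about 6 x 10^9 bits, or roughly 750 MB for one strand of one copy of the genome. The sketch below does that arithmetic and shows a hypothetical 2-bit packing of a short sequence; the function names are made up for illustration, and the encoding ignores ambiguous bases such as N.

```python
import math

# Hypothetical 2-bit code: one integer 0..3 per nucleotide.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def genome_megabytes(base_pairs: int, alphabet_size: int = 4) -> float:
    """Storage needed at log2(alphabet_size) bits per base, in megabytes (10^6 bytes)."""
    bits = base_pairs * math.log2(alphabet_size)
    return bits / 8 / 1e6


def pack_2bit(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base, first base most significant."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack_2bit(value: int, length: int) -> str:
    """Invert pack_2bit; the length is needed because leading A's (00) are otherwise lost."""
    return "".join(LETTERS[(value >> shift) & 3]
                   for shift in range(2 * (length - 1), -1, -2))


print(genome_megabytes(3 * 10**9))  # 750.0
print(pack_2bit("ACGT"))            # 27 (binary 00 01 10 11)
print(unpack_2bit(27, 4))           # ACGT
```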
Functions: Most cellular functions are performed or facilitated by proteins
• What is this protein?
• Which amino acids are most important -- for folding, activity, interaction with other proteins?
• Which sequence variations are harmful (or beneficial)?
Modified from Mark Gerstein
Molecular Biology Information: Macromolecular Structures
DNA/RNA/Protein Structures
• How does a protein (or RNA) sequence fold into an active 3-dimensional structure?
• Can we predict structure from sequence?
• Can we predict function from structure (or perhaps, from sequence alone?)
Modified from Mark Gerstein
We don't yet understand the protein folding code - but we try to engineer proteins anyway!
Modified from Mark Gerstein
Molecular Biology Information: Biological Processes
Functional Genomics
• How do patterns of gene expression determine phenotype?
• Which genes and proteins are required for differentiation during development?
• How do proteins interact in biological networks?
• Which genes and pathways have been most highly conserved during evolution?
On a Large Scale?
Whole Genome Sequencing
Genome sequences now accumulate so quickly that, in less than a week, a single laboratory can produce more bits of data than Shakespeare managed in a lifetime, although the latter make better reading.
-- G A Petsko, Nature 401: 115-116 (1999)
Modified from Mark Gerstein
Next Step after the Sequence?
• Expression Analysis
• Structural Genomics
• Protein Interactions
• Pathway Analysis
• Systems Biology
Understanding Gene Function on a Genomic Scale
Evolutionary Implications of:
• Introns & Exons
• Intergenic Regions as "Gene Graveyard"
Modified from Mark Gerstein
Gene Expression Data: the Transcriptome
Microarray Data
Yeast Expression Data:
• Levels for all 6,000 genes!
• Experiments to investigate how genes respond to changes in environment, or how patterns of expression change in normal vs. cancerous tissue
(Courtesy of J Hager) Modified from Mark Gerstein
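A common first pass over two-condition expression data like this (e.g. normal vs. treated) is the log2 ratio of each gene's level, where +1 means a 2-fold increase and -1 a 2-fold decrease. The sketch below assumes expression levels have already been measured and normalized; the gene names and intensities are made up, and the 2-fold cutoff is an arbitrary illustrative threshold. Real microarray analyses also need normalization, replicates, and statistical testing.

```python
import math


def log2_fold_changes(control: dict, treated: dict) -> dict:
    """Return {gene: log2(treated / control)} for genes measured in both conditions."""
    return {gene: math.log2(treated[gene] / control[gene])
            for gene in control if gene in treated}


def changed_genes(fold_changes: dict, threshold: float = 1.0) -> list:
    """Genes with |log2 ratio| >= threshold (1.0 means at least a 2-fold change)."""
    return sorted(gene for gene, lfc in fold_changes.items() if abs(lfc) >= threshold)


# Hypothetical, made-up intensities for three genes (not real yeast data).
control = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0}
treated = {"geneA": 400.0, "geneB": 240.0, "geneC": 20.0}

lfc = log2_fold_changes(control, treated)
print(changed_genes(lfc))  # ['geneA', 'geneC']
```

Working in log2 makes up- and down-regulation symmetric: a 4-fold increase (+2) and a 4-fold decrease (-2) sit at the same distance from zero.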
ISU's Biotechnology Facilities include state-of-the-art Microarray & Proteomics instrumentation
Other Whole-Genome Experiments
Systematic Knockouts:
Make "knockout" (null) mutations in every gene - one at a time - and analyze the resulting phenotypes!
For yeast: 6,000 KO mutants!
2-hybrid Experiments:
For each (and every) protein, identify every other protein with which it interacts!
For yeast: 6,000 x 6,000 / 2 ~ 18M possible pairwise interactions to test!!
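The scale of these two experiment types can be checked with a little arithmetic: knockouts grow linearly with gene count, while all-against-all interaction screens grow with the square. The slide's 6,000 x 6,000 / 2 is a back-of-envelope count of unordered pairs; the exact count of distinct pairs is "6,000 choose 2", and a screen that also tests each protein with itself (homodimers), or tests every bait/prey orientation, comes out slightly or substantially higher. The 6,000-gene figure is taken from the slide.

```python
from math import comb

n_genes = 6000                        # approximate yeast gene count, from the slide

knockouts = n_genes                   # one null mutant per gene: linear growth
approx_pairs = n_genes * n_genes // 2 # the slide's estimate
distinct_pairs = comb(n_genes, 2)     # unordered pairs of two different proteins
with_self = distinct_pairs + n_genes  # also test each protein with itself
ordered_pairs = n_genes * n_genes     # every protein as bait against every protein as prey

print(knockouts, approx_pairs, distinct_pairs, with_self, ordered_pairs)
# 6000 18000000 17997000 18003000 36000000
```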
Modified from Mark Gerstein
Molecular Biology Information: Integrating Data
• Understanding the function of genomes requires integration of many diverse and complex types of information: metabolic pathways; regulatory networks; whole-organism physiology; evolution, phylogeny; environment, ecology; literature (MEDLINE)
Modified from Mark Gerstein
Storing & Analyzing Large-scale Information:
Exponential Growth of Data Matched by Development of Computer Technology
CPU vs Disk & Net
• Both the increase in computer speed and the ability to store large amounts of information on computers have been crucial
• Improved computing resources have been a driving force in Bioinformatics
Modified from Mark Gerstein (Internet picture adapted from D Brutlag, Stanford)
ISU's supercomputer "CyBlue" is among the 100 most powerful in the world
Bioinformatics is born! & more Bioinformaticists are needed!
(Courtesy of Finn Drablos)
(Internet picture adapted from D Brutlag, Stanford)
Modified from Mark Gerstein
Weber Cartoon from Mark Gerstein
“Informatics” techniques in Bioinformatics
• Databases: building, querying, object-oriented DB
• String Comparison: text search, alignment, significance statistics
• Finding Patterns: machine learning, data mining, statistics, linguistics
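"Alignment" under String Comparison usually means dynamic programming. Below is a minimal sketch of Needleman-Wunsch global alignment, which scores the best end-to-end alignment of two sequences. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions for illustration; real protein alignments typically use substitution matrices such as BLOSUM62 and affine gap penalties. Only the score is computed (no traceback), and significance statistics are not covered.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score of a full-length alignment of a and b.

    score[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,               # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4: four matches
print(global_alignment_score("ACGT", "AGT"))   # 2: three matches, one gap (A-GT)
```

The table has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the sequence lengths, which is one reason database-scale searches rely on faster heuristics.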
What are the shared parts (bolt, nut, washer, spring, bearing), unique parts (cogs, levers)? What are the common parts -- types of parts (nuts & washers)?
How many roles can these play? How flexible and adaptable are they mechanically?
(Figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center.) Modified from Mark Gerstein
Application II: Finding homologs
Modified from Mark Gerstein
Finding WHAT? Homologs - "same genes" in different organisms
• Human vs. Mouse vs. Yeast
Much easier to do experiments on yeast!
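Homologs often share only a conserved region (for example a domain) rather than matching end to end, so homolog searches rely on local alignment. The sketch below is a minimal Smith-Waterman scorer that finds the best-scoring pair of substrings; tools such as BLAST (whose BLASTX P-values appear in the table that follows) use heuristics to approximate this kind of search quickly at database scale. The scoring values and the toy sequences are assumptions for illustration, and a raw score alone does not establish homology without a significance estimate.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman: score of the best local alignment between substrings of a and b.

    Cells are floored at 0, so a poorly matching prefix never drags down a
    good match later in the sequences. Only two rows are kept in memory.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0,
                             diagonal,               # align a[i-1] with b[j-1]
                             previous[j] + gap,      # gap in b
                             current[j - 1] + gap)   # gap in a
            best = max(best, current[j])
        previous = current
    return best


# Toy sequences: only the shared core "ACGT" aligns well.
print(local_alignment_score("XXXACGTYYY", "ZZACGTZZ"))  # 8: four matches x 2
print(local_alignment_score("AAAA", "CCCC"))            # 0: no similarity
```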
Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins
Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description