Lecture 1 – Sep 27, 2011
CSE 527 Computational Biology, Fall 2011
Instructor: Su-In Lee
TA: Christopher Miles
Monday & Wednesday 12:00-1:20, Johnson Hall (JHN) 022
Welcome to CSE 527: Computational Biology
Who is the instructor?
Prof. Su-In Lee, Assistant Professor
A joint faculty member in Computer Science & Engineering and Genome Sciences
Office hours: Wednesday 1:30-2:30
Research interests: developing machine learning techniques applied to computational biology (genetics, systems biology); predictive medicine, translational medicine
Introduction to probabilistic models: Bayesian networks, hidden Markov models; representation and learning
Part 2: Topics in computational biology and areas of active research: genetics, systems biology, predictive medicine, sequence analysis
Finding genetic factors for complex biological traits
Inferring biological networks from data
Comparative genomics
DNA/RNA sequence analysis
Course responsibilities
Class participation and attendance (10%): good answers to the questions asked in class; initiating a productive discussion.
Homework assignments (40%): four problem sets, due at the beginning of class. Up to 3 late days (24-hour periods) for the quarter. Collaboration allowed in teams of 2 or 3 students, with individual write-ups.
Final project (50%): groups of up to two students.
Project overview (1/2)
Topic: choose from the list of project topics on the course website, or come up with your own. Open-ended.
Project deliverables:
Project proposal (due 10/19)
Midterm report (due 11/16)
Final report (due 12/14)
Final presentations or poster session (12/7)
Project overview (2/2)
Final report: short report (up to 10 pages), conference-style presentation.
Successful project reports can be submitted to computational biology / ML conferences (ISMB, RECOMB, NIPS, ICML) or journals (PLoS journals, Nature journals, PNAS, Genome Research, and so on).
Reading material
Lecture notes: mostly based on recent papers and older seminal papers
Biological background: The Cell: A Molecular Approach by Cooper; Genetics: From Genes to Genomes by Hartwell et al.; Principles of Population Genetics by Hartl & Clark
Computational background: Probabilistic Graphical Models by Profs. Daphne Koller & Nir Friedman; Prof. Andrew Ng’s machine learning lecture notes (cs229.stanford.edu)
No textbook is required for the course.
Class resources
Course website – cs.washington.edu/527
Lecture notes, assignments, project topics, and deadlines of assignments and projects
Nucleotides per intron: 5,500 on average (max: 500k)
Nucleotides per gene: 45k on average (max: 2.2M)
From RNA to Protein
Proteins are long strings of amino acids joined by peptide bonds.
Translation from RNA sequence to amino acid sequence is performed by ribosomes.
There are 20 amino acids, so 3 RNA letters are required to specify a single amino acid.
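A short counting argument shows why three letters are needed: a code word of k bases, each drawn from 4 possible bases, can label at most 4^k different amino acids, and this must be at least 20.

```latex
4^1 = 4 < 20, \qquad 4^2 = 16 < 20, \qquad 4^3 = 64 \ge 20
```

So k = 3 is the shortest codon length that works, and the 64 possible codons outnumber the 20 amino acids, which leaves room for stop codons and for several codons per amino acid.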
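Amino acid
[Figure: general structure of an amino acid: an amino group (H2N) and a carboxyl group (COOH) attached to a central carbon that also carries an H and a side chain R.]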
There are 20 standard amino acids:
Alanine, Arginine, Asparagine, Aspartate, Cysteine,
Glutamate, Glutamine, Glycine, Histidine, Isoleucine,
Leucine, Lysine, Methionine, Phenylalanine, Proline,
Serine, Threonine, Tryptophan, Tyrosine, Valine
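Proteins
[Figure: a chain of amino acids, each linked to the previous and the next amino acid. The chain starts at the N-terminus (free amino group) and ends at the C-terminus (free carboxyl group, OH), matching the 5’ to 3’ direction in which the mRNA is read.]
N-terminus, C-terminus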
Translation
The ribosome (a complex of protein and RNA) synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid.
ribosome, codon
[Figure: ribosome bound to the mRNA, with its P site and A site labeled.]
The genetic code: mapping from a codon to an amino acid
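A minimal Python sketch of the genetic code as a lookup table, assuming the standard code (some organelles and organisms use variant codes). The compact 64-letter string is a common encoding of the standard table, not something taken from the lecture; `CODON_TABLE` and `codons_per_aa` are illustrative names.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, with the
# 1st, 2nd and 3rd codon positions each running over U, C, A, G.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# Map each of the 64 codons (e.g. "AUG") to a three-letter amino acid name.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}

# How many codons code for each amino acid (the redundancy of the code).
codons_per_aa = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))       # 64
print(CODON_TABLE["AUG"])     # Met
print(codons_per_aa["Stop"])  # 3
print(codons_per_aa["Leu"])   # 6
```

Because there are 64 codons but only 20 amino acids plus stop, most amino acids have more than one codon; Leu has six, while Met and Trp have one each.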
Translation
5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’
5’ UTR (. . . A U U), then codons: A U G = Met (start codon), G C C = Ala, U G G = Trp, A C U = Thr, U G A = Stop
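The translation example can be reproduced with a short sketch. It assumes the standard genetic code and a simplified start rule (translation begins at the first AUG); `translate` is a hypothetical helper, not a library function.

```python
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, UCAG order.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon.

    Simplification (assumed): the start site is simply the first AUG found.
    """
    mrna = mrna.replace(" ", "").upper()
    start = mrna.find("AUG")
    if start < 0:
        return []  # no start codon, nothing is translated
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "Stop":
            break
        protein.append(aa)
    return protein


# The mRNA fragment from the slide.
print(translate("A U U A U G G C C U G G A C U U G A"))
# ['Met', 'Ala', 'Trp', 'Thr']
```

The leading A U U is skipped as untranslated region, and translation halts at U G A, so the stop codon itself adds no amino acid.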
Translation
[Figure: tRNAs bring amino acids to the ribosome, which links Met, then Ala, then Trp as it reads 5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’.]
amino acid
Errors?
What if the transcription / translation machinery makes mistakes?
What is the effect of mutations in coding regions?
mutation
Reading Frames
G C U U G U U U A C G A A U U A G
[Figure: the same sequence shown several times, each time divided into codons starting at a different base, i.e. in a different reading frame.]
reading frame
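A sketch of the three forward reading frames of this sequence (the complementary strand would contribute three more, not shown). `reading_frames` is a hypothetical helper, and the codon table assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, UCAG order.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def reading_frames(seq):
    """Split seq into complete codons for each of the three forward frames."""
    seq = seq.replace(" ", "")
    return [
        [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for frame in range(3)
    ]


seq = "G C U U G U U U A C G A A U U A G"
for frame, codons in enumerate(reading_frames(seq)):
    print(frame, codons, [CODON_TABLE[c] for c in codons])
# 0 ['GCU', 'UGU', 'UUA', 'CGA', 'AUU'] ['Ala', 'Cys', 'Leu', 'Arg', 'Ile']
# 1 ['CUU', 'GUU', 'UAC', 'GAA', 'UUA'] ['Leu', 'Val', 'Tyr', 'Glu', 'Leu']
# 2 ['UUG', 'UUU', 'ACG', 'AAU', 'UAG'] ['Leu', 'Phe', 'Thr', 'Asn', 'Stop']
```

Frame 0 gives the Ala Cys Leu Arg Ile reading used on the mutation slides; frame 2 reaches a UAG stop codon at its fifth codon.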
Synonymous Mutation
G C U U G U U U A C G A A U U A G → Ala Cys Leu Arg Ile
Mutation A → G at base 9 (codon UUA → UUG; both code for Leu):
G C U U G U U U G C G A A U U A G → Ala Cys Leu Arg Ile
synonymous (silent) mutation, fourfold site
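The term “fourfold site” can be made concrete: a codon position is n-fold degenerate when n of the four possible bases there give the same amino acid, so every substitution at a fourfold site is synonymous. The hypothetical helper `degeneracy` below counts this under the standard genetic code (assumed). By this count, the change on this slide (UUA → UUG, third position) sits at a twofold site, since UUU and UUC code for Phe; the third position of GCU (Ala) is an example of a fourfold site.

```python
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, UCAG order.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def degeneracy(codon, pos):
    """Count bases at 0-based codon position `pos` that keep the same amino acid.

    4 = fourfold site (every substitution synonymous), 1 = nondegenerate;
    values of 2 or 3 are twofold or threefold sites.
    """
    aa = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa for base in BASES
    )


print(degeneracy("UUA", 2))  # 2: UUA/UUG = Leu, UUU/UUC = Phe
print(degeneracy("GCU", 2))  # 4: GCU/GCC/GCA/GCG = Ala
print(degeneracy("UGU", 1))  # 1: UUU Phe, UCU Ser, UAU Tyr, UGU Cys
```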
Missense Mutation
G C U U G U U U A C G A A U U A G → Ala Cys Leu Arg Ile
Mutation U → G at base 6 (codon UGU → UGG, Cys → Trp):
G C U U G G U U A C G A A U U A G → Ala Trp Leu Arg Ile
missense mutation
Nonsense Mutation
G C U U G U U U A C G A A U U A G → Ala Cys Leu Arg Ile
Mutation U → A at base 6 (codon UGU → UGA, a stop codon):
G C U U G A U U A C G A A U U A G → Ala STOP
nonsense mutation
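The three kinds of point substitution on these slides can be told apart by comparing the affected codon before and after the change. A minimal sketch, assuming the standard genetic code, reading in frame 0, and a substitution inside a complete codon; `classify_substitution` is a hypothetical name, and positions are 0-based indices into the sequence with spaces removed (base 9 on the slides is index 8).

```python
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, UCAG order.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def classify_substitution(seq, pos, new_base):
    """Classify a single-base substitution at 0-based `pos`, reading in frame 0.

    Assumes `pos` falls inside a complete codon. A lost stop codon is not
    treated separately and would be reported as "missense".
    """
    seq = seq.replace(" ", "")
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    start = pos - pos % 3  # first base of the codon containing pos
    old_aa = CODON_TABLE[seq[start:start + 3]]
    new_aa = CODON_TABLE[mutant[start:start + 3]]
    if new_aa == old_aa:
        return "synonymous"
    if new_aa == "Stop":
        return "nonsense"
    return "missense"


seq = "G C U U G U U U A C G A A U U A G"
print(classify_substitution(seq, 8, "G"))  # synonymous: UUA -> UUG, Leu -> Leu
print(classify_substitution(seq, 5, "G"))  # missense:   UGU -> UGG, Cys -> Trp
print(classify_substitution(seq, 5, "A"))  # nonsense:   UGU -> UGA, Cys -> Stop
```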
Frameshift
G C U U G U U U A C G A A U U A G → Ala Cys Leu Arg Ile
Deletion of one U from the U U U run at bases 6 to 8; every downstream codon shifts:
G C U U G U U A C G A A U U A G → Ala Cys Tyr Glu Leu
frameshift
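A sketch contrasting this frameshift with an in-frame deletion, assuming the standard genetic code; `translate_frame0` is a hypothetical helper. Deleting one base shifts every downstream codon, while deleting three consecutive bases that form a whole codon (here UUA) removes one amino acid and leaves the rest of the frame intact.

```python
from itertools import product

# Standard genetic code (assumed): one-letter codes, "*" = stop, UCAG order.
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def translate_frame0(seq):
    """Translate complete codons from the first base, up to and including a stop."""
    seq = seq.replace(" ", "")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "Stop":
            break
    return protein


original = "GCUUGUUUACGAAUUAG"
one_base_deleted = original[:6] + original[7:]   # drop one U of the UUU run
one_codon_deleted = original[:6] + original[9:]  # drop the whole UUA codon

print(translate_frame0(original))           # ['Ala', 'Cys', 'Leu', 'Arg', 'Ile']
print(translate_frame0(one_base_deleted))   # ['Ala', 'Cys', 'Tyr', 'Glu', 'Leu']
print(translate_frame0(one_codon_deleted))  # ['Ala', 'Cys', 'Arg', 'Ile']
```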
Transcription and translation
Illustration from Radboud University Nijmegen
Let’s see how this happens!
Transcription: http://www.youtube.com/watch?v=DA2t5N72mgw
Translation: http://www.youtube.com/watch?v=WkI_Vbwn14g&feature=related
Gene Expression Regulation
When should each gene be expressed? Cells regulate gene expression.
Examples:
Make more of gene A when substance X is present.
Stop making gene B once you have enough.
Make genes C1, C2, C3 simultaneously.
Why? Every cell has the same DNA, but each cell expresses different proteins.
Signal transduction: one signal is converted to another. A cascade has “master regulators” that turn on many proteins, each of which in turn turns on many proteins, ...
Regulation, signal transduction
Gene Regulation
Gene expression is controlled at many levels: DNA chromatin structure, transcription, post-transcriptional modification, RNA transport, translation, mRNA degradation, post-translational modification.
Transcription regulation
Much gene regulation occurs at the level of transcription.
Primary players: binding sites (BSs) in cis-regulatory modules (CRMs), transcription factor (TF) proteins, and RNA polymerase II.
Primary mechanism: TFs bind to BSs, a complex of TFs forms, and the complex assists or inhibits formation of the RNA polymerase II machinery.
Transcription Factor Binding Sites
Short, degenerate DNA sequences recognized by particular TFs.
In complex organisms, cooperative binding of multiple TFs is required to initiate transcription.
[Figure: binding sequences and the corresponding sequence logo.]
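In a sequence logo, each column's total height is its information content, and each letter's height is its frequency times that content. A minimal sketch, assuming DNA with a uniform background (so the maximum is 2 bits) and no small-sample correction; the aligned sites are made-up toy data, not real TF binding sites, and `information_content` and `letter_heights` are hypothetical names.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content (bits) of equal-length aligned DNA sites.

    IC_j = 2 - H_j, where H_j is the Shannon entropy of base frequencies at j.
    Assumes a uniform background and applies no small-sample correction.
    """
    n = len(sites)
    ic = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


def letter_heights(sites):
    """Height of each base in each logo column: frequency times information content."""
    n = len(sites)
    heights = []
    for j, ic in enumerate(information_content(sites)):
        counts = Counter(site[j] for site in sites)
        heights.append({base: counts[base] / n * ic for base in "ACGT"})
    return heights


# Toy, made-up aligned sites (illustration only, not real binding data).
toy_sites = ["ACGT", "ACGA", "ACTT", "ACGC"]
print([round(x, 3) for x in information_content(toy_sites)])  # [2.0, 2.0, 1.189, 0.5]
```

Fully conserved columns reach the 2-bit maximum, while the more variable last column contributes only 0.5 bits, which is why degenerate positions look short in a logo.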
Summary
All hereditary information is encoded in double-stranded DNA.
Each cell in an organism has the same DNA.
DNA → RNA → protein
Proteins have many diverse roles in the cell.
Gene regulation diversifies protein products within different cells.
Outline Course logistics
A zero-knowledge based introduction to biology
Potential project topics
Say that a cancer patient X undergoes chemotherapy. There are >200 drugs patient X could be treated with. How do doctors choose which drug to use in
How Well Can We Predict Disease-related Traits Based on DNA?
Example project topic #2 (1/2)
Standard approach: find a simple rule! This approach has failed to detect the DNA variation affecting many important traits.
[Figure: DNA sequences (e.g. …ACTCGGTAGACCTAAATTCGGCCCGG…, …ACCCGGTAGACCTTTATTCGGCCCGG…) of Individual 1 through Individual N (N instances), each paired with the individual's obesity status. At one highlighted position, individuals with A are thin and individuals with T are fat. The number of positions is p ≈ 10^6. Further labels: “cell, a complex system”, “environmental factors”, “Causality?”, and variants s1, s2, …, sp whose individual effects are too weak to be detected.]
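A minimal sketch of the “standard approach”: test each aligned position on its own for association with the trait. It assumes one letter (allele) per individual per position, a binary trait, and a Pearson chi-square statistic on the allele-by-trait table; the sequences and trait values are toy data invented for illustration, and the function names are hypothetical. Because each position is scored in isolation, variants whose individual effects are weak, or that matter only in combination, score poorly.

```python
from collections import Counter


def allele_trait_chi2(alleles, trait):
    """Pearson chi-square statistic for an (allele x trait) contingency table."""
    n = len(alleles)
    cells = Counter(zip(alleles, trait))
    allele_totals = Counter(alleles)
    trait_totals = Counter(trait)
    stat = 0.0
    for a, n_a in allele_totals.items():
        for t, n_t in trait_totals.items():
            expected = n_a * n_t / n
            stat += (cells[(a, t)] - expected) ** 2 / expected
    return stat


def single_position_scan(sequences, trait):
    """Score every aligned position separately, most associated first."""
    results = []
    for j in range(len(sequences[0])):
        alleles = [s[j] for s in sequences]
        if len(set(alleles)) < 2:
            continue  # monomorphic position: nothing to test
        results.append((j, allele_trait_chi2(alleles, trait)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy, made-up data: 6 individuals, 4 aligned positions; 1 = obese, 0 = thin.
sequences = ["ACGA", "ACGT", "TCGA", "TCCT", "ACCA", "TCGT"]
obese = [0, 0, 1, 1, 0, 1]
for position, stat in single_position_scan(sequences, obese):
    print(position, round(stat, 3))
# 0 6.0
# 3 0.667
# 2 0.0
```

Position 0 (A thin, T obese in every toy individual) dominates, mirroring the simple rule in the figure.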
One of the most important research problems in this area is to develop new computational methods that can represent more complicated interactions between sequence variation and traits.
Longitudinal study
Environmental factors: age, sex, smoking status