Gene Prediction 10/21/05
D Dobbs ISU - BCB 444/544X 1
10/21/05 D Dobbs ISU - BCB 444/544X: Gene Prediction 1
10/21/05
Gene Prediction
(formerly Gene Prediction - 3)
Announcements
Exam 2 - next Friday
Posted online: Exam 2 Study Guide
544 Reading Assignment (2 papers)
Announcements
544 Semester Projects - Information needed:
Please send email to me (or David)
Briefly describe:
• Your background & current grad research
• Is there a problem related to your research you would like to learn more about & develop as a project for this course?
or
• What would your ‘dream’ project be?
Announcements
2 Bioinformatics Seminars today (Fri Oct 21)
12:10 PM BCB Faculty Seminar in E164 Lagomarcino
“Protein Networks”
Bob Jernigan, BBMB & Director, Baker Center for Bioinformatics & Biological Statistics
http://www.bcb.iastate.edu/courses/BCB691-F2005.html#Oct%2021
4:10 PM GDCB Special Seminar in 1414 MBB
“Integrating the Unknown-eome with Abiotic Stress Response Networks in Arabidopsis”
Ron Mittler, Dept. of Biochem & Mol Biology, University of Nevada, Reno
Gene Prediction & Regulation
Mon - Gene structure review: Eukaryotes vs prokaryotes
Wed - Regulatory regions: Promoters & enhancers
Fri - Predicting genes - Predicting regulatory regions (?)
• Next week: Predicting RNA structure (miRNAs, too)
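Optional Reading
Reviews:
1) Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709
http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
2) Wasserman WW & Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5:276-287
http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html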
UniGene: unique genes via ESTs
• Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many ESTs
• UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance and its regional distribution.
ORFs, codon usage
What other types of information can be used?
• cDNAs & ESTs (experimental data, pairwise alignment)
• homology (sequence comparison, BLAST)
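As a rough illustration of the "ORFs" signal mentioned on this slide, the sketch below scans the three forward reading frames of a DNA string for ATG...stop open reading frames. The function names (`find_orfs`, `reverse_complement`) and the `min_codons` cutoff are assumptions made for this example, not part of any tool named in the lecture; real gene finders use far richer models.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, so the other strand can be scanned too."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, int]]:
    """Find ATG...stop ORFs in the three forward frames of seq.

    Returns (frame, start, end) tuples; start is 0-based and end is
    exclusive and includes the stop codon. min_codons (hypothetical
    cutoff) counts codons from the ATG up to, not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

To cover all six frames, call `find_orfs(seq)` and `find_orfs(reverse_complement(seq))`; coordinates from the second call refer to the reverse strand.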
Automated gene prediction strategies
1) Similarity-based or Comparative
• BLAST - Do other organisms have similar sequence? (Is sequence similar to known gene or protein?)
2) Ab initio = “from the beginning”
• Predict without explicit comparison with cDNA or proteins via “rule-based” gene models - but rules are derived from statistical analysis of datasets
3) Combined "evidence-based"
• Combine gene models with alignment to known ESTs & protein sequences
BEST RESULTS? Combined
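One simple picture of a "rule derived from statistical analysis of datasets" is a codon-usage log-odds score: codon frequencies are estimated from known coding sequences, and a candidate region is scored against a background. The sketch below is a minimal illustration under that assumption; the function names, the pseudocount, and the uniform background in the usage note are hypothetical choices, and none of the programs named in this lecture is claimed to work exactly this way.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(seqs: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate codon frequencies from in-frame sequences (frame 0).

    A pseudocount keeps unseen codons from getting zero probability.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with N etc.
                counts[codon] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def coding_log_odds(seq: str, coding: Dict[str, float], background: Dict[str, float]) -> float:
    """Sum of log2(coding/background) over the codons of seq in frame 0.

    Positive totals mean the codon usage looks more like the training set
    than like the background.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding:
            score += math.log2(coding[codon] / background[codon])
    return score
```

For example, with `coding = codon_frequencies(known_cds_list)` and a uniform background `{c: 1 / 64 for c in CODONS}`, regions rich in the training set's preferred codons score above zero. In practice the training sequences must come from the organism being annotated, which is one reason results depend on the organism.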
Examples of gene prediction software
1) Similarity-based or Comparative
• BLAST
• SGP2 (extension of GeneID)
2) Ab initio = “from the beginning”
• GeneID - (used in lab this week)
• GENSCAN - (used in lab this week)
• GeneMark.hmm - (should try this!)
3) Combined "evidence-based"
• GeneSeqer (Brendel et al., ISU)
BEST? GENSCAN, GeneMark.hmm, GeneSeqer, but depends on organism & specific task
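Let $S = s_{-l}\, s_{-l+1}\, s_{-l+2} \ldots s_{-1}\, \mathrm{GT}\, s_1\, s_2\, s_3 \ldots s_r$

$$P\{H \mid S\} = \frac{P\{S \mid H\}\, P\{H\}}{\sum_{H} P\{S \mid H\}\, P\{H\}}$$

where H indexes the hypotheses of GT or AG at
- True site in reading phase 1, 2, or 0
- False within-exon site in reading phase 1, 2, or 0
- False within-intron site

$$P\{S\} = p_{-l}\{s_{-l}\} \prod_{i=-l+1}^{r} p_i\{s_i \mid s_{i-1}\} = p_{-l}\{s_{-l}\} \prod_{i=-l+1}^{r} \frac{f_{i-1,i}(s_{i-1}, s_i)}{f_{i-1}(s_{i-1})}$$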
Brendel 2005
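The Bayes rule on this slide can be sketched in a few lines: train one position-specific first-order Markov model per hypothesis, then normalize likelihood times prior over all hypotheses. This is a simplified illustration, not the GeneSeqer implementation: it assumes the candidate window is given as one fixed-length string that includes the GT, adds pseudocounts to the frequencies, and all names (`train_markov`, `log_likelihood`, `posterior`) are hypothetical. The slide uses seven hypotheses; the function accepts any number.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], List[Dict[str, Dict[str, float]]]]


def train_markov(sites: List[str], alphabet: str = "ACGT", pseudocount: float = 1.0) -> Model:
    """Estimate p(s_first) and p_i(s_i | s_{i-1}) from aligned, equal-length sites."""
    length = len(sites[0])
    first = {a: pseudocount for a in alphabet}
    trans = [{a: {b: pseudocount for b in alphabet} for a in alphabet}
             for _ in range(length - 1)]
    for s in sites:
        first[s[0]] += 1
        for i in range(1, length):
            trans[i - 1][s[i - 1]][s[i]] += 1  # dinucleotide count at (i-1, i)
    total = sum(first.values())
    first = {a: v / total for a, v in first.items()}
    for pos in trans:
        for a in alphabet:
            row_total = sum(pos[a].values())  # count of s_{i-1} = a at this position
            for b in alphabet:
                pos[a][b] /= row_total
    return first, trans


def log_likelihood(seq: str, model: Model) -> float:
    """log P{S | H} under the position-specific first-order Markov model."""
    first, trans = model
    ll = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(trans[i - 1][seq[i - 1]][seq[i]])
    return ll


def posterior(seq: str, models: Dict[str, Model], priors: Dict[str, float]) -> Dict[str, float]:
    """P{H | S} = P{S | H} P{H} / sum over H of P{S | H} P{H}, computed in log space."""
    log_joint = {h: log_likelihood(seq, m) + math.log(priors[h]) for h, m in models.items()}
    top = max(log_joint.values())
    weights = {h: math.exp(v - top) for h, v in log_joint.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}
```

Working in log space and subtracting the maximum before exponentiating avoids underflow when windows are long. Training sites for each hypothesis would come from annotated genes; any short example sequences used to try this out are toy data, not real splice sites.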
Other Resources
Current Protocols in Bioinformatics
http://www.4ulr.com/products/currentprotocols/bioinformatics.html
Finding Genes
4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
4.2 Using MZEF To Find Internal Coding Exons
4.3 Using GENEID to Identify Genes
4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences