BIOS816/VBMS818 Lecture 7 – Gene Prediction Guoqing Lu Office: E115 Beadle Center Tel: (402) 472-4982 Email: glu3@unl.edu Website: http://biocore.unl.edu

Dec 20, 2015

Page 1:

BIOS816/VBMS818

Lecture 7 – Gene Prediction

Guoqing Lu
Office: E115 Beadle Center
Tel: (402) 472-4982
Email: glu3@unl.edu
Website: http://biocore.unl.edu

Page 2:

Genes

• Protein coding genes
– ORF
– Regulatory signals
• Depend on organism
• RNA genes
– rRNA
– tRNA
– snRNA, others…

Page 3:

Prokaryotic Gene Expression

[Figure: a promoter, Cistron1, Cistron2 … CistronN, and a terminator are transcribed by RNA polymerase into one mRNA (5’ to 3’). Ribosomes, tRNAs, and protein factors translate the mRNA into polypeptides 1, 2 … N, each with its own N and C terminus.]

Page 4:

Eukaryotic Gene Expression

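[Figure: an enhancer, a promoter, a transcribed region (Exon1, Intron1, Exon2), and a terminator. RNA polymerase II transcribes the primary transcript (5’ to 3’), which is capped (7mG), spliced, and cleaved/polyadenylated (An). The processed mRNA is transported and translated into a polypeptide (N to C terminus).]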

Page 5:

Gene Finding

• Comparative
– Compare your sequence to what is already known
– BLASTN, BLASTX
• Predictive: stitch together a consensus
– HMM, GRAIL…
– Frames, Testcode
– Findpatterns…
• Empirical approach
– cDNA OR protein OR genetic evidence

Page 6:

ORF Characteristics

• Primary characters
– Start codon (ATG)
– Stop codon (TAA, TAG, TGA)
• Secondary characters
– Codon bias
– Biased nucleotide distribution
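The primary characters above are enough for a naive ORF scan. The following is a minimal sketch, not the algorithm used by Frames, Vector NTI, or ORF Finder: it walks all six reading frames and reports every stretch that opens with ATG and closes with the first in-frame stop codon. The function names and the `min_codons` parameter are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end, orf) for each ATG...stop stretch.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon. min_codons counts every codon
    from the start codon through the stop codon.
    """
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # the first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the lecture.
    for hit in find_orfs("CCATGAAATTTGGGTAACC"):
        print(hit)
```

A scan like this uses only the primary characters; the secondary characters (codon bias, biased nucleotide distribution) help decide which of the reported ORFs are likely to be real genes.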

Page 7:

ORF finding tools

• GCG
– Frames, Map
• VectorNTI
– ORF
• WWW tools
– ORF Finder (NCBI)
– …

Page 8:

Vector NTI - ORF

[Figure: ORFs of the lac operon. E. coli lac operon, 7477 bp, GI: 146575, with four annotated coding sequences: CDS 1 (lacI), CDS 2 (lacZ), CDS 3 (lacY), CDS 4 (lacA).]

Page 9:

Statistical analysis as a means to find genes

• ORF example

• Codon Bias

• Fickett’s Statistic

Page 10:

Codon Bias

• The genetic code is degenerate
• Codon usage varies
– organism to organism
– gene to gene
• High bias correlates with high-level expression
• Bias correlates with tRNA isoacceptors
• Change bias or tRNAs, change expression

Page 11:

Codon Bias: Gene Differences

GCG: CodonFrequency

Amino acid  Codon  Count  Fraction
Gly         GGG        6      0.21
Gly         GGA        5      0.17
Gly         GGT       11      0.38
Gly         GGC        7      0.24

Codon      GAL4   ADH1
Gly GGG    0.21   0
Gly GGA    0.17   0
Gly GGT    0.38   0.93
Gly GGC    0.24   0.07
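The fraction column is each codon's count divided by the total count for its synonymous family. A minimal sketch of that arithmetic, using the Gly counts from the slide (the function name is an illustrative assumption, not part of GCG):

```python
def synonymous_fractions(counts):
    """Map each codon to its share of the synonymous family's total count."""
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in counts}
    return {codon: n / total for codon, n in counts.items()}


# Gly codon counts as listed on the slide.
gly_counts = {"GGG": 6, "GGA": 5, "GGT": 11, "GGC": 7}

for codon, fraction in synonymous_fractions(gly_counts).items():
    print(codon, round(fraction, 2))
```

Rounded to two decimals, these fractions reproduce the 0.21, 0.17, 0.38, and 0.24 shown for Gly.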

Page 12:

Codon Bias: Organism Differences

Codon   Pc     Ml
CCU     12.8    3.4
CCC      1.7   17.6
CCA     22.4    1.2
CCG      4.9   26.2

Page 13:

Codon Bias Calculation

$$\mathrm{Pref} = \frac{\text{frequency} \,/\, \text{synonymous family frequency}}{\text{frequency in random} \,/\, \text{family frequency in random}}$$

• Bias > 1 in CORRECT frame
• Bias < 1 in incorrect frame

Gly GGG   6  0.21
Gly GGA   5  0.17
Gly GGT  11  0.38
Gly GGC   7  0.24
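The preference ratio can be worked through for a single codon. The sketch below assumes that "frequency in random" means the product of the three base frequencies in a random sequence; the slide does not define the random model, so the uniform composition used here (0.25 per base) and the function names are assumptions for illustration only.

```python
def random_codon_frequency(codon, base_freq):
    """Expected frequency of a codon in a random sequence with the given base composition."""
    p = 1.0
    for base in codon:
        p *= base_freq[base]
    return p


def codon_preference(codon, family, usage, base_freq):
    """Pref = (frequency / family frequency) / (random frequency / random family frequency)."""
    observed = usage[codon] / sum(usage[c] for c in family)
    expected = random_codon_frequency(codon, base_freq) / sum(
        random_codon_frequency(c, base_freq) for c in family
    )
    return observed / expected


# Gly counts from the slide; uniform base composition is an assumption.
gly_usage = {"GGG": 6, "GGA": 5, "GGT": 11, "GGC": 7}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

for codon in gly_usage:
    print(codon, round(codon_preference(codon, list(gly_usage), gly_usage, uniform), 2))
```

Under these assumptions GGT scores above 1 (preferred) and GGA below 1. Averaging such values over a window in each reading frame is what lets the correct frame stand out.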

Page 14:

Codon-Biased Gene: Ribosomal Protein S2, EF-Ts

[Figure: codon bias plots for Frame 2 and Frame 3, with the rpsB and tsf genes marked.]

Page 15:

Fickett’s Statistic

[Figure: Fickett’s statistic plotted across the rpsB and tsf genes.]

- analyzes the local nonrandomness at every third base in the sequence in a frame-independent fashion
- does not use codon frequency statistics
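One ingredient of this kind of statistic can be sketched directly: for each base, count how often it falls at each of the three codon positions and compare the largest count with the smallest. The sketch below computes only that position asymmetry, using max/(min + 1) as the ratio. It is not the full TESTCODE procedure, which also uses base composition and empirically derived tables that are not reproduced here.

```python
def position_asymmetry(seq):
    """For each base, return max/(min + 1) of its counts at positions 0, 1, 2 mod 3.

    A strong every-third-base periodicity gives large values; a sequence
    with no periodicity gives values near 1 for long windows.
    """
    seq = seq.upper()
    result = {}
    for base in "ACGT":
        counts = [0, 0, 0]
        for i, b in enumerate(seq):
            if b == base:
                counts[i % 3] += 1
        result[base] = max(counts) / (min(counts) + 1)
    return result


# Hypothetical toy sequences: the same repeat read in two different frames.
print(position_asymmetry("ATG" * 10))
print(position_asymmetry("TGA" * 10))
```

Because only the largest and smallest of the three counts matter, shifting the reading frame does not change the result, which is the sense in which the measure is frame independent.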

Page 16:

Error-rich DNA: Fickett’s

[Figure: Fickett’s statistic for the normal sequence and for a corrupted copy (1% substitution, 2 indels).]

Page 17:

ORF Found, Now What?

• Find ORFs: the biggest target, and the easiest to find
• Find promoter elements
– Should be upstream of the 5’-most ORF
• Remember, one promoter can regulate expression of multiple cistrons
– May have ambiguous sequence
• Find ribosome binding site(s) and start codon(s)
– One WITHIN each ORF (cistron), near the 5’ end
– RBS is close to (~5-10 nt) and upstream of the start codon
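The spacing rule for the ribosome binding site can be turned into a simple check. This is a minimal sketch that assumes the RBS is an exact copy of the Shine-Dalgarno consensus AGGAGG; real sites are often partial matches, and the `motif` default and function name are illustrative assumptions.

```python
def find_rbs(seq, start_pos, motif="AGGAGG", min_gap=5, max_gap=10):
    """Return the index of `motif` if it ends min_gap..max_gap nt upstream of start_pos.

    start_pos is the 0-based index of the A in the candidate start codon.
    Returns None when no copy of the motif sits at an allowed distance.
    """
    seq = seq.upper()
    for gap in range(min_gap, max_gap + 1):
        motif_start = start_pos - gap - len(motif)
        if motif_start >= 0 and seq[motif_start:motif_start + len(motif)] == motif:
            return motif_start
    return None


# Hypothetical toy sequence: motif, a 7 nt spacer, then a start codon.
toy = "TT" + "AGGAGG" + "T" * 7 + "ATGAAA"
print(find_rbs(toy, toy.index("ATG")))
```

A start codon with a plausible RBS at the right distance is stronger evidence for the 5’ end of a cistron than the start codon alone.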

Page 18:

ORF Found, Now What?

• More complex signals/regulatory elements
• More genes
• Combinatorial regulation common
• Introns/exons

Page 19:

Eukaryotic Gene Complexity

• Yeast
– introns rare
– promoters adjacent
– genome dense

Page 20:

Eukaryotes, cont’d

• “Higher” eukaryotes
– introns common, LONGER than exons
– promoter/enhancer
– genome sparse
• Fungi
– introns common, short relative to exons
– promoter/enhancer
– genome dense

Page 21:

Fungi and “higher” eukaryotes

Sew together exons
– ORF regions
– consensus sequences
– domain/polypeptide matches

Page 22:

Exon/Intron Structure

CCACATTgtn(30-10,000)an(5-20)agCAGAA

...CCACATTCAGAA...

...ProHisSerGlu...

Page 23:

Alternative Splice

CCACATTgtn(30-10,000)an(5-20)agcagAA

...CCACATTAA...

...ProHisSTOP
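The two splice outcomes can be reproduced with a short sketch. The exon sequences come from the slides; the intron filler (runs of "c") is a hypothetical placeholder for the n(30-10,000) and n(5-20) stretches, and the function names are illustrative assumptions.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(pre_mrna, donor, acceptor):
    """Remove the intron pre_mrna[donor:acceptor], which must run from GT to AG."""
    intron = pre_mrna[donor:acceptor].upper()
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron must begin with GT and end with AG")
    return (pre_mrna[:donor] + pre_mrna[acceptor:]).upper()


def translate(mrna):
    """Translate complete codons from position 0, stopping after the first stop codon."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        protein += residue
        if residue == "*":
            break
    return protein


exon1, exon2 = "CCACATT", "CAGAA"
intron = "gt" + "c" * 30 + "a" + "c" * 5 + "ag"  # hypothetical filler bases
pre_mrna = exon1 + intron + exon2
donor = len(exon1)
acceptor = donor + len(intron)

normal = splice(pre_mrna, donor, acceptor)
alternative = splice(pre_mrna, donor, acceptor + 3)  # use the downstream "ag" of CAG

print(normal, translate(normal))
print(alternative, translate(alternative))
```

Moving the acceptor site three bases downstream shifts the reading frame of the second exon, so the same pre-mRNA yields Pro-His-Ser-Glu in one case and Pro-His followed by a stop codon in the other.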

Page 24:

How do we know what sequences to look for?

• Promoter sites

• Intron/Exon

• Transcription Termination/PolyA

• Translation initiation

Page 25:

Finding Functional Sequences

• Known Consensus Sequences

• Consensus Sequence Generation
– Position Weight Matrices
– Sequence Logos
– Hidden Markov Models
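A position weight matrix can be built from a handful of aligned sites and then slid along a sequence. The sketch below is one common formulation (log-odds scores with a pseudocount against a uniform background); the four example sites are hypothetical and only loosely modeled on a bacterial -10 box, and the function names are illustrative assumptions.

```python
import math


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build log2-odds scores per position from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position scores of one window the same length as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (position, score) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        ((i, score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)),
        key=lambda hit: hit[1],
    )


# Hypothetical aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = position_weight_matrix(sites)
print(best_hit(pwm, "GGCGCGTATAATGCGC"))
```

Unlike a single consensus string, the matrix keeps the information about which positions tolerate variation, so a near match still scores well.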

• Functional Tests

Page 26:

Gene finding Tools-WWW

• GRAIL II: integrated gene parsing

• GenLang

• GENIE

• HMMGene

• GENSCAN

• GENEMARK

Page 27:

GLIMMER for gene-finding in bacteria (www.tigr.org)

Page 28:

YOU are the best universal gene finder…

• You understand the “rules”
– ORF, Promoter, RBS
– Organism specific
• You understand relationships/sequences
– 5’ to 3’
• You are a good sequence finder
– search patterns
• You can resolve ambiguities
• EXPERIENCE

Page 29:

Exercise

• ORF analysis using Vector NTI:
• Open Vector NTI
• Retrieve the E. coli lac operon sequence
– Find Tools -> Open Link -> GID in the molecular display window
– Type in 146575 in the GenBank ID required window
• Do ORF analysis
– Find Analysis -> ORF in the molecular display window
– Use the Default Start & Stop setting
• Present a figure showing your ORF analysis result and report the start and stop positions and lengths of the ORFs.

Page 30:

Exercise (cont’d)

• ORF analysis using GeneMark
• Go to the GeneMark web site: http://opal.biology.gatech.edu/GeneMark/genemark24.cgi
• Paste in the lac operon sequence
• Choose E. coli as the organism
• Report the start and stop positions and lengths of the predicted ORFs and compare them to those found with the Vector NTI ORF analysis.

Page 31:

Assignment #2

• Download from Blackboard
– Go to “Assignment” page
– Open “Assignment #2”
– Download the file “Assignment1”
• Submit to Blackboard
– Go to “Assignment” page
– Open “Assignment #2”
– Submit your answer through Tools -> Digital Drop Box
• Assignment #2 – due March 12