Gene Structure and Identification III BIO520 Bioinformatics Jim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8


Jan 17, 2018


Anis Knight


Gene Structure and Identification III

BIO520 Bioinformatics Jim Lund

Previous reading: 1.3, 9.1-9.6, 10.2, 10.4, 10.6-8


For real prediction we need…

• Solve the protein folding problem
• Solve the molecular docking/binding problem
• Develop realistic simulations of molecules in cells
• Simulate multicellular systems


Promoter/Enhancer analysis

• Regulatory Sequences
– Known Consensus Sequences
– Consensus Sequence Generation
• Using functional (experimental) Data
• HBB as an example


Gene Regulatory Sequences

• Functional sites
– Consensus
– Experimental tests
• Inferred sites
– Transcriptome analysis


Sequence Logos

• http://weblogo.berkeley.edu/
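A sequence logo draws each alignment column as a stack of letters whose total height is the information content of that column in bits. The sketch below shows that calculation under simplifying assumptions: a uniform background (so the maximum is 2 bits for DNA) and no small-sample correction, which logo tools may apply. The function name and the example sites are hypothetical and are not taken from the slides.

```python
import math
from collections import Counter

def information_content(sites):
    """Return the information content (bits) of each column of aligned DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)  # log2(4) = 2 bits is the maximum for DNA
    return bits

# Hypothetical aligned binding sites, for illustration only.
example_sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
ic = information_content(example_sites)
```

Columns where every site carries the same base reach the full 2 bits; mixed columns score lower, which is why invariant positions stand tallest in a logo.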


Position Weight Matrix:

PO   A   C   G   T
01   6   4   4   6   N
02   4   9   3   4   N
03  12   4   3   1   A
04   6   1  11   2   R
05   3   2  11   4   G
06   3   3   4  10   N
07   3  10   3   4   N
08  11   2   4   3   A
09   4   9   3   4   N
10   3   6   3   8   N
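A count matrix like this one is usually converted to log-odds scores before it is used to scan a sequence. The following is a minimal sketch of that step, assuming a uniform background of 0.25 per base and a pseudocount of 1; both are illustrative choices, not values from the slides. The counts are copied from the matrix above, while the function names and the test sequence are hypothetical.

```python
import math

BASES = "ACGT"

# Counts copied from the slide: one tuple per position 01-10, ordered A, C, G, T.
COUNTS = [
    (6, 4, 4, 6), (4, 9, 3, 4), (12, 4, 3, 1), (6, 1, 11, 2), (3, 2, 11, 4),
    (3, 3, 4, 10), (3, 10, 3, 4), (11, 2, 4, 3), (4, 9, 3, 4), (3, 6, 3, 8),
]

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(frequency / background); pseudocount is an assumption."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append({
            base: math.log2(((count + pseudocount) / total) / background)
            for base, count in zip(BASES, row)
        })
    return matrix

def best_hit(sequence, matrix):
    """Return (score, start) of the highest-scoring window; expects only A, C, G, T."""
    width = len(matrix)
    scores = [
        (sum(matrix[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)

pwm = log_odds(COUNTS)
# Hypothetical test sequence with the highest-count base at each position embedded at offset 3.
score, start = best_hit("TTTACAGGTCACTTTT", pwm)
```

Summing log-odds across positions treats the positions as independent, which is the usual simplification behind a position weight matrix.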

Page 8: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

EUKARYOTES

• More complex signals
– Basal/core promoter
– Promoter
– Enhancers
• More genes
• More dispersed signals
– Larger promoters, distant enhancers, regulatory sites in introns.
• Combinatoric regulation common

Page 9: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Basal Promoter Analysis

Myers and Maniatis, Genes VI, 831

• TATA-box    -25 to -30     TBP
• CCAAT-box   -212 to -57    CTF/NF1
• GC-box      -164 to +1     SP1
• KCWKYYYY    +1 to +5       cap signal

Figure: promoter schematic marking the TATA, CAAT, and GC boxes relative to the transcription start site (+1)
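Consensus patterns such as the cap signal KCWKYYYY are written in IUPAC ambiguity codes (K = G or T, W = A or T, Y = C or T). One simple way to search for them is to expand the codes into a regular expression, as in this sketch. The helper names and the test sequence are hypothetical; a plain pattern match also ignores how strongly each base is preferred, which a weight matrix would capture.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[ch] for ch in consensus.upper())

def find_sites(sequence, consensus):
    """Return (0-based start, matched text) for every match, overlapping ones included."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical sequence, for illustration only.
hits = find_sites("GGGTCATTCTCCAAA", "KCWKYYYY")
```

In practice the search would be restricted to the position window listed for each element, since short degenerate patterns match by chance throughout a genome.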

Page 10: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Finding PolII sites (transcription start site)

• Promoter Scan
• TSSG/TSSW (TSSP for plants)
• Core-Promoter
• FPROM
• BCM Search Launcher

Page 11: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Enhancer Elements

• Octamer   OCT1, OCT2
• κB        NF-κB
• ATF       ATF
• AP1…      AP1
• ……..

Page 12: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Consensus Sequence Databases

• TRANSFAC
• TFD (transcription factor database)

Page 13: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Consensus Sequence Databases

• Finding sites in promoter regions:
– TESS
• http://www.cbil.upenn.edu/cgi-bin/tess/tess
– TFSEARCH
• http://www.cbrc.jp/research/db/TFSEARCH.html
– BCM Search Launcher
• http://searchlauncher.bcm.tmc.edu/seq-search/gene-search.html

Page 14: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

HBB promoter (TESS)

Page 15: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Sequence-based algorithms for identifying enhancer binding sites

• Genes from:
– Microarray transcription analysis
– ChIP-chip experiments
– Orthologous sequences
– Experimental/other
• Programs for finding consensus sites:
– MEME analysis of clusters
– AlignACE
– BioProspector/CompareProspector

Page 16: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Practical Gene Finding

• Use ALL tools
– Predictive: Stitch together a consensus
• ORF finders
• Find patterns (and WWW pattern searches)
• HMM: GRAIL, Genscan…
– Comparative
• BLASTN, BLASTX
• Compare genomes (human:mouse)
– cDNA, protein, genetic evidence
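The simplest of the predictive tools listed here is an ORF finder. The sketch below scans all six reading frames for stretches that run from an ATG to the first in-frame stop codon. It is a minimal illustration with hypothetical function names and an arbitrary default length cutoff, and it knows nothing about introns, so on eukaryotic genomic DNA it finds at best individual exons or intronless genes.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Yield (start, end, frame) for ATG..stop ORFs; 0-based, end exclusive, stop included."""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOPS:
                # Count codons from the ATG up to, but not including, the stop.
                if (pos - start) // 3 >= min_codons:
                    yield start, pos + 3, frame
                start = None

def find_orfs(seq, min_codons=30):
    """Return ORFs on both strands; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = [("+", s, e, f) for s, e, f in orfs_in_strand(seq, min_codons)]
    found += [("-", s, e, f) for s, e, f in orfs_in_strand(reverse_complement(seq), min_codons)]
    return found

# Hypothetical short sequence; a real search would use a much larger min_codons.
example_orfs = find_orfs("CCATGAAATTTGGGTAACC", min_codons=3)
```

Long ORFs are rare by chance, so the length cutoff is what gives this approach any predictive value; the comparative and cDNA evidence listed above is needed to confirm or correct the result.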

Page 17: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

ORFs-aldolase gene

Page 18: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Genomic DNA-cDNA alignment

• DNA sequencing
• cDNA
• Align (GAP)
• Infer Promoter, Enhancer
• Test in cis

Page 19: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Comparative Genomics

• Conservation of coding regions
• Identification of transcription signals
– “words” in common
• Example: yeast comparisons
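The idea of "words" in common can be illustrated by collecting every k-letter word in each upstream region and keeping those shared by all of them. This is only a sketch of the principle: the function names and the three sequences are hypothetical, and real comparisons would also weigh how often a word is expected by chance and allow for mismatches.

```python
def kmers(seq, k):
    """Return the set of all k-letter words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_words(sequences, k=6):
    """Return the k-letter words present in every sequence."""
    common = kmers(sequences[0], k)
    for seq in sequences[1:]:
        common &= kmers(seq, k)
    return common

# Hypothetical upstream regions from three related species, for illustration only.
upstream = ["AAATGACTCATT", "GGTGACTCAGG", "CTGACTCAAC"]
words = shared_words(upstream, k=7)
```

Words that survive in non-coding sequence across species are candidates for regulatory sites, since neutral sequence would be expected to diverge.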

Page 20: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Ensembl prediction pipeline

RepeatMasker

Genscan

BLAST Genscan peptides vs. protein, UniGene, EST, vertebrate mRNA

Pmatch all human proteins and cDNAs

MiniGenewise, MiniEst2genome

Genes

DNA

Page 21: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.
Page 22: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.
Page 23: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Genscan features

• Model both strands at once
• Each state may output a string of symbols (according to some probability distribution).
• Explicit intron/exon length modeling
• Advanced splice site modeling
• Complete intron/exon annotation for sequence
• Able to predict multiple genes and partial/whole genes
• Parameters learned from annotated genes
• Separate parameter training for different C+G content groups (<43%, 43-51%, 51-57%, >57% C+G content)
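The last feature means that the C+G percentage of the input decides which trained parameter set applies. A minimal sketch of that lookup follows. The function names are hypothetical, and because the slide does not say which group the boundary values 43, 51, and 57 belong to, the code assigns each boundary to the higher group as an assumption.

```python
def gc_percent(seq):
    """Return the C+G percentage, counting only unambiguous A, C, G, T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def parameter_group(seq):
    """Return the C+G content group label for a sequence."""
    gc = gc_percent(seq)
    # Boundary handling is an assumption: each cutoff falls into the higher group.
    if gc < 43:
        return "<43%"
    if gc < 51:
        return "43-51%"
    if gc < 57:
        return "51-57%"
    return ">57%"

# Hypothetical short sequences; real input would be a long genomic region.
groups = [parameter_group(s) for s in ("ATATATATGC", "ACGT", "GCGCGCGCAT")]
```

Training separately per group reflects the observation that gene density and exon/intron structure differ between C+G-poor and C+G-rich regions of the genome.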

Page 24: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

GENSCAN predictions

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
7.00  Prom +  63096  63135   40                              -2.75
7.01  Init +  63183  63274   92  2  2  103   77   142 0.997  14.61
7.02  Intr +  63403  63625  223  1  1   83   96   181 0.999  15.61
7.03  Term +  64524  64652  129  2  0  101   50    83 0.373   3.00
7.04  PlyA +  64758  64763    6                               1.05

8.00  Prom +  70508  70547   40                              -4.75
8.01  Init +  70595  70686   92  1  2  103   77   133 0.990  13.71
8.02  Intr +  70817  71039  223  2  1  100   96   217 0.999  20.91
8.03  Term +  71890  72018  129  0  0  116   43   119 0.827   7.40
8.04  PlyA +  72126  72131    6                               1.05

9.00  Prom +  74399  74438   40                              -8.25
9.01  Sngl +  76602  76847  246  2  0   71   50   218 0.886  11.13
9.02  PlyA +  76928  76933    6                               1.05
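Output in this layout is easy to read into a program by splitting each row on whitespace. The sketch below keeps only exon rows (Init, Intr, Term, Sngl), which carry all 13 columns, and skips promoter and poly-A rows, which leave the middle columns empty. The function name and the returned dictionary keys are hypothetical, and the column positions are assumed from the header shown above; the sample rows are the gene 8 lines from the slide.

```python
from collections import defaultdict

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}

def parse_genscan(lines):
    """Group predicted exons by gene number from whitespace-separated GENSCAN rows."""
    genes = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Exon rows have 13 columns; headers, Prom, and PlyA rows are skipped.
        if len(fields) != 13 or fields[1] not in EXON_TYPES:
            continue
        gene_number = int(fields[0].split(".")[0])
        genes[gene_number].append({
            "type": fields[1],
            "strand": fields[2],
            "begin": int(fields[3]),
            "end": int(fields[4]),
            "prob": float(fields[11]),  # the P.... column
        })
    return dict(genes)

# Gene 8 rows copied from the slide.
sample = """
8.00 Prom + 70508 70547 40 -4.75
8.01 Init + 70595 70686 92 1 2 103 77 133 0.990 13.71
8.02 Intr + 70817 71039 223 2 1 100 96 217 0.999 20.91
8.03 Term + 71890 72018 129 0 0 116 43 119 0.827 7.40
8.04 PlyA + 72126 72131 6 1.05
""".splitlines()

predicted = parse_genscan(sample)
```

Keeping the P column makes it straightforward to filter out low-probability exons, such as the 0.373 terminal exon of gene 7, before comparing predictions with annotation.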

Page 25: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

GENSCAN predicted exons

Page 26: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

Annotated predicted exons

Page 27: Gene Structure and Identification III BIO520 BioinformaticsJim Lund Previous reading: 1.3, 9.1-9.6 10.2, 10.4, 10.6-8.

HBB gene

• HBB exons 1-3
– 70545..70686
– 70817..71039
– 71890..72150
• GENSCAN
– 70595..70686
– 70817..71039
– 71890..72018
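The two coordinate lists can be compared boundary by boundary. In this sketch the coordinates are copied from the slide and the function name is hypothetical; it reports, for each exon, how far the predicted start lies inside the annotated start and how far the predicted end falls short of the annotated end.

```python
# Coordinates copied from the slide.
ANNOTATED = [(70545, 70686), (70817, 71039), (71890, 72150)]  # HBB exons 1-3
PREDICTED = [(70595, 70686), (70817, 71039), (71890, 72018)]  # GENSCAN exons

def boundary_offsets(annotated, predicted):
    """Return (start offset, end offset) in bases for each annotated/predicted exon pair."""
    return [
        (p_start - a_start, a_end - p_end)
        for (a_start, a_end), (p_start, p_end) in zip(annotated, predicted)
    ]

offsets = boundary_offsets(ANNOTATED, PREDICTED)
# offsets == [(50, 0), (0, 0), (0, 132)]
```

All four internal splice boundaries agree exactly. The only differences are 50 bases at the start of exon 1 and 132 bases at the end of exon 3, which would be consistent with the annotation including untranslated regions while GENSCAN reports only the coding portion of each exon.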