Top Banner
High Throughput DNA Sequencing
67

High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Dec 22, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

High Throughput DNA Sequencing

Page 2: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

30,000

Page 3: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Shotgun Sequencing

IsolateChromosome

ShearDNAinto Fragments

Clone intoSeq. Vectors Sequence

Page 4: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Principles of DNA Sequencing

Primer

PBR322

Amp

Tet

Ori

DNA fragment

Denature withheat to produce

ssDNA

Klenow + ddN + dN+ primers

Page 5: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

The Secret to Sanger Sequencing

Page 6: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Principles of DNA Sequencing

5’

5’ Primer

3’ TemplateG C A T G C

dATPdCTPdGTPdTTPddATP

dATPdCTPdGTPdTTPddCTP

dATPdCTPdGTPdTTPddTTP

dATPdCTPdGTPdTTP

ddCTP

GddC

GCATGddC

GCddA GCAddT ddG

GCATddG

Page 7: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Principles of DNA SequencingG

C

T

A

+

_

+

_

G

C

A

T

G

C

Page 8: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Capillary Electrophoresis

Separation by Electro-osmotic Flow

Page 9: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Multiplexed CE with Fluorescent detection

ABI 3700 96x700 bases

Page 10: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Shotgun Sequencing

SequenceChromatogram

Send to Computer AssembledSequence

Page 11: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Shotgun Sequencing

• Very efficient process for small-scale (~10 kb) sequencing (preferred method)

• First applied to whole genome sequencing in 1995 (H. influenzae)

• Now standard for all prokaryotic genome sequencing projects

• Successfully applied to D. melanogaster• Moderately successful for H. sapiens

Page 12: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

The Finished Product

GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTAGAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGAT

Page 13: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Sequencing Successes

T7 bacteriophagecompleted in 198339,937 bp, 59 coded proteins

Escherichia colicompleted in 19984,639,221 bp, 4293 ORFs

Sacchoromyces cerevisaecompleted in 199612,069,252 bp, 5800 genes

Page 14: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Sequencing Successes

Caenorhabditis eleganscompleted in 199895,078,296 bp, 19,099 genes

Drosophila melanogastercompleted in 2000116,117,226 bp, 13,601 genes

Homo sapiens1st draft completed in 20013,160,079,000 bp, 31,780 genes

Page 15: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Scoring Matrices• An empirical model of evolution, biology and

chemistry all wrapped up in a 20 X 20 table of integers

• Structurally or chemically similar residues should ideally have high diagonal or off-diagonal numbers

• Structurally or chemically dissimilar residues should ideally have low diagonal or off-diagonal numbers

Page 16: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

A Better Matrix - PAM250A R N D C Q E G H I L K M F P S T W Y V

A 2

R -2 6

N 0 0 2

D 0 -1 2 4

C -2 -4 -4 -5 4

Q 0 1 1 2 -5 4

E 0 -1 1 3 -5 2 4

G 1 -3 0 1 -3 -1 0 5

H -1 2 2 1 -3 3 1 -2 6

I -1 -2 -2 -2 -2 -2 -2 -3 -2 5

L -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6

K -1 3 1 0 -5 1 0 -2 0 -2 -3 5

M -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0 6

F -4 -4 -4 -6 -4 -5 -5 -5 -2 1 2 -5 0 9

P 1 0 -1 -1 -3 0 -1 -1 0 -2 -3 -1 -2 -5 6

S 1 0 1 0 0 -1 0 1 -1 -1 -3 0 -2 -3 1 3

T 1 -1 0 0 -2 -1 0 0 -1 0 -2 0 -1 -2 0 1 3

W -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17

Y -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4 -2 7 -5 -3 -3 0 10

V 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2 2 -1 -1 -1 0 -6 -2 4

Page 17: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Prokaryotic Gene Prediction

Initiationcodon (ATG)

Stop codon(TAG,TAA, TGA)

Shine Delgarno sequenceRBS (G-rich)

Terminator (stemloop structure)

Promoter Region -35 (TTGAC)

-10 (TATAAT)

Page 18: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Prokaryotic Gene Prediction

• ORF finder – http://www.ncbi.nlm.nih.gov/gorf/gorf.html

• GLIMMER– http://www.tigr.org/softlab/glimmer/glimmer.html

• GeneMark– exon.biology.gatech.edu/GeneMark/– www.ebi.ac.uk/genemark

Page 19: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 20: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 21: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Hidden Markov Model

Page 22: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 23: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Eukaryotic Gene Prediction

5’site 3’site

branchpoint site

exon 1 intron 1 exon 2 intron 2

AG/GT CAG/NT

Page 24: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Gene Predictors

• GRAIL2 (http://compbio.ornl.gov/Grail-1.3/)

• Neural Network Model CC=0.47

• HMMgene (http://www.cbs.dtu.dk/services/HMMgene/)

• Hidden Markov Model CC=0.91

• GENSCAN (http://genes.mit.edu/GENSCAN.html)

• Probabilistic Model CC=0.91

• GRPL (GeneTool)• Reference Point Logistics CC=0.94

Page 25: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 26: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 27: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 28: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.
Page 29: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Evaluation Statistics

TP FP TN FN TP FN TN

Actual

Predicted

Sensitivity Fraction of actual coding regions that are correctly predicted as codingSn=TP/(TP + FN)

Specificity Fraction of the prediction that is actually correctSp=TP/(TP + FP)

Correlation Combined measure of sensitivity and specificityCC=(TP*TN-FP*FN)/[(TP+FP)(TN+FN)(TP+FN)(TN+FP)]0.5

Page 30: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Gene PredictionA

ccu

racy

(%

)A

ccu

racy

(%

)

MethodMethod

61 63

7276

8694 97

0.0

10.0

20.0

30.0

40.0

50.0

60.0

70.0

80.0

90.0

100.0

Xpound GeneID GRAIL 2 Genie GeneP3 RPL RPL+

Page 31: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

What Works Best?

• Expect only a single exon…– BLASTN vs. dbEST– BLASTX vs. nr(protein)

• Fully sequenced data– Run GENSCAN + HMMgene + GRPL– Combine predictions (CC > 0.95)– BLAST vs. nr(protein)– Combine BLAST result with prediction

Page 32: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

The Genetic Code

Page 33: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Translation

ATGCGTATAGCGATGCGCATTTACGCATATCGCTACGCGTAA

Frame3 A Y S D A HFrame2 C V * R C AFrame1 M R I A M R I

Frame-1 H T Y R H A NFrame-2 R I A I R MFrame-3 A Y L S A C

Page 34: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

From Sequence to Function (and Structure)

ACEDFHIKNMFACEDFHIKNMFSDQWWIPANMCSDQWWIPANMCASDFDPQWEREASDFDPQWERELIQNMDKQERTLIQNMDKQERTQATRPQDS...QATRPQDS...

Page 35: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

PROSITE Pattern Expressions

C - [ACG] - T - Matches CAT, CCT and CGT only

C - X -T - Matches CAT, CCT, CDT, CET, etc.

C - {A} -T - Matches every CXT except CAT

C - (1,3) - T - Matches CT, CCT, CCCT

C - A(2) - [TP] - Matches CAAT, CAAP

[LIV] - [VIC] - X(2) - G - [DENQ] - X - [LIVFM] (2) -G

Page 36: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

PepTool Pattern Expressions

C[ACG]T - Matches CAT, CCT and CGT only

C*T - Matches CAT, CCT, CDT, CET, etc.

C!AT - Matches every C * T except CAT

C{1,3}T - Matches C*T, C * * T, C * * * T

$CAA(TP] - Matches CAAT, CAAP at N terminus

[LIV][VIC] * * G[DENQ] *[LIVFM][LIVFM]* G

Page 37: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Protein Sequence Motifs

• Pattern-Based (PQL) Sequence Motifs• >*TCP&NLGT*• DOOLITTLE, R.F., OF URFS AND ORFS (1986)• GUANIDINE KINASE ACTIVE SITE

• Profiles or Position Scoring Matrix (PSSM)• A C D E F G H I K L M N P Q R

• 1 W G V L V 3 -2 3 4 0 4 -1 3 -1 4 4 1 1 1 -2

• 2 L L S P L 2 -2 -2 -1 3 0 -1 3 -1 6 5 -1 3 0 -1

• 3 V V V V V 2 2 -2 -2 2 2 -3 11 -2 8 6 -2 1 -2 -2

• 4 K E A T A 6 -2 5 6 -5 4 1 0 5 -2 0 3 3 3 1

Page 38: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Phosphorylation Sites

CCOOHH2N

H

PO4

CCOOHH2N

H

PO4CH3

CCOOHH2N

H

PO4

pY pT pS

Page 39: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Phosphorylation SitesPhopshorylation Sites>*KRKQI[ST]VR* CHAN K.F. et al., J. BIOL. CHEM. 257:3655-3659 (1982) PHOSPHORYLASE KINASE PHOSPHORYLATION SITE

>*KKR**R**[ST]* KEMP B.E. et al., PNAS 72:3448-3452 (1975) MYOSIN LIGHT CHAIN KINASE PHOSPHORYLATION SITE

>*NYLRRL[ST]DSNF* CZERNIK A.J. et al. PNAS 84:7518-7522 (1987)

CALMODULIN DEPENDENT PROTEIN KINASE I PHOSPHORYLATION SITE

Page 40: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Glycosylation

Page 41: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Glycosylation Sites

Glycosylation Sites>*N!P[ST]!P* MARSHALL, R.D.W. ANN. REV. BIOCHEM. 41:673-702 (1972) GLYCOSYLATION SITE (S AND/OR T ARE GLYCOSYLATED)

>*G*K*R* MARSHALL, R.D.W. ANN. REV. BIOCHEM. 41:673-702 (1972) GLYCOSYLATION SITE (K IS GLYCOSYLATED)

>*G*K**R* MARSHALL, R.D.W. ANN. REV. BIOCHEM. 41:673-702 (1972) GLYCOSYLATION SITE (K IS GLYCOSYLATED)

Page 42: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Signaling

Page 43: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Signaling SitesSignaling Sites>*[KRH][DEN]EL$ SMITH M.J. et al., EMBO J. 8:3581-3586 (1989) ENDOPLASMIC RETICULUM DIRECTING SEQUENCE

>*P***KKRKAV* KALDERON, D. et al., CELL 39:400-509 (1984) NUCLEAR TRANSPORT SIGNAL OF SV40 LARGE T ANTIGEN

>${3,20}[LIVFTA][LIVFTA][LIVFTA]{3,6}[LIV]*[GA]C* VON HEIJNE, G. PROT. ENG. 2:531-534 (1989)

SIGNAL PEPTIDASE II CLEAVAGE SITE

Page 44: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Protease Cut Sites

Page 45: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Protease Cut Sites

Protease Cut Sites>*[KR]* *[KR]/* TRYPSIN CLEAVAGE SITE (CUTS AFTER [KR])

>*[FLY]![VAG] */[FLY]![VAG] PEPSIN CLEAVAGE SITE (CUTS BEFORE [FY])

>*[FWY]* *[FWY]/* CHYMOTRYPSIN CLEAVAGE SITE (CUTS AFTER [FWY])

Page 46: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Binding SitesBinding Sites>*RGD* RUOSLAHTII E. et al., CELL 44:517-518 (1986) FIBRONECTIN ADHESION SITE

>*CDPGYIGSR* GRAF, J. et al., CELL 48:989-996 (1987) MAMMAL LAMNIN DOMAIN III B1 CHAIN CELL ATTACHMENT SITE

>*[VIL]**[TS][DN]Y**[FY][AL]* GODOVAC-ZIMMERMANN, J., TIBS 13:64-66 (1988)

BINDING SITE FOR HYDROPHOBIC MOLECULE TRANSPORT PROTEINS

Page 47: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Family Signature Sequences

Protein Family Signature Sequences>*[FY]CRNPD* NAKAMURA T. et al., NATURE 342:441-445 (1989) KRINGLE DOMAIN SIGNATURE

>*[LIVM][LIVM]DEAD*[LIVM][LIVM]* CHANG T.H. et al., PNAS 87:1571-1575 (1990) EIF 4A FAMILY ATP DEPENDENT HELICASE SIGNATURE

>*C*C*****G**C* BLOMQUIST M.C. et al., PNAS 81:7363-7362 (1984)

EGF/TGF SIGNATURE SEQUENCE

Page 48: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Enzyme Active SitesEnzyme Active Sites>*[MAFILV]DTG[STA][STAN]* DOOLITTLE, R.F., OF URFS AND ORFS, 1986 ACID OR ASPARTYL PROTEASE ACTIVE SITE

>*TCP&NLGT* DOOLITTLE, R.F., OF URFS AND ORFS, 1986 GUANIDINE KINASE ACTIVE SITE

>*F*[LIVFMY]*S**K****[AG]*[LIVM]L*JORIS, B. ET AL., BIOCHEM. J. 250:313-324 (1989)BETA LACTAMASE (TYPE A) ACTIVE SITE

Page 49: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Sequence Pattern Databases

• PROSITE - http://www.expasy.ch/

• BLOCKS - http://www.blocks.fhcrc.org/

• DOMO - http://www.infobiogen.fr/~gracy/domo

• PFAM - http://pfam.wustl.edu

• PRINTS - http://www.biochem.ucl.ac.uk/bsm/dbrowser/PRINTS

• SEQSITE - PepTool

Page 50: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Sequence Profiles

PI

HAA K

AGT AA

L

AC

QRVN

A

AY

F

AG

A

Page 51: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Defining Sequence Profiles 1 50

P43871-1 .......... ...IKKLDSN SIHAIISDIP YGIDYDDWDI LHSNTNSALGS18997-1 LMSKIYQMDA VDWLKTLENC SVDLFITDPP YESL.EKYRQ IGTTTRLKESP23192-1 EINKIHQMNC FDFLDQVENK SVQLAVIDPP YNL....... ..........P29538-1 MDQRLICSNA IKALKNLEEN SIDLIITDPP YNLG.KDY.. ..........P14751-1 TRHVYDVCDC LDTLAKLPDD SVQLIICDPP YNI....... ..........P34721-1 KNFNIYQGNC IDFMSHFQDN SIDMIFADPP YFLS.NDG.L TFKNSIIQ..P50178-1 ENAILVHADS FKLLEKIKPE SMDMIFADPP YFLS.NGG.M SNSGGQIV..P20590-1 FLNTILKGDC IEKLKTIPNE SIDLIFADPP YFMQ.TEGKL LRTNGDEF..S43876-1 GPETIIHGDC IEQMNALPEK SVDLIFADPP YNLQ.LGGDL LRPDNSKV..P28638-1 EAKTIIHGDA LAELKKIPAE SVDLIFADPP YNIG.KNF.. ..........P23941-1 DLGKLYNGDC LELFKQVPDE NVDTIFADPP FNLD.KEY.. ..........P14230-1 RSCKIIVGDA REAVQGLDSE IFDCVVTSPP YWGL.RDY.. ..........P14243-1 NGATLFEGDA LSVLRRLPSG SVRCIVTSPP YWGL.RDY.. ..........Q04845-1 LNNMLLQGNC AETLKKLPDE SVNLVFTSPP YY........ ..........S53866-1 WVNDIHEGDA EEVLAELPES SVHMVMTSPP YFGL.RDY.. ..........P29568-1 .......... ...MNELKDK SINLVVTSPP YPMV.EIWDR LFSELNPKIE

Signature Sequence: DPP Y

Page 52: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

A Sample Sequence Profile

A C D E F G H I K L M N P Q R S T V W Y

1 W G V L V 3 -2 3 4 0 4 -1 3 -1 4 4 1 1 1 -2 1 2 6 -6 -2

2 L L S P L 2 -2 -2 -1 3 0 -1 3 -1 6 5 -1 3 0 -1 3 1 4 1 -1

3 V V V V V 2 2 -2 -2 2 2 -3 11 -2 8 6 -2 1 -2 -2 0 2 15 -9 -1

4 K E A T A 6 -2 5 6 -5 4 1 0 5 -2 0 3 3 3 1 3 6 0 -6 -4

5 A P L P P 6 -1 0 1 -2 2 0 1 0 2 2 0 8 2 0 2 2 3 -5 -4

6 G G G G G 7 1 7 5 -6 15 -1 -3 0 -4 -3 4 3 6 1 6 2 -1 -6 -5

7 S S Q E D 4 -1 7 7 -6 7 2 -2 2 -3 -2 4 3 6 1 6 2 -1 -6 -5

8 S S T P S 4 4 2 2 -4 4 -1 0 2 -3 -2 2 7 0 1 10 6 0 -2 -4

<>i = log2(qi/pi)

Page 53: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Calculating a Profile Score

VLVAPGDS = 6+6+15+6+8+15+7+10=66LVLGPGLA = 4+4+8+4+8+15-3+4= 44

A C D E F G H I K L M N P Q R S T V W Y

1 W G V L V 3 -2 3 4 0 4 -1 3 -1 4 4 1 1 1 -2 1 2 6 -6 -2

2 L L S P L 2 -2 -2 -1 3 0 -1 3 -1 6 5 -1 3 0 -1 3 1 4 1 -1

3 V V V V V 2 2 -2 -2 2 2 -3 11 -2 8 6 -2 1 -2 -2 0 2 15 -9 -1

4 K E A T A 6 -2 5 6 -5 4 1 0 5 -2 0 3 3 3 1 3 6 0 -6 -4

5 A P L P P 6 -1 0 1 -2 2 0 1 0 2 2 0 8 2 0 2 2 3 -5 -4

6 G G G G G 7 1 7 5 -6 15 -1 -3 0 -4 -3 4 3 6 1 6 2 -1 -6 -5

7 S S Q E D 4 -1 7 7 -6 7 2 -2 2 -3 -2 4 3 6 1 6 2 -1 -6 -5

8 S S T P S 4 4 2 2 -4 4 -1 0 2 -3 -2 2 7 0 1 10 6 0 -2 -4

Page 54: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Profiles & Motifs are Useful

• Helped identify active site of HIV protease

• Helped identify SH2/SH3 class of STP’s

• Helped identify important GTP oncoproteins

• Helped identify hidden leucine zipper in HGA

• Used to scan for lectin binding domains

• Regularly used to predict T-cell epitopes

Page 55: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Membrane Spanning Regions

Page 56: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Predicting via Hydrophobicity

-3

-2

-1

0

1

2

3

4

1

Bacteriorhodopsin

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2OmpA

Bacteriorhodoposin OmpA

Page 57: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Predicting via Hydrophobicity

Protein Technique Predicted #helices Actual #helicesEngelman et al. 10

Microsomal cytochrome Chou & Fasman 8p 450 Rao & Argos 5

AMP07 1Eisenberg et al. 8Kyte-Doolittle 5Rao & Argos 4AMP07 4Jahnig 6Eisenberg et al. 1

Photosynthetic Reaction Rose 4 Centre (M chain) Kyte-Doolittle 4

Klein et al. 5Jahnig 7Kyte-Doolittle 4Engelman et al. 7Klein et al. 7

Quality of Membrane Helix Prediction of Membrane Proteins.

Fo-F1 ATPase (subunit A)

Bacteriorhodopsin

4

1

5

67

Page 58: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Predicting via Sequence Similarity

Protein Family No. of Membrane Segments

Cytokine/growth factor receptors (EGF, interleukin, Insulin receptors)G-coupled receptors (rhodopsin, bacteriorhodopsin etc.)Extracellular activated gated channels (Glutamate, GABA, ACh sensitive)

5) photosynthetic proteins (H chain) 5 (helices)6) photosynthetic proteins (L chain) 5 (helices)7) porins 17 (-strands)8) microsomal cytochrome p450 1 (helix)9) cytochrome b 1 (helix)10) Fo ATPases 4 (helices)

Intracellular activated gated channels 6 (helices)

1 (helix)

Table 6

7 (helices)

4 (helices)

Page 59: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Predicting via Neural Nets and PSSM’s

• PHDhtm http://dodo.cpmc.columbia.edu/predictprotein/

• TMAP http://www.mbb.ki.se/tmap/index.html

• TMPred http://www.ch.embnet.org/software/TMPRED_form.html

ACDEGF...

Page 60: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Secondary Structure Prediction

Page 61: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Secondary Structure Prediction

• Statistical (Chou-Fasman, GOR)• Homology or Nearest Neighbor (Levin)• Physico-Chemical (Lim, Eisenberg)• Pattern Matching (Cohen, Rooman)• Neural Nets (Qian & Sejnowski, Karplus)• Evolutionary Methods (Barton, Niemann)• Combined Approaches (Rost, Levin, Argos)

Page 62: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Chou-Fasman Statistics

P P Pc P P Pc

A 1.42 0.83 0.75 M 1.45 1.05 0.5C 0.7 1.19 1.11 N 0.67 0.89 1.44D 1.01 0.54 1.45 P 0.57 0.55 1.88E 1.51 0.37 1.12 Q 1.11 1.1 0.79F 1.13 1.38 0.49 R 0.98 0.93 1.09G 0.57 0.75 1.68 S 0.77 0.75 1.48H 1 0.87 1.13 T 0.83 1.19 0.98I 1.08 1.6 0.32 V 1.06 1.7 0.24K 1.16 0.74 1.1 W 1.08 1.37 0.45L 1.21 1.3 0.49 Y 0.69 1.47 0.84

Chou & Fasman Secondary Structure Propensity of the Amino Acids

Table 8

Page 63: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Chou-Fasman Algorithm

• Assign each residue a P, P, Pc value• Take a window of 7 residues and calculate

a window-averaged value for all P, P, Pc

• Assign the average value for each of the secondary structures to the middle residue

• Move down one residue and repeat steps 2 thru 3 until finished

• Scan and assign SS to the highest P/residue

Page 64: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

The PhD Approach

PRFILE...

Page 65: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

The PhD Algorithm• Search the SWISS-PROT database and

select high scoring homologues• Create a sequence “profile” from the

resulting multiple alignment• Include global sequence info in the profile• Input the profile into a trained two-layer

neural network to predict the structure and to “clean-up” the prediction

Page 66: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Prediction Performance

45505560657075

CF

GO

R I

LIM

LEV

IN

PTI

T

JAS

EP

7

GO

R II

I

ZHA

NG

PH

D

Sco

res

(%)

Page 67: High Throughput DNA Sequencing. 30,000 Shotgun Sequencing Isolate Chromosome ShearDNA into Fragments Clone into Seq. Vectors Sequence.

Best of the Best

• PredictProtein-PHD (72%)– http://cubic.bioc.columbia.edu/predictprotein

• Jpred (73-75%)– http://jura.ebi.ac.uk:8888/

• PREDATOR (75%)– http://www.embl-heidelberg.de/cgi/predator_serv.pl

• PSIpred (77%)– http://insulin.brunel.ac.uk/psipred