
hum TZF p 1 MSLPPIRLPSPYGSDRLVQLAARLRPALCDTLITVGSQEFPAHSLVLAGVSQQLG----RRGQWALGEGISPSTFAQLLNFVYGESVELQPGELR 91 hum pLZF p 1...

Jan 03, 2016


Rodney Sparks

hum TZF p    1 MSLPPIRLPSPYGSDRLVQLAARLRPALCDTLITVGSQEFPAHSLVLAGVSQQLG----RRGQWALGEGISPSTFAQLLNFVYGESVELQPGELR 91
hum pLZF p   1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100
mus pLZF p   1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100
M : :: PS RL :LCD :I V SQEF AH VLA S: R Q : :SP TF Q:L : Y ::: : :L

hum TZF p   92 PLQEAARALGVQSLEEACW------RARGD---RAKKPDPG----------------LKKHQEEPEKPSRNPERELGDPGEKQKP--------------- 151
hum pLZF p 101 DLLYAAEILEIEYLEEQCLKMLETIQASDDNDTEATMADGGAEEEEDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200
mus pLZF p 101 DLLYAAEILEIEYLEEQCLKILETIQASDDNDTEATMADGGGEEEDDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200
L AA L :: LEE C :A D A D G : KH E : : L P Q P

hum TZF p  152 EQVSRTGGREQEMLH-KHSPPRG--RPEMAG-----ATQEAQQEQTRSKEKRLQ-AP------VG--------QRGADG-----KHGVLTWLRENPGGSE 223
hum pLZF p 201 AAVDSLMTIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKTEMMQVDEVPSQDSPGAAESSISGGMGDKVEERGKEGPGTPTRSSVITSARELHYGRE 300
mus pLZF p 201 AAVDSLMSIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKMEMMQVDEAPCQDSPGAAESSISGGMGDKFEERSKEGPGTPTRRSVITSARELHYGRE 300
V Q :L: PP G P :AG E : E : E Q :P : :R :G : V:T RE G E

hum TZF p  224 ESLRKLPGPLP----PAGSLQTSVTP--RP--SWAEAP----WLVGGQP-ALWSILLMPPRYGIPFYHST-----PTTGAWQEVWR-----------EQR 294
hum pLZF p 301 ESAEQVPPPAEAGQAPTGRPEHPAPPPEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKSESR 400
mus pLZF p 301 ESGEQLSPPVEAGQGPPGRQEPLAPPVEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKAESR 400
ES :: P P G : P : : P V P :: S L : P : ST P :E: E R

hum TZF p  295 ----------IPLSLN--------APKGLWSQ----------N-----Q--LASSSPTPGSLP-QGPAQLSP-GEMEESDQGHTGALAT-----CAG--- 349
hum pLZF p 401 TIGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500
mus pLZF p 401 PLGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500
: L N G: : LA S: : : Q AQ S :E Q HTG: : C

hum TZF p  350 --------HEDKAG--------CP---P---------RPHPPPAPPARS------R----------------PYACSVCGKRFSLKHQMETHYRVHTGEK 399
hum pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCDKKFSLKHQLETHYRVHTGEK 600
mus pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCGKKFSLKHQLETHYRVHTGEK 600
E :AG C P R H P R PY C C K:FSLKHQ:ETHYRVHTGEK

hum TZF p  400 PFSCSLCPQRSRDFSAMTKHLRTH-GAAPYRCSLCGAGCPSLASMQAHMRGHSPSQLPPGWTIRSTFLYSSSRPSRPSTSPCCPSSSTT 487
hum pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCY-V 673
mus pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCYV 673
PF C LC QRSRD:SAM KHLRTH GA:PY:C::C CPSL:SMQ HM:GH P ::PP W I T:LY :
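The consensus row under each block marks the columns where the sequences agree. As a rough illustration only, the sketch below counts identical columns between two aligned rows. The function name `percent_identity` and the decision to skip any column that has a gap in either row are assumptions made for this example, not the method used to produce the alignment. The two example rows are the first 24 columns of the last block (human TZF 400-423 and human PLZF 601-624).

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, compared, percent) for two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / compared if compared else 0.0
    return identical, compared, percent


# First 24 aligned columns of the final block, copied from the alignment.
tzf = "PFSCSLCPQRSRDFSAMTKHLRTH"
plzf = "PFECKLCHQRSRDYSAMIKHLRTH"

identical, compared, percent = percent_identity(tzf, plzf)
print(f"{identical}/{compared} identical ({percent:.1f}%)")  # 19/24 identical (79.2%)
```

The 19 identical columns match the 19 letters printed in the consensus row for the same stretch (PF, C, LC, QRSRD, SAM, KHLRTH).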


Hs.99430 Homo sapiens
EXPRESSION INFORMATION
cDNA sources: Blood, Ovary, Testis
EST SEQUENCES (8)
AI150041 cDNA clone IMAGE:1751830 Testis 3' read 1.1 kb
AA927876 cDNA clone IMAGE:1541369 3' read 1.1 kb
AI223414 cDNA clone IMAGE:1838461 Testis 3' read 1.0 kb
AI150330 cDNA clone IMAGE:1751988 Testis 3' read 0.6 kb
AA868505 cDNA clone IMAGE:1408687 Testis 3' read
AA476210 cDNA clone IMAGE:771312 Ovary 3' read
AA456628 cDNA clone IMAGE:809583 Ovary 3' read
AI361709 cDNA clone IMAGE:2021901 Blood 3' read

Northern Blotting


LOCUS       AF130255 1960 bp mRNA PRI 22-FEB-1999
DEFINITION  Homo sapiens testis zinc finger protein (TZFP) mRNA, complete cds.
ACCESSION   AF130255
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 1960)
  AUTHORS   Tang,Tang K., Lai,Chun-Hung, Tang,Chieh-Ju C., Huang,Chang-Jen and Lin,Wen-chang.
  TITLE     Identification and gene structure of a novel human PLZF related transcription factor gene, TZFP
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 1960)
  AUTHORS   Tang,T. K., Tang,C.-J. C. and Lin,W.-c.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1999) Institute of Biomedical Sciences, Academia Sinica, No. 128, Sec. 2, Academia Road, Taipei, Taiwan 11529, TAIWAN


Search: AA927876


dbEST Id: 1659486

IDENTIFIERS
EST name: om18b09.s1
GenBank Acc: AA927876
GenBank gi: 3076620

CLONE INFO
Clone Id: IMAGE:1541369 (3')
Source: NCI
Insert length: 1074
DNA type: cDNA

PRIMERS
Sequencing: -40m13 fwd. ET from Amersham

SEQUENCE
TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC
TGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGG
CTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGTNNTCTGG
GGCGGAATTTGCTAGGCCGCCGTAGCAGCTGTGCCAGGTCAGAAGCCGAGCCGGNCCGCT
TTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCAGTGGTGGAGG
AAGAAGGACAACAGGGAGAGGTCGAGG
Quality: High quality sequence stops at base: 318
Entry Created: Apr 17 1998
Last Updated: Jun 10 1998
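Two routine steps before using an EST such as this one are trimming it to its high-quality region and, for a 3' read, flipping it to the opposite strand. The sketch below is only an illustration: the helper names are hypothetical, the quality cutoff is assumed to be a 1-based inclusive position, and treating the 3' read as antisense to the mRNA is the usual convention rather than something stated in the record.

```python
# Complement table for the bases that appear in the record (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_high_quality(seq, last_good_base):
    """Keep bases 1..last_good_base (assumed 1-based, inclusive)."""
    return seq[:last_good_base]


# First 60 bases of the AA927876 read, copied from the record above.
first_line = "TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC"

sense = reverse_complement(first_line)
print(sense)
print("AATAAA" in sense)  # True
```

The reverse complement contains AATAAA, the canonical polyadenylation signal, which is consistent with this being a read from the 3' end of the transcript. For the full record, `trim_high_quality(sequence, 318)` would drop the tail beyond the stated cutoff.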


COMMENTS This clone is available royalty-free through LLNL ; contact the IMAGE Consortium ([email protected]) for further information.

LIBRARY
dbEST lib id: 1042
Lib Name: Soares_NFL_T_GBC_S1
Organism: Homo sapiens
Organ: pooled
Lab host: DH10B
Vector: pT7T3D-Pac (Pharmacia) with a modified polylinker
R. Site 1: Not I
R. Site 2: Eco RI
Description: Equal amounts of plasmid DNA from three normalized libraries (fetal lung NbHL19W, testis NHT, and B-cell NCI_CGAP_GCB1) were mixed, and ss circles were made in vitro. Following HAP purification, this DNA was used as tracer in a subtractive hybridization reaction. The driver was PCR-amplified cDNAs from pools of 5,000 clones made from the same 3 libraries. The pools consisted of I.M.A.G.E. clones 297480-302087, 682632-687239, 726408-728711, and 729096-731399. Subtraction by Bento Soares and M. Fatima Bonaldo.


Human cDNA Library Details: 470 different libraries so far, covering more than 40 tissues

Stomach
202. NCI_CGAP_Gas1 gastric tumor
203. NCI_CGAP_Gas4 gastric tumor
Testis
204. Barstead HPL-RB5 testis
205. Soares testis NHT
206. Life Tech. testis (10426-013)
Thymus
207. NCI_CGAP_Thym1 thymoma
Thyroid
208. NCI_CGAP_Thy1 invasive thyroid tumor
Uterus
209. NCI_CGAP_Ut1 uterine tumor
210. NCI_CGAP_Ut2 uterine tumor
211. NCI_CGAP_Ut3 uterine tumor
212. NCI_CGAP_Ut4 uterine tumor
213. Soares pregnant uterus NbHPU

Q & A

CGAP


CGAP: Cancer Genome Anatomy Project


Why CGAP? In the last two decades we have learned that genetic changes lie at the root of all cancers. In response, the Cancer Genome Anatomy Project (CGAP) will unite the newest technologies, along with those both cost-effective and capable of high-throughput, to identify all the genes responsible for the establishment and growth of cancer.

Project Goals To achieve a comprehensive molecular characterization of normal, precancerous, and malignant cells.


Normal Cells

Cancer Cells

Comparing the fingerprints of a normal versus a cancer cell will highlight genes that, by their suspicious absence or presence (such as Gene H), deserve further scientific scrutiny to determine whether such suspects play a role in cancer or can be exploited in a test for early detection.


Identifying the genetic differences among normal cells, precancerous cells, and cancer cells will contribute to our understanding of cancer, as it:

fosters the discovery of genes that directly cause cancer;
provides us with a way to identify early precancerous cells and thus enhances our methods for early detection;
improves our ability to match patients with appropriate treatment.

Time line

Malignant Tumor
Pre-cancer


The research results displayed in this graph demonstrate that for patients suffering from the cancer neuroblastoma, the presence or absence of a specific set of genes found on Chromosome 1 strongly correlates with patient outcome. Therefore, in the future this characteristic of the tumor can be used to identify those patients that would benefit from more aggressive treatment, and those best served by the current treatment protocol.


Laser Capture Microdissection (LCM)




2000
CGAP sequences: 925,746
CGAP genes: 79,844

1999
CGAP sequences: 473,746
CGAP genes: 20,665


Not in all others


Not in all others


Not in all others


Sequencing of Expressed Sequence Tags (ESTs)
Serial Analysis of Gene Expression
Differential Display Approaches
Hybridization Analysis


Digital Differential Display

The foundation of DDD is UniGene. UniGene employs a conservative method to assign all the human EST sequences that meet minimal standards of quality to distinct "clusters", each representing a unique human expressed gene. DDD takes advantage of UniGene by comparing the number of times sequences from different libraries were assigned to a particular UniGene cluster. This has the advantage that DDD will only report on sequences that we have confidence represent bona fide human expressed genes. There will of course be many differences in the number of sequences contained in each library that are assigned to a particular UniGene cluster, but only some of these differences are likely to reflect biological reality. Therefore DDD employs a statistical method of comparison, the Fisher Exact Test, to identify only those differences that are likely to be real. One important factor in determining statistical relevance is the absolute number of sequences in each library that have been successfully assigned to a UniGene cluster. In many cases there are not enough sequences in dbEST libraries to meet the threshold of significance employed in the Fisher Exact Test. Since DDD will only yield a report if there are differences that exceed this threshold, it is expected that many comparisons will yield nothing.
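To make the comparison concrete, here is a minimal sketch of a two-sided Fisher Exact Test on the counts for one UniGene cluster in two libraries. It is an illustration under stated assumptions, not DDD's actual implementation: the function name, the two-sided definition (summing every table no more likely than the observed one), and the example counts are all hypothetical, and the text does not say which variant or threshold DDD uses.

```python
from math import comb


def fisher_exact_two_sided(hits_a, total_a, hits_b, total_b):
    """Two-sided Fisher exact p-value for one cluster in two EST libraries.

    hits_x  = ESTs from library x assigned to the cluster
    total_x = all ESTs from library x assigned to any cluster
    """
    col_hits = hits_a + hits_b
    denom = comb(total_a + total_b, col_hits)

    def prob(x):
        # Hypergeometric probability that library A holds x of the hits.
        return comb(total_a, x) * comb(total_b, col_hits - x) / denom

    p_observed = prob(hits_a)
    lowest = max(0, col_hits - total_b)
    highest = min(total_a, col_hits)
    # Sum every possible table that is no more likely than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 1000 ESTs in library A versus 1 of 1000 in B.
p = fisher_exact_two_sided(12, 1000, 1, 1000)
print(f"p = {p:.4g}")
```

With small libraries the same proportions stop being significant: 3 of 4 versus 1 of 4 gives p = 34/70, roughly 0.49, which is the situation the paragraph describes when it says many comparisons will yield nothing.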


the fraction of sequences within the pool

visual aid that reflects the numerical values

statistically significant pairwise comparison


THREE PRINCIPLES UNDERLIE THE SAGE TECHNOLOGY:

One short oligonucleotide sequence from a defined location within a transcript ("tag") allows accurate quantitation.

Tag size (10-14bp) is optimal for high throughput while maintaining accurate gene identification and quantitation.

The combined power of serial and parallel processing increases data throughput by orders of magnitude when compared to conventional approaches.
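The first principle can be sketched in a few lines: take a short tag from one defined position in each transcript and count how often each tag occurs. The sketch assumes the classic SAGE convention of anchoring at the 3'-most NlaIII site (CATG) and reading the tag immediately downstream; neither the enzyme nor the helper names `sage_tag` and `count_tags` come from the slide, and the toy transcript is invented. The 10-base tag length is the low end of the 10-14 bp range quoted above.

```python
from collections import Counter


def sage_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag_len bases 3' of the last anchor site, or None."""
    site = transcript.rfind(anchor)  # 3'-most anchor site
    if site == -1:
        return None  # no anchor site, so no tag
    start = site + len(anchor)
    tag = transcript[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # too close to the 3' end


def count_tags(transcripts, **kwargs):
    """Count tags across many transcripts; the counts estimate abundance."""
    tags = (sage_tag(t, **kwargs) for t in transcripts)
    return Counter(t for t in tags if t is not None)


# Invented transcript with two CATG sites; only the 3'-most one is used.
toy = "GGGCATGAAACCCGGGTTTACATGTTTGGGCCCAAATT"
print(sage_tag(toy))                     # TTTGGGCCCA
print(count_tags([toy, toy, "AAAA"]))    # Counter({'TTTGGGCCCA': 2})
```

Because every copy of a transcript yields the same tag, the tag count is the quantity being measured; mapping a tag back to a gene is a separate lookup step not shown here.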


Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human beta- and chimp beta-globin)

Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., human beta- and gamma-globin)

Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)

Homolog: Genes that are descended from a common ancestor (e.g., all globins)


Dec. 11, 1998:

C. elegans: Sequence to Biology

-Jonathan Hodgkin, H. Robert Horvitz, Barbara R. Jasny, Judith Kimble*

This special issue of Science celebrates a landmark in biology: determination of the essentially complete DNA sequence of an animal genome. The animal is a small invertebrate, the nematode (or roundworm) Caenorhabditis elegans, and the sequence consists of about 97 million base pairs of DNA, approximately one-thirtieth the number in the human genome. Nonetheless, the information content is enormous--eight times that of the budding yeast Saccharomyces cerevisiae, the only other eukaryote with a sequenced genome.


Genome sequence of the nematode C. elegans: A platform for investigating biology

The C. elegans Sequencing Consortium

97 Mb
257 YACs (20% only in YAC)
2527 cosmids
113 fosmids
44 PCR
19,099 predicted genes
18,891 proteins here (16,260 reviewed)

EST: 67,815 EST from 40,379 clones

7432 genes

A multicellular organism genome


Genefinder program: ** trans-splicing **

40% of predicted genes have EST matches

16,260/19,099 genes have been interactively reviewed.
Average of one gene per 5 kb.
Average of five introns per gene.
27% of genome resides in exons.
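These summary figures can be cross-checked against the totals given earlier (97 Mb of sequence, 19,099 predicted genes). The short calculation below is only a consistency check; the variable names are made up for the example, and the per-gene exon figure is a derived average, not a number reported on the slide.

```python
genome_bp = 97_000_000      # "97 Mb" of genomic sequence
predicted_genes = 19_099    # predicted genes
reviewed_genes = 16_260     # interactively reviewed predictions
exon_fraction = 0.27        # "27% of genome resides in exons"

kb_per_gene = genome_bp / predicted_genes / 1000
reviewed_pct = 100 * reviewed_genes / predicted_genes
exon_bp_per_gene = exon_fraction * genome_bp / predicted_genes

print(f"{kb_per_gene:.1f} kb per gene")                      # 5.1 kb per gene
print(f"{reviewed_pct:.0f}% of predictions reviewed")        # 85% of predictions reviewed
print(f"about {exon_bp_per_gene:.0f} bp of exon per gene")   # about 1371 bp of exon per gene
```

The 5.1 kb result agrees with the stated average of one gene per 5 kb.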

Pfam protein family search:
Intracellular communication
Transcriptional regulation


Table 1. The 20 most common protein domains in C. elegans (41). RRM, RNA recognition motif; RBD, RNA binding domain; RNP, ribonuclear protein motif; UDP, uridine 5'-diphosphate.

-------------------------------------------------------------------
Number  Description
-------------------------------------------------------------------
650     7 TM chemoreceptor
410     Eukaryotic protein kinase domain
240     Zinc finger, C4 type (two domains)
170     Collagen
140     7 TM receptor (rhodopsin family)
130     Zinc finger, C2H2 type
120     Lectin C-type domain short and long forms
100     RNA recognition motif (RRM, RBD, or RNP domain)
 90     Zinc finger, C3HC4 type (RING finger)
 90     Protein-tyrosine phosphatase
 90     Ankyrin repeat
 90     WD domain, G-beta repeats
 80     Homeobox domain
 80     Neurotransmitter-gated ion channel
 80     Cytochrome P450
 80     Helicases conserved C-terminal domain
 80     Alcohol/other dehydrogenases, short-chain type
 70     UDP-glucoronosyl and UDP-glucosyl transferases
 70     EGF-like domain
 70     Immunoglobulin superfamily
-------------------------------------------------------------------


Worming secrets from the C. elegans genome: Dec 11, 1998. Science

Washington University Genome Sequencing Center. Sanger Centre

8-year effort: Sydney Brenner starts all. By 1992, they were doing a million bases per year. ~$200 M
High-throughput sequencing.
Human genome project.

“We will be doing a lot of jumping back and forth between species” - F. Collins

Ping-Pong homology search


In silico cloning: In order to perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be reiterated with the newly identified ESTs until no additional hits are found. The ESTs isolated can be assembled into sequence contigs using computer software.

EST 1
EST 2
EST 3
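The iterate-until-no-new-hits procedure can be sketched as a simple loop. In the toy version below, a shared k-mer stands in for a BLASTN hit and a small dictionary stands in for dbEST; both are loud simplifications, the names `kmers` and `in_silico_walk` are hypothetical, and the sequences are invented. Real ESTs can come from either strand and contain sequencing errors, which this sketch ignores.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def in_silico_walk(seed_ids, est_db, k=8):
    """Collect ESTs reachable from the seeds through chains of overlaps.

    Assumption: sharing any exact k-mer counts as an overlap; this is a
    stand-in for a BLASTN search of dbEST, not an equivalent of it.
    """
    found = set(seed_ids)
    frontier = set(seed_ids)
    while frontier:  # stop when a round yields no additional hits
        probe = set()
        for est_id in frontier:
            probe |= kmers(est_db[est_id], k)
        new_hits = {
            est_id for est_id, seq in est_db.items()
            if est_id not in found and kmers(seq, k) & probe
        }
        found |= new_hits
        frontier = new_hits  # re-query with the newly identified ESTs only
    return found


# Invented transcript; the three ESTs tile it with 8-base overlaps.
transcript = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCAAGTCTGACCGTTAAGGCTCGATAGCATT"
est_db = {
    "EST1": transcript[0:25],
    "EST2": transcript[17:42],
    "EST3": transcript[34:],
    "unrelated": "TTTTTTTTTTCCCCCCCCCCGGGGGGGGGG",
}

print(sorted(in_silico_walk(["EST1"], est_db)))  # ['EST1', 'EST2', 'EST3']
```

EST3 does not overlap EST1 directly; it is only reached in the second round through EST2, which is the point of repeating the search. Assembling the collected ESTs into a contig is a separate step.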

There are many sequencing-related errors in dbEST.


C. elegans a.a. sequences

Human EST sequences

Comparative Gene Identification


Query= (597 letters)

Sequences producing significant alignments:   Score (bits)   E value
lcl|THC200240                                 224            4e-58
lcl|THC151579                                 181            3e-45
lcl|AA099787                                  127            8e-29

lcl|THC200240
Length = 764
Score = 224 bits (565), Expect = 4e-58
Identities = 106/187 (56%), Positives = 136/187 (72%)

Query: 248 SGMKKNKYGNIEDLVVHLNFVCPKGIIQKQCQVPRMSSGPDIHQIILGSEGTLGVVSEVT 307
           SGMKKN YGNIEDLVVH+  V P+GII+K CQ PRMS+GPDIH  I+GSEGTLGV++E T
Sbjct: 3   SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 182

lcl|THC151579
Length = 698
Score = 181 bits (455), Expect = 3e-45
Identities = 81/142 (57%), Positives = 106/142 (74%)

Query: 446 LGMNHGVLGESFETSVPWDKVLSLCRNVKELMKREAKAQGVTHPVLANCRVTQVYDAGAC 505
           L + + VLGESFETS PWD+V+ LCRNVKE + RE K +GV     + CRVTQ YDAGAC
Sbjct: 41  LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 220


sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE)
Length = 658
Score = 124 bits (309), Expect = 5e-29
Identities = 59/60 (98%), Positives = 59/60 (98%)

248
Query: 1   SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 60
           SGMKKNIYGNIEDLVVHIK VTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT
Sbjct: 319 SGMKKNIYGNIEDLVVHIKMVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 378

THC200240

sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE)
Length = 658
Score = 127 bits (315), Expect = 1e-29
Identities = 59/60 (98%), Positives = 59/60 (98%)

446
Query: 1   LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 60
           LALEY VLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC
Sbjct: 517 LALEYYVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 576

THC151579

446-248=198

517-319=198
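The two subtractions above compare the spacing of the two hits in the C. elegans query (starts 248 and 446) with the spacing of the matching segments in ADAS_HUMAN (starts 319 and 517). A minimal sketch of that check follows; the helper name `hit_spacing` is made up for illustration, and reading equal spacing as support for both THCs coming from the same gene is an interpretation of the slide, not something it states.

```python
def hit_spacing(starts_a, starts_b):
    """Return the distance between two hit start positions, measured once in
    each coordinate system (hypothetical helper, not part of any BLAST tool)."""
    return starts_a[1] - starts_a[0], starts_b[1] - starts_b[0]


celegans_starts = (248, 446)  # query start positions of the two TBLASTN hits
human_starts = (319, 517)     # start positions of the same segments in ADAS_HUMAN

d_worm, d_human = hit_spacing(celegans_starts, human_starts)
print(d_worm, d_human, d_worm == d_human)
```

In practice a small tolerance would be needed, since insertions or deletions between the two hits shift one coordinate system relative to the other.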


[THC195737---------------------------------------------
MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRNPVISPTGYIF
--------]
DREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFTKRTQFSAIESTPSRTGAVAT
[THC195737--------------------
PRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNMKGDKSTSLPSFWIPELNPTAVATKLEKPSS
----------------------------------------------------]
KVLCPVSGKPIKLKELLEVKFTPMPGTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVV
[THC195737----------------------]
EKLIKGDGIDPINGEPMSEDDIIELQRGGTGYSATNETKAKLIRPQLELQ*

U58746


Translation of   1 MTRHGKNCTAGAVYTYHEKKKDTAASGYGTQNIRLSRDAVKDFDCCCLSLQPCHD 55
U58746           1 MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRN 55
                   *******.** .******...*. ****** . **  *..*.* **.*.****.

Translation of  56 PVVTPDGYLYEREAILEYILHQKKEIARQMKAYEKQRGTRREEQKELQRAASQDH 110
U58746          56 PVISPTGYIFDREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFT 110
                   **..* **...****** ** ***  *...* ****        *  .  *

Translation of 111 VRGFLEKESAIVSRPLNPFTAKALSGTSPD-----------DVQPGPSVGPPSKD 154
U58746         111 KRTQFSAIESTPSRTGAVATPRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNM 165
                    *       .  **     * .   *.               *.     *  *

Translation of 155 K-DK--VLPSFWIPSLTPEAKATKLEKPSRTVTCPMSGKPLRMSDLTPVHFTPLD 206
U58746         166 KGDKSTSLPSFWIPELNPTAVATKLEKPSSKVLCPVSGKPIKLKELLEVKFTPMP 220
                   * **   ******* *.* * ********  * **.****... .*  *.***.

Translation of 207 SSVDRVGLITRSER-YVCAVTRDSLSNATPCAVLRPSGAVVTLECVEKLIRKDMV 260
U58746         221 ------GTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVVEKLIKGDGI 269
                         *  * . * ..* **** *.*.* ** *. * .**  . *****. * .

Translation of 261 DPVTGDKLTDRDIIVLQRGGTGFAGSGVKLQAEKSRPVMQA 301
U58746         270 DPINGEPMSEDDIIELQRGGTGYSAT-NETKAKLIRPQLELQ 310
                   **..*. ... *** *******.. .    .*   ** ..

(44%/59%)
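The first figure in a summary such as (44%/59%) is presumably percent identity over the alignment, with the second counting conservative substitutions as well. A minimal sketch of the identity count from two gapped rows is given below. The function name `identity` is made up for illustration, and the slides do not say which denominator was used (aligned columns, ungapped columns, or full sequence length), so the sketch returns the raw counts instead of a percentage.

```python
def identity(row_a, row_b, gap="-"):
    """Count identical residues and ungapped columns in two aligned rows.

    Both rows must have the same length (one character per alignment column).
    Returns (matches, ungapped_columns); the choice of denominator is left
    to the caller because the source does not specify one.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    ungapped = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # a gap in either row is neither a match nor a mismatch
        ungapped += 1
        if a == b:
            matches += 1
    return matches, ungapped


# First 14 columns of the block starting at positions 155 / 166 above.
m, n = identity("K-DK--VLPSFWIP", "KGDKSTSLPSFWIP")
print(m, n)
```

Summing the two counts over every block of an alignment and dividing gives the overall identity under whichever denominator is chosen.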


[THC171302--
MVFGENQDLIRTHFQKEADKVRAMKTNWGLFTRTRMIAQSDYDFIVTYQQAENEAERSTVLSVFKEK
-------------------------------------------------------------------
AVYAFVHLMSQISKDDYVRYTLTLIDDMLREDVTRTIIFEDVAVLLKRSPFSFFMGLLHRQDQYIVH
-------------------------------------------------------------------
ITFSILTKMAVFGNIKLSGDELDYCMGSLKEAMNRGTNNDYIVTAVRCMQTLFRFDPYRVSFVNING
-------------------------------------------------------------------
YDSLTHALYSTRKCGFQIQYQIIFCMWLLTFNGHAAEVALSGNLIQTISGILGNCQKEKVIRIVVST
-----------------] [THC177150--------------------------------------------
LRNLITSNQDVYMKKQAALQMIQNRIPTKLDHLENRKFTDVDLVEDMVYLQTELKKVVQVLTSFDEY
-------------------------------------------------------------------
ENELRQGSLHWSPAHKCEVFWNENAHRLNDNRQELLKLLVAMLEKSNDPLVLCVAAHDIGEFVRYYP
------------------------------------------------]
RGKLKVEQLGGKEAMMRLLTVKDPNVRYHALLAAQKLMINNWKDLGLEI

U50199


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha is...  927  0.0
gi|2895576 (AF041337) vacuolar proton pump subunit SFD beta iso...  885  0.0
gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; code...  468  e-131
gi|1086810 (U41109) similar to S. cerevisiae vacular H(+)-ATPas...  335  5e-91
gnl|PID|e351278 (Z99532) hypothetical protein [Schizosaccharomy...  185  5e-46
sp|P41807|VM13_YEAST VACUOLAR ATP SYNTHASE 54 KD SUBUNIT (V-ATP...  123  2e-27

gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; coded for by C. elegans cDNA cm7g5; coded for by C. elegans cDNA cm14b9; coded for by C. elegans cDNA yk52g5.5; coded for by C. elegans cDNA yk76e5.5; coded for by C. elegans cDNA yk131f11.5; c...
Length = 470
Score = 468 bits (1192), Expect = e-131
Identities = 243/477 (50%), Positives = 314/477 (64%), Gaps = 20/477 (4%)

Human gene: 483 aa


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha isoform [Bos taurus]
Length = 483
Score = 927 bits (2369), Expect = 0.0
Identities = 460/483 (95%), Positives = 465/483 (96%)

Query: 1   MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISAEDCEFIQRFEMKRSPE 60
           MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMIS+EDCEFIQRFEMKRSPE
Sbjct: 1   MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISSEDCEFIQRFEMKRSPE 60

Query: 61  EKQEMLQTEGSQCAKTFINLMTHICKEQTVQYILTMVDDMLQENHQRVSIFFDYARCSKN 120
           EKQEMLQTEGSQ AKTFINLMTHI KEQTVQYILT+VDD LQENHQRVSIFFDYA+ SKN
Sbjct: 61  EKQEMLQTEGSQRAKTFINLMTHISKEQTVQYILTLVDDTLQENHQRVSIFFDYAKRSKN 120

Query: 121 TAWPYFLPILNRQDPFTVHMAARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180
           TAW YFLP+LNRQD FTVHM ARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS
Sbjct: 121 TAWSYFLPMLNRQDLFTVHMTARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180

Query: 181 GVAVETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240
           GV  ETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ
Sbjct: 181 GVTAETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240


Query: 241 YQMIFSIWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSTERE 300
           YQMIFS+WLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKS ERE
Sbjct: 241 YQMIFSVWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSVERE 300

Query: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360
           TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK
Sbjct: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360

Query: 361 SGRLEWSPVHKSEKFWRENAVRLNEKNYELLKILTKLLEVSDDPQXLAVAAHDVGXYVRX 420
           SGRLEWSPVHKSEKFWREN  RLNEKNYELLKILTKLLEVSDDPQ LAVAAHDVG YVR
Sbjct: 361 SGRLEWSPVHKSEKFWRENPARLNEKNYELLKILTKLLEVSDDPQVLAVAAHDVGEYVRH 420

Query: 421 YPRGKRVIEQXGGKQLVMNHMHHEXQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTXA 480
           YPRGKRVIEQ GGKQLVMNHMHHE QQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQT A
Sbjct: 421 YPRGKRVIEQLGGKQLVMNHMHHEDQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTAA 480

Query: 481 ARS 483
           ARS
Sbjct: 481 ARS 483


[AA134689-----------------------------------------------
MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAF
--------------------------]
KQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDHKINEELRATRHFDLMDRRDEES
[THC196496-------------------------------------
EHSIEMQLPFIAKVMGSKRYTIVPVLVGSLPGSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERF
------------------------------------------------------------------
SFSPYDRHSSIPIYEQITNMDKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRIS
-----------------------------------]
NNHTHEFRFLHYTQSNKVRSSVDSSVSYASGVLFVHPN

U64857


Translation of   1 MSNR---VVCREASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGY 52
U64857           1 MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGY 55
                   **       .* ********.*    *  **  ** .       ***.*.*****

Translation of  53 TYCGSCAAHAYKQVDPSITRRIFILGPSHHVPLSRCALSSVDIYRTPLYDLRIDQ 107
U64857          56 SYCGETAAYAFKQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDH 110
                   .*** .** *.***  *   *.******* * *  **...   ***** ** .*.

Translation of 108 KIYGELWKTGMFERMSLQTDEDEHSIEMHLPYTAKAMESHKDEFTIIPVLVGALS 162
U64857         111 KINEELRATRHFDLMDRRDEESEHSIEMQLPFIAKVMGSKR--YTIVPVLVGSLP 163
                   **  **  *  *. *  . .* ******.**. ** * *..  .**.*****.*

Translation of 163 ESKEQEFGKLFSKYLADPSNLFVVSSDFCHWGQRFRYSYYD-ESQGEIYRSIEHL 216
U64857         164 GSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERFSFSPYDRHSSIPIYEQITNM 218
                    *..* .* .*..*. ** ****.********.** .* **  *   **  * ..

Translation of 217 DKMGMSIIEQLDPVSFSNYLKKYHNTICGRHPIGVLLNAITELQK-NGMNMSFSF 270
U64857         219 DKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRISNNHTHEFRF 273
                   ** *** ** * * .*  **** .******.** ..*.*    .  *. .  * *

Translation of 271 LNYAQSSQCRNWQDSSVSYAAGALTVH 297
U64857         274 LHYTQSNKVRSSVDSSVSYASGVLFVHPN 302
                   *.*.** . *   *******.* * **


gi|1465834 (U64857) No definition line found [Caenorhabditis el...  300  1e-80
sp|Q10212|YAY4_SCHPO HYPOTHETICAL 34.8 KD PROTEIN C4H3.04C IN C...  215  3e-55
sp|P47085|YJX8_YEAST HYPOTHETICAL 38.5 KD PROTEIN IN SUI2-TDH2 ...  195  3e-49
gi|2425141 (AF020286) similar to C. elegans CEESS08F encoded by...  155  4e-37
gnl|PID|d1031681 (AP000006) 294aa long hypothetical protein [Py...  87  1e-16
gi|2983422 (AE000712) hypothetical protein [Aquifex aeolicus]  85  7e-16
gi|2621080 (AE000796) conserved protein [Methanobacterium therm...  79  4e-14
gnl|PID|e283857 (Y08257) orf c05005 [Sulfolobus solfataricus]  78  9e-14
sp|Q57846|Y403_METJA HYPOTHETICAL PROTEIN MJ0403 >gi|2129073|pi...  77  2e-13
gi|2983762 (AE000735) hypothetical protein [Aquifex aeolicus]  68  1e-10

gi|1465834 (U64857) No definition line found [Caenorhabditis elegans]
Length = 302
Score = 300 bits (759), Expect = 1e-80
Identities = 153/292 (52%), Positives = 198/292 (67%), Gaps = 4/292 (1%)

Query: 8   REASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGYTYCGSCAAHAYKQVD 67
           R ASHAGSWY A+   L+ QL  WL         ARA+I+PHAGY+YCG  AA+A+KQV
Sbjct: 11  RSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAFKQVV 70

BLASTP (Jan. 10, 1999)


[THC132858-------------------]
MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGTTSSQRVHTM
LTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKFTLQKTEWDSIDLERLNLA
[THC85433------------------------------------------
LDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDMTIPRKRKGFTSQHEKGLEKFYEAVSTA
--------------------------------------------] {AA938998*****************
FMRHVNLQVVKCVIVASRGFVKDAFMQHLIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEV
*******} [THC200182----------------------------------------------------
LETPQVALRLADTKAQGEVKALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRA
-----------------------------------------------]
QDIETRRKYVRLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN

Z36238


Translation of   1 MKLVRKNIEKDNAGQVTLVPEEPEDMWHTYNLVQVGDSLRASTIRKVQTESSTGS 55
Z36238           1 MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGT 55
                   **  ...**.*..* * *. ** ***** ***...** ..******* .*.***.

Translation of  56 VGSNRVRTTLTLCVEAIDFDSQACQLRVKGTNIQENEYVKMGAYHTIELEPNRQF 110
Z36238          56 TSSQRVHTMLTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKF 110
                     *.**.* **..**.****  * .*..** **.**. **.******.*****.*

Translation of 111 TLAKKQWDSVVLERIEQACDPAWSADVAAVVMQEGLAHICLVTPSMTLTRAKVEV 165
Z36238         111 TLQKTEWDSIDLERLNLALDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDM 165
                   ** * .***. ***.  * *** .*******..****..**.**.*******...

Translation of 166 NIPRKRKGNCSQHDRALERFYEQVVQAIQRHIHFDVVKCILVASPGFVREQFCDY 220
Z36238         166 TIPRKRKGFTSQHEKGLEKFYEAVSTAFMRHVNLQVVKCVIVASRGFVKDAFMQH 220
                   .******* .***.. **.*** *  *  **..  ****..*** ***.. *

Translation of 221 MFQQAVKTDNKLLLGNRSKFLQVHASSGHKYSLKEALCDPTVLARLSDTKAAGEV 275
Z36238         221 LIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEVLETPQVALRLADTKAQGEV 275
                   .  .*  .  *    .*.**.  *.*** * .*** *  * *  **.**** ***


Translation of 276 KALDDSYKMLQHEPDRAFYGLKQVEKANEAMAIDTLLISDELFRHQDVATRSRYV 330
Z36238         276 KALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRAQDIETRRKYV 330
                   ***     ..  ********  .* .**. .**.***..* *** **. ** .**

Translation of 331 RLVDSVKENAGTVRIFSSLHVSGEQLSQLTGVAAILRFPVPELSDQEGDS-SSEE 384
Z36238         331 RLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN 381
                   ***.**.*. * *.****.*******.**** *******.*.* *.  *

Translation of 385 D 385
Z36238         382   381


sp|P48612|PELO_DROME PELOTA PROTEIN >gi|973224 (U27197) pelota ...  520  e-147
sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHRO...  446  e-125
gi|3941543 (AF069497) pelota [Arabidopsis thaliana]  385  e-106
pir||S45456 DOM34 protein - yeast (Saccharomyces cerevisiae) >g...  236  2e-61
sp|P33309|DO34_YEAST DOM34 PROTEIN >gi|295608 (L11277) DOM34 [S...  212  2e-54
gnl|PID|e304505 (Z86109) unknown [Saccharomyces pastorianus]  199  3e-50
gi|2622770 (AE000923) cell division protein [Methanobacterium t...  155  4e-37
gnl|PID|d1031529 (AP000006) 356aa long hypothetical protein [Py...  146  3e-34
sp|Q57638|Y174_METJA HYPOTHETICAL PROTEIN MJ0174 >gi|2127805|pi...  145  6e-34
gi|2649765 (AE001046) cell division protein pelota (pelA) [Arch...  116  3e-25

sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHROMOSOME III >gi|3879163|gnl|PID|e1348805 (Z36238) Similar to the DOM34 protein of saccharomyces cerevisiae (Swiss Prot accession number P33309) [Caenorhabditis elegans]
Length = 381
Score = 446 bits (1136), Expect = e-125
Identities = 215/371 (57%), Positives = 282/371 (75%)

BLASTP (Jan. 10, 1999)


[Figure: scatter plot of C. elegans protein length (y-axis, 0 to 1200) versus CGI protein length (x-axis, 0 to 1000).]


[Figure: scatter plot of match area length (y-axis, 0 to 800) versus CGI protein length (x-axis, 0 to 1000).]


[Figure: scatter plot of protein similarity between CGI and C. elegans (y-axis, 30 to 100) versus CGI protein length (x-axis, 0 to 1000).]


C. elegans from WormPept: 18,452 entries
HGI searches (5 days for TBLASTN analysis)

*Families       3,934
*Known Gene     7,954
*New Contig     3,456
*Undetermined   2,070
<100 aa         1,038

*150 full-length genes so far; more expected following gap closure and 5' RACE.

83% between Human & C. elegans
11% C. elegans specific


C. elegans from WormPept: 18,452 entries
MGI searches (5 days for TBLASTN analysis)

*Families       5,602
*Known Gene     4,151
*New Contig     5,805
*Undetermined   1,856
<100 aa         1,038

84% between Mouse & C. elegans
10% C. elegans specific
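The percentages on the human and mouse summary slides can be reproduced from the category counts. The sketch below assumes that Families, Known Gene, and New Contig together count WormPept entries with a match, that Undetermined counts entries without one, and that the <100 aa entries were set aside. These groupings are an interpretation of the slides, and the function name `shares` is made up for the example.

```python
def shares(total, matched_counts, undetermined, short):
    """Return (percent matched, percent undetermined) of all entries,
    rounded to whole percent.

    Raises ValueError if the categories do not add up to the total,
    which guards against a mis-transcribed count.
    """
    matched = sum(matched_counts)
    if matched + undetermined + short != total:
        raise ValueError("category counts do not sum to the total")
    return round(100 * matched / total), round(100 * undetermined / total)


# Counts as given on the two slides (HGI = human, MGI = mouse searches).
human = shares(18452, (3934, 7954, 3456), 2070, 1038)
mouse = shares(18452, (5602, 4151, 5805), 1856, 1038)
print(human, mouse)
```

Under these assumptions both tallies add up exactly to the 18,452 WormPept entries, and the rounded shares agree with the 83%/11% and 84%/10% figures quoted on the slides.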


http://www.ibms.sinica.edu.tw/~wenlin/