
Dec 21, 2015


BioSci 145B lecture 1 page 1 ©copyright Bruce Blumberg 2004. All rights reserved

mRNA frequency and cloning

• mRNA frequency classes
    – classic references
        • Bishop et al., 1974 Nature 250, 199-204
        • Davidson and Britten, 1979 Science 204, 1052-1059
    – abundant
        • 10-15 mRNAs that together represent 10-20% of the total RNA mass
        • > 0.2%
    – intermediate
        • 1,000-2,000 mRNAs together comprising 40-45% of the total
        • 0.05-0.2% abundance
    – rare
        • 15,000-20,000 mRNAs comprising 40-45% of the total
        • abundance of each is less than 0.05% of the total
        • some of these might only occur at a few copies per cell
• How does one go about identifying genes that might only occur at a few copies per cell?
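One way to make the closing question concrete is to ask how many randomly picked cDNA clones would have to be screened to expect at least one copy of a given mRNA. The sketch below uses the standard sampling estimate N = ln(1 - P) / ln(1 - f), where f is the fractional abundance of the mRNA and P is the desired probability of success. The formula, the function name `clones_to_screen`, and the 0.001% example are not from the slides; the estimate assumes that clones are sampled independently and without cloning bias.

```python
import math

def clones_to_screen(f: float, p: float = 0.99) -> int:
    """Clones needed to see a transcript of fractional abundance f at
    least once with probability p (assumes unbiased random sampling)."""
    if not (0.0 < f < 1.0 and 0.0 < p < 1.0):
        raise ValueError("f and p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))

# Class boundaries from the slide, written as fractions of total mRNA.
examples = [
    ("abundant boundary (0.2%)", 0.002),
    ("intermediate/rare boundary (0.05%)", 0.0005),
    ("hypothetical very rare mRNA (0.001%)", 0.00001),
]
for label, f in examples:
    print(f"{label}: {clones_to_screen(f):,} clones")
```

The required number of clones grows roughly as 1/f, which is why transcripts present at a few copies per cell are impractical to find by screening an unmodified library.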


Normalization and subtraction

• How to identify genes that might only occur at a few copies per cell?
    – alter the representation of the cDNAs in a library or probe
    – Normalization - process of reducing the frequency of abundant mRNAs and increasing the frequency of rare mRNAs
        • Bonaldo et al., 1996 Genome Research 6, 791-806
    – Subtraction - removing cDNAs (mRNAs) expressed in both of two populations, leaving only the differentially expressed ones
        • Sagerström et al. (1997) Ann Rev. Biochem 66, 751-783
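The effect of the two approaches on library composition can be illustrated with a toy model. This is only a bookkeeping sketch, not a description of the laboratory procedures in the cited papers: normalization is modeled as capping the number of clones per cDNA, subtraction as discarding every cDNA that also occurs in a second ("driver") population. The gene names, clone counts, and function names are all hypothetical.

```python
from collections import Counter

def normalize(library: Counter, cap: int) -> Counter:
    """Toy normalization: no cDNA keeps more than `cap` clones."""
    return Counter({gene: min(n, cap) for gene, n in library.items()})

def subtract(tester: Counter, driver: Counter) -> Counter:
    """Toy subtraction: drop every cDNA that also occurs in the driver."""
    return Counter({g: n for g, n in tester.items() if driver[g] == 0})

def frequency(library: Counter, gene: str) -> float:
    """Fraction of all clones in the library that belong to `gene`."""
    return library[gene] / sum(library.values())

# Hypothetical clone counts; gene names are placeholders.
tester = Counter({"actin": 900, "tubulin": 90, "rare_gene": 10})
driver = Counter({"actin": 800, "tubulin": 200})

print(frequency(tester, "rare_gene"))                 # before normalization
print(frequency(normalize(tester, 10), "rare_gene"))  # after normalization
print(sorted(subtract(tester, driver)))               # left after subtraction
```

In this toy library the rare cDNA rises from 1% of clones to one third after normalization, and it is the only cDNA left after subtraction.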


Adams et al., (1991) Science 252, 1651-1656

• The problem – completion of the human genome sequence was very far off in the distance
    – The big debate (circa 1989)
        • Sequence entire genome
            – Will take a long time and lots of money
        • Or sequence mRNAs (cDNAs)
            – Will get coding sequences but how to be sure you have every one?
            – How to get rare cDNAs?


Adams et al., (1991) Science 252, 1651-1656

• In 1991 only a few thousand mRNA sequences had been identified
    – Brain mRNAs < 200
    – Not good for solving neurological diseases
• Venter and colleagues from National Institute of Neurological Disorders and Stroke
• How to get rapid sequence to use for
    – Mapping
    – Studying diseases
    – Gene identification
• The solution?
    – High throughput sequencing of random cDNAs (96/day!)
        • Modern machines 8 x 384 /day each
    – These Expressed Sequence Tags have many uses
    – Venter proposes that they be used in place of STS (sequence tagged sites)
        • Provide more information with less cost and effort (no extensive validation required)


Adams et al., (1991) Science 252, 1651-1656

• What do you get from EST sequencing?
    – Rapid survey of expressed genes in cell, tissue, organ or embryo
    – Information for gene identification
    – Tags for gene mapping
• Table 1 tests how to improve the frequency of new genes


Adams et al., (1991) Science 252, 1651-1656

• Table 2
    – Table 2 shows that they identified a number of already known human genes
    – Unsaid is that these are all relatively abundant transcripts
        • At least in intermediate class
    – Suggests what subsequent EST sequencing shows to be the case
        • Random EST sequencing overrepresents abundant and intermediate frequency sequences
        • Underrepresents rare frequency class
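The sampling bias can be estimated with a short calculation. The sketch below takes midpoints of the frequency-class ranges given earlier in the lecture (12 abundant mRNAs at 15% of the mass, 1,500 intermediate at 42.5%, 17,500 rare at 42.5%) and assumes, purely for illustration, that every mRNA within a class is equally abundant. It then asks what fraction of the genes in each class would be expected to appear at least once among 600 random ESTs, the number of sequences quoted in the lecture's conclusions. The midpoints, the equal-abundance assumption, and the function names are not from the paper.

```python
# (number of genes in class, fraction of total mRNA mass): midpoints of
# the ranges on the frequency-class slide, chosen here for illustration.
CLASSES = {
    "abundant": (12, 0.15),
    "intermediate": (1500, 0.425),
    "rare": (17500, 0.425),
}

def reads_per_gene(n_genes: int, mass: float, n_reads: int) -> float:
    """Expected number of ESTs per gene, assuming equal abundance in class."""
    return n_reads * mass / n_genes

def fraction_of_genes_seen(n_genes: int, mass: float, n_reads: int) -> float:
    """Expected fraction of the class's genes sampled at least once."""
    per_gene = mass / n_genes
    return 1.0 - (1.0 - per_gene) ** n_reads

for name, (n_genes, mass) in CLASSES.items():
    hits = reads_per_gene(n_genes, mass, 600)
    seen = fraction_of_genes_seen(n_genes, mass, 600)
    print(f"{name:12s} {hits:7.3f} ESTs per gene, {seen:6.1%} of genes seen")
```

Under these assumptions each abundant gene is sequenced several times over, while only a small percentage of the rare genes are sampled at all. This is the redundancy problem that normalization and subtraction are meant to address.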


Adams et al., (1991) Science 252, 1651-1656

• Table 3 shows relationship of ESTs to other non-identical genes in the database
    – Putative relatives, depending on degree of sequence similarity
    – Ranges from nearly identical to about 57% (still fairly closely related)
    – Conclude that EST sequencing can identify relatives of genes known in other species


Adams et al., (1991) Science 252, 1651-1656

• Table 4
    – Compare sequences with ProSite motif database
        • This categorizes patterns seen in sequences
            – NLS
            – Zinc fingers
            – ATP binding cassette
            – Etc
    – Found several that appear to be new members of particular classes
    – Conclude that EST sequencing and analysis allows one to identify unknown members of known gene families
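A motif search of this kind amounts to pattern matching on the translated sequence. The sketch below converts a pattern written in basic PROSITE syntax into a Python regular expression and scans a protein sequence with it. The converter handles only a simple subset of the syntax. The example pattern is written in the style of a C2H2 zinc finger signature and should be checked against the database before being relied on, and the protein sequence is made up. None of this code is from the paper.

```python
import re

# One PROSITE element: optional '<', a residue spec, optional repeat, optional '>'.
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression.

    Supports x, [ABC], {ABC}, repeats such as x(2,4), and the < and >
    anchors; anything outside that subset raises ValueError.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except these
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

# Illustrative zinc-finger-style pattern (an assumption, verify before use).
C2H2_LIKE = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = re.compile(prosite_to_regex(C2H2_LIKE))

toy_protein = "MK" + "CAACAAAFAAAAAAAAHAAAH" + "GS"   # made-up sequence
print(prosite_to_regex(C2H2_LIKE))
print(bool(regex.search(toy_protein)))
```

A hit of this kind only flags a candidate family member. Short patterns also match by chance, so matches are a starting point for closer comparison rather than proof of membership.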


Adams et al., (1991) Science 252, 1651-1656

• Table 5
    – Evaluated accuracy of sequencing: 92-98% depending on read length
        • Limitation of separation technology (slab gels)
    – Very poor by today’s standards (99+% at 600 bases)
        • High error rate means must sequence at greater redundancy to get correct sequence
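The link between per-read accuracy and the redundancy needed can be sketched with a simple binomial model. The code below assumes that each read is wrong at a given base independently with probability 1 minus the accuracy, that the consensus is a majority vote over an odd number of reads, and, as a worst case, that all erroneous reads agree on the same wrong base. The model and the function name `consensus_error` are illustrative assumptions, not the paper's analysis.

```python
from math import comb

def consensus_error(per_read_accuracy: float, depth: int) -> float:
    """Probability that a majority of `depth` independent reads is wrong
    at one base (worst case: all wrong reads agree with each other)."""
    if depth < 1 or depth % 2 == 0:
        raise ValueError("depth must be a positive odd number")
    e = 1.0 - per_read_accuracy
    majority = depth // 2 + 1
    return sum(
        comb(depth, i) * e**i * (1.0 - e) ** (depth - i)
        for i in range(majority, depth + 1)
    )

for accuracy in (0.92, 0.98, 0.99):
    for depth in (1, 3, 5):
        err = consensus_error(accuracy, depth)
        print(f"accuracy {accuracy:.0%}, depth {depth}: consensus error {err:.2e}")
```

Under this model, reads at the low end of the 92-98% range need noticeably more coverage than 99% reads to reach the same consensus quality.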


Adams et al., (1991) Science 252, 1651-1656

• Figure 1 – identification of human relatives of Drosophila neurogenic genes
    – These are responsible for neuronal differentiation in Drosophila
    – Proves that genes of known function from model organism can be used to identify interesting human genes to study


Adams et al., (1991) Science 252, 1651-1656

• Figure 2
    – Mapped ESTs to chromosomes
    – Used PCR to check which members of a RH panel corresponded to EST
        • Maps the EST to a chromosome (provided that RH has been so mapped)
    – Why is this important?
        • This enables mRNAs to be mapped to genomic loci and provides a quick entry point to gene identification
            – Diseases
            – Mutations
            – Translocations
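The logic of panel mapping is a concordance test: the EST is assigned to the chromosome whose presence across the panel matches the pattern of PCR-positive cell lines. The sketch below simplifies each hybrid line to the set of whole human chromosomes it retains. The panel contents, line names, and PCR results are invented for illustration, and the function name `assign_chromosome` is hypothetical.

```python
# Hypothetical panel: which human chromosomes each hybrid cell line retains.
PANEL = {
    "hybrid_A": {1, 7, 19},
    "hybrid_B": {7, 12},
    "hybrid_C": {1, 12, 19},
    "hybrid_D": {19},
}

def assign_chromosome(pcr_positive: set[str], panel: dict[str, set[int]]) -> list[int]:
    """Return the chromosomes whose presence across the panel agrees with
    the PCR result in every hybrid line (perfect concordance)."""
    chromosomes = set().union(*panel.values())
    return sorted(
        c for c in chromosomes
        if all((c in retained) == (line in pcr_positive)
               for line, retained in panel.items())
    )

# Made-up PCR result: the EST primers amplify from hybrids A and B only.
print(assign_chromosome({"hybrid_A", "hybrid_B"}, PANEL))
```

With this invented panel only chromosome 7 is present in exactly the two PCR-positive lines, so the EST would be assigned there. An empty result would mean that no single chromosome explains the PCR pattern.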


Adams et al., (1991) Science 252, 1651-1656

• Conclusions
    – EST sequencing is a rapid and efficient way to generate sequence tags with numerous uses
        • 150-400 bp of sequence is enough to identify sequence, map to chromosome, determine homology with distant organisms
        • Claimed matches with yeast and Neurospora sequences
            – In fact, these were contaminants in library from yeast RNA used as carrier for precipitations during library construction
                » very sloppy
    – 337/600 sequences were putative new genes – good method to quickly identify genes
    – Way too many abundant genes – suggested that libraries must be normalized or subtracted to minimize redundancy
    – Pioneered large scale automated sequence entry
    – Suggested that in a few years, they would have mapped all of the mRNAs from human brain
        • Overly optimistic


dbEST Summary

Organism  May 1999  August 2000  August 2002

Homo sapiens (human)  1,380,737  2,232,809  4,533,427
Mus musculus + domesticus (mouse)  521,672  1,604,115  2,624,752
Rattus sp. (rat)  112,390  188,625  351,827
Glycine max (soybean)  8,236  96,930  268,299
Drosophila melanogaster (fruit fly)  83,197  90,777  256,583
Danio rerio (zebrafish)  24,567  71,186  255,334
Hordeum vulgare + subsp. vulgare (barley)  80  240,877
Bos taurus (cattle)  208  92,987  235,495
Xenopus laevis  408  35,218  220,132
Triticum aestivum (wheat)  4  39,150  196,047
Caenorhabditis elegans (nematode)  72,567  101,252  189,632
Arabidopsis thaliana (thale cress)  37,745  111,736  174,624
Ciona intestinalis  102  174,272
Zea mays (maize)  13,177  70,572  168,610
Medicago truncatula (barrel medic)  899  72,828  162,917
Dictyostelium discoideum  15,199  19,183  154,197
Lycopersicon esculentum (tomato)  9,088  87,680  148,346
Chlamydomonas reinhardtii  82  130,324
Sus scrofa (pig)  4,136  33,267  110,213
Oryza sativa (rice)  40,499  60,237  105,019
Silurana+Xenopus tropicalis  0  23  104,619
Solanum tuberosum (potato)  85  94,420
Anopheles gambiae (African malaria mosquito)  86  94,032
Sorghum bicolor (sorghum)  107  34,738  84,712
Gallus gallus (chicken)  388  12,840  62,476

total public ESTs  2,464,337  5,462,530  12,190,151


dbEST release 040904 - April 9, 2004

Homo sapiens (human)  5,484,645
Mus musculus + domesticus (mouse)  4,088,831
Rattus sp. (rat)  592,060
Triticum aestivum (wheat)  555,472
Ciona intestinalis  492,511
Danio rerio (zebrafish)  484,827
Gallus gallus (chicken)  481,956
Bos taurus (cattle)  409,104
Zea mays (maize)  395,955
Xenopus laevis (African clawed frog)  368,783
Hordeum vulgare + subsp. vulgare (barley)  356,856
Xenopus tropicalis  349,052
Glycine max (soybean)  346,582
Sus scrofa (pig)  287,741
Oryza sativa (rice)  283,989
Drosophila melanogaster (fruit fly)  274,367
Saccharum officinarum  246,301
Caenorhabditis elegans (nematode)  231,096
Arabidopsis thaliana (thale cress)  204,396
Sorghum bicolor (sorghum)  190,864
Dictyostelium discoideum  155,032
Lycopersicon esculentum (tomato)  150,519
Oryzias latipes (Japanese medaka)  149,697
Solanum tuberosum (potato)  149,227
Oncorhynchus mykiss (rainbow trout)  142,967
Schistosoma mansoni (blood fluke)  139,135
Vitis vinifera  137,660
Anopheles gambiae (African malaria mosquito)  134,784
Bombyx mori (domestic silkworm)  116,541
Pinus taeda (loblolly pine)  110,622
Lotus corniculatus var. japonicus  110,563

Number of public entries: 20,685,791