RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute

Dec 15, 2015

Paul Sowerby
Page 1: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNAs in the human genome

Sam Griffiths-Jones

The Wellcome Trust Sanger Institute

Page 2: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Outline

• I. Non-coding RNA
  • The genome’s dark matter
  • Family classification
  • Genome annotation

• II. ncRNA genes in the human genome
  • Rogue’s gallery
  • miRNAs
  • Regulatory elements

Page 3: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

T. thermophilus - Ramakrishnan et al., Cell, 2002

Page 4: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Protein/RNA genes

DNA

RNA

proteinX

Page 5: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

ncRNA genes

• …. code for functional RNAs
• Many cellular machines contain RNA
  • Ribosome: rRNA
  • Spliceosome: snRNAs (U1, U2, U4, U5, U6)
  • Telomerase: telomerase RNA
  • SRP: SRP RNA

Page 6: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

How many genes in the human genome?

Page 7: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Gene sweep

• CSHL 2000-2003

• Rules
  • $1 in 2000, $5 in 2001 and $20 in 2002
  • A gene is a set of connected transcripts. A transcript is a set of exons connected via transcription. At least one transcript must be expressed outside of the nucleus and one transcript must encode a protein.
  • One bet per person, per year

• Results
  • 165 bets
  • Mean 61710
  • Lowest 25947
  • Highest 153478
  • Answer: 21000
  • Winner: Lee Rowen

• http://www.ensembl.org/Genesweep/

Page 8: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

ncRNA genes

• Genomic dark matter
  • Ignored by gene prediction methods
  • Not in EnsEMBL
  • Computational complexity

• ~10% of human gene count?

Page 9: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

The RNA World

• Origin of life / central dogma paradox
  • DNA needs proteins to replicate
  • Proteins coded for by DNA

• RNA can be code and machinery
  • SELEX, aptamers

• RNAs are remnants
  • Ancient
  • Essential

Page 10: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Biological sequence analysis

Protein easy

RNA hard

Page 11: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Gene finding

• Rules
  • ATG
  • TAA, TGA, TAG
  • GT…..AG

• Compositional features
  • Exon lengths
  • Intron lengths
  • Codon bias
  • General genomic properties

• Homology

?

?
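
The rule-based signals listed on this slide (ATG start, the three stop codons, GT…AG intron boundaries) apply to protein-coding genes. A minimal sketch of how such rules can be checked is shown here; the function names, the `min_codons` threshold and the example sequence are illustrative assumptions, not part of the talk, and real gene finders combine these signals with compositional statistics and homology.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    min_codons is a hypothetical cut-off on the number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs


def has_canonical_splice_sites(intron):
    """True if the intron starts with GT and ends with AG (the GT...AG rule)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


example = "CCATGAAATTTGGGTAACC"  # made-up sequence for illustration
print(find_orfs(example))
print(has_canonical_splice_sites("GTAAGTCCCAG"))
```

None of these signals exists for ncRNA genes, which have no start codon, stop codon or codon bias to search for.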

Page 12: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Protein sequence analysis

Query:   1 MKFYTIKLPKFLGGIVRAMLGSFRKD  26
           M+  TIKLPKFL  IVR   G+ + D
Sbjct: 390 MRIMTIKLPKFLAKIVRMFKGNKKSD 467

Page 13: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 14: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNA sequence analysis

Page 15: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNA sequence analysis

Page 16: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Why are families useful?

• Alignments of related sequences

• Phylogenetic trees

• Homologue detection

• Genome annotation

• Secondary structure prediction

S. cerevisiae       UCCUCGUGAGAGGG
P. canadensis       GUCUC.UGAGAGAU
P. strasburgensis   CUCUC.UGAGAGAG
K. thermotolerans   UUCUCGUGAGAGAA
SS                  <<<<<....>>>>>
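
The structure line (SS) in the alignment above marks which columns pair with each other. A short sketch of how that annotation can be read is given here: it matches `<` and `>` columns with a stack and lists the base pair each species has at every paired position. The helper names are assumptions made for illustration, and G-U wobble pairs are counted as valid pairs.

```python
ALIGNMENT = {
    "S. cerevisiae":     "UCCUCGUGAGAGGG",
    "P. canadensis":     "GUCUC.UGAGAGAU",
    "P. strasburgensis": "CUCUC.UGAGAGAG",
    "K. thermotolerans": "UUCUCGUGAGAGAA",
}
SS = "<<<<<....>>>>>"

# Watson-Crick pairs plus G-U wobble pairs
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def paired_columns(ss):
    """Match '<' and '>' columns with a stack; return sorted (i, j) pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(ss):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def pair_report(alignment, ss):
    """For each paired column pair, list the two bases seen in every sequence."""
    return {
        (i, j): [seq[i] + seq[j] for seq in alignment.values()]
        for i, j in paired_columns(ss)
    }


report = pair_report(ALIGNMENT, SS)
for cols, bases in report.items():
    ok = all(b in CANONICAL for b in bases)
    print(cols, bases, "all pairable" if ok else "mismatch")
```

For the outermost pair the four species carry UG, GU, CG and UA: the bases differ in every sequence, yet each combination can still pair. This kind of compensatory change is the signal that family alignments make visible and that sequence-only comparison misses.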

Page 17: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 18: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNA models

• Covariance models (profile-SCFGs)
  • Analogue to profile-HMMs
  • Statistical representation of the alignment with structure
  • Homologue detection
  • Multiple sequence alignment
  • (Sean Eddy)

Page 19: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Protein sequence analysis - HMMs

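ERELKKQKKLSNR
ERELKK..KQSNR
ERELKRQRKQSNR
KAAAQRQKMIKNR

[Figure: profile HMM built from the alignment, with begin (B) and end (E) states and match (M), insert (I) and delete (D) states; example sequence EREKKKRKQSNR]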

Page 20: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNA sequence analysis - SCFGs

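G G A A G A U C C
< < < . . . > > >

[Figure: covariance model for this hairpin, with three match-pair (MP) states for the stem (G-C, G-C, A-U) and three match-left (ML) states for the loop (A, G, A)]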

Page 21: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

RNA models - problems

• Problems
  • Speed
  • Memory
  • Sensitivity

• Speed
  • 30 billion bases in DBs
  • O(N^3) wrt model length
  • small model 300 b/s
  • 28S rRNA 200 b/day
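
To put the speed figures above in perspective, the following back-of-envelope calculation estimates how long one full database scan would take at the quoted rates. It assumes a single processor running at a constant rate, which is a simplification introduced here and not stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25
DATABASE_BASES = 30e9  # "30 billion bases in DBs"

# Small model: 300 bases per second
small_model_days = DATABASE_BASES / 300 / SECONDS_PER_DAY

# 28S rRNA model: 200 bases per day
large_model_days = DATABASE_BASES / 200
large_model_years = large_model_days / DAYS_PER_YEAR

print(f"small model: about {small_model_days:.0f} days "
      f"({small_model_days / DAYS_PER_YEAR:.1f} years)")
print(f"28S rRNA model: about {large_model_years:,.0f} years")
```

Under these assumptions a small model needs roughly three CPU-years per scan, and a 28S rRNA model would need hundreds of thousands of CPU-years, so searching with the largest models is impractical without heavy parallelism or pre-filtering.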

Page 22: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Sanger supercomputers

Page 23: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 24: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 25: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Rfam 5.0

• http://www.sanger.ac.uk/Software/Rfam/
• http://rfam.wustl.edu/
• 176 ncRNA families

• Structure annotated alignments
• Species distributions
• Keyword searches
• Sequence searches

• >235000 regions in EMBL 76

Page 26: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

ncRNA families

What we have:
• tRNA
• 5S, 5.8S rRNAs
• Spliceosomal RNAs
• SRP, RNaseP
• Telomerase, tmRNA, vault
• E. coli screens
• Some snoRNAs
• Some miRNAs
• Some UTR elements
• Self-splicing introns
• …… more

What we don’t:
• 18S, 23S rRNAs
• Other large things (Xist etc)
• Lots of snoRNAs
• Lots of miRNAs
• Many small families
• Unknowns

Page 27: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Genome annotation

• General
  • One tool fits all
  • Compute drain
  • Automatic
  • Eukaryotic complications
  • Comprehensive
  • Great for prokaryotes

• Specific
  • Heuristics
  • One family, one gene finder
  • Increased speed
  • Increased sensitivity
  • tRNAscan-SE, BRUCE, SRPscan, snoscan

Page 28: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Outline

• I. Non-coding RNA
  • The genome’s dark matter
  • Family classification
  • Genome annotation

• II. ncRNA genes in the human genome
  • Rogue’s gallery
  • miRNAs
  • Regulatory elements

Page 29: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 30: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Outline

• I. Non-coding RNA
  • The genome’s dark matter
  • Family classification
  • Genome annotation

• II. ncRNA genes in the human genome
  • Rogue’s gallery
  • miRNAs
  • Regulatory elements

Page 31: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

International Human Genome Sequencing Consortium, Nature, 2001

Page 32: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

X chromosome inactivation in mammals

X X X Y

X

Dosage compensation

Page 33: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Xist – X inactive-specific transcript

Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

Page 34: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

International Human Genome Sequencing Consortium, Nature, 2001

Page 35: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

microRNAs

• A novel class of ncRNA gene
• Products are ~22 nt RNAs
• Precursors are 70-100 nt hairpins
• Gene regulation by pairing to mRNA
• Unknown before 2001

Page 36: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Timeline

• Late 70’s – lin-4 and let-7 regulate developmental timing in worm

• 1993 – lin-4 codes for a ~22 nt RNA, complementary to 3’ UTR of lin-14

• 2000 – …. so does let-7 (stRNAs)

• 2000 – let-7 is conserved in bilaterally symmetric animals

• 2001 – ~100 miRNAs discovered by cloning in worm, fly and human

• 2002 – miRNAs conserved in plants

• 2002 – Science magazine’s breakthrough of the year

• 2002 – miRNA Registry established

• 2003 – miRNAs may account for 1% of total gene count in animals

• 2003 – a few targets of miRNAs identified

• 2004 – miRNA Registry has 719 miRNAs

Page 37: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

[Figure: number of publications matching “miRNA” in PubMed per year, 1999-2004 (y-axis 0-140)]

Page 38: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

miRNA biogenesis

Adapted from DP Bartel, Cell 116:281-297(2004)

Page 39: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

miRNA targets

DP Bartel, Cell 116:281-297(2004)

Page 40: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

PNAS 99:15524-15529(2002)

Page 41: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

miRNA Registry 3.0

• Searchable database of published miRNAs
  • http://www.sanger.ac.uk/Software/Rfam/mirna/
  • 719 entries from human, mouse, rat, worm, fly, and plants

• Naming service
  • Pre-publication
  • Unique names for distinct miRNAs
  • Confidentiality for unpublished data

Page 42: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 43: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Genomic context

180 known miRNAs in human

• 130 intergenic
  • 60 polycistronic
  • 70 monocistronic

• 50 intronic

Page 44: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

ncRNA gene contexts

AAAAAAA

tRNA, snRNAs,SRP, RNase P …..

Xist

miRNAs

miRNAs, snoRNAs

Page 45: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Inside-out genes

protein

Page 46: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Inside-out genes

degradation

Gas5, UHG, U17HG,U19H

snoRNA

Page 47: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

PrfA

37 °C

25 °C

Virulence gene expression

Cis-regulatory RNA elements: PrfA in Listeria

Page 48: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

UTR elements in human

• IRE: regulation of iron metabolism

• SECIS: UGA -> SeC

• Histone 3’ UTR: 3’ end formation

• Vimentin 3’ UTR: mRNA localisation

• CAESAR: CTGF repression

• …. many more

Page 49: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

ncRNAs in human genome

• tRNA 600

• 18S rRNA 200

• 5.8S rRNA 200

• 28S rRNA 200

• 5S rRNA 200

• snoRNA 300

• miRNA 250

• U1 40

• U2 30

• U4 30

• U5 30

• U6 20

• U4atac 5

• U6atac 5

• U11 5

• U12 5

• SRP RNA 1

• RNase P RNA 1

• Telomerase RNA 1

• RNase MRP 1

• Y RNA 5

• Vault 4

• 7SK RNA 1

• Xist 1

• H19 1

• BIC 1

• Antisense RNAs 1000s?

• Cis reg regions 100s?

• Others ?
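
As a rough check on the "~10% of human gene count" estimate, the counted families in the list above can be summed and compared with the Genesweep answer of 21000 genes. Using 21000 as the denominator is an assumption made here, since the slides do not say how the percentage was derived, and the open-ended entries (antisense RNAs, cis-regulatory regions, others) are left out.

```python
# Approximate copy numbers as listed on the slide
NCRNA_COUNTS = {
    "tRNA": 600, "18S rRNA": 200, "5.8S rRNA": 200, "28S rRNA": 200,
    "5S rRNA": 200, "snoRNA": 300, "miRNA": 250,
    "U1": 40, "U2": 30, "U4": 30, "U5": 30, "U6": 20,
    "U4atac": 5, "U6atac": 5, "U11": 5, "U12": 5,
    "SRP RNA": 1, "RNase P RNA": 1, "Telomerase RNA": 1, "RNase MRP": 1,
    "Y RNA": 5, "Vault": 4, "7SK RNA": 1, "Xist": 1, "H19": 1, "BIC": 1,
}

GENESWEEP_ANSWER = 21000  # "Answer: 21000" from the Gene sweep slide

total_ncrna = sum(NCRNA_COUNTS.values())
percent_of_gene_count = 100 * total_ncrna / GENESWEEP_ANSWER

print(f"ncRNA genes counted: {total_ncrna}")
print(f"relative to {GENESWEEP_ANSWER} genes: {percent_of_gene_count:.1f}%")
```

The listed families alone come to a little over 10% of that gene count, before any antisense or cis-regulatory RNAs are included.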

Page 50: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.
Page 51: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Summary

• ncRNA genes ….
  • have diverse and essential roles
  • may be relics of ancient RNA-based life
  • provide major computational challenges
  • are often ignored!
  • >10% of human gene count?

• Family classifications are useful for ….
  • finding homologues
  • predicting structure
  • automatic genome annotation

Page 52: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Just plain weird

http://vaults.arc.ucla.edu/sci/sci_home.htm

• Vault is huge
  • 13 MDa
  • 30 x 55 nm

• Described in 1986

• 3 proteins
  • MVP
  • TEP1
  • vPARP

• vRNA

• Conserved in higher euks

Page 53: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

http://vaults.arc.ucla.edu/sci/sci_home.htm

Page 54: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Thanks

• Alex Bateman

• Mhairi Marshall

• Simon Moxon

• Ajay Khanna

• Sean Eddy

• Informatics support group

• Ian Holmes

• Bjarne Knudsen

• Robbie Klein

• David Bartel

• Tom Tuschl

• Victor Ambros

Page 55: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

Bibliography

• Computational genomics of non-coding RNA genes. Sean R. Eddy, Cell 109:137-140 (2002)

• Non-coding RNAs: the architects of eukaryotic complexity. John S. Mattick, EMBO Reports 2:986-991 (2001)

• MicroRNAs: Genomics, biogenesis, mechanism and function. David P. Bartel, Cell 116:281-297 (2004)

• Rfam: An RNA family database. Sam Griffiths-Jones et al., Nucl. Acids Res. 31:439-441 (2003)

Page 56: RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.

[email protected]

http://www.sanger.ac.uk/Software/Rfam/

[email protected]

http://www.stats.ox.ac.uk/~hein/HumanGenome/