Dec 17, 2015
Organisation of human genome
Nuclear genome (3.2 Gbp): 24 types of chromosomes (Y: 51 Mb; chr1: 279 Mb)
Mitochondrial genome
Intergenic regions (junk)
Introns (junk)
Exons: 1.5%
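As a quick worked figure, the sketch below converts the 1.5% exon share into base pairs. It assumes the percentage applies to the 3.2 Gbp nuclear genome quoted earlier. The variable names are illustrative only.

```python
# Values taken from the slides: 3.2 Gbp nuclear genome, 1.5% of it in exons.
NUCLEAR_GENOME_BP = 3_200_000_000
EXON_PER_MILLE = 15  # 1.5% expressed per thousand, to keep integer arithmetic

exon_bp = NUCLEAR_GENOME_BP * EXON_PER_MILLE // 1000
non_exon_bp = NUCLEAR_GENOME_BP - exon_bp

print(f"Exonic sequence:     {exon_bp / 1e6:.0f} Mbp")
print(f"Non-exonic sequence: {non_exon_bp / 1e6:.0f} Mbp")
```

Under that assumption, roughly 48 Mbp of the genome is exonic and the rest is introns and intergenic sequence.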
The genome is empty?
Estimated number of genes:
Saccharomyces cerevisiae (baker’s yeast): 6,034
Drosophila melanogaster (fruit fly): 13,061
Caenorhabditis elegans (roundworm): 19,099
Arabidopsis thaliana (mustard plant): 25,000
INCREASING BIOLOGICAL COMPLEXITY REQUIRES GENOMIC CHANGES THAT INCREASE THE INFORMATIONAL CAPACITY OF THE SYSTEM...
...BUT THE NUMBER OF GENES IN THE VARIOUS SEQUENCED GENOMES DOES NOT MATCH WHAT WAS EXPECTED (APPARENTLY)
Amphimedon queenslandica 18693
Nasonia vitripennis 17279
Bos taurus >22790
Homo sapiens 21527
Mus musculus 22083
Trichoplax adhaerens 11514
Nematostella vectensis 18000
Danio rerio 21413
Drosophila melanogaster 13781
Ciona intestinalis 16000
Caenorhabditis elegans 20224
Gallus gallus <17000
Takifugu rubripes 18500
Xenopus tropicalis 18000
Strongylocentrotus purpuratus 23300
Anolis carolinensis 17000
Gorilla gorilla 21000
Pan troglodytes 21000
Oryza sativa 50000
Arabidopsis thaliana 26000
Glycine max 75778
Populus trichocarpa 45550
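A minimal Python sketch of the point made by the list above: ranking organisms by estimated gene count does not order them by apparent complexity. The counts are copied from the list. The choice of species and the variable names are illustrative.

```python
# Estimated gene counts copied from the list above (a subset, for illustration).
gene_counts = {
    "Trichoplax adhaerens": 11514,
    "Drosophila melanogaster": 13781,
    "Caenorhabditis elegans": 20224,
    "Homo sapiens": 21527,
    "Mus musculus": 22083,
    "Arabidopsis thaliana": 26000,
    "Oryza sativa": 50000,
    "Glycine max": 75778,
}

# Rank from most to fewest genes and compare each organism with human.
ranked = sorted(gene_counts.items(), key=lambda item: item[1], reverse=True)
for species, count in ranked:
    ratio = count / gene_counts["Homo sapiens"]
    print(f"{species:<26}{count:>7}  {ratio:4.2f}x human")

# Position of human in the ranking (1 = most genes).
human_rank = [species for species, _ in ranked].index("Homo sapiens") + 1
print("Human rank by gene count:", human_rank)
```

In this subset the three plants and the mouse all rank above human, and the roundworm sits just below.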
Why doesn’t (coding) gene number matter?
• More sophisticated regulation of expression?
• Proteome vastly larger than genome?
– Alternative splicing
– RNA editing
• Post-translational modifications
• Cellular location
…but, remember there are other genes
Genes in the genome:
• Protein-coding genes (mRNA): around 20500 (as of 10/2012)
• Non-coding RNAs
– Ribosomal RNA (rRNA)
– Transfer RNA (tRNA)
– Small nuclear RNA (snRNA)
– Small nucleolar RNA (snoRNA)
– microRNA (miRNA)
– Other non-coding RNAs (Xist, 7SK, etc.)
• Pseudogenes
Non polypeptide–coding: RNA encoding
Statistics about the current Gencode freeze (version 13)*
*The statistics derive from the gtf files, which include only the main chromosomes of the human reference genome.
Version 13 (March 2012 freeze, GRCh37)
General stats
Total No of Genes: 55123
• Protein-coding genes: 20670
• Long non-coding RNA genes: 12393
• Small non-coding RNA genes: 9173
• Pseudogenes: 13123
Total No of Transcripts: 182967
• Protein-coding transcripts: 77901
• Long non-coding RNA loci transcripts: 19835
Total No of distinct translations: 78119
Genes that have more than one distinct translations: 14235
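The Gencode figures above can be turned into a few ratios. This is a rough sketch, and it assumes the "genes that have more than one distinct translation" are all protein-coding genes, which the statistics do not state explicitly. Variable names are illustrative.

```python
# Figures copied from the Gencode version 13 statistics above.
total_genes = 55123
genes = {
    "protein-coding": 20670,
    "long non-coding RNA": 12393,
    "small non-coding RNA": 9173,
    "pseudogene": 13123,
}
protein_coding_transcripts = 77901
genes_with_multiple_translations = 14235

# Share of each category among all annotated genes.
for kind, n in genes.items():
    print(f"{kind:<22}{n:>6}  {100 * n / total_genes:5.1f}% of annotated genes")

# Average number of annotated transcripts per protein-coding gene.
transcripts_per_gene = protein_coding_transcripts / genes["protein-coding"]

# ASSUMPTION: multi-translation genes are a subset of protein-coding genes.
multi_fraction = genes_with_multiple_translations / genes["protein-coding"]

print(f"Transcripts per protein-coding gene: {transcripts_per_gene:.2f}")
print(f"Fraction with >1 distinct translation: {multi_fraction:.2f}")
```

On these numbers a protein-coding gene has close to four annotated transcripts on average, which is one way the proteome can exceed the gene count.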
Protein-coding genes (mRNA):
Human genes and their homology to genes from other organisms
Noncoding regions in coding genes
• Regulatory regions
– RNA polymerase binding site
– Transcription factor binding sites
– Polyadenylation [poly(A)] sites
– Enhancers
• 5’- and 3’-UTRs
CODING GENES
DNA as a series of ‘docking’ sites
It is the relative location of these docking sites to one another that permits genes to be transcribed, spliced, and translated properly and in specific spatial and temporal patterns.
…some more statistics
• Gene density: 1 per 100 kb (varies widely)
• On average 9 exons per gene
• 363 exons in the titin gene
• Many genes are intronless
• Largest intron is 800 kb (WWOX gene)
• Smallest introns: 10 bp
• Average 5’ UTR: 0.2-0.3 kb
• Average 3’ UTR: 0.77 kb, but underestimated…
• Largest protein: titin, 38,138 aa
• Largest gene: dystrophin
Human genes vary enormously in size and exon content
An example of complex human gene locus INK4a-ARF
From: Prof. Gordon Peters website
Genes within genes
Intron 26 of the neurofibromatosis gene (NF1) encodes:
• OMGP (oligodendrocyte myelin glycoprotein)
• EVI2A and EVI2B (homologues of ecotropic viral integration sites in mouse)
Why doesn’t gene number matter?
• More sophisticated regulation of expression
• Proteome vastly larger than genome
– Alternative splicing
– RNA editing…
• Post-translational modifications
• Co-option
• Connectivity of gene regulatory networks (GRNs)
DYNAMIC NETWORKS
Table 1. Levels of regulation--loci of control constraints--above the genome.
Levels and transitions: Dynamic regulatory system
1. Genome to transcriptome: Epigenetic regulation of gene expression (5). Includes pathways that detect energy levels (redox levels) and repress DNA transcription when cellular NADH levels are increased.
2. Transcriptome to proteome: Regulatory constraints include posttranslational modification of proteins.
3. Proteome to dynamic system: Metabolic networks of glycolysis and mitochondrial oxidation-reduction are the dynamic systems presently the best understood in terms of both mechanism of formation and operating principles. They display control distributed over all enzymes of a network, and their phenotype includes cellular redox potential.
4. Dynamic systems to phenotype: Control of global phenotype such as disease may be localized to a single regulatory system (such as metabolic, hormone signaling, etc.) or be distributed over many systems and levels.
Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are regulated and controlled
UCSC
Location of CpG islands in the gene
CpG islands do NOT have a deficit of CpG dinucleotides
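One common way to quantify "no deficit of CpG" is the observed/expected CpG ratio, (number of CpG x length) / (number of C x number of G). The sketch below uses the widely cited thresholds of length at least 200 bp, GC content above 50% and observed/expected above 0.6. These thresholds are an assumption here, since the slides do not state any. Function names and example sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides read 5'->3' on one strand
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_like(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds (not taken from the slides)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Toy sequences: one CpG-rich, one GC-rich but without any CpG.
island_like = "ACGCGTCGCG" * 20
cpg_depleted = "CCAGGTCAGT" * 20

for name, seq in [("island-like", island_like), ("CpG-depleted", cpg_depleted)]:
    gc, oe = cpg_stats(seq)
    print(f"{name:<13} GC={gc:.2f} obs/exp={oe:.2f} island={is_cpg_island_like(seq)}")
```

The second toy sequence is GC-rich but has no CpG at all, so GC content alone does not make a region an island.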
How epigenetics works
[Figure: promoter region containing a CpG island, upstream of a gene; the legend distinguishes CpG from methylated CpG; panels are labelled Gene, RNA, Proteins]
Unmethylated CpGs relax chromatin
Methylated CpGs constrain chromatin
Chromatin Modification
• Chromatin remodeling: SNF/SWI
• Histone modification: acetylation, ubiquitination, sumoylation, methylation, phosphorylation
• DNA methylation: CpG dinucleotides, MeCP2
• Histone substitution: H2AZ, H2Ax, H3.3
• Transcription factor modification: acetylation, phosphorylation
Eukaryotic transcription regulation
Modular construction and combinatorial control
• The regulatory sequence (cis element) on DNA consists of multiple motifs specific for transcription factors.
• Multiple transcription factors can bind simultaneously to the regulatory sequences and act together on the transcription of the gene.
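A toy illustration of combinatorial control, assuming three hypothetical transcription factors and an invented rule for when the gene is transcribed. Neither the factor names nor the rule come from the slides. The point is only that a few binding sites give many occupancy states.

```python
from itertools import product

# Hypothetical transcription factors with binding motifs in one regulatory region.
factors = ["TF_A", "TF_B", "TF_C"]

# Every motif is either bound (True) or unbound (False): 2**n occupancy states.
states = list(product([False, True], repeat=len(factors)))


def transcribed(state):
    """Invented rule: TF_A is required, together with TF_B or TF_C."""
    a, b, c = state
    return a and (b or c)


active = [state for state in states if transcribed(state)]

print(f"{len(states)} occupancy states, {len(active)} of them drive transcription")
for state in active:
    bound = [name for name, is_bound in zip(factors, state) if is_bound]
    print("  bound:", ", ".join(bound))
```

With n motifs there are 2**n possible states, so a modest number of factors can encode many distinct expression patterns.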
Regulated Transcription
[Figure labels: TBP at the TATA box (-35) upstream of Gene X; co-activator protein; general transcription factors; transcriptional activators binding to promoter region]
Activators stimulate the highly cooperative assembly of initiation complexes
Figure 10-60
Binding sites for activators that control transcription of the mouse TTR gene
Model for cooperative assembly of an activated transcription-initiation complex in the TTR promoter
Figure 10-61
(TTR= transthyretin)
Locus Control Region
Regulatory site required for optimal expression of adjacent group of genes
Insulator Element
Prevents activation/repression extending to an adjacent regulatory sequence
Distant Cis-Acting Elements
ALTERNATIVE PROMOTERS
SEX-SPECIFIC REGULATION OF THE DNMT1 GENE (METHYLTRANSFERASE): OOCYTE, SOMATIC, OR SPERMATOCYTE PROMOTERS
Posttranscriptional control
• Regulation of RNA processing
• Regulation of mRNA degradation
• Regulation of translation
mRNA: many places for variation, modification, regulation
• transcription
– initiation
– elongation
– termination
• 5’ capping
• 3’ polyA addition
– alternative sites
• splicing
– alternative exons
– self-splicing, spliceosome-mediated
• editing
– changing bases and codons
• nuclear export
– mature mRNA only
• stability
– nonsense-mediated decay
– degradation signals
• sequestration
– localization in cytoplasmic compartments
– access to translation machinery
• antisense/RNA interference
– inhibit translation
The PolyA Site (PAS)
[Figure labels: 3’ exon with stop codon and UTR; polyA signal AATAAA; ~17 nt; PAS; polyA tail (AAAA)]
Alternative polyadenylation sites
Alternative PAS & Post-transcriptional (de)regulation
[Figure: coding sequence followed by a 3' UTR carrying several alternative AUUAAA signals; possible regulatory element (stability, translation, transport)]
Use of an abnormal polyA site is associated with various diseases:
• α/β-thalassemia (globin)
• Mantle cell lymphoma (Cyclin CCND1)
• Teratocarcinoma (PDGF)
• Hypertension (Ca2+ ATPase)
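A minimal sketch of how candidate polyA signals could be located in a 3' UTR, using only the two hexamers named on these slides (AATAAA/AAUAAA and AUUAAA). The function name and the example sequence are hypothetical. Real polyA site prediction uses more context than a hexamer match.

```python
import re

# Hexamers mentioned on the slides, written as RNA.
POLYA_SIGNALS = ("AAUAAA", "AUUAAA")


def find_polya_signals(seq):
    """Return (0-based position, hexamer) for every signal hexamer in seq.

    Accepts DNA or RNA; T is converted to U before searching. A lookahead
    is used so that overlapping hexamers would also be reported.
    """
    rna = seq.upper().replace("T", "U")
    pattern = re.compile("(?=(" + "|".join(POLYA_SIGNALS) + "))")
    return [(match.start(), match.group(1)) for match in pattern.finditer(rna)]


# Toy 3' UTR (DNA alphabet) with two alternative signals.
toy_utr = "GGCATTAAACCGGAATAAAGGCC"
for position, hexamer in find_polya_signals(toy_utr):
    print(f"{hexamer} at position {position}")
```

Which of several candidate signals is used determines the length of the 3' UTR, and so which regulatory elements the mature mRNA retains.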
Consensus nucleotides at intron/exon junctions
Alternative splicing is a mechanism for generating functional diversity
Alternative processing example
RNA editing is a rare form of post-transcriptional processing in which base-specific changes are enzymatically introduced at the RNA level. Types of RNA editing in humans:
(i) C ---> U, carried out in humans by a specific cytidine deaminase
e.g. the expression of the human apolipoprotein B gene in the intestine involves tissue-specific RNA editing
(ii) A ---> I, the amino group at carbon 6 of adenine is replaced by a carbonyl group. I then acts as a G. Occurs in some ligand-gated ion channels.
(iii) U ---> C, in mRNA of the WT1 Wilms’ tumor gene
(iv) U ---> A, in alpha-galactosidase mRNA
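The effect of the first two editing types on the encoded protein can be sketched with toy sequences. The apolipoprotein B case is usually described as a CAA (Gln) codon edited to UAA (stop), which gives the shorter Apo B-48. The strings below are invented miniatures, not the real transcripts, and inosine is modelled simply as G, as the list above states. All names are illustrative.

```python
# Only the codons needed for the toy examples (standard genetic code).
CODON_TABLE = {
    "AUG": "M", "GAU": "D", "CAA": "Q", "AUA": "I",
    "GGA": "G", "CAG": "Q", "CGG": "R", "UAA": "*",
}


def translate(mrna):
    """Translate from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_base(mrna, position, old, new):
    """Replace one base, checking that the expected base is present."""
    if mrna[position] != old:
        raise ValueError(f"expected {old} at position {position}")
    return mrna[:position] + new + mrna[position + 1:]


# (i) C -> U: CAA (Gln) becomes UAA (stop), truncating the protein.
unedited = "AUGGAUCAAAUAGGAUAA"
c_to_u = edit_base(unedited, 6, "C", "U")
print(translate(unedited), "->", translate(c_to_u))

# (ii) A -> I, read as G: CAG (Gln) is decoded as CGG (Arg).
channel = "AUGCAGGGAUAA"
a_to_i = edit_base(channel, 4, "A", "G")  # inosine modelled as G
print(translate(channel), "->", translate(a_to_i))
```

The first edit shortens the product and the second changes a single amino acid, so both produce a protein that is not directly encoded in the genome.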
RNA editing
Apo B-100
Apo B-48
Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are frequently regulated
1. Proteolysis
2. Glycosylation
3. Attachment of lipids:
– myristoylation
– prenylation (farnesyl or geranylgeranyl)
– palmitoylation
4. Attachment of glycolipids
5. Protein phosphorylation
Post-translational modifications that alter activity of the p53 protein. Enzymes that have been shown to modify specific amino acid residues of p53 are shown. Enzymes that inhibit the covalent modifications are indicated in red. P, phosphorylation; R, ribosylation; Ac, acetylation.
…increasing informational capability of the genome, but there are other genes….