How cells read the genome: From DNA to Protein
Control of Gene expression
M. Saifur Rohman, MD. PhD. FIHA. FICA
Subtopics
• From DNA to RNA
• From RNA to protein
• The RNA world and the origin of life
• An overview of gene control
• DNA binding motifs in gene regulatory proteins
• How genetic switches work
• The molecular genetic mechanisms that create specialized cell types
• Post-transcriptional controls
• How genomes evolve
From RNA to Protein: Step by step
• The genetic code
• Open reading frames
• tRNA structure and production
• tRNA charging - tRNA synthetases
• Ribosome structure
– (components, tRNA binding, rRNA, peptide tunnel)
• Peptide chain elongation
– EF-Tu, EF-G or EF1, EF2
• Initiation (prokaryotic & eukaryotic)
• Termination
• Polyribosomes
• mRNA template surveillance (quality control)
– NMD, non-stop mediated decay, tmRNA
• Changes in the code (selenocysteine, frameshifting, hardcoded)
• Protein folding (chaperones: hsp60 & hsp70, degradation, diseases)
The RNA World
• Basic tenets of the theory
• Basic timeline
• Pre-RNA world
• Ribozymes
• SELEX (Systematic Evolution of Ligands by EXponential enrichment)
• Model of central dogma
Gene control and DNA binding motifs
• Differentiated cells contain the same DNA
• Structure of DNA binding proteins
– DNA binding and activation domains
• Types of DNA binding motifs and how they work
• Common techniques
– Microarray
– 2-D gels
– Gel mobility shift
– DNA affinity chromatography
– Footprinting
– SELEX
– One hybrid system
– Chromatin immunoprecipitation, ChIP-chip, ChIP-seq
– Phylogenetic footprinting
The control of gene expression
• Each cell in the human body contains all the genetic material for the growth and development of a human
• Some of these genes need to be expressed all the time
• These are the genes involved in vital biochemical processes such as respiration
• Other genes are not expressed all the time
• They are switched on and off as needed
© 2007 Paul Billiet ODWS
Operons
• An operon is a group of genes that are transcribed together from a single promoter.
• They usually control an important biochemical process.
• They are found mainly in prokaryotes.
© NobelPrize.org
Jacob, Monod & Lwoff
The lac Operon
The lac operon consists of three genes each involved in processing the sugar lactose
One of them is the gene for the enzyme β-galactosidase
This enzyme hydrolyses lactose into glucose and galactose
1. When lactose is absent
• A repressor protein is continuously synthesised. It sits on a sequence of DNA just in front of the lac operon, the Operator site
• The repressor protein blocks the Promoter site where the RNA polymerase settles before it starts transcribing
[Figure: DNA map with the regulator gene (I), the operator site (O) and the lac operon genes z, y and a. The repressor protein sits on the operator and RNA polymerase is blocked.]
2. When lactose is present
• A small amount of the sugar allolactose is formed within the bacterial cell. It binds to the repressor protein at a second site (the allosteric site)
• This causes the repressor protein to change its shape (a conformational change). It can no longer sit on the operator site. RNA polymerase can now reach its promoter site
[Figure: DNA map with I, O and the genes z, y and a; the promoter site is now free of the repressor.]
3. When both glucose and lactose are present
• The repressor mechanism explains how the lac operon is transcribed only when lactose is present.
• But it does not explain why the operon is not transcribed when both glucose and lactose are present.
• When both glucose and lactose are present, RNA polymerase can sit on the promoter site, but it is unstable and keeps falling off
[Figure: the repressor protein is removed from the operator (O), yet RNA polymerase does not stay on the promoter site in front of z, y and a.]
4. When glucose is absent and lactose is present
• Another protein is needed: an activator protein (CAP, the catabolite activator protein). It stabilises RNA polymerase on the promoter.
• The activator protein only works when glucose is absent
• In this way E. coli only makes enzymes to metabolise other sugars in the absence of glucose
[Figure: with the repressor removed, the activator protein steadies RNA polymerase on the promoter site and the genes z, y and a are transcribed.]
Carbohydrates        | Activator protein | Repressor protein        | RNA polymerase                  | lac operon
+ GLUCOSE, + LACTOSE | Not bound to DNA  | Lifted off operator site | Keeps falling off promoter site | No transcription
+ GLUCOSE, - LACTOSE | Not bound to DNA  | Bound to operator site   | Blocked by the repressor        | No transcription
- GLUCOSE, - LACTOSE | Bound to DNA      | Bound to operator site   | Blocked by the repressor        | No transcription
- GLUCOSE, + LACTOSE | Bound to DNA      | Lifted off operator site | Sits on the promoter site       | Transcription
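The four glucose/lactose combinations can be written as a small piece of boolean logic. The sketch below is only an illustration of the rules stated on these slides; the function name `lac_operon_state` and the dictionary keys are made up for this example, and the model ignores everything quantitative (basal expression, cAMP levels, and so on).

```python
def lac_operon_state(glucose: bool, lactose: bool) -> dict:
    """Qualitative state of the lac operon for one sugar combination.

    Assumed rules (taken from the slides, simplified to booleans):
    - the activator protein binds DNA only when glucose is absent
    - the repressor sits on the operator only when lactose is absent
    - transcription needs the activator bound AND the repressor lifted off
    """
    activator_bound = not glucose
    repressor_on_operator = not lactose

    if repressor_on_operator:
        polymerase = "blocked by the repressor"
    elif activator_bound:
        polymerase = "sits on the promoter site"
    else:
        polymerase = "keeps falling off the promoter site"

    return {
        "activator_bound": activator_bound,
        "repressor_on_operator": repressor_on_operator,
        "polymerase": polymerase,
        "transcription": activator_bound and not repressor_on_operator,
    }


# Print all four combinations in the same order as the summary table.
for glucose, lactose in [(True, True), (True, False), (False, False), (False, True)]:
    state = lac_operon_state(glucose, lactose)
    label = f"{'+' if glucose else '-'} GLUCOSE, {'+' if lactose else '-'} LACTOSE"
    print(label, "->", "Transcription" if state["transcription"] else "No transcription")
```

Only the combination without glucose and with lactose yields transcription, which is the combinatorial (AND-like) control the slides describe.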
• Control of Gene Expression
• 1. DNA-Protein Interaction
• 2. Transcription Regulation
• 3. Post-transcriptional Regulation
Base pairs can be distinguished from their outer edges without opening the double helix
Hydrogen bond donor: blue
Hydrogen bond acceptor: red
Hydrogen bond: pink
Methyl group: yellow
One typical contact at the protein-DNA interface
In general, many such contacts form between a protein and DNA
DNA-Protein Interaction
1. Different protein motifs that bind DNA: helix-turn-helix motif; homeodomain; leucine zipper; helix-loop-helix; zinc finger
2. Dimerization approach
3. Biotechnology to identify proteins and DNA sequences that interact with each other
Helix-turn-helix
The C-terminal helix binds in the major groove; the N-terminal helix helps to position the complex. Discovered in bacteria.
A dimer of the zinc finger domain of the glucocorticoid receptor (a member of the intracellular receptor family) bound to its specific DNA sequence
Zinc atoms stabilize the DNA-binding helix and the dimerization interface
Beta sheets can also recognize DNA sequences (bacterial Met repressor, which binds S-adenosyl methionine)
Leucine zipper dimer
The same motif mediates both DNA binding and protein dimerization (yeast Gcn4 protein)
DNA affinity chromatography
After obtaining the protein, run mass spectrometry, identify the amino acid sequence, check the genome and find the gene sequence
Summary
• Helix-turn-Helix, homeodomain, leucine zipper, helix-loop-helix, zinc-finger motif
• Homodimer and heterodimer
• Techniques to isolate proteins that bind a known DNA sequence (DNA affinity chromatography) or to detect protein binding to a DNA fragment (gel mobility shift)
Tryptophan Gene Regulation (Negative control)
Operon: genes that are adjacent to each other and are transcribed from a single promoter
Combinatorial Regulation of the Lac Operon
CAP: catabolite activator protein; lactose is broken down when glucose is low and lactose is present
Differences between the regulatory systems of eukaryotes and bacteria
1. Enhancers acting at a great distance from the promoter region
2. Transcription factors
3. Chromatin structure
Regulation of a eukaryotic gene
The general TFs are similar for all genes; the gene regulatory proteins can be very different from gene to gene
Genetic engineering revealed the function of gene activator proteins
When a mediator protein is fused directly to an enhancer-binding domain, omitting the activation domain, similar enhancement is observed
Gene regulatory proteins help the recruitment and assembly of the transcription machinery (general model)
Two mechanisms of histone acetylation in gene regulation
a. Histone acetylation attracts further activator proteins
b. Histone acetylation directly attracts TFs
Gene regulatory proteins can affect transcription process at different steps
The order of process may be different for different genes
Summary
• Gene activation or repression proteins
• DNA as a spacer and distant regulation
• Chromatin modulation, TF assembly, polymerase recruitment
• Combinatorial regulation
Genetic Switches
• Positive, negative and combinatorial control of transcription in bacteria
– Trp and Lac operons
– Lambda repressor
• DNA bending and protein-protein interactions on DNA
• Differences in transcription regulation between prokaryotes and eukaryotes
• The structure of a eukaryotic gene control region
• How eukaryotic transcriptional activators work
• How eukaryotic transcriptional repressors work
• Steps of eukaryotic transcriptional activation
• Transcription factor complexes, coactivators and corepressors synergy
• Control of Drosophila even-skipped (eve)
• Locus control regions and insulators
Creating Specialized Cells
• Phase variation in Salmonella
• Yeast mating type switching
• Regulation of lambda phage lysogeny: flip-flop
• Four types of feedback
• Positive and negative transcription feedback loops
• Examples:
– Circadian clocks (don’t need to know details)
– Myogenic proteins and muscle cell formation
– Eye development in Drosophila
• Creation of cell types by a few transcription factors
• Mechanisms by which patterns of gene expression can be passed to daughter cells:
– X-inactivation
– Cytosine methylation
– Genomic imprinting
– CpG islands
Post-transcriptional controls
• Post-initiation transcriptional control of gene expression
• Attenuation
• Alternative splicing
• Regulation of alternative splicing
• Transcript cleavage
• Secreted versus membrane-bound antibodies
• RNA editing, especially as it relates to human cells
• RNA transport and localization
• Export of HIV RNAs from the nucleus
• Localization in the cytoplasm
• Negative control of translation initiation
• Bacteria (e.g. bacterial ribosomal proteins)
• How do translational repressors work in eukaryotes?
– Aconitase
• Phosphorylation of eIF-2
• uORFs
• IRES
• Control of mRNA stability
• RNA interference, miRNAs, siRNAs
Transcription
• The transcription cycle
• The structure of E. coli RNA polymerase
• Sigma70 promoter structure (-10 region & variants)
– Sigma factors
• Subunits of bacterial RNA polymerase
• Two types of terminators
• Eukaryotic RNA pols
– General composition of the polymerases
– General transcription factors
– TATA and other promoter DNA sequence signals
– Mediator complex
• Elongation
• RNA capping, splicing, cleavage and polyadenylation
• Differences between prokaryotic and eukaryotic transcription
Splicing
• Spliceosome structure and mechanism of splicing
• Different types of splicing (3 major types)
• Group I and II introns
Alternative picture: co-transcriptional pre-mRNA processing
• This picture is more realistic than the previous one, particularly for long pre-mRNAs
Heterogeneous nuclear ribonucleoprotein particle (hnRNP) proteins
• In the nucleus, nascent RNA transcripts are associated with an abundant set of proteins
• hnRNPs prevent formation of secondary structures within pre-mRNAs
• hnRNP proteins are multidomain with one or more RNA binding domains and at least one domain for interaction with other proteins
• Some hnRNPs contribute to pre-mRNA recognition by RNA processing enzymes
• The two most common RNA binding domains are RNA recognition motifs (RRMs) and RGG box (five Arg-Gly-Gly repeats interspersed with aromatic residues)
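Capping
1. Nascent transcript with a 5′ triphosphate: p-p-p-N-p-N-p-N-p…
2. Capping enzyme (mCE) removes the terminal phosphate: p-p-N-p-N-p-N-p…
3. mCE (another subunit) transfers GMP to the 5′ end: G-p-p-p-N-p-N-p-N-p…
4. Methyltransferases, using S-adenosyl methionine, methylate the guanine: CH3-G-p-p-p-N-p-N-p-N-p…
5. A second methyl group is added to the first nucleotide: CH3-G-p-p-p-N(CH3)-p-N-p-N-p…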
The capping enzyme
• A bifunctional enzyme with both 5’-triphosphatase and guanylyltransferase activities
• In yeast the capping enzyme is a heterodimer
• In metazoans the capping enzyme is monomeric with two catalytic domains
• The capping enzyme is specific for RNAs transcribed by RNA Pol II (why?)
Capping mechanism in mammals
[Figure labels: DNA, growing RNA]
The capping enzyme is allosterically controlled by the CTD of RNA Pol II and by another stimulatory factor, hSpt5
Polyadenylation
• Poly(A) signal recognition
• Cleavage at poly(A) site
• Slow polyadenylation
• Rapid polyadenylation
• G/U: G/U or U rich region
• CPSF: cleavage and polyadenylation specificity factor
• CStF: cleavage stimulatory factor
• CFI: cleavage factor I
• CFII: cleavage factor II
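The first two steps above (signal recognition, then cleavage) can be illustrated with a short sequence scan. This is a minimal sketch, not a real poly(A) site predictor: it assumes the canonical AAUAAA hexamer as the only signal and assumes cleavage falls roughly 10 to 30 nucleotides downstream of it. The function name `find_cleavage_windows` and the `min_gap`/`max_gap` parameters are invented for this example, and the downstream G/U-rich region is not modelled.

```python
import re

POLY_A_SIGNAL = "AAUAAA"  # canonical hexamer (AATAAA in the DNA)


def find_cleavage_windows(sequence, min_gap=10, max_gap=30):
    """Return (signal_start, window_start, window_end) for each AAUAAA.

    The window is the stretch downstream of the signal in which cleavage
    is assumed to occur (min_gap..max_gap nucleotides after the hexamer).
    Positions are 0-based; window_end is exclusive.
    """
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    windows = []
    # A lookahead pattern lets overlapping signals be reported as well.
    for match in re.finditer(f"(?={POLY_A_SIGNAL})", rna):
        signal_end = match.start() + len(POLY_A_SIGNAL)
        start = signal_end + min_gap
        end = min(signal_end + max_gap, len(rna))
        if start < end:  # skip signals too close to the 3' end
            windows.append((match.start(), start, end))
    return windows


# Toy sequence (hypothetical, not a real gene).
example = "GGG" + "AATAAA" + "C" * 40
print(find_cleavage_windows(example))
```

In the cell the choice of site also depends on CPSF, CStF, CFI and CFII binding, so a scan like this only narrows down candidate regions.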
Link between polyadenylation and transcription
[Figure: Pol II with phosphorylated CTD and poly(A)-binding factors transcribing through the AATAAA signal; labels: cap, mRNA, poly(A), splicing and nuclear transport, degradation.]
• mRNA gets cleaved and polyadenylated
• FCP1 phosphatase removes phosphates from the CTD
• Pol II gets recycled
Small nuclear RNAs U1-U6 participate in splicing
• snRNAs U1, U2, U4, U5 and U6 form complexes with 6-10 proteins each, forming small nuclear ribonucleoprotein particles (snRNPs)
• Sm sites: binding sites for snRNP proteins
Additional factors of exon recognition
ESE: exonic splicing enhancer sequences
SR proteins: ESE-binding proteins
U2AF65/35: subunits of the U2AF factor, which bind the pyrimidine-rich region and the 3’ splice site
The spliced lariat is linearized by debranching enzyme and further degraded by the exosome
Not all introns are completely degraded. Some end up as functional RNAs, different from mRNA
Co-transcriptional splicing
[Figure: Pol II with phosphorylated CTD; capped mRNA; SR proteins and snRNPs assembled on an intron.]
SCAFs: SR-like CTD-associated factors
Self-splicing introns
• Under certain nonphysiological conditions in vitro, some introns can get spliced without aid of any proteins or other RNAs
• Group I self-splicing introns occur in rRNA genes of protozoans
• Group II self-splicing introns occur in chloroplasts and mitochondria of plants and fungi
Spliceosome
• The spliceosome contains snRNAs, snRNPs and many other proteins, about 300 subunits in total.
• This makes it the most complicated macromolecular machine known to date.
• But why is the spliceosome so complicated if it only catalyzes a reaction as straightforward as intron removal? Moreover, some introns can excise themselves without the aid of any protein, so why have all those 300 subunits?
• No one knows for sure, but there might be at least 4 reasons:
• 1. Defective mRNAs cause a lot of problems for cells, so some subunits might assure correct splicing and error correction
• 2. Splicing is coupled to nuclear transport, this requires accessory proteins
• 3. Splicing is coupled to transcription and this might require more additional accessory proteins
• 4. Many genes can be spliced in several alternative ways, which also might require additional factors
One gene – several proteins
• Cleavage at alternative poly(A) sites
• Alternative promoters
• Alternative splicing of different exons
• RNA editing
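To get a feeling for how quickly these mechanisms multiply the number of possible transcripts, the sketch below enumerates every combination for a made-up gene. All names (`enumerate_isoforms`, "P1", "exon3", "pA1", and so on) are hypothetical, and the count is an upper bound: it assumes every cassette exon is included or skipped independently, which real genes generally do not do, and it leaves RNA editing out.

```python
from itertools import product


def enumerate_isoforms(promoters, cassette_exons, poly_a_sites):
    """List every (promoter, included exons, poly(A) site) combination.

    Assumption: each cassette exon is independently included or skipped.
    """
    isoforms = []
    for promoter, poly_a in product(promoters, poly_a_sites):
        # One True/False choice per cassette exon.
        for mask in product([False, True], repeat=len(cassette_exons)):
            included = tuple(
                exon for exon, keep in zip(cassette_exons, mask) if keep
            )
            isoforms.append((promoter, included, poly_a))
    return isoforms


# Hypothetical gene: 2 promoters, 2 cassette exons, 2 poly(A) sites.
isoforms = enumerate_isoforms(["P1", "P2"], ["exon3", "exon5"], ["pA1", "pA2"])
print(len(isoforms))  # 2 promoters * 2**2 exon choices * 2 poly(A) sites
```

Even this small example gives 16 candidate transcripts from one gene.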
RNA editing
• Enzymatic alteration of the pre-mRNA sequence
• Common in mitochondria of protozoans and plants and in chloroplasts, where more than 50% of bases can be altered
• Much rarer in higher eukaryotes
Editing of human apoB pre-mRNA
The two types of editing
1) Substitution editing
• Chemical alteration of individual nucleotides
• Examples: deamination of C to U or A to I (inosine, read as G by the ribosome)
2) Insertion/deletion editing
• Deletion/insertion of nucleotides (mostly uridines)
• For this process, special guide RNAs (gRNAs) are required
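Substitution editing can be illustrated with a toy transcript. In human apoB editing, a C-to-U change turns a CAA (Gln) codon into a UAA stop codon, so the edited mRNA gives a shorter protein. The sketch below mimics that effect only; the 12-nucleotide sequence, the four-entry codon table and the function names are invented for the example and are not the real apoB sequence.

```python
# Deliberately tiny codon table: just the codons used in this example.
MINI_CODON_TABLE = {"AUG": "Met", "GAU": "Asp", "CAA": "Gln", "UAA": "Stop"}


def deaminate_c_to_u(rna, index):
    """Substitution editing: convert the C at `index` (0-based) to U."""
    if rna[index] != "C":
        raise ValueError(f"position {index} is {rna[index]!r}, not C")
    return rna[:index] + "U" + rna[index + 1:]


def inosine_as_read(rna):
    """A-to-I editing: inosine (I) is read as G by the ribosome."""
    return rna.replace("I", "G")


def translate(rna):
    """Translate codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = MINI_CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


unedited = "AUGGAUCAAGAU"                 # AUG GAU CAA GAU
edited = deaminate_c_to_u(unedited, 6)    # CAA -> UAA

print(translate(unedited))  # full-length toy peptide
print(translate(edited))    # truncated at the new stop codon
```

Insertion/deletion editing would change the reading frame instead of a single codon and is not covered by this sketch.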
Small nucleolar RNAs
• ~150 different nucleolus-restricted RNA species
• snoRNAs are associated with proteins, forming small nucleolar ribonucleoprotein particles (snoRNPs)
• The three main classes of snoRNPs are involved in the following processes:
a) removing introns from pre-rRNA
b) methylation of 2’ OH groups at specific sites
c) conversion of uridine to pseudouridine
What is this pseudouridine good for?
• Pseudouridine (Ψ) is found in RNAs that have a tertiary structure that is important for their function, like rRNAs, tRNAs, snRNAs and snoRNAs
• The main role of Ψ and other modifications appears to be the maintenance of three-dimensional structural integrity in RNAs
[Figure: structures of uridine (U) and pseudouridine (Ψ)]
Where do snoRNAs come from?
• Some are produced from their own promoters by RNA pol II or III
• The majority of snoRNAs come from introns of genes that encode proteins involved in ribosome synthesis or translation
• Some snoRNAs come from introns of genes that encode nonfunctional mRNAs
Splicing of pre-tRNAs is different from pre-mRNAs and pre-rRNAs
• The splicing of pre-tRNAs is catalyzed by protein only
• A pre-tRNA intron is excised in one step, not by two transesterification reactions
• Hydrolysis of GTP and ATP is required to join the two RNA halves
The central channel
• Small metabolites, ions and globular proteins up to ~60 kDa can diffuse freely through the channel
• Large proteins and ribonucleoprotein complexes (including mRNAs) are selectively transported with the assistance of transporter proteins
Two different kinds of nuclear localization sequences
• Basic: imported by importin α together with importin β
• Hydrophobic: imported by importin β
Proteins that are transported into the nucleus contain nuclear localization sequences
Artificial fusion of a nuclear localization signal to a cytoplasmic protein causes its import into the nucleus
After mRNA reaches the cytoplasm...
• The mRNA exporter, mRNP proteins, the nuclear cap-binding complex and nuclear poly(A)-binding proteins dissociate from the mRNA and return to the nucleus
• The 5’ cap binds to translation factor eIF4E
• Cytoplasmic poly(A)-binding protein (PABPI) binds to the poly(A) tail
• Translation factor eIF4G binds to both eIF4E and PABPI, thus linking together the 5’ and 3’ ends of the mRNA
Quality control of translation in bacteria
tmRNA rescues ribosomes stalled on incomplete mRNAs and adds a tag that marks the unfinished protein for proteases
• RNA translation (Protein synthesis), tRNA, ribosome, start codon, stop codon
• Protein folding, molecular chaperones
• Proteasomes, ubiquitin, ubiquitin ligase
How Genomes evolve
• Mutations, gene deletions, chromosomal rearrangements, transposable elements, horizontal transfer, inversions, gene duplication, whole genome duplication
• Phylogenetic trees
• Sequence alignments
• Chromosomal rearrangements
• Gene duplication
• Neofunctionalization, subfunctionalization
• Whole genome duplication
• SNPs (mutations within a genome)
• Haplotypes
• CNVs