Prader-Willi & Angelman Syndromes
• Both of these genetic disorders are most often caused by deletion of a region of chromosome 15 (15q11-q13).
• However, the syndromes differ:
– Prader-Willi Syndrome (abbreviated PWS) - obesity, intellectual disability, short stature.
– Angelman Syndrome (abbreviated AS) - uncontrollable laughter, jerky movements, and other motor and mental symptoms.
• The syndrome that develops depends upon the parent that provided the mutant chromosome.
[Figure: PWS and AS, with the PWS mouse model and the AS mouse model. From Annu Rev Genomics & Hum Genet.]
Introduction
Goal: Identify loci associated with variation in expression levels.
[Diagram labels: genomic DNA, mRNA, nucleus, regulators, target.]
Cis and Trans regulation
[Diagram labels: target gene expression phenotype, cis-regulator, trans-regulator.]
Data
• Centre d'Etude du Polymorphisme Humain (CEPH) families are Utah residents with ancestry from northern and western Europe.
• 14 families with genotype and expression data available for all parents and a mean of eight offspring (range 7-9)
Method: Linkage analysis
Parents A1 A2 × A3 A4 in each family; sib pairs:
• A1 A3 and A1 A3: IBD=2
• A1 A3 and A1 A4: IBD=1
• A1 A3 and A2 A4: IBD=0
IBD: identical-by-descent
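The allele-sharing counts in the pedigree examples can be reproduced with a short Python sketch. It assumes a fully informative marker (all four parental alleles distinct, as in A1 A2 × A3 A4), so that matching allele labels imply shared descent. The function name `ibd_sharing` is illustrative and not part of the original study.

```python
def ibd_sharing(sib1, sib2):
    """Count alleles shared identical-by-descent by two siblings.

    Assumption: the marker is fully informative (four distinct parental
    alleles), so identical labels mean identical descent.
    Each genotype is a (paternal_allele, maternal_allele) pair.
    """
    shared = 0
    if sib1[0] == sib2[0]:   # both inherited the same paternal allele
        shared += 1
    if sib1[1] == sib2[1]:   # both inherited the same maternal allele
        shared += 1
    return shared


# Father A1/A2, mother A3/A4
print(ibd_sharing(("A1", "A3"), ("A1", "A3")))
print(ibd_sharing(("A1", "A3"), ("A1", "A4")))
print(ibd_sharing(("A1", "A3"), ("A2", "A4")))
```

In a linkage scan, sib pairs that share more alleles IBD at a locus linked to a regulator would be expected to have more similar expression levels.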
For a particular target gene expression phenotype:
[Plot: t-statistics (axis ticks 5, 10, 15) at each genetic locus (SNP 1 to 5).]
Under criterion 1,
• 27/142 (19%) expression phenotypes have only a single cis-regulator.
• 110/142 (77.5%) expression phenotypes have only a single trans-regulator.
• 2/142 have a cis- and a trans-acting regulator.
• 3/142 expression phenotypes have two trans-acting regulators.
Under criterion 2, 164/984 (16%) have multiple regulators.
Cis- and trans-regulation
Models of gene expression regulation are needed
GAL Genes: Eukaryotic Transcriptional Regulation
• Unlike prokaryotes, eukaryotes do not have genes in operons (most mRNAs are not polycistronic).
• The GAL genes of S. cerevisiae are the paradigm for eukaryotic gene regulation
• Galactose is metabolized by GAL gene products:
Galactose → Gal-1-P (Gal1p)
Gal-1-P + UDP-Glu → Glu-1-P + UDP-Gal (Gal7p)
UDP-Gal → UDP-Glu (Gal10p)
Glu-1-P → Glu-6-P (Gal5p) → Glycolysis
Eukaryotic Transcription
[Figure labels: distal and proximal elements.]
• Proteins bind to distal elements called ENHANCERS.
• DNA folding allows these elements to be far from the start site for transcription.
• Proteins bound to the distal sites promote the binding of RNA polymerase to the proximal elements.
GAL Genes: A Transcriptional Program
• The response to galactose is very complex, with a number of genes being turned on or off.
• The central regulator is a protein called Gal4p.
– Gal4p binds to enhancer elements in DNA and activates transcription under some circumstances.
Gal4p: A Transcriptional Regulator
• Gal4p binds to enhancer elements near genes that it regulates (e.g., GAL1).
• Gal4p also binds to Gal80p.
– Gal80p is necessary for galactose-dependent control of gene expression: it inhibits Gal4p when galactose is absent.
• When galactose is present, Gal80p no longer blocks Gal4p, and the Gal4p-Gal80p complex can activate transcription.
– This activation has now been studied at the level of the whole genome:
• This figure shows data from a microarray experiment (Science 290:2306 [2000]).
Examining Transcriptional Regulation
• MICROARRAYS have become very popular as tools to study gene regulation.
– A microarray is a small glass slide on which cDNAs of many (or all) genes in an organism have been spotted.
– cDNA is made using mRNAs present under certain conditions (or in a certain tissue) and labeled with fluorescent dyes.
– Then, the labeled cDNAs are hybridized to the microarray and the fluorescence is determined.
• There is a nice animation describing this at:
– http://www.bio.davidson.edu/courses/genomics/chip/chip.html
– Does this examine transcriptional regulation?
Examining Transcriptional Regulation
• This basic method was extended for the Gal4p study that we have been discussing.
– For this study, the researchers tagged the Gal4p protein so they could purify it from the cell.
– Then, they chemically cross-linked it to DNA and purified it.
– This allowed them to purify the DNA that Gal4p was bound to in the cell.
– The DNA that Gal4p was bound to in the cell was labeled and used to probe the microarray.
– Does this examine transcriptional regulation?
Examining Transcriptional Regulation
• This study established several interesting facts:
– Some Gal4p binding sites in the DNA are bound by Gal4p in the absence of galactose; others are bound only in the presence of galactose.
– So the trigger is more complex than simply whether or not the Gal4p protein can bind.
– This more complex regulation involves Gal80p, an inhibitor.
[Figure: Two possible models for regulation of the Gal4p-Gal80p complex by galactose. The models differ only in the exact binding sites for Gal80p.]
How do Eukaryotic Transcriptional Regulators Work?
• There are a few specific types of proteins that act to increase transcriptional activity:
– Many proteins have an acidic domain.
• Surprisingly, these “acid-blob” proteins often require a hydrophobic residue embedded in an acidic region.
• Both Gal4p and the herpes simplex virus VP16 protein (a transcriptional regulator for this virus) have acid blobs.
– Glutamine-rich and proline-rich transcriptional activation domains have been characterized.
• These protein regions activate transcription when fused to other DNA-binding domains.
– Alternatively, they can be recruited by protein-protein interactions, e.g., a DNA-binding protein binds the enhancer, and it contains a region that recruits an acid-blob protein.
Using Eukaryotic Transcriptional Regulators
• The yeast 2-hybrid system exploits these features of eukaryotic transcription factors to examine protein-protein interactions.
– The DNA-binding and transcription-activating regions of Gal4p can be separated.
– Interestingly, if you fuse one protein to the Gal4p DNA-binding domain (BD) and a second protein that physically interacts with it to the Gal4p transcriptional activating domain (AD), you can see transcriptional activation:
How do Eukaryotic Transcriptional Regulators Work?
• Another interesting phenomenon that is sometimes seen with transcription factors is SQUELCHING.
– Overexpression of transcription activators like Gal4p can result in a general inhibition of transcriptional activity.
– How does this happen?
– Presumably, specific transcription factors like Gal4p act by recruiting “basal” transcription factors.
• In fact, some basal factors that physically interact with these transcription activating domains have been found.
• Basal factors are factors involved in recruiting RNA polymerase II to a large number of promoters.
– So overexpressing proteins with these transcription activating domains can actually turn gene expression off, by competing for these factors.
How do Eukaryotic Transcriptional Regulators Work?
• At least one way is by altering the packing of DNA into chromatin.
• The role of chromatin structure in the regulation of transcription is an area of very active investigation.
• However, two important factors that play clear roles in transcriptional regulation are known:
– DNA METHYLATION - A subset of cytosine (C) residues are modified by methylation.
– HISTONE ACETYLATION - Histones can be modified by acetylation.
Chromatin
• Remember, DNA in eukaryotes packs into CHROMATIN.
• HISTONES form the NUCLEOSOME, which DNA loops around.
• EUCHROMATIN - less compact; actively transcribed.
• HETEROCHROMATIN - more compact; transcriptionally inactive.
– Heterochromatin can be either constitutive or facultative.
DNA Methylation
• Genes that are transcriptionally inactive are often METHYLATED.
– In eukaryotes, cytosine residues are modified by methylation.
• Typically, the sites of methylation are CG dinucleotides (vertebrates).
– This allows maintenance through replication: CG reads the same on both strands, so methylation on the parental strand can guide methylation of the newly made strand.
[Figure: chemical structures of CYTOSINE and METHYL-C.]
Histone Acetylation
• HISTONES in transcriptionally active genes are often ACETYLATED.
• Acetylation is the modification of lysine residues in histones.
– Reduces positive charge, weakens the interaction with DNA.
– Makes DNA more accessible to RNA polymerase II.
• Enzymes that ACETYLATE HISTONES are recruited to actively transcribed genes.
• Enzymes that remove acetyl groups from histones are recruited to methylated DNA.
– There are additional types of histone modification as well, such as methylation of the histones.
Genetic Imprinting
• Remember that DNA methylation can be maintained through replication.
• This allows the packing of chromatin to be passed on - just like a gene sequence.
– However, differences in chromatin packing are not as stable as gene sequences.
• Heritable but potentially reversible changes in gene expression are called EPIGENETIC phenomena.
– Vertebrates use these differences in chromatin packing to IMPRINT certain patterns of gene regulation.
– Some genes show MATERNAL IMPRINTING while others show PATERNAL IMPRINTING.
• The alleles of some genes that are inherited from the relevant parent are methylated, and therefore are not expressed.
Prader-Willi & Angelman Syndromes
• Both of these genetic disorders are most often caused by deletion of a region of chromosome 15 (15q11-q13).
• However, the syndromes differ:
– Prader-Willi Syndrome (abbreviated PWS) - obesity, intellectual disability, short stature.
– Angelman Syndrome (abbreviated AS) - uncontrollable laughter, jerky movements, and other motor and mental symptoms.
• The syndrome that develops depends upon the parent that provided the mutant chromosome.
[Figure: PWS and AS, with the PWS mouse model and the AS mouse model. From Annu Rev Genomics & Hum Genet.]
Prader-Willi & Angelman Syndromes
• Prader-Willi Syndrome - develops when the abnormal copy of chromosome 15 is inherited from the father.
• Angelman Syndrome - develops when the abnormal copy of chromosome 15 is inherited from the mother.
• The differences reflect the fact that some loci are IMPRINTED - so only the allele inherited from one parent is expressed.
– The region contains both maternally and paternally imprinted genes.
Methylation and Gene Regulation
• For imprinted genes, the pattern of gene regulation is dependent upon the parent that donated the chromosome.
– The methylation pattern is “reprogrammed” in the germ line.
• There are other examples of methylation changes that regulate gene expression.
– In mammals, one of the two X chromosomes in females is inactivated.
– The inactivated X is methylated.
Because of the vast amounts of data that are generated, we need new approaches
What’s in a genome?
Genes (i.e., protein coding)
But... only <2% of the human genome encodes proteins.
Other than protein coding genes, what is there?
• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
• structural sequences (scaffold attachment regions)
• regulatory sequences
• “junk” (including transposons, retroviral insertions, etc.)
It’s still uncertain/controversial how much of the genome is composed of any of these classes.
The answers will come from experimentation and bioinformatics. We will discuss further only gene regulation.
Gene expression must be regulated in:
TIME
Wolpert, L. (2002) Principles of Development New York: Oxford University Press. p. 31
photo credits: Wolpert, L. (2002) Principles of Development New York: Oxford University Press. pp. 183, 340
Genes can be regulated at many levels
The “Central Dogma”: DNA → (TRANSCRIPTION) → RNA → (TRANSLATION) → PROTEIN
• transcription
• post transcription (RNA stability)
• post transcription (translational control)
• post translation (not considered gene regulation)
Usually, when we speak of gene regulation, we are referring to transcriptional regulation.
the “transcriptome”
Looking at the transcriptome: DNA microarrays
One way of looking at the transcriptome is with DNA microarrays. With microarrays, the expression of thousands of genes can be assessed in a single experiment.
cDNAs or oligonucleotides representing all genes in the genome are deposited on a glass slide using a robotic arrayer:
Benfey, P. and Protopapas, A. Genomics. 2005. New Jersey: Pearson Prentice Hall. pp. 131-2
Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale
Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*
Microarray
• Allows measuring the mRNA level of thousands of genes in one experiment: a system-level response
• The data generation can be fully automated by robots
• Common experimental themes:
– Time Course (when)
– Tissue Type (where)
– Response (under what conditions)
– Perturbation: Mutation/Knockout, Knock-in, Over-expression
Looking at the transcriptome: DNA microarrays
[Figure: mRNA is extracted from cell type A and cell type B, labeled cDNA is made, and the cDNA is hybridized to the microarray. Spots are read as “more in A”, “more in B”, or “equal in A & B”.]
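The three outcomes for a spot in a two-sample hybridization can be sketched in Python as a log-ratio comparison of the two channel intensities. The function name `classify_spot` and the two-fold cutoff are assumptions made for illustration; the slides do not state a threshold.

```python
import math


def classify_spot(intensity_a, intensity_b, threshold=1.0):
    """Classify one microarray spot from its two channel intensities.

    threshold is in log2 units; 1.0 (a two-fold change) is an
    illustrative cutoff, not a value taken from the slides.
    """
    log_ratio = math.log2(intensity_a / intensity_b)
    if log_ratio >= threshold:
        return "more in A"
    if log_ratio <= -threshold:
        return "more in B"
    return "equal in A & B"


print(classify_spot(800, 100))
print(classify_spot(100, 400))
print(classify_spot(500, 450))
```

Working on the log scale treats a two-fold increase and a two-fold decrease symmetrically. Real analyses would also normalize the channels and handle zero or background-level intensities.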
Looking at the transcriptome: microarrays
[Figure: expression matrix of genes by conditions (condition 1, condition 2, condition 3), followed by statistical processing and analysis.]
Which genes to select?
• For each gene (row) compute a score defined by
score = (sample mean of X - sample mean of Y) / (standard deviation of X + standard deviation of Y)
• X = ALL (acute lymphoblastic leukemia), Y = AML (acute myeloid leukemia)
• Genes (rows) with highest scores are selected.
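The gene-selection score above can be sketched in a few lines of Python. The function names, the toy expression values, and the use of the sample standard deviation (`statistics.stdev`) are assumptions; the slide does not say which standard deviation estimator was used.

```python
from statistics import mean, stdev


def selection_score(x, y):
    """(mean of X - mean of Y) / (sd of X + sd of Y) for one gene.

    Assumes sample standard deviations and that they are not both zero.
    """
    return (mean(x) - mean(y)) / (stdev(x) + stdev(y))


def top_genes(expr_x, expr_y, n):
    """Return the n genes with the highest scores.

    expr_x and expr_y map gene name -> list of expression values in
    class X (e.g. ALL samples) and class Y (e.g. AML samples).
    """
    scores = {g: selection_score(expr_x[g], expr_y[g]) for g in expr_x}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data, invented for illustration only
all_samples = {"geneA": [10, 12, 14], "geneB": [5, 6, 7], "geneC": [1, 2, 3]}
aml_samples = {"geneA": [1, 2, 3], "geneB": [5, 6, 7], "geneC": [10, 12, 14]}
print(top_genes(all_samples, aml_samples, 1))
```

A gene scores highly when the class means are far apart relative to the within-class spread, which is why it separates the two sample types well.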
Seems to work! Improvement?
• 34 new leukemia samples
• 29 are predicted with 100% accuracy; 5 weak prediction cases
That seems to work well.
They have a method
Study of cell-cycle regulated genes
• Rate of cell growth and division varies
• Yeast (120 min); insect egg (15-30 min); nerve cell (no division); fibroblast (healing wounds)
• Regulation: irregular growth causes cancer
• Goal: find what genes are expressed at each stage of the cell cycle
• Yeast cells; Spellman et al (2000)
• Fourier analysis: cyclic pattern
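The idea of using Fourier analysis to detect a cyclic pattern can be illustrated with a small Python sketch: project a gene's time course onto a sine and a cosine at the cell-cycle period and take the magnitude. This is a simplified stand-in, not the scoring procedure of the cited study; the function name, the sampling times, and the synthetic profiles are assumptions, and the 120-minute period is the yeast figure from the list above.

```python
import math


def fourier_score(values, times, period):
    """Magnitude of the Fourier component of a time course at one period."""
    omega = 2 * math.pi / period
    mean_val = sum(values) / len(values)
    a = sum((v - mean_val) * math.cos(omega * t) for v, t in zip(values, times))
    b = sum((v - mean_val) * math.sin(omega * t) for v, t in zip(values, times))
    return math.hypot(a, b) / len(values)


# Synthetic example: samples every 10 min over two 120-min cycles
times = list(range(0, 240, 10))
cycling = [math.sin(2 * math.pi * t / 120) for t in times]
flat = [1.0 for _ in times]

print(fourier_score(cycling, times, 120))
print(fourier_score(flat, times, 120))
```

Genes whose expression rises and falls once per cycle get a large score, while constant genes score near zero; ranking genes by such a score is one way to shortlist cell-cycle regulated candidates.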
Yeast Cell Cycle (adapted from Molecular Cell Biology, Darnell et al)
Most visible event
Example of the time curve:
Histone Genes: (HHT2) ORF: YNL031C
[Plot: expression time course of the histone gene; time axis ticks at 50, 100, 150, 200, 250.]
Why does clustering make sense biologically?
Profile similarity implies functional association
The rationale is:
• Genes with a high degree of expression similarity are likely to be functionally related and may participate in common pathways.
• They may be co-regulated by common upstream regulatory factors.
• Some form protein complexes; a protein rarely works as a single unit.
• Pearson's correlation coefficient, a simple way of describing the strength of linear association between a pair of random variables, has become the most popular measure of gene expression similarity.
1. Cluster analysis: average linkage, self-organizing map, K-means, ...
2. Classification: nearest neighbor, linear discriminant analysis, support vector machine, ...
3. Dimension reduction methods: PCA (SVD)
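As a concrete illustration of Pearson's correlation coefficient as a similarity measure between two gene expression profiles, here is a minimal Python sketch. The function name and the toy profiles are invented for illustration, and the profiles are assumed to be equal-length and non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


gene1 = [1, 2, 3, 4]
gene2 = [2, 4, 6, 8]   # same shape as gene1, different scale
gene3 = [8, 6, 4, 2]   # opposite trend

print(pearson(gene1, gene2))
print(pearson(gene1, gene3))
```

Because the coefficient ignores scale and offset, two genes with the same profile shape but different expression magnitudes are treated as highly similar, which is the behavior clustering methods rely on.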
Gene profiles and correlation
The correlation coefficient (CC) has been used by Gauss, Bravais, Edgeworth … Its sweeping impact in data analysis is due to Galton (1822-1911),
“Typical laws of heredity in man”.
Karl Pearson modified and popularized its use.
A building block in multivariate analysis, of which clustering, classification, and dimension reduction are recurrent themes.
As a statistician, how can you ignore the time order? (Isn’t it true that the use of sample correlation relies on the assumption that data are i.i.d.?)
… about probabilities.
Microarrays can show us when and where genes are expressed. But what regulates this expression?
Mechanisms of transcriptional regulation
• regulation in trans: transcription factors
• regulation in cis: promoters & enhancers; binding sites
Identifying transcription factor binding sites
Usually, binding sites are first determined empirically.
Most transcription factors can bind to a range of similar sequences. We can represent these in either of two ways, as a consensus sequence, or as a position weight matrix (PWM).
Once we know the binding site, we can search the genome to find all of the (predicted) binding sites.
Binding site (motif) representations
7 characterized binding sites for a certain transcription factor:
TCCGGAAGC
TCCGGATGC
TCCGGATCT
CATGGATGC
CCAGGAAGT
GGTGGATGC
ACCGGATGC
consensus sequence: [Figure: consensus sequence]
PWM and logo:
   1 2 3 4 5 6 7 8 9
A  1 1 1 0 0 7 2 0 0
T  3 0 2 0 0 0 5 0 2
G  1 1 0 7 7 0 0 6 0
C  2 5 4 0 0 0 0 1 5
[Figure: sequence logo]
Consensus sequences make searching easy, e.g. by using regular expressions in Perl:
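The same idea can be sketched in Python (used here in place of Perl). The consensus `TCCGGAWGC` (W = A or T) is an assumed pattern chosen for illustration, not the consensus from the slide, and the test sequence is invented. Only the forward strand is searched, and the IUPAC table below is deliberately partial.

```python
import re

# Partial IUPAC nucleotide codes mapped to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus written in IUPAC codes into a regex string."""
    return "".join(IUPAC[base] for base in consensus)


def find_sites(sequence, consensus):
    """Return (start, matched_site) for every match, overlaps included."""
    # The lookahead (?=...) lets overlapping matches be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


print(find_sites("AATCCGGAAGCTTTCCGGATGCAA", "TCCGGAWGC"))
```

A full search would also scan the reverse complement, since a transcription factor can bind either strand.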
A PWM allows us to assign more importance to more invariant positions. We can calculate a score based on the probability of a given nucleotide being in a given position.
TCCGGAAGC scores higher than TCCGGATCT, as GC is preferred over CT in the last two positions.
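The comparison above can be checked with a small Python sketch that builds a PWM from the seven characterized sites and scores candidate sequences. The pseudocount of 0.5, the uniform background of 0.25, and the log-odds scoring scheme are assumptions made for illustration; the slides do not specify how the score is computed.

```python
import math

SITES = ["TCCGGAAGC", "TCCGGATGC", "TCCGGATCT", "CATGGATGC",
         "CCAGGAAGT", "GGTGGATGC", "ACCGGATGC"]


def build_pwm(sites, pseudocount=0.5):
    """Per-position base probabilities estimated from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({b: (column.count(b) + pseudocount) / total for b in "ACGT"})
    return pwm


def log_odds_score(pwm, seq, background=0.25):
    """Sum of log2(probability / background) over the positions of seq."""
    return sum(math.log2(pwm[i][b] / background) for i, b in enumerate(seq))


pwm = build_pwm(SITES)
for candidate in ("TCCGGAAGC", "TCCGGATCT"):
    print(candidate, round(log_odds_score(pwm, candidate), 2))
```

The pseudocount keeps bases never seen at a position from producing a probability of zero, which would otherwise make the logarithm undefined.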
Finding binding sites in the genome
Binding site motifs can be predicted computationally from the regulatory regions of genes with similar expression patterns.
For instance, the promoter regions of genes that cluster in a microarray experiment can be used.
(How can the promoter regions be extracted? You should know enough Perl at this point to be able to do this, given a well-annotated sequence database.)
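One possible answer, sketched in Python rather than Perl: given a chromosome sequence and a gene's annotated coordinates and strand, take the bases immediately upstream of the gene. The function names, the 0-based half-open coordinate convention, and the default length of 500 bp are assumptions; a real annotation database would dictate its own conventions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chromosome, gene_start, gene_end, strand, length=500):
    """Return up to `length` bases upstream of a gene, 5' to 3'.

    Coordinates are 0-based, half-open [gene_start, gene_end) on the
    forward strand; strand is "+" or "-".
    """
    if strand == "+":
        return chromosome[max(0, gene_start - length):gene_start]
    # For a minus-strand gene, upstream lies after gene_end on the
    # forward strand and must be reverse complemented.
    return reverse_complement(chromosome[gene_end:gene_end + length])


chrom = "AACCGGTTACGT"  # invented toy sequence
print(upstream_region(chrom, 4, 8, "+", length=3))
print(upstream_region(chrom, 4, 8, "-", length=3))
```

The strand check matters: taking the lower-coordinate flank for every gene would return downstream sequence for genes on the minus strand.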
Finding binding sites in the genome
seq1:TTTTTATTTTTCTGAATCACCACTTGATATTGCTTCACAGAACT
seq2:CGGGCGGTGAGGCAGAGAAAGAGACCACTTGAAATGTAGTAATA
seq3:CACTTGAATTTTTCTGCACGCAGTTTTTATTTTTACTTTTCTTG
seq4:CGCGTTCGTTATTTGTTGTTGACCACTTGAATTGATTGCTTTAT
seq5:ATCCCGGTCGAGGTGCACTTGATGTTTTCAATGGAAATGTTGCC
seq6:TCTGCAGATTTATGGCCCAACGCTCATTTAACAATTAAAGTGGG
seq7:GCATTAACTCTCACTTCAAAAAATCATATAAACACCTCTAATAT
seq8:TATATTTTCTCGCCACTTAAATAGTTTTCAATGCCAATGGCAGG
seq9:ATCCTTATCGAAGCACTTGGATTTTAAAGCAATCTTTTGAACAC
A Gibbs sampling algorithm can then find the common sub-sequences:
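A heavily simplified Gibbs sampler for this task can be sketched in Python. It is an illustration under stated assumptions, not the algorithm used for the slide: it omits a background model, assumes exactly one motif occurrence per sequence, and uses a fixed motif width of 7, a fixed iteration count, and a fixed random seed, all chosen arbitrarily. The function names are hypothetical.

```python
import random

SEQS = [
    "TTTTTATTTTTCTGAATCACCACTTGATATTGCTTCACAGAACT",
    "CGGGCGGTGAGGCAGAGAAAGAGACCACTTGAAATGTAGTAATA",
    "CACTTGAATTTTTCTGCACGCAGTTTTTATTTTTACTTTTCTTG",
    "CGCGTTCGTTATTTGTTGTTGACCACTTGAATTGATTGCTTTAT",
    "ATCCCGGTCGAGGTGCACTTGATGTTTTCAATGGAAATGTTGCC",
    "TCTGCAGATTTATGGCCCAACGCTCATTTAACAATTAAAGTGGG",
    "GCATTAACTCTCACTTCAAAAAATCATATAAACACCTCTAATAT",
    "TATATTTTCTCGCCACTTAAATAGTTTTCAATGCCAATGGCAGG",
    "ATCCTTATCGAAGCACTTGGATTTTAAAGCAATCTTTTGAACAC",
]


def consensus_score(seqs, starts, width):
    """Sum over motif columns of the count of the most common base."""
    score = 0
    for k in range(width):
        column = [s[st + k] for s, st in zip(seqs, starts)]
        score += max(column.count(b) for b in "ACGT")
    return score


def gibbs_motif_search(seqs, width, iterations=2000, seed=0, pseudocount=1.0):
    """Return (starts, score) for the best alignment seen while sampling."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best_starts = list(starts)
    best_score = consensus_score(seqs, starts, width)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        # Profile (with pseudocounts) from the sites in all other sequences
        counts = [{b: pseudocount for b in "ACGT"} for _ in range(width)]
        for j, (s, st) in enumerate(zip(seqs, starts)):
            if j != i:
                for k in range(width):
                    counts[k][s[st + k]] += 1
        total = len(seqs) - 1 + 4 * pseudocount
        # Probability of each candidate window of sequence i under the profile
        weights = []
        for st in range(len(seqs[i]) - width + 1):
            p = 1.0
            for k in range(width):
                p *= counts[k][seqs[i][st + k]] / total
            weights.append(p)
        # Sample a new start in proportion to those probabilities
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        score = consensus_score(seqs, starts, width)
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts, best_score


starts, score = gibbs_motif_search(SEQS, width=7)
for seq, st in zip(SEQS, starts):
    print(seq[st:st + 7])
```

Because the procedure is stochastic, different seeds can settle on different alignments; in practice the sampler is run several times and the best-scoring alignment is kept.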
Of course, we must now discover which transcription factor binds this sequence.
Finding binding sites in the genome
How meaningful are the sites we find?
• Only experiments can tell us for sure
• However, we can get some hints using statistical analysis
Example 1: We just found the motif CACTTGA upstream of co-expressed genes. Is it over-represented in this set compared to a random selection of genes?
Search 100 random sets of genes. Find the mean and standard deviation of the motif count.
z = (observed - expected) / standard deviation, where expected is the mean over the random sets.
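The over-representation test can be sketched in Python as follows. The function names and the toy numbers are invented for illustration; in practice the random counts would come from motif searches in the upstream regions of 100 randomly chosen gene sets.

```python
from statistics import mean, stdev


def count_motif(sequences, motif):
    """Total number of motif occurrences (overlaps included) in a gene set."""
    total = 0
    for seq in sequences:
        total += sum(1 for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif)
    return total


def motif_z_score(observed, random_counts):
    """z = (observed - mean of random counts) / sd of random counts."""
    return (observed - mean(random_counts)) / stdev(random_counts)


# Toy example: count in the co-expressed set vs. counts in random sets
observed = count_motif(["ACACTTGAT", "CACTTGACACTTGA"], "CACTTGA")
random_counts = [1, 2, 3, 2, 2]   # invented; normally 100 random sets
print(observed, motif_z_score(8, random_counts))
```

A large positive z means the motif occurs far more often in the co-expressed set than in random gene sets of the same size, which hints at, but does not prove, a regulatory role.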
Finding binding sites in the genome
Example 2: Many regulatory regions contain multiple binding sites for the same transcription factor. Is the motif found an unusually large number of times in a short stretch of sequence?
Crudely: Probability of finding a 7 bp motif at a given position: 4^-7 = 1/16,384, i.e., expect only about 1 motif every 16 kb. Thus, finding several close together is very unlikely.
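The crude estimate can be made slightly more concrete in Python. The sketch assumes equal base frequencies, independent positions, a single strand, and a Poisson approximation for the number of hits in a window; all of these are simplifying assumptions layered on top of the slide's back-of-the-envelope figure, and the function names and window size are illustrative.

```python
import math


def expected_hits(motif_length, window_length):
    """Expected number of exact matches in a window of random sequence."""
    p_site = 4.0 ** -motif_length
    return p_site * (window_length - motif_length + 1)


def prob_at_least(k, lam):
    """Poisson probability of k or more hits when lam hits are expected."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))


print(4 ** 7)                                    # possible 7 bp sequences
print(expected_hits(7, 500))                     # hits expected in 500 bp
print(prob_at_least(3, expected_hits(7, 500)))   # chance of 3+ in 500 bp
```

Under these assumptions, three or more copies of a specific 7 bp motif within 500 bp would be very rare by chance, which is why clustered sites are treated as suggestive.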
Transcription factors, binding sites, and target genes
[Diagram steps: identify transcription factors; identify binding motif; find all motifs in genome; identify target genes.]
Methods shown in the diagram:
• computational searching, ChIP-chip
• computational searching, microarrays, genetic screens
• bioinformatics (e.g., Gibbs sampling on microarray data); molecular biology using purified protein or protein extracts
Wasserman, W. W. and A. Sandelin (2004). "Applied Bioinformatics For The Identification Of Regulatory Elements." Nature Reviews Genetics 5(4): 276-287.
Halfon, M. S. and A. M. Michelson (2002). "Exploring Genetic Regulatory Networks in Metazoan Development: Methods and Models." Physiol Genomics 10(3): 131-43.
Davidson, E. H. (2001). Genomic Regulatory Systems. San Diego, Academic Press.
Carroll, S. B., J. K. Grenier, et al. (2001). From DNA to Diversity. Molecular Genetics and the Evolution of Animal Design. Massachusetts, Blackwell Science.