Page 1: HGPgenemap Structure

HGP, gene map and structure

For Mastering PPDS 2010

Page 2: HGPgenemap Structure

What is a genome?

A genome is an organism's complete set of deoxyribonucleic acid (DNA), a chemical compound that contains the genetic instructions needed to develop and direct the activities of every organism.

DNA molecules are made of two twisting, paired strands. Each strand is made of four chemical units, called nucleotide bases. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). Bases on opposite strands pair specifically; an A always pairs with a T, and a C always with a G.
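The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function name `complement_strand` and the example sequence are assumptions, not part of the original slides.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the opposite strand, written 5'->3'.

    The two strands run in opposite directions, so the paired bases
    are read in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


print(complement_strand("ATGACAGATG"))  # CATCTGTCAT
```

Applying the function twice returns the original strand, which is a quick check that the pairing table is symmetric.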

The human genome contains approximately 3 billion of these base pairs, which reside in the 23 pairs of chromosomes within the nucleus of all our cells.

Each chromosome contains hundreds to thousands of genes, which carry the instructions for making proteins.

Each of the estimated 30,000 genes in the human genome makes an average of three proteins.

http://www.genome.gov/11006943 11/16/2009

Page 3: HGPgenemap Structure

Molecular Biology Overview

Figure labels: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), protein.

Graphics courtesy of the National Human Genome Research Institute

Page 4: HGPgenemap Structure

The relationship between DNA, genes and chromosomes is:

a. Chromosomes contain hundreds of genes, which are made of protein.

b. Chromosomes contain hundreds of genes, which are made of DNA.

c. Genes contain hundreds of chromosomes, which are made of protein units.

d. Genes consist of DNA but are not related to chromosomes.

e. Genes consist of hundreds of chromosomes, which are made of DNA units.

Page 5: HGPgenemap Structure

Translating the genome-based knowledge into health benefits

Page 6: HGPgenemap Structure

HGP goals

• Analyzing the structure of human DNA (not that of any particular individual)
• Determining the location of the genes
• Determining the function of the genome (interconnected pathways/nodes in the network), in order to understand and eventually treat genetic diseases and multifactorial diseases in which genetic predisposition plays an important role

http://www.genome.gov/10001477 11/16/2009

Page 7: HGPgenemap Structure

Scientific goals

• Mapping and Sequencing the Human Genome
• Mapping and Sequencing the Genomes of Model Organisms
• Data Collection and Distribution
• Ethical and Legal Considerations
• Research Training
• Technology Development
• Technology Transfer

Page 8: HGPgenemap Structure

The International Human Genome Sequencing Consortium

1. The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
4. United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
5. Baylor College of Medicine Human Genome Sequencing Center, Dept. of Molecular and Human Genetics, Houston, Tex., U.S.
6. RIKEN Genomic Sciences Center, Yokohama, Japan
7. Genoscope and CNRS UMR-8030, Evry, France
8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash.
12. Stanford Genome Technology Center, Stanford, Calif., U.S.
13. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.
14. University of Washington Genome Center, Seattle, Wash., U.S.
15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
16. University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, Norman, Okla., U.S.
18. Max Planck Institute for Molecular Genetics, Berlin, Germany
19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y.
20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany

http://www.genome.gov/11006943 11/16/2009

Page 9: HGPgenemap Structure

Whose DNA was sequenced for the Human Genome Project?

• The sequence is derived from the DNA of several volunteers.

• To ensure that the identities of the volunteers cannot be revealed, a careful process was developed to recruit the volunteers and to collect and maintain the blood samples that were the source of the DNA.

• Candidates were recruited from a diverse population.

• The volunteers provided blood samples after being extensively counseled and then giving their informed consent.

• About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the volunteers would know whether their sample was used.

• All labels were removed before the actual samples were chosen.

http://www.genome.gov/11006943 11/16/2009

Page 10: HGPgenemap Structure

Is the human genome completely sequenced?

• Within the limits of today's technology, the human genome is as complete as it can be.

• Small gaps that cannot be recovered by any current sequencing method remain, amounting to about 1 percent of the gene-containing portion of the genome, or euchromatin.

• New technologies will have to be invented to obtain the sequence of these regions.

• However, the gene-containing portion of the genome is complete in nearly every functional way for the purposes of scientific research and is freely and publicly available.

http://www.genome.gov/11006943 11/16/2009

Page 11: HGPgenemap Structure
Page 12: HGPgenemap Structure

Ethical, legal and social implications (ELSI)

Four initial areas of emphasis:
• privacy of genetic information
• safe and effective introduction of genetic information in the clinical setting
• fairness in the use of genetic information
• professional and public education

http://www.genome.gov/10001476 on 11/16/2009

Page 13: HGPgenemap Structure


Page 14: HGPgenemap Structure


1: M57424. Human adenine nuc...[gi:178660]

LOCUS       HUMANT2X     4982 bp    DNA     linear   PRI 31-OCT-1994
DEFINITION  Human adenine nucleotide translocator-2 (ANT-2) gene, complete cds.
ACCESSION   M57424 J05624
VERSION     M57424.1  GI:178660
KEYWORDS    adenine nucleotide translocator-2.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 4982)
  AUTHORS   Ku,D.H., Kagan,J., Chen,S.T., Chang,C.D., Baserga,R. and Wurzel,J.
  TITLE     The human fibroblast adenine nucleotide translocator gene.
            Molecular cloning and sequence
  JOURNAL   J. Biol. Chem. 265 (27), 16060-16063 (1990)
  MEDLINE   90375457
   PUBMED   2168878
COMMENT     Original source text: Human placenta DNA, clone 21.
FEATURES             Location/Qualifiers
     source          1..4982
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /map="Xq13-q26"
                     /tissue_type="placenta"
     CAAT_signal     1128..1132
                     /gene="ANT2"
                     /note="G00-125-190"
     CAAT_signal     1455..1459
                     /gene="ANT2"
                     /note="G00-125-190"
     CAAT_signal     1817..1821
                     /gene="ANT2"
                     /note="G00-125-190"
     TATA_signal     1907..1910
                     /gene="ANT2"
                     /note="G00-125-190"
     gene            join(1936..2114,3124..3610,3835..3975,4350..4767)
                     /gene="ANT2"
     mRNA            join(1936..2114,3124..3610,3835..3975,4350..4767)
                     /gene="ANT2"
                     /product="adenine nucleotide translocator-2"
                     /note="G00-125-190"
     exon            1936..2114
                     /gene="ANT2"
                     /note="G00-125-190"
                     /number=1
     CDS             join(2004..2114,3124..3610,3835..3975,4350..4507)
                     /gene="ANT2"
                     /codon_start=1
                     /product="adenine nucleotide translocator-2"
                     /protein_id="AAA51737.1"
                     /db_xref="GI:178661"
                     /db_xref="GDB:G00-125-190"

1321 tttttcagta ctattttaaa aaaatcagaa gcaatgcaaa aatccatcat caacaaatga 1381 gcaacgacag gatcccaacc tgcacttgca tgaccttgtc tcgcttgcct caccctaaac 1441 ccagccttga cactccaatt aaactttatt tacaaaacaa gggggccggc cagtaggatg 1501 tagtttgccc atacgacttt tttaaagtat cgcattgact actgtttatc tcgatgactg 1561 aagggttctt ttggcatccc tgtagcaaat gcgtctcacc ctagtcctgg tcctgctcca 1621 agggtttttg tccaggcaca tcgtgacctc acccttcctc ccctctccga ggcctctctc 1681 agggtccagc gttcaagtcc cgggtgttct ctggacccgc cccttgctct cgccgggtca 1741 ggtgccgagg ggagcacggg cggcgcggag agcagtcccg gcccgccctc cacgactcct 1801 cctcctgcga gctgggccac tcaggtctgc cgcccagcgc gccggggccc agaccccgcc 1861 ccggccccgc ctccgacgcc tgccgctcca gctccggctc cccctatata aatcggccat 1921 ttgcttccgt ccgccccgca gcgccggagt caaacggttc ccggcccagt cccgtcctgc 1981 agcagtctgc ctcctctttc aacatgacag atgccgctgt gtccttcgcc aaggacttcc 2041 tggcaggtgg agtggccgca gccatctcca agacggcggt agcgcccatc gagcgggtca 2101 agctgctgct gcaggtacgt ctgggatcca gagcccaacc aggaagtggg ggaagggtcg 2161 cacagaaggc gggcgccgag gggtggcggg gagcgaactc taaagacatg gccagggaag 2221 cggcttagga gaggccagag cgggcgcaga ggcagaacag aagtcaaact ggtgggaggc 2281 gccctttagt gacctgaagt agtgagtcta ggaaggggcc ggggcagagg gcaggaccag 2341 gctctcggca tctccgaggc ggctgactcg atggagcagt ttctgagtga cttcctcccc 2401 tttcctgggc gtcaagggcg aagcgtccgc gttagaaaga ggaactccag ttttaccgaa 2461 gacctcaagg ttctgcaagg agataactgc ccgggggagg ccatgcgccc gggtccagcg 2521 gcctcccagc ccgcggacgc ctcaaacctc gccgggccgg aagcccggcg ccgggaagcc 2581 ggtgtgcctt ttacgtccgc ccccgcgcag ccgcgaccgc tgccggcgtc tccgcctgcc 2641 tccctgcgcc gcggctccag tgccggctct agagggcgct cctgggctag cgtgtagggc 2701 tggcggcggc ggcgtcgggt cacctctggg agcggagtgc gggcggagcg agacggaagc 2761 agctcaggag acttgaggcg taggctccgc ctcccaaggt gaccgcgccc tatgtgggac 2821 tcgccctaat gcctctgaac ctgggtttga ggtaatgacc tttctcctag gtctgaaggt 2881 cacgggtccg ctggaggatg ccccctctcc actcagaggg gtggaggctt aatgctactg 2941 gtgcagatca cctcttcccc tgtgacagcc tcagagggtt gggagggtcc agccagtatg 3001 
atatacgaag actagatttg agagagggga gcctacctta agggcattga tcgagatggc 3061 ataagctctt ctctttccct tccccatggt tataactgtc cctgttggct tccttcctgt 3121 caggtgcagc atgccagcaa gcagatcact gcagataagc aatacaaagg cattatagac 3181 tgcgtggtcc gtattcccaa ggagcaggga gttctgtcct tctggcgcgg taacctggcc 3241 aatgtcatca gatacttccc cacccaggct cttaacttcg ccttcaaaga taaatacaag 3301 cagatcttcc tgggtggtgt ggacaagaga acccagtttt ggcgctactt tgcagggaat 3361 ctggcatcgg gtggtgccgc aggggccaca tccctgtgtt ttgtgtaccc tcttgatttt 3421 gcccgtaccc gtctagcagc tgatgtgggt aaagctggag ctgaaaggga attccgaggc 3481 ctcggtgact gcctggttaa gatctacaaa tctgatggga ttaagggcct gtaccaaggc 3541 tttaacgtgt ctgtgcaggg tattatcatc taccgagccg cctacttcgg tatctatgac 3601 actgcaaagg gtaagtttgc tgtgggcttt aacgttgtgt tcttaggaga cagtttaaaa 3661 gagcattgta ccaacctaac agtccaagag ctaaagagtt gtttttttaa ttgctaaagg 3721 aagccaagat catccaatgc aacccttgtg tacagatgac gtgtttaggg gatgtgggga 3781 aaggaagtca gtaaaacttc tgctttttgg taaagatctc tttcctattc ctaggaatgc 3841 ttccggatcc caagaacact cacatcgtca tcagctggat gatcgcacag actgtcactg 3901 ctgttgccgg gttgacttcc tatccatttg acaccgttcg ccgccgcatg atgatgcagt 3961 cagggcgcaa aggaagtaag ttccacttga gacagaagac aaagttgtag tcgtggggca 4021 atctgctgcc acaaactggt gatacatacc tttaaaaatg gctgtctgtc caagtcaagg 4081 gatggggttg atagcatctg tgtctgttcc acaactgcct ttgagcctgc cctcagatgc 4141 catgaggtgc ttaaatggtg taagaccaat gggtagcctg tatcctgtgg ttcatagtat 4201 taatatttca gtgttgccca tgctaatgtg tgaatgttgg atttaaagct gacgttctca 4261 gaggtggggc tctgctttat ttagcctagt gaatcttagg atttttcatc ggccttcagt 4321 cactaactcc acgtctttat tctttgcagc tgacatcatg tacacaggca cgcttgactg 4381 ctggcggaag attgctcgtg atgaaggagg caaagctttt ttcaagggtg catggtccaa 4441 tgttctcaga ggcatgggtg gtgcttttgt gcttgtcttg tatgatgaaa tcaagaagta 4501 cacataagtt atttcctagg atttttcccc ctgtgaacag gcatgttgta ttctataaca 4561 caatcttgag cattcttgac agactcctgg ctgtcagttt ctcagtggca actactttac 4621 tggttgaaaa tgggaagcaa taatattcat ctgaccagtt ttcctctaaa gccatttcca 4681 tgatgatgat 
gatgggactc aattgtattt tttatttcag tcactcctga taaataacaa 4741 atttggagaa ataaaaatat ctaaaataaa ttttgtctgc agtatatttt catataaaaa 4801 tgcatatttg agtgctacat tcgaataaat actacctttt tagtgaatgc tagattttta 4861 ataaatgcta cagtatctcc ggagatgaag aactgtcttt ttaaaaccaa ttgtcagcag 4921 tccgcttaac agaataactt ggccgtgcca cccacaaaca tttccaacac attagcaaag
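The CDS line of the record above joins four coordinate ranges, one per coding segment. As a rough sketch of how such a location can be read, the Python below parses the `join(...)` string and adds up the segment lengths. The helper name `parse_join` is hypothetical, and the regular expression only handles this simple form of location, not the full feature-table syntax.

```python
import re

# CDS location copied from the FEATURES table of the record above.
CDS_LOCATION = "join(2004..2114,3124..3610,3835..3975,4350..4507)"


def parse_join(location: str) -> list[tuple[int, int]]:
    """Extract (start, end) pairs from a simple join(a..b,c..d) location."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


segments = parse_join(CDS_LOCATION)
# Coordinates are 1-based and inclusive, hence the +1.
lengths = [end - start + 1 for start, end in segments]
cds_length = sum(lengths)

print(lengths)          # [111, 487, 141, 158]
print(cds_length)       # 897
print(cds_length // 3)  # 299 codons, counting the stop codon
```

The total is a multiple of three, as expected for a complete coding sequence with /codon_start=1.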

Page 15: HGPgenemap Structure

Ideogram

Page 16: HGPgenemap Structure

Genetic Map

• A genetic map is a representation of the genes on a chromosome arrayed in linear order with distances between loci expressed as percent recombination (map units, centimorgans). Also called a linkage map.

• Any variations in DNA, whether in coding regions of genes or in noncoding regions, can be used as genetic markers, i.e. as a label for a particular point on a chromosome.

Page 17: HGPgenemap Structure

• Genetic distance is measured by frequency of crossing over between loci on the same chromosome.

• One map unit = one centimorgan (cM) = 1% recombination between loci.

• The farther apart two loci are, the more likely that a crossover will occur between them. Conversely, if two loci are close together, a crossover is less likely to occur between them.
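The definition of the map unit translates directly into arithmetic. The sketch below assumes a hypothetical cross in which offspring are scored as recombinant or non-recombinant for two loci; the function name and the counts are illustrative only.

```python
def map_distance_cm(recombinant: int, total: int) -> float:
    """Map distance in centimorgans = percent recombinant offspring."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return 100.0 * recombinant / total


# Hypothetical cross: 12 recombinant offspring out of 200 scored.
print(map_distance_cm(12, 200))  # 6.0
```

The simple proportionality is only a good estimate for loci that are fairly close together, because the observed recombination frequency between two loci cannot exceed 50%.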

Page 18: HGPgenemap Structure

Physical Map

• A physical map describes the physical location of genes on chromosomes.

• Genes on the physical and genetic maps should be in the same order, but the scales need not be identical, since crossing over may occur more often in some regions than in others.

Page 19: HGPgenemap Structure

Gene mapping

• Gene mapping has important applications.

• It is useful for locating the position of genes on chromosomes, e.g. if two genes are closely linked and the position of one is known, then the other must also be nearby.

• It is useful in estimating genetic risk, e.g. if a gene cannot be tested directly, then variation at a closely linked locus may indicate the presence or absence of a detrimental allele.

Page 20: HGPgenemap Structure

MIM

Page 21: HGPgenemap Structure
Page 22: HGPgenemap Structure
Page 23: HGPgenemap Structure
Page 24: HGPgenemap Structure
Page 25: HGPgenemap Structure

• Quantitative genetics
• Linkage studies
• Association studies
• Genome-wide association (GWA) studies

Page 26: HGPgenemap Structure
Page 27: HGPgenemap Structure

Genome sizes in nucleotide pairs (base pairs)

Figure: ranges of genome size on a logarithmic axis from 10^4 to 10^11 bp, for viruses, plasmids, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds and mammals.

The size of the human genome is ~3 × 10^9 bp; almost all of its complexity is in single-copy DNA.

The human genome is thought to contain ~30,000 to 40,000 genes.

Page 28: HGPgenemap Structure

Organization of human genome

Page 29: HGPgenemap Structure

Classes of repetitive DNA

• Interspersed (dispersed) repeats (e.g., Alu sequences): copies scattered among other sequence, e.g. GCTGAGG ... GCTGAGG ... GCTGAGG

• Tandem repeats (e.g., microsatellites): copies placed head to tail, e.g. TTAGGGTTAGGGTTAGGGTTAGGG
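A tandem repeat such as the TTAGGG run above can be located with a regular expression. This is a minimal sketch for one known repeat unit; the function name `find_tandem_repeats` and the second test sequence are hypothetical, and real repeat finders also handle unknown units and imperfect copies.

```python
import re


def find_tandem_repeats(seq: str, unit: str, min_copies: int = 2):
    """Return (start, copies) for each run of `unit` repeated back to back."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]


# The run from the slide: four head-to-tail copies of TTAGGG.
print(find_tandem_repeats("TTAGGGTTAGGGTTAGGGTTAGGG", "TTAGGG"))  # [(0, 4)]

# Hypothetical dispersed copies separated by other sequence: no tandem run.
print(find_tandem_repeats("TTAGGGACGTACGTTTAGGGCATG", "TTAGGG"))  # []
```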

Page 30: HGPgenemap Structure

Gene structure

Page 46: HGPgenemap Structure

Gene structure

Figure (drawn 5' to 3'): promoter region; start site (+1); exons (filled and unfilled boxed regions); introns (between exons); transcribed region; translated region; mRNA structure.

Page 47: HGPgenemap Structure

The (exon-intron-exon)n structure of various genes

Gene            Total length     Exon length
histone         400 bp           400 bp
β-globin        1,660 bp         990 bp
HGPRT (HPRT)    42,830 bp        1,263 bp
factor VIII     ~186,000 bp      ~9,000 bp
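To make the contrast concrete, the fraction of each gene that is exon can be computed from the totals above. The sketch assumes the size pairs belong to the genes as follows: histone 400/400 bp, β-globin 1,660/990 bp, HGPRT 42,830/1,263 bp, factor VIII ~186,000/~9,000 bp. The names `GENES` and `exon_fraction` are illustrative.

```python
# (total gene length in bp, combined exon length in bp); the pairing of
# sizes to genes is an assumption based on the figures above.
GENES = {
    "histone": (400, 400),
    "beta-globin": (1_660, 990),
    "HGPRT (HPRT)": (42_830, 1_263),
    "factor VIII": (186_000, 9_000),  # both values approximate
}


def exon_fraction(total_bp: int, exon_bp: int) -> float:
    """Share of the gene's length that is exon sequence."""
    return exon_bp / total_bp


for name, (total_bp, exon_bp) in GENES.items():
    print(f"{name:14s} {100 * exon_fraction(total_bp, exon_bp):5.1f}% exon")
```

Under these assumptions the histone gene is entirely exon, while the two largest genes are more than 95% intron.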

Page 48: HGPgenemap Structure

Transcription and promoter elements for RNA polymerase II

Figure: a transcription unit (exons) with its promoter (P) and a transcription element (TE); +1 marks the start site.

Promoter (DNA sequence upstream of a gene)
• determines start site (+1) for transcription initiation
• located immediately upstream of the start site
• allows basal (low level) transcription

Transcription element (DNA sequence that regulates the gene)
• determines frequency or efficiency of transcription
• located upstream, downstream, or within genes
• can be very close to or thousands of base pairs from a gene
• includes enhancers (increase transcription rate), silencers (decrease transcription rate) and response elements (target sequences for signaling molecules)
• genes can have numerous transcription elements

Page 49: HGPgenemap Structure

Sequence elements within a typical eukaryotic gene (based on the thymidine kinase gene)

Figure labels: octamer (transcription element); GC, CAAT and TATA boxes (promoter); position markers at -130, -95, -80, -50, -25 and +1.

TATA box (TATAAAA)
• located approximately 25-30 bp upstream of the +1 start site
• determines the exact start site (not in all promoters)
• binds the TATA binding protein (TBP), which is a subunit of TFIID

GC box (CCGCCC)
• binds Sp1 (Specificity factor 1)

CAAT box (GGCCAATCT)
• binds CTF (CAAT box transcription factor)

Octamer (ATTTGCAT)
• binds OTF (Octamer transcription factor)
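The consensus sequences listed on this slide can be searched for in a DNA string. The sketch below uses exact string matching only, which is a simplification: real binding sites tolerate mismatches. The toy sequence is hypothetical and was assembled just so that each motif occurs in it; `find_elements` is an illustrative name.

```python
CONSENSUS = {
    "TATA box": "TATAAAA",
    "GC box": "CCGCCC",
    "CAAT box": "GGCCAATCT",
    "Octamer": "ATTTGCAT",
}


def find_elements(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of exact consensus matches."""
    seq = seq.upper()
    hits = {}
    for name, motif in CONSENSUS.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


# Hypothetical upstream sequence: motifs joined by short arbitrary spacers.
toy = "".join([
    "ATTTGCAT", "GGA", "CCGCCC", "TT", "GGCCAATCT",
    "AA", "CCGCCC", "GG", "TATAAAA", "GC",
])
print(find_elements(toy))
```

For the toy sequence the function reports one octamer, two GC boxes, one CAAT box and one TATA box, in that order along the string.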

Page 50: HGPgenemap Structure

Transcription and promoter elements for RNA polymerase II

Figure: transcription units (exon, exon) drawn with different arrangements of the promoter (P) and transcription elements (TE), and the promoter complex.

Page 51: HGPgenemap Structure

The locus control region is a specialized transcription element

• a single locus control region (LCR) may control two or more transcription units in a cell-specific fashion

Figure labels: LCR (locus control region); gene A and gene B, each with promoter (P) and transcription element (TE).

Page 52: HGPgenemap Structure

Proteins regulating eukaryotic mRNA synthesis

General transcription factors
• TFIID (a multisubunit protein) binds to the TATA box to begin the assembly of the transcription apparatus
  • the TATA binding protein (TBP) directly binds the TATA box
  • TBP associated factors (TAFs) bind to TBP
• TFIIA, TFIIB, TFIIE, TFIIF, TFIIH (1), TFIIJ assemble with TFIID

RNA polymerase II binds the promoter region via the TFIIs

Transcription factors binding to other promoter elements and transcription elements interact with proteins at the promoter and further stabilize (or inhibit) formation of a functional preinitiation complex

(1) TFIIH is also involved in phosphorylation of RNA polymerase II, DNA repair (Cockayne syndrome mutations), and cell cycle regulation

Page 53: HGPgenemap Structure

Transcription factors (partial list)

Factor    Full name or function

CREB      Cyclic AMP response element binding protein
CTF       CAAT box transcription factor (=NF1) (binds GGCCAATCT)
NF1       Nuclear factor-1 (=CTF)
AP1       Activator protein-1 (dimer of the Fos-Jun proteins)
Sp1       Specificity factor-1 (binds CCGCCC)
OTF       Octamer transcription factor (binds ATTTGCAT)
NF-κB     Nuclear factor κB
HSTF      Heat shock transcription factor
MTF       Metal transcription factor
USF       Upstream factor
ATF       Activating transcription factor
HNF4      Hepatocyte nuclear factor-4 (nuclear receptor superfamily)
GR        Glucocorticoid receptor (nuclear receptor superfamily)
AR        Androgen receptor (nuclear receptor superfamily)
ER        Estrogen receptor (nuclear receptor superfamily)
TR        Thyroid hormone receptor (nuclear receptor superfamily)
C/EBP     CAAT/enhancer binding protein
E2F       E2 factor (named for the adenovirus E2 gene)
p53       p53 (tumor suppressor protein)
Myc       Product of the c-myc protooncogene (dimerizes with Max)

Page 55: HGPgenemap Structure

Zinc finger transcription factors

Figure: a C2H2 zinc finger, with a zinc (Zn) atom held by two cysteine (Cys) and two histidine (His) residues.

• each “zinc finger” consists of antiparallel β-sheets and an α-helix
• there are approximately 30 amino acid residues per finger domain
• a zinc atom is bound to two cysteine and two histidine residues (in C2H2)
• zinc finger proteins can have from 2 to over 30 zinc finger domains
• zinc fingers of transcription factors bind to the major groove of DNA
• examples of zinc finger transcription factors include Sp1 and the steroid hormone receptors (nuclear receptor superfamily)
• some zinc fingers do not contain histidine (e.g., C4 and C5 zinc fingers)

Page 56: HGPgenemap Structure

The estrogen receptor

Figure: the receptor drawn from N terminus to C terminus, with a transactivation domain, the DNA binding domain (a C4 + C5 zinc finger pair, each Zn held by cysteine residues), and a domain for hormone binding, dimerization and transactivation.

Page 57: HGPgenemap Structure

Model for binding of steroid receptor dimer to DNA

Figure: two steroid receptor monomers, each with two zinc fingers, bound to DNA as a dimer.

Page 58: HGPgenemap Structure

Steroid hormone action in target cells

mifepristone (RU486) is a progesterone receptor antagonist

Page 59: HGPgenemap Structure

Mutations affecting promoters

The factor IX gene
• located on the X chromosome
• transcribed region >32,700 bp, with 8 exons

The factor IX gene promoter
• there are overlapping binding sites for AR and HNF4
• AR = androgen receptor
  • zinc finger nuclear receptor superfamily transcription factor
  • binds androgen
  • androgen levels increase at puberty
• HNF4 = hepatocyte nuclear factor-4
  • zinc finger nuclear receptor superfamily transcription factor
  • ligand unknown - therefore an “orphan” receptor
  • HNF4 is expressed early in development and in adult liver

Figure: AR binding site from -36 to -22; HNF4 binding site from -27 to -15.

Page 60: HGPgenemap Structure

• mutation at -20 results in Hemophilia B Leyden, in which the hemophilia improves at puberty when levels of androgen increase

• mutation at -26 results in Hemophilia B Brandenburg, in which factor IX levels remain low even after puberty

Figure: AR binding site from -36 to -22; HNF4 binding site from -27 to -15.
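The difference between the two mutations can be read off the binding-site coordinates. The sketch below assumes the spans shown in the figure labels (AR from -36 to -22, HNF4 from -27 to -15) and simply asks which site contains each mutated position; the names `SITES` and `sites_hit` are illustrative.

```python
# Binding-site spans relative to +1, as read from the figure labels
# (an assumption about how the labels pair with the coordinates).
SITES = {"AR": (-36, -22), "HNF4": (-27, -15)}


def sites_hit(position: int) -> list[str]:
    """Names of the binding sites that contain the mutated position."""
    return [name for name, (lo, hi) in SITES.items() if lo <= position <= hi]


print(sites_hit(-20))  # ['HNF4']
print(sites_hit(-26))  # ['AR', 'HNF4']
```

This is consistent with the clinical picture on the slide: at -20 the AR site is left intact, so rising androgen at puberty can still drive transcription, whereas -26 lies in the overlap and affects both sites.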

Page 61: HGPgenemap Structure

splicing

Page 62: HGPgenemap Structure

Frequency of bases in each position of the splice sites

Donor sequences

          exon             | intron
%A          30  40  64   9 |   0   0  62  68   9  17  39  24
%U          20   7  13  12 |   0 100   6  12   5  63  22  26
%C          30  43  12   6 |   0   0   2   9   2  12  21  29
%G          19   9  12  73 | 100   0  29  12  84   9  18  20
Consensus            A   G |   G   U   A   A   G   U

Acceptor sequences

          intron                                                       | exon
%A          15  10  10  15   6  15  11  19  12   3  10  25   4 100   0 |  22  17
%U          51  44  50  53  60  49  49  45  45  57  58  29  31   0   0 |   8  37
%C          19  25  31  21  24  30  33  28  36  36  28  22  65   0   0 |  18  22
%G          15  21  10  10  10   6   7   9   7   7   5  24   1   0 100 |  52  25
Consensus    Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G |   G

Polypyrimidine tract (Y = U or C; N = any nucleotide)
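The consensus line under the donor table can be recovered from the percentages themselves. The sketch below takes the most frequent base at each of the 12 donor positions and reports it only when it reaches a cutoff. The 60% cutoff and the function name `consensus` are assumptions chosen for illustration, not values from the slides.

```python
# Percent of each base at the 12 donor-site positions (4 exon, 8 intron),
# copied from the donor table above.
DONOR = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}


def consensus(freqs: dict[str, list[int]], threshold: int = 60) -> str:
    """Most frequent base per position; 'N' where none reaches threshold."""
    n_positions = len(next(iter(freqs.values())))
    out = []
    for i in range(n_positions):
        base = max(freqs, key=lambda b: freqs[b][i])
        out.append(base if freqs[base][i] >= threshold else "N")
    return "".join(out)


print(consensus(DONOR))  # NNAGGUAAGUNN
```

The eight defined positions spell AGGUAAGU, with the exon/intron boundary falling between AG and GU, matching the consensus given with the table.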

Page 63: HGPgenemap Structure

Mutations that disrupt splicing
• β⁰-thalassemia - no β-chain synthesis
• β⁺-thalassemia - some β-chain synthesis

Normal splice pattern: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3

Donor site: /GU        Acceptor site: AG/

Intron 2 acceptor site mutation: no use of mutant site; use of cryptic splice site in intron 2
• mutant site: GG/
• intron 2 cryptic acceptor site: UUUCUUUCAG/G

Translation of the retained portion of intron 2 results in premature termination of translation due to a stop codon within the intron, 15 codons from the cryptic splice site

Page 64: HGPgenemap Structure

Intron 1 mutation creates a new acceptor splice site: use of both sites
• Donor site: /GU
• AG/: normal acceptor site (used 10% of the time in mutant)
• CCUAUUAG/U: mutant site (used 90% of the time)
• CCUAUUGG U: normal intron sequence (never used because it does not conform to a splice site)
• Translation of the retained portion of intron 1 results in termination at a stop codon in intron 1

Exon 1 (Hb E) mutation creates a new donor splice site: use of both sites
• /GU: normal donor site (used 60% of the time when exon 1 site is mutated)
• GGUG/GUAAGGCC: mutant site (used 40% of the time)
• GGUG GUGAGGCC: normal sequence (never used because it does not conform to a splice site)
• The GAG glutamate codon is mutated to an AAG lysine codon in Hb E
• The incorrect splicing results in a frameshift and translation terminates at a stop codon in exon 2

Page 65: HGPgenemap Structure

Patterns of alternative exon usage
• one gene can produce several (or numerous) different but related protein species (isoforms)

Cassette

Mutually exclusive

Internal acceptor site

Alternative promoters

Page 66: HGPgenemap Structure

The Troponin T (muscle protein) pre-mRNA is alternatively spliced to give rise to 64 different isoforms of the protein

• Constitutively spliced exons (exons 1-3, 9-15, and 18)
• Mutually exclusive exons (exons 16 and 17)
• Alternatively spliced exons (exons 4-8)

Exons 4-8 are spliced in every possible way, giving rise to 32 different possibilities

Exons 16 and 17, which are mutually exclusive, double the possibilities; hence 64 isoforms
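The count of 64 follows from 2^5 choices for exons 4-8 times 2 choices for exons 16/17. As a small check, the sketch below enumerates every exon combination under exactly the rules stated on the slide; the variable names are illustrative.

```python
from itertools import product

ALTERNATIVE = [4, 5, 6, 7, 8]   # each exon may be included or skipped
MUTUALLY_EXCLUSIVE = [16, 17]   # exactly one of the two is used

isoforms = []
for included in product([False, True], repeat=len(ALTERNATIVE)):
    cassette = [exon for exon, keep in zip(ALTERNATIVE, included) if keep]
    for exclusive in MUTUALLY_EXCLUSIVE:
        # Constitutive exons 1-3, 9-15 and 18 are always present.
        exons = [1, 2, 3] + cassette + list(range(9, 16)) + [exclusive, 18]
        isoforms.append(tuple(exons))

print(len(isoforms))       # 64
print(len(set(isoforms)))  # 64, so every combination is distinct
```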

Page 67: HGPgenemap Structure

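Preproopiomelanocortin
• multiple functional polypeptides from a single precursor
• processed in a cell-specific manner

Figure: the precursor drawn from N terminus to C terminus, with segments labeled 26aa, 48aa, 12aa, 40aa, 14aa, 21aa, 40aa, 18aa and 26aa. Removal of the signal peptide gives proopiomelanocortin, which is processed into corticotropin (ACTH), α-, β- and γ-MSH, β- and γ-lipotropin, endorphin and enkephalin (5aa); a 31aa length is also marked.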