9/13/2018
Human genome – the basics
Human genome project, organization, variations, gene regulation
Organization
The lectures will take place in Martin, Education center 1, 2, 3, 4, 5, 6, 2 × 45 min plus 15 min
The seminars and practicals will take place in Biomed, Division of Oncology/Dept. of Mol. Biology, 4th floor, Practical Room No. 5.51
Test 1: lectures 1-3, seminars in weeks 5 and 6 (practical No. 2)
Test 2: lectures 4-6, seminars in weeks 7 and 8 (practical No. 3)
Grading
• Each student
mandatory
4 practicals (100% attendance)
1 presentation, 10 p.
2 tests, max. 40 p. each
optional
List of 4 methods of molecular biology with applications in medicine, 4 p.
Study literature
Mandatory: Lectures in Molecular Biology on the website of the Institute of Molecular Biology
https://www.jfmed.uniba.sk/en/pracoviska/scientific-and-teaching-workplaces/pre-clinical-departments/institute-of-molecular-biology/graduate-study/
Optional: Tom Strachan, Read: Genetics and Genomics in Medicine, Garland Science 2015, selected parts
Molecular biology methods in medicine – beginnings in the mid-20th century
• Understanding of the processes of replication, transcription and translation
• Central dogma of molecular biology
• Basic techniques of molecular biology – manual DNA/RNA extraction, Southern/Northern blot, endpoint PCR, restriction analysis, radioactive Sanger DNA sequencing
• Efforts to apply molecular biology methods in DNA diagnostics
Molecular biology methods in medicine – advances at the start of the 21st century
• Subfield – human molecular genetics and genomics – the study of human gene structure and function
Consequences
- completion of the Human Genome Project
- development of advanced techniques of molecular biology – automation of processes and high-throughput analyses – automated DNA/RNA extraction, real-time PCR, fluorescent Sanger sequencing, pyrosequencing
- detection of disease-causing genes and other disease-causing genetic and epigenetic changes
- sequencing of whole genomes
- interface with genomics, bioinformatics and computational biology
- diagnostic and therapeutic consequences for human medicine
Major milestones in mapping and sequencing of the human genome - discoveries and methods
1953: The double-helix structure of DNA is discovered by Watson and Crick
1956: The first physical map of the human genome is determined by distinguishing chromosomes according to size and shape, using certain stains to produce subchromosomal banding patterns – light microscopy of stained tissue reveals that our cells contain 46 chromosomes, with a total of 24 different types of chromosome
Physical map – a map that provides information on the linear structure of a DNA molecule, showing the location of some physical entities on a chromosome
Metaphase chromosomes
Chromosome banding
• G banding – the chromosomes are subjected to controlled digestion with trypsin before staining with Giemsa, a DNA-binding chemical dye. Positively staining dark bands are known as G bands. Pale bands are G negative.
Locus – the unique position or location of a gene or genetic marker on a chromosome
Genetic marker – a characteristic located at the same place on a pair of homologous chromosomes that allows us to distinguish one homolog from the other – genetic polymorphism
The common allele is usually referred to as wild type
• Wild-type homozygote – when the alleles at a given locus are identical, the individual is homozygous
• If the alleles are different on the maternal and the paternal copy of the gene, the individual is heterozygous at this locus
• Homozygous mutated alleles – inheriting identical copies of a mutant allele occurs in many autosomal recessive disorders, particularly in circumstances of consanguinity
• If two different mutant alleles are inherited at a given locus, the individual is said to be a compound heterozygote
• Hemizygous is used to describe males with a mutation in an X chromosomal gene or a female with a loss of one X chromosomal locus.
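The genotype terms above can be read as a small decision procedure. The sketch below is only an illustration: the function name `classify_genotype`, the use of `None` for a missing allele, and the single-letter allele labels are assumptions made for this example, not part of the lecture.

```python
def classify_genotype(maternal, paternal, wild_type):
    """Name the genotype at one locus from its two alleles.

    Pass None for an allele that is absent (for example the single
    X-linked copy in a male). Allele labels are arbitrary strings.
    """
    present = [allele for allele in (maternal, paternal) if allele is not None]
    if not present:
        raise ValueError("at least one allele is required")
    if len(present) == 1:
        # Only one copy of the locus exists in this individual.
        return "hemizygous"
    first, second = present
    if first == second:
        # Identical alleles: homozygous, either for the common allele or a mutant one.
        return "homozygous wild type" if first == wild_type else "homozygous mutant"
    if wild_type in present:
        # One common allele plus one different allele.
        return "heterozygous"
    # Two different alleles, neither of them the common one.
    return "compound heterozygote"


print(classify_genotype("A", "T", wild_type="A"))
print(classify_genotype("T", "G", wild_type="A"))
```

Treating any single-copy locus as hemizygous is a simplification; the slide uses the term specifically for X chromosomal genes.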
Goals of Human Genome Project
Generate working draft of 90% of the human genome (2001)
Obtain complete high quality genomic sequence (2003)
Make all data publicly available and develop bioinformatics software and computational biology tools
Develop novel sequencing technologies
Map sequence variations
Interpret function of genes/genome
Develop comparative genomic strategies.
Ethical, legal and social implications (ELSI)
Bioinformatics and Computational Biology
The human genome www.genome.gov/Education
Collective name for the different DNA molecules found in the cells of Homo sapiens
Comprises 25 different DNA molecules:
1. Nuclear genome: 24 different linear nuclear DNA molecules – 22 autosomes and 2 sex chromosomes, X and Y; approx. 21,000 protein-coding genes and more than 6000 RNA genes
2. Mitochondrial genome: a single type of circular mitochondrial DNA, 37 genes
Protein-coding sequences 1.1% of the genome, other conserved sequences 4%
Number of protein-coding genes: approx. 21,000
Number of RNA genes: thousands
Molecular definition of a gene
A sequence of chromosomal DNA that is required for production of a functional product, be it a polypeptide or a functional RNA molecule, including regulatory sequences
Nuclear genome
• Size: 3200 Mb
• Number of different DNA molecules: 23 (in XX cells) or 24 (in XY cells); all linear
• Total number of molecules per cell: 46 in diploid cells
• Number of protein-coding genes: approx. 21,000
• Number of RNA genes: uncertain, more than 15,000
• Protein-coding DNA: approx. 1.1%
• Noncoding DNA: 98.9%
- 4% conserved other than coding sequences
- 6.5% constitutive heterochromatin (a chromosomal region that remains highly condensed throughout the cell cycle and shows little or no evidence of active gene expression)
- 45% transposon-based repeats
- 44% other non-conserved (incl. repetitive sequences)
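To get a feeling for these percentages, they can be converted into megabases of the 3200 Mb nuclear genome. This is a rough arithmetic sketch only; the names `GENOME_SIZE_MB`, `FRACTIONS_PERCENT` and `megabases` are made up for the example, and the percentages are the approximate slide values, so they need not add up to exactly 100%.

```python
GENOME_SIZE_MB = 3200  # nuclear genome size quoted in the list above

# Approximate percentages quoted in the list above.
FRACTIONS_PERCENT = {
    "protein-coding DNA": 1.1,
    "other conserved sequences": 4.0,
    "constitutive heterochromatin": 6.5,
    "transposon-based repeats": 45.0,
    "other non-conserved sequences": 44.0,
}


def megabases(percent, genome_size_mb=GENOME_SIZE_MB):
    """Convert a percentage of the genome into megabases."""
    return genome_size_mb * percent / 100.0


for name, percent in FRACTIONS_PERCENT.items():
    print(f"{name}: about {megabases(percent):.0f} Mb")
```

On these figures the protein-coding part is only about 35 Mb, while transposon-based repeats take up well over 1000 Mb.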
Mitochondrial genome
• Size: 16.6 kb
• Number of different DNA molecules: one circular DNA molecule
• Total number of molecules per cell: often several thousand
• Number of protein-coding genes: 13
• Number of RNA genes: 24
• Protein-coding DNA: 66%
• RNA-coding DNA: 32%
Polypeptide-coding genes
• Single copy genes
• Gene families
- duplication of single copy genes
- degree of sequence similarity and structural similarity
If two different genes make very similar protein products, they most likely originated from an evolutionarily very recent gene duplication and tend to be clustered together
- clustered or dispersed throughout the genome
Major classes of human noncoding (nc)RNA
Genes where the functional product is a noncoding RNA molecule (ncRNA)
• Ribosomal RNA
• Transfer RNA
• Small nuclear RNA
• Small nucleolar RNA
• Small Cajal body RNA
• Ribonuclease RNA
• Micro RNA
• Piwi-binding RNA
• Endogenous short interfering RNA
• Long noncoding regulatory RNA
microRNA – ca. 2000 different types; regulatory RNAs about 22 nt in size; antisense regulation of other genes
• Gene families with genes with high degree of sequence homology over most of the length of the gene or coding sequence
histone genes – 86 different histone sequences distributed over 10 different chromosomes; two large clusters
α-globin and β-globin genes
• Gene families defined by a common protein domain – the members may have low sequence homology, but they possess certain sequences that specify one or more specific protein domains
3. Gene superfamilies – the members are much more distantly related in evolutionary terms; they encode products that are functionally related in a general sense, and show only weak sequence homology over a large segment, without very significant conserved amino acid motifs, but with
- common structural features
- generally related function
Immunoglobulin superfamily – a very large family encompassing immunoglobulin (Ig) genes, TCR genes and HLA genes; the members are considerably divergent at the DNA level, but their products function in the immune system and contain Ig-like domains.
Tandemly repeated noncoding human DNA
1. Satellite DNA – often occurs in arrays (blocks) in the size range of 100 kb to several Mb
- size of the repeat unit is 5-171 bp
- especially at centromeres; not transcribed
Alphoid DNA – the bulk of the centromeric heterochromatin; repeat unit of 171 bp; important for centromere function
2. Minisatellite DNA – arrays in the range of 0.1 kb to 20 kb
- repeat unit of 9-64 bp
Telomeric family – 3-20 kb of tandem hexanucleotide repeat units, especially TTAGGG, which are added by the specialized enzyme telomerase; acts as a buffer to protect the ends of the chromosomes
Hypervariable family – highly polymorphic (at various individual loci), organized in over 1000 arrays (0.1 to 20 kb long)
Tandemly repeated noncoding human DNA
Microsatellite DNA (simple sequence repeats, SSR, or short tandem repeats, STR)
- small arrays of tandem repeats of a simple sequence (usually less than 10 bp), interspersed throughout the genome, accounting for over 60 Mb (2% of the genome)
- arises by replication slippage
CA/TG repeats are very common, accounting for about 0.5% of the genome, and are often highly polymorphic
Tri-, tetra- and pentanucleotide repeats
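A microsatellite is simply a short unit repeated head to tail, so candidate arrays can be spotted in a sequence with a back-referencing regular expression. The following is a minimal sketch, not a validated STR caller: the function name `find_microsatellites` and the defaults (unit of at most 6 bp, at least 4 copies) are assumptions chosen for the example.

```python
import re


def find_microsatellites(seq, max_unit=6, min_repeats=4):
    """Return (start, repeat_unit, copy_number) for each tandem repeat array.

    start is the 0-based index of the array in seq. The repeat unit is
    matched lazily, so the shortest unit that explains the array is reported.
    """
    # ([ACGT]{1,max_unit}?) captures a candidate unit; \1{n,} demands that
    # the very same unit follows at least (min_repeats - 1) more times.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# A (CA)n dinucleotide array and a (CAG)n trinucleotide array.
print(find_microsatellites("GGAT" + "CA" * 6 + "TTG"))
print(find_microsatellites("TT" + "CAG" * 5 + "AA"))
```

Because the number of copies differs between individuals at many such loci, the copy number returned here is the quantity that makes microsatellites useful as polymorphic markers.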
Human genome is variable in ca. 0.1% – genetic variation describes differences between the DNA sequences of individual genomes
Goal of HGP – map sequence variation
Genetic variation originates from mutations
Mutation as a process – the process that produces altered DNA
Mutation as an outcome (DNA variant) – any change in the primary nucleotide sequence of DNA, regardless of its functional consequence
Mutations result in alternative forms of DNA at a specific locus that are generally known as DNA variants
For any locus, if more than one DNA variant is common in the population (frequency above 0.01), the DNA variation is described as a polymorphism
DNA variants that have frequencies of less than 0.01 are often described as rare variants
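The 0.01 threshold refers to the frequency of the variant allele in the population, which can be estimated from genotype counts in a sample of diploid individuals. A small sketch under stated assumptions: the function names are invented for the example, and a frequency of exactly 0.01, which the text above leaves open, is treated here as rare.

```python
def allele_frequency(n_hom_variant, n_het, n_individuals):
    """Variant allele frequency in a sample of diploid individuals.

    Each homozygote carries two copies of the variant, each heterozygote
    one, and the sample contains 2 * n_individuals alleles in total.
    """
    if n_individuals <= 0:
        raise ValueError("sample must contain at least one individual")
    return (2 * n_hom_variant + n_het) / (2 * n_individuals)


def describe_variant(frequency, threshold=0.01):
    """Label a variant by frequency; exactly 0.01 counts as rare (assumption)."""
    return "polymorphism" if frequency > threshold else "rare variant"


# 100 people: 1 homozygous and 18 heterozygous carriers -> 20 of 200 alleles.
freq = allele_frequency(n_hom_variant=1, n_het=18, n_individuals=100)
print(freq, describe_variant(freq))
```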
At any genetic locus the maternal and paternal alleles normally have identical or slightly different sequences
The common allele is usually referred to as wild type
• Wild-type homozygote – when the alleles at a given locus are identical (the DNA variants are identical), the individual is homozygous for the common variant
• Homozygous for a rare variant – both alleles carry the same rare variant
• If the alleles are different (the DNA variants are different) on the maternal and the paternal copy of the gene, the individual is heterozygous at this locus
Human genome is variable in ca. 0.1% – genetic variation describes differences between the DNA sequences of individual genomes
Variants can occur in the germline (sperm or oocytes); these can be transmitted from parents to progeny
De novo variants occur in sperm or oocytes, but are not present in the parents
Alternatively, variants can occur during embryogenesis or in somatic tissues.
• Variants that occur during development lead to mosaicism, a situation in which tissues are composed of cells with different genetic constitutions
• Other somatic variants are associated with neoplasia because they confer a growth advantage on cells
The scale of human genetic variations
• Numerical variants or aneuploidy
- an entire chromosome is missing (monosomy)
- an extra copy is present (trisomy)
• Structural DNA variants – ≥ 50-100 bp in size, including insertions, deletions, duplications and inversions of chromosomal regions
- Large copy number variants (CNVs) involve hundreds of kb to Mb (≥ 500 kb) of DNA that are either missing or duplicated in tandem (sometimes multiple times); they can be very deleterious, involve many genes, are typically de novo and are rare
- Small CNVs – < 100 kb
However, the cut-off between indels and copy number variants is imprecise
• Microsatellites and other polymorphisms due to a variable number of tandem repeats
• Small-scale insertions/deletions (indels)
• Single nucleotide variants (SNVs), called single nucleotide polymorphisms (SNPs) when the variant exceeds a frequency of 0.01 in the population
The most common type of genetic variation in the human genome is single nucleotide substitution
Figure: example of a single nucleotide substitution (genotype A vs. genotype T)
Consequences of human genome variations
Depend on the impact on protein production
• Deviations from normal gene expression
- decreased expression (one allele is inactivated by loss of a gene copy) – haploinsufficiency
- increased gene expression (one allele is duplicated by gain of a gene copy) – triplosensitivity
This is sufficient to cause disease in some cases – numerical, structural and large copy number variants have the greatest potential to do damage because they affect a larger number of genes
• Extra copies of genes – overexpression
• Too few copies – underexpression
• Translocations and inversions – fusion genes
Consequences of human genome variations
Depend on the impact on protein production
Small CNVs and SNVs have variable effects ranging from completely innocuous to highly deleterious
- Innocuous – mostly in the noncoding part of the genome or outside protein-coding sequences
- The most prevalent SNV is a single nucleotide substitution (more than 4 million in one individual)
- It can affect the function of the gene product when located in an exon, splice site or regulatory region.
Effect of SNVs on protein structure
i) no effect on the amino acid composition (synonymous)
ii) a change of one amino acid (nonsynonymous)
iii) a change in the length of the protein due to a premature stop signal being introduced
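The three outcomes above follow directly from the genetic code: translate the codon before and after the substitution and compare. The sketch below uses the standard nuclear genetic code; the function name `snv_effect` and the label strings are assumptions for the example, and it only handles a substitution inside a single codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snv_effect(ref_codon, alt_codon):
    """Classify a single nucleotide substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid, protein unchanged
    if alt_aa == "*":
        return "premature stop"    # new stop codon, shortened protein
    return "nonsynonymous"         # one amino acid replaced by another


print(snv_effect("GAA", "GAG"))  # Glu -> Glu
print(snv_effect("GAG", "GTG"))  # Glu -> Val
print(snv_effect("CAG", "TAG"))  # Gln -> stop
```

Variants in splice sites or regulatory regions are outside the scope of this check, since they act without changing a codon.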
Consequences of human genome variations
Reference genome/gene sequence – there is no "control" or normal human gene or genome; to provide some sort of standard, a reference genome and reference gene sequences have been assembled
• Reference genome
- assembled as a mosaic of DNA from over a dozen anonymous volunteers; should contain variants that are not associated with diseases
- can contain important variants of health significance that are not necessarily normal
- is constantly updated
• Reference gene sequence – the DNA sequence of a gene with which the patient's sequence is compared; it defines the gene coordinates and contains variants presumed not to be associated with diseases
Controls on gene expression operate at several levels.
• Transcription of genes is controlled by transcription factors (TFs) binding to specific DNA sequences within the regulatory regions of genes.
• Chromatin conformation: DNA methylation and histone code
Temporal restriction of gene expression
• Cell cycle stage – some genes are only expressed at specific times in the cell cycle (e.g. histones only in S phase)
• Developmental stage – at the very earliest stage of development transcription does not occur; instead cells rely on previously synthesized mRNA; later in development some genes may be expressed transiently at specific stages; some genes are expressed at different developmental stages, as in the case of beta-globin.
• Differentiation stage – as cells differentiate, their genomes are modified, resulting in altered expression patterns; in some differentiated cells transcription does not occur.
• Inducible expression – some genes are activated in response to environmental cues or extracellular signaling. The expression is easily reversed if the inducing factor is removed.
Spatial restriction of gene expression in mammalian cells
• Tissue-specific gene expression – as in the case of the beta-globin gene, which is expressed in erythroid cells
• Expression in individual cells – some specialized genes produce different products in individual cells belonging to the same cell type; different B lymphocytes in a person express different (cell-specific) antibody molecules
Regulation of gene expression
• Transcriptional
• Post-transcriptional
• Translational
• Protein degradation
Regulation of gene expression
• Transcriptional
- genetic (direct interaction of a control factor with the gene) – cis-acting, trans-acting
- modulation (interaction of a control factor with the transcriptional machinery)
- epigenetic (non-sequence changes in DNA structure)
• Post-transcriptional
• Translational
• Protein degradation
PROMOTERS – combinations of short sequence elements (usually located in the immediate upstream region of the gene, often within 200 bp of the transcription start site) which serve to initiate transcription.
Position of cis-acting elements within promoter sequences
• TATA box, usually found at a position about 25 bp upstream (-25) from the transcriptional start; it is typically found in genes which are actively transcribed by RNA pol II
• GC box found in a variety of housekeeping genes, it appears to function in either orientation
• CAAT box often located at position -80; it is usually the strongest determinant of promoter efficiency
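Positions such as -25 and -80 are counted backwards from the transcription start site, so locating these boxes in an upstream sequence is a matter of motif search plus a coordinate shift. The sketch below is illustrative only: the motif patterns are commonly quoted consensus sequences and are assumptions here (the slide gives positions, not sequences), and `find_promoter_elements` is a made-up name.

```python
import re

# Commonly quoted consensus motifs (assumed for this example). The GC box
# is searched in both orientations, as it functions in either one.
MOTIFS = {
    "TATA box": "TATA[AT]A",
    "CAAT box": "CCAAT",
    "GC box": "GGGCGG|CCGCCC",
}


def find_promoter_elements(upstream):
    """Locate promoter motifs in the sequence just upstream of a gene.

    upstream is written 5'->3' and ends immediately before the
    transcription start site, so its last base is position -1.
    Returns (element_name, position_of_first_base) sorted by position.
    """
    upstream = upstream.upper()
    length = len(upstream)
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, upstream):
            # Index 0 of a 100 bp sequence is position -100, and so on.
            hits.append((name, match.start() - length))
    return sorted(hits, key=lambda hit: hit[1])


# Artificial 100 bp upstream region with a CAAT box at -80 and a TATA box at -25.
example = "C" * 20 + "CCAAT" + "C" * 50 + "TATAAA" + "C" * 19
print(find_promoter_elements(example))
```

Real promoters match these consensus sequences only loosely, so an exact-match search like this will miss many genuine elements.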
Cis-acting gene sequences – specific recognition elements recognized by tissue-specific TFs
• ENHANCERS – positive transcriptional control elements which are particularly prevalent in mammals; they serve to increase the basal level of transcription which is initiated through the core promoter elements
• Their function is independent of their orientation and, to some extent, of their distance from the gene
• SILENCERS – serve to reduce transcription levels
• RESPONSE ELEMENTS – modulate transcription in response to specific external stimuli; they are usually located upstream of the promoter element (often within 1 kb of the transcription start site)
• A variety of such elements respond to specific hormones (e.g. retinoic acid or steroid hormones such as glucocorticoids)
Genetic changes in the regulatory mechanism of the control elements of gene expression – examples
• 1. mutations within the promoter region
• 2. mutation within enhancers, silencers and response elements
• 3. gene is under control of inappropriate enhancer, silencer or response elements e.g. gene translocation
• 4. mutations in conserved splicing sequences
Pathological gene expression
DETECTION of PATHOLOGICAL GENE EXPRESSION – diagnostic, prognostic and predictive consequences
• Nucleosome – the structural unit of chromatin; it consists of a central core of eight histone proteins (two each of H2A, H2B, H3 and H4) around which a stretch of 146 bp of dsDNA is coiled; adjacent nucleosomes are connected by a short length of spacer DNA
• The strings of beads, approx. 10 nm in diameter, are in turn coiled into a chromatin fiber; the interphase chromosomes seem to consist of these chromatin fibers
The histone code
• The histone code concept implies that particular combinations of histone modifications define the conformation of chromatin and hence the activity of the DNA contained therein.
• A good example of the importance of histone modifications for gene expression is the methylation of H3K4 – dimethylated and trimethylated H3K4 appear in discrete peaks in the genome that overlap precisely with promoter regions – a landmark for recruitment of RNA pol II and protection against DNA methylation by methyltransferases
• Epigenetic mechanisms of gene control describe heritable states which do not depend on DNA sequence
• (Genetic mechanisms explain heritable states (characters) which result from changes in DNA sequence (mutations))
• DNA methylation – gene repression
• (Host defense against transposons or foreign DNA)
CpG islands – GC-rich (more than 50%), unmethylated or hypomethylated DNA sequences, hundreds of nucleotides long, with a significant frequency of CpG dinucleotides; they are targets for DNA methylation, which can cause local condensation of chromatin and inhibit gene expression
DNA methylation is accomplished by DNA methyltransferases at CpG islands
• Genomic regions with high frequency of CpG dinucleotides; CpG islands are typically 300 – 3 000 bp in length
The usual formal definition of a CpG island is a region with at least 500 bp and with a GC percentage that is greater than 55%
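The formal definition above (at least 500 bp, GC percentage greater than 55%) is easy to check on a candidate sequence. A minimal sketch follows; the function names are invented for the example. The observed/expected CpG ratio is included only as an extra descriptive statistic that is often reported alongside GC content; it is not part of the definition quoted above and is an addition made here.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """Observed CpG count divided by the count expected from base composition.

    Expected count = (number of C * number of G) / sequence length.
    Extra statistic, not part of the definition quoted in the text.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def meets_cpg_island_definition(seq, min_length=500, min_gc_percent=55.0):
    """Apply the two criteria from the text: length >= 500 bp and GC > 55%."""
    return len(seq) >= min_length and gc_percent(seq) > min_gc_percent


candidate = "CG" * 250  # artificial 500 bp sequence, used only to exercise the code
print(len(candidate), gc_percent(candidate), meets_cpg_island_definition(candidate))
```

In practice such a check would be run over a sliding window along a chromosome rather than on one isolated fragment.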
Methyl-CpG binding proteins with a methyl-CpG-binding domain (MBD)
• MECP2 on the X chromosome – loss-of-function mutations in MECP2 are responsible for dominantly inherited Rett syndrome
Normal delivery; heterozygous girls develop normally for their first year but then regress
Other main criteria include loss of purposeful hand skills, loss of spoken language, gait abnormalities, and stereotypic hand movements.
80-90% – dominant de novo germline loss-of-function mutations (from the father's germline) in MECP2 at Xq28
MECP2 is a transcription factor – methyl-CpG binding protein