DNA Nature Graduation Research ( 2009 )


By / Mohamed Hosny Abu-samra

Chemistry / Biochemistry department

Supervised by

Prof. Dr. / Mahmoud Mohamed Balbaa

Alexandria University, Faculty of Science

Biochemistry Department


CONTENTS

Titles

Introduction
DNA structure
Chromosome structure
DNA Replication
Gene Mutation
Genomics
Cloning
References

Boxes

Box 1
Box 2
Box 3
Box 4
Box 5
Box 6
Box 7

Tables

Table 1
Table 2
Table 3
Table 4


Introduction

The discovery that genetic information is coded along the length of a polymeric molecule composed of only four types of monomeric units was one of the major scientific achievements of the twentieth century. This polymeric molecule, DNA, is the chemical basis of heredity and is organized into genes, the fundamental units of genetic information.

Every organism possesses a genome that contains the biological information needed to maintain the life of that organism. Most genomes, including the human genome and those of all other cellular life forms, are made of DNA, but a few viruses have RNA genomes.

In eukaryotic cells, DNA is found inside a special area of the cell called the nucleus. Because the cell is very small, and because organisms have many DNA molecules per cell, each DNA molecule must be tightly packaged. This packaged form of the DNA is called a chromosome.

DNA spends a lot of time in its chromosome form. But during cell division, DNA unwinds so it can be copied and the copies transferred to new cells. DNA also unwinds so that its instructions can be used to make proteins and for other biological processes.

Researchers refer to DNA found in the cell's nucleus as nuclear DNA. An organism's complete set of nuclear DNA is called its genome.

Besides the DNA located in the nucleus, humans and other complex organisms also have a small amount of DNA in cell structures known as mitochondria. Mitochondria generate the energy the cell needs to function properly.

In sexual reproduction, organisms inherit half of their nuclear DNA from the male parent and half from the female parent. However, organisms inherit all of their mitochondrial DNA from the female parent. This occurs because only egg cells, and not sperm cells, keep their mitochondria during fertilization. 6

1. DNA structure

DNA consists of two polynucleotide strands wound around each other to form a right-handed double helix. The structure of DNA is so distinctive that this molecule is often referred to as the double helix. Each nucleotide monomer in DNA is composed of a nitrogenous base (either a purine or a pyrimidine), a deoxyribose sugar, and phosphate. The mononucleotides are linked to each other by 3',5'-phosphodiester bonds. These bonds join the 5'-hydroxyl group of the deoxyribose of one nucleotide to the 3'-hydroxyl group of the sugar unit of another nucleotide through a phosphate group. The antiparallel orientation of the two polynucleotide strands allows hydrogen bonds to form between the nitrogenous bases that are oriented toward the helix interior. There are two types of base pairs (bp) in DNA: A) adenine (a purine) pairs with thymine (a pyrimidine), and B) guanine (a purine) pairs with cytosine (a pyrimidine). 5
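The pairing rules and the antiparallel arrangement described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical, and the strands are treated as plain strings of the letters A, T, G and C.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5to3):
    """Return the partner strand, also written in the 5'->3' direction."""
    # Each base is replaced by its partner; the order is reversed because
    # the two strands of the double helix run antiparallel.
    return "".join(PAIR[base] for base in reversed(strand_5to3.upper()))


def is_purine(base):
    """Adenine and guanine are purines; thymine and cytosine are pyrimidines."""
    return base.upper() in ("A", "G")


if __name__ == "__main__":
    top = "ATGC"  # hypothetical example sequence
    bottom = complementary_strand(top)
    print("5'-" + top + "-3'")
    print("3'-" + bottom[::-1] + "-5'")
```

In this sketch every base pair joins one purine with one pyrimidine, which is consistent with the two pair types listed above.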

Several types of noncovalent bonding contribute to the stability of its helical structure:

1. Hydrophobic interactions: The base ring π cloud of electrons between stacked purine and pyrimidine bases is relatively nonpolar. The clustering of the base components of nucleotides within the double helix is a stabilizing factor in the three-dimensional macromolecule because it minimizes their interactions with water, thereby increasing entropy.

2. Hydrogen bonds: The base pairs, on close approach, form a preferred set of hydrogen bonds, three between GC pairs and two between AT pairs. The cumulative "zippering" effect of these hydrogen bonds keeps the strands in correct complementary orientation.


A diagrammatic representation of the Watson and Crick model of the double-helical structure of the B form of DNA. The horizontal arrow indicates the width of the double helix (20 Å), and the vertical arrow indicates the distance spanned by one complete turn of the double helix (34 Å). One turn of B-DNA includes ten base pairs (bp), so the rise is 3.4 Å per bp. The central axis of the double helix is indicated by the vertical rod. The short arrows designate the polarity of the antiparallel strands. The major and minor grooves are depicted. (A, adenine; C, cytosine; G, guanine; T, thymine; P, phosphate; S, sugar [deoxyribose].) 3

3. Base stacking: Once the antiparallel polynucleotide strands have been brought together by base pairing, the parallel stacking of the nearly planar heterocyclic bases stabilizes the molecule because of the cumulative effect of weak van der Waals forces.

4. Electrostatic interactions: DNA's external surface, referred to as the sugar-phosphate backbone, possesses negatively charged phosphate groups. Repulsion between nearby phosphate groups, a potentially destabilizing force, is minimized by the shielding effects of divalent cations such as Mg²⁺ and polycationic molecules such as the polyamines and histones.

A, B & Z DNA:

The structure discovered by Watson and Crick, referred to as B-DNA, represents the sodium salt of DNA under highly humid conditions. DNA can assume different conformations because deoxyribose is flexible and the C1-N-glycosidic linkage rotates. When DNA becomes partially dehydrated, it assumes the A form. In A-DNA, the base pairs are no longer at right angles to the helical axis. Instead, they tilt 20° away from the horizontal. In addition, the distance between adjacent base pairs is slightly reduced, with 11 bp per helical turn instead of the 10.4 bp found in the B form. Each turn of the double helix occurs in 2.5 nm, instead of 3.4 nm, and the molecule's diameter swells to approximately 2.6 nm from the 2.4 nm observed in B-DNA. The A form of DNA is observed when it is extracted with solvents such as ethanol.

The Z form of DNA (named for its "zigzag" conformation) radically departs from the B form. Z-DNA (D = 1.8 nm), which is considerably slimmer than B-DNA (D = 2.4 nm), is twisted into a left-handed spiral with 12 bp per turn. Each turn of Z-DNA occurs in 4.5 nm, compared with 3.4 nm for B-DNA. DNA segments with alternating purine and pyrimidine bases (especially CGCGCG) are most likely to adopt a Z configuration. 5

Table 1. Selected Structural Properties of B, A, and Z-DNA

Property                  B-DNA (Watson-Crick structure)   A-DNA           Z-DNA
Helix diameter            2.4 nm                           2.6 nm          1.8 nm
bp per helical turn       10.4                             11              12
Length per helical turn   3.4 nm                           2.5 nm          4.5 nm
Helix rotation            Right-handed                     Right-handed    Left-handed
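As a worked example, the figures in Table 1 can be combined to estimate the average rise and the average rotation per base pair in each form. The sketch below uses only the length of one helical turn and the number of base pairs per turn quoted in the text and in Table 1; the dictionary and function names are illustrative assumptions.

```python
# Length of one helical turn (nm) and base pairs per turn, as quoted in Table 1.
HELIX_FORMS = {
    "B-DNA": {"turn_length_nm": 3.4, "bp_per_turn": 10.4},
    "A-DNA": {"turn_length_nm": 2.5, "bp_per_turn": 11},
    "Z-DNA": {"turn_length_nm": 4.5, "bp_per_turn": 12},
}


def rise_per_bp_nm(form):
    """Average distance along the helix axis between adjacent base pairs."""
    values = HELIX_FORMS[form]
    return values["turn_length_nm"] / values["bp_per_turn"]


def rotation_per_bp_degrees(form):
    """Average rotation per base pair: one full turn (360 degrees) shared by all bp."""
    return 360.0 / HELIX_FORMS[form]["bp_per_turn"]


if __name__ == "__main__":
    for name in HELIX_FORMS:
        print(name,
              round(rise_per_bp_nm(name), 3), "nm per bp,",
              round(rotation_per_bp_degrees(name), 1), "degrees per bp")
```

With these inputs the rise per base pair comes out smaller for A-DNA than for B-DNA, in line with the statement that the distance between adjacent base pairs is reduced in the A form.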


A-DNA, B-DNA, and Z-DNA. Because DNA is a flexible molecule, it can assume different conformational forms depending on its base pair sequence and/or isolation conditions. Each molecular form in the figure possesses the same number of base pairs. 5

The Denaturation (Melting) of DNA:

The double-stranded structure of DNA can be separated into two component strands (melted) in solution by increasing the temperature or decreasing the salt concentration. Not only do the two stacks of bases pull apart but the bases themselves unstack while still connected in the polymer by the phosphodiester backbone.

The strands of a given molecule of DNA separate over a temperature range. The midpoint is called the melting temperature, or Tm. The Tm is influenced by the base composition of the DNA and by the salt concentration of the solution. DNA rich in G–C pairs, which have three hydrogen bonds, melts at a higher temperature than that rich in A–T pairs, which have two hydrogen bonds.
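The dependence of Tm on base composition can be illustrated with a rough rule of thumb that is not part of this text: for short oligonucleotides, the so-called Wallace rule estimates Tm as about 2 °C for every A or T and 4 °C for every G or C. The sketch below applies that rule as an assumption; it ignores salt concentration and is not meant for long DNA molecules. The function names are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def wallace_tm_celsius(sequence):
    """Rough Tm estimate for a short oligonucleotide (Wallace rule, an assumption).

    Each A or T contributes about 2 degrees C and each G or C about 4 degrees C.
    """
    sequence = sequence.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    for oligo in ("ATATATAT", "ATGCATGC", "GCGCGCGC"):  # hypothetical sequences
        print(oligo, gc_fraction(oligo), wallace_tm_celsius(oligo))
```

The three example sequences have the same length, so the rising estimate reflects only the increasing share of G–C pairs.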

Box 1. The human mitochondrial genome

The complete sequence of the human mitochondrial genome has been known for over 20 years. At just 16569 bp, it is much smaller than the nuclear genome, and it contains just 37 genes. Thirteen of these genes code for proteins involved in the respiratory complex, the main biochemical component of the energy-generating mitochondria; the other 24 specify non-coding RNA molecules that are required for expression of the mitochondrial genome. The genes in this genome are much more closely packed than in the nuclear genome and they do not contain introns. In many respects, the human mitochondrial genome is typical of the mitochondrial genomes of other animals. 2

2. The Structure of Chromosomes

The term “chromosome” is used to refer to a nucleic acid molecule that is the repository of genetic information in a virus, a bacterium, a eukaryotic cell, or an organelle. It also refers to the densely colored bodies seen in the nuclei of dye-stained eukaryotic cells, as visualized using a light microscope.


Chromatin Consists of DNA and Proteins

The eukaryotic cell cycle produces remarkable changes in the structure of chromosomes. In nondividing eukaryotic cells (in G0) and those in interphase (G1, S, and G2), the chromosomal material, chromatin, is amorphous and appears to be randomly dispersed in certain parts of the nucleus. In the S phase of interphase the DNA in this amorphous state replicates, each chromosome producing two sister chromosomes (called sister chromatids) that remain associated with each other after replication is complete. The chromosomes become much more condensed during prophase of mitosis, taking the form of a species-specific number of well-defined pairs of sister chromatids.

Chromatin consists of fibers containing protein and DNA in approximately equal masses, along with a small amount of RNA. The DNA in the chromatin is very tightly associated with proteins called histones, which package and order the DNA into structural units called nucleosomes. Also found in chromatin are many nonhistone proteins, some of which help maintain chromosome structure, others that regulate the expression of specific genes. Beginning with nucleosomes, eukaryotic chromosomal DNA is packaged into a succession of higher-order structures that ultimately yield the compact chromosome seen with the light microscope.

Changes in chromosome structure during the eukaryotic cell cycle.

Cellular DNA is uncondensed throughout interphase. The interphase period can be subdivided into the G1 (gap) phase; the S (synthesis) phase, when the DNA is replicated; and the G2 phase, in which the replicated chromosomes cohere to one another. The DNA undergoes condensation in the prophase of mitosis. Cohesins and condensins are proteins involved in cohesion and condensation. The architecture of the cohesin-condensin-DNA complex is not yet established, and the interactions shown here are figurative, simply suggesting their role in condensation of the chromosome. During metaphase, the condensed chromosomes line up along a plane halfway between the spindle poles. One chromosome of each pair is linked to each spindle pole via microtubules that extend between the spindle and the centromere. The sister chromatids separate at anaphase, each drawn toward the spindle pole to which it is connected. After cell division is complete, the chromosomes decondense and the cycle begins anew. 1

Box 2. Topoisomerase II

Topoisomerase II is so important to the maintenance of chromatin structure that inhibitors of this enzyme can kill rapidly dividing cells. Several drugs used in cancer chemotherapy are topoisomerase II inhibitors. 1


Histones Are Small, Basic Proteins

Found in the chromatin of all eukaryotic cells, histones have molecular weights between 11,000 and 21,000 and are very rich in the basic amino acids arginine and lysine (together these make up about one-fourth of the amino acid residues). All eukaryotic cells have five major classes of histones, differing in molecular weight and amino acid composition. The H3 histones are nearly identical in amino acid sequence in all eukaryotes, as are the H4 histones, suggesting strict conservation of their functions. For example, only 2 of 102 amino acid residues differ between the H4 histone molecules of peas and cows, and only 8 differ between the H4 histones of humans and yeast. Histones H1, H2A, and H2B show less sequence similarity among eukaryotic species.

Table 2. Types and Properties of Histones 1

Histone   Molecular weight   Number of amino acid residues   Lys (% of total)   Arg (% of total)
H1*       21,130             223                             29.5               11.3
H2A*      13,960             129                             10.9               19.3
H2B*      13,774             125                             16.0               16.4
H3        15,273             135                             19.6               13.3
H4        11,236             102                             10.8               13.7

* The sizes of these histones vary somewhat from species to species. The numbers given here are for bovine histones.

Koolman, Color Atlas of Biochemistry, 2nd edition © 2005 Thieme


3. DNA Replication (for E. coli)

DNA replication, the basis for biological inheritance, is a fundamental process occurring in all living organisms to copy their DNA. This process is "semiconservative" in that each strand of the original double-stranded DNA molecule serves as template for the reproduction of the complementary strand. Hence, following DNA replication, two identical DNA molecules have been produced from a single double-stranded DNA molecule. Cellular proofreading and error-checking mechanisms ensure near perfect fidelity for DNA replication.

In a cell, DNA replication begins at specific locations in the genome, called "origins". Unwinding of DNA at the origin, and synthesis of new strands, forms a replication fork. In addition to DNA polymerase, the enzyme that synthesizes the new DNA by adding nucleotides matched to the template strand, a number of other proteins are associated with the fork and assist in the initiation and continuation of DNA synthesis.

Watson and Crick proposed the hypothesis of semiconservative replication soon after publication of their 1953 paper on the structure of DNA, and the hypothesis was proved by ingeniously designed experiments carried out by Matthew Meselson and Franklin Stahl in 1957.
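What the semiconservative hypothesis predicts can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a description of the actual experimental procedure: the labels "heavy" and "light" stand for density-labeled parental strands and newly synthesized strands, and all names are hypothetical.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semiconservative replication.

    Each duplex is a pair of strands. Both strands are kept and each one
    templates a newly made ("light") partner, giving two daughter duplexes.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "light"))
        daughters.append((strand_b, "light"))
    return daughters


def classify(duplex):
    """Name a duplex by how many of its two strands are parental ("heavy")."""
    heavy_strands = duplex.count("heavy")
    return {2: "heavy/heavy", 1: "hybrid", 0: "light/light"}[heavy_strands]


def density_classes(generations):
    """Count duplex types after a number of generations, starting from one
    fully heavy duplex."""
    population = [("heavy", "heavy")]
    for _ in range(generations):
        population = replicate(population)
    return Counter(classify(duplex) for duplex in population)


if __name__ == "__main__":
    for generation in range(4):
        print(generation, dict(density_classes(generation)))
```

In this model the first generation consists only of hybrid duplexes, and the two original strands persist in exactly two hybrid duplexes in every later generation, which is the pattern the hypothesis predicts.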

The replication process is initiated at particular points within the DNA, known as "origins", which are targeted by proteins that separate the two strands and initiate DNA synthesis.

A new strand of DNA is always synthesized in the 5' to 3' direction, with the free 3' OH as the point at which the DNA is elongated. Because the two DNA strands are antiparallel, the strand serving as the template is read from its 3' end toward its 5' end. If synthesis always proceeds in the 5' to 3' direction, how can both strands be synthesized simultaneously?

This problem was resolved by Reiji Okazaki and colleagues in the 1960s. Okazaki found that one of the new DNA strands is synthesized in short pieces, now called Okazaki fragments. This work ultimately led to the conclusion that one strand is synthesized continuously and the other discontinuously. The continuous strand, or leading strand, is the one in which 5' to 3' synthesis proceeds in the same direction as replication fork movement. The discontinuous strand, or lagging strand, is the one in which 5' to 3' synthesis proceeds in the direction opposite to the direction of fork movement. Okazaki fragments range in length from a few hundred to a few thousand nucleotides, depending on the cell type. 1

DNA is Synthesized by DNA Polymerases

The search for an enzyme that could synthesize DNA began in 1955. Work by Arthur Kornberg and colleagues led to the purification and characterization of DNA polymerase from E. coli cells, a single-polypeptide enzyme now called DNA polymerase I (Mr 103,000; encoded by the polA gene). Much later, investigators found that E. coli contains at least four other distinct DNA polymerases.

The fundamental reaction is a phosphoryl group transfer. The nucleophile is the 3'-hydroxyl group of the nucleotide at the 3' end of the growing strand. Nucleophilic attack occurs at the α phosphorus of the incoming deoxynucleoside 5'-triphosphate. Inorganic pyrophosphate is released in the reaction. The general reaction is:

\[ \underset{\text{DNA}}{(\mathrm{dNMP})_n} + \mathrm{dNTP} \longrightarrow \underset{\text{Lengthened DNA}}{(\mathrm{dNMP})_{n+1}} + \mathrm{PP_i} \]

where dNMP and dNTP are deoxynucleoside 5'-monophosphate and 5'-triphosphate, respectively. The reaction appears to proceed with only a minimal change in free energy, given that one phosphodiester bond is formed at the expense of a somewhat less stable phosphate anhydride. However, noncovalent base-stacking and base-pairing interactions provide additional stabilization to the lengthened DNA product relative to the free nucleotide. Also, the formation of products is facilitated in the cell by the 19 kJ/mol generated in the subsequent hydrolysis of the pyrophosphate product by the enzyme pyrophosphatase.


Elongation of a DNA chain:

(a) DNA polymerase activity requires a single unpaired strand to act as template and a primer strand to provide a free hydroxyl group at the 3' end, to which a new nucleotide unit is added. Each incoming nucleotide is selected in part by base pairing to the appropriate nucleotide in the template strand. The reaction product has a new free 3'-hydroxyl, allowing the addition of another nucleotide.

(b) The catalytic mechanism likely involves two Mg²⁺ ions, coordinated to the phosphate groups of the incoming nucleotide triphosphate and to three Asp residues, two of which are highly conserved in all DNA polymerases. The top Mg²⁺ ion in the figure facilitates attack of the 3'-hydroxyl group of the primer on the α phosphate of the nucleotide triphosphate; the lower Mg²⁺ ion facilitates displacement of the pyrophosphate. 1


Mechanism of the DNA ligase reaction.

In each of the three steps, one phosphodiester bond is formed at the expense of another. Steps 1 and 2 lead to activation of the 5' phosphate in the nick. An AMP group is transferred first to a Lys residue on the enzyme and then to the 5' phosphate in the nick. In step 3, the 3'-hydroxyl group attacks this phosphate and displaces AMP, producing a phosphodiester bond to seal the nick. In the E. coli DNA ligase reaction, AMP is derived from NAD⁺. The DNA ligases isolated from a number of viral and eukaryotic sources use ATP rather than NAD⁺, and they release pyrophosphate rather than nicotinamide mononucleotide (NMN) in step 1. 1

Early work on DNA polymerase I led to the definition of two central requirements for DNA polymerization. First, all DNA polymerases require a template. The polymerization reaction is guided by a template DNA strand according to the base-pairing rules predicted by Watson and Crick: where a guanine is present in the template, a cytosine deoxynucleotide is added to the new strand, and so on.

Second, the polymerases require a primer. A primer is a strand segment (complementary to the template) with a free 3'-hydroxyl group to which a nucleotide can be added; the free 3' end of the primer is called the primer terminus. In other words, part of the new strand must already be in place: all DNA polymerases can only add nucleotides to a preexisting strand. Most primers are oligonucleotides of RNA rather than DNA, and specialized enzymes synthesize primers when and where they are required. 1
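The two requirements, a template and a primer, can be sketched as a toy model of the reaction (dNMP)n + dNTP → (dNMP)n+1 + PPi. The function below is a simplified illustration, not an enzyme model: strands are plain strings, the primer is written with DNA letters even though most real primers are RNA, and all names and sequences are hypothetical.

```python
# Watson-Crick pairing rules (A with T, G with C).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, primer_5to3, max_additions=None):
    """Extend a primer along a template, one nucleotide at a time.

    The template is written 3'->5' so that template[i] pairs with position i
    of the new strand, which grows in the 5'->3' direction.
    Returns the lengthened strand and the number of pyrophosphates released.
    """
    if not primer_5to3:
        raise ValueError("no primer: there is no free 3'-hydroxyl to extend")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    unpaired = template_3to5[len(primer_5to3):]
    if max_additions is not None:
        unpaired = unpaired[:max_additions]

    new_strand = primer_5to3
    pyrophosphates_released = 0
    for template_base in unpaired:
        # The incoming nucleotide is chosen by base pairing with the template
        # and joined to the 3' end; one pyrophosphate is released per addition.
        new_strand += PAIR[template_base]
        pyrophosphates_released += 1
    return new_strand, pyrophosphates_released


if __name__ == "__main__":
    print(extend_primer("TACGGT", "AT"))
```

Calling the function with an empty primer raises an error, mirroring the statement that polymerases can only add nucleotides to a preexisting strand.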

After adding a nucleotide to a growing DNA strand, a DNA polymerase either dissociates or moves along the template and adds another nucleotide. Dissociation and reassociation of the polymerase can limit the overall polymerization rate; the process is generally faster when a polymerase adds more nucleotides without dissociating from the template. The average number of nucleotides added before a polymerase dissociates defines its processivity. DNA polymerases vary greatly in processivity; some add just a few nucleotides before dissociating, others add many thousands.


Replication is very accurate

Replication proceeds with an extraordinary degree of fidelity. In E. coli, a mistake is made only once for every 10⁹ to 10¹⁰ nucleotides added. For the E. coli chromosome of ~4.6 × 10⁶ bp, this means that an error occurs only once per 1,000 to 10,000 replications. During polymerization, discrimination between correct and incorrect nucleotides relies not just on the hydrogen bonds that specify the correct pairing between complementary bases but also on the common geometry of the standard A=T and G≡C base pairs.

If the polymerase has added the wrong nucleotide, translocation of the enzyme to the position where the next nucleotide is to be added is inhibited. This kinetic pause provides the opportunity for a correction. The 3' to 5' exonuclease activity removes the mispaired nucleotide, and the polymerase begins again. This activity, known as proofreading, is not simply the reverse of the polymerization reaction, because pyrophosphate is not involved. 1

Additional accuracy is provided by a separate enzyme system that repairs the mismatched base pairs remaining after replication.

E. coli Has at Least Five DNA Polymerases

More than 90% of the DNA polymerase activity observed in E. coli extracts can be accounted for by DNA polymerase I.

A search for other DNA polymerases led to the discovery of E. coli DNA polymerase II and DNA polymerase III in the early 1970s. DNA polymerase II is an enzyme involved in one type of DNA repair. DNA polymerase III is the principal replication enzyme in E. coli. DNA polymerases IV and V, identified in 1999, are involved in an unusual form of DNA repair.

DNA polymerase I, then, is not the primary enzyme of replication; instead it performs a host of clean-up functions during replication, recombination, and repair. The polymerase’s special functions are enhanced by its 5' to 3' exonuclease activity. Most other DNA polymerases lack a 5' to 3' exonuclease activity. DNA polymerase III is much more complex than DNA polymerase I, having ten types of subunits. 1

Replication in E. coli requires not just a single DNA polymerase but 20 or more different enzymes and proteins, each performing a specific task. The entire complex has been termed the DNA replicase system or replisome.

Replication of the E. coli Chromosome Proceeds in Stages

The synthesis of a DNA molecule can be divided into three stages: initiation, elongation, and termination, distinguished both by the reactions taking place and by the enzymes required. The events described below reflect information derived primarily from in vitro experiments using purified E. coli proteins, although the principles are highly conserved in all replication systems.


Initiation

The E. coli replication origin, oriC, consists of 245 bp; it bears DNA sequence elements that are highly conserved among bacterial replication origins. At least nine different enzymes or proteins participate in the initiation phase of replication. They open the DNA helix at the origin and establish a prepriming complex for subsequent reactions. The crucial component in the initiation process is the DnaA protein.

Box 3. DNA Is Degraded by Nucleases

The enzymes that degrade DNA are known as nucleases, or DNases if they are specific for DNA rather than RNA. Every cell contains several different nucleases, belonging to two broad classes: exonucleases and endonucleases. Exonucleases degrade nucleic acids from one end of the molecule. Many operate in only the 5' to 3' or the 3' to 5' direction, removing nucleotides only from the 5' or the 3' end, respectively. Endonucleases can begin to degrade at specific internal sites in a nucleic acid strand or molecule, reducing it to smaller and smaller fragments. 1
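The difference between the two classes of nucleases in Box 3 can be sketched with simple string operations. This is a toy illustration only: the strand is a single string written 5' to 3', the function names are hypothetical, and the recognition site and cut position used in the example are chosen purely for illustration.

```python
def exonuclease_5to3(strand_5to3, count):
    """Remove `count` nucleotides from the 5' end of the strand."""
    return strand_5to3[count:]


def exonuclease_3to5(strand_5to3, count):
    """Remove `count` nucleotides from the 3' end of the strand."""
    return strand_5to3[:max(len(strand_5to3) - count, 0)]


def endonuclease(strand_5to3, site, cut_offset=1):
    """Cut the strand at every internal occurrence of `site`.

    The cut is placed `cut_offset` nucleotides into each site, and the
    resulting fragments are returned in 5'->3' order.
    """
    fragments = []
    start = 0
    position = strand_5to3.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(strand_5to3[start:cut])
        start = cut
        position = strand_5to3.find(site, position + 1)
    fragments.append(strand_5to3[start:])
    return fragments


if __name__ == "__main__":
    strand = "AAGAATTCGGGAATTCTT"  # hypothetical sequence
    print(exonuclease_5to3(strand, 2))
    print(exonuclease_3to5(strand, 2))
    print(endonuclease(strand, "GAATTC"))
```

The exonuclease functions only ever shorten the strand from one end, whereas the endonuclease function leaves both ends intact and returns several internal fragments.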

Initiation is the only phase of DNA replication regulated such that replication occurs only once in each cell cycle. The mechanism of regulation is not yet well understood, but genetic and biochemical studies have provided a few insights.

Elongation

The elongation phase of replication includes two distinct but related operations: leading strand synthesis and lagging strand synthesis. Several enzymes at the replication fork are important to the synthesis of both strands. Parent DNA is first unwound by DNA helicases, and the resulting topological stress is relieved by topoisomerases. Each separated strand is then stabilized by SSB. From this point, synthesis of leading and lagging strands is sharply different. 1

Leading strand synthesis, the more straightforward of the two, begins with the synthesis by primase (DnaG protein) of a short (10 to 60 nucleotide) RNA primer at the replication origin. Deoxyribonucleotides are added to this primer by DNA polymerase III. Leading strand synthesis then proceeds continuously, keeping pace with the unwinding of DNA at the replication fork.

Lagging strand synthesis is accomplished in short Okazaki fragments. First, an RNA primer is synthesized by primase and, as in leading strand synthesis, DNA polymerase III binds to the RNA primer and adds deoxyribonucleotides. On this level, the synthesis of each Okazaki fragment seems straightforward, but the reality is quite complex. The complexity lies in the coordination of leading and lagging strand synthesis: both strands are produced by a single asymmetric DNA polymerase III dimer, which is accomplished by looping the DNA of the lagging strand, bringing together the two points of polymerization.

Table 3. Proteins Required to Initiate Replication at the E. coli Origin 1

Protein                                     Mr        Number of subunits   Function
DnaA protein                                52,000    1                    Recognizes ori sequence; opens duplex at specific sites in origin
DnaB protein (helicase)                     300,000   6*                   Unwinds DNA
DnaC protein                                29,000    1                    Required for DnaB binding at origin
HU                                          19,000    2                    Histonelike protein; DNA-binding protein; stimulates initiation
Primase (DnaG protein)                      60,000    1                    Synthesizes RNA primers
Single-stranded DNA–binding protein (SSB)   75,600    4*                   Binds single-stranded DNA
RNA polymerase                              454,000   5                    Facilitates DnaA activity
DNA gyrase (DNA topoisomerase II)           400,000   4                    Relieves torsional strain generated by DNA unwinding
Dam methylase                               32,000    1                    Methylates (5')GATC sequences at oriC

* Subunits in these cases are identical.


The synthesis of Okazaki fragments on the lagging strand entails some elegant enzymatic choreography. The DnaB helicase and DnaG primase constitute a functional unit within the replication complex, the primosome. DNA polymerase III uses one set of its core subunits (the core polymerase) to synthesize the leading strand continuously, while the other set of core subunits cycles from one Okazaki fragment to the next on the looped lagging strand. The DnaB helicase unwinds the DNA at the replication fork as it travels along the lagging strand template in the 5' to 3' direction. DNA primase occasionally associates with DnaB helicase and synthesizes a short RNA primer. A new β sliding clamp is then positioned at the primer by the clamp-loading complex of DNA polymerase III. When synthesis of an Okazaki fragment has been completed, replication halts, and the core subunits of DNA polymerase III dissociate from their β sliding clamp (and from the completed Okazaki fragment) and associate with the new clamp. This initiates synthesis of a new Okazaki fragment. As noted earlier, the entire complex responsible for coordinated DNA synthesis at a replication fork is a replisome.

Once an Okazaki fragment has been completed, its RNA primer is removed and replaced with DNA by DNA polymerase I, and the remaining nick is sealed by DNA ligase. DNA ligase catalyzes the formation of a phosphodiester bond between a 3' hydroxyl at the end of one DNA strand and a 5' phosphate at the end of another strand.

Table 4. Proteins at the E. coli Replication Fork 1

Protein                             Mr        Number of subunits   Function
SSB                                 75,600    4                    Binding to single-stranded DNA
DnaB protein (helicase)             300,000   6                    DNA unwinding; primosome constituent
Primase (DnaG protein)              60,000    1                    RNA primer synthesis; primosome constituent
DNA polymerase III                  791,500   17                   New strand elongation
DNA polymerase I                    103,000   1                    Filling of gaps; excision of primers
DNA ligase                          74,000    1                    Ligation
DNA gyrase (DNA topoisomerase II)   400,000   4                    Supercoiling

Box 4. Molecular Weight, Molecular Mass, and Their Correct Units

There are two common (and equivalent) ways to describe molecular mass. The first is molecular weight, or relative molecular mass, denoted Mr. The molecular weight of a substance is defined as the ratio of the mass of a molecule of that substance to one-twelfth the mass of carbon-12 (¹²C). Since Mr is a ratio, it is dimensionless; it has no associated units.

The second is molecular mass, denoted m. This is simply the mass of one molecule, or the molar mass divided by Avogadro’s number. The molecular mass, m, is expressed in daltons (abbreviated Da). One dalton is equivalent to one-twelfth the mass of carbon-12; a kilodalton (kDa) is 1,000 daltons; a megadalton (MDa) is 1 million daltons. 1
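The relationship between Mr, daltons, and the mass of a single molecule can be shown with a short calculation. The sketch below uses the Mr of DNA polymerase I quoted earlier (103,000); the numerical values of the dalton and of Avogadro's number are standard constants that are not given in this text, and the function names are hypothetical.

```python
# Standard constants (not from this text).
DALTON_IN_GRAMS = 1.66053906660e-24   # mass of one dalton, in grams
AVOGADRO_NUMBER = 6.02214076e23       # molecules per mole


def molecular_mass_kda(relative_molecular_mass):
    """Mr is dimensionless; the molecular mass in daltons has the same number."""
    return relative_molecular_mass / 1000.0


def mass_of_one_molecule_grams(relative_molecular_mass):
    """Mass of a single molecule: its mass in daltons times grams per dalton."""
    return relative_molecular_mass * DALTON_IN_GRAMS


def molar_mass_grams_per_mole(relative_molecular_mass):
    """Molar mass: mass of one molecule multiplied by Avogadro's number."""
    return mass_of_one_molecule_grams(relative_molecular_mass) * AVOGADRO_NUMBER


if __name__ == "__main__":
    mr_pol_i = 103_000  # DNA polymerase I, as quoted in the text
    print(molecular_mass_kda(mr_pol_i), "kDa")
    print(mass_of_one_molecule_grams(mr_pol_i), "g per molecule")
    print(molar_mass_grams_per_mole(mr_pol_i), "g/mol")
```

The molar mass in grams per mole comes out numerically almost equal to Mr, which is why the two descriptions are treated as equivalent.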


DNA synthesis on the leading and lagging strands.

Events at the replication fork are coordinated by a single DNA polymerase III dimer, in an integrated complex with DnaB helicase. This figure shows the replication process. The lagging strand is looped so that DNA synthesis proceeds steadily on both the leading and lagging strand templates at the same time. Red arrows indicate the 3' end of the two new strands and the direction of DNA synthesis. Black arrows show the direction of movement of the parent DNA through the complex. An Okazaki fragment is being synthesized on the lagging strand. 1

Termination

Eventually, the two replication forks of the circular E. coli chromosome meet at a terminus region containing multiple copies of a 20 bp sequence called Ter (for terminus). The Ter sequences are arranged on the chromosome to create a sort of trap that a replication fork can enter but cannot leave. The Ter sequences function as binding sites for a protein called Tus (terminus utilization substance). The Tus-Ter complex can arrest a replication fork from only one direction. 1


4. Gene Mutation

A gene mutation is a permanent change in the DNA sequence that makes up a gene. Mutations range in size from a single DNA building block (DNA base) to a large segment of a chromosome.

Gene mutations occur in two ways: they can be inherited from a parent or acquired during a person’s lifetime. Mutations that are passed from parent to child are called hereditary mutations or germline mutations (because they are present in the egg and sperm cells, which are also called germ cells). This type of mutation is present throughout a person’s life in virtually every cell in the body.

Mutations that occur only in an egg or sperm cell, or those that occur just after fertilization, are called new (de novo) mutations. De novo mutations may explain genetic disorders in which an affected child has a mutation in every cell, but has no family history of the disorder.

Acquired (or somatic) mutations occur in the DNA of individual cells at some time during a person’s life. These changes can be caused by environmental factors such as ultraviolet radiation from the sun, or can occur if a mistake is made as DNA copies itself during cell division. Acquired mutations in somatic cells (cells other than sperm and egg cells) cannot be passed on to the next generation.

Mutations may also occur in a single cell within an early embryo. As all the cells divide during growth and development, the individual will have some cells with the mutation and some cells without the genetic change. This situation is called mosaicism.

Some genetic changes are very rare; others are common in the population. Genetic changes that occur in more than 1 percent of the population are called polymorphisms. They are common enough to be considered a normal variation in the DNA. Polymorphisms are responsible for many of the normal differences between people such as eye color, hair color, and blood type. Although many polymorphisms have no negative effects on a person’s health, some of these variations may influence the risk of developing certain disorders. 6

Gene mutations can affect health and development

To function correctly, each cell depends on thousands of proteins to do their jobs in the right places at the right times. Sometimes, gene mutations prevent one or more of these proteins from working properly. By changing a gene’s instructions for making a protein, a mutation can cause the protein to malfunction or to be missing entirely. When a mutation alters a protein that plays a critical role in the body, it can disrupt normal development or cause a medical condition. A condition caused by mutations in one or more genes is called a genetic disorder.

In some cases, gene mutations are so severe that they prevent an embryo from surviving until birth. These changes occur in genes that are essential for development, and often disrupt the development of an embryo in its earliest stages. Because these mutations have very serious effects, they are incompatible with life.

What kinds of gene mutations are possible?

The DNA sequence of a gene can be altered in a number of ways. Gene mutations have varying effects on health, depending on where they occur and whether they alter the function of essential proteins. The types of mutations include:

Missense mutation: This type of mutation is a change in one DNA base pair that results in the substitution of one amino acid for another in the protein made by a gene.

Nonsense mutation: A nonsense mutation is also a change in one DNA base pair. Instead of substituting one amino acid for another, however, the altered DNA sequence prematurely signals the cell to stop building a protein. This type of mutation results in a shortened protein that may function improperly or not at all.

Insertion: An insertion changes the number of DNA bases in a gene by adding a piece of DNA. As a result, the protein made by the gene may not function properly.

Deletion: A deletion changes the number of DNA bases by removing a piece of DNA. Small deletions may remove one or a few base pairs within a gene, while larger deletions can remove an entire gene or several neighboring genes. The deleted DNA may alter the function of the resulting protein(s).

Duplication: A duplication consists of a piece of DNA that is abnormally copied one or more times. This type of mutation may alter the function of the resulting protein.


Frameshift mutation: This type of mutation occurs when the addition or loss of DNA bases changes a gene’s reading frame. A reading frame consists of groups of 3 bases that each code for one amino acid. A frameshift mutation shifts the grouping of these bases and changes the code for amino acids. The resulting protein is usually nonfunctional. Insertions, deletions, and duplications can all be frameshift mutations.

Repeat expansion: Nucleotide repeats are short DNA sequences that are repeated a number of times in a row. For example, a trinucleotide repeat is made up of 3-base-pair sequences, and a tetranucleotide repeat is made up of 4-base-pair sequences. A repeat expansion is a mutation that increases the number of times that the short DNA sequence is repeated. This type of mutation can cause the resulting protein to function improperly.
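As a rough illustration of the mutation types listed above, the following Python sketch classifies the change between two short coding sequences. It assumes the standard genetic code, DNA written as the coding strand, and sequences that begin in frame. The function names (translate, classify_mutation) and the example sequences are hypothetical, and the "silent" label (a base change that leaves the protein unchanged) is not one of the types in the list above.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def classify_mutation(reference, mutant):
    """Give a simplified label for the difference between two coding sequences."""
    length_change = len(mutant) - len(reference)
    if length_change != 0:
        # Adding or losing a number of bases that is not a multiple of 3
        # shifts the reading frame.
        if length_change % 3 != 0:
            return "frameshift"
        return "in-frame insertion" if length_change > 0 else "in-frame deletion"

    ref_protein = translate(reference)
    mut_protein = translate(mutant)
    if ref_protein == mut_protein:
        return "silent"
    if len(mut_protein) < len(ref_protein):
        return "nonsense"  # a premature stop codon shortened the protein
    return "missense"      # simplification: any other change in the protein


if __name__ == "__main__":
    print(classify_mutation("ATGGAGTAA", "ATGGTGTAA"))   # GAG (Glu) -> GTG (Val)
    print(classify_mutation("ATGTGGTAA", "ATGTGATAA"))   # TGG (Trp) -> TGA (stop)
    print(classify_mutation("ATGGAGTAA", "ATGGGAGTAA"))  # one extra base
```

The sketch compares whole translated proteins rather than locating the changed base, which keeps it short; it does not attempt to recognise duplications or repeat expansions.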

Box 5. What is a transcriptome? 1

A transcriptome is a collection of all the gene transcripts present in a given cell.

Genes are made up of helical molecules of deoxyribonucleic acid (DNA) that contain the blueprints for making proteins. In order to actually produce proteins, these DNA blueprints must be transcribed into corresponding molecules of ribonucleic acid (RNA), referred to as messenger RNA (mRNA) or gene transcripts.

The mRNA molecules then deliver the instructions for making proteins to ribosomes, which are tiny molecular "machines" found in the cytoplasm of the cell. In a process called translation, ribosomes "read" the mRNA's sequence and produce a protein by assembling amino acid building blocks in the precise order specified by the genetic code.

In addition to the thousands of genes that code for proteins, there exists a different sort of gene that is transcribed into RNA molecules but that does not code for proteins. Such gene transcripts, referred to as non-coding RNA, play roles in the structure of cell components and the regulation of DNA expression.

5. Genomics

5.1 The Human Genome Project

The Human Genome Project, which was led at the National Institutes of Health (NIH) by the National Human Genome Research Institute, produced a very high-quality version of the human genome sequence that is freely available in public databases. That international project was successfully completed in April 2003, under budget and more than two years ahead of schedule.

The sequence is not that of one person, but is a composite derived from several individuals. Therefore, it is a "representative" or generic sequence. To ensure anonymity of the DNA donors, more blood samples (nearly 100) were collected from volunteers than were used, and no names were attached to the samples that were analyzed. Thus, not even the donors knew whether their samples were actually used. 6

The Human Genome Project was designed to generate a resource that could be used for a broad range of biomedical studies. One such use is to look for the genetic variations that increase risk of specific diseases, such as cancer, or to look for the type of genetic mutations frequently seen in cancerous cells. More research can then be done to fully understand how the genome functions and to discover the genetic basis for health and disease.

The International HapMap Project, in which NIH also played a leading role, represents a major step in that direction. In October 2005, the project published a comprehensive map of human genetic variation that is already speeding the search for genes involved in common, complex diseases, such as heart disease, diabetes, blindness, and cancer.

Another initiative that builds upon the tools and technologies created by the Human Genome Project is The Cancer Genome Atlas pilot project. This three-year pilot, which was launched in December 2005, will develop and test strategies for a comprehensive exploration of the universe of genetic factors involved in cancer. 6


Box 6. Some definitions 6

Gene: The functional and physical unit of heredity passed from parent to offspring. Genes are pieces of DNA, and most genes contain the information for making a specific protein.

Intron: A noncoding sequence of DNA that is initially copied into RNA but is cut out of the final RNA transcript.

Exon: The region of a gene that contains the code for producing the gene's protein. Each exon codes for a specific portion of the complete protein. In some species (including humans), a gene's exons are separated by long regions of DNA (called introns or sometimes "junk DNA") that have no apparent function.

Codon: Three bases in a DNA or RNA sequence that specify a single amino acid.

Oncogene: A gene that is capable of causing the transformation of normal cells into cancer cells.

Microsatellite: Also called a short tandem repeat or STR, a series of repeated nucleotides (e.g., CACACACA) in which the number of repeats varies between individuals.

Gene transfer: Insertion of unrelated DNA into the cells of an organism. There are many different reasons for gene transfer, for example attempting to treat disease by supplying patients with therapeutic genes. There are also many possible ways to transfer genes. Most involve the use of a vector, such as a specially modified virus that can take the gene along when it enters the cell.

DNA Sequencing

Sequencing simply means determining the exact order of the bases in a strand of DNA. Because bases exist as pairs, and the identity of one of the bases in the pair determines the other member of the pair, researchers do not have to report both bases of the pair. In the most common type of sequencing used today, called the chain termination method, a DNA strand is treated with a variety of nucleotides, a set of enzymes, and a specific primer to generate a collection of smaller DNA fragments. Four fluorescent tags, each specific for a given base, are part of the mixture. Each of the fragments differs in length by one base and is marked with a fluorescent tag that identifies the last base of the fragment. The fragments are then separated according to size and passed by a detector that reads the fluorescent tag. Then, a computer reconstructs the entire sequence of the long DNA strand by identifying the base at each position from the size of each fragment and the particular fluorescent signal at its end.

At present, this technology can determine the order of only up to 800 base pairs of DNA at a time. So, to assemble the sequence of all the bases in a large piece of DNA, such as a gene, researchers need to read the sequence of overlapping segments. This allows the longer sequence to be assembled from shorter pieces, somewhat like putting together a linear jigsaw puzzle. In this process, each base has to be read not just once, but at least several times in the overlapping segments to ensure accuracy.

Researchers can use DNA sequencing to search for genetic variations and/or mutations that may play a role in the development or progression of a disease. The disease-causing change may be as small as the substitution, deletion, or addition of a single base pair or as large as a deletion of thousands of bases. 6
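Two ideas from the passage above can be sketched in a few lines of Python: one strand determines its partner, and a longer sequence can be rebuilt from overlapping reads. This is a toy illustration only. The function names and the three short reads are hypothetical, the reads are assumed to be error-free, to come from the same strand, and to be supplied in left-to-right order. Real assembly software has to cope with none of those assumptions holding.

```python
# Base-pairing rule: A pairs with T, and C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the partner strand, read in its own 5' to 3' direction."""
    return strand.translate(COMPLEMENT)[::-1]


def merge_reads(left, right, min_overlap=3):
    """Join two reads using the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("reads do not overlap enough to be joined")


def assemble(reads, min_overlap=3):
    """Merge reads that are already ordered from left to right."""
    contig = reads[0]
    for read in reads[1:]:
        contig = merge_reads(contig, read, min_overlap)
    return contig


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up overlapping reads
    contig = assemble(reads)
    print(contig)                      # ATGGCGTACCTGA
    print(reverse_complement(contig))  # the other strand of the same piece
```

Requiring a minimum overlap is one simple way to avoid joining reads that share only a base or two by chance.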

The functions of human genes

The functions of about half of the 30,000 to 40,000 human genes are known. The vast majority code for proteins; fewer than 2,500 specify the various types of non-coding RNA. Almost a quarter of the protein-coding genes are involved in expression, replication and maintenance of the genome, and another 20% specify components of the signal transduction pathways that regulate genome expression and other cellular activities in response to signals received from outside of the cell. All of these genes can be looked on as having a function that is involved in one way or another with the activity of the genome. Enzymes responsible for the general biochemical functions of the cell account for another 17.5% of the known genes; the remainder are involved in activities such as transport of compounds into and out of cells, the folding of proteins into their correct three-dimensional structures, the immune response, and synthesis of structural proteins such as those found in the cytoskeleton and in muscles.

Box 7. What are genetic markers?

Markers themselves usually consist of DNA that does not contain a gene; however, they can tell a researcher the identity of the person a DNA sample came from. This makes markers extremely valuable for tracking inheritance of traits through generations of a family, and markers have also proven useful in criminal investigations and other forensic applications.

Although there are several different types of genetic markers, the type most used on genetic maps today is the microsatellite. However, maps of even higher resolution are being constructed using single-nucleotide polymorphisms, or SNPs (pronounced "snips"). Both types of markers are easy to use with automated laboratory equipment, so researchers can rapidly map a disease or trait in a large number of family members. 6
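To make the idea of a microsatellite marker more concrete, the short Python sketch below counts how many times a repeat unit occurs in a row. It is only an illustration: the function name, the CA repeat unit, and the two sequences are invented, and laboratory genotyping measures fragment lengths rather than scanning text.

```python
import re


def longest_repeat_run(sequence, unit="CA"):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical alleles from two people at the same marker position.
    person_a = "GGT" + "CA" * 4 + "TT"
    person_b = "GGT" + "CA" * 6 + "TT"
    print(longest_repeat_run(person_a))  # 4
    print(longest_repeat_run(person_b))  # 6
```

Because the repeat count differs between the two sequences, the marker distinguishes the two samples even though the flanking DNA is identical.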

Genes and related sequences

We look on the genes as the most important part of the human genome because these are the parts that contain biological information. Most genes specify one or more protein molecules, the "expression" of these genes involving an RNA intermediate, called messenger RNA or mRNA, which is transported from the nucleus to the cytoplasm where it directs synthesis of the protein coded by the gene. Other genes don't specify proteins, the end-products of their expression being non-coding RNA, which plays various roles in the cell.

Most human genes are discontinuous, the biological information being divided into a series of exons separated by non-coding introns, with an average of nine exons per gene, although some genes have many more than this.

During gene expression, the initial RNA that is synthesized is a copy of the entire gene, including the introns as well as the exons. A process called splicing removes the introns from this pre-mRNA and joins the exons together to make the mRNA, which eventually directs protein synthesis. 2
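The splicing step described above can be pictured with a very small Python sketch. The pre-mRNA sequence, the exon coordinates, and the function name are all hypothetical, and the coordinates are simply given rather than discovered, which is not how the cell recognises exon boundaries.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, dropping everything in between.

    `exons` is a list of (start, end) positions, 0-based, end not included.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    # A made-up transcript with two exons split by a single intron.
    exon_1 = "AUGGCC"
    intron = "GUAAGUUUUCAG"
    exon_2 = "UUUUAA"
    pre_mrna = exon_1 + intron + exon_2

    exons = [(0, 6), (18, 24)]
    print(splice(pre_mrna, exons))  # AUGGCCUUUUAA
```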

The average structure of a gene

[Figure: a gene drawn as Exon, Intron, Exon, with the start and the end of the biological information marked and the flanking DNA labelled Upstream and Downstream.]

This gene has two exons split by a single intron. "Upstream" and "downstream" are two useful terms used to indicate the DNA sequences to either side of the gene.


5.2 Single Nucleotide Polymorphisms (SNPs)

All human beings differ from one another:
• physical appearance
• susceptibility to disease
• response to medications

DNA and genes are the blueprint from which we are made. Differences between people are evident in the sequences of their DNA.

Scientists have found that many of these differences are single-nucleotide substitutions in the DNA sequence. These are referred to as Single Nucleotide Polymorphisms, or SNPs. What do these SNPs look like at the DNA level?

Scientists estimate that our DNA contains at least ten million SNPs.

Here we have two short sequences of DNA taken from the same region of the genome of two different people.

We can see that the sequences are almost exactly the same, except at one nucleotide position. 6
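The comparison described above amounts to lining up two sequences and reporting the positions where they differ. A minimal Python sketch follows; the two sequences and the function name are invented for illustration, and the sequences are assumed to be the same length and already aligned.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for every mismatching position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


if __name__ == "__main__":
    person_1 = "AAGGCTAA"  # hypothetical sequence from one person
    person_2 = "ATGGCTAA"  # same region in another person
    print(find_snps(person_1, person_2))  # [(1, 'A', 'T')]
```

Positions are counted from zero, so the single difference here is at the second base.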

"Dictionary" of amino acid code words in mRNAs. The codons are written in the 5' to 3' direction. The third base of each codon (in bold type) plays a lesser role in specifying an amino acid than the first two. The three termination codons are shaded in pink, the initiation codon AUG in green. All the amino acids except methionine and tryptophan have more than one codon. In most cases, codons that specify the same amino acid differ only at the third base. 1


Haplotype

In the real world, of course, things are more complicated. A single SNP doesn’t usually tell us very much.

In order to find an association between SNPs and a response to a medication, scientists have to look at multiple SNPs across a longer stretch of DNA.

These three SNPs can be arranged in 2^3 (that’s three SNPs, each with two possible nucleotides), or eight, different combinations.

Each combination of SNPs is called a haplotype. So we can say that along this region of DNA, there are eight possible haplotypes.
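The count of eight follows from multiplying the choices at each position (2 x 2 x 2). As a small check, the Python sketch below lists every combination; the particular nucleotides at each SNP are made up for the example.

```python
from itertools import product

# Hypothetical alleles: each of the three SNPs has two possible nucleotides.
snp_alleles = [("A", "G"), ("C", "T"), ("G", "T")]

# Every way of choosing one nucleotide per SNP is one possible haplotype.
haplotypes = ["".join(choice) for choice in product(*snp_alleles)]

print(len(haplotypes))  # 8
print(haplotypes)
```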

When we look at the DNA samples from a large group of people, we see that only four of these combinations are actually present. This is typical of what we see in the real world. We often see that only some of the possible haplotypes actually exist.

Don’t forget about genetics! In the world of genetics, everything comes in pairs. We each get one haplotype from our mother, and a second haplotype from our father. This means that we actually have two haplotypes, or a haplotype pair.

You can think of a person’s haplotype pair as their “SNP profile”. When scientists look at an individual’s response to a drug, they need to consider that person’s unique SNP profile. The two haplotypes in a pair can be different, or they can be the same.

Applying SNP Profiles to Drug Choices

Albuterol

The drug albuterol is commonly prescribed to relieve the symptoms of asthma. Albuterol effectively relieves asthma symptoms in some people but not in others. Scientists are currently studying how people with different SNP profiles respond to treatment with albuterol.


Here is what we know today: albuterol acts on the beta-2 adrenergic receptor (beta2AR protein) to relieve asthma attacks. The beta2AR protein is encoded by the ADRB2 gene. 6

In analyzing a 3,000-base-pair stretch of DNA in the region of the ADRB2 gene, scientists have identified 13 locations where SNPs exist.

Scientists have looked at this region of DNA in many different people, and have identified 12 different haplotypes, which are unique combinations of these 13 SNPs.

We need to remember that haplotypes come in pairs (one from Mom, and one from Dad). Each distinct haplotype pair represents a unique SNP profile.

In people with asthma, the SNP profile affects the response to albuterol: one profile may be associated with a good response, another with a poorer response, and another with no response.
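Since a SNP profile is a pair of haplotypes in which the two members may be the same or different, the number of possible profiles can be counted directly. The Python sketch below does this for 12 haplotypes; the labels H1 to H12 are placeholders, and the count is only an upper bound, because nothing in the text says that every possible pair is actually found in people.

```python
from itertools import combinations_with_replacement


def possible_profiles(haplotypes):
    """List every unordered haplotype pair, including pairs of identical haplotypes."""
    return list(combinations_with_replacement(haplotypes, 2))


if __name__ == "__main__":
    haplotypes = [f"H{i}" for i in range(1, 13)]  # placeholder names H1..H12
    profiles = possible_profiles(haplotypes)
    print(len(profiles))  # 78, i.e. 12 * 13 / 2
```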

In the future, a physician will be able to determine a patient's SNP profile, compare it with known data, and predict whether the patient will respond to the drug albuterol.

The physician can then design the patient’s treatment accordingly. This will be a great improvement over the trial-and-error method physicians use today.

6. Cloning

The term cloning describes a number of different processes that can be used to produce genetically identical copies of a biological entity. The copied material, which has the same genetic makeup as the original, is referred to as a clone.

Researchers have cloned a wide range of biological materials, including genes, cells, tissues and even entire organisms, such as a sheep. 6

Cloning also occurs naturally. In nature, some plants and single-celled organisms, such as bacteria, produce genetically identical offspring through a process called asexual reproduction. In asexual reproduction, a new individual is generated from a copy of a single cell from the parent organism.

Natural clones, also known as identical twins, occur in humans and other mammals. These twins are produced when a fertilized egg splits, creating two or more embryos that carry almost identical DNA. Identical twins have nearly the same genetic makeup as each other, but they are genetically different from either parent.


Types of artificial cloning

There are three different types of artificial cloning: gene cloning, reproductive cloning and therapeutic cloning.

Gene cloning produces copies of genes or segments of DNA. Reproductive cloning produces copies of whole animals. Therapeutic cloning produces embryonic stem cells for experiments aimed at creating tissues to replace injured or diseased tissues.

Gene cloning, also known as DNA cloning, is a very different process from reproductive and therapeutic cloning. Reproductive and therapeutic cloning share many of the same techniques, but are done for different purposes.

How are genes cloned?

Researchers routinely use cloning techniques to make copies of genes that they wish to study. The procedure consists of inserting a gene from one organism, often referred to as "foreign DNA," into the genetic material of a carrier called a vector. Examples of vectors include bacteria, yeast cells, viruses or plasmids, which are small DNA circles carried by bacteria. After the gene is inserted, the vector is placed in laboratory conditions that prompt it to multiply, resulting in the gene being copied many times over.

How are animals cloned?

The technique used to clone whole animals, such as sheep, is referred to as reproductive cloning. In reproductive cloning, researchers remove a mature somatic cell, such as a skin cell or an udder cell, from an animal that they wish to copy. They then transfer the DNA of the donor animal's somatic cell into an egg cell, or oocyte, that has had its own DNA-containing nucleus removed.

Researchers can add the DNA from the somatic cell to the empty egg in two different ways. In the first method, they remove the DNA-containing nucleus of the somatic cell and inject it into the empty egg. In the second approach, they use an electrical current to fuse the entire somatic cell with the empty egg.

In both processes, the egg is allowed to develop into an early-stage embryo in the test tube and then is implanted into the womb of an adult female animal. Ultimately, the adult female gives birth to an animal that has the same genetic makeup as the animal that donated the somatic cell. This young animal is referred to as a clone. Reproductive cloning may require the use of a surrogate mother to allow development of the cloned embryo, as was the case for the most famous cloned organism, Dolly the sheep.

What is therapeutic cloning?

Therapeutic cloning involves creating a cloned embryo for the sole purpose of producing embryonic stem cells with the same DNA as the donor cell. These stem cells can be used in experiments aimed at understanding disease and developing new treatments for disease. To date, there is no evidence that human embryos have been produced for therapeutic cloning.

The richest source of embryonic stem cells is tissue formed during the first five days after the egg has started to divide. At this stage of development, called the blastocyst, the embryo consists of a cluster of about 100 cells that can become any cell type. Stem cells are harvested from cloned embryos at this stage of development, resulting in destruction of the embryo while it is still in the test tube.

In November 2007, using a new cloning method that removes the egg's nucleus without dyes or ultraviolet light, researchers produced the first primate embryonic stem cells from a cloned embryo. The work involved transferring the nucleus of a skin cell from a male rhesus monkey into the nucleus-free egg of a female rhesus monkey. These embryonic stem cells did not develop into a whole monkey, and researchers said their work was aimed at therapeutic applications. However, the research shows that, with some adjustments, the techniques used to make whole copies of other animals may also work in primates.


Schematic illustration of DNA cloning. A cloning vector and eukaryotic chromosomes are separately cleaved with the same restriction endonuclease. The fragments to be cloned are then ligated to the cloning vector. The resulting recombinant DNA (only one recombinant vector is shown here) is introduced into a host cell where it can be propagated (cloned). Note that this drawing is not to scale: the size of the E. coli chromosome relative to that of a typical cloning vector (such as a plasmid) is much greater than depicted here. 1
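The cut-and-ligate steps in the caption above can be loosely sketched in Python. The sketch assumes the enzyme is EcoRI (recognition site GAATTC, cut after the first G), treats both molecules as linear single-strand strings, and uses made-up sequences and function names. None of these details come from the figure, and a real plasmid is circular and double-stranded.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (1 for the EcoRI pattern G^AATTC assumed here).
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + cut_offset])
        start = position + cut_offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


def ligate(vector_left, insert, vector_right):
    """Join the insert between the two arms of the opened vector."""
    return vector_left + insert + vector_right


if __name__ == "__main__":
    vector = "CCGAATTCGG"               # hypothetical vector with one site
    donor = "TTGAATTCATGCATGAATTCAA"    # hypothetical donor DNA with two sites

    vector_left, vector_right = digest(vector)
    _, insert, _ = digest(donor)        # keep the fragment between the two sites

    recombinant = ligate(vector_left, insert, vector_right)
    print(recombinant)  # CCGAATTCATGCATGAATTCGG
```

Because vector and donor were cut with the same enzyme, both junctions in the joined sequence read GAATTC again, which is the text-level counterpart of matching ends being ligated.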


References

Books

1. Lehninger, Principles of Biochemistry, Fourth Edition
2. T. A. Brown, Genomes, Second Edition
3. Harper’s Illustrated Biochemistry, Twenty-Sixth Edition
4. Koolman & Roehm, Color Atlas of Biochemistry, Second Edition
5. Trudy McKee, James R. McKee, The Molecular Basis of Life, Third Edition

Internet

6. http://www.genome.gov
7. http://www.ghr.nlm.nih.gov