
DNA Properties CSE, Marmara University mimoza.marmara.edu.tr/~m.sakalli/cse546 Oct/19/09 Computational Molecular Biology Bioinformatics Genomics Proteomics.

Dec 21, 2015

Page 1: DNA Properties CSE, Marmara University mimoza.marmara.edu.tr/~m.sakalli/cse546 Oct/19/09 Computational Molecular Biology Bioinformatics Genomics Proteomics.

DNA Properties
CSE, Marmara University
mimoza.marmara.edu.tr/~m.sakalli/cse546
Oct/19/09

[Diagram of related fields: Computational Molecular Biology, Bioinformatics, Genomics, Proteomics, Functional genomics, Structural bioinformatics]

Page 2:

There is no simple definition of being alive (life). Reproducing itself is a default mechanism for every living being. But how about computer programs, crystals, and self-building, self-learning robots and computers?

Life on earth is the result of an evolutionary process; the idea is that all living things have a common ancestor and are related through…

Basic components of evolution:
Inheritance
Variation: defined legal moves in genotype space.
Selection: a probabilistic evaluation function.

In computer science terms, DNA is a string of symbols from the alphabet {A,C,G,T}.

Evolution is a search through a very large space of possible organism characteristics.

The words built from the four-letter alphabet cover all the inherited characteristics (called the genotype) of all organisms.
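
The string view of DNA can be made concrete with a minimal sketch. The function names `is_dna` and `genotype_space_size` are hypothetical, chosen only for this illustration; the second one shows how quickly the search space grows with sequence length.

```python
ALPHABET = frozenset("ACGT")  # the four-letter DNA alphabet


def is_dna(seq: str) -> bool:
    """Return True if every symbol of seq belongs to {A, C, G, T}."""
    return set(seq.upper()) <= ALPHABET


def genotype_space_size(length: int) -> int:
    """Number of distinct DNA strings of a given length: 4 ** length."""
    return 4 ** length


print(is_dna("GATTACA"))        # a valid DNA word
print(is_dna("GAUUACA"))        # contains U, so it is not a DNA word
print(genotype_space_size(10))  # number of possible words of length 10
```

Even a word of length 10 has more than a million variants, which is why evolution can be described as a search through a very large space.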

Page 3:

The Central Dogma in molecular biology

http://proquestcombo.safaribooksonline.com/0596002998/blast-CHP-2

3 processes: Replication, Transcription, and Translation.

Nearly every cell in our body has 23 pairs of chromosomes in the nucleus, and the genes in these chromosomes are responsible for almost all of our characteristics (not merely the physical ones).
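
Two of the three processes, replication and transcription, can be sketched as simple string operations. This is an illustration only: the function names are hypothetical, strands are assumed to be written 5' to 3', and all cellular machinery is ignored.

```python
# Watson-Crick partner of each DNA base.
PAIR = str.maketrans("ACGT", "TGCA")


def replicate(strand: str) -> str:
    """Return the complementary DNA strand, written 5'->3' (reverse complement)."""
    return strand.translate(PAIR)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the mRNA: the coding-strand sequence with T replaced by U."""
    return coding_strand.replace("T", "U")


dna = "GAATTC"
print(replicate(dna))   # partner strand of GAATTC
print(transcribe(dna))  # RNA copy of GAATTC
```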

Page 4:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mboc4.figgrp.600, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter

Figure 4-5. The DNA double helix. (A) A space-filling model of 1.5 turns of the DNA double helix. Each turn of DNA is made up of 10.4 nucleotide pairs and the center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of the two strands around each other creates two grooves in the double helix. As indicated in the figure, the wider groove is called the major groove, and the smaller the minor groove. (B) A short section of the double helix viewed from its side, showing four base pairs. The nucleotides are linked together covalently by phosphodiester bonds through the 3′-hydroxyl (-OH) group of one sugar and the 5′-phosphate (P) of the next. Thus, each polynucleotide strand has a chemical polarity; that is, its two ends are chemically different. The 3′ end carries an unlinked -OH group attached to the 3′ position on the sugar ring; the 5′ end carries a free phosphate group attached to the 5′ position on the sugar ring.
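
The helix geometry lends itself to a quick back-of-the-envelope calculation. The sketch below assumes the standard B-DNA rise of 0.34 nm per nucleotide pair (that is, roughly 3.4 nm per full turn) together with the 10.4 pairs per turn quoted in the caption; the constant and function names are hypothetical.

```python
RISE_NM_PER_BP = 0.34  # assumed B-DNA rise per nucleotide pair, in nm
BP_PER_TURN = 10.4     # nucleotide pairs per helical turn (from the caption)


def helix_length_nm(n_bp: float) -> float:
    """Length of a straight double helix with n_bp nucleotide pairs, in nm."""
    return n_bp * RISE_NM_PER_BP


def helix_turns(n_bp: float) -> float:
    """Number of helical turns in n_bp nucleotide pairs."""
    return n_bp / BP_PER_TURN


human_bp = 3e9  # approximate haploid human genome size
print(helix_length_nm(human_bp) / 1e9, "m")  # nm converted to metres
print(helix_turns(human_bp), "turns")
```

Under these assumptions a single copy of the human genome, stretched out, is about one metre long.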

Page 5:

Polymer of:
Deoxyribose sugar (ribose in RNA)
Phosphate
Nitrogenous base

Bases:
A, C, G, T (and uracil, U, which replaces T in RNA)

Pairing rule:
A (R) - T (Y)
G (R) - C (Y)
R = puRine, Y = pYrimidine

DNA structure and base pairing

Why double-stranded?

Chemically and biophysically more stable; allows some error correction (backup) if accidentally damaged, for example by UV irradiation.
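
The pairing rule and the "backup" idea can be sketched together. This is a toy model, not a description of real repair enzymes: a damaged base is marked with the hypothetical symbol `N`, and the two strands are given position-aligned so that index i of one strand faces index i of the other.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick pairs
PURINES = {"A", "G"}                                # R; the others are pyrimidines (Y)


def is_valid_pair(a: str, b: str) -> bool:
    """True if bases a and b form a Watson-Crick pair."""
    return PARTNER.get(a) == b


def repair(damaged: str, partner_strand: str) -> str:
    """Restore every position marked 'N' from the base facing it on the partner strand."""
    return "".join(
        PARTNER[p] if d == "N" else d
        for d, p in zip(damaged, partner_strand)
    )


# Every valid pair joins one purine with one pyrimidine.
print(all((a in PURINES) != (b in PURINES) for a, b in PARTNER.items()))
# The lost base at index 2 is recovered from the intact partner strand.
print(repair("GANTC", "CTAAG"))
```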

Page 6:
Page 7:

Genes (less than 5% of all the DNA) provide the coding information: instructions for protein synthesis and regulatory functions.
Redundancy translates to robustness:
Synonymous codons
Dual strands
Diploid

In translation, the information now encoded in RNA is deciphered (translated) into instructions for making a protein. Codon: a set of three nucleotides. A codon determines which amino acid is added next to the protein chain. For example, GCU, the first codon in the figure, codes for alanine.
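
Translation can be sketched as a table lookup over consecutive codons. The block below builds the standard genetic code from the conventional U, C, A, G ordering and uses one-letter amino acid codes, with `*` marking a stop codon. The function name `translate` is hypothetical, and the sketch simply starts reading at the first nucleotide.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the chain
            break
        protein.append(aa)
    return "".join(protein)


print(CODON_TABLE["GCU"])      # alanine, one-letter code A
print(translate("AUGGCUUAA"))  # Met, Ala, then stop
```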

RNA - Translation

Page 8:

The table of nucleotide triplets (codons) and their corresponding amino acids (aa). In RNA, uracil (U) is substituted for thymine (T). The code is nearly universal.

The RNA alphabet is A, C, G, and U, for example GAAUUC.

The third position of a codon is often insignificant.

ATG (AUG in RNA): start codon (methionine).

T (U in RNA) in the middle position: hydrophobic aa.

There are 64 possible codons but only 20 amino acids in total, plus start and stop signals (or regulatory functions).

[Codon table figure. Axes: 1st nt position (U, C, A, G); 2nd nt position (U, C, A, G); 3rd nt position (U, C, A, G).]
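
Since the codon table itself did not survive as text, the sketch below rebuilds the standard genetic code (one-letter amino acid codes, `*` for stop) and checks the counting claims made on this slide. The layout printed here is only one possible arrangement and is not necessarily the one used in the original figure.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons encode each amino acid (the redundancy of the code).
degeneracy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
# Amino acids whose codons have U in the middle position.
middle_u = {aa for codon, aa in CODON_TABLE.items() if codon[1] == "U"}

# Print the table: one line per (1st, 3rd) base, one column per 2nd base.
for first in BASES:
    for third in BASES:
        print("  ".join(
            f"{first}{second}{third}={CODON_TABLE[first + second + third]}"
            for second in BASES
        ))

print(len(CODON_TABLE), "codons,", len(degeneracy) - 1, "amino acids")
print("stop codons:", stop_codons)
print("U in the middle:", sorted(middle_u))
```

The middle-U set contains only F, L, I, M and V, all of which are hydrophobic, which is the pattern the slide points out.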

Page 9:

SNP (single nucleotide polymorphism), wobble in the code, neutral synonymous mutations.

Some changes at the third position of a codon, such as the point mutation shown below, yield no change in the amino acid sequence and hence none in the protein produced. For example, GCU and GCC both code for alanine, so a point mutation from U to C makes no difference.

GCUAGGAUCUCAGGCUCA

GCCAGGAUCUCAGGCUCA   (point mutation)
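
The claim can be checked directly on the two sequences from the slide. The sketch assumes the standard genetic code (one-letter amino acid codes, `*` for stop) and reads both sequences from their first nucleotide; the helper name `translate` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate every complete codon of mrna, starting at position 0."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


original = "GCUAGGAUCUCAGGCUCA"
mutated = "GCCAGGAUCUCAGGCUCA"

# Positions (0-based) where the two sequences differ.
changed = [i for i, (a, b) in enumerate(zip(original, mutated)) if a != b]

print(changed)              # the single changed position
print(translate(original))  # peptide from the original sequence
print(translate(mutated))   # peptide from the mutated sequence
```

The only difference is the third nucleotide of the first codon, and both sequences translate to the same peptide, so this mutation is synonymous.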

Protein-coding sequences are called exons. The intervening, non-coding DNA segments are introns. Both introns and exons are transcribed into the primary RNA transcript (see next slide), but only exons remain in the final mRNA. Reading frames: a sequence can be read in 6 possible ways (3 offsets on each of the 2 strands), and a frameshift changes every codon that follows it. Therefore it is important to know which codon to start translation with, and where to stop.

http://en.wikipedia.org/wiki/Gene
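
The six possible readings can be enumerated with a short sketch: three offsets on the given strand and three on its reverse complement. The frame labels `+1` to `-3` and the example sequence are hypothetical, chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict:
    """Split dna into codons for each of the six reading frames."""
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames


for name, codons in six_frames("ATGGCATAA").items():
    print(name, codons)
```

Only frame `+1` of this example reads as start codon, one codon, stop codon; shifting by a single nucleotide produces entirely different codons.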

Page 10:

Splicing of RNA to eliminate introns

A Science Primer http://www.ncbi.nlm.nih.gov/About/primer/est.html

A protein-coding region framed by the start codon, Met (ATG), and any stop codon (TAA, TAG, or TGA) is called an open reading frame (ORF). An example of an ORF:

….TCGAATGGCATTCGCAGTC…………..TACTTGCACGCTTGACCGTCATAAGCA….
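
A naive ORF scan over the three forward frames can be sketched as follows. The function name `find_orfs` and the test sequence are hypothetical (the slide's own example is elided in the middle, so it is not reused here), and real gene finders apply many more criteria, such as minimum length and the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list:
    """Return (start, end, sequence) for each ATG...stop stretch in the 3 forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # closed by an in-frame stop
    return sorted(orfs)


print(find_orfs("CCATGGCATTCTAAGG"))
```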

In addition, each of the 20 amino acids has different chemical properties, which cause the protein chains to fold into different 3D shapes and differentiate their particular functions in the cell.

For example, certain folding patterns (called tertiary structures) make it possible for specific enzymes to bind in a particular place. One change in the DNA sequence could change an amino acid, which could change the protein structure, and with it the binding of those enzymes.

Page 11:

Levels and types of genome variations

Plant genomes may differ from one another in different ways:
http://www.igd.cornell.edu/Comparative%20Genomics

1. Amount of DNA in the nucleus. Quantified in picograms (also called the C-value), it varies over 1000-fold.
2. Number and size of chromosomes.
3. Differences at the sequence level, both in the absolute order of the bases and in the type and number of different classes of sequences.

Organisms that originated, millions of years ago, from the same ancestral sequence should still share similar sequence structures; this is the basis of the family tree (phylogeny).

Some of the mechanisms of genetic variation:
Point mutations
Insertions and deletions
Translocations
Transposons (mobile "jumping genes"); retrotransposons copy themselves from RNA back to DNA using reverse transcriptase
Splicing, transcription and translation errors
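
Several of these mechanisms can be pictured as edit operations on a sequence string. The sketch below is a toy model with hypothetical function names; positions are 0-based, and a translocation is simplified to cutting a segment and reinserting it elsewhere in the same sequence.

```python
def point_mutation(seq: str, pos: int, base: str) -> str:
    """Replace the base at pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq: str, pos: int, fragment: str) -> str:
    """Insert fragment before position pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq: str, pos: int, length: int) -> str:
    """Remove length bases starting at pos."""
    return seq[:pos] + seq[pos + length:]


def translocation(seq: str, start: int, length: int, target: int) -> str:
    """Cut out a segment and reinsert it at index target of the remaining sequence."""
    segment = seq[start:start + length]
    rest = seq[:start] + seq[start + length:]
    return rest[:target] + segment + rest[target:]


print(point_mutation("GCUAGG", 2, "C"))
print(insertion("ACGT", 2, "TT"))
print(deletion("ACGT", 1, 2))
print(translocation("AACCGGTT", 2, 2, 4))
```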

Page 12:

Finding genes with cDNA: Genes could be analyzed directly from the DNA, but it contains too much non-genic ("junk") material. An alternative is to study just the mRNA; however, mRNA and protein are very unstable and therefore difficult to work with.

Instead, scientists use special enzymes (reverse transcriptases) to convert RNA into complementary DNA (cDNA), which is a much more stable compound. Because it is generated from mRNA in which the introns have been removed, cDNA represents only the transcribed DNA sequence, the genes.

Genetic mapping: Also called linkage mapping, it uses the concepts of Mendelian inheritance and recombination frequencies to determine the chromosomal location of markers by analyzing their inheritance patterns. Markers are detected either by Southern blot (fragments separated by electrophoresis and subsequently detected by probe hybridization) or, more recently, by polymerase chain reaction (PCR, which uses thermal cycling) based methods.

A tomato F2 population was used to calculate recombination frequencies, and genetic distances, between a selection of simple sequence repeats (SSRs, also called microsatellites) and other molecular markers.
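
Going from a recombination frequency to a genetic distance can be sketched with the two classical mapping functions, Haldane and Kosambi. The slides do not say which function was used for the tomato data, so both are shown as assumptions. The counts in the example are made up, and the frequency is estimated by simple counting, which is a simplification of how F2 data are normally analyzed.

```python
import math


def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of scored offspring that are recombinant for two markers."""
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans (no interference), for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans (with interference), for 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


r = recombination_frequency(10, 100)  # hypothetical counts
print(r, haldane_cm(r), kosambi_cm(r))
```

For small frequencies both functions give a distance close to 100 × r centimorgans; they diverge as r approaches 0.5, where the markers behave as unlinked.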

DNA: contains non-genic material

RNA: unstable

cDNA: stable and mainly genes
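
The path from a transcript to cDNA can be sketched as two string operations: removing introns, then reverse transcription. Everything here is illustrative: the function names, the toy transcript and the exon coordinates are hypothetical, and the intron is simply chosen to start with GU and end with AG.

```python
RNA_TO_DNA = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base


def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon intervals (start, end) of a primary transcript, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to the mRNA, written 5'->3'."""
    return mrna.translate(RNA_TO_DNA)[::-1]


pre_mrna = "AUGGUAAGGCU"                   # exon AUG, intron GUAAG, exon GCU
mrna = splice(pre_mrna, [(0, 3), (8, 11)])
print(mrna)                                # the spliced message
print(first_strand_cdna(mrna))             # stable DNA copy of the message
```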

Page 13:

Comparative mapping: Among related but sexually incompatible species, heterologous (between-species) DNA markers can be used to generate comparative maps and to infer linkage conservation and the position of orthologous loci (homologous loci that diverged through speciation). This requires a minimal amount of similarity between the target and probe species, and so it cannot be used with more distantly related species. Most Gramineae genomes (i.e. grass species such as maize, rice, wheat, barley, millet, etc.) are connected through comparative genetic maps. While genome size varies dramatically among grass species, gene content and gene order remain more highly conserved.

Page 14:

Packing of DNA in the nucleus

http://employees.csbsju.edu/hjakubowski/classes/ch331/DNA/oldnastructure.html

Page 15:

Figure 1-38. Genome sizes compared. Genome size is measured in nucleotide pairs of DNA per haploid genome, that is, per single copy of the genome. (The cells of sexually reproducing organisms such as ourselves are generally diploid: they contain two copies of the genome, one inherited from the mother, the other from the father.) Closely related organisms can vary widely in the quantity of DNA in their genomes, even though they contain similar numbers of functionally distinct genes. (Data from W.-H. Li, Molecular Evolution, pp. 380-383. Sunderland, MA: Sinauer, 1997.)

An archaebacterium living in a superheated sulphur vent at the bottom of the ocean

A two-ton polar bear roaming the arctic circle

Genome size (length of DNA) varies from 5,000 (SV40 virus) to 3*10^9 (humans) and 10^11 (higher plants) nucleotide pairs.
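
From a computer science point of view these sizes can be read as storage requirements. The sketch assumes a plain encoding of 2 bits per base (four symbols) with no compression; that encoding, the dictionary and the function name are assumptions made for this illustration, not something stated in the slides.

```python
# Genome sizes in nucleotide pairs, as quoted on the slide.
GENOME_SIZES_BP = {
    "SV40 virus": 5_000,
    "human": 3 * 10**9,
    "higher plants (upper range)": 10**11,
}


def storage_bytes(n_bp: int) -> int:
    """Bytes needed to store n_bp bases at 2 bits per base, rounded up."""
    return (2 * n_bp + 7) // 8


for organism, size in GENOME_SIZES_BP.items():
    print(f"{organism}: {size} bp, {storage_bytes(size)} bytes")
```

Under this assumption one copy of the human genome fits in 750 million bytes.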

All organisms share basic properties:

Made of cells (membrane-enclosed sacks of chemicals)

Carry out basic reactions (e.g. core metabolic and developmental pathways)

Page 16:

Three major groups:
Archaea (recently discovered)
Bacteria (germs, algae, symbiotic organisms)
Eukaryotes:
Animals
Green Plants
Fungi
Protists

Viruses

Figure 1-21. The three major divisions (domains) of the living world. Note that traditionally the word bacteria has been used to refer to procaryotes in general, but more recently has been redefined to refer to eubacteria specifically. Where there might be ambiguity, we use the term eubacteria when the narrow meaning is intended. The tree is based on comparisons of the nucleotide sequence of a ribosomal RNA subunit in the different species. The lengths of the lines represent the numbers of evolutionary changes that have occurred in this molecule in each lineage.

Tree of Life