WHAT IS BIOINFORMATICS? Daniel Svozil, Laboratoř informatiky a chemie [email protected] http://ich.vscht.cz/~svozil

What is bioinformatics?

Feb 25, 2016

Study materials: http://ich.vscht.cz/~svozil/teaching.html
Page 1: What is bioinformatics?

WHAT IS BIOINFORMATICS?
Daniel Svozil, Laboratoř informatiky a chemie
[email protected]
http://ich.vscht.cz/~svozil

Page 3: What is bioinformatics?

Coursera
• MOOC
  • Bioinformatic Methods I
    • https://class.coursera.org/bioinfomethods1-001
  • Bioinformatics Algorithms (Part 1)
    • https://class.coursera.org/bioinformatics-001
  • Computational Molecular Evolution
    • https://class.coursera.org/molevol-002

Page 4: What is bioinformatics?

Definition
• NCBI
  • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.
• Wikipedia.org
  • The application of information technology and statistics to the field of molecular biology.
  • The creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management, analysis and interpretation of biological data.

http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

Page 5: What is bioinformatics?

Extraction of biological knowledge from data

[Diagram: Data (experimental, or from public databases) → Knowledge]
• convert data to knowledge
• generate new hypotheses
• design new experiments

Page 6: What is bioinformatics?

Omes

[Diagram labels: Genome, Transcriptome, Proteome, Reactome, Metabolome, …, Cell interactions, Signaling, Tissue architectures, Cell, Organism]

• genome – DNA sequence in an organism
• transcriptome – mRNA of an entire organism
• proteome – all proteins in an organism
• metabolome – all metabolites in an organism
• interactome – all molecular interactions in an organism

Page 7: What is bioinformatics?

Omes and Omics
• Genomics
  • Primarily sequences (DNA and RNA)
  • Databanks and search algorithms
  • Supports studies of molecular evolution
• Proteomics
  • Sequences (protein) and structures
  • Mass spectrometry, X-ray crystallography
  • Databanks, knowledge bases, visualization
• Functional Genomics (transcriptomics)
  • Microarray data
  • Databanks, analysis tools, controlled terminologies
• Systems Biology (metabolomics)
  • Metabolites and interacting systems (interactomics)
  • Graphs, visualization, modeling, networks of entities

Page 8: What is bioinformatics?

“Omics”

[Flow diagram]
• “Omics” includes: genomics, transcriptomics, proteomics, metabolomics, interactomics, …
• measured by: sequencing, microarrays, LC/MS, NMR, two-hybrid, …
• these data are high-throughput and high-noise
• to reduce noise: advanced pre-processing techniques
• reliable high-throughput information
• techniques to analyze high-dimensional data and knowledge bases
• biological knowledge, medical knowledge, improved health

source: Bios 560R Introduction to Bioinformatics, userwww.service.emory.edu/~tyu8/560R/560R_1.pptx

Page 9: What is bioinformatics?

Key research in bioinformatics
• sequence bioinformatics
• structural bioinformatics
• systems biology
  • analysis of biological pathways to gain e.g. an understanding of disease processes

Page 10: What is bioinformatics?

21st century – complex systems
• Designing (forward-engineering)
• Understanding (reverse-engineering)
• Fixing

• Why is it so complex?
• Can we make sense of this complexity?
• How is it robust?

http://yilab.bio.uci.edu/ICSB2007_Tutorial_AM1.htm

Page 11: What is bioinformatics?

CELL BIOLOGY
Daniel Svozil

Page 12: What is bioinformatics?

Molecular biology
• Though all aspects of biology can be studied at the molecular level, molecular biology is usually restricted to the molecules of genes/gene products/heredity – molecular genetics
• Experiments in molecular biology are done using model organisms
• Two classes of organism
  • Prokaryotes
  • Eukaryotes

Page 13: What is bioinformatics?

Prokaryotes vs. Eukaryotes

bacteria
• 1 bacterium = 1 cell
• lower organisms
• Escherichia coli (E. coli)

• plasma membrane
• nucleus
• organelles

Page 14: What is bioinformatics?

Cells in eukaryotes
• body (somatic) cells
  • differentiated into special cell types (brain cells, liver cells …)
  • produced by simple cell division – mitosis
• sex cells (gametes)
  • egg, sperm
  • used for sexual reproduction (only eukaryotes)
  • meiosis – reduction of the amount of genetic material

Page 15: What is bioinformatics?

Eukaryotic chromosomes
• Threadlike DNA, carries genes
• Each organism has a specific number of chromosomes
• Sex chromosomes (determine gender – XX (female), XY (male)), autosomal chromosomes
• 46 in human, 2 sex, 44 autosomal
• Come in pairs (the two in a pair have the same shape and the same set of genes (but different alleles)), homologs, diploid

Page 16: What is bioinformatics?

Cell cycle
• Division of the cell into two exact copies.

Page 17: What is bioinformatics?

Genetics for Dummies, Tara Robinson

homologous chromosomes

homologous chromosomes copied

Page 18: What is bioinformatics?

http://www.bothbrainsandbeauty.com/wp-content/uploads/2009/11/chromosomes.jpg

Page 19: What is bioinformatics?

Karyotype

Genetics for Dummies, Tara Robinson

Page 20: What is bioinformatics?

Mitosis

[Diagram: a diploid (2n) mother cell undergoes DNA synthesis (4n); division then yields two identical diploid (2n) daughter cells]

Page 21: What is bioinformatics?

Sexual reproduction
• Egg gets fertilized by sperm. A zygote is created.
• The zygote is diploid (divides by mitosis), thus the gametes must be haploid!
• In an organism with diploid cells, how do you get haploid?
  • Meiosis (another type of cell division)

Page 22: What is bioinformatics?

Meiosis
• The result of meiosis is a haploid cell.
• From one parent diploid cell you get four haploid cells. In addition, homologous chromosomes go through recombination.

http://www.britannica.com

Page 23: What is bioinformatics?

DNA – The Basis of Life

Page 24: What is bioinformatics?

DNA
• Biomacromolecule
  • Consists of repeating units
• DNA in an organism does not usually exist in one piece
  • chromosomes

Page 25: What is bioinformatics?

Deconstructing DNA
• http://www.umass.edu/molvis/tutorials/dna/
• bases, deoxyribose sugar, phosphate – nucleotide
• Bases are flat → stacking
• pYrimidines – C, T
• puRines – A, G

Page 26: What is bioinformatics?

Nucleoside

[Figure: nucleoside = base + sugar; atom labels O3′, O5′, C3′, C5′]

Page 27: What is bioinformatics?

Nucleotide
• nucleosides are interconnected by phosphodiester bonds
• nucleotide monophosphate

[Figure label: nucleoside]

Page 28: What is bioinformatics?

Bases complement each other.

Chargaff’s rules
• amount of G = C
• amount of A = T
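Base complementarity and Chargaff’s rules can be checked on a toy example. The sketch below is only illustrative: the sequence is made up, and the function names `reverse_complement` and `duplex_base_counts` are hypothetical helpers, not part of the slides. It builds the complementary strand and counts bases over both strands of the duplex.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand + reverse_complement(strand))


# Hypothetical example sequence, not taken from any real genome.
strand = "ATGCGGCTTA"
counts = duplex_base_counts(strand)
print("A == T:", counts["A"] == counts["T"])
print("G == C:", counts["G"] == counts["C"])
```

Within a single strand the counts need not match; the equalities hold for the double-stranded molecule because every base is paired with its complement.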

Page 29: What is bioinformatics?

DNA conformations

[Figure: B-DNA, A-DNA, Z-DNA]

Page 30: What is bioinformatics?

Biological role of different DNAs
• B-DNA
  • canonical DNA
  • predominant
• A-DNA
  • Conditions of lower humidity, common in crystallographic experiments. However, they’re artificial.
  • In vivo – local conformations induced e.g. by interaction with proteins.
• Z-DNA
  • No definite biological significance found up to now.
  • It is commonly believed to provide torsional strain relief (supercoiling) while DNA transcription occurs.
  • The potential to form a Z-DNA structure also correlates with regions of active transcription.

Page 31: What is bioinformatics?

Different sets of DNA
• nuclear DNA
  • cell’s nucleus
  • majority of functions the cell carries out
  • sequencing the genome – scientists mean nuclear DNA
• mitochondrial DNA
  • mtDNA
  • circular, in human very short (17 kbp) with 37 genes (controlling cellular metabolism)
  • all mtDNA comes from mom, no recombination – Mitochondrial Eve
• chloroplast DNA
  • cpDNA
  • circular and fairly large (120 – 160 kbp), with only 120 genes
  • inheritance is either maternal or paternal

Page 32: What is bioinformatics?

Structure of DNA in the eukaryotic cell

• DNA in human chromosomes: 3.2 × 10⁹ bp. As we’re diploid: 6.4 × 10⁹ bp.
• 0.33 nm per bp → 2.1 m in each nucleus, size of the nucleus: 5-10 µm across
• DNA is highly compacted. Combination DNA + proteins.
• During interphase, when cells are not dividing, the genetic material exists as a nucleoprotein complex called chromatin, which is dispersed through much of the nucleus.
• Further folding and compaction of chromatin during mitosis produces the visible metaphase chromosomes.
• euchromatin – extended
• heterochromatin – condensed
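The 2.1 m figure above follows directly from the numbers on the slide. The short calculation below reproduces it and compares the result with the upper end of the quoted nucleus size; the variable names are illustrative only.

```python
# Values taken from the slide.
HAPLOID_GENOME_BP = 3.2e9    # base pairs in one set of human chromosomes
RISE_PER_BP_NM = 0.33        # length contributed by one base pair, in nm
NUCLEUS_DIAMETER_UM = 10     # upper end of the 5-10 micrometre range

diploid_bp = 2 * HAPLOID_GENOME_BP                  # 6.4e9 bp per nucleus
dna_length_m = diploid_bp * RISE_PER_BP_NM * 1e-9   # nm -> m
nucleus_diameter_m = NUCLEUS_DIAMETER_UM * 1e-6     # um -> m

print(f"DNA length per nucleus: {dna_length_m:.2f} m")
print(f"Length / nucleus diameter: {dna_length_m / nucleus_diameter_m:,.0f}")
```

The stretched-out DNA is on the order of a hundred thousand times longer than the nucleus is wide, which is why it has to be packed into chromatin.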

Page 33: What is bioinformatics?

Chromatin

nucleosome

Page 34: What is bioinformatics?

Nucleosome

Page 35: What is bioinformatics?
Page 36: What is bioinformatics?

Central dogma of molecular biology

Wikipedia

Page 37: What is bioinformatics?

Molecular Cell Biology, Harvey Lodish

Page 38: What is bioinformatics?

STUDYING GENOMES

Page 39: What is bioinformatics?

Studying DNA

Page 40: What is bioinformatics?

Enzymes for DNA manipulation
• Before the 1970s, the only way in which individual genes could be studied was by classical genetics.
• Biochemical research provided (in the early 70s) molecular biologists with enzymes that could be used to manipulate DNA molecules in the test tube.
• Molecular biologists adopted these enzymes as tools for manipulating DNA molecules in pre-determined ways, using them to make copies of DNA molecules, to cut DNA molecules into shorter fragments, and to join them together again in combinations that do not exist in nature.
• These manipulations form the basis of recombinant DNA technology.

Page 41: What is bioinformatics?

Recombinant DNA technology
• The enzymes available to the molecular biologist fall into four broad categories:
  1. DNA polymerases – synthesis of new polynucleotides complementary to an existing DNA or RNA template
  2. Nucleases – degrade DNA molecules by breaking the phosphodiester bonds
    • restriction endonucleases (restriction enzymes) – cleave DNA molecules only when a specific DNA sequence is encountered
  3. Ligases – join DNA molecules together
  4. End modification enzymes – make changes to the ends of DNA molecules
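How a restriction endonuclease cuts only at its recognition sequence can be sketched in a few lines. This is a simplified model under stated assumptions: it treats DNA as a single linear string and ignores the second strand and sticky ends. The enzyme used is EcoRI, which recognizes GAATTC and cuts between G and A; the function name `digest` and the test sequence are made up for illustration.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear DNA string at every occurrence of `site`.

    `cut_offset` is the position inside the site where the strand is cut;
    the default models EcoRI, which cuts G^AATTC (after the first base).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


# Hypothetical sequence containing two EcoRI sites.
dna = "AAGAATTCTTTTGAATTCGG"
print(digest(dna))
```

Joining the fragments back together in order recovers the original string, which is the string-level analogue of what a ligase does.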

Page 42: What is bioinformatics?

source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/

Page 43: What is bioinformatics?

DNA cloning
• DNA cloning (i.e. copying) – logical extension of the ability to manipulate DNA molecules with restriction endonucleases and ligases
• vector
  • DNA sequence that naturally replicates inside bacteria.
  • It consists of an insert (transgene) and a larger sequence serving as the backbone of the vector.
  • Used to introduce a specific gene into a target cell. Once the expression vector is inside the cell, the protein that is encoded by the gene is produced by the cellular transcription and translation machinery (ribosomal complexes).
  • plasmid (length of insert: 1-10 kbp), cosmid (40-45 kbp), BAC (100-350 kbp), YAC (1.5-3.0 Mbp)
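The insert-size ranges listed above can be turned into a small lookup that suggests which vector type fits a given fragment. This is only a sketch of the slide's table; the dictionary and the function `vectors_for_insert` are hypothetical names, and real vector choice depends on more than insert length.

```python
# Insert-size ranges in base pairs, as listed on the slide.
VECTOR_INSERT_RANGE_BP = {
    "plasmid": (1_000, 10_000),
    "cosmid": (40_000, 45_000),
    "BAC": (100_000, 350_000),
    "YAC": (1_500_000, 3_000_000),
}


def vectors_for_insert(insert_bp: int) -> list[str]:
    """Return the vector types whose listed range covers `insert_bp`."""
    return [
        name
        for name, (low, high) in VECTOR_INSERT_RANGE_BP.items()
        if low <= insert_bp <= high
    ]


print(vectors_for_insert(5_000))
print(vectors_for_insert(250_000))
```

An insert size that falls between two listed ranges returns an empty list, since the slide gives no vector for it.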

Page 44: What is bioinformatics?

Vectors
• plasmid
  • DNA molecule that is separate from, and can replicate independently of, the chromosomal DNA.
  • Double stranded, usually circular, occurs naturally in bacteria.
  • Serves as an important tool in genetics and biotechnology labs, where it is commonly used to multiply (clone) or express particular genes.
• BAC (bacterial artificial chromosome)
  • It is a particular plasmid found in E. coli. A typical BAC can carry about 250 kbp.
source: wikipedia

Page 45: What is bioinformatics?

source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/

restriction endonuclease

ligase

DNA cloning

Page 46: What is bioinformatics?

PCR – Polymerase chain reaction
• DNA cloning results in the purification of a single fragment of DNA from a complex mixture of DNA molecules.
• Major disadvantage: it is a time-consuming (several days to produce recombinants) and, in parts, difficult procedure.
• The next major technical breakthrough (1983) after gene cloning was PCR.
• It amplifies a short fragment of a DNA molecule in a much shorter time, just a few hours.
• PCR is complementary to, not a replacement for, cloning because it has its own limitations: the need to know the sequence of at least part of the fragment.
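The speed of PCR comes from exponential growth. The slide does not give the arithmetic, so the sketch below uses the standard idealized model as an assumption: each cycle multiplies the number of target copies by (1 + efficiency), with perfect doubling at efficiency 1.0. The function name `pcr_copies` and the `efficiency` parameter are illustrative.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    Assumes every cycle multiplies the target by (1 + efficiency);
    efficiency = 1.0 means perfect doubling, which real reactions
    only approximate in the early cycles.
    """
    return initial_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(n, "cycles:", f"{pcr_copies(1, n):.3g}", "copies")
```

With perfect doubling, 30 cycles turn a single template into 2^30 copies, roughly a billion.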

Daniel Svozil
Show PCR video pcr.flv (http://youtu.be/eEcy9k_KsDI), alternatively pcr.swf (obtained from http://kisdwebs.katyisd.org/campuses/MRHS/teacherweb/hallk/Teacher Documents/AP Biology Materials/Genetics/)
Daniel Svozil
Then show The_PCR_song followed by The_Biorad_GTCA_Song.
Page 47: What is bioinformatics?

Mapping genomes

Page 48: What is bioinformatics?

What is it about?
• Assigning a specific gene to a particular region of a chromosome and determining the location of and relative distances between genes on the chromosome.
• There are two types of maps:
  • genetic linkage map – shows the arrangement of genes (or other markers) along the chromosomes as calculated by the frequency with which they are inherited together
  • physical map – representation of the chromosomes, providing the physical distance between landmarks on the chromosome, ideally measured in nucleotide bases
    • The ultimate physical map is the complete sequence itself.

Page 49: What is bioinformatics?

Genetic linkage map
• Constructed by observing how frequently two markers (e.g. genes, but wait till next slides) are inherited together.
• Two markers located on the same chromosome can be separated only through the process of recombination.
• If they are separated, children will have just one marker from the pair.
• However, the closer the markers are to each other, the more tightly linked they are, and the less likely recombination will separate them. They will tend to be passed together from parent to child.
• Recombination frequency provides an estimate of the distance between two markers.

Page 50: What is bioinformatics?

Genetic linkage map
• On genetic maps, distances between markers are measured in centimorgans (cM).
  • 1 cM apart – they are separated by recombination 1% of the time
  • 1 cM is ROUGHLY equal to a physical distance of 1 Mbp in human
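The relation between recombination frequency and map distance can be written out as a short calculation. This is a minimal sketch assuming the simple proportionality stated on the slide (1% recombination = 1 cM), which is reasonable only for closely linked markers; the offspring counts and function names are hypothetical.

```python
def recombination_frequency(recombinant_offspring: int, total_offspring: int) -> float:
    """Fraction of offspring in which the two markers were separated."""
    return recombinant_offspring / total_offspring


def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Genetic distance in centimorgans: 1 cM = 1% recombination."""
    return 100 * recombination_frequency(recombinant_offspring, total_offspring)


# Hypothetical family data: 12 recombinants among 400 offspring.
distance = map_distance_cm(12, 400)
print(f"{distance:.1f} cM, i.e. very roughly {distance:.1f} Mbp in human")
```

The conversion to Mbp uses the slide's rule of thumb and is only a rough guide, since recombination rates vary along the chromosome.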

Value of genetic map – marker analysis
• An inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified.
• This represents a cornerstone of testing for genetic diseases.

Page 51: What is bioinformatics?

Genetic markers
• A genetic map must show the positions of distinctive features – markers.
• Any inherited physical or molecular characteristic that differs among individuals and is easily detectable in the laboratory is a potential genetic marker.
• Markers can be
  • expressed DNA regions (genes) or
  • DNA segments that have no known coding function but whose inheritance pattern can be followed.
• genes – not ideal, larger genomes (e.g. vertebrates) → gene maps are not very detailed (low gene density)

Page 52: What is bioinformatics?

Genetic markers
• Must be polymorphic, i.e. alternative forms (alleles) must exist among individuals so that they are detectable among different members in family studies.
• Variations within exons (genes) – lead to observable changes (e.g. eye color)
• Most variations occur within introns, have little or no effect on an organism, yet they are detectable at the DNA level and can be used as markers.
  1. restriction fragment length polymorphisms (RFLPs)
  2. simple sequence length polymorphisms (SSLPs)
  3. single nucleotide polymorphisms (SNPs, pronounced “snips”)

Page 53: What is bioinformatics?

RFLPs
• Recall that restriction enzymes cut DNA molecules at specific recognition sequences.
• This sequence specificity means that treatment of a DNA molecule with a restriction enzyme should always produce the same set of fragments.
• This is not always the case with genomic DNA molecules because some restriction sites exist as two alleles, one allele displaying the correct sequence for the restriction site and therefore being cut, and the second allele having a sequence alteration so the restriction site is no longer recognized.

source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
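The RFLP idea can be illustrated with two made-up alleles that differ by a single base inside a restriction site. The sketch assumes an EcoRI-like site (GAATTC, cut after the first base) and models only one strand; the sequences and the function `fragment_lengths` are hypothetical.

```python
def fragment_lengths(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Lengths of the fragments produced by cutting at every `site`."""
    lengths = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = dna.find(site, pos + 1)
    lengths.append(len(dna) - start)  # fragment after the last cut
    return lengths


# Allele 1 carries two intact sites; in allele 2 the second site
# is altered (GAATTC -> GAGTTC) and is no longer recognized.
allele_1 = "CCCCGAATTCAAAAAAAAGAATTCTTTT"
allele_2 = "CCCCGAATTCAAAAAAAAGAGTTCTTTT"

print("allele 1 fragments:", fragment_lengths(allele_1))
print("allele 2 fragments:", fragment_lengths(allele_2))
```

The two alleles give different fragment-length patterns after digestion, and that difference is what is scored as the marker.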

Page 54: What is bioinformatics?

SSLPs
• Repeat sequences that display length variations; different alleles contain different numbers of repeat units (i.e. SSLPs are multi-allelic).
• variable number of tandem repeat sequences (VNTRs, minisatellites)
  • repeat unit up to 25 bp in length
• simple tandem repeats (STRs, microsatellites)
  • repeats are shorter, usually di- or tetranucleotide

source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
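Typing an SSLP amounts to counting how many repeat units an allele carries. The sketch below counts the longest run of a dinucleotide repeat in two made-up microsatellite alleles; the sequences and the function `longest_repeat_run` are hypothetical, and real STR typing is done by fragment sizing or sequencing rather than on clean strings.

```python
import re


def longest_repeat_run(dna: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `dna`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles of a CA microsatellite with the same flanks.
allele_a = "GGT" + "CA" * 5 + "TTG"
allele_b = "GGT" + "CA" * 8 + "TTG"

print("allele a:", longest_repeat_run(allele_a, "CA"), "repeats")
print("allele b:", longest_repeat_run(allele_b, "CA"), "repeats")
```

Because the repeat count can take many values, a single SSLP locus can have many alleles, unlike a two-allele RFLP.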

Page 55: What is bioinformatics?

SNPs
• Positions in a genome where some individuals have one nucleotide and others have a different nucleotide.
• Vast number of SNPs in every genome.
• Each SNP could potentially have four alleles, but most exist in just two forms.
• The value of a two-allelic marker (SNP, RFLP) is limited by the high possibility that the marker shows no variability among the members of an interesting family.
• The advantages of SNPs over RFLPs:
  • they are abundant (human genome: 1.5 million SNPs, 100 000 RFLPs)
  • easier to type (i.e. easier to detect)
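Finding candidate SNPs between two individuals comes down to comparing aligned sequences position by position. The sketch below assumes the two sequences are already aligned and of equal length; the sequences and the function `snp_positions` are made up for illustration.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical aligned sequences from two individuals.
individual_1 = "ATGCCGTA"
individual_2 = "ATGTCGTA"
print(snp_positions(individual_1, individual_2))
```

A single difference between two individuals is only a candidate; it counts as a polymorphism once the variant is seen at appreciable frequency in a population.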

Page 56: What is bioinformatics?

Genome maps

source: Talking glossary of genetic terms, http://www.genome.gov/glossary/

[Figure labels]
• relative locations of genes are established by following inheritance patterns
• visual appearance of a chromosome when stained and examined under a microscope
• the order and spacing of the genes, measured in base pairs

more at http://www.informatics.jax.org/silver/chapters/7-1.shtml

sequence map