CSE 6406: Bioinformatics Algorithms

CSE 6406: Bioinformatics Algorithms. Course Outline .

Dec 28, 2015

Marjorie Moore
Transcript
Page 1: CSE 6406: Bioinformatics Algorithms. Course Outline .

CSE 6406: Bioinformatics Algorithms

Page 2: CSE 6406: Bioinformatics Algorithms. Course Outline .

Course Outline

http://203.208.166.84/masudhasan/cse6406_April_08.html

Page 3: CSE 6406: Bioinformatics Algorithms. Course Outline .

Why Bioinformatics?

• New topic
• Huge scope in research, higher studies, and the job market
• Expected to dominate future research in computer science
• Scope for diverse research fields:
  • AI
  • Algorithms
  • Database
  • Data mining
  • Software
  • Systems biology
• Other hot topics of the current time:
  • Wireless communication, ad hoc networks, sensor networks, database …

Page 4: CSE 6406: Bioinformatics Algorithms. Course Outline .

Our Strategy

• We would say “Algorithms Motivated by/Initiated from/Applied in Bioinformatics”
• The main focus is on algorithms, not on biology. We will study from an algorithmic point of view, not from a biological point of view
• We will quickly formulate the biological problem as an algorithmic problem
• We will meet many of our old friends in algorithms:
  • Exhaustive search, greedy algorithms, dynamic programming, divide and conquer, graph algorithms, trees, etc.
• How much biology do I need to know?
  • Content from the text would suffice
  • Should be nothing more than school biology

Page 5: CSE 6406: Bioinformatics Algorithms. Course Outline .

Brief Introduction

[1]: Ch 3, [2]: Ch 13

Page 6: CSE 6406: Bioinformatics Algorithms. Course Outline .

What is Life made of?

Page 7: CSE 6406: Bioinformatics Algorithms. Course Outline .

What is Life made of?

• All living things are made of cells
• What is inside the cell: DNA, RNA, proteins

Page 8: CSE 6406: Bioinformatics Algorithms. Course Outline .

Cells

• Fundamental working units of every living system
• Every organism is composed of one of two types of cells:
  • prokaryotic cells, or
  • eukaryotic cells

Page 9: CSE 6406: Bioinformatics Algorithms. Course Outline .

Life begins with Cell

Page 10: CSE 6406: Bioinformatics Algorithms. Course Outline .

Overview of the organization of life

• Nucleus = library
• Chromosomes = bookshelves
• Genes = books
• Almost every cell in an organism contains the same libraries and the same sets of books.
• Books represent all the information (DNA) that every cell in the body needs so it can grow and carry out its various functions.

Page 11: CSE 6406: Bioinformatics Algorithms. Course Outline .

More Terminology

• The genome is an organism’s complete set of DNA.
  • A small bacterial genome contains about 600,000 DNA base pairs
  • Human and mouse genomes have some 3 billion.
• The human genome has 24 distinct chromosomes.
  • Each chromosome contains many genes.
• Genes
  • Basic physical and functional units of heredity.
  • Specific sequences of DNA bases that encode instructions on how to make proteins.
• Proteins
  • Make up the cellular structure
  • Large, complex molecules made up of smaller subunits called amino acids.

Page 12: CSE 6406: Bioinformatics Algorithms. Course Outline .

All Life depends on 3 critical molecules

• DNA
  • Holds information on how the cell works
• RNA
  • Acts to transfer short pieces of information to different parts of the cell
  • Provides templates for synthesizing protein
• Protein
  • Forms enzymes that send signals to other cells and regulate gene activity
  • Forms the body’s major components (e.g. hair, skin, etc.)

Page 13: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA: The Code of Life

• The structure and the four genomic letters code for all living organisms

• Adenine, Guanine, Thymine, and Cytosine, which pair A-T and C-G on complementary strands.
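The A-T and C-G pairing rule can be sketched in a few lines of Python. This is only an illustration: the names `PAIR`, `complement`, and `reverse_complement` are made up for this example, and the input strand is an arbitrary example string.

```python
# Pairing rule from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    strand = "ATGCCTGTACTA"            # example strand, written 5' to 3'
    print(complement(strand))          # partner strand, written 3' to 5'
    print(reverse_complement(strand))  # partner strand, written 5' to 3'
```

Because the two strands run in opposite directions, the reverse complement is usually the form wanted when the partner strand has to be read 5' to 3'.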

Page 14: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA, continued

• DNA has a double helix structure, which is composed of
  • a sugar molecule
  • a phosphate group
  • and a base (A, C, G, T)

Page 15: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA, RNA, and the Flow of Information

[Diagram labels: Replication (DNA → DNA), Transcription (DNA → RNA), Translation (RNA → Protein)]

Page 16: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA: the Genetic Makeup

Page 17: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA: The Basis of Life

• Humans have about 3 billion base pairs.
  • How do you package it into a cell?
  • How does the cell know where in the highly packed DNA to start transcription?
• A larger amount of DNA does not mean a more complex organism
• Complexity of DNA
  • Eukaryotic genomes consist of variable amounts of DNA
    • Single-copy or unique DNA
    • Highly repetitive DNA

Page 18: CSE 6406: Bioinformatics Algorithms. Course Outline .

RNA

• RNA is chemically similar to DNA. It is usually only a single strand. T(hymine) is replaced by U(racil)
• Some forms of RNA can form secondary structures by “pairing up” with themselves. This can change their properties dramatically.
• DNA and RNA can pair with each other.
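At the string level, the T-to-U substitution is a single character replacement. The sketch below assumes the input is the DNA strand that has the same sequence as the RNA (the coding strand). The function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA string for a DNA coding strand by replacing T with U.

    Assumption: the input already has the same sequence as the RNA, so no
    complementing is needed, only the T -> U substitution.
    """
    return coding_strand.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGCCTGTACTA"))
```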

Page 19: CSE 6406: Bioinformatics Algorithms. Course Outline .

Protein Folding

• Proteins are not linear structures, though they are built that way

• Proteins tend to fold into the lowest free energy conformation.

• Its structure determines its function

Page 20: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA Operations

• Copying DNA
• Cutting and pasting DNA
• Measuring DNA length
• DNA sequencing
• Probing DNA

Page 21: CSE 6406: Bioinformatics Algorithms. Course Outline .

Genetic Variation

• Despite the wide range of physical variation, genetic variation between individuals is quite small.
• Out of 3 billion nucleotides, only roughly 3 million base pairs (0.1%) differ between the genomes of individual humans.
• Although there is a finite number of possible variations, the number is so high (4^3,000,000) that we can assume no two individual people have the same genome.

• What is the cause of this genetic variation?
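The figures on this slide can be checked with a short calculation. It assumes, as the slide does, that each of the roughly 3 million differing positions can independently hold any of the 4 bases. The helper name `decimal_digits_of_power` is made up for this sketch.

```python
import math

GENOME_LENGTH = 3_000_000_000  # base pairs in the human genome (from the slide)
VARIABLE_SITES = 3_000_000     # base pairs that differ between individuals (from the slide)


def decimal_digits_of_power(base: int, exponent: int) -> int:
    """Number of decimal digits of base**exponent, computed via logarithms."""
    return math.floor(exponent * math.log10(base)) + 1


if __name__ == "__main__":
    # Fraction of the genome that varies: 3 million out of 3 billion.
    print(f"{VARIABLE_SITES / GENOME_LENGTH:.1%}")
    # Size of 4**3,000,000, reported as a digit count rather than the number itself.
    print(decimal_digits_of_power(4, VARIABLE_SITES))
```

The count of possible variations has about 1.8 million decimal digits, which is why a repeated genome can be ruled out in practice.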

Page 22: CSE 6406: Bioinformatics Algorithms. Course Outline .

Sources of Genetic Variation

• Mutations are rare errors in the DNA replication process that occur at random.
• Recombination is the shuffling of genes that occurs through sexual mating and is the main source of genetic variation.
• Others …

Page 23: CSE 6406: Bioinformatics Algorithms. Course Outline .

Molecular evolution can be visualized with a phylogenetic tree.

Page 24: CSE 6406: Bioinformatics Algorithms. Course Outline .

Turnip and Cabbage

• Cabbages and turnips share a common ancestor

Page 25: CSE 6406: Bioinformatics Algorithms. Course Outline .

Genetic Similarities Between Turnip and Cabbage

• In the 1980s, scientists discovered evolutionary change in plants by comparing the mitochondrial genomes of the cabbage and the turnip
• 99% similarity between genes
• These more or less identical gene sequences surprisingly differed in gene order

Page 26: CSE 6406: Bioinformatics Algorithms. Course Outline .

Important discovery

Page 27: CSE 6406: Bioinformatics Algorithms. Course Outline .

DNA Reversal

5’ A T G C C T G T A C T A 3’

3’ T A C G G A C A T G A T 5’

5’ A T G T A C A G G C T A 3’

3’ T A C A T G T C C G A T 5’

Break and Invert
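The break-and-invert step can be reproduced on the top strand of this slide. In the sketch below, the function name `invert_segment` and the 0-based, half-open indices are illustrative choices. Inverting a double-stranded segment shows up on the top strand as the reverse complement of that segment.

```python
# Pairing rule: A-T and C-G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def invert_segment(strand: str, start: int, end: int) -> str:
    """Invert strand[start:end] (0-based, end exclusive) on a 5'-to-3' strand.

    The segment is cut out, flipped, and pasted back, so the top strand now
    carries the reverse complement of the original segment.
    """
    segment = strand[start:end]
    inverted = "".join(PAIR[base] for base in reversed(segment))
    return strand[:start] + inverted + strand[end:]


if __name__ == "__main__":
    before = "ATGCCTGTACTA"                # top strand before the reversal
    after = invert_segment(before, 3, 9)   # invert the middle six bases
    print(before)
    print(after)
```

With the segment boundaries chosen here (after the third base and after the ninth), the result matches the second 5'-to-3' strand shown on the slide.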

Page 28: CSE 6406: Bioinformatics Algorithms. Course Outline .

Algorithmic Problem

• Given two strings (over the same set of characters), find a sequence of reversals of substrings that will transform one into the other
• Biologists are interested in the shortest such sequence
• This makes the algorithm more challenging, and it is one of the most studied problems in algorithmic bioinformatics!
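For very short strings, the shortest sequence of reversals can be found by exhaustive breadth-first search. The sketch below is a brute-force illustration only, not one of the specialised algorithms studied for this problem. It treats a reversal as plain substring reversal (no complementing), and its running time grows factorially with string length. The names `shortest_reversals` and `apply_reversals` are hypothetical.

```python
from collections import deque


def shortest_reversals(source, target):
    """Return a shortest list of (i, j) reversals turning source into target.

    Each (i, j) reverses the slice [i:j] (0-based, j exclusive).
    Brute-force BFS over all reachable arrangements: only for tiny inputs.
    """
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same characters")
    start, goal = tuple(source), tuple(target)
    parent = {start: None}  # arrangement -> (previous arrangement, move)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            moves = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        n = len(state)
        for i in range(n - 1):
            for j in range(i + 2, n + 1):  # segments of length >= 2
                nxt = state[:i] + state[i:j][::-1] + state[j:]
                if nxt not in parent:
                    parent[nxt] = (state, (i, j))
                    queue.append(nxt)
    # Not reached: any rearrangement is reachable by length-2 reversals.
    raise AssertionError("target not reachable")


def apply_reversals(source, moves):
    """Apply a list of (i, j) reversals and return the result as a tuple."""
    state = tuple(source)
    for i, j in moves:
        state = state[:i] + state[i:j][::-1] + state[j:]
    return state


if __name__ == "__main__":
    moves = shortest_reversals("312", "123")
    print(moves, apply_reversals("312", moves))
```

Breadth-first search visits arrangements in order of the number of reversals used, so the first time the target is dequeued the recovered sequence is a shortest one.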

Page 29: CSE 6406: Bioinformatics Algorithms. Course Outline .

Thanks