Computational Problems in Molecular Biology

Dong Xu

Computer Science Department, 109 Engineering Building West

E-mail: xudong@missouri.edu, 573-882-7064, http://digbio.missouri.edu

Dec 18, 2015


Lecture Outline

From DNA to gene

Protein sequence and structure

Gene expression

Protein interaction and pathway

Provides a roadmap for the entire course

Biology at the systems level (computational perspective)


About Life

Life is wonderful: amazing mechanisms

Life is not perfect: errors and diseases

Life is a result of evolution


Cells

Basic unit of life

Prokaryotes/eukaryotes

Different types of cells: skin, brain, red/white blood cells

Different biological functions

Cells are produced by cells: cell division (mitosis in eukaryotes)

2 daughter cells


DNA

Double helix (Watson & Crick)

Nitrogenous base pairs: adenine-thymine [A,T]

Cytosine-guanine [C,G]

Weak (hydrogen) bonds between paired bases (can be broken)

Nucleotides form long chains
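The A-T / C-G pairing rule lets one strand act as a template for the other. Below is a minimal Python sketch that computes the partner strand. It assumes an uppercase sequence over A, C, G, T only, and the function names are illustrative. It also relies on the standard fact that the two strands of the helix are antiparallel, which is why the partner strand is reported reversed.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Complement each base, keeping the original left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


seq = "AGCCACTTAGACAAACTA"  # DNA fragment from the protein synthesis slide
print(complement(seq))          # TCGGTGAATCTGTTTGAT
print(reverse_complement(seq))  # TAGTTTGTCTAAGTGGCT
```

Ambiguity codes such as N are not handled. They raise a `KeyError` in this sketch.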


Genome

Nearly every cell contains a full copy of the genome (DNA). The size varies:

Small for viruses and prokaryotes (10 kbp-20 Mbp)

Medium for lower eukaryotes: yeast, a unicellular eukaryote, 13 Mbp; worm (Caenorhabditis elegans) 100 Mbp; fly, an invertebrate (Drosophila melanogaster), 170 Mbp

Larger for higher eukaryotes: mouse and human, 3000 Mbp

Very variable for plants (many are polyploid): mouse-ear cress (Arabidopsis thaliana) 120 Mbp; lilies 60,000 Mbp


Differences in DNA

Figure: percentage differences in DNA sequence between genomes (values shown: ~2%, ~4%, ~0.2%)


Genes

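Segments of DNA sequence that are transcribed (and, for protein-coding genes, translated) into functional biomolecules (protein, RNA)

About 2% of the human DNA sequence codes for proteins

~32,000 human genes (an early estimate; current counts of protein-coding genes are closer to 20,000), 100,000 genes in tulips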


Gene Structure

General structure of a eukaryotic gene

Unlike eukaryotic genes, a prokaryotic gene typically consists of a single contiguous coding region (no introns)


Informational Classes in Genomic DNA

Transcribed sequences (exons and introns)

Messenger sequences (mRNA, exons only)

Coding sequences (CDS, part of the exons only)

Heads and tails: untranslated regions (UTRs)

Regulatory sequences

... and all the rest

Identifying them: gene finding
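A toy example makes the classes above concrete: splicing removes introns to give the mRNA, and the CDS sits between the 5' and 3' UTRs. This Python sketch is only an illustration, not a gene finder. The gene, its exon coordinates, and the function names are hypothetical. Sequences stay in the DNA alphabet (T, not U) for simplicity. The CDS is taken to be the first ATG up to the first in-frame stop codon, a simplification that real statistical gene-finding methods do not rely on.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, half-open) in order; introns are dropped."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def first_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """Locate the CDS as the first ATG plus in-frame codons up to and including a stop."""
    start = mrna.find("ATG")
    if start < 0:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None  # no in-frame stop codon


# Hypothetical toy gene: exon 1, intron (GT...AG), exon 2.
gene = "CCATGAAA" + "GTAAGTTTAG" + "CCCTAAGG"
mrna = splice(gene, [(0, 8), (18, 26)])
start, end = first_orf(mrna)
print(mrna)  # CCATGAAACCCTAAGG
print("5' UTR:", mrna[:start], "CDS:", mrna[start:end], "3' UTR:", mrna[end:])
```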


Genetic Code

A=Ala=Alanine

C=Cys=Cysteine

D=Asp=Aspartic acid

E=Glu=Glutamic acid

F=Phe=Phenylalanine

G=Gly=Glycine

H=His=Histidine

I=Ile=Isoleucine

K=Lys=Lysine

L=Leu=Leucine

M=Met=Methionine

N=Asn=Asparagine

P=Pro=Proline

Q=Gln=Glutamine

R=Arg=Arginine

S=Ser=Serine

T=Thr=Threonine

V=Val=Valine

W=Trp=Tryptophan

Y=Tyr=Tyrosine
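The table above maps one-letter codes to three-letter abbreviations. Here is a small Python sketch that uses it to expand a one-letter protein sequence, applied to the SHLDKL product from the protein synthesis slide. The dictionary and function names are illustrative.

```python
# One-letter -> three-letter amino acid codes, copied from the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(protein: str) -> str:
    """Expand a one-letter sequence into hyphen-joined three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in protein.upper())


print(to_three_letter("SHLDKL"))  # Ser-His-Leu-Asp-Lys-Leu
```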


Protein Synthesis

AGCCACTTAGACAAACTA (DNA), transcribed to:

AGCCACUUAGACAAACUA (mRNA), translated to:

SHLDKL (protein)
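The two steps above can be reproduced in a short Python sketch. It assumes, as the slide's example does, that the DNA shown is the coding strand: the mRNA is that sequence with U in place of T. It also assumes that reading starts in frame at the first base, with no start codon required. The codon table is the standard genetic code. Some organisms and organelles use variant codes, which this sketch does not handle.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order for each position;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons in frame from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("AGCCACTTAGACAAACTA")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```

Reading codon by codon gives AGC=Ser, CAC=His, UUA=Leu, GAC=Asp, AAA=Lys, CUA=Leu, which matches the slide.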


About Protein

Tens to thousands of amino acids (average ~300)

Lysozyme sequence (129 amino acids):

KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS

TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS

DGNGMNAWVA WRNRCKGTDV QAWIRGCRL

Figure: protein backbone and side chain
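Because a protein sequence is just a string of one-letter codes, simple features can be computed directly. This Python sketch checks the stated length of the lysozyme sequence and computes its amino-acid composition. It is one of the simplest sequence-derived descriptors. The names are illustrative.

```python
from collections import Counter

# Lysozyme sequence from the slide, with the display spaces removed.
LYSOZYME = (
    "KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS "
    "TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS "
    "DGNGMNAWVA WRNRCKGTDV QAWIRGCRL"
).replace(" ", "")


def composition(protein: str) -> dict:
    """Fraction of each amino acid (one-letter code) in the sequence."""
    counts = Counter(protein)
    return {aa: counts[aa] / len(protein) for aa in sorted(counts)}


print(len(LYSOZYME))           # 129
print(Counter(LYSOZYME)["C"])  # 8
print(Counter(LYSOZYME).most_common(3))
```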


Evolution of Genes: Mutation

Genes change (slightly) during reproduction

Causes: copying errors, radiation, toxic chemicals

3 possibilities: deletion, insertion, substitution

Deletion: ACGTTGACTC → ACGTGACTC

Insertion: ACGTTGACTC → AGCGTTGACTC

Substitution: ACGTTGACTC → ACGATGACTC

Mutations that change function are mostly deleterious
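The three mutation types are exactly the unit operations of the edit (Levenshtein) distance: the minimum number of single-base deletions, insertions, and substitutions that turn one sequence into another. Here is a minimal Python dynamic-programming sketch. Each example above is one edit away from the original. Biological alignment scores weight these events differently; unit costs are an assumption for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-base insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if bases match)
            ))
        prev = curr
    return prev[-1]


original = "ACGTTGACTC"
for mutant in ("ACGTGACTC", "AGCGTTGACTC", "ACGATGACTC"):
    print(mutant, edit_distance(original, mutant))  # each prints 1
```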


Evolution and Homology

Figure (labels: ancestor; gene duplication into X and Y; recombination giving a mixed gene, 75% X and 25% Y; paralogs (related functions); orthologs (similar function); mixed homology)

Twilight zone: homology difficult to detect from sequence alone (<20% sequence identity)
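The twilight-zone threshold is stated as percent sequence identity, which is computed from two aligned sequences. Conventions differ on the denominator: aligned columns, alignment length, or shorter sequence length. This Python sketch assumes the first of these, counting only columns where neither row has a gap ('-'). The example rows are hypothetical.

```python
def percent_identity(row1: str, row2: str) -> float:
    """Identity over aligned columns where neither row has a gap ('-')."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row1, row2) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACGT-A", "ACCTGA"))  # 80.0 (4 identical out of 5 gap-free columns)
```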


Sequence Comparison

Pairwise sequence comparison

Multiple alignment:

SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV O15045

NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA P34562

KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS Q06704

REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET Q92805

MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV O42657

EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER O70365

DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS Q21071

STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK Q18013
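Pairwise comparison is typically done by dynamic programming. This Python sketch shows Needleman-Wunsch global alignment with a linear gap penalty. The scores (match +1, mismatch -1, gap -1) are assumed for illustration. Protein alignments like the one above would normally use a substitution matrix such as BLOSUM62 and affine gap penalties, and multiple alignment needs further methods not shown here.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning prefix a[:i] with prefix b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


score, top, bottom = needleman_wunsch("ACGTTGACTC", "ACGTGACTC")
print(score)  # 8
print(top)
print(bottom)
```

When several alignments tie, the traceback returns just one of them. It prefers a diagonal step over gaps.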


Phylogenetic Trees

Understand evolution
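The slide does not name a tree-building method. As one simple example, here is a Python sketch of UPGMA, a distance-based method that assumes a constant rate of evolution (a molecular clock). It repeatedly merges the two closest clusters and averages distances weighted by cluster size. The taxa and distances are hypothetical, and the output is a Newick string with branch lengths.

```python
from typing import Dict, List, Tuple


def upgma(labels: List[str], distances: Dict[Tuple[str, str], float]) -> str:
    """Build a rooted tree with UPGMA and return it in Newick format.

    distances: pairwise distances between all labels, keyed by (label1, label2).
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    # cluster id -> (Newick text, number of leaves, height of the cluster's root)
    clusters = {name: (name, 1, 0.0) for name in labels}
    while len(clusters) > 1:
        # 1. Merge the closest pair of clusters; the new root sits at half their distance.
        closest = min(d, key=d.get)
        x, y = sorted(closest)
        height = d[closest] / 2
        text_x, size_x, height_x = clusters.pop(x)
        text_y, size_y, height_y = clusters.pop(y)
        merged = f"({text_x}:{height - height_x:g},{text_y}:{height - height_y:g})"
        # 2. Distance to the new cluster = size-weighted average of the old distances.
        for z in clusters:
            d[frozenset((merged, z))] = (
                size_x * d[frozenset((x, z))] + size_y * d[frozenset((y, z))]
            ) / (size_x + size_y)
        # 3. Drop distances that involve the clusters just merged.
        d = {pair: value for pair, value in d.items() if x not in pair and y not in pair}
        clusters[merged] = (merged, size_x + size_y, height)
    return next(iter(clusters.values()))[0] + ";"


labels = ["A", "B", "C", "D"]  # hypothetical taxa
dist = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
        ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4}
print(upgma(labels, dist))  # ((A:1,B:1):2,(C:2,D:2):1);
```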


Protein Structure

Lysozyme structure:

Figure: ball & stick, strand, and surface representations


Structure Features of Folded Proteins

Compact

Secondary structures: loop, α-helix, β-sheet

Protein cores mostly consist of α-helices and β-sheets


Protein Structure Comparison

Structure is better conserved than sequence

A structure can tolerate a wide range of mutations.

Physical forces favor certain structures.

The number of folds is limited: currently ~700 known, with an estimated total of 1,000 to 10,000.

Figure: TIM barrel
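The slide does not say how structures are compared. A common measure is RMSD after optimal superposition. This Python sketch uses a simpler variant, distance RMSD (dRMSD), which compares all intramolecular atom-atom distances and so needs no superposition, since rotations and translations leave internal distances unchanged. It assumes the residue-to-residue correspondence is already known. Real structure comparison must usually find that correspondence first. The coordinates are hypothetical, with the 3.8 Å spacing typical of consecutive C-alpha atoms.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance RMSD: compare all intramolecular distances of two structures.

    Assumes a[i] and b[i] are equivalent atoms (e.g. the C-alpha of residue i).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long coordinate lists with >= 2 atoms")
    squared = [
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 4-atom fragment, and the same fragment rotated 90 degrees about z and shifted.
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in fragment]
print(round(drmsd(fragment, moved), 6))  # 0.0: rigid motions do not change internal distances
```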


Protein Folding Problem

A protein folds into a unique 3D structure under physiological conditions

Lysozyme sequence: KVFGRCELAA AMKRHGLDNY

RGYSLGNWVC AAKFESNFNT

QATNRNTDGS TDYGILQINS

RWWCNDGRTP GSRNLCNIPC

SALLSSDITA SVNCAKKIVS

DGNGMNAWVA WRNRCKGTDV

QAWIRGCRL


Structure-Function Relationship

A certain level of function can be inferred without a structure, but a structure is key to understanding the detailed mechanism.

A predicted structure is a powerful tool for function inference.

Figure: Trp repressor as a functional switch


Structure-Based Drug Design

HIV protease inhibitor

Structure-based rational drug design is still a major method for drug discovery.


Gene Expression

All cells have the same DNA, but only a few percent of genes are expressed in all of them (housekeeping genes).

A few examples:

(1) Specialized cells: hemoglobin is over-represented in red blood cells.

(2) Different stages of the life cycle: hemoglobins before and after birth (fetal vs. adult); caterpillar and butterfly.

(3) Different environments: microbes in nutrient-poor or nutrient-rich environments.

(4) Special treatment: response to a wound.


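Eukaryote Gene Expression Control

Figure: control points on the path from DNA to active protein:

DNA → primary RNA transcript (transcriptional control, in the nucleus)

Primary RNA transcript → mRNA (RNA processing control)

mRNA in the nucleus → mRNA in the cytosol (RNA transport control, across the nuclear membrane)

mRNA → inactive mRNA (mRNA degradation control)

mRNA → protein (translation control)

Protein → inactive protein (protein activity control)

Methods: mass spectrometry, microarray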


Gene Regulation

Figure: DNA sequence with the promoter and operator near the start of transcription
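Promoters and operators are DNA regions where regulatory proteins bind, so a first computational step is scanning a sequence for a binding-site motif on both strands. This Python sketch does exact, possibly overlapping matching. The motif "GGTA" and the sequence are hypothetical. Real binding sites are degenerate and are usually modeled with position weight matrices rather than exact strings.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna: str, motif: str) -> List[Tuple[int, str]]:
    """Exact (possibly overlapping) matches of motif on both strands.

    Positions are 0-based starts on the given (forward) sequence; a '-' hit
    means the motif reads 5'->3' on the opposite strand at that location.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets overlapping occurrences all be reported.
        for match in re.finditer(f"(?={re.escape(pattern)})", dna):
            hits.append((match.start(), strand))
    return sorted(hits)


print(find_sites("AGGTACCTTGGTA", "GGTA"))  # [(1, '+'), (3, '-'), (9, '+')]
```

A palindromic motif (one equal to its own reverse complement) is reported twice at each site, once per strand.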


Microarray Experiments

Figure: microarray chip and microarray data

Microarray data → regulation/function/pathway/cellular state/phenotype

Disease: diagnosis/gene identification/sub-typing
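Microarray measurements are usually analyzed as expression ratios and compared across conditions. This Python sketch assumes a two-color array, where each spot yields a sample and a reference intensity; the slide does not say which platform is meant. It computes the log2 ratio per spot, and the Pearson correlation between genes' profiles as one common similarity for grouping co-expressed genes. All intensities and profiles are hypothetical.

```python
import math
from typing import Sequence


def log2_ratio(sample: float, reference: float) -> float:
    """Log2 expression ratio of one spot: +1 means two-fold up, -1 two-fold down."""
    return math.log2(sample / reference)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles over the same conditions."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


print(log2_ratio(800.0, 200.0))  # 2.0 (four-fold higher in the sample)
gene1 = [1.0, 2.0, 3.0, 4.0]     # hypothetical log-ratio profiles across 4 conditions
gene2 = [2.0, 4.0, 6.0, 8.0]
gene3 = [4.0, 3.0, 2.0, 1.0]
print(pearson(gene1, gene2))  # 1.0
print(pearson(gene1, gene3))  # -1.0
```

A constant profile has zero variance, so `pearson` raises `ZeroDivisionError` for it. Real pipelines also filter and normalize intensities first.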


Genetic vs. Physical Interaction

Figure: a complex system viewed as a regulatory network (genetic interaction) and as gene/protein interactions (physical interaction); labels: transcription factor, expressed gene
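A regulatory network can be stored as a directed graph, with an edge from a transcription factor to each gene it regulates. This Python sketch uses breadth-first search to list everything downstream of one factor, together with the number of regulatory steps. The gene names and edges are hypothetical. Physical protein-protein interactions would instead be modeled as undirected edges.

```python
from collections import deque
from typing import Dict, List

# Hypothetical regulatory network: regulator -> genes it regulates.
NETWORK: Dict[str, List[str]] = {
    "tfA": ["geneB", "tfC"],
    "tfC": ["geneD", "geneE"],
    "geneB": [],
    "geneD": [],
    "geneE": ["geneB"],
}


def downstream(network: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Genes reachable from source, mapped to the fewest regulatory steps (BFS)."""
    steps = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in steps:  # first visit in BFS = shortest path
                steps[target] = steps[node] + 1
                queue.append(target)
    del steps[source]
    return steps


print(downstream(NETWORK, "tfA"))  # {'geneB': 1, 'tfC': 1, 'geneD': 2, 'geneE': 2}
```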


Biological Pathway


Studying Pathways through a Systems Biology Approach

Figure: sequence → structure → function → protein interaction → gene regulation → pathway (cross-talk)


Discussion

Possible impacts of biotechnology on our lives


Assignments

Required reading:

* Pavel Pevzner: Computational Molecular Biology: An Algorithmic Approach. MIT Press, 2000. Chapter 13.

* Larry Hunter: Molecular Biology for Computer Scientists.

Optional reading: http://www.ncbi.nih.gov/About/primer/bioinformatics.html

http://www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm