Bio + Informatics: An Overview

Iranian Bioinformatics Portal

Dec 27, 2015


Felicia Barnett

Bio + Informatics

AAACTGCTGACCGGTAACTGAGGCCTGCCTGCAATTGCTTAACTTGGC

An Overview

Iranian Bioinformatics Portal (WWW.IBP.IR)


Outline

• Introduction
• DNA
• Definitions
• Problems in bioinformatics
• Conclusion


“Sciences reach a point where they become mathematized!”

Leonard Adleman


Computing Devices

• Computers → electronic components (transistors, …)
• Brains → biological components (neurons, …)
• Cells → biomolecular components (DNA, …)


DNA

• Deoxyribonucleic acid: DNA
• Four nucleotides (bases), or building blocks: A, T, G, C
• Zips itself up into helixes using base pairs:
  → A with T
  → G with C

DNA is essentially digital
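The base-pairing rule and the "digital" view of DNA can be illustrated with a small Python sketch. The function names and the 2-bit encoding (A=00, C=01, G=10, T=11) are assumptions chosen for illustration; they are not part of the original slides.

```python
# Base-pairing rule from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical 2-bit code: four bases fit exactly into two bits each.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with seq, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | TWO_BIT[base]
    return value


print(reverse_complement("AAACTG"))  # CAGTTT
print(encode("ACGT"))                # 27 (binary 00 01 10 11)
```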


Bioinformatics

• Biomolecular computation
  → idea: use biomolecules and biochemical processes for solving computational problems
• Computational molecular biology
  → goal: understand/explain biomolecular systems and mechanisms


After going through an age of specialization, the sciences are now reuniting into a common mode of inquiry. “The next generation could produce a scientist in the old sense, a real generalist.”

Leonard Adleman


Biomolecular Computation

• Idea: use biomolecules and biochemical processes for solving computational problems
• Starting point: Leonard Adleman, 1994
  → solving the Hamiltonian Path Problem using liquid-phase DNA chemistry
• Advantages:
  → fast (massively parallel)
  → efficient in energy consumption
  → great storage capabilities
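To make the Hamiltonian Path Problem concrete, here is a minimal brute-force sketch in Python. It follows a generate-and-filter pattern (enumerate candidate paths, keep the valid ones), which is loosely the spirit of the DNA approach, but it is ordinary software, not a model of the chemistry. The graph below is a made-up 4-vertex example, not Adleman's instance, and the function name is an assumption.

```python
from itertools import permutations


def hamiltonian_paths(vertices, edges, start, end):
    """Yield every path from start to end that visits each vertex exactly once.

    Brute force: tries all orderings of the intermediate vertices,
    so the running time grows factorially with the number of vertices.
    """
    edge_set = set(edges)
    inner = [v for v in vertices if v not in (start, end)]
    for middle in permutations(inner):
        path = (start, *middle, end)
        # Keep the candidate only if every consecutive step is a directed edge.
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            yield path


# Hypothetical directed graph used only for illustration.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]

for path in hamiltonian_paths(vertices, edges, start=0, end=3):
    print(path)
```

The factorial growth of the candidate set is the reason the problem is hard for conventional computers and the reason parallel exploration is attractive.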


Computational Molecular Biology

• Goal: understand/explain biomolecular systems and mechanisms

• Application of computer technology to the management of biological information.

• Using computers to gather, store, analyze and integrate biological and genetic information.

Bioinformatics


Problems in Bioinformatics


Sequencing Genomes

GAGGGAACACAGTCTGCACACTCCTTCCGATAT

GAGGGAACACA

GTCTGCACACT

CCTTCCGATAT


Sequencing Genomes

GAGGGAACACAGTCTGCACACTCCTTCCGATAT

GAGGGAACACAGT

AGTCTGCACACTC

CTCCTTCCGATAT


Sequencing Genomes

• Concrete problem: Sequence assembly problem
  → given: fragments of a large DNA sequence with overlaps (multiple coverage)
  → want: entire sequence
• Complicating factors
  → computational complexity: can be seen as a variation of the shortest common superstring problem, which is known to be NP-hard
  → incorrect/missing nucleotides in fragment data
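Because the exact problem is NP-hard, a common way to illustrate assembly is a greedy heuristic: repeatedly merge the two fragments with the largest suffix/prefix overlap. The Python sketch below applies it to the three overlapping fragments shown on the earlier slide. It assumes error-free reads and is not guaranteed to find the shortest superstring in general; the function names are illustrative.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Greedy superstring heuristic: merge the best-overlapping pair until one string remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the chosen pair, writing the shared overlap only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


reads = ["GAGGGAACACAGT", "AGTCTGCACACTC", "CTCCTTCCGATAT"]
print(greedy_assemble(reads))  # GAGGGAACACAGTCTGCACACTCCTTCCGATAT
```

With real data, sequencing errors and repeated regions mean overlaps have to be scored approximately rather than matched exactly.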


Relations between Organisms

• Concrete problem: Phylogenetic tree inference
  → given: homologous DNA sequences from multiple species
  → want: evolutionary tree relating these sequences
• Complicating factors
  → errors in sequences
  → complexity/quality of multiple sequence alignment
  → limited knowledge of evolutionary processes
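Distance-based tree methods usually start from pairwise distances between already-aligned sequences. The sketch below computes the simplest such measure, the fraction of differing positions (p-distance), and reports the closest pair, which a clustering method would typically join first. The three sequences and species labels are invented for illustration, and the tree-building step itself is not shown.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair of names to its p-distance."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


# Hypothetical aligned homologous sequences.
aligned = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "TCGAACGA",
}

matrix = distance_matrix(aligned)
closest = min(matrix, key=matrix.get)
print(matrix)
print("closest pair:", closest)
```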


Sequence Alignment


DNA-Genes-Proteins

• DNA is the basic molecule of life: it directly controls the fundamental biology of life
• Proteins determine the biological makeup of humans and all other living organisms
• Variations and errors in the genomic DNA may lead to different diseases or disorders

DNA → Genes → Proteins


DNA → Proteins

DNA (gene)
↓
mRNA
↓
Protein
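The two arrows correspond to transcription and translation. A minimal Python sketch follows, assuming the input is the coding strand of a gene (so transcription amounts to replacing T with U) and using the standard genetic code. The example sequence and function names are illustrative only.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (same sequence with U in place of T)."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTGA"          # hypothetical toy gene: Met, Ala, stop
mrna = transcribe(gene)
print(mrna)                 # AUGGCCUGA
print(translate(mrna))      # MA
```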


Computational Gene Finding

• Given: raw sequence data
• Predict:
  → coding and non-coding regions
  → exons/introns
  → splicing patterns
  → transcription factors

Pre-mRNA: Exon1, Intron1, Exon2, Intron2, Exon3
mRNA: Exon1, Exon2, Exon3
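A very simple way to guess coding regions from raw sequence, shown here only as an illustrative sketch, is to scan for open reading frames (ORFs): a start codon ATG followed in the same frame by a stop codon. Practical gene finders rely on richer statistical models and must also handle introns and the reverse strand, none of which is attempted here. The function name, the `min_codons` parameter, and the test sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end) index pairs of forward-strand ORFs, end exclusive.

    Each ORF starts at an ATG and ends just after the first in-frame stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGGCCTGATT"  # hypothetical input
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])  # 2 11 ATGGCCTGA
```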


Structure Prediction

• RNA & Protein
• Minimum free energy
• RNA structure:
  → primary structure: single-stranded sequence of A, U, G, C
  → secondary structure: intramolecular base pairs among its bases
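One common text encoding of a pseudoknot-free secondary structure is dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. The sketch below converts such a string into a list of base pairs and checks each pair against the usual pairing rules (A-U, G-C, and the G-U wobble pair). The notation, the example sequence, and the function names are not from the slides and are used here only for illustration.

```python
# Unordered base pairs normally allowed in RNA secondary structure.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def parse_dot_bracket(structure: str):
    """Return the sorted list of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def pairs_are_valid(sequence: str, pairs) -> bool:
    """True if every listed pair joins two bases that can actually pair."""
    return all(frozenset((sequence[i], sequence[j])) in ALLOWED_PAIRS for i, j in pairs)


sequence = "GGGAAAUCC"     # primary structure (hypothetical)
structure = "(((...)))"    # secondary structure: a three-pair stem with a hairpin loop
pairs = parse_dot_bracket(structure)
print(pairs)                             # [(0, 8), (1, 7), (2, 6)]
print(pairs_are_valid(sequence, pairs))  # True
```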


5’-GAGGGAACACAGUCUGCACACUCCUUC-3’

Secondary Structure


Arc Diagram Representation


Loops

[Figure: secondary structure over positions 1 to 27, with the following loop types labeled]

• Hairpin loop
• Interior loop
• Multi loop
• External loop
• Bulge loop
• Stacked pair


Pseudoknotted Structure

[Figure: pseudoknotted structure over positions 1 to 13]


Structure Prediction Algorithms

• Dynamic programming algorithms
  → restricted class of pseudoknotted structures
  → Rivas and Eddy (R&E): O(N^6)
• Heuristic algorithms
  → search over the solution space
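To show the flavor of dynamic programming over RNA structures, here is a sketch of the classic Nussinov-style recurrence, which maximizes the number of nested base pairs in O(N^3) time. It is a deliberately simpler stand-in: it does not minimize free energy, it cannot produce pseudoknots, and it is not the Rivas and Eddy algorithm. The `min_loop` parameter (minimum number of unpaired bases inside a hairpin) and the example sequence are assumptions.

```python
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    dp[i][j] holds the best count for the subsequence seq[i..j].
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, splitting the problem in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRABLE:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3
```

Allowing pseudoknots means allowing crossing pairs, which breaks the clean split into independent subproblems used above; that is why exact algorithms for pseudoknotted classes are so much more expensive.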


Motif Discovery


Genes and Diseases

• Proteins perform most of life’s essential functions
• Changes in the genomic DNA sequence can have disastrous consequences


Real World Applications


Related Aspects

• Computational models of organisms or biological systems
• Nature-inspired algorithms
  → genetic algorithms
  → neural networks
  → ant colony optimization
• Artificial life
  → life-like behavior of artificial systems
  → (re)design of biological organisms


Conclusion

• Bioinformatics: Using computers for gathering, storing and analyzing biological data

• Analyzing


Thank you!

Baharak Rastegari, [email protected]

Bio Informatics


Genetic Process


DNA

• Gene expression?
• Two genes

DNA → Genes → Proteins


Genomic Sequence Data Interpretation

• Gene finding
• Structure prediction
• Pattern discovery
• Classification
• Clustering


Understanding the Cell

• Concrete problem: Gene regulatory relationship inference
  → given: expression profiles of two genes A, B
  → want: decide if there is a (direct) regulatory relationship between A and B, and whether it is an activating or inhibiting one
• Complicating factors
  → imprecision/limitation in measuring expression profiles
  → indirect/complex regulatory relationship
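A first, rough screen for such a relationship is the correlation between the two expression profiles: a strongly positive value is compatible with activation, a strongly negative one with inhibition. The sketch below computes the Pearson correlation by hand. The expression values, the 0.8 threshold, and the function names are invented for illustration, and correlation alone cannot tell a direct relationship from an indirect one, which is exactly the complicating factor listed above.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def describe(r: float, threshold: float = 0.8) -> str:
    """Turn a correlation into a cautious label; the threshold is an arbitrary choice."""
    if r >= threshold:
        return "possibly activating"
    if r <= -threshold:
        return "possibly inhibiting"
    return "no clear relationship"


# Hypothetical expression levels of genes A and B across five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [10.0, 8.0, 6.0, 4.0, 2.0]

r = pearson(gene_a, gene_b)
print(round(r, 3), describe(r))
```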
