Statistical methods in Bioinformatics
Page 1: Bio info statistical-methods[1]

Statistical methods in Bioinformatics

Page 2: Bio info statistical-methods[1]

Dot Matrix
•First described by Gibbs and McIntyre (1970)
•Dot matrix analysis of DNA sequences (window W=11, stringency S=7): phage P22 c2 repressor vs. phage lambda cI

Page 3: Bio info statistical-methods[1]

Dot Matrix
•Dot matrix analysis of amino acid sequences (W=1, S=1): phage lambda cI vs. phage P22 c2 repressor

Page 4: Bio info statistical-methods[1]

Filtering in Dot Matrix
•Filtering can be applied using sliding windows

           Window size   Match requirement (stringency)
DNA        15            10
Protein    2 or 3        2

•For DNA: long windows, higher stringency
•For proteins: short windows, low stringency
•For protein domains: long windows, low stringency
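The window/stringency filter can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of any program named in these slides; the function names `dot_matrix` and `render` and the example sequences are made up for the illustration. A dot is placed at (i, j) when at least `stringency` of the `window` characters starting at those positions match.

```python
def dot_matrix(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) positions where a dot is placed.

    A dot is placed at (i, j) when at least `stringency` of the `window`
    aligned characters starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: seq_a across the top, seq_b down the side."""
    lines = ["  " + seq_a]
    for j, ch in enumerate(seq_b):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_a)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only to show the effect.
    a = "GATTACAGATTACA"
    b = "GATTTACAGATACA"
    print(render(a, b, dot_matrix(a, b, window=1, stringency=1)))  # unfiltered
    print()
    print(render(a, b, dot_matrix(a, b, window=4, stringency=3)))  # filtered
```

With window=1 and stringency=1 every identical pair of characters gets a dot, so random matches clutter the plot. Raising the window and the match requirement leaves mainly the diagonals.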

Page 5: Bio info statistical-methods[1]

Dot Matrix Programs

•DNA Strider
•DOTTER
•COMPARE
•DOTPLOT

For sequence repeats:
•LALIGN
•PLALIGN

Page 6: Bio info statistical-methods[1]

LALIGN/PLALIGN

Page 7: Bio info statistical-methods[1]

Dot plot for Repeat analysis (Window=1, Stringency=1)

Page 8: Bio info statistical-methods[1]

Dot plot for Repeat analysis (Window=23, Stringency=7)

Page 9: Bio info statistical-methods[1]

Dynamic programming
•Compares every pair of characters in the two sequences and generates an alignment

•Alignment includes matches, mismatches and gaps

•Alignments obtained depend on the choice of scoring system
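The points above can be illustrated with a minimal global alignment sketch in the style of Needleman and Wunsch. The scoring values (match=1, mismatch=-1, gap=-2, with a simple linear gap penalty) and the function name `global_align` are assumptions chosen for illustration, not values taken from these slides. Changing them changes which alignment is reported.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,   # match or mismatch
                score[i - 1][j] + gap,     # gap in b
                score[i][j - 1] + gap,     # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = global_align("ACGT", "AGT")
    print(total)
    print(row_a)
    print(row_b)
```

When several alignments share the best score, this sketch reports only one of them (it prefers the diagonal move during traceback).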

Page 10: Bio info statistical-methods[1]

Programs for alignment of sequences

Page 11: Bio info statistical-methods[1]

Scoring using Gap penalty

Page 12: Bio info statistical-methods[1]

Derivation of Dynamic programming algorithm

Page 13: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 14: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 15: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 16: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 17: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 18: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 19: Bio info statistical-methods[1]

Dynamic programming Algorithm

Page 20: Bio info statistical-methods[1]

Formal description of Algorithm

Page 21: Bio info statistical-methods[1]

Global and Local alignments

Page 22: Bio info statistical-methods[1]

Global and Local alignments

Page 23: Bio info statistical-methods[1]

Scoring matrices

•Certain amino acid substitutions are common in related proteins from different species

•Proteins still function with these substitutions

Page 24: Bio info statistical-methods[1]

Scoring matrices

Page 25: Bio info statistical-methods[1]

Scoring matrices

•Probability of changing A → B is identical to the probability of changing B → A

Page 26: Bio info statistical-methods[1]

PAM (Percent Accepted Mutation)

•Based on evolutionary principles

•Each matrix gives the changes expected for a given period of evolutionary time

•Each change at a particular site is assumed to be independent of previous mutational events

•Estimations are based on 1572 changes in 71 groups of protein sequences that were at least 85% similar
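Because each change is assumed to be independent of earlier ones, a matrix for a longer evolutionary period can be obtained by multiplying the one-step matrix by itself (PAM250 is PAM1 raised to the 250th power). The sketch below shows the idea on a toy three-letter alphabet. The numbers in `TOY_PAM1` are hypothetical and are not Dayhoff's values, and the helper names are made up for illustration.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, n):
    """Raise square matrix m to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step substitution probabilities for a toy 3-letter
# alphabet. Entry [i][j] is the probability that letter i is replaced by
# letter j in one step; each row sums to 1. The matrix is symmetric, so
# the row/column convention does not matter here.
TOY_PAM1 = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

if __name__ == "__main__":
    toy_pam250 = mat_pow(TOY_PAM1, 250)
    for row in toy_pam250:
        print(["%.3f" % p for p in row])
```

After many steps the diagonal (no-change) probabilities fall and the off-diagonal ones rise, which is why higher-numbered PAM matrices suit more distantly related sequences.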

Page 27: Bio info statistical-methods[1]

Scoring matrices

Page 28: Bio info statistical-methods[1]

PAM (Percent Accepted Mutation)

PAM1 matrix estimates what rate of substitution would be expected if 1% of the amino acids had changed

Similarity   Matrix used
40%          PAM120
50%          PAM80
60%          PAM60
14-27%       PAM250

Page 29: Bio info statistical-methods[1]

BLOSUM (Blocks Amino acid Substitution Matrices)

Matrix values are based on amino acid substitutions in a large set of ~2000 conserved amino acid patterns (blocks)

Note: patterns are found by the MOTIF program
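How block columns turn into matrix values can be sketched as follows, assuming the usual BLOSUM log-odds recipe: count residue pairs in every column, compare the observed pair frequency with the frequency expected by chance, and score 2·log2(observed/expected), rounded. The toy block and the function names are hypothetical, and the sequence clustering step used for the real matrices is left out.

```python
import math
from collections import Counter
from itertools import combinations


def pair_frequencies(block):
    """Observed frequency of each unordered residue pair over all columns."""
    counts = Counter()
    for column in zip(*block):                 # walk the block column by column
        for x, y in combinations(column, 2):   # every pair of sequences
            counts[tuple(sorted((x, y)))] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def log_odds(q):
    """Rounded 2*log2(observed/expected) for every observed pair."""
    # Background frequency of each residue, derived from the pair frequencies.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[x] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    # Hypothetical block: 4 aligned sequences, 3 columns.
    toy_block = ["AWK", "AWR", "AWK", "SWK"]
    q = pair_frequencies(toy_block)
    for pair, score in sorted(log_odds(q).items()):
        print(pair, score)
```

As a check on the counting step, a single column with nine A and one S yields 36 A-A pairs and 9 A-S pairs. Pairs that never occur in the block have no defined score in this sketch, and a block this small gives scores that are not biologically meaningful.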

Page 30: Bio info statistical-methods[1]

BLOSUM – Derivation of the Matrix values

Page 31: Bio info statistical-methods[1]

PAM 250

Page 32: Bio info statistical-methods[1]

BLOSUM62

Page 33: Bio info statistical-methods[1]

BLAST home page

Page 34: Bio info statistical-methods[1]

BLAST

Page 35: Bio info statistical-methods[1]

BLAST results

Page 36: Bio info statistical-methods[1]

BLAST results

Page 37: Bio info statistical-methods[1]

BLAST results