
Multiple Alignment and Phylogenetic Trees

Csc 487/687 Computing for Bioinformatics

Dec 31, 2015



Multiple Sequence Alignment

• One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud.

• Very informative

Definition

• A global alignment of a set of sequences is obtained by
   – inserting gap characters into each sequence

• so that
   – the resulting sequences are of the same length

• and so that
   – no “column” has only gap characters
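The two conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as a list of equal-role strings with "-" as the gap character; the function name is illustrative and not part of the course material.

```python
def is_valid_alignment(rows, gap="-"):
    """Return True if `rows` satisfies the definition of a global alignment."""
    if not rows:
        return False
    width = len(rows[0])
    # Condition 1: all gapped sequences have the same length.
    if any(len(row) != width for row in rows):
        return False
    # Condition 2: no column consists only of gap characters.
    return all(any(ch != gap for ch in column) for column in zip(*rows))


print(is_valid_alignment(["AC-GT", "A-CGT", "ACCG-"]))  # True
print(is_valid_alignment(["AC-GT", "A--GT"]))           # False: third column is all gaps
print(is_valid_alignment(["ACGT", "ACG"]))              # False: lengths differ
```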

Example: Chromo domains aligned

Use of alignments

• High sequence similarity usually means significant structural and/or functional similarity. The reverse need not be true.

• Homologous proteins (sharing a common ancestor) can vary significantly over large parts of their sequences but still retain common 2D-patterns, 3D-patterns, or a common active site or binding site.

• Comparing several sequences in a family can reveal what is common to the family. A feature shared by several sequences can be significant when all of the sequences are considered, even though it need not be significant when only two are compared.

• Multiple alignment can be used to derive evolutionary history.

Use of alignments

• Predict features of aligned objects
   – conserved positions
      • structurally/functionally important


Conserved positions
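One simple way to look for conserved positions is to measure, for each column, how often its most common residue occurs. The sketch below assumes a toy alignment and an illustrative threshold; it is not the method used to produce the figure on this slide.

```python
from collections import Counter


def column_conservation(rows, gap="-"):
    """Fraction of rows that carry the most common residue, per column."""
    scores = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch != gap]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        # Gaps count against conservation because we divide by all rows.
        scores.append(most_common_count / len(column))
    return scores


def conserved_columns(rows, threshold=1.0):
    """0-based indices of columns whose conservation reaches `threshold`."""
    return [i for i, s in enumerate(column_conservation(rows)) if s >= threshold]


alignment = ["KWEG", "KWDG", "RW-G"]  # toy alignment, made up for illustration
print(conserved_columns(alignment))  # [1, 3]: the W and G columns
```

Lowering the threshold (for example to 0.6) would also report columns that are mostly, but not fully, conserved.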

Use of alignments

• Predict features of aligned objects
   – conserved positions
      • structurally/functionally important
   – patterns of hydrophobicity/hydrophilicity
      • secondary structure elements


Helix pattern
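A hydrophobicity pattern can be read off an alignment by labeling each column as mostly hydrophobic or mostly polar. The sketch below is only an illustration: the set of hydrophobic residues is one simplified classification chosen here as an assumption, and the toy alignment is made up. Hydrophobic columns recurring at spacings of three and four are commonly read as a hint of an amphipathic helix, since a helix has about 3.6 residues per turn.

```python
# Assumption: a simplified hydrophobic class; other classifications exist.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(rows, gap="-"):
    """Fraction of non-gap residues in each column that are hydrophobic."""
    fractions = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch != gap]
        hydrophobic = sum(ch in HYDROPHOBIC for ch in residues)
        fractions.append(hydrophobic / len(residues) if residues else 0.0)
    return fractions


def hydrophobicity_pattern(rows, threshold=0.5):
    """One letter per column: 'h' (hydrophobic) or 'p' (polar)."""
    return "".join("h" if f >= threshold else "p" for f in hydrophobic_fraction(rows))


alignment = ["LKELLKK", "IKDLIRK", "LRELLKR"]  # toy alignment
print(hydrophobicity_pattern(alignment))  # hpphhpp
```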

Use of alignments

• Predict features of aligned objects
   – conserved positions
      • structurally/functionally important
   – patterns of hydrophobicity/hydrophilicity
      • secondary structure elements
   – “gappy” regions
      • loops/variable regions

[Figure: alignment with three gappy regions, each labeled “Loop?”]
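Gappy regions can be located by computing the gap fraction of each column and collecting runs of columns above a threshold. This is a minimal sketch with an assumed threshold of 0.5 and a made-up alignment; whether such a run really corresponds to a loop still has to be judged against structural knowledge.

```python
def gap_fraction(rows, gap="-"):
    """Fraction of rows that have a gap in each column."""
    return [column.count(gap) / len(column) for column in zip(*rows)]


def gappy_regions(rows, threshold=0.5, gap="-"):
    """Runs of gappy columns as (start, end) pairs, 0-based, end-exclusive."""
    regions = []
    start = None
    for i, fraction in enumerate(gap_fraction(rows, gap)):
        if fraction >= threshold:
            if start is None:
                start = i  # a new gappy run begins here
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:  # run extends to the last column
        regions.append((start, len(rows[0])))
    return regions


alignment = ["MKV--GSLT", "MKVA-GSLT", "MRV---SLT", "MKVPDGSLS"]  # toy alignment
print(gappy_regions(alignment))  # [(3, 5)]
```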

Use of Alignments - make patterns/profiles

• An alignment can be used to make a profile or a pattern, which is then matched against a sequence database to identify new family members

• Profiles/patterns can be used to predict family membership of new sequences

• Databases of profiles/patterns
   – PROSITE
   – PFAM
   – PRINTS
   – ...
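To make the idea of a profile concrete, the sketch below builds per-column residue frequencies from a toy alignment and slides the profile along a new sequence. Everything here is an assumption chosen for illustration: the pseudocount, the uniform background, the gapless matching, and the restriction to the 20 standard amino-acid letters. The profile databases listed above use more elaborate models.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def build_profile(rows, pseudocount=1.0, gap="-"):
    """Per-column residue frequencies, smoothed with a pseudocount."""
    profile = []
    for column in zip(*rows):
        counts = Counter(ch for ch in column if ch != gap)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return profile


def score_window(profile, window):
    """Log-odds of `window` under the profile versus a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log(col[aa] / background) for col, aa in zip(profile, window))


def best_match(profile, sequence):
    """(score, start) of the best gapless match; sequence must be long enough."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


family = ["KWEG", "KWDG", "RWEG"]  # toy family alignment
score, start = best_match(build_profile(family), "AAKWEGAA")
print(start, round(score, 2))
```

A positive score at some offset suggests the new sequence resembles the family there; a real application would also need a score threshold calibrated on known members.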

Prosite: Motifs for classification

[Diagram: a protein sequence is compared with Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, which correspond to Family 1, Family 2, ..., Family n. A motif is given either as a pattern (regular expression) or as a profile.]

Pattern from alignment

[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
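A pattern in this notation maps directly onto a regular expression: "x" matches any residue, square brackets list allowed residues, and "(n)" or "(n,m)" gives a repeat count. The sketch below converts that subset of the notation (plus curly braces for excluded residues and "<"/">" anchors) and searches a made-up test sequence. The function name and the test sequence are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.replace("x", ".")                       # any residue
        element = element.replace("{", "[^").replace("}", "]")    # excluded residues
        element = element.replace("(", "{").replace(")", "}")     # repeat counts
        element = element.replace("<", "^").replace(">", "$")     # terminus anchors
        parts.append(element)
    return "".join(parts)


pattern = ("[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-"
           "[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]")
regex = prosite_to_regex(pattern)
print(regex)

# Made-up sequence constructed to contain one match starting at index 2.
sequence = "GG" + "FALKWAGFAAAAASWEPAAAL" + "GG"
match = re.search(regex, sequence)
print(match.start(), match.group())
```

Scanning every sequence in a database with the compiled expression is the simplest form of the pattern matching described on the previous slides.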

Alignment problem

Given a set of sequences, produce a multiple alignment which corresponds as well as possible to the biological relationships between the corresponding bio-molecules

For homologous proteins

• Two residues should be aligned (on top of each other)
   – if they are homologous (evolved from the same residue in a common ancestor protein)
   – if they are structurally equivalent

Automatic approach

• Need a way of scoring alignments
   – fitness function which for an alignment quantifies its “goodness”

• Need an algorithm for finding alignments with good scores

• Not all methods provide a scoring function for the final alignment!
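As one concrete example of a fitness function, the sketch below computes a sum-of-pairs score: every pair of rows is scored column by column and the results are added up. The slides do not say which function the course uses, so both the choice of sum-of-pairs and the score values (match +1, mismatch -1, residue against gap -2, gap against gap 0) are assumptions made here for illustration.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap_penalty=-2, gap="-"):
    """Score one pair of aligned characters."""
    if a == gap and b == gap:
        return 0  # two gaps facing each other cost nothing
    if a == gap or b == gap:
        return gap_penalty
    return match if a == b else mismatch


def sum_of_pairs(rows):
    """Add up pair scores over all columns and all pairs of rows."""
    return sum(
        pair_score(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


print(sum_of_pairs(["ACG", "A-G", "ATG"]))  # 3 + (-5) + 3 = 1
```

In practice the match/mismatch values would come from a substitution matrix rather than fixed constants.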

Analysis of fitness function

• One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences

• For example, this can be done if the structures of (some of) the proteins are known.

Align by use of dynamic programming

• Dynamic programming finds the best alignment of k sequences under a given scoring scheme

• For two sequences there are three different column types

• For three sequences there are seven different column types (x means an amino acid, - a blank)

   Sequence1  x - x x - - x
   Sequence2  x x - x - x -
   Sequence3  x x x - x - -

• Time complexity of O(n^k) (sequence lengths = n)
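The counts of three and seven column types follow from a simple rule: each of the k rows holds either a residue or a blank, and the all-blank column is excluded, giving 2^k - 1 types. The sketch below enumerates them and also prints the number of cells in a k-dimensional dynamic programming table, (n + 1)^k, for an assumed length n = 100, to show how quickly the cost grows.

```python
from itertools import product


def column_types(k):
    """All columns over k rows with 'x' (residue) or '-' (blank), except all blanks."""
    return [column for column in product("x-", repeat=k) if "x" in column]


for k in (2, 3, 4):
    print(k, len(column_types(k)))  # 3, 7, 15 column types

for column in column_types(3):
    print(" ".join(column))

n = 100  # assumed sequence length, for illustration only
for k in (2, 3, 4):
    print(k, (n + 1) ** k)  # number of table cells to fill
```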

Use of dynamic programming

• Dynamic programming finds the best alignment of k sequences under a given scoring scheme

Algorithm for dynamic programming
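The algorithm shown on this slide is not preserved in the transcript. As a stand-in, here is a minimal sketch of the two-sequence case (k = 2), where each table cell is filled from the three column types: residue over residue, residue over blank, and blank over residue. The score values and the linear gap cost are assumptions. The k-sequence version follows the same idea with a k-dimensional table and 2^k - 1 predecessor cells per entry.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences by dynamic programming."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # column type: residue over blank
            left = score[i][j - 1] + gap  # column type: blank over residue
            score[i][j] = max(diag, up, left)

    # Trace back from the last cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if s[i - 1] == t[j - 1] else mismatch
        ):
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = global_align("ACGT", "AGT")
print(best)  # 1
print(row1)  # ACGT
print(row2)  # A-GT
```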