
Inferring phylogenetic trees

Feb 23, 2016





Inferring phylogenetic trees
Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington

[email protected]

One-minute responses
I did not understand anything in the Gibbs sampling and the second method.
The class was quite OK now. Understood most important things.
I understood 50% of the Python part. But I am a bit confused about the goal of the programs.
Please send us the slides immediately after lecture.
    I put the slides on the website during the Python half of the class. Hit refresh on the web browser to see them.
I didn't understand clearly converting scores to p-values, more especially putting 1 and 2. Otherwise everything was clear.
I think we should go a little bit slower.
I didn't understand the EM and Gibbs.
The concept of EM and Gibbs sampling are really very important. Please go in depth on them.
Python sessions are still fine as usual.
These algorithms are complex. Could you please explain them with a bit of some examples?
I didn't understand the second Python problem.
Emile must not mark our assessment on the programming part.

Revision - Gibbs
1. Randomly select motif occurrences from the sequences.
2. Randomly discard one sequence.
3. Build PSSM from remaining sequences: counts, add pseudocounts, normalize.
4. Scan discarded sequence with PSSM.
5. Choose new occurrence according to resulting probabilities.

Revision - EM
1. Randomly select motif occurrences from the sequences.
2. Build PSSM: counts, add pseudocounts, normalize, divide by background, take log2.
3. Scan each sequence with PSSM.
4. Take top-scoring occurrence.

Phylogenetic inference

[Figure: Rabbit, Dove, Lion, Donkey, ?]

Outline
Parsimony
Distance methods
    Computing distances
    Finding the tree
Maximum likelihood

Selecting a method
Choose set of related sequences
Obtain multiple sequence alignment
Is there strong sequence similarity?
    Yes: Maximum parsimony methods
    No: Is there clearly recognizable sequence similarity?
        Yes: Distance methods
        No: Maximum likelihood methods

Maximum parsimony
for each possible tree
    compute the parsimony score
return the tree with the best score
Enumerating these trees can take a very long time.
Computing this score is straightforward.

How many trees?
With four sequences: 3 unrooted trees

With five sequences: 15 unrooted trees.
With seven sequences: 945 unrooted trees.
[Figure: the three unrooted trees for four sequences, with leaves labeled 1 to 4.]

Computing parsimony scores
Scer AGAAAAATAACTTTCTCATG
Spar GGAAAAATAACTTTCTGACA
Smik AAAATAACTTCTCAACAATA
Skud ATCTTGATCCCTTGTGTTGA

Site with Scer = A, Spar = G, Smik = A, Skud = A:
    Tree (Scer, Spar | Smik, Skud): Score = 1
    Tree (Scer, Smik | Spar, Skud): Score = 1
    Tree (Scer, Skud | Smik, Spar): Score = 1
This site is uninformative, because all the trees have the same score.

Site with Scer = G, Spar = G, Smik = A, Skud = T:
    Tree (Scer, Spar | Smik, Skud): Score = 2
    Tree (Scer, Smik | Spar, Skud): Score = 2
    Tree (Scer, Skud | Smik, Spar): Score = 2

Site with Scer = A, Spar = A, Smik = T, Skud = T:
    Tree (Scer, Spar | Smik, Skud): Score = 1 (internal nodes A and T). This tree is best.
    Tree (Scer, Smik | Spar, Skud): Score = 2
    Tree (Scer, Skud | Smik, Spar): Score = 2

Per-site scores summed over the whole alignment:
    Tree (Scer, Spar | Smik, Skud)
    1 2 1 1 1 1 0 1 2 2 0 0 1 2 2 2 3 1 2 1
    Total = 26
    Tree (Scer, Smik | Spar, Skud)
    1 2 1 1 2 1 0 1 2 2 0 0 1 2 2 2 3 1 2 1
    Total = 27

Parsimony software
In general, the most widely used programs for phylogenetic analysis are
Phylip (Joe Felsenstein)
PAUP (David Swofford)
MacClade (David and Wayne Maddison)
All three do parsimony. Only Phylip is free.

Previous one-minute responses
How many sequences are usually analyzed by parsimony methods?
    Exhaustively, probably tens of sequences. With heuristic search methods, you can analyze arbitrarily many, but you lose the guarantee that you're finding the most parsimonious tree.
What do good parsimony scores look like?
    It depends upon how many sequences are involved, and how divergent they are.
Why doesn't the parsimony method take into account transitions versus transversions?
    It can; I presented the simplest version.

Jukes-Cantor model
Assume the same probability of change at all positions and all times.
d_AB is the proportion of changed sites in the alignment.
K_AB is the distance between sequences A and B.
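
For reference, the standard Jukes-Cantor correction relates the two quantities as shown below. This is assumed to be the formula the slide intends, and it does reproduce the 0.823959 printed in Problem #1, where 2 of the 4 sites differ. The correction is only defined for d_AB < 3/4.

```latex
K_{AB} = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,d_{AB}\right)

% Worked example (ACGT vs. ACCG): d_{AB} = 2/4 = 0.5
K_{AB} = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\cdot\frac{1}{2}\right)
       = -\frac{3}{4}\,\ln\!\left(\frac{1}{3}\right)
       \approx 0.823959
```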

Problem #1
Write a program jukes-cantor.py that takes as input a pairwise sequence alignment and prints the Jukes-Cantor distance. Skip sites that contain gaps.

> cat twoseqs.txt
ACGT
ACCG
> python jukes-cantor.py twoseqs.txt
0.823959
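
One possible approach to Problem #1 is sketched below. It is not the official solution. It assumes that gaps are written as '-', that the input file holds one aligned sequence per line, and that the distance is the standard Jukes-Cantor correction K = -(3/4) ln(1 - (4/3) d). The names jukes_cantor and read_alignment are illustrative.

```python
import math

GAP = "-"  # assumption: gaps in the alignment are written as '-'


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != GAP and b != GAP]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    # d is the proportion of changed sites among the ungapped columns.
    d = sum(a != b for a, b in pairs) / len(pairs)
    if d == 0:
        return 0.0
    if d >= 0.75:
        # The logarithm is undefined here; reporting infinity is a design choice.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def read_alignment(path):
    """Read one aligned sequence per non-empty line (assumed file layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# The two sequences from twoseqs.txt on the slide.
print(f"{jukes_cantor('ACGT', 'ACCG'):.6f}")  # prints 0.823959
```

To turn this into the command-line program the problem asks for, pass the two sequences returned by read_alignment(sys.argv[1]) to jukes_cantor and print the result.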

Problem #2
Generalize your previous program to work for a multiple sequence alignment.

> cat threeseqs.txt
ACGT
ACTG
ACGG
> python jukes-cantor-matrix.py threeseqs.txt
0.000 0.824 0.304
0.824 0.000 0.304
0.304 0.304 0.000
> jukes-cantor-multiple.py moreseqs.txt
0.000 0.233 0.383 0.233
0.233 0.000 0.824 0.572
0.383 0.824 0.000 0.107
0.233 0.572 0.107 0.000
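
A minimal sketch of one way to approach Problem #2 follows. It makes the same assumptions as before: '-' marks a gap and the distance is the standard Jukes-Cantor correction. Gapped sites are skipped pair by pair, which is one reasonable reading of the problem, not the only one. The names jukes_cantor and jc_matrix are illustrative.

```python
import math

GAP = "-"  # assumption: gaps in the alignment are written as '-'


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance, skipping sites where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != GAP and b != GAP]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    d = sum(a != b for a, b in pairs) / len(pairs)
    if d == 0:
        return 0.0
    if d >= 0.75:
        return math.inf  # correction undefined; infinity is a design choice
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def jc_matrix(seqs):
    """Symmetric matrix of pairwise Jukes-Cantor distances."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            matrix[i][j] = matrix[j][i] = jukes_cantor(seqs[i], seqs[j])
    return matrix


# The three sequences from threeseqs.txt on the slide.
for row in jc_matrix(["ACGT", "ACTG", "ACGG"]):
    print(" ".join(f"{value:.3f}" for value in row))
# prints:
# 0.000 0.824 0.304
# 0.824 0.000 0.304
# 0.304 0.304 0.000
```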
