INSTRAL: Discordance-aware Phylogenetic Placement using Quartet Scores

Maryam Rabiee

Department of Computer Science, University of California, San Diego

Source: tandy.cs.illinois.edu/INSTRAL-tutorial-v1.pdf

Apr 30, 2021


Phylogenetic placement

➤ Given:

    • an existing tree

    • some new data not in the tree

➤ Find: the best position of the new data on the tree

➤ Why?

    ➤ As new data emerge, trees become outdated

        • De novo phylogenetic reconstruction is expensive

    ➤ Sample identification for query sequences, especially for mixed samples from the environment


[Figure: gene trees on species x, y, w, z and a backbone tree on X, Y, W, Z. For each of gene 1, gene 2 and gene 3, the alignment of x, w, y, z is shown together with a query sequence from the new species A.]

Placement tools:

• EPA-ng [Barbera et al., Sys Bio., 2018]

• SEPP [Mirarab et al., Biocomputing, 2012]

• APPLES [Balaban et al., Sys Bio., 2019]

• PPlacer [Matsen et al., BMC Bioinformatics, 2010]


[Figure: the same gene trees and alignments (gene 1, gene 2, gene 3), with the new species A on the gene trees, next to the backbone tree on X, Y, W, Z and a tree on X, Y, W, Z and A.]


INSTRAL (INsertion of New Species using asTRAL)

[Rabiee and Mirarab, SysBio, 2019]

[Figure: gene trees that include the new species A and the backbone tree on X, Y, W, Z as input; a tree on X, Y, W, Z and A as output.]

Output: a species tree on n+1 species that induces the backbone tree and has the maximum quartet score with respect to the gene trees
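
The objective above can be written compactly. The notation is introduced here only for illustration and does not come from the slides: B is the backbone tree on n species, s is the new species, G is the set of gene trees, L(t) is the leaf set of a tree t, Q(t) is the set of quartet topologies induced by t, and T|_{L(B)} is T restricted to the backbone species.

```latex
T^{*} \;=\; \operatorname*{arg\,max}_{\substack{T \,:\; L(T) = L(B) \cup \{s\} \\ T|_{L(B)} = B}}
\;\; \sum_{g \in G} \bigl| Q(T) \cap Q(g) \bigr|
```

The constraint T|_{L(B)} = B is the "induces the backbone tree" condition, and the sum is the quartet score of T with respect to the gene trees.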


Quartet support

[Figure: a tree on Gorilla, Orang., Chimp, Human and Bonobo, shown with the five quartet trees ("Quartets:") obtained by restricting it to four of the five species at a time.]


INSTRAL algorithm

➤ Finds the placement with maximum quartet support versus the gene trees

➤ Looks at all possible placements and finds the exact solution with no heuristics

➤ Runs in polynomial time with respect to the number of species and the number of genes
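
To make "maximum quartet support over all possible placements" concrete, here is a minimal brute-force sketch. It is not the INSTRAL implementation: it simply tries every branch and rescores from scratch, which is only practical for tiny trees. Trees are written as nested Python tuples rather than Newick, and all function names are hypothetical.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set of tree, leaf sets of all internal nodes in tree)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found += child_found
    found.append(leaves)
    return leaves, found


def quartets(tree):
    """Map each 4-leaf subset to the split it induces, e.g. {a,b}|{c,d}."""
    leaves, found = clades(tree)
    result = {}
    for quad in combinations(sorted(leaves), 4):
        q = frozenset(quad)
        for clade in found:
            pair = clade & q
            if len(pair) == 2:  # this branch separates the pair from the rest
                result[q] = frozenset([pair, q - pair])
                break
    return result


def quartet_score(tree, gene_trees):
    """Count gene-tree quartets that have the same topology in tree."""
    tree_quartets = quartets(tree)
    return sum(
        1
        for gene in gene_trees
        for q, split in quartets(gene).items()
        if tree_quartets.get(q) == split
    )


def insertions(tree, new_leaf):
    """Yield every tree obtained by attaching new_leaf to one branch of tree."""
    yield (tree, new_leaf)
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            for placed in insertions(child, new_leaf):
                yield tree[:i] + (placed,) + tree[i + 1:]


def place(backbone, gene_trees, new_leaf):
    """Return the insertion of new_leaf with the highest quartet score."""
    return max(insertions(backbone, new_leaf),
               key=lambda t: quartet_score(t, gene_trees))


# Toy input (made up for this sketch): two genes put A next to X, one next to Y.
backbone = (("X", "Y"), ("W", "Z"))
gene_trees = [
    ((("A", "X"), "Y"), ("W", "Z")),
    ((("A", "X"), "Y"), ("W", "Z")),
    (("X", ("A", "Y")), ("W", "Z")),
]
best = place(backbone, gene_trees, "A")
print(best, quartet_score(best, gene_trees))
```

On this toy input the best placement makes A sister to X, with a quartet score of 13 out of a possible 15.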


Measuring accuracy

• Remove one leaf at a time from the true species tree

• Add back the left-out species

• Measure node distance: the number of branches between the correct placement and the reported placement

• 0 means perfect placement
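
The measure above can be sketched in a few lines. This is an illustration under stated assumptions, not the evaluation code behind the results: the backbone is a nested tuple with a trifurcation at the top (the usual way to write an unrooted tree), a branch is named by the set of leaves below it, and the distance counts the internal nodes on the path between the two branches, so the same branch gives 0 and two branches that share a node give 1. All names are hypothetical.

```python
def leaf_set(node):
    """Return the set of leaf names below node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def ancestor_chains(root):
    """Map each branch to the list [branch, parent branch, ..., "root"]."""
    chains = {}

    def walk(node, above):
        chain = [leaf_set(node)] + above
        chains[chain[0]] = chain
        if not isinstance(node, str):
            for child in node:
                walk(child, chain)

    for child in root:
        walk(child, ["root"])  # "root" is a sentinel for the top node
    return chains


def node_distance(backbone, true_branch, reported_branch):
    """Count the nodes on the path between two branches of backbone."""
    chains = ancestor_chains(backbone)
    a = chains[frozenset(true_branch)]
    b = chains[frozenset(reported_branch)]
    # First element of a that also appears in b: where the two chains meet.
    i, j = next((i, b.index(x)) for i, x in enumerate(a) if x in b)
    return i + j if 0 in (i, j) else i + j - 1


backbone = (("X", "Y"), "W", "Z")
print(node_distance(backbone, {"X"}, {"X"}))       # same branch
print(node_distance(backbone, {"X"}, {"X", "Y"}))  # parent branch
print(node_distance(backbone, {"X"}, {"W"}))       # across the top node
```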


INSTRAL accuracy

[Figure: node distance (axis ticks 0.1 to 0.4) by method (CA-ML (EPA-ng), INSTRAL+de novo, INSTRAL+EPA-ng), with panels labeled by Genes (50) and ILS (Moderate, High, Very High).]


Comparison with concatenation

[Figure: node distance (axis ticks 0.1 to 0.4) by method (CA-ML (EPA-ng), INSTRAL+de novo, INSTRAL+EPA-ng), with panels labeled by Genes (50) and ILS (Moderate, High, Very High).]

EPA-ng: maximum-likelihood method for placement [Barbera et al., Sys Bio., 2018]


INSTRAL running time

[Figure: running times in seconds (axis values 1.32, 16, 64, 256, 1024) versus backbone tree size (250, 500, 1000, 2500, 5000, 10000).]


INSTRAL on large trees

➤ INSTRAL was able to insert ~70k new genomes into a tree with 10k genomes, creating a tree with ~100k leaves, in around a week of computation (10 nodes with 24 cores)

[Zhu et al., Nature Communications, 2019]


(mini) Tutorial

• Software available at GitHub site:

    • https://github.com/maryamrabiee/INSTRAL

    • https://github.com/maryamrabiee/Constrained-Search

• See README at GitHub site: https://github.com/maryamrabiee/INSTRAL

• Publication:

    • Maryam Rabiee, Siavash Mirarab, INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores, Systematic Biology, syz045, https://doi.org/10.1093/sysbio/syz045


Step 0a: updating gene alignments

• Add new sequences into existing gene alignments

• Tools:

    • SEPP [Mirarab et al., 2012]

    • UPP [Nguyen et al., 2015]

    • HMMER [Potter et al., 2018]

    • MAFFT --addfragments [Katoh et al., 2019]

• Example:

    Existing alignment: ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG

    New sequence: CATTGCT

    Updated alignment: ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG ---CAT-TGCT-


Step 0b: updating gene trees

• De novo reconstruction of gene trees from alignments

    • RAxML [Kozlov et al., Bioinformatics, 2019], FastTree [Price et al., PLoS ONE, 2010], IQ-TREE [Nguyen et al., Mol. Biol. Evol., 2015],…

• Placement on the existing gene trees

    • PPlacer, EPA, SEPP,…


Step 1: prepare input

• Concatenate all the updated gene trees in Newick format into a file

• The backbone tree should also be in Newick format
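
The concatenation step can be sketched as below. This assumes each gene tree file stores one Newick string per line; the function name and file names are hypothetical, and the semicolon check is only a light sanity test, not a Newick validator.

```python
from pathlib import Path


def concatenate_gene_trees(tree_files, output_file):
    """Write every Newick tree found in tree_files to output_file, one per line.

    Returns the number of trees written.
    """
    count = 0
    with open(output_file, "w") as out:
        for path in tree_files:
            for line in Path(path).read_text().splitlines():
                newick = line.strip()
                if not newick:
                    continue  # skip blank lines
                if not newick.endswith(";"):
                    raise ValueError(f"{path}: tree does not end with ';'")
                out.write(newick + "\n")
                count += 1
    return count
```

For example, `concatenate_gene_trees(["gene1.tre", "gene2.tre"], "estimatedgenetrees.tre")` would produce the gene tree file used in the next step.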


Step 2: Run INSTRAL

java -Djava.library.path=/path_to_repo/lib/ -jar instral.5.13.4.jar -i estimatedgenetrees.tre -f backbone.nwk -o placement.out --placement new_species_label --no-scoring -C > placement.br 2> log.txt

• Use -Xmx for large datasets to increase memory

• Use -T for the multi-threaded version
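
When the command has to be run for many inputs, it can help to assemble it in a script. The sketch below uses only the options that appear in the command above, plus an optional -Xmx value; the function names and the "8g" heap size are hypothetical. -T is left out because the slide does not show its exact form.

```python
import subprocess


def instral_command(jar, lib_dir, gene_trees, backbone, output, new_species,
                    max_heap=None):
    """Build the INSTRAL command line as an argument list."""
    cmd = ["java"]
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # JVM options go before -jar
    cmd += [
        f"-Djava.library.path={lib_dir}",
        "-jar", jar,
        "-i", gene_trees,
        "-f", backbone,
        "-o", output,
        "--placement", new_species,
        "--no-scoring",
        "-C",
    ]
    return cmd


def run_instral(cmd, branch_file, log_file):
    """Run the command, sending stdout to branch_file and stderr to log_file."""
    with open(branch_file, "w") as out, open(log_file, "w") as log:
        return subprocess.run(cmd, stdout=out, stderr=log, check=True)


cmd = instral_command(
    jar="instral.5.13.4.jar",
    lib_dir="/path_to_repo/lib/",
    gene_trees="estimatedgenetrees.tre",
    backbone="backbone.nwk",
    output="placement.out",
    new_species="new_species_label",
    max_heap="8g",
)
print(" ".join(cmd))
```

Calling `run_instral(cmd, "placement.br", "log.txt")` then mirrors the redirections in the command above.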


Interpreting the Output

• Internal branches of the backbone need labels

    • If they don't have labels, use the "label_internal_nodes" script in the repo

• INSTRAL outputs the label of the branch and the tree with the new species inserted

• Branch labels can be used for multiple insertions

[Figure: backbone tree on A, B, C, E, F with internal branches labeled N1, N2, N3, and the new species d.]

Output: N1

Output tree file: (((A,(B,C)),d),(E,F));
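
To show what labeling internal branches means, here is a minimal sketch that adds labels N1, N2, ... to a Newick string. It is not the "label_internal_nodes" script from the repo and may number nodes differently; it assumes no whitespace after closing parentheses and no quoted labels, and it also labels the outermost node.

```python
def label_internal_nodes(newick, prefix="N"):
    """Append a label to every closing parenthesis that has none yet."""
    out, count = [], 0
    for i, ch in enumerate(newick):
        out.append(ch)
        if ch == ")":
            nxt = newick[i + 1] if i + 1 < len(newick) else ";"
            if nxt in ",):;":  # nothing follows the ")" as a label
                count += 1
                out.append(f"{prefix}{count}")
    return "".join(out)


print(label_internal_nodes("((A,(B,C)),(E,F));"))
# prints ((A,(B,C)N1)N2,(E,F)N3)N4;
```

Nodes that already carry a label or support value are left unchanged.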


Multiple new species

• Need to run INSTRAL for each new species separately

• Combine all the insertions using the script in the repo

• Final tree is unresolved

• Sample run:

./multiple_placements.sh estimatedgenetrees.tre backbone.tree outdir/ final_tree.tree

[Figure: backbone tree on X, Y, W, Z with new species A, B and C inserted, forming a polytomy.]
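
Before running one placement per species, it is useful to list which species are new, that is, present in the gene trees but absent from the backbone. The sketch below is an illustration, not part of the INSTRAL repo; it pulls leaf names out of Newick strings with a regular expression and assumes unquoted labels without spaces.

```python
import re


def leaf_labels(newick):
    """Return the leaf names of a Newick string (labels after '(' or ',')."""
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def new_species(backbone_newick, gene_tree_newicks):
    """Return species found in any gene tree but not in the backbone."""
    in_genes = set().union(*(leaf_labels(g) for g in gene_tree_newicks))
    return sorted(in_genes - leaf_labels(backbone_newick))


backbone = "((X,Y)N1,(W,Z)N2);"
genes = ["(((A,X),Y),(W,Z));", "((X,(B,Y)),(W:0.2,Z));"]
print(new_species(backbone, genes))
# prints ['A', 'B']
```

Each name in the returned list would then get its own INSTRAL run.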


Resolving polytomies


Constrained ASTRAL

• ASTRAL can be used to resolve polytomies of a species tree based on input gene trees

• Needs a constraint tree and gene trees as input

• Code available at https://github.com/maryamrabiee/Constrained-search

• Example run:

java -jar astral.5.6.9.jar -i estimatedgenetrees.tre -o resolved-speciestree.tree -j constraint.tree 2> log.txt
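
A quick way to see whether a combined tree still needs resolving is to count its polytomies. The sketch below is a hypothetical helper, not part of ASTRAL: it counts children per internal node directly from the Newick string, treats three children at the outermost node as the normal unrooted form rather than a polytomy, and assumes unquoted labels and at least one pair of parentheses.

```python
def count_polytomies(newick):
    """Count internal nodes with more than two children (three at the top)."""
    child_counts, stack = [], []
    for ch in newick:
        if ch == "(":
            stack.append(1)       # a new node starts with one child
        elif ch == "," and stack:
            stack[-1] += 1        # each comma adds a sibling
        elif ch == ")":
            child_counts.append(stack.pop())
    *internal, top = child_counts  # the outermost node closes last
    return sum(n > 2 for n in internal) + (top > 3)


print(count_polytomies("((X,Y),(W,Z));"))      # fully resolved
print(count_polytomies("((X,A,B,Y),(W,Z));"))  # one unresolved node
```

A count of zero on the output of the constrained run indicates that every polytomy was resolved.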


For more info

• Contact me: Maryam Rabiee, [email protected]

• Software available at GitHub site:

    • https://github.com/maryamrabiee/INSTRAL

    • https://github.com/maryamrabiee/Constrained-Search

• See tutorial and README at GitHub site: https://github.com/maryamrabiee/INSTRAL

• See publications:

    • Maryam Rabiee, Siavash Mirarab, INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores, Systematic Biology, syz045, https://doi.org/10.1093/sysbio/syz045