Gabor Marth, Goncalo Abecasis, PIs

Post on 24-Feb-2016



Robust Software Tools for Variant Identification and Functional Assessment

(Boston College & University of Michigan)

Gabor Marth, Goncalo Abecasis, PIs

Informatics challenges for genomic analysis

• Tool building

• Facilitating analysis

• Widening accessibility

Intentions of the RFA

Our approach

• Complete toolbox including variant interpretation

• Full pipelines for start-to-finish analysis
• Easily accessible and well documented methods
• Cloud deployment (in addition to single machine/local compute cluster)
• Open development model

Progress in first 6 months

• Starting with two sets of tools and pipelines, geared toward high quality local analysis, battle-tested in the 1000GP data and medical sequencing projects

• The two groups follow a “divide and conquer” strategy to put in place the critical pieces for making our algorithms available to the wider genomics community

• Boston College
  – A universal tool/pipeline launcher application
  – Infrastructure for dissemination
  – Cloud access via Galaxy

• University of Michigan
  – Integration of variant annotation/impact assessment
  – Pipeline/workflow control infrastructure
  – Adaptation for Amazon Cloud Services

FUNCTIONALITY & TOOLS

Scope

Include latest versions

• Tools constantly evolving (as they must to remain relevant)
• Our community toolbox to be updated with new tools as they become available

ref: TATAGAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGGGAGAGACGGAGTTalt: TATAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT

ref: TATAGAGAGAGAGAGAGAGC--GAGAGAGAGAGAGAGAGGGAGAGACGGAGTTalt: TATAGAGAGAGAGAGAG--CGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT

New algorithms for complex variant detection (FreeBayes)
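To illustrate why representation matters for complex variants such as the one shown above, the following minimal Python sketch compares two haplotypes first position by position and then through a gapped alignment. The toy sequences and both function names are hypothetical, chosen only for illustration. This is not FreeBayes's algorithm; it only shows that the same difference can look like several mismatches or like a pair of small indels.

```python
def mismatches(ref, alt):
    """Return 0-based positions where two ungapped, equal-length sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def gap_events(aligned_ref, aligned_alt):
    """Summarise a gapped pairwise alignment as (kind, start_column, length) events.

    kind is "del" (gap in alt), "ins" (gap in ref) or "mismatch".
    Adjacent gap columns of the same kind are merged into one event.
    """
    if len(aligned_ref) != len(aligned_alt):
        raise ValueError("aligned sequences must have equal length")
    events = []
    for col, (r, a) in enumerate(zip(aligned_ref, aligned_alt)):
        if r == a:
            continue
        kind = "ins" if r == "-" else "del" if a == "-" else "mismatch"
        last = events[-1] if events else None
        if last and kind != "mismatch" and last[0] == kind and last[1] + last[2] == col:
            events[-1] = (kind, last[1], last[2] + 1)  # extend the current gap run
        else:
            events.append((kind, col, 1))
    return events


# Hypothetical toy haplotypes (not the sequences from the slide).
ref, alt = "GAGAGCGAGA", "GAGCGAGAGA"
aligned_ref, aligned_alt = "GAGAGC--GAGA", "GAG--CGAGAGA"

print(mismatches(ref, alt))                  # [3, 5]
print(gap_events(aligned_ref, aligned_alt))  # [('del', 3, 2), ('ins', 6, 2)]
```

Read without gaps, the toy pair looks like two nearby mismatches; read with gaps, it is one 2-bp deletion plus one 2-bp insertion and no mismatches at all.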

Include tools when ready for prime time

MEI type  Sample    RetroSeq              Tangram               Tea
                    Total  Sensitivity    Total  Sensitivity    Total  Sensitivity
ALU       NA12891   719    89%            1192   98%            1127   92%
          NA12892   687    86%            1185   98%            1078   92%
          NA12878   793    82%            1326   99%            1038   89%
L1        NA12891   52     78%            190    81%            286    81%

The BC mobile element insertion caller (Tangram) performs best in its class

EPACTS variant interpretation tools
(Efficient and Parallelizable Association Container Toolbox)

• Genetic analysis tool based on VCF
  – Fast and parallelizable access to large VCF files
  – Built-in widely used single variant and burden tests
  – R/C++ interface for extending to newer tests
  – Binary & quantitative phenotypes with covariates
  – Useful visualization tools of association results

• Automated visualization
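As a rough illustration of what a burden test collapses, the Python sketch below counts rare alternate alleles per sample across the variants of one gene and compares the mean count between cases and controls. It is a simplified stand-in, not EPACTS code: the function names, the 1% frequency cut-off and the toy data are all assumptions, and a real analysis would fit a regression with covariates and report a p-value.

```python
def burden_scores(genotypes, allele_freqs, max_af=0.01):
    """Per-sample count of rare alternate alleles.

    genotypes: one list per sample, holding alt-allele counts (0/1/2) per variant.
    allele_freqs: alt-allele frequency of each variant, in the same order.
    """
    rare = [j for j, af in enumerate(allele_freqs) if af <= max_af]
    return [sum(sample[j] for j in rare) for sample in genotypes]


def mean_difference(scores, phenotypes):
    """Mean burden in cases (phenotype 1) minus mean burden in controls (phenotype 0)."""
    cases = [s for s, p in zip(scores, phenotypes) if p == 1]
    controls = [s for s, p in zip(scores, phenotypes) if p == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Hypothetical toy data: 4 samples, 3 variants in one gene.
genotypes = [[0, 1, 0], [1, 0, 2], [0, 0, 1], [0, 0, 0]]
allele_freqs = [0.005, 0.004, 0.20]   # the third variant is common and is ignored
phenotypes = [1, 1, 0, 0]             # binary phenotype: 1 = case, 0 = control

scores = burden_scores(genotypes, allele_freqs)
print(scores, mean_difference(scores, phenotypes))
```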

PIPELINES & WORKFLOW

The UM pipeline

[Figure: flow diagram of the UM pipeline. Data stages shown: BAM, genotype likelihoods, unfiltered VCF, hard-filtered VCF, filtered VCF, filtered/phased VCF. Tool labels shown: samtools, glfMultiples, vcfCooker, SVM, Beagle/Thunder (optional LD-aware step), EPACTS.]

UMAKE workflow system

• Makefile based approach
  – The Make utility is very good for representing dependencies
  – Pick up where left off on Failure

• Flexible deployment
  – Local Machine
  – Local Cluster (Mosix)
  – Amazon Web Services Elastic Compute Cloud (EC2)

• Default options
  – User configurable
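The "pick up where left off" behaviour of a Make-style workflow can be sketched in a few lines of Python. This is an illustrative model only, not UMAKE itself: the target names are borrowed loosely from the pipeline labels, their ordering is an assumption, and completed targets are tracked in a set where Make would check files and timestamps.

```python
def build(target, deps, run, done):
    """Build `target` after its prerequisites, skipping anything already done."""
    if target in done:
        return
    for prerequisite in deps.get(target, []):
        build(prerequisite, deps, run, done)
    run(target)          # may raise; the target is then NOT marked as done
    done.add(target)


# Hypothetical dependency graph: target -> prerequisites.
deps = {
    "glf": ["bam"],
    "unfiltered_vcf": ["glf"],
    "hard_filtered_vcf": ["unfiltered_vcf"],
    "filtered_vcf": ["hard_filtered_vcf"],
}
done = {"bam"}                      # the input already exists
executed = []                       # log of steps that actually ran
fail_once = {"hard_filtered_vcf"}   # simulate one transient failure


def run(step):
    if step in fail_once:
        fail_once.discard(step)
        raise RuntimeError(f"{step} failed")
    executed.append(step)


try:
    build("filtered_vcf", deps, run, done)   # first attempt stops at the failure
except RuntimeError as err:
    print("first run aborted:", err)

build("filtered_vcf", deps, run, done)       # second attempt resumes, no rework
print(executed)
```

On the second call only the failed step and the steps after it run; the steps that finished during the first attempt are not repeated.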


Application of UMAKE to large-scale projects

Project   Depth  Region  N       #SNPs  %dbSNP (129)  Known Ts/Tv  Novel Ts/Tv
1000G     4x     Genome  1,092   34.5M  24.4          2.14         2.16
1000G     >40x   Exome   822     598K   22.1          2.96         2.80
GoT2D     4x     Genome  ~2,800  26.7M  25.5          2.16         2.19
ESP       >80x   Exome   ~6,900  1.92M  8.6           2.94         2.83
Sardinia  3x     Genome  2,120   17.6M  38.4          2.15         2.22
Bipolar   10x    Genome

Computational cost is ~1 week per 1,000 samples on a 5-node mini-cluster
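For rough planning, the quoted figure can be turned into a back-of-the-envelope estimate. The Python sketch below assumes that run time scales linearly with the number of samples and inversely with the number of nodes; the slide does not state either assumption, and the estimate ignores depth, region size and I/O. The function name and parameters are hypothetical.

```python
def estimated_weeks(n_samples, n_nodes=5, weeks_per_1000_samples=1.0, reference_nodes=5):
    """Naive run-time estimate under an assumed linear scaling model."""
    return weeks_per_1000_samples * (n_samples / 1000) * (reference_nodes / n_nodes)


print(estimated_weeks(2800))               # a project of about 2,800 samples on 5 nodes
print(estimated_weeks(1000, n_nodes=10))   # doubling the nodes, if scaling were ideal
```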

ACCESSIBILITY

The Boston College tool hub

http://gkno.me (genome)

Simplified installation & use

• Unified launcher application (gkno)
  – single tools (e.g. Mosaik)
  – tool “macros” (e.g. map)
  – pipelines (e.g. exome variant calling)

• Download and installation
  – All tools pulled in a single step from github
  – All tools installed
  – All tools tested

Easily configurable pipeline system

• Part of our new unified launcher system (gkno)
• Pipeline types (e.g. mapping, variant calling) and instances (exome, whole-genome)
• User-configurable: tools can be swapped in and out, parameters configured via config files
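The idea of pipeline types, instances and swappable tools can be sketched with a small configuration file and a resolver. The JSON layout, the field names, the `configure` function and the parameter names below are all hypothetical, invented for illustration; they are not gkno's actual configuration schema.

```python
import copy
import json

# Hypothetical pipeline description: default tasks plus per-instance overrides.
pipeline = json.loads("""
{
  "name": "variant-calling",
  "tasks": [
    {"step": "align", "tool": "mosaik",    "parameters": {"threads": 4}},
    {"step": "call",  "tool": "freebayes", "parameters": {"min_coverage": 5}}
  ],
  "instances": {
    "exome":        {"call": {"min_coverage": 20}},
    "whole-genome": {}
  }
}
""")


def configure(pipeline, instance, swaps=None):
    """Return the task list for one instance, with optional tool swaps per step.

    The pipeline definition itself is left untouched.
    """
    tasks = copy.deepcopy(pipeline["tasks"])
    overrides = pipeline["instances"][instance]
    for task in tasks:
        task["tool"] = (swaps or {}).get(task["step"], task["tool"])
        task["parameters"].update(overrides.get(task["step"], {}))
    return tasks


# Exome instance with the aligner swapped from Mosaik to BWA.
tasks = configure(pipeline, "exome", swaps={"align": "bwa"})
for task in tasks:
    print(task["step"], task["tool"], task["parameters"])
```

Keeping instance overrides separate from the pipeline defaults means one pipeline type can serve several instances without copies of the whole definition. A fuller version would also need per-tool parameter sets, since a swapped-in tool rarely accepts the same options.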

Support

• Documentation
• Tutorials / Blog
• User forum
• Bug reports

DEPLOYMENT / CLOUD

Software deployment

• All software is ready for running locally on a single machine
• UMAKE adds cluster support
• Cloud deployment
  – Simple Michigan pipelines ported to Amazon
  – Porting of all project software under way

Cloud-based analysis – Galaxy

OPEN & COLLABORATIVE DEVELOPMENT MODEL

Integration

• Our workflows leverage 3rd party tools for specific functionality

• All our tools are open-source, available on github (many clones, community contributed code)

• Ensemble approach (multiple tools for critical tasks)

Ensemble approach

• Multiple tools usually benefit analysis

                                 Ts/Tv
Called in     # SNPs   %dbSNP   Novel  Known  Total
Union         907,170  22.09    2.22   2.30   2.24
2 of 5        766,608  25.33    2.38   2.33   2.37
3 of 5        696,358  27.05    2.44   2.36   2.42
4 of 5        601,132  29.62    2.49   2.40   2.46
Intersection  520,083  32.20    2.53   2.42   2.49
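The "k of 5" rows above correspond to keeping sites reported by at least k call sets and then computing Ts/Tv on what remains. A minimal Python sketch of that bookkeeping follows. The three toy call sets, the site tuples and the function names are hypothetical, and this simple vote counting is not the SVM or MLP aggregation mentioned elsewhere in the deck.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def called_by_at_least(callsets, k):
    """Sites present in at least k of the given call sets."""
    counts = Counter(site for calls in callsets for site in set(calls))
    return {site for site, n in counts.items() if n >= k}


def ts_tv(snps):
    """Transition/transversion ratio for (chrom, pos, ref, alt) SNP tuples."""
    ts = sum((ref, alt) in TRANSITIONS for _, _, ref, alt in snps)
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


# Hypothetical call sets from three callers.
caller_1 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "A", "C")}
caller_2 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 400, "G", "T")}
caller_3 = {("1", 100, "A", "G"), ("1", 300, "A", "C"), ("1", 500, "T", "C")}
callsets = [caller_1, caller_2, caller_3]

union = called_by_at_least(callsets, 1)
two_of_three = called_by_at_least(callsets, 2)
intersection = called_by_at_least(callsets, len(callsets))

print(len(union), ts_tv(union))                # 5 1.5
print(len(two_of_three), ts_tv(two_of_three))  # 3 2.0
print(len(intersection))                       # 1
```

In this toy data, as in the table, requiring agreement between more callers yields fewer sites with a higher Ts/Tv.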

Ensemble approach

• Our pipelines will use multiple aligners (BWA, Mosaik) and variant callers (FreeBayes, glfMultiples), developed by BC/UM

In progress

• Expanding pipelines to integrate all tools
• Michigan tools -> gkno
• BC tools -> Michigan cloud ready pipelines
• Large data set analysis on the cloud
• Integrate variant interpretation tools
• Integrate SV tools as they become more robust
• Integrate consensus analysis (SVM and MLP approaches to callset aggregation)
• Minimal, functional pipeline -> Galaxy

Team

Boston College
• Alistair Ward
• Derek Barnett
• Chase Miller
• Wan-Ping Lee
• Erik Garrison
• Gabor Marth

University of Michigan
• Mary-Kate Trost
• Tom Blackwell
• Hyun-Min Kang
• Youna Hu
• Adrian Tan
• Xiaowei Zhan
• Dajiang Liu
• Goncalo Abecasis
