Anvaya: A Workflow Engine for High Throughput Genomics
Dr. Rajendra Joshi, Associate Director & HOD, Bioinformatics Group, C-DAC Pune, India [email protected]
2014/04/22


To exploit the enormous scientific value of this information for understanding biological systems, the information must be integrated, analyzed, graphically displayed and ultimately modeled computationally.

HIGH-THROUGHPUT TECHNIQUES ARE REVOLUTIONIZING LIFE SCIENCES

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca tggatttgcc tgttctggat attcatatta atagaatcaa

CURRENT SCENARIO

Figure: Stuart Owen, "Workflows with Taverna"


Architecture  


Key Features of Anvaya

• Rules Engine that adds intelligence to control tool connectivity

• Provision of additional Custom Tools and Custom Parsers

• 13 pre-defined workflows for frequently used pipelines in genome annotation and comparative genomics

• Easy-to-use, standalone Anvaya Client, supported on Windows as well as Linux


Feature: Workflow Operations

• Create a workflow or pipeline using the available tool list

• Set properties of each node and save the workflow

• Run the workflow to execute the pipeline on a high-end server

• Stop the workflow if the user anticipates changes

• Resume the workflow from the previously executed node
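The run, stop, and resume operations listed above can be sketched as follows. This is a minimal illustration, not Anvaya's implementation: the `Node` and `Workflow` classes, the `stop_before` argument, and the node names are all hypothetical, and the sketch assumes a purely linear pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One tool in the pipeline with its user-set properties (hypothetical structure)."""
    name: str
    action: Callable[[dict], None]
    properties: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """A linear pipeline that remembers which nodes have already executed."""
    nodes: List[Node]
    completed: List[str] = field(default_factory=list)

    def run(self, stop_before: Optional[str] = None) -> None:
        """Execute nodes in order; skip finished nodes and stop early if asked."""
        for node in self.nodes:
            if node.name in self.completed:
                continue  # resume: do not repeat work that already finished
            if node.name == stop_before:
                return  # the user stopped the workflow, e.g. to change a property
            node.action(node.properties)
            self.completed.append(node.name)


log = []
wf = Workflow([
    Node("base_calling", lambda p: log.append("base_calling")),
    Node("trimming", lambda p: log.append("trimming(min_qv=%d)" % p["min_qv"]), {"min_qv": 20}),
    Node("assembly", lambda p: log.append("assembly")),
])

wf.run(stop_before="trimming")          # run, then stop before the second node
wf.nodes[1].properties["min_qv"] = 30   # change a node property while stopped
wf.run()                                # resume after the last executed node
print(log)
```

Keeping the list of completed node names separate from the node definitions is what lets the second `run()` call pick up where the first one stopped without re-executing `base_calling`.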


Feature: Tool List

The tool list in Anvaya can be browsed by functionality or in alphabetical order.


Feature: Rules Engine

• All the tools included in Anvaya have been categorized according to their functionality, and the allowed logical connectivity between tools is recorded in a rules file.

Defines rules for logical connection between the existing tools
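One way to picture a category-based rules check is sketched below. The slides do not show the rules file format, so the category names, the tool-to-category mapping (apart from CAP3, which the deck mentions), and the `can_connect` and `invalid_edges` helpers are assumptions made only for illustration.

```python
# Hypothetical mapping of tools to functional categories.
TOOL_CATEGORY = {
    "cap3": "assembly",
    "similarity_tool": "similarity_search",
    "alignment_tool": "multiple_alignment",
    "tree_tool": "phylogeny",
}

# Hypothetical rules: each pair (upstream category, downstream category) is allowed.
ALLOWED_CONNECTIONS = {
    ("assembly", "similarity_search"),
    ("similarity_search", "multiple_alignment"),
    ("multiple_alignment", "phylogeny"),
}


def can_connect(src_tool, dst_tool):
    """Return True if the rules allow the output of src_tool to feed dst_tool."""
    pair = (TOOL_CATEGORY[src_tool], TOOL_CATEGORY[dst_tool])
    return pair in ALLOWED_CONNECTIONS


def invalid_edges(edges):
    """Return the subset of workflow edges that violate the rules."""
    return [edge for edge in edges if not can_connect(*edge)]


workflow_edges = [
    ("cap3", "similarity_tool"),
    ("similarity_tool", "tree_tool"),   # skips the alignment step
]
print(invalid_edges(workflow_edges))
```

Expressing the rules at the category level rather than per tool means a newly added tool only needs a category assignment to inherit its allowed connections.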


Feature: Custom Tools and Parsers

• Custom tools in Anvaya serve as wrappers around one or more standard tools, or are tools with new functionality not available in standard tools.

Anvaya Custom Tools provide novel functionalities to carry out exhaustive comparative analysis

• Parser scripts have been developed in Perl to enhance the logical connectivity between various tools, which was hitherto not possible and required manual intervention.
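The role of a parser between two tools can be illustrated with a small sketch. Anvaya's parsers are written in Perl; Python is used here only for illustration. The three-column tab-separated hit format, the `parse_hits` function, and the E-value cutoff are hypothetical and do not describe any actual Anvaya parser.

```python
def parse_hits(tabular_text, max_evalue=1e-5):
    """Extract unique subject IDs from tab-separated hits (query, subject, evalue).

    The result is a plain ID list that a downstream tool could take as input,
    replacing a manual copy-and-filter step between the two tools.
    """
    ids = []
    seen = set()
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        query, subject, evalue = line.split("\t")[:3]
        if float(evalue) <= max_evalue and subject not in seen:
            seen.add(subject)
            ids.append(subject)
    return ids


sample = (
    "# query\tsubject\tevalue\n"
    "q1\tsp|P1\t1e-30\n"
    "q1\tsp|P2\t0.5\n"
    "q2\tsp|P1\t2e-12\n"
    "q2\tsp|P3\t1e-6\n"
)
hit_ids = parse_hits(sample)
print(hit_ids)
```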


GUI: Design Canvas

• Drag tools from the tool list onto the canvas

• Connect them logically to create a workflow pipeline

• Set advanced IO and advanced parameters of each node

• Save the workflow


GUI: Status

Status is available in tabular format on the status tab and also pictorially on the design canvas.


GUI: Project Explorer

Allows the user to view the input, output, and intermediate output files of the current project.


Client Feature: Scribble Note

• Scribble Note allows the user to store short notes regarding the associated node or workflow.

• Notes can be minimized, hidden, or expanded again for readability.


Client Feature: Sub-layer Support

• Nodes (tools) can be logically grouped together to form a sub-layer.

• The sub-layer can be collapsed or expanded for readability.


Feature: Pre-Defined Workflows

• Anvaya provides a set of 13 pre-defined workflows for frequently used pipelines in genome annotation and comparative genomics, ranging from EST assembly and annotation to phylogenetic reconstruction and microarray analysis.

Ø EST Analysis
Ø Genome annotation
Ø Functional Annotation
Ø Ortholog Prediction
Ø Prediction of Motifs
Ø Remote ortholog prediction
Ø Phylogeny (DNA and Protein sequences)
Ø Prediction of potential antigenic sites
Ø Primer Prediction
Ø Phylogenetic profiling
Ø Promoter identification using microarray data
Ø Reference mapping
Ø RNAseq Differential expression analysis


PDW: EST Analysis

Provides researchers with a single pipeline that reads raw trace files from sequencing machines and produces fully annotated, assembled ESTs. *Patil DP et al., BMC Genomics (2009)

Pipeline stages shown in the flowchart:

• Base calling
• Vector masking
• Sequence trimming
• Removal of PolyA tail
• Trimming of QV
• Functional annotation
• NCBI submission format
• CAP3 pre-processing
• CAP3 assembly
• Unique transcripts
• EST prediction
• Domain prediction
• Gene ontology
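Two of the pre-processing stages named above, quality-value trimming and poly-A tail removal, can be sketched in a few lines. This is an illustrative sketch only: the function names, the quality threshold of 20, the minimum tail length of 8, and the order in which the two steps are applied are assumptions, not details taken from the Anvaya pipeline.

```python
import re


def trim_low_quality(seq, quals, min_qv=20):
    """Trim bases from both ends until a base with quality >= min_qv is reached."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_qv:
        start += 1
    while end > start and quals[end - 1] < min_qv:
        end -= 1
    return seq[start:end], quals[start:end]


def remove_polya_tail(seq, min_len=8):
    """Remove a trailing run of at least min_len A bases (case-insensitive)."""
    return re.sub(r"A{%d,}$" % min_len, "", seq, flags=re.IGNORECASE)


# Toy read: two low-quality bases at each end and a 10-base poly-A tail.
read = "NN" + "ACGTTGC" + "A" * 10 + "NN"
qualities = [5, 5] + [30] * 17 + [5, 5]

trimmed_seq, trimmed_quals = trim_low_quality(read, qualities)
clean_seq = remove_polya_tail(trimmed_seq)
print(clean_seq)
```

Quality trimming runs first in this sketch so that low-quality bases after the tail do not hide the poly-A run from the end-anchored pattern.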


PDW: Phylogenetic Profiling

The workflow aims to infer functional linkages using phylogenetic profiling. The profiles obtained are analyzed for their statistical significance using parameters like mutual information content, Hamming distance and Pearson correlation coefficient.

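Workflow stages shown in the flowchart:

• Similarity search
• Conversion to profile matrix format with normalized E-values
• Hamming distance
• Mutual information (MI) and correlation coefficient (CC)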
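The three profile-comparison measures named in this workflow can be worked through on toy data. The sketch below assumes binary presence/absence profiles (for example, obtained by thresholding the normalized E-values); the profiles and function names are invented for illustration and are not Anvaya output.

```python
import math


def hamming(a, b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation coefficient; undefined if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mutual_information(a, b):
    """Mutual information in bits between two binary profiles."""
    n = len(a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = sum(1 for i, j in zip(a, b) if i == x and j == y) / n
            p_x = a.count(x) / n
            p_y = b.count(y) / n
            if p_xy > 0:
                mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Presence (1) or absence (0) of each gene across eight genomes (toy data).
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0, 1, 0]   # identical profile to gene_a
gene_c = [0, 0, 1, 1, 0, 1, 0, 1]   # exact complement of gene_a

print(hamming(gene_a, gene_b), pearson(gene_a, gene_b), mutual_information(gene_a, gene_b))
print(hamming(gene_a, gene_c), pearson(gene_a, gene_c), mutual_information(gene_a, gene_c))
```

The complementary pair shows why the measures are reported together: its Hamming distance is maximal and its correlation is negative, yet its mutual information is as high as for the identical pair, because one profile still fully predicts the other.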


Test Case: Genome annotation of 21 mycobacterial genomes

Input Dataset: 102 MB
Execution Time: 23 min

Anvaya Publications

• Bhakti Limaye, Ruma Banerjee, Avik Datta, Harshal Inamdar, Pankaj Vats, Sonal Dahale, Alok Bhandari, E. P. Ramakrishnan, Rajnikanth Tupakula, Sandeep Malviya, Avinash Bayaskar, Renu Gadhari, Sankalp Jain, Vivek Gavane, Rashmi Mahajan, Sunitha K, and Rajendra Joshi, "ANVAYA: A Workflows Environment for Automated Genome Analysis", Journal of Bioinformatics and Computational Biology (2012)

• Ruma Banerjee, Pankaj Vats, Sonal Dahale, Sunitha Manjari Kasibhatla and Rajendra Joshi, "Comparative genomics of cell envelope components in Mycobacteria", PLoS One (2011)

ICTBioMed Leadership at Accelerating Biology 2014: Computing Life


THANK YOU [email protected]  
