GDSAP- A Galaxy-based platform for large-scale genomics analysis Tin-Lap, LEE School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Hong Kong SAR, China.

Jan 28, 2015

Tin-Lap Lee (CUHK) presentation "GDSAP- A Galaxy-based platform for large-scale genomics analysis" from the Galaxy Community Conference 2012, Chicago, July 26th 2012
Transcript
Page 1: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP- A Galaxy-based platform for large-scale genomics analysis

Tin-Lap LEE
School of Biomedical Sciences,
CUHK-BGI Innovation Institute of Trans-omics,
The Chinese University of Hong Kong,
Hong Kong SAR, China.

Page 2: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

CBIIT

• Jointly established between The Chinese University of Hong Kong (CUHK) and BGI.

• “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computation biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”

Page 3: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Genomic Data Submission and Analytical Platform (GDSAP)

Objectives:
• Provide enhanced functionality in addition to the original Galaxy functions:
– Customized public instances.
– Seamless integration with the SBS-UCSC genome database mirror and the myExperiment workflow environment.
– Exchange and publish data through the GigaScience journal portal.

Outcomes:
• Simplifies complicated bioinformatics tasks, accelerates data processing and allows flexible analysis.
• Significantly reduces software and hardware costs and encourages research collaboration.

Page 4: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP Structure

• Tool Development
• Publishing
• Biomedical and bioinformatics research

Page 5: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

http://www.cuhk.edu.hk/cbiit/galaxy.html

Galaxy/CUHK-BGI

Page 6: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP Structure

• Tool Development
• Publishing
• Biomedical and bioinformatics research

Page 7: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

What is SOAP?

• SOAP: a tool package from BGI that provides a full solution for NGS data analysis.

Page 8: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Why SOAP?

• Galaxy has been using SAMtools for consensus sequence calling, but the recent upgrade has left this part out, which is very limiting for some biologists.

• SOAPsnp is the only other method besides SAMtools that can call full consensus sequences.

• The main Galaxy site supports none of the SOAP tools, including SOAPsnp.
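
To make "consensus sequence calling" concrete, the sketch below picks one base per reference position from the read bases stacked at that position. It is a plain majority vote on a hypothetical toy pileup, not the statistical model used by SOAPsnp or SAMtools; the function name, thresholds and input format are all assumptions made for illustration.

```python
from collections import Counter


def call_consensus(pileup, min_depth=2, min_fraction=0.6):
    """Return one consensus base per reference position.

    pileup: list of strings, one per position, holding the read bases
            observed at that position (hypothetical toy input).
    Positions with too little coverage or no clear majority become 'N'.
    """
    consensus = []
    for bases in pileup:
        bases = bases.upper()
        if len(bases) < min_depth:
            # Not enough reads cover this position to make a call.
            consensus.append("N")
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Accept the most frequent base only if it clearly dominates.
        consensus.append(base if count / len(bases) >= min_fraction else "N")
    return "".join(consensus)


toy_pileup = ["AAAA", "CCCT", "G", "TTAA", "GGGGG"]
print(call_consensus(toy_pileup))
```

A full consensus sequence covers every reference position, including those that match the reference, which is what distinguishes it from a list of variant sites only.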

Page 9: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Galaxy Tool Shed

• Enables sharing of Galaxy tools across Galaxy servers around the world.

• SOAP package tools configured for use in Galaxy.
– SOAPsnp/SOAPdenovo

Page 10: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Implement: SOAPsnp

Page 11: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Implement: SOAPdenovo configuration file
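
The configuration shown on this slide is not preserved in the transcript. As a rough illustration of what such a file contains, the sketch below renders a library description using key names from the example configuration distributed with SOAPdenovo (`max_rd_len`, `[LIB]`, `avg_ins`, `reverse_seq`, `asm_flags`, `rank`, `q1`, `q2`). The helper function, the values and the file names are hypothetical.

```python
def render_soapdenovo_config(max_rd_len, libraries):
    """Render a SOAPdenovo-style library configuration as text.

    libraries: list of dicts, one per sequencing library (hypothetical
               structure chosen for this sketch).
    """
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        # Each library gets its own [LIB] section.
        lines.append("[LIB]")
        for key in ("avg_ins", "reverse_seq", "asm_flags", "rank"):
            lines.append(f"{key}={lib[key]}")
        # Paired-end FASTQ files are listed as a q1/q2 pair.
        lines.append(f"q1={lib['q1']}")
        lines.append(f"q2={lib['q2']}")
    return "\n".join(lines) + "\n"


# Illustrative values only; file names are placeholders.
example_library = {
    "avg_ins": 200, "reverse_seq": 0, "asm_flags": 3, "rank": 1,
    "q1": "reads_1.fq", "q2": "reads_2.fq",
}
config_text = render_soapdenovo_config(100, [example_library])
print(config_text)
```

Generating the file from form inputs is one plausible way for a Galaxy wrapper to spare users from writing it by hand; whether GDSAP does this is not stated in the slides.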

Page 12: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Implement: SOAPdenovo
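
Galaxy tools are described by an XML wrapper that declares the command line, inputs and outputs. The sketch below builds a minimal wrapper of that shape with Python's standard library. It is not the wrapper published by the authors: the tool id, parameter names and default k-mer value are hypothetical, and the command line (`SOAPdenovo all -s ... -K ... -o ...`) and the `.scafSeq` output suffix are assumptions based on typical SOAPdenovo usage.

```python
import xml.etree.ElementTree as ET


def build_tool_wrapper():
    """Build a minimal, hypothetical Galaxy tool wrapper for SOAPdenovo."""
    tool = ET.Element("tool", id="soapdenovo_sketch",
                      name="SOAPdenovo (sketch)", version="0.1")

    # Command template; $config and $kmer refer to the inputs below.
    command = ET.SubElement(tool, "command")
    command.text = "SOAPdenovo all -s $config -K $kmer -o assembly"

    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="config", type="data",
                  format="txt", label="Configuration file")
    ET.SubElement(inputs, "param", name="kmer", type="integer",
                  value="23", label="K-mer size")

    # Collect the scaffold file written under the assumed output prefix.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="scaffolds", format="fasta",
                  from_work_dir="assembly.scafSeq")
    return tool


wrapper = build_tool_wrapper()
print(ET.tostring(wrapper, encoding="unicode"))
```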

Page 13: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP structure

• Bioinformatics Development
• Publishing
• Biomedical and bioinformatics research

Page 14: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

How does it work?

• myExperiment works as a repository for workflows.

• Taverna workflows.

• New: Galaxy workflows.

• GDSAP integration

Page 15: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Taverna workflow

Page 16: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis
Page 17: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Galaxy workflow

Page 18: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Import (1)

Page 19: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Import (2)

Page 20: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Export (1)

Page 21: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Export (2)
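
The import and export slides are screenshots, so the transcript carries no detail. For orientation: Galaxy exports a workflow as a JSON document (a `.ga` file), which is what makes sharing through a repository such as myExperiment practical. The sketch below reads a heavily trimmed, hypothetical stand-in for such a file and lists each step with the steps feeding it. Real exports contain many more fields, and the step names and tool id here are invented.

```python
import json

# Hypothetical, heavily trimmed stand-in for an exported Galaxy workflow.
EXAMPLE_GA = """
{
  "a_galaxy_workflow": "true",
  "name": "Toy SNP calling workflow",
  "steps": {
    "0": {"id": 0, "type": "data_input", "name": "Input dataset",
          "tool_id": null, "input_connections": {}},
    "1": {"id": 1, "type": "tool", "name": "Hypothetical aligner",
          "tool_id": "toy_aligner",
          "input_connections": {"reads": {"id": 0, "output_name": "output"}}}
  }
}
"""


def summarize_workflow(ga_text):
    """Return the workflow name and (step id, step name, upstream ids)."""
    workflow = json.loads(ga_text)
    steps = sorted(workflow["steps"].values(), key=lambda s: s["id"])
    summary = []
    for step in steps:
        # Each connection records which earlier step produces the input.
        upstream = sorted(c["id"] for c in step["input_connections"].values())
        summary.append((step["id"], step["name"], upstream))
    return workflow["name"], summary


name, summary = summarize_workflow(EXAMPLE_GA)
print(name)
for step_id, step_name, upstream in summary:
    print(step_id, step_name, "<-", upstream)
```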

Page 22: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP structure

• Bioinformatics Development
• Publishing
• Biomedical and bioinformatics research

Page 23: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

www.gigasciencejournal.com

Large-Scale Data Journal/Database

Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD

In conjunction with:

Now taking submissions…

Page 24: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GigaScience is go…

Page 25: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

www.gigaDB.org

Data Publishing

Page 26: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

37 Datasets with DOI®s

Plants
• Chinese cabbage
• Cucumber
• Foxtail millet
• Pigeonpea
• Potato
• Sorghum

Microbes
• E. coli O104:H4 TY-2482

Cell-Line
• Chinese Hamster Ovary
• Mouse Methylomes

Human
• Asian individual (YH) v1+v2
– DNA Methylome
– Genome Assembly
– Transcriptome
• Cancer (14TB)
• Hep B infected exomes
• Single Cell Bladder Cancer
• Ancient DNA
– Saqqaq Eskimo
– Aboriginal Australian

Vertebrates
• Giant panda
• Macaque
– Chinese rhesus
– Crab-eating
• Mini-Pig
• Naked mole rat
• Penguin
– Emperor penguin
– Adelie penguin
• Pigeon, domestic
• Polar bear
• Sheep
• Tibetan antelope

Invertebrate
• Ant
– Florida carpenter ant
– Jerdon’s jumping ant
– Leaf-cutter ant
• Roundworm
• Schistosoma
• Silkworm

Legend: Released pre-publication; Non-BGI; Paper in GigaScience

Coming soon…
• Microbiome data
• Parrot

Page 27: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP: Genomic Data Submission and Analytical Platform

GigaDB v2 export to GDSAP

Page 28: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

GDSAP: Genomic Data Submission and Analytical Platform

Big data from the “Sequencing Coal Face”

Data, Data, Data…

• Data Modeling
• Pipeline design
• Validation
• Applications

Tin-Lap Lee, CUHK

Page 29: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Acknowledgements

• Lee Lab (CUHK)
– Huayan Gao

• GigaScience
– Scott Edmunds
– Peter Li
– Tam Sneddon

• BGI-Hong Kong
– Dennis Chan
– Edmond Leung

• Galaxy team
– Nate Coraor

• myExperiment
– Finn Bacall
– Dave De Roure

• NBIC
– Kostas Karasavvas

Page 30: Tin-Lap Lee: GDSAP- A Galaxy-based platform for large-scale genomics analysis

Thank you