Genestack Genomics Applications Platform

GENOMICS APPLICATIONS PLATFORM

Friday, 27 April 12

GENESTACK PLATFORM

OBJECTIVE: universal genomics applications platform

REASON: provide full set of building blocks

APPROACH: existing tool integration & new tool development

GENESTACK PLATFORM

Sharing

Private data

Public data

Applications

Security

HPC

GENESTACK PLATFORM

DATA

private & secure sharing

free public data

format-independent

custom data types

GENESTACK PLATFORM

APPLICATIONS

efficient computation

unified user interface

scriptable

trusted

application SDK

GENESTACK PLATFORM

FOR DEVELOPERS

SDK & tooling

Data and applications store

Security audit/testing

GENESTACK SERVICE

PUBLIC DATA

Meta-curation

Quality control

NGS tools

Free access

GENESTACK SERVICE

END-TO-END SEQUENCING

NGS service partners

Data direct to cloud

Curation and apps

GENESTACK SERVICE

CLOUD DEPLOYMENT

Cost-effective, secure

Offsite backup

Long term archival

Genestack Limited, Salisbury House, Station Road, Cambridge, CB1 2LA, United Kingdom

Telephone +447990705531, Email: [email protected], Twitter: @genestackltd

Registered in England and Wales Company No. 7778793

GENESTACK www.genestack.com

GENOMICS OPERATING SYSTEM

Solutions to Six Problems With Genomic Data and Applications in the Enterprise

1. Managing Genomic Data Storage Costs

Problem: Sequencing gets cheaper per genome, producing more gigabases per dollar, but data storage and processing costs are in fact growing. In-house storage and cluster solutions require large capital expenditure and carry high operating costs.

Solution: We offer a scalable way to manage your data storage and processing costs on our cloud-based platform. For a fixed monthly subscription fee, you get storage space and computational capacity, keeping costs in line with your needs. We host the world’s biggest public genomics datasets, and through economies of scale can pass our lower storage costs on to you.

Interesting: For every gigabyte of raw sequence, researchers use at least seven gigabytes of operational disk space for processing, and usually store these intermediate files long after the processing run, adding to the storage costs. Our platform is designed to optimize these intermediate data overheads.
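The overhead figure above can be turned into a rough sizing estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it simply treats the quoted "at least seven gigabytes" per gigabyte of raw sequence as a lower-bound multiplier, so real footprints may be larger.

```python
def estimate_storage_gb(raw_gb: float,
                        overhead_ratio: float = 7.0,
                        keep_intermediate: bool = True) -> float:
    """Estimate the disk footprint (GB) of a sequencing dataset.

    overhead_ratio is the assumed amount of operational disk space
    used per GB of raw sequence (7.0 reflects the "at least seven"
    figure quoted in the text, so the result is a lower bound).
    """
    if raw_gb < 0 or overhead_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    intermediate_gb = raw_gb * overhead_ratio
    # If intermediate files are kept after the run, they add to the total.
    return raw_gb + intermediate_gb if keep_intermediate else raw_gb


if __name__ == "__main__":
    raw = 100.0  # GB of raw sequence, an arbitrary example value
    print(f"With intermediates kept: {estimate_storage_gb(raw):.0f} GB")
    print(f"Raw data only:           "
          f"{estimate_storage_gb(raw, keep_intermediate=False):.0f} GB")
```

Under these assumptions, 100 GB of raw sequence occupies at least 800 GB while intermediate files are retained, which is why discarding or compacting them matters for long-term storage costs.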

2. Safe Data Sharing Within & Across Organizations

Problem: When your in-house teams work with external collaborators, you may need to give individuals access to valuable data. Copying, downloading and sending data by mail are expensive, inefficient and difficult to control.

Solution: Our platform supports collaborative groups within and across organizations, with fine-grained access control. Data can be encrypted at rest and in transit. We are ready for stringent security tests, can fulfill the technical and insurance requirements of pharma IT and legal departments, and can integrate with internal authentication/authorization mechanisms.

Interesting: NGS produces files hundreds of gigabytes in size; encrypting/decrypting them is slow and CPU-intensive, while bioinformatics tools can take hours or days to run. We have thought of ways to maintain security even for such cases.

3. Using Public Data with Proprietary Data Cost-effectively

Problem: To use data from 1000 Genomes, GEO, Ensembl references or other public sources for in-house R&D, your IT department keeps local snapshots of these resources. Keeping them up to date is a heavy burden, but today there is no alternative if, say, you need to see ten public RNA-Seq tracks alongside ten proprietary ones in a genome browser while keeping control of your data.

Solution: We host a huge collection of public data, selected and annotated by our curators, and make it available to you for free. Combined with secure hosting for your proprietary data and a flexible mechanism to select and create virtual meta-experiments, our platform offers the most cost-effective way to work with public and private datasets. You can access our genome browser from any laptop to view tens of different tracks, public and private, simultaneously and securely.

Interesting: The 1000 Genomes project is about 200 TB of data. It’s on Amazon’s cloud, but you need to be an expert to use it: tutorials are many pages long. SRA, the public repository for NGS data, is about ten times that size. By subscribing to the Genestack platform, you will have free, easy access to these and other datasets, kept up to date and annotated by our curators.

Universal genomics data platform. Secure hosting and team sharing of Big Data genomics experiments. Bioinformatics applications ecosystem in the cloud. Free access to curated genomic data from public repositories. Data curation and application development. End-to-end sequencing service. Applications SDK & marketplace. Fixed monthly subscription.

Misha Kapushesky, [email protected] @genestackltd

Launch fall 2012. Want to take part in our early access programme?
