Analysis of Affymetrix expression data using R on Azure Cloud

Anne Owen
Department of Mathematical Sciences, University of Essex

15/16 March, 2012
SAICG Workshop, Oxford

Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London




Introduction

• The Affymetrix GeneChip
• Micro-array data
• Venus-C pilot project
• R scripts on Azure Cloud
• Results to date
• Our Experience


• We are developing informatics tools to aid the analysis of Affymetrix chips (GeneChips, Exon arrays).

• Micro-arrays are the data read from GeneChips

Affymetrix GeneChip

• ArrayExpress is an example of a public database containing microarrays and other data from biological experiments


DNA and RNA


Probe cells of an Affymetrix GeneChip contain millions of identical 25-mers (25-base oligonucleotide probes)


Affymetrix GeneChip Hybridization – fragments of RNA stick to the probes
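
Hybridization depends on base pairing: an RNA fragment sticks to a probe when its sequence is complementary to the probe's 25-mer. The following is a minimal Python sketch of that relationship, assuming probes are given as DNA strings read 5' to 3'. The function name and the example probe are illustrative and do not come from the talk.

```python
# Watson-Crick pairing between a DNA probe base and its RNA partner.
DNA_TO_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(probe: str) -> str:
    """Return the RNA sequence (5'->3') that base-pairs with a DNA probe (5'->3')."""
    probe = probe.upper()
    # Pairing is antiparallel, so the probe is read backwards.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(probe))


example_probe = "ATGGCGTACGTTAGCCTAGGATCCA"  # hypothetical 25-mer, not a real probe
target = complementary_rna(example_probe)
```

Real hybridization is not all-or-nothing; partially matching fragments can also bind, which this sketch ignores.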


Affymetrix GeneChip Fluorescence


Micro-array datasets

• Fluorescence data are put into .cel files
• Many thousands of experiments
• Many hundreds of micro-arrays for each GeneChip
• >1 TB of data to analyse
• Thousands of published papers using Affymetrix GeneChips
• This data is a free resource to researchers


Going Forward...

• Currently we analyse flaws in GeneChip data
• The future is a new genomic technology known as ‘next generation sequencing’
• Petabytes of data are being generated faster than they can be analysed
• Cloud solutions are needed for storage of and access to this data


Venus-C Pilot Project

• VENUS-C is a project funded under the European Commission’s 7th Framework Programme with computing resources from Microsoft

• Joint co-operation between computing service providers and scientific user communities

• Aim: to develop, test and deploy a large, Cloud computing infrastructure for science and SMEs (small and medium-sized enterprises) in Europe.


Venus-C Infrastructure

• 3 main areas dealing with standards:
  – VM management (OCCI and OVF)
  – Job submission (BES)
  – Cloud data storage (CDMI)

• Other specifications, such as
  – WS-Security

• Programming model:
  – Task based submission: Generic Worker role
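
Task-based submission means the analysis is split into independent tasks that workers pick up and run. The sketch below imitates that pattern locally with Python's standard library. It does not use the Venus-C Generic Worker or any Azure API, and the task function and file names are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_array(cel_name: str) -> tuple[str, int]:
    """Placeholder task: stands in for running one analysis script on one array."""
    # A real task would read the .cel file; here we only return the name length.
    return cel_name, len(cel_name)


def submit_tasks(cel_names: list[str], workers: int = 4) -> dict[str, int]:
    """Run one independent task per array and collect the results by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyse_array, cel_names))


results = submit_tasks(["sample_a.cel", "sample_b.cel", "sample_c.cel"])
```

Because each task depends only on its own input, the number of workers can change without changing the results, which is what makes this style a natural fit for a Cloud platform.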


cTQm Project Overview

[Diagram: public database; BLOB Storage; scripts, R libs and key data uploaded via Azure webpage]


Cloud / Grid Interfaces

Amazon EC2: Command line interface into Linux terminal

NGS: Portal or Command Line to Linux machine

Azure: Webpage interface to a Windows machine, Visual Studio 2010, C#


Bioinformatics Results to date

• Uploading of datasets into Cloud storage is underway
• Success with R scripts on Azure to confirm results in published paper*
• Minor problems with ArrayExpress to solve
• Work is extending to more GeneChip types
• Still need user authentication / accounting

* Nucleic Acids Research, 2011, 1-9, “Normalised Affymetrix expression data are biased by G-quadruplex formation”, by Hugh P. Shanahan, Farhat N. Memon, Graham J. G. Upton and Andrew P. Harrison
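
The cited paper concerns probes prone to G-quadruplex formation. As a rough illustration only, the Python sketch below flags probe sequences that contain a run of consecutive guanines. The threshold of four, the function name and the example sequences are assumptions made for this sketch, not values taken from the talk or the R scripts it describes.

```python
import re


def has_g_run(probe: str, min_run: int = 4) -> bool:
    """True if the probe contains at least `min_run` consecutive G bases."""
    return re.search("G{%d,}" % min_run, probe.upper()) is not None


# Hypothetical 25-mer probe sequences, for illustration only.
probes = {
    "probe_1": "ATCGGGGTACCATGCATGCATGCAT",
    "probe_2": "ATCGGTACGGATCGATCCATGCATG",
}
flagged = [name for name, seq in probes.items() if has_g_run(seq)]
```

A screen like this only identifies candidate probes; whether their intensities are actually biased has to be checked against the expression data.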


Our Experience

• Azure Cloud has a steep learning curve for a Linux-based scientist
• Vast datasets can be made available
• Applications can be user-friendly
• Scalability makes Cloud approach attractive
• Costs need to be assessed
• Enables scientists in developing countries to perform genome analysis


Acknowledgements and thanks to:

Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London
Department of Mathematical Sciences, University of Essex

European Commission’s 7th Framework Programme
Microsoft and Venus-C
Venus-C project Organisers
