Analysis of Affymetrix expression data using R on Azure Cloud
Anne Owen
Department of Mathematical Sciences
University of Essex
15/16 March, 2012
SAICG Workshop, Oxford
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London
Introduction
• The Affymetrix GeneChip
• Micro-array data
• Venus-C pilot project
• R scripts on Azure Cloud
• Results to date
• Our Experience
• We are developing informatics tools to aid the analysis of Affymetrix chips (GeneChips, Exon arrays).
• Micro-arrays are the data read from GeneChips
Affymetrix GeneChip
• ArrayExpress is an example of a public database containing microarrays and other data from biological experiments
DNA and RNA
Probe cells of an Affymetrix GeneChip contain millions of identical 25-mers
25-mer
Affymetrix GeneChip Hybridization – fragments of RNA stick to the probes
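A minimal sketch of the hybridization idea, written in Python for brevity (the project's own scripts are in R and are not shown in these slides). It assumes the probes are DNA 25-mers, that binding follows ordinary Watson-Crick pairing, and that the perfectly matching RNA fragment is the reverse complement of the probe. The probe sequence and the function name `binding_rna` are hypothetical, made up for illustration only.

```python
# Watson-Crick partner (as an RNA base) for each DNA probe base.
DNA_TO_RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def binding_rna(probe: str) -> str:
    """Return the RNA fragment (5'->3') that pairs perfectly with a DNA probe (5'->3')."""
    # The two strands pair antiparallel, so walk the probe backwards.
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(probe.upper()))


# Hypothetical 25-mer; not taken from any real GeneChip.
probe = "ATGGCGTACCTGAAGTCCGATTGCA"
assert len(probe) == 25

print(binding_rna(probe))
```

Real hybridization is not all-or-nothing: fragments with partial matches can also stick to a probe, which this sketch ignores.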
Affymetrix GeneChip Fluorescence
Micro-array datasets
• Fluorescence data put into .cel files
• Many 1000s of experiments
• Many 100s of micro-arrays for each GeneChip
• >1 TB of data to analyse
• 1000s of published papers using Affymetrix GeneChips
• This data is a free resource to researchers
Going Forward...
• Currently we analyse flaws in GeneChip data
• Future is new genomic technology known as ‘next generation sequencing’
• Petabytes of data being generated faster than it can be analysed
• Cloud solutions needed for storage of and access to this data
Venus-C Pilot Project
• VENUS-C is a project funded under the European Commission’s 7th Framework Programme with computing resources from Microsoft
• Joint co-operation between computing service providers and scientific user communities
• Aim: to develop, test and deploy a large, Cloud computing infrastructure for science and SMEs (small and medium-sized enterprises) in Europe.
Venus-C Infrastructure
• 3 main areas dealing with standards:
  – VM management (OCCI and OVF)
  – Job submission (BES)
  – Cloud data storage (CDMI)
• Other specifications, such as
  – WS-Security
• Programming model:
  – Task based submission: Generic Worker role
cTQm Project Overview
[Diagram labels: BLOB Storage; Public database; Scripts, R libs and key data uploaded via Azure webpage]
Cloud / Grid Interfaces
Amazon EC2: Command line interface into Linux terminal
NGS: Portal or Command Line to Linux machine
Azure: Webpage interface to a Windows machine, Visual Studio 2010, C#
Bioinformatics Results to date
• Uploading of datasets into Cloud storage is underway
• Success with R scripts on Azure to confirm results in published paper*
• Minor problems with ArrayExpress to solve
• Work is extending to more GeneChip types
• Still need user authentication / accounting
* Nucleic Acids Research, 2011, 1-9, “Normalised Affymetrix expression data are biased by G-quadruplex formation”, by Hugh P. Shanahan, Farhat N. Memon, Graham J. G. Upton and Andrew P. Harrison
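As a loose illustration of the kind of probe-level check that the cited paper's topic suggests, the sketch below flags probe sequences containing a run of consecutive guanines. The threshold of four or more Gs is an assumption made for this example and is not stated in these slides. The probe names and sequences are hypothetical, and the code is Python rather than the R used in the project.

```python
import re

# Assumed criterion for this sketch: four or more consecutive guanines.
G_RUN = re.compile(r"G{4,}")


def has_g_run(probe: str) -> bool:
    """Return True if the probe sequence contains a run of at least four Gs."""
    return G_RUN.search(probe.upper()) is not None


# Hypothetical 25-mers; not taken from any real GeneChip.
probes = {
    "probe_a": "ATGGCGTACCTGAAGTCCGATTGCA",  # longest G run is 2
    "probe_b": "ATGGGGTACCTGAAGTCCGATTGCA",  # contains GGGG
    "probe_c": "CCGGGTACGGGTTACGGGATCGATC",  # several runs of 3, none of 4
}

# Names of the probes that meet the assumed criterion.
flagged = [name for name, seq in probes.items() if has_g_run(seq)]
print(flagged)
```

A check like this only identifies candidate probes. Whether their intensities are actually biased would still have to be tested against the expression data.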
Our Experience
• Azure Cloud is a steep learning curve for a Linux-based scientist
• Vast datasets can be made available
• Applications can be user-friendly
• Scalability makes Cloud approach attractive
• Costs need to be assessed
• Enables scientists in developing countries to perform genome analysis
Acknowledgements and thanks to:-
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London
Department of Mathematical Sciences, University of Essex
European Commission’s 7th Framework Programme
Microsoft and Venus-C
Venus-C project Organisers