Cloud Computing for e-Science with CARMEN Paul Watson Newcastle University

Dec 14, 2015

Curtis Leeke
Transcript
Page 1

Cloud Computing for e-Science with CARMEN

Paul Watson

Newcastle University

Page 2

e-Science

“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it”

John Taylor

Former Director General of the UK Research Councils

Page 3

Two Strands to talk...

Page 4

Research Challenge

Understanding the brain is the greatest informatics challenge

• Enormous implications for science:

• Medicine

• Biology

• Computer Science

Page 5

Collecting the Evidence

100,000 neuroscientists generate huge quantities of data

– molecular (genomic/proteomic)

– neurophysiological (time-series activity)

– anatomical (spatial)

– behavioural

Page 6

Neuroinformatics Problems

• Data is:
– expensive to collect but rarely shared
– in proprietary formats & locally described

• The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise

Page 7

Data in Science

Bowker’s “Standard Scientific Model”

1. Collect data

2. Publish papers

3. Gradually lose the original data

The New Knowledge Economy & Science & Technology Policy, G.C. Bowker

Problems:

– papers often draw conclusions from data that is not published

– inability to replicate experiments

– data cannot be re-used

Page 8

Codes in Science

Three stages for codes

1. Write code and apply to data

2. Publish papers

3. Gradually lose the original codes

Problems:

– papers often draw conclusions from codes that are not published

– inability to replicate experiments

– codes cannot be re-used

Page 9

CARMEN

enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated

Page 10

CARMEN Project

UK EPSRC e-Science Pilot

£5M (2006-10)

20 Investigators

Stirling

St. Andrews

Newcastle

York

Sheffield

Cambridge

Imperial

Plymouth

Warwick

Leicester

Manchester

Page 11

Newcastle: Colin Ingram, Paul Watson, Stuart Baker, Marcus Kaiser, Phil Lord, Evelyne Sernagor, Tom Smulders, Miles Whittington

York: Jim Austin, Tom Jackson

Stirling: Leslie Smith

Plymouth: Roman Borisyuk

Cambridge: Stephen Eglen

Warwick: Jianfeng Feng

Sheffield: Kevin Gurney, Paul Overton

Manchester: Stefano Panzeri

Leicester: Rodrigo Quian Quiroga

Imperial: Simon Schultz

St. Andrews: Anne Smith

CARMEN Consortium

Page 12

Industry & Associates

Page 13

cracking the neural code

neurone 1

neurone 2

neurone 3

raw voltage signal data typically collected using single or multi-electrode array recording

Focus on Neural Activity

Page 14

Epilepsy Exemplar

Data analysis guides surgeon removing brain tissue

WARNING!

The next 2 Slides show an exposed brain

Page 15

Epilepsy Exemplar

Recording from removed tissue (up to 20 GB/h)

On-line analysis by distributed collaborators will enable experiment to be defined during data collection

Repository will enable integration of rare case types from different labs

Advances in Treatment

Data analysis guides surgeon removing brain tissue

Page 16

e-Science Requirements Summary

• Sharing
– data
– code

• Capacity
– vast data storage (100TB+ in CARMEN)
– support data intensive analysis
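
To put the capacity figure in context, a rough back-of-envelope calculation relates it to the recording rate of up to 20 GB/h quoted for the epilepsy exemplar. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and a single source recording continuously at the peak rate, neither of which is stated in the slides, and the function name is illustrative.

```python
# Back-of-envelope sizing. Only the two quoted figures come from the slides;
# the unit convention and the "continuous recording" scenario are assumptions.
PEAK_RATE_GB_PER_HOUR = 20   # "up to 20 GB/h" (epilepsy exemplar)
CAPACITY_TB = 100            # "100TB+ in CARMEN"
GB_PER_TB = 1000             # assumption: decimal units


def hours_to_fill(capacity_tb: float, rate_gb_per_hour: float) -> float:
    """Hours of continuous recording needed to fill the given capacity."""
    return capacity_tb * GB_PER_TB / rate_gb_per_hour


hours = hours_to_fill(CAPACITY_TB, PEAK_RATE_GB_PER_HOUR)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous recording")
```

Under these assumptions a single rig would need several months of uninterrupted recording to fill the store, which suggests the capacity requirement comes from pooling data across many labs rather than from any one experiment.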

Page 17

CARMEN Cloud Architecture

Data storage and analysis

User access over Internet (typically via browser)

Users upload data & services

Users run analyses

Page 18

e-Science Cloud Services

• Amazon (& Google) offer cloud computing
– Basic storage & compute services
– e.g. Amazon S3 & EC2

• e-Science needs a set of higher-level services to support user needs

• Which services? ....

Page 19

CARMEN Cloud (CAIRN)

[Architecture diagram. Components: Web Portals, Rich Clients, Security, Workflow Enactment Engine, Registry, Service Repository, Data, Metadata, and a Compute Cluster on which Services are Dynamically Deployed. Annotations: Search for Data & Analysis Code; Raw & Derived Data Store; Structured Metadata Store Enabling Search & Annotation; Analysis Code Store.]

Page 20

Dynasoar

• Code Repository and Deployment
– long term storage

• Code factored as Web Services
– Standard (WS-I) interface
– Internals not important
• Java, MATLAB, C, C#, C++, ...

• Deployers for a variety of service types
– .war files (Tomcat), Virtual Machines (VMWare, Virtual PC), .NET assemblies, database stored procedures

Page 21

Dynasoar: Dynamic Deployment

[Diagram: Consumer (C), Web Service Provider (WSP), Host Provider with node 1 (s2, s5), node 2 and node n (s2), and Service Repository (SR). Request (req) and response (res) arrows, with numbered steps 1 to 3; step 2 is labelled "service fetch & deploy".]

A request to s4

The deployed service remains in place and can be re-used, unlike job scheduling

Page 22

Dynasoar

[Diagram: Consumer (C), Web Service Provider (WSP) and Host Provider with node 1 (s2, s5), node 2 and node n (s2), linked by request (req) and response (res) arrows.]

A request for s2 is routed to an existing deployment of the service
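
The routing behaviour on the two Dynasoar slides (deploy on first request, then re-use the deployment) can be sketched as follows. This is a minimal illustration in Python rather than Dynasoar's own code: the class and method names (`ServiceRepository`, `HostProvider`, `handle`) are hypothetical, services are modelled as plain functions, and placing a new service on the node that hosts the fewest services is an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Service = Callable[[str], str]  # a service is modelled as a plain function


@dataclass
class ServiceRepository:
    """Long-term store of deployable service code (hypothetical stand-in)."""
    services: Dict[str, Service]

    def fetch(self, name: str) -> Service:
        return self.services[name]


@dataclass
class Node:
    name: str
    deployed: Dict[str, Service] = field(default_factory=dict)


@dataclass
class HostProvider:
    nodes: List[Node]
    repository: ServiceRepository
    deployments: int = 0  # number of fetch-and-deploy operations performed

    def handle(self, service_name: str, request: str) -> str:
        # Route to an existing deployment if any node already hosts the service.
        for node in self.nodes:
            if service_name in node.deployed:
                return node.deployed[service_name](request)
        # Otherwise fetch the service and deploy it on the node hosting the
        # fewest services (assumed placement policy), then serve the request.
        target = min(self.nodes, key=lambda n: len(n.deployed))
        target.deployed[service_name] = self.repository.fetch(service_name)
        self.deployments += 1
        return target.deployed[service_name](request)


repo = ServiceRepository({
    "s2": lambda req: f"s2({req})",
    "s4": lambda req: f"s4({req})",
    "s5": lambda req: f"s5({req})",
})
# Initial state as on the slides: node 1 hosts s2 and s5, node n hosts s2.
nodes = [Node("node 1"), Node("node 2"), Node("node n")]
nodes[0].deployed = {"s2": repo.fetch("s2"), "s5": repo.fetch("s5")}
nodes[2].deployed = {"s2": repo.fetch("s2")}
host = HostProvider(nodes, repo)

first = host.handle("s4", "a")   # s4 is not deployed: fetched and deployed
second = host.handle("s4", "b")  # re-uses the deployment made above
third = host.handle("s2", "c")   # routed to an existing deployment of s2
```

The point of the sketch is that the second request for s4 incurs no deployment cost, which is the contrast the slides draw with job scheduling.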

Page 23

Performance Gains

[Diagram: request (req) and response (res) arrows linking a Consumer (C), an Analysis Service and a Database Service.]

Page 24

Scalability

[Plot: response time (seconds) and processors in pool against arrival rate (messages per second), for arrival rates from 0.03 to 1.]

Page 25

CARMEN Cloud (CAIRN)

[Architecture diagram. Components: Web Portals, Rich Clients, Security, Workflow Enactment Engine, Registry, Service Repository, Data, Metadata, and a Compute Cluster on which Services are Dynamically Deployed. Annotations: Search for Data & Analysis Code; Raw Signal Data Search & Visualisation; Enactment of scientific analysis processes; Raw & Derived Data Store; Security Policies Controlling Access to Data & Code; Structured Metadata Store Enabling Search & Annotation; Analysis Code Store.]

Page 26

Controlled Sharing

My collaborators can now see it

Everyone can see it

Only I am allowed to see this data

Scientist

Page 27

Security Solution

• XACML – standard way to encode rules as (subject, action, resource) triples

• Rules checked on each access
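
A check over (subject, action, resource) triples can be illustrated as below. The sketch is not XACML, which is an XML-based policy language with a much richer structure; it uses plain Python tuples, assumes a deny-by-default model, and every name in it (`Rule`, `is_permitted`, the users and the resource) is hypothetical.

```python
from typing import NamedTuple, Set


class Rule(NamedTuple):
    subject: str   # who is asking, e.g. a user name
    action: str    # what they want to do, e.g. "read"
    resource: str  # what they want to do it to


def is_permitted(rules: Set[Rule], subject: str, action: str, resource: str) -> bool:
    """Deny by default (an assumption): permit only if a matching rule exists."""
    return Rule(subject, action, resource) in rules


# Hypothetical policy: the owner may read and write, a collaborator may read.
rules = {
    Rule("alice", "read", "recording-001"),
    Rule("alice", "write", "recording-001"),
    Rule("bob", "read", "recording-001"),
}

# The check is made on every access rather than once at login.
can_read = is_permitted(rules, "bob", "read", "recording-001")
can_write = is_permitted(rules, "bob", "write", "recording-001")
print(can_read, can_write)
```

Widening access from "only I" to "my collaborators" to "everyone" then amounts to adding rules to the set, with no change to the code that performs the check.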

Page 28

Controlled Sharing - conflicts

My collaborators can now see it

Only I am allowed to see this data

All data must be accessible to everyone after the end of the project

Scientist

Funder

Page 29

Addressing Conflicts

• Each party expresses policy as XACML rules
• Rules are converted to formal language
– XACML -> VDM++
• Run formal model to detect conflicts
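
The idea of detecting conflicts between parties' policies can be illustrated with a far simpler check than a formal VDM++ model. The sketch below is a stand-in, not the approach used in the project: it assumes each party's policy is a table of explicit "permit" or "deny" decisions over (subject, action, resource) triples, and it reports the triples on which two parties disagree. All names and the example policies are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]   # (subject, action, resource)
Policy = Dict[Triple, str]      # triple -> "permit" or "deny"


def find_conflicts(policies: Dict[str, Policy]) -> List[Tuple[Triple, Dict[str, str]]]:
    """Return each triple on which at least two parties give different decisions."""
    decisions: Dict[Triple, Dict[str, str]] = {}
    for party, policy in policies.items():
        for triple, effect in policy.items():
            decisions.setdefault(triple, {})[party] = effect
    return [
        (triple, by_party)
        for triple, by_party in sorted(decisions.items())
        if len(set(by_party.values())) > 1
    ]


# Hypothetical policies echoing the scientist and funder example.
policies = {
    "scientist": {
        ("public", "read", "dataset"): "deny",
        ("collaborator", "read", "dataset"): "permit",
    },
    "funder": {
        ("public", "read", "dataset"): "permit",
    },
}

conflicts = find_conflicts(policies)
for triple, by_party in conflicts:
    print(triple, by_party)
```

A formal model can go further than this enumeration, for example by reasoning about rules that only come into force under certain conditions, such as after the end of the project.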

Page 30

CARMEN CAIRN

[Architecture diagram. Components: Web Portals, Rich Clients, Security, Workflow Enactment Engine, Registry, Service Repository, Data, Metadata, and a Compute Cluster on which Services are Dynamically Deployed. Technology labels: OMII: Grimoire; DAME: Signal Data Explorer; OMII/myGrid: Taverna; OGSA-DAI, SRB, DAME; Gold: Role & Task based Security; myGrid & CISBAN; Dynasoar.]

Page 31

Using CARMEN for a typical scenario

1. Data Collection from a Multi-Electrode Array
2. Data Visualisation and Exploration
3. Spike Detection
4. Spike Sorting
5. Analysis
6. Visualisation of Analysis Results

Currently, this is a semi-manual process

CARMEN has automated this…
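
As an illustration of one step in this scenario, spike detection on a raw voltage trace is commonly done by looking for threshold crossings. The sketch below assumes a fixed threshold and a refractory period measured in samples; it is not the method used by the CARMEN services, and the function name, parameters and trace values are made up for the example.

```python
from typing import List, Sequence


def detect_spikes(signal: Sequence[float], threshold: float,
                  refractory: int = 3) -> List[int]:
    """Return sample indices where the signal rises above `threshold`.

    After a detection, further crossings are ignored for `refractory`
    samples so that one spike is not counted several times.
    """
    spikes: List[int] = []
    last = -refractory - 1
    for i in range(1, len(signal)):
        crossed = signal[i - 1] <= threshold < signal[i]
        if crossed and i - last > refractory:
            spikes.append(i)
            last = i
    return spikes


# Synthetic trace (made-up values): two clear spikes on a low baseline.
trace = [0.1, 0.0, 0.2, 2.5, 1.0, 0.1, 0.0, 0.1, 3.0, 0.2, 0.1]
print(detect_spikes(trace, threshold=1.5))  # prints [3, 8]
```

In a workflow, the detected indices would be handed to the spike sorting step, which assigns each spike to a putative neurone.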

Page 32

Web Portal

Page 33

Raw Data Exploration with Signal Data Explorer

Page 34

Defining the process with Workflow

Page 35

Running a Workflow

Page 36

Running the Workflow

[Diagram: an External Client, the Workflow Engine (TAVERNA), Registry, Repository and Security components, and Dynamically Deployed Services in Dynasoar (Spike Sorting Service, Reporting). Storage: SRB File System and RDBMS. Other labels: INPUT Data, OUTPUT Metadata, Available Services, Query.]

Page 37

Graphical Output

Page 38

Movie Output

Page 39

CARMEN (www.carmen.org.uk)

• is delivering an e-Science infrastructure that can be applied across a diverse range of applications

• uses a Cloud/Software as a Service architecture

• enables cooperation and interdisciplinary working

• aims to deliver new results in neuroscience, computer science and medicine