Transcript
Page 1: SomeSlides

The e-Science Vision

Enabling New Science through Innovative Integrated Technology Solutions

The Mission

To spearhead the exploitation of e-Science technologies throughout STFC programmes, the research communities they support and the national science and engineering base.

To “e-enable” the STFC facilities.

Page 2: SomeSlides

The Vision
• An increasingly sophisticated infrastructure supporting innovative exploitation of data from the full range of STFC facilities.
– integrated into National and International activities.
• Improved use of computation and data management in areas with little historic engagement but growing needs.
• Exploit emerging technologies to further enhance UK capabilities.
• Better science...
– accelerate the research process
– improve traceability and reproducibility
– meet the challenges posed by increasing data volumes
– improve cost effectiveness and quality
– encourage collaboration and knowledge exchange
– enable researchers to tackle more of the world’s grand challenges
– improve the long-term exploitation of research outputs
– bridge facilities and users

Page 3: SomeSlides

Ken Peach

UK e-Infrastructure

LHC

ISIS TS2

HPCx + HECToR

Users get common access, tools, information, Nationally supported services, through NGS

Integrated internationally

VRE, VLE, IE

Regional and Campus grids

Community Grids

JET

The UK e-Infrastructure for e-Science

ESRF

Page 4: SomeSlides

The Road to Net-centricity from an Applications Perspective

• WEB Enabled
– An application that requests, and is given access to, services and/or resources via an HTTP request
– Application may have been created before there was a WEB
– Leverage prior investment to quickly make data or application available
– Can use simple HTML WEB Interface or full WEB service interface
– Limited by the Data / Functions exposed in the original design

• WEB Service
– Typically built from the ground up to run over the WEB
– Uses industry standards to provide means of interoperating between different software applications; runs on a variety of platforms and/or frameworks
– Can be combined in a loosely coupled way in order to achieve complex operations
– Simple services can interact with each other to deliver sophisticated value-added services
– Quality of Service and value added capabilities can be documented as Service Level Agreements (SLAs)

• Non-Web Era
– System typically designed as closed, standalone
– Tightly coupled and engineered interface
– Data transfer via FTP / file transfer
– Data is system application specific

[Diagram labels: WEB-enabled / Make Data Available; Deconstruct & Reconstruct; WEB-Services / Compose; Data Transfer; XML/HTTP; SOAP / UDDI / WSDL Wrappers; WSDL, QoS & SLAs; "Reach", "Volume", "Efficient & Flexible", "Agility & Speed"]
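The "WEB Enabled" step on this slide can be illustrated with a minimal sketch. It assumes a hypothetical pre-existing routine, `legacy_lookup`, and a hypothetical `/lookup?run=...` path; neither name comes from the slides. The wrapper only exposes what the legacy routine already returns, which is the limitation the slide notes for web-enabled applications.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical legacy data, standing in for a pre-web application.
_LEGACY_RUNS = {"1001": {"instrument": "A", "files": 3}}


def legacy_lookup(run_id):
    """Pre-existing routine (assumed): returns a record or None."""
    return _LEGACY_RUNS.get(run_id)


def handle_path(path):
    """Map a request path to (status, JSON body) using the legacy routine."""
    parsed = urlparse(path)
    if parsed.path != "/lookup":
        return 404, json.dumps({"error": "unknown resource"})
    run_id = parse_qs(parsed.query).get("run", [None])[0]
    record = legacy_lookup(run_id) if run_id else None
    if record is None:
        return 404, json.dumps({"error": "run not found"})
    return 200, json.dumps(record)


class WrapperHandler(BaseHTTPRequestHandler):
    """Thin HTTP front end; all logic stays in the legacy routine."""

    def do_GET(self):
        status, body = handle_path(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the wrapper on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), WrapperHandler).serve_forever()
```

Calling `serve()` and requesting `/lookup?run=1001` would return the legacy record as JSON. A full web service, by contrast, would typically publish a formal interface description rather than an ad hoc path.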

Page 5: SomeSlides

Strategy
• Expertise in systems, applications and information management
• Develop and support the integrated e-infrastructure required by researchers
– Focused around exploiting the full lifecycle for scientific data
– Developed through Science led projects
– User focused, standards based, acknowledging constraints from National and International collaborations and Government priorities.
• Direct contributions to projects and activities
– e.g. LHC, ISIS, DLS, CLF…
– Competitive and technology push
• R&D to inform and support future programmes
– Grid infrastructures for the UK and Europe
– Information management in a distributed heterogeneous environment
– Long term data curation
– Advanced analysis and visualisation
• Leveraging investment through provision of services to partner organisations
• Engage Nationally and Internationally.
• Take expert advice. The e-Science Advisory Board

Page 6: SomeSlides

e-Science Advisory Board

External
Dr. Daron Green - BT
Prof. David Ingram - UCL
Dr David Williams - CERN
Dr Jerzy Graff - BMT
Dr. Graham Cameron - EBI
Prof. Malcolm Atkinson - NeSC
Prof. Alex Gray - Cardiff
Prof. Andy Lawrence - ROE
Prof. Carole Goble - Manchester
Prof. Paul Jeffreys - Oxford

Internal
Neil Geddes
John Gordon
Prof. Keith Jeffery
Prof. Paul Durham

Page 7: SomeSlides

e-Science in 2001
• CCLRC e-Science Centre
– ~ 8 people
– 10 Projects covering astronomy, particle physics and computing
– £1M p.a.

e-Science Industry day February 2001

Page 8: SomeSlides

e-Science in 2007

• Over 100 staff in e-Science Centre
• £11M income in 2006/07
• Projects in HEP, astronomy, biomedical simulation, environmental science, nano-technology, materials science

•UK Leadership in grid infrastructure

•European leadership in data curation

Page 9: SomeSlides

Some e-science facts and figures

[Pie chart: eScience Income 06/07, £10.8m, by source. Sources: CCLRC, CCLRC Library, PPARC, Other RC, JISC, EU, Other Government, Other. Slice values (order as extracted, not matched to sources): 9%, 7%, 19%, 18%, 11%, 3%, 3%, 30%.]

[Bar chart: FTE e-Science Staff by year, 2001/02 to 2007/08; vertical axis from 0 to 120.]

113 staff, 28 female (8 in Library), 23 fixed term

Page 10: SomeSlides

Collaborative tools

Department Overview

STFC e-Science Centre is:
– using leading edge IT to deliver new science
• Management and exploitation of large scale scientific data.
• High-quality scientific computing services
• Support for collaborative working
• Collaborative R&D
– Sharing expertise - technology transfer
– Based on core skills:
• Data analysis and Computation
• Data storage
• Data management

Page 11: SomeSlides

Conclusion
• Strong personal belief in opportunities from ICT
• Specific opportunities for STFC:
– Exploit experience in grand challenges like LHC and IPCC
– Encourage collaboration across STFC facilities
– Build on our unique position to lead developments internationally
– Leverage the infrastructure deployed for wider UK benefit
– Meet the ICT expectations of modern researchers
– Use the above to stimulate innovation and support science research
• Achieving these requires
– Living close to the technology edge
– Providing technological expertise and vision
– Managing technology push and user pull
– Active research expertise

“innovate or die” – anon.

Page 12: SomeSlides

GridPP, LCG and EGEE

CCLRC e-science centre
- LHC Tier-1
- Regional Operations Centre (UK+I)
- Coordinator of National Grid Service
- Partner in other grid deployments

Tier-1

Page 13: SomeSlides

Facilities e-Infrastructure

Diamond synchrotron

ISIS neutron and muon facility

Vulcan laser facility

Physical facilities provide data for the information Infrastructure

• Record data
• Store data
• Search data
• Share data

Integrated system for DLS demonstrated February 2007

ISIS 20 year back catalogue

ISIS available online

Page 14: SomeSlides

Multi-disciplinary environmental science programmes
– Molecular studies of pollutants and radiation damage
– Data integration resources

CCLRC provides technological support
– Data management infrastructure
– Grid computing
– Data and information standardisation
• CML, CSML

Environmental Science

British Atmospheric Data Centre http://badc.nerc.ac.uk

http://ndg.nerc.ac.uk

British Atmospheric Data Centre

British Oceanographic Data Centre

Simulations

Assimilation

NERC Data Grid : Googling for secure data

Page 15: SomeSlides

Bio-Medical Sciences

Data management in post-genomic biology
– Integrated Systems Biology Centre
– High throughput experiments
– Preparations for biomedical use of DLS/ISIS ...

Biomedical simulation and integrated systems biology
– Integrative Biology
• Data sharing infrastructure
• Data integration and visualisation

[Diagram stages: Target Selection, Protein Production, Crystallisation, Data Collection, Phasing, Protein Structure, Structure analysis, Deposition]

Overview of Protein Crystallography

The Ontogenesis Network

Page 16: SomeSlides

Materials and Nanotechnology

Characterisation of Materials structure and properties
– e-Science technology for real time analysis for experiments
– Ability to run, manage and integrate the results of hundreds of distinct calculations
– Advanced visualisation for better result analysis
– Long lasting archives of scientific results with easy access for scientists
– Ability to share results easily when required

Acid Sites in Zeolites

Page 17: SomeSlides

International

Who?

Encourage and influence development of infrastructure

Synchrotron and Neutron Data Infrastructure

European Data Infrastructure

Support UK developments, drive standard access Europe wide
Develop position as a good host + develop access for UK researchers

Access to Scientific Data:

Grid Infrastructure: ESFRI/e-IRG

E-infrastructure??

Page 18: SomeSlides

Summary of STFC implementation of IB Grid services and applications for Integrative Biology

•A prototype IB grid with server side visualization to handle extremely large datasets (100MB per small experiment) generated on HPCx and other NGS clusters.

• Interfaces to the grid job management and SRB built on CoolGraphics, Meshalyser & Matlab and also a standalone C++ GUI for IB services.

• Control panels of specific application packages deployed on desktop while the functional core executes on NGS for data encoding & decoding

• Results sent to desktop as well as display walls

Page 19: SomeSlides

Summary of STFC implementation of IB Grid services and applications for Integrative Biology

•Implementation of soft tissue cancer models on the grid (parallelisation included), with embedded computational steering

•Implementation of 3D image reconstruction in real time using the visualisation cluster

• MRI & histopathology images of heart data

• in-vivo cancer image data (for statistics on histopathology)

• Arterial stent tomography data from ESRF

Schematic of stent in artery

Stent image to geometry reconstruction

Processed image with tumour cells and blood vessels highlighted

Result from edge detection

Screenshot of real-time 3D image reconstruction, halfway through. STFC visualization cluster is used and image sent to remote desktop

Page 20: SomeSlides

SKOS Phase 2 (2005-06)

•W3C Semantic Web Best Practices and Deployment (SWBPD) Working Group

– HP, IBM, Boeing, Adobe, Universities of Maryland, Stanford, Manchester, Amsterdam

•Task force to further develop SKOS – Alistair Miles (STFC) lead

Page 21: SomeSlides

Digital Curation research activities

David Giaretta
Director of CASPAR Project and Associate Director UK Digital Curation Centre

Page 22: SomeSlides

Outline

• Background
• OAIS
• CASPAR and DCC
• Future research and projects
• Summary

Page 23: SomeSlides

Digital Preservation…

Easy to do… …as long as you can provide money forever

Easy to test claims about tools… …as long as you live a long time

Page 24: SomeSlides

OAIS (ISO 14721)
Open Archival Information System

Reference Model
– referenced in just about any serious work on digital preservation
– Development hosted by CCSDS Panel 2

5 year ISO review underway
– minor corrections and updates
– No major changes

Revised version due early 2008

Chaired by DG

Page 25: SomeSlides

OAIS Functional Entities

SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package

[Figure: OAIS functional entities. Entities: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External roles: PRODUCER, CONSUMER, MANAGEMENT. Flow labels: SIP, AIP, DIP, Descriptive Info., queries, result sets, orders.]

[Figure: OAIS Preservation Planning. Functions: Monitor Designated Community, Monitor Technology, Develop Preservation Strategies and Standards, Develop Packaging Designs & Migration Plans. Connected to: Administration, PRODUCER, CONSUMER. Flow labels: Approved standards, Migration goals, Proposals, Recommendations, Technology alerts, External data standards, Prototype requests, Prototype results, Reports, Requirement alerts, Emerging standards, Product technologies, Surveys, Service requirements, AIP/SIP templates, AIP/SIP review, Migration packages, Customization advice, Inventory reports, Performance info, Consumer comments, Preservation requirements, Advice, Issues.]
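The package flow named on this slide (SIP in, AIP stored, DIP out) can be sketched as a toy data structure. This is an illustration only: OAIS is a reference model rather than an API, and every class, field and method name below is a hypothetical choice, not taken from the standard or the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package, sent by the Producer."""
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package, held in Archival Storage."""
    aip_id: int
    content: bytes
    description: str


@dataclass
class DIP:
    """Dissemination Information Package, delivered to the Consumer."""
    aip_id: int
    content: bytes


@dataclass
class Archive:
    storage: dict = field(default_factory=dict)       # stands in for Archival Storage
    descriptions: dict = field(default_factory=dict)  # stands in for Data Management

    def ingest(self, sip):
        """Ingest: turn a SIP into a stored AIP and record its descriptive info."""
        aip_id = len(self.storage) + 1
        self.storage[aip_id] = AIP(aip_id, sip.content, sip.description)
        self.descriptions[aip_id] = sip.description
        return aip_id

    def query(self, term):
        """Access: answer a Consumer query with a result set of AIP ids."""
        needle = term.lower()
        return [i for i, d in self.descriptions.items() if needle in d.lower()]

    def order(self, aip_id):
        """Access: fulfil a Consumer order by building a DIP from an AIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content)
```

The sketch omits Administration and Preservation Planning entirely; a real archive would also carry representation and preservation information inside each AIP.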

Page 26: SomeSlides

CASPAR Project

http://www.casparpreserves.eu

EU FP6 Integrated Project

Total spend approx. 16MEuro (8.8 MEuro from EU)

Started April 2006, for 42 months

David Giaretta is Co-ordinator

Page 27: SomeSlides

CASPAR Aims

Produce tools and techniques to support digital preservation and make it easier to share the cost
– must be relatively easy to use
– must have a low “buy-in” in terms of effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards

Cannot do everything
– Working closely with other projects

Page 28: SomeSlides

CASPAR information flow architecture

• Rep Info

Virtualisation

How do we capture the Representation Information?

Page 29: SomeSlides

Overview
• Environmental drivers
• Technology drivers
• Revolution
• e-Science Centre’s role
– Environment
– Technology
• activities
– now
– future

Page 30: SomeSlides

e-Science Centre role

Environment
– Co-located at STFC with BADC, NEODC
– IPCC Data Distribution Centre
– NERC DataGrid
– Background in environmental science

Technology
– Standards (ISO, OGC)
– Architecture
– Expertise in ‘Grid’ technologies
– Information modelling

Page 31: SomeSlides

Activities – current

MOTIIVE (EU FP7, http://www.motiive.net)
– ISO 19109: General Feature Model
• cf. object metamodel: feature types, attributes, operations, associations
– ISO 19110: Feature cataloguing
– Feature Catalogue ≡ ‘semantics repository’
• powerful operational component in SDI
• inheritance: semantic re-use
• behaviour: service binding
– Developing candidate implementation
• ebRIM 19110 mapping
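The idea of a feature catalogue as a "semantics repository" with inheritance can be sketched as follows. This is not the ISO 19110 schema or the ebRIM mapping mentioned above; the class names, the example feature types and the attribute type names are all hypothetical, chosen only to show how a subtype re-uses the semantics of its parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureType:
    """One catalogue entry: a named type with attributes and operations."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> type name
    operations: list = field(default_factory=list)  # operation names
    parent: Optional[str] = None                    # name of the supertype, if any


class FeatureCatalogue:
    """Registry of feature types that resolves inherited attributes."""

    def __init__(self):
        self._types = {}

    def register(self, ftype):
        self._types[ftype.name] = ftype

    def all_attributes(self, name):
        """Return attributes of a type, including those inherited from ancestors."""
        ftype = self._types[name]
        inherited = self.all_attributes(ftype.parent) if ftype.parent else {}
        # Attributes declared on the subtype override inherited ones.
        return {**inherited, **ftype.attributes}


catalogue = FeatureCatalogue()
catalogue.register(FeatureType("Observation", {"time": "DateTime"}))
catalogue.register(
    FeatureType("TemperatureProfile", {"temperature": "Measure"}, parent="Observation")
)
```

Here `catalogue.all_attributes("TemperatureProfile")` includes `time` from `Observation`, which is the kind of semantic re-use the slide attributes to inheritance.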

Page 32: SomeSlides

Activities – current

INSPIRE (http://www.ec-gis.org/inspire)
– selected by EC to co-develop statutory Implementing Rules on data specifications
• D2.3: Scoping of themes
• D2.5: Generic Conceptual Model
• D2.6: Draft Methodology
• D2.7: Encoding
– ocean/atmosphere/met themes
• CSML leading candidate
– liaising with DEFRA on UK transposition/implementation

Page 33: SomeSlides

Activities – current

Standards
– ISO
• member of BSI IST/36
• ISO 19111-2: Parametric coordinates
• represent NERC interests (SLA)
– OGC
• ‘Observations and Measurements’ model
• GML
• KML
• OGC documents: 06-160r1, 07-112, 07-083

Page 34: SomeSlides

CCLRC Data Portal

[Diagram: the Facility Section holds ISIS, DLS, JAERI and further facilities (Facility N), each with its own local data and local metadata behind a Wrapper; the wrappers connect to the Core Data Portal Section.]

At the time CCLRC had:

1 World Data Centre

5 National Data Centres

10 minor community-based Data Centres

The Portal would make all of them accessible

CCLRC Data Portal
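The wrapper arrangement on this slide can be sketched as an adapter pattern: each facility keeps its own local metadata, and a wrapper presents it through one common search call that the core portal fans out to. This is an illustrative sketch, not the actual Data Portal code; the class names, record fields and sample entries are hypothetical.

```python
class FacilityWrapper:
    """Adapts one facility's local metadata to a common search interface."""

    def __init__(self, facility, local_metadata):
        self.facility = facility
        # Assumed record shape: {"title": str, "location": str}
        self._local_metadata = local_metadata

    def search(self, keyword):
        needle = keyword.lower()
        return [
            {"facility": self.facility, "title": m["title"], "location": m["location"]}
            for m in self._local_metadata
            if needle in m["title"].lower()
        ]


class DataPortal:
    """Core portal: sends one query to every wrapper and merges the results."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def search(self, keyword):
        results = []
        for wrapper in self._wrappers:
            results.extend(wrapper.search(keyword))
        return results


# Made-up sample records for two of the facilities named on the slide.
portal = DataPortal([
    FacilityWrapper("ISIS", [{"title": "Zeolite neutron study", "location": "isis://a"}]),
    FacilityWrapper("DLS", [{"title": "Zeolite diffraction", "location": "dls://b"},
                            {"title": "Protein crystal", "location": "dls://c"}]),
])
```

Adding another facility then means writing one more wrapper, with no change to the portal itself.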

Page 35: SomeSlides

CCLRC (now Core) Scientific Metadata Model

Metadata Object
– Topic: Keywords providing an index on what the study is about.
– Study Description: Provenance about what the study is, who did it and when.
– Access Conditions: Conditions of use providing information on who can access the data and how.
– Data Location: Locations providing navigation to where the data on the study can be found.
– Data Description: Detailed description of the organisation of the data into datasets and files.
– Related Material: References into the literature and community providing context about the study.

Today used by other e-Science Projects (e.g. MyGrid), Facilities (e.g. ISIS, DLS, CLF, Lab-in-a-Cell) and Internationally (e.g. SNS, CLS, Australia)
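The six parts of the metadata model can be sketched as a single record type, with a keyword index built from the Topic part. The field names follow the slide's labels, but the Python types, the `keyword_index` helper and the sample values used to exercise it are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class StudyMetadata:
    topic: list                  # keywords indexing what the study is about
    study_description: str       # provenance: what the study is, who did it, when
    access_conditions: str       # who can access the data and how
    data_description: str        # organisation of the data into datasets and files
    data_location: list          # where the data on the study can be found
    related_material: list = field(default_factory=list)  # literature references


def keyword_index(records):
    """Build a case-insensitive index from keyword to record positions."""
    index = {}
    for position, record in enumerate(records):
        for keyword in record.topic:
            index.setdefault(keyword.lower(), []).append(position)
    return index
```

An index of this kind is one way a catalogue could answer "which studies are about X" without scanning every record.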

Page 36: SomeSlides

Storage Resource Broker Virtualising the Users Data

First SRB installation outside SDSC, Distribution Version and Installation Guidelines, Making SRB ‘Grid aware’ through Grid Security, Licensing

Page 37: SomeSlides

ISIS 20 Year Back Catalogue

The catalogue holds 93,000 studies and 1.87 million data files, with 870,000 distinct keywords categorising the data.

Page 38: SomeSlides

What we aim to provide with the e-Infrastructure

Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information.

Creating a powerful, long lasting scientific knowledge resource.

Page 39: SomeSlides

Protecting our valuable assets - Data Curation

2 PhD and 1 MSc studentships with the Universities of Reading and Manchester on:

Long Term Metadata Management and Quality Assurance – Arif Shaon

The Usage of semantic technologies for long-term preservation – Kaixuan Wang

Page 40: SomeSlides

Future work

Dr. Robert McGreevy, ISIS

Integrating data from disparate sources into topic centres
– Challenges: Data Presentation and Integration, Trust, Encouraging usage of data from unfamiliar sources.