The e-Science Vision
Enabling New Science through Innovative Integrated Technology Solutions
The Mission
To spearhead the exploitation of e-Science technologies throughout STFC programmes, the research communities they support and the national science and engineering base.
To “e-enable” the STFC facilities.
The Vision
• An increasingly sophisticated infrastructure supporting innovative exploitation of data from the full range of STFC facilities.
– integrated into National and International activities.
• Improved use of computation and data management in areas with little historic engagement but growing needs.
• Exploit emerging technologies to further enhance UK capabilities.
• Better science...
– accelerate the research process
– improve traceability and reproducibility
– meet the challenges posed by increasing data volumes
– improve cost effectiveness and quality
– encourage collaboration and knowledge exchange
– enable researchers to tackle more of the world’s grand challenges
– improve the long-term exploitation of research outputs
– bridge facilities and users
Ken Peach
UK e-Infrastructure
LHC
ISIS TS2
HPCx + HECtoR
Users get common access, tools, information, Nationally supported services, through NGS
Strategy
• Expertise in systems, applications and information management
• Develop and support the integrated e-infrastructure required by researchers
– Focused around exploiting the full lifecycle for scientific data
– Developed through Science led projects
– User focused, standards based, acknowledging constraints from National and International collaborations and Government priorities.
• Direct contributions to projects and activities
– e.g. LHC, ISIS, DLS, CLF…
– Competitive and technology push
• R&D to inform and support future programmes
– Grid infrastructures for the UK and Europe
– Information management in a distributed heterogeneous environment
– Long term data curation
– Advanced analysis and visualisation
• Leveraging investment through provision of services to partner organisations
• Engage Nationally and Internationally.
• Take expert advice. The e-Science Advisory Board
e-Science Advisory Board
External
Dr. Daron Green - BT
Prof. David Ingram - UCL
Dr. David Williams - CERN
Dr. Jerzy Graff - BMT
Dr. Graham Cameron - EBI
Prof. Malcolm Atkinson - NeSC
Prof. Alex Gray - Cardiff
Prof. Andy Lawrence - ROE
Prof. Carole Goble - Manchester
Prof. Paul Jeffreys - Oxford
Internal
Neil Geddes
John Gordon
Prof. Keith Jeffery
Prof. Paul Durham
e-Science in 2001
• CCLRC e-Science Centre
– ~8 people
– 10 Projects covering astronomy, particle physics and computing
– £1M p.a.
e-Science Industry day February 2001
e-Science in 2007
• Over 100 staff in e-Science Centre
• £11M income in 2006/07
• Projects in HEP, astronomy, biomedical simulation, environmental science, nano-technology, materials science
Conclusion
• Strong personal belief in opportunities from ICT
• Specific opportunities for STFC:
– Exploit experience in grand challenges like LHC and IPCC
– Encourage collaboration across STFC facilities
– Build on our unique position to lead developments internationally
– Leverage the infrastructure deployed for wider UK benefit
– Meet the ICT expectations of modern researchers
– Use the above to stimulate innovation and support science research
• Achieving these requires
– Living close to the technology edge
– Providing technological expertise and vision
– Managing technology push and user pull
– Active research expertise
“innovate or die” –anon.
GridPP, LCG and EGEE
CCLRC e-science centre
- LHC Tier-1
- Regional Operations Centre (UK+I)
- Coordinator of National Grid Service
- Partner in other grid deployments
Physical facilities provide data for the information Infrastructure
• Record data
• Store data
• Search data
• Share data
Integrated system for DLS demonstrated February 2007
ISIS 20 year back catalogue
ISIS available online
Multi-disciplinary environmental science programmes
– Molecular studies of pollutants and radiation damage
– Data integration resources
CCLRC provides technological support
– Data management infrastructure
– Grid computing
– Data and information standardisation
• CML, CSML
Environmental Science
British Atmospheric Data Centre http://badc.nerc.ac.uk
http://ndg.nerc.ac.uk
British Atmospheric Data Centre
British Oceanographic Data Centre
Simulations
Assimilation
NERC Data Grid : Googling for secure data
Bio-Medical Sciences
Data management in post-genomic biology
– Integrated Systems Biology Centre
– High throughput experiments
– Preparations for biomedical use of DLS/ISIS ...
Biomedical simulation and integrated systems biology
– Integrative Biology
• Data sharing infrastructure
• Data integration and visualisation
Overview of Protein Crystallography
Target Selection
Protein Production
Crystallisation
Data Collection
Phasing
Protein Structure
Structure analysis
Deposition
The Ontogenesis Network
Materials and Nanotechnology
Characterisation of Materials structure and properties
– e-Science technology for real time analysis for experiments
– Ability to run, manage and integrate the results of hundreds of distinct calculations
– Advanced visualisation for better result analysis
– Long lasting archives of scientific results with easy access for scientists
– Ability to share results easily when required
Acid Sites in Zeolites
International
Who?
Encourage and influence development of infrastructure
Synchrotron and Neutron Data Infrastructure
European Data Infrastructure
Support UK developments, drive standard access Europe wide
Develop position as a good host + develop access for UK researchers
Summary of STFC implementation of IB Grid services and applications for Integrative Biology
• A prototype IB grid with server side visualization to handle extremely large datasets (100 MB per small experiment) generated on HPCx and other NGS clusters.
• Interfaces to the grid job management and SRB built on CoolGraphics, Meshalyser & Matlab and also a standalone C++ GUI for IB services.
• Control panels of specific application packages deployed on desktop while the functional core executes on NGS for data encoding & decoding
• Results sent to desktop as well as display walls
• Implementation of soft tissue cancer models on the grid (parallelisation included), with embedded computational steering
• Implementation of 3D image reconstruction in real time using the visualisation cluster
• MRI & histopathology images of heart data
• in-vivo cancer image data (for statistics on histopathology)
• Arterial stent tomography data from ESRF
Schematic of stent in artery
Stent image to geometry reconstruction
Processed image with tumour cells and blood vessels highlighted
Result from edge detection
Screenshot of real-time 3D image reconstruction, halfway through. The STFC visualization cluster is used and the image is sent to a remote desktop
SKOS Phase 2 (2005-06)
• W3C Semantic Web Best Practices and Deployment (SWBPD) Working Group
support digital preservation and make it easier to share the cost
– must be relatively easy to use
– must have a low “buy-in” in terms of effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards
Cannot do everything
– Working closely with other projects
CASPAR information flow architecture
• Rep Info
Virtualisation
How do we capture the Representation Information?
Overview
Environmental drivers
Technology drivers
Revolution
e-Science Centre’s role
Environment
Technology
Activities (now, future)
e-Science Centre role
Environment
– Co-located at STFC with BADC, NEODC
– IPCC Data Distribution Centre
– NERC DataGrid
– Background in environmental science
Technology
– Standards (ISO, OGC)
– Architecture
– Expertise in ‘Grid’ technologies
– Information modelling
Activities – current
MOTIIVE (EU FP7, http://www.motiive.net)
– ISO 19109: General Feature Model
Keywords providing an index on what the study is about.
Provenance about what the study is, who did it and when.
Conditions of use providing information on who can access the data and how.
Detailed description of the organisation of the data into datasets and files.
Locations providing navigation to where the data on the study can be found.
References into the literature and community providing context about the study.
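The six metadata categories above can be pictured as one record per study. The following is only an illustrative sketch: the class name `StudyRecord`, its field names and types, the helper `build_keyword_index` and the example values are all assumptions made for this illustration, not the schema or API of the actual metadata model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    """Hypothetical record with one field per metadata category listed above."""
    keywords: list[str]            # index terms: what the study is about
    provenance: dict[str, str]     # what the study is, who did it and when
    conditions_of_use: str         # who can access the data and how
    data_description: str          # organisation of the data into datasets and files
    locations: list[str]           # where the data on the study can be found
    references: list[str] = field(default_factory=list)  # literature and community context


def build_keyword_index(records):
    """Map each lower-cased keyword to the positions of the records that carry it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for keyword in record.keywords:
            index[keyword.lower()].add(position)
    return dict(index)


# Invented example values, for illustration only.
example = StudyRecord(
    keywords=["zeolite", "acid sites"],
    provenance={"title": "example study", "investigator": "A. Researcher", "date": "2007-02"},
    conditions_of_use="restricted to the study team",
    data_description="one dataset containing two data files",
    locations=["archive://example/study-1"],
)
index = build_keyword_index([example])
```

Keeping keywords as a separate, indexable field is what lets a catalogue answer "which studies are about X" without opening the data files themselves.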
Today used by other e-Science Projects (e.g. MyGrid), Facilities (e.g. ISIS, DLS, CLF, Lab-in-a-Cell) and Internationally (e.g. SNS, CLS, Australia)
Storage Resource Broker: Virtualising the User’s Data
First SRB installation outside SDSC, Distribution Version and Installation Guidelines, Making SRB ‘Grid aware’ through Grid Security, Licensing
ISIS 20 Year Back Catalogue
The catalogue holds 93 000 studies and 1.87 million data files, with 870 000 distinct keywords categorising the data.
What we aim to provide with the e-Infrastructure
Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information.
Creating a powerful, long lasting scientific knowledge resource.
Protecting our valuable assets - Data Curation
2 PhD and 1 MSc studentships with the Universities of Reading and Manchester on:
Long Term Metadata Management and Quality Assurance – Arif Shaon
The Usage of semantic technologies for long-term preservation – Kaixuan Wang
Future work
Dr. Robert McGreevy, ISIS
Integrating data from disparate sources into topic centres
– Challenges: Data Presentation and Integration, Trust, Encouraging usage of data from unfamiliar sources.