What is e-Infrastructure?
The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching).
1. Data and Storage
2. Software (and Algorithms)
3. Hardware (Compute)
4. Networks
5. Security and Authentication
6. People (Collaboration, Skills, Capacity)
7. The Digital Library
Bioinformatics software challenges
• Data changes everything for biology, and brings an onslaught of new challenges for bioinformatics:
– projects that used to require teams of 500 are now accessible to small teams
– but biology curricula (i.e. biologists) still lack computational skills
– thus biologists are overwhelmed by large amounts of data
– furthermore, the data types are young, so the software is young, thus:
• software may be badly built (by biologists with no formal software development training/experience)
• software needs to be frequently updated (bug fixes, algorithmic improvements to sensitivity/specificity, support for new data types)
ARCHER
• UK National Supercomputing Service
• Replacement for HECToR
• LINPACK = 1.359 Pflop/s
• EPSRC is the managing partner on behalf of RCUK; NERC is the other partner research council
• Cray XC30 hardware
• Nodes based on 2× Intel Ivy Bridge 12-core processors
• 64 GB (or 128 GB) memory per node
• 3008 nodes in total (72,192 cores)
• Linked by the Cray Aries interconnect (dragonfly topology)
JASMIN Cloud Architecture
[Architecture diagram: an external network and the JASMIN internal network, separated by firewalls and NAT. The Managed Cloud (PaaS, SaaS) hosts the JASMIN Analysis Platform and science analysis VMs, with direct file system access to Panasas storage and direct access to the Lotus batch compute cluster. The Unmanaged Cloud (IaaS, PaaS, SaaS) hosts tenancies such as optirad-org (an IPython JupyterHub VM, IPython slave VMs, and a file server VM, with notebooks accessing the cluster through IPython.parallel) and eos-cloud-org (science analysis VMs, a file server VM, and an EOS Cloud fat node providing Desktop as a Service with dynamic RAM boost). Each tenancy has its own appliance catalogue. External access uses standard remote access protocols (ftp, http, …) and the JASMIN cloud management interfaces.]
Thanks to Phil Kershaw
Bio-Linux: A scalable solution
• Comprehensive, free bioinformatics workstation based on Ubuntu Linux and Debian Med
• 11 years & 8 major releases
• Around 8,000 users from 1,600 locations
• 200+ bioinformatics packages, including big integrative tools: QIIME, Galaxy Server, PredictProtein, EMBOSS, …
• Incorporates all software in every deployment option: Dual Boot, Linux Live, Local Servers, Cloud
EOS Cloud
• A tenancy in the JASMIN Unmanaged Cloud (& QMUL RCC)
• Reuses JASMIN web interfaces and user management to provide a custom IaaS software platform
• Each user receives two VMs:
– Bio-Linux
– Ubuntu Docker hosting environment
• Users have total responsibility for the instantiated system
• Accessible through standard remote desktop tools
• Scalability limited by the support available
Why Cloud?
• Data sets can be too big, or too restricted, to move easily
– move the compute to the data
– researcher work patterns are maintained
• Tools such as Bio-Linux/Docker are community enablers
• More efficient use of shared resources
• Central maintenance of infrastructure
• Central management of data-sharing agreements is possible
• Lower barrier to entry (compared to traditional HPC and Grid)
• What type of cloud?
• What role for traditional HPC?
TRAINING IS KEY TO MAKING INFORMED CHOICES
EOS Cloud next?
• Expand currently available resource beyond current limitations?
• Create deployable machine image for other cloud marketplaces
• EOS/institutional badging to give users confidence in quality
Pilot Users
• CEH Bioinformaticians using the EOS Cloud to study patterns in microbial biodiversity
• Genomic and transcriptomic data from fish toxicogenomics studies at Exeter
Pilot Users
• Creating compute pipelines and containers for each OSD in silico analysis
– HPC, Cloud (IaaS & PaaS)
• Portable
– run the same analysis on different laptops/grids/clouds
• Repeatable/Reproducible
– the same input gives the same output, given that the reference databases did not change
• Preservation
– all analysis tools and dependencies are in one image
– images are simple tar.gz files
– preserving the Docker images and base images preserves the whole analysis
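The reproducibility claim above ("same input gives same output") can be checked mechanically by hashing pipeline outputs from independent runs. A minimal sketch in Python; the file names and contents are hypothetical stand-ins for real pipeline output:

```python
import hashlib
from pathlib import Path

def sha256sum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical outputs from two runs of the same containerised pipeline
run1 = Path("run1_output.tsv")
run2 = Path("run2_output.tsv")
run1.write_text("OTU_1\t42\nOTU_2\t17\n")
run2.write_text("OTU_1\t42\nOTU_2\t17\n")

# Identical digests mean bit-for-bit identical results
assert sha256sum(run1) == sha256sum(run2), "runs differ: not reproducible"
print("outputs identical:", sha256sum(run1)[:12])
```

Recording such digests alongside the archived Docker image (and the version of any external reference database) gives a concrete, checkable statement of what "reproducible" means for the analysis.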
The Software Sustainability Institute
www.software.ac.uk
A national facility for cultivating better, more sustainable research software to enable world-class research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
• Supported by RCUK
[Infographic: SSI activities — Communication (website & blog, campaigns), Advice (guides), Training (courses, workshops), Fellowship, Research, Software, Policy, Community, Consultancy — with headline figures: 41 projects, 92 evaluations, 4 surgeries, 33 UK SWC workshops, 1000+ learners, 50,000 readers, 41 domain ambassadors, 20+ workshops organised, 740 researchers, 50,000 grants analysed, 150+ contributed articles, 19,000 unique visitors per month, 272 RSEs engaged, 1,700 signatures, 13 issues highlighted]
The end of the beginning, not the beginning of the end!
• A holistic approach is required, with all parts of the e-infrastructure supported, from the hard to the soft to the wet!
• Good start-up investments need continuity to ensure impact
– certain tools are foundations upon which large swathes of the community depend
– putting tools next to immovable data ensures value!
• Integrating with larger activities ensures the benefits of scaling
– you can't steer something you're not involved with…
• Abstract the underpinning e-infrastructure services away from the users, as they're not interested!
– something run on one resource should be able to move to others through the use of standards etc.!
– I have ignored the institutional resources…
****WARNING****
Institute for Environmental Analytics Summer School on e-infrastructure for the environment, 19th – 22nd Sept '16, Oxford.