“Campus Success Stories: Research Cyberinfrastructure at Calit2 & UCSD”
Panel Presentation Internet2 Joint Techs meeting
Baton Rouge, LA 24 January 2012
Tad Reynales, Chief Infrastructure Officer California Institute for Telecommunications and Information Technology
University of California, San Diego http://www.calit2.net
Abstract
Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered for interaction with megabyte-sized data objects, cannot cope with the gigabytes, terabytes, and petabytes of modern scientific “big data”. Instead, a high-performance cyberinfrastructure is emerging to support data-intensive research, which requires affordable, reliable network, compute, storage, and archive resources, plus staff CI expertise. Calit2, SDSC, and the UCSD campus are engaged in a multi-year effort to design and deploy campus cyberinfrastructure that will integrate data generation, transmission, storage, analysis, visualization, curation, and sharing, driven by applications such as high-throughput genomics -- DNA and RNA sequencing and gene expression profiling -- for scientific and medical research.
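The scale gap the abstract describes can be made concrete with a back-of-envelope calculation; link speeds below are illustrative, and ideal conditions (no protocol overhead) are assumed:

```python
# Transfer times for "big data" objects at different link speeds,
# illustrating why the shared Internet struggles beyond megabytes.
# Assumes ideal links with no protocol overhead; figures are illustrative.

def transfer_seconds(size_bytes: float, link_bps: float) -> float:
    """Time to move size_bytes over a link of link_bps (bits per second)."""
    return size_bytes * 8 / link_bps

TB = 10**12
# 1 TB over a typical 100 Mb/s shared-Internet path vs. a 10 Gb/s research path
slow = transfer_seconds(1 * TB, 100e6)
fast = transfer_seconds(1 * TB, 10e9)
print(f"1 TB @ 100 Mb/s: {slow / 3600:.1f} h")   # roughly a day
print(f"1 TB @ 10 Gb/s:  {fast / 60:.1f} min")   # minutes
```

The two-orders-of-magnitude difference is why dedicated 10 Gb/s research paths, rather than the shared campus Internet connection, are the backbone of the RCI design described in the following slides.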
UCSD Network Backbone
[Diagram: campus backbone topology linking the core nodes (B-720, M-720) to distribution nodes serving buildings including SIO, Muir, Calit2, BIOE, EBU1, EBU3B, and HEP. Distribution nodes provide the default gateway / L3 routing, packet-capture forwarding, NetFlow statistics, stateful firewalling, and MAC drops. A primary path carries all routed traffic, with a backup path and a separate research path for VLAN extension. Scale: 300 buildings, 75,000 IPs, 1,000 switches.]
Source: ACT, UCSD
UCSD Colocation and Research Networks
[Diagram: colocation and research network topology. Arista switches ("Thunder" and "Lightning") and Juniper EX switches interconnect research sites (OptIPuter, Calit2, Bioengineering, Leichtag, CMM East and CMM West) with backbone nodes (BB-1, BB-2), research-core nodes (RC-1, RC-2), and the campus cores (M-core, B-core).]
Source: ACT, UCSD
UCSD Campus
CENIC Connections
[Diagram: CENIC connections from the UCSD border routers (MX-0, MX-1) to CENIC's HPR, DC, and ISP services: the commodity Internet, K-12, community colleges, state universities, Akamai, Google, Amazon S3, and the other UCs, over 1 to 20 Gb/s links via Los Angeles, Riverside, Tustin, and diverse San Diego paths.]
Source: ACT, UCSD
CENIC HPR Backbone
[Diagram: CENIC HPR backbone with 20G, 10G, and 1G links among hubs in Sacramento (SAC), Sunnyvale (SVL), Los Angeles (LAX), Riverside (RIV), and San Diego, connecting the UC campuses (UCB, UCD, UCDMC, UCSF, UCSC, UCOP, UCSB, UCM, UCLA, UCI, UCR, UCSD), Stanford, Caltech, USC/Los Nettos, NPS, a NASA archive, UCCSN, UofA, and ASU.]
Source: ACT, UCSD
Current UCSD Prototype Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services
[Diagram: the Quartzite communications core (Year 3) bridges end-user clusters to the CalREN-HPR research cloud and the campus research cloud. GigE switches with dual 10GigE uplinks feed cluster nodes; a wavelength-selective switch, a production OOO (all-optical) switch, and a 32-port 10GigE packet switch interconnect 10GigE cluster-node interfaces and other switches; a Juniper T320 connects outward over 4 GigE on 4 fiber pairs.]
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)
UCSD CI Design Team Recommendations
* COLOCATION The CIDT recommends that UCSD fund the use of at least 45 racks in this facility for near-term needs of campus researchers to freely host their equipment, and begin discussions on how to meet long-term needs.
* CENTRALIZED DISK STORAGE The CIDT recommends an initial purchase of 2 PB of raw storage capacity to supplement the Data Oasis component of SDSC’s Triton Resource, and operating funds to manage and scale up the UCSD storage resource to meet demand.
* DIGITAL CURATION AND SERVICES The CIDT recommends the establishment of the Research Data Depot, a suite of three core services designed to meet the needs of modern researchers: 1) data curation, 2) data discovery and integration, and 3) data analysis and visualization.
* RCI NETWORK The CIDT recommends that the current RCN pilot be expanded, and requests funds to connect 25 buildings using 10 Gb/s Ethernet networking within the next several years. Funding and access philosophy would aim to encourage usage of the network.
* CONDO CLUSTERS The CIDT recommends UCSD embrace the concept of condo clusters and exploit the deployment of the Triton Resource to launch the initiative.
* CI EXPERTISE The CIDT recommends that a coordinating body be established to maintain a labor pool of such experts and work out mechanisms that would allow customers to pay for their services.
Data Oasis – 3 Different Types of Storage
HPC Storage (Lustre-Based PFS) • Purpose: Transient Storage to Support HPC, HPD, and Visualization • Access Mechanisms: Lustre Parallel File System Client
Cloud Storage • Purpose: Long-Term Storage of Data that will be Infrequently Accessed • Access Mechanisms: S3 interfaces, Dropbox-style web interface, CommVault
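As an illustration of the tiering above, here is a minimal policy sketch for routing a dataset to a transient Lustre scratch tier versus long-term cloud storage; the tier names and thresholds are entirely hypothetical, not Data Oasis configuration:

```python
# Hypothetical policy for routing a dataset to a storage tier like those
# described above: transient HPC scratch (Lustre) vs. long-term cloud
# storage. Tier names and the 90-day threshold are invented for the sketch.

def choose_tier(active_job: bool, days_since_access: int) -> str:
    """Pick a storage tier for a dataset based on its access pattern."""
    if active_job:
        return "lustre-scratch"      # parallel file system for running HPC jobs
    if days_since_access > 90:
        return "cloud-archive"       # infrequently accessed, S3-style interface
    return "project-storage"         # default medium-term tier

print(choose_tier(active_job=True, days_since_access=0))     # lustre-scratch
print(choose_tier(active_job=False, days_since_access=365))  # cloud-archive
```

In practice a site would drive such a policy from file-system scan data rather than per-dataset flags, but the division of labor is the same: fast transient storage for jobs in flight, cheaper object storage for cold data.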
Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
• 2005: $80K/port, Chiaro (60 ports max)
• 2007: $5K/port, Force 10 (40 ports max)
• 2009: $500/port, Arista, 48 ports; ~$1,000/port fully populated (300+ ports max)
• 2010: $400/port, Arista, 48 ports

• Port Pricing is Falling
• Density is Rising – Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
Source: Philip Papadopoulos, SDSC/Calit2
Arista Enables SDSC’s Massive Parallel 10G Switched Data Analysis Resource
NSF Funds a Big Data Supercomputer: SDSC’s Gordon, Dedicated Dec. 5, 2011
• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode has Virtual Shared Memory: 2 TB RAM Aggregate, 8 TB SSD Aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System, >100 GB/s I/O
• System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely SDSC
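Gordon's MEM/IOPS-over-FLOPS emphasis can be illustrated with a roofline-style check of whether a kernel is limited by compute or by data movement. The peak numbers below are illustrative placeholders, not Gordon's actual specifications:

```python
# Roofline-style check: is a kernel bound by compute or by data movement?
# A data-intensive machine like Gordon targets the memory-bound case.
# Peak rates here are illustrative, not real hardware specifications.

def bound_by(flops: float, bytes_moved: float,
             peak_flops: float, peak_bw: float) -> str:
    """Compare ideal compute time vs. ideal data-movement time for a kernel."""
    compute_t = flops / peak_flops
    memory_t = bytes_moved / peak_bw
    return "memory-bound" if memory_t > compute_t else "compute-bound"

# A data-intensive scan doing 1 flop per 8 bytes read, on a node with
# 100 GFLOP/s peak compute and 10 GB/s of memory/I-O bandwidth:
print(bound_by(flops=1e9, bytes_moved=8e9,
               peak_flops=100e9, peak_bw=10e9))  # memory-bound
```

For such workloads, adding FLOPS buys nothing; adding bandwidth and IOPS (e.g. via flash and large aggregate RAM) shortens the run, which is the design point the slide describes.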
UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage
Source: Philip Papadopoulos, SDSC, UCSD
OptIPortal Tiled Display Wall
Campus Lab Cluster
Digital Data Collections
N x 10Gb/s
Triton – Petascale Data Analysis
Gordon – HPD System
Cluster Condo
WAN 10Gb: CENIC, NLR, I2
Scientific Instruments
DataOasis (Central) Storage
GreenLight Data Center
Making University Campuses Living Laboratories for the Greener Future
Calit2 Microbial Metagenomics Cluster: Next-Generation Optically Linked Science Data Server
• 512 Processors, ~5 Teraflops
• ~200 TB Sun X4500 Storage
• 1GbE and 10GbE Switched/Routed Core
Source: Phil Papadopoulos, SDSC, Calit2
Grant Announced January 17, 2006
Calit2 CAMERA: Over 4000 Registered Users From Over 80 Countries
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
http://camera.calit2.net/
The greening of the data center
Multiple strategies are possible:
– More energy-efficient compute/storage/network designs
– Local energy-efficient data center designs
– Remote data centers near less expensive energy sources
– Cloud resources and cloud-based services (local, remote)
– Novel cooling technologies: liquid-cooled CPUs, computers, and racks; oil-immersion designs
– Containerized data centers (HP POD, IBM PMDC, Cirrascale Forest)
– Energy-efficient algorithms; software to move workload around
– UCSD campus energy production (solar, co-generation, fuel cell)
* 2 MW solar, 5 MW renewable produced on campus; 40 MW peak demand
* 2.8 MW fuel cell planned, along with 2.8 MW advanced storage facility
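A quick sanity check of the campus energy figures above, assuming the 5 MW renewable figure is the relevant on-campus total (including the solar):

```python
# Fraction of UCSD's peak electrical demand covered by on-campus
# renewables, using the figures from the slide above. Assumes the 5 MW
# renewable total already includes the 2 MW of solar.

solar_mw = 2.0
renewable_mw = 5.0           # total renewables produced on campus
planned_fuel_cell_mw = 2.8   # planned addition
peak_demand_mw = 40.0

current_fraction = renewable_mw / peak_demand_mw
with_fuel_cell = (renewable_mw + planned_fuel_cell_mw) / peak_demand_mw
print(f"renewables today:          {current_fraction:.1%} of peak demand")
print(f"with planned fuel cell:    {with_fuel_cell:.1%} of peak demand")
```

Even with the planned fuel cell, on-campus generation covers under a fifth of peak demand, which is why the slide lists so many complementary strategies (efficiency, remote siting, workload migration) alongside local production.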
The GreenLight Project: Instrumenting the Energy Cost of Computational Science
• Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics – Ocean Observing – Microscopy – Bioinformatics – Digital Media
• Measure, Monitor, & Web Publish Real-Time Sensor Outputs
– Via Service-Oriented Architectures
– Allow Researchers Anywhere To Study Computing Energy Cost
– Enable Scientists To Explore Tactics For Maximizing Work/Watt
• Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness
• Data Center for School of Medicine Illumina Next-Gen Sequencer Storage and Processing
Source: Tom DeFanti, Calit2; GreenLight PI
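The work-per-watt metric GreenLight lets researchers study can be sketched from its ingredients: power samples during a job plus a count of work completed. The sampling interval and job numbers below are made up for illustration:

```python
# Sketch of a work-per-energy calculation from real-time power sensor
# samples, the kind of data GreenLight publishes. The sampling interval
# and the job's numbers are invented for the example.

def joules(samples_w, interval_s):
    """Integrate power samples (rectangle rule) into energy in joules."""
    return sum(samples_w) * interval_s

samples = [220, 240, 250, 230]   # watts, sampled every 60 s during a job
energy_j = joules(samples, 60)
work_units = 1_000_000           # e.g. sequence alignments completed

print(f"energy used:   {energy_j / 1000:.1f} kJ")
print(f"work per kJ:   {work_units / (energy_j / 1000):.0f}")
```

Comparing this ratio across architectures for the same job is exactly the "maximizing work/watt" exploration the slide describes.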
GreenLight Project: Remote Visualization of Data Center
Source: Virtual Reality Lab, Calit2
GreenLight Project Heat Distribution
Combined heat + fans
Realistic correlation
Source: glimpse.calit2.net
GreenLight Project Allows for Testing of Novel Architectures on Bioinformatics Algorithms
“Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours.” —From “Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture,” IEEE Design & Test of Computers, Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey) (2011)
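A quick bit of arithmetic on the reported speedups (this inference is ours, not from the paper) shows what they imply about the CPU baseline's own parallel scaling:

```python
# The quote reports the FPGA version is 115x faster than one Nehalem core
# and 15x faster than eight cores. Dividing the two recovers the implied
# 8-core CPU speedup and parallel efficiency. Illustrative arithmetic only.

speedup_vs_1core = 115
speedup_vs_8core = 15

cpu_8core_speedup = speedup_vs_1core / speedup_vs_8core  # ~7.7x on 8 cores
efficiency = cpu_8core_speedup / 8                        # ~96% efficiency

print(f"implied 8-core CPU speedup: {cpu_8core_speedup:.1f}x")
print(f"implied parallel efficiency: {efficiency:.0%}")
```

In other words, the CPU baseline scales nearly perfectly to eight cores, so the reconfigurable coprocessor's advantage is architectural rather than an artifact of a weak baseline.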
UCSD Planned Optical Networked Biomedical Researchers and Instruments
• Cellular & Molecular Medicine West
• National Center for Microscopy & Imaging
• Leichtag Biomedical Research
• Center for Molecular Genetics
• Pharmaceutical Sciences Building
• Cellular & Molecular Medicine East
• CryoElectron Microscopy Facility
• Radiology Imaging Lab
• Bioengineering
• Calit2@UCSD
• San Diego Supercomputer Center
• GreenLight Data Center
• Connects at 10 Gbps:
– Microarrays
– Genome Sequencers
– Mass Spectrometry
– Light and Electron Microscopes
– Whole Body Imagers
– Computing
– Storage
Summary -- 6 Ingredients of Campus RCI
RCI requires sustainable funding models to enable affordable, reliable:
I. High-performance networks – lab, campus → regional, national, global
II. Storage resources and services to store and preserve research data
III. Compute resources – shared services, colocation or condo clusters
IV. Data curation and management to share and archive research data - Federated with metadata standards and Digital Asset Management system - Domain-specific repositories (Protein Data Bank, GenBank, Cancer Genome)
V. Energy source(s) and data center space (several possible strategies)
VI. Cyberinfrastructure expertise (training, experience, collaboration)
…one more ingredient of RCI: Microbrews!
Acknowledgements
Sponsors:
• Larry Smarr, Director, Calit2 (a UC San Diego/UC Irvine partnership)
• Ramesh Rao, Director, UCSD Division, Calit2
Sources:
• Larry Smarr, Professor of Computer Science and Engineering, UCSD
• Tom DeFanti, Director of Visualization, Calit2
• Philip Papadopoulos, Division Director, SDSC and Researcher, Calit2
• Brian Dunne, Network Engineer, Calit2
• Chris Misleh, Sysadmin, High-Throughput Genomics, Calit2
• UCSD School of Medicine, High-Throughput Genomics Core
• Valerie Polichar, Infrastructure Liaison, Administrative Computing & Telecommunications, UCSD
• Mark Shinn, (former) Network Architect, ACT, UCSD
• Corporation for Education Network Initiatives in California (CENIC)