Introduction to HPC in Canada Erming Pei Research Computing Group, UAlberta Compute Canada / WestGrid

Feb 03, 2018

Introduction to HPC in Canada

Erming Pei

Research Computing Group, UAlberta
Compute Canada / WestGrid

Outline & Schedule

• 10:00 Introduction to Compute Canada (15’)
• 10:15 Introduction to WestGrid (15’)
• 10:30 Q&A - 1 (5’)
• 10:35 Break (10’)
• 10:45 Introduction to HPC (40’)
• 11:25 Q&A - 2 (5’)

Introduction to Compute Canada

About Compute Canada

• Compute Canada integrates 4 regional HPC consortia across the country
– provides a shared HPC/ARC infrastructure across Canada
– supports world-class, leading-edge research activities

• CC aggregates petaflops of computing power and petabytes of storage capacity over Canada's high-performance networks.

• CC provides a full range of services, including infrastructure, applications, operations and user support, for users nationwide.

Compute Consortia

Previously, there were 7 consortia.
• ACENET
• CLUMEQ
• RQCHP
• HPCVL
• SciNet
• SHARCNET
• WestGrid

These have since been consolidated into four consortia.
• WestGrid
• Compute Ontario
• Calcul Québec
• ACENET

Existing Systems & Resources

• ~40 Universities
• ~27 Data Centers
• ~50 Systems
• ~200,000 cores, 2 Pflops, 20PB
• ~100 of research software packages
• ~200 experts in utilization of ARC for research

https://www.westgrid.ca/events/responding-to-canadas-research-computing-needs

New CC Systems

• UVic, GP1 (Cloud)

• SFU, GP2 (General Purpose)

• UW, GP3 (General Purpose)

• UofT, LP (Large Parallel)

Schedule of New CC Systems

Site/Service | Description | Availability | Resource
GP1 - UVic | Large OpenStack Cloud | Sept., 2016 | 3000 cores + 40% expansion (2017)
GP2 - SFU | General purpose cluster + Cloud partition | Feb., 2017 | 18,000 cores + 40% expansion (2017); 1923 GPU nodes
GP3 - Waterloo | Ditto | May, 2017 | 19,000 cores + 40% expansion (2017); 64 GPU nodes
LP - UToronto | Large parallel | Dec., 2017 | 66,000 cores
National Storage Infrastructure | HSM + Object Storage (All 4 sites) | Oct., 2016 | Dozens PBs; 10PB to start

https://www.computecanada.ca/renewing-canadas-advanced-research-computing-platform/new-systems-at-four-national-sites/

Continuing Development

• Consolidation by 2018
– 5-10 Data Centres
– 300,000 cores, 12 Pflops, 50+ PB

• 2016-17: Commissioning new systems while decommissioning old systems

CC New Organization Chart

https://staff.computecanada.ca/national_teams/chart

[Figure: Compute Canada organization chart. Box labels: TLC, SLC, Cloud, GP1, GP2, GP3, LP, MON, NW, PSNT, RS, Storage, SWG, EOT, VIZ, DH, Bio-M, SPNT, Bio-Info, SC, Administration]

CC Cloud Service

• Compute Canada currently operates two main cloud systems: Cloud West and Cloud East

Access CC clouds

• Cloud East: http://east.cloud.computecanada.ca

• Cloud West: http://west.cloud.computecanada.ca

• Can access with your CC account

CC Cloud Service

OwnCloud

• A Dropbox-like cloud storage service
– hosted by WestGrid

• Can access with WestGrid user/password

Globus Online

• High performance data transfer service
• https://globus.computecanada.ca

Globus Online

• Needs MyProxy authentication (WestGrid login/passwd)
• Can select existing endpoints (GridFTP service in sites)
• Can create your personal endpoint with “Globus Connect Personal”

Intro to WestGrid

About WestGrid

[Figure: map of WestGrid partner institutions. Labels: UBC, SFU, UVic, UNBC, TRU, UofA, UofC, ULeth, AU, Banff Center, USask, UofR, UofM, UofW, BU]

• WestGrid is one of four regional HPC consortia of Compute Canada

• WestGrid itself has 15 partner institutions across British Columbia, Alberta, Saskatchewan and Manitoba.

Overall Resources

2012/13

• To date, WestGrid has more than 40,000 compute cores and 9PB of storage space.

• About 1,000 Compute Canada users from 475 projects are currently using WestGrid systems.

Text and image source: Lindsay Sill, Intro to WestGrid 2013

* HQP stands for highly qualified personnel

Text and image source: Lindsay Sill, Intro to WestGrid 2013

WestGrid Staff

• Executive Director (Lindsay Sill)
• Director of Operations (Patrick Mann)
• Collaboration Coordinator
• Visualization Coordinator
• Site Leads
• Programmers
• System Analysts
• System Administrators

WestGrid Facilities, UofA (Jasper)

• Processors: 4160 cores
– 240 nodes with Xeon X5675 processors, 12 cores (2 x 6) and 24 GB of memory
– 160 nodes with Xeon L5420 processors, 8 cores (2 x 4) and 16 GB of memory
• Interconnect:
– Infiniband QDR, 40 Gbit/s, with a 1:1 blocking factor
– Infiniband DDR, 20 Gbit/s, with a 2:1 blocking factor
• Storage: ~830TB (356TB Lustre + 280 TB storage servers + 192TB IS10K)
• Quickstart: http://www.westgrid.ca/support/quickstart/jasper

WestGrid Facilities, UofA (Hungabee)

• Processors: 2048 cores
– Shared-memory multiprocessor comprising an SGI UV100 login node and an SGI UV1000 computational node, 16TB memory
• Interconnect: ccNUMA (cache-coherent non-uniform memory access), a combination of Intel's QuickPath and SGI's NUMAlink
• Storage: 53TB NFS, and 356TB Lustre shared with Jasper
• Quickstart: www.westgrid.ca/support/quickstart/hungabee

WestGrid Facilities, UBC (Orcinus)

• Processors: 9600 cores (3072 Intel Xeon E5450 quad-core/16GB RAM + 6528 Xeon X5650 six-core/24GB RAM)

• Storage: ~450TB, Lustre
• Quickstart: www.westgrid.ca/support/quickstart/orcinus

WestGrid Facilities, UofC (Breezy)

• Processors: 384 cores (16 node Appro AMD cluster with quad-socket, 6-core AMD Istanbul processors (24 cores @ 2.4 GHz) per node, 256GB RAM/node)

• Interconnect: 4X DDR InfiniBand
• Storage: ~450TB, IBRIX
• Quickstart: http://www.westgrid.ca/support/quickstart/breezy

WestGrid Facilities, UofC (Lattice)

• Processors: 4096 cores
– 512 nodes, each with 8 cores (Intel Xeon L5520) and 12 GB of memory
• Interconnect:
– InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s, 2:1 blocking
• Storage: 160 TB shared with Lattice and Breezy
• Quickstart: http://www.westgrid.ca/support/quickstart/lattice

WestGrid Facilities, UofC (Parallel)

• Processors: 7056 cores
– 528 12-core standard Xeon E5649 nodes, 24 GB of RAM
– 60 special nodes with 3 GPGPUs each (NVIDIA Tesla M2070s, 5.5 GB Memory)
• Interconnect:
– InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s, 2:1 blocking
• Storage: 160 TB shared with Lattice and Breezy
• Quickstart: http://www.westgrid.ca/support/quickstart/lattice

WestGrid Facilities, UM (Grex)

• Processors: 3792 cores (316-node SGI Altix XE cluster, with two 6-core Intel Xeon X5650 2.66GHz processors and 48-96GB RAM per node)
• Interconnect: Non-blocking Infiniband 4X QDR
• Storage: >100TB
• Quickstart: www.westgrid.ca/support/quickstart/glacier

WestGrid Facilities, UVic (Hermes/Nestor)

• Processors: 4416 cores [2112 (Hermes), 2304 (Nestor)]
– IBM iDataplex servers with eight 2.67-GHz Xeon x5550 cores and 24 GB of RAM
– Dell C6100 servers with twelve 2.66-GHz Xeon x5650 cores and 24 GB of RAM
• Interconnect:
– 84 Hermes nodes use two bonded Gigabit Ethernet links
– New Hermes: 4X QDR non-blocking, 32-40Gb/s
• Storage: 1.2PB, GPFS
• Quickstart: www.westgrid.ca/support/quickstart/hermes_nestor

WestGrid Facilities, SFU (Bugaboo)

• Processors: 4584 cores
– 16 nodes with Intel Xeon E5430 4-core processors, 16GB/node
– 254 nodes with Xeon X5650 6-core processors, 24GB/node
– 16 Xeon X5355 quad-core processor, 16GB/node
• Interconnect: Infiniband using a 288-port QLogic switch
• Storage: ~700TB
• Quickstart: www.westgrid.ca/support/quickstart/bugaboo

WestGrid Facilities, USask (Silo)

• Disk: 4.2 PB raw total, 3.15 PB usable
– 600 x 1TB SATA drives, RAID 6
– 1800 x 2TB SATA drives, RAID 6
• Tape: IBM LTO 3584 tape library
– ~3PB total, 1460 x LTO4 tapes, 920 x LTO5 tapes
• Backup System: IBM Tivoli Storage Manager (TSM)
• Quickstart: http://www.westgrid.ca/support/quickstart/silo
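
The Silo disk figures can be checked with a little arithmetic. The sketch below sums the raw capacity from the drive counts and then applies a RAID 6 overhead. The slide does not give the RAID group size, so the 8-drive group (6 data + 2 parity) used here is only an assumption; it happens to reproduce the 3.15 PB usable figure.

```python
# Drive inventory from the slide: (number of drives, size of each in TB)
DRIVES = [(600, 1), (1800, 2)]


def raw_capacity_tb(drives):
    """Sum of count * size over all drive types."""
    return sum(count * size_tb for count, size_tb in drives)


def raid6_usable_tb(raw_tb, group_size):
    """RAID 6 spends two drives' worth of space per group on parity."""
    return raw_tb * (group_size - 2) / group_size


raw = raw_capacity_tb(DRIVES)                  # 600*1 + 1800*2 TB
usable = raid6_usable_tb(raw, group_size=8)    # ASSUMED group size, not from the slide
print(f"raw: {raw / 1000} PB, usable: {usable / 1000} PB")
```

With a different group size the usable fraction changes, for example 10-drive groups would leave 80% of the raw space.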

Site Status

https://www.westgrid.ca/support/system_status

Use CC/WestGrid

• Apply for a CC/WestGrid account
• Get a Grid Certificate / Proxy
• Existing Resource Classification
• New Resource Allocation
• Software
• Site status
• Technical Support

CC/WestGrid Account

1. First, ask your PI to apply for a Compute Canada account if he/she doesn’t have one.

2. Then apply for your own Compute Canada account as part of your PI’s project.

https://www.westgrid.ca/support/accounts/getting_account

3. Your PI approves your application.

4. You apply for a consortium account, e.g. WestGrid, ACEnet.

Note: It takes a couple of days for your account to be created on all sites.

Grid Certificate

1. Log in to http://portal.westgrid.ca and “Request a Grid Certificate”.
2. On the “My Account” webpage, you will see two buttons for downloading your Grid certificate and private key.

Grid Proxy

• A Grid proxy is used for submitting Grid jobs or transferring files across the Grid. (Limited lifetime and limited privileges)

• Users just need to log in to any WestGrid site and then run:
– myproxy-logon

Resource Classification

Program Type | Sites
Serial | Bugaboo, Hermes, Jasper
Parallel | Bugaboo, Nestor, Orcinus, Lattice, Parallel, Jasper, Grex
SMP Parallel | Breezy, Hungabee
Large memory | Grex, Breezy, Hungabee
Visualization | Parallel
Gaussian | Grex
Matlab | Orcinus (distributed computing toolbox), Jasper/Hungabee (UofA license), etc.
Storage | Silo, Bugaboo

Software

• WestGrid has both free and commercial software.

• You can use the software packages installed on WestGrid
– check the software list webpage to see whether a given software release is already available on WestGrid

• Software list webpage: https://www.westgrid.ca/support/software

WestGrid support

If you have any questions, email [email protected]

New Resource Allocation

• RAC (Resource Allocation Competition)
– https://www.westgrid.ca/support/accounts/resource_allocations

• RAC = RPP + RRG
– RPP: Research Platforms and Portals (scientific/technical review needed)
– RRG: Resources for Research Groups (scientific/technical review needed)

• RAS: Rapid Access Service (formerly “Default Allocation”). No scientific/technical review needed

Email to: [email protected] / [email protected]

New RAC Schedule

Introduction to HPC

Outline

• What is HPC
• Capability vs. Capacity
• Programming model
– Serial/Parallel
• Architecture
– SMP/DSM/MPP, UMA/NUMA/COMA
• Interconnect
– PCI(E)/Infiniband/NUMALink
• Storage
– RAID, Multipathing, Data Bus
– DAS/NAS/SAN
– Parallel File Systems
• Evolution of Computing
– Mainframe, Cluster, Grid, Cloud, Big Data

What is HPC?

• High Performance Computing
– most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

Capability vs. Capacity

• Capability computing is typically thought of as using the maximum computing power to solve a single large problem in the shortest time.
– e.g. a real-time weather simulation and prediction application.

• Capacity computing, in contrast, is typically thought of as using many cost-effective computing resources to solve a large number of small problems or a small number of big problems.
– e.g. many users accessing a web service simultaneously, or
– analyzing a huge amount of HEP (high-energy physics) data: split it into many small pieces and distribute them across multiple cluster nodes.
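
The split-and-distribute pattern in the HEP example can be sketched as follows. This is a minimal illustration, not a real cluster job: worker threads stand in for cluster nodes, and `analyze` is a hypothetical per-piece analysis (a sum of squares) chosen only so the result is easy to check.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks nearly equal contiguous pieces."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)   # first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def analyze(chunk):
    """Hypothetical per-piece analysis: here, just a sum of squares."""
    return sum(x * x for x in chunk)


def run_capacity_job(data, n_workers=4):
    """Analyze each piece independently, then combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    # Worker threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyze, chunks))
    return sum(partial_results)
```

Because the pieces are independent, adding more workers increases throughput without any communication between them, which is what makes this a capacity workload.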

Spectrum

• Capability → Capacity

Hungabee
• Single system
• 2048 cores
• 16TB
• High-speed interconnect

Breezy
• 16 fat node cluster
• 256GB/node

Bugaboo
• 256+ node cluster
• 16-24GB/node

BlueGene/Q
• 4096 low power nodes
• 65536 processor cores

Architectures

• By processor
– SMP (Symmetric Multi-Processors)
– DSM (Distributed Shared Memory)
– MPP (Massively Parallel Processors)
• By memory
– UMA (Uniform Memory Access)
– NUMA (Non-Uniform Memory Access)
– COMA (Cache-Only Memory Architecture)

Evolution of Architectures

[Figure: evolution of architectures. Panel labels: Message Passing, UMA, COMA, NUMA]

Programming Model

• Serial
– Instructions are executed one after another on a single CPU.
• Parallel
– Computations are carried out concurrently on multiple processors.
– SPMD: single program multiple data
– MPMD: multiple programs multiple data
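
A minimal sketch of the SPMD idea: every process runs the same program and uses its rank to decide which part of the data it owns. The names `rank` and `size` follow common MPI usage, but no MPI library is used here; a plain loop simulates the processes, which is an assumption made purely for illustration.

```python
def spmd_program(rank, size, data):
    """The same program runs on every rank; only the data slice differs."""
    my_slice = data[rank::size]      # cyclic distribution: rank, rank+size, ...
    return sum(my_slice)             # local partial result


def run_spmd(data, size):
    """Simulate `size` ranks and combine their partial results."""
    # A real SPMD run would launch `size` processes; this loop only simulates them.
    partials = [spmd_program(rank, size, data) for rank in range(size)]
    return sum(partials)             # stands in for a reduction step
```

In an MPMD program, by contrast, different ranks would run different functions, for example one rank reading input while the others compute.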

Parallel Programming Paradigms/Tools

• Data Parallel
– HPF (High Performance Fortran)
• Task Parallel
– OpenMP (Open Multi-Processing)
• Message Passing
– PVM (Parallel Virtual Machine)
– MPI (Message Passing Interface): MPICH, Open MPI, etc.
• Hybrid (MPI+OpenMP, MPI+GPGPU)
• Advanced: Chapel, PGAS (Partitioned Global Address Space)

Interconnect

• PCI
• PCI Express
• Infiniband

• HyperTransport (AMD)
• QPI/Omni-Path (Intel)
• NUMAlink (SGI)

Serial vs. Parallel

• In the early days, serial connections were reliable but quite slow, so parallel connections were developed to send multiple pieces of data simultaneously.

• Later it turned out that parallel connections have their own problems: electromagnetic interference between wires.

• So the pendulum swung back to highly optimized serial connections.

Serial → Parallel → Serial

PCI/PCI-X

• PCI: Peripheral Component Interconnect (32bit)
• PCI-X: PCI-eXtended (64bit)

Image source: http://www.altera.com/products/ip/altera/t-alt-pci_soln.html

Electromagnetic interference and signal degradation are common in parallel connections, which slows the connection down. The additional bandwidth of the PCI-X bus means it can carry more data but generates even more noise.

PCI-Express

A single PCI Express lane can handle 200 MB/s. A 16X PCI-E connector can reach 6.4 GB/s.

• Instead of using parallel connections, PCI-E uses switch-controlled point-to-point serial connections.

• Every device has its own dedicated connection, so devices no longer share bandwidth like they do on a normal data bus.
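
The two bandwidth figures above can be related with simple arithmetic. The sketch below assumes that the 200 MB/s per-lane figure is for one direction and that the 6.4 GB/s total counts both directions; the slide does not say this, but it is the reading under which the two numbers agree.

```python
LANE_MB_PER_S = 200   # per lane; ASSUMED to be per direction


def pcie_bandwidth_gb_per_s(lanes, both_directions=False):
    """Aggregate bandwidth of a link with the given number of lanes."""
    directions = 2 if both_directions else 1
    return lanes * LANE_MB_PER_S * directions / 1000


one_way = pcie_bandwidth_gb_per_s(16)                         # 16 lanes, one direction
two_way = pcie_bandwidth_gb_per_s(16, both_directions=True)   # 16 lanes, both directions
print(f"x16 one way: {one_way} GB/s, both ways: {two_way} GB/s")
```

Because each device has its own dedicated lanes, this bandwidth is not divided among devices the way it is on a shared bus.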

Image source: http://computer.howstuffworks.com/pci-express2.htm

Infiniband

Image source: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Fall_2007/wiki4_001_a1

Infiniband

• The internal connections in most computers are inflexible and relatively slow.
• As I/O increases, the existing bus system becomes a bottleneck.
• Infiniband instead creates channels through InfiniBand switches to connect hosts (HCAs, host channel adapters) and I/O targets (TCAs, target channel adapters).
• Instead of sending data in parallel across the backplane bus, Infiniband specifies a serial bus.
– The serial bus can also carry multiple channels of data at the same time in a multiplexed signal.

Infiniband theoretical throughput in Gb/s
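
The theoretical rates for the SDR, DDR and QDR generations can be derived from the per-lane signaling rate and the link width. The sketch below uses the commonly quoted per-lane rates (2.5, 5 and 10 Gb/s) and the 8b/10b line coding of those generations; these values are not taken from the slide, although the resulting 4X figures match the 20 and 40 Gbit/s quoted for the WestGrid systems.

```python
# Per-lane signaling rates in Gb/s for the first three InfiniBand generations
SIGNALING_GBPS_PER_LANE = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
LINK_WIDTHS = (1, 4, 12)            # 1X, 4X and 12X links
ENCODING_EFFICIENCY = 8 / 10        # 8b/10b: 8 data bits per 10 bits on the wire


def signaling_rate(generation, width):
    """Raw signaling rate of a link in Gb/s."""
    return SIGNALING_GBPS_PER_LANE[generation] * width


def data_rate(generation, width):
    """Rate available for data after removing the 8b/10b coding overhead."""
    return signaling_rate(generation, width) * ENCODING_EFFICIENCY


for gen in SIGNALING_GBPS_PER_LANE:
    row = ", ".join(
        f"{w}X: {signaling_rate(gen, w):g} ({data_rate(gen, w):g} data)"
        for w in LINK_WIDTHS
    )
    print(f"{gen}  {row}")
```

The gap between signaling rate and data rate is why a 4X QDR link is sometimes described as 32-40 Gb/s.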

Infiniband vs. PCI/PCI-Express

http://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120.pdf

Storage

• Storage Protocol
• I/O Bus
– Serial vs. Parallel
• Redundancy
– RAID (Redundant Array of Inexpensive Disks)
– Multipathing (Redundant physical paths)
• Storage Attaching Approaches
– DAS (Direct Attached Storage)
– NAS (Network Attached Storage)
– SAN (Storage Area Network)

Storage Protocol

• CIFS/SMB (Common Internet File System / Server Message Block)
– application-layer network protocol mainly used to provide shared access to files, printers, etc. between nodes
• NFS (Network File System)
– application-layer network protocol that only allows access to files, over an Ethernet network
• SCSI/iSCSI (Internet Small Computer System Interface)
– iSCSI is a mapping of the regular SCSI protocol over TCP/IP
• FC (Fibre Channel)
– transport protocol which mainly transports SCSI commands over Fibre Channel networks
• FCoE (Fibre Channel over Ethernet)
– allows Fibre Channel to use 10 Gigabit Ethernet networks (or higher speeds) while preserving the Fibre Channel protocol

I/O Bus

• ATA → SATA
– ATA (Advanced Technology Attachment)
– SATA (Serial ATA)
• SCSI → SAS
– SCSI (Small Computer System Interface)
– SAS (Serial Attached SCSI)

Parallel → Serial
• synchronization
• electromagnetic interference
• cost

http://www.denali.com/wordpress/index.php/dmr/2010/02/02/ssd-interfaces-and-performance-effects

RAID

Redundant Array of Independent Disks

• RAID 0: Striping, without parity or mirroring.
• RAID 1: Mirroring, without parity or striping.
• RAID 2: Bit-level striping with dedicated parity.
• RAID 3: Byte-level striping with dedicated parity.
• RAID 4: Block-level striping with dedicated parity.
• RAID 5: Striping with single distributed parity.
• RAID 6: Block-level striping with double distributed parity.
• Nested RAID: RAID 10, RAID 50, RAID 60, etc.
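
How single parity protects a stripe can be shown with XOR, the operation commonly used for RAID 5 style parity. The sketch below is a toy, in-memory model of one stripe with a parity block; the function names are invented for illustration, and it does not model how parity is rotated across disks or how a real controller works.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (single parity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover one missing block by XOR-ing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
recovered = rebuild(stripe, lost_index=1)   # pretend the second disk failed
```

Single parity can rebuild only one lost block per stripe; RAID 6 adds a second, independent parity so that two failures can be survived.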

Example: RAID 0, 1, 5

Example: Nested RAID 10, 50

Comparison

http://www.techwarelabs.com/10-things-to-consider-before-setting-up-raid/

Multipathing

• Multipath I/O
– Is a fault-tolerance and performance enhancement technique.
– Creates multiple logical paths between the server and the storage devices (via adapters, cables, switches, etc.).
– In the event that one path fails, multipathing uses an alternate path so that applications can continue to access their data.
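
The failover behaviour can be sketched in a few lines. This is only an illustration of the idea, not how DM-Multipath is implemented: the path names (`hba0`, `hba1`), the `PathFailed` exception and the read functions are all hypothetical.

```python
class PathFailed(Exception):
    """Raised by a path that cannot reach the storage device."""


def read_with_failover(paths, block_id):
    """Try each (name, read_fn) path in order; return the first successful read."""
    errors = []
    for name, read_fn in paths:
        try:
            return name, read_fn(block_id)
        except PathFailed as exc:
            errors.append(f"{name}: {exc}")   # remember the failure, try the next path
    raise PathFailed("all paths failed: " + "; ".join(errors))


# Two hypothetical paths to the same device; the first one is broken.
def broken_path(block_id):
    raise PathFailed("link down")


def healthy_path(block_id):
    return f"data-for-block-{block_id}"


used, data = read_with_failover([("hba0", broken_path), ("hba1", healthy_path)], 7)
```

The caller receives its data without needing to know that the first path failed, which is the property that keeps applications running.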

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/DM_Multipath/

DAS/NAS/SAN

http://abdullrhmanfarram.wordpress.com/2013/04/08/storage-technologies-das-nas-and-san/

DAS
• Storage directly attached
• High cost of management
• Inflexible
• Expensive to scale

NAS
• Storage access through Ethernet
• Scalable and flexible

SAN
• Storage access through FC/IB
• Much better performance
• More flexible and scalable
• Increases data availability

Parallel File System

• Distributes data across multiple storage nodes, accessed via a high-speed network.

• Supports concurrent (often coordinated) access from many clients.

• Provides globally shared metadata (locations, file names, sizes, etc.).
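
The way a parallel file system spreads one file over several storage nodes can be sketched as round-robin striping plus a small metadata record. This is a toy in-memory model under assumed names (`stripe_write`, `stripe_read`); real systems such as Lustre add networking, locking and much richer metadata.

```python
def stripe_write(data, n_nodes, stripe_size):
    """Distribute data round-robin across n_nodes in stripe_size-byte units."""
    nodes = [[] for _ in range(n_nodes)]
    for offset in range(0, len(data), stripe_size):
        stripe_index = offset // stripe_size
        nodes[stripe_index % n_nodes].append(data[offset:offset + stripe_size])
    # The shared metadata is all a client needs to locate every stripe.
    metadata = {"size": len(data), "stripe_size": stripe_size, "n_nodes": n_nodes}
    return nodes, metadata


def stripe_read(nodes, metadata):
    """Reassemble the file using the metadata and the node contents."""
    n_stripes = -(-metadata["size"] // metadata["stripe_size"])   # ceiling division
    parts = [
        nodes[k % metadata["n_nodes"]][k // metadata["n_nodes"]]
        for k in range(n_stripes)
    ]
    return b"".join(parts)
```

Since consecutive stripes live on different nodes, several clients (or one client reading a large range) can pull data from all nodes at once.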

Parallel File Systems

• Lustre
• GPFS
• Panasas
• NFSv4??

Parallel File Systems

• Lustre
• GlusterFS
• OrangeFS
• GPFS
• IBRIX
• CXFS
• Panasas
• PVFS2
• pNFS (NFSv4.1)
• GoogleFS
• Ceph

Example: Lustre

Image source: http://wiki.lustre.org/manual/LustreManual18_HTML/figures/LustreArch.png

Object storage

• Object storage appears as a collection of objects.
• An object typically includes not only the data itself but also extra information such as metadata, an object ID (OID), attributes, etc.
• It moves lower-level functions such as space management and security into the storage device itself, and the device is accessed through a standard object interface.
• Especially good for storing unstructured data such as photos, songs, etc.
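
A minimal sketch of the object model described above: each object bundles data, metadata and an OID, and the store exposes a flat put/get interface instead of paths or block addresses. The class names are invented for illustration, and deriving the OID from a SHA-256 hash of the content is an assumption of this sketch, not something the slides specify.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: str                                  # object ID used to find the object
    data: bytes                               # the payload itself
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Flat namespace: objects are found by OID, not by path or block address."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # Content-derived OID (an assumption of this sketch).
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = StoredObject(oid, data, dict(metadata))
        return oid

    def get(self, oid):
        return self._objects[oid]


store = ObjectStore()
photo_oid = store.put(b"...jpeg bytes...", content_type="image/jpeg", owner="alice")
```

The caller never decides where the bytes are placed; space management stays behind the put/get interface.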

Block Storage Object Storage

Comparison of 3 storage types

Examples: NFS and SMB/CIFS (file), Fibre Channel/iSCSI (block), AWS S3 (object)

https://insights.ubuntu.com/2015/05/18/what-are-the-different-types-of-storage-block-object-and-file/

Comparison of 3 storage types

http://blog.sungardas.com/CTOLabs/2015/10/object-storage-the-alternator-of-cloud-computing/

Page 73:

Evolution of Computing

• Mainframe: super power
• Cluster: worker bees
• Grid: global orchestration
• Cloud: everything as a service
• Big Data: finding a needle in the sea

Page 74:

Evolution of Computing

• Mainframe
– Single machine
– Shared memory
• PC/Cluster
– Multiple nodes
– Batch job scheduling
– Parallel computing
• Grid
– Multiple sites (geographically distributed)
– Global scheduling
– Virtualized organization
– Transparent data access
– Unified security infrastructure
• Cloud
– Virtualized resources
– Elastic computing
– Build everything as a service!
• Big Data
– Big volume
– Big variety
– Big velocity
– Fast analysis/decision

Image and text source: http://www.wikipedia.org/

Page 75:

Mainframe

• Originally referred to the large cabinets that housed a powerful CPU and shared memory.
• Modern design:
– Redundant internal engineering, resulting in high reliability and security
– Extensive I/O facilities
– High utilization rates
– Uses virtualization technology to support massive throughput

Amdahl 470V/6

Page 76:

Cluster

• Tightly connected computers that work together as a single system
– Low cost
– Scalability
– Flexibility
• Batch job scheduling/management
• Parallel computing
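Batch job scheduling on a cluster can be illustrated with a toy first-in, first-out placement loop. This is a minimal sketch, not how any particular scheduler works: `fifo_schedule`, the job names, and the node names are all hypothetical, and production schedulers typically add priorities, fair share, and backfill on top of this idea.

```python
from collections import deque


def fifo_schedule(jobs, free_cores):
    """Place jobs on nodes in submission order.

    jobs: list of (job_name, cores_needed), oldest submission first.
    free_cores: dict mapping node name -> number of free cores.
    Returns (placements, still_queued).
    """
    free = dict(free_cores)          # copy so the caller's dict is untouched
    queue = deque(jobs)
    placements = {}
    while queue:
        name, cores = queue[0]
        # Pick the first node with enough free cores, or None if there is none.
        node = next((n for n, c in free.items() if c >= cores), None)
        if node is None:
            break                    # strict FIFO: the head job blocks everything behind it
        free[node] -= cores
        placements[name] = node
        queue.popleft()
    return placements, [name for name, _ in queue]


placements, waiting = fifo_schedule(
    jobs=[("jobA", 4), ("jobB", 8), ("jobC", 6), ("jobD", 2)],
    free_cores={"node1": 8, "node2": 8},
)
print(placements)   # {'jobA': 'node1', 'jobB': 'node2'}
print(waiting)      # ['jobC', 'jobD']
```

Note that `jobD` would fit in the 4 cores left on `node1` but still waits behind `jobC`; letting such small jobs run ahead is what backfill policies are designed for.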

Page 77:

Grid

• Grid computing is the coordination of massive computing resources from multiple locations to reach a common goal. The resources are:
– Loosely coupled
– Heterogeneous
– Geographically dispersed
– Dynamic
• Main features:
– High-level scheduling/workload management
– Unified security infrastructure
– Global information system
– Virtualized organization
– Transparent data transfer interface

Page 78:

Example: WLCG (Worldwide LHC Computing Grid)

Page 79:

Cloud

• Initially
– IaaS (Infrastructure as a Service)
– PaaS (Platform as a Service)
– SaaS (Software as a Service)
• Subsequently
– HaaS (Hardware as a Service)
– NaaS (Network as a Service)
– DaaS (Database as a Service)
– CaaS (Communication as a Service)
– BPaaS (Business Process as a Service)
• Eventually
– XaaS (Everything as a Service!)

Page 80:

Image source: www.telezent.com

Page 81:


Image source: http://blueatoll.com/blog/the-next-generation-enterprise-business-as-a-service-in-the-cloud/

Page 82:

Big Data

• What is Big Data?
– Refers to technologies for handling data that is too diverse, fast-changing, or massive for conventional technologies to address efficiently.
– Today, new technologies make it possible to realize value from Big Data.

Page 83:

Big Data’s Four V’s

http://www.ibmbigdatahub.com/blog/how-big-data-and-cognitive-computing-are-transforming-insurance-part-2

Page 84:

Big Data: Core Technology

• Foundation stone
– Google (GFS, MapReduce, Bigtable)
• Free version
– Apache (HDFS, YARN, HBase, Hive, Pig…)

Page 85:

Big Data: MapReduce

Image source: http://www.slideshare.net/tothc/introduction-to-hadoop-and-map-reduce
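The MapReduce pattern can be sketched with the classic word-count example. This is a single-process illustration only: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions made for this sketch, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data big clusters", "data moves to clusters"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each `map_phase` call touches only its own document and each `reduce_phase` call touches only one key, both steps can be distributed without coordination beyond the shuffle.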

Page 86:

Big Data: Evolution

• New Troika
– Google (Dremel, Pregel, Caffeine)
• Free version
– Apache Drill, Apache Giraph, Stanford GPS

Image source: http://blog.mikiobraun.de/2013/02/big-data-beyond-map-reduce-googles-papers.html

Page 87:

Example: MapReduce vs. Dremel

Query: SELECT SUM(CountWords(txtField)) / COUNT(*) FROM T1
(T1: 85 billion records, 87 TB, 3000 nodes)
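To make clear what the query computes, here is a small single-process Python equivalent: the average number of words per record in `txtField`. The three sample records are made up for illustration, and `count_words` assumes that `CountWords` counts whitespace-separated tokens, which the slide does not specify.

```python
def count_words(text):
    """Assumed semantics of CountWords: number of whitespace-separated tokens."""
    return len(text.split())


def average_words(records):
    """Equivalent of SUM(CountWords(txtField)) / COUNT(*) over a list of records."""
    total_words = sum(count_words(r["txtField"]) for r in records)
    return total_words / len(records)


# Hypothetical stand-in for table T1 (the real one holds 85 billion records).
t1 = [
    {"txtField": "high performance computing"},
    {"txtField": "big data"},
    {"txtField": "parallel file systems scale out"},
]
print(average_words(t1))
```

Both the sum and the count decompose into per-partition partial results that can be combined afterwards, so the same aggregate can be spread over thousands of nodes.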

Image source: http://www.cubrid.org/blog/dev-platform/meet-impala-open-source-real-time-sql-querying-on-hadoop/

Page 88:

Big Data Ecosystem

Page 89:

Summary

• Introduced Compute Canada and its consortia
• Introduced WestGrid and its member sites
• Introduced high performance computing from different angles, such as architecture, memory, interconnect, storage, and file systems
• Briefly reviewed the evolution of computing technologies from mainframe, cluster, grid, and cloud to the current hot topic: Big Data

Page 90:

Follow-up Talks

• Sept. 15, 2016: Tips for Submitting Jobs & Moving Data (with hands-on session)
– Masao Fujinaga
• Sept. 27, 2016: Scheduling & Job Management (with hands-on session)
– Kamil Marcinkowski

See more details at: https://www.westgrid.ca/events

Page 91:

Thanks! Questions?