ANL Site Update ScicomP/SPXXL Summer Meeting 2016 Ray Loy Ben Allen Gordon McPheeters Jack O'Connell William Scullin Tim Williams

Source: spscicomp.org/.../2016/05/loy-anl-site-update-2016-v1.4.pdf

Jun 26, 2020

ANL Site Update ScicomP/SPXXL Summer Meeting 2016

Ray Loy, Ben Allen, Gordon McPheeters, Jack O'Connell, William Scullin, Tim Williams

Site Update
- System overview
- Support topics
- GPFS AFM Burst Buffer
- GHI deployment
- Performance Portability

ALCF Resources

ALCF Roadmap
- CORAL
  - Theta: interim system (late 2016)
    - Cray/Intel KNL, >= 2500 nodes, >= 60 cores/node (8.5 PF)
    - Peak comparable to Mira
  - Aurora: late 2018
    - ALCF-3, successor to Mira
    - Cray/Intel KNH, >= 50K nodes
- Blue Gene will continue to be a primary resource through the installation of Aurora
  - Both HW and SW considerations for extended life

Support/Requirements

GPFS on BG/Q
- Current: ESS 3.5.x @ GPFS 4.1.1-2; DDNs, BG/Q clients @ GPFS 3.5
  - Cannot upgrade ESS without BG/Q GPFS >= 4.1
  - End-of-service for GPFS 3.5 coming up in 2017
  - BG/Q will continue in service well beyond that
- Support for GPFS 4.1 (or 4.2) on BG/Q IONs would be very helpful
  - Concern about whether fixes for GPFS and AFM at GPFS 4.2 will continue to be back-ported to GPFS 4.1.1 in a timely manner
- ANL supports SPXXL 2016 requirement (Ticket 21)

Compilers
- IBM XL Compilers for Blue Gene/Q
  - XL Fortran 14.1
    - Current BG/Q version: August 2015
    - AIX, Linux last update: April/May 2016
  - XL C/C++ 12.1
    - Current BG/Q version: August 2015
    - AIX, Linux last update: April 2016
- Hoping for BG/Q update

PMRs
- PMR 74893
  - getdents/getdents64 alias conflict with 4.7.2 toolchain [Apr 2016]
  - Success: fixed in V1R2M4
- PMR 30358
  - No way to list contents of node's /dev/shmem or /dev/persistent from a CNK program
  - (Can only access from off-node using CDTI)
  - Negotiating with IBM on a solution (stalled on ANL)

Implementing GPFS AFM as a Burst Buffer

ALCF Operational File System Configuration
- There are three main production-level GPFS file systems:
  - mira-fs0: 19 PiB
  - mira-fs1: 7 PiB
  - mira-home: 1.1 PiB
- These are based on DDN 12K-E (embedded NSD servers) storage.
- These file systems are mounted on all the BG I/O servers, a Cray visualization cluster, Globus DTN nodes, and HPSS data movers.
- Symlinks to the project filesets are used to mask the underlying file system from the users. /project contains links to mira-fs0 and mira-fs1 filesets.

The big picture

[Figure: three-panel diagram of the file system layout]
- Current: fs0 (16 DDN; 21 PB, 240 GB/s; A, C, D, G, H) and fs1 (6 DDN; 7 PB, 90 GB/s; B, E, F)
- Step 1, AFM: fs2 (30 ESS; 8 PB, 400 GB/s; A, B, C, D, E, F, G, H) shown with fs0 and fs1
- Step 2, GHI: fs2, fs0, and fs1 as above, plus HPSS via GHI

ESS 50,000 foot view
- Terminology:
  - AFM: Active File Management
  - cache: fs2, non-permanent location
  - home: fs0, fs1, permanent location; not /home, which is confusing
- fs2 is a cache. Like any other cache, files can be evicted; AFM ensures there is a good copy in home before eviction.
  - Eviction is policy driven. Our policies will probably be fairly simple (high water / low water and LRU), but almost any file system parameter can be checked.
- AFM is “constantly” syncing files between the cache and home
  - Default is every 15 seconds, but is alterable
  - Basically “replays” events from the cache on home (metadata updates, block changes, etc.)
  - We can influence priorities of new writes vs. replication with tuning parameters, but true QoS is not here yet (it is on the roadmap)
- Fundamental assumption: our file systems are big enough that recalls will be rare
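
The high water / low water plus LRU eviction idea on this slide can be sketched in a few lines of C. This is only an illustration of the general technique, not AFM's policy engine or its configuration syntax: the `cache_file` record, the `evict_lru` and `percent_of` names, and the percentage parameters are all hypothetical, and the sketch assumes every file already has a good copy in home.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one file held in the cache (not an AFM structure). */
typedef struct {
    uint64_t size;         /* space used in the cache, in any consistent unit */
    uint64_t last_access;  /* larger value = accessed more recently */
    int      evicted;      /* set to 1 once the file is chosen for eviction */
} cache_file;

/* Floor of pct percent of total, computed without overflowing 64 bits. */
static uint64_t percent_of(uint64_t total, unsigned pct)
{
    return total / 100 * pct + total % 100 * pct / 100;
}

/* qsort comparator: least recently used first. */
static int by_last_access(const void *a, const void *b)
{
    const cache_file *fa = *(cache_file *const *)a;
    const cache_file *fb = *(cache_file *const *)b;
    return (fa->last_access > fb->last_access) -
           (fa->last_access < fb->last_access);
}

/*
 * If usage has reached high_pct of capacity, mark least-recently-used files
 * as evicted until usage is at or below low_pct. Returns the space freed.
 * Assumption: every file already has a good copy in home.
 */
uint64_t evict_lru(cache_file *files, size_t n, uint64_t capacity,
                   unsigned high_pct, unsigned low_pct)
{
    assert(low_pct <= high_pct && high_pct <= 100);

    uint64_t used = 0;
    for (size_t i = 0; i < n; i++)
        if (!files[i].evicted)
            used += files[i].size;

    if (n == 0 || used < percent_of(capacity, high_pct))
        return 0;  /* below the high water mark: nothing to do */

    cache_file **order = malloc(n * sizeof *order);
    if (order == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        order[i] = &files[i];
    qsort(order, n, sizeof *order, by_last_access);

    const uint64_t low = percent_of(capacity, low_pct);
    uint64_t freed = 0;
    for (size_t i = 0; i < n && used > low; i++) {
        if (order[i]->evicted)
            continue;
        order[i]->evicted = 1;
        used  -= order[i]->size;
        freed += order[i]->size;
    }
    free(order);
    return freed;
}
```

Calling `evict_lru(files, n, capacity, 90, 80)` would start evicting once the cache is 90% full and stop at 80%; the thresholds here are placeholders, not values from the slides.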

Why?
- Overall, better experience for both the users and the admins
- No single point of failure
  - We got this by adding fs1 to the existing fs0 file system
- All users see the same performance
  - Bandwidth is not uniform between fs0 and fs1, but fs2 and AFM give it back
  - A cache miss might cause run-to-run variation, but all users are equally subject to this and, given our cache size, the assumption is this will be rare.
- Minimal user/admin intervention required
  - Ideally policies will do the right thing at the right time
    - Example: project data will automatically disappear, via GHI, after a project completes; storage team doesn’t need to do anything.
  - Should trivially (single command) be able to cause the right behavior
    - Probably based on extended attribute and policy
- Minimal (ideally zero) manual copying of files
  - All handled by the file system

Implementation Overview
- Current state: internal testing
  - 2 ESS node pairs deleted from ESS and used to form a test “home” cluster. File system called: fm-fs1.
- Using AFM mode that allows for a GPFS home file system.
- mira-fs0 and mira-fs1 will be remotely mounted to ESS system
  - mira-home will not be AFM managed
- Home file system defined as: gpfs:///
  - Null server list, which signifies remotely mounted to cache cluster

Implementation Challenges
- This cluster was originally based on the GSS product, which was xSeries (Intel) based.
  - Cluster was migrated from xSeries to pSeries (ESS) to address support concerns.
- Kernel panics in mpt2sas kernel module
  - Red Hat bug: 1259907 (bug) / 1318560 (back port to 7.1 zStream)
  - Difficult PD/recreate path. Much of the work was done by ALCF dealing directly with Red Hat and Avago. Up to 3 weeks between recreates.
  - Fixed in RH 7.2 stream, but since ESS won’t support RH 7.2 in calendar year 2016 we needed to ask Red Hat for a port to RH 7.1 zStream. This fix is under final tests now at Red Hat.
  - As soon as available this will be incorporated in our ESS cluster.
  - Fixed in: kernel-3.10.0-229.33.1.el7

Implementation Challenges
- Quotas:
  - No communication between home and cache with respect to quota settings.
  - Data is synced to home as root, and root does not error on over quota
  - Found best to turn off auto-migration on cache filesets.
    - Prevents over-running home cache hard limits
- Bug found: after a file was evicted and subsequently re-read from the cache cluster, it was not becoming resident again in cache. Efix under construction.

GHI deployment

File System Backup Overview

- Mira-home is intended to store executable files and configuration files.
- User quota limits are enforced.
- Data is fully protected through GPFS metadata and data replication, GPFS snapshots, and nightly backups.

- Mira-fs0 and mira-fs1 are intended as intermediate-term storage for Mira/Cetus job output such as checkpoint datasets.
- Project-associated fileset quota limits are enforced.
- Only metadata replication is enabled.
- The user is responsible for archiving their own data.
- After project expiration, quota limits are reduced, data is archived, and the fileset is removed.

Current archiving (fs0, fs1)
- Users manually archive files via HSI or Globus/GridFTP
- When user project expires, script invokes HSI
  - Copy to tape, shrink disk quota
  - 90 days after expiration
    - unmount (but still on disk)
  - 180 days
    - delete from disk, copy retained on tape
- No other system-related archiving
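
The expiration timeline above maps naturally onto a small state function. The sketch below is illustrative only: the `project_state` enum and the function names are hypothetical, and it assumes the tape copy happens on the day of expiration and that the 90-day and 180-day boundaries are inclusive, which the slide does not state.

```c
#include <assert.h>

/* Hypothetical lifecycle states for an expired project's fileset. */
typedef enum {
    PROJECT_ACTIVE,     /* project has not expired yet */
    PROJECT_ARCHIVED,   /* copied to tape, disk quota shrunk */
    PROJECT_UNMOUNTED,  /* unmounted, data still on disk */
    PROJECT_DELETED     /* deleted from disk, copy retained on tape */
} project_state;

/* Map days since project expiration (negative = not yet expired) to a state. */
project_state archive_state(long days_since_expiration)
{
    if (days_since_expiration < 0)
        return PROJECT_ACTIVE;
    if (days_since_expiration < 90)
        return PROJECT_ARCHIVED;
    if (days_since_expiration < 180)
        return PROJECT_UNMOUNTED;
    return PROJECT_DELETED;
}

/* Human-readable name for a state, e.g. for a status report. */
const char *project_state_name(project_state s)
{
    switch (s) {
    case PROJECT_ACTIVE:    return "active";
    case PROJECT_ARCHIVED:  return "archived";
    case PROJECT_UNMOUNTED: return "unmounted";
    case PROJECT_DELETED:   return "deleted";
    }
    assert(!"unknown project_state");
    return "unknown";
}
```

A nightly script could call `archive_state` per project and act only when the state differs from the one recorded on the previous run.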

GHI value added
- Instead of scripting archiving directly, generate GPFS policy files to tell GHI when/how to migrate expired projects
- Improved disaster recovery
  - Migrate everything (leave in place on disk)
  - Optional: implement threshold policy, e.g. when fs reaches 90% start migration down to 80% (punch holes in files leaving metadata)
  - GHI image backup: create image of the entire fs only as "punched out" files (metadata)
    - Initial restore of entire fs quickly, retrieve remaining contents on demand until caught up
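
To put numbers on the optional threshold policy, the arithmetic can be written as a short C helper. This is a sketch of the calculation only, not GHI or GPFS policy syntax; `migration_amount` and `percent_of` are hypothetical names, and the 90%/80% values are the example from the slide passed in as parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of pct percent of total, computed without overflowing 64 bits. */
static uint64_t percent_of(uint64_t total, unsigned pct)
{
    return total / 100 * pct + total % 100 * pct / 100;
}

/*
 * Amount of data that must be migrated (holes punched) to bring a file
 * system from its current usage down to stop_pct, once usage has reached
 * start_pct. Returns 0 while usage is below start_pct. Units are whatever
 * the caller uses consistently for capacity and used.
 */
uint64_t migration_amount(uint64_t capacity, uint64_t used,
                          unsigned start_pct, unsigned stop_pct)
{
    assert(stop_pct <= start_pct && start_pct <= 100);
    assert(used <= capacity);

    if (used < percent_of(capacity, start_pct))
        return 0;
    return used - percent_of(capacity, stop_pct);
}
```

Purely as an illustration, taking the 19 PiB quoted for mira-fs0 (19456 TiB) and a usage of exactly 90% (17510 TiB), `migration_amount(19456, 17510, 90, 80)` gives 1946 TiB to migrate.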

GHI Deployment Status
- In progress (~6 mo)
  - GHI installed and enabled on a test fs
- Some delays due to PMR (snapshots not deleted, deletion times out due to other processes)
  - Other sites report issue resolved
- Expecting deployment this year

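High Level GHI Integration View

[Figure: diagram with these labeled components: HPSS (disk cache, tape, two data movers) and the GPFS cluster (GPFS NSD nodes, GHI session node, GHI IOM nodes, DDN storage). Link types labeled: 1 GbE, 10 GbE, QDR IB, FC, 4x 6 Gb/s SAS.]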

Portability/Interoperability

Coordinating SC Centers Efforts

- Meetings of cross-labs applications readiness staff
- Tools and libraries working group (W. Joubert)
- Cross-lab training committee (F. Foerttner)
  - Shared calendar
  - Shared training events
- Manage nondisclosure, export control challenges
  - CORAL partners, APEX partners (NERSC & Trinity)
- Portability

March 2014 Meeting
- Apps readiness coordination
- ~15 representatives

September 2014 Meeting
- Apps portability
- Apps readiness coordination
- ~25 representatives

January 2015 Meeting
- Next-gen hardware & software
- Portable programming
- ~40 center staff
- ~10 vendor reps

OpenMP 4.5 – Source Code Portability

- Host (CPU)
- Device (accelerator)
  - GPGPU
  - Intel MIC
  - omp_get_num_devices()
- map: maybe shared location
- simd: standard vectorization
- #ifdef MANYCORE #else GPU #endif <code>

    double x[128], y[128];
    #pragma omp target data map(to: x[0:64]) map(tofrom: y[0:64])
    {
        #pragma omp target
        {
            // y computed on device
        }
    }

    double x[128], y[128];
    #pragma omp for simd aligned(x, y: 32)
    for (int i = 0; i < 128; i++) {
        // thread's iterations -> SIMD lanes
    }

    #pragma omp target data map(to: x)
    {
        #pragma omp target map(tofrom: y)
        #pragma omp teams
        // distribute and parallel for are combined so the directive sits directly on the loop
        #pragma omp distribute parallel for
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
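
The fragments above can be assembled into a self-contained routine along the following lines. This is a minimal sketch rather than code from the talk: the function names `axpy` and `available_devices` are hypothetical, the pointer arguments are mapped with explicit array sections, and the `_OPENMP` guard is an added assumption so the file also builds as plain serial C when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Number of offload devices the OpenMP runtime reports.
 * Returns 0 when the file is compiled without OpenMP support.
 */
int available_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/*
 * y = a*x + y over n elements.
 * With an offloading compiler the loop runs on the default device;
 * with no device present the target region falls back to the host;
 * without OpenMP the pragma is ignored and the loop runs serially.
 */
void axpy(int n, double a, const double *x, double *y)
{
    assert(n >= 0);
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source is meant to serve host-only, many-core, and GPU builds, which is the source code portability point of the slide; whether it performs well on each target is a separate question.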

Performance Portability

- When directives-based approach performs: OpenMP 4.5
- Use libraries or frameworks to encapsulate node-level parallelism

[Figure: diagram with these labels: OpenMP 4.0, Libraries, App Modules; CUDA, vector intrinsics, shmem; Charm++, ADLB, PETSc, Chombo, Trilinos, MADNESS, Kokkos, RAJA]

Effort Continuing
- Coordination between ESP, CAAR, NESAP.
  - ALCC allocation at ANL, OLCF, NESAP for portability work
  - Shared training including live/videocon
- Bringing NNSA labs into the discussion
  - HPCOR (9/2015), COEPP (4/2016)
- SC15 Workshop on Portability Among HPC Architectures for Scientific Applications
- New effort on kernels/mini-apps: optimize on 2 platforms, then rework with OMP 4.5 or OpenACC, compare
  - NekBone (ALCF), BoxLib (NERSC), DSL-based library for MD (OLCF)

Training - ATPESC
- Argonne Training Program on Extreme Scale Computing
- Intensive 2-week course held off-site
  - Audience: doctoral students, postdocs, and computational scientists
  - Presenters are leaders in all major areas of HPC
- Planning in progress for the 4th session in August 2016
- http://extremecomputingtraining.anl.gov

The End