Introduction to HPC2N
Birgitte Brydsø
HPC2N, Umeå University
3 December 2019
1 / 23
Kebnekaise
1. 602 nodes / 19288 cores (of which 2448 are KNL)
   432 Intel Xeon E5-2690v4, 2x14 cores, 128 GB/node
   52 Intel Xeon Gold 6132, 2x14 cores, 192 GB/node
   20 Intel Xeon E7-8860v4, 4x18 cores, 3072 GB/node
   32 Intel Xeon E5-2690v4, 2x NVidia K80, 2x14, 2x4992, 128 GB/node
   4 Intel Xeon E5-2690v4, 4x NVidia K80, 2x14, 4x4992, 128 GB/node
   10 Intel Xeon Gold 6132, 2x NVidia V100, 2x14, 2x5120, 192 GB/node
   36 Intel Xeon Phi 7250, 68 cores, 192 GB/node, 16 GB MCDRAM/node
2. 501760 CUDA "cores" (80*4992 cores/K80 + 20*5120 cores/V100)
3. More than 136 TB memory
4. Interconnect: Mellanox FDR / EDR Infiniband
5. Theoretical performance: 728 TF (+ expansion)
6. Date installed: Fall 2016 / Spring 2017 / Spring 2018
2 / 23
Using Kebnekaise: Connecting to HPC2N's systems
Linux, Windows, macOS/OS X: Install the ThinLinc client
Linux, OS X:
  ssh <username>@kebnekaise.hpc2n.umu.se
  Use ssh -Y ... if you want to open graphical displays.
Windows:
  Get an SSH client (MobaXterm, PuTTY, Cygwin ...)
  Get an X11 server if you need graphical displays (Xming, ...)
  Start the client and login with your HPC2N username to
  kebnekaise.hpc2n.umu.se
More information here:
https://www.hpc2n.umu.se/documentation/guides/windows-connection
Mac/OSX: Guide here: https://www.hpc2n.umu.se/documentation/guides/mac-connection
3 / 23
Using Kebnekaise: Connecting with ThinLinc
Download and install the client from https://www.cendio.com/thinlinc/download
Start the client. Enter the name of the server: kebnekaise-tl.hpc2n.umu.se, then enter your own username under "Username". Enter your password.
Go to "Options" -> "Security" and check that the authentication method is set to password.
Go to "Options" -> "Screen" and uncheck "Full screen mode".
Click "Connect". Click "Continue" when you are told that the server's host key is not in the registry.
After a short time, the ThinLinc desktop opens, running Mate, which is fairly similar to the Gnome desktop. All your files on HPC2N should be available.
4 / 23
Using Kebnekaise: Transfer your files and data
Linux, OS X:
  Use scp (or sftp) for file transfer. Example, scp:
  local> scp <username>@kebnekaise.hpc2n.umu.se:file .
  local> scp file <username>@kebnekaise.hpc2n.umu.se:file
Windows:
  Download a client: WinSCP, FileZilla (sftp), PSCP/PSFTP, ...
  Transfer with sftp or scp
Mac/OSX:
  Transfer with sftp or scp (as for Linux) using Terminal
  Or download a client: Cyberduck, Fetch, ...
More information in the guides (see previous slide) and here: https://www.hpc2n.umu.se/documentation/filesystems/filetransfer
5 / 23
Using Kebnekaise: Editors
Editing your files
Various editors: vi, vim, nano, emacs ...
Example, vi/vim:
vi <filename>
Insert before cursor: i
Save and exit vi/vim: Esc :wq
Example, nano:
nano <filename>
Save and exit nano: Ctrl-x
Example, Emacs:
Start with: emacs
Open (or create) file: Ctrl-x Ctrl-f
Save: Ctrl-x Ctrl-s
Exit Emacs: Ctrl-x Ctrl-c
6 / 23
The File System
AFS
  Your home directory is here ($HOME)
  Regularly backed up
  NOT accessible by the batch system (ticket-forwarding doesn't work)
  Secure authentication with Kerberos tickets
PFS
  Parallel File System
  NO BACKUP
  High performance when accessed from the nodes
  Accessible by the batch system
  Create a symbolic link from $HOME to pfs:
  ln -s /pfs/nobackup$HOME $HOME/pfs
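If you want to rehearse the symbolic-link step before running it against your real home directory, the sketch below does the same thing in a throwaway directory (the paths are stand-ins for the real /pfs/nobackup$HOME and $HOME on Kebnekaise, not the literal values):

```shell
# Rehearse the $HOME/pfs link in a temporary directory.
# "$demo/pfs/nobackup/home/u/user" stands in for /pfs/nobackup$HOME,
# "$demo/home/u/user" stands in for $HOME.
demo=$(mktemp -d)
mkdir -p "$demo/pfs/nobackup/home/u/user"
mkdir -p "$demo/home/u/user"

# The actual link step, same shape as on Kebnekaise:
ln -s "$demo/pfs/nobackup/home/u/user" "$demo/home/u/user/pfs"

# Anything written under ~/pfs now lands on the parallel file system.
readlink "$demo/home/u/user/pfs"
```

Files created through the link end up in the pfs tree, which is exactly why the batch system (which cannot read AFS) can see them.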
7 / 23
The Module System (Lmod)
Most programs are accessed by first loading them as a ’module’
Modules are:
used to set up your environment (paths to executables, libraries, etc.) for using a particular (set of) software package(s)
a tool to help users manage their Unix/Linux shell environment, allowing groups of related environment-variable settings to be made or removed dynamically
a way of having multiple versions of a program or package available by just loading the proper module
installed in a hierarchical layout. This means that some modules are only available after loading a specific compiler and/or MPI version.
8 / 23
The Module System (Lmod)
Most programs are accessed by first loading their ’module’
See which modules exist:
  module spider or ml spider
See modules that depend only on what is currently loaded:
  module avail or ml av
See which modules are currently loaded:
  module list or ml
Example: loading a compiler toolchain and version, here for GCC, OpenMPI, OpenBLAS/LAPACK, FFTW, ScaLAPACK, and CUDA:
  module load fosscuda/2019a or ml fosscuda/2019a
Example: unload the above module:
  module unload fosscuda/2019a or ml -fosscuda/2019a
More information about a module:
  module show <module> or ml show <module>
Unload all modules except the 'sticky' modules:
  module purge or ml purge
9 / 23
The Module System: Compiler Toolchains
Compiler toolchains load bundles of software making up a complete environment for compiling/using a specific prebuilt software. They include some or all of: compiler suite, MPI, BLAS, LAPACK, ScaLAPACK, FFTW, CUDA.
Some of the currently available toolchains (check ml av for all/versions):
GCC: GCC only
gcccuda: GCC and CUDA
foss: GCC, OpenMPI, OpenBLAS/LAPACK, FFTW, ScaLAPACK
fosscuda: GCC, OpenMPI, OpenBLAS/LAPACK, FFTW, ScaLAPACK, and CUDA
gimkl: GCC, IntelMPI, IntelMKL
gimpi: GCC, IntelMPI
gompi: GCC, OpenMPI
gompic: GCC, OpenMPI, CUDA
goolfc: gompic, OpenBLAS/LAPACK, FFTW, ScaLAPACK
icc: Intel C and C++ only
iccifort: icc, ifort
iccifortcuda: icc, ifort, CUDA
ifort: Intel Fortran compiler only
iimpi: icc, ifort, IntelMPI
intel: icc, ifort, IntelMPI, IntelMKL
intelcuda: intel and CUDA
iomkl: icc, ifort, Intel MKL, OpenMPI
pomkl: PGI C, C++, and Fortran compilers, IntelMPI
pompi: PGI C, C++, and Fortran compilers, OpenMPI
10 / 23
Compiling and Linking with Libraries: Linking
Figuring out how to link
Intel and Intel MKL linking: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
Buildenv
After loading a compiler toolchain, load 'buildenv' and use 'ml show buildenv' to get useful linking info.
Example, fosscuda, version 2019a:
  ml fosscuda/2019a
  ml buildenv
  ml show buildenv
Using the environment variables (prefaced with $) is highly recommended!
You have to load the buildenv module in order to be able to use the environment variables for linking!
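As a hedged sketch of how such environment variables are used: a variable like $LIBBLAS (the name is an assumption here; check 'ml show buildenv' on the system for the actual list) can be passed straight to the compiler on the link line. The snippet below defaults the variable to empty so it compiles anywhere with gcc, without the module loaded; on Kebnekaise it would expand to the real linker flags:

```shell
# Hypothetical linking sketch: $LIBBLAS would normally be set by
# 'ml buildenv'; ${LIBBLAS:-} falls back to empty so this example
# runs even without the module.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello from buildenv sketch\n"); return 0; }
EOF
gcc -O2 -o hello hello.c ${LIBBLAS:-}
./hello
```

The point of the variable is that you never hard-code library paths: loading a different toolchain version updates the flags for you.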
11 / 23
The Batch System (SLURM)
Large/long/parallel jobs must be run through the batch system
SLURM is an open-source job scheduler which provides three key functions:
  Keeps track of available system resources
  Enforces local system resource usage and job scheduling policies
  Manages a job queue, distributing work across resources according to policies
In order to run a batch job, you need to create and submit a SLURM submit file (also called a batch submit file, a batch script, or a job script).
Guides and documentation at: http://www.hpc2n.umu.se/support
12 / 23
The Batch System (SLURM): Useful Commands
Submit job: sbatch <jobscript>
Get list of your jobs: squeue -u <username>
Run a program within an allocation: srun <commands for your job/program>
Request an interactive allocation: salloc <commands to the batch system>
Check on a specific job: scontrol show job <job id>
Delete a specific job: scancel <job id>
Useful info about job: sacct -l -j <jobid> | less -S
13 / 23
The Batch System (SLURM): Job Output
Output and errors in: slurm-<job-id>.out
To get output and error files split up, you can give these flags in the submit script:
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
To specify Broadwell or Skylake only:
#SBATCH --constraint=broadwell or
#SBATCH --constraint=skylake
To run on the GPU nodes, add this to your script:
#SBATCH --gres=gpu:<card>:x
where <card> is k80 or v100, and x = 1, 2, or 4 (4 only if K80).
http://www.hpc2n.umu.se/resources/hardware/kebnekaise
14 / 23
The Batch System (SLURM): Simple example, serial
Example: Serial job, compiler toolchain ’fosscuda/2019a’
#!/bin/bash
# Project id - change to your own after the course!
#SBATCH -A SNIC2019-5-156
# Asking for 1 core
#SBATCH -n 1
# Asking for a walltime of 5 min
#SBATCH --time=00:05:00
# Always purge modules before loading new ones in a script.
ml purge > /dev/null 2>&1
ml fosscuda/2019a
./my_serial_program
Submit with:
sbatch <jobscript>
15 / 23
The Batch System (SLURM): Parallel example
#!/bin/bash
#SBATCH -A SNIC2019-5-156
#SBATCH -n 14
#SBATCH --time=00:05:00
ml purge > /dev/null 2>&1
ml fosscuda/2019a
srun ./my_mpi_program
16 / 23
The Batch System (SLURM): Requesting GPU nodes
Currently there is no separate queue for the GPU nodes
Request GPU nodes by adding this to your batch script:
#SBATCH --gres=gpu:<type-of-card>:x
where <type-of-card> is either k80 or v100, and x = 1, 2, or 4 (4 only for the K80 type)
There are 32 nodes (Broadwell) with dual K80 cards and 4 nodes with quad K80 cards
There are 10 nodes (Skylake) with dual V100 cards
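Putting the pieces together, a submit file for one of the V100 nodes could look like the sketch below (the project id follows the course slides; the core count and program name are placeholders to adjust for your own job):

```shell
#!/bin/bash
# Sketch of a GPU batch script; my_gpu_program is a placeholder.
#SBATCH -A SNIC2019-5-156
# 14 cores and both V100 cards on one Skylake GPU node
#SBATCH -n 14
#SBATCH --gres=gpu:v100:2
#SBATCH --time=00:10:00

# Purge modules, then load a CUDA-capable toolchain
ml purge > /dev/null 2>&1
ml fosscuda/2019a

./my_gpu_program
```

Submit it with sbatch as usual; the --gres line is what routes the job to a GPU node.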
17 / 23
R at HPC2N: Loading R
Check which version of R is installed: ml spider R
Choose the version you want. We recommend R/3.4.4-X11-20180131
Load the necessary prerequisites as well as the module:
ml GCC/6.4.0-2.28 OpenMPI/2.1.2 R/3.4.4-X11-20180131
You can now run R, or install any R packages you wish.
On our website you can see how to find out which R packages are already installed: https://www.hpc2n.umu.se/resources/software/r (section "HPC2N R addons")
18 / 23
R at HPC2N: Installing R packages/add-ons
Create a place for the R add-ons and tell R to find it. Here we use /pfs/nobackup$HOME/R-packages:
  mkdir -p /pfs/nobackup$HOME/R-packages
R reads the $HOME/.Renviron file to set up its environment. Since you want to use R from the batch system, the file needs to live on pfs, with a link from $HOME:
  ln -s /pfs/nobackup$HOME/.Renviron $HOME
Since the file is likely empty now, tell R where your add-on directory is like this:
  echo R_LIBS="/pfs/nobackup$HOME/R-packages" > ~/.Renviron
If it is not empty, edit $HOME/.Renviron so that R_LIBS contains the path to your chosen add-on directory. It should look something like this when you are done:
R_LIBS="/pfs/nobackup/home/u/user/R-packages"
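The setup above can be rehearsed in a temporary directory before touching your real home (a sketch; $fakehome stands in for both $HOME and the pfs add-on location):

```shell
# Rehearsal of the R add-on setup against a stand-in home directory.
fakehome=$(mktemp -d)
mkdir -p "$fakehome/R-packages"    # stand-in for /pfs/nobackup$HOME/R-packages

# Write the R_LIBS line, as in the echo step above.
echo "R_LIBS=\"$fakehome/R-packages\"" > "$fakehome/.Renviron"

cat "$fakehome/.Renviron"
```

When R starts with this file in place, .libPaths() will list the R-packages directory first, so install.packages() and library() use it automatically.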
19 / 23
R at HPC2N: Installing R packages/add-ons automatically
Automatic download and install
Load R and dependencies
Install from a CRAN repo (in Sweden):
R --quiet --no-save --no-restore -e "install.packages('package', repos='http://ftp.acc.umu.se/mirror/CRAN/')"
If the package has dependencies that come from more than one repo, this will not work. In that case, either run install.packages interactively in R or use the manual method.
You can now use your add-on like this:
library("package")
20 / 23
R at HPC2N: Installing R packages/add-ons manually
Manual download and install
Download the add-on (with wget, for instance) from the CRAN package site. Download and install any prerequisites first.
Load R and dependencies.
Tell R to install into your chosen add-on directory:
  R CMD INSTALL -l /pfs/nobackup$HOME/R-packages R-package.tar.gz
You can now use your add-on like this:
library("package")
21 / 23
R at HPC2N: RStudio
RStudio is only installed on the ThinLinc node, so you need to connect to that first with your ThinLinc client
Start RStudio with rstudio
Note that you cannot submit jobs to the batch system from inside RStudio! Anything run from inside it will run directly on the ThinLinc node!
22 / 23
Various useful info
A project has been set up for the workshop: SNIC2019-5-156
You use it in your batch submit file by adding:
#SBATCH -A SNIC2019-5-156
There is a reservation for 2 regular Broadwell nodes. This reservation is accessed by adding this to your batch submit file:
#SBATCH --reservation=ml-with-r
The reservation is ONLY valid for the duration of the course.
23 / 23