
Doc No.: HPCI-ST01-004E-03

HPCI Shared Storage User Manual For Fugaku users

2021/4/16


Revision History

Revision         Date        Description
Initial and 2nd  2021/03/15  • Substantially changed based on HPCI Shared Storage User Manual (HPCI-ST01-001), accounting for actual Fugaku computer operations
3rd              2021/04/16  • Correction of errors and figures.
                             • Changed the name of csgw.fugaku.r-ccs.riken.jp to Cloud Storage Gateway Node


Table of Contents

Introduction
Overview of the Shared Storage System
For First-time Users of the Shared Storage System
  Login to the Cloud Storage Gateway Node
  Obtaining the HPCI proxy certificate
  Setting up encrypted network communication
  Mounting shared storage
  Remote copy to shared storage
  Data transfer between Fugaku and shared storage
  Introduction of Replicas
  Parallel File Copy
  Unmounting Shared Storage
Details of Shared Storage
  Direct Access to Shared Storage
  Access control of file and directory
  Storage usage and allocation
  File Sharing in a Project
  Installing the Client Environment
  Introduction of TIPS
Troubleshooting
  Introducing the HPCI Helpdesk
  "Mountpoint is not empty" indicating the mountpoint is already in use
  "No write access to mountpoint" and nothing can be written
  "Transport endpoint is not connected" and the shared storage cannot be accessed
  "Operation not permitted" and the shared storage cannot be mounted
  "Transport endpoint is not connected" and files cannot be accessed
  "Invalid argument" and files cannot be accessed
  "Connection refused" and files cannot be accessed
  "Input/Output Error" and files cannot be accessed


Introduction

This document is intended for participants in HPCI projects who use the Fugaku supercomputer (hereafter referred to as "Fugaku"). It describes how to access the HPCI shared storage system (hereafter referred to as the "shared storage system") from the Fugaku gateway servers (csgw1.fugaku.r-ccs.riken.jp and csgw2.fugaku.r-ccs.riken.jp).

Note that the client software for the shared storage is not installed on the Fugaku login node (login.fugaku.r-ccs.riken.jp, hereafter referred to as the "Fugaku login node"). Use the Cloud Storage Gateway Node to access the shared storage.

Chapter 1 briefly introduces the shared storage system, and Chapter 2 discusses basic usage with actual examples for first-time users. Chapter 3 discusses using the shared storage system in more detail. Finally, Chapter 4 explains how to contact the HPCI Helpdesk for troubleshooting, as well as general troubleshooting methods.

This document assumes that users have obtained both a digital certificate and a proxy certificate in advance from the HPCI certificate issuing system. For instructions on how to obtain a proxy certificate, refer to Chapter 2.2 "Proxy Certificate Issuing Procedure" in the HPCI Login Manual (HPCI-CA01-001E): https://www.hpci-office.jp/materials/hpci-ca01-001_e.pdf

In this document, italicized characters indicate input commands, and bold characters mark a comment or command output that should be checked.

When using the shared storage system, the following websites may be helpful.

• Shared storage portal website

• Shows notices and maintenance information for the shared storage system:

https://www.hpci-office.jp/info/pages/viewpage.action?pageId=11862295

• Shared storage tips

• Provides detailed explanations of shared storage and Gfarm commands:

https://www.hpci-office.jp/info/pages/viewpage.action?pageId=26935659


• HPCI Helpdesk

• If you have any questions about the shared storage system, contact the HPCI

Helpdesk:

https://www.hpci-office.jp/pages/e_support

• HPCI portal website

• Provides support information related to using the HPCI, including applying for

a project and reporting results.

https://www.hpci-office.jp/

• Shared Storage Operation Information

• Provides a dashboard that visualizes shared storage operation information.

  • Dashboard: https://hpci-web01.r-ccs.riken.jp/grafana/

  • Manual: https://www.hpci-office.jp/info/pages/viewpage.action?pageId=216629492

The shared storage system is managed and operated by the HPCI shared storage working group, which comprises the following two organizations.

• RIKEN Center for Computational Science (R-CCS)

http://www.r-ccs.riken.jp/

• Information Technology Center, The University of Tokyo

http://www.cc.u-tokyo.ac.jp/


Overview of the Shared Storage System

The shared storage system is a large-scale data sharing platform for HPCI users. With it, HPCI users can quickly and safely share large amounts of data under one file system across the geographically dispersed computational resources of the HPCI.

The shared storage system uses the Gfarm network shared file system and consists of metadata servers (serving metadata) and file system nodes (serving file data). For fault tolerance, metadata transactions are continuously copied from a master metadata server, installed at the University of Tokyo or RIKEN R-CCS, to one or more slave metadata servers, also at the University of Tokyo or RIKEN R-CCS.

The client environment of the shared storage system is installed on the login nodes of the HPCI computational resources and on the Fugaku Cloud Storage Gateway Node, so that HPCI project participants can share storage. Users can also install the client environment on local machines. For instructions on how to install the client environment, refer to "HPCI Shared Storage User Manual - Client Introduction": https://www.hpci-office.jp/materials/hpci-st01-002.pdf

[Figure: RIKEN Center for Computational Science: master metadata server (*), slave metadata server, file system node. Information Technology Center, The University of Tokyo: master metadata server (*), slave metadata server, file system node.]

(*) The master metadata server is operated either by the RIKEN Center for Computational Science or by the Information Technology Center, The University of Tokyo.


For First-time Users of the Shared Storage System

This chapter explains the procedure for logging in to the Cloud Storage Gateway Node and mounting the shared storage system. The following account and Gfarm group ID are used in all the examples in this chapter.

Fugaku account name: u000000

HPCI-ID: hpci000000

Gfarm group ID: hp000000

A Gfarm group ID is assigned to each project during the initial project setup process, except for strategic program projects awarded by FY 2015. Refer to the following page for the list of Gfarm group IDs.

• https://www.hpci-office.jp/info/pages/viewpage.action?pageId=178064247

This document assumes a Bourne-type shell (such as bash). If you are using another type of shell, such as a C shell (e.g. csh or tcsh), adapt the commands to your shell as necessary. The mount point for each project's shared storage is created from the Gfarm group ID. Note that, in the following description, the term "group ID" refers to the Gfarm group ID.

This document explains how to log in to the Cloud Storage Gateway Node using SSH, mount the HPCI shared storage, and transfer data to and from Fugaku Global Storage, as shown in the following figure.


Login to the Cloud Storage Gateway Node

Fugaku provides the Cloud Storage Gateway Node as a login node for accessing the HPCI shared storage and cloud computing environments outside R-CCS. The client software for the HPCI shared storage is installed on the Cloud Storage Gateway Node.

Cloud Storage Gateway Node

Representative FQDN          Actual FQDN
csgw.fugaku.r-ccs.riken.jp   csgw1.fugaku.r-ccs.riken.jp
                             csgw2.fugaku.r-ccs.riken.jp

You can log in to the Cloud Storage Gateway Node using SSH or GSI-SSH in the same way as the Fugaku login node. To log in via SSH, access the Fugaku portal site (https://fugaku.r-ccs.riken.jp) and register your public key in advance. For details on how to register, refer to the Fugaku User Manual.


■ client$ ssh u000000@csgw.fugaku.r-ccs.riken.jp

■ If you specify csgw.fugaku.r-ccs.riken.jp as the login destination, you are logged in to csgw1 or csgw2.

■ [u00000@csgw1 ~]$

You can also log in to the Cloud Storage Gateway Node using GSI-SSH, but you need to issue an HPCI proxy certificate and prepare a GSI-SSH client environment in advance. For the issuing method and the environment, refer to the HPCI Quick Start Guide (https://www.hpci-office.jp/pages/e_hpci_info_manuals).

The following shows how to log in to the Cloud Storage Gateway Node using GSI-SSH.

■ client$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t168

■ Issue a proxy certificate. It is valid for 168 hours.

■ Enter MyProxy pass phrase: ******

■ A credential has been received for user hpci000000 in /tmp/x509up_XXXXXX.fileXXXXXXX.

■ client$ gsissh -p2222 u000000@csgw.fugaku.r-ccs.riken.jp

■ Log in to csgw.fugaku.r-ccs.riken.jp using GSI-SSH. You are logged in to csgw1 or csgw2.

■ [u00000@csgw1 ~]$

Obtaining the HPCI proxy certificate

This document describes how to access the shared storage using an HPCI proxy certificate (hereafter referred to as a "proxy certificate"). For information on how to issue a proxy certificate, refer to the HPCI Quick Start Guide (https://www.hpci-office.jp/pages/e_hpci_info_manuals).

After logging in to the Cloud Storage Gateway Node, check the remaining validity of the proxy certificate with the grid-proxy-info command. The following example shows the output when the proxy certificate has expired. In that case, you need to obtain a new proxy certificate.


■ [u00000@csgw1 ~]$ grid-proxy-info

■ ERROR: Couldn't find a valid proxy.

■ globus_sysconfig: Could not find a valid proxy certificate file location

■ globus_sysconfig: Error with key filename

■ globus_sysconfig: File does not exist: /tmp/x509up_pXXXXX is not a valid file

■ Use -debug for further information.

■ [u00000@csgw1 ~]$

If you do not have a valid proxy certificate, obtain one with the myproxy-logon command as follows. The -t option specifies the validity period of the proxy certificate in hours. When prompted, enter the passphrase that was set when the proxy certificate was issued by the HPCI certificate issuing system and stored in the repository.

■ [u00000@csgw1 ~]$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t 168

■ Enter MyProxy pass phrase: ******

■ A credential has been received for user hpci000000 in

/tmp/x509up_XXXXXX.fileXXXXXXX.

■ [u00000@csgw1 ~]$

If you cannot obtain a proxy certificate with the myproxy-logon command, try reissuing the proxy certificate.

After acquiring the proxy certificate, run the grid-proxy-info command again to check the validity period. The remaining validity period is displayed in the timeleft field.


■ [u00000@csgw1 ~]$ grid-proxy-info

■ subject : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]/CN=XXXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXXX

■ issuer : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]/CN=XXXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX

■ identity : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]

■ type : RFC 3820 compliant impersonation proxy

■ strength : 2048 bits

■ path : /tmp/x509up_pXXXX.fileXYZABCD

■ timeleft : 23:59:40

■ [u00000@csgw1 ~]$
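The validity check above can also be scripted. The following is a minimal sketch, assuming the timeleft line has the "timeleft : H:MM:SS" form shown in the example output. The function names and the idea of comparing against a threshold in seconds are illustrative assumptions and are not part of the HPCI tools.

```shell
# Convert an "H:MM:SS" timeleft value to seconds.
# awk is used so that fields such as "08" are treated as decimal numbers.
timeleft_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the timeleft value found in grid-proxy-info output read from stdin.
# Assumption: the line looks like "timeleft : 23:59:40".
extract_timeleft() {
    awk '$1 == "timeleft" { print $3; exit }'
}

# Succeed if the timeleft value ($1) is at least $2 seconds.
proxy_has_at_least() {
    [ "$(timeleft_to_seconds "$1")" -ge "$2" ]
}
```

A possible use is to run grid-proxy-info, pipe its output to extract_timeleft, and call proxy_has_at_least with the result and a threshold such as 3600 before starting a long transfer.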

Setting up encrypted network communication

By default, access to files and directories on the shared storage is not encrypted, and data is transmitted in plain text. It is therefore recommended to enable encrypted communication. Once you are able to log in to the Cloud Storage Gateway Node, configure this setting before using the shared storage. To enable encrypted communication for data protection, add the following lines to the configuration file $HOME/.gfarm2rc.

■ [u00000@csgw1 ~]$ cat $HOME/.gfarm2rc

■ auth enable gsi *

■ auth disable gsi_auth *
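The two settings above can be added without creating duplicate lines when the setup is repeated. The following is a minimal sketch. The function names are hypothetical, and the sketch only appends missing lines, so it assumes the file contains no earlier auth lines that would conflict with them.

```shell
# Append a line to a file only if an identical line is not already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the two settings from this section to a Gfarm client configuration
# file. The default path is the $HOME/.gfarm2rc file named above.
enable_gfarm_encryption() {
    rc=${1:-$HOME/.gfarm2rc}
    ensure_line "$rc" 'auth enable gsi *'
    ensure_line "$rc" 'auth disable gsi_auth *'
}
```

Calling enable_gfarm_encryption with no argument would edit $HOME/.gfarm2rc. Passing another path makes it possible to try the function on a scratch file first.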

You can check whether encrypted communication is enabled with the gfhost command. If the second field is an uppercase "G", communication is encrypted. If it is a lowercase "g", communication is not encrypted.


■ [u00000@csgw1 ~]$ gfhost -lv

■ 0.01/0.03/0.03 G i386-fedora3-linux 2 linux-1.example.com 600 0(10.0.0.1)

■ 0.00/0.00/0.00 G i386-fedora3-linux 2 linux-2.example.com 600 0(10.0.0.2)

■ 0.00/0.02/0.00 G i386-redhat8.0-linux 1 linux-4.example.com 600 0(10.0.0.4)

■ 0.10/0.00/0.00 G sparc-sun-solaris8 1 solaris-1.example.com 600 0(10.0.1.1)

■ ...
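When many file system nodes are listed, the second field can be checked mechanically. The following is a minimal sketch, assuming the field layout of the sample output above (second field is the communication flag, fifth field is the host name). The function name is hypothetical.

```shell
# Read "gfhost -lv" output from stdin and print the host name of every
# entry whose second field is not an uppercase "G".
# Assumption: field 2 is the communication flag and field 5 the host name,
# as in the sample output in this section.
list_unencrypted_hosts() {
    awk 'NF >= 5 && $2 != "G" { print $5 }'
}
```

Piping the gfhost -lv output to list_unencrypted_hosts prints nothing when every listed host reports "G".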

Mounting shared storage

To mount the shared storage, execute the mount.hpci command as follows.

■ [u00000@csgw1 ~]$ mount.hpci

■ Update proxy certificate for gfarm2fs

■ timeleft : 23:53:05

■ Mount GfarmFS on /gfarm/hp000000/u000000

■ Mount GfarmFS on /gfarm/hp000001/u000000

■ [u00000@csgw1 ~]$

The mount point of the shared storage is displayed after "Mount GfarmFS on". Normally, the shared storage is mounted under /gfarm, but if that directory does not exist, it is mounted under /tmp, as in /tmp/hp000000/u000000.

If you are a member of multiple research projects, the HPCI shared storage home directories of all the projects you belong to are mounted. In the example above, the home directories of both the hp000000 and hp000001 projects are mounted.

You can check the mount status with the df command.

■ [u00000@csgw1 ~]$ df | grep gfarm2fs

■ Filesystem Size Used Avail Use% Mounted on

■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000000/u000000

■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000001/u000000

■ [u00000@csgw1 ~]$
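Because the mount point can be under /gfarm or under /tmp, a script may prefer to read it from the mount.hpci output rather than hard-code it. The following is a minimal sketch, assuming each mounted area is reported on a line of the form "Mount GfarmFS on <path>" as in the example above. The function name is hypothetical.

```shell
# Read mount.hpci output from stdin and print one mount point per line.
# Lines that do not start with "Mount GfarmFS on " are ignored.
list_mount_points() {
    sed -n 's/^Mount GfarmFS on //p'
}
```

For the example output in this section, the function would print /gfarm/hp000000/u000000 and /gfarm/hp000001/u000000.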


Remote copy to shared storage

You can copy files to the mounted shared storage with the gsiscp or scp command, just as with a normal Linux file system. Mount the shared storage on the Cloud Storage Gateway Node, then specify the mount point as the copy destination.

■ [u00000@csgw1 ~]$ mount.hpci ← Mount the shared storage on the Cloud Storage Gateway Node

■ [u00000@csgw1 ~]$ exit

■ client$ scp ./data.file u000000@csgw.fugaku.r-ccs.riken.jp:/gfarm/hp000000/u000000/

■ data.file 3% 381MB 72.6MB/s 02:12 ETA

■ client$

Data transfer between Fugaku and shared storage

In the area where the shared storage is mounted, Linux file manipulation commands can be used as on any other file system. The following example copies a file from the global file system area /data/hp000000/u000000 on Fugaku to the shared storage.

■ [u00000@csgw1 ~]$ cp /data/hp000000/u000000/data.file \
/gfarm/hp000000/u000000/

■ [u00000@csgw1 ~]$ ls -l /gfarm/hp000000/u000000/data.file

■ -rw-r--r-- 1 hpci000000 hp000000 100000 Feb 11 11:33 data.file

■ [u00000@csgw1 ~]$

Introduction of Replicas

The shared storage automatically creates replicas of files on the file system nodes for data protection. Every file stored on the shared storage has at least one replica on the file servers of the University of Tokyo and at least one on those of RIKEN R-CCS, for a total of at least two replicas.

You can check which file servers hold the replicas with the gfwhere command. The following example shows that the file "data.file" is stored on two file servers, one at the University of Tokyo and one at R-CCS.


■ [u00000@csgw1 ~]$ gfwhere /gfarm/hp000000/u000000/data.file

■ gfs13-1.hpci.itc.u-tokyo.ac.j ss-05-0.r-ccs.riken.jp

In the shared storage, one replica each is placed at the University of Tokyo and at R-CCS to ensure fault tolerance. Users should not change the number or the location of replicas.

Parallel File Copy

The shared storage provides the gfpcopy command for copying multiple files in parallel. The -j option specifies the degree of parallelism, and the default is 4. In the following example, TEST_DIRECTORY, stored in the Fugaku global file system, is copied recursively to the shared storage.

■ [u00000@csgw1 ~]$ mount.hpci

■ [u00000@csgw1 ~]$ cd /gfarm/hp000000/u000000/

■ [u00000@csgw1 ~]$ gfpcopy -j8 /data/hp000000/u000000/TEST_DIRECTORY/ ./

■ [u00000@csgw1 ~]$ ls /gfarm/hp000000/u000000/TEST_DIRECTORY

■ TEST_FILE_01 TEST_FILE_02 TEST_FILE_03 TEST_FILE_04 TEST_FILE_05 TEST_FILE_06

■ TEST_FILE_07 TEST_FILE_08 TEST_FILE_09 TEST_FILE_10 TEST_FILE_11 TEST_FILE_12

■ TEST_FILE_13 TEST_FILE_14 TEST_FILE_15 TEST_FILE_16 TEST_FILE_17 TEST_FILE_18

■ [u00000@csgw1 ~]$

You can also set the parallelism with the client_parallel_copy setting in the configuration file $HOME/.gfarm2rc. In the following example, the parallelism is set to 8.

■ [u00000@csgw1 ~]$ cat $HOME/.gfarm2rc

■ client_parallel_copy 8

The gfpcopy command overwrites a destination file if the source file is newer than it. Therefore, if files in the source directory have been updated, or if copying some files failed, you can run gfpcopy again on the same directory to copy only the files that were updated or that failed to be copied.


■ [u00000@csgw1 ~]$ cp NEW_TEST_FILE_01 ./TEST_DIRECTORY/TEST_FILE_01

■ The local test file is newer than the file stored in the shared storage.

■ [u00000@csgw1 ~]$ ls -l ./TEST_DIRECTORY/TEST_FILE_01

■ -rw-r--r-- 1 u00000 hp000000 100000 Jul 15 11:14 TEST_FILE_01

■ [u00000@csgw1 ~]$ ls -l \
/gfarm/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01

■ -rw-r--r-- 1 u00000 hp000000 300000 Jul 20 12:00 TEST_FILE_01

■ Overwrite with gfpcopy. Only TEST_FILE_01 is copied.

■ [u00000@csgw1 ~]$ gfpcopy -j 8 \
./TEST_DIRECTORY /gfarm/hp000000/u000000/

■ [u00000@csgw1 ~]$ ls -l \
/gfarm/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01

■ -rw-r--r-- 1 u00000 hp000000 300000 Jul 20 12:00 TEST_FILE_01

In the following example, a parallel copy is performed from the shared storage to the Fugaku global file system with a parallelism of 8. When the -v option is specified for gfpcopy, detailed information on the file copying is displayed.


■ [u00000@csgw1 ~]$ grid-proxy-info | grep timeleft

■ timeleft : 1:00:00 (0.1 days) ← Check the expiration date. If it is not enough, re-obtain a proxy certificate.

■ [u00000@csgw1 ~]$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t 168

■ Enter MyProxy pass phrase: ******

■ [u00000@csgw1 ~]$ grid-proxy-info | grep timeleft

■ timeleft : 167:59:45 (7.0 days) ← Check the expiration date.

■ [u00000@csgw1 ~]$ gfpcopy -j 8 -v \
/gfarm/hp000000/u000000/TEST_DIRECTORY2 \
/data/hp000000/u000000/

■ INFO: mkdir(file:///data/hp000000/u000000, 755) OK

■ INFO: scheduling method = noplan

■ [OK]COPY, 200MB/s(1.0s): gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/TEST_DIRECTORY2/FILE01(gfs54-2.hpci.itc.u-tokyo.ac.jp:600) -> file:///data/hp000000/u000000/TEST_DIRECTORY2/FILE01

■ [OK]COPY, 200MB/s(1.0s): gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/TEST_DIRECTORY2/FILE02(ss-02-1.r-ccs.riken.jp:600) -> file:///data/hp000000/u000000/TEST_DIRECTORY2/FILE02

■ (snip)

■ [u00000@csgw1 ~]$

Check the remaining validity period of the proxy certificate before executing the gfpcopy command. If the proxy certificate expires during a copy, the files not yet copied at that point fail to be copied. In that case, re-obtain the proxy certificate and re-execute the gfpcopy command. Only the files that have not yet been copied are copied.

Unmounting Shared Storage

To unmount the shared storage, use the umount.hpci command.

■ [u00000@csgw1 ~]$ umount.hpci

■ Unmount GfarmFS on /gfarm/hp000000/u000000

■ Unmount GfarmFS on /gfarm/hp000001/u000000

■ [u00000@csgw1 ~]$


Details of Shared Storage

This chapter provides details on how to use shared storage.

Direct Access to Shared Storage

There are two ways to access the shared storage as shown below.

(A) Mount the shared storage area and access it with standard file manipulation commands (method

described in Chapter 2).

(B) Direct access to the shared storage area using Gfarm-specific commands without mounting the

shared storage area.

This section introduces direct access (B). To specify a file stored in Gfarm in a Gfarm-specific

command, use the Gfarm absolute path beginning with gfarm://. The following is an example of listing

files by the gfls command.

■ [u00000@csgw1 ~]$ gfls -l \
gfarm:///home/hp000000/u000000/TEST_DIRECTORY

■ -rw-r--r-- 1 hpci000000 hp000000 10485760 Nov 11 16:30 gfarm:///home/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01

■ [u00000@csgw1 ~]$

The Gfarm absolute path can also be specified for the parallel copy command gfpcopy. By specifying

the Gfarm absolute path, you can access the shared storage without mounting it.


■ [u00000@csgw1 ~]$ mkdir /data/hp000000/u00000/work_dir

■ [u00000@csgw1 ~]$ ls -l /data/hp000000/u00000/work_dir

■ total 0

■ [u00000@csgw1 ~]$ gfpcopy -j 8 \
gfarm:///home/hp000000/u000000/TEST_DIRECTORY3 \
/data/hp000000/u00000/

■ [u00000@csgw1 ~]$ ls -l /data/hp000000/u00000/TEST_DIRECTORY3

■ total 921600

■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.000

■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.001

■ (snip)

■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.098

■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.099

■ [u00000@csgw1 ~]$
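When switching between the mounted view and direct access, the same file has two names. The following is a minimal sketch of the translation, assuming that <mount root>/<group ID>/<account>/... corresponds to gfarm:///home/<group ID>/<account>/..., which is the correspondence seen in the gfpcopy -v output and the gfls examples in this document. The function name is hypothetical.

```shell
# Translate a path under a mounted project area into a Gfarm absolute path.
#   $1: mount root ("/gfarm" normally, "/tmp" when /gfarm does not exist)
#   $2: path below the mount root
# Assumption: <root>/<group>/<user>/... maps to gfarm:///home/<group>/<user>/...
mount_path_to_gfarm_url() {
    root=${1%/}
    case $2 in
        "$root"/*) printf 'gfarm:///home/%s\n' "${2#"$root"/}" ;;
        *) echo "not under $root: $2" >&2; return 1 ;;
    esac
}
```

For example, calling the function with /gfarm and /gfarm/hp000000/u000000/TEST_DIRECTORY prints gfarm:///home/hp000000/u000000/TEST_DIRECTORY, which can then be passed to gfls or gfpcopy.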

Access control of file and directory

The shared storage supports access control lists (ACLs), which allow access rights to be set individually for any user or group. The gfgetfacl and gfsetfacl commands are used to display and set ACLs, respectively.

The following example shows how to display and set an ACL. First, the directory work is created. Then the ACL of the work directory is displayed with the gfgetfacl command and modified with the gfsetfacl command.

For detailed usage of the gfgetfacl and gfsetfacl commands, refer to the manual pages (man gfgetfacl, man gfsetfacl).


■ [u00000@csgw1 ~]$ gfmkdir -p gfarm:///home/hp000000/u000000/work

■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work

■ drwxr-xr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work

■ [u00000@csgw1 ~]$ gfgetfacl gfarm:///home/hp000000/u000000/work

■ # file: gfarm:///home/hp000000/u000000/work

■ # owner: hpci000000

■ # group: hp000000

■ user::rwx

■ group::r-x

■ other::r-x

■ [u00000@csgw1 ~]$ gfsetfacl -m g:hp012345:rwx \
gfarm:///home/hp000000/u000000/work

■ [u00000@csgw1 ~]$ gfgetfacl gfarm:///home/hp000000/u000000/work

■ # file: gfarm:///home/hp000000/u000000/work

■ # owner: hpci000000

■ # group: hp000000

■ user::rwx

■ group::r-x

■ group:hp012345:rwx

■ other::r-x

■ [u00000@csgw1 ~]$
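To confirm that an entry was added as intended, the gfgetfacl output can be filtered. The following is a minimal sketch, assuming entries have the "group:<name>:<permissions>" form shown in the output above. The function name is hypothetical.

```shell
# Read gfgetfacl output from stdin and print the permissions of the named
# group entry ($1). The exit status is non-zero if no such entry exists.
# Assumption: named group entries look like "group:hp012345:rwx".
acl_group_perms() {
    awk -F: -v g="$1" '
        $1 == "group" && $2 == g { print $3; found = 1; exit }
        END { exit !found }
    '
}
```

With the second gfgetfacl output above as input, acl_group_perms hp012345 prints rwx.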

In the following example, the gfls and gfchmod commands, which are Gfarm-specific commands, are used

to refer to and set the access rights of files and directories, respectively.

■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work

■ drwxr-xr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work

■ [u00000@csgw1 ~]$ gfchmod 775 gfarm:///home/hp000000/u000000/work

■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work

■ drwxrwxr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work

■ [u00000@csgw1 ~]$

Storage usage and allocation

The gfusage command displays the storage usage and the number of files for each user.


■ [u00000@csgw1 ~]$ gfusage

■ # UserName : FileSpace FileNum PhysicalSpace PhysicalNum

■ hpci000000 : 155354084939 32 321203401235 33

■ ----------------------------------------------------------------------

■ TOTAL : 155354084939 32 321203401235 33

■ [u00000@csgw1 ~]$

To check the usage of each project, specify the group ID with the -g option. You can also use the -H option to display values in units based on powers of 10 (1 KByte = 1000 Byte).

■ [u00000@csgw1 ~]$ gfusage -g hp000000 -H ← Displayed in powers of 10 (1 KByte = 1000 Byte)

■ # GroupName : FileSpace FileNum PhysicalSpace PhysicalNum

■ hp000000 : 5.4T 18.0M 11.2T 48.2M

■ ----------------------------------------------------------------------

■ TOTAL : 5.4T 18.0M 11.2T 48.2M

■ [u00000@csgw1 ~]$

The following table describes each item in the gfusage output. For the shared storage, FileSpace is the value checked against the allocated capacity, and FileNum is the value checked against the allocated number of files. Check FileSpace and FileNum when checking the storage usage and the number of files of each project. (PhysicalSpace and PhysicalNum are not used for limits.)

Item           Description
FileSpace      Storage usage. (The quota limit is applied to the FileSpace value.)
               ・In the following example: 100
FileNum        Number of files. (The limit on the number of files is applied to the FileNum value.)
               ※ This is the number of entries in the metadata, which is the sum of files, directories, and symbolic links.
               ・In the following example: 3 = 1 file + 1 directory + 1 symbolic link
PhysicalSpace  Physical usage including replicas.
PhysicalNum    Number of files including replicas. (Directories and symbolic links are not counted.)


In the following example, only one file (100Byte), one directory, and one symbolic link are stored in the

shared storage.

■ [u00000@csgw1 ~]$ gfls -l gfarm:///home/hp000000/u000000

■ drw-r--r-- 1 hpci000000 hp000000 0 Nov 12 13:41 directory

■ -rw-r--r-- 1 hpci000000 hp000000 100 Nov 12 13:41 file

■ lrwxrwxrwx 1 hpci000000 hp000000 0 Nov 12 13:41 symboliclink -> file

■ [u00000@csgw1 ~]$ gfusage

■ # UserName : FileSpace FileNum PhysicalSpace PhysicalNum

■ hpci000000 : 100 3 200 2

■ [u00000@csgw1 ~]$
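The counters in this example follow directly from the counting rules in the table. The following is a minimal sketch of that arithmetic, assuming every regular file has the same number of replicas (two in this example). The function name and its parameters are hypothetical.

```shell
# Compute the gfusage counters expected under the counting rules above.
#   $1: total size of regular files in bytes
#   $2: number of regular files
#   $3: number of directories
#   $4: number of symbolic links
#   $5: replicas per file (assumed to be the same for every file)
# Prints: FileSpace FileNum PhysicalSpace PhysicalNum
expected_usage() {
    echo "$1 $(( $2 + $3 + $4 )) $(( $1 * $5 )) $(( $2 * $5 ))"
}
```

For one 100-byte file, one directory, one symbolic link, and two replicas, expected_usage 100 1 1 1 2 prints "100 3 200 2", matching the gfusage line above.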

The shared storage restricts usage by the allocated capacity and the allocated number of files. Note that if either the capacity or the number of files exceeds its limit, you will not be able to write any files. To check the allocated capacity and the allocated number of files of your project, execute the gfquota command with the -g and -H options. The "FileSpaceHardLimit" and "FileNumHardLimit" fields show the allocated capacity and the allocated number of files, respectively.

■ [u00000@csgw1 ~]$ gfquota -g hp000000 -H

■ GroupName : hp000000

■ GracePeriod : disabled

■ FileSpace : 100T ← Storage usage

■ FileSpaceGracePeriod : disabled

■ FileSpaceSoftLimit : disabled

■ FileSpaceHardLimit : 500T ← Allocated storage capacity (limit)

■ FileNum : 1K ← Number of existing files

■ FileNumGracePeriod : disabled

■ FileNumSoftLimit : disabled

■ FileNumHardLimit : 6M ← Number of allocated files (limit)

■ (snip)

■ [u00000@csgw1 ~]$
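The usage and limit values can be turned into a percentage to see how close a project is to its allocation. The following is a minimal sketch, assuming values printed with -H use the suffixes K, M, G, T, P, or E as powers of 10. The function names are hypothetical, and the result is truncated to an integer.

```shell
# Convert a value printed with -H, such as "100T" or "6M", to a plain number.
# Assumption: suffixes are powers of 10 (K = 1000), as described for -H.
human_to_bytes() {
    printf '%s\n' "$1" | awk '
        BEGIN { m["K"] = 1e3; m["M"] = 1e6; m["G"] = 1e9
                m["T"] = 1e12; m["P"] = 1e15; m["E"] = 1e18 }
        {
            unit = substr($1, length($1))
            if (unit in m) printf "%.0f\n", substr($1, 1, length($1) - 1) * m[unit]
            else           printf "%.0f\n", $1
        }'
}

# Print usage ($1) as an integer percentage of the hard limit ($2).
quota_percent() {
    used=$(human_to_bytes "$1")
    limit=$(human_to_bytes "$2")
    awk -v u="$used" -v l="$limit" 'BEGIN { printf "%d\n", u * 100 / l }'
}
```

With the values in the example above, quota_percent 100T 500T prints 20, meaning that 20% of the allocated capacity is in use.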


File Sharing in a Project

In this section, we will show you how to share data stored in the shared storage between users who

belong to the same project.

The mount.hpci command mounts only the shared storage area of the user who runs it, so it does not give access to the shared storage areas of other users.

For each project, a directory gfarm:///home/<Gfarm group ID>/shared is provided for sharing data

among users who belong to the same project.

All users in the project have read, write, and execute permissions on the directory

gfarm:///home/<Gfarm group ID>/shared.

To use the directory gfarm:///home/<Gfarm group ID>/shared, create a symbolic link with the gfln

command as follows.

■ [u00000@csgw1 ~]$ gfln -s gfarm:///home/hp000000/shared \

■ gfarm:///home/hp000000/u000000/shared

■ [u00000@csgw1 ~]$ gfls -l gfarm:///home/hp000000/u000000/shared

■ lrwxrwxrwx 1 hpci000000 hp000000 0 Jun 10 10:22 \

■ gfarm:///home/hp000000/u000000/shared -> gfarm:///home/hp000000/shared

■ [u00000@csgw1 ~]$ gfmkdir gfarm:///home/hp000000/u000000/shared/hpci000000

■ ↑ Create a directory inside the project's shared directory

■ [u00000@csgw1 ~]$ gfls -ld gfarm:///home/hp000000/u000000/shared/*

■ drwxr-xr-x 1 u00000 hp000000 11 Jun 10 10:30 hpci000000

■ drwxr-xr-x 1 kxxxxx hp000000 11 Apr 23 2014 hpci12xxxx

■ [u00000@csgw1 ~]$

Installing the Client Environment

The preceding sections describe how to use HPCI shared storage on the Fugaku login nodes, but you can also install the HPCI shared storage client environment on your own machine. The installation method is described in the "HPCI Shared Storage User Manual Client Installation (HPCI-ST01-002)" on the manual page of the HPCI portal site.

https://www.hpci-office.jp/pages/e_hpci_info_manuals

By installing the shared storage client environment, you can mount HPCI shared storage and use the Gfarm-specific commands introduced in this manual, such as gfls and gfpcopy, on your own machine.

Introduction of TIPS

FAQs and useful techniques are published as TIPS on the Shared Storage page of the HPCI CMS.

Please visit: https://www.hpci-office.jp/info/pages/viewpage.action?pageId=26935639

Troubleshooting

This chapter explains how to deal with problems you may encounter while using the shared storage system.

Introducing the HPCI Helpdesk

If a problem occurs while using the shared storage system, contact the HPCI

Helpdesk.

• HPCI helpdesk:

http://www.hpci-office.jp/pages/e_support

Please attach a log to your help request, as well as screenshots taken at the time the

problem occurred so that the issue can be quickly identified. The following information

will also help us to solve the problem more efficiently. Your cooperation is appreciated.

Report the status of the problem, including the following.

• The command that resulted in the problem (including the execution method and

accurate output at the time the error occurred)

• The time the problem occurred (as accurately as possible)

• The names of the HPCI System Provider and host where the problem occurred (or

as much configuration detail as possible if this cannot be provided)

• The local account used

• The output of the gfarm2fs -V command (shows the shared storage client version)

• The output of the grid-proxy-info command (to check the proxy certificate’s

validity)

In addition, if the shared storage system cannot be mounted, report the results of executing the following commands.

• mount.hpci / umount.hpci

• gfhost -lv

• gfmdhost -l

Alternatively, if the shared storage can be mounted but the files cannot be accessed,

report the results of executing the following commands.

• gfdf

• gfexport

• gfls -l

• gfwhere -al

• gfstat
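
The information requested above can be gathered into a single log file to attach to a help request. The following is a minimal sketch: run_logged and LOG are hypothetical names, and the script deliberately leaves out mount.hpci and umount.hpci because running them changes the mount state.

```shell
#!/bin/sh
# Run one command and append the command line, a timestamp, its output
# (stdout and stderr), and its exit status to the log file named by $LOG.
# run_logged and LOG are hypothetical names used only in this sketch.
LOG=${LOG:-hpci_report.log}

run_logged() {
    {
        printf '### %s\n' "$*"
        printf '### time: %s\n' "$(date '+%Y-%m-%d %H:%M:%S %z')"
        if command -v "$1" >/dev/null 2>&1; then
            status=0
            "$@" 2>&1 || status=$?
            printf '### exit status: %s\n\n' "$status"
        else
            printf '### command not found\n\n'
        fi
    } >>"$LOG"
}

# Client version and proxy certificate validity.
run_logged gfarm2fs -V
run_logged grid-proxy-info
# Storage node and metadata server status (for mount failures).
run_logged gfhost -lv
run_logged gfmdhost -l
# Overall capacity (for file access failures).
run_logged gfdf
# Add the command that failed, and the file-specific checks, for example:
#   run_logged gfls -l test.dat
#   run_logged gfwhere -al test.dat
#   run_logged gfstat test.dat
echo "Report written to $LOG"
```

Commands that are not installed are recorded as "command not found" instead of stopping the script, so the same sketch can be used on a machine with only part of the client environment.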

■ Troubleshooting individual errors

“Mountpoint is not empty” indicating the mountpoint is already in use

■ [u00000@csgw1 ~]$ mount.hpci

■ timeleft : 23:49:19

■ fuse: mountpoint is not empty

■ fuse: if you are sure this is safe, use the 'nonempty' mount option

■ [u00000@csgw1 ~]$

If you see the above message, it is likely that either the shared storage system has

already been mounted and is in use or a file exists at that location. Check to make

sure the mountpoint is empty, then try to mount the shared storage system again.

“No write access to mountpoint” and nothing can be written

■ fusermount: user has no write access to mountpoint /volumeX/home/hp000000/u00000/gfarm/hp000000/u000000

Write access to the mountpoint is granted according to the settings for the user’s home directory when the mount.hpci command is used. By default, the directory permissions and owners are set as shown below. If this problem occurs, these settings have probably been changed, so check them and correct them if necessary.

■ permission | owner   | group    | directory name
■ -----------+---------+----------+-------------------------
■ drwxr-xr-x | u000000 | hp000000 | ./gfarm
■ drwxr-xr-x | u000000 | hp000000 | ./gfarm/hp000000
■ drwxr-xr-x | u000000 | hp000000 | ./gfarm/hp000000/u000000

“Transport endpoint is not connected” and the shared storage cannot be accessed

■ libgfarm: [2000058] realpath(/home/hp000000/u000000/gfarm/hp000000/u000000): Transport endpoint is not connected

If you see a message like the one above, a previous mount process may have

terminated abnormally. Execute the umount.hpci command once and try to mount the

shared storage again. The error message “failed to umount” will be shown when you

execute the umount.hpci command, but this is normal.

■ [u00000@csgw1 ~]$ umount.hpci

■ Error: failed to umount GfarmFS on /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$ mount.hpci

■ timeleft : 22:41:46

■ Mount GfarmFS on /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$

Proceed to the next step if this method does not resolve the problem.

Check the shared storage mountpoint.

■ [u00000@csgw1 ~]$ df -H 2>/dev/null | grep $USER

■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$

Using the fusermount command, unmount the mountpoint obtained above. Once this

has been unmounted successfully, remount the shared storage using the mount.hpci

command.

■ [u00000@csgw1 ~]$ fusermount -u /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$ df -H 2>/dev/null | grep $USER ← Check the mount status

■ [u00000@csgw1 ~]$ ← No output if the unmount succeeded

■ [u00000@csgw1 ~]$ mount.hpci ← Remount the shared storage at the mountpoint

■ timeleft : 22:20:36

■ Mount GfarmFS on /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$
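
The recovery steps above can be sketched as a single shell function. This is a sketch only: it assumes that mount.hpci returns a non-zero exit status when mounting fails, which this manual does not state, and remount_hpci and the *_CMD variables are hypothetical names introduced so that the sequence can be dry-run with stand-in commands.

```shell
#!/bin/sh
# Sketch of the remount sequence described above.
# Assumption: mount.hpci exits with a non-zero status when mounting fails.
# remount_hpci and the *_CMD variables are hypothetical names.
UMOUNT_CMD=${UMOUNT_CMD:-umount.hpci}
MOUNT_CMD=${MOUNT_CMD:-mount.hpci}
FUSERMOUNT_CMD=${FUSERMOUNT_CMD:-fusermount}

remount_hpci() {
    mountpoint=$1
    # Step 1: umount.hpci may report "failed to umount"; that is expected here.
    $UMOUNT_CMD >/dev/null 2>&1 || true
    if $MOUNT_CMD; then
        return 0
    fi
    # Step 2: unmount the stale mountpoint directly, then mount again.
    if $FUSERMOUNT_CMD -u "$mountpoint"; then
        $MOUNT_CMD
        return $?
    fi
    echo "Remount failed; contact the HPCI Helpdesk." >&2
    return 1
}

# Usage on the gateway node (mountpoint taken from the df output above):
#   remount_hpci /gfarm/hp000000/u000000
```

The function only reports failure when both the plain remount and the fusermount-based remount fail, which is the point at which the manual directs you to the Helpdesk.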

Contact the HPCI Helpdesk if this has still not resolved the problem.

“Operation not permitted” and the shared storage cannot be mounted

■ fusermount: mount failed: Operation not permitted

If you see this message, the permission required for mounting may not have been granted yet. If the problem persists, contact the HPCI Helpdesk.

“Transport endpoint is not connected” and files cannot be accessed

If you see this error message when trying to access a file in the (mounted) shared storage area, the mount process may have terminated while the shared storage system was mounted. Unmount the storage system as follows and then try to mount it again.

■ [u00000@csgw1 ~]$ umount.hpci

■ Unmount GfarmFS on /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$ mount.hpci

■ timeleft : 21:42:44

■ Mount GfarmFS on /gfarm/hp000000/u000000

■ [u00000@csgw1 ~]$

“Invalid argument” and files cannot be accessed

If you see the above error message when trying to access a file in the (mounted)

shared storage area, it is likely that the proxy certificate has expired. Obtain a new

proxy certificate, according to the instructions in Chapter 2.2, “Proxy Certificate

Issuance Procedure,” of the HPCI Login Manual (HPCI-CA01-001E):

https://www.hpci-office.jp/materials/hpci-ca01-001_e.pdf
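
Before requesting a new certificate, the remaining lifetime of the current proxy certificate can be checked from the shell. The following is a minimal sketch: it relies on the -timeleft option of grid-proxy-info, which prints the remaining lifetime in seconds, and proxy_state is a hypothetical helper that treats an empty, non-numeric, or zero value as "missing or expired".

```shell
#!/bin/sh
# Classify a remaining lifetime in seconds, as printed by
# `grid-proxy-info -timeleft`. proxy_state is a hypothetical helper.
proxy_state() {
    case $1 in
        ''|*[!0-9]*) echo "missing-or-expired" ;;   # empty, negative, or not a number
        0)           echo "missing-or-expired" ;;   # no time left
        *)           echo "valid" ;;
    esac
}

# On the gateway node:
#   proxy_state "$(grid-proxy-info -timeleft 2>/dev/null)"
proxy_state 85759   # 23:49:19 left, as in the mount.hpci example
proxy_state 0
```

If the result is "missing-or-expired", obtain a new proxy certificate as described in the HPCI Login Manual.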

Contact the HPCI Helpdesk if this does not resolve the problem.

“Connection refused” and files cannot be accessed

If you see this error message when trying to access a file in the (mounted) shared

storage area, it is likely that the metadata server has stopped. Check the output of the

gfls command. If you see the following error, contact the HPCI Helpdesk.

■ [u00000@csgw1 ~]$ gfls

■ libgfarm: [1000058] connecting to gfmd at ms-0.r-ccs.riken.jp:601 failed, sleep 1 sec: connection refused

■ (snip)

■ libgfarm: [1000058] connecting to gfmd at ms-0.r-ccs.riken.jp:601 failed, sleep 16 sec: connection refused

■ libgfarm: [1000059] cannot connect to gfmd at ms-0.r-ccs.riken.jp:601, give up: connection refused

■ libgfarm: [1000017] connecting to gfmd at ms-0.r-ccs.riken.jp:601: connection refused

■ gfls: gfarm_initialize: connection refused

■ [u00000@csgw1 ~]$

“Input/Output Error” and files cannot be accessed

If you see this message when trying to access files in the (mounted) shared storage

area, investigate the situation using the gfexport, gfls, or gfwhere commands.

Examples showing the normal outputs of these commands are given below. If you do

not see messages like these, it is likely that an error has occurred.

■ [u00000@csgw1 ~]$ gfexport test.dat

■ [u00000@csgw1 ~]$ ← No output if successful

■ [u00000@csgw1 ~]$ gfls -l test.dat

■ -rw-r--r-- 2 hpci000000 hp000000 104857600 Apr 1 01:23 gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/test.dat

■ [u00000@csgw1 ~]$

■ [u00000@csgw1 ~]$ gfwhere -di test.dat

■ ss-09-0-2.r-ccs.riken.jp

■ gfs53-2.hpci.itc.u-tokyo.ac.jp

■ [u00000@csgw1 ~]$

If either the file size is 0 or the gfwhere -di command outputs a file system node, it is possible that the system has failed. In this case, contact the HPCI Helpdesk.

If the file size is greater than 0 and the gfwhere -di command outputs nothing, all files may have been damaged or lost. In this case, investigate the situation using the gfstat command. An example of the normal output of the gfstat command is as follows.

■ [u00000@csgw1 ~]$ gfstat test.dat

■ File: "gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/work/test.dat"

■ Size: 10485760 Filetype: regular file

■ Mode: (0644) Uid: (hpci000000) Gid: (hp000000)

■ Inode: 117016462 Gen: 1

■ (0000000006F9878E0000000000000001)

■ Links: 1 Ncopy: 2

■ Access: 2014-11-11 16:30:30.210115479 +0900

■ Modify: 2014-11-11 16:30:24.332430836 +0900

■ Change: 2014-11-11 16:30:24.332430836 +0900

■ [u00000@csgw1 ~]$

If you do not see the above status, it is likely that an error has occurred. In this case,

contact the HPCI Helpdesk.

In the gfwhere output above, the hosts where replicas are managed are listed.