GPFS Installation INF-110
Transcript
Page 1: GPFS Installation INF-110

Page 2: Overview

• Plan the installation
Before installing any software, it is important to plan the GPFS installation: choose the hardware, decide which kind of disk connectivity to use (direct-attached or network-attached disks), select the network capabilities (which depend heavily on the disk connectivity), and, perhaps most important, verify that your application can take advantage of GPFS.

• Install the packages
At this point, the GPFS architecture has been defined and the machines have Linux installed. It is now time to install the packages on all the nodes that will be part of the GPFS cluster.

• Create the GPFS cluster
Once the GPFS packages are installed on the nodes, you need to create the GPFS cluster. To create the GPFS cluster, we need a file that contains all of the node host names or IP addresses. Then we use the mmcrcluster command to create the cluster. This command creates cluster data information on all nodes chosen to be part of the GPFS cluster. If a new node needs to be added to an already existing GPFS cluster, the mmaddcluster command can be used.

Page 3: Overview (continued)

• Start GPFS
After the nodeset is created, you should start it before defining the disks. Use the mmstartup command to start the GPFS daemons.

• Disk definition
All disks used by GPFS in a nodeset have to be described in a file, and then this file has to be passed to the mmcrnsd command. This command gives a name to each described disk and ensures that all the nodes included in the nodeset are able to gain access to the disks with their new names.

• Creating the file system
Once the cluster, the nodeset(s), and the disks have been defined, it is time to create the file system. With GPFS, the mmcrfs command is used for that purpose. There are many options that can be set at this time, such as file system auto-mounting, file system block size, data or metadata replication, and so on.

• Mounting the file system
Finally, you have to mount the file system after it has been created. Once the file system has been mounted, it can be used by the nodes for read and write operations. If you set the auto-mount option, your GPFS file system will be mounted automatically when the nodes reboot.

Page 4: Setup GPFS Environments

• Add the GPFS binary directory to the $PATH environment variable on all nodes. Type:
mkdir -p /cfmroot/etc/profile.d

• Create /cfmroot/etc/profile.d/mmfs.sh, which contains:

PATH=$PATH:/usr/lpp/mmfs/bin

MANPATH=$MANPATH:/usr/lpp/mmfs/man

• Type:
chmod 755 /cfmroot/etc/profile.d/mmfs.sh

cfmupdatenode -a

cp /cfmroot/etc/profile.d/mmfs.sh /etc/profile.d

. /etc/profile.d/mmfs.sh

• This way, you will distribute /etc/profile.d/mmfs.sh to all nodes, including its attributes.
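As an optional sanity check (not part of the original lab steps), you can confirm that the profile script and its permissions actually reached every node; the commands below reuse the dsh setup already assumed by this lab:

# Optional check: the script should exist with mode 755 on each node
dsh -a ls -l /etc/profile.d/mmfs.sh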

Page 5: Install GPFS

• The GPFS install and update files are located on the management node in the /lab/gpfs directory.

• Extract updates.
mkdir -p /tmp/gpfs/updates

cp -r /lab/gpfs/* /tmp/gpfs

cd /tmp/gpfs/updates

tar zxvf *update.tar.gz

• Install GPFS and updates on the management node.
cd /tmp/gpfs;rpm -ivh gpfs*rpm

cd /tmp/gpfs/updates;rpm -Uvh gpfs*rpm

• Copy the packages to the compute nodes.
dsh -a mkdir -p /tmp/gpfs/updates

cd /tmp/gpfs

dcp -a *rpm /tmp/gpfs

dcp -a updates/*rpm /tmp/gpfs/updates

• Install GPFS on the compute nodes.
dsh -a 'cd /tmp/gpfs;rpm -ivh gpfs*rpm'

• Install the GPFS updates.
dsh -a 'cd /tmp/gpfs/updates;rpm -Uvh gpfs*rpm'
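As a quick, optional verification (not in the original steps), you can confirm that the same GPFS package levels ended up on every node:

# Optional check: list the installed GPFS RPMs and their versions on all nodes
dsh -a 'rpm -qa | grep gpfs'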

Page 6: Prepare kernel

• Since GPFS code works at the kernel level (as kernel extensions), it depends heavily on the kernel level to run properly. Therefore, you have to build your GPFS open source portability module before building a GPFS cluster, and the kernel source is required for that. You may check the list of supported kernel versions at the following site:

• http://www-1.ibm.com/servers/eserver/clusters/software/gpfs_faq.html

• Lab Note: There are a few patches that should be applied; read the FAQ for details. In this lab we will not apply the patches, to save time.

• Create a link to the kernel source.
cd /usr/src

ln -s linux-2.4 linux

• Clean up the tree.
cd /usr/src/linux

make mrproper

Page 7: Prepare kernel (continued)

• Check the content of the VERSION, PATCHLEVEL, SUBLEVEL, and EXTRAVERSION variables in the /usr/src/linux/Makefile file to match the release version of your kernel.

Use uname -r to check your version, e.g. 2.4.21-27.ELsmp

• Edit Makefile

VERSION = 2

PATCHLEVEL = 4

SUBLEVEL = 21

EXTRAVERSION = -27.ELsmp

• Copy the kernel configuration file.
cp configs/kernel-2.4.21-i686-smp.config .config

• Type:
make oldconfig

make dep
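As an optional cross-check (not in the original lab), you can print the Makefile version variables next to the running kernel to confirm they match:

# Optional check: these values combined should equal the output of uname -r
grep -E '^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)' /usr/src/linux/Makefile
uname -r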

Page 8: Build the GPFS open source portability layer

• You have to build the GPFS open source portability layer manually on one node (in our case, the management node), then copy the resulting binaries to all nodes.

• Below are the steps to build the GPFS open source portability layer. Also check the /usr/lpp/mmfs/src/README file for more up-to-date information on building it:

export SHARKCLONEROOT=/usr/lpp/mmfs/src

cd /usr/lpp/mmfs/src/config

cp site.mcr.proto site.mcr

Edit the /usr/lpp/mmfs/src/config/site.mcr file. The following sections need to be checked:

• /* $Id: site.mcr.proto,v 1.442.2.5 2004/06/07 15:45:28 gjertsen Exp $ */

• ........

• /* Linux distribution (select/uncomment only one) */

• /* LINUX_DISTRIBUTION = REDHAT_LINUX */

• LINUX_DISTRIBUTION = REDHAT_AS_LINUX

• ........

• /* #define LINUX_DISTRIBUTION_LEVEL 80 */

• ........

• /* Linux kernel versions supported for each architecture */

• #define LINUX_KERNEL_VERSION 2042127

cd ..

make World
make InstallImages

Page 9: Distribute the GPFS portability layer

• Copy the binaries built above to the /cfmroot/usr/lpp/mmfs/bin directory and distribute them to all nodes using the cfmupdatenode command or your own scripts:

mkdir -p /cfmroot/usr/lpp/mmfs/bin
cd /usr/lpp/mmfs/bin

cp mmfslinux lxtrace tracedev dumpconv /cfmroot/usr/lpp/mmfs/bin

cfmupdatenode -a
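As an optional check (not part of the original steps), confirm that the portability layer binaries are present on every node after the cfmupdatenode run:

# Optional check: the copied binaries should now exist under /usr/lpp/mmfs/bin on each node
dsh -a ls -l /usr/lpp/mmfs/bin/mmfslinux /usr/lpp/mmfs/bin/lxtrace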

Page 10: Creating the GPFS nodes descriptor file

• ssh to node1. All GPFS commands should be run from nodes that will be running GPFS. The management node will NOT be running GPFS.

ssh node1

• When creating your GPFS cluster, you need to provide a file containing a list of node descriptors, one per line for each node to be included in the cluster, including the storage nodes. Each descriptor must be specified in the form:

NodeName:NodeDesignations

• where:

NodeName
The host name or IP address of the node, used for GPFS daemon-to-daemon communication.

NodeDesignations
An optional, '-'-separated list of node roles. Roles include:
manager|client
quorum|nonquorum

Page 11: Creating the GPFS nodes descriptor file

• Create a file /tmp/gpfs.allnodes with a list of your nodes and their roles. Ensure there is at least one node with quorum and manager roles defined. For example:

node1:manager-quorum

node2:manager-quorum

node3:quorum

node4:

• The above file signifies that we have four nodes in our GPFS cluster.

Node1 has configuration manager and quorum roles.

Node2 has configuration manager and quorum roles.

Node3 has the quorum role.

Node4 is using the defaults of non-quorum and client roles.

Page 12: Defining the GPFS cluster

• Run the mmcrcluster command to define the GPFS cluster.

• Define node1 as the primary and node2 as the secondary server (for the GPFS cluster data), with ssh as the remote shell command and scp as the remote file copy command.

• For example:
mmcrcluster -p node1 -s node2 -n /tmp/gpfs.allnodes -r /usr/bin/ssh -R /usr/bin/scp

Tue Aug 10 14:00:46 CDT 2004: mmcrcluster: Processing node node1.cluster.net

Tue Aug 10 14:00:48 CDT 2004: mmcrcluster: Processing node node2.cluster.net

Tue Aug 10 14:00:49 CDT 2004: mmcrcluster: Processing node node3.cluster.net

Tue Aug 10 14:00:50 CDT 2004: mmcrcluster: Processing node node4.cluster.net

Tue Aug 10 14:00:55 CDT 2004: mmcrcluster: Initializing needed RSCT subsystems.

mmcrcluster: Command successfully completed

• After creating the cluster definitions, you can view them using the mmlscluster command. Type:
mmlscluster

Page 13: Starting GPFS

• After creating the GPFS cluster, you can start the GPFS services on every node in the cluster by issuing the mmstartup command with the -a parameter. The -a parameter will start GPFS on all nodes in the cluster.

• Type:

mmstartup -a

Note: To shut down GPFS, type mmshutdown -a (do not type it now).
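If your GPFS level provides the mmgetstate command (an assumption; it is not mentioned in this lab), it offers a quick way to confirm that the daemons are active on every node:

# Optional check: show the GPFS daemon state on all nodes (output varies by GPFS level)
mmgetstate -a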

Page 14: Prepare Disks (Skip)

• For Fibre Channel disks, create arrays, LUNs, and mappings.
• Use the CSM and GPFS Redbook as a guide.

• Use GPFS documentation

• Use DS4xxx (FAStT) documentation

• Lab Note: We were unable to obtain DS4xxx controllers and disks.
• For each disk to be used for GPFS on each node, use fdisk to remove any partitions. NOTE: We will be using disk /dev/hdc on node1-node4.

For Example:

ssh node1

fdisk /dev/hdc

Use m for a list of commands; p displays the partition table, d deletes a partition, and w writes the changes.

Page 15: Disk definitions

• A GPFS cluster with NSD network attached servers means that all access to the disks, and all replication, goes through one or two storage attached servers (also known as storage nodes). If your cluster has an internal network segment, this segment will be used for this purpose.

• If a disk is defined with only one storage attached server and that server fails, the disk becomes unavailable to GPFS. If the disk is defined with two NSD network attached servers, GPFS automatically transfers the I/O requests to the backup server.

• Lab Note: We were unable to provide Fibre Channel storage, so you will be unable to define two paths to the storage.

• Lab Note: The four nodes in your cluster (e.g. node1 - node4) each contain a single 40GB drive (/dev/hdc). You will use this as your GPFS storage.

Page 16: Creating Network Shared Disks (NSDs)

• You will need to create a descriptor file before creating your NSDs. This file should contain information about each disk that will be an NSD, and should have the following syntax:

DeviceName:PrimaryNSDServer:SecondaryNSDServer:DiskUsage:FailureGroup

DeviceName
The real device name of the external storage partition (such as /dev/hdc).

PrimaryServer
The host name of the server that the disk is attached to. Remember that you must always use the node names defined in the cluster definitions.

SecondaryServer
The server where the secondary disk attachment is connected.

DiskUsage
The kind of information that should be stored on this disk. The valid values are data, metadata, and dataAndMetadata (default).

FailureGroup
An integer value (0 to 4000) that identifies the failure group to which this disk belongs. All disks with a common point of failure must belong to the same failure group. The value -1 indicates that the disk has no common point of failure with any other disk in the file system. GPFS uses the failure group information to ensure that no two replicas of data or metadata are placed in the same group and thereby become unavailable due to a single failure. When this field is not specified, GPFS automatically assigns each disk a failure group (higher than 4000).

Page 17: Creating Network Shared Disks (NSDs)

• Create a new file /tmp/descfile, e.g.:

/dev/hdc:node1::dataAndMetadata:-1

/dev/hdc:node2::dataAndMetadata:-1

/dev/hdc:node3::dataAndMetadata:-1

/dev/hdc:node4::dataAndMetadata:-1

• Now create the Network Shared Disks by using the mmcrnsd command:
mmcrnsd -F /tmp/descfile -v no

• After the NSDs for the GPFS cluster are successfully created, mmcrnsd comments out each original disk device line and puts the GPFS-assigned global name for that disk on the following line. Use cat to see the changes:
cat /tmp/descfile

• You can see the new device names by using the mmlsnsd command.
mmlsnsd
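For illustration only (the exact NSD names and field layout depend on your GPFS level, so treat this as a sketch rather than real output), a rewritten descriptor entry typically looks something like:

# /dev/hdc:node1::dataAndMetadata:-1
gpfs1nsd:::dataAndMetadata:-1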

Page 18: Creating the GPFS file system

• Once you have your NSDs ready, you can create the GPFS file system. In order to create the file system, you will use the mmcrfs command, where you must define the following attributes in this order:
– The mount point.
– The name of the device for the file system.
– The descriptor file (-F).

• Type (each option is annotated in the sketch at the end of this page):
mmcrfs /gpfs1 /dev/gpfs1 -F /tmp/descfile -A yes -B 256K -n 4 -v no

• Validate with mmlsdisk

mmlsdisk gpfs1

• Mount the file systems: exit node1 and type from the mgmt1 node:
dsh -a mount -a

• Validate with df. You should have a single 156GB file system, spanning 4 disks in 4 nodes, available to all nodes.
dsh -a df

• Please review the CSM and GPFS Redbook and the GPFS documentation for a list of administrative functions.
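For reference, here is the same mmcrfs invocation with each option annotated (a sketch based on the standard mmcrfs options; check the mmcrfs man page for your GPFS level):

# -F /tmp/descfile : the NSD descriptor file rewritten by mmcrnsd
# -A yes           : mount the file system automatically when the GPFS daemon starts
# -B 256K          : file system block size
# -n 4             : estimated number of nodes that will mount the file system
# -v no            : do not verify that the disks are free of existing file system data
mmcrfs /gpfs1 /dev/gpfs1 -F /tmp/descfile -A yes -B 256K -n 4 -v no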

Page 19: Removing GPFS (Skip)

• It is often desirable to completely remove GPFS and start over. The most common cause is SSH or DNS setup issues that cause distributed GPFS commands to fail. Cleanup can be difficult.

• Remove GPFS from management node.

rpm -e gpfs.base gpfs.docs gpfs.gpl gpfs.msg.en_US

rm -rf /var/mmfs

• Remove GPFS from all nodes.

dsh -a 'rpm -e gpfs.base gpfs.docs gpfs.gpl gpfs.msg.en_US'

dsh -a rm -rf /var/mmfs

• Do not remove any SRC or RSCT components.

Page 20: Authentication

• HPC clusters require a global authentication solution enabling all nodes to view all users with the same properties.

• Many authentication solutions exist. The most common are:
– NIS
– LDAP
– File synchronization

• File synchronization is the most popular with HPC clusters and is the most scalable solution for very large clusters. It is also easy to set up.

• Create a cluster user on your management node:
useradd (username)

For example:
useradd bob

Page 21: Authentication (continued)

• Back up the existing /etc/passwd and /etc/group files first. If for any reason /etc/passwd gets corrupted, you will be unable to log in even as root, and a reboot to single-user mode will be required to recover the backup.
dsh -a cp /etc/passwd /etc/passwd.SAVE

dsh -a cp /etc/group /etc/group.SAVE

• Push the /etc/passwd and /etc/group files to all nodes; one way to do this is sketched below.
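A minimal sketch of one way to do this, assuming the CSM CFM mechanism already used earlier in this lab (/cfmroot plus cfmupdatenode); adapt it if your lab distributes the files differently:

# Assumption: CFM copies files placed under /cfmroot to the same path on every node
cp /etc/passwd /etc/group /cfmroot/etc/
cfmupdatenode -a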

• Verify:
dsh -a grep (username) /etc/passwd

For example:
dsh -a grep bob /etc/passwd (check the output)

• Each time a new user is added, a node is added, or a node is reinstalled, run cfmupdatenode -a again.

• Generate SSH keys for each cluster user (root is NOT a cluster user). Clusters using rsh may need to create a .rhosts file per user.

Page 22: File Systems

• Like authentication, HPC clusters also require a global file system solution enabling all nodes to view the same files with the same properties.

• There are many solutions available. The most common are:
– NFS
– GPFS

GPFS is usually not required for user, application, and library directories. GPFS is best suited for data directories.

• In this lab we will create two global namespaces:
NFS: /home for user applications
NFS: /usr/local for system applications and libraries

• To set up NFS you must first export the /home and /usr/local file systems from your management node. Append the following lines to your /etc/exports file:

/home *(rw,no_root_squash,sync)

/usr/local *(rw,no_root_squash,sync)

• Restart NFS.

service nfs restart
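The compute nodes are assumed to mount these exports through the lab's existing configuration (for example, fstab entries pushed by CSM). If they do not, a minimal manual mount from the management node might look like this; mgmt1 is the management node name used elsewhere in this lab:

# Assumption: mgmt1 is resolvable from the compute nodes and the mount points already exist
dsh -a 'mount mgmt1:/home /home'
dsh -a 'mount mgmt1:/usr/local /usr/local'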

Page 23: File Systems (continued)

• Verify:

dsh -a ls -l /home | grep (user name you added, e.g. bob)

For example:
dsh -a ls -l /home | grep bob

• This verification checks that both the file systems and authentication are working properly. Your dsh output should have listed the /home/username directory for your cluster user AND the user should have owned the directory, e.g.

node1: drwx------ 5 bob bob 4096 Mar 24 05:01 bob
node2: drwx------ 5 bob bob 4096 Mar 24 05:01 bob

node3: drwx------ 5 bob bob 4096 Mar 24 05:01 bob

node4: drwx------ 5 bob bob 4096 Mar 24 05:01 bob

• Also verify that /usr/local was mounted.
dsh -a df | grep /usr/local

Page 24: MPICH-IP

• MPICH is a freely available, portable implementation of MPI, the standard for message-passing libraries, that runs over IP.

• MPICH URL http://www-unix.mcs.anl.gov/mpi/mpich

• Install MPICH for the GNU compiler.
mkdir -p /tmp/mpi

cp /lab/hpc/mpich*tar.gz /tmp/mpi
cp /lab/hpc/mpimaker /tmp/mpi

export MPICHROOT=/usr/local/mpich

cd /tmp/mpi

./mpimaker mpich-1.2.7 up gnu ssh

• A successful build should return:

mpimaker: 1.2.5.2 up gnu ssh build start
mpimaker: 1.2.5.2 up gnu ssh make
mpimaker: 1.2.5.2 up gnu ssh build successful

MPICH installed in /usr/local/mpich/1.2.5.2/ip/up/gnu/ssh

• Please check config.cmd, make.log, install.log, and configure.log in /usr/local/mpich/1.2.5.2/ip/up/gnu/ssh for errors. config.cmd was the command used to build MPICH.

• If the build failed, check the files config.cmd, make.log, install.log, and configure.log in /tmp/hpc/mpich-1.2.5.2.

Page 25: mpiiotest

• mpiiotest is a simple utility to test parallel file systems.

• su to the user you created earlier.

su - (user name you added, e.g. bob)

For example:
su - bob

• Copy the mpiiotest to the user’s home directory

mkdir ~/bench/
cp /lab/hpc/mpiiotest.tgz ~/bench/

cd ~/bench/

tar zxvf mpiiotest.tgz

• Build mpiiotest:
export MPICH=/usr/local/mpich/1.2.7/ip/i686/up/gnu/ssh

export PATH=$MPICH/bin:$PATH

cd ~/bench/

make clean

make

Page 26: mpiiotest (continued)

• Set up the user's environment:

ssh node1

cd ~/bench

export MPICH=/usr/local/mpich/1.2.7/ip/i686/up/gnu/ssh

export PATH=$MPICH/bin:$PATH

• Create a file "machinefile" with one entry per node, e.g.:
node1

node2

node3

node4

• Open another xterm on your workstation machine as root and type:
xhost +

• Type the following as root on your management node:

dsh -a chmod 777 /gpfs1

• Type on one line:
mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 64 --display mgmt1:0 -g 1000x30

Page 27: mpiiotest (continued)

• First, mpiiotest creates the file in parallel. Each red band represents the status of the current process's write progress. When the bar is fully red, the file has been written.

• Next, mpiiotest reads the created file. Each blue band represents the status of the current process's read progress. When the bar is fully blue, the file has been read.

• Exit back to the mgmt1 node.

Page 28: mpiiotest (continued)

• The performance of any file system is affected by the block size used by that file system versus the block size that the application is using. Since the GPFS file system was set up with a 256K block size, the optimal block size for this test should be 256K. Test this by trying a few different block sizes, recording the total read and write performance for each run.

• Type each command on one line:
mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 128 --display mgmt1:0 -g 1000x30

mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 256 --display mgmt1:0 -g 1000x30

mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 512 --display mgmt1:0 -g 1000x30

Page 29: Modify GPFS Block Size

• If time permits, modify the block size of the GPFS file system and rerun the mpiiotest benchmark with the same three block sizes used above.

• Follow the steps above to remove GPFS (page 20)

• Follow the steps above to reinstall GPFS (page 6)

• Follow the steps above to reconfigure GPFS (pages 10-19). On page 19, modify the mmcrfs command to read as follows:
mmcrfs /gpfs1 /dev/gpfs1 -F /tmp/descfile -A yes -B 64K -n 4 -v no

• Type the following as root on your management node:

dsh -a chmod 777 /gpfs1

• Follow the steps above to rerun the mpiiotest benchmark with the three block sizes (pages 27-30).