
    On evaluating GPFS

Research work that has been done at HLRS by Alejandro Calderon


On evaluating GPFS

Short description

Metadata evaluation: fdtree

Bandwidth evaluation: Bonnie, Iozone, IODD, IOP


GPFS description (http://www.ncsa.uiuc.edu/UserInfo/Data/filesystems/index.html)

General Parallel File System (GPFS) is a parallel file system package developed by IBM.

History: originally developed for IBM's AIX operating system, then ported to Linux systems.

Features: appears to work just like a traditional UNIX file system from the user application level.

Provides additional functionality and enhanced performance when accessed via parallel interfaces such as MPI-I/O.

High performance is obtained by GPFS by striping data across multiple nodes and disks. Striping is performed automatically at the block level; therefore, all files larger than the designated block size will be striped. It can be deployed in NSD or SAN configurations.

Clusters hosting a GPFS file system can allow other clusters at different geographical locations to mount that file system.
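Since striping happens at the block level, the configured block size determines which files actually get striped. A minimal sketch (not from the slides) for inspecting a GPFS deployment from a client node, assuming the standard GPFS administration commands are installed; the file system device name gpfs0 is an assumption:

# GPFS administration commands normally live here and may need admin rights.
export PATH=$PATH:/usr/lpp/mmfs/bin

mmlscluster        # nodes that form the GPFS cluster
mmlsnsd            # Network Shared Disks (NSDs) backing the file systems
mmlsfs gpfs0 -B    # block size of the (assumed) file system device gpfs0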


    GPFS (Simple NSD Configuration)


GPFS evaluation (metadata)

fdtree: used for testing the metadata performance of a file system.

Creates several directories and files, in several levels.

Used on:

Computers: noco-xyz

Storage systems: local, GPFS


fdtree [local, NFS, GPFS]

./fdtree.bash -f 3 -d 5 -o X     (X = /gpfs, /tmp, /mscratch)

(Chart: directory creates, file creates, file removals and directory removals per second, 0-2500 operations/sec., for /gpfs, /tmp and /mscratch.)
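A minimal sketch (assumed, not from the slides) of how the comparison above can be driven from one script; the per-target test subdirectory is hypothetical:

#!/bin/bash
# Run the same fdtree workload (3 files and 5 directories per level) against
# each storage target and keep one log per file system.
for target in /gpfs /tmp /mscratch; do
    echo "=== fdtree on $target ==="
    mkdir -p "$target/fdtree-test"
    ./fdtree.bash -f 3 -d 5 -o "$target/fdtree-test" | tee "fdtree-$(basename "$target").log"
done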


fdtree on GPFS (Scenario 1)

ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...

Scenario 1: several nodes, several processes per node, different subtrees, many small files.

(Diagram: processes P1 .. Pm running on each node nodex.)
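A minimal launch sketch for Scenario 1 (assumed; node names, process count and paths are hypothetical): every node starts several fdtree processes, each inside its own GPFS subtree:

#!/bin/bash
NODES="noco001 noco002 noco003 noco004"   # hypothetical node names
PROCS_PER_NODE=4

for node in $NODES; do
    for p in $(seq 1 $PROCS_PER_NODE); do
        # each process works in a different subtree of the GPFS workspace;
        # fdtree.bash is assumed to be in the remote home directory
        ssh "$node" "mkdir -p /gpfs/fdtree/$node-$p && \
                     ./fdtree.bash -f 3 -d 5 -o /gpfs/fdtree/$node-$p" &
    done
done
wait   # wait until every remote fdtree run has finished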


fdtree on GPFS (Scenario 1)

ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...

(Chart: operations/sec., 0-600, for the configurations 1n-1p, 4n-4p, 4n-8p, 4n-16p, 8n-8p and 8n-16p; series: directory creates, file creates, file removals and directory removals per second.)


fdtree on GPFS (Scenario 2)

ssh {x,...} fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs...

Scenario 2: several nodes, one process per node, same subtree, many small files.

(Diagram: processes P1 .. Px, one per node nodex.)
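The same kind of launch sketch for Scenario 2 (assumed; node names and paths are hypothetical), now with one process per node and all of them working inside the same GPFS subtree:

#!/bin/bash
NODES="noco001 noco002 noco003 noco004"   # hypothetical node names
mkdir -p /gpfs/fdtree/shared              # one shared subtree for every process

for node in $NODES; do
    # one fdtree process per node, all pointing at the SAME directory
    ssh "$node" "./fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs/fdtree/shared" &
done
wait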


fdtree on GPFS (Scenario 2)

ssh {x,...} fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs...

(Chart: file creates per second, 0-45, vs number of processes (1 per node): 1, 2, 4, 8; series: working in the same directory vs working in different directories.)


Metadata cache on GPFS client

hpc13782 noco186.nec 304$ time ls -als | wc -l
894
real 0m0.466s  user 0m0.010s  sys 0m0.052s

Working in a GPFS directory with 894 entries. ls -als needs to get each file's attributes from the GPFS metadata server. After a couple of seconds, the contents of the cache seem to disappear.

hpc13782 noco186.nec 305$ time ls -als | wc -l
894
real 0m0.222s  user 0m0.011s  sys 0m0.064s

hpc13782 noco186.nec 306$ time ls -als | wc -l
894
real 0m0.033s  user 0m0.009s  sys 0m0.025s

hpc13782 noco186.nec 307$ time ls -als | wc -l
894
real 0m0.034s  user 0m0.010s  sys 0m0.024s
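A minimal sketch (assumed; the directory path is hypothetical) to reproduce the cache-warming effect shown above: time the same listing repeatedly, so the first run pays the metadata-server cost and later runs are served from the client cache:

# Repeat the listing a few times in a directory with many entries on GPFS.
cd /gpfs/some/large/directory    # hypothetical directory with many entries
for i in 1 2 3 4; do
    time ls -als | wc -l
done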


fdtree results

Main conclusions

Contention at directory level: if two or more processes from a parallel application need to write data, make sure each one uses a different subdirectory of the GPFS workspace (see the sketch below).

Better results than NFS (but lower than the local file system).
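A minimal sketch (assumed) of that recommendation: each process derives a private subdirectory from its host name and PID before writing anything:

# Give every process of a parallel job its own GPFS subdirectory, so no two
# processes create files in the same directory.
WORKDIR=/gpfs/workspace/$(hostname)-$$    # hypothetical workspace path
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# ... the process writes its output files here ...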


GPFS performance (bandwidth)

Bonnie: reads and writes a 2 GB file (write, rewrite and read).

Used on:

Computers: cacau1, noco075

Storage systems: GPFS
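A minimal invocation sketch (assumed; the test directory is hypothetical, and the flags are the classic Bonnie/bonnie++ ones: -d test directory, -s file size in MB, -m label for the result line):

# Write, rewrite and read a 2 GB file in a directory on the GPFS mount.
# The binary may be called bonnie or bonnie++ depending on the installed variant.
mkdir -p /gpfs/bonnie-test
bonnie -d /gpfs/bonnie-test -s 2048 -m noco075-GPFS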


Bonnie on GPFS [write + re-write]

Bandwidth (MB/sec.):
              cacau1-GPFS   noco075-GPFS
  write       51.86         164.69
  rewrite     3.43          36.35

GPFS over NFS


Bonnie on GPFS [read]

Bandwidth (MB/sec.):
              cacau1-GPFS   noco075-GPFS
  read        75.85         232.38

GPFS over NFS


GPFS performance (bandwidth)

Iozone: writes and reads with several file sizes and access sizes; measures write and read bandwidth.

Used on:

Computers: noco075

Storage systems: GPFS
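A minimal invocation sketch (assumed; the temporary file path is hypothetical) for an automatic-mode sweep like the one plotted on the next slides:

# Sweep file sizes from 64 KB up to 512 MB with all record sizes, running only
# the write/rewrite (-i 0) and read/reread (-i 1) tests on a GPFS-resident file,
# and keep an Excel-style report of the results.
iozone -a -n 64k -g 512m -i 0 -i 1 \
       -f /gpfs/iozone.tmp -R -b iozone-gpfs.xls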


Iozone on GPFS [write]

(Surface plot "Write on GPFS": bandwidth (MB/s), 0-1200, as a function of file size (64 KB to 524288 KB) and record length (4 to 16384).)


Iozone on GPFS [read]

(Surface plot "Read on GPFS": bandwidth (MB/s), 0-2500, as a function of file size (64 KB to 524288 KB) and record length (4 to 16384).)



GPFS evaluation (bandwidth)

IODD: evaluation of disk performance by using several nodes (disk and networking).

A dd-like command that can be run from MPI.

Used on: 2 and 4 nodes; 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node) that write a file of 1, 2, 4, 8, 16, and 32 GB.

Uses both the POSIX interface and the MPI-IO interface. A launch sketch follows below.

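A minimal launch sketch (the iodd command-line options below are assumptions, they are not documented in the slides; node names and paths are hypothetical): a plain dd run as a single-process baseline, then 8 MPI processes on 2 nodes started through Open MPI's mpirun:

#!/bin/bash
mkdir -p /gpfs/iodd-test

# Single-process baseline with plain dd: write 2 GB to GPFS in 1 MB blocks.
dd if=/dev/zero of=/gpfs/iodd-test/baseline.dat bs=1M count=2048

# hosts.txt lists the two nodes, e.g. "noco001 slots=4" and "noco002 slots=4".
# The ./iodd options (--output, --size, --api) are hypothetical.
mpirun -np 8 --hostfile hosts.txt \
       ./iodd --output /gpfs/iodd-test/iodd.dat --size 2G --api mpiio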


IODD on 2 nodes [MPI-IO]

(Surface plot "GPFS (writing, 2 nodes)": bandwidth (MB/sec.), 0-180, as a function of processes per node and file size (GB).)


IODD on 4 nodes [MPI-IO]

(Surface plot "GPFS (writing, 4 nodes)": bandwidth (MB/sec.), 0-180, as a function of processes per node and file size (GB).)


Differences by using different APIs

(Two surface plots: GPFS write bandwidth on 2 nodes as a function of processes per node and file size (GB), one for GPFS (2 nodes, POSIX) and one for GPFS (2 nodes, MPI-IO); bandwidth scales of 0-180 MB/sec. and 0-70 MB/sec.)


IODD on 2 GB [MPI-IO, same directory]

(Chart "GPFS (writing, 1-32 nodes, same directory)": bandwidth (MB/sec.), 0-160, vs number of nodes: 1, 2, 4, 8, 16, 32.)


IODD on 2 GB [MPI-IO, different directories]

(Chart "GPFS (writing, 1-32 nodes, different directories)": bandwidth (MB/sec.), 0-160, vs number of nodes: 1, 2, 4, 8, 16, 32.)



IODD results

Main conclusions

The bandwidth decreases with the number of processes per node. Beware of multithreaded applications with medium-to-high I/O bandwidth requirements for each thread.

It is very important to use MPI-IO, because this API lets users get more bandwidth.

The bandwidth also decreases with more than 4 nodes.

With large files, the metadata management seems not to be the main bottleneck.



GPFS evaluation (bandwidth)

IOP: gets the bandwidth obtained by writing and reading in parallel from several processes.

The file size is divided by the number of processes, so each process works on an independent part of the file.

Used on: GPFS through MPI-IO (ROMIO on Open MPI).

Two nodes writing a 2 GB file in parallel:

On independent files (non-shared)

On the same file (shared)



How IOP works

Example: 2 nodes, m = 2 processes (1 per node), n = 2 GB file size.

File per process (non-shared): each process writes its own file (P1: "a a ..", P2: "b b ..", Pm: "x x ..").

Segmented access (shared): P1, P2, ..., Pm write to one shared file of size n ("a b .. x a b .. x a b .. x"), each process working on an independent part of it.
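The segmented (shared) pattern can be emulated outside MPI with plain dd. A minimal sketch (assumed; this is not the IOP tool itself) in which each of m workers writes its own contiguous 1 GB slice of one 2 GB shared file:

#!/bin/bash
# Each worker i writes the contiguous byte range [i*n/m, (i+1)*n/m) of the
# shared file, here with m = 2 workers and n = 2 GB.
FILE=/gpfs/iop-test/shared.dat    # hypothetical shared file
M=2                               # number of workers
SEG_MB=1024                       # n/m = 1 GB per worker, expressed in 1 MB blocks
mkdir -p "$(dirname "$FILE")"

for i in $(seq 0 $((M - 1))); do
    # seek= skips i*SEG_MB output blocks; conv=notrunc keeps the other segments
    dd if=/dev/zero of="$FILE" bs=1M count=$SEG_MB \
       seek=$((i * SEG_MB)) conv=notrunc &
done
wait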


IOP: Differences by using shared/non-shared

Writing on file(s) over GPFS

(Chart: bandwidth (MB/sec.), 0-180, vs access size (1 KB to 1 MB); series: NON-shared and shared.)



IOP: Differences by using shared/non-shared

Reading on file(s) over GPFS

(Chart: bandwidth (MB/sec.), 0-200, vs access size (1 KB to 1 MB); series: NON-shared and shared.)


IOP results

GPFS writing in shared file: the 128 KB magic number

(Chart: bandwidth (MB/sec.), 0-140, vs access size (1 KB to 1 MB); series: write, read, Rread, Bread.)


IOP results

Main conclusions

If several processes try to write to the same file, even on independent areas, the performance decreases.

With several independent files, results are similar across several tests, but with a shared file they are more irregular.

A magic number appears: 128 KB. It seems that at that point the internal algorithm changes and the bandwidth increases.