Technology Testing at CSCS including BeeGFS Preliminary Results Hussein N. Harake CSCS-ETHZ

Jun 05, 2018

Page 1

Technology Testing at CSCS including BeeGFS Preliminary Results

Hussein N. Harake
CSCS-ETHZ

Page 2

Agenda

© CSCS 2016 2

• About CSCS
• About the Systems Integration (SI) Unit
• Technology Overview
  • DDN IME
  • DDN WOS
  • OpenStack
• BeeGFS Case Study
  • What is BeeGFS?
  • Test System Layout
  • Tuning
  • Monitoring
  • Benchmark tools
  • Results
  • Next Steps
• Q&A

Page 3

CSCS (Swiss National Supercomputing Centre)

• Founded in 1991
• Enables world-class research with a scientific user lab
• Available to domestic and international researchers through a transparent, peer-reviewed allocation process
• Open to academia and also available to users from industry and the business sector
• Operated by ETH Zurich and located in Lugano

Page 4

24 years of supercomputers at CSCS

Year     System     Performance  Name
1991     NEC SX3    5.5 GF       Adula
1996     NEC SX4    10 GF        Gottardo
1999     NEC SX5    64 GF        Prometeo
2002     IBM SP4    1.3 TF       Venus
2005     Cray XT3   5.8 TF       Palu
2006     IBM P5     4.5 TF       Blanc
2009-12  Cray XE6   402 TF       Monte Rosa
2012-13  Cray XC30  7.7 PF       Piz Daint
2014     XC30       1.25 PF      Piz Daint extension

Page 5

Data Centre

- 2000 sq.m machine room
- 20 MW of power and cooling capacity
- Lake water cooling
  - 700 liters/s

Page 6

Overview of Systems Integration (SI) Unit

Unit missions:

- Managing projects

- Relations with Vendors

- Evaluating Technologies

- Software deployments

Page 7

Technology Overview – DDN IME


Image courtesy of DDN

Page 8

Technology Overview – DDN WOS (1)

Image courtesy of DDN

Page 9

Technology Overview – DDN WOS (2)

Page 10

Technology Overview – DDN WOS (3)

Page 11

Technology Overview - OpenStack


Image source: https://www.openstack.org/software/

Page 12

BeeGFS Case Study

Page 13

What is BeeGFS?

• Parallel filesystem
• HPC oriented
• Formerly called FhGFS
• Alternative to Lustre and GPFS
• Developed by Fraunhofer
• Open source
• Support delivered by ThinkParQ

Image courtesy of BeeGFS

Page 14

Basic Features of BeeGFS

• Supports failover for data and metadata using applications like Pacemaker and Heartbeat

• Replication failover mechanism

• Supports multiple data and metadata servers and targets

• Supports quota

• Uses Robinhood to scan the entire filesystem

• BeeGFS on demand filesystem (BeeOND)

• Easy to deploy and manage

Page 15

BeeOND

- Creates a filesystem on demand

- Uses the hard drives / SSDs on every compute node

- The filesystem is created by submitting a job to the scheduler
  - We are working on confirming SLURM support

- Memory could be used instead of SSDs

- We used 20 SSDs on 20 nodes for our tests
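The job-driven creation described above could look roughly like the following dry run, which only prints the commands instead of executing them. The paths and helper function names are hypothetical, and the `beeond start` and `beeond stop` options shown (`-n` node file, `-d` storage directory, `-c` client mount point, `-L` and `-d` for cleanup on stop) are assumptions that should be checked against the installed BeeOND version.

```shell
#!/usr/bin/env bash
# Hypothetical paths: adjust to the local SSD directory and the desired mount point.
NODEFILE="${NODEFILE:-/tmp/beeond_nodes.txt}"
STORE_DIR="${STORE_DIR:-/data/beeond}"
MOUNT_DIR="${MOUNT_DIR:-/mnt/beeond}"

# Print the command that would start BeeOND on the nodes listed in the node file.
# Arguments: node file, per-node storage directory, client mount point.
beeond_start_cmd() {
    printf 'beeond start -n %s -d %s -c %s\n' "$1" "$2" "$3"
}

# Print the command that would stop BeeOND and clean up logs and data (assumed flags).
beeond_stop_cmd() {
    printf 'beeond stop -n %s -L -d\n' "$1"
}

# Inside a SLURM job, the node file could be produced with:
#   scontrol show hostnames "$SLURM_JOB_NODELIST" > "$NODEFILE"
beeond_start_cmd "$NODEFILE" "$STORE_DIR" "$MOUNT_DIR"
beeond_stop_cmd "$NODEFILE"
```

Printing the commands first makes it easy to review them in a job script before letting the job run them for real.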

Page 16

Benefits of BeeOND


Benefits from unused space

No impact on the parallel filesystem

Real utilization of the high speed network

Filesystem scales with the compute nodes

Open point:

What is the overhead on the compute nodes?

Page 17

Test System Layout

[Diagram labels: DDN 7700; 4 * FDR Links; 2 * FDR Links; Dual sockets SB, 128 GB memory; Fabric; 1 * FDR Links]

• One couplet (two controllers)

• Two x86 servers

• One enclosure with 60 drives

• 6 SSDs in one RAID volume

• 6 RAID 5 volumes of 9 drives each (6 * 9)

Page 18

Tuning the servers

# Virtual memory and page cache settings
echo 5 > /proc/sys/vm/dirty_background_ratio
echo 20 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes

# Transparent huge pages
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

# Block queue settings for each volume
for dev in dm-0 dm-1 dm-2 dm-3 dm-4 dm-5 dm-6; do
    echo deadline > /sys/block/$dev/queue/scheduler
    echo 4096 > /sys/block/$dev/queue/nr_requests
    echo 32768 > /sys/block/$dev/queue/read_ahead_kb
    echo 32767 > /sys/block/$dev/queue/max_sectors_kb
done

# CPU frequency governor and NUMA zone reclaim
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 1 > /proc/sys/vm/zone_reclaim_mode

Documentation for the tuned parameters:

https://www.kernel.org/doc/Documentation/sysctl/vm.txt
https://access.redhat.com/solutions/46111
http://www.slideshare.net/rampalliraj/linux-kernel-io-schedulers?from_action=save
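After applying settings like these, it can help to read them back, since a write to `/proc` or `/sys` may be rejected or clamped by the kernel. The sketch below is read-only; the function names are illustrative and not part of the original slides. It relies on the kernel's convention of marking the active option in square brackets (for example `noop [deadline] cfq`); on newer kernels the scheduler name may differ, for example `mq-deadline`.

```shell
#!/usr/bin/env bash
# Print the active choice from a file that lists options as "a [b] c",
# the format used by queue/scheduler and transparent_hugepage/enabled.
active_choice() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Print "path = value", or a note when the file cannot be read.
show_value() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = (not readable)\n' "$1"
    fi
}

# Report the VM settings, then the queue settings of each device given as an argument.
report_tuning() {
    local name dev queue
    for name in dirty_background_ratio dirty_ratio vfs_cache_pressure \
                min_free_kbytes zone_reclaim_mode; do
        show_value "/proc/sys/vm/$name"
    done
    for dev in "$@"; do
        queue="/sys/block/$dev/queue"
        if [ -r "$queue/scheduler" ]; then
            printf '%s/scheduler = %s\n' "$queue" "$(active_choice "$queue/scheduler")"
        fi
        for name in nr_requests read_ahead_kb max_sectors_kb; do
            show_value "$queue/$name"
        done
    done
    return 0
}

# Device names follow the tuning script above; adjust them to the local system.
report_tuning dm-0 dm-1 dm-2 dm-3 dm-4 dm-5 dm-6
```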

Page 19

Monitoring client activities (1)

Page 20

Monitoring server activities (2)

Page 21

Benchmark tools

• Mdtest: measures metadata performance

https://sourceforge.net/projects/mdtest/

• IOzone: measures read and write throughput

http://www.iozone.org

Page 22

IOzone results on /beegfs

Test running:
Children see throughput for 64 initial writers = 5032700.90 kB/sec
Min throughput per process = 63754.09 kB/sec
Max throughput per process = 103798.58 kB/sec
Avg throughput per process = 78635.95 kB/sec
Min xfer = 12880896.00 kB

Test running:
Children see throughput for 64 rewriters = 4996297.63 kB/sec
Min throughput per process = 68781.82 kB/sec
Max throughput per process = 90666.23 kB/sec
Avg throughput per process = 78067.15 kB/sec
Min xfer = 16473088.00 kB

Test running:
Children see throughput for 64 readers = 4225632.91 kB/sec
Min throughput per process = 40047.24 kB/sec
Max throughput per process = 77678.61 kB/sec
Avg throughput per process = 66025.51 kB/sec
Min xfer = 10813440.00 kB

Test running:
Children see throughput for 64 re-readers = 4253662.00 kB/sec
Min throughput per process = 56998.73 kB/sec
Max throughput per process = 76042.87 kB/sec
Avg throughput per process = 66463.47 kB/sec
Min xfer = 15729664.00 kB
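To put these figures in more familiar units, the aggregate throughput can be converted to GiB/s and cross-checked against the per-process average multiplied by the 64 processes. The sketch below uses `awk` for the arithmetic; the function names are illustrative, and the conversion assumes that IOzone's "kB" means 1024 bytes. Under that assumption the writes come out at roughly 4.8 GiB/s and the reads at roughly 4.0 GiB/s.

```shell
#!/usr/bin/env bash
# Convert an IOzone throughput figure in kB/sec to GiB/s,
# assuming IOzone's "kB" means 1024 bytes.
kb_to_gib() {
    LC_ALL=C awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

# Multiply the per-process average by the process count to cross-check the aggregate.
aggregate_from_avg() {
    LC_ALL=C awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.1f\n", avg * n }'
}

# Aggregate figures copied from the results above (kB/sec).
for entry in "write 5032700.90" "rewrite 4996297.63" \
             "read 4225632.91" "re-read 4253662.00"; do
    read -r name kb <<< "$entry"
    printf '%-8s %s GiB/s\n' "$name" "$(kb_to_gib "$kb")"
done

# Cross-check: 64 initial writers at an average of 78635.95 kB/sec each
# should land close to the reported aggregate of 5032700.90 kB/sec.
aggregate_from_avg 78635.95 64
```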

Page 23

Mdtest results on BeeOND

[Charts: three panels titled "Directory creation", "Directory Stat" and "Directory Stat". x-axis: number of MDSs (1, 2, 4, 8, 16, 20); y-axis: directories per second.]

Page 24

Mdtest results on BeeOND

[Charts: three panels titled "File Creation", "File Stat" and "File removal". x-axis: number of MDSs; y-axis: files per second.]

Page 25

Next steps

• Scale to a bigger cluster

• Verify the failover procedures

• Verify the BeeOND overhead on compute nodes

• Use NVMe instead of SSDs

• Use tmpfs

• Create BeeOND through SLURM jobs

• Use Robinhood to scan millions of files

Page 26

Q&A
