
Warp Mechanics: Lustre Over ZFS on Linux Podcast

May 11, 2015


insideHPC

In this slidecast, Josh Judd from Warp Mechanics presents: Lustre Over ZFS on Linux.

What does ZFS do for me? HPC-relevant features include:
• Support for 1/10 GbE, 4/8/16 Gb FC, and 40 Gb InfiniBand
• Multi-layer cache combines DRAM and SSDs with HDDs
• Copy-on-write eliminates holes and accelerates writes
• Checksums eliminate silent data corruption and bit rot (see the example after this list)
• Snapshots, thin provisioning, compression, de-dupe, etc. built in
• Lustre and SNFS integration allows 40 GbE networking
• Same software/hardware supports NAS and RAID
• One management code base to control all storage platforms
• Open storage. You can have the source code.
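
For example, ZFS's end-to-end checksums can be verified on demand with a pool scrub, and snapshots and compression are one-line operations. A minimal sketch, assuming a pool named "tank" with a filesystem "tank/fs" (names are illustrative, not from the presentation):

    zpool scrub tank                 # re-read every block and verify it against its checksum
    zpool status -v tank             # report any corruption the scrub found
    zfs snapshot tank/fs@baseline    # instant copy-on-write snapshot
    zfs set compression=lz4 tank/fs  # enable built-in compression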

Learn more: http://warpmech.com
Watch the video presentation: http://insidehpc.com/2014/02/17/slidecast-lustre-zfs-linux/
Transcript
Page 1: Warp Mechanics: Lustre Over ZFS on Linux Podcast

© 2013 WARP Mechanics Ltd. All Rights Reserved.

Q1-2014

Josh Judd, CTO

Lustre over ZFS on Linux: Update on State of the Art HPC Filesystems

Page 2: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Overview
• Remind me... What is “Lustre over ZFS” and why do I care?
• What was “production grade” last year?
• What has changed since then?
• Why do I care about that?
• Can you give me a concrete implementation example?
• How could I get started on this?

Page 3: Warp Mechanics: Lustre Over ZFS on Linux Podcast


What is Lustre over ZFS?
• Lustre: Horizontally-scalable “meta” filesystem which sits on top of “normal” filesystems and makes them big, fast, and unified
  – Historically, ext4 provided the “backing” FS for Lustre
  – This has scalability & performance issues, plus it lacks features & integrity assurances
• ZFS: Vertically-scalable “normal” filesystem, which includes many powerful features, integrity assurances, and an advanced RAID stack
  – Historically, a ZFS filesystem could only exist on a single server at a time
  – It could scale vertically, but had no ability to scale out whatsoever
• Lustre/ZFS: Marries the horizontal scalability and performance of Lustre to the vertical scalability and features of ZFS
• Better together: Each fills missing pieces of the other

Page 4: Warp Mechanics: Lustre Over ZFS on Linux Podcast


What worked last year?
• LLNL was supporting part of ZFS in the Sequoia system
  – This had scalability benefits, and added some features
  – They didn’t have ZFS RAID/volume at high confidence in time for Sequoia
• WARP was supporting all of the ZFS features... but a bit differently
  – This was the most complete integration on the market. However...
  – WARP ran an OSS connected via RDMA to a separate Solaris-based ZFS box
  – This provided feature benefits, but wasn’t unified hardware and didn’t integrate the file and block layers – e.g., read-ahead optimizations were “N/A”
  – And you had to manage both Linux and Solaris

Page 5: Warp Mechanics: Lustre Over ZFS on Linux Podcast


What has changed?
• LLNL has done more refinement and scaling on all layers
• WARP has finished integrating the whole stack onto one controller
• Now you can get file+block aspects of Lustre/ZFS on a single Linux box
  – No Solaris; no external RDMA cables; no extra server in the mix
  – 100% of the Lustre/ZFS integration features, e.g. read-ahead optimization works
• Easy to install from a RHEL-style yum repo (see the sketch after this list)
• Commercial support available
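
The install itself can be as simple as pulling packages from a yum repository and loading the kernel modules. A minimal sketch, with illustrative package names (the exact repo URLs and package names depend on the ZFS on Linux and Lustre releases you use):

    # after adding the ZFS on Linux and Lustre repos for your distribution
    yum install -y zfs                          # ZFS on Linux (DKMS or kmod packages)
    yum install -y lustre lustre-osd-zfs-mount  # Lustre server packages built with the ZFS OSD
    modprobe zfs
    modprobe lustre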

Page 6: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Value of this to HPC systems?
Complete PXE support
• Now you can PXE boot everything, from RAID to the parallel FS layers, and run “diskless” – i.e. no image to flash inside storage nodes
• In other words, manage your storage the same way you manage the compute layer
Complete base OS control
• Some customers want a GUI; others want control
• Full root access to a Linux-based OS – which fully controls all layers of storage
Complete open source stack
• Lowers cost for storage dramatically – just as it did on the server layer
• Allows “the community” to add features and tools

Page 7: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Value of this to HPC systems? (cont.)
Moves in the “mainstream direction” of open source vs. proprietary
Built-in compression gets 1.7:1 in real-world HPC environments
ZFS RAID layer supports hybrid SSD pools: massive performance benefits for Lustre, especially with small random reads (see the sketch after this list)
ZFS adds multiple layers of integrity protection: essential at peta-scale
Allows running arbitrary user-provided code directly within the storage array – e.g., CRAM could be implemented inside the controller
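
As an illustration of the hybrid-pool and compression items above, this is roughly what a hand-built ZFS pool with SSD cache and log devices looks like (pool and device names are hypothetical; for Lustre targets, mkfs.lustre creates the pool for you, as shown on a later slide):

    zpool create ostpool raidz2 sdb sdc sdd sde sdf sdg   # HDD data vdev
    zpool add ostpool cache nvme0n1                       # SSD read cache (L2ARC) for small random reads
    zpool add ostpool log mirror nvme1n1 nvme2n1          # mirrored SSD intent log for synchronous writes
    zfs set compression=lz4 ostpool                       # built-in, transparent compression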

Page 8: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Concrete example?
• Not a sales pitch, but...
• WARP implements the stack as an appliance
• In this case, the controllers are integrated into the storage enclosure
• We’ll describe how we did it in a minute, but...
• You can “DIY” the same thing using COTS hardware and free software
• It’s a question of “do you want an appliance with commercial support?”

Page 9: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Core architecture of WARPhpc system
4u 60-bay chassis
Used for OSS “heads” and disk shelves
Building block is a 12u / 180-bay OSS “pod”
1x chassis has 2x Sandy Bridge OSSs (HA)
2x chassis have SAS JBOD controllers
Can use 100% HDD, 100% SSD, or hybrid
Typical case per pod:
• 4 GB/s to 12 GB/s
• 0.5 PB to 1.0 PB usable

Page 10: Warp Mechanics: Lustre Over ZFS on Linux Podcast


CPU-based Controllers for OSSs

Page 11: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Example HPC Lustre/ZFS Storage System
• Chassis are arranged as 12u “pods”
  – 3x chassis in a group – 1x “smart” and 2x “expansion”
  – 180 spindles (“S”), or 150 spindles + 30 SSDs (“H”), or 180 SSDs (“M”)
  – 2x controllers (active/active HA) with 4x 56 Gbps InfiniBand or 4x 40 Gbps Ethernet ports
  – Each pod runs between 4 GB/s and 12 GB/s, depending on drive config and workload
  – Example 4-rack system: ~14 PB w/ compression (typical) and ~70+ GB/s (typical)

Page 12: Warp Mechanics: Lustre Over ZFS on Linux Podcast


How to actually build that?
Option 1: WARP just rolls in racks
Option 2: DIY
• Everything about this is open source
• You can hook up COTS servers to JBODs, download code, and go
• Lustre layer works mostly like any other Lustre system
• Differences are in how OSTs are created (see the mount sketch after this list):
  – mkfs.lustre --mgs --backfstype=zfs warpfs-mgt0/mgt0 mirror sdc sdd
  – mkfs.lustre --ost --backfstype=zfs --mgsnode=[nid] --fsname=warpfs --index=1 warpfs-ost1/ost1 raidz sdf sdg sdh sdi sdj cache sde1
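
Once the targets are formatted, each one is started simply by mounting its dataset with filesystem type “lustre”, and clients then mount the filesystem over the network. A minimal sketch with hypothetical mount points (note that a metadata target, not shown on this slide, is also required before clients can mount):

    mkdir -p /mnt/mgt /mnt/ost1
    mount -t lustre warpfs-mgt0/mgt0 /mnt/mgt     # start the MGS
    mount -t lustre warpfs-ost1/ost1 /mnt/ost1    # start the OST
    # on a client, replace [nid] with the MGS NID used above:
    mount -t lustre [nid]:/warpfs /mnt/warpfs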

Page 13: Warp Mechanics: Lustre Over ZFS on Linux Podcast


How to get started?
WARP and LLNL are in final stages of reviewing a quick start guide

This shows how to get Lustre/ZFS running as a VM in minutes

It’s not a comprehensive guide to Lustre or ZFS, but it distills procedures down to a simple, “cut and paste” method to get started

To get an advance copy, email [email protected]

[...]

Next, tell the Lustre/ZFS startup scripts (contributed by LLNL) which Lustre services you want to start. Don’t worry that you haven’t created these filesystems yet. They will be created shortly.

If you prefer the vi method to echoing, then: vi /etc/ldev.conf

Make sure ldev.conf contains the following lines:

warpdemo - mgs zfs:warp-mgt0/mgts0 warpdemo

[...]
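
For reference, a fuller ldev.conf for a single-node demo might look like the lines below. The format is "<local host> <failover partner> <label> <device>"; the MDT/OST labels and pool names here are hypothetical and not taken from the quick start guide:

    warpdemo - mgs            zfs:warp-mgt0/mgts0
    warpdemo - warpfs-MDT0000 zfs:warp-mdt0/mdt0
    warpdemo - warpfs-OST0000 zfs:warp-ost0/ost0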

Page 14: Warp Mechanics: Lustre Over ZFS on Linux Podcast


Thanks!

[email protected]