Open stack summit-2015-dp (Jul 20, 2015)

Transcript
Page 1: If you build it, will they come?

Using OpenStack Swift to build a large-scale active archive in a Scientific Computing environment

Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center
Joe Arnold, Chief Product Officer & President, SwiftStack

Page 2: Challenge

Need an archive to offload expensive storage

❖ Low-cost storage
❖ High throughput: load large genome files for HPC
❖ Faster and lower cost than S3 & no proprietary lock-in

Page 3: About Fred Hutch

❖ Cancer & HIV research
❖ 3 Nobel Laureates
❖ $430M budget / 85% NIH funding
❖ 2,700 employees
❖ Conservative use of information technology

Page 4: IT at Fred Hutch

❖ Multiple data centers with >1,000 kW capacity
❖ 100 staff in Center IT plus divisional IT
❖ Team of 3 sysadmins to support storage
❖ IT funded by indirects (F&A)
❖ Storage chargebacks started Nov 2014
❖ 1.03 PUE, natural air cooled

(Photo caption: Inside the Fred Hutch data center)

Page 5: About SwiftStack

❖ Object storage software
❖ Built with OpenStack Swift
❖ SwiftStack is the leading contributor and Project Technical Lead
❖ Software-defined storage platform for object storage

[Architecture diagram: SwiftStack storage clusters spanning Datacenter 1–3, each node running the Swift Object Storage Engine on standard OS/HW, accessed via the Swift API and NFS/CIFS. An out-of-band, software-defined SwiftStack Controller provides security (authentication & authorization), device & node management, a user dashboard, and runtime agents for load balancing, monitoring, utilization, and device inventory.]

Page 6: SwiftStack Resources

https://swiftstack.com/books/

Page 7: Researchers concerned about…

❖ Significant storage costs – $40/TiB/month chargebacks (first 5 TB is free) – and declining grant funding
❖ “If you charge us, please give us some cheap storage for old and big files”
❖ (Mis)perception of storage value (“I can buy a hard drive at Best Buy”)

(Photo caption: Not what you want – unsecured and unprotected external USB storage)

Page 8: Finance concerned about…

❖ Cost predictability and scale
❖ Data growth drives storage costs of up to $1M per year
❖ Genomics data grows at 40%/year and chargebacks don’t cover all costs
❖ Expensive forklift upgrades every few years
❖ The public cloud (e.g. Amazon S3) sets a new, transparent cost benchmark

Page 9: How much does it cost?

❖ Only small changes vs. 2014
❖ Kryder’s law obsolete at <15%/year?
❖ Swift is now down to Glacier cost (hardware down to $3/TB/month)
❖ No price reductions in the cloud
❖ 4TB (~$120) and 6TB (~$250) drives cost the same
❖ Do you want a fault domain of 144TB or 216TB (i.e. a 36-drive server) in your storage servers?
❖ Don’t save on CPU – Erasure Coding is coming!

[Bar chart, $/TB/month: NAS ~40, Amazon S3 ~28, Google ~26, SwiftStack ~11 (axis 0–50). For comparison, AWS EFS is $300/TB/month.]

Page 10: Economy File in production in 2014

❖ Chargebacks drove the Hutch to embrace more economical storage
❖ Selected Swift object storage managed by SwiftStack
❖ Go-live in 2014; strong interest and expansion in 2015
❖ Researchers do not want to pay the price for standard enterprise storage

Page 11: Chargebacks spike Swift utilization!

❖ Started storage chargebacks on Nov 1st
❖ Triggered strong growth in October
❖ Users sought to avoid the high cost of enterprise NAS and put as much as possible into lower-cost Swift
❖ Underestimated the success of Swift
❖ Needed to stop migration to buy more hardware
❖ Can migrate 30+ TB per day today

Page 12: Standard Hardware

❖ Supermicro with Silicon Mechanics
❖ 2.1PB raw capacity; ~700TB usable
❖ No RAID controllers; no storage lost to RAID
❖ Seagate SATA drives (desktop)
❖ 2 x 120GB Intel S3700 SSDs for OS + metadata
❖ 10GBase-T connectivity
❖ 2 x Intel Xeon E5 CPUs
❖ 64GB RAM

Page 13: Management of OpenStack Swift using SwiftStack

❖ Out-of-band management controller
❖ SwiftStack provides control & visibility
❖ Monitoring and stats at cluster, node, and drive levels
❖ Authentication & authorization
❖ Capacity & utilization management via quotas and rate limits
❖ Alerting & diagnostics

Page 14: SwiftStack Automation

❖ Deployment automation
❖ Lets us roll out Swift nodes in 10 minutes
❖ Upgrading Swift across clusters with 1 click
❖ 0.25 FTE to manage the cluster

Page 15: Supporting Scientific Computing Workloads

HPC Use Cases & Tools

Page 16: HPC Requirements

❖ High aggregate throughput
❖ Current network architecture is the bottleneck
❖ Many parallel streams used to max out throughput
❖ Ideal for HPC cluster architecture

Page 17: Not a Filesystem

There is no traditional file system hierarchy; we just have containers, which can hold millions of objects (aka files).

Huh, no sub-directories? But how the heck can I upload my uber-complex bioinformatics file system with 11 folder hierarchies to Swift?

Page 18: Filesystem Mapping with Swift

We simulate the hierarchical structure by simply putting forward slashes (/) in the object name (or file name).

❖ So, how do you actually copy a folder?
❖ However, the Swift client is frequently used, well supported, maintained and really fast!

$ swift upload --changed --segment-size=2G --use-slo --object-name="pseudo/folder" "container" "/my/local/folder"
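Listing works the same way: by giving a prefix and treating "/" as a delimiter, the plain Swift client shows just one pseudo-folder level. A minimal sketch, assuming the standard python-swiftclient CLI and the placeholder container/object names from the upload above:

$ swift list container --prefix "pseudo/folder/" --delimiter "/"
# prints only the objects and pseudo-sub-folders directly under pseudo/folder/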

Really? Can’t we get this a little easier?

Page 19: Introducing Swift Commander

❖ Swift Commander, a simple shell wrapper around the Swift client, curl and some other tools, makes working with Swift very easy.
❖ Sub-commands such as swc ls, swc cd, swc rm and swc more give you a feel that is quite similar to a Unix file system.
❖ Actively maintained and available at: https://github.com/FredHutch/Swift-commander/

$ swc upload /my/posix/folder /my/Swift/folder
$ swc compare /my/posix/folder /my/Swift/folder
$ swc download /my/Swift/folder /my/scratch/fs

Much easier…

Some additional examples:
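For instance (a hedged sketch – the sub-command names come from the slide above, but the exact arguments may differ from the current Swift Commander syntax):

$ swc ls /my/Swift/folder               # list a pseudo-folder, ls-style
$ swc cd /my/Swift/folder               # change the current pseudo-folder
$ swc more /my/Swift/folder/readme.txt  # page through an object like a file
$ swc rm /my/Swift/folder/old.bam       # delete a single object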

Page 20: Swift Commander + Metadata

❖ Didn’t someone say that object storage systems were great at using metadata?
❖ Yes – you can just add a few key:value pairs as upload arguments:

$ swc upload /my/posix/folder /my/Swift/folder project:grant-xyz collaborators:jill,joe,jim cancer:breast

❖ Query the metadata via swc, or use an external search engine such as Elasticsearch:

$ swc meta /my/Swift/folder
Meta Cancer: breast
Meta Collaborators: jill,joe,jim
Meta Project: grant-xyz
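These key:value pairs presumably end up as ordinary Swift object metadata (X-Object-Meta-* headers), so they stay readable by any Swift tool. A minimal sketch with the plain Swift client (container and object names are placeholders):

$ swift post -m project:grant-xyz -m cancer:breast container "pseudo/folder/genome.bam"
$ swift stat container "pseudo/folder/genome.bam"   # shows the Meta Project / Meta Cancer headers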

Page 21: Integrating with HPC

❖ Integrating Swift in HPC workflows is not really hard
❖ Example: running samtools using persistent scratch space (files deleted if not accessed for 30 days)

if ! [[ -f /fh/scratch/delete30/pi/raw/genome.bam ]]; then
  swc download /Swiftfolder/genome.bam /fh/scratch/delete30/pi/raw/genome.bam
fi
samtools view -F 0xD04 -c /fh/scratch/delete30/pi/raw/genome.bam > otherfile

A complex 50-line HPC submission script prepping a GATK workflow requires just 3 more lines!
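In context, a hedged sketch of what such a Slurm submission script could look like (the #SBATCH settings, paths and the GATK step are placeholders, not the actual Hutch script); the three added lines are just the if / swc download / fi block:

#!/bin/bash
#SBATCH -N1 -c4                     # example resources only
BAM=/fh/scratch/delete30/pi/raw/genome.bam

# the 3 extra lines: stage the input from Swift into scratch if it is not already there
if ! [[ -f "$BAM" ]]; then
  swc download /Swiftfolder/genome.bam "$BAM"
fi

# ...the remaining ~50 lines prep and run the GATK workflow, e.g.:
samtools view -F 0xD04 -c "$BAM" > counts.txt
# java -jar GenomeAnalysisTK.jar ...   # placeholder for the actual GATK steps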

Page 22: Other HPC Integrations

❖ Use the HPC system to download lots of bam files in parallel
❖ 30 cluster jobs run in parallel on 30 1G nodes (which is my HPC limit)
❖ My scratch file system says it loads data at 1.4 GB/s
❖ This means that each bam file is downloaded at 47 MB/s on average (1.4 GB/s ÷ 30 jobs), and downloading this 1.2 TB dataset takes 14 min (1.2 TB ÷ 1.4 GB/s ≈ 860 s)

$ swc ls /Ext/seq_20150112/ > bamfiles.txt
$ while read FILE; do
>   sbatch -N1 -c4 --wrap="swc download /Ext/seq_20150112/$FILE ."
> done < bamfiles.txt

$ squeue -u petersen
  JOBID     PARTITION  NAME    USER      ST  TIME   NODES  NODELIST
  17249368  campus     sbatch  petersen  R   15:15  1      gizmof120
  17249371  campus     sbatch  petersen  R   15:15  1      gizmof123
  17249378  campus     sbatch  petersen  R   15:15  1      gizmof130

$ fhgfs-ctl --userstats --names --interval=5 --nodetype=storage
====== 10 s ======
Sum:      13803 [sum]  13803 [ops-wr]  1380.300 [MiB-wr/s]
petersen  13803 [sum]  13803 [ops-wr]  1380.300 [MiB-wr/s]

Page 23: Swift Commander + Small Files

So, we could tar up this entire directory structure… but then we would have one giant tar ball.

Solution: tar up the sub-directories, but create a separate tar ball for each level. E.g. for /folder1/folder2/folder3, restoring folder2 and below needs just folder2.tar.gz + folder3.tar.gz. A minimal sketch of the idea follows below.

$ swc arch /my/posix/folder /my/Swift/folder
$ swc unarch /my/Swift/folder /my/scratch/fs

It’s available at https://github.com/FredHutch/Swift-commander/blob/master/bin/swbundler.py
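To make the per-level idea concrete, here is a minimal shell sketch (not the actual swbundler.py logic; it assumes GNU find/tar and filenames without whitespace):

# create one tar.gz per directory level, containing only that level's own files
SRC=/my/posix/folder
OUT=/tmp/bundles
find "$SRC" -type d | while read -r dir; do
    rel="${dir#$SRC}"; rel="${rel#/}"                         # path of this level relative to $SRC
    mkdir -p "$OUT/$rel"
    files=$(find "$dir" -maxdepth 1 -type f -printf '%f\n')   # only the files at this level
    [ -n "$files" ] && tar -C "$dir" -czf "$OUT/$rel/level.tar.gz" $files
done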

It’s Easy. It’s Fast.

❖ Archiving uses multiple processes; measured up to 400 MB/s from one Linux box
❖ Each process uses pigz multithreaded gzip compression (example: compressing a 1GB DNA string down to 272MB takes 111 seconds with gzip, 5 seconds with pigz)
❖ Restore can use standard gzip
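For reference, a one-liner showing how a tar stream can be compressed with pigz instead of gzip (paths and thread count are placeholders); the result is a standard gzip stream, which is why restore can use plain gzip:

$ tar -C /my/posix/folder -cf - . | pigz -p 8 > folder.tar.gz   # compress with 8 threads
$ pigz -dc folder.tar.gz | tar -xf -                            # or: gunzip -c folder.tar.gz | tar -xf -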

Page 24: Desktop Clients & Collaboration

❖ Reality: every archive requires access via GUI tools
❖ Requirements:
  ❖ Easy to use
  ❖ Do not create any proprietary data structures in Swift that cannot be read by other tools

(Screenshot: Cyberduck desktop client running on Windows)

Page 25: Desktop Clients & Collaboration

❖ Another example: ExpanDrive and Storage Made Easy
❖ Works with Windows and Mac
❖ Integrates into the Mac Finder and is mountable as a drive in Windows

Page 26: rclone – mass copy, backup, data migration

❖ rclone is a multithreaded data copy / mirror tool
❖ Consistent performance on Linux, Mac and Windows
❖ E.g. keep a mirror of a Synology workgroup NAS (QNAP has a built-in Swift mirror option)
❖ Data remains accessible by swc and desktop clients
❖ Mirror protected by Swift undelete (currently 60-day retention)
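A hedged sketch of what such a mirror could look like with rclone's Swift backend (the remote name, credentials, auth URL and share path are placeholders, not the Hutch's actual configuration):

# ~/.config/rclone/rclone.conf
[hutchswift]
type = swift
user = ACCOUNT:USERNAME
key  = SECRET_KEY
auth = https://swift.example.org/auth/v1.0

# one-way mirror of the NAS share into a Swift container
$ rclone sync /volume1/workgroup hutchswift:workgroup-mirror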

Page 27: Galaxy – Scientific Workflow Management

❖ Galaxy, web-based high-throughput computing at the Hutch, uses Swift as primary storage in production today
❖ SwiftStack patches contributed to the Galaxy Project
❖ Swift allows delegating “root” access to bioinformaticians
❖ Integrated with the Slurm HPC scheduler: automatically assigns the default PI account for each user

Page 28: Summary

Discovery is driven by technologies that generate larger and larger datasets.

❖ Object storage is ideal for:
  ❖ Ever-growing data volumes
  ❖ High throughput required for HPC
  ❖ Faster and lower cost than S3 & no proprietary lock-in

Page 29: Thank you!

Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center
Joe Arnold, Chief Product Officer & President, SwiftStack