ORNL is managed by UT-Battelle for the US Department of Energy. Running Docker on Lustre: An architectural overview. Blake Caldwell, OLCF/ORNL. LUG 2016, Portland, Oregon, April 6, 2016.

Running Docker on Lustre - OpenSFS

Page 1: Running Docker on Lustre - OpenSFS

ORNL is managed by UT-Battelle for the US Department of Energy

Running Docker on Lustre

An architectural overview

Blake Caldwell
OLCF/ORNL

LUG 2016
Portland, Oregon
April 6, 2016

Page 2: Running Docker on Lustre - OpenSFS

About Docker

• What is it?
  – A toolset for packaging, shipping, and running containers (user environment)
• What is it good for?
  – Consistent user environments
    • Rapid prototyping, proofs of concept (development)
    • Reproducible research
  – Application isolation
  – Server consolidation

Page 11: Running Docker on Lustre - OpenSFS

A Conversation on Image Distribution

HPC User: furious_mccarthy
Docker Oracle: goofy_blackwell

1. HPC User: How do I run the same image on 50 different nodes?
   Docker Oracle: Push it to Docker Hub

2. HPC User: My images can't leave the local network
   Docker Oracle: Create a private repository

3. HPC User: I have a lot of compute nodes and 1 registry (bottleneck)
   Docker Oracle: Load balance the registries

4. HPC User: The images are inconsistent!
   Docker Oracle: Redeploy! Cattle vs. pets…
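
The registry bottleneck raised in question 3 can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a measurement from the talk: it assumes every node pulls the full image at the same time and that registry bandwidth, shared evenly, is the only limit. The image size, node count, bandwidth, and the function name `pull_time_seconds` are all illustrative assumptions.

```python
def pull_time_seconds(image_gb, nodes, registry_gbps, registries=1):
    """Estimate wall-clock time for `nodes` simultaneous image pulls.

    Assumes aggregate registry bandwidth is the only limit and is shared
    evenly; ignores layer caching, decompression, and local disk I/O.
    """
    total_gigabits = image_gb * 8 * nodes
    aggregate_gbps = registry_gbps * registries
    return total_gigabits / aggregate_gbps


# Hypothetical numbers: a 2 GB image, 50 nodes, 10 Gb/s per registry.
single = pull_time_seconds(image_gb=2, nodes=50, registry_gbps=10)
balanced = pull_time_seconds(image_gb=2, nodes=50, registry_gbps=10, registries=4)
print(f"one registry:    {single:.0f} s")
print(f"four registries: {balanced:.0f} s")
```

Under these assumptions, adding registries only divides the wait. Every node still transfers and stores its own copy of the image.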

Page 12: Running Docker on Lustre - OpenSFS

Normal Docker Pull

Page 13: Running Docker on Lustre - OpenSFS

Parallel Docker Pull

Page 14: Running Docker on Lustre - OpenSFS

Why Does Docker Need a Distributed Image Store?

• Deployment means waiting on disk I/O
• Copies are everywhere!
• Consistency
• Security

Page 15: Running Docker on Lustre - OpenSFS

Why Lustre?

• A shared, persistent filesystem already present in many cluster computing environments

• We’re addressing the speed of Docker when using the same image across many nodes in parallel

Page 18: Running Docker on Lustre - OpenSFS

Docker Images vs. Volumes

• Images: the base filesystem image of the container (chroot)
  – Stored in Docker registries (push/pull)
  – Made up of layers (copy-on-write)
• Volumes: filesystem mounts added at container creation time
  – Bind-mounts from host
  – Plugins exist for volumes on distributed storage (Ceph, Gluster, S3)
    • No Lustre volume driver
• What options exist for storing images on Lustre…
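
Even without a Lustre volume driver, a host bind-mount can expose a Lustre directory to a container as a volume. The sketch below only builds the `docker run` command line and does not execute it. The image name `centos:7`, the host path `/lustre/project`, and the helper `docker_run_argv` are assumptions chosen for illustration.

```python
import shlex


def docker_run_argv(image, command, bind_mounts):
    """Build an argv list for `docker run` with host bind-mounts.

    bind_mounts maps a host path to a (container_path, read_only) pair.
    """
    argv = ["docker", "run", "--rm"]
    for host_path, (container_path, read_only) in bind_mounts.items():
        spec = f"{host_path}:{container_path}"
        if read_only:
            spec += ":ro"  # mount read-only inside the container
        argv += ["-v", spec]
    argv.append(image)
    argv += command
    return argv


# Hypothetical example: expose a Lustre project directory read-only at /data.
argv = docker_run_argv(
    image="centos:7",
    command=["ls", "/data"],
    bind_mounts={"/lustre/project": ("/data", True)},
)
print(shlex.join(argv))
```

This covers volumes only. The image itself still has to reach each node, which is the problem the remaining slides address.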

Page 19: Running Docker on Lustre - OpenSFS

Devicemapper + Loopback

• The dm-loopback implementation is Docker’s fallback storage driver
  – Devicemapper in RHEL, Ubuntu, SLES
  – No block device configuration required
  – Thin-provisioning (thinp) snapshots are copy-on-write
• Metadata operations are handled on VFS locally
• But it’s quite slow
  – Jason Brooks – Friends Don't Let Friends Run Docker on Loopback in Production
    http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
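
Loopback mode needs no block device because the thin pool is backed by ordinary files attached to loop devices, and such files are typically sparse. The sketch below shows only the sparse-file part on a Unix-like system. Attaching the file to a loop device and building the devicemapper pool require root and are omitted. The file name `data.img`, the 64 MiB size, and the helper `create_sparse_file` are assumptions for illustration.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


with tempfile.TemporaryDirectory() as tmp:
    backing = os.path.join(tmp, "data.img")      # hypothetical backing file
    create_sparse_file(backing, 64 * 1024 * 1024)  # 64 MiB apparent size
    st = os.stat(backing)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units (Unix only)
    print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

On filesystems that support sparse files, the allocated size stays far below the apparent size until data is written.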

Page 20: Running Docker on Lustre - OpenSFS

OverlayFS

• Upstream since Linux 3.18
  – Hasn’t always supported distributed file systems
• Presents a union mount of one or more r/o layers and one r/w layer
  – Layers are directories
  – Modified files are copied up
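
The lookup order and the copy-up step can be illustrated with a toy model. This is a sketch of the concept in plain Python dictionaries, not how the kernel implements OverlayFS: it ignores whiteouts, directories, and metadata. The class name `UnionMount` and the file paths are made up for the example.

```python
class UnionMount:
    """Toy overlay: read-only lower layers plus one writable upper layer."""

    def __init__(self, lower_layers):
        # lower_layers[0] is the topmost read-only layer.
        self.lower_layers = lower_layers
        self.upper = {}

    def read(self, path):
        # The upper layer wins, then the lower layers from top to bottom.
        for layer in [self.upper, *self.lower_layers]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: the whole file lands in the upper layer before it is modified.
        if path not in self.upper:
            self.upper[path] = self.read(path)
        self.upper[path] += data


base = {"/etc/motd": "hello\n", "/bin/app": "v1"}
patch = {"/bin/app": "v2"}
mount = UnionMount([patch, base])
mount.append("/etc/motd", "world\n")

print(mount.read("/bin/app"))         # topmost lower layer shadows the base
print(repr(mount.read("/etc/motd")))  # modified copy served from the upper layer
print(repr(base["/etc/motd"]))        # the read-only layer is untouched
```

Because writes never touch the lower layers, many containers can share the same read-only layers while each keeps its own upper layer.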

Page 21: Running Docker on Lustre - OpenSFS

OverlayFS Union Mounts

https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/

Page 22: Running Docker on Lustre - OpenSFS

OverlayFS

• Pros:
  + Page cache entries shared for all containers
  + Natively supports copy-on-write
• Cons:
  - Copy-up penalty on write
  - Docker’s implementation uses hard links for chaining image layers
  - Filesystem-only, so many files and many metadata operations

Page 23: Running Docker on Lustre - OpenSFS

OverlayFS + Loopback

Page 24: Running Docker on Lustre - OpenSFS

OverlayFS + Loopback: Implementation

• https://github.com/bacaldwell/lustre-graph-driver

Page 25: Running Docker on Lustre - OpenSFS

Conclusions

• Loopback devices on Lustre could support cluster computing workloads
  – No image pulls, just run
  – Read-only layers on filesystem
  – Ephemeral layers node-local
• Work remains
  – Upstream Docker overlayfs driver with multiple lower layers
  – Loopback device performance (LU-6585 lloop driver)
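
The layout in the first bullet, with shared read-only layers below and a node-local ephemeral layer above, maps onto the overlay mount options of kernels that accept several colon-separated `lowerdir` entries. The sketch below only builds the `mount` command line and is not taken from the lustre-graph-driver code. All paths are hypothetical: `/mnt/layers/...` stands for loopback images from Lustre that are already mounted, and `/tmp/ctr1/...` for node-local scratch space.

```python
def overlay_mount_argv(lower_dirs, upper_dir, work_dir, merged_dir):
    """Build a `mount -t overlay` command line.

    lower_dirs is ordered topmost layer first, matching the kernel's
    colon-separated lowerdir= option. upper_dir and work_dir must live
    on the same (here node-local) filesystem.
    """
    options = ",".join([
        "lowerdir=" + ":".join(lower_dirs),
        f"upperdir={upper_dir}",
        f"workdir={work_dir}",
    ])
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged_dir]


# Hypothetical layout: two read-only image layers, one ephemeral upper layer.
argv = overlay_mount_argv(
    lower_dirs=["/mnt/layers/app", "/mnt/layers/base"],
    upper_dir="/tmp/ctr1/upper",
    work_dir="/tmp/ctr1/work",
    merged_dir="/tmp/ctr1/merged",
)
print(" ".join(argv))
```

Running the resulting command requires root. Building the argument list separately keeps the example safe to run anywhere.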

Page 26: Running Docker on Lustre - OpenSFS

Resources

• Jeremy Eder – Comprehensive Overview of Storage Scalability in Docker
  http://developers.redhat.com/blog/2014/09/30/overview-storage-scalability-docker/
• Jérôme Petazzoni – Docker storage drivers
  http://www.slideshare.net/Docker/docker-storage-drivers
• Reproducible environments
  http://nkhare.github.io/data_and_network_containers/storage_backends/
  https://github.com/marcindulak/vagrant-lustre-tutorial

Page 27: Running Docker on Lustre - OpenSFS

Questions…

Page 28: Running Docker on Lustre - OpenSFS

Pull to Lustre