Transcript
Page 1: Ceph Overview for Distributed Computing Denver Meetup

CEPH: A MASSIVELY SCALABLE DISTRIBUTED STORAGE SYSTEM

Ken Dreyer, Software Engineer, Apr 23 2015

Page 2: Ceph Overview for Distributed Computing Denver Meetup

Hitler finds out about software-defined storage

If you are a proprietary storage vendor...

Page 3: Ceph Overview for Distributed Computing Denver Meetup

THE FUTURE OF STORAGE

[Diagram, two columns:]

Traditional Storage: complex proprietary silos. Each silo pairs its own custom GUI and proprietary software with proprietary hardware, with separate USER and ADMIN interfaces per silo.

Open Software-Defined Storage: standardized, unified, open platforms. A single control plane (API, GUI) serves both ADMIN and USER, with open source software (Ceph) running on commodity hardware (standard computers and disks).

Page 4: Ceph Overview for Distributed Computing Denver Meetup

THE JOURNEY

Open Software-Defined Storage is a fundamental reimagining of how storage infrastructure works.

It provides substantial economic and operational advantages, and it is well suited to a growing number of use cases.

TODAY: Cloud Infrastructure, Cloud-Native Apps

EMERGING: Analytics, Hyper-Convergence, Containers

FUTURE: ???, ???

Page 5: Ceph Overview for Distributed Computing Denver Meetup

HISTORICAL TIMELINE

2004: Project starts at UCSC
2006: Open sourced
2010: Mainline Linux kernel
2011: OpenStack integration
MAY 2012: Launch of Inktank
2012: CloudStack integration
SEPT 2012: Production-ready Ceph
2013: Xen integration
OCT 2013: Inktank Ceph Enterprise launch
FEB 2014: RHEL-OSP certification
APR 2014: Inktank acquired by Red Hat

10 years in the making

Page 6: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURE

Page 7: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURAL COMPONENTS

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RGW: A web services gateway for object storage, compatible with S3 and Swift

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata management

[Accessed by: APP (RGW, LIBRADOS), HOST/VM (RBD), CLIENT (CEPHFS)]

Page 8: Ceph Overview for Distributed Computing Denver Meetup

OBJECT STORAGE DAEMONS

[Diagram: four OSDs, each running on a local filesystem (FS) atop a DISK; candidate filesystems: btrfs, xfs, ext4, zfs(?). Three monitors (M) run alongside.]

Page 9: Ceph Overview for Distributed Computing Denver Meetup

RADOS CLUSTER

[Diagram: an APPLICATION talks to a RADOS CLUSTER made up of many OSD nodes and five monitors (M).]

Page 10: Ceph Overview for Distributed Computing Denver Meetup

RADOS COMPONENTS

OSDs:
- 10s to 10,000s in a cluster
- One per disk (or one per SSD, RAID group…)
- Serve stored objects to clients
- Intelligently peer for replication & recovery

Monitors (M):
- Maintain cluster membership and state
- Provide consensus for distributed decision-making
- Small, odd number
- Do not serve stored objects to clients

Page 11: Ceph Overview for Distributed Computing Denver Meetup

WHERE DO OBJECTS LIVE?

[Diagram: an APPLICATION holds an OBJECT and must decide which node in the cluster should store it; the target is unknown ("??").]

Page 12: Ceph Overview for Distributed Computing Denver Meetup

A METADATA SERVER?

[Diagram: one option is a central metadata server: the APPLICATION first asks a lookup service where the object lives (1), then contacts the right node (2).]

Page 13: Ceph Overview for Distributed Computing Denver Meetup

CALCULATED PLACEMENT

[Diagram: another option is calculated placement: the APPLICATION applies a function F to the object name and maps it directly to a node, e.g. by name ranges A-G, H-N, O-T, U-Z.]

Page 14: Ceph Overview for Distributed Computing Denver Meetup

EVEN BETTER: CRUSH!

[Diagram: OBJECTS are hashed into PLACEMENT GROUPS (PGs), and each PG is mapped by CRUSH onto OSDs in the CLUSTER.]

Page 15: Ceph Overview for Distributed Computing Denver Meetup

CRUSH IS A QUICK CALCULATION

[Diagram: a client computes the location of an OBJECT in the RADOS CLUSTER with a quick calculation and no central lookup.]

Page 16: Ceph Overview for Distributed Computing Denver Meetup

CRUSH: DYNAMIC DATA PLACEMENT

CRUSH: pseudo-random placement algorithm
- Fast calculation, no lookup
- Repeatable, deterministic
- Statistically uniform distribution
- Stable mapping: limited data migration on change
- Rule-based configuration: infrastructure topology aware, adjustable replication, weighting

Page 17: Ceph Overview for Distributed Computing Denver Meetup

CRUSH

[Diagram: an OBJECT's name is hashed to a placement group, and CRUSH maps the placement group to OSDs.]

Step 1: pg = hash(object name) % num_pgs
Step 2: osds = CRUSH(pg, cluster state, rule set)
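The two-step mapping above is simple enough to sketch in a few lines. The Python toy below is NOT the real CRUSH algorithm; the hash choice and placement function are illustrative stand-ins, showing only that placement is a pure, repeatable calculation any client can perform from the cluster map:

```python
import hashlib

def object_to_pg(object_name: str, num_pgs: int) -> int:
    # Step 1 from the slide: hash(object name) % num_pgs
    digest = hashlib.md5(object_name.encode()).digest()
    return int.from_bytes(digest, 'big') % num_pgs

def pg_to_osds(pg: int, osd_ids, replicas=3):
    # Step 2 stand-in for CRUSH(pg, cluster state, rule set):
    # rank OSDs by a deterministic pseudo-random score and take the
    # first `replicas`. Adding or removing one OSD only disturbs
    # mappings that involve it ("stable mapping, limited migration").
    def score(osd):
        return hashlib.md5(f"{pg}:{osd}".encode()).hexdigest()
    return sorted(osd_ids, key=score)[:replicas]

pg = object_to_pg("my-object", num_pgs=128)
print(pg, pg_to_osds(pg, osd_ids=range(10)))
```

Because the calculation is deterministic, every client and OSD computes the same answer independently: no lookup table, no central point of failure.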

Pages 18-25: Ceph Overview for Distributed Computing Denver Meetup

[Diagram-only slides: a step-by-step animation of an OBJECT being hashed into a placement group and a CLIENT locating it on OSDs; no further text content.]

Page 26: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURAL COMPONENTS

[Architectural components diagram repeated; see page 7.]

Page 27: Ceph Overview for Distributed Computing Denver Meetup

ACCESSING A RADOS CLUSTER

[Diagram: an APPLICATION links LIBRADOS, which speaks over a socket to the RADOS CLUSTER (OSDs plus monitors) to store an OBJECT.]

Page 28: Ceph Overview for Distributed Computing Denver Meetup

LIBRADOS: RADOS ACCESS FOR APPS

LIBRADOS:
- Direct access to RADOS for applications
- Bindings for C, C++, Python, PHP, Java, Erlang
- Direct access to storage nodes
- No HTTP overhead
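As a hedged sketch of what direct access looks like with the Python binding (python-rados), assuming a reachable cluster, a readable /etc/ceph/ceph.conf, and an existing pool named "data" (the pool name is made up for the example):

```python
import rados

# Connect using the local cluster configuration and keyring.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('data')                # I/O context for one pool
    try:
        ioctx.write_full('greeting', b'hello ceph')   # store an object
        print(ioctx.read('greeting'))                 # read it back
        ioctx.set_xattr('greeting', 'lang', b'en')    # per-object metadata
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

Note that the client talks straight to the OSDs holding the object; there is no gateway or HTTP hop in the path.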

Page 29: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURAL COMPONENTS

[Architectural components diagram repeated; see page 7.]

Page 30: Ceph Overview for Distributed Computing Denver Meetup

THE RADOS GATEWAY

[Diagram: APPLICATIONs speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS over a socket to reach the RADOS CLUSTER (OSDs plus monitors).]

Page 31: Ceph Overview for Distributed Computing Denver Meetup

RADOSGW MAKES RADOS WEBBY

RADOSGW:
- REST-based object storage proxy
- Uses RADOS to store objects
- API supports buckets, accounts
- Usage accounting for billing
- Compatible with S3 and Swift applications
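Because the API is S3-compatible, an ordinary S3 client works against it. A sketch with boto (the common client at the time of this talk); the endpoint, keys, and bucket name are placeholders, and credentials would be created with radosgw-admin:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',            # placeholder
    aws_secret_access_key='SECRET_KEY',        # placeholder
    host='rgw.example.com',                    # placeholder RGW endpoint
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('my-bucket')       # buckets map to RGW buckets
key = bucket.new_key('hello.txt')
key.set_contents_from_string('stored via RADOSGW')
print(key.get_contents_as_string())
```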

Page 32: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURAL COMPONENTS

[Architectural components diagram repeated; see page 7.]

Page 33: Ceph Overview for Distributed Computing Denver Meetup

STORING VIRTUAL DISKS

[Diagram: a VM's virtual disk is accessed through LIBRBD in the HYPERVISOR, which stores the image in the RADOS CLUSTER.]

Page 34: Ceph Overview for Distributed Computing Denver Meetup

SEPARATE COMPUTE FROM STORAGE

[Diagram: two HYPERVISORs, each with LIBRBD, share the RADOS CLUSTER; the VM's storage is independent of the host it runs on.]

Page 35: Ceph Overview for Distributed Computing Denver Meetup

KRBD - KERNEL MODULE

[Diagram: a LINUX HOST uses the KRBD kernel module to access the RADOS CLUSTER directly, with no hypervisor.]

Page 36: Ceph Overview for Distributed Computing Denver Meetup

RBD STORES VIRTUAL DISKS

RADOS BLOCK DEVICE:
- Storage of disk images in RADOS
- Decouples VMs from host
- Images are striped across the cluster (pool)
- Snapshots
- Copy-on-write clones
- Support in:
  - Mainline Linux kernel (2.6.39+) and RHEL 7
  - Qemu/KVM; native Xen coming soon
  - OpenStack, CloudStack, Nebula, Proxmox
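A sketch of creating and writing an image with the Python binding (python-rbd); the pool and image names below are placeholders:

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                    # conventional RBD pool

rbd.RBD().create(ioctx, 'vm-disk-1', 1 * 1024**3)    # 1 GiB image
with rbd.Image(ioctx, 'vm-disk-1') as image:
    image.write(b'\x00' * 512, 0)                    # e.g. zero the first sector
    print(image.size())                              # data is striped across the pool

ioctx.close()
cluster.shutdown()
```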

Page 37: Ceph Overview for Distributed Computing Denver Meetup

RBD SNAPSHOTS

- Export snapshots to geographically dispersed data centers: institute disaster recovery
- Export incremental snapshots: minimize network bandwidth by sending only changes
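A sketch of taking and rolling back a snapshot with python-rbd (pool and names are placeholders); the incremental export between sites itself is done with the rbd export-diff and import-diff commands:

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

with rbd.Image(ioctx, 'vm-disk-1') as image:
    image.create_snap('before-upgrade')              # point-in-time snapshot
    print([s['name'] for s in image.list_snaps()])   # enumerate snapshots
    image.rollback_to_snap('before-upgrade')         # revert the image contents

ioctx.close()
cluster.shutdown()
```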

Page 38: Ceph Overview for Distributed Computing Denver Meetup

ARCHITECTURAL COMPONENTS

[Architectural components diagram repeated; see page 7.]

Page 39: Ceph Overview for Distributed Computing Denver Meetup

SEPARATE METADATA SERVER

[Diagram: a LINUX HOST's kernel module sends metadata operations to the metadata server and file data (0110) directly to the RADOS CLUSTER.]

Page 40: Ceph Overview for Distributed Computing Denver Meetup

SCALABLE METADATA SERVERS

METADATA SERVER:
- Manages metadata for a POSIX-compliant shared filesystem
  - Directory hierarchy
  - File metadata (owner, timestamps, mode, etc.)
- Stores metadata in RADOS
- Does not serve file data to clients
- Only required for the shared filesystem
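A hedged sketch using the libcephfs Python binding (API names as in the python-cephfs package; treat it as illustrative). The operations below are metadata operations served by the MDS, while file data would flow directly between client and OSDs:

```python
import cephfs

fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()                        # attach to the shared filesystem
fs.mkdir('/reports', 0o755)       # directory creation goes to the MDS
print(fs.stat('/reports'))        # metadata lookup also hits the MDS
fs.shutdown()
```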

Page 41: Ceph Overview for Distributed Computing Denver Meetup

CALAMARI


Page 42: Ceph Overview for Distributed Computing Denver Meetup

CALAMARI ARCHITECTURE

[Diagram: a CALAMARI MASTER on the ADMIN NODE manages MINIONs running on each node of the CEPH STORAGE CLUSTER, including the monitors (M).]

Page 43: Ceph Overview for Distributed Computing Denver Meetup

USE CASES

Page 44: Ceph Overview for Distributed Computing Denver Meetup

WEB APPLICATION STORAGE

[Diagram: a WEB APPLICATION's APP SERVERs speak S3/Swift to CEPH OBJECT GATEWAYs (RGW), which store data in the CEPH STORAGE CLUSTER (RADOS).]

Page 45: Ceph Overview for Distributed Computing Denver Meetup

MULTI-SITE OBJECT STORAGE

[Diagram: two sites, each with a WEB APPLICATION, APP SERVER, and CEPH OBJECT GATEWAY (RGW) in front of its own CEPH STORAGE CLUSTER (US-EAST and EU-WEST).]

Page 46: Ceph Overview for Distributed Computing Denver Meetup

ARCHIVE / COLD STORAGE

[Diagram: an APPLICATION writes to a replicated CACHE POOL backed by an erasure-coded BACKING POOL, all within one CEPH STORAGE CLUSTER.]

Page 47: Ceph Overview for Distributed Computing Denver Meetup

ERASURE CODING

[Diagram: in a REPLICATED POOL, an OBJECT is stored as full COPIES; in an ERASURE CODED POOL, it is split into data chunks (1, 2, 3, 4) plus coding chunks (X, Y).]

Replicated pool:
- Full copies of stored objects
- Very high durability
- Quicker recovery

Erasure coded pool:
- One copy plus parity
- Cost-effective durability
- Expensive recovery
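The parity idea can be shown with the simplest possible code: k data chunks plus a single XOR parity chunk. Ceph's real erasure-code plugins (e.g. jerasure) generalize this to multiple coding chunks, the X and Y above. A toy sketch:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int = 4):
    """Split into k equal-size data chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)                                   # ceil division
    chunks = [data[i*size:(i+1)*size].ljust(size, b'\0') for i in range(k)]
    return chunks, reduce(xor, chunks)

def recover(chunks, parity, lost: int):
    """Rebuild one lost chunk by XOR-ing the survivors with the parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor, survivors + [parity])

chunks, parity = encode(b'ceph can rebuild me', k=4)
assert recover(chunks, parity, lost=2) == chunks[2]
```

Note why recovery is "expensive": rebuilding one lost chunk requires reading all the surviving chunks, whereas a replicated pool just copies an intact replica.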

Page 48: Ceph Overview for Distributed Computing Denver Meetup

ERASURE CODING: HOW DOES IT WORK?

[Diagram: an OBJECT is split into four data chunks (1-4) and two coding chunks (X, Y), each written to a different OSD in the ERASURE CODED POOL of the CEPH STORAGE CLUSTER.]

Page 49: Ceph Overview for Distributed Computing Denver Meetup

CACHE TIERING

[Diagram: a CEPH CLIENT reads and writes through a CACHE tier in WRITEBACK MODE, which sits in front of a replicated BACKING POOL in the CEPH STORAGE CLUSTER.]

Page 50: Ceph Overview for Distributed Computing Denver Meetup

WEBSCALE APPLICATIONS

[Diagram: a WEB APPLICATION's APP SERVERs talk to the CEPH STORAGE CLUSTER (RADOS) directly over the native protocol (librados).]

Page 51: Ceph Overview for Distributed Computing Denver Meetup

ARCHIVE / COLD STORAGE

[Archive / cold storage diagram repeated; see page 46.]

Page 52: Ceph Overview for Distributed Computing Denver Meetup

DATABASES

[Diagram: MYSQL / MARIADB servers access a CEPH BLOCK DEVICE (RBD) through the LINUX KERNEL (krbd), which speaks the native protocol to the CEPH STORAGE CLUSTER (RADOS).]

Page 53: Ceph Overview for Distributed Computing Denver Meetup

Future Ceph Roadmap

Page 54: Ceph Overview for Distributed Computing Denver Meetup

CEPH ROADMAP

Releases: Hammer (current release), Infernalis, J-Release

On the roadmap: NewStore; Object Expiration; Object Versioning; Stable CephFS?; an alternative web server for RGW; Performance Improvements in every release; ???

Page 55: Ceph Overview for Distributed Computing Denver Meetup

NEXT STEPS

Page 56: Ceph Overview for Distributed Computing Denver Meetup

NEXT STEPS: WHAT NOW?

Getting Started with Ceph
• Read about the latest version of Ceph: http://ceph.com/docs
• Deploy a test cluster using ceph-deploy: http://ceph.com/qsg
• Deploy a test cluster on the AWS free tier using Juju: http://ceph.com/juju
• Ansible playbooks for Ceph: https://www.github.com/alfredodeza/ceph-ansible

Getting Involved with Ceph
• Most discussion happens on the ceph-devel and ceph-users mailing lists. Join or view archives at http://ceph.com/list
• IRC is a great place to get help (or to help others!): #ceph and #ceph-devel. Details and logs at http://ceph.com/irc
• Download the code: http://www.github.com/ceph
• The tracker manages bugs and feature requests. Register and start looking around at http://tracker.ceph.com
• Doc updates and suggestions are always welcome. Learn how to contribute docs at http://ceph.com/docwriting

Page 57: Ceph Overview for Distributed Computing Denver Meetup

Thank You

Page 58: Ceph Overview for Distributed Computing Denver Meetup

extras

● metrics.ceph.com
● http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at