Ceph Intro & Architectural Overview Ross Turk VP Community, Inktank
Transcript
Page 1: Ceph Intro and Architectural Overview by Ross Turk

Ceph Intro & Architectural Overview
Ross Turk
VP Community, Inktank

Page 2: Ceph Intro and Architectural Overview by Ross Turk

2

ME ME ME ME ME ME.
I made a slide today. It’s all about me.

Ross Turk
VP Community, Inktank

[email protected] | @rossturk

inktank.com | ceph.com

Page 3: Ceph Intro and Architectural Overview by Ross Turk

3

CLOUD SERVICES

COMPUTE NETWORK STORAGE

the future of storage™

Page 4: Ceph Intro and Architectural Overview by Ross Turk

4

HUMAN

COMPUTER TAPE

HUMAN

ROCK

HUMAN

INK

PAPER

Page 5: Ceph Intro and Architectural Overview by Ross Turk

5

HUMAN

COMPUTER TAPE

Page 6: Ceph Intro and Architectural Overview by Ross Turk

6

YOU

TECHNOLOGY

YOUR DATA

Page 7: Ceph Intro and Architectural Overview by Ross Turk

7

How Much Store Things All Human History?!

writing

paper

computers

distributed storage

cloud computing

gaaaaaaaaahhhh!!!!!!

carving

Page 8: Ceph Intro and Architectural Overview by Ross Turk

8

[Diagram: three HUMANs, one COMPUTER, seven DISKs]

Page 9: Ceph Intro and Architectural Overview by Ross Turk

9

[Diagram: a crowd of HUMANs, one COMPUTER, twelve DISKs]

Page 10: Ceph Intro and Architectural Overview by Ross Turk

10

[Diagram: a crowd of HUMANs, twelve DISKs, and one GIANT SPENDY COMPUTER]

Page 11: Ceph Intro and Architectural Overview by Ross Turk

11

[Diagram: three HUMANs and twelve COMPUTER + DISK pairs]

Page 12: Ceph Intro and Architectural Overview by Ross Turk

12

[Diagram: three HUMANs and twelve COMPUTER + DISK pairs]

Page 13: Ceph Intro and Architectural Overview by Ross Turk

13

[Diagram: eight COMPUTER + DISK pairs, labelled “STORAGE APPLIANCE”]

Page 14: Ceph Intro and Architectural Overview by Ross Turk

14

Storage Appliance
Michael Moll, Wikipedia / CC BY-SA 2.0

Page 15: Ceph Intro and Architectural Overview by Ross Turk

15

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram: four COMPUTER + DISK pairs]

34% of 2012 revenue (5.2 billion dollars)

1.1 billion in R&D spent in 2012

1.6 million square feet of manufacturing space

Page 16: Ceph Intro and Architectural Overview by Ross Turk

16

1010100110

1010110011

1001100101

1001101011

1001100111

1001010011

THE CLOUD

Page 17: Ceph Intro and Architectural Overview by Ross Turk

17

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram: four COMPUTER + DISK pairs]

STANDARD HARDWARE

[Diagram: four COMPUTER + DISK pairs]

OPEN SOURCE SOFTWARE

ENTERPRISE SUBSCRIPTION (optional)

Page 18: Ceph Intro and Architectural Overview by Ross Turk

18

Page 19: Ceph Intro and Architectural Overview by Ross Turk

19

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

SELF-MANAGING

philosophy design

Page 20: Ceph Intro and Architectural Overview by Ross Turk

20

8 years & 20,000 commits later…

Page 21: Ceph Intro and Architectural Overview by Ross Turk

21

Page 22: Ceph Intro and Architectural Overview by Ross Turk

22

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 23: Ceph Intro and Architectural Overview by Ross Turk

23

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 24: Ceph Intro and Architectural Overview by Ross Turk

24

[Diagram: five OSDs, each on top of a file system (FS: btrfs, xfs, or ext4) on its own DISK, alongside three monitors (M)]

Page 25: Ceph Intro and Architectural Overview by Ross Turk

25

M

M

M

HUMAN

Page 26: Ceph Intro and Architectural Overview by Ross Turk

26

Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients

OSDs:
• 10s to 10000s in a cluster
• One per disk
• (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
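
Why a "small, odd number" of monitors? A minimal sketch, assuming that consensus needs a strict majority of monitors to agree. The function names are hypothetical and only illustrate the arithmetic; they are not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` voters (assumed consensus rule)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a majority still remains."""
    return monitors - quorum_size(monitors)


# Compare a few cluster sizes: monitors, majority needed, failures survived.
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption a fourth monitor buys nothing over three: both survive exactly one failure, which is one way to read the "odd number" advice.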

Page 27: Ceph Intro and Architectural Overview by Ross Turk

27

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 28: Ceph Intro and Architectural Overview by Ross Turk

28

LIBRADOS

M

M

M

APP

socket

Page 29: Ceph Intro and Architectural Overview by Ross Turk

LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead

Page 30: Ceph Intro and Architectural Overview by Ross Turk

30

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 31: Ceph Intro and Architectural Overview by Ross Turk

31

M

M

M

LIBRADOS

RADOSGW

APP

socket

REST

Page 32: Ceph Intro and Architectural Overview by Ross Turk

32

RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications

Page 33: Ceph Intro and Architectural Overview by Ross Turk

33

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

Page 34: Ceph Intro and Architectural Overview by Ross Turk

34

M

M

M

VM

LIBRADOS

LIBRBD

VIRTUALIZATION CONTAINER

Page 35: Ceph Intro and Architectural Overview by Ross Turk

35

LIBRADOS

M

M

M

LIBRBD

CONTAINER

LIBRADOS

LIBRBD

CONTAINERVM

Page 36: Ceph Intro and Architectural Overview by Ross Turk

36

LIBRADOS

M

M

M

KRBD (KERNEL MODULE)

HOST

Page 37: Ceph Intro and Architectural Overview by Ross Turk

37

RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • Qemu/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
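
"Images are striped across the cluster" can be pictured as cutting the image into fixed-size objects. A minimal sketch of that idea, assuming simple striping with one fixed object size; the 4 MiB value and the function names are illustrative assumptions, not taken from the slides.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes (illustrative)


def locate(offset, object_size=OBJECT_SIZE):
    """Map a byte offset in the image to (object index, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset, length, object_size=OBJECT_SIZE):
    """Indexes of every object that a read or write of `length` bytes covers."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


# A 2-byte write that straddles the first object boundary touches two objects.
print(list(objects_touched(OBJECT_SIZE - 1, 2)))
```

Because each object is placed independently, a large image ends up spread over many storage nodes rather than living on one disk.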

Page 38: Ceph Intro and Architectural Overview by Ross Turk

38

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 39: Ceph Intro and Architectural Overview by Ross Turk

39

M

M

M

CLIENT

0110

data metadata

Page 40: Ceph Intro and Architectural Overview by Ross Turk

40

Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
• Directory hierarchy
• File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

Page 41: Ceph Intro and Architectural Overview by Ross Turk

41

What Makes Ceph Unique?
Part one: CRUSH

Page 42: Ceph Intro and Architectural Overview by Ross Turk

42

[Diagram: an APP with question marks, facing twelve storage nodes (DC)]

Page 43: Ceph Intro and Architectural Overview by Ross Turk

43

How Long Did It Take You To Find Your Keys This Morning?
azmeen, Flickr / CC BY 2.0

Page 44: Ceph Intro and Architectural Overview by Ross Turk

44

[Diagram: an APP and twelve storage nodes (DC)]

Page 45: Ceph Intro and Architectural Overview by Ross Turk

45

Dear Diary: Today I Put My Keys on the Kitchen Counter
Barnaby, Flickr / CC BY 2.0

Page 46: Ceph Intro and Architectural Overview by Ross Turk

46

[Diagram: an APP looking for F* among twelve storage nodes (DC), grouped into the ranges A-G, H-N, O-T, U-Z]

Page 47: Ceph Intro and Architectural Overview by Ross Turk

47

I Always Put My Keys on the Hook By the Door
vitamindave, Flickr / CC BY 2.0

Page 48: Ceph Intro and Architectural Overview by Ross Turk

48

HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?

Page 49: Ceph Intro and Architectural Overview by Ross Turk

49

The Answer: CRUSH!!!!!
pasukaru76, Flickr / CC SA 2.0

Page 50: Ceph Intro and Architectural Overview by Ross Turk

50

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

hash(object name) % num pg

CRUSH(pg, cluster state, rule set)
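
The two steps on this slide can be sketched as follows. This is a toy model under loud assumptions: SHA-256 stands in for whatever hash Ceph uses, and the second step is plain rendezvous (highest-random-weight) hashing, NOT the real CRUSH algorithm. It ignores the rule set and cluster topology, and every name in it is hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    """Step 2 stand-in: score every OSD for this PG and keep the best few.

    Rendezvous hashing, used here only because it is short and also needs
    no lookup table. It is not CRUSH.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


osds = [f"osd.{i}" for i in range(12)]
pg = object_to_pg("my-object", num_pg=64)
print(pg, pg_to_osds(pg, osds, replicas=3))
```

Any client that knows the object name, the number of placement groups, and the list of OSDs computes the same answer on its own, so nobody has to ask a central directory where the data lives.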

Page 51: Ceph Intro and Architectural Overview by Ross Turk

51

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

Page 52: Ceph Intro and Architectural Overview by Ross Turk

52

CRUSH
• Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
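
"Stable mapping" and "limited data migration on change" can be seen in a small experiment. As a labelled assumption, this sketch uses rendezvous hashing as a stand-in for CRUSH (it is not the real algorithm, and it skips rules, topology, and weighting). The OSD names and counts are made up for illustration.

```python
import hashlib


def score(pg: int, osd: str) -> int:
    """Deterministic pseudo-random score for a (placement group, OSD) pair."""
    return int.from_bytes(hashlib.sha256(f"{pg}:{osd}".encode()).digest()[:8], "big")


def place(pg: int, osds: list, replicas: int = 3) -> list:
    """Pick the `replicas` highest-scoring OSDs for this placement group."""
    return sorted(osds, key=lambda osd: score(pg, osd), reverse=True)[:replicas]


osds = [f"osd.{i}" for i in range(12)]
before = {pg: place(pg, osds) for pg in range(256)}

# Simulate losing one OSD and recompute every placement from scratch.
survivors = [osd for osd in osds if osd != "osd.7"]
after = {pg: place(pg, survivors) for pg in range(256)}

moved = [pg for pg in before if before[pg] != after[pg]]
affected = [pg for pg in before if "osd.7" in before[pg]]
print(f"{len(moved)} of {len(before)} placement groups changed")
```

In this model only the placement groups that actually had a replica on the lost OSD change, and each of those keeps its other two replicas. Everything else stays put even though nothing was remembered between the two runs.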

Page 53: Ceph Intro and Architectural Overview by Ross Turk

53

CLIENT

??

Page 54: Ceph Intro and Architectural Overview by Ross Turk

54

Page 55: Ceph Intro and Architectural Overview by Ross Turk

55

Page 56: Ceph Intro and Architectural Overview by Ross Turk

56

Page 57: Ceph Intro and Architectural Overview by Ross Turk

57

CLIENT

??

Page 58: Ceph Intro and Architectural Overview by Ross Turk

58

What Makes Ceph Unique?
Part two: thin provisioning

Page 59: Ceph Intro and Architectural Overview by Ross Turk

59

LIBRADOS

M

M

M

VM

LIBRBD

VIRTUALIZATION CONTAINER

Page 60: Ceph Intro and Architectural Overview by Ross Turk

60

HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?

Page 61: Ceph Intro and Architectural Overview by Ross Turk

61

144 0 0 0 0

instant copy

= 144

Page 62: Ceph Intro and Architectural Overview by Ross Turk

62

4 144

CLIENT

write

write

write

= 148

write

Page 63: Ceph Intro and Architectural Overview by Ross Turk

63

4 144

CLIENT

read

read

read

= 148
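
The 144 → 148 arithmetic on these slides is copy-on-write cloning. A minimal in-memory sketch, assuming whole-object writes and a parent image that is never modified after cloning; the `Image` class is hypothetical and is not the librbd API.

```python
class Image:
    """Toy image: a sparse map of object index -> data, with an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.objects = {}  # only the objects this image stores itself

    def clone(self):
        """Instant copy: the clone starts empty and just points at its parent."""
        return Image(parent=self)

    def write(self, index, data):
        """Writes always land in this image, never in the parent."""
        self.objects[index] = data

    def read(self, index):
        """Reads fall through to the parent for objects never written here."""
        if index in self.objects:
            return self.objects[index]
        if self.parent is not None:
            return self.parent.read(index)
        return None


def stored_objects(*images):
    """Total number of objects physically stored across the given images."""
    return sum(len(image.objects) for image in images)


golden = Image()
for i in range(144):
    golden.write(i, b"base")

vm_disk = golden.clone()
print(stored_objects(golden, vm_disk))   # 144: the clone costs nothing yet

for i in range(4):
    vm_disk.write(i, b"new")
print(stored_objects(golden, vm_disk))   # 148: only the 4 changed objects are new
```

Reads of untouched objects come from the parent, so a thousand clones of one golden image start out costing roughly one image's worth of storage.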

Page 64: Ceph Intro and Architectural Overview by Ross Turk

64

What Makes Ceph Unique?
Part three: clustered metadata

Page 65: Ceph Intro and Architectural Overview by Ross Turk

65

POSIX Filesystem Metadata
Barnaby, Flickr / CC BY 2.0

Page 66: Ceph Intro and Architectural Overview by Ross Turk

66

M

M

M

CLIENT

0110

Page 67: Ceph Intro and Architectural Overview by Ross Turk

67

M

M

M

Page 68: Ceph Intro and Architectural Overview by Ross Turk

68

one tree

three metadata servers

??

Page 69: Ceph Intro and Architectural Overview by Ross Turk

69

Page 70: Ceph Intro and Architectural Overview by Ross Turk

70

Page 71: Ceph Intro and Architectural Overview by Ross Turk

71

Page 72: Ceph Intro and Architectural Overview by Ross Turk

72

Page 73: Ceph Intro and Architectural Overview by Ross Turk

73

DYNAMIC SUBTREE PARTITIONING
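
One way to picture "one tree, three metadata servers": each server is responsible for some subtrees, and responsibility can be handed over when the load changes. The sketch below is a rough illustration only. The `SubtreeMap` class, the rank numbers, and the paths are all hypothetical, and how Ceph really decides when to move a subtree is not modelled.

```python
class SubtreeMap:
    """Toy map from subtree roots to the metadata server (rank) in charge."""

    def __init__(self, root_rank=0):
        self.authority = {"/": root_rank}

    def delegate(self, subtree, rank):
        """Hand a subtree (and everything below it) to another server."""
        self.authority[subtree.rstrip("/") or "/"] = rank

    def rank_for(self, path):
        """Walk up from `path` until a delegated subtree root is found."""
        if not path.startswith("/"):
            raise ValueError("path must be absolute")
        node = path.rstrip("/") or "/"
        while node not in self.authority:
            node = node.rsplit("/", 1)[0] or "/"
        return self.authority[node]


tree = SubtreeMap()
tree.delegate("/home", 1)
tree.delegate("/home/alice", 2)

print(tree.rank_for("/etc/hosts"))             # 0: still owned by the root server
print(tree.rank_for("/home/bob/todo.txt"))     # 1: inside /home
print(tree.rank_for("/home/alice/notes.txt"))  # 2: the most specific subtree wins

# "Dynamic": if /home/alice cools down, give it back to the /home server.
tree.delegate("/home/alice", 1)
print(tree.rank_for("/home/alice/notes.txt"))  # 1
```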

Page 74: Ceph Intro and Architectural Overview by Ross Turk

74

Getting Started With Ceph

Read about the latest version of Ceph.
• The latest stuff is always at http://ceph.com/get

Deploy a test cluster using ceph-deploy.
• Read the quick-start guide at http://ceph.com/qsg

Deploy a test cluster on the AWS free-tier using Juju.
• Read the guide at http://ceph.com/juju

Read the rest of the docs!
• Find docs for the latest release at http://ceph.com/docs

Have a working cluster up quickly.

Page 75: Ceph Intro and Architectural Overview by Ross Turk

75

Getting Involved With Ceph

Most project discussion happens on the mailing list.
• Join or view archives at http://ceph.com/list

IRC is a great place to get help (or help others!)
• Find details and historical logs at http://ceph.com/irc

The tracker manages our bugs and feature requests.
• Register and start looking around at http://ceph.com/tracker

Doc updates and suggestions are always welcome.
• Learn how to contribute docs at http://ceph.com/docwriting

Help build the best storage system around!

Page 76: Ceph Intro and Architectural Overview by Ross Turk

76

Ceph Cuttlefish (v0.61.x)

1. New ceph-deploy provisioning tool
2. New Chef cookbooks
3. Fully-tested packages for RHEL (in EPEL)
4. RGW authentication management API
5. RADOS pool quotas
6. New ceph df
7. RBD incremental snapshots

Best Ceph ever.

Page 77: Ceph Intro and Architectural Overview by Ross Turk

77

Questions?

Ross Turk
VP Community, Inktank

[email protected] | @rossturk

inktank.com | ceph.com