Ceph Intro and Architectural Overview by Ross Turk

Post on 15-Jan-2015


Transcript

Ceph Intro & Architectural Overview

Ross Turk

VP Community, Inktank

2

ME ME ME ME ME ME.

I made a slide today. It’s all about me.

Ross Turk

VP Community, Inktank

ross@inktank.com

@rossturk

inktank.com | ceph.com

3

CLOUD SERVICES

COMPUTE NETWORK STORAGE

the future of storage™

4

HUMAN

COMPUTER TAPE

HUMAN

ROCK

HUMAN

INK

PAPER

5

HUMAN

COMPUTER TAPE

6

YOU

TECHNOLOGY

YOUR DATA

7

How Much Store Things All Human History?!

writing

paper

computers

distributed storage

cloud computing

gaaaaaaaaahhhh!!!!!!

carving

8

[Diagram: a few HUMANs, one COMPUTER, several DISKs]

9

[Diagram: many HUMANs, one COMPUTER, many DISKs]

10

[Diagram: many HUMANs, one GIANT SPENDY COMPUTER, many DISKs]

11

[Diagram: three HUMANs and many COMPUTER + DISK pairs]

12

[Diagram: three HUMANs and many COMPUTER + DISK pairs]

13

[Diagram: several COMPUTER + DISK pairs labeled “STORAGE APPLIANCE”]

14

Storage Appliance

Michael Moll, Wikipedia / CC BY-SA 2.0

15

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram: COMPUTER + DISK pairs]

34% of 2012 revenue (5.2 billion dollars)

1.1 billion in R&D spent in 2012

1.6 million square feet of manufacturing space

16

1010100110

1010110011

1001100101

1001101011

1001100111

1001010011

THE CLOUD

17

SUPPORT AND MAINTENANCE

PROPRIETARY SOFTWARE

PROPRIETARY HARDWARE

[Diagram: COMPUTER + DISK pairs]

STANDARD HARDWARE

[Diagram: COMPUTER + DISK pairs]

OPEN SOURCE SOFTWARE

ENTERPRISE SUBSCRIPTION (optional)

18

19

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

SELF-MANAGING

philosophy design

20

8 years & 20,000 commits later…

21

22

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

23


24

[Diagram: five OSDs, each on top of a filesystem (FS: btrfs, xfs, or ext4) on top of a DISK, alongside three monitors (M)]

25

M

M

M

HUMAN

26

Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients

OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
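Why a small, odd number of monitors? A minimal sketch, assuming the monitors need a strict majority to reach consensus. The function names are illustrative and are not Ceph APIs.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Columns: monitor count, quorum size, failures tolerated.
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, going from three monitors to four adds a machine without adding fault tolerance, which is one way to read the "odd number" advice.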

27


28

LIBRADOS

M

M

M

APP

socket

LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
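As a rough illustration of direct access from an application, here is a sketch using the Python `rados` binding. The config path `/etc/ceph/ceph.conf`, the pool name `data`, and the object name are assumptions, and `main()` only works against a running cluster.

```python
def store_and_fetch(ioctx, name: str, payload: bytes) -> bytes:
    """Write one object and read it back through an open I/O context."""
    ioctx.write_full(name, payload)          # replace the whole object
    return ioctx.read(name, len(payload))    # read it back


def main() -> None:
    import rados  # Python binding that ships with Ceph

    # Assumed config path and pool name; adjust for your cluster.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("data")
        try:
            print(store_and_fetch(ioctx, "hello", b"hello ceph"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    main()
```

The application talks to the cluster over the library's own socket connection, with no REST gateway in between.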

30


31

M

M

M

LIBRADOS

RADOSGW

APP

socket

REST

32

RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications

33


34

M

M

M

VM

LIBRADOS

LIBRBD

VIRTUALIZATION CONTAINER

35

LIBRADOS

M

M

M

LIBRBD

CONTAINER

LIBRADOS

LIBRBD

CONTAINER

VM

36

LIBRADOS

M

M

M

KRBD (KERNEL MODULE)

HOST

37

RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • Qemu/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
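To make "images are striped across the cluster" concrete, here is a minimal sketch that cuts an image into fixed-size objects and maps a byte offset to an object. The 4 MiB object size is an assumption, not a figure from the slides, and the function names are hypothetical.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB); not from the slides


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside it)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_for_range(offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> List[int]:
    """Indices of every object touched by an I/O of `length` bytes."""
    if length <= 0:
        return []
    first, _ = locate(offset, object_size)
    last, _ = locate(offset + length - 1, object_size)
    return list(range(first, last + 1))
```

Because each object is placed independently, a large image ends up spread over many storage nodes instead of sitting on one disk.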

38


39

M

M

M

CLIENT

0110

data

metadata

40

Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

41

What Makes Ceph Unique?

Part one: CRUSH

42

APP ??

[Diagram: twelve nodes labeled DC]

43

How Long Did It Take You To Find Your Keys This Morning?

azmeen, Flickr / CC BY 2.0

44

APP

[Diagram: twelve nodes labeled DC]

45

Dear Diary: Today I Put My Keys on the Kitchen Counter

Barnaby, Flickr / CC BY 2.0

46

APP

[Diagram: twelve nodes labeled DC]

A-G

H-N

O-T

U-Z

F*

47

I Always Put My Keys on the Hook By the Door

vitamindave, Flickr / CC BY 2.0

48

HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?

49

The Answer: CRUSH!!!!!

pasukaru76, Flickr / CC SA 2.0

50

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

hash(object name) % num pg

CRUSH(pg, cluster state, rule set)
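The two steps on this slide can be sketched as follows. This is not the real CRUSH algorithm: rendezvous (highest-random-weight) hashing stands in for it, MD5 stands in for Ceph's own hash, and the OSD names are hypothetical.

```python
import hashlib
from typing import List


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (a stand-in for Ceph's own hash function)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg: int, osds: List[str], replicas: int) -> List[str]:
    """Step 2, with rendezvous hashing standing in for CRUSH.

    Every OSD gets a pseudo-random score for this PG and the top
    `replicas` scores win. Any client holding the same OSD list computes
    the same answer, so no lookup table is needed.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"),
                    reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    osds = [f"osd.{i}" for i in range(12)]  # hypothetical cluster state
    pg = object_to_pg("my-object", num_pg=128)
    print(pg, pg_to_osds(pg, osds, replicas=3))
```

The point of the sketch is the shape of the computation: the client calculates where an object lives instead of asking a central directory.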

51

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

52

CRUSH
• Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
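"Stable mapping" and "limited data migration on change" can be illustrated with a toy placement function. The sketch below again uses rendezvous hashing as a simplified stand-in for CRUSH (no rules, topology, or weights), with hypothetical OSD names, and lists which placement groups change when one OSD disappears.

```python
import hashlib
from typing import List


def place(pg: int, osds: List[str], replicas: int = 3) -> List[str]:
    """Rendezvous-hash placement, a simplified stand-in for CRUSH."""
    def score(osd: str) -> int:
        digest = hashlib.md5(f"{pg}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(osds, key=score, reverse=True)[:replicas]


def moved_pgs(num_pg: int, before: List[str], after: List[str]) -> List[int]:
    """PGs whose replica set differs between two cluster states."""
    return [pg for pg in range(num_pg)
            if place(pg, before) != place(pg, after)]


if __name__ == "__main__":
    before = [f"osd.{i}" for i in range(10)]      # hypothetical OSD names
    after = [o for o in before if o != "osd.7"]   # osd.7 fails
    moved = moved_pgs(256, before, after)
    print(f"{len(moved)} of 256 PGs remapped")
```

In this stand-in, only the PGs that had a replica on the failed OSD are remapped; every other PG keeps its placement.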

53

CLIENT

??

54

55

56

57

CLIENT

??

58

What Makes Ceph Unique?

Part two: thin provisioning

59

LIBRADOS

M

M

M

VM

LIBRBD

VIRTUALIZATION CONTAINER

60

HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?

61

144 0 0 0 0

instant copy

= 144

62

4

144

CLIENT

write

write

write

= 148

write

63

4

144

CLIENT

read

read

read

= 148
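The arithmetic on the last three slides (144 after an instant copy, 148 after a few writes) can be reproduced with a toy copy-on-write image. This is a sketch with a hypothetical `Image` class. It treats the slide's numbers as block counts, which is an assumption, and it treats the parent as read-only once cloned.

```python
from typing import Dict, Optional


class Image:
    """A block image; a clone stores only blocks written after the copy."""

    def __init__(self, parent: Optional["Image"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def clone(self) -> "Image":
        """Instant copy: no data is duplicated, only a parent reference."""
        return Image(parent=self)

    def write(self, index: int, data: bytes) -> None:
        """Copy-on-write: the new block lands in this image, not the parent."""
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        """Serve from local blocks first, then fall through to the parent."""
        if index in self.blocks:
            return self.blocks[index]
        if self.parent is not None:
            return self.parent.read(index)
        return b""  # never-written block

    def stored_blocks(self) -> int:
        """Number of blocks this image itself occupies."""
        return len(self.blocks)


if __name__ == "__main__":
    golden = Image()
    for i in range(144):
        golden.write(i, b"base")
    clones = [golden.clone() for _ in range(4)]   # four clones, zero new blocks
    for i in range(4):
        clones[0].write(i, b"new")                # client writes four blocks
    total = golden.stored_blocks() + sum(c.stored_blocks() for c in clones)
    print(total)  # 144 + 4 = 148
```

Reads of blocks the clone never wrote are served from the parent, which is why four clones cost nothing until a client writes to them.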

64

What Makes Ceph Unique?

Part three: clustered metadata

65

POSIX Filesystem Metadata

Barnaby, Flickr / CC BY 2.0

66

M

M

M

CLIENT

0110

67

M

M

M

68

one tree

three metadata servers

??

69

70

71

72

73

DYNAMIC SUBTREE PARTITIONING
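One way to picture splitting a single tree across several metadata servers is a map from directory subtrees to servers, resolved by longest matching prefix. The sketch below is a simplified illustration only: the server names, the map, and the `authority` function are hypothetical, and the rebalancing that makes the partitioning "dynamic" is reduced here to editing the map by hand.

```python
from typing import Dict


def authority(path: str, subtrees: Dict[str, str]) -> str:
    """Return the metadata server responsible for `path`.

    `subtrees` maps a directory to the server that owns everything beneath
    it; the deepest matching directory wins. "/" must be present.
    """
    parts = [p for p in path.split("/") if p]
    # Try the longest prefix first, then walk up toward the root.
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in subtrees:
            return subtrees[prefix]
    raise KeyError("no subtree covers " + path)


if __name__ == "__main__":
    # Hypothetical split of one tree across three metadata servers.
    subtrees = {"/": "mds.a", "/home": "mds.b", "/home/ross/build": "mds.c"}
    print(authority("/home/ross/notes.txt", subtrees))
    # Handing a busy subtree to another server is just a change to the map.
    subtrees["/var/log"] = "mds.c"
    print(authority("/var/log/syslog", subtrees))
```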

74

Getting Started With Ceph

Read about the latest version of Ceph.
• The latest stuff is always at http://ceph.com/get

Deploy a test cluster using ceph-deploy.
• Read the quick-start guide at http://ceph.com/qsg

Deploy a test cluster on the AWS free-tier using Juju.
• Read the guide at http://ceph.com/juju

Read the rest of the docs!
• Find docs for the latest release at http://ceph.com/docs

Have a working cluster up quickly.

75

Getting Involved With Ceph

Most project discussion happens on the mailing list.
• Join or view archives at http://ceph.com/list

IRC is a great place to get help (or help others!)
• Find details and historical logs at http://ceph.com/irc

The tracker manages our bugs and feature requests.
• Register and start looking around at http://ceph.com/tracker

Doc updates and suggestions are always welcome.
• Learn how to contribute docs at http://ceph.com/docwriting

Help build the best storage system around!

76

Ceph Cuttlefish (v0.61.x)

1. New ceph-deploy provisioning tool
2. New Chef cookbooks
3. Fully-tested packages for RHEL (in EPEL)
4. RGW authentication management API
5. RADOS pool quotas
6. New ceph df
7. RBD incremental snapshots

Best Ceph ever.

77

Questions?

Ross Turk

VP Community, Inktank

ross@inktank.com

@rossturk

inktank.com | ceph.com
