MESSENGER V2: CEPH WIRE PROTOCOL REVISITED

Ricardo Dias | FOSDEM'19 - Software Defined Storage devroom

OUTLINE

What is the Ceph messenger

Messenger API

Messenger V1 Limitations

Messenger V2 Protocol

WHAT IS THE CEPH MESSENGER?

It's a wire-protocol specification;

and also, the corresponding software implementation

Invisible to end-users

  Unless it's not working properly

The messenger knows nothing about the Ceph distributed algorithms and the protocols of specific daemons

WHERE CAN WE FIND IT?

(Slide figure not captured in the transcript.)

CEPH MESSENGER (1/2)

Messenger is used as a "small" communication library by the other Ceph libraries/daemons

It can be used as both server and client

  Ceph daemons (osd, mon, mgr, mds) act as both servers and clients

  Ceph clients (rbd, rgw) act as clients

CEPH MESSENGER (2/2)

Abstracts the transport protocol of the physical connection used between machines

  Posix Sockets

  RDMA

  DPDK

Reliable delivery of messages with "exactly-once" semantics

Automatic handling of temporary connection failures
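One common way to obtain "exactly-once" delivery across reconnects is to number messages per session and have the receiver drop anything it has already delivered. The sketch below illustrates only that idea; `InboundSession` and its members are hypothetical names, not Ceph classes, and the real messenger tracks considerably more state.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical receiver-side state for one session (not a Ceph class).
struct InboundSession {
  uint64_t last_delivered_seq = 0;     // highest sequence number handed to the application
  std::vector<std::string> delivered;  // stands in for the upper-layer dispatcher

  // Returns true if the message was delivered, false if it was a duplicate.
  bool on_message(uint64_t seq, const std::string& body) {
    if (seq <= last_delivered_seq) {
      return false;  // already seen, e.g. a resend after a reconnect
    }
    last_delivered_seq = seq;
    delivered.push_back(body);
    return true;
  }
};
```

A sender that resends everything not yet acknowledged, paired with a receiver like this one, delivers each message to the application at most once while still recovering from dropped connections.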

CEPH MESSENGER API

    class Messenger {
      int start();
      int bind(const entity_addr_t& bind_addr);
      Connection *get_connection(const entity_inst_t& dest);

      // Dispatcher
      void add_dispatcher_head(Dispatcher *d);

      // server address
      entity_addr_t get_myaddr();
      int get_mytype();

      // Policy
      void set_default_policy(Policy p);
      void set_policy(int type, Policy p);
    };

    class Connection {
      bool is_connected();
      int send_message(Message *m);
      void send_keepalive();
      void mark_down();
      entity_addr_t get_peer_addr() const;
      int get_peer_type() const;
    };

Core methods (highlighted on the slide): Messenger::get_connection(), Messenger::add_dispatcher_head(), Connection::send_message(), Connection::mark_down()

    class Dispatcher {
      // Message handling
      bool ms_can_fast_dispatch(const Message *m) const;
      void ms_fast_dispatch(Message *m);
      bool ms_dispatch(Message *m);

      // Connection handling
      void ms_handle_connect(Connection *con);
      void ms_handle_fast_connect(Connection *con);
      void ms_handle_accept(Connection *con);
      void ms_handle_fast_accept(Connection *con);
      bool ms_handle_reset(Connection *con);
      void ms_handle_remote_reset(Connection *con);
      bool ms_handle_refused(Connection *con);

      // Authorization handling
      bool ms_get_authorizer(int peer_type, AuthAuthorizer **a);
      bool ms_handle_authentication(Connection *con);
    };

Core methods (highlighted on the slide): ms_dispatch(), ms_handle_accept(), ms_get_authorizer(), ms_handle_authentication()
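To make the dispatcher idea concrete, here is a minimal sketch of how a messenger might offer an incoming message to its registered dispatchers in turn until one consumes it. All types below (`Message`, `MiniMessenger`, `TypeFilterDispatcher`) are simplified stand-ins written for this example; only the method names `ms_dispatch` and `add_dispatcher_head` come from the slides, and the real Ceph classes carry far more behaviour.

```cpp
#include <deque>
#include <string>

// Stand-in types for illustration only; not the real Ceph definitions.
struct Message {
  int type;
  std::string body;
};

struct Dispatcher {
  virtual ~Dispatcher() = default;
  // Return true when the message was consumed by this dispatcher.
  virtual bool ms_dispatch(Message* m) = 0;
};

struct MiniMessenger {
  std::deque<Dispatcher*> dispatchers;

  // New dispatchers go to the front of the chain.
  void add_dispatcher_head(Dispatcher* d) { dispatchers.push_front(d); }

  // Offer the message to each dispatcher until one accepts it.
  bool deliver(Message* m) {
    for (Dispatcher* d : dispatchers) {
      if (d->ms_dispatch(m)) return true;
    }
    return false;  // nobody handled it
  }
};

// Hypothetical dispatcher that only handles one message type.
struct TypeFilterDispatcher : Dispatcher {
  int wanted;
  int handled = 0;
  explicit TypeFilterDispatcher(int t) : wanted(t) {}
  bool ms_dispatch(Message* m) override {
    if (m->type != wanted) return false;
    ++handled;
    return true;
  }
};
```

In this sketch a daemon would register one dispatcher per subsystem and let each one decline messages it does not understand.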

MESSENGER V1 WIRE PROTOCOL

The first wire-protocol of Ceph

No extensibility at an early stage of the protocol

No data authenticity supported

No data encryption supported

Limited support for different authentication protocols

No strict structure for protocol internal messages

MESSENGER V2 WIRE PROTOCOL (1/2)

Available by default on the IANA-assigned port 3300 on Ceph Monitors

Messenger V1 will still be available through port 6789

Only Ceph Nautilus userspace libraries support V2

  Ceph kernel modules still talk V1

Still in development, as Nautilus has not been released yet

MESSENGER V2 WIRE PROTOCOL (2/2)

Complete redesign and implementation

Extensible protocol

  A different path can be taken at a very early stage of the protocol

No limitations on the authentication protocols used

Encryption-on-the-wire support

MESSENGER V2 SPECIFICATION

Actors:

  Connector

  Accepter

Phases:

  1. Banner Exchange

  2. Authentication

  3. Session Handshake

  4. Message Exchange

MESSAGE FRAME

    struct frame {
      uint32_t frame_len;  // 4 bytes
      uint32_t tag;        // 4 bytes
      char payload[frame_len - 4];
    };

    struct encrypted_frame {
      uint32_t frame_len;
      uint32_t tag;
      char encrypted_payload[frame_len - 4];
    };
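The frame layout above can be exercised with a small encoder and decoder. This is a sketch under stated assumptions: integers are written little-endian (the slide does not specify byte order), `frame_len` counts the 4-byte tag plus the payload (as implied by `payload[frame_len - 4]`), and the names `Frame`, `encode_frame` and `decode_frame` are invented for this example.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of a frame (not a Ceph type).
struct Frame {
  uint32_t tag;
  std::string payload;
};

// Append a 32-bit value in little-endian order (byte order is an assumption).
static void put_le32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

static uint32_t get_le32(const uint8_t* p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

std::vector<uint8_t> encode_frame(const Frame& f) {
  std::vector<uint8_t> out;
  // frame_len covers the tag (4 bytes) plus the payload.
  put_le32(out, static_cast<uint32_t>(f.payload.size() + 4));
  put_le32(out, f.tag);
  out.insert(out.end(), f.payload.begin(), f.payload.end());
  return out;
}

// Returns std::nullopt when the buffer is too short or the length is invalid.
std::optional<Frame> decode_frame(const std::vector<uint8_t>& buf) {
  if (buf.size() < 8) return std::nullopt;
  const uint32_t frame_len = get_le32(buf.data());
  if (frame_len < 4 || buf.size() - 4 < frame_len) return std::nullopt;
  Frame f;
  f.tag = get_le32(buf.data() + 4);
  f.payload.assign(buf.begin() + 8, buf.begin() + 4 + frame_len);
  return f;
}
```

Length-prefixing lets a receiver read exactly 4 bytes, learn how much more to expect, and only then interpret the tag, which is what makes the same parser usable for every phase of the protocol.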

1. BANNER EXCHANGE

connector <-> accepter: connection established

connector <-> accepter: banner (sent by each side)

  We can change the behavior of the protocol at this point based on the supported/required features

connector <-> accepter: hello (sent by each side)

    struct banner {
      char banner[8];  // "ceph v2\n"
      uint16_t payload_len;
      struct banner_payload payload;
    };

    struct banner_payload {
      uint64_t supported_features;
      uint64_t required_features;
    };

    struct hello {
      uint8_t entity_type;
      entity_addr_t peer_address;
    };
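The feature negotiation hinted at in the banner can be sketched as a pair of bitmask checks. This is an illustration, not the Ceph implementation: the compatibility rule (each side must support everything the other requires) is an assumption about how `supported_features` and `required_features` are meant to be used, and `banner_matches` and `features_compatible` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// The 8-byte banner string from the slide.
constexpr char kBanner[] = "ceph v2\n";

// Mirrors the banner_payload struct shown on the slide.
struct BannerPayload {
  uint64_t supported_features;
  uint64_t required_features;
};

// True when the first 8 received bytes are the expected banner.
bool banner_matches(const char* received) {
  return std::memcmp(received, kBanner, 8) == 0;
}

// Assumed rule: a connection is viable only if each side supports
// every feature the other side requires.
bool features_compatible(const BannerPayload& ours, const BannerPayload& theirs) {
  const bool we_cover_them = (theirs.required_features & ~ours.supported_features) == 0;
  const bool they_cover_us = (ours.required_features & ~theirs.supported_features) == 0;
  return we_cover_them && they_cover_us;
}
```

Doing this check right after the banner is what allows later revisions to branch to a different protocol path before any authentication traffic is sent.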

2. AUTHENTICATION

connector -> accepter: auth_request

accepter -> connector: auth_bad_method

connector -> accepter: auth_request

accepter -> connector: auth_reply_more

connector -> accepter: auth_request_more

  (several rounds)

accepter -> connector: auth_done

  From this point message frames can be encrypted

    struct auth_request {
      uint32_t method;
      uint32_t preferred_modes[num_modes];
      char auth_payload[payload_len];
    };

    struct auth_bad_method {
      uint32_t method;
      int result;
      uint32_t allowed_methods[num_methods];
      uint32_t allowed_modes[num_modes];
    };

    struct auth_reply_more {
      char auth_payload[payload_len];
    };

    struct auth_request_more {
      char auth_payload[payload_len];
    };

    struct auth_done {
      uint64_t global_id;
      uint32_t mode;
      char auth_payload[payload_len];
    };
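The authentication exchange can be read as a small state machine on the connector side. The sketch below encodes only the transitions visible on the slide; the enum names and the `has_alternative_method` flag are invented for this example, and a real implementation would also validate payloads and handle timeouts.

```cpp
// Frames the connector can receive during authentication (names from the slide).
enum class AuthFrame { BadMethod, ReplyMore, Done };

// What the connector does next (hypothetical names).
enum class ConnectorAction { SendAuthRequest, SendAuthRequestMore, AuthComplete, Fail };

// Decide the connector's next step after receiving an authentication frame.
ConnectorAction on_auth_frame(AuthFrame received, bool has_alternative_method) {
  switch (received) {
    case AuthFrame::BadMethod:
      // Retry with a method from allowed_methods, if we have one.
      return has_alternative_method ? ConnectorAction::SendAuthRequest
                                    : ConnectorAction::Fail;
    case AuthFrame::ReplyMore:
      // The method needs another round trip.
      return ConnectorAction::SendAuthRequestMore;
    case AuthFrame::Done:
      // From here on, frames may be encrypted.
      return ConnectorAction::AuthComplete;
  }
  return ConnectorAction::Fail;
}
```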

3. SESSION HANDSHAKE (NEW SESSION)

connector -> accepter: client_ident

accepter -> connector: server_ident

    struct client_ident {
      entity_addrvec_t addrs;
      int64_t global_id;
      uint64_t global_seq;
      uint64_t supported_features;
      uint64_t required_features;
      uint64_t flags;
    };

    struct server_ident {
      entity_addrvec_t addrs;
      int64_t global_id;
      uint64_t global_seq;
      uint64_t supported_features;
      uint64_t required_features;
      uint64_t flags;
      uint64_t cookie;
    };

3. SESSION HANDSHAKE (RECONNECT)

connector -> accepter: reconnect

accepter -> connector: reconnect_ok

    struct reconnect {
      entity_addrvec_t addrs;
      uint64_t cookie;
      uint64_t global_seq;
      uint64_t connect_seq;
      uint64_t msg_seq;
    };

    struct reconnect_ok {
      uint64_t msg_seq;
    };
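A plausible use of the `msg_seq` carried in `reconnect_ok` is to tell the connector which messages the accepter already has, so that only the remainder is resent. The sketch below assumes exactly that meaning (the slide does not spell it out); `OutboundSession` and `Outgoing` are hypothetical types written for this example.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// A sent message that has not been acknowledged yet (hypothetical type).
struct Outgoing {
  uint64_t seq;
  std::string body;
};

// Hypothetical sender-side state for one session.
struct OutboundSession {
  uint64_t next_seq = 1;
  std::deque<Outgoing> unacked;  // kept in increasing seq order

  // Queue a message and return the sequence number assigned to it.
  uint64_t send(const std::string& body) {
    unacked.push_back({next_seq, body});
    return next_seq++;
  }

  // Forget everything the peer has confirmed up to and including acked_seq.
  void on_ack(uint64_t acked_seq) {
    while (!unacked.empty() && unacked.front().seq <= acked_seq) {
      unacked.pop_front();
    }
  }

  // Assumption: peer_msg_seq is the last sequence number the peer received.
  // Returns the messages that must be resent on the new connection.
  std::vector<Outgoing> on_reconnect_ok(uint64_t peer_msg_seq) {
    on_ack(peer_msg_seq);
    return std::vector<Outgoing>(unacked.begin(), unacked.end());
  }
};
```

Keeping unacknowledged messages queued until the peer confirms them is what lets a session survive a dropped TCP connection without losing or duplicating traffic.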

4. MESSAGE EXCHANGE

connector <-> accepter: session establishment

message

message

message

message + ack(2)

message + ack(2)

    struct message {
      __u8 tag;
      // includes last seen msg seq
      ceph_msg_header2 header;
      char payload[front_len + middle_len];
    };

    // TAGS
    CLOSE           6   // closing pipe
    MSG             7   // message
    ACK             8   // message ack
    KEEPALIVE2      14  // keepalive 2
    KEEPALIVE2_ACK  15  // keepalive 2 reply
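The tag values listed on the slide map naturally onto a scoped enum with a validating parser. In this sketch the numeric values are taken from the slide, while `parse_tag` and `reply_for` are hypothetical helpers; pairing MSG with ACK and KEEPALIVE2 with KEEPALIVE2_ACK follows the tag comments, but when a reply is actually sent is up to the implementation.

```cpp
#include <cstdint>
#include <optional>

// Tag values as listed on the slide.
enum class Tag : uint8_t {
  Close = 6,
  Msg = 7,
  Ack = 8,
  Keepalive2 = 14,
  Keepalive2Ack = 15,
};

// Reject any byte that is not one of the known tags.
std::optional<Tag> parse_tag(uint8_t raw) {
  switch (raw) {
    case 6:
    case 7:
    case 8:
    case 14:
    case 15:
      return static_cast<Tag>(raw);
    default:
      return std::nullopt;
  }
}

// Which tag, if any, answers a received tag (assumed pairing).
std::optional<Tag> reply_for(Tag t) {
  switch (t) {
    case Tag::Msg:
      return Tag::Ack;
    case Tag::Keepalive2:
      return Tag::Keepalive2Ack;
    default:
      return std::nullopt;
  }
}
```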

FRAME INTEGRITY, AUTHENTICITY, AND CONFIDENTIALITY

Integrity:

  CRC in frame header (length + tag)

  CRC in messages payload (same as in V1)

Authenticity and Confidentiality:

  Frame payload only

  Authenticity with SHA256 HMAC

  Confidentiality with AES encryption

WHERE CAN I FIND THE CODE?

Source code location: src/msg/async/ProtocolV2.cc

Specification draft: http://docs.ceph.com/docs/master/dev/msg

FUTURE FEATURES

More authentication protocols: Kerberos, ...

Connection multiplexing

New ideas and contributions are welcome

Q&A