Microservice Protocols of Interaction

Post on 15-Apr-2017

Todd L. Montgomery @toddlmontgomery

About me…

What is a Protocol?

Why should we care!?

pro·to·col noun \ˈprō-tə-ˌkȯl, -ˌkōl, -ˌkäl, -kəl\

...

3 b : a set of conventions governing the treatment and especially the formatting of data in an electronic communications system <network protocols>

...

3 a : a code prescribing strict adherence to correct etiquette and precedence (as in diplomatic exchange and in the military services) <a breach of protocol>

Protocols of Interaction

Wire Protocol, Method Calls, Shared Memory Interactions, etc.

Microservice Architectures

Forced Decoupling

via an “Asynchronous, Binary Boundary”

Forced Loose Coupling

The truth is…

Protocols can and do Couple

Protocols of Interaction are quite important!

Protocols of Interaction Matter!

The Environment

Networks, and especially the Internet, are Hostile Environments

Data can be lost, duplicated, and re-ordered!!

TCP connections can…

be closed unexpectedly

end in an unknown state

be intercepted by idiots, er Proxies

Which means Data over TCP* might be…

Duplicated

Re-Ordered

Lost

* - When connections are re-established

Don’t assume the network is reliable

https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

Case Studies

Case Study 1

Loose Ordering

Sync Requests & Responses

[Diagram: three Request/Response exchanges, one at a time]

Throughput limited by Round-Trip Time (RTT)!

Async Requests & Responses

[Diagram: three Requests sent back to back, followed by three Responses]

Throughput less limited by Round-Trip Time!

Async Requests & Responses

Correlation!

[Diagram: Request 0, Request 1, Request 2, then Response 0, Response 1, Response 2]

Aside…

Ordering is an Illusion!!

Compiler can re-order

Runtime can re-order

CPU can re-order

Ordering has to be imposed!

Correlation!

Ordering

[Diagram: Request 0, Request 1, Request 2, then Response 0, Response 1, Response 2]

Correlation!

(Valid) Re-Ordering (one of many)

[Diagram: Requests 0, 1, 2 and Responses 0, 1, 2, with the responses arriving re-ordered]

Handling the Unexpected

Request 0, Response 1

Invalid, Drop. We only know of 0. 1 is unknown!

SCTP, HTTP/2 (SPDY)

…most OSI Layer 4 protocols
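
A minimal sketch of the correlation idea from Case Study 1, assuming each request carries an integer correlation ID that the response echoes back. The `Correlator` class and its method names are hypothetical and are not taken from SCTP, HTTP/2, or any library.

```python
import itertools


class Correlator:
    """Tracks outstanding requests by correlation ID (hypothetical helper)."""

    def __init__(self):
        self._next_id = itertools.count()
        self._pending = {}  # correlation ID -> request payload

    def send(self, payload):
        """Register a request and return the ID to put on the wire."""
        cid = next(self._next_id)
        self._pending[cid] = payload
        return cid

    def on_response(self, cid, body):
        """Match a response to its request; return None to signal a drop."""
        if cid not in self._pending:
            # We never sent this ID, or it was already answered: invalid, drop.
            return None
        return self._pending.pop(cid), body
```

Responses may arrive in any order and are still matched correctly; a response for an ID that is not outstanding is simply dropped.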

Case Study 2

Can you hear me now?

Timeouts & Retries

[Diagram: Request, ACK, Processing]

Handling the unexpected

[Diagram: Request, X (loss), Timeout Interval]

[Diagram: Request, ACK, Processing, X (loss), Timeout Interval]

Retransmit at end of interval

Avoid Spurious Retransmits

[Diagram: Original, Processing…, ACK, Retransmit, Timeout Interval]

Interval = N x “typical” RTT

Account for processing delay

[Diagram: X (loss), Timeout Interval, “Average”]

Measure! But very “noisy”?

[Plot: RTT Measurement]

Variances in processing, transmission, etc.

TCP Retransmit Timeout (RTO)

Err = M - A

A <- A + g * Err

D <- D + h * (|Err| - D)

RTO = A + 4D

M = measurement, A = smoothed average, D = smoothed mean deviation,

g and h = gain constants (0 to 1)

Do you measure on a Retransmit? NO!
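
The update rules above can be sketched as a small estimator. This is an illustration under stated assumptions: the class name `RtoEstimator` is hypothetical, the default gains g = 1/8 and h = 1/4 are the values commonly used for TCP, and seeding D with half of the first measurement is an assumption. Skipping samples taken on a retransmit follows the slide (the rule is usually attributed to Karn's algorithm).

```python
class RtoEstimator:
    """Smoothed RTT and mean deviation, following the slide's formulas."""

    def __init__(self, first_rtt, g=0.125, h=0.25):
        self.a = first_rtt       # A: smoothed average
        self.d = first_rtt / 2   # D: smoothed mean deviation (assumed seed)
        self.g = g               # gain for the average, between 0 and 1
        self.h = h               # gain for the deviation, between 0 and 1

    def rto(self):
        # RTO = A + 4D
        return self.a + 4 * self.d

    def on_sample(self, m, retransmitted=False):
        """Feed measurement M; samples taken on a retransmit are ignored."""
        if retransmitted:
            # Ambiguous: the ACK may answer the original or the retransmit.
            return self.rto()
        err = m - self.a                          # Err = M - A
        self.a += self.g * err                    # A <- A + g * Err
        self.d += self.h * (abs(err) - self.d)    # D <- D + h * (|Err| - D)
        return self.rto()
```

The 4D term is what absorbs the “noisy” measurements: a jittery path inflates D and so pushes the timeout out, which helps avoid spurious retransmits.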

Does processing twice hurt?

[Diagram: Original, X (loss), Timeout Interval, Retrans, ACK; Process Once vs. Process Twice]

Are Original & Retransmit treated the same?

[Diagram: Original, X (loss), Timeout Interval, Retrans, ACK; Process Once vs. Process Twice]

TCP, SCTP, Aeron

…anything with reliability
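
One way to make “process twice” harmless is to treat the original and the retransmit as the same request. The sketch below assumes the sender reuses one request ID for both; `IdempotentReceiver` is a hypothetical name, and the unbounded result cache is a simplification that a real service would need to expire.

```python
class IdempotentReceiver:
    """Runs the handler once per request ID and replays the cached result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # request ID -> result of the first processing

    def on_request(self, request_id, payload):
        if request_id not in self._results:
            # First time this ID is seen: process once and remember the result.
            self._results[request_id] = self._handler(payload)
        # A retransmit gets the same answer without processing again.
        return self._results[request_id]
```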

Case Study 3

What I Need! When I Need It!

“Lifetime” Management

“Managing” Application Working Set

or

Service Liveness

Caching Algorithms

LRU, MRU, PLRU, RR, SLRU, LFU, …

“Liveness” is essential

[Diagram: Service A and Service B exchange Request and ACK; “Service A is Alive!”, “Service B is Alive!”]

Consequence of Processing

[Diagram: Service A and Service B exchange Keepalives; “Service A is Alive!”, “Service B is Alive!”]

Absence of Processing

RIP Route Deletion

Step 0 - route info broadcast @30 seconds

Step 1 (3 min) - Set Distance to Infinity (16)

Step 2 (+1 min) - Delete Route

Aside… RIP… aptly named

Aeron Driver Keepalive

Time of Last Activity = Shared Variable

Doesn’t need to be a message…

[Diagram: Service A and Service B exchange Bye; “Service A is gone!”, “Service B is gone!”]

Optimization, but insufficient with arbitrary failures

Liveness often exists across transient connectivity

So…

Don’t conflate transport state with liveness!

Like TCP connection state

Dead TCP connection != Dead Service

Live TCP connection != Live Service
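
A minimal sketch of liveness kept separate from transport state, assuming liveness is judged only by the time of last activity (a processed request, an ACK, or a keepalive). `LivenessTracker` and its methods are hypothetical names; the caller supplies `now` so the logic does not depend on a particular clock.

```python
class LivenessTracker:
    """Judges liveness from last activity, not from connection state."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_activity = {}  # service name -> time of last activity

    def on_activity(self, service, now):
        """Any request, ACK, or keepalive from the service counts."""
        self._last_activity[service] = now

    def on_bye(self, service):
        """Explicit goodbye: an optimization, never the only signal."""
        self._last_activity.pop(service, None)

    def is_alive(self, service, now):
        last = self._last_activity.get(service)
        return last is not None and now - last < self.timeout
```

Nothing here looks at a socket: a service whose connection dropped and came back within the timeout stays alive, and a service with an open but silent connection ages out.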

BGP, OSPF

Transports…

almost every protocol

Case Study 4

Elasti-What?

Self-Similar Behavior

[Diagram: three requesters each send Request X; the responder receives Request X, X, X]

Multiple same/similar requests at the same time

Response X, X, X

Similar Problem…

Reliable Multicast

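[Diagram: 1, 2, 3 is multicast to three receivers; each receiver loses a different packet (X)]

Non-correlated loss

[Diagram: the receivers send NAK 2, NAK 1, NAK 3; the sender receives NAK 1, 2, 3]

Request individual lost data

Retransmit 1, 2, 3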

[Diagram: 1, 2, 3 is multicast to three receivers; every receiver loses the same packet (X)]

Temporally/Spatially Correlated Loss

[Diagram: each receiver sends NAK 2; the sender receives NAK 2, 2, 2]

Multiple requests for same data

Retransmit 2, 2, 2

[Diagram: Request 2, Request 2, Request 2 arrive together as Request 2, 2, 2]

It’s a generic problem!

Overloading Responder & Network

Publish Requests

Don’t Immediately Request, Listen first

[Diagram: Timeout!, Request 2 is published; the other requesters hear it and Suppress Request]

How long to wait & listen for?

[Diagram: Timeout!, Request 2, Suppress Request]

Statistics to the Rescue!

SRM Backoff

RandomBackoff = [C1, C1+C2] * 1-way delay

Random is more than good enough
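
The listen-first rule with a random backoff can be sketched as follows. This assumes the interval is drawn uniformly from [C1, C1 + C2] and scaled by the one-way delay, as on the slide; the default constants of 1.0 and the `PendingRequest` class are hypothetical choices for illustration.

```python
import random


def srm_backoff(one_way_delay, c1=1.0, c2=1.0, rng=random):
    """Pick a wait uniformly in [C1, C1 + C2] times the one-way delay."""
    return rng.uniform(c1, c1 + c2) * one_way_delay


class PendingRequest:
    """A request that listens first and is suppressed if someone else asks."""

    def __init__(self, now, one_way_delay, rng=random):
        self.deadline = now + srm_backoff(one_way_delay, rng=rng)
        self.suppressed = False

    def on_overheard(self, now):
        # Another requester published the same request before our timeout.
        if now < self.deadline:
            self.suppressed = True

    def should_send(self, now):
        return not self.suppressed and now >= self.deadline
```

Because every requester draws its own wait, one of them usually times out first and publishes; the rest overhear that request and stay quiet.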

[Diagram: two requesters send Request 2; the responder receives Request 2, 2]

Must also shed duplicates on the responder

Response 2, 2

Shed second “Request 2” if too soon
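
Shedding on the responder can be sketched as a per-key minimum interval. The `DuplicateShedder` name and the choice of what counts as “too soon” (`min_interval`) are assumptions; the slide does not give a value.

```python
class DuplicateShedder:
    """Serves a request key at most once per min_interval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_served = {}  # request key -> time it was last served

    def should_serve(self, key, now):
        last = self._last_served.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # second "Request 2" arrived too soon: shed it
        self._last_served[key] = now
        return True
```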

SRM, PGM, Aeron

http://en.wikipedia.org/wiki/Scalable_Reliable_Multicast

http://www.eurecom.fr/en/publication/107/detail/optimal-multicast-feedback

Case Study 5

Hey, Slow Down!

Flow (& Congestion) Control

Stop-And-Wait Flow Control

[Diagram: Data, ACK, Data, ACK, Data, ACK; one RTT per exchange]

Throughput = Data Length / RTT

[Diagram: Bandwidth and Delay; BDP (Buffer)]

BDP = (Bytes / sec) * sec = Bytes

[Diagram: N Data “Blobs” sent within one RTT before the ACK returns]

Throughput = N * Data Length / RTT
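
The throughput and BDP formulas can be worked through numerically. The function names below are hypothetical, and `window_to_fill` is an added illustration: it assumes the goal is to keep one bandwidth-delay product of data in flight.

```python
import math


def windowed_throughput(n, data_length, rtt):
    """Throughput = N * Data Length / RTT (N = 1 is stop-and-wait)."""
    return n * data_length / rtt


def bdp_bytes(bandwidth_bytes_per_sec, delay_sec):
    """BDP = (Bytes / sec) * sec = Bytes."""
    return bandwidth_bytes_per_sec * delay_sec


def window_to_fill(bandwidth_bytes_per_sec, rtt, data_length):
    """Smallest N whose N * Data Length covers one BDP (assumed target)."""
    return math.ceil(bdp_bytes(bandwidth_bytes_per_sec, rtt) / data_length)
```

With a 0.5 second RTT and 1,500-byte blobs, stop-and-wait moves 3,000 bytes per second regardless of link speed; on a 1,000,000 bytes per second link, 334 blobs in flight would be needed to fill the pipe.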

So…

How big is N?

This is surprisingly hard to answer

It depends…

Big… but

Don’t overflow receiver

Don’t overflow “network”

TCP Flow Control

Receiver advertises N

TCP Congestion Control

Sender probes for network N

TCP Sender

min(Receiver N, Network N)

Only go as fast as Network & Receiver
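
As a rough sketch of the min(Receiver N, Network N) rule: the sender may have at most the smaller of the two windows outstanding. The function and parameter names are hypothetical, and all quantities are assumed to be in the same unit (bytes or blobs).

```python
def sendable(receiver_window, congestion_window, in_flight):
    """How much more the sender may put on the wire right now."""
    limit = min(receiver_window, congestion_window)
    return max(0, limit - in_flight)
```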

Reactive Streams

Subscriber uses explicit request(N)

Publisher assumes best case

http://www.reactive-streams.org/
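
The explicit request(N) idea can be illustrated with a credit counter. This is a Python sketch of the concept only, not the Reactive Streams API (which is defined as Java interfaces); `CreditPublisher` and its callback are hypothetical.

```python
from collections import deque


class CreditPublisher:
    """Delivers items only while the subscriber has outstanding demand."""

    def __init__(self, items, on_next):
        self._queue = deque(items)
        self._on_next = on_next   # subscriber callback, one item per call
        self._demand = 0          # credits granted and not yet used

    def request(self, n):
        """The subscriber signals it can accept n more items."""
        if n <= 0:
            raise ValueError("n must be positive")
        self._demand += n
        self._drain()

    def _drain(self):
        # Never push more than was requested.
        while self._demand > 0 and self._queue:
            self._demand -= 1
            self._on_next(self._queue.popleft())
```

The receiver, not the sender, sets the pace: nothing is delivered until demand is signalled, and delivery stops as soon as the credits run out.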

Takeaways!

Protocols of interaction are important & can be tremendously impactful

for better or worse…

Questions?

• IETF http://www.ietf.org/

• Aeron https://github.com/real-logic/Aeron

• Twitter @toddlmontgomery

Thank You!
