Rate limiters in modern software systems, Sandeep Joshi, CMG Pune, 22nd September 2016
Rate limiters in big data systems

Apr 12, 2017

Sandeep Joshi
Transcript
Page 1: Rate limiters in big data systems

Rate limiters in modern software systems

Sandeep Joshi

CMG Pune, 22nd September 2016

Page 2: Rate limiters in big data systems

Permit Raj

Control systems background. Traffic shaping in networking.

Two strategies:

1. Leaky bucket: A queue with constant service time. Drops packets on overflow.

2. Token bucket: The bucket holds tokens (permits), which are acquired before transmission. Allows burstiness in traffic.

PID controller

Backpressure
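The two traffic-shaping strategies above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the systems discussed later. The class names `LeakyBucket` and `TokenBucket`, their parameters, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions made for the sketch.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue drained at a constant rate; overflow is dropped."""

    def __init__(self, queue_size):
        self.queue = deque()
        self.queue_size = queue_size

    def offer(self, packet):
        if len(self.queue) >= self.queue_size:
            return False              # overflow: the packet is dropped
        self.queue.append(packet)
        return True

    def tick(self):
        # Called once per fixed service interval; emits at most one packet.
        return self.queue.popleft() if self.queue else None


class TokenBucket:
    """Tokens refill at a steady rate, up to a burst capacity."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec      # long-term permits per second
        self.capacity = capacity      # largest burst that can be granted
        self.tokens = capacity        # start full, so an initial burst passes
        self.last = now               # time of the last refill

    def try_acquire(self, now, permits=1):
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= permits:
            self.tokens -= permits
            return True
        return False                  # caller may drop, queue, or retry later


bucket = TokenBucket(rate_per_sec=2, capacity=5)
burst = [bucket.try_acquire(now=0.0) for _ in range(6)]
later = bucket.try_acquire(now=1.0)
```

With a rate of 2 permits per second and a capacity of 5, the six back-to-back calls at time 0 grant five permits and refuse the sixth; one second later two tokens have been refilled, so the next call succeeds. The leaky bucket never lets output exceed one packet per tick, which is why it smooths traffic but cannot absorb a burst.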

Page 3: Rate limiters in big data systems

Congestion control

Source: http://ecomputernotes.com/

Page 4: Rate limiters in big data systems

Rate limiter features

1. Enforce long-term steady-state rate

2. Allow short bursts to exceed the limit

3. Warm-up: Gradually increase the rate after an idle period

4. Allow changing the rate at run-time

5. Handle requests based on priority (fairness)

Page 5: Rate limiters in big data systems

Rate-limiter implementations in software

1. RocksDB

2. Facebook WDT

3. Apache Kafka

4. Apache Spark (Google Guava toolkit)

5. Akka toolkit (not covering)

6. Node.js (not covering)

7. Conclusion

Page 6: Rate limiters in big data systems

RocksDB

Key-value store which uses Log-Structured Merge (LSM) trees.

It has a rate limiter to throttle disk writes done during two different workflows:

1. Flush threads, which write the in-memory tree to disk.

2. Compaction thread, which merges trees on disk.

The throttler requires 3 parameters:

1. Refill period

2. Refill bytes per period

3. Fairness, which decides which of the two queues to serve first

Page 7: Rate limiters in big data systems

RocksDB

State kept by the throttler:

1. Available bytes (i.e. tokens)

2. Next refill time

3. Queue[low] and Queue[high], into which new requests are inserted

Workflow:

1. Each request is inserted into a queue.

2. The leader of the queue wakes at the “next refill time” and increments the available bytes.

3. It sets the next refill time = now + refill period.

4. It services requests in the queue until the available bytes are exhausted.
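The state and workflow listed above can be sketched as a single-threaded simulation. This is not the code in util/rate_limiter.cc: the class name `SimpleThrottler`, the method names, the partial-grant behaviour, and the rule that the low-priority queue goes first with a 1-in-`fairness` chance are all assumptions made for illustration. Time is passed in explicitly so that no real sleeping or locking is needed.

```python
import random
from collections import deque

HIGH, LOW = "high", "low"


class SimpleThrottler:
    def __init__(self, refill_period, refill_bytes, fairness=10, rng=None):
        self.refill_period = refill_period   # seconds between refills
        self.refill_bytes = refill_bytes     # bytes added per refill
        self.fairness = fairness             # low queue goes first 1 time in `fairness`
        self.rng = rng or random.Random()
        self.available = 0                   # available bytes (the tokens)
        self.next_refill = 0.0               # next refill time
        self.queues = {HIGH: deque(), LOW: deque()}

    def request(self, name, nbytes, priority=HIGH):
        # Step 1: the request is inserted into the queue for its priority.
        self.queues[priority].append([name, nbytes])

    def refill(self, now):
        """Leader's work at refill time; returns names of completed requests."""
        if now < self.next_refill:
            return []                        # not yet time to refill
        # Step 2: increment the available bytes.
        if self.available < self.refill_bytes:
            self.available += self.refill_bytes
        # Step 3: schedule the next refill.
        self.next_refill = now + self.refill_period
        # Fairness decides which of the two queues is served first.
        low_first = self.rng.randrange(self.fairness) == 0
        order = (LOW, HIGH) if low_first else (HIGH, LOW)
        # Step 4: serve requests until the available bytes are exhausted.
        done = []
        for priority in order:
            queue = self.queues[priority]
            while queue and self.available > 0:
                request = queue[0]
                grant = min(request[1], self.available)
                request[1] -= grant
                self.available -= grant
                if request[1] == 0:
                    done.append(queue.popleft()[0])
        return done
```

A request larger than the remaining budget stays at the head of its queue with its outstanding byte count reduced, so it completes on a later refill. Because the budget is topped up only once per period, throughput in any period cannot exceed the refill amount.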

Page 8: Rate limiters in big data systems

RocksDB: high priority only

Peaks do not rise above the average.

Page 9: Rate limiters in big data systems

RocksDB with 2 priorities

Low-priority requests get delayed.

Page 10: Rate limiters in big data systems

RocksDB

1. Enforce long-term steady-state rate

2. Allow short bursts to exceed the rate

3. Warm-up: Gradually increase the rate after an idle period

4. Allow changing the rate at run-time

5. Service requests based on priority (fairness)

Page 11: Rate limiters in big data systems

Facebook WDT (Warp-speed Data Transfer) http://www.github.com/facebook/wdt

Open-source library used for file transfer between data centers (e.g. transmitting MySQL backups to another data center).

Both sender and receiver spawn multiple threads.

All threads on the sender or the receiver share the same “Throttler”.

The throttler limits the average rate as well as peak bursts.

Page 12: Rate limiters in big data systems

Facebook WDT

Allows peak bursts above the specified average limit.

Page 13: Rate limiters in big data systems

Facebook WDT

1. Enforce long-term steady-state rate

2. Allow short bursts to exceed the rate

3. Warm-up: Gradually increase the rate after an idle period

4. Allow changing the rate at run-time

5. Service requests based on priority (fairness)

Page 14: Rate limiters in big data systems

Apache Kafka

http://kafka.apache.org

Page 15: Rate limiters in big data systems

Apache Kafka

Client-based quotas to limit publisher and consumer processing.

Enforces a fixed rate in every window.

Every request is inserted into a DelayQueue, from which elements can be retrieved only after their delay expires (DelayQueue is part of the java.util.concurrent library).

Delay_time = (window_size) * (observed_rate - desired_rate) / observed_rate

Allows changing the quota at run-time.
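A small worked example of the delay formula stated above. The function name `throttle_delay` and the choice to return zero when the client is within quota are assumptions; the code follows the formula as written on the slide and is not taken from ClientQuotaManager.scala.

```python
def throttle_delay(observed_rate, desired_rate, window_size):
    """Delay (same time unit as window_size) for a client that exceeded its quota."""
    if observed_rate <= desired_rate:
        return 0.0                      # within quota: no delay
    return window_size * (observed_rate - desired_rate) / observed_rate


# A client observed at 15 MB/s against a 10 MB/s quota, over a 10 second window.
delay = throttle_delay(observed_rate=15.0, desired_rate=10.0, window_size=10.0)
```

Here the delay works out to 10 * (15 - 10) / 15, about 3.33 seconds. The delay grows with the size of the violation but always stays below one window length, since the fraction (observed - desired) / observed is less than 1.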

Page 16: Rate limiters in big data systems

Apache Spark

Dynamic rate limiter

Two components: Driver (master) and Receiver (accepts ingest)

1. The Driver uses a PID-based rate estimator to recompute the desired rate at the end of every batch.

2. The Driver sends the new rate to the Receiver.

3. The Receiver uses the Google Guava rate limiter to throttle block generation.

Page 17: Rate limiters in big data systems

Apache Spark

https://issues.apache.org/jira/browse/SPARK-8975

Page 18: Rate limiters in big data systems

Apache Spark

PID controller used in the Spark Driver to estimate the best rate

1. Proportional gain (present): correction based on the current error.

2. Integral gain (past): correction based on accumulated past error, which removes steady-state error.

3. Derivative gain (future): prediction based on the rate of change of the error.
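The three terms above can be combined into a rate estimator along the following lines. This is a generic PID sketch, not the code in PIDRateEstimator.scala: the class name, the gain values, the `min_rate` floor, and the way the accumulated error is derived from the scheduling delay are assumptions chosen for illustration. All times are in seconds and are assumed to increase from one call to the next.

```python
class PidRateEstimator:
    def __init__(self, kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd   # proportional, integral, derivative gains
        self.min_rate = min_rate                 # never throttle below this rate
        self.latest_rate = None                  # rate chosen after the previous batch
        self.latest_error = 0.0
        self.latest_time = None

    def compute(self, time, num_elements, processing_delay,
                scheduling_delay, batch_interval):
        """Return the new ingest rate (elements per second) after a batch."""
        processing_rate = num_elements / processing_delay
        if self.latest_rate is None:
            # First batch: no history yet, so adopt the measured rate.
            self.latest_rate, self.latest_time = processing_rate, time
            return processing_rate

        dt = time - self.latest_time
        # Present: how far the allowed rate is above what was actually processed.
        error = self.latest_rate - processing_rate
        # Past: backlog implied by the scheduling delay, expressed as a rate.
        historical_error = scheduling_delay * processing_rate / batch_interval
        # Future: how quickly the error is changing.
        d_error = (error - self.latest_error) / dt

        new_rate = max(
            self.latest_rate
            - self.kp * error
            - self.ki * historical_error
            - self.kd * d_error,
            self.min_rate,
        )
        self.latest_rate, self.latest_error, self.latest_time = new_rate, error, time
        return new_rate
```

For example, if the previous rate was 1000 elements/s, the last batch was processed at 800 elements/s, and batches waited half a batch interval before being scheduled, then with the gains above the new rate is 1000 - 200 - 0.2 * 400 = 720 elements/s.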

Page 19: Rate limiters in big data systems

Apache Spark before and after

Source: Dean Wampler, Adding Backpressure to Spark Streaming

Page 20: Rate limiters in big data systems

Google Guava rate limiter

https://github.com/google/guava

Used by the Receiver in Apache Spark to limit the rate.

Stores the expected time of the next request, instead of the time of the previous request.

Under-utilization: It stores unused permits up to a maximum threshold.

Warm-up period: Stored permits are given out gradually by increasing the sleep time.
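The first two ideas above (track the time of the next request, and bank unused permits up to a cap) can be sketched as follows. This is a simplified illustration, not a port of SmoothRateLimiter.java: the class name `SmoothLimiter`, the `max_burst_seconds` parameter, and the explicit `now` argument are assumptions. Stored permits are handed out with no extra wait here; a warm-up variant would charge some sleep time for each stored permit instead.

```python
class SmoothLimiter:
    def __init__(self, permits_per_sec, max_burst_seconds=1.0, now=0.0):
        self.interval = 1.0 / permits_per_sec              # seconds per fresh permit
        self.max_stored = permits_per_sec * max_burst_seconds
        self.stored = 0.0                                  # unused permits saved while idle
        self.next_free = now                               # expected time of the next request

    def reserve(self, now, permits=1):
        """Reserve permits; return how long the caller should sleep first."""
        if now > self.next_free:
            # The limiter was idle: convert the idle time into stored permits.
            idle = now - self.next_free
            self.stored = min(self.max_stored, self.stored + idle / self.interval)
            self.next_free = now
        # This caller waits only for what earlier callers already reserved.
        wait = max(self.next_free - now, 0.0)
        from_store = min(permits, self.stored)
        fresh = permits - from_store
        self.stored -= from_store
        # The cost of this request is pushed onto the next caller.
        self.next_free += fresh * self.interval
        return wait
```

Because only `next_free` is stored, a request never has to look back at past traffic: its own cost simply moves `next_free` forward. After an idle stretch, a limiter configured for 2 permits per second with a one-second burst window lets two stored permits through immediately before returning to one permit every 0.5 seconds.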

Page 21: Rate limiters in big data systems

Google Guava rate limiter

Allows peak bursts while maintaining the average limit. Allows a warm-up period.

Page 22: Rate limiters in big data systems

Rate-limiting techniques

All transmitters call some “Throttle()” function before transmission.

Some implementations push requests into a queue; others just calculate the sleep time and decrement permits.

Retain the time of the next estimated wakeup instead of the time of the previous call.

Permits are added when a new epoch is detected.

Save unused permits up to some maximum limit; this handles under-utilization.

While spending saved permits, increase the sleep time; this provides the warm-up period.

Page 23: Rate limiters in big data systems

Comparison of implementations

Feature                                rocksdb        wdt            guava/spark    kafka
Type                                   Leaky bucket   Token bucket   Token bucket   Leaky bucket
Enforce average rate                   Y              Y              Y              Y
Allow short bursts exceeding average   -              Y              Y              -
Warm-up after idle period              -              -              Y              -
Alter rate at runtime                  Y              Y              Y              Y
Priority                               Y              -              -              -

Page 24: Rate limiters in big data systems

References

1. RocksDB: util/rate_limiter.cc

2. WDT: Throttler.cpp

3. Apache Kafka: ClientQuotaManager.scala

4. Apache Spark: PIDRateEstimator.scala, RateLimiter.scala

5. Adding Back-pressure to Spark Streaming by Dean Wampler, Typesafe (http://files.meetup.com/1634302/Backpressure%2020160112.pdf)

6. Google Guava: RateLimiter.java and SmoothRateLimiter.java

Page 25: Rate limiters in big data systems

Facebook WDT

If (long-term rate > average)
    Sleep for (ideal - elapsed) time
Else if (short-term rate > peak)
    Add tokens based on the time difference since the last call
    Sleep until the available tokens are positive
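One way to read the pseudocode above is as a two-level throttler: an average-rate check over the whole transfer, then a token bucket for the peak rate. The sketch below is an interpretation of the slide, not the logic of Throttler.cpp. The class name `TwoLevelThrottler`, the `bucket_limit` parameter, and returning the sleep time instead of sleeping are assumptions made for illustration.

```python
class TwoLevelThrottler:
    def __init__(self, avg_rate, peak_rate, bucket_limit, now=0.0):
        self.avg_rate = avg_rate          # long-term limit, bytes per second
        self.peak_rate = peak_rate        # short-term limit, bytes per second
        self.bucket_limit = bucket_limit  # largest burst allowed at peak rate
        self.start = now                  # start of the transfer
        self.total = 0                    # bytes sent since the start
        self.tokens = bucket_limit
        self.last_fill = now              # time of the last token refill

    def sleep_time(self, now, nbytes):
        """Return how long the caller should sleep before sending nbytes."""
        self.total += nbytes
        # Long-term check: time the transfer should have taken at avg_rate.
        ideal = self.total / self.avg_rate
        elapsed = now - self.start
        if ideal > elapsed:
            return ideal - elapsed
        # Short-term check: add tokens for the time since the last call.
        refill = (now - self.last_fill) * self.peak_rate
        self.tokens = min(self.bucket_limit, self.tokens + refill)
        self.last_fill = now
        self.tokens -= nbytes
        if self.tokens < 0:
            # Sleep until the token count is no longer negative.
            return -self.tokens / self.peak_rate
        return 0.0
```

With an average rate of 100 bytes/s, a peak rate of 200 bytes/s, and a bucket of 50 bytes, a 100-byte send at time 0 is held for 1 second by the average check. A second 100-byte send after 10 idle seconds passes the average check but overdraws the bucket by 50 bytes, so it is held for 0.25 seconds.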

Page 26: Rate limiters in big data systems

Node.js

Rate limiter packages available:

https://github.com/jhurliman/node-rate-limiter

express-rate

Page 27: Rate limiters in big data systems

Akka toolkit

Look at TimerBasedThrottler.scala