Distributed Scheduling Algorithms for Switching Systems
Shunyuan Ye, Yanming Shen, Shivendra Panwar
2015-7-16
Overview
• Background
– Problem definition, related work
• A randomized scheduling algorithm
– Algorithm, example, proof sketch
• Applications
– Buffered crossbar switch: DISQUO
– Optoelectronic switch: HELIOS
Scheduling Problem
• Objective: Find a scheduling algorithm that can sustain 100% capacity
[Figure: inputs holding virtual output queues (VOQs), connected through a switching fabric to the outputs]
Related Work (1)
• Maximum Weight Matching (MWM, Tassiulas ’92)
– Centralized, O(N^3) computations
[Figure: 3×3 bipartite graph between inputs and outputs with queue lengths as edge weights; the maximum weight matching selects the edges with weights 15, 10, and 12]
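As a rough illustration of what MWM computes, the sketch below finds the heaviest perfect matching by brute force. It enumerates all N! permutations, so it is not the O(N^3) algorithm cited above; the function name and the 3×3 queue matrix are hypothetical and are not the values from the slide figure.

```python
from itertools import permutations

def max_weight_matching(Q):
    """Return (weight, matching) for the heaviest perfect matching of Q.

    Q[i][j] is the queue length of VOQ (i, j). Brute force over all
    permutations; only meant for very small N.
    """
    n = len(Q)
    best_weight, best_perm = -1, None
    for perm in permutations(range(n)):
        w = sum(Q[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_weight, best_perm = w, perm
    return best_weight, [(i, best_perm[i]) for i in range(n)]

# Hypothetical queue lengths (0-indexed inputs and outputs).
Q = [[10, 15, 0],
     [10, 0, 3],
     [0, 8, 12]]
weight, matching = max_weight_matching(Q)
print(weight, matching)
```

A production scheduler would use a polynomial-time matching algorithm instead; the brute force is only there to make the objective concrete.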
Related Work (2)
• Randomized Scheduling Algorithm (Tassiulas ’98)
– Centralized, O(N) computations
– Poor delay performance
[Figure: 3×3 bipartite graph with edge weights 6, 5, 10 and 12, 8, 4 in the first panel; the second panel keeps the edges with weights 12, 8, and 4]
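The idea behind this randomized scheme can be sketched as follows: in each slot, draw a random matching and keep whichever of it and the previous schedule is heavier. This is a minimal sketch under assumptions; the function names are hypothetical, and the matrix layout is made up so that two of its matchings weigh 6+5+10 and 12+8+4.

```python
import random

def schedule_weight(schedule, Q):
    """schedule[i] is the output matched to input i."""
    return sum(Q[i][j] for i, j in enumerate(schedule))

def randomized_step(prev, Q, rng):
    """Draw a uniformly random matching; keep it only if it is heavier than prev."""
    candidate = list(range(len(Q)))
    rng.shuffle(candidate)
    if schedule_weight(candidate, Q) > schedule_weight(prev, Q):
        return candidate
    return prev

# Hypothetical queue lengths (0-indexed).
Q = [[6, 12, 0],
     [0, 5, 8],
     [4, 0, 10]]
rng = random.Random(0)
S = [0, 1, 2]                      # start from the matching with weight 21
for _ in range(20):
    S = randomized_step(S, Q, rng)
print(S, schedule_weight(S, Q))
```

With fixed queue lengths the weight of the schedule never decreases from slot to slot, but it may take many slots before a heavy matching is drawn, which is one way to read the poor delay noted above.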
Related Work (3)
• iSLIP (McKeown, ’98)
– Distributed, but cannot guarantee 100% throughput
• LAURA (Giaccone et al., ’02)
– Merges R(n) and S(n-1)
– Complexity is O(N log N)
• EMHW (Li et al., ’04)
– Uses exhaustive service matching; complexity is O(log N)
• Glauber dynamics work of Walrand et al., Srikant et al., Shah
Question?
• Can we have a scheduling algorithm that satisfies all of these conditions?
– Guaranteed 100% throughput
– Low computational complexity, i.e., O(1)
– Easy to implement in a distributed way
Randomized Scheduling Algorithm
• Notation
– Neighbors of crosspoint (i, j): all other crosspoints that share its input or its output
• N(i, j) = {(i, j’) : j’ ≠ j} ∪ {(i’, j) : i’ ≠ i}
– Feasible schedule:
• If Sij(n) = 1, then Skl(n) = 0 for every (k, l) in N(i, j)
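The notation above can be checked mechanically. The sketch below is one possible encoding, assuming a schedule is stored as an N×N 0/1 matrix with 0-indexed ports; the helper names `neighbors` and `is_feasible` are hypothetical.

```python
def neighbors(i, j, n):
    """Crosspoints sharing input i or output j with (i, j), excluding (i, j)."""
    same_input = [(i, c) for c in range(n) if c != j]
    same_output = [(r, j) for r in range(n) if r != i]
    return same_input + same_output

def is_feasible(S):
    """True if no active crosspoint has an active neighbor."""
    n = len(S)
    return all(
        S[k][l] == 0
        for i in range(n)
        for j in range(n)
        if S[i][j] == 1
        for (k, l) in neighbors(i, j, n)
    )

print(is_feasible([[0, 1], [1, 0]]))   # inputs and outputs each used once
print(is_feasible([[1, 1], [0, 0]]))   # input 0 would serve two outputs
```

Each crosspoint in an N×N switch has 2(N-1) neighbors, so a feasible schedule has at most one active crosspoint per input and per output.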
Randomized Scheduling Algorithm
• S(n-1) is the schedule at time n-1
• Randomly generate a feasible schedule H(n):
– Pre-determined
– Hamiltonian walk: it can be implemented in a distributed manner with a time complexity of O(1)
[Figure: example schedules S(n-1) and H(n)]
Randomized Scheduling Algorithm
• S(n) is generated by the following rules:
• a) For (i, j) not in H(n), Sij(n) = Sij(n-1)
• b) For any (i, j) in H(n):
– If (i, j) is in S(n-1):
• Sij(n) = 1 with probability pij
• Sij(n) = 0 with probability 1-pij
(pij is a concave function of Qij)
– If (i, j) is not in S(n-1):
• If every (k, l) in N(i, j) was inactive in S(n-1):
– Sij(n) = 1 with probability pij
– Sij(n) = 0 with probability 1-pij
• Else, Sij(n) = 0
[Figure: S(n-1) and H(n); crosspoints outside H(n) stay the same]
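Rules (a) and (b) can be sketched as a single update function. This is a minimal sketch under assumptions: schedules are stored as sets of 0-indexed (input, output) pairs, H is given as a list of crosspoints forming a feasible schedule, p is an N×N matrix of probabilities, and the names `update` and `neighbors` are hypothetical.

```python
import random

def neighbors(i, j, n):
    """Crosspoints sharing input i or output j with (i, j), excluding (i, j)."""
    return ([(i, c) for c in range(n) if c != j] +
            [(r, j) for r in range(n) if r != i])

def update(S_prev, H, p, rng):
    """Compute S(n) from S(n-1) = S_prev and the candidate schedule H(n) = H."""
    n = len(p)
    S_next = set(S_prev)              # rule (a): crosspoints outside H keep their state
    for (i, j) in H:                  # rule (b)
        if (i, j) in S_prev:
            if rng.random() >= p[i][j]:
                S_next.discard((i, j))          # turn off with probability 1 - p
        elif all(nb not in S_prev for nb in neighbors(i, j, n)):
            if rng.random() < p[i][j]:
                S_next.add((i, j))              # turn on with probability p
        # otherwise a neighbor was active in S_prev, so (i, j) stays inactive
    return S_next

ones = [[1.0] * 3 for _ in range(3)]
# 0-indexed version of H = {(1, 3), (2, 1), (3, 2)} with (2, 1) previously active.
print(update({(1, 0)}, [(0, 2), (1, 0), (2, 1)], ones, random.Random(0)))
```

Because every decision looks only at S(n-1), two crosspoints of H(n) can be decided independently, and the result remains feasible as long as S(n-1) and H(n) are feasible.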
Randomized Scheduling Algorithm
• Example
[Figure: S(n), H(n+1), and the resulting S(n+1)]
• For (1, 3): none of its neighbors was active
– S13(n+1) = 1 with probability p13
– S13(n+1) = 0 with probability 1-p13
– S13(n+1) = 1 in the example
• For (2, 1): it was in S(n)
– S21(n+1) = 1 with probability p21
– S21(n+1) = 0 with probability 1-p21
– S21(n+1) = 1 in the example
• For (3, 2): the same as (1, 3)
– S32(n+1) = 0 in the example
Intuitive Explanation
• When (i, j) is picked by H(n), and none of its neighbors was active in the previous slot, (i, j) becomes active with probability pij.
• If (i, j) becomes active, all of its neighbors are blocked from being active.
• If we define the probability as a concave function of Qij, longer queues have a higher probability of becoming active (and a lower probability of being blocked by short queues).
• The weight of the active VOQs will be very close to the maximum after the system converges.
Intuitive Explanation
• Example
– Q11 = 1, Q12 = 10, Q21 = 8, Q22 = 2
– pij = log(Qij) / [1 + log(Qij)]
– S11 = 1 with probability p11 = 0
– S22 = 1 with probability p22 = 0.4
– S12 = 1 with probability p12 = 0.7
– S21 = 1 with probability p21 = 0.8
• The schedule {(1, 2), (2, 1)} therefore has the higher probability
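The probabilities in this example can be reproduced with a few lines of code. The sketch assumes the natural logarithm (the slide does not state the base) and queue lengths of at least 1; the function name is hypothetical. Under that assumption Q21 = 8 evaluates to roughly 0.68 rather than the 0.8 listed, while the other three values match after rounding.

```python
import math

def activation_probability(q):
    """p = log(q) / (1 + log(q)), natural log assumed; requires q >= 1."""
    return math.log(q) / (1.0 + math.log(q))

# Queue lengths from the example, keyed by (input, output).
Q = {(1, 1): 1, (1, 2): 10, (2, 1): 8, (2, 2): 2}
p = {ij: activation_probability(q) for ij, q in Q.items()}
for ij in sorted(p):
    print(ij, round(p[ij], 2))
```

Either way the ordering is what matters for the intuition: the two long queues, (1, 2) and (2, 1), get much higher activation probabilities than the two short ones.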
System Stability
• Sketch of proof of system stability
– Define the state of the system as the schedule S(n)
– The sequence S(n-1), S(n), S(n+1), ... is a Markov chain, and it is time reversible, which implies a product-form stationary distribution.
– For any admissible Bernoulli arrival traffic, the weight of S(n) stays close to the weight of the maximum weight schedule S*(n) after the system converges.
– The system can then be proved to be stable.
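The slides do not write out the product form. As a hedged sketch, assuming queue lengths are held fixed and the chain behaves like standard Glauber dynamics on feasible schedules, one would expect a stationary distribution of the following shape (the symbols \pi, Z, and \mathcal{S} are introduced here and are not from the slides):

```latex
\pi(S) = \frac{1}{Z} \prod_{(i,j) \in S} \frac{p_{ij}}{1 - p_{ij}},
\qquad
Z = \sum_{S' \in \mathcal{S}} \prod_{(i,j) \in S'} \frac{p_{ij}}{1 - p_{ij}},
```

where \mathcal{S} is the set of feasible schedules. With p_{ij} = \log Q_{ij} / (1 + \log Q_{ij}) the ratio p_{ij} / (1 - p_{ij}) equals \log Q_{ij}, so under this assumption \pi(S) \propto \exp\big(\sum_{(i,j) \in S} \log\log Q_{ij}\big), and schedules that serve long queues carry most of the probability mass.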
DISQUO Scheduling Algorithm
• DISQUO is a distributed implementation for a buffered crossbar switch
• Advantages:
– Totally distributed without message passing
– Delay performance is very good
• Drawback:
– N^2 crosspoint buffers are needed
Buffered Crossbar Switch
• Input scheduler and output scheduler can be independent, and thus distributed.
[Figure: N×N buffered crossbar with inputs 1 to N and outputs 1 to N; input i holds VOQij, and crosspoint (i, j) holds buffer CBij]
DISQUO Scheduling Algorithm
• Distributed Implementation Example
[Figure: state of the switch at time n = m]
• If crosspoint (i, j) is active, input i and output j have to serve this crosspoint buffer.
• Otherwise, they can randomly pick one to serve.
DISQUO Scheduling Algorithm
• Distributed Implementation Example
[Figure: state of the switch at time n = m+1]
• Inputs and outputs can learn each other’s decisions by observing the crosspoint buffers, so they can keep the schedule consistent.
• Inputs 1 and 2 have to decide whether to keep (1, 2) and (2, 1) active, based on p12 and p21.
– In the example, they both decide to become inactive.
• Input 3 has to decide whether to make (3, 3) active, with probability p33.
– In the example, it decides to become active.
Simulations
• Uniform traffic
Simulations
• Non-uniform traffic
– Throughput of RR-RR under hotspot traffic is 85%.
Simulations
• Impact of switch size
– Delay is almost independent of switch size.
Simulations
• Impact of buffer size
– K=1 is sufficient
HELIOS Scheduling Algorithm
• HELIOS is a distributed algorithm for a hybrid optical/electrical switch.
• Advantages:
– Easy implementation (DWDM optical fiber)
– Totally distributed without message passing
– Uses an optical fabric to reduce power consumption
– Guarantees 100% throughput for any admissible traffic
Architecture
• Each input is equipped with a fast tunable laser as the transmitter, which can tune to different wavelengths.
Architecture
• Each output has a fixed wavelength receiver operating in a specific WDM channel.
Architecture
• The optical fabric is a broadcast-and-select fabric.
The Linecard Model
• λ-monitor is used to sense the channels, so that the inputs know which wavelengths are being used.
Implementation Example
Simulation
• Under Bernoulli i.i.d. traffic, the delay performance is poor compared to MWM. However, if one slot time is only a few nanoseconds, the delay is still acceptable (i.e., < 10 μs).
Simulation
• Under on-off bursty traffic with Pareto-distributed burst lengths (larger α means longer burst length), the delay performance is closer to that of MWM.
Summary
• We proposed a scheduling algorithm with very low computational complexity
• The algorithm can be easily implemented in a distributed way for different switching architectures
• It can guarantee 100% throughput for any admissible traffic, and for some architectures it can provide very good delay performance
Thank you!
Q&A