Outline
Introduction and Motivation
Survey of Existing Approaches
Example:
Distributive Delay-Optimal Control for Uplink OFDMA via
Localized Stochastic Learning and Auction Game
Convergence Analysis
Asymptotic Optimality
Conclusion
Introduction and Motivation
Why is delay performance important?
“WHAT??!! He is stuck in the air!!!$*(&#%*!(!”
“You must be kidding me! Buffering at such an important moment!!??”
Introduction and Motivations
We may have multiple delay-sensitive wireless applications running on different devices:
Keep track of a game
Play a multi-player game
Keep talking to some friends
Related Works
OFDMA Joint Power and Subband Design for PHY Performance
[Yu’02], [Hoo’04], [Seong’06], etc.
– Selects the strongest user per subband
– Time-frequency water-filling power allocation
– Assumes knowledge of perfect CSIT
[Lau’05], [Wong’09], [Brah’07], etc.
– Robust power and subband control with limited feedback or outdated CSIT (packet errors)
Introduction and Motivations
Challenges of incorporating QSI and CSI in adaptation
When Shannon meets Kleinrock…
Claude Shannon, Leonard Kleinrock
Existing Approaches to Deal with Delay-Optimal Control
Various approaches dealing with delay problems
Buffer partitioning: to regulate the buffer state towards 1/v
[Figure: buffer states S < 1/v and S > 1/v, labeled v and -v]
Related Works
Various approaches dealing with delay problems
Approach II [Yeh’01PhD], [Yeh’03ISIT]
– Symmetric and homogeneous users in multi-access fading channels
– Using stochastic majorization theory, the authors showed that the longest queue highest possible rate (LQHPR) policy is delay-optimal
[Figure: capacity region with points A and B; a longer queue for user 1 means a higher rate for user 1]
Technical Challenges To Be Solved
Uplink OFDMA System Model
OFDMA PHY Model
OFDMA Physical Layer Model
[Figure: Mobile 1, …, Mobile K and the BS; the CSI feeds the OFDMA subband allocation and power control]
$\mathbb{E}[XX^H] = I$
$H = \{H_{k,n}\}$
OFDMA PHY subcarrier and power allocation; data rate $R_k$
Source Model and System States
[Figure: cross-layer controller at the BS. Packet arrivals (G-MAP packets, YouTube packets) enter the MAC layer; the controller observes the MAC state (QSI) and the PHY state (CSI) and sets the power and subband allocation at the PHY layer. The time axis is divided into scheduling time slots of PHY frames; the channel is quasi-static within a slot and i.i.d. between slots.]
OFDMA Queue Dynamics
Time domain partitioned into scheduling slots
CSI H(t) remains quasi-static within a slot and is i.i.d.
between slots
Packet arrival $A(t) = (A_1(t), \ldots, A_K(t))$, where $A_k(t)$ is i.i.d. over slots according to a general distribution $P(A)$.
$N_k(t)$ denotes the random packet size, i.i.d.
$Q_k(t)$ denotes the number of packets waiting in the k-th buffer at the t-th slot.
Global system state: (CSI, QSI)
Total number of bits transmitted in the t-th slot
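The queue update equation itself did not survive on this slide. As a minimal sketch, assuming a commonly used form in which each buffer first serves packets, then admits the new arrivals, and is truncated at a finite buffer size, the per-user dynamics could be simulated as below. The function names, the Bernoulli arrival model and the fixed service amount are illustrative assumptions, not taken from the slides.

```python
import random

def step_queue(q, arrivals, served, buffer_size):
    """One-slot update (assumed form): serve first, admit arrivals, truncate at the buffer size."""
    remaining = max(q - served, 0)           # cannot serve more packets than are waiting
    return min(remaining + arrivals, buffer_size)

def simulate(num_slots, arrival_prob, served_per_slot, buffer_size, seed=0):
    """Trace of queue lengths for one user over num_slots scheduling slots."""
    rng = random.Random(seed)
    q = 0
    trace = []
    for _ in range(num_slots):
        a = 1 if rng.random() < arrival_prob else 0   # Bernoulli arrivals (assumption)
        q = step_queue(q, a, served_per_slot, buffer_size)
        trace.append(q)
    return trace
```

In the slides' setting the served amount would depend on the bits transmitted in the slot and on the packet size; here it is a fixed number only to keep the sketch short.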
OFDMA Delay-Optimal Formulation
Stationary Power and Subband Allocation Control Policy
A mapping from the system state to power and subband allocation actions.
(Power Constraint)
(Subband Allocation Constraint)
(Packet Drop Rate Constraint)
OFDMA Delay-Optimal Formulation
Definitions: Average Delay, Power and Packet Drop Constraints under a control policy
Little’s Law: average no. of packets (the average queue length) = average arrival rate × average delay (in terms of seconds)
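A short worked instance of Little's Law may help here. The symbols $\bar{T}_k$ (average delay), $\bar{Q}_k$ (average queue length) and $\lambda_k$ (average arrival rate) and the numbers are illustrative assumptions, not values from the slides.

```latex
\bar{T}_k = \frac{\bar{Q}_k}{\lambda_k},
\qquad \text{e.g. } \bar{Q}_k = 5 \text{ packets},\ \lambda_k = 20 \text{ packets/s}
\;\Rightarrow\; \bar{T}_k = 0.25 \text{ s}
```

For a fixed arrival rate, minimizing the average queue length is therefore equivalent to minimizing the average delay.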
OFDMA Delay-Optimal Formulation
Problem Formulation: Find the optimal control policy that minimizes
“Positive weighting factor”
Pareto-optimal delay boundary
Why is the optimization problem difficult?
– Huge dimension of variables involved (policy = set of actions over all system state realizations)
– K queues are coupled together: exponentially large state space
– In general, we cannot have an explicit closed-form expression of how the objective function (average delay) is related to the control variables (policy)
– The problem is not convex
Solution: Markov Decision Problem (MDP)
Overview of Markov Decision Problem Formulation
Specification of an Infinite Horizon Markov Decision Problem
– Decisions are made at points of time: decision epochs
– System state and control action space:
  – At the t-th decision epoch, the system occupies a state
  – The controller observes the current state and applies an action
– Per-stage reward and transition probability:
  – By choosing an action, the system receives a reward
  – The system state at the next epoch is determined by a transition probability kernel
– Stationary control policy:
  – The set of actions for all system state realizations
– The optimization problem:
  – Average reward
  – Optimal policy
$$R^* = \max_{\pi} \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=1}^{T} R(S_t, A_t) \right], \qquad A_t = \pi(S_t)$$
Solution of a Markov Decision Problem
Optimal average reward
Optimal policy (Fixed Point Problem on Functional Space)
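To make the fixed-point view concrete, here is a minimal sketch of relative value iteration for a small average-reward MDP. It is a textbook method used for illustration, not necessarily the procedure in the slides; the function name, the array layout `P[a][s][j]` and `R[s][a]`, and the reference-state normalization are assumptions.

```python
def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iter=10_000):
    """P[a][s][j]: transition probabilities; R[s][a]: per-stage rewards.

    Returns (average-reward estimate, potential V with V[ref_state] = 0, greedy policy).
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # One-step lookahead value of each (state, action) pair.
        Q = [[R[s][a] + sum(P[a][s][j] * V[j] for j in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        TV = [max(row) for row in Q]
        theta = TV[ref_state]                 # current estimate of the optimal average reward
        V_new = [v - theta for v in TV]       # normalize so that V[ref_state] = 0
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return theta, V, policy
```

Usage on a toy two-state problem where state 0 pays reward 1 and action 1 leads to state 0 with probability 0.9: `relative_value_iteration([[[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.9, 0.1]]], [[1.0, 1.0], [0.0, 0.0]])` selects action 1 in both states. The cost of each sweep grows with the number of states, which is what makes this brute-force route impractical when K queues are coupled.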
Overview of Markov Decision Problem Formulation
Constrained Markov Decision Problem Formulation
Lagrangian approach to the Constrained MDP:
CMDP Formulation: Find the optimal control policy that minimizes
Optimal Solution
Infinite Horizon Average Reward MDP
Given a stationary control policy, the random process evolves like a Markov chain with transition kernel:
Solution is given by the “Bellman Equation”
“Potential function” (contribution of the state i to the average reward)
“Optimal value”
Equations and unknowns
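The equation on this slide was lost in extraction. For reference, the standard Bellman equation of an average-reward MDP has the form below, written with the per-stage reward $R(i, a)$ from the overview slide. The symbols $\theta$ (optimal value) and $V(i)$ (potential function) are assumed names, and the slides' own notation may differ.

```latex
\theta + V(i) = \max_{a} \Big[ R(i, a) + \sum_{j} \Pr[\, j \mid i, a \,]\, V(j) \Big]
\quad \text{for every state } i
```

For a delay-minimization problem the same form applies with a per-stage cost and a $\min$ in place of the $\max$. There is one equation per system state, so the system grows with the size of the state space.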
Centralized solution?
Obtain knowledge of the global QSI from K users (uplink)?
Heavy signaling load to deliver the QSI from the mobiles to the BS
Must have a distributive solution!
Optimal Solution – Online Learning
How to determine the potential function?
Brute-force solution of the Bellman equation? (Value iteration):
Too complicated: exponential complexity and memory requirement
Online stochastic learning!
Iteratively estimate the potential function based on real-time observation of CSI and QSI – online value iteration
Distributive Solution:
– Per-user Potential and LMs Initialization
– Online Policy Improvement Based on Per-subband Auction
– Online Per-user Potential and LMs Update [Local CSI, Local QSI]
– Termination
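The idea of estimating a potential function from real-time observations can be sketched with a stochastic-approximation update on a toy chain. This is a simplified stand-in, assuming a fixed policy, a two-state chain and a reference-state normalization; it is not the per-user update rule of the slides, and the rewards, transition probability and step size are hypothetical.

```python
import random

def learn_potential(num_slots=200_000, seed=0, ref_state=0):
    """Online estimate of the potential V under a fixed policy (toy two-state chain)."""
    rng = random.Random(seed)
    reward = [1.0, 0.0]      # hypothetical per-stage reward in each state
    p_next_is_0 = 0.9        # hypothetical: the next state is 0 with probability 0.9
    V = [0.0, 0.0]
    visits = [0, 0]
    s = 0
    for _ in range(num_slots):
        s_next = 0 if rng.random() < p_next_is_0 else 1   # new observation in this slot
        visits[s] += 1
        step = 1.0 / visits[s]                            # diminishing step size
        # Sample-based version of V(s) <- r(s) + E[V(s')] - V(ref_state)
        V[s] += step * (reward[s] + V[s_next] - V[ref_state] - V[s])
        s = s_next
    return V
```

With this normalization, `V[ref_state]` tracks the average reward of the chain (0.9 for these toy numbers), and only the visited state is updated in each slot, so no sweep over the full state space is needed.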
Decentralized Solution (I)
Online Per-user Primal-Dual Potential Learning Algorithm via Stochastic Approximation
Remark (Comparison to the deterministic NUM)
Deterministic NUM: Iterative updates are performed within the CSI coherence time, which limits the number of iterations and the performance.
Proposed online algorithm: Iterative updates evolve on the same time scale as the CSI and QSI and converge to a better solution (no longer limited by the coherence time of the CSI).
Both the per-user potential and the 2 LMs are updated simultaneously.
New observation at the beginning of the (l+1)-th slot
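The dual half of a primal-dual scheme can be sketched as a projected stochastic subgradient step on a Lagrange multiplier (LM) for an average power constraint. Everything specific here is an assumption for illustration: the water-filling stand-in for the power policy, the three channel gains, the step-size sequence and the function names. The potential update that runs alongside it in the slides is omitted.

```python
import random

def power_rule(lm, gain):
    """Hypothetical stand-in policy: water-filling power for a given multiplier."""
    return max(1.0 / lm - 1.0 / gain, 0.0)

def lm_update(lm, used_power, power_budget, step, lm_min=1e-3):
    """Raise the multiplier when the slot used more power than the budget, lower it otherwise."""
    return max(lm + step * (used_power - power_budget), lm_min)

def learn_multiplier(power_budget=1.0, num_slots=50_000, seed=0):
    """Run the multiplier update over i.i.d. channel realizations and return the final value."""
    rng = random.Random(seed)
    gains = [0.5, 1.0, 2.0]          # hypothetical i.i.d. channel gains
    lm = 1.0
    for l in range(1, num_slots + 1):
        g = rng.choice(gains)        # new channel observation in slot l
        p = power_rule(lm, g)
        lm = lm_update(lm, p, power_budget, step=0.1 / l ** 0.6)
    return lm
```

For these toy numbers the average power equals the budget at a multiplier of 6/13, so the iterate should settle near that value as the step size shrinks.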
Decentralized Solution (II)
Per-stage auction with K bidders (MSs) and one auctioneer
(BS)
Low complexity Scalarized Per-Subband Auction
Bidding: Each user submits a bid
Subband allocation:
Power allocation:
Charging:
Lemma: The per-stage social optimal scalarized bid (CSI, QSI) is
Water level depends on QSI (via the potential function)
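The bid, allocation and charging formulas did not survive on this slide, so the following is only a sketch of the auction mechanics under stated assumptions: each user computes a scalar bid per subband from its local channel gain and a queue-dependent weight, and the BS gives each subband to the highest bidder. The bid expression, the `weight` standing in for the QSI-dependent term, and all names are hypothetical. The charging step is not modeled.

```python
import math

def water_filling_power(weight, lm, gain):
    """Power with a water level weight / lm; a larger (queue-dependent) weight raises the level."""
    return max(weight / lm - 1.0 / gain, 0.0)

def bid(weight, lm, gain):
    """Hypothetical scalar bid: weighted rate minus the priced power at the water-filling point."""
    p = water_filling_power(weight, lm, gain)
    return weight * math.log(1.0 + p * gain) - lm * p

def run_auction(weights, lms, gains):
    """gains[k][n] is user k's gain on subband n. Returns one (winner, power) pair per subband."""
    n_users = len(weights)
    n_subbands = len(gains[0])
    result = []
    for n in range(n_subbands):
        bids = [bid(weights[k], lms[k], gains[k][n]) for k in range(n_users)]
        winner = max(range(n_users), key=bids.__getitem__)   # BS picks the highest bidder
        power = water_filling_power(weights[winner], lms[winner], gains[winner][n])
        result.append((winner, power))
    return result
```

For example, `run_auction([1.0, 3.0], [1.0, 1.0], [[8.0, 0.5], [0.4, 1.0]])` gives subband 0 to user 0, who has a strong channel there, and subband 1 to user 1, who carries the larger weight. Each user needs only its own channel and queue information to form its bids, which is the point of the decentralized design.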
Decentralized Solution
Theorem (Convergence of online per-user learning) Under some mild conditions, the distributive learning converges almost surely.
Theorem (Asymptotically Global Optimal) For large K, the online per-user learning algorithm is asymptotically globally optimal, and the summation of the per-user potentials approaches (w.p.1) the solution of the centralized Bellman equation.
Remark (Comparison to conventional stochastic learning)
Conventional SL: (1) for unconstrained MDP only, or the LMs for the CMDP are determined offline by simulation; (2) designed for a centralized solution, with the control action determined entirely from the potential update. Convergence proof based on the standard “contraction mapping” and fixed-point theorem argument.
Proposed SL: (1) simultaneous update of the LMs and the potential function; (2) the control action is determined by all the users’ potentials via per-stage auction. The per-user potential update is NOT a contraction mapping, and the standard proof does not apply.
Numerical Results
Average Delay per user vs SNR
Close-to-optimal performance even for a small number of users
Huge gain in delay performance compared with Modified Largest Weighted Delay First (M-LWDF), which is queue-length-weighted throughput maximization.
Numerical Results
Average Delay per user vs No. of users
The distributive solution has a huge gain in delay performance compared with 3 baselines.
Numerical Results
Illustration of convergence property: Potential function vs. the scheduling slot index (K=10)
Conclusion
Online Per-user Learning: Simultaneous update of LMs and potentials. Almost sure convergence.
Asymptotically globally optimal for large K
Optimal Strategy for the Auction Game:
– Delay-optimal power control: multi-level water-filling (QSI: water level; CSI: instantaneous allocation)
– Delay-optimal subband allocation: user selection based on (QSI, CSI)