Downlink Power Control in Two-Tier Cellular Networks with Energy-Harvesting Small Cells as Stochastic Games
Tran Kien Thuc, Ekram Hossain, and Hina Tabassum
Abstract: Energy harvesting in cellular networks is an emerging technique to enhance the sustainability of power-constrained wireless devices. This paper considers the co-channel deployment of a macrocell overlaid with small cells. The small cell base stations (SBSs) harvest energy from environmental sources, whereas the macrocell base station (MBS) uses a conventional power supply. Given a stochastic energy arrival process for the SBSs, we derive a power control policy for the downlink transmission of both the MBS and the SBSs such that they can achieve their objectives (e.g., maintain the signal-to-interference-plus-noise ratio (SINR) at an acceptable level) on a given transmission channel. We consider a centralized energy harvesting mechanism for the SBSs, i.e., there is a central energy queue (CEQ) where energy is harvested and then distributed to the SBSs. When the number of SBSs is small, the game between the CEQ and the MBS is modeled as a single-controller stochastic game, and the equilibrium policies are obtained as the solution of a quadratic programming problem. However, when the number of SBSs tends to infinity (i.e., a highly dense network), the centralized scheme becomes infeasible; therefore, we use a mean field stochastic game to obtain a distributed power control policy for each SBS. By solving a system of partial differential equations, we derive the power control policy of the SBSs given the knowledge of the mean field distribution and the available harvested energy levels in their batteries.
Index Terms: Small cell networks, power control, energy harvesting, stochastic game, mean field game.
I. INTRODUCTION
Energy harvesting from environmental resources (e.g., through solar panels, wind power, or geothermal power) is a potential technique to reduce the energy cost of operating the base stations (BSs) in emerging multi-tier cellular networks. While this solution may not be practically feasible for macrocell base stations (MBSs) due to their high power consumption and the stochastic nature of energy harvesting sources, it is appealing for small cell BSs (SBSs), which typically consume less power [2].
Designing efficient power control policies with different objectives (e.g., maximizing system throughput) is one of the major challenges in energy-harvesting networks. In [3], the authors proposed an offline power control policy for two-hop transmission systems assuming that energy arrival information is available at the nodes. The optimal transmission policy was given by the directional water-filling method. In [4], the authors generalized this idea to the case where many sources supply energy to the destinations using a single relay. A water-filling algorithm was proposed to minimize the probability of outage. Although offline power control policies provide an upper bound and a heuristic for online algorithms, they require knowledge of the energy/data arrivals, which may not be available in practice. In [5], the authors proposed a two-state Markov Decision Process (MDP) model for a single energy-harvesting device considering a random rate of energy arrival and different priority levels for the data packets. The authors proposed a low-cost balance policy to maximize the system throughput by adapting the energy harvesting state such that, on average, the harvested and consumed energy remain balanced. Recently, in [6], an outage performance analysis was conducted for a multi-tier cellular network in which all BSs are powered by harvested energy. A detailed survey on energy harvesting systems can be found in [7], where the authors summarized the current research trends and potential challenges.

The authors are with the Department of Electrical and Computer Engineering at the University of Manitoba, Canada (emails: [email protected], {Ekram.Hossain,Hina.Tabassum}@umanitoba.ca).
Compared to the existing literature on energy-harvesting systems, this paper considers the power control problem for downlink transmission in two-tier macrocell-small cell networks, taking into account the stochastic nature of the energy arrival process at the SBSs. Note that the power control policies and their resulting interference levels directly affect the overall system performance; the design of efficient power control policies is thus of paramount importance. In this context, we formulate a discounted stochastic game model in which all SBSs form a coalition to compete with the MBS in order to achieve the target signal-to-interference-plus-noise ratio (SINR) of their users through transmit power control. Using the stochastic game model, we capture the randomness of the system, and the Nash equilibrium power control policy is obtained as the solution of a quadratic programming problem. For the case when the number of SBSs is very large, the stochastic game is approximated by a mean field game (MFG). In general, mean field games are designed to study strategic decision making in very large populations of small interacting individuals. Recently, in [8], the authors used a mean field game to determine the optimal power control policy for a finite-battery-powered small cell network. In this paper, by solving a set of forward and backward partial differential equations, we derive a distributed power control policy for each SBS using the MFG model.
The contributions of the paper can be summarized as follows.
1) For a macrocell-small cell network, we consider a centralized energy harvesting mechanism for the SBSs in which energy is harvested and then distributed to the SBSs through a centralized energy queue (CEQ). Note that the concept of the CEQ is somewhat similar to the concept of dedicated power beacons that are responsible for wireless energy transfer to users in cellular networks [9], [10]. Moreover, in the cloud-RAN architecture [11], along with data processing resources, a centralized cloud can also act as an energy farm that distributes energy to the remote radio heads, each of which acts as an SBS. Subsequently, we formulate the power control problem for the MBS and SBSs as a discrete single-controller stochastic game with two players.
2) The existence of the Nash equilibrium and of pure stationary strategies for this single-controller stochastic game is proven. The power control policy is derived as the solution of a quadratically constrained quadratic programming problem.
3) When the network becomes very dense, a stochastic MFG model is used to obtain the power control policy as the solution of the forward and backward differential equations.
4) An algorithm using the finite difference method is proposed to solve these forward-backward differential equations for the MFG model.
Numerical results demonstrate that the proposed power control policies offer reduced outage probability for the users of the SBSs when compared to greedy power control policies, wherein each SBS tries to obtain the target SINR of its user without considering the strategies of other SBSs.
The rest of the paper is organized as follows. Section II describes the system model and assumptions used in the paper. The formulation of the single-controller stochastic game model for multiple SBSs is presented in Section III. In Section IV, we derive the distributed power control policy using an MFG model when the number of SBSs increases asymptotically. Performance evaluation results are presented in Section V before the paper is concluded in Section VI.
II. SYSTEM MODEL AND ASSUMPTIONS
A. Energy Harvesting Model
We consider a single macrocell overlaid with $M$ small cells. The downlink co-channel transmission of the MBS and SBSs is considered, and it is assumed that each BS can serve only a single user on a given transmission channel during a transmission interval (e.g., a time slot). The MBS uses a conventional power source, and its transmit power is quantized into a discrete set of power levels $\mathcal{P} = \{p_0^{\min}, \ldots, p_0^{\max}\}$, where the subscript 0 denotes the MBS. This discrete model of transmit power can also be found in [12]. On the other hand, the SBSs receive energy from a centralized energy queue (CEQ), which harvests renewable energy from the environment. We assume that only the CEQ can store energy for future use and that each SBS must consume all the energy it receives from the CEQ in every time slot. The energy arrives at the CEQ in the form of packets (one energy packet corresponds to one energy level in the CEQ). The number of energy packet arrivals $\lambda(t)$ during any time interval $t$ is discrete and follows an arbitrary distribution $\Pr(\lambda(t) = X)$. We assume that the battery at the CEQ has a finite storage $S$. Therefore, the number of energy packet arrivals is constrained by this limit and all the exceeding energy packets are lost, i.e., $\Pr(\lambda(t) = S) = \Pr(\lambda(t) \geq S)$. The statistics of the energy arrival are known a priori at both the MBS and the CEQ. At time $t$, given the battery level $E(t)$, the number of energy packet arrivals $\lambda(t)$, and the number of energy packets $Q(t)$ that the CEQ distributes to the $M$ SBSs, the battery level $E(t+1)$ at the next time slot can be calculated as follows (energy above the capacity $S$ is lost):

$$E(t+1) = \min\{E(t) - Q(t) + \lambda(t),\ S\}. \tag{1}$$

Given $Q(t)$ energy packets to distribute, the CEQ chooses the best allocation method for the $M$ SBSs according to their desired objectives. Denoting the slot duration by $T$ and the volume of one energy packet by $K$, the energies distributed to the $M$ SBSs at time $t$ are $(p_1(t)T, p_2(t)T, \ldots, p_M(t)T)$, where $p_i(t)$ is the transmit power of SBS $i$ at time $t$. Clearly, we must have

$$\sum_{i=1}^{M} p_i(t) = \frac{K}{T}\, Q(t). \tag{2}$$

From the causality constraint, $E(t) \geq Q(t) \geq 0$, i.e., the CEQ cannot send more energy than it currently possesses. Note that $E(t)$ is the current battery level, which is an integer whose maximum value is limited by $S$. Since the battery level of the CEQ and the number of packet arrivals are integer values, it follows from (1) that $Q(t)$ is also an integer.
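The bookkeeping in (1) and (2) is easy to simulate. The sketch below assumes NumPy, a hypothetical arrival pmf, and a hypothetical fixed-share split of the packets among the SBSs (in the text, the split comes from Remark 2); the function names are illustrative only.

```python
import numpy as np

def step_battery(E, Q, arrivals, S):
    """CEQ battery update (1): E(t+1) = min(E(t) - Q(t) + lambda(t), S)."""
    if not 0 <= Q <= E:
        raise ValueError("causality violated: need 0 <= Q(t) <= E(t)")
    return min(E - Q + arrivals, S)            # packets above the capacity S are lost

def powers_from_shares(Q, K, T, shares):
    """Turn Q packets of volume K into SBS powers satisfying (2): sum_i p_i = K*Q/T.

    `shares` is a hypothetical allocation rule (non-negative weights summing to 1).
    """
    return np.asarray(shares, dtype=float) * K * Q / T

# Short simulation with a hypothetical arrival pmf over {0, ..., 5}.
rng = np.random.default_rng(0)
pmf = [0.3, 0.3, 0.2, 0.1, 0.05, 0.05]
S, E = 5, 2
trace = [E]
for t in range(20):
    Q = min(E, 1)                              # illustrative rule: spend one packet if available
    E = step_battery(E, Q, int(rng.choice(len(pmf), p=pmf)), S)
    trace.append(E)
```

Any allocation rule can replace `powers_from_shares`, as long as the resulting powers still sum to $KQ/T$.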
Without a centralized CEQ-based architecture, each SBS can have different harvested energy and, in turn, different battery levels at each time slot, which would make this problem a multi-agent stochastic game [13]. Although this kind of game can be heuristically solved by using Q-learning [14], the conditions for convergence to a Nash equilibrium are often very strict and, in many cases, impractical. By introducing the CEQ, the state of the game is simplified to the battery level of the CEQ, and the multi-player game is converted into a two-player game. Another benefit of the centralized CEQ-based architecture is that the energy can be distributed based on the channel conditions of the users in the SBSs, so that the total payoff is higher than in the case where each SBS individually stores and consumes the energy.
All the symbols that are used in the system model and Section III are listed in Table I.
B. Channel Model
The received SINR at the user served by SBS $i$ at time slot $t$ is defined as follows:

$$\Gamma_i(t) = \frac{p_i(t)\, g_{i,i}}{I_i(t)}, \tag{3}$$

where $I_i(t) = \sum_{j=1, j \neq i}^{M} p_j g_{i,j} + p_0 g_{i,0}$ is the interference caused by the other BSs. Here, $g_{i,0}$ is the channel gain between the MBS and the user served by SBS $i$, $g_{i,i}$ represents the channel gain between SBS $i$ and the user it serves, and $g_{i,j}$ is the channel gain between SBS $j$ and the user served by SBS $i$. Finally, $p_i(t)$ represents the transmit power of SBS $i$ at time $t$. The transmit power of the MBS, $p_0(t)$, belongs to the discrete set $\{p_0^{\min}, \ldots, p_0^{\max}\}$. We ignore the thermal noise at the small cell users, assuming that it is very small compared to the cross-tier interference.

TABLE I: List of symbols used for the single-controller stochastic game model
$g_i$: Average channel gain between BS $i$ and its associated user
$g_{i,j}$: Average channel gain between BS $j$ and the user of BS $i$
$\gamma_0$ ($\gamma_1$): Target SINR for the MBS (SBS)
$E(t)$: (Discrete) battery level of the CEQ at time $t$
$T$: Duration of one time slot in seconds
$Q(t)$: Number of quanta distributed by the CEQ at time $t$
$I_0(t)$: Average interference at the user served by the MBS at time $t$
$I_i(t)$: Average interference at the user served by SBS $i$ at time $t$
$S$: Maximum battery level of the CEQ
$\mathcal{P}$: Finite set of transmit powers of the MBS
$\mathbf{m}$, $\mathbf{n}$: Concatenated mixed-strategy vectors for the MBS and the CEQ, respectively
$\mathbf{m}(s)$, $\mathbf{n}(s)$: Probability mass functions for the actions of the MBS and the CEQ, respectively, when $E(t) = s$
$m(s, p)$: Probability that the MBS chooses power $p \in \mathcal{P}$ when $E(t) = s$
$n(s, i)$: Probability that the CEQ sends $i$ quanta when $E(t) = s$
$\lambda(t)$: Energy harvested at time $t$
$\pi_s$: Probability that the CEQ starts with battery level $s$
$\beta$: Discount factor of the stochastic game
$U_0$, $U_1$: Utility functions of the MBS and the CEQ, respectively
$R_0$, $R_1$: Payoff matrices for the MBS and the CEQ, respectively
$v_0$, $v_1$: Discounted sums of the value functions of the MBS and the CEQ, respectively
Similarly, the SINR at a macrocell user can be calculated as follows:

$$\Gamma_0(t) = \frac{p_0(t)\, g_{0,0}}{I_0(t) + N_0}, \tag{4}$$

where $I_0(t) = \sum_{i=1}^{M} p_i g_{0,i}$ is the cross-tier interference from the $M$ SBSs to the macrocell user, $g_{0,0}$ denotes the channel gain between the MBS and its user, $g_{0,i}$ represents the channel gain between SBS $i$ and the macrocell user, and $N_0$ is the thermal noise.
The channel gain $g_{i,j}$ is calculated based on the path-loss and fading gains as follows:

$$g_{i,j} = |h|^2 r_{i,j}^{-\alpha}, \tag{5}$$

where $r_{i,j}$ is the distance from BS $j$ to the user served by BS $i$, $h$ follows a Rayleigh distribution, and $\alpha$ is the path-loss exponent. We assume that the $M$ SBSs are randomly located around the MBS and that the users are uniformly distributed within their coverage radii $r$.
III. FORMULATION AND ANALYSIS OF THE SINGLE-CONTROLLER STOCHASTIC GAME
A stochastic game is a dynamic multi-stage game that changes its form at each stage with some probability. In this game, the total payoff to a player is often the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs. The transition of the game at each time instant follows the Markov property, i.e., the current stage depends only on the previous one. Therefore, a stochastic game can be viewed as a generalization of both a repeated game and an MDP. A single-controller game is a stochastic game in which the state of the game is decided by one player, namely, the controller. The other player can only influence the state indirectly through the controller (see [15] for details).
A. Utility Functions of MBS and SBSs

For a given MBS and CEQ, we formulate a two-player non-cooperative stochastic power control game, where the MBS and the SBSs try to maintain the average target SINR values of their users. As in [16], we define the utility function of the MBS at time $t$ as

$$U_0(p_0, Q, t) = -\big(p_0(t)\, g_0 - \gamma_0\,(I_0(t) + N_0)\big)^2, \tag{6}$$

where $I_0(t) = \sum_{i=1}^{M} p_i(t)\, g_{0,i}$ is the average interference at the macrocell user at time $t$, and $\gamma_0$ is the target SINR of the macrocell user.
Similarly, the utility function of the CEQ is defined as follows:

$$U_1(p_0, Q, t) = -\frac{1}{M}\sum_{i=1}^{M}\big(p_i(t)\, g_i - \gamma_1 I_i(t)\big)^2, \tag{7}$$

where $I_i(t) = \sum_{j=1, j \neq i}^{M} p_j(t)\, g_{i,j} + p_0(t)\, g_{i,0}$ is the average interference at the user served by SBS $i$ at time $t$, and $\gamma_1$ is the target SINR of the small cell users. The arguments of both utility functions indicate that the action of the MBS at time $t$ is its transmit power $p_0(t)$, while the action of the CEQ is the number of energy packets $Q(t)$ used to transmit data from the SBSs. Later, in Remark 2, we will show that the interference and transmit power of each SBS can be derived from $Q$ and $p_0$. The conflict between the payoffs of the two players arises from their transmit powers, which directly impact the cross-tier interference.
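A sketch of the stage utilities (6) and (7), assuming NumPy and a hypothetical array layout (`G[i, j]` for $g_{i,j}$, `g0[i]` for $g_{i,0}$, `gm[0]` for $g_{0,0}$ and `gm[i]` for $g_{0,i}$). Because both utilities are negative squared errors, a value of 0 means the corresponding target-SINR condition holds exactly.

```python
import numpy as np

def utilities(p0, p, G, g0, gm, N0, gamma0, gamma1):
    """Stage utilities U0 of (6) and U1 of (7); both are <= 0 and equal 0 when the target is met.

    Hypothetical layout: G[i, j] = g_{i,j}, g0[i] = g_{i,0}, gm[0] = g_{0,0}, gm[i] = g_{0,i}.
    """
    p = np.asarray(p, dtype=float)
    G = np.asarray(G, dtype=float)
    I0 = np.dot(np.asarray(gm[1:], dtype=float), p)                   # interference at the macro user
    U0 = -(p0 * gm[0] - gamma0 * (I0 + N0)) ** 2
    Ii = G @ p - np.diag(G) * p + p0 * np.asarray(g0, dtype=float)    # interference at SBS users
    U1 = -np.mean((p * np.diag(G) - gamma1 * Ii) ** 2)
    return U0, U1
```

Writing the utility as $-(\text{signal} - \gamma \cdot \text{interference})^2$ avoids dividing by the interference, so the expression stays finite even when $I_i = 0$.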
Note that the proposed single-controller approach can be extended to consider a variety of utility functions (average throughput, total network throughput, energy efficiency, etc.).
B. Formulation of the Game Model
Unlike in a traditional power control problem, the action space of the CEQ changes over time and is limited by its battery level. Given the distribution of the energy arrival and the discount factor $\beta$, the power control problem can be modeled as a single-controller discounted stochastic game as follows:

Players: There are two players: one MBS and one CEQ.

States: The state of the game is the battery level of the CEQ, which belongs to $\{0, \ldots, S\}$.

Actions: At time $t$ and state $s$, the action $p_0(t)$ of the MBS is its transmit power, which belongs to the finite set $\mathcal{P} = \{p_0^{\min}, \ldots, p_0^{\max}\}$. On the other hand, the action of the CEQ is $Q(t)$, the number of energy packets distributed to the $M$ SBSs, which belongs to the set $\{0, \ldots, s\}$.
Let $\mathbf{m}$ and $\mathbf{n}$ denote the concatenated mixed stationary strategy vectors of the MBS and the CEQ, respectively. The vector $\mathbf{m}$ is constructed by concatenating $S+1$ sub-vectors into one large vector as $\mathbf{m} = [\mathbf{m}(0), \mathbf{m}(1), \ldots, \mathbf{m}(S)]$, in which each $\mathbf{m}(s)$ is the probability mass function for the actions of the MBS at state $s$. For example, if the game is in state $s$, $m(s, p)$ gives the probability that the MBS transmits with power $p$. Therefore, the full form of $\mathbf{m}$ includes both the state $s$ and the power $p$. However, to keep the formulas simple, in the later parts of the paper we use $\mathbf{m}$ or $\mathbf{m}(s)$ to denote the whole vector or the sub-vector at state $s$, respectively.

Similarly, for the CEQ, $n(s, i)$ gives the probability that the CEQ distributes $i$ energy packets at state $s$. Note that the available actions of the CEQ vary dynamically with the state, whereas the available actions of the MBS remain unchanged in every state.
Pay-offs: At state $s$, if the MBS transmits with power $p_0$ and the CEQ distributes $Q$ energy packets, the payoff for the MBS is $U_0(p_0, Q)$, while the payoff for the CEQ is $U_1(p_0, Q)$. We omit $t$ since it does not directly appear in $U_0$ and $U_1$.
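Discounted Pay-offs: Denote by $\beta$ the discount factor ($0 < \beta < 1$). Then the discounted sum of payoffs of the MBS is given as

$$v_0(s, \mathbf{m}, \mathbf{n}) = \lim_{T \to \infty} \sum_{t=1}^{T} \beta^{t}\, \mathbb{E}[U_0(\mathbf{m}, \mathbf{n}, t)], \tag{8}$$

where $\mathbb{E}[U_0(\mathbf{m}, \mathbf{n}, t)]$ is the average utility of the macrocell user at time $t$ when the game starts in state $s$ and the MBS and the CEQ use strategies $\mathbf{m}$ and $\mathbf{n}$, respectively. Similarly, we define the discounted sum of payoffs $v_1$ at the CEQ. In [17, Chapter 2], it was proven that the limits $v_0$ and $v_1$ always exist as $T \to \infty$.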
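Objective: To find a pair of strategies $(\mathbf{m}^*, \mathbf{n}^*)$ that forms a Nash equilibrium, i.e., for every state $s$, $v_0(s, \mathbf{m}^*, \mathbf{n}^*) \geq v_0(s, \mathbf{m}, \mathbf{n}^*)$ for all $\mathbf{m} \in \mathcal{M}$ and $v_1(s, \mathbf{m}^*, \mathbf{n}^*) \geq v_1(s, \mathbf{m}^*, \mathbf{n})$ for all $\mathbf{n} \in \mathcal{N}$, where $\mathcal{M}$ and $\mathcal{N}$ are the sets of strategies of the MBS and the CEQ, respectively.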
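Given the distribution of energy arrival at the CEQ, the transition probability of the system from state $s$ to state $s'$ under action $Q$ ($0 \leq Q \leq s$) of the CEQ is given as follows:

$$q(s' \mid s, Q) = \begin{cases} \Pr\big(\lambda = s' - (s - Q)\big), & \text{if } s' < S, \\ 1 - \sum_{X=0}^{S-(s-Q)-1} \Pr(\lambda = X), & \text{otherwise}, \end{cases} \tag{9}$$

where $\Pr(\lambda = X) = 0$ for $X < 0$.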
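The transition law (9) can be tabulated for each state-action pair. This minimal sketch assumes NumPy and an arrival pmf given as a list `arrival_pmf[X]` $= \Pr(\lambda = X)$ (hypothetical input); it mirrors the two cases of (9): exact arrival counts for the states below $S$, and the accumulated overflow mass in state $S$.

```python
import numpy as np

def transition_probs(s, Q, arrival_pmf, S):
    """Row q(. | s, Q) of (9): next-state pmf over {0, ..., S}."""
    if not 0 <= Q <= s <= S:
        raise ValueError("need 0 <= Q <= s <= S")
    pmf = np.asarray(arrival_pmf, dtype=float)
    q = np.zeros(S + 1)
    base = s - Q                               # energy left after the distribution
    for s_next in range(base, S):              # case s' < S: exactly s' - base arrivals
        X = s_next - base
        if X < len(pmf):
            q[s_next] = pmf[X]
    q[S] = max(0.0, 1.0 - q[:S].sum())         # case s' = S: Pr(lambda >= S - base)
    return q
```

Stacking these rows for every feasible $(s, Q)$ gives the coefficients $q(s' \mid s, j)$ used in (11) and (12).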
The states of the game can be described by a Markov chain whose transition probabilities are defined by (9). Clearly, the CEQ controls the state of the game, while the MBS has no direct influence. Therefore, the single-controller stochastic game can be applied to derive the Nash equilibrium strategies for both the MBS and the CEQ. The two main steps to find the Nash equilibrium strategies are as follows:
1) First, we build the payoff matrices for the MBS and the CEQ for every state $s$, where $0 \leq s \leq S$. Denote them by $R_0$ and $R_1$, respectively.
2) Second, using these matrices, we solve a quadratic programming problem to obtain the Nash equilibrium strategies for both the MBS and the CEQ.
C. Calculation of the Payoff Matrices
To build $R_0$ and $R_1$, we calculate $U_0$ and $U_1$ for every possible pair $(p_0, Q)$, where $p_0 \in \mathcal{P}$ and $0 \leq Q \leq s \leq S$. In this regard, we first derive the average channel gain $\bar{g}_{i,j}$. Second, from the energy $Q$ consumed by the CEQ and the transmit power $p_0$ of the MBS, we decide how the CEQ distributes this energy $Q$ among the SBSs. Then, we calculate the transmit power at each SBS and obtain $U_0$ and $U_1$. The next two remarks provide the methods to calculate $U_0$ and $U_1$.

Fig. 1: Graphical illustration of the two BSs A and B, and the user D located within the disk centered at B.
Remark 1. Given two BSs A and B, assume that a user D, who is associated with B, is uniformly located within the disk centered at B with radius $r$ (Fig. 1). Assume that A lies outside this disk (i.e., $R > r$, so that A is not on the circle or inside it) and that $\alpha = 4$. Denote $AB = R$ and $AD = d$; then the expected value of $d^{-4}$ is $\mathbb{E}[d^{-4}] = \frac{1}{(R^2 - r^2)^2}$. If $A \equiv B$, then $\mathbb{E}[d^{-4}] = \frac{1 - r^{-2}}{r^2}$, given that $r \geq BD \geq 1$. For other values of $\alpha$, $\mathbb{E}[r_{i,j}^{-\alpha}]$ can be easily computed numerically using tools such as MATHEMATICA.

Proof. See Appendix A.
Recall that $g_{i,j} = |h|^2 r_{i,j}^{-4}$ and that the fading and the path-loss are independent. We therefore have $\bar{g}_{i,j} = \mathbb{E}[|h|^2]\, \mathbb{E}[r_{i,j}^{-4}]$, where $\mathbb{E}[|h|^2] = 2\sigma_h^2$ if $h$ follows a Rayleigh distribution with scale parameter $\sigma_h$, and $\mathbb{E}[r_{i,j}^{-4}]$ can be calculated using the remark above. Next, we need to find how the CEQ distributes its energy to each SBS such that $U_1$ is maximized.
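Remark 1 can be sanity-checked by Monte Carlo sampling of the user position. The sketch below assumes NumPy, places B at the origin and A at $(R, 0)$, and uses the Rayleigh convention $\mathbb{E}[|h|^2] = 2\sigma_h^2$ (NumPy's `Generator.rayleigh` uses the same scale parameter). The same sampler also estimates $\mathbb{E}[d^{-\alpha}]$ for other $\alpha$, which is one possible way to carry out the numerical computation mentioned in Remark 1; the function names are hypothetical.

```python
import numpy as np

def expected_inv_d4_closed_form(R, r):
    """E[d^-4] from Remark 1: user uniform in a disk of radius r around B, |AB| = R > r."""
    return 1.0 / (R ** 2 - r ** 2) ** 2

def expected_inv_d_monte_carlo(R, r, alpha=4.0, n=200_000, seed=0):
    """Monte Carlo estimate of E[d^-alpha] for the same geometry (any path-loss exponent)."""
    rng = np.random.default_rng(seed)
    rho = r * np.sqrt(rng.random(n))           # radius r*sqrt(U) gives a uniform point in the disk
    theta = 2 * np.pi * rng.random(n)
    d = np.hypot(rho * np.cos(theta) - R, rho * np.sin(theta))   # distance to A = (R, 0)
    return np.mean(d ** (-alpha))

def average_gain(R, r, sigma_h):
    """Average gain E[|h|^2] * E[d^-4] with Rayleigh fading of scale sigma_h."""
    return 2 * sigma_h ** 2 * expected_inv_d4_closed_form(R, r)
```

For $R = 100$ and $r = 30$, the Monte Carlo estimate should agree with the closed form to within sampling error (well under 1% with the default sample size).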
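Remark 2 (Optimal energy distribution at the CEQ). If at time $t$ the CEQ distributes $Q$ energy packets to the $M$ SBSs and the MBS transmits with power $p_0$, then the transmit powers $(p_1, p_2, \ldots, p_M)$ of the $M$ SBSs are the solution of the following optimization problem ($t$ is omitted for brevity):

$$\begin{aligned} \max_{p_1, p_2, \ldots, p_M}\quad & -\frac{1}{M}\sum_{i=1}^{M}\Big(p_i g_i - \gamma_1\Big(\sum_{j=1, j \neq i}^{M} p_j g_{i,j} + p_0 g_{i,0}\Big)\Big)^2 \\ \text{s.t.}\quad & \sum_{i=1}^{M} p_i = \frac{K}{T}\, Q, \\ & P_{\max} \geq p_i \geq 0, \quad i = 1, 2, \ldots, M, \end{aligned} \tag{10}$$

where $P_{\max}$ is the maximum transmit power of each SBS. Since this problem is strictly concave, the solution $(p_1, \ldots, p_M)$ always exists and is unique for each pair $(Q, p_0)$, provided that the feasible set is nonempty, i.e., $\frac{K}{T}Q \leq M P_{\max}$. Thus, for each pair $(Q, p_0)$, where $Q \in \{0, \ldots, S\}$ and $p_0 \in \{p_0^{\min}, \ldots, p_0^{\max}\}$, we have unique values of $U_0(p_0, Q)$ and $U_1(p_0, Q)$.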
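Problem (10) is a concave quadratic program over a box-constrained simplex, so any QP solver applies. As an illustration only, the sketch below swaps in projected gradient ascent (not necessarily the method used by the authors), with a bisection-based Euclidean projection onto $\{\sum_i p_i = KQ/T,\ 0 \leq p_i \leq P_{\max}\}$. It assumes NumPy and a hypothetical array layout: `G[i, j]` holds $g_{i,j}$ (diagonal $g_i$) and `g0[i]` holds $g_{i,0}$.

```python
import numpy as np

def project_capped_simplex(y, c, pmax, iters=60):
    """Euclidean projection of y onto {p : sum(p) = c, 0 <= p_i <= pmax} via bisection on a shift tau."""
    lo, hi = np.min(y) - pmax, np.max(y)       # the clipped sum equals M*pmax at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(y - tau, 0.0, pmax).sum() > c:
            lo = tau
        else:
            hi = tau
    return np.clip(y - 0.5 * (lo + hi), 0.0, pmax)

def ceq_objective(p, p0, G, g0, gamma1):
    """Objective of (10): -(1/M) * sum_i (p_i g_i - gamma1 * I_i)^2."""
    G = np.asarray(G, dtype=float)
    I = G @ p - np.diag(G) * p + p0 * np.asarray(g0, dtype=float)
    return -np.mean((np.diag(G) * p - gamma1 * I) ** 2)

def sbs_powers(Q, p0, G, g0, gamma1, K, T, pmax, iters=2000):
    """Solve (10) by projected gradient ascent (illustrative method)."""
    G = np.asarray(G, dtype=float)
    M = G.shape[0]
    c = K * Q / T                              # total SBS power required by (2)
    if c > M * pmax:
        raise ValueError("infeasible: K*Q/T exceeds M*pmax")
    # Residuals A p - b with A[i,i] = g_i, A[i,j] = -gamma1*g_{i,j}, b_i = gamma1*p0*g_{i,0}.
    A = -gamma1 * G
    np.fill_diagonal(A, np.diag(G))
    b = gamma1 * p0 * np.asarray(g0, dtype=float)
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1/L for the gradient of ||A p - b||^2
    p = np.full(M, c / M)                      # feasible starting point
    for _ in range(iters):
        p = project_capped_simplex(p - step * 2.0 * A.T @ (A @ p - b), c, pmax)
    return p
```

In Algorithm 1 this solver would be called once per pair $(Q, p_0)$ and the results stored in a lookup table.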
Based on the remarks above, for each combination of $Q$ and $p_0$, we can find the unique payoffs $U_0$ and $U_1$ of the MBS and the CEQ. Since $Q$ and $p_0$ belong to discrete sets, we can find the payoffs for all possible combinations of them. Thus, we can build the payoff matrix $R_0$ for the MBS and $R_1$ for the CEQ. The matrix $R_0$ has the form of a block-diagonal matrix $\mathrm{diag}(R_0^0, \ldots, R_0^S)$, where each sub-matrix $R_0^s = \big(U_0(p_0, j)\big)_{p_0 \in \mathcal{P},\, j \in \{0, \ldots, s\}}$ is the $|\mathcal{P}| \times (s+1)$ matrix of all possible payoffs for the MBS at state $s$. Similarly, we can build $R_1$, the payoff matrix for the CEQ. A detailed explanation of how we use them is given in the next subsection.
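Once $U_0(p_0, j)$ is tabulated, assembling the block-diagonal $R_0$ (and likewise $R_1$) is mechanical. This sketch assumes NumPy and a callable `U(p0, j)` (hypothetical; in practice it would come from solving (10) for each pair). Since block $s$ contributes $|\mathcal{P}|$ rows and $s+1$ columns, the result has $|\mathcal{P}|(S+1)$ rows and $(S+1)(S+2)/2$ columns.

```python
import numpy as np

def payoff_matrix(U, powers, S):
    """Block-diagonal matrix diag(R^0, ..., R^S) with R^s[k, j] = U(powers[k], j), j = 0..s."""
    blocks = [np.array([[U(p0, j) for j in range(s + 1)] for p0 in powers]) for s in range(S + 1)]
    R = np.zeros((len(powers) * (S + 1), (S + 1) * (S + 2) // 2))
    row = col = 0
    for B in blocks:                           # place each state's block on the diagonal
        R[row:row + B.shape[0], col:col + B.shape[1]] = B
        row += B.shape[0]
        col += B.shape[1]
    return R
```

The zero off-diagonal blocks encode that an action pair only earns a payoff in the state where it is taken.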
D. Derivation of the Nash Equilibrium
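If we know the strategy $\mathbf{m}_0$ of the MBS, the discount factor $\beta$, and the probability $\pi_s$ that the CEQ starts with $s$ energy packets in the battery, then the stochastic game reduces to a simple MDP with only one player, the CEQ. For this case, denote the CEQ's best response strategy to $\mathbf{m}_0$ by $\mathbf{n}$. Then the CEQ's value function $v_1(s, \mathbf{m}_0, \mathbf{n})$, where $s = 0, \ldots, S$, is the solution of the following linear program [17, Chapter 2]:

$$\begin{aligned} \min_{v_1}\quad & \sum_{s=0}^{S} \pi_s\, v_1(s, \mathbf{m}_0, \mathbf{n}), \\ \text{s.t.}\quad & v_1(s, \mathbf{m}_0, \mathbf{n}) \geq r_1(s, \mathbf{m}_0, j) + \beta \sum_{s'=0}^{S} q(s' \mid s, j)\, v_1(s', \mathbf{m}_0, \mathbf{n}), \\ & \forall s, j:\ 0 \leq j \leq s,\ 0 \leq s \leq S, \end{aligned} \tag{11}$$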
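where $r_1(s, \mathbf{m}_0, j) = \sum_{p_0 \in \mathcal{P}} U_1(p_0, j)\, m_0(s, p_0)$ is the average payoff for the CEQ at state $s$ when it consumes $j$ quanta of energy. Using the Kronecker delta $\delta$, the dual problem can be expressed as

$$\begin{aligned} \max_{\mathbf{x}}\quad & \sum_{s=0}^{S}\sum_{j=0}^{s} r_1(s, \mathbf{m}_0, j)\, x_{s,j}, \\ \text{s.t.}\quad & \sum_{s=0}^{S}\sum_{j=0}^{s} \big[\delta(s - s') - \beta\, q(s' \mid s, j)\big]\, x_{s,j} = \pi_{s'}, \quad 0 \leq s' \leq S, \\ & x_{s,j} \geq 0, \quad \forall s, j:\ 0 \leq j \leq s,\ 0 \leq s \leq S, \end{aligned} \tag{12}$$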
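where $\delta(s) = 1$ if $s = 0$ and $\delta(s) = 0$ otherwise. By solving the pair of linear programs above, the probability that the CEQ chooses action $j$ at state $s$ can be found as $n(s, j) = x_{s,j} / \sum_{j'=0}^{s} x_{s,j'}$. Using some algebraic manipulations, we can write (11) in matrix form as

$$\min_{\mathbf{v}_1}\ \boldsymbol{\pi}^T \mathbf{v}_1, \quad \text{s.t.}\ H \mathbf{v}_1 \geq R_1^T \mathbf{m}_0, \tag{P}$$

and its dual (12) as

$$\max_{\mathbf{x}}\ \mathbf{m}_0^T R_1 \mathbf{x}, \quad \text{s.t.}\ \mathbf{x}^T H = \boldsymbol{\pi}^T,\ \mathbf{x} \geq 0, \tag{D}$$

where $R_1$ is the payoff matrix of the CEQ and $H$ is the constraint matrix whose row for the pair $(s, j)$ has entries $\delta(s - s') - \beta\, q(s' \mid s, j)$, $s' = 0, \ldots, S$. Combining the primal and dual linear programs (P) and (D) and using the same notation, we have the following theorems.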
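Theorem 1 (Nash equilibrium strategies [15]). If the state space and the action spaces are finite and discrete, and the transition probabilities are controlled only by player 2 (i.e., the CEQ), then there always exists a Nash equilibrium point $(\mathbf{m}^*, \mathbf{n}^*)$ for this stochastic game. Moreover, a pair $(\mathbf{m}^*, \mathbf{n}^*)$ is a Nash equilibrium point of a general-sum single-controller discounted stochastic game if and only if it is an optimal solution of the (bilinear) quadratic program given by

$$\begin{aligned} \max_{\mathbf{m}, \mathbf{x}, \mathbf{v}_1, \boldsymbol{\xi}}\quad & \big[\mathbf{m}^T (R_0 + R_1)\mathbf{x} - \boldsymbol{\pi}^T \mathbf{v}_1 - \mathbf{1}^T \boldsymbol{\xi}\big], \\ \text{s.t.}\quad & H \mathbf{v}_1 \geq R_1^T \mathbf{m}, \\ & \mathbf{x}^T H = \boldsymbol{\pi}^T, \\ & R_0^s\, \mathbf{x}(s) \leq \xi_s \mathbf{1}, \quad s = 0, \ldots, S, \\ & \mathbf{m}(s)^T \mathbf{1} = 1, \quad s = 0, \ldots, S, \\ & \mathbf{m}, \mathbf{x} \geq 0, \end{aligned} \tag{13}$$

where $\xi_s$ is the maximum average payoff of the MBS at state $s$. The sub-vector strategy $\mathbf{n}^*(s)$ of the CEQ at state $s$ is calculated from $\mathbf{x}^*$ as

$$\mathbf{n}^*(s) = \frac{\mathbf{x}^*(s)}{\mathbf{x}^*(s)^T \mathbf{1}}. \tag{14}$$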
We can define different utility functions for the MBS and the SBSs and apply the same method to obtain the Nash equilibrium. As long as the number of states is finite and the transition and payoff matrices remain unchanged over time, a Nash equilibrium point always exists.
Theorem 2 (Best response strategy for the MBS). Given a stationary strategy $\mathbf{n}$ of the CEQ, there exists a pure stationary strategy $\mathbf{m}$ that is a best response for the MBS. Similarly, for any stationary strategy $\mathbf{m}$ of the MBS, there exists a pure stationary best response $\mathbf{n}$ of the CEQ.

Proof. See Appendix B.
Because, for any mixed strategy of the CEQ, the MBS can find a pure stationary strategy as its best response, we only need to find a Nash equilibrium in which the strategy of the MBS is deterministic. This problem can be converted into a mixed-integer program with $\mathbf{m}$ as a vector of 0s and 1s. We can use a brute-force search to obtain an equilibrium point: for each feasible integer value of $\mathbf{m}$, we insert it into (13) to obtain $\mathbf{n}$. If the objective is zero, then $(\mathbf{m}, \mathbf{n})$ is an equilibrium point. Theorem 2 therefore implies that the optimization problem in (13) can be solved in a finite amount of time.
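The brute-force idea can be sketched for small instances. Rather than inserting each candidate $\mathbf{m}$ into (13) as described above, this sketch swaps in direct best-response checks: for every pure MBS strategy (one power index per state, $|\mathcal{P}|^{S+1}$ candidates) it computes a pure CEQ best response by value iteration on the MDP behind (11), and then verifies that the MBS strategy is a best response in return. Because the MBS does not influence the transitions, its best response is myopic, state by state. The payoff tables `U0[k, j]` and `U1[k, j]` (power index `k`, packets `j`) are hypothetical inputs, and when the CEQ has tied best responses this check can miss some equilibria.

```python
import itertools
import numpy as np

def next_state_pmf(s, Q, pmf, S):
    """q(. | s, Q) as in (9): arrivals that would overflow S are collected in state S."""
    q = np.zeros(S + 1)
    for X, pr in enumerate(pmf):
        q[min(s - Q + X, S)] += pr
    return q

def ceq_best_response(a, U1, pmf, S, beta, iters=500):
    """Pure CEQ best response to a pure MBS strategy a (a[s] = power index in state s).

    Value iteration on v(s) = max_{j<=s} [U1[a[s], j] + beta * sum_s' q(s'|s,j) v(s')].
    """
    q = [np.array([next_state_pmf(s, j, pmf, S) for j in range(s + 1)]) for s in range(S + 1)]
    v = np.zeros(S + 1)
    for _ in range(iters):
        Qval = [U1[a[s], : s + 1] + beta * (q[s] @ v) for s in range(S + 1)]
        v = np.array([Qs.max() for Qs in Qval])
    return [int(np.argmax(Qs)) for Qs in Qval]

def pure_equilibria(U0, U1, pmf, S, beta):
    """Enumerate all |P|^(S+1) pure MBS strategies and keep mutual best-response pairs."""
    U0 = np.asarray(U0, dtype=float)
    U1 = np.asarray(U1, dtype=float)
    found = []
    for a in itertools.product(range(U0.shape[0]), repeat=S + 1):
        b = ceq_best_response(a, U1, pmf, S, beta)
        # The MBS does not affect transitions, so its best response is myopic in each state.
        if all(U0[a[s], b[s]] >= U0[:, b[s]].max() - 1e-12 for s in range(S + 1)):
            found.append((a, b))
    return found
```

The exponential loop over `itertools.product` is exactly what makes this approach impractical for large $S$.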
Notice that there can be multiple Nash equilibria. Therefore, to make the chosen Nash equilibrium point more meaningful, we use the following lemma from [15].

Lemma 1 (Necessary and sufficient conditions for the Nash equilibrium). $\mathbf{m}^*$ and $\mathbf{n}^*$ constitute a pair of Nash equilibrium policies for the MBS and the CEQ if and only if

$$\mathbf{m}^{*T}(R_0 + R_1)\mathbf{x}^* - \boldsymbol{\pi}^T \mathbf{v}_1^* - \mathbf{1}^T \boldsymbol{\xi}^* = 0. \tag{15}$$
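Since $\pi_s$ is the probability that the CEQ starts with energy level $s$ in the battery at the starting time, it follows from (11) that $\boldsymbol{\pi}^T \mathbf{v}_1$ is the average payoff of the CEQ with respect to the strategies $(\mathbf{m}, \mathbf{n})$. Using the lemma above, we change the problem in (13) to a quadratically constrained quadratic program (QCQP), as stated below.

Proposition 1 (Nash equilibria that favor the SBSs). The Nash equilibrium $(\mathbf{m}^*, \mathbf{n}^*)$ that has the best payoff for the CEQ is a solution of the following QCQP problem:

$$\begin{aligned} \max_{\mathbf{m}, \mathbf{x}, \mathbf{v}_1, \boldsymbol{\xi}}\quad & \boldsymbol{\pi}^T \mathbf{v}_1, \\ \text{s.t.}\quad & \mathbf{m}^T (R_0 + R_1)\mathbf{x} - \boldsymbol{\pi}^T \mathbf{v}_1 - \mathbf{1}^T \boldsymbol{\xi} = 0, \\ & \text{all constraints from (13).} \end{aligned} \tag{16}$$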
By solving this QCQP, we obtain a Nash equilibrium that returns the best average payoff for the CEQ. This bias toward the SBSs is crucial because the available energy of the CEQ is limited by the randomness of the environment, and thus the SBSs are more likely to suffer than the MBS. Again, there can be more than one such Nash equilibrium, but all of them must return the same payoff for the CEQ. Because there may be multiple solutions, the CEQ and the MBS need to exchange information so that they agree on the same Nash equilibrium.
Algorithm 1 Nash equilibrium for the stochastic game
1: The MBS and the CEQ build their reward matrices $R_0$ and $R_1$. For each possible pair of energy level and transmit power $(Q, p_0)$, the CEQ solves (10) to obtain a unique tuple $(p_1, p_2, \ldots, p_M)$ and records these results.
2: The MBS and the CEQ calculate their strategies $\mathbf{m}$ and $\mathbf{n}$, respectively, by solving (16).
3: At time $t$, the CEQ sends its current battery level $s$ to the MBS. It also randomly chooses an action $Q$ using the probability vector $\mathbf{n}(s)$.
4: The MBS then randomly picks a power $p_0$ using the distribution $\mathbf{m}(s)$ and sends it back to the CEQ. Based on $p_0$ and $Q$, the CEQ searches its records and retrieves the corresponding tuple $(p_1, \ldots, p_M)$.
5: The CEQ distributes the energies $(p_1 T, p_2 T, \ldots, p_M T)$, respectively, to the $M$ SBSs.
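Steps 3-5 of Algorithm 1 amount to sampling two actions and performing a table lookup in each slot. The sketch below assumes NumPy and a hypothetical data layout (strategies as per-state probability lists, the step-1 records as a dict keyed by `(Q, k)` with `k` an index into the MBS power list). It also draws the slot's energy arrivals to advance the battery with (1), which is not part of Algorithm 1 itself.

```python
import numpy as np

def run_slot(s, m, n, powers, table, T, pmf, S, rng):
    """Steps 3-5 of Algorithm 1 for one slot (hypothetical data layout).

    m[s]          : pmf over the indices of `powers` (MBS strategy in state s)
    n[s]          : pmf over Q in {0, ..., s} (CEQ strategy in state s)
    table[(Q, k)] : SBS power tuple (p_1, ..., p_M) recorded in step 1
    """
    Q = int(rng.choice(s + 1, p=n[s]))                 # step 3: CEQ draws Q from n(s)
    k = int(rng.choice(len(powers), p=m[s]))           # step 4: MBS draws p0 from m(s) and reports it
    p_sbs = np.asarray(table[(Q, k)], dtype=float)     # step 4: CEQ retrieves (p_1, ..., p_M)
    energies = p_sbs * T                               # step 5: energy handed to each SBS
    lam = int(rng.choice(len(pmf), p=pmf))             # arrivals in this slot (outside Algorithm 1)
    return Q, powers[k], energies, min(s - Q + lam, S) # next battery level via (1)
```

Because the lookup table is built offline in step 1, the per-slot work reduces to two random draws and one dictionary access.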
From Theorem 2, we know that both the MBS and the CEQ have pure stationary best responses. Recall that with a pure strategy, the action of each player is a function of the state. Thus, if we can obtain an equilibrium in pure stationary strategies, the CEQ can predict which transmit power $p_0$ the MBS will use based on the current state without exchanging information with the MBS, and vice versa.
E. Implementation of the Discrete Stochastic Game
For the discrete stochastic control game with the CEQ, each SBS first needs to send its location and the average fading channel information $\mathbb{E}[|h|^2]$ of its user to the CEQ and the MBS, so that complete channel state information is known at both the MBS and the CEQ. Since we only use average channel gains, the CEQ and the MBS only need to re-calculate the Nash equilibrium strategies when the locations of the SBSs change (e.g., some SBSs are switched off and some are turned on), when the average channel fading gain changes, or when the distribution of the energy arrival $\lambda(t)$ at the CEQ changes. Thanks to the centralized design, the SBSs and the MBS only need to send information about their channel gains to the CEQ to calculate the Nash equilibrium at the beginning. Later, in each time slot, only the CEQ and the MBS need to exchange information; therefore, the communication overhead is small.
IV. MEAN FIELD GAME (MFG) FOR LARGE NUMBER OF SMALL CELLS
The main problem of the two-player single-controller stochastic game is the curse of dimensionality. The time complexity of Algorithm 1 increases exponentially with the number of states, i.e., with the maximum battery size $S$; for example, a brute-force search over pure MBS strategies examines $|\mathcal{P}|^{S+1}$ candidates. Note that $R_0$ and $R_1$ have dimensions $|\mathcal{P}|(S+1) \times (S+1)(S+2)/2$, so their size grows on the order of $S^3$. Moreover, unlike in other optimization problems, we are unable to relax the QCQP in (16), because Theorem 1 states that the Nash equilibrium must be the global solution of the quadratic program (13). To tackle these problems, we extend the stochastic game model to an MFG model for a very large number of players.
The main idea of an MFG is the assumption of similarity, i.e., all players are identical and follow the same strategy; they can only be differentiated by their states. If the number of players is very large, we can assume that the effect of a specific player on the other players is negligible. Therefore, in an MFG, a player does not track the other players' states but only acts according to a mean field $m(t, s)$, which usually is the probability distribution of state $s$ at time instant $t$ [18]. In our energy harvesting game, the state is the battery level $E$, and the mean field $m(t, E)$ is the probability distribution of energy at time $t$ in the area under consideration. When the number of players $M$ is very large, we can assume that $m(t, E)$ is a smooth continuous distribution function. We will show that the average interference at an SBS can be expressed as a function of the mean field $m$.
All the symbols that are used in this section are listed in Table II.
A. Formulation of the MFG
Denote by $E(v)$ the available energy in the battery of an SBS at time $v$. Given the transmission strategies of the other SBSs, each SBS tries to optimize its long-run generic utility value function by solving the following optimal control problem:

$$\min_{p}\ U(0, E(0)) = \mathbb{E}\left[\int_0^T \big(p(v, E(v))\, g - \gamma\,(I(v) + N_0)\big)^2\, dv\right], \tag{17}$$

$$\text{s.t.}\quad dE(v) = -p(v, E(v))\, dv + \sigma\, dW_v, \tag{18}$$

$$E(v) \geq 0, \quad p(v) \geq 0, \tag{19}$$

where $I(v)$ is the generic interference at a user served by an SBS at time $v$, $g$ is the channel gain between a generic SBS and its user, and $\gamma$ is the target SINR. The mean field $m(v, E)$ is the probability distribution of energy $E$ in the area at time $v$. Using $M$ as the number of SBSs in a macrocell and assuming that the other SBSs have the same average channel gain $\bar{g}$ to the user of the current generic SBS, the average interference $I(v)$ at the user served by a generic SBS can be expressed as $I(v) = M\bar{g}\, \bar{p}(v)$, where

$$\bar{p}(v) = \int_0^{\infty} p(v, E)\, m(v, E)\, dE$$

can be understood as the average transmit power of another generic SBS. Since the MFG assumes similarity, $\bar{p}(v)$ can be considered as the average transmit power of a generic SBS at time $v$. To make the notation simpler, we denote $\eta = \bar{g}M$.

TABLE II: List of symbols used for the MFG model
$\bar{g}$: Average channel gain from a generic SBS to another user
$E$: (Continuous) battery level of an SBS
$m(t, E)$: Probability distribution of energy $E$ at time $t$
$R$: Energy coefficient, $e^R = E$
$m(t, R)$: Probability distribution of energy coefficient $R$ at time $t$
$W_t$: Wiener process at time $t$
$\bar{p}(t)$: Average transmit power of an SBS at time $t$
$p(t, E)$, $p(t, R)$: Transmit power at a generic SBS as a function of $E$ (or $R$)
$\sigma$: Intensity of energy arrival or loss
Thanks to similarity, all the SBSs have the same set of equations and constraints, so the optimal control problem for the $M$ SBSs reduces to finding the optimal policy for only one generic SBS. Mathematically, if an SBS has infinite available energy, i.e., $E(0) = \infty$, it will act as an MBS. However, for simplicity, we assume that only the SBSs are involved in the game and that the interference from the MBS is constant and included in the noise $N_0$, as in [19]. Apart from that, the system model and the optimization problem here are similar to those of the discrete stochastic game model.
Assuming that the SBSs are uniformly distributed within the macrocell of radius $r$ centered at the MBS, the average interference from the MBS to a generic user served by an SBS can be easily derived using a method similar to that described in Remark 1. For the MFG model, the energy level $E$ is a continuous non-negative variable. The equality in (18) describes the evolution of the battery, where $\sigma$ is a constant proportional to the maximum energy arrival during a time interval. $W_v$ is a Wiener process, thus $dW_v = \xi_v\sqrt{dv}$, where $\xi_v$ is a Gaussian random variable with mean zero and variance 1. This model of battery-energy evolution was considered in [20]. The main disadvantage of the MFG model compared to the discrete stochastic game model is the inflexibility of the energy arrival model: the random arrival of energy is modeled as noise, so it can be either positive or negative. We can consider the negative part as battery leakage and internal energy consumption. The final inequalities are the causality constraints: the battery state $E(v)$ and the transmit power must always be non-negative. To guarantee this positivity, we follow [21] and change the energy variable to $E(v) = e^{R(v)}$. This conversion is a bijection from $E(v) \in (0,\infty)$ to $R(v)$, thus we can write $m(v,E) = m(v,R)$ and $p(v,E) = p(v,R)$, where $\infty > R > -\infty$. The new optimal control problem can be rewritten as
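$$\min_{p(\cdot)}\; U(0,R(0)) = \mathbb{E}\left[\int_0^T \big(p(v,R(v))\,g - \alpha\bar p(v) - \gamma N_0\big)^2\,dv\right], \tag{20}$$
$$\text{s.t.}\quad dR(v) = -p(v,R(v))\,e^{-R(v)}\,dv + \sigma e^{-R(v)}\,dW_v, \tag{21}$$
$$p(v) \ge 0. \tag{22}$$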
To obtain the power control policy $p$, we first derive the forward-backward differential equations from the above problem. Then, we apply the finite difference method to numerically solve these equations.
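B. Forward-Backward Equations of MFG
Assume that the optimal control above starts at time $t$ with $T \ge t \ge 0$. We then obtain the Bellman function $U(t,R)$ as
$$U(t,R(t)) = \min_{p}\,\mathbb{E}\left[\int_t^T \big(p(v,R(v))\,g - \alpha\bar p(v) - \gamma N_0\big)^2\,dv\right]. \tag{23}$$
From this function, at time $t$, we obtain the following Hamilton-Jacobi-Bellman (HJB) equation [21]:
$$\min_{p\ge 0}\Big\{\big(p(t,R)\,g - \alpha\bar p(t) - \gamma N_0\big)^2 - p(t,R)\,e^{-R}\partial_R U(t,R)\Big\} + \partial_t U + \frac{\sigma^2}{2}e^{-2R}\partial^2_{RR}U = 0, \tag{24}$$
where $\bar p(t) = \int e^{R}p(t,R)\,m(t,R)\,dR$ is the average transmit power at a generic SBS. The Hamiltonian $\min_{p\ge 0}\{(p(t,R)g - \alpha\bar p(t) - \gamma N_0)^2 - p(t,R)e^{-R}\partial_R U(t,R)\}$ follows from Bellman's principle of optimality. Applying the first-order necessary condition, we obtain the optimal power control as follows:
$$p^*(t,R) = \left[\frac{\alpha\bar p(t) + \gamma N_0}{g} + \frac{e^{-R}\partial_R U}{2g^2}\right]^+. \tag{25}$$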
Remark 3. The Bellman function $U$, if it exists, is a non-increasing function of time and energy. Therefore, we have $\partial_R U \le 0$ and $\partial_t U \le 0$.
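From (25), given the current (scaled) interference-plus-noise level $\alpha\bar p(t) + \gamma N_0$ at a user, the corresponding SBS transmits less power depending on the future prospect $e^{-R}\partial_R U/(2g^2)$, which is non-positive by Remark 3. If the future prospect is too small, i.e., $e^{-R}\partial_R U/(2g^2) < -(\alpha\bar p(t)+\gamma N_0)/g$, the SBS stops transmission to save energy.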
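Substituting $p^*$ back into the HJB equation, we have
$$\partial_t U + \frac{\sigma^2}{2}e^{-2R}\partial^2_{RR}U + \big(\alpha\bar p(t) + \gamma N_0\big)^2 - \left(\left[\alpha\bar p(t) + \gamma N_0 + \frac{e^{-R}\partial_R U}{2g}\right]^+\right)^2 = 0, \tag{26}$$
which, since $g\,p^*(t,R) = [\alpha\bar p(t) + \gamma N_0 + e^{-R}\partial_R U/(2g)]^+$, has the simpler form
$$\partial_t U + \frac{\sigma^2}{2}e^{-2R}\partial^2_{RR}U = \big(p^* g\big)^2 - \big(\alpha\bar p(t) + \gamma N_0\big)^2. \tag{27}$$
Also, from (21), at time $t$, we have the Fokker-Planck equation [18]:
$$\partial_t m(t,R) = \partial_R\big(p^* e^{-R} m\big) + \frac{\sigma^2}{2}\partial^2_{RR}\big(m\,e^{-2R}\big), \tag{28}$$
where $m(t,R)$ is the probability density function of $R$ at time $t$. Combining all this information, we obtain the following proposition.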
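Proposition 2. The value function and the mean field $(U,m)$ of the MFG defined in (20) are the solution of the following partial differential equations:
$$\partial_t U + \frac{\sigma^2}{2}e^{-2R}\partial^2_{RR}U = \big(p^* g\big)^2 - \big(\alpha\bar p(t) + \gamma N_0\big)^2, \tag{29}$$
$$p^*(t,R) = \left[\frac{\alpha\bar p(t) + \gamma N_0}{g} + \frac{e^{-R}\partial_R U}{2g^2}\right]^+, \tag{30}$$
$$\partial_t m(t,R) = \partial_R\big(p^* e^{-R} m\big) + \frac{\sigma^2}{2}\partial^2_{RR}\big(e^{-2R} m\big), \tag{31}$$
$$\bar p(t) = \int e^{R}p^*(t,R)\,m(t,R)\,dR, \tag{32}$$
$$\int m(t,R)\,dR = 1,\quad \text{where } m(t,R) \ge 0. \tag{33}$$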
Lemma 2. The average transmit power $\bar p(t)$ of a generic SBS equals the negative of the time derivative of the average available energy in the battery, and can be calculated as
$$\bar p(t) = -\frac{d}{dt}\int e^{2R}\,m(t,R)\,dR. \tag{34}$$
Proof. See Appendix C.
Since $\bar p$ is always non-negative, the average energy in an SBS's battery is a non-increasing function of time. That means the distribution $m$ shifts to the left as $t$ increases. This is a consequence of using the Wiener process in (18): since $dW_t$ has a normal distribution with mean zero, on average the energy harvested equals the energy leakage. Therefore, for the entire system, the total energy decreases over time.
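Lemma 2 and the leftward drift of $m$ can be illustrated with a small Monte Carlo experiment. The Python sketch below integrates the battery model (18), $dE = -p\,dt + \sigma\,dW$, with the Euler-Maruyama scheme for a hypothetical state-dependent policy $p(E) = \max(E,0)/\tau$ in toy units; the policy, $\tau$, $\sigma$, and the horizon are assumptions chosen only for illustration. Over the run, the drop in the sample-mean energy should match the time integral of the sample-mean power $\bar p(t)$, up to the Monte Carlo noise contributed by $\sigma W_t$.

```python
import math
import random


def simulate_battery(E0, policy, sigma, t_end, dt, n_paths, seed=0):
    """Euler-Maruyama simulation of dE = -p(E) dt + sigma dW (battery model (18)).

    Returns the sample-mean energy at every step and the integral of the
    sample-mean power p_bar(t) over [0, t_end].
    """
    rng = random.Random(seed)
    energies = [E0] * n_paths
    mean_energy = [E0]
    integrated_power = 0.0
    for _ in range(int(round(t_end / dt))):
        powers = [policy(E) for E in energies]
        integrated_power += sum(powers) / n_paths * dt   # accumulate p_bar(t) dt
        energies = [E - p * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
                    for E, p in zip(energies, powers)]
        mean_energy.append(sum(energies) / n_paths)
    return mean_energy, integrated_power


# Toy units: E0 = 10, tau = 4, sigma = 0.5 (all hypothetical).
means, used = simulate_battery(10.0, lambda E: max(E, 0.0) / 4.0, 0.5, 2.0, 0.02, 4000)
print(means[0], means[-1], means[-1] - means[0], -used)
```

Nothing in the raw scheme prevents individual paths from crossing $E = 0$ when $\sigma$ is large; this is the positivity issue that motivates the change of variable $E = e^R$.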
Lemma 3. If $(U_1,m_1)$ and $(U_2,m_2)$ are two solutions of Proposition 2 and $m_1 = m_2$, then $U_1 = U_2$.
Proof. First, from (21) we derive the Fokker-Planck equations:
$$\partial_t m_1(t,R) = \partial_R\big(p_1 e^{-R} m_1\big) + \frac{\sigma^2}{2}\partial^2_{RR}\big(e^{-2R} m_1\big),$$
$$\partial_t m_2(t,R) = \partial_R\big(p_2 e^{-R} m_2\big) + \frac{\sigma^2}{2}\partial^2_{RR}\big(e^{-2R} m_2\big).$$
Since $m_1 = m_2 = m$, subtracting the first equation from the second gives $\partial_R\big((p_1 - p_2)e^{-R}m\big) = 0$. This means $(p_1 - p_2)e^{-R}m$ is a function of $t$ only. Denoting $f(t) = (p_1 - p_2)e^{-R}m$, we have $(p_1 - p_2)m = f(t)e^{R}$. From Lemma 2, $\bar p$ is a function of $m$ only, thus $\bar p_1(t) = \bar p_2(t)$. Since $\bar p(t) = \int e^{R}p\,m\,dR$, we have
$$\int e^{R}p_1 m\,dR = \int e^{R}p_2 m\,dR \;\Leftrightarrow\; \int e^{R}(p_1 - p_2)\,m\,dR = 0,\quad \forall t. \tag{35}$$
Substituting $(p_1 - p_2)m = f(t)e^{R}$ yields $f(t)\int e^{2R}dR = 0$ for all $t$. Since $e^{2R} > 0$, this forces $f(t) = 0$, i.e., $p_1 = p_2$ wherever $m > 0$.
Note that $U$ is a function of $p$ and $\bar p$. Since $p_1 = p_2$ and $\bar p_1 = \bar p_2$, it follows that $U_1 = U_2$. This lemma confirms that an SBS acts only against the mean field $m$. Thus, $m$ determines the evolution of the system: two systems with the same mean field behave similarly.
C. Solving MFG Using Finite Difference Method (FDM)
To obtain $U$ and $m$, we use the finite difference method (FDM) as in [21] and [27]. We discretize time and the energy coefficient $R$ into the grids $[0,\dots,T_{\max}\Delta t]$ and $[-R_{\max}\Delta R,\dots,R_{\max}\Delta R]$, with step sizes $\Delta t$ and $\Delta R$, respectively. Then $U$, $m$, and $p$ become matrices of size $T_{\max}\times(2R_{\max}+1)$. To keep the notation simple, we use $t$ and $R$ as the indices for time and energy coefficient in these matrices, with $t\in\{0,\dots,T_{\max}\}$ and $R\in\{-R_{\max},\dots,R_{\max}\}$. For example, $m(t,R)$ is the probability distribution of energy $e^{R\Delta R}$ at time $t\Delta t$. Using the FDM, we replace $\partial_R U$, $\partial_t U$, and $\partial^2_{RR}U$ with the following discrete formulas [28]:
$$\partial_t U(t,R) = \frac{U(t+1,R) - U(t,R)}{\Delta t}, \tag{36}$$
$$\partial_R U(t,R) = \frac{U(t,R+1) - U(t,R-1)}{2\Delta R}, \tag{37}$$
$$\partial^2_{RR}U(t,R) = \frac{U(t,R+1) - 2U(t,R) + U(t,R-1)}{(\Delta R)^2}. \tag{38}$$
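Using them in (29), after some simple algebraic steps, we have
$$U(t-1,R) = U(t,R) + \frac{e^{-2R\Delta R}\,\sigma^2\Delta t}{2(\Delta R)^2}A_1 - \Delta t\,B_1, \tag{39}$$
where $A_1 = U(t,R+1) - 2U(t,R) + U(t,R-1)$ and $B_1 = \big(p(t,R)\,g\big)^2 - \big(\alpha\bar p(t) + \gamma N_0\big)^2$. Similarly, discretizing the Fokker-Planck equation (31), we have
$$m(t,R) = \frac{\Delta t}{2\Delta R}A_2 + \frac{\sigma^2\Delta t}{2(\Delta R)^2}B_2 + m(t-1,R), \tag{40}$$
where
$$A_2 = e^{-(R+1)\Delta R}p(t-1,R+1)\,m(t-1,R+1) - e^{-(R-1)\Delta R}p(t-1,R-1)\,m(t-1,R-1), \tag{41}$$
$$B_2 = e^{-2(R+1)\Delta R}m(t-1,R+1) - 2e^{-2R\Delta R}m(t-1,R) + e^{-2(R-1)\Delta R}m(t-1,R-1). \tag{42}$$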
To obtain $U$, $m$, $p$, and $\bar p$ using Proposition 2, we need some boundary conditions. First, to find $m$, we assume that no SBS has a battery level equal to or larger than $e^{R_{\max}\Delta R}$, so that $m(t,R_{\max}) = 0,\ \forall t$. This holds if we assume that $e^{(R_{\max}-1)\Delta R}$ is the largest battery size of an SBS. Also, for $R = -R_{\max}$, the basic property of a probability distribution gives
$$\sum_{R=-R_{\max}}^{R_{\max}} m(t,R)\,\Delta R = 1 \;\Rightarrow\; m(t,-R_{\max}) = \frac{1}{\Delta R} - \sum_{R=-R_{\max}+1}^{R_{\max}} m(t,R). \tag{43}$$
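One explicit time step of the discretized Fokker-Planck equation (40)-(42), together with $m(t,R_{\max}) = 0$ and the normalization (43), can be sketched in Python as follows. The list indexing `j = R + R_max` and the uniform initial density in the usage example are assumptions for illustration. Because (43) assigns whatever mass is missing to the lowest-energy cell, the total mass $\sum_R m(t,R)\Delta R$ stays at 1 by construction; this bookkeeping does not by itself guarantee $m \ge 0$.

```python
import math


def fokker_planck_step(m_prev, p_prev, dt, dR, sigma, R_max):
    """One explicit step of (40)-(42), then m(t, R_max) = 0 and the normalization (43).

    m_prev, p_prev: lists indexed by j = R + R_max for R in {-R_max, ..., R_max}.
    """
    n = 2 * R_max + 1
    inv = [math.exp(-(j - R_max) * dR) for j in range(n)]   # e^{-R dR}
    m_new = [0.0] * n
    for j in range(1, n - 1):
        A2 = (inv[j + 1] * p_prev[j + 1] * m_prev[j + 1]
              - inv[j - 1] * p_prev[j - 1] * m_prev[j - 1])
        B2 = (inv[j + 1] ** 2 * m_prev[j + 1] - 2 * inv[j] ** 2 * m_prev[j]
              + inv[j - 1] ** 2 * m_prev[j - 1])
        m_new[j] = dt / (2 * dR) * A2 + sigma ** 2 * dt / (2 * dR ** 2) * B2 + m_prev[j]
    m_new[n - 1] = 0.0                        # no SBS at or above e^{R_max dR}
    m_new[0] = 1.0 / dR - sum(m_new[1:])      # (43): sum_R m(t, R) dR = 1
    return m_new


# Hypothetical example: uniform mass on 11 cells, constant power 0.01, sigma = 0.1.
R_MAX, DR = 10, 0.1
m0 = [1.0 / (11 * DR) if 5 <= j <= 15 else 0.0 for j in range(2 * R_MAX + 1)]
m1 = fokker_planck_step(m0, [0.01] * (2 * R_MAX + 1), 1e-3, DR, 0.1, R_MAX)
print(sum(m1) * DR, m1[-1])
```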
Next, to find $U$, we again need some boundary conditions. Notice that $U(T_{\max},R) = 0$ for all $R$. We further assume the following.
Intuitively, if the battery of an SBS is full, i.e., $R = R_{\max}$, this SBS should transmit something (because the thermal noise $N_0 > 0$), or equivalently, $p^*(t,R_{\max}) > 0$. That means
$$p^*(t,R_{\max}) = \frac{\alpha\bar p(t) + \gamma N_0}{g} + \frac{e^{-R_{\max}\Delta R}\,\partial_R U(t,R_{\max})}{2g^2}. \tag{44}$$
Therefore, if we know $U(t,R_{\max}-1)$ and $p^*$, we can calculate $U(t,R_{\max})$.
Similarly, when the available energy is (approximately) 0, i.e., $R = -R_{\max}$, an SBS should stop transmission. Therefore, we can assume
$$\frac{\alpha\bar p(t) + \gamma N_0}{g} + \frac{e^{R_{\max}\Delta R}\,\partial_R U(t,-R_{\max})}{2g^2} = 0. \tag{45}$$
Again, if we know $U(t,-R_{\max}+1)$, we can calculate the boundary value $U(t,-R_{\max})$.
During simulations, when the density is very high, we sometimes obtain very large (unrealistic) values of transmit power. Therefore, we impose an extra upper-limit constraint. In this paper, we use $E(t) \ge p(t,E(t))\,T$, or $e^{R(t)\Delta R} \ge p(t,R(t))\,T$, where $T$ is the duration of one time slot. That is, the transmit power during one time step $\Delta t$ is limited to the maximum power that the stored energy can sustain over one time interval $T$.
Based on the above considerations, we develop an iterative algorithm (Algorithm 2), detailed as follows.
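The iteration can be prototyped compactly. The Python sketch below follows the structure of Algorithm 2 (forward Fokker-Planck sweep, discrete (32), backward HJB sweep, policy update via (30), relaxation, and the battery cap) on a deliberately small grid with toy dimensionless parameters; all default values are assumptions for illustration, not the simulation settings of Section V. Further choices are ours: the boundary conditions (44)-(45) are closed with one-sided differences for $\partial_R U$, the terminal condition is stored in the last row `U[T_max - 1] = 0`, and the initialization $p(t,0) = 0$ listed in Algorithm 2 is not reproduced. The explicit scheme is only stable for sufficiently small $\Delta t$ relative to $(\Delta R)^2$ and the factor $e^{-2R\Delta R}$, so step sizes should be rechecked before enlarging the grid.

```python
import math


def solve_mfg_fdm(m0, R_max=5, dR=0.2, T_max=50, dt=0.01, g=1.0, alpha=0.5,
                  gamma=1.0, N0=0.1, sigma=0.1, T_slot=1.0, iters=20, a=0.5):
    """Sketch of Algorithm 2 on lists indexed [t][j], with R = j - R_max.

    m0 must have length 2 R_max + 1, m0[-1] = 0 and sum(m0) * dR = 1.
    Returns (U, m, p, p_bar).
    """
    n = 2 * R_max + 1
    E = [math.exp((j - R_max) * dR) for j in range(n)]          # e^{R dR}
    inv = [1.0 / x for x in E]                                   # e^{-R dR}
    cap = [x / T_slot for x in E]                                # battery cap e^{R dR} / T
    p = [[min(E[j], cap[j]) for j in range(n)] for _ in range(T_max)]  # guess p = e^{R dR}
    cost = lambda pb: alpha * pb + gamma * N0                    # alpha p_bar + gamma N0
    for _ in range(iters):
        # Forward sweep: Fokker-Planck (40)-(42) with m(t, R_max) = 0 and (43).
        m = [list(m0)]
        for t in range(1, T_max):
            mp, pp = m[t - 1], p[t - 1]
            row = [0.0] * n
            for j in range(1, n - 1):
                A2 = inv[j + 1] * pp[j + 1] * mp[j + 1] - inv[j - 1] * pp[j - 1] * mp[j - 1]
                B2 = inv[j + 1] ** 2 * mp[j + 1] - 2 * inv[j] ** 2 * mp[j] + inv[j - 1] ** 2 * mp[j - 1]
                row[j] = mp[j] + dt / (2 * dR) * A2 + sigma ** 2 * dt / (2 * dR ** 2) * B2
            row[0] = 1.0 / dR - sum(row[1:])
            m.append(row)
        # Discrete (32): p_bar(t) = sum_R e^{R dR} p(t, R) m(t, R) dR.
        p_bar = [dR * sum(E[j] * p[t][j] * m[t][j] for j in range(n)) for t in range(T_max)]
        # Backward sweep: HJB (39); the last row plays the role of U(T_max, .) = 0.
        U = [[0.0] * n for _ in range(T_max)]
        for t in range(T_max - 1, 0, -1):
            c = cost(p_bar[t])
            for j in range(1, n - 1):
                A1 = U[t][j + 1] - 2 * U[t][j] + U[t][j - 1]
                B1 = (p[t][j] * g) ** 2 - c ** 2
                U[t - 1][j] = U[t][j] + inv[j] ** 2 * sigma ** 2 * dt / (2 * dR ** 2) * A1 - dt * B1
            # Boundaries (44) and (45), closed with one-sided differences (our assumption).
            c_prev = cost(p_bar[t - 1])
            U[t - 1][n - 1] = U[t - 1][n - 2] + dR * 2 * g * g * (p[t - 1][n - 1] - c_prev / g) * E[n - 1]
            U[t - 1][0] = U[t - 1][1] + dR * 2 * g * c_prev * E[0]
        # Policy update (30), relaxed with weights a and 1 - a, then capped by the battery.
        for t in range(T_max):
            c = cost(p_bar[t])
            for j in range(n):
                lo, hi = max(j - 1, 0), min(j + 1, n - 1)
                dU = (U[t][hi] - U[t][lo]) / ((hi - lo) * dR)
                p_new = max(0.0, c / g + inv[j] * dU / (2 * g * g))
                p[t][j] = min(a * p[t][j] + (1 - a) * p_new, cap[j])
    return U, m, p, p_bar


R_MAX, DR = 5, 0.2
m0 = [1.0 if 3 <= j <= 7 else 0.0 for j in range(2 * R_MAX + 1)]   # uniform, total mass 1
U, m, p, p_bar = solve_mfg_fdm(m0, R_max=R_MAX, dR=DR)
```

The relaxation weight `a = 0.5` mirrors the damped update $p = a\,p + b\,p_{new}$ with $a + b = 1$; smaller step changes per iteration make oscillation between the forward and backward sweeps less likely.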
D. Implementation of MFG
For the MFG, we do not need the location information of each SBS. However, we need information about the average channel gains $g$ and $\bar g$, the number of SBSs $M$ in one macrocell, and the initial distribution $m_0$ of the energy of SBSs in the macrocell. Therefore, some central system should measure this information, solve the differential equations, and then broadcast the power policy $p$ to all the SBSs. This is more efficient than broadcasting all the information to all SBSs and letting each of them solve the differential equations. Again, the central system only needs to recalculate and broadcast a new power policy to all SBSs if $g$, $\bar g$, or $M$ changes.
V. SIMULATION RESULTS AND DISCUSSIONS
A. Single-Controller Stochastic Game
In this section, we quantify the efficacy of the developed stochastic policy in comparison to a greedy power control policy. The stochastic policy is obtained from the QCQP problem. In the greedy policy, on the other hand, we follow a hierarchical method. First, the MBS chooses its transmit power; then each SBS tries to transmit with the power such that the SINR at its user is nearest to its target value, ignoring the co-tier interference from other SBSs. Next, the MBS records its current interference and then chooses a new transmit power to achieve the target SINR at its user, and so on.

Algorithm 2 Iterative algorithm for FDM
Initialize input:
  Set up $T_{\max}\times(2R_{\max}+1)$ matrices $U$, $m$, $p$, and a $T_{\max}\times 1$ vector $\bar p$.
  Guess arbitrary initial values for the power $p$, e.g., $p(t,R) = e^{R\Delta R}$.
  Initialize $i = 1$, $U(T_{\max},\cdot) = 0$, $m(0,\cdot) = m_0(\cdot)$, $m(t,R_{\max}) = 0$, and $p(t,0) = 0$.
  Initialize $\Delta R$ and $\Delta t$ as the step sizes of energy and time with $(\Delta R)^2 > \Delta t$.
  Set $MAX$ as the number of iterations.
Solve PDEs with FDM:
  while $i < MAX$ do
    Solve the Fokker-Planck equation to get $m$ using (40) and (43) with given $p$ and $m_0$.
    Update $\bar p(t)$ for $T_{\max} \ge t \ge 0$ using the discrete form of (32).
    Calculate $U$ for all $t < T_{\max}$ using (39), (44), and (45) with $p$ and $\bar p$.
    Calculate the new transmit power $p_{new}$ using (30).
    Update $p = a\,p + b\,p_{new}$ with $a + b = 1$ (relaxed update).
    for $R \in \{-R_{\max},\dots,R_{\max}\}$: if $p(t,R) > e^{R\Delta R}/T$ then $p(t,R) = e^{R\Delta R}/T$.
    $i \leftarrow i + 1$.
  end while
Loop over $T_{\max}$ time slots:
  At time slot $t$, an SBS with battery energy $e^{R\Delta R}$ transmits with power $p(t,R)$.
To solve the QCQP in (16), we use the fmincon function from MATLAB. In the simulations, the CEQ has a maximum battery size of $S = 21$ and the volume of one energy packet is $K = 25$ µJ. The duration of one time interval is $T = 5$ ms and the thermal noise is $N_0 = 10^{-8}$ W. The MBS has two transmit power levels, 10 W and 20 W, and the SINR outage threshold is set to 5. The energy arrival at each SBS follows a Poisson distribution with unit rate. The volume of each energy packet arriving at the CEQ is $C$ times larger than that of the energy packet collected by each SBS. Thus, each packet at the CEQ carries $CK$ µJ of energy. This models a CEQ that uses a more efficient energy harvesting method than each individual SBS (which harvests on its own in the greedy method). However, as the maximum battery size of the CEQ is $S$, its total available energy is always limited by the product $SCK$ µJ, regardless of $M$. In both cases, each SBS can receive up to 150 µJ of energy from either the CEQ or the environment. In the beginning, the CEQ is assumed to have a full battery.
From Fig. 2, it can be seen that when the number of SBSs is smaller than some value, the stochastic method achieves better results. This is because the CEQ can share energy among the SBSs, and also the QCQP in (16) gives a Nash equilibrium that favors the CEQ. However, beyond some point, the greedy method provides better results. This is not surprising since the CEQ can only store at most $SKC$ µJ of energy. Therefore, when the number of SBSs increases, the power allocated per SBS by the CEQ reduces to zero, while the greedy method allows each SBS to harvest up to 150 µJ no matter how large $M$ is. This means the greedy method provides better performance than the CEQ-based method when $M$ is large.

[Fig. 2 plot: outage probability vs. number of SBSs; curves: Greedy Method, Stochastic Method]
Fig. 2: Outage probability of a small cell user with different number of SBSs when S = 21 states, C = 40, $\gamma_1 = 0.002$, $\gamma_0 = 10$.
[Fig. 3 plot: outage probability vs. target SINR of SBSs; curves: Stochastic Method, Greedy Method]
Fig. 3: Outage probability of a small cell user with different target SINR when S = 21 states, C = 40, M = 30 SBSs, $\gamma_0 = 10$.
Following Fig. 3, by increasing the target SINR $\gamma_1$, we can reduce the outage probability of a user served by an SBS. This is understandable since the average SINR increases to approach the higher target, which reduces the outage probability. However, for both the greedy and stochastic methods, the curve becomes nearly flat when the target SINR is larger than some value. This is because, to increase the average SINR, the SBSs need to transmit with higher power to overcome cross-tier interference. However, since the battery capacity is limited for the CEQ and for each of the SBSs, a higher transmit power means a higher consumption of harvested energy, which can cause an energy shortage later. Thus, beyond some point, increasing the target SINR does not bring any benefit.
Fig. 4 shows the outage probability when the energy packet (quantum) volume is increased by choosing a higher multiplier $C$ for the CEQ. With a higher $C$, i.e., a more effective energy harvesting method at the CEQ, we achieve better performance. The greedy method does not use the CEQ, so its outage probability remains unchanged. Note that, since the battery size of each SBS is limited to 150 µJ, beyond some point a higher $C$ does not improve the outage probability.
[Fig. 4 plot: outage probability vs. volume of one energy packet; curves: Greedy Method, Stochastic Method]
Fig. 4: Outage probability of a small cell user with different quanta volume when S = 21 states, M = 30 SBSs, $\gamma_1 = 0.002$, $\gamma_0 = 10$.
[Fig. 5 plot: outage probability vs. target SINR of SBSs; curves: Stochastic method, Greedy method]
Fig. 5: Outage probability of a macrocell user when S = 21 states, M = 80 SBSs, C = 50, $\gamma_0 = 10$.
Fig. 5 shows the outage probability of the macrocell user when $M = 80$ and $C = 50$. The stochastic method gives better results in this case since the SBSs are more rational in choosing their transmit powers. Also, unlike in the greedy method, the CEQ has a fixed-size battery, so when $M$ is large, the average amount of energy distributed to an SBS is small, which in turn limits the cross-tier interference to the macrocell user. With the greedy method, the SBSs use higher transmit power to compete against the MBS; therefore, they create larger cross-tier interference, which in turn increases the outage probability of the macrocell user.
In summary, the centralized method using a CEQ can provide better performance in terms of outage probability for both the MBS and the SBSs. However, since the CEQ has a fixed battery size, the centralized method performs worse when it needs to support a large number of SBSs. To mitigate this limitation, we can adjust other parameters: change the target SINR, increase the multiplier $C$, or increase the battery size of each SBS.
B. Mean Field Game
We assume that the transmit power at the MBS is fixed at 10 W, which results in a constant interference at the user served by a generic SBS. The radius of the macrocell is $r = 1000$ m, so we have a constant cross-tier interference, absorbed into the noise term, of $N_0 = 10^{-5}$ W.
The target SINR is $\gamma = 0.002$, and we assume that $g = \bar g = 0.001$. We discretize the energy coefficient $R$ into 80 intervals, i.e., $R_{\max} = 40$, and time into $T_{\max} = 1000$ intervals. Similar to the discrete stochastic case, each SBS can hold up to 150 µJ in its battery, so the maximum transmit power is 30 mW (150 µJ over one 5 ms slot). We impose the threshold such that an SBS does not transmit at $R = -R_{\max} = -40$, i.e., $E = 0.6$ µJ. The intensity of energy loss/energy harvesting is $\sigma = 1$.
Fig. 6: Energy distribution over time when M = 400 SBSs/cell.
[Fig. 7 plot: probability distribution vs. energy (Joules); curves: t = 0, t = 1 s, t = 2.5 s, t = 5 s]
Fig. 7: Energy distribution over time when M = 400 SBSs/cell.
For $M = 400$ SBSs/cell, we have $g = 0.001 > \alpha = \gamma\bar g M = 0.0008$, so a generic SBS does not need to use a large amount of power to reach the target SINR. Notice that $\bar p$ is the average transmit power of a generic SBS. Therefore, if a generic SBS (and hence, by similarity, every SBS) reduces its power, the cost term $\alpha\bar p$ also decreases. Thus, the difference between the received power $pg$ and the cost term becomes smaller, which is desirable. It therefore makes sense that a generic SBS tries to reduce its power as much as possible in this case. The power cannot be zero, though, because $N_0 > 0$. Moreover, from Fig. 8 and Fig. 9, we see that at the beginning, the SBS with higher energy (i.e., 100 µJ) transmits with high power and gradually reduces it to some value, while the SBSs with smaller batteries gradually increase their power. Since the transmit power is small, Fig. 6 and Fig. 7 show that the energy distribution shifts to the left slowly.
On the other hand, when $M = 500$ SBSs/cell, we have $g = \alpha = 0.001$. In this case, the effect is more complicated because reducing the transmit power may not reduce the gap between the received power $pg$ and the cost term $\alpha\bar p + \gamma N_0$.
Fig. 8: Transmit power to serve a generic user using MFG when M = 400 SBSs/cell.
[Fig. 9 plot: transmitted power (Watts) vs. time (sec); curves: E = 70 µJ, E = 75 µJ, E = 80 µJ, E = 100 µJ]
Fig. 9: Transmit power for different energy levels when M = 400 SBSs/cell.
Again, as can be seen from Fig. 11, the SBSs with larger available energy transmit with large power first and, after some time, when there is less energy available in the system, all of them start to use less power. Therefore, as can be seen in Fig. 10, the energy distribution shifts toward the left faster than in the previous case.
For $M = 600$ SBSs/cell, we have $g < \alpha$. This means each SBS needs to transmit with a power larger than the average $\bar p$ to achieve the target SINR. In Fig. 12, we see that the behavior of each SBS is the same as in the previous case. That is, the SBSs with higher energy transmit with larger power first and then reduce it, while the poorer SBSs increase their transmit power over time.
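The three density regimes discussed above follow from comparing $g$ with $\alpha = \gamma\bar g M$. The Python sketch below reproduces that arithmetic for the parameter values of this subsection ($\gamma = 0.002$, $g = \bar g = 0.001$, $N_0 = 10^{-5}$ W). As a rough aid to intuition (our own simplification, not a result from the paper), it also reports the power at which a homogeneous population would meet the target SINR exactly if the future-prospect term in (30) were ignored, i.e., the solution of $pg = \alpha p + \gamma N_0$, which exists only when $g > \alpha$.

```python
import math


def mfg_regime(M, gamma=0.002, g=1e-3, g_bar=1e-3, N0=1e-5):
    """Return alpha = gamma * g_bar * M, its relation to g, and the myopic power.

    The myopic power solves p g = alpha p + gamma N0 (every SBS at the same p,
    future-prospect term of (30) ignored); it is finite only when g > alpha.
    """
    alpha = gamma * g_bar * M
    if math.isclose(g, alpha, rel_tol=1e-9):
        return alpha, "g = alpha", None
    if g > alpha:
        return alpha, "g > alpha", gamma * N0 / (g - alpha)
    return alpha, "g < alpha", None


for M in (400, 500, 600):
    print(M, mfg_regime(M))
```

For $M = 400$ this gives $\alpha = 0.0008$ and a myopic power of about $10^{-4}$ W, while for $M = 500$ and $M = 600$ no finite myopic operating point exists, which is consistent with the energy-limited behavior described for those densities.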
We compare the MFG model against the discrete stochastic model for different values of $M$. For simplicity, we assume that each SBS has the same link gain to its user, $g = 0.001$. Also, we assume that the channel gain from each SBS to the user of another SBS is $\bar g = 0.001$. Using Remark 2, it can easily be proven that in this case each SBS transmits with the same power, i.e., if the CEQ sends $QCK$ of energy to $M$ SBSs, then each SBS receives $QCK/M$ and transmits with power $p = QCK/(MT)$. Then, the SINR at the user of each SBS is $pg/\big((M-1)\bar g p + N_0\big) = g/\big((M-1)\bar g + N_0MT/(QCK)\big)$, with multiplier $C = 20$ and the maximum battery size of the CEQ set to $S = 101$. Because the MBS is not a player of the game, the simulation becomes simpler, and we only need to solve a linear program for the MDP problem instead of a QCQP. Therefore, we refer to it as the MDP method to reflect this difference.

[Fig. 10 plot: probability distribution vs. energy (Joules); curves: t = 0, t = 1 s, t = 2.5 s, t = 5 s]
Fig. 10: Energy distribution over time when M = 500 SBSs/cell.
[Fig. 11 plot: transmitted power (Watts) vs. time (sec); curves: E = 70 µJ, E = 75 µJ, E = 80 µJ, E = 100 µJ]
Fig. 11: Transmit power for different energy levels when M = 500 SBSs/cell.
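The equal-split SINR expression used for the MDP comparison can be checked directly. In the Python sketch below, each SBS transmits $p = QCK/(MT)$, and the direct ratio $pg/((M-1)\bar g p + N_0)$ is compared with the simplified form. The parameter values ($K = 25$ µJ and $T = 5$ ms from Section V-A, $C = 20$, $g = \bar g = 0.001$, $N_0 = 10^{-5}$ W) come from the text, whereas the number of distributed packets $Q$ is a hypothetical input.

```python
def sinr_simplified(M, Q, C=20, K=25e-6, T=5e-3, g=1e-3, g_bar=1e-3, N0=1e-5):
    """g / ((M - 1) g_bar + N0 M T / (Q C K)): SINR when Q packets are split evenly."""
    return g / ((M - 1) * g_bar + N0 * M * T / (Q * C * K))


def sinr_direct(M, Q, C=20, K=25e-6, T=5e-3, g=1e-3, g_bar=1e-3, N0=1e-5):
    """Same SINR computed from the per-SBS power p = Q C K / (M T)."""
    p = Q * C * K / (M * T)
    return p * g / ((M - 1) * g_bar * p + N0)


for M in (100, 400, 1000):
    print(M, sinr_simplified(M, Q=50), sinr_direct(M, Q=50))
```

For a fixed $Q$, both terms in the denominator grow with $M$, so the equal-split SINR decreases as the network densifies.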
For the discrete stochastic case, we discretize the Gaussian distribution to model the energy arrivals at the CEQ. The battery size of each SBS is still 150 µJ. The average SINR of a generic small cell user under both the MFG and MDP models for different densities is plotted in Fig. 13. With the MFG model, the average SINR first increases and then starts falling at some point. This is because, when the density is low, the interference from the MBS (i.e., $10^{-5}$ W in our simulation) is noticeable. From the previous figures, it can be seen that an SBS increases its power when the density is higher. Therefore, beyond some point the co-tier interference becomes dominant and the average SINR begins to drop. That is, at some value of the density, e.g., M = 400 SBSs/cell in Fig. 13, we obtain the optimal average SINR. We also notice that the MFG model performs better than the MDP model with the CEQ. This is due to the limited battery size of the CEQ and the nearly identical channel gains of all SBS users in a dense network scenario.
In summary, we have two important remarks for the MFG model. First, if the density of small cells is high, the SBSs transmit with higher power. Second, from Fig. 13, we see that by choosing a suitable density of SBSs, we can obtain the highest average SINR. Notice that from (30), it can easily be proven that the average SINR at a user served by an SBS is always smaller than the target SINR (because $\partial_R U < 0$). Therefore, the highest average SINR is also the closest to the target SINR, which is our objective in the first place.

[Fig. 12 plot: transmitted power (Watts) vs. time (sec); curves: E = 70 µJ, E = 90 µJ, E = 110 µJ, E = 150 µJ]
Fig. 12: Transmission power over time when M = 600 SBSs/cell.
[Fig. 13 plot: average SINR value vs. number of SBSs per macro cell; curves: MFG method, MDP method, Target SINR]
Fig. 13: Average SINR at a generic SBS.
VI. CONCLUSION
We have proposed a discrete single-controller discounted two-player stochastic game to address the problem of power control in a two-tier macrocell-small cell network under co-channel deployment, where the SBSs use stochastic renewable energy sources. For the discrete case, the strategies for both the MBS and the SBSs have been derived by solving a quadratic optimization problem. The numerical results have shown that these strategies can perform well in terms of the outage probability experienced by the users. We have also applied a mean field game model to obtain the optimal power for the case when the number of SBSs is very large. Finally, we have discussed the implementation aspects of these models in a practical network.
APPENDIX A
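Denote the distance between the SBS B and its user D as BD = $a$. If D is uniformly located inside the disk of radius $r$ centred at B, the PDF of BD is $f_D(a) = 2a/r^2$. Denote by $\theta$ the angle ABD, which is uniformly distributed over $(0, 2\pi)$. Using the cosine law $d^2 = R^2 + a^2 - 2aR\cos\theta$, we obtain
$$\mathbb{E}[d^{-4}] = \int_0^{2\pi}\int_0^{r}\big(R^2 + a^2 - 2aR\cos\theta\big)^{-2}\,\frac{1}{2\pi}\,\frac{2a}{r^2}\,da\,d\theta. \tag{A-1}$$
First, we solve the indefinite integral over $\theta$ as $\int\big(R^2 + a^2 - 2aR\cos\theta\big)^{-2}d\theta = f_1(a,\theta) + f_2(a,\theta) + L$, where $L$ is a constant and
$$f_1(a,\theta) = \frac{2(R^2 + a^2)}{(R^2 - a^2)^3}\arctan\!\left(\frac{(R+a)\tan(\theta/2)}{R-a}\right),\qquad f_2(a,\theta) = \frac{2aR\sin\theta}{(R^2 - a^2)^2\big(R^2 + a^2 - 2aR\cos\theta\big)}.$$
Since $\sin 0 = \sin 2\pi = 0$, the contribution of $f_2$ over $[0, 2\pi]$ vanishes. For $R > a$, the arctan term increases by $\pi$ over $[0, 2\pi]$ once its branch jump at $\theta = \pi$ is taken into account. Thus,
$$\int_0^{2\pi}\big(R^2 + a^2 - 2aR\cos\theta\big)^{-2}d\theta = \frac{2\pi(R^2 + a^2)}{(R^2 - a^2)^3}.$$
Next, we integrate the above result over $a$ to obtain the indefinite integral
$$\frac{1}{r^2}\int a\,\frac{2(R^2 + a^2)}{(R^2 - a^2)^3}\,da = \frac{1}{r^2}\,\frac{a^2}{(R^2 - a^2)^2} + L. \tag{A-2}$$
Applying the limits $a = 0$ and $a = r$ gives $\mathbb{E}[d^{-4}] = 1/(R^2 - r^2)^2$, which completes the proof.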
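Both integration steps can be checked numerically. The Python sketch below evaluates the $\theta$-integral with the trapezoidal rule (very accurate for smooth periodic integrands) and compares it with $2\pi(R^2+a^2)/(R^2-a^2)^3$, then integrates (A-1) over $a$ with Simpson's rule and compares the result with $1/(R^2-r^2)^2$. The test distances $R = 3$, $r = 1$, and $a = 0.5$ are arbitrary values satisfying $R > r \ge a$.

```python
import math


def theta_integral(R, a, n=2000):
    """Trapezoidal rule for int_0^{2 pi} (R^2 + a^2 - 2 a R cos(theta))^{-2} d(theta)."""
    h = 2 * math.pi / n
    return h * sum((R * R + a * a - 2 * a * R * math.cos(k * h)) ** -2 for k in range(n))


def theta_closed_form(R, a):
    """2 pi (R^2 + a^2) / (R^2 - a^2)^3, valid for R > a."""
    return 2 * math.pi * (R * R + a * a) / (R * R - a * a) ** 3


def expected_d_inv4(R, r, n=2000):
    """(A-1) with the theta-integral in closed form; Simpson's rule over a in [0, r] (n even)."""
    f = lambda a: theta_closed_form(R, a) / (2 * math.pi) * 2 * a / r ** 2
    h = r / n
    s = f(0.0) + f(r) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, n))
    return s * h / 3


print(theta_integral(3.0, 0.5), theta_closed_form(3.0, 0.5))
print(expected_d_inv4(3.0, 1.0), 1 / (3.0 ** 2 - 1.0 ** 2) ** 2)
```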
APPENDIX B
PROOF OF THEOREM 2
First, we prove that given $n$, there exists a pure stationary strategy $m$ which is the best response of the MBS against $n$. Since the action set of the MBS is fixed, at each state $s$, given the strategy $n(s)$ of the CEQ, the MBS just needs to choose a mixed stationary strategy $m(s)$ such that its average payoff is maximized. At state $s$, the average utility function of the MBS is
$$\mathbb{E}[U_1] = -\sum_{p_0^s\in P}\sum_{j=0}^{s}\big(p_0^s g_0 - \gamma_0 I(p_0^s,j)\big)^2\,m(s,p_0^s)\,n(s,j), \tag{B-1}$$
where $p_0^s$ and $j\in\{0,\dots,s\}$ are the transmit power of the MBS and the number of energy packets distributed by the CEQ at state $s$, respectively. $I(p_0^s,j)$ is the average interference from the SBSs to the macrocell user if the CEQ distributes $j$ energy packets and the MBS transmits with power $p_0^s$. We have $I(p_0^s,j) = \sum_{i=1}^{M}p_i g_{0,i} + N_0$, where $(p_1, p_2, \dots, p_M)$ is the solution of Remark 2 with $Q$ and $p_0$ replaced by $j$ and $p_0^s$, respectively. Since $\sum_{p_0^s\in P}m(s,p_0^s) = 1$ and $m_s$ is a non-negative vector, we have
$$\mathbb{E}[U_1] \le \max_{p_0^s\in P}\Big\{-\sum_{j=0}^{s}\big(p_0^s g_0 - \gamma_0 I(p_0^s,j)\big)^2 n(s,j)\Big\}. \tag{B-2}$$
Since the set $P$ is fixed and finite, there always exists at least one value of $p_0^s$ that achieves the maximum of the right-hand side. That means when the game is in state $s$, the MBS can choose this power level with probability 1. However, obtaining $p_0^s$ in closed form is difficult because we first need to find $(p_1,\dots,p_M)$ in closed form by solving (10).
Nevertheless, if the average channel gains from each SBS to the macrocell user (say $g_{0,SBS}$) are the same, we can obtain $p_0^s$ in closed form by writing $I(p_0^s,j) = \sum_{i=1}^{M}p_i g_{0,SBS} + N_0 = g_{0,SBS}\sum_{i=1}^{M}p_i + N_0 = \frac{K}{T}j\,g_{0,SBS} + N_0$. The final equality follows from (2). Substituting this result into (B-2), we have
$$\mathbb{E}[U_1] \le \max_{p_0^s\in P}\Big\{-\sum_{j=0}^{s}\Big(p_0^s g_0 - \gamma_0\big(\tfrac{K}{T}j\,g_{0,SBS} + N_0\big)\Big)^2 n(s,j)\Big\}. \tag{B-3}$$
The right-hand side of this inequality is a strictly concave function (downward parabola) of $p_0^s$. Noting that $\sum_{j=0}^{s}n(s,j) = 1$, the parabola achieves its maximum at its vertex, given by
$$p_0^{s*} = \frac{\gamma_0\sum_{j=0}^{s}\big(\tfrac{K}{T}g_{0,SBS}\,j + N_0\big)n(s,j)}{g_0}. \tag{B-4}$$
If $p_0^{s*}$ is not available in $P$, then, since the right-hand side above is a parabola with respect to $p_0^s$, the best response $p_0^s$ to $n(s)$ is the level nearest to the vertex $p_0^{s*}$. On the other hand, given the strategy $m$ of the MBS, the problem of finding the best response strategy $n$ for the CEQ simplifies into the MDP in (11), for which there always exists a pure stationary optimal strategy $n$ [17, Chapter 2]. This completes the proof.
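The best-response argument can be illustrated with a small Python sketch. It evaluates the right-hand side of (B-3), computes the vertex (B-4), and confirms that the power level in a finite set $P$ nearest to the vertex is also the maximizer of (B-3). All numerical values in the example, including the distribution $n(s,\cdot)$ and the power levels, are hypothetical and dimensionless; they are chosen only to make the check easy to follow.

```python
def rhs_b3(p, n_s, K, T, g0, g0_sbs, gamma0, N0):
    """Right-hand side of (B-3) for MBS power p; n_s[j] = Pr(CEQ distributes j packets)."""
    return -sum((p * g0 - gamma0 * (K / T * j * g0_sbs + N0)) ** 2 * q
                for j, q in enumerate(n_s))


def mbs_best_response(levels, n_s, K, T, g0, g0_sbs, gamma0, N0):
    """Vertex (B-4) and the power level in `levels` nearest to it."""
    vertex = gamma0 * sum((K / T * g0_sbs * j + N0) * q for j, q in enumerate(n_s)) / g0
    return vertex, min(levels, key=lambda p: abs(p - vertex))


args = dict(n_s=[0.2, 0.3, 0.5], K=1.0, T=1.0, g0=1.0, g0_sbs=0.5, gamma0=1.0, N0=0.2)
levels = [0.5, 1.0, 1.5, 2.0]
vertex, best = mbs_best_response(levels, **args)
print(vertex, best, max(levels, key=lambda p: rhs_b3(p, **args)))
```

Because (B-3) is a symmetric parabola in $p_0^s$, "nearest to the vertex" and "largest value of (B-3)" pick the same level; here the vertex is 0.85 and both select 1.0.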
APPENDIX C
PROOF OF LEMMA 2
Using the stochastic differential equation (18) at time $t$, $dE(t) = -p(t,E(t))\,dt + \sigma\,dW_t$, we get the integral form
$$E(t+\Delta t) - E(t) = -\int_t^{t+\Delta t}p(v,E(v))\,dv + \sigma\int_t^{t+\Delta t}dW_v = -p(t',E(t'))\,\Delta t + \sigma\big(W_{t+\Delta t} - W_t\big), \tag{C-1}$$
where $t'\in(t, t+\Delta t)$. We obtain the second equality using the mean value theorem for integrals: if $G(x)$ is a continuous function and $f(x)$ is an integrable function that does not change sign on the interval $[a,b]$, then there exists $x\in[a,b]$ such that $\int_a^b G(t)f(t)\,dt = G(x)\int_a^b f(t)\,dt$. Since (C-1) holds for all SBSs, taking the expectation over all SBSs (or all possible values of $E$), we have
$$\mathbb{E}[E(t+\Delta t)] - \mathbb{E}[E(t)] = -\mathbb{E}[p(t',E(t'))]\,\Delta t + \sigma\,\mathbb{E}[W_{t+\Delta t} - W_t]$$
$$\Leftrightarrow\; \int_0^\infty E\,m(t+\Delta t,E)\,dE - \int_0^\infty E\,m(t,E)\,dE = -\Delta t\,\mathbb{E}[p(t',E(t'))], \tag{C-2}$$
where $m(t,E)$ is the distribution of $E$ in the system at time $t$. Since $W$ is a Wiener process, $W_{t+\Delta t} - W_t$ follows a normal distribution with mean zero, so $\mathbb{E}[W_{t+\Delta t} - W_t] = 0$.
Dividing both sides by $\Delta t$ and letting $\Delta t \to 0$ (so that $t' \to t$), we have
$$\frac{d}{dt}\int_0^\infty E\,m(t,E)\,dE = -\int_0^\infty p(t,E)\,m(t,E)\,dE = -\bar p(t). \tag{C-3}$$
Using $m(t,R) = m(t,E)$ and $dE = e^{R}dR$, and changing the variable from $E$ to $R$, we complete the proof.
REFERENCES
[1] K. T. Tran, H. Tabassum, and E. Hossain, "A stochastic power control game for two-tier cellular networks with energy harvesting small cells," Proc. IEEE Globecom'14, Austin, TX, USA, 8-12 December 2014.
[2] M. Deruyck, D. D. Vulter, W. Joseph, and L. Martens, "Modelling the power consumption in femtocell networks," IEEE WCNC 2012 Workshop on Future Green Communications.
[3] B. Gurakan, O. Ozel, J. Yang, and S. Ulukus, "Energy cooperation in energy harvesting communication," IEEE Transactions on Communications, vol. 61, no. 12, Dec. 2013, pp. 4884-4898.
[4] Z. Ding, S. Perlaza, I. Esnaola, and H. Poor, "Power allocation strategies in energy harvesting wireless cooperative networks," IEEE Transactions on Wireless Communications, Jan. 2014, pp. 846-860.
[5] N. Michelusi, K. Stamatiou, and M. Zorzi, "Transmission policies for energy harvesting sensors with time-correlated energy supply," IEEE Transactions on Communications, vol. 61, no. 7, Jul. 2013, pp. 2988-3001.
[6] H. Dhillon, Y. Li, P. Nuggehalli, Z. Pi, and J. Andrews, "Fundamentals of heterogeneous cellular networks with energy harvesting," www.arxiv.org/pdf/1307.1524, Jan. 2014.
[7] D. Gunduz, K. Stamatiou, and M. Zorzi, "Designing intelligent energy harvesting communication systems," IEEE Communications Magazine, vol. 52, no. 1, Jan. 2014, pp. 210-216.
[8] P. Semasinghe and E. Hossain, "Downlink power control in self-organizing dense small cells underlaying macrocells: A mean field game," IEEE Transactions on Mobile Computing, to appear.
[9] K. Huang and V. K. Lau, "Enabling wireless power transfer in cellular networks: Architecture, modeling and deployment," IEEE Transactions on Wireless Communications, vol. 13, no. 2, Feb. 2014, pp. 902-912.
[10] H. Tabassum, E. Hossain, A. Ogundipe, and D. I. Kim, "Wireless-powered cellular networks: Key challenges and solution techniques," IEEE Communications Magazine, to appear.
[11] "C-RAN: The road towards green RAN," White Paper, China Mobile Research Institute, October 2011.
[12] P. Semasinghe, E. Hossain, and K. Zhu, "An evolutionary game for distributed resource allocation in self-organizing small cells," IEEE Transactions on Mobile Computing, DOI 10.1109/TMC.2014.2318700.
[13] M. Bowling and M. Veloso, "An analysis of stochastic game theory for multiagent reinforcement learning," Technical Report, Carnegie Mellon University.
[14] J. Hu and M. P. Wellman, "Nash Q-learning for general-sum stochastic games," Journal of Machine Learning Research, vol. 4, Nov. 2003, pp. 1039-1069.
[15] J. A. Filar, "Quadratic programming and the single-controller stochastic games," Journal of Mathematical Analysis and Applications, vol. 113, no. 1, Jan. 1986, pp. 136-147.
[16] V. Chandrasekhar and J. Andrews, "Power control in two-tier femtocell networks," IEEE Transactions on Wireless Communications, vol. 8, no. 8, Aug. 2009, pp. 4316-4328.
[17] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer, 1997.
[18] O. Guéant, J.-M. Lasry, and P.-L. Lions, "Mean field games and applications," Paris-Princeton Lectures on Mathematical Finance, Springer, 2011, pp. 205-266.
[19] A. Y. Al-Zahrani, R. Yu, and M. Huang, "A joint cross-layer and co-layer interference management scheme in hyper-dense heterogeneous networks using mean-field game theory," IEEE Transactions on Vehicular Technology, to appear, Oct. 2014.
[20] H. Tembine, R. Tempone, and P. Vilanova, "Mean field games for cognitive radio networks," in Proc. of American Control Conference, June 2012.
[21] MFG Labs, "A mean field game approach to oil production," http://mfglabs.com/wp-content/uploads/2012/12/cfe.pdf, accessed 14-Nov-2014.
[22] S. Guruacharya, D. Niyato, D. I. Kim, and E. Hossain, "Hierarchical competition for downlink power allocation in OFDMA femtocell networks," IEEE Transactions on Wireless Communications, vol. 12, no. 4, Apr. 2013, pp. 1543-1553.
[23] E. Altman et al., "Constrained stochastic games in wireless networks," IEEE Globecom General Symposium, Washington D.C., 2007.
[24] E. Altman et al., "Dynamic discrete power control in cellular networks," IEEE Transactions on Automatic Control, vol. 54, no. 10, Oct. 2009, pp. 2328-2340.
[25] M. Huang, P. E. Caines, and R. P. Malhamé, "Uplink power adjustment in wireless communication systems: A stochastic control analysis," IEEE Transactions on Automatic Control, vol. 49, no. 10, Oct. 2004, pp. 1693-1708.
[26] T. Tao, "Mean field games," http://terrytao.wordpress.com/2010/01/07/mean-field-equations/, accessed 14-Nov-2014.
[27] D. Bauso, H. Tembine, and T. Başar, "Robust mean field games with application to production of an exhaustible resource," Proc. of the 7th IFAC Symposium on Robust Control Design, Aalborg, 2012.
[28] J. Li and Y.-T. Chen, Computational Partial Differential Equations Using MATLAB, Chapman and Hall/CRC, 2008.
[29] Y. Achdou and I. C. Dolcetta, "Mean field games: Numerical methods," SIAM Journal on Numerical Analysis, vol. 48, no. 3, 2010, pp. 1136-1162.
[30] W. A. Strauss, Partial Differential Equations: An Introduction, 2nd edition, Wiley, 2007.