Argos: Practical Many-Antenna Base Stations
Clayton Shepard¹, Hang Yu¹, Narendra Anand¹, Li Erran Li², Thomas Marzetta², Richard Yang³, and Lin Zhong¹
¹Rice University, Houston, TX; ²Bell Labs, Murray Hill, NJ; ³Yale University, New Haven, CT
{cws, hang.yu, nanand, lzhong}@rice.edu, {erranlli, tlm}@research.bell-labs.com, [email protected]
ABSTRACT
Multi-user multiple-input multiple-output theory predicts manyfold capacity gains by leveraging many antennas on wireless base stations to serve multiple clients simultaneously through multi-user beamforming (MUBF). However, realizing a base station with a large number of antennas is non-trivial, and has yet to be achieved in the real world. We present the design, realization, and evaluation of Argos, the first reported base station architecture that is capable of serving many terminals simultaneously through MUBF with a large number of antennas (M ≫ 10). Designed for extreme flexibility and scalability, Argos exploits hierarchical and modular design principles, properly partitions baseband processing, and holistically considers real-time requirements of MUBF. Argos employs a novel, completely distributed beamforming technique, as well as an internal calibration procedure to enable implicit beamforming with channel estimation cost independent of the number of base station antennas. We report an Argos prototype with 64 antennas that is capable of serving 15 clients simultaneously. We experimentally demonstrate that by scaling from 1 to 64 antennas the prototype can achieve up to 6.7-fold capacity gains while using a mere 1/64th of the transmission power.
Categories and Subject Descriptors
C.2.1 [Computer-Communication Networks]: Network Architecture and Design - Wireless Communication

Keywords
Large-scale Antenna Systems (LSAS), Many-Antenna, Massive MIMO, Multi-User MIMO, Beamforming, Conjugate, MRT, Zero-forcing
1. INTRODUCTION
Due to the popularization of smartphones, tablets and data-hungry applications, mobile data traffic is growing exponentially, with the expectation that it will increase 18-fold within 5 years [7]. In response, wireless operators are scrambling to acquire more spectrum resources and deploy more base stations to increase spatial reuse. However, there is a fundamental spectrum efficiency limit to existing cellular network architectures: they are single-user systems. That is, a base station serves only one terminal in a given resource block (i.e., time slot, spectrum channel, or code sequence). Information theory shows that this limit can be overcome through multi-user multiple-input multiple-output (MU-MIMO) [9]; one promising form of MU-MIMO is called multi-user beamforming (MUBF). With MUBF, a base station employs multiple antennas to send independent data streams to multiple terminals in the same resource block, effectively improving spatial reuse. As theory shows, the more antennas a base station has, the more terminals it can serve simultaneously, resulting in higher spectral capacity. Not surprisingly, the theory community is envisioning MUBF base stations with hundreds of antennas.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MobiCom'12, August 22–26, 2012, Istanbul, Turkey. Copyright 2012 ACM 978-1-4503-1159-5/12/08 ...$15.00.

However, building a MUBF base station with many antennas is non-trivial. Scaling up baseband processing, clock distribution, transmission synchronization, and channel estimation raises serious system challenges. As a result, only testbeds with a few antennas have been reported in the literature, e.g., [5, 14]. Emerging wireless standards are similarly restricted to a small number of antennas and terminals. The key question for the proposal of MUBF base stations with many antennas remains: is it practical at all?
In this work, we answer this question affirmatively with Argos¹, a flexible base station architecture that is scalable up to thousands of antennas and able to serve tens of terminals simultaneously through MUBF. Using commercial off-the-shelf radio modules, i.e., the WARP platform [4], we have realized an Argos prototype with 64 antennas that is capable of serving 15 terminals through zero-forcing and conjugate MUBF. Extensive experimental characterization using this prototype shows that the spectral capacity increases from 12.7 bps/Hz when using a single antenna to 85 bps/Hz for Argos employing zero-forcing MUBF, and to 38 bps/Hz for Argos employing the less computationally intensive conjugate MUBF, while using a mere 1/64th of the single-antenna transmission power. We show that the spectral capacity grows nearly linearly with the number of base station antennas and the number of simultaneously served terminals, as suggested by theory. The scale of our prototype and experimentation is limited only by the number of WARP boards that are available to us. To the best of our knowledge, Argos is the first publicly reported many-antenna MUBF base station design and realization (M ≫ 10). Our work demonstrates the feasibility of the MUBF theory community's proposal, and presents key design principles for a scalable, flexible, and cost-effective realization.
Argos achieves its scalability and flexibility with four novel design principles. (i) First, Argos adopts a hierarchical and modular design. This allows it to scale up easily by incrementally adding modules, e.g., WARP boards in the reported prototype. As Argos scales up it can select the optimal beamforming algorithm by thoroughly analyzing the performance factors and data dependencies of various MUBF techniques. (ii) Second, Argos intelligently partitions computation tasks among the different modules in the hierarchy. In the downlink, data to multiple terminals is broadcast to all antennas. Each antenna locally applies its beamforming weights and transmits the combined signal to all terminals simultaneously. In the uplink, I and Q samples from each antenna are combined in upstream modules along the hierarchy. (iii) Third, for very large scale operation, Argos leverages a modified version of conjugate beamforming that allows localized weight computation at each antenna. Specifically, traditional conjugate beamforming requires centralized transmission power normalization, while Argos conducts the normalization locally at each antenna. This modification allows Argos to scale almost indefinitely with regard to baseband complexity. (iv) Finally, Argos employs a novel internal calibration procedure that allows implicit beamforming across a large number of base station antennas without explicit channel state information (CSI) estimation, enabling the real-time CSI estimation overhead to become independent of the number of base station antennas. Notably, implicit beamforming requires time division duplex (TDD) operation, which is a substantial modification to the frequency division duplex (FDD) systems primarily used in cellular networks currently.

¹ Argos is a giant with 100 eyes in Greek mythology. The great vision of Argos is analogous to the improved capacity of our many-antenna base station.

Figure 1: Aerial view of two antennas, represented by two dark dots, emitting identical sine waves at the same frequency. The two waves perfectly reinforce each other along the x axis (constructive interference) but completely cancel each other out along the y axis (destructive interference). Between the two axes the interference gradually varies, producing conical radiation known as a beam pattern.

In summary, we make the following contributions to advance the state of the art of MUBF with many antennas:
• We design and realize Argos, a first-of-its-kind base station architecture that can scale up to thousands of antennas serving tens of terminals with either conjugate or zero-forcing MUBF. We report an Argos prototype with 64 antennas simultaneously serving 15 terminals;
• Using the Argos prototype, we experimentally demonstrate the real-world feasibility of base stations employing many-antenna MUBF and their capability to significantly improve capacity;
• The design of Argos contributes multiple novel techniques to address key challenges toward realizing base stations with a large number of antennas, including clock distribution, transmission synchronization, localized weight computation, and channel calibration.

Figure 2: Multi-user beamforming employs baseband precoding and multiple antennas to send independent data streams to multiple terminals at the same time.

In the rest of this paper, we provide the background in Section 2. We present the design and implementation of Argos in Sections 3 and 4, respectively. In Section 5 we evaluate the real-world performance of Argos. In Sections 6 and 7 we discuss related and future work, respectively, and then conclude in Section 8.
2. BACKGROUND
We first provide some background on multi-user beamforming (MUBF) and highlight the key benefits and challenges of using a large number of antennas on base stations.
2.1 Beamforming Basics
Beamforming utilizes multiple antennas transmitting at the same frequency to realize directional transmission. Due to constructive and destructive interference of signals from multiple transmission antennas, the signal strength received in different directions varies spatially, leading to a beam pattern, as shown in Figure 1. This beam pattern can be altered by changing the beamforming weights applied to each antenna, effectively altering the amplitude and phase of the signal sent from that antenna. Closed-loop beamforming employs CSI to calculate the beamforming weights that maximize the signal strength at intended receivers and minimize the interference at unintended ones.
2.2 Single and Multi-user Beamforming
There are two major categories of closed-loop beamforming: single-user beamforming (SUBF) and multi-user beamforming (MUBF). SUBF maximizes the signal strength at a single intended receiver by using beamforming weights that are the complex conjugate of the CSI, which is also known as maximum ratio transmission [15]. MUBF concurrently transmits multiple data streams, each to a different intended receiver, as shown in Figure 2. Not surprisingly, information theoretical studies have shown that MUBF can improve spectral capacity manyfold due to its spatial multiplexing gain.
There are many baseband techniques to realize MUBF. We focus on linear precoding since other methods are computationally infeasible for practical systems. Let s denote a K × 1 vector representing the data-bearing symbols to K users. Linear precoding creates a transmission vector s′ for M antennas by multiplying the original data vector s by an M × K matrix W: s′ = W · s, where W consists of the beamforming weights.
In this work, we study two important forms of linear precoding for MUBF: conjugate and zero-forcing. Let H denote the M × K channel matrix between the M base station antennas and K concurrent terminals. Let c denote a constant chosen to satisfy a transmission power constraint.
Conjugate: $W = W_{\mathrm{conj}} = c \cdot H^{*}$, where $H^{*}$ is the complex conjugate of H. In other words, conjugate beamforming simply takes the complex conjugate of each channel coefficient in H as the beamforming weight, normalized by c. Indeed, it can be viewed as simultaneous single-user beamforming to K terminals by aggregating the signals intended for these terminals. Conjugate MUBF is sub-optimal and may not perform well with a small M due to inter-terminal interference. This method has only recently been proposed for MUBF with a large number of antennas in [17].
Zero-forcing: $W = W_{\mathrm{zf}} = c \cdot H^{*}(H^{T}H^{*})^{-1}$. Zero-forcing beamforming employs the CSI to precode the data-bearing symbols so that they sum to zero, or a 'null', at unintended receivers. The effectiveness of zero-forcing has been experimentally demonstrated recently [5] with a small number of antennas (four) and terminals (four). Zero-forcing MUBF can keep inter-terminal interference to zero if K ≤ M. However, due to the required matrix inversion, the computational overhead quickly becomes infeasible for real-time applications, as will be discussed in Section 2.4.2.
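The two precoders defined above can be compared on a toy example. The following is a minimal sketch, not part of the Argos implementation: the channel values, the choice M = 3 and K = 2, the scaling c = 1, and all helper names are assumptions made for illustration. It computes both weight matrices and the effective channel $H^{T}W$ seen by the terminals.

```python
# Illustrative sketch only: conjugate vs. zero-forcing weights for a tiny,
# made-up channel (M = 3 antennas, K = 2 terminals). All values are hypothetical.

def conj_matrix(A):
    return [[x.conjugate() for x in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a small square complex matrix."""
    n = len(A)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# H is M x K: H[m][k] is the channel from base station antenna m to terminal k.
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 - 1j, 1 + 0.4j],
     [-0.7 + 0.2j, 0.2 + 0.9j]]

def conjugate_weights(H, c=1.0):
    """W_conj = c * H^* (element-wise complex conjugate)."""
    return [[c * x for x in row] for row in conj_matrix(H)]

def zero_forcing_weights(H, c=1.0):
    """W_zf = c * H^* (H^T H^*)^{-1}."""
    Hc = conj_matrix(H)
    gram = matmul(transpose(H), Hc)        # K x K
    W = matmul(Hc, inverse(gram))          # M x K
    return [[c * x for x in row] for row in W]

def effective_channel(H, W):
    """H^T W: entry (k, j) is how much of stream j arrives at terminal k."""
    return matmul(transpose(H), W)

E_conj = effective_channel(H, conjugate_weights(H))
E_zf = effective_channel(H, zero_forcing_weights(H))
print("conjugate off-diagonal:", abs(E_conj[0][1]))
print("zero-forcing off-diagonal:", abs(E_zf[0][1]))
```

In this sketch the off-diagonal entries of the effective channel represent inter-terminal interference: they vanish (up to rounding) for zero-forcing but not for conjugate precoding, at the price of the K × K inversion.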
2.3 Benefits of Many-Antenna MUBF
It is well known in information theory that MUBF with many antennas provides the following key benefits:
First, MUBF can greatly improve spectral capacity through spatial reuse. Roughly speaking, the spectral capacity gain from MUBF is min(M, K) [9]. A large M allows the base station to serve more terminals concurrently and therefore achieve higher spectral capacity.
Second, a very large M allows a more power-efficient and cost-effective base station. The directional gain from using a large M can be used to compensate for reduced transmission power; that is, a base station can achieve the same capacity with a much lower total transmission power. Under all conceivable propagation conditions, doubling the number of base station antennas permits the total radiated power to be reduced by a factor of two with no degradation of performance. Only when the number of antennas grows so large that it begins to envelop the terminals or intervening scatterers will this effect cease. Moreover, multi-user beamforming distributes the total transmission power across M antennas, leading to a much lower transmission power per antenna. The base station can therefore leverage cheaper power amplifiers and simpler RF filters. This eliminates the need for active cooling, further reducing power consumption and total cost.
Finally, since power gains are reciprocal, the preceding benefit also applies to terminals. Specifically, it allows battery-constrained terminals to use much lower transmission power to achieve higher capacity.
In Section 5, we will experimentally demonstrate these benefits using the Argos design.
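The power-scaling argument above can be illustrated with an idealized calculation. The sketch below is an assumption-laden toy model, not a result from the Argos prototype: it considers a single terminal, assumes every channel coefficient has unit magnitude and a random phase, ignores noise, and splits the total transmit power equally across antennas. The function names are hypothetical.

```python
import cmath
import random

def received_power(channels, total_power):
    """Received signal power at one terminal under conjugate (MRT) beamforming.

    Idealized model: the total transmit power is split equally across antennas
    and each antenna pre-rotates its signal by the conjugate channel phase.
    """
    m = len(channels)
    amp = (total_power / m) ** 0.5                 # per-antenna amplitude
    signal = sum(h * amp * (h.conjugate() / abs(h)) for h in channels)
    return abs(signal) ** 2

random.seed(0)

def unit_channels(m):
    """Hypothetical channels: unit magnitude, uniformly random phase."""
    return [cmath.exp(1j * random.uniform(0, 2 * cmath.pi)) for _ in range(m)]

baseline = received_power(unit_channels(1), total_power=1.0)
for m in (1, 2, 4, 64):
    # Reduce the total radiated power in proportion to the number of antennas.
    p = received_power(unit_channels(m), total_power=1.0 / m)
    print(m, round(p / baseline, 6))
```

Under these assumptions the coherent sum has amplitude sqrt(M · P), so cutting the total power to P/M leaves the received power unchanged; real channels with unequal magnitudes and multiple terminals will deviate from this ideal.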
2.4 Challenges to Many-Antenna MUBF
Realizing the key benefits outlined above is, however, non-trivial. Any implementation of MUBF with many antennas faces fundamental timing constraints imposed by the coherence time of the physical wireless channel. MUBF must collect CSI for each terminal and then use it to calculate the beamforming weights within a small fraction of the coherence time. Additionally, the computational complexity of MUBF weight calculation grows with the number of antennas, M, and the number of simultaneously served terminals, K. The Argos design has to address both challenges.
2.4.1 CSI Estimation
Acquisition of CSI fundamentally limits the capacity of MUBF with many antennas. MUBF with M antennas serving K terminals requires CSI between every base station antenna and terminal, or M · K channels. Importantly, all M · K physical channels must be assessed within a period much shorter than the channel coherence time in order to be useful. The coherence time of a wireless channel depends on how quickly the terminal and environment move. In cellular systems this is typically on the order of a few milliseconds, but can drop below 500 μs [1] with vehicular mobility at or near the terminal. This results in a fundamental tradeoff between the time spent collecting CSI, which dictates how many users can be served simultaneously, and the time allocated to sending beamformed data to those users. This tradeoff is explored theoretically in [16].
Traditionally, CSI is estimated explicitly. That is, each base station antenna broadcasts a pilot to the terminals, and each terminal uses these pilots to estimate its channel to each of the base station antennas. For this channel estimate to be useful, it has to be fed back to the base station so that the base station can perform downlink beamforming. The reverse of this procedure is then used to find uplink CSI, though feedback is unnecessary for maximum ratio combining at the base station. This method thus requires O(M + K) time to send pilots (one pilot from each base station antenna and terminal) and O(M · K) estimates that need to be sent back over the air (M estimates from each of K terminals). This overhead is unavoidable in frequency division duplex (FDD) systems, since the physical channel is not reciprocal at different frequencies.
In time division duplex (TDD) systems the physical channel is reciprocal, and thus, theoretically, CSI could be estimated implicitly. That is, uplink pilots could be used to perform downlink beamforming, reducing channel estimation overhead to O(K) and eliminating the required feedback. This is often called implicit beamforming. However, in practice, the uplink and downlink channels consist of not only the physical channel, but also the channels introduced by the active RF components in the transmit and receive hardware, as will be further discussed in Section 3.3.
2.4.2 Real-time Beamforming Weight Calculation
The computational complexity of MUBF weight calculation also grows with the number of base station antennas, M, and the number of terminals, K. For conjugate MUBF, the beam weight computation is trivial. In hardware, taking the complex conjugate of a signal only needs a bit-flip and an adder. Therefore, the delay introduced by weight calculation is negligible. However, zero-forcing requires the computation of a matrix inverse, a calculation with a complexity of $O(M \cdot K^{2})$. Moreover, the inverse algorithm has internal data dependencies that limit its ability to be parallelized. While the incurred latency is acceptable at small scales, the polynomial time nature of the inverse makes it very challenging for MUBF systems with a large number of antennas. For example, we estimate that a single 15 by 15 matrix inverse would require approximately 150 μs using a specialized high performance FPGA implementation reported in [8]. 150 μs is already 30% of the 500 μs coherence time specified by the LTE channel model. Moreover, in a wideband system such as LTE, this inversion has to be performed for every 14 subcarriers [17]. Thus, while these computations may be pipelined, the true overall inversion time incurred will be far greater than 150 μs.
Additionally, existing beamforming techniques incur a high data transmission overhead because the channel estimates and beam weights have to be exchanged between a central controller and each of the antennas. Even using state-of-the-art hardware, e.g., InfiniBand, such an exchange incurs a sizable latency cost. The fastest InfiniBand bus has 1 μs overhead per hop and a 40 Gbps transfer rate [3]; it will incur approximately 5 μs delay per subcarrier group in a 15 by 15 system. For a 20 MHz bandwidth this amounts to over a 700 μs delay. Zero-forcing cannot avoid this data exchange because the inverse calculation requires the full CSI matrix, H. More subtly, even the simplest beamforming algorithm, conjugate, requires full knowledge of H in order to appropriately scale the power of the steering weights. In Section 3.4, we present a novel localized conjugate beamforming method that eliminates the overhead due to data transfer between the central controller and antennas.
3. DESIGN
The key question we ask in this section is: how do we design a MUBF base station that can flexibly optimize its architecture over a wide range of M and K? Before proceeding to answer it, we want to highlight its practical interest: realistic wireless networks often have large variations in many of their properties, including the financial budget for the base stations, the terminal population within the coverage area, and the data traffic volume from terminals. Traditional base stations can only scale their transmission power or, equivalently, their cell size, to cope with such variations. In contrast, Argos base stations can also scale the number of antennas to accommodate various deployment needs.
We argue that in order to meet these demands our many-antenna base station must: (i) be economically affordable with cost proportional to M, (ii) scale as both M and K become very large, and (iii) select the optimal beamforming technique given deployment requirements. We next present how our design of Argos accomplishes these attributes.
3.1 Scalability
The first question is: can MUBF scale up with M, the number of base station antennas? MUBF entails three distinct phases: channel estimation, weight calculation, and linear precoding. We explore the feasibility and design implications of these as M scales up.
3.1.1 Channel Estimation
Explicit channel estimation does not scale well with M or K. As discussed in Section 2.4.1, explicit channel estimation typically requires M + K pilots to be sent, and M · K estimates to be fed back to the base station. This is clearly an unacceptable overhead for large-scale systems, and suggests that Argos must employ TDD reciprocity and implicit beamforming to reduce this overhead to K pilots and eliminate the feedback. In order to enable this, however, we must first overcome the asymmetries introduced by the RF hardware. To accomplish this we devise a novel internal calibration scheme, which we present in Section 3.3.
3.1.2 Beamforming Methods
Unfortunately, existing beamforming methods are distinctly unscalable, as they all have centralized data requirements and typically have polynomial time complexity, as discussed in Section 2.4.2.
In light of this, we propose a novel beamforming method that allows weights to be computed completely locally, at each base station radio, as described in Section 3.4. Leveraging this method allows additional radios to be added without requiring additional bandwidth, enabling Argos to easily scale up to an unprecedented number of base station antennas, e.g., 1000s.
However, while this beamforming method performs well with a very large number, e.g., 100s, of base station antennas serving 10s of terminals simultaneously, it is well known to be sub-optimal for smaller scale systems, e.g., M = 30, K = 10. We demonstrate this empirically in our results, Section 5, where we find that zero-forcing results in up to a 4-fold capacity increase over our method. However, this does not account for the data transport and computational overhead of zero-forcing, which becomes prohibitive with a large number of users or high mobility, as described in Section 2.4.2. Thus we conclude that in order to scale optimally Argos must support traditional, centralized beamforming techniques for smaller scale deployments.
3.1.3 Linear Precoding
Linear precoding requires each antenna to transmit a data stream that is the linear combination of K data streams with K beamforming weights. One design option is to apply these weights centrally. Since each antenna transmits a distinct data stream, this would require the central controller to deliver M I and Q sample streams to each of the individual radios. This approach, obviously, does not scale well, since it requires the central controller to have an output bandwidth proportional to M. As M increases to hundreds or even thousands, this becomes exorbitantly expensive and eventually intractable. Thus we conclude that in any efficient scalable design, the beamforming weights should be applied at the radio. This design choice conveniently allows all of the radios to share a common databus for downlink transmission. In contrast, for uplink transmission, the radio leverages the same linear precoding to apply K beamforming weights to the incoming I and Q samples. Since each radio has unique weights, this again results in M unique data streams (that are K wide)! Fortunately, linear precoding requires these streams to simply be added together; conveniently, this can be done anytime two streams merge in the architecture, thus, again, enabling a constant bandwidth databus. Indeed, we see that with careful design decisions linear precoding can scale up with constant data rate requirements. Notably, there is still a need for some form of central controller to demodulate the data once it has been completely recombined; however, this operation is latency insensitive and computationally trivial.

Figure 3: Argos architecture: fat tree structure with daisy-chained leaf nodes.

Thus we find that, yes, MUBF can scale up with M, but only with careful design choices and new methods for weight calculation and channel estimation.
3.2 Architecture and Topology
The design choices to enable scalability presented above result in two distinct components: (i) a central controller, which handles modulation and demodulation, and (ii) the M radio front-ends that locally calculate beam weights and apply linear precoding. The immediate question we need to answer is: how do we interconnect the controller and the radios? On one hand, we can connect all the radios directly to the controller. This requires the controller to have at least M ports. Since M can be dynamic and very large, this would be an unscalable and inefficient design choice. On the other hand, we can daisy-chain all the radios serially. While scalability seems to be maximized, reliability and delay of the system are severely compromised.
Our solution is to add hierarchies to the base station to improve flexibility, and simultaneously achieve a balance between scalability, reliability, and delay. But what type of hierarchical structure should we adopt? First we note that deploying M separate radios and antennas would be unwieldy and cost-ineffective to manufacture; thus we create our first level of hierarchy: a module that contains one or more radio front-ends. Next, in order to achieve flexible, cost-effective scaling we allow these modules to be connected serially, enabling additional modules to be added atomically with low overhead. Finally, in order to increase reliability and reduce end-to-end latency, we introduce the Argos hub, which allows multiple modules to be connected in parallel. Figure 3 depicts the Argos architecture.
The Argos base station enables unprecedented scalability and deployability, while fulfilling performance and cost constraints. This architecture enables the Argos base station to scale in three directions: by adding more Argos hubs, by increasing the length of the module chains, and by increasing the number of antennas on a module. The hierarchical architecture facilitates deployments of base stations with many antennas that are flexibly distributed geographically by using a single link to an Argos hub, as well as deployments of base stations with a small number of antennas where the hub can be omitted completely and modules are simply chained together in series. Additionally, if chains become too long to meet latency requirements, Argos hubs can simply be added to parallelize connections and reduce latency.

Figure 4: Real channels are not reciprocal due to the differences in transmit and receive hardware. Downlink channel: $\hat{h}_{i \to j} = t_i \cdot h_{i \to j} \cdot r_j$; uplink channel: $\hat{h}_{j \to i} = t_j \cdot h_{j \to i} \cdot r_i$. Note that channel reciprocity indicates that within the channel coherence time the physical channel is reciprocal: $h_{i \to j} = h_{j \to i}$.
3.3 Channel Calibration
We devise a novel, completely internal calibration procedure to enable implicit beamforming on many-antenna base stations through TDD channel reciprocity in order to collect CSI data in constant time with respect to M.
For an M-antenna base station to multi-user beamform to K terminals, the base station must acquire the downlink channel state information, $\hat{h}_{m \to k}$, for all m = 1, 2, ..., M and k = 1, 2, ..., K. The key challenge is to estimate the effective downlink CSI $\hat{h}_{m \to k}$ from the uplink CSI, $\hat{h}_{k \to m}$, acquired from the uplink pilot signals. However, as shown by Figure 4, the uplink and downlink channels are not reciprocal due to the random phase and amplitude differences in the RF hardware. This is caused by a combination of dynamic effects from internal clocking structures, such as dividers, multipliers, and PLLs, as well as static effects from manufacturing deviations. Indeed, we verify that simply resetting a given radio i, or even tuning to a different frequency, randomizes the phase effects of $t_i$ and $r_i$.
The uplink or downlink channel between any two transceivers is a product of (i) the frequency response of the transmit hardware, (ii) the physical wireless channel, and (iii) the frequency response of the receive hardware:

$$\hat{h}_{i \to j} = t_i \cdot h_{i \to j} \cdot r_j \qquad (1)$$

In order to estimate the reciprocal channel, $\hat{h}_{j \to i}$, we define a calibration coefficient, $b_{i \to j}$, between radios i and j as:

$$b_{i \to j} = \frac{\hat{h}_{i \to j}}{\hat{h}_{j \to i}} = \frac{t_i \cdot h_{i \to j} \cdot r_j}{r_i \cdot h_{j \to i} \cdot t_j} = \frac{t_i \cdot r_j}{r_i \cdot t_j} = \frac{1}{b_{j \to i}} \qquad (2)$$

Notably, if both channels are measured within the coherence time then $h_{j \to i} = h_{i \to j}$ due to physical channel reciprocity. Clearly, if we know the calibration coefficient between two radios and one channel estimate, we can find the reciprocal channel:

$$\hat{h}_{i \to j} = b_{i \to j} \cdot \hat{h}_{j \to i} \quad \text{or} \quad \hat{h}_{j \to i} = \frac{\hat{h}_{i \to j}}{b_{i \to j}} \qquad (3)$$
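Equations 1 to 3 can be checked numerically on a single radio pair. The sketch below is illustrative only: the hardware responses $t$ and $r$, the physical channel values, and the helper name `observed` are all made-up assumptions. It measures the calibration coefficient once, then uses it to recover one direction of a later, different channel from the other direction.

```python
import cmath

def observed(t_tx, h, r_rx):
    """Equation 1: observed channel = TX hardware * physical channel * RX hardware."""
    return t_tx * h * r_rx

# Hypothetical hardware responses (amplitude, phase) for radios i and j.
t = {"i": cmath.rect(0.9, 0.3), "j": cmath.rect(1.2, -2.0)}
r = {"i": cmath.rect(1.1, -1.2), "j": cmath.rect(0.8, 0.7)}

# Calibration: both directions measured within the coherence time, so the
# physical channel h_cal is the same (reciprocal) in both measurements.
h_cal = cmath.rect(0.5, 1.0)
b_ij = observed(t["i"], h_cal, r["j"]) / observed(t["j"], h_cal, r["i"])   # Eq. 2
b_ji = observed(t["j"], h_cal, r["i"]) / observed(t["i"], h_cal, r["j"])

# Later the physical channel has changed, and only j -> i is measured.
h_new = cmath.rect(0.2, -0.4)
h_hat_ji = observed(t["j"], h_new, r["i"])
h_hat_ij_est = b_ij * h_hat_ji                     # Eq. 3
h_hat_ij_true = observed(t["i"], h_new, r["j"])    # what a direct measurement would give

print(abs(h_hat_ij_est - h_hat_ij_true))           # difference should be tiny
print(abs(b_ij * b_ji - 1))                        # b_ij and b_ji are reciprocals
```

The coefficient depends only on the hardware responses, so in this toy model it remains valid after the physical channel changes, provided the hardware responses themselves have not drifted.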
Now let's apply this to our scenario where we would like to estimate the downlink CSI from base station antenna m to terminal k ($\hat{h}_{m \to k}$) from the uplink CSI ($\hat{h}_{k \to m}$). To do this we must know the M calibration coefficients between each base station antenna and the terminal, that is, all $b_{m \to k}$. These would be impractical to find in a real system, as estimating $b_{m \to k}$ requires pilots to be sent between every base station antenna and terminal pair, as well as feedback from each terminal. Moreover, unless the terminal and base station share clocks, which is impossible in a wireless system, their hardware transmit and receive channels drift relative to each other over time, thus requiring this calibration to happen frequently. This approach would be counter-productive, since estimating $b_{m \to k}$ requires downlink pilots, which could be used to directly estimate $\hat{h}_{m \to k}$.
3.3.1 Internal Calibration
We find that it is possible to internally calibrate the base station relative to one of its antennas, e.g., antenna 1. That is, we find all calibration coefficients $b_{m \to 1}$ (for m = 2, 3, ..., M) using Equation 2. Note that these coefficients are in fact stable over long periods of time, as we show in Section 5.4, since all base station antennas share clocks. We also find that if we know the calibration coefficient between any two radios and a reference radio, then we can derive the direct calibration coefficient between them:

$$\frac{b_{i \to j}}{b_{i \to y}} = \frac{\dfrac{t_i \cdot r_j}{r_i \cdot t_j}}{\dfrac{t_i \cdot r_y}{r_i \cdot t_y}} = \frac{t_y \cdot r_j}{r_y \cdot t_j} = b_{y \to j} \qquad (4)$$

Thus if we know the calibration coefficient between our reference antenna and terminal k, $b_{1 \to k}$, we can find the downlink CSI:

$$\hat{h}_{k \to m} \cdot \frac{b_{1 \to k}}{b_{1 \to m}} = \hat{h}_{k \to m} \cdot b_{m \to k} = \hat{h}_{m \to k} \qquad (5)$$

This suggests that full CSI can be found by simply sending one pilot from each of the terminals, then just one pilot from the base station's reference antenna! Unfortunately, however, to find $b_{1 \to k}$ we must feed back the reference antenna's downlink channel estimate, $\hat{h}_{1 \to k}$, from each of the K terminals. This significantly reduces the channel capacity, and quickly becomes infeasible for even a moderate K.
3.3.2 Key Idea: Relative Calibration
Our key idea in solving the calibration problem is that an absolutely accurate estimation of downlink CSI, $\hat{h}_{m\to k}$, is unnecessary. For all multi-user beamforming techniques using linear precoding, it is sufficient for beamforming antennas to have a relatively accurate estimation. That is, as long as each base station antenna’s CSI estimation deviates from the real CSI by the same multiplicative factor, multi-user beamforming will still result in the same beam pattern. To visualize this, refer back to Figure 1; if both antennas were to experience the same phase offset, the resulting spatial beam pattern would remain the same. Thus, we can assume $b_{1\to k} = 1$:

$$\hat{h}_{m\to k} = \hat{h}_{k\to m} \cdot \frac{b_{1\to k}}{b_{1\to m}} \;\Rightarrow\; \hat{h}'_{m\to k} = \frac{\hat{h}_{k\to m}}{b_{1\to m}} = \hat{h}_{k\to m} \cdot b_{m\to 1} \qquad (6)$$

This means that we estimate relative downlink CSI, $\hat{h}'_{m\to k}$, by using only uplink pilots, without any feedback! To recapitulate, the entire CSI collection process involves 4 steps:

1. Find all internal calibration coefficients, $b_{1\to m}$, offline by sending pilots to and from every base station antenna m and reference antenna 1.
2. Send K orthogonal pilots, one from each terminal, and determine $\hat{h}_{k\to m}$.
3. Derive all $\hat{h}'_{m\to k}$ from Equation 6.
4. Use $\hat{h}'_{m\to k}$ to calculate the beam weights, then send the beamformed data.

Using this process we can efficiently collect full channel state information at the base station by sending only K terminal pilots, without any feedback from the terminals. This enables us to scale M up without any additional channel estimation overhead, which is a critical feature in order to realize a MUBF system with many antennas.
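The four steps can be illustrated with a small simulation. This is a sketch under stated assumptions: every estimate is modeled as (transmit response) x (reciprocal over-the-air channel) x (receive response), estimates are noiseless, and the calibration coefficient is measured as the ratio of the two pilot estimates between an antenna and the reference. Index 0 plays the role of reference antenna 1, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 8, 3  # base station antennas and terminals (illustrative sizes)

def cgauss(*shape):
    """Circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def hardware(n):
    """Random nonzero complex gain per radio chain (assumed static)."""
    return rng.uniform(0.5, 1.5, n) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))

t_bs, r_bs = hardware(M), hardware(M)  # base station transmit / receive chains
t_ue, r_ue = hardware(K), hardware(K)  # terminal transmit / receive chains
g_air = cgauss(K, M)                   # reciprocal channel, terminal k <-> antenna m
g_bs = cgauss(M)                       # reciprocal channel, reference <-> antenna m

# Step 1 (offline): pilots between the reference antenna and every antenna m.
h_ref_to_m = t_bs[0] * g_bs * r_bs
h_m_to_ref = t_bs * g_bs * r_bs[0]
b_m_to_ref = h_m_to_ref / h_ref_to_m   # b_{m->1} = (t_m r_1) / (r_m t_1)

# Step 2: uplink pilots give h_{k->m} = t_k * g * r_m.
h_up = t_ue[:, None] * g_air * r_bs[None, :]

# Step 3: relative downlink CSI from Equation 6.
h_down_rel = h_up * b_m_to_ref[None, :]

# Ground truth downlink h_{m->k} = t_m * g * r_k, never observed by the base station.
h_down_true = t_bs[None, :] * g_air * r_ue[:, None]

# The error factor depends only on the terminal k, not on the antenna m.
ratio = h_down_rel / h_down_true
print(np.allclose(ratio, ratio[:, :1]))
```

Because the remaining error is one complex scalar per terminal, shared by all antennas, the relative phases and amplitudes across antennas match those of the true downlink channel.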
Note that the measurements of downlink and uplink have to be done within the channel coherence time in order for $h_{m\to 1} = h_{1\to m}$ to hold. Since base station antennas do not move, the channel coherence time is much larger than typical base station to terminal coherence times. However, as we show in Section 4.5, this calibration can easily be done well within even highly mobile timing constraints; our prototype completes a single antenna pair calibration within 300 μs.
3.4 Decentralized Beamforming
In order to achieve scalable real-time beamforming weight calculation, Argos employs a novel method that allows weights to be calculated locally at each antenna, and therefore avoid the unscalable data-transport overhead required by existing beamforming techniques. As discussed in Section 2.4.2, to perform traditional conjugate beamforming, the weights must be globally normalized so that no base station radio exceeds its maximum power output. For example, assuming a maximum radio transmit amplitude of 1, and in order to ensure at least one radio transmits at maximum power:

$$c = \left( \max_{m} \sum_{k=1}^{K} \|\hat{h}_{m\to k}\| \right)^{-1}, \quad m = 1, 2, \ldots, M \qquad (7)$$

where c is the scaling factor used in the beamforming weight calculation ($W = c \cdot H^*$). Global power scaling is characterized by using a single constant to scale all of the weights. This global scaling is necessary to maintain the ratio between each base station antenna’s weight for a given terminal, which ensures per-terminal transmission energy optimality, as proven in [15]. However, each base station antenna must know either c (or H) to properly scale its own beamforming weights. This requires full CSI to be transferred from each module to the central controller, nullifying the benefit from the aforementioned decentralization. To tackle this, we propose a local power scaling approach that closely approximates global normalization.
Figure 5: The implementation of Argos using WARP boards, a laptop, an ethernet switch, and an AD9523 based clock distribution board.

Argos leverages a key observation: in multi-user beamforming, the channels corresponding to different terminals are uncorrelated and experience independent fading. Therefore, statistically speaking, when the number of terminals is large, the actual transmission power at each antenna is very similar. Our solution simply normalizes the total transmission power locally at each base station antenna using only the CSI it measures:
$$c_m = \left( \sum_{k=1}^{K} \|\hat{h}_{m\to k}\| \right)^{-1}, \quad m = 1, 2, \ldots, M \qquad (8)$$

The conjugate beamforming weights are then scaled via:

$$W = H^* \cdot \mathrm{diag}(C) \qquad (9)$$

where C is the scaling vector given by $C_{local} = [c_1, c_2, \ldots, c_M]$, from Equation 8; notably the globally scaled conjugate can also be found in this form, using $C_{global} = [c, c, \ldots, c]$, from Equation 7.

We have experimentally verified the effectiveness of such local power scaling and observed that its performance is almost indistinguishable from the optimal global power scaling method (see Section 5), using equal transmit power for both methods. Moreover, in real deployments, since local power scaling ensures that each radio can utilize its full hardware power capacity, it can always achieve equal or greater SNR than global power scaling (since it can send with greater total transmit power), as proven in [22, p. 24]. Notably, if terminals are not approximately equidistant from the base station, then per-terminal power scaling is required to ensure fairness (preventing terminals closer to the base station from being allocated all of the transmission power), but this can be done at a much coarser time scale (i.e., seconds), thus not creating additional overhead or affecting performance.
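Equations 7 through 9 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Argos implementation: the channel matrix is drawn as i.i.d. Rayleigh fading (an assumption), H is stored as an M x K array, and the per-antenna amplitude is taken as the worst case for unit-amplitude symbols.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 64, 15
# H[m, k]: downlink CSI estimate from antenna m to terminal k (assumed i.i.d. Rayleigh).
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

row_sums = np.abs(H).sum(axis=1)  # sum over k of |h_{m->k}| for each antenna m

c_global = 1.0 / row_sums.max()   # Equation 7: one constant shared by all antennas
c_local = 1.0 / row_sums          # Equation 8: one constant per antenna, local CSI only

# Equation 9: W = H^* diag(C), giving a K x M weight matrix.
W_global = H.conj().T @ np.diag(np.full(M, c_global))
W_local = H.conj().T @ np.diag(c_local)

# Worst-case transmit amplitude per antenna, assuming unit-amplitude symbols.
amp_global = np.abs(W_global).sum(axis=0)
amp_local = np.abs(W_local).sum(axis=0)

print("global: max amplitude", amp_global.max(), "mean amplitude", amp_global.mean())
print("local:  max amplitude", amp_local.max(), "mean amplitude", amp_local.mean())
```

With global scaling only the antenna with the largest row sum reaches amplitude 1, whereas local scaling drives every antenna to amplitude 1 using nothing but its own row of H.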
4. IMPLEMENTATION
In this section we provide a detailed report of our implementation of Argos, which leverages WARP [4], commercially available clock distribution boards, a commodity PC, and an ethernet switch. Figure 5 shows an abstract representation of our implementation. As the first proof-of-concept prototype, our system includes a central controller, an Argos hub and 16 modules, each with 4 radios. The central controller consists of a single host PC, which uses MATLAB to send data, weights, and control commands to the radio modules. The Argos hub consists of a 24-port ethernet switch, a clock distribution board, and a WARP board, which uses its GPIO pins to provide transmission synchronization splitting/replication. Due to the limited availability of WARP boards, this board also serves as a radio module; however, these roles are functionally separate, and in future generations of the Argos prototype they will be physically separated as well. Each radio module is a single WARP board with 4 radio daughter cards and 4 antennas. Figure 6 depicts the real system: the base station includes 16 WARP boards with 64 antennas that are compactly placed on a custom rack-mount platform. We note that the number of terminals supported by each module is fundamentally limited by its hardware capabilities. In the WARP boards we are using, this bottleneck is the number of multipliers (328 on the Virtex 2 Pro xc2vp70) [26]. We are able to use 240 of these multipliers to provide linear precoding for 15 terminals on the 4 antennas, which requires 60 complex multipliers of 4 multipliers each. The remaining multipliers are used by other functions, and 4 are unusable due to routing constraints. However, the recently released Virtex 7 supports up to 3600 multipliers clocked at a rate of 741 MHz; with multiplexing this would enable 16,672 complex multiplies per 40 MHz sample (neglecting routing overhead and other functions that require multipliers), which would alleviate this bottleneck [25].
To the best of our knowledge, our Argos prototype is the first publicly reported many-antenna MUBF system with real-world feasibility. We next elaborate on our implementation.
4.1 Hardware and Software Platform
WARP is a scalable and programmable wireless platform for prototyping advanced wireless systems. Each WARP board allows up to four radio daughter cards to be connected and therefore can contribute up to four active antennas simultaneously to Argos. Each radio board includes a Maxim 2829 transceiver chip [18], which operates at the 2.4 or 5 GHz ISM bands with a 20 MHz bandwidth. WARP conveniently provides a MATLAB-based framework, WARPLab, which allows MATLAB to control the WARP boards and process the transmit and receive data samples. As shown in Figure 5, WARPLab consists of four layers: (i) The underlying Simulink model that implements the custom hardware for controlling the FPGA board and radio boards; (ii) The Xilinx Platform Studio (XPS) project that integrates and connects all of the hardware components, including the Simulink model, the I/O cores for the serial port, Ethernet port, clocking, etc.; (iii) The C code that runs on the PowerPC microprocessor, controls the hardware through memory mapped I/O, and acts as an interface to the Ethernet port; (iv) The MATLAB interface that configures the boards, generates the transmit samples, and processes the receive samples.

We have extensively customized the WARPLab framework to enable hardware MUBF, transmission synchronization, clock synchronization, and indirect calibration among base station antennas. These functionalities are essential for Argos to enable MUBF with many antennas.
4.2 Hardware MUBF
A straightforward and much easier approach to realizing MUBF in WARPLab is to implement it in software within the MATLAB interface; this, in fact, was our first implementation. In this approach the beamformed baseband signal can be directly delivered to the WARP boards without the need for linear precoding in hardware. However, this method introduces major latency between the CSI collection and data transmission, which increases linearly with the number of base station antennas, and severely degrades performance. This is a result of the same scaling problem discussed in Section 3.1. Therefore, we modified the WARPLab hardware to enable hardware MUBF.
Figure 6: The prototype of Argos with 16 modules and 64 antennas. Left: front side. Right: back side.

At each base station antenna, applying the beamforming weights consists of multiplying the baseband symbol intended for each terminal, $s_k$, by its corresponding beamforming weight, $w_k$, and then adding them together: $s'_m = \sum_{k=1}^{K} w_k \cdot s_k$, where $s'_m$ is the resultant beamformed signal transmitted by antenna m. Multiplying the signal by a complex number is equivalent to rotating the phase and scaling the amplitude. In hardware, this requires K registers and K parallel complex multipliers (each complex multiplier needs 4 multipliers and 2 adders) in series with 2 K-input adders. We store the beamforming weights, $w_k$ (k = 1, 2, ..., K), in memory mapped registers. This is important since it enables the PowerPC, and in turn, the MATLAB interface to directly control them.
4.3 Transmission Synchronization
WARPLab has a default function to enable transmission synchronization between multiple WARP boards. It is achieved by using the built-in API command “sendsync()” in the MATLAB interface. However, due to the jitter introduced by the ethernet stack, switch, and cables, such synchronization can lead to a timing offset on the order of 20 samples, depending on the ethernet switch and cable lengths, which makes accurate CSI collection and beamforming impossible.

To address this challenge, we employ a WARP board to distribute the central controller’s transmission synchronization signal. As part of the Argos hub, this WARP node leverages directly connected, registered GPIO to reliably send the sync pulse to the radio modules. Notably, to ensure the modules receive the pulse within 1 clock cycle, the cable lengths should all be within one wavelength, λ. With a channel bandwidth of 20 MHz, λ is 7.5 meters (40 MHz sampling clock), which is a very easy constraint to meet. As stated above, this WARP node serves the dual role of sync distribution and module, thus it “distributes” the sync to itself with an effective cable length of 0. This means the other cables must be less than 7.5 meters, which is not a problem; in our current setup the length is 2 meters. While each board may have a slightly different clock phase, this phase offset is constant (due to the clock synchronization), and explicitly compensated for by the beamforming algorithm.

We have modified the Simulink model, the XPS project, and the C code to enable GPIO-based transmission synchronization. Specifically, we inserted appropriate gateways and registers into the Simulink model, re-mapped the GPIO pins to the appropriate signals in the XPS project, and disabled the traditional ethernet sync in the C code.
4.4 Clock Synchronization
Precise inter-board clock synchronization is critical for Argos, due to its distributed modular architecture. The WARP board requires two reference clocks: a 20 MHz RF clock and a 40 MHz logic/sampling clock. Both clocks can be either forwarded or driven by an external source. In addition, we discovered that the Maxim 2829 transceiver chip on the radio board can use a 40 MHz clock. Therefore, we were able to use a single external source to drive the logic clock, then forward the logic clock to the reference input for the RF clock. This way, inter-board clock synchronization can be achieved in an easily manageable and scalable way.

We leverage a commercial clock distribution evaluation board designed for LTE, the AD9523/PCBZ, to accomplish this. The AD9523 provides 18 clock outputs, which we leverage to drive all of the radio modules. Although we haven’t exceeded the capacity of the AD9523, an additional clock distribution board could be connected (as part of an additional Argos hub), which would provide 17 more outputs. Alternatively, the existing modules can forward their clocks to additional modules, through Argos’ multi-hop extension.
4.5 Indirect Calibration
For indirect calibration, we need to estimate $b_{m\to 1} = \frac{t_m \cdot r_1}{r_m \cdot t_1}$ for each antenna m with respect to the “reference antenna,” as described in Section 3.3. Due to buffer constraints, we implement this in a per-module iterative fashion. First, the module containing the reference antenna calibrates internally; that is, the reference antenna sends a pilot while other antennas on the module listen, then each of those antennas sends a pilot, in turn, while the reference antenna listens. These channel estimates are then reported to the central controller so that the reference antenna’s buffer can be overwritten. Next, the reference antenna sends a pilot sequence while all the antennas on another module listen, then each of those antennas transmits a pilot, in turn, while the reference antenna listens. Again, the channel estimates are reported to the central controller. The process is then repeated for each module. The calibration procedure is very latency sensitive, as the physical channel should not change between transmission and reception of pilots for any antenna pair. To address this, we implement the calibration locally on the PowerPC in C code and leverage Argos’ transmission synchronization to coordinate the send and receive phases. The resulting calibration happens within 300 μs for each antenna pair, which is well within the channel coherence time.
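The per-module iteration can be written down as an explicit pilot schedule. The following sketch only enumerates who transmits and who listens; it does not model buffers, reporting to the controller, or the PowerPC code. Antennas are identified by hypothetical (module, antenna) tuples, with (0, 0) standing in for the reference antenna, and the total-time figure is a rough upper bound that assumes the 300 μs per-pair figure applies to each pair sequentially.

```python
def calibration_schedule(num_modules=16, antennas_per_module=4, reference=(0, 0)):
    """List (module, transmitter, listeners) steps for one full calibration pass."""
    steps = []
    for module in range(num_modules):
        antennas = [(module, a) for a in range(antennas_per_module)]
        others = [ant for ant in antennas if ant != reference]
        # The reference antenna sends a pilot while this module's antennas listen.
        steps.append((module, reference, others))
        # Each of those antennas then sends a pilot in turn; the reference listens.
        for ant in others:
            steps.append((module, ant, [reference]))
    return steps

steps = calibration_schedule()
# Every non-reference antenna forms one calibrated pair with the reference.
pairs = sum(1 for _, tx, _ in steps if tx != (0, 0))
upper_bound_ms = pairs * 300e-6 * 1e3

print(len(steps), pairs, upper_bound_ms)
```

For the 64-antenna prototype this enumerates 63 antenna pairs, so under the stated assumption a full pass would stay below roughly 19 ms of pilot exchange time.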
Figure 7: Environments and the locations of the base station and terminals for the reported experiments. Note that the base station leverages directional antennas in order to serve one sector. Terminals have vertical separation as well, spanning up to three floors.
Another challenge we encountered while performing our indirect calibration approach is the significant amplitude variation for the channels between the reference antenna 1 and other antennas. This is due to the grid-like configuration of our antenna array, where different pairs of antennas can have very different antenna spacings. According to our measurement, the SNR difference can be as high as 40 dB, leading to a dilemma in choosing the transmission power for the reference signal. To address this, we isolate the reference antenna from the others, and place it in a position so that its horizontal distances to the other antennas are approximately identical. Such placement of the reference antenna does not affect the calibration performance due to our calibration procedure’s isolation of the radio hardware channel from the physical channel.
5. EVALUATION
Leveraging our prototype, we experimentally evaluate the feasibility of Argos in realistic environments. Our main observation is that, compared to using a single antenna, Argos can improve spectral capacity over 12 fold leveraging MUBF with many antennas, using equal total transmission power. With 64 antennas and 15 terminals, the spectral capacity can be boosted from 12.7 bps/Hz to 85 bps/Hz (6.7x) for zero-forcing MUBF, and 38 bps/Hz (3x) for conjugate MUBF, while using a mere 1/64th of the total transmission power. We find that Argos easily scales from 1 to 64 base station antennas serving 1 to 15 terminals, and that, in general, performance scales linearly with M and K. Finally, we experimentally validate the performance of our localized conjugate beamforming method, as well as our internal calibration procedure.
5.1 Experimental Setup
We employ all 64 antennas at the base station to perform MUBF to 15 concurrent terminals. We use the 2.4 GHz band with a 625 kHz carrier width to avoid frequency fading effects. Since it is relatively easy to move our platform (see Figure 6), we tested various indoor locations (see Figure 7) in order to collect data from diverse environments. There are both LOS and NLOS channels between the base station and terminals. We repeat our experiments multiple times, typically collecting over 3000 measurements at each location, to reliably average out performance.

Figure 8: Cell capacity as the number of base station antennas (M) increases from 16 to 64, by 4, serving 15 terminals. In order to compensate for the beamforming gain, total transmission power is 1/M, implying average power per-antenna for multi-antenna schemes is $1/M^2$. [Plot: total capacity (bps/Hz) versus number of base station antennas; series: Zero-forcing, Conjugate, Local Conj., SUBF, Single Ant.]
To obtain the cell capacity, we aggregate the Shannon capacity for each terminal, or $C_{Cell} = \sum_{k=1}^{K} \log(1 + \mathrm{SINR}_k)$, where $\mathrm{SINR}_k$ is the measured SINR at terminal k. We let the base station transmit dummy QPSK-modulated frames to the terminals, which is sufficient to validate the real-world feasibility of Argos since MUBF is a physical layer technique that is orthogonal to the MAC layer and above.
To accurately measure the terminal SINR, we use the RSSI indicator from the Maxim 2829 transceiver on the radio board to report the received signal strength for each transmission, as well as the noise floor after the transmission completes. Since the radio is unable to distinguish signal and interference strength, we slightly stagger the transmission to the intended terminal and that to the unintended terminals. This way we can separately measure the signal power and interference power, and acquire the SINR accordingly. To make sure the channel remains constant during the transmissions we conduct our experiments in an ultra-stable environment, i.e., late at night, without moving people and wireless traffic.
5.2 Improvement of Cell Capacity
The primary purpose of our experiments is to determine the capacity improvement of Argos, in order to ultimately assess the practicality of the many-antenna MUBF base station proposal from the theory community. We report two sets of experiments, which evaluate the scalability with regard to the number of base station antennas, M, and the number of terminals, K, respectively.
5.2.1 Scaling up with M
In the first set of experiments, we vary the number of base station antennas, M, assuming a fixed number of terminals, K = 15. Figure 8 shows $C_{Cell}$ as a function of M for a base station with a single antenna (Single Ant.), SUBF (SUBF), conjugate MUBF (Conjugate), our modified localized conjugate MUBF (Local Conj.), and zero-forcing MUBF (Zero-forcing). In order to compensate for the beamforming gain, total transmission power is scaled by a factor of 1/M for all five cases. This enables our experiments to separate the orthogonality gain of scaling up from the well-known, predictable, beamforming gain. We have the following key observations:

First, when M is much larger than K, both conjugate and zero-forcing MUBF increase the cell capacity as M scales up, despite reducing the total transmission power in inverse proportion to M. The beamforming gain from the additional antennas compensates for the power reduction, as demonstrated by the flat performance of SUBF, while simultaneously increasing the natural orthogonality of the terminals. This reduces the inter-terminal interference of conjugate MUBF, and reduces the amount of power wasted to create nulls for zero-forcing MUBF. With M = 64 the improvements for conjugate and zero-forcing MUBF over a single antenna are 5.7x and 12.7x for equal power, or 3x and 6.7x for 1/64 power, respectively.

Second, as M drops to K, i.e., M ≈ K = 15, the performance of zero-forcing drops steeply. This is due to the tightness of the degrees of freedom at the base station; zero-forcing inevitably wastes the majority of transmission power for interference cancelation, leading to a much reduced signal power at the intended terminals. Later, we will show that when M = K this inefficiency can even result in conjugate MUBF out-performing zero-forcing.

Figure 9: Cell capacity as the number of terminals K increases. Total transmission power is held constant within each plot. Left: M = 64; Middle: M = 15; Right: M = 15 with reduced transmission power. [Plots: total capacity (bps/Hz) versus number of terminals; series: Zero-forcing, Conjugate, Local Conj.]
5.2.2 Scaling up with K
We next fix M and vary the number of terminals to see how capacity scales with K. In the experiments reported by Figure 9 Left and Middle, the total transmission power is scaled by 1/M, similar to that in Figure 8, and is held constant regardless of K. Because the total power is split among the terminals, the power per terminal is therefore scaled by 1/K. In the experiment shown by Figure 9 Right, we reduce the transmission power to the minimum WARP setting in order to demonstrate how the capacity of the three forms of MUBF is affected by power. We have the following observations:

First, when M ≫ K, as shown in Figure 9 Left, capacity increases approximately linearly with the number of terminals for both conjugate and zero-forcing MUBF; this is attributable to the multiplexing gains from simultaneously serving K terminals.

Second, conjugate beamforming initially loses capacity as the number of terminals increases from 1 (SUBF) to 2 due to the addition of interference from the other terminal, and thus the overwhelming drop in SINR. This loss, however, is quickly compensated for by the multiplexing gains.

Third, as K approaches M, the performance of zero-forcing drops sharply as shown in Figure 9 Middle. This corroborates the second observation made in Section 5.2.1. Additionally, the performance of conjugate flattens, and even starts to decline, as the additional interference from more terminals causes the average SINR to approach 0 dB.

Finally, when the transmission power is reduced, conjugate MUBF performs relatively better than zero-forcing, as shown in Figure 9 Right. This is because the performance of conjugate is inherently limited by interference from other terminals, while the performance of zero-forcing is instead limited by noise, since the interference is explicitly canceled. It is not until the transmission power is reduced to a point where interference has the same magnitude as noise that there is a significant effect on the capacity improvement for conjugate.
5.3 Near-optimality of Localized Conjugate
In order to verify the viability of our localized method for conjugate MUBF, we implement it in Argos and compare it to standard conjugate MUBF with global power scaling. As shown in Figure 10, our local power scaling method (Local Conj.) results in a signal power within 1.2 dB of global power scaling (Conjugate), and quickly approaches equivalent power as the number of terminals increases. For a fair comparison we ensure that both methods send with the same transmission power; however, in a practical deployment our method will always transmit equal or more power. While local power scaling is less efficient for a given transmission power, it ensures that each base station radio is being fully utilized, thus more intelligently adapting to the constraints of real-world hardware. Furthermore, we see in Figures 8 and 9 that the performance of global power scaling (Conjugate) and local power scaling (Local Conj.) is almost indistinguishable.
5.4 Stability of Indirect Calibration
As described in the previous section, we implemented a novel reciprocal calibration method to enable implicit beamforming and efficient TDD operation. Figure 11 shows that this calibration deviates from the mean angle by an average of less than 2.6% (maximum 6.7%), and from the mean amplitude by less than 0.7% (maximum 1.4%), over a period of 4 hours. Notably, these measurements were taken during the day with normal movement around the base station, indicating the calibration procedure is stable in real-world environments. Angle deviation is calculated as the difference in angle from the average angle, divided by π, i.e., 2.6% error is equivalent to 0.08 radians. This indicates that our internal calibration scheme can be performed very infrequently, i.e., once a day, and thus has negligible performance overhead.
Figure 10: Relative signal power between conjugate and our conjugate with local power scaling, sent at the same transmit power. Local power scaling performs within 1.5 dB of global power scaling, and the difference quickly converges to 0 dB as K increases. [Plot: SNR difference (dB) versus number of terminals.]
6. RELATED WORK
Argos is directly motivated by multi-user beamforming (MUBF) theory. Recent theoretical works have demonstrated the exciting benefits of large-scale MUBF, or its general form, massive MIMO [12, 13, 20]. In [17], the realization of conjugate beamforming with an infinite number of antennas in a TDD multi-cell system is discussed. Leveraging conjugate beamforming for a distributed architecture is proposed in [20]; however, our paper explicitly addresses power normalization, and shows that, in a statistical sense, each base station antenna can perform normalization independently of the others. Single-cell analyses of the performance of both conjugate and zero-forcing beamforming for finite numbers of antennas are obtained in [19]. However, these works are theoretical in nature, and do not address the system and implementation issues such as decentralized power scaling and TDD calibration. Argos is motivated by and built on top of this prior work, and complementarily addresses the architectural and system challenges to realize a many-antenna base station in the real world.

Endeavors to push MUBF into practice have been observed recently. State-of-the-art wireless standards in cellular networks and WLAN, such as LTE, WiMAX, and 802.11, have all considered incorporating downlink MUBF in their emerging releases, e.g., 3GPP release 9 [1] and 802.11ac [2]. However, MUBF in these standards is restricted to a much smaller scale, e.g., up to eight antennas in 802.11ac. The most recent research efforts towards practical MUBF [5, 14] are limited to a small number of antennas. Argos, comparatively, is the most ambitious MUBF prototype, featuring a much greater number of antennas. As a result, our contribution differs from existing works in that we have identified and addressed a set of unique challenges regarding practicality and scalability. In addition, Argos is programmable and therefore can be used for any large-scale MU-MIMO technique, although we have focused on MUBF in this paper.

There are orthogonal techniques to improve the spatial reuse of spectrum such as sectorization and small cells. First, sectorization uses multiple antennas to form directional beams, each of which covers a range of directions as a sector. Terminals in different sectors can be served simultaneously. Therefore, sectorization can be treated as a special case of MUBF to physically separate terminals. Second, small cell techniques, such as femto-cells and pico-cells, deploy base stations more compactly, with limited coverage to improve spatial reuse. In each sector or small cell, one can further apply large-scale MUBF to achieve even more spatial reuse and better energy efficiency.

Figure 11: Our calibration procedure exhibits an average instantaneous noise of less than 7% and remains stable indefinitely. [Plot: deviation (%) versus time (minutes); series: Angle, Magnitude.]
Argos improves spatial reuse efficiency by employing a large number of antennas on base stations. One can also add antennas at the terminal to further improve spectral efficiency and reduce inter-cell interference. The authors in [21] reported a system with multiple directional antennas on mobile terminals where only one antenna is active at a time to realize uplink directionality. The authors in [27] studied the feasibility of SUBF on terminals, and demonstrated its power efficiency and capacity benefit. Argos is completely complementary to these terminal-based solutions and provides orthogonal benefits.

A few prior works have offered solutions for efficient channel calibration in TDD systems, e.g., [6, 10, 11, 23, 24]. All prior solutions require terminal involvement and feedback in the calibration process, an unacceptable overhead in a large-scale MUBF system. In contrast, the relative calibration in Argos is done internally at the base station without such overhead.
7. DISCUSSION
This paper presents a scalable architecture for many-antenna base stations, a real-world implementation of this architecture, as well as compelling early experimental results. These results motivate further research in the area of many-antenna systems, and raise many practical challenges to be surmounted.

First, the size, cost, and power consumption of a many-antenna base station are significant impediments to adoption. However, we believe that advances in manufacturing combined with the use of specialized hardware in both the analog and digital domains can overcome these barriers.

While Argos supports multiple precoding techniques, an obvious question this work raises is: which precoding technique is optimal under a given scenario? This is a deceptively hard question to answer, as there are many variables to optimize, such as power, fairness, and spectral efficiency, as well as many factors that impact this optimality, such as transmission power, propagation environment, terminal mobility, hardware capabilities, number of terminals, and number of base station antennas. Moreover, we conjecture that it will be advantageous to dynamically select the precoding technique, as many of these factors continuously change.

Finally, this many-antenna architecture also presents numerous network level challenges, including terminal paging, optimal grouping and scheduling of terminals, and handover between cells. This architecture also raises many opportunities for improving total network capacity; for example, as predicted by [17], the reduced transmit power will significantly reduce inter-cell interference. Addressing these challenges and opportunities is the subject of our future work.
8. CONCLUDING REMARKS
We present the design, realization, and evaluation of Argos, a base station architecture that can potentially employ thousands of antennas to serve tens of terminals simultaneously through MUBF. In order to enable this unprecedented scaling in a practical environment, Argos employs a hierarchical modular design that facilitates flexible, scalable deployments while simultaneously constraining latency and providing fault tolerance. It also features a novel beamforming algorithm that is completely decentralized and a new calibration method that allows CSI to be collected in constant time with regard to the number of base station antennas.

Our experimental characterization of an Argos prototype with 64 antennas clearly shows the practical benefits of MUBF base stations with many antennas, improving spectral and energy efficiency manyfold simultaneously. Our results are the first publicly reported evidence that many-antenna MIMO systems can produce significant benefits under real-world settings. The scale of our experiments is only limited by the number of Argos modules (WARP boards) currently available to us. The architecture of Argos, however, can easily accommodate many times more modules, each with more radios, potentially allowing thousands of antennas to serve tens of terminals through MUBF.
Acknowledgements
This work was supported in part by NSF grants CRI 0751173, MRI
0923479, NetSE 101283, MRI 1126478, and CNS 1018292. Clayton Shepard
was supported by an NDSEG fellowship and grant CNS 1018292. We would
like to thank Ashutosh Sabharwal, Edward Knightly, Patrick Murphy,
Reinaldo Valenzuela, Cuong Tran, our reviewers, and our shepherd,
Sunghyu Choi, for their input and support.
References
[1] 3GPP Release 9. www.3gpp.org/Release-9.
[2] IEEE 802.11ac. mentor.ieee.org.
[3] InfiniBand. www.infinibandta.org.
[4] Rice University Wireless Open Access Research Platform.
warp.rice.edu.
[5] E. Aryafar, N. Anand, T. Salonidis, and E. Knightly. Design
and experimental evaluation of multi-user beamforming in Wireless
LANs. In Proc. ACM MobiCom, Chicago, Illinois, Sept. 2010.
[6] A. Bourdoux, B. Come, and N. Khaled. Non-reciprocal
transceivers in OFDM/SDMA systems: impact and mitigation. In Proc.
IEEE RAWCON, 2003.
[7] Cisco Inc. Cisco visual networking index: Global mobile data
traffic forecast update, 2011-2016.
cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html.
[8] C. Dick, F. Harris, M. Pajic, and D. Vuletic. Implementing a
real-time beamformer on an FPGA platform. Xcell J., 2007.
[9] D. Gesbert, M. Kountouris, R.W. Heath, C. Chae, and T.
Salzer. Shifting the MIMO paradigm. IEEE Signal Processing Magazine,
2007.
[10] M. Guillaud, D.T.M. Slock, and R. Knopp. A practical method
for wireless channel reciprocity exploitation through relative
calibration. In Proc. IEEE ISSPA, 2005.
[11] Y. Hara, Y. Yano, and H. Kubo. Antenna array calibration
using frequency selection in OFDMA/TDD systems. In Proc. IEEE
GLOBECOM, 2008.
[12] J. Hoydis, S. ten Brink, and M. Debbah. Massive MIMO: How
many antennas do we need? arXiv:1107.1709v2 [cs.IT], 2011.
[13] H. Huh, G. Caire, H.C. Papadopoulos, and S.A. Ramprashad.
Achieving large spectral efficiency with TDD and not-so-many
base-station antennas. IEEE-APS Topical Conf. on APWC, 2011.
[14] J. Koppenborg, H. Halbauer, S. Saur, and C. Hoek. 3D
beamforming trials with an active antenna array. In Int. Workshop on
Smart Antennas, 2012.
[15] T.K.Y. Lo. Maximum ratio transmission. IEEE Trans. on
Communications, 1999.
[16] T.L. Marzetta. How much training is required for multiuser
MIMO? In Proc. IEEE ACSSC, 2006.
[17] T.L. Marzetta. Noncooperative cellular wireless with
unlimited numbers of base station antennas. IEEE Trans. on
Wireless Communications, 2010.
[18] Maxim. Single-/dual-band 802.11a/b/g world-band transceiver
ICs. datasheets.maxim-ic.com/en/ds/MAX2828-MAX2829.pdf.
[19] H. Ngo, E. Larsson, and T.L. Marzetta. Energy and spectral
efficiency of very large multiuser MIMO systems. IEEE Trans. on
Communications, 2011.
[20] F. Rusek, D. Persson, B. Lau, E. Larsson, T.L. Marzetta, O.
Edfors, and F. Tufvesson. Scaling up MIMO: Opportunities and
challenges with very large arrays. arXiv:1201.3210v1 [cs.IT],
2011.
[21] A. Amiri Sani, L. Zhong, and A. Sabharwal. Directional
antenna diversity for mobile devices: Characterizations and
solutions. In Proc. ACM MobiCom, 2010.
[22] C. Shepard. Argos: Practical base stations with large-scale
multi-user beamforming. Master's thesis, Rice University, April
2012. Available at: clay.rice.edu/pubs/MasterThesis.pdf.
[23] J. Shi, Q. Luo, and M. You. An efficient method for
enhancing TDD over the air reciprocity calibration. In Proc. IEEE
WCNC, 2011.
[24] D. Tse and P. Viswanath. Fundamentals of Wireless
Communication. Cambridge University Press, 2005.
[25] Xilinx Inc. 7 Series FPGAs Overview.
xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf.
[26] Xilinx Inc. Virtex-II Pro and Virtex-II Pro X Platform
FPGAs: Introduction and Overview.
xilinx.com/support/documentation/data_sheets/ds083.pdf.
[27] H. Yu, L. Zhong, A. Sabharwal, and D. Kao. Beamforming on
mobile devices: A first study. In Proc. ACM MobiCom, 2011.