
2788 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 12, DECEMBER 2017

MuVi: Multiview Video Aware Transmission Over MIMO Wireless Systems

Zhe Chen, Student Member, IEEE, Xu Zhang, Yuedong Xu, Jie Xiong, Yu Zhu, and Xin Wang, Member, IEEE

Abstract—Multiview video is essential for various mobile three-dimensional (3D) and immersive applications that capture scenes from multiple angles for a better user experience. However, robust transmission of multiview video is very challenging in wireless networks due to the high bandwidth requirement and time-varying channel quality. Though the up-to-date 802.11 system enables spatial-multiplexing MIMO to enhance transmission capacity, it remains agnostic to the 3D source coding structure during transmission. In this paper, we study the optimal resource allocation problem in MIMO systems that deliver 3D content with multiview video coding. The basic idea is to exploit the channel diversity of multiple antennas and the source coding characteristics so as to achieve unequal error protection against channel errors. To this end, we develop a nonlinear mixed integer programming framework to perform antenna selection and power allocation, and propose low-complexity algorithms to assign these resources. We implement a proof-of-concept system, namely MuVi, on the software-defined-radio platform WARP to evaluate the proposed algorithms. MuVi is a practical system that tackles 3D multiview streaming in the latest Wi-Fi networks, such as IEEE 802.11ac, under realistic channel conditions. Extensive experimental results demonstrate that the peak signal-to-noise ratio of MuVi significantly outperforms that of the conventional power allocation scheme in a variety of indoor environments.

Index Terms—Antenna selection, distortion, MIMO, multiview video streaming, power allocation.

I. INTRODUCTION

THREE-dimensional (3D) multi-view video is a novel paradigm of multimedia service that offers a considerably high quality of experience to viewers.1 Different viewing angles of the same 3D scene, captured simultaneously by a set of closely placed cameras, enable viewers to enjoy a stereo perception. 3D multi-view video has gained increasing popularity in a variety of applications such as 3D television, free-viewpoint television, remote medical surgery, security surveillance, and virtual reality.2,3 Especially, with the rapid development of mobile enabling technologies, 3D multi-view video has become a trend in mobile robots, unmanned aerial vehicles (UAVs), and head-mounted displays that prefer an immersive first-person view.4 In these new applications, multi-view videos are usually transmitted over unlicensed wireless spectrum, in which off-the-shelf 802.11 devices are naturally adopted.

Manuscript received July 1, 2016; revised December 1, 2016 and April 5, 2017; accepted May 9, 2017. Date of publication June 8, 2017; date of current version November 15, 2017. This work was supported in part by the Natural Science Foundation of China under Grant 61402114 and Grant 61271223, in part by the 863 Program of China under Grant 2015AA016106, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20140404. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Christian Timmerer. (Corresponding authors: Yuedong Xu; Xin Wang.)

Z. Chen, X. Zhang, and X. Wang are with the School of Computer Science, Fudan University, Shanghai 200433, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Xu and Y. Zhu are with the School of Information Science and Technology, Fudan University, Shanghai 200433, China (e-mail: [email protected]; [email protected]).

J. Xiong is with the School of Information Systems, Singapore Management University, Singapore 188065 (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2017.2713414

1 "3D display market analysis, market size, application analysis, regional outlook, competitive strategies, and forecasts," 2015. [Online]. Available: http://www.grandviewresearch.com/industry-analysis/3d-display-market.

Delivering 3D multi-view video is a very challenging task in today's wireless networks. In contrast to single-view video, multi-view video needs to transmit a potentially much larger traffic volume generated by multiple cameras. On the other hand, the multi-view video streaming service has stringent requirements of low latency and high video quality. This is usually in conflict with the limited wireless bandwidth and time-varying channel conditions of the widely deployed 802.11b/g systems. One of the most significant recent features is the multiple-input multiple-output (MIMO) scheme, which employs precoding and spatial multiplexing for simultaneous transmissions from multiple antennas. Equipped with MIMO technology, an 802.11ac-like Wi-Fi system can potentially achieve a much higher throughput. The MIMO Wi-Fi system has attracted enormous interest from both industry and the research community. For instance, the seminal papers [1], [2] investigate the design of resource allocation schemes for throughput maximization in MIMO Wi-Fi networks. Different from this literature, our goal is not to enhance the throughput of MIMO Wi-Fi, but to achieve content-aware transmission for an emerging type of multi-view video streaming service.

Intuitively, multi-view video streaming over the up-to-date 802.11ac MIMO system is expected to achieve satisfactory quality of user perception. However, if all the frames are treated equally in the multi-view video transmission, the streaming quality will be impaired in a resource-limited wireless network. To design a 3D multi-view content-aware streaming system, one may encounter three difficulties.

2 "Microsoft-HoloLens," 2017. [Online]. Available: https://www.microsoft.com/microsoft-hololens.

3 "Sony Morpheus," 2015. [Online]. Available: http://blog.us.playstation.com/2015/03/03/project-morpheus-ps4-vr-upgraded-coming-in-2016/.

4 "Parrot Drone," 2017. [Online]. Available: https://www.parrot.com/us/drones/parrot-ardrone-20-elite-edition.

1520-9210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


CHEN et al.: MUVI: MULTIVIEW VIDEO AWARE TRANSMISSION OVER MIMO WIRELESS SYSTEMS 2789

Fig. 1. Coding and decoding dependencies. (a) Directed acyclic dependence in H.264/AVC extension (MVC) videos [4]. (b) Typical dependencies between IBBPBBPBBP 2D video frames.

1) Multi-view video has a much more complicated source coding structure than 2D single-view video. Fig. 1(a) illustrates a representative 3D video coding scheme, namely multi-view video coding (MVC), with left and right views. The upper one is the basic view, and the lower one is the auxiliary view. In MVC, video streams are encoded into a string of groups of pictures (GoPs) that contain one I frame, one or more P frames, and multiple B frames. The arrows indicate the dependency of these frames in the decoding procedure. Unlike traditional 2D video coding [Fig. 1(b)] with only sequential dependence among I, B, and P frames, 3D MVC has a non-sequential, directed acyclic graph (DAG) dependence. The two views are correlated such that the transmission of frames should take into account the importance of different video frames, instead of squeezing them from left to right and from the basic view to the auxiliary view. In the general case with more than two views, analyzing the interaction among different views becomes more complicated. In addition, the video frames are assembled into a number of data packets passed down to the wireless interface for transmission, which is different from the assumption of an entire frame as a packet with a fixed loss rate in [3]. Hence, in our context, a packet loss degrades the image quality without damaging the whole frame. These two factors together make MVC streaming dramatically different from conventional 2D video streaming.

2) It is essential to allocate MIMO antennas and transmission power jointly for MVC streaming. The channel gains are time-varying, and are heterogeneous across different antenna-viewer pairs. By selecting the appropriate antennas for transmission, a wireless user may achieve a remarkable throughput gain, while considering the fundamental heterogeneity of video packets. An interesting question is whether we need to allocate "good" antennas to transmit important packets, and how the total transmission power should be assigned to different antennas.

3) Evaluating an MVC-aware transmission strategy over a realistic MIMO system is nontrivial. A numerical simulation can exhibit the performance gain of context-aware MVC streaming, but the result is not convincing enough. The channel condition in a real indoor environment is much more complicated than that described by any statistical model. Furthermore, the damage of channel errors to MVC streaming may deviate from its average distortion model. A more practical MVC streaming prototype is able to bridge the gap between theory and practice; however, it requires compound cross-layer design rationales.

In this paper, we study a novel content-aware transmission strategy for 3D MVC streaming in a MIMO wireless system. The commonly adopted peak signal-to-noise ratio (PSNR) is used to measure the distortion of reconstructed video frames after transmission. In the MIMO system, a video packet is not transmitted as a whole, but as a string of bits. Hence, bit errors may result in the loss of macro-blocks inside a video frame, rather than the loss of the whole video frame. As the first step, we present comprehensive measurements to quantify the average PSNR of MVC video frames, which paves the way for optimizing the transmission. We subsequently exploit the diversity gain of unequal error protection (UEP), which aims to offer more resources to guarantee robust transmission of important packets against channel errors. A mixed-integer programming framework is formulated to minimize the average distortion of 3D MVC video frames. We propose a low-complexity joint antenna selection and power allocation algorithm to improve the performance. The basic idea is to allocate packets with different MVC distortion to the heterogeneous antennas for unequal error protection. Finally, we design and implement a proof-of-concept system, named MuVi, to evaluate the performance of the proposed resource allocation algorithms. For a practical implementation, MuVi needs to be compatible with the state-of-the-art 802.11 PHY/MAC layer protocols. Thanks to software-defined-radio (SDR) technology, we built MuVi on top of an 802.11 MIMO-OFDM module without any modification to the hardware. Extensive experiments demonstrate that MuVi outperforms the conventional resource allocation scheme with a PSNR improvement of 3 dB to 5 dB for MVC video transmission.

MuVi integrates the above-mentioned features of MVC with the MIMO spatial multiplexing technique. Our main contributions are as follows:

1) We formulate a nonlinear mixed integer programming problem to perform antenna selection and power allocation for multi-view video aware transmission.

2) We present two-stage antenna selection and power allocation algorithms to achieve unequal error protection of data units in MVC video frames.

3) We implement a proof-of-concept prototype, namely MuVi, on the software-defined radio platform to demonstrate its effectiveness. MuVi is compatible with the 802.11ac specifications at the MAC and physical layers.

The remainder of the paper is organized as follows. In Section II, we briefly describe MVC coding and error propagation. In Section III, we present the mathematical model for unequal error protection of multi-view video streaming. Section IV presents the antenna selection and power allocation algorithms. Section V describes the design and implementation of our MuVi system. The performance of MuVi is evaluated in Section VI. We survey the related work in Section VII, and conclude this paper in Section VIII.

II. BACKGROUND

In this section, we describe the basic properties of MVC video coding, and the error propagation in MVC video frames.

A. MVC Video Streaming Structure Overview

The content features from N views are captured by their respective cameras. These content features are subsequently compressed and packetized into data units for transmission. For example, in Fig. 1(a) the two video sequences are encoded by the MVC extension of the H.264 standard [4], and encapsulated in a single bitstream. The frame dependencies within a GoP are also shown in Fig. 1(a). Different from 2D video, the MVC dependencies include intra- and inter-view references, and are associated with different cameras. Consequently, the decoding order of an MVC stream forms a directed acyclic graph.

To enable fine-grained protection of video frames in the transmission, one needs to extract the bitstreams of different views for resource allocation. As described in [4], there are different types of Network Abstraction Layer Units (NALUs) that contain several configuration descriptions of a coded H.264/MPEG-4 AVC video. Hence, MVC can utilize such NALUs to provide backward compatibility. Generally speaking, in order to be compatible with a conventional single-view 2D video decoder, the base-view bitstream is coded independently, and encapsulated in conventional 2D video NALUs. The coded video data of the additional views is then encapsulated in an extension NALU type. In this way, legacy 2D receivers can easily distinguish the NALU types, and bypass the additional view bitstreams using their conventional decoder. Meanwhile, decoders of 3D receivers can recognize the new NALU types belonging to the additional view bitstreams. Thus, the additional views are combined with the base view to complete 3D video decoding.

Moreover, the sequence parameter set NALU indicates three important pieces of information that do not appear in 2D video: view identification, view dependency information, and the level index for operation points. The view identification describes the number of views and a listing of view identifiers. The view dependency information includes several signals that indicate the number of inter-view reference pictures used in the prediction process, and the views that are utilized for predicting a particular view. Using this information, we can parse packet headers to obtain the properties needed to carry out MuVi.
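The backward-compatibility mechanism above can be sketched in a few lines. This is an illustrative sketch, not MuVi's actual parser: the 5-bit `nal_unit_type` values (1/5 for base-view slices; 14, 15, 20 for the MVC extension types) follow the H.264 specification, while the function names and the classification policy are our own.

```python
# Hypothetical sketch: classify H.264/MVC NAL units by their 5-bit type so that
# base-view and auxiliary-view packets can be scheduled differently.

BASE_VIEW_SLICE = {1, 5}      # non-IDR slice / IDR slice of the base view
MVC_EXTENSION = {14, 15, 20}  # prefix NALU, subset SPS, coded slice extension

def nal_unit_type(first_byte: int) -> int:
    """Extract the 5-bit nal_unit_type from the first NALU header byte."""
    return first_byte & 0x1F

def classify(first_byte: int) -> str:
    t = nal_unit_type(first_byte)
    if t in BASE_VIEW_SLICE:
        return "base-view"
    if t in MVC_EXTENSION:
        return "auxiliary-view"
    return "other"

print(classify(0x65), classify(0x74))  # prints: base-view auxiliary-view
```

A legacy 2D decoder would simply skip everything classified as "auxiliary-view", which is exactly the bypass behavior described above.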

B. Error Propagation in MVC Videos

When MVC videos are transmitted over wireless channels, the video quality is influenced by bit errors. In particular, the video is more sensitive to bit errors at a higher compression rate. Here, we give examples to explain how errors propagate in MVC frames. To simplify the illustrations, we assume that hierarchical B coding is disabled; this is reasonable because only I and P frames are used in a low-delay application. Considering error concealment, there are three types of error propagation in MVC frames.

Fig. 2. Example of error propagation in MVC videos. (a) Propagation of an MB error from the I-frame of the base view. (b) Propagation of an MB error from a P-frame of the base view. (c) Propagation of an MB loss from a P-frame of the auxiliary view.

Fig. 2(a) demonstrates a macro-block (MB) error in the I frame, which only exists in the base view. The black box represents an MB error in the I frame. This error will cause a distortion reduction, and will propagate to all the subsequent frames, including the base and auxiliary views, because all the P frames of the base and auxiliary views refer to the I frame. The distortion reduction caused by the MB error will not stop until the end of the GoP. Moreover, if there is no motion estimation in the I frame, the decoder can only use spatial concealment based on a weighted pixel averaging method. In Fig. 2(b), the propagation of an MB error in a P frame of the base view is illustrated. The grey box represents an MB error in the P frame of the base view, but the distortion reduction can be reduced via temporal concealment. The simplest temporal concealment is to copy the correct MB of a previous frame in place of the erroneous MB. Although error concealment is used, the error still propagates to other frames until the end of the GoP. The last type of error propagation is an MB error that occurs only in the auxiliary view. Fig. 2(c) illustrates the propagation of an MB error in a P frame of the auxiliary view. The grey box represents the MB error in the P frame of the auxiliary view. In this case, the error concealment can take place either from the previous frame or from the frame of a different view. After error concealment, the distortion reduction will be reduced.

To summarize, if an error occurs in a 2D video, its effect is limited to some of its subsequent frames. However, if an error occurs in an MVC video, due to the recursive structure of the MVC codec, the error will propagate not only to its subsequent frames but also to the frames of other views. Knowing the above features, we can design an efficient algorithm for resource allocation in MuVi.
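The propagation behavior summarized above can be computed mechanically: frames that reference a corrupted frame inherit its error, so the affected set is the set of descendants of the corrupted frame in the decoding-dependency DAG of Fig. 1(a). The sketch below illustrates this with a tiny two-view GoP of our own invention (the frame names and dependency map are assumptions, not from the paper).

```python
from collections import deque

def affected_frames(deps, corrupted):
    """BFS over the dependency DAG; deps maps each frame to the frames
    that directly reference it. Returns the corrupted frame plus all
    frames its error propagates to."""
    seen, queue = {corrupted}, deque([corrupted])
    while queue:
        f = queue.popleft()
        for child in deps.get(f, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Two-view GoP: base view I0 -> P1 -> P2, auxiliary view p0 -> p1 -> p2,
# with each auxiliary frame also referencing its base-view counterpart.
deps = {"I0": ["P1", "p0"], "P1": ["P2", "p1"], "P2": ["p2"],
        "p0": ["p1"], "p1": ["p2"]}

print(sorted(affected_frames(deps, "I0")))  # an I-frame error hits the whole GoP
print(sorted(affected_frames(deps, "p1")))  # an auxiliary error never reaches the base view
```

This mirrors the three cases of Fig. 2: an I-frame error damages every frame in the GoP, a base-view P-frame error damages both views downstream, and an auxiliary-view error stays inside the auxiliary view.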

III. SYSTEM MODEL

In this section, we present the mathematical models of the wireless error channel and MVC video distortion. We further formulate an optimization framework to achieve unequal error protection for MVC video frames in MIMO networks.

A. Channel Model

We consider the transmission of MVC videos through a wireless error channel. The transmitter can acquire channel state information (CSI) via channel feedback. In this work, zero-forcing beamforming (ZFBF) [2], a simple but efficient beamforming scheme, is employed so that the received signals are orthogonal in the space domain at the receiver. In this way, the receiver can achieve a nearly optimal capacity without incurring complicated signal processing [1], [2].

We consider a pair of M-antenna transmitter and M-antenna receiver. Let X be an M × 1 vector representing the data symbols sent to the M-antenna receiver. Then, the vector of received symbols is

    Y = HWX + Z    (1)

where Z is an M × 1 noise vector with zero mean and variance σ², H is an M × M channel matrix, and W is the precoding matrix designed to mitigate interference between antennas, i.e., [HW]_{k,j} = 0 for all k ≠ j. In ZFBF, the ZF condition is equivalent to HW = diag{√p}, where √p = [√p_1, √p_2, ..., √p_M]^T is a vector with real non-negative elements. W is obtained via a pseudo-inverse of the channel matrix

    W = H^H (HH^H)^{-1} diag{√p}    (2)

where (·)^H denotes the complex conjugate transpose operator. With precoding, the received symbols become Y = diag{√p}X + Z.
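A quick numerical sanity check of (2) can be done with NumPy. This is an illustrative sketch, not the paper's implementation: the channel is a random complex matrix, and the unit power scaling √p_j = 1 is our assumption.

```python
import numpy as np

# Verify that W = H^H (H H^H)^{-1} diag(sqrt(p)) zero-forces inter-antenna
# interference: the product HW should equal diag(sqrt(p)), i.e. each receive
# antenna sees only its own stream.

rng = np.random.default_rng(0)
M = 3
H = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
sqrt_p = np.ones(M)                      # placeholder per-stream scaling

W = H.conj().T @ np.linalg.inv(H @ H.conj().T) @ np.diag(sqrt_p)
print(np.allclose(H @ W, np.diag(sqrt_p)))   # prints True: off-diagonals vanish
```

With the zero-forcing condition verified, the received vector Y = HWX + Z reduces to diag{√p}X + Z, matching the text.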

In particular, the precoded symbols do not interfere with each other at the receiving antennas; each antenna receives only its own symbol plus noise. In practical MIMO transceivers, the SNR γ_j of each antenna j can be calculated as

    γ_j = |Σ_{k=1}^{M} h_{jk} w_{kj}|² / σ²

where σ² is the variance of z_j, the jth element of Z. With ZFBF, a commonly studied problem in the literature is to find the precoding matrix W that maximizes the total throughput of all the antennas under the constraint of transmission power, i.e.,

    max_p  Σ_{j=1}^{M} log₂(1 + p_j/σ²)
    s.t.   Tr{W^H W} ≤ P    (3)

which can be solved via the well-known water-filling approach [5]. The operator Tr{·} denotes the trace. All the notations are summarized in Table I.
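To illustrate the water-filling idea behind (3), the sketch below solves a simplified version of the problem, max Σ log₂(1 + p_j/n_j) subject to Σ p_j ≤ P, by bisecting on the water level μ. The reduction of the trace constraint to a simple power sum, the per-antenna effective noise levels n_j, and all numbers are our own illustrative assumptions, not values from the paper.

```python
# Water-filling: the optimum is p_j = max(0, mu - n_j), with the water
# level mu chosen so that the allocated powers sum to the budget P.

def water_filling(noise, P, iters=100):
    lo, hi = 0.0, P + max(noise)          # bisection bracket for mu
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - n) for n in noise)
        if used > P:
            hi = mu                       # water level too high
        else:
            lo = mu
    return [max(0.0, mu - n) for n in noise]

powers = water_filling([0.1, 0.5, 1.0], P=2.0)
print([round(p, 3) for p in powers])      # prints [1.1, 0.7, 0.2]
```

Note how the best antenna (lowest effective noise) receives the most power; MuVi's unequal error protection in Section IV builds on exactly this kind of channel heterogeneity.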

B. MVC Distortion Estimation

We model the expected distortion of a data unit of the primary and secondary views sent over a lossy channel. This model serves as a prerequisite to achieve unequal error protection (UEP), which enables us to assign different antennas and transmission power to transmit different data units.

TABLE I
SUMMARY OF NOTATIONS

    M            the number of antennas or the number of views
    j            antenna j
    v            view v in MVC coding
    x_j^v        binary variable indicating that a data unit of view v is transmitted over the jth antenna
    D_i^v        expected distortion reduction of view v in the ith data unit
    Δd_i^(x,y)   distortion reduction of view y caused by the ith data unit of view x
    p_j          transmission power of antenna j
    P            total power of all the antennas
    L_j          probability of data unit loss over antenna j
    γ_j          SNR of antenna j
    j(v)         view v is selected by antenna j
    U(j, v)      utility function of expected distortion reduction

There are M antennas for ZF precoding transmission, and N is the number of data units in one GoP. Without loss of generality, each view is allocated to an antenna for transmission. According to the packetization in the H.264/AVC standard for error resiliency [4], each row of macro-blocks of the MVC-encoded video is encapsulated into a separate packet. Due to the recursive structure of MVC video coding, a data unit error not only causes distortion in its own frame, but also propagates to the subsequent frames depending on this frame in the GoP. In addition, an error in one view may also propagate to other views. For instance, in a two-view scenario, if an error occurs in the transmission of data unit i at the base view, it will introduce a distortion reduction Δd_i^(1,1) to the base view, and a distortion reduction Δd_i^(1,2) to the auxiliary view. If an error occurs in the auxiliary view, it causes a distortion reduction Δd_i^(2,2) to the auxiliary view, while not influencing the quality of the base view.

We are inspired by the intuition that the data units are not equally important in MVC-encoded videos, and important data units can be better protected by associating them with the antennas that have good channel gains. To achieve unequal error protection, we first model the distortion reduction for the successful delivery of a data unit. We use j(v) to represent that view v is selected for transmission on antenna j. Let S_j = |[H · W]_{j,j}|² · σ^{-2} be the SNR with normalized power allocation, and γ_j = p_j S_j be the SNR of the jth antenna after power allocation at a time slot. We let D_i^(v) be the expected distortion reduction when the ith data unit belonging to view v is transmitted over antenna j. In an error-prone wireless channel, each transmitted bit is corrupted with a certain probability. This probability depends on the SNR, the transmission power, and the modulation. To compute D_i^(v), if binary phase-shift keying (BPSK) modulation is used by the transmitter, the bit error rate (BER) is computed by e = 0.5 erfc(√γ_j), where erfc(·) is the complementary error function. In fact, modulation schemes beyond BPSK can be similarly handled in our model via the corresponding error functions [6].


The BER can also be extended to the channel coding scenario. For a particular channel code, the coding gain g can be estimated using Monte-Carlo simulations [7]. Then, the coded BER is e_code = 0.5 erfc(√(γ_j + g)). Suppose that the size of a data unit is l_v bits in an OFDM symbol. The probability of successful reception is thus

    L(γ_{j(v)}) = Π_{k=1}^{l_v} (1 − e_k).

Here, we use q_k = (1 − e_k) to denote the probability of successful transmission over an OFDM sub-channel. The success probability over all OFDM sub-channels is approximated by q = 1 − max_k{e_k}. For ease of analysis and system design, we rewrite the probability of successful transmission of a data unit as L(γ_{j(v)}) = q^{l_v}(γ_{j(v)}). Therefore, the expected distortion reduction of view v at antenna j(v) in each transmission slot is given by

$$D_i^{(v)} = \sum_{k=1}^{M} L_j(\gamma_{j(v)})\,\Delta d_i^{(v,k)}. \qquad (4)$$

For a GoP consisting of $N$ packets, the expected distortion reduction is expressed as $D_{\mathrm{GoP}} = \sum_{i=1}^{N}\sum_{v=1}^{M} D_i^{(v)}$. So far, we have derived the distortion-reduction model for the $i$-th data unit. When a data unit is transmitted over a lossy channel, resources are allocated at each time slot. Hence, we express the expected distortion reduction brought by a data unit per time slot. The expected distortion reduction over the $r$-th slot is rewritten as

$$D_r = L(\gamma_{j(v)})\,\Delta D_r^{(v)} \qquad (5)$$

where $\Delta D_r^{(v)} = \sum_{k=1}^{M} \Delta d_i^{(v,k)}$.
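For concreteness, the per-bit and per-unit success model above can be sketched in a few lines of Python; the SNR values and unit size below are illustrative, not taken from the paper:

```python
import math

def bpsk_ber(gamma):
    """BPSK bit error rate e = 0.5 * erfc(sqrt(gamma)) for linear SNR gamma."""
    return 0.5 * math.erfc(math.sqrt(gamma))

def unit_success_prob(gamma, l_v):
    """Success probability of an l_v-bit data unit, L = q**l_v with q = 1 - e."""
    q = 1.0 - bpsk_ber(gamma)
    return q ** l_v

# A stronger channel (higher SNR) yields a higher unit success probability.
print(unit_success_prob(4.0, 32) > unit_success_prob(1.0, 32))  # True
```

The exponent $l_v$ makes the unit success probability fall off quickly with unit size, which is why protecting high-priority units on good antennas pays off.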

C. Problem Formulation

The channel quality of each transmitting antenna differs in an MIMO system. When a multi-view video stream is transmitted, data units can be interleaved before being passed down to all the transmitting antennas. However, this naive scheme risks a serious degradation of video quality due to packet losses when some of the MIMO channels have low SNRs. Hence, an ideal transmission strategy should be channel-aware so as to achieve unequal error protection of video packets, which turns out to be a very challenging task. To minimize the aggregate distortion of an entire GoP, the CSI of every antenna should be known a priori over the duration of this GoP. In practice, the CSI of future slots cannot be acquired beforehand, nor can it be estimated accurately at the current slot. Hence, we resort to a one-shot optimization framework that maximizes the expected distortion reduction of the upcoming data units at the $r$-th transmission slot, given the current CSI from all the transmitting antennas and the distortion reductions.

We carry out our analytical framework in an MIMO-OFDM system. The main module, residing in the transmitter, collects the CSI from the receiver and allocates resources to perform the one-shot optimization. Let a binary variable $x_j^{(v)}$ indicate whether a data unit of view $v$ is transmitted through antenna $j$. After a certain normalization of the matrix $\mathbf{H}^H(\mathbf{H}\mathbf{H}^H)^{-1}$ in (2), we can express the constraint on total transmission power in terms of $p_j$ for all $j \in [1, M]$. Minimizing the distortion caused by corrupted data units is equivalent to maximizing the distortion reduction of successfully received data units. The human perception of video quality increases with the distortion reduction, while further reduction of distortion brings a diminishing improvement of human perception [8]. Hence, we adopt a log-utility function to capture human perception for a given distortion reduction. Our maximization of the sum log-utility of the expected distortion reduction is formally defined as follows:

$$\max_{\mathbf{x},\,\mathbf{p}} \; \sum_{j=1}^{M}\sum_{v=1}^{M} x_j^{(v)}\,\eta^{(v)} \log\!\left[ L_j(\gamma_j)\,\Delta D_r^{(v)} \right] \qquad (6)$$

$$\text{s.t.}\quad \sum_{j=1}^{M} p_j = P,\quad p_j > 0 \qquad (7)$$

$$\sum_{j=1}^{M}\sum_{v=1}^{M} x_j^{(v)} \le M,\quad x_j^{(v)} \in \{0,1\} \qquad (8)$$

$$\sum_{v=1}^{M} x_j^{(v)} \le 1,\quad \sum_{j=1}^{M} x_j^{(v)} \le 1. \qquad (9)$$

The parameter $\eta^{(v)}$ denotes the priority-level price of a view, which is related to the distortion reduction $\Delta D_r^{(v)}$ in the $r$-th transmission. For example, $\eta^{(v)}$ can be defined as a staircase or piece-wise function over different domains [9]. The equality in (7) imposes a limit on the total transmission power over all the transmitting antennas. Inequalities (8) and (9) represent three feasibility conditions: each antenna transmits at most one data unit per time slot; each data unit belongs to a single view; and the total number of data units prepared for transmission is no larger than the number of transmitting antennas. Here, we assume that the number of views equals the number of antennas; more general cases can be handled in a similar way. If there are more antennas than views, the extra antennas can transmit the next packets of the multi-view video. If there are fewer antennas than views, the transmission slots of an antenna can be divided to send different views in turn. The above optimization problem is a mixed-integer nonlinear program whose optimal solution can only be attained via exhaustive search [10]. Rather than pursuing the optimal solution at prohibitive complexity, it is important to design an efficient resource allocation strategy with far less computation. In what follows, we decompose the optimization problem in (6) into two sub-problems and solve them respectively. For notational simplicity, we define a utility function for (6):

$$U(j, v) = \eta^{(v)} \log\!\left[ L_j(\gamma_j)\,\Delta D_r^{(v)} \right].$$
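To illustrate how the priorities $\eta^{(v)}$ shape the objective, the utility $U(j,v)$ and the objective (6) for a given one-to-one assignment can be evaluated numerically; all SNRs, distortion reductions, and priorities below are illustrative values, not the paper's data. Note that with equal priorities the logarithm separates and the pairing does not change the objective under this model; the $\eta^{(v)}$ weights are what make the pairing matter.

```python
import math

def utility(gamma_j, delta_d, eta, l_v):
    """U(j, v) = eta * log(L_j(gamma_j) * delta_D_r) with BPSK, L = (1 - e)**l_v."""
    q = 1.0 - 0.5 * math.erfc(math.sqrt(gamma_j))
    return eta * math.log((q ** l_v) * delta_d)

def objective(assign, gammas, delta_ds, etas, l_v=16):
    """Objective (6) for a one-to-one assignment: view v -> antenna assign[v]."""
    return sum(utility(gammas[assign[v]], delta_ds[v], etas[v], l_v)
               for v in range(len(assign)))

# Pairing the high-priority view with the high-SNR antenna beats the swap.
good = objective([0, 1], gammas=[4.0, 1.0], delta_ds=[10.0, 2.0], etas=[2.0, 1.0])
bad  = objective([1, 0], gammas=[4.0, 1.0], delta_ds=[10.0, 2.0], etas=[2.0, 1.0])
print(good > bad)  # True
```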

IV. ALGORITHM DESIGN

In this section, we decompose the optimal unequal error protection problem into two sub-problems and design two efficient algorithms to address them accordingly.


CHEN et al.: MUVI: MULTIVIEW VIDEO AWARE TRANSMISSION OVER MIMO WIRELESS SYSTEMS 2793

A. Power Allocation Sub-problem

Given the antenna selection strategy $\mathbf{x}$, the power allocation sub-problem can be rewritten as

$$\max_{\mathbf{p}} \; \sum_{j=1}^{M}\sum_{v=1}^{M} \left[ x_j^{(v)}\,U(j, v)\,|\,\mathbf{x} \right] \qquad (10)$$

$$\text{s.t.}\quad \sum_{j=1}^{M}\sum_{v=1}^{M} x_j^{(v)} p_j = P,\quad p_j > 0 \qquad (11)$$

where all the integer variables of the original optimization problem are taken as constants. We first establish the convexity of the power allocation sub-problem.

Theorem 1: (Convexity) The power allocation sub-problem (10) is a constrained convex optimization problem.

Proof: Please refer to the appendix. □

Next, a necessary condition is derived for the optimal power allocation based on the Karush-Kuhn-Tucker (KKT) conditions of (10). Let $S_j = [\mathbf{H}\cdot\mathbf{W}]_{j,j}^2/\sigma^2$, so that $\gamma_j = p_j S_j$. To simplify the notation of the first-order partial derivative (24), we define a function

$$\Theta(p_j, S_j) = \frac{l_v S_j \exp(-p_j S_j)}{\left(1 - \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{p_j S_j}\right)\right) 2\sqrt{\pi p_j S_j}}. \qquad (12)$$

Then, the partial derivative takes the form $\partial[U(j,v)\,|\,\mathbf{x}]/\partial p_j = \Theta(p_j, S_j)\cdot\eta^{(v)}$.

Theorem 2: (Optimality) Let $\mathbf{p}^\star = \{p_1^\star, p_2^\star, \cdots, p_M^\star\}$ be the optimal power allocation strategy. For a fixed antenna selection scheme $\mathbf{x}$, the optimal power allocation strategy satisfies

$$\Theta(p_j^\star, S_j)\cdot\eta^{(v)} = \sum_{k=1}^{M}\sum_{v=1}^{M} x_k^{(v)}\,\Theta(p_k^\star, S_k)\cdot\eta^{(v)}\cdot\frac{p_k^\star}{P}. \qquad (13)$$

Proof: Please refer to the appendix. □

The objective function is complicated, so no closed-form solution exists for the optimal power allocation. However, owing to the convexity of this sub-problem, we can compute $\mathbf{p}^\star$ numerically via the gradient descent method without incurring heavy computation in practice.
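Since $\Theta$ in (12) is simply the closed-form derivative of the log-success term, it can be checked against a finite difference. The sketch below assumes the BPSK success model above; $\eta^{(v)}$ and $\Delta D_r^{(v)}$ are multiplicative constants and drop out of the derivative in $p_j$, and the evaluation point is arbitrary.

```python
import math

def q(p, S):
    """Per-bit success probability under BPSK: 1 - 0.5*erfc(sqrt(p*S))."""
    return 1.0 - 0.5 * math.erfc(math.sqrt(p * S))

def U(p, S, l_v):
    """Log-utility of the success term, l_v * log q(p, S)."""
    return l_v * math.log(q(p, S))

def theta(p, S, l_v):
    """Closed-form derivative dU/dp, i.e. Theta(p_j, S_j) in (12)."""
    return (l_v * S * math.exp(-p * S)) / (q(p, S) * 2.0 * math.sqrt(math.pi * p * S))

# Finite-difference check of the closed form at an arbitrary point.
p, S, l_v, h = 1.0, 2.0, 8, 1e-6
numeric = (U(p + h, S, l_v) - U(p - h, S, l_v)) / (2 * h)
assert abs(numeric - theta(p, S, l_v)) < 1e-5
```

Because $\Theta$ is positive and decreasing in $p_j$, the utility is increasing and concave in the allocated power, which is what makes the gradient method applicable.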

B. Antenna Selection Sub-problem

In this section, we focus on the optimal antenna selection when the optimal power allocation strategy has been attained in the first stage. The antenna selection sub-problem can be expressed as below:

$$\max_{\mathbf{x}} \left\{ \max_{\mathbf{p}} \; \sum_{j=1}^{M}\sum_{v=1}^{M} \left[ x_j^{(v)}\,U(j, v)\,|\,\mathbf{x} \right] \right\} \qquad (14)$$

$$\text{s.t.}\quad \sum_{j=1}^{M}\sum_{v=1}^{M} x_j^{(v)} \le M,\quad x_j^{(v)} \in \{0,1\} \qquad (15)$$

$$\sum_{v=1}^{M} x_j^{(v)} \le 1,\quad \sum_{j=1}^{M} x_j^{(v)} \le 1. \qquad (16)$$

An exhaustive search can obtain the optimal antenna selection scheme: trying all possible combinations of $\{x_j^{(v)}\}$ is feasible when the numbers of views and antennas are small. However, as these numbers grow, the computational complexity grows at a factorial rate, so it is essential to develop a practical antenna selection algorithm of polynomial time complexity. Intuitively, a data unit has a higher success probability when transmitted via an antenna with good SNR than via one with poor SNR. If the data unit has a large distortion reduction $\Delta D_r^{(v)}$, associating it with a good-SNR antenna reduces the probability of transmission errors, thus reducing the distortion of the video frames. Following the same logic, less important data units can be transmitted through the antennas of low SNR. This intuition is formalized in the following proposition.

Proposition 1: Suppose that the normalized SNR on antenna $i$ is greater than that on antenna $j$, and the MB of view $u$ has a larger distortion reduction than that of view $v$ at the $r$-th round, i.e., $S_i > S_j$ and $\Delta D_r^{(u)} > \Delta D_r^{(v)}$. Under equal power allocation, $L_i(S_i)\,\Delta D_r^{(u)} > L_j(S_j)\,\Delta D_r^{(v)}$.

Proof: Under equal power allocation, the probability of successful reception on antenna $i$ is $L_i(\frac{P}{M} S_i)$. If $S_i > S_j$, then $L_i(\frac{P}{M} S_i) > L_j(\frac{P}{M} S_j)$ by the monotonicity of $L_i(\cdot)$. Multiplying the two sides of this inequality by $\Delta D_r^{(u)}$ and $\Delta D_r^{(v)}$, respectively, we obtain $L_i(\frac{P}{M} S_i)\,\Delta D_r^{(u)} > L_j(\frac{P}{M} S_j)\,\Delta D_r^{(v)}$. □
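Proposition 1 is easy to confirm numerically under the BPSK success model; the SNRs and distortion reductions below are illustrative values chosen only to satisfy the proposition's hypotheses.

```python
import math

def unit_success(p, S, l_v=16):
    """Data-unit success probability L(p*S) = (1 - 0.5*erfc(sqrt(p*S)))**l_v."""
    return (1.0 - 0.5 * math.erfc(math.sqrt(p * S))) ** l_v

# Equal power P/M on each antenna; S_i > S_j and dD_u > dD_v as in Proposition 1.
P, M = 2.0, 2
S_i, S_j = 3.0, 1.0
dD_u, dD_v = 9.0, 4.0
assert unit_success(P / M, S_i) * dD_u > unit_success(P / M, S_j) * dD_v
```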

C. Algorithm Specification

Based on the analysis in Sections IV-A and IV-B, we now give two algorithms with tractable complexity to solve the problem. We first introduce a power allocation algorithm based on the gradient descent method, and then give the resource allocation algorithm implemented in our system.

1) Power Allocation Algorithm: In this section, we propose a power allocation algorithm based on gradient descent projection. Algorithm 1 calculates the optimal power allocation scheme $\mathbf{p}^\star$. From line 2 to line 8, $\mathbf{p}^\star$ is initialized following the antenna selection policy of Proposition 1.

After initialization, the gradient projection method is applied to obtain the optimal solution. The transmission power on each antenna is updated iteratively along the steepest descent direction. Let $\omega$ represent the direction with the largest projection onto the gradient descent direction $\varepsilon$, and let $\mathbf{1}_M$ denote the all-ones column vector with $M$ elements. The mathematical model is formulated as follows:

$$\min_{\omega} \; \omega^T \varepsilon \qquad (17)$$

$$\text{s.t.}\quad \mathbf{1}_M^T \omega = 0 \qquad (18)$$

$$\omega^T \omega = 1. \qquad (19)$$

Thus, we can find the direction with the negative directional derivative and use a set of multipliers $\lambda$ and $\mu$ to form the Lagrangian function

$$L(\omega, \lambda, \mu) = \omega^T \varepsilon - \omega^T \mathbf{1}_M \lambda - \mu(\omega^T \omega - 1). \qquad (20)$$


Algorithm 1: Power Allocation Algorithm Power_Allocate()

Input: ΔD_r, x, P, S
Output: p*

1: // initialize the power of each antenna
2: for j = 1 ··· M do
3:     for v = 1 ··· M do
4:         if x_j^(v) = 1 then
5:             p*_j = P/M
6:         end
7:     end
8: end
9: // gradient descent projection:
10: it = 1
11: R^(0) = +∞
12: while ||R^(it) − R^(it−1)|| > δ do
13:     ε = (∂[U(j,v)|x] / ∂p) |_{p = p*}
14:     ω = α_1 (I − (1/M) 1_M 1_M^T) ε
15:     p* = max{p* + α_2 ω, 0}
16:     R^(it) = [U(j,v)|x]
17: end
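The update in lines 13-15 of Algorithm 1 can be sketched as projected gradient ascent. The sketch below simplifies: equal priorities $\eta^{(v)}$, the BPSK success model, fixed step size, fixed iteration count, and renormalization back to the power budget after clipping; none of these constants come from the paper.

```python
import math

def grad_U(p, S, l_v):
    """Per-antenna derivative Theta(p_j, S_j) of the log-success utility (12)."""
    out = []
    for pj, Sj in zip(p, S):
        q = 1.0 - 0.5 * math.erfc(math.sqrt(pj * Sj))
        out.append(l_v * Sj * math.exp(-pj * Sj) / (q * 2.0 * math.sqrt(math.pi * pj * Sj)))
    return out

def power_allocate(S, P, l_v=16, step=0.05, iters=500):
    """Projected gradient ascent on the budget sum(p) = P (sketch of Algorithm 1)."""
    M = len(S)
    p = [P / M] * M                      # equal-power initialization (lines 2-8)
    for _ in range(iters):
        g = grad_U(p, S, l_v)
        mean_g = sum(g) / M
        w = [gj - mean_g for gj in g]    # project gradient onto sum(w) = 0 (line 14)
        p = [max(pj + step * wj, 1e-6) for pj, wj in zip(p, w)]  # line 15
        p = [pj * P / sum(p) for pj in p]  # renormalize to the power budget
    return p
```

With equal priorities, the fixed point equalizes the marginal utilities $\Theta(p_j, S_j)$, so the weaker antenna receives more power to even out its reliability.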

The KKT condition for $L$ is satisfied as

$$\frac{\partial L}{\partial \omega} = \varepsilon - \mathbf{1}_M \lambda - 2\mu\,\omega = 0. \qquad (21)$$

Multiplying $\frac{\partial L}{\partial \omega} = 0$ by $\mathbf{1}_M^T$ on both sides, we have $\mathbf{1}_M^T \varepsilon - \mathbf{1}_M^T \mathbf{1}_M \lambda = 0$ and subsequently obtain

$$\lambda = \frac{1}{M}\,\mathbf{1}_M^T \varepsilon. \qquad (22)$$

We can then derive a closed form of the direction $\omega$:

$$\omega = \frac{1}{2\mu}\left(\mathbf{I} - \frac{1}{M}\,\mathbf{1}_M \mathbf{1}_M^T\right)\varepsilon. \qquad (23)$$

The factor $1/2\mu$ is immaterial because $\omega$ only specifies the search direction. Therefore, we define $\omega = \alpha_1(\mathbf{I} - \frac{1}{M}\,\mathbf{1}_M \mathbf{1}_M^T)\varepsilon$, where $\alpha_1 = -1$ represents the search direction [11].

2) Antenna Selection Algorithm: We hereby present a detailed algorithm to perform UEP for multi-view video streaming in our MuVi system. The inputs are the distortion reductions $\Delta \mathbf{D}_r = \{\Delta D_r^{(1)}, \Delta D_r^{(2)}, \cdots, \Delta D_r^{(M)}\}$, the total power $P$, and the received SNRs $\mathbf{S} = \{S_1, S_2, \cdots, S_M\}$ without power allocation. Exhaustive search yields the best antenna selection scheme, but its computational complexity is a burden. This motivates us to pursue a low-complexity solution, given in Algorithm 2. Instead of applying complicated antenna selection schemes, we use a very simple method based on the SNRs of the receiving antennas and the distortion reductions. This scheme of antenna selection

Algorithm 2: Antenna Selection Algorithm Antenna_Sel()

Input: ΔD_r, P, S
Output: x*, p*

1: // initialize x*, p*
2: for j = 1 ··· M do
3:     for v = 1 ··· M do
4:         x_j^(v) = 0
5:     end
6: end
7: for j = 1 ··· M do
8:     p*_j = 0
9: end
10: // sort ΔD_r in decreasing order
11: ΔD_r = sort(ΔD_r)
12: // t is a permutation of {1, 2, ··· , M}
13: t = {t1, t2, ··· , tM | ΔD_r^(t1) ≥ ΔD_r^(t2) ≥ ··· ≥ ΔD_r^(tM)}
14: // sort S in decreasing order
15: S = sort(S)
16: // k is a permutation of {1, 2, ··· , M}
17: k = {k1, k2, ··· , kM | S_k1 ≥ S_k2 ≥ ··· ≥ S_kM}
18: // get the index of x
19: for j = 1 ··· M do
20:     x_kj^(tj) = 1
21: end
22: // power allocation
23: p* = Power_Allocate(ΔD_r, x, P, S)

is static: different data units are assigned to different antennas based on Proposition 1.

The complexity of our algorithm originates from two aspects: the antenna selection and the power allocation. Our antenna selection scheme has complexity $O(M \log M)$, where $M$ is the number of antennas, whereas the exhaustive search scheme needs $O(M!)$ operations. The power allocation is implemented through the gradient descent method, whose convergence rate is $O(1/k)$, where $k$ is the number of iterations. Note that our algorithm calls Power_Allocate() only once. The complexities of Power_Allocate() and Antenna_Sel() cannot be combined because Power_Allocate() solves a continuous optimization while Antenna_Sel() solves a combinatorial one. In summary, our algorithm has a light computational complexity that is suitable for practical implementation.
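The sorting-and-matching steps of Algorithm 2 (lines 10-21) amount to the following sketch; indices are 0-based here, and the input values are illustrative.

```python
def antenna_select(delta_D, S):
    """Sketch of Algorithm 2: match the v-th largest distortion reduction to the
    v-th best antenna SNR; returns assign[v] = antenna chosen for view v."""
    M = len(S)
    views_by_dD = sorted(range(M), key=lambda v: delta_D[v], reverse=True)
    ants_by_snr = sorted(range(M), key=lambda j: S[j], reverse=True)
    assign = [0] * M
    for v, j in zip(views_by_dD, ants_by_snr):
        assign[v] = j
    return assign

print(antenna_select([2.5, 9.0, 4.0], [0.5, 3.0, 1.2]))  # [0, 1, 2]
```

The two sorts dominate the cost, giving the $O(M \log M)$ complexity noted above.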

V. MUVI SYSTEM: DESIGN AND IMPLEMENTATION

In this section, we build our prototype system, namely MuVi, for multi-view streaming in MIMO systems. MuVi is implemented on the software-defined radio platform and is designed to be compatible with existing 802.11 protocols.

A. Overview of MuVi System

This section enumerates the procedures of MuVi; the system overview is illustrated in Fig. 3.

Step 1: Acquiring the multiview video bitstream: The multi-view video stream is generated or stored with the MVC codec by a


Fig. 3. System overview of MuVi: multiview video-aware transmission with 802.11 compatibility.

remote host. The transmitter then requests the MVC bitstreams from the remote host and delivers them to the receiver.

Step 2: Classifying MVC frames: The classifier classifies different MVC frames according to the H.264 extension protocol [4]. If a data packet is a conventional packet, it bypasses the multi-view video aware module.

Step 3: Multiview video aware transmission: The multi-view video aware module of the transmitter carries out three main sub-steps: MVC Distortion Estimation (MDE), View-based Spatial Interleaving (VSI), and Unequal Power Allocation (UPA). MDE estimates the expected distortion of each MVC video data unit and determines which data units have higher priorities than the others. VSI assigns each view bitstream to an antenna to mitigate burst errors within a view. UPA applies the unequal power allocation solution to transmit the MVC video streams from the assigned antennas to the receiver over the MIMO system. More details on these sub-steps are given in Section V-C and Fig. 5.

Step 4: Reception of multiview video packets: At the receiver, the multi-view video streams are assembled and uploaded to the upper layer by the multi-view video aware module. This step is very similar to Step 3 with the order of the procedures reversed.

Step 5: Video playback: The MVC video player decodes and renders the received MVC video frames.

B. Implementation

The entire system is divided into the MVC video codec at the application layer, the multi-view classifier at the MAC layer, and the MIMO signal processing modules at the physical layer. The MVC video is encoded and decoded by the JMVC toolkit. The 802.11 MAC layer then assigns each view to an antenna. The signal processing modules of MIMO transmission are implemented on the WARPv3 software radio platform.5 Fig. 5 illustrates the architecture of our MuVi system, which consists of the 802.11 MAC and physical layers.

1) MVC Video Codec Implementation: In the application layer, we modify JMVC for our purposes. At the transmitter, we use JMVC to calculate the distortion reductions of data units to

5“WARP project.” [Online]. Available: http://warpproject.org.

Fig. 4. WARPv3 platform.

derive the expected distortion reduction. We also port JMVC code into the MAC of MuVi to classify data units from the upper layer. At the receiver, the PSNR is calculated against the original video. Note that our system considers the error concealment of MVC because error concealment is applied in the MVC decoder. Moreover, the RTP protocol over UDP is adopted at the transport layer because it is more suitable for real-time video streaming services.

2) Testbed Implementation: Our goal is to implement MuVi as a full-fledged 802.11ac-compatible MIMO-OFDM system on the WARPv3 platform shown in Fig. 4. The MAC packet size is uniformly chosen as 2304 bytes [12].

At the transmitter, we implement five modules: scrambler, convolutional encoder, interleaver, mapper, and resource allocation. The resource allocation module is at the center of our MuVi system. Linear precoding (i.e., ZFBF) is adopted to cancel the inter-antenna interference beforehand. The precoded data symbols are divided into $M$ separate bit streams matching the number of transmitting antennas. Each stream is packetized into data units, and each data unit undergoes OFDM modulation. All packets follow the 802.11ac specification, including the Long Training Field (LTF) and Short Training Field (STF). At the transmission stage, the MuVi controller efficiently computes the antenna selection and power allocation strategies to maximize the expected distortion reduction. The data streams are then transmitted through the RF front-ends.

At the receiver, the received bits are passed sequentially through the demapper, deinterleaver, Viterbi decoder, and descrambler modules. An auto-correlation algorithm detects the STF and identifies the starting position of each packet. The LTF is used to estimate the CSI channel matrix between each pair of transmitting and receiving antennas. The channel matrix is sent back to the transmitter for precoding when necessary. With ZFBF, the data units on each receiving antenna can be decoded since the inter-antenna interference has been canceled.

The MuVi system is implemented at both the transmitter and receiver sides. However, since the round-trip time between the baseband processing and RF signal collection is several orders of magnitude longer than in off-the-shelf 802.11 hardware, real-time control messages such as ACK and CSI feedback cannot be implemented directly. Instead, all the WARPv3 boards connect to one host computer, and the control messages


Fig. 5. System modules of MuVi on WARPv3 platform.

can be implemented via function calls (e.g., [13]). The timer of the host computer controls the transmission duration and inter-packet spacing. All the MIMO data streams are sent over the air by the WARPv3 RF front-ends.

3) Trace-Driven Emulation: Performing real-time experiments on any software radio platform is very challenging, mainly due to hardware constraints. Even when we reduce the system latency of MIMO-OFDM transmission (the whole process involving channel estimation, CSI polling and feedback, precoding, data transmission, and data decoding), the entire duration lasts more than 100 ms on the WARPv3 platform. This latency is mainly induced by the hardware interface between the WARPv3 and the computer. Since 802.11 MAC frame intervals last around 10 μs, packet transmission on WARPv3 takes far more time than actually needed, so real-time experiments are hard to realize. More latency analyses can be found in [14] and [15]. A commonly adopted approach to circumvent this limitation is trace-driven emulation (e.g., [16]).

We insert the 802.11ac MIMO preambles, including the STF and LTF sequences, into the transmitting buffer of a WARPv3 board and send them repeatedly over the air to another WARPv3. The signal waveform is sampled and recorded to a file. The preamble packets are then detected and the channel estimated to obtain a sequence of channel traces. In our experiment, a fine-grained sampling of the MIMO channel every 10 ms can be obtained. To evaluate the MuVi system, we replace the WARPv3 RF front-ends with an emulated channel that varies according to the collected traces. In this way, we can study a large variety of channel instances and compare different approaches more fairly and easily.

C. Compatibility to the Standard 802.11 Protocol

Our MuVi system performs resource allocation at the physical layer with awareness of 3D multi-view content. Instead of a clean-slate design, MuVi can be well integrated into off-the-shelf 802.11 MIMO systems. The MAC layer module is also highlighted in Fig. 5: the components in white boxes follow the 802.11 standard, and those in shaded boxes involve our modifications to it. In what follows, we elaborate on these modifications and explain why MuVi is a lightweight system.

MuVi requires the 802.11 AP to classify 3D MVC packets according to their views before transmission. On one hand, it is easy to classify the data units of each view by parsing the NALU headers without decoding them. On the other hand, the existing 802.11 MAC layer allows multiple queues to be configured, and the data units of the same view are stored in the same queue. Maintaining multiple queues has been well studied in 802.11 enhanced distributed channel access (EDCA), so MuVi can take advantage of this mature MAC-layer protocol. The distortion estimator resides in the controller, which calculates the expected distortion based on the CSI and the distortion reduction of each data unit. The controller further computes the resource allocation strategy to be executed at the physical layer.

VI. PERFORMANCE EVALUATION

A. Metrics and Experimental Setup

We evaluate the performance of MuVi on our testbed by comparing it with state-of-the-art wireless transmission schemes. In what follows, we describe our performance metrics, experimental setup, and results.

1) Performance Metrics: We evaluate multi-view video streaming quality with the standard Peak Signal-to-Noise Ratio (PSNR) metric:

$$\mathrm{PSNR} = 20 \log_{10} \frac{2^{len} - 1}{\sqrt{\mathrm{MSE}}}$$

where $len$ is the number of bits used to encode pixel luminance, typically 8. The mean squared error (MSE) is calculated over all pixels between the decoded video and the original video.

2) Experimental Setup: All the reference multi-view video sequences, including Kendo and Balloons, are obtained from the Nagoya University Multi-view Sequences Download List.6 The frame rate is 30 fps (frames per second) and the sample format is YUV420. Each frame has 1024 × 768 pixels, and the GoP size is set to eight frames for source processing. We adopt JMVC 8.5 to encode two views for a 2 × 2 MIMO link. When an error occurs at a macro-block, the MB is replaced by one generated through weighted pixel averaging of the nearby MBs in our experiment. The priority-level price $\eta^{(v)}$ is defined as a simple function that normalizes the distortion reduction at the $r$-th transmission.

We use two WARPv3 platforms, each equipped with two antennas, to build a 2 × 2 MIMO system in an indoor environment. We also move the receiver antennas to different locations in our lab to acquire different CSIs. We use 20 MHz bandwidth and 64 OFDM sub-channels, 48 of which carry data. The size of a data unit or packet is about four bytes, almost equal to the size of a macro-block.

B. Evaluation Results

Here, we run several benchmark experiments to assess MuVi, including distortion reductions, PSNRs, and the effects of inaccurate CSI.

1) Distortion Reduction: We first measure the distortion reduction of MVC videos using JMVC 8.5. Nonetheless, the

6[Online]. Available: http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data/.


Fig. 6. Balloons-1024 × 768: distortion reduction. (a) Base view. (b) Auxiliary view.

Fig. 7. Kendo-1024 × 768: distortion reduction. (a) Base view. (b) Auxiliary view.

Fig. 8. PSNR for MuVi and WFA methods for "Balloons" and "Kendo" without FEC coding. (a) Base view. (b) Auxiliary view. (c) Base view. (d) Auxiliary view.

JMVC 8.5 does not allow any error [17] in the decoding process. Therefore, we modify the JMVC decoder slightly to accept an MB error; that is, decoding is executed after the error concealment procedure described above. The distortion reduction distributions are shown in Figs. 6 and 7. The MVC videos are "Balloons" and "Kendo", both of which have two views with QP = 37.

The distortion reduction is calculated as $\Delta\mathrm{PSNR} = \mathrm{PSNR}_{\mathrm{no\,error}} - \mathrm{PSNR}_{\mathrm{error}}$. Specially, in JMVC, $\mathrm{PSNR}_{\mathrm{no\,error}} = 99$ represents the maximum video quality. The distributions for both videos show that the distortion reduction of each MB differs, and the average distortion reduction of the base view is higher than that of the auxiliary view. The reference I frame has the largest distortion reduction overall, but the MBs within the I frame have different distortion reductions. Note

that not all MBs in the I frame have higher distortion reductions than MBs in other frames. For example, in Figs. 6 and 7, some MBs of the 4th frame have higher distortion reductions than some I-frame macro-blocks. The MBs with higher distortion reduction need to be protected with higher priority.

2) PSNR for MuVi and WFA: We use the optimal power allocation scheme, water-filling for antennas (WFA), as the equal error protection reference scheme, and compare the performance of MuVi with WFA. With BPSK for SU-MIMO transmission, the PSNR of MuVi is about 3 dB higher than that of WFA in Fig. 8. Even at a low SNR, i.e., 5 dB, MuVi achieves above 30 dB PSNR, while the multi-view video quality of WFA cannot be tolerated by users. The reason is that MuVi considers the MVC video


Fig. 9. "Balloons" MVC video results for different resource allocation schemes at 10 dB SNR. (a) MuVi: base view. (b) WFA: base view. (c) MuVi: auxiliary view. (d) WFA: auxiliary view.

Fig. 10. "Kendo" MVC video results for different resource allocation schemes at 10 dB SNR. (a) MuVi: base view. (b) WFA: base view. (c) MuVi: auxiliary view. (d) WFA: auxiliary view.

Fig. 11. PSNR for MuVi and WFA methods for all traces. (a) Base view. (b) Auxiliary view. (c) Base view. (d) Auxiliary view.

streaming when allocating physical resources for UEP, but WFA treats all the MVC video streams equally. When the diversity among antennas is large, the quality of MVC video streaming degrades over the worse wireless channels. MuVi, however, improves the PSNR by protecting the important views. For example, even when a packet of the auxiliary view contains errors, the related part of the base view can recover the errors in the auxiliary view through error concealment. Furthermore, the line segment in each set of experiments indicates the 95% confidence interval.

Figs. 9 and 10 show the images of the "Balloons" and "Kendo" multi-view videos under different schemes at an average 10 dB SNR. One can observe that MuVi outperforms WFA. Moreover, we find that in lower receiving-SNR regimes, MuVi protects the important views better.

3) Effects of Channel Coding: MuVi is independent of the channel coding; it can work with any channel code and be integrated into state-of-the-art Wi-Fi systems. In fact, if channel coding is adopted, MuVi provides even better performance. We evaluate the PSNRs of MuVi with a 1/2-rate convolutional code in Fig. 11. Although the PSNRs of both MuVi and WFA improve over their counterparts without channel coding, MuVi still outperforms WFA.

4) Effects of CSI Inaccuracy: We also measure the impact of inaccurate CSI on MuVi. In particular, the performance of ZFBF precoding is sensitive to CSI accuracy, and this in turn affects the performance of MuVi. In general, the inaccuracy of CSI is

caused by estimation noise and precision errors in the static case; in a mobile case, inaccurate CSI arises from quickly varying channels. Both the MIMO precoding and the resource allocation scheme of MuVi depend on CSI. When the CSI is inaccurate, the spatial streams cannot be fully de-correlated by the precoding. Even though CSI delay makes precoding less effective, it is still better than no precoding: without precoding, the signals from different transmitting antennas interfere with each other.

As shown in Fig. 12, when the CSI delay increases from 10 ms to 1000 ms (1 s), the PSNRs of both the base view and the auxiliary view are degraded. However, even without accurate CSI, MuVi attains a higher PSNR than WFA, since its resource allocation scheme maximizes the overall distortion reduction of the MVC video stream. Even when the CSI delay exceeds 1 s, the PSNR of MuVi stays above 32 dB, so the CSI feedback overhead of MuVi is acceptable.

5) CDF of All Traces: We study the performance of the MuVi and WFA schemes over all traces. Note that we use the MATLAB function ksdensity() to draw smooth CDF curves. Figs. 13(a) to 13(d) illustrate that MuVi always obtains a much higher PSNR than WFA. However, the PSNR of MuVi on "Balloons" is higher than on "Kendo", because "Kendo" contains more motion than "Balloons". As Figs. 6 and 7 show, the variance of distortion reduction in


Fig. 12. PSNR for MuVi and WFA methods for "Balloons" and "Kendo" with 1/2 FEC coding. (a) Base view. (b) Auxiliary view. (c) Base view. (d) Auxiliary view.

Fig. 13. PSNR for MuVi and WFA methods at different CSI delays at 10 dB SNR. (a) Base view. (b) Auxiliary view. (c) Base view. (d) Auxiliary view.

"Balloons" is higher than that in "Kendo": in "Kendo", more MBs of different views have almost "equal" distortion reductions, so MuVi behaves like WFA when protecting those MBs. In a word, MuVi outperforms WFA on all the traces.

Fig. 14(a) and (b) demonstrate the CDFs of the per-view BER at 10 dB and 15 dB SNR, respectively. Here, view 0 (base view) is more important than view 1 (auxiliary view) in this MVC coding scenario. In both figures, our algorithm achieves a lower BER on view 0 than on view 1 at both average SNRs. The lower BER on view 0 protects the important base view well, while sacrificing the quality of view 1 under adverse channel conditions. The WFA algorithm is agnostic to the video content, so its CDF curves for different views are very similar. A cross-comparison shows that our algorithm yields a lower BER on view 0 than WFA but a worse BER on view 1. Due to the relative importance of view 0, our algorithm yields a better PSNR than WFA.

6) Simulation for Scenarios With More Antennas: To study the performance of MuVi with more antennas and views, we extend our experiments to 4 × 4 MIMO systems. Lacking the expensive hardware, we assemble 4 × 4 MIMO traces from different collected traces and use JMVC to generate a 4-view "Kendo". We compare MuVi with WFA in the four-antenna MIMO system, and use an exhaustive search to find the optimal solution of problem (6). The results of the three methods are illustrated in Fig. 15. The PSNRs of MuVi at low SNRs are very close to the optimal solution; at high SNRs, MuVi is worse than the optimal solution because MuVi assigns each antenna statically according to its SNR, whereas the optimal solution dynamically maximizes the expected distortion reduction. In fact, when the PSNR is higher than 37 dB, the received video already has excellent quality. Therefore, MuVi can still work efficiently in scenarios with more antennas and views.

Fig. 14. CDF of BER for “Balloons” without FEC coding. (a) CDF of BER in each view at 10 dB SNR. (b) CDF of BER in each view at 15 dB SNR.

7) Statistical Significance Analysis: We perform a statistical significance analysis here. The proposed scheme is compared with the benchmark scheme, WFA, using all the experimental results. Student's t-test is adopted in our study. Our approach is the following. We let the PSNR values of our scheme form the vector muvi, and those of the benchmark scheme form the vector wfa. We use the MATLAB ttest2 function to validate that MuVi is better than WFA with statistical significance. The P-value is 2.7848 × 10−6 and the t-statistic is 4.8596. Since the P-value is far below 0.01, one can conclude that MuVi is better than WFA with statistical significance.
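MATLAB's ttest2 defaults to a pooled-variance two-sample t-test. For readers without MATLAB, the statistic can be reproduced as below; the PSNR vectors here are hypothetical placeholders, not the paper's data:

```python
import math

def pooled_t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (the ttest2 default)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical PSNR samples in dB.
muvi = [36.1, 35.8, 37.2, 36.5, 35.9, 36.8]
wfa = [33.0, 32.6, 33.9, 33.4, 32.8, 33.5]
t = pooled_t_statistic(muvi, wfa)
```

The P-value then follows from the t distribution with na + nb − 2 degrees of freedom.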

8) SSIM Results: PSNR is the most frequently used metric to evaluate objective image quality after transmission. In some situations, PSNR may not be consistent with human visual perception, so the structural similarity (SSIM) index [18] is introduced as an alternative metric to assess image quality. Note that SSIM captures the structural information of objects in the images. Although our experimental study has shown the significant performance gain of MuVi, we further measure the SSIM of video images under the MuVi and water-filling allocation schemes for comprehensiveness. The SNR ranges between 5 dB



Fig. 16. CDFs of SSIM for MuVi and WFA methods. (a) Base view of “Balloons”. (b) Auxiliary view of “Balloons”. (c) Base view of “Kendo”. (d) Auxiliary view of “Kendo”.

Fig. 15. “Kendo” MVC video results for water filling, MuVi, and the optimal solution. (a) View 1. (b) View 2. (c) View 3. (d) View 4.

and 25 dB, and the SSIM index is obtained by comparing each source image with the received image. Rather than treating each scenario case by case, we compare the CDF of SSIM over all the experiments. In Fig. 16, the x-axis indicates the SSIM value, and the y-axis indicates the percentage of experiments with SSIM below a certain value. One can see that the SSIM index of MuVi is statistically much higher than that of the benchmark algorithm.
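For reference, SSIM [18] combines luminance, contrast, and structure terms. A simplified single-window variant over flat pixel lists is sketched below; the standard metric averages a windowed version of this quantity over the image, and the pixel values here are hypothetical:

```python
def global_ssim(x, y, L=255):
    """Single-window SSIM between two equal-length pixel lists (dynamic range L)."""
    n = len(x)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants from [18]
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

reference = [10, 80, 120, 200, 30, 90]   # hypothetical source pixels
received = [12, 78, 118, 205, 28, 95]    # hypothetical received pixels
score = global_ssim(reference, received)
```

An undistorted image yields a score of 1; transmission errors pull the score toward 0.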

VII. RELATED WORK

A. Conventional Cross-Layer Design for Video Transmission

Unequal power allocation schemes for video transmission have been studied in MIMO systems. Authors in [19] presented a novel adaptive channel selection scheme for scalable H.264/AVC video transmission over MIMO systems. With this approach, the overall video quality can be maximized by the co-design of the application and physical layers. To achieve desirable prioritized spatial multiplexing, [20] proposed a novel precoding scheme that is capable of integrating both channel and source coding characteristics. The important video layers in an SVC video are assigned relatively higher transmission power for better protection. Authors in [7] proposed an architecture for real-time video transmission over MIMO systems using loss visibility side information. In [21], the authors built a very simple testbed with two computers, each equipped with an 802.11ac device, and experimentally evaluated the quality of transmitting high-definition video.

Apart from source coding and resource allocation, Forward Error Correction (FEC) is also an important way to enable UEP video transmission in wireless channels. In [22], the authors aimed to minimize the distortion for layered videos over an MIMO channel. They considered three key components: video layer extraction (temporal and quality layers), video layer scheduling, and FEC rate allocation at the application layer. The packet error rate in the wireless channel cannot be controlled, and a certain amount of redundancy is introduced to combat channel errors. Authors in [23] focused on the analysis and design of space-time coding in an MIMO channel. The multimedia service is simplified as bitstreams with unequal target error rates.

Recently, analog video streaming [24] has revived as a new cross-layer technology for smooth video transmission that adapts to additive white Gaussian noise (AWGN) channels. Unlike digital video streaming, analog video streaming removes the entropy coding and allocates power directly to each chunk. Authors in [25], [26] further addressed the analog video streaming problem over an MIMO fading channel. In particular, [26] allocated power to OFDM sub-channels based on the chunk coefficients of the analog video. One potential drawback of analog video streaming is that a special hardware module is needed.

B. Multiview Video Delivery

Several cross-layer approaches have exploited UEP to improve the quality of multi-view video delivery over wireless networks. Authors in [27] investigated a joint source-channel coding problem for multi-view videos over a lossy channel. Their objective was to minimize the transmission of source packets and forward error correction packets. Authors in [28] proposed a resource allocation scheme based on UEP to minimize the distortion caused by bit errors of multi-view videos. To cope with channel variation in 3D video transmission, [29] proposed Swift, a novel hybrid digital-analog (HDA) solution that combines the advantages of both digital and analog videos. These studies relied on single-input single-output (SISO) wireless systems, while today's Wi-Fi systems are equipped with multiple antennas.

Salim and Xiang presented a seminal study on 3D view-plus-depth video streaming over an MIMO-OFDM channel in [30]. They formulated a rate-distortion model consisting of the source distortion and the channel distortion. The UEP is performed mainly by isolating high-priority and low-priority streams with consideration of the current SNR and the packet type. In [31], a novel UEP scheme was proposed that adopts packet partitioning for MVC and video plus depth (VpD) over a cooperative MIMO channel. It utilizes a concatenated form of rate-variable low-density parity-check (LDPC) codes and the diversity of space-time block codes (STBC) to protect multi-view video packets unequally with different channel coding rates. In [32], the streaming quality of a video is represented by the number of layers that can be supported. Hence, UEP is achieved by determining the number of video layers for each user, given the constraint on the number of subcarriers. Our previous work in [33] targeted power-efficient MVC video transmission, in which the objective is to minimize the transmission power subject to a video quality guarantee.

To sum up, existing studies on UEP for 3D video streaming mainly utilize channel coding design at the physical layer or FEC at the application layer. In contrast, MuVi exploits UEP through joint power allocation and antenna selection according to the distortion reduction of multi-view videos in spatial multiplexing MIMO systems. We further implement MuVi and evaluate its performance on the WARPv3 software-defined radio platform.

VIII. CONCLUSION

Robust transmission of multi-view video is very challenging in MIMO systems due to the high bandwidth requirement and complicated source coding structure. In this paper, we design and implement the MuVi system for multi-view video transmission over MIMO-OFDM systems. We propose a nonlinear mixed integer programming framework that combines power allocation with antenna selection for multi-view video aware transmission. To achieve UEP for MVC videos, we present two-stage algorithms for antenna selection and power allocation. Our MuVi system is implemented on a software radio platform and is compatible with 802.11 standards. Extensive experiments show that MuVi significantly outperforms the conventional scheme that is agnostic to MVC videos. For real traces of 2 × 2 MIMO, the PSNR of MuVi is more than 3 dB higher than that of the conventional scheme. For simulations of 4 × 4 MIMO, MuVi achieves nearly optimal performance at low SNRs.

APPENDIX

A. Proof of Theorem 1

Proof: From the optimization problem (6), we know that when $x_j^{(v)} = 0$, the data unit of view $v$ with distortion reduction $\Delta D_r^{(v)}$ is not transmitted by antenna $j$, and the expected distortion reduction is simply 0. Thus, we only need to maximize the utility function $U(j, v)$ in the case $x_j^{(v)} = 1$.

The first-order and second-order partial derivatives of $U(j, v)$ can be derived for each $x_j^{(v)} = 1$ according to the Leibniz integral rule. For convenience, we let $S_j = [\mathbf{H} \cdot \mathbf{W}]_{j,j}^2 / \sigma^2$, so that $\gamma_j = p_j S_j$.

The first-order partial derivative of $U(j, v)$ is
$$\frac{\partial U(j, v)}{\partial p_j} = \frac{l_v S_j \exp(-p_j S_j)\, \eta^{(v)}}{2 \sqrt{\pi S_j p_j}\, B}. \qquad (24)$$

The second-order partial derivative of $U(j, v)$ is
$$\frac{\partial^2 U(j, v)}{\partial p_j^2} = -\frac{S_j l_v \exp(-2 S_j p_j)\, \eta^{(v)}}{4 p_j \pi B^2} - \frac{A\, \eta^{(v)}}{2 \sqrt{\pi S_j p_j}\, B} - \frac{A\, \eta^{(v)}}{4 \sqrt{\pi S_j p_j}\, S_j p_j B} \qquad (25)$$
where $A = l_v S_j^2 \exp(-p_j S_j)$, $B = 1 - \frac{1}{2}\operatorname{erfc}(\sqrt{p_j S_j})$, and $\operatorname{erfc}(x)$ is the complementary error function.

By definition, $p_j > 0$, $S_j > 0$, $\eta^{(v)} > 0$, and $l_v > 0$, so that $0 < \operatorname{erfc}(\sqrt{p_j S_j}) \le 1$ and hence $B \ge \frac{1}{2} > 0$; note that $p_j = 0$ is impossible in practice. Every term on the right-hand side of (25) is therefore negative, i.e., $\frac{\partial^2 U(j, v)}{\partial p_j^2} < 0$ for $p_j > 0$, so $[U(j, v) \,|\, \mathbf{x}]$ is a concave function of every $p_j$. Therefore, the power allocation problem (10) is a convex optimization problem. $\blacksquare$
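The concavity claim can also be probed numerically: by (24) and (25), $\partial U / \partial p_j$ must be strictly decreasing in $p_j$. A minimal check with illustrative values of $S_j$, $l_v$, and $\eta^{(v)}$ (the real parameters come from the channel and video traces):

```python
import math

def dU(p, S=2.0, l=1000.0, eta=1.0):
    """First derivative of U(j, v) in p_j per (24); S, l, eta are illustrative."""
    B = 1.0 - 0.5 * math.erfc(math.sqrt(p * S))  # the B term from (24)
    return l * S * math.exp(-p * S) * eta / (2.0 * math.sqrt(math.pi * S * p) * B)

# A strictly decreasing derivative over a power grid is consistent with (25) < 0.
grid = [0.01 * k for k in range(1, 200)]
vals = [dU(p) for p in grid]
decreasing = all(a > b for a, b in zip(vals, vals[1:]))
```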

B. Proof of Theorem 2

Proof: According to the KKT conditions, we have the associated Lagrangian function
$$\mathcal{L}(\mathbf{p}, \lambda_0, \lambda_1, \cdots, \lambda_M) = \sum_{j=1}^{M} \sum_{v=1}^{M} \left[ x_j^{(v)} U(j, v) \,|\, \mathbf{x} \right] + \lambda_0 \left( \sum_{j=1}^{M} \sum_{v=1}^{M} x_j^{(v)} p_j - P \right) - \sum_{j=1}^{M} \lambda_j p_j. \qquad (26)$$

The KKT conditions yield
$$\left( \frac{\partial \left[ x_j^{(v)} U(j, v) \,|\, \mathbf{x} \right]}{\partial p_j} + \lambda_0 x_j^{(v)} \right) p_j = 0. \qquad (27)$$

Since the allocated power satisfies $p_j > 0$ for each transmission in practice, we have
$$\frac{\partial \left[ x_j^{(v)} U(j, v) \,|\, \mathbf{x} \right]}{\partial p_j} + \lambda_0 x_j^{(v)} = 0. \qquad (28)$$

To compute $\lambda_0$, we multiply each instance of (28) by $p_j$, sum over all $x_j^{(v)}$, and substitute $\sum_{j=1}^{M} \sum_{v=1}^{M} x_j^{(v)} p_j - P = 0$. To distinguish the summation index from the $j$ in (28), we replace $j$ by $k$:
$$\sum_{k=1}^{M} \sum_{v=1}^{M} \frac{\partial \left[ x_k^{(v)} U(k, v) \,|\, \mathbf{x} \right]}{\partial p_k}\, p_k + \lambda_0 \sum_{k=1}^{M} \sum_{v=1}^{M} x_k^{(v)} p_k = 0. \qquad (29)$$

Then, $\lambda_0$ is derived as
$$\lambda_0 = -\frac{1}{P} \sum_{k=1}^{M} \sum_{v=1}^{M} \frac{\partial \left[ x_k^{(v)} U(k, v) \,|\, \mathbf{x} \right]}{\partial p_k} \cdot p_k. \qquad (30)$$

Finally, substituting $\lambda_0$ into (28), we obtain
$$\frac{\partial \left[ x_j^{(v)} U(j, v) \,|\, \mathbf{x} \right]}{\partial p_j} = \frac{x_j^{(v)}}{P} \sum_{k=1}^{M} \sum_{v=1}^{M} \frac{\partial \left[ x_k^{(v)} U(k, v) \,|\, \mathbf{x} \right]}{\partial p_k} \cdot p_k. \qquad (31)$$

Therefore, (13) holds, which concludes the proof. $\blacksquare$
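Condition (28) says that every active antenna operates at the same marginal utility $-\lambda_0$. Since each $\partial U(j,v)/\partial p_j$ is strictly decreasing in $p_j$ (Theorem 1), a two-level bisection can recover the allocation: an inner bisection inverts the marginal utility per antenna, and an outer bisection tunes the common marginal so the powers sum to $P$. This is a sketch under the derivative form of (24) with illustrative $S_j$, $l_v$, and $\eta^{(v)}$ values, not the paper's algorithm listing:

```python
import math

def dU(p, S, l=1000.0, eta=1.0):
    """Marginal utility per (24); l and eta are illustrative constants."""
    B = 1.0 - 0.5 * math.erfc(math.sqrt(p * S))
    return l * S * math.exp(-p * S) * eta / (2.0 * math.sqrt(math.pi * S * p) * B)

def power_for_marginal(mu, S, lo=1e-9, hi=1e3):
    """Invert dU(., S) = mu by bisection; dU is strictly decreasing in p."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if dU(mid, S) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate(P, Ss):
    """Outer bisection on the common marginal mu so the powers sum to P."""
    mu_lo, mu_hi = 1e-6, 1e9
    for _ in range(200):
        mu = math.sqrt(mu_lo * mu_hi)  # geometric midpoint for a wide range
        if sum(power_for_marginal(mu, S) for S in Ss) > P:
            mu_lo = mu  # too much power allocated: demand a higher marginal
        else:
            mu_hi = mu
    return [power_for_marginal(mu, S) for S in Ss]

powers = allocate(P=4.0, Ss=[1.0, 2.0, 4.0])
```

At convergence all antennas share (numerically) the same marginal utility, which is exactly the equalization that (28) and (31) express.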

REFERENCES

[1] J. Xiong, K. Sundaresan, K. Jamieson, M. A. Khojastepour, and S. Rangarajan, “MIDAS: Empowering 802.11ac networks with multiple-input distributed antenna systems,” in Proc. 10th ACM Int. Conf. Emerg. Netw. Exp. Technol., 2014, pp. 29–40.

[2] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461–471, Feb. 2004.

[3] B. Yan and H. Gharavi, “Efficient error concealment for the whole-frame loss based on H.264/AVC,” in Proc. IEEE 15th Int. Conf. Image Process., Oct. 2008, pp. 3064–3067.

[4] A. Vetro, T. Wiegand, and G. J. Sullivan, “Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard,” Proc. IEEE, vol. 99, no. 4, pp. 626–642, Apr. 2011.

[5] A. Wiesel, Y. C. Eldar, and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4409–4418, Sep. 2008.

[6] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2005.

[7] A. A. Khalek, C. Caramanis, and R. W. Heath, “Loss visibility optimized real-time video transmission over MIMO systems,” IEEE Trans. Multimedia, vol. 17, no. 10, pp. 1802–1817, Oct. 2015.

[8] Z. Orlov, “Network-driven adaptive video streaming in wireless environments,” in Proc. IEEE 19th Int. Symp. Pers., Indoor, Mobile Radio Commun., Sep. 2008, pp. 1–6.

[9] H. Zhang, Y. Zheng, M. A. Khojastepour, and S. Rangarajan, “Cross-layer optimization for streaming scalable video over fading wireless networks,” IEEE J. Sel. Areas Commun., vol. 28, no. 3, pp. 344–353, Apr. 2010.

[10] M. D. Toksarı and E. Guner, “Minimizing the earliness/tardiness costs on parallel machine with learning effects and deteriorating jobs: A mixed nonlinear integer programming approach,” Int. J. Adv. Manuf. Technol., vol. 38, no. 7, pp. 801–808, 2008.

[11] J. B. Rosen, “The gradient projection method for nonlinear programming. Part I. Linear constraints,” J. Soc. Ind. Appl. Math., vol. 8, no. 1, pp. 181–217, 1960.

[12] IEEE Standard for Information Technology—Telecommunications and information exchange between systems—Local and metropolitan area networks—Specific requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications—Amendment 4: Enhancements for Very High Throughput for Operation in Bands Below 6 GHz, IEEE Std 802.11ac(TM)-2013 (Amendment to IEEE Std 802.11-2012, as amended by IEEE Std 802.11ae-2012, IEEE Std 802.11aa-2012, and IEEE Std 802.11ad-2012), pp. 1–425, 2013.

[13] X. Xie, X. Zhang, and K. Sundaresan, “Adaptive feedback compression for MIMO networks,” in Proc. 19th Annu. Int. Conf. Mobile Comput. Netw., 2013, pp. 477–488.

[14] J. Xiong and K. Jamieson, “ArrayTrack: A fine-grained indoor location system,” in Proc. USENIX Conf. Netw. Syst. Des. Implementation, 2013, pp. 71–84.

[15] J. Xiong and K. Jamieson, “Towards fine-grained radio-based indoor location,” in Proc. 12th Workshop Mobile Comput. Syst. Appl., 2012, pp. 1–6.

[16] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Predictable 802.11 packet delivery from wireless channel measurements,” in Proc. ACM SIGCOMM Conf., 2010, pp. 159–170.

[17] “MVC software manual: JMVC 8.5,” Joint Video Team of the ISO/IEC MPEG and the ITU-T VCEG, doc. JVT-B118r2, 2011.

[18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

[19] D. Song and C. W. Chen, “Scalable H.264/AVC video transmission over MIMO wireless systems with adaptive channel selection based on partial channel information,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1218–1226, Sep. 2007.

[20] Q. Liu, S. Liu, and C. W. Chen, “A novel prioritized spatial multiplexing for MIMO wireless system with application to H.264 SVC video,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2010, pp. 968–973.

[21] A. Adeyemi-Ejeye and S. Walker, “4KUHD H264 wireless live video streaming using CUDA,” J. Elect. Comput. Eng., vol. 2014, 2014, Art. no. 183716.

[22] C. Zhou, C. W. Lin, X. Zhang, and Z. Guo, “A novel JSCC scheme for UEP-based scalable video transmission over MIMO systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 6, pp. 1002–1015, Jun. 2015.

[23] S. H. Chang, J. P. Choi, P. C. Cosman, and L. B. Milstein, “Optimization of multimedia progressive transmission over MIMO channels,” IEEE Trans. Veh. Technol., vol. 65, no. 3, pp. 1244–1260, Mar. 2016.

[24] S. Jakubczak and D. Katabi, “A cross-layer design for scalable mobile video,” in Proc. Annu. Int. Conf. Mobile Comput. Netw., 2011, pp. 289–300.

[25] X. L. Liu, W. Hu, Q. Pu, F. Wu, and Y. Zhang, “ParCast: Soft video delivery in MIMO-OFDM WLANs,” in Proc. Annu. Int. Conf. Mobile Comput. Netw., 2012, pp. 233–244.

[26] X. L. Liu et al., “ParCast+: Parallel video unicast in MIMO-OFDM WLANs,” IEEE Trans. Multimedia, vol. 16, no. 7, pp. 2038–2051, Nov. 2014.

[27] A. Vosoughi, V. Testoni, P. Cosman, and L. Milstein, “Joint source-channel coding of 3D video using multiview coding,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2013, pp. 2050–2054.

[28] T. Fujihashi, S. Kodera, S. Saruwatari, and T. Watanabe, “Wireless multi-view video streaming with subcarrier allocation by frame significance,” in Proc. IEEE 80th Veh. Technol. Conf., Sep. 2014, pp. 1–5.

[29] D. He, C. Luo, F. Wu, and W. Zeng, “Swift: A hybrid digital-analog scheme for low-delay transmission of mobile stereo video,” in Proc. ACM Int. Conf. Model., Anal., Simul. Wireless Mobile Syst., 2015, pp. 327–336.

[30] O. H. Salim and W. Xiang, “A novel unequal error protection scheme for 3-D video transmission over cooperative MIMO-OFDM systems,” EURASIP J. Wireless Commun. Netw., vol. 2012, 2012, Art. no. 269.

[31] O. Salim, W. Xiang, and J. Leis, “An efficient unequal error protection scheme for 3-D video transmission,” in Proc. IEEE Wireless Commun. Netw. Conf., Apr. 2013, pp. 4077–4082.

[32] Z. Yang and X. Wang, “Scalable video broadcast over downlink MIMO-OFDM systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 2, pp. 212–223, Feb. 2013.

[33] Z. Chen, X. Zhang, Y. Xu, and X. Wang, “POM: Power efficient multi-view video streaming over multi-antenna wireless system,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2016, pp. 1–6.

Zhe Chen (S’17) is currently working toward the Ph.D. degree at the School of Computer Science, Fudan University, Shanghai, China.

His research interests include designing and implementing practical large-scale MU-MIMO systems and Internet-of-Things applications.

Xu Zhang received the B.S. degree in computer science and the M.Sc. degree in computer architecture from Fudan University, Shanghai, China, in 2013 and 2016, respectively.

He serves regularly as a Reviewer for many prestigious international academic conferences and journals. His research interests include computer networks and algorithm engineering, especially wireless networks and multimedia transmission systems.



Yuedong Xu received the B.S. degree from Anhui University, Hefei, China, in 2001, the M.S. degree from the Huazhong University of Science and Technology, Wuhan, China, in 2004, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, China, in 2009.

He is a tenured Associate Professor with the School of Information Science and Technology, Fudan University, Shanghai, China. From late 2009 to 2012, he was a Postdoc with INRIA Sophia Antipolis, Biot, France, and Université d’Avignon, Avignon, France. His research interests include performance evaluation, optimization, security, data analytics and economic analysis of communication networks, and mobile computing.

Jie Xiong received the B.Eng. degree from Nanyang Technological University, Singapore, in 2005, the M.Sc. degree from Duke University, Durham, NC, USA, in 2009, and the Ph.D. degree in computer science from University College London, London, U.K., in 2015.

He is an Assistant Professor in the School of Information Systems, Singapore Management University, Singapore. His research interests include building practical wireless and mobile systems that bridge the gap between theory and reality. His recent work appears at MobiCom, NSDI, CoNEXT, UbiComp, and INFOCOM.

Prof. Xiong was the recipient of the prestigious Google European Doctoral Fellowship in Wireless Networking for his doctoral studies. His Ph.D. thesis was the 2016 British Computer Society Distinguished Dissertation Award runner-up.

Yu Zhu received the B.Eng. (Hons.) degree in electronics engineering and the M.Eng. (Hons.) degree in communication and information engineering from the University of Science and Technology of China, Hefei, China, in 1999 and 2002, respectively, and the Ph.D. degree in electronic and computer engineering from The Hong Kong University of Science and Technology, Hong Kong, China, in 2007.

Since 2008, he has been with Fudan University, Shanghai, China, where he is currently a Professor with the Department of Communication Science and Engineering. His current research interests include broadband wireless communication systems and networks, and signal processing for communications.

Prof. Zhu was the recipient of the Shanghai Pujiang Scholar Award in 2008 and the Fudan Juoxue Award in 2012.

Xin Wang (M’07) received the B.S. degree in information theory and the M.S. degree in communication and electronic systems from Xidian University, Xi’an, China, in 1994 and 1997, respectively, and the Ph.D. degree in computer science from Shizuoka University, Shizuoka, Japan, in 2002.

In 1995 and 1998, he was working on China’s pioneering telecom-level video conferencing systems and DVB-S systems with Huawei, Inc., Shenzhen, China. He is currently a Professor with Fudan University, Shanghai, China. His research interests include quality of network service, next-generation network architecture, mobile Internet, and network coding.

Prof. Wang is a Member of CCF and ACM.