
Creative Commons Attribution-NonCommercial (CC BY-NC).

JOURNAL OF COMMUNICATIONS AND NETWORKS, VOL. 21, NO. 4, AUGUST 2019 339

Delay-Aware Fountain Codes for Video Streaming with Optimal Sampling Strategy

Kairan Sun, Huazi Zhang, Ying Gao, and Dapeng Wu

Abstract: The explosive demand for online video from smart mobile devices poses unprecedented challenges to delivering high quality of experience (QoE) over wireless networks. Streaming high-definition video with low delay is difficult mainly due to the stochastic nature of wireless channels and the fluctuating video bit rate. To address this, we propose a novel delay-aware fountain coding (DAF) technique that integrates channel coding and video coding. In this paper, we reveal that the fluctuation of video bit rate can also be exploited to further improve fountain codes for wireless video streaming. Specifically, we develop two coding techniques: the time-based sliding window and the optimal window-wise sampling strategy. By adaptively selecting the window length and optimally adjusting the sampling pattern according to the ongoing video bit rate, the proposed schemes deliver significantly higher video quality than existing schemes, with low delay and constant data rate. To validate our design, we implement the protocols of DAF, DAF-L (a low-complexity version), and the existing delay-aware video streaming schemes by streaming H.264/AVC and high efficiency video coding (HEVC) standard videos over an 802.11b network on the CORE emulation platform. The results show that the decoding ratio of our scheme is 15% to 100% higher than that of the state-of-the-art techniques.

Index Terms: Delay-awareness, fountain codes, video streaming, wireless communications.

I. INTRODUCTION

IN the recent decade, we have witnessed the bloom of video services over the Internet. Some of them provide pre-recorded video streams, such as YouTube and Netflix; others provide live video communications, such as Skype and FaceTime. As expected, a huge amount of multimedia content will be generated and consumed. On the other hand, the prevalent smart mobile devices make this content more accessible to people than ever. Thanks to the evolution of communication technologies, such as 3G/4G, LTE, and WiFi, wireless networks are widely available in our daily lives. However, despite these promising developments, the stochastic nature of wireless channels still persists: its vulnerability to channel noise, inter-user interference, and low data rate under mobility. These problems easily worsen in video-dominant applications, where the requirement on channel quality is the highest. As a result, how to stream videos with low delay, stable data rate, and high quality over wireless networks poses formidable challenges to the communications community.

Manuscript received August 13, 2018; approved for publication by Sang-Hyo Kim, Division I Editor, February 23, 2019.

This work was supported in part by NSF ECCS-1509212 and NSFC 61529101.

K. Sun, D. Wu, and H. Zhang are with Department of Electrical & Computer Engineering, University of Florida, USA, email: [email protected], [email protected], [email protected].

Y. Gao is with School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China, email: [email protected].

Y. Gao is the correspondence author.

Digital Object Identifier: 10.1109/JCN.2019.000024

To provide reliable video transmission, advanced coding and signal processing techniques, such as forward error correction (FEC) erasure codes, have been proposed. One important class of FEC codes is fountain codes [1], such as the Luby transform (LT) code [2] and the Raptor code [3]. Fountain codes are ideal for wireless video streaming for the following reasons: (i) Retransmission-free property: Fountain codes reconstruct the original data using the redundancy sent by the sender, without demanding ACK or retransmission; (ii) efficient broadcast/multicast: Different receivers can decode from different subsets of received packets as long as the number of received packets exceeds a threshold; (iii) universality: The actual transmission rate automatically approaches channel capacity without the use of channel estimation, which has been proven in [4].

Traditional fountain codes were initially designed for the complete decoding of the entire original file. That means, if a video file is transmitted using traditional fountain codes, users cannot watch it until the whole video file is successfully decoded. However, many video streaming applications are delay-aware and loss-tolerant, which means (i) the time interval between video being generated and being played cannot exceed a certain threshold; and (ii) partial decoding is tolerable, although a higher decoding ratio is still preferred. In order to introduce delay awareness into fountain codes, the most intuitive solution is to partition the video file into fixed-length data blocks, encode them separately, and transmit them sequentially. We call this method the block coding scheme. From the perspective of video transmission, a smaller block size is preferred, because it leads to shorter playback latency. From the perspective of fountain codes, however, the block size needs to be as big as possible to maintain a smaller coding overhead [5]. The fundamental trade-off between video watching experience and coding performance is crucial for the design of delay-aware fountain codes.

In this paper, we propose a novel delay-aware fountain code scheme for video streaming that deeply integrates channel coding and video coding. Our scheme is based on sliding window fountain codes (SWFC) [6], which partition a file into many overlapping data windows and transmit them sequentially. Although there have been a number of works on joint fountain- and video-coding design, such as [6]–[10], they do not fully exploit the characteristics of multimedia content in order to optimize video watching experience.

The novelty of our scheme lies in that we do not treat the sliding windows as homogeneous, which, according to existing methods, have fixed length and uniform sampling distribution. On the contrary, our scheme optimizes every window and thus is a deep-level joint design of multimedia streaming and channel coding. Innovatively, we take into account video bit rate fluctuation and video coding parameters, such as group of pictures (GOP) size and frame rate, at the level of channel coding, and exploit them in the design of a novel delay-aware fountain coding approach. As a result, the proposed schemes take advantage of all the benefits of fountain codes and optimize them in the context of delay-aware video applications. According to a novel performance metric, i.e., in-time decoding ratio, which better reflects the real video watching experience, our methods significantly outperform existing schemes.

1229-2370/19/$10.00 © 2019 KICS

The contributions and key techniques of this paper are summarized below.

1. We propose a time-based sliding window scheme to provide the much desired delay awareness in video streaming. Unlike the existing SWFC schemes that have a fixed number of packets in each window, our scheme adaptively selects window lengths according to the number of bits in frames. In this way, we can maximize the code word length in the coding blocks, so as to achieve higher coding gain within a bounded playback delay.

2. We propose an optimal window-wise sampling strategy, in order to deliver a consistent watching experience. Since all the existing SWFCs uniformly sample and encode the packets within each window, due to video bit rate fluctuation, the received video quality may be time-varying. By optimally adjusting the sampling pattern according to the ongoing video bit rate, the proposed technique delivers a significantly higher decoding ratio than existing schemes.

3. We develop a delay-aware fountain codes protocol (DAF) by integrating all the above-mentioned techniques to deliver the optimal solution. To reduce computational complexity, we also propose a sub-optimal yet low-complexity version called DAF-L. By comparing with its conventional counterparts, such as [6], [9], [10], in various scenarios, our approach is shown to yield the best overall performance.

An extended discussion of this work can be found in [11]. In [11], we discussed some methods to lower the complexity of DAF codes. By utilizing the model predictive control (MPC) technique, we achieved a sub-optimal and linear-complexity solution to DAF code encoding, so that it can be applied to live video transmission. The theories and schemes proposed in this paper are the basis of [11].

The paper is organized as follows. Section II proposes the time-based sliding window scheme and compares our distinct designs with related work. Section III proposes our optimal window-wise sampling strategy. Section IV designs the DAF and DAF-L systems using all the techniques proposed in the previous sections. Section V gives the simulation results. Section VI concludes the paper.

II. DELAY-AWARE SLIDING WINDOW FOUNTAIN CODES

Our work focuses on a deep integration of fountain codes and video coding; hence, the concepts in both fountain codes and video coding will be frequently referred to. We define two sets of variables in Tables 1 and 2. Table 1 lists the variables related to fountain codes, and Table 2 lists the properties related to video coding.

Table 1. Definitions of the notations for the variables related to fountain codes.

| Notation | Unit | Definition |
|---|---|---|
| $R$ | byte/s | Data rate. |
| $C$ | N/A | Code rate. |
| PLR | N/A | Packet loss rate. |
| $\Delta t$ | frame | Step size. The number of frames the window shifts each time it slides forward. |
| $W$ | frame | Sliding window size. |
| $w_W(t)$ | packet | Number of native packets in the sliding window starting from the $t$th frame. |
| $P(t)$ | N/A | Sampling probability. The average number of times for a packet being sampled in frame $t$. |
| $k$ | packet | Total number of native packets. |
| $N$ | packet | Total number of coded packets. |
| $N_W$ | packet | Number of coded packets to be sent within each sliding window. |
| $N_{\mathrm{window}}$ | window | Number of windows. |

Table 2. Definitions of the notations for the variables related to video coding.

| Notation | Unit | Definition |
|---|---|---|
| $F$ | frame/s | Frame rate. |
| $T_{\mathrm{Delay}}$ | frame | Tolerable end-to-end delay. |
| $s(frmno)$ | packet | Number of native packets in the $frmno$th frame. |
| $N_{\mathrm{GOP}}$ | frame | GOP size. The number of frames in a GOP. |
| $T$ | frame | Video length. Total number of frames in the video sequence. |
| $pkt(t_0, k)$ | packet | The number of packets in the first $k$ frames in the window starting from the $t_0$th frame of the video. |
| $pktno(frmno)$ | packet no. | Starting packet sequence number of the $frmno$th frame in the video. |
| $frmno(pktno)$ | frame no. | Sequence number of the frame to which the $pktno$th original packet belongs. |

For notational simplicity, all the concepts relating to “time” in this article are actually in the unit of “number of frames.” We can then get the total number of native packets $k$ as $k = \sum_{t=1}^{T} s(t)$.

The definitions of the other variables will be introduced when they are used later in this article.

In the following two subsections, we discuss two key novel designs of the proposed scheme and contrast them with existing designs.

A. Sliding Window vs. Block Coding

The concept of SWFC was first proposed in [6] for LT codes. Then, a similar idea was extended to Raptor codes, and unequal error protection (UEP) was applied in [7]. As shown in Fig. 1(a), the block coding scheme has a relatively small block size, and the coded packets for each block are only linked to the source packets in a small window. In Fig. 1(b), however, the overlap between sliding windows allows decoded packets in one window to help the decoding of other windows. In that sense, sliding window schemes virtually extend the block size, which enhances the performance of fountain codes by reducing the overhead.

Fig. 1. Comparison of coding structure: (a) Block coding scheme and (b) sliding window scheme.

In [9], [10], the authors proposed an expanding window fountain code scheme. Instead of using overlapping fixed-size windows, the packets in each window must be a subset of the next window. In other words, the starting position of each window is fixed, but the end position moves forward, so the size of the window expands with time. Although this scheme is also delay-aware, it is not suitable for video streaming, since the decoding probability is unbalanced. For example, if the window size is increased by a fixed length of block in each time slot, after the accumulation of $T$ time slots, the sampling probability of the first block will be $T \times \sum_{k=1}^{T} (1/k)$ times that of the last one. On the other hand, [8] uses a block coding scheme, and its virtual block size is expanded by duplicating all symbols in each block. However, none of the above schemes carefully examines the relationship between block size and the end-to-end delay in delay-aware applications.
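The imbalance of the expanding window can be checked numerically. The following is a minimal sketch, assuming that in time slot k the window holds blocks 1 to k and splits one unit of sampling budget uniformly among them; the function name `expanding_window_shares` is illustrative and does not come from [9], [10].

```python
from fractions import Fraction

def expanding_window_shares(T):
    """Accumulated sampling share of each block after T time slots.

    Assumption: in slot k the window contains blocks 1..k and samples
    them uniformly, so each of those blocks receives a share of 1/k.
    """
    shares = [Fraction(0)] * T
    for k in range(1, T + 1):      # window size in slot k
        for b in range(k):         # blocks 1..k (0-based index b)
            shares[b] += Fraction(1, k)
    return shares

T = 4
shares = expanding_window_shares(T)
ratio = shares[0] / shares[-1]     # first block relative to last block
harmonic = sum(Fraction(1, k) for k in range(1, T + 1))
print(ratio, ratio == T * harmonic)
```

Exact fractions are used so that the comparison with the closed-form ratio does not depend on floating-point rounding.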

In our design, we use an SWFC scheme. The step size between two consecutive windows is $\Delta t$. For simplicity, we assume that $W$ and $T$ are integral multiples of $\Delta t$. In order to avoid dividing the frames from one GOP into different windows, $\Delta t$ should be an integral multiple of $N_{\mathrm{GOP}}$.

A noteworthy difference between SWFC and block coding is the relationship among $\Delta t$, $T_{\mathrm{Delay}}$, and the window size $W$. For block coding, as shown in Fig. 2(a), the receiver can only start to play the content of the current block when the transmission of this block finishes, and the sender can only start to encode and send the next block's packets when all the packets in the next block are available; therefore, the end-to-end playback delay satisfies $T_{\mathrm{Delay}} \ge 2W$. An exception among block coding schemes that can achieve real-time streaming is systematic fountain codes, e.g., systematic Raptor codes [3]. However, the major advantage of systematic Raptor codes is their compatibility with traditional receivers that have no decoding capability, which accept only the systematic part and simply discard the check bits. If a systematic Raptor code is used for real-time video playback, it must use this compatible mode, because the reconstruction of lost data still requires a delay of one data block. Thus, without reconstruction, the performance of real-time systematic Raptor codes is the same as that of naive UDP transmission.

For SWFC, if the step size is $\Delta t$, the encoder can start to encode the next window as soon as the next $\Delta t$ frames are available, so the end-to-end playback delay satisfies $T_{\mathrm{Delay}} \ge W + \Delta t$. These relationships implicitly impose the maximal window size (which corresponds to the best coding efficiency) that we can set for both schemes. If we regard block coding as a special case of the sliding window with $\Delta t = W$, we can see that its window size cannot exceed $(1/2) \cdot T_{\mathrm{Delay}}$. We also know that the biggest window size is obtained when $\Delta t = 1$, as shown in Fig. 2(b).
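The delay constraint above can be turned into a small helper that returns the largest admissible window. This is a sketch under the stated assumptions that all quantities are in frames and that W must be an integral multiple of the step size; the function name `max_window_size` is hypothetical.

```python
def max_window_size(t_delay, step):
    """Largest W (frames) with W + step <= t_delay and W a multiple of step.

    Block coding corresponds to step == W, which gives W <= t_delay / 2.
    """
    if step <= 0 or t_delay < 2 * step:
        raise ValueError("need step > 0 and t_delay >= 2 * step")
    return ((t_delay - step) // step) * step

# Hypothetical tolerable delay of 20 frames.
for step in (1, 5, 10):
    print(step, max_window_size(20, step))
```

With a 20-frame delay budget, a step of 10 frames reproduces the block coding limit of half the delay, while a step of 1 frame leaves almost the whole budget to the window.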

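The number of windows to be sent, $N_{\mathrm{window}}$, can be obtained by $N_{\mathrm{window}} = (T - W)/\Delta t$. A very important derived parameter is $N_W$, the number of coded packets to be sent within each sliding window, which is a link between fountain codes and video coding: $N_W = (R \cdot \Delta t)/(F \cdot P)$. Here $P$ must carry units of bytes per packet for $N_W$ to be a packet count, so it is distinct from the sampling probability $P(t)$ of Table 1. Then, the total number of coded packets to be sent, $N$, can be defined based on $N_W$: $N = \lfloor N_{\mathrm{window}} \times N_W \rfloor = \lfloor R \cdot (T - W)/(F \cdot P) \rfloor$. The overall code rate $C$ is then defined as $C = k/N$.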
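As a worked instance of these formulas, the sketch below evaluates the derived parameters for one hypothetical configuration. The numeric values are made up for illustration, and the argument `packet_bytes` reflects the reading of P as a packet size in bytes, which is an interpretation based on the units rather than a definition given in the tables.

```python
import math

def coding_parameters(R, F, packet_bytes, T, W, step, k):
    """Return (N_window, N_W, N, C) from the relations in the text.

    R: data rate (byte/s), F: frame rate (frame/s),
    packet_bytes: assumed meaning of P (bytes per packet),
    T, W, step: video length, window size, step size (frames),
    k: total number of native packets.
    """
    n_window = (T - W) / step
    n_w = (R * step) / (F * packet_bytes)
    n = math.floor(n_window * n_w)
    return n_window, n_w, n, k / n

# Hypothetical values, chosen only to make the arithmetic easy to follow.
n_window, n_w, n, c = coding_parameters(
    R=120_000, F=30, packet_bytes=1000, T=300, W=20, step=5, k=840)
print(n_window, n_w, n, c)
```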

B. Time-based Window Size vs. Packet-based Window Size

The distinct aspect of our proposed scheme is that the size of the windows is based on time (or, equivalently, on the number of frames).

Although the specific methods may vary, a lot of existing work, such as [6]–[10], designed delay-aware fountain codes based on the following core idea: Group the video data into windows (either overlapping or non-overlapping), and send the windows one by one within each period of time. In the aforementioned work, the coding parameters, such as the size of the windows, the speed of window movement, and the total length of the data, are constant numbers based on the number of packets. Inherently, the authors considered the number of packets as an abstraction of the number of frames.

However, the packet-based schemes ignore an important characteristic of video data: each frame contains a different number of bits. Even if rate control techniques are used, they may inevitably lead to bit rate fluctuation and video quality degradation [12]. This fact makes packet-based windows different from time-based windows. Dividing the video streaming data into blocks with a fixed number of packets results in the following phenomena:

1. Improper partition of frames and GOPs: Because a frame may contain a varying number of packets, it is highly likely that packets from one frame, or one GOP, fall into two blocks, thus causing video playback errors.

2. Uncontrollable delay: Because each block contains a different number of frames, the delay varies from block to block, so the resulting delay is not known in advance. As a result, packet-based windows cannot be used directly in real-time or delay-aware systems. If a packet-based window must be used in a delay-aware system, we need to know the number of packets in each frame of the video beforehand and select the smallest number of packets in any $T_{\mathrm{Delay}}$-frame period as the window size. In that case, the tolerable delay $T_{\mathrm{Delay}}$ is underused for most of the time, which contradicts the design principle of making the best use of delay.

3. Unstable data rate: Using a fixed code rate, the encoder generates the same number of coded packets within any packet-based window. However, because of the nonuniformity of the video bit rate, the data rate will differ across time periods.

Fig. 2. Compare block vs. sliding window: (a) Block coding, (b) sliding window when $\Delta t = 1$, and (c) meanings of the colors in the blocks.

On the contrary, using the time-based window size resolves all the above issues: it ensures that each frame is grouped into a single window; it makes the best use of the delay at all times; and, if the encoder generates the same number of coded packets within any time-based window, the data rate is constant.
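The contrast between the two window definitions can be illustrated on a toy sequence. The sketch below is not taken from the paper's implementation: the per-frame packet counts in `s` are made up, and both helper names are hypothetical. It counts the native packets in each time-based window and the number of frames touched by each fixed-size packet block.

```python
def packets_per_time_window(s, W, step):
    """Native packets in each W-frame window; windows start every `step` frames."""
    return [sum(s[t:t + W]) for t in range(0, len(s) - W + 1, step)]

def frames_per_packet_block(s, block_packets):
    """Number of distinct frames touched by each block of `block_packets` packets."""
    frame_of_packet = [f for f, n in enumerate(s) for _ in range(n)]
    return [len(set(frame_of_packet[i:i + block_packets]))
            for i in range(0, len(frame_of_packet), block_packets)]

s = [8, 2, 2, 4, 1, 1, 1, 5]          # hypothetical packets per frame
time_windows = packets_per_time_window(s, W=4, step=2)
packet_blocks = frames_per_packet_block(s, block_packets=8)
print(time_windows)    # packets vary, but each window spans exactly 4 frames
print(packet_blocks)   # frames per block vary, so the delay varies
```

In the time-based case the window always covers the same playback duration, so the variation appears in the number of native packets; in the packet-based case the variation appears in the playback duration each block covers.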

III. OPTIMAL WINDOW-WISE SAMPLING STRATEGY

In order to simplify the analysis, the theories in this work are based on two basic assumptions: 1) All the packets in a video have the same importance; 2) packet losses follow a random loss model. We realize that different picture types have different importance, because an error affecting I-frames could have more negative consequences than one affecting P-frames. Therefore, we use videos encoded with one I-frame followed by all P-frames in all the examples and experiments.

Apart from the picture type, there are other factors that cause packets to differ in importance, such as the content of the picture, the relative position in a GOP, etc. However, the focus of this work is to illustrate the method to control the overall sampling distribution, and the uniform distribution is the simplest target; introducing unbalanced importance would complicate the problem. That problem can be addressed by assigning a different importance weight to each packet and performing unequal error protection (UEP), but it is outside the scope of this work.

A. Nonuniform Global Sampling Distribution Using SWFC

For an LT encoder, each coded packet is generated using the following two steps: (i) Randomly choose the degree $d_n$ of the packet from a degree distribution $r(d)$; (ii) choose $d_n$ input packets uniformly at random from the original packets, and obtain the coded packet as the exclusive-or (XOR) of those $d_n$ packets. The degree distribution $r(d)$ is a key part of the design. It must balance low computational complexity against high coverage: Some high-degree encoded packets should be generated so that no native packet is left disconnected; on the other hand, a sufficient proportion of low-degree packets should also be included to facilitate low-complexity belief propagation (BP) decoding. The optimization of the degree distribution $r(d)$ for LT codes has been well studied in the literature, e.g., [2], [13], [14]. It should be noted that these optimization procedures are all built on a common prerequisite: the sampling distribution should be uniform, because the highest efficiency of fountain codes is achieved using a uniform distribution.

Fig. 3. (a) The bit rate of foreman, and (b) the results of ASP using different SWFC schemes. The figures show the frames in the range of 40 to 260: (a) Number of bits in each frame for the CIF sequence and (b) comparison of the accumulated sampling probabilities using different sliding window schemes and optimization strategies for foreman. [Panel (a): x-axis "Frame no.", y-axis "Bits" (×10^4). Panel (b): x-axis "Frame no.", y-axis "Accumulated sampling probability"; legend: Non-overlap (block coding), Uniform within windows, Per-frame optimization, Slope-only optimization.]
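The two LT encoding steps described above can be sketched as follows. This is an illustrative example, not the encoder used in the paper: the ideal soliton distribution stands in for $r(d)$ as a placeholder (the optimized distributions of [2], [13], [14] are not reproduced here), and the names `ideal_soliton` and `lt_encode_packet` are hypothetical.

```python
import random

def ideal_soliton(k):
    """Ideal soliton weights for degrees d = 1..k (placeholder for r(d))."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def lt_encode_packet(native, rng, weights):
    """Generate one coded packet from equal-length native packets (bytes)."""
    k = len(native)
    # Step (i): draw the degree from the degree distribution.
    degree = rng.choices(range(1, k + 1), weights=weights)[0]
    # Step (ii): pick `degree` distinct native packets uniformly at random.
    chosen = rng.sample(range(k), degree)
    coded = bytearray(len(native[0]))
    for idx in chosen:
        for j, byte in enumerate(native[idx]):
            coded[j] ^= byte          # XOR the chosen packets together
    return sorted(chosen), bytes(coded)

rng = random.Random(0)                               # seeded for repeatability
native = [bytes([i] * 4) for i in range(1, 9)]       # 8 toy packets of 4 bytes
weights = ideal_soliton(len(native))
neighbors, payload = lt_encode_packet(native, rng, weights)
print(neighbors, payload.hex())
```

The uniform choice in step (ii) is the part that the window-wise sampling strategy of this section later replaces with a nonuniform choice inside each window.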

However, we want to point out that with the time-based sliding window, even if every window's sampling distribution is uniform, the overall sampling distribution may still be nonuniform. The reason is that the number of packets may differ from frame to frame. Using a time-based sliding window and a constant data rate, the fountain encoder generates the same number of coded packets within any time period of the same length. However, since each frame contains a different number of native packets, the average number of times each packet is sampled differs across frames. For simplicity, we call “the average number of times for a packet being sampled in frame $t$” the sampling probability of frame $t$, formally denoted $P(t)$. Considering that in uniform block coding the native packets in a window are uniformly sampled and the windows do not overlap, it is easy to understand that $P(t)$ is inversely proportional to the number of bits in frame $t$.

For example, as shown in Fig. 3(a), the bit rate is not constant for the CIF video sequence foreman. If the video is segmented into 20-frame blocks and the data rate of the block coding fountain code is constant, the fluctuation of the sampling probabilities is huge, as shown by the black line in Fig. 3(b).
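The block coding case can be reproduced on a toy sequence. The sketch below assumes that every block receives the same number of coded packets and that each coded packet samples uniformly over the native packets of its block, so the per-packet sampling count is inversely proportional to the number of packets in the block containing the frame. The function name and the packet counts are hypothetical.

```python
def block_sampling_probability(s, block_frames, n_w):
    """Average number of times a packet of each frame is sampled.

    s: packets per frame, block_frames: frames per block,
    n_w: sampling budget per block (same for every block).
    """
    probs = []
    for start in range(0, len(s), block_frames):
        block = s[start:start + block_frames]
        per_packet = n_w / sum(block)     # uniform over the block's packets
        probs.extend([per_packet] * len(block))
    return probs

s = [4, 4, 1, 1]                          # hypothetical packets per frame
probs = block_sampling_probability(s, block_frames=2, n_w=8)
print(probs)
```

A low bit rate block is sampled several times more often per packet than a high bit rate block, which is the kind of fluctuation the black line in Fig. 3(b) shows.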

In the SWFC scheme, instead of being related to only one window, the sampling probability of a frame is related to all the windows that cover it, as shown in (1).

$$P(t) = \sum_{\omega \,\in\, \{\text{all windows covering frame } t\}} p^{pkt}_{\omega}(t), \tag{1}$$

where $p^{pkt}_{\omega}(t)$ denotes the average sampling probability of each packet in frame $t$ within the window $\omega$. Here, $P(t)$ accumulates all the sampling probabilities through all the sliding windows covering frame $t$. We call it the accumulated sampling probability, or ASP, in the rest of this article. Because we assume that the bits within each frame have equal importance, all the packets in one frame are assumed to have the same sampling probability, which leads to (2).

$$p^{pkt}_{\omega}(t) = \frac{1}{s(t)}\, p^{frm}_{\omega}(t), \tag{2}$$

where $p^{frm}_{\omega}(t)$ denotes the average number of times (sampling probability) that the packets in frame $t$ are sampled within the window $\omega$, and $s(t)$ is the number of packets in frame $t$, as defined in Table 2.

For example, using a uniform-distribution sliding window (window size $W = 20$, step size $\Delta t = 5$) to slide through the video sequence foreman, we obtain the ASP shown as the red line in Fig. 3(b). The ASPs shown here are normalized, so that the average value of all the probabilities in one scheme is 1.
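The accumulation in (1) and (2) for a uniform sliding window can be sketched as below. Two choices here are assumptions rather than details given in the text: each window contributes one unit of sampling mass spread uniformly over its packets, and the normalization averages over frames. The function name `normalized_uniform_asp` and the packet counts are hypothetical.

```python
def normalized_uniform_asp(s, W, step):
    """Normalized ASP per frame for uniform sampling inside each window."""
    T = len(s)
    asp = [0.0] * T
    for start in range(0, T - W + 1, step):
        packets = sum(s[start:start + W])
        for t in range(start, start + W):
            asp[t] += 1.0 / packets       # per-packet share within this window
    mean = sum(asp) / T                   # assumed: average taken over frames
    return [p / mean for p in asp]

s = [2, 2, 2, 2, 6, 6, 6, 6]              # hypothetical packets per frame
asp = normalized_uniform_asp(s, W=4, step=2)
print([round(p, 3) for p in asp])
```

Frames near the start and end of the sequence are covered by fewer windows, so their ASP is lower even when the bit rate is constant; this edge effect is separate from the bit rate fluctuation discussed in the text.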

Fortunately, the overlapping property of the sliding window provides a way to stabilize the ASP: The sampling probabilities within each window can be assigned unequally to achieve the overall uniformity of the ASP.

Although selecting the best sampling distribution for each window is an optimization problem, we can still understand it intuitively as follows: If the vicinity of a window has a relatively low bit rate and the bit rate will get higher in the future, then, in order to make the overall sampling distribution as homogeneous as possible, we do not want to “waste” the sampling opportunities on the imminent frames, which have already been sampled too many times in previous windows; instead, the encoder should sample more from the future side of the window, so that it can compensate for the low sampling probability of the upcoming high bit rate frames.

In the rest of this section, we introduce some solutions to this problem. To give a glimpse of what the optimal window-wise sampling strategy can do, the blue and green lines in Fig. 3(b) show the resulting ASPs using different optimization strategies. We can see that they are significantly more stable than the non-optimal schemes, which are shown in red and black.

B. Per-frame Optimization Scheme

In order to optimize the sampling distributions for all windows, we must know the video length $T$, the window size $W$, and the number of packets in each frame $s(t)$ (or its vector form $\mathbf{s} = [s(1)\ s(2)\ \cdots\ s(T)]$). We define $p^{\mathrm{frm}}_{t}(i)$ to denote the probability of sampling the packets in the $i$th frame of the window starting from the $t$th frame. As in (2), the sampling probability for each packet in that frame within the window, $p^{\mathrm{pkt}}_{t}(i)$, is defined as (3).

$$p^{\mathrm{pkt}}_{t}(i) = \frac{1}{s(t+i-1)}\, p^{\mathrm{frm}}_{t}(i). \tag{3}$$

As in (1), the ASP for the tth frame is defined as (4).

$$P(t) = \sum_{t_0=t-W+1}^{t} p^{\mathrm{pkt}}_{t_0}(t-t_0+1) = \frac{1}{s(t)} \sum_{t_0=t-W+1}^{t} p^{\mathrm{frm}}_{t_0}(t-t_0+1). \tag{4}$$

For simplicity, this accumulation process does not consider step sizes $\Delta t$ other than 1. Because both the video length $T$ and the window size $W$ are defined to be integral multiples of $\Delta t$, if $\Delta t > 1$, all the parameters can be down-sampled by a factor of $\Delta t$. For example, with the new $T' = (1/\Delta t)\cdot T$, $W' = (1/\Delta t)\cdot W$, $s'(t) = \sum_{i=(t-1)\Delta t+1}^{t\cdot\Delta t} s(i)$, and $\Delta t' = 1$, (3) and (4) still hold.
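The down-sampling step can be illustrated as follows. This is a small sketch under the stated assumption that $T$ and $W$ are integral multiples of $\Delta t$; the helper name `downsample` is hypothetical.

```python
def downsample(s, W, dt):
    """Merge every dt consecutive frames so that the step size becomes 1.

    Returns (s_prime, T_prime, W_prime) with
    s'(t) = sum of s(i) for i = (t - 1) * dt + 1, ..., t * dt.
    """
    T = len(s)
    if T % dt != 0 or W % dt != 0:
        raise ValueError("T and W must be integral multiples of dt")
    # 1-based frames (t - 1) * dt + 1 .. t * dt are the 0-based slice below
    s_prime = [sum(s[(t - 1) * dt:t * dt]) for t in range(1, T // dt + 1)]
    return s_prime, T // dt, W // dt
```

For instance, six frames with 1 to 6 packets, $W = 4$ and $\Delta t = 2$ collapse to three merged frames with 3, 7 and 11 packets and a window of 2 merged frames.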

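Let $x_{t,i} = p^{\mathrm{frm}}_{t}(i)$ and arrange these values in a matrix. We get the parameter matrix $A$ as in (5).

$$A = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,W} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,W} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T-W+1,1} & x_{T-W+1,2} & \cdots & x_{T-W+1,W} \end{bmatrix}. \tag{5}$$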

The number of rows is $T - W + 1$ because there are $(T - W + 1)$ windows in total (again, $\Delta t$ is assumed to be 1). The number of columns is $W$ because each window has $W$ sampling probabilities. Because every row in the matrix represents the probability distribution within a window, the elements in $A$ must satisfy the constraints of (6).

$$\sum_{w=1}^{W} x_{t,w} = 1,\ \forall t; \qquad x_{t,w} \ge 0,\ \forall t, w. \tag{6}$$

With this notation, (4) can be rewritten into a parameterized form as (7).

$$P_A(t) = \frac{1}{s(t)} \sum_{t_0=t-W+1}^{t} x_{t_0,\,t-t_0+1}. \tag{7}$$

The objective is to find the optimal parameter matrix $A$, which minimizes the fluctuation of the ASPs, $P_A(t)$. Because the parameters to be optimized in this problem are the sampling probabilities for each frame of every window, we call this method the per-frame optimization scheme.

Fig. 4. The optimization result of sampling distributions for each window of foreman using per-frame optimization scheme. (Vertical axis: sampling probability within windows. Legend: per-frame optimized distributions, uniform distributions, trend of the bit rate.)

B.1 Problem Description

Given the total number of frames $T$, the window size $W$, and the number of packets in each frame $s(t)$, we want to find a set of parameters as in (5), for which the mean square error of the sampling probabilities of all packets attains its minimum value. The optimization problem is defined in (8).

$$\arg\min_{A} \sum_{t=W}^{T-W+1} \left(P_A(t) - \overline{P_A}\right)^2 \quad \text{s.t.} \quad \sum_{w=1}^{W} x_{t,w} = 1,\ \forall t; \quad x_{t,w} \ge 0,\ \forall t, w, \tag{8}$$

where $\overline{P_A} = \frac{1}{T-2W+2} \sum_{t=W}^{T-W+1} P_A(t)$.

It should be noted that the range of frames we want to stabilize is from $W$ to $T - W + 1$. Because the frames in that range are all covered by exactly $W$ sliding windows, they are deemed stable frames. On the other hand, the frames before $W$ or after $T - W + 1$ are covered by fewer than $W$ sliding windows, so they are considered to be warm-up/cool-down frames and are not counted as targets of the optimization.
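To make (7) and the objective of (8) concrete, the following sketch evaluates both for a given parameter matrix. It assumes $\Delta t = 1$ and stores $A$ as a list of rows with `A[t0 - 1][i - 1]` holding $x_{t_0,i}$; the function names are hypothetical.

```python
def asp_from_matrix(A, s, W):
    """P_A(t) of (7) for t = 1..T, returned as a 0-based list."""
    T = len(s)
    P = []
    for t in range(1, T + 1):
        total = 0.0
        # only windows that exist (1 <= t0 <= T - W + 1) and cover frame t
        for t0 in range(max(1, t - W + 1), min(t, T - W + 1) + 1):
            total += A[t0 - 1][t - t0]           # x_{t0, t - t0 + 1}
        P.append(total / s[t - 1])
    return P


def fluctuation(P, W):
    """Sum of squared deviations over the stable frames W .. T - W + 1."""
    T = len(P)
    stable = P[W - 1:T - W + 1]                  # T - 2W + 2 frames
    mean = sum(stable) / len(stable)
    return sum((p - mean) ** 2 for p in stable)
```

With a constant number of packets per frame and uniform rows in $A$, every stable frame receives the same ASP and the objective is zero; any variation in $s(t)$ makes it positive unless the rows of $A$ compensate.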

B.2 Solution

If the conditions $x_{t,w} \ge 0$ are ignored, this optimization problem can be solved using Lagrange multipliers. Otherwise, it can be solved using the Karush-Kuhn-Tucker (KKT) conditions.
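As a worked illustration of the equality-constrained case, the Lagrangian and its stationarity condition can be written out as follows. This is a sketch derived from (7) and (8) with $\Delta t = 1$; the multipliers $\lambda_t$ and the indicator notation are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\mathcal{L}(A, \boldsymbol{\lambda})
  &= \sum_{t=W}^{T-W+1} \left(P_A(t) - \overline{P_A}\right)^2
   + \sum_{t_0=1}^{T-W+1} \lambda_{t_0} \left(\sum_{w=1}^{W} x_{t_0,w} - 1\right), \\
\frac{\partial P_A(t)}{\partial x_{t_0,w}}
  &= \begin{cases} \dfrac{1}{s(t)}, & t = t_0 + w - 1, \\ 0, & \text{otherwise}, \end{cases} \\
\frac{\partial \mathcal{L}}{\partial x_{t_0,w}}
  &= \frac{2}{s(t_0+w-1)} \left(P_A(t_0+w-1) - \overline{P_A}\right)
     \mathbb{1}\!\left[W \le t_0+w-1 \le T-W+1\right] + \lambda_{t_0} = 0.
\end{align*}
```

The derivative of $\overline{P_A}$ drops out because the deviations $P_A(t) - \overline{P_A}$ sum to zero over the stable range. Since $P_A(t)$ is linear in the entries of $A$, these conditions together with the row-sum constraints of (6) form a linear system in the unknowns $x_{t,w}$ and $\lambda_t$.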

An example of the optimization result is shown in Fig. 4. It is the optimization result of sampling distributions for each window of the CIF sequence foreman using the per-frame optimization scheme, with window size $W = 20$ and step size $\Delta t = 5$. Because there are too many windows to be clearly shown in one figure, only a fraction of the windows is presented here. The probabilities are normalized. The trend of the bit rate, represented by the green line, is also plotted in the figure to indicate the relationship between the bit rate and the optimization results. The blue line in Fig. 3(b) shows the resulting ASP using this per-frame optimization strategy.

B.3 Computational Complexity

Because there are $W \times (T - W + \Delta t)/\Delta t^2$ variables to optimize and $((T - W)/\Delta t) + 1$ conditions for the Lagrange multiplier method (if using KKT conditions, there are $(W + \Delta t) \times (T - W + \Delta t)/\Delta t^2$ conditions), the optimization process yields a system of $(W + \Delta t) \times (T - W + \Delta t)/\Delta t^2$ equations (or $(2W + \Delta t) \times (T - W + \Delta t)/\Delta t^2$ equations with KKT conditions). Assuming that $T \gg W \gg \Delta t$, if we omit constant factors and lower-order terms, the solution with either KKT conditions or Lagrange multipliers involves the generation of a parameter matrix of size $(T \cdot W/\Delta t^2) \times (T \cdot W/\Delta t^2)$ and the computation of its inverse. As a result, the computational complexity is $O\big((T \cdot W/\Delta t^2)^3\big)$.

Fig. 5. The sampling distributions when slope factor a = 1, a = 0, and a = −1.

C. Slope-only Optimization Scheme

Although optimizing the sampling distribution for each frame within every window yields the optimal solution in terms of minimizing the fluctuation of sampling probabilities between frames, per-frame optimization is unrealistic in practical designs. First of all, there are too many parameters to be optimized. The computational complexity is $O\big(((T \cdot W)/\Delta t^2)^3\big)$, which is too high for large $T$ or $W$. The second problem of the per-frame optimization may not be obvious at this point, but it will become clearer when the coding scheme is introduced in Section IV: In order to reconstruct the coded packets on the decoder side, the encoder must tell the receiver what sampling distribution is used in each window, by explicitly including every frame’s sampling probability in the packet header. That introduces a large overhead in the packet header. Since bigger packets are more vulnerable to channel noise, including too much information in headers increases the packet loss rate in wireless networks. As a result, a more concise description of the sampling distributions is needed for practical designs, so that they can be obtained with lower computational complexity and be transmitted in the headers with fewer bits.

We introduce a slope-only description for the sampling distributions. It requires only one parameter, the slope factor, denoted as $a$, to control the shape of the distribution. However, it should be noted that using fewer bits inevitably loses precision in describing the sampling distributions. Therefore, compared to the optimal performance that can be achieved by using the per-frame description, the slope-only description may result in sub-optimal performance.

The slope factor is a real number ranging from −1 to 1. The distribution functions are defined to be linear functions, and the slope factor only controls their slopes. We do not want any of the packets’ probabilities to be 0, because in that case, the effective window size will shrink. As a result, we define that when the slope factor $a = 1$, the distribution function of the packets starts from 0 and increases linearly, forming a forward triangular distribution, as shown in the top row of Fig. 5; when the slope factor $a = 0$, the distribution function is a uniform distribution, as shown in the second row of Fig. 5; when the slope factor $a = -1$, the distribution function is the reverse of that for $a = 1$, i.e., a backward triangular distribution, as shown in the third row of Fig. 5. The distribution functions for all intermediate slope factor values are defined continuously between these cases.

Fig. 6. The distribution functions when slope factor a = 1, a = 0, and a = −1.

As defined in Table 1, for time $t$ in the video sequence, the number of packets in the window is $w_W(t)$. For each window, a linear distribution function can be defined over the interval $[0, w_W(t)]$. Because the integral of the function over $[0, w_W(t)]$ must be 1, we can derive the lines for different slope factors $a$. When the slope factor $a = 1$, the line passes through the points $(0, 0)$ and $\big(w_W(t), 2/w_W(t)\big)$, as shown by the red line in Fig. 6; when the slope factor $a = -1$, it passes through the points $\big(0, 2/w_W(t)\big)$ and $\big(w_W(t), 0\big)$, as shown by the green line in Fig. 6. The lines for all slope factors always pass through the point $\big((1/2) \cdot w_W(t), 1/w_W(t)\big)$. As a result, the distribution function given any $a$ and $t$ is (9).

$$y = \frac{2a}{w_W^2(t)}\, x + \frac{1-a}{w_W(t)}, \quad x \in [0, w_W(t)]. \tag{9}$$

As stated in Subsection III.B, the sampling probabilities of the packets within the same frame should be the same. As a result, the probabilities of sampling one frame should be grouped together, and the actual sampling probability of each packet should be the average value over all packets in its frame. In the example shown in Fig. 7, there are four frames, which contain 3, 4, 2, and 2 packets, respectively. The actual sampling probability for each packet is the average value of the distribution function over the interval of its frame. Given slope factor $a$, the probability of sampling the $i$th frame in the window starting from the $t$th frame, denoted as $p^{\mathrm{frm}}_{t,a}(i)$, is then defined as (10).

$$\begin{aligned} p^{\mathrm{frm}}_{t,a}(i) &= \int_{\mathrm{pkt}(t,i-1)}^{\mathrm{pkt}(t,i)} \left( \frac{2a}{w_W^2(t)}\, x + \frac{1-a}{w_W(t)} \right) dx \\ &= \left( \frac{2a}{w_W^2(t)} \left( \mathrm{pkt}(t,i) - \frac{s(t+i-1)}{2} \right) + \frac{1-a}{w_W(t)} \right) \times s(t+i-1), \quad \text{for } i = 1, 2, \cdots, W, \end{aligned} \tag{10}$$

where $\mathrm{pkt}(t, i)$ is defined in Table 2. The second equality holds because the distribution function is linear, so its average value over each interval is taken at the midpoint of the interval.

Fig. 7. An example of sampling distribution for each frame within a window. The window contains four frames, and the sampling probabilities of the packets within a same frame should be the same.
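The per-frame probabilities of (10) can be checked numerically with a short sketch. It assumes that $\mathrm{pkt}(t, i)$ is the cumulative number of packets in the window up to and including its $i$th frame, with $\mathrm{pkt}(t, 0) = 0$, which is consistent with the integration limits in (10) but is not restated here from Table 2. The function name `frame_probs` is hypothetical.

```python
def frame_probs(s_win, a):
    """p^frm_{t,a}(i) of (10) for one window.

    s_win : packets per frame inside the window, [s(t), ..., s(t + W - 1)]
    a     : slope factor in [-1, 1]
    """
    w = sum(s_win)                     # w_W(t), packets in the window
    probs = []
    pkt = 0                            # assumed pkt(t, i): cumulative packet count
    for s_i in s_win:
        pkt += s_i
        midpoint = pkt - s_i / 2       # middle of the frame's packet interval
        density = 2 * a / w ** 2 * midpoint + (1 - a) / w   # value of (9) there
        probs.append(density * s_i)    # area under the line over the interval
    return probs
```

For the four-frame example with 3, 4, 2 and 2 packets and $a = 1$, the frame probabilities are 9/121, 40/121, 32/121 and 40/121, which sum to 1 as required; dividing each by the frame's packet count gives the per-packet value.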

As in (2), given slope factor $a$, the probability of each packet in the $i$th frame within the $t$th window being sampled, denoted as $p^{\mathrm{pkt}}_{t,a}(i)$, is the average value of the distribution function over the interval $[\mathrm{pkt}(t, i-1), \mathrm{pkt}(t, i)]$, which is defined in (11).

$$p^{\mathrm{pkt}}_{t,a}(i) = \frac{p^{\mathrm{frm}}_{t,a}(i)}{s(t+i-1)} = \frac{2a}{w_W^2(t)} \left( \mathrm{pkt}(t,i) - \frac{s(t+i-1)}{2} \right) + \frac{1-a}{w_W(t)}, \quad \text{for } i = 1, 2, \cdots, W. \tag{11}$$

As in (1), the ASP for each packet in frame $t$, denoted as $P_{\mathbf{a}}(t)$, is defined in (12).

$$P_{\mathbf{a}}(t) = \sum_{t_0=t-W+1}^{t} p^{\mathrm{pkt}}_{t_0,\,a_{t_0}}(t-t_0+1), \tag{12}$$

$$\mathbf{a} = [a_1\ a_2\ \cdots\ a_{T-W+1}], \tag{13}$$

where $\mathbf{a}$ denotes the set of slope factors for all windows in the video sequence (from frame 1 to frame $(T - W + 1)$) as in (13). Again, for simplicity, the accumulation process does not consider step sizes $\Delta t$ other than 1.

We can rewrite (12) for clearer notations as in (14).

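$$P_{\mathbf{a}}(t) = \mathbf{d}_1(t) \cdot \mathbf{a} + d_2(t), \tag{14}$$

where “$\cdot$” denotes the dot product of two vectors of $(T - W + 1)$ elements, and

$$\begin{aligned} \mathbf{d}_1(t) &= [d_1(t,1)\ d_1(t,2)\ \cdots\ d_1(t,T-W+1)]; \\ d_1(t,i) &= \begin{cases} \dfrac{2 \cdot \mathrm{pkt}(i,\,t-i+1) - s(t)}{w_W^2(i)} - \dfrac{1}{w_W(i)}, & \text{if } i \in [t-W+1,\,t]; \\ 0, & \text{otherwise}; \end{cases} \\ d_2(t) &= \sum_{t_0=t-W+1}^{t} \frac{1}{w_W(t_0)}. \end{aligned} \tag{15}$$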

From (15) we can see that $\mathbf{d}_1$ and $d_2$ depend only on $\mathbf{s}$, $W$, and $t$, and are not influenced by $\mathbf{a}$.

With the equations defined above, we can describe the optimization problem as follows.

Fig. 8. The optimization result of sampling distributions for each window of foreman using slope-only optimization scheme. (Vertical axis: sampling probability within windows. Legend: optimized distributions, uniform distributions, trend of the bit rate.)

C.1 Problem Description

Given the total number of frames $T$, the window size $W$, and the number of packets in each frame $s(t)$, we want to find a set of slope factors as in (13), for which the mean square error of the sampling probabilities of all packets attains its minimum value. The optimization problem is defined in (16).

$$\arg\min_{\mathbf{a}} \sum_{t=W}^{T-W+1} \left(P_{\mathbf{a}}(t) - \overline{P_{\mathbf{a}}}\right)^2 \quad \text{s.t.} \quad -1 \le a_t \le 1,\ \forall t, \tag{16}$$

where $\overline{P_{\mathbf{a}}} = \frac{1}{T-2W+2} \sum_{t=W}^{T-W+1} P_{\mathbf{a}}(t)$. The range of frames we want to stabilize is also from $W$ to $T - W + 1$, for the same reason as stated for the per-frame scheme.

C.2 Solution

As in the per-frame scheme, the problem can be solved using KKT conditions. The resulting sampling distribution of each window under slope-only optimization is shown in Fig. 8, with the same settings as in Fig. 4. The green line in Fig. 3(b) shows the resulting ASP using this slope-only optimization strategy. We can observe that, in terms of the stability of the ASP, the slope-only scheme yields a worse result than the per-frame scheme.
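For experimentation, the box-constrained problem (16) can also be approached numerically. The sketch below is not the KKT-based solution used in the paper; it substitutes a simple coordinate descent with clipping to $[-1, 1]$, which is adequate here because the objective is a convex quadratic in each $a_t$. It assumes $\Delta t = 1$ and the same cumulative-count reading of $\mathrm{pkt}(t, i)$ as in (10); all function names are hypothetical.

```python
def slope_asp(s, W, a):
    """P_a(t) of (12) for every frame (0-based list), with step size 1."""
    T = len(s)
    P = [0.0] * T
    for t0 in range(T - W + 1):                   # 0-based window start
        win = s[t0:t0 + W]
        w = sum(win)                              # w_W(t0)
        pkt = 0                                   # assumed cumulative packet count
        for i, s_i in enumerate(win):
            pkt += s_i
            P[t0 + i] += 2 * a[t0] / w ** 2 * (pkt - s_i / 2) + (1 - a[t0]) / w
    return P


def objective(s, W, a):
    """Sum of squared deviations of the ASP over the stable frames, as in (16)."""
    T = len(s)
    stable = slope_asp(s, W, a)[W - 1:T - W + 1]
    mean = sum(stable) / len(stable)
    return sum((p - mean) ** 2 for p in stable)


def coordinate_descent(s, W, sweeps=50):
    """Minimize (16) one slope factor at a time, clipping to [-1, 1]."""
    n = len(s) - W + 1
    a = [0.0] * n                                 # start from uniform windows

    def value_at(j, v):
        old, a[j] = a[j], v
        val = objective(s, W, a)
        a[j] = old
        return val

    for _ in range(sweeps):
        for j in range(n):
            f_m, f_0, f_p = value_at(j, -1.0), value_at(j, 0.0), value_at(j, 1.0)
            curv = (f_p + f_m) / 2 - f_0          # coefficient of a_j^2
            lin = (f_p - f_m) / 2                 # coefficient of a_j
            if curv > 1e-15:
                a[j] = max(-1.0, min(1.0, -lin / (2 * curv)))
            elif f_m != f_p:
                a[j] = -1.0 if f_m < f_p else 1.0
    return a
```

Each update solves the one-dimensional problem exactly, so the objective never increases from one step to the next. The cost per sweep is far from optimized; the sketch is meant only to show the structure of the problem.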

C.3 Computational Complexity

Because there are $(T - W)/\Delta t + 1$ variables to optimize and $2 \times \big((T - W)/\Delta t + 1\big)$ conditions for KKT conditions, the optimization process yields a system of $3 \times \big((T - W)/\Delta t + 1\big)$ equations. Assuming that $T \gg W \gg \Delta t$, if we omit constant factors and lower-order terms, the solution involves the generation of a parameter matrix of size $(T/\Delta t) \times (T/\Delta t)$ and the computation of its inverse. As a result, the computational complexity is $O\big((T/\Delta t)^3\big)$. Compared to that of the per-frame scheme, the computational complexity of the slope-only scheme is lowered by a factor of $(W/\Delta t)^3$.

It should be pointed out that, although the complexity is lowered, DAF still suffers from high computational overhead when the video is long. In this work, our solution is to slice long videos into smaller segments, as introduced in Subsection IV.C. In order to fundamentally address this issue, we developed an algorithm that lowers the complexity from cubic to linear. The algorithm integrates the model predictive control (MPC) technique into DAF codes, but we do not discuss it in this article. The interested reader may refer to [11] for more details.

D. Degree Distribution

As mentioned in Section III, the degree distribution plays a crucial role in the performance of LT codes, and it has been studied in several works, such as [2], [13], [14]. The optimal degree distribution is based on the robust soliton distribution (RSD) proposed in [2]. It is derived by combining the probability mass function of the ideal soliton distribution (ISD) with an extra set of values. The definition of ISD is given in (17).

$$\rho(i) = \begin{cases} \dfrac{1}{k}, & \text{for } i = 1 \\ \dfrac{1}{i(i-1)}, & \text{for } i = 2, \cdots, k, \end{cases} \tag{17}$$

where $\rho(\cdot)$ stands for the ISD and $k$ denotes the data length. Then, a set of supplement values is added to the ISD, and the result is normalized so that the values add up to 1. The definition of RSD is given in (18).

$$\beta = \sum_{i=1}^{k} \big(\rho(i) + \tau(i)\big), \qquad r(i) = \big(\rho(i) + \tau(i)\big)/\beta, \quad \text{for } i = 1, \cdots, k, \tag{18}$$

where $r(\cdot)$ denotes the RSD, and $\tau(\cdot)$ is the set of supplement values described in [2], given in (19).

$$\tau(i) = \begin{cases} \dfrac{R}{i \cdot k}, & \text{for } i = 1, \cdots, \lfloor k/R \rfloor - 1 \\ \dfrac{R \cdot \ln(R/\delta)}{k}, & \text{for } i = \lfloor k/R \rfloor \\ 0, & \text{for } i = \lfloor k/R \rfloor + 1, \cdots, k, \end{cases} \tag{19}$$

where $R = c \cdot \ln(k/\delta)\sqrt{k}$. It has two parameters, $\delta$ and $c$, which should be derived from an optimization process, such as that in [15], in order to obtain the optimal performance. However, this is too expensive to compute on-line and is also out of our current scope. Thus, we fix them so that the RSD achieves good average performance in our implementations.
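Equations (17) to (19) translate directly into code. The following is a minimal sketch; the default values $c = 0.4$ and $\delta = 0.02$ are the ones the paper later reports using in its experiments, and the function name `robust_soliton` is hypothetical.

```python
import math


def robust_soliton(k, c=0.4, delta=0.02):
    """Return [r(1), ..., r(k)] following (17)-(19)."""
    R = c * math.log(k / delta) * math.sqrt(k)
    pivot = int(math.floor(k / R))               # the index floor(k / R)

    rho = [0.0] * (k + 1)                        # index 0 is unused
    rho[1] = 1.0 / k
    for i in range(2, k + 1):
        rho[i] = 1.0 / (i * (i - 1))             # ideal soliton, (17)

    tau = [0.0] * (k + 1)
    for i in range(1, pivot):
        tau[i] = R / (i * k)                     # first case of (19)
    if 1 <= pivot <= k:
        tau[pivot] = R * math.log(R / delta) / k # spike at floor(k / R)

    beta = sum(rho) + sum(tau)                   # normalization constant, (18)
    return [(rho[i] + tau[i]) / beta for i in range(1, k + 1)]
```

For example, `robust_soliton(100)` returns 100 non-negative values that sum to 1, with extra mass on degree 1 and on the spike at $\lfloor k/R \rfloor$ compared to the ISD.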

However, given the adjusted evaluation criteria in delay-aware applications, the value of the data length $k$ in the RSD needs to be changed. In (17), we must ensure that, within any window, there is at least one coded packet of degree 1, so that the in-time BP decoding can be triggered within the delay. As a result, $k$ should be set to $w_W$, which is the number of packets within a window. Because we also add the supplement values of $\tau(\cdot)$ to increase robustness, the performance is not overly sensitive to the data length. As a result, for simplicity, we set $k$ for each round of SWFC to the average value of $w_W$ of that sequence.

IV. SYSTEM DESIGN

In this section, we design a practical system for the delay-aware fountain code scheme. We name our protocol DAF, after the acronym of the scheme.

For the reasons stated in Subsection III.C, we design the DAF protocol based on the slope-only description and optimization scheme.

Fig. 9. The definition of warm-up/cool-down periods. (Labels: video stream, window size, warm-up period, cool-down period.)

Fig. 10. The header structure of a DAF packet.

A. Warm-up/Cool-down Period

If the window always contains $W$ frames and slides from the 1st frame to the $T$th frame with step size $\Delta t = 1$, all the frames will be covered by $W$ windows, except for the first $W - 1$ frames and the last $W - 1$ frames. Namely, the $i$th frame from the beginning and the $i$th frame from the end are covered by $i$ windows when $i \le W - 1$. We call those two periods the warm-up and cool-down periods (W/CP), as illustrated in Fig. 9, since they are undersampled and yield an unstable decoding ratio.
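The coverage pattern described above is easy to verify by counting. The following sketch uses a hypothetical helper, `window_coverage`, and counts only full windows of $W$ frames.

```python
def window_coverage(T, W, dt=1):
    """Number of sliding windows that cover each frame (0-based list)."""
    cover = [0] * T
    for start in range(0, T - W + 1, dt):        # only full windows of W frames
        for t in range(start, start + W):
            cover[t] += 1
    return cover
```

With $T = 6$ and $W = 3$ this gives `[1, 2, 3, 3, 2, 1]`: the first and last $W - 1 = 2$ frames are covered by fewer than $W$ windows.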

In our implementation, before the actual SWFC begins, both the encoder and the decoder obtain the length of the W/CP. The encoder fills these two periods with padding characters, and the decoder does the same and automatically marks those packets as decoded. Then, the SWFC is performed. The detailed procedures of the encoder and decoder are introduced in Subsections IV.C and IV.D. Also, for the sake of fairness, the pseudo-decoded padding packets in the W/CP should not be counted as decoded in the evaluation, since they do not contain any useful information.

B. Packet Structure

The structure of a DAF packet header is shown in Fig. 10. The payload of a DAF packet is coded, and its length is given in the header. The total size of the header is 15 bytes. It includes the starting packet position of the window (StartP), the size of the current window in number of packets (WSize), the slope factor used in the current window (SlopeF), the packet ID (PacketID), and the packet size (P).

The data length of SlopeF determines the precision of the slope factors used in generating the sampling distribution. In our protocol design, SlopeF occupies 4 bytes and stores a real number as the float type in C++. PacketID starts from 1 and is increased by 1 every time a coded packet is sent. It serves a similar purpose as in fountain codes, namely as the random seed for generating degrees and sampling packets.
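A header of this kind could be serialized as sketched below. Only the 15-byte total and the 4-byte float for SlopeF come from the text; the field order, the widths of the other fields (4 bytes for PacketID, 3 for StartP, 2 for WSize, 2 for P) and the big-endian byte order are assumptions made purely for illustration, and the function names are hypothetical.

```python
import struct

HEADER_LEN = 15  # total header size stated in the text


def pack_header(packet_id, start_p, w_size, slope_f, p_size):
    """Serialize the five header fields into 15 bytes (assumed layout)."""
    return (struct.pack(">I", packet_id)         # PacketID: 4 bytes (assumed)
            + start_p.to_bytes(3, "big")         # StartP:   3 bytes (assumed)
            + struct.pack(">H", w_size)          # WSize:    2 bytes (assumed)
            + struct.pack(">f", slope_f)         # SlopeF:   4-byte float (from the text)
            + struct.pack(">H", p_size))         # P:        2 bytes (assumed)


def unpack_header(buf):
    """Inverse of pack_header for the same assumed layout."""
    packet_id, = struct.unpack(">I", buf[0:4])
    start_p = int.from_bytes(buf[4:7], "big")
    w_size, = struct.unpack(">H", buf[7:9])
    slope_f, = struct.unpack(">f", buf[9:13])
    p_size, = struct.unpack(">H", buf[13:15])
    return packet_id, start_p, w_size, slope_f, p_size
```

A 4-byte float limits the slope factor to single precision, which is the precision trade-off the paragraph above refers to.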

If a user does not need the sampling distribution optimization due to limited computational power, the SlopeF field can be set to 0, which means uniform distribution, to obtain the low-complexity version of DAF (DAF-L).

Fig. 11. The flowchart of the DAF encoder.

C. DAF Encoder

The system design of the DAF encoder is shown as a flowchart in Fig. 11. Beforehand, the coding parameters, degree distributions, and W/CP are already obtained by the encoder. The system takes two sets of input: the parameters assigned by the user (e.g., $T_{\text{Delay}}$, $\Delta t$, $R$, and $C$) and the video source.

The video source feeds the system with streamed video data, which is first processed by the video preprocessing module, shown in the dotted box. Because the computational overhead of DAF optimization is cubic in the video length, this module should slice long videos into smaller segments and perform the optimization separately within each of them. The video could be segmented according to the number of frames or scene changes. This module obtains information such as $F$, $\mathbf{s}$, and $N_{\text{GOP}}$, and optimizes the slope factors $\mathbf{a}$. It also segments the data from each frame (or GOP) into several $P$-byte packets and pads the incomplete packets to $P$ bytes. It puts the segmented video packets in the packet buffer.

The middle row of the flowchart describes the encoding algorithm of the DAF system. After the procedure is triggered by the timer, the scheduler determines whether to move the window to the next position, according to the parameters and the current status. If not, StartP, WSize, and SlopeF remain the same as in the last sent packet; if the window slides, let StartP = StartP + $\Delta t$, WSize = $w_W$(StartP), and SlopeF = $a$(StartP). In both cases, let PacketID = PacketID + 1.

In the next step, a degree $d$ is chosen according to the degree distribution, as in LT codes. Then, $d$ packets are sampled from the packet buffer in the range confined by StartP and WSize. Each window’s packet-wise sampling distribution is generated by (11), given SlopeF. The bit-wise XOR of these $d$ original packets is used as the payload of the current coded packet. The complexity of LT encoding is linear, which is negligible compared to the optimization process. According to our experiment, the average encoding speed for a 1080p movie is 1 ms/frame.
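The encoding step can be sketched as follows. This is an illustration under several assumptions that the text does not spell out: the $d$ packets are taken to be distinct, a Python `random.Random` seeded with PacketID stands in for the shared pseudo-random generator, and packet positions are 0-based. The function name and argument layout are hypothetical.

```python
import random


def encode_packet(buffer, packet_id, start_p, w_size, pkt_probs, degree_pmf):
    """Build one coded packet from the current window.

    buffer     : list of equal-length bytes objects (source packets)
    pkt_probs  : sampling weight of each of the w_size packets in the window
    degree_pmf : degree_pmf[d - 1] is the probability of degree d
    Returns (sorted window-relative indices, XOR payload).
    """
    rng = random.Random(packet_id)               # PacketID acts as the seed
    degrees = range(1, len(degree_pmf) + 1)
    d = rng.choices(degrees, weights=degree_pmf)[0]
    d = min(d, sum(1 for p in pkt_probs if p > 0))   # cannot exceed usable packets

    chosen = set()
    while len(chosen) < d:                       # weighted draws until d distinct
        chosen.add(rng.choices(range(w_size), weights=pkt_probs)[0])

    payload = bytearray(len(buffer[start_p]))
    for idx in chosen:
        for j, byte in enumerate(buffer[start_p + idx]):
            payload[j] ^= byte                   # bit-wise XOR of the chosen packets
    return sorted(chosen), bytes(payload)
```

Because all randomness derives from the seed, a receiver that knows PacketID, StartP, WSize and the distribution implied by SlopeF can regenerate the same index set without it being transmitted.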

Then, the parameters and the payload are assembled into an APP layer packet, according to the structure shown in Fig. 10. The packet is sent using UDP. Finally, the program sets the timer to trigger the procedure again according to the frequency of sending packets, which is determined by parameters such as $F$, $P$, $R$, and $C$.

D. DAF Decoder

The system design of the DAF decoder is shown as a flowchart in Fig. 12. As for the encoder, the coding parameters, degree distributions, and W/CP are already obtained by the decoder. The procedure starts when a coded packet is received.

Fig. 12. The flowchart of the DAF decoder.

The decoding procedure is basically the reverse of the encoding procedure. Given StartP, WSize, SlopeF, and PacketID, the degree $d$, the sampling distribution, and the composition of the coded packet can be reconstructed. They are fed into a belief propagation (BP) decoder, which tries to decode the original packets. The complexity of the BP decoder is $O(N^2)$, and the DAF decoder does not add additional complexity. According to our experiment, the average decoding speed for a 1080p movie is 11 ms/frame. The decoded packets are stored in the packet buffer.
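For erasure channels, BP decoding of LT-type codes reduces to the well-known peeling procedure: repeatedly find a coded packet with exactly one unknown source packet, recover it, and subtract it from the others. The sketch below illustrates that procedure in its simplest form; it is not the authors' implementation, it makes no attempt at efficiency, and the function name and the `known` argument (useful for padding packets that are marked as decoded in advance) are hypothetical.

```python
def bp_decode(coded, known=None):
    """Peeling decoder.

    coded : list of (iterable of source indices, payload bytes)
    known : optional dict {index: payload} of packets already decoded
    Returns a dict {index: payload} of all recovered source packets.
    """
    decoded = dict(known or {})
    pending = [(set(idx), bytearray(payload)) for idx, payload in coded]
    progress = True
    while progress:
        progress = False
        for idx, payload in pending:
            # XOR out every source packet that is already known
            for i in [i for i in idx if i in decoded]:
                for j, byte in enumerate(decoded[i]):
                    payload[j] ^= byte
                idx.discard(i)
            if len(idx) == 1:                    # exactly one unknown left: recover it
                decoded[idx.pop()] = bytes(payload)
                progress = True
    return decoded
```

If no coded packet ever reaches degree one, the loop stops and the remaining source packets stay undecoded, which in the delay-aware setting translates into packet loss at playback time.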

The video playback module requests packets from the packet buffer as time goes on. First, the packets are re-assembled into frames (or GOPs). If a packet has not been decoded yet when it is requested, it is considered a packet loss. If this happens, image processing techniques such as error concealment may be performed to repair the frame before playing it.

E. General Framework of Fountain Code Systems

It is worth mentioning that the framework of the DAF system is a generalization of many existing schemes based on fountain codes. Different fountain code schemes can be easily implemented by changing settings and modules in the DAF system, while the protocol does not need to be changed.

For example, if a user does not need the sampling distribution optimization due to limited computational power, SlopeF can be set to 0, which means uniform distribution, to obtain the low-complexity version of DAF (DAF-L); the original fountain code can be viewed as a special case in which $W = T$ and the timer continually sends coded packets until an ACK is received; block coding schemes can be viewed as the special case in which $\Delta t = W$; furthermore, sliding window schemes with a packet-based window size, like [6], are a special case with fixed WSize; finally, the expanding window [9], [10] can be viewed as another special case if we modify the scheduler of the encoder by fixing StartP.

As a result, the proposed system enjoys the flexibility to meet different requirements.

V. SIMULATION EXPERIMENTS AND PERFORMANCE EVALUATION

A. Simulator Setup

We conduct the simulation experiments on the common open research emulator (CORE) [16] and the extendable mobile ad-hoc network emulator (EMANE) [17]. The former provides virtualization on the application (APP), transport (UDP or TCP), and network (IP) layers, controlled by a graphical user interface, and the latter provides high-fidelity simulation for the link (MAC) layer and the physical (PHY) layer. The working environment is set up on Oracle VM VirtualBox virtual machines.

We use CORE to emulate the topology of the virtual network and the relay nodes. Two VMs are connected to the virtual network as a source (or encoder/sender) node running the client application, and a destination (or decoder/receiver) node running the server application. A video is streamed from client to server using different schemes.

EMANE is used for emulation of IEEE 802.11b on the PHY and MAC layers of each wireless node. Because of the forward error correction (FEC) nature of fountain codes, we disable the retransmission mechanism of 802.11b for all fountain-code-based schemes. For simplicity of performance evaluation, we also disable the adaptive rate selection mechanism of 802.11b and only allow the 11 Mbps data rate to be used. The ad-hoc on-demand distance vector (AODV) protocol is used for routing.

B. Performance Metric

In our work, we use the packet decoding ratio to evaluate the performance of the schemes. It is worth noting that the evaluation criteria of delay-aware multimedia streaming are different from those of file transfer applications, a point commonly overlooked by existing SWFC schemes. In delay-aware applications, if a packet is decoded after its playback time, it has to be counted as a packet loss for the video decoder, since the player does not rewind the video. As a result, we introduce the metric of in-time decoding ratio (IDR), which only counts a decoded packet as “in-time” decoded when it is within the current window. In comparison, the file decoding ratio (FDR) is the percentage of total decoded packets after the complete coding session finishes. For SWFC schemes, FDR ≥ IDR always holds; for block coding, FDR = IDR.
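The two metrics can be computed from a decoding log as sketched below. The representation is an assumption made for illustration: each source packet has a decoding time (or `None` if never decoded) and a deadline after which it no longer lies within the current window. The function name is hypothetical.

```python
def decoding_ratios(decode_time, deadline):
    """Return (IDR, FDR) for one streaming session.

    decode_time[i] : time at which packet i was decoded, or None if never
    deadline[i]    : latest time at which packet i still counts as in-time
    """
    n = len(deadline)
    decoded = sum(t is not None for t in decode_time)
    in_time = sum(t is not None and t <= d
                  for t, d in zip(decode_time, deadline))
    return in_time / n, decoded / n
```

A packet that is decoded late contributes to FDR but not to IDR, so FDR can never be smaller than IDR; the two coincide when every decoded packet meets its deadline.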

C. Performance Evaluation

We conduct experiments for the following cases: (i) one hop with no node mobility (fixed topology) under various delay requirements and code rates, (ii) various numbers of hops with no node mobility (fixed topology) under a fixed packet loss rate per hop, and (iii) two hops with a moving relay node (dynamic topology). We implement six schemes for comparison, which are abbreviated as follows:

1. DAF: The proposed delay-aware fountain code protocol as introduced in Section IV. Different values of $\Delta t$ could be chosen in the experiments.

2. DAF-L: The low-complexity version of the DAF scheme. It is DAF without the optimized window-wise sampling distribution, as proposed in Subsection IV.B.

3. S-LT: The sliding window LT code from [6].

4. Block: The block coding for fountain codes. The video file is segmented into blocks according to the time delay, and they are sent one by one using a fountain code. For the reason pointed out in Subsection II.A, the block size is set to half of the allowed $T_{\text{Delay}}$. The step size $\Delta t$ is set to the block size on the encoder. SlopeF is set to 0.

5. Expand: The expanding window scheme of [9].

Fig. 13. Relations of IDR vs. code rate of CIF sequence foreman when $T_{\text{Delay}}$ = 0.5, 1, 1.5, and 1.83 s. Five sliding window schemes are compared. PLR = 10%. $\Delta t = 1$.

6. TCP: This scheme uses the TCP protocol to stream video. In order to add delay awareness, the video file is also segmented into blocks as in the “Block” scheme, but they are sent using TCP. In order to simulate the limited bandwidth of the wireless network and to provide a fair comparison, the maximum data rate is limited to the same amount as required by the SWFC schemes.

All five fountain-code-based schemes use the following parameter setting: The packet size $P = 1024$ bytes; for the degree distribution, $\delta = 0.02$ and $c = 0.4$ (as defined in Definition 11 in [2]), so that the LT code achieves good average performance.

Several benchmark CIF and HD test sequences, provided by [18], are used for our evaluation. They are coded into H.264/AVC and HEVC format using x264 and x265, encapsulated into ISO MP4 files using MP4Box, and converted into streams by the mp4trace tool from the EvalVid tool-set [19]. The coding structure is IPPP, which contains only one I-frame (the first frame) and no B-frames; all the remaining frames are P-frames. For the sake of clarity, all the delays shown in the experiments are in units of seconds. Because the frame rate for all sequences is 30 frames per second, it is easy to convert between seconds and number of frames.

We conduct 20 experiments for each setting with different random seeds and take their median value as the performance measure. IDR and FDR are reported in the experimental results.

C.1 Case 1: One hop with no node mobility

In this case, there are two nodes in the network: a source node and a destination node. The communication path from the source to the destination has one hop. The distance between the two nodes is carefully set so that packets with a 1024-byte payload have a 10% packet loss rate (PLR). The value of $\Delta t$ is set to 1 and 5 in the DAF scheme.

We use the CIF sequence foreman and H.264/AVC for the experiments. Fig. 13 shows the relations of IDR vs. $C$ of CIF sequence foreman for different $T_{\text{Delay}}$. The results of four delays are shown: 0.5, 1, 1.5, and 1.83 s. Fig. 14 shows the relations of IDR vs. $T_{\text{Delay}}$ of CIF sequence foreman for different $C$. The results of four code rates (i.e., 0.9, 0.8, 0.75, and 0.65) are shown. Only partial results of the “Block” scheme are shown, because its values are too small to be displayed on the same scale as the others.

Fig. 14. Relations of IDR vs. delay of CIF sequence foreman when $C$ = 0.9, 0.8, 0.75, and 0.65. Five sliding window schemes are compared. PLR = 10%. $\Delta t = 1$.

Fig. 15. Comparison of IDR of foreman. Five delay-aware fountain code schemes are compared under Case 1. (Axes: code rate, delay (s), IDR. Legend: S-LT, DAF-L, DAF, Block, Expand.)

We choose all the combinations of TDelay ∈ [0.8, 1.8] and C ∈ [0.6, 0.9] to conduct the experiments. There are two dimensions of variables, TDelay and C, so the results of each scheme form a surface. Fig. 15 shows the five surfaces of the schemes.

In Table 3, we compare the numerical IDR results of the decoded videos between different schemes with various delays and code rates for the sequence foreman.

Table 3. Decoding ratio comparisons of Case 1.

Code rate  Delay (s)  Scheme        IDR      FDR
0.74       0.8        DAF(∆t = 1)   93.99%   98.81%
                      DAF(∆t = 5)   93.13%   98.43%
                      DAF-L         74.42%   98.29%
                      S-LT          66.34%   82.78%
                      Block         29.32%   29.32%
                      Expand        56.14%   82.67%
                      TCP           68.21%   68.21%
0.79       1.2        DAF(∆t = 1)   95.20%   98.72%
                      DAF(∆t = 5)   94.35%   98.11%
                      DAF-L         81.35%   97.94%
                      S-LT          73.74%   97.26%
                      Block         27.68%   27.68%
                      Expand        59.21%   83.64%
                      TCP           62.25%   62.25%
0.90       1.7        DAF(∆t = 1)   93.73%   98.22%
                      DAF(∆t = 5)   92.53%   97.96%
                      DAF-L         84.20%   96.90%
                      S-LT          81.71%   96.88%
                      Block         25.14%   25.14%
                      Expand        61.84%   83.18%
                      TCP           57.23%   57.23%

From the results above, we have the following observations:

• Among all schemes, DAF produces the highest IDR. As shown in Fig. 15, almost the entire IDR surface of DAF is above the other schemes. The performance of DAF-L is lower than DAF, but higher than others. DAF outperforms DAF-L because the overall sampling distribution of DAF is more homogeneous. The proposed schemes improve the IDR when coding resource is insufficient or tolerable delay is small.

• The performance of DAF is higher when ∆t = 1 than when ∆t = 5. That is because a smaller ∆t leads to more adjustable sliding windows, so a more stable sampling probability can be achieved through optimization. However, the computational overhead increases accordingly.

• The performance of S-LT is lower than the two proposed schemes, but higher than the others. DAF and DAF-L outperform S-LT because their window size is bigger.

• If C is low enough or TDelay is large enough, the decoding ratios of all three SWFC schemes converge to 100%. Correspondingly, if C is too high or TDelay is too small, their performances are equally bad, or DAF may be even worse than the DAF-L scheme. That is because when the data rate is extremely limited, DAF makes all the frames unlikely to be decoded at the same time, while in the DAF-L scheme, some frames with very low bit rate will be decoded. However, since in those scenarios the video decoding ratios are below 50%, which is too low for the video to be properly viewed, they are not the cases that concern us most.

• The decoding ratio of all the schemes is an increasing function of TDelay and a decreasing function of C. That means a larger delay and a lower code rate lead to higher overall performance, which meets our expectation. Also, Table 3 shows that in order to obtain a decoding ratio at a certain level, we need to balance TDelay and C.

• TCP’s performance is relatively low. The reason is that TCP is not suitable for wireless scenarios where PLR is high [20]. The slow start, congestion avoidance phases and congestion control mechanisms lower its performance.

• The Block scheme performs the poorest among all schemes. Since the blocks are too small (TDelay/2) and non-overlapping, the coding overhead is very large [5].

• The above observations are true for both IDR and FDR. There is always FDR ≥ IDR, as we pointed out in Subsection V.B. For TCP and Block schemes, there is FDR = IDR, because the frames prior to the current window will never be decoded in the future.

• Although the decoding ratios of DAF and DAF-L are high (90%–99%) compared to other schemes, they hardly reach 100%, due to the limitations of LT code [3].

Table 4. IDR comparisons under Case 2.

Seq.          Code rate  Delay (s)  Scheme   IDR (1 hop)  IDR (2 hops)  IDR (3 hops)
Mobile CIF    0.77       0.5        DAF      95.61%       90.23%        38.07%
                                    DAF-L    93.22%       67.55%        36.67%
                                    S-LT     66.90%       45.85%        25.97%
                                    Block    26.91%       17.10%        14.94%
                                    Expand   43.04%       40.90%        37.57%
                                    TCP      82.24%       48.09%        N/A
Akiyo CIF     0.83       1          DAF      94.85%       90.30%        53.18%
                                    DAF-L    93.26%       82.80%        38.03%
                                    S-LT     80.68%       38.03%        30.68%
                                    Block    N/A          N/A           N/A
                                    Expand   51.36%       49.62%        46.97%
                                    TCP      76.31%       39.51%        N/A
Cactus 1080p  0.92       1          DAF      98.25%       92.27%        57.40%
                                    DAF-L    94.88%       73.26%        43.03%
                                    S-LT     73.54%       53.03%        32.93%
                                    Block    46.90%       38.05%        21.21%
                                    Expand   54.33%       50.49%        44.83%
                                    TCP      89.68%       55.87%        N/A
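The relation FDR ≥ IDR noted in the Case 1 observations can be illustrated with a minimal sketch. It assumes that IDR counts the frames decoded no later than their playback deadline and that FDR counts the frames decoded at any time; the formal definitions are given in Subsection V.B and are not reproduced here, so the function below and its toy inputs are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def decoding_ratios(
    decode_times: Sequence[Optional[float]],
    deadlines: Sequence[float],
) -> Tuple[float, float]:
    """Return (IDR, FDR) under the assumed definitions.

    decode_times[i] is the time at which frame i was decoded, or None if
    it was never decoded. deadlines[i] is the playback deadline of frame i.
    """
    total = len(deadlines)
    # Assumed IDR: frames decoded no later than their deadline.
    in_time = sum(
        1 for t, d in zip(decode_times, deadlines) if t is not None and t <= d
    )
    # Assumed FDR: frames decoded at any time, including after the deadline.
    eventually = sum(1 for t in decode_times if t is not None)
    return in_time / total, eventually / total


if __name__ == "__main__":
    # Toy data: one frame in time, two late, one never decoded.
    idr, fdr = decoding_ratios([0.2, 0.9, None, 1.6], [0.5, 0.8, 1.1, 1.4])
    print(f"IDR = {idr:.2f}, FDR = {fdr:.2f}")
```

Every frame counted toward IDR is also counted toward FDR, so FDR can never be smaller. A scheme that never decodes a frame after its window has passed, as the text describes for TCP and Block, has no late frames, so the two ratios coincide.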

C.2 Case 2: Various number of hops with no node mobility

The setup of this set of experiments is the following. The network consists of a source node, a destination node, and 0, 1, or 2 relay nodes. All the nodes in the network form a chain topology from the source node to the destination node. The communication path from the source node to the destination node has 1, 2, or 3 hops. All the nodes are immobile; hence the network topology is fixed. For all the experiments in Case 2, we set PLR = 5% for each hop/link. ∆t is set to 1 in the DAF scheme.

The IDR results of video sequences mobile (CIF), akiyo (CIF) and cactus (1920 × 1080) are compared in Table 4, where N/A means the corresponding decoding ratio is below 10% and no consecutive frames can be recovered, making the actual decoding ratio insignificant. We have the following observations:

• The relationship of DAF, DAF-L and S-LT remains the same as in Case 1 for different PLR: DAF achieves the highest decoding ratio among all schemes; DAF-L is the second best; S-LT performs the worst among the three. That shows the proposed schemes maintain their advantages over the state-of-the-art schemes in a wide range of network conditions.

• The performance of the block coding scheme is still the lowest among all schemes.

• TCP performs relatively well in the cases when PLR = 5%, but it is extremely inefficient when PLR = 15%. That is because its performance is very sensitive to packet losses. A high loss rate will cause TCP to time out.
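To relate the per-hop PLR of 5% to the end-to-end loss seen over a chain of hops, the following sketch computes the compound loss rate. It assumes that losses on different hops are independent and that a lost packet is not retransmitted by the relays; both are simplifying assumptions made here, not statements from the experiments. Under these assumptions three hops give roughly 14%, close to the 15% figure quoted above.

```python
def end_to_end_plr(per_hop_plr: float, hops: int) -> float:
    """End-to-end packet loss rate over a chain of independent lossy hops.

    A packet survives only if it survives every hop, so the delivery
    probability is (1 - p) ** hops.
    """
    if not 0.0 <= per_hop_plr <= 1.0:
        raise ValueError("per_hop_plr must be in [0, 1]")
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return 1.0 - (1.0 - per_hop_plr) ** hops


if __name__ == "__main__":
    for hops in (1, 2, 3):
        print(f"{hops} hop(s): end-to-end PLR = {end_to_end_plr(0.05, hops):.4f}")
```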

C.3 Case 3: Two hops with a moving relay node

The setup of this set of experiments is the following. There are three nodes in the network: a source node and a destination node are fixed, and a relay node is moving. The distance between the source node and the destination node is 366 meters; the transmission range of each node is about 200 meters. Hence, the source node cannot directly communicate with the destination node; a relay node is needed. The relay node is moving back and forth along the straight line, which is perpendicular to the straight line that links the source node and the destination node; in addition, the relay node has equal distance to the source node and the destination node. When the relay node moves into the transmission range of the source node, it can pick up the packets transmitted by the source node, and relay the packets to the destination node. When the relay node moves out of the transmission range of the source node, it cannot receive packets transmitted by the source node although the source node keeps transmitting; in this case, all the packets transmitted by the source node will be lost. The communication path from the source node to the destination node has two hops. Since the relay node moves around, the network topology is dynamic.

Table 5. IDR comparisons under Case 3.

Scheme   IDR
DAF      85.16%
DAF-L    65.21%
S-LT     32.42%
Block    N/A
Expand   20.90%
TCP      N/A
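The geometry of the Case 3 setup can be worked out with a short sketch. It assumes an idealized model in which a link exists exactly when the distance is at most 200 meters (the text only says "about 200 meters") and in which the relay moves along the perpendicular bisector of the source-destination segment, which follows from the relay being equidistant from both nodes. The function names are hypothetical.

```python
import math

SRC_DST_DISTANCE_M = 366.0  # source-destination distance from the text
TX_RANGE_M = 200.0          # approximate transmission range from the text


def relay_link_distance(offset_m: float) -> float:
    """Distance from the relay to the source (and to the destination).

    offset_m is the relay's displacement from the midpoint of the
    source-destination segment, measured along its line of motion.
    """
    return math.hypot(SRC_DST_DISTANCE_M / 2.0, offset_m)


def relay_connected(offset_m: float) -> bool:
    """True if the relay is within range under the idealized hard cutoff."""
    return relay_link_distance(offset_m) <= TX_RANGE_M


def max_connected_offset() -> float:
    """Largest offset from the midpoint at which the relay is still in range."""
    return math.sqrt(TX_RANGE_M ** 2 - (SRC_DST_DISTANCE_M / 2.0) ** 2)


if __name__ == "__main__":
    print(f"relay stays connected within {max_connected_offset():.2f} m of the midpoint")
    for offset in (0.0, 50.0, 100.0):
        print(f"offset {offset:5.1f} m: connected = {relay_connected(offset)}")
```

Under these assumptions the relay is connected only while it is within about 80.7 meters of the midpoint, since sqrt(200² − 183²) ≈ 80.69. Any part of its path beyond that offset corresponds to the disconnection periods described above.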

In this set of experiments, we stream the sequence coastguard, with C = 0.8 (the corresponding R = 2611), and TDelay = 0.8 s. Table 5 shows the IDR of the schemes under Case 3. We have the following observations:

• DAF performs better than DAF-L. That is because during the movement of the relay node, the end-to-end packet loss rate will range from 0% to 100%, and the performance gap between DAF and DAF-L is significant when packet loss rate is high, as shown in Figs. 13 and 14.

• DAF and DAF-L still perform the best among all the schemes. However, the decoding ratios are not as high as in the previous cases. That is because when the relay node temporarily moves out of the connection range, the source does not stop streaming video; therefore, the content transmitted during the disconnection period is lost.

• The performance of S-LT is worse than the proposed schemes but better than the others, and the block coding scheme is still among the worst schemes, for the same reasons as in Case 1.

• The performance of the TCP scheme is also poor, because the disconnection period causes a time-out in TCP.

VI. CONCLUSIONS

This paper proposed a novel delay-aware fountain code scheme for video streaming, which deeply integrates channel coding and video coding. This is the first work to exploit the fluctuation of bit rate in video data at the level of channel coding, and to incorporate it towards the optimal design of video streaming-oriented fountain codes. Specifically, we developed two novel coding techniques: the time-based sliding window and the optimal window-wise sampling strategy. The proposed scheme delivers significantly higher video quality than existing schemes with a constant bandwidth cost. We designed two protocols, DAF and DAF-L, to improve video decoding ratio under different computational complexity. The simulation results show that the decoding ratio of our scheme is 15% to 100% higher than the state-of-the-art delay-aware schemes in a variety of settings.

We hope this work is a first step toward further understanding this important coding opportunity. However, some issues mentioned in this article need to be addressed in the future. To improve this work, we are currently working on the following directions: (i) lowering the computational complexity of the optimization algorithm; (ii) adaptively choosing the coding parameters according to the network condition; (iii) integrating UEP into the DAF framework; (iv) exploiting the usage in multicast/broadcast applications.

REFERENCES

[1] J. W. Byers, M. Luby, and M. Mitzenmacher, “A digital fountain approach to asynchronous reliable multicast,” IEEE J. Sel. Areas Commun., vol. 20, no. 8, pp. 1528–1540, 2002.

[2] M. Luby, “LT codes,” in Proc. IEEE FOCS, Nov. 2002.

[3] A. Shokrollahi, “Raptor codes,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2551–2567, 2006.

[4] P. Oswald and A. Shokrollahi, “Capacity-achieving sequences for the erasure channel,” IEEE Trans. Inf. Theory, vol. 48, no. 12, pp. 3017–3028, 2002.

[5] G. Liva, E. Paolini, and M. Chiani, “Performance versus overhead for fountain codes over Fq,” IEEE Commun. Lett., vol. 14, no. 2, pp. 178–180, 2010.

[6] M. C. Bogino et al., “Sliding-window digital fountain codes for streaming of multimedia contents,” in Proc. IEEE ISCAS, May 2007.

[7] P. Cataldi et al., “Sliding-window raptor codes for efficient scalable wireless video broadcasting with unequal loss protection,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1491–1503, 2010.

[8] S. Ahmad, R. Hamzaoui, and M. M. Al-Akaidi, “Unequal error protection using fountain codes with applications to video communication,” IEEE Trans. Multimedia, vol. 13, no. 1, pp. 92–101, 2011.

[9] D. Sejdinovic et al., “Expanding window fountain codes for unequal error protection,” IEEE Trans. Commun., vol. 57, no. 9, pp. 2510–2516, 2009.

[10] D. Vukobratovic et al., “Scalable video multicast using expanding window fountain codes,” IEEE Trans. Multimedia, vol. 11, no. 6, pp. 1094–1104, 2009.

[11] K. Sun and D. Wu, “MPC-based delay-aware fountain codes for live video streaming,” in Proc. ICC, May 2016.

[12] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 5, pp. 645–658, 2005.

[13] E. Hyytiä, T. Tirronen, and J. Virtamo, “Optimizing the degree distribution of LT codes with an importance sampling approach,” in Proc. RESIM, Oct. 2006.

[14] E. Hyytiä, T. Tirronen, and J. Virtamo, “Optimal degree distribution for LT codes with small message length,” in Proc. IEEE INFOCOM, May 2007.

[15] P. Cataldi et al., “Implementation and performance evaluation of LT and Raptor codes for multimedia applications,” in Proc. IEEE IIH-MSP, Dec. 2006.

[16] J. Ahrenholz, “Comparison of core network emulation platforms,” in Proc. IEEE MILCOM, Oct. 2010.

[17] U.S. naval research laboratory, networks and communication systems branch, “The extendable mobile ad-hoc network emulator (EMANE),” [Online]. Available: http://www.nrl.navy.mil/itd/ncs/products/emane

[18] P. Seeling and M. Reisslein, “Video transport evaluation with H.264 video traces,” Commun. Surveys Tuts., vol. 14, no. 4, pp. 1142–1165, 2012.

[19] J. Klaue, B. Rathke, and A. Wolisz, “Evalvid–a framework for video transmission and quality evaluation,” Comput. Performance Evaluation. Modelling Techniques and Tools, vol. 2794, Springer, Berlin, Heidelberg, 2003.

[20] G. Holland and N. Vaidya, “Analysis of TCP performance over mobile ad hoc networks,” Wireless Netw., vol. 8, no. 2–3, pp. 275–288, 2002.


Kairan Sun received the B.S. degree in Information Security in 2012 from Fudan University, Shanghai, China. He received the M.S. and Ph.D. degrees in Electrical Engineering at the University of Florida, FL, USA, in 2016. Currently, he is a Research Scientist at Facebook Inc. His current research interests include network coding, multimedia communication, and natural language processing (NLP).

Huazi Zhang received the B.S. and Ph.D. degrees in Electrical Engineering from Zhejiang University, Hangzhou, China, in 2008 and 2013, respectively. From 2011 to 2013, he worked as a Visiting Scholar in the Department of Electrical and Computer Engineering, NC State University, Raleigh, NC. From 2013 to 2015, he worked at Nanyang Technological University and University of Florida as a Research Associate. Currently, he is working at Huawei. His research interests include coding techniques and wireless systems.

Ying Gao received the Bachelor’s degree and Master’s degree from Central South University of China and the Ph.D. degree from South China University of Technology, China, in 1997, 2000, and 2006, respectively. She is currently a Professor with the School of Computer Science and Engineering, South China University of Technology, China. Her current research interests include service-oriented computing technology, software architecture, and network security. She has published more than 30 papers in international journals and conferences.

Dapeng Wu is a Professor at the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL. His research interests are in the areas of networking, communications, signal processing, computer vision, machine learning, smart grid, and information and network security. He received University of Florida Term Professorship Award in 2017, University of Florida Research Foundation Professorship Award in 2009, AFOSR Young Investigator Program (YIP) Award in 2009, ONR Young Investigator Program (YIP) Award in 2008, NSF CAREER award in 2007, the IEEE Circuits and Systems for Video Technology (CSVT) Transactions Best Paper Award for Year 2001, and the Best Paper Awards in IEEE GLOBECOM 2011 and International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks (QShine) 2006. Currently, he serves as Editor in Chief of IEEE Transactions on Network Science and Engineering. He is an IEEE Fellow.