
IEEE COMSOC MMTC Communications – Frontiers

http://mmc.committees.comsoc.org 1/56 Vol.13, No.1, January 2018

MULTIMEDIA COMMUNICATIONS TECHNICAL COMMITTEE http://www.comsoc.org/~mmc

MMTC Communications - Frontiers

Vol. 13, No. 1, January 2018

CONTENTS

Message from the MMTC Chair

SPECIAL ISSUE ON QoE Evaluation and Control in Immersive Multi-modal Multimedia Applications
Guest Editors: Pedro A. A. Assunção 1, 2 and Erhan Ekmekcioglu 3
1 Instituto de Telecomunicações, Portugal
2 Instituto Politécnico de Leiria, Leiria, Portugal
3 Loughborough University London, United Kingdom
[email protected]; E.Ekmekcioglu@lboro.ac.uk

Evaluating QoE of Immersive Multisensory Experiences
Niall Murray1,2, Yuansong Qiao2, Conor Keighrey1, Darragh Egan1, Débora Pereira Salgado1, Gabriel Miro Muntean3, Christian Timmerer4, Oluwakemi A Ademoye5, Gheorghita Ghinea6, Brian Lee2
1Dept. of Electronics & Informatics, Faculty of Engineering & Informatics, Athlone Institute of Technology, Ireland
2Software Research Institute, Athlone Institute of Technology, Ireland
3School of Electronic Engineering, Dublin City University, Ireland
4Dept. of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria
5Faculty of Architecture, Computing and Engineering, University of Wales Trinity St. David, UK
6Dept. of Computer Science, Brunel University, United Kingdom

Psychophysiological Methods for Quality of Experience Research in Virtual Reality Systems and Applications
Miguel Barreda-Ángeles, Rafael Redondo-Tejedor, Alexandre Pereda-Baños
Eurecat – Technology Centre of Catalonia, Barcelona, Spain
[email protected]; [email protected]; [email protected]

QoE Concerns and Measurement in Augmented Reality Applications
Patrick Seeling
Department of Computer Science, Central Michigan University, MI, USA
pseeling@ieee.org

Emerging levels of immersive experience in MPEG-I video coding
Dragorad Milovanovic, Dragan Kukolj
Dept. of Computer Engineering, Faculty of Engineering, University of Novi Sad, Serbia
dragan.kukolj@rt-rk.com

Trends in QoE for immersive experiences
Andrew Perkis, Sebastian Arndt
Department of Electronic Systems
The Norwegian University of Science and Technology, Trondheim, Norway
[email protected], [email protected]

SPECIAL ISSUE ON Content Caching and Sharing in Wireless Networks
Guest Editors:
Zheng Chang, University of Jyväskylä, Finland
Zhenyu Zhou, North China Electric Power University, China
[email protected], [email protected]

Content Caching and Push in Small Cells with Renewable Energy
Jie Gong
School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China

Energy Efficiency Analysis of 5G Content Caching System
Di Zhang1,2, Zhenyu Zhou2, Zhengyu Zhu1, Shahid Mumtaz4
1School of Information Engineering, Zhengzhou University, Zhengzhou, 450-001, China.
2Department of Electric and Computer Engineering, Seoul National University, Seoul, 151-742, Korea.
3State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, School of Electrical and Electronic Engineering, North China Electric Power University, Beijing, 102206, China.
4Instituto de Telecomunicações, Aveiro, 1049-001, Portugal.

Energy-Efficient Design for Latency-tolerant Content Delivery Networks
Thang X. Vu, Lei Lei, and Satyanarayana Vuppala
The Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, 29 Avenue John F. Kennedy, Luxembourg. Email: {thang.vu, lei.lei, satyanarayana.vuppala}@uni.lu

Cooperative Content Caching and Distribution in Multihop D2D-V2V Networks
Yahui Wang*, Zhenyu Zhou*, Houjian Yu*, and Chen Xu*
*School of Electrical and Electronic Engineering, North China Electric Power University, Beijing, China

MMTC OFFICERS (Term 2016-2018)



Message from the MMTC Chair

Dear MMTC colleagues and friends,

I hope this new year 2018 finds you well. On behalf of all the MMTC officers, I would like to wish all

of you and your families a very Happy New Year 2018. May this year bring you all health, happiness,

and prosperity.

I would also like to take this opportunity to thank those who were able to attend the MMTC meeting

at Globecom 2017. The Communication Software, Services and Multimedia Applications Symposium

(CSSMA) at Globecom and ICC is sponsored by MMTC and I encourage all of you to continue to be

actively involved in CSSMA and submit papers there. The next CSSMA will be at ICC 2018 in Kansas

City, Missouri, USA from May 20-24, 2018, and we hope to see you all there.

The MMTC officers would like to encourage members to be actively involved in the TC as well as

help recruit new members. Membership is open to all those who are interested, and more information

can be found at the TC website, http://mmc.committees.comsoc.org. MMTC provides members the

opportunity to actively serve the community by submitting nominations for associate editorship to

journals, special issue proposals, conference chairs, and ComSoc distinguished lecturers. In addition,

the TC can assist members in the process for elevation to senior and fellow grades. MMTC will

also soon be seeking new officers since the current officers’ terms will end in May 2018.

I would also like to point out that Dr. Wenwu Zhu, editor in chief for the IEEE Transactions on

Multimedia (TMM), has a call for nominations for new associate editors. As a sponsoring TC for TMM,

MMTC will be submitting up to 3 candidates for consideration to the TMM steering committee. We

encourage members who are interested to submit nominations by January 31, 2018 following the

instructions in the e-mail sent by Dr. Shiwen Mao. Nominations should include a two-page CV,

supporting letters from two senior leaders in the community, and a one-page summary of the member’s

past involvement with MMTC.

There will also soon be a call for two service awards. Please be on the lookout for the call and I

encourage you all to submit nominations.

Recently, the TC submitted nominations for Dr. Wanqing Li to serve as the MMTC representative to the

ICME steering committee, Dr. Liang Zhou to serve as the MMTC representative to the CCNC steering

committee, and Dr. Guosen Yue and Dr. Qing Yang as Globecom 19 CSSMA symposium co-chairs.

In addition, Dr. Shiwen Mao, who is the current MMTC chair, was elected as the Chair of the

Distinguished Lecturers' Selection Standing Committee as well as the Vice-Chair for the Technical

Educational and Activities Committee (TEA-C) for IEEE ComSoc for the 2018-2019 term.

Congratulations to all!

I would also like to thank the special issue editors and publication board for their hard work in

publishing this issue of MMTC Frontiers.

Sincerely,

Sanjeev Mehrotra

Vice-Chair, Multimedia Communications Technical Committee, IEEE Communications Society

http://mmc.committees.comsoc.org/



SPECIAL ISSUE ON QoE Evaluation and Control in Immersive Multi-modal Multimedia Applications

Guest Editors: Pedro A. A. Assunção 1, 2 and Erhan Ekmekcioglu 3

1 Instituto de Telecomunicações, Portugal

2 Instituto Politécnico de Leiria, Leiria, Portugal

3 Loughborough University London, United Kingdom

[email protected]; [email protected]

The overarching goal of immersive multimedia environments is to enable life-like experiences and interactions

between people and other media objects, in other words, to bridge the gap between the physical and virtual entities.

Amongst the primary objectives is the facilitation of the sense of presence, through the use of multiple sensory

channels like vision, audio, haptics and (more recently) olfaction and taste. Applications that are based on immersive

and multimodal multimedia are used in multiple areas, including but not limited to media and entertainment, gaming,

healthcare (like remote robotic surgery, emergency remote intervention), transport (driver assistance, intelligent

navigation), commerce, digital manufacturing and design, and training. Research and development in various forms

of immersive multimedia is not new. Underpinning technologies, such as multi-camera and light-field capture and

processing systems; coding, distribution, and rendering of three-dimensional environments on high-end display

systems (video and spatial audio); and novel human-computer interfaces, have been widely researched. But especially

with the surge of consumer-grade Virtual Reality (VR) headsets in the past few years, and increasing availability of

mobile Augmented Reality (AR) systems that are powered by advanced object recognition, tracking and localisation

techniques, there has been a considerable rise in the innovation of new use cases for immersive multimedia.

Despite many advances to date, many challenges still remain to be addressed to ensure the practicality and widespread

adoption of emerging immersive experiences. The factors influencing the Quality of Experience (QoE), which has been widely studied in the context of traditional multimedia applications, need to be fully understood in the context of multi-sensory immersive experiences. The QoE of immersive experiences depends not only on video, graphics and audio fidelity, but also on the level of affective engagement, the sense of presence, comfort of use without cyber-sickness, the intuitiveness of the means of interaction, and the effectiveness of the storytelling approaches. The impact of the constraints associated with the entire processing pipeline, like delay,

bandwidth, devices (e.g., graphical processing power, display resolution, field-of-view, multi-sensory

synchronization) should be modelled to predict the QoE of immersive experiences. On the other hand, the power of

other physiological signals that can be captured via lightweight wearable sensors in monitoring the affective state of

users and indirectly inferring the levels of real-time QoE should be harnessed in the design and control of immersive

applications. In this Special Issue, authors highlight their research findings and perspectives on the topic of Quality of

Experience in immersive multi-sensory multimedia applications.

The first contribution, titled “Evaluating QoE of Immersive Multisensory Experiences” by N. Murray et al., gives

a comprehensive overview of the authors’ past work addressing the evaluation of Quality of Experience in immersive

multi-sensorial experiences, in particular olfaction-enhanced multimedia. The authors suggest that understanding users’

QoE of olfaction-based applications is not trivial, and describe a number of research challenges for the multimedia

community in that respect.

The second contribution, titled “Psychophysiological Methods for Quality of Experience Research in Virtual Reality

Systems and Applications” by M. Barreda-Ángeles et al., outlines the use of psychophysiological

sensing and measurement technologies in QoE research associated with immersive applications. The authors

suggest that aspects such as spatial presence, attentional and emotional involvement, cognitive effort, or stress, are

important to explain how a user feels experiencing an immersive service, and psychophysiological-based methods

may be the best option currently available for them to be understood.

In his paper titled “QoE Concerns and Measurement in Augmented Reality Applications”, P. Seeling sheds light on

the current efforts in determining and modelling the QoE associated with Augmented Reality applications. The author

indicates that when determining the QoE in AR scenarios, media fidelity as well as the contrast and colours as a



consequence of the overlay of content with the real world should be considered. The combination of them is typically

neither readily determinable nor steady and therefore requires new considerations for perceptual models. Furthermore,

the use of brain-computer interfaces for measuring the QoE of AR applications is discussed.

The fourth contribution, titled “Emerging levels of immersive experience in MPEG-I video coding” and written by D.

Milovanovic and D. Kukolj, summarizes current MPEG development and research activities

on emerging immersive media technologies. The authors briefly introduce readers to the exploration

project MPEG-I (Coded representation of immersive media).

Finally, in their paper titled “Trends in QoE for immersive experiences”, A. Perkis and S. Arndt focus on the

importance of novel storytelling methods to fuel the success of emerging immersive media applications. The authors

indicate that digital storytelling is all about enhancing the QoE in the rich digital media, changing the passive viewer

into a participating, engaged and immersed user. The methods vary and include adding interactivity, increasing

dimension, mixing realities all the way through to creating content that triggers more senses. Their research focusses

on the creative aspects of new digital media, designing new platforms for combining art and technology and creating

immersive and interactive content in public spaces (an example project described in detail) and the measurement of

the QoE.

With this Special Issue we have no intent to present a complete picture on the state of the QoE measurement and

control for immersive multimodal multimedia services. However, we hope that the presented papers provide the

audience with a brief tutorial and valuable insight into the persisting challenges in the area, and predictions for the

future research.

Our special thanks go to all authors for their precious contributions to this Special Issue. We would also like to

acknowledge the gracious support from the Board of MMTC Communications - Frontiers.

Pedro A. A. Assunção received the Licenciado and M.Sc. degrees from the University of Coimbra,

in 1988 and 1993, respectively, and the Ph.D. in Electronic Systems Engineering from the University

of Essex, in 1998. He is currently professor of Electrical Engineering and Multimedia

Communication Systems at the Polytechnic Institute of Leiria and a senior researcher at the Institute

for Telecommunications, Portugal. His current research interests include high efficiency and 360-

degree, multi-view video and light field coding, multiple description and robust coding, error

concealment and quality evaluation. He is a senior member of the IEEE.

Erhan Ekmekcioglu received his Ph.D. degree from University of Surrey, UK, in 2010. Between

2010 and 2017 he worked as a post-doctoral researcher in University of Surrey and Loughborough

University, respectively, specialising in the field of video processing and communications. He is

currently a senior lecturer at the Institute for Digital Technologies, Loughborough University

London. His research interests include 2D/3D and multi-view video processing, coding, and

transport, quality of experience, immersive and interactive multimedia. He has co-authored around 50

peer-reviewed research articles, book chapters, and a book on 3D-TV systems.



Evaluating QoE of Immersive Multisensory Experiences

Niall Murray1,2, Yuansong Qiao2, Conor Keighrey1, Darragh Egan1, Débora Pereira Salgado1,

Gabriel Miro Muntean3, Christian Timmerer4, Oluwakemi A Ademoye5, Gheorghita Ghinea6,

Brian Lee2

1Dept. of Electronics & Informatics, Faculty of Engineering & Informatics, Athlone Institute of Technology, Ireland

2Software Research Institute, Athlone Institute of Technology, Ireland

3School of Electronic Engineering, Dublin City University, Ireland

4Dept. of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria

5Faculty of Architecture, Computing and Engineering, University of Wales Trinity St. David, UK

6Dept. of Computer Science, Brunel University, United Kingdom

1. Introduction

Recent technological advances have led to a profound increase in the quality of multimedia content, in addition to

different ways of interacting with and consuming it. Technologies such as Virtual Reality (VR), 360-degree video,

Augmented Reality (AR) and 3D audio aim to support novel immersive and interactive experiences. However, such

approaches towards immersion only stimulate two of the five human senses. Opportunities now exist to target the

human senses beyond the traditional audio and visual ones, to include the tactile, olfactory, and gustatory senses. Hence, it is possible

to develop applications that consider inputs across all senses, i.e., truly immersive and interactive multimedia

experiences. Such experiences may be influenced by the integration of different media formats, sensory modalities,

the context, the user and varying communication/delivery mechanisms, with the aim of improving the user's perceived

quality of experience. Indeed, such experiences are only possible through a multidisciplinary research approach which

involves (and is not limited to) multimedia, psychology (including experimental), human-computer interaction, social

computing and electronics among many others. In addition, the range of applications for virtual reality, augmented

reality, 360-degree video and multisensory experiences is quite diverse with related and unique research challenges.

Such domains include tele-presence, training/education, health, tourism and entertainment. Critical to the success of

these immersive and multisensory experiences (IMEx) is understanding, on a per-application basis,

the user's perception and quality of experience (QoE).

In this context, the user QoE of IMEx is complex to model and as a research problem, is multifactorial and

multidimensional. QoE is defined in the QUALINET Whitepaper [1] as: “the degree of delight or annoyance of a

person whose experiencing involves an application, service, or system. It results from the person’s evaluation of the

fulfillment of his or her expectations and needs with respect to the utility and/or enjoyment in the light of the person’s

context, personality and current state”. QoE is a theoretical framework that provides a measurement-centered reflection of a user’s perception of an application, system, or service. Therefore, QoE aligns well with the multifactorial and multidimensional challenge of modelling user perception of IMEx applications or services. A person’s QoE is affected by influencing factors (IFs), which are defined in [2][3] as being “any characteristic of a user, system, service or context whose actual state or setting may have influence on the QoE of the user”. A few articles categorize such IFs in different ways [1][3][4], with commonality in the actual factors identified and explained. In [1], as per Fig. 1, the IFs that affect user QoE are a function of the traditional QoS (device, network, content) metrics and social/psychological aspects, with an overarching categorization into system, user and context factors.

In terms of olfaction-enhanced multimedia, the literature provides a number of key articles on how olfaction is and

can be employed in future multimedia applications [5][6][7][8][9][10][11][12]. Ghinea and Ademoye in [5] reviewed

works that employed olfaction as a media component in the areas of virtual reality and entertainment. They also

proposed potential future research directions of synchronization, olfactory display development and content

association. The authors of [6][7] presented the use of and potential for olfaction-enhanced multimedia applications

in areas such as education, training, e-health and virtual tourism as well as providing an overview of various

commercially available multisensory technologies. In [8], ten categories of smell experience were defined based on

feedback obtained from over 400 participants in a user study. Considering the rapid development of olfactory sensor and display technology, olfaction-based multimedia applications are a realistic possibility technically and across a wide variety of application domains. The authors of [9][10][11] highlighted opportunities and challenges around sensorial touch, taste and smell. They outlined key challenges around understanding sensory system processing within the context of HCI: which tactile, olfactory and gustatory experiences HCI designers should design for; and designing interfaces for individual sensory inputs (e.g. olfaction) as well as interfaces that integrate multisensory experiences (e.g. taste and smell). Finally, in [12], a suite of olfaction-enhanced multimedia research challenges was discussed, ranging from standardization, effects of intensity and duration, application domains, delivery and display development to the key problem of methodologies to evaluate the QoE of olfaction-enhanced multimedia.

Figure 1. Factors influencing user Quality of Experience, adapted from [1]

2. IMEx QoE Methodologies

Evaluating user QoE of traditional media components is non-trivial and the addition of immersive and multisensorial

media components increases this challenge. No standardized methodology exists to conduct subjective quality

assessments of immersive & multisensorial media applications [12]. To date researchers have employed different

aspects of audiovisual standards [13][14] to assess user QoE. In terms of IMEx, the literature reports quality assessment approaches that fall into two broad categories: implicit and explicit evaluations [15]. Explicit evaluations require the user to report perceived quality after the experience, using predefined scales (e.g. mean opinion score) or open-ended questions. This approach has been dominant in efforts to capture user QoE of IMEx [9][16][17]. That said, the literature also highlights numerous issues with explicit evaluations: they are time consuming, subject to bias, and prone to inaccuracies in responses due to external factors [18][19].

Implicit evaluations aim to analyze the relationship between captured physiological measures and user QoE. They

have gained traction, in particular due to their real-time, continuous nature. In [20], Engelke et al. provide a survey of psychophysiology-based QoE assessment across a range of multimedia applications. They highlight the advantages and possible opportunities of capturing physiological data along with the psychological bases of perceptual and cognitive processes. Further discussion is also available in [21][22]. Also in [21], interaction measures such as learning, effort required, response times, interaction, errors and satisfaction are employed. These all fall within the human, system, and context domains of QoE and provide valuable objective data on user QoE from an interaction perspective. In [12], specific to olfaction-enhanced multimedia, the authors highlighted issues researchers face from numerous perspectives, including the applicability (or lack thereof) of existing audiovisual standards to evaluate user QoE, the lack of result comparability due to varying approaches, the specific requirements of olfactory-based multisensorial media applications, and the novelty associated with these applications. Finally, based on the diverse approaches in the literature and the collective experience of the authors, [12] provides a tutorial and recommendations on the key steps to conduct olfactory-based multisensorial media QoE evaluation.

3. IMEx QoE studies

In recent times, QoE studies involving IMEx have started to emerge. As mentioned earlier, these have typically fallen

within either implicit or explicit assessment approaches. In this section, we highlight some efforts we have made in

the recent past with respect to QoE studies of IMEx. Initially our work on user perception of olfaction-enhanced



multimedia was inspired by that of Ademoye et al. [22][24][25][26][27]. They instantiated a model first proposed by

Wikstrand [28]. This model defined multimedia quality from technical and user perspectives. As such it

proposed consideration of quality at three levels: network, media and content. The network level considered the effects

of transmission over communications networks on user perceptual quality; the media-level considered the influence

on perceptual quality of how the media is coded for transport; and the content-level is concerned with the transfer of

information and level of satisfaction between the video media and the user, i.e. level of enjoyment [28].

Our work to date [29]-[36] has complemented and extended this by considering network-level effects like delay and jitter [29][33]; by defining a user profile based on age, gender and culture [30][31][32]; by analyzing the influence on QoE of scent type [35] and audio masking effects [34] (both content level); and finally by examining how multiple olfactory streams [33] impact user QoE. The results to date have revealed a number of interesting findings. All of the studies performed with respect to olfaction-enhanced multimedia QoE have involved explicit assessment approaches, borrowing facets from various ITU-T standards [13][14] and various ISO sensory analysis standards [37][38][39]. In

this context, the interested reader can view recommendations we proposed on how to perform olfaction enhanced

multimedia QoE evaluations in [12] with respect to assessor screening and training; olfaction-enhanced multimedia

equipment; laboratory and experimental design as well as methodology.

In terms of the impact on user QoE of network influencing factors (delay and jitter) [29][33], we found that users were quite tolerant of large inter-media skew levels, as per Fig. 2. Fig. 2 shows the user responses to the question of whether or not skew levels between olfaction and visual media were annoying (with 5 meaning they did not detect any skew, 4 meaning that they detected skew but it was not annoying, 3 meaning slightly annoying, and ratings from 3 down to 1 indicating increasing degrees of annoyance, with 1 being very annoying). As per Fig. 2, assessors were willing to accept skew levels of +10s when olfaction was presented after video and -5s when olfaction was presented before video.
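As an illustration of the kind of per-skew analysis summarised in Fig. 2, the following minimal sketch computes a mean opinion score (MOS) and a 99% confidence interval for each skew level. The ratings are invented for illustration and are not the study data; the function name `mos_with_ci` is hypothetical, and the normal approximation for the interval is an assumption (the original analysis may have used a different method, such as a t-distribution).

```python
import math
from statistics import NormalDist, mean, stdev


def mos_with_ci(ratings, confidence=0.99):
    """Return (MOS, confidence-interval half-width) for one skew level."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    mos = mean(ratings)
    # Assumption: normal approximation; a t-quantile is safer for small panels.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mos, half_width


# Hypothetical ratings on the 5-point impairment scale, keyed by skew in seconds
# (negative: olfaction before video; positive: olfaction after video).
ratings_by_skew = {
    -10: [2, 3, 2, 3, 2, 3],
    0: [5, 5, 4, 5, 5, 4],
    10: [4, 4, 3, 4, 5, 4],
}

results = {skew: mos_with_ci(r) for skew, r in ratings_by_skew.items()}
for skew, (mos, half_width) in sorted(results.items()):
    print(f"skew {skew:+d}s: MOS = {mos:.2f} +/- {half_width:.2f}")
```

With real data, each list would hold one rating per assessor for the corresponding skew level.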

In terms of the influence of human factors on olfaction enhanced multimedia QoE, we considered age, gender and

culture as per Fig. 3, Fig. 4 and Fig. 5. In terms of rating the impairment caused by the existence of a synchronization

error, Fig. 3 details the assessor annoyance at varying levels of skew. Assessors rated olfaction before video as more annoying than olfaction after video. The female group was much more sensitive to skew with olfaction before video than the male group, with both groups reporting similar annoyance at skew with olfaction after video. As per Fig. 4, the younger female group was the most sensitive to skew, with the male (20-30 yrs and 30-40 yrs) and female (30-40 yrs) groups similar in terms of their rating of skews. The two older groups were the most tolerant of skew.

In terms of defining the temporal boundaries for synchronizing olfactory and video media based on human factors, we define “in-sync” and “out-of-sync” regions. These boundaries are based on the finding that users were tolerant of certain skew levels, which they rated as “not annoying” (i.e. an impairment rating above 3.5). They also take into account differences in perception based on gender, age and nationality. The in-sync region extends from 0s to a maximum skew of -5s/-15s when olfaction is ahead of video, and from 0s to a maximum skew of +10s/+15s when olfaction is after video, depending on the age, gender and nationality of the user.
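The in-sync region described above can be sketched as a simple threshold search over per-skew MOS values. This is a minimal illustration only: the function name `in_sync_region` and the MOS values are hypothetical, and treating a MOS exactly equal to 3.5 as acceptable is an assumption.

```python
def in_sync_region(mos_by_skew, threshold=3.5):
    """Return (lowest, highest) skew of the contiguous run around 0s
    whose MOS meets the threshold, or None if 0s itself fails."""
    if 0 not in mos_by_skew or mos_by_skew[0] < threshold:
        return None
    skews = sorted(mos_by_skew)
    zero = skews.index(0)
    # Walk towards negative skews (olfaction ahead of video).
    lo = zero
    while lo > 0 and mos_by_skew[skews[lo - 1]] >= threshold:
        lo -= 1
    # Walk towards positive skews (olfaction after video).
    hi = zero
    while hi < len(skews) - 1 and mos_by_skew[skews[hi + 1]] >= threshold:
        hi += 1
    return skews[lo], skews[hi]


# Hypothetical MOS per skew level (seconds); not the study data.
example_mos = {
    -20: 2.1, -15: 2.8, -10: 3.2, -5: 3.7,
    0: 4.5, 5: 4.3, 10: 3.9, 15: 3.4, 20: 2.9,
}
region = in_sync_region(example_mos)
print(region)
```

Running the same function on MOS values computed separately per age, gender or nationality group would give the group-specific boundaries the text refers to.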

In terms of considering the influence of content level factors i.e. scent type (Fig. 6) and the presence of audio (Fig. 7),

some interesting observations can be made. Firstly from Fig. 6, for each of the scent types, whether pleasant or

unpleasant, it is clear that assessors found scents presented after video less annoying than before video. This is

particularly exaggerated with the “unpleasant” scent types such as foul and burnt. As per [35], 21 statistically

significant differences exist across the different skew levels per scent type. For 15 of the 21 of these; one pleasant and

one maybe unpleasant/pleasant or unpleasant scent type were being compared. Further work on the reasons for this

are required, but initial investigation suggests that the content of the video scene was emphasized with the scent. In

terms of temporal boundaries for synchronization of olfaction-enhanced multimedia considering scent type, different

temporal boundaries exist per scent. Again, if we consider a MOS of 3.5 as the minimum required rating, for the foul

scent type, presentation from 0s up to +15s was not annoying, whereas even small skew levels (e.g. -5s) when olfaction

was presented before video fell below this threshold.
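
As an illustration of how a per-skew rating set can be compared against the 3.5 threshold, the following minimal sketch computes a MOS and a 99% confidence interval half-width. The function names are hypothetical, and the normal approximation (z = 2.576) is an assumption; the source does not state how its confidence intervals were computed, and small panels would normally call for a Student's t quantile instead.

```python
import math
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% quantile of the standard normal (assumed method)


def mos_with_ci(ratings, z=Z_99):
    """Return (MOS, CI half-width) for a list of impairment ratings."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mean(ratings), half_width


def is_not_annoying(ratings, threshold=3.5):
    """True when the MOS reaches the minimum required rating from the text."""
    mos, _ = mos_with_ci(ratings)
    return mos >= threshold
```

For example, the ratings [4, 5, 4, 3, 4] give a MOS of 4 and would count as "not annoying", while [2, 3, 3, 2, 3] would not.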

The findings of Fig. 2 and Fig. 7 compare the differences in annoyance levels for users when video only was enhanced

with olfaction (Fig. 2) and when audiovisual media was enhanced by olfaction (Fig. 7) across the various skew levels. As is

clear when comparing both figures, users were much more sensitive to skew level in the absence of audio. In

addition, the results favor the presentation without an audio component when the inter-media presentation is in sync. These

results suggest that the presence of the audio media component of a multisensorial stream acts as a mask

for potential synchronization issues between the olfaction media component and video, hiding some of their negative

effects from the user. These results support and complement the findings in [40].

Fig. 2. Analysis of annoyance level per skew between olfaction and visual media, with confidence intervals based on a 99% confidence level [29].

Fig. 3. Gender analysis of annoyance level per skew, with confidence intervals based on a 99% confidence level [30].

Fig. 4. Gender/age analysis of annoyance level per skew [30].

Fig. 5. Nationality-based analysis of annoyance level per skew, with confidence intervals based on a 99% confidence level [32].

Fig. 6. Assessor perception of skew type considering scent type [35].

Fig. 7. Assessor perception of skew between olfaction and audiovisual media [24].

This concludes our brief overview of our studies in the area of olfaction enhanced multimedia QoE. With an eye to

the future, it is clear that we are only scratching the surface in terms of our understanding of the user QoE of IMEx.

There is a significant shortage of research in this area. The delivery of implicit and explicit datasets by the multimedia

community would be very welcome. In particular, the development of datasets that facilitate analysis to

determine correlations would be very valuable. Considering section 2, it is our belief that we require significantly

more research on the use of psychophysiology-based QoE assessment [20]. In this context, further work to validate

and extend the recommendations we highlighted previously [12] is required. In addition, the type of physiological

sensors employed needs to ensure ecological validity of the data. A collaborative approach is required that

encompasses the multimedia community in addition to HCI, psychology and electronics, among many others. Finally, since

the range of application domains is so varied, another major challenge from a QoE perspective is how we can address

context-based influencing factors, which transcend all layers of Fig. 1. A potential approach here may involve the

development of models that estimate or predict QoE.

4. Conclusion

In this article, we have presented a brief overview of our findings with respect to understanding user QoE of olfaction-

enhanced multimedia. We have considered numerous influencing factors as part of QoE evaluations, including

network-transmission-related effects, human factors and content factors. Understanding user QoE of olfaction-based

applications is non-trivial, and as such we have proposed a number of research challenges for the multimedia

community to consider.

Acknowledgement

This work was partly funded by the Irish Research Council New Foundations Scheme.

References

[1] Le Callet, P., Möller, S., and Perkis, A. 2012. “Qualinet White Paper on Definitions of Quality of Experience”. European

Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003).

[2] Ebrahimi, T. 2009. “Quality of Multimedia Experience: Past, Present and Future”, ACM Multimedia Conference (MM’09), pp.

3-4.

[3] Stankiewicz, R, Cholda, P., Jajszczyk, A. 2011 “QoX: What is it Really?” In IEEE Communications Magazine, vol. 49, no. 4,

pp 148–158.

[4] Reiter, U., Brunnstrom, K., Moor, K. D., Larabi, M-C., Pereira, M., Pinheiro, A., You, J., Zgank, A. 2014. “Factors influencing

Quality of Experience”. In Quality of Experience, Advanced Concepts, Applications and Methods.

[5] Ghinea, G., Ademoye, O. A. 2011. “Olfaction-enhanced multimedia: perspectives and challenges”. In Multimedia Tools and

Applications, vol. 55, no. 3, pp. 601-626

[6] P. T. Kovács, N. Murray, G. Rozinaj, Y. Sulema and R. Rybárová. 2015. "Application of immersive technologies for education:

State of the art," 2015 International Conference on Interactive Mobile Communication Technologies and Learning (IMCL),

Thessaloniki, pp. 283-288, doi: 10.1109/IMCTL.2015.7359604

[7] Murray, N., Qiao, Y., Lee, B., Karunakar, AK, Muntean, G.-M., “Olfaction enhanced multimedia: A survey of application

domains, displays and research challenges". In ACM Computing Surveys 48:4, 2016.

[8] Obrist, M., Tuch, A.N., Hornbaek, K., 2014. “Opportunities for Odor: experiences with smell and implications for

technology” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2843-2852.

doi>10.1145/2556288.2557008

[9] Obrist, M., Vi, C., Ranasinghe, N., Israr, A., Cheok, A., Spence, C., Gopalakrishnakone, P., 2016. “Sensing the Future of HCI:

Touch, Taste, and Smell User Interfaces”. In Interactions, Vol. 23, Issue 5, pp. 40-49, 2016.

[10] Obrist, M., Gatti, E., Maggioni, E., CT Vi, Velasco, C.. “Multisensory Experiences in HCI”. In IEEE Transactions on

Multimedia, Vol. 24, Issue 2, pp. 9-13, June 2017.

[11] Spence, C., Obrist, M., Velasco, C. and Ranasinghe, N. 2017. “Digitizing the chemical senses: Possibilities & Pitfalls”. In

International Journal of Human-Computer Studies, vol. 107, pp. 62-74. https://doi.org/10.1016/j.ijhcs.2017.06.003

[12] Murray, N., Ademoye, O. A., Ghinea, G., Muntean, G.-M. 2017. “A Tutorial for Olfaction-based Multisensorial Media

Application Design and Evaluation” In ACM Computing Surveys (CSUR), Vol. 50, Issue 5, Article No. 67,

doi>10.1145/3108243

[13] ITU-R BT.500. Methodology for the subjective assessment of the quality of television pictures, 2002.

[14] ITU-T P.910. Subjective video quality assessment methods for multimedia applications, 2008.

[15] J. Puig, A. Perkis, F. Lindseth and T. Ebrahimi, "Towards an Efficient Methodology for Evaluation of Quality of Experience in Augmented

Reality," in Quality of Multimedia Experience (QoMEX), 2012.

[16] J. Cha, M. Eid, A. Barghout, A. M. Rahm and A. El Saddik, "HugMe: Synchronous Haptic Teleconferencing," in ACM international

conference on Multimedia, 2009.

[17] A. Drachen, L. E. Nacke, G. Yannakakis and A. L. Pedersen, "Correlation between Heart Rate, Electrodermal Activity and Player Experience,"

in SIGGRAPH Symposium on Video Games, 2010.

[18] T. Hoßfeld, R. Schatz and S. Egger, "SOS: The Mos Is Not Enough!," in Quality of Multimedia Experience (QoMEX), 2011

[19] E. Kroupi, P. Hanhart, J.-S. Lee, M. Rerabek and T. Ebrahimi, "Modeling Immersive Media Experiences by Sensing Impact on Subjects,"

Multimedia Tools and Applications, vol. 75, p. 12409–12429, 2016.

[20] U. Engelke, D. P. Darcy, G. H. Mulliken, S. Bosse, M. G. Martini, S. Arndt, J.- N. Antons, K. Y. Chat, N. Ramzan and K. Brunnström,

"Psychophysiology-Based QoE Assessment: A Survey," IEEE Journal of Selected Topics in Signal Processing, 2017.

[21] Keighrey, C., Flynn, R., Murray, S., Brennan, S., and Murray, N. 2017. "Comparing user QoE of AR and VR applications using physiological

and interaction measurements”. In 25th ACM International Conference on Multimedia (ACM MM 2017), Thematic Workshop, Oct 2017, in Mountain View, CA.



[22] Keighrey, C., Flynn, R., Murray, S., and Murray, N. 2017. "A QoE Evaluation of Immersive Augmented and Virtual Reality Speech

& Language Assessment Applications". In 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX),

Erfurt, pp. 1-6. doi: 10.1109/QoMEX.2017.7965656.

[23] Ademoye, O. A. and Ghinea, G. 2009. Synchronization of Olfaction-Enhanced Multimedia. IEEE Trans. Multimed., vol. 11,

no. 3, pp. 561–565.

[24] Ghinea, G. and Ademoye, O. A. 2010. “Perceived Synchronization of Olfactory Multimedia”. In IEEE Trans. on SYSTEMS,

MAN, AND CYBERNETICS – PART A: SYSTEMS AND HUMANS vol. 40, issue 4, pp. 657-663.

[25] Ghinea, G., Ademoye, O. A. 2012. “The Sweet Smell of Success: Enhancing Multimedia Applications with Olfaction”, ACM

Transactions on Multimedia Computing, Communications and Applications (TOMM), vol. 8, no. 1, article 2.

[26] Ghinea, G., Ademoye, OA. 2009. “Olfaction-enhanced multimedia: Bad for information recall?” International conference on

Multimedia and Expo (ICME), pp. 970-973.

[27] Ghinea, G. and Ademoye, OA. 2012. “User perception of media content association in olfaction-enhanced multimedia”. In

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 8, no. 4, article 52.

[28] Wilkstrand, G. 2003. “Improving user comprehension and entertainment in wireless streaming media” In Introducing

Cognitive Quality of Service, Department of Computer Science, Umea, Sweden.

[29] Murray, N., Qiao, Y., Lee, B., Karunakar, AK, Muntean, G.-M. 2013. “Subjective Evaluation of Olfactory and Visual Media

Synchronization” In Proceedings of ACM Multimedia Systems conference. Feb 26 - March 1, Oslo, Norway.

[30] Murray, N., Qiao, Y., Lee, B., Muntean, G.-M., Karunakar, AK. 2013. “Age and Gender Influence on Perceived Olfactory

and Visual Media Synchronization” In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), San

Jose, CA, 2013, pp. 1-6.

doi: 10.1109/ICME.2013.6607467

[31] Murray, N., Lee, B., Qiao Y., and Muntean, G.-M. 2016. "The Influence of Human Factors on Olfaction based Mulsemedia

Quality of Experience" . In 8th International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, 2016, pp. 1-

6.

doi: 10.1109/QoMEX.2016.7498975

[32] Murray, N., Qiao Y., Lee, B., and Muntean, G.-M. 2014. "User-Profile-Based Olfactory and Visual Media

Synchronization". ACM Transactions on Multimedia Computing Communications and Applications (TOMM), vol 10. issue

1s, article 11, doi>10.1145/2540994.

[33] Murray, N., Lee, B., Qiao, Y., Muntean, G.-M. 2014. “Multiple-Scent Enhanced Multimedia Synchronization” In ACM

Transactions on Multimedia Computing, Communications, and Applications (TOMM). Volume 11 Issue 1s, September

2014 Article No. 12 doi>10.1145/2637293.

[34] Ademoye, O.A., Murray, N., Muntean, G.-M. and Ghinea, G. 2016. "Audio Masking Effect on Inter-Component Skews in

Olfaction-Enhanced Multimedia Presentations". In ACM Transactions on Multimedia Computing Communications and

Applications (TOMM), vol. 12, issue 4, article 51.

[35] Murray, N., Lee, B., Qiao, Y., Muntean, G.-M., 2017. "The Impact of Scent Type on Olfaction-enhanced Multimedia Quality

of Experience". In IEEE Transactions on Systems, Man, and Cybernetics, vol. 47, issue 9, pp. 2503-2515, doi>

10.1109/TSMC.2016.2531654

[36] Egan, D., Keighrey, C., Barrett, J., Qiao, Y., Brennan, S., Timmerer, C., and Murray, N. "Subjective Evaluation of an Olfaction Enhanced

immersive Virtual Reality Environment" . In Proceedings of the 2nd International Workshop on Multimedia Alternate Realities, in Mountain View, CA., pp. 15-18, 2017, doi>10.1145/3132361.3132363

[37] ISO 5492:2008 Sensory analysis – Vocabulary, International Standards Organization (ISO)

[38] ISO/IEC 8589 Sensory analysis – General guidance for the design of test rooms.

[39] ISO 5496:2006 – Sensory Analysis – Methodology – Initiation and training of assessors in the detection and recognition of odours

[40] B.R. Brkic, A. Chalmers, K. Boulanger, S. Pattanaik, and J. Covington. 2009. “Cross-modal effects of smell on the real-time rendering of

grass” Spring Conference on Computer Graphics (SCCG), Budmerice, Slovakia, pp. 161-166.

Niall Murray is a Lecturer with the Faculty of Engineering and Informatics, in the Athlone Institute

of Technology (AIT), Ireland. He is founder (in 2014) and principal investigator (PI) in the truly

Immersive and Interactive Multimedia Experiences (tIIMEx) research group in AIT. He is a Science

Foundation Ireland (SFI) Funded Investigator (FI) in the Confirm Centre for Smart manufacturing

and an associate PI on the Enterprise Ireland funded Technology Gateway COMAND. His current

research interests include immersive and multisensory multimedia communication and applications,

multimedia signal processing, quality of experience, and wearable sensor systems. He has published

over 40 works in top-level international journals and conferences and book chapters. Further information available at:

www.niallmurray.info




Yuansong Qiao is a senior research fellow and a Science Foundation Ireland funded Investigator

working in the Software Research Institute (SRI) at Athlone Institute of Technology. He received

his Ph.D. in Computer Applied Technology from the Institute of Software, Chinese Academy of

Sciences, Beijing, China, in 2007. He received a BSc and an MSc in Solid Mechanics from Beijing

University of Aeronautics and Astronautics, China in 1996 and 1999 respectively. His research

interests include Information Centric Networking, Software Defined Networking, and networking

support for emerging multimedia and big data systems.

Conor Keighrey is a research candidate in the Athlone Institute of Technology (AIT), Ireland.

He received his BSc. in Computer Network Management and Cloud Infrastructure in 2016 and is

currently in pursuit of his PhD. His current research work focuses on understanding the key

influencing factors that affect quality of experience of emerging immersive multimedia

experiences (Augmented Reality and Virtual Reality).

Débora Pereira Salgado is a research candidate in the Athlone Institute of Technology (AIT),

Ireland. She received her BS in Biomedical Engineering from the Universidade Federal de Uberlândia in

2017 and is currently in pursuit of her PhD. Her current research work focuses on evaluating the

utility and relationship of various physiological metrics and user quality of experience of emerging

immersive multimedia experiences.

Gabriel-Miro Muntean is an Associate Professor with the Dublin City University (DCU) School

of Electronic Engineering and co-director of the DCU Performance Engineering Laboratory,

Ireland. His research interests include quality, performance, and energy saving issues related to

multimedia and multiple sensorial media delivery, technology-enhanced learning, and other data

communications over heterogeneous networks. He has published more than 300 papers in top-level

international journals and conferences, and has authored three books and 16 book chapters and

edited six additional books. He is an Associate Editor for IEEE Transactions on Broadcasting, an

Editor for the IEEE Communications Surveys & Tutorials, and a reviewer for important international journals,

conferences, and funding agencies. He is project coordinator for the EU-funded project NEWTON

(http://newtonproject.eu).

Kemi Ademoye is a lecturer in computing at University of Wales Trinity Saint David. Her

research interests revolve around the areas of multisensory computing, user interfaces and

human-computer interaction, enterprise data integration and application development processes.

Kemi has an MSc and PhD from Brunel University.

Christian Timmerer is an associate professor in the Department of Information Technology

(ITEC), Multimedia Communication Group (MMC), Alpen-Adria-Universität Klagenfurt, Austria.

His research interests include immersive multimedia communication, streaming, adaptation, and

Quality of Experience. He was the general chair of WIAMIS 2008 and QoMEX 2013 and has

participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next,

ALICANTE, SocialSensor, and COST IC1003 QUALINET. He also participated in ISO/MPEG

work for several years, notably in the area of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH.

He received his PhD in 2006 from the Alpen-Adria-Universität Klagenfurt. In 2012 he co-founded bitmovin.com to

provide professional services around MPEG-DASH.

Gheorghita Ghinea is a Professor of Mulsemedia Computing in the Computer Science

Department at Brunel University, United Kingdom. His research activities lie at the confluence of

Computer Science, Media and Psychology. In particular, his work focuses on the area of building

end-to-end communication systems incorporating user perceptual requirements. He has authored




over 300 articles in peer-reviewed journals and conferences and co-edited two books on Digital Multimedia

Perception and Design, and Multiple Sensorial Media Advance and Applications.

Brian Lee is director of the SRI in the Athlone IT. He joined AIT in August 2009, having previously

been Research manager in LM Ericsson in Ireland where he supervised a team of 20 researchers

investigating solutions in network management for Ericsson’s Operations Support System (OSS) for

mobile and fixed networks. He has over 20 years of experience in research and system design of

network management solutions for large scale telecommunications networks. He has participated in

many national and international research projects and is coordinator for the H2020 project on Cyber

security, Protective (https://protective-h2020.eu/). He holds a PhD from Trinity College Dublin in

the area of policy management applied to charging. His research interests focus on security and self-adaptive software

systems for network management.




Psychophysiological Methods for Quality of Experience Research in Virtual Reality

Systems and Applications

Miguel Barreda-Ángeles, Rafael Redondo-Tejedor, Alexandre Pereda-Baños

Eurecat – Technology Centre of Catalonia, Barcelona, Spain


1. Introduction

Virtual Reality (VR) technology has undergone impressive development in recent years. Although the term and

its applications have been around for decades, it was only a few years ago that the density of integrated circuits and

displays, computer performance and memory capacity began to allow the construction of head-mounted devices (HMD)

that simulate virtual spaces. To do so, the HMD presents two spatially and temporally coherent video signals, one to

each of the user's eyes. The visual system is then responsible for forming a 3D scene through the neural mechanism

known as stereopsis, resulting in an intense immersive experience [1].

Although 3D audio technology has been relegated to the background in the VR field, the importance of localizing

sound in 3D space for achieving profound immersive experiences has recently become evident. Analogously to the

Human Visual System (HVS), ambisonics technology uses dedicated Head-Related Transfer Function (HRTF) filters for

specific orientations to recreate perceptual auditory illusions in the Human Auditory System (HAS). The effect is that

a sound can be heard as coming from any combination of azimuth and elevation using a simple two-channel

(stereo) audio stream, which is also called binaural listening [2].
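
The core filtering step behind binaural listening can be sketched as convolving a mono source with a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTF). This is a simplified illustration under stated assumptions, not an ambisonics decoder: the function names are hypothetical, and toy_hrir_pair fabricates a crude impulse-response pair (a delay and an attenuation for the far ear) purely for demonstration, whereas a real renderer would use measured HRIRs for the desired azimuth and elevation.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with an HRIR pair; returns an (n, 2) stereo array.

    Both HRIRs must have the same length so the two channels align.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)


def toy_hrir_pair(fs, itd_s=0.0006, far_ear_gain=0.5, length=64):
    """Hypothetical HRIRs for a source on the listener's left.

    The right (far) ear receives the sound later and quieter. These values
    are illustrative assumptions, not measured data.
    """
    delay = int(round(itd_s * fs))
    hrir_left = np.zeros(length)
    hrir_left[0] = 1.0
    hrir_right = np.zeros(length)
    hrir_right[delay] = far_ear_gain
    return hrir_left, hrir_right
```

Feeding an impulse through the toy pair shows the inter-aural delay and level difference directly in the two output channels.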

VR is a cross-cutting technology that extends beyond the realm of video games, currently its major contributor in

terms of development and market revenue [3]. VR applications can be found in TV and cinema, documentary, medical

systems, education, museums, journalism, modelling, industrial processes and therapeutic treatment, to name just a few.

Technology companies related to these fields to a greater or lesser extent, and especially the major international

companies able to provide VR-related technologies, are aware of this potential, and their investments reportedly

rose from $3 billion in 2016 to $6 billion in 2017. Key developments in 2017 include the following: Facebook dropped

the price of a wireless VR headset to $200, the HTC Vive and Samsung Gear glasses received a second important upgrade,

Adobe acquired the company Mettle for VR post-production, Apple announced a dedicated VR framework for developers

and acquired the VR plugin suite Dashwood, and Google released the Daydream VR glasses. Several IEC technical

committees (TCs) and their subcommittees (SCs) produce International Standards for hardware and software used in this

domain. For example:

- ISO/IEC JTC 1, the Joint Technical Committee of IEC and the International Organization for Standardization

(ISO), covers standardization for information technology.

- The subcommittee ISO/IEC JTC 1/SC 24 works on interfaces for information technology-based applications

relating to computer graphics and virtual reality, image processing, environmental data representation,

support for mixed and augmented reality, and interaction with, and visual presentation of information.

- Sensors are vital components of VR technology. IEC TC 47 and its Subcommittees produce Standards for

microelectromechanical systems (MEMS), to ensure that sensors and such systems work reliably and

efficiently.

- The activities of IEC TC 100 contribute to the quality, performance and interoperability of audio, video and

multimedia systems and equipment.

- IEC TC 110 covers electronic display devices and certain components, such as dashboard touchscreens in

cars.

A key factor holding back the widespread adoption of VR by consumers is the limited ability of current VR systems to

provide satisfactory user experiences [2]. Compared to more traditional audiovisual systems, VR systems not only

involve additional factors that may negatively impact the experience, but also have contexts of use in which completely

novel issues regarding user experience may arise. Optical limitations due to parallax effects hinder the capture of

coherent monoscopic video with 360° multi-camera systems [4]. This is even more challenging for the

stereoscopic video format, which is probably, in the end, an ill-posed problem. Thus, image processing algorithms

based on optical flow and depth estimation aim to mitigate artifacts between stitched areas from different video

streams, which may lead to ghosting effects and perspective losses [5]. Analogously, the most advanced ambisonics

microphones are composed of multiple capsules (32 for 4th order), while YouTube, the major 360° video streaming

platform, currently supports 1st-order ambisonics. Moreover, reducing front, back and bottom confusion areas is still

challenging for audio processing. Another important limitation in perceptual terms is on the HMD side. The human

Field of View (FoV) for monocular vision is about 170°-175°, whereas most headsets currently offer about

100°-110° [6]. Average pixel density is about 500 ppi, which is still not enough given an average human visual

acuity of 5 arc minutes. Fresnel lenses are probably used to achieve a thinner design for a short-distance focus

point, but such a design suffers from clearly visible concentric circular artifacts. Along with stereopsis and

context awareness, depth-from-focus is one of the HVS mechanisms for inferring the relative size of objects in a 3D scene.

This last cue is simply absent, since HMD optical systems impose a fixed focal length of about 3-4 cm [6]. Some

experimental optical systems allow several focal planes, but they are still far from the smooth focal transition

experienced in real life [7]. All these technical limitations ultimately translate into perceptual incoherence of

audio-visual stimuli. A sound not aligned with its visual source, an object that vanishes or gets distorted as it moves,

or an impossible focus perspective can cause confusion and fatigue and, in the extreme, be responsible for unpleasant

experiences such as cybersickness.

The development of methods for analyzing users’ Quality of Experience (QoE) in VR systems and applications is

therefore a topic that deserves attention from researchers in the field. The aim of the present article is to provide an

overview of the current state of the art, and thereby to stress the advantages of such methods as well as the main

challenges they face.

2. Psychophysiological Methods and QoE Research

Psychophysiological methods can be defined as methods to provide information on psychological states of an

individual based on the analysis of his or her physiological responses. They include the measurement of activity

proxies of the peripheral nervous system, such as electrodermal activity (EDA), heart rate (HR) and heart rate

variability (HRV), electromyography (EMG, and, particularly, facial EMG), or respiration rate, as well as the analysis

of activity of the central nervous system, using techniques such as electroencephalography (EEG). Such signals reflect

diverse cognitive and emotional processes that can be useful for understanding how the user thinks and feels at a given

moment. For instance, increases in EDA have been related to experiences of emotional arousal, cognitive effort, or

stress; HRV is considered to reflect parasympathetic nervous activity associated with attentional focus and emotional

regulation; and facial EMG over certain facial muscles is adequate to represent the hedonic valence of the emotions

experienced by the user [5].
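
To make the HR and HRV measures concrete, the following minimal sketch derives both from a series of inter-beat (RR) intervals. RMSSD is used here as one widely used time-domain HRV index that is commonly associated with parasympathetic activity; choosing it is an assumption for illustration, since the text does not name a specific HRV metric, and the function names are hypothetical.

```python
import math


def mean_heart_rate_bpm(rr_intervals_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    if not rr_intervals_ms:
        raise ValueError("need at least one RR interval")
    return 60000.0 / (sum(rr_intervals_ms) / len(rr_intervals_ms))


def rmssd(rr_intervals_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

In practice the RR series would first be cleaned of ectopic beats and motion artifacts, a step omitted here.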

QoE is generally considered a multidimensional concept that can be defined as “the degree of delight or annoyance

of the user of an application or service”, resulting from “the fulfillment of his or her expectations with respect to

the utility and/or enjoyment of the application or service in the light of the user’s personality and current state” [3].

Research on QoE with immersive systems such as 3DTV has traditionally relied on self-reported methods such as

psychophysical scales or questionnaires. However, despite the undeniable utility of such methods, in recent years

several researchers have highlighted their limitations, and made the case for the use of psychophysiological methods

[4] for exploring different dimensions of QoE.

Standard evaluations of stereoscopic image quality often rely on the psychophysical scaling methods initially proposed

in ITU recommendations ITU-R BT.1438 or ITU-R BT.500, which allow quantifying specific factors such as the degree of

perceived sharpness, or more general ones, such as overall image quality. Traditional self-reported methods usually

involve asking the user to make a judgment about the quality after watching a certain content (or conducting a certain

task), thus forcing him or her to summarize in a single score an experience that happens over time. This involves losing

information on the temporal development of the quality of the experience, and also makes the scores subject to memory

issues and cognitive biases (e.g. the peak-end rule). In the best cases, continuous assessment systems involve the user

giving continuous judgments about the quality, which can overcome some of those limitations, but still require the

user to split his or her attention between the content and the assessment task, and hence may reduce the ecological validity

of the test.

By contrast, psychophysiological signals can be collected with a high sampling rate during the experience, making it

easy to synchronize changes in the signals with events in the content or application, and also providing more objective

information on the temporal evolution of the subject’s experience without asking the participant to make any extra

effort. Psychophysiological evaluations are also able to capture psychological processes or states even if the user is

not aware of them, and are to some extent free of cognitive biases that may affect self-reported methods (e.g. social

desirability). Due to these and other advantages, in recent years psychophysiological methods have started to become

popular among researchers interested in obtaining a picture of the mental processes of users, especially in areas like

media psychology or user experience research.

Much QoE research with multimedia content has focused on subjective judgments of the visual and auditory perceived

quality. In this context, there is evidence that some psychophysiological measures, specifically some brain potentials

measured through EEG, correlate with visual and auditory quality of the stimuli [6][7]. In particular,

psychophysiological methods have great potential for the measurement of visual fatigue and discomfort, which are

usually associated with the presence of visual distortions in stereoscopic content. Regarding EEG, previous studies

suggest that the power of some frequency bands, such as the alpha or beta bands, is related to visual fatigue, and that the

relationship between the power of different bands can be used as a metric for visual fatigue [8].
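
A band-power relationship of this kind can be sketched with a plain FFT periodogram. The sketch below is illustrative only: the function names are hypothetical, the band limits (alpha 8-13 Hz, beta 13-30 Hz) follow common convention rather than the source, and the alpha/beta ratio is just one possible choice, since the text does not specify which combination of bands the cited work uses.

```python
import numpy as np


def band_power(signal, fs, low_hz, high_hz):
    """Sum of periodogram power of `signal` within [low_hz, high_hz)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove DC offset before the transform
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / x.size
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return float(power[mask].sum())


def band_power_ratio(signal, fs, band_a=(8.0, 13.0), band_b=(13.0, 30.0)):
    """Ratio of power in band_a to power in band_b (assumed metric)."""
    denominator = band_power(signal, fs, *band_b)
    if denominator == 0.0:
        raise ValueError("no power in the denominator band")
    return band_power(signal, fs, *band_a) / denominator
```

Real EEG analysis would typically add artifact rejection and an averaged spectral estimate (for example Welch's method) over several windows; the single periodogram here keeps the example short.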

Other researchers have relied on physiological indicators of the indirect effects of visual distortions, such as changes in motivational or

emotional reactions towards the content. For instance, low visual quality is associated with changes in the frontal

alpha asymmetry index, an EEG indicator of motivational approach [9]. The activity of the peripheral nervous

system, such as EDA or HR, has also shown correlation with different levels of visual quality and users’ emotions while

experiencing 3D content [10][11][12][13].
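
The frontal alpha asymmetry index mentioned above is commonly computed as the difference between the log-transformed alpha power at a right and a left frontal electrode. The sketch below follows that common convention; the function name is hypothetical, and the choice of electrode pair (often F4 and F3) is an assumption that the text does not specify.

```python
import math


def frontal_alpha_asymmetry(alpha_power_left, alpha_power_right):
    """ln(right alpha power) - ln(left alpha power).

    Under the usual interpretation, higher values indicate relatively
    greater left-frontal activity (alpha power is inversely related to
    cortical activity), which is read as approach motivation.
    """
    if alpha_power_left <= 0 or alpha_power_right <= 0:
        raise ValueError("alpha power must be positive")
    return math.log(alpha_power_right) - math.log(alpha_power_left)
```

The two inputs would be alpha-band power estimates from homologous left and right frontal channels over the same time window.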

These precedents are undoubtedly also useful in the context of VR QoE assessment, since many of the

problems of visual distortions found in more traditional stereoscopic content are also present in VR. Furthermore,

VR involves new sources of distortion, such as video stitching, image freezing due to low rendering rates while moving

the head, and generally incoherent image formation of the perceived scene, which may cause visual fatigue and

discomfort, i.e. low QoE. However, the most relevant factor of QoE in VR, and probably the one that most differentiates it

from other types of multimedia content, is the occurrence of cybersickness. Perceptual incoherence and long exposure

or simply high sensitivity can provoke symptoms such as headache, disorientation, nausea, eye strain, sweating, or

vomiting, among others [14]. Although usually assessed using self-reported measures, cybersickness also has

observable effects on psychophysiological variables such as the delta and beta bands of EEG, heart period, respiration,

or blinking [15][16]. Cybersickness is associated with the presence of illusory self-motion (vection) in the represented

environment [18][19], but its etiology is not yet well understood. In this sense, the more traditional explanation has

been the one provided by sensory conflict theory [20], according to which the mismatch between the sensory cues

provided during illusory movement and the expectations resulting from previous experience causes the symptoms of

cybersickness. Alternatively, postural instability theory [21] argues that the reason behind motion sickness is not

sensory conflict itself, but the inability to achieve effective postural adaptations to conflicting visual, vestibular, and

proprioceptive information. Research has shown that body sway predicts the occurrence of cybersickness [22][23],

thus providing support for this theory. Research on the user experience of VR can not only benefit from this research,

but also contribute significantly to it.

However, one of the main advantages of VR systems compared to traditional systems is the ability to elicit in the user

a feeling of spatial presence, that is, the feeling of “being” in the virtual location. Presence is usually measured through

questionnaires [17], but there are important concerns about the validity and reliability of self-reported methods when

measuring such constructs [18]. An alternative is the use of psychophysiological methods that account for indirect

effects of the feeling of presence. For instance, the feeling of presence is assumed to be related to the elaboration of

mental model of the spatial properties of the environment [19], as well as related to stronger emotional responses to

events in the environment [20]. Thus, activity changes in brain areas related to spatial navigation (measured through

EEG), or more commonly enhanced emotional reactions [21] could be taken as indexes of spatial presence experiences

in VR environments. An interesting approach, in which the advantages provided by the temporal resolution of

psychophysiological measures are evident, consists of focusing on “breaks in presence” (BIP), that is, points in which

the feeling of presence momentary vanishes into attentional orienting responses (probably the reflecting attention

shifts from the virtual to the real world), which leaves a measurable trace on physiological signals such as HR or EDA

[22].

Another important high-level aspect of QoE for which psychophysiological methods can be crucial is the
measurement of user engagement. Although it is generally considered a key factor in describing the relationship
between users and technology, it is a blurry concept that has been defined in several ways (cf. [23]). Some recent
approaches tend to emphasize its relationship with cognitive, emotional, and motivational processes, such as, for
instance, motivational approach as measured by frontal alpha asymmetry [23]. Interestingly, it has been observed that
the consistency over time of attentional and emotional responses obtained from a small sample is a good predictor of
behavioral measures of engagement of much larger audiences [24][25]. This opens an avenue for broadening the scope
of QoE research: by operationalizing engagement as between-subjects synchrony of attentional and emotional
responses, one can explore how certain visual and auditory distortions affect not only individual satisfaction
(“delight or annoyance with the application or service”), but also users’ engagement in a more general fashion.
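As a rough illustration of the between-subjects synchrony idea in the paragraph above, the following Python sketch scores a clip by the mean pairwise Pearson correlation of viewers' response time series. The function names, the toy traces, and the choice of plain Pearson correlation as the synchrony measure are assumptions made for illustration only; the studies cited above rely on their own, more elaborate metrics.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def intersubject_synchrony(responses):
    """Mean correlation over all pairs of subjects (higher = more synchrony)."""
    pairs = list(combinations(responses, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-second arousal traces (e.g. normalized EDA) for three viewers.
engaging_clip = [
    [0.1, 0.4, 0.9, 0.7, 0.2],
    [0.2, 0.5, 1.0, 0.6, 0.1],
    [0.0, 0.3, 0.8, 0.8, 0.3],
]
dull_clip = [
    [0.1, 0.4, 0.9, 0.7, 0.2],
    [0.9, 0.2, 0.1, 0.5, 0.8],
    [0.5, 0.6, 0.4, 0.5, 0.6],
]

sync_engaging = intersubject_synchrony(engaging_clip)
sync_dull = intersubject_synchrony(dull_clip)
print(f"engaging: {sync_engaging:.2f}, dull: {sync_dull:.2f}")
```

In this toy setting the clip whose traces rise and fall together obtains the higher score; whether such a score tracks engagement in a real VR study is an empirical question.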

3. Future directions and challenges

The ability to provide information on psychological processes even below the subject’s awareness, without any
voluntary effort from the user and with high temporal resolution, makes psychophysiological methods especially suitable
for tackling aspects of QoE in VR environments compared to self-reported methods. In particular, for experiences that
unfold over time, such as videogames, training, or psychological treatment, these methods can provide insights into
perceived visual and auditory quality, degree of cybersickness, feeling of presence, or user engagement. This is
achieved while barely affecting the user’s attention, and in a way that is robust to the subject’s memory biases and
limited ability for introspection. This is certainly an advantage, since handling physical tools to report subjective
impressions is complicated while the face is covered, and reporting inside the VR world could bias the evaluation.

Conversely, one limitation of psychophysiological methods is that, in their current state, they are informative of
basic psychological constructs (e.g. physiological arousal, attentional effort), but they do not directly reflect
higher-level constructs such as presence, engagement, or satisfaction. A central challenge for researchers in QoE is
therefore to explore psychological models inferred from psychophysiological signals. In this sense, recent work has
demonstrated that cybersickness can be predicted from changes in physiological indicators such as respiration,
stomach activity, and blinking [15]. However, there are no reports of predictive models of aspects such as feeling of
presence, enjoyment, or satisfaction, which are central to QoE. One difficulty is the lack of a ground truth for
prediction, since the operationalization of psychological states usually relies on self-reported methods (e.g. presence
questionnaires), with the associated problems regarding validity, reliability, and temporal resolution already
mentioned. The variety of application contexts in VR can also make it difficult to establish direct relationships
between low-level psychological processes and high-level inferences. For instance, the presence of negative emotions
and stress is probably an indication of a bad experience for a training application, but may be the key factor in some
entertainment contexts. To overcome these issues, a possible alternative could rely on observable behavioral outcomes
as a proxy for the psychological inferences, for instance choice-related metrics, use time, or implicit association tests.
Additionally, the variability of psychophysiological signals makes it unlikely that a single model based on
psychophysiological measures can accurately predict subjective aspects of QoE for all users. In this respect, a possible
approach is to focus on finding clusters of users with similar response patterns and learning a model for each
cluster.
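The cluster-then-model idea above can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the single "reactivity" feature used to group users, the tiny one-dimensional k-means, the linear per-cluster model, and the toy data. It is not a method taken from the cited literature.

```python
from statistics import mean


def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means; returns one cluster label per value."""
    centers = list(centers)
    labels = [nearest(v, centers) for v in values]
    for _ in range(iterations):
        for k in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean(members)
        labels = [nearest(v, centers) for v in values]
    return labels


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical users: a physiological reactivity score plus
# (arousal, self-reported rating) samples collected during a session.
users = [
    {"reactivity": 0.2, "samples": [(1.0, 4.0), (2.0, 3.0)]},
    {"reactivity": 0.3, "samples": [(3.0, 2.0), (4.0, 1.0)]},
    {"reactivity": 0.8, "samples": [(1.0, 2.0), (2.0, 3.0)]},
    {"reactivity": 0.9, "samples": [(3.0, 4.0), (4.0, 5.0)]},
]

reactivity = [u["reactivity"] for u in users]
labels = kmeans_1d(reactivity, centers=[min(reactivity), max(reactivity)])

# Learn one rating model per cluster instead of a single global model.
models = {}
for k in sorted(set(labels)):
    samples = [s for u, label in zip(users, labels) if label == k
               for s in u["samples"]]
    xs, ys = zip(*samples)
    models[k] = fit_line(xs, ys)

print(labels, models)
```

In the toy data the two groups respond to arousal in opposite directions, so a single pooled regression would average the effect away, while the per-cluster fits recover both trends.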

The unique way of experiencing VR content creates specific challenges for the experimental design of QoE tests.
The fact that the user can rotate the head and gaze in any direction may cause relevant events
to go unnoticed. With 6 degrees of freedom, where the user can also move around the room, this is even
more critical. This highlights the need for adequate experimental designs that find the right balance between the
user’s free behavior (in order to keep the ecological validity of the test) and the constraints imposed by the instructions
needed to accomplish the evaluation goals.

The physical characteristics of the HMD itself impose some constraints on the type of psychophysiological measurements
that can be taken. The headset hinders the use of EEG and the attachment of facial EMG electrodes, hand controllers
may obstruct the placement of EDA sensors, which are usually attached to the fingers, and a highly wired environment can
hamper the user’s movements. In this regard, recent wearable systems for psychophysiological recordings (e.g. Empatica,
Shimmer sensing) are a fundamental step towards the incorporation of psychophysiological methods into VR-QoE.
A second generation of wireless VR glasses has been announced by the companies Oculus and HTC in 2018. There
exist methods for measuring HR by means of cameras and computer vision techniques (without attaching any
physical sensors) in the context of QoE research [26]. However, approaches based on computer vision
techniques would need to address important challenges, such as the fact that the user’s face (whose detection is key in
many cases) is partially occluded by the headset or even by the user’s arms. Similarly, although dedicated eye-tracking
devices have been widely used in basic research for decades, especially for analyzing task accomplishment, only
recently have some companies, such as FOVE or TOBII, been able to incorporate eye-tracking systems into VR glasses,
and presumably most major VR companies will do so in the near future.

4. Conclusion

The challenges posed to psychophysiology-based QoE research in VR environments encourage researchers not only
to rethink traditional QoE metrics and experimental evaluations, but also to push the boundaries
of QoE as a research field. VR expands the use of audiovisual systems from entertainment to very disparate areas
including psychological treatment, data visualization, rehabilitation, and training, among others. The ‘delight’ or
‘annoyance’ of users in such disparate contexts presents several dimensions that go far beyond issues of perceptual
quality or visual discomfort, and that can hardly be understood on the basis of subjective perceptions alone. Aspects such
as spatial presence, the user’s attentional and emotional involvement, cognitive effort, or stress are key to explaining
how a user senses, feels, and thinks while experiencing a certain application or service, and psychophysiology-based
methods may be the best option currently available for understanding them. In combination with more traditional
self-reported approaches, as well as with other multimodal measurements, they can help not only to test how
satisfactory a certain VR system or application can be for some users from a consumer perspective, but also, in general
terms, to understand how a disruptive technology such as VR may impact several aspects of our lives.

References
[1] Barnard, S. T. and M. A. Fischler (1987) Stereo Vision. Encyclopedia of Artificial Intelligence, pp. 1083-1090. John Wiley, New York (USA).

[2] Begault, D. R., and Trejo, L. J. (2000). 3-D sound for Virtual Reality and Multimedia. National Aeronautics and Space Administration, Ames

Research Center.

[3] Virtual Reality (VR) Market Analysis By Device, By Technology, By Component, By Application By Region, And Segment Forecasts, 2014

– 2025, GVR-1-68038-831-2, May 2017.

[4] Jones, J. A., Swan II, J. E., Singh, G., Kolstad, E., & Ellis, S. R. (2008, August). The effects of virtual reality, augmented reality, and motion

parallax on egocentric depth perception. In Proceedings of the 5th symposium on Applied perception in graphics and visualization (pp. 9-14).

ACM.

[5] Anderson, R., Gallup, D., Barron, J. T., Kontkanen, J., Snavely, N., Hernández, C., ... & Seitz, S. M. (2016). Jump: virtual reality

video. ACM Transactions on Graphics (TOG), 35(6), 198.

[6] Guenter, Finch, Drucker, Tan, Snyder “Foveated 3D Graphics”, ACM SIGGRAPH Asia 2012.

[7] A. Maimone, G. Wetzstein, D. Lanman, M. Hirsch, R. Raskar, H. Fuchs "Focus 3D: Compressive Accommodation Display". ACM Trans.

Graph. 2013, presented at ACM SIGGRAPH 2014.

[8] M. Slater, M.V. Sánchez-Vives, “Enhancing our lives with immersive virtual reality”, Frontiers in Robotics and AI, vol. 3, art. 74, 2016.

[9] Perkins Coie & Upload, 2016 augmented and virtual reality survey report, 2016.

[10] K. Brunnström et al., “Qualinet White Paper on Definitions of Quality of Experience,” Mar. 2013, Qualinet White Paper on Definitions of

Quality of Experience, Novi Sad, March12, 2013

[11] U. Engelke et al., “Psychophysiology-based QoE assessment: A survey”, IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 1, pp. 6-21, 2017.

[12] A. Lang, R.F. Potter, P. Bolls, “Where Psychophysiology Meets the Media: Taking the Effects Out of Mass Media Research”, in J. Bryant,

M.B. Oliver (Eds.), Media effects: Advances in theory and research, pp. 185-206, New York, NY, Routledge.

[13] S. Scholler, S. Bosse, M.S. Treder, B. Blankertz, G. Curio, K.R. Muller, T. Wiegand, “Toward a direct measure of video quality perception
using EEG”. IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2619-2629, 2012.

[14] J. N. Antons, R. Schleicher, S. Arndt, S. Moller, A.K. Porbadnigk, G. Curio, “Analyzing speech quality perception using

electroencephalography”. IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 721-731, 2012

[15] C. Chen, J. Wang, K. Li, Q. Wu, H. Wang, Z. Qian, N. Gu, “Assessment visual fatigue of watching 3DTV using EEG power spectral

parameters,” Displays, vol. 35, no. 5, pp. 266–272, 2014.

[16] E. Kroupi, P. Hanhart, J. Lee, M. Rerabek, T. Ebrahimi, “EEG correlates during video quality perception,” in Proc. IEEE 22nd Eur. Signal

Proc. Conf., pp. 2135– 2139, Sept. 2014.

[17] E. Kroupi, P. Hanhart, J.-S. Lee, M. Rerabek, and T. Ebrahimi, “Predicting subjective sensation of reality during multimedia consumption

based on EEG and peripheral physiological signals,” in Proc. Int. Conf. Multimedia and Expo, pp, 1-6, Sept. 2014.

[18] Hettinger, L. J., Berbaum, K. S., Kennedy, R. S., Dunlap, W. P., & Nolan, M. D. (1990). Vection and simulator sickness. Military Psychology,

2(3), 171.

[19] Bonato, F., Bubka, A., & Palmisano, S. (2009). Combined pitch and roll and cybersickness in a virtual environment. Aviation, space, and

environmental medicine, 80(11), 941-945.

[20] Reason, J. T. (1978). Motion sickness adaptation: a neural mismatch model. Journal of the Royal Society of Medicine, 71(11), 819.

[21] Riccio, G. E., & Stoffregen, T. A. (1991). An ecological theory of motion sickness and postural instability. Ecological psychology, 3(3), 195-240.




[22] Merhi, O., Faugloire, E., Flanagan, M., & Stoffregen, T. A. (2007). Motion sickness, console video games, and head-mounted displays. Human

Factors, 49(5), 920-934.

[23] Munafo, J., Diedrick, M., & Stoffregen, T. A. (2016). The virtual reality head-mounted display Oculus Rift induces motion sickness and is

sexist in its effects. Experimental Brain Research, 1-13.

[24] M. Barreda-Ángeles, R. Pépion, E. Bosc, P. Le Callet, A. Pereda-Baños, “Exploring the effects of 3D visual discomfort on viewers'
emotions”, in Proc. IEEE Int. Conf. Image Processing, pp. 753-757, Oct. 2014.
[25] M. Barreda-Ángeles, R. Pépion, E. Bosc, P. Le Callet, A. Pereda-Baños, “How visual discomfort affects 3DTV viewers' emotional arousal”,
in Proc. 3DTV-CON Conf: The True Vision-Capture, Transmission and Display of 3D Video, pp. 1-4, Jul. 2014.

[26] J. J. LaViola, “A discussion of cybersickness in virtual environments”. ACM SIGCHI Bulletin, vol. 32, no. 1, pp. 47- 56, 2000.

[27] M. S. Dennison, A.Z. Wisti, M. D’Zmura, “Use of physiological signals to predict cybersickness”. Displays, vol. 44, pp. 42-52, 2016.

[28] Y.Y. Kim, H. J. Kim, E. N. Kim, H. D. Ko, H. T. Kim, “Characteristic changes in the physiological components of cybersickness”.

Psychophysiology, vol 42, no. 5, pp. 616-625, 2005.

[29] Witmer, B. G., M. J. Singer, “Measuring presence in virtual environments: A presence questionnaire”. Presence: Teleoperators and virtual

environments, vol. 7, no. 3, pp. 225-240, 1998.

[30] M. Slater, “How colorful was your day? Why questionnaires cannot assess presence in virtual environments”. Presence: Teleoperators and

Virtual Environments, vol. 13, no. 4, pp. 484-493, 2004.

[31] W. Wirth et al., “A process model of the formation of spatial presence experiences”, Media Psychology, vol. 9, no. 3, pp. 493-525, 2007.

[32] J. Diemer, G.W. Alpers, H.M. Peperkorn, Y. Shiban, A. Mühlberger, “The impact of perception and presence on emotional reactions: A
review of research in virtual reality”, Frontiers in Psychology, vol. 6, art. 26, Jan. 2015.
[33] T. Baumgartner, L. Valko, M. Esslen, L. Jäncke, “Neural correlate of spatial presence in an arousing and noninteractive virtual reality: An
EEG and psychophysiology study”, CyberPsychology & Behavior, vol. 9, no. 1, pp. 30-45, 2006.

[34] B. Liebold, M. Brill, D. Pietschmann, F. Schwab, P. Ohler, “Continuous measurement of breaks in presence: Psychophysiology and orienting

responses”, Media Psychology, vol. 20, no. 3, pp. 477-501, 2017.

[35] I. Arapakis, M. Barreda-Angeles, A. Pereda-Baños, “Interest as a proxy of engagement in news reading: Spectral and entropy analyses of

EEG activity patterns”, IEEE Transactions on Affective Computing (online preprint), 2017.

[36] J.P. Dmochowski, M.A. Bezdek, B.P. Abelson, J.S. Johnson, E.H. Schumacher, L.C. Parra, “Audience preferences are predicted by temporal

reliability of neural processing”, Nature communications, vol. 5, 4567, 2014.

[37] C. Christoforou, S. Christou-Champi, F. Constantinidou, M. Theodorou, “From the eyes and the heart: A novel eye-gaze metric that predicts

video preferences of a large audience”, Frontiers in Psychology, vol. 6, art 579, 2015.

[38] M. Bonomi, M. Barreda-Ángeles, F. Battisti, G. Boato, P. Le Callet, M. Carli, “Towards QoE Estimation of 3D Contents Through Non-

invasive Methods”, in Proc. 3DTV-CON Conf: The True Vision-Capture, Transmission and Display of 3D Video, pp. 1-4, Jul. 2016.



Miguel Barreda-Ángeles is a researcher at the Digital Humanities unit at Eurecat – Technology

Centre of Catalonia. He received a Ph.D. in Social Communication from Pompeu Fabra University

(Spain) in 2014. His research interests include psychophysiology and user experience research.

Alexandre Pereda-Baños is a researcher at Eurecat’s Digital Humanities Unit, leading the research
line on perception and cognition. He conducts research in the field of User Experience measurement and has
participated in several ICT projects such as 2020 3D media. He obtained a Ph.D. in Cognitive
Neuroscience from the Trinity College Institute of Neuroscience (Dublin, Ireland), and before joining
Barcelona Media, he worked at the Multisensory Research Group in the Universitat Pompeu Fabra
(Barcelona).

Dr. Rafael Redondo Tejedor is a researcher with Eurecat’s Multimedia Technologies Unit, mainly dedicated to
computer vision for virtual reality and interactive systems. He received his PhD in computer vision
from the Instituto de Óptica (CSIC) and Escuela Técnica Superior de Ingenieros de
Telecomunicación (ETSIT) at Universidad Politécnica de Madrid (UPM) in 2007. He has since
participated in international European projects on 3D medical image visualization (CSIC, 2009),
camera contrast enhancement (Imatrics, 2010), autofocus evaluation (Visilab University of Castilla
La Mancha, 2011), pollen recognition (Inspiralia 2013) and medical image quality evaluation
(UCLM, 2014). He also received a master’s degree in Sonology from the University of Pompeu Fabra
(UPF). His research fields include vision modeling, image and volumetric coding, time-frequency representations,
medical imaging and pattern classification.



QoE Concerns and Measurement in Augmented Reality Applications

Patrick Seeling

Department of Computer Science, Central Michigan University, MI, USA

[email protected]

1. QoE and Augmented Reality

Recent years have seen the emergence of a multitude of new Human-Computer Interface (HCI) types, especially in the
domain of wearable devices. Industry predictions, such as Gartner’s Hype Cycle, indicate that we will soon witness
the broad adoption of Augmented Reality (AR) devices in the consumer and professional spaces, including industrial,
governmental, and military applications [1]. As already witnessed in today’s networks, the presentation of multimedia
content in fixed and mobile scenarios accounts for a significant portion of the overall network traffic. Industry
predictions indicate that this trend is highly likely to continue in the foreseeable future [2]. Jointly, these trends indicate
that a significant portion of future multimedia network traffic will be directed at content presentation in AR scenarios.

In addition to the typical quality and size trade-offs required for the timely display of traditional network-delivered
multimedia content, AR scenarios will likely require adaptations, including multidimensional/immersive views [3],
[4]. The impact of these trade-offs is generally quantified by determining objective fidelity metrics
(Quality of Service, QoS) and by mapping them to subjective experience ratings (Quality of Experience, QoE).
Employing QoE rather than QoS metrics alone has the inherent benefit of providing network and content service
providers with the means to fine-tune their offerings to customer expectations and, ultimately, willingness to pay.
The QoE is commonly determined using the Experience Sampling Method (ESM), e.g., using the NASA-TLX
approach [5], captured using Likert-type scales with individual subjects and combined into Mean Opinion Scores
(MOS), see, e.g., [6]. However, this active human in-the-loop approach is not feasible in applied scenarios, and
mappings between the objectively determinable QoS and the subjective QoE have emerged, such as the IQX Hypothesis
[7] or the Weber-Fechner Law in [8].
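To make the QoS-to-QoE mappings mentioned above more concrete, the following Python sketch shows the averaging of ratings into a MOS and the two functional forms usually associated with these mappings: an exponential decay of QoE with a QoS impairment (IQX hypothesis) and a logarithmic dependence on a stimulus magnitude (Weber-Fechner law). The parameter values and example ratings are placeholders chosen for illustration, not values from [7] or [8]; in practice the parameters are fitted to subjective test data.

```python
from math import exp, log
from statistics import mean


def mean_opinion_score(ratings):
    """Average of individual ratings, e.g. on a 5-point Likert-type scale."""
    return mean(ratings)


def iqx_qoe(impairment, alpha, beta, gamma):
    """IQX-style mapping: QoE decays exponentially with a QoS impairment."""
    return alpha * exp(-beta * impairment) + gamma


def weber_fechner_qoe(stimulus, a, b):
    """Weber-Fechner-style mapping: QoE grows with the log of the stimulus."""
    return a * log(stimulus) + b


# Hypothetical ratings from five subjects for one test condition.
mos = mean_opinion_score([4, 5, 3, 4, 4])

# Placeholder parameters: QoE is 5 without impairment and tends to 1.
qoe_clean = iqx_qoe(0.0, alpha=4.0, beta=0.5, gamma=1.0)
qoe_degraded = iqx_qoe(4.0, alpha=4.0, beta=0.5, gamma=1.0)

# Doubling the stimulus (e.g. bitrate) adds a constant QoE increment.
step_low = weber_fechner_qoe(2.0, a=1.0, b=1.0) - weber_fechner_qoe(1.0, a=1.0, b=1.0)
step_high = weber_fechner_qoe(8.0, a=1.0, b=1.0) - weber_fechner_qoe(4.0, a=1.0, b=1.0)

print(mos, qoe_clean, round(qoe_degraded, 2), round(step_low, 3), round(step_high, 3))
```

The exponential form expresses that the same increase in impairment hurts more when quality is still high, while the logarithmic form expresses diminishing returns from increasing a resource.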

The AR environment, however, presents additional challenges. First, the content presentation in AR scenarios
is commonly performed using head-mounted devices (HMDs), which results in content displayed
close to the eye. Second, the presentation is overlaid on the real world, which results in an ad-hoc environment
without significant potential for ex-ante estimations. Intuitively, considerations that need to be taken into account
when determining the QoE in AR scenarios include the media fidelity in addition to contrast and colors as a consequence
of the overlay of content with the real world [9]. The combination of both is typically neither readily determinable nor
steady and requires new considerations for perceptual models [10], [11].

Initial evaluations that strive to determine the AR QoE in steady environments for popular test images and video
sequences can be found in [12], [13]. Specifically for still images, we illustrate the difference between the traditional
(opaque) and AR (see-through) mode in Figure 1. As described in greater detail in [12], the effects of image
compression (QoS) and mean opinion scores (MOS, QoE) were denoted as Visual User Experience Difference
(VUED). Interestingly, higher ratings are attained in the AR display for higher qualities, while lower qualities exhibit
a reverse trend. For the two presentation modes, a model was presented that can be applied for estimations of the QoE
in AR settings, based on traditional display modes in a fairly steady environment, resulting even in predictability [14].
A shortcoming for practical real-time evaluations, however, is the continued active role of the human
in-the-loop that is required to determine the QoE interactively, especially in dynamic environments.

2. Measuring the QoE with EEG

In the same time frame in which AR emerged, another form of HCI began to garner interest from the
research community. Brain-Computer Interfaces (BCI), as a subset of HCI, employ electroencephalography (EEG) to
measure brainwaves at several positions and derive further information from the different frequency bands at different
localities. While wet electrodes were common in the beginning, we have now reached a point in time where dry
electrodes can readily be placed on a human subject and provide information, all through commercially available
off-the-shelf devices.

Figure 1: Visual User Experience Difference (VUED) based on mean opinion scores for different images presented traditionally and in AR. Please refer to [12] for additional details.

In past research efforts, media quality was evaluated in the context of cognitive processes [15] in laboratory settings
with wet electrodes, and such efforts continue to date [16]. Typically, these evaluations rely on EEG measurements
taken 300 to 500 ms after the stimulus, such as media display or quality changes. Approaches moving to dry electrodes
are beginning to emerge in order to determine the QoE in general [17]. The potential for a direct measurement has
successfully been exploited in traditional settings, even with dry electrodes, see, e.g., [18], [19].
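The 300 to 500 ms post-stimulus window mentioned above can be illustrated with a small Python sketch that averages the samples of a stimulus-locked EEG epoch falling inside that window. The function name, the 100 Hz sampling rate, and the synthetic epoch are assumptions for illustration; a real pipeline would additionally involve filtering, artifact rejection, baseline correction, and averaging over many trials.

```python
def window_mean_amplitude(epoch_uv, sampling_rate_hz, start_ms=300, end_ms=500):
    """Mean amplitude of an epoch between start_ms and end_ms after the stimulus.

    epoch_uv holds samples in microvolts, with index 0 at stimulus onset.
    """
    start = int(start_ms * sampling_rate_hz / 1000)
    end = int(end_ms * sampling_rate_hz / 1000)
    window = epoch_uv[start:end]
    return sum(window) / len(window)


SAMPLING_RATE_HZ = 100  # assumed sampling rate of the recording device

# Synthetic 800 ms epochs: one flat, one with a 5 microvolt deflection
# between 300 ms and 500 ms after stimulus onset.
flat_epoch = [0.0] * 80
deflected_epoch = [5.0 if 30 <= i < 50 else 0.0 for i in range(80)]

flat_score = window_mean_amplitude(flat_epoch, SAMPLING_RATE_HZ)
deflected_score = window_mean_amplitude(deflected_epoch, SAMPLING_RATE_HZ)
print(flat_score, deflected_score)
```

Comparing such window averages between, for example, undistorted and distorted stimuli is one simple way to turn the raw signal into a feature that can be related to perceived quality.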

Jointly with the commonly head-worn binocular vision augmenting devices, a new opportunity for determining the
QoE of device operators emerges. Specifically, small modifications of current AR devices could provide real-time or
close to real-time EEG measurements, as the physical contact of additional sensors with device wearers can be readily
realized by the head-mounted device itself. We employed this approach in our own research, evaluating the
possibility of predicting traditional image display (AR) and spherical/immersive image display (SAR) QoE with a
4-electrode headband. We illustrate some high-level results for the Mean Absolute Error (MAE) in Figure 2.
Corroborating intuition, the combination of all electrodes in the machine learning based prediction approach yields
the lowest overall errors. Simplifications by omitting individual sensors, however, maintain a fairly high level of
accuracy with a configuration that could readily be incorporated into future HMDs for AR. We refer the interested reader
to [20] for more details and an overview of the publicly available data set.
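For readers less familiar with the error measure used above, the following Python sketch computes the Mean Absolute Error between reported and predicted ratings. The ratings and the two sets of predictions are invented purely to show the calculation; they are not taken from [20] and do not reflect the performance of any electrode configuration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical subject ratings on a 5-point scale for four test images.
reported = [5.0, 4.0, 2.0, 1.0]

# Hypothetical predictions from two models, e.g. one using all electrodes
# and one using a reduced subset.
predicted_all = [4.5, 4.0, 2.5, 1.0]
predicted_subset = [4.0, 3.0, 3.0, 2.0]

mae_all = mean_absolute_error(reported, predicted_all)
mae_subset = mean_absolute_error(reported, predicted_subset)
print(mae_all, mae_subset)
```

An MAE of 0.25 on a 5-point scale means that predictions are off by a quarter of a rating step on average, which makes the measure easy to interpret next to the rating scale itself.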

3. QoE Pasa?

BCI with EEG seems poised to emerge in the realm of wearable devices as a future means of directly determining the

QoE from human subjects as they perform actions in the real world. However, other domains of multimedia content

presentation, such as Virtual Reality (VR), are ideal application scenarios as well. The head-worn nature of the devices

presents a unique opportunity to capture psycho-physiological sensor data from the device operator directly. This

enables a passive human in-the-loop approach in the determination of the QoE and subsequent service adjustment.

Consider future closed feedback loop scenarios that allow a passive human in-the-loop evaluation of the QoE through

EEG feedback loops. In turn, multimedia content delivery can be radically changed based on the directly determined

QoE without human subject interventions, but tailored to the situation and the individual subject.
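A closed feedback loop of this kind could, in its simplest form, look like the Python sketch below: a predicted QoE score (here just a list of numbers standing in for EEG-based predictions) drives the selection of a delivery bitrate. The bitrate ladder, the thresholds, and the step-up/step-down policy are all hypothetical and only meant to illustrate the control flow, not a proposed adaptation algorithm.

```python
BITRATE_LADDER_KBPS = [1000, 2500, 5000, 8000]  # hypothetical encoding ladder


def next_level(level, predicted_qoe, low=2.5, high=4.0):
    """Choose the next ladder index from a predicted QoE score (1 to 5 scale)."""
    if predicted_qoe < low:
        # Experience is poor: step up in quality if possible.
        return min(level + 1, len(BITRATE_LADDER_KBPS) - 1)
    if predicted_qoe > high:
        # Experience is comfortably good: step down to save bandwidth.
        return max(level - 1, 0)
    return level


def run_loop(predicted_scores, start_level=1):
    """Apply the policy to a stream of predictions; return chosen bitrates."""
    level = start_level
    history = []
    for score in predicted_scores:
        level = next_level(level, score)
        history.append(BITRATE_LADDER_KBPS[level])
    return history


# Stand-in for a stream of QoE predictions derived from physiological signals.
history = run_loop([2.0, 2.2, 3.0, 4.5])
print(history)
```

The human remains passive throughout: no rating is requested, and the service adjusts itself from the predicted score alone.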

While this is certainly alluring for personalized media services, significant challenges remain for a successful future
implementation, even outside of the BCI domain. In particular, the delivery of context-dependent network-delivered
content to devices in near real-time represents a challenge and requires new paradigm considerations, especially
regarding bandwidth and latency in access networks and clouds [21]. These new extremely low-latency services are
currently also referred to as the “tactile internet”, which brings an additional dimension to the future of QoE research.

Figure 2: Mean Absolute Errors (MAE) for (a) regular (AR) and (b) spherical (SAR) images with averages and standard deviations for subject ratings
(QoE) and impairment level (QoS) prediction performance analysis of individual subjects.

References

[1] R. P. Spicer, S. M. Russell, and E. S. Rosenberg, “The mixed reality of things: emerging challenges for human-information interaction,” Proc.

SPIE Volume 10207, Next-Generation Analyst V, pp. 10207, May 2017.

[2] Cisco, Inc., “Cisco visual networking index: Forecast and methodology, 2016–2021,” Cisco Tech. Rep., Jun. 2017.

[3] C. Ozcinar, E. Ekmekcioglu, J. Calic, and A. Kondoz, “Adaptive delivery of immersive 3d multi-view video over the internet,” Multimedia

Tools and Applications, vol. 75, no. 20, pp. 12 431–12 461, Oct. 2016.

[4] L. A. da Silva Cruz, M. Cordina, C. J. Debono, and P. A. A. Assuncao, “Quality monitor for 3-d video over hybrid broadcast networks,” IEEE

Transactions on Broadcasting, vol. 62, no. 4, pp. 785–799, Oct. 2016.

[5] S. G. Hart, “Nasa-task load index (nasa-tlx); 20 years later,” in Proc. of the human factors and ergonomics society annual meeting, vol. 50, no.

9, pp. 904-908, Oct. 2006.

[6] J. M. Hektner, J. A. Schmidt, and M. Csikszentmihalyi, Experience sampling method: Measuring the quality of everyday life. Sage, 2007.

[7] M. Fiedler, T. Hossfeld, and P. Tran-Gia, “A generic quantitative relationship between quality of experience and quality of service,” IEEE

Network, vol. 24, no. 2, pp. 36–41, March/April 2010.

[8] P. Reichl, B. Tuffin, and R. Schatz, “Logarithmic laws in service quality perception: where microeconomics meets psychophysics and quality

of experience,” Telecommunication Systems, vol. 52, no. 2, pp. 587–600, Feb. 2013.

[9] M. A. Livingston, J. H. Barrow, and C. M. Sibley, “Quantification of Contrast Sensitivity and Color Perception using Head-worn Augmented

Reality Displays,” in Proc. of the IEEE Virtual Reality Conference (VR), Lafayette, LA, USA, Mar. 2009, pp. 115–122.

[10] Y. Jang and W. Woo, “Unified Visual Perception Model for context–aware wearable AR,” Proc. of the IEEE International Symposium on

Mixed and Augmented Reality (ISMAR), Adelaide, SA, Australia, Oct. 2013, pp. 1–4.

[11] E. Kruijff, J. E. Swan II, and S. Feiner, “Perceptual issues in augmented reality revisited,” in Proc. of IEEE and ACM International Symposium

on Mixed and Augmented Reality (ISMAR), Seoul, Korea, Oct. 2010, pp. 3–12.

[12] P. Seeling, “Visual user experience difference: Image compression impacts on the quality of experience in augmented binocular vision,” in

Proc. of IEEE Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, Jan. 2016, pp. 931–936.

[13] ——, “Towards quality of experience determination for video in augmented binocular vision scenarios,” Signal Processing: Image

Communication, vol. 33, no. 0, pp. 41–50, Apr. 2015.

[14] B. Bauman and P. Seeling, “Towards still image experience predictions in augmented vision settings,” in Proc. of the IEEE Consumer

Communications and Networking Conference (CCNC), Las Vegas, NV, USA, Jan. 2017, pp. 1–6.

[15] S. Scholler, S. Bosse, M. S. Treder, B. Blankertz, G. Curio, K.-R. Mueller, and T. Wiegand, “Toward a Direct Measure of Video Quality

Perception Using EEG,” IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2619–2629, May 2012.

[16] S. Bosse, K. R. Muller, T. Wiegand, and W. Samek, “Brain-computer interfacing for multimedia quality assessment,” in 2016 IEEE

International Conference on Systems, Man, and Cybernetics (SMC), Oct 2016, pp. 002834–002839.

[17] A.-N. Moldovan, I. Ghergulescu, S. Weibelzahl, and C. H. Muntean, “User-centered eeg-based multimedia quality assessment,” in 2013 IEEE

International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), London, UK, June 2013, pp. 1-8.

[18] P. Davis, C. D. Creusere, and J. Kroger, “The effect of perceptual video quality on EEG power distribution,” in 2016 IEEE International

Conference on Image Processing (ICIP), Phoenix, AZ, USA, Sept. 2016, pp. 2420–2424.

[19] P. Arnau-Gonzalez, T. Althobaiti, S. Katsigiannis, and N. Ramzan, “Perceptual video quality evaluation by means of physiological signals,”

in 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), Erfurt, Germany, May 2017, pp. 1–6.

[20] B. Bauman and P. Seeling, “Visual interface evaluation for wearables datasets: Predicting the subjective augmented vision image qoe and

qos,” Future Internet, vol. 9, no. 3, p. 40, Jul. 2017.

[21] Y. Y. Shih, W. H. Chung, A. C. Pang, T. C. Chiu, and H. Y. Wei, “Enabling low-latency applications in fog-radio access networks,” IEEE

Network, vol. 31, no. 1, pp. 52–58, January 2017.

Patrick Seeling is an Associate Professor with the Department of Computer Science at Central

Michigan University, USA. He received his Ph.D. degree from Arizona State University in 2005. His

research interests include user experiences in mixed realities, mobile systems and networking, and

engineering education.



Emerging levels of immersive experience in MPEG-I video coding

Dragorad Milovanovic, Dragan Kukolj

Dept. of Computer Engineering, Faculty of Engineering, University of Novi Sad, Serbia

[email protected]

1. Introduction

This article summarizes current MPEG development and research activities on
emerging immersive media technologies (UHD high-resolution 4K/8K video and HDR&WCG 10/12bpp picture
quality, 3D video formats, 360° panorama video, Augmented/Mixed/Virtual Reality, immersive games, Light Field
displays, plenoptic imaging, multi-sensorial media) [1]. In particular, we outline the need for perceptual tools in
the exploration project MPEG-I, Coded representation of immersive media, which works towards the specification
of the new standard ISO/IEC 23090 [2]. Most of these standardization activities are currently in an early phase. For
each use case of immersive video technology, we consider the supported features that determine the attained level of
immersiveness. This level differs among the considered technologies and varies with technological complexity [3].
In some cases, the technology is anticipated to become available only in the more distant future. This fact is one of
the motivations for the work of the ISO/IEC MPEG group on the MPEG-I project, which aims at the standardization of
immersive visual media in phases.

2. Levels of immersion and technological complexity

In June 2016, MPEG started working on the MPEG-VR initiative to develop a roadmap, coordinate the various

activities related to immersive media within MPEG, and liaise with other consortia working on innovative

products and services. Currently, the MPEG-I project explores standards to digitally represent immersive media. The first

stage, MPEG-I Phase 1A, targets the most urgent market need, which is the specification of the 360° video projection

formats in OMAF (Omnidirectional MediA Format). The next stage, Phase 1B (Doc.N17069 Requirements on

Phase 1B, July 2017), will extend the specification towards 3DoF+ applications. Phase 2 (Doc.N17073

Requirements on 6DoF v1, July 2017), intended to start around 2019, aims at addressing 6DoF applications like

free viewpoint video (Table 1). Different levels of experience can be achieved depending on whether the user can only rotate the

head around three rotational axes, 3DoF (yaw, pitch, roll), or can also move along three translational directions, 6DoF (left/right,

forward/backward, up/down) (Doc.W17285 Visual activities on 6DoF and Light Fields, Oct. 2017).
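
The distinction between 3DoF, 3DoF+ and 6DoF described above can be made concrete with a small data structure. The following is a minimal Python sketch and not part of any MPEG-I specification: the class name Pose, its field names, the helper clamp_to_dof and the max_offset bound used for 3DoF+ are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Viewer pose: three rotations (degrees) and three translations (metres)."""
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0  # left/right
    y: float = 0.0  # up/down
    z: float = 0.0  # forward/backward


def clamp_to_dof(pose: Pose, mode: str, max_offset: float = 0.3) -> Pose:
    """Return the part of the pose that a renderer of the given mode can honour.

    mode is "3DoF", "3DoF+" or "6DoF". max_offset (metres) is an
    illustrative bound for 3DoF+, not a value taken from MPEG-I.
    """
    if mode == "6DoF":
        return pose  # rotation and translation are both used
    if mode == "3DoF":
        return replace(pose, x=0.0, y=0.0, z=0.0)  # rotation only
    if mode == "3DoF+":
        def limit(v: float) -> float:
            return max(-max_offset, min(max_offset, v))
        # rotation plus translation inside a small volume
        return replace(pose, x=limit(pose.x), y=limit(pose.y), z=limit(pose.z))
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch a 3DoF renderer simply discards head translation, which is why motion parallax is unavailable at that level.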

Table 1. MPEG-I features and technology in exploration.

Use case | Technology | Phase

3DoF, 360° video | This phase aims to deliver a complete distribution system: basic 360° streaming (possibly optimized tiled streaming with adequate support in HEVC) and projection (monoscopic and stereoscopic). OMAF (ISO/IEC 23090-2 Application format for omnidirectional media, 2017). | Phase 1A, Oct. 2017

3DoF+, 360° video | Focus on VR 360° with 3DoF, with some additional depth cues that would allow moving the viewpoint in a limited space. In addition, optimization in projection mapping, further motion-to-photon delay reductions, and optimizations for person-to-person communications; the phase should also have some quality definition and verification. FVC (Future Video Codec) baseline (Doc.N17195 Joint Call for Proposals on Video Compression with capability beyond HEVC, Oct. 2017). PCC (Doc.N17251 Report on Point Cloud Compression Call for Proposals, Oct. 2017). | Phase 1B, 2019

6DoF, WindowedVR | The most important element is a new video codec with support for 6DoF. Systems elements required in support of 6DoF, as well as 3D graphics. Support for interaction with the virtual environment (Doc.W17130 Exploration experiments: Windowed-6DoF, Oct. 2017). | Phase 2, 2020

Augmented reality | The additional requirements come from the fact that the renderer has to be aware of the real environment. Dense sampling of 3D space with video signals. Metadata can specify aspects of the environment. AV signals can be ultra-realistic and augmented. Hybrid Natural/Synthetic Scene Container (Doc.N17064 Requirements for MPEG-I hybrid natural/synthetic scene data container v1, July 2017). | 2021

Omnidirectional 360° video provides immersive experiences based on interactivity between the user and the content.

However, market fragmentation due to the lack of appropriate standards for storage and delivery formats for such

content is becoming a strong concern for the industry. A first set of MPEG-I specifications is required in

time for a market launch of products and services in 2018. It is highly likely that MPEG can deliver solutions that are

optimized over a longer time frame, which requires more experimentation and development. Since many believe

that the major market launch of VR 360° services will happen in 2020, a next set of specifications can be delivered in

2019. At the same time it is clear that there is a strong need for longer term work, notably in the video area, but



possibly also in the audio space on 6DoF content.

3. The need for perceptual tools and assessment

In order to immerse the user into virtual reality, the MPEG-I technology has to convince our senses. The most basic

requirement is to present vision to the eyes of the viewer. The level of immersion can be increased if the following

features are implemented:

✓ Rotation. The ability to look around freely with 3DoF (yaw, pitch, roll) allows the human brain to construct a holistic

model of the environment. This process is crucial to providing a truly immersive experience. Therefore, the views

presented to the user’s eyes should follow the rotation of the head.

✓ Motion. The ability of the user to move in all three translational directions (3DoF) improves the level of immersion in two ways. First,

it enables motion parallax, which helps the brain to perceive depth and cope with occlusions. Second, it allows the

user to explore.

✓ Joint rotation and motion. Both of these features together grant the user 6 degrees of freedom (6DoF). Thanks to

the synergy, the user can witness the presented reality without bounds.

✓ Latency. Our brains are very sensitive to differences in the time at which information from

different senses is perceived. The mismatch causes motion sickness, which can be avoided by minimizing the overall latency of

the system.

✓ Binocular vision. The human visual system employs information from both eyes to sense the depth of the scene.

Without the proper depth sensation, the scene is perceived as flat and unnatural.

✓ Resolution. In HMDs, for example, the displays are mounted very close to the user’s eyes and therefore the

number of pixels must be sufficient to avoid aliasing. This is particularly important when the user moves

or rotates very slightly, which results in unnatural jumps of edges in the perceived image by a single pixel distance.

✓ Self-embodiment. The ability to see parts of their own body convinces the user of being part of the presented

reality.

✓ Interactivity. Allowing the user to manipulate objects provides strong cues for the integrity of the presented

reality.

In order to study immersive experience metrics and their measurability in immersive services, and develop a test

methodology of the technical specification for the intended use cases, MPEG-I is calling for video test material to

assess algorithm performance for different setups where information is combined from different cameras to generate

virtual views of the scene (Doc.N16766 Call for immersive visual test material, April 2017). Test material should comply

with the following attributes:

✓ General considerations. Still images and video sequences from both indoor and outdoor scenes can be submitted,

with sufficient complexity to test the limits of the algorithms under study - natural content is highly preferred over

computer-generated content. Color components, depth, and metadata are provided separately (in particular the

camera parameters). Types of cameras and camera array arrangements (highly dense array of images along a

predefined track: 2D linear with parallel cameras, 2D linear with convergent cameras, 2D cylindrical surface, 2D

spherical surface). Accurate temporal synchronization of multiple cameras is preferred.

✓ Omnidirectional video with depth data. The content should be captured with an arrangement of cameras that

records divergent views, preferably in an arrangement that supports the capture of a full 360° field of view. Both

the texture and depth data must be provided at the same resolution with an input greater than or equal to 4K, and

the same projection - preferably in the equirectangular projection.

✓ Divergent/convergent camera arrangement. Video material recorded with significant overlap preferably in an

arrangement that supports the capture of a full 360° field of view / volume of visual data. Both the intrinsic and

extrinsic camera parameters must also be provided.

✓ 2D camera array arrangement (following a planar, cylindrical or spherical surface). Dense video sequences are

particularly sought with a baseline distance between cameras not more than 20cm, and the distance from one end

of the array to the other end as wide as possible.

✓ Plenoptic cameras. The density of micro-lenses should be large enough to ensure good angular sampling of

the light field. The resolution of the plenoptic image should be no less than 15 mega-rays.

✓ Systems of simultaneous multiple acquisitions. Simultaneously acquire the same scene following the

specifications defined above.
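
The numeric constraints in the list above (texture and depth at the same resolution of at least 4K, a camera baseline of at most 20 cm, and at least 15 mega-rays for plenoptic images) lend themselves to a simple automated check. The following Python sketch is only an illustration under stated assumptions: the Submission record, its field names and the function check_submission are hypothetical, and "4K" is read here as a width of at least 3840 pixels, which the call itself does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_4K_WIDTH = 3840     # assumption: "4K" read as at least 3840 pixels wide
MAX_BASELINE_M = 0.20   # from the text: baseline not more than 20 cm
MIN_MEGA_RAYS = 15.0    # from the text: no less than 15 mega-rays


@dataclass
class Submission:
    """Hypothetical description of one piece of submitted test material."""
    kind: str  # "omnidirectional", "camera_array" or "plenoptic"
    texture_size: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    depth_size: Optional[Tuple[int, int]] = None    # (width, height) in pixels
    baseline_m: Optional[float] = None              # distance between cameras
    mega_rays: Optional[float] = None               # plenoptic resolution


def check_submission(s: Submission) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    if s.kind == "omnidirectional":
        if s.texture_size is None or s.depth_size is None:
            problems.append("texture and depth must both be provided")
        else:
            if s.texture_size != s.depth_size:
                problems.append("texture and depth resolution differ")
            if s.texture_size[0] < MIN_4K_WIDTH:
                problems.append("resolution below 4K")
    elif s.kind == "camera_array":
        if s.baseline_m is None or s.baseline_m > MAX_BASELINE_M:
            problems.append("baseline missing or above 20 cm")
    elif s.kind == "plenoptic":
        if s.mega_rays is None or s.mega_rays < MIN_MEGA_RAYS:
            problems.append("fewer than 15 mega-rays")
    else:
        problems.append(f"unknown kind: {s.kind}")
    return problems
```

Qualitative preferences in the call, such as natural over computer-generated content or accurate temporal synchronization, are not covered by this sketch and would still need manual review.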



Recently, MPEG-I established an ad-hoc group, Immersive media quality evaluation, with the goal to document

requirements for QoE in Phase 1B, collect test material, study existing methods for QoE assessment (Doc.W17135

Survey on assessing subjective quality of immersive media applications and services, Oct. 2017), study immersive

experience metrics (Doc.W17239 Immersive media metrics under considerations, Oct. 2017) and their measurability

in immersive services, and develop a test methodology (Doc.M40814 VR Experience metrics, July 2017). The MPEG-I

Part 6 CD (Committee Draft), Jan. 2018, will specify immersive media metrics and a measurement framework to enhance immersive

media quality and experiences. This part also includes a client reference model with observation and measurement

points to define the interfaces for the collection of the metrics.

4. Conclusion

It can be summarized that the MPEG-I project develops technologies with various levels of immersiveness. The level

is different among the considered technologies, and varies with technological complexity. MPEG technically finalized

the first international standard for delivery and storage of immersive media, OMAF (Omnidirectional MediA Format).

Next, the MPEG meeting in Oct. 2017 marked the first major step toward the FVC (Future Video Coding) standard

in the form of a joint call for proposals, which includes the testing of technology for 360° omnidirectional video

coding. In addition, MPEG is evaluating responses to the call for proposals on Point Cloud Compression (PCC) and starting its

technical work in lossless or lossy coding of extremely large amounts of 3D data with applications in immersive real-

time communication and 6DoF virtual reality.

In order to study immersive experience metrics and their measurability in immersive services, and to develop a test

methodology for the technical specification for the intended use cases, MPEG-I has established an ad-hoc group. The

mandate of the group is to specify immersive media metrics and a measurement framework to enhance

immersive media quality and experiences in a draft standard by January 2018.

References

[1] D. Milovanovic, D. Kukolj, "Recent advances in UHD video coding technology: High Dynamic Range and Wide Color Gamut", IEEE

COMSOC MMTC Communications - Frontiers, Special issue on Ultra-high definition video communications (Guest editors: P.A.Assuncao,

R.Vanam), Vol.11, No.1, Jan. 2016, pp.50-55.

[2] ISO/IEC JTC1/SC29 23090 Coded Representation of Immersive Media, Part 1 Technical report on immersive media, Part 2 Application

format for omnidirectional media, Part 3 Immersive video, Part 4 Immersive audio, Part 5 Point cloud compression, Part 6 Immersive media

metrics, (under development / exploration).

[3] Q. Huynh-Thu, P. Le Callet, M. Barkowsky, "Video quality assessment: From 2D to 3D - Challenges and future trends", in Proc. 17th IEEE

ICIP, Sep 2010, Hong Kong SAR China, pp.4025-4028.

Dragorad Milovanovic received the Dipl. Electr. Eng. and M.Sc. degrees from the Faculty of

Electrical Engineering, University of Belgrade, Serbia. He has worked as a research assistant in

DSP, an R&D engineer in multimedia communications, and a PhD researcher. He has coauthored

reference books for PH, Wiley and CRC Press as well as published more than 200 papers in

international journals and conference proceedings.

Dragan Kukolj (M’97-SM’06) received his Diploma degree in control engineering in 1982, MSc

degree in computer engineering in 1988, and PhD degree in control engineering in 1993, all from

the University of Novi Sad, Serbia. He is currently a Professor of computer-based systems with the

Dept. of Computer Engineering, Faculty of Engineering, University of Novi Sad. His main

research interests include digital signal processing, video processing and machine learning. He

has published over 200 papers in refereed journals and conference proceedings. Dr. Kukolj is the

coordinator of the Intellectual Property Centre of the University of Novi Sad.



Trends in QoE for immersive experiences

Andrew Perkis, Sebastian Arndt

Department of Electronic Systems

The Norwegian University of Science and Technology, Trondheim, Norway

[email protected], [email protected]

1. Introduction

Digital storytelling is at the heart of new digital media and the ability to tell stories in various formats for multiple

platforms is becoming increasingly important. The drive today is towards creating immersive and interactive digital

stories for a diversity of services and applications, spanning from pure entertainment through art, learning and training

towards edutainment and advertising. Sensor based digital storytelling targets the interactivity and feedback in a story

enabling the shift from a passive user to an active and engaged participant feeling a higher degree of affiliation to the

content by triggering more of their senses.

In traditional storytelling, the viewer always follows a structured and logical path through the story. This is known

as linear storytelling. In immersive and interactive stories on the other hand, the participants can interact with the

content and make their own path through the story. This is known as non-linear storytelling which we will refer to as

sensor based digital storytelling. Since the objective of sensor based digital storytelling is to create more immersive

narratives, it is important to notice that the more the sensor based digital story triggers all of our senses, the more

immersed the participants will be and the more presence they will feel in the story. Recent years have seen an extreme

growth in new technologies, such as the Head Mounted Displays (HMD) for Augmented Reality (AR) and Virtual

Reality (VR), which offer immense possibilities for new and more immersive and interactive content. Also, the

evolution in technologies like motion tracking and haptic technologies has given new and interesting possibilities.

However, so far, the technologies lack content and applications in order to make us think ‘wow - this is it!’. The search

for these new applications allowing more immersive and interactive content has led the way to a new field of research

within immersive narratives and content creation. Sensor based digital storytelling is a part of this with the concept of

using sensors, visuals and audio to allow interactivity, multi-sensory stimuli and non-linear storylines.
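
The difference between a linear story and a sensor-driven, non-linear one can be illustrated with a small graph structure. The following Python sketch is purely illustrative and is not the authors' system: the scene names, the sensor event names and the function follow are all hypothetical. Each scene maps sensor events to the scene that follows, so the path taken depends on what the participant does.

```python
from typing import Dict, List, Sequence

# Hypothetical story graph: scene -> {sensor event -> next scene}.
STORY: Dict[str, Dict[str, str]] = {
    "entrance": {"motion_left": "river", "motion_right": "tunnel"},
    "river": {"touch": "finale"},
    "tunnel": {"sound": "finale"},
    "finale": {},
}


def follow(story: Dict[str, Dict[str, str]], start: str,
           events: Sequence[str]) -> List[str]:
    """Return the list of scenes visited for a sequence of sensor events."""
    path = [start]
    scene = start
    for event in events:
        next_scene = story[scene].get(event)
        if next_scene is not None:  # events without a transition are ignored
            scene = next_scene
            path.append(scene)
    return path
```

A linear story is the special case in which every scene has exactly one outgoing transition, so every participant follows the same path.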

In immersive and interactive content, it is important to engage the user in the best way, by using appropriate sensors

and communicating the interaction in a natural way. This requires a good understanding of Human Computer Interaction and

the ability to model and assess the quality of users’ experiences. Previously it has been common to measure the

quality perceived by the user based on the Quality of Service (QoS) of the system, e.g. only measuring the delivery of the content and making

sure the delivery is error free, thereby optimizing the sense of realness of the story, i.e. that the digital elements are

received as intended by the transmitter. In some cases, this is a good measure, but for sensor based digital storytelling

and many other modern applications it is not. Even though the system has high accuracy, low delay and is very stable,

meaning that the QoS is high, the end user might experience low Quality of Experience (QoE). For interactive and

immersive content, we should be optimizing the sense of being there rather than the sense of realness. The ICT and

digital media industry is embracing this and is currently changing towards becoming user centric attempting to

optimize the users’ QoE. Convergence of the media and ICT industry has led to a paradigm shift away from using the

network centric QoS as quality measure for networked media handling towards putting into place a quality measure

covering the end to end points in the multimedia system as well as considering context. This drive gives rise to the

problem of defining, modelling and measuring the QoE of immersive experiences.

2. Sensor based digital storytelling

The creative and media industry is all about content and its users consuming ever richer digital media on a plethora

of different devices over various networks. Often the situation of the user can be felt as in Figure 1. Digital storytelling

is all about enhancing the QoE in this rich digital media changing the viewer into a participating, engaged and

immersed user. The methods vary and include adding interactivity, increasing dimension, mixing realities all the way

through to creating content that triggers more senses. All this adds to the complexity and understanding of digital

storytelling.

A narrative starts as a creative process as an idea in the mind of someone. For the narrative to be usable, it has to be

transcribed into a form that can be eventually turned into an objective digital form. This process requires a trans-

disciplinary skill set and a set of tools. The design usually requires the use of electronics, sensors, actuators and tools

in the form of hardware and software. For the users, the final experience is concerned with issues such as visualization

and user interfaces. There are many existing practices for this, which typically vary from business to business

in the creative and media industry, such as publishing, news media, broadcasting, movies and gaming. The hardware

available to users is also in constant development, with mobile devices strongly on the rise and

the current trend of head mounted displays and the excitement about mixed realities. The quest is to create immersive and

interactive content the market is willing to pay for.

Figure 1. The immersive user.

Figure 1 provides a concept of an immersive user today, surrounded by her devices and constant digital expressions

all forming a sensor based digital story throughout the day. The immersive experience differs significantly from a

media experience, especially as the context is important for the user. However, the context often remains unknown to

the creator, content owner and provider. This is also true for the network conditions and device capabilities at the

user’s end. Another problem is differentiating between the technical quality assessment, measuring degradations due

to system parameters such as capture, media processing and network conditions as opposed to the actual aesthetic

quality intended by the creator. Immersive and interactive media supports natural interactions between people and

their environment. The media considered still consist of audio and visual presentations, enriched by

user interactions, including traditional interactivity as well as novel methods such as haptics, and explore the use of other

media such as olfaction and taste. The ultimate goals are to digitally create real world presence and a sense of being

there as a measure of immersion in an Immersive Media Technology Experience (IMTE). IMTE is a concept

incorporating several disciplines including Media Technology, Information and Communication Technology, and

Media Studies encompassing diverse core competencies covering fields such as communications, information

retrieval, entertainment and social networks.

There is often a misconception that this is new, which it is not. Some elements have been around for more than 50

years, such as virtual reality, while others are based on long known principles such as transmedia storytelling.

3. QoE for immersive experiences

In order to find a measure for the user’s perceived quality of the received media presentation, we have been active

in the shift from using simple QoS as a measure of quality to the broader concept of QoE. More

recently the definitions of QoE have been driven by the media processing and delivery community with close links to

other fields such as Psychology and social sciences. A formal definition is given in the Qualinet White paper published

in 2012 [1].



QoE assessment and modelling of immersive experience is a multi-faceted problem touching upon many of the

intangible features of human experience which are rather difficult to sense, capture, interpret and/or interact with, let

alone its quality assessment and modelling. It is, however, of great importance because immersion is a major

psychological mechanism in media enjoyment which, if properly unveiled, can lead to significant improvement and

innovation in the value creation of media production and consumption.

Zhang et al. [2] propose a framework which aims to measure immersive experience from the QoE perspectives of

human factors, system factors and design factors. In human factors, Perrin et al. [3] predict and measure sense of

presence using subjective QoE measurements such as neuropsychological and physiological signals (EEG, ECG and

respiration). Using neurophysiological measures, user experiences can be measured less obtrusively. In addition, Antons

et al. [4] showed that brain responses to quality reductions can in some cases be

more sensitive than behavioral data. Redi et al. [5] discuss and evaluate the QoE of emerging display technologies

and AR/VR applications in immersive viewing experience. In design factors, Mansilla and Perkis [6] discuss and

evaluate the measurement of implicitly activated QoE judgment in storytelling and design, such as sensation

transference, thin slicing and priming effects.

These QoE assessment methods of immersive experiences reflect the fact that immersion is a multi-dimensional

construct and any attempts to measure and evaluate it must be implemented both at the technical level (i.e. system

factors) and/or by neuro-psycho-physiological means (i.e. human factors), and any mediation and interaction between

them (i.e. contextual factors). The methodology needs to be further advanced by including new elements such as

design factors, experiential factors and media factors, to foreground the unique sensory, perceptual and affective

experiences that are brought forth by an immersive experience.

3. Physiological measures for QoE

The advance of ever better and cheaper sensors makes it possible for the end user to purchase sensors that can measure

physiological parameters. Simple measuring devices, such as heart-rate monitors are already deeply integrated in many

wearables, and more sensors are about to become more popular. This makes it possible to use the sensors in two

different ways; either using them as an additional input for multimedia systems, meaning another contextual variable,

or using them as a feedback measure, measuring the users’ response to certain content.

In the first case, these kinds of measures give the creative industry the chance to use physiological signals as an

additional input, such that the user immerses even more into the story. This could include estimating the current

physical or emotional state, or the change of it, and using this information to offer even more suitable content for the

current situation, based not only on time and/or location but also on how the user is currently

feeling or what situation the user is in. On the other hand, it also could offer hints to the user, that they should

take a break now, because their cognitive capacity is decreasing, and thus giving the user a better experience in the

end.

Secondly, physiological measures can also be used to estimate the perceived level of QoE or how the users’ state is

changing during perception of different multimedia contents. Thus, instead of asking for subjective feedback at the end of

the session, as is done at the moment, the applied sensors can be used to estimate how the state has been changing

during the use of the service. Measures of brain waves, i.e. electroencephalography, have shown that the cognitive

state is changing when users are exposed to longer low quality multimedia sequences, implying that the user is

becoming more fatigued. Furthermore, simpler measures such as heart rate

variability can give an indicator of the stress level for a user.
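
As an illustration of how a heart rate variability measure can be derived from wearable data, the following Python sketch computes RMSSD (root mean square of successive differences) from a list of RR intervals in milliseconds. RMSSD is one commonly used time-domain HRV metric and is chosen here as an assumption; the text above does not name a specific metric, and the function name rmssd and its input format are hypothetical. Interpreting the value as a stress indicator would need per-user baselines, which this sketch does not attempt.

```python
import math
from typing import Sequence


def rmssd(rr_intervals_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For example, the intervals 800, 810, 790 and 800 ms give successive differences of 10, -20 and 10 ms, so the result is the square root of 200, roughly 14.1 ms.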

4. Use cases

4.1 Adressaparken

The results from quality assessments in multimedia communications let us extend our work moving into new digital

media enabling more immersive experiences. Our first use case focuses on using lights, audio and visual presentations

and their interactions through sensor networks in public spaces. By creating our own content in the form of interactive

art installations, we are able to experiment on new ways of modeling and assessing QoE expanding the range from

pure audiovisual content to immersive and interactive content. Learning from the culture of the counter-establishment



and from remix innovations in art and digital media —often marked by a reframing of existing narratives from

alternative and innovative perspectives— we invoke the idea of a multiuse play space. The idea is to exploit a public

space facility to explore technological infrastructures and mobile materials that can be moved, combined, taken apart

and placed back together, and “placemaking” community interactions that can be reused and redesigned in a number

of ways. As an experimental platform, we developed a design-led development process for creating an exemplar

multiuse play space in Trondheim, Norway, resulting in Adressaparken. Adressaparken – an interactive installation

park – was designed and implemented in Trondheim, Norway as a platform for sensor based digital storytelling [8].

Adressaparken is a public park of around 1300 square meters surrounding the head office of the local newspaper –

Adresseavisen. The project is co-owned by the municipality of Trondheim, Adresseavisen and NTNU. More

information can be found at https://www.ntnu.edu/thepark/.

The current technical infrastructure of Adressaparken comprises 12 custom-built sensor boxes; eight reusable

support mounts for displays or screens; tree, tunnel, and river-side connectivity and power boxes; six outdoor speakers;

two display projectors; and nine controllable walk-over LED light lines distributed all over the park. These 12

sensor boxes house smart sensors, mobile equipment, a sensor gateway, and power and Internet connections. All boxes

in the park contain power; USB ports; DMX512 (a digital multiplex interface); HDMI (high-definition multimedia interface); VGA;

RJ45 sockets; and Ethernet, Wi-Fi, and fiber-optic connections. We accommodate most of today's multimedia gadgets,

sensor connectivity, and power requirements in a secure, hammer-proof glass casing protected from extreme weather

conditions. We also installed sensors all over the park to monitor the temperature, air, light, sun, noise, pollution, and

presence of people. The driving force was to provide the city and its citizens with an arena for artistic experiences,

development of knowledge and a site for societal debates using sensor based digital stories as a primary focus in the

design process. The storytelling platform provides us with a unique environment for QoE assessment of new digital media

and immersive and interactive content, where our users are the general public participating in and experiencing the story.

So far, we have achieved:

1. Through our successful design and placemaking methodologies, we have implemented a place that has the

characteristics of a successful public place for new digital media experiences;

2. Adressaparken promotes sociability by acting as a gathering place for frequent and meaningful interaction;

it offers activities, opportunities and immersive and interactive content for play;

3. It creates a thriving environment for art, technology, digital awareness, and cultural activities.

For experimentation and expressions, we are offering our Adressaparken infrastructures to anyone who would like to

collaborate, design and create their own projects and activities. Figure 2 shows the three exhibitions currently running

in the park. In \/\/iFi, we get tracked by our movement and smartphone signals and usage. What if we are visually

aware of that data and analytics? In Adressaparken, visitors can connect to the \/\/iFi hotspots and get the story, which exposes

the hidden waves of our digital behaviors and movements in Adressaparken and plays with the digital jungle gym we

unconsciously already build. In Current, the story is told by thinking about the site of Adressaparken as a public space,

and as an interactive site with sensors that register information about the temperature of its environment. The

collaborative work is centered around interaction with natural spaces and phenomena connecting a glacier and working

with programming effects that could create connection between Adressaparken, a glacier and the digital field. We

thought of currents of traffic and people at Adressaparken, and ice and water being in movement as well as ways to

show fluidity and transformation happening in digital space. RGB Playspace is an open-ended facility for creative

empowerment that can be manipulated and transformed through play. This public art installation invites children and

adults alike to physically explore the interactive space to instantly become a composer of music and lights. RGB

Playspace is an example of art that brings technological change that can benefit everyone. There is a worldwide social

issue of digital awareness not yet being evenly distributed. By bringing art and technology into public spaces, the artist

hopes to pilot a new approach in bringing human community and social technology together - through playfulness and

humility in the face of complexities.



Figure 2. Adressaparken (More information can be found at https://www.ntnu.edu/thepark/).

4.2 VisualMedia

Our second use case is from the traditional media industry. This industry has to adapt to the changes happening around

it. One thing it has to embrace is bringing the news to different devices and platforms. This includes TV,

websites, social media, or apps. The story needs to be adapted to each platform individually, as each of them is used

in a different way. In case of news stories for transmedia, it is required to have the content prepared differently for

each of the devices. The change in the traditional broadcasting and media sector, and the challenge of winning back

the younger generation, have been the main focus of the VisualMedia project [9]. In order to immerse youngsters

and give them a voice, different approaches have been implemented. The project developed a workflow and tool, such

that broadcasters now have the possibility to easily search for content on different social media channels and put those

on live TV. Furthermore, VisualMedia offers accompanying apps that can be used by viewers for polls published by

the TV channel, or to interact with the TV show in different ways. Making the viewer an integral part of a TV show,

together with visually appealing graphics, will give the viewer the feeling of a better level of immersion in the show

and thus a better QoE. Figure 3 shows how user partner TVR (national TV in Romania) is using VisualMedia to

display 3D graphics of social media posts in their sports show.

Figure 3. Case scenario of Romanian national TV (TVR) within the VisualMedia project



5. Conclusions

The best way to be immersed is through a story, be it text, a play, music, a concert or a digital media experience. Thus,

stories can be analogue or digital, where our focus is on the digital stories. A key question in creating and improving

the best digital stories is how to model and assess such an immersive experience through refining and using the

measure Quality of Experience – QoE. As most immersive digital stories rely on some sort of sensors we have taken

this as a first approach and reviewed some recent trends in QoE for immersive experiences. Our research has focused

mainly on the creative aspects of new digital media, designing new platforms for combining art and technology and

creating immersive and interactive content in public spaces through Adressaparken and physiological measures for

QoE. Both areas are receiving a lot of attention in the quality evaluation community and are important aspects of

understanding immersion.

6. Acknowledgements

This work has partially been funded by the European Union's Horizon 2020 research and innovation programme under

grant agreement No 687800.

References

[1] P. Le Callet, S. Möller and A. Perkis, “Qualinet White Paper on Definitions of Quality of Experience,” European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.1, Jun. 2012.

[2] Zhang, C., Hoel, A. S., & Perkis, A., “Quality of Immersive Experience in Storytelling: A Framework”, in Proc. IEEE Int. conf. on Quality of Multimedia Experience (QoMEX 16), June 2016.

[3] Perrin, A. F., Řeřábek, M., & Ebrahimi, T., “Towards prediction of Sense of Presence in immersive audiovisual communications”, in Electronic Imaging, pp. 1-8, 2016.

[4] Antons, J.-N., Schleicher, R., Arndt, S., Möller, S., Porbadnigk, A. K. & Curio, G. (2012). Analyzing Speech Quality Perception using Electro-Encephalography. Journal of Selected Topics in Signal Processing. IEEE, 721-731.

[5] Redi, J. A., Zhu, Y., de Ridder, H., Heynderickx, I., “How passive image viewers became active multimedia users”, in Visual Signal Quality Assessment, Springer International Publishing, 2015, pp. 31-72.

[6] Mansilla, W. A., & Perkis, A., “Design and storytelling concepts in the quality of experience”, in Proc. IEEE Int. conf. on Quality of Multimedia Experience (QoMEX 15), May 2015, pp. 1-6.

[7] Arndt, S., Antons, J.N., Schleicher, R., Möller, S. (2016). Using electroencephalography to analyze sleepiness due to low-quality audiovisual stimuli. Signal Processing: Image Communication (42). 120-129.

[8] Wendy Ann Mansilla, Andrew Perkis, “Multiuse Playspaces: Mediating Expressive Community Places”, in IEEE MultiMedia vol. 24 no. 1, 2017, p. 12-16, (http://folk.ntnu.no/wendyann/Adressaparken_toolkit/).

[9] Arndt, S., Räty, V.P., Perkis, A. (2016). Opportunities of Social Media in TV Broadcasting. Proceedings of the 9th Nordic Conference on Human-Computer Interaction. 123.

Andrew Perkis received his Siv.Ing and Dr. Techn. Degrees in 1985 and 1994, respectively. In

2008, he received an executive Master of Technology Management in cooperation from NTNU,

NHH and NUS (Singapore). His current research focus is within the synergies of art and

technology (NTNU ARTEC), methods and functionality of content representation, quality

assessment and its use within the media value chain. His application focus is on art in public

spaces, sensor based digital storytelling, change management and business modelling for the creative and media

industry.

Sebastian Arndt is a post doc at the Norwegian University of Science and Technology, NTNU, in

Trondheim. He studied Computer Science at Technische Universität Berlin and received his diploma

in 2010. He received his doctoral degree (Dr.-Ing.) in 2015 from TU Berlin in the group of 'Quality

and Usability'. His current research is focusing on usability evaluation methods, and developing user

requirements (user-centered design) in the context of TV broadcasting which also includes evaluating

implemented systems. Furthermore, his research interests are in the area of Quality of Experience (QoE) of multimedia

and using (neuro)physiological methods to enhance well-being and quality of life.




SPECIAL ISSUE ON Content Caching and Sharing in Wireless Networks

Guest Editors:

Zheng Chang, University of Jyväskylä, Finland

Zhenyu Zhou, North China Electric Power University, China

[email protected], [email protected]

The proliferation of smart mobile devices and multimedia services has resulted in the explosive

increase of wireless data traffic. To cope with this challenge on data demand, caching content

and sharing at the network edge, including base stations and devices, has been proposed to

offload network traffic by pulling content closer to end users. With network edge caching, the

multimedia application data can be stored and shared in an efficient and distributed manner, and

thus can alleviate the burden of data transmission on the backhaul, improve the latency

performance and also save the battery of mobile devices. Meanwhile, due to its inherent nature

of caching and sharing at the network edge, there are many challenges ahead towards a reliable,

accurate and efficient mechanism for content caching and sharing among a number of mobile

devices. In particular, due to the large number of multimedia applications running on massive numbers of devices and the

rapid development of advanced wireless technologies, content caching and sharing design is of

profound importance.

The 4 papers included in this special issue on content caching and sharing in wireless networks

address a number of noteworthy challenges and present the corresponding solutions and

suggestions. These contributions are made by authors who are renowned researchers in the field,

and the audience will find in these papers research advances towards enhanced content-centric

wireless networks for multimedia services in terms of better efficiency and reliability, among

many other metrics. Each of these 4 papers is briefly introduced in the following paragraphs.

As there is a trend to use renewable energy in future heterogeneous networks, how to combine

content caching with energy harvesting capability is one of the most significant problems.

In the contribution “Content Caching and Push in Small Cells with Renewable Energy”, Jie Gong

explores the content information to design a joint caching and push mechanism in small-cell

base stations (SBSs) powered by renewable energy. The problem is formulated as a Markov

decision process by exploring the features of content popularity and renewal and by taking into

consideration the energy consumption for both content fetch from core network and push to the

users. The objective is to minimize the number of requests which cannot be met by the SBSs.

Energy efficiency is a critical metric in the content caching system. “Energy Efficiency Analysis

of 5G Content Caching System” introduces a comprehensive system model for 5G with the

content caching mechanism. The authors introduce caches at the in-network routers, the BS and the

neighboring smart devices. With smart portable devices, a

user can obtain its requested contents from neighboring users’ caches. Additionally, it can

obtain the requested contents from in-network router or BS caches. The energy efficiency

performance of the core network, long distance and short distance scenarios is then

investigated through achievable sum rate and consumed energy analyses. It is a versatile model that

can be adopted by similar work as well.



Reducing the energy consumption of the wireless network can improve the energy

efficiency of the content caching system, although sometimes at the cost of longer transmission

delay. In contribution “”, Thang X. Vu, Lei Lei, and Satyanarayana Vuppala investigate the

energy efficiency performance of content delivery networks in which a data center serves

multiple users via a shared wireless medium. Focusing on latency-tolerant applications, the

authors propose energy-efficient precoding design and optimization that minimize the total

energy consumption while guaranteeing some given quality of service constraints. In particular,

an energy-buffering time trade-off is derived in a closed-form expression for single-user

scenarios, which reveals the impact of the key system parameters on the total energy

consumption. An energy minimization problem is formulated with a minimum mean square error

(MMSE)-based precoding design for multiple-user scenarios and addressed via a linear

approximation of the non-convex constraint.

Due to the inherent nature of vehicular networks, the design of content caching and sharing is of

profound significance. In the contribution “Cooperative Content Caching and Distribution in

Multihop D2D-V2V Networks”, Yahui Wang, Zhenyu Zhou, Houjian Yu, and Chen Xu

investigate how to achieve dependable content distribution in device-to-device (D2D) based

cooperative vehicular networks by combining big-data-based vehicle trajectory prediction with

coalition-formation-game-based resource allocation, formulate the formation of content

distribution groups with different lifetimes as a coalition formation game, and evaluate the delay

performance based on a real-world map and realistic vehicular traffic. It can be observed that, with

big data analytic capability, the content distribution scheme for vehicular networks can

significantly improve the delay performance.

The guest editors would like to give special thanks to all the authors for their contributions

to this special issue. We are also thankful to the MMTC Communications–Frontiers Board for

providing helpful support.

Guest Editor Zheng Chang received the B.Eng. degree from Jilin University, Changchun, China in 2007,

M.Sc. (Tech.) degree from Helsinki University of Technology (Now Aalto University), Espoo,

Finland in 2009 and Ph.D degree from the University of Jyväskylä, Jyväskylä, Finland in 2013.

Since 2008, he has held various research positions at Helsinki University of

Technology, University of Jyväskylä and Magister Solutions Ltd in Finland.

He was a visiting researcher at Tsinghua University, China, from June to

August in 2013, and at University of Houston, TX, from April to May

in 2015. He has been awarded by the Ulla Tuominen Foundation, the Nokia

Foundation and the Riitta and Jorma J. Takanen Foundation for his research

work. Currently he is working as an Assistant Professor with the University of

Jyväskylä and his research interests include cloud/edge computing, radio

resource allocation, and green communications. He is an Editor of IEEE

Access, Wireless Network and MMTC communication Frontier, and a guest

editor of IEEE Communications Magazine, IEEE Wireless Communications,



Wireless Communications and Mobile Computing, and IEEE Access. He serves as a TPC

member for numerous IEEE conferences, such as INFOCOM, ICC and Globecom, and reviewer

for major IEEE Journals, such as IEEE TVT, TWC, JSAC, TMC, ToN etc. He has received best

conference paper awards from IEEE APCC and IEEE TCGCC in 2017.

Zhenyu Zhou received his M.E. and Ph.D degree from Waseda University, Tokyo,

Japan in 2008 and 2011 respectively. From April 2012 to March 2013, he was

the chief researcher at Department of Technology, KDDI, Tokyo, Japan. Since

March 2013, he has been an Associate Professor at School of Electrical and

Electronic Engineering, North China Electric Power University, China. He has

also been a visiting scholar with Tsinghua-Hitachi Joint Lab on Environment-

Harmonious ICT at University of Tsinghua, Beijing since 2014. He

served as an Associate Editor for IEEE Access, and a Guest Editor for IEEE Communications

Magazine and Transactions on Emerging Telecommunications Technologies. He also served as

workshop co-chair for IEEE ISADS 2015, and TPC member for IEEE Globecom, IEEE CCNC,

IEEE ICC, IEEE APCC, IEEE VTC, IEEE Africon, etc. He is a voting member of P1932.1

Working Group. He was the recipient of the IEEE Vehicular Technology Society "Young

Researcher Encouragement Award" in 2009, the “Beijing Outstanding Young Talent Award” in

2016, the IET Premium Award in 2017, and the IEEE ComSoc Green Communications and

Computing Technical Committee 2017 Best Paper Award. His research interests include green

communications, vehicular communications, and smart grid communications. He is a senior

member of IEEE.



Content Caching and Push in Small Cells with Renewable Energy

Jie Gong

School of Data and Computer Science,

Sun Yat-sen University, Guangzhou 510006, China

Abstract: In this paper, we explore the content information to design the joint caching and push

mechanism in the small-cell base stations (SBSs) powered by renewable energy. The problem is

formulated as a Markov decision process by exploring the features of content popularity and

renewal and by taking into consideration the energy consumption for both content fetch from core

network and push to the users. The objective is to minimize the number of requests which cannot

be met by the SBSs. We adopt the policy iteration algorithm to obtain the optimal caching and

push policy. The performance gain of the proposed algorithm is shown in the numerical results.

1. Introduction

Recently, energy harvesting (EH) technology [1] has been considered as one of the candidate

technologies for green communications. However, due to the randomness of energy arrival and

limitations on the battery capacity, energy waste or shortage will occur when the energy harvesting

process and the traffic pattern do not match each other in either the spatial or the time domain. To

improve the utilization of the harvested energy, one should adjust the power allocation policy using

the traffic information to re-shape the energy profile to match the traffic profile. As users may be

interested in the same content (latest news, popular videos, etc.), many repeated transmissions

can be reduced if the content information is fully utilized.

The content caching and push mechanism is viewed as a promising way to improve the efficiency

of content delivery in wireless networks. To reduce the core network overhead, contents are

suggested to be cached at the small-cell BSs (SBSs) [2] with proactive caching. On the other hand,

with the improvement of data storage capacity, user devices are capable of storing large amounts

of data. Hence, the content push mechanism [3] is developed based on wireless multicast. With

renewable energy, ref. [4] uses EH based SBSs to cache contents for the deployment flexibility

and energy consumption reduction, and ref. [5] designs the energy-aware resource allocation

algorithm with limited content cache. However, joint content caching and push policy design using

renewable energy is still an open problem.

In this paper, we combine the EH technology with the content caching and push by considering

EH powered SBSs under the GreenDelivery framework [6]. Specifically, with the non-negligible

energy consumption of fetching contents from the core network, the SBS cannot cache all the

contents due to the limited renewable energy. It needs to decide when to fetch, push or unicast

contents depending on the energy condition. We optimize the joint caching and push policy using

Markov decision process (MDP) [7] approach. Numerical results are provided to illustrate the

influence of cache size under different parameter settings as well as the tradeoff between the

number of cached contents in the SBS and the available energy for content push.

2. System Model and Problem Formulation

Consider a two-tier heterogeneous cellular network composed of a macro-cell BS (MBS) and a

second-tier small-cell with radius 𝑅, as shown in Fig. 1. The MBS is powered by the power grid



and the SBS is powered by renewable energy, and the harvested energy can be stored in a battery

with capacity 𝐵max. There is a dedicated wired/wireless backhaul link for the SBS to fetch contents

from the core network through the MBS, which consumes a fixed amount of energy 𝐸𝑓. The SBS

has a limited content cache size 𝑁. The content is transmitted with a constant data rate, and hence,

the transmission power depends on the distance between user and SBS.

Fig. 1. Two-tier heterogeneous cellular network

The system is slotted with time slot length 𝑇𝑠. The contents are assumed of the same length and

can be completely delivered in a time slot. The popularity of the contents is well fitted by the Zipf

distribution [8]. Specifically, the popularity of the 𝑖-th ranked content can be expressed as

$$ f_i = \frac{1/i^{v}}{\sum_{j=1}^{N} 1/j^{v}}, \qquad (1) $$

where 𝑣 ≥ 0 is the skew parameter and 𝑁 is the total number of contents. In addition, assume that in each

slot, a content leaves the system and is replaced by a new one with probability 𝑝𝑐 ∈ [0,1]. The

leaving content is uniformly chosen from 1,2, ⋯ , 𝑁.
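As a hedged illustration of Eq. (1), the short Python sketch below computes the Zipf popularity profile. The function name `zipf_popularity` is introduced here for illustration only, and the example values 𝑁 = 8 and 𝑣 = 1 are borrowed from the numerical section; nothing else about the paper's implementation is implied.

```python
def zipf_popularity(num_contents, skew):
    """Popularity f_i of the i-th ranked content under a Zipf law, Eq. (1)."""
    if num_contents < 1 or skew < 0:
        raise ValueError("need num_contents >= 1 and skew >= 0")
    # Unnormalized weights 1 / i^v for ranks i = 1..N.
    weights = [1.0 / rank ** skew for rank in range(1, num_contents + 1)]
    total = sum(weights)
    # Normalize so that the popularities sum to one.
    return [w / total for w in weights]


popularity = zipf_popularity(num_contents=8, skew=1.0)
for rank, f in enumerate(popularity, start=1):
    print(f"rank {rank}: f = {f:.4f}")
```

With 𝑣 = 0 the profile becomes uniform, while larger 𝑣 concentrates the requests on the top-ranked contents, which is what makes caching only the most popular items attractive.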

There are two content cache states in this system, i.e., the number of cached contents in the SBS

and the number cached at the users, both indexed by the time slot 𝑘. For optimality, the SBS and users always cache

the most popular contents. The SBS’s actions are: fetch a content from the MBS, unicast the

required content to a specific user, push a content to all users, or sleep. The action 𝑢𝑘 can be denoted by a pair whose first component, taking values in {0,1}, indicates the fetch action, and whose second component, taking values in {0,1,2}, indicates sleep, unicast or push.

Notice that the backhaul link is orthogonal to the downlink unicast or push.

The user request is assumed to follow the Bernoulli distribution, i.e., there is a content request

with probability 𝑝𝑢 ∈ [0,1] in each time slot. The user request can be represented by 𝑄𝑘 = 𝑃𝑡(𝑑)𝑇𝑠,

where d is the transmission distance. The required energy for content push is 𝐸𝑝 = 𝑃𝑡(𝑅)𝑇𝑠. Set

𝑄𝑘 = 0 to indicate either there is no request or the content is in users' cache. The battery energy

state 𝐸𝑘 is updated as 𝐸𝑘+1 = min { 𝐵max, 𝐸𝑘 − 𝑈𝑘 + 𝐴𝑘}, where 𝑈𝑘 is the energy used for

transmission which satisfies 𝑈𝑘 ≤ 𝐸𝑘 , and 𝐴𝑘 is the harvested energy in period 𝑘 , which is

assumed i.i.d. with average ��. Our problem can be described as minimizing the ratio of user

requests handled by the MBS over the total user requests by adjusting the behavior of the SBS

under the energy constraint.
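The battery recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the quantized energy units, and the three-slot example are hypothetical, and the capacity of 12 units simply mirrors the quantized battery capacity quoted in the numerical section.

```python
def next_battery_state(energy, used, harvested, capacity):
    """E_{k+1} = min(B_max, E_k - U_k + A_k), with the constraint U_k <= E_k."""
    if used > energy:
        raise ValueError("U_k must not exceed the stored energy E_k")
    return min(capacity, energy - used + harvested)


def battery_trace(initial, used_per_slot, harvested_per_slot, capacity):
    """Apply the update slot by slot and return the states E_1, E_2, ..."""
    energy, trace = initial, []
    for used, harvested in zip(used_per_slot, harvested_per_slot):
        energy = next_battery_state(energy, used, harvested, capacity)
        trace.append(energy)
    return trace


# Hypothetical quantized example: capacity 12 units, three slots.
trace = battery_trace(initial=10, used_per_slot=[1, 4, 0],
                      harvested_per_slot=[3, 0, 5], capacity=12)
print(trace)
```

In the last slot the harvested energy would bring the battery to 13 units, so one unit is wasted by the cap. This is the kind of energy waste that the joint caching and push policy tries to avoid by spending energy before the battery overflows.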

3. Optimal Policy Design

To find the optimal solution, we need to decide the SBS’s action based on the system state at the

beginning of each time slot. The MDP approach [7], which builds on dynamic programming (DP), is an effective

tool for solving this type of problem and is widely used for the control optimization of stochastic

processes. A standard MDP problem contains the following elements: state, action, cost function,

and state transition. In our problem, the state 𝑥𝑘 includes the battery state 𝐸𝑘, the user request 𝑄𝑘, and

the cache states of the SBS and the users; the action 𝑢𝑘 is one of fetch, push, unicast, and sleep; and the

cost 𝑔(𝑥𝑘, 𝑢𝑘) indicates whether a user request is denied by the SBS. The optimization

problem can then be re-written as

$$ \min \lim_{K \to +\infty} \frac{1}{K}\, \mathbb{E}\left[ \sum_{k=0}^{K-1} g\big(x_k, u_k(x_k)\big) \right]. \qquad (2) $$

The expectation operation is taken over all the random parameters including energy arrival, user

request, and content update. The optimization is taken over all the possible policies {𝑢1, 𝑢2, ⋯ }. It

can be proved that there exists an optimal stationary policy 𝑢∗, and the optimal average cost 𝜆∗

together with some vector ℎ∗ = {ℎ∗(𝑥)|𝑥 ∈ 𝒮} satisfies Bellman’s equation

$$ \lambda^{*} + h^{*}(x) = \min_{u \in \mathcal{U}(x)} \left[ g(x,u) + \sum_{y \in \mathcal{S}} p_{x \to y|u}\, h^{*}(y) \right]. \qquad (3) $$

Furthermore, if 𝑢∗(𝑥) attains the minimum on the right-hand side of (3) for each 𝑥, the stationary policy 𝑢∗ is

optimal. Based on Bellman’s equation, the policy iteration algorithm can effectively solve the

problem. Suppose that in the 𝑗-th step we have a stationary policy denoted by 𝑢(𝑗). Based on this policy,

we perform the policy evaluation step, i.e.,

$$ \lambda^{(j)} + h^{(j)}(x) = g\big(x, u^{(j)}(x)\big) + \sum_{y \in \mathcal{S}} p_{x \to y|u^{(j)}(x)}\, h^{(j)}(y) \qquad (4) $$

for all 𝑥 ∈ 𝒮 to get the average cost 𝜆(𝑗) and vector ℎ(𝑗). As 𝑢(𝑗) may not be the optimal policy, we

subsequently perform the policy improvement step to find the policy 𝑢(𝑗+1) which minimizes the

right-hand side of Bellman’s equation

$$ u^{(j+1)}(x) = \arg\min_{u \in \mathcal{U}(x)} \left[ g(x,u) + \sum_{y \in \mathcal{S}} p_{x \to y|u}\, h^{(j)}(y) \right]. \qquad (5) $$

If 𝑢(𝑗+1) = 𝑢(𝑗), the algorithm terminates, and the optimal policy is obtained as 𝑢∗ = 𝑢(𝑗). Otherwise,

repeat the procedure by replacing 𝑢(𝑗) with 𝑢(𝑗+1). It can be proved that the policy iteration

algorithm terminates in a finite number of iterations.
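The evaluation step (4) and improvement step (5) can be sketched in pure Python as follows. This is a minimal illustration, not the paper's implementation: the three-state battery model, its costs and transition probabilities, and all function names are hypothetical and far smaller than the state space described above. The sketch also assumes every policy induces a single recurrent class, so that fixing ℎ(0) = 0 makes the linear system of step (4) uniquely solvable.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate_policy(policy, cost, trans):
    """Policy evaluation, Eq. (4): solve for lambda and h with h(0) = 0."""
    n = len(policy)
    rows, rhs = [], []
    for x in range(n):
        p = trans[x][policy[x]]
        # Unknowns are (lambda, h(1), ..., h(n-1)); h(0) is fixed to zero.
        rows.append([1.0] + [(1.0 if y == x else 0.0) - p[y] for y in range(1, n)])
        rhs.append(cost[x][policy[x]])
    solution = solve_linear(rows, rhs)
    return solution[0], [0.0] + solution[1:]


def improve_policy(policy, h, cost, trans):
    """Policy improvement, Eq. (5); the old action is kept on ties."""
    new_policy = []
    for x in range(len(policy)):
        def q(u):
            return cost[x][u] + sum(p * hy for p, hy in zip(trans[x][u], h))
        best = min(trans[x], key=q)
        if q(policy[x]) <= q(best) + 1e-12:
            best = policy[x]
        new_policy.append(best)
    return new_policy


def policy_iteration(policy, cost, trans):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        avg_cost, h = evaluate_policy(policy, cost, trans)
        new_policy = improve_policy(policy, h, cost, trans)
        if new_policy == policy:
            return policy, avg_cost, h
        policy = new_policy


# Toy model (hypothetical): state = battery level 0..2, one request per slot,
# one energy unit arrives with probability 0.5 after the action is taken.
# "sleep" costs 1 (request not served by the SBS), "serve" uses one unit.
cost = [{"sleep": 1.0},
        {"sleep": 1.0, "serve": 0.0},
        {"sleep": 1.0, "serve": 0.0}]
trans = [{"sleep": [0.5, 0.5, 0.0]},
         {"sleep": [0.0, 0.5, 0.5], "serve": [0.5, 0.5, 0.0]},
         {"sleep": [0.0, 0.0, 1.0], "serve": [0.0, 0.5, 0.5]}]

policy, avg_cost, h = policy_iteration(["sleep"] * 3, cost, trans)
print(policy, avg_cost, h)
```

Keeping the previous action on ties matters for the stopping test: without it, two equally good actions could alternate between iterations and the check 𝑢(𝑗+1) = 𝑢(𝑗) might never succeed.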

4. Numerical Results

In this section, we run some numerical simulations for performance evaluation. We set the cell

radius 𝑅 = 50 m, the required content delivery spectrum efficiency 𝑟0/𝑊 = 1 bps/Hz, the pathloss

parameters 𝛽 = 10 dB and 𝛼 = 2, and the Zipf parameter 𝑣 = 1. The maximum transmit power, or

equivalently the transmit power for a cell-edge user, is set to 𝑃𝑡(𝑅) = 1 W. The channel coefficient

ℎ follows Rayleigh fading. The quantized battery capacity is set to 𝐸max = 12. The energy arrival

process follows a Poisson distribution with average arrival rate �� units of energy.

We compare the proposed algorithm with some heuristic algorithms to demonstrate the

importance of joint optimization of content caching and push. We consider the following baseline

algorithms: greedy fetch policy, in which the SBS always fetches contents as long as there is

sufficient energy and the cache in the SBS is not full, threshold fetch policy, in which the SBS

fetches contents if the cache size in the SBS does not achieve a pre-defined threshold 𝑁th, and



non-push policy, in which the SBS only unicasts the required contents to the users on demand.

Compared with greedy fetch policy, our optimal policy reduces the SBS blocking probability by

more than 40%. The results of threshold fetch policies with different 𝑁th show a tradeoff between

the number of cached contents in the SBS and the available energy for content push. In our settings,

the optimal threshold is 𝑁th = 5. In addition, compared with the non-push policy, the greedy fetch

policy can achieve more than 20% blocking probability reduction, which illustrates the great

performance improvement by push.
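The three baselines can be written as simple decision rules, sketched below in Python. The function names and signatures are hypothetical. Two interpretations are assumptions made here rather than statements from the text: "sufficient energy" is read as the battery holding at least the fetch energy 𝐸𝑓, and the threshold policy is assumed to require the same energy check.

```python
def greedy_fetch(energy, cached, cache_size, fetch_energy):
    """Greedy rule: fetch whenever energy suffices and the cache is not full."""
    return energy >= fetch_energy and cached < cache_size


def threshold_fetch(energy, cached, threshold, fetch_energy):
    """Threshold rule: fetch only while fewer than N_th contents are cached."""
    return energy >= fetch_energy and cached < threshold


def allowed_transmissions(push_enabled):
    """Non-push baseline: the SBS may only sleep or unicast on demand."""
    return ("sleep", "unicast", "push") if push_enabled else ("sleep", "unicast")


# Settings borrowed from the text: cache size N = 8, threshold N_th = 5,
# and a fetch cost of one energy unit; the battery level 3 is hypothetical.
print(greedy_fetch(energy=3, cached=5, cache_size=8, fetch_energy=1))
print(threshold_fetch(energy=3, cached=5, threshold=5, fetch_energy=1))
print(allowed_transmissions(push_enabled=False))
```

With five contents already cached, the greedy rule keeps fetching while the threshold rule stops and leaves the remaining energy for push or unicast, which is the tradeoff discussed above.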

Fig. 2. Comparison with some heuristic algorithms. 𝒑𝒖 = 𝟎. 𝟓, 𝒑𝒄 = 𝟎. 𝟑, �� = 𝟎. 𝟔, 𝑬𝒇 = 𝑬𝐮𝐧𝐢𝐭, 𝑵 = 𝟖.

5. Conclusion

In this paper, content caching and push mechanism in EH-powered SBSs is jointly optimized. The

proposed policy iteration algorithm solves the problem, and the obtained optimal policy performs

much better than the heuristic greedy fetch policy and non-push policy. Using more energy to fetch

improves the content availability in the SBS, but degrades the energy availability for push/unicast.

The proposed optimal policy well balances the content availability and the energy availability.

REFERENCES

[1]. D. Gunduz, K. Stamatiou, N. Michelusi, and M. Zorzi, “Designing intelligent energy harvesting communication systems,” IEEE Communications Magazine, vol. 52, no. 1, pp. 210–216, Jan. 2014.

[2]. N. Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire, “Femtocaching and device-to-device collaboration: A new architecture for wireless video distribution,” IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, Apr. 2013.

[3]. I. Podnar, M. Hauswirth, and M. Jazayeri, “Mobile push: Delivering content to mobile users,” in Proc. 22nd International Conference on Distributed Computing Systems Workshops, 2002.

[4]. N. Sharma, D. Krishnappa, D. Irwin, M. Zink, and P. Shenoy, “Greencache: Augmenting off-the-grid cellular towers with multimedia caches,” in Proc. ACM MMsys13, Feb. 2013.

[5]. A. Kumar and W. Saad, “On the tradeoff between energy harvesting and caching in wireless networks,” in IEEE International Conference on Communications (ICC), London, U.K., 2015.

[6]. S. Zhou, J. Gong, Z. Zhou, W. Chen, and Z. Niu, “Greendelivery: Proactive content caching and push with energy-harvesting-based small cells,” IEEE Communications Magazine, vol. 53, no. 4, pp. 142–149, Apr. 2015.

[7]. D. P. Bertsekas, Dynamic programming and optimal control, Volume II, 3rd edition. Athena Scientific Belmont, MA, 2005.



[8]. M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon, “I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system,” in Proc. 7th ACM SIGCOMM Conference on Internet measurement. ACM, 2007.

Jie Gong (S'09, M'13) received his B.S. and Ph.D. degrees in Department of Electronic

Engineering in Tsinghua University, Beijing, China, in 2008 and 2013, respectively.

From July 2012 to January 2013, he visited Institute of Digital Communications,

University of Edinburgh, Edinburgh, UK. During 2013-2015, he worked as a

postdoctoral scholar in Department of Electronic Engineering in Tsinghua University,

Beijing, China. He is currently an associate research fellow in School of Data and

Computer Science, Sun Yat-sen University, Guangzhou, Guangdong, China. He served

as workshop co-chair for IEEE ISADS 2015 and TPC member for the IEEE/CIC ICCC

2016/17, the IEEE WCNC 2017, the IEEE Globecom 2017, the IEEE CCNC 2017, and the APCC 2017.

He was a co-recipient of the Best Paper Award from IEEE Communications Society Asia-Pacific Board in

2013. He was selected as the IEEE Wireless Communications Letters (WCL) Exemplary Reviewer in 2016.

His research interests include Cloud RAN, energy harvesting technology and green wireless

communications.



Energy Efficiency Analysis of 5G Content Caching System

Di Zhang1,2, Zhenyu Zhou2,3, Zhengyu Zhu1, Shahid Mumtaz4

1School of Information Engineering, Zhengzhou University, Zhengzhou, 450-001, China.

2Department of Electric and Computer Engineering, Seoul National University, Seoul, 151-742,

Korea.

3State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources,

School of Electrical and Electronic Engineering, North China Electric Power University,

Beijing, 102206, China.

4Instituto de Telecomunicações, Aveiro, 1049-001, Portugal.

1. Introduction

Although various studies have examined the energy efficiency (EE) of ICN's caching and sharing (CS) mechanism,

the CS mechanism is discussed separately for each cache location in prior work. A comprehensive performance

comparison of obtaining the requested contents from caches located at the in-network router, the base station (BS) and

the neighboring user is still in its infancy. That is, where to obtain the requested contents from, and under what specific

condition, is still unclear. This motivates the present study. To compare those three scenarios, it is assumed that

the requested contents are distributed to the core router, the BS, and neighboring users. For convenience, these placements are defined

as the core router, long distance and short distance scenarios, respectively. The placement problem for

obtaining the requested content is finally addressed with the analyses and numerical results.

The contributions of this study are summarized as follows:

• A comprehensive redesigned system model is introduced for fifth generation (5G) networks with the CS mechanism.

That is, we introduce caches at the in-network router, the BS and the neighboring smart devices with the CS

mechanism. With smart portable devices, a user can obtain its requested contents from neighboring users’

caches. Additionally, it can obtain the requested contents from in-network router or BS caches.

• The EE performance of the core network, long distance and short distance scenarios is investigated through

achievable sum rate and consumed energy analyses. It is a versatile model that can be adopted by similar

work as well.

• Numerical results are used to determine where, and under what condition, the requested content should be obtained.

It is found that the short distance scenario has the best EE performance, followed by the long distance and

core router scenarios. However, due to the limited battery of smart devices, in reality, the long distance or core router

scenarios are more reasonable choices.

2. System model

In the proposed system, a massive multi-input-multi-output (MIMO) antenna array is used at the outdoor BS.

It is assumed that one cell has 20 users and a BS with hundreds of massive MIMO antenna elements, which is a widely used

assumption for massive MIMO BSs in 5G [7]. A user’s request can be fulfilled by the contents

stored in the caches of neighboring users with smart devices, of the BS, or of the in-network core routers with the CS mechanism.

Otherwise, the requested contents are retrieved directly from the remote content server (on condition that the

requested content is not cached anywhere). Details of the system are given in Fig. 1. As shown,

the CS concept is introduced comprehensively at the neighboring user, BS and in-network router

sides. This differs from the prior literature, which investigates the CS mechanism separately for the “in-network”, BS

or neighboring user regimes.

Suppose there are two users within one cell area, user A and user B, as shown in Fig. 1. In addition, users A and B, the BS

and the in-network routers in the wired core network are capable of caching and sharing temporarily hot contents (which are requested

frequently). In this case, whenever user A has a content request, say “objective A request”, the content can be obtained from the

caches named “Copy of A”. In the conventional system model without the CS mechanism, by contrast, it is obtained from the remote content server

via back-haul links connecting the wireless and wired sections. Compared with obtaining the content

from the remote content server, the CS mechanism, once applied, can reduce the energy consumption because of the shorter

distance and the fewer components engaged in the transmission procedure, which yields better system EE performance.

However, the extent to which the EE performance is enhanced is still unclear. Moreover, it should be

clarified under what constraints the contents should be distributed and from where they should be obtained within

the proposed system model. This is the content distribution problem considered in this study, which

is addressed in the following sections.

Fig. 1. Description of the proposed system model.

B. Sum Rate

Following the prior analysis in [2], after zero-forcing beamforming (ZFBF), the signal to interference plus noise

ratio (SINR) of user k is

$$ \mathrm{SINR}_k = \rho_k \frac{M - N}{N}, $$

where 𝜌𝑘 = 𝑃𝑘/𝑃𝑛 is the signal to noise ratio (SNR) of user k, and 𝑃𝑘 and 𝑃𝑛 denote the

power of user k and the channel noise power, respectively. In addition, M and N are the numbers of transmit and receive antennas. In this

case, by summing all the transmission rates within one cellular area, the achievable sum rate of one cellular area

is

$$ R_{sum} = \sum_{k=1}^{N} R_k = \sum_{k=1}^{N} B \log_2\left(1 + \rho_k \frac{M - N}{N}\right), $$

where B is the carrier bandwidth. Note that although the expression of 𝜌𝑘 has been given, its specific

values are still unknown. To settle this, the following analysis is employed.
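The two expressions above translate directly into code. The Python sketch below is only illustrative: the function names are hypothetical, and the numbers in the example (100 antennas, 20 users, 0 dB SNR per user, 20 MHz bandwidth) are assumptions chosen to exercise the formula, not values reported by the authors.

```python
import math


def zf_sinr(snr, num_tx, num_rx):
    """SINR_k = rho_k * (M - N) / N after zero-forcing beamforming."""
    return snr * (num_tx - num_rx) / num_rx


def sum_rate(snrs, bandwidth_hz, num_tx, num_rx):
    """R_sum = sum over k of B * log2(1 + SINR_k), in bit/s."""
    return sum(bandwidth_hz * math.log2(1.0 + zf_sinr(s, num_tx, num_rx))
               for s in snrs)


# Hypothetical numbers: M = 100 antennas, N = 20 users at 0 dB SNR, B = 20 MHz.
rate = sum_rate([1.0] * 20, bandwidth_hz=20e6, num_tx=100, num_rx=20)
print(f"R_sum = {rate / 1e6:.1f} Mbit/s")
```

The factor (𝑀 − 𝑁)/𝑁 acts as an array gain: for a fixed number of users, adding transmit antennas raises every user's SINR linearly.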

3. Energy Efficiency Analysis

A. The long and short distance scenarios

We define obtaining the contents from the caches of a neighboring user, the BS and the core router as the short distance, long

distance and core router scenarios, respectively. When the requested contents are obtained from a neighboring user’s

cache or from the BS, the transmission procedures are similar apart from the distances traveled, so we

analyze these two scenarios together first. With the free space propagation model, the SNR can be rewritten as

$$ \rho_k = \frac{P_t \left[ \frac{\sqrt{G_l}\,\lambda}{4 \pi d} \right]^2}{P_n}. $$

Here 𝑃𝑡 is the emission power at the transmitter side, 𝐺𝑙 the coefficient of field radiation patterns in the line of sight

(LoS) direction, 𝜆 the wavelength, and d the distance from transmitter to receiver. In line with prior work in

[3], the received noise power at receiver side can be given as

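$$ P_n = -174 + 10 \log_{10} B \quad \text{(dBm)}. $$

The achievable sum rate within one cellular area turns out to be

$$ R_{sum} = \sum_{t=1}^{N} B \log_2\left(1 + \frac{P_t \left[ \frac{\sqrt{G_l}\,\lambda}{4 \pi d} \right]^2 (M - N)}{N \left(-174 + 10 \log_{10} B \ \text{(dBm)}\right)}\right). $$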

When the requested content is obtained from the BS cache, the power consumption can be estimated as

$$P_w = P_{t,total} + P_{bs} + P_{RF} + P_{circuit},$$

where $P_{t,total} = \sum_{t=1}^{N} P_t$, $P_{bs}$, $P_{RF}$, and $P_{circuit}$ denote the power consumption of the massive MIMO antenna array, the BS machine room, the radio frequency (RF) part, and the circuit of this transmission procedure, respectively. The RF power consumption is typically around 100 to 200 mW and is ignored in the analysis. This gives the EE performance of the long distance scenario:

$$\eta_{ee,long} = \frac{R_{sum}}{P_{t,total} + P_{bs} + P_{circuit}}.$$
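The chain from link budget to long distance EE can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, all N users are placed at the same distance d with the same $P_t$, the transmit power of 0.1 W is a made-up input, and the noise figure is converted from dBm to watts before forming the SNR, which is one reading of how the expression is meant to be evaluated.

```python
import math


def noise_power_w(bandwidth_hz: float) -> float:
    """Noise power -174 + 10*log10(B) dBm, converted to watts."""
    p_dbm = -174.0 + 10.0 * math.log10(bandwidth_hz)
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def free_space_snr(p_t_w, g_l, wavelength_m, d_m, p_n_w):
    """rho = P_t * (sqrt(G_l) * lambda / (4*pi*d))**2 / P_n, linear scale."""
    gain = (math.sqrt(g_l) * wavelength_m / (4.0 * math.pi * d_m)) ** 2
    return p_t_w * gain / p_n_w


def ee_long(p_t_w, m, n, bandwidth_hz, g_l, wavelength_m, d_m, p_bs_w, p_circuit_w):
    """eta_ee,long = R_sum / (P_t,total + P_bs + P_circuit), in bit/J."""
    rho = free_space_snr(p_t_w, g_l, wavelength_m, d_m, noise_power_w(bandwidth_hz))
    r_sum = n * bandwidth_hz * math.log2(1.0 + rho * (m - n) / n)
    p_t_total = n * p_t_w  # sum of P_t over t = 1..N, assuming equal P_t
    return r_sum / (p_t_total + p_bs_w + p_circuit_w)


# Table I values except p_t_w, which is a hypothetical input.
eta_long = ee_long(p_t_w=0.1, m=100, n=20, bandwidth_hz=20e6, g_l=1.0,
                   wavelength_m=3e8 / 1.9e9, d_m=350.0,
                   p_bs_w=400.0, p_circuit_w=160.8)
print(f"EE (long distance): {eta_long:.3e} bit/J")
```

The RF term is left out of the denominator, matching the text's choice to ignore it.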

In the short distance scenario, there is no power consumption from the BS or the circuit. Following a similar analysis, the EE expression can be given as

$$\eta_{ee,short} = \frac{R_{sum} (4\pi d^2)^2}{\left(2^{R_{sum}/B} - 1\right) P_n \left(\sqrt{G_l}\,\lambda\right)^2}.$$

Note that in the short distance scenario, the power threshold of current user equipment is around 1 to 2 W.

B. The core router scenario

The power consumption of the core router scenario can be estimated as

$$P_c = N_n P_{cc} + (N_n + 1) P_o + P_w,$$

where $N_n$ is the number of core network nodes, and $P_{cc}$ and $P_o$ are the power consumptions of one core network node and one optical fiber link, respectively. $N_n + 1$ optical fiber links are needed for $N_n$ nodes because an additional link connects the BS to the first router. For simplicity, the optical fiber distances from BS to core router, from core router to core router, and from core router to remote center are assumed to be equal. Moreover, $P_{cc}$ can be given as

$$P_{cc} = P_{trans} + P_{phy} + P_{mac} + P_{tp} + P_{fi} + P_{mem} + P_{lc} + P_{ps},$$

where $P_{trans}$, $P_{phy}$, $P_{mac}$, $P_{tp}$, $P_{fi}$, $P_{mem}$, $P_{lc}$, and $P_{ps}$ respectively denote the power of the transceiver, the physical layer (such as encoding/decoding, scrambling/descrambling, forward error correction (FEC)), the MAC layer (such as mapping, framing), the transport profile/forward error (TP/FE) (such as packet processing, classifying), the fabric interface, the memory, the line card, and the packet switch. Assuming that one line card is used in the transmission with all the others turned off to save energy, according to the estimation in [4], their values are 5.9 W, 3.4 W, 30.6 W, 183.6 W, 61.2 W, 13.6 W, 298.3 W, and 224.4 W, respectively. Taking the fan section into consideration as well, according to prior estimations [5, 6], we have

$$P'_{cc} = \frac{100\, P_{cc}}{67}.$$

In this case, the EE performance of the core router scenario is

$$\eta_{ee,core} = \frac{R_{sum}}{P_c}.$$
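The core router power model is a plain sum, so it is easy to check numerically. The sketch below uses the eight component values quoted from [4]; the function names are hypothetical, and the number of nodes, the per-link fiber power, and $P_w$ are left as inputs because the text does not give them. Whether the fan-adjusted value replaces $P_{cc}$ in $P_c$ is treated here as an assumption and exposed as a flag.

```python
# Component powers of one core node in watts, in the order listed in the text.
P_CC_COMPONENTS_W = {
    "trans": 5.9, "phy": 3.4, "mac": 30.6, "tp": 183.6,
    "fi": 61.2, "mem": 13.6, "lc": 298.3, "ps": 224.4,
}


def core_node_power_w(include_fans: bool = True) -> float:
    """P_cc, or 100 * P_cc / 67 when the fan section is included."""
    p_cc = sum(P_CC_COMPONENTS_W.values())
    return 100.0 * p_cc / 67.0 if include_fans else p_cc


def core_router_power_w(n_nodes, p_fiber_w, p_w_w, include_fans=True):
    """P_c = N_n * P_cc + (N_n + 1) * P_o + P_w."""
    return (n_nodes * core_node_power_w(include_fans)
            + (n_nodes + 1) * p_fiber_w + p_w_w)


def ee_core(r_sum_bps, n_nodes, p_fiber_w, p_w_w, include_fans=True):
    """eta_ee,core = R_sum / P_c, in bit/J."""
    return r_sum_bps / core_router_power_w(n_nodes, p_fiber_w, p_w_w, include_fans)


print(f"P_cc  = {core_node_power_w(include_fans=False):.1f} W")
print(f"P_cc' = {core_node_power_w(include_fans=True):.1f} W")
```

The eight listed values add up to 821.0 W per node before the fan adjustment.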

4. Simulation results

The simulation parameters are given in Table I, in line with prior work [7, 8] and the 3GPP documents. Comparing the three scenarios, the short distance scenario consumes the least power, followed by the long distance scenario and then the core router scenario. Accordingly, the short distance scenario displays the best system EE performance, followed by the long distance and core router scenarios. All of the cache-enabled scenarios display better EE performance than prior studies, which indicates that ICN's CS mechanism can reduce the power consumption and enhance the system EE performance. In addition, when the CS mechanism is adopted, the short distance scenario is a good choice for distances of less than 0.2 m (typically in the indoor environment), where it offers the best system EE performance. In the outdoor environment, the long distance scenario is generally the more feasible choice.

Table I. Simulation parameters.

| Parameter | Value |
|---|---|
| Carrier frequency f | 1900 MHz |
| Carrier bandwidth B | 20 MHz |
| Transmit antenna number M | 100 |
| Receiver number N | 20 |
| Per user request | 20 Mbit/s |
| $G_l$ | 1 |
| User power threshold $P_{th}$ | 2 W |
| BS range $d_1$ | 350 m |
| BS power consumption $P_{bs}$ | 400 W |
| $P_{circuit}$ | 160.8 W |



Fig. 2. EE performance of core router scenario.

Fig. 3. EE performances of short, long distance scenarios.

5. Conclusion

This paper introduced a comprehensive system model with the CS mechanism. Based on the proposed model, the EE performance was analyzed for the in-network core router, short distance, and long distance scenarios. Simulation results demonstrated that all three scenarios display better EE performance than prior studies without the CS mechanism. When the CS mechanism is applied in the outdoor environment, the long distance scenario is the more reasonable choice. In indoor and other scenarios with small user-to-user distances, the short distance scenario is better than the long distance scenario. Otherwise, the core router scenario remains a feasible choice compared with operation without the CS mechanism.

References

[1] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,”

IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.

[2] M. Jung, T. Kim, K. Min, Y. Kim, J. Lee, and S. Choi, “Asymptotic distribution of system capacity in multiuser

MIMO systems with large number of antennas,” in IEEE VTC, Jun. 2013, pp. 1–5.

[3] Q. Gu, “RF system design of transceivers for wireless communication.” Springer US, 2005.

[4] S. Aleksic, “Analysis of power consumption in future high-capacity network nodes,” J. Opt. Commun. and

Netw., vol. 1, no. 3, pp. 245–258, Aug. 2009.

[5] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, “The cost of a cloud: Research problems in data center

networks,” SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Dec. 2008.

[6] A. C. Orgerie, M. D. d. Assuncao, and L. Lefevre, “A survey on techniques for improving the energy efficiency

of large-scale distributed systems,” ACM Comput. Surv., vol. 46, no. 4, pp. 4701–4731, Mar. 2014.

[7] Z. Zhou, J. Gong, Y. He, and Y. Zhang, “Software defined Machine-to-Machine communication for smart

energy management,” IEEE Commun. Mag., vol. 55, no. 10, pp. 52–60, Oct. 2017.

[8] C. Lab, “Cloud radio network white paper (3rd edition),” Tech. Rep., Jun. 2014.



Di Zhang (S’13-M’17) received his Ph.D. degree with honor from Waseda University, Tokyo,

Japan (2013-2017), MSc. degree with honor from Central China Normal University, Wuhan,

China (2010-2013). Currently, he is an Assistant Professor with the Zhengzhou University,

Zhengzhou, 450-001, China, and also a Senior Researcher with the Information System

Laboratory, Department of Electrical and Computer Engineering, Seoul National University,

Seoul, 151-742, South Korea. He visited the State Key Laboratory of Alternate Electrical Power

System with Renewable Energy Sources, North China Electric Power University (2015-2017),

and the Advanced Communication Technology Laboratory, National Chung Hsing University

(2012). He served as the TPC member of several IEEE conferences, such as IEEE ICC, WCNC,

VTC, CCNC, Healthcom. His research interests include 5G, internet of things, vehicle communications, green

communications and signal processing.

Zhenyu Zhou (M’11-SM’17) received his M.E. and Ph.D degree from Waseda University,

Tokyo, Japan in 2008 and 2011 respectively. From April 2012 to March 2013, he was the chief

researcher at Department of Technology, KDDI, Tokyo, Japan. From March 2013 to now, he is

an Associate Professor at School of Electrical and Electronic Engineering, North China Electric

Power University, China. He is also a visiting scholar with Tsinghua-Hitachi Joint Lab on

Environment-Harmonious ICT at University of Tsinghua, Beijing from 2014 to now. He served

as an Associate Editor for IEEE Access, and a Guest Editor for IEEE Communications Magazine

and Transactions on Emerging Telecommunications Technologies. He also served as workshop

co-chair for IEEE ISADS 2015, and TPC member for IEEE Globecom, IEEE CCNC, IEEE ICC,

IEEE APCC, IEEE VTC, IEEE Africon, etc. He is a voting member of P1932.1 Working Group. He was the

recipient of the IEEE Vehicular Technology Society "Young Researcher Encouragement Award" in 2009, the

“Beijing Outstanding Young Talent Award” in 2016, the IET Premium Award in 2017, and the IEEE ComSoc Green

Communications and Computing Technical Committee 2017 Best Paper Award. His research interests include green

communications, vehicular communications, and smart grid communications. He is a senior member of IEEE.

Zhengyu Zhu (S’12-M’17) received B.S. degree from Henan University in 2010, and received the Ph.D. degree

from Zhengzhou University, China, in 2017. Currently, he is with the School of Information Engineering,

Zhengzhou University, China. His research interests include information theory and signal processing for wireless

communications such as MIMO wireless network, physical layer security, wireless cooperative networks, internet of

things, vehicle communications, convex optimization techniques, and energy harvesting communication systems.

Shahid Mumtaz (SM’16) ([email protected]) has more than seven years of wireless industry

experience and is currently working as a senior research scientist and technical manager at

Instituto de Telecomunicações Aveiro (IT), Portugal, in the 4TELL group. Prior to his current

position, he worked as a research intern at Ericsson and Huawei Research Labs. He received his

M.Sc. and Ph.D. degrees in electrical and electronic engineering from Blekinge Institute of

Technology Karlskrona, Sweden, and the University of Aveiro. His research interests lie in the

field of architectural enhancements to 3GPP networks (i.e., LTE-A user plane and control plane

protocol stack, NAS, and EPC), 5G related technologies, green communications, cognitive

radio, cooperative networking, radio resource management, cross-layer design,

backhaul/fronthaul, heterogeneous networks, M2M and D2D communication, and baseband digital signal

processing. He has more than 80 publications in international conferences, journals, and book chapters.




Energy-Efficient Design for Latency-tolerant Content Delivery Networks

Thang X. Vu, Lei Lei, and Satyanarayana Vuppala

The Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, 29 Avenue John F. Kennedy, Luxembourg. Email: {thang.vu, lei.lei, satyanarayana.vuppala}@uni.lu

Abstract

In this paper, we investigate the energy efficiency performance of content delivery networks in which a data center serves multiple users via a shared wireless medium. Focusing on latency-tolerant applications, we propose energy-efficient precoding design and optimization that minimize the total energy consumption while guaranteeing some given quality of service constraints. In particular, an energy-buffering time trade-off (EBT) is derived in a closed-form expression for single-user scenarios, which reveals the impact of the key system parameters on the total energy consumption. We then formulate an energy minimization problem with a minimum mean square error (MMSE)-based precoding design for multiple-user scenarios. In order to overcome the non-convexity of the formulated problem, we propose an iterative algorithm which solves the problem suboptimally via a linear approximation of the non-convex constraint. Finally, numerical results are presented to demonstrate the effectiveness of the proposed solution.

Index terms— Content delivery networks, precoding, energy efficiency, latency, optimization.

I. INTRODUCTION

Future content delivery networks will have to meet stringent requirements of delivering content at high speed and low latency due to the proliferation of mobile handsets and data-hungry applications. Cisco predicts that more than 70% of network traffic will be video in 2018. On the other hand, only 5–10% of the files are frequently requested, which results in inefficient utilization of network resources in conventional content delivery. One promising solution to improve resource utilization is to store the content closer to users in distributed storage, which is referred to as content placement or caching [1]. Caching usually consists of two phases: placement and delivery. The placement phase is executed during off-peak time when network resources are abundant. In this phase, popular content is duplicated and stored in distributed caches in the network. The delivery phase usually occurs during peak-traffic hours when the users issue their requests. If the requested content is available in the user's local storage, it can be served locally without being sent over the network. In this manner, caching significantly reduces the traffic during peak hours and thus reduces network congestion [1–5].

The joint design of caching and the physical layer has attracted much attention recently. The
basic principle is to take the caching capacity at the edge nodes into account when designing
the signal transmission in order to improve resource utilization [6–9]. The authors in [6] study the trade-off between

energy consumption and backhaul load during the placement phase in heterogeneous networks. In [7],

a closed-form expression of the energy efficiency is derived showing essential impacts of caching. The

authors in [8] show that significant reduction in transmit power and fronthaul bandwidth can be obtained

via the careful design of cache-aware multicast beamforming and power allocation. In [11], the authors

study D2D networks in which the content can be cached at either small base stations or user nodes. A

joint content replacement and delivering scheme is developed to reduce the total energy cost taking into

account the fading channels. In [12], the successful delivery rate is studied in cluster-centric networks, which

group small base stations (SBSs) into disjoint clusters. The SBSs within one cluster share a cache

which is divided into two parts: one contains the most popular files, and one comprises different files


which are most popular locally. The authors in [13] study energy consumption based on an oversimplified model which assumes that caching and transportation costs depend linearly on the number of bits.

In this paper, we investigate the energy efficiency of content delivery networks in which a base station

(BS) is serving multiple users via a shared wireless channel. We focus on latency-tolerant applications

where the users can tolerate a reasonable delay before starting the requested service. First, we derive an

energy-buffering time trade-off (EBT) in a closed-form expression for single-user scenarios. From the

derived closed form, the impact of key system parameters on the total energy consumption is revealed.

We then formulate an optimization problem to minimize the total system energy usage for multiple-

user scenarios. In order to overcome the non-convexity of the formulated problem, we propose an

iterative algorithm which replaces the non-convex constraint with its first-order approximation.
Finally, the effectiveness of the proposed solution is demonstrated via numerical results.

II. SYSTEM MODEL

We consider a content delivery network consisting of one BS equipped with L antennas serving K single-antenna users via a shared wireless medium, with K ≤ L, as depicted in Figure 1. The BS is connected to a data centre via high-speed backhaul links. The BS is assumed to have full access to the content at the data centre, which contains a library F = {F1, . . . , FN} of N files, each of size Q bits (in practice, files of unequal size can be divided into chunks of equal size). Each user is equipped with a cache memory of size M (files). We consider offline caching and focus on the energy consumption of the delivery phase [8].

A. Caching model

In this paper, we assume the content popularity follows a Zipf distribution [14]. The probability of the i-th file being requested by a user is given as

$$q_i = \frac{i^{-\alpha}}{\sum_{j=1}^{N} j^{-\alpha}},$$

where α is the skewness factor of the Zipf distribution.



In order to minimize the channel load, the users cache the most popular files. In particular, the M most popular files are prefetched at the user caches during the placement phase, which occurs during off-peak time [1].
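The effect of caching the M most popular files can be illustrated with a short calculation. This is a sketch under the Zipf popularity assumption stated in the text; the function names and the example values N = 100, M = 10, α = 1 are hypothetical and are not taken from the paper's simulations.

```python
def zipf_popularity(n_files: int, alpha: float):
    """Request probabilities of files 1..N under a Zipf law with skewness alpha."""
    weights = [i ** (-alpha) for i in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_hit_probability(n_files: int, cache_size: int, alpha: float) -> float:
    """Probability that a request falls among the cache_size most popular files."""
    return sum(zipf_popularity(n_files, alpha)[:cache_size])


# Hypothetical library of 100 files with a cache holding 10% of it.
hit = cache_hit_probability(n_files=100, cache_size=10, alpha=1.0)
print(f"cache hit probability: {hit:.3f}")
```

A request that hits the cache is served locally, so only the remaining fraction of requests has to be transmitted by the BS. With α = 0 the popularity is uniform and the hit probability reduces to M/N.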

B. Signal transmission model

In the delivery phase, each user requests a file from the BS. The user first checks its own cache. If the requested file has been prefetched in its cache, it can be served immediately. Otherwise, the requested file is transmitted from the BS. Denote by K’ the subset of users whose requested files are not available in their caches. The BS transmits only to the users in K’. Obviously, |K’| ≤ K.

We consider latency-tolerant applications, where the users can allow some buffering time after releasing their requests. Let θ denote the buffering time that the users can tolerate (the gap between the moment the users send their requests and the moment they can start the requested service, e.g., watching a video). Since the users can tolerate a buffering time θ, they use this period to preload parts of the requested file into their buffer. Denote $\mathbf{w}_k^b, \mathbf{w}_k^t \in \mathbb{C}^{L \times 1}$ as the precoding vectors for user k during the buffering and transmission time, respectively. The received signal at user k is given as

where the superscript (b, t) represents the corresponding buffering time or transmission time, $x_k$ is the modulated signal of the file requested by user k, $z_k$ is Gaussian noise, and $\mathbf{h}_k$ is the channel fading vector from the BS antennas to user k, which follows a circularly symmetric complex Gaussian distribution. Perfect channel state information (CSI) is assumed to be known at the BS. In practice, robust channel estimation can be achieved through the transmission of pilot sequences. We consider block fading channels and assume the channel coherence time is sufficiently long to accommodate one request session [8]. By treating the interference as noise, the respective achievable information rates for user k ∈ K’ during the buffering time and transmission time are

where B is the channel bandwidth.
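To make the phrase "treating the interference as noise" concrete, the sketch below computes per-user rates for a given set of precoders. It is not the paper's expression: it assumes the common multiuser downlink convention in which user k receives $\mathbf{h}_k^H \mathbf{w}_j$ from each precoder, unit-power symbols, and a rate of $B \log_2(1 + \mathrm{SINR}_k)$. The function names and the toy two-user channel are hypothetical.

```python
import math


def inner(h, w):
    """h^H w for two equal-length complex vectors (assumed convention)."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))


def achievable_rates(channels, precoders, noise_power, bandwidth_hz):
    """Rate of each user with the other users' signals counted as noise."""
    rates = []
    for k, h_k in enumerate(channels):
        signal = abs(inner(h_k, precoders[k])) ** 2
        interference = sum(abs(inner(h_k, w_j)) ** 2
                           for j, w_j in enumerate(precoders) if j != k)
        sinr = signal / (interference + noise_power)
        rates.append(bandwidth_hz * math.log2(1.0 + sinr))
    return rates


# Toy example: two users with orthogonal channels and matched precoders.
h = [[1 + 0j, 0j], [0j, 1 + 0j]]
w = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(achievable_rates(h, w, noise_power=1.0, bandwidth_hz=1.0))
```

The same function can be called once with the buffering-time precoders and once with the transmission-time precoders to obtain the two rates.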

III. PROBLEM FORMULATION

In this section, we consider an energy minimization problem with delay-tolerant design. The problem formulation is shown below. For more details, please refer to [10].

IV. NUMERICAL RESULTS



This section presents numerical results to demonstrate the derived optimization. The system parameters for the simulations are as follows: B = 1 MHz, κ = −20 dB, σ² = −10 dBm, Q = 48 Mbits, Ptot = 2 W, and the request rate r1 = · · · = rK = r = 4 Mbps, which corresponds to an expected serving time T = Q/r = 12 seconds.

Fig. 2a presents the EBT for the single-user scenario without caching, i.e., M = 0. It is observed that the analysis perfectly matches the simulation results. If the user does not allow any delay, it costs 0.58 Joule to send the requested file. However, if the user can tolerate a delay of 0.8 seconds, the system can save 10% of the energy cost. Fig. 2b plots the energy consumption in multi-user systems under two precoding designs for two cases: without caching, i.e., M = 0 (left subfigure), and with a cache size M = 0.1N (right subfigure). The energy consumption is calculated based on the optimal solution of the formulated problems in Section III. The MMSE-based design is more efficient than the ZF-based design in the considered setting. In particular, the MMSE design consumes approximately 10% less energy than the ZF design. With a cache size equal to 10% of the library size, the system reduces the total energy usage by 75%. In all cases, increasing the tolerated latency results in less energy consumption. We remark that the average energy cost per user without caching (left subfigure) is higher than in the single-user scenario, since additional energy is required to mitigate inter-user interference.

V. CONCLUSION

We have analysed the energy performance of cache-assisted content delivery networks in which a data centre serves users via shared wireless channels. First, we derived an energy-buffering time trade-off in a closed-form expression for single-user scenarios. We then formulated two optimization problems, corresponding to two linear precoding designs for multi-user systems, to minimize the total system energy consumption while taking into account an allowable latency. The developed framework can be used as a guideline for system design and optimization for latency-tolerant services.

REFERENCES



[1] S. Borst, V. Gupta, and A. Walid, “Distributed caching algorithms for content distribution networks,” in Proc. IEEE Int. Conf. Comput. Commun., Mar. 2010, pp. 1–9.

[2] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.

[3] K. C. Almeroth and M. H. Ammar, “The use of multicast delivery to provide a scalable and interactive video-on-demand service,” IEEE J. Sel. Areas Commun., vol. 14, no. 6, pp. 1110–1122, 1996.

[4] M. Ji, G. Caire, and A. F. Molisch, “Fundamental limits of caching in wireless D2D networks,” IEEE Trans. Inf. Theory, vol. 62, no. 2, pp. 849–869, Feb. 2016.

[5] A. Sengupta, R. Tandon, and T. C. Clancy, “Fundamental limits of caching with secure delivery,” IEEE Trans. Info. Forensics and Security, vol. 10, no. 2, pp. 355–370, Feb. 2015.

[6] F. Gabry, V. Bioglio, and I. Land, “On energy-efficient edge caching in heterogeneous networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3288–3298, Dec. 2016.

[7] D. Liu and C. Yang, “Energy efficiency of downlink networks with caching at base stations,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 907–922, Apr. 2016.

[8] M. Tao, E. Chen, H. Zhou, and W. Yu, “Content-centric sparse multicast beamforming for cache-enabled cloud RAN,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6118–6131, Sept. 2016.

[9] T. X. Vu, S. Chatzinotas, and B. Ottersten, “Energy-efficient design for edge-caching wireless networks: When is coded-caching beneficial?” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun., Jul. 2017, pp. 1–5.

[10] T. X. Vu, L. Lei, and S. Vuppala, “Energy-efficient design for latency-tolerant content delivery networks,” in Proc. IEEE WCNC Workshop, Barcelona, Apr. 2018, pp. 1–6.

[11] M. Gregori, J. Gómez-Vilardebó, J. Matamoros, and D. Gündüz, “Wireless content caching for small cell and D2D networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1222–1234, May 2016.

[12] Z. Chen, J. Lee, T. Q. Quek, and M. Kountouris, “Cooperative caching and transmission design in cluster-centric small cell networks,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 3401–3415, May 2016.

[13] Y. Xu, Y. Li, Z. Wang, T. Lin, G. Zhang, and S. Ci, “Coordinated caching model for minimizing energy consumption in radio access network,” in Proc. IEEE Int. Conf. Commun., 2014, pp. 2406–2411.

[14] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and Zipf-like distributions: Evidence and implications,” in IEEE INFOCOM, Mar. 1999, vol. 1, pp. 126–134.

Thang X. Vu was born in Hai Duong, Vietnam. He received the B.S. and the M.Sc., both in Electronics and Telecommunications Engineering, from the VNU University of Engineering and Technology, Vietnam, in June 2007 and September 2009, respectively, and the Ph.D. in Electrical Engineering from the University Paris-Sud, France, in January 2014. From 2007 to 2009, he was with the Department of Electronics and Telecommunications, VNU University of Engineering and Technology, Vietnam as a research assistant. In 2010, he received the Allocation de Recherche fellowship to study Ph.D. in France. From September 2010 to May 2014, he was with the Laboratory of Signals and Systems (LSS), a joint laboratory of CNRS, CentraleSupelec and University Paris-Sud XI, France. From July 2014 to January 2016, he was postdoctoral researcher with the Singapore University of Technology and Design (SUTD), Singapore. Currently, he is research associate at Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg. His research interests are in the field of wireless communications, with particular interests of cache-assisted 5G, cloud radio access networks, resources allocation and optimization, cooperative diversity, channel and network decoding, and iterative decoding.

Lei Lei received the B.Eng. and M.Eng. degrees from Northwestern Polytechnical University, Xi’an, China, in 2008 and 2011, respectively. He obtained his Ph.D. degree in 2016 at the Department of Science and Technology, Linköping University, Sweden. Since Nov. 2016, he has been a research associate at the Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg. He was a research assistant at Institute for Infocomm Research (I2R), A*STAR, Singapore, from June 2013 to December 2013. He received the IEEE Sweden Vehicular Technology-Communications-Information Theory (VT-COM-IT) joint chapter best student journal paper award in 2014. His current research interests include resource allocation and optimization in 4G/5G/satellite networks, wireless caching, and energy-efficient communications.

Satyanarayana Vuppala received the Bachelor of Tech. degree with distinction in Computer Science and Engineering from JNTU Kakinada, India, in 2009, and the Master of Tech. degree in Information Technology from the National Institute of Technology, Durgapur, India, in 2011. He received the Ph.D. degree in Electrical Engineering from Jacobs University Bremen in January 2015. He was a postdoctoral researcher at IDCOM, University of Edinburgh, UK, from 2015 to 2017. Since May 2017, he has been a post-doctoral researcher at the Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg. His research activities are mainly focused on 5G Networks (Millimeter wave, Full-duplex, Non-orthogonal multiple access, D2D), Machine learning for Wireless Networks, and Internet of Things. He also works on physical, access, and network layer aspects of wireless security. He coauthored articles short-listed for the Best Paper Awards at the Asilomar Conference on Signals Systems and Computers in 2012 and 2014.



Cooperative Content Caching and Distribution in Multihop D2D-V2V Networks

Yahui Wang*, Zhenyu Zhou*, Houjian Yu*, and Chen Xu*

*School of Electrical and Electronic Engineering, North China Electric Power University,

Beijing, China.

1. Introduction

Device-to-device (D2D) communication allows direct content sharing over proximate peer-to-peer links [1] and dependable vehicular connectivity [2]. D2D-based vehicle-to-vehicle (D2D-V2V) communication can also realize effective data offloading, dependable service delivery, and coordinated resource utilization by exploiting the cellular infrastructure with centralized intelligence [3].

A number of works have studied content distribution problems in conventional D2D networks, including relay networks [4], social networks [5], and mmWave cellular networks [6]. However, these works are not suitable for highly dynamic and unreliable D2D-V2V links. The works that do address the content distribution problem in D2D-based vehicular networks [7], [8] have not considered the multi-hop transmission scenario or vehicle trajectory prediction.

However, D2D-based vehicular content distribution imposes new challenges. First, it is difficult to form a content distribution group under fast-varying channel conditions and network topologies. Second, co-channel interference should be carefully managed to satisfy the dependability and timeliness requirements of D2D-V2V communication. Third, the multi-hop content distribution process involves a joint optimization of peer discovery and spectrum allocation from a delay minimization perspective.

In this work, we investigate how to achieve dependable content distribution in D2D-based cooperative vehicular networks by combining big-data-based vehicle trajectory prediction with coalition-formation-game-based resource allocation. We formulate the formation of content distribution groups with different lifetimes as a coalition formation game, and evaluate the delay performance based on a real-world map and realistic vehicular traffic by connecting SUMO with MATLAB via predefined standard interfaces.

2. System Model

Figure 1 The system model of D2D-V2V multihop content distribution

Figure 1 shows the system model of a D2D-based cooperative vehicular network, which is composed of a base station (BS), K cellular user equipments (CUEs), M vehicular content providers (V-TXs), and N vehicular content requesters (V-RXs). Vehicle mobility patterns and trajectory prediction have been studied in [9]-[11]. We adopt the multi-Kalman filter (MKF) based trajectory prediction approach proposed in [9], which is assisted by global positioning system (GPS) and geographic information system (GIS) big data, to estimate the connection time between two vehicles during the transmission process. We assume that each CUE $C_V^k$ is allocated one orthogonal uplink resource block (RB) $C_k^{RB}$, which can be reused simultaneously by at most one D2D-V2V multicast transmission.

As an example, consider V-TX $V_m^{TX}$ serving the content request of V-RX $V_n^{RX}$ by reusing RB $C_k^{RB}$. Because the uplink spectrum is reused, $V_m^{TX}$ creates co-channel interference at the BS, while $V_n^{RX}$ suffers from the interference caused by CUE $C_V^k$. To evaluate the content distribution performance, we take the network average delay as the key measurement, which can be expressed as a function of the SINR and the vehicle connection time. With the corresponding transmission rate defined as $r_{m,n}^k$, the transmission delay of the D2D-V2V link ($V_m^{TX}$, $V_n^{RX}$) using RB $C_k^{RB}$ can be approximately calculated as

$$\tau_{m,n}^k = \frac{\tilde{\tau}_{m,n}^k}{\varepsilon(t_{m,n} \mid \tilde{\tau}_{m,n}^k)}, \qquad (1)$$

where $\tilde{\tau}_{m,n}^k = D / r_{m,n}^k$ and D represents the size of the required content in bits. $\varepsilon(t_{m,n} \mid \tilde{\tau}_{m,n}^k)$ is an indicator function of the connection time $t_{m,n}$, and it is defined as 1 when $t_{m,n} > \tilde{\tau}_{m,n}^k$. It ensures that the connection time of two vehicles is no less than the duration required to deliver the content.
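The role of the connection-time indicator can be sketched as a small function. This is one reading of the delay definition rather than the authors' implementation: the function name is hypothetical, and a link whose connection time is too short to deliver the content is assumed to receive an infinite delay so that it is never selected.

```python
import math


def transmission_delay(content_bits: float, rate_bps: float,
                       connection_time_s: float) -> float:
    """Delay D / r if the vehicles stay connected long enough, else infinity."""
    if rate_bps <= 0:
        return math.inf
    nominal = content_bits / rate_bps  # time needed to deliver D bits at rate r
    # Indicator: the link is usable only when the connection outlasts the transfer.
    return nominal if connection_time_s > nominal else math.inf


# Hypothetical numbers: an 8 Mbit content item over a 4 Mbit/s link.
print(transmission_delay(8e6, 4e6, connection_time_s=3.0))
print(transmission_delay(8e6, 4e6, connection_time_s=1.0))
```

Using infinity for infeasible links keeps the delay a single number that a minimization can compare directly, without a separate feasibility flag.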

In each content distribution group, the content is delivered simultaneously from a serving V-TX to multiple V-RXs co-located within the same group. When modeling vehicular content distribution, two critical aspects should be carefully considered. First, the numbers of V-TXs and V-RXs vary over time rather than remaining constant: the number of potential V-TXs increases gradually as more and more V-RXs obtain the content. Second, the lifetime of each D2D-V2V content distribution group differs from that of the others due to the diverse channel conditions and interference levels.

The delay $T_{m,n}^{K}$ for $V_n^{RX}$ to obtain the content from $V_m^{TX}$ is composed of the delay required for $V_m^{TX}$ to obtain the content and the transmission delay from $V_m^{TX}$ to $V_n^{RX}$.

We design an $M \times N \times K$ matrix to represent the set of optimization variables. Each element $o_{m,n,k}$ of the matrix is a binary variable: if $V_m^{TX}$ and $V_n^{RX}$ form a D2D-V2V pair by using RB $C_k^{RB}$, then $o_{m,n,k} = 1$; otherwise, $o_{m,n,k} = 0$. The formulated joint peer discovery, spectrum allocation, and route selection problem is given by

$$\min_{\{o_{m,n,k}\}} \frac{1}{N} \sum_{V_n^{RX} \in v_{RX}} \sum_{V_m^{TX} \in v_{TX}} \sum_{C_k^{RB} \in c_{RB}} o_{m,n,k}\, T_{m,n}^{K}. \qquad (2)$$

Note that the SINR constraints $\gamma_{m,n}^{k} \ge \gamma_{\min}^{V}$ and $\gamma_{k}^{m} \ge \gamma_{\min}^{C}$ need to be satisfied to ensure the QoS requirements of the D2D-V2V links and the cellular links, respectively. In addition, all of the V-RXs in the same group are associated with the same V-TX and the same RB.
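For a fixed assignment, the objective in (2) is just an average over the selected links. The sketch below evaluates it for nested Python lists; the function name and the toy numbers are hypothetical, and the delay is assumed to be indexed by V-TX, V-RX, and RB so that it can differ between resource blocks.

```python
def average_network_delay(assignment, delays, n_requesters: int) -> float:
    """(1/N) * sum over (m, n, k) of o[m][n][k] * T[m][n][k]."""
    total = 0.0
    for m, per_tx in enumerate(assignment):
        for n, per_rx in enumerate(per_tx):
            for k, o_mnk in enumerate(per_rx):
                if o_mnk:  # binary variable: 1 if the pair (m, n) uses RB k
                    total += delays[m][n][k]
    return total / n_requesters


# Toy case with M = 2 V-TXs, N = 2 V-RXs and K = 1 RB.
o = [[[1], [0]],
     [[0], [1]]]
T = [[[2.0], [5.0]],
     [[7.0], [4.0]]]
print(average_network_delay(o, T, n_requesters=2))
```

Here V-TX 0 serves V-RX 0 and V-TX 1 serves V-RX 1, so the two selected delays of 2 s and 4 s are averaged over the two requesters.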

4. Coalition Formation Game based Dependable Content Distribution and Simulation Results

In this section, we formulate the original content distribution problem as a coalition formation game and introduce some fundamental concepts. The proposed algorithm is then evaluated in simulations based on a real-world road topology and realistic vehicular traffic.



In a coalition formation game, a set of game players seek to form cooperative content distribution groups with the aim of reducing the average network delay. Here, the game is defined as a triplet (T, P, U), where T is the player set defined as $T = \mathcal{V}^{TX} \cup \mathcal{V}^{RX} \cup \mathcal{C}^{RB}$, P is a collection of coalitions, and U denotes the coalition utility. More precisely, P is any arbitrary set of disjoint coalitions $S_m \subseteq T$. If P spans the player set T, P can also be regarded as a partition of T. Although coalitions are formed to achieve dependable content distribution, some V-RXs may not be included in P because of the QoS and connection time constraints. To keep the definition of a coalition consistent, we introduce the concept of a solo coalition $\{V_n^{RX}\}$, which contains only the unserved V-RX $V_n^{RX}$.
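The relationship between a collection of disjoint coalitions, a partition, and solo coalitions can be sketched as follows. The helper names and the player labels are hypothetical, and coalitions are modeled simply as Python sets of player identifiers.

```python
def is_disjoint(coalitions):
    """True if no player appears in more than one coalition."""
    seen = set()
    for coalition in coalitions:
        if seen & coalition:
            return False
        seen |= coalition
    return True


def is_partition(coalitions, players):
    """True if the coalitions are disjoint and together span the player set."""
    return is_disjoint(coalitions) and set().union(*coalitions) == players


def with_solo_coalitions(coalitions, receivers):
    """Wrap every unserved V-RX in its own solo coalition."""
    served = set().union(*coalitions)
    solos = [{rx} for rx in sorted(receivers - served)]
    return list(coalitions) + solos


# Hypothetical player set: one V-TX, three V-RXs, one RB.
players = {"TX1", "RX1", "RX2", "RX3", "RB1"}
receivers = {"RX1", "RX2", "RX3"}
P = [{"TX1", "RX1", "RX2", "RB1"}]  # RX3 cannot be served

print(is_partition(P, players))
print(is_partition(with_solo_coalitions(P, receivers), players))
```

In this toy instance, the collection becomes a partition of the player set only after the unserved V-RX is given its solo coalition.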

During a coalition game, each V-RX tends to join an ideal coalition to maximize its individual payoff.

The RB occupied by the coalition can be released for new coalition formation if and only if all of the V-

RXs within that coalition have received the requested content. Hence, the objective of a coalition is to

minimize the average delay of all the coalition members. As a result, a V-RX may be refused by a

coalition if it dramatically decreases the coalition utility.

After obtaining the requested content, a V-RX can act as a V-TX and join a new coalition to serve other V-RXs in the next hop. A new D2D-V2V coalition can only be formed if an RB is willing to join this coalition. A conflict arises when multiple RBs seek to join the same coalition. In this case, only the RB with the highest payoff is allowed to join the coalition.
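The two decision rules described above, refusing a V-RX that degrades the coalition and resolving RB conflicts by payoff, could look roughly like the sketch below. The relative threshold `max_increase`, the function names, and the payoff values are assumptions made for illustration; the text does not specify how a dramatic decrease in utility is quantified.

```python
def average_delay(delays):
    """Average delay of the coalition members (lower is better)."""
    return sum(delays) / len(delays)


def admits(coalition_delays, candidate_delay, max_increase=0.2):
    """Accept a V-RX unless it raises the coalition's average delay
    by more than max_increase (a hypothetical relative threshold)."""
    before = average_delay(coalition_delays)
    after = average_delay(coalition_delays + [candidate_delay])
    return after <= before * (1.0 + max_increase)


def resolve_rb_conflict(rb_payoffs):
    """Among RBs competing for one coalition, keep the highest payoff."""
    return max(rb_payoffs, key=rb_payoffs.get)


# Hypothetical coalition whose two members experience 2.0 s and 3.0 s delay.
members = [2.0, 3.0]
print(admits(members, 2.5))   # candidate leaves the average unchanged
print(admits(members, 10.0))  # candidate doubles the average delay

# Hypothetical payoffs of three RBs that want to join the same coalition.
print(resolve_rb_conflict({"RB1": 0.4, "RB2": 0.9, "RB3": 0.7}))
```

A fixed threshold is only one way to express the refusal rule; a comparison based on the preference relation used by the algorithm would replace `admits` in a faithful implementation.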

Based on the concepts of preference relation and the split and merge rule, the coalition formation game

based vehicular content distribution is implemented as follows.

Phase 1: Coalition formation initialization

Phase 2: Iterative coalition formation

Phase 3: Resource allocation and content dissemination

The algorithm terminates if either of the following conditions is satisfied. The first is that every $V_n^{RX} \in \mathcal{V}^{RX}$ has obtained the requested content. The second is that no V-RX that has not yet obtained the content can be served by any $V_m^{TX} \in \mathcal{V}^{TX}$.
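The two termination conditions can be illustrated with a simplified multi-hop loop. This sketch deliberately omits coalition formation and RB allocation: the `can_serve` mapping is a hypothetical stand-in for whatever the QoS and connection time constraints allow, and all identifiers are invented for the example.

```python
def distribute(sources, receivers, can_serve):
    """Spread content hop by hop until one termination condition holds.

    can_serve[tx] is the set of V-RXs that tx is able to serve
    (hypothetical abstraction of the QoS and connection time constraints).
    Returns (served V-RXs, unserved V-RXs, number of hops).
    """
    holders = set(sources)    # vehicles that can currently act as V-TXs
    pending = set(receivers)  # V-RXs still waiting for the content
    hops = 0
    while pending:  # condition 1: stop once every V-RX has the content
        reachable = {
            rx
            for tx in holders
            for rx in can_serve.get(tx, set())
            if rx in pending
        }
        if not reachable:  # condition 2: no pending V-RX can be served
            break
        holders |= reachable  # served V-RXs become potential V-TXs
        pending -= reachable
        hops += 1
    return holders - set(sources), pending, hops


# Hypothetical topology: A holds the content, E is out of everyone's reach.
served, unserved, hops = distribute(
    sources={"A"},
    receivers={"B", "C", "D", "E"},
    can_serve={"A": {"B"}, "B": {"C", "D"}},
)
print(sorted(served), sorted(unserved), hops)
```

The loop also shows the growth in the number of potential V-TXs over time: each served V-RX joins `holders` and may serve others in the next hop.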

Figure 2. (a) The percentage of served V-RXs; (b) the average network delay performance.

The simulation of content distribution is conducted by connecting SUMO with MATLAB through the standard traffic control interface (TraCI) protocol. We compare the proposed algorithm with two heuristic schemes, i.e., a non-cooperative content distribution scheme [12] and a random group formation-based content distribution scheme [12].



Figure 2(a) shows the content distribution efficiency versus time. The proposed algorithm achieves more rapid content distribution during the beginning phases. In addition, it achieves better coverage when the content distribution process is finished, since route selection, peer discovery, and spectrum allocation are jointly optimized.

Figure 2(b) shows that the average network delay decreases monotonically with the number of RBs. Adding more RBs not only supports more content distribution groups but also introduces additional diversity gain, since each group has a greater opportunity to select a better RB. Hence, the performance gap demonstrates that the proposed algorithm better exploits the benefits of increasing the number of RBs.

5. Conclusion

In this paper, we investigated the content distribution problem in D2D-based cooperative vehicular networks and proposed a big data integrated coalition formation game approach to jointly optimize peer discovery, route selection, and spectrum allocation from a delay minimization perspective. We conclude that the proposed algorithm achieves the best content distribution efficiency among the compared schemes and effectively exploits the benefits of adding more RBs. It is also more robust to the adverse impacts caused by multi-hop transmission.

References

[1] Z. Zhou, M. Dong, K. Ota, and C. Xu, “Energy-efficient matching for resource allocation in D2D enabled cellular networks,” IEEE Trans. Veh. Technol., vol. 66, no. 6, pp. 5256–5268, Jun. 2017.

[2] X. Cheng, L. Yang, and X. Shen, “D2D for intelligent transportation systems: a feasibility study,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 1784–1793, Aug. 2015.

[3] N. Cheng, H. Zhou, L. Lei, N. Zhang, Y. Zhou, X. Shen, and F. Bai, “Performance analysis of vehicular Device-to-Device underlay communication,” IEEE Trans. Veh. Technol., vol. 66, no. 6, pp. 5409–5421, Jun. 2017.

[4] Y. Zhao and W. Song, “Truthful mechanisms for message dissemination via Device-to-Device communications,” IEEE Trans. Veh. Technol., vol. PP, no. 99, pp. 1–1, Jul. 2017.

[5] C. Xu, C. Gao, Z. Zhou, Z. Chang, and Y. Jia, “Social network-based content delivery in Device-to-Device underlay cellular networks using matching theory,” IEEE Access, vol. 5, pp. 924–937, Nov. 2016.

[6] N. Giatsoglou, K. Ntontin, E. Kartsakli, A. Antonopoulos, and C. Verikoukis, “D2D-aware device caching in mmWave-cellular networks,” IEEE J. Sel. Areas Commun., vol. PP, no. 99, pp. 1–1, Jun. 2017.

[7] Z. Zhou, C. Gao, C. Xu, Y. Zhang, S. Mumtaz, and J. Rodriguez, “Social big data based content dissemination in internet of vehicles,” IEEE Trans. Ind. Informat., vol. PP, no. 99, pp. 1–1, Jul. 2017.

[8] H. Li, B. Wang, Y. Song, and K. Ramamritham, “VeShare: a D2D infrastructure for real-time social-enabled vehicle networks,” IEEE Wirel. Commun., vol. 23, no. 4, pp. 96–102, Aug. 2016.

[9] C. Barrios and Y. Motai, “Improving estimation of vehicles trajectory using the latest global positioning system with Kalman filtering,” IEEE Trans. Instrum. Meas., vol. 60, no. 12, pp. 3747–3755, May 2011.

[10] B. Barshan and H. F. Durrant-Whyte, “Inertial navigation systems for mobile robots,” IEEE Trans. Robot. Autom., vol. 11, no. 3, pp. 328–342, Jun. 1995.

[11] R. Toledo, M. A. Zamora, B. Ubeda, and A. F. Gomez, “High integrity IMM-EKF based road vehicle navigation with low cost GPS/INS,” IEEE Trans. Intell. Transp. Syst., vol. 8, no. 3, pp. 491–511, Sept. 2007.

[12] R. Mochaourab, E. Björnson, and M. Bengtsson, “Adaptive pilot clustering in heterogeneous massive MIMO networks,” IEEE Trans. Wirel. Commun., vol. 15, no. 8, pp. 5555–5568, May 2016.

Yahui Wang is currently pursuing the M.S. degree with North China Electric Power University,

China. Her research interests include resource allocation, interference management, and energy

management in D2D communications.



Zhenyu Zhou (M'11-SM'17) received his M.E. and Ph.D. degrees from Waseda University, Tokyo, Japan, in 2008 and 2011, respectively. From April 2012 to March 2013, he was the chief researcher at the Department of Technology, KDDI, Tokyo, Japan. Since March 2013, he has been an Associate Professor at the School of Electrical and Electronic Engineering, North China Electric Power University, China. He has also been a visiting scholar with the Tsinghua-Hitachi Joint Lab on Environment-Harmonious ICT at University of Tsinghua, Beijing, since 2014. He served as an Associate Editor for IEEE Access, and a Guest Editor for IEEE Communications Magazine and Transactions on Emerging Telecommunications Technologies. He also served as workshop co-chair for IEEE ISADS 2015, and TPC member for IEEE Globecom, IEEE CCNC, IEEE ICC, IEEE APCC, IEEE VTC, IEEE Africon, etc. He is a voting member of P1932.1 Working Group. He was the recipient of the IEEE Vehicular Technology Society "Young Researcher Encouragement Award" in 2009, the "Beijing Outstanding Young Talent Award" in 2016, the IET Premium Award in 2017, and the IEEE ComSoc Green Communications and Computing Technical Committee 2017 Best Paper Award. His research interests include green communications, vehicular communications, and smart grid communications. He is a senior member of IEEE.

Houjian Yu is currently working towards the B.S. degree at North China Electric Power University,

China. His research interests include resource allocation, interference management, and energy

management in D2D communications.

Chen Xu (S’12-M’15) received the B.S. degree from Beijing University of Posts and

Telecommunications in 2010, and the Ph.D. degree from Peking University, Beijing, in 2015. She is

now a lecturer in School of Electrical and Electronic Engineering, North China Electric Power

University. Her research interests mainly include wireless resource allocation and management,

game theory, optimization theory, heterogeneous networks, and smart grid communication. She

served as a TPC member for IEEE Globecom, IEEE ICC, and IEEE ICCC. She received the best

paper award at the 2012 International Conference on Wireless Communications and Signal

Processing, and IEEE Leonard G. Abraham Prize in 2016.



MMTC OFFICERS (Term 2016 — 2018)

CHAIR: Shiwen Mao, Auburn University, USA

STEERING COMMITTEE CHAIR: Zhu Li, University of Missouri, USA

VICE CHAIRS:

Sanjeev Mehrotra (North America), Microsoft, USA

Fen Hou (Asia), University of Macau, China

Christian Timmerer (Europe), Alpen-Adria-Universität Klagenfurt, Austria

Honggang Wang (Letters&Member Communications), UMass Dartmouth, USA

SECRETARY: Wanqing Li, University of Wollongong, Australia

STANDARDS LIAISON: Liang Zhou, Nanjing Univ. of Posts & Telecommunications, China

MMTC Communication-Frontier BOARD MEMBERS (Term 2016—2018)

Guosen Yue Director Huawei R&D USA USA

Danda Rawat Co-Director Howard University USA

Hantao Liu Co-Director Cardiff University UK

Dalei Wu Co-Director University of Tennessee USA

Zheng Chang Editor University of Jyväskylä Finland

Lei Chen Editor Georgia Southern University USA

Tasos Dagiuklas Editor London South Bank University UK

Melike Erol-Kantarci Editor Clarkson University USA

Kejie Lu Editor University of Puerto Rico at Mayagüez Puerto Rico

Nathalie Mitton Editor Inria Lille-Nord Europe France

Shaoen Wu Editor Ball State University USA

Kan Zheng Editor Beijing University of Posts & Telecommunications China
