ACM Multimedia Systems Conference 2018 - Distributed Systems … · 2018-10-04 · ACM Multimedia Systems, June 12, 2018, Amsterdam, Netherlands. Overview ... AM Multimedia 2011.

Distribution Systems for 3D Teleimmersive and Video 360 Content: Similarities and Differences

Klara Nahrstedt

Department of Computer Science

University of Illinois at Urbana-Champaign

[email protected]

ACM Multimedia Systems, June 12, 2018, Amsterdam, Netherlands

Overview

• Motivation

• 3D Teleimmersive Video Representation

• Video 360 Representation

• Similarities and Differences in Content Representation

• Distribution of 3DTI Video

• Distribution of Video 360

• Similarities and Differences in Content Distribution

• Conclusion

3D Teleimmersive (3DTI) Systems

Source: http://tele-immersion.citris-uc.org; http://monet.cs.illinois.edu/projects/cyphy-multi-modal-teleimmersion-for-tele-physiotherapy/teleimmersion-gallery/

High-End Tele-Presence Environments


Traditional telephony and videoconferencing provide some of these elements, including ease of use and audio quality, yet fail on most others. Our Coliseum effort aims to advance the state of videoconferencing by applying recent advances in image-based modeling and computer vision to bring these other elements of face-to-face realism to remote collaboration.

Scene reconstruction, the task of building 3D descriptions using the information contained in multiple views of a scene, is an established challenge in computer vision [Longuet-Higgins 81]. It has seen remarkable progress over the last few years due to improved algorithms [Seitz 97, Narayanan 98, Pollefeys 99] and faster computers. The Coliseum system is based on the Image-Based Visual Hulls (IBVH) image-based rendering scene reconstruction technology of MIT [Matusik 00]. Our recent Coliseum efforts have shown that the IBVH method can operate at video rates from multiple camera streams hosted by a single personal computer [Baker 02].

Each Coliseum participant works on a standard PC with LCD monitor and a rig housing five video cameras spaced at roughly 30 degree increments, as shown in Figure 1. During a teleconferencing session, Coliseum builds 3D representations of each participant at video rates. The appropriate views of each participant are rendered for all others and placed in their virtual environments, one view of which is shown in Figure 2. The impression of a shared space results, with participants free to move about and express themselves in natural ways, such as through voice, gesture, and gaze.

Handling five video streams and preparing 3D reprojection views for each of numerous coparticipating workstations at video rates is a formidable task. Tight control must be exercised on computation, process organization, and inter-desktop communication. At project inception, we determined we needed an effective speedup of about one hundred times over the MIT IBVH processing on a single PC to reach utility. Our purpose in this paper is to detail some of the major issues in attaining this performance.

Figure 1. The Coliseum immersive videoconferencing system

Cisco Tele-presence, HP Halo, UNC, HP Coliseum

Multi-Camera Live Broadcast Systems

http://www.dailymail.co.uk/sciencetech/article-2336893/New-TV-cameras-bring-Matrix-style-bullet-time-trickery-live-sports-coverage.html

Multi-Camera Broadcast Systems

https://thegadgetflow.com/portfolio/slingstudio-multi-camera-broadcaster/

https://www.myslingstudio.com/ https://www.cinfo.es/our-products/synthetrick/multicam

https://www.spiideo.com/sports/

360-Degree Video

360 Degrees Cameras – CoolPile.com: http://coolpile.com/tag/360-degrees-cameras

3D Teleimmersive Video Representation

3D Teleimmersive Stereo Video and Free Viewpoint Video Capture

3DTI Viewing

Photo courtesy of Prof. Ruzena Bajcsy.

Singapore, 2014

3D Stereo Video Representation

Wu, Arefin, Kurillo, Agarwal, Nahrstedt, Bajcsy, “Color-plus-Depth Level-of-Detail in 3D Teleimmersive Video: A Psychophysical Approach”, ACM Multimedia 2011

Free-Viewpoint 3D Video Representation

Example of 3D representation captured by different cameras (Camera-1, Camera-2, Camera-3, …, Camera-8), each with its own camera direction

source: http://zing.ncsl.nist.gov/~gseidman/vrml/

View Model: angle $\theta$ between camera orientation $\vec{O}_i$ and user view orientation $\vec{O}_u$

3DTI Data Model

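• 3D frame for camera $i$ at time $t$: $f_{i,t}$

• Each pixel in the frame carries color+depth data and can be independently rendered

• Stream for camera $i$: $S_i = \{ f_{i,t_1}, f_{i,t_2}, \dots \}$

• Macro-frame: $F_t = \{ f_{1,t}, f_{2,t}, \dots, f_{n,t} \}$

Figure: streams $S_1, \dots, S_n$ (one per camera $1, \dots, n$) side by side; macro-frames $F_{t_1} = \{ f_{1,t_1}, \dots, f_{n,t_1} \}$ and $F_{t_2} = \{ f_{1,t_2}, \dots, f_{n,t_2} \}$ cut across all streams at each time instant.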
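The data model above can be sketched in Python as follows. This is a minimal illustration, not the 3DTI system's code: the names `Frame3D`, `Stream` and `macro_frame` are hypothetical, and pixels are kept as plain lists for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Frame3D:
    """f_{i,t}: one 3D frame of camera i at time t (color-plus-depth pixels)."""
    camera: int
    t: int
    rgb: List[Tuple[int, int, int]]  # per-pixel color
    depth: List[float]               # per-pixel depth, same length as rgb

@dataclass
class Stream:
    """S_i = {f_{i,t1}, f_{i,t2}, ...}: frames of camera i in time order."""
    camera: int
    frames: List[Frame3D] = field(default_factory=list)

def macro_frame(streams: List[Stream], t: int) -> Dict[int, Frame3D]:
    """F_t = {f_{1,t}, ..., f_{n,t}}: the frames of all cameras at time t,
    keyed by camera index (cameras with no frame at t are absent)."""
    return {s.camera: f for s in streams for f in s.frames if f.t == t}
```

Because each pixel is independently renderable, a renderer can start on a partial macro-frame; keying by camera makes missing streams easy to detect.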

360-Degree Video Representation

Figure: 360-degree video and the user’s viewport

Generation of 360-Degree Video
• Capturing of multiple 2D videos together with their metadata
• Stitching videos together and further editing them in spherical video
• Encoding spherical video considering projection, interactivity, storage and delivery formats (this will impact decoding and rendering processes)

Video 360 Viewing and Navigation

https://en.wikipedia.org/wiki/Head-mounted_display

Controller

Examples of HMDs (Head-Mounted Displays): Oculus Rift, Samsung Gear VR, HTC Vive

360-Degree Video Data Model
• Field-of-View or Viewport: display region on the Head-Mounted Display
  • Fraction of the omnidirectional view of the scene
  • Viewport defined by a device-specific viewing angle (typically 120 degrees) that horizontally delimits the scene around the head-direction center, called the viewport center
• Viewport Resolution: 4K (3840x2160) pixels
  • Resolution of the full 360-degree video: at least 12K (11520x6480)
• Video Framerate: on the order of the HMD refresh rate, 100 Hz, i.e., 100 fps
• Motion-to-Photon Latency requirement
  • Less than 20 ms for VR, much smaller than an Internet request-reply delay
  • Hence viewport prediction is needed
• Bitrate: Video 360 vs. HEVC (8K video at 60 fps is approx. 100 Mbps)
• Tiling: spatial division of the spherical video into independent tiles
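The resolution figures above follow from simple proportionality, sketched below under two assumptions: pixels are spread uniformly over yaw and pitch (as in an equirectangular frame), and the vertical viewport angle is 60 degrees, a hypothetical value chosen so that a 2160-pixel-high viewport maps to 6480 rows. Function names are illustrative.

```python
def full_sphere_resolution(vp_w, vp_h, h_fov_deg, v_fov_deg):
    """Pixels needed for the whole 360 x 180 degree sphere so that a viewport of
    h_fov x v_fov degrees still gets vp_w x vp_h pixels (uniform angular density)."""
    return round(vp_w * 360 / h_fov_deg), round(vp_h * 180 / v_fov_deg)

def frame_period_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps

w, h = full_sphere_resolution(3840, 2160, 120, 60)
print(w, h)                   # 11520 6480
print(frame_period_ms(100))   # 10.0
```

At 100 fps one frame lasts 10 ms, so the 20 ms motion-to-photon budget spans only about two frame periods, far less than a network round trip.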

Tiles and Spherical Maps

Issues with Spherical Mapping to Tiles
• Viewport distortion
• Spatial quality variance
• Considerations of sphere-to-plane mapping and viewing probability of tiles are IMPORTANT
• Overall spherical distortion of a segment is the sum of distortion over all pixels the segment covers
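A sketch of the per-segment spherical distortion, assuming an equirectangular (ERP) frame and weighting each pixel's squared error by cos(latitude), i.e., by the area the pixel covers on the sphere. This weighting is in the spirit of WS-PSNR and is swapped in for illustration; the cited paper's exact weights may differ, and the function names are hypothetical.

```python
import math

def erp_row_weight(row, height):
    """Relative spherical area of an ERP pixel in `row`: cos(latitude of row center)."""
    lat = math.pi * (0.5 - (row + 0.5) / height)
    return math.cos(lat)

def segment_spherical_distortion(orig, recon, top_row, height):
    """Sum of per-pixel squared error over all pixels a tile/segment covers,
    weighted by spherical area.
    orig, recon: 2D lists (rows of luma values) for the tile.
    top_row: index of the tile's first row within the full ERP frame of `height` rows."""
    total = 0.0
    for r, (row_o, row_r) in enumerate(zip(orig, recon)):
        w = erp_row_weight(top_row + r, height)
        total += w * sum((a - b) ** 2 for a, b in zip(row_o, row_r))
    return total
```

The same pixel error counts less near the poles, where ERP stretches a small spherical area across a full-width row.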

Xie et al. “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM MM 2017

Video 360 Spherical-to-Plane Projections

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017; Nasrabadi et al. “Adaptive 360-Degree Video Streaming using Scalable Video Coding”, ACM Multimedia 2017

Video 360 Capture as Spherical Video

• Equirectangular Projection: stretches the poles and reduces coding efficiency
• Pyramid Projection: sees degradation on the sides
• Cubemap: maps a 90-degree FOV to each side of a cube and hence provides less degradation
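To make the projection trade-offs concrete, here is a minimal sketch of two mappings. Coordinate conventions are assumptions: yaw in [-180, 180) and pitch in [-90, 90] for the equirectangular case, and a direction vector (x, y, z) whose dominant axis picks the cube face. The face names are hypothetical; real formats such as OMAF define their own face layout.

```python
def erp_pixel(yaw_deg, pitch_deg, width, height):
    """Equirectangular: yaw maps linearly to x, pitch to y. Every row has the
    same width, so rows near the poles are stretched (wasted bits)."""
    x = (yaw_deg + 180.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return min(int(x), width - 1), min(int(y), height - 1)

def cube_face(x, y, z):
    """Cubemap: the dominant axis of the view direction selects one of six
    faces, each covering a 90-degree field of view."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"
```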

Encoding and Delivery Formats
• Codecs
  • AVC/H.264, HEVC/H.265
  • VP8, VP9
• Delivery Formats
  • DASH/HLS (HTTP adaptive streaming: Dynamic Adaptive Streaming over HTTP, HTTP Live Streaming)
  • MPEG-DASH Standard considers tiling
    • MPD (Media Presentation Description): modified for Video 360
    • SRD (Spatial Relationship Description) integrated into MPD
  • HEVC considers video tiles
• MPEG Immersive media standard ISO/IEC 23090
  • Part 1: Use cases
  • Part 2: OMAF (Omnidirectional Media Application Format)
    • Description of equirectangular projection format
    • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
    • Storage format (ISO base media file format/MP4)
    • Codecs: HEVC, MPEG-H 3D Audio
  • Part 3: Immersive video
  • Part 4: Immersive Audio
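For the SRD item above, a tile grid can be described in the MPD with one `SupplementalProperty` per tile. The sketch below assumes the `urn:mpeg:dash:srd:2014` scheme, the value order `source_id,object_x,object_y,object_width,object_height,total_width,total_height`, and a frame that divides evenly into tiles; the helper names are hypothetical. Check the MPEG-DASH SRD specification before relying on the exact syntax.

```python
def srd_values(cols, rows, total_w, total_h, source_id=0):
    """SRD value strings for a cols x rows tile grid in raster-scan order."""
    tw, th = total_w // cols, total_h // rows   # tile size in pixels
    out = []
    for r in range(rows):
        for c in range(cols):
            out.append(f"{source_id},{c * tw},{r * th},{tw},{th},{total_w},{total_h}")
    return out

def srd_property(value):
    """MPD descriptor carrying one tile's position in the full frame."""
    return f'<SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014" value="{value}"/>'

for v in srd_values(2, 2, 3840, 1920):
    print(srd_property(v))
```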

Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017

Similarities and Differences of Representations

Similarity Parameter | 3DTI Video | 360-Degree Video
Multi-camera views | Yes (view) | Yes (viewport)
Joint coordinate system | Yes | Yes
Bitrate consideration | Yes | Yes
View change | Yes | Yes

Difference Parameter | 3DTI Video | 360-Degree Video
Video format | Color-plus-depth | Color
Smallest item to adapt | 3DTI frame | Tile
Frame representation | Frame manipulation at pixel level (RGB, depth, polygons) | Frame manipulation at tile and region-of-interest level
Coding | Simple (zlib) | Complex (HEVC)
Resolution | 640x480 or 1080p | 4K to 16K
Resolution for diverse devices | No | Yes
Format for diverse navigation | No | Yes

Distribution Systems of 3DTI Video

Multi-Camera 3DTI Transmission System

Figure: two sites (Site-1, Site-2), each with cameras (C), a microphone (A), a gateway (G), a renderer (R), an AV display and a switch, interconnected over the Internet. Legend: C = camera, A = microphone, G = gateway, R = renderer.

Approach: Multi-stream Hierarchical Adaptation

Multi-stream Adaptation(Stream Selection)

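• Camera orientation: $\vec{O}_i$ for camera $i$

• User view orientation: $\vec{O}_u$, with $\cos\theta_i = \frac{\vec{O}_i \cdot \vec{O}_u}{\|\vec{O}_i\| \, \|\vec{O}_u\|}$, where $\theta_i$ is the angle between camera $i$ and the user view

• Selection (SI) – View-Centric Stream Selection: $SI = \{\, i : \cos\theta_i \ge T \,\}$, where $T$ is a user-specified parameter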

Zhenyu Yang, Klara Nahrstedt, Bin Yu, Ruzena Bajcsy, “A Multi-stream Adaptation Framework for Bandwidth Management in 3D Teleimmersion”, ACM NOSSDAV 2006, May 2006, Newport, Rhode Island
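The view-centric selection rule can be sketched as below, assuming 3D orientation vectors and the threshold form $\cos\theta_i \ge T$. Sorting the selected streams by decreasing cosine is a hypothetical stand-in for stream differentiation, not necessarily the NOSSDAV 2006 implementation; names are illustrative.

```python
import math

def cos_angle(o_i, o_u):
    """cos(theta_i) between camera orientation O_i and user view orientation O_u."""
    dot = sum(a * b for a, b in zip(o_i, o_u))
    norm = math.sqrt(sum(a * a for a in o_i)) * math.sqrt(sum(b * b for b in o_u))
    return dot / norm

def select_streams(cameras, o_u, T):
    """Keep camera i if cos(theta_i) >= T; most view-relevant streams first.
    cameras: {camera_id: orientation vector}."""
    scored = [(cos_angle(o, o_u), i) for i, o in cameras.items()]
    return [i for c, i in sorted(scored, reverse=True) if c >= T]

cams = {1: (1, 0, 0), 2: (0, 1, 0), 3: (-1, 0, 0), 4: (0.7071, 0.7071, 0)}
print(select_streams(cams, (1, 0, 0), 0.5))  # [1, 4]
```

Raising $T$ drops oblique cameras first, trading view completeness for bandwidth.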

View-Centric Stream Differentiation

Figure: 3D capturing → 3D camera transmission → 3D rendering for cameras 2, 4, 6 and 8; relative to the user view, streams contributing more to the user view are distinguished from less important streams.

Timing Performance Validation

Macro-Frame Delay at Sender side

Macro-frame Completion Interval at Receiver Side (End-to-End Delay UIUC-UCB)

Immersive View-Centric Multi-View Multi-Party 3DTI

Z. Yang et al., “ViewCast: View Dissemination and Management for Multi-Party 3D Tele-immersive Environments”, ACM Multimedia 2007

Multi-Party Multi-View Telepresence

Figure: Example of 3D representation captured by 4 cameras (labeled Camera-1, Camera-2, Camera-3, Camera-8); cameras c1 to c8 surround the scene, and a user view is indicated.

Multi-stream contents

Multi-view environment

High resource demand

Multi-stream dependency

Real-time interactivity

Telepresence Session Control

Figure: multi-site telepresence session with cameras (C), microphones (A), gateways (G) and renderers (R) at each site. Legend: C = camera, A = microphone, G = gateway, R = renderer.

• Decoupled control and data plane
  • Hierarchical control: global session controller; local session controllers at G
• Coordinated global control plane
  • Monitor data plane
  • Configure data plane
• Data plane at TI participants
  • Session routing table (SRT)
  • Stream forwarding

Figure: the Global Session Controller configures the session routing table (SRT) at Site-X, Site-Y and Site-Z. SRT columns: Matching Field (ID) | Forwarding Action | Bitrate
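A minimal sketch of a session routing table in the decoupled design above: the control plane installs entries, and the data plane forwards by stream ID. Entry fields mirror the SRT columns (matching field, forwarding action, bitrate); the class names, stream IDs such as `X-cam3`, and kbps units are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SRTEntry:
    stream_id: str          # matching field (ID)
    forward_to: List[str]   # forwarding action: destination sites
    bitrate_kbps: int       # bitrate allotted to this stream

class SessionRoutingTable:
    def __init__(self) -> None:
        self.entries: Dict[str, SRTEntry] = {}

    def install(self, entry: SRTEntry) -> None:
        """Control plane: a session controller installs or replaces an entry."""
        self.entries[entry.stream_id] = entry

    def forward(self, stream_id: str) -> List[Tuple[str, int]]:
        """Data plane: match on stream ID, return (site, bitrate) pairs.
        Unmatched streams are not forwarded."""
        e = self.entries.get(stream_id)
        return [(site, e.bitrate_kbps) for site in e.forward_to] if e else []
```

Keeping forwarding a pure table lookup lets the global controller reconfigure a session (for example after a view change) without touching the media path.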

ViewCast: Middleware (Overlay) Framework

A three-layer multi-party/multi-stream management framework:
• Tele-immersive Application
• Service Middleware (ViewCast): view-aware stream differentiation/selection over an overlay network
• Network

Figure: 3D capturing → 3D camera transmission → 3D rendering (cameras 2, 4, 6, 8); users U2, U3 and U4 send their view requests (U2.w, U3.w, U4.w) to the session controller, which serves streams from sites V1 to V4, distinguishing streams contributing more to each user view from less important streams; the request U3.w is marked as victim.

Why is view change a problem?

Plot: streams/view (GC = 100%, $I_i$ ($O_i$) = 24): on average 3.2 streams/view, better than MC-3 performance but with 22% less rejection ratio

Immersive and Non-Immersive Multi-Party Multi-View (Live Broadcast) Systems

Ahsan Arefin, Zixia Huang, Klara Nahrstedt, Pooja Agarwal, “4D TeleCast: Towards Large Scale Multi-site and Multi-view Dissemination of 3DTI Content”, IEEE ICDCS 2012, Macau, China.

TI Components & Participants

• Immersive Participants (Producers)
  • Tight Interactivity
  • Limited Scale
• Non-immersive Participants (NI Viewers)
  • Large Scale

Figure: immersive producer sites SITE-1 (Illinois) and SITE-2 (Berkeley), each with cameras (C), microphones (A), sensors (S), a gateway (G), a renderer (R), an AV display and a switch, connected over the Internet to non-immersive viewer sites SITE-3 to SITE-10, each with a renderer (R) and a gateway (G). Legend: C = camera, A = microphone, G = gateway, R = renderer, S = sensors.

View/Stream Concepts among Immersive Participants

Figure: 3D capturing → 3D camera transmission → 3D rendering (cameras 2, 4, 6, 8) at each Content Producer (Immersive Participant); relative to the user view, streams contributing more to the user view are distinguished from less important streams.

View/Stream Concept among Non-Immersive Participants

Figure: content producers at Site-A and Site-B1 (cameras 1, 3, 5, 7) send 3D streams that together form the 4D content; non-immersive viewers choose a display view over this content. Each view is a priority-ordered list of stream indices:
• v1 = [6 > 5 > 4 > 7 > 6 > 5]
• v2 = [4 > 3 > 5 > 6 > 7 > 5]
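If a view is a priority-ordered list of streams, a bandwidth-limited viewer can keep the longest prefix that fits its budget, dropping the least important streams first. The sketch assumes per-stream bitrates are known; the stream IDs (site letter plus camera index) and bitrates are hypothetical, and this is not claimed to be 4D TeleCast's exact algorithm.

```python
def streams_within_budget(view, bitrate, budget):
    """view: stream IDs in decreasing priority.
    Accept streams in priority order until the next one would exceed the budget,
    so a lower-priority stream is never kept while a higher one is dropped."""
    chosen, used = [], 0
    for s in view:
        if used + bitrate[s] > budget:
            break
        chosen.append(s)
        used += bitrate[s]
    return chosen

v1 = ["B6", "B5", "B4", "A7", "A6", "A5"]
rates = {s: 10 for s in v1}
print(streams_within_budget(v1, rates, 35))  # ['B6', 'B5', 'B4']
```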

Multi-View Video among Non-Immersive Participants

Approach: 4D TeleCast

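Figure: two-tier architecture. Producer Tier: Site-A, Site-B and Site-C, each with cameras (C), a microphone (A), sensors (S), a renderer (R) and a communication gateway (G), each managed by a local session controller (LSC) under a global session controller (GSC) across the Internet. Viewer Tier: viewers receiving the content through a distribution infrastructure (D), either a CDN or CDN-P2P.

GSC – Global Session Controller, LSC – Local Session Controller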

Infrastructure Management

[CDN Assisted Peer]

Wang’08, Liu’10, Chang’09

4D TeleCast Distribution

Figure: a CDN with a core server and edge servers; viewers request view V1 = {S1, S2, S3}, and streams s1, s2 and s3 reach the requesting viewers along different paths.

Multi-stream Dependency (Problem Description)

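Figure: viewers U1, U2 and U3 receiving streams $S_1^A$ and $S_2^A$ from the CDN over time.

• Requested view: $v_1 = \{ S_1^A, S_2^A \}$
• Maximum allowed delay bound = $d_{buff}$: the streams of a view are sent to display together only if they arrive within $d_{buff}$ of each other
• Violation of the delay bound $d_{buff}$ creates victim streams and wastes bandwidth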
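The delay-bound check above can be sketched as follows, assuming arrival times of one macro-frame's streams are measured at the viewer in milliseconds and that the bound is counted from the earliest-arriving stream of the view. Names are hypothetical.

```python
def victim_streams(arrivals, d_buff):
    """arrivals: {stream_id: arrival time in ms} for one frame of a view.
    Streams arriving more than d_buff after the first one miss the display
    deadline: they are victims, and the bandwidth spent on them is wasted."""
    first = min(arrivals.values())
    return sorted(s for s, t in arrivals.items() if t - first > d_buff)

print(victim_streams({"S1A": 100, "S2A": 180}, 50))   # ['S2A']
```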

Understanding E2E Delay

Figure: producer streams $s_1^A, s_1^B, s_1^C, \dots$ from Site-A, Site-B and Site-C reach viewers u1, u2 and u3, which are arranged in delay layers (Layer-0, Layer-1, Layer-2) by their end-to-end delay from the producer.

• τ = layer size
• Δ = distance from source
• Use Delay Layer Hierarchy

Plot: fraction of viewers (y-axis, 0 to 1) vs. maximum layer of accepted streams (x-axis, 0 to 18)
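One way to read the delay layer hierarchy is sketched below, assuming Δ is measured as end-to-end delay from the source in milliseconds and that layer k covers delays in [kτ, (k+1)τ). Both assumptions and all names are illustrative.

```python
import math

def delay_layer(delta_ms, tau_ms):
    """Layer index of a viewer at end-to-end delay delta_ms, layer size tau_ms."""
    return math.floor(delta_ms / tau_ms)

def accepted(viewer_delays, tau_ms, max_layer):
    """Viewers whose layer does not exceed max_layer (insertion order kept)."""
    return [v for v, d in viewer_delays.items() if delay_layer(d, tau_ms) <= max_layer]

print(accepted({"u1": 40, "u2": 150, "u3": 320}, 100, 1))  # ['u1', 'u2']
```

Grouping viewers by layer means streams of the same view can be matched within a layer, which bounds their relative delay by roughly τ.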

Distribution Systems for Video 360

Pipeline of 360-Degree Video


Challenges of 360-degree Video Distribution

• Real-Time Stitching
• Simulator Sickness in Interactivity Scenarios
  • The system must react to HMD head movements as fast as the HMD refresh rate (120 Hz)
• Viewport extraction in real time
  • Challenge: it is difficult to predict user orientation more than 3 seconds ahead
  • Challenge: if short-term prediction is needed, how do we avoid rebuffering/stalls with small playout buffers?
• Avoidance of bandwidth waste (downloading viewports that are never displayed)
• Tile prefetching errors

MPEG-DASH Video Distribution System for Single 2D Video Stream

dash.js. https://github.com/Dash-Industry-Forum/dash.js/wiki.
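For the single-stream case, a DASH client picks one representation per segment. A minimal throughput-based rule is sketched below: the highest bitrate below a safety fraction of the estimated throughput, estimated as the harmonic mean of recent samples. This is a simplified textbook rule for illustration, not dash.js's actual ABR logic; the 0.9 safety factor and function names are assumptions.

```python
def harmonic_mean(samples):
    """Throughput estimate that damps short spikes (kbps)."""
    return len(samples) / sum(1.0 / s for s in samples)

def pick_representation(bitrates_kbps, throughput_kbps, safety=0.9):
    """Highest bitrate not exceeding safety * throughput; else the lowest one."""
    usable = safety * throughput_kbps
    fitting = [b for b in sorted(bitrates_kbps) if b <= usable]
    return fitting[-1] if fitting else min(bitrates_kbps)

est = harmonic_mean([2000, 4000])                       # about 2667 kbps
print(pick_representation([500, 1500, 3000, 6000], est))  # 1500
```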

MPEG-DASH Video 360 Video Streaming using Tiles


360-Video Streaming Systems

• Tiling for Adaptive Streaming
  • Video divided into tiles
  • Depending on the mapping of the spherical video projection, different tiles will be streamed
  • Tiles currently viewed by users are streamed at high quality and the rest at low resolution
• Personalized Viewport-Only Streaming: asymmetric panorama viewing
  • Also called asymmetric panorama viewport-adaptive streaming
  • Methods: Truncated Square Pyramid (TSP) projection, Cubemap
  • Video divided into segments
  • When the client moves its head, the viewport center changes and a new viewport must be displayed
  • Decrease of bitrate without decrease of viewport quality
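The tile-quality rule above (viewed tiles at high quality, the rest low) can be sketched for a uniform grid on an equirectangular frame, e.g., 8x8 tiles. Assumptions: the viewport is approximated as a yaw/pitch rectangle, ignoring projection distortion, with a hypothetical 120 x 60 degree field of view; names are illustrative.

```python
def _overlap(a0, a1, b0, b1):
    """Open-interval overlap test."""
    return a0 < b1 and b0 < a1

def yaw_overlap(t0, t1, v0, v1):
    """Overlap on the yaw circle: also try the viewport shifted by +/-360."""
    return any(_overlap(t0, t1, v0 + k, v1 + k) for k in (-360, 0, 360))

def tile_qualities(cols, rows, yaw, pitch, h_fov=120, v_fov=60):
    """Map (col, row) -> 'high' if the tile intersects the viewport, else 'low'.
    Grid covers yaw [-180, 180) left to right and pitch 90 to -90 top to bottom."""
    tw, th = 360 / cols, 180 / rows
    v0, v1 = yaw - h_fov / 2, yaw + h_fov / 2
    p0, p1 = pitch - v_fov / 2, pitch + v_fov / 2
    q = {}
    for r in range(rows):
        top = 90 - r * th
        for c in range(cols):
            left = -180 + c * tw
            seen = yaw_overlap(left, left + tw, v0, v1) and _overlap(top - th, top, p0, p1)
            q[(c, r)] = "high" if seen else "low"
    return q

q = tile_qualities(8, 8, yaw=0, pitch=0)
print(sum(v == "high" for v in q.values()))  # 16
```

The wrap-around check matters when the user looks toward yaw ±180: tiles at both edges of the ERP frame are then in view.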

ISO/IEC JTC1/SC29/WG11/M. 2016. VR/360 Video Truncated Square Pyramid Geometry for OMAF.

Tile-based HTTP Adaptive Streaming and Head Movement Prediction

Xie, Xu, Ban, Zhang, Guo, “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM Multimedia 2017

Tile-based HTTP Adaptive Streaming for 360 Video

Data Model at 360ProbDASH Server

ERP (equirectangular projection): raw panoramic video
• ERP is divided into video chunks
• Each chunk is cropped into N tiles, indexed in raster-scan order
• Each tile is encoded into segments with M bit-rate levels
• MxN optional segments are stored at the server, ready for pre-fetching and streaming
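The server-side data model can be enumerated as below. This is a sketch assuming chunks, tiles and bit-rate levels are simply indexed from 0; the names and bitrates are hypothetical.

```python
def raster_index(col, row, cols):
    """Raster-scan order: left to right within a row, rows top to bottom."""
    return row * cols + col

def segment_catalog(num_chunks, n_tiles, bitrates_kbps):
    """All optional segments: for every chunk, N tiles x M bit-rate levels.
    Each entry is (chunk, tile_index, level, bitrate_kbps)."""
    return [(chunk, tile, level, rate)
            for chunk in range(num_chunks)
            for tile in range(n_tiles)
            for level, rate in enumerate(bitrates_kbps)]

cat = segment_catalog(2, 6, [300, 800, 1500])
print(len(cat))  # 36 = 2 chunks x 6 tiles x 3 levels
```

The client's job per chunk is then to choose one level for each of the N tiles, which is why the number of candidate combinations grows quickly with N and M.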

360ProbDASH Approach

• Pre-fetch segments by predicting the viewport
  • Use a probabilistic model for prediction
• Leverage linear regression prediction of orientation
• Distribution of prediction errors
  • Long-term predictions are hard
  • Data collected from 5 users for the short-term prediction error (3 seconds)

Yourstory.com

Plots: yaw, pitch, and roll prediction (Delta = 3 sec)
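A minimal sketch of linear-regression orientation prediction: fit a least-squares line to recent yaw samples and extrapolate by the prediction horizon (3 s here, matching Delta above). Unwrapping across 0/360 degrees is an added assumption so that a head turn through 0 degrees is not mistaken for a 359-degree jump. Names and sample data are hypothetical, and the probabilistic model that 360ProbDASH builds on top of such predictions is not shown.

```python
def fit_line(ts, ys):
    """Ordinary least squares fit y = a + b*t; returns (a, b)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    var = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / var if var else 0.0
    return my - b * mt, b

def predict_yaw(ts, yaw_deg, horizon_s):
    """Predict yaw horizon_s seconds after the last sample, in [0, 360)."""
    unwrapped = [yaw_deg[0]]
    for a in yaw_deg[1:]:
        step = (a - unwrapped[-1] + 180) % 360 - 180   # shortest signed step
        unwrapped.append(unwrapped[-1] + step)
    a, b = fit_line(ts, unwrapped)
    return (a + b * (ts[-1] + horizon_s)) % 360

print(predict_yaw([0, 0.5, 1.0, 1.5], [350, 355, 0, 5], 3.0))  # 35.0
```

The longer the horizon, the more any error in the fitted slope is amplified, which is consistent with the observation that long-term predictions are hard.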

Tile-based Adaptive Video Streaming

• Ochi et al. use tile-based streaming where the spherical video is mapped to equirectangular video and the video is cut into 8x8 tiles

• Hosseini and Swaminathan use hexa-face sphere-based tiling of 360-degree video to take projection distortion into account

• Tiles are described with the MPEG-DASH Spatial Relationship Description

• Qian et al. use head movement prediction to deliver tiles

• Weaknesses of tiling systems
  • Time- and energy-consuming reconstruction
  • Coding inefficiency due to independent tiling
  • Server file management is difficult due to the large number of quality levels and large MPD files
  • Client selection process is complex
  • Mixed bit-rate tiles can result in visible borders and quality inconsistency in combined-tile rendering
  • Multiple decoders

D. Ochi, Y. Kunita, A. Kameda, A. Kojima, and S. Iwaki. Live streaming system for omnidirectional video. In Proc. of IEEE Virtual Reality (VR), 2015.
M. Hosseini and V. Swaminathan. Adaptive 360 VR video streaming: Divide and conquer! In IEEE International Symposium on Multimedia (ISM), 2016.
F. Qian, B. Han, L. Ji, and V. Gopalakrishnan. Optimizing 360 video delivery over cellular networks. In ACM SIGCOMM AllThingsCellular, 2016.

QER Viewport-Adaptive Streaming

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

Viewport Adaptive Streaming System

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

Approach: QER (Quality-Emphasized Region)
• Not only bit-rate adaptation but also QER adaptation at the server, where different regions have different quality
• Each QER is represented by a Quality Emphasis Center (QEC)
• The full video is delivered in a given projection (equirectangular, cube, …), but in different versions, one per QEC
• The client device selects the right representation and extracts the viewport

• Viewport-adaptive streaming similar to DASH
  • Client runs an adaptation algorithm to select the video representation: it selects the QER and the QEC among the available QERs
  • QEC selection is based on the smallest orthodromic distance

• Orthodromic distance –shortest distance between two points on surface of sphere, measured along surface of sphere

• Video segment length
  • Temporal chunk sent from the server: 1 to 10 seconds
  • Tradeoff between short and long segments

• Expanded MPD
  • MPD file expanded with new information: the coordinates of each representation's QEC in degrees
  • Two angles: (0, 360) degrees and (-90, 90) degrees

• All representations assume the same reference coordinate system
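QEC selection by smallest orthodromic distance can be sketched with the spherical law of cosines, using the MPD angle ranges above (yaw in [0, 360), pitch in [-90, 90]) and the shared reference coordinate system. QEC names and coordinates are hypothetical; for very small distances the haversine form is numerically more robust.

```python
import math

def orthodromic_deg(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle in degrees between two points on the sphere;
    multiply by the sphere radius (in radians) for an arc length."""
    l1, l2 = math.radians(pitch1), math.radians(pitch2)
    dl = math.radians(yaw2 - yaw1)
    c = math.sin(l1) * math.sin(l2) + math.cos(l1) * math.cos(l2) * math.cos(dl)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error

def select_qec(viewport_center, qecs):
    """Pick the QEC (id -> (yaw, pitch)) closest to the viewport center."""
    vy, vp = viewport_center
    return min(qecs, key=lambda k: orthodromic_deg(vy, vp, *qecs[k]))

qecs = {"front": (0, 0), "right": (90, 0), "back": (180, 0), "left": (270, 0)}
print(select_qec((350, 0), qecs))  # front
```

Because distance is measured on the sphere, a viewport at yaw 350 correctly maps to the QEC at yaw 0 rather than to one at yaw 270.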

QER-Based Viewport Adaptive Streaming

Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

Examples of Experimental Results
• Metrics for the extracted viewport: (1) MS-SSIM (Multi-Scale Structural Similarity) and (2) PSNR
• Original equirectangular video of full quality: 4K video with 1080p resolution
• QEC: the face at the QEC is encoded with the best quality, the other faces at 25% of full quality
• Distance: for d = 0 (QEC and viewport center match), quality is 0.98; as d increases, quality decreases
• QEC numbers: with more QECs, quality increases; shorter segments are better

Similarities and Differences of Distribution Systems

Similarity Parameter | 3DTI Video | 360-Degree Video
Dealing with bandwidth | Adapt views | Adapt viewports
View change | Yes | Yes
Navigation via mouse | Yes | Yes
Client adaptation | Yes | Yes
Streaming protocols | TCP-based | TCP-based

Difference Parameter | 3DTI Video | 360-Degree Video
Dealing with bandwidth | Adapt views/streams | Adapt viewports/tiles
Encoding standards | zlib; some efforts in MPEG/OMAF on 3DTI compression | MPEG-DASH considers omnidirectional video tiles
Distribution style | Real-time view-based telepresence style or live view-based broadcast | On-demand DASH style
Clients | Homogeneous | Heterogeneous
Viewing | Flat 2D or 3D displays | Head-Mounted Displays
Streaming protocols | TCP-based | HTTP-based, standard MPEG-DASH
Navigation | Via mouse only | Via mouse, head movement, hand movement

Conclusion and Summary
• 360-degree video is becoming possible for
  • 3D teleimmersive video or
  • Omnidirectional video
• First solutions are coming up in terms of capture, encoding and viewing
• But distribution represents a challenge
  • Real-time live streaming or
  • Near-real-time distribution of 360-degree video
• Much of the presented material will be published in a survey paper
  • “Scalable 360-Degree Video Streaming: Challenges, Solutions and Opportunities”
  • Authors: Michael Zink, Ramesh Sitaraman, Klara Nahrstedt
  • Journal Venue: Proceedings of IEEE Special Issue
  • Editors: Boris Koldehofe, Ralf Steinmetz, …
  • Coming up in early 2019
