




  • Distribution Systems for 3D Teleimmersive and Video 360 Content: Similarities and Differences

    Klara Nahrstedt

    Department of Computer Science

    University of Illinois at Urbana-Champaign


    ACM Multimedia Systems, June 12, 2018, Amsterdam, Netherlands

  • Overview

    • Motivation

    • 3D Teleimmersive Video Representation

    • Video 360 Representation

    • Similarities and Differences in Content Representation

    • Distribution of 3DTI Video

    • Distribution of Video 360

    • Similarities and Differences in Content Distribution

    • Conclusion

  • 3D Teleimmersive (3DTI) Systems

    Source: modal-teleimmersion-for-tele-physiotherapy/teleimmersion-gallery/

  • High-End Tele-Presence Environments


    Traditional telephony and videoconferencing provide some of these elements, including ease of use and audio quality, yet fail on most others. Our Coliseum effort aims to advance the state of videoconferencing by applying recent advances in image-based modeling and computer vision to bring these other elements of face-to-face realism to remote collaboration.

    Scene reconstruction, the task of building 3D descriptions using the information contained in multiple views of a scene, is an established challenge in computer vision [Longuet-Higgins 81]. It has seen remarkable progress over the last few years due to improved algorithms [Seitz 97, Narayanan 98, Pollefeys 99] and faster computers. The Coliseum system is based on the Image-Based Visual Hulls (IBVH) image-based rendering scene reconstruction technology of MIT [Matusik 00]. Our recent Coliseum efforts have shown that the IBVH method can operate at video rates from multiple camera streams hosted by a single personal computer [Baker 02].

    Each Coliseum participant works on a standard PC with LCD monitor and a rig housing five video cameras spaced at roughly 30 degree increments, as shown in Figure 1. During a teleconferencing session, Coliseum builds 3D representations of each participant at video rates. The appropriate views of each participant are rendered for all others and placed in their virtual environments, one view of which is shown in Figure 2. The impression of a shared space results, with participants free to move about and express themselves in natural ways, such as through voice, gesture, and gaze.

    Handling five video streams and preparing 3D reprojection views for each of numerous coparticipating workstations at video rates is a formidable task. Tight control must be exercised on computation, process organization, and inter-desktop communication. At project inception, we determined we needed an effective speedup of about one hundred times over the MIT IBVH processing on a single PC to reach utility. Our purpose in this paper is to detail some of the major issues in attaining this performance.

    Figure 1. The Coliseum immersive videoconferencing system

    Figure labels (partially recovered): Cisco TelePresence, HP Halo, UNC, HP Coliseum


  • Multi-Camera Live Broadcast Systems


  • Multi-Camera Broadcast Systems


  • 360-Degree Video

    360-Degree Cameras

  • 3D Teleimmersive Video Representation

  • 3D Teleimmersive Stereo Video and Free Viewpoint Video Capture

  • 3DTI Viewing

    Photo courtesy of Prof. Ruzena Bajcsy.

    Singapore, 2014

  • 3D Stereo Video Representation

    Wu, Ahsan, Kurillo, Agarwal, Nahrstedt, Bajcsy, “Color-plus-Depth Level-of-Detail in 3D Teleimmersive Video: A Psychophysical Approach”, ACM Multimedia 2011

  • Free-Viewpoint 3D Video Representation

    Example of 3D representation captured by different cameras

    Figure labels: views from camera-1 and further cameras; View Model with $O_i$ and $O_u$

  • 3DTI Data Model

    • 3D frame for camera $i$ at time $t$: $f_{i,t}$

    • Each pixel in the frame carries color+depth data and can be independently rendered

    • Stream for camera $i$: $S_i = \{ f_{i,t_1}, f_{i,t_2}, \dots \}$

    • Macro-frame: $F_t = \{ f_{1,t}, f_{2,t}, \dots, f_{n,t} \}$

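    Figure: streams $S_1 \dots S_n$ (cameras 1 to n) crossed with macro-frames $F_{t_1} = \{ f_{1,t_1} \dots f_{n,t_1} \}$ and $F_{t_2} = \{ f_{1,t_2} \dots f_{n,t_2} \}$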
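The data model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the 3DTI system's implementation: the class `Frame3D`, the helpers `build_streams` and `build_macro_frames`, and the `(r, g, b, depth)` pixel tuple are hypothetical names chosen here, and a macro-frame is assumed to be complete only when every camera 1..n has delivered its frame for time t.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame3D:
    """3D frame f_{i,t}: color-plus-depth pixels from camera i at time t."""
    camera: int
    time: int
    # Hypothetical layout: one (r, g, b, depth) tuple per independently renderable pixel.
    pixels: List[tuple] = field(default_factory=list)


def build_streams(frames: List[Frame3D]) -> Dict[int, List[Frame3D]]:
    """Group frames by camera into streams S_i = { f_{i,t1}, f_{i,t2}, ... }, ordered by time."""
    streams: Dict[int, List[Frame3D]] = {}
    for f in frames:
        streams.setdefault(f.camera, []).append(f)
    for s in streams.values():
        s.sort(key=lambda f: f.time)
    return streams


def build_macro_frames(frames: List[Frame3D], n_cameras: int) -> Dict[int, List[Frame3D]]:
    """Group frames by time into macro-frames F_t = { f_{1,t}, ..., f_{n,t} }.

    Assumption: camera ids are 1..n, and only complete macro-frames are kept.
    """
    by_time: Dict[int, Dict[int, Frame3D]] = {}
    for f in frames:
        by_time.setdefault(f.time, {})[f.camera] = f
    macro: Dict[int, List[Frame3D]] = {}
    for t in sorted(by_time):
        cams = by_time[t]
        if all(i in cams for i in range(1, n_cameras + 1)):
            macro[t] = [cams[i] for i in range(1, n_cameras + 1)]
    return macro
```

Dropping incomplete macro-frames is only one possible policy; a live system might instead render $F_t$ from whatever subset of views has arrived.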

  • 360-Degree Video Representation

  • 360-Degree Video User’s Viewport

    Generation of 360-Degree Video

    • Capturing of multiple 2D videos together with their metadata

    • Stitching the videos together and further editing them as spherical video

    • Encoding the spherical video, considering projection, interactivity, storage and delivery formats (these choices impact the decoding and rendering processes)

  • Video 360 Viewing and Navigation


    Examples of HMDs (Head-Mounted Displays): Oculus Rift, Samsung Gear VR, HTC Vive

  • 360-Degree Video Data Model

    • Field-of-View or Viewport – display region on the Head-Mounted Display

    • Fraction of omnidirectional view of the scene

    • Viewport defined by a device-specific viewing angle (typically 120 degrees), which horizontally delimits the scene around the head-direction center, called the viewport center

    • Viewport resolution – 4K (3840×2160) pixels

    • Resolution of the full 360-degree video – at least 12K (11520×6480)

    • Video frame rate – on the order of the HMD refresh rate (100 Hz), i.e., about 100 fps

    • Motion-to-photon latency requirement – less than 20 ms for VR, much smaller than an Internet request-reply delay

    • Hence viewport prediction is needed

    • Bitrate – Video 360 vs. HEVC (8K video at 60 fps is approx. 100 Mbps)

    • Tiling – spatial division of the spherical video into independent tiles
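The parameters above fit together arithmetically, and a small Python sketch makes the relations explicit. The numbers (120-degree viewport, 3840-pixel viewport width, 6480-pixel full height, 100 fps, 20 ms, 8K at 60 fps ≈ 100 Mbps) come from the slide; everything else is an assumption for illustration: the class name `ViewportModel`, a hypothetical grid of 8 equal tile columns around the yaw axis, yaw-only viewport handling, linear extrapolation as a stand-in for real viewport predictors, and a purely linear bitrate-vs-pixel-rate scaling that real codecs do not follow.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ViewportModel:
    # Values from the slide; tile_cols is a hypothetical choice for illustration.
    fov_deg: float = 120.0            # device-specific horizontal viewing angle
    viewport_w: int = 3840            # 4K viewport width in pixels
    full_h: int = 6480                # height of the full 12K frame
    fps: float = 100.0                # on the order of the HMD refresh rate
    latency_budget_ms: float = 20.0   # motion-to-photon requirement for VR
    tile_cols: int = 8                # hypothetical: 8 equal tiles around the yaw axis

    def full_width(self) -> int:
        """Full 360-degree width at the same pixel density as the viewport."""
        return round(self.viewport_w * 360.0 / self.fov_deg)

    def frames_in_budget(self) -> float:
        """Number of frame intervals that fit inside the motion-to-photon budget."""
        return self.latency_budget_ms * self.fps / 1000.0

    def naive_bitrate_mbps(self) -> float:
        """Scale the slide's 8K@60fps ~ 100 Mbps figure linearly by pixel rate (crude assumption)."""
        ref_pixel_rate = 7680 * 4320 * 60
        pixel_rate = self.full_width() * self.full_h * self.fps
        return 100.0 * pixel_rate / ref_pixel_rate

    def predict_center(self, yaw_deg: float, yaw_speed_dps: float, horizon_ms: float) -> float:
        """Linearly extrapolate the viewport center (yaw only), wrapped to [0, 360)."""
        return (yaw_deg + yaw_speed_dps * horizon_ms / 1000.0) % 360.0

    def visible_tiles(self, center_yaw_deg: float) -> List[int]:
        """Tile columns overlapping [center - fov/2, center + fov/2], with wrap-around at 360."""
        tile_w = 360.0 / self.tile_cols
        first = math.floor((center_yaw_deg - self.fov_deg / 2.0) / tile_w)
        last = math.ceil((center_yaw_deg + self.fov_deg / 2.0) / tile_w) - 1
        return sorted({k % self.tile_cols for k in range(first, last + 1)})
```

With the defaults, `full_width()` recovers the slide's 11520 pixels (3840 × 360/120), the 20 ms budget spans only two 10 ms frame intervals, and naive scaling puts the full 12K/100 fps sphere near 375 Mbps. A call such as `m.visible_tiles(m.predict_center(350.0, 90.0, m.latency_budget_ms))` selects the 3 or 4 of the 8 tile columns that would be fetched at high quality for the predicted viewport.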

  • Tiles and Spherical Maps

    Issues with spherical mapping to tiles:

    • Viewport distortion

    • Spatial quality variance

    Considerations of sphere-to-plane mapping and of the viewing probability of tiles are IMPORTANT.

    • The overall spherical distortion of a segment is the sum of the distortion over all pixels in the segment


    Xie et al. “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM MM 2017
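The distortion rule above, combined with per-tile viewing probabilities, can be sketched as follows. This is an illustrative assumption, not the exact formulation of 360ProbDASH: per-pixel squared error is weighted by cos(latitude), the equirectangular weighting used by WS-PSNR, so that the stretched polar rows count less; `sphere_weight`, `segment_distortion` and `expected_distortion` are hypothetical names, and tiles are plain nested lists of luma values.

```python
import math
from typing import Dict, Sequence


def sphere_weight(row: int, height: int) -> float:
    """cos(latitude) of an equirectangular row's center (WS-PSNR-style weight, an assumption)."""
    latitude = (height / 2.0 - row - 0.5) * math.pi / height
    return math.cos(latitude)


def segment_distortion(orig: Sequence[Sequence[float]],
                       decoded: Sequence[Sequence[float]],
                       row_offset: int, frame_height: int) -> float:
    """Sum of weighted per-pixel squared errors over all pixels of one segment (tile).

    row_offset is the tile's first row inside the full equirectangular frame.
    """
    total = 0.0
    for r, (row_o, row_d) in enumerate(zip(orig, decoded)):
        w = sphere_weight(row_offset + r, frame_height)
        total += w * sum((a - b) ** 2 for a, b in zip(row_o, row_d))
    return total


def expected_distortion(tile_distortion: Dict[int, float],
                        view_prob: Dict[int, float]) -> float:
    """Distortion of each tile weighted by the probability that the tile is viewed."""
    return sum(view_prob.get(t, 0.0) * d for t, d in tile_distortion.items())
```

Weighting by viewing probability is what lets a rate-adaptation step spend bits on tiles that are likely to fall inside the viewport while tolerating higher distortion elsewhere.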

  • Video 360 Spherical-to-Plane Projections

    Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017

    Nasrabadi et al. “Adaptive 360-Degree Video Streaming using Scalable Video Coding”, ACM Multimedia 2017

    Video 360 Capture as Spherical Video

    • Equirectangular projection – stretches the poles and reduces coding efficiency

    • Pyramid projection – degradation on the sides

    • Cubemap – maps a 90-degree FOV to each side of a cube and hence causes less degradation
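The contrast between the projections can be made concrete with a short Python sketch. It shows the standard equirectangular mapping (yaw and pitch mapped linearly to pixel columns and rows), the 1/cos(pitch) horizontal oversampling that causes the pole stretching, and cubemap face selection, where each face covers a 90-degree field of view around one axis. Function names and the axis labels (+x, -z, ...) are hypothetical conventions chosen here; the pyramid projection is not modeled.

```python
import math
from typing import Tuple


def equirect_pixel(yaw_deg: float, pitch_deg: float, width: int, height: int) -> Tuple[int, int]:
    """Map a viewing direction to an equirectangular pixel (x, y).

    Yaw in [-180, 180) maps linearly to x; pitch in [-90, 90] maps to y (top row = +90).
    """
    x = int((yaw_deg + 180.0) / 360.0 * width) % width
    y = min(int((90.0 - pitch_deg) / 180.0 * height), height - 1)
    return x, y


def equirect_stretch(pitch_deg: float) -> float:
    """Horizontal oversampling of an equirectangular row relative to the equator.

    Every row has the same pixel count, but the circle of latitude it samples shrinks
    by cos(pitch), so rows near the poles are stretched by 1 / cos(pitch).
    """
    return 1.0 / math.cos(math.radians(pitch_deg))


def cube_face(x: float, y: float, z: float) -> str:
    """Pick the cubemap face hit by direction (x, y, z): the axis of largest magnitude."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"
```

At 60 degrees of latitude an equirectangular row already spends twice as many pixels as the content needs, which is the coding inefficiency noted above; a cube face never samples more than 45 degrees away from its axis, so its stretching stays bounded.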

  • Encoding and Delivery Formats

    • Codecs

    • AVC/H.264, HEVC/H.265 • VP8, VP9

    • Delivery formats: DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming)

    • MPEG-DASH standard considers tiling

    • MPD (Media Presentation Description) – modified for Video 360

    • SRD (Spatial Relationship Description) integrated into the MPD

    • HEVC considers video tiles

    • MPEG-I – immersive media standard ISO/IEC 23090

    • Part 1: Use cases

    • Part 2: OMAF (Omnidirectional Media Application Format)

      • Description of the equirectangular projection format

      • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data

      • Storage format (ISO base media file format/MP4)

      • Codecs: HEVC, MPEG-H 3D audio

    • Part 3: Immersive video

    • Part 4: Immersive audio

    Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017
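To show how tiling reaches the MPD through SRD, the Python sketch below generates SRD `value` strings for a regular tile grid, one per tile AdaptationSet. The scheme URI `urn:mpeg:dash:srd:2014` and the field order (source_id, object_x, object_y, object_width, object_height, total_width, total_height) follow the MPEG-DASH SRD definition; the optional spatial_set_id is omitted, the helper names are hypothetical, and the frame size is assumed to divide evenly into the grid. Only a `SupplementalProperty` fragment is produced here, although SRD may also be carried in an `EssentialProperty`.

```python
from typing import List

SRD_SCHEME = "urn:mpeg:dash:srd:2014"  # scheme URI for SRD in MPEG-DASH


def srd_values(frame_w: int, frame_h: int, cols: int, rows: int, source_id: int = 0) -> List[str]:
    """SRD value strings for a cols x rows tiling of one projected (e.g. equirectangular) frame.

    Field order: source_id, object_x, object_y, object_width, object_height,
    total_width, total_height. Assumes frame_w % cols == 0 and frame_h % rows == 0.
    """
    tile_w, tile_h = frame_w // cols, frame_h // rows
    values = []
    for r in range(rows):
        for c in range(cols):
            fields = (source_id, c * tile_w, r * tile_h, tile_w, tile_h, frame_w, frame_h)
            values.append(",".join(str(v) for v in fields))
    return values


def supplemental_property(value: str) -> str:
    """MPD fragment that would be placed inside a tile's AdaptationSet."""
    return f'<SupplementalProperty schemeIdUri="{SRD_SCHEME}" value="{value}"/>'
```

A client that knows each tile's position from these values can combine it with a viewport estimate to request high-quality Representations only for the tiles it expects to display.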

  • Similarities and Differences of Representations

    Similarity parameter    | 3DTI Video | 360-Degree Video
    Multi-camera views      | Yes (view) | Yes (viewport)
    Joint coordinate system | Yes        | Yes
    Bitrate consideration   | Yes        | Yes
    View change             | Yes        | Yes

    Difference parameter    | 3DTI Video                                               | 360-Degree Video
    Video format            | Color-plus-depth                                         | Color
    Smallest item to adapt  | 3DTI frame                                               | Tile
    Frame representation    | Frame manipulation at pixel level (RGB, depth, polygons) |

    Frame manipula