Distribution Systems for 3D Teleimmersive and Video 360
Content: Similarities and Differences
Klara Nahrstedt
Department of Computer Science
University of Illinois at Urbana-Champaign
ACM Multimedia Systems, June 12, 2018, Amsterdam, Netherlands
Overview
• Motivation
• 3D Teleimmersive Video Representation
• Video 360 Representation
• Similarities and Differences in Content Representation
• Distribution of 3DTI Video
• Distribution of Video 360
• Similarities and Differences in Content Distribution
• Conclusion
3D Teleimmersive (3DTI) Systems
Source: http://tele-immersion.citris-uc.org; http://monet.cs.illinois.edu/projects/cyphy-multi-modal-teleimmersion-for-tele-physiotherapy/teleimmersion-gallery/
High-End Tele-Presence Environments
Traditional telephony and videoconferencing provide some of these elements, including ease of
use and audio quality, yet fail on most others. Our Coliseum effort aims to advance the state of
videoconferencing by applying recent advances in image-based modeling and computer vision to
bring these other elements of face-to-face realism to remote collaboration.
Scene reconstruction, the task of building 3D descriptions using the information contained in
multiple views of a scene, is an established challenge in computer vision [Longuet-Higgins 81].
It has seen remarkable progress over the last few years due to improved algorithms [Seitz 97,
Narayanan 98, Pollefeys 99] and faster computers. The Coliseum system is based on the Image-
Based Visual Hulls (IBVH) image-based rendering scene reconstruction technology of MIT
[Matusik 00]. Our recent Coliseum efforts have shown that the IBVH method can operate at
video rates from multiple camera streams hosted by a single personal computer [Baker 02].
Each Coliseum participant works on a standard PC with LCD monitor and a rig housing five
video cameras spaced at roughly 30 degree increments, as shown in Figure 1. During a
teleconferencing session, Coliseum builds 3D representations of each participant at video rates.
The appropriate views of each participant are rendered for all others and placed in their virtual
environments, one view of which is shown in Figure 2. The impression of a shared space results,
with participants free to move about and express themselves in natural ways, such as through
voice, gesture, and gaze.
Handling five video streams and preparing 3D reprojection views for each of numerous
coparticipating workstations at video rates is a formidable task. Tight control must be exercised
on computation, process organization, and inter-desktop communication. At project inception,
we determined we needed an effective speedup of about one hundred times over the MIT IBVH
processing on a single PC to reach utility. Our purpose in this paper is to detail some of the major
issues in attaining this performance.
Figure 1. The Coliseum immersive videoconferencing system
(Chart: high-end tele-presence systems – Cisco Telepresence, HP Halo, UNC, HP Coliseum)
Multi-Camera Live Broadcast Systems
http://www.dailymail.co.uk/sciencetech/article-2336893/New-TV-cameras-bring-Matrix-style-bullet-time-trickery-live-sports-coverage.html
Multi-Camera Broadcast Systems
https://thegadgetflow.com/portfolio/slingstudio-multi-camera-broadcaster/
https://www.myslingstudio.com/
https://www.cinfo.es/our-products/synthetrick/multicam
https://www.spiideo.com/sports/
360-Degree Video
360-Degree Cameras – CoolPile.com: http://coolpile.com/tag/360-degrees-cameras
3D Teleimmersive Video Representation
3D Teleimmersive Stereo Video and Free Viewpoint Video Capture
3DTI Viewing
Photo courtesy of Prof. Ruzena Bajcsy.
Singapore, 2014
3D Stereo Video Representation
Wu, Ahsan, Kurillo, Agarwal, Nahrstedt, Bajcsy, “Color-plus-Depth Level-of-Detail in 3D Teleimmersive Video: A Psychophysical Approach”, ACM Multimedia 2011
Free-Viewpoint 3D Video Representation
Example of 3D representation captured by different cameras
(Figure: views of the same 3D representation from Camera-1, Camera-2, Camera-3, …, Camera-8, with camera directions indicated)
source: http://zing.ncsl.nist.gov/~gseidman/vrml/
(Figure: view model – angle θ between orientations Oi and Ou)
3DTI Data Model
• 3D frame for camera i at time t: fi,t
• Each pixel in the frame carries color+depth data and can be independently rendered
• Stream for camera i: Si = { fi,t1, fi,t2, … }
• Macro-frame: Ft = { f1,t, f2,t, …, fn,t }
(Figure: streams S1 … Sn as rows of frames; macro-frames Ft1 = { f1,t1, …, fn,t1 }, Ft2 = { f1,t2, …, fn,t2 })
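The data model above can be sketched in code. A minimal sketch; the class and function names below are illustrative, not from the talk:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Frame:
    """3D frame fi,t for camera i at time t; each pixel carries color+depth."""
    camera_id: int
    timestamp: int
    # ((r, g, b), depth) per pixel -- independently renderable
    pixels: List[Tuple[Tuple[int, int, int], float]] = field(default_factory=list)

# Stream for camera i: Si = { fi,t1, fi,t2, ... }
Stream = List[Frame]

def macro_frame(streams: Dict[int, Stream], t: int) -> Dict[int, Frame]:
    """Macro-frame Ft = { f1,t, ..., fn,t }: one frame from every camera stream at time t."""
    return {i: next(f for f in s if f.timestamp == t) for i, s in streams.items()}
```

A renderer consuming this model would pull one macro-frame per display tick and fuse the n color+depth frames into the shared 3D scene.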
360-Degree Video Representation
360-Degree Video User’s Viewport
Generation of 360-Degree Video
• Capturing of multiple 2D videos together with their metadata
• Stitching the videos together and further editing them into a spherical video
• Encoding the spherical video considering projection, interactivity, storage and delivery formats (this will impact the decoding and rendering processes)
Video 360 Viewing and Navigation
https://en.wikipedia.org/wiki/Head-mounted_display
Controller
Examples of HMDs (Head-Mounted Displays) – Oculus Rift, Samsung Gear VR, HTC Vive
360-Degree Video Data Model
• Field-of-View or Viewport – display region on the Head-Mounted Display
• Fraction of the omnidirectional view of the scene
• Viewport defined by a device-specific viewing angle (typically 120 degrees) that horizontally delimits the scene around the head-direction center, called the viewport center
• Viewport Resolution – 4K (3840x2160) pixels
• Resolution of the full 360-degree video – at least 12K (11520x6480)
• Video Framerate – on the order of the HMD refresh rate of 100 Hz, i.e., 100 fps
• Motion-to-Photon Latency requirement
• Less than 20 ms for VR – much smaller than an Internet request-reply delay
• Need for viewport prediction
• Bitrate – Video 360 vs. HEVC (8K video at 60 fps is approx. 100 Mbps)
• Tiling – spatial division of the spherical video into independent tiles
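Because the viewport covers only a fraction of the sphere, a tiled delivery system must decide which tiles the current viewport touches. A minimal sketch, assuming an equirectangular frame cut into an 8x4 grid of equal tiles and a square 120-degree viewport (grid size, sampling density, and the square FOV are illustrative assumptions):

```python
def viewport_tiles(center_yaw_deg, center_pitch_deg, fov_deg=120.0,
                   tile_cols=8, tile_rows=4):
    """Return (row, col) indices of equirectangular tiles the viewport overlaps.

    Yaw in [0, 360) maps linearly to tile columns, pitch in [-90, 90] to rows.
    The viewport is sampled on a grid; yaw wraps around the 360-degree seam.
    """
    half = fov_deg / 2.0
    col_width = 360.0 / tile_cols
    row_height = 180.0 / tile_rows
    tiles = set()
    steps = 32  # sampling resolution across the viewport span
    for i in range(steps + 1):
        for j in range(steps + 1):
            yaw = (center_yaw_deg - half + fov_deg * i / steps) % 360.0
            pitch = max(-90.0, min(90.0, center_pitch_deg - half + fov_deg * j / steps))
            col = int(yaw // col_width) % tile_cols
            row = min(tile_rows - 1, int((pitch + 90.0) // row_height))
            tiles.add((row, col))
    return tiles
```

Streaming only the tiles this returns (plus predicted neighbors, per the viewport-prediction bullet) is what lets a system avoid shipping the full 12K sphere at 4K viewport quality.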
Tiles and Spherical Maps
Issues with Spherical Mapping to Tiles
• Viewport distortion
• Spatial quality variance
• Considerations of sphere-to-plane mapping and viewing probability of tiles are IMPORTANT
• Overall spherical distortion of a segment is the sum of distortion over all pixels the segment covers
Xie et al. “360ProbDASH: Improving QoE of 360 Video Streaming Using Tile-based HTTP Adaptive Streaming”, ACM MM 2017
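The distortion bookkeeping above can be illustrated with two small helpers (a simplification for illustration, not the actual 360ProbDASH formulation): a segment's spherical distortion accumulates per-pixel distortion, and an expected viewport distortion weights each tile's distortion by its viewing probability.

```python
def segment_distortion(pixel_errors):
    """Overall spherical distortion of a segment: sum of distortion over all pixels it covers."""
    return sum(pixel_errors)

def expected_viewport_distortion(tile_distortions, view_probs):
    """Expected distortion over tiles, weighting each tile by its viewing probability.

    tile_distortions and view_probs are parallel lists; both are illustrative inputs
    (in practice probabilities come from a head-movement prediction model).
    """
    assert abs(sum(view_probs) - 1.0) < 1e-6, "viewing probabilities should sum to 1"
    return sum(d * p for d, p in zip(tile_distortions, view_probs))
```

A rate-allocation step would then spend bits to minimize this expectation: likely-viewed tiles get high quality, unlikely tiles can tolerate distortion.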
Video 360 Spherical-to-Plane Projections
Corbillon, Simon, Devlic, Chakareski, “Viewport-Adaptive Navigable 360-Degree Video Delivery”, May 2017
Nasrabadi et al., “Adaptive 360-Degree Video Streaming using Scalable Video Coding”, ACM Multimedia 2017
Video 360 Capture as Spherical Video
• Equirectangular Projection – stretches the poles and reduces coding efficiency
• Pyramid Projection – sees degradation on the sides
• Cubemap – maps a 90-degree FOV onto each side of a cube and hence yields less degradation
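The pole-stretching of the equirectangular projection is easy to see in code. A small sketch (the 3840x1920 frame size is illustrative): longitude maps linearly to x and latitude to y, so a row near a pole spends as many pixels as the equator does on a far smaller circle of the sphere.

```python
import math

def equirect_pixel(yaw_deg, pitch_deg, width=3840, height=1920):
    """Map a viewing direction to equirectangular pixel coordinates.
    Longitude (yaw) maps linearly to x, latitude (pitch) to y."""
    x = (yaw_deg % 360.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return int(x) % width, min(height - 1, int(y))

def row_oversampling(pitch_deg):
    """Pixel-density inflation of a row relative to the equator: a circle of
    latitude has circumference proportional to cos(latitude), but every
    equirectangular row gets the same pixel count -> 1/cos(latitude)."""
    return 1.0 / max(math.cos(math.radians(pitch_deg)), 1e-9)
```

This oversampling is exactly why the slide notes reduced coding efficiency: near the poles, many encoded pixels carry almost no new scene content, which cubemap-style projections largely avoid.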
Encoding and Delivery Formats
• Codecs
• AVC/H.264, HEVC/H.265
• VP8, VP9
• Delivery Formats
• DASH/HLS (Dynamic Adaptive Streaming over HTTP / HTTP Live Streaming)
• MPEG-DASH standard considers tiling
• MPD (Media Presentation Description) – modified for Video 360
• SRD (Spatial Relation Description) integrated into the MPD
• HEVC considers video tiles
• MPEG – Immersive media standard ISO/IEC 23090
• Part 1: Use cases
• Part 2: OMAF (Omnidirectional Media Application Format)
• Description of the equirectangular projection format
• Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
• Storage format (ISO base media file format/MP4)
• Codecs: HEVC, MPEG-H 3D audio
• Part 3: Immersive video
• Part 4: Immersive audio
Graf, Timmerer, Mueller, “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP”, ACM MMSys 2017
Similarities and Differences of Representations
Similarity Parameter    | 3DTI Video | 360-Degree Video
Multi-camera Views      | Yes (view) | Yes (viewport)
Joint coordinate system | Yes        | Yes
Bitrate consideration   | Yes        | Yes
View change             | Yes        | Yes

Difference Parameter   | 3DTI Video                                               | 360-Degree Video
Video Format           | Color-plus-Depth                                         | Color
Smallest item to adapt | 3DTI frame                                               | Tile
Frame Representation   | Frame manipulation at pixel level (RGB, Depth, Polygons) | Frame manipulation at tile level