Page 1: 2d 3d Conversion

3D-TV Content Creation: Automatic 2D-to-3D Video Conversion

Presented By, PADMASHRI N.K

1BJ08CS027

Page 2: 2d 3d Conversion

3D-TV Content Creation: Automatic 2D-to-3D Video Conversion

• Area of the paper: Broadcasting
• Year of publication: 2011
• Authors: Liang Zhang, Carlos Vazquez, Sebastian Knorr

Page 3: 2d 3d Conversion

Terminologies

• Broadcasting
• Data conversion
• Image generation
• Stereo displays
• Stereo synthesis
• Stereo vision
• Three-dimensional displays
• 3D-TV

Page 4: 2d 3d Conversion

Outline

• Abstract
• Introduction
• System Design
• Existing System
• Proposed System
• Problem definitions
• Computational model
• Summary of characteristics of depth representation
• Challenges
• Conclusion
• References

Page 5: 2d 3d Conversion

Abstract

• Three-dimensional television (3D-TV) is the next major revolution in television.

• A successful rollout of 3D-TV will require a backward-compatible transmission/distribution system, inexpensive 3D displays, and an adequate supply of high-quality 3D program material. With respect to the last factor, the conversion of 2D images/videos to 3D will play an important role.

• This paper provides an overview of automatic 2D-to-3D video conversion with a specific look at a number of approaches for both the extraction of depth information from monoscopic images and the generation of stereoscopic images.

• Some challenging issues for the success of automatic 2D-to-3D video conversion are pointed out as possible research topics for the future.

Page 6: 2d 3d Conversion

Introduction (Digital TV Technology Trend Overview)

Page 7: 2d 3d Conversion

Introduction

• Three-dimensional television (3D-TV) is anticipated to be the next step in the advancement of television.

• The term "3D" in this context denotes "stereoscopic," meaning a two-view system is used for visualization.

• The successful adoption of 3D-TV by the general public will depend not only on technological advances in 3D displays and 3D-TV broadcasting systems, but also on the availability of a wide variety of program content in stereoscopic 3D (S3D) format for 3D-TV services.

• The potential market is attracting many companies to invest their manpower and money for developing 2D-to-3D conversion techniques.

• The fundamental principle of 2D-to-3D conversion techniques rests on the fact that stereoscopic viewing involves binocular processing of two slightly dissimilar images.

Page 8: 2d 3d Conversion

Introduction (continued..)

• Converting 2D images to stereoscopic 3D images thus involves the underlying principle of horizontally shifting pixels to create a new image, so that there are horizontal disparities between the original image and the new version of it (a minimal sketch of this shifting is given below).

• Various approaches for 2D-to-3D conversion have been proposed. These approaches can be classified into three schemes, namely: manual, human-assisted and automatic conversion.
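To illustrate this principle only (not the specific algorithm of the paper), a minimal Python sketch follows; it assumes a per-pixel disparity map is already available and leaves uncovered positions as holes.

```python
import numpy as np

def shift_pixels(image, disparity):
    """Create a second view by shifting each pixel horizontally.

    image:     H x W x 3 array (the original 2D frame).
    disparity: H x W integer array of horizontal shifts in pixels
               (larger values correspond to closer objects).
    Positions that receive no pixel remain black (holes).
    """
    h, w = disparity.shape
    new_view = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            x_new = x - disparity[y, x]   # shift left to synthesize a right-eye view
            if 0 <= x_new < w:
                new_view[y, x_new] = image[y, x]
    return new_view
```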

Page 9: 2d 3d Conversion

System Design

Page 10: 2d 3d Conversion

System Design (continued..)

• 2D-to-3D video conversion involves the generation of new images from a single 2D image or a single stream of 2D images (video sequence). From this perspective, 2D-to-3D video conversion can be seen as a form of virtual view generation.

• The framework commonly used for automatic 2D-to-3D video conversion consists of two elements, namely: the extraction of depth information, and the generation of stereoscopic images in accordance with both the estimated depth information and the expected viewing conditions.

The extraction of depth information:

• The extraction of depth information aims to exploit pictorial cues and motion parallax, contained in a single 2D image or video, to recover the depth structure of the scene.

• The retrieved depth information is then converted into a suitable representation for use in the 2D-to-3D video conversion process.

Page 11: 2d 3d Conversion

System Design (Continued..)

• A sparse 3D scene structure and a depth map are two commonly used representations of the incomplete geometry of a captured scene.

• A sparse 3D scene structure usually consists of a number of 3D real-world coordinates, while a depth map is essentially a two-dimensional (2D) function that provides the depth, with respect to the camera position, as a function of the image coordinates.

The generation of stereoscopic images:

• The generation of stereoscopic images is the step that warps textures from the original images, in accordance with the retrieved depth information, to create a new image or video sequence for the second eye.

Page 12: 2d 3d Conversion

Existing system

The manual scheme:
The manual scheme shifts the pixels horizontally by an artistically chosen depth value for different regions/objects in the image to generate a new image. Hand-drawn depth produces high quality, but is very time consuming and expensive.

The human-assisted scheme:
The human-assisted scheme converts 2D images to stereoscopic 3D with some corrections made "manually" by an operator. Even though this scheme reduces the time consumed in comparison to the manual conversion scheme, a significant amount of human engagement is still required to complete the conversion.

Page 13: 2d 3d Conversion

Proposed System

Automatic 2D-to-3D Video Conversion:

• To convert the vast collection of available 2D material into 3D in an economic manner, an automatic conversion scheme is desired.

• The automatic conversion scheme exploits depth information originated from a single image or from a stream of images to generate a new projection of the scene with a virtual camera of a slightly different (horizontally shifted) viewpoint.

• It may be done in real time or in a more time-consuming off-line process.

• The quality of the resulting product is related to the level of processing involved.

Page 14: 2d 3d Conversion

Problem definitions

There are two key issues to consider for automatic 2D-to-3D conversion techniques:

• how to retrieve depth information from a monoscopic image or video

• how to generate high-quality stereoscopic images at new virtual viewpoints.

Page 15: 2d 3d Conversion

Computational model
Extraction of scene depth information

• Each depth image stores depth information as 8-bit grey values, with grey level 0 indicating the furthest value and grey level 255 specifying the closest value (a sketch of one common grey-to-depth mapping is given below).

• A variety of depth cues are exploited by human beings to perceive the world in three dimensions.

• These are typically classified into binocular and monocular depth cues.
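The slides do not fix a particular grey-to-depth mapping, so the sketch below assumes one common convention in which the inverse depth 1/Z varies linearly with the grey value between hypothetical near and far planes z_near and z_far.

```python
import numpy as np

def grey_to_depth(grey, z_near=1.0, z_far=100.0):
    """Convert an 8-bit depth image (H x W, uint8) to metric depth values.

    Assumption (not from the slides): grey 255 maps to the near plane z_near,
    grey 0 to the far plane z_far, with 1/Z linear in the grey value.
    """
    g = grey.astype(np.float64) / 255.0
    inv_z = g * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z
```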

Page 16: 2d 3d Conversion

Computational model (Continued..)

• The extraction of scene depth information aims to convert monocular depth cues contained in video sequences into quantitative depth values of the captured scene.

• Monocular depth cues can be subdivided into pictorial and motion cues.

A. Depths From Pictorial Cues

• Pictorial depth cues are the elements in an image that allow us to perceive depth in a 2D representation of the scene.

• Depth perception could be related to the physical characteristics of the Human Visual System (HVS) or could be learned from experience.

• The three categories of pictorial cues commonly used to extract depth information are described in the following subsections.

Page 17: 2d 3d Conversion

Computational model (Continued..)

Classification of depth cues.

Pictorial depth cues in a 2D image. Visible depth cues: linear perspective, relative and known size, texture gradient, atmospheric scattering, relative height in picture, and interposition.

Page 18: 2d 3d Conversion

Computational model (Continued..)

1) Depth from focus/defocus:

• This mechanism can be exploited to generate depth information from captured images that contain a focused plane and objects outside the focused plane.

There are two main approaches to implementing this mechanism:

• The first employs several images with different focus characteristics.

• The second tries to extract blur information from a single image by measuring the amount of blur associated with each pixel and then mapping the blur measures to the depth of that pixel (a minimal sketch of such a per-pixel blur measure is given below).

Although the approach of recovering depth from focus/defocus is relatively simple, it suffers from a major drawback: how to distinguish the foreground from the background when the amount of blur is similar.
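As one hedged illustration of the single-image variant (not necessarily the exact method surveyed in the paper), the local energy of the Laplacian can serve as a per-pixel sharpness measure; treating the sharpest pixels as the closest is itself an assumption that only holds when the foreground is in focus.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def blur_based_depth(grey_image, window=9):
    """grey_image: H x W float array in [0, 1]. Returns an 8-bit depth map."""
    # Local energy of the Laplacian as a per-pixel sharpness (inverse blur) measure.
    sharpness = uniform_filter(laplace(grey_image) ** 2, size=window)
    # Normalise: sharpest pixels -> closest (255), most blurred -> farthest (0).
    norm = (sharpness - sharpness.min()) / (sharpness.max() - sharpness.min() + 1e-12)
    return (norm * 255).astype(np.uint8)
```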

Page 19: 2d 3d Conversion

Computational model (Continued..)

2) Depth from geometric cues:

• Geometry-related pictorial depth cues are linear perspective, known size, relative size, height in picture, interposition, and texture gradient.

• Linear perspective refers to the property of parallel lines converging at infinite distance.

• Height in picture denotes that objects closer to the bottom of the image are generally nearer than objects at the top of the picture (a minimal sketch of a height-based depth gradient is given below).

• Aside from linear perspective and height in picture, it is also possible to recover depth from texture (called shape-from-texture), which aims to estimate the shape of a surface based on cues from markings on the surface or its texture. Those methods, however, are normally restricted to specific types of images and cannot be applied to 2D-to-3D conversion of general video content.
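Purely as an illustration of the height-in-picture cue (real systems would blend such a gradient with other cues), the following sketch assigns depth from the image row alone.

```python
import numpy as np

def height_in_picture_depth(height, width):
    """Depth map driven only by image row: bottom row nearest (255), top row farthest (0)."""
    rows = np.linspace(0.0, 1.0, height)           # 0 at the top, 1 at the bottom
    depth_column = (rows * 255).astype(np.uint8)   # bottom rows get larger (closer) values
    return np.tile(depth_column[:, None], (1, width))
```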

Page 20: 2d 3d Conversion

Computational model (Continued..)

3) Depth from color and intensity cues:

Variations in the amount of light arriving at the eye can also provide information about the depth of objects. This type of variation appears in captured images as variations of intensity or changes in color.

• Atmospheric scattering refers to the scattering of light rays by the atmosphere, producing a bluish tint and less contrast for objects in the far distance and better contrast for objects at close range (a minimal contrast-based sketch follows this list).

• Light and shadow distribution refers to the information provided by shadows with respect to the position and shape of objects relative to other objects and the background.

• Figure-ground perception is another mechanism that helps in the perception of depth. Edges and regions in the image are the depth cues providing this information.
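One hedged way to turn the atmospheric-scattering cue into numbers (an illustrative heuristic, not the paper's method) is to map low local contrast to larger depth.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contrast_based_depth(grey_image, window=15):
    """grey_image: H x W float array in [0, 1]. Low local contrast -> far (0), high -> close (255)."""
    mean = uniform_filter(grey_image, size=window)
    var = uniform_filter(grey_image ** 2, size=window) - mean ** 2
    contrast = np.sqrt(np.clip(var, 0.0, None))    # local standard deviation
    norm = (contrast - contrast.min()) / (contrast.max() - contrast.min() + 1e-12)
    return (norm * 255).astype(np.uint8)
```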

Page 21: 2d 3d Conversion

Computational model (Continued..)

B. Depths From Motion Cues

• Motion parallax refers to the relative motion of objects across the retina.

• For a moving observer, near objects move faster across the retina than far objects, so relative motion provides an important depth cue. This is usually called the principle of the depth-from-motion-parallax approach.

• This is based on the fact that objects with different motions usually have different depths.

• In principle, only video sequences captured by a freely moving camera exhibit motion parallax that is closely related to the captured scene structure.

• Different camera motions lead to different strengths of depth perception.

• The process consists of two parts: determination of motion parallax from the sequence, and mapping of the motion parallax into depth information.

Page 22: 2d 3d Conversion

Computational model (Continued..)

1) Motion Parallax Between Images:

• Motion parallax allows depth to be perceived from the differences between two frames in a video sequence.

• These differences are observed in the video as image motion. By extracting this image motion, the motion parallax can be recovered.

• Image motion may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches, or even individual pixels.

• Motion estimation methods can generally be classified into direct and indirect methods (a minimal block-matching sketch is given below).

2) Conversion of Motion Parallax Into Depth Information:

Depending on the depth representation, the motion parallax estimated from a video sequence is converted into depth information in the form of a 2D depth map or a 3D scene structure.
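Block-matching motion estimation (named later in the deck as the source of a depth map) is one simple, indirect method; below is a minimal sketch, assuming full-search SAD matching over a small window.

```python
import numpy as np

def block_matching(prev, curr, block=8, search=7):
    """prev, curr: H x W float arrays. Returns per-block motion vectors (dy, dx)."""
    h, w = curr.shape
    mv = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            patch = curr[by:by + block, bx:bx + block]
            best, best_vec = np.inf, (0, 0)
            # Exhaustively search a (2*search+1)^2 window in the previous frame.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(prev[y:y + block, x:x + block] - patch).sum()
                        if sad < best:
                            best, best_vec = sad, (dy, dx)
            mv[by // block, bx // block] = best_vec
    return mv
```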

Page 23: 2d 3d Conversion

Computational model (Continued..)

a) 2D depth map reconstruction:

• A 2D depth map can be reconstructed from image motion.

• The magnitudes of the motion vectors within each video frame are treated directly as depth values when consecutive video frames are taken with almost parallel viewing, or when they are acquired with a small baseline distance between two consecutive viewpoints (a minimal sketch of this motion-to-depth mapping follows the figure caption below).

• Such a motion-to-depth mapping might not generate correct depth magnitudes, but it preserves the correct depth order.

b) Sparse 3D scene structure reconstruction:

• A sparse 3D scene structure is represented by a set of 3D feature points in a reconstructed 3D world.

Sparse 3D scene structure and camera track determined by SfM and positioning of a virtual stereo camera.
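A minimal sketch of the motion-to-depth mapping referenced under a) above, assuming the block motion vectors come from a routine like the earlier block-matching sketch; it preserves depth ordering rather than true magnitudes.

```python
import numpy as np

def motion_to_depth(mv, frame_shape, block=8):
    """mv: (H/block) x (W/block) x 2 block motion vectors (e.g. from block matching)."""
    magnitude = np.sqrt((mv.astype(np.float64) ** 2).sum(axis=2))
    norm = magnitude / (magnitude.max() + 1e-12)       # faster motion -> closer (larger value)
    depth_blocks = (norm * 255).astype(np.uint8)
    # Upsample the block-wise depth to full frame resolution.
    full = np.kron(depth_blocks, np.ones((block, block), dtype=np.uint8))
    return full[:frame_shape[0], :frame_shape[1]]
```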

Page 24: 2d 3d Conversion

Computational model (Continued..)

Depth map enhancement

Depth fusion: a depth map estimated by block-matching motion estimation is combined with a color-segmented frame to produce an enhanced depth map.

Page 25: 2d 3d Conversion

Computational model (Continued..)

Generation of stereoscopic images

The procedures for the generation of stereoscopic images vary with the representation of the depth information.

A. Approaches Based on 2D Depth Maps:

• To generate the 3D video, DIBR is used to synthesize the second-view video based on the estimated depth map and the 2D video input.

• Depth-image-based rendering (DIBR) permits the creation of novel images, using information from depth maps, as if they were captured with a camera from a different viewpoint. The DIBR system usually consists of three steps (a minimal warping sketch is given after this list):

i) Depth map pre-processing
ii) 3D image warping
iii) Hole filling: detect holes, then fill them by averaging textures from neighbouring pixels (linear interpolation technology)
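A minimal sketch of the 3D image warping step, assuming a shift-sensor stereo setup in which the pixel disparity is f·b/Z; the focal length f, baseline b, and the near/far planes used to decode the 8-bit depth map are made-up values, and the nearest pixel wins when two pixels land on the same target (see the occlusion note on the next slide).

```python
import numpy as np

def warp_right_view(image, depth8, f=500.0, b=0.05, z_near=1.0, z_far=100.0):
    """Warp the original (left) view to a virtual right view using its 8-bit depth map."""
    h, w = depth8.shape
    g = depth8.astype(np.float64) / 255.0
    z = 1.0 / (g * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)   # decode depth (assumed mapping)
    disparity = np.round(f * b / z).astype(int)                  # larger shift for near pixels
    right = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]
            # Keep the nearest pixel when two source pixels map to the same target (occlusion).
            if 0 <= xr < w and z[y, x] < zbuf[y, xr]:
                right[y, xr] = image[y, x]
                zbuf[y, xr] = z[y, x]
    return right, zbuf        # np.inf entries in zbuf mark holes to be filled
```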

Page 26: 2d 3d Conversion

Computational model (Continued..)

Block diagram of depth-image-based rendering (DIBR): the 2D video input and a depth estimation stage feed the DIBR chain of depth map pre-processing, 3D image warping, and hole filling, producing the left and right views.

Page 27: 2d 3d Conversion

Computational model (Continued..)

Stereoscopic image synthesis with DIBR. (a) Original color image "interview," (b) corresponding depth image, and (c) rendered image without hole filling. Holes are marked in green.

Major challenges in DIBR:

• Occlusion: two different points in the image plane of the real view can be warped to the same location in the virtual view. To resolve this, the point that appears closer to the camera in the virtual view is used.

• Disocclusion: an area occluded in the real view may become visible in the virtual view. Disocclusion can be resolved by (1) hole filling and (2) depth map pre-processing (a minimal smoothing sketch for the pre-processing remedy is given below).
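Depth map pre-processing is commonly realised by smoothing the depth map before warping, so that depth discontinuities, and hence disocclusion holes, are reduced; a minimal sketch (the Gaussian filter and its strength are assumptions):

```python
from scipy.ndimage import gaussian_filter

def preprocess_depth(depth8, sigma=5.0):
    """Smooth an 8-bit depth map to shrink disocclusion holes (at the cost of some distortion)."""
    return gaussian_filter(depth8.astype(float), sigma=sigma)
```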

Page 28: 2d 3d Conversion

Computational model (Continued..)

Hole filling by interpolation:
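A minimal sketch, assuming holes are filled row by row by linearly interpolating between the nearest valid pixels on either side (one simple instance of interpolation-based hole filling):

```python
import numpy as np

def fill_holes_by_interpolation(view, hole_mask):
    """view: H x W x C float array; hole_mask: H x W bool array (True = hole)."""
    filled = view.copy()
    h, w = hole_mask.shape
    cols = np.arange(w)
    for y in range(h):
        valid = ~hole_mask[y]
        if valid.sum() < 2:
            continue                       # nothing to interpolate from on this row
        for c in range(view.shape[2]):
            # Linear interpolation across holes; row ends are clamped to the nearest valid pixel.
            filled[y, :, c] = np.interp(cols, cols[valid], view[y, valid, c])
    return filled
```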

B. Approaches Based on Sparse 3D Scene Structure:

The basic idea is to determine the transformation between the original and virtual views, based on the sparse 3D scene structure, to enable the generation of virtual views. This procedure consists of the following three steps (a minimal homography-warping sketch follows the list):

1) Setup of the virtual stereo rig
2) Determination of planar homographies for image warping
3) Virtual view generation
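For steps 2) and 3), a minimal sketch of applying a given 3x3 planar homography H by inverse mapping; how H is derived from the sparse 3D structure and the virtual stereo rig is not shown here, and the nearest-neighbour sampling is an assumption.

```python
import numpy as np

def warp_with_homography(image, H):
    """Warp an H x W (or H x W x C) image into the virtual view defined by homography H."""
    h, w = image.shape[:2]
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    # Inverse mapping: for each target pixel, find its source coordinates in the original view.
    src = Hinv @ np.stack([xs.ravel(), ys.ravel(), ones.ravel()])
    sx = np.round(src[0] / src[2]).astype(int).reshape(h, w)
    sy = np.round(src[1] / src[2]).astype(int).reshape(h, w)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(image)
    out[valid] = image[sy[valid], sx[valid]]
    return out
```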

Page 29: 2d 3d Conversion

Computational model (Continued..)

Demo: 2D video sample
Demo: converted 3D video sample

Page 30: 2d 3d Conversion

Summary of characteristics of depth representation

Page 31: 2d 3d Conversion

Challenges

Even though much research has been done to enable automatic 2D-to-3D conversion, the techniques are still far from mature. Most available products and methods are only successful in certain circumstances. The following are some key challenging issues to be solved.

• One issue that directly affects image quality is the occlusion/disocclusion problem during the generation of the stereoscopic images.

• The depth ambiguity from monocular depth cues is one issue that impacts the depth quality. The depth ambiguity originates from the violation of the principles of depth generation.

• The integration of various depth cues is another issue affecting the success of automatic 2D-to-3D video conversion.

• The real-time implementation of 2D-to-3D conversion is also a critical issue for the adoption of the proposed techniques by the general public.

Page 32: 2d 3d Conversion

Conclusion

• This paper summarized current technical advances related to the development of automatic 2D-to-3D video conversion.

• The underlying principle of the conversion is to horizontally shift the pixels of an original image to generate a new version of it.

• To enable this conversion, different approaches for the extraction of depth information from monoscopic images and the generation of stereoscopic images were reviewed.

• A number of challenging issues that have to be solved for the success of automatic 2D-to-3D video conversion were pointed out as possible research topics.

• With the development of more advanced techniques for 2D-to-3D video conversion, the vast collection of 2D material currently available will be converted into stereoscopic 3D to boost the general public interest in purchasing 3D displays and 3D-TV services.

Page 33: 2d 3d Conversion

References

• L. Zhang, C. Vazquez, and S. Knorr, "3D-TV content creation: Automatic 2D-to-3D video conversion," IEEE Transactions on Broadcasting, vol. 57, no. 2, June 2011.

• S. Yano and I. Yuyama, "Stereoscopic HDTV: Experimental system and psychological effects," J. SMPTE, vol. 100, pp. 14–18, 1991.

• 3D video conversion.pdf

• N. S. Holliman, N. A. Dodgson, and G. Favalora, "Three-dimensional display technologies: An analysis of technical performance characteristics," IEEE Trans. Broadcast., 2011, to be published.

• P. Harman, J. Flack, S. Fox, and M. Dowley, "Rapid 2D to 3D conversion," in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems IX, 2002, vol. 4660, pp. 78–86.