Unconstrained 2D to Stereoscopic 3D Image and Video Conversion using Semi-Automatic Energy Minimization Techniques
Raymond Phan, Richard Rzeszutek and Dimitrios Androutsos
Dept. of Electrical & Computer Engineering, Ryerson University – Toronto, Ontario, Canada
Thursday, October 24th, 2012 – Chinese 6 Theatre, Hollywood, California, USA
Conversion Framework – Images • Random Walks: an energy minimization scheme
– For each unlabeled pixel in the image, what is the likelihood that a random walker starting there first reaches a given user-defined label?
– Goal: classify every pixel as belonging to one of K labels – a pixel gets the label generating the highest likelihood
• Modify Random Walks to create depth maps – likelihoods are probabilities, spanning the set [0,1] – user-defined depths and solved depths span the same set – Goal is to solve for a single label: the depth!
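The idea of reading solved Random Walks probabilities directly as depths can be sketched in a few lines. This is a minimal toy, not the paper's implementation: it assumes a 1-D "image", Grady-style Gaussian edge weights, and a simple Gauss-Seidel solve of the harmonic (random-walk) system, with two user seeds at depths 0 and 1.

```python
import math

# Toy 1-D "image": 6 pixel intensities with an edge between index 2 and 3.
intensities = [0.1, 0.12, 0.15, 0.85, 0.9, 0.88]

# Grady-style edge weights: w_ij = exp(-beta * (I_i - I_j)^2).
beta = 50.0
w = [math.exp(-beta * (intensities[i] - intensities[i + 1]) ** 2)
     for i in range(len(intensities) - 1)]

# Seeds: the user brushes depth 0.0 on pixel 0 and depth 1.0 on pixel 5.
depth = [0.0, 0.5, 0.5, 0.5, 0.5, 1.0]
seeded = {0, 5}

# Random Walks solution = harmonic function: each unlabeled pixel is the
# weight-averaged depth of its neighbours (Gauss-Seidel iteration).
for _ in range(500):
    for i in range(1, 5):
        if i in seeded:
            continue
        num = w[i - 1] * depth[i - 1] + w[i] * depth[i + 1]
        den = w[i - 1] + w[i]
        depth[i] = num / den

# Solved depths stay in [0, 1] and snap sharply across the intensity edge.
print([round(d, 3) for d in depth])
```

Because the solved values are probabilities in [0,1], they can be used as depths with no rescaling, which is exactly the property the method exploits.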
Conversion Framework – Images – (2) – Use Scale-Space Random Walks in our framework
• Pyramidal sampling scheme, with Random Walks applied at each resolution; results merged via the geometric mean
– User chooses values from [0,1] and brushes over the image – 0: dark intensity/color, 1: light intensity/color – Resulting solved probabilities are used directly as depths
• Is this valid? – Yes! A psychophysical study at Tel Aviv University supports it – as long as the user is perceptually consistent in marking
– Minimizes the cardboard-cutout effect – RW generates depths not originally defined by the user – but we need to respect object boundaries
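The per-pixel geometric-mean merge of the pyramid levels can be illustrated with a toy example (the three "levels" below are illustrative probability values for the same pixels after upsampling back to full resolution, not real solver output):

```python
import math

# Solved probability maps for the same pixels at three pyramid levels,
# each already upsampled back to full resolution (values are illustrative).
levels = [
    [0.9, 0.2, 0.7],
    [0.8, 0.1, 0.6],
    [0.95, 0.15, 0.65],
]

# Per-pixel geometric mean across levels; a small epsilon guards log(0).
eps = 1e-12
merged = []
for px in zip(*levels):
    g = math.exp(sum(math.log(p + eps) for p in px) / len(px))
    merged.append(g)

print([round(m, 3) for m in merged])
```

The geometric mean keeps the result in [0,1] and is dominated by low probabilities, so a pixel must look foreground-like at every scale to receive a high value.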
• Idea: Combine Random Walks with Graph Cuts – Graph Cuts = hard segmentation
• Only produces results with the depths/labels provided by the user – GC solves the MAP-MRF problem with user labels – Consider the image as a weighted, connected graph
• Solution: solve the max-flow/min-cut of this graph
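The max-flow/min-cut step can be sketched with a tiny Edmonds-Karp solver on a 4-node graph (node 0 = source/FG terminal, node 3 = sink/BG terminal). Real Graph Cuts implementations use specialized solvers such as Boykov-Kolmogorov on pixel grids; this is only a minimal illustration of the principle:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no path left: flow is maximal, and
            return total             # the saturated edges form the min cut
        # Find the bottleneck residual capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Tiny graph: 0 = source (FG terminal), 3 = sink (BG terminal).
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))   # → 5
```

The saturated edges after termination separate source-side from sink-side pixels, which is the hard FG/BG segmentation the slide refers to.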
Conversion Framework – Images – (5) • NB: Making depth maps = a segmentation problem
– Specifically, a multi-label segmentation problem – But Graph Cuts is a binary segmentation (FG/BG) method – Graph Cuts also uses an integer labeling, not values from [0,1]
• Must modify the above to become multi-label – Each unique user-defined label is assigned an integer label – Binary segmentation is performed for each label
• FG: the label in question; BG: all of the other labels – The remaining pixels are those we wish to label
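The one-binary-pass-per-label scheme can be sketched as below. The `binary_score` function is a hypothetical stand-in: in the actual method each pass would run a full Graph Cuts FG/BG segmentation with one label as foreground and the rest as background.

```python
# Stand-in binary scorer: in the real method each pass would run Graph Cuts
# with one label as foreground and all other labels as background.
def binary_score(pixel_feature, label_feature):
    # Higher score = more likely foreground; here, simple similarity.
    return -abs(pixel_feature - label_feature)

pixel_features = [0.1, 0.15, 0.5, 0.82, 0.9]   # e.g. pixel intensities
label_features = {1: 0.1, 2: 0.5, 3: 0.9}      # user-defined integer labels

labeling = []
for f in pixel_features:
    # One binary FG/BG pass per label; keep the winning label per pixel.
    best = max(label_features, key=lambda L: binary_score(f, label_features[L]))
    labeling.append(best)

print(labeling)   # → [1, 1, 2, 3, 3]
```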
Conversion Framework – Images – (7) • But we can make use of this!
– Merge Random Walks & Graph Cuts together – Create a depth prior: an initial depth map estimate – this is essentially Graph Cuts! – We merge by feeding the depth prior as additional information into RW
• Before we merge… – Depth maps for RW and GC must be compatible with each other
• RW has depths in [0,1]; GC has integer labels
– Map the user-defined labels from RW's [0,1] set to an integer set – Perform Graph Cuts using this integer set, then map the integer set back to [0,1] – Use a lookup table to do this
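The lookup-table mapping between the two label spaces is straightforward; a minimal sketch (with illustrative brush depths) looks like this:

```python
# User-defined depths painted with the brush, in [0, 1].
user_depths = [0.0, 0.25, 0.6, 1.0]

# Build the lookup table both ways: depth -> integer label and back.
depth_to_int = {d: i for i, d in enumerate(sorted(set(user_depths)))}
int_to_depth = {i: d for d, i in depth_to_int.items()}

# Graph Cuts runs on the integer labels...
int_labels = [depth_to_int[d] for d in user_depths]
# ...and its integer result maps back to [0, 1] for the RW depth prior.
recovered = [int_to_depth[i] for i in int_labels]

print(int_labels, recovered)   # → [0, 1, 2, 3] [0.0, 0.25, 0.6, 1.0]
```

The round trip is lossless because every unique user depth gets its own integer, so the GC result can be fed back into RW in the same [0,1] depth space.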
Conversion Framework – Images – (8) • Summary of Method:
– Place user-defined strokes on the image & create the depth prior – Feed the depth prior, with the same strokes, into RW and solve – To modify the depth map, add strokes & re-run the algorithm
Conversion Framework – Video – (2) • How do we solve?
– Use block processing to preserve temporal coherency – Process blocks of frames without exhausting memory – Overlapping frames between blocks are left unused
• Each block is independent of the others
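Splitting a frame sequence into overlapping, independently solved blocks can be sketched as follows (block size and overlap are illustrative parameters, not the paper's settings):

```python
def frame_blocks(num_frames, block_size, overlap):
    """Split a frame range into overlapping blocks for block processing."""
    step = block_size - overlap
    blocks = []
    start = 0
    while start < num_frames:
        blocks.append(range(start, min(start + block_size, num_frames)))
        if start + block_size >= num_frames:
            break
        start += step
    return blocks

# 10 frames, blocks of 4 with a 1-frame overlap. Each block is solved
# independently; per the method, the overlapping frames are left unused
# in the output, but they keep the blocks temporally aligned.
blocks = frame_blocks(10, 4, 1)
print([list(b) for b in blocks])   # → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```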
• Back to labeling: How many frames do we label? – We allow the user the option of manually choosing which ones to label – However, labeling only a small set of frames results in depth artifacts
• Labeling all frames is better – For frames having no labels, moving points quickly “fade” in depth – But labeling all frames can be very time consuming!
Conversion Framework – Video – (5) • Labeling all frames is better – Part II
– Instead of manually labeling all frames, label the first frame and use a tracking algorithm to propagate the labels
– Adjust depths of labels when object depths change • Label Tracking?
– Would be very taxing to track all points in a stroke – Decompose a stroke at a particular depth into N points – Track each of these points separately – Reconstruct the stroke using spline interpolation
– Long-term tracker for unconstrained video by Kalal et al. (TLD) – Simply draw a bounding box around the object in the first frame – Afterwards, the trajectory is determined for the rest of the frames, accounting for size and illumination changes • How do we use this?
– For each point in each decomposed stroke, surround it with a bounding box and track that region
– Reconstruct each stroke using the tracked points
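The decompose-track-reconstruct loop above can be sketched as below. This is a toy: the tracked positions are simulated with a fixed shift rather than produced by the TLD tracker, and linear interpolation stands in for the spline interpolation used in the actual method.

```python
# A user stroke at one depth, decomposed into N control points (x, y).
stroke_points = [(0.0, 0.0), (2.0, 1.0), (4.0, 0.5), (6.0, 2.0)]

# Simulate each point being tracked into the next frame with a small
# shift, as the per-point bounding-box tracker would report.
tracked = [(x + 0.5, y + 0.2) for (x, y) in stroke_points]

def reconstruct(points, samples_per_segment=4):
    """Densify the tracked control points back into a stroke.

    Linear interpolation here stands in for the spline
    interpolation used in the actual method."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(points[-1])
    return out

stroke = reconstruct(tracked)
print(len(stroke), stroke[0], stroke[-1])
```

Tracking only N control points instead of every stroke pixel is what keeps the propagation tractable; the dense stroke is recovered afterwards by interpolation.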
Conclusions • Made a semi-automatic method for 2D-to-3D conversion
– Automatic methods: need error correction & pre/post-processing – Manual methods: time-consuming and expensive – Ours is a happy medium between the two – Allows the user to correct errors instantly and re-run quickly
• Works for both images and video – Merged two segmentation algorithms together
• Combines the merits of both methods for better accuracy – Video: modified a robust tracking algorithm to track user-defined labels as well as dynamically adjust depths