"Getting from Idea to Product with 3D Vision," a Presentation from Intel and MathWorks

Copyright © 2016 MathWorks, Inc. and Intel Corp

Getting from Idea to Product with 3D Vision

Avinash Nehemiah, A.G. Ramesh

May 3, 2016


What Can You Do With 3-D Vision?

3-D vision uses sensors to measure, map, locate, and reconstruct the three-dimensional structure of the environment for visual perception tasks.

• Measure objects and distances
• Locate the position of autonomous systems
• Map the environment of an autonomous system
• Reconstruct the 3-D structure of the environment


Topics for Today’s Talk

• Examples of two real systems with 3-D vision
  • Box Measurement: Measuring object dimensions with 3-D vision
  • 3-D Reconstruction and Odometry: Reconstructing a 3-D scene map and estimating the trajectory of an autonomous system
• For each system
  • Challenges faced
  • How we solved them
  • Practical tradeoffs


3-D Vision System Design: Top 3 Challenges

[Diagram: 3-D Vision System Design (Select Camera, Design Algorithm, Test Algorithm), followed by System Integration (Integrate, Test)]

• Sensor Selection Dilemma
• 3-D Vision is Never Perfect
• Very Difficult to Test


Box Measurement

Example System #1

• Measuring the dimensions of packages in 3-D using vision only
• Non-intrusive: no measuring tape, etc.
• Faster: simply point the device at the box to measure


Box Measurement

Sensor Selection Dilemma

Sensor: Single camera
• Use cases: 3-D structure from sensor or object motion; measurement (planar objects only)
• Limitations: up-to-scale reconstruction only (not metric); planar objects only
• Range: usually up to 8 meters
• Software complexity: High
• Cost: Low

Sensor: Stereo camera
• Use cases: measurement; object recognition; navigation; 3-D reconstruction
• Limitations: does not work well with homogeneous surfaces; requires good lighting conditions
• Range: usually up to 10 meters
• Software complexity: Medium
• Cost: Medium

Sensor: IR-based depth sensor (time of flight, structured light, active stereo)
• Use cases: measurement; object recognition; navigation; 3-D reconstruction; gesture recognition; augmented reality
• Limitations: does not work well outdoors; requires calibration
• Range: depends on implementation, 0.3 m to 4 m
• Software complexity: Low
• Cost: High


Box Measurement

Challenges Faced

[Figure labels: User Clicked Points; Holes in RGB-D Output]

• Point-to-point measurement with user input
  • User clicks on end points
  • Significant error if user is off by a few pixels
  • Time consuming: 4 clicks from user
• Fully automatic measurement
  • Accuracy of RGB-D output depends on surface material
  • Boxes are hard to detect
    • No distinct texture or features
    • Can be confused with other surfaces, such as the corner between wall and floor
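The sensitivity of point-to-point measurement to click position can be illustrated with pinhole back-projection. The sketch below is not the presenters' implementation: the intrinsics (FX, FY, CX, CY), the pixel coordinates, and the depth values are all made-up assumptions. It compares an exact click, a click that is 5 pixels off but still on the box, and a click that slips off the box edge so the depth sensor reports the background instead.

```python
import math

# Hypothetical pinhole intrinsics for a 640x480 depth camera (assumed values).
FX = FY = 600.0
CX, CY = 320.0, 240.0

def backproject(u, v, depth_m):
    """Map pixel (u, v) with depth in meters to a 3-D point in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)

def measure(click_a, depth_a, click_b, depth_b):
    """Distance in meters between the 3-D points behind two clicked pixels."""
    a = backproject(click_a[0], click_a[1], depth_a)
    b = backproject(click_b[0], click_b[1], depth_b)
    return math.dist(a, b)

# Both clicks land exactly on the box edge, 1.5 m from the camera.
exact = measure((200, 240), 1.5, (440, 240), 1.5)

# First click is 5 pixels off but still on the box surface.
off_on_box = measure((195, 240), 1.5, (440, 240), 1.5)

# First click slips past the edge, so the depth comes from a wall 3.0 m away.
off_the_edge = measure((195, 240), 3.0, (440, 240), 1.5)

print(f"exact: {exact:.4f} m")
print(f"5 px off, on box: {off_on_box:.4f} m")
print(f"5 px off, past edge: {off_the_edge:.4f} m")
```

Under these assumptions, a small pixel error on the surface shifts the result modestly, while a click that falls onto the background changes it drastically because the depth value jumps.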


Box Measurement

3-D Vision is Never Perfect

[Figure label: Detected Plane, Not Box Surface]

• Initial solution approach (fully automated)
  • Use 3-D plane fitting to find box surface
• Practical issues found
  • Confusion with floor, walls, and other planes
  • Failures when too many holes in point cloud
• Final solution
  • Single click from user to locate box
  • Holes in RGB-D filled using 2-D image processing techniques (flood fill)
  • Used plane intersection to find edges
• Tradeoff: Single click by user = substantially better accuracy
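The plane-intersection step can be sketched with basic vector algebra. The example below is an illustration under stated assumptions, not the presenters' code: each fitted plane is assumed to be given as a normal n and offset d with n · x = d, and the helper name plane_intersection is hypothetical. The edge direction is the cross product of the two normals, and a point on the edge follows from a closed-form expression.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def plane_intersection(n1, d1, n2, d2, eps=1e-9):
    """Intersect planes n1.x = d1 and n2.x = d2.

    Returns (point_on_line, direction), or None if the planes are
    (nearly) parallel and no single edge exists.
    """
    u = cross(n1, n2)              # direction of the shared edge
    uu = dot(u, u)
    if uu < eps:
        return None
    t1 = cross(n2, u)
    t2 = cross(u, n1)
    point = tuple((d1 * a + d2 * b) / uu for a, b in zip(t1, t2))
    return point, u

# Hypothetical box faces: top face at z = 0.3 m, side face at x = 0.5 m.
top = ((0, 0, 1), 0.3)
side = ((1, 0, 0), 0.5)
edge_point, edge_dir = plane_intersection(top[0], top[1], side[0], side[1])
print(edge_point, edge_dir)
```

Intersecting a third face with this edge would give a box corner, and distances between corners give the box dimensions.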


Box Measurement

Very Difficult to Test

• Consecutive runs of algorithm: 2 cm variance in output
• Why does this happen?
  • Plane fitting uses Random Sample Consensus (RANSAC) to account for noisy points
  • This gives a small variation in the planes fit from run to run
• Workarounds
  • Accuracy: Run algorithm multiple times on same data and remove bad measurements
  • Testing: Collect ground truth (perfect, mm-accurate measurements), automate testing to test outputs vs. ground truth

[Figure labels: #1: 270 cm x 417 cm x 174 cm; #2: 272 cm x 414 cm x 174 cm]
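The run-to-run variation and the "run multiple times" workaround can be reproduced with a small experiment. This is a minimal sketch, not the presenters' algorithm: it assumes a bare-bones RANSAC plane fit with no refinement step, synthetic points for a box top at an assumed height of 0.30 m, and a median over runs as one possible way to "remove bad measurements". All names, noise levels, and thresholds are illustrative.

```python
import random
import statistics

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def plane_from_points(p, q, r):
    """Plane (unit normal n, offset d with n.x = d) through three points."""
    u = tuple(qi - pi for pi, qi in zip(p, q))
    v = tuple(ri - pi for pi, ri in zip(p, r))
    n = cross(u, v)
    norm = dot(n, n) ** 0.5
    if norm < 1e-12:
        return None  # degenerate (collinear) sample
    n = tuple(c / norm for c in n)
    return n, dot(n, p)

def ransac_plane(points, rng, iterations=100, threshold=0.01):
    """Keep the 3-point plane with the most points within `threshold` meters."""
    best, best_inliers = None, -1
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = sum(1 for pt in points if abs(dot(n, pt) - d) < threshold)
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best

def height_at(model, x, y):
    """Height (z) of the fitted plane at position (x, y)."""
    (nx, ny, nz), d = model
    return (d - nx * x - ny * y) / nz

# Synthetic ground truth: box top at 0.30 m, 3 mm sensor noise, plus outliers.
TRUE_HEIGHT = 0.30
data_rng = random.Random(0)
points = [(data_rng.uniform(0, 0.5), data_rng.uniform(0, 0.5),
           TRUE_HEIGHT + data_rng.gauss(0, 0.003)) for _ in range(300)]
points += [(data_rng.uniform(0, 0.5), data_rng.uniform(0, 0.5),
            data_rng.uniform(0, 0.6)) for _ in range(60)]

# Same data, different random seeds: each run may return a different plane.
heights = [height_at(ransac_plane(points, random.Random(seed)), 0.25, 0.25)
           for seed in range(1, 10)]
robust_height = statistics.median(heights)

print("per-run heights (m):", [round(h, 4) for h in heights])
print("median height (m):", round(robust_height, 4))
print("error vs. ground truth (m):", round(abs(robust_height - TRUE_HEIGHT), 4))
```

Because the ground truth is known, the final comparison doubles as an automated test: the check can run on every build without someone inspecting the output by eye.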


Box Measurement

Practical Tradeoffs

• User Experience vs. Measurement Accuracy (Single Click vs. Fully Automated)
  • Single-click user input
    • Higher measurement accuracy
    • Fewer false detections
  • Fully automatic
    • Potential for false detections
• Compute Time vs. Accuracy (Run Algorithm Once vs. Multiple Times)
  • Single run
    • 2-5 cm variance from run to run
  • Multiple runs
    • Better measurement accuracy


3-D Reconstruction and Odometry

Example System #2

• Estimate the trajectory (visual odometry) of an autonomous robot and reconstruct a 3-D model of the scene
• Used by search and rescue robots to map areas and reconstruct 3-D models
• Elements of simultaneous localization and mapping (SLAM)


Sensor Selection Dilemma

• Three sensor types considered: single camera, stereo camera, RGB-D

Sensor: Single camera (calibrated)
• 3-D reconstruction: up to a projective transform (no real units)
• Cost: Low
• Software complexity: High
• Processing power required: High

Sensor: Stereo camera
• 3-D reconstruction: full 3-D reconstruction with real units
• Cost: Medium
• Software complexity: Low
• Processing power required: High

Sensor: RGB-D
• 3-D reconstruction: full 3-D reconstruction with real units; does not work well outdoors
• Cost: High
• Software complexity: Low
• Processing power required: Medium

• Our choice: two sensors (RGB-D and single color camera)
  • 3-D reconstruction (RGB-D)
    • Chose RGB-D over stereo camera because the application was indoors and compute was limited
  • Visual odometry (single color camera)
    • Better range and resolution than the color sensor on the RGB-D camera



3-D Vision is Never Perfect

• Issue
  • Small frame-to-frame errors compound while the map is being created
• Workarounds
  • Leverage other sensors
    • Use a secondary sensor (IMU) to augment the estimate from the vision algorithm
  • Vision only (increased algorithm complexity)
    • Detect loop closure (the robot passes the same point a second time)
    • Perform bundle adjustment to adjust transforms for stitching point clouds
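How small frame-to-frame errors compound, and why a loop closure helps, can be shown with a toy 2-D dead-reckoning example. Everything here is an assumption for illustration: the robot drives a 1 m square loop, each frame-to-frame estimate carries a constant 0.5 degree heading bias, and the correction simply spreads the end-point error linearly along the path. That correction is a deliberately crude stand-in, not bundle adjustment.

```python
import math

def integrate(steps, heading_bias_rad=0.0):
    """Chain frame-to-frame motions (distance, turn) into a 2-D path."""
    x = y = theta = 0.0
    path = [(x, y)]
    for dist, turn in steps:
        theta += turn + heading_bias_rad   # bias models a small per-frame error
        x += dist * math.cos(theta)
        y += dist * math.sin(theta)
        path.append((x, y))
    return path

# Square loop: 4 sides, 10 steps of 0.1 m each, 90-degree turn at each corner.
steps = []
for side in range(4):
    for i in range(10):
        turn = math.pi / 2 if (i == 0 and side > 0) else 0.0
        steps.append((0.1, turn))

true_path = integrate(steps)
drift_path = integrate(steps, heading_bias_rad=math.radians(0.5))

first_step_error = math.dist(true_path[1], drift_path[1])
final_error = math.dist(true_path[-1], drift_path[-1])

# Loop closure: the robot knows it is back at the start, so the end-point
# error is distributed linearly over the path (crude stand-in for a real solver).
n = len(drift_path) - 1
ex = drift_path[-1][0] - drift_path[0][0]
ey = drift_path[-1][1] - drift_path[0][1]
corrected = [(px - ex * i / n, py - ey * i / n)
             for i, (px, py) in enumerate(drift_path)]

print(f"error after 1 step:   {first_step_error:.4f} m")
print(f"error after the loop: {final_error:.4f} m")
print(f"end-point error after correction: "
      f"{math.dist(corrected[-1], corrected[0]):.4f} m")
```

The per-step error is tiny, yet the position error at the end of the loop is orders of magnitude larger, which is the compounding effect the slide describes.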



Very Difficult to Test

• Issues faced
  • Different runs with identical data had different results
  • Very difficult to visualize and test intermediate steps
• Why does this happen?
  • Randomness in frame-to-frame pose estimation due to RANSAC
• Workarounds
  • Simulate against perfect synthetic data to establish a “known good”, and use this for initial development
  • Test algorithms with random inputs to make sure a good result wasn’t “lucky”
  • Use bundle adjustment to refine estimates
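The first two workarounds can be sketched as a tiny test harness. This is an illustration under stated assumptions, not the presenters' test suite: the pose estimator is a closed-form 2-D rigid alignment used as a stand-in for the real frame-to-frame estimator, and the true motion, noise level, and tolerances are made-up values. Stage one checks the estimator against perfect synthetic data; stage two repeats the check over many random inputs so that a single lucky result cannot pass.

```python
import math
import random

def transform(points, theta, tx, ty):
    """Apply a 2-D rigid motion (rotation theta, translation tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def estimate_rigid_2d(src, dst):
    """Closed-form least-squares rigid motion mapping src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        num += ax * by - ay * bx
        den += ax * bx + ay * by
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

TRUE_MOTION = (0.3, 0.5, -0.2)   # assumed ground truth: radians, meters, meters

def make_scene(rng, count=50):
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(count)]

# Stage 1: perfect synthetic data establishes the "known good".
scene = make_scene(random.Random(0))
perfect = estimate_rigid_2d(scene, transform(scene, *TRUE_MOTION))
perfect_error = max(abs(e - t) for e, t in zip(perfect, TRUE_MOTION))

# Stage 2: many random scenes with 5 mm noise guard against a lucky pass.
worst_angle_error = 0.0
for seed in range(1, 21):
    rng = random.Random(seed)
    src = make_scene(rng)
    dst = [(x + rng.gauss(0, 0.005), y + rng.gauss(0, 0.005))
           for x, y in transform(src, *TRUE_MOTION)]
    theta, _, _ = estimate_rigid_2d(src, dst)
    worst_angle_error = max(worst_angle_error, abs(theta - TRUE_MOTION[0]))

print(f"max error on perfect data: {perfect_error:.2e}")
print(f"worst angle error over 20 noisy trials: {worst_angle_error:.4f} rad")
```

Fixing the seeds keeps the test repeatable even though the inputs are random, which sidesteps the "different runs, different results" problem during development.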


Practical Tradeoffs

• Tradeoff: Cost vs. Computation Time
• Factor: Additional Sensor vs. Increased Algorithm Complexity
  • Additional sensor (IMU)
    • Very accurate transform estimation for stitching
    • Less computation since load on vision system is reduced
    • Difficult to align and synchronize multiple sensors
  • Increased algorithm complexity
    • Cheaper since no additional sensor required
    • Slower and more computationally expensive


Lessons Learned: 3-D Vision

• 3-D vision is never perfect
  • Leverage other sensors
  • Clever heuristics can make all the difference
• Very difficult to test
  • Establish “ground truth” and test vs. ground truth
  • Use perfect synthetic data to test against
• Sensor selection dilemma
  • Consider the software complexity vs. cost tradeoff


Resources

• Books
  • Multiple View Geometry in Computer Vision, Hartley and Zisserman
• Links
  • 3-D Point Cloud Processing
  • Stereo Vision
  • Structure from Motion
  • Measuring Planar Objects with a Calibrated Camera
• Software
  • Computer Vision System Toolbox
• Hardware
  • Intel® RealSense™

http://sharepoint.mathworks.com/salesservice/aeg/cre/Lists/ItemManager/DispForm.aspx?ID=2268



http://www.mathworks.com/products/computer-vision/features.html

http://www.mathworks.com/help/vision/getting-started-with-computer-vision-system-toolbox.html

http://www.mathworks.com/help/vision/examples/measuring-planar-objects-with-a-calibrated-camera.html

http://www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html
