
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligence, by Jeff Bier


Presentation HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligence, by Jeff Bier at the AMD Developer Summit (APU13) November 11-13, 2013.
Transcript
Page 1: HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligence, by Jeff Bier

Jeff Bier, Founder, Embedded Vision Alliance / President, BDTI

AMD Developer Summit, November 13, 2013

Creating Smarter Applications and Systems Through Visual Intelligence

Page 2

“Computer vision is the science and technology of machines that see, where ‘see’ means that the machine is able to extract information from an image that is necessary to solve some task.”

– Adapted from en.wikipedia.org/wiki/Computer_vision

Computer vision is distinct from other types of video and image processing: it involves extracting meaning from visual inputs.

We use the term “embedded vision” to refer to the practical deployment of computer vision into a wide range of products and applications:

• Industrial, automotive, medical, defense, retail, gaming, consumer electronics, security, education, …
• In embedded systems, mobile devices, PCs and the cloud

Copyright © 2013 Embedded Vision Alliance

Computer Vision / Embedded Vision

Page 3

The proliferation of embedded vision is enabled by:

• Hardware: processors, sensors, etc.

• Software: tools, algorithms, libraries, APIs


Why is Embedded Vision Proliferating Now?

Page 4

Embedded vision upgrades what machines know about the physical world, and how they interact with it.

This enables dramatic improvements in existing products, and the creation of new types of products.

Embedded vision can:
• Boost efficiency: improving throughput and quality
• Enhance safety: detecting danger and preventing accidents
• Simplify usability: making the “user interface” disappear
• Fuel innovation: enabling us to do things that were previously impossible

What Does Embedded Vision Enable?

Page 5


Embedded Vision: The Software-Defined Sensor

Page 6

Established (or rapidly growing) embedded vision markets:
• Factory automation
• Agriculture
• Video game consoles
• Military
• Automotive safety
• Augmented reality for retail (in store, at home, mobile)
• Public safety and security

Emerging embedded vision markets:
• Building automation
• Toys and games
• User interfaces (mobile devices, cars, consumer electronics)
• Robots for many uses and settings
• Education
• Clinical and home health care
• Field service (e.g., equipment repair)
• Aids for the visually impaired

Example Embedded Vision Application Areas

Page 7

Augmenting Human Capabilities: OrCam Visual Interpreter for the Sight Impaired

www.youtube.com/watch?v=ykDDxWbt5Nw

Page 8

• Infinitely varying inputs in many applications: uncontrolled lighting, orientation, motion, occlusion
• Complex, multi-layered algorithms
  • Lack of analytical models means exhaustive experimentation is required
  • Numerous algorithms and algorithm parameters to choose from
• Most vision applications involve high data rates and complex algorithms → high computation requirements
• For vision to be widely deployed, it must be implemented in many designs that are constrained in cost, size, and power consumption
• Most product creators lack experience with embedded vision

What Makes Embedded Vision Hard?

Page 9

A typical embedded vision pipeline:

Image Acquisition → Image Pre-processing → Segmentation → Feature Detection → Object Analysis → Heuristics or Expert System

• Image acquisition and pre-processing: ultra-high data rates; low to medium algorithm complexity
• Segmentation and feature detection: high to medium data rates; medium algorithm complexity
• Object analysis and heuristics/expert system: low data rates; high algorithm complexity

Typical total compute load: ~10-100 billion operations/second. Loads can vary dramatically with pixel rate and algorithm complexity.

How Does Embedded Vision Work?

Page 10

Detect lane markings on the road and warn when the car veers out of the lane.

A simplified solution:
• Acquire road images from a front-facing camera (often with a fish-eye lens)
• Apply pre-processing (primarily lens correction)
• Perform edge detection
• Detect lines in the image with the Hough transform
• Determine which lines are lane markings
• Track lane markings and estimate positions in the next frame
• Assess the car’s trajectory with respect to the lane and warn the driver in case of lane departure

Lane Departure Warning: The Problem

Page 11

• Lenses (especially inexpensive ones) tend to distort images
• Straight lines become curves
• Distorted images tend to thwart vision algorithms

Lane-Departure Warning: Lens Distortion

Section based on “Lens Distortion Correction” by Shehrzad Qureshi; used with permission. Image courtesy of and © Luis Alvarez


Page 12

• A typical solution is to use a known test pattern to quantify the lens distortion and generate a set of warping coefficients that enable the distortion to be (approximately) reversed
• The good news: the calibration procedure is performed once
• The bad news: the resulting coefficients then must be used to “undistort” (warp) each frame before further processing
• Warping requires interpolating between pixels

Lens Distortion: A Solution

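The warp-and-interpolate step above can be sketched in NumPy. This is a minimal sketch assuming a one-coefficient radial distortion model and a grayscale image; the function name and the k1-only model are illustrative, not from the presentation.

```python
import numpy as np

def undistort(img, k1, cx, cy):
    """Reverse simple radial lens distortion (single-coefficient model).

    For each pixel of the corrected output, compute where it maps to in
    the distorted input (x_d = x_u * (1 + k1*r^2), r measured from the
    optical centre) and sample that location with bilinear interpolation.
    The warping coefficient k1 would come from a one-time calibration
    against a known test pattern.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    xu, yu = xs - cx, ys - cy                   # centred output coordinates
    r2 = xu * xu + yu * yu
    scale = 1.0 + k1 * r2                       # radial distortion factor
    xd, yd = xu * scale + cx, yu * scale + cy   # where to sample in the input

    # Bilinear interpolation between the four surrounding input pixels
    x0 = np.clip(np.floor(xd).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(yd).astype(int), 0, h - 2)
    fx = np.clip(xd - x0, 0.0, 1.0)
    fy = np.clip(yd - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```

With k1 = 0 the mapping is the identity, which is a convenient sanity check for the interpolation.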

Page 13

In our lane-departure warning system, edge detection is the first step in detecting lines (which may correspond to lane markings).

• Edge detection is a well-understood technique
  • Primarily comprises 2D FIR filtering
  • Computationally intensive pixel processing
• Many algorithms are available (Canny, Sobel, etc.)
  • Some algorithms are highly data-parallel
  • Others (e.g. Canny) include steps such as edge-tracing that reduce data parallelism

Lane Departure Warning: Edge Detection

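The 2D FIR filtering that dominates this stage can be sketched as a Sobel gradient-magnitude filter. A minimal NumPy version, assuming a grayscale image (a real implementation would pad borders and likely run on a DSP or GPU):

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map using 3x3 Sobel filters.

    Plain 2D FIR filtering: apply horizontal and vertical Sobel kernels
    and combine the responses. Every output pixel depends only on its
    3x3 neighbourhood, which is why this step is so data-parallel.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):                 # small fixed kernel: accumulate taps
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)            # gradient magnitude
```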

Page 14

Edge thinning removes spurious edge pixels.
• Improves the output of the Hough transform
• Often performed in multiple passes over the frame
• Also useful in other applications

Lane-Departure Warning: Edge Thinning

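A simplified thinning pass can be sketched as non-maximum suppression across image columns. This is a deliberate simplification assuming roughly vertical edges (as lane markings tend to be in a forward-facing view); proper thinning suppresses along the local gradient direction and, as noted above, typically takes several passes:

```python
import numpy as np

def thin_horizontal(mag, threshold=1.0):
    """One simplified thinning pass: non-maximum suppression across columns.

    Keeps an edge pixel only if its gradient magnitude is at least as
    large as both horizontal neighbours, so a thick vertical edge
    collapses toward a single-pixel-wide response.
    """
    left = np.roll(mag, 1, axis=1)
    right = np.roll(mag, -1, axis=1)
    keep = (mag >= left) & (mag >= right) & (mag >= threshold)
    keep[:, 0] = False                 # np.roll wraps at the borders,
    keep[:, -1] = False                # so drop the border columns
    return keep
```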

Page 15

• The Hough transform examines the edge pixels found in the image and detects predefined shapes (typically lines or circles)
• In a lane-departure warning system, the Hough transform is used to detect lines, which may correspond to lane markings on the road

Lane-Departure Warning: Hough Transform

[Figure: original image → edges detected → lines detected]


Page 16

Similar to a histogram:
• Each detected edge pixel is a “vote” for all of the lines that pass through the pixel’s position in the frame
• Lines with the most “votes” are detected in the image
• Uses a quantized line-parameter space (e.g. angle and distance from origin)
• Must compute all possible line-parameter values for each detected edge pixel

Lane-Departure Warning: Hough Transform

[Figure: every possible line through an edge pixel gets one vote when the pixel is processed; a line passing through many edge pixels gets many votes]

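The voting scheme above can be sketched directly. A minimal NumPy accumulator, assuming a binary edge mask and the standard (theta, rho) line parameterization rho = x·cos(theta) + y·sin(theta):

```python
import numpy as np

def hough_lines(edge_mask, n_theta=180):
    """Hough-transform voting over a quantized (theta, rho) parameter space.

    Each edge pixel votes once per quantized angle theta: the line it
    would lie on at that angle has rho = x*cos(theta) + y*sin(theta).
    Peaks in the accumulator correspond to lines supported by many
    edge pixels.
    """
    h, w = edge_mask.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(h, w)))           # max possible |rho|
    acc = np.zeros((n_theta, 2 * diag + 1), dtype=np.int64)
    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rhos + diag] += 1  # one vote per angle
    return acc, thetas, diag
```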

Page 17

• Filter the detected lines to discard lines that are not likely to be lane markings
  • Find start and end points of line segments
  • Filter by length, position, and angle
  • Filter by line color and background color
  • Additional heuristics may apply (e.g. dashed or solid lines are likely to be lane markings, but lines with uneven gaps are not)
• Possibly classify the lines as lane markings or other lane indication (e.g. curb)

Lane-Departure Warning: Detecting Lane Markings

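A first-cut length/angle filter for candidate segments might look like the following sketch. The function name and thresholds are illustrative placeholders, not values from the presentation; a real system would also check position, colour, and dash pattern as listed above:

```python
import numpy as np

def plausible_lane_marking(seg, min_len=40.0, min_angle_deg=20.0):
    """Cheap geometric filter for candidate lane-marking segments.

    seg is ((x0, y0), (x1, y1)) in image coordinates. Discards segments
    that are too short, or too close to horizontal to be a lane edge as
    seen from a forward-facing camera.
    """
    (x0, y0), (x1, y1) = seg
    dx, dy = x1 - x0, y1 - y0
    length = np.hypot(dx, dy)
    angle = abs(np.degrees(np.arctan2(dy, dx)))   # 0 deg = horizontal
    angle = min(angle, 180.0 - angle)             # fold into [0, 90]
    return bool(length >= min_len and angle >= min_angle_deg)
```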

Page 18

• Tracking lane markers from each frame to the next:
  • Helps eliminate spurious errors
  • Provides a measure of the car’s trajectory relative to the lane
• Typically done using a predictive filter:
  • Predict new positions of lane markings in the current frame
  • Match the lane markings to the predicted positions and compute the prediction error
  • Update the predictor for future frames
• Kalman filters are often used for prediction in vision applications
  • Theoretically these are the fastest-converging filters
  • Found in OpenCV
• Simpler filters are often sufficient
• Very low computational demand due to low data rates
  • E.g. 2 lane marking positions × 30 fps = 60 samples per second

Lane-Departure Warning: Tracking Lane Markings

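The predict/match/update loop can be sketched as a one-dimensional constant-velocity Kalman filter per lane marking. The noise parameters (q, r) here are illustrative, not tuned values:

```python
import numpy as np

def kalman_track(measurements, q=1e-3, r=1e-1):
    """Constant-velocity Kalman filter for one lane-marking position.

    State is [position, velocity] per frame: each iteration predicts
    the marking's position in the current frame, computes the
    prediction error against the measurement, and updates the state.
    Returns the filtered positions.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # x' = x + v (one-frame step)
    H = np.array([[1.0, 0.0]])                # we only measure position
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[measurements[0]], [0.0]])  # initial state from first sample
    P = np.eye(2)
    out = []
    for z in measurements:
        x = F @ x                             # predict
        P = F @ P @ F.T + Q
        y = np.array([[z]]) - H @ x           # innovation (prediction error)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x = x + K @ y                         # correct
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

On clean linearly-moving input the filter locks onto the constant velocity within a few frames, which is exactly the low-rate, low-cost behaviour the slide describes.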

Page 19

• The basic algorithm presented is not robust and may need significant enhancements for real-world conditions:
  • Must work properly on curved roads
  • Must handle diverse conditions (e.g. glare on a wet road at night)
  • Integration with other automotive safety functions

Lane-Departure Warning: Challenges


Page 20

• Nearly all embedded vision systems use a CPU, but using only CPUs is often impractical due to power, size and/or cost
• Many other processor types are used: GPUs, DSPs, FPGAs, many-core arrays, specialized datapath engines, etc.
• But this can create big challenges for developers:
  • Figuring out how to partition the workload
  • Complex programming models
  • Multiple languages and tool flows

Heterogeneous Workloads Often Map Most Efficiently to Heterogeneous Architectures

[Figure: the embedded vision pipeline (image acquisition, pre-processing, segmentation, feature detection, object analysis, heuristics/expert system) mapped onto a heterogeneous SoC with CPU, GPU, DSP and ISPs, alongside connectivity, display, navigation, sensor and multimedia blocks. Source: Qualcomm]

Page 21

The Embedded Vision Alliance (www.Embedded-Vision.com) is a partnership of 35 leading embedded vision technology and services suppliers.

Mission: inspire and empower product creators (including mobile app developers) to incorporate visual intelligence into their products.

The Alliance provides free, high-quality technical educational resources for engineers:
• The Embedded Vision Academy offers in-depth tutorial articles, video “chalk talks,” code examples, tools and discussion forums
• The Embedded Vision Insights newsletter delivers news, Alliance updates and new resources

Companies interested in becoming sponsoring members of the Alliance should contact [email protected].

Helping Product Creators Harness Embedded Vision

Page 22

• “Embedded vision” refers to practical systems that extract meaning from visual inputs
• Embedded vision upgrades what machines know about the physical world, and how they interact with it, enabling dramatic improvements in existing products and the creation of new types of products
• To date, embedded vision has largely been limited to low-profile applications like surveillance and industrial inspection
• Thanks to the emergence of high-performance, low-cost, energy-efficient programmable processors, this is changing
• Heterogeneous processors are often best for embedded vision
  • HSA increases flexibility and simplifies programming
• The Embedded Vision Alliance provides a wide range of resources to help product creators incorporate visual intelligence into their products

Conclusions

Page 24


Thank You


Visit us at www.Embedded-Vision.com

Page 25

• Alliance Member companies position themselves as leaders in front of thousands of product creators who visit the Alliance web site each month
• Multiple Embedded Vision Summit conferences each year introduce Member companies and their products to prospective customers
• Our Member companies meet quarterly to develop business partnerships and gain insights into embedded vision markets and technology trends
• We secure frequent press coverage on embedded vision topics, gaining exposure for our members as thought leaders
• Companies interested in joining the Alliance may contact us via [email protected]

Vision Technology and Service Suppliers: Join the Alliance