7/30/2019 CVIU Lecture 1
1/55
1
ENGN8530: Computer Vision and Image Understanding: Theories and Research
Topic 1: Introduction to Computer Vision and Image Understanding
Dr Chunhua Shen
Dr Roland Goecke
VISTA / NICTA & RSISE, ANU
What is Computer Vision?
Vision is a process that produces, from images of the external world, a description that is useful to the viewer and not cluttered with irrelevant information. (Marr and Nishihara, 1978)
Computer vision is the science and technology of machines that see. Computer vision is concerned with the theory and technology for building artificial systems that obtain information from images or multi-dimensional data. (Wikipedia)
Reference:
D. Marr and K. Nishihara, Representation and recognition of the spatial organisation of three-dimensional shapes, Proc. Royal Society, B-200, 1978, pp. 269-294.
What is Computer Vision? (2)
Sometimes seen as complementary to biological vision.
In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes.
Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware.
What is Computer Vision? (3)
Applications:
Controlling processes (robots, vehicles)
Detecting events (visual surveillance)
Organising information (indexing databases of images / videos)
Modelling objects or environments (medical image analysis)
Interaction (HCI)
Source: Wikipedia
Image Understanding
Computer vision goes hand in hand with image understanding.
What information do we need to know to understand the scene?
How can we make decisions about what objects are present, their shape, their positioning?
Source: CMU Computer Vision course
Image Understanding (2)
Many different questions and approaches to solve computer vision / image understanding problems:
Can we build useful machines to solve specific (and limited) vision problems?
Is there anything special about the environment which makes vision possible?
Can we build a model of the world / scene from 2D images?
Many different fields are involved, e.g. computer science, AI, neuroscience, psychology, engineering, philosophy, art.
Sub-areas of CVIU
Scene reconstruction
Event detection
Object tracking
Object recognition
Object structure recovery
Ego-motion
Multi-view geometry
Indexing of image / video databases
Scene Reconstruction
From stereo
From multiple views
Event Detection
Source: MERL
Source: Roland Goecke
Object Tracking
Source: Roland Goecke
Object Recognition
Query
Result
Database
Source: David Nister
Object Structure Recovery
Reference:
A.D. Worrall, J.M. Ferryman, G.D. Sullivan and K.D. Baker, Pose and structure recovery using active models, Proc. 6th British Machine Vision Conference, Vol. 1, Birmingham, UK, pp. 137-146.
Ego-motion
Estimated camera path
Optical flow
Source: Roland Goecke
Multi-View Geometry
Epipolar geometry
Source: Richard Hartley, Andrew Zisserman
Indexing and Retrieval
Figure: query image and ranked retrieval results from the database
Reference: J. Sivic and A. Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, Proc. International Conference on Computer Vision, Nice, France, 2003, pp. 1470-1477.
The Default Approach (Marr)
Work bottom-up from the image to a 3D world model via a hierarchy of representations, as follows:
Pixel array: the image
Raw primal sketch: edge, corner, etc. representation
Primal sketch: structural information, i.e. groupings, segmentations, etc.
2½-D sketch: depth information in an image-centred view
3-D world model
Reference:
D. Marr, Vision, Freeman, 1982.
The Default Approach (2)
Image sensor (visible, infra-red, radar) → Image capture → Digitisation → Image processing → Feature detection (edges, corners, regions) → Feature grouping → Characterisation of parts → Object recognition
What is in an Image?
An image is an array/matrix of values (picture elements = pixels) on a plane which describe the world from the point of view of the observer.
Because of the line-of-sight effect, this is a 2D representation of the 3D world.
The meaning of the pixels depends on the sensors used for their acquisition.
Source: Antonio Robles-Kelly
Imaging Sensors
The information seen by the imaging device is digitised and stored as pixel values.
Two important quantities of imaging sensors are:
Spatial resolution: How many pixels are there? (image size)
Signal resolution: How many values per pixel?
There are many different types of sensors:
Optical: CCDs, CMOS, photodiodes, photomultipliers, photoresistors
Infrared: Bolometers
Others: Range sensors (laser), Synthetic Aperture Radar (SAR), Positron Emission Tomography (PET), Computed (Axial) Tomography (CAT/CT), Magnetic Resonance Imaging (MRI)
Electro-Magnetic Spectrum
Figure: spectrum bands — UV, Visible (0.4 µm), NIR (to 1.0 µm), SWIR (1.7–2.5 µm), MWIR (3.0–5.0 µm), LWIR (8.0–14.0 µm)
The human eye can see light between 400 and 700 nm.
Charge-Coupled Device (CCD)
CCDs (Charge-Coupled Devices) were invented in 1969 by Willard Boyle and George Smith at AT&T.
They are composed of an array of capacitors which are sensitive to light.
More modern devices are based upon photodiodes.
Source: Wikipedia
CCD (2)
Generally, the light-sensitive unit of construction is arranged in an array whose topology is a lattice.
Not always true, e.g. log-polar CCDs.
Colour CCDs:
Bayer filter: 1x Red, 1x Blue, 2x Green, because the human eye is more sensitive to green
RGBE filter: 1x Red, 1x Blue, 1x Green, 1x Emerald (Cyan)
Source: Wikipedia
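As a minimal sketch of how a Bayer filter samples colour, the hypothetical helper below builds an RGGB mosaic from a full RGB image, keeping one red, two green and one blue value per 2x2 cell. The function name and the RGGB layout are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGB image through an RGGB Bayer pattern.

    Each 2x2 cell keeps one red, two green and one blue value,
    mirroring the eye's higher sensitivity to green.
    """
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at even row, even column
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at even row, odd column
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at odd row, even column
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at odd row, odd column
    return mosaic
```

A real camera pipeline would follow this with demosaicing (interpolating the two missing channels at each pixel), which is omitted here.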
Bolometers
Invented by the astronomer Samuel Pierpont Langley in 1878.
It is a device comprising an "absorber" in contact with a heat sink through an insulator. The sink can be viewed as a reference for the absorber temperature, which is raised by the power of the incident electromagnetic wave.
Source: Los Alamos National Laboratory
Microbolometer
The microbolometer, a particular kind of bolometer, is the basis for thermal cameras.
It is a grid of vanadium oxide or amorphous silicon heat sensors atop a corresponding grid of silicon.
IR radiation from a specific range of wavelengths strikes the vanadium oxide and changes its electrical resistance. This resistance change is measured and processed into temperatures which can be represented graphically.
Source: Roland Goecke
Synthetic Aperture Radar
SAR is an active sensing technique:
The active sensor transmits radio waves
An antenna picks up the reflections
For a conventional radar, the footprint is governed by the size of the antenna (aperture).
SAR creates a synthetic aperture and delivers a 2D image. One dimension is the range (cross track), whereas the other one is the azimuth (along track).
Sonar and ultrasound work on the same principles but at different wavelengths.
SAR (2)
Figure: SAR imaging geometry — radar track, nadir track, range (cross track), azimuth (along track)
RADAR = Radio Detection and Ranging
Nadir = the opposite of zenith
SAR image of Venus
Source: Wikipedia
Positron Emission Tomography
Active sensing technique; PET is based on measuring emitted radiation.
PET is a nuclear medicine imaging technique which uses radiation from a radio-isotope introduced into the target.
PET produces a 3D image or map of functional processes in the body.
Source: Wikipedia
Magnetic Resonance Imaging
Active sensing technique; MRI is also based on measuring emitted radiation.
MRI stimulates the emission of radiation by aligning the spins of water molecules, making use of a high-energy magnetic field (several Tesla!).
Good for showing soft tissue
Not good for showing bones
Figure: MRI scan; Magnetic Resonance Angiography
Source: Wikipedia
Functional MRI
Functional MRI (fMRI) measures signal changes in the brain that are due to changing neural activity.
Increases in neural activity cause changes in the MR signal due to the change in the ratio of oxygenated to deoxygenated haemoglobin. Deoxygenated haemoglobin attenuates the MR signal.
fMRI of head: highlighted areas show the primary visual cortex
Source: Wikipedia
Computed (Axial) Tomography
Employs a set of axially acquired X-ray images to recover a 3D representation of the object.
Originally, the images were in axial or transverse planes, but modern CT scanners deliver volumetric data.
Digital geometry processing is used to generate a 3D image of the internals of an object from a large series of 2D X-ray images taken around a single axis of rotation.
CT scan of head
Source: Wikipedia
CAT/CT
Good for showing bones
Not good for showing soft tissue
Modern diagnostic software
Camera Geometry
The aperture allows light to enter the camera.
The image plane is where the image is formed.
The focal length is the distance between the aperture and the image plane.
The optical axis passes through the centre of the aperture and is perpendicular to it.
Figure: aperture (diameter d), image plane with coordinates x', y', focal length f, optical axis z.
Camera Geometry (2)
Figure: object at distance z and height x projects to image height x' at focal length f.
By similar triangles: x'/f = x/z, hence x' = f x / z
For a small angle θ: x' = f tan θ ≈ f θ
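The similar-triangles relation can be sketched as a one-line function (the function name and the example numbers are illustrative):

```python
def project(x, z, f):
    """Pinhole projection: image coordinate x' = f * x / z."""
    return f * x / z

# A point 1.5 m off-axis at a distance of 3 m, with f = 50 mm (0.05 m):
print(project(1.5, 3.0, 0.05))  # 0.025, i.e. 25 mm on the image plane
```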
Camera Geometry (3)
Figure: an object spanning from xb (bottom) to xt (top) at distance z images to the span x'b to x't.
Using the formula from the previous slide:
x't = f xt / z and x'b = f xb / z
Hence, size transforms as:
x't − x'b = f (xt − xb) / z
and, for an object subtending angle θ, x' = 2 f tan(θ/2).
Camera Geometry (4)
Figure: close object versus distant object.
Rays that pass through the camera aperture spread out and do not make a sharp point on the image.
These rays need to be focussed to make a sharp point in the image.
The rays from close objects diverge more than those from distant objects.
For very distant objects, the rays are effectively parallel.
Aperture and Resolution
Light diffracts as it passes through the aperture.
A point in the scene spreads out into a blob in the image (a fundamental limit on image sharpness).
The size of the Airy disk (and best resolution) is given by the Rayleigh criterion:
θ_min = 1.22 λ / d and R_min = 1.22 λ f / d
where λ is the wavelength of the light and d is the aperture diameter.
Figure: circular aperture and Airy disk; square aperture; separate points.
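A quick numeric check of the Rayleigh criterion (the function name and the 550 nm / 5 mm / 50 mm example values are illustrative assumptions):

```python
def airy_disk_radius(wavelength, aperture, focal_length):
    """Rayleigh criterion on the image plane: R_min = 1.22 * lambda * f / d."""
    return 1.22 * wavelength * focal_length / aperture

# Green light (550 nm) through a 5 mm aperture, 50 mm focal length:
print(airy_disk_radius(550e-9, 5e-3, 50e-3))  # about 6.7e-6 m (6.7 micrometres)
```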
Resolution
The resolution of a camera is the minimum separation between two points such that they appear separately on the image plane.
Since distant objects appear smaller and closer together, the resolution varies with respect to the distance.
The angle between separable objects does not vary with distance → angular resolution.
The distance on the image plane does not vary → image-plane resolution.
Camera Models
Pinhole camera
Camera with lenses
Pinhole Camera
Advantages:
No distortion of image
Depth of field from a few cm to infinity
Wide angular field
Works with ultra-violet and X-rays
Disadvantages:
Very limited light gathering
Poor resolution
Pinhole Camera (2)
Simplest camera.
The pinhole (aperture d) must be small to get a sharp image.
But we need a large pinhole to get enough light!
Pinhole Camera (3)
For distant objects, the geometric limit is R = d.
The diffraction limit is R = 1.22 λ f / d.
The best resolution occurs when these two are equal:
d = 1.22 λ f* / d, i.e. f* = d² / (1.22 λ)
where f* is the optimal focal length.
Figure: resolution R versus focal length f — the geometric limit (constant, R = d) and the diffraction limit (growing with f) cross at f*.
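The optimal-focal-length condition can be checked numerically; the 550 nm wavelength below is an illustrative assumption for visible light, and the result matches the f* = 37 cm example quoted later in the deck:

```python
def optimal_focal_length(aperture, wavelength):
    """Optimal pinhole focal length, where the geometric limit (R = d)
    equals the diffraction limit (R = 1.22 * lambda * f / d):
    f* = d**2 / (1.22 * lambda)."""
    return aperture**2 / (1.22 * wavelength)

# 0.5 mm pinhole, green light (550 nm):
print(optimal_focal_length(0.5e-3, 550e-9))  # about 0.37 m
```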
Pinhole Camera (4)
Figure: geometric limit versus diffraction limit — a longer wavelength or a smaller aperture raises the diffraction-limited blur.
Cameras with Lenses
For better light-gathering capabilities, we need to increase the aperture.
A lens removes the geometric limit on resolution, since it focuses all light entering through the aperture on the same point on the image.
Figure: lens with aperture d and focal length f; the pinhole path is shown for comparison.
Cameras with Lenses (2)
We can have apertures as large as we like.
The price to pay: chromatic and spherical aberration.
The image-plane resolution of a lens-based camera is the diffraction limit of the aperture: R = 1.22 λ f / d
The larger the aperture, the better the resolution.
The angular resolution is still θ = 1.22 λ / d.
Camera Resolution Examples
Pinhole camera, 0.5 mm pinhole:
Optimal focal length f* = 37 cm
θ = 4.6', equivalent to 1 mm at 75 cm
For a 35 mm lens camera and visible light:
θ = 3.9'', 1 mm at 52 m
Focal length depends on the lens, but typically
Illumination
The amount of light entering the camera is proportional to the area of the lens (π d² / 4).
The area covered by the image is proportional to f².
So, the brightness of the image is proportional to d² / f², i.e. dependent on the focal ratio f/d.
Brightness is controlled by a moveable aperture which changes d.
Referred to by a sequence of f-stops; f:1 is fully open, and each successive f-stop halves the brightness (so the aperture diameter is reduced by √2): f:1.4, f:2, f:2.8, f:4, f:5.6
Absorption and Reflection
Reflection
Transmission
Absorption
Reflected + absorbed + transmitted energy = incident light energy
All of these are object- (material-, surface-) dependent!
The BSDF
Bidirectional Scattering Distribution Function
Describes the way in which light is scattered by a surface.
BSDF = BRDF + BSSRDF + BTDF
BRDF: Bidirectional reflectance distribution function
BSSRDF: Bidirectional surface scattering reflectance distribution function (incl. subsurface scattering)
BTDF: Bidirectional transmittance distribution function
Source: Wikipedia
The BRDF
It describes the reflectance of an object as a function of the illumination, viewing geometry and wavelength.
It is given by the ratio of reflected radiance (reflected flux per unit area per unit solid angle) to incident irradiance (incident flux per unit area).
Reference:
F. Nicodemus, "Reflectance nomenclature and directional reflectance and emissivity," Appl. Opt., Vol. 9, 1970, pp. 1474-1475.
The BRDF (2)
The modelling of the lighting conditions in the scene is of pivotal importance for the acquisition and processing of digital imagery.
The radiance function can be decomposed into a linear combination of ambient, diffuse and specular components.
Recovering the radiance function from a single image is an underconstrained problem.
The BRDF (3)
In general, the BRDF has the form
f_r(ω_i, ω_o) = dL_r(ω_o) / dE_i(ω_i)
The function depends on:
Incoming and outgoing angle
Incoming and outgoing wavelength
Incoming and outgoing polarisation
Incoming and outgoing position (subsurface scattering)
Delay between the incoming and outgoing light rays
Radiance
Power per unit projected area perpendicular to the ray, per unit solid angle in the direction of the ray.
The flux is given by dΦ = L(x, ω) cos θ dω dA
The solid angle is proportional to the surface area S of a projection of the object onto a sphere, divided by the square of the sphere's radius R: ω = S / R²
Figure: surface patch dA, solid angle dω, radiance L(x, ω).
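The flux formula can be evaluated directly for a small patch and a small solid angle (the function name and the numeric example are illustrative assumptions):

```python
import math

def flux(radiance, area, solid_angle, theta):
    """Differential flux dPhi = L * cos(theta) * d(omega) * dA
    for a small patch dA radiating into solid angle d(omega)
    at angle theta to the surface normal."""
    return radiance * math.cos(theta) * solid_angle * area

# A 1 cm^2 patch with L = 100 W m^-2 sr^-1 into a 0.01 sr cone
# along the surface normal (theta = 0):
print(flux(100.0, 1e-4, 0.01, 0.0))  # about 1e-4 W
```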
Example BRDFs
Oren-Nayar
Cook-Torrance
Example BRDFs (2)
Beckmann microfacet distribution: D(α) = exp(−tan² α / m²) / (π m² cos⁴ α)
where m is the microfacet slope
Example BRDFs (3)
Phong
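The slide names the Phong model; below is a minimal sketch of its usual ambient + diffuse + specular form. The parameter names and this exact formulation are common conventions, not taken from the slides:

```python
def phong(n, l, v, ka, kd, ks, shininess, light=1.0, ambient=1.0):
    """Phong reflection: I = ka*Ia + kd*(n.l)*Il + ks*(r.v)^alpha*Il,
    where r is the mirror reflection of the light direction l about
    the surface normal n. Vectors are 3-tuples, assumed normalised."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    n_dot_l = max(dot(n, l), 0.0)                             # diffuse term, clamped
    r = tuple(2 * n_dot_l * ni - li for ni, li in zip(n, l))  # reflect l about n
    r_dot_v = max(dot(r, v), 0.0)                             # specular term, clamped
    return ka * ambient + kd * n_dot_l * light + ks * r_dot_v**shininess * light

# Light straight down the normal, viewer on the mirror direction:
print(phong((0, 0, 1), (0, 0, 1), (0, 0, 1), 0.1, 0.5, 0.4, 10))  # 1.0
```

Unlike the physically-motivated Oren-Nayar and Cook-Torrance models on the previous slides, Phong is an empirical model chosen for its simplicity.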