COMPUTER VISION
Introduction
Computer vision is the study and application of methods which allow computers to
"understand" image content or content of multidimensional data in general. The term
"understand" means here that specific information is being extracted from the image data
for a specific purpose: either for presenting it to a human operator (e.g., flagging cancerous
cells that have been detected in a microscopy image), or for controlling some process (e.g.,
an industrial robot or an autonomous vehicle). The image data that is fed into a computer
vision system is often a digital gray-scale or colour image, but can also be in the form of
two or more such images (e.g., from a stereo camera pair), a video sequence, or a 3D
volume (e.g., from a tomography device). In most practical computer vision applications,
the computers are pre-programmed to solve a particular task, but methods based on
learning are now becoming increasingly common. Computer vision can also be described
as the complement (but not necessarily the opposite) of biological vision. In biological
vision and visual perception, the real vision systems of humans and various animals are
studied, resulting in models of how these systems are implemented in terms of neural
processing at various levels.
State Of The Art
Relation between Computer vision and various other fields
The field of computer vision can be characterized as immature and diverse. Even though
earlier work exists, it was not until the late 1970s that a more focused study of the field
started, when computers could manage the processing of large data sets such as images.
However, these studies usually originated from various other fields, and consequently
there is no standard formulation of the "computer vision problem". Also, and to an even
larger extent, there is no standard formulation of how computer vision problems should
be solved. Instead, there exists an abundance of methods for solving various well-defined
computer vision tasks, where the methods often are very task specific and seldom can be
generalized over a wide range of applications. Many of the methods and applications are
still in the state of basic research, but more and more methods have found their way into
commercial products, where they often constitute a part of a larger system which can
solve complex tasks (e.g., in the area of medical images, or quality control and
measurements in industrial processes).
A significant part of artificial intelligence deals with planning or deliberation for systems
that can perform mechanical actions, such as moving a robot through some
environment. This type of processing typically needs input data provided by a computer
vision system, acting as a vision sensor and providing high-level information about the
environment and the robot. Other areas that are sometimes described as belonging to
artificial intelligence and that are used in relation to computer vision are pattern
recognition and learning techniques. As a consequence, computer vision is sometimes
seen as a part of the artificial intelligence field.
Since a camera can be seen as a light sensor, there are various methods in computer
vision based on correspondences between a physical phenomenon related to light and
images of that phenomenon. For example, it is possible to extract information about
motion in fluids and about waves by analyzing images of these phenomena. Also, a
subfield within computer vision deals with the physical process which given a scene of
objects, light sources, and camera lenses forms the image in a camera. Consequently,
computer vision can also be seen as an extension of physics. A third field which plays an
important role is neurobiology, specifically the study of the biological vision system.
Over the last century, there has been an extensive study of eyes, neurons, and the brain
structures devoted to processing of visual stimuli in both humans and various animals.
This has led to a coarse, yet complicated, description of how "real" vision systems
operate in order to solve certain vision related tasks. These results have led to a subfield
within computer vision where artificial systems are designed to mimic the processing and
behaviour of biological systems, at different levels of complexity. Also, some of the
learning-based methods developed within computer vision have their background in
biology.
Yet another field related to computer vision is signal processing. Many existing methods
for processing of one-variable signals, typically temporal signals, can be extended in a
natural way to processing of two-variable signals or multi-variable signals in computer
vision. However, because of the specific nature of images there are many methods
developed within computer vision which have no counterpart in the processing of one-
variable signals. A distinct character of these methods is the fact that they are non-linear
which, together with the multi-dimensionality of the signal, defines a subfield in signal
processing as a part of computer vision.
Beside the above mentioned views on computer vision, many of the related research
topics can also be studied from a purely mathematical point of view. For example, many
methods in computer vision are based on statistics, optimization or geometry. Finally, a
significant part of the field is devoted to the implementation aspect of computer vision;
how existing methods can be realized in various combinations of software and hardware,
or how these methods can be modified in order to gain processing speed without losing
too much performance.
Related Fields
Computer vision, Image processing, Image analysis, Robot vision and Machine vision are
closely related fields. If you look inside textbooks which have any of these names in
the title, there is a significant overlap in the techniques and applications they
cover. This implies that the basic techniques that are used and developed in these fields
are more or less identical, which can be interpreted as meaning there is only one field
with different names. On the other hand, it appears to be necessary for research groups,
scientific journals, conferences and companies to present or market themselves as
belonging specifically to one of these fields and, hence, various characterizations which
distinguish each of the fields from the others have been presented. The following
characterizations appear relevant but should not be taken as universally accepted.
Image processing and Image analysis tend to focus on 2D images, how to transform one
image to another, e.g., by pixel-wise operations such as contrast enhancement, local
operations such as edge extraction or noise removal, or geometrical transformations such
as rotating the image. This characterization implies that image processing/analysis
neither requires assumptions about, nor produces interpretations of, the image content.
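To make this concrete, the pixel-wise and geometrical operations mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration; the function names are ours, not a standard API:

```python
import numpy as np

def stretch_contrast(img):
    """Pixel-wise contrast enhancement: linearly stretch the
    intensities to the full [0, 255] range."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: nothing to stretch
        return np.zeros_like(img, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255).astype(np.uint8)

def rotate90(img):
    """Geometrical transformation: rotate the image 90 degrees
    counter-clockwise (a lossless special case of rotation)."""
    return np.rot90(img)

# A tiny synthetic gray-scale "image"
img = np.array([[50, 100],
                [150, 200]], dtype=np.uint8)
print(stretch_contrast(img))   # intensities now span 0..255
print(rotate90(img).shape)
```

Note that neither operation interprets the image content in any way, which is exactly the characterization given above.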
Computer vision tends to focus on the 3D scene projected onto one or several images,
e.g., how to reconstruct structure or other information about the 3D scene from one or
several images. Computer vision often relies on more or less complex assumptions about
the scene depicted in an image.
Machine vision tends to focus on applications, mainly in industry, e.g., vision based
autonomous robots and systems for vision based inspection or measurement. This implies
that image sensor technologies and control theory often are integrated with the processing
of image data to control a robot and that real-time processing is emphasized by means of
efficient implementations in hardware and software. There is also a field called Imaging
which primarily focuses on the process of producing images, but sometimes also deals with
processing and analysis of images. For example, medical imaging contains a great deal of
work on the analysis of image data in medical applications.
Finally, pattern recognition is a field which uses various methods to extract information
from signals in general, mainly based on statistical approaches. A significant part of this
field is devoted to applying these methods to image data. A consequence of this state of
affairs is that you can be working in a lab related to one of these fields, apply methods
from a second field to solve a problem in a third field and present the result at a
conference related to a fourth field!
Typical Tasks Of Computer Vision
Each of the application areas described above employs a range of computer vision tasks;
more or less well-defined measurement problems or processing problems, which can be
solved using a variety of methods. Some examples of typical computer vision tasks are
presented below.
Recognition
The classical problem in computer vision, image processing and machine vision is that of
determining whether or not the image data contains some specific object, feature, or
activity. This task can normally be solved robustly and without effort by a human, but is
still not satisfactorily solved in computer vision for the general case: arbitrary objects in
arbitrary situations. The existing methods for dealing with this problem can at best solve
it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human
faces, printed or hand-written characters, or vehicles, and in specific situations, typically
described in terms of well-defined illumination, background, and pose of the object
relative to the camera.
Different varieties of the recognition problem are described in the literature:
Recognition: One or several pre-specified or learned objects or object classes can
be recognized, usually together with their 2D positions in the image or 3D poses
in the scene.
Identification: An individual instance of an object is recognized. Examples:
identification of a specific person's face or fingerprint, or identification of a specific
vehicle.
Detection: The image data is scanned for a specific condition. Examples: detection
of possible abnormal cells or tissues in medical images or detection of a vehicle in
an automatic road toll system. Detection based on relatively simple and fast
computations is sometimes used to find smaller regions of interesting image
data, which can then be further analyzed by more computationally demanding
techniques to produce a correct interpretation. Several specialized tasks based on
recognition exist, such as:
Content-based image retrieval: Find all images which have a specific content in a
larger set or database of images.
Pose estimation: Estimation of the position and orientation of a specific object
relative to the camera. Example: allowing a robot arm to pick up objects from a
conveyor belt.
Optical character recognition (OCR): Images of printed or handwritten text
are converted to computer-readable text such as ASCII or Unicode.
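As a minimal sketch of the detection task, and under the strong simplifying assumption that the object's appearance is known exactly, the image can be scanned with a template and scored by normalized cross-correlation. This is a plain NumPy illustration, not a production method:

```python
import numpy as np

def match_template(image, template):
    """Scan the image for a known template and return the (row, col)
    of the best match, scored by normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r+th, c:c+tw]
            p = patch - patch.mean()
            denom = np.sqrt((p**2).sum() * (t**2).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

image = np.zeros((8, 8))
image[3:5, 4:6] = 1.0                 # a bright 2x2 "object"
template = np.zeros((4, 4))
template[1:3, 1:3] = 1.0              # the object plus some background
print(match_template(image, template))  # -> (2, 3)
```

This exhaustive scan is exactly the kind of "relatively simple and fast computation" mentioned above: it finds candidate regions which more expensive techniques can then examine.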
Motion
Several tasks relate to motion estimation in which an image sequence is processed to
produce an estimate of the local image velocity at each point. Examples of such tasks are:
Egomotion: determine the 3D rigid motion of the camera.
Tracking of one or several objects (e.g. vehicles or humans) through the image
sequence.
Surveillance: detection of possible activities based on motion.
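The simplest motion cue, on which surveillance-style detection can be built, is frame differencing: flag pixels whose intensity changes noticeably between consecutive frames. A minimal sketch (the threshold value here is an arbitrary illustration; real trackers use optical flow or background models):

```python
import numpy as np

def motion_mask(prev_frame, next_frame, threshold=20):
    """Crude motion detection: mark pixels whose intensity changed
    by more than `threshold` between two consecutive frames."""
    diff = np.abs(next_frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

f0 = np.zeros((4, 4), dtype=np.uint8)
f1 = f0.copy()
f1[1, 2] = 200                   # one pixel changed brightly
mask = motion_mask(f0, f1)
print(mask.sum())                # -> 1
```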
Scene Reconstruction
Given two or more images of a scene, or a video, scene reconstruction aims at computing
a 3D model of the scene. In the simplest case the model can be a set of 3D points. More
sophisticated methods produce a complete 3D surface model.
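In the point-based case this can be written down directly: each observed image point contributes two linear constraints on the unknown 3D point, and the system is solved in the least-squares sense via SVD. The sketch below is the standard linear-triangulation method; the simple camera matrices are illustrative:

```python
import numpy as np

def triangulate(P1, x1, P2, x2):
    """Linear triangulation (DLT): recover a 3D point from its
    projections x1, x2 in two views with camera matrices P1, P2,
    by solving A X = 0 in the least-squares sense via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize

# Two simple cameras: one at the origin, one translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate(P1, x1, P2, x2))   # close to (0.5, 0.2, 4.0)
```

Repeating this for many matched points yields the "set of 3D points" mentioned above; surface reconstruction builds on such point clouds.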
Image Restoration
Given an image, an image sequence, or a 3D volume, which has been degraded by noise,
image restoration aims at producing the image data without the noise. Examples of noise
processes which are considered are sensor noise (e.g., ultrasonic images) and motion blur
(e.g., because of a moving camera or moving objects in the scene).
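A classic restoration method for impulse ("salt-and-pepper") noise is the median filter, which replaces each pixel by the median of its neighbourhood. A minimal NumPy sketch (edges are padded by reflection):

```python
import numpy as np

def median_filter(img, size=3):
    """Suppress impulse noise by replacing each pixel with the
    median of its size x size neighbourhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.median(padded[r:r+size, c:c+size])
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                  # a single noise spike
clean = median_filter(img)
print(clean[2, 2])               # -> 100: the spike is removed
```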
Computer Vision Systems
A typical computer vision system can be divided into the following subsystems:
Image acquisition
The image or image sequence is acquired with an imaging system
(camera, radar, lidar, tomography system). Often the imaging system has to be calibrated
before being used.
Preprocessing
In the preprocessing step, the image is processed with "low-level" operations. The aim
of this step is to reduce noise in the image (i.e. to separate the signal from the
noise) and to reduce the overall amount of data. This is typically done by
employing different (digital) image processing methods, such as:
1. Downsampling the image.
2. Applying digital filters.
3. Computing the x- and y-gradients (possibly also the time-gradient).
4. Segmenting the image.
a. Pixel-wise thresholding.
5. Performing an eigentransform on the image.
a. Fourier transform.
6. Doing motion estimation for local regions of the image (also known as optical
flow estimation).
7. Estimating disparity in stereo images.
8. Multiresolution analysis.
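Several of the steps above can be chained into a toy pipeline. The sketch below combines downsampling, box-filter smoothing, gradient computation and pixel-wise thresholding; all parameter values are arbitrary illustrations:

```python
import numpy as np

def preprocess(img, thresh=50):
    """A minimal preprocessing chain: 2x downsampling by averaging,
    3x3 box smoothing, central-difference gradients, and pixel-wise
    thresholding of the gradient magnitude (an edge mask)."""
    # 1. Downsample by averaging 2x2 blocks (assumes even dimensions)
    h, w = img.shape
    small = img.astype(np.float64).reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))
    # 2. Smooth with a 3x3 box filter (edge padding)
    padded = np.pad(small, 1, mode='edge')
    smooth = sum(padded[i:i+small.shape[0], j:j+small.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    # 3. x- and y-gradients by central differences
    gx = np.zeros_like(smooth); gy = np.zeros_like(smooth)
    gx[:, 1:-1] = (smooth[:, 2:] - smooth[:, :-2]) / 2.0
    gy[1:-1, :] = (smooth[2:, :] - smooth[:-2, :]) / 2.0
    # 4. Pixel-wise thresholding of the gradient magnitude
    return np.hypot(gx, gy) > thresh

img = np.zeros((8, 8))
img[:, 4:] = 255                  # a vertical step edge
mask = preprocess(img)
```

Each stage reduces the amount of data that later stages must handle, which is the stated purpose of preprocessing.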
Feature extraction
The aim of feature extraction is to further reduce the data to a set of features, which ought
to be invariant to disturbances such as lighting conditions, camera position, noise and
distortion. Examples of feature extraction are:
1. Performing edge detection or estimation of local orientation.
2. Extracting corner features.
3. Detecting blob features.
4. Extracting spin images from depth maps.
5. Extracting geons or other three-dimensional primitives, such as superquadrics.
6. Acquiring contour lines and maybe curvature zero crossings.
7. Generating features with the Scale-invariant feature transform.
8. Calculating the Co-occurrence matrix of the image or sub-images to measure
texture.
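As an example of the last item, a gray-level co-occurrence matrix simply counts how often pairs of gray levels occur at a fixed pixel offset; texture features such as contrast or homogeneity are then computed from it. A minimal sketch:

```python
import numpy as np

def cooccurrence(img, levels, offset=(0, 1)):
    """Gray-level co-occurrence matrix: C[i, j] counts how often
    gray level j occurs at the given offset from gray level i."""
    dr, dc = offset
    C = np.zeros((levels, levels), dtype=np.int64)
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                C[img[r, c], img[r2, c2]] += 1
    return C

img = np.array([[0, 0, 1],
                [0, 1, 1]])
C = cooccurrence(img, levels=2)
print(C)
```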
Registration
The aim of the registration step is to establish correspondence between the features in the
acquired set and the features of known objects in a model database and/or the features of
the preceding image. The registration step has to produce a final hypothesis. To name a
few methods:
1. Least squares estimation
2. Hough transform in many variations
3. Geometric hashing
4. Particle filtering
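The first of these methods can be illustrated with a small example: estimating a 2D affine transform between matched feature points by least squares, where each point pair contributes two linear equations in the six unknowns. This is a sketch, not a robust registration method (real systems combine it with outlier rejection):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares estimation of a 2D affine transform (A, t)
    such that dst ~ src @ A.T + t, from matched point pairs."""
    n = src.shape[0]
    M = np.zeros((2 * n, 6))
    b = dst.reshape(-1)
    for i, (x, y) in enumerate(src):
        M[2*i]     = [x, y, 0, 0, 1, 0]   # equation for x'
        M[2*i + 1] = [0, 0, x, y, 0, 1]   # equation for y'
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]

src = np.array([[0.0, 0], [1, 0], [0, 1], [1, 1]])
true_A = np.array([[2.0, 0], [0, 3.0]])
true_t = np.array([5.0, -1.0])
dst = src @ true_A.T + true_t             # synthetic correspondences
A, t = fit_affine(src, dst)
```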
Applications Of Computer Vision
The following is an incomplete list of applications which are studied in computer vision.
In this category, the term application should be interpreted as a high level function which
solves a problem at a higher level of complexity. Typically, the various technical
problems related to an application can be solved and implemented in different ways.
Facial Recognition
A facial recognition system is a computer-driven application for automatically
identifying a person from a digital image. It does this by comparing selected facial
features in the live image with those in a facial database. It is typically used in security
systems and can be compared to other biometrics such as fingerprint or iris recognition
systems.
Popular recognition algorithms include eigenface, fisherface, the Hidden Markov model,
and the neurally motivated Dynamic Link Matching. A newly emerging trend, claimed to
achieve previously unseen accuracies, is three-dimensional face recognition. Another
emerging trend uses the visual details of the skin, as captured in standard digital or
scanned images. Tests on the FERET database, the widely used industry benchmark,
showed that this approach is substantially more reliable than previous algorithms.
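The eigenface approach mentioned above can be sketched compactly: face images are vectorized, a PCA basis is computed, and a probe face is identified as the nearest gallery face in the low-dimensional subspace. The random "faces" below merely stand in for real image data:

```python
import numpy as np

def eigenfaces(faces, k):
    """Eigenface sketch: PCA on vectorized face images. Returns the
    mean face and the top-k principal components ("eigenfaces")."""
    mean = faces.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, basis):
    """Coordinates of a face in the eigenface subspace."""
    return basis @ (face - mean)

rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 64))        # 10 "faces" of 64 pixels each
mean, basis = eigenfaces(gallery, k=4)
codes = np.array([project(f, mean, basis) for f in gallery])

# Identify a slightly noisy probe as the nearest gallery face
probe = gallery[3] + rng.normal(scale=0.01, size=64)
d = np.linalg.norm(codes - project(probe, mean, basis), axis=1)
print(int(d.argmin()))                      # -> 3
```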
Polly (robot)
Polly was a robot created at the MIT Artificial Intelligence Laboratory by Ian Horswill
for his PhD, which was published in 1993 as a technical report. It was the first mobile
robot to move at animal-like speeds (1 m per second) using computer vision for its
navigation. It was an example of behavior based robotics. For a few years, Polly was able
to give tours of the AI laboratory's seventh floor, using canned speech to point out
landmarks such as Anita Flynn's office. The Polly algorithm is a way to navigate in a
cluttered space using very low-resolution vision to find uncluttered areas to move forward
into, assuming that the pixels at the bottom of the frame (the closest to the robot) show an
example of an uncluttered area. Since this could be done 60 times a second, the algorithm
only needed to discriminate three categories: telling the robot at each instant to go
straight, towards the right or towards the left.
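The navigation rule described above can be caricatured in a few lines. This is a loose reconstruction of the idea (steer toward the most free space in a low-resolution view), not Horswill's actual algorithm:

```python
import numpy as np

def steer(freespace):
    """Polly-style steering sketch: split a low-resolution freespace
    map (1 = uncluttered pixel, 0 = clutter) into left / centre /
    right thirds and head toward the third with the most free pixels."""
    h, w = freespace.shape
    thirds = [freespace[:, i * w // 3:(i + 1) * w // 3].sum()
              for i in range(3)]
    return ['left', 'straight', 'right'][int(np.argmax(thirds))]

view = np.ones((4, 6))
view[:, :2] = 0        # clutter on the left
view[1:, 2:4] = 0      # mostly cluttered straight ahead
print(steer(view))     # -> 'right'
```

The three possible outputs correspond directly to the three categories the original system had to discriminate sixty times a second.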
Mobile robot
Mobile Robots are automatic machines that are capable of movement in a given
environment. Robots generally fall into two classes, linked manipulators (or Industrial
robots) and mobile robots. Mobile robots have the capability to move around in their
environment and are not fixed to one physical location. In contrast, industrial
manipulators usually consist of a jointed arm and gripper assembly (or end effector) that
is attached to a fixed surface.
The most common class of mobile robots is wheeled robots. A second class of mobile
robots includes legged robots, while a third, smaller class includes aerial robots, usually
referred to as unmanned aerial vehicles (UAVs). Mobile robots are the focus of a great
deal of current research, and almost every major university has one or more labs that
focus on mobile robot research. Mobile robots are also found in industry, military and
security environments, and appear as consumer products.
Robot
A humanoid robot manufactured by Toyota "playing" a trumpet
The word robot is used to refer to a wide range of machines, the common feature of
which is that they are all capable of movement and can be used to perform physical tasks.
Robots take on many different forms, ranging from humanoid, which mimic the human
form and way of moving, to industrial, whose appearance is dictated by the function they
are to perform. Robots can be grouped generally as mobile robots (e.g. autonomous
vehicles), manipulator robots (e.g. industrial robots) and self-reconfigurable robots, which
can conform themselves to the task at hand.
Robots may be controlled directly by a human, such as remotely-controlled bomb-
disposal robots, robotic arms, or shuttles, or may act according to their own decision
making ability, provided by artificial intelligence. However, the majority of robots fall in
between these extremes, being controlled by pre-programmed computers. Such robots
may include feedback loops such that they can interact with their environment, but do not
display actual intelligence.
The word "robot" is also used in a general sense to mean any machine which mimics the
actions of a human (biomimicry), in the physical sense or in the mental sense. It comes
from the Czech and Slovak word robota, labour or work (also used in the sense of a serf's labour).
The word robot first appeared in Karel Čapek's science fiction play R.U.R. (Rossum's
Universal Robots) in 1921.
History
The construction of a Soviet-made robot of the 1970s: the robot was able to move,
reproduce pre-recorded sounds, imitate conversation using a built-in radio station, and
demonstrate movies on a built-in screen. It was used in various shows.

The word robot was introduced by Czech writer Karel Čapek in his play R.U.R.
(Rossum's Universal Robots), which was written in 1920 (see also Robots in literature for
details of the play). However, the verb robotovat, meaning "to work" or "to slave", and
the noun robota (meaning corvée), used in the Czech and Slovak languages, have been
in use since the early 10th century. It has been suggested that the word robot was coined
by Karel Čapek's brother, the painter and writer Josef Čapek.
An early automaton was built in 1738 by Jacques de Vaucanson, who created a
mechanical duck that was able to eat grain, flap its wings, and excrete.
The first human to be killed by a robot was 37-year-old Kenji Urada, a Japanese factory
worker, in 1981. According to Economist.com, Urada "climbed over a safety fence at a
Kawasaki plant to carry out some maintenance work on a robot. In his haste, he failed to
switch the robot off properly. Unable to sense him, the robot's powerful hydraulic arm
kept on working and accidentally pushed the engineer into a grinding machine."
Smart Camera
A smart camera is an integrated machine vision system which, in addition to image
capture circuitry, includes a processor that can extract information from images without
need for an external processing unit, and interface devices used to make results available
to other devices.
A smart camera or "intelligent camera" is a self-contained, standalone vision system
with a built-in image sensor in the housing of an industrial video camera. It contains all
necessary communication interfaces, e.g. Ethernet. It is not necessarily larger than an
industrial or surveillance camera. This architecture has the advantage of a more compact
volume compared to PC-based vision systems and often achieves lower cost, at the
expense of a somewhat simpler (or missing altogether) user interface.
Early smart camera (ca. 1985, in red) with an 8 MHz Z80, compared to a modern device
featuring Texas Instruments' C64 at 1 GHz.

A smart camera usually consists of several (but not necessarily all) of the following
components:
1. Image sensor (matrix or linear, CCD or CMOS)
2. Image digitization circuitry
3. Image memory
4. Communication interface (RS232, Ethernet)
5. I/O lines (often optoisolated)
6. Lens holder or built-in lens (usually C- or CS-mount)
Examples Of Applications For Computer Vision
Another way to describe computer vision is in terms of application areas. One of the
most prominent application fields is medical computer vision or medical image
processing. This area is characterized by the extraction of information from image data
for the purpose of making a medical diagnosis of a patient. Typically image data is in the
form of microscopy images, X-ray images, angiography images, ultrasonic images, and
tomography images. An example of information which can be extracted from such image
data is the detection of tumours, arteriosclerosis or other malignant changes. It can also be
measurements of organ dimensions, blood flow, etc. This application area also supports