Touchscreen Add-On
Official Website: YosrON.com
July 2012
Cairo University Faculty of Engineering
Electrical Electronics and Communications Department
YosrON: Touchscreen add-on
By
Donia Alaa Eldin Hassan Idriss: [email protected]
Muhammad Al-Sherbeeny Hassan: [email protected]
Under the Supervision of
Dr. Ibrahim Qamar
A Graduation Project Report Submitted to
the Faculty of Engineering at Cairo University
In Partial Fulfillment of the Requirements for the Degree of
Bachelor of Science
in
Electronics and Communications Engineering
Faculty of Engineering, Cairo University
Giza, Egypt
July 2012
Table of Contents
List of Figures ................................................................................................................ v
Acknowledgments......................................................................................................... vi
Abstract ........................................................................................................................ vii
Chapter 1: Introduction .............................................................................................. 1
1.1 Why is it important? ........................................................................................ 1
1.2 Other related projects ...................................................................................... 3
1.3 YosrON is built on the 2nd version of EverScreen ...................................... 6
1.3.1 The hardware ........................................................................................... 6
1.3.2 The software............................................................................................. 6
1.3.3 The advantages of YosrON ...................................................................... 7
1.3.4 The challenges we expected ..................................................................... 7
1.3.5 The skills we needed ................................................................................ 8
1.3.6 The plan ................................................................................................... 8
Chapter 2: YosrON structure ..................................................................................... 9
2.1 System Description ......................................................................................... 9
2.2 Scanlines.......................................................................................................... 9
2.3 Noise reduction ............................................................................................. 10
2.4 Fast pointer detection .................................................................................... 11
2.5 Positioning the cameras ................................................................................. 14
2.6 Calibration phase ........................................................................................... 16
2.7 Tracking algorithm ........................................................................................ 16
2.8 Resolution accuracy ...................................................................................... 18
2.9 Algorithm complexity ................................................................................... 21
2.10 Settings and system performance .............................................................. 21
Chapter 3: Notes on the code ................................................................................... 22
3.1 Overall flow of the code ................................................................................ 22
3.2 Notes on main.cpp ......................................................................................... 24
3.2.1 Defining the two webcams we need ...................................................... 24
3.2.2 Smoothing neighborhood in averaging .................................................. 24
3.2.3 The color model to be used .................................................................... 24
3.2.4 Debugging .............................................................................................. 25
3.2.5 Luminance.............................................................................................. 25
3.2.6 Control the code while running.............................................................. 25
3.3 Notes on constants.h ...................................................................................... 26
3.3.1 Threshold of colors difference in each pixel .......................................... 26
3.3.2 Consecutive pixels threshold ................................................................. 26
3.3.3 Calibration touch offset.......................................................................... 26
3.3.4 Consecutive detections to locate a corner .............................................. 26
3.3.5 Limit of attempts to locate a corner ....................................................... 27
3.3.6 Calibration scanlines distances .............................................................. 27
3.3.7 Picture format, resolution, fps and grab method .................................... 27
3.4 Compiling the code after any edits ................................................................ 28
Chapter 4: Challenges .............................................................................................. 29
4.1 The environment ........................................................................................... 29
4.1.1 OpenCV on Windows ............................................................................ 29
4.1.2 C/C++ programming on Linux/Ubuntu ................................................. 29
4.1.3 Libraries that must be installed for the code .......................................... 30
4.2 The cameras................................................................................................... 33
4.3 The fisheye lenses ......................................................................................... 34
Chapter 5: Conclusions and Future Work ............................................................... 38
References .................................................................................................................... 39
Chapter 0: Appendix ................................................................................................ 43
0.1 Installing Ubuntu 11.10 ................................................................................. 43
0.2 Installing the required libraries and packages ............................................... 50
0.2.1 Installing "build-essential" using the "Terminal" .................................. 50
0.2.2 Installing libraries/packages using "Ubuntu Software Center".............. 53
0.3 Check webcam supported formats and UVC compliance ............................. 54
0.3.1 UVC compliance check ......................................................................... 54
0.3.2 Supported configurations and formats ................................................... 55
0.3.3 Troubleshooting webcams ..................................................................... 56
List of Figures
Figure 1.1: Survey results on UniMasr.com website. .................................................... 2
Figure 1.2: Survey results on the YosrON page on Facebook. ........................... 3
Figure 1.3: Touchscreen add-on by TouchMagic. ......................................................... 3
Figure 2.1: Visual representation of scanlines. ............................................................ 10
Figure 2.2: The buffer used for the analysis of the green row shows a clear peak. ..... 12
Figure 2.3: The system correctly detects only the pointer coming from above. ........ 13
Figure 2.4: The vertical contiguity constraint of a hand holding a pen. .................. 14
Figure 2.5: Example of a simple but inefficient configuration. ................................... 14
Figure 2.6: Suggested configuration to optimize the use of the cameras. ................... 15
Figure 2.7: Resolution accuracy of W1 in t. ................................................................. 19
Figure 2.8: A4Tech webcam, PK 720G model. ........................................................... 21
Figure 4.1: Full-frame fisheye image........................................................................... 35
Figure 4.2: Remapped full-frame fisheye image into rectilinear perspective. ............. 35
Figure 4.3: Circular fisheye image............................................................................... 36
Figure 4.4: The image of circular fisheye after remapping (Defisheye). ..................... 36
Figure 4.5: Fisheye for home doors. ............................................................................ 37
Figure 6.1: Windows Disk Management Tool. ............................................................ 44
Figure 6.2: Shrink dialog box. ..................................................................................... 45
Figure 6.3: Windows partitions after successfully freeing space. ............................... 46
Figure 6.4: Don't allow any updates. .......................................................... 47
Figure 6.5: Install Ubuntu alongside your current OS. ................................................ 48
Figure 6.6: Disabling automatic updates of Ubuntu. ................................................... 49
Figure 6.7: Using Ubuntu Software Center to install required libraries/packages. ..... 54
Figure 6.8: Checking supported configurations and formats using guvcview............. 55
Acknowledgments
We would like to thank those who helped us make this dream come true. No
matter how big or small their help, we would like to mention them all as best we
can, in the order in which they helped us.
Thanks to Dr. Ibrahim Qamar for accepting us and our idea. Thanks for his valuable
time, his understanding, and his kindness in discussing many problems with us and
leading us to solutions.
Thanks to Eng. Abdel-Mohsen for telling us which programming language to use
(Matlab is easy but slow, C++ is good with toolboxes and very fast for image
processing).
Thanks to Eng. Khaled Yeiha and Eng. Ahmad Ismail for giving us useful
guidelines for the algorithm.
Thanks to Eng. E. Rustico (from Italy) for supporting us with documents, code and
instructions that helped us very much, as we built our project on his work, EverScreen.
Thanks to Dr. Essam, the glasses maker, for helping us with the fisheye lenses.
Thanks to Eng. Shaimaa Mahmoud and Eng. Dina Zeid for helping us with the
OpenCV toolbox and with some translations (from Italian to English).
Thanks to Muhammad Sherif and Sherif Medhat for helping us with programming
on Ubuntu.
Thanks to Eng. Muhammad Hosny for helping us debug some code and solve
many problems we faced with the OS and the software.
Thanks to Mr. Muhammad Reda for helping us find compatible webcams.
Thanks to Eng. Sherbeeny Hasan, Muhammad's father, for helping us with the
webcams and the fisheye lenses.
Thanks to our families for supporting us in every way all the time.
Abstract
The entire world is heading toward designing all operating systems and programs to
work with touch technology. But most Egyptians, and others around the world, can't
afford the cost of a touchscreen for their computers. That's why we came up with
YosrON.
YosrON is meant to be a touchscreen add-on that can be put on any computer screen,
PC or laptop, to add the "touch" feature to the computer screen using a USB
connection and software.
It has been built on a complete and inexpensive system to track the movements of a
physical pointer on a flat surface. Any opaque object can be used as a pointer (fingers,
pens, etc.) and it is possible to discriminate whether the surface is being touched or
just pointed at. The system relies on two entry-level webcams and it uses a fast
scanline-based algorithm. A calibration wizard helps the user during the initial setup
of the two webcams. No markers, gloves or other hand-held devices are required.
Since the system is independent from the nature of the pointing surface, it is possible
to use a screen or a projected wall as a virtual touchscreen. The complexity of the
algorithms used by the system grows less than linearly with resolution, making the
software layer very lightweight and suitable also for low-powered devices like
embedded controllers.
We had planned to build a resizable plastic frame as housing for the webcams and
the added wide-angle (fisheye) lenses, but we ran out of time and faced many
problems, so we postponed the frame, along with the multi-touch feature, to future
work.
For now, YosrON is just two webcams fixed far away from the touching surface and
software is used for calibration and moving the mouse.
Chapter 1: Introduction
1.1 Why is it important?
The advances in technology and the widespread usage of computers in almost every
field of human activity are necessitating new interaction methods between humans
and machines. The traditional keyboard and mouse combination has proved its
usefulness but also, and in a more extensive way, its weaknesses and limitations. In
order to interact in an efficient and expressive way with the computer, humans need to
be able to communicate with machines in a manner more similar to human-human
communication.
In fact, throughout their evolution, human beings have used their hands, alone or with
the support of other means and senses, to communicate with others, to receive
feedback from the environment, and to manipulate things. It therefore seems
important that technology makes it possible to interact with machines using some of
these traditional skills.
The human-computer interaction (HCI) community has invented various tools to
exploit humans’ gestures, the first attempts resulting in mechanical devices.
Devices such as data gloves can prove especially interesting and useful in certain
specific applications but have the disadvantage of often being onerous, complex to
use, and somewhat obtrusive.
The use of computer vision can consequently be a possible alternative. Recent
advances in computer vision techniques and availability of fast computing have made
the real-time requirements for HCI feasible. Consequently, extensive research has
been done in the field of computer vision to identify hand poses and static gestures,
and also, more recently, to interpret the dynamic meaning of gestures. Computer
vision systems are less intrusive and impose lower constraints on the user since they
use video cameras to capture movements and rely on software applications to perform
the analysis.
Among the existing graphical input devices, computer users especially love
touchscreens. The reason is that they reflect, as no other device does, the way we
naturally get in touch and interact with the reality around us: we point at and touch
directly with our hands what we see around us, and touchscreens let us do the same
with our fingers on computer interfaces. This preference is confirmed by a strong
trend in the industry of high-end platforms (e.g. Microsoft Surface and Touchwall)
and in the market of mobile devices: Apple, Samsung and Nokia, to cite only a few
examples, finally chose touch-sensitive displays for their leading products, while
interest in this technology is also growing among design studios, industrial
environments and public information points like museums and ATMs.
Unfortunately, touchscreen flexibility is low: finger tracking is impossible without
physical contact; it is not possible to use sharp objects on them; large touch-sensitive
displays are expensive because of their manufacturing cost and damage-proneness.
YosrON is made of low-cost devices, using no equipment that cannot be found in any
computer shop for less than 300 EGP, a reasonable price for the Egyptian market and
other similar markets. It's important to offer such an add-on at a low price because
Microsoft Windows, the most common OS in Egypt, is designed mainly for
touchscreens in its upcoming version, Windows 8. Of course Windows 8 can be used
without a touchscreen, but that would be a great loss for the user experience.
We made a simple survey asking many computer users and resellers whether they
would buy such an add-on and how much they would pay for it.
The results are in fig. 1.1 and fig. 1.2.
Figure 1.1: Survey results on UniMasr.com website.
Figure 1.2: Survey results on YosrON page on facebook.
1.2 Other related projects
The only commercial product we found is TouchMagic (fig. 1.3),
which is available in the USA and, in the Middle East, can be found
only in the UAE, KSA and the occupied lands of Palestine; Israel.
This product is available in fixed sizes with a minimum cost of
$170 (about 1000 EGP) for 15" screens. So, if you change your
computer/screen for any reason, you will probably need to buy a
new add-on that fits the new screen size. That's why it is not
wanted in the market: it is expensive and not resizable.
But when we look at research and projects in computer interfaces, we find all of
them turning back to the human body, trying to adapt the way we communicate with
computers to our natural ways of moving and behaving. Speech-driven interfaces,
gesture-recognition software and facial expression interpreters are just some examples
of this recent trend.
There is a growing interest in the ones that involve real-time body tracking, especially
if no expensive hardware is required and the user does not need to wear any special
equipment. The simplest and cheapest choice is to use optical devices to track a
specific part of the body (head, eyes, hands or even the nose {Check [GMR02] in the
Figure 1.3:
Touchscreen add-on
by TouchMagic.
4
references}); we focus on finger tracking systems that do not require lasers, markers,
gloves or hand-held devices [SP98, DUS01, Lee07].
The main application of finger tracking is to move a digital pointer over a screen,
enabling the user to replace the pointing device (e.g. the mouse) with his hands. While
for eye or head tracking we have to direct the camera(s) towards the users’ body,
finger tracking leaves us a wider range of choices.
The first possibility is to direct the camera towards the user’s body, as for head
tracking, and to translate the absolute or relative position of the user’s finger to screen
coordinates. In [WSL00] an empty background is needed; in [IVV01] the whole arm
position is reconstructed, and in [Jen99] a combination of depth and color analysis
helps to robustly locate the finger. Some works tried to estimate the position of the
fingertip relative to the view frustum of the user; this was done in [CT06] with one
camera and in [pHYssCIb98] with stereovision, but both had strong limits in the
accuracy of the estimation.
The second possibility is to direct the camera towards the pointing surface, which may
be static or dynamic. Some works require a simple black pad as pointing surface,
making it easy to locate the user’s finger with only one camera [LB04]; however, we
may need additional hardware [Mos06] or stereovision [ML04] to distinguish if the
user is just hovering the finger over it or if there is a physical contact between the finger
and the surface. A physical desktop is an interesting surface to track a pointer on.
Some works are based on the DigitalDesk setup [Wel93], where an overhead
projector and one or more cameras are directed downwards on a desk and virtual
objects can interact with physical documents [Ber03, Wil05]; others use a similar
approach to integrate physical and virtual drawings on vertical or horizontal
whiteboards [Wil05, vHB01, ST05], and one integrates visual information with an
acoustic triangulation to achieve better accuracy [GOSC00]. These works use
differencing algorithms to segment the user’s hands from the background, and then
shape analysis or finger templates matching to locate the fingertips; they rely on the
assumption that the background surface is white, or in general of a color different than
skin. Other approaches work also on highly dynamic surfaces. It is possible to
robustly suppress the background by analyzing the screen color space [Zha03] or by
applying polarizing filters to the cameras [AA07]; in the former the mouse click has to
be simulated with a keystroke, while in the latter a sophisticated mathematical finger
model allows detecting the physical contact with stereovision. Unfortunately, these two
techniques cannot be applied to a projected wall. Directing the camera towards the
pointing surface implies, in general, the use of computationally expensive algorithms,
especially when we have to deal with dynamic surfaces.
A third possible approach, which may drastically reduce the above problems, is to
have the cameras watching sidewise, i.e. lying on the same plane as the surface;
from this point of view we have no problem with dynamic backgrounds either
behind the user or on the pointing surface, and this enables us to set up the system
also in environments otherwise problematic (e.g. large displays, outdoor, and so on).
Among the very few works using this approach, in [QMZ95] the webcam is on the top
of the monitor looking towards the keyboard, and the finger is located with a color
segmentation algorithm. The movement of the hand along the axis perpendicular to
the screen is mapped to the vertical movement of the cursor, and a keyboard button
press simulates the mouse click. However, the position of the webcam has to be
calibrated and the vertical movement is mapped in an unnatural way. Also in [WC05]
we find a camera on the top of a laptop display directed towards the keyboard, but the
mouse pointer is moved accordingly to the motion vectors detected in the gray scale
video flow; a capacitive touch sensor enables and disables the tracking, while the
mouse button has to be pressed with the other hand. In [Mor05], finally, the "lateral"
approach is used to embed four smart cameras into a plastic frame that can be
overlaid on a traditional display.
The above approaches need to process the entire image as it is captured by the
webcam. Thus, all of the above algorithms are at least quadratic with respect to
resolution (i.e. linear with respect to image area). Although it is possible to use smart
region-finding algorithms, these would not resolve the problem entirely. In [FR08]
the authors proposed the 1st version of EverScreen, a different way to track user
movements while keeping the complexity low. They drastically decreased the
scanning area to a discrete number of pixel lines from two uncalibrated cameras. Their
system requires a simple calibration phase that is easy to perform even for
non-experienced users. The proposed technique only concerns the tracking of a
pointer; it is not about gesture recognition. The output of the system, at present, is
directly translated into mouse movements, but it could instead be interpreted by
gesture recognition software.
1.3 YosrON is built on the 2nd version of EverScreen
The 1st version of EverScreen focused its attention mostly on the mapping algorithm
and provided only a description of an early stage of the system. The 2nd version
introduces a more efficient and mature system, exploiting improved pointer detection
while remaining as computationally and economically cheap as the previous one.
Among the improvements:
- Two proximity constraints in the pointer detection help to reduce the number of
false positives.
- A convolution-based algorithm is used to locate the presence of a pointer.
- The gap from the reference backgrounds is kept under control to detect camera
movements.
- The calibration phase is faster, and the system graphically shows the points to
touch.
- Iterative algorithms are used to solve the linear systems instead of direct
formulas.
1.3.1 The hardware
YosrON was planned to consist of four cameras with 90-degree view angles, fixed in
the corners of a resizable frame with arrays of IR or visible LEDs, all connected to a
USB hub so that the whole add-on uses one single computer port. We also planned to
implement the software on a microprocessor to eliminate any processing load on the
host computer. But we had to reduce the hardware because of some challenges that
will be mentioned later.
1.3.2 The software
It's for image processing and geometrical calculations on the cameras outputs to
determine the position of the finger (pointing tool). It was planned to be a C++ code
using OpenCV toolbox on visual studio in Windows OS. We faced some problems
with the configuration of the environment and some limitation with the toolbox so we
migrated to Ubuntu 11.10, 64-bit with lots of libraries to be mentioned later.
1.3.3 The advantages of YosrON
- Resizable: With no glass used, the same item can be used with any screen of
any size.
- Low cost: The expected cost for end users is around 200 EGP. (The prototype
cost less than 300 EGP, so a single item in mass production would cost less!)
- Fast: At a configuration of 30 fps, the response of the software is immediate
(in the range of microseconds).
- Accurate: At a configuration of 320x240 resolution, the accuracy is acceptable
for touchscreen systems (OSs and programs are designed with big buttons).
- Easy fabrication: Manufacturers can easily fabricate it in mass production
without the need for any new or complicated technology.
1.3.4 The challenges we expected
- Cameras: Finding low-cost USB cameras with fast response and a wide view
angle (at least 90˚).
- Resizable frame: Fabricating a plastic resizable frame and mounting the
cameras on it.
- Processing: Building the software that can interact with the cameras and
process the images to determine the pointer/finger position.
- Load: Reducing the processing load on the host computer using a
microprocessor.
1.3.5 The skills we needed
- Image processing using the OpenCV toolbox with C++ in Visual Studio
(before we migrated to Ubuntu).
- Installing and configuring Linux/Ubuntu OS.
- C/C++ programming on Ubuntu.
- Debugging and troubleshooting.
And for production, we will need to make drivers for different OSs and to implement
the software on a microprocessor.
1.3.6 The plan
- Purchasing and installing webcams and wide-angle lenses.
- Building the initial image processing code on live stream images from a
single webcam, for finger detection only.
- Building the calibration code and the code that combines the streams from
both webcams.
- Building the mouse controlling code.
- Fabricating the resizable frame housing.
- Refining the software after the housing is done.
- Building the driver and calibration software.
Chapter 2: YosrON structure
2.1 System Description
The system now consists of two off-the-shelf webcams positioned sidewise so that the
lateral silhouette of the hand is captured into an image like fig. 2.1. After a quick
auto-calibration, the software layer will be able to interpret the image flow and
translate it into absolute screen coordinates and mouse button clicks; the
corresponding mouse events will be simulated on the OS in a completely transparent
way for the application level. We call pointing surface the rectangle of surface to be
tracked; as pointing surface we can choose a desk, an LCD panel, a projected wall, etc.
An automatic region stretching is done to map the coordinates of the pointing surface
to the target display. Any opaque object can be used to point or touch the surface: the
system will track a finger as well as a pencil, a chalk or a wooden stick.
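The region stretching mentioned above is a simple linear rescaling from pointing-surface coordinates to display pixels. The sketch below is our own illustration of the idea in plain C++ (the struct and function names are assumptions, not the project's actual code):

```cpp
// Sketch of the automatic region stretching: a point on the pointing
// surface is linearly rescaled into target display pixels.
struct Point { float x, y; };

Point stretchToDisplay(Point p,
                       float surfW, float surfH,  // pointing surface extent
                       int dispW, int dispH) {    // target display resolution
    return { p.x / surfW * dispW, p.y / surfH * dispH };
}
```

For example, the center of a 20x10 pointing surface maps to the center of a 1280x720 display.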
2.2 Scanlines
We focus the processing only on a small number of pixel lines from the whole image
provided by each webcam; we call these lines scanlines. Each scanline is horizontal
and ideally parallel with the pointing surface; we call touching scanline the lowest
scanline (the nearest to the pointing surface), and pointing scanline every other one.
The calibration phase requires grabbing a frame before any pointer enters the
tracking area; these reference frames (one per webcam) will be stored as reference
backgrounds, and will be used to look for runs of consecutive pixels different from
the reference background. We will see later how we detect such scan-line
interruptions (fig. 2.1). The detection of a finger only in pointing scanlines will mean
that the surface is only being pointed, while a detection in all the scanlines will mean
that the user is currently touching the surface. To determine if a mouse button
press has to be simulated, we can just look at the touching scanline: we assume
that the user is clicking if the touching scanline is occluded in at least one of the two
views.
Figure 2.1: Visual representation of scanlines.
During the calibration phase the number of scanlines of interest may vary from a
couple to tens; during the tracking, three or four scanlines will suffice for an excellent
accuracy. A detailed description of the calibration will be given later.
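The pointing/touching decision described above can be sketched as follows. This is an illustrative simplification in plain C++, for one camera only; the type names and the exact rule are ours, not taken from the project code:

```cpp
#include <cstddef>
#include <vector>

enum class PointerState { None, Pointing, Touching };

// Decide the pointer state from one camera's per-scanline detections:
// index 0 is the touching scanline (the one nearest the surface), the
// others are pointing scanlines; 'true' means an interruption was found.
PointerState classify(const std::vector<bool>& occluded) {
    bool pointing = false;
    for (std::size_t i = 1; i < occluded.size(); ++i)
        pointing = pointing || occluded[i];
    if (occluded[0]) return PointerState::Touching;  // a click is simulated here
    return pointing ? PointerState::Pointing : PointerState::None;
}
```

In the full system, Touching in at least one of the two views triggers the simulated mouse button press.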
2.3 Noise reduction
We detect the presence of a physical pointer in the view frustum of a webcam by
comparing the current frame with the reference background. This is simple in the absence
of noise; unfortunately, the video flow captured from a CMOS sensor (the most
common type of sensor in low cost video devices) is definitely not ideal and presents
a bias of white noise, salt and pepper noise and motion jpeg artifacts. This makes
pointer detection more difficult, especially when the pointer is not very close to the
camera and its silhouette is therefore only a few pixels wide. To keep the overall
complexity low, we avoided applying any post-processing filter to each of the
grabbed frames, and we adopted two simple strategies to reduce the impact of
noise on our algorithm.
- The first strategy is to store, as a reference background, not just the first frame
but the average of the first b frames captured (in the current implementation, b =
4). The average root mean square deviation of a frame from the reference
background, after this simple operation, decreases from ~1.52 to ~1.26 (about
−17%).
The second strategy is to apply a simple convolution to the scanlines we focus
on. The matrix we use is
with divisor 3. This is equivalent to say that we replace each pixel with the
average of a 1 pixel neighborhood on the same row; it is not worth increasing
the neighborhood of interest because by increasing it we decrease the tracking
accuracy.
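The two strategies above can be sketched in plain C++, using bare float vectors instead of OpenCV image types (the function names are ours; this is an illustration, not the project's actual code):

```cpp
#include <cstddef>
#include <vector>

// First strategy: average the first b captured copies of a scanline into a
// reference row (assumes 'rows' is non-empty and rows are equally sized).
std::vector<float> buildReference(const std::vector<std::vector<float>>& rows) {
    std::vector<float> ref(rows[0].size(), 0.0f);
    for (const auto& r : rows)
        for (std::size_t x = 0; x < r.size(); ++x)
            ref[x] += r[x];
    for (auto& v : ref) v /= rows.size();
    return ref;
}

// Second strategy: replace each pixel with the average of itself and its
// two horizontal neighbors (the [1 1 1]/3 kernel); edges keep their value.
std::vector<float> smoothRow(const std::vector<float>& row) {
    std::vector<float> out = row;
    for (std::size_t x = 1; x + 1 < row.size(); ++x)
        out[x] = (row[x - 1] + row[x] + row[x + 1]) / 3.0f;
    return out;
}
```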
Finally, we keep track of the Root Mean Square Error (RMSE) with respect to the
reference frames; if the RMSE gets higher than a threshold, this is probably due to a
disturbing entity in the video or to a movement of the camera rather than to systematic
noise. In this case, the system automatically stops tracking and informs the user that a
new reference background is about to be grabbed.
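A minimal sketch of this RMSE watchdog, again with plain float vectors and names of our own choosing:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean square error between the current scanline and its reference.
float rmse(const std::vector<float>& row, const std::vector<float>& ref) {
    float sum = 0.0f;
    for (std::size_t x = 0; x < row.size(); ++x) {
        float d = row[x] - ref[x];
        sum += d * d;
    }
    return std::sqrt(sum / row.size());
}

// If the RMSE exceeds the threshold, the camera probably moved (or a
// large disturbance entered the view): grab a new reference background.
bool needsNewReference(const std::vector<float>& row,
                       const std::vector<float>& ref, float threshold) {
    return rmse(row, ref) > threshold;
}
```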
2.4 Fast pointer detection
Although some noise has been reduced, we cannot rely only on a binary differencing
algorithm. A set of pixels different from the reference frame is meaningful if they are
close to each other; we apply this spatial contiguity principle both horizontally and
vertically. This approach imitates the so-called Helmholtz principle for human
perception.
The Helmholtz principle states that an observed geometric structure is perceptually
meaningful if its number of occurrences would be very small in a random situation.
(see [MmM01])
The first goal is to find a run of consecutive pixels significantly different from the
reference; what we care about is the X coordinate of the center of such an interruption.
We initialize to zero a buffer of the same size as one row, and then we start scanning
the selected line (say l). For each pixel p = (px, l), we compute the absolute
difference dp from the corresponding reference value; then, for each pixel q = (qx, l)
in a neighborhood of length n, we add to the buffer at qx this dp multiplied by a
factor m inversely proportional to |px − qx|.
Finally we read in the buffer a peak value corresponding to the X coordinate of the
center of the interruption (fig. 2.2); if no interruption occurred in the row (i.e. pixels
different from the reference were not close to each other), we will have only ―low‖
peaks in the buffer.
To distinguish between a ―high‖ and a ―low‖ peak we can use a fixed or a relative
threshold; in our tests, a safe threshold was about 20 times greater than the
neighborhood length.
Figure 2.2: The buffer used for the analysis of the green row shows a clear peak.
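A minimal sketch of the buffer analysis described above. The inverse-distance weight is implemented here as integer division by |px − qx| + 1, which is an assumption; the project's exact weighting may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Each pixel spreads its absolute difference from the reference into a
// neighborhood of n pixels, weighted inversely by distance, so that only
// spatially contiguous differences build a high peak.
// Returns the X of the highest peak, or -1 if every peak is "low".
int interruptionCenter(const std::vector<int>& line,
                       const std::vector<int>& reference,
                       std::size_t n, long threshold) {
    std::vector<long> buffer(line.size(), 0);
    for (std::size_t px = 0; px < line.size(); ++px) {
        long dp = std::labs(static_cast<long>(line[px]) - reference[px]);
        std::size_t lo = (px > n) ? px - n : 0;
        std::size_t hi = std::min(px + n, line.size() - 1);
        for (std::size_t qx = lo; qx <= hi; ++qx) {
            std::size_t d = (px > qx) ? px - qx : qx - px;
            buffer[qx] += dp / static_cast<long>(d + 1);  // weight ~ 1/|px-qx|
        }
    }
    std::size_t best = 0;
    for (std::size_t x = 1; x < buffer.size(); ++x)
        if (buffer[x] > buffer[best]) best = x;
    return (buffer[best] > threshold) ? static_cast<int>(best) : -1;
}
```

With n = 2 and the text's rule of thumb, a threshold of 40 (20 times the neighborhood length) rejects isolated noisy pixels while a contiguous run of differing pixels produces a clear peak at its center.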
Now we have a horizontal proximity check, but not a vertical one yet. Each webcam
always sees the pointer breaking into the view frustum from the upper side. The pointer
silhouette may be straight (like a stick) or curved (e.g. a finger); in both cases, the
interruptions found on scanlines close to each other should not differ by more than a
given threshold.
This vertical proximity constraint puts a linear upper bound on the curvature of the
pointer, and helps discard interruptions caused by noise or by other objects entering
the view frustum; in other words, the system detects only pointers coming from
above, and keeps working correctly if other objects appear in the view frustum from a
different direction (e.g. the black pen in fig. 2.3).
Figure 2.3: The system correctly detects only the pointer coming from above.
These two simple proximity checks make the recognition of the pointer an easier task.
Fig. 2.4 shows the correct detection of the pointer (a hand holding a pen) over a
challenging background. The lower end of the vertical sequence of interruptions is
marked with a little red cross.
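The vertical check reduces, in essence, to comparing interruption centers on successive scanlines. A hypothetical helper (names and the exact acceptance rule are assumptions, not the project's code):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Vertical contiguity check: interruption centers found on successive
// scanlines (listed top to bottom) must all exist and must not shift
// horizontally by more than maxShift pixels. This bounds the pointer's
// curvature and rejects objects entering from directions other than above.
bool verticallyContiguous(const std::vector<int>& centers, int maxShift) {
    for (std::size_t i = 1; i < centers.size(); ++i) {
        if (centers[i] < 0) return false;  // a scanline had no interruption
        if (std::abs(centers[i] - centers[i - 1]) > maxShift) return false;
    }
    return !centers.empty() && centers[0] >= 0;
}
```

A stick gives nearly identical centers on every scanline; a curved finger drifts gradually, staying within the per-line shift bound.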
Figure 2.4: The vertical contiguity constraint applied to a hand holding a pen.
2.5 Positioning the cameras
The proposed technique requires positioning two webcams relative to the pointing
surface. The simplest choice is to put them so that one detects only movements along
the X axis, while the other detects Y axis changes. This solution is the simplest to
implement, but it requires the webcams to have their optical axes perfectly aligned
along the sides of the pointing surface. Moreover, the wider the view field of a
webcam, the more accuracy we lose on the opposite side of the surface. On the other
hand, the narrower the view field of the webcams, the farther we have to put them to
capture the entire surface.
Figure 2.5: Example of a simple but inefficient configuration.
In fig. 2.5, for example, the webcam along the Y axis of the surface has a wide view
field, but this brings resolution loss on segment DC; on the other side, the webcam
along the X axis of the surface has a narrow view field, but it has to be positioned far
from the pointing surface to cover the whole area. If the surface is a 2×1.5 m projected
wall and the webcam has a 45° view field, we have to put the camera ~5.2 meters away
to catch the whole horizontal size. A really usable system should not bother the final
user with webcam calibration, view angles and so on.
A way to minimize the calibration effort is to position the webcams near two non-
opposite corners of the pointing surface, far enough to catch it whole and oriented so
that the surface diagonals are approximately the bisectors of the respective view fields
(fig. 2.6). With this configuration there is no need to put the webcams far away from
the surface; this reduces the accuracy loss on the "far" sides.
Figure 2.6: Suggested configuration to optimize the use of view frustum of the cameras.
In the rest of this project we will assume, for the sake of clarity, that the webcams are
in the same locations and orientations as in fig. 2.6. However, the proposed tracking
algorithm works with a variety of configurations without changes in the calibration
phase: the cameras may be positioned anywhere around the surface; the only
requirement is that they do not face each other.
2.6 Calibration phase
When the system is loaded, the calibration phase starts.
In this phase, after grabbing the reference backgrounds, we ask the user to touch the
vertices of the pointing surface and its center. When a pointer is detected in both
views, we track the position of its lower end (the red cross in figs. 2.3 and 2.4); if this
position holds with a low variance for a couple of seconds, the corresponding X
coordinate is stored. After grabbing the position of all five points, we compute the
Y coordinate of a "special" scanline as the lowest row not intercepting the pointing
surface: during the tracking we will focus only on this row to grab the position of the
pointer, so that the overall complexity is linear in the horizontal resolution.
2.7 Tracking algorithm
During the calibration phase we stored the X coordinate of each vertex as seen by the
webcams. The basic idea is to compute the perspective transformation that maps the
absolute screen coordinates to absolute coordinates in the viewed image. We store
vertices in homogeneous coordinates and use a 3×3 transformation matrix M:

    α (xp, yp, 1)^T = P = M (x, y, 1)^T,    M = | l11 l12 l13 |
                                                | l21 l22 l23 |
                                                | l31 l32 l33 |

where (x, y) is a point on the screen and (xp, yp) its projection in the camera image.
Since P is determined up to a proportional factor α there is no loss of generality in
setting one of the elements of M to an arbitrary non-zero value. In the following we
set the element l33 = 1. To obtain all the other elements of M, in principle the
correspondence between four pairs of points must be given. The proposed application
only needs to look at horizontal scanlines; for this reason there is no need to know the
coefficients l21,l22,l23 of M and we only have to determine the values of l11,l12,l13,l31,l32.
The number of unknown matrix elements has been decreased to five, so we only need
the x coordinate of five points (instead of the x and y of four points).
During the calibration phase, we ask the user to touch the four vertices of the pointing
surface and its center.
This setup greatly simplifies the computation of the unknown coefficients. Indeed,
when the display resolution is W × H, points A, B, C, D and the center E (see fig. 2.6)
have screen coordinates respectively:

    A = (0, 0),  B = (W, 0),  C = (W, H),  D = (0, H),  E = (W/2, H/2).
If Q is a point on the surface, let Qxp be the x coordinate of the corresponding
projected point. Each calibration point then yields one equation of the form

    Qxp (l31 Qx + l32 Qy + 1) = l11 Qx + l12 Qy + l13,

and the resulting five-equation linear system makes it easy to obtain l11, l12, l13, l31,
l32 for each camera.
During the tracking phase we face the inverse problem: we know the projected x
coordinate in each view, and from these values (call them Xl and Xr) we would like to
compute the x and y coordinates of the corresponding unprojected point (that is, the
point the user is touching). Let lij be the transformation values for the first camera,
and rij for the second one; the system we have to solve in this case is, in homogeneous
form,

    zl Xl = l11 x + l12 y + l13,    zl = l31 x + l32 y + 1,
    zr Xr = r11 x + r12 y + r13,    zr = r31 x + r32 y + 1.
It is convenient to eliminate zl from the first pair of equations and zr from the second
pair, so that the final system is

    (l11 − Xl l31) x + (l12 − Xl l32) y = Xl − l13,
    (r11 − Xr r31) x + (r12 − Xr r32) y = Xr − r13.
This is a determined linear system, and it is possible to prove that in the setting above
there is always one and only one solution. By solving this system in x and y we find
the absolute coordinates of the point that the user is pointing/touching on the surface.
We can solve this system very quickly by computing an LU factorization of the
coefficient matrix once, and using it to compute x and y for each pair of frames; we
could also use numerical methods, such as Singular Value Decomposition, or direct
formulas. The previous version of the system used direct formulas, while now an LU
factorization is implemented.
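The factor-once, solve-per-frame pattern can be sketched with a 2×2 Doolittle LU factorization. This is an illustrative sketch: the matrix values in the usage below are placeholders, not the project's calibration coefficients.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

// A = L*U with unit diagonal in L; enough state for a 2x2 system.
struct LU2 {
    double l21, u11, u12, u22;
};

// Factor the coefficient matrix once (e.g. right after calibration).
LU2 factor(const std::array<std::array<double, 2>, 2>& A) {
    if (std::fabs(A[0][0]) < 1e-12) throw std::runtime_error("pivot ~ 0");
    LU2 f;
    f.u11 = A[0][0];
    f.u12 = A[0][1];
    f.l21 = A[1][0] / A[0][0];
    f.u22 = A[1][1] - f.l21 * A[0][1];
    return f;
}

// Reuse the factorization to solve A*(x,y) = b for each pair of frames.
std::array<double, 2> solve(const LU2& f, const std::array<double, 2>& b) {
    double y1 = b[0];                    // forward substitution: L*y = b
    double y2 = b[1] - f.l21 * y1;
    double x2 = y2 / f.u22;              // back substitution: U*x = y
    double x1 = (y1 - f.u12 * x2) / f.u11;
    return {x1, x2};
}
```

Since only the right-hand side changes from frame to frame, the factorization cost is paid once and each frame needs just a forward and a back substitution. The actual code delegates this to GSL.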
2.8 Resolution accuracy
Let us now consider how accurate the tracking system is, depending on the physical
characteristics of the display and the webcams.
Let t = (xt, yt) be a point on the pointing surface, XD × YD the display resolution (i.e.
the resolution of the projector for a projected wall) and XW1 × YW1 the resolution of a
webcam W1; let βW1 be the bisector of the view frustum of W1, and let the upper left
corner of the surface be the origin of our coordinate system (with Y pointing
downwards, as in fig. 2.7). We assume for simplicity that the view frustum of the
camera is centered on the bisector of the coordinate system, but the following
considerations remain valid also in slightly different configurations.
The higher the number of pixels detected by the webcam for each real pixel of the
display, the more accurate the tracking will be. Thus, if we want to know how accurate
the detection of a point on the pointing surface is, we can consider the ratio between
the number of pixels detected by the webcam W1 and the length in pixels of the
segment Xt passing through t and perpendicular to βW1. We call this ratio the
resolution accuracy of W1 in t, written σ(W1, t). It is clear that we only care about the
horizontal resolution of W1, which is constant in the whole view frustum of the
camera (fig. 2.7).
Figure 2.7: We define the "resolution accuracy of W1 in t" as the ratio between the number of
pixels detected by W1 and the length of Xt.
Because pixels are approximately square, the number of pixels along the diagonal of
a square is equal to the number of pixels along an edge of the square; thus, the length
of Xt will be equal to the distance from the origin of either of the two points that Xt
intercepts on the X and Y axes.
For every point p ∈ Xt we have xp + yp = k; the length of Xt is therefore equal to the
y-intercept of the line passing through t and perpendicular to βW1. So we have
|Xt| = xt + yt; hence, the resolution accuracy of W1 in t is

    σ(W1, t) = XW1 / (xt + yt).
One of the most interesting applications of the system is to projected walls, so that
they become virtual blackboards.
A very common projector resolution is nowadays 1024 × 768 pixels, while one of
the maximum resolutions that recent low-cost webcams support is 1280 × 1024 pixels
at 15 frames per second. In this configuration, the resolution accuracy in t = (1024,
768) is

    σ(W1, t) = 1280 / (1024 + 768) ≈ 0.7.
This is the lowest resolution accuracy we have with W1 in the worst orientation; if we
invert the X axis to get the accuracy for W2 (supposing that W2 is placed on the upper
right corner of the surface), σ (W2, t) ≈1.7.
In the central point u = (512, 384) of the display we have σ(W1, u) = σ(W2, u) ≈ 1.4;
it is immediate that, in the above configuration, the average resolution accuracy is
higher than 1:1 (sub-pixel).
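These figures follow directly from the relation |Xt| = xt + yt. A small sketch reproducing the arithmetic, assuming σ is the webcam's horizontal resolution divided by |Xt|:

```cpp
// Resolution accuracy of a webcam at point (xt, yt) on the pointing surface:
// webcam pixels available per display pixel along the segment through the
// point and perpendicular to the view-frustum bisector, with |Xt| = xt + yt.
double resolutionAccuracy(int camWidth, int xt, int yt) {
    return static_cast<double>(camWidth) / (xt + yt);
}
```

With a 1280-pixel-wide webcam, the worst corner (1024, 768) yields about 0.7, the same point seen from the opposite corner about 1.7, and the center (512, 384) about 1.4, matching the values in the text.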
2.9 Algorithm complexity
The number of scanlines is constant and in the tracking phase it is not useful to use
more than 3 or 4 of them. For each scanline we do a noise reduction (in linear time),
we apply a linear convolution filter (in linear time too) and then we do a linear search
for a peak. Finally, we solve the system (in constant time). The total complexity is
therefore linear with the horizontal resolution of the webcams.
2.10 Settings and system performance
The webcams we used for testing are two A4Tech PK 720G, with the following
specifications:
Image sensor: 1/6" CMOS, 640×480 pixels
Lens: F=2.4, f=3.5 mm
View angle: 54 degrees
Exposure control: Automatic
White balance: Automatic
Computer interface: USB 2.0
Focus range: Automatic focus, 10 cm to infinity
Frame rates: 30fps@160x120, @320x240, @640x480
Their 2012 price was about 110 EGP each. There is a mature Video4Linux2
compliant driver (uvcvideo) available for GNU/Linux.
Our prototype has good resolution accuracy and excellent time performance: less
than 10 milliseconds are needed to process a new frame and compute the pointer
coordinates. Two USB webcams connected to the same computer can usually send
fewer than 20 frames per second simultaneously, while the software layer could
process hundreds more.
The tracking system is written in C++ for a GNU/Linux environment; in the relatively
small source code, all software layers are strictly separated, so that the whole system
can be ported to different platforms with very few changes to the source.
Figure 2.8: A4Tech webcam,
PK 720G model.
Chapter 3: Notes on the code
The code consists of separate files. Most of them are standard header files or contain
many standard functions. Most of our coding effort went into the files constants.h,
main.cpp and the makefile.
3.1 Overall flow of the code
The flowchart of the program can be summarized as follows:

1. Start: detect the screen size, then initialize the webcams and the mouse handler.
2. Grab 4 frames per webcam and average them to set a reference image for each
   webcam.
3. Ask the user to touch the 4 corners of the screen and its center.
4. For each corner, compare the live frames of each webcam with its reference
   image, redefining the touchline after each corner. If the RMSE exceeds 8.0,
   grab a new reference and retry; if the detection attempts for any corner
   exceed 100, exit.
5. Calibration completed: send the values to GSL for the calculations.
6. Tracking loop: wait for interruptions. If there are interruptions in the pointing
   scanlines, move the mouse; if there is also an interruption in the touchline
   below the pointing interruptions and it falls inside the tracking area, click
   the mouse.
3.2 Notes on main.cpp
3.2.1 Defining the two webcams we need
The following lines are responsible for defining which webcams to use:
const char *videodevice1 = "/dev/video1";
const char *videodevice2 = "/dev/video2";
If the host computer doesn't have any other webcams (e.g. no built-in webcam),
these lines should be:
const char *videodevice1 = "/dev/video0";
const char *videodevice2 = "/dev/video1";
In general, we used an application called "Cheese webcam" to test the webcams and to
determine their IDs. After installing "Cheese webcam" using the "Ubuntu Software
Center", go to Edit → Preferences to see a list of all connected webcams and their IDs.
3.2.2 Smoothing neighborhood in averaging
It can be defined in the file constants.h, but it is defined in main.cpp for now. It
determines how many pixels before and after each pixel to blur horizontally:
unsigned int SMOOTHING_NEIGHBORHOOD = 2;
It shouldn't be high, to keep the reference image realistic.
3.2.3 The color model to be used
Two color models are available in the code: YUV and RGB. Selection is made using
the following line:
bool RGB_INSTEAD_OF_YUV = false;
false selects YUV; true selects RGB.
3.2.4 Debugging
There are two debugging modes. debug_one is for debugging the first webcam only:
it shows a live stream from that webcam with a single horizontal line across the image
marking the scanline, and a histogram below the live stream showing interruptions as
in fig. 2.2. The other mode is an overall debugging mode. Either is activated using the
following lines:
debug = false;
debug_one = false;
If debug_one is activated (set to "true"), it prevents the rest of the code from
running.
3.2.5 Luminance
The value of the following variable should be set depending on the luminance of the
surroundings:
norm_luminance = false;
3.2.6 Control the code while running
Some options can be altered while the code is running, as follows:
q: Quit.
s: Edit the smoothing neighborhood.
l: Select the line to scan.
h: Select the histogram mode (l for live, p for peak, s for static, d for differential).
m: Select the color model (y for YUV, r for RGB).
u: Update the reference images.
3.3 Notes on constants.h
3.3.1 Threshold of colors difference in each pixel
In general, and for the YUYV model, the threshold can be controlled using the following
lines:
const unsigned char COLOR_THRESHOLD = 20;
const unsigned char Y_THRESHOLD = 20;
For the RGB model, the threshold is applied separately to each channel R, G, B:
const unsigned char R_THRESHOLD = 35;
const unsigned char G_THRESHOLD = 38;
const unsigned char B_THRESHOLD = 35;
3.3.2 Consecutive pixels threshold
How many consecutive differing pixels are needed before we consider an interruption
started, and how many consecutive matching pixels before we consider it ended?
const unsigned int LENGTH_THRESHOLD = 16;
const unsigned int HOLE_THRESHOLD = 3;
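One plausible reading of these two constants (the project's actual loop may differ): a run of differing pixels counts as an interruption only if it reaches LENGTH_THRESHOLD pixels, and gaps of up to HOLE_THRESHOLD matching pixels do not break the run.

```cpp
#include <vector>

// Decide whether a scanline of per-pixel "differs from reference" flags
// contains an interruption: runs must reach lengthThreshold, and holes
// of at most holeThreshold pixels are absorbed into the run.
bool hasInterruption(const std::vector<bool>& differs,
                     unsigned lengthThreshold, unsigned holeThreshold) {
    unsigned run = 0, hole = 0;
    for (bool d : differs) {
        if (d) { run += 1 + hole; hole = 0; }     // a small gap is absorbed
        else if (run > 0 && ++hole > holeThreshold) { run = 0; hole = 0; }
        if (run >= lengthThreshold) return true;
    }
    return false;
}
```

With the default values, a solid run of 16 differing pixels (possibly with gaps of up to 3 pixels inside it) is accepted, while shorter runs separated by wider gaps are rejected.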
3.3.3 Calibration touch offset
Difference between the lowest breakpoint detected in the image and the height of the
scanline to choose for the interruption.
const unsigned int CALIBRATION_TOUCH_OFFEST = 8; //would edit it to make it 2
3.3.4 Consecutive detections to locate a corner
How many consecutive breaks are necessary to claim to have located the corner?
const unsigned int ALT_CALIBRATION_CONSECUTIVE_INTERRUPTIONS = 6; // make it 15
3.3.5 Limit of attempts to locate a corner
Maximum number of attempts for each corner detection.
const unsigned int CALIBRATION_CORNER_ATTEMPTS = 100;
3.3.6 Calibration scanlines distances
Distance between scanlines. The height of the touching line is established in the
calibration, the others are calculated using this value.
const unsigned int CALIBRATION_SCALINES_DISTANCE = 20;
3.3.7 Picture format, resolution, fps and grab method
In the following lines, you should only enter a resolution, fps, format and grab
method supported by the webcams.
Check the appendix for more details on how to get these details about any webcam.
const unsigned int width = 320;
const unsigned int height = 240;
const unsigned int fps = 30;
const int grabmethod = 1;  // Use mmap (default)
// const int grabmethod = 0;  // Ask for read instead of the default mmap
const int format = V4L2_PIX_FMT_YUYV;  // Better quality, lower framerate
// const int format = V4L2_PIX_FMT_MJPEG;  // Lower quality, higher frame rate
Note that entering an unsupported option will lead to error 22, and entering a higher
resolution without lowering the fps or using the MJPEG format will lead to error 28,
which is due to the USB 2.0 bandwidth limitation.
More details about error 22 and error 28 can be found in section 4.2.
3.4 Compiling the code after any edits
To compile the code on Ubuntu, press "Alt+Ctrl+T" to open the terminal. If the code
is in the folder "YosrON" on Desktop, then type:
cd Desktop/YosrON
Note that all commands in the terminal are case-sensitive, including folder names.
To remove older compilation files, type:
make clean
To make new compilation files, type:
make
To run the code, type (for example):
./yosron
Chapter 4: Challenges
4.1 The environment
We spent a very long time searching for the best software environment, from the
programming language and toolboxes/libraries to the OS.
4.1.1 OpenCV on Windows
We started with the OpenCV library with Visual Studio C++ on Windows 7, 64-bit. We
faced many problems at first due to incompatibilities between the latest version of
OpenCV and Windows 7. After a lot of online searching, we were instructed to use an
older version of OpenCV. We used version 2.2 and were able to interface with the
webcams.
When we started to work on the code, we needed to process a single horizontal line of
pixels instead of processing the entire image, an essential requirement for our project
since we wanted the software to be as fast and lightweight as possible. After consulting
engineers experienced with OpenCV, we were told that OpenCV can't do this and must
process the entire image. So we had to look for alternatives, which led us to C/C++
programming on Linux/Ubuntu.
4.1.2 C/C++ programming on Linux/Ubuntu
We had to change our track from Windows to Linux, even though our time was very
limited. We were encouraged to do so after we communicated with Eng. E. Rustico,
the designer of EverScreen, who supported us with very useful documentation, code
and instructions that helped us achieve our main target.
The OS used is Ubuntu 11.10, 64-bit, with kernel version 3.0.0-22 and gcc/g++
version 4.4.6 (gcc/g++ is the C/C++ compiler on Linux).
Installing Ubuntu is a little bit tricky as there are many options. We tried to install it
using Wubi (Windows Ubuntu Installer) but had many problems. After many attempts
to fix them, we assumed that they would disappear if we redid the installation using
another method, so we removed the installation and reinstalled from a boot CD
alongside Windows 7. Details about this process are available in the appendix.
4.1.3 Libraries that must be installed for the code
Build-essential:
An informational list of needed packages for C/C++
programming on Linux as it generally includes gcc/g++ and
other utilities and libraries.
Libc dev:
It provides headers from the Linux kernel. These headers are
used by the installed headers for GNU glibc and other system
libraries.
SDL dev (libsdl1.2-dev):
Simple DirectMedia Layer is a cross-platform multimedia
library designed to provide low level access to audio, keyboard,
mouse, joystick, 3D hardware via OpenGL, and 2D video
framebuffer. It is used by MPEG playback software, emulators,
and many popular games, including the award winning Linux
port of "Civilization: Call To Power."
SDL supports Linux, Windows, Windows CE, BeOS, MacOS,
Mac OS X, FreeBSD, NetBSD, OpenBSD, BSD/OS, Solaris,
IRIX, and QNX. The code contains support for AmigaOS,
Dreamcast, Atari, AIX, OSF/Tru64, RISC OS, SymbianOS,
and OS/2, but these are not officially supported.
SDL is written in C, but works with C++ natively, and has
bindings to several other languages, including Ada, C#, D,
Eiffel, Erlang, Euphoria, Go, Guile, Haskell, Java, Lisp, Lua,
ML, Objective C, Pascal, Perl, PHP, Pike, Pliant, Python,
Ruby, Smalltalk, and Tcl.
GSL dev (libgsl0-dev):
The GNU Scientific Library (GSL) is a numerical library for C
and C++ programmers. It is free software under the GNU
General Public License.
The library provides a wide range of mathematical routines
such as random number generators, special functions and least-
squares fitting. There are over 1000 functions in total with an
extensive test suite.
Xorg XTest (libxtst-dev):
The X window system (commonly X Window System or X11,
based on its current major version being 11) is a computer
software system and network protocol that provides a basis for
graphical user interfaces (GUIs) and rich input device
capability for networked computers. It creates a hardware
abstraction layer where software is written to use a generalized
set of commands, allowing for device independence and reuse
of programs on any computer that implements X.
V4L2 dev (libv4l-dev):
Video4Linux or V4L is a video capture application
programming interface for Linux. Many USB webcams, TV
tuners, and other devices are supported. Video4Linux is closely
integrated with the Linux kernel.
V4L2 is the second version of V4L. The original V4L was
introduced late into the 2.1.X development cycle of the Linux
kernel. Video4Linux2 fixed some design bugs and started
appearing in the 2.5.X kernels. Video4Linux2 drivers include a
compatibility mode for Video4Linux1 applications, though
in practice the support can be incomplete and it is
recommended to use V4L2 devices in V4L2 mode.
It is an API that provides unified access to various
video capture devices, such as TV tuners, USB web cameras,
etc.
UVC drivers:
The USB video device class (also USB video class or UVC) is
a USB device class that describes devices capable of streaming
video like webcams, digital camcorders, transcoders, analog
video converters, television tuners, and still-image cameras.
The latest revision of the USB video class specification carries
the version number 1.1 and was defined by the USB
Implementers Forum in a set of documents describing both the
basic protocol and the different payload formats.
Webcams were among the first devices to support the UVC
standard and they are currently the most popular UVC devices.
It can be expected that in the near future most webcams will be
UVC compatible, as this is a logo requirement for Windows.
Since Linux 2.6.26 the driver is included in the kernel source
distribution.
luvcview:
luvcview is a camera viewer for UVC based webcams. It
includes an mjpeg decoder and is able to save the video stream
as an AVI file.
guvcview:
It provides a simple GTK interface for capturing and viewing
video from devices supported by the linux UVC driver,
although it should also work with any v4l2 compatible device.
The project is based on luvcview for video rendering, but all
controls are built using a GTK2 interface. It can also be used as
a control window only.
4.2 The cameras
The cameras were very hard to find in the Egyptian market due to the lack of the
detailed technical specifications we need about a camera before buying it. The cameras
must be UVC compliant and support different control options for resolution, frames
per second, color profiles, etc. We also needed the cameras to be mechanically solid
and stiff, capable of being fixed on any surface, and able to point the lens in any
direction.
First, we bought two 2B webcams and they worked nicely with OpenCV. But
when we migrated to Ubuntu, we had a major problem in the first phase of the project
(pointer/finger detection, streaming from one webcam only): they were working well
with guvcview but producing an error (error 22) with our code. We checked their
driver to make sure they are UVC compliant, as we can't use the Windows driver
provided on the CD (checking UVC compliance for webcams is covered in the
appendix). Error 22 was produced because the code was configured for the MJPEG
picture format, which is a compressed format of the raw stream, while the cameras
only support the YUYV format, which is the uncompressed/raw format.
The MJPEG format had been chosen in the beginning because it needs little USB
bandwidth, so that we could use 4 webcams or more on the same USB 2.0 bus, while
the YUYV format consumes more bandwidth with slightly better quality.
Unfortunately, most or all webcams in the Egyptian market don't support the MJPEG
format, and we have been told that ones that do would be much more expensive
(checking the formats supported by a webcam is covered in the appendix).
But when we moved to the second phase (streaming from two webcams for calibration
and calculating the pointer/finger position to move the mouse) we faced other errors
(28 and 16). After searching online, we found that error 28 is due to the USB
bandwidth limitation and error 16 means the device hanged.
The USB 2.0 bus supports a total bandwidth of 480 Mbps, and the bandwidth required
by a webcam depends on its configuration.
For a resolution of 640 x 480, 30 frames per second and 32-bit colors: the required
bandwidth = 640 x 480 x 30 x 32 = 294912000 bits/second = 294.912 Mbps
So, the total required bandwidth for two webcams = 2 x 294.912 = 589.824 Mbps
which is higher than the 480 Mbps total bandwidth supported by USB 2.0.
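The arithmetic above can be captured in a one-line helper:

```cpp
// Required USB bandwidth for an uncompressed video stream:
// width * height * fps * bits-per-pixel, expressed in Mbps.
double requiredMbps(int width, int height, int fps, int bitsPerPixel) {
    return static_cast<double>(width) * height * fps * bitsPerPixel / 1e6;
}
```

For two 640x480, 30 fps, 32-bit streams this gives 2 × 294.912 ≈ 589.8 Mbps, above the 480 Mbps USB 2.0 limit; note that YUYV actually uses 16 bits per pixel, which would halve these figures.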
Overcoming this problem was supposed to be easy by setting the configuration of the
webcams to fewer frames (15 fps) or a lower resolution (320x240), but that didn't work.
After spending more than a week investigating this problem and trying all the
suggested solutions, we suspected that the 2B webcams support only one bandwidth
setting regardless of the configuration, i.e. each webcam reserves a fixed USB
bandwidth much larger than it really needs, no matter what the configuration is.
Error 16 is closely related to error 28, as it means that the device hanged and can't be
accessed. When a webcam starts streaming, it reserves the bandwidth.
When the other webcam starts to work on the same bus, it requests the needed
bandwidth, which is not available because of the first webcam. So both webcams
hang and stop responding while the system keeps their ports (e.g. /dev/video1)
reserved, forcing us to unplug and replug them.
Our final solution for these errors was to buy another two webcams that support either
MJPEG format or variable bandwidth depending on the configuration.
We didn't find webcams in the Egyptian market that support MJPEG format but we
found A4Tech webcams that supported variable bandwidth depending on the
configuration.
The A4Tech webcams don't support MJPEG and support only 30 frames per second, so
we had to work with a 320 x 240 resolution, which is acceptable for our needs.
4.3 The fisheye lenses
We need the view angle of each webcam to be more than 90 degrees, to be able to put
them very near to the screen without any blind areas. Most webcams have a view
angle of less than 60 degrees, so we need to use fisheye lenses, installing one on each
webcam.
We needed a full-frame fisheye lens that produces images as in fig. 4.1.
Figure 4.1: Full-frame fisheye image.
This can then be remapped into rectilinear perspective ("defisheyed") with any of the
available scripts, such as Panorama Tools, as in fig. 4.2.
Figure 4.2: Remapped full-frame fisheye image in rectilinear perspective.
We searched in many places and asked many photographers and glass makers to help
us find a single lens that could serve as a full-frame fisheye small enough for our
webcams, but all the attempts failed.
We also couldn't find a circular fisheye lens, which would produce an image as in fig. 4.3.
Figure 4.3: Circular fisheye image.
That also can be remapped into a normal image as in fig. 4.4.
Figure 4.4: The image of circular fisheye after remapping (Defisheye).
Our final hope was to use the only available fisheye small enough for
YosrON: the fisheye lens made for home doors, as in fig. 4.5.
We removed its metallic housing, which we don't need, to make the
lens small enough to fit in the plastic frame.
After removing the housing of the webcams and fixing the fisheye lenses on them, we
faced a problem that we couldn't overcome due to the lack of time and of available
support in Egypt: the fisheye lens produced internal reflections in the image (i.e. the
lighting would be repeated in other parts of the image), increasing the noise to
unacceptable levels.
Another problem was the difficulty of finding two exactly identical fisheye lenses.
We thought it would be simple if we bought them both from the same brand
and the same shop, but believe it or not: they weren't identical!
Although the "identical lenses" problem could be overcome in software, the killing
problem was the internal reflections, which made us postpone the fisheye addition
and the plastic frame to future work.
Figure 4.5: Fisheye for home doors.
Chapter 5: Conclusions and Future Work
5.1 Conclusions
We presented a low-cost bare-finger tracking system able to turn an LCD display
into a touchscreen, a desk into a design board, or a wall into an interactive
whiteboard. Many application domains can benefit from the proposed solution:
designers, teachers, gamers, interface developers. The proposed system requires only
a simple calibration phase.
5.2 Future work
Future work will be devoted to improving the robustness of the calibration and
pointer-detection subsystems; moreover, suitable evaluation procedures to test the
empirical accuracy of the tracking will be developed. Adding multitouch support will
also be considered.
The system needs a GUI for installation, calibration and configuration, as all of these
are currently done by editing the source code, which is of course not user friendly.
It would be better if the processing load were not on the host computer. This could be
done with a standalone DSP unit for image processing and position calculations,
which would require changes to the cameras and the code.
A standalone DSP processing unit would also make the system cross-OS: all the
processing would be done on that unit, which would only send signals to the OS
through USB to move the mouse, perform clicks and even multitouch gestures. That
would save us from writing drivers and code editions for each OS, such as Windows,
Linux and Mac OS.
Solving the problem of the fisheye lenses is still essential for YosrON to become a
user-friendly product. After solving it we can easily seek to put the entire hardware
inside a resizable plastic housing.
References
[Figure 1.1] Survey from the UniMasr.com website, available at:
http://unimasr.com/community/viewtopic.php?t=87470.
[Figure 1.2] Survey from YosrON page on facebook (http://fb.com/yosronx)
at: http://fb.com/questions/242871132427684/.
[Figure 1.3] Image and price details from http://www.magictouch.com and
local resellers available at: http://www.magictouch.com/middleeast.html.
[Figure 2.8] A4Tech webcam, PK 720G model at:
http://a4tech.com/product.asp?cid=77&scid=167&id=693.
E. Rustico. "Low cost finger tracking for a virtual blackboard". Available at:
http://www.dmi.unict.it/~rustico/docs/Low%20cost%20finger%20tracking%20for%20a%20virtual%20blackboard.pdf.
[AA07] A. Agarwal, S. Izadi, M. Chandraker, and A. Blake. High precision
multi-touch sensing on surfaces using overhead cameras. In
Horizontal Interactive Human-Computer Systems, 2007. TABLETOP ’07.
Second Annual IEEE International Workshop on, pages 197–200, 2007.
[Ber03] F. Berard. The magic table: Computer vision based augmentation of a
whiteboard for creative meetings. IEEE International Conference in Computer
Vision, 2003.
[CT06] Kelvin Cheng and Masahiro Takatsuka. Estimating virtual touchscreen
for fingertip interaction with large displays. In OZCHI ’06: Proceedings of the
20th conference of the computer-human interaction special interest group
(CHISIG) of Australia on Computer-human interaction: design: activities,
artefacts and environments, pages 397–400, New York, NY, USA, 2006.
ACM.
[DUS01] Klaus Dorfmüller-Ulhaas and Dieter Schmalstieg. Finger tracking
for interaction in augmented environments. Augmented Reality, International
Symposium on, 0:55, 2001.
[FR08] G.M. Farinella and E. Rustico. Low cost finger tracking on flat
surfaces. In Eurographics Italian chapter 2008, 2008.
[GMR02] D. Gorodnichy, S. Malik, and G. Roth. Nouse ’use your nose as a
mouse’ – a new technology for hands-free games and interfaces, 2002.
[GOSC00] Christophe Le Gal, Ali Erdem Ozcan, Karl Schwerdt, and James L.
Crowley. A sound magicboard. In ICMI ’00: Proceedings of the Third
International Conference on Advances in Multimodal Interfaces, pages 65–71,
London, UK, 2000. Springer-Verlag.
[IVV01] Giancarlo Iannizzotto, Massimo Villari, and Lorenzo Vita. Hand
tracking for human-computer interaction with gray level visual glove: turning
back to the simple way. In PUI ’01: Proceedings of the 2001 workshop on
Perceptive user interfaces, pages 1–7, New York, NY, USA, 2001. ACM.
[Jen99] Cullen Jennings. Robust finger tracking with multiple cameras. In
Proc. of the International Workshop on Recognition, Analysis, and Tracking
of Faces and Gestures in Real-Time Systems, pages 152–160, 1999.
[LB04] Julien Letessier and François Bérard. Visual tracking of bare fingers
for interactive surfaces. In UIST ’04: Proceedings of the 17th annual ACM
symposium on User interface software and technology, pages 119–122, New
York, NY, USA, 2004. ACM.
[Lee07] Johnny Chung Lee. Head tracking for desktop VR displays using the
Wii remote. http://www.cs.cmu.edu/~johnny/projects/wii, 2007.
[ML04] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed
gestural input device. In ICMI ’04: Proceedings of the 6th international
conference on Multimodal interfaces, pages 289–296, New York, NY, USA,
2004. ACM.
[MmM01] Lionel Moisan and Jean-Michel Morel. Edge detection by
Helmholtz principle. Journal of Mathematical Imaging and Vision, 14:271–
284, 2001.
[Mor05] Gerald D. Morrison. A camera-based input device for large
interactive displays. IEEE Computer Graphics and Applications, 25(4):52–57,
2005.
[Mos06] Tomer Moscovich. Multi-finger cursor techniques. In GI ’06:
Proceedings of the 2006 conference on Graphics interface, pages 1–7, 2006.
[pHYssCIb98] Yi-Ping Hung, Yao-Strong Yang, Yong-Sheng Chen, and Ing-Bor
Hsieh. Freehand pointer by use of an active stereo vision system. In Proc.
14th Int. Conf. Pattern Recognition, pages 1244–1246, 1998.
[QMZ95] F. Quek, T. Mysliwiec, and M. Zhao. Fingermouse: A freehand
computer pointing interface, 1995.
[SP98] Joshua Strickon and Joseph Paradiso. Tracking hands above large
interactive surfaces with a low-cost scanning laser range finder. In
Proceedings of CHI’98, pages 231–232. Press, 1998.
[ST05] Le Song and Masahiro Takatsuka. Real-time 3d finger pointing for an
augmented desk. In AUIC ’05: Proceedings of the Sixth Australasian
conference on User interface, pages 99–108, Darlinghurst, Australia,
Australia, 2005. Australian Computer Society, Inc.
[vHB01] Christian von Hardenberg and François Bérard. Bare-hand human-
computer interaction. In PUI ’01: Proceedings of the 2001 workshop on
Perceptive user interfaces, pages 1–8, New York, NY, USA, 2001. ACM.
[WC05] Andrew D. Wilson and Edward Cutrell. Flowmouse: A computer
vision-based pointing and gesture input device. In Interact ’05, 2005.
[Wel93] Pierre Wellner. Interacting with paper on the digitaldesk.
Communications of the ACM, 36:87–96, 1993.
[Wil05] Andrew D. Wilson. Play anywhere: a compact interactive tabletop
projection-vision system. In Patrick Baudisch, Mary Czerwinski, and Dan R.
Olsen, editors, UIST, pages 83–92. ACM, 2005.
[WSL00] Andrew Wu, Mubarak Shah, and N. Da Vitoria Lobo. A virtual 3d
blackboard: 3d finger tracking using a single camera. In Fourth IEEE
International Conference on Automatic Face and Gesture Recognition, pages
536–543, 2000.
[Zha03] Zhengyou Zhang. Vision-based interaction with fingers and papers. In
Proc. International Symposium on the CREST Digital Archiving Project,
pages 83–106, 2003.
Details about guvcview package from: http://guvcview.sourceforge.net.
Details about luvcview package from:
http://packages.ubuntu.com/hardy/luvcview.
Details about V4L2 library from: http://en.wikipedia.org/wiki/Video4Linux.
Details about SDL library from: http://www.libsdl.org.
Details about GSL library from: http://www.gnu.org/software/gsl.
Details about Xorg Xtest from:
http://en.wikipedia.org/wiki/X_Window_System.
Details about build-essential package from:
http://packages.ubuntu.com/lucid/build-essential.
Details about UVC drivers from:
http://en.wikipedia.org/wiki/USB_video_device_class.
Details about Libc dev package from: http://packages.debian.org/sid/linux-
libc-dev.
Details about fisheye lenses from: http://en.wikipedia.org/wiki/Fisheye_lens.
Details about defisheye scripts from:
http://www.fmwconcepts.com/imagemagick/defisheye/index.php.
How to install Ubuntu 11.10 from a CD or USB flash memory. From:
http://blog.sudobits.com/2011/09/11/how-to-install-ubuntu-11-10-from-usb-
drive-or-cd/
How to free space on your hard disk and make it unallocated using Windows
Disk Management Tool. From: http://technet.microsoft.com/en-
us/magazine/gg309169.aspx.
How to disable automatic updates in Ubuntu. From:
http://www.garron.me/linux/turn-off-stop-ubuntu-automatic-update.html.
How to install build-essential from:
https://help.ubuntu.com/community/CompilingEasyHowTo.
How to check UVC compliance of a webcam and troubleshoot it from:
http://www.ideasonboard.org/uvc/faq.
Chapter 0: Appendix
0.1 Installing Ubuntu 11.10
The instructions given in this section assume that you want to install Ubuntu 11.10 as
a dual boot alongside Windows 7 (or XP/Vista, or whatever you already have
installed). This is recommended for absolute beginners: if any problem occurs with
Ubuntu, you can still access Windows. If you want something else, such as removing
Windows and installing Ubuntu, or erasing the whole disk and installing Ubuntu on a
new computer, most of the steps are the same; the few differences are pointed out in
the steps below.
Preparing for installation:
First of all: back up your important data
This step is very important, especially for beginners, as some mistakes can
lead to reformatting the entire hard disk and losing data.
So, before starting the installation procedure, you are strongly recommended to back
up your data (using a backup disk or an online backup program). You should not lose
anything if you have multiple partitions on your drive and follow the custom
installation procedure, but you are still supposed to have a backup of all your critical
data before starting any experiments.
Step 1: Download the Ubuntu 11.10 ISO file
First, download the Ubuntu 11.10 ISO (http://releases.ubuntu.com/oneiric), selecting
the archive (ISO) file that matches your computer architecture, such as Intel x86 or
AMD64. If you are not sure, go for the first one. When the download completes,
move on to the next step.
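Ubuntu's release pages also publish checksum files (MD5SUMS) next to the ISOs, so it is a good habit to verify the download before burning it. The following is a minimal sketch: the helper name and the demo file are ours, and a real run would pass the downloaded ISO together with the hash copied from the MD5SUMS file.

```shell
# Compare a file's MD5 digest against a published checksum (as listed in
# Ubuntu's MD5SUMS file). Returns success on a match, failure otherwise.
verify_iso() {
    local iso="$1" expected="$2"
    local actual
    actual=$(md5sum "$iso" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Demonstration on a throwaway file; a real run would be e.g.
#   verify_iso ubuntu-11.10-desktop-i386.iso <hash-from-MD5SUMS>
printf 'demo' > /tmp/demo.iso
verify_iso /tmp/demo.iso "$(md5sum /tmp/demo.iso | awk '{print $1}')" \
    && echo "checksum OK"
```

A mismatch simply means the download is corrupt and should be repeated.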
Step 2: Create a bootable media (USB/CD)
You can create a bootable USB stick or a CD/DVD from the ISO file you have just
downloaded. Creating a bootable CD/DVD is easy: you just need to burn the ISO
image to the disc. In Windows 7 you can burn ISO files directly in a few simple steps:
insert a blank disc into the tray, right-click the ISO file, select "Burn disc image", and
you will get a bootable CD.
If you want to install Ubuntu from a USB flash drive (pendrive) instead, use the free
Universal USB Installer (download it from http://www.pendrivelinux.com/universal-
usb-installer-easy-as-1-2-3 and run it, then locate the ISO file and choose your USB
drive as the target; it will be done in a minute).
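If you already have a Linux machine at hand, the same bootable stick can be written without any extra tool using dd. This is only a sketch: /dev/sdX is a placeholder and must be replaced with the device reported by lsblk, because dd overwrites the target completely.

```shell
# Write the ISO image directly to the USB stick (Linux only).
# /dev/sdX is a placeholder: check the real device name with "lsblk"
# first, since dd will overwrite whatever it is pointed at.
sudo dd if=ubuntu-11.10-desktop-i386.iso of=/dev/sdX bs=4M
sync   # flush write buffers before unplugging the stick
```

The bs=4M block size only affects copy speed, not the result.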
Step 3: Free enough space
Explore your partitions and make sure that one of them has at least 20 GB free. Then
use the Windows 7 Disk Management tool, which provides a simple interface for
managing partitions and volumes. Here is an easy way to shrink a volume:
1. Open the Disk Management console: press "Windows key + R", type
diskmgmt.msc and press Enter.
2. In Disk Management, right-click the volume that you want to shrink, and then
click Shrink Volume.
Figure 0.1: Windows Disk Management Tool.
3. In the field provided in the Shrink dialog box, enter the amount of space by
which to shrink the disk.
Figure 0.2: Shrink dialog box.
The Shrink dialog box provides the following information:
Total Size Before Shrink In MB: Lists the total capacity of the volume in MB. This
is the formatted size of the volume.
Size Of Available Shrink Space In MB: Lists the maximum amount by which you
can shrink the volume. This does not represent the total amount of free space on the
volume; rather, it represents the amount of space that can be removed, not including
any data reserved for the master file table, volume snapshots, page files, and
temporary files.
Enter The Amount Of Space To Shrink In MB: Lists the total amount of space that
will be removed from the volume. The initial value defaults to the maximum amount
of space that can be removed from the volume. For optimal drive performance, you
should ensure that the volume has at least 10 percent of free space after the shrink
operation.
Total Size After Shrink In MB: Lists what the total capacity of the volume in MB
will be after you shrink the volume. This is the new formatted size of the volume.
4. After clicking "Shrink", you should see the free space as a green partition.
Figure 0.3: Windows partitions after successfully freeing space.
That free unallocated space will be automatically used by the Ubuntu installer.
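The arithmetic behind the Shrink dialog is straightforward. The following sketch uses made-up sizes to show how the fields relate, including the 10-percent-free rule of thumb mentioned above:

```shell
# Illustrative numbers (MB); in practice these come from the Shrink dialog.
total_before=100000        # Total Size Before Shrink In MB
shrink_amount=30000        # Enter The Amount Of Space To Shrink In MB
total_after=$((total_before - shrink_amount))
echo "Total Size After Shrink: ${total_after} MB"     # prints 70000 MB

# Rule of thumb: keep at least 10 percent of the shrunken volume free.
min_free=$((total_after / 10))
echo "Keep at least ${min_free} MB free on the remaining volume"
```

With these numbers, shrinking by 30000 MB leaves a 70000 MB volume, of which at least 7000 MB should stay free.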
Step 4: Insert the USB disk (or CD) and restart
Now restart your computer and enter the BIOS setup to make sure that your computer
is configured to boot from CD or USB drives first. The steps of this configuration
differ from one computer to another, so you have to do it yourself: search online for
your motherboard model, or ask any technical support available to you.
While the computer is booting, if you have set BIOS passwords, enter the supervisor
password; the system may not boot from CD if you enter the user password. The
computer should then boot automatically from the bootable medium, and Ubuntu will
be loaded into RAM.
If you are given the option, select "Try Ubuntu without installing" to take a look
before installing it on your hard drive. Then click the "Install Ubuntu 11.10" icon on
the desktop to begin, and select the language (English) to continue.
Step 5: Select Installation Type
For YosrON, we must not allow any updates to the environment, especially kernel
updates or a full upgrade, as these might lead to incompatibilities between the
software headers and the kernel headers. So make sure to uncheck "Download
updates while installing". You can check "Install third-party software", but that
requires an Internet connection (if a wireless network does not seem to work, use a
wired connection). There is no hurry, though: you can always install it later, so this is
optional.
Figure 0.4: Don't allow any updates
Then click Continue; a new window will appear where you need to select the
installation type.
Figure 0.5: Install Ubuntu alongside your current OS.
You may get different options depending on your computer's configuration. The
above snapshot was taken while installing Ubuntu 11.10 on a computer with Ubuntu
10.04 and Windows 7 pre-installed as a dual boot.
Install Ubuntu alongside them: installs Ubuntu 11.10 alongside the existing
operating systems, such as Windows 7.
Erase Entire Disk and Install Ubuntu: erases your whole hard drive; everything
will be deleted (your files as well as the other operating systems). This is useful
only if your hard drive does not contain any important files, or if you have just
bought a new computer and want to keep only one OS, i.e. Ubuntu.
Something Else: create, allocate and choose the partitions on which to install
Ubuntu using the advanced partition manager. At first it may seem a little
difficult, but it gives you more options and control.
However, we will go with the first option: select "Install Ubuntu alongside them"
and continue.
Step 6: Finishing the installation
The remaining steps are easy for any user and are the standard ones described online.
It is important, however, to select the correct keyboard layout to avoid problems later;
most keyboards in Egypt use the "Arabic 101" layout. It is also very important to set a
password for Ubuntu and remember it well, as we will use it when installing the
libraries and packages required by YosrON.
Step 7: Disabling automatic updates
As mentioned before, it is very important for YosrON to disable Ubuntu's automatic
updates feature. From the menu on the left of the screen, open the Ubuntu Software
Center, go to Edit -> Software Sources…, and set the option "Automatically check for
updates:" to "Never".
Figure 0.6: Disabling automatic updates of Ubuntu.
Then click "Close". This will disable automatic updates on your Ubuntu box.
0.2 Installing the required libraries and packages
Some libraries/packages can be installed directly from the "Ubuntu Software Center"
and others must be installed from the "Terminal". An Internet connection is required
in both cases.
The "Ubuntu Software Center" can be opened from the menu on the left of the screen.
The "Terminal" (i.e. the command line) can be opened by pressing "Ctrl+Alt+T".
0.2.1 Installing "build-essential" using the "Terminal"
We need the gcc compiler, which can be obtained by installing the build-essential
package.
Step 1: Prepare your system for building packages
By default, Ubuntu does not come with the required build tools. You need to install
the build-essential package for building, and checkinstall for registering what you
build with your package manager. Both can be found on the install CD or in the
repositories; search for them in the Synaptic Package Manager, or use apt-get in the
terminal:
sudo apt-get install build-essential checkinstall
And since you may want to get code from some projects with no released version, you
should install appropriate version management software.
sudo apt-get install cvs subversion git-core mercurial
You should then create a common directory where you will build these packages. We
recommend "/usr/local/src", but really you can put it anywhere you want. Make sure
this directory is writable by your primary user account by running
sudo chown $USER /usr/local/src
and, just to be safe
sudo chmod u+rwx /usr/local/src
After you've done this, you're set up to start getting the programs you need.
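The directory preparation above can be verified in one short sequence. The sketch below uses a temporary directory so it runs without root; in practice the same checks apply to "/usr/local/src" after the chown/chmod commands have been run.

```shell
# Create a build directory and confirm the current user can work in it.
# mktemp -d stands in for /usr/local/src, which would need the sudo
# chown/chmod commands shown above.
builddir=$(mktemp -d)
chmod u+rwx "$builddir"

# -w: writable, -x: enterable; both must hold for builds to work.
if [ -w "$builddir" ] && [ -x "$builddir" ]; then
    echo "build directory ready: $builddir"
fi
```

If either test fails on the real directory, re-run the chown/chmod commands from this step.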
Step 2: Resolving Dependencies
One nice thing about modern Linux distributions is that they take care of dependencies
for the user: if you want to install a program, apt makes sure all the needed libraries
and other dependent programs are installed as well, so installing a program is never
more difficult than specifying what you need. Unfortunately, with some programs this
is not the case, and you will have to resolve dependencies manually. This stage trips
up even fairly experienced users, who often give up in frustration at not being able to
figure out what they need.
You probably want to read about the possibilities and limitations of auto-apt
(https://help.ubuntu.com/community/AutoApt) first, which will attempt to take
care of dependency issues automatically. The following instructions are for
fulfilling dependencies manually:
To prepare, install the package "apt-file" and then run sudo apt-file update. This
downloads a list of all available packages and all of the files those packages contain,
which, as you might expect, is a very large list. It provides no feedback while loading,
so just wait.
The "apt-file" program has some interesting functions, the two most useful are apt-
file search which searches for a particular file name, and apt-file list which lists all the
files in a given package. (Two explanations:
1{http://debaday.debian.net/2007/01/24/apt-file-search-for-files-in-packages-
installed-or-not/} and 2{http://www.debianhelp.co.uk/findfile.htm})
To check the dependencies of your program, change into the directory created in
Step 1 (cd /usr/local/src). Extracting the tarball, or downloading from
cvs/subversion, will have created a sub-directory under "/usr/local/src" containing the
source code. This newly created directory contains a file called "configure", a script
that makes sure the program can be compiled on your computer. Run it with the
command ./configure. It checks whether you have all the programs needed to build;
in most cases you will not, and it will error out with a message about a missing
program.
If you run ./configure without any options, the program's default settings are
used. Most programs have a range of settings that you can enable or disable;
if you are interested in this, check the README and INSTALL files found in
the directory after decompressing the tarball. You can also check the
developer documentation, and in many cases ./configure --help lists some of
the key configuration options. A very common option is
./configure --prefix=/usr, which installs your application into "/usr" instead
of "/usr/local" as these instructions do.
If this happens, the last line of output will be something like
configure: error: Library requirements (gobbletygook) not met, blah blah blah stuff
we don't care about
But right above that it will list a filename that it cannot find (often one ending in
".pc", for instance). What you need to do then is run
apt-file search missingfilename.pc
which will tell you which Ubuntu package the missing file is in. You can then simply
install the package using
sudo apt-get install requiredpackage
Then try running ./configure again, and see if it works. If you get to a bunch of text
that finishes with "config.status: creating Makefile" followed by no obvious error
messages, you're ready for the next steps.
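The loop of reading configure's error, spotting the missing file and feeding it to apt-file can be semi-automated. The sketch below pulls the first ".pc" filename out of saved configure output; the sample error text is invented for illustration.

```shell
# Extract the first *.pc filename mentioned in configure's error output,
# then show the apt-file command that would locate its package.
# The sample log below is made up for demonstration purposes.
configure_log='checking for GTK... no
Package gtk+-2.0.pc was not found in the pkg-config search path.
configure: error: Library requirements not met'

missing=$(printf '%s\n' "$configure_log" | grep -o '[A-Za-z0-9+._-]*\.pc' | head -n1)
echo "missing file: $missing"              # prints: missing file: gtk+-2.0.pc
echo "next step:    apt-file search $missing"
```

On real output you would pipe configure's stderr into the same grep, e.g. ./configure 2>&1 | grep -o '[A-Za-z0-9+._-]*\.pc'.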
Step 3: Build and install
If you got this far, you have done the hardest part already. Now make sure you are
inside the program's folder (for example, a folder on the desktop called YosrON):
cd Desktop/YosrON
Then run the command
make
which does the actual building (compiling) of the program. (You can use make clean
to remove old compilation files after any edits you make to the code, then run make
again.)
Make sure you installed all the libraries/packages needed for YosrON before running
this command. Check the following sections.
When it's done, install the program. You probably want to use
sudo checkinstall
which puts the program in the package manager for clean, easy removal later. This
replaces the old sudo make install command. See the complete documentation at
CheckInstall (https://help.ubuntu.com/community/CheckInstall).
Note: If checkinstall fails, you may need to run the command as
sudo checkinstall --fstrans=0
which should allow the install to complete successfully.
Then the final stage of the installation will run. It shouldn't take long. When finished,
if you used checkinstall, the program will appear in Synaptic Package Manager. If
you used sudo make install, your application will be installed to "/usr/local/bin" and
you should be able to run it from there without problems.
Finally, it would be better to change the group of "/usr/local/src/" to admin and give
it rwx privileges, since anyone adding and removing software should be in the admin
group.
0.2.2 Installing libraries/packages using "Ubuntu Software Center"
"Ubuntu Software Center" is much easier to be used for installing packages and
libraries.
First of all, you need to enable installing/updating software from other sources other
than Ubuntu. This is done by all opening the software center from the menu on the
left of the screen and going to Edit Software sources Other software
Check the box "Canonical Partners" as in Figure 0.7, then click "Close".
Figure 0.7: Using Ubuntu Software Center to install required libraries/packages.
Then type the name (code) of what you want to install in the search box at the upper
right of the window. For example, type "guvcview" and it will appear in the results;
just click "Install".
Some libraries/packages cannot be installed from the "Ubuntu Software Center",
which leads us back to the "Terminal". For example, to install the SDL library, type:
sudo apt-get install libsdl1.2-dev
0.3 Check webcam supported formats and UVC compliance
0.3.1 UVC compliance check
1. First, find out the vendor ID (VID) and product ID (PID) of the webcam.
Run lsusb, which will list all your USB devices, including the VID and PID
in the format VID:PID.
2. Use the lsusb tool again to look for video class interfaces like this: (In this
example, the VID is 046d and the PID is 08cb.)
lsusb -d 046d:08cb -v | grep "14 Video"
If the webcam is a UVC device, you should see a number of lines that look like this:
bFunctionClass 14 Video
bInterfaceClass 14 Video
bInterfaceClass 14 Video
bInterfaceClass 14 Video
In this case the Linux UVC driver should recognize your camera when you plug it in.
If there are no such lines, your device is not a UVC device.
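The check above can be wrapped in a small helper that simply looks for "14 Video" class lines in an lsusb -v dump. The function name is ours; in real use it would be fed with lsusb -d VID:PID -v.

```shell
# Decide whether a USB descriptor dump describes a UVC device by looking
# for "14 Video" class lines, as produced by lsusb -v. Real usage:
#   lsusb -d 046d:08cb -v | uvc_check
uvc_check() {
    if grep -q '14 Video'; then
        echo "UVC device: the uvcvideo driver should handle it"
    else
        echo "not a UVC device"
    fi
}

# Demonstration on a canned fragment of lsusb output:
printf '      bInterfaceClass        14 Video\n' | uvc_check
```

The demonstration line prints the "UVC device" message; feeding it a descriptor dump with no video-class interfaces prints "not a UVC device" instead.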
0.3.2 Supported configurations and formats
This is done using guvcview. Type guvcview in the terminal then go to "Video &
files". You can see all the supported configurations and formats for every webcam.
Figure 0.8: Checking supported configurations and formats using guvcview.
0.3.3 Troubleshooting webcams
If the webcam is UVC-compatible, it should be supported out of the box in any recent
Linux distribution. Failures are usually caused by buggy applications or broken
hardware (cameras, USB cables and USB host controllers can be faulty).
You should start with trying several applications. qv4l2, guvcview and luvcview are
common test tools for UVC webcams, but feel free to try other V4L2 applications as
well. In particular be careful that different webcams might use different video
formats, and some of them can be unsupported in some applications.
If all applications display the same failure, chances are that your hardware is broken
(or at least buggy), or that you are lucky enough to have hit a bug in the UVC driver.
To diagnose the problem, please follow this procedure:
1. Make sure the webcam is UVC compliant as mentioned in a previous section.
2. Enable all uvcvideo module traces:
echo 0xffff | sudo tee /sys/module/uvcvideo/parameters/trace
3. Reproduce the problem. The driver will print many debugging messages to the
kernel log, so do not leave video capture running for too long. You can disable
the uvcvideo module traces when you are done:
echo 0 | sudo tee /sys/module/uvcvideo/parameters/trace
4. Capture the contents of the kernel log:
dmesg > dmesg.log
5. If your device is not listed in the supported devices list
(http://www.ideasonboard.org/uvc/#devices), dump its USB descriptors:
lsusb -d VID:PID -v > lsusb.log
(replace VID and PID with your device VID and PID)
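Steps 2 and 3 can be collected into one helper. Note that a plain "sudo echo 0xffff > file" does not work as intended, because the redirection is performed by the unprivileged shell; tee performs the privileged write instead. The function name is ours, and the sysfs path is a parameter so the helper can be tried on an ordinary file first.

```shell
# Write a trace mask to the uvcvideo trace file. sudo is required for the
# real sysfs node; tee does the privileged write, unlike
# "sudo echo ... > file", where the shell (not sudo) opens the file.
set_uvc_trace() {
    local mask="$1"
    local node="${2:-/sys/module/uvcvideo/parameters/trace}"
    echo "$mask" | sudo tee "$node" > /dev/null
}

# Typical debugging session:
#   set_uvc_trace 0xffff   # enable all traces
#   ...reproduce the problem...
#   set_uvc_trace 0        # disable traces again
#   dmesg > dmesg.log
```

Passing a second argument (any writable file) lets you confirm the helper's behaviour without touching the real sysfs node.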