Touchscreen Add-On

Official Website: YosrON.com

July, 2012

Cairo University Faculty of Engineering

Electrical Electronics and Communications Department


YosrON: Touchscreen add-on

By

Donia Alaa Eldin Hassan Idriss: [email protected]

Muhammad Al-Sherbeeny Hassan: [email protected]

Under the Supervision of

Dr. Ibrahim Qamar

[email protected]

A Graduation Project Report Submitted to

the Faculty of Engineering at Cairo University

In Partial Fulfillment of the Requirements for the Degree of

Bachelor of Science

in

Electronics and Communications Engineering

Faculty of Engineering, Cairo University

Giza, Egypt

July 2012


Table of Contents

List of Figures ................................................................................................................ v

Acknowledgments......................................................................................................... vi

Abstract ........................................................................................................................ vii

Chapter 1: Introduction .............................................................................................. 1

1.1 Why is it important? ........................................................................................ 1

1.2 Other related projects ...................................................................................... 3

1.3 YosrON is built on the 2nd version of EverScreen .......................................... 6

1.3.1 The hardware ........................................................................................... 6

1.3.2 The software............................................................................................. 6

1.3.3 The advantages of YosrON ...................................................................... 7

1.3.4 The challenges we expected ..................................................................... 7

1.3.5 The skills we needed ................................................................................ 8

1.3.6 The plan ................................................................................................... 8

Chapter 2: YosrON structure ..................................................................................... 9

2.1 System Description ......................................................................................... 9

2.2 Scanlines.......................................................................................................... 9

2.3 Noise reduction ............................................................................................. 10

2.4 Fast pointer detection .................................................................................... 11

2.5 Positioning the cameras ................................................................................. 14

2.6 Calibration phase ........................................................................................... 16

2.7 Tracking algorithm ........................................................................................ 16

2.8 Resolution accuracy ...................................................................................... 18

2.9 Algorithm complexity ................................................................................... 21

2.10 Settings and system performance .............................................................. 21

Chapter 3: Notes on the code ................................................................................... 22

3.1 Overall flow of the code ................................................................................ 22


3.2 Notes on main.cpp ......................................................................................... 24

3.2.1 Defining the two webcams we need ...................................................... 24

3.2.2 Smoothing neighborhood in averaging .................................................. 24

3.2.3 The color model to be used .................................................................... 24

3.2.4 Debugging .............................................................................................. 25

3.2.5 Luminance.............................................................................................. 25

3.2.6 Control the code while running.............................................................. 25

3.3 Notes on constants.h ...................................................................................... 26

3.3.1 Threshold of colors difference in each pixel .......................................... 26

3.3.2 Consecutive pixels threshold ................................................................. 26

3.3.3 Calibration touch offset.......................................................................... 26

3.3.4 Consecutive detections to locate a corner .............................................. 26

3.3.5 Limit of attempts to locate a corner ....................................................... 27

3.3.6 Calibration scanlines distances .............................................................. 27

3.3.7 Picture format, resolution, fps and grab method .................................... 27

3.4 Compiling the code after any edits ................................................................ 28

Chapter 4: Challenges .............................................................................................. 29

4.1 The environment ........................................................................................... 29

4.1.1 OpenCV on Windows ............................................................................ 29

4.1.2 C/C++ programming on Linux/Ubuntu ................................................. 29

4.1.3 Libraries that must be installed for the code .......................................... 30

4.2 The cameras................................................................................................... 33

4.3 The fisheye lenses ......................................................................................... 34

Chapter 5: Conclusions and Future Work ............................................................... 38

References .................................................................................................................... 39

Chapter 6: Appendix ................................................................................................ 43

6.1 Installing Ubuntu 11.10 ................................................................................. 43


6.2 Installing the required libraries and packages ............................................... 50

6.2.1 Installing "build-essential" using the "Terminal" .................................. 50

6.2.2 Installing libraries/packages using "Ubuntu Software Center".............. 53

6.3 Check webcam supported formats and UVC compliance ............................. 54

6.3.1 UVC compliance check ......................................................................... 54

6.3.2 Supported configurations and formats ................................................... 55

6.3.3 Troubleshooting webcams ..................................................................... 56


List of Figures

Figure 1.1: Survey results on UniMasr.com website. ..................................................... 2

Figure 1.2: Survey results on the YosrON page on Facebook. ...................................... 3

Figure 1.3: Touchscreen add-on by TouchMagic. .......................................................... 3

Figure 2.1: Visual representation of scanlines. ............................................................ 10

Figure 2.2: The buffer used for the analysis of the green row shows a clear peak. ..... 12

Figure 2.3: The system correctly detects only the pointer coming from above. .......... 13

Figure 2.4: The vertical contiguity constraint of a hand holding a pen. ...................... 14

Figure 2.5: Example of a simple but inefficient configuration. ................................... 14

Figure 2.6: Suggested configuration to optimize the use of the cameras. ................... 15

Figure 2.7: Resolution accuracy of W1 in t. ................................................................. 19

Figure 2.8: A4Tech webcam, PK 720G model. ........................................................... 21

Figure 4.1: Full-frame fisheye image........................................................................... 35

Figure 4.2: Remapped full-frame fisheye image into rectilinear perspective. ............. 35

Figure 4.3: Circular fisheye image............................................................................... 36

Figure 4.4: The image of circular fisheye after remapping (Defisheye). ..................... 36

Figure 4.5: Fisheye for home doors. ............................................................................ 37

Figure 6.1: Windows Disk Management Tool. ............................................................ 44

Figure 6.2: Shrink dialog box. ..................................................................................... 45

Figure 6.3: Windows partitions after successfully freeing space. ............................... 46

Figure 6.4: Don't allow any updates. ........................................................................... 47

Figure 6.5: Install Ubuntu alongside your current OS. ................................................ 48

Figure 6.6: Disabling automatic updates of Ubuntu. ................................................... 49

Figure 6.7: Using Ubuntu Software Center to install required libraries/packages. ..... 54

Figure 6.8: Checking supported configurations and formats using guvcview............. 55


Acknowledgments

We would like to thank those who helped us make this dream come true. No matter how big or little the help they offered, we would like to mention them all as much as we can, in the order in which they helped us.

Thanks to Dr. Ibrahim Qamar for accepting us and our idea, for his valuable time and effort, and for his understanding and kindness in discussing many problems with us and leading us to solutions.

Thanks to Eng. Abdel-Mohsen for advising us on which programming language to use (Matlab is easy but slow; C++ is good with toolboxes and very fast for image processing).

Thanks to Eng. Khaled Yeiha and Eng. Ahmad Ismail for giving us useful

guidelines for the algorithm.

Thanks to Eng. E. Rustico (from Italy) for supporting us with documents, code and instructions that helped us very much, as we built our project on his work, EverScreen.

Thanks to Dr. Essam, the glasses maker, for helping us with the fisheye lenses.

Thanks to Eng. Shaimaa Mahmoud and Eng. Dina Zeid for helping us with the OpenCV toolbox and with some translations (from Italian to English).

Thanks to Muhammad Sherif and Sherif Medhat for helping us with programming

on Ubuntu.

Thanks to Eng. Muhammad Hosny for helping us debug some code and solve many problems we faced with the OS and the software.

Thanks to Mr. Muhammad Reda for helping us find compatible webcams.

Thanks to Eng. Sherbeeny Hasan, Muhammad's father, for helping us with the

webcams and the fisheye lenses.

Thanks to our families for supporting us in every way all the time.


Abstract

The entire world is moving toward designing operating systems and programs to work with touch technology. But most Egyptians, and many others around the world, can't afford the cost of a touchscreen for their computers. That's why we came up with YosrON.

YosrON is meant to be a touchscreen add-on that can be put on any computer screen,

PC or laptop, to add the "touch" feature to the computer screen using a USB

connection and software.

It has been built on a complete and inexpensive system to track the movements of a

physical pointer on a flat surface. Any opaque object can be used as a pointer (fingers,

pens, etc.) and it is possible to discriminate whether the surface is being touched or

just pointed at. The system relies on two entry-level webcams and it uses a fast

scanline-based algorithm. A calibration wizard helps the user during the initial setup

of the two webcams. No markers, gloves or other hand-held devices are required.

Since the system is independent of the nature of the pointing surface, it is possible

to use a screen or a projected wall as a virtual touchscreen. The complexity of the

algorithms used by the system grows less than linearly with resolution, making the

software layer very lightweight and suitable also for low-powered devices like

embedded controllers.

We were planning to make a resizable plastic frame as housing for the webcams and the added wide-angle (fisheye) lenses, but we ran out of time and faced many problems that made us postpone it for future work, along with adding the multi-touch feature.

For now, YosrON is just two webcams fixed at a distance from the touching surface, with software used for calibration and moving the mouse.


Chapter 1: Introduction

1.1 Why is it important?

The advances in technology and the widespread usage of computers in almost every

field of human activity are necessitating new interaction methods between humans

and machines. The traditional keyboard and mouse combination has proved its

usefulness but also, and in a more extensive way, its weaknesses and limitations. In

order to interact in an efficient and expressive way with the computer, humans need to

be able to communicate with machines in a manner more similar to human-human

communication.

In fact, throughout their evolution, human beings have used their hands, alone or with

the support of other means and senses, to communicate with others, to receive

feedback from the environment, and to manipulate things. It therefore seems

important that technology makes it possible to interact with machines using some of

these traditional skills.

The human-computer interaction (HCI) community has invented various tools to

exploit humans’ gestures, the first attempts resulting in mechanical devices.

Devices such as data gloves can prove especially interesting and useful in certain

specific applications but have the disadvantage of often being onerous, complex to

use, and somewhat obtrusive.

The use of computer vision can consequently be a possible alternative. Recent

advances in computer vision techniques and availability of fast computing have made

the real-time requirements for HCI feasible. Consequently, extensive research has

been done in the field of computer vision to identify hand poses and static gestures,

and also, more recently, to interpret the dynamic meaning of gestures. Computer

vision systems are less intrusive and impose lower constraints on the user since they

use video cameras to capture movements and rely on software applications to perform

the analysis.


Among the existing graphical input devices, computer users especially love touchscreens. The reason is that they reflect, as no other device does, the way we are used to getting in touch and interacting with the reality around us: we point at and touch directly with our hands what we see around us, and touchscreens allow us to do the same with our fingers on computer interfaces. This preference is confirmed by a strong trend in the industry of high-end platforms (e.g. Microsoft Surface and Touchwall) and in the market of mobile devices: Apple, Samsung and Nokia, to cite only a few examples, finally chose touch-sensitive displays for their leading products, while interest in this technology is also growing in design studios, industrial environments and public information points like museums and ATMs.

Unfortunately, touchscreen flexibility is low: finger tracking is impossible without physical contact; it is not possible to use sharp objects on them; and large touch-sensitive displays are expensive because of their manufacturing cost and damage-proneness.

YosrON is made of low-cost devices, using only equipment that can be found in any computer shop for less than 300 EGP, a reasonable price for the Egyptian market and other similar markets. It is important to offer such an add-on at a low price because the upcoming Microsoft Windows 8 OS, which is the most common OS in Egypt, is designed mainly for touchscreens. Of course it can be used without a touchscreen, but that would be a great loss for the user experience.

We made a simple survey asking many computer users and resellers whether they would buy such an add-on and how much they would pay for it. The results are in fig. 1.1 and fig. 1.2.

Figure 1.1: Survey results on UniMasr.com website.


Figure 1.2: Survey results on the YosrON page on Facebook.

1.2 Other related projects

The only commercial product we found is TouchMagic (fig. 1.3), which is available in the USA and, in the Middle East, can be found only in the UAE, KSA and the occupied lands of Palestine (Israel). This product is available in fixed sizes with a minimum cost of $170 (about 1000 EGP) for 15" screens. So, if you change your computer or screen for any reason, you will probably need to buy a new add-on that fits the new screen size. That is why it is not wanted in the market: it is expensive and not resizable.

Figure 1.3: Touchscreen add-on by TouchMagic.

But when we look at research and projects in computer interfaces, we find all of them turning back to the human body, trying to adapt the way we communicate with computers to our natural way of moving and behaving. Speech-driven interfaces, gesture-recognition software and facial expression interpreters are just some examples of this recent trend.

There is a growing interest in the ones that involve real-time body tracking, especially

if no expensive hardware is required and the user does not need to wear any special

equipment. The simplest and cheapest choice is to use optical devices to track a

specific part of the body (head, eyes, hands or even the nose; see [GMR02] in the references); we focus on finger tracking systems that do not require lasers, markers, gloves or hand-held devices [SP98, DUS01, Lee07].

The main application of finger tracking is to move a digital pointer over a screen, enabling the user to replace the pointing device (e.g. the mouse) with his hands. While for eye or head tracking we have to direct the camera(s) towards the user's body, finger tracking leaves us a wider range of choices.

The first possibility is to direct the camera towards the user’s body, as for head

tracking, and to translate the absolute or relative position of the user’s finger to screen

coordinates. In [WSL00] an empty background is needed; in [IVV01] the whole arm

position is reconstructed, and in [Jen99] a combination of depth and color analysis

helps to robustly locate the finger. Some works tried to estimate the position of the fingertip relative to the view frustum of the user; this was done in [CT06] with one camera and in [pHYssCIb98] with stereovision, but both had strong limits in the accuracy of the estimation.

The second possibility is to direct the camera towards the pointing surface, which may

be static or dynamic. Some works require a simple black pad as pointing surface,

making it easy to locate the user’s finger with only one camera [LB04]; however, we

may need additional hardware [Mos06] or stereovision [ML04] to distinguish if the

user is just hovering the finger on it or if there is a physical contact between the finger

and the surface. A physical desktop is an interesting surface to track a pointer on.

Some works are based on the DigitalDesk setup [Wel93], where an overhead

projector and one or more cameras are directed downwards on a desk and virtual

objects can interact with physical documents [Ber03, Wil05]; others use a similar

approach to integrate physical and virtual drawings on vertical or horizontal

whiteboards [Wil05, vHB01, ST05], and one integrates visual information with an

acoustic triangulation to achieve better accuracy [GOSC00]. These works use differencing algorithms to segment the user's hands from the background, and then shape analysis or finger template matching to locate the fingertips; they rely on the assumption that the background surface is white, or in general of a color different from skin. Other approaches work also on highly dynamic surfaces. It is possible to

robustly suppress the background by analyzing the screen color space [Zha03] or by

applying polarizing filters to the cameras [AA07]; in the first the mouse click has to


be simulated with a keystroke, while in the latter a sophisticated mathematical finger

model allows detecting the physical contact with stereovision. Unfortunately, these two

techniques cannot be applied to a projected wall. Directing the camera towards the

pointing surface implies, in general, the use of computationally expensive algorithms,

especially when we have to deal with dynamic surfaces.

A third possible approach, which may drastically reduce the above problems, is to have the cameras watching sidewise, i.e. lying on the same plane as the surface; from this point of view we do not have any problem with dynamic backgrounds either behind the user or on the pointing surface, and this enables us to set up the system also in environments that are otherwise problematic (e.g. large displays, outdoors, and so on).

Among the very few works using this approach, in [QMZ95] the webcam is on the top

of the monitor looking towards the keyboard, and the finger is located with a color

segmentation algorithm. The movement of the hand along the axis perpendicular to

the screen is mapped to the vertical movement of the cursor, and a keyboard button

press simulates the mouse click. However, the position of the webcam has to be

calibrated and the vertical movement is mapped in an unnatural way. Also in [WC05]

we find a camera on the top of a laptop display directed towards the keyboard, but the

mouse pointer is moved according to the motion vectors detected in the grayscale video flow; a capacitive touch sensor enables and disables the tracking, while the mouse button has to be pressed with the other hand. In [Mor05], finally, the "lateral" approach is used to embed four smart cameras into a plastic frame that can be overlapped on a traditional display.

The above approaches need to process the entire image as it is captured by the webcam. Thus, all of the above algorithms are at least quadratic with respect to resolution (or linear with respect to image area). Although it is possible to use smart region finding algorithms, these would not resolve the problem entirely. In [FR08] the authors proposed the 1st version of EverScreen, a different way to track user movements while keeping the complexity low: they drastically decreased the scanning area to a discrete number of pixel lines from two uncalibrated cameras. Their system requires a simple calibration phase that is easy to perform even for non-experienced users. The proposed technique only concerns the tracking of a pointer; it is not about gesture recognition. The output of the system, at present, is directly translated into mouse movements, but it could instead be interpreted by gesture recognition software.


1.3 YosrON is built on the 2nd version of EverScreen

The 1st version of EverScreen focused its attention mostly on the mapping algorithm and provided only a description of an early stage of the system. The 2nd version introduces a more efficient and mature system, exploiting improved pointer detection while remaining as computationally and economically cheap as the previous one. Among the improvements:

- Two proximity constraints in the pointer detection help to reduce the number of false positives.
- A convolution-based algorithm is used to locate the presence of a pointer.
- The gap from the reference backgrounds is kept under control to detect camera movements.
- The calibration phase is faster, and the system graphically shows the points to touch.
- Iterative algorithms are used to solve the linear systems instead of direct formulas.

1.3.1 The hardware

YosrON was planned to consist of four 90-degree view-angle cameras fixed in the corners of a resizable frame, with arrays of IR or LEDs, all connected to a USB hub so that the add-on plugs into the computer through one single port. We also planned to implement the software on a microprocessor to eliminate any processing load on the host computer. But we had to reduce the hardware because of some challenges that will be mentioned later.

1.3.2 The software

The software performs the image processing and geometrical calculations on the cameras' outputs to determine the position of the finger (pointing tool). It was planned to be C++ code using the OpenCV toolbox in Visual Studio on Windows. We faced some problems with the configuration of the environment and some limitations of the toolbox, so we migrated to Ubuntu 11.10, 64-bit, with lots of libraries to be mentioned later.


1.3.3 The advantages of YosrON

- Resizable: with no glass used, the same item can be used with any screen of any size.
- Low cost: the expected cost for end users is around 200 EGP. (The prototype cost less than 300 EGP, so a single item in mass production would cost less!)
- Fast: with a configuration of 30 fps, the response of the software is immediate (in the range of microseconds).
- Accurate: with a configuration of 320x240 resolution, the accuracy is acceptable for touchscreen systems (OSs and programs are designed with big buttons).
- Easy fabrication: manufacturers can easily fabricate it in mass production without the need for any new or complicated technology.

1.3.4 The challenges we expected

- Cameras: finding USB cameras of low cost and fast response with a wide view angle (at least 90˚).
- Resizable frame: fabricating a plastic resizable frame and mounting the cameras on it.
- Processing: building the software that can interact with the cameras and process the images to determine the pointer/finger position.
- Load: reducing the processing load on the host computer using a microprocessor.

Page 16: YosrON Report v1.4

8

1.3.5 The skills we needed

- Image processing using the OpenCV toolbox with C++ in Visual Studio (before we migrated to Ubuntu).
- Installing and configuring Linux/Ubuntu OS.
- C/C++ programming on Ubuntu.
- Debugging and troubleshooting.

And for production, we will need to make drivers for different OSs and to implement the software on a microprocessor.

1.3.6 The plan

- Purchasing and installing webcams and wide-angle lenses.
- Building the initial image processing code on live stream images from a single webcam, for finger detection only.
- Building the code for calibration and solving the streams from both webcams.
- Building the mouse controlling code.
- Fabricating the resizable frame housing.
- Refining the software after housing.
- Building the driver and calibration software.


Chapter 2: YosrON structure

2.1 System Description

The system now consists of two off-the-shelf webcams positioned sidewise so that the

lateral silhouette of the hand is captured into an image like fig. 2.1. After a quick

auto-calibration, the software layer will be able to interpret the image flow and

translate it into absolute screen coordinates and mouse button clicks; the

corresponding mouse events will be simulated on the OS in a completely transparent

way for the application level. We call pointing surface the rectangle of surface to be

tracked; as pointing surface we can choose a desk, an LCD panel, a projected wall, etc.

An automatic region stretching is done to map the coordinates of the pointing surface

to the target display. Any opaque object can be used to point or touch the surface: the

system will track a finger as well as a pencil, a chalk or a wooden stick.

2.2 Scanlines

We focus the processing only on a small number of pixel lines from the whole image

provided by each webcam; we call these lines scanlines. Each scanline is horizontal

and ideally parallel with the pointing surface; we call touching scanline the lowest

scanline (the nearest to the pointing surface), and pointing scanline every other one.

The calibration phase requires grabbing a frame before any pointer enters the tracking area; these reference frames (one per webcam) will be stored as reference backgrounds, and will be used to look for runs of consecutive pixels different from the reference background. We will see later how we detect such scanline interruptions (fig. 2.1). The detection of a finger only in the pointing scanlines means that the surface is only being pointed at, while a detection in all the scanlines means that the user is currently touching the surface. To determine whether a mouse button press has to be simulated, we can just look at the touching scanline: we assume that the user is clicking if the touching scanline is occluded in at least one of the two views.
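As a minimal sketch of this decision (illustrative C++; the structure and names are ours, not the project's):

// Per-frame decision from the scanline occlusions of the two views.
struct ViewState {
    bool pointing_occluded; // interruption found in the pointing scanlines
    bool touching_occluded; // interruption found in the touching scanline
};

// The user is clicking if the touching scanline is occluded in at least
// one of the two views.
bool is_clicking(const ViewState &left, const ViewState &right) {
    return left.touching_occluded || right.touching_occluded;
}

// The surface is merely being pointed at if the pointer is visible in the
// pointing scanlines of both views but no touch is detected.
bool is_pointing(const ViewState &left, const ViewState &right) {
    return left.pointing_occluded && right.pointing_occluded
        && !is_clicking(left, right);
}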


Figure 2.1: Visual representation of scanlines.

During the calibration phase the number of scanlines of interest may vary from a

couple to tens; during the tracking, three or four scanlines will suffice for an excellent

accuracy. A detailed description of the calibration will be given later.

2.3 Noise reduction

We detect the presence of a physical pointer in the view frustum of a webcam by

comparing the current frame with the reference background. This is simple in absence

of noise; unfortunately, the video flow captured from a CMOS sensor (the most

common type of sensor in low cost video devices) is definitely not ideal and presents

a bias of white noise, salt and pepper noise and motion jpeg artifacts. This makes

pointer detection more difficult, especially when the pointer is not very close to the

camera and its silhouette is therefore only a few pixels wide. To keep the overall

complexity low we avoided applying any post-elaboration filter on each of the

grabbed frames and we adopted two simple strategies in order to reduce the impact of

noise on our algorithm.


The first strategy is to store, as a reference background, not just the first frame

but the average of the first b frames captured (in current implementation, b =

4). The average root mean square deviation of a frame from the reference

background, after this simple operation, decreases from ~1.52 to ~1.26 (about

−17%).

The second strategy is to apply a simple convolution to the scanlines we focus on. The kernel we use is [1 1 1] with divisor 3. This is equivalent to saying that we replace each pixel with the average of a 1-pixel neighborhood on the same row; it is not worth increasing the neighborhood of interest, because by increasing it we decrease the tracking accuracy.

Finally, we keep track of the Root Mean Square Error (RMSE) with respect to the

reference frames; if the RMSE gets higher than a threshold, this is probably due to a

disturbing entity in the video or to a movement of the camera rather than to systematic

noise. In this case, the system automatically stops tracking and informs the user that a

new reference background is about to be grabbed.
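A minimal sketch of these mechanisms in C++ (illustrative only, not the project source; the 8-bit grayscale frame representation is an assumption of ours):

#include <cmath>
#include <cstdint>
#include <vector>

// Average the first b grabbed frames into a floating-point reference
// background (b = frames.size(); the report uses b = 4).
std::vector<float> build_reference(const std::vector<std::vector<uint8_t>> &frames) {
    const size_t n = frames[0].size();
    std::vector<float> ref(n, 0.0f);
    for (const auto &f : frames)
        for (size_t i = 0; i < n; ++i) ref[i] += f[i];
    for (size_t i = 0; i < n; ++i) ref[i] /= frames.size();
    return ref;
}

// Convolve a scanline with [1 1 1] and divisor 3: each pixel becomes the
// average of itself and its two horizontal neighbors.
void smooth_scanline(std::vector<uint8_t> &row) {
    std::vector<uint8_t> out(row);
    for (size_t x = 1; x + 1 < row.size(); ++x)
        out[x] = (row[x - 1] + row[x] + row[x + 1]) / 3;
    row.swap(out);
}

// RMSE of the current frame against the reference; if it exceeds a
// threshold, a new reference background should be grabbed.
float rmse(const std::vector<uint8_t> &frame, const std::vector<float> &ref) {
    double acc = 0.0;
    for (size_t i = 0; i < frame.size(); ++i) {
        double d = frame[i] - ref[i];
        acc += d * d;
    }
    return std::sqrt(acc / frame.size());
}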

2.4 Fast pointer detection

Although some noise has been reduced, we cannot rely only on a binary differencing

algorithm. A set of pixels different from the reference frame is meaningful if they are

close to each other; we apply this spatial contiguity principle both horizontally and

vertically. This approach imitates the so-called Helmholtz principle for human

perception.

The Helmholtz principle states that an observed geometric structure is perceptually

meaningful if its number of occurrences would be very small in a random situation.

(see [MmM01])


The first goal is to find a run of consecutive pixels significantly different from the reference; what we care about is the X coordinate of the center of such an interruption.

We initialize to zero a buffer of the same size as one row, and then we start scanning the selected line (say l). For each pixel p = (px, pl), we compute the absolute difference dp from the corresponding reference value; then, for each pixel q = (qx, ql) in a neighborhood of length n, we add to the buffer at qx this dp multiplied by a factor m inversely proportional to |px − qx|.

Finally we read in the buffer a peak value corresponding to the X coordinate of the center of the interruption (fig. 2.2); if no interruption occurred in the row (i.e. pixels different from the reference were not close to each other), we will have only "low" peaks in the buffer.

To distinguish between a "high" and a "low" peak we can use a fixed or a relative threshold; in our tests, a safe threshold was about 20 times greater than the neighborhood length.
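A minimal sketch of this search in C++ (illustrative, not the project source; the exact weighting factor is our own choice, while the threshold follows the value quoted above):

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Scan one line: spread each pixel's absolute difference from the
// reference into a buffer, weighting nearby positions more, then look
// for a peak above the threshold. Returns the x coordinate of the
// interruption center, or -1 if none is found.
int find_interruption(const std::vector<uint8_t> &line,
                      const std::vector<uint8_t> &reference,
                      int n) { // n = neighborhood length
    std::vector<double> buffer(line.size(), 0.0);
    for (int px = 0; px < (int)line.size(); ++px) {
        int dp = std::abs((int)line[px] - (int)reference[px]);
        int lo = std::max(0, px - n);
        int hi = std::min((int)line.size() - 1, px + n);
        for (int qx = lo; qx <= hi; ++qx) {
            // factor m inversely proportional to |px - qx|
            double m = 1.0 / (std::abs(px - qx) + 1);
            buffer[qx] += dp * m;
        }
    }
    const double threshold = 20.0 * n; // "about 20 times the neighborhood length"
    int best = -1;
    double best_val = threshold;
    for (int x = 0; x < (int)buffer.size(); ++x)
        if (buffer[x] > best_val) { best_val = buffer[x]; best = x; }
    return best;
}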

Figure 2.2: The buffer used for the analysis of the green row shows a clear peak.


Now we have a horizontal proximity check, but not a vertical one yet. Each webcam always sees the pointer breaking into the view frustum from the upper side. The pointer silhouette may be straight (like a stick) or curved (e.g. a finger); in both cases, the interruptions found on scanlines close to each other should not differ by more than a given threshold.

This vertical proximity constraint puts a linear upper bound on the curvature of the pointer, and helps discard interruptions caused by noise or other objects entering the view frustum; in other words, the system detects only pointers coming from above, and keeps working correctly if other objects appear in the view frustum from a different direction (e.g. the black pen in fig. 2.3).

Figure 2.3: The system correctly detects only the pointer coming from above.

These two simple proximity checks make the recognition of the pointer an easier task.

Fig. 2.4 shows the correct detection of the pointer (a hand holding a pen) over a

challenging background. The lower end of the vertical sequence of interruptions is

marked with a little red cross.


Figure 2.4: The vertical contiguity constraint of a hand holding a pen.

2.5 Positioning the cameras

The proposed technique requires positioning the two webcams relative to the pointing surface. The simplest choice is to put them so that one detects only movements along the X axis, while the other one detects Y axis changes. This solution is the simplest to implement, but requires the webcams to have their optical axes perfectly aligned along the sides of the pointing surface. Moreover, the wider the view field of a webcam, the more accuracy we lose on the opposite side of the surface. On the other hand, the narrower the view field of the webcams, the farther we have to put them to capture the entire surface.

Figure 2.5: Example of a simple but inefficient configuration.


In fig. 2.5, for example, the webcam along the Y axis of the surface has a wide view field, but this brings resolution loss on segment DC; on the other side, the webcam along the X axis of the surface has a narrow view field, but it has to be positioned far from the

pointing surface to cover the whole area. If the surface is a 2×1.5m projected wall

and the webcam has a 45° view field, we have to put the camera ~5.2 meters away to

catch the whole horizontal size. A really usable system should not bother the final

user about webcam calibration, view angles and so on.

A way to minimize the calibration effort is to position the webcams near two non-opposite corners of the pointing surface, far enough to catch it whole and oriented so that the surface diagonals are approximately the bisectors of the respective view fields (fig. 2.6). With this configuration there is no need to put the webcams far away from the surface; this reduces the accuracy loss on the "far" sides.

Figure 2.6: Suggested configuration to optimize the use of the view frustums of the cameras.

In the rest of this project we will assume, for the sake of clarity, that the webcams are

in the same locations and orientations as in fig. 2.6. However, the proposed tracking

algorithm works with a variety of configurations without changes in the calibration

phase: the cameras may be positioned anywhere around the surface, and we only need

that they do not face each other.


2.6 Calibration phase

When the system is loaded, the calibration phase starts.

In this phase, after grabbing the reference backgrounds, we ask the user to touch the vertices of the pointing surface and its center. When a pointer is detected in both views, we track the position of its lower end (the red cross in figs. 2.4 and 2.3); if this position holds with a low variance for a couple of seconds, the corresponding X coordinate is stored. After we have grabbed the positions of all five points, we compute the Y coordinate of a "special" scanline as the lowest row not intercepting the pointing surface: during the tracking we will focus only on this row to grab the position of the pointer, so that the overall complexity will be linear with the horizontal resolution.

2.7 Tracking algorithm

During the calibration phase we stored the X coordinate of each vertex as seen by the

webcams. The basic idea is to calculate the perspective transformation that translates

the absolute screen coordinates to absolute coordinates in the viewed image. We store

vertices in homogeneous coordinates and use a 3x3 transformation matrix M:
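\[
P \;=\; \alpha \begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
\;=\; M \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
M = \begin{pmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{pmatrix}
\]

(The equation itself is an image in the original report; the form above is the standard homography relation implied by the surrounding description, with (x, y) the absolute screen coordinates, (x_p, y_p) the coordinates in the viewed image, and P the projected point in homogeneous coordinates.)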

Since P is determined up to a proportional factor α there is no loss of generality in

setting one of the elements of M to an arbitrary non-zero value. In the following we

set the element l33 = 1. To obtain all the other elements of M, in principle the

correspondence between four pairs of points must be given. The proposed application

only needs to look at horizontal scanlines; for this reason there is no need to know the

coefficients l21,l22,l23 of M and we only have to determine the values of l11,l12,l13,l31,l32.

The number of unknown matrix elements has been decreased to five, so we only need

the x coordinate of five points (instead of the x and y of four points).


During the calibration phase, we ask the user to touch the four vertices of the pointing

surface and its center.

This setup greatly simplifies the computation of the unknown coefficients. Indeed, when the display resolution is W × H, points A, B, C, D (see fig. 2.6) are the four corners of the display and the center E is (W/2, H/2), so the screen coordinates of all five points are known.

If Q is a point on the surface, let Qxp be the x coordinate of the corresponding

projected point. The final linear system to solve is:
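\[
l_{11}\,x_Q + l_{12}\,y_Q + l_{13} \;-\; Q_{xp}\left(l_{31}\,x_Q + l_{32}\,y_Q\right) \;=\; Q_{xp},
\qquad Q \in \{A, B, C, D, E\}
\]

(Reconstructed: with l33 = 1, the projection above gives Qxp = (l11 xQ + l12 yQ + l13)/(l31 xQ + l32 yQ + 1), so each of the five touched points contributes one such linear equation in the five unknowns.)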

which makes it easy to obtain l11, l12, l13, l31, l32 for each camera.

During the tracking phase we face a somewhat inverse problem: we know the

projected x coordinate in each view, and from these values (let them be Xl and Xr) we

would like to compute the x and y coordinates of the correspondent unprojected point

(that is, the point the user is touching). Let lij be the transformation values for the first

camera, and rij for the second one; the linear system we have to solve in this case is
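\[
\begin{cases}
l_{11}\,x + l_{12}\,y + l_{13} = X_l\, z_l \\
l_{31}\,x + l_{32}\,y + 1 = z_l \\
r_{11}\,x + r_{12}\,y + r_{13} = X_r\, z_r \\
r_{31}\,x + r_{32}\,y + 1 = z_r
\end{cases}
\]

(Reconstructed from the description: zl and zr are the homogeneous scale factors of the two views.)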


It is convenient to eliminate the auxiliary unknowns: substituting zl and zr from the second and fourth equations into the first and third ones and rearranging, the final system in the unknowns x and y is
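\[
\begin{cases}
(l_{11} - X_l\, l_{31})\,x + (l_{12} - X_l\, l_{32})\,y = X_l - l_{13} \\
(r_{11} - X_r\, r_{31})\,x + (r_{12} - X_r\, r_{32})\,y = X_r - r_{13}
\end{cases}
\]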

This is a determined linear system, and it is possible to prove that in the setting above

there is always one and only one solution. By solving this system in x and y we find

the absolute coordinates of the point that the user is pointing/touching on the surface.

We can solve this system in a very fast way by computing an LU factorization of the coefficient matrix once, and using it to compute x and y for each pair of frames; we can also use numerical methods, such as Singular Value Decomposition, or direct formulas. In the previous version of the system direct formulas were used, while now an LU factorization is implemented.

2.8 Resolution accuracy

Let us now consider how accurate the tracking system is, depending on the physical characteristics of the display and the webcams.


Let t = (xt ,yt ) be a point on the pointing surface, XD×YD the display resolution (i.e.

the resolution of the projector for a projected wall) and XW1 ×YW1 the resolution of a

webcam W1; let βW1 be the bisector of the view frustum of W1, and let the upper left

corner of the surface be the origin of our coordinate system (with Y pointing

downwards, like in fig. 2.7). We assume for simplicity that the view frustum of the

camera is centered on the bisector of the coordinate system, but the following

considerations keep their validity also in slightly different configurations.

The higher the number of pixels detected by the webcam for each real pixel of the display, the more accurate the tracking will be. Thus, if we want to know how accurate the detection of a point on the pointing surface is, we can consider the ratio between the number of pixels detected by the webcam W1 and the length in pixels of the segment Xt passing through t and perpendicular to βW1. We call this ratio the resolution accuracy of W1 in t, written σ(W1, t). It is clear that we only care about the horizontal resolution of W1, which is constant in the whole view frustum of the camera (fig. 2.7).

Figure 2.7: We define the "resolution accuracy of W1 in t" as the ratio between the number of pixels detected by W1 and the length of Xt.


Because pixels are approximately square, the number of pixels along the diagonal of a square is equal to the number of pixels along an edge of the square; thus, the length of Xt will be equal to the distance from the origin of one of the two points that Xt intercepts on the X and Y axes. For every point p ∈ Xt we have xp + yp = k; the length of Xt is therefore equal to the y-intercept of the line passing through t and perpendicular to βW1. So we have |Xt| = xt + yt; hence, the resolution accuracy of W1 in t is
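\[
\sigma(W_1, t) = \frac{X_{W_1}}{x_t + y_t}
\]

(The formula is an image in the original; this form is reconstructed from the definitions above and agrees with the numeric values computed below.)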

One of the most interesting applications of the system is to projected walls, so that

they become virtual blackboards.

A very common projector resolution is nowadays 1024 × 768 pixels, while one of

the maximum resolutions that recent low-cost webcams support is 1280×1024 pixels

at 15 frames per second. In this configuration, the resolution accuracy in t = (1024,

768) is
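\[
\sigma(W_1, t) = \frac{1280}{1024 + 768} \approx 0.71
\]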

This is the lowest resolution accuracy we have with W1 in the worst orientation; if we

invert the X axis to get the accuracy for W2 (supposing that W2 is placed on the upper

right corner of the surface), σ (W2, t) ≈1.7.

In the central point u = (512, 384) of the display we have σ(W1, u) = σ(W2, u) ≈ 1.4;

it is immediate that, in the above configuration, the average resolution accuracy is

higher than 1:1 (sub-pixel).


2.9 Algorithm complexity

The number of scanlines is constant and in the tracking phase it is not useful to use

more than 3 or 4 of them. For each scanline we do a noise reduction (in linear time),

we apply a linear convolution filter (in linear time too) and then we do a linear search

for a peak. Finally, we solve the system (in constant time). The total complexity is

therefore linear with the horizontal resolution of the webcams.

2.10 Settings and system performance

The webcams we used for testing are two A4Tech PK 720G, with the following

specifications:

Image sensor: 1/6" CMOS, 640×480 pixels

Lens: F=2.4, f=3.5 mm

View angle: 54 degrees

Exposure control: Automatic

White balance: Automatic

Computer interface: USB 2.0

Focus range: Automatic focus, 10 cm to infinity

Frame rates: 30fps@160x120, @320x240, @640x480

Their 2012 price was about 110 EGP each. There is a mature Video4Linux2-compliant driver (uvcvideo) available for GNU/Linux.

Our prototype has good resolution accuracy and excellent time performance: less than 10 milliseconds are needed to process a new frame and compute the pointer coordinates. Two USB webcams connected to the same computer can usually send less than 20 frames per second simultaneously, while the software layer could process hundreds more.

The tracking system is written in C++ in a GNU/Linux environment; in the relatively small source code, all software layers are strictly separated, so that it is possible to port the whole system to different platforms with very few changes in the source.

Figure 2.8: A4Tech webcam, PK 720G model.


Chapter 3: Notes on the code

The code consists of separate files. Most of them are standard header files or contain many standard functions. Most of our coding effort went into the files constants.h, main.cpp and makefile.

3.1 Overall flow of the code

The overall flow, shown in the original as a flowchart, is as follows:

1. Start: detect the screen size, then initialize the webcams and the mouse handler.

2. Grab 4 frames per webcam, then average them to set a reference image for each webcam.

3. Ask the user to touch the 4 corners of the screen and its center.

4. For each corner, compare the live frames of each webcam with its reference image, and redefine the touchline after each corner. If the RMSE exceeds 8.0, new reference images are grabbed; if the detection attempts for any corner exceed 100, the program exits.

5. Calibration completed: send the values to GSL for the calculations.

6. Tracking: if no interruptions are detected, keep tracking. If there are interruptions in the pointing scanlines and the computed position is inside the tracking area, move the mouse; if there are also interruptions in the touchline below the pointing interruptions, click the mouse. Then continue tracking.


3.2 Notes on main.cpp

3.2.1 Defining the two webcams we need

The following lines are responsible for defining which webcams to use:

const char *videodevice1 = "/dev/video1";
const char *videodevice2 = "/dev/video2";

If the host computer doesn't have any other webcams (e.g. no built-in webcam), then these lines should look like this:

const char *videodevice1 = "/dev/video0";
const char *videodevice2 = "/dev/video1";

In general, we used an application called "Cheese" to test the webcams and to determine their IDs. After installing "Cheese" using the "Ubuntu Software Center", go to Edit → Preferences and you will see a list of all connected webcams and their IDs.
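Alternatively, listing the video device nodes from the terminal shows the available IDs (a standard Linux check, independent of our code):

ls /dev/video*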

3.2.2 Smoothing neighborhood in averaging

It can be defined in the file constants.h, but for now it is defined in main.cpp. It determines how many pixels before and after each pixel are blurred horizontally.

unsigned int SMOOTHING_NEIGHBORHOOD = 2;

It shouldn't be set too high, to keep the reference image realistic.

3.2.3 The color model to be used

Two color models are available in the code: YUV and RGB. The selection is made using the following line:

bool RGB_INSTEAD_OF_YUV = false;

false selects YUV; true selects RGB.


3.2.4 Debugging

There are two debugging modes. debug_one is for debugging the first webcam only: you will see a live stream from it with a single horizontal line across the image marking the scanline, and below the live stream a histogram showing interruptions as in fig. 2.2. The other mode, debug, is an overall debugging mode. Either is activated using the following lines:

debug = false;
debug_one = false;

If debug_one is activated (set to "true"), it will prevent the rest of the code from running.

3.2.5 Luminance

The value of the following variable should be set depending on the luminance of the surroundings.

norm_luminance = false;

3.2.6 Control the code while running

Some options can be altered while the code is running, as follows:

q: Quit.

s: Edit smoothing neighborhood.

l: Selecting the line to scan.

h: Which histogram mode to use ( l for live, p for peak, s for static, d for differential).

m: Which color model to use ( y for YUV, r for RGB).

u: To update the reference images.


3.3 Notes on constants.h

3.3.1 Threshold of colors difference in each pixel

In general, and for the YUYV model, the threshold can be controlled using the following lines:

const unsigned char COLOR_THRESHOLD = 20;
const unsigned char Y_THRESHOLD = 20;

For the RGB model, the threshold is applied separately to each channel R, G, B:

const unsigned char R_THRESHOLD = 35;
const unsigned char G_THRESHOLD = 38;
const unsigned char B_THRESHOLD = 35;

3.3.2 Consecutive pixels threshold

How many consecutive pixels must differ (or stop differing) from the reference to mark the start (or the end) of an interruption?

const unsigned int LENGTH_THRESHOLD = 16;
const unsigned int HOLE_THRESHOLD = 3;

3.3.3 Calibration touch offset

Difference between the lowest breakpoint detected in the image and the height of the

scanline to choose for the interruption.

const unsigned int CALIBRATION_TOUCH_OFFEST = 8; //would edit it to make it 2

3.3.4 Consecutive detections to locate a corner

How many consecutive breaks are necessary to claim to have located the corner?

const unsigned int ALT_CALIBRATION_CONSECUTIVE_INTERRUPTIONS = 6; // make it 15


3.3.5 Limit of attempts to locate a corner

Maximum number of attempts for each corner detection.

const unsigned int CALIBRATION_CORNER_ATTEMPTS = 100;

3.3.6 Calibration scanlines distances

Distance between scanlines. The height of the touching line is established in the

calibration, the others are calculated using this value.

const unsigned int CALIBRATION_SCALINES_DISTANCE = 20;

3.3.7 Picture format, resolution, fps and grab method

In the following lines, you should enter only a resolution, fps, format and grab method supported by the webcams. Check the appendix for details on how to obtain this information for any webcam.

const unsigned int width = 320;
const unsigned int height = 240;
const unsigned int fps = 30;
const int grabmethod = 1; // Use mmap (default)
// const int grabmethod = 0; // Ask for read instead of the default mmap
const int format = V4L2_PIX_FMT_YUYV; // Better quality, lower frame rate
// const int format = V4L2_PIX_FMT_MJPEG; // Lower quality, higher frame rate

Note that entering an unsupported option will lead to error 22, and entering a higher resolution without lowering the fps or using the MJPEG format will lead to error 28, which is due to the USB 2.0 bandwidth limitation. More details about errors 22 and 28 can be found in section 4.2.


3.4 Compiling the code after any edits

To compile the code on Ubuntu, press "Alt+Ctrl+T" to open the terminal. If the code

is in the folder "YosrON" on Desktop, then type:

cd Desktop/YosrON

Note that all the commands in the terminal are case-sensitive, even the folder names.

To remove older compilation files, type:

make clean

To build new compilation files, type:

make

To run the code, type (for example):

./yosron
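For reference, a minimal makefile for such a build could look like the sketch below; the library flags are assumed from the dependency list in section 4.1.3, and the actual makefile shipped with the code may differ. (Recipe lines must be indented with a tab.)

# Minimal sketch of a makefile (assumed flags; the project makefile may differ)
CXX = g++
CXXFLAGS = -O2 -Wall $(shell sdl-config --cflags)
LIBS = $(shell sdl-config --libs) -lgsl -lgslcblas -lXtst -lv4l2

yosron: main.cpp constants.h
	$(CXX) $(CXXFLAGS) -o yosron main.cpp $(LIBS)

clean:
	rm -f yosron *.o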


Chapter 4: Challenges

4.1 The environment

We spent a very long time searching for the best software environment, starting from the programming language and toolboxes/libraries to use and ending with the OS.

4.1.1 OpenCV on Windows

We started with the OpenCV toolbox and Visual Studio C++ on Windows 7, 64-bit. We faced many problems at first due to incompatibilities between the latest version of OpenCV and Windows 7. After lots of online searching, we were instructed to use an older version of OpenCV; we used version 2.2 and were able to interface with the webcams.

When we started to work on the code, we needed to process only a single horizontal line of pixels instead of the entire image, a function essential for our project since we wanted the software to be as fast and light as possible. After consulting engineers experienced with OpenCV, we were told that OpenCV can't do such a thing and must process the entire image. So we had to look for other alternatives, which led us to C/C++ programming on Linux/Ubuntu.

4.1.2 C/C++ programming on Linux/Ubuntu

We had to change our track from Windows to Linux, even though our time was very limited. We were encouraged to do so after we communicated with Eng. E. Rustico, the designer of EverScreen, who supported us with very useful documentation, code and instructions that helped us achieve our main target.

The OS used is Ubuntu 11.10, 64-bit, with kernel version 3.0.0-22 and gcc/g++ version 4.4.6 (gcc/g++ is the C/C++ compiler on Linux).

Installing Ubuntu is a little bit tricky as there are many options. We first tried to install it using Wubi (the Windows-based Ubuntu Installer) but had many problems. After many attempts to fix those problems, we assumed they would disappear if we redid the installation using another method, so we removed the installation and installed Ubuntu from a boot CD alongside Windows 7. Details about this process are available in the appendix.

4.1.3 Libraries that must be installed for the code

Build-essential:

An informational list of the packages needed for C/C++ programming on Linux; it generally includes gcc/g++ and other utilities and libraries.

Libc dev:

It provides headers from the Linux kernel. These headers are

used by the installed headers for GNU glibc and other system

libraries.

SDL dev (libsdl1.2-dev):

Simple DirectMedia Layer is a cross-platform multimedia

library designed to provide low level access to audio, keyboard,

mouse, joystick, 3D hardware via OpenGL, and 2D video

framebuffer. It is used by MPEG playback software, emulators,

and many popular games, including the award winning Linux

port of "Civilization: Call To Power."

SDL supports Linux, Windows, Windows CE, BeOS, MacOS,

Mac OS X, FreeBSD, NetBSD, OpenBSD, BSD/OS, Solaris,

IRIX, and QNX. The code contains support for AmigaOS,

Dreamcast, Atari, AIX, OSF/Tru64, RISC OS, SymbianOS,

and OS/2, but these are not officially supported.

SDL is written in C, but works with C++ natively, and has

bindings to several other languages, including Ada, C#, D,

Eiffel, Erlang, Euphoria, Go, Guile, Haskell, Java, Lisp, Lua,

ML, Objective C, Pascal, Perl, PHP, Pike, Pliant, Python,

Ruby, Smalltalk, and Tcl.


GSL dev (libgsl0-dev):

The GNU Scientific Library (GSL) is a numerical library for C

and C++ programmers. It is free software under the GNU

General Public License.

The library provides a wide range of mathematical routines

such as random number generators, special functions and least-

squares fitting. There are over 1000 functions in total with an

extensive test suite.

Xorg XTest (libxtst-dev):

The X window system (commonly X Window System or X11,

based on its current major version being 11) is a computer

software system and network protocol that provides a basis for

graphical user interfaces (GUIs) and rich input device

capability for networked computers. It creates a hardware

abstraction layer where software is written to use a generalized

set of commands, allowing for device independence and reuse

of programs on any computer that implements X.

V4L2 dev (libv4l-dev):

Video4Linux or V4L is a video capture application

programming interface for Linux. Many USB webcams, TV

tuners, and other devices are supported. Video4Linux is closely

integrated with the Linux kernel.

V4L2 is the second version of V4L. The original V4L was

introduced late into the 2.1.X development cycle of the Linux

kernel. Video4Linux2 fixed some design bugs and started

appearing in the 2.5.X kernels. Video4Linux2 drivers include a

compatibility mode for Video4Linux1 applications, though in practice this support can be incomplete; it is recommended to use V4L2 devices in V4L2 mode.

In short, it is an API that provides unified access to various video capture devices, such as TV tuners and USB web cameras.

UVC drivers:

The USB video device class (also USB video class or UVC) is

a USB device class that describes devices capable of streaming

video like webcams, digital camcorders, transcoders, analog

video converters, television tuners, and still-image cameras.

The latest revision of the USB video class specification carries

the version number 1.1 and was defined by the USB

Implementers Forum in a set of documents describing both the

basic protocol and the different payload formats.

Webcams were among the first devices to support the UVC

standard and they are currently the most popular UVC devices.

It can be expected that in the near future most webcams will be UVC compatible, as this is a logo requirement for Windows. Since Linux 2.6.26, the driver has been included in the kernel source distribution.

luvcview:

luvcview is a camera viewer for UVC based webcams. It

includes an mjpeg decoder and is able to save the video stream

as an AVI file.

guvcview:

It provides a simple GTK interface for capturing and viewing video from devices supported by the Linux UVC driver, although it should also work with any V4L2-compatible device. The project is based on luvcview for video rendering, but all controls are built with a GTK2 interface. It can also be used as a control window only.
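
For convenience, everything in the list above can be pulled in with a single terminal command once Ubuntu is installed (the package names are the ones given in this section; exact names may differ on other Ubuntu releases, and installation details are in the appendix):

sudo apt-get install build-essential linux-libc-dev libsdl1.2-dev libgsl0-dev libxtst-dev libv4l-dev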

4.2 The cameras

The cameras were very hard to find in the Egyptian market, because the highly technical details we needed about a camera before buying it are rarely available. The cameras must be UVC compliant and support different control options for resolution, frames per second, color profiles, etc. We also needed the cameras to be mechanically solid and stiff, capable of being fixed on any surface, and with lenses that can be pointed in any direction.

First, we bought two 2B webcams and they worked nicely with OpenCV. But when we migrated to Ubuntu, we had a major problem in the first phase of the project (pointer/finger detection, streaming from one webcam only): the cameras worked well with guvcview but produced an error (error 22, i.e. EINVAL) with our code. We checked their driver to make sure they are UVC compliant, as we can't use the Windows driver provided on the CD. (Checking UVC compliance for webcams is covered in the appendix.) Error 22 was produced because the code was configured for the MJPEG picture format, a compressed form of the raw stream, while the cameras only support the YUYV format, which is the uncompressed/raw form. MJPEG had been chosen in the beginning because it needs little USB bandwidth, so that we could use 4 webcams or more on the same USB 2.0 bus, while YUYV consumes much more bandwidth for slightly better quality. Unfortunately, most or all webcams in the Egyptian market don't support MJPEG, and we were told that ones that do would be much more expensive. (Checking the formats supported by a webcam is covered in the appendix.)
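
To make the format story concrete, the following is a minimal sketch (not YosrON's actual code) of how a V4L2 program asks a camera for a pixel format; all names come from the real V4L2 API in linux/videodev2.h:

/* Minimal sketch, not YosrON's actual code: request a pixel format
 * from an already-open device (e.g. /dev/video0) over V4L2. */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int request_format(int fd, unsigned int fourcc)
{
    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = 320;
    fmt.fmt.pix.height      = 240;
    fmt.fmt.pix.pixelformat = fourcc;  /* V4L2_PIX_FMT_MJPEG or V4L2_PIX_FMT_YUYV */
    fmt.fmt.pix.field       = V4L2_FIELD_ANY;

    if (ioctl(fd, VIDIOC_S_FMT, &fmt) == -1) {
        perror("VIDIOC_S_FMT");        /* reports the errno, e.g. 22 = EINVAL */
        return -1;
    }
    /* Drivers may silently substitute a format they do support, so a
     * careful program re-checks what it actually got. */
    if (fmt.fmt.pix.pixelformat != fourcc) {
        fprintf(stderr, "driver substituted another pixel format\n");
        return -1;
    }
    return 0;
}

Requesting V4L2_PIX_FMT_MJPEG from a camera that can only deliver YUYV is exactly the kind of call that comes back with errno 22.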

But when we moved to the second phase (streaming from two webcams for calibration and calculating the pointer/finger position to move the mouse) we faced other errors (28 and 16). After searching online, we found that error 28 (ENOSPC) means the USB bandwidth limit was exceeded, and error 16 (EBUSY) means the device hung.

As we know, a USB 2.0 bus supports a total bandwidth of 480 Mbps. Calculating the required bandwidth for a webcam is based on its configuration. For a resolution of 640 x 480, 30 frames per second and 32-bit colors, the required bandwidth = 640 x 480 x 30 x 32 = 294,912,000 bits/second = 294.912 Mbps.

So, the total required bandwidth for two webcams = 2 x 294.912 = 589.824 Mbps

which is higher than the 480 Mbps total bandwidth supported by USB 2.0.
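
Purely to make the arithmetic explicit, a few lines of C reproduce the figures above. (As an aside, YUYV actually packs 16 bits per pixel, so the 32-bit figure is a worst case; with the fixed bandwidth reservation described next, the distinction did not help us in practice.)

#include <stdio.h>

int main(void)
{
    /* raw stream bandwidth = width x height x fps x bits per pixel */
    const double width = 640.0, height = 480.0, fps = 30.0, bpp = 32.0;
    double one_cam_mbps = width * height * fps * bpp / 1.0e6;

    printf("one camera:  %.3f Mbps\n", one_cam_mbps);        /* 294.912 */
    printf("two cameras: %.3f Mbps\n", 2.0 * one_cam_mbps);  /* 589.824 > 480 */
    return 0;
}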

Overcoming this problem was supposed to be easy by configuring the webcams for fewer frames (15 fps) or a lower resolution (320 x 240), but that didn't work. After spending more than a week investigating the problem and trying all the suggested solutions, we suspected that the 2B webcams support only one bandwidth setting regardless of the configuration, i.e. each webcam reserves a fixed USB bandwidth, much more than it really needs, no matter what the configuration is.

Error 16 is closely related to error 28, as it means that the device hung and can't be accessed. When a webcam starts streaming, it reserves its bandwidth. When the other webcam starts on the same bus, it requests the bandwidth it needs, which is not available because of the first webcam. So both webcams hang and stop responding, while the system keeps their ports (e.g. /dev/video1) reserved, forcing us to unplug and replug them.

Our final solution to these errors was to buy two other webcams that support either the MJPEG format or a bandwidth reservation that varies with the configuration. We didn't find webcams in the Egyptian market that support MJPEG, but we found A4Tech webcams whose bandwidth reservation does depend on the configuration.

The A4Tech webcams don't support MJPEG and run only at 30 frames per second, so we had to work at a resolution of 320 x 240, which is acceptable for our needs.

4.3 The fisheye lenses

We need the view angle of each webcam to be more than 90 degrees so that we can put them very near the screen without leaving any blind areas. Most webcams have a view angle of less than 60 degrees, so we need fisheye lenses, mounting one on each webcam.

We needed a full-frame fisheye lens that produces images as in fig. 4.1.

Figure 4.1: Full-frame fisheye image.

The image is then remapped into rectilinear perspective ("defisheye") with any of the available scripts, such as Panorama Tools, as in fig. 4.2.

Figure 4.2: Remapped full-frame fisheye image in rectilinear perspective.
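
The heart of any such defisheye script is a per-pixel remap. As a hypothetical sketch only, assuming the common equidistant fisheye model (r = f * theta; a real lens needs its own calibrated projection), the mapping from a rectilinear output pixel back to its source position in the fisheye image looks like this:

/* Hypothetical sketch: map a pixel of the desired rectilinear output
 * image back to its source position in the fisheye image, assuming the
 * equidistant model r_fish = f * theta. f is the focal length in
 * pixels; (cx, cy) is the optical centre. */
#include <math.h>

void defisheye_map(double x_out, double y_out,
                   double f, double cx, double cy,
                   double *x_src, double *y_src)
{
    double dx = x_out - cx, dy = y_out - cy;
    double r_rect = sqrt(dx * dx + dy * dy);  /* radius in the output image  */
    double theta  = atan(r_rect / f);         /* angle from the optical axis */
    double r_fish = f * theta;                /* equidistant fisheye radius  */
    double scale  = (r_rect > 0.0) ? r_fish / r_rect : 0.0;
    *x_src = cx + dx * scale;                 /* sample the fisheye image    */
    *y_src = cy + dy * scale;                 /* here, with interpolation    */
}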

We searched in many places and asked many photographers and glass makers to help us find a single lens that could serve as a very small full-frame fisheye for our webcams, but all the attempts failed.

We also couldn't find a circular fisheye lens, which would produce an image as in fig. 4.3.

Figure 4.3: Circular fisheye image.

Such an image can also be remapped into a normal image, as in fig. 4.4.

Figure 4.4: The image of a circular fisheye after remapping (defisheye).

Our final hope was to use the only available fisheye small enough for YosrON: the fisheye lens made for home doors, shown in fig. 4.5. We removed its metallic housing, which we don't need, to make the lens small enough to fit in the plastic frame.

After removing the housing of the webcams and fixing the fisheye lenses on them, we faced a problem that we couldn't overcome due to the lack of time and of available support in Egypt: the fisheye lens produced internal reflections in the image (i.e. light sources were repeated in other parts of the image), raising the noise to unacceptable levels.

Another problem was the difficulty of finding two exactly identical fisheye lenses. We thought it would be simple if we bought both from the same brand and the same shop but, believe it or not, they weren't identical.

Although the "identical" problem could be overcome in software, the killer was the internal reflections, which made us postpone the fisheye addition and the plastic frame to future work.

Figure 4.5: Fisheye lens for home doors.

Chapter 5: Conclusions and Future Work

5.1 Conclusions

We presented a low cost system for bare finger tracking able to turn LCD displays

into touchscreens, as well as a desk into a design board, or a wall into an interactive

whiteboard. Many application domains can benefit from the proposed solution:

designers, teachers, gamers, interface developers. The proposed system requires a

simple calibration phase.

5.2 Future work

Future work will be devoted to improving the robustness of the calibration and pointer-detection subsystems; moreover, suitable evaluation procedures to test the empirical accuracy of the tracking will be addressed. Adding multitouch support will also be considered.

The system needs a GUI for installation, calibration and configuration, as all of these are currently done by editing the source code, which is of course not user friendly.

It would also be better if the processing load were not on the host computer. That could be achieved with a standalone DSP unit for image processing and position calculation, which would entail changes to the cameras and the code. A standalone DSP processing unit would also make the system OS-independent: all the processing would happen on that unit, which would only send signals to the OS over USB to move the mouse, perform clicks and even multitouch gestures. That would save us from writing drivers and code modifications for each OS, such as Windows, Linux and Mac OS.

Solving the fisheye lens problem remains essential for YosrON to become a user-friendly product. Once it is solved, we can easily move on to putting the entire hardware inside a resizable plastic housing.

References

[Figure 1.1] Survey from the UniMasr.com website, available at: http://unimasr.com/community/viewtopic.php?t=87470.

[Figure 1.2] Survey from the YosrON page on Facebook (http://fb.com/yosronx), at: http://fb.com/questions/242871132427684/.

[Figure 1.3] Image and price details from http://www.magictouch.com and

local resellers available at: http://www.magictouch.com/middleeast.html.

[Figure 2.8] A4Tech webcam, PK 720G model at:

http://a4tech.com/product.asp?cid=77&scid=167&id=693.

E. Rustico. "Low cost finger tracking for a virtual blackboard", at: http://www.dmi.unict.it/~rustico/docs/Low%20cost%20finger%20tracking%20for%20a%20virtual%20blackboard.pdf.

[AA07] A. Agarwal, S. Izadi, M. Chandraker, and A. Blake. High precision multitouch sensing on surfaces using overhead cameras. In Horizontal Interactive Human-Computer Systems, 2007. TABLETOP '07. Second Annual IEEE International Workshop on, pages 197–200, 2007.

[Ber03] F. Berard. The magic table: Computer vision based augmentation of a

whiteboard for creative meetings. IEEE International Conference in Computer

Vision, 2003.

[CT06] Kelvin Cheng and Masahiro Takatsuka. Estimating virtual touchscreen

for fingertip interaction with large displays. In OZCHI ’06: Proceedings of the

20th conference of the computer-human interaction special interest group

(CHISIG) of Australia on Computer-human interaction: design: activities,

artefacts and environments, pages 397–400, New York, NY, USA, 2006.

ACM.

[DUS01] Klaus Dorfmüller-Ulhaas and Dieter Schmalstieg. Finger tracking

for interaction in augmented environments. Augmented Reality, International

Symposium on, 0:55, 2001.

[FR08] G.M. Farinella and E. Rustico. Low cost finger tracking on flat

surfaces. In Eurographics Italian chapter 2008, 2008.

[GMR02] D. Gorodnichy, S. Malik, and G. Roth. Nouse ’use your nose as a

mouse’ – a new technology for hands-free games and interfaces, 2002.

[GOSC00] Christophe Le Gal, Ali Erdem Ozcan, Karl Schwerdt, and James L.

Crowley. A sound magicboard. In ICMI ’00: Proceedings of the Third

International Conference on Advances in Multimodal Interfaces, pages 65–71,

London, UK, 2000. Springer-Verlag.

[IVV01] Giancarlo Iannizzotto, Massimo Villari, and Lorenzo Vita. Hand

tracking for human-computer interaction with gray level visual glove: turning

back to the simple way. In PUI ’01: Proceedings of the 2001 workshop on

Perceptive user interfaces, pages 1–7, New York, NY, USA, 2001. ACM.

[Jen99] Cullen Jennings. Robust finger tracking with multiple cameras. In Proc. of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 152–160, 1999.

[LB04] Julien Letessier and François Bérard. Visual tracking of bare fingers

for interactive surfaces. In UIST ’04: Proceedings of the 17th annual ACM

symposium on User interface software and technology, pages 119–122, New

York, NY, USA, 2004. ACM.

[Lee07] Johnny Chung Lee. Head tracking for desktop VR displays using the

Wii remote http://www.cs.cmu.edu/~johnny/projects/wii. 2007.

[ML04] Shahzad Malik and Joe Laszlo. Visual touchpad: a two-handed

gestural input device. In ICMI ’04: Proceedings of the 6th international

conference on Multimodal interfaces, pages 289–296, New York, NY, USA,

2004. ACM.

[MmM01] Lionel Moisan and Jean-Michel Morel. Edge detection by Helmholtz principle. Journal of Mathematical Imaging and Vision, 14:271–284, 2001.

[Mor05] Gerald D. Morrison. A camera-based input device for large

interactive displays. IEEE Computer Graphics and Applications, 25(4):52–57,

2005.

[Mos06] Tomer Moscovich. Multi-finger cursor techniques. In GI '06: Proceedings of the 2006 conference on Graphics Interface, pages 1–7, 2006.

[pHYssCIb98] Yi-Ping Hung, Yao-Strong Yang, Yong-Sheng Chen, and Ing-Bor Hsieh. Free-hand pointer by use of an active stereo vision system. In Proc. 14th Int. Conf. Pattern Recognition, pages 1244–1246, 1998.

[QMZ95] F. Quek, T. Mysliwiec, and M. Zhao. Fingermouse: A freehand

computer pointing interface, 1995.

[SP98] Joshua Strickon and Joseph Paradiso. Tracking hands above large

interactive surfaces with a low-cost scanning laser range finder. In

Proceedings of CHI’98, pages 231–232. Press, 1998.

[ST05] Le Song and Masahiro Takatsuka. Real-time 3d finger pointing for an

augmented desk. In AUIC ’05: Proceedings of the Sixth Australasian

conference on User interface, pages 99–108, Darlinghurst, Australia,

Australia, 2005. Australian Computer Society, Inc.

[vHB01] Christian von Hardenberg and François Bérard. Bare-hand human-

computer interaction. In PUI ’01: Proceedings of the 2001 workshop on

Perceptive user interfaces, pages 1–8, New York, NY, USA, 2001. ACM.

[WC05] Andrew D. Wilson and Edward Cutrell. FlowMouse: a computer vision-based pointing and gesture input device. In Interact '05, 2005.

[Wel93] Pierre Wellner. Interacting with paper on the DigitalDesk. Communications of the ACM, 36:87–96, 1993.

[Wil05] Andrew D. Wilson. Play anywhere: a compact interactive tabletop

projection-vision system. In Patrick Baudisch, Mary Czerwinski, and Dan R.

Olsen, editors, UIST, pages 83–92. ACM, 2005.

[WSL00] Andrew Wu, Mubarak Shah, and N. da Vitoria Lobo. A virtual 3d blackboard: 3d finger tracking using a single camera. In Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 536–543, 2000.

[Zha03] Zhengyou Zhang. Vision-based interaction with fingers and papers. In

Proc. International Symposium on the CREST Digital Archiving Project,

pages 83–106, 2003.

Details about guvcview package from: http://guvcview.sourceforge.net.

Details about luvcview package from:

http://packages.ubuntu.com/hardy/luvcview.

Details about V4L2 library from: http://en.wikipedia.org/wiki/Video4Linux.

Details about SDL library from: http://www.libsdl.org.

Details about GSL library from: http://www.gnu.org/software/gsl.

Details about Xorg XTest from: http://en.wikipedia.org/wiki/X_Window_System.

Details about build-essential package from:

http://packages.ubuntu.com/lucid/build-essential.

Details about UVC drivers from:

http://en.wikipedia.org/wiki/USB_video_device_class.

Details about the Libc dev package from: http://packages.debian.org/sid/linux-libc-dev.

Details about fisheye lenses from: http://en.wikipedia.org/wiki/Fisheye_lens.

Details about defisheye scripts from:

http://www.fmwconcepts.com/imagemagick/defisheye/index.php.

How to install Ubuntu 11.10 from a CD or USB flash memory. From: http://blog.sudobits.com/2011/09/11/how-to-install-ubuntu-11-10-from-usb-drive-or-cd/

How to free space on your hard disk and make it unallocated using the Windows Disk Management Tool. From: http://technet.microsoft.com/en-us/magazine/gg309169.aspx.

How to disable automatic updates in Ubuntu. From:

http://www.garron.me/linux/turn-off-stop-ubuntu-automatic-update.html.

How to install build-essential from:

https://help.ubuntu.com/community/CompilingEasyHowTo.

How to check UVC compliance of a webcam and troubleshoot it from:

http://www.ideasonboard.org/uvc/faq.

Chapter 0: Appendix

0.1 Installing Ubuntu 11.10

The instructions given in this section assume that you want to install Ubuntu 11.10 as a dual boot with Windows 7 (or XP/Vista, or whatever you've already installed). This is recommended for absolute beginners: if any problem occurs with Ubuntu, you can still access Windows. If you want something else, such as removing Windows and installing Ubuntu, or erasing the whole disk and installing Ubuntu on a new computer, most of the steps are the same; the few things that change are pointed out below.

Preparing for installation:

First of all: back up your important data.

This step is very important, especially for beginners, as some mistakes can lead to reformatting the entire hard disk and losing data. So, before starting the installation procedure, you are strongly recommended to back up your data (using a backup disk or an online backup program). You shouldn't lose anything if you have multiple partitions on your drive and go for the custom installation procedure, but you are still supposed to have a backup of all your critical data before starting any experiments.

Step 1: Download Ubuntu 11.10 ISO file

First, download the Ubuntu 11.10 ISO (http://releases.ubuntu.com/oneiric), selecting the archive file (ISO) that matches your computer architecture, such as Intel x86 or AMD64. If you are not sure, choose the first one. When the download is complete, move on to the next step.

Step 2: Create a bootable media (USB/CD)

You can create a bootable USB stick or a CD/DVD from the ISO file you've just downloaded. Creating a bootable CD/DVD is easy: you just need to burn the ISO image to the disc. In Windows 7 you can burn ISO files directly in a few simple steps: insert a CD into the tray, right-click the ISO file, select "Burn disc image", and you will get a bootable CD.

If you want to install Ubuntu from a USB flash drive (pendrive), use the free Universal USB Installer (download it from http://www.pendrivelinux.com/universal-usb-installer-easy-as-1-2-3 and run it, then locate the ISO file and choose your USB drive as the target; it finishes in about a minute).

Step 3: Free enough space

Explore your partitions and make sure that one of them has at least 20 GB free. Then use the Windows 7 Disk Management tool, which provides a simple interface for managing partitions and volumes. Here's an easy way to shrink a volume:

1. Open the Disk Management console by pressing "Windows key + R" and typing diskmgmt.msc.

2. In Disk Management, right-click the volume that you want to shrink, and then click Shrink Volume.

Figure 0.1: Windows Disk Management Tool.

3. In the field provided in the Shrink dialog box, enter the amount of space by

which to shrink the disk.

Figure 0.2: Shrink dialog box.

The Shrink dialog box provides the following information:

Total Size Before Shrink In MB: the total capacity of the volume in MB. This is the formatted size of the volume.

Size Of Available Shrink Space In MB: the maximum amount by which you can shrink the volume. This doesn't represent the total amount of free space on the volume; rather, it represents the amount of space that can be removed, not including any data reserved for the master file table, volume snapshots, page files, and temporary files.

Enter The Amount Of Space To Shrink In MB: the total amount of space that will be removed from the volume. The initial value defaults to the maximum amount of space that can be removed. For optimal drive performance, ensure that the volume has at least 10 percent free space after the shrink operation.

Total Size After Shrink In MB: the total capacity of the volume in MB after the shrink. This is the new formatted size of the volume.

4. After clicking "Shrink", you should see the free space as a green partition.

Figure 0.3: Windows partitions after successfully freeing space.

That free unallocated space will be automatically used by the Ubuntu installer.

Step 4: Insert the USB disk (or CD) and restart

Now restart your computer and enter the BIOS to make sure it is configured to boot first from CD or USB drives. The steps for this configuration differ from computer to computer, so you have to find them yourself: search online for your motherboard model, or ask any technical support available to you.

While the computer is booting, if you have set BIOS passwords, enter your supervisor BIOS password, as the system may not boot from CD with the user BIOS password. Your computer should boot automatically from the bootable media, and Ubuntu will be loaded into RAM.

If a menu appears, select "Try Ubuntu without installing" if you want to take a look before installing it on your hard drive. Then click the "Install Ubuntu 11.10" icon on the desktop to begin, and select the language "English" to continue.

Step 5: Select Installation Type

For YosrON, we should not allow any updates to the environment, especially kernel updates or an entire upgrade, as they might lead to incompatibilities between the software headers and the kernel headers. So make sure to uncheck "Download Updates". You can check "install third party software" if you are connected to the Internet (if the wireless network doesn't seem to work, use a wired connection), but there is no hurry: you can always install it later, so it's optional.

Figure 0.4: Don't allow any updates.

Then click "Continue". A new window will appear where you need to select the installation type.

Figure 0.5: Install Ubuntu alongside your current OS.

You may get different options depending on your computer configuration. The above snapshot was taken while installing Ubuntu 11.10 on a computer with Ubuntu 10.04 and Windows 7 pre-installed as dual boot.

Install Ubuntu alongside them: installs Ubuntu 11.10 alongside the existing operating systems, such as Windows 7.

Erase Entire Disk and Install Ubuntu: erases your whole hard drive, deleting everything (your files as well as other operating systems). Useful only if your hard drive doesn't contain any important files, or you just bought a new computer and want to keep only one OS, i.e. Ubuntu.

Something Else: create, allocate and choose the partition on which you want to install Ubuntu, using the advanced partition manager. At first look it may seem a little difficult, but it gives you more options and control.

However, we will go with the first option: select "Install Ubuntu alongside them" and continue.

Step 6: Finishing the installation

The rest of the steps are easy for any user and are standard, as documented online. But it's important to select the correct keyboard layout to avoid problems later; most keyboards in Egypt use the "Arabic 101" layout. It's also very important to set a password for Ubuntu and remember it well, as we will use it when installing the required libraries and packages for YosrON.

Step 7: Disabling automatic updates

As mentioned before, it's very important for YosrON to disable the automatic updates feature of Ubuntu. From the menu on the left of the screen, open the Ubuntu Software Center, go to Edit -> Software Sources…, and set the option "Automatically check for updates:" to "Never".

Figure 0.6: Disabling automatic updates in Ubuntu.

Then click "Close". This disables automatic updates on your Ubuntu box.

0.2 Installing the required libraries and packages

Some libraries/packages can be installed directly from the "Ubuntu Software Center" and others must be installed from the "Terminal". An Internet connection is required in both cases.

The "Ubuntu Software Center" can be opened from the menu on the left of the screen. The "Terminal" (i.e. the command line) can be opened by pressing "Ctrl+Alt+T".

0.2.1 Installing "build-essential" using the "Terminal"

We need the gcc compiler, which is obtained by installing the build-essential package.

Step 1: Prep your system for building packages

By default, Ubuntu does not come with the tools required. You need to install the package build-essential for building packages and checkinstall for registering them with your package manager. These can be found on the install CD or in the repositories, via the Synaptic Package Manager or apt-get in the terminal:

sudo apt-get install build-essential checkinstall

And since you may want to get code from some projects with no released version, you

should install appropriate version management software.

sudo apt-get install cvs subversion git-core mercurial

You should then create a common directory where you'll be building these packages. We recommend "/usr/local/src", but really you can put it anywhere you want. Make sure this directory is writable by your primary user account by running

sudo chown $USER /usr/local/src

and, just to be safe

sudo chmod u+rwx /usr/local/src

After you've done this, you're set up to start getting the programs you need.

Step 2: Resolving Dependencies

One nice thing about modern Linux distributions is that they take care of dependencies for the user: if you want to install a program, apt makes sure all the needed libraries and other dependent programs are installed too, so installing a program is never harder than specifying what you need. Unfortunately this is not the case for some programs, and you'll have to resolve dependencies manually. This is the stage that trips up even fairly experienced users, who often give up in frustration at not being able to figure out what they need.

You probably want to read about the possibilities and limitations of auto-apt

(https://help.ubuntu.com/community/AutoApt) first, which will attempt to take

care of dependency issues automatically. The following instructions are for

fulfilling dependencies manually:

To prepare, install the package "apt-file", and then run sudo apt-file update. This will

download a list of all the available packages and all of the files those packages

contain, which as you might expect can be a very large list. It will not provide any

feedback while it loads, so just wait.

The "apt-file" program has some interesting functions, the two most useful are apt-

file search which searches for a particular file name, and apt-file list which lists all the

files in a given package. (Two explanations:

1{http://debaday.debian.net/2007/01/24/apt-file-search-for-files-in-packages-

installed-or-not/} and 2{http://www.debianhelp.co.uk/findfile.htm})

To check the dependencies of your program, change into the directory you created in step 1 (cd /usr/local/src). Extracting the tarball or downloading from CVS/Subversion will have made a sub-directory under "/usr/local/src" that contains the source code. This newly created directory will contain a file called "configure", a script that makes sure the program can be compiled on your computer. Run it with the command ./configure. It will check whether you have all the programs needed to build this one; in most cases you will not, and it will error out with a message about a needed program.

If you run ./configure without any options, you will use the program's default settings. Most programs have a range of settings that you can enable or disable; if you are interested, check the README and INSTALL files found in the directory after decompressing the tar file. You can also check the developer documentation, and in many cases ./configure --help will list the key configuration options. A very common option is ./configure --prefix=/usr, which installs the application into "/usr" instead of "/usr/local" as these instructions do.

If this happens, the last line of output will be something like

configure: error: Library requirements (gobbletygook) not met, blah blah blah stuff

we don't care about

But right above that it will list a filename that it cannot find (often a filename ending in ".pc"). What you need to do then is to run

apt-file search missingfilename.pc

which will tell you which Ubuntu package the missing file is in. You can then simply

install the package using

sudo apt-get install requiredpackage

Then try running ./configure again, and see if it works. If you get to a bunch of text

that finishes with "config.status: creating Makefile" followed by no obvious error

messages, you're ready for the next steps.

Step 3: Build and install

If you got this far, you've done the hardest part already. Now all you need to do is make sure you are inside the program folder (for example, a folder on the desktop called YosrON), by typing:

cd Desktop/YosrON

Then, run the command

make

which does the actual building (compiling) of the program. (You can use make clean to remove older compilation files after any edits you make to the code, then run make again.)

Make sure you installed all the libraries/packages needed for YosrON before running

this command. Check the following sections.

When it's done, install the program. You probably want to use

sudo checkinstall

which puts the program in the package manager for clean, easy removal later. This

replaces the old sudo make install command. See the complete documentation at

CheckInstall (https://help.ubuntu.com/community/CheckInstall).

Note: if checkinstall fails, you may need to run the command as

sudo checkinstall --fstrans=0

which should allow the install to complete successfully.

Then the final stage of the installation will run. It shouldn't take long. When finished,

if you used checkinstall, the program will appear in Synaptic Package Manager. If

you used sudo make install, your application will be installed to "/usr/local/bin" and

you should be able to run it from there without problems.

Finally, it would be better to change the group of "/usr/local/src/" to admin and give that group rwx privileges, since anyone adding and removing software should be in the admin group.
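
For example, a pair of standard commands makes this change (assuming the admin group that stock Ubuntu of this era uses for administrators):

sudo chgrp -R admin /usr/local/src
sudo chmod -R g+rwx /usr/local/src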

0.2.2 Installing libraries/packages using "Ubuntu Software Center"

"Ubuntu Software Center" is much easier to be used for installing packages and

libraries.

First of all, you need to enable installing/updating software from sources other than Ubuntu. This is done by opening the Software Center from the menu on the left of the screen and going to Edit -> Software Sources… -> Other Software.

Check the box "Canonical Partners" as in fig. 0.7, then click "Close".

Figure 0.7: Using Ubuntu Software Center to install required libraries/packages.

Then, type the name (code) of what you want to install in the search box found in the

upper right of the window. For example: type "guvcview" and it will appear in the

results. Just click "Install".

Some libraries/packages can't be installed from the "Ubuntu Software Center", in which case we turn to the "Terminal". For example, to install the SDL library, type:

sudo apt-get install libsdl1.2-dev

0.3 Check webcam supported formats and UVC compliance

0.3.1 UVC compliance check

1. First find out the vendor ID (VID) and product ID (PID) of the webcam. Use lsusb, which lists all your USB devices, including the VID and PID in the format VID:PID.

2. Use the lsusb tool again to look for video class interfaces like this: (In this

example, the VID is 046d and the PID is 08cb.)

lsusb -d 046d:08cb -v | grep "14 Video"

If the webcam is a UVC device, you should see a number of lines that look like this:

bFunctionClass    14 Video
bInterfaceClass   14 Video
bInterfaceClass   14 Video
bInterfaceClass   14 Video

In this case the Linux UVC driver should recognize your camera when you plug it in.

If there are no such lines, your device is not a UVC device.
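
You can also confirm that the uvcvideo driver actually claimed a UVC camera by plugging it in and then searching the kernel log with the standard tools:

dmesg | grep uvcvideo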

0.3.2 Supported configurations and formats

This is done using guvcview. Type guvcview in the terminal, then go to "Video & files", where you can see all the supported configurations and formats for every webcam.

Figure 0.8: Checking supported configurations and formats using guvcview.
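
If the v4l-utils package is installed, the same information can also be obtained from the terminal; a typical invocation (the device node may differ on your machine) is:

v4l2-ctl -d /dev/video0 --list-formats-ext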

0.3.3 Troubleshooting webcams

If the webcam is UVC-compatible, it should be supported out of the box in any recent

Linux distribution. Failures are usually caused by buggy applications or broken

hardware (cameras, USB cables and USB host controllers can be faulty).

You should start by trying several applications. qv4l2, guvcview and luvcview are common test tools for UVC webcams, but feel free to try other V4L2 applications as well. In particular, be aware that different webcams may use different video formats, some of which can be unsupported in some applications.

If all applications display the same failure, chances are that your hardware is broken (or at least buggy), or that you have hit a bug in the UVC driver. To diagnose the problem, follow this procedure:

1. Make sure the webcam is UVC compliant as mentioned in a previous section.

2. Enable all uvcvideo module traces:

echo 0xffff | sudo tee /sys/module/uvcvideo/parameters/trace

(The form sudo echo 0xffff > … would fail, because the redirection is performed by the unprivileged shell; hence the tee form.)

3. Reproduce the problem. The driver will print many debugging messages to the kernel log, so don't leave video capture running for too long. You can disable the uvcvideo module traces when you're done:

echo 0 | sudo tee /sys/module/uvcvideo/parameters/trace

4. Capture the contents of the kernel log:

dmesg > dmesg.log

5. If your device is not listed in the supported devices list

(http://www.ideasonboard.org/uvc/#devices), dump its USB descriptors:

lsusb -d VID:PID -v > lsusb.log

(replace VID and PID with your device VID and PID)