Augmented Media for Traditional Magazines
Vinh-Tiep Nguyen
University of Science &
John von Neumann Institute
VNU-HCM
Ho Chi Minh city, Vietnam
[email protected]
Trung-Nghia Le
University of Science
VNU-HCM
Ho Chi Minh city, Vietnam
[email protected]
Quoc-Minh Bui
University of Science
VNU-HCM
Ho Chi Minh city, Vietnam
[email protected]
Minh-Triet Tran
University of Science
VNU-HCM
Ho Chi Minh city, Vietnam
[email protected]
Anh-Duc Duong
University of Information Technology
VNU-HCM
Ho Chi Minh city, Vietnam
[email protected]
ABSTRACT
Reading traditional newspapers or magazines is a common way to get the latest information about events or new products. However, these printed materials provide readers with only static information. Readers may want more detailed information about a product mentioned in an article, or to watch video clips related to an event, right at the moment they read about it. The authors propose a mobile system that provides extra information and multimedia for readers by applying augmented reality to traditional magazines. A user can enjoy rich multimedia information about a product or news item simply by looking at an article in a traditional magazine through his or her mobile device. The system detects which article on which page of a magazine is being viewed through the mobile device and provides the reader with related information and multimedia objects. The key feature of our proposed system is a lightweight filter that efficiently discards candidate covers or articles that do not visually match the image captured by the mobile device. Experiments show that our proposed system achieves an average accuracy of more than 90% and can run in real time.
Keywords
Magazine, Augmented Reality, Planar Object Recognition
1. INTRODUCTION
With traditional newspapers, readers receive information only in printed form. Although this format is prevalent, users easily get bored with the traditional way of reading news, and to learn more about the events they read about, they often have to search the web themselves. We can address this problem by integrating AR technology into existing newspapers. With this approach, users get a new experience through the interactive media added to the traditional newspaper.
We are inspired by the Harry Potter films, in which newspapers contain not only static images but also characters that move, talk, and act like living people. With the support of AR, such animations can be brought into everyday life: on newspapers, comic books, user guides, or even educational materials such as textbooks and research papers.
Imagine traditional newspapers equipped with digital information: readers would no longer need to open a web browser to search for a video whenever they see images of an event in the newspaper. We would not need to go online for information about a newly released film; we could watch the trailer directly and quickly through an application associated with the newspaper.
In addition, this application is not limited to video enhancement. It can be extended with additional illustrations for the events in the newspaper. Sound can also be added to convey the emotional tone of people in specific situations. For example, in the case of the earthquakes in Japan, readers could watch video of the tsunami, see more pictures of each earthquake, and hear victims describe their feelings when facing the disaster.
To get more information or to join online discussions, a reader can open the URL associated with an event directly from the paper in a browser. To share feelings with friends while reading, a reader can use a "Like" function to post remarks on social networks such as Facebook. This can raise people's interest in reading the news, because they not only read alone but also read and share with their friends.
Besides, augmented information such as videos, images, sounds, URLs, and even 3D models can be used for advertising in the newspaper. For example, in a new advertisement for Toyota cars, the manufacturer can offer more information about its vehicles: a 3D model helps customers view the product visually. Customers can not only observe the car from all angles but also interact by touching car parts to get detailed information about the vehicle, such as fuel consumption per kilometer or warranty information. Frequently asked questions can be answered appropriately according to customers' requests. Another example is real estate: sellers can advertise their land more easily with images, videos, and 3D models, and customers can view the property and easily contact the seller. In this way, brokers can reduce their advertising budget by attaching a range of information to the same printed page.
Kompas, an Indonesian newspaper, was among the first in Asia to deploy an augmented reality application for its readers [2]. Commonwealth Bank uses the technology to enhance newspaper advertising [4]. The watch company Tissot has a practical application that lets users virtually try watches on their wrists to find the most suitable model without visiting a store [3].
In this paper, we propose a system that gives newspaper readers a new interactive experience by supplying related information in multiple dimensions. With this system, users can express their thoughts about an article, for example with "like"s and comments. We also propose a lightweight filtering method, a pruning strategy that lets the matching process skip unnecessary computation. Experiments show that our proposed method can be used in practice with many types of magazines.
This paper is organized as follows. In section 2, we present
the background about augmented reality and related works
in detection and matching. Our proposed system and
method are presented in section 3. Experiments to evaluate
the performance and efficiency of our proposed system are
in section 4. Sample usage scenarios of our proposed
system with different types of mobile devices are presented
in section 5. Conclusion and future work are discussed in
section 6.
2. BACKGROUND
2.1 Augmented Reality
With the development of virtual reality technology, everything in the real world can be simulated by computer [1]. Objects are created in a lifelike 3D environment, and humans can fully experience them through vision, hearing, and interaction, or even smell the fragrance of an object. However, these objects are still virtual, and the user cannot feel the real world around them. For this reason, augmented reality combines virtual objects with the real world to make the experience more familiar.
Augmented Reality (AR) is a combination of virtual objects and the real world [7]. Virtual objects are used to enhance the relevant information of a scene recorded from reality. What users see is augmented information displayed over real-world objects, or associated with the real space in which those objects are observed. The user does not feel a separation between the virtual and real components. The main purpose of augmented reality is to blur the boundaries and differences between real and virtual objects in order to increase human awareness of, and interaction with, the real world [8].
AR provides information of various types, such as text, images, and video, and can be applied in many different fields such as education [6], health [7], geographic information systems [9], and painting [10].
2.2 Marker based matching
These methods calculate the camera pose in real time from markers [10], special images [11], or bokodes [12]. Markers are like barcodes attached to the objects that need to be tracked. ARToolKit [9] is one of the most famous toolkits widely used in AR applications. After thresholding the input image, regions whose outline contour can be fitted by four line segments are extracted. The regions are normalized, and the sub-image within each region is compared by template matching against the registered patterns. This recognition is linear in the number of registered markers, so performance degrades when many markers must be distinguished.
ARToolKitPlus [5] and ARTag [22], for example, overcome this scalability issue by using a barcode-like system to encode the marker index in its appearance. Tracking with markers offers high speed and high accuracy. Markers can be attached to any object, and if desired, a video see-through system can hide the markers from the user within the display zone. However, newspapers and magazines do not have much space for attaching markers, since their pages are used for content and advertising. Moreover, in practice, one of the biggest disadvantages of markers is that they look unnatural to human readers.
2.3 Natural image based matching
This method recognizes objects using the outside appearance of a book or magazine as natural features. Natural image based matching is a common technique that finds a sub-template in a bigger image. There are two main approaches: local feature-based matching and template-based matching. The template-based approach uses the color information of the template as the main factor to determine the similarity between the template and a pattern extracted from the source image. In this approach, many distance measures can be used, such as Sum of Squared Differences (SSD) and Sum of Absolute Differences (SAD) [20]. Area-based matching methods are simple and easy to implement. Moreover, they work efficiently with both simple and complex texture patterns. However, this approach is usually not robust to changes of scale, rotation, and viewpoint, so it is not suitable for our problem.
Feature-based approaches use features such as edges [16], corners [25], and blobs [17][18], together with a similarity measure, to find the best match between features in a template image and a source image. There are two main steps: detecting interest points, and describing those key points. This approach is very popular in object recognition because of its robustness to scale and rotation transformations, occlusion, viewpoint changes, and noise. However, one of its weaknesses is its high computational cost. There are ways to improve the low speed of SIFT-like features, such as Randomized Trees and Ferns [24]; however, they require a long training process. Combining detection with tracking also reduces the computational cost, but it only works with a limited number of objects.
Returning to our problem, we only need to process a static image captured by a mobile camera, so we do not need tracking techniques. Features of all patterns are extracted in advance and stored on the server. Hence, in this paper, we propose a lightweight filtering approach that skips as many patterns as possible in order to avoid unnecessary computation.
2.4 Marker and markerless AR applications
Using markers, barcodes, bokodes, or other natural markers, AR can power applications in many aspects of life such as entertainment, health care, sports, and education. Augmented Book [13] is a system that uses Hybrid Visual Tracking to display information about a book. Hybrid Visual Tracking combines fiducial marker tracking and markerless tracking: a fiducial marker is surrounded by a black shape so it can be detected easily, while markerless tracking uses key points to match between different scenes. Using the FAST detector [14], the authors of this system find the key points of the image, and the augmented reality information is displayed to users in real time.
The Virtual Pop-up Book [15] is another application that uses Augmented Reality to provide extra information for users. In this system, the authors avoid markers, which gives users a natural way to interact with the system. Another attraction of the system is its use of 3D scenes: users feel the scene is alive and become involved in the story.
3. PROPOSED METHOD
3.1 Overview of the system
The overview of our proposed system is shown in Figure 1. When reading a book or magazine, a reader may want information about a product on a page right at the moment he or she sees it. The user points a mobile device at the page of the magazine. After the device sends the query image, the server finds information about the products in the query image and displays it on the mobile device's screen. The user then selects the product whose information he or she wants.
Figure 1. The proposed system’s overview.
Figure 2. The process of proposed system.
The specific steps of the system are illustrated in Figure 2. First, the mobile device captures the visual appearance of the magazine page containing the product the user is interested in. This query image must be a planar object. On the user's smart device, we use a color histogram of the visual query to filter out pages of magazines in the database that cannot match the magazine the user wants information about. The device then sends the visual query image, together with the list of candidates, to the server for matching and information search.
The server receives the query photo and verifies the best match for it among the candidates sent by the user's mobile device. After this step, the server finds all products in the visual query photo and sends the collection of products to the user. The user then selects one of them to get its information.
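As a sketch of this two-stage flow (the function names, and the convention that both scorers return a dissimilarity where lower is better, are our assumptions, not the paper's API):

```python
def handle_visual_query(query_img, covers, filter_fn, match_fn, n_k=5):
    """Two-stage visual query: a cheap filter prunes the database to at most
    n_k candidates (on the mobile device), then a more expensive matcher
    picks the best candidate (on the server)."""
    # Stage 1: lightweight filtering keeps the n_k most similar covers
    candidates = sorted(covers, key=lambda c: filter_fn(query_img, c))[:n_k]
    # Stage 2: full matching runs only against the surviving candidates
    return min(candidates, key=lambda c: match_fn(query_img, c))
```

The point of the design is that the expensive matcher never sees more than n_k items, regardless of database size.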
3.2 Lightweight Filtering
This module is the pre-processing step for the visual query process. Its purpose is to quickly filter out books and magazines in the database that cannot be candidates for the next step. Another benefit of this step is that images are compared at low computational cost; thus, this step can run on the user's mobile device.
The dominant colors of the external appearance of a book or magazine are the main cue customers use to recognize which book or magazine it is. Moreover, a cover usually has only a few dominant colors, so customers can remember it and recognize the book at first sight. This natural approach can be employed to build the lightweight filtering module of our proposed system. In our implementation, the module uses only the color distributions of two images to evaluate the dissimilarity between them. The color distribution of an image is used as its lightweight feature.
RGB (Red, Green, and Blue) is the most common model for encoding colors, but it is not appropriate for representing human photosensitivity. Unlike RGB, the HSV model reflects human color perception. In HSV color space, each component plays a different role: Hue (H) corresponds to the color itself, Saturation (S) refers to the dominance of hue in the color, and Value (V) is the brightness of the color. In reality, when capturing a book or magazine, lighting conditions differ, so the lightweight filtering module needs to reduce the effect of brightness when comparing two images. Thus, we use only the Hue component to create the lightweight visual feature of an image.
Let MaxH be the maximum value of Hue; in practice, MaxH = 360 (degrees). Let nH > 0 be the total number of bins for the Hue channel. We calculate the nH-bin histogram of the Hue channel for each image as follows:

  Hk(I) = nk(I) / nI,  for 0 <= k < nH

where nk(I) is the number of pixels in I whose Hue value lies in [k MaxH/nH, (k+1) MaxH/nH) and nI is the total number of pixels in I. The nH-bin histogram of the Hue channel of an image is its lightweight feature.
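A minimal NumPy sketch of this lightweight feature (the function name and the default bin count of 16 are our assumptions; the 0-360 Hue range follows the text):

```python
import numpy as np

def hue_histogram(hue_values, n_bins=16, max_hue=360.0):
    """nH-bin normalized Hue histogram: H_k(I) = n_k(I) / n_I."""
    hue = np.asarray(hue_values, dtype=float)
    # bin k covers [k * max_hue / n_bins, (k + 1) * max_hue / n_bins)
    edges = np.linspace(0.0, max_hue, n_bins + 1)
    counts, _ = np.histogram(hue, bins=edges)
    return counts / hue.size  # normalize by the total pixel count n_I
```

The returned vector sums to 1, so histograms of images with different resolutions remain comparable.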
Based on the nH-bin Hue histogram of each image, we apply a dissimilarity measure between the features of a query image and of a book's external appearance to calculate the difference between the two images. There are two main categories of such measures: bin-to-bin distances and cross-bin distances [19].

Bin-to-bin distances are sensitive to quantization, i.e., the size of a bin: as the number of bins decreases, robustness increases but distinctiveness decreases, and vice versa. In order to achieve both robustness and distinctiveness, we use a cross-bin distance, namely the Quadratic-Chi histogram distance [19].
Figure 3. Lightweight filtering with a query image and three covers (the Hue histograms of the query image and the covers are shown).
The Quadratic-Chi histogram distance between the Hue histograms P and Q of a query image I* and a cover image Ik is defined as [19]:

  QC(P, Q) = sqrt( sum_{i,j} [ (Pi - Qi) / (sum_c (Pc + Qc) Ac,i)^m ] [ (Pj - Qj) / (sum_c (Pc + Qc) Ac,j)^m ] Ai,j )

where Ai,j is the similarity between bin i and bin j, and m (0 <= m < 1) is a normalization factor. After calculating the distance between the query image and each cover image, we choose at most nk candidate books or magazines whose distance is less than a threshold.
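A sketch of the Quadratic-Chi distance of [19] in NumPy (the choice m = 0.9 and the zero-division guard are our assumptions):

```python
import numpy as np

def quadratic_chi(p, q, A, m=0.9):
    """Quadratic-Chi histogram distance between histograms p and q,
    with bin-similarity matrix A and normalization factor m in [0, 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    z = (p + q) @ A              # per-bin normalizer sum_c (p_c + q_c) * A[c, i]
    z[z == 0] = 1.0              # guard against division by zero on empty bins
    d = (p - q) / z ** m
    return float(np.sqrt(max(float(d @ A @ d), 0.0)))
```

With A equal to the identity this reduces to a chi-squared-like bin-to-bin distance; giving nearby bins nonzero similarity in A makes the distance cross-bin.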
This module is illustrated in Figure 3 with a query image and three cover images. The cover of magazine 1 has a histogram similar to the query image's histogram, with the peak of both histograms at bin 4. On the other hand, magazine 2 has a histogram with its peak at bin 6, so it cannot be a candidate because it is dissimilar to the query image.
3.3 Product Matching
After the lightweight filtering module prunes the books and magazines, the next step is to verify whether each candidate found in the lightweight filtering step can be accepted as the result of the visual query process. Our main purpose is to find books and magazines whose visual appearance is similar to the query image I*, so we can apply template matching in this step.
Template matching is a technique for finding a sub-template
in an image. This technique can be divided into two
approaches: template-based approach and feature-based
approach.
Template-based approaches use the color information of a template as global features to determine the similarity between the template and a pattern extracted from a source image. Sum-comparing metrics (such as Sum of Squared Differences (SSD), Sum of Absolute Differences (SAD), and Cross-Correlation [20]) measure how well the template fits at each location in an image. These methods are simple, easy to implement, and can handle objects with little texture. However, they are not robust to scale, rotation, or viewpoint changes.
Feature-based approaches use local features such as edges [16], corners [25], and blobs [17][18], together with a similarity measure, to find the best match between local features in a template image and a source image. Because these methods are robust to scale, rotation, and viewpoint changes, they are suitable for matching images in which a book or magazine may be captured at different scales, poses, and orientations. Furthermore, the template for each book or magazine is large enough and has sufficient texture for this approach.
For each candidate selected by the lightweight filtering module, if the template T (the candidate) can be matched with the query image I*, the corresponding book or magazine is considered a result of the visual query process. This process consists of two main steps: key point extraction, and key point matching between the template image and the query image.
In the first step, we extract key points from the query image I*. Each key point is a blob-like structure described by its center and the properties of its neighboring region. Scale Invariant Feature Transform (SIFT) by D. Lowe [17] is the most popular feature-based method. In this method, each key point is described by a 128-dimensional descriptor vector, and the main advantage is invariance to scale, rotation, illumination, and viewpoint. Another method is Speeded-Up Robust Features (SURF) [18], whose key point descriptor is only a 64-dimensional vector, so key point extraction in SURF is faster than in SIFT. We therefore use SURF: it is not only faster than SIFT but also invariant to scale, rotation, illumination, and viewpoint.
In the next step of the matching process, we match key points between a template T and the query image I* to decide whether the template can be a result of the visual query. For each key point p in T, we find its corresponding key point q in I* by nearest-neighbour search. The pair (p, q) is called a match and is only valid if the distance between p and q is not greater than a threshold. We thus obtain a collection of key point matches between T and I*.
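The nearest-neighbour matching step can be sketched as a brute-force search over descriptor arrays (the function name and the threshold parameter tau are our labels, not the paper's):

```python
import numpy as np

def match_keypoints(desc_t, desc_q, tau):
    """For each template descriptor, find its nearest query descriptor by
    Euclidean distance and keep the pair only if that distance is <= tau.
    Returns a list of (template_index, query_index) matches."""
    desc_q = np.asarray(desc_q, float)
    matches = []
    for i, d in enumerate(np.asarray(desc_t, float)):
        dists = np.linalg.norm(desc_q - d, axis=1)  # distances to all query descriptors
        j = int(np.argmin(dists))
        if dists[j] <= tau:
            matches.append((i, j))
    return matches
```

In practice a k-d tree or approximate nearest-neighbour index would replace the linear scan, but the accept/reject rule is the same.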
If the template T is a result of the visual query process, we can find a matrix M that maps most of the key points in T into I*. In our proposed system, we use the RANSAC method [21]. This method estimates a homography transform M from a randomly selected subset of the matches between the two images (at least four matches are needed to determine a homography), then counts the number of outliers, i.e., matches that do not support the estimated transform. This selection process repeats until the number of iterations exceeds a threshold.
Let M0 be the best homography transform, i.e., the one with the minimum number of outliers found in the RANSAC process. If the number of outliers corresponding to M0 is less than a threshold, the template T is accepted as a result for the query image I*. Otherwise, the template T can be conditionally accepted if the number of matches between T and I* is greater than a given threshold; if the number of matches is lower than that threshold, the template T is rejected.
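The accept/conditional/reject rule above can be written as a small decision function (the threshold names are illustrative parameters, not values from the paper):

```python
def accept_template(n_outliers, n_matches, max_outliers, min_matches):
    """Decide a candidate template's fate after RANSAC: accept if the best
    homography has few outliers, conditionally accept if there are many raw
    matches anyway, reject otherwise."""
    if n_outliers < max_outliers:
        return "accepted"
    if n_matches > min_matches:
        return "conditional"
    return "rejected"
```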
Figure 4 shows an example of template matching using SURF features, with a book or magazine cover T (left) and the query image I* (right). Each line from left to right connects a pair of corresponding SURF features.
Figure 4. Detecting a book or magazine cover in a query image using SURF features.
4. EXPERIMENTS
We present experiments to test different properties of our proposed system, covering three main tasks: the efficiency of Lightweight Filtering (cf. Section 4.1), the performance gain of using the Lightweight Filter (cf. Section 4.2), and the accuracy of Template Matching (cf. Section 4.3). The experiments were run on a system with a Core Quad 2.4 GHz CPU, 2 GB RAM, and a GeForce GTX 460 graphics card (1 GB memory).
4.1 Efficiency of Lightweight Filtering
This experiment evaluates how efficiently the Lightweight Filter limits the number of matching candidates. We observe that most magazines and journals have few pages with similar color distributions, apart from pages that use only two colors (black and white) or contain no photos. To verify this, we collected 30 issues of 10 kinds of magazines, listed in Table 1. For each magazine, we group pages with similar color distributions using the Lightweight Filter.
The experiment shows that the number of pages in each group is much smaller than the total number of pages in the magazine. In the worst case, a single group contains 10.29% of the magazine's total pages.
Table 1. Efficiency of Lightweight Filtering

Magazine                     Number of pages   Max. pages in a group with similar color distribution
Tiep thi gia dinh            158               9
Tuoi Tre (Sunday Edition)    44                3
Thanh Nien (Weekly edition)  68                7
Echip mobile                 60                3
Game world                   82                4
PC world                     132               11
Sai Gon Saturday             44                3
The gioi dien anh            84                4
Kien truc va doi song        110               6
Sieu thi o to                154               7
4.2 Performance of using Lightweight Filter
This experiment compares the performance of the system with and without the Lightweight Filter. Our dataset includes 150 magazines, each with 50 to 150 pages. We divide the dataset into five subsets of different sizes: 200, 300, 500, 800, and 1000 pages.
For each dataset, we perform 100 visual queries with
different input images. For each visual query, we conduct
the visual query process in two situations: without
Lightweight Filtering and with Lightweight Filtering. The
experimental results are illustrated in Figure 5. In the first
situation, a query image I* is matched with each cover in a
dataset. Thus the total time to process a query linearly
increases with the number of covers in that dataset. In the
second context, only the top nk candidate covers are
considered for matching with SURF features. In our
experiment, we choose nk = 5.
As Figure 5 shows, the time to process a visual query in the second case increases only slightly with the total number of covers in a dataset, because image matching (with SURF features) is executed for at most nk candidate covers per query. The average elapsed time is slightly higher than the total time for matching a query image with nk = 5 candidates because of the extra time spent on the Lightweight Filtering itself.
Figure 5. Comparison of the performance (in milliseconds) of processing a visual query with and without Lightweight Filtering.
4.3 Accuracy of Template Matching
This experiment evaluates the accuracy of Template Matching. Covers captured by mobile devices are matched against the datasets. We conduct the experiments in four scenarios: a cover obscured by fingers, a plastic cover causing glare, a cover in shadow, and a cover with motion blur due to fast movement. Figure 6 illustrates sample images from the four scenarios.
In each scenario, we detect 40 magazine covers, processing 300 frames per cover. The accuracy percentages are shown in Table 2. In the motion blur scenario, the cover cannot be detected in some consecutive frames. However, we can stabilize the results by applying a Kalman filter [23] to correct the detections and make the processing smoother and more accurate.
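As an illustration of the smoothing idea, here is a minimal 1-D Kalman filter over a sequence of noisy detections (e.g., one coordinate of a detected cover corner across frames); the noise parameters q and r are illustrative, not values from the paper:

```python
def kalman_smooth_1d(measurements, q=1e-3, r=0.1):
    """Minimal 1-D Kalman filter: predict-then-update on each measurement.
    q is the process-noise variance, r the measurement-noise variance."""
    x, p = measurements[0], 1.0       # initial state estimate and its variance
    smoothed = [x]
    for z in measurements[1:]:
        p += q                        # predict: uncertainty grows
        k = p / (p + r)               # Kalman gain
        x += k * (z - x)              # update toward the new measurement
        p *= (1.0 - k)                # uncertainty shrinks after the update
        smoothed.append(x)
    return smoothed
```

A real tracker would filter the full homography or corner positions with a constant-velocity model, but the predict/update structure is the same.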
Figure 6. Sample images of the 4 scenarios: (a) Being obscured, (b) Glare lighting, (c) Shadow, (d) Motion blur.
Table 2. Accuracy of Template Matching

Scenario              Without Kalman Filter   With Kalman Filter
(a) Being obscured    89.8%                   93.4%
(b) Glare lighting    83.2%                   88.3%
(c) Shadow            92.6%                   96.8%
(d) Motion blur       84.8%                   91.8%
5. SAMPLE USAGE SCENARIOS OF THE PROPOSED SYSTEM
In this section, we briefly present several features of our proposed system in practical contexts.
Figure 7 shows an example of a regular page of a magazine or newspaper with extra information and multimedia objects marked on it. When the page is detected, a user can interact with each augmented object on the page to trigger its behavior.
In Figure 8, an audio clip and a color photo are augmented into a regular article in the Thanh Nien newspaper. Through the mobile device, the grayscale photo in the printed article is replaced by a color photo. When the user touches the audio clip icon, he or she can listen to the whole content of the article.
Figure 7. Extra information and multimedia objects are marked on a regular page of a magazine or newspaper.
Figure 8. An audio clip and a color photo are augmented into an article in the Thanh Nien newspaper.
Figure 9 demonstrates the proposed system on a tablet. A video clip corresponding to an article on the front page of the Tuoi Tre newspaper plays when a reader views the article through the tablet. First, the first frame of the video clip is displayed in place of the grayscale photo in the article. When the user starts the clip by touching the tablet's touchscreen, he or she can watch it in different sizes and views, e.g., projective view or fullscreen view.
Figure 9. A video clip is embedded into a regular article in the Tuoi Tre newspaper.
6. CONCLUSION
In this paper, we introduce a system that provides extra information to users by applying Augmented Reality technology to traditional newspapers. A user can access extra information about a product or news item as soon as he or she first reads about it.
The lightweight filter is the key feature of the system: it quickly filters out candidates that do not match the product or news item. The experiments show that the system runs in real time and can be applied in practice.
In the future, we plan to apply parallel processing in the matching step to improve the system's performance, and to display not only multimedia information (videos, clips, and product details) but also social media content (comments, "like"s, ratings) from social networks.
7. ACKNOWLEDGEMENT
This research was supported by the John von Neumann Institute, Vietnam National University, and the Faculty of Information Technology, Ho Chi Minh University of Science, Vietnam National University.
REFERENCES
[1] Ig-Jae Kim, "Introduction to augmented reality and its applications", ACM SIGGRAPH ASIA 2010 Courses (SA '10), 2010.
[2] Kompas Augmented Reality, http://www.kompas.com/ar
[3] Tissot Reality, http://www.tissot.ch/reality
[4] http://www.commbank.com.au/about-us/news/media-
releases/interactive/iphone
[5] Daniel Wagner, Dieter Schmalstieg, "ARToolKitPlus for Pose Tracking on Mobile Devices", Proceedings of the 12th Computer Vision Winter Workshop (CVWW07), 2007, pp. 139-146.
[6] Shalin Hai-Jew, “Virtual Immersive and 3D Learning
Spaces: Emerging Technologies and Trends”, IGI Global,
2010
[7] Michael Haller, Mark Billinghurst, Bruce Thomas,
“Emerging Technologies of Augmented Reality: Interfaces
and Design”, IGI Global, 2006
[8] Nils Petersen, Didier Stricker, "Continuous natural user interface: Reducing the gap between real and digital world", in Proceedings of the 8th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2009, pp. 23-26.
[9] Sandy Martedi, Hideaki Uchiyama, Guillermo Enriquez, Hideo Saito, Tsutomu Miyashita, Takenori Hara, "Foldable augmented maps", in Proceedings of the 9th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2010, pp. 65-72.
[10] M. Knecht, C. Traxler, O. Mattausch, W. Purgathofer, M.
Wimmer, “Differential Instant Radiosity for Mixed Reality”,
ISMAR 2010, pp. 99-107 (2010).
[11] W. Lee, Y. Park, V. Lepetit, “Point-and-Shoot for
Ubiquitous Tagging on Mobile Phones”, ISMAR 2010, pp.
57-64 (2010).
[12] A. Mohan, G. Woo, S. Hiura, Q. Smithwick, R. Raskar.
Bokode, “Imperceptible Visual Tags for Camera-based
Interaction from a Distance”, SIGGRAPH 2009 (2009).
[13] Hyun S. Yang, Kyusung Cho, Jaemin Soh, Jinki Jung, and
Junseok Lee. 2008, “Hybrid Visual Tracking for Augmented
Books”, In Proceedings of the 7th International Conference
on Entertainment Computing (ICEC '08).
[14] Rosten, E., Drummond, T., “Fusing points and lines for high
performance tracking”, In: 9th IEEE International
Conference on Computer Vision, pp. 1508–1511 (2005).
[15] Nobuko Taketa, Kenichi Hayashi, Hirokazu Kato, and Shogo
Noshida. 2007, “Virtual pop-up book based on augmented
reality”, In Proceedings of the 2007 conference on Human
interface: Part II, Michael J. Smith and Gavriel Salvendy
(Eds.). Springer-Verlag, Berlin, Heidelberg, 475-484.
[16] J. Shi and C. Tomasi, “Good Features to Track”. In IEEE
Conference on Computer Vision and Pattern Recognition,
pp. 593 – 600, 1994.
[17] D. G. Lowe, “Distinctive Image Features from Scale-
Invariant Keypoints”, International Journal of Computer
Vision (IJCV), pp. 91-110, 2004.
[18] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “SURF:
Speeded Up Robust Features”, Computer Vision and Image
Understanding (CVIU), pp. 346-359, 2008.
[19] O. Pele and M. Werman. “The quadratic-chi histogram
distance family”, In Proceedings of European conference on
Computer vision (ECCV). pp. 749-762 (2010).
[20] J. P. Lewis. “Fast normalized cross-correlation”, In Vision
Interface, Canadian Image Processing and Pattern
Recognition Society. pp. 120 – 123 (1995).
[21] M. A. Fischler, R. C. Bolles. “Random Sample Consensus: A
Paradigm for Model Fitting with Applications to Image
Analysis and Automated Cartography”, Comm. of the ACM,
Vol 24, pp 381-395 (1981)
[22] M. Fiala, “ARTag, a fiducial marker system using digital
techniques”, Conference on Computer Vision and Pattern
Recognition, pp. 590-596, 2005.
[23] R. E. Kalman, “A New Approach to Linear Filtering and
Prediction Problems”, Transaction of ASME-Journal of
Basic Engineering, 1960.
[24] M. Ozuysal, M. Calonder, P. Fua, V. Lepetit, “Fast keypoint
recognition using random ferns”, IEEE Transactions on
Pattern Analysis and Machine Intelligence ,448-461, 2010.
[25] J. Canny, “A Computational Approach to Edge Detection”,
In IEEE Transactions on Pattern Analysis and Machine
Intelligence, pp. 679-698, 1986.