The research presented in this thesis addresses the problem of Text Segmentation in
Web images. Text is routinely created in image form (headers, banners etc.) on Web
pages, as an attempt to overcome the stylistic limitations of HTML. This text, however, has a potentially high semantic value in terms of indexing and searching for
the corresponding Web pages. As current search engine technology does not allow for
text extraction and recognition in images, the text in image form is ignored.
Moreover, it is desirable to obtain a uniform representation of all visible text of a Web
page (for applications such as voice browsing or automated content analysis). This
thesis presents two methods for text segmentation in Web images using colour
perception and topological features.
The nature of Web images and the inherent problems they pose for text segmentation are described, and a study is performed to assess the magnitude of the problem and establish the need for automated text segmentation methods.
Two segmentation methods are subsequently presented: the Split-and-Merge
segmentation method and the Fuzzy segmentation method. Although approached in a
distinctly different way in each method, the safe assumption that a human being should be able to read the text in any given Web image is the foundation of both methods' reasoning. This anthropocentric character of the methods, along with the use of topological features of connected components, comprises the underlying working principles of the methods.
An approach for classifying the connected components resulting from the segmentation methods as either characters or parts of the background is also presented.
It would be considerably more difficult to walk along the twisting and occasionally
disheartening road of research, without the help of numerous people who with their
myriad contributions, suggestions, friendly ears and timely distractions, supported me
during the last few years.
Firstly, thanks must go to my supervisor, Apostolos Antonacopoulos, for his support and feedback throughout my period of research. Without his guidance and his
valuable contributions, this thesis would not be possible. Also in the department, I
would like to thank my fellow doctoral students for their cherished comradeship. In
particular, I wish to thank Dave Kennedy, who shared an office with me for the last
three years, for maintaining a cheerful atmosphere by frequently distracting me with
pure English jokes (which admittedly I didn’t always understand).
Special thanks must also go to my ex-flatmate Dimitris Sapountzakis, with whom
I spent one of my best years in Liverpool. Vicky Triantafyllou, for her love and
support. Alexandros Soulahakis, for his delicious “loukoumades” and for keeping me
up to date with the latest music. Maria and Vasilis Mavrogenis for the coffee breaks
and the occasional barbeques. Paula and Dimitris for dragging me out for a drink once
in a while. Chrysa, and Elena for keeping me home the rest of the times. Nikos
Bechlos for keeping me up to date with mobile technologies, and numerous others
who made my stay in Liverpool an enjoyable time.
Finally, I owe much to my family who with their tireless interest and constant
support helped me immensely to accomplish this task. Thank you all for being there.
I declare that this doctoral thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text. The following articles were published during my period of research.
1. A. Antonacopoulos and D. Karatzas, "Fuzzy Text Extraction from Web Images Based on Human Colour Perception", to appear in the book: "Web Document Analysis – Challenges and Opportunities", A. Antonacopoulos and J. Hu (eds.), World Scientific Publishing Co., 2003
2. D. Karatzas and A. Antonacopoulos, "Visual Representation of Text in Web Documents and Its Interpretation", proceedings of the 2nd Workshop on Visual Representations and Interpretations (VRI'2002), Liverpool, September 2002
3. A. Antonacopoulos and D. Karatzas, "Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception", proceedings of the 5th IAPR Workshop on Document Analysis Systems (DAS'2002), Princeton, NJ, August 2002
4. A. Antonacopoulos and D. Karatzas, "Text Extraction from Web Images Based on Human Perception and Fuzzy Inference", proceedings of the 1st International Workshop on Web Document Analysis (WDA'2001), Seattle, Washington, September 2001
5. A. Antonacopoulos, D. Karatzas and J. Ortiz Lopez, "Accessing Textual Information Embedded in Internet Images", proceedings of Electronic Imaging 2001: Internet Imaging II, San Jose, California, USA, January 2001
6. A. Antonacopoulos and D. Karatzas, "An Anthropocentric Approach to Text Extraction from WWW Images", proceedings of the 4th IAPR Workshop on Document Analysis Systems (DAS'2000), Rio de Janeiro, Brazil, December 2000
1.1 Images in Web Document Analysis
1.2 The Use of Images in the Web – A Survey
1.3 The Need – Applications of the Method
1.4 Working with Web Images – Observations and Characteristics
1.5 Aims and Objectives of this Project
1.6 Outline of the Thesis
2 IMAGE SEGMENTATION TECHNIQUES
2.1 Greyscale Segmentation Techniques
2.1.1 Thresholding and Clustering Methods
2.1.2 Edge Detection Based Methods
2.1.3 Region Extraction Methods
2.2 Colour Segmentation Techniques
2.2.1 Colour
2.2.2 Histogram Analysis Techniques
2.2.3 Clustering Techniques
2.2.4 Edge Detection Techniques
2.2.5 Region Based Techniques
2.3 Discussion
Text Segmentation in Web Images Using Colour Perception and Topological Features
3 TEXT IMAGE SEGMENTATION AND CLASSIFICATION
3.1 Bi-level Page Segmentation Methods
3.2 Colour Page Segmentation Methods
3.3 Text Extraction from Video
3.4 Text Extraction from Real Scene Images
3.5 Text Extraction from Web Images
3.6 Discussion
4 SPLIT AND MERGE SEGMENTATION METHOD
4.1 Basic Concepts – Innovations
4.2 Description of the Method
4.3 Pre-processing
4.4 Splitting Phase
4.4.1 Splitting Process
4.4.2 Histogram Analysis
4.5 Merging Phase
4.5.1 Connected Component Identification
4.5.2 Vexed Areas Identification
4.5.3 Merging Process
4.6 Results and Short Discussion
4.7 Conclusion
5 FUZZY SEGMENTATION METHOD
5.1 Basic Concepts – Innovations
5.2 Description of Method
5.3 Initial Connected Components Analysis
5.4 The Fuzzy Inference System
5.4.1 Colour Distance
5.4.2 Topological Relationships between Components
5.4.3 The Connections Ratio
5.4.4 Combining the Inputs: Propinquity
5.5 Aggregation of Connected Components
5.6 Results and Short Discussion
5.7 Discussion
6 CONNECTED COMPONENT CLASSIFICATION
6.1 Feature Based Classification
6.1.1 Features of Characters
6.1.2 Multi-dimensional Feature Spaces
6.2 Text Line Identification
6.2.1 Grouping of Characters
6.2.2 Identification of Collinear Components
6.2.3 Assessment of Lines
6.3 Conclusion
7 OVERALL RESULTS AND DISCUSSION
7.1 Description of the Dataset
7.2 Segmentation Results
7.2.1 Split and Merge Method
7.2.2 Fuzzy Method
7.2.3 Comparison of the Two Segmentation Methods
7.3 Character Component Classification Results
7.4 Discussion
8 CONCLUSION
8.1 Summary
8.2 Aims and Objectives Revisited
8.3 Application to Different Domains
8.4 Contributions of this Research
Scientific and cultural progress would not be possible without ways to preserve and communicate information. Nowadays, automatic methods for retrieving, indexing and analysing information are increasingly being used in everyday life. Accessibility of information is therefore a significant issue. The World Wide Web is perhaps the culmination of information exchange and an area where problems in information retrieval are easily identifiable.
Images constitute a defining part of almost every kind of document, either serving
as a carrier of information related to the content of the document (e.g. diagrams,
charts, etc), or used for aesthetic purposes (e.g. background, photographs, etc). At
first, mostly due to limited bandwidth, the use of images on Web pages was rather
restricted and their role was mainly to beautify the overall appearance of a page
without carrying important (semantic) information. Nevertheless, as the speed of Internet connections rises, there is an increasing trend to embed semantically important entities of Web pages in images. More specifically,
designing eye-catching headers and menus and enhancing the appearance of a Web
page using images for anything that the visitor should pay attention to (e.g.
advertisements), is a strong advantage in the continuous fight to attract more visitors.
Furthermore, certain parts of a document, such as equations, are bound to be in image
form, as there is no alternative way to code them in Web pages.
Regardless of the role images play in Web pages, text remains the primary (if not the only) medium for indexing and searching Web pages. Search engines cannot
access any text inside the images [4], while analysing the alternative text (ALT tags) provided proves to be rather a disadvantage, since in almost half of the cases it is totally misleading, as will be seen in Section 1.2. Therefore, important information on
Web pages is not accessible, introducing a number of difficulties regarding automatic
processes such as indexing and searching.
The research described in this Thesis addresses the need to extract textual
information from Web images. The aim of the project is to examine possible ways to
segment and identify characters inside colour images such as the ones found on Web
pages. Towards the text segmentation in Web images, two different methods were
implemented and tested. The first method works in a split-and-merge fashion, based
on histogram analysis of the image in the HLS colour system, and connected
component analysis. The second method is based on the definition of a propinquity
measure between connected components, defined in the context of a fuzzy inference
system. Colour distance and topological properties of components are incorporated
into the propinquity measure providing a comprehensive way to analyse relationships
between components. The innovation of both approaches, lies in the fact that they are
both based on available (existing and experimentally derived) knowledge on the way
humans perceive colour differences. This anthropocentric character of the two
approaches is evident primarily through the way colour is manipulated, making use of
human perception data and employing colour systems that are efficient
approximations of the psychophysical manner humans understand colour.
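To make the idea of a propinquity measure more concrete, the toy sketch below combines a colour-distance term with a topological (projection-overlap) term into a single score. This is an illustration only, not the thesis's actual fuzzy inference system: the RGB Euclidean distance, the bounding-box overlap and the min-combination are all invented for the example (the thesis instead uses perceptually motivated colour systems and a proper fuzzy inference system).

```python
import math

def colour_distance(c1, c2):
    # Euclidean distance in RGB space; a simplification of the
    # perceptual colour distances used in the thesis.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def overlap_ratio(box1, box2):
    # Fraction of horizontal projection overlap between two component
    # bounding boxes given as (x_min, x_max) intervals.
    lo = max(box1[0], box2[0])
    hi = min(box1[1], box2[1])
    span = min(box1[1] - box1[0], box2[1] - box2[0])
    return max(0.0, hi - lo) / span if span > 0 else 0.0

def propinquity(c1, c2, box1, box2, max_dist=441.7):
    # Toy fuzzy-style combination: components with similar colour AND
    # overlapping projections score near 1, otherwise near 0.
    colour_sim = 1.0 - colour_distance(c1, c2) / max_dist
    return min(colour_sim, overlap_ratio(box1, box2))
```

Two components with identical colour and fully overlapping projections score 1.0; a black and a white component with disjoint projections score 0.0.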
The concept of the “Web document” will be introduced in the next section, and
text extraction from Web images will be discussed in the context of document
analysis. The impact and consequences of using images in Web pages is assessed in
Section 1.2. The significant need for a method that extracts text from Web images is
discussed in Section 1.3, along with possible applications for such a method.
Section 1.4 examines the distinguishing characteristics of Web images and
summarizes interesting observations made. Characteristic paradigms of Web images
are also given in Section 1.4. The aims and objectives of this project are detailed in
Section 1.5. Finally an outline of the thesis is given in Section 1.6.
1.1. Images in Web Document Analysis
Following the explosive expansion of the World Wide Web, the classical definition of what a document is had to change in order to accommodate the new form of electronic document that appeared: the "Web Document". Two distinct descriptions exist for every Web page: one is the code used to produce the output, and the other is the actual output itself. Although either description should ideally be adequate to describe a given Web page, the same information is not stored in each representation. For
instance, information about the filenames of images contained in the document is only
available in the Web page’s code, whereas the images themselves, thus the
information they add to the document, are only part of the two-dimensional viewable
output. The definition of Web Document is therefore not trivial, and should
incorporate both representations.
The key role images play in Web pages makes them an important part of every Web Document Analysis method. This holds true for all types of images, as images are used in Web pages for a variety of purposes: to define a background pattern for the document, as bullets in a list, as delimiters between different sections, as photographs, charts, etc. The most significant kind of images, with respect to the amount of information they carry, are the ones containing text, such as headers, menus, logos, equations, etc. Images containing text carry important information about both the layout and the content of the document, and special attention should be paid to incorporating this information in every Web Document Analysis method.
1.2. The Use of Images in the Web – A Survey
In order to assess the impact of the presence of text in Web images, the Pattern Recognition and Image Analysis (PRImA) group of the University of Liverpool conducted a survey of the contents of images of around 200 Web pages [9]. The Web pages included in the survey originated from a variety of sites that the average user would be interested in browsing. Sites of newspapers and TV stations, on-line shopping, commercial, academic and other organizations' sites, as well as sites dealing with leisure activities, work-related activities and other routine activities were all included, so that the sample set reflects the average usage of the World Wide Web today. The Web pages analysed were all in the English language, and therefore the majority of sites were from the United States and the United Kingdom. For the purpose of this survey, this choice should not entail any loss of generality.
In order to assess the impact of the presence of text in Web images, certain properties were measured for each image, related both to the textual contents of the images and to the contents of the ALT tags associated with each image.
Contrary to typical document images, which have a minimum resolution of 300 dpi and a typical size of an A4 page (thus their dimensions are in the range of thousands of pixels), Web images have an average resolution of 75 dpi. This is adequate for viewing on a monitor, which is the primary use for these images. Consequently, the size of the images is in most cases small. Since the images are to be viewed on a monitor with an average resolution of 800x600 pixels, the dimensions of Web images are never larger than a few hundred pixels. Another observation that leads to similar conclusions about the average size of Web images is the function of the images in Web pages. Headers, menu items and section titles usually occupy a small area of the page, which is reflected in the size of the images used for these entities. Furthermore, the size of the characters used is also much smaller than the size of characters in scanned documents (Figure 1-5). An expected character size in scanned documents is 12 pt or larger, whereas in Web pages characters can be as small as the equivalent of 5 to 7 pt.
Figure 1-5 – (a) An image containing small characters. (b) Magnified part of the image.
Another important observation is that Web images are used in order to create impact. The ultimate function of images in Web pages is not only to beautify the overall look of the pages, but also to attract viewers. This should be combined with the fact that the creation of Web pages is not limited to professional designers (who could be expected to comply with certain design rules), but is essentially open to everyone who owns a computer. Creativity is therefore the only limit when designing a Web page. Consequently, complex colour schemes are used most of the time, resulting in images having multi-coloured text over multi-coloured backgrounds (Figure 1-6). One would anticipate that, precisely because creating impact is an important issue, high contrast
The objective of this project is the implementation and testing of new ideas for performing text extraction from Web images, as dictated by the specific characteristics of the problem. A successful solution should be able to cope with the majority of Web images, producing precise segmentation results and high classification rates. Given the volume of fundamentally different images existent in the World Wide Web, this is expected to be a complicated task.
The final solution should be able to correctly segment images containing text and background of varying colour, mainly gradient, and antialiased text. It should also be able to cope with various text layouts (e.g. non-horizontal text orientation).
The methods developed should not set any special requirements for the input images, while the assumptions about the contents of the image should be kept to an absolute minimum.
Given the volume of Web images available and the special types of applications where a Web image text extraction method is expected to be used, it is important that the execution time is kept short. Nevertheless, since the focus of this research is on the implementation and evaluation of novel methods, the execution time for the prototypes should be allowed to be reasonably long, as long as optimisation and
The ability of the text extraction method to decide whether an image contains text or not is a desired property, but is considered to be out of the scope of this research. For the evaluation of this research, only images containing text will be employed.
1.6. Outline of the Thesis
Following this introductory chapter, Chapters 2 and 3 provide a theoretical background for this research. Chapter 2 gives a detailed overview of segmentation methods, starting with methods used for greyscale image analysis, followed by a brief introduction to colour and a critical review of colour segmentation methods. Chapter 3 provides an overview of text image segmentation and classification methods. Several aspects of text image segmentation are discussed, and a number of classification methods for the segmented regions are presented and appraised. Previous work on the topic of text extraction from colour images is also part of Chapter 3.
Chapter 4 describes in detail the first method used for segmenting text in Web images. This method works in a split-and-merge fashion, making use of the HLS colour system.
X_i ∩ X_j = ∅,  for i ≠ j,  i, j = 1, …, N    Eq. 2-2
even though not mentioned, should also be in effect for the above definition to be
complete. That last condition states that no pixel of the image can belong to more than
one subset.
The second condition implies that regions must be connected, i.e. composed of
contiguous lattice points. This is a very important criterion since it affects the central
structure of the segmentation algorithms [119, 197] . This second condition is not met
in many approaches [31, 81, 141, 173].
A uniformity predicate P, such as the one associated with the third condition, is, according to Fu and Mui [56], one which assigns the value TRUE or FALSE to a non-empty subset Y of the grid of sample points X of a picture, depending only on properties related to the brightness matrix for the points of Y. Furthermore, P has the property that if Z is a non-empty subset of Y, then P(Y)=TRUE always implies P(Z)=TRUE. The above definition is rather restrictive, in the sense that the only feature on which P depends is the brightness of the points of the subset Y. In Tremeau and
Borel [197] this is generalized in order to address other characteristic features of the
subset pixels as well. According to them, the predicate P determines what kind of
properties the segmentation regions should have, for example a homogeneous colour
distribution.
Finally, the fourth condition entails that no two adjacent regions can have the
same characteristics.
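The conditions above can be checked mechanically on a label image (an array assigning each pixel to a region). The sketch below verifies the connectivity condition; 4-connectivity is assumed here purely for the example, and the helper names are ours:

```python
from collections import deque

def regions(labels):
    # Collect the pixel set of each region in a label image,
    # keyed by region label.
    out = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            out.setdefault(lab, set()).add((x, y))
    return out

def is_connected(pixels):
    # Condition 2: a region must consist of contiguous lattice points.
    # Breadth-first search from an arbitrary pixel must reach them all
    # (4-connectivity assumed).
    seen = {next(iter(pixels))}
    queue = deque(seen)
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in pixels and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == pixels
```

Because a label image assigns exactly one label per pixel, the covering and disjointness conditions (Eq. 2-1 and Eq. 2-2) hold by construction; only connectivity and the uniformity predicate need explicit checks.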
Segmentation is one of the most important and at the same time most difficult steps of image analysis. The immediate gain of segmenting an image is the substantial reduction in data volume. For that reason, segmentation is usually the first step before a higher-level process which further analyses the results obtained.
Segmentation methods are basically ad hoc, and their differences emanate from the traits of each problem. As a consequence, a variety of segmentation methods can be found in the literature, the vast majority of which address greyscale images.
In the present chapter an overview of some of the existing methods for segmentation is attempted, paying particular attention to segmentation methods created for colour images. The next section gives a brief review of greyscale segmentation methods, followed by the main section of this chapter, which focuses on colour image segmentation.
2.1. Greyscale Segmentation Techniques
Fu and Mui [56], in their survey on image segmentation, classified image segmentation techniques as characteristic feature thresholding or clustering, edge detection, and region extraction. The survey was written from a biomedical viewpoint, and the evaluation of techniques is based on cytology images. The authors' comments are objective, but the main interest is clearly cytology imaging. A review of several methods for thresholding and clustering, edge detection and region extraction was performed. Most of the methods reviewed address greyscale segmentation, with the exception of Ohlander's work [134] on colour image thresholding and a clustering method proposed by Mui, Bacus and Fu [124], which only uses colour-density histograms at a later stage, after an initial segmentation has already been obtained.
Haralick and Shapiro [69] categorized segmentation techniques into six classes: measurement space guided spatial clustering, single linkage region growing schemes, hybrid linkage region growing schemes, centroid linkage region growing schemes, spatial clustering schemes and split and merge schemes. The survey mainly focuses on the first two classes of segmentation techniques: measurement space guided spatial clustering and region growing techniques, for which a good summary of different types of region growing methods is presented. The recursive clustering method proposed by Ohlander [134] and the work of Ohta, Kanade and Sakai on colour variables [136] are detailed, among others, in the section concerning clustering. There is also a small section about multidimensional measurement space clustering, where Haralick and Shapiro propose to work in multiple lower order projection spaces and then reflect these clusters back to the full measurement space.
Pal and Pal [140] have made a somewhat more complete review of image
segmentation techniques. They cover areas not addressed in previous surveys such as
fuzzy set and neural networks based segmentation techniques as well as the problem
of objective evaluation of segmentation results. Furthermore, they consider the
segmentation of colour images and range images (basically magnetic resonance
images). They identify two approaches to segmentation, the classical approach and the fuzzy mathematical approach, each including techniques based on thresholding, edge detection and relaxation. The paper attempts a critical appreciation of several methods.
distinct labels assigned to the segmented image [56]. Bi-level thresholding can be seen as a special case of multi-level thresholding for m=2, where two intervals are defined: one between T_0 and T_1, and the second between T_1 and T_2.
Level slicing or density slicing is a special case of bi-level thresholding by which
a binary image is produced with the use of two threshold levels. “A binary one is
produced on the output image whenever a pixel value on the input image lies between
the specified minimum and maximum threshold levels” [70].
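Following the quoted definition [70], density slicing reduces to a simple per-pixel interval test. A minimal sketch (the function name and the inclusive bounds are our choices for the example):

```python
def density_slice(image, t_min, t_max):
    # "A binary one is produced on the output image whenever a pixel
    # value on the input image lies between the specified minimum and
    # maximum threshold levels" [70].
    return [[1 if t_min <= v <= t_max else 0 for v in row]
            for row in image]
```

For example, density_slice([[10, 120, 250]], 100, 200) yields [[0, 1, 0]]: only the middle pixel falls inside the slice.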
Based on the aforementioned definition a threshold operator T can be defined as a
test involving a function of the form [60]:
T(x, y, N(x, y), f(x, y))    Eq. 2-4
where N(x, y) denotes some local property of the point (x, y) for example the average
grey level over some neighbourhood. Depending on the functional dependencies of
the threshold operator T, Weszka [204] and Gonzalez and Woods [60] divided thresholding into three types. The threshold is called global if T depends only on f(x, y). It is called a local threshold when T depends on both f(x, y) and N(x, y). Finally, if T depends on the coordinate values x, y as well, then it is called a dynamic threshold.
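The three functional dependencies of T can be made concrete. In the sketch below, N(x, y) is taken to be the local mean grey level, and the specific offsets and slopes are invented for illustration:

```python
def local_mean(f, x, y, radius=1):
    # N(x, y): average grey level over a (2*radius+1)^2 neighbourhood,
    # clipped at the image border.
    h, w = len(f), len(f[0])
    vals = [f[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]
    return sum(vals) / len(vals)

def threshold_global(f, x, y, t=128):
    # Global: T depends only on f(x, y).
    return f[y][x] > t

def threshold_local(f, x, y, offset=10):
    # Local: T depends on f(x, y) and N(x, y).
    return f[y][x] > local_mean(f, x, y) + offset

def threshold_dynamic(f, x, y, base=100, slope=5):
    # Dynamic: T additionally depends on the coordinates (x, y),
    # e.g. a threshold rising linearly across the image.
    return f[y][x] > base + slope * x
```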
A global threshold can be selected in numerous ways. There are threshold
selection schemes employing global information of the image (e.g. the histogram of a
characteristic feature of the image, such as the grey scale value) and others using local
information of the image (e.g. co-occurrence matrix, or the gradient of the image).
Taxt et al. [192] refer to these threshold selection schemes using global and local information as contextual and non-contextual thresholding respectively. According to Taxt et al., under these schemes, if only one threshold is used for the entire image then the process is called global thresholding, whereas if the image is partitioned into sub-regions and a threshold is defined for each of them, then it is called local thresholding [192], or adaptive thresholding [214]. For the rest of this section, the definitions given by Weszka [204] and Gonzalez and Woods [60] will be used.
Global threshold selection methods
If there are well-defined areas in the image having a certain grey-level value, then that would be reflected in the histogram as well-separated peaks. In this simple case,
operators (Figure 2-2), which are based on a 3x3 neighbourhood. The Sobel operator has the advantage of providing both a differencing and a smoothing effect, which is a particularly attractive feature, since derivatives enhance noise. Prewitt's operator is also adequately noise-immune, but it has a weaker response to the diagonal edge.
[The 3x3 convolution masks of the Prewitt, Sobel, Robinson and Kirsch operators were depicted here; for example, the horizontal Prewitt mask is (−1 −1 −1; 0 0 0; 1 1 1) and the horizontal Sobel mask is (−1 −2 −1; 0 0 0; 1 2 1).]
Figure 2-2 – 3x3 convolution masks of some operators approximating the first derivative. Four out of the eight directions are depicted.
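A direct implementation of such a 3x3 gradient mask is straightforward. The sketch below applies the standard horizontal Sobel mask; skipping the border pixels is a simplification made for the example:

```python
# Standard horizontal Sobel mask: responds to vertical grey-level
# change (a horizontal edge).
SOBEL_H = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def convolve3x3(image, mask):
    # Apply a 3x3 mask to every interior pixel; border pixels are
    # left at zero for brevity.
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(mask[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out
```

On a horizontal step edge the response is strongest at the edge, and on a uniform region it is exactly zero, illustrating the differencing effect mentioned above.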
Gradient operators are widely used, since they comprise a straightforward,
computationally cheap parallel process. Local operators based on the second
derivative have also been proposed. The Laplacian of a function is a second order
derivative defined as:
∇²f = ∂²f/∂x² + ∂²f/∂y²    Eq. 2-6
The Laplacian operator may be implemented in various ways. It presents a number
of disadvantages. Being a second order derivative operator, the Laplacian operator is
very sensitive to noise. Furthermore, it produces double edges.
A good edge detector should be a filter able to act at any desired scale, so that
large filters can be used to detect blurry shadow edges and small ones to detect
sharply focused fine details. Marr and Hildreth [113] proposed the Laplacian of
Gaussian (LG) operator as one that satisfies the above statement. The Gaussian part of
the LG operator smoothes the image at scales according to the scale of the Gaussian.
Then the zero crossings in the output image produced by the Laplacian part of the
operator indicate the positions of edges. The space described by the scale parameter of
the Gaussian and the zero crossing curves of the output image is known as the
scale-space. Techniques based on scale-space analysis [26, 207] are also used to
identify interesting peaks in histograms and will be discussed again in section 2.2.2.
The Laplacian of a two dimensional Gaussian function of the form of Eq. 2-7 is
given in Eq. 2-8. In Figure 2-3 the plot of the Laplacian of the Gaussian is shown, as
well as a cross-section. The zero crossings of the function are at r = ±σ, where σ is the
standard deviation of the Gaussian.

h(x, y) = exp(−r²/2σ²),  where r² = x² + y²   Eq. 2-7

∇²h = −(1/σ²) (1 − r²/σ²) exp(−r²/2σ²)   Eq. 2-8

(a) (b)
Figure 2-3 – (a) A plot of the Laplacian of the Gaussian (∇²h); (b) a cross-section of ∇²h.
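The zero-crossing behaviour can be demonstrated in one dimension. The following is an illustrative sketch, not code from the thesis; the sampled second-derivative-of-Gaussian kernel (the 1D analogue of the LG operator), the step signal and the interior margin are all arbitrary choices. Filtering a step edge produces a positive and a negative lobe, with the zero crossing between them marking the edge position:

```python
import numpy as np

sigma = 3.0
t = np.arange(-10, 11, dtype=float)
# 1D second derivative of a Gaussian, sampled on integer positions.
kernel = (t**2 / sigma**4 - 1.0 / sigma**2) * np.exp(-t**2 / (2 * sigma**2))

signal = np.zeros(100)
signal[50:] = 1.0                      # step edge at index 50
resp = np.convolve(signal, kernel, mode='same')

interior = resp[15:85]                 # ignore window-truncation effects at the ends
pos_max = int(np.argmax(interior)) + 15
pos_min = int(np.argmin(interior)) + 15
# The two lobes bracket the edge; the zero crossing lies between them.
```

Increasing sigma widens the lobes, which is exactly the scale behaviour the text describes: larger scales respond to blurrier edges.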
The importance of Marr and Hildreth's method lies partly in the introduction of
the concept of scale to edge detection, a concept that has been widely used since [54].
Lu and Jain [108] studied the behaviour of edges in the scale-space using the LG
operator, aiming to derive useful rules for scale-space reasoning and for high-level
Further processing usually consists of linking edge elements separated by small
gaps and removing short segments that are unlikely to be part of an important
boundary. Local processing methods for edge linking are usually generic and do
not give good results in many situations. More often than not, methods aiming to
identify certain shapes in the set of produced edge pixels are used instead.
Line and curve fitting techniques for edge linking
Lowe [107] describes a method to recognize three-dimensional objects from single
two-dimensional images. His method is based on matching certain structures of
identified lines with the three-dimensional description of the model in question. In
order to group the edge points resulting from an edge detection process into lines,
Lowe proposed a method based on a significance metric for each line fit. The
significance of a straight line fit to a list of points was estimated by calculating the
ratio of the length of the line segment divided by the maximum deviation of any point
from the line. This measure remains constant under different scales of the image. A
line segment is then recursively subdivided at the point of maximum deviation, giving
two smaller line segments, and the process is repeated until no line segment is larger
than 4 pixels. A binary tree of possible subdivisions for each line segment is thus
created, and the significance of each sub-segment is calculated. Then, following a
bottom-up procedure, a decision is made at each junction as to whether to keep the two
sub-segments or replace them with the higher-order one. If the significance of any of
the sub-segments is greater than the significance of the complete segment, then the
sub-segments are preferred. This is a very compact approach, which manages to
approximate any set of linked pixels with a number of line segments, independently
of the scale of the image in hand.
An extension of the above algorithm was proposed by Rosin and West [166].
They suggested that circular arcs could be detected in the straight-line description
using a similar algorithm to find the best fit of arcs and straight lines to the data. At
each level of recursion, a decision is taken as to whether an arc is a better fit for the
given set of points than a lower level description consisting of straight lines and arcs.
Rosin and West used the reciprocal of Lowe's significance metric, so that division by
zero could be avoided when all the pixels lie directly on the straight line and thus the
maximum deviation is zero. This proved to happen often with short lines. For this
metric, a lower significance value indicates a more significant line.
The Hough transform maps identified edge pixels into a parameter space, in
which the parameters of the most probable lines can be detected and mapped back
into the image space, while it is insensitive to missing parts of lines, noise and other
non-line structures present in the image.
(a) (b)
Figure 2-4 – (a) Image space for Hough transform; (b) Slope-intercept space for Hough transform.
One basic problem with that approach is that both slope and intercept, which were
the parameters used by Hough, are unbounded (the slope is infinite for vertical lines),
complicating the application of the technique. Duda and Hart [46] proposed the use
of a different parameter space, the so-called normal parameterisation of a line.
According to that, each line can be described by the angle of its normal θ and its
algebraic distance from the origin ρ. Using these parameters, the equation of a line is
given by:
ρ = x cos θ + y sin θ   Eq. 2-14
0 ≤ θ < π,  −D/2 ≤ ρ < D/2   Eq. 2-15
If θ is restricted to the interval [0, π) then the normal parameters for a line are
unique. Each point in the image space corresponds to a sinusoidal curve in the
parameter space, given by Eq. 2-14. Duda and Hart transformed each identified pixel
in the image to the corresponding sinusoidal curve in the parameter space. Collinear
pixels in the image space correspond to intersecting sinusoidal curves in the
parameter space, thus the problem again was converted to finding concurrent curves
in the parameter space (Figure 2-5). The advantage of using the normal
parameterisation to describe lines is that both θ and ρ can be confined as shown in
Eq. 2-15, where D is the distance between corners in the image, since points outside
this range correspond to lines outside the image plane.
(a) (b)
Figure 2-5 - (a) Image space for Hough transform; (b) θ-ρ space for Hough transform.
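A minimal sketch of the voting process under the normal parameterisation follows. This is an illustrative example, not code from the thesis; the angular resolution, the rho quantization and the toy point set are arbitrary choices. Each edge pixel votes for every (θ, ρ) combination consistent with it, and a peak in the accumulator indicates a line:

```python
import numpy as np

def hough_lines(points, rho_max, n_theta=180):
    """Accumulate votes in (theta, rho) space: rho = x*cos(theta) + y*sin(theta)."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, 2 * rho_max + 1), dtype=int)
    for x, y in points:
        for i, th in enumerate(thetas):
            rho = int(round(x * np.cos(th) + y * np.sin(th)))
            acc[i, rho + rho_max] += 1   # offset so negative rho fits in the array
    return acc, thetas

# Ten collinear points on the vertical line x = 5.
points = [(5, y) for y in range(10)]
acc, thetas = hough_lines(points, rho_max=20)
t_idx, r_idx = np.unravel_index(np.argmax(acc), acc.shape)
```

For the vertical line x = 5 the accumulator peak falls at θ = 0 and ρ = 5, exactly the normal parameters of the line.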
Generalization to more complex curves that can be described by an analytical
equation is straightforward. In each case, the appropriate parameter space is
constructed and quantized within the limits of the parameters. Then each identified
edge pixel in the image would map to a number of accumulator cells for different sets
of parameters. The accumulator cells with the higher number of counts would give the
most probable parameters for the shape in question. If the desired region borders
cannot be described using an analytical equation, a generalized Hough transform [12,
40, 83] can be used. In this case, a parametric curve description is constructed based
on sample situations detected in the learning stage. Finally, even if the exact shape of
objects is unknown, as long as there is enough a priori knowledge to form an
approximate model, a fuzzy Hough transform can be used [148].
Graph techniques for edge detection and linking
Some methods have also been proposed to link edge elements based on representing
them in the form of a graph and searching for an optimum path in it that corresponds
to a meaningful boundary. A graph [59, 186] is a general structure consisting of a set
of nodes ni and arcs between the nodes [ni, nj]. A graph in which the arcs are directed
is called a directed graph. If an arc is directed from node ni to node nj, then nj is called
a successor of its parent node ni. The one node at level zero of the graph is called the
root.
number of cells of small size (2x2, 4x4, or 8x8) and calculate a statistical measure of
intensity over the cells. Then, beginning with the first cell in the upper-left corner of
the image, they compared the statistics with those of each neighbouring cell to
determine if they are similar, in which case they joined the two cells into a bigger
fragment. The process was continued, growing the fragment by comparing it to all of
its neighbours, until no neighbours remained that could be joined to the fragment. The
next uncompleted cell was then used as the starting one and the process was repeated
until all cells were assigned to some region. The only property employed by this
method was intensity information. The results of the method are dependent on the
order in which the cells are assessed, as is the case for every region growing and
merging method.
Brice and Fennema [20] used heuristics that evaluated parameters depending on
more than one region. They started with individual pixels and, following a process
similar to that of Muerle and Allen [122], created regions of pixels having equal
intensity. This first stage produces a large number of atomic regions in most cases.
The concept of boundary was introduced. The boundary of each region is
composed of a set of simply connected boundary segments, which are assigned
strength according to the difference of intensities at each side of them. For each
region, we can count the number of boundary segments having strength below a
specific tolerance, as well as the total number of boundary segments, and thus the
perimeter of the region. The heuristics introduced by Brice and Fennema were based
on these boundary segments. The first heuristic, called the "phagocyte" heuristic,
merges two adjacent regions if the boundary between them is weak (strength below a
tolerance) and the resulting region has a shorter boundary than the previous two. The
second heuristic, called the "weakness" heuristic, merges two regions if the weak
portion of their common boundary is above some predetermined percentage of their
total shared boundary. The first heuristic is more general, and the second is used to
refine the results of the "phagocyte" heuristic, but cannot be used on its own since it
does not consider the influence of different region sizes.
Pavlidis [145] proposed a different approach to the problem of region growing
using functional approximations. His technique was based on dividing the image into
regions that could be sufficiently approximated by a single approximating function.
He proved that if an image has been approximated by a large number of regions, then
the number of approximating regions can be decreased by merging regions that have
similar coefficients. Pavlidis initially sliced the image into one pixel wide stripes, and
divided the stripes into segments such that the intensity of each of those segments could
be approximated by a simple one-dimensional linear function. These segments were the
starting regions for the method, which considered joining regions with similar
coefficients with the help of a graph representation of the starting segments. The only
coefficient of the approximating function used in this implementation was its slope.
Region splitting techniques
Region splitting can be considered the opposite of region merging. While region
merging is a bottom-up process combining smaller regions into larger ones, region
splitting is a top-down process, starting with the whole image as one region, and
dividing it into sub-regions so that the resulting regions conform to a homogeneity
criterion. Although region merging and region splitting methods employ the same
criteria about homogeneity and region similarity, the two methods are not dual, in the
sense that even if identical homogeneity criteria were used, the two methods would
not result in the same segmentation, as can be seen in Figure 2-6.
One of the first techniques using region splitting was proposed by
Robertson et al. [160]. They defined a criterion for region uniformity, called
G-regularity. Although the original algorithm was developed for multi-spectral
images, the grey-scale equivalent of G-regularity would be the mean grey level of
each sub-region being the same as the mean grey level of the region. The
algorithm subdivides regions imposing either a vertical or a horizontal partition, and
continues doing so as long as a sub-region can be found whose mean grey level is
sufficiently different from that of its parent region. Robertson et al. quantitatively
defined the partition quality error of a region as the weighted sum of the grey level
variance over every sub-region. The weights were given by the relative sizes of the
sub-regions to the parent region.
(a) (b)
Figure 2-6 – (a) Different pyramid levels for a chessboard image; (b) The average grey-level for each
pyramid level. Splitting the upper pyramid level does not result in regions of different average grey-
levels, so no splitting occurs. The lowest level's regions have different average grey-levels, so no
merging can take place. Splitting and merging in this case would produce different segmentations, even
if they use the same criteria.
Splitting and merging techniques
The next expected step in region extraction was the joining of splitting and merging
techniques. Horowitz and Pavlidis [76, 146] proposed a functional approximation split
and merge technique, in which regions are described again in terms of an
approximating function. The approximations are two dimensional here, in contrast
with the previous work of Pavlidis [145]. A pyramidal structure was introduced
(Figure 2-7), which is a stack of pictures beginning with the original picture at the
bottom and pictures of decreased resolutions at higher levels. The picture at one level
is produced from the picture at the level below by averaging the intensities of pixels in
non-overlapping 2x2 squares. Thus, the picture at any level would be half the width
and height of the picture at the level below. If any region in any pyramid level is not
homogeneous, it is split into four regions, which are the corresponding regions at the
level below. Similarly, if four regions exist at a pyramid level having similar
homogeneity values, they are merged into a single region in the above pyramid level.
Splitting and merging can therefore be described as moving up and down in this
pyramidal structure. This pyramidal structure can be expressed as a quadtree, where
the root is the top level of the pyramid structure and each node is an element of some
pyramid level. A constraint imposed by the pyramidal structure used is the assumption
of square regions. A grouping operation that follows the split and merge operation is
used in Horowitz and Pavlidis' method to address this issue by merging adjacent
regions regardless of the pyramidal structure.
(a) (b)
Figure 2-7 – (a) Pyramid structure of an image. Each level is a decreased resolution copy of the image
below; (b) Splitting and merging expressed as moving between levels of the pyramid.
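The splitting half of the process can be sketched as a recursive quadtree subdivision. This is an illustrative toy example, not the authors' code; the use of grey-level variance as the homogeneity criterion and the stopping threshold are assumptions made for the sketch:

```python
import numpy as np

def split(img, x, y, w, h, thresh, out):
    """Recursively split the region (x, y, w, h) into four quadrants
    until its grey-level variance falls below `thresh` (quadtree)."""
    region = img[y:y+h, x:x+w]
    if region.var() <= thresh or w <= 1 or h <= 1:
        out.append((x, y, w, h))       # homogeneous leaf region
        return
    hw, hh = w // 2, h // 2
    for nx, ny, nw, nh in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                           (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]:
        split(img, nx, ny, nw, nh, thresh, out)

# A 4x4 image whose left half is black and right half white
# splits into exactly four homogeneous quadrants.
img = np.zeros((4, 4))
img[:, 2:] = 255.0
leaves = []
split(img, 0, 0, 4, 4, thresh=1.0, out=leaves)
```

A merging pass over the resulting leaves (joining adjacent leaves with similar statistics, regardless of the quadtree) would then play the role of the grouping operation described in the text.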
Based on the concept of splitting and merging, many approaches have been
proposed. Chen et al. [32] proposed an adaptive split and merge algorithm. A
modification of the pyramidal structure, introducing overlapping regions, was
proposed by Pietikainen and Rosenfeld [149-151], where each region has four
potential parent regions and each parent region has sixteen possible child regions. A
single-pass split and merge method was proposed by Suk and Chung [191], using a
dictionary of the twelve possible splitting patterns for a 2x2 block of pixels. A single-
pass method is advantageous in terms of memory usage, which can prove high for split
and merge methods [22].
Semantic region extracting techniques
In all the methods discussed up to this point, only heuristics based on local properties
of regions were used, like intensity. Here, a number of techniques that employ
semantic information to facilitate merging will be described. Semantic
segmentation methods interpret the regions as they are formed, using a-priori
knowledge.
The feature space of colour images is inherently multi-dimensional, since
three (in some cases more) components are needed in order to describe each colour.
Nevertheless, colour segmentation approaches generally do not treat a pixel's colour
as a point in a colour space; instead, they decompose it into three separate values,
which they recombine later on. This is the natural result of trying to extend to colour
images methods originally proposed for grey-scale ones. This introduces a number of
problems, which will be explained later on.
In the first section of this chapter, a definition of colour is given and a brief
description of colour systems and their importance for colour image segmentation is
examined. The remainder of this chapter comprises a review of techniques used for
colour image segmentation, along with a critical appreciation of the methods
proposed. Finally, concluding remarks are given in the last section.
2.2.1. Colour
Colour is the perceptual result of light in the visible region of the spectrum (having
wavelengths between approximately 400nm and 700nm). More formally, the word
colour can be used to define two similar but distinct effects. Colour is used to define
the aspect of human perception concerned with the ability to make a distinction
between different spectral power distributions, in which case it is called "perceived
colour". The word colour is also used to define the characteristic of the visible radiant
energy itself, in which case it is called "psychophysical colour".
Wyszecki and Stiles [210] define psychophysical colour as follows: "Colour is
that characteristic of visible radiant energy by which an observer may distinguish
differences between two structure-free fields of view of the same size and shape, such
as may be caused by differences in the spectral composition of the radiant energy
concerned in the observation". For simplicity reasons, the word colour is used
throughout this thesis referring to psychophysical colour.
Figure 2-8 – CIE standard daylight illuminant D65 relative spectral power distribution. The power of
each wavelength is normalized so that the power at λ=560nm equals 100.
A colour stimulus is radiant energy of given intensity and spectral composition,
entering the eye and producing a sensation of colour. This radiant energy can be
completely described by its spectral power distribution. This is often expressed in 31
components, each representing power in a 10nm band from 400nm to 700nm. For
example, Figure 2-8 shows the spectral power distribution of the CIE (Commission
Internationale de l'Eclairage or International Commission on Illumination) standard
daylight illuminant D65. Using 31 components is a rather impractical and inefficient
way to describe a colour, especially when a number of colours must be described and
communicated, which is the case with computer graphics. A more efficient way
would be to determine a number of appropriate spectral weighting functions to
describe a colour, and it proves that just three components are adequate for that, based
on the trichromatic nature of vision. The CIE standardized in 1931 a set of spectral
weighting functions, called Colour Matching Functions, which model the perception
of colour. These curves are referred to as x̄, ȳ and z̄, and are illustrated in Figure
2-9. A more detailed discussion on human vision and colour systems is available in
Appendix A, while colour systems in the context of colour image segmentation are
cases the first two features only were adequate for a good segmentation. These three
features strongly correlate to the three components of the CIE L*a*b* colour system.
Schacter, Davis and Rosenfeld [176] also suggested that the use of a uniform
colour system such as the CIE L*a*b* could improve the clustering performance.
Furthermore, Zhang and Wandell [217] proposed a spatial extension of the
CIE L*a*b* colour system in order to simulate the spatial blurring that naturally
happens in the human visual system. The image is transformed into an opponent
colours space and each opponent-colours image is convolved with a kernel whose
shape is determined by the visual sensitivity to that colour area.
Choosing the appropriate colour system to use is not a trivial task. Each
application has different objectives and requirements, and one colour system cannot
always be the right choice. The latest developments are towards systems that are
hardware independent and computationally inexpensive. In order to preserve colour
appearance between different monitors, special hardware profiles are increasingly
used, and new colour systems are developed for use in the World Wide Web.
sRGB [190] is such a colour system.
2.2.2. Histogram Analysis Techniques
Although colour information is fully described by three or more components as
discussed before, the complexity of working with all colour components
simultaneously led to simpler approaches that work with one or two colour
components at a time. Some of the earliest approaches for segmenting colour images
were based on 1D histogram analysis of the colour components of the image. In most
of these methods, histograms for all available colour components are computed, but
segmentation happens by working on one of the histograms, and recursively splitting
the feature space into clusters, one colour component at a time. A number of
histogram analysis methods will be discussed in this section, working our way to
more complex clustering schemes in section 2.2.3.
Histogram Analysis Techniques
One of the first colour segmentation methods proposed is by Ohlander, Price and
Reddy [135]. The method is based on selecting the best peak from a number of
histograms of different colour features, and recursively splitting the image in two,
based on the peak selected. The nine colour features used were collected from three
colour systems.
where S p is the peak area, T a is the area of the whole histogram, and fwhm is the full
width at half-maximum of the peak. The pixels that fall in the range demarcated by
the peak define a sub-region. The sub-region is extracted and the thresholding process
is repeated for the pixels of the sub-region, leading to the detection of the most
significant cluster. The process finishes when the histograms become mono-modal.
Following a labelling step based on the above segmentation, the same procedure of
recursive thresholding is applied to the remainder of the pixels, identifying in this way
the rest of the important clusters.
Tominaga also proposed a modification of the aforementioned algorithm [196].
The first step here is essentially the same as described above, modified to overcome
the problem of handling overlapping clusters. The colour space used this time is the
CIE L*a*b*. A second step is supplemented to the algorithm, which classifies the
pixels again, based on a colour distance metric in CIE L*a*b*. A number of colours are
initially identified according to the regions the image was initially partitioned into. So
if K is the number of regions resulting from the first step, K colours (the colours of the
regions) are identified as representative of the image. Let k1, k2, k3, …, kn be the set of
representative colours. The first colour is used as the centre of a first cluster m1, so
that m1=k1. The second colour is then compared to m1 in terms of its distance in CIE
L*a*b*. If the distance is more than a threshold T, then a new cluster is created with
centre m2=k2; otherwise, k2 is assigned to the domain of cluster m1. In a similar
fashion, the colour difference of each representative colour to each established cluster
centre is computed and thresholded. A new cluster is created if all of these distances
exceed the threshold T; otherwise, the colour is assigned to the class to which it is
closest.
An approach that operates in the CIE L*a*b* colour space has been proposed by
Celenk [31]. Celenk defines a set of cylindrical coordinates in CIE L*a*b* which
resemble the Munsell colour system, and concur well with the accepted physiological
model of colour vision. The coordinates used are named L*, H° and C* and stand for
Lightness, Hue and Chroma. The method proposed is similar to Ohlander [135] and
Tominaga [195] in the sense that clusters are identified by recursive thresholding of
The multidimensional extension of the concept of thresholding is called clustering.
Clustering is the process of assigning units that share common characteristics in a
number of homogeneous partitions in the feature space called clusters. Clustering
algorithms in the literature can be broadly divided into hierarchical clustering and
partitional clustering algorithms.
Hierarchical clustering algorithms produce a tree structure of a sequence of
clusterings. According to their tree-structure, hierarchical clustering algorithms can be
categorized into nested and non-nested. In nested hierarchical clusterings, each cluster
fits itself in whole inside a larger cluster of a higher scale, whereas in non-nested
hierarchical algorithms a cluster obtained at a smaller scale can divide itself into
several parts and fit those parts in different clusters at a higher scale. Nested
tree-structures are usually easier to use; nevertheless, once a cluster is formed, its
members cannot be separated subsequently, making nested hierarchical clustering
algorithms less flexible than non-nested ones.
Partitional clustering algorithms produce a single partitioning of the data set, in
contrast to hierarchical ones. Most partitional clustering algorithms achieve the
clustering through the minimization of appropriate measures such as cost functions.
The high complexity and computational cost of those algorithms necessitates the use
of techniques such as simulated or deterministic annealing to lower the computational
overhead and ensure to a certain degree that the global minimum of the criterion used
has been reached. K-means clustering, ISODATA and c-means clustering are just a
few examples of partitional clustering algorithms.
Clustering is a generic process used across a variety of fields. In the context of
colour image segmentation, the feature space commonly used for clustering is the
colour space employed for the description of the image. Since colour spaces are
inherently three-dimensional, clustering is usually a computationally expensive
process; therefore a number of methods have been suggested that work in lower-
dimensional spaces. A brief description of two two-dimensional clustering approaches
will be given first, followed by a critical review of several multi-dimensional
hierarchical and partitional clustering techniques.
Essentially, the K-means algorithm aims to minimize an appropriate criterion. The
sum of squares criterion, given in Eq. 2-17, is most commonly used. More often than
not, a local minimum is reached instead of the global one, necessitating the use of
techniques such as simulated or deterministic annealing in order to achieve a better
solution.
D = Σ(i=1..K) Σ(xn∈Si) ||xn − mi||²   Eq. 2-17
K-means clustering has been used extensively for colour image segmentation. An
example is the method proposed by Weeks and Hague [203], who perform K-means
clustering in the HSI colour space. They also propose breaking the clustering process
in two, one part in the Intensity-Saturation two-dimensional space and one in the Hue
one-dimensional space, biasing in that way the segmentation process towards a colour's
Hue value, which they consider more important for human perception.
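A minimal sketch of the K-means iteration on colour vectors follows. This is illustrative code, not any of the cited implementations; the deterministic initialization (evenly spaced pixels) is a simplifying assumption made so the example is reproducible, whereas practical implementations restart from several random initializations:

```python
import numpy as np

def kmeans(pixels, k, iters=20):
    """Minimal K-means on colour vectors (one row per pixel).
    Returns (means, labels). Empty clusters keep their previous mean,
    one of the special cases the text notes must be handled."""
    # Deterministic initialization: k evenly spaced pixels.
    idx = np.linspace(0, len(pixels) - 1, k).astype(int)
    means = pixels[idx].astype(float)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest mean by squared Euclidean distance.
        d = ((pixels[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each mean moves to the centroid of its members.
        for i in range(k):
            members = pixels[labels == i]
            if len(members):
                means[i] = members.mean(axis=0)
    return means, labels
```

Each iteration alternates the assignment and update steps, monotonically decreasing the sum-of-squares criterion of Eq. 2-17 until a (possibly local) minimum is reached.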
The K-means clustering technique has a number of weaknesses. The most
prominent disadvantage is the fact that the results of the clustering depend on the
number of clusters and on the way the initial means are chosen. It frequently
happens that non-optimal partitions are found. The standard solution to this problem
is to try a number of different starting points, or to use techniques such as
simulated or deterministic annealing as mentioned before. Depending on the initial
selection of cluster means, it is possible that the set of feature vectors closest to one of
the means is empty, so that specific mean cannot be updated. Special cases like
these have to be properly handled by each implementation. K-means clustering is
often used as a first step in more complicated approaches. An example of such an
approach will be given next.
K-means Clustering and Probabilistic Assignment
Mirmehdi and Petrou [118] suggested a method to segment colour image textures
based on human perceptual characteristics. When an observer deals with multi-
coloured objects, their colour matching behaviour is affected by the spatial properties
of the stimuli. For example, when a number of pixels of different colours are viewed
from a distance, the observer cannot discriminate between the colours; instead they
will perceive an average of the colours present.
cluster during the first iteration, introducing four new clusters at each iteration. The
number of 2400 pixels and the number of clusters were decided upon after
experimentation with different values. At each iteration, the algorithm checks the
cluster means to determine whether two cluster means are very close. The condition
used by the authors is given in Eq. 2-21. If the difference computed for each feature is
less than the set threshold, the two clusters are considered identical and are merged
into one bigger cluster. The feature spaces used were the RGB colour space and the
I 1 I 2 I 3 colour space proposed by Ohta [136], while the inner product norm metrics
tested were the Euclidean and the Mahalanobis distances. The authors concluded that
the differences between the RGB and the I 1 I 2 I 3 colour space as far as segmentation is
concerned are minimal.
|f1 − f2| / ((f1 + f2)/2) < 0.075 for each feature f,   Eq. 2-21
An interesting colour segmentation method based on fuzzy c-means has been
proposed by Lim and Lee [103]. They use scale-space histogram thresholding to find
the number of clusters to use for the fuzzy c-means clustering. First, scale-space
filtering is applied to the histogram of each colour component and the optimal scale is
determined. Examining the smoothed (at the determined optimal scale) histograms,
they locate valleys by use of the first and the second derivative. Using this knowledge
of the location of valleys in each colour component histogram, they effectively divide
the colour space into a series of hexahedra. The first phase thus identifies clusters in
terms of the hexahedra located, but only the ones containing above a certain number
of pixels are considered good. A second step then follows that classifies the rest of the
pixels into one of the found clusters. Lim and Lee tried a number of different colour
spaces (RGB, XYZ, YIQ, UVW, I1I2I3) and observed that I1I2I3, proposed by Ohta [136],
gives the best results.
Graph Partitioning
Shi and Malik [180] formulated the segmentation problem as a graph partitioning
problem. Their method reduces the problem of graph partitioning to solving an
eigenvector and eigenvalue problem. Let G = (V, E) be a graph whose nodes are points
in the measurement space and whose edges are associated with a weight representing
the similarity between the nodes they connect.
The edge weights are given by Eq. 2-13, where X(i) is the spatial location of node i,
and F(i) is the feature vector. Shi and Malik tested their algorithm with different
features, depending on the application. For colour images, the features used are based
on the HSV colour system, while features are also proposed for texture and grey-scale
image segmentation. The algorithm suggested is based on solving the system of
equations Eq. 2-27, where D is a diagonal matrix with d (Eq. 2-28) on its diagonal,
and W a symmetrical matrix with W(i, j) = w(i, j). Shi and Malik showed that the
second smallest eigenvector is the real-valued solution to the normalized cut problem,
and is the one used in each iteration to split the graph, until the normalized cut
exceeds a pre-defined threshold.
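As an illustrative sketch of this step (using a toy similarity matrix, not the image features of the original method), the normalized-cut split amounts to taking the second-smallest generalized eigenvector of (D − W)y = λDy and thresholding it:

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical tiny graph of 6 nodes with similarity weights w(i, j).
# Two obvious groups, {0, 1, 2} and {3, 4, 5}, are weakly connected.
W = np.array([
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.0],
])

D = np.diag(W.sum(axis=1))       # diagonal degree matrix
# Generalized eigenproblem (D - W) y = lambda * D y
vals, vecs = eigh(D - W, D)
fiedler = vecs[:, 1]             # second-smallest eigenvector
partition = fiedler > 0          # threshold at zero to split the graph
print(partition)
```

Thresholding the second eigenvector at zero separates the two weakly connected groups; in the full method this split is applied recursively until the normalized cut value exceeds the threshold.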
Other Clustering Algorithms
A morphological approach for 3D clustering in feature space is proposed by Park,
Yun and Lee [143]. The first step of this approach is to smooth the 3D colour
histogram by performing 3D Gaussian convolution with two standard deviations (σ1
and σ2). Then the difference of the two resulting histograms is considered, and peaks
and valleys are identified. After this pre-processing it is observed that non-empty bins
are widely scattered in the colour space; therefore, a closing operation follows, after
which clusters are identified and labelled in the colour space. A dilation process
comes next, enlarging the clusters in a manner such that neighbouring bins not
contained in a cluster merge with one, while still preserving the clusters themselves
(not combining any two of them). Finally, a post-processing stage follows, where the
remaining unsegmented pixels are assigned to a cluster by checking the colour
distance (Euclidean distance in the colour space) between each unsegmented pixel
and its segmented neighbouring pixels. Each unsegmented pixel is assigned to one of
the clusters of its neighbouring segmented pixels, according to its colour distance to
it. The authors use RGB; nevertheless, the algorithm can be used with any colour
system. Partitional algorithms (such as K-means or fuzzy c-means) partition the
feature space based on a distance measure, whereas the proposed one is concerned
with only the shape, connectivity and distribution of clusters.
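The pre-processing described above can be sketched as follows; the synthetic histogram, the two scales and the structuring element are hypothetical choices for illustration:

```python
import numpy as np
from scipy import ndimage

# Build a small synthetic 3D colour histogram with two colour clusters.
rng = np.random.default_rng(0)
hist = np.zeros((16, 16, 16))
hist[3:6, 3:6, 3:6] = rng.random((3, 3, 3)) * 100
hist[10:13, 10:13, 10:13] = rng.random((3, 3, 3)) * 100

# Smooth at two standard deviations and take the difference (peaks survive).
g1 = ndimage.gaussian_filter(hist, sigma=1.0)
g2 = ndimage.gaussian_filter(hist, sigma=2.0)
dog = g1 - g2

occupied = dog > 0                       # bins where a peak dominates
# Closing pulls together the scattered non-empty bins before labelling.
closed = ndimage.binary_closing(occupied, structure=np.ones((3, 3, 3)))
labels, n_clusters = ndimage.label(closed)
print(n_clusters)
```

A subsequent dilation of `closed` (constrained so that distinct labels never merge) would reproduce the cluster-enlargement step, and the leftover pixels would then be assigned by nearest colour distance.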
Text Segmentation in Web Images Using Colour Perception and Topological Features
2.2.4. Edge Detection Techniques
As mentioned in section 2.1.2, edge detection based segmentation techniques are
based on the detection of discontinuities in the image. Edges are generally defined as
the points where significant discontinuities occur. Discontinuities are considerably
more complicated to define in colour images than in grey-scale ones. Numerous
methods exist to derive edge information for each pixel when only one component is
used, as in grey-scale images. The fact that more than one component (usually three)
is used to describe the colour of each pixel in colour images introduces the need for
an additional step in the process of edge detection, namely the recombination of the
image, which can happen at different stages in the pipeline of edge detection.
Effectively, a set of operations is performed on each component and the intermediate
results are then combined into a single output. According to the point at which
recombination occurs, colour edge detection methods can be categorized as Output
Fusion methods, Multidimensional Gradient methods, and Vector methods.
Output fusion methods work in each colour component independently, and the
results are then merged to produce the final edge map. In multidimensional gradient
techniques, a single estimate of the orientation and strength of an edge is computed.
Finally, in vector methods, no decomposition (and therefore no recombination) of the
image happens; instead, the vector nature of colour is preserved throughout the
process. In the remainder of this section, approaches that fall in each of the above
categories will be presented.
Output Fusion Methods
In output fusion methods, grey-scale edge detection is carried out independently in
each colour component, and then the results are combined to produce the final edge
map. One of the first output fusion methods was developed by Nevatia [131]. Nevatia
defined a colour system of one Luminance and two Chromaticity components. The
Luminance component is given as a weighted sum of the R, G and B components of
the image, while the chromaticity components are chosen to be representative of Hue
and Saturation. Hueckel’s edge detector is applied to each component separately, but
although the edges in the three components are allowed to be independent, a
constraint is applied: they must have the same orientation. The author states that
the edges in the Luminance component contain most of the information needed to
detect the edges in the image.
component, should be identified in the two remaining ones, namely the Saturation and
the Luminance. Even if RGB is finally chosen, the Euclidean distance is not the best
way to compute the contrast, for reasons explained already in section 2.2.1.
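A minimal output-fusion sketch (with hypothetical operator choices: per-component Sobel responses fused by taking the maximum, rather than Nevatia's Hueckel-based scheme):

```python
import numpy as np
from scipy import ndimage

def output_fusion_edges(rgb):
    """Output-fusion colour edge detection: run a grey-scale operator
    (here Sobel) on each component independently, then merge the
    per-component edge maps into a single edge map."""
    maps = []
    for c in range(rgb.shape[2]):
        gx = ndimage.sobel(rgb[:, :, c], axis=0)
        gy = ndimage.sobel(rgb[:, :, c], axis=1)
        maps.append(np.hypot(gx, gy))
    return np.maximum.reduce(maps)   # fuse: keep the strongest response

# Toy image: left half red, right half blue.
img = np.zeros((8, 8, 3))
img[:, :4, 0] = 1.0
img[:, 4:, 2] = 1.0
edges = output_fusion_edges(img)
```

Any per-component operator and any fusion rule (sum, maximum, weighted combination) fits this family; the defining property is that recombination happens only at the edge-map level.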
Vector Methods
An interesting edge detection method that does not perform any decomposition of the
image into colour components has been suggested by Huntsberger and Descalzi [80].
The method is based on identifying clusters in the colour space, and assigning to each
pixel membership values to the clusters identified. These values are subsequently used
for edge detection. Specifically, the clustering part of the method is a fuzzy c-means
clustering algorithm proposed by Huntsberger et al. [81] that has been reviewed in
section 2.2.3. Very briefly, the algorithm starts with four clusters and introduces four
new clusters at each iteration, until all pixels are assigned a membership value above a
specified threshold. After this first step, each pixel has been assigned a certain
membership value to each cluster. Edge pixels are then defined as the ones that
equally belong to two clusters; in other words, a colour edge is defined as the zero-
crossing of the operator $Edge_k(\mu_i, \mu_j) = \mu_i - \mu_j$, where $\mu_i$ and $\mu_j$ are the membership
values of pixel k to the clusters i and j. A strong advantage of this method is that it is
independent of orientation, and of any type of crisp or fuzzy threshold for the edge
pixels; on the other hand, it is strongly dependent on the initial clustering and
association of membership values to pixels. The authors examined different colour
systems ( RGB, Ohta’s colour system, CIE XYZ and RGB with a Riemannian norm
metric) and reported better results when using RGB with a Riemannian metric. They
reason that this happens because such a metric induces ellipsoidal shaped clusters,
which in turn match the shape of clusters derived from human colour matching
experiments. Although this stands true (MacAdam's ellipses [210]), if a colour space
that presents some degree of perceptual uniformity is used, no complicated distance
metric is actually needed.
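The zero-crossing operator can be sketched with two fixed colour clusters standing in for the full fuzzy c-means stage (the cluster centres and the scanline are hypothetical):

```python
import numpy as np

# Two fixed colour clusters stand in for the fuzzy c-means result.
centres = np.array([[1.0, 0.0, 0.0],   # "red" cluster
                    [0.0, 0.0, 1.0]])  # "blue" cluster

def memberships(pixel, m=2.0):
    """Standard fuzzy c-means membership of a pixel to each centre."""
    d = np.linalg.norm(centres - pixel, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# A 1D scanline fading from red to blue.
scan = np.linspace([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 9)
mu = np.array([memberships(p) for p in scan])
edge = mu[:, 0] - mu[:, 1]             # Edge(mu_i, mu_j) = mu_i - mu_j
# The colour edge is the zero-crossing of this operator.
crossing = np.where(np.signbit(edge[:-1]) != np.signbit(edge[1:]))[0]
print(crossing)
```

The edge location falls where the two membership values are equal, which is what makes the operator independent of orientation and of any explicit edge threshold.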
A method that treats colours as vectors has been proposed by Yang and Tsai
[213]. Their method is based on dimensionality reduction and moment-preserving
thresholding of blocks of pixels in the image, followed by edge detection in each
block. The ith moment of a grey-level image is given by Eq. 2-33, where ƒ(x, y) is the
grey value of pixel (x, y) and N is the number of pixels in the image.
Moment-preserving bi-level thresholding is then the process of finding two thresholds
by Yang and Tsai. Specifically, no clear explanation is given as to why the final vectors
were (R1 , G 1 , B1 ) and (R2 , G 2 , B2 ) and not some other combination, for example
(R1, G2, B1) and (R2, G1, B2). Another issue in this method is the fact that only
straight-line shaped edges are sought. Unless the blocks used are small, line
elements cannot be representative of edges; on the other hand, if small blocks are
used, the number of pixels may not be adequate to perform statistical measurements
such as computing meaningful moments. Finally, the way edge detection is
performed prohibits more than one straight edge element in each window, as for
example near intersections of edges.
Edge Detection using Colour Distributions
A different approach to colour edge detection is described by Ruzon and Tomasi
[168, 170]. They propose the use of the so-called compass operator, which is based on
measuring the difference between the colour distributions of the pixels lying on
opposite halves of a circular window taken at different orientations. A circular
window is centred on each pixel and, for each orientation, it is partitioned into two
hemi-circles. The distribution of pixel colours on each side is represented as a colour
signature, that is, a set of colour masses in a colour space. The size of each point mass
is determined by a weighting function, which is defined as a function that approaches
zero as we move away from the centre of the window. Vector quantization is performed
before calculating the colour signatures, in order to reduce the number of colours, and
thus the computational cost of the algorithm. The distance between the distributions of
the two halves is then computed for each orientation, using the Earth Mover's Distance
(EMD) [167]. Given dij, a distance measure between colours i and j, where colour i is
in one hemi-circle (S1) and colour j in the other (S2), EMD finds a set of flows that
minimizes Eq. 2-38. The distance dij between two colours ranges in [0, 1] and is given
by Eq. 2-37, where Eij is the Euclidean distance of the colours computed in
CIE L*a*b* and γ is a constant that determines the steepness of the function. The
maximum and minimum values of EMD and the associated orientations are identified
for each pixel. The maximum value gives the strength of the edge. The minimum
value is equally important: if the minimum EMD is high, the edge model is violated,
thus the authors call this minimum value "abnormality". Furthermore, it can be
considered a measure of the photometric
The mean colour of a region can be calculated by Eq. 2-39, and the trace of the
covariance matrix is given by Eq. 2-40.
$M = (m_1, m_2, m_3) = \left( \frac{\sum_i c_{1i}}{N},\ \frac{\sum_i c_{2i}}{N},\ \frac{\sum_i c_{3i}}{N} \right)$   Eq. 2-39

$T = \frac{\sum_i \left[ (c_{1i} - m_1)^2 + (c_{2i} - m_2)^2 + (c_{3i} - m_3)^2 \right]}{N}$   Eq. 2-40
where $m_1, m_2, m_3$ are the means of each colour component for the given region,
$(c_{1i}, c_{2i}, c_{3i})$ is the colour of pixel i, and N is the number of pixels in the region. If this
value is above a specified threshold, the corresponding region is recursively split,
whereas if the value is below this threshold the region is added to a list of regions to
be subsequently merged. The merging process can also be ruled by a condition based
on the trace of the covariance matrix. The value is computed for the merged region
and if below a threshold, the two regions can be merged. Alternatively, a comparison
between the average colours of the regions could be used to rule the merging process.
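A minimal sketch of this homogeneity test (Eq. 2-39, Eq. 2-40) driving the split decision; the threshold value is a hypothetical choice:

```python
import numpy as np

def colour_variance_trace(region):
    """Trace of the colour covariance matrix of a region (Eq. 2-40):
    the mean squared distance of each pixel's colour from the
    region's mean colour (Eq. 2-39)."""
    means = region.mean(axis=0)                 # (m1, m2, m3)
    return ((region - means) ** 2).sum() / len(region)

# Region given as an (N, 3) array of pixel colours.
uniform = np.tile([120.0, 60.0, 30.0], (50, 1))
noisy = uniform + np.random.default_rng(1).normal(0, 25, uniform.shape)

T = 40.0                                        # hypothetical split threshold
print(colour_variance_trace(uniform) > T)       # homogeneous: keep / merge
print(colour_variance_trace(noisy) > T)         # heterogeneous: split again
```

The same quantity can be evaluated on a tentative merged region, which is exactly the merge condition described above.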
Simple Region Growing Techniques
The second algorithm investigated by Gauch and Hsia [58] is a typical seed based
region growing technique. They commented on a number of techniques to perform
seed based region growing based on colour distance between the regions. The
importance of re-computing the average colour of each region when a new pixel is
added to it was stressed, since otherwise the outcome would be very much dependent
on the initial selection of the seed pixels. Searching all the adjacent pixels of a given
region at each iteration of the growing process to identify the one with the smallest
colour distance to the mean colour of the region would be preferable, but
computationally expensive. Instead, the authors search for any neighbour that is
within a specified colour distance to the mean colour of the region. As expected, the
speed improvement is remarkable, while the authors report that no noticeable
difference in the segmentation results is observed. Finally, the authors suggested that
in order to obtain results independent of the initial selection of seed points, one should
make sure that the seeds selected are not edge pixels. An edge detection algorithm can
be used to identify which pixels to avoid. The authors concluded that the best colour
space to use depends strongly on the type of the image in question, nevertheless they
generally favour RGB and YIQ over L*a*b* and HLS.
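The seed-based growing with a running mean, as described above, can be sketched as follows (4-connectivity and the threshold value are hypothetical choices):

```python
import numpy as np
from collections import deque

def grow_region(image, seed, threshold):
    """Seed-based region growing: accept any neighbour whose colour
    distance to the region's running mean colour is below `threshold`,
    re-computing the mean after every accepted pixel."""
    h, w, _ = image.shape
    member = np.zeros((h, w), dtype=bool)
    member[seed] = True
    mean = image[seed].astype(float)
    count = 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not member[ny, nx]:
                if np.linalg.norm(image[ny, nx] - mean) < threshold:
                    member[ny, nx] = True
                    # Update the running mean so the result does not depend
                    # solely on the colour of the initial seed pixel.
                    mean = (mean * count + image[ny, nx]) / (count + 1)
                    count += 1
                    frontier.append((ny, nx))
    return member

img = np.zeros((6, 6, 3), dtype=float)
img[:, 3:] = 200.0                       # right half is a different colour
region = grow_region(img, (0, 0), threshold=50.0)
print(region.sum())                      # grows over the left half only
```

Accepting any neighbour within the threshold (rather than always the closest one) is the speed-up the authors describe; the running mean keeps the result largely seed-independent.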
A rather simple colour segmentation algorithm was proposed by Tremeau and
Borel [197]. Their algorithm is based on region growing and region merging, which
they perform simultaneously. Starting with a pixel as the seed, they grow a region
checking the colour similarity between the pixel and its neighbours. After each region
is identified, it is checked against the neighbouring regions, and if the difference of their
average colours is under a specified threshold, the two regions are merged. The colour
distance used in both cases is the Euclidean distance in the RGB colour system.
Finally, the authors also propose methods to estimate the thresholds used both for
pixel and for region similarity. The method proposed suffers from a number of
drawbacks. First, the choice of RGB and the Euclidean distance to calculate colour
differences is rather a disadvantage, since RGB is neither related to human perception
of colour, nor is it perceptually uniform so as to justify the choice of Euclidean
distance. The authors do mention that the method is generic, and propose the use of
CIE L*a*b* or CIE L*u*v* and the Mahalanobis distance instead, but take no step in that
direction. Furthermore, the algorithm proposed is very dependent on the sequence of
pixel and region comparisons. Although this is true for all region growing methods,
the fact that the comparison of each region to the neighbouring regions takes place
immediately after the region's creation (prior to having identified all the possible
neighbouring regions, i.e. only against the regions identified up to that
moment) introduces an even larger dependency on the sequence of comparisons. On
the other hand, this approach limits the number of tests and results in a
computationally inexpensive process.
A colour segmentation method for video-conferencing type images was proposed
by Ikonomakis, Plataniotis and Venetsanopoulos [82]. The colour system they use is
HSI, as a perceptually oriented one. The innovative point of this method is the
different treatment of chromatic and achromatic pixels. An achromatic pixel is
defined as having Intensity at the edges of the Intensity scale, or low Saturation. For
these pixels, Hue would not be indicative of their colour, since it tends to be unstable
(even undefined for extreme Intensity values) at these ranges of Saturation and
Intensity. For achromatic pixels, the homogeneity criterion is defined in terms of their
Intensity difference, whereas for chromatic pixels, four different distance metrics were
tested. These are the generalized Minkowski metric (Eq. 2-41), the Canberra metric
(Eq. 2-42), and a metric defined by Tseng and Chang [198] called the cylindrical distance
metric (Eq. 2-43). The cylindrical distance metric computes the distance between the
projections of the pixel points on a chromatic plane.
$d_M(i, j) = \left( \alpha\,|H_i - H_j|^{\delta} + \beta\,|S_i - S_j|^{\delta} + \gamma\,|I_i - I_j|^{\delta} \right)^{1/\delta}$   Eq. 2-41

$d_{can}(i, j) = \frac{|H_i - H_j|}{H_i + H_j} + \frac{|S_i - S_j|}{S_i + S_j} + \frac{|I_i - I_j|}{I_i + I_j}$   Eq. 2-42

$d_{cyl}(i, j) = \left( d_I^2 + d_C^2 \right)^{1/2}$   Eq. 2-43

where $d_I = |I_i - I_j|$ and $d_C = \left( S_i^2 + S_j^2 - 2 S_i S_j \cos\theta \right)^{1/2}$   Eq. 2-44
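Assuming Hue is an angle in radians and Saturation and Intensity are normalized, the cylindrical distance metric (Eq. 2-43, Eq. 2-44) could be implemented as:

```python
import math

def cylindrical_distance(hsi_a, hsi_b):
    """Cylindrical distance between two HSI colours (Eq. 2-43 / 2-44).
    Hue is assumed to be in radians; Saturation and Intensity in [0, 1]."""
    h1, s1, i1 = hsi_a
    h2, s2, i2 = hsi_b
    d_int = abs(i1 - i2)                                   # d_I
    # Smallest angle between the two hues.
    theta = min(abs(h1 - h2), 2 * math.pi - abs(h1 - h2))
    # Distance between the projections on the chromatic plane, d_C.
    d_chr = math.sqrt(s1 * s1 + s2 * s2 - 2 * s1 * s2 * math.cos(theta))
    return math.sqrt(d_int * d_int + d_chr * d_chr)

# Two colours with opposite hues, equal saturation and intensity.
print(cylindrical_distance((0.0, 0.5, 0.5), (math.pi, 0.5, 0.5)))
```

Treating (S, H) as polar coordinates, d_C is simply the law-of-cosines chord between the two chromatic projections, which is what makes the metric insensitive to the Hue wrap-around.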
The method proposed is based on a classical region growing scheme, working in a
left-to-right and top-to-bottom fashion, growing regions from seed pixels by making
appropriate comparisons (chromatic / achromatic) to their neighbouring pixels.
According to the authors, the Cylindrical Distance metric produces better results.
They also report that the Minkowski metric gives better results if they put more
emphasis on the Saturation component, which is rather unexpected and directly
contradicts one of their previous statements, that the Hue component has a greater
discrimination power. The differentiation between chromatic and achromatic pixels
can be a good enhancement. It is exploited in a different manner in the method
described later on in Chapter 4.
Region Growing Techniques Based on Combined Spatial and Colour Measures
A comprehensive approach for region growing in colour images is given by
Moghaddamzadeh and Bourbakis [119]. The authors define a set of procedures, which
they combine in two different ways, defining a method designed for coarse
segmentation and one for fine segmentation. A number of pre-processing steps are
used by the authors, in order to identify edges in the colour image. First, the image is
smoothed using an algorithm that preserves certain pixels located at edges (the same
as in the edge detection approach [120] by the same authors, reviewed in the previous
section), and then the edges are identified in the image, by the use of an algorithm explained in
a bit more detail in section 2.2.4. Two conditional criteria are defined that are used
extensively throughout the region growing stage of the algorithm: the homogeneity
criterion and the degree of farness measure. The homogeneity criterion checks the
absolute colour distance between a given pixel and a segment, and the local colour
distance between the pixel and its neighbouring pixels in the direction of expansion,
in order to decide whether the given pixel can be merged with the segment or not. By
checking the colour distance locally, this criterion keeps the segment growing if the
colour is gradually changing (due to illumination or shading). The second criterion,
the degree of farness measure, combines the spatial distance of a pixel to a segment,
with the colour distance between the pixel colour and the segment average colour.
The two distances are multiplied to produce the degree of farness measure.
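A minimal sketch of the degree of farness measure; the particular distance functions (Euclidean for both terms) are assumptions for illustration:

```python
import math

def degree_of_farness(pixel_xy, pixel_rgb, seg_centroid_xy, seg_mean_rgb):
    """Hypothetical sketch of the 'degree of farness': the spatial
    distance of a pixel to a segment multiplied by the colour distance
    between the pixel and the segment's average colour."""
    spatial = math.dist(pixel_xy, seg_centroid_xy)
    colour = math.dist(pixel_rgb, seg_mean_rgb)
    return spatial * colour

print(degree_of_farness((10, 10), (200, 40, 40), (12, 14), (190, 50, 38)))
```

Because the two distances are multiplied, a pixel is "near" a segment only when it is close both spatially and in colour; either distance being zero makes the measure zero.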
Segments are identified by scanning the image and growing any possible seed
based on either the edge information (produced at pre-processing) or the homogeneity
measure. Only segments having a size over a specified threshold are considered good.
A segment expansion procedure is also defined, which takes into account either of the
two criteria defined before. Finally, a procedure is defined that checks for unassigned
pixels and decides whether they should be assigned in a neighbouring segment, or – if
adequately different – it grows a segment around the pixel in question. The method
defined for coarse segmentation is essentially based on repeatedly finding segments
and expanding segments, calling the procedures first for large segment sizes and
subsequently for smaller ones. This coarse segmentation method finishes by checking
for unassigned pixels as mentioned before. The fine segmentation method differs in
the way segments are identified in the first place. Instead of performing region
growing, a histogram table is constructed, which contains a sorted list of all the
colours and the number of occurrences of each one. Segments of the colours
presenting higher occurrences are identified and expanded first, followed by colours
presenting lower occurrences.
In recent work [215] the authors applied the dichromatic reflection model in
addition to the two criteria described above to merge highlight and shadow areas with
matte areas in the image. Although a comprehensive approach, the two methods
suggested by Moghaddamzadeh and Bourbakis suffer from certain drawbacks mainly
related to the two criteria defined. The homogeneity criterion works well, by allowing
for gradients to be included in the segments, while at the same time checking the
Segmentation, as defined in the previous chapter, is the process of partitioning
an image into a number of homogeneous disjoint regions. This definition
has to be slightly altered when referring to text image segmentation. The product of
text image segmentation is a higher-level description of the image at hand, in terms of
regions of interest. Regions of interest are typically areas in the image where text lies.
Depending on the application, other structural elements of the text image, such as
tables, figures, separators etc. might also be regions of interest. After having identified
a number of regions, the type of contents of each region must be established; this
process is called classification. Classification of the regions can take place either
independently after segmentation, or in parallel, interacting with the segmentation
process.
The underlying assumption here is that text exists in the image being analysed.
Consequently, specialised segmentation and classification methods make extensive
use of features emanating from the existence of text in the images. Images found in
the World Wide Web more often than not, contain text; therefore, they constitute a
special type of text images. A review of the main text image segmentation and
classification techniques will be given in this chapter. A number of bi-level page
segmentation methods will be detailed in the next section, followed by methods
specialised in colour documents. In Section 3.3, methods for text extraction from
video sequences will be discussed and in Section 3.4 the more generic problem of finding text in real-life scenes will be addressed. Finally, Section 3.5 will detail the
few existing approaches for extracting text from Web Images. A discussion on the
methods presented follows in Section 3.6.
3.1. Bi-level Page Segmentation Methods
In the context of Text Image segmentation, Page segmentation has received much
attention mainly due to its many applications. Page segmentation specialises in
analysing document images. Typically, document images are bi-level (most often dark
ink over light background). In bi-level images of document pages, the two classes,
namely the foreground and the background, are already separated: pixels of one
colour denote background and pixels of the other denote foreground. Regions of
interest in this case would be neighbourhoods of foreground pixels.
Since the foreground and background are already separated, the way regions are
described in the page can be slightly relaxed. What this means, is that strict definitions
of the regions, such as connected components of foreground pixels, can be used but
this level of detail is not always necessary (and many times, not easy to use). Less
strict descriptions of the regions of interest, for example by use of bounding boxes
or contours of regions, can also be used. Such descriptions though, allow part of the
background to be contained in the final regions. This is not a problem for bi-level
images, since the foreground and background are readily separated by means of their
colour. In contrast, including parts of both the foreground and background in the
region description is unacceptable for colour documents, since a separation of the two
classes cannot be easily achieved afterwards.
In general, all methods devised for bi-level images assume either implicitly or
explicitly that the two classes (foreground and background) are already separated.
Due to this fact, most bi-level page segmentation techniques (e.g. techniques based on
Projection Profiles or Analysis of the Background) are not directly applicable to
colour text image segmentation. Nevertheless, there are a number of techniques
(e.g. Connected Component Grouping, Segmentation by Image Transformations) that
in principle can be applied to colour images.
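As an illustration of connected component grouping (one of the techniques that, in principle, carries over to colour images once pixels have been classed), a minimal 4-connected labelling sketch:

```python
from collections import deque

def label_components(mask):
    """Label 4-connected components of foreground pixels via BFS."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                current += 1                       # start a new component
                labels[y][x] = current
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

# Two separate 'characters' in a tiny bitmap.
bitmap = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
labels, n = label_components(bitmap)
print(n)  # 2
```

Grouping then proceeds on the resulting components (by proximity, alignment, size and so on) rather than on individual pixels, which is why the approach does not depend on the image being bi-level.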
Morphological Operations
A number of techniques based on morphological operations are commonly used either
as a pre-processing step (e.g. noise reduction) or as part of the segmentation process
itself [71]. The morphological operations typically used are dilation and erosion, and
Chapter 3 – Text Image Segmentation and Classification
3.2. Colour Page Segmentation Methods
Segmenting colour documents is understandably a much more complicated task than
segmenting bi-level or even greyscale ones. Many researchers avoid going into the
field of colour document analysis by converting the colour documents into greyscale
ones, or by examining the lightness component only. They argue that text (especially
when rendered in a single colour) would still stand out in the greyscale representation,
as a single shade of grey, adequately different from that of the background.
For example, Goto and Aso [62] propose a method for analysing colour images of
complex background, which is based on a greyscale representation of the original
image. They assume that characters in a single text string are printed in a solid,
uniform colour. The method starts by applying multi-level thresholding to the
greyscale image to create a set of sub-images for each range of grey values as
indicated from the histogram of the image. An initial pixel labelling process takes
place in the sub-images, and then region growing between neighbouring (in terms of
grey ranges) sub-images follows.
According to the authors, the proposed method performs much better than
methods created purposely for colour documents, specifically the methods of
Ohya et al. [137], and Sobottka et al. [183, 184]. The method of comparison though, is
by feeding the greyscale version of the image into those methods as well, which might
be unfair at least for the method of Sobottka et al. since it is created to work with full
colour images. The method of Sobottka et al. (discussed in this section) specialises in
book and journal covers, while the method of Ohya et al. (detailed in Section 3.4) is
created for recognizing text in scene images.
Colour Reduction for Colour Document Analysis
Since full colour information can be difficult to manipulate and computationally
expensive to use, many researchers suggest that some type of colour reduction be
performed before processing. The necessity of colour reduction for scanned colour
documents in particular is further dictated by the nature of the scanning process. Due
to the characteristics of the optical scanner, scanned documents contain more colours
than the original printed document. In the specific case of colour documents, it is of
great importance that no information is dropped which concerns the textual parts of
the document, since that would hinder the segmentation and subsequently the
A segmentation method based exactly on colour clustering is proposed by
Worring and Todoran [208]. The document model they use assumes that each
document can be decomposed in a number of (possibly overlapping) frames of
arbitrary shape, the content of which might be text or pictures. They restrict this
model to documents where the background colour of each frame is uniform. The
authors suggest that transitions from one colour to another (for example at the edge
between two frames), would be rather smooth due to the printing and scanning
process, unlike the step edges appearing in the original. In the colour space, these
smooth transitions would appear as lines connecting the two (clusters of) colours. The
method aims at identifying those lines in the colour space. Initially, the N most
dominant colours are selected from the RGB histogram; therefore, N clusters are
identified. This number of clusters is subsequently reduced by combining clusters that
lie in close proximity in the colour space. This process takes place in an already reduced
colour space. Lines are then identified in the colour space by means of edge detection
(in the colour space). Having identified a set of lines, the colour of each pixel is
checked, and the lines that present a distance to that colour less than a predefined
threshold are identified. If one line is identified, the pixel is assigned to this line;
otherwise, spatial characteristics are used to facilitate the decision process. Although
the suggested method can produce satisfactory results in simple colour documents, the
number of assumptions made does not allow for a general use.
Perroud et al. [147] examine two histogram based clustering approaches, one in
the RGB space and a second in the RGBY space, where Y is a spatial component,
which represents a quantity from the image plane. The histogram based approach, in
its simplest form (1D), is based on analysing a quantised version of the histogram. For
each cell in the histogram, the two neighbouring cells are checked and a pointer to the
larger one is created. If both neighbours are equal and larger than the cell in question,
a pointer to the left one is created by default, whereas if none of the neighbours is
larger, no pointer is created. At the end of this process, the histogram contains chains
of cells pointing to a local maximum. Clusters can then be defined with the help of
those chains. When this method is extended to three dimensions, as in the case of
RGB clustering, each cell in the quantised 3D histogram has 26 neighbours. Chains
could then be identified in a similar way, by checking all 26 neighbours each time.
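The 1D form of this chain-building scheme can be sketched as follows (the histogram values are hypothetical):

```python
def histogram_clusters(hist):
    """Assign each histogram cell to the local maximum reached by
    repeatedly following the pointer to the larger neighbour
    (1D form of the scheme; ties between equal larger neighbours
    point left by default, local maxima get no pointer)."""
    n = len(hist)

    def parent(i):
        left = hist[i - 1] if i > 0 else -1
        right = hist[i + 1] if i < n - 1 else -1
        best = max(left, right)
        if best <= hist[i]:
            return i                   # local maximum: no pointer
        return i - 1 if left == best else i + 1

    labels = []
    for i in range(n):
        j = i
        while parent(j) != j:          # follow the chain to its maximum
            j = parent(j)
        labels.append(j)
    return labels

# Two peaks (cells 2 and 7) give two clusters.
print(histogram_clusters([1, 3, 9, 4, 2, 5, 8, 12, 6, 1]))
```

In the 3D (or 4D) case the only change is that `parent` inspects all 26 (or more) neighbouring cells instead of two; the chain-following step is identical.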
The suggestion of the authors was to consider spatial information as well as colour during the histogram based clustering. Therefore, a fourth dimension, namely Y is
text colour. The top-down segmentation accurately segments small sized text, while
over-segmentation of the characters is not possible. The bottom-up process acts in a
complementary manner to the top-down process. A region growing is performed here:
beginning with a start region of at least three horizontally or vertically aligned pixels
of the same cluster, pixels within a 3x3 neighbourhood are iteratively merged if they
also belong to the same cluster. This bottom-up approach presents difficulties
segmenting very small characters, while it can split graphic regions into sub-regions.
Regions are grouped into lines of text, based on basic text line hypotheses (distance
between components, co-linearity of characters, distance between lines). This region
grouping is performed independently for the results of each process, and the
identification of text lines makes use of the fact that both types of analysis predicted
the same output. This method assumes that both the background and the text (at least
at the character level) are of uniform colour. Although the majority of documents
conform to those specifications, a number of complex images where text is of gradient
colour, or where text is rendered on a photographic background, would possibly pose
certain difficulties.
Another method, which is not specifically designed for colour documents, but for
a broad range of colour images containing text, comes from Jain and Yu [86, 87].
They propose a method based on decomposing the given image in a number of
foreground images and one background one, by using colour information. They use
slightly different approaches for 8-bit palettised images and for 24-bit true colour
ones. For 8-bit images the authors argue that characters occupy a sufficiently large
number of pixels, so a foreground image is created for each palette entry with a
number of corresponding pixels larger than a predefined threshold ( 400 pixels).
Therefore, each foreground image contains pixels of the same colour. Furthermore,
the number of foreground images accepted is limited to eight. The colour with the
largest number of pixels is considered the background colour. The colour with the
second largest number of pixels is also considered a background colour, if that
number is above a predefined threshold (10,000 pixels). The authors further assume
that if the text colour is not uniform, the surrounding background colour will be.
Based on that, they produce one more foreground image, which consists of all the
pixels that do not belong to the background. The process for true colour images is
similar, but some pre-processing takes place first. Bit-dropping is performed on each
of the RGB components, and only the two highest bits of each one are kept (this
Text Segmentation in Web Images Using Colour Perception and Topological Features
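The palette-based decomposition described above, together with the bit-dropping pre-processing for 24-bit images, can be sketched as follows. The function names and the array-based representation are my own; the thresholds (400 pixels per foreground colour, at most eight foreground images) are the ones quoted in the text.

```python
from collections import Counter

import numpy as np

FG_MIN_PIXELS = 400    # per-colour threshold quoted above
MAX_FOREGROUNDS = 8    # at most eight foreground images are kept

def decompose_palettised(indexed: np.ndarray):
    """Split an 8-bit palettised image (an H x W array of palette indices)
    into per-colour foreground masks and a background colour."""
    counts = Counter(int(v) for v in indexed.ravel())
    background, _ = counts.most_common(1)[0]   # most frequent colour = background
    foregrounds = []
    for colour, n in counts.most_common():
        if colour == background or n < FG_MIN_PIXELS:
            continue
        foregrounds.append(indexed == colour)  # one binary image per palette entry
        if len(foregrounds) == MAX_FOREGROUNDS:
            break
    return foregrounds, background

def drop_bits(rgb: np.ndarray) -> np.ndarray:
    """Pre-processing for 24-bit images: keep only the two highest bits
    of each RGB component."""
    return rgb & 0b11000000
```

The extra foreground image of all non-background pixels, used when text colour is not uniform, would be `indexed != background`.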
scrolls independently from the background scene. Therefore, the analysis of the
differences between consecutive frames or motion analysis can facilitate this process
enormously. Not limited to that type of textual information, methods have been proposed which aim at extracting text from video scenes, that is, from the actual
content of the video sequence. These techniques are conceptually identical to
extracting text from real-life scenes, which will be analysed in Section 3.4.
Extraction of Superimposed Text
Lienhart and Stuber [102] propose a method to extract text from video sequences.
Their method is limited to text superimposed on video frames (captions, end credits
etc.). They work with greyscale instances of the video frames, and base their method on several characteristics that superimposed text possesses, such as characters being rigid and of uniform colour, text being static or linearly moving in one direction
(when scrolling), size restrictions etc. The method starts with a colour clustering
process, based on the Split and Merge algorithm proposed by Horowitz and
Pavlidis [75]. The split process begins with the whole image as a segment, and splits it into quarters, checking each time the colour homogeneity of the sub-segments produced.
If a sub-segment is not homogeneous, splitting continues, otherwise splitting stops.
The merging process checks the final segments, and if neighbouring segments have
similar average grey value they are merged. This produces a number of final regions,
which initially are text candidates. A number of tests are then performed, and most of
the regions are discarded. The first test discards regions of very large or very small
size. Then motion analysis takes place, and the remaining regions are matched with
regions of consecutive frames. They argue that text should remain unchanged in
shape, rotation and colour, and it should be displaced only for a small distance (if at
all). Based on that, regions without an equivalent in the subsequent frame are
discarded, as are those whose equivalent has a significantly different average grey
value. The next test has to do with contrast analysis. Since superimposed text is
placed in an image specifically to be read by the viewers, it should be sufficiently
different from the background (usually outlined text is used for that reason), which
would produce strong edges. Therefore, strong edges are identified in the image and
dilated. The regions that do not intersect with any dilated edge are discarded. Finally,
an extra filtering is performed based on the fill factors and the aspect ratio of the
remaining regions. This method is completely based on special characteristics that superimposed text possesses.
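The recursive splitting step borrowed from Horowitz and Pavlidis can be sketched as below. The homogeneity criterion (grey-level range) and both threshold values are illustrative assumptions, not values from the papers cited.

```python
import numpy as np

def split_regions(img, x0, y0, x1, y1, max_range=32, min_size=4, out=None):
    """Recursive splitting in the spirit of Horowitz and Pavlidis: a region
    is split into quarters until it is homogeneous.  Homogeneity here means
    the grey-level range stays below `max_range` (an assumed criterion)."""
    if out is None:
        out = []
    block = img[y0:y1, x0:x1]
    if (int(block.max()) - int(block.min()) <= max_range
            or min(x1 - x0, y1 - y0) <= min_size):
        out.append((x0, y0, x1, y1, float(block.mean())))  # homogeneous leaf
        return out
    mx, my = (x0 + x1) // 2, (y0 + y1) // 2
    for qx0, qy0, qx1, qy1 in ((x0, y0, mx, my), (mx, y0, x1, my),
                               (x0, my, mx, y1), (mx, my, x1, y1)):
        split_regions(img, qx0, qy0, qx1, qy1, max_range, min_size, out)
    return out
```

The subsequent merge pass would then join neighbouring leaves whose mean grey values (the last tuple element) are similar.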
Methods Limited to Horizontal Text
Ohya et al. [137] present a method for the extraction and recognition of characters in
real scene images. They work with grey-level images, and start by binarising the
image using local thresholding. For local thresholding, the image is split into
sub-blocks, and a threshold is determined for each one. Then the thresholds specified
are interpolated to the whole image. For sub-blocks of a uniform grey value,
performing thresholding yields a number of unwanted noise components; therefore, a
bimodality check is usually employed to determine whether a sub-block should be
thresholded or not. The authors, though, argue that such a bimodality check would result in losing real character segments, so they threshold every sub-block of the
image instead. The characters in a real scene image are not necessarily only black or
only white, so the method does not favour one situation in particular. Instead, the
method identifies all components in the image, and initially selects character-like ones
by assessing the grey differences between adjacent components.
Character-like components are expected to have adequate difference from their
adjacent ones. Components in close proximity that have the same bi-level value (after
binarisation) or similar average grey values are identified and marked as possible
parts of characters. This process addresses the problem of multi-segment characters
(like “i”, “j” or some Chinese characters) and aims at creating character pattern
candidates. Character pattern candidates are then classified. This classification process
involves checking the similarity between the character pattern candidates and a set of
character categories stored in a database. High similarity patterns are then
post-processed with a relaxational operation, in order to remove ambiguities. The
algorithm is reported to work well with a number of different types of images: road
signs, licence plates, signs of shops etc, which include a variety of characters. Due to
the nature of the classification process, the characters are required to be printed
horizontally.
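The block-wise thresholding described above can be sketched as follows. The mid-range threshold and the nearest-neighbour upsampling are simplifying assumptions standing in for the per-block thresholds and the interpolation of the original method.

```python
import numpy as np

def local_threshold(grey: np.ndarray, block: int = 8) -> np.ndarray:
    """Binarise a grey-level image with one threshold per sub-block, in the
    spirit of Ohya et al.  Every sub-block is thresholded, with no bimodality
    check, exactly as the authors propose."""
    h, w = grey.shape
    by, bx = h // block, w // block
    thr = np.empty((by, bx))
    for i in range(by):
        for j in range(bx):
            sub = grey[i * block:(i + 1) * block, j * block:(j + 1) * block]
            thr[i, j] = (int(sub.min()) + int(sub.max())) / 2  # mid-range (assumed)
    thr_full = np.kron(thr, np.ones((block, block)))  # one threshold per pixel
    return (grey > thr_full).astype(np.uint8)
```

A uniform sub-block gets a threshold equal to its own grey value and comes out empty here; this degenerate case is exactly the bimodality issue discussed above.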
Another approach comes from Wu et al. [209]. They propose a two-step method to
segment an image into text regions, and a robust (but not very generic) way to
binarize the extracted text regions. First, a texture segmentation algorithm takes
advantage of certain characteristics of text to segment areas possibly containing text
lines. The second phase focuses on the previously detected areas and constructs
“chips” from strokes, taking into account text characteristics. This phase starts by creating strokes from significant edges in the picture. Strokes that are unlikely to
peak in the histogram. Text, graphics and other non-textual components, which are
usually darker or lighter than the background, contribute to the histogram
tails. Consequently, two thresholds are selected (and two bi-level images are
produced), to capture the left and the right tail of the histogram. Components are then
identified in the bi-level images and filtered according to several heuristics, which aim
at characterizing single textual objects. Component characteristics used are
dimension, convexity, local contrast and elongation (aspect ratio).
During the last step, the remaining components are grouped to produce a set of
text lines. The components are clustered into text lines by means of a hierarchical divisive procedure. Initially, a single cluster contains all the components, which
represents the root node of a tree. Then a set of expansion rules is used to recursively
expand each node. A generic node of the tree is represented by a structure with
numerous fields like the direction of the block of components, the width and height of
the block etc. To compute the direction of the block, a number of potential angles are
generated from pairs of components, and a projection profile is generated for each
angle. The projection profile with the minimum entropy corresponds to the correct
angle. The expansion rules are based on a second set of heuristics concerning the
characteristics of groups of components belonging to the same line of text. Such
characteristics used are closeness, alignment and comparable heights. First, a
“closeness segmentation” is applied once, which creates clusters of components based
on their topological proximity. At this stage, areas of text printed at different angles are separated into different clusters. Then “alignment segmentation” is performed, which aims at separating blocks of similarly oriented text into text lines.
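The minimum-entropy estimate of a block's direction can be sketched as below. Here the candidate angles come from a uniform sweep rather than from pairs of components, and the histogram binning is an arbitrary choice; both are assumptions for illustration.

```python
import numpy as np

def profile_entropy(points, angle):
    """Entropy of the projection profile of component centroids, projected
    onto the direction `angle` (radians)."""
    proj = points[:, 0] * np.cos(angle) + points[:, 1] * np.sin(angle)
    hist, _ = np.histogram(proj, bins=16)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def best_direction(points, n_angles=36):
    """The text-line direction is taken as the candidate angle whose
    projection profile has minimum entropy."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    return min(angles, key=lambda a: profile_entropy(points, a))
```

A tight text line projects into very few bins along the perpendicular direction, which is why minimum entropy selects the correct angle.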
Clark and Mirmehdi [35] go one step further, and suggest two approaches to the
location and recovery of text in real scenes. The first method is focused on situations
where text lies on the surface of an object and the edges of the object are rectangular
and aligned with the text. This is the case for paper documents, posters, signs, stickers
etc. The objects on which text is printed have strong visible edges, and depending on
the camera perspective and the positioning of the object in the scene, the rectangles
around text regions will appear as quadrilaterals in the scene image. Such
quadrilaterals are extracted from the image with the use of edge detection, Hough
transform and grouping of extracted line segments. Quadrilaterals that do not refer to
objects containing any text are then eliminated. A confidence measure is employed for this process, based on the histogram of edge angles for each region. Because the
deal with a grey-level version of the original colour image. These characteristics
hinder the use of many of those methods with images taken in real environments, under
unconstrained conditions.
3.5. Text Extraction from Web Images
Web Images are generally computer-created. As such, they belong to a special
category of images called synthetic images. As described in the introduction of this
thesis, the fact that Web Images are created with the use of computers, to be shown on
computer monitors, entails a number of characteristics and inherent problems. The problems associated with Web Images are not the typical ones found in scanned documents, video frames or scene images, such as skew, 3D perspective, illumination or scanning artefacts, and noise. Instead, artefacts produced
by compression or anti-aliasing are common, while text itself, as a result of the artistic
expression of the creator, can appear colourful, in various orientations, outlined or
shadowed etc. The main characteristic of Web Images though, is that text is rendered
in colour, and most of the time, either the text or the background (or both) are
multi-coloured. As can be seen, extracting text from Web images can be an extremely
complicated process.
Compared to the previous categories of image text examined here, text in Web Images is created together with the image. In that sense, Web Image text extraction is closer to
the extraction of superimposed text on video frames, rather than scene text, or scanned
documents.
The main contribution to the specific problem of Web Image text extraction
comes from Lopresti and Zhou [105, 106, 218-220]. They propose two methods to
locate text in images, as well as methods to recognize text, and although they make a
number of assumptions that do not always hold, their approaches can produce satisfactory results for a significant sub-group of Web Images, namely images stored as
GIF files (8-bit palettized colour).
The main underlying assumption of the method proposed by Lopresti and Zhou for locating text in Web Images [218] is that text is printed in uniform colour. This contradicts the main characteristic of Web Images, namely that text is usually multi-coloured, and automatically limits the applicability of the method
to a certain subset of Web Images. Based on that assumption, the text extraction method works by identifying in the image connected components of the same colour.
The images are first divided into non-overlapping m×m blocks and each block is
processed. The blocks are small enough so that each region is roughly bi-tonal. A
local clustering algorithm then extracts a small number of clusters (one to three) from each block. Initially, the pixels of each block are grouped into three clusters, and the method then decides, based on the colour distance described earlier, whether pairs of clusters should
be merged. After this pre-processing step at the local level, the EMST algorithm runs
with the already colour reduced image. The character-like connected component
identification takes place as before, based on geometrical characteristics of the
components, but now a second step is added. After using features that are relatively
invariant to character touching (e.g. stroke width or white-to-black ratio) to identify
character-like components, the elongated components are singled out, and a
post-processing step splits those components based on breaks identified in their
vertical profile. For this step, the method further assumes that text is printed
horizontally. Finally, a layout analysis step takes place, which combines components
in words, and checks the “goodness” of each grouping by use of a measure called
saliency, based on the degree of height and positional alignment of the characters
in a word.
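The vertical-profile splitting of elongated components can be sketched as follows; the representation of a component as a binary mask is an assumption of this sketch.

```python
import numpy as np

def split_at_profile_breaks(mask: np.ndarray):
    """Split a (possibly touching) horizontally printed component at the
    breaks, i.e. empty columns, of its vertical projection profile."""
    profile = mask.sum(axis=0)      # pixels per column
    parts, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x               # a run of occupied columns begins
        elif v == 0 and start is not None:
            parts.append(mask[:, start:x])
            start = None
    if start is not None:
        parts.append(mask[:, start:])
    return parts
```

This is where the further assumption that text is printed horizontally enters: the profile is only meaningful along the writing direction.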
Apart from Web Image text segmentation, Lopresti and Zhou suggested two
methods [105, 219] for performing OCR on the extracted text regions. A foreground
colour is chosen for each region (based on the assumption that the text is of uniform
colour). Then the method computes the difference between that colour and the colour
of each pixel in the region. The result forms a 3D surface, which is used for
recognition. The recognition method proposed is based on a polynomial surface fitting
proposed by Wang and Pavlidis [200]. Each character is treated as a 4th degree
polynomial function. This is done by least square fitting of the polynomial function on
the 3D surface of the character. Features derived from that polynomial representation
are then used to match the character to a database of prototypes. There are two problems with that approach. First, the polynomial surface representation is a
computationally intensive operation. Second, the polynomial representation can
capture only the global shape of characters. Characters whose shapes are similar to
each other (e.g. “c” and “e”) may not be distinguished reliably.
Another OCR method suggested by Lopresti and Zhou, is based on n-tuples. An
n-tuple is simply a set of locations in an image with specified colours. For recognition, n-tuples are superimposed onto an image and colours are compared. A degree of how much an n-tuple matches the underlying area is then calculated. To
use n-tuples, each pixel is assigned a value in the range [0,1] representing the certainty of it belonging to the foreground. The n-tuple approach is an old OCR technique which, due to a number of problems it presents, has received only scant attention.
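A minimal sketch of n-tuple matching over such a certainty map follows; the exact scoring is not specified in the text, so a simple average agreement is assumed here.

```python
def ntuple_match(certainty, ntuple):
    """Average agreement between an n-tuple and a per-pixel foreground
    certainty map in [0, 1].  `ntuple` is a list of (x, y, expected)
    entries, where `expected` is 1 for a foreground location and 0 for a
    background location (an assumed encoding)."""
    scores = [certainty[y][x] if expected else 1.0 - certainty[y][x]
              for x, y, expected in ntuple]
    return sum(scores) / len(scores)
```

A score near 1 means the n-tuple's expected colours agree with the underlying area; a score near 0 means they disagree everywhere.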
In a similar manner to Lopresti and Zhou, Antonacopoulos and Delporte [7]
propose a way to extract characters from Web Images. They improve on the previous
suggestions in two ways. First, by providing support for full colour (JPEG) images,
which represent a great percentage of Web Images. Second, the extraction of characters in gradient colour receives more attention. For full colour images, the authors perform a bit-dropping operation, keeping only the 3 most important bits of each colour component,
thus reducing the maximum number of colours to 512. Subsequently, colour
clustering is performed to further reduce the number of colours in the image. Two
approaches to colour clustering are suggested. The first is based on analysing the
histogram of the image, and ordering the colours according to their prominence. The
most dominant colour is then taken as the centre of the first cluster, and colours
presenting a distance to the centre of the cluster less than a pre-defined threshold are assigned to it. The most important of the remaining colours is then selected and a new
cluster is created. The process continues until all available colours have been assigned
to a cluster. The second clustering method suggested is the Euclidean
minimum-spanning-tree technique (EMST) as described previously. This second
algorithm is used in more complex colour situations. Subsequently, a connected
component analysis is performed, during which special attention is paid to gradient
components. Subsequently, components are evaluated based on certain features, like
size, aspect ratio and the number of strokes crossed by each image scanline. This last feature essentially measures the black-to-white transitions, which cannot be more than four on a scanline (the case of the letter “M”), at least for the Latin character set. The evaluated components are then grouped into words based on their colour, their alignment, and their proximity in the image.
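The first, histogram-based clustering approach can be sketched as below. The distance threshold value and the flat-array representation of the histogram are assumptions of this sketch.

```python
import numpy as np

def cluster_colours(colours, counts, dist_thr=60.0):
    """Greedy histogram-based clustering: the most prominent unassigned
    colour seeds a new cluster, and all unassigned colours within
    `dist_thr` of that centre join it; repeat until every colour is
    assigned.  `colours` is an N x 3 array, `counts` the histogram."""
    order = np.argsort(counts)[::-1]          # colours by prominence
    labels = np.full(len(colours), -1)
    centres = []
    for idx in order:
        if labels[idx] != -1:
            continue                           # already assigned
        centre = colours[idx].astype(float)
        centres.append(centre)
        free = labels == -1
        d = np.linalg.norm(colours.astype(float) - centre, axis=1)
        labels[free & (d < dist_thr)] = len(centres) - 1
    return labels, centres
```

The EMST technique mentioned next would replace this greedy assignment in more complex colour situations.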
A framework for analysing Web Images is proposed by Koppen et al. [96]. The
framework consists of four stages: the colour separation stage, the information granulation-specification modules (GVMs), the task stage and the recognition stage.
Each image is initially split in colour components. Five sub-images are produced, one
for each component of the CMYK colour system, and a fifth one for the Saturation component of the HSI colour system. Each sub-image is then used as input to one or
The first of the two methods investigated towards text segmentation in Web
images is presented in this chapter. This method works in a split and merge
fashion, aiming at constructing connected components that ultimately describe the
characters in the image. The two parts of the method, namely the splitting process and the connected component aggregation (merging) process, are explained in detail.
Certain aspects of the method that are of particular interest are also detailed in
dedicated subsections.
4.1. Basic Concepts – Innovations
Both text segmentation methods that were produced as part of this research share a
common belief about the way Web Images are constructed. As explained in Chapter 1, Web Images are created to be viewed by humans, thus it is reasonable to
expect that particular colour combinations are used that enable humans to differentiate
(relatively easily) between the textual content of the image and the background. Based
on that observation, the common denominator both methods share is their
anthropocentric character. Although approached in a distinctly different way in each
method, the safe assumption that a human being should be able to read the text in any
given Web Image is the foundation of both methods’ reasoning.
different maximum for each Hue [189], beyond which further increases in intensity
produce a reduction of Saturation. It should also be mentioned here that changes in
Hue also occur as Lightness changes, a phenomenon known as the Bezold-Brucke
Effect [156, 211], although such level of interaction is outside the scope of the
research presented here. Based on the above information, the suitability of HLS for
colour analysis of images can be well justified. This type of information about factors
enabling colour discrimination and the interaction of colour attributes is taken into
account in the first part of the Split and Merge method.
The next stage of component aggregation employs existing biological data on
Wavelength discrimination and Lightness discrimination, involving the appropriate
HLS components in accordance with the factors explained above. Each type
(Wavelength, Lightness or Colour Purity) of discrimination information is used at
each level of processing, working towards the final segmentation of the image.
4.2. Description of the Method
The method starts with a pre-processing step, which aims at separating the chromatic
from the achromatic pixels of the image (see next section). The image is split into two
layers at this point, one containing the achromatic pixels and another containing the
chromatic ones.
Subsequently, the splitting process takes place. The histogram of Lightness for the
pixels of the achromatic layer is computed, and peaks are identified. A short analysis of
the peaks identified follows, where peaks corresponding to similar Lightness values
are combined. The left and right minima (see Section 4.4.2) of each peak (or
combination of peaks) define a range of Lightness values. For each range of Lightness
values, a new sub-layer is introduced, and the corresponding pixels are copied over.
In a similar manner, the histogram of Hues for the pixels of the chromatic layer is computed, peaks are identified and the chromatic layer is split into sub-layers of
different Hue ranges. For each of the sub-layers produced from Hue histogram
analysis, the Lightness histogram is computed and the process is repeated. This goes
on, alternating the component used at each step, until a specified number of splits is
performed or until only one peak can be identified in the histogram. Following this
process, a tree of layers is produced, where the original image is the root of the tree,
If MaximumNumberOfSteps has been reached Then Exit Compute Histogram of ColourComponent for LayerAnalyse HistogramIf PeaksIdentified == 1 Then ExitFor Each Peak identified in Histogram
{Create SubLayer for the interval specified by PeakFor Each Pixel in Layer{
If ColourComponent value of PixelColour falls under PeakThen copy Pixel to SubLayer
erOfSteps)Else If (ColourComponent == Hue)
Then Split(SubLayer, Lightness, MaximumNumberOfSteps)}
}If (ColourComponent == Lightness)
Then Split(SubLayer, Hue, MaximumNumb
}
Figure 4-8 – Pseudo code of the pre-processing and splitting steps of the Split and Merge Method.
Commands and reserved words are typed in Bold.
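A runnable sketch of the pseudo-code in Figure 4-8, under simplifying assumptions: pixels are held as an N×2 array of (Hue, Lightness) values, layers are boolean masks over the pixels, and any maximal run of non-empty bins counts as one peak (a stand-in for the fuller histogram analysis of Section 4.4.2).

```python
import numpy as np

HUE, LIGHTNESS = 0, 1  # column indices into an N x 2 array of pixel values

def peak_intervals(hist):
    """Simplified histogram analysis: every maximal run of non-empty bins
    is one peak, delimited by its left and right minima (empty bins)."""
    intervals, start = [], None
    for i, v in enumerate(hist):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(hist) - 1))
    return intervals

def split(values, mask, component, steps_left, leaves):
    """Recursively split a layer into sub-layers, alternating between the
    Hue and Lightness components at each level of the tree."""
    hist = np.bincount(values[mask, component], minlength=256)
    peaks = peak_intervals(hist)
    if steps_left == 0 or len(peaks) <= 1:
        leaves.append(mask)          # the layer becomes a leaf of the tree
        return
    for lo, hi in peaks:
        sub = mask & (values[:, component] >= lo) & (values[:, component] <= hi)
        split(values, sub, 1 - component, steps_left - 1, leaves)
```

The list of leaf masks corresponds to the leaf layers of the tree on which merging is later performed.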
4.4.2. Histogram Analysis
Histogram analysis aims at identifying interesting peaks in a histogram of the layer in
question, which will be subsequently used to split the layer. Only the pixels that
belong to that layer are taken into account when computing a histogram. Bearing in
mind that a merging phase follows, it is preferable at this point to over-split the image, even if that entails breaking characters across different sub-layers, rather than to under-split it.
Every histogram can then be decomposed into a number of such peaks. Special consideration is given to Hue histograms, due to the fact that the Hue component is modulo 2π. That is, since Hue is expressed as an angle, it presents a circular repetition where value 256 is mapped back to zero (or, value –1 is mapped to 255). For Hue histograms only, a peak is therefore allowed to exist bridging the two ends (e.g. with a left minimum at 250 and a right minimum at 5).
Text Peak Features
Based on the general structure of a peak, certain features can be defined. Such features are discussed next in the context of identifying text peaks in a histogram.
The size of a peak can be defined as the number of pixels under the peak; in other words, the integral of the peak structure. Size would be a useful metric if the number of pixels belonging to text were known beforehand. Although this is not generally the case, text is normally expected to comprise fewer pixels than the background. Nevertheless, because text can share some colours with the background, the size of the peak representing those colours will be significantly larger than expected (Figure 4-10), and small peaks located on top of larger ones are therefore lost. Finally, even the first assumption, that text occupies a small portion of the image, is not always true; sometimes the largest peak in the histogram actually corresponds to text. In conclusion, size cannot be used to sufficiently indicate the presence of a text peak.
Figure 4-10 – (a) Original Image. (b) Pixels under Peak A [34,42]; Peak A includes the yellow characters and part of the background. (c) Pixels under Peak B [126,130]; Peak B includes the greenish characters and part of the background. (d) The Hue histogram for the image (Number of Pixels against Hue Value); the text and the background share colours.
number of mixed peaks of the histogram. Therefore, a successful smoothing should decrease the overall number of distinct text and background peaks, while it should not increase the number of mixed peaks. Special attention should be paid to the fact that if a mixed peak is merged with a background or a text one, the number of mixed peaks will not increase, even though a wrong merger will have happened. For that reason, the results of each smoothing and heuristic technique examined were also visually inspected before deciding on which technique to employ.
Weighted-averaging the histogram at different scales was initially examined. Weighted averaging vastly reduces the number of peaks in the histogram, merging a number of small peaks with bigger ones, which in most situations is not desired, since some small peaks often correspond to text. Therefore, smoothing by weighted averaging was not used.
A structural approach for smoothing noisy signals suggested by Antonacopoulos and Economou [8] was also implemented and tested. Data points are expressed in terms of peak structures, and peak structures are subsequently expressed in terms of meta-peak structures. Meta-peak structures consist of a left minimum, a maximum and a right minimum point, exactly like the initial peak structures, but the points for the meta-peaks are derived from the maxima of the initial peak structures. Certain features of the peaks, such as the width and height, are then analysed, and the meta-peaks are classified as either noise or characteristic peaks of the signal. Following that, the noise peaks are smoothed, while the characteristic ones are preserved. Originally devised for smoothing noisy signals, this method was slightly changed to fit the specific problem of colour histograms. The issue here is not to separate noise from characteristic peaks, but to decide which peaks can be safely combined without effectively merging text and background pixels. For this reason, absolute measures such as the widths and heights of peaks were not used; instead, the width ratio and height ratio of peaks were employed. This approach produced far better results than smoothing by weighted averaging; however, in a few cases smoothing produced wrong mergers between peaks, so this method was not finally employed either.
A different technique used was a heuristic method based on comparing the centres of gravity of peaks. A comparison between the centres of gravity of two successive peaks can give an indication of both their proximity and their relative size. The x-coordinate of the centre of gravity of a peak indicates the mean Hue or mean Lightness of the peak, depending on the type of histogram. The y-coordinate of the
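The x-coordinate of a peak's centre of gravity can be computed as the count-weighted mean bin; the helper name in this sketch is my own.

```python
import numpy as np

def peak_mean_position(hist, lo, hi):
    """x-coordinate of the centre of gravity of the peak spanning bins
    [lo, hi]: the count-weighted mean bin, i.e. the mean Hue or mean
    Lightness of the peak."""
    bins = np.arange(lo, hi + 1)
    counts = hist[lo:hi + 1].astype(float)
    return float((bins * counts).sum() / counts.sum())
```

The distance between these positions for two successive peaks is what the heuristic compares.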
Similarly to Hue discrimination, humans have different Lightness discrimination
abilities for dark and light colours. Lightness perception is roughly logarithmic.
Humans cannot differentiate between two colours if the ratio of their intensities is less
than approximately one percent. In the context of computer graphics, CRT monitors
are inherently non-linear, that is the intensity of light reproduced on the screen of a
CRT monitor is a non-linear function of its voltage input, which in turn is set by the
RGB values of the pixels. Those RGB values can be corrected to compensate for this
non-line , a process known gamma correction
An experiment was performed to measure Lightness discrimination for all 255
in Appendix A. Based on the results obtained, the Lightness thresholds are
defined as shown in Figure 4-15.
arity as .
levels of Lightness as defined in the HLS colour system. Details of the experiment are
given
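The roughly logarithmic, ratio-based discrimination rule can be sketched as a simple predicate. The one-percent Weber fraction is the figure quoted above; the function itself is illustrative.

```python
def distinguishable(i1: float, i2: float, weber: float = 0.01) -> bool:
    """Two intensities are distinguishable when their ratio differs from
    one by at least the Weber fraction (approximately one percent)."""
    if max(i1, i2) == 0.0:
        return False
    return abs(i1 - i2) / max(i1, i2) >= weber
```

This is the kind of perceptual criterion that the measured Lightness thresholds of Figure 4-15 encode per Lightness level.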
Figure 4-15 – The Lightness discrimination thresholds used (Maximum Positive Change Allowed, plotted against Lightness Value 0–255). For each Lightness value, the maximum positive change for which two colours are considered similar is shown.
For each Lightness value, the minimum increase that can be made without producing any noticeable change between the two colours is shown. The thresholds used are also relaxed compared to the measurements made, similarly to the Hue thresholds.
Figure 4-18 – (a) A character broken in two components. (b) Bottom component and vexed area. (c) Top component and vexed area. (d) Overlapping of components.
Consider a character rendered in gradient colour. During the splitting process, the character might be broken into different components; for example, the top of the character might be one component and the bottom of the character another. The
vexed area of the top component would then cover some portion of the bottom one,
and vice-versa. If, furthermore, the background is sufficiently different from the colours
of the two components (that is, a very simple situation for which the method has to
work well), the whole of the vexed area of the top component is overlapping with the
bottom one, and vice-versa.
The fact that the whole of the vexed area of one component overlaps with another component should give a strong indication that the components should be merged; yet, Eq. 4-2 fails to provide a value close to 1. The reason for this is that the maximum possible number of overlapping pixels is defined to be equal to the sum of the sizes of the two components. For this case, the maximum possible number of overlapping pixels should instead be the sum of the sizes of the two vexed areas, because the vexed areas are much smaller than the components that participate in the comparison. For the example of Figure 4-18, if b is the bottom component and t the top one, then NOP(b_v, t)=13, NOP(b, t_v)=16, Size(b)=36, Size(t)=53, Size(b_v)=13 and Size(t_v)=16. To cover this case, Eq. 4-2 could be changed accordingly, replacing the denominator with the sum of the sizes of the vexed areas of the components (Eq. 4-3). Since the maximum possible number of overlapping pixels cannot be greater than this sum, the new overlapping value also ranges in [0,1].
Figure 4-20 – (a) A character broken in two components. (b) Bottom component and vexed area. (c) Top component and vexed area. (d) Overlapping of components.
Although the above definition is more complete than the previous ones, there are
still some special cases that should be dealt with. Such a special case is illustrated in Figure 4-20. For the situation presented in this figure, Eq. 4-4 gives an overlapping
value equal to 1. The question here is whether we feel confident to base the merging
of two components on the overlapping of one or just a few pixels. This depends on the
sizes of the components involved: if they are comparable to the number of overlapping pixels, then it is probably a good call; otherwise it probably is not. This leads to the definition of a weighting function, which
should reflect exactly this confidence. Such a weighting function is given in Eq. 4-5.
W(a,b) = \frac{NOP(a_v, b) + NOP(a, b_v)}{Size(a) + Size(b)}    Eq. 4-5
The above weighting function is quite comprehensive, and on a closer look, it is
the same as the first definition of overlapping (Eq. 4-2) and ranges in [0,1].
Nevertheless, it also presents some special cases, as can be seen in Figure 4-21. Here
the small component should probably be merged with the large one, and the
overlapping value as computed by Eq. 4-4 is certainly large enough (equal to 1) to
indicate that, but the weight computed by Eq. 4-5 is small, due to the big size of one of the components.
Figure 4-21 - (a) A character broken in two components. (b) Bottom component and vexed area.
(c) Top component and vexed area. (d) Overlapping of components.
It proves better to base the weighting function on the smaller of the two components only, so the final weighting function is given in Eq. 4-6. The weight is no longer in the range [0,1], so by definition any value greater than 1 is bound to 1.

W(a,b) = \frac{NOP(a_v, b) + NOP(a, b_v)}{2\min\left(Size(a), Size(b)\right)}    Eq. 4-6
For each pair of components a and b, the value W(a,b) · Ovl(a,b) is computed, and if it is above a pre-defined threshold, the components are considered for merging. The value of the threshold used in the method was set equal to 0.5625 (= 0.75²). From now on, the term overlapping refers to weighted overlapping.
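Putting the pieces together, the merging test can be sketched with components as sets of pixel coordinates. The exact form of Ovl is only partially recoverable from the text, so the Eq. 4-3 form (vexed-area overlap over the total vexed size) is assumed here; W follows Eq. 4-6, and the threshold is the quoted 0.5625.

```python
MERGE_THRESHOLD = 0.5625   # 0.75 ** 2, as quoted above

def nop(a, b):
    """Number of Overlapping Pixels between two pixel sets."""
    return len(a & b)

def merge_score(a, a_vexed, b, b_vexed):
    """W(a, b) * Ovl(a, b); components and their vexed areas are sets of
    (x, y) tuples."""
    num = nop(a_vexed, b) + nop(a, b_vexed)
    ovl = num / (len(a_vexed) + len(b_vexed))         # Eq. 4-3 form (assumed)
    w = min(1.0, num / (2 * min(len(a), len(b))))     # Eq. 4-6, bound to 1
    return w * ovl

def should_merge(a, a_vexed, b, b_vexed):
    return merge_score(a, a_vexed, b, b_vexed) > MERGE_THRESHOLD
```

Two tiny fully overlapping components of comparable size score 1 and are merged; a few overlapping pixels against a much larger component are discounted by W.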
Merging in the Leaf-Layers
Merging is first performed in all the leaf layers. Subsequently, components in layers having a common parent layer (as shown in Figure 4-12) are merged by copying the components one level up (to the common parent-layer) and performing merging in the parent-layer. The merging process is the same for both the leaf and the intermediate layers, and is based on the overlapping between components as defined previously. The merging process in the leaf-layers is described next.
Every possible combination of components is checked first, and if their overlapping value is above the predefined threshold, a possible merger is identified. All identified mergers are kept in a sorted list, and merging starts with the merger having the biggest overlapping value. After each merger, the list is updated.
After all possible mergers have taken place, merging of components of layers of the same level is performed. This happens by copying the components of the leaf layers one level up, and repeating the merging process in the layer that receives the components. For example, based on Figure 4-12, after merging has finished in all level-3 leaf layers, all the components are copied one level up, following the tree structure. In this example, the resulting components in the Lightness layers are copied in their parent level-2 Hue layer. Consequently, after this copying, all components in the Hue layer will have similar Hues (since they were identified in children layers of this Hue layer), and will have vexed areas defined based on the Lightness thresholds (since vexed areas were identified in the Lightness layers). By performing a merger in the Hue layer at this point, we effectively merge components of all the level-3 Lightness layers, based on Lightness-defined vexed areas.
Two components will only overlap at this stage if their Lightness values are
sufficiently similar (according to the Lightness thresholds used). The rationale behind
merging at this point is to address characters of constant Hue comprising areas of
slightly different Lightness. Examples are characters in gradient colour (in the
Lightness component), or characters with shadows or highlights. These characters will
have been broken across different Lightness layers, but if their constituent components
are similar enough in terms of Lightness, their vexed areas will adequately overlap at
this point.
Text Segmentation in Web Images Using Colour Perception and Topological Features

Figure 4-23 – (a) Hue Layer with components of all its children layers copied over. (b) Components
resulting after merging.

An example of the above process can be seen in Figure 4-23. All components of
the children layers copied over in the shown Hue layer are illustrated in the image on
the left, while the components resulting after merging are shown in the image on the
right.
After all possible mergers occur in this layer, two additional processes take place:
the refinement of the vexed areas of the resulting components and the examination of
the integrity of the components resulting from mergers. These two processes will be
explained next.
Vexed Area Refinement
The vexed areas of the components were identified in the leaf layers, according to the
type of the leaf layers. After being copied one level up, and after merging has been
performed, the vexed areas of the components remaining need to be refined, so that
they are representative of the new layer in which they now reside. For example, after copying all the components identified in the Lightness type leaf layers to the parent
Hue layer, the vexed areas must be refined so that they contain pixels not only of
similar Lightness to the component, but of similar Hue as well. This is important, as
merging between Hue layers of the same level will be performed next, and this
merging must be based on Hue similarities.
The vexed areas are refined according to the type of layer they reside in; that is,
they might be refined based on Lightness similarity (for Lightness layers), Hue
similarity (for Hue layers), or combined Lightness, Hue and Saturation similarity (for
the Chromatic and Achromatic layers). When the process reaches the original image
(the root of the tree), no refinement is necessary, since no other merging can happen.
The Hue and the Lightness similarity thresholds were defined earlier in
Section 4.5.2. For the Hue and Lightness layers the process of vexed area refinement
is as follows. For each component in a given layer, each pixel of the vexed area of the
component is compared to the average colour of the component, and if not similar, it
is removed from the vexed area. Similarity is based on the type of the layer as
mentioned above. The pseudo-code of that process is shown in Figure 4-24.
For Each Layer {
    For Each Component in Layer {
        For Each Pixel in VexedArea(Component) {
            If ( Colour(Pixel) is NOT similar to Colour(Component) ) Then Remove Pixel from VexedArea
        }
    }
}

Figure 4-24 – Pseudo-code for the Refinement of Vexed Areas function of the Split and Merge Method.
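The refinement loop of Figure 4-24 can be sketched in Python; the dictionary representation and the `similar` predicate (standing in for the layer-dependent similarity test) are assumptions for illustration:

```python
# Sketch of vexed-area refinement: each vexed-area pixel is compared to
# the component's average colour, and dissimilar pixels are removed.
# `similar(c1, c2)` is a placeholder for the layer-dependent test
# (Lightness, Hue, or combined similarity).

def refine_vexed_areas(components, similar):
    for comp in components:
        avg = comp["avg_colour"]
        # keep only the vexed-area pixels whose colour is similar
        # to the component's average colour
        comp["vexed"] = {px: colour
                         for px, colour in comp["vexed"].items()
                         if similar(colour, avg)}
```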
The process is slightly different when it comes to the chromatic or the achromatic
layer. Refining the vexed areas at this point aims to prepare the components of the
two layers (the chromatic and the achromatic) to participate in a merging process
across this tree level, effectively checking the overlapping between achromatic and
chromatic components. Consequently, the vexed areas at this point should represent
some potential extension for each component based on the Saturation value of the
component; for example, a low saturated chromatic component could potentially be
merged with an achromatic one. For the chromatic layer, refining the existing vexed
areas based on some kind of saturation similarity is of no real benefit. This is because
the vexed areas of the components of the chromatic layer do not contain any pixels
outside the range of the hues of the children hue-layers, whereas on the other hand
achromatic pixels have undefined hue (conventionally set to zero). That effectively
means that if, for example, a very low saturated green component exists, even if the
colour of the component is very low saturated, indicating that it could be merged with
some neighbouring grey component, its vexed area will not contain anything but
green hued pixels, since it was refined as such in the children Hue layers. For this
reason, instead of simply comparing the saturation of each pixel of the existing vexed
areas to the saturation value of the component to which the vexed area belongs, we
discard the existing vexed areas, and construct new ones, based on mixed Lightness,
Hue and Saturation similarity, as will be described next. By doing this, we ensure that
the vexed areas of the components of the chromatic layer can possibly contain some
achromatic pixels, and vice versa: the vexed areas of components of the achromatic
layer can possibly contain some chromatic pixels. If this was not the case, it would be …
Figure 4-29 – Image containing single-coloured background and multi-coloured text. Anti-aliasing produces a fuzzy area between characters “o” and “u” that ultimately causes the merging of the
characters.
Original Final Segmentation Segmented Characters
Figure 4-30 – An image containing characters with shadows. Most of the text is written circularly.
The method is not configured to give the optimum results in every individual
situation; instead, the thresholds were selected so that reasonable results can be
obtained over a range of fundamentally different images. An example can be seen in
Figure 4-31. Using stricter Lightness thresholds, the method produces a better final
segmentation for the same image as in Figure 4-28. Nevertheless, using stricter
Lightness thresholds can produce worse final segmentations in other images, as can be
seen in Figure 4-32. In this example, the highlights and shadows (areas of higher or
lower Lightness respectively) of the characters in the word “Google” are segmented
as separate components in the case that stricter Lightness thresholds are used.
(Euclidean) distances in the HLS colour space may not necessarily be perceived by
humans as being equally dissimilar. A more suitable colour system would be one that
exhibits perceptual uniformity.1 The CIE (Commission Internationale de l’Eclairage)
has standardized two colour systems: L*a*b* and L*u*v* (sometimes also referred to as
CIELAB and CIELUV), based upon the CIE XYZ colour system [30, 116]. These
colour systems offer a significant improvement over the perceptual non-uniformity of
CIE XYZ [152, 211] and are a more appropriate choice to use in that aspect than HLS.
Therefore, the Euclidean distance in the L*a*b* colour space is used here as indicative
of colour similarity.

Colour is a very important attribute in identifying objects in colour images;
nevertheless, it is not always adequate, especially when dealing with complex colour
combinations. Shape plays an equally important role, and it should be used in
conjunction with colour information to achieve a correct segmentation. That extra
step, of incorporating topological relationship information between components into
the merging process, is taken here. Towards this, a Propinquity measure based on both
the colour distance and the topological relationship between components is defined by
means of a fuzzy inference system.

5.2. Description of method

The method starts by performing a one-pass connected component identification
process. Since the image is not bi-level (therefore, the components cannot be easily
defined in terms of black and white), colour similarity between pixels is used for
component identification.

The rationale behind this pre-processing step (expressing the image in terms of
components) is that if some neighbouring pixels have colours that humans cannot
differentiate, it is beneficial to treat those pixels as a single component. By doing so,
the processing time of the method is substantially reduced.

The components resulting from the initial component identification step are
subsequently aggregated into larger regions. Using a fuzzy inference system defined,

1 For example, assume that two colours have HLS (Euclidean) distance δ. Humans find it more
difficult to differentiate between the two colours if they both lie in the green band than if the two
colours lie in the red-orange band (with the distance remaining δ in both cases). This is because
humans are more sensitive to the red-orange wavelengths than they are to the green ones.
      R       G       B       White
x   0.640   0.300   0.150   0.3127
y   0.330   0.600   0.060   0.3290
z   0.030   0.100   0.790   0.3582

Table 5-1 - Primaries and D65 white point of Rec. 709
The image data is coded in the RGB colour system, so a way is needed to convert
from RGB to L*a*b*. A direct conversion exists between CIE XYZ and L*a*b*. Still,
similarly to the Split and Merge method, a conversion is needed between the RGB and
the CIE XYZ colour systems. It is not feasible to convert from a device-dependant
colour system to a device-independent one without any extra knowledge about the
hardware used both for the creation and for the displaying of the RGB data. The RGB
colour system is device-dependant; however, between monitors that conform to the
standard Rec. 709 [84], RGB colours can be considered to be unvarying. The
primaries and the D65 white point of Rec. 709 are displayed in Table 5-1. To convert
from RGB709 to CIE XYZ and vice-versa we use the transforms:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix}
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix}, \qquad
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix} =
\begin{bmatrix} 3.240479 & -1.537150 & -0.498535 \\ -0.969256 & 1.875992 & 0.041556 \\ 0.055648 & -0.204043 & 1.057311 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$   Eq. 5-1
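As a minimal sketch, the forward transform of Eq. 5-1 can be applied directly as a matrix product; linear Rec. 709 channel values in [0, 1] are assumed:

```python
# Sketch of the Rec. 709 RGB -> CIE XYZ transform (Eq. 5-1).
# Input channels are assumed to be linear values in [0, 1].

M = [
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
]

def rgb709_to_xyz(r, g, b):
    """Multiply the RGB column vector by the Eq. 5-1 matrix."""
    rgb = (r, g, b)
    return tuple(sum(M[row][col] * rgb[col] for col in range(3))
                 for row in range(3))
```

For reference white (1, 1, 1) the result reproduces the D65 white point scaled to Y = 1, which is a quick sanity check of the matrix entries.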
Then the conversion of the CIE XYZ values to L*a*b* is given by:
$$(a):\; BP_{(a+b)} = BP_a + BP_b - BP_{a \to b} - BP_{b \to a} \qquad
(b),\,(c):\; BP_{(a+b)} \neq BP_a + BP_b - BP_{a \to b} - BP_{b \to a}$$

Figure 5-6 – Different cases where the number of Boundary Pixels (BP) of the component resulting
from a merger can (a) or cannot (b, c) be computed from known information. Boundary pixels are
illustrated here with a triangle in the upper left corner.
5.4.3. The Connections Ratio
Another attribute was therefore sought, which also expresses the degree of
connectivity between components, but at the same time is easier to calculate and use.
The attribute finally used as an input in the fuzzy inference system is the Connections Ratio. This is defined in a way that overcomes the problems of Boundary Sharing,
by using Connections instead of pixels to define the boundaries of components.
Definitions
A Connection is defined here as a link between a pixel and any one of its
8-neighbours, each pixel thus having 8 connections. A connection can be either
internal or external. A connection is called internal when both the pixel in question
and the neighbouring one belong to the same component, and external when the
neighbouring pixel belongs to another component. Connections to pixels outside the
image are also considered external. Figure 5-7 illustrates the external and internal
connections of a given component to its neighbouring components.
If $C_x$ is the total number of connections of component $x$, $Ci_x$ and $Ce_x$ are the
number of internal and external connections of component $x$ respectively, and $S_x$ is the
size of component $x$, where size equals the number of pixels of the component, then
So, based on Eq. 5-16 and Eq. 5-17, Eq. 5-12 becomes:

$$CR_{a+b,c} = \frac{Ce_{a+b,c}}{\min\left(Ce_{a+b},\, Ce_c\right)}
= \frac{Ce_{a,c} + Ce_{b,c}}{\min\left(Ce_a + Ce_b - 2\,Ce_{a,b},\, Ce_c\right)}$$   Eq. 5-18
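Reading Eq. 5-18 as above, the feature value for a merged component can be computed from per-component external-connection counts alone; a sketch with illustrative parameter names:

```python
# Sketch of Eq. 5-18: Connections Ratio between a merged component (a+b)
# and a third component c, computed from per-component counts only.
# ce_x  : total external connections of component x (Ce_x)
# ce_xy : external connections between components x and y (Ce_{x,y})

def merged_connections_ratio(ce_a, ce_b, ce_c, ce_ab, ce_ac, ce_bc):
    numerator = ce_ac + ce_bc            # Ce_{a+b,c}
    ce_merged = ce_a + ce_b - 2 * ce_ab  # Ce_{a+b}: internalised links removed
    return numerator / min(ce_merged, ce_c)
```

Because only these counts are needed, the feature is computed once per component at the start and updated cheaply after each merger, as the text notes.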
As shown above, the calculation of the value of the Connection Ratio feature for
the component resulting from a merger requires knowledge of the individual feature
values of the components involved only. Therefore, the feature values for each
component need to be calculated for the components themselves only once at the
beginning of the process, minimising the number of calculations after every merger.

Fuzzy Sets and Membership Functions

Characters consist of continuous strokes; therefore, if a character is split in two or
more components, those components, being parts of strokes, will neighbour only
partially. This is because strokes are continuous shapes of small thickness, so if a
stroke is split in two components, those components will neighbour to an extent
comparable to the thickness of the stroke.

The Connections Ratio input is indicative of the extent to which components
neighbour each other; as a result, it can be used to indicate when components are
more likely to be parts of a stroke. Components that partially neighbour will have a
Connections Ratio value in the middle range. After experimentation, it was found that
this middle range of Connections Ratio values, indicative of components likely to be
stroke parts, is between 0.05 and 0.65. Two fuzzy sets were defined for this middle
range, called “Medium Low” and “Medium High”, the membership functions of which
are shown in Figure 5-8. The reason for defining two fuzzy sets instead of one for the
middle range of Connections Ratio values is to provide for more flexibility in
designing the system, in a manner similar to the fuzzy sets defined for small Colour
Distance values. Specifically, although pairs of components presenting Connections
Ratio in the range of either the “Medium Low” or “Medium High” fuzzy sets will be
favoured at the time of component aggregation, having two fuzzy sets in this range
allows for better control over the order mergers are performed. Between the two …
… when Connections Ratio values reach zero. As explained before, a Connections Ratio
value of zero indicates that two components are not neighbouring, and should not be
considered for merging in any case. For zero values of Connections Ratio, another
fuzzy set is defined, called “Zero”, in order to facilitate the different handling of
components that do not neighbour at all. If a pair of components presents a
Connections Ratio in the “Zero” fuzzy set, it is never considered for merging; instead,
a Propinquity value of zero is returned independently of the Colour Distance value.

5.4.4. Combining the Inputs: Propinquity

The single output of the fuzzy inference system, the Propinquity, is defined with the
help of seven fuzzy sets. Three of the fuzzy sets defined carry a special meaning.
Those are the “Zero”, the “Medium” and the “Definite” fuzzy sets.

The “Zero” fuzzy set is defined in such a way that a Propinquity of zero has a
membership value of 1 to the set, while any other Propinquity has a membership value
of 0. This is a very crisply defined fuzzy set, which is necessary to facilitate the
rejection of certain cases where components should not be merged (e.g. very large
Colour Distance, or zero Connections Ratio).

On the opposite end, the “Definite” fuzzy set is defined to give high degrees of
membership to Propinquity values above 0.9. In a manner similar to the “Zero” fuzzy
set, the “Definite” fuzzy set ensures that cases where components should definitely be
merged (e.g. small Colour Distance and medium Connections Ratio) be awarded a
high Propinquity value, and therefore be placed high in the hierarchy of mergers to
happen.

The Propinquity output is defined so that a value of 0.5 will be the threshold
above which two components should be considered for a merger, while values less
than 0.5 indicate that two components should not be merged. The “Medium” fuzzy set
is defined to cover the middle range of Propinquity values (0.4 to 0.6), while it gives a
membership value of 1 to Propinquity equal to 0.5. This fuzzy set is used to indicate
cases where it is not certain whether two components should be merged or not.

A pair of components awarded a Propinquity value above the threshold of 0.5 will
effectively be merged in the subsequent component aggregation phase. On the
contrary, a pair of components awarded a value below that threshold should not be
considered for a merger. The range of Propinquity values between the “Medium” and
the “Definite” fuzzy sets is described by two fuzzy sets: “High I” and “High II”.
Initially …

The rules of the fuzzy inference system are responsible for establishing the
relations between the two inputs (Colour Distance and Connections Ratio) and the
output (Propinquity). These if-then rules are simple statements, where the fuzzy sets
and fuzzy operators are the subjects and verbs respectively. The if-part of a rule is
called the antecedent, while the then-part of the rule is called the consequent.
Interpreting an if-then rule involves two distinct parts: first evaluating the antecedent,
and second applying that result to the consequent. If the antecedent is true to some
degree of membership, then the consequent is also true to the same degree. The
consequent of each rule specifies a fuzzy set to be assigned to the output. The output
fuzzy sets for each rule of the system are then aggregated into a single output set,
which is finally defuzzyfied, or resolved to a single number. The above process is
described in more detail in Appendix B.

The set of rules used to define the fuzzy inference system is shown in Figure 5-10
below.

If Connections Ratio is Zero then Propinquity is Zero
If Connections Ratio is Low and Colour Distance is Insignificant then Propinquity is High I
If Connections Ratio is Low and Colour Distance is Small then Propinquity is Medium
If Connections Ratio is Low and Colour Distance is Medium then Propinquity is Low II
If Connections Ratio is Low and Colour Distance is Large then Propinquity is Zero
If Connections Ratio is Medium Low and Colour Distance is Insignificant then Propinquity is High II
If Connections Ratio is Medium Low and Colour Distance is Small then Propinquity is High I
If Connections Ratio is Medium Low and Colour Distance is Medium then Propinquity is Medium
If Connections Ratio is Medium Low and Colour Distance is Large then Propinquity is Low I
If Connections Ratio is Medium High and Colour Distance is Insignificant then Propinquity is Definite
If Connections Ratio is Medium High and Colour Distance is Small then Propinquity is High II
If Connections Ratio is Medium High and Colour Distance is Medium then Propinquity is High I
If Connections Ratio is Medium High and Colour Distance is Large then Propinquity is Low II
If Connections Ratio is High and Colour Distance is Insignificant then Propinquity is High I
If Connections Ratio is High and Colour Distance is Small then Propinquity is Medium
If Connections Ratio is High and Colour Distance is Medium then Propinquity is Low II
If Connections Ratio is High and Colour Distance is Large then Propinquity is Zero

Figure 5-10 – The fuzzy inference system rules.
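The rule base of Figure 5-10 can be encoded as a simple lookup table; membership evaluation, implication and defuzzification (described in Appendix B) are omitted in this sketch:

```python
# Sketch of the rule table of Figure 5-10 as a lookup:
# (Connections Ratio set, Colour Distance set) -> Propinquity set.
# A "Zero" Connections Ratio yields "Zero" regardless of Colour Distance.

RULES = {
    ("Low", "Insignificant"): "High I",
    ("Low", "Small"): "Medium",
    ("Low", "Medium"): "Low II",
    ("Low", "Large"): "Zero",
    ("Medium Low", "Insignificant"): "High II",
    ("Medium Low", "Small"): "High I",
    ("Medium Low", "Medium"): "Medium",
    ("Medium Low", "Large"): "Low I",
    ("Medium High", "Insignificant"): "Definite",
    ("Medium High", "Small"): "High II",
    ("Medium High", "Medium"): "High I",
    ("Medium High", "Large"): "Low II",
    ("High", "Insignificant"): "High I",
    ("High", "Small"): "Medium",
    ("High", "Medium"): "Low II",
    ("High", "Large"): "Zero",
}

def propinquity_set(cr_set, cd_set):
    """Return the consequent Propinquity fuzzy set for one rule."""
    if cr_set == "Zero":
        return "Zero"
    return RULES[(cr_set, cd_set)]
```

Laid out this way, the symmetry of the table is visible: only the middle-range Connections Ratio rows ("Medium Low", "Medium High") can reach the "Definite" and "High II" sets, reflecting the stroke-part reasoning above.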
As mentioned before, a small Colour Distance indicates that the components
involved are sufficiently similar (in colour) to be merged, while a large Colour
Text Segmentation in Web Images Using Colour Perception and Topological Features
Original Final Segmentation Segmented Characters
Figure 5-13 – Image with multi-coloured characters over photographic background.
In Figure 5-14 an example of an image containing a logo (Netscape’s “N”
character), as well as one-pixel wide characters (“CLICK HERE” at the bottom-right
corner), is given. Although the one-pixel wide characters are correctly separated from
the background, the segmentation method fails to segment the logo character. What
appears here to be an over-merging problem (parts of the characters merged with the
background) originates from the sequence in which mergers occur (as directed by
Propinquity). Parts of the gradient background of the Netscape logo (light coloured
ones) are first merged with parts of the “N” character. The components resulting from
these first mergers are subsequently merged with other parts of the background of
similar colour, producing the result shown here. The dependence of the Fuzzy
Segmentation method results on the sequence of mergers is an important aspect of this
method. Using the Propinquity between components to define the order of the mergers
in most of the cases ensures a correct separation of the characters from the
and is a measure of how full a specific area is by pixels of the character. In general,
characters have a compactness value in the middle range, since they consist of a
number of strokes, and therefore infrequently occupy much of the area defined by
their bounding box. Certain characters, such as “l” or “i”, present a much higher
compactness, but these can be filtered out by the use of their aspect ratio, as will be
seen next.

The compactness of a number of ideal characters was measured, and the results
are shown in Figure 6-3. As expected, most of the characters present a compactness
value in the middle range. A number of characters such as “i” and “l” have a
compactness value equal to one, which results in a peak on the right side of the
histogram.
Figure 6-3 - Histogram and scatter plot of compactness for the 26 letters of the English alphabet. The
font used for the measurements is normal weight, non-italic Arial. Compactness was measured for six
sizes, ranging from 6 to 22pt. As can be seen, most characters have a compactness value between 0.2
and 0.7.
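Taking compactness as the fraction of the bounding box covered by the component's pixels, as described above, a minimal sketch is:

```python
# Sketch of the compactness feature: the fraction of a component's
# bounding box that is occupied by its pixels.

def compactness(pixels):
    """pixels is a set of (x, y) coordinates of one component."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return len(pixels) / (width * height)
```

A filled rectangle (like the body of an “l”) gives 1.0, while stroke-built characters leave most of the box empty and fall in the middle range, matching the histogram of Figure 6-3.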
Real data for compactness are illustrated in Figure 6-4. Components resulting from
the segmentation methods discussed in this thesis were manually classified as
characters or background components. Generally, the number of transitions roughly
correlates with compactness, and most characters present a number of transitions at
the middle range.

6.1.2. Multi-Dimensional Feature Spaces

Using just one feature to separate the two classes of components is not an option,
due to the considerable variety of background components, as explained before.
Combinations of two and three features were tried in an attempt to see if a decision
boundary could be defined in a 2D or 3D feature space. It is important that the
features combined are as little correlated as possible, so that the best separation of the
two classes is obtained.
Figure 6-5 – Scatter plot of Ideal Character data in the 2D feature space of Aspect Ratio and
Compactness. Areas of interest are marked.

The feature space of Aspect Ratio and Compactness was one of the 2D spaces
evaluated. A scatter plot of ideal character data in the feature space is shown in Figure
6-5. As expected, most data are concentrated in the centre (yellow) marked area.
Elongated characters such as “l” and “i” present significant compactness and low
aspect ratio, so they are concentrated in the area (red) at the top left of the scatter plot.
The third area, marked with green, at the right top of the scatter plot might appear at
first mysterious, since there are no characters that are both rectangular (aspect ratio
near 1) and filled (compactness near 1). This third set of points actually comes from
the dots over characters such as “i” and “j”, which are both compact and rectangular.
The accumulator array is examined after the Hough transform has taken place, and
the cell (or cells) with the maximum count is identified. A possible text line is
identified, having the parameters of the cell, and the associated components are
recorded as part of the line. The components associated with the exported line are
removed from the array of the components of this size-group, and the same process is
repeated with the rest of the points each time identifying the cell (and corresponding
line) with the maximum count. The process stops when no cell exists with a count of
more than three. The pseudo-code for the above process is given in Figure 6-11.
For Each SizeGroup {
    While (Number of Components in SizeGroup >= 3) {
        For Each Component in the SizeGroup {
            Find the Centre of Gravity and add it to the ListOfPoints
        }
        Perform Hough Transform at the ListOfPoints
        Find the MaximumCount in the accumulator Cells
        If MaximumCount < 3 Then Continue with next SizeGroup
        For Each Cell {
            If Count equals MaximumCount Then {
                Identify the components falling in the Cell as a line
                Remove components from SizeGroup
            }
        }
    }
}
Figure 6-11 – Pseudo-code for the line identification process.
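The accumulator search of Figure 6-11 can be sketched for a list of centres of gravity; the quantisation values (`rho_step`, `theta_steps`) are illustrative assumptions, not those of the thesis:

```python
import math

# Sketch of a Hough transform over component centres of gravity:
# each point votes for the (rho, theta) cells of lines passing through it,
# and cells reaching the minimum count of three yield candidate text lines.

def hough_lines(points, rho_step=5.0, theta_steps=180, min_count=3):
    acc = {}
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (round(rho / rho_step), t)
            acc.setdefault(cell, []).append((x, y))
    # keep only cells reaching the minimum of three components per line
    return [pts for pts in acc.values() if len(pts) >= min_count]
```

Coarser `rho_step` quantisation merges near-identical lines into one cell, which is the mechanism the text relies on to limit duplicate candidate lines.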
A point worth mentioning is that if more than one cell presents a count equal to
the maximum count, no special selection occurs. Instead, all candidate lines are
identified, covering all possibilities. By choosing the appropriate accumulator cell
dimensions (quantization) for the Hough transforms, such a situation can be limited.
This is because most of these cases occur as the same components identified as
slightly rotated or parallel moved lines (neighbouring accumulator cells).

As mentioned before, three is the minimum number of components requested for a
line to be identified. The rationale behind this decision is manyfold. First, three points
are the minimum needed to be able to assess the co-linearity between them. The main
argument against using only three points would be that statistically three points would
be just enough to give an indication, but not to define
There are many statistical distribution moments defined that characterize the
variability of the distribution around its mean value. The most common is probably
the variance2, which is defined in Eq. 6-4, and the standard deviation, defined in Eq.
6-5. There is also a computationally inexpensive estimator called the average
deviation or mean absolute deviation, defined in Eq. 6-6. For our purpose, the
computational advantage of using the average deviation is minimal. The standard
deviation is used here, as it gives a metric that can be directly compared to the actual
units of the image.
$$\mathrm{Var}(x_1 \ldots x_n) = \frac{1}{n-1} \sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2$$   Eq. 6-4

$$\sigma(x_1 \ldots x_n) = \sqrt{\mathrm{Var}(x_1 \ldots x_n)}$$   Eq. 6-5

$$\mathrm{ADev}(x_1 \ldots x_n) = \frac{1}{n} \sum_{j=1}^{n} \left| x_j - \bar{x} \right|$$   Eq. 6-6

The Standard Deviation is a metric of the typical deviation from the mean of the
values of the distribution. The value of Standard Deviation is given in image units,
since the values $x_i$ are expressed in image units. In order to associate this information
to the specific size range of the components of the line, the ratio of the Standard
Deviation to the Mean of the values (Eq. 6-7) was used.

$$\sigma_m(x_1 \ldots x_n) = \frac{\sigma(x_1 \ldots x_n)}{\bar{x}(x_1 \ldots x_n)}$$   Eq. 6-7
2 There is a lot of discussion about whether the denominator of Eq. 6-4 should be n or n-1. Briefly,
the denominator should be n if the mean value of the distribution is known beforehand (it is unbiased),
but it should be changed to n-1 if the mean value is computed from the data (thus it is biased). In
reality, if the difference between n and n-1 actually matters, then the data set is not big enough, and we
are probably trying to substantiate a questionable hypothesis with marginal data. Unfortunately, this is
frequently true for our case, where lines contain only a few characters each.
Sample Text
Figure 6-15 – Horizontal projection of a text line. Certain characteristics of text line projections such
as higher peaks at the base line, mid-line and top-line, and trailing caused by descending characters
are illustrated.
Assessing the similarity of the projections to the projection expected from ideal
characters, a decision can be made as to whether the given line resembles a text line or
not. Such an assessment involves a number of operations, such as obtaining the
projections, normalizing the values and performing some kind of similarity check.
These steps will be briefly described next.

Given the parameters of a line (ρ, θ) and a point (x, y), the vertical distance d
between the point and the line is given by Eq. 6-9. The sign of the distance indicates
whether the point is above or below the line.

$$d = \sqrt{x^2 + y^2}\,\cos\!\left(\theta - \arctan\frac{y}{x}\right) - \rho$$   Eq. 6-9
In terms of the identified lines of components, the distance of each pixel of the
components assigned to the line can be computed. The maximum distance a pixel may
have from the line can be calculated from known information about the quantisation
of the accumulator array of the Hough transform (related to the size-group the line
belongs to). The distance of each pixel of the components assigned to the line is
calculated and the histogram of distances is obtained. This is the projection of the
line’s components on the direction of the line.
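The distance-histogram step can be sketched as follows, using the algebraically equivalent form d = x·cosθ + y·sinθ − ρ; the bin width is an illustrative assumption:

```python
import math

# Sketch of the line-projection step: the signed distance of each pixel
# from the line (Eq. 6-9), accumulated into a histogram of distances.

def signed_distance(x, y, rho, theta):
    # equivalent to sqrt(x^2 + y^2) * cos(theta - arctan(y / x)) - rho
    return x * math.cos(theta) + y * math.sin(theta) - rho

def projection(pixels, rho, theta, bin_width=1.0):
    """Histogram of pixel distances: the projection onto the line direction."""
    hist = {}
    for x, y in pixels:
        d = round(signed_distance(x, y, rho, theta) / bin_width)
        hist[d] = hist.get(d, 0) + 1
    return hist
```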
Ideal Character Projections
The projections of ideal images of long sentences were obtained and sampled. Various
fonts were used (Arial and Times typefaces, various sizes). Furthermore, projections
of all the small and all the capital letters of the English alphabet were separately
obtained. Although this second type of projections would not be very relevant for
As can be seen in the figure, the projections have a well-defined distribution, with
two peaks of almost equal strength, one at the height of the baseline and the other at
the height of the top of the small characters. Trails are also visible on the left and right
of the main body of each projection, which arise from the descending, ascending and
capital characters. A visible peak also exists at the height of the strike-through line
(the middle of lowercase characters).
When only capital letters are present in the text line, the left and right trails do not
exist anymore. The two prominent peaks in this histogram are at the height of the
baseline and the height of the top of the capital characters. The characteristics of this
histogram, as well as the shifted positions of the dominant peaks indicate that a
different approach should be taken if such a line is encountered.
In this chapter, two methods were examined to classify the connected components
resulting from a segmentation process into two classes: text or background. The
difference of the two methods is the scale at which the problem was addressed.
Specifically, the first attempt tries to classify components by looking into features of
individual components, whereas the second attempt works at a different scale, trying
to identify lines of components that share some common features.

It proves that, for the specific problem of Web images, a classification method
based on individual components is unable to provide good results. The second
approach, based on identifying lines of text, works better in a variety of situations, and
is therefore the one used here for classifying components after segmentation.

It should be mentioned at this point that any character-like component
identification process is strongly dependent on a correct segmentation. Slightly
inaccurately segmented characters, which is a rather frequent situation for Web
images, may result in wrong assessments.

An evaluation of the connected component classification method based on text
line identification will be presented in Chapter 7.
In this chapter, the results for the two segmentation methods (Split-and-Merge,
Chapter 4, and Fuzzy, Chapter 5) and the character component classification
method (see Chapter 6) are presented and critically appraised. The evaluation of the
methods was based on a dataset of images collected from numerous Web pages. A
description of the dataset used is given in Section 7.1. The two segmentation methods
presented in Chapters 4 and 5 are evaluated on the same dataset, and statistical results
are given for each one, as well as characteristic examples (Sections 7.2.1 and 7.2.2).
Then a comparison between the two segmentation methods is made in Section 7.2.3.
The text line identification method is subsequently evaluated in Section 7.3. Finally,
an overall discussion is given in Section 7.4. The ability of the text extraction method
to decide whether an image contains text or not is a desirable property, but is
considered to be out of the scope of this thesis.
7.1. Description of the Dataset
In order to evaluate the methods described in this thesis, a dataset of images collected
from a variety of Web pages was used. To achieve a representative sampling, the
images of the dataset originate from Web pages that the average user would be
interested in browsing. Sites of newspapers, companies, academic sites, e-commerce
sites, search engines etc. were included in the sample. All the images in the
Text Segmentation in Web Images Using Colour Perception and Topological Features
Figure 7-2 – (a) Image containing small, but readable text. (b) Image containing large but
non-readable text.
The number of characters in the images of the dataset ranges from 2 to 83. An
average image was found to have around 20 characters, out of which around 16 are
readable. In total, the images in the dataset contain 2,404 characters, out of which
1,852 are classified as readable and 552 as non-readable.
The images in the dataset were grouped into four categories according to the
colour combinations used. Category A holds images that contain multicolour
characters over multicolour background. Category B contains images that have
multicolour characters over single-colour background. Category C has images with
single-colour characters over multicolour background. Finally, Category D holds
images with single-colour characters rendered over single-colour background. The
grouping of images into the four categories is shown in Figure 7-3, while the number
of characters per category is shown in Figure 7-4.
Both segmentation methods described in Chapters 4 and 5 were evaluated on all
the images contained in the dataset. The aim of the segmentation process is to
partition the image into disjoint regions, in such a way that the text is separated from
the background, and characters are not split into sub-regions or merged together.
The evaluation of the segmentation methods was performed by visual inspection.
This assessment can be subjective, since the borders of the characters are not precisely
defined in most cases (due to anti-aliasing or other artefacts caused by compression).
Nevertheless, since no other information is available about which pixel belongs to a
character and which to the background (no ground truth information is available for
Web images), visual inspection is the only method of assessment that can be used.
This method of assessment is in agreement with previous research on Web image text
extraction; Lopresti and Zhou [106], for instance, evaluate their segmentation results
in the same way. Since visual assessment is inherently subjective, in cases where it is
not clear whether a character-like component contains any pixels of the background
or not, the evaluator decides on the outcome based on whether, seeing the component
on its own, they can recognise the character or not. The foundation for this is that even
if a few pixels have been misclassified, as long as the overall shape can still be
recognised, the character will be recognisable by OCR software. Even though in
many cases a human could still read the text in question even if some pixels are
missed (or added), OCR processes tend to be much more sensitive; hence, we err on
the side of being conservative.
The following rules apply regarding the categorisation of the results. Each
character contained in the image is characterised as identified, merged, broken or
missed. Identified characters are those that are described by a single component.
Broken ones are the characters described by more than one component, as long as
each of those components contains only pixels of the character in question (not any
background pixels). If two or more characters are described by only one component,
yet no part of the background is merged in the same component, then they are
characterised as merged. Finally, missed are the characters for which no component
or combination of components exists that describes them completely without
containing pixels of the background.
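These four rules can be written down compactly. The sketch below assumes a hypothetical representation in which each component covering a character is reduced to the set of labels ('bg' for background, otherwise character identifiers) of the pixels it contains; it is only an illustration of the rules, not the evaluation tool used in the thesis:

```python
def categorise(char_id, components):
    """Classify one character as identified / broken / merged / missed.

    components: list of label sets, one per connected component that
    overlaps the character; 'bg' marks background pixels.
    """
    # only components free of background pixels can describe a character
    clean = [c for c in components if 'bg' not in c]
    if not clean:
        return 'missed'        # nothing background-free describes it
    if len(clean) == 1:
        # a single component: identified if it holds only this character,
        # merged if it also holds other characters
        return 'identified' if clean[0] == {char_id} else 'merged'
    if all(c == {char_id} for c in clean):
        return 'broken'        # several components, each purely this character
    return 'merged'
```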
                 Time to Split (sec)   Time to Merge (sec)   Total Time (sec)
Category A             0.445               414.701               415.146
Category B             0.313               558.649               558.962
Category C             0.366               609.529               609.895
Category D             0.182                33.774                33.956
All Categories         0.286               320.396               320.683
Table 7-3 – Average execution times for the Split and Merge segmentation method per category.
Two factors play an important role in terms of execution time: the size of the
image in question, and the complexity of its colour scheme. The first factor is
self-explanatory: the bigger an image is, the more time is required for each operation.
The second factor, namely the colour content of the image, or else the total number of
colours it contains, affects the method in a different way. First, the more colours an
image has, the more peaks will be identified in the histograms and the more layers
will be produced. This affects both the splitting and the merging process. The number
of components produced also increases as the number of colours (and layers) of the
image grows. Based on those observations, the role of those two factors was studied.
The images of the dataset were broken into different size groups and colour
content groups, depending on the total number of pixels (width x height) and the total
number of colours they have. The time required for the splitting and the merging
phase for each image in the dataset was recorded, and the results can be seen in the
figures below. Two remarks can be made here: the execution time increases roughly
exponentially as the size and colour complexity of the images rise, and the splitting
stage takes negligible time compared to the merging stage.
colour schemes that the text is of similar colour to the surrounding background.
Figure 7-11 illustrates one of the difficult cases that the currently used thresholds
cannot cope with. Most of the characters here are split in two components.
Original Final Segmentation Segmented Characters
Figure 7-11 – An image containing characters of gradient colour.
A different problem appears when characters are overlapping. In the case of
Figure 7-12 the colours of the characters are also mixed. As can be seen in the final
segmentation image, the three characters are well separated into three connected
components, each complete enough to recognise. In this case, parts of "U" and "C"
have been assigned to "P". In order to get all three characters right, those common
areas should belong to more than one character, which is not allowed under the
current segmentation requirement for disjoint, non-overlapping regions. Nevertheless,
although not visible in Figure 7-12, the vexed areas (possible extensions) of the
components representing "U" and "C" actually consist of exactly the missing parts of
the characters. This could be used as an extra feature for recognition, as will be
discussed next.
Original Final Segmentation Segmented Characters
Figure 7-12 – An image containing overlapping characters.
Antialiased characters can be a significant problem during segmentation. If the
characters are thick enough, antialiasing will create a soft edge, but will leave enough
One more example of multi-coloured characters can be seen in Figure 7-22.
Similarly to the previous example, the originally split (after the initial connected
component analysis) characters are finally correctly identified, leaving the
background and the shadow out of the final character components. The only exception
here is character "G", which is finally broken in two components. This is because the
letter gets too thin at its lower right side, and subsequently the two parts of the
character do not touch adequately for the algorithm to combine them. The small
characters on the right present high contrast to the background and are thus identified
correctly to a great extent. Of the non-readable characters, only one is correctly
identified.
Original Initial Components
Final Segmentation Segmented Characters
Figure 7-22 – An image containing multi-coloured characters (each "Google" character contains a
number of different shades of its main colour) with shadows. The small characters on the right are
classified as readable, while the "TM" characters at the upper-right side of "Google" are classified as
non-readable ones.
An image with overlapping characters can be seen in Figure 7-23. The Fuzzy
method manages to correctly segment half of the characters in this image. It finds the
common parts of the characters equally different in colour from both characters they
belong to, and thus does not assign them to any character specifically. In numerous
cases, the resulting characters maintain their overall shape (e.g. character "e" above),
and are thus considered identified for recognition purposes. In contrast, characters
like "b" and "a" above, whose overall shape is not maintained in a single component,
are not considered identified. As for the non-readable characters "TM" at the upper
right side of the logo, one out of the two is correctly identified.
Figure 7-28 – An example of an image with gradient text. The Split and Merge method merges parts of
the character with the background.
Generally, the Fuzzy segmentation method produces much "cleaner"
segmentations. The reason is that the Fuzzy segmentation method appears to deal
much better with the small components of the image. While the Split and Merge
method shows a tendency to merge medium-sized components, the Fuzzy method has
a tendency to merge the small ones. By doing so, two aspects of the final
segmentation are affected. First, the small components surrounding characters (like
parts of the outlines, the antialiasing areas or the shadows) are merged with either the
character or the background, leaving in many cases thinner characters (e.g. Figure
7-26 and Figure 7-29), but also producing a much cleaner output. The fact that the
final segmentations produced by the Fuzzy method are much cleaner is reflected in
the results of the Character Component Classification process, as will be discussed
next.
The second, and most important, effect of this tendency of the Fuzzy method is
that in most cases the non-readable (extremely small) characters of the images are
much better segmented by the Fuzzy method than by the Split and Merge one. An
example of this can be seen in Figure 7-29. Further proof comes from comparing
the performance of the two methods for the non-readable characters of the images. As
can be seen in Figure 7-30 (page 239), the rates for the correct identification of
non-readable components are always higher for the Fuzzy segmentation method
than for the Split and Merge one.
when the text line is identified. The character most often missed is "i", as illustrated
in Figure 7-32. Although the size ranges for the diagonals were selected so that all
characters of the same font would be grouped together, there are some cases where
very small (or very large) characters are not grouped correctly.
Figure 7-32 – (a) A segmentation produced by the Fuzzy method. (b) One of the extracted lines
identified as text. Character “i” is missing here because it was not considered as part of the same size
group as the rest of the characters on the line.
It should be mentioned at this point that two-part characters (like "i" and "j") are
split in two components by the segmentation methods. The CCC method aims to
classify the bigger part of such characters, without any special consideration for the
dots above them. The dots of such characters can be retrieved at a later
post-processing stage of the CCC method. Such a post-processing stage is not
currently implemented.
Concerning the time performance of the CCC method, it never takes above 0.007
seconds on a Pentium IV running at 1.8GHz.
7.4. Discussion
In this chapter, the results of both segmentation methods and the Connected
Component Classification method were presented and critically appraised. The
dataset of images used for the assessment was also described, and the categorisation
used for the images was explained. Finally, possible applications of the methods to
domains other than Web text extraction were studied.
Overall, the segmentation methods yield character identification rates around
70%, which is a very promising result. The Split and Merge method works better for
easier images containing single-coloured characters, while the Fuzzy method
This thesis examined the problem of text extraction from Web images. A
summary of the contents of this thesis will be given next, followed by a
discussion based on the aims and objectives set at the beginning. The main
contributions and limitations of this research are detailed in the last section, along
with a summary of future possibilities suggested and new directions identified.
Finally, the applicability of the methods presented to domains other than Web image
text segmentation is investigated.
8.1. Summary
This thesis was divided in three parts. First, the theoretical background of the problem
was presented. The description of the new methods developed followed, and the thesis
closed with an evaluation of the methods presented.
After a brief introduction to the problem, and the establishment of the aims and
objectives of this research in Chapter 1, the theoretical background for this research
was given in Chapters 2 and 3. Chapter 2 gave a literature review of segmentation
methods, focused mostly on methods created for colour images. Chapter 3 specialised
in text image segmentation and classification of the results. Previous work on web
The segmentation and classification methods were described in Chapters 4 to 6.
Chapter 4 detailed the first segmentation method, which works in a split and merge
fashion, based on human colour discrimination information. In Chapter 5 the second
segmentation method developed was described. This segmentation method is based
on a fuzzily defined propinquity value, which is a metric of closeness between
components that takes into account both colour distance and topological information
between them. Chapter 6 addressed the problem of classifying the connected
components produced by the segmentation method as character or non-character
ones. The classification method developed assesses lines of similar-sized
components, and decides whether they resemble lines of text or not.
Finally, an evaluation of the two segmentation methods and the connected
component classification method was performed in Chapter 7. A dataset of images
collected from numerous web pages was introduced, and the segmentation methods
were tested against the images in the dataset. The connected component
classification method was subsequently tested on segmentations resulting from both
the split and merge and the fuzzy segmentation methods.
8.2. Aims and objectives revisited
This research aimed at identifying novel ways to extract text from images used in the
World Wide Web. Towards this, two segmentation methods and one classification
method were developed. The methods developed were built on observations made
over a number of web images.
The segmentation methods presented are generally able to cope with complicated
colour schemes, including gradient text and background, and various instances of
antialiasing. Both segmentation methods are able to correctly segment the characters
contained in the images in around 70% of the cases. At the same time, the
classification method presents a high recall (around 80%) and an average precision
rate (around 60%).
The assumptions of the method were kept to an absolute minimum. The only
assumption made for the segmentation method was that the text in every image is
written in such a way that a human being is able to read it. This mainly entails that the
contrast between the characters and the background should be higher than the
contrast between pixels of the same class. The anthropocentric character of both
segmentation
The main problem that the method presents when used to segment coloured
scanned book and magazine covers is the handling of dithered areas. Due to the way
coloured publications are printed, dithering is extensively used to produce colours.
The segmentation method cannot correctly handle dithered areas, because the dots
used to create the dither are of dissimilar colour, as can be seen in Figure 8-1. In such
cases, the initial connected component analysis produces a vast number of
components, of inadequately similar colour for the aggregation stage to combine, and
the segmentation fails. A solution to that problem would be to downscale the image
by averaging over neighbourhoods of pixels. That would produce areas of similar
colour, but also smaller characters.
Figure 8-1 – (a) Part of a scanned magazine cover. (b) Magnified region, where dithering is visible.
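The suggested downscaling by averaging over neighbourhoods of pixels is a plain box filter. A minimal sketch for a single-channel image stored as a list of rows (factor 2, even dimensions assumed; for a colour image each channel would be averaged separately):

```python
def downscale_2x(img):
    """Average each non-overlapping 2x2 block of pixel values into one pixel."""
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(len(img[0]) // 2)
        ]
        for r in range(len(img) // 2)
    ]
```

Averaging the dissimilar dots of a dithered area yields approximately the intermediate colour the dither was simulating, which is consistent with the good results reported below for already downscaled covers.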
Instead of scanning and downscaling a number of book covers, we opted to test
the segmentation method with already downscaled images of book covers obtained
from "amazon.com". This online business displays small-sized (~350 x 475 pixels)
versions of the scanned covers for each book offered, so a huge dataset of downscaled
book covers is readily available. The segmentation method was able to correctly
segment an average of around 80% of the characters in those images, while the
execution time was on average 40 seconds for each image.
Another test performed was with video frames containing superimposed text.
Video frames are very similar to web text in many aspects. The segmentation method
Appendix A – Colour Vision and Colorimetry, A Discussion on Colour Systems
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.412453 & 0.357580 & 0.180423 \\
0.212671 & 0.715160 & 0.072169 \\
0.019334 & 0.119193 & 0.950227
\end{bmatrix}
\begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix}
\qquad \text{Eq. A-2}
\]
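Eq. A-2 is a plain matrix-vector product; the following sketch applies it directly (assuming linear, i.e. not gamma-corrected, RGB components in [0, 1]):

```python
# RGB (ITU-R BT.709, linear) to CIE XYZ, per Eq. A-2.
M = [
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
]

def rgb_to_xyz(r, g, b):
    """Multiply the (R, G, B) column vector by the matrix of Eq. A-2."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M)
```

Note that each row sums to the white-point tristimulus value, so (1, 1, 1) maps to the reference white.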
A.2.2. HLS
The RGB colour system is hardware oriented. Numerous other colour systems exist
which are user-oriented, meaning that they are based on more human-oriented
concepts. Colour systems like HSV, HLS, HSI etc. fall in this category. Such systems
usually employ three components, namely Hue, Saturation and Lightness (or Value,
or Intensity, or Brightness), corresponding to the notions of tint, shade and tone.
Specifically, Hue represents the "redness", "greenness", "blueness" etc. of the colour
and corresponds to the colorimetric term Dominant Wavelength. Lightness denotes
the perceived Luminance (as a matter of fact, for self-luminous objects the correct
term is Brightness). Saturation denotes the purity of the colour, corresponding to the
colorimetric term Excitation Purity. The lower the Saturation, the closer the colour is
to grey.
Figure A-4 – The HLS double hexcone colour space.
To transform between the HLS and CIE XYZ systems, an intermediate
transformation to RGB (sRGB) is needed; then the matrix transformation given in
section A.2.1 is used.
A.2.3. CIE XYZ
The CIE XYZ system has already been described in the previous section. The three
components of this system are the tristimulus values computed to match any given
colour, based on the x̄, ȳ and z̄ colour matching functions for the CIE Standard
Observer. The CIE XYZ colour space is illustrated in Figure A-5.
Figure A-5 – The CIE XYZ Colour Space.
It is convenient, for both conceptual understanding and computation, to have a
representation of "pure" colour in the absence of luminance. The CIE standardised a
procedure for normalising XYZ tristimulus values to obtain two chromaticity values x
and y (Eq. A-7). A third chromaticity value, z, can be computed similarly; however, it
carries no additional information, since x + y + z = 1.
The CIE LUV colour system is defined by Eq. A-14:
\[
L^* = 116\left(\frac{Y}{Y_n}\right)^{1/3} - 16, \qquad
u^* = 13L^*(u' - u'_n), \qquad
v^* = 13L^*(v' - v'_n)
\qquad \text{Eq. A-14}
\]
As can be seen, the definition of L* is the same here as in Eq. A-8 and the same
constraint applies, namely if Y/Yn is less than 0.008856, Eq. A-9 should be used. The
quantities u′, v′, u′n and v′n are calculated from:
\[
u' = \frac{4X}{X + 15Y + 3Z}, \qquad
v' = \frac{9Y}{X + 15Y + 3Z}, \qquad
u'_n = \frac{4X_n}{X_n + 15Y_n + 3Z_n}, \qquad
v'_n = \frac{9Y_n}{X_n + 15Y_n + 3Z_n}
\qquad \text{Eq. A-15}
\]
The tristimulus values Xn, Yn and Zn are, similarly to CIE LAB, those of the
nominally white object-colour stimulus. The colour difference between two colour
stimuli is calculated from:
\[
\Delta E^*_{uv} = \left[(\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2\right]^{1/2}
\qquad \text{Eq. A-16}
\]
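Eq. A-14 and Eq. A-15 combine into a short conversion routine. A sketch, assuming the white point implied by the matrix of Eq. A-2, and using the standard low-luminance branch L* = 903.3·(Y/Yn) below the 0.008856 threshold (Eq. A-9):

```python
def xyz_to_luv(X, Y, Z, Xn=0.950456, Yn=1.0, Zn=1.088754):
    """CIE 1976 L*u*v* from tristimulus values (Eq. A-14 and Eq. A-15)."""
    def chromaticities(X, Y, Z):
        d = X + 15.0 * Y + 3.0 * Z          # denominator of Eq. A-15
        return 4.0 * X / d, 9.0 * Y / d
    u_p, v_p = chromaticities(X, Y, Z)
    un_p, vn_p = chromaticities(Xn, Yn, Zn)
    t = Y / Yn
    # cube-root branch of Eq. A-14, linear branch of Eq. A-9 for dark colours
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t
    return L, 13.0 * L * (u_p - un_p), 13.0 * L * (v_p - vn_p)
```

By construction the nominal white maps to L* = 100 with u* = v* = 0.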
If two coloured lights C1 and C2 are mixed additively to produce a third colour,
C3, and the three of them are plotted on a chromaticity diagram, then C3 should
ideally lie on the straight line joining C1 and C2, at a position that can be calculated
from the relative amounts of C1 and C2 in the mixture. The CIE 1931 (x, y)
chromaticity diagram (Figure A-6) and the CIE LUV derived (CIE 1976 UCS)
chromaticity diagram (if u* is plotted against v* for constant L*) exhibit this property,
Figure B-1 – (a) The definition of “weekend days” using a classical set. (b) The definition of “weekend
days” using a fuzzy set. Although technically Friday should be excluded from the set, it feels like part
of the weekend, so in the latter case it is assigned a partial membership to the set.
The concept of the Fuzzy Set is of great importance to fuzzy logic. A Fuzzy Set is
a set without a clearly defined boundary. In a classical definition, a set is a container
that wholly includes or wholly excludes any given element. There is no such thing
as an element being both part of the set and not part of the set; consequently, of any
subject, one thing must be either asserted or denied, must be either true or false. A
fuzzy set on the other hand, allows elements to be partially members of it. For
example, trying to define a set of days comprising the weekend, one would agree that
Saturday and Sunday belong to the set, but what about Friday? Although it should
technically be excluded, it "feels" like part of the weekend (Figure B-1). Fuzzy sets are
used to describe vague concepts (weekend days, hot weather, tall people).
In fuzzy logic, the truth of any statement becomes a matter of degree. Friday can
be part of the weekend to a certain degree. So, instead of assigning true ( 1) or false ( 0)
membership values to each element, fuzzy logic allows for in-between degrees of
membership to be defined (Friday is sort of a weekend day, the weather is rather hot).
Everyday life suggests that this second approach is much closer to the way people
think. For the above example, one could give a value representing the degree of
membership of each day of the week to the fuzzy set called "weekend". A
Membership Function is a curve that defines how each point in the input space is
mapped to a membership value (or degree of membership) between 0 and 1. The input
space is sometimes referred to as the Universe of Discourse.
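Any curve into [0, 1] can serve as a membership function; a triangular shape is a common minimal choice. In the sketch below the breakpoints a, b and c are illustrative parameters, not values from the thesis:

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], rising linearly
    to 1 at x == b, then falling linearly back to 0."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```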
Since we allowed for partial memberships to fuzzy sets (degrees of membership
between true and false), the logical reasoning has to change slightly to accommodate
this. Fuzzy logical reasoning is a superset of standard Boolean Logic: if we keep the
fuzzy values at their extremes (1 and 0), standard logic operations should hold.
Consequently, the AND, OR and NOT operators of standard Boolean Logic have to be
replaced by operations that behave in the same way if the fuzzy values are at their
extremes. One widely used set of operations is illustrated in Figure B-2. A AND
B becomes min(A, B), A OR B becomes max(A, B) and NOT(A) becomes 1 - A.
AND:                    OR:                     NOT:
  A  B | A and B | min(A,B)    A  B | A or B | max(A,B)    A | not A | 1 - A
  0  0 |    0    |    0        0  0 |   0    |    0        0 |   1   |   1
  0  1 |    0    |    0        0  1 |   1    |    1        1 |   0   |   0
  1  0 |    0    |    0        1  0 |   1    |    1
  1  1 |    1    |    1        1  1 |   1    |    1
Figure B-2 – Boolean Logic and Fuzzy Logic operations. If we keep the fuzzy values at their extremes
(1 and 0) standard Boolean Logic operations hold.
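The operations of Figure B-2 are one line each; the assertions in the sketch check that, at the extremes 0 and 1, they reduce to the classical truth tables:

```python
def fuzzy_and(a, b): return min(a, b)   # generalises Boolean AND
def fuzzy_or(a, b):  return max(a, b)   # generalises Boolean OR
def fuzzy_not(a):    return 1.0 - a     # generalises Boolean NOT

# At the extremes the classical truth tables are recovered:
for a in (0, 1):
    for b in (0, 1):
        assert fuzzy_and(a, b) == (a and b)
        assert fuzzy_or(a, b) == (a or b)
```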
B.3. If-Then Rules
Fuzzy sets and fuzzy operators are the verbs and subjects of fuzzy logic. A set of
if-then statements called rules is used to formulate the conditional statements that
comprise fuzzy logic. A single fuzzy rule assumes the form: “if x is A then y is B ”,
where A and B are linguistic values defined by fuzzy sets. The if-part of the rule is
called the antecedent, while the then-part of the rule is called the consequent. An
example of a fuzzy rule might be “if distance from crosswalk is short, then braking is
hard”. Here “short” is represented as a number between 0 and 1, therefore the
antecedent is an interpretation that returns a single number between 0 and 1. On the
other hand, “hard” is represented as a fuzzy set, so the consequent is an assignment
that assigns the entire fuzzy set “hard” to the output variable “braking”.
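The braking rule can be sketched as follows. The membership shapes used here (distances of 0 to 20 m counting as increasingly "short", braking effort measured in [0, 1]) are invented for illustration, and the consequent set is clipped at the truth value of the antecedent (min implication):

```python
def short_distance(d_metres):
    """Degree to which a distance is 'short' (illustrative shape)."""
    return max(0.0, min(1.0, (20.0 - d_metres) / 20.0))

def hard_braking(effort):
    """Degree to which a braking effort in [0, 1] is 'hard' (illustrative)."""
    return effort

def apply_rule(d_metres, efforts):
    """'if distance is short then braking is hard': evaluate the antecedent,
    then clip the consequent fuzzy set at that truth value."""
    truth = short_distance(d_metres)
    return [min(truth, hard_braking(e)) for e in efforts]
```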
The use of the word “is” here is twofold, depending on whether it appears in the
antecedent or the consequent. In the antecedent “is” has the meaning of a relational
1. T. Akiyama and N. Hagita, "Automated Entry System for Printed Documents," Pattern Recognition, vol. 23, pp. 1141-1154, 1990.
2. M. d. B. Al-Daoud and S. A. Roberts, "New Methods for the Initialisation of Clusters," Pattern Recognition Letters, vol. 17, pp. 451-455, 1996.
3. M. Ali, W. N. Martin, and J. K. Aggarwal, "Color-Based Computer Analysis of Aerial Photographs," Computer Graphics and Image Processing, vol. 9, pp. 282-293, 1979.
4. D. Amor, The E-Business (R)evolution: Living and Working in an Interconnected World, 2nd ed. New Jersey: Prentice Hall, 2001.
5. A. Antonacopoulos, "Page Segmentation Using the Description of the Background," Computer Vision and Image Understanding, vol. 70, pp. 350-369, 1998.
6. A. Antonacopoulos and A. Brough, "Methodology for Flexible and Efficient Analysis of the Performance of Page Segmentation Algorithms," Proc. of 5th International Conference on Document Analysis and Recognition, Bangalore, India, 20-22 September 1999.
7. A. Antonacopoulos and F. Delporte, "Automated Interpretation of Visual Representations: Extracting Textual Information from WWW Images," in Visual Representations and Interpretations, R. Paton and I. Neilson, Eds. London: Springer, 1999.
8. A. Antonacopoulos and A. Economou, "A Structural Approach for Smoothing Noisy Peak-shaped Analytical Signals," Chemometrics and Intelligent Laboratory Systems, vol. 41, pp. 31-42, 1998.
9. A. Antonacopoulos, D. Karatzas, and J. Ortiz Lopez, "Accessing Textual Information Embedded in Internet Images," Proc. of SPIE Internet Imaging II, San Jose, USA, 24-26 January 2001, pp. 198-205.
24. R. L. Cannon, R. L. Dave, and J. C. Bezdek, "Efficient Implementation of Fuzzy c-means Clustering Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, 1986.
25. J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679-698, 1986.
26. M. J. Carlotto, "Histogram Analysis Using a Scale-Space Approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-9, pp. 121-129, 1987.
27. S. G. Carlton and O. R. Mitchell, "Image Segmentation Using Texture and Gray Level," Proc. of IEEE Conf. Pattern Recognition and Image Processing, Troy, New York, 6-8 June 1977, pp. 387-391.
28. T. Carron and P. Lambert, "Color Edge Detection Using Jointly Hue, Saturation and Intensity," Proc. of IEEE International Conference on Image Processing, ICIP'94, Austin, USA, 1994, pp. 977-981.
29. T. Carron and P. Lambert, "Fuzzy Color Edge Extraction by Inference Rules: Quantitative Study and Evaluation of Performance," Proc. of IEEE International Conference on Image Processing, ICIP'95, Washington DC, USA, 23-26 October 1995, pp. 181-184.
30. R. C. Carter and E. C. Carter, "CIE L*u*v* Color-Difference Equations for Self-Luminous Displays," Color Research and Application, vol. 8, pp. 252-253, 1983.
31. M. Celenk, "A Color Clustering Technique for Image Segmentation,"
32. . Chen, W. C. Lin, and C. T. Chen, "Split-and-Merge Image Segmentation Based on Localized Feature Analysis and Statistical Tests," Graphical Models and Image Processing, vol. 53, pp. 457-475, 1991.
33. Y. P. Chien and K. S. Fu, "Preprocessing and Feature Extraction of Picture Patterns," Purdue University, West Lafayette, Indiana, TR-EE 74-20, 1974.
34. C. K. Chow and T. Kaneko, "Automatic Boundary Detection of the Left Ventricle from Cineangiograms," Comput. Biomed. Res., vol. 5, pp. 388-410, 1972.
35. P. Clark and M. Mirmehdi, "Recognising Text in Real Scenes," International Journal on Document Analysis and Recognition, vol. 4, pp. 243-257, 2002.
36. J. Cornelis, J. De Becker, M. Bister, C. Vanhove, G. Demonceau, and A. Cornelis, "Techniques for Cardiac Image Segmentation," Proc. of 14th IEEE EMBS Conference, Paris, France, 1992, pp. 1906-1908.
37. Y. Cui and Q. Huang, "Extracting Characters of Licence Plates from Video Sequences," Machine Vision and Applications, vol. 10, pp. 308-320, 1998.
38. J. F. Cullen and K. Ejiri, "Weak Model-Dependent Page Segmentation and Skew Correction for Processing Document Images," Proc. of 2nd International Conference on Document Analysis and Recognition, Tsukuba, Japan, October 20-22 1993, pp. 757-760.
39. L. S. Davis, "A Survey of Edge Detection Techniques," Computer Graphics and Image Processing, vol. 4, pp. 248-270, 1975.
40. L. S. Davis, "Hierarchical Generalized Hough Transforms and Line Segment Based Hough Transforms," Pattern Recognition, vol. 15, pp. 277-285, 1982.
41. J. De Becker, M. Bister, N. Langloh, C. Vanhove, G. Demonceau, and J. Cornelis, "A Split-and-merge Algorithm for the Segmentation of 2-d, 3-d, 4-d Cardiac Images," Proc. of IEEE Satellite Symposium on 3D Advanced Image Processing in Medicine, Rennes, France, 1992, pp. 185-189.
42. A. Dengel and G. Barth, "Document Description and Analysis by Cuts," Proc. of Conference on User-Oriented Content-Based Text and Image Handling, MIT, Cambridge MA, March 21-24 1988, pp. 940-952.
43. S. Di Zenzo, "A Note on the Gradient of a Multi-Image," Computer Vision, Graphics, and Image Processing, vol. 33, pp. 116-125, 1986.
44. W. Doyle, "Operations useful for similarity-invariant pattern recognition," J. Ass. Comput. Mach. , vol. 9, pp. 259-267, 1962.
45. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1970.
46. R. O. Duda and P. E. Hart, "Use of the Hough Transform to Detect Lines and Curves in Pictures," Communs ACM, vol. 15, pp. 11-15, 1972.
47. H. J. Durrett, Color and the Computer. Orlando, Florida, USA: Academic Press, Inc., 1987.
48. L. Eikvil, T. Taxt, and K. Moen, "A Fast Adaptive Method for Binarization of Document Images," Proc. of International Conference on Document Analysis and Recognition, ICDAR'91, France, 1991, pp. 435-443.
49. J. H. Elder and S. W. Zucker, "Local Scale Control for Edge Detection and Blur Estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, pp. 699-716, 1998.
50. J. A. Feldman and Y. Yakimovsky, "Decision Theory and Artificial Intelligence: A Semantic-based Region Analyser," Artificial Intelligence, vol. 5, pp. 349-371, 1974.
51. M. Ferraro, G. Boccignone, and T. Caelli, "On the Representation of Image Structures via Scale Space Entropy Conditions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, pp. 1199-1203, 1999.
52. J. L. Fisher, S. C. Hinds, and D. P. D'Amato, "A Rule-Based System for Document Image Segmentation," Proc. of 10th International Conference on Pattern Recognition, Atlantic City, New Jersey, U.S.A., 16-21 June 1990, pp. 567-572.
53. L. A. Fletcher and R. Kasturi, "A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918, 1988.
54. L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Scale and Differential Structure of Images," Image and Vision Computing, vol. 10, pp. 376-388, 1992.
55. J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice , 2nd ed: Addison-Wesley, 1996.
56. K. S. Fu and J. K. Mui, "A Survey on Image Segmentation," Pattern Recognition , vol. 13, pp. 3-16, 1981.
57. A. Gagneux, V. Eglin, and H. Emptoz, "Quality Approach of Web Documents by an Evaluation of Structure Relevance," Proc. of 1st International Workshop on Web Document Analysis, Seattle, USA, September 8 2001.
58. J. Gauch and C. W. Hsia, "A Comparison of Three Color Image Segmentation Algorithms in Four Color Spaces," Proc. of Visual Communications and Image Processing, 1992, pp. 1168-1181.
59. M. Goldberg and S. Shlien, "A Cluster Scheme for Multi-spectral Images," IEEE Trans. Systems, Man, Cybernet. , vol. SMC-8, pp. 86-92, 1978.
60. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3 ed: Addison-Wesley Publishing, 1993.
61. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2 ed: Addison-Wesley Publishing, 2001.
62. H. Goto and H. Aso, "Character Pattern Extraction from Documents with Complex Backgrounds," International Journal on Document Analysis and Recognition, pp. 258-268, 2002.
63. J. N. Gupta and P. A. Wintz, "Computer Processing Algorithm for Locating Boundaries in Digital Pictures," Proc. of Second International Joint Conference on Pattern Recognition, 1974, pp. 155-156.
64. J. N. Gupta and P. A. Wintz, "Multi-image Modelling," School of Electrical Engineering, Purdue University, Technical Report TR-EE 74-24, Sept. 1974.
65. E. R. Hancock and J. Kittler, "Edge-labelling Using Dictionary-based Relaxation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, pp. 165-181, 1990.
66. A. R. Hanson and E. M. Riseman, "Segmentation of Natural Scenes," in Computer Vision Systems, A. R. Hanson and E. M. Riseman, Eds. New York: Academic Press, 1978, pp. 129-164.
67. R. M. Haralick, "Zero Crossing of Second Directional Derivative Edge Operator," Proc. of the Society of Photo-Optical Instrumentation Engineers Technical Symposium East, Arlington, Virginia, May 3-7 1982.
68. R. M. Haralick, "Digital Step Edges From Zero Crossing of Second Directional Derivative," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, pp. 58-68, 1984.
69. R. M. Haralick, "Image Segmentation Techniques," Computer Vision, Graphics, and Image Processing, vol. 29, pp. 100-132, 1985.
70. R. M. Haralick, "Glossary of Computer Vision Terms," Pattern Recognition ,
vol. 24, pp. 69-93, 1991.
71. R. M. Haralick, I. Phillips, S. Chen, and J. Ha, "Document Zone Hierarchy and Classification," Proc. of IAPR International Workshop on Structural and Syntactic Pattern Recognition (SSPR'94), Nahariya, Israel, October 4-6 1994.
72. M. Hase and Y. Hoshino, "Segmentation Method of Document Images by Two-Dimensional Fourier Transformation," Systems and Computers in Japan, vol. 3, pp. 38-45, 1985.
73. P. Heckbert, "Color Image Quantization for Frame Buffer Display," Comput. Graph., vol. 16, pp. 297-307, 1982.
74. S. C. Hinds, J. L. Fisher, and D. P. D'Amato, "A Document Skew Detection Method Using Run-Length Encoding and the Hough Transform," Proc. of 10th International Conference on Pattern Recognition, Atlantic City, New Jersey, U.S.A., 16-21 June 1990, pp. 464-468.
75. S. L. Horowitz and T. Pavlidis, "Picture Segmentation by a Traversal Algorithm," Computer Graphics and Image Processing, pp. 360-372, 1972.
76. S. L. Horowitz and T. Pavlidis, "Picture Segmentation by a Directed Split-and-merge Procedure," Proc. of 2nd Int. Joint Conference on Pattern Recognition, Copenhagen, Denmark, 1974, pp. 424-433.
77. P. V. C. Hough, "Method and Means for Recognizing Complex Patterns," U.S. Patent 3,069,654, 18 Dec 1962.
78. M. Hueckel, "An Operator Which Locates Edges in Digital Pictures," J. Ass. Comput. Mach., vol. 18, pp. 113-125, 1971.
79. R. W. G. Hunt, Measuring Colour. West Sussex, England: Ellis Horwood Limited, 1987.
80. T. L. Huntsberger and M. F. Descalzi, "Color Edge Detection," Pattern Recognition Letters , vol. 3, pp. 205-209, 1985.
81. T. L. Huntsberger, C. L. Jacobs, and R. L. Cannon, "Iterative Fuzzy Image Segmentation," Pattern Recognition, vol. 18, pp. 131-138, 1985.
82. N. Ikonomakis, K. N. Plataniotis, and A. N. Venetsanopoulos, "A Region-based Color Image Segmentation Scheme," Proc. of SPIE Visual Communication and Image Processing, 1999, pp. 1202-1209.
83. J. Illingworth and J. Kittler, "The Adaptive Hough Transform," IEEE Trans. Pattern Anal. Mach. Intell. , vol. 9, pp. 690-698, 1987.
84. ITU-R BT.709, "Basic Parameter Values for the HDTV Standard for the Studio and for International Programme Exchange," International Telecommunications Union, ITU-R Recommendation BT.709 [formerly CCIR Rec. 709], Geneva, Switzerland: ITU, 1990.
85. O. Iwaki, H. Kida, and H. Arakawa, "A Segmentation Method Based on Document Hierarchical Structure," Proc. of IEEE International Conference on Systems, Man and Cybernetics, Alexandria, VA, Oct. 20-23 1987, pp. 759-763.
86. A. K. Jain and B. Yu, "Automatic Text Location in Images and Video Frames," PRIP Lab, Department of Computer Science, Michigan State University, Technical Report MSU-CPS-97-33, 1997.
87. A. K. Jain and B. Yu, "Automatic Text Location in Images and Video Frames," Pattern Recognition, vol. 31, pp. 2055-2076, 1998.
88. A. K. Jain and B. Yu, "Document Representation and Its Application to Page Decomposition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 294-308, 1998.
89. T. Kanade, "Region Segmentation: Signal vs Semantics," Computer Graphics and Image Processing, vol. 13, pp. 279-297, 1980.
90. T. Kanungo, C. H. Lee, and R. Bradford, "What Fraction of Images on the Web Contain Text?," Proc. of 1st International Workshop on Web Document Analysis, Seattle, USA, September 8 2001.
91. J. Kasson and W. Plouffe, "An Analysis of Selected Computer Interchange Color Spaces," ACM Transactions on Graphics, vol. 11, pp. 373-405, 1992.
92. M. Kelly, "Edge Detection by Computer Using Planning," in Machine Intelligence , vol. VI. Edinburgh: Edinburgh University Press, 1971, pp. 397-409.
93. H.-K. Kim, "Efficient Automatic Text Location Method and Content-Based Indexing and Structuring of Video Database," Journal of Visual Communication and Image Representation, vol. 7, pp. 336-344, 1996.
94. R. Kirsch, "Computer Determination of the Constituent Structure of Biological
Images," Comput. Biomed. Res. , vol. 4, pp. 315-328, 1971.
95. R. Kohler, "A Segmentation System Based on Thresholding," Computer Graphics and Image Processing , vol. 15, pp. 319-338, 1981.
96. M. Koppen, L. Lohmann, and B. Nickolay, "An Image Consulting Framework for Document Analysis of Internet Graphics," Proc. of 4th International Conference on Document Analysis and Recognition, Ulm, Germany, 18-20 August 1997, pp. 819-822.
97. J. J. Kulikowski, V. Walsh, and I. J. Murray, Limits of Vision , vol. 5. Boca
Raton, USA: Macmillan Press Ltd, 1991.
98. R. S. Ledley, M. Buas, and T. J. Golab, "Fundamentals of True-Color Image Processing," Proc. of 10th International Conference on Pattern Recognition, 1990, pp. 791-795.
99. Y. Leung, J. S. Zhang, and Z. B. Xu, "Clustering by Scale-Space Filtering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1396-1410, 2000.
100. H. Li, D. Doerman, and O. Kia, "Text Extraction, Enhancement and OCR in Digital Video," in Document Analysis Systems: Theory and Practice, vol. 1655, Lecture Notes in Computer Science, Y. Nakano and S.-W. Lee, Eds.: Springer, 1999, pp. 363-377.
101. H. Li, D. Doerman, and O. Kia, "Automatic Text Detection and Tracking in Digital Video," IEEE Transactions on Image Processing, vol. 9, pp. 147-156, 2000.
102. R. Lienhart and F. Stuber, "Automatic Text Recognition in Digital Videos,"Proc. of SPIE Volume: 2666 - Image and Video Processing IV, 1996, pp. 180-188.
103. Y. W. Lim and S. U. Lee, "On the Color Image Segmentation Algorithm Based on the Thresholding and the Fuzzy c-means Techniques," Pattern Recognition, vol. 23, pp. 935-952, 1990.
104. T. Lindeberg, "Scale-space for Discrete Signals," IEEE Trans. Pattern Anal. Mach. Intell. , vol. 12, pp. 234-254, 1990.
105. D. Lopresti and J. Zhou, "Document Analysis and the World Wide Web," Proc. of the Workshop on Document Analysis Systems, Malvern, Pennsylvania, October 1996, pp. 417-424.
106. D. Lopresti and J. Zhou, "Locating and Recognizing Text in WWW Images," Information Retrieval , vol. 2, pp. 177-206, 2000.
107. D. G. Lowe, "Three-dimensional Object Recognition from Single Two-dimensional Images," Artificial Intelligence , vol. 31, pp. 355-395, 1987.
108. Y. Lu and R. C. Jain, "Behavior of Edges in Scale Space," IEEE Trans. Pattern Anal. Mach. Intell. , vol. 11, pp. 337-356, 1989.
109. E. P. Lyvers and O. R. Mitchell, "Precision Edge Contrast and Orientation Estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, pp. 927-937, 1988.
110. L. W. MacDonald and M. R. Luo, Colour Imaging. West Sussex, England: John Wiley & Sons, 1999.
111. J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.
112. E. H. Mamdani and S. Assilian, "An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller," International Journal of Man-Machine Studies, vol. 7, pp. 1-13, 1975.
113. D. Marr and E. Hildreth, "Theory of Edge Detection," Proc. R. Soc. Lond. ,vol. B 207, pp. 187-217, 1980.
114. A. Martelli, "Edge Detection Using Heuristic Search Methods," Computer Graphics and Image Processing , vol. 1, pp. 169-182, 1972.
115. A. Martelli, "An Application of Heuristic Search Methods to Edge and Contour Detection," Communs ACM, vol. 19, pp. 73-83, 1976.
116. K. McLaren, "The Development of the CIE 1976 (L*a*b*) Uniform Colour Space and Colour-difference Formula," Journal of the Society of Dyers and Colorists, vol. 92, pp. 338-341, 1976.
117. S. Messelodi and C. M. Modena, "Automatic Identification and Skew Estimation of Text Lines in Real Scene Images," Pattern Recognition, vol. 32,
pp. 791-810, 1999.
118. M. Mirmehdi and M. Petrou, "Segmentation of Color Textures," IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 22, pp. 142-159, 2000.
119. A. Moghaddamzadeh and N. Bourbakis, "A Fuzzy Region Growing Approach for Segmentation of Color Images," Pattern Recognition, vol. 30, pp. 867-881, 1997.
120. A. Moghaddamzadeh, D. Goldman, and N. Bourbakis, "A Fuzzy-Like Approach for Smoothing and Edge Detection in Color Images," International Journal of Pattern Recognition and Artificial Intelligence, vol. 12, pp. 801-816, 1998.
121. U. Montanari, "On the Optimal Detection of Curves in Noisy Pictures," Communs ACM, vol. 14, pp. 335-345, 1971.
122. J. L. Muerle and D. C. Allen, "Experimental Evaluation of Techniques for Automatic Segmentation of Objects in a Complex Scene," in Pictorial Pattern Recognition, G. C. Cheng et al., Eds. Washington: Thompson, 1968, pp. 3-13.
123. F. Muge, I. Granado, M. Mengucci, P. Pina, V. Ramos, N. Sirakov, J. R. Caldas Pinto, A. Marcolino, M. Ramalho, P. Vieira, and A. Maia do Amaral, "Automatic Feature Extraction and Recognition for Digital Access of Books of the Renaissance," in Research and Advanced Technology for Digital Libraries, vol. 1923, Lecture Notes in Computer Science, J. Borbinha and T. Baker, Eds. Berlin Heidelberg: Springer-Verlag, 2000, pp. 1-13.
124. J. K. Mui, J. W. Bacus, and K. S. Fu, "A Scene Segmentation Technique for Microscopic Cell Images," Proc. of Symp. Computer Aided Diagnosis of Medical Images, San Diego, CA, 1976, pp. 99-106.
125. E. V. Munson and Y. Tsymbalenko, "To Search for Images on the Web, Look at the Text, Then Look at the Images," Proc. of 1st International Workshop on Web Document Analysis, Seattle, USA, September 8 2001, pp. 39-42.
126. G. Murch, "Color Displays and Color Science," in Color and the Computer, H. J. Durrett, Ed. Orlando, Florida: Academic Press Inc., 1987, pp. 1-25.
127. G. Nagy, J. Kanai, M. Krishnamoorthy, M. Thomas, and M. Viswanathan, "Two Complementary Techniques for Digitized Document Analysis," Proc. of ACM Conference on Document Processing Systems, Santa Fe, New Mexico, Dec. 5-9 1988, pp. 169-176.
128. G. Nagy and S. Seth, "Hierarchical Representation of Optically Scanned Documents," Proc. of 7th International Conference on Pattern Recognition, Montreal, Canada, 1984, pp. 347-349.
129. G. Nagy, S. Seth, and S. D. Stoddard, "Document Analysis with an Expert System," in Pattern Recognition in Practice II: North-Holland, 1986, pp. 149-159.
130. Y. Nakagawa and A. Rosenfeld, "Some Experiments on Variable Thresholding," Pattern Recognition, vol. 11, pp. 191-204, 1979.
131. R. Nevatia, "A Color Edge Detector and Its Use in Scene Segmentation," IEEE Transactions on Systems, Man and Cybernetics, vol. 7, pp. 820-826, 1977.
132. L. O'Gorman, "The Document Spectrum for Bottom-Up Page Layout Analysis," in Advances in Structural and Syntactic Pattern Recognition, H. Bunke, Ed.: World Scientific, 1992, pp. 270-279.
133. L. O'Gorman, "Binarization and Multithresholding of Document Images Using Connectivity," Graphical Models and Image Processing, vol. 56, pp. 494-506, 1994.
134. R. B. Ohlander, "Analysis of Natural Scenes," Ph.D. Dissertation, Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1975.
135. R. B. Ohlander, K. Price, and D. R. Reddy, "Picture Segmentation Using a Recursive Region Splitting Method," Computer Graphics and Image Processing, vol. 8, pp. 313-333, 1978.
136. Y. Ohta, T. Kanade, and T. Sakai, "Color Information for Region Segmentation," Computer Graphics and Image Processing, vol. 13, pp. 222-241, 1980.
137. J. Ohya, A. Shio, and S. Akamatsu, "Recognizing Characters in Scene Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 214-218, 1994.
138. M. T. Orchard and A. Bouman, "Color Quantization Techniques," IEEE Transactions on Signal Processing , vol. 39, pp. 2677-2690, 1991.
139. N. R. Pal and D. Bhandari, "On Object-Background Classification," Int. J. Syst. Sci., vol. 23, pp. 1903-1920, 1992.
140. N. R. Pal and S. K. Pal, "A Review on Image Segmentation Techniques," Pattern Recognition, vol. 26, pp. 1277-1294, 1993.
141. S. K. Pal, "Image Segmentation Using Fuzzy Correlation," Information Science, vol. 62, pp. 223-250, 1992.
142. N. Papamarkos, A. E. Atsalakis, and C. P. Strouthopoulos, "Adaptive Color Reduction," IEEE Transactions on Systems, Man and Cybernetics, vol. 32, pp. 44-56, 2002.
143. S. H. Park, I. D. Yun, and S. U. Lee, "Color Image Segmentation based on 3-D Clustering: Morphological Approach," Pattern Recognition, vol. 31, pp. 1060-1076, 1998.
144. P. Parodi and R. Fontana, "Efficient and Flexible Text Extraction from Document Pages," International Journal on Document Analysis and Recognition.
145. T. Pavlidis, "Segmentation of Pictures and Maps through Functional Approximations," Computer Graphics and Image Processing, vol. 1, pp. 360-372, 1972.
147. T. Perroud, K. Sobottka, H. Bunke, and L. O. Hall, "Text Extraction from Color Documents - Clustering Approaches in Three and Four Dimensions," Proc. of 6th International Conference on Document Analysis and Recognition, 2001, pp. 937-941.
148. K. P. Philip, "Automatic Detection of Myocardial Contours in Cine Computed Tomographic Images," PhD Thesis, University of Iowa, 1991.
149. M. Pietikainen and A. Rosenfeld, "Image Segmentation by Texture Using Pyramid Node Linking," IEEE Transactions on Systems, Man and Cybernetics, vol. 11, pp. 822-825, 1981.
150. M. Pietikainen and A. Rosenfeld, "Gray Level Pyramid Linking as an Aid in Texture Analysis," IEEE Transactions on Systems, Man and Cybernetics, vol. 12, pp. 422-429, 1982.
151. M. Pietikainen, A. Rosenfeld, and I. Walter, "Split-and-Link Algorithms for Image Segmentation," Pattern Recognition , vol. 15, pp. 287-298, 1982.
152. C. Poynton, A Technical Introduction to Digital Video. New York: John Wiley & Sons, 1996.
153. J. M. Prager, "Extracting and Labelling Boundary Segments in Natural Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, pp. 16-27, 1980.
154. J. M. S. Prewitt, "Object Enhancement and Extraction," in Picture Processing and Psychopictorics, B. S. Lipkin and A. Rosenfeld, Eds. New York: Academic Press, 1970, pp. 75-149.
155. J. M. S. Prewitt and M. L. Mendelsohn, "The Analysis of Cell Images,"Transactions of New York Academy of Science , vol. 128, pp. 1035-1053, 1966.
156. D. M. L. Purdy, "On the Saturations and Chromatic Thresholds of the Spectral Colours," British Journal of Psychology, vol. 21, pp. 283, 1931.
157. E. M. Riseman and M. A. Arbib, "Computational Techniques in the Visual Segmentation of Static Scenes," Computer Graphics and Image Processing, vol. 6, pp. 221-276, 1977.
158. E. M. Riseman and M. A. Arbib, "Segmentation of Static Scenes," Computer Graphics and Image Processing , vol. 6, pp. 221-276, 1977.
159. A. L. Robertson, "The CIE 1976 Color Difference Formulae," Color Research and Application, vol. 2, pp. 7-11, 1977.
160. T. V. Robertson, P. H. Swain, and K. S. Fu, "Multispectral Image Partitioning," School of Electrical Engineering, Purdue University, TR-EE 73-26, August 1973.
161. G. S. Robinson, "Color Edge Detection," Optical Engineering, vol. 16, pp. 479-484, 1977.
162. A. Rosenfeld, Picture Processing by Computer. New York: Academic Press, 1969.
163. A. Rosenfeld, "Iterative Methods in Image Analysis," Proc. of IEEE Conf. Pattern Recognition and Image Processing, Troy, New York, June 1977, pp. 14-20.
164. A. Rosenfeld, R. A. Hummel, and S. W. Zucker, "Scene Labelling by Relaxation Operations," IEEE Trans. on Systems, Man and Cybernetics, vol. 6, pp. 420-433, 1976.
165. A. Rosenfeld and A. C. Kak, Digital Picture Processing. New York: Academic Press, 1976.
166. P. L. Rosin and G. A. W. West, "Segmentation of Edges into Lines and Arcs," Image and Vision Computing , vol. 7, pp. 109-114, 1989.
167. Y. Rubner, C. Tomasi, and L. J. Guibas, "A Metric for Distributions with Application to Image Databases," Proc. of International Conference on Computer Vision, Bombay, India, 5-8 January 1998, pp. 59-66.
168. M. A. Ruzon and C. Tomasi, "Color Edge Detection with the Compass Operator," Proc. of IEEE Conference on Computer Vision and Pattern Recognition, June 1999, pp. 160-166.
169. M. A. Ruzon and C. Tomasi, "Corner Detection in Textured Color Images," Proc. of IEEE International Conference on Computer Vision, September 1999, pp. 1039-1045.
170. M. A. Ruzon and C. Tomasi, "Edge, Junction, and Corner Detection Using Color Distributions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 1281-1295, 2001.
171. A. Saheed and I. H. Witten, "Processing Textual Images," New Zealand Journal of Computing, vol. 4, pp. 57-66, 1993.
172. P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen, "Survey of Thresholding Techniques," Computer Vision, Graphics, and Image Processing.
173. A. Sarabi and J. K. Aggarwal, "Segmentation of Chromatic Images," Pattern Recognition , vol. 13, pp. 417-427, 1981.
174. T. Sato, T. Kanade, E. K. Hughes, and M. A. Smith, "Video OCR for Digital News Archive," Proc. of IEEE International Workshop on Content Based Access of Image and Video Database, 1998, pp. 52-60.
175. J. Sauvola and M. Pietikainen, "Adaptive Document Image Binarization," Pattern Recognition , vol. 33, pp. 225-236, 2000.
176. B. J. Schacter, L. S. Davis, and A. Rosenfeld, "Scene Segmentation by Cluster Detection in Color Space," Computer Science Center, University of Maryland, Technical Report 424, 1975.
177. R. Schettini, "A Segmentation Algorithm for Color Images," Pattern Recognition Letters , vol. 14, pp. 499-506, 1993.
178. J. Serra, Image Analysis and Mathematical Morphology. London: Academic Press, 1982.
179. L. G. Shapiro and G. C. Stockman, Computer Vision . Upper Saddle River, New Jersey: Prentice-Hall, Inc., 2001.
180. J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," Proc. of 16th IEEE Conference on Computer Vision and Pattern Recognition (CVPR'97), Puerto Rico, 17-19 June 1997, pp. 731-737.
181. L. D. Silverstein, "Human Factors for Color Display Systems: Concepts, Methods, and Research," in Color and the Computer, H. J. Durrett, Ed. Orlando, Florida: Academic Press Inc., 1987, pp. 27-61.
182. M. A. Smith and T. Kanade, "Video Skimming and Characterization through the Combination of Image and Language Understanding Technique," Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 775-781.
183. K. Sobottka, H. Bunke, and H. Kronenberg, "Identification of Text on Colored Book and Journal Covers," Proc. of 5th International Conference on Document Analysis and Recognition (ICDAR), September 1999.
184. K. Sobottka, H. Kronenberg, T. Perroud, and H. Bunke, "Text Extraction from Colored Book and Journal Covers," International Journal on Document Analysis and Recognition, pp. 163-176, 2000.
185. P. Soille, Morphological Image Analysis . Berlin: Springer, 1999.
186. M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis and Machine Vision. London: Chapman & Hall Computing, 1993.
187. A. L. Spitz, "Recognition Processing for Multilingual Documents," Proc. of International Conference on Electronic Publishing, Document Manipulation and Typography, Gaithersburg, Maryland, September 1990, pp. 193-205.
188. S. N. Srihari and V. Govindaraju, "Analysis of Textual Images Using the Hough Transform," Machine Vision and Applications, vol. 2, pp. 141-153, 1989.
189. S. S. Stevens, Psychophysics: Introduction to Its Perceptual, Neural and Social Prospects. New York: John Wiley & Sons, 1975.
190. M. Stokes, M. Anderson, S. Chandrasekar, and R. Motta, "A Standard Default Color Space for the Internet - sRGB," Version 1.10, 1996, http://www.w3.org/Graphics/Color/sRGB.html
191. M. Suk and S. M. Chung, "A New Image Segmentation Technique Based on Partition Mode Test," Pattern Recognition, vol. 16, pp. 469-480, 1983.
192. T. Taxt, P. J. Flynn, and A. K. Jain, "Segmentation of Document Images," IEEE Trans. Pattern Anal. Mach. Intell. , vol. PAMI-11, pp. 1322-1329, 1989.
193. P. D. Thouin and C.-I. Chang, "A Method for Restoration of Low-resolution Document Images," International Journal on Document Analysis and Recognition, pp. 200-210, 2000.
194. P. D. Thouin and C.-I. Chang, "Automated System for Restoration of Low-resolution Document and Text Images," Journal of Electronic Imaging, vol. 10, pp. 535-547, 2001.
195. S. Tominaga, "Color Image Segmentation Using Three Perceptual Attributes," Proc. of Conference Computer Vision and Pattern Recognition, 1986, pp. 628-630.
196. S. Tominaga, "A Color Classification Method for Color Images Using a Uniform Color Space," Proc. of 10th International Conference on Pattern Recognition, Atlantic City, New Jersey, 16-21 June 1990, pp. 803-807.
197. A. Tremeau and N. Borel, "A Region Growing and Merging Algorithm to Color Segmentation," Pattern Recognition, vol. 30, pp. 1191-1203, 1997.
198. D. C. Tseng and C. H. Chang, "Color Segmentation Using Perceptual Attributes," Proc. of 11th International Conference on Pattern Recognition, 1992, pp. 228-231.
199. D. Wang and S. N. Srihari, "Classification of Newspaper Image Blocks Using Texture Analysis," Pattern Recognition, vol. 47, pp. 327-352, 1989.
200. L. Wang and T. Pavlidis, "Detection of Curved and Straight Segments from Gray Scale Topography," Proc. of SPIE Symposium on Character Recognition Technologies, San Jose, California, 1993, pp. 10-20.
201. S. Watanabe et al., "An Automated Apparatus for Cancer Prescreening," Computer Graphics and Image Processing, vol. 3, pp. 350-358, 1974.
202. A. R. Weeks, C. E. Felix, and H. R. Myler, "Edge Detection of Color Images Using the HSL Color Space," Proc. of SPIE Non Linear Image Processing VI, February 1995, pp. 291-301.
203. A. R. Weeks and G. E. Hague, "Color Segmentation in the HSI Color Space Using the K-means Algorithm," Proc. of Nonlinear Image Processing VIII, April 1997, pp. 143-154.
204. J. S. Weszka, "A Survey of Threshold Selection Techniques," Computer Graphics and Image Processing , vol. 7, pp. 259-265, 1978.
205. J. S. Weszka, R. N. Nagel, and A. Rosenfeld, "A Threshold Selection Technique," IEEE Trans. Comput., vol. C-23, pp. 1322-1326, 1974.
206. J. S. Weszka and A. Rosenfeld, "Threshold Evaluation Techniques," IEEE Trans. Systems, Man, Cybernet. , vol. SMC-8, pp. 622-629, 1978.
207. A. P. Witkin, "Scale-space Filtering," Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 1984, pp. 39A.1.1-39A.1.4.
208. M. Worring and L. Todoran, "Segmentation of Color Documents by Line Oriented Clustering using Spatial Information," Proc. of 5th International Conference on Document Analysis and Recognition ICDAR'99, Bangalore, India, September 1999, pp. 67-69.
209. V. Wu, R. Manmatha, and E. M. Riseman, "Finding Text in Images," Proc. of 2nd ACM International Conference on Digital Libraries, Philadelphia, PA, 1997, pp. 23-26.
210. G. Wyszecki and W. S. Stiles, Color Science - Concepts and Methods, Quantitative Data and Formulas. New York: John Wiley, 1967.
211. G. Wyszecki and W. S. Stiles, Color Science, Concepts and Methods, Quantitative Data and Formulae, 2nd ed. New York: John Wiley & Sons, 2000.
212. Y. Yakimovsky and J. A. Feldman, "A Semantics-based Decision Theory Region Analyser," Proc. of Third International Joint Conference on Artificial Intelligence, 1973, pp. 580-588.
213. C. K. Yang and W. H. Tsai, "Reduction of Color Space Dimensionality by Moment-preserving Thresholding and its Application for Edge Detection in Color Images," Pattern Recognition Letters, vol. 17, pp. 481-490, 1996.
214. S. D. Yanowitz and A. M. Bruckstein, "A New Method for Image Segmentation," Computer Vision, Graphics, and Image Processing, vol. 46,