Text detection in images taken from a video stream for semantic indexing

Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), FRE 2672 CNRS

Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex

1 July 2004

Christian.wolf@liris.cnrs.fr
http://rfv.insa-lyon.fr/~wolf

Christian Wolf


Introduction

Features

Evaluation/Choice of features

Text detection

Experimental Results

Conclusion

Image/video indexing

• Content-based image retrieval (Master’s degree): query by example, indexing based on local texture (Gabor) features
• Video indexing using semantic descriptors (PhD): text detection, enhancement, segmentation and recognition

[Figure: keyword-based search. Indexing phase: text extracted from the video (Patrick Mayhew | Min. chargé de l´irlande de Nord | ISRAEL | Jerusalem | montage | T.Nouel | ......). Key word: “Patrick Mayhew”. Result.]


Text detection

“Soukaina Oufkir”

Detection

Enhancement

Segmentation

Detection in an image

Contrast and Edge features

Geometrical features

Texture features

Color features

Problems:

• Which features?
• How can the decision be taken (text vs. non-text)?

Separate populations (discriminant analysis)

Learning a model (SVM, etc.)

Reinforcement learning

Master’s thesis of Graham Taylor

Heuristics

Region/stroke segmentation

Corner features


Videos vs. scanned documents

• Temporal aspects
• Complex and moving background
• Artificial shadows
• Low resolution
• Low quality
• Antialiasing artifacts
• Compression artifacts
• Color bleeding

What is text? - character segmentation

Artificial text

Scene text

What is text? - texture

Example: Gabor energy features on a text image

[Figure panels: original image; filter tuned to the example text; Gabor energy; thresholded Gabor energy]
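As a rough illustration of the Gabor energy feature named on this slide, here is a minimal NumPy sketch. It is not the filter bank used in the presentation: the kernel size, wavelength, orientation and envelope width are assumed values, and the FFT-based convolution is circular, which is only a convenience for a short example. The energy is taken as the magnitude of the even (cosine) and odd (sine) filter responses.

```python
import numpy as np

def gabor_kernels(size, wavelength, theta, sigma):
    """Return the even (cosine) and odd (sine) parts of a Gabor filter."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinate along the direction in which the sinusoid oscillates.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    even = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2.0 * np.pi * xr / wavelength)
    even -= even.mean()  # make the even filter blind to constant regions
    return even, odd

def filter_same(image, kernel):
    """Circular 2-D convolution via FFT; the output has the image's shape."""
    padded = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel center to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def gabor_energy(image, wavelength=4.0, theta=0.0, sigma=3.0, size=15):
    """Gabor energy: magnitude of the even/odd (quadrature) responses."""
    even, odd = gabor_kernels(size, wavelength, theta, sigma)
    e = filter_same(image.astype(float), even)
    o = filter_same(image.astype(float), odd)
    return np.sqrt(e ** 2 + o ** 2)
```

Thresholding the returned map (for instance with a global threshold) gives a binary image comparable to the "thresholded Gabor energy" panel; the threshold choice is left open here.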


What is text? - corners

Unthresholded “Harris” corner response

[Figure panels: derivative; 2nd derivative (in x and y of the image I); smeared]
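For readers unfamiliar with the Harris corner response mentioned above, the following is a minimal sketch of the standard, unthresholded response map. It is not taken from the presentation: the box-shaped smoothing window (instead of a Gaussian), the window radius and the constant k = 0.04 are assumptions chosen to keep the example short.

```python
import numpy as np

def box_smooth(a, radius):
    """Mean filter over a (2*radius+1)^2 window, with edge padding."""
    k = 2 * radius + 1
    p = np.pad(a, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    c = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    c[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = a.shape
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)

def harris_response(image, radius=2, k=0.04):
    """Unthresholded Harris response R = det(M) - k * trace(M)^2."""
    iy, ix = np.gradient(image.astype(float))
    # Entries of the locally averaged structure tensor M.
    sxx = box_smooth(ix * ix, radius)
    syy = box_smooth(iy * iy, radius)
    sxy = box_smooth(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```

The response is positive near corners, negative along straight edges and close to zero in flat regions, which is why a dense cluster of positive responses can hint at text strokes.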

What is text? - contrast & geometry

[Figure panels: example image; accumulated horizontal Sobel edges]
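A minimal sketch of what "accumulated horizontal Sobel edges" could look like in code, assuming that the horizontal Sobel derivative is taken in absolute value and summed over a horizontal sliding window. The window half-width is a hypothetical parameter, not a value from the slides.

```python
import numpy as np

def sobel_horizontal(image):
    """Horizontal derivative with the 3x3 Sobel kernel (edge-padded)."""
    p = np.pad(image.astype(float), 1, mode="edge")
    # Right neighbour minus left neighbour, for every padded row.
    d = p[:, 2:] - p[:, :-2]
    # Weight the three rows 1-2-1 as in the Sobel kernel.
    return d[:-2, :] + 2.0 * d[1:-1, :] + d[2:, :]

def accumulate_horizontally(grad, half_width=6):
    """Sum |grad| over a horizontal window of 2*half_width+1 pixels."""
    a = np.abs(grad)
    p = np.pad(a, ((0, 0), (half_width, half_width)), mode="constant")
    # Row-wise running sums with a leading zero column.
    c = np.concatenate([np.zeros((a.shape[0], 1)), p.cumsum(axis=1)], axis=1)
    k = 2 * half_width + 1
    return c[:, k:] - c[:, :-k]
```

Regions with many closely spaced vertical strokes, as in horizontal text lines, accumulate large values, while isolated edges and flat background stay low.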

What is text? - color

[Figure panels: original image; Sobel on grayscale image; modified Sobel on L*u*v* image]

Special cases of text:

• Small contrast in the luminance plane
• High(er) contrast in the color plane


Evaluation

A good evaluation algorithm permits:

• A simple and intuitive interpretation of the obtained performance
• An objective comparison between the different algorithms to evaluate
• A good correspondence between the performance measures and the real performance, taking into account the objective of the algorithm (goal-oriented approach)
• An assessment of the algorithm's performance alone, without side effects of other processing steps

Evaluation at different levels

• Statistical separation: Bhattacharyya distance
• Error rate, Recall/Precision on pixel level
• Recall/Precision on rectangle level
• Goal oriented: Recall/Precision on character level

Trade-off between the levels: higher relevance to the application versus lower influence of later stages and lower computational complexity.

Patrick Mayhew | Min. chargé de l´irlande de Nord | ISRAEL | Jerusalem | montage | T.Nouel | ......

Detection result Ground truth

Evaluation on rectangle level

Detection Ground truth

Pure overlap is ambiguous on multiple images: a recall of 50% could mean:

• 50% of the text rectangles have been detected perfectly
• 100% of the rectangles have been detected with 50% of their surface
• Anything between the two ...

Evaluation on rectangle level

Requirements of an evaluation measure:

• Tells intuitively how many rectangles have been detected, and how many false alarms
• Measures the detection quality
• Takes into account one-to-one, one-to-many and many-to-one matches
• Scales up to multiple images

Problem: counting the number of correctly detected rectangles and measuring the detection quality are contradictory requirements.

Performance graphs

Ground truth Gi

Detection Di

“Surface” recall and precision, thresholded by different thresholds on recall and precision

For each rectangle, we will know whether it has been detected or not, depending on a quality threshold
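The thresholding idea can be sketched as follows. This is a simplified illustration, not the evaluation code behind the slides: it assumes axis-aligned rectangles given as (x0, y0, x1, y1), defines surface recall as the intersection area divided by the ground-truth area and surface precision as the intersection area divided by the detection area, and handles only one-to-one matches (one-to-many and many-to-one matches are ignored). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout

def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def surface_recall(g: Rect, d: Rect) -> float:
    """Fraction of the ground-truth rectangle covered by the detection."""
    return intersection_area(g, d) / area(g) if area(g) > 0 else 0.0

def surface_precision(g: Rect, d: Rect) -> float:
    """Fraction of the detected rectangle that lies on the ground truth."""
    return intersection_area(g, d) / area(d) if area(d) > 0 else 0.0

def count_detected(ground_truth: Sequence[Rect], detections: Sequence[Rect],
                   t_r: float, t_p: float) -> int:
    """Ground-truth rectangles matched by a detection meeting both thresholds."""
    return sum(
        any(surface_recall(g, d) >= t_r and surface_precision(g, d) >= t_p
            for d in detections)
        for g in ground_truth
    )

def detection_curve(ground_truth: Sequence[Rect], detections: Sequence[Rect],
                    thresholds: Sequence[float], t_p: float) -> List[float]:
    """Share of detected ground-truth rectangles per surface-recall threshold."""
    n = len(ground_truth)
    return [count_detected(ground_truth, detections, t, t_p) / n if n else 0.0
            for t in thresholds]
```

Plotting the values returned by `detection_curve` against the thresholds gives one curve of a performance graph; a second curve can be obtained by varying `t_p` instead.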

Performance graphs

[Figure: two performance graphs, one over the threshold on surface recall, one over the threshold on surface precision]

Comparison of different detection algorithms

Method 1: Local contrast

Method 2: SVM learning

The influence of the test database

[Figure panels: local contrast; SVM learning]


The local contrast method

Calculate a text probability image according to a text model (1 value per pixel).

Separate the probability values into 2 classes (Fisher/Otsu).

Post-processing:

• Mathematical morphology
• Geometrical constraints
• Verification of special cases
• Combination of rectangles

F. LeBourgeois
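The two-class separation step can be illustrated with a minimal Otsu threshold in NumPy. This is a sketch of the standard criterion (maximizing the between-class variance over a histogram), not the presentation's own code; the number of histogram bins is an assumed parameter, and the input is any array of per-pixel text probability values.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Threshold that maximizes the between-class variance of the histogram."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)            # weight of the class up to and including bin i
    w1 = 1.0 - w0                # weight of the remaining class
    m = np.cumsum(p * centers)   # cumulative first moment
    mt = m[-1]                   # global mean
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mt * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    # Upper edge of the best bin: values above it form the second class.
    return edges[np.argmax(between) + 1]
```

Pixels whose value exceeds the returned threshold would be labelled as text candidates before the post-processing steps listed above.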


The learning method

Learning gray values and edge maps alone may not generalize enough.

Texture alone is not reliable, especially if the text is short.

Geometry is a valuable feature.

State of the art: enforce geometrical constraints in the post-processing step (mathematical morphology)

We propose using geometrical features very early in the detection process, i.e. not only during post-processing.

Geometrical features: baseline

Text consists of:

• A high density of strokes in the direction of the text baseline.
• A consistent baseline (a rectangular region with an upper and lower border).

Two detection philosophies:

• Detection of the baseline directly before detecting the text region.
• Detection of the baseline as the boundary area of the detected text region in order to refine the detection quality.

Estimation of the text rectangle height

[Figure panels: original image; accumulated gradients]

Features:

• Mode width (= rectangle height)
• Mode height (= contrast)
• Difference in height left-right
• Mode mean
• Mode standard deviation
• Difference in mode width

Learning with Support Vector Machines

Training image database: positive samples, negative samples

Classification step: a reduction of the computational complexity is necessary:

• Sub-sampling of the pixels to classify (4x4)
• Approximation of the SVM model by SVM regression

Bootstrapping, cross-validation


Experimental Results

System    Recall  Precision  H. mean
Ashida    46      55         50
HWDavid   46      44         45
Wolf      44      30         36
Todoran   18      19         18
Full      6       1          2

AIM3: News

AIM4: Cartoons, News

AIM5: News

AIM2: Commercials

Detection in still images

Local contrast:

Dataset                                 #    G     Recall  Precision  H. mean
Artificial text + no text               144  1.49  81.2    20.1       32.3
Artificial text + scene text + no text  384  1.84  59.1    18.1       27.7

SVM learning:

Dataset                                 #    G     Recall  Precision  H. mean
Artificial text + no text               144  1.49  59.7    23.9       34.2
Artificial text + scene text + no text  384  1.84  47.5    21.5       29.6

[Figure panels: local contrast; SVM learning (two examples)]

Detection in video sequences

Videos                  Contrast  SVM learn.
Classified as text      301       284
Classified as non-text  21        38
Total in ground truth   322       322

Positives               350       384
False alarms            947       171
Logos                   75        39
Scene text              72        90
Total - false alarms    497       513
Total                   1444      684

Recall (%)              93.5      88.2
Precision (%)           34.4      75.0
Harmonic mean (%)       50.3      81.1
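The percentages in the table are consistent with the following reading of the counts, shown here as a short check. The formulas are inferred from the numbers rather than stated on the slide: recall as "classified as text" over "total in ground truth", precision as "total - false alarms" over "total", and the harmonic mean of the two. The function name is hypothetical.

```python
def recall_precision_hmean(classified_as_text, total_ground_truth,
                           total_minus_false_alarms, total):
    """Recall, precision and their harmonic mean, all in percent."""
    recall = 100.0 * classified_as_text / total_ground_truth
    precision = 100.0 * total_minus_false_alarms / total
    hmean = 2.0 * recall * precision / (recall + precision)
    return recall, precision, hmean

# Counts copied from the table above.
contrast = recall_precision_hmean(301, 322, 497, 1444)
svm = recall_precision_hmean(284, 322, 513, 684)
print([round(v, 1) for v in contrast])
print([round(v, 1) for v in svm])
```

The "Total" row also equals the sum of positives, false alarms, logos and scene text in both columns (350 + 947 + 75 + 72 = 1444 and 384 + 171 + 39 + 90 = 684).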

Character segmentation: examples

Original image

Fisher/Otsu

Fisher/Otsu (windowed)

Yanowitz-B.

Yanowitz-B. +post-proc.

Niblack

Sauvola et al.

Contrast maximiz.

Bin. method    Recall  Precision  H. mean  N. cost
Otsu           47.3    90.5       62.1     56.8
Niblack        80.5    80.4       80.4     40.0
Sauvola        72.4    81.2       76.5     42.3
Max. contrast  85.4    90.7       88.0     23.0
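For orientation, two of the local binarization methods in the table can be sketched with their commonly published threshold formulas: Niblack uses T = m + k·s and Sauvola et al. use T = m·(1 + k·(s/R − 1)), where m and s are the mean and standard deviation in a window around each pixel. The window radius and the constants k = −0.2, k = 0.5 and R = 128 below are customary defaults from the literature, not values reported in this presentation, and the code assumes dark text on a brighter background with gray values in 0..255.

```python
import numpy as np

def local_mean_std(image, radius):
    """Mean and standard deviation in a (2*radius+1)^2 window per pixel."""
    img = image.astype(float)
    k = 2 * radius + 1
    h, w = img.shape

    def box(a):
        p = np.pad(a, radius, mode="reflect")
        # Integral image with a leading row and column of zeros.
        c = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
        c[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
        s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
        return s / (k * k)

    mean = box(img)
    var = np.maximum(box(img * img) - mean ** 2, 0.0)
    return mean, np.sqrt(var)

def niblack_text_mask(image, radius=7, k=-0.2):
    """True where the pixel is darker than the Niblack threshold m + k*s."""
    m, s = local_mean_std(image, radius)
    return image < m + k * s

def sauvola_text_mask(image, radius=7, k=0.5, R=128.0):
    """True where the pixel is darker than m * (1 + k * (s / R - 1))."""
    m, s = local_mean_std(image, radius)
    return image < m * (1.0 + k * (s / R - 1.0))
```

The Sauvola threshold drops well below the local mean in low-variance windows, which suppresses spurious text pixels in flat background where the Niblack threshold stays close to the mean.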

OCR results

Local contrast based binarization

Recognition by ABBYY FineReader 5.0

Compared methods: Sauvola et al.; MRF (Bayesian estimation using a Markov random field prior)

Character recognition rate per document:

Document  1     2     3     4     5     Total
Sauvola   77.1  39.8  77.1  99.0  98.7  79.0
MRF       81.0  40.5  87.3  99.3  98.8  82.0

TREC 2002

“Dance”

“EnergyGas”

“Music”

“Oil”

“Airline”

“Air plane”

Collaboration with Laboratory LAMP, University of Maryland

Conclusion

The choice of features is crucial in vision.

We developed a new system for detection, tracking, enhancement and binarisation of text.

Detection performance is high due to the integration of several types of features in a very early stage. The learning method is less sensitive to textured noise in the image.

We propose a new evaluation method which allows intuitive visualization of the detection quality by performance graphs.

Outlook

Possible improvement of the features (e.g. contrast normalization, non-linear texture filters).

Integration of different feature types (statistical, structural, ...)

Usage of a priori knowledge on text in order to decrease the number of false alarms

Integration of the detected text into an indexing/browsing/segmentation framework

Optional slides

The Bhattacharyya distance
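The slide body did not survive extraction, so the following is only a reminder of the standard closed form for two Gaussian populations, not necessarily the variant used in the talk: D_B = 1/8 (μ1 − μ2)ᵀ Σ⁻¹ (μ1 − μ2) + 1/2 ln(det Σ / sqrt(det Σ1 · det Σ2)) with Σ = (Σ1 + Σ2)/2. A minimal NumPy sketch under that Gaussian assumption, with a hypothetical function name:

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian distributions."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mahalanobis-like term measuring the separation of the means.
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Log-determinant term measuring the difference of the covariances.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(term_mean + term_cov)
```

Applied to the feature values of text and non-text samples, a larger distance indicates that the two populations are easier to separate with that feature.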
