Text detection in images from a video stream for semantic indexing (Détection des textes dans les images issues d'un flux vidéo pour l'indexation sémantique)
Post on 15-Jan-2016
Christian Wolf
Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), FRE 2672 CNRS
Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex
Christian.wolf@liris.cnrs.fr
http://rfv.insa-lyon.fr/~wolf
1 July 2004
Plan
• Introduction
• Features
• Evaluation/Choice of features
• Text detection
• Experimental results
• Conclusion
Image/video indexing
• Content-based image retrieval (Master's degree): query by example, with indexing based on local texture (Gabor) features.
• Video indexing using semantic descriptors (PhD): text detection, enhancement, segmentation and recognition.
[Figure: indexing phase. Text extracted from the video ("Patrick Mayhew", "Min. chargé de l'irlande de Nord", "ISRAEL", "Jerusalem", "montage", "T.Nouel", ...) supplies the key words; a keyword-based search for "Patrick Mayhew" returns the result.]
Text detection
“Soukaina Oufkir”
Detection
Enhancement
Segmentation
Detection in an image
Features:
• Contrast and edge features
• Geometrical features
• Texture features
• Color features
• Corner features
• Region/stroke segmentation
Problems:
• Which features?
• How can the decision (text vs. non-text) be taken?
Decision:
• Heuristics
• Separate populations (discriminant analysis)
• Learning a model (SVM, etc.)
• Reinforcement learning (Master's thesis of Graham Taylor)
Videos vs. scanned documents
• Temporal aspects
• Complex and moving background
• Artificial shadows
Videos vs. scanned documents
• Low resolution
• Low quality
• Antialiasing artifacts
• Compression artifacts
• Color bleeding
What is text? - character segmentation
[Figure: examples of artificial text and of scene text]
What is text? - texture
Example: Gabor energy features on a text image
[Figure panels: original image; filter tuned to the example text; Gabor energy; thresholded Gabor energy]
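
To make the Gabor energy idea concrete, here is a minimal sketch in plain Python. It is not the filter bank used for the slide: it assumes a single horizontally oriented filter applied row by row, and the names `gabor_pair` and `gabor_energy_row` as well as the wavelength, sigma and kernel size are illustrative choices. The energy is the magnitude of the responses of an even (cosine) and an odd (sine) kernel.

```python
import math

def gabor_pair(size, wavelength, sigma):
    """Even (cosine) and odd (sine) 1-D Gabor kernels."""
    half = size // 2
    even, odd = [], []
    for x in range(-half, half + 1):
        envelope = math.exp(-(x * x) / (2.0 * sigma * sigma))
        phase = 2.0 * math.pi * x / wavelength
        even.append(envelope * math.cos(phase))
        odd.append(envelope * math.sin(phase))
    # Remove the mean of the even kernel so flat regions give (near) zero energy.
    mean = sum(even) / len(even)
    even = [v - mean for v in even]
    return even, odd

def gabor_energy_row(row, wavelength=4.0, sigma=3.0, size=13):
    """Gabor energy sqrt(even^2 + odd^2) at every position of one image row."""
    even, odd = gabor_pair(size, wavelength, sigma)
    half = size // 2
    energy = []
    for c in range(len(row)):
        re = im = 0.0
        for k in range(-half, half + 1):
            # Replicate the border pixel outside the row.
            v = row[min(max(c + k, 0), len(row) - 1)]
            re += v * even[k + half]
            im += v * odd[k + half]
        energy.append(math.sqrt(re * re + im * im))
    return energy

# Flat background followed by stroke-like stripes with a period of 4 pixels.
row = [128] * 32 + [0, 0, 255, 255] * 8
energy = gabor_energy_row(row)
# Thresholding the energy keeps only the striped (text-like) part.
mask = [e > 100 for e in energy]
```

The filter is tuned to the stripe period, so the energy is high inside the striped region and close to zero on the flat background; the threshold of 100 is arbitrary.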
What is text? - texture
[Figure: plots not recoverable from the transcript; only the axis ticks (0 to 250 and 0 to 300) survive]
What is text? - corners
Unthresholded “Harris” corner response
[Figure labels: "Derivative", "2nd derivative", "smeared"; the accompanying formula in x, y and I did not survive extraction]
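
As an illustration of an unthresholded Harris corner response, the sketch below computes R = det(M) - k * trace(M)^2 from the structure tensor M in plain Python. It is an assumption-laden toy version: central differences for the gradients, a 3x3 box window for smoothing, and k = 0.04 are common textbook choices, not values taken from the slides, and `harris_response` is a hypothetical name.

```python
def harris_response(img, k=0.04):
    """Unthresholded Harris response R = det(M) - k * trace(M)^2 per pixel."""
    h, w = len(img), len(img[0])
    # Central-difference gradients (left at zero on the image border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sxx = syy = sxy = 0.0
            # Sum the gradient products over a 3x3 window (box smoothing).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[r][c] = det - k * trace * trace
    return resp

# 10x10 test image with a bright square in the lower-right quadrant.
img = [[1 if r >= 5 and c >= 5 else 0 for c in range(10)] for r in range(10)]
resp = harris_response(img)
```

The response is positive at the corner of the square, negative along its straight edges and zero in flat areas, which is why regions dense in character corners stand out in the unthresholded map.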
What is text? - contrast & geometry
[Figure panels: example image; accumulated horizontal Sobel edges]
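
A rough sketch of accumulated horizontal Sobel edges, in plain Python: the horizontal Sobel derivative responds to vertical strokes, and summing its absolute value over a horizontal window turns dense strokes into a compact high-valued region. The window size of 9 and the use of absolute values (rather than, say, squared values) are assumptions, as are the function names.

```python
def horizontal_sobel(img):
    """Horizontal Sobel derivative (responds to vertical strokes)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = (
                (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
                - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1])
            )
    return out

def accumulate_rows(grad, window=9):
    """Sum |gradient| over a horizontal window centred on each pixel."""
    half = window // 2
    acc = []
    for row in grad:
        w = len(row)
        acc.append([
            sum(abs(row[j]) for j in range(max(0, c - half), min(w, c + half + 1)))
            for c in range(w)
        ])
    return acc

# 5x20 image: flat on the left, vertical strokes (period 4) on the right.
img = [[255 if c >= 10 and (c - 10) % 4 < 2 else 0 for c in range(20)]
       for _ in range(5)]
acc = accumulate_rows(horizontal_sobel(img))
```

In this toy image the accumulated value is zero in the flat part and large inside the stroke region, which mirrors the contrast and geometry cue of the slide.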
What is text? - color
[Figure panels: original image; Sobel on grayscale image; modified Sobel on L*u*v* image]
Special cases of text:
• Small contrast in the luminance plane
• High(er) contrast in the color plane
Evaluation
A good evaluation algorithm provides:
• A simple and intuitive interpretation of the obtained performance
• An objective comparison between the algorithms under evaluation
• A good correspondence between the performance measures and the real performance, taking into account the objective of the algorithm (goal-oriented approach)
• A measure of the performance of the algorithm alone, without side effects of other processing steps
Evaluation at different levels
• Statistical separation: Bhattacharyya distance
• Error rate, recall/precision on pixel level
• Recall/precision on rectangle level
• Goal oriented: recall/precision on character level
Moving down this list gives higher relevance to the application; moving up gives lower influence of later stages and lower computational complexity.
[Figure: detection result vs. ground truth for the example text ("Patrick Mayhew", "Min. chargé de l'irlande de Nord", "ISRAEL", "Jerusalem", "montage", "T.Nouel", ...)]
Evaluation on rectangle level
[Figure: detection vs. ground truth rectangles]
Pure overlap is ambiguous on multiple images. A recall of 50% could mean:
• 50% of the text rectangles have been detected perfectly
• 100% of the rectangles have been detected with 50% of their surface
• Anything between the two ...
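
The ambiguity can be checked with a few lines of Python. The rectangle areas below are invented for illustration, and `overlap_recall` is a hypothetical helper that simply divides the detected surface by the ground-truth surface.

```python
def overlap_recall(cases):
    """cases: list of (ground_truth_area, detected_overlap_area) pairs."""
    total = sum(gt for gt, _ in cases)
    covered = sum(ov for _, ov in cases)
    return covered / total

# Scenario A: two of four rectangles found perfectly, two missed entirely.
scenario_a = [(100, 100), (100, 100), (100, 0), (100, 0)]
# Scenario B: all four rectangles found, each covered to 50%.
scenario_b = [(100, 50)] * 4

recall_a = overlap_recall(scenario_a)
recall_b = overlap_recall(scenario_b)
```

Both scenarios yield the same overlap recall of 0.5, although they describe very different detector behaviour.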
Evaluation on rectangle level
Requirements of an evaluation measure:
• Tells intuitively how many rectangles have been detected, and how many false alarms occurred
• Measures the detection quality
• Takes into account one-to-one, one-to-many and many-to-one matches
• Scales up to multiple images
Problem: counting the number of correctly detected rectangles and measuring the detection quality are contradictory requirements.
Performance graphs
[Figure: ground truth rectangles Gi and detected rectangles Di]
“Surface” recall and precision are thresholded by separate thresholds on recall and on precision.
For each rectangle, we then know whether it has been detected or not, depending on a quality threshold.
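
A minimal sketch of the thresholding idea for a single ground-truth/detection pair, in Python. It assumes that surface recall is the intersection area divided by the ground-truth area and surface precision is the intersection area divided by the detection area; the threshold values 0.8 and 0.4 are placeholders, and one-to-many or many-to-one matches are not handled here.

```python
def area(rect):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1); 0 if empty."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection(a, b):
    """Intersection rectangle of a and b (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def surface_recall_precision(gt, det):
    inter = area(intersection(gt, det))
    return inter / area(gt), inter / area(det)

def is_detected(gt, det, t_recall=0.8, t_precision=0.4):
    """A rectangle counts as detected only if both quality thresholds are met."""
    r, p = surface_recall_precision(gt, det)
    return r >= t_recall and p >= t_precision

gt = (0, 0, 100, 20)
det = (10, 0, 120, 25)
r, p = surface_recall_precision(gt, det)   # 0.9 and about 0.65
```

Sweeping `t_recall` or `t_precision` over a range and counting the rectangles for which `is_detected` holds gives one curve of a performance graph.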
Performance graphs
[Figure: two performance graphs, one over the threshold on surface recall and one over the threshold on surface precision]
Comparison of different detection algorithms
Method 1: Local contrast
Method 2: SVM learning
The influence of the test database
[Figure panels: local contrast; SVM learning]
The local contrast method
Calculate a text probability image according to a text model (1 value per pixel).
Separate the probability values into 2 classes (Fisher/Otsu).
Post-processing:
• Mathematical morphology
• Geometrical constraints
• Verification of special cases
• Combination of rectangles
F. LeBourgeois
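
The two-class separation step can be sketched with Otsu's criterion, which picks the threshold that maximises the between-class variance of a histogram. The version below is a plain-Python illustration; it assumes probability values in [0, 1] and 256 histogram bins, and `otsu_threshold` is a hypothetical name rather than the code used for the slides.

```python
def otsu_threshold(values, bins=256, lo=0.0, hi=1.0):
    """Threshold that maximises the between-class variance (Otsu)."""
    hist = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        hist[idx] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0
    sum0 = 0.0
    for t in range(bins - 1):
        w0 += hist[t]            # number of values in bins 0..t
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Bins 0..best_t form class 0; return the upper edge of bin best_t.
    return lo + (best_t + 1) * (hi - lo) / bins

# Toy "probability image": half background (0.1), half text (0.9).
values = [0.1] * 50 + [0.9] * 50
thr = otsu_threshold(values)
labels = [v >= thr for v in values]
```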
The learning method
Learning gray values and edge maps alone may not generalize enough.
Texture alone is not reliable, especially if the text is short.
Geometry is a valuable feature.
State of the art: geometrical constraints are enforced in the post-processing step (mathematical morphology).
We propose using geometrical features very early in the detection process, i.e. not during post-processing.
Geometrical features: baseline
Text consists of:
• A high density of strokes in the direction of the text baseline.
• A consistent baseline (a rectangular region with an upper and lower border).
Two detection philosophies:
• Detection of the baseline directly, before detecting the text region.
• Detection of the baseline as the boundary area of the detected text region, in order to refine the detection quality.
Estimation of the text rectangle height
[Figure panels: original image; accumulated gradients]
Features
• Mode width (= rectangle height)
• Mode height (= contrast)
• Difference in height, left-right
• Mode mean
• Mode standard deviation
• Difference in mode width
Learning with Support Vector Machines
Training: image database, positive samples, negative samples (bootstrapping, cross-validation).
Classification step: a reduction of the computational complexity is necessary:
• Sub-sampling of the pixels to classify (4x4)
• Approximation of the SVM model by SVM regression
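
To give a feel for what 4x4 sub-sampling saves, the sketch below calls a classifier on one pixel per 4x4 block, which cuts the number of classifier evaluations by a factor of 16. Copying the label to the whole block is an assumption made for this illustration, and `classify` stands in for the trained SVM decision function.

```python
def classify_subsampled(height, width, classify, step=4):
    """Call classify(r, c) once per step x step block and copy the label to the block."""
    labels = [[0] * width for _ in range(height)]
    calls = 0
    for r0 in range(0, height, step):
        for c0 in range(0, width, step):
            label = classify(r0, c0)
            calls += 1
            for r in range(r0, min(r0 + step, height)):
                for c in range(c0, min(c0 + step, width)):
                    labels[r][c] = label
    return labels, calls

# Stand-in classifier: everything from column 4 onwards is "text".
labels, calls = classify_subsampled(8, 12, lambda r, c: 1 if c >= 4 else 0)
```

For the 8x12 toy image the classifier is called 6 times instead of 96.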
Experimental Results
System    Recall  Precision  H. mean
Ashida    46      55         50
HWDavid   46      44         45
Wolf      44      30         36
Todoran   18      19         18
Full      6       1          2
AIM2: Commercials
AIM3: News
AIM4: Cartoons, News
AIM5: News
Detection in still images
Local contrast:
Dataset                                 #    G     Recall  Precision  H. mean
Artificial text + no text               144  1.49  81.2    20.1       32.3
Artificial text + scene text + no text  384  1.84  59.1    18.1       27.7
SVM learning:
Dataset                                 #    G     Recall  Precision  H. mean
Artificial text + no text               144  1.49  59.7    23.9       34.2
Artificial text + scene text + no text  384  1.84  47.5    21.5       29.6
[Two figure slides, each with panels for local contrast and SVM learning; images not recoverable]
Detection in video sequences
Videos                   Contrast  SVM learning
Classified as text       301       284
Classified as non-text   21        38
Total in ground truth    322       322
Positives                350       384
False alarms             947       171
Logos                    75        39
Scene text               72        90
Total - false alarms     497       513
Total                    1444      684
Recall (%)               93.5      88.2
Precision (%)            34.4      75.0
Harmonic mean (%)        50.3      81.1
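
The percentages in this table can be recomputed from the counts. The sketch below assumes, as the numbers suggest, that recall is "classified as text" over "total in ground truth" and that precision is "total - false alarms" over "total", so that logos and scene text count as correct detections; the function names are illustrative.

```python
def recall(classified_as_text, total_ground_truth):
    """Recall in percent."""
    return 100.0 * classified_as_text / total_ground_truth

def precision(total_detections, false_alarms):
    """Precision in percent: detections that are not false alarms."""
    return 100.0 * (total_detections - false_alarms) / total_detections

def harmonic_mean(r, p):
    return 2.0 * r * p / (r + p)

contrast = (recall(301, 322), precision(1444, 947))
svm = (recall(284, 322), precision(684, 171))

for name, (r, p) in (("Contrast", contrast), ("SVM learning", svm)):
    print(name, round(r, 1), round(p, 1), round(harmonic_mean(r, p), 1))
```

Rounded to one decimal, the results match the last three rows of the table.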
Character segmentation: examples
[Figure panels:]
• Original image
• Fisher/Otsu
• Fisher/Otsu (windowed)
• Yanowitz-B.
• Yanowitz-B. + post-proc.
• Niblack
• Sauvola et al.
• Contrast maximization
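
For two of the local methods named above, the threshold for a window can be written down directly from its mean m and standard deviation s: Niblack uses T = m + k*s and Sauvola et al. use T = m*(1 + k*(s/R - 1)). The Python sketch below uses the commonly cited parameter values k = -0.2 for Niblack and k = 0.5, R = 128 for Sauvola; these values and the assumption of dark text on a light background are not taken from the slides.

```python
import math

def window_stats(pixels):
    """Mean and (population) standard deviation of the gray values in a window."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, math.sqrt(var)

def niblack_threshold(pixels, k=-0.2):
    m, s = window_stats(pixels)
    return m + k * s

def sauvola_threshold(pixels, k=0.5, r=128.0):
    m, s = window_stats(pixels)
    return m * (1.0 + k * (s / r - 1.0))

# A perfectly flat, bright background window (no text at all).
flat = [200] * 9
t_niblack = niblack_threshold(flat)   # equals the window mean
t_sauvola = sauvola_threshold(flat)   # pulled well below the mean
```

On a flat window the Niblack threshold sits exactly at the mean, so small noise can be labelled as text, whereas the Sauvola threshold drops to half the mean; this is the usual explanation for Niblack's noisy backgrounds.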
OCR results
Local contrast based binarization
Recognition by ABBYY FineReader 5.0
Bin. method    Recall  Precision  H. mean  N. cost
Otsu           47.3    90.5       62.1     56.8
Niblack        80.5    80.4       80.4     40.0
Sauvola        72.4    81.2       76.5     42.3
Max. contrast  85.4    90.7       88.0     23.0
Bayesian estimation using a Markov random field prior
[Figure panels: Sauvola et al.; MRF]
Character recognition rate per document:
Method   1     2     3     4     5     Total
Sauvola  77.1  39.8  77.1  99.0  98.7  79.0
MRF      81.0  40.5  87.3  99.3  98.8  82.0
TREC 2002
Collaboration with Laboratory LAMP, University of Maryland
[Figure labels: “Dance”, “Energy Gas”, “Music”, “Oil”, “Airline”, “Air plane”]
Conclusion
The choice of features is crucial in vision.
We developed a new system for detection, tracking, enhancement and binarisation of text.
Detection performance is high due to the integration of several types of features in a very early stage. The learning method is less sensitive to textured noise in the image.
We propose a new evaluation method which allows intuitive visualization of the detection quality by performance graphs.
Outlook
Possible improvement of the features (e.g. contrast normalization, non-linear texture filters).
Integration of different feature types (statistical, structural, ...)
Use of a priori knowledge about text in order to decrease the number of false alarms
Integration of the detected text into an indexing/browsing/segmentation framework
Optional slides
The Bhattacharyya distance
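
The transcript keeps only the title of this slide. As a reminder of what the measure looks like in the simplest case, the sketch below evaluates the closed-form Bhattacharyya distance between two univariate Gaussians, for example the distribution of one feature over text pixels and over non-text pixels. The Gaussian assumption and the function name are illustrative and not taken from the slides.

```python
import math

def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    # Term driven by the separation of the means.
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    # Term driven by the difference of the variances.
    term_var = 0.5 * math.log((var1 + var2) / (2.0 * sigma1 * sigma2))
    return term_mean + term_var

same = bhattacharyya_gaussian(0.0, 1.0, 0.0, 1.0)       # identical populations
shifted = bhattacharyya_gaussian(0.0, 1.0, 2.0, 1.0)    # means two sigma apart
```

The distance is zero for identical populations and grows as the feature separates text from non-text more clearly.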