Detection and Extraction of Artificial Text from Videos
PROJECT France Télécom Research & Development 001B575
Laboratoire de Reconnaissance de Formes et Vision
Bât. Jules Verne, INSA
69621 Villeurbanne CEDEX
10th July 2001
Christian Wolf and Jean-Michel Jolion
http://rfv.insa-lyon.fr/~{wolf,jolion}
Plan of the presentation
Introduction
Detection
Image enhancement - multiple frame integration
Binarisation of the text boxes
Setup of the experiments
Results
Niblack proposed a method which calculates a threshold surface by gliding a rectangular window over the image and calculating statistics on this window:
$$T = m + k \cdot s$$
m: mean
s: standard deviation
k: parameter, k = -0.2
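The windowed threshold described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the window half-sizes and the clipping of the window at the image borders are not taken from the presentation, and the image is assumed to be a list of rows of gray values.

```python
from math import sqrt


def window_stats(image, cx, cy, half_w, half_h):
    """Mean and (population) standard deviation of the window centred on (cx, cy).

    Assumption: the window is clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    values = [
        image[y][x]
        for y in range(max(0, cy - half_h), min(height, cy + half_h + 1))
        for x in range(max(0, cx - half_w), min(width, cx + half_w + 1))
    ]
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    return mean, sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def niblack_threshold(image, half_w=1, half_h=1, k=-0.2):
    """Threshold surface T = m + k * s, one value per pixel."""
    surface = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            m, s = window_stats(image, x, y, half_w, half_h)
            row.append(m + k * s)
        surface.append(row)
    return surface
```

On a uniform image the standard deviation is zero everywhere, so the threshold surface equals the local mean.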
Fixing the dynamic range at R = 128 may be adequate for document images, but not for text boxes taken from videos: if the contrast of the image is lower, the binarisation will be incorrect. We therefore set the parameter R to the maximum standard deviation over all windows:
$$R = \max(s)$$
To avoid two passes of the windowing algorithm, the mean and standard deviation can be stored in a table during the first pass and the threshold surface calculated from the table afterwards.
Improvement: Shift of the image range
The strong hypothesis on the gray values (text pixels must be near zero) is not justified for some video text boxes.
The same effect can also be achieved by changing the threshold formula:
$$T = m - k \left(1 - \frac{s}{R}\right)(m - M)$$
m: mean
s: standard deviation
k: parameter, k = 0.5
R: maximum of the standard deviations of all windows
M: minimum gray value of the text box
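A small numerical check may help to see why changing the formula is equivalent to shifting the gray values. The sketch below assumes that the modified threshold reads T = m - k (1 - s/R)(m - M), and that the unshifted threshold has the form T = m (1 + k (s/R - 1)); the latter formula does not appear in this transcript and is an assumption, as are the function names.

```python
def shifted_threshold(m, s, R, M, k=0.5):
    """Modified formula: T = m - k * (1 - s/R) * (m - M). Assumes R > 0."""
    return m - k * (1.0 - s / R) * (m - M)


def threshold_after_explicit_shift(m, s, R, M, k=0.5):
    """Shift the gray values by -M, apply the assumed unshifted formula, shift back.

    A constant shift changes the window mean but not its standard deviation.
    """
    shifted_mean = m - M
    t_shifted = shifted_mean * (1.0 + k * (s / R - 1.0))
    return t_shifted + M


# Hypothetical window statistics, chosen only for illustration.
m, s, R, M = 100.0, 10.0, 20.0, 40.0
assert abs(shifted_threshold(m, s, R, M) - threshold_after_explicit_shift(m, s, R, M)) < 1e-9
```

Both routes give the same threshold, so the shift never has to be applied to the image itself.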
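Fast incremental calculation
Mean and variance can be calculated in one pass:
$$a = \sum_i x_i, \qquad b = \sum_i x_i^2$$
$$m = \frac{1}{N}\,a, \qquad s^2 = \frac{1}{N}\left(b - \frac{a^2}{N}\right)$$
$$a_t = a_{t-1} - \sum_{(x,y) \in L} I(x,y) + \sum_{(x,y) \in R} I(x,y)$$
$$b_t = b_{t-1} - \sum_{(x,y) \in L} I(x,y)^2 + \sum_{(x,y) \in R} I(x,y)^2$$
L: column of pixels which left the window
R: column of pixels which entered the window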
At the beginning of each line, the full window is calculated and the variables a and b are kept. After each shift, a and b are updated incrementally by subtracting the column of pixels which left the window and adding the column which entered it.
Mean and standard deviation are stored in 2D tables; the maximum R = max(s) is then computed before the threshold surface is calculated.
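The incremental scheme can be sketched as follows. This is a minimal illustration, not the original implementation: the function name is hypothetical, the image is a list of rows of gray values, and only window positions where the window fits entirely inside the image are evaluated (an assumption made here to keep the example short).

```python
from math import sqrt


def sliding_stats(image, win_w, win_h):
    """Return (means, stds, R) for all window positions, with R = max(s)."""
    height, width = len(image), len(image[0])
    n = win_w * win_h
    means, stds = [], []
    for top in range(height - win_h + 1):
        rows = image[top:top + win_h]
        # Start of a line: compute the sums a and b over the full window.
        a = sum(row[x] for row in rows for x in range(win_w))
        b = sum(row[x] ** 2 for row in rows for x in range(win_w))
        mean_row, std_row = [], []
        for left in range(width - win_w + 1):
            if left > 0:
                out_col = left - 1          # column which left the window
                in_col = left + win_w - 1   # column which entered the window
                for row in rows:
                    a += row[in_col] - row[out_col]
                    b += row[in_col] ** 2 - row[out_col] ** 2
            variance = (b - a * a / n) / n
            mean_row.append(a / n)
            std_row.append(sqrt(max(variance, 0.0)))
        means.append(mean_row)
        stds.append(std_row)
    # Second step: the maximum standard deviation over all stored windows.
    R = max(max(row) for row in stds)
    return means, stds, R
```

Each shift costs one column of additions and subtractions instead of a full window sum, and the two tables allow the threshold surface to be computed once R is known.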
The interface to the OCR software
Ideal situation: Pass individual (binarised) text boxes to OCR software which recognises the contents box after box.
In reality: We used standard commercial OCR software for our tests. This software has been designed to recognise scanned A4 or US letter pages and cannot directly process text boxes.
OCR Output
051Q07Ô7 N*Verf 05JQ0707PUBLICITE IPUBIIÏITE IPUBLICITEprenez prenez prenezboyard boyard boyard^française ^française ^françaiseFRANCE FRANCE FRANCE FRANCE FRANCEc'est plus musclé c'est plus muscléiï 'Jfort fort fort fort fort .fort .fort .fortcotHfUet blé cotHfUet blé cQ#tfUet bléuutàfruuk On va beaucoup {&*$ loin avec Itineris.Partout Partout Partout Partout Partout Partout Partout Partout Partout PartoutI22h35 I22h35 I22h35 I22h35 I22h35PUBLICITE \PUBLICITE \PUBLICITE>3h55l23h55l23h55l23h55l23h55l23h55 20h.50120h50 |20h50120h50 |20h50120h50,f ort boyard ,f ort boyard ,f ort boyard ,f ort boyard2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg JII II II II II II II II IIgà dents gà dents gà dentsIIH r Lessive classique lljir Lessive classique I[HT Lessive classiquele temps le temps le temps le temps le temps ^PUBLICITE ^PUBLICITE ^PUBLICITEI Par Amour du Goût. Il Par Amour du Goût. Ien en en en en en en en enrévolution révolution révolution
blé cotHfUetuutàfruukOn va beaucoup {&*$ loin avec Itineris.
PartoutI22h35PUBLICITE \>3h55l20h.50
,f ort boyard
dimanche23h55N Vert 05100707BerlingoPUBLICITEprenezdiffusion simultanée en stéréo surboyardfrançaiseFRANCEc'est plus muscléPUBLICITEfortCoralblé completfruitsOn va beaucoup Plus loin avec Itineris.BohêmePartout22h35PUBLICITE23h5520h50fortfort boyard
Automatic evaluation using markers
The manual processing of the OCR output (separation of the output strings and search of the corresponding input box) is time-consuming and error-prone, especially in cases where the quality of the OCR output is very poor.
Automatic OCR output processing can be achieved by placing marker images between the text boxes. The marker boxes contain text which is easily recognised by the OCR software.
In the results section we will present results for both types of evaluation.
OCR output (example):
@S Par Amour du Goût.
@S en
@S révolution
@S la
@S française
@S le pire de
@S 20H45
Structure log (example):
# Page 1:
P 1
T 1 2
M 1 2
T 2 3
M 2 2
T 3 2
Search output for individual text boxes: a list of strings, each corresponding to the output for a text box, but possibly occurring multiple times.
Prepare ground truth (from the raw ground truth): a list of strings, each corresponding to the ground truth for a text box. Each string is repeated the same number of times as the corresponding text image in the OCR input image.
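The separation of the OCR output at the markers can be sketched as below. The marker string "@S" is an assumption read off the example output, and the function name and the decision to discard any text before the first marker are illustrative choices, not part of the presentation.

```python
def split_on_markers(ocr_text, marker="@S"):
    """Split raw OCR output into one string per text box.

    Assumptions: every text box is preceded by a recognised marker, and any
    text before the first marker is discarded. Empty strings are kept so that
    the list stays aligned with the sequence of text boxes.
    """
    parts = ocr_text.split(marker)
    return [part.strip() for part in parts[1:]]


raw = "@S Par Amour du Goût.@S en@S révolution@S la@S française@S le pire de@S 20H45"
boxes = split_on_markers(raw)
```

Each entry of `boxes` can then be compared with the ground truth string of the corresponding text box.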
The resemblance of two character strings is measured by the cost of transforming string A into string B. The transformation is composed of basic operations, each of which has a certain cost, and the measure is the minimum total cost over all possible transformations.
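One common instance of such a measure is the edit distance computed by dynamic programming. The sketch below assumes three basic operations (insertion, deletion, substitution) with unit costs; the presentation does not state which operations or costs were used, so these are assumptions, as is the function name.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum cost of transforming string a into string b."""
    # prev[j] holds the cost of transforming the processed prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                                   # delete char_a
                curr[j - 1] + ins_cost,                               # insert char_b
                prev[j - 1] + (0 if char_a == char_b else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Ground truth "fort boyard" against the OCR output ",f ort boyard":
# two spurious characters have to be inserted.
cost = edit_distance("fort boyard", ",f ort boyard")  # → 2
```

Only two rows of the cost table are kept in memory, which is sufficient when just the final cost is needed.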
Conclusion
We developed a system for detection, tracking, enhancement and binarisation of text.
A detection performance of 93.5% is obtained.
We derived a new binarisation method adapted to the type of text found in videos.
The total recognition rate is surprisingly high, given the quality of the text, but not yet good enough for indexation purposes.
OCR integration problem: no software development kits giving direct access to the recognition functions are available. A collaboration with an OCR company seems inevitable.
Outlook
Future work lies in extending the existing algorithms to text with more difficult properties, and in enhancing and studying the existing techniques in more depth:
Scene text: The binarisation techniques developed in the last 30 years are aimed either at document images or at images from computer vision. The method we introduced in the framework of this project improves on the work already presented, but the quality of the text is not yet satisfactory. The binarisation of scene text in particular will demand the development of new methods.
Detection recall: We are convinced that the recall of the detection system can still be increased by further research, e.g. on the binarisation technique applied to the map of accumulated gradients.