JY Ramel et al. Interactive and Incremental Analysis of Document Images Laboratoire d’Informatique de Tours - FRANCE.


Document Image Analysis Systems: strategies and tools


LABRI Seminar

Introduction
• Context of the presented work
• Let’s dive into the semantic gap…

Characterization and representation of document images
• Selection of low level primitives
• A graph representation for the layout or structure

Analysis and recognition of the contents
• Contextual and incremental analysis
• Operators and scenarios
• AGORA and RETRO prototypes

Conclusion

Work realized over a series of collaborations … thanks to all contributors!



Introduction: Context of the work

Preservation of the cultural heritage
• The CESR Tours: a training and research centre
• Working on various domains of the Renaissance (historians)
• A rich library of rare books (Loire Valley)

An initial project: The Humanistic Virtual Library (BVH in French)

Collaboration with the RFAI research team
• A multidisciplinary collaboration: experts in DIA + experts in rare books + end-users

How to fill the semantic gap? A new idea: introduce more interaction into DIA systems



Introduction: Let’s dive into the semantic gap

Which segmentation methods are able to extract such EoC (Elements of Content)?

• Data driven methods?
No: too much noise and too many parameters

• Model driven methods?
No: the document model allows little variability




Low level information (data driven)
• Pixels, regions, contours, primitives
• Low level processing according to image specificities (data)

High level entities (model driven)
• Indexing = reading = a priori knowledge = Model
• Domain specific processing genericity
• Most of the time: no user / everything is encoded

?



Filling the gap with the help of the user?

Before the analysis: learning of the model or of the shapes to recognize



Filling the gap with the help of the user?

Before: interactive construction of the processing sequence
• Ariane / Pandore, GraphEdit / DirectShow

After: a posteriori intervention
• Error correction by relevance feedback



Introduction: Let’s dive into the semantic gap

Filling the gap with the help of the user: our proposition (DURING the analysis)

Incremental analysis
• Segmentation for recognition, recognition for segmentation
• From the simplest to the most difficult

Interactive analysis
• User-driven method
• Adaptation according to images
• Adaptation according to user objectives

It requires
• An initial representation of the image content
• A set of processing operators for segmentation and recognition, with interoperability and compatibility capabilities



Part I

Characterization and representation of document images

Information about the shapes using contour vectorization

Information about the structure (layout) using a graph representation

Towards a generic structural representation VectoGraph



Characterization and representation of document images: Information about the shapes using contour vectorization

Which primitives for describing shapes in a document?

Binarization to extract contours (vectors, quadrilaterals) and connected components (CC)



Characterization and representation of document images: Information about structure using graphs

• Idea: an evolving graph of EoC

Two types of EoC:

• Primitives / Elementary EoC
  – Connected components
  – Vectors
  – Quadrilaterals

• User defined EoC
  – Characters
  – Words
  – Ornamental letters
  – Triangles
  – Diodes
  – …



• Towards a generic representation with graphs

Node = Primitive or EoC

– Type of EoC
– Centre (X,Y) of the Bounding Box
– Bounding Box
– Bounding Rectangle BR: (P1,P2,P3,P4)
– Orientation = inertia axis
– Density of B&W inside the BB
– Color: average color
– List of the elementary EoC
– Number of elementary EoC
– Confidence rate

Edge = Relation between EoC

– Minimal distance between 2 EoC
– Angle between EoC
– Relation: Inside, Overlap, L, T, P, X, S, undefined
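The node and edge attributes listed on this slide can be pictured as a small attributed graph. The sketch below is only an illustration: the class names (EoCNode, EoCEdge, EoCGraph), the field names and the bounding-box convention are assumptions, not the data structures of the actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EoCNode:
    """A node: a primitive or a user-defined Element of Content (EoC)."""
    eoc_type: str                      # e.g. "ConnectedComponent", "Word"
    bbox: Tuple[int, int, int, int]    # assumed layout: (x_min, y_min, x_max, y_max)
    orientation: float = 0.0           # inertia axis; degrees are an assumed unit
    density: float = 0.0               # black/white density inside the bounding box
    elementary: List[int] = field(default_factory=list)  # ids of elementary EoC
    confidence: float = 1.0            # confidence rate

    @property
    def centre(self) -> Tuple[float, float]:
        """Centre (X, Y) of the bounding box."""
        x0, y0, x1, y1 = self.bbox
        return ((x0 + x1) / 2, (y0 + y1) / 2)


@dataclass
class EoCEdge:
    """An edge: a relation between two EoC."""
    source: int
    target: int
    min_distance: float
    angle: float
    relation: str = "undefined"        # Inside, Overlap, L, T, P, X, S, undefined


@dataclass
class EoCGraph:
    nodes: Dict[int, EoCNode] = field(default_factory=dict)
    edges: List[EoCEdge] = field(default_factory=list)
    _next_id: int = 0

    def add_node(self, node: EoCNode) -> int:
        node_id = self._next_id
        self.nodes[node_id] = node
        self._next_id += 1
        return node_id

    def add_edge(self, edge: EoCEdge) -> None:
        self.edges.append(edge)

    def neighbours(self, node_id: int) -> List[int]:
        """Ids of the nodes linked to node_id by at least one edge."""
        linked = []
        for e in self.edges:
            if e.source == node_id:
                linked.append(e.target)
            elif e.target == node_id:
                linked.append(e.source)
        return linked


graph = EoCGraph()
a = graph.add_node(EoCNode("ConnectedComponent", (0, 0, 10, 10)))
b = graph.add_node(EoCNode("ConnectedComponent", (14, 0, 24, 10)))
graph.add_edge(EoCEdge(a, b, min_distance=4.0, angle=0.0))
```

Keeping the attributes on nodes and edges, rather than in the operators, is what lets later steps add user-defined EoC to the same graph.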



Initial representation for old documents: graph of EoCs + background map

Background map

Graph of the connected components
• Tagging of the nodes according to their size: Noise, Text, Images
• Edges between close shapes (CC): horizontal/vertical neighborhood

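[Equation: fusion distance d(G1, G2) between two components, a Min over pixels (i, j) in [G1, G2] of a term built from 256 and Ng(i, j), compared with Threshold_Fusion]

[Figure: four components numbered 1 to 4, linked by horizontal (H) and vertical (V) neighborhood edges]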
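The graph of connected components starts from the components themselves. As a rough sketch, they could be extracted from a binarized page as shown here; the 4-connectivity, the list-of-lists image format and the function name connected_components are assumptions made for the example, not details given in the slides.

```python
from collections import deque


def connected_components(image):
    """Find 4-connected groups of foreground pixels (value 1) in a binary grid.

    Returns one dict per component with its pixel count and its bounding box
    (x_min, y_min, x_max, y_max), in scan order.
    """
    if not image:
        return []
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            size, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                size += 1
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append({"size": size, "bbox": (x0, y0, x1, y1)})
    return components


page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
components = connected_components(page)
```

Each returned component would become one node of the initial graph; the size field is what a size-based tagging (Noise, Text, Images) would read.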




• Towards a generic structural representation

Domain independent
• Documents: Pixels, Points, …
• Representation = structural graph
• Node: Primitive, EoC (shapes & EoC): Connected component, Vector, Quadrilateral
• Edge: Relation between primitives or EoCs (layout & relations): Angle, Distance, Topology

Analysis = Domain dependent



Part II

Analysis and recognition of image contents

Contextual and incremental analysis of old document images
• Three operators with simple parameters
• Interactive construction of scenarios
• Examples
• AGORA prototype



Analysis and recognition of image contents: Strategy of analysis

Proposition: user driven analysis (scenario), an incremental and interactive approach

No predefined EoCs (model of document)
• Users can themselves define the required EoCs
• Interactive definition of the model of the document
• Incremental analysis (simple → difficult)

Ease of use
• Easy to use interfaces, user assistant
• No complex image processing algorithms, just three operators: tagging (extraction-recognition), merging, deletion



Analysis and recognition of image contents: Three operators

Tagging (extraction) of EoC (nodes) according to
• rules about spatial position in the pages
• rules about neighborhood relationships (using edges)
• rules about internal properties (node attributes)

Merging of EoC according to
• rules using the distance computed from the background map (edge attributes)
• on a specific type of EoCs

Deletion of EoC according to label and user decision
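The three operators can be sketched on a plain list of nodes. This is a simplified illustration under stated assumptions: nodes are dictionaries with a label and a bounding box, and the merging distance is replaced by a simple gap between bounding boxes, since the background-map distance of the slides is not specified here. The function names tag, merge, delete and box_gap are hypothetical.

```python
def box_gap(a, b):
    """Gap between two boxes (x0, y0, x1, y1); 0 if they touch or overlap.

    Stand-in for the background-map distance, which is not reproduced here.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def tag(nodes, rule, label):
    """Tagging: give `label` to every node for which `rule(node)` is true."""
    for node in nodes:
        if rule(node):
            node["label"] = label


def merge(nodes, label, max_gap):
    """Merging: fuse nodes of one label whose boxes are at most `max_gap` apart."""
    merged = True
    while merged:
        merged = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                a, b = nodes[i], nodes[j]
                if (a["label"] == label and b["label"] == label
                        and box_gap(a["bbox"], b["bbox"]) <= max_gap):
                    # Replace the pair by the union of their bounding boxes.
                    a["bbox"] = (min(a["bbox"][0], b["bbox"][0]),
                                 min(a["bbox"][1], b["bbox"][1]),
                                 max(a["bbox"][2], b["bbox"][2]),
                                 max(a["bbox"][3], b["bbox"][3]))
                    del nodes[j]
                    merged = True
                    break
            if merged:
                break


def delete(nodes, label):
    """Deletion: remove every node carrying `label`."""
    nodes[:] = [n for n in nodes if n["label"] != label]


def area(node):
    x0, y0, x1, y1 = node["bbox"]
    return (x1 - x0) * (y1 - y0)


nodes = [
    {"label": "CC", "bbox": (0, 0, 4, 9)},
    {"label": "CC", "bbox": (6, 0, 10, 9)},
    {"label": "CC", "bbox": (50, 50, 51, 51)},
]
tag(nodes, lambda n: area(n) > 10, "Text")
tag(nodes, lambda n: n["label"] == "CC", "Noise")
merge(nodes, "Text", max_gap=3)
delete(nodes, "Noise")
```

In this toy run the two large components are tagged Text and merged into one box, and the small one is tagged Noise and deleted.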


Analysis and recognition of image contents: Scenarios

User-defined processing sequence

Graph analysis and modification

Defined by users on a typical image

Depending on the user objectives and on the images

Can be saved, edited and applied in batch mode …
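A scenario that can be saved, edited and applied in batch mode is essentially a serializable list of operator calls. The sketch below assumes a JSON encoding and two toy operators (tag_size, delete); the step format and every name in it are hypothetical, chosen only to illustrate the idea.

```python
import json


def tag_size(nodes, label, min_size, max_size):
    """Tag nodes whose size lies in [min_size, max_size)."""
    for node in nodes:
        if min_size <= node["size"] < max_size:
            node["label"] = label
    return nodes


def delete(nodes, label):
    """Remove nodes carrying the given label."""
    return [n for n in nodes if n["label"] != label]


# Dispatch table from operator names (as stored in the scenario) to functions.
OPERATORS = {"tag_size": tag_size, "delete": delete}

# A scenario = succession of operators + thresholds, defined on a typical image.
scenario = [
    {"op": "tag_size", "label": "Noise", "min_size": 0, "max_size": 10},
    {"op": "delete", "label": "Noise"},
]


def run(scenario, nodes):
    """Apply each step of the scenario, in order, to one page's nodes."""
    for step in scenario:
        params = {k: v for k, v in step.items() if k != "op"}
        nodes = OPERATORS[step["op"]](nodes, **params)
    return nodes


# Save and reload the scenario as text, then apply it to a batch of pages.
saved = json.dumps(scenario)
reloaded = json.loads(saved)
pages = [
    [{"size": 3, "label": "CC"}, {"size": 50, "label": "CC"}],
    [{"size": 120, "label": "CC"}],
]
results = [run(reloaded, page) for page in pages]
```

Because the scenario is plain data, a user could edit a threshold in the saved text and rerun the batch without touching the operators.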



Analysis and recognition of image contents: Examples

Initial representation = primal EoC




Tagging the primal EoC: Text – Graphic – Noise




Merging of EoC = Text: Word – Line – Paragraph




Automatic tagging: ornamental letter (lettrine)

Vertical position, horizontal position

avg = 0.46, std = 0.41

avg = 0.51, std = 0.07

With the collaboration of Nicholas…




ERR

EUR

Automatic Tagging with manual validation/modification of the rule

With the collaboration of Nicholas…




- Primal sketch construction

- Img Type = Connected Component of size > 200

- Noise Type = Connected Component of size < 10

- Text Type = Connected Component of size between 10 and 200

- Horizontal and vertical Fusion of Text with d < 2000

- Border Type = Img with Width/Height Ratio between 3 and 10

- Ornamental Letter Type = Img close to Nothing at the Left and close to Text at the Right

- Img Type = Ornamental Letter with Width/Height Ratio < 0.8 or with Width/Height Ratio > 1.2

- Img Type = Ornamental Letter on the Right < 75 %

- Left Margin Type = Text on the left with 25%

- Right Margin Type = Text on the right with 25%

- Vertical Fusion of the Left and Right Margins with d < 10000

- Horizontal Fusion of the Text with d < 3000

- Pagination Type = Text in top with 10%

- Text Type = Pagination with a number of Connected Components > 3

- Signature Type = Text in bottom with 25%

- Text Type = Signature with a number of Connected components > 5

- Text type = Signature with Text below, on the left or on the right

- Deletion of the EoC labelled Text

- Deletion of the EoC labelled Noise

Example of a resulting scenario, applicable to a set of images
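The first rules of this scenario translate almost directly into code. The sketch below covers only the size-based tagging and the Border rule; it assumes that sizes of exactly 10 or 200 count as Text, that each component carries a positive width and height, and that the dictionary fields and the function name tag_components are free choices made for the example.

```python
def tag_components(components):
    """Apply the size rules, then the Border rule, from the scenario above."""
    for c in components:
        if c["size"] > 200:
            c["label"] = "Img"
        elif c["size"] < 10:
            c["label"] = "Noise"
        else:
            # Size between 10 and 200 (bounds assumed inclusive).
            c["label"] = "Text"
    for c in components:
        # Border = Img with a width/height ratio between 3 and 10.
        if c["label"] == "Img" and 3 <= c["width"] / c["height"] <= 10:
            c["label"] = "Border"
    return components


components = [
    {"size": 5, "width": 2, "height": 3},
    {"size": 50, "width": 8, "height": 10},
    {"size": 900, "width": 300, "height": 40},
    {"size": 900, "width": 100, "height": 100},
]
labels = [c["label"] for c in tag_components(components)]
```

The rules run in the order given by the scenario: the Border rule only inspects components already tagged Img, which is the incremental principle of the approach.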



Results

Labels shown on the segmented page: Margin, Title, Text, Legend, Ornamental letter, Noise



Analysis and recognition of image contents: AGORA

A user-driven approach [see IJDAR]
• Graph representation
• Simple operators
• Scenarios
• Some interfaces

Used since 2004 at the CESR

Always to be improved…

Download: http://www.rfai.li.univ-tours.fr/pagesperso/ramel/fr/work1.html



Analysis and recognition of image contents: From AGORA to RETRO




…La l[21]ngueu[7] du chevalet depuis s[21]n pied jusques au c[7][21]chet d’en haut, p[21]rte d[21]uze t[7][21]us …

Accuracy, frequency of « tri-grams »

Dictionary

Contextual, manual and automatic Transcription
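In the transcription example above, bracketed numbers such as [21] and [7] stand for clusters of shapes that have not been labelled yet. Labelling one cluster fills in all of its occurrences at once, which can be sketched as follows. The function name transcribe is hypothetical, and the labels "o" and "r" are a reading suggested by the French words, not values given in the slides.

```python
import re


def transcribe(text, cluster_labels):
    """Replace each [id] placeholder whose cluster has a label.

    Placeholders of clusters that are still unlabelled are left untouched.
    """
    def fill(match):
        cluster_id = int(match.group(1))
        return cluster_labels.get(cluster_id, match.group(0))

    return re.sub(r"\[(\d+)\]", fill, text)


line = "La l[21]ngueu[7] du chevalet depuis s[21]n pied"

# Labelling cluster 21 alone resolves every occurrence of that shape.
partial = transcribe(line, {21: "o"})

# Labelling cluster 7 as well completes the line.
complete = transcribe(line, {21: "o", 7: "r"})
```

Here partial is "La longueu[7] du chevalet depuis son pied" and complete is "La longueur du chevalet depuis son pied".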




Experiment on one complete book (Vésale)

• Book of 150 pages
• 1,062,081 connected components (pseudo characters)
• 40,000 classes (clusters) have been built
• 90% of these classes are composed of fewer than 10 occurrences
• Ignoring these classes during transcription means missing one character in 14, i.e. more than one on each text line!
• 57% of the classes are composed of a single shape
• Why? Noise and spots, touching characters, split characters; the same holds for words
• The 200 largest classes correspond to 85% of the text
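Figures of this kind (share of the text covered by the largest classes, share of rare classes) can be computed from the list of class sizes. The sketch below uses a made-up toy distribution, not the data of the experiment, and the function names coverage and rare_class_fraction are hypothetical.

```python
def coverage(class_sizes, k):
    """Fraction of all occurrences covered by the k largest classes."""
    largest = sorted(class_sizes, reverse=True)[:k]
    return sum(largest) / sum(class_sizes)


def rare_class_fraction(class_sizes, min_occurrences):
    """Fraction of classes with fewer than min_occurrences occurrences."""
    rare = [s for s in class_sizes if s < min_occurrences]
    return len(rare) / len(class_sizes)


# Toy cluster sizes (hypothetical): a few large classes, many small ones.
sizes = [500, 300, 100, 50, 20, 10, 5, 5, 5, 5]

top2 = coverage(sizes, 2)                # share of occurrences in the 2 largest classes
rare = rare_class_fraction(sizes, 10)    # share of classes with fewer than 10 occurrences
```

On the toy data the two largest classes already cover most occurrences while many classes are rare, the same long-tailed shape as the one reported for the book.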



Conclusion

Proposition of a global approach: from images to their interpretation

Modeling of the data: representation of image content
• Genericity: thin and filled shapes, lines and curves, shapes and structure or layout
• Contour vectorization + relationship analysis
• Use of attributed graphs

Modeling of the processing: operators, recognition
• Use of contextual information during the EoC extraction and recognition
• Involvement of the user (early) in the processing sequence: user-driven analysis

Proposition of new structural PR techniques



Thanks

Questions ?


Analysis and recognition of image contents: Scenarios

Document: image, pixels

Representation = graph of EoC (Symbol, Title, Ornamental letter, …)

User defined scenarios = succession of operators + thresholds

[Figure: Scenario 1, Scenario 2 and Scenario 3 shown as different sequences of operators (Q1 Q2 …; Q2 Q1 … P2 Q3 …; P1 P3 P2 P4)]
