Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology • Chemnitz University of Technology] Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing Stefan Pletschacher; Marcel Eckert; Arved C. Hübler

Mar 28, 2015

Page 1:

Institut für Print- und Medientechnik der TU Chemnitz [Institute for Print and Media Technology • Chemnitz University of Technology]

Director: Prof. Dr. Arved C. Hübler • Reichenhainer Str. 70 • 09126 Chemnitz • Germany

http://www.tu-chemnitz.de/pm • [email protected] • Tel: +49-371-531-2364 • Fax: -3780

Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Stefan Pletschacher; Marcel Eckert; Arved C. Hübler

Page 2:

2 Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006

Digitization of Historical Documents

GEB1150

Page 3:

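Alphabet and Font Extraction

[Figure: an XML instance in two parts. The alphabet and font definition holds the glyph descriptions (glyph ID1, glyph ID2, ...); the content refers to the glyphs by ID (ID1 ID2 ID3 ID3 ID4 ...).]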
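The two-part XML instance on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' schema: the element and attribute names (`document`, `alphabet`, `glyph`, `content`, `ref`) and the function `build_instance` are hypothetical, chosen only to show glyph definitions stored once and referenced by ID from the content.

```python
import xml.etree.ElementTree as ET


def build_instance(glyph_paths, sequence):
    """Build an XML tree with a glyph alphabet and content that references it.

    glyph_paths: dict mapping a glyph ID to SVG path data (hypothetical layout).
    sequence:    glyph IDs in reading order; an ID may occur more than once.
    """
    root = ET.Element("document")

    # Part 1: every distinct glyph is defined exactly once.
    alphabet = ET.SubElement(root, "alphabet")
    for glyph_id, path_data in glyph_paths.items():
        ET.SubElement(alphabet, "glyph", {"id": glyph_id, "d": path_data})

    # Part 2: the content only refers to glyphs by their ID.
    content = ET.SubElement(root, "content")
    for glyph_id in sequence:
        if glyph_id not in glyph_paths:
            raise KeyError(f"undefined glyph: {glyph_id}")
        ET.SubElement(content, "ref", {"glyph": glyph_id})
    return root


if __name__ == "__main__":
    paths = {f"ID{i}": "M0 0 L1 0 L1 1 Z" for i in range(1, 5)}
    tree = build_instance(paths, ["ID1", "ID2", "ID3", "ID3", "ID4"])
    print(ET.tostring(tree, encoding="unicode"))
```

A repeated character (ID3 in the slide's example) costs one extra reference rather than a second glyph description.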

Page 4:

Vectorization - Raster to Vector Conversion

[Figure: raster to vector conversion. Labels: bitmap graphic; OCR; encoded text, e.g. ASCII (41 hex); font assignment; vector font; RIP; vectorization.]

Page 5:

DIA System and Workflow

[Figure: segmentation workflow. Region-based segmentation splits the document image into blocks; block classification yields textual blocks and image blocks together with structural information; textual blocks are then segmented step by step into text lines, words, and characters. Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block.]

Page 6:

DIA System and Workflow

[Figure: glyph processing workflow. Character images are clustered into a set of prototypes, followed by a classification of vectorisable glyphs. Non-vectorisable images receive an ID assignment and form a set of bitmap symbols. Vectorisable glyphs pass through vectorisation (set of vectorised paths), transformation to SVG (set of SVG glyph descriptions), and assignment of private Unicode code points (e.g. &#xE000;), resulting in a document-specific SVG font.]
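The assignment of private Unicode code points can be sketched as below. The slide shows only `&#xE000`, the first code point of the Basic Multilingual Plane Private Use Area (U+E000 to U+F8FF); the sequential numbering and the function names are assumptions made for illustration.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_private_code_points(glyph_ids, start=PUA_START):
    """Map each glyph ID to a consecutive private-use code point (assumed scheme)."""
    mapping = {}
    for offset, glyph_id in enumerate(glyph_ids):
        code_point = start + offset
        if code_point > PUA_END:
            raise ValueError("more glyphs than private-use code points")
        mapping[glyph_id] = code_point
    return mapping


def as_character_reference(code_point):
    """Format a code point as an XML numeric character reference."""
    return f"&#x{code_point:X};"


if __name__ == "__main__":
    table = assign_private_code_points(["ID1", "ID2", "ID3"])
    for glyph_id, code_point in table.items():
        print(glyph_id, as_character_reference(code_point))
```

Because these code points carry no standard meaning, the text stays readable only together with the document-specific font that defines them.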

Page 7:

DIA System and Workflow

[Figure: encoding and output workflow. Inputs: image blocks, structural information, set of bitmap symbols, document-specific SVG font. Steps: encoding, references. Result: XML + SVG encoded document. Then: layout modification by means of XSLT, specific output formats. Further labels: OCR, XML. Example page layout: 1. text (headline), 2. bitmap image, 3. text block, 4. text block.]

Page 8:

Vectorization Approaches

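• Contour based

$$\mathrm{Core} := \{\, x \in \mathrm{Comp} : N(x) \subseteq \mathrm{Comp} \,\}$$

$$\mathrm{Cont} := \{\, x \in \mathrm{Comp} : x \notin \mathrm{Core} \,\}$$

• Skeleton based

$$S := \{\, x \in \mathrm{Comp} : \exists\, y, z \in \mathrm{Cont},\ y \neq z,\ \mathrm{dist}(x, \mathrm{Cont}) = \mathrm{dist}(x, y) = \mathrm{dist}(x, z) \,\}$$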
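One possible reading of the contour-based definitions on this slide: the core of a component Comp is the set of pixels whose whole neighbourhood N(x) lies inside Comp, and the contour is the rest of the component. The sketch below follows that reading. Taking N(x) as the 4-neighbourhood is an assumption, since the slide does not say which neighbourhood is meant.

```python
def neighbours4(pixel):
    """4-neighbourhood of a (row, col) pixel; the choice of N(x) is an assumption."""
    row, col = pixel
    return {(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)}


def core(component):
    """Pixels whose entire neighbourhood belongs to the component."""
    return {p for p in component if neighbours4(p) <= component}


def contour(component):
    """Pixels of the component that are not core pixels."""
    return component - core(component)


if __name__ == "__main__":
    square = {(r, c) for r in range(3) for c in range(3)}
    print(sorted(core(square)))     # only the centre pixel is fully enclosed
    print(sorted(contour(square)))  # the eight border pixels
```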

Page 9:

Applied Algorithms

• Pre-processing
- Finding connected components (Region Growing)
- Contour extraction (Contour following)

• Polygonal Approximation Based on Relaxation
- Phase 1: Clustering of polygonal points
- Phase 2: Relaxation (Error correction)

• Automatic Parameter Control
- Rasterization of the resulting glyph images
- Ascertaining a weighted error (Ground Truth)
- Selecting appropriate vectorization parameters

Page 10:

Finding Connected Components

Ü Ö Ä % “ !

Page 11:

Region Growing
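Region growing for finding connected components can be sketched as a breadth-first fill from each unvisited black pixel. This is a generic textbook version, not necessarily the variant used by the authors; the bitmap layout (lists of rows, 1 = black) and the use of 8-connectivity are assumptions.

```python
from collections import deque


def connected_components(bitmap):
    """Return the 8-connected black regions of a bitmap as sets of (row, col)."""
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != 1 or (r, c) in seen:
                continue
            # Grow a new region outwards from this seed pixel.
            region = {(r, c)}
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and bitmap[ny][nx] == 1 and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            region.add((ny, nx))
                            queue.append((ny, nx))
            components.append(region)
    return components


if __name__ == "__main__":
    # A crude letter "i": the dot and the stem are separate components.
    letter_i = [
        [0, 1, 0],
        [0, 0, 0],
        [0, 1, 0],
        [0, 1, 0],
    ]
    print(connected_components(letter_i))
```

Characters such as Ü, Ö, Ä, % or ! yield several components per glyph, so a later step has to group components that belong to one character.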

Page 12:

Contour Following

[Figure: contour following on a pixel grid. Legend: white pixel, black pixel, starting point, examination order.]
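Contour following can be sketched with Moore-neighbour tracing: start at the topmost, leftmost black pixel and examine the eight neighbours in a fixed clockwise order, always resuming from the last white pixel seen. The slide does not state the exact examination order or stopping rule, so both are assumptions here; the loop stops as soon as a (pixel, backtrack) state repeats.

```python
# Clockwise Moore neighbourhood as (row, col) offsets, starting at west.
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_contour(component):
    """Return the outer contour of one 8-connected component in clockwise order.

    component: set of (row, col) black pixels.
    """
    if not component:
        return []
    start = min(component)             # topmost row, then leftmost column
    back = (start[0], start[1] - 1)    # west neighbour, white by choice of start
    current = start
    contour = [start]
    seen = {(current, back)}
    while True:
        k = CLOCKWISE.index((back[0] - current[0], back[1] - current[1]))
        for step in range(1, 9):
            dr, dc = CLOCKWISE[(k + step) % 8]
            candidate = (current[0] + dr, current[1] + dc)
            if candidate in component:
                break
            back = candidate           # remember the last white pixel examined
        else:
            return contour             # isolated pixel: no black neighbour
        current = candidate
        if (current, back) in seen:    # the walk starts repeating itself
            break
        seen.add((current, back))
        contour.append(current)
    # Drop a closing revisit of the start so the list describes one open loop.
    if len(contour) > 1 and contour[-1] == contour[0]:
        contour.pop()
    return contour


if __name__ == "__main__":
    square = {(r, c) for r in range(3) for c in range(3)}
    print(trace_contour(square))
```

Pixels on thin strokes can legitimately appear more than once in the result, because the walk passes them on both sides.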

Page 13:

Clustering of Polygonal Points

Page 14:

Relaxation
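The slides name the two phases (clustering of polygonal points, relaxation) but the transcript does not preserve their details, so the authors' method is not reproduced here. As a stand-in, the sketch below uses the well-known Ramer-Douglas-Peucker simplification, a different technique, only to illustrate how a tolerance trades point count against fidelity. Treating its `epsilon` as comparable to the vectorization parameter ε of the talk is an assumption.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]           # every inner point is close enough
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right           # the split point would appear twice


if __name__ == "__main__":
    corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    print(simplify(corner, 0.5))
    print(simplify(corner, 2.0))
```

A small tolerance keeps the corner of the example; a large one collapses it into a single segment, which is the kind of accuracy loss an automatic parameter control has to detect.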

Page 15:

SVG Representation
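A vectorised glyph made of straight segments could be written as an SVG font glyph roughly as follows. The `font`, `font-face` and `glyph` elements and the `unicode`, `d`, `units-per-em` and `horiz-adv-x` attributes come from the SVG 1.1 fonts module; the font name, the fixed advance width and the helper names are hypothetical. Coordinates are assumed to be already in glyph units with the y-axis pointing up, so a bitmap contour would need to be flipped and scaled first.

```python
import xml.etree.ElementTree as ET


def polygon_to_path(points):
    """Turn a closed polygon into SVG path data using only M, L and Z."""
    head, *tail = points
    parts = [f"M{head[0]} {head[1]}"]
    parts += [f"L{x} {y}" for x, y in tail]
    parts.append("Z")
    return " ".join(parts)


def make_svg_font(glyphs, font_id="docfont", units_per_em=1000):
    """Build an SVG <font> element from {code_point: polygon} (illustrative only)."""
    font = ET.Element("font", {"id": font_id, "horiz-adv-x": str(units_per_em)})
    ET.SubElement(
        font,
        "font-face",
        {"font-family": font_id, "units-per-em": str(units_per_em)},
    )
    for code_point, polygon in glyphs.items():
        ET.SubElement(
            font,
            "glyph",
            {"unicode": chr(code_point), "d": polygon_to_path(polygon)},
        )
    return font


if __name__ == "__main__":
    triangle = [(0, 0), (500, 0), (250, 700)]
    font = make_svg_font({0xE000: triangle})
    print(ET.tostring(font, encoding="unicode"))
```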

Page 16:

Visual Quality

Page 17:

Formal Quality Measurement - Ground Truth

Error function
- absolute number of wrong pixels
- weighted by the distance to the nearest true component
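The error function can be sketched as below: count the pixels where the re-rasterized glyph differs from the ground truth, and weight each one by its distance to the nearest ground-truth pixel that has the colour the result shows there. The exact weighting used in the talk is not given in the transcript, so the Euclidean distance and this choice of target pixels are assumptions.

```python
import math


def weighted_error(truth, result):
    """Compare two equal-sized bitmaps (lists of rows, 1 = black, 0 = white).

    Returns (number of wrong pixels, sum of their distance weights).
    """
    by_colour = {0: [], 1: []}
    for r, row in enumerate(truth):
        for c, value in enumerate(row):
            by_colour[value].append((r, c))

    wrong = 0
    weighted = 0.0
    for r, row in enumerate(result):
        for c, value in enumerate(row):
            if value == truth[r][c]:
                continue
            wrong += 1
            # Distance to the closest ground-truth pixel of the colour shown here.
            targets = by_colour[value]
            if targets:  # no such pixel in the truth: the weight stays 0
                weighted += min(math.hypot(r - tr, c - tc) for tr, tc in targets)
    return wrong, weighted


if __name__ == "__main__":
    truth = [[0, 1, 0],
             [0, 1, 0]]
    result = [[0, 1, 1],
              [0, 1, 0]]
    print(weighted_error(truth, result))
```

With this weighting, a stray pixel right next to the true stroke counts less than one far away from it, even though both add one to the absolute count.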

Page 18:

Results

Page 19:

Adaptive Parameter Control

[Figure: two plots over the vectorization parameter ε (0 to 1.2). First plot: accuracy (0.3 to 1) for the series H, K and d. Second plot: accuracy gradient (-5 to 2) for the same series.]
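Selecting a vectorization parameter from an accuracy curve could look like the sketch below: compute the gradient of accuracy over ε between neighbouring samples and keep the largest ε reached before the gradient first falls below a limit. The selection rule, the threshold value and the sample numbers are all hypothetical; the transcript shows only that accuracy and its gradient were plotted against ε.

```python
def accuracy_gradient(epsilons, accuracies):
    """Finite-difference gradient of accuracy between neighbouring samples."""
    return [
        (accuracies[i] - accuracies[i - 1]) / (epsilons[i] - epsilons[i - 1])
        for i in range(1, len(epsilons))
    ]


def select_epsilon(epsilons, accuracies, max_drop=-1.0):
    """Largest epsilon before the gradient first falls below max_drop (assumed rule)."""
    chosen = epsilons[0]
    for eps, grad in zip(epsilons[1:], accuracy_gradient(epsilons, accuracies)):
        if grad < max_drop:
            break
        chosen = eps
    return chosen


if __name__ == "__main__":
    # Made-up sample values, not measurements from the talk.
    eps = [0.1, 0.2, 0.3, 0.4]
    acc = [0.98, 0.97, 0.95, 0.60]
    print(select_epsilon(eps, acc))
```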

Page 20:

Compression rates

Page 21:

Conclusions

• Good vectorization results already with linear primitives
• High compression rates can be achieved
• Extracted fonts can be easily scaled and further formatted
• Known vectorization methods have been extended towards an adaptive system for automatic parameter control
• These methods can be applied for preservation and handling of unknown typefaces in digitized documents
• Originals may be re-encoded using a document-specific alphabet and font
• Direct integration into XML/SVG-based processes is possible
• Various output formats can be supported by means of XSL transformations

Page 22:

Thank you very much!

[email protected]

Questions