Form Image Compression using Template Extraction and Matching Jianguo Wang and Hong Yan School of Electrical and Information Engineering University of Sydney, NSW 2006, Australia phone: +61 2 9351 5338 fax: +61 2 9351 4824 e-mail: [email protected]

Dec 19, 2015

Transcript
Page 1:

Form Image Compression using Template Extraction and Matching

Jianguo Wang and Hong Yan

School of Electrical and Information Engineering

University of Sydney, NSW 2006, Australia

phone: +61 2 9351 5338

fax: +61 2 9351 4824

e-mail: [email protected]

Page 2:

Multi-copy Form Images

Redundancy Analysis

• Local Redundancy (CCITT Group 3, Group 4, JBIG)

• Global Redundancy

– Component-level redundancy (JBIG2)

– Pattern assemblage redundancy in similar images (TEM)

Page 3:

Flow chart of the TEM form compression scheme

[Flow chart with two paths. Compressing: form images, comparing, template extraction, filled-in pattern extraction, compressing and saving, finish. Restoring: compressed images, template location, restoring compressed images, display and/or saving, finish. Decision nodes with Yes/No branches: "Similar?" and "Try again?".]

Page 4:

Template extraction

• image de-skewing and locating,

• distortion adjusting,

• template extraction,

– generating greyscale image

– thresholding to get two pre-templates

– getting template by comparing pre-templates

• template refining.

Page 5:

A set of adjusted binary form images is overlapped to generate a greyscale image. The density of each pixel is determined by the number of black pixels that overlap at that position.
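The overlapping step can be illustrated with a minimal sketch, assuming the forms are already de-skewed and aligned and are stored as lists of 0/1 rows (1 = black). The function names and the two threshold values are hypothetical; the slides give neither the thresholds used nor the rule for comparing the two pre-templates, so that comparison is left out here.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black, 0 = white


def overlap_counts(images: List[Image]) -> Image:
    """Count, per pixel, how many aligned binary images are black there."""
    height, width = len(images[0]), len(images[0][0])
    counts = [[0] * width for _ in range(height)]
    for img in images:
        for y in range(height):
            for x in range(width):
                counts[y][x] += img[y][x]
    return counts


def threshold(counts: Image, min_count: int) -> Image:
    """Binarize the count image: black where the count reaches min_count."""
    return [[1 if c >= min_count else 0 for c in row] for row in counts]


# Three tiny "forms": a shared rule in row 0 (one copy has a dropped pixel)
# plus a different filled-in mark in row 1 of each copy.
forms = [
    [[1, 1, 1, 1], [1, 0, 0, 0]],
    [[1, 1, 1, 1], [0, 1, 0, 0]],
    [[1, 1, 1, 0], [0, 0, 1, 0]],
]

counts = overlap_counts(forms)

# Hypothetical thresholds, chosen only for illustration.
strict_pre_template = threshold(counts, min_count=3)  # black in every copy
loose_pre_template = threshold(counts, min_count=2)   # black in most copies

print(counts)
print(strict_pre_template)
print(loose_pre_template)
```

In this toy case the filled-in marks never reach either threshold, while the shared rule survives in both pre-templates; the two differ only at the pixel that one copy dropped.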

Page 6:

Examples of the compression approach: (a) an original form image; (b) the template extracted from a set of filled-in forms

Page 7:

Compression

• image de-skewing and locating,

• distortion adjusting,

• filled-in data extraction,

– three possible situations

– two types of prototypes: SCC and CCC

• compression with Group 4, saved as TIFF files.

Page 8:

Decompression

• Two types of prototypes:

– SCC: performed within the rectangle area

– CCC: performed on the pixel set of the prototype

• Three possible situations:

– blank: copy the corresponding prototype

– different: no substitution occurs

– exactly the same: delete the component
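The three situations above can be sketched for the rectangle-based (SCC) case. This is a minimal illustration, not the authors' implementation: the image representation (lists of 0/1 rows), the `Rect` layout, and every function name are assumptions made for the example. The CCC case would compare only the prototype's own pixel set rather than the whole rectangle.

```python
from typing import List, Tuple

Image = List[List[int]]           # binary image: 1 = black, 0 = white
Rect = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical layout


def crop(img: Image, rect: Rect) -> Image:
    """Return a copy of the rectangular region of img."""
    top, left, height, width = rect
    return [row[left:left + width] for row in img[top:top + height]]


def paste(img: Image, rect: Rect, patch: Image) -> None:
    """Overwrite the rectangular region of img with patch, in place."""
    top, left, height, width = rect
    for dy in range(height):
        img[top + dy][left:left + width] = patch[dy]


def restore_region(filled_in: Image, rect: Rect, prototype: Image) -> str:
    """Apply the three restoring rules to one prototype rectangle."""
    _, _, height, width = rect
    region = crop(filled_in, rect)
    if all(pixel == 0 for row in region for pixel in row):
        paste(filled_in, rect, prototype)                    # blank: copy prototype
        return "blank"
    if region == prototype:
        paste(filled_in, rect, [[0] * width] * height)       # same: delete component
        return "same"
    return "different"                                       # different: leave as is


prototype = [[1, 1], [1, 1]]
image = [
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 1],
]
rects = [(0, 0, 2, 2), (0, 2, 2, 2), (0, 4, 2, 2)]
outcomes = [restore_region(image, rect, prototype) for rect in rects]
print(outcomes)
print(image)
```

Checking the blank case first means an empty region is always filled from the template, and only a non-blank exact match is treated as a deletion.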

Page 9:

(c) the reconstructed image; (d) the filled-in data extracted from (a).

Page 10:

Sample forms used for testing

Page 11:

Form Document Compression Experiment Results

Row          Quantity                                           Micros       Soco       Tafe      Westp
A            Directory                                          Micros       Soco       Tafe      Westp
B            Number of files                                       100          6        100         50
C = B*F      Size of all the TIFF files (bytes)              2,141,539    141,673  3,640,274  4,456,211
D = G*B+H    Size of the compressed file (bytes)               467,548     25,702    274,344    896,958
E = C/D      Average compression rate over TIFF                   4.58       5.51      13.27       4.97
F = C/B      Average size of each TIFF file (bytes)             21,415     23,612     36,403     89,953
G = (D-H)/B  Average size of each compressed image (bytes)       4,497      1,652      2,389     16,490
H            Size of the template (bytes)                       17,832     15,792     35,462     72,646
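The row definitions in the table can be checked with a short calculation. The sketch below recomputes E = C/D from the table's own B, C, D and H values, and also reports how much of each compressed file the template accounts for (H/D), which is not a row in the table but follows directly from it. Variable names are illustrative.

```python
# Values copied from the results table:
# directory -> (B: number of files, C: total TIFF bytes,
#               D: compressed bytes, H: template bytes)
results = {
    "Micros": (100, 2_141_539, 467_548, 17_832),
    "Soco": (6, 141_673, 25_702, 15_792),
    "Tafe": (100, 3_640_274, 274_344, 35_462),
    "Westp": (50, 4_456_211, 896_958, 72_646),
}

# E = C / D: average compression rate over the Group 4 TIFF files.
rates = {name: round(c / d, 2) for name, (b, c, d, h) in results.items()}

# H / D: fraction of the compressed file taken up by the shared template.
template_share = {name: h / d for name, (b, c, d, h) in results.items()}

for name in results:
    print(f"{name}: rate {rates[name]:.2f}, template share {template_share[name]:.0%}")
```

Because D = G*B + H, the template cost H is paid once per directory, so its share of the compressed file shrinks as the number of files B grows.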

Page 12:

Conclusion

• TEM reduces pattern assemblage redundancy in similar images;

– can combine with any current standard (CCITT G3, G4, JBIG) to reduce local redundancy

– can combine with JBIG2 to reduce component-level redundancy within the same image;

• a statistical template extraction algorithm that overlaps binary images into a greyscale image;

• form image de-skewing, locating, and distortion adjusting;

• pattern matching rules for SCC and CCC.