UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing Doha, State of Qatar, 18-22 May 2008 Optical Data Capture: Optical Character Recognition (OCR) Intelligent Character Recognition (ICR) Intelligent Recognition
Summary
Concept/Definition
Forms Design
Scanners & Software
Storage
Accuracy
OCR/ICR Advantages and Disadvantages
Intelligent Recognition (IR)
Commercial Suppliers
Definition/Concept of OCR
Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters
Images of the machine-printed characters are extracted from a bitmap of the scanned image
Definition/Concept of ICR
Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters
Images of the handwritten characters are extracted from a bitmap of the scanned image
OCR and ICR Differences
OCR is less accurate than OMR but more accurate than ICR
ICR will require editing to achieve high data coverage
Forms
OCR/ICR has less strict form design requirements compared to OMR
No timing tracks
Has registration marks
ICR requires hand-printed characters in boxes, one alphanumeric character per box
OCR Forms
OCR/ICR is more flexible since:
No timing tracks are required
The image can float on a page
The use of drop-out color reduces the size of the scanner’s output and enhances accuracy
ICR/OCR technology often uses registration marks at the four corners of a document when recognizing an image
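As a rough illustration of how registration marks let an image "float" on the page, the sketch below estimates the shift and skew of a scanned form from two corner marks. It is a simplified example, not the method of any particular product: the function name `estimate_alignment`, the pixel coordinates, and the use of only the top-left and top-right marks are all assumptions made for illustration.

```python
import math

def estimate_alignment(template_tl, template_tr, found_tl, found_tr):
    """Estimate (dx, dy, skew_degrees) of a scan relative to the form template.

    Each argument is an (x, y) pixel position of a registration mark:
    top-left and top-right in the template, then as found in the scan.
    Hypothetical helper for illustration only.
    """
    # Angle of the top edge in the template and in the scanned image.
    template_angle = math.atan2(template_tr[1] - template_tl[1],
                                template_tr[0] - template_tl[0])
    found_angle = math.atan2(found_tr[1] - found_tl[1],
                             found_tr[0] - found_tl[0])
    skew_degrees = math.degrees(found_angle - template_angle)

    # Translation of the top-left mark.
    dx = found_tl[0] - template_tl[0]
    dy = found_tl[1] - template_tl[1]
    return dx, dy, skew_degrees

# Example: the scan is shifted 12 px right and 5 px up, with no rotation.
shift = estimate_alignment((100, 100), (2300, 100), (112, 95), (2312, 95))
print(shift)
```

With the shift and skew known, field positions defined on the template can be mapped onto the scanned image before recognition.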
OCR/ICR Scanners and Software
Forms can be scanned through a scanner, and then the recognition engine of the OCR/ICR system interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable characters)
Users can scan forms without running OCR
Speeds range from 85 to 160 sheets/min (depending on the recognition engine)
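To show what the quoted speed range means for planning, the following sketch converts sheets per minute into scanning hours. The workload of one million sheets and the single scanner are hypothetical figures chosen for illustration; the estimate covers continuous scanning only and ignores form preparation, jams, and operator breaks.

```python
def scanning_hours(total_sheets, sheets_per_minute, scanners=1):
    """Hours of continuous scanning needed for a given workload."""
    minutes = total_sheets / (sheets_per_minute * scanners)
    return minutes / 60

# Hypothetical workload: 1,000,000 sheets on one scanner,
# at the low and high ends of the quoted speed range.
for speed in (85, 160):
    hours = scanning_hours(1_000_000, speed)
    print(f"{speed} sheets/min: {hours:.1f} hours")
```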
OCR/ICR Storage Characteristics
Storage/Retrieval
Images are scanned, stored, and maintained electronically
There is no need to store the paper forms as long as you safeguard the electronic files
With OCR/ICR technologies, images can be scanned, indexed, and written to optical media
Ideal OCR/ICR Accuracy Thresholds
Accuracy:
Accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%)
Up to 99.9% accuracy with editing (like OMR)
The recognition engine must be tuned, tested and validated very carefully
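The difference between 99.5% and 99.9% is easier to judge in absolute terms. The sketch below computes the expected number of wrongly captured characters at each accuracy level; the volume of one million characters is a hypothetical figure used only for illustration.

```python
def expected_errors(total_characters, accuracy):
    """Expected number of wrongly captured characters at a given accuracy."""
    return round(total_characters * (1 - accuracy))

# Hypothetical volume of 1,000,000 captured characters.
for label, accuracy in (("tuned OCR/ICR or keying", 0.995),
                        ("with editing", 0.999)):
    print(f"{label}: {expected_errors(1_000_000, accuracy)} errors")
```

On these assumptions, editing reduces the expected errors by a factor of five, from 5000 to 1000.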
OCR/ICR Advantages
Recognition engines used with imaging can capture highly specialized data sets
OCR/ICR recognize machine-printed or hand-printed characters.
Scanning and recognition allow efficient management and planning for the rest of the processing workload
Quick retrieval for editing and reprocessing
OCR/ICR Disadvantages
Technology is costly
May require significant manual intervention
Additional workload for data collectors
ICR has severe limitations when it comes to human handwriting
Characters must be hand-printed or machine-printed as separate characters in boxes
Ineffective when dealing with cursive characters
OMR-OCR/ICR Compared
OCR/ICR Challenges/Issues
Has corresponding issues with OMR
Algorithm development (Preparation of memory dictionary)
Processing time considerations due to recognition engine
Development costs
Definition/Concept of IR
State-of-the-art recognition technology
Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters
Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image
The ability to capture cursive makes this method unique
Definition/Concept of IR
Eight elements that make up the trajectories of all cursive letters (Figure 1)
Photo: Parascript LLC
Definition/Concept of IR
Intelligent Recognition dynamically uses context
Context is used during the recognition process, improving the accuracy of results
Context helps to identify letters where the symbol segmentation of an image is ambiguous
Photo: Parascript LLC
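The role of context can be illustrated with a toy example. The sketch below is not Parascript's algorithm: it assumes the recognizer has already produced, for each character position, a short list of candidate letters with scores, and it uses a small lexicon of expected answers as the context. The function name `resolve_with_lexicon`, the scores, and the lexicon are all hypothetical.

```python
from itertools import product

def resolve_with_lexicon(candidates, lexicon):
    """Pick the best-scoring candidate word that appears in the lexicon.

    candidates: one list per character position, each holding
                (character, score) pairs from a hypothetical recognizer.
    lexicon:    set of words that are valid in this field (the context).
    Returns the chosen word, or None if no combination is in the lexicon.
    """
    best_word, best_score = None, 0.0
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        if word not in lexicon:
            continue  # context rules this reading out
        score = 1.0
        for _, char_score in combo:
            score *= char_score
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Ambiguous readings for a four-character field.
field = [
    [("D", 0.6), ("O", 0.4)],
    [("O", 0.5), ("0", 0.5)],   # letter O versus digit zero
    [("H", 0.9)],
    [("A", 0.7), ("R", 0.3)],
]
print(resolve_with_lexicon(field, {"DOHA", "DUBAI"}))
```

Without the lexicon, the second position is a tie between "O" and "0"; with it, only one reading is a valid entry, so the ambiguity is resolved.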
Technology Evolution
Technologies: OCR, ICR, Intelligent Recognition
Text styles: machine print, constrained handprint, unconstrained handprint, bad quality machine print, cursive
Form types, specially designed for automatic recognition: constraining boxes or combs; drop-out ink for preprinted text & boxes
Form types, no special form design: no constraining boxes or combs; condensed strings; dirty & noisy forms; bad quality paper; legacy forms
Illustration: Conference on Technology Options for 2011 Census
Major Commercial Suppliers
Top Image Systems (TIS) (http://www.topimagesystems.com)