
© 2014 Cognizant

Cursive Handwriting Recognition

March - 2015


Introduction

Cursive handwriting recognition has been an active area of research and, given its diverse enterprise applications, it continues to be a challenging research topic in terms of accuracy.

The focus here is specifically on recognition of cursive handwritten characters in insurance forms.

Six different types of insurance forms have been used for the current study. These forms are mixtures of printed and handwritten text.

Example fields: phone number, SSN, telephone number, policy number, name, address, dependent details, etc.


Objective

To design a semi-automated method for extracting the cursive handwritten text present in insurance forms and to reduce the errors that can occur during extraction and recognition.

The method also attempts to correct recognition errors using Natural Language Processing (NLP).


Challenges

It is a challenging task to design a practical cursive handwriting recognition system that maintains high accuracy independent of the quality of the input documents.

The complexity of character segmentation stems from the wide variety of fonts, rapidly expanding text styles and poor image characteristics.

Touching, overlapping, separated and broken characters are major causes of segmentation errors.


Methodology

MATLAB is used as the platform to provide an efficient solution within a short period of time.

The Image Processing Toolbox is used to process the captured image in order to extract enhanced image snippets and segment them into characters.

Histogram of Oriented Gradients (HOG) and Principal Component Analysis (PCA) are used to construct the feature model.

The Computer Vision Toolbox is used to classify and recognize segmented characters.

MATLAB utilities are used to build a GUI for displaying the results and to store the extracted data in a database within the MATLAB environment.


The Proposed Model

Input Image (Scanned Insurance Form)

Preprocessing (Snippet Extraction + Segmentation)

Feature Extraction (HOG-PCA)

Classification and Recognition (Neural Network)

Natural Language Processing (Error Correction)

Manual Error Correction (Low-confidence case)

Store the Extracted Data
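The manual-correction branch for low-confidence cases can be illustrated with a small routing sketch. It is written in Python rather than MATLAB, and everything in it is an assumption made for illustration: the function name `route`, the per-field confidence scores, and the 0.85 threshold, none of which are given in the slides.

```python
def route(fields, threshold=0.85):
    """Split recognized fields into auto-accepted and manual-review lists.

    fields: iterable of (field_name, recognized_text, confidence) tuples.
    threshold: hypothetical cut-off below which a field is sent for manual correction.
    """
    accepted, review = [], []
    for name, text, confidence in fields:
        target = accepted if confidence >= threshold else review
        target.append((name, text))
    return accepted, review


# Hypothetical recognition output for two form fields.
recognized = [
    ("policy_number", "A12345", 0.97),
    ("name", "Jchn", 0.62),
]
accepted, review = route(recognized)
```

Fields in `accepted` would go straight to storage, while those in `review` would be shown to an operator first.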


Preprocessing

The following series of operations needs to be performed on the scanned input image.

Tagging of forms (manual process): snippet extraction

Segmentation (each image snippet is segmented into individual glyphs or letters)

Labeling (assigning a number to each segmented character)

Image enhancement: noise filtering, smoothing the edges, normalization, dilation
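The labeling step, which assigns a number to each segmented character, can be sketched as connected-component labeling on a binarized snippet. The sketch below is plain Python rather than the MATLAB toolbox calls the slides list, and the helper names (`label_components`, `bounding_boxes`) and the toy `snippet` are hypothetical. It assumes 8-connectivity and that characters do not touch; touching cursive strokes would come out as a single region and need further splitting.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground regions of a binary image (rows of 0/1).

    Returns (labels, count): labels has the same shape as image, background
    pixels are 0 and each region is numbered 1..count in scan order.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Return {label: (top, left, bottom, right)} for every labelled region."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab == 0:
                continue
            top, left, bottom, right = boxes.get(lab, (r, c, r, c))
            boxes[lab] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    return boxes


# Toy binarized snippet containing two separate strokes.
snippet = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
labels, count = label_components(snippet)
boxes = bounding_boxes(labels)
```

On the toy snippet this finds two regions, whose bounding boxes can then be cropped as individual glyph images.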


Feature Extraction

HOG-PCA features describe the relevant structural information contained in a pattern.

A Histogram of Oriented Gradients (HOG) descriptor is computed and then projected onto a linear subspace using Principal Component Analysis (PCA).

The resulting features are robust under illumination, pose and viewpoint changes.
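To make the HOG part of the feature model concrete, here is a deliberately simplified sketch in Python (not the MATLAB `extractHOGFeatures` call the slides list). It computes one magnitude-weighted orientation histogram for a single cell, using central-difference gradients, nine unsigned bins and L2 normalization. The function name, the bin count and the single-cell layout are assumptions; a full HOG descriptor also tiles the image into cells and normalizes over overlapping blocks.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Orientation histogram for one cell (rows of grey levels).

    Gradients use central differences on interior pixels; orientations are
    unsigned (0 to 180 degrees) and each pixel votes with its gradient magnitude.
    The histogram is L2-normalized so it does not depend on overall contrast.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical edge (dark left, bright right) and a horizontal edge (dark top, bright bottom).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]

h_vertical = hog_cell_histogram(vertical_edge)
h_horizontal = hog_cell_histogram(horizontal_edge)
```

The vertical edge puts all of its weight in the first bin (gradient direction near 0 degrees) and the horizontal edge in the middle bin (near 90 degrees), which is the structural information the descriptor is meant to capture.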


Classification

A Neural Network (NN) is a powerful classifier that is well suited to classifying HOG-PCA features.

A Feed Forward Back Propagation Neural Network (FFBPN) is used.

The neural classifier consists of two hidden layers in addition to an input layer and an output layer.

The output layer has 62 neurons (26 upper case letters, 26 lower case letters and 10 numeric characters), as the proposed system is designed to identify numeric characters and alphabets.
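The shape of such a classifier can be sketched as a forward pass through two hidden layers into a 62-way output. This is an untrained illustration in Python, not the network built with the MATLAB toolbox. The feature length (40), the hidden layer size (30), the tanh and softmax activations, the class ordering and all function names are assumptions; only the two hidden layers and the 62 outputs come from the slides.

```python
import math
import random
import string

# 10 digits + 26 upper case + 26 lower case = 62 classes (ordering is an assumption).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Input -> hidden 1 (tanh) -> hidden 2 (tanh) -> softmax over the classes."""
    h = x
    for layer in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, layer)]
    return softmax(dense(h, layers[-1]))


rng = random.Random(0)
n_features, n_hidden = 40, 30  # illustrative sizes, not taken from the slides
layers = [
    make_layer(n_features, n_hidden, rng),
    make_layer(n_hidden, n_hidden, rng),
    make_layer(n_hidden, len(CLASSES), rng),
]
features = [rng.random() for _ in range(n_features)]  # stand-in for a HOG-PCA vector
probs = forward(features, layers)
best = max(range(len(probs)), key=probs.__getitem__)
predicted_char, confidence = CLASSES[best], probs[best]
```

In a trained network the weights would come from back propagation, and the largest output probability could serve as the confidence value used to decide whether a character needs manual review.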


Error Correction Using Natural Language Processing (NLP)

An NLP-aided dictionary of insurance terms is built.

It stores user terms extracted from previous history (database) and uses them to improve recognition accuracy.

It prevents repeated mis-recognitions.

It also avoids the user stress caused by repeated failures in recognition.

It is used to modify the classification model, which helps to build an adaptive classification model.
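One simple way to picture dictionary-aided correction is fuzzy matching of each recognized word against the stored terms. The sketch below uses Python's standard `difflib` module; the slides do not say which matching method was used, so the similarity measure, the 0.75 cut-off, the function name `correct_word` and the sample term list are all assumptions.

```python
import difflib


def correct_word(recognized, dictionary, cutoff=0.75):
    """Return the closest dictionary term, or the input unchanged if none is close enough."""
    by_lower = {term.lower(): term for term in dictionary}
    matches = difflib.get_close_matches(
        recognized.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else recognized


# Hypothetical dictionary built from previously confirmed form entries.
insurance_terms = ["policy", "premium", "dependent", "beneficiary", "claim"]

fixed = correct_word("po1icy", insurance_terms)      # digit 1 misread for letter l
unchanged = correct_word("xyzzy", insurance_terms)   # nothing similar in the dictionary
```

Here `fixed` is `"policy"`, while `unchanged` keeps its input because no term is similar enough. Adding each manually corrected word to the dictionary is one way the repeated mis-recognitions mentioned above could be avoided.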


Tools used

MATLAB R2013b: generic math functions present in MATLAB

Image Processing Toolbox
Functions: imhist, histeq, dilate, bwlabel, imadjust, adapthisteq, imfilter, imopen, imclose, etc.
Purpose: supports a wide range of image processing operations, including noise filtering, histograms, enhancement and normalization.

Computer Vision Toolbox
Functions: extractHOGFeatures
Purpose: to extract HOG features from the input image.

Statistics Toolbox
Functions: prepca, prestd, trapca, etc.
Purpose: principal component analysis on input data.

Neural Network Toolbox
Functions: nntool
Purpose: to classify and correctly recognize objects based on the observed features.

Database Toolbox
Functions: database, isconnection, set, sql2native
Purpose: to build a database to store the extracted data.

Graphical User Interface (GUI)
Functions: GUIDE
Purpose: to display the results within the MATLAB environment.
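The principal component analysis entry above can be illustrated with a small standardize-then-project sketch. It is plain Python and is not a reimplementation of the listed MATLAB functions. It estimates only the leading principal direction, using power iteration, and the function names and toy data are hypothetical. Power iteration also assumes the starting vector is not orthogonal to the leading direction, which holds for the positively correlated toy data used here.

```python
import math


def standardize(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant columns
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return scaled, means, stds


def first_component(scaled, iterations=200):
    """Estimate the leading principal direction by power iteration on the covariance."""
    n, d = len(scaled), len(scaled[0])
    cov = [[sum(r[i] * r[j] for r in scaled) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(row, means, stds, component):
    """Project one raw feature vector onto the principal direction."""
    scaled = [(row[j] - means[j]) / stds[j] for j in range(len(row))]
    return sum(s * c for s, c in zip(scaled, component))


# Toy feature vectors whose two columns are perfectly correlated.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scaled, means, stds = standardize(data)
component = first_component(scaled)
score_at_mean = project([2.5, 5.0], means, stds, component)
```

Keeping only the first few such directions is what reduces a long HOG vector to a compact feature vector for the classifier.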


Results

Training

Number of characters used for training (alphabetic, numeric and alphanumeric): 20,000

Number of characters used for testing: 15,000

Testing

Numeric recognition: 96% accuracy at an average confidence level of 95%.

Alphabet recognition: 81% accuracy at an average confidence level of 85%.


Average (Upper, Lower and Numeric) Character Recognition Results

Hidden Units    Classification rate (%) with NLP    Classification rate (%) without NLP
10              82.25                               80.08
20              84.93                               82.00
30              85.83                               83.84
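The effect of NLP correction can be read off the table as a difference in percentage points. The short Python sketch below only restates the table's numbers; the variable and function names are hypothetical.

```python
# (hidden units, rate with NLP, rate without NLP), copied from the table above
results = [(10, 82.25, 80.08), (20, 84.93, 82.00), (30, 85.83, 83.84)]


def nlp_gain(with_nlp, without_nlp):
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(with_nlp - without_nlp, 2)


gains = {hidden: nlp_gain(w, wo) for hidden, w, wo in results}
mean_gain = round(sum(gains.values()) / len(gains), 2)
```

The gains are 2.17, 2.93 and 1.99 percentage points for 10, 20 and 30 hidden units, about 2.36 points on average.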


Receiver Operating Characteristic plot

Figure: ROC plots for numeric recognition and alphabet recognition.


Screen shots

Figure: graphical user interface.



Conclusion

This work presents a new NLP-based cursive handwriting recognition approach that produces promising results.

MATLAB is used to build an efficient cursive handwritten character recognition system within a short span of time.

NLP-aided error correction improves the average classification rate by about 2 to 3 percentage points (for example, from 83.84% to 85.83% with 30 hidden units).

The recognition rates on the various insurance forms are very promising in a real-time environment.


