LITERATURE REVIEW - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/25484/10/10... · 2018. 7. 2. · A Neural Network Based Handwritten Character Recognition for Marathi Script

CHAPTER 2

LITERATURE REVIEW

A Neural Network Based Handwritten Character Recognition for Marathi Script


2.1 Introduction

There is a great need for OCR related research in Indian languages, even though

there are many technical challenges as well as the lack of a commercial market [1]. With

the spread of computers in organizations and homes, automatic processing of paper

documents is rapidly gaining importance in India [2]. A short description of the

advancements in OCR of Indian scripts including Bangla, Tamil, Telugu, Gurmukhi,

Oriya, Gujarati, Kannada, and Devanagari up to 2002 can be seen in [3]. This chapter

attempts to cover the advancements up to 2010 in printed as well as handwritten

Devanagari script recognition, along with their reported performance. Devanagari is the script

used for writing many official languages in India, such as Hindi, Marathi, Sindhi, Nepali,

Sanskrit, and Konkani, where Marathi is the language spoken in Maharashtra state.

Several other Indian languages like Gujarati, Punjabi, and Bengali use scripts similar to

Devanagari. More than 300 million people use Devanagari script for documentation in

central and northern parts of India [4]. This chapter presents a comprehensive review of

the work carried out in Devanagari OCR. Section 2.2 discusses the literature review in

the field of machine-printed Devanagari script. Section 2.3 presents the review in the

handwritten character recognition field. In both these cases, the research carried out at

each stage of the OCR namely, pre-processing, feature extraction and

classification/recognition is discussed in detail. Section 2.4 puts forth some observations

and finally the chapter ends giving some concluding remarks in Section 2.5.

2.2 Recognition of machine-printed Devanagari script

The work on automatic recognition of printed Devanagari script started in the early

1970s. The efforts were initiated by Sinha [9], [10] at the Indian Institute of

Technology, Kanpur. A syntactic pattern analysis system for Devanagari script

recognition is presented in Sinha’s Ph.D. thesis [9]. Other OCR systems for printed

Devanagari were developed by Palit and Chaudhuri [11] and by Pal and Chaudhuri [12]. A

team comprising Prof. B. B. Chaudhuri, U. Pal, M. Mitra, and U. Garain of the Indian

Statistical Institute, Kolkata, developed the first commercial-level product for printed

Devanagari OCR. The technology was transferred to the Centre for Development

of Advanced Computing (C-DAC) in 2001 for commercialization and is marketed as

“Chitrankan” [3]. The following sections discuss the preprocessing, feature-extraction,

and classification techniques reported so far for machine-printed Devanagari OCR.

2.2.1 Pre-processing and segmentation techniques

When a document is scanned using an optical scanner, a small degree of skew

(tilt) is unavoidable. Skew angle is the angle that the text lines in the digital image make

with the horizontal direction. Skew estimation and correction are important

preprocessing steps of document layout analysis. As far as documents containing

Devanagari text are concerned, the most important characteristic to be considered for

skew estimation is the header line (shirorekha) joining all the characters in a word.

An approach based on the detection of “shirorekha” is proposed by Chaudhuri

and Pal [13] and in [14]. Das and Chanda [15] also proposed a fast and script-

independent skew estimation technique based on mathematical morphology. After layout

preprocessing like skew elimination, the separation of paragraphs, text lines, words, and

characters is to be carried out for effective feature extraction. Text blocks in the

document pages are extracted first, and then, lines and words are separated. Separation of

text lines from text blocks is called line segmentation and separation of words from each

text line is called word segmentation. Projection profiles, space between words and lines

are used to achieve this in [5].
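
The projection-profile idea behind line segmentation can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows (1 for ink, 0 for background) and treats every ink-free row as a separator; the function names are illustrative and are not taken from [5].

```python
def horizontal_projection(image):
    """Return the number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image):
    """Split a binary page into text lines at rows that contain no ink.

    Returns a list of (first_row, last_row) pairs, both inclusive.
    """
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # a text line begins
        elif ink == 0 and start is not None:
            lines.append((start, y - 1))   # a blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))   # [(1, 2), (4, 4)]
```

Applying the same procedure to the columns of a single text line (a vertical projection) gives a comparable sketch of word segmentation, with a minimum gap width deciding which blank columns separate words.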

Separating words into constituent characters is called character segmentation.

In [5] and [16], characters are segmented from each Devanagari word by removing the

shirorekha (header line). Garain and Chaudhuri [17] presented another technique for

identification and segmentation of touching machine-printed Devanagari characters

based on fuzzy multifactorial analysis. Bansal and Sinha [18] presented a two-pass

algorithm for the segmentation of machine-printed composite characters into their

constituent symbols. The proposed algorithm extensively uses structural properties of the

script. Kompalli et al. [19] used a graph representation method to segment characters

from printed words. In the methodology described by Bansal and Sinha [20], the

segmentation by smearing leaves the overlapping text lines and touching characters

unsegmented. The selection of image regions for further segmentation is based on

statistical analysis of height or width depending on the context. Sharma et al. [21]



presented a rule-based approach for skew correction along with removing insignificant

data like dark band, thumb mark, and specks.

In the method proposed by Kompalli et al. [22], the shirorekha is determined

using projection profile and run length. Once the shirorekha is removed, the top, middle,

and bottom zones are identified easily. Components in top and bottom zones are part of

vowel modifiers. Each of these components is then scaled to a standard size before

feature extraction and classification [23]. To segment touching printed Devanagari

characters on degraded documents, a technique based on fuzzy multi factorial analysis is

proposed in [96], where a predictive algorithm effectively selects the cut points to

segment touching Devanagari characters.
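
Locating the shirorekha from the projection profile and run lengths can be sketched as follows. This is a simplified illustration, not the procedure of [22]: it assumes a binary word image given as a list of rows and takes the header line to be the single row with the longest horizontal ink run, with ties broken by the projection-profile value.

```python
def longest_run(row):
    """Length of the longest horizontal run of ink pixels (value 1) in a row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def find_shirorekha(word):
    """Return the index of the row most likely to be the header line.

    Assumption: the header line is the row with the longest ink run,
    with ties broken by the larger projection-profile value.
    """
    scores = [(longest_run(row), sum(row)) for row in word]
    return max(range(len(word)), key=lambda y: scores[y])


def remove_shirorekha(word):
    """Return a copy of the word image with the header-line row blanked."""
    header = find_shirorekha(word)
    return [[0] * len(row) if y == header else list(row)
            for y, row in enumerate(word)]


word = [
    [0, 0, 1, 0, 0, 0, 0],   # part of an upper modifier
    [1, 1, 1, 1, 1, 1, 1],   # header line
    [0, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 1],
]
header_row = find_shirorekha(word)
print(header_row)                              # 1
print(remove_shirorekha(word)[header_row])     # [0, 0, 0, 0, 0, 0, 0]
```

Rows above the detected index then form the top zone, and the remaining rows hold the core characters and any lower modifiers. A real header line is usually several pixels thick, so a practical version would blank a band of rows instead of one.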

For the binarization of natural scene images containing Devanagari textual

information, an adaptive thresholding technique is proposed in [80]. A water-reservoir-

based analogy is proposed in [39] to extract individual text lines from such documents. It

is necessary to identify the scripts before applying their corresponding recognition

engine. Many techniques on line-wise and word-wise script identification have been

proposed in the literature [79], [82], [84], [86], [91], [95], [98], [106]. In [106], a line-

wise script identification approach is proposed, where different structural features are

used. In [86], appearance-based models are employed for the script identification of the

printed text. These models are based on principal component analysis (PCA) and linear

discriminant analysis (LDA)/Fisher’s linear discriminant (FLD).

Words are identified in multilingual document images using SVM in [95]. In

[98], for word-wise script identification, the document is initially segmented into lines,

and then, the lines are segmented into words. Individual script words are identified from

document images using different topological and structural features. Texture features

have been applied in [84] for script identification. In [79], a technique to identify

Kannada, Hindi, and English text lines from a printed document is presented. To get

higher accuracy, a two-stage approach is proposed for printed script identification in

[82].

2.2.2 Feature extraction techniques

Different features have been used for the recognition of Devanagari characters.

The system described by Sinha and Mahabala [10] for printed Devanagari characters



stores structural descriptions for each symbol of the script in terms of primitives and

their relationships. Sinha [24] also demonstrated how the spatial association among the

constituent symbols of Devanagari script plays an important role in understanding

Devanagari words. In [5], a character is assigned to one of the three groups, namely

basic, modifier, and compound character groups and group-wise features are considered.

Also, it is observed that the compound characters (around 250) in the script occupy only

6% of the text.

The two major features considered for printed Devanagari characters by Jayanthi

et al. [25] are the main horizontal line and the vertical lines. A third feature tests

whether a vertical line is present at the rightmost side of the character. The other

features are the height-to-width (aspect) ratio of the character, whether the

character is narrow or broad ended, and the number of free ends it has. Govindaraju et al.

[16] considered gradient features for feature selection of the characters. Kompalli et al.

[22], [26], used gradient, structural, and concavity (GSC) features for OCR of machine

printed and multifont Devanagari text. The gradient features were used to classify

segmented images. In the method proposed by Dhurandhar et al. [27], the significant

contours of the printed character are extracted and characterized as a contour set based

on a reference coordinate system. Jawahar et al. [23] used PCA for feature extraction of

printed characters. A word-level matching scheme for searching in printed document

images is proposed by Meshesha and Jawahar [28]. The feature-extraction scheme

extracts local features by scanning vertical strips of the word image and combines them

automatically based on their discriminatory potential. The features considered are word

profiles, moments, and transform-domain representations. In [1], printed Hindi words are

initially identified from bilingual or multilingual documents based on features of the

Devanagari script using SVM. Identified words are then segmented into individual

characters in the next step, where the composite characters are identified and further

segmented based on the structural properties of the script and statistical information.

The features used for script identification of machine-printed text in [82] are

64-D CH features and 400-D gradient features. For the purpose of

indexing in [87], printed Devanagari word images are represented in the form of

geometric feature graphs (GFG). It is a graph-based representation of the features

extracted from the image of the word. A set of features including percentiles, horizontal

and vertical derivatives of percentiles, angles, correlations, and energy was used for the

recognition of printed Devanagari characters in [94]. LDA was then used to

reduce the dimensionality of the feature set from 81 to 15.

Zernike moments and directional features are used as the features for printed

characters in [95]. Using background and foreground information, a scheme toward the

recognition of Indian complex documents is proposed in [107].

2.2.3 Recognition/Classification techniques

Many classifiers like artificial neural network (ANN) [22], [23], [61], [77],

hidden Markov model (HMM) [42], support vector machine (SVM) [35], [61], modified

quadratic discriminant function (MQDF) [50], [56], etc., have been used for Devanagari

character recognition. Several compound discriminant functions have been derived from

the projection distance (PD) and the MQDF is one of them [35].

Some contemporary techniques like rough sets, fuzzy rules, evolutionary

algorithms, and Mahalanobis and Hausdorff distances [54], [68], [69], [96], etc., are also

used for recognizing Devanagari characters. A feature-based tree classifier

has been used in [5] to recognize the basic characters. A top–down binary-tree-based

recognition scheme for printed Devanagari characters is proposed by Jayanthi et al. [25], since a

binary tree is one of the fastest decision-making structures for a computer program.

Govindaraju et al. [16] considered 38 characters and 83 frequently occurring

conjunct character classes in a multistage classification approach. Initially, they were

classified into four categories depending on their structural properties. Each category was

then classified using a separate classifier of three-level ANN, where the network is

trained using a standard back propagation algorithm. The recognition of printed

characters in the method proposed by Dhurandhar et al. [27] involves comparing the

contour sets with those in the enrolled database.

In [10], the recognition of printed characters involves a search for primitives on the

labeled pattern based on the stored description. Contextual constraints are also utilized to

arrive at the correct interpretation. In [19], multiple hypotheses are obtained for each

composite character by considering all possible combinations of the classifier results for

the primitive components. A dynamic time warping (DTW) based partial matching

algorithm for morphological matching, which takes care of word-form

variations at the beginning and at the end, is proposed by Meshesha and Jawahar [28].
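
For reference, the basic DTW recurrence that such matching schemes build on can be sketched as below. This is the classic full-sequence DTW, not the partial matching variant of [28]; the sequences are assumed to be one-dimensional numeric features (for example, one profile value per column of a word image), and the absolute difference is an assumed local cost.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],       # a advances, b[j-1] is reused
                cost[i][j - 1],       # b advances, a[i-1] is reused
                cost[i - 1][j - 1],   # both sequences advance
            )
    return cost[n][m]


# A stretched copy of a profile aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))   # 0.0
print(dtw_distance([0, 0], [1, 1]))            # 2.0
```

A partial matching variant would additionally allow the alignment to skip a prefix or suffix of one word at reduced or zero cost, which is how variations at the beginning and end of a word can be tolerated.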

Kompalli et al. [26] outlined two different techniques for OCR of machine-

printed, multifont Devanagari text. In [22], neural network classifiers are used for the

recognition of printed characters and words. Jawahar et al. [23] used SVM for

classifying printed characters. In [1], segmented printed characters are recognized using

generalized Hausdorff image comparisons. In [29], the classification of printed

Devanagari characters is done through five filters: 1) coverage of the region of the core

strip; 2) vertical bar feature; 3) horizontal zero crossings; 4) number and position of

vertex points; and 5) moments.

In [94], for printed Devanagari character recognition, each basic glyph and

ligature is modeled with a 14-state left-to-right HMM with a maximum of 256 Gaussians

per HMM. The training of HMM was carried out using the standard expectation–

maximization procedure. For classification of printed characters in [95], generalized

Hausdorff image comparison, nearest neighbor classifier, weighted Euclidean distance,

and hierarchical classification technique were employed.

General OCR techniques produce poor results on noisy and degraded documents

like old books or newspapers, photocopied materials, faxed documents, etc. [31]. The

quality degradation of old documents and books is mainly due to old print

technology and poor paper quality. As a result, the main difficulty in recognizing the

images of such documents is the distortion of characters caused by the spreading of

ink. Imperfections in scanning may also result in noisy images. To handle such degraded

documents, Dhingra et al. [31] presented an approach for developing a minimum

classification error (MCE) based system. Gabor filters are used to extract the features for

classification directly, as they have been successfully applied to Chinese OCR in [32]. The MCE-

based classifiers make the system robust against random noise by adjusting the

system feature space according to the computed loss function.

Dhingra et al. [31] used a degradation model [33] to simulate the distortions

caused due to the imperfections in scanning. In [71] and [91], the effectiveness of Gabor

and discrete cosine transform (DCT) features was independently evaluated using nearest

neighbor, linear discriminant, and SVM classifiers for the blind recognition of 11

different printed scripts including Devanagari. From the experimentations, it was evident

that the Gabor–SVM combination had an edge over other combinations. The



classification of a machine-printed word to a particular script was done in [82] using

SVM via majority voting of each recognized character component of the word. For the

recognition of multi-oriented Devanagari characters SVM is used in [107] too.
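
The majority-voting step for word-level script identification can be sketched in a few lines. The per-component labels are assumed to come from some character-level classifier (an SVM in [82]); the label strings and the tie-breaking behaviour here are illustrative assumptions.

```python
from collections import Counter


def vote_script(component_labels):
    """Return the script predicted for most character components of a word.

    Ties are resolved in favour of the label that was seen first.
    """
    counts = Counter(component_labels)
    return counts.most_common(1)[0][0]


# Hypothetical per-component predictions for one word image.
labels = ["Devanagari", "Latin", "Devanagari", "Devanagari"]
print(vote_script(labels))   # Devanagari
```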

Towards post processing of Devanagari OCR, only a few works are reported.

Bansal and Sinha [30] described a method for the correction of optically read printed

character strings using a Hindi word dictionary. Pal and Chaudhuri [12] and [99] also

proposed a suffix- and prefix-based error correction technique, which can take care of

different inflectional languages. Only a few works are reported regarding document

retrieval and word spotting. In [88], a search system for retrieval of relevant documents

from a large collection of document images is presented. A DTW-based partial matching

scheme is employed to group together similar words for indexing. Word

profiles, such as upper and lower word profiles and projection and transition profiles, are used as

features for word representation. Two different approaches are proposed for spotting

words in images of printed Sanskrit documents in [97]. In the first approach, a block

adjacency graph (BAG) based scheme for word recognition is used. In the second

approach, a moment-based word matching technique, which maintains a script invariant

representation of all word images, is employed. Word matching is then carried out using

cosine similarity.
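
Matching by cosine similarity can be sketched as follows. The feature vectors stand in for moment-based word descriptors; their values, the dictionary keys, and the helper names are hypothetical and are not taken from [97].

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def best_match(query, candidates):
    """Return the key of the candidate vector most similar to the query."""
    return max(candidates,
               key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical moment vectors for three indexed word images.
index = {
    "word_a": [0.9, 0.1, 0.3],
    "word_b": [0.1, 0.8, 0.5],
    "word_c": [0.4, 0.4, 0.4],
}
print(best_match([0.8, 0.2, 0.3], index))   # word_a
```

Because the cosine depends only on the direction of the vectors, two descriptors that differ by a constant scale factor are treated as identical.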

A shape-code-based word-spotting matching technique for retrieval of

multilingual Indian documents is proposed by Tarafdar et al. [100], where different

primitive shape codes like 1) zonal information of extreme points; 2) vertical-shape-

based feature; 3) crossing count (with respect to the position of vertical bar); 4) loop

shape and position; and 5) background information, etc., are used. An inexact matching

technique is employed to measure the similarity for possible spotting.

The details of many printed Devanagari character and word recognition systems

are summarized in Tables 2.1 and 2.2, respectively. It is evident from Table 2.1

that for printed Devanagari characters, the method proposed by Dhingra et al. [31] is

superior to other methods in terms of recognition accuracy. For printed word recognition,

the method proposed by Kompalli et al. [19] has the highest accuracy, as shown in Table

2.2.



Table 2.1 Details of printed Devanagari character recognition systems

Method                   Feature                     Classifier                             Data set (size)  Accuracy (%)
Govindaraju et al. [16]  Gradient                    Neural networks                        4,506            84
Kompalli et al. [22]     GSC                         Neural networks                        32,413           84.77
Bansal et al. [20]       Statistical and structural  Statistical knowledge sources          Unspecified      87
Huanfeng Ma et al. [1]   Structural and statistical  Hausdorff image comparison             2,727            88.24
Sinha et al. [10]        Structural                  Syntactic pattern recognition          Unspecified      90
Natarajan et al. [94]    Derivatives                 HMM                                    21,982           91.3
Bansal et al. [29]       Filters                     Five filters                           Unspecified      93
Dhurandhar et al. [27]   Contours                    Interpolation                          546              93.03
Kompalli et al. [26]     GSC                         K-nearest neighbor                     9,297            95
Jayanthi et al. [25]     Statistical                 Binary tree                            4,863            95.08
Chaudhuri et al. [5]     Statistical                 Tree classifier and template matching  10,000           95.08
Kompalli et al. [19]     SFSA                        Stochastic finite state automaton      10,606           96
Jawahar et al. [23]      PCA                         Support vector machine                 2,00,000         96.7
Dhingra et al. [31]      Gabor                       MCE                                    30,000           98.5

Table 2.2 Details of printed Devanagari word recognition systems

Method                   Feature                     Classifier                             Data set (size)  Accuracy (%)
Govindaraju et al. [16]  Gradient                    Neural networks                        4,506            53
Kompalli et al. [26]     GSC                         K-nearest neighbor                     1,882            58.51
Kompalli et al. [22]     GSC                         Neural networks                        14,353           61.8
Huanfeng Ma et al. [1]   Statistical and structural  Hausdorff image comparison             578              66.78
Chaudhuri et al. [5]     Statistical                 Tree classifier and template matching  10,000           83.67
Kompalli et al. [19]     SFSA                        Stochastic finite state automaton      10,606           87



2.3 Recognition of handwritten Devanagari script

Research on Indian handwritten character recognition has received increased

attention only in recent years, although the first research report on offline

handwritten Devanagari characters was published in 1977 [34]. Many approaches have

been proposed toward handwritten Devanagari numeral, character, and word recognition

in the past decade [35].

2.3.1 Pre-processing and segmentation techniques

Some handwritten documents (e.g., Indian postal documents) may contain

non-text parts (like stamps, seals, etc.). Before such a document is recognized, its text and

non-text parts must be separated. Many techniques [37], [38] based on connected

component analysis, the run-length smoothing approach (RLSA), and morphological

operations are used for this.
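
The run-length smoothing idea can be sketched for the horizontal direction as follows. The sketch assumes a binary image as a list of rows (1 for ink) and, as a simplifying assumption, fills only those background gaps that are bounded by ink on both sides; the threshold name `max_gap` is illustrative.

```python
def rlsa_row(row, max_gap):
    """Fill background gaps of at most max_gap pixels that lie between ink pixels."""
    out = list(row)
    last_ink = None                      # index of the most recent ink pixel
    for x, pixel in enumerate(row):
        if pixel:
            if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                for k in range(last_ink + 1, x):
                    out[k] = 1           # smear the short gap
            last_ink = x
    return out


def rlsa_horizontal(image, max_gap):
    """Apply horizontal run-length smoothing to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]


row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(rlsa_row(row, 2))   # [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
```

After smearing, characters of the same word or block merge into larger connected components, whose size and shape can then be used to tell text regions from non-text regions.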

For converting gray-scale images to binary, many techniques are employed in the

literature. In [38], images are binarized using a histogram based global binarization

algorithm [39]. In [41] and [42], the Devanagari word image is first smoothed using a

median filter, and then, binarized by Otsu’s [43] thresholding method. The binarized

image is then smoothed using a median filter. Both local and global methods are used in

some of the works [37]. Noise removal of the document is also an important step toward

the recognition. Bajaj et al. [44] used a median filtering-based approach for noise

removal from the images of handwritten Devanagari numerals.
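
Otsu's method [43] chooses the global threshold that maximizes the between-class variance of the gray-level histogram. The following is a minimal sketch under the assumptions that pixels are integers in the range 0 to 255 and that ink is darker than the background; the median filtering used in [41] and [42] is omitted.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue                  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break                     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map dark pixels (<= threshold) to ink (1) and the rest to background (0)."""
    return [1 if p <= threshold else 0 for p in pixels]


gray = [10] * 50 + [200] * 50         # flattened toy image: dark ink, light paper
t = otsu_threshold(gray)
print(t)                              # 10
print(sum(binarize(gray, t)))         # 50
```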

For skew angle detection of handwritten Devanagari words and characters, an

extension to the work in [13] is proposed in [67]. The method treats shirorekha (header

line) as an inherent feature of Devanagari script. The authors assume that a

handwritten Devanagari word never has a perfectly straight shirorekha and hence

consider the straightest part of the shirorekha for skew determination. A heuristic

approach has been applied to detect the skew angle. Initially the document is scanned

from all the four sides for getting the coordinates of pixels encountered along the

demarcation of the word boundaries. First-order differential of the coordinate

information gives the spatial-level curve. Various levels are then clustered using the

nearest neighborhood algorithm to form various regions. The biggest region is treated as



the region of importance. The skew angle is then calculated through a heuristic weight

assignment scheme.

In [41], mathematical morphological operations, namely erosion and dilation

were used to detect the shirorekha of each Devanagari word. With the assumption that

the shirorekha is piecewise linear, the skew correction of the word is performed after

detecting the shirorekha. The skew angle is found using eigenvectors of the scatter

matrix of each component (piece) of shirorekha. For correcting the skew of the word, it

is again divided into slabs of a particular number of columns. Each slab is pushed up or

down depending on the skew angle of the shirorekha component of that particular slab.
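
Estimating the skew of one shirorekha piece from the eigenvectors of its scatter matrix can be sketched as below. For a 2 × 2 scatter matrix the direction of the principal eigenvector has a closed form, which is used here in place of a general eigen-decomposition. The input is assumed to be a list of (x, y) pixel coordinates of the piece, with y growing downward as in image coordinates; the function name is illustrative.

```python
import math


def skew_angle_degrees(points):
    """Angle of the principal axis of a point set, in degrees.

    The scatter matrix is [[sxx, sxy], [sxy, syy]]; its dominant eigenvector
    makes the angle 0.5 * atan2(2 * sxy, sxx - syy) with the x axis.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


# Pixels of a header-line piece that drops one row for every ten columns.
piece = [(0, 10), (10, 11), (20, 12), (30, 13)]
print(round(skew_angle_degrees(piece), 2))   # 5.71
```

With the angle of each piece known, the corresponding slab of columns can be shifted up or down so that its part of the header line becomes horizontal.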

Text-line segmentation is an important task in the automatic recognition of offline

handwritten text documents. Variations in interline distance, inconsistent

baseline skew, and touching and overlapping text lines make this task more crucial and

complex. Correctness/incorrectness of text-line segmentation directly affects the

accuracy of word/character segmentation, which consequently changes the accuracy of

word/character recognition.

Several techniques for text-line segmentation are reported in the literature [101],

[102]. The techniques may be categorized into four groups, which are as follows: 1)

projection-profile-based techniques; 2) Hough-transform based techniques; 3) smearing

techniques; and 4) methods based on thinning operation. As a conventional technique for

text-line segmentation, global horizontal projection analysis of black pixels has been

utilized for line segmentation in printed documents [3]. However, this technique cannot

be used directly on unconstrained handwritten text documents due to text-line skew

variability, inconsistent interline distances, and overlapping and touching components of

two consecutive text lines. Partial or piecewise horizontal projection analysis of black

pixels is employed by many researchers to separate handwritten text lines of different

languages [60], [103], [104].

In the piecewise horizontal projection technique, a text-page image is initially

decomposed into a number of vertical stripes. The positions of potential piecewise

separating lines (PSL) are obtained for each stripe using partial horizontal projection on

each stripe. To compute the PSLs, the row-wise sum of all black pixels of a stripe is

calculated. A row where this sum is zero is a PSL. The extra pieces of lines are

removed based on some heuristic rules. The potential separating lines are then connected

to achieve complete separating lines for all respective text lines of the image [40], [104].
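
The PSL computation can be sketched as follows, assuming a binary page image as a list of rows (1 for black pixels) and a fixed stripe width. The heuristic pruning and the joining of PSLs across stripes are not shown, and the function name is illustrative.

```python
def piecewise_separating_rows(image, stripe_width):
    """For each vertical stripe, list the rows with no black pixel in that stripe.

    Returns a list of ((first_column, end_column), blank_rows) pairs, where
    end_column is exclusive.
    """
    width = len(image[0])
    result = []
    for x0 in range(0, width, stripe_width):
        x1 = min(x0 + stripe_width, width)
        blank_rows = [y for y, row in enumerate(image) if sum(row[x0:x1]) == 0]
        result.append(((x0, x1), blank_rows))
    return result


# Two skewed text lines: no row of the full page is free of ink,
# but every stripe has blank rows of its own.
page = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(piecewise_separating_rows(page, 2))
# [((0, 2), [1, 3]), ((2, 4), [0, 2])]
```

In the example a global horizontal projection finds no separator at all, while the stripe-wise profiles do, which is the motivation for the piecewise technique on skewed handwriting.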



For line segmentation of handwritten Devanagari text in [83], a method based on

header line detection, base line detection, and contour-following technique is proposed.

The proposed method is free from preprocessing techniques like skew correction,

thinning, and noise removal. Roy et al. [105] proposed morphology based handwritten

line segmentation using foreground and background information. Hanmandlu et al. [59]

used a structural approach for segmentation of handwritten Hindi text. In [81], a dual

method based on interdependency between text line and interline gap is proposed for the

identification of handwritten Devanagari text. The method draws curves simultaneously

through the text and interline gap points found from strip-wise histogram peaks and

inter-peak valleys. The curves stabilize after several iterations and then define the final

text lines and interline gaps. Also, because of the upper and lower modifiers of Devanagari

text, many touching components may occur between two consecutive lines, and more research is

needed to solve these problems in Devanagari scripts. After a text line is segmented,

words are separated from it.

Most of the existing techniques use the vertical projection profile for this purpose [3],

[60]. For the segmentation of characters from words, there are two types of segmentation

schemes: recognition-free and recognition-based segmentation. In recognition-free

segmentation, a character string is divided into segments by rules, without

recognition. In recognition-based segmentation, candidate segmentation points are

verified with a recognizer. In the past years, many algorithms for the segmentation of

character strings have been proposed [3], [41], [59].

One class of approaches uses contour features for segmentation. Analyzing the

contour of a connected pattern, the corresponding valley and mountain points are

derived. A cutting path is then decided to segment the connected pattern by joining

valley and mountain points. In general, contour-based methods do not provide accurate

results. Some researchers use profile features for segmentation. Profile-based methods

fail when the handwriting is strongly skewed or overlapped. A multiagent-based

approach to the segmentation of touching handwritten Hindi numerals is presented in

[65]. The first agent locates possible touching points based on the thickness of the handwriting.

The second agent works on the thinned image to locate possible touching points based on the

rules that govern the connection of different segments to form digits. The two agents

then negotiate and try to agree on the actual touching points. The distortions in



handwritten Devanagari characters are removed in [72] using a thickening process

followed by thinning and pruning operations.

Hanmandlu et al. [59] make an attempt to segment handwritten Devanagari

words into constituent characters and modifiers. Initially, the handwritten text is

segmented into lines and words using the technique given in [60]. The segmentation of

each word includes its separation into characters, lower modifiers, upper modifiers, and

separation of compound (composite) characters into consonants and half consonants.

Initially, the header line is located and removed after correcting the skew. Analysis of

horizontal pixel density in the top half of a word gives the location of the header line.

After removing the header line, upper modifiers and characters below the header line are

separated using connected component analysis. The characters below the header lines are

analyzed further for the presence of lower modifiers. This is done by horizontally

scanning the thinned image from top to bottom. A window-based approach is used to

find whether the segmented character is a composite one or not. The segmentation of

characters from a Devanagari word in [41] is based on the assumption that a shirorekha

(header line) is always present in a word.

Some works are reported on script identification from handwritten documents. It

is done using texture features in [84]. The texture features are extracted based on the

co-occurrence histograms of wavelet-decomposed images, which capture information

about the relationship between each high-frequency subband and the corresponding low-

frequency subband of the transformed image.

The correlation between the subbands at the same resolution is significant in

characterizing a texture. For script identification in handwritten documents in [85],

denoising, thinning, pruning, m-connectivity, and text size normalization are done in

sequence. Afterward, multichannel Gabor filtering is used to extract texture features that

characterize the visual appearances of the document image. There exist documents,

where both machine printed and handwritten texts appear together.

In [92], a machine printed and handwritten text classification for Devanagari and

Bangla is presented. The scheme is based on both the structural and statistical features of

printed and handwritten text lines.



2.3.2 Feature extraction techniques

Even though researchers test different features, statistical and structural features

are mostly used for handwritten numeral/character recognition. The feature-extraction

methods in [8] for handwritten Devanagari numeral recognition are based on both

statistical and structural features. Sethi and Chatterjee [34] described handwritten

Devanagari numeral recognition based on a structural approach. The primitives used are

horizontal and vertical line segments, right and left slants. For handwritten numerals in

[2], a wavelet filter-based multiresolution analysis of input numeral images is carried out

in a cascaded manner. The authors note that the Daubechies wavelet, as a problem-solving tool,

fits digital computation well because its basis functions are defined by multiplication and

addition operators, with no derivatives or integrals involved. They considered

high-level features based on contour representations of all the four frequency

components (high–high, high–low, low–high, and low–low) of the wavelet-filtered

image.

Bajaj et al. [44] represented each handwritten Devanagari numeral using three

types of features: 1) density features; 2) moment features of right, left, upper, and lower

profile curves; and 3) descriptive component features. For extracting the features, a box

approach is proposed by Hanmandlu et al. [45], [46] for handwritten numbers, which

requires a spatial division of the numeral image into boxes. Ramteke and Mehrotra [47]

evaluated the performance of various techniques based on moment invariants on

handwritten Devanagari numerals. The features that have been extracted are based on

moments, image partition, principal component axes, correlation coefficient, and

perturbed moments. Thinning-based features are also used in Devanagari handwritten

character recognition. From the thinned images of handwritten Hindi numerals, three

different types of feature points, namely end, branch, and cross points are extracted first

in [62]. The strokes between these feature points and their cavity information are also used

for recognition. In [75], translation and scale invariance of handwritten

Devanagari numerals is achieved using simple geometric moments. Higher order

Zernike moments are also used in the same work as shape descriptors.
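
One standard way of obtaining translation and scale invariance from geometric moments is through normalized central moments. The sketch below illustrates that general construction and is not necessarily the exact normalization used in [75]. It assumes a non-empty binary image given as a list of rows.

```python
def normalized_central_moment(image, p, q):
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2) for a non-empty binary image.

    Central moments (taken about the centroid) remove the effect of
    translation; dividing by a power of the area mu_00 compensates for scale.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    m00 = len(points)                            # area = zeroth-order moment
    cx = sum(x for x, _ in points) / m00         # centroid from first-order moments
    cy = sum(y for _, y in points) / m00
    mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
    return mu / m00 ** (1 + (p + q) / 2)


shape = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
shifted = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
a = normalized_central_moment(shape, 2, 0)
b = normalized_central_moment(shifted, 2, 0)
print(abs(a - b) < 1e-12)   # True: the value does not change under translation
```

On a pixel grid the scale invariance holds only approximately, because resampling a shape changes its discrete outline.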

The feature used for classifying handwritten digits in [90] is the quad-tree-based

longest run feature (QTLR). Chain code and gradient-based features are used for

Devanagari numeral recognition in [56]. Fourier descriptors (FD) capable of representing



shapes have been used as features in [57] for handwritten numerals. Sixty-four-

dimensional FD invariant to rotation, scale, and translation represent each handwritten

numeral. Kumar [48] compared performances of five feature-extraction methods on

handwritten characters. The various features covered are Kirsch directional edges,

distance transform, chain code, gradient, and directional distance distribution. The

experiments show that, with SVM classifiers, Kirsch directional edges perform worst and

gradient features perform best. With multilayer perceptrons (MLP),

the performance of gradient and directional distance distribution is almost the same. The

chain-code-based feature performs better than Kirsch directional edges and distance

transform. A new feature is also proposed in the paper, where the gradient direction is

quantized into four-directional levels and each gradient map is divided into 4 × 4

regions. This is combined with total distances in four directions and neighborhood pixels

weight. Kaur [49] used Zernike moments along with zoning for feature extraction from

handwritten Devanagari characters.
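
The idea of quantizing the gradient direction into four levels and pooling over a 4 × 4 grid can be sketched as follows. This is an illustration only, not the feature of [48]: the central-difference gradient, the folding of directions into the range [0, pi), the weighting by gradient magnitude, and the function name gradient_direction_feature are all assumptions. With four levels and 16 regions the result is a 64-dimensional vector.

```python
import math

def gradient_direction_feature(img, grid=4, levels=4):
    """Histogram of quantized gradient directions over a grid x grid partition."""
    h, w = len(img), len(img[0])
    feat = [0.0] * (grid * grid * levels)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]   # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi              # orientation in [0, pi)
            level = min(int(angle / (math.pi / levels)), levels - 1)
            cell = (y * grid // h) * grid + (x * grid // w)   # region index
            feat[cell * levels + level] += mag
    return feat
```

Each group of four consecutive entries holds the direction histogram of one region, so a vertical stroke edge contributes only to level 0 of the regions it crosses.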

The application of moments as a feature extractor provides a method for

describing the properties of an object in terms of its area, position, orientation, and other

precisely defined parameters. For the recognition of handwritten Devanagari non-compound

characters, shadow features and chain-code histogram (CH) features are computed in [61]. In [63],

the handwritten Devanagari characters are represented using chain-code features. In [68],

the features are extracted from handwritten Devanagari characters using a box approach

presented in [69]. Each character image is divided into 24 boxes. The features are

represented using normalized vector distances for each character. The shirorekha and

spine in a handwritten character are detected using a differential-distance-based

technique in [72]. Features such as crossing points, end points, and corners are also

considered in the same work.
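
A rough sketch of a box-based feature in the spirit of the approach used in [68] is given below. The details are assumptions made for illustration: the 24 boxes are laid out as 6 rows by 4 columns, distances are measured from the top-left corner of each box, and the mean distance is normalized by the box diagonal. The original papers [68], [69] should be consulted for the exact definition.

```python
import math

def box_features(img, rows=6, cols=4):
    """Mean normalized distance of foreground pixels in each box.

    img is a binary image (list of rows). Pixels beyond rows * box_height
    or cols * box_width are ignored.
    """
    bh, bw = len(img) // rows, len(img[0]) // cols
    diag = math.hypot(bh, bw)            # normalising constant per box
    feats = []
    for br in range(rows):
        for bc in range(cols):
            total, count = 0.0, 0
            for dy in range(bh):
                for dx in range(bw):
                    if img[br * bh + dy][bc * bw + dx]:
                        total += math.hypot(dx, dy)   # distance from box corner
                        count += 1
            # empty boxes contribute a zero feature
            feats.append(total / count / diag if count else 0.0)
    return feats
```

The function returns one value per box, so the default layout yields a 24-dimensional vector.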

In [73], a feature-extraction technique to improve the recognition results of two

similarly shaped handwritten characters is discussed. The technique is based on the Fisher

ratio (F-ratio), a statistical measure defined by the ratio of the between-class variance to

the within-class variance. The main features for handwritten Devanagari characters

considered in [77] are CH features, four-side-view-based features, and shadow-based

features. Features used by Sharma et al. [50] for handwritten Devanagari characters are

obtained from the directional chain code information of the contour points of the

characters. The bounding box of a character is segmented into blocks and a CH is



computed in each of the blocks. Based on the CH, they have used 64-D features for

recognition. The features used by Pal et al. [51] for handwritten characters are mainly

based on directional information obtained from the arc tangent of the gradient and

Gaussian filter.
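
The Fisher ratio mentioned above in connection with [73] can be computed for a single feature as sketched below. This uses one common formulation (variance of the class means divided by the average within-class variance); whether [73] uses exactly this normalization is not stated here, and the function name f_ratio is an assumption.

```python
from statistics import mean, pvariance

def f_ratio(samples_by_class):
    """Between-class variance over within-class variance for one feature.

    samples_by_class is a list with one list of feature values per class.
    Raises ZeroDivisionError if every class has zero variance.
    """
    class_means = [mean(s) for s in samples_by_class]
    between = pvariance(class_means)                        # spread of the class means
    within = mean([pvariance(s) for s in samples_by_class]) # average spread inside a class
    return between / within
```

A feature that separates two similarly shaped characters well has class means far apart relative to the spread within each class, and therefore a large F-ratio.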

In [35], a comparative study of Devanagari handwritten character recognition

using 12 different classifiers and four sets of features is presented. Feature sets used in

the classifiers are computed based on curvature and gradient information obtained from

binary as well as gray-scale images. The histogram of chain-code directions in the

image-strips scanned from left to right by a sliding window is used by Shaw et al. [42] as

a feature vector for handwritten Devanagari word recognition.

2.3.3 Recognition/Classification techniques

A decision tree is employed to perform the analysis of hand printed Devanagari

numerals by Sethi and Chatterjee [34] depending on the presence/absence of primitives

like horizontal and vertical line segments, right and left slants and their interconnections.

A similar strategy is applied to the constrained hand printed characters in [52].

Bhattacharya and Chaudhuri [2] use a distinct MLP classifier at each stage of their

recognition scheme for handwritten numerals. Each such classifier either classifies or

rejects an input numeral at the corresponding resolution level. If the MLP classifier at a

coarser resolution level rejects a numeral, the classifier of the following stage attempts to

recognize it at the next higher resolution level. Finally, if rejection still occurs at the

highest resolution level, the output vector of each of these three MLP classifiers is

transformed into a kind of likelihood measurement. Another MLP classifier has been

used to obtain the final decision by combining these three likelihood measurement

vectors. Patil and Sontakke [53] proposed a general fuzzy hyperline segment neural

network for rotation, scale, and translation invariant handwritten numeral recognition. It

combines supervised and unsupervised learning in a single algorithm so that it can be

used for pure classification, pure clustering, and hybrid classification/clustering. Bajaj et

al. [44] combined decisions of multiple classifiers for handwritten Devanagari numerals.

A neural network-based classification scheme is designed for this task. Three different

neural classifiers have been used for classification. The outputs of the three classifiers are

combined using a connectionist scheme.
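
The multistage scheme of [2] described above, in which each stage either accepts or rejects and a final combiner decides when every stage rejects, has the following general control flow. This is only a sketch: the confidence threshold used as the rejection rule, the callable interface, and the names cascade_classify and sum_combiner are assumptions, and the simple score summation stands in for the additional MLP used in [2].

```python
def cascade_classify(features_by_level, classifiers, combiner, threshold=0.9):
    """Try classifiers from coarse to fine; fall back to a combiner.

    features_by_level[i] is the input for classifiers[i]; each classifier
    returns a list of per-class scores.
    """
    outputs = []
    for feats, clf in zip(features_by_level, classifiers):
        scores = clf(feats)
        outputs.append(scores)
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:   # confident enough: accept at this level
            return best
    return combiner(outputs)            # every level rejected the input


def sum_combiner(outputs):
    """Stand-in combiner: pick the class with the largest summed score."""
    totals = [sum(col) for col in zip(*outputs)]
    return max(range(len(totals)), key=totals.__getitem__)
```

The threshold controls the trade-off between early acceptance at a coarse level and deferring the decision to finer levels.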



Hanmandlu et al. [46] proposed a fuzzy model-based scheme for recognition of

handwritten Devanagari numerals by representing them in the form of exponential

membership functions, which serve as a fuzzy model. Recognition is performed by modifying the

exponential membership functions fitted to the fuzzy sets. These fuzzy sets are

derived from features consisting of normalized distances obtained using the Box

approach. The Gaussian distribution function has been adopted by Ramteke and

Mehrotra [47] for classification of handwritten numerals. In [76], a method is proposed

based on cubic spline interpolation for determining smooth and continuous edges in the

images of handwritten Devanagari numerals.

In [90], a Hough transformation-based technique is used to localize the postal

code blocks from structured postal documents with a defined address block region.

Isolated handwritten digits are then extracted from the localized postal-code region. In

[58], a system is proposed to classify handwritten Devanagari characters into several

groups based on similarity measure. The header line (shirorekha) is located based on end

points and pixel positions in the top half of the character image. The header line is

removed from the images of every character before coarse classification. Three different

classifiers, namely nearest neighbor, k-NN, and SVM were tested independently to

recognize handwritten Devanagari numerals in [57]. The performance of SVM in terms

of accuracy was better than the other two classifiers.
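
Locating and removing the header line before coarse classification can be sketched with a horizontal projection profile. Note that this is a substituted, simpler technique: [58] locates the shirorekha from end points and pixel positions, whereas the sketch below simply takes the densest row in the top half of the image. The function names and the fixed thickness parameter are assumptions.

```python
def locate_header_line(img):
    """Index of the row in the top half with the most foreground pixels."""
    top_half = img[: max(1, len(img) // 2)]
    row_sums = [sum(1 for v in row if v) for row in top_half]
    return max(range(len(row_sums)), key=row_sums.__getitem__)


def remove_header_line(img, thickness=1):
    """Return a copy of img with the detected header rows cleared."""
    r = locate_header_line(img)
    return [
        [0] * len(row) if r <= i < r + thickness else list(row)
        for i, row in enumerate(img)
    ]
```

The input image is left unchanged; a real system would also have to estimate the stroke thickness and handle a slanted or broken shirorekha.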

A syntactic representation (SR) of features is used in [62] for handwritten

numeral recognition. This representation is matched against the set of prototype SRs of

handwritten numerals for a possible match. Edge direction histogram features are used

along with PCA for enhancing recognition accuracies of handwritten Devanagari

numerals in [76]. Recognition of handwritten numeric postal codes in a multiscript

environment is presented in [90]. Similarly shaped digit patterns of four scripts, namely

Latin, Devanagari, Bangla, and Urdu are grouped in 25 clusters. A script-independent

pattern SVM-based classifier is designed to classify the numeric postal codes into one of

these 25 clusters. Based on the classification decisions, a rule-based script inference

engine is developed to infer the script of the numeric postal code. One of the four

script-specific SVM-based classifiers is then invoked to recognize the digits of the

corresponding script.

The work in [64] explores the potential of a clonal selection algorithm (CSA)

in recognition. In particular, a retraining scheme for the CSA is proposed for better



recognition of handwritten Devanagari numerals. A size-normalized binary image matrix is

used as the feature map. In [49], the feature vector is fed as input to

a feed-forward back-propagation neural network for the classification of

handwritten Devanagari characters. Kumar [48] compared the performances of SVM and

MLP classifiers with six different features on handwritten characters and found that the

performance of the SVM classifier was superior to that of the MLP in all six cases. However, the

classification time required for SVM was greater than that of MLP. Sharma et al. [50]

proposed a quadratic classifier-based scheme for the recognition of handwritten

characters. A modified quadratic classifier is applied by Pal et al. [51] on the features of

handwritten characters for recognition.

In [55], two classifiers, SVM and MQDF, are combined to obtain higher character

recognition accuracy with the same features. A comparative study was done by

Pal et al. [35] on Devanagari handwritten

character recognition using 12 different classifiers like PD, subspace method (SM),

linear discriminant function (LDF), SVM, MQDF, mirror image learning (MIL),

Euclidean distance (ED), nearest neighbor, k-NN, modified PD (MPD), compound PD

(CPD), and compound MQDF (CMQDF). From the experiments, they noted that the MIL

classifier gave the best results and ED the poorest among all the 12

classifiers considered. A divide-and-conquer strategy is adopted in [58] for the

recognition of handwritten Devanagari characters, where each category is divided into

subcategories based on structural properties to make the classification process simpler.

The subcategories considered in the paper are connected characters, non-connected

characters, end-bar characters, middle-bar characters, without-bar characters, end-bar

characters with one closed loop, end-bar characters with two closed loops, and

without-bar characters with a loop. This coarse classification is done by identifying the presence

and position of vertical line segments and closed loops.

In [59], the top modifiers of Devanagari script are classified into one touching-

point and two touching-point modifiers by checking whether a modifier touches the

header line at two positions or not. Further classification of two touching-point modifiers

is done by analyzing the core strip of the word. Two MLPs and a minimum edit distance

(MED) method are used for classification of handwritten Devanagari non-compound

characters in [61]. In the first stage of classification, characters with distinct shapes are

classified using two MLPs. Shadow features are used for one MLP and CH features are



used for the other MLP for classification. In the second stage of classification, confused

characters having similar shapes are classified using a MED method. This method makes

use of corners detected in a character image using modified Harris corner detection

technique. The work reported in [72] presents a two-stage classification approach for

handwritten Devanagari characters. The first stage uses structural properties such as the

shirorekha and spine in a character. The second stage exploits intersection features of

characters, which are then fed to a feed forward neural network (FFNN) for further

classification.

In [77], three MLPs are designed for three types of features. Each MLP is trained

with a back-propagation learning algorithm. The results of the three MLPs are then combined

using a weighted majority scheme. The work reported in [63] discusses the use of regular

expressions (RE) in handwritten Devanagari character recognition, where a handwritten

character is converted into an encoded string based on chain-code features. Then, RE of

stored templates is matched with it. Rejected samples are then sent to a MED classifier

for recognition.
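
Combining the outputs of several classifiers by a weighted majority scheme, as reported for the three MLPs in [77], can be sketched as below. How the weights are chosen in [77] is not described here; using, for instance, each classifier's validation accuracy as its weight is an assumption, as is the function name weighted_majority.

```python
from collections import defaultdict

def weighted_majority(predictions, weights):
    """Return the label with the largest total weight.

    predictions[i] is the label chosen by classifier i and weights[i]
    is the (assumed) reliability of that classifier.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight      # each classifier votes with its weight
    return max(totals, key=totals.get)
```

With weights 0.5, 0.3, and 0.3, two weaker classifiers that agree outvote the strongest one; with weights 0.7, 0.3, and 0.3 they do not.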

An elastic matching (EM) technique based on an eigen deformation (ED) for

recognition of handwritten Devanagari characters is proposed in [66]. The method

consists of two phases: a training phase for the estimation of EDs, and a recognition

process using the estimated EDs. EDs are the intrinsic deformations within each

character category and can be estimated by the PCA of actual deformations collected

through the EM.
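
Estimating a dominant deformation direction by PCA can be illustrated as follows. The sketch assumes that each deformation collected through EM has already been flattened into a fixed-length displacement vector, and it extracts only the first principal axis by power iteration. The function name, the input format, and the choice of power iteration are assumptions and are not taken from [66].

```python
import math

def first_eigen_deformation(deformations, iters=100):
    """Mean deformation and first principal axis of a set of vectors."""
    n, d = len(deformations), len(deformations[0])
    mean = [sum(v[j] for v in deformations) / n for j in range(d)]
    centred = [[v[j] - mean[j] for j in range(d)] for v in deformations]
    # covariance matrix of the centred deformation vectors
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)]
           for i in range(d)]
    axis = [1.0 / (j + 1) for j in range(d)]     # arbitrary non-zero start
    for _ in range(iters):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:                            # no variation in the data
            break
        axis = [x / norm for x in nxt]           # converges to the top eigenvector
    return mean, axis
```

Further axes could be obtained by removing the component along the first axis from each vector and repeating the procedure.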

A coarse classification is done in [68] prior to recognition, where the handwritten

Devanagari characters are classified into three major categories, namely end-bar

characters, middle-bar characters, and characters without any bar based on the presence

of vertical bar. The recognition of handwritten characters in [68] is based on the

modified exponential membership function fitted to the fuzzy sets derived from the

features of the characters. A Reuse Policy that provides guidance from the past policies

is also utilized in the paper to improve the speed of the learning process. Not much work

is reported toward handwritten character string (word) recognition of Devanagari.

A segmentation-based approach to handwritten Devanagari word recognition is

proposed by Shaw et al. [41]. On the basis of the header line, a word image is segmented

into pseudo characters. HMMs are proposed to recognize the pseudo characters. The

word-level recognition is done on the basis of string edit distance.
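
Word-level matching by string edit distance can be sketched with the standard Levenshtein algorithm. The sketch assumes that the recognizer outputs a sequence of pseudo-character labels and that a lexicon of label sequences is available; the unit costs and the function names edit_distance and best_word are assumptions rather than details of [41].

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings, lists, tuples)."""
    prev = list(range(len(b) + 1))            # distances from the empty prefix of a
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (item_a != item_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def best_word(recognized, lexicon):
    """Lexicon entry with the smallest edit distance to the recognized sequence."""
    return min(lexicon, key=lambda word: edit_distance(recognized, word))
```

The same routine can be used for dictionary-based post-processing of classifier outputs.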



A continuous density HMM is also proposed by Shaw et al. [42] to recognize

handwritten word images. The states of the HMM are not determined a priori, but are

determined automatically based on a database of handwritten word images. An HMM is

constructed for each word. To classify an unknown word image, its class conditional

probability for each HMM is computed. The class that gives the highest such probability is

finally selected. In [74] a dynamic programming (DP) based technique is proposed for

pin code string recognition. Initially, the pin code string is segmented into primitives.

Table 2.3 Details of handwritten Devanagari numeral recognition systems

Method                   Feature             Classifier                     Data set (size)  Accuracy (%)
Bajaj et al. [44]        Statistical         Neural networks                2,460            89.68
Ramteke et al. [47]      Moment invariants   Gaussian distribution          2,000            92
Lakshmi et al. [76]      Gradient            PCA                            9,800            94.25
Hanmandlu et al. [46]    Box approach        Fuzzy model                    3,500            95
Hanmandlu et al. [54]    Box approach        Bacterial foraging             3,500            96
Elnagar et al. [62]      Structural          Matching SR                    1,500            96
Garain et al. [64]       Binary image        Clonal selection               12,000           96
Basu et al. [90]         QTLR                SVM                            3,000            97
Rajput et al. [57]       Fourier descriptor  SVM                            13,000           97.85
Pal et al. [74]          Gradient            MQDF                           23,340           98.41
Sharma et al. [50]       Chain code          Quadratic                      22,556           98.86
Bhattacharya et al. [2]  Wavelet             MLP                            22,556           99.27
Patil et al. [53]        Structural          General fuzzy neural network   2,000            99.50
Pal et al. [56]          Gradient            MQDF                           22,556           99.56

The details of many handwritten Devanagari numeral, character, and word

recognition systems are summarized in Tables 2.3-2.5, respectively. It is evident from

Table 2.3 that in recognizing Devanagari handwritten numerals, the method proposed by

Pal et al. [56] reports the highest accuracy. Similarly, for recognizing Devanagari

handwritten characters, the method proposed by Pal et al. [35] has the highest accuracy,

as shown in Table 2.4.



Table 2.4 Details of handwritten Devanagari character recognition systems

Method                 Feature                       Classifier        Data set (size)  Accuracy (%)
Sharma et al. [16]     Chain code                    Quadratic         11,270           80.36
Deshpande et al. [63]  Chain code                    RE and MED        5,000            82
Arora et al. [72]      Structural                    FFNN              50,000           89.12
Arora et al. [77]      Combined                      MLP               1,500            89.98
Hanmandlu et al. [68]  Vector distance               Fuzzy sets        4,750            90.65
Arora et al. [61]      Shadow and CH                 MLP and MED       7,154            90.74
Kumar et al. [48]      Gradient                      SVM               25,000           94.10
Pal et al. [51]        Gradient and Gaussian filter  Quadratic         36,172           94.24
Mane et al. [66]       Eigen deformation             Elastic matching  3,600            94.91
Pal et al. [55]        Gradient                      SVM and MQDF      36,172           95.13
Pal et al. [35]        Gradient                      MIL               36,172           95.19

Table 2.5 Details of handwritten Devanagari word recognition systems

Method            Feature     Classifier  Data set (size)  Accuracy (%)
Shaw et al. [42]  Chain code  HMM         39,700           80.20
Shaw et al. [41]  Segments    HMM         39,700           84.31

2.4 Some observations

In the recent past, the Department of Information Technology (DIT), Government of

India, formed a consortium of several institutions and universities in India involved in OCR

activities and provided a considerable amount of funding to this consortium to improve the

quality of research related to Indian language OCR. Creation of benchmark databases for

Devanagari script is essential for successful research.

Efforts have been made in India and the U.S. to create test beds for printed and

handwritten Devanagari character recognition [2], [36], [93]. The database developed by the

Indian Statistical Institute, Kolkata [2] contains 22,556 isolated handwritten Devanagari



numeral samples collected from real-life situations, and is made available free of cost to

researchers of other academic institutions.

Setlur et al. [36] also made some efforts in creating data resources and designing

an evaluation test bed for Devanagari script recognition. Jawahar et al. [93] of the

International Institute of Information Technology, Hyderabad, were successful in

generating a corpus consisting of more than 600,000 document images printed in Indian

scripts. However, there is still no standard database for handwritten composite characters and

words written in Devanagari.

For the segmentation-based recognition of handwritten text, the text first has to be

separated into words, and then the words into individual characters and modifiers. As the

recognition has to be performed on isolated characters, segmentation of words into

characters is a critical step for handwritten text recognition as incorrect segmentation of

words may lead to incorrect recognition. Most of the segmentation errors are due to

the varied writing styles of different individuals. The presence of many touching

characters is another major problem for segmentation. As a result, much research is

needed and expected in this area. Identifying compound and touching characters is also a

challenging task. Some authors are of the opinion that the use of contextual information

may improve the results of segmentation. Characters are joined together using a

shirorekha to frame words in Devanagari script. It has been observed that some people

write words without using a shirorekha on them. Thus, a word may or may not have a

header line in some handwritten text, and such an absence creates problems in

recognition. Skew detection will be difficult in the absence of shirorekha as some of the

existing works related to skew detection are based on the presence of shirorekha on

words. The straightness of the shirorekha is also an issue of concern. A study on the

irregularities in Devanagari handwriting is presented in [70]. These irregularities occur

during the writing process and make the word-level recognition of the text more

complex. Some of them are: abnormal size of a vowel symbol, incomplete and inaccurate

representation of a vowel symbol, merging of vowel symbols with the header line, intrusion of

upper vowel symbols into the middle region, intrusion of lower vowel symbols into the middle

region, improper attachment of lower vowel symbols, writing vowel symbols in

isolation, wrong position of header line, incomplete writing, presence of unwanted or

extra strokes, narrow writing, and over writing.



In [78], some problems in designing OCRs for Indian scripts are discussed. The

authors presented a data collection tool, a segmentation analysis tool, and a feature

selection tool capable of tuning the features used for the classification of a particular font

or script of another set. With the increasing popularity of digital cameras attached to

various handheld electronic devices, some new computational challenges have gained

significance recently [80]. One such problem is the extraction of textual information

from natural scene images captured by such devices. The extracted text information can

be sent to an OCR or to a text-to-speech engine for further processing. Because of the

recent surge in digital library projects globally and large scale intensification of

digitization efforts, it is expected that almost all of man’s knowledge will be available in

digital form on the Web in the coming years [89].

2.5 Concluding remarks

With the advent of computer and information technology, there has been a

dramatic increase in research in the field of Devanagari OCR since 1990. Different

strategies using combination of multiple features, multiple classifiers, and multiple

templates have been considered extensively in the state of the art. Only a few works have

been reported in the areas of unconstrained Devanagari handwriting recognition.

Lexicon-based approaches can be used for recognizing legal amounts on bank cheques

and city names on postal documents. There is great scope for research in these areas for

future researchers working on handwritten Devanagari OCR. Word spotting in

handwritten Devanagari documents is also an interesting area of research as it will be

helpful in indexing as well as searching the document images of handwritten archives.

Holistic approaches can be employed for this purpose. Further research is required to

find ideal combinations of classifiers for recognition. It is still not clear

how a combination strategy can fully utilize the power of the subclassifiers and deal

with the tradeoff between combination and effectiveness. The information about the

classification power of a subclassifier may also help in assigning weights to them. Only a

few papers are published on script identification. Generally researchers assume that a

given document is written in a specific script. In countries like India, where many

languages and scripts exist, the identification of script has to be done prior to the

recognition in applications like postal address readers, where an address can be written in



any Indian script. More research in this direction on handwritten documents is

expected in the near future.

In India, huge volumes of historical documents and books (handwritten or printed

in Devanagari script) remain to be digitized for better access, sharing, indexing, etc. This

will definitely be helpful for other research communities in India in the areas of social

sciences, economics, and linguistics. From the survey, it is noted that the errors in

recognizing printed Devanagari characters are mainly due to incorrect character

segmentation of touching or broken characters. Because of upper and lower modifiers of

Devanagari text, many portions of two consecutive lines may also overlap and proper

segmentation of such overlapped portions is needed to get higher accuracy. Many

authors suggest that the post-processing of classifier outputs by integrating a dictionary

with the OCR system can significantly reduce the misclassifications in printed as well as

handwritten word recognition. Recently, some efforts have been reported toward

building benchmark databases to enhance the quality of OCR-related research in India. It

is also observed that special keyboards are required to key-in Devanagari text as the

number of characters and modifiers in Devanagari script is greater than the number of

characters in Latin script. Since the process of typing is tiring and time consuming,

digitization of documents and their automatic processing would be easier than keying-in

the Devanagari text. The next chapter discusses the problems in recognition of

handwritten Marathi characters and the strategies to overcome them.
