Page 1: Rule based algorithm for handwritten characters recognition

Presentation Organization

1. Introduction
2. Document Analysis and Character Recognition
3. Objective
4. Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
5. Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition
6. Summary, Conclusion and Future Work

Page 2: Rule based algorithm for handwritten characters recognition

Prepared by:

Eng. Randa Ibrahim M. Elanwar, Research Assistant, Electronic Research Institute

Under the supervision of:

Prof. Dr. Mohsen A. A. Rashwan, Professor of Digital Signal Processing, Faculty of Engineering, Cairo University

Prof. Dr. Samia A. A. Mashaly, Head of Computers and Systems Dept., Electronic Research Institute

Page 3: Rule based algorithm for handwritten characters recognition


Page 4: Rule based algorithm for handwritten characters recognition


Page 5: Rule based algorithm for handwritten characters recognition

Introduction

- The Motivation of Document Analysis and Recognition (DAR) & Character Recognition (CR) research fields
- Arabic Character Recognition

Page 6: Rule based algorithm for handwritten characters recognition

Motivation of Document Analysis and Character Recognition

Advantages of having documents in computerized format:

1. Easy editing
2. High-quality hard copies
3. Quick distribution across world-wide networks
4. Keyword or pattern searching

Page 7: Rule based algorithm for handwritten characters recognition

Motivation of Document Analysis and Character Recognition .. (cont’d)

Trillions of old documents, handwritten notes, forms, and drawings are still not in computerized format.

The manual process of entering the data from these documents into computers demands a great deal of time and money.

Page 8: Rule based algorithm for handwritten characters recognition

Motivation of Document Analysis and Character Recognition .. (cont’d)

The general objective of DAR research is to fully automate the process of understanding printed or handwritten data and entering it into the computer.

Optical Character Recognition (OCR) is the sub-field of document analysis concerned with recognizing machine-printed or handwritten characters in a document.

Page 9: Rule based algorithm for handwritten characters recognition

Motivation of Document Analysis and Character Recognition .. (cont’d)

With the advent of Personal Digital Assistants (PDAs), there is a great need for handwriting recognition.

The problem of recognizing writing in scanned images of handwritten documents is referred to as off-line handwriting recognition.

The problem of recognizing writing entered on PDAs is referred to as on-line handwriting recognition.

Page 10: Rule based algorithm for handwritten characters recognition

Arabic Character Recognition

Special characteristics of Arabic script:

- It is always written from right to left.
- An Arabic word consists of one or more portions (sub-words), each containing one or more characters.
- Many characters differ only in the position and number of attached dots.

Page 11: Rule based algorithm for handwritten characters recognition

Arabic Character Recognition .. (cont’d)

Special characteristics of Arabic script:

- Every character has more than one shape, depending on its position in the word (isolated, initial, medial, or final).
- Characters overlap.
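To make the positional-shape point concrete: Unicode encodes the four contextual shapes of a letter such as Beh (ب) as separate presentation-form code points that all fold back to the same base letter. The short Python sketch below uses only the standard `unicodedata` module. It illustrates the script property only and is not part of the recognition system described in these slides.

```python
import unicodedata

# Contextual shapes of the letter Beh (U+0628) in the Arabic Presentation Forms-B block.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def base_letter(form_char):
    """Fold a presentation-form character back to its base letter (NFKC compatibility mapping)."""
    return unicodedata.normalize("NFKC", form_char)


if __name__ == "__main__":
    for position, ch in BEH_FORMS.items():
        print(f"{position:8s} {unicodedata.name(ch)} -> {unicodedata.name(base_letter(ch))}")
```

A recognizer working from images has no such code points to lean on, which is why the four shapes must be learned as distinct patterns.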

Page 12: Rule based algorithm for handwritten characters recognition

Arabic Character Recognition .. (cont’d)

Special characteristics of Arabic script:

- Existence of ligatures.

Because of these special characteristics, Arabic character recognition systems still need more research before they can be established commercially.

Page 13: Rule based algorithm for handwritten characters recognition


Page 14: Rule based algorithm for handwritten characters recognition

Document Analysis and Character Recognition

Off-line Document Analysis & CR
- Preprocessing
- Features

On-line Document Analysis & CR
- Preprocessing
- Features

Segmentation

Learning and Classification

Page 15: Rule based algorithm for handwritten characters recognition

The DACR field is subdivided into:

1. Off-line Document Analysis & CR
Applications: bank check processing, mail sorting, reading of commercial forms, etc.

2. On-line Document Analysis & CR
Applications: pen computing industry, signature verification, author authentication

Page 16: Rule based algorithm for handwritten characters recognition

1. Off-line Document Analysis & CR

Page 17: Rule based algorithm for handwritten characters recognition

1. Off-line Document Analysis & CR .. (cont’d)

1.1 Preprocessing
- Binarization
- Noise removal
- Normalization
- Morphological image processing: opening, closing, erosion, dilation, etc.
- Segmentation: explicit, implicit, segmentation-free
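The binarization and morphological steps listed above can be sketched in plain Python. The slide does not name a thresholding method, so this sketch assumes Otsu's global threshold and a 3×3 square structuring element; images are lists of rows of 0-255 gray levels with dark ink. Opening (erosion followed by dilation) is used here as a simple noise-removal step that deletes ink specks smaller than the structuring element.

```python
def otsu_threshold(gray):
    """Gray level t maximizing the between-class variance of {<= t} vs {> t} (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low = sum_low = 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0 or w_high == 0:
            continue
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, t):
    """Dark pixels (<= t) become ink (1); everything else becomes background (0)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def _morph(binary, require_all):
    """3x3 erosion (require_all=True) or dilation (False); outside the image counts as background."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = int(all(window)) if require_all else int(any(window))
    return out


def erode(binary):
    return _morph(binary, True)


def dilate(binary):
    return _morph(binary, False)


def opening(binary):
    """Erosion then dilation: removes ink specks smaller than the 3x3 element."""
    return dilate(erode(binary))
```

Closing (dilation then erosion) is the mirror operation and would instead fill small gaps inside strokes.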

Page 18: Rule based algorithm for handwritten characters recognition

1. Off-line Document Analysis & CR .. (cont’d)

1.2 Features

- Structural decomposition
(Height contour and chain code features, end points, T-joints and X-joints)
- Series expansion
(Moments, Fourier transform, Gabor transform and wavelets)
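As an illustration of the chain-code features mentioned under structural decomposition, the sketch below converts an ordered, 8-connected contour (a list of (x, y) points, with y growing downward as in image coordinates) into Freeman chain codes and a normalized direction histogram. Contour tracing itself is assumed to have been done already. The direction numbering (0 = east, counter-clockwise) is the usual Freeman convention, not something the slides specify.

```python
# Freeman 8-direction codes: 0 = east, then counter-clockwise; y grows downward.
FREEMAN = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(points):
    """Chain code of an ordered contour whose consecutive points are 8-neighbours."""
    return [FREEMAN[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in zip(points, points[1:])]


def direction_histogram(codes):
    """Fraction of contour steps in each of the 8 directions (a simple global feature)."""
    hist = [0] * 8
    for c in codes:
        hist[c] += 1
    return [h / len(codes) for h in hist] if codes else [0.0] * 8
```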

Page 19: Rule based algorithm for handwritten characters recognition

2. On-line Document Analysis & CR


Page 20: Rule based algorithm for handwritten characters recognition

2. On-line Document Analysis & CR .. (cont’d)

2.1 Preprocessing
- Noise removal (smoothing, filtering, de-hooking, etc.)
- Normalization (slant correction, baseline drift correction, scale normalization, etc.)
- Segmentation (explicit, implicit, segmentation-free)
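Two of the on-line preprocessing steps above, smoothing and scale normalization, can be sketched on a pen trajectory given as a list of (x, y) samples. The window size and the choice to normalize height while preserving aspect ratio are assumptions made for illustration; de-hooking, slant correction, and baseline-drift correction are not shown.

```python
def smooth(points, k=1):
    """Moving average over a window of 2k+1 samples, truncated at the ends of the stroke."""
    out = []
    n = len(points)
    for i in range(n):
        window = points[max(0, i - k):min(n, i + k + 1)]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out


def normalize_scale(points, target_height=1.0):
    """Translate to the origin and scale uniformly so the trajectory height equals target_height."""
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    height = max(p[1] for p in points) - y0
    s = target_height / height if height else 1.0  # flat strokes are only translated
    return [((x - x0) * s, (y - y0) * s) for x, y in points]
```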

Page 21: Rule based algorithm for handwritten characters recognition

2. On-line Document Analysis & CR .. (cont’d)

2.2 Features

Features are typically extracted at a sub-letter level:
- Shape descriptors (ascender, descender, concavity, loop, cusp, curliness, lineness)
- Tangent and curvature features for a window of points
- Writing speed
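The tangent, curvature, and writing-speed features can be computed directly from the sampled pen trajectory. This sketch assumes one timestamp per (x, y) sample, estimates the tangent from the two neighbouring samples (a window of three points), and measures curvature as the signed turning angle between consecutive segments. The window size and exact definitions are illustrative choices, not the ones used in the slides.

```python
import math


def tangent_features(points):
    """Unit tangent (cos, sin) at each interior point, estimated from its two neighbours."""
    feats = []
    for (x0, y0), (x2, y2) in zip(points, points[2:]):
        dx, dy = x2 - x0, y2 - y0
        norm = math.hypot(dx, dy) or 1.0
        feats.append((dx / norm, dy / norm))
    return feats


def curvature_features(points):
    """Signed turning angle (radians, wrapped to [-pi, pi)) at each interior point."""
    feats = []
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        feats.append((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return feats


def writing_speed(points, times):
    """Pen speed between consecutive samples (distance units per time unit)."""
    return [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
            for (x0, y0), (x1, y1), t0, t1 in zip(points, points[1:], times, times[1:])]
```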

Page 22: Rule based algorithm for handwritten characters recognition

Segmentation

- Segmentation based on contour analysis and baseline location
- Segmentation based on vertical histogram
- Stroke segmentation
- Post-segmentation (segmentation by recognition)
- Segmentation by neural network
- Segmentation using dynamic programming (pre-stroke segmentation)

Page 23: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Segmentation based on contour analysis and baseline location

The chain code provides information for finding the baseline location. After the baseline location is defined, segmentation is done at the points where the contour makes a transition from inside to outside the baseline.
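A minimal sketch of baseline-guided segmentation follows. The slide derives the baseline from the chain code; to keep the example short, this sketch swaps in a common alternative and takes the row with the largest horizontal projection as the baseline. It then marks each column whose upper contour lies within `tol` rows of the baseline as "inside", and proposes a cut wherever the contour moves from inside to outside. The helper names and the tolerance are hypothetical; a column with no ink also counts as outside, so gaps between sub-words yield cuts too.

```python
def baseline_row(binary):
    """Row with the largest horizontal ink projection (a stand-in for chain-code analysis)."""
    projection = [sum(row) for row in binary]
    return max(range(len(projection)), key=projection.__getitem__)


def upper_contour(binary):
    """Topmost ink row in each column, or None for an empty column."""
    contour = []
    for column in zip(*binary):
        rows = [y for y, v in enumerate(column) if v]
        contour.append(rows[0] if rows else None)
    return contour


def baseline_cut_points(binary, tol=1):
    """Columns where the upper contour leaves the baseline band (inside -> outside)."""
    base = baseline_row(binary)
    inside = [c is not None and abs(c - base) <= tol for c in upper_contour(binary)]
    return [x for x in range(1, len(inside)) if inside[x - 1] and not inside[x]]
```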

Page 24: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Stroke Segmentation

Page 25: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Segmentation based on vertical histogram

- After computing the vertical histogram of the word or sub-word, it is scanned against a predefined threshold.
- The zones above this threshold are isolated.
- This threshold value depends on the font and is proportional to the thickness of the connecting band of black pixels that joins characters together.
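The vertical-histogram procedure can be sketched as follows, on a binary image given as a list of rows (1 = black). The threshold is passed in by the caller, since it depends on the font and on the thickness of the connecting stroke; the function names are illustrative. Each returned zone is an inclusive (first_column, last_column) pair.

```python
def vertical_histogram(binary):
    """Number of black pixels in each column."""
    return [sum(column) for column in zip(*binary)]


def zones_above_threshold(hist, threshold):
    """Inclusive (start, end) column ranges where the histogram exceeds the threshold."""
    zones, start = [], None
    for x, value in enumerate(hist):
        if value > threshold and start is None:
            start = x
        elif value <= threshold and start is not None:
            zones.append((start, x - 1))
            start = None
    if start is not None:
        zones.append((start, len(hist) - 1))
    return zones
```

Under this reading, the columns between consecutive zones, where only the thin connecting stroke remains, are the candidate segmentation regions.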

Page 26: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Post-Segmentation (Segmentation by Recognition)

- The basic idea is to extract a set of features sequentially, accumulating their values while moving along the word; the accumulated values are then checked against the feature space of a given font.
- This process is repeated until the character is recognized or the end of the word is reached.
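The accumulate-and-check loop can be sketched like this. All of the following are hypothetical choices: the per-column features are the ink count and the number of vertical ink runs, each character class of the font is represented by one accumulated prototype vector, and a match means the accumulated vector falls within Euclidean distance `tol` of a prototype. Columns are walked left to right; for Arabic, which is read right to left, the image would be mirrored first.

```python
import math


def column_features(column):
    """(ink pixel count, number of separate vertical ink runs) for one image column."""
    ink = sum(column)
    runs = sum(1 for i, v in enumerate(column) if v and (i == 0 or not column[i - 1]))
    return (ink, runs)


def segment_by_recognition(binary, prototypes, tol=0.5):
    """Accumulate column features until a prototype matches, then restart.

    Returns (label, first_column, last_column) for every recognized character.
    """
    results, acc, start = [], (0, 0), 0
    for x, column in enumerate(zip(*binary)):
        acc = tuple(a + f for a, f in zip(acc, column_features(column)))
        for label, proto in prototypes.items():
            if math.dist(acc, proto) <= tol:
                results.append((label, start, x))
                acc, start = (0, 0), x + 1
                break
    return results
```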

Page 27: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Segmentation by Neural Network

- Neural networks are trained on manually marked break points.
- For test words, the neural networks then have to determine the locations of break points between characters.

Page 28: Rule based algorithm for handwritten characters recognition

Segmentation .. (cont’d)

Segmentation using Dynamic Programming (Pre-stroke Segmentation)

- Valley points (festoon-like strokes) usually correspond to segmentation points between characters.
- The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string that minimizes a certain cost function.
- The set of cuts and their precise shapes are found simultaneously.
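A minimal dynamic-programming sketch of choosing a globally optimal set of cuts follows. Simplifying assumptions: cuts are straight vertical lines between columns (the method above also optimizes the cut shape, which this sketch does not), the cost of cutting just before column `x` is supplied by the caller (for example the column ink count, which is low at valley points), and every character segment must be between `min_w` and `max_w` columns wide.

```python
def optimal_cuts(cut_cost, min_w, max_w):
    """Globally cheapest set of vertical cuts by dynamic programming.

    cut_cost[x] is the cost of cutting just before column x (cut_cost[0] is unused).
    Every segment between cuts must be min_w..max_w columns wide.
    Returns the sorted cut positions, or None if no valid segmentation exists.
    """
    n = len(cut_cost)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[x]: cheapest way to segment columns 0..x-1
    back = [None] * (n + 1)  # back[x]: start column of the last segment in that solution
    best[0] = 0.0
    for x in range(1, n + 1):
        for width in range(min_w, max_w + 1):
            s = x - width
            if s < 0 or best[s] == inf:
                continue
            cost = best[s] + (cut_cost[s] if s > 0 else 0.0)
            if cost < best[x]:
                best[x], back[x] = cost, s
    if best[n] == inf:
        return None
    cuts, x = [], n
    while x > 0:
        x = back[x]
        if x > 0:
            cuts.append(x)
    return sorted(cuts)
```

Because every cut set is scored as a whole, a slightly costlier cut can be accepted when it makes the remaining cuts cheaper, which a greedy left-to-right search would miss.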

Page 29: Rule based algorithm for handwritten characters recognition

Learning (Training)

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

Page 30: Rule based algorithm for handwritten characters recognition

Learning (Training) .. (cont’d)

Supervised Learning
A teacher provides a category label or cost for each pattern in a training set.

Unsupervised Learning
There is no explicit teacher; the system forms clusters or “natural groupings” of the input patterns.

Page 31: Rule based algorithm for handwritten characters recognition

Learning (Training) .. (cont’d)

Reinforcement Learning

This is analogous to a critic who merely states that something is right or wrong but does not say specifically how it is wrong (thus only binary feedback is given to the classifier).

Page 32: Rule based algorithm for handwritten characters recognition

Classification (Recognition)

Classification Approaches

1. Holistic Approach: segmentation-free, closed vocabulary, global features
2. Analytical Approach: implicit or explicit segmentation, open vocabulary

Page 33: Rule based algorithm for handwritten characters recognition

Classification (Recognition) .. (cont’d)

Classification Tools

1. Template Matching (direct matching, string matching and elastic matching)
2. Statistical Methods (k-nearest neighbour, Bayesian classifier)
3. Stochastic Processes (Markov chain)
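Of the statistical methods listed, the k-nearest-neighbour rule is the simplest to show. This sketch stores labelled feature vectors and classifies a new vector by majority vote among its k closest training vectors under Euclidean distance; k = 3 and the toy data are arbitrary.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Majority label among the k training vectors nearest to `sample` (Euclidean distance).

    training: list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```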

Page 34: Rule based algorithm for handwritten characters recognition

Classification (Recognition) .. (cont’d)

Classification Tools

4. Structural Matching (trees, chains, etc.)
5. Neural Networks
6. Rule-based Methods (abstract description of writing)
7. Multiple Classifiers (classifier ensemble)

Page 35: Rule based algorithm for handwritten characters recognition

On-line and off-line character recognition systems can be categorized as:

1. Recognition of Isolated Characters (ISR).
2. Explicit Segmentation into characters/primitives Before Recognition (SBR).
3. Simultaneous/Sequential Recognition and Segmentation (SSR).
4. Global Whole-Word Recognition (GWR).

Page 36: Rule based algorithm for handwritten characters recognition


Page 37: Rule based algorithm for handwritten characters recognition

Objective

1. Viewing the ACR (Arabic Character Recognition) problem from different sides:
- Isolated and cursive characters
- Off-line and on-line character problems
- Single-writer and multi-writer variability (writer-dependent, WD, and writer-independent, WI)

2. Achieving the best possible character recognition accuracy using the most logical rule-based algorithms

Page 38: Rule based algorithm for handwritten characters recognition


Page 39: Rule based algorithm for handwritten characters recognition

Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition

A. System Stages
1. Database Collection
2. Preprocessing
3. Feature Extraction, Learning & Classification
   3.1) A single feature-based classifier system
   3.2) Hierarchical mixture of feature-based classifiers system

B. Results and Discussion

Page 40: Rule based algorithm for handwritten characters recognition

1. Database Collection:

A database for a single writer, consisting of 30 samples (20 for training and 10 for testing) of each Arabic alphabetic character, was used, i.e., 580 characters for training and 290 for testing (29 character classes).

2. Preprocessing:
- Character image binarization
- Character image thresholding

Page 41: Rule based algorithm for handwritten characters recognition

3. Feature Extraction, Learning and Classification:

Recognition results were based on a comparison between:
1. A single feature-based classifier system
2. A hierarchical mixture of feature-based classifiers system

3.1) A single feature-based classifier system

The feature used for this single-classifier system was mainly the radial distances.
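The slides do not define the radial-distance feature precisely. One common reading, assumed here, measures the distance from the character's ink centroid to its outermost ink pixel along a fixed number of equally spaced rays, normalized by the largest distance so the feature is scale-invariant. The ray-marching step size and the number of angles are hypothetical parameters.

```python
import math


def radial_distances(binary, n_angles=8, step=0.5):
    """Normalized distance from the ink centroid to the outermost ink pixel along each ray."""
    ink = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not ink:
        return [0.0] * n_angles
    cx = sum(x for x, _ in ink) / len(ink)
    cy = sum(y for _, y in ink) / len(ink)
    h, w = len(binary), len(binary[0])
    dists = []
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles      # angle 0 points right, then counter-clockwise
        r, farthest = 0.0, 0.0
        while True:
            x = round(cx + r * math.cos(theta))
            y = round(cy - r * math.sin(theta))  # minus: image rows grow downward
            if not (0 <= x < w and 0 <= y < h):
                break
            if binary[y][x]:
                farthest = r
            r += step
        dists.append(farthest)
    peak = max(dists) or 1.0
    return [d / peak for d in dists]
```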

Page 42: Rule based algorithm for handwritten characters recognition

3.1) A single feature-based classifier system:

In the training stage, we compute a representative pattern for each class.
- Each character was considered a separate class.
- Classification used the Euclidean distance measure.
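A minimal sketch of this training and classification scheme, assuming the representative pattern of each class is the mean of its training feature vectors (the slide does not say how the representative is computed) and that all feature vectors have the same length.

```python
import math


def train_representatives(samples):
    """samples: {label: [feature_vector, ...]} -> {label: mean feature vector}."""
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in samples.items()}


def classify(vector, reps):
    """Label of the representative pattern nearest to `vector` (Euclidean distance)."""
    return min(reps, key=lambda label: math.dist(vector, reps[label]))
```

Since the input is compared with every class, any two classes with similar mean vectors can be confused, whatever their dot count or other structure.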

Page 43: Rule based algorithm for handwritten characters recognition

3.1) A single feature-based classifier system: .. (cont’d)

The average system accuracy = 70.06%

Most of the confusions make little sense. This is because:
- The input pattern is compared to all classes.
- One feature is not representative enough.

Therefore, we need a better way of categorization, and we need to acquire more features.

Page 44: Rule based algorithm for handwritten characters recognition

Character images are composed of 1, 2, 3 or 4 objects.

Example:

Each image has a main object (the character body) and secondaries. To determine the number of attached dots, we need to discriminate between:
1. Single dot
2. Two stuck dots
3. Hamza
4. Separated Alef
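Counting the objects in a character image is a connected-component labelling problem. This sketch assumes 8-connectivity and treats the largest component as the character body and the remaining ones as secondaries. Telling a single dot from two stuck dots, a Hamza, or a separated Alef would need further features and is not attempted here.

```python
from collections import deque


def connected_components(binary):
    """8-connected components of ink pixels, each returned as a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def split_body_and_secondaries(binary):
    """Largest component = character body; all others = secondaries (dots, Hamza, ...)."""
    components = sorted(connected_components(binary), key=len, reverse=True)
    return (components[0] if components else []), components[1:]
```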

Page 45: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system

The recognition stage in our proposed system went through 4 stages:

Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots
Stage 2: Adding more structural features for gating between different feature-based classifiers
Stage 3: Adding more features and using feature fusion
Stage 4: Increasing the reliability of gating

Page 46: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots

- Characters are clustered into groups according to the number of dots attached to them; these groups act as gating between redundant classifiers.
- The same feature is used for recognition in each cluster, i.e., we now have a classifier ensemble of individual classifiers (obtained by varying the training data).
- Classification uses the Euclidean distance measure.
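The gating idea can be sketched by grouping classes by dot count and letting only the classes in the matching group compete. Assumptions: the dot count of the input is already known from an earlier secondaries-analysis step, each class is represented by the mean of its training vectors, and the toy features are illustrative only (the class names follow the real dot counts of Beh, Noon, Teh, and Theh).

```python
import math
from collections import defaultdict


def train_gated(samples):
    """samples: iterable of (n_dots, label, feature_vector).

    Returns {n_dots: {label: mean feature vector}}: one small expert per dot group.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for n_dots, label, vector in samples:
        grouped[n_dots][label].append(vector)
    return {
        n_dots: {label: [sum(col) / len(vecs) for col in zip(*vecs)] for label, vecs in labels.items()}
        for n_dots, labels in grouped.items()
    }


def classify_gated(n_dots, vector, model):
    """Gate on the dot count, then pick the nearest class mean (Euclidean) inside that group."""
    experts = model[n_dots]  # raises KeyError for a dot count never seen in training
    return min(experts, key=lambda label: math.dist(vector, experts[label]))
```

Gating shrinks each expert's candidate set, so a vector that resembles a two-dot class can no longer be confused with it when the input has three dots.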

Page 47: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by dots

- The average system accuracy = 78.33%

Page 48: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 2: Adding more structural features for gating between different feature-based classifiers

- Characters are clustered into groups according to the number of dots attached to them and the existence of loops and Hamzas (8 different classifiers).
- The same feature is used for recognition in each cluster.
- Classification uses the Euclidean distance measure.
- The average system accuracy has risen to 80.86%.
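Loop existence, one of the structural gating features above, can be detected by counting background regions that are completely enclosed by ink. This sketch flood-fills the background with 4-connectivity (the usual pairing with 8-connected ink) and counts the regions that never reach the image border. Hamza detection is not covered.

```python
def count_loops(binary):
    """Number of background regions completely enclosed by ink (4-connected background)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]

    def reaches_border(sy, sx):
        """Flood-fill one background region; report whether it touches the image border."""
        stack, touches = [(sy, sx)], False
        seen[sy][sx] = True
        while stack:
            y, x = stack.pop()
            if y in (0, h - 1) or x in (0, w - 1):
                touches = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and not binary[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    stack.append((ny, nx))
        return touches

    loops = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] and not seen[y][x] and not reaches_border(y, x):
                loops += 1
    return loops
```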

Page 49: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 2: Adding more structural features for gating between different feature-based classifiers

New structural features are added:
- Number and position of the character stroke end points
- Number of cuts made in the character body by vertical and horizontal lines
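Both new structural features can be sketched on binary images. End points are counted on a one-pixel-wide skeleton, assumed to come from a thinning step not shown here, as ink pixels with exactly one 8-neighbour. A line cut is counted as one separate run of ink pixels along a chosen row or column; which rows and columns to probe is a hypothetical parameter.

```python
def end_points(skeleton):
    """(row, col) of skeleton pixels that have exactly one 8-connected ink neighbour."""
    h, w = len(skeleton), len(skeleton[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not skeleton[y][x]:
                continue
            neighbours = sum(skeleton[y + dy][x + dx]
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w)
            if neighbours == 1:
                points.append((y, x))
    return points


def runs(line):
    """Number of separate ink runs along one line of pixels (= cuts by that line)."""
    return sum(1 for i, v in enumerate(line) if v and (i == 0 or not line[i - 1]))


def line_cuts(binary, rows, cols):
    """Cut counts for the selected horizontal lines (rows) and vertical lines (cols)."""
    horizontal = [runs(binary[r]) for r in rows]
    vertical = [runs([row[c] for row in binary]) for c in cols]
    return horizontal, vertical
```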

Page 50: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 2: Adding more structural features for gating between different feature-based classifiers

- The average system accuracy = 92.25%

Page 51: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 2: Adding more structural features for gating between different feature-based classifiers

Page 52: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 3: Adding more features and using feature fusion


• A new feature-based classifier that uses the 45° inclined-line cuts feature is added.

• We used a fusion technique, the weighted average, to combine the different features.

• The average system accuracy has risen to 96%.
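Weighted-average fusion can be sketched as follows, assuming each feature-based classifier outputs a score per class and that each classifier has a fixed reliability weight (how the weights were chosen is not stated on the slides, so they are an assumption here). The function names are hypothetical.

```python
from typing import Dict, Sequence


def weighted_average_fusion(
    scores: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Fuse per-class scores from several feature-based classifiers by a weighted average.

    scores[k] maps a class label to the score given by classifier k (higher = more likely);
    weights[k] is the reliability weight assigned to classifier k (assumed, e.g. tuned on
    validation data). Classes missing from a classifier's output count as score 0.
    """
    if len(scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per classifier and a positive weight sum")
    total = sum(weights)
    labels = set().union(*scores)
    return {
        label: sum(w * s.get(label, 0.0) for s, w in zip(scores, weights)) / total
        for label in labels
    }


def decide(fused: Dict[str, float]) -> str:
    """Return the class with the highest fused score."""
    return max(fused, key=fused.get)
```

With three classifiers scoring {beh: 0.6, teh: 0.4}, {beh: 0.2, teh: 0.8} and {beh: 0.5, teh: 0.5} under weights 0.5, 0.3 and 0.2, the fused scores are 0.46 for beh and 0.54 for teh, so `decide` returns teh even though the most heavily weighted classifier preferred beh.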

Page 53: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 3: Adding more features and using feature fusion


Page 54: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 4: Increasing the reliability of gating

We raised the identification accuracy of secondaries to 99.7% using some structural features:

• Character body to secondary ratio,
• Secondary black-to-white pixel ratio, and
• Secondary height-to-width ratio.

We removed the class overlap in the feature space.

The average system accuracy has risen to 97%.
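The three secondary features can be computed from the ink pixels of the character body and of one secondary component. This is a minimal sketch, assuming the body-to-secondary ratio compares ink-pixel counts (the slide does not say which size measure was used) and that the black-to-white ratio is taken inside the secondary's bounding box; `secondary_features` is a hypothetical name.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col) of an ink pixel


def bounding_box(pixels: List[Pixel]) -> Tuple[int, int]:
    """Height and width, in pixels, of the box enclosing a connected component."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return max(rows) - min(rows) + 1, max(cols) - min(cols) + 1


def secondary_features(body: List[Pixel], secondary: List[Pixel]) -> Tuple[float, float, float]:
    """(body-to-secondary ratio, black-to-white pixel ratio, height-to-width ratio).

    Assumption: the body-to-secondary ratio compares ink-pixel counts; the black-to-white
    ratio is measured inside the secondary's bounding box.
    """
    h, w = bounding_box(secondary)
    black = len(set(secondary))
    white = h * w - black
    body_ratio = len(set(body)) / black
    black_white = black / white if white else float("inf")  # fully inked box
    return body_ratio, black_white, h / w
```

A 3-pixel dot inside a 2 x 2 box next to a 30-pixel body gives (10.0, 3.0, 1.0): small, dense and roughly square, which is what separates dots from other secondaries such as a hamza.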

Page 55: Rule based algorithm for handwritten characters recognition

3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)

Stage 4: Increasing the reliability of gating


Page 56: Rule based algorithm for handwritten characters recognition

Results and Discussion

Following these system stages, we ended up with:

1. An average recognition accuracy of 97%.
2. A total increase in recognition accuracy of about 27 percentage points over the accuracy achieved by a single-classifier system.
3. High accuracy achieved with only the most common features, by proposing a multiple-classifier system (classifier ensemble) together with a classification hierarchy based on the structural features of Arabic characters.

Page 57: Rule based algorithm for handwritten characters recognition

Results and Discussion

Our system is very simple, and its results are comparable to those obtained by other researchers:

Page 58: Rule based algorithm for handwritten characters recognition

Results and Discussion


[Figure: Average accuracy per system stage. Bars: Single Classifier, Stage 1, Stage 2, Stage 3, Stage 4; values shown: 70.06, 78.33, 92.25, 96, 97]

Page 59: Rule based algorithm for handwritten characters recognition


Page 60: Rule based algorithm for handwritten characters recognition

Rule-based Algorithm for On-line Cursive Handwriting Segmentation & Recognition

Classically [11], on-line recognizers consist of:

1. A preprocessor,
2. A classifier, which provides estimates of probabilities for the different categories of characters, and
3. A postprocessor, which may incorporate a language model.

We propose a rule-based algorithm for the first two stages of an on-line recognizer for cursive Arabic handwriting.

Page 61: Rule based algorithm for handwritten characters recognition

A. System Stages

1. Database Collection
2. Preprocessing
3. Pattern Shapes Definition
4. Feature Extraction
5. Training
6. Recognition

B. Results and Discussion

Page 62: Rule based algorithm for handwritten characters recognition

1. Database Collection

• Handwritten documents were collected on a slate tablet PC.

• The collected database is unconstrained (open vocabulary).

• No digits are included.

• Writing is in the NASKH font only.

Page 63: Rule based algorithm for handwritten characters recognition

2. Preprocessing

• Filter the document and clear it of unintended writer errors.

• Break the document down into text lines and words or sub-words.

• Detect the type of each stroke (either main body or secondary).

Page 64: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Filter the document and clear it of unintended writer errors.

Page 65: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Two problems face the use of x-y axis projection histograms:

1. Baseline skew, which makes line separation difficult and requires a careful skew detection and correction stage.
2. Multi-word overlap, where the inter-word distance is smaller than the normal threshold expected for separating words.
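To make the first problem concrete, here is a minimal sketch of line separation with a y-axis projection histogram on on-line data. It assumes each stroke is a list of integer (x, y) pen points with y growing downward, and it projects each stroke's whole vertical extent; the function names are hypothetical. Any skew that makes neighbouring lines share rows removes the empty gap this method relies on.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pen coordinate, y grows downward
Stroke = List[Point]


def y_projection(strokes: List[Stroke], height: int) -> List[int]:
    """Histogram: how many strokes cover each y value (using each stroke's vertical extent)."""
    hist = [0] * height
    for stroke in strokes:
        ys = [y for _, y in stroke]
        for y in range(min(ys), max(ys) + 1):
            hist[y] += 1
    return hist


def split_lines(hist: List[int]) -> List[Tuple[int, int]]:
    """(start, end) y-ranges of non-empty runs; each run is one candidate text line."""
    lines, start = [], None
    for y, count in enumerate(hist):
        if count and start is None:
            start = y
        elif not count and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(hist) - 1))
    return lines
```

Note that a dot written a few rows above its word, with an empty row in between, would also come out as its own "line" under this scheme.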

Page 66: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

E. Ratzlaff used a “bottom-up” clustering of discrete strokes into increasingly larger groups that eventually merge into complete text lines.

The initial bottom-up clustering began by creating Forward Projection (FP) groups.

Strokes were merged into FP groups if they had strongly overlapping Y-axis projections. A single unmerged stroke became an independent FP group.
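The initial FP grouping step can be sketched as below. The slides do not quantify "strongly overlapping", so the 0.5 overlap threshold (shared range divided by the shorter extent) and the greedy, writing-order merge are assumptions; the names are hypothetical and this is not Ratzlaff's actual implementation.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (y_min, y_max) of a stroke or a group, inclusive


def overlap_ratio(a: Extent, b: Extent) -> float:
    """Shared y-range length divided by the length of the shorter extent."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shorter = min(a[1] - a[0], b[1] - b[0]) + 1
    return max(shared, 0) / shorter


def forward_projection_groups(extents: List[Extent], threshold: float = 0.5) -> List[List[int]]:
    """Greedily merge strokes (given by index, in writing order) whose y-extents strongly overlap."""
    groups: List[List[int]] = []
    group_extents: List[Extent] = []
    for i, ext in enumerate(extents):
        for g, g_ext in enumerate(group_extents):
            if overlap_ratio(ext, g_ext) >= threshold:
                groups[g].append(i)
                group_extents[g] = (min(g_ext[0], ext[0]), max(g_ext[1], ext[1]))
                break
        else:
            groups.append([i])  # an unmerged stroke becomes an independent FP group
            group_extents.append(ext)
    return groups
```

With extents [(10, 30), (12, 28), (0, 4), (50, 70)], the first two strokes merge, while the small stroke at rows 0 to 4 (for example a dot above the body) ends up as its own FP group.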

Page 67: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Drawbacks:

1. Secondaries usually have Y-axis projections that do not overlap those of the main strokes.
2. Large baseline skews along the text line and even within one word.

Page 68: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Another idea for text line separation was proposed by Gareth Loudon et al. It worked successfully with English script because of its limited cursive nature, i.e., a stroke (pen-down/up movement) usually represents a single character.

Several parameters were calculated for each stroke during the character segmentation step.

Page 69: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Example:

• if (si > max(xi)) or (-si > 2*max(xi) & yi > max(xi)),
  then stroke i was a character at the end of a word,
• else if (ci > 0)
  stroke i was a character within a word,
• else
  stroke i must be merged with the next stroke to form a character.

Page 70: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Drawbacks:

1. An Arabic stroke usually represents more than one character, which makes it impossible to use the Arabic stroke geometry (height, width, etc.) as an estimate of character geometry.
2. Delayed strokes in English are usually written immediately after the main stroke, which is not the case for Arabic strokes.
3. The variety of stroke sizes and stroke sequences among writers makes the problem more difficult.

Page 71: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Our new technique uses the same bottom-up clustering concept and exploits the spatiotemporal relations between strokes to build the smallest possible FP groups.

The FP groups contain the main and secondary strokes of the same word regardless of the sequence in which they were written.

Page 72: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

By examining successive written Arabic strokes, we found them spatially related to each other by one of the following relations:

1. Touching
∴ The two strokes should belong to the same word group.
2. Not touching but overlapping on the x-axis
∴ The two strokes should belong to the same word group.

Page 73: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

3. Neither touching nor overlapping on the x-axis
If the inter-stroke distance is less than the average stroke width
∴ The two strokes should belong to the same word group.
Else
∴ The two strokes should belong to two different word groups.
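The three spatial relations can be turned into a small grouping routine. This is a minimal sketch under stated assumptions: "touching" means some pair of points lies within a tolerance of 1 unit, the inter-stroke distance is the horizontal gap between the strokes' x-extents, and the average stroke width is the mean x-extent over all strokes. It also compares every pair of strokes (not only successive ones) and merges them with union-find, so delayed strokes can join their word whatever the writing order; all names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def x_extent(s: Stroke) -> Tuple[float, float]:
    xs = [x for x, _ in s]
    return min(xs), max(xs)


def touching(a: Stroke, b: Stroke, tol: float = 1.0) -> bool:
    """Assumed definition: some point of a lies within tol of some point of b."""
    return any(math.dist(p, q) <= tol for p in a for q in b)


def same_word(a: Stroke, b: Stroke, avg_width: float) -> bool:
    """Apply the three spatial relations between two strokes."""
    if touching(a, b):                      # relation 1: touching
        return True
    (a0, a1), (b0, b1) = x_extent(a), x_extent(b)
    if a0 <= b1 and b0 <= a1:               # relation 2: overlapping on the x-axis
        return True
    gap = max(a0, b0) - min(a1, b1)         # relation 3: horizontal gap between extents
    return gap < avg_width


def word_groups(strokes: List[Stroke]) -> List[List[int]]:
    """Union all stroke pairs that satisfy same_word; return groups of stroke indices."""
    widths = [x_extent(s)[1] - x_extent(s)[0] for s in strokes]
    avg_width = sum(widths) / len(widths)
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if same_word(strokes[i], strokes[j], avg_width):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For four strokes where the second starts 2 units after the first, a dot overlaps the second on the x-axis, and a fourth stroke starts 20 units away, the first three form one word group and the fourth forms another.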

Page 74: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Example:

* Strokes 1 & 2: neither touching nor overlapping, but belong to the same word.
* Strokes 2 & 5: neither touching nor overlapping, and belong to 2 different words.
* Strokes 1 & 3: overlapping, and belong to the same word.
* Strokes 7 & 8: touching, and belong to the same word.

Page 75: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

Page 76: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Break down the document into text lines and words or sub-words.

We overcame these problems:

1. Secondaries with non-overlapping Y-axis projections, which were usually separated as an independent text line.
2. Baseline skew.
3. Delayed strokes, which are now included in the same word regardless of the sequence in which they were written.

Page 77: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Detect the type of each stroke (either main or secondary).

Many characters have the same main body and differ only by their dots. By erasing these dots, we can reduce the number of patterns.

If the FP group contains 1 stroke, then it should be main-type.

If the FP group contains 2 or more strokes, then the first one should be main-type. The following strokes may be secondary or main depending on their height, shape and location.
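The main/secondary rule can be sketched with stroke bounding boxes. Only the structure (a lone or first stroke is main-type) comes from the slide; the 0.4 height ratio, the "centre inside the main stroke's x-span" location test, and the omission of any shape test are hypothetical choices for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def stroke_types(boxes: List[Box], height_ratio: float = 0.4) -> List[str]:
    """Label each stroke of one FP group (in writing order) as 'main' or 'secondary'.

    Hypothetical rule: a later stroke is secondary if it is much shorter than the
    first (main) stroke and its centre lies within the main stroke's horizontal span.
    The shape criterion mentioned on the slide is not modelled here.
    """
    if not boxes:
        return []
    labels = ["main"]  # a lone stroke, or the first stroke of the group, is main-type
    mx0, my0, mx1, my1 = boxes[0]
    main_height = my1 - my0
    for x0, y0, x1, y1 in boxes[1:]:
        short = (y1 - y0) < height_ratio * main_height
        inside = mx0 <= (x0 + x1) / 2 <= mx1
        labels.append("secondary" if short and inside else "main")
    return labels
```

A small mark drawn over the first stroke is labelled secondary, while a later full-height stroke beside it stays main-type.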

Page 78: Rule based algorithm for handwritten characters recognition

2. Preprocessing .. (cont’d)

Detect the type of each stroke (either main or secondary).

Page 79: Rule based algorithm for handwritten characters recognition

3. Pattern Shape Definition

Pattern shapes are defined by observing the collected handwriting. We have more than one shape for each handwritten character in all its known positions (Start, Middle, End, and Isolated).

Page 80: Rule based algorithm for handwritten characters recognition

4. Feature Extraction

Depending on the directions, lengths, and pen-up/down movements of sub-strokes, 25 sub-strokes of eight directions are defined: eight long strokes (A–H), eight short strokes (a–h), eight pen-up movements (1–8) and one pen-up movement (9).
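A direction-coding step in this spirit can be sketched as follows. It assumes the pen points have already been resampled into sub-strokes, that direction index 0 means east with indices increasing counter-clockwise in 45° steps (y pointing up), and that a sub-stroke counts as "long" at 10 units or more; the letter-to-direction mapping and the threshold are hypothetical. The slide does not say what distinguishes pen-up movement 9, so the sketch never emits it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) with y pointing up
Stroke = List[Point]

LONG, SHORT, PEN_UP = "ABCDEFGH", "abcdefgh", "12345678"


def direction(p: Point, q: Point) -> int:
    """Quantise the p->q angle into one of 8 sectors of 45 degrees (0 = east, counter-clockwise)."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8


def encode(strokes: List[Stroke], long_len: float = 10.0) -> str:
    """Chain-code a pen-down stroke sequence, inserting pen-up movements between strokes."""
    codes = []
    for i, stroke in enumerate(strokes):
        if i > 0:  # pen-up move from the previous stroke's last point to this stroke's first
            codes.append(PEN_UP[direction(strokes[i - 1][-1], stroke[0])])
        for p, q in zip(stroke, stroke[1:]):
            k = direction(p, q)
            codes.append(LONG[k] if math.dist(p, q) >= long_len else SHORT[k])
    return "".join(codes)
```

For example, a long move east, a short move north, a pen-up jump east and a long move north encode as "Ac1C".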

Page 81: Rule based algorithm for handwritten characters recognition

4. Feature Extraction .. (cont’d)

Example:

Page 82: Rule based algorithm for handwritten characters recognition

5. Training

The details of this stage depend greatly on the methodology used in the recognition stage.

Approach 1: Segmentation-based systems (Analytical).

Approach 2: Segmentation-free systems (Holistic).

We followed the first approach, but performed segmentation-by-recognition rather than explicit segmentation-before-recognition.

Page 83: Rule based algorithm for handwritten characters recognition

5. Training .. (cont’d)

S. El-Dabi [3, 9] extracted a set of features sequentially, accumulating their values while moving along the word image (column by column); the accumulated values were checked against the feature space of a given font until a character was recognized or the end of the word was reached.

We need to build a registry comprising all skeleton patterns (feature space) of all pattern shapes.

• We made transcription files of the training data to describe the content of each training file. These files serve as a manual segmentation of the word strokes.

Page 84: Rule based algorithm for handwritten characters recognition

5. Training .. (cont’d)

Example:

Page 85: Rule based algorithm for handwritten characters recognition

5. Training .. (cont’d)

For each transcription file, pattern shape data are read and the direction features are extracted.

All the feature vectors belonging to the same pattern shape are clustered.

The most representative patterns (feature vectors) are stored to construct a registry for the recognition stage.
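The slides do not name the clustering method, so the sketch below substitutes a simple medoid-style selection: for each pattern shape, it keeps the direction-code strings with the smallest total edit distance to all samples of that shape. The edit-distance costs (insertion = deletion = 1, substitution = 2) follow the worked example given later in the slides; `build_registry` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 2) -> int:
    """Minimum edit distance between two direction-code strings."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * dele] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            cur[j] = min(prev[j] + dele, cur[j - 1] + ins, prev[j - 1] + cost)
        prev = cur
    return prev[-1]


def build_registry(samples: List[Tuple[str, str]], per_shape: int = 1) -> Dict[str, List[str]]:
    """samples: (pattern-shape label, direction-code string) pairs read from transcription files.

    For each shape, keep the per_shape distinct strings with the smallest total edit distance
    to all samples of that shape (a medoid-style stand-in for the clustering step).
    """
    by_shape: Dict[str, List[str]] = defaultdict(list)
    for label, code in samples:
        by_shape[label].append(code)
    registry = {}
    for label, codes in by_shape.items():
        candidates = sorted(set(codes))  # sorted first so ties are broken deterministically
        candidates.sort(key=lambda c: sum(edit_distance(c, o) for o in codes))
        registry[label] = candidates[:per_shape]
    return registry
```

Keeping more than one representative per shape (`per_shape > 1`) is one way to cover the several handwritten variants each character can have.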

Page 86: Rule based algorithm for handwritten characters recognition

6. Recognition

In this stage, the main task was to find cuts that divide connected components into their individual characters.

The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string (feature vector) that minimizes a certain cost function. The set of cuts and their precise shapes are found simultaneously.

The feature vector of the test stroke was compared against the registry (one direction after the other) until either a character was recognized (i.e., a segmentation point was decided) or the end of the feature vector was reached.
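The dynamic-programming idea of finding a globally optimal set of cuts can be sketched as follows. This is a simplified illustration, not the system's exact procedure: each piece between two cuts is scored by its plain minimum edit distance to the closest registry pattern, and `best[i]` holds the cheapest way to segment the first `i` codes. The registry contents and function names are hypothetical, and the registry must be non-empty.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 2) -> int:
    """Minimum edit distance between two direction-code strings."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * dele] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            cur[j] = min(prev[j] + dele, cur[j - 1] + ins, prev[j - 1] + cost)
        prev = cur
    return prev[-1]


def best_segmentation(vector: str, registry: Dict[str, List[str]]) -> Tuple[int, List[Tuple[str, str]]]:
    """Cut vector into consecutive pieces, each labelled with a registry shape,
    minimising the summed edit distance over all possible sets of cuts."""
    n = len(vector)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal cost of segmenting vector[:i]
    back = [None] * (n + 1)  # back[i]: (start of last piece, its label) achieving best[i]
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            piece = vector[start:end]
            for label, patterns in registry.items():
                cost = best[start] + min(edit_distance(piece, p) for p in patterns)
                if cost < best[end]:
                    best[end], back[end] = cost, (start, label)
    pieces, i = [], n
    while i > 0:  # follow back-pointers from the end to recover the cuts
        start, label = back[i]
        pieces.append((label, vector[start:i]))
        i = start
    return best[n], pieces[::-1]
```

With a registry {"X": ["AAc"], "Y": ["Bd"]}, the string "AAcBd" is cut after its third code at zero cost, giving [("X", "AAc"), ("Y", "Bd")].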

Page 87: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

This comparison was performed using a dynamic programming technique called "Minimum Edit Distance".

Page 88: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

Example: assuming Insertion cost = Deletion cost = 1 and Substitution cost = 2
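The minimum edit distance with these costs can be computed with the standard dynamic-programming table, sketched below (the function name is hypothetical). With a substitution cost of 2, a substitution costs the same as a deletion plus an insertion, so the distance equals the combined length of the two strings minus twice their longest common subsequence.

```python
def min_edit_distance(source: str, target: str, ins: int = 1, dele: int = 1, sub: int = 2) -> int:
    """Minimum edit distance with configurable insertion, deletion and substitution costs."""
    # d[i][j] = cost of turning source[:i] into target[:j]
    d = [[0] * (len(target) + 1) for _ in range(len(source) + 1)]
    for i in range(1, len(source) + 1):
        d[i][0] = i * dele
    for j in range(1, len(target) + 1):
        d[0][j] = j * ins
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + dele,                      # delete source[i-1]
                d[i][j - 1] + ins,                       # insert target[j-1]
                d[i - 1][j - 1] + (0 if same else sub),  # match or substitute
            )
    return d[-1][-1]
```

For instance, the classic textbook pair "intention" and "execution" has a distance of 8 under these costs, and the direction-code strings "AAcB" and "AcBB" have a distance of 2.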


Page 89: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

Group1 = ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'], Group2 = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h'] & Group3 = ['1' '2' '3' '4' '5' '6' '7' '8'];

The penalties are decided as follows:

Page 90: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

Insertion Cost = Substitution Cost/2 & Deletion Cost = Substitution Cost/2

The factors '4' and '16' come from the assumption that short strokes (represented by Group 2 directions) are almost half the length of long strokes (represented by Group 1 directions).

Other value sets for these factors were tried: {1.5², (1.5²)²}, {2.5², (2.5²)²}, {3², (3²)²}, {3.5², (3.5²)²} and {4², (4²)²}. We chose the {2², (2²)²} value set because it represents the smallest integer values, so the total distances do not get too large.

Page 91: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

The minimumThe minimum--editedit--distance technique is a good mathematical distance technique is a good mathematical

measure but cannot be used solely with the chain code measure but cannot be used solely with the chain code

feature. feature.


We need either some off-line features or, at least, template-matching information.

We used string matching to find the number of matches between the representative patterns from the registry and the test vector.

The final cost function is given by the following equation:

$$\text{Distance} = \text{Minimum edit distance} \times \frac{\text{Length of representative pattern}}{\text{Number of matches}}$$
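The final cost function can be sketched in Python as follows. The slides do not say how "matches" are counted; as an assumption, this sketch counts them as the total size of the matching blocks that Python's standard `difflib.SequenceMatcher` finds between the representative pattern and the test vector. The minimum edit distance is passed in as a number, and `final_distance` is a hypothetical name. A pattern with no matches gets an infinite distance so it can never rank first.

```python
from difflib import SequenceMatcher


def final_distance(min_edit_distance: float, pattern: str, test_vector: str) -> float:
    """Distance = minimum edit distance x (pattern length / number of matches).

    ASSUMPTION: "number of matches" is taken as the total size of the
    matching blocks found by difflib.SequenceMatcher between the registry
    pattern and the test vector.
    """
    matcher = SequenceMatcher(None, pattern, test_vector, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    if matches == 0:
        return float("inf")  # nothing in common: never ranked first
    return min_edit_distance * len(pattern) / matches


print(final_distance(2.0, "0012", "0012"))  # 2.0: all 4 codes match, factor is 1
print(final_distance(1.0, "0011", "2233"))  # inf: no matching codes
```

Dividing the pattern length by the number of matches leaves the edit distance unchanged when every pattern code is matched and inflates it as the share of matched codes drops.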

Page 92: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

The probable pattern shapes of the first character in the stroke were stored as roots of individual trees.

Each tree was completed by comparing the unidentified region of the feature vector to the registry again and again to find the probable pattern shapes of the second, third and fourth characters, until the whole stroke was recognized.

After tree construction, we obtained a ranked list in which each member comprised the characters (without dots) representing the stroke, ranked by their total edit distance 'Distance'.
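The tree construction can be sketched as a depth-first search in which each level matches registry patterns against the next unidentified part of the feature vector. This is a hedged Python sketch under assumptions the slides do not state: the registry is a hypothetical dict mapping a dotless character label to a list of representative chain-code strings; a pattern is compared with the prefix of the remaining vector that has the same length; only the `beam` best shapes are expanded at each level; depth is capped at `max_chars`; and a plain unit-cost Levenshtein distance stands in for the weighted edit distance. Every complete root-to-leaf path becomes one member of the ranked list.

```python
from typing import Dict, List, Tuple

Registry = Dict[str, List[str]]  # hypothetical: label -> representative chain codes


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (stand-in for the weighted one)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_ranked_list(vector: str, registry: Registry,
                      max_chars: int = 4, beam: int = 3) -> List[Tuple[int, List[str]]]:
    results: List[Tuple[int, List[str]]] = []

    def expand(rest: str, chars: List[str], total: int) -> None:
        if not rest:                    # whole stroke explained: one list member
            results.append((total, chars))
            return
        if len(chars) == max_chars:     # depth limit reached with codes left over
            return
        candidates = []
        for label, patterns in registry.items():
            for pattern in patterns:
                segment = rest[:len(pattern)]  # ASSUMPTION: segment length = pattern length
                candidates.append((edit_distance(segment, pattern), label, len(pattern)))
        candidates.sort()
        for dist, label, length in candidates[:beam]:  # most probable shapes only
            expand(rest[length:], chars + [label], total + dist)

    expand(vector, [], 0)  # roots: probable shapes of the first character
    results.sort()         # rank by total distance
    return results


if __name__ == "__main__":
    registry = {"A": ["0011"], "B": ["22"]}
    print(build_ranked_list("001122", registry)[0])  # (0, ['A', 'B'])
```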

Page 93: Rule based algorithm for handwritten characters recognition


Page 94: Rule based algorithm for handwritten characters recognition

[Figure panels: File 1, File 2]

Page 95: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

The last step in this stage was dot restoration. Two trials were made to assign the dots to the characters representing the stroke.

Trial 1: The centroids of the dots were calculated, as well as the centroid of each character in the stroke, and each dot was assigned to the character with the nearest centroid.

Despite the large reduction in list size and the promotion of correct results to the top of the list, dot-position drifts caused wrong dot assignments and therefore the loss of many correct choices as well.
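Trial 1 can be sketched in a few lines of Python. This is an illustrative sketch, assuming each dot and each segmented character is given as a list of (x, y) pen points; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean (x, y) of a set of pen points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def assign_dots_by_nearest_centroid(dots: Sequence[Sequence[Point]],
                                    characters: Sequence[Sequence[Point]]) -> List[int]:
    """For each dot stroke, return the index of the character whose
    centroid is closest to the dot's centroid."""
    char_centroids = [centroid(c) for c in characters]
    assignment = []
    for dot in dots:
        dx, dy = centroid(dot)
        nearest = min(range(len(char_centroids)),
                      key=lambda i: math.hypot(dx - char_centroids[i][0],
                                               dy - char_centroids[i][1]))
        assignment.append(nearest)
    return assignment


chars = [[(0, 0), (2, 0), (2, 2), (0, 2)], [(10, 0), (12, 0), (12, 2), (10, 2)]]
dots = [[(1, 5)], [(11, -3), (11.5, -3)]]
print(assign_dots_by_nearest_centroid(dots, chars))  # [0, 1]
```

If the writer drifts and places a dot closer to a neighbouring character's centroid, the dot goes to that neighbour, which is the failure mode reported for this trial.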

Page 96: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

Trial 2: Different distributions of the dots over the stroke characters were tried, and the validity of their number and location was checked in order to remove invalid list members.

This trial was more successful: we were able to preserve almost all correct list members together with a reasonable reduction in list size.

A new ranked list was obtained after removing the invalid members.
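Trial 2 can be sketched as an exhaustive search over the ways of distributing the observed dot groups among the characters of each list member. This is a hedged Python sketch: the `ALLOWED_DOTS` table, its shape labels ("tooth" for the shared medial shape of ب ت ث ن ي, "hah" for ح خ ج, "alef" for ا) and the rule that a character receives at most one dot group are illustrative assumptions. Location is reduced to above/below; horizontal position is not checked.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

DotGroup = Tuple[int, str]  # (number of dots, "above" / "below" / "none")
NO_DOTS: DotGroup = (0, "none")

# HYPOTHETICAL table: dot configurations allowed for a few dotless shapes.
ALLOWED_DOTS: Dict[str, Set[DotGroup]] = {
    "tooth": {(1, "below"), (1, "above"), (2, "above"), (2, "below"), (3, "above")},
    "hah": {NO_DOTS, (1, "above"), (1, "below")},
    "alef": {NO_DOTS},
}


def dots_are_valid(candidate: Sequence[str], dot_groups: Sequence[DotGroup]) -> bool:
    """True if some distribution of the dot groups over the candidate's
    characters is allowed for every character."""
    for assignment in product(range(len(candidate)), repeat=len(dot_groups)):
        received = [NO_DOTS] * len(candidate)
        clash = False
        for group, idx in zip(dot_groups, assignment):
            if received[idx] != NO_DOTS:  # simplification: one dot group per character
                clash = True
                break
            received[idx] = group
        if not clash and all(received[i] in ALLOWED_DOTS[c] for i, c in enumerate(candidate)):
            return True
    return False


def filter_ranked_list(ranked: List[Tuple[float, List[str]]],
                       dot_groups: Sequence[DotGroup]) -> List[Tuple[float, List[str]]]:
    """Drop invalid members; the survivors keep their original ranking."""
    return [(dist, cand) for dist, cand in ranked if dots_are_valid(cand, dot_groups)]


ranked = [(3.0, ["tooth", "alef"]), (4.0, ["alef", "alef"]), (5.0, ["hah", "alef"])]
print(filter_ranked_list(ranked, [(2, "above")]))  # [(3.0, ['tooth', 'alef'])]
```

The search grows as (number of characters) to the power of (number of dot groups), which stays small for the short strokes involved here.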

Page 97: Rule based algorithm for handwritten characters recognition

6. Recognition .. (cont’d)

Example:

Page 98: Rule based algorithm for handwritten characters recognition

Results and Discussion


| | Training | Test |
|---|---|---|
| No. of writers | 4 | 4 |
| No. of words | 317 | 94 |
| No. of characters | 1814 | 435 |

Results representation:

Neskovic and Cooper [14] developed an on-line segmentation-by-recognition system for English using HMMs together with a dynamic programming technique (Viterbi). The output of the system is a ranked set of words. The system's performance depends on the writer, the writer's style and the clarity of the writing: for good writers, the correct word is in the top 5 words over 97% of the time; for bad writers, the correct word is in the top 5 words over 90% of the time.

Page 99: Rule based algorithm for handwritten characters recognition

Results and Discussion .. (cont’d)

Using the same terminology as in [14], we can represent our results as follows:

• Before dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 93% of the time (96% of the time for the test characters).

• After dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 92% of the time (95% of the time for the test characters).

| | Total Number | Correctly Recognized | Recognition Probability |
|---|---|---|---|
| Characters | 435 | 415 | .95 |
| Strokes | 305 | 279 | .92 |
| Words | 94 | 70 | .74 |
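The "exists within the top list members X% of the time" measure is a top-N hit rate. A minimal Python sketch, assuming each test sample has a ranked list of candidate label strings and one ground-truth label (the names `top_n_rate`, `ranked_lists` and `truths` are hypothetical). The slides do not state N, so `n=None` here means "anywhere in the list".

```python
from typing import Optional, Sequence


def top_n_rate(ranked_lists: Sequence[Sequence[str]], truths: Sequence[str],
               n: Optional[int] = None) -> float:
    """Fraction of samples whose correct label appears within the first n
    members of its ranked list (anywhere in the list if n is None)."""
    hits = 0
    for ranked, truth in zip(ranked_lists, truths):
        window = ranked if n is None else ranked[:n]
        if truth in window:
            hits += 1
    return hits / len(truths)


ranked_lists = [["ab", "ac"], ["bb", "ba", "bc"], ["cc"]]
truths = ["ac", "bc", "xx"]
print(top_n_rate(ranked_lists, truths, n=1))  # 0.0
```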

Page 100: Rule based algorithm for handwritten characters recognition

Results and Discussion .. (cont’d)

Fortunately, most of the correct recognition results are at the top of the ranked list.


[Figure: Recognition Choices. Number of correct choices vs. location in the ranked list, for characters and strokes.]
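The chart on this slide (number of correct choices at each location in the ranked list) can be tallied with a few lines of Python. A hedged sketch: the input format (one ranked list of label strings per test sample plus its ground-truth label) and the function name are assumptions; samples whose correct answer is missing from the list are not counted.

```python
from collections import Counter
from typing import Sequence


def correct_choice_locations(ranked_lists: Sequence[Sequence[str]],
                             truths: Sequence[str]) -> Counter:
    """Count how often the correct answer sits at each 1-based location."""
    locations: Counter = Counter()
    for ranked, truth in zip(ranked_lists, truths):
        if truth in ranked:
            locations[list(ranked).index(truth) + 1] += 1
    return locations


ranked_lists = [["ab", "ac"], ["bb", "ba", "bc"], ["cc"]]
truths = ["ac", "bc", "xx"]
print(sorted(correct_choice_locations(ranked_lists, truths).items()))  # [(2, 1), (3, 1)]
```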

Page 101: Rule based algorithm for handwritten characters recognition

Results and Discussion .. (cont’d)

The list sizes after dot restoration were reduced significantly, with almost no loss of correct results.

Page 102: Rule based algorithm for handwritten characters recognition

Results and Discussion .. (cont’d)

The 5% loss in the number of recognized characters is the consequence of two problems:

1. Imperfect segmentation: due to insufficient coverage of the variety of writing styles.

2. Wrong dot assignment: due to writer drifts and stroke overlaps.

∴ Increasing the number of training samples from multiple writers and avoiding overlaps is expected to give much better results.

Page 103: Rule based algorithm for handwritten characters recognition


Page 104: Rule based algorithm for handwritten characters recognition

Summary, Conclusion & Future Work

The proposed work addressed both branches of the handwritten Arabic character recognition problem, off-line and on-line, and attacked the problem from different sides:

1. Isolated and connected character problems
2. Single-writer and multi-writer variability problems
3. Single-output and multi-output decisions

using the simplest solution approach: rule-based algorithms.

Page 105: Rule based algorithm for handwritten characters recognition

We proposed an off-line recognition system for isolated handwritten Arabic characters and achieved high results, comparable to those achieved by other researchers, by proposing a multiple classifier system together with a classification hierarchy based on the structural features of Arabic characters, and by using feature fusion.

Page 106: Rule based algorithm for handwritten characters recognition

We proposed a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer.

We followed a segmentation-by-recognition approach and used the pen trajectory, with some modifications, as the feature. We were able to correctly segment and recognize most of the test words.

Page 107: Rule based algorithm for handwritten characters recognition

Following the pen trajectory causes the loss of the global pattern-shape information that the off-line image provides (e.g., confusions between {ـمفـ, ـھـ} and {ر, و}).

On the other hand, converting the on-line data to bitmaps and solving the problem as an off-line one is a very hard task that is still under research and does not yet achieve reliable results [40, 41, 42].

Besides, segmentation is much harder in the off-line case than in the on-line case.

Page 108: Rule based algorithm for handwritten characters recognition

As future work, we can combine the advantages of both systems by using an on-line/off-line classifier ensemble. In this case:

• The segmentation decision in the off-line case is corrected using the on-line system, and

• The classification decision in the on-line case is corrected using the off-line system.

Page 109: Rule based algorithm for handwritten characters recognition

As future work, we need:

1. A much larger, neat training database from cooperative writers to enhance the results, and automation of the transcription-file creation process.

2. A language model that resolves ambiguities linguistically to obtain a single output.

3. Handling a large degree of writing variability by using a multiple classifier system and decision fusion.

4. More research on numerals and the Reqaa font, which are still open issues.

Page 110: Rule based algorithm for handwritten characters recognition
