Page 1
Presentation Organization
1. Introduction
2. Document Analysis and Character Recognition
3. Objective
4. Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
5. Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition
6. Summary, Conclusion and Future Work
Page 2
Prepared by:
Eng. Randa Ibrahim M. Elanwar
Research Assistant, Electronic Research Institute
Under the supervision of:
Prof. Dr. Mohsen A. A. Rashwan, Professor of Digital Signal Processing, Faculty of Engineering, Cairo University
Prof. Dr. Samia A. A. Mashaly, Head of Computers and Systems Dept., Electronic Research Institute
Page 5
Introduction
The motivation of the Document Analysis and Recognition (DAR) and Character Recognition (CR) research fields
Arabic Character Recognition
Page 6
Motivation of Document Analysis and Character Recognition
Advantages of having documents in computerized format:
1. Easy editing
2. High-quality hard copies
3. Quick distribution across world-wide networks
4. Keyword or pattern searching
Page 7
Motivation of Document Analysis and Character Recognition .. (cont’d)
Trillions of old documents, handwritten notes, forms and drawings are still not in computerized format.
Entering the data from these documents into computers manually demands a great deal of time and money.
Page 8
Motivation of Document Analysis and Character Recognition .. (cont’d)
The general objective of DAR research is to fully automate the process of understanding printed or handwritten data and entering it into the computer.
Optical Character Recognition (OCR) is the sub-field of document analysis concerned with the recognition of machine-printed or handwritten characters in a document.
Page 9
Motivation of Document Analysis and Character Recognition .. (cont’d)
With the advent of the Personal Digital Assistant (PDA), there is a great need for handwriting recognition.
Recognizing writing in scanned images of handwritten documents is referred to as off-line handwriting recognition.
Recognizing writing as it is entered on a PDA is referred to as on-line handwriting recognition.
Page 10
Arabic Character Recognition
Special characteristics of Arabic script:
It is always written from right to left.
An Arabic word consists of one or more portions, each of which has one or more characters.
Many characters differ only in the position and the number of the dots attached.
Page 11
Arabic Character Recognition .. (cont’d)
Special characteristics of Arabic script:
Every character has more than one shape, depending on its position.
Characters overlap.
Page 12
Arabic Character Recognition .. (cont’d)
Special characteristics of Arabic script:
Existence of ligatures.
• As a result of these special characteristics, Arabic character recognition systems still need more research before they can be established commercially.
Page 14
Document Analysis and Character Recognition
Off-line Document Analysis & CR
• Preprocessing
• Features
On-line Document Analysis & CR
• Preprocessing
• Features
Segmentation
Learning and Classification
Page 15
The Document Analysis and Character Recognition (DACR) field is subdivided into:
1. Off-line Document Analysis & CR
Applications: bank check processing, mail sorting, reading of commercial forms, etc.
2. On-line Document Analysis & CR
Applications: pen computing industry, signature verification, author authentication
Page 16
1. Off-line Document Analysis & CR
Page 17
1. Off-line Document Analysis & CR .. (cont’d)
1.1 Preprocessing
• Binarization
• Noise removal
• Normalization
• Morphological image processing: opening, closing, erosion, dilation, etc.
• Segmentation: explicit, implicit, segmentation-free
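The binarization and morphological steps listed above can be sketched as follows. This is a minimal illustration, not the preprocessing used in the presented systems: the fixed threshold of 128, the 3x3 neighbourhood and all function names are assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into 1 = ink, 0 = background.

    The fixed threshold is an assumption; real systems often choose it adaptively.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _apply_3x3(img, reduce_fn):
    """Replace every pixel by reduce_fn over its 3x3 neighbourhood (clipped at the border)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = reduce_fn(window)
    return out


def erode(img):
    """A pixel stays ink only if its whole neighbourhood is ink."""
    return _apply_3x3(img, min)


def dilate(img):
    """A pixel becomes ink if any pixel in its neighbourhood is ink."""
    return _apply_3x3(img, max)


def opening(img):
    """Erosion followed by dilation: removes small isolated specks."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(img))
```

Opening is the usual choice for removing salt noise, while closing reconnects strokes broken by scanning; which one helps depends on the input quality.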
Page 18
1. Off-line Document Analysis & CR .. (cont’d)
1.2 Features
• Structural Decomposition
(Height contour and chain code features, end points, T-joints and X-joints)
• Series Expansion
(Moments, Fourier transform, Gabor transform and wavelets)
Page 19
2. On-line Document Analysis & CR
Page 20
2. On-line Document Analysis & CR .. (cont’d)
2.1 Preprocessing
• Noise removal
(Smoothing, filtering, de-hooking, etc.)
• Normalization
(Slant correction, baseline drift correction, scale normalization, etc.)
• Segmentation
(Explicit, implicit, segmentation-free)
Page 21
2. On-line Document Analysis & CR .. (cont’d)
2.2 Features
Features are typically extracted at a sub-letter level:
• Shape descriptors
(Ascender, descender, concavity, loop, cusp, curliness, lineness)
• Tangent and curvature features for a window of points
• Writing speed
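As a rough illustration of the tangent, curvature and writing-speed features, the sketch below computes them over a three-point window of pen samples. The `(x, y, t)` sample format, the use of the turning angle as a discrete stand-in for curvature, and the function name `point_features` are assumptions for this example only.

```python
import math


def point_features(points):
    """Compute per-point features from pen samples given as (x, y, t) tuples.

    Timestamps are assumed to be strictly increasing. Features are returned
    for interior points only, since each uses its two neighbours.
    """
    feats = []
    for i in range(1, len(points) - 1):
        (x0, y0, t0), (x1, y1, _), (x2, y2, t2) = points[i - 1], points[i], points[i + 1]
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        # Turning angle wrapped to (-pi, pi]: a discrete approximation of curvature.
        turn = math.atan2(math.sin(angle_out - angle_in), math.cos(angle_out - angle_in))
        path = math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x1, y2 - y1)
        feats.append({
            "tangent": math.atan2(y2 - y0, x2 - x0),  # writing direction over the window
            "curvature": turn,
            "speed": path / (t2 - t0),                # path length per unit time
        })
    return feats
```

A wider window gives smoother estimates at the cost of blurring sharp cusps, which are themselves useful shape descriptors.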
Page 22
Segmentation
• Segmentation based on contour analysis and baseline location
• Segmentation based on vertical histogram
• Stroke segmentation
• Post-segmentation (segmentation by recognition)
• Segmentation by neural network
• Segmentation using dynamic programming (pre-stroke segmentation)
Page 23
Segmentation .. (cont’d)
Segmentation based on contour analysis and baseline location
The chain code provides the information needed to find the baseline location.
After the baseline location is defined, segmentation is done at the points where the contour makes a transition from the inside to the outside of the baseline.
Page 24
Segmentation .. (cont’d)
Stroke Segmentation
Page 25
Segmentation .. (cont’d)
Segmentation based on vertical histogram
• After the vertical histogram of the word or sub-word is plotted, it is traversed with a predefined threshold.
• The zones above this threshold are isolated.
• The threshold value depends on the font and is proportional to the lump of black pixels that joins characters together.
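The histogram-based procedure can be sketched as below, assuming a binary image stored as rows of 0/1 values with 1 for ink. The function names and the way the threshold is supplied by the caller are illustrative assumptions; the slides do not specify how the threshold is chosen beyond its dependence on the font.

```python
def vertical_histogram(img):
    """Count the ink pixels in every column of a binary image (1 = ink)."""
    return [sum(col) for col in zip(*img)]


def segment_by_histogram(img, threshold):
    """Return (first_col, last_col) for each run of columns whose ink count exceeds threshold."""
    hist = vertical_histogram(img)
    zones, start = [], None
    for x, count in enumerate(hist):
        if count > threshold and start is None:
            start = x                      # a zone above the threshold begins
        elif count <= threshold and start is not None:
            zones.append((start, x - 1))   # the zone ended at the previous column
            start = None
    if start is not None:                  # zone running to the right edge
        zones.append((start, len(hist) - 1))
    return zones
```

With a thin connecting stroke of height 1 along the baseline, a threshold of 1 isolates the character bodies and drops the connecting columns.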
Page 26
Segmentation .. (cont’d)
Post-segmentation (segmentation by recognition)
• The basic idea is to extract a set of features sequentially, accumulating their values while moving along the word; the accumulated values are then checked against the feature space of a given font.
• This process is repeated until the character is recognized or the end of the word is reached.
Page 27
Segmentation .. (cont’d)
Segmentation by Neural Network
• Neural networks are trained on manually marked break points.
• For the test words, the neural networks have to determine the locations of the break points between characters.
Page 28
Segmentation .. (cont’d)
Segmentation using dynamic programming (pre-stroke segmentation)
• Valley points (festoon-like strokes) usually correspond to segmentation points between characters.
• The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string that minimizes a certain cost function.
• The set of cuts and their precise shape are found simultaneously.
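The dynamic programming idea can be sketched in one dimension as follows. This is a simplified illustration: cuts are restricted to integer positions along the writing direction, so the precise shape of each cut is not modelled, and the cost function is supplied by the caller. The names `best_cuts` and `segment_cost` are hypothetical.

```python
def best_cuts(n, segment_cost):
    """Find cut positions in 0..n that minimize the summed cost of the segments.

    segment_cost(i, j) is the cost of treating positions i..j as one character.
    Returns (total_cost, interior_cuts).
    """
    best = [float("inf")] * (n + 1)   # best[j]: minimal cost of segmenting 0..j
    back = [0] * (n + 1)              # back[j]: start of the last segment ending at j
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers from the end to recover the segment boundaries.
    cuts, j = [], n
    while j > 0:
        cuts.append(j)
        j = back[j]
    cuts.reverse()
    return best[n], cuts[:-1]         # drop the final boundary at n
```

For example, with a toy cost that prefers segments of width 3, `best_cuts(9, lambda i, j: (j - i - 3) ** 2)` places the interior cuts at positions 3 and 6. Because every prefix is solved optimally before it is extended, the result is globally optimal for the given cost, unlike a greedy left-to-right choice of cuts.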
Page 29
Learning (Training)
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Page 30
Learning (Training) .. (cont’d)
Supervised Learning
A teacher provides a category label or cost for each pattern in a training set.
Unsupervised Learning
There is no explicit teacher, and the system forms clusters or “natural groupings” of the input patterns.
Page 31
Learning (Training) .. (cont’d)
Reinforcement Learning
This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong.
(Thus only binary feedback is given to the classifier.)
Page 32
Classification (Recognition)
Classification Approaches
1. Holistic Approach
Segmentation-free, closed vocabulary, global features
2. Analytical Approach
Implicit or explicit segmentation, open vocabulary
Page 33
Classification (Recognition) .. (cont’d)
Classification Tools
1. Template Matching
(Direct matching, string matching and elastic matching)
2. Statistical Methods
(k-nearest neighbour, Bayesian classifier)
3. Stochastic Processes
(Markov chain)
Page 34
Classification (Recognition) .. (cont’d)
Classification Tools
4. Structural Matching
(Trees, chains, etc.)
5. Neural Networks
6. Rule-based Methods
(Abstract description of writing)
7. Multiple Classifiers
(Classifier ensemble)
Page 35
On-line and off-line character recognition systems can be categorized as:
1. Recognition of Isolated Characters (ISR).
2. Explicit Segmentation into characters/primitives Before Recognition (SBR).
3. Simultaneous / Sequential recognition and Segmentation (SSR).
4. Global Whole Word Recognition (GWR).
Page 37
Objective
1. Viewing the Arabic character recognition (ACR) problem from different sides:
• Isolated and cursive writing
• Off-line and on-line character problems
• Single-writer and multi-writer variability (writer-dependent, WD, and writer-independent, WI)
2. Achieving the best possible character recognition accuracy using the most logical rule-based algorithms
Page 39
Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
A. System Stages
1. Database Collection
2. Preprocessing
3. Feature Extraction, Learning & Classification
3.1) A single feature-based classifier system
3.2) Hierarchical mixture of feature-based classifiers system
B. Results and Discussion
Page 40
1. Database Collection:
A single-writer database was used, consisting of 30 samples (20 for training and 10 for testing) of each Arabic alphabetic character, i.e. 580 characters for training and 290 for testing (29 character classes).
2. Preprocessing:
• Character image binarization
• Character image thresholding
Page 41
3. Feature Extraction, Learning and Classification:
Recognition results were based upon the comparison between:
1. A single feature-based classifier system
2. Hierarchical mixture of feature-based classifiers system
3.1) A single feature-based classifier system
The feature used for this single-classifier system was mainly the radial distances.
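The slides do not define how the radial distances are measured. One plausible reading, shown here purely as an assumption, is the distance from the character's centroid to its farthest ink pixel in each of a fixed number of angular sectors. The function name `radial_distances` and the default of 8 sectors are hypothetical.

```python
import math


def radial_distances(img, sectors=8):
    """Farthest ink-pixel distance from the centroid in each angular sector.

    img is a binary image (rows of 0/1, 1 = ink) with at least one ink pixel.
    Sectors are centred on the directions 0, 2*pi/sectors, 4*pi/sectors, ...
    This definition is an assumption, not the one used in the presented system.
    """
    pixels = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    half = math.pi / sectors          # half a sector width, used to centre the sectors
    feats = [0.0] * sectors
    for x, y in pixels:
        dx, dy = x - cx, y - cy
        angle = (math.atan2(dy, dx) + half) % (2 * math.pi)
        k = int(angle / (2 * math.pi) * sectors) % sectors
        feats[k] = max(feats[k], math.hypot(dx, dy))
    return feats
```

In practice the vector would usually be divided by its maximum so that the feature does not depend on the size of the written character.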
Page 42
3.1) A single feature-based classifier system:
In the training stage, we compute a representative pattern for each class.
Each character was considered a separate class.
Classification is done using the Euclidean distance measure.
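A minimal sketch of this scheme is given below. It assumes that the representative pattern of a class is the mean of its training feature vectors, which the slides do not state explicitly; the function names and the toy labels are illustrative.

```python
import math


def train_prototypes(samples):
    """Map each class label to the mean of its training feature vectors.

    samples: dict of label -> list of equal-length feature vectors.
    Using the mean as the representative pattern is an assumption.
    """
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in samples.items()
    }


def classify(x, prototypes):
    """Return the label whose prototype is nearest to x in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))
```

Usage: after `protos = train_prototypes({"alef": [[1, 1], [3, 1]], "baa": [[10, 10], [12, 10]]})`, calling `classify([3, 2], protos)` picks the class with the closest mean vector.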
Page 43
3.1) A single feature-based classifier system: .. (cont’d)
The average system accuracy = 70.06%
Most of the confusions make little sense. This is because:
• The input pattern is compared to all classes.
• One feature is not representative enough.
Therefore:
• We need a better way of categorization.
• We need to acquire more features.
Page 44
Character images are composed of 1, 2, 3 or 4 objects.
We have a main object (the character body) and secondaries.
To determine the number of dots associated with a character, we need to discriminate between:
1. Single dot
2. Two stuck dots
3. Hamza
4. Separated Alef
Page 45
3.2) Hierarchical Mixture of feature-based classifiers system
The recognition stage in our proposed system passed through 4 stages:
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by the dots
Stage 2: Adding more structural features for gating between different feature-based classifiers
Stage 3: Adding more features and using feature fusion
Stage 4: Increasing the reliability of gating
Page 46
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by the dots
• Characters are clustered into groups according to the number of dots attached to them; the dot count works as a gate between redundant classifiers.
• The same feature is used for recognition in each cluster, i.e. we now have a classifier ensemble of individual classifiers (obtained by varying the training data).
• Classification is done using the Euclidean distance measure.
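The gating idea of Stage 1 can be sketched as follows, assuming the dot count has already been determined for each character image and that each expert is a nearest-mean classifier. Both the data layout and the function names are assumptions for illustration.

```python
import math


def train_gated(samples):
    """Train one nearest-mean expert per dot count.

    samples: list of (dot_count, label, feature_vector) tuples.
    Returns dot_count -> {label: prototype vector}.
    """
    grouped = {}
    for dots, label, vec in samples:
        grouped.setdefault(dots, {}).setdefault(label, []).append(vec)
    return {
        dots: {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in by_label.items()
        }
        for dots, by_label in grouped.items()
    }


def classify_gated(dots, vec, experts):
    """Use the dot count as a gate, then pick the nearest prototype in that group only."""
    prototypes = experts[dots]
    return min(prototypes, key=lambda label: math.dist(vec, prototypes[label]))
```

Because each expert only sees the classes of its own group, a character with one dot can no longer be confused with a similarly shaped character that has none. A wrong dot count, however, sends the sample to the wrong expert, so the accuracy of the gate bounds the accuracy of the whole system.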
Page 47
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 1: Using a classifier ensemble (hierarchical mixture of experts) gated by the dots
• The average system accuracy = 78.33%
Page 48
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
• Characters are clustered into groups according to the number of dots attached to them and the existence of loops and Hamzas (8 different classifiers).
• The same feature is used for recognition in each cluster.
• Classification is done using the Euclidean distance measure.
• The average system accuracy has risen to 80.86%
Page 49
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
• New structural features are added:
  • Number and position of the character stroke end points
  • Number of times vertical and horizontal lines cut the character body
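The line-cut feature can be illustrated by counting how many separate ink runs a scan line crosses. The sketch assumes a binary image with 1 for ink; which rows and columns are scanned is left to the caller, since the slides do not say where the lines are placed. The function names are hypothetical.

```python
def count_cuts(line):
    """Count background-to-ink transitions along one scan line (1 = ink)."""
    cuts, previous = 0, 0
    for value in line:
        if value and not previous:
            cuts += 1                 # entering a new ink run
        previous = value
    return cuts


def line_cut_features(img, rows, cols):
    """Cut counts for the chosen horizontal rows, followed by the chosen vertical columns."""
    columns = list(zip(*img))
    return [count_cuts(img[r]) for r in rows] + [count_cuts(columns[c]) for c in cols]
```

A scan line through the middle of a loop, for example, is cut twice, while the same line through a simple vertical stroke is cut once.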
Page 50
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 2: Adding more structural features for gating between different feature-based classifiers
• The average system accuracy = 92.25%
Page 52
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 3: Adding more features and using feature fusion
• A new feature-based classifier that uses the 45° inclined line cuts feature is added.
• We used a fusion technique, weighted average, to combine the different features.
• The average system accuracy has risen to 96%
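One way to read the weighted-average fusion is sketched below: each feature-based classifier reports a distance to every candidate class, the distances are averaged with per-feature weights, and the class with the smallest fused distance wins. Fusing at the distance level, the weight values and the function names are all assumptions, and the distances are assumed to be on comparable scales.

```python
def fuse_distances(per_feature, weights):
    """Weighted average of per-class distances reported by several feature-based classifiers.

    per_feature: feature name -> {label: distance}, with the same labels for every feature.
    weights:     feature name -> non-negative weight (not all zero).
    """
    total = sum(weights[name] for name in per_feature)
    labels = list(next(iter(per_feature.values())))
    return {
        label: sum(weights[name] * per_feature[name][label] for name in per_feature) / total
        for label in labels
    }


def classify_fused(per_feature, weights):
    """Return the label with the smallest fused distance."""
    fused = fuse_distances(per_feature, weights)
    return min(fused, key=fused.get)
```

The weights let a more reliable feature dominate the decision, so the same pair of distance tables can yield different winners under different weightings.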
Page 54
3.2) Hierarchical Mixture of feature-based classifiers system .. (cont’d)
Stage 4: Increasing the reliability of gating
We raised the secondaries identification accuracy to 99.7% using some structural features:
• Character body to secondary ratio,
• Secondary black-to-white pixel ratio, and
• Secondary height-to-width ratio.
We removed class overlapping in the feature space.
The average system accuracy has risen to 97%
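The three ratios could be computed along the following lines. The slides do not say what quantity the body-to-secondary ratio compares, so the sketch assumes ink pixel counts; it also assumes the secondary is given as a binary image cropped to its bounding box and contains at least one ink pixel. No decision thresholds are shown because none are given in the source.

```python
def secondary_features(body_ink_pixels, secondary):
    """Structural ratios describing a secondary object (dot, Hamza, etc.).

    body_ink_pixels: ink pixel count of the main character body (assumed measure).
    secondary:       binary image (1 = ink) cropped to the secondary's bounding box.
    """
    height, width = len(secondary), len(secondary[0])
    black = sum(map(sum, secondary))
    white = height * width - black
    return {
        "body_to_secondary": body_ink_pixels / black,
        # A completely filled bounding box has no white pixels.
        "black_to_white": black / white if white else float("inf"),
        "height_to_width": height / width,
    }
```

Intuitively, a single dot tends to be compact and nearly filled, while two stuck dots are wider than tall, which is the kind of distinction such ratios can capture.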
Page 56
Results and Discussion
Following these system stages, we ended up with:
1. An average recognition accuracy of 97%.
2. A total increase in recognition accuracy of about 27 percentage points (from 70.06% to 97%) over the accuracy achieved by the single classifier system.
3. We were able to achieve high results using the most common features by proposing the idea of a multiple classifier system (classifier ensemble), besides using a classification hierarchy based on the structural features of Arabic characters.
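The overall gain can be checked from the final accuracies reported for the single classifier and for each stage. The short calculation below uses only those reported figures; the variable names are illustrative.

```python
# Final average accuracies (%) reported for each configuration.
accuracy = {
    "Single classifier": 70.06,
    "Stage 1": 78.33,
    "Stage 2": 92.25,
    "Stage 3": 96.0,
    "Stage 4": 97.0,
}

names = list(accuracy)

# Gain of each stage over the previous configuration, in percentage points.
gains = {
    names[i]: round(accuracy[names[i]] - accuracy[names[i - 1]], 2)
    for i in range(1, len(names))
}

# Total gain of the final system over the single classifier.
total_gain = round(accuracy["Stage 4"] - accuracy["Single classifier"], 2)
```

The largest single step comes from Stage 2, where the additional structural gating features were introduced.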
Page 57
Results and Discussion
Our system is very simple, and the results are comparable to those obtained by other researchers.
Page 58
Results and Discussion
Average accuracy (%) by system stage (bar chart):
Single classifier: 70.06
Stage 1: 78.33
Stage 2: 92.25
Stage 3: 96
Stage 4: 97
Page 60
Rule-based Algorithm for On-line Cursive Handwriting Segmentation & Recognition
Classically [11], on-line recognizers consist of:
1. A preprocessor
2. A classifier, which provides estimates of probabilities for the different categories of characters
3. A postprocessor, which eventually incorporates a language model
We propose a rule-based algorithm for the two early stages of an on-line recognizer for cursive Arabic handwriting.
Page 61
A. System Stages
1. Database Collection
2. Preprocessing
3. Pattern Shapes Definition
4. Feature Extraction
5. Training
6. Recognition
B. Results and Discussion
Page 62
1. Database Collection
• Handwritten documents were collected on a slate tablet PC.
• The database collected was unconstrained (open vocabulary).
• No digits were included.
• Writing is in the Naskh style only.
Page 63
2. Preprocessing
• Filter the document and clear it of unintended writers' errors.
• Break down the document into text lines and words or sub-words.
• Detect the type of each stroke (either main-body or secondary).
Page 64
2. Preprocessing .. (cont’d)
Filter the document and clear it of unintended writers' errors.
Page 65
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Two problems arise when using x-y axes projection histograms:
1. Baseline skew, which makes line separation difficult and requires a careful skew detection and correction stage.
2. Multi-word overlap, where the inter-word distance is smaller than the normal expected threshold for separating words.
Page 66
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
E. Ratzlaff used a “bottom-up” clustering of discrete strokes into increasingly larger groups that eventually merge into complete text lines.
The initial bottom-up clustering began by creating Forward Projection (FP) groups.
Strokes were merged into an FP group if they had strongly overlapping Y-axis projections. A single unmerged stroke became an independent FP group.
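The FP grouping idea can be illustrated with a minimal Python sketch. It is not Ratzlaff's implementation: the overlap measure, the `min_overlap` threshold of 0.5, and the reduction of each stroke to its (y_min, y_max) extent are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (y_min, y_max) of one stroke (assumed representation)


def overlap_ratio(a: Interval, b: Interval) -> float:
    """Overlap length divided by the length of the shorter interval."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    shorter = min(a[1] - a[0], b[1] - b[0])
    if shorter <= 0:  # degenerate stroke (e.g. a single point)
        return 1.0 if overlap >= 0 else 0.0
    return max(0.0, overlap) / shorter


def forward_projection_groups(strokes: List[Interval],
                              min_overlap: float = 0.5) -> List[List[int]]:
    """Merge strokes, in writing order, into groups with overlapping Y projections."""
    groups: List[List[int]] = []   # stroke indices per group
    extents: List[Interval] = []   # running Y extent of each group
    for index, stroke in enumerate(strokes):
        if groups and overlap_ratio(extents[-1], stroke) >= min_overlap:
            groups[-1].append(index)
            low, high = extents[-1]
            extents[-1] = (min(low, stroke[0]), max(high, stroke[1]))
        else:
            # No strong overlap: the stroke starts an independent FP group.
            groups.append([index])
            extents.append(stroke)
    return groups


strokes = [(0, 10), (2, 9), (30, 40), (31, 38), (12, 14)]
print(forward_projection_groups(strokes))  # [[0, 1], [2, 3], [4]]
```

In this toy input the last stroke (a small mark lying between two lines) has no Y overlap with its predecessor and therefore ends up as its own group, which is how a secondary stroke can be mistaken for a separate text line.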
Page 67
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Drawbacks:
1. The secondaries usually have null overlapping Y-axis projections.
2. Large baseline skews along the text line and even within one word.
Page 68
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Another idea for text line separation was proposed by Gareth Loudon et al. It worked successfully for English script because of its limited cursive nature, i.e. a stroke (pen down/up movement) usually represents a single character.
Several parameters were calculated for each stroke during the character segmentation step.
Page 69
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Example:
• if (si > max(xi)) or (-si > 2 * max(xi) & yi > max(xi)),
then stroke i was a character at the end of a word,
• else if (ci > 0)
stroke i was a character within a word,
• else
stroke i must be merged with the next stroke to form a character.
Page 70
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Drawbacks:
1. An Arabic stroke usually represents more than one character, which makes it impossible to estimate the Arabic stroke geometry (height, width, etc.).
2. Delayed strokes in English are usually written immediately after the main stroke, which is not the case for Arabic strokes.
3. Variations in stroke size and stroke sequence among writers make the problem more difficult.
Page 71
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Our new technique uses the same bottom-up clustering concept and uses the spatiotemporal relations between strokes to build the smallest possible FP groups.
The FP groups contain the main and secondary strokes of the same word regardless of the sequence in which they were written.
Page 72
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
By examining the states of successively written Arabic strokes, we found them to be spatially related to each other by one of the following relations:
1. Touching
∴ The two strokes should belong to the same word group
2. Not touching but overlapping on the x-axis
∴ The two strokes should belong to the same word group
Page 73
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
3. Neither touching nor overlapping on the x-axis
If the inter-stroke distance is less than the average stroke width
∴ The two strokes should belong to the same word group
Else
∴ The two strokes should belong to two different word groups
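The three relations can be sketched in Python as follows. This is a simplified illustration, not the authors' code. Each stroke is reduced to its x-extent (x_min, x_max), so "touching" is approximated by x-overlap, since two touching strokes necessarily share an x-coordinate. All pairs of strokes are compared, which is an assumption, and a union-find structure collects the word groups regardless of writing order.

```python
from typing import List, Tuple

Extent = Tuple[float, float]  # (x_min, x_max) of one stroke (assumed representation)


def group_strokes_into_words(x_extents: List[Extent]) -> List[List[int]]:
    """Group stroke indices into word groups using the x-axis rules."""
    n = len(x_extents)
    if n == 0:
        return []
    avg_width = sum(high - low for low, high in x_extents) / n
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (low_i, high_i), (low_j, high_j) = x_extents[i], x_extents[j]
            # gap <= 0: the strokes touch or overlap on the x-axis (rules 1 and 2).
            # 0 < gap < avg_width: close enough to be the same word (rule 3).
            gap = max(low_i, low_j) - min(high_i, high_j)
            if gap < avg_width:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Right-to-left writing: strokes 0-2 form one word, strokes 3-4 another.
extents = [(100, 140), (90, 98), (110, 115), (30, 60), (20, 26)]
print(group_strokes_into_words(extents))  # [[0, 1, 2], [3, 4]]
```

Stroke 2 overlaps stroke 0 on the x-axis although it is written later, so a delayed stroke such as a dot is still placed in the correct word group.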
Page 74
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Example:
* Strokes 1 & 2: neither touching nor overlapping, but belong to the same word.
* Strokes 2 & 5: neither touching nor overlapping, and belong to 2 different words.
* Strokes 1 & 3: overlapping and belong to the same word.
* Strokes 7 & 8: touching and belong to the same word.
Page 75
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
Page 76
2. Preprocessing .. (cont’d)
Break down the document into text lines and words or sub-words.
We overcame these problems:
1. Secondaries having null overlapping Y-axis projections, which were usually separated as an independent text line.
2. Baseline skew.
3. Delayed strokes, which are now included in the same word regardless of the sequence in which they were written.
Page 77
2. Preprocessing .. (cont’d)
Detect the type of each stroke (either main or secondary).
Many characters have the same main body and differ only by their dots. By erasing these dots, we can reduce the number of patterns.
If the FP group contains 1 stroke, then it should be main-type.
If the FP group contains 2 or more strokes, then the first one should be main-type. Each following stroke may be secondary or main depending on its height, shape and location.
Page 78
2. Preprocessing .. (cont’d)
Detect the type of each stroke (either main or secondary).
Page 79
3. Pattern Shape Definition
Pattern shapes are defined by observing the collected handwriting. We have more than one shape for each handwritten character in each of its known positions (Start, Middle, End, and Isolated).
Page 80
4. Feature Extraction
Depending on the directions, lengths, and pen-up/down movements of substrokes, 25 substrokes of eight directions are defined: eight long strokes (A–H), eight short strokes (a–h), eight pen-up movements (1–8) and one pen-up movement (9).
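A minimal sketch of how a pen displacement could be mapped to one of these substroke codes is shown below. The assignment of letters to compass directions (here 'A' points along the positive x-axis and codes advance in 45° steps), the `long_threshold` value, and the function names are assumptions for illustration; the special code 9 is not handled.

```python
import math

LONG_CODES = "ABCDEFGH"    # long pen-down substrokes
SHORT_CODES = "abcdefgh"   # short pen-down substrokes
PEN_UP_CODES = "12345678"  # pen-up movements


def direction_index(dx: float, dy: float) -> int:
    """Quantize a displacement into one of eight 45-degree sectors (0..7)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int((angle + math.pi / 8) // (math.pi / 4)) % 8


def substroke_code(dx: float, dy: float, pen_down: bool,
                   long_threshold: float = 10.0) -> str:
    """Return the substroke symbol for one displacement."""
    direction = direction_index(dx, dy)
    if not pen_down:
        return PEN_UP_CODES[direction]
    if math.hypot(dx, dy) >= long_threshold:
        return LONG_CODES[direction]
    return SHORT_CODES[direction]


# A long move along +x, a short move along +y, then a pen-up move.
moves = [(20, 0, True), (0, 5, True), (-3, -3, False)]
print("".join(substroke_code(dx, dy, down) for dx, dy, down in moves))  # Ac6
```

Concatenating the codes of successive substrokes gives the feature string (chain code) of a stroke, which is the form used for string comparison in the recognition stage.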
Page 81
4. Feature Extraction .. (cont’d)
Example:
Page 82
5. Training
The details of this stage depend greatly on the methodology that will be used in the recognition stage.
Approach 1: Segmentation-based systems (Analytical).
Approach 2: Segmentation-free systems (Holistic).
We followed the first approach, but performed segmentation-by-recognition rather than explicit segmentation-before-recognition.
Page 83
5. Training .. (cont’d)
S. El-Dabi [3, 9] sequentially extracted a set of features, accumulating their values while moving along the word image (column by column); the accumulated values were then checked against the feature space of a given font until a character was recognized or the end of the word was reached.
We need to build a registry comprising all skeleton patterns (feature space) of all pattern shapes.
• We made transcription files of the training data to describe the content of each training file. These files serve as a manual segmentation of the word strokes.
Page 84
5. Training .. (cont’d)
Example:
Page 85
5. Training .. (cont’d)
For each transcription file, pattern shape data are read and the direction features are extracted.
All the feature vectors belonging to the same pattern shape are clustered.
The most representative patterns (feature vectors) are stored to construct a registry for the recognition stage.
Page 86
6. Recognition
In this stage, the main task was to find cuts that divide connected components into their individual characters.
The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string (feature vector) that minimizes a certain cost function.
The set of cuts and their precise shape are found simultaneously.
The feature vector of the test stroke was compared against the registry (one direction after the other) until either a character was recognized (i.e., we decide on a segmentation point) or the feature vector reached its end.
Page 87
6. Recognition .. (cont’d)
This comparison was performed using a dynamic programming technique called "Minimum Edit Distance".
Page 88
6. Recognition .. (cont’d)
Example: assuming Insertion cost = Deletion cost = 1, substitution cost = 2
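For reference, the standard minimum-edit-distance recurrence with these costs can be sketched in Python as follows. This is the textbook dynamic programming formulation, not the authors' code. The function name and the example strings are illustrative.

```python
def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1,
                      sub_cost: int = 2) -> int:
    """Cheapest sequence of insertions, deletions and substitutions
    that turns `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # delete every source symbol
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):          # insert every target symbol
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match / substitution
            )
    return dist[-1][-1]


print(min_edit_distance("AbC", "AC"))               # 1 (delete 'b')
print(min_edit_distance("ABc", "ABd"))              # 2 (substitute 'c' by 'd')
print(min_edit_distance("intention", "execution"))  # 8
```

With a substitution cost of 2, a substitution is never cheaper than one deletion plus one insertion, so the distance equals len(source) + len(target) minus twice the length of the longest common subsequence.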
Page 89
6. Recognition .. (cont’d)
Group 1 = ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'], Group 2 = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h'] & Group 3 = ['1' '2' '3' '4' '5' '6' '7' '8'];
The penalties are decided as follows:
Page 90
6. Recognition .. (cont’d)
Insertion Cost = Substitution Cost/2 & Deletion Cost = Substitution Cost/2
The factors '4' and '16' come from the assumption that short strokes (represented by Group 2 directions) are almost half the length of long strokes (represented by Group 1 directions).
Other value sets for these factors were tried: {1.5^2, (1.5^2)^2}, {2.5^2, (2.5^2)^2}, {3^2, (3^2)^2}, {3.5^2, (3.5^2)^2}, {4^2, (4^2)^2}. We chose the {2^2, (2^2)^2} value set because it gives the smallest integer values, so the total distances do not get too large.
Page 91
6. Recognition .. (cont’d)
The minimum-edit-distance technique is a good mathematical measure but cannot be used solely with the chain code feature.
We need either some off-line features or at least template matching information.
We used string matching to find the number of matches between the representative patterns from the registry and the test vector.
The final cost function is given by the following equation:
$$\text{Distance} = \text{minimum-edit-distance} \times \frac{\text{Length of representative pattern}}{\text{Number of matches}}$$
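A small worked example of this cost function is given below as a Python sketch. How the number of matches is counted is not specified here, so the function takes it as an input. Returning infinity when there are no matches is an assumption made to avoid division by zero.

```python
import math


def final_distance(edit_distance: float, pattern_length: int,
                   num_matches: int) -> float:
    """Distance = minimum-edit-distance x (pattern length / number of matches)."""
    if num_matches <= 0:
        # Assumption: a pattern with no matching symbols is treated as infinitely far.
        return math.inf
    return edit_distance * pattern_length / num_matches


# Same edit distance and pattern length, different numbers of matching symbols.
print(final_distance(4, 6, 4))  # 6.0
print(final_distance(4, 6, 2))  # 12.0
```

For a fixed edit distance, the candidate that shares more symbols with the test vector receives the lower cost, which is the effect the string-matching term is meant to add.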
Page 92
6. Recognition .. (cont’d)
The probable pattern shapes of the first character in the stroke were stored as roots of individual trees.
Each tree was completed by repeatedly comparing the unidentified region of the feature vector to the registry, to find the probable pattern shapes of the second, third and fourth characters, until the whole stroke was recognized.
After tree construction, we obtained a ranked list in which each member comprised the characters (without dots) representing the stroke, ranked by their total edit distance 'Distance'.
Page 93
Page 94
File 1: File 2:
Page 95
6. Recognition .. (cont’d)
The last step in this stage was dot restoration.
Two trials were made at assigning dots to the characters representing the stroke.
Trial 1: The centroids of the dots were calculated, as well as the centroid of each character in the stroke, and each dot was assigned to the character with the nearest centroid.
Despite the large reduction in list size and the promotion of correct results to the top of the list, drifts in dot position caused wrong dot assignments to characters and therefore the loss of many correct choices as well.
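The nearest-centroid assignment of Trial 1 can be sketched in Python as follows. The data layout (centroids as (x, y) tuples) and the function name are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def assign_dots_to_nearest(char_centroids: List[Point],
                           dot_centroids: List[Point]) -> List[int]:
    """For each dot, return the index of the character with the nearest centroid."""
    assignment = []
    for dot in dot_centroids:
        nearest = min(range(len(char_centroids)),
                      key=lambda k: math.dist(dot, char_centroids[k]))
        assignment.append(nearest)
    return assignment


chars = [(10.0, 0.0), (30.0, 0.0), (50.0, 0.0)]
dots = [(12.0, 8.0), (41.0, -6.0)]
print(assign_dots_to_nearest(chars, dots))  # [0, 2]
```

The second dot sits between characters 1 and 2 and is only slightly closer to character 2. A small drift in where the writer places the dot is enough to flip the assignment, which is the weakness reported for this trial.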
Page 96
6. Recognition .. (cont’d)
Trial 2: Trying different distributions of dots over the stroke characters and checking the validity of their number and location, in order to remove inconsistent list members.
This trial was more successful: we were able to preserve almost all correct list members together with a reasonable reduction in the list size.
A new ranked list was obtained after removing the inconsistent members.
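The validity check of Trial 2 might look like the following Python sketch. Everything in it is hypothetical: the skeleton names, the small `ALLOWED_DOTS` table (a few well-known Arabic letter families only), the encoding of a dot group as (count, position), and the rule that dot groups are matched to characters in writing order with at most one group per character.

```python
from typing import List, Optional, Sequence, Tuple

DotGroup = Tuple[int, str]  # (number of dots, "above" or "below")

# Hypothetical table: dot configurations each dotless skeleton can carry.
# None means the skeleton is also a valid letter without any dots.
ALLOWED_DOTS = {
    "beh": {(1, "below"), (2, "above"), (3, "above")},
    "hah": {None, (1, "below"), (1, "above")},
    "dal": {None, (1, "above")},
    "alef": {None},
}


def is_valid(skeletons: Sequence[str], dots: Sequence[DotGroup]) -> bool:
    """True if the dot groups can be distributed, in order, over the skeletons."""
    def fits(char_idx: int, dot_idx: int) -> bool:
        if dot_idx == len(dots):
            # All dots placed: remaining characters must be valid without dots.
            return all(None in ALLOWED_DOTS[s] for s in skeletons[char_idx:])
        if char_idx == len(skeletons):
            return False  # dots left over but no characters to carry them
        skeleton = skeletons[char_idx]
        if dots[dot_idx] in ALLOWED_DOTS[skeleton] and fits(char_idx + 1, dot_idx + 1):
            return True
        return None in ALLOWED_DOTS[skeleton] and fits(char_idx + 1, dot_idx)
    return fits(0, 0)


def filter_ranked_list(ranked: List[Tuple[Tuple[str, ...], float]],
                       dots: Sequence[DotGroup]):
    """Drop list members whose skeletons cannot carry the observed dots."""
    return [(skeletons, dist) for skeletons, dist in ranked if is_valid(skeletons, dots)]


ranked = [(("beh", "alef"), 3.0), (("dal", "alef"), 4.0), (("hah", "beh"), 5.0)]
print(filter_ranked_list(ranked, [(2, "above")]))
# [(('beh', 'alef'), 3.0), (('hah', 'beh'), 5.0)]
```

Because the filter only removes candidates for which no distribution of dots is possible, it does not depend on exact dot positions, unlike the nearest-centroid assignment.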
Page 97
6. Recognition .. (cont’d)
Example:
Page 98
Results and Discussion
                 Training   Test
No. of writers       4        4
No. of words       317       94
No. of char.      1814      435
Results representation:
Neskovic and Cooper [14] developed an on-line segmentation-by-recognition system for English using HMMs together with a dynamic programming technique (Viterbi). The output of the system is a ranked set of words. The system's performance depends on the writer, on the writer's style and on the clarity of the writing: for good writers the correct word is in the top 5 words over 97% of the time; for bad writers the correct word is in the top 5 words over 90% of the time.
Page 99
Results and Discussion .. (cont’d)
Using the same terminology as in [14], we can represent our results as follows:
• Before dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 93% of the time (96% of the time for the test characters).
• After dot restoration, the correct segmentation-recognition results of the test strokes exist within the top list members 92% of the time (95% of the time for the test characters).
             Total Number   Correctly Recognized   Recognition Probability
Characters   435            415                    .95
Strokes      305            279                    .92
Words        94             70                     .74
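As a quick check of the arithmetic, the sketch below recomputes the recognition probability from the counts in the results table, assuming it is simply the number of correctly recognized items divided by the total number. The function name `recognition_probability` is an assumption made for the example. Under that assumption the character and word ratios come out near 0.954 and 0.745, while the stroke ratio comes out near 0.915, slightly below the .92 shown on the slide.

```python
def recognition_probability(correct: int, total: int) -> float:
    """Ratio of correctly recognized items to the total number of items."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


# Counts read from the results table: (correctly recognized, total number).
counts = {
    "Characters": (415, 435),
    "Strokes": (279, 305),
    "Words": (70, 94),
}

rates = {name: recognition_probability(c, t) for name, (c, t) in counts.items()}

for name, rate in rates.items():
    print(f"{name:<10} {rate:.3f}")
```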
Page 100
Results and Discussion .. (cont’d)
Fortunately, most of the correct recognition results exist at the top of the ranked list.
[Figure: Recognition Choices. x-axis: location in the ranked list; y-axis: no. of correct choices; series: Characters, Strokes.]
Page 101
Results and Discussion .. (cont’d)
The list sizes after dot restoration have been reduced significantly, with almost no loss of correct results.
Page 102
Results and Discussion .. (cont’d)
The 5% loss in the number of characters recognized is the consequence of two problems:
1. Imperfect segmentation: due to not covering a large degree of writing varieties.
2. Wrong dot assignment: due to writer drifts and stroke overlaps.
∴ Increasing training samples from multiple writers and avoiding overlaps is expected to give much better results.
Page 104
Summary, Conclusion & Future Work
The proposed work overviewed both branches of the handwritten Arabic character recognition problem, the off-line and the on-line, and attacked the problem from different sides:
1. Isolated and connected character problems
2. Single-writer and multi-writer variability problems
3. Single-output and multi-output decisions
using the simplest trend of solution: rule-based algorithms.
Page 105
We proposed an off-line character recognition system for isolated handwritten Arabic characters, and we were able to achieve high results, comparable to those achieved by other researchers, by proposing the idea of a multiple classifier system, besides using a classification hierarchy based on the structural features of Arabic characters and using feature fusion.
Page 106
We proposed a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer.
We followed a segmentation-by-recognition approach, and we used the pen trajectory as the feature with some modifications. We were able to correctly segment and recognize most of the test words.
Page 107
Following the pen trajectory causes the loss of the global pattern shape information which the off-line image provides (e.g., confusions between {ـمفـ , ـھـ} and {ر , و}).
On the other hand, converting the on-line data to bitmaps and trying to solve it as off-line is a very hard task, still under research, and is not yet achieving reliable results [40, 41, 42].
Besides, the segmentation task is considerably harder in the off-line case than in the on-line case.
Page 108
As future work, we can benefit from both systems' advantages by using an on-line/off-line classifier ensemble system. In this case:
• The segmentation decision in the off-line case is corrected using the on-line system, and
• The classification decision of the on-line case is corrected using the off-line system.
Page 109
As future work we need:
1. A much larger, neat training database from cooperative writers to enhance the results, and automation of the transcription file creation process.
2. Introducing a language model and solving ambiguities linguistically to obtain a single output.
3. Handling a large degree of writing variability by using a multiple classifier system and decision fusion.
4. Work on numerals and the Reqaa font, which are still considered open issues and need more research.
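Item 2 could be approached in many ways. One very simple reading, sketched below in Python, filters the ranked candidate list against a word list and keeps the best-ranked candidate that is a valid word. The lexicon, the candidate strings, and the function name `pick_single_output` are all hypothetical, chosen only for illustration; a real language model would be considerably richer than a lexicon lookup.

```python
from typing import Iterable, Optional, Set


def pick_single_output(ranked_candidates: Iterable[str],
                       lexicon: Set[str]) -> Optional[str]:
    """Return the best-ranked candidate found in the lexicon, or None."""
    for candidate in ranked_candidates:
        if candidate in lexicon:
            return candidate
    return None


# Hypothetical ranked output of a recognizer and a tiny hypothetical lexicon.
# The first candidate differs from a valid word only by its dots.
lexicon = {"كتاب", "كاتب", "باب"}
candidates = ["كناب", "كتاب", "كاتب"]

best = pick_single_output(candidates, lexicon)
```

Here the top-ranked candidate is rejected because it is not in the word list, and the second candidate is returned as the single output. When no candidate is a valid word the function returns `None`, leaving the ambiguity to be resolved by other means.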