Mahmoud A. Arram Preprocessing and Segmentation of Cursive Online Arabic Script MIT 6.870 Spring 2007

Segmentation of Cursive Online Arabic Script: A final paper for MIT 6.870 by Mahmoud Arram

Jul 27, 2015


Contents

1 Introduction
2 Data Collection
3 Ink Processing
3.1 Pen Stroke Segmentation
3.1.1 Stroke Segmentation Points
3.1.2 Merging Stroke Segmentation Points
3.1.3 Fitting Segments
3.1.4 Combining Segments
3.2 Baseline Estimation
3.3 Closed Loops
4 Segmentation of Arabic Words
4.1 Delayed Strokes
4.1.1 Dot Detection
5 Feature Vectors
6 Conclusions
7 Sample Output


Abstract

We propose a method for segmenting cursive online Arabic handwriting into word parts and letters using spatiotemporal properties of Arabic handwriting, including pen speed, curvature and intersection points. We also introduce a novel approach for identifying delayed dot strokes in Arabic handwriting using writing direction.

1 Introduction

The Arabic script is one of the most widely used scripts in the world. It is the writing system not only for the Arabic language, but also for many other languages such as Farsi, Dari, Pashto, Azeri, Urdu, Kashmiri, Sindhi, Uyghur and Hausa and, until recently, Turkish and Somali [10].

Arabic is written right to left and is always cursive. Letters are joined together along the writing line to form words or sub-words.

The Arabic alphabet has 28 letters, each with two to four shapes. The shape of a letter is determined by its position: start, middle or end of a word part. Word parts are the connected components of a word, which is broken up wherever one of the six Arabic letters occurs that can be joined only from the right, not from the left. See Figure 1.

More importantly from the point of view of automated segmentation and recognition, many Arabic letters contain obligatory dots and vertical strokes in addition to the letter body. The letter sheen (ش) consists of the body of the letter seen (س) plus three dots above it. These additional strokes are usually added after writing the body of the word or word part, which makes online methods ideal for detecting them.

Most papers on online handwriting segmentation and recognition tout the benefits of using handwriting as a more natural replacement for the traditional mouse and keyboard. Rather than preach to the choir, I will share my personal appreciation for the beauty of the Arabic calligraphy that adorns many international works of art, from the mosques of Arabia to the walls of the Alhambra and the coronation mantles of the Holy Roman Emperors. Arabic handwriting is an art form; it would be great if computers understood it.


Figure 1: The different forms of written Arabic letters.

2 Data Collection

At first, we implemented a web-based widget [3] that converted mouse or stylus strokes into usable data structures, so that we could collect writing samples from different and possibly remote participants. However, the ink density produced by the widget proved insufficient for the purposes of handwriting analysis. Nonetheless, it might still be useful for other applications that seek to collect much shorter strokes.

We ended up using a standard pen tablet computer running Microsoft Windows Journal to collect samples of the author’s handwriting. We used Sketchers [1] to convert the output file into the CSAIL Design Rationale Group’s XML stroke format. We then wrote a script that converted the output into MATLAB data structures.

All the writing samples used in prototyping this system were of the author’s own handwriting. A suggestion for future work would be to test the implementation with a more diverse set of handwriting samples.


3 Ink Processing

The pen tablet we used was very accurate and produced many redundant points with progressive timestamps. To prepare the input strokes for our calculations, we merged all coincident points into the one with the earliest timestamp.
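The merging step can be sketched in Python as follows. The paper does not list its code, so this is only an illustration: it assumes that "coincident" means consecutive samples with identical coordinates, and the function name and the (x, y, t) tuple layout are hypothetical choices.

```python
def merge_coincident(points):
    """Collapse runs of consecutive samples that share the same (x, y).

    points: list of (x, y, t) tuples in capture order (assumed layout).
    Each run is replaced by one point carrying the earliest timestamp.
    """
    merged = []
    for x, y, t in points:
        if merged and merged[-1][0] == x and merged[-1][1] == y:
            # Same location as the previous kept point: keep the earlier time.
            if t < merged[-1][2]:
                merged[-1] = (x, y, t)
            continue
        merged.append((x, y, t))
    return merged
```

A pen that later returns to an earlier location is kept as a new point, since only consecutive duplicates are merged in this sketch.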

3.1 Pen Stroke Segmentation

Rather than relying on the pen tablet software to group points into separate strokes, we wrote a routine that groups points into word parts. The program calculates the average Euclidean distance between consecutive points in the series. A distance between consecutive points that exceeds a threshold multiple of the average distance indicates a pen lift and hence delineates strokes.
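A minimal sketch of this pen-lift test, assuming points are (x, y) tuples in capture order. The default multiple of 5 is the value given in the Code Parameters table; the function name is hypothetical.

```python
import math

def split_on_pen_lifts(points, multiple=5.0):
    """Split a point series wherever the gap to the next point exceeds
    `multiple` times the average gap between consecutive points."""
    if not points:
        return []
    if len(points) == 1:
        return [list(points)]
    gaps = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    average_gap = sum(gaps) / len(gaps)
    parts = [[points[0]]]
    for gap, point in zip(gaps, points[1:]):
        if gap > multiple * average_gap:
            parts.append([])  # large jump: treat as a pen lift
        parts[-1].append(point)
    return parts
```

Note that the large gaps themselves raise the average, so a series with many pen lifts may need a smaller multiple.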

3.1.1 Stroke Segmentation Points

At this stage, we attempt to find points with interesting spatial or temporal features, which may later help with segmenting a word part into its constituent letters. The following is a discussion of these points.

Pen Speed Minima and Curvature Maxima

Most of the preprocessing techniques described in the extant literature discard pen speeds. Biadsy et al. [4] normalize all speed data before carrying out feature extraction. Alsallakh et al. [2] argue that most online recognition systems find speed information irrelevant for recognition. However, we argue that the pen speed segmentation technique described by Stahovich [8] can be useful for segmenting handwriting as well.

Pen speeds are first calculated at each point according to the formula [8]:

$$ s_i = \frac{d_{i+1} - d_{i-1}}{t_{i+1} - t_{i-1}} \tag{1} $$

where $d_i$ is the arc length at point $i$ and $t_i$ is the time stamp at point $i$.
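Equation (1) is a central difference of cumulative arc length over time. A short Python sketch, assuming (x, y, t) tuples; the one-sided difference at the two endpoints is an assumption of this sketch, since the paper does not say how the ends are handled.

```python
import math

def pen_speeds(points):
    """Speed at each point: (d[i+1] - d[i-1]) / (t[i+1] - t[i-1])."""
    # Cumulative arc length d_i along the stroke.
    d = [0.0]
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        d.append(d[-1] + math.hypot(x1 - x0, y1 - y0))
    t = [p[2] for p in points]
    n = len(points)
    speeds = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)  # one-sided at the ends
        dt = t[hi] - t[lo]
        speeds.append((d[hi] - d[lo]) / dt if dt > 0 else 0.0)
    return speeds
```

The Code Parameters table also mentions smoothing the speeds over a five-point window, which is omitted here.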


No distinction is made between vertical and horizontal pen speeds. Teulings et al. [9] contend that pen speed in the vertical direction bears more information than that in the horizontal direction, and thus speed calculations should be weighted accordingly. However, we were not sure that this is a universal observation that also applies to Arabic handwriting. Changes in direction are more easily captured by the other types of stroke segmentation points described herein.

Curvature is calculated at each point as the derivative of the tangent angle, $\theta$, with respect to arc length:

$$ C = \frac{\partial \theta}{\partial d} \tag{2} $$
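One discrete approximation of equation (2) is sketched below. It takes the tangent angle from the chords entering and leaving each point, which is an assumption of this sketch rather than the paper's method, and returns signed curvature in radians per unit length (the Code Parameters table states its threshold in degrees per pixel).

```python
import math

def curvature(points):
    """Signed curvature at each point of a polyline of (x, y) tuples."""
    n = len(points)
    curv = [0.0] * n  # endpoints have no two-sided neighborhood; left at 0
    for i in range(1, n - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        len_in = math.hypot(x1 - x0, y1 - y0)
        len_out = math.hypot(x2 - x1, y2 - y1)
        if len_in == 0 or len_out == 0:
            continue  # coincident neighbors: tangent undefined
        theta_in = math.atan2(y1 - y0, x1 - x0)
        theta_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the turn into [-pi, pi) so crossing the +/-pi cut is not a spike.
        d_theta = (theta_out - theta_in + math.pi) % (2 * math.pi) - math.pi
        curv[i] = d_theta / ((len_in + len_out) / 2)
    return curv
```

On points sampled from a circle of radius R this returns values close to 1/R, which is a quick sanity check for the sign and units.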

We then use the following criteria to choose segmentation points:

1. Speed minima are accepted if they are lower than a threshold percentage of the average speed, and

2. Local maxima of curvature are accepted if the speed at that point is below a threshold (even if not a minimum) and the curvature is above a threshold.
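The two criteria above can be sketched as a single pass over per-point speed and curvature lists. The default fractions (50% and 90% of average speed) and the curvature threshold of 0.75 follow the Code Parameters table; the function name, the use of curvature magnitude, and the treatment of ties as extrema are assumptions of this sketch.

```python
def candidate_points(speeds, curvatures, min_speed_frac=0.5,
                     curv_speed_frac=0.9, curv_threshold=0.75):
    """Indices accepted by the speed-minimum or curvature-maximum rule."""
    average = sum(speeds) / len(speeds)
    picks = set()
    for i in range(1, len(speeds) - 1):
        # Criterion 1: a local speed minimum well below the average speed.
        is_speed_min = speeds[i] <= speeds[i - 1] and speeds[i] <= speeds[i + 1]
        if is_speed_min and speeds[i] < min_speed_frac * average:
            picks.add(i)
        # Criterion 2: a local curvature maximum, sharp enough, drawn slowly.
        c = abs(curvatures[i])
        is_curv_max = c >= abs(curvatures[i - 1]) and c >= abs(curvatures[i + 1])
        if is_curv_max and c > curv_threshold and speeds[i] < curv_speed_frac * average:
            picks.add(i)
    return sorted(picks)
```

The curvature values must be in the same units as the threshold, so a curvature computed in radians per unit length would need converting before applying 0.75 degree / pixel.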

Direction Change Points

It is useful to determine the points at which the handwriting stroke path changes direction. The slopes of tangent lines are computed at each point by fitting a linear regression line to a window of surrounding points. We note the points at which the slope of the tangent changes sign and differs in magnitude from the slope of the previous tangent by more than a threshold amount.

The size of the linear regression window and the slope magnitude difference threshold are noted in the Code Parameters table.
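A sketch of the windowed regression and the sign-change test, assuming (x, y) tuples. A half window of 5 gives the 11-point window listed in the Code Parameters table; `min_diff` is a hypothetical placeholder for the slope difference threshold, and exactly zero slopes are skipped so that a flat sample does not hide a sign change.

```python
import math

def window_slopes(points, half_window=5):
    """Least-squares slope of y on x in a window centered at each point."""
    slopes = []
    for i in range(len(points)):
        win = points[max(0, i - half_window): i + half_window + 1]
        mean_x = sum(x for x, _ in win) / len(win)
        mean_y = sum(y for _, y in win) / len(win)
        sxx = sum((x - mean_x) ** 2 for x, _ in win)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in win)
        # A vertical window has no finite slope; report +infinity.
        slopes.append(sxy / sxx if sxx > 1e-12 else math.inf)
    return slopes

def direction_changes(slopes, min_diff):
    """Indices where the slope changes sign by more than min_diff."""
    changes, previous = [], None
    for i, slope in enumerate(slopes):
        if slope == 0:
            continue
        if previous is not None and (slope > 0) != (previous > 0) \
                and abs(slope - previous) > min_diff:
            changes.append(i)
        previous = slope
    return changes
```

For a V-shaped path such as `[(x, abs(x - 9.5)) for x in range(20)]`, the only reported index is the first sample past the apex.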

Derivative Extrema

Derivative extrema signal a sudden shift in the amount of change in a given direction. These segmentation points are important because they can help in identifying ascenders and descenders in Arabic handwriting, such as the medial alif and laam (ـا, ـلـ) and the final raa (ـر).

Intersection Points

Points at which the handwriting stroke intersects itself are noted. The intersection-finding algorithm is executed after finding and merging the previous types of stroke segmentation points. Section 3.3 discusses the importance of these intersection points.

[Figure 2 panels: derivative extrema stroke segmentation points; speed minima and curvature maxima stroke segmentation points; direction change stroke segmentation points; stroke intersection segmentation points.]

Figure 2: The four types of stroke segmentation points

3.1.2 Merging Stroke Segmentation Points

After identifying the four sets of segmentation points, nearly coincident segmentation points are identified and merged, with the highest weight given to intersection points.

The algorithm calculates distances between segmentation points in successively wider windows by comparing the standard deviation of their indices in the stroke to a threshold. If the distances between the points are less than or equal to the threshold, then they are deemed nearly coincident and are replaced with the point with the highest curvature.

We empirically found that the point with the highest curvature is a better replacement for a set of nearly coincident segmentation points than the point with minimum speed. The speed minimum usually falls a couple of points before the corner of maximum curvature.
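The merging rule can be illustrated with a simplified greedy grouping. This sketch grows a group of candidate indices while the standard deviation of the indices stays within a threshold, then keeps the member with the highest curvature. It does not reproduce the paper's successively wider windows or the extra weight given to intersection points, and `max_std` is a hypothetical parameter.

```python
from statistics import pstdev

def merge_candidates(indices, curvatures, max_std=2.0):
    """Replace each cluster of nearby stroke indices with the index of
    highest curvature magnitude in that cluster."""
    def best(group):
        return max(group, key=lambda i: abs(curvatures[i]))

    merged, group = [], []
    for idx in sorted(set(indices)):
        # Close the current group once adding idx would spread it too far.
        if group and pstdev(group + [idx]) > max_std:
            merged.append(best(group))
            group = []
        group.append(idx)
    if group:
        merged.append(best(group))
    return merged
```

With candidates at indices 10, 11 and 13 and the sharpest corner at 11, the three collapse to 11 alone.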


Figure 3: Stroke segmentation points after merging nearly coincident points.

3.1.3 Fitting Segments

After finding segmentation points, segments are classified as lines or arcs by fitting both linear regression lines and circles and comparing the mean square errors. The technique is similar to that used by Stahovich [8].

Residuals of the circle fit are calculated at each point as the distance between that point and the perimeter of the fitted circle. The angle subtended by the arc of the circle is calculated by computing the inverse cosine of the normalized dot product of the vectors connecting the two segmentation points to the center of the fitted circle. If the length of the arc is determined to be larger than half the perimeter of the circle, then the angle is recalculated as 360° minus the angle first computed.

Since it is common for nearly straight lines to be accurately fit with an arc with a large radius and a small angle, Stahovich classifies segments as arcs only if they represent at least one tenth of a circle. We replaced that constraint by requiring the length of the arc to be larger than the radius of the fitted circle. We found that this constraint reduced the number of serifs incorrectly identified as arcs.
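The subtended-angle calculation and the arc-length test can be sketched as below, taking the fitted circle's center as given (the circle fit itself is not shown). Function names are hypothetical. Because arc length equals radius times angle, requiring the arc to be longer than the radius is the same as requiring a subtended angle above one radian (about 57.3°), which is stricter than one tenth of a circle (36°).

```python
import math

def subtended_angle(p_start, p_end, center, arc_length):
    """Angle in degrees subtended at `center` by the arc from p_start to
    p_end. Assumes neither endpoint coincides with the center."""
    v1 = (p_start[0] - center[0], p_start[1] - center[1])
    v2 = (p_end[0] - center[0], p_end[1] - center[1])
    r1, r2 = math.hypot(*v1), math.hypot(*v2)
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (r1 * r2)
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding past +/-1
    angle = math.degrees(math.acos(cos_a))
    radius = (r1 + r2) / 2
    # The inverse cosine only yields 0..180; longer arcs take the reflex angle.
    if arc_length > math.pi * radius:
        angle = 360.0 - angle
    return angle, radius

def passes_arc_test(arc_length, radius):
    """The paper's replacement constraint: arc longer than the radius."""
    return arc_length > radius
```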


Figure 4: Fitting a segmented word part with lines and arcs. Note the loop.

3.1.4 Combining Segments

After the initial segments have been computed, the program tries to increase the accuracy of the fitted segments by merging and deleting segments as necessary.

If a very short segment is adjacent to a much longer one, then the short segment is merged into its longer neighbor.

If two segments of the same type are adjacent, the program tries to merge them. If two line segments are adjacent, the angle between them is calculated. If the angle is within a threshold value of 180°, then they are combined.
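A sketch of the line-merging test, measuring the angle at the point shared by the two segments. The 165° default is the minimum angle for combining line segments in the Code Parameters table; the function names and the choice to represent each segment by its endpoints are assumptions.

```python
import math

def interior_angle(a, b, c):
    """Angle in degrees at b between segments a-b and b-c (180 = collinear)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate segment")
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def can_merge_lines(a, b, c, min_angle=165.0):
    """True when the two segments are close enough to a straight line."""
    return interior_angle(a, b, c) >= min_angle
```

For example, segments (0,0)-(10,0) and (10,0)-(20,1) meet at roughly 174° and would be combined, while a 135° corner would not.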

If two arcs are adjacent, then a new circle fit is calculated for the combined segment. If the arcs are in the same direction, the error of the new fit is less than the errors of the two segments, and the angle subtended by the new arc is larger than each of the two angles, then the segments are replaced by the new fit.

[Figure 5 panels: before combining similar segments; after combining similar segments.]

Figure 5: Combining similar segments.

3.2 Baseline Estimation

Arabic is written cursively in a horizontal orientation. Since letters are connected by horizontal segments, it is important to estimate the imaginary line upon which the words lie. This is the line on which one would write on ruled paper, and it is termed the baseline.

The extant literature on segmentation of Arabic script, printed and handwritten, online or offline, has featured methods for baseline estimation. Those methods range in their complexity and assumptions. Burrow [5] has an excellent overview of those methods for offline Arabic handwriting segmentation in his master’s thesis on Arabic handwriting recognition.

For this work, the projection histogram method of baseline estimation was tried extensively. The idea was that since letters in Arabic script are connected by horizontal lines, called takhshida, the position of that line will be indicated by a peak in the point distribution histogram along the y axis. We can then use the method described by Lorigo et al. [6] to segment ascending and descending letters.

Pechwitz et al. [7] argue that the projection histogram method is ill suited for Arabic handwriting since handwriting is usually slanted. Moreover, projection histograms would give erroneous results if estimated for one word. For the purposes of this paper, the input text is assumed not to be slanted or rotated, so that the baseline can be easily estimated. More robust preprocessing and segmentation techniques would have a slant removal and baseline estimation step.

In the end, we decided to forgo baseline calculations and instead use features that are more specific to online handwriting. We looked for almost horizontal line segments that appear between more complicated segments.


Figure 6: Stroke segmentation points after merging nearly coincident points.

3.3 Closed Loops

Closed loops are present in many Arabic letters, which makes them important morphological features in segmenting and recognizing Arabic letters.

After finding the initial segmentation points based on the previous techniques, intersection points are calculated between the segments spanning those points. This is less computationally intensive than exhaustively searching for intersection points around each point in the word part.

If two intersection points are found to be close in Euclidean distance but far apart in their stroke indices, then they are flagged as possible start and end points of a loop. The sums of the x and y derivatives are calculated along the possible loop path. If the sums are almost zero, then the path is confirmed to be a loop.
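The confirmation step can be sketched as follows, assuming (x, y) tuples and taking the derivative as the difference between consecutive points. The default bound of 15 is the maximum derivative sum listed in the Code Parameters table; the function name is hypothetical.

```python
def is_closed_loop(points, start, end, max_sum=15.0):
    """Confirm a loop between stroke indices start and end (inclusive)
    by checking that the x and y differences along the path nearly cancel."""
    path = points[start:end + 1]
    sum_dx = sum(x1 - x0 for (x0, _), (x1, _) in zip(path, path[1:]))
    sum_dy = sum(y1 - y0 for (_, y0), (_, y1) in zip(path, path[1:]))
    return abs(sum_dx) <= max_sum and abs(sum_dy) <= max_sum
```

With plain differences the sums telescope to the displacement between the two endpoints, so the test passes exactly when the path returns close to where it started.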


However, loops do not look the same in all Arabic letters. The difference between a medial ghayn (ـغـ) and a medial faa (ـفـ) is the acute angle on the left side of the ghayn loop. The feature vectors generated will indicate a loop, but will also retain the features of the geometric primitives constituting the loop.

4 Segmentation of Arabic Words

After fitting geometric primitives to a word part, we proceed by trying to segment each word part into its constituent letters. We use three collaborating techniques:

1. Find almost horizontal fitted lines, since letters are connected horizontally.

2. Find loops. A loop is the central structure of many letters.

3. Find dots. A letter is usually above or below dots.

We favor under-segmentation over over-segmentation, since there are recognition frameworks that utilize dictionaries of Arabic word parts [4] rather than separate Arabic letters.

4.1 Delayed Strokes

Biadsy et al. [4] described a method by which delayed strokes are detected based on their size and location, in addition to the time order. They also detect dots based on the size and shape of their bounding box.

This paper introduces a novel technique for detecting dot strokes based on writing direction. Other delayed strokes are identified using the method described by Biadsy et al.

4.1.1 Dot Detection

After observing a number of native Arabic writers, we noticed that after writing a word part, they all added the dots from left to right. We postulate two reasons for that. One might be that since the hand is already at the end (left) of the word part, it is faster to then go back and add the dots left to right. The other stems from the Arabic calligraphy measuring system that uses the rhomboid-shaped dot created by the calligrapher’s pen as a unit of measure. The easiest way to draw two horizontally adjacent rhomboid dots is left to right, so that the edge of the first dot can be used as a starting point for the second dot.

Figure 7: The Arabic calligraphy measuring system devised by Ibn Muqlah, an Abbasid calligrapher who invented the Naskh script. He died in Baghdad in 940 A.D.

The writing direction is determined by calculating the sum of the x derivative of the stroke. A stroke that was written left to right will have a negative x derivative sum.
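A sketch of the direction test, assuming (x, y) tuples. The sign of the sum depends on the coordinate convention and on the order of subtraction, neither of which the paper states, so the convention is exposed as a hypothetical flag: with x growing to the right and differences taken as next minus current, a left-to-right stroke sums positive, while the negative sum reported in the text corresponds to `x_grows_rightward=False`.

```python
def written_left_to_right(stroke, x_grows_rightward=True):
    """Classify writing direction from the sum of x differences."""
    sum_dx = sum(x1 - x0 for (x0, _), (x1, _) in zip(stroke, stroke[1:]))
    # Under the opposite axis convention the expected sign flips.
    return sum_dx > 0 if x_grows_rightward else sum_dx < 0
```

A stroke with no net horizontal movement, such as a single tap, is classified as not left to right under either convention.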

5 Feature Vectors

The output of the segmentation step is a feature vector for every word part. That vector contains the following data:

1. A list of the stroke segmentation points in the word part.

2. A list of the types of fitted primitives between segmentation points.

For every line segment, we include a data structure that has the following data:

(1) the slope of the line

(2) its delineating points

(3) the mean square error of the fit

For every arc segment, we include a data structure that has the following data:

(1) the angle subtended by the arc

(2) the coordinates of the center of the fitted circle

(3) the length of the arc

(4) the mean square error of the fit

3. A list of intersection points. This is a subset of the segmentation points.

4. A list of segmentation points that delineate letters. This is also a subset of the segmentation points.
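One way to lay out this feature vector is sketched below with Python dataclasses. All class and field names are hypothetical, and storing segmentation points as stroke indices is an assumption; the paper's own MATLAB structures are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, float]

@dataclass
class LineSegment:
    slope: float
    endpoints: Tuple[Point, Point]  # the delineating points
    mse: float                      # mean square error of the line fit

@dataclass
class ArcSegment:
    angle_deg: float                # angle subtended by the arc
    center: Point                   # center of the fitted circle
    arc_length: float
    mse: float                      # mean square error of the circle fit

@dataclass
class WordPartFeatures:
    segmentation_points: List[int]
    primitives: List[Union[LineSegment, ArcSegment]]
    intersection_points: List[int] = field(default_factory=list)
    letter_boundaries: List[int] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Items 3 and 4 must be subsets of the segmentation points."""
        pts = set(self.segmentation_points)
        return (set(self.intersection_points) <= pts
                and set(self.letter_boundaries) <= pts)
```

The `is_consistent` check encodes the two subset statements in the list above and is a convenience added for this sketch.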

6 Conclusions

We do not have enough data to determine the general performance of our technique precisely. Our segmentation technique always correctly identified word parts, but the rate of correctly segmenting word parts into letters was around 60%. We suggest that the general principles behind our three collaborating segmentation techniques are sound. Our current implementation of them will require tweaking and refactoring in order to become more robust.

This course gave me my first encounter with processing pen data. I have learned many things about processing handwriting strokes, the most important of which is that it is a hard task. I spent countless hours trying to collect clean handwriting data, and then trying different techniques for finding points of interest. For a while, I had no idea what online features to look for to segment Arabic word parts into letters. I felt like the problem was way above my head, but I tried very hard nonetheless.

There are many things that I wish to improve on. I have developed an interest in this problem that goes beyond this course. I plan to take this effort further and work on developing a recognizer to complement this segmentation technique.


7 Sample Output

Figure 8: Sample output. This is the first verse of a famous poem written by Ahmad Shawqi, an Egyptian poet (1868-1932).


Figure 9: Sample output with errors highlighted

Figure 10: City names in Arabic

Figure 11: This reads: Massachusetts Institute of Technology


Figure 12: Another opening verse by Shawqi

Code Parameters

Parameter                                                      Value
Multiple of average distance to delineate word part            5
Size of window for smoothing pen speed                         2 points on each side (5 total)
Size of window for computing tangent slope                     11
Speed threshold for speed minima                               50% of average
Curvature threshold                                            0.75 degree / pixel
Speed threshold for curvature maxima                           90% of average
Minimum allowed distance between two segmentation points       4
Minimum distance between a segmentation point and stroke end   5
Minimum angle between two arcs to combine                      135°
Minimum angle between two line segments to combine             165°
Maximum derivative sum to consider path a loop                 15
Maximum line slope to consider line segment horizontal         0.11


References

[1] Jason Fennell Aaron Wolin, Devin Smith and Max Pfluger. Sketchers 2006, 2007. [Online; accessed 1-May-2007].

[2] Bilal Alsallakh and Hani Safadi. Arapen: An Arabic online handwriting recognition system. Information and Communication Technologies, 2006. ICTTA ’06. 2nd, pages 1844–1849, 2006.

[3] Mahmoud Arram. Pen lab: A web-based application for capturing online pen strokes, 2007. [Online; accessed 13-May-2007].

[4] Fadi Biadsy, Jihad El-Sana, and Nizar Habash. Online Arabic handwriting recognition using Hidden Markov Models. In Proceedings of the 10th International Workshop on Frontiers of Handwriting and Recognition, 2006.

[5] Peter Burrow. Arabic handwriting recognition. Master’s thesis, School of Informatics, University of Edinburgh, 2004.

[6] Liana Lorigo and Venu Govindaraju. Segmentation and pre-recognition of Arabic handwriting. In ICDAR ’05: Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 605–609, Washington, DC, USA, 2005. IEEE Computer Society.

[7] M. Pechwitz and V. Margner. Baseline estimation for Arabic handwritten words. pages 479–484, 2002.

[8] Thomas F. Stahovich. Segmentation of pen strokes using pen speed. AAAI Fall Symposium Series 2004: Making Pen-Based Interaction Intelligent and Natural, 2004.

[9] H.L. Teulings, L.B. Schomaker, P. Morasso, and A. Thomassen. Handwriting analysis system. Proceedings of the Third International Symposium on Handwriting and Computer Applications, pages 181–183, 1987.

[10] Wikipedia. Arabic alphabet - Wikipedia, the free encyclopedia, 2007. [Online; accessed 13-May-2007].
