The state of the art in handwriting synthesis Randa I. Elanwar Researcher Computers and Systems Dept., ERI

Jul 21, 2015

Page 1: The state of the art in handwriting synthesis

The state of the art in handwriting synthesis

Randa I. Elanwar, Researcher

Computers and Systems Dept., ERI

Page 2: The state of the art in handwriting synthesis

Personal handwriting style aspects

1. Glyph and the size of characters

2. Pressure distribution and the slant of handwriting

3. Relative sizes of the middle, the upper, and the lower zones of letters

4. Existence and the shape of lead-in, connecting, and ending parts

5. Letter, the word, and the line spacings

6. Embellishment

7. Simplified or neglected strokes

PEIT 2013 2

Page 3: The state of the art in handwriting synthesis

What is handwriting synthesis?

Converting ASCII text into the user’s personal handwriting.

Page 4: The state of the art in handwriting synthesis

What is handwriting synthesis needed for?

Forensic examiners

The disabled

Handwriting recognition systems

Biometric security systems

Information retrieval (web search)

Natural language understanding

Personalization of pen-computing devices

Page 5: The state of the art in handwriting synthesis

Handwriting synthesis strategies

Duplicated Samples

Combination of different real samples

Synthetic-individuals

Page 6: The state of the art in handwriting synthesis

Handwriting synthesis strategies

Duplicated Samples

◦ Starts from real samples of a given person

◦ Produces duplicated samples corresponding to the same person

◦ Increases the amount of already acquired handwriting data

◦ Does not generate completely new datasets

The great majority of existing approaches for synthetic signature generation are based on this type of strategy.

Page 7: The state of the art in handwriting synthesis

Handwriting synthesis strategies

Combination of different real samples

◦ Starts from a pool of real n-grams and joins them using some type of concatenation

◦ needs real samples to generate the synthetic handwriting trait

◦ utility for performance evaluation and vulnerability assessment is very limited

Useful to produce multiple handwriting samples of a given real user, but not to generate synthetic individuals.

Page 8: The state of the art in handwriting synthesis

Handwriting synthesis strategies

Synthetic-individuals

◦ A priori knowledge about the handwriting trait is used to create a model that characterizes that trait for a population.

◦ New synthetic individuals can be generated by sampling the constructed model.

No real handwriting samples are needed to generate completely synthetic databases, which overcomes the usual shortage of collected handwriting data.
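
As a rough illustration of sampling synthetic individuals from a population model, the sketch below assumes the model is simply an independent Gaussian per style parameter. The parameter names and the numeric values in POPULATION_MODEL are hypothetical and are not taken from any published system.

```python
import random

# Hypothetical population model: (mean, standard deviation) per style parameter.
# The names and numbers are illustrative assumptions only.
POPULATION_MODEL = {
    "slant_deg": (10.0, 8.0),
    "x_height": (1.0, 0.15),
    "letter_spacing": (0.3, 0.08),
}


def sample_writer(model, rng):
    """Draw one synthetic individual by sampling each style parameter."""
    return {name: rng.gauss(mean, std) for name, (mean, std) in model.items()}


def sample_population(model, n, seed=0):
    """Draw n synthetic individuals; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_writer(model, rng) for _ in range(n)]
```

Each returned dictionary stands for one synthetic writer. A realistic model would also need to capture correlations between parameters, which this sketch ignores.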

Page 9: The state of the art in handwriting synthesis

Methods for handwriting synthesis

Movement simulation techniques

Shape-simulation techniques

Page 10: The state of the art in handwriting synthesis

Methods for handwriting synthesis

Movement simulation techniques

◦ Studies since the late 1960s have analyzed human handwriting movements and the human "motor code" for producing cursive handwriting

Page 11: The state of the art in handwriting synthesis

Methods for handwriting synthesis

Movement simulation techniques

◦ The handwriting trajectory is analyzed and modeled by velocity or force functions

◦ These techniques focus on the representation and analysis of real handwriting signals rather than on handwriting synthesis

◦ May not be convenient for synthesizing non-cursive handwriting
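
To make the idea of a velocity function concrete, here is a minimal sketch of a single stroke driven by a lognormal speed profile. The slides do not name a specific function; the lognormal form is one choice used in the movement-modeling literature and is an assumption here, as are the parameter names D (stroke length), t0 (onset time), mu and sigma.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed at time t for one stroke of total length D starting at t0."""
    if t <= t0:
        return 0.0
    z = (math.log(t - t0) - mu) / sigma
    return D / (sigma * math.sqrt(2.0 * math.pi) * (t - t0)) * math.exp(-0.5 * z * z)


def stroke_positions(D, t0, mu, sigma, duration, dt=0.001):
    """Integrate the speed with the trapezoid rule to get distance travelled."""
    steps = int(round(duration / dt))
    positions = [0.0]
    previous = lognormal_speed(0.0, D, t0, mu, sigma)
    for i in range(1, steps + 1):
        current = lognormal_speed(i * dt, D, t0, mu, sigma)
        positions.append(positions[-1] + 0.5 * (previous + current) * dt)
        previous = current
    return positions
```

The distance travelled approaches D once the stroke has finished. A full trajectory model would add a direction to each stroke and superimpose several of them, which is beyond this sketch.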

Page 12: The state of the art in handwriting synthesis

Methods for handwriting synthesis

Shape-simulation techniques

◦ The straightforward approach to synthesize handwriting from collected handwritten glyphs.

◦ Each glyph is a handwriting sample image of one, two or three letters.

◦ When synthesizing a word, glyphs are simply juxtaposed in sequence and connected using simple curves, which causes an unnatural look (discontinuities)
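
The juxtapose-and-connect step can be sketched as follows. This assumes each glyph is stored as a single polyline of (x, y) points starting near x = 0, and it uses a quadratic Bezier curve as the "simple curve". The glyph format, the gap value and the dip of the control point are all illustrative assumptions.

```python
def shift(stroke, dx):
    """Translate a glyph polyline horizontally by dx."""
    return [(x + dx, y) for x, y in stroke]


def connector(p0, p1, steps=4, dip=0.1):
    """Interior points of a quadratic Bezier curve from p0 to p1.

    The control point sits midway in x and slightly below both end points.
    """
    cx = (p0[0] + p1[0]) / 2.0
    cy = min(p0[1], p1[1]) - dip
    points = []
    for i in range(1, steps):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * cx + t ** 2 * p1[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * cy + t ** 2 * p1[1]
        points.append((x, y))
    return points


def synthesize(word, glyphs, gap=0.2):
    """Place glyphs left to right and join neighbours with a connector curve."""
    trajectory = []
    cursor = 0.0
    for ch in word:
        stroke = shift(glyphs[ch], cursor)
        if trajectory:
            trajectory += connector(trajectory[-1], stroke[0])
        trajectory += stroke
        cursor = max(x for x, _ in stroke) + gap
    return trajectory
```

Because the connector ignores the direction in which each glyph ends and begins, the joins show exactly the kind of discontinuity the slide warns about.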

Page 13: The state of the art in handwriting synthesis

Methods for handwriting synthesis

Shape-simulation techniques

◦ Also called glyph-based methods

◦ Usually record the glyphs directly and reuse or sample them during synthesis

◦ Require intensive user involvement in the sample collection process and cannot produce various handwriting styles in a natural way

◦ Learning a separate model for the connection of each pair of letters is impractical due to the limited training set

Page 14: The state of the art in handwriting synthesis

Challenges of handwriting synthesis

The seed data samples from which character models are designed (variability vs. generalization)

The model design and the selection of style-dependent parameters

How to generate (deform) a synthesized sample

Finding the best synthesized samples to compose a word

How to concatenate the glyphs without introducing discontinuities

How to make the handwriting look natural

How to evaluate the quality of the synthesized word

Page 15: The state of the art in handwriting synthesis

Challenges of handwriting synthesis

For the glyph-based approaches, the main challenge is to collect a seed dataset of natural human handwriting that is large enough to cover a wide range of writer variability.

This, however, defeats the main goal of synthesis, which is to generate large amounts of data independently of natural sources.

Page 16: The state of the art in handwriting synthesis

Challenges of handwriting synthesis

Model Selection: it is difficult to find a reliable description of a word able to represent all the admitted occurrences of the input shape.

Deformation: find an optimal distortion parameter range and control the variability of data to ensure that the synthetic handwriting is natural enough.

Page 17: The state of the art in handwriting synthesis

Challenges of handwriting synthesis

Natural look of the synthesized data (cursiveness): the system has to determine which pairs of adjacent letters in a word are connected.

Ideally, the connection probability would be computed from handwritten samples. However, this is impractical because it requires a large number of natural handwriting samples.
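
For illustration, estimating connection probabilities from annotated samples amounts to counting. The sketch assumes each sample is a word plus one boolean per adjacent letter pair saying whether the writer joined them; this annotation format is hypothetical.

```python
from collections import defaultdict


def connection_probabilities(samples):
    """Estimate P(connected) for every adjacent letter pair seen in samples.

    samples: iterable of (word, links), where links[i] is True when
    word[i] and word[i + 1] are joined in the handwritten sample.
    """
    connected = defaultdict(int)
    total = defaultdict(int)
    for word, links in samples:
        if len(links) != len(word) - 1:
            raise ValueError("need exactly one link flag per adjacent pair")
        for pair, linked in zip(zip(word, word[1:]), links):
            total[pair] += 1
            connected[pair] += bool(linked)
    return {pair: connected[pair] / total[pair] for pair in total}
```

Pairs that never occur in the samples get no estimate at all, which is the data shortage the slide points out: covering every letter pair reliably needs many annotated words.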

Page 18: The state of the art in handwriting synthesis

Challenges of handwriting synthesis

Objective evaluation:

◦ Most researchers do not evaluate the quality of the synthesis process

◦ Others rely on human expertise

◦ Others use HMM recognizers and justify the synthesis by the improvement in recognition accuracy gained from adding synthetic handwriting to the training set

Page 19: The state of the art in handwriting synthesis

Synthesis in biometric security

Synthetically generated biometric databases:

i) facilitate the performance evaluation of recognition systems instead of the costly and time-consuming real biometric databases

ii) provide a tool with which to evaluate the vulnerability of biometric systems to attacks carried out with synthetically generated traits

Page 20: The state of the art in handwriting synthesis

Synthesis in biometric security

A system example is given to generate synthetic online signatures. A Discrete Fourier Transform (DFT) of the trajectory x-y signals is generated.

Deformations: Horizontal and vertical affine scaling, Duration expansion or contraction, and Noise addition are applied.

Finally, signals are refined and processed in the time domain to give more realistic appearance (smoothing, translation, rotation and scaling transformations are applied).
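
The deformations named above (affine scaling, duration expansion or contraction, noise addition) can be sketched on a sampled x-y trajectory. For simplicity this sketch works directly on time-domain samples rather than on DFT coefficients, so it is not the cited system; the function names and the default distortion ranges are assumptions.

```python
import random


def scale(points, sx, sy):
    """Horizontal and vertical affine scaling."""
    return [(x * sx, y * sy) for x, y in points]


def resample(points, n):
    """Duration change: linearly resample the trajectory to n samples."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input and two output samples")
    last = len(points) - 1
    out = []
    for i in range(n):
        t = i * last / (n - 1)
        k = min(int(t), last - 1)
        frac = t - k
        (x0, y0), (x1, y1) = points[k], points[k + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out


def add_noise(points, sigma, rng):
    """Add independent Gaussian noise to every coordinate."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in points]


def duplicate(points, rng, max_scale=0.1, max_stretch=0.2, sigma=0.01):
    """Produce one deformed copy of a trajectory (ranges are illustrative)."""
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)
    n = max(2, round(len(points) * (1.0 + rng.uniform(-max_stretch, max_stretch))))
    return add_noise(resample(scale(points, sx, sy), n), sigma, rng)
```

Calling duplicate repeatedly with the same random.Random instance yields a family of variants of one signature. Choosing the distortion ranges so the variants still look natural is the open problem listed under deformation.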

Page 21: The state of the art in handwriting synthesis

Synthesis in Information retrieval

Handwriting synthesis makes it possible to perform text searches on handwritten word image databases when no ground-truth data is available.

The handwritten string is treated as a pictographic pattern without an attempt to understand it.

In this approach the query string is compared to database strings using an appropriate distance function, which lets users extend their search to include non-ASCII symbols as well.
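
A minimal sketch of distance-based retrieval, assuming every word image (query and database entries alike) has already been reduced to a fixed-length feature vector. Euclidean distance stands in for the "appropriate distance function"; both the feature representation and the choice of distance are assumptions.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search(query, database, top_k=3):
    """Return the ids of the top_k database entries closest to the query.

    database: dict mapping an entry id to its feature vector.
    """
    scored = sorted((euclidean(query, features), entry_id)
                    for entry_id, features in database.items())
    return [entry_id for _, entry_id in scored[:top_k]]
```

Nothing in the ranking depends on the query being text, which is why a pictographic treatment can also handle symbols outside ASCII.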

Page 22: The state of the art in handwriting synthesis

Synthesis in Information retrieval

A system example is given to perform text searches on handwritten word image databases when no ground-truth data is available.

The approach proceeds by synthesizing multiple images of the query string using different computer fonts. Normalization and feature extraction operations are applied.

The model is trained using synthetic images, and produces samples according to the distribution of handwritten features. Finally, an unsupervised font selection method leverages the font contributions to best represent handwritten data, and yields significant improvements in accuracy.

Page 23: The state of the art in handwriting synthesis

Synthesis in handwriting recognition

The problem with general unconstrained word recognition is that recognition rates are low because large amounts of good-quality training data are lacking.

Expanding the training set by collecting additional natural human-written texts is an error-prone, expensive, labor-intensive, and time-consuming process. Alternatively, the training set can be expanded with synthetic texts.

Page 24: The state of the art in handwriting synthesis

Synthesis in handwriting recognition

Several methods have been overviewed. Most of them generate the synthetic texts from existing natural ones.

Some approaches use human-written character tuples to build up synthetic texts; others decompose the glyph into a set of control points and in-between curves.

Synthetic texts are also generated by applying random perturbations to human-written characters.

Some researchers use horizontal connection lines or simple polynomials as within-word connections.

Page 25: The state of the art in handwriting synthesis

Synthesis in pen based computing

The recent emergence of pen computers with high resolution tablets has made available dynamic (temporal) information as well as created the need for robust online handwriting recognition algorithms.

Handwriting is preferable to typed text in some cases because it adds a personal touch.

When writing a note on a Tablet PC, if the computer can automatically correct some written errors and generate some predefined handwriting strokes, this would be more effective and intelligent.

Researchers working on the online problem usually use the same techniques used to solve the offline problem.

Page 26: The state of the art in handwriting synthesis

Conclusions and opinions

Researchers tend either to use local datasets of their own or to use a very limited part of a public dataset.

This may be attributed to using human expertise for evaluation (laborious with large datasets).

Finding a reliable automatic evaluation scheme would accelerate the process and allow much bigger datasets to be used.

Page 27: The state of the art in handwriting synthesis

Conclusions and opinions

Researchers tend to collect isolated glyph samples, which leads to an unnatural look when synthesizing words.

This might be solved if they use cursive words for training.

Automation will be needed for preprocessing the training dataset (especially segmentation) to help enlarge the dataset used.

Page 28: The state of the art in handwriting synthesis

Conclusions and opinions

In the generation process, the glyph model produces samples according to the feature distribution extracted from the whole training dataset.

With large variances, such a method may deform the model and lead to odd-looking generated samples.

Writer style clustering and generating different models for the same character per cluster will help restrict the distortion range.
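
To illustrate how clustering narrows the distortion range, the sketch below runs a small one-dimensional k-means on a single hypothetical style feature (slant in degrees). Real style clustering would use many features; the data, the initial centers and the use of k-means are all assumptions.

```python
from statistics import mean, pstdev


def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on scalars; returns (final centers, clusters)."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


# Hypothetical slant measurements from left-leaning and right-leaning writers.
slants = [-12, -10, -8, 18, 20, 22]
centers, clusters = kmeans_1d(slants, [-20, 30])
spread_all = pstdev(slants)
spread_per_cluster = [pstdev(c) for c in clusters]
```

Here the spread within each cluster is far smaller than the spread of the pooled data, so a model fitted per cluster samples from a much tighter range than one fitted to all writers at once.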

Page 29: The state of the art in handwriting synthesis

Conclusions and opinions

Researchers tend to use almost the same features or a subset of them (geometric, polynomials, filter control points, etc.), which may well explain the limited performance.

If no other novel descriptive features are found, feature fusion techniques might help change the distribution in feature space and consequently improve the modeling.

Page 30: The state of the art in handwriting synthesis

Conclusions and opinions

Researchers tend to evaluate their synthesis process using recognizers (especially HMM): they add the synthetic datasets to the training data and test how well the system recognizes natural handwritten data.

Recognizers can be used to enhance synthesis rather than to evaluate it. The training dataset should be natural handwritten data, and the test dataset should be synthetic.
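
The protocol of training on natural data and testing on synthetic data can be sketched with any recognizer. A nearest-centroid classifier is used here only because it fits in a few lines; it is a stand-in for the HMM recognizers mentioned on the slides, and the feature vectors are hypothetical.

```python
def train_centroids(samples):
    """Fit one mean feature vector per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=sq_dist)


def recognition_rate(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(classify(centroids, f) == label for f, label in samples)
    return hits / len(samples)


# Train on natural samples, then score the synthetic ones.
natural = [((0, 0), "a"), ((0, 2), "a"), ((5, 5), "b"), ((7, 5), "b")]
synthetic = [((0.5, 1), "a"), ((6, 4), "b"), ((5, 5), "a")]
rate = recognition_rate(train_centroids(natural), synthetic)
```

A low rate flags synthetic samples that drift away from what the natural-data recognizer accepts, and that signal can be fed back to adjust the distortion parameters.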

Page 31: The state of the art in handwriting synthesis

Conclusions and opinions

Targeting better recognition performance will tune the distortion and modify the look of the synthesized glyph samples.

Reaching a proper recognition rate may then qualify the synthetic dataset to be added as training data for another recognizer type for evaluation.

Page 32: The state of the art in handwriting synthesis

Thank You
