Project report of OCR Recognition

1. INTRODUCTION

There is a growing demand for software systems that can recognize characters when
information is scanned from paper documents, since a large number of newspapers and books
on different subjects exist only in printed form. These days there is a huge demand for
“storing the information available in these paper documents on a computer storage disk and
then later reusing this information through searching”. One simple way to store the
information in these paper documents is to scan the documents and store them as IMAGES.
Reusing this information is difficult, however, because the contents of these images cannot
easily be read or searched line by line and word by word. The reason for this difficulty is
that the font characteristics of the characters in paper documents differ from the fonts of
the characters in the computer system. As a result, the computer is unable to recognize the
characters while reading them. This concept of storing the contents of paper documents in
computer storage and then reading and searching the content is called DOCUMENT PROCESSING.
Sometimes document processing must handle information in languages other than English.
Document processing requires a software system called a CHARACTER RECOGNITION SYSTEM. This
process is also called DOCUMENT IMAGE ANALYSIS (DIA).

Thus our aim is to develop a character recognition software system that performs Document
Image Analysis, transforming documents from paper format to electronic format. Various
techniques exist for this process. Among them, we have chosen Optical Character Recognition
as the fundamental technique for recognizing characters. The conversion of paper documents
into electronic format is an ongoing task in many organizations, particularly in Research
and Development (R&D), in large business enterprises, in government institutions, and so
on. From our problem statement we can also see the need for Optical Character Recognition
in mobile electronic devices such as cell phones and digital cameras, which acquire images
and recognize them as a part of face recognition and validation.

To use Optical Character Recognition effectively for character recognition in order to
perform Document Image Analysis (DIA), we use the information in grid format. The system is
thus effective and useful in the design and construction of a Virtual Digital Library.

1.1 PURPOSE

The main purpose of an Optical Character Recognition (OCR) system based on a grid
infrastructure is to perform Document Image Analysis, that is, the processing of electronic
documents converted from paper, more effectively and efficiently. This improves the
accuracy of character recognition during document processing compared with existing
character recognition methods. Here the OCR technique derives the meaning of the characters
and their font properties from their bit-mapped images.

The primary objective is to speed up character recognition in document processing. As a
result, the system can process a large number of documents in less time.


Since our character recognition is based on a grid infrastructure, it aims to recognize
heterogeneous characters that belong to different languages and have different font
properties and alignments.

1.2 PROJECT SCOPE

The scope of our product, Optical Character Recognition on a grid infrastructure, is to
provide an efficient and enhanced software tool with which users can perform Document Image
Analysis and document processing, by reading and recognizing characters, in research,
academic, governmental, and business organizations that hold large pools of scanned
document images. Irrespective of the size of the documents and the type of characters in
them, the product recognizes, searches, and processes them quickly according to the needs
of the environment.

1.3 EXISTING SYSTEM

There is a growing demand among users to convert printed documents into electronic
documents in order to keep their data secure. Hence the basic OCR system was invented to
convert the data available on paper into computer-processable documents, so that the
documents can be edited and reused. The existing system, that is, the predecessor of OCR on
a grid infrastructure, is simply OCR without grid functionality. In other words, the
existing system handles only homogeneous character recognition, or character recognition
for a single language.


1.4 DRAWBACK OF EXISTING SYSTEM

The drawback of the early OCR systems is that they can convert and recognize only documents
in English or in one specific language. That is, the older OCR system is uni-lingual.

1.5 PROPOSED SYSTEM

Our proposed system is OCR on a grid infrastructure, a character recognition system that
supports recognition of the characters of multiple languages. This feature is what we call
the grid infrastructure; it addresses the problem of heterogeneous character recognition
and supports multiple functionalities on the document. These functionalities include both
editing and searching, whereas the existing system supports only editing of the document.
In this context, grid infrastructure means an infrastructure that supports a specific set
of languages. Thus OCR on a grid infrastructure is multi-lingual.

1.6 BENEFIT OF PROPOSED SYSTEM

The benefit of the proposed system, which overcomes the drawback of the existing system, is
that it supports multiple functionalities such as editing and searching. It also provides
heterogeneous character recognition.


1.7 ARCHITECTURE OF THE PROPOSED SYSTEM

The architecture of the optical character recognition system on a grid infrastructure
consists of three main components. They are:-

Scanner

OCR Hardware or Software

Output Interface

Figure 1: OCR Architecture

[Figure labels: Document; Scanner (Illuminator, Detector); Document image; OCR Hardware or
Software (Document Analysis, Character Recognition, Contextual Processing); Recognition
Results; To application user]

1.8 INTENDED AUDIENCE AND READING SUGGESTIONS

In this section we identify the audience interested in the product and those involved in
its implementation, either directly or indirectly. Our research indicates that the OCR
system is mainly useful in R&D at various scientific organizations, in governmental
institutes, and in large business organizations. We therefore identify the following
audiences as interested in implementing the OCR system:-

Scientists, research scholars, and research fellows in telecommunication institutions are
interested in using the OCR system to process the word documents that contain the base
papers for their research.

Librarians need the OCR system to manage the contents of older books when building a
virtual digital library.

Various sites that sell e-books have a great need for the OCR system in order to scan books
into electronic format and thus make money. The Amazon book world largely uses this concept
to build its digital libraries.

We now present reading suggestions through which users or clients can better understand the
various phases of the product. These suggestions are likely to be more useful for beginners
than for regular users such as research scholars, librarians, and administrators of various
web sites. With these suggestions, the user need not waste time scrolling up and down
through documents, browsing the web, visiting libraries in search of different books and …
The following reading suggestions help the user understand our product completely and save
time:-


It helps to start with Wikipedia.com, which explains the basic concept behind every keyword
you need. First learn from it what OCR is and how it works on a grid infrastructure.

You can then continue with the introduction to our product provided in our documentation.
These two steps give you an in-depth idea of the use of our product and the processes
involved in it.

What you need next is the implementation of the product. For this you can visit
FreeOCR.com, where you can see how a sample OCR works and try it.


2. FEASIBILITY STUDY

A feasibility study is a high-level capsule version of the entire system analysis and
design process. The study begins by clarifying the problem definition. The purpose of a
feasibility study is to determine whether the project is worth doing. Once an acceptable
problem definition has been generated, the analyst develops a logical model of the system.
Alternatives are then searched for and analyzed carefully. There are three parts in a
feasibility study.

2.1 TECHNICAL FEASIBILITY

Evaluating technical feasibility is the trickiest part of a feasibility study. This is
because, at this point, not much detailed design of the system is available, which makes it
difficult to assess issues such as performance and costs (on account of the kind of
technology to be deployed). A number of issues have to be considered in a technical
analysis. First, understand the different technologies involved in the proposed system:
before commencing the project, we have to be very clear about which technologies are
required for the development of the new system. Second, find out whether the organization
currently possesses the required technologies.

2.2 OPERATIONAL FEASIBILITY

A proposed project is beneficial only if it can be turned into an information system that
meets the organization's operating requirements. Simply stated, this test of feasibility
asks whether the system will work when it is developed and installed. Are there major
barriers to implementation? The following questions help test the operational feasibility
of a project:

Is there sufficient support for the project from management and from users? If the current
system is well liked and used to the extent that people see no reason for change, there may
be resistance.

Are the current business methods acceptable to the users? If they are not, users may
welcome a change that brings about a more operational and useful system.

Have the users been involved in the planning and development of the project? Early
involvement reduces the chances of resistance to the system in general and increases the
likelihood of a successful project.

Since the proposed system was intended to reduce the hardships encountered in the existing
manual system, the new system was considered operationally feasible.

2.3 ECONOMIC FEASIBILITY

Economic feasibility attempts to weigh the costs of developing and implementing a new
system against the benefits that would accrue from having the new system in place. This
feasibility study gives top management the economic justification for the new system. A
simple economic analysis that gives an actual comparison of costs and benefits is much more
meaningful in this case. In addition, it proves to be a useful point of reference against
which to compare actual costs as the project progresses. There could also be various types
of intangible benefits on account of automation. These could include increased customer
satisfaction, improvement in product quality, better decision making, timeliness of
information, expedited activities, improved accuracy of operations, better documentation
and record keeping, faster retrieval of information, and better employee morale.

2.4 TRAINING

Training is a very important part of working with a neural network. Two forms of training
can be employed with a neural network:-

1. Unsupervised Training

2. Supervised Training

Supervised training provides the neural network with training sets and the anticipated
output. Unsupervised training supplies the neural network with training sets, but no
anticipated output is provided.

2.4.1 UNSUPERVISED TRAINING

Unsupervised training is a very common training technique for Kohonen neural networks.

We will discuss how to construct a Kohonen neural network and the general process for

training without supervision.

What is meant by training without supervision is that the neural network is provided with

training sets, which are collections of defined input values. But the unsupervised neural

network is not provided with anticipated outputs.


Unsupervised training is usually used in a classification neural network. A classification

neural network takes input patterns, which are presented to the input neurons. These input

patterns are then processed, and one single neuron on the output layer fires. This firing

neuron can be thought of as the classification of which group the neural input pattern

belonged to. Handwriting recognition is a good application of a classification neural

network.

The input patterns presented to the Kohonen neural network are the dot images of the
characters that were handwritten. We may then have 26 output neurons, which correspond to
the 26 letters of the English alphabet. The Kohonen neural network should classify each
input pattern as one of these 26 letters.

During the training process, the Kohonen neural network for handwriting recognition is
presented with 26 input patterns. The network is also configured to have 26 output neurons.
As the Kohonen neural network is trained, the weights should be adjusted so that the input
patterns are classified into the 26 output neurons. This technique results in a relatively
effective method for character recognition.
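
The weight adjustment described above can be sketched as follows. This is only an illustrative sketch, not the report's implementation: the class name KohonenTrainingSketch, the learning rate parameter, and the choice of Euclidean distance to pick the winning neuron are all assumptions made for the example.

```java
final class KohonenTrainingSketch {
    // Returns the index of the output neuron whose weight vector is closest to the input.
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0.0;
            for (int i = 0; i < input.length; i++) {
                double d = input[i] - weights[n][i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = n;
            }
        }
        return best;
    }

    // One unsupervised step: move only the winning neuron's weights toward the input.
    // No expected output is supplied anywhere, which is what makes this unsupervised.
    static int trainStep(double[][] weights, double[] input, double learningRate) {
        int w = winner(weights, input);
        for (int i = 0; i < input.length; i++) {
            weights[w][i] += learningRate * (input[i] - weights[w][i]);
        }
        return w;
    }
}
```

Repeating trainStep over all training patterns for several passes gradually lets each output neuron settle on one group of similar patterns.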

Another common application for unsupervised training is data mining. In this case you

have a large amount of data, but you do not often know exactly what you are looking for.

You want the neural network to classify this data into several groups. You do not want to

dictate, ahead of time, to the neural network which input pattern should be classified to

which group. As the neural network trains the input patterns will fall into similar groups.

This will allow you to see which input patterns were in common groups.

2.4.2 SUPERVISED TRAINING


The supervised training method is similar to the unsupervised training method in that

training sets are provided. Just as with unsupervised training these training sets specify

input signals to the neural network.

The primary difference between supervised and unsupervised training is that in supervised

training the expected outputs are provided. This allows the supervised training algorithm to

adjust the weight matrix based on the difference between the anticipated output of the

neural network, and the actual output.
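
To illustrate how weights can be adjusted from the difference between the anticipated and the actual output, here is a minimal sketch of the delta rule for a single linear neuron. The delta rule is used here only because it is short; the report does not state which update rule its system applies, and the names DeltaRuleSketch, output, and train are hypothetical.

```java
final class DeltaRuleSketch {
    // Actual output of one linear neuron: the weighted sum of its inputs.
    static double output(double[] weights, double[] input) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            sum += weights[i] * input[i];
        }
        return sum;
    }

    // One supervised step: nudge each weight in proportion to the output error.
    // Returns the error (expected minus actual) measured before the update.
    static double train(double[] weights, double[] input, double expected, double learningRate) {
        double error = expected - output(weights, input);
        for (int i = 0; i < input.length; i++) {
            weights[i] += learningRate * error * input[i];
        }
        return error;
    }
}
```

Calling train repeatedly on the same training pair shrinks the error, provided the learning rate is small enough.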

There are several popular training algorithms that make use of supervised training. One of

the most common is the back-propagation algorithm. It is also possible to use an algorithm

such as simulated annealing or a genetic algorithm to implement supervised training.

2.5 INTRODUCING KOHONEN NEURAL NETWORK

The Kohonen neural network differs considerably from the feed-forward back propagation

neural network. The Kohonen neural network differs both in how it is trained and how it

recalls a pattern. The Kohonen neural network does not use any sort of activation function.

Further, the Kohonen neural network does not use any sort of a bias weight.

Output from the Kohonen neural network does not consist of the output of several neurons.

When a pattern is presented to a Kohonen network one of the output neurons is selected as

a "winner". This "winning" neuron is the output from the Kohonen network. Often these

"winning" neurons represent groups in the data that is presented to the Kohonen network.


For example, in an OCR program that uses 26 output neurons, the 26 output neurons map

the input patterns into the 26 letters of the Latin alphabet.
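
The selection of a winning neuron can be sketched as below. This is a hedged illustration under stated assumptions: each output is a plain weighted sum (no activation function and no bias, as described above), the input and weight vectors are assumed to be normalized so that the largest sum marks the best match, and output neuron 0 is assumed to stand for 'A'. The names KohonenRecallSketch, winner, and letterFor are hypothetical.

```java
final class KohonenRecallSketch {
    // Output of each neuron is a plain weighted sum: no activation function, no bias.
    // The neuron with the largest output is the winner.
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestOutput = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double out = 0.0;
            for (int i = 0; i < input.length; i++) {
                out += weights[n][i] * input[i];
            }
            if (out > bestOutput) {
                bestOutput = out;
                best = n;
            }
        }
        return best;
    }

    // Assumes output neuron 0 stands for 'A', neuron 1 for 'B', and so on up to 'Z'.
    static char letterFor(int winnerIndex) {
        return (char) ('A' + winnerIndex);
    }
}
```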

The most significant difference between the Kohonen neural network and the feed forward

back propagation neural network is that the Kohonen network is trained in an unsupervised

mode. This means that the Kohonen network is presented with data, but the correct output

that corresponds to that data is not specified. Using the Kohonen network this data can be

classified into groups. We will begin our review of the Kohonen network by examining the

training process.

It is also important to understand the limitations of the Kohonen neural network. Neural

networks with only two layers can only be applied to linearly separable problems. This is

the case with the Kohonen neural network. Kohonen neural networks are used because they

are a relatively simple network to construct that can be trained very rapidly.

A "feed forward" neural network is similar to the types of neural networks that we have
already examined. Just like many other neural network types, the feed forward neural
network begins with an input layer. This input layer must be connected to a hidden layer.
This hidden layer can then be connected to another hidden layer or directly to the output
layer. There can be any number of hidden layers, so long as at least one hidden layer is
provided. In common use most neural networks have only one hidden layer; it is very rare
for a neural network to have more than two hidden layers. We will now examine, in detail,
the structure of a "feed forward neural network".
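
The layer-to-layer flow described above can be sketched as a forward pass through one hidden layer. This is a simplified sketch, not the report's code: the sigmoid activation is an assumption, bias weights are omitted for brevity, and the names FeedForwardSketch, layer, and compute are hypothetical.

```java
final class FeedForwardSketch {
    // Sigmoid activation, assumed here; the report does not name an activation function.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Computes one layer: each neuron applies the sigmoid to its weighted input sum.
    static double[] layer(double[][] weights, double[] input) {
        double[] out = new double[weights.length];
        for (int n = 0; n < weights.length; n++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++) {
                sum += weights[n][i] * input[i];
            }
            out[n] = sigmoid(sum);
        }
        return out;
    }

    // Input layer -> single hidden layer -> output layer.
    static double[] compute(double[][] hiddenWeights, double[][] outputWeights, double[] input) {
        return layer(outputWeights, layer(hiddenWeights, input));
    }
}
```

Adding a second hidden layer would only mean one more call to layer between the two shown in compute.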

The Structure of a Feed Forward Neural Network


A "feed forward" neural network differs from the neural networks previously examined.

Figure 2 shows a typical feed forward neural network with a single hidden layer.

Figure 2: Feed Forward Neural Network

The Input Layer

The input layer of the neural network is the conduit through which the external
environment presents a pattern to the neural network. Once a pattern is presented to the
input layer, the output layer produces another pattern. In essence, this is all the neural
network does. The input layer should represent the condition for which we are training the
neural network. Every input neuron should represent some independent variable that has an
influence over the output of the neural network.

It is important to remember that the inputs to the neural network are floating point
numbers. These values are expressed as the primitive Java data type "double". This is not
to say that you can only process numeric data with the neural network. If you wish to
process a form of data that is non-numeric, you must develop a process that normalizes this
data to a numeric representation.

The Output Layer

The output layer of the neural network is what actually presents a pattern to the external
environment. Whatever pattern is presented by the output layer can be directly traced back
to the input layer. The number of output neurons should be directly related to the type of
work that the neural network is to perform.

To determine the number of neurons to use in your output layer, you must consider the
intended use of the neural network. If the neural network is to be used to classify items
into groups, it is often preferable to have one output neuron for each group to which an
item can be assigned. If the neural network is to perform noise reduction on a signal, it
is likely that the number of input neurons will match the number of output neurons. In this
sort of neural network you would want the patterns to leave the neural network in the same
format as they entered.

For a specific example of how to choose the numbers of input and output neurons consider

a program that is used for optical character recognition, or OCR. To determine the number

of neurons used for the OCR example we will first consider the input layer. The number of

input neurons that we will use is the number of pixels that might represent any given

character. Characters processed by this program are normalized to a universal size that is

represented by a 5x7 grid. A 5x7 grid contains a total of 35 pixels. The optical character

recognition program therefore has 35 input neurons.
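
The mapping from a 5x7 grid to 35 input values can be sketched as follows. This is an illustrative sketch under assumptions: the grid is taken to be a boolean bitmap with 7 rows and 5 columns, a set pixel is encoded as 1.0 and a clear pixel as 0.0, and the names GridInputSketch and toInputs are hypothetical.

```java
final class GridInputSketch {
    static final int WIDTH = 5;
    static final int HEIGHT = 7;

    // Flattens a 7-row by 5-column character bitmap into 35 input values, row by row.
    // The 1.0 / 0.0 encoding is an assumption; other encodings would work equally well.
    static double[] toInputs(boolean[][] grid) {
        double[] inputs = new double[WIDTH * HEIGHT];
        for (int row = 0; row < HEIGHT; row++) {
            for (int col = 0; col < WIDTH; col++) {
                inputs[row * WIDTH + col] = grid[row][col] ? 1.0 : 0.0;
            }
        }
        return inputs;
    }
}
```

Each element of the returned array feeds exactly one input neuron, which is why the input layer has 5 x 7 = 35 neurons.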


The number of output neurons used by the OCR program will vary depending on how many

characters the program has been trained for. The default training file that is provided with

the optical character recognition program is trained to recognize 26 characters. As a
result, with this file the neural network has 26 output neurons. Presenting a pattern to
the input neurons fires the output neuron that corresponds to the letter the input pattern
represents.

3. SOFTWARE REQUIREMENT ANALYSIS

3.1 PROBLEM STATEMENT

The problem is for software systems to recognize characters when information is scanned
from paper documents, since a large number of newspapers and books on different subjects
exist only in printed form. Whenever we scan documents with a scanner, they are stored in
the computer system as images in formats such as JPEG or GIF. These images cannot be read
or edited by the user as text. Reusing this information is therefore very difficult,
because the contents of these documents cannot be read or searched line by line and word by
word. These days there is a huge demand for “storing the information available in these
paper documents on a computer storage disk and then later editing or reusing this
information through searching”.

3.2 MODULES AND THEIR FUNCTIONALITIES

Our software system, Optical Character Recognition on a grid infrastructure, can be divided
into five modules based on functionality. The modules are as follows:-

Document Processing Module

System Training Module

Document Recognition Module

Document Editing Module

Document Searching Module

3.2.1 DOCUMENT PROCESSING MODULE

This module is accessed by the administrator, whose role in our application is that of a
librarian. The module performs activities such as scanning documents, storing them as
images, and recognizing characters in the images in order to convert them into word format.
During recognition, this module uses the OCR methodology supported by the grid
infrastructure data structure. The module supports the following services:-

Scanning printed documents.

Storing the documents as snapshots or images.

Processing those image-based documents.

Converting these image-based documents into e-documents (also called structured
documents).

Recognizing the characters in documents.

Generating the grid infrastructure data structure.

3.2.2 SYSTEM TRAINING MODULE

This module can be accessed by both the administrator and the end-user. Before printed
documents can be converted into editable and searchable documents, the first and mandatory
step is to train the system. Here training means that the user first identifies the font
used in the scanned document. The user then types all the characters that need to be
recognized from the scanned document and saves them as an image file. This image file is
provided as input to the training process. The user then clicks the train button provided
in the recognition module, and the training is completed. The system thus becomes familiar
with the new font. This module supports:-

Training the system with the pre-defined fonts.

Training the system with new fonts that are not present in the system and that cannot be
identified by the system.

3.2.3 DOCUMENT RECOGNITION MODULE


This module can be accessed by both the administrator and the end-user. Once the printed
documents are converted into structured documents, any user can recognize the characters
present in the document. The user can recognize the characters of any language he chooses,
which makes OCR more flexible. This flexibility is due to the adoption of the grid
infrastructure. This is the module where the main functionality of OCR is tested.

Under this module, there are two types of recognition: handwritten recognition and scanned
document recognition.

In handwritten recognition, the system is trained on the user's handwriting, in any
language, only the first time. From then on, the system recognizes the characters or words
written by the user. Thus handwritten document recognition recognizes human handwriting.

In scanned document recognition, the system is first trained with the font characters of
the document in the training module. Then, in the recognition module, the system takes the
scanned document image as an input file, first crops the image, and then extracts and
recognizes the characters from the document, making the document editable and searchable.
Hence the document recognition module as a whole supports the following services:-

Converts the document into a specific format

Recognizes the characters

Heterogeneous character recognition

3.2.4 DOCUMENT EDITING MODULE


This module can be accessed by both the administrator and the end-user during document
editing, which builds on the character recognition process. Once the scanned documents are
stored, they reside in computer memory in the form of images that can only be viewed in an
image viewer. Hence, the document is first converted into an editable form. The desired
form of the document may be MS-Word, Text, … as specified by the user. The objective of
this module is to let the user perform:-

Addition of specific content to the documents

Deletion of certain content from documents

Any other modification of documents.

3.2.5 DOCUMENT SEARCHING MODULE

This module can be accessed by both the administrator and the end-user when searching for
a required document on which to carry out the character recognition process. The user
requests the system to search for a particular document. The system then finds the
documents based on the OCR methodology and returns the search results to the user.
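
One way such a search could work is sketched below. The report does not describe the search mechanism in detail, so this sketch makes several assumptions: the recognized text of each document is already available in a map keyed by a document name, the match is a simple case-insensitive substring test, and the names DocumentSearchSketch and search are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DocumentSearchSketch {
    // Returns the names of documents whose recognized text contains the query, ignoring case.
    // recognizedText maps a document name to the text produced by character recognition.
    static List<String> search(Map<String, String> recognizedText, String query) {
        String needle = query.toLowerCase();
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> doc : recognizedText.entrySet()) {
            if (doc.getValue().toLowerCase().contains(needle)) {
                matches.add(doc.getKey());
            }
        }
        return matches;
    }
}
```

A linear scan like this is only practical for small collections; a larger library would probably need an index, which is outside the scope of this sketch.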

4. SOFTWARE DESIGN


4.1 DATA FLOW DIAGRAM

The DFD is also called a bubble chart. A data-flow diagram (DFD) is a graphical
representation of the "flow" of data through an information system. DFDs can also be used
for the visualization of data processing. The flow of data in our system can be described
in the form of a data-flow diagram as follows:-

1. Firstly, if the user is the administrator, he can initiate the following actions:-

Document processing

Document search

Document editing

All the above actions fall under two cases, described as follows:-

a) If the printed document is a new document that has not yet been read into the system,
the document processing phase reads the scanned document as an image and stores the
document image in computer memory.

The document processing phase now has the document at hand and can read it at any time. It
then proceeds to recognize the document using the OCR methodology and the grid
infrastructure. It thus produces the document with the recognized characters as the final
output, which can later be searched and edited by the end-user or administrator.

b) If the printed document has already been scanned in and is held in system memory, the
document processing phase proceeds directly with document recognition using the OCR
methodology and the grid infrastructure, and finally produces the document with recognized
characters as output.


2. If the user of the OCR system is the end-user, he can perform the following actions:-

Document searching

Document editing

1. Document Searching:- Recognized documents can be searched by the user whenever required
by requesting them from the system database.

2. Document Editing:- Recognized documents can be edited by adding specific content to the
document, deleting specific content from it, and modifying it.

[Figure label: Document as image]

Figure 3: Data Flow Diagram

4.2 UML DIAGRAMS

[Figure: UML diagram with the actor User and the labels Document Processing, Scan images,
Store images, Read documents, Recognize Document, Use OCR, Use Grid, Edit Document, Modify,
Delete, and Document Search]

UML combines the best techniques from data modeling (entity relationship diagrams),
business modeling (work flows), object modeling, and component modeling. It can be used
with all processes, throughout the software development life cycle, and across different
implementation technologies. UML has 14 types of diagrams divided into two categories.
Seven diagram types represent structural information, and the other seven represent general
types of behavior, including four that represent different aspects of interactions. The
diagrams we provide to describe the design and implementation of our OCR system are listed
below:-

Use case diagram

Class diagram

Sequence diagram

Collaboration diagram

Activity diagram

Component diagram

Deployment diagram

4.2.1 USE-CASE DIAGRAMS

Our software system can support a library environment in creating a Digital Library, where
paper documents are converted into electronic form for access by users. For this purpose
the printed documents must be recognized before they are converted into electronic form.
The resulting electronic documents are accessed by users such as faculty and students for
reading and editing. Based on this, the following are the actors involved in our OCR
system:-


For a virtual digital library, the Administrator can be the Librarian and the End-users can
be Students and/or Faculty.

The following use-case diagrams together form the complete, overall use-case diagram:-

1. Use-case diagram for document processing

2. Use-case diagram for neural network training

3. Use-case diagram for document recognition

4. Use-case diagram for document editing

5. Use-case diagram for document searching

For each of the use-case diagrams below we explain the functionality of that use case. We
provide a description of the following:-

Use-case name

Details about the use-case

Actors using this use-case

The flow of events carried out by the use-case

The conditions that occur in this use-case

[Figure: actor Administrator with the use cases Scans documents, Reads images, and Stores
the images]

Figure 4: Use-Case Diagram For Document Processing

Use Case Name

Document processing

Description

The administrator is the only person who participates in document processing. He scans the
documents, the scanned documents are read as images, and finally the images are stored in
system memory.

Actors

Primary Actor : Administrator

Secondary Actor : User


Flow of Events

1. The Administrator scans the document that he wants to edit.

2. The scanned documents are read as images.

3. Finally, the images that are read are stored in system memory for future reference.

[Figure: actor Administrator or End-user with the use cases Enters specific characters,
Stores them as image file, and Trains the system]

Figure 5: Use-Case Diagram For Neural Network Training

Use case Name

Neural Network Training

Description

The Administrator or End-user enters the specific characters required for training, stores
them as an image file, and trains the system.

27

Actors

Primary Actor : Administrator or End-user


Flow of Events

1. The user enters the specific characters in order to train the system.

2. The entered characters are stored as an image file.

3. Finally, the user trains the system on the stored characters.

Pre-Condition

The font in the scanned document should be identified.

Figure 6: Use-Case Diagram for Document Editing (actor: Administrator or End-user; use cases: open document in editor, select edit action, performs editing, stores edited document)

Use case Name

Document editing

Description

Both the Administrator and the End-user can perform document editing. The user opens the document in the editor and selects an edit action (edit, modify, delete, etc.). The selected editing operation is then performed, and finally the edited document is stored.

Actors

Primary Actor : Administrator or End-user

Flow of Events

1. The Administrator or End-user opens the document that he wants to edit.

2. He selects the edit action. The action can be editing the document, modifying the document, deleting the document, etc.

3. After the edit action is selected, the editing operation is performed.

4. Finally, the edited document is stored in system memory.

Pre-Condition

The input taken for editing should be an image of the document that has been converted into a Word or text file. That is, the input file must be either a .doc file or a .txt file.

Post-Condition

After editing, the document should be saved only in one of the specific target formats defined by the user; that is the output of the editor. As per our design, the final edited document must be saved as either a .doc file or a .txt file.
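The pre-condition and post-condition above both come down to the same file-type check, which could be sketched as follows. The helper name EditorFileFormat and the decision to ignore letter case are assumptions made for illustration; the report does not show how the editor performs this check.

```java
import java.util.Locale;

// Hypothetical helper, not part of the project's source.
final class EditorFileFormat {
    private EditorFileFormat() {
    }

    /** Returns true if the file name ends in .doc or .txt (case-insensitive). */
    static boolean isSupported(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".doc") || lower.endsWith(".txt");
    }
}
```

The editor could call such a check both when opening a file and when saving it, so that the two conditions are enforced in one place.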

Figure 7: Use-Case Diagram for Document Recognition (use cases: trains system, recognize characters)

Use case Name

Document Recognition


Description

The Administrator or End-user trains the system according to the given symbols or

alphabets. Then the characters are recognized after the system is trained.

Actors

Primary Actor : Administrator or End-user

Flow of Events

1. The user trains the system to recognize the characters.

2. After the system is trained the characters are recognized.

Pre-Condition

Before trying to recognize the characters, the system should be trained first with the font

characteristics and the font size.

Figure 8: Use-Case Diagram for Document Searching (use cases: opens document in editor, enters word for search, searches the word)

Use case Name

Document Searching

Description

The Administrator or End-user opens the document in the editor, enters the word he is looking for in that document, and then searches for the word.

Actors

Primary Actor : Administrator or End-user

Flow of Events

1. The user opens the document in which he wants to search for a word.

2. After opening the document, he enters the word to search for.

3. Finally, the word is searched for in that document.

Pre And Post Conditions

There are no pre-conditions or post-conditions.

Overall Use-Case Diagram

Figure 9: Overall Use-Case Diagram (actors: administrator, end-user, end-user1, end-user2; use cases shown: scan documents, store documents, Document processing, Trains the system, Document recognition, Document editing, Document modification, Document deletion; the diagram also carries two 'includes' relationships)

4.2.2 CLASS DIAGRAMS

The class diagram is the main building block in object-oriented modeling. The classes in a class diagram represent both the main objects and/or interactions in the application and the objects to be programmed.

The class diagram of our OCR system consists of 9 classes. They are:

1. MainScreen

2. Editor

3. HelpFrame

4. Document

5. HEntry

6. Entry

7. TrainingSet

8. KohonenNetwork

9. PrintedFrame

Among these classes, MainScreen is the main class and represents all the major functions carried out by our OCR system. The MainScreen class has an association with five classes, viz. Editor, HelpFrame, Document, TrainingSet and PrintedFrame. The TrainingSet class in turn has an association with the HEntry and KohonenNetwork classes, and the PrintedFrame class has an association with the Entry and KohonenNetwork classes.

Classes, attributes and operations shown in the class diagram:

Document
    Attributes: docid : integer, docname : String, docsize : integer, doctype : String
    Operations: getDocumentDetails(), scanDocument(), covertToImage(), storeImage()

Editor
    Operations: cut(), copy(), paste(), new(), open(), find()

HelpFrame

HEntry
    Operations: hLineClear(), vLineClear(), findBounds()

TrainingSet
    Attributes: inputCount : int, outputcount : int, trainingSetCount : int
    Operations: setInputCount(), setOutputCount(), setTrainingSetCount(), setClassify()

MainScreen
    Operations: editor(), helpFrame(), printedFrame(), handWrittenFrame()

Entry
    Attributes: recog : int, downSampleLeft : int, downSampleRight : int, downSampleTop : int, downSampleBottom : int
    Operations: hLineClear(), hLineClearWithin(), vLineClear(), vLineClearWithin()

PrintedFrame
    Operations: open_action(), train_action(), topen_action(), recogniseAll_action()

KohonenNetwork
    Attributes: LearnMethod = 1 : int, LearnRate = 0.3 : double, quitError : double
    Operations: copyWeights(), clearWeights(), winner(), normalizeInput()

The associations in the diagram are labelled with the multiplicities 1 and 1..*.

Figure 10: Class Diagram
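As an illustration of how one box of the class diagram maps to Java, the Document class could be sketched as below. Only the attribute names and the operation name getDocumentDetails() come from the diagram; the class name DocumentSketch, the constructor, the unit of docsize and the output format are assumptions, and the scanning and storing operations are left out.

```java
// Hypothetical sketch of the Document class from the class diagram.
final class DocumentSketch {
    private final int docid;
    private final String docname;
    private final int docsize;      // unit assumed to be bytes
    private final String doctype;   // for example "txt" or "doc"

    DocumentSketch(int docid, String docname, int docsize, String doctype) {
        this.docid = docid;
        this.docname = docname;
        this.docsize = docsize;
        this.doctype = doctype;
    }

    /** Returns the attributes as one line; the format is an assumption. */
    String getDocumentDetails() {
        return docid + ":" + docname + "." + doctype + " (" + docsize + " bytes)";
    }
}
```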

4.2.3 SEQUENCE DIAGRAMS

Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.

In a sequence diagram, the class objects used to describe the interaction between classes vary from one function to another. The five sequence diagrams below present the sequence of actions performed by each of the five modules. The key class object in these module functions is the MainScreen class, which controls the interaction among the various class objects.

Sequence Diagram for Document Processing

1. Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

SystemMemory - “s”

2. Links

1. Administrator object to MainScreen object.

2. MainScreen object to Document object.


3. Document object to SystemMemory object.

4. SystemMemory object to Administrator object.

3. Messages

1. Process documents

2. Scan documents

3. Scans

4. Stores documents

5. Stores

6. Returns the processed documents

Figure 11: Sequence Diagram for Processing (lifelines: a:Administrator, m:MainScreen, d:Document, s:SystemMemory; the arrows carry messages 1 to 6 listed above)

Sequence Diagram for System Training

1. Objects

Administrator - “a”

System - “s”

TrainingSet - “t”

2. Links


1. Administrator object to System object

2. System object to TrainingSet object

3. TrainingSet object to System object

4. System object to Administrator object

3. Messages

1. Specifies the font characters

2. Stores it as an image

3. Trains the system with new font

4. System recognizes new font and returns for user

Figure 12: Sequence Diagram for Training (lifelines: a:Administrator, s:System, t:TrainingSet; the arrows carry messages 1 to 4 listed above)


Sequence Diagram for Document Recognition

1. Objects

Administrator - “a”

MainScreen - “m”

SystemMemory - “s”

TrainingSet - “t”

2. Links

1. Administrator object to MainScreen object

2. MainScreen object to SystemMemory object

3. SystemMemory object to MainScreen object

4. MainScreen object to TrainingSet object

5. TrainingSet object to MainScreen object

6. MainScreen object to Administrator object

3. Messages

1. Recognize documents

2. Store processed document

3. Read file image

4. Recognize using ocr


5. Send processed document

6. Recognize the characters

Figure 13: Sequence Diagram for Recognition (lifelines: a:Administrator, m:MainScreen, s:SystemMemory, t:TrainingSet; the arrows carry messages 1 to 6 listed above)

Sequence Diagram for Document Editing

1. Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

SystemMemory - “s”

2. Links

1. Administrator object to MainScreen object.


2. MainScreen object to Document object.

3. MainScreen object to Document object


5. Document object to SystemMemory object.

6. SystemMemory object to Administrator object.

3. Messages

1. Edit document

2. Adding document

3. Adds

4. Deleting document

5. Deletes

6. Modifying document

7. Modifies

8. Stores the edited documents

9. Administrator accesses the edited documents

Figure 14: Sequence Diagram for Editing (lifelines: a:Administrator, m:MainScreen, d:Document, s:SystemMemory; the arrows carry messages 1 to 9 listed above)

Sequence Diagram for Document Searching

Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

Links

1. Administrator object to MainScreen object

2. MainScreen object to Document object

3. Document object to Administrator object

Messages

1. Specifies the word

2. Searches the word

3. Searches

4. Returns the location of the word

Figure 15: Sequence Diagram for Searching (lifelines: a:Administrator, m:MainScreen, d:Document; the arrows carry messages 1 to 4 listed above)

4.2.5 ACTIVITY DIAGRAMS

The purpose of an activity diagram is to provide a view of the flows and of what is going on inside a use case or among several classes. An activity diagram can also be used to represent the implementation of a class's method. A token represents an operation. An activity is shown as a rounded box containing the name of the operation. An outgoing solid arrow attached to the end of the activity symbol indicates a transition triggered by the completion of the activity.

Activity Diagram For Document Processing

Figure 16: Activity Diagram for Processing (activities: Request document processing, Process document, Scan documents, Store documents, Retry for scanning; guards: [scanner ready], [scanner not ready])
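The "scanner ready / scanner not ready" branch of this activity diagram amounts to a retry loop, which could be sketched as follows. The class name ScanRetrySketch, the BooleanSupplier used to model the scanner state, and the upper limit on attempts are all assumptions; the diagram itself does not say how many retries are made.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "Retry for scanning" loop.
final class ScanRetrySketch {
    private ScanRetrySketch() {
    }

    /**
     * Polls the scanner state up to maxAttempts times.
     * Returns the 1-based attempt on which the scanner was ready,
     * or -1 if it was never ready.
     */
    static int waitForScanner(BooleanSupplier scannerReady, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (scannerReady.getAsBoolean()) {
                return attempt;   // [scanner ready]: go on to "Scan documents"
            }
            // [scanner not ready]: fall through to the next retry
        }
        return -1;
    }
}
```

Bounding the number of attempts avoids an endless loop when the scanner is switched off, which is a design choice of this sketch rather than something stated in the report.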

Figure 17: Activity Diagram for Document Retrieval (activities: Request document, Initiate search, Retrieves document, Sends document to user, Returns message; guards: [Document exists], [Document does not exist])


Activity Diagram For Document Storage

Figure 18: Activity Diagram for Document Storage (activities: Edit documents, Add document content, Delete document content, Modify document, Store documents; guards: [user chooses add], [user chooses delete], [user chooses modify])

4.2.6 COMPONENT DIAGRAM

The crucial component in our component diagram, and the one that plays the major role in implementing the OCR system, is the GUI component. All the other components, that is Document Processing and Recognition, Document Editing and Document Searching, depend on it.

The GUI component is used to design the GUI screens for interacting with the end-user and the administrator. The functionality of the other components is carried out from the GUI component; this includes document processing and recognition, document editing and document searching.

Figure 19: Component Diagram (components: GUI, providing the GUI screens; Document Processing and Recognition, for scanning, storing and recognising characters; Editing, for adding, deleting and modifying; Searching, which supports the user search function)

4.2.7 DEPLOYMENT DIAGRAM

A deployment diagram serves to model the physical deployment of artifacts on deployment targets. Deployment diagrams show “the allocation of Artifacts to Nodes according to the Deployments defined between them.”

In the deployment diagram of our OCR system, the server role is played by the administrator, called the Librarian. There can be N clients accessing the digital library content at a time. The clients may be students, faculty, or both.

The actions performed by the Administrator are document processing, searching and editing, whereas the actions performed by the end-user are only document searching and editing.

Figure 18: Deployment Diagram (nodes: one Server, for document processing, editing and searching; Client1, Client2, ..., ClientN, for document searching and editing)
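The split of actions between the server side (Administrator) and the clients (End-users) described above could be captured in code roughly as follows. The enum names DocAction and ActorRole are hypothetical; the report does not say whether the implementation enforces these permissions in this way.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the three actions are the ones named in the text.
enum DocAction { PROCESS, SEARCH, EDIT }

enum ActorRole {
    // The Administrator may process, search and edit documents.
    ADMINISTRATOR(EnumSet.of(DocAction.PROCESS, DocAction.SEARCH, DocAction.EDIT)),
    // The End-user may only search and edit documents.
    END_USER(EnumSet.of(DocAction.SEARCH, DocAction.EDIT));

    private final Set<DocAction> allowed;

    ActorRole(Set<DocAction> allowed) {
        this.allowed = allowed;
    }

    /** Returns true if this role is allowed to perform the action. */
    boolean can(DocAction action) {
        return allowed.contains(action);
    }
}
```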

5. CODING/CODE TEMPLATES

Sample Code

CODE SNIPPETS FOR TRAINING

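public class TrainingSet
{
    protected int inputCount;        // number of values in each input vector
    protected int outputCount;       // number of values in each output vector
    protected double input[][];      // input[set][index]
    protected double output[][];     // output[set][index]
    protected double classify[];     // class value assigned to each training set
    protected int trainingSetCount;  // number of training sets

    TrainingSet ( int inputCount , int outputCount )
    {
        this.inputCount = inputCount;
        this.outputCount = outputCount;
        trainingSetCount = 0;
    }

    public int getInputCount()
    {
        return inputCount;
    }

    public int getOutputCount()
    {
        return outputCount;
    }

    // Sets the number of training sets and allocates the arrays accordingly.
    public void setTrainingSetCount(int trainingSetCount)
    {
        this.trainingSetCount = trainingSetCount;
        input = new double[trainingSetCount][inputCount];
        output = new double[trainingSetCount][outputCount];
        classify = new double[trainingSetCount];
    }

    public int getTrainingSetCount()
    {
        return trainingSetCount;
    }

    void setInput(int set,int index,double value) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        if ( (index<0) || (index>=inputCount) )
            throw(new RuntimeException("Training input index out of range:" + index ));
        input[set][index] = value;
    }

    void setOutput(int set,int index,double value) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        // The index must be checked against outputCount.
        if ( (index<0) || (index>=outputCount) )
            throw(new RuntimeException("Training output index out of range:" + index ));
        output[set][index] = value;
    }

    void setClassify(int set,double value) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        classify[set] = value;
    }

    double getInput(int set,int index) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        if ( (index<0) || (index>=inputCount) )
            throw(new RuntimeException("Training input index out of range:" + index ));
        return input[set][index];
    }

    double getOutput(int set,int index) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        if ( (index<0) || (index>=outputCount) )
            throw(new RuntimeException("Training output index out of range:" + index ));
        return output[set][index];
    }

    double getClassify(int set) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        return classify[set];
    }

    // Assigns the class value c + 0.1 to every training set.
    void CalculateClass(int c)
    {
        // Loop up to, but not including, trainingSetCount to stay inside the array.
        for ( int i=0;i<trainingSetCount;i++ ) {
            classify[i] = c + 0.1;
        }
    }

    double []getOutputSet(int set) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        return output[set];
    }

    double []getInputSet(int set) throws RuntimeException
    {
        if ( (set<0) || (set>=trainingSetCount) )
            throw(new RuntimeException("Training set out of range:" + set ));
        return input[set];
    }
}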

6. TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover

every conceivable fault or weakness in a work product. It provides a way to check the

functionality of components, sub assemblies, assemblies and/or a finished product. It is the

process of exercising software with the intent of ensuring that the software system meets its

requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.

6.1 TYPES OF TESTS

Unit Testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of the individual software units of the application, and it is done after the completion of an individual unit and before integration. This is structural testing, which relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

Integration Testing

Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that, although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

System Testing

System testing ensures that the entire integrated software system meets requirements. It

tests a configuration to ensure known and predictable results. An example of system testing

is the configuration oriented system integration test. System testing is based on process

descriptions and flows, emphasizing pre-driven process links and integration points.

Functional Testing

Functional tests provide a systematic demonstration that functions tested are available as

specified by the business and technical requirements, system documentation, and user

manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

There are two basic approaches to testing:

a. Black box or functional testing.

b. White box testing or structural testing.

(a) Black box testing

This method is used when the specified function that a product has been designed to perform is known. The concept of a black box is used to represent a system whose inner workings are not available for inspection. In black box testing the test item is treated as “black”, since its logic is unknown; all that is known is what goes in and what comes out, that is, the input and the output.

In black box testing, we try various inputs and examine the resulting outputs. Black box testing can also be used for scenario-based tests. In this test we verify whether the system takes valid input and produces the resultant output for the user. It is an imaginary box that hides the internal workings. In our project the valid input is an image, and a well-structured image should be received as the resultant output.

(b) White box testing

White box testing is concerned with testing the implementation of the program. The intent of structural testing is not to exercise all the inputs or outputs but to exercise the different programming and data structures used in the program. Thus structural testing aims to achieve test cases that will force the desired coverage of different structures. Two types of path testing are:

1. Statement testing

2. Branch testing

Statement Testing

The main idea of statement testing coverage is to test every statement in the object's method by executing it at least once. Realistically, however, it is impossible to test a program on every single input, so you can never be sure that a program will not fail on some input.

Branch Testing

The main idea behind branch testing coverage is to perform enough tests to ensure that

every branch alternative has been executed at least once under some test. As in statement

testing coverage, it is unfeasible to fully test any program of considerable size.
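To make branch coverage concrete, consider a range check of the kind used in the TrainingSet sample code. The sketch below is illustrative only: the class BranchCoverageSketch and its methods are hypothetical stand-ins, not the project's test code. The check has three outcomes worth exercising: index below zero, index at or above the count, and a valid index.

```java
// Hypothetical stand-in for a range check like the ones in TrainingSet.
final class BranchCoverageSketch {
    private BranchCoverageSketch() {
    }

    /** Throws a RuntimeException when index is outside 0..count-1. */
    static void checkIndex(int index, int count) {
        if ((index < 0) || (index >= count)) {
            throw new RuntimeException("Index out of range:" + index);
        }
    }

    /** Returns true if checkIndex rejects the index, false if it accepts it. */
    static boolean rejects(int index, int count) {
        try {
            checkIndex(index, count);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }
}
```

Calling rejects(-1, 3), rejects(3, 3) and rejects(0, 3) executes every statement and takes both the true and the false outcome of the condition, so these three inputs give full statement and branch coverage of checkIndex.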

6.2 UNIT TESTING

Unit testing is usually conducted as part of a combined code and unit test phase of the

software lifecycle, although it is not uncommon for coding and unit testing to be conducted

as two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

All field entries must work properly.

Pages must be activated from the identified link.

The entry screen, messages and responses must not be delayed.


Features to be tested

Verify that the entries are of the correct format.

No duplicate entries should be allowed.

All links should take the user to the correct page.

6.3 INTEGRATION TESTING

Software integration testing is the incremental integration testing of two or more integrated

software components on a single platform to produce failures caused by interface defects.

The task of the integration test is to check that components or software applications (e.g. components in a software system or, one step up, software applications at the company level) interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects

encountered.

6.4 ACCEPTANCE TESTING

User Acceptance Testing is a critical phase of any project and requires significant

participation by the end user. It also ensures that the system meets the functional

requirements.

Test Results: All the test cases mentioned above passed successfully. No defects

encountered.


OUTPUT SCREENS

The following series of output screens shows how the actual process of OCR takes place:

The first page, the home page of our optical character recognition system, looks as shown in figure 8.1. It provides an interface from which the user can access any module present in this software. The page is as shown below:

Figure 18: Main screen

There are two types of recognition in the document recognition module: handwritten letter recognition and scanned document recognition. The implementation of handwritten letter recognition proceeds as follows:

Process Explaining Hand Written Letter Recognition

First, when we click the handwritten recognition button on the home page, the following screen appears, presenting all the operations that can be performed in this module:

Hand Written Screen 1

On the above screen we can write letters with the mouse pointer in the workspace named “Draw Letters Here”. To recognize these letters we first have to train the system; otherwise it gives an error message stating that the system has to be trained first. This process is explained with the following screens:

First, suppose that we have drawn the letter ‘A’ in the workspace provided.

Hand written Screen 2

Now suppose that you click the “Recognize” button without training, in order to recognize the character you have written and show the recognized character in the grid. The system then displays an error message as shown below:

Hand written screen 3

If we now click the “Begin Training” button before proceeding with the recognition, a status message reporting successful training is shown, as below:

Since the training has been completed, the letter ‘A’ can now be recognized by clicking the “Recognize” button. The letter ‘A’ then appears in the grid as output, as shown below:


Once the system has been trained in a session, it does not need any further training for any kind of letter in any language. That is, once training has been provided for at least one character, the system will from then on recognize any character written in the workspace without needing to be trained on it.

For example, we first wrote the letter ‘A’, provided training for it and recognized it. We then wrote the letter ‘S’. Without any further training we can directly recognize the letter ‘S’ in the grid by clicking the “Recognize” button.



Since we have trained the system once with one character of the English language, we can now recognize characters of languages other than English as well, again without further training. Suppose we have written a Telugu character as shown below:


We can now directly recognize the above Telugu character without training the system. Just click the “Recognize” button once after drawing the letter in the workspace.


Besides training the system through drawn letters, we can also train it by entering characters through the keyboard and storing them as patterns. Later we train the system on those patterns.

First, we provide the input through the keyboard as follows:

If we click OK, those letters are saved in the stored patterns workspace. We can then click the “Begin Training” button so that the system is trained on those stored patterns. Otherwise, the system gives an error message stating that it needs training.


Now suppose that we write the word ‘sr’ and click the “Recognize” button before training on the above stored pattern ‘A’. An error message is then displayed stating that the system needs to be trained on the stored patterns, as shown below:


Now click the “Begin Training” button before attempting to recognize the drawn word ‘sr’. The system then produces an output screen, as shown below, indicating that the training has been completed:


If we now click the “Recognize” button, the drawn word ‘sr’ is recognized and shown as output in the grid by firing the last neuron in the stored patterns.


Since we have trained the system on the stored patterns once, from now on we can draw characters or words of any language and recognize them directly by clicking the “Recognize” button, without training the system again. An example is shown for a Telugu word.

Hand written Screen 15

Process Explaining Scanned Document Recognition

First, when we click the “Scanned Document Recognition” button, the main page of this recognition module is displayed as follows:

Scanned Screen 1


The first text box holds the default image file set by the user. The user can replace the default with a different input image file by clicking Open and then selecting an image file. The procedure is as shown below:

Scanned Recognition Screen 2

Scanned Recognition Screen3

There are two main tabs under scanned document recognition: training and recognition. We must first train the system in the training module. Only then can we recognize the characters in the provided input image using the recognition module. The training tab under scanned document recognition looks like this.


The above figure shows the default input image for training. We can change the training input image for different fonts by opening different input image files and then training on them, so that the system adapts to the new fonts.


Opening an image file replaces the default training input image with the new image, as shown below.

The user can now select the bounds up to which the system must be trained, simply by clicking and dragging with the mouse. The selected data is then highlighted as follows:


After selecting the data, just click the “Train” button. This lets the system train itself with the help of the Kohonen network; it finally displays a dialog box stating that the training has been completed successfully.
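The project's KohonenNetwork class is not reproduced in this report, so the following is only a minimal sketch of the winner-take-all idea behind a Kohonen network, under stated assumptions. The names winner and normalize echo the winner() and normalizeInput() operations in the class diagram, but the signatures, the dot-product activation and the update rule shown here are illustrative assumptions, not the project's implementation. The learning rate 0.3 used in the usage note is the LearnRate value shown in the class diagram.

```java
// Hypothetical sketch of winner selection and one weight update.
final class KohonenWinnerSketch {
    private KohonenWinnerSketch() {
    }

    /** Scales v to unit length; a zero vector is returned unchanged. */
    static double[] normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        if (sumSq == 0.0) {
            return v.clone();
        }
        double len = Math.sqrt(sumSq);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / len;
        }
        return out;
    }

    /** Index of the neuron whose weights have the largest dot product with the normalized input. */
    static int winner(double[][] weights, double[] input) {
        double[] in = normalize(input);
        int best = -1;
        double bestActivation = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double activation = 0.0;
            for (int i = 0; i < in.length; i++) {
                activation += weights[n][i] * in[i];
            }
            if (activation > bestActivation) {
                bestActivation = activation;
                best = n;
            }
        }
        return best;
    }

    /** Moves the given weight vector towards the (already normalized) input by learnRate. */
    static void adjust(double[] neuronWeights, double[] normalizedInput, double learnRate) {
        for (int i = 0; i < neuronWeights.length; i++) {
            neuronWeights[i] += learnRate * (normalizedInput[i] - neuronWeights[i]);
        }
    }
}
```

In training, one would repeatedly pick the winner for each training pattern and call adjust on that neuron's weights with learnRate 0.3; in recognition, only winner is needed, and the winning neuron's index is mapped back to a character.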



Once the training of the system is completed, we move on to the recognition phase, where we open a new scanned image file as input, to be converted into an editable document as required. We then select the part of the image from which the data has to be extracted. It then looks like this:

Next, click the “Crop” button so that the system finds the bounds of the text selected by the user and draws a red boundary line around it, as shown below:

Scanned recognition Screen 10
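The bounds search behind the “Crop” step can be sketched as follows. The operation names hLineClear, vLineClear and findBounds appear in the class diagram, but their signatures, the boolean pixel grid (true meaning an inked pixel) and the return format used here are assumptions made for illustration; the grid is assumed to be rectangular.

```java
// Hypothetical sketch of finding the bounding box of the inked pixels.
final class BoundsSketch {
    private BoundsSketch() {
    }

    /** True if row y contains no inked pixel. */
    static boolean hLineClear(boolean[][] px, int y) {
        for (boolean p : px[y]) {
            if (p) {
                return false;
            }
        }
        return true;
    }

    /** True if column x contains no inked pixel. */
    static boolean vLineClear(boolean[][] px, int x) {
        for (boolean[] row : px) {
            if (row[x]) {
                return false;
            }
        }
        return true;
    }

    /** Returns {top, bottom, left, right} (inclusive), or null if nothing is inked. */
    static int[] findBounds(boolean[][] px) {
        int height = px.length;
        if (height == 0 || px[0].length == 0) {
            return null;
        }
        int width = px[0].length;
        int top = 0;
        while (top < height && hLineClear(px, top)) {
            top++;
        }
        if (top == height) {
            return null;   // every row is clear
        }
        // At least one inked pixel exists, so the loops below must stop.
        int bottom = height - 1;
        while (hLineClear(px, bottom)) {
            bottom--;
        }
        int left = 0;
        while (vLineClear(px, left)) {
            left++;
        }
        int right = width - 1;
        while (vLineClear(px, right)) {
            right--;
        }
        return new int[] {top, bottom, left, right};
    }
}
```

The four values returned are what a user interface would need in order to draw a boundary rectangle around the selected text.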

Finally, click the “Recognize” button so that the system extracts (recognizes) the characters from the image and presents them to the user. This data is not yet editable. When we click the “EDIT” button provided at the bottom centre, the document becomes both editable and searchable. The complete process is shown in the following two screens:

With the data available in the above screenshot, we can make any changes to the document using cut, copy, paste, etc., and finally save the document in either of two formats (Word or text) as per our design.

The search function is started by clicking the “find” image button at the bottom-left corner. The system then asks the user to enter the search term, as shown below:


In the dialog box in the above screenshot, clicking OK leads to one of two cases as per our design:

Case 1: If the search term entered by the user occurs in the document, a dialog box asks the user whether he wants to continue the search or not.

If the user clicks Yes, the cursor moves to the search term.

If the user clicks No, the search exits.

Case 2: If the user enters a search term that does not occur in the document, a dialog box is displayed directly, saying that the search is finished. This means that the search term is not present in the document.

Thus the user learns whether the search term is present in the document immediately after entering it.
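The two cases above can be sketched with a single lookup that either yields the position to move the cursor to (Case 1) or signals that the term is absent (Case 2). The class name SearchSketch, the method findNext and the use of -1 for "not found" are assumptions for illustration; the report does not show the editor's actual search code.

```java
// Hypothetical sketch of the editor's search lookup.
final class SearchSketch {
    private SearchSketch() {
    }

    /**
     * Returns the 0-based offset of the first occurrence of term in text
     * at or after fromIndex (Case 1), or -1 if there is none (Case 2).
     */
    static int findNext(String text, String term, int fromIndex) {
        if (text == null || term == null || term.isEmpty()) {
            return -1;
        }
        return text.indexOf(term, fromIndex);
    }
}
```

Passing the previous result plus one as fromIndex continues the search to the next occurrence, which corresponds to the user choosing to continue.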

If we are searching for a term that is already present in the document then the series of

output screens will be as follows:-




If we are searching for a term that does not reside in the document then the series of

output screens will be as follows:-




When using the editor, we can perform the actions displayed in the screens below:

7. PLATFORM/TOOLS USED

SOFTWARE REQUIREMENTS SPECIFICATION

Operating System : Windows-XP

Programming Language : Core Java

User Interface : Swing

HARDWARE REQUIREMENTS SPECIFICATION

Processor : Pentium IV processor or higher

RAM : Minimum of 512 MB RAM

Memory : 500 MB or higher

8. CONCLUSION AND FUTURE SCOPE


What does the future hold for OCR? Given enough entrepreneurial designers and sufficient

research and development dollars, OCR can become a powerful tool for future data entry

applications. However, the limited availability of funds in a capital-short environment could

restrict the growth of this technology. But, given the proper impetus and encouragement, a

lot of benefits can be provided by the OCR system. They are:-

The automated entry of data by OCR is one of the most attractive labor-reducing technologies.

The recognition of new font characters by the system is easy and quick.

We can edit the information in the documents more conveniently and reuse the edited information as and when required.

Extending the software beyond editing and searching is a topic for future work.

The Grid infrastructure used in the implementation of the Optical Character Recognition system can be used efficiently to speed up the translation of image-based documents into structured documents, which are easy to discover, search and process.

FUTURE ENHANCEMENTS

The Optical Character Recognition software can be enhanced in the future in several ways, such as:

Training and recognition speeds can be increased further, making the software more user-friendly.

Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task considering the diversity that exists in ordinary penmanship. However, progress is being made.

9. REFERENCES

In this references section we list the references from which we took our problem, and several others that helped us design the solution. They include books, published papers and websites with URLs:

For a complete reference on and understanding of neural networks, refer to Jeff Heaton's chapter 1 at www.jeffheaton.com

For a complete reference on and understanding of OCR, refer to Jeff Heaton's chapter 7 at www.jeffheaton.com

The IEEE paper from which we took our problem statement is authored by Dana Petcu, Silviu Panica, Viorel Negru and Andrei Eckstein of the Computer Science Department, West University of Timisoara, Romania, and by Doina Banciu of the National Institute for Research and Development in Informatics, Romania.

You can refer to the IEEE paper by D. Andrews, R. Brown, C. Caldwell, et al., “A Parallel Architecture for Performing Real Time Multi-Line Optical Character Recognition”

You can refer to the IEEE paper by H. Goto, “OCRGrid : A Platform for Distributed and Cooperative OCR Systems”

10. APPENDIX

Appendix A: Glossary

TERMS

All the terms and abbreviations in the project are specified clearly. As the project develops further, evolved definitions will be specified.

ACRONYMS

IEEE: Institute of Electrical and Electronics Engineers

DFD: Data Flow Diagram

UML: Unified Modeling Language

J2EE: Java 2 Enterprise Edition

GUI: Graphical User Interface

OCR: Optical Character Recognition

GOCR: Grid OCR

Appendix B: Analysis Models

This includes all the pertinent analysis models, such as data flow diagrams, class diagrams,

use case diagrams, interaction diagrams and state-chart diagrams.
