“A MULTI SCRIPT NUMERALS CLASSIFICATION USING SOFT COMPUTING”
A Dissertation submitted in partial fulfillment for the award of the Degree of Master of Technology
in Department of Computer Science & Engineering
(With Specialization in Information and Communication)
Supervisor: Mr. Akhilesh Pandey, Associate Professor, SGVU, Jaipur
Submitted By: Anshu Kumar, SGVU091143318
Department of Computer Science & Engineering
Suresh Gyan Vihar University
Mahal, Jagatpura, Jaipur-302025
June 2015
Chapter 1 INTRODUCTION
One of the persistent problems today is that electronic systems do not reliably recognize handwritten English and Devnagri numbers. Numbers written in Devnagri in particular are not recognized accurately and efficiently. Various algorithms and research works have been proposed for number recognition, and recognition involves a number of processing steps; no single algorithm or technique recognizes both kinds of numbers with accurate results. In India, Hindi is regarded as the mother tongue, so the recognition of numbers in Devnagri is very important, while in some parts of India people frequently use English. The system described here is intended to address the complete problem in an easier and more precise way. Recognition of handwritten numbers is a difficult research problem and is also an important requirement for office automation. A person's handwriting depends on writing style and even on mood. During recognition, the structure of the numbers as well as topological and statistical information is examined. Only limited variation in the shape and size of hand-printed Hindi and English numbers is considered; the main focus is on the recognition process.
Researchers have been working on the recognition of handwritten numbers for the last thirty-five years, and the public is now looking towards technology for recognizing handwritten numbers and script. No system has yet achieved a 100% recognition rate for handwritten numbers or script. Even humans cannot recognize every writer's handwriting without confusion, and sometimes cannot read their own handwriting reliably. It is therefore the writer's responsibility to write text in a readable form.
1.1 Devnagri Script
After English and Chinese, Hindi is the most widely used language in the world; approximately 500 million people can read, write and speak it. Devnagri is the basic script for several of the languages of India, for example Hindi and Sanskrit, and close variants of the Devnagri script are used for various other languages. In ancient times Sanskrit was the common language; however, written material is also still available in Arabic, French, Portuguese, Turkish, Dravidian languages and English. Sanskrit is an expressive language, and some words of English, Portuguese, Turkish and other languages are derived from it. The Devnagri script has attracted a great deal of attention. It descends from the Brahmi script, which is of purely Indian origin. In the 4th century the Gupta script developed from Brahmi, and in the 8th century the Kutila script developed in turn from the Gupta script. Later the modern Nagri script came to be known as Devnagri. Acharya Vinoba Bhave said that the script came to be known as Loknagri because it was the script of the common language known to the people of the nation. Some people describe Devnagri as a dialect of Hindi, but this is not accurate, since languages other than Hindi are also written in this script. Devnagri is a script, whereas Hindi is a language.
Various views on the origin of the name Devnagri are:
The Nagri script was called Devnagri because of the prevalence of Nagri, whereas the Sanskrit language was called the voice of the devas.
Because it was used extensively by the Brahmins of Gujarat, it was called Devnagri.
It was in common use in Devnagar in Kashi, and so it was named Devnagri.
Table no. 1: 0-9 in various Indian languages
Devnagri is regarded as a highly accurate and scientific script, and it is used to write most of the Indo-Aryan languages of India, such as Nepali, Sanskrit and Marathi. The Indian constitution declares Hindi in the Devnagri script to be the official language of the Union. Hindi is the state language and Devnagri the state script of states such as Bihar, Rajasthan, Haryana and Uttaranchal. Because most Indian scripts derive from the Brahmi script, Devnagri is related to most of them.
Figure 1.1: Sample of Devnagri Handwritten numbers
1.2 English Script
English is a West Germanic language that was first spoken in early medieval England and is now a global lingua franca. It is spoken as a first language by the majority of the population of several sovereign states, including the United States, the United Kingdom, Canada, Australia, Ireland, New Zealand and a number of Caribbean nations; moreover, it is an official language of almost 60 sovereign states.
Significance
Modern English, sometimes described as the first global lingua franca, is the dominant language, or in some instances even the required international language, of communication, science, information technology, business, aviation, entertainment, radio and diplomacy.
History
English originated in the dialects of North Sea Germanic that were carried to Britain by Germanic settlers from various parts of what are now the Netherlands, northwest Germany and Denmark.
Old English was later transformed by two waves of invasion. The first was by speakers of North Germanic languages, and the second was by speakers of the Romance language Old Norman in the 11th century.
Figure 1.2: Sample English Number
1.3 Optical character recognition
Optical number recognition is a special case of optical character recognition (OCR). The method lets a machine recognize numbers automatically through an optical mechanism. Humans recognize many objects in the same way: the eyes are the optical mechanism, while the brain interprets the input. Advances in OCR technology have made it easier to cope with the variability of the input. Suppose a person is asked to read a page in an unfamiliar language. He or she may not be able to recognize the words, but any numerical values in the text can be easily recognized and explained, because the same numbers are used by people all over the world. Therefore this concludes that … is only used to recognize the numbers. Secondly, alphabetic and numeric symbols are of almost the same size. It is hard to read printed words that appear against a dark background or that are printed over graphics or other words. A paper document can easily be read by a human being, but a computer cannot read it directly. An OCR system therefore converts the paper document into a computer-processable form.
Scanned images are converted into a computer-processable form such as ASCII; this applies to handwritten work as well as to printed alphanumeric characters. OCR is one of the areas of pattern recognition, and work on handwritten characters is motivated by the desire to improve communication between man and machine. A few commercial products for character recognition are currently available. Although much research has been done, products that can reliably recognize handwritten data are still not available. Artificial neural networks have recently attracted a high level of interest as a way to solve the problem. In past generations, before neural networks were developed, feature extraction mostly used either statistical or pattern-matching approaches. One of the basic aims of machine learning and artificial intelligence is to enable computers to accomplish tasks in a way that seems very natural to people.
1.3.1 Application of number recognition
Number recognition has a variety of applications. They are as follows:
Task-specific readers: these readers are used mainly in high-volume applications that require high system throughput. Because high throughput is needed, processing only the fields of interest saves time. Since documents of the same kind share the same size and layout, it is easy for the image scanner to concentrate on the relevant information.
Address readers: an address reader is part of a mail delivery system. It targets the address area of the mail piece, reads the ZIP code and sorts the postal mail accordingly.
Form readers: a form-reading system distinguishes two kinds of content on a form. The first is the pre-printed instructions and the second is the filled-in field data. Only the part of the form that holds field data is processed by the system.
Check readers: the reader captures an image of the check and recognizes the amount as well as the account information present on the check; these data are then used to cross-check the result.
Bill processing systems: a bill processing system is generally used to read inventory documents, payment slips and utility bills. The system focuses on a certain region of the document where the information to be inspected is located, for example the payment value and the account number.
Passport readers: automated passport readers are used to speed returning American passengers through customs inspection. The reader reads the traveller's date of birth and passport number and checks them against database records that contain information on smugglers and felons.
General purpose page readers: page readers fall into two categories, low-end and high-end page readers. High-end page readers are more advanced than low-end ones. Low-end page readers are compatible with flat-bed scanners but generally do not come with a scanner. They are generally used in office environments with desktop workstations, where the demands on system throughput are lower. Some recognition accuracy is sacrificed in order to handle a broader range of documents. To improve recognition accuracy, some OCR software allows the user to adapt the recognition engine to customer data.
1.3.2 Limitation
OCR has never achieved a 100% read rate. A system is therefore needed that allows rapid and accurate correction of rejects. The processing of exception items has always been a problem, because it delays the completion of the job entry, particularly the balancing function. Dollar amounts will not balance if the system misreads characters, and building an OCR device that reads data accurately without any substitution is not the responsibility of the hardware manufacturer alone. It mostly depends on the quality of the items to be processed. The main goals of OCR development over the years have been:
To increase reading accuracy, i.e. to reduce substitutions and rejects.
To eliminate the need for specially designed fonts or characters, and to read handwritten characters.
To reduce the sensitivity of scanning so that less controlled input can be read.
The limitations given above are not objectionable for most applications, and the number of dedicated users of OCR systems increases every year. Creating a system without such special requirements is not by itself sufficient. An OCR system saves time, but it is still not perfect:
It rarely reaches the 99.9% accuracy level.
It has problems with printed newspapers.
It has problems with heavily bound material.
1.2.3 Different phases of Hindi and English character recognition
The recognition of Hindi and English script is divided into different phases, as shown in Figure 1.3, and each phase is explained below.
Figure 1.3: Block Diagram of HDCR
1.2.3.1 Pre- processing
Pre-processing is the name given to the family of procedures for cleaning up, enhancing, smoothing, filtering and otherwise massaging a digital image so that the subsequent algorithms along the road to final classification can be made simpler and more accurate.
Figure 1.4: Database Image of Offline Handwritten Devanagari N
Binarization: binarization (thresholding) is the process of converting a grey-scale image into a binary image. The two categories of thresholding are as follows:
Global, which picks one threshold value for the image of the entire document, based on an estimate of the background level from the intensity histogram of the image.
Adaptive, which uses a different value for each pixel according to the information in the local area.
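Global and local thresholding can be illustrated with a minimal Python sketch. It assumes an 8-bit grey-scale image held in a NumPy array. Otsu's method is used here as one common way to pick a global threshold from the histogram, and a local-mean rule stands in for the adaptive case; the source does not say which rules it uses, and the function names and the `window` and `offset` parameters are illustrative assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick one global threshold from the intensity histogram (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # pixel count at or below each level
    w1 = hist.sum() - w0              # pixel count above each level
    s0 = np.cumsum(hist * levels)     # intensity sum at or below each level
    mu0 = np.divide(s0, w0, out=np.zeros(256), where=w0 > 0)
    mu1 = np.divide(s0[-1] - s0, w1, out=np.zeros(256), where=w1 > 0)
    # The between-class variance is largest where the two classes separate best.
    return int(np.argmax(w0 * w1 * (mu0 - mu1) ** 2))

def binarize_global(gray):
    """One threshold for the whole image: 1 = brighter than it, 0 = darker."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)

def binarize_adaptive(gray, window=15, offset=5):
    """Compare each pixel with the mean of its own window x window area.

    `window` must be odd; `offset` is an assumed tuning constant.
    """
    pad = window // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    # Integral image: integral[i, j] is the sum of padded[:i, :j].
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    sums = (integral[window:window + h, window:window + w]
            - integral[:h, window:window + w]
            - integral[window:window + h, :w]
            + integral[:h, :w])
    return (gray > sums / window ** 2 - offset).astype(np.uint8)
```

In both functions dark ink maps to 0 and the paper background to 1. The adaptive variant is usually the safer choice when the page is unevenly lit, at the cost of two extra parameters to tune.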
Noise removal: the main aim of noise removal is to discard any unwanted bit patterns that have no significance in the output.
Figure 1.5: Noise Reduction
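One simple way to discard isolated noise pixels is a 3 x 3 median filter. The source does not name the filter it uses, so the choice of a median filter, and the function name `median_filter3`, are assumptions made for illustration only.

```python
import numpy as np

def median_filter3(img):
    """Replace every pixel by the median of its 3 x 3 neighbourhood."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted copies so each pixel sees all of its neighbours.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0).astype(img.dtype)
```

On a binary image this removes single stray pixels and fills single-pixel holes inside a stroke. It also rounds off sharp corners, so it is best applied before thinning rather than after.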
Skeletonization: skeletonization is also known as thinning. In this process the width of each line is reduced, for example from several pixels to just one pixel. The process removes irregularities in the character, which in turn makes the recognition algorithm simpler, since it only has to work on character strokes that are one pixel wide. It also decreases the memory required to store the information about the input characters and therefore also reduces the processing time.
Figure 1.6: Skeletonization of Image
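Thinning can be sketched with the Zhang-Suen algorithm, a widely used iterative thinning method. The source does not name the thinning algorithm it uses, so this choice is an assumption, as is the function name `thin`. The input is assumed to be a binary NumPy array with 1 for stroke pixels and 0 for background.

```python
import numpy as np

def thin(binary):
    """Zhang-Suen thinning: peel boundary pixels until strokes are one pixel wide."""
    img = np.pad(binary.astype(np.uint8), 1)  # zero border: every pixel has 8 neighbours
    centre = img[1:-1, 1:-1]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            # Neighbours P2..P9, clockwise, starting from the pixel above.
            p2, p3, p4 = img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:]
            p5, p6, p7 = img[2:, 2:], img[2:, 1:-1], img[2:, :-2]
            p8, p9 = img[1:-1, :-2], img[:-2, :-2]
            ring = [p2, p3, p4, p5, p6, p7, p8, p9, p2]
            b = sum(ring[:8])  # number of stroke neighbours
            # Number of 0 -> 1 transitions when walking once around the ring.
            a = sum((ring[i] == 0) & (ring[i + 1] == 1) for i in range(8))
            if step == 0:
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            remove = (centre == 1) & (b >= 2) & (b <= 6) & (a == 1) & cond
            if remove.any():
                centre[remove] = 0
                changed = True
    return centre.copy()
```

The two alternating sub-steps remove pixels from opposite sides of a stroke, which keeps the skeleton roughly centred, and the transition count `a == 1` prevents a stroke from being broken in two.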
Smoothing: smoothing regularizes the strokes of a character that is broken or noisy.
Figure 1.7: Smoothing of Numbers at different Numbers Levels: (a) original image, (b) 32 x 32 resolution, (c) 16 x 16 resolution, (d) 8 x 8 resolution
Contour smoothing: contour smoothing smooths the outline of skewed input characters that are broken or noisy.
Figure 1.8: Original Image
Figure 1.9: Contour Image
Skewness: skew is caused when the paper is not fed straight into the scanner. It refers to the tilt in the bit-mapped image of the scanned paper that is passed to the character recognition system. Most character recognition algorithms are sensitive to the orientation of the input image, so the system should detect and correct the skew automatically.
Figure 1.10: Correction of Skewness
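Skew detection and correction can be sketched as follows. This is a minimal illustration and not the method of the source, which does not specify one: the skew angle is estimated from the second-order central moments of the stroke pixels, and the image is then rotated back with nearest-neighbour sampling. The function names `estimate_skew_degrees` and `deskew` are hypothetical, and the input is assumed to be a binary NumPy array with 1 for stroke pixels.

```python
import numpy as np

def estimate_skew_degrees(binary):
    """Angle of the principal axis of the stroke pixels, in degrees.

    Positive angles mean the writing slopes downwards to the right,
    because image rows are counted from the top.
    """
    ys, xs = np.nonzero(binary)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    return float(np.degrees(0.5 * np.arctan2(2 * mu11, mu20 - mu02)))

def deskew(binary):
    """Rotate the image about its centre so the principal axis becomes horizontal."""
    h, w = binary.shape
    t = np.radians(estimate_skew_degrees(binary))
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    src_x = np.rint(np.cos(t) * (xx - cx) - np.sin(t) * (yy - cy) + cx).astype(int)
    src_y = np.rint(np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(binary)
    out[inside] = binary[src_y[inside], src_x[inside]]
    return out
```

The moment-based estimate works best on an elongated shape such as a whole line of writing; for a single, roughly square numeral the principal axis is poorly defined and a projection-profile search may be preferable.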
Segmentation
Once the previous stage has produced a clean image, segmentation is the next stage of processing. It tries to locate the individual lines and then the individual symbols in the image, so that each number can be worked on separately.
The quality of segmentation depends in many ways on the pre-processing: after scanning, the bit-mapped document image is put through background noise removal and slant correction to improve its quality.
Figure 1.11: Segmentation and Recognition of Numeral
Figure 1.12: Segmentation of 2 and 0
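A common way to separate numerals written side by side, such as the 2 and 0 of Figure 1.12, is a vertical projection profile: columns that contain no stroke pixels mark the gaps between numerals. The sketch below assumes a binary NumPy array with 1 for stroke pixels and assumes that the numerals do not touch; the function name `segment_columns` is hypothetical and the source does not state which segmentation method it uses.

```python
import numpy as np

def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain stroke pixels."""
    has_ink = binary.any(axis=0)
    # Pad with False so every run of inked columns has a rising and a falling edge.
    padded = np.concatenate(([False], has_ink, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))
```

Each range can then be cropped with `binary[:, start:end]` and passed on as a separate numeral. Touching or overlapping numerals need a more elaborate method, since they leave no empty column between them.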
1.2.3.2 Feature extraction and Classification
This phase has two steps, feature extraction and classification. Feature extraction is the name given to the family of procedures that measure the relevant information contained in the data, so that the task of classifying the data is made easy. Each segmented text block is reduced to a set of features by which it can be identified.
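The two steps can be illustrated with a small Python sketch. It is not the classifier of this dissertation: zoning density features and a nearest-prototype rule are assumed here only because they are among the simplest choices, and the names `zoning_features`, `classify`, `zones` and `prototypes` are hypothetical.

```python
import numpy as np

def zoning_features(binary, zones=4):
    """Fraction of stroke pixels in each cell of a zones x zones grid."""
    h, w = binary.shape
    rows = np.array_split(np.arange(h), zones)
    cols = np.array_split(np.arange(w), zones)
    return np.array([binary[np.ix_(r, c)].mean() for r in rows for c in cols])

def classify(features, prototypes):
    """Label of the prototype vector closest to `features` (Euclidean distance)."""
    labels = list(prototypes)
    dists = [np.linalg.norm(features - prototypes[k]) for k in labels]
    return labels[int(np.argmin(dists))]
```

Here `prototypes` is a dictionary that maps each label to a feature vector, for example the mean feature vector of the training samples of that numeral. A neural network or other soft computing classifier would replace `classify` while still consuming the same feature vectors.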
There are two types of recognition systems:
Online number recognition
Offline number recognition
Figure 1.13: Type number recognition system
The continual development of computer hardware has increased the demand for the simplest possible interface between the human and the computer. Handwritten number recognition, in which the machine reads numbers written by hand, is one such interface. Although handwritten number recognition is not yet fully reliable, it is particularly useful when working with offline handwritten documents. The two types of recognition systems are described below.
1.3.1 Online number recognition
In online recognition, numbers are recognized as they are being written. Online systems have more information available for recognition than their offline counterparts, because they record the position of the pen at all times as it moves. The big difference between online and offline recognition is that online data carries this time-ordered pen signal for the different types of written numbers, whereas offline data is only the finished image of the numbers as they are.
1.3.2 Offline number recognition
In offline recognition, typed or handwritten numbers on paper documents are scanned and made available to the recognition algorithm as binary or grey-scale images. Offline recognition is much more difficult, because the result depends on the content and on the writing medium and writing tool used, as well as on intermediate operations such as scanning and binarization, which affect the image.