CHAPTER ONE
INTRODUCTION
1.1 Project Overview :
This project demonstrates an editor that combines image, text, and voice technologies. The user can extract the text contained in an image, or type text directly into the editor, and have that text read aloud through speech synthesis. The edited text can also be saved to a file in a chosen location, and each saved file is recorded in a list of recent documents produced by the editor.
The project explores these ideas by developing Optical Character Recognition (OCR) software and then demonstrating it through a basic implementation of a text-to-speech conversion system. The system loads an image in one of the supported formats (JPG, BMP, GIF, or TIF), extracts the text found in the image, reads the text aloud, and stores the edited text in a file. The user can also type text, or copy and paste it, directly into the editor.
1.2 Problem :
Information technology is advancing quickly, and technology is now closely connected to most other fields of life. Software and hardware are used in many settings by different segments of the community, adults and children alike. One segment, however, has particular difficulty using this technology: blind people. Our project aims to help this group by converting edited text into speech that blind users can listen to.
A second aim of the project is to handle the many images that contain text the user needs for other purposes. In this case, the project helps the user obtain the text contained in an image by using Optical Character Recognition (OCR).
1.3 Objectives :
A full realization of this concept involves a few distinct steps:
1. To extract text from an image by using an OCR system.
2. To develop text recognition software that accepts text obtained from an image or written directly into the text editor.
3. To read aloud the text contained in the text editor by using a text-to-speech system.
4. To make the system self-contained, so that it operates independently of an external computing source and handles its software inputs and outputs on its own.
Such a system would be integrated with the user's resources, use the computer's speakers as its output device, and issue control files to software already installed on the computer. Several significant factors must be considered when designing both the Optical Character Recognition and text-to-speech systems so that they produce clear text and speech output.
1.4 Introduction To OCR :
The goal of Optical Character
Recognition (OCR) is to classify optical patterns (often contained
in a digital image) corresponding to alphanumeric or other
characters. The process of OCR involves several steps including
segmentation, feature extraction, and classification. Each of these
steps is a field unto itself, and is described briefly here in the
context of a Matlab implementation of OCR.
1.5 Text-to-Speech Software :
A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether the text was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition system. In TTS synthesis it is impractical to record and store every word of a language. It is therefore more appropriate to define TTS as the automatic production of speech through a grapheme-to-phoneme transcription of the sentences to be spoken.
1.6 Project Methodologies :
1.6.1 OCR Methodology :
OCR software has been around for as long as computers have been used to connect the printed world with the electronic one. Traditional document imaging methods use templates and algorithms in a two-dimensional environment to recognize objects and patterns. OCR methods today recognize a spectrum of colors and can distinguish between the background and the foreground of a document. They de-skew, de-speckle, and apply 3-D image correction in order to work with lower-resolution images from sources such as faxes, the internet, and cell phone cameras.
OCR software uses two different kinds of optical character recognition: feature extraction and matrix matching. Feature extraction recognizes shapes by using statistical and mathematical techniques to detect edges, corners, and ridges in a font, and so identifies the letters in a word, sentence, or paragraph. Matrix matching compares each character image against a stored set of character templates. OCR software achieves the best results when the image meets the following conditions:
1. It is a clean, straight image.
2. It uses a very distinguishable font such as Arial or Helvetica.
3. It uses black letters on a clear background.
4. It has a resolution of at least 300 dpi.
However, these conditions cannot always be met. The best OCR techniques can still read words accurately in less ideal circumstances using matrix matching.
One example of OCR is shown below. A portion of a scanned image of text, borrowed from the web, is shown along with the corresponding (human-recognized) characters from that text.
Figure 1.1 : Scanned image of text and its corresponding
recognized representation.
1.6.2 Text to Speech Methodology :
As described in Section 1.5, the TTS system should be able to read any text aloud, whether the text was entered directly by an operator or produced by the Optical Character Recognition system. Because it is impractical to record and store every word of a language, speech is produced automatically through a grapheme-to-phoneme transcription of the sentences to be spoken.
Figure 1.2 : TTS System.
1.7 Speech Synthesis :
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. The most significant qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, whereas intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, so speech synthesis systems usually try to maximize both characteristics. Several significant factors must be considered when designing a text-to-speech system that produces clear speech.
Figure 1.3 : Flowchart of Text to Speech Recognition.
1.7.1 Text To Speech System :
A TTS synthesizer is a computer-based system that should be able to read any text clearly, whether the text was entered into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. The purpose of a text-to-speech system is to convert arbitrary given text into a spoken waveform. The main components of a text-to-speech system are text processing and speech production. The two primary methods for producing synthetic speech waveforms are concatenative synthesis and formant synthesis. We used concatenative synthesis for our TTS. Concatenative synthesis is based on concatenating pieces of recorded speech, and it usually produces the most natural-sounding synthesized speech.
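The idea of concatenative synthesis can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the sampling rate, the unit names (a, b, c), and the use of short sine tones in place of recorded speech units are all hypothetical and are not taken from the project.

```matlab
% Minimal sketch of concatenative synthesis (illustrative assumptions only).
fs = 8000;                          % assumed sampling rate in Hz
t  = (0:round(0.2*fs)-1) / fs;      % 0.2 s time axis for each stand-in unit

% Hypothetical unit database: short tones stand in for recorded speech units.
units = struct();
units.a = 0.5 * sin(2*pi*220*t);
units.b = 0.5 * sin(2*pi*330*t);
units.c = 0.5 * sin(2*pi*440*t);

sequence = {'a', 'c', 'b'};         % order in which the units should be played

% Concatenate the stored waveforms in the requested order.
speech = [];
for k = 1:numel(sequence)
    speech = [speech, units.(sequence{k})]; %#ok<AGROW>
end

% sound(speech, fs);                % uncomment to listen to the result
```

A real system would store recorded phonemes, syllables, or words instead of tones and would smooth the joins between units.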
1.7.2 Speech Generation Component :
Given a sequence of phonemes, the purpose of the speech generation component is to synthesize the acoustic waveform. Speech generation has been attempted by concatenating recorded words. Current state-of-the-art synthesis produces natural-sounding speech by using a very large number of speech units. Storing so many units and retrieving them in real time is feasible because memory and computing power are cheap. The problems in a unit selection speech synthesis system fall into three areas: the choice of unit size, the generation of the speech database, and the criteria for selecting a unit.
1.7.3 Speech Synthesis Process :
This TTS system is able to read any written text. Before synthesis, the input text goes through text normalization, preprocessing, and tokenization. In this system we have developed a phonetic-based text-to-speech synthesis system, and MATLAB is used to improve the speech quality. The following figure shows the block diagram for the TTS system.
Figure 1.4 : Block Diagram for Text to speech Synthesis.
Figure 1.5 : Flow chart for TTS with example.
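The text normalization, preprocessing, and tokenization stages named in Section 1.7.3 could look roughly like the sketch below. The example sentence and the small abbreviation table are assumptions made for illustration; the project's actual rules are not given in the text.

```matlab
% Minimal sketch of the text front end of a TTS system (assumed rules).
text = 'Dr. Smith has 2 cats.';     % example input sentence

% Normalization: expand a few tokens using a small, hypothetical lookup table.
abbrev = {'Dr.', 'Doctor'; '2', 'two'};
normalized = text;
for k = 1:size(abbrev, 1)
    normalized = strrep(normalized, abbrev{k, 1}, abbrev{k, 2});
end

% Preprocessing: lower-case and remove everything except letters and spaces.
cleaned = regexprep(lower(normalized), '[^a-z ]', '');

% Tokenization: split on spaces into a cell array of words.
tokens = strsplit(strtrim(cleaned), ' ');
disp(tokens);
```

Each token would then be passed to the grapheme-to-phoneme stage before speech generation.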
1.8 Speech Synthesis Technology :
Research in the area of speech synthesis has been going on for decades. As we found during our research, numerous models and theories exist for the best way of implementing a speech synthesis system. Although the models seemed intuitive from a high-level perspective, they quickly grew in complexity as we got closer to implementation.
1.9 MATLAB Overview :
MATLAB is widely used in all areas of applied mathematics, in education and research at universities, and in industry. MATLAB stands for MATrix LABoratory, and the software is built up around vectors and matrices. This makes the software particularly useful for linear algebra, but MATLAB is also a great tool for solving algebraic and differential equations and for numerical integration. MATLAB has powerful graphics tools and can produce plots in both 2D and 3D. It is also a programming language, and is one of the easiest programming languages for writing mathematical programs. MATLAB also has toolboxes useful for signal processing, image processing, optimization, etc.
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:
1. Math and computation
2. Algorithm development
3. Modeling, simulation, and prototyping
4. Data analysis, exploration, and visualization
5. Scientific and engineering graphics
6. Application development, including Graphical User Interface building
MATLAB is an interactive system whose basic data element is an
array that does not require dimensioning. This allows you to solve
many technical computing problems, especially those with matrix and
vector formulations, in a fraction of the time it would take to
write a program in a scalar noninteractive language such as C or
Fortran.
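As a small illustration of the matrix and vector formulation described above, the sketch below solves a linear system and squares a vector without any explicit loops or array declarations. The numbers are arbitrary example values.

```matlab
% Arrays need no dimensioning: they are created directly from their values.
A = [4 -2; 1 1];        % 2-by-2 coefficient matrix (example values)
b = [2; 3];             % right-hand side

x = A \ b;              % solve the linear system A*x = b in one statement

v = 1:5;                % row vector 1, 2, 3, 4, 5
squares = v .^ 2;       % element-wise square, no loop required

disp(x);
disp(squares);
```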
Matlab was originally written to provide easy access to matrix
software developed by the LINPACK and EISPACK projects, which
together represent the state-of-the-art in software for matrix
computation. MATLAB has evolved over a period of years with input
from many users. In university environments, it is the standard
instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, Matlab is the
tool of choice for high-productivity research, development, and
analysis.
Matlab features a family of application-specific solutions
called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems.
Areas in which toolboxes are available include signal processing,
control systems, neural networks, fuzzy logic, wavelets,
simulation, and many others.
1.10 History of Matlab :
Cleve Moler, the chairman of the computer science department at the University of New Mexico, started developing MATLAB in the late 1970s. He designed it to give his students access to LINPACK and EISPACK without them having to learn Fortran. It soon spread to other universities and found a strong audience within the applied mathematics community. Jack Little, an engineer, was exposed to it during a visit Moler made to Stanford University in 1983. Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded MathWorks in 1984 to continue its development. These rewritten libraries were known as JACKPAC. In 2000, MATLAB was rewritten to use a newer set of libraries for matrix manipulation, LAPACK. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
1.11 SQL Server Overview :
Generically, a SQL server is any database management system (DBMS) that can respond to queries from client machines formatted in the SQL language. When capitalized, the term generally refers to either of two database management products from Sybase and Microsoft. Both companies offer client-server DBMS products called SQL Server.
1.12 The History of SQL Server :
In the 1970s IBM invented a computer language designed specifically for database queries called SEQUEL, which stood for Structured English Query Language. Over time the language has been extended, so that it is not just a language for queries but can also be used to build databases and manage the security of the database engine. IBM released SEQUEL into the public domain, where it became known as SQL.
Because of this heritage you can pronounce it as "sequel" or spell it out as "S-Q-L" when talking about it. Various versions of SQL are used in today's database engines. Microsoft SQL Server uses a version called Transact-SQL.
Microsoft initially developed SQL Server (a database product
that understands the SQL language) with Sybase Corporation for use
on the IBM OS/2 platform. When Microsoft and IBM split, Microsoft
abandoned OS/2 in favor of its new network operating system,
Windows NT Advanced Server. At that point, Microsoft decided to
further develop the SQL Server engine for Windows NT by itself. The
resulting product was Microsoft SQL Server 4.2, which was updated
to 4.21. After Microsoft and Sybase parted ways, Sybase further
developed its database engine to run on Windows NT (Sybase System
10 and now System 11), and Microsoft developed SQL Server 6.0 and then
SQL Server 6.5, which also ran on top of Windows NT. SQL Server 7.0
now runs on Windows NT as well as on Windows 95 and Windows 98.
Although you can run SQL Server 7.0 on a Windows 9x system, you
do not get all the functionality of SQL Server. When running it on
the Windows 9x platform, you lose the capability to use multiple
processors, Windows NT security, NTFS (New Technology File System)
volumes, and much more. SQL Server 7.0 is therefore better run
on Windows NT than on Windows 9x. Windows NT has other
advantages as well. The NT platform is designed to support multiple
users. Windows 9x is not designed this way, and your SQL Server
performance degrades rapidly as you add more users.
SQL Server 7.0 is implemented as a service on either NT
Workstation or NT Server (which makes it run on the server side of
Windows NT) and as an application on Windows 95/98. The included
utilities, such as the SQL Server Enterprise Manager, operate from
the client side of Windows NT Server or NT Workstation. Of course,
just like all other applications on Windows 9x, the tools run as
applications. A service is an application NT can start when booting up
that adds functionality to the server side of NT. Services also
have a generic application programming interface (API) that can be
controlled programmatically. Threads originating from a service are
automatically given a higher priority than threads originating from
an application.
1.13 SQL Server 2008 R2 :
Microsoft presents SQL Server 2008 R2 as its most advanced, trusted, and scalable data platform at the time of its release. Building on the original SQL Server 2008 release, SQL Server 2008 R2 is aimed at empowering end users through self-service business intelligence (BI), improving efficiency and collaboration between database administrators (DBAs) and application developers, and scaling to accommodate the most demanding data workloads.
CHAPTER TWO
PROJECT ANALYSIS
2.1 The Classification Process :
There are two steps in building a classifier: training and testing. These steps can be broken down further into sub-steps:
1. Training :
a. Pre-processing: processes the data so that it is in a suitable form for use.
b. Feature extraction: reduces the amount of data by extracting relevant information, usually resulting in a vector of scalar values.
c. Model estimation: from the finite set of feature vectors, estimate a model (usually statistical) for each class of the training data.
2. Testing :
a. Pre-processing.
b. Feature extraction (both the same as above).
c. Classification: compare the feature vectors to the various models and find the closest match. A distance measure can be used for this.
Figure 2.1 : The pattern classification process.
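The training and testing steps above can be sketched with a very simple classifier. The sketch assumes made-up two-dimensional feature vectors, uses the class mean as the model, and uses Euclidean distance as the distance measure; none of these choices is stated in the text, and they are only one possible instance of the process.

```matlab
% Training: labelled 2-D feature vectors (made-up values for illustration).
trainX = [1.0 1.2; 0.8 1.0; 1.2 0.9;      % samples of class 1
          4.0 4.1; 3.9 4.3; 4.2 3.8];     % samples of class 2
trainY = [1; 1; 1; 2; 2; 2];

% Model estimation: one mean feature vector per class.
classes = unique(trainY);
models  = zeros(numel(classes), size(trainX, 2));
for c = 1:numel(classes)
    models(c, :) = mean(trainX(trainY == classes(c), :), 1);
end

% Testing: assign the class whose model is closest in Euclidean distance.
testX = [1.1 1.0];
diffs = models - repmat(testX, numel(classes), 1);
dists = sqrt(sum(diffs .^ 2, 2));
[~, idx]  = min(dists);
predicted = classes(idx);
disp(predicted);
```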
2.2 OCR Pre-processing :
These are the pre-processing steps often performed in OCR:
1. Binarization: the input is usually a grayscale image, so binarization is simply a matter of choosing a threshold value.
2. Morphological operators: remove isolated specks and holes in characters; the majority operator can be used.
3. Segmentation: check the connectivity of shapes, then label and isolate them. MATLAB 6.1's bwlabel and regionprops functions can be used. Characters that are not connected cause difficulties, e.g. the letter i, a semicolon, or a colon (; or :).
Segmentation is by far the most important aspect of the pre-processing stage. It allows the recognizer to extract features from each individual character. In the more complicated case of handwritten text, the segmentation problem becomes much more difficult because letters tend to be connected to each other.
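The first two pre-processing steps can be illustrated on a tiny image using only base MATLAB. The pixel values and the fixed threshold of 128 are assumptions for the example, and the neighbour-count rule is a simpler stand-in for the majority operator, not the operator itself.

```matlab
% Tiny grayscale "image": dark ink (low values) on a light background.
gray = [200 200 200 200 200 200;
        200  30  40 200 200 200;
        200  35  25 200  20 200;
        200 200 200 200 200 200];

% Binarization: pixels darker than an assumed fixed threshold become ink (1).
threshold = 128;
bw = gray < threshold;

% Speck removal: drop ink pixels that have no ink neighbour (8-connectivity).
neighbours = conv2(double(bw), [1 1 1; 1 0 1; 1 1 1], 'same');
cleaned = bw & (neighbours > 0);

disp(cleaned);
```

Here the isolated pixel in row 3, column 5 is removed while the 2-by-2 blob is kept. With the Image Processing Toolbox, graythresh could choose the threshold automatically, and bwlabel and regionprops could then label and measure the remaining shapes.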
2.3 OCR Feature extraction :
Given a segmented (isolated) character, the useful features for recognition are:
1. Moment based features :
Think of each binarized character as a two-dimensional distribution f(x, y) over its pixels. In the standard definition, the 2-D moments of the character are:
$$m_{pq} = \sum_{x}\sum_{y} x^{p}\, y^{q}\, f(x, y)$$
From the moments we can compute features like:
a. Total mass (number of pixels in a binarized character)
b. Centroid (center of mass)
c. Elliptical parameters
i. Eccentricity (ratio of major to minor axis)
ii. Orientation (angle of major axis)
d. Skewness
e. Kurtosis
f. Higher order moments
2. Hough and Chain code transform
3. Fourier transform and series
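A few of the moment-based features listed above can be computed directly from the pixel coordinates of a binarized character. The 4-by-5 test shape below is a made-up example, and the orientation formula is the usual one based on second-order central moments, given here as an assumption rather than as the project's own method.

```matlab
% Binarized character: a small solid rectangle (made-up example shape).
bw = logical([0 0 0 0 0;
              0 1 1 1 0;
              0 1 1 1 0;
              0 0 0 0 0]);

[y, x] = find(bw);                  % row index as y, column index as x

% Raw moments m_pq = sum over ink pixels of x^p * y^q.
m00 = numel(x);                     % total mass (number of ink pixels)
m10 = sum(x);
m01 = sum(y);
centroid = [m10 / m00, m01 / m00];  % centre of mass as (x, y)

% Second-order central moments, taken about the centroid.
mu20 = sum((x - centroid(1)) .^ 2);
mu02 = sum((y - centroid(2)) .^ 2);
mu11 = sum((x - centroid(1)) .* (y - centroid(2)));

% Orientation of the major axis in radians (0 means horizontal).
orientation = 0.5 * atan2(2 * mu11, mu20 - mu02);

disp([m00, centroid, orientation]);
```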
2.4 OCR - Model Estimation :
Given labeled sets of features for many characters, where the labels correspond to the particular classes that the characters belong to, we wish to estimate a statistical model for each character class. For example, suppose we compute two features for each realization of the characters 0 through 9. Plotting each character class as a function of the two features we have:
Figure 2.2 : Character classes plotted as a function of two features.
Figure 2.3 : Flowchart of recognizing words.
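One common way to estimate a statistical model per character class is to fit a mean vector and a covariance matrix to the labelled features of each class. The sketch below assumes this Gaussian-style model and made-up feature values for two classes; the text does not say which model the project uses.

```matlab
% Hypothetical labelled features (rows are samples, columns are two features).
features = [1.0 2.0; 1.2 1.9; 0.9 2.2; 1.1 2.1;    % character class '0'
            3.0 0.5; 3.2 0.4; 2.9 0.7; 3.1 0.6];   % character class '1'
labels = ['0'; '0'; '0'; '0'; '1'; '1'; '1'; '1'];

% Model estimation: mean vector and covariance matrix for each class.
classes = unique(labels);
model = struct('mu', {}, 'sigma', {});
for c = 1:numel(classes)
    samples = features(labels == classes(c), :);
    model(c).mu    = mean(samples, 1);
    model(c).sigma = cov(samples);
end

% Score a new feature vector by its squared Mahalanobis distance to each class.
x  = [3.0 0.6];
d2 = zeros(numel(classes), 1);
for c = 1:numel(classes)
    delta = x - model(c).mu;
    d2(c) = delta / model(c).sigma * delta';
end
[~, best]  = min(d2);
recognised = classes(best);
disp(recognised);
```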
Optical Character Recognition deals with the recognition of optically processed characters. Reliably interpreting text from real-world photos is a challenging problem because of variations in environmental factors, even though it becomes easier with the best open source OCR engines.
CHAPTER THREE
PROJECT DESIGN
The project Design with the GUI (Graphical User Interface) :
Figure 3.1 : The main GUI of the project.
Load Image :
Figure 3.2 : Loading an image from computer into the
application.
The matlab code :
[filename, pathname] = uigetfile({'*.jpg';'*.bmp';'*.gif';'*.tif'}, 'Pick an Image File');
if isequal(filename, 0)
    warndlg('You did not select any file');   % no file was selected
    return;
end
img = imread(fullfile(pathname, filename));
h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h)
set(handles.btnConvert, 'Enable', 'on');
set(handles.path, 'Enable', 'on');
set(handles.imageInfo, 'Enable', 'on');
set(handles.img_display, 'Visible', 'on');
set(handles.text1, 'String', filename);
set(handles.text1, 'FontSize', 14);
set(handles.path, 'String', pathname);
axes(handles.img_display);
imagesc(img);
address = cat(2, pathname, filename);
imagen = imread(address);
% Show image
imshow(imagen);
Recognize Text :
In Folder " letters_numbers"
Figure 3.3 : Recognize text pattern.
Create Templates :
% CREATE TEMPLATES
clc; close all;
folder = 'letters_numbers';
% The template order must match the index-to-character mapping in read_letter:
% A-Z (.bmp), then the digits 1-9 and 0 (.bmp), then a-z (.png).
names = [cellstr(('A':'Z')'); cellstr(('1':'9')'); {'0'}; cellstr(('a':'z')')];
ext = [repmat({'.bmp'}, 36, 1); repmat({'.png'}, 26, 1)];
templates = cell(1, numel(names));
for n = 1:numel(names)
    % Each template image is expected to be 42 x 24 pixels
    templates{n} = imread(fullfile(folder, [names{n} ext{n}]));
end
save('templates', 'templates')
clear all
Read Letter :
% function read_letter
function letter = read_letter(imagn, num_letras)
% Computes the correlation between each template and the input image;
% the output is the character of the best-matching template.
% Size of 'imagn' must be 42 x 24 pixels
% Example:
%   imagn = imread('D.bmp');
%   letter = read_letter(imagn, num_letras)
global templates
comp = zeros(1, num_letras);
for n = 1:num_letras
    comp(n) = corr2(templates{1,n}, imagn);
end
vd = find(comp == max(comp));
% Template order: A-Z, then 1-9 and 0, then a-z
chars = ['A':'Z', '1':'9', '0', 'a':'z'];
if isscalar(vd)
    letter = chars(vd);
else
    letter = 'l';   % fallback when there is no unique best match
end
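The matching step in read_letter relies on the two-dimensional correlation coefficient that corr2 computes. The sketch below writes that coefficient out with base MATLAB functions and applies it to two tiny 3-by-3 templates; the template shapes, the variable names, and the input are made up for illustration and are much smaller than the project's 42 x 24 templates.

```matlab
% 2-D correlation coefficient between two equally sized matrices,
% written out explicitly (the same quantity that corr2 returns).
corr2d = @(a, b) sum(sum((a - mean(a(:))) .* (b - mean(b(:))))) / ...
    sqrt(sum(sum((a - mean(a(:))) .^ 2)) * sum(sum((b - mean(b(:))) .^ 2)));

% Two hypothetical templates and a slightly noisy input character.
plusShape = [0 1 0; 1 1 1; 0 1 0];
ellShape  = [1 0 0; 1 0 0; 1 1 1];
candidates = {plusShape, ellShape};
input = [0 1 0; 1 1 1; 0 1 1];      % plus shape with one extra pixel

% Correlate the input with every template and keep the best match.
scores = zeros(1, numel(candidates));
for n = 1:numel(candidates)
    scores(n) = corr2d(candidates{n}, input);
end
[~, best] = max(scores);
disp(scores);
disp(best);
```

The input correlates positively with the plus shape and negatively with the other template, so index 1 is selected.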
Letter Crop :
% function letter_in_a_line
function [fl, re, space] = letter_crop(im_texto)
% Divide a line into letters
% im_texto->input line image; fl->first letter; re->remaining line;
% space->width of the gap between the first letter and the next one
im_texto = clip(im_texto);
num_filas = size(im_texto, 2);   % number of columns in the line
for s = 1:num_filas
    if sum(im_texto(:,s)) == 0        % empty column: end of the first letter
        nm = im_texto(:, 1:s-1);      % First letter matrix
        rm = im_texto(:, s:end);      % Remaining line matrix
        fl = clip(nm);
        re = clip(rm);
        space = size(rm,2) - size(re,2);
        %*-*-*Uncomment lines below to see the result*-
        %subplot(2,1,1);imshow(fl);
        %subplot(2,1,2);imshow(re);
        break
    else
        fl = im_texto;   % Only one letter.
        re = [ ];
        space = 0;
    end
end

function img_out = clip(img_in)
[f, c] = find(img_in);
img_out = img_in(min(f):max(f), min(c):max(c));   % Crops image
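The column test used by letter_crop (an all-zero column marks the end of a letter) can also be seen in a vectorized form. This sketch is an alternative formulation on a made-up 2-by-8 binary line, not the project's loop; the variable names are hypothetical.

```matlab
% Binary image of one text line: three "letters" separated by empty columns.
textLine = logical([1 1 0 0 1 0 1 1;
                    1 0 0 0 1 0 0 1]);

% Column projection: a column belongs to a letter if it contains any ink.
isInk = sum(textLine, 1) > 0;

% Find where runs of ink columns start and end.
d = diff([0 isInk 0]);
starts = find(d == 1);              % first column of each letter
stops  = find(d == -1) - 1;         % last column of each letter

% Width of the gap after each letter (used later to detect word spaces).
gaps = starts(2:end) - stops(1:end-1) - 1;

% Cut out the individual letters.
letters = cell(1, numel(starts));
for k = 1:numel(starts)
    letters{k} = textLine(:, starts(k):stops(k));
end
disp([starts; stops]);
disp(gaps);
```

The same idea applied to row sums separates a page into text lines.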
Lines Crop :
function [fl, re] = lines(im_texto)
% Divide text in lines
% im_texto->input image; fl->first line; re->remain line
% Example:
%   im_texto = imread('TEST_3.jpg');
%   [fl, re] = lines(im_texto);
%   subplot(3,1,1);imshow(im_texto);title('INPUT IMAGE')
%   subplot(3,1,2);imshow(fl);title('FIRST LINE')
%   subplot(3,1,3);imshow(re);title('REMAIN LINES')
im_texto = clip(im_texto);
num_filas = size(im_texto, 1);   % number of rows in the image
for s = 1:num_filas
    if sum(im_texto(s,:)) == 0        % empty row: end of the first line
        nm = im_texto(1:s-1, :);      % First line matrix
        rm = im_texto(s:end, :);      % Remain line matrix
        fl = clip(nm);
        re = clip(rm);
        %*-*-*Uncomment lines below to see the result*-*-*-*-
        % subplot(2,1,1);imshow(fl);
        % subplot(2,1,2);imshow(re);
        break
    else
        fl = im_texto;   % Only one line.
        re = [ ];
    end
end

function img_out = clip(img_in)
[f, c] = find(img_in);
img_out = img_in(min(f):max(f), min(c):max(c));   % Crops image
Figure 3.4 : Recognize text in the project.
% --- Executes on button press in btnConvert.
function btnConvert_Callback(hObject, eventdata, handles)
% hObject    handle to btnConvert (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
pathname = get(handles.path, 'String');
filename = get(handles.text1, 'String');
address = cat(2, pathname, filename);
imagen = imread(address);
% Convert to gray scale
if size(imagen,3) == 3   % RGB image
    imagen = rgb2gray(imagen);
end
% Convert to BW
threshold = graythresh(imagen);
imagen = ~im2bw(imagen, threshold);
% Remove all objects containing fewer than 30 pixels
imagen = bwareaopen(imagen, 30);
% Storage matrix word from image
word = [ ];
re = imagen;
text = '';
% Load templates (the global is declared before the file is loaded into it)
global templates
load templates
% Compute the number of letters in template file
num_letras = size(templates, 2);
while 1
    % Fcn 'lines' separates lines in text
    [fl, re] = lines(re);
    n = 0;
    % Uncomment line below to see lines one by one
    %figure,imshow(fl);pause(2)
    spacevector = [];   % spaces between adjacent letters
    rc = fl;
    while 1
        % Fcn 'letter_crop' separates letters in a line
        % fc = first letter in the line
        % rc = remaining cropped line
        % space = space between the cropped letter and the next letter
        [fc, rc, space] = letter_crop(rc);
        % Uncomment line below to see letters one by one
        %figure,imshow(fc);pause(0.5)
        img_r = imresize(fc, [42 24]);   % resize letter so that correlation can be performed
        n = n + 1;
        spacevector(n) = space;
        % Fcn 'read_letter' correlates the cropped letter with the images
        % given in the folder 'letters_numbers'
        letter = read_letter(img_r, num_letras);
        % Letter concatenation
        word = [word letter];
        if isempty(rc)   % breaks loop when there are no more characters
            break;
        end
    end
    max_space = max(spacevector);
    no_spaces = 0;
    for x = 1:n   % loop to introduce space at requisite locations
        if spacevector(x+no_spaces) > (0.75 * max_space)
            no_spaces = no_spaces + 1;
            for m = x:n
                word(n+x-m+no_spaces) = word(n+x-m+no_spaces-1);
            end
            word(x+no_spaces) = ' ';
            spacevector = [0 spacevector];
        end
    end
    %fprintf(fid,'%s\n',lower(word));% Write 'word' in text file (lower)
    %fprintf(fid,'%s\n',word);% Write 'word' in text file (upper)
    text = char(text, word);
    % Clear 'word' variable
    word = [ ];
    % When the sentences finish, breaks the loop
    if isempty(re)   % See variable 're' in Fcn 'lines'
        break
    end
end
h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h)
set(handles.text2, 'String', text);
set(handles.text2, 'FontSize', 24);
set(handles.Speak, 'Enable', 'on');
guidata(hObject, handles);
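The word-spacing rule in the callback above (a gap wider than 0.75 of the largest gap in the line is treated as a word break) can be shown in isolation. The letters and gap widths below are made-up values, and the loop is a simplified restatement of the rule rather than the in-place shifting used in the callback.

```matlab
% Recognised letters of one line and the pixel gap measured after each letter.
letters = 'HELLOWORLD';
gaps    = [2 2 2 2 9 2 2 2 2 0];    % made-up gap widths, one per letter

% A gap wider than 0.75 of the largest gap in the line marks a word break.
isWordBreak = gaps > 0.75 * max(gaps);

% Rebuild the line, adding a space after each letter followed by a wide gap.
out = '';
for k = 1:numel(letters)
    out = [out letters(k)]; %#ok<AGROW>
    if isWordBreak(k)
        out = [out ' ']; %#ok<AGROW>
    end
end
disp(out);
```

Because the threshold is relative to the widest gap in the line, a line that contains a single word would still get at least one space under this rule.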
Save to NotePad :
Figure 3.5 : Save to Notepad file format.
% --- Executes on button press in btnOpen.
function btnOpen_Callback(hObject, eventdata, handles)
% hObject    handle to btnOpen (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
value = get(handles.text2, 'String');
setappdata(0, 'txt', value)
file_fig();
Figure 3.6 : Saving a text file.
% --- Executes on button press in btnOk.
function btnOk_Callback(hObject, eventdata, handles)
% hObject    handle to btnOk (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Opens the chosen .txt file for writing
fname = get(handles.edit_name, 'String');
filename = strcat(fname, '.txt');
pathname = get(handles.edit_location, 'String');
filepath = fullfile(pathname, filename);
if isequal(exist(filepath, 'file'), 2)
    button = questdlg('File name already exists', ...
        'Warning', 'Override', 'Cancel', 'Cancel');
    switch button
        case 'Override'
            fid = fopen(filepath, 'wt');
        otherwise   % 'Cancel' was chosen or the dialog was closed
            return;
    end
else
    fid = fopen(filepath, 'wt');
end
h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h)
txt = getappdata(0, 'txt');
rmappdata(0, 'txt');
nRows = size(txt, 1);
stxt = '';
if nRows > 1
    for k = 1:nRows
        fprintf(fid, '%s\n', txt(k,:));   % Write each row of text to the file
        stxt = strcat(stxt, 32, txt(k,:), 10);
    end
else
    fprintf(fid, '%s\n', txt);
    stxt = txt;
end
fclose(fid);
date1 = date;
decr = get(handles.edit_note, 'String');
if strcmp(decr, 'Write Note here ...')
    decr = NaN;
end
% Record the saved file in the database table File_Data
columns = {'id','name_file','text','path_file','time','note'};
data1 = {handles.lastid fname stxt pathname date1 decr};
conn = database('dbFiles', 'sa', '123');
insert(conn, 'File_Data', columns, data1);
close(conn)
% Update handles structure
guidata(hObject, handles);
% Open the saved text file
winopen(filepath)
close
Figure 3.7 : Edited text in a Notepad file format.
Load Text File :
Figure 3.8 : Loading a text file (Notepad file format).
% --- Executes on button press in load.
function load_Callback(hObject, eventdata, handles)
% hObject    handle to load (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
[filename, pathname] = uigetfile('*.txt;', 'select txt file');
if isequal(filename, 0)
    return;   % the user cancelled the dialog
end
filepath = fullfile(pathname, filename);

h = waitbar(0, 'Please wait...');
steps = 100;
for step = 1:steps
    % computations take place here
    waitbar(step / steps)
end
close(h);

% Preassign txt to some large cell array
txt = cell(10000, 1);
sizS = 10000;
lineCt = 1;
fid = fopen(filepath, 'r');
tline = fgetl(fid);
while ischar(tline)
    txt{lineCt} = tline;
    lineCt = lineCt + 1;
    % Grow txt if necessary
    if lineCt > sizS
        txt = [txt; cell(10000, 1)];
        sizS = sizS + 10000;
    end
    tline = fgetl(fid);
end
% Remove empty entries in txt
txt(lineCt:end) = [];
set(handles.text2, 'String', txt)
set(handles.Speak, 'Enable', 'on')
fclose(fid)
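As an alternative to the fgetl loop above, a file that fits in memory can be read in one call and split into lines. This is only a sketch under that assumption; the temporary file and its contents are made up for the example and are not part of the project.

```matlab
% Create a small temporary file so the sketch is self-contained.
filepath = [tempname, '.txt'];
fid = fopen(filepath, 'wt');
fprintf(fid, 'alpha\nbeta\n\ngamma\n');
fclose(fid);

% Read the whole file at once, then split on line breaks
% (either \n or \r\n).
raw = fileread(filepath);
txt = regexp(raw, '\r?\n', 'split')';

% A trailing line break leaves one empty entry at the end; drop it.
if ~isempty(txt) && isempty(txt{end})
    txt(end) = [];
end
delete(filepath);
```

The result is a column cell array with one line per cell, which is the same shape the callback passes to set(handles.text2, 'String', txt). Blank lines inside the file are kept as empty cells.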
Loading file in edit tool :
Figure 3.9 : Loading a Notepad text file in the edit tool.
Text To Speech :
% --- Executes on button press in Speak.
function Speak_Callback(hObject, eventdata, handles)
% hObject    handle to Speak (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
text = get(handles.text2, 'String');
if isempty(text)
    text = 'Write something to speak';
end
% Count the rows after the fallback text is set, so it is spoken too
nRows = size(text, 1);
try
    % Use the .NET speech synthesizer (Windows only)
    NET.addAssembly('System.Speech');
    Speaker = System.Speech.Synthesis.SpeechSynthesizer;
    for n = 1:nRows
        rwtxt = text(n,:);
        if ~isa(rwtxt, 'cell')
            rwtxt = {rwtxt};
        end
        for k = 1:length(rwtxt)
            Speaker.Speak(rwtxt{k});
        end
    end
catch
    warning('Not working !!');
end
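The callback above has to cope with text that arrives either as a char matrix (typed in the editor) or as a cell array (loaded from a file). A minimal sketch of normalizing both cases before speaking is given below. The sample text and variable names are assumptions made for illustration, and the speech call is only attempted on Windows because System.Speech is a .NET assembly.

```matlab
% Hypothetical input: the String property of an edit box can be a
% char row, a padded char matrix, or a cell array of lines.
text = ['Hello   '; 'world ok'];

% Normalize every case to a cell array with one line per cell.
if ischar(text)
    textLines = cellstr(text);   % also strips trailing padding
else
    textLines = text(:);
end

% Skip blank lines so the synthesizer is not called with empty input.
textLines = textLines(~cellfun(@isempty, strtrim(textLines)));

% Speak line by line on Windows; elsewhere only prepare the lines.
if ispc
    try
        NET.addAssembly('System.Speech');
        speaker = System.Speech.Synthesis.SpeechSynthesizer;
        for k = 1:numel(textLines)
            speaker.Speak(textLines{k});
        end
    catch err
        warning('Speech synthesis failed: %s', err.message);
    end
end
```

Reporting err.message in the warning makes it easier to tell a missing assembly from an audio device problem.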
Database Design (using SQL Server 2008 R) :
Table Name : File_Data :
Figure 3.10 : File data.
Some Data in a Table :
Figure 3.11 : Some data in a database table.
Microsoft SQL Server ODBC in MATLAB for Windows :
Figure 3.12 : Database explorer in MATLAB.
List of Text in Database :
Figure 3.13 : List of text in the database.
On Opening Form :
% --- Executes just before list_files is made visible.
function list_files_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to list_files (see VARARGIN)
handles.edit = 0;

% Fetch all saved files from the database as a cell array
conn = database('dbFiles', 'sa', '123');
curs = exec(conn, 'select * from File_Data');
setdbprefs('DataReturnFormat', 'cellarray')
curs = fetch(curs);
a = curs.Data;

% curs.Data is a cell array, so compare its first element
if ~isequal('No Data', a{1})
    % Show the file names and select the first record
    set(handles.listbox1, 'String', a(:,2))
    set(handles.listbox1, 'Value', 1)
    set(handles.edit_id, 'String', a(1,1))
    set(handles.edit_name, 'String', a(1,2))
    set(handles.edit_date, 'String', a(1,5))
    set(handles.edit_location, 'String', a(1,4))
    set(handles.edit_text, 'String', a(1,3))
    if isempty(a{1,6})
        set(handles.edit_note, 'String', 'There is no note');
    else
        set(handles.edit_note, 'String', a(1,6));
    end
end
close(curs)
close(conn)

% Choose default command line output for list_files
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);
Open File :
Figure 3.14 : Open file by using notepad file format.
% --- Executes on button press in btn_open.
function btn_open_Callback(hObject, eventdata, handles)
% hObject    handle to btn_open (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
id = get(handles.edit_id, 'String');
if ~isempty(id)
    fname = get(handles.edit_name, 'String');
    fname = strcat(fname, '.txt');
    pathname = get(handles.edit_location, 'String');
    filepath = fullfile(pathname, fname);
    txt = get(handles.edit_text, 'String');
    ee = exist(filepath{1}, 'file');
    if isequal(ee, 2)
        winopen(filepath{1})
    else
        % The file recorded in the database is missing on disk
        button = questdlg(['File has been damaged or its location ', ...
            'has changed. ', char(10), 'What do you want to do?'], ...
            'Warning', 'Create', 'Delete', 'Cancel', 'Cancel');
        switch button
            case 'Create'
                % Recreate the file from the text stored in the database
                fid = fopen(filepath{1}, 'wt');
                nRows = size(txt, 1);
                for k = 1:nRows
                    fprintf(fid, '%s\n', txt{k,:});
                end
                fclose(fid);
                winopen(filepath{1});
            case 'Delete'
                button = questdlg('Are you sure you want to delete?', ...
                    'Warning', 'OK', 'Cancel', 'Cancel');
                switch button
                    case 'OK'
                        btn_del_Callback(hObject, eventdata, handles);
                    case 'Cancel'
                        return;
                end
            case 'Cancel'
                return;
        end
    end
end
Edit :
Figure 3.15 : Edited text in notepad file.
% --- Executes on button press in pushbutton5.
function btn_edit_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
id = get(handles.edit_id, 'String');
if ~isempty(id)
    handles.edit = handles.edit + 1;
    if handles.edit == 1
        % First click: make the note field editable
        set(handles.edit_note, 'Enable', 'on')
        set(handles.edit_note, 'BackgroundColor', [1.0 1.0 1.0]);
    else
        % Second click: save the note and lock the field again
        handles.edit = 0;
        conn = database('dbFiles', 'sa', '123');
        edit_txt = get(handles.edit_note, 'String');
        if ~isequal(edit_txt, 'There is no note')
            whereclause = strcat('where id=', id);
            update(conn, 'File_Data', {'note'}, {edit_txt}, whereclause)
            set(handles.edit_note, 'Enable', 'inactive')
            set(handles.edit_note, 'BackgroundColor', [0.961 0.976 0.992]);
            helpdlg('Update is done', 'Update')
        else
            set(handles.edit_note, 'Enable', 'inactive')
            set(handles.edit_note, 'BackgroundColor', [0.961 0.976 0.992]);
        end
        close(conn)
    end
    % Update handles structure
    guidata(hObject, handles);
end
Delete From Database :
% --- Executes on button press in btn_del.
function btn_del_Callback(hObject, eventdata, handles)
% hObject    handle to btn_del (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
id = get(handles.edit_id, 'String');
if ~isempty(id)
    button = questdlg('Are you sure you want to delete?', ...
        'Warning', 'OK', 'Cancel', 'Cancel');
    switch button
        case 'OK'
            % Delete the selected record, then reload the table
            id = get(handles.edit_id, 'String');
            query = strcat('delete from File_Data where id=', id);
            conn = database('dbFiles', 'sa', '123');
            curs = exec(conn, query{1});
            curs = exec(conn, 'select * from File_Data');
            setdbprefs('DataReturnFormat', 'cellarray')
            curs = fetch(curs);
            a = curs.Data;
            if ~isequal('No Data', a{1})
                % Show the remaining files and select the first one
                set(handles.listbox1, 'String', a(:,2))
                set(handles.listbox1, 'Value', 1)
                set(handles.edit_id, 'String', a(1,1))
                set(handles.edit_name, 'String', a(1,2))
                set(handles.edit_date, 'String', a(1,5))
                set(handles.edit_location, 'String', a(1,4))
                set(handles.edit_text, 'String', a(1,3))
                if isempty(a{1,6})
                    set(handles.edit_note, 'String', 'There is no note');
                else
                    set(handles.edit_note, 'String', a(1,6));
                end
            else
                % The table is now empty: clear every field
                set(handles.listbox1, 'String', '')
                set(handles.edit_id, 'String', '')
                set(handles.edit_name, 'String', '')
                set(handles.edit_date, 'String', '')
                set(handles.edit_location, 'String', '')
                set(handles.edit_text, 'String', '')
                set(handles.edit_note, 'String', '')
            end
            close(curs)
            close(conn)
            helpdlg('Delete is done', 'Delete')
        case 'Cancel'
            return;
    end
end
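The delete callback builds its SQL statement by concatenating the contents of the id field. Since that value comes from a GUI control, it may be worth checking that it is a plain integer before it reaches the database. The sketch below shows one possible check. The sample id and the variable names are assumptions, and no database connection is opened.

```matlab
% Hypothetical value, as returned by get(handles.edit_id, 'String')
% when the String property was set from a cell.
id = {'12'};

% Unwrap the cell and convert to a number. str2double returns NaN
% for text that is not a plain numeric string.
if iscell(id)
    id = id{1};
end
if ischar(id)
    idNum = str2double(id);
else
    idNum = double(id);
end

% Build the statement only when the id is a non-negative integer.
if isscalar(idNum) && isfinite(idNum) && idNum >= 0 && idNum == floor(idNum)
    query = sprintf('delete from File_Data where id=%d', idNum);
else
    query = '';
end
```

An empty query then signals that the callback should stop instead of calling exec.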
List of files :
Figure 3.16 : List of files.
% --- Executes on button press in btn_speak.
function btn_speak_Callback(hObject, eventdata, handles)
% hObject    handle to btn_speak (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
text = get(handles.edit_text, 'String');
if ~isempty(text)
    % Hand the text over to the main form and switch to it
    value = get(handles.edit_text, 'String');
    setappdata(0, 'text', value)
    close()
    ocr_gui()
end
Return to the main Form with the text :
Figure 3.17 : Returning to the main form with the text.
function ocr_gui_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to ocr_gui (see VARARGIN)

% Choose default command line output for ocr_gui
handles.output = hObject;

% Load text handed over by the list_files form, if there is any
text = getappdata(0, 'text');
if ~isempty(text)
    set(handles.text2, 'String', text)
    set(handles.Speak, 'Enable', 'on')
    rmappdata(0, 'text');
end

% Update handles structure
guidata(hObject, handles);
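The two forms exchange text through application data stored on the root object (handle 0). The hand-over can be sketched outside any GUI as follows. The key name 'text' matches the callbacks above, while the sample value and the variable names received and stillThere are assumptions made for illustration.

```matlab
% Sender side: store the text under a key on the root object (0).
setappdata(0, 'text', 'recognized text');

% Receiver side: check that the key exists before reading it, then
% remove it so that a later launch does not pick up stale data.
received = '';
if isappdata(0, 'text')
    received = getappdata(0, 'text');
    rmappdata(0, 'text');
end

% After the hand-over the key is gone.
stillThere = isappdata(0, 'text');
```

Testing with isappdata before rmappdata avoids the error that rmappdata raises when the key does not exist.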
CHAPTER FOUR
IMPLEMENTATION
4.1 Project Implementation :
1. Load an image in any supported format (bmp, jpg, png, etc.).
Figure 4.1 : Loading an image into the program.
2. The image is loaded and displayed.
Figure 4.2 : Viewing the image in the program.
3. View the image information by clicking the button called Image
Info.
Figure 4.3 : Viewing the image information.
4. Convert the image to grayscale and binarize it using a threshold
value computed with the Otsu algorithm.
5. Page layout analysis. In this step we try to identify the text
zones present in the image, so that only those zones are used for
recognition and the rest of the image is left out.
6. Line detection and removal.
7. Detection of text lines and words. Here we also need to take
care of different font sizes and small spaces between words.
8. Recognition of characters. This is the main algorithm of OCR: the
image of every character must be converted to the appropriate
character code. Sometimes this algorithm produces several candidate
character codes for an uncertain image. For instance, recognition of
the image of the character "I" can produce the codes "I", "|", "1"
and "l", and the final character code is selected later.
9. Click Recognize Text to get the text.
Figure 4.4 : Recognizing text.
10. Save the results in the selected output format, for instance a
searchable TXT file. The name, text, location, path and note of the
TXT file are stored directly in the database.
11. Clicking Save to Notepad opens a form for entering the name and
location of the file (Browse).
Figure 4.5 : Saving text in a notepad file.
12. Click OK to save the file and open it. If a file with the same
name already exists in the selected location, a message asks whether
you want to overwrite it or cancel and rename the file.
Figure 4.6 : Warning message for an existing file name.
Figure 4.7 : Opening a file in notepad.
13. Import text into the editor to be edited, read, and converted
into voice (text-to-speech conversion).
Figure 4.8 : The pattern classification process.
When you select the file, its text contents are loaded into the
edit text :
Figure 4.9 : Loading the contents of the file into the edit text.
14. Use the database to view the recent documents that have been
saved by this program.
Figure 4.10 : Viewing the recent document using the database.
15. Open the text you have saved in the database in Notepad.
Figure 4.11 : Opening the text of notepad file using database.
16. You can edit the note.
Figure 4.12 : Editing in the notepad file.
Figure 4.13 : Updating the editing.
17. You can click Speak to load the text in the main form.
18. You can also delete a file from the list.
Figure 4.14 : Warning message of deleting file from list.
Figure 4.15 : Delete done message.
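Step 4 above binarizes the image with a threshold chosen by the Otsu algorithm. The report does not list that code, so the following is only a sketch of the technique in base MATLAB, run on a synthetic image that is assumed for illustration. The Image Processing Toolbox function graythresh computes an Otsu threshold directly, and the variable names here (gray, sigmaB, level, bw) are not taken from the project.

```matlab
% Synthetic 8-bit grayscale image (an assumption for this sketch):
% a dark background (value 40) with a bright block (value 200).
gray = uint8(40 * ones(20, 20));
gray(5:14, 5:14) = 200;

% Histogram of the 256 gray levels, normalized to probabilities.
counts = accumarray(double(gray(:)) + 1, 1, [256 1]);
p = counts / sum(counts);

% Cumulative class probability and cumulative mean for every
% candidate threshold t = 0..255.
omega = cumsum(p);
mu = cumsum(p .* (0:255)');
muTotal = mu(end);

% Between-class variance. Candidates that leave one class empty
% give 0/0 and are set to zero.
sigmaB = (muTotal * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB(~isfinite(sigmaB)) = 0;

% The Otsu threshold maximizes the between-class variance.
[~, idx] = max(sigmaB);
level = idx - 1;

% Pixels brighter than the threshold become foreground (true).
bw = gray > level;
```

For dark text on a light page, the foreground would be taken as gray <= level instead.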
Conclusion :
In this project, we discussed the topics relevant to the
development of TTS systems. We conducted mean opinion score (MOS)
tests to evaluate the performance of the speech synthesizer. This
paper describes the successful completion of a simple text-to-speech
conversion using simple matrix operations. The system is therefore
easy and efficient to implement, unlike other methods that involve
many complex algorithms. The next step in improving this system
would be to implement machine learning algorithms in order to
support generalization.
Suggestions for Future Work :
A number of open problems must be solved to allow the development
of a true image and text to speech conversion and recognition
system. These problems suggest a variety of research directions that
need to be pursued to make such a system feasible. First, we will
add another feature to our project, speech-to-text conversion.
Second, saving audio files in different audio file formats (WAV,
MP3, VOX, RAW, etc.) with the help of database programs. Third,
opening an audio file and obtaining the speech-to-text conversion of
this file. Fourth, making the application able to open text in
different text file formats (pdf, docx, etc.). Fifth, saving text
files in different text file formats (pdf, docx, etc.) with the help
of database programs. Finally, we are interested in making our
project more efficient, making it useful to different groups of
people in the community, and spreading its features globally.