PROJECT REPORT
ON
Design & Implementation of a Person Authenticating & Commands Following Robot
Submitted
In Partial Fulfillment of the Requirements
for the Degree of
BACHELOR OF TECHNOLOGY
Under the able guidance of
Dr. Haranath Kar

By
Aditya Agarwal (2002519)
Subhayan Banerjee (2003516)
Nilesh Goel (2003560)
Chandra Veer Singh (2002577)
Department of Electronics and Communication Engineering
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD 211004, INDIA
MAY 2007
Abstract
In this project, algorithms for speech recognition and face recognition have been developed and implemented in MATLAB 7.0.1. These algorithms can be used in any security system in which person authentication is required. A security system built on these two algorithms first recognizes the person to be authenticated using the face recognition algorithm and, after proper authentication, follows the commands of the person using the speech recognition algorithm. The speech recognition algorithm is based on speech templates, while the face recognition algorithm uses Fourier descriptors for identification. The proposed algorithms are simpler, faster and more economical than previously reported algorithms, and they can be readily implemented on DSP kits (Texas Instruments or Analog Devices) to develop an autonomous wireless security system. This security system has been mounted on SODDRO (Sound Direction Detection Robot), developed by the same group in November 2006.
Acknowledgments
We take this opportunity to express our deep sense of gratitude and regard to Dr. Haranath Kar, Asst. Professor, Department of Electronics and Communication Engineering, MNNIT, Allahabad, for the continuous encouragement and able guidance we needed to complete this project.
We are indebted to Dr. T. N. Sharma, Dr. Sudarshan Tiwari, Mr. Asim Mukherji, Mr. Arvind Kumar and Mr. Rajeev Gupta of MNNIT, Allahabad, for their valuable comments and suggestions, which have helped us make this project a success. The valuable and fruitful discussions with them were of immense help, without which it would have been difficult to present this robot in its present form.
We also wish to thank Mrs. Vijya Bhadauria, Project In-charge, Dr. V. K. Srivastava and Romesh Nandwana (B.Tech 2nd Yr., ECE) for their very kind support throughout the project. Finally, we are grateful to P. P. Singh, staff, Project and PCB Lab; Chandra Vali Tiwari and Ram Sajivan, staff, Basic Electronics Lab; Ram ji, staff, Computer Architecture Lab of MNNIT, Allahabad; and the administration of MNNIT, Allahabad, for providing the help required.
Aditya Agarwal
Subhayan Banerjee
Nilesh Goel
Chandra Veer Singh
Certificate
TO WHOM IT MAY CONCERN
This is to certify that the project titled
“Design & Implementation of a Person Authenticating & Commands Following Robot”
submitted by:
1. Aditya Agarwal
2. Subhayan Banerjee
3. Nilesh Goel
4. Chandra Veer Singh
of B.Tech 8th semester, Electronics & Communication Engineering, in partial fulfillment of the requirements for the degree of Bachelor of Technology in Electronics & Communication Engineering, MNNIT (Deemed University), Allahabad, during the academic year 2006-07, is their original endeavor, carried out under my supervision and guidance, and has not been presented anywhere else.
Dr. Haranath Kar
Department of Electronics and Communication Engineering
Motilal Nehru National Institute of Technology Allahabad
Allahabad 211004
05 MAY 2007
Table of Contents
Abstract
Acknowledgments
Certificate
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Purpose of This Document
Chapter 2: Algorithm for Face Recognition
Chapter 3: Algorithm for Speech Recognition
Chapter 4: System Description & Hardware Implementation
  4.1a Person to be Authenticated
  4.1b Web Camera
  4.1c Image Acquisition and Processing Toolbox
  4.2a Voice Commands
  4.2b Microphone
  4.2c Sampler
  4.2d Band Pass Filter
  4.2e Processing and Decision Making Unit
  4.2f Microcontroller & Motor Controller Unit
  4.2g Mechanical Assembly
  4.3 List of Components
Chapter 5: Software Section
  5.1 MATLAB Code for Face Recognition
  5.2 MATLAB Code for Speech Recognition
  5.3 Assembly Code for Sound Detection
Chapter 6: Results
Chapter 7: Summary & Conclusion
  7.1 Summary
  7.2 Conclusion
Chapter 8: Future Scope
References
Appendix A
List of Tables
6.1 Table for Results of Face Recognition
6.2 Table for Results of Speech Recognition
List of Figures
2.1 A Simple Binary Image
2.2 Result of Structuring Element on Fig. 2.1
4.1 Block Diagram of Face Recognition System
4.2 Block Diagram of Speech Recognition System
4.2a Microcontroller & Motor Controller Unit
4.2b Bottom View of Mechanical Assembly
Chapter 1
Introduction
Speech recognition and face recognition are two important areas that have drawn the attention of many researchers in recent years. Face recognition in a real-time application with sufficient efficacy is still a challenge, keeping in mind the constraints imposed by memory availability and processing time. Here a two-dimensional approach to face recognition is introduced. In the proposed algorithm, the face is first detected in the image using edge detection techniques and then recognized with the help of Fourier descriptors. The main advantage of using Fourier descriptors is that they are invariant to translation, rotation and scaling of the observed object. For speech recognition, speech templates are used, which basically depend upon the intensity and the accent of the speech.
The speech recognition and face recognition modules are the most important stages of any humanoid robot that needs proper authentication of a person before following any instruction. Apart from humanoid robots, the proposed algorithms can also be used in various real-time industrial applications.
The rest of the report is organized as follows. The algorithm for face recognition is described in Chapter 2. In Chapter 3, an algorithm for reliable recognition of speech is proposed. Chapter 4 covers the system description and hardware implementation. The software section is given in Chapter 5. Results are presented in Chapter 6. The summary and conclusion are given in Chapter 7. Chapter 8 deals with the future scope, and references are given at the end.
1.1 Purpose of this Document
This project report has been prepared as part of the B.Tech final-year project in the Electronics and Communication Engineering Department, MNNIT, Allahabad. Its purpose is to give a detailed description of the algorithms used for speech recognition and face recognition, the hardware for speech recognition, and the software programs used for the development of the security system, which first authenticates a person using face recognition and, after proper authentication, follows the predefined commands given by that person.
Chapter 2
Algorithm for Face Recognition
This algorithm consists of both face detection and face recognition. First, the edge of the face is detected using a morphological boundary-extraction algorithm. The image is converted into a binary image, and then erosion is performed using a structuring element of 1's of suitable dimensions (generally 5 × 5).
The edge of the face image can be obtained by first eroding the binary image A by a structuring element B and then taking the set difference between A and its erosion. If the edge of A is denoted by E(A), then

E(A) = A − (A ⊖ B)

where A ⊖ B denotes the erosion of image A by structuring element B.
Applying this 5 × 5 structuring element to Fig. 2.1 results in an edge between 2 and 3 pixels thick, as shown in Fig. 2.2.
Figure 2.1: A simple binary image.
Figure 2.2: Result of applying the structuring element to Fig. 2.1.
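The boundary-extraction step above can be sketched in a few lines. This is an illustrative Python/NumPy sketch, not the report's MATLAB code; the small test image and the 3 × 3 structuring element are assumptions chosen for brevity.

```python
import numpy as np

def erode(A, B):
    """Binary erosion of image A by structuring element B (origin at center).
    A pixel survives only if every 1 of B fits inside A at that position."""
    m, n = B.shape
    pad = np.zeros((A.shape[0] + m - 1, A.shape[1] + n - 1), dtype=bool)
    pad[m // 2:m // 2 + A.shape[0], n // 2:n // 2 + A.shape[1]] = A
    out = np.ones_like(A)
    for i in range(m):
        for j in range(n):
            if B[i, j]:
                out &= pad[i:i + A.shape[0], j:j + A.shape[1]]
    return out

def boundary(A, B):
    """E(A) = A - (A erode B): set difference of A and its erosion."""
    return A & ~erode(A, B)

# A solid square as the binary image A (a stand-in for a face blob)
A = np.zeros((9, 9), dtype=bool)
A[1:8, 1:8] = True
B = np.ones((3, 3), dtype=bool)   # structuring element of 1's

E = boundary(A, B)
# E keeps only the one-pixel-thick ring around the square's interior.
```

With a 5 × 5 structuring element, as in the report, the surviving edge would be correspondingly thicker.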
After the edge is obtained, the Fourier descriptors of the edge are calculated. The main advantage of Fourier descriptors is their invariance to translation, rotation and scaling of the observed object. Let the complex array

z(k), k = 0, 1, …, n−1

represent the edge belonging to the face, where each boundary point (x, y) is encoded as the complex number x + jy. The value of n depends upon the size of the face and the dimensions of the image matrix obtained during image acquisition.

The Fourier transform coefficients are calculated by

a(m) = Σ_{k=0}^{n−1} z(k) e^(−j2πmk/n), m = 0, 1, …, n−1.

The Fourier descriptors are obtained from the sequence a(m) by truncating the elements a(0) and a(1), taking the absolute value of the remaining elements, and dividing every element of the resulting array by |a(1)|. To summarize, the Fourier descriptors are

c(k−2) = |a(k)| / |a(1)|, k = 2, 3, …, n−1.

The Fourier descriptors for each face edge are invariant to rotation, translation and scaling. Idealized translation affects only a(0), so a(0) is truncated while evaluating the Fourier descriptors. Idealized rotation only multiplies each element by a constant phase factor e^(jθ), so the absolute value is taken while calculating the Fourier descriptors. Idealized scaling multiplies every element by a constant c, and this effect can be nullified by dividing all Fourier coefficients by one of the calculated coefficients. As a(0) has already been truncated, one good choice is a(1), so every element is divided by |a(1)|.
All of the properties described are exact when idealized translation, rotation and scaling are considered. Because the input images acquired by our acquisition system are spatially sampled, and all of the transformations occur before image sampling, the assumptions made concerning translation, rotation and scaling hold only approximately. In practical usage, however, this causes no difficulties.
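The invariance claims can be checked numerically in the idealized case. The sketch below is illustrative Python/NumPy, not the report's MATLAB implementation; the elliptical test contour is an assumption standing in for a face edge.

```python
import numpy as np

def fourier_descriptors(z):
    """Descriptors c(k-2) = |a(k)| / |a(1)|, k = 2..n-1, where the a(m)
    are the DFT coefficients of the complex boundary array z."""
    a = np.fft.fft(z)                 # a(m) = sum_k z(k) e^{-j2*pi*m*k/n}
    return np.abs(a[2:]) / np.abs(a[1])

n = 256
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
z = 40 * np.cos(t) + 1j * 25 * np.sin(t)        # an elliptical "face edge"

d0 = fourier_descriptors(z)
d1 = fourier_descriptors(np.exp(1j * 0.7) * z)  # rotation about the origin
d2 = fourier_descriptors(3.2 * z)               # scaling by a constant
d3 = fourier_descriptors(z + (17 - 4j))         # translation
# d1, d2 and d3 all equal d0 up to numerical precision: rotation cancels in
# the absolute value, scaling cancels in the division by |a(1)|, and
# translation only changes a(0), which is truncated.
```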
For face recognition, a library of images of the persons to be authenticated is created. First, 3 images of each person are taken with the help of the web camera and the image acquisition toolbox as described before, and converted to binary images. The edges of the binary images and their corresponding Fourier descriptors are then calculated according to the algorithm using the image processing toolbox.
To authenticate a person, the same procedure is repeated to calculate the edge of the face and its Fourier descriptors.
For recognition, the absolute differences between the first 100 Fourier descriptors of the sample image and the corresponding descriptors of each dictionary image are calculated and summed using the following formula:

Sum = Σ_{k=1}^{100} | c_sample(k) − c_dict(k) |.

For a sample image, 3 such sums are calculated for each person, as there are 3 images saved per person in the dictionary. After obtaining these sums, the two highest sums are taken, and their average is used as the final sum for that person. The number of final sums therefore equals the number of authenticated persons whose images are saved in the dictionary.
For recognition, the minimum final sum is found, and this minimum corresponds to the image of the authenticated person.
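The matching rule described above can be sketched as follows. This is an illustrative Python sketch; the descriptor arrays are random stand-ins invented purely for demonstration (the report uses descriptors computed from 3 real images per person).

```python
import numpy as np

def person_score(sample_fd, person_fds, n_desc=100):
    """Sum |c_sample - c_dict| over the first n_desc descriptors for each of
    the person's stored images, then average the two highest sums (as the
    report describes) to obtain that person's final sum."""
    sums = [np.sum(np.abs(sample_fd[:n_desc] - fd[:n_desc]))
            for fd in person_fds]
    top_two = sorted(sums)[-2:]
    return sum(top_two) / 2.0

def recognize(sample_fd, dictionary):
    """Return the person whose final sum is minimal."""
    return min(dictionary, key=lambda p: person_score(sample_fd, dictionary[p]))

rng = np.random.default_rng(0)
base_a = rng.random(100)
base_b = base_a + 5.0             # a clearly different person
dictionary = {
    "person_a": [base_a + 0.01 * rng.random(100) for _ in range(3)],
    "person_b": [base_b + 0.01 * rng.random(100) for _ in range(3)],
}
sample = base_a + 0.01 * rng.random(100)
# recognize(sample, dictionary) returns "person_a", whose final sum is smallest.
```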
Chapter 3
Algorithm for Speech Recognition
To recognize voice commands efficiently, different parameters of speech, such as pitch, amplitude pattern or power/energy, can be used. Here, the power of the speech signal is used to recognize the voice commands.
First, the voice commands are captured with the help of a microphone that is directly connected to the PC. The analog voice signals are then sampled using MATLAB 7.0.1. As speech signals generally lie in the range 300 Hz - 4000 Hz, according to the Nyquist sampling theorem the minimum sampling rate required is 8000 samples/second.
After sampling, the discrete data obtained is passed through a band pass filter with a pass band of 300 - 4000 Hz. The basic purpose of the band pass filter is to eliminate noise lying at low frequencies (below 300 Hz); above 4000 Hz there is generally no speech signal.
This voice recognition algorithm is based on speech templates. The templates consist of the power of the discrete signal. To create a template, the power of each sample is calculated and the accumulated power of every 250 subsequent samples is represented by one value. For example, in the implemented algorithm 16000 samples are taken, so the power of the discrete data is represented by 64 discrete values, as the power of each group of 250 subsequent samples (i.e. 1-250, 251-500, …, 15751-16000) is accumulated and represented by one value. The number of samples taken and grouped is entirely flexible and can be changed keeping in mind the required accuracy, the memory space available and the processing time.
For recognition of commands, a dictionary is first created that consists of templates of all the commands that the robot has to follow (in our case 'Turn Left', 'Move Right', 'Come Forward' and 'Go Back'). For creating the dictionary, the same command is recorded several times (15 in this case) and a template is created each time. The final template is the average of all these templates, which is then stored.
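The template-creation steps above can be sketched as follows. This is an illustrative Python sketch, not the report's MATLAB code; block power is taken as the sum of squared samples within each 250-sample group, which is how the accumulated power is read here.

```python
import numpy as np

def speech_template(x, block=250):
    """Accumulate the power of each group of `block` subsequent samples into
    one value: 16000 samples -> a 64-point template."""
    n_blocks = len(x) // block
    x = np.asarray(x[:n_blocks * block], dtype=float)
    return (x.reshape(n_blocks, block) ** 2).sum(axis=1)

def dictionary_template(recordings, block=250):
    """Average the templates of several recordings of the same command
    (15 recordings in the report) to form the final dictionary template."""
    return np.mean([speech_template(r, block) for r in recordings], axis=0)

x = np.ones(16000)          # a dummy "speech" signal of unit samples
t = speech_template(x)
# t has 64 values, each equal to 250 (the sum of 250 squared unit samples).
```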
After creating the dictionary of templates, the command to be followed is taken with the help of the microphone, and the template of the input command signal is created using the same procedure as mentioned earlier.
The template of the received command is then compared with the dictionary templates using the Euclidean distance, i.e. the accumulation of the squared differences between the value of the dictionary template and that of the command template at each sample point:

Euclidean Distance = Σ_i ( T_dict(i) − T_cmd(i) )²,

where i runs over the sample points of the template. After calculating the Euclidean distance for each dictionary template, these distances are sorted in ascending order to find the smallest distance among them. This distance corresponds to a particular dictionary template, which belongs to a particular dictionary command. The robot then detects that particular command given by the operator and performs the task accordingly. If the command given by the operator does not match any dictionary command, the robot should not follow it. To incorporate this feature, an individual maximum range of Euclidean distance values has been set for each dictionary command. If the calculated Euclidean distance of the received command does not lie in the range of any dictionary command, the received command is considered a strange one and the robot requests a familiar command. The efficiency of the proposed algorithm depends on the mechanism of dictionary creation, the method of comparing the dictionary templates with the received command template, and the range of values chosen for the Euclidean distance. If the number of times the same command is recorded for creating the dictionary is increased, the efficiency of the proposed algorithm goes up.
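The distance computation and the rejection rule can be sketched as follows. This is an illustrative Python sketch; the tiny 3-point templates and the threshold values are invented for demonstration only (the report uses full-length templates and per-command thresholds tuned experimentally).

```python
import numpy as np

def euclidean_distance(t_dict, t_cmd):
    """Accumulated squared difference between two templates at each sample point."""
    return float(np.sum((np.asarray(t_dict) - np.asarray(t_cmd)) ** 2))

def match_command(cmd_template, dictionary, max_distance):
    """Pick the dictionary command with the smallest distance; reject the
    command as 'strange' if that distance exceeds the command's threshold."""
    distances = {name: euclidean_distance(t, cmd_template)
                 for name, t in dictionary.items()}
    best = min(distances, key=distances.get)
    if distances[best] > max_distance[best]:
        return None             # unfamiliar command: robot asks again
    return best

dictionary = {"Turn Left": [1.0, 2.0, 3.0], "Go Back": [5.0, 5.0, 5.0]}
max_distance = {"Turn Left": 0.5, "Go Back": 0.5}
# match_command([1.1, 2.0, 3.1], dictionary, max_distance) -> "Turn Left"
# match_command([9.0, 9.0, 9.0], dictionary, max_distance) -> None (rejected)
```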
Chapter 4
System Description & Hardware Implementation
The block diagram of the proposed system for the face recognition is shown in Figure 4.1.
Figure 4.1: Block diagram of the face recognition system.
4.1a Person to be Authenticated
The person who wants the robot to follow his/her commands must be an authenticated one. To prove one's authenticity, one has to come in front of the web camera for a snapshot.
4.1b Web Camera
The web camera used for taking the snapshots may be any simple web camera with an appropriate resolution. The resolution generally used is 480 × 640, but a higher resolution can also be used, keeping in mind the processing time of the acquired image and the complexity of the system.
4.1c Image Acquisition and Processing Toolbox
Image acquisition toolbox and image processing toolbox are modules of MATLAB 7.0.1 installed on the personal computer. The image acquisition toolbox is used to acquire the image of the person to be observed with the help of the web camera. The image processing toolbox processes the image, which has been obtained in matrix form using the web camera and the image acquisition toolbox. The main functions of this toolbox are to detect the face in the image using the edge detection technique and then recognize it using Fourier descriptors.
The block diagram of the proposed system for the speech
recognition is shown in Figure 4.2.
Figure 4.2: Block diagram of the speech recognition system.
A brief description of various modules of the speech recognition system is given below.
4.2a Voice Commands
The voice commands are given by the person who has been authenticated by the face recognition algorithm. In the proposed algorithm, a limit on the number of voice commands has been imposed to make the system useful for real-world applications.
4.2b Microphone
The microphone takes the commands from the authenticated person. It is directly connected to the personal computer. Commands given by the person are captured as analog inputs using the Data Acquisition Toolbox of MATLAB.
4.2c Sampler
The speech signal obtained is sampled to convert it into discrete form. The sampling is done in MATLAB. As speech signals lie in the range 300 Hz - 4000 Hz, according to the Nyquist sampling theorem the minimum sampling rate required is 8000 samples/second. To obtain the required accuracy, however, the sampling rate is set to 16000 samples/second.
4.2d Band Pass Filter
After sampling, the discrete signal obtained is passed through a band pass filter. Here, a fourth-order Chebyshev band pass filter with a pass band of 300 Hz - 4000 Hz is used. The band pass filter removes the noise existing outside the pass band.
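A filter of this kind can be sketched as follows. This is an illustrative Python/SciPy sketch, not the report's MATLAB implementation; the 1 dB passband ripple is an assumption, since the report only states "fourth-order Chebyshev".

```python
import numpy as np
from scipy import signal

fs = 16000                      # sampling rate used in the report (samples/s)

# 4th-order Chebyshev type-I band pass filter, pass band 300 - 4000 Hz.
# The 1 dB ripple value is an assumption for illustration.
sos = signal.cheby1(4, 1, [300, 4000], btype="bandpass", fs=fs, output="sos")

# Check the frequency response: in-band speech passes, out-of-band noise is cut.
freqs, h = signal.sosfreqz(sos, worN=[50, 1000, 7000], fs=fs)
gains = np.abs(h)
# gains[1] (1 kHz, in band) is close to 1; gains[0] (50 Hz) and
# gains[2] (7 kHz) are strongly attenuated.

def bandpass(x):
    """Filter a sampled speech signal with the band pass filter."""
    return signal.sosfilt(sos, x)
```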
4.2e Processing and Decision Making Unit
The processing unit does all the processing of the speech signals required for voice command recognition. Here a personal computer running MATLAB 7.0.1 is used as the processing and decision making unit.
4.2f Microcontroller & Motor Controller Unit
Here the processing and decision making unit for sound detection (see Figure 4.2a) is ATMEL's AVR-family ATmega32L microcontroller. This microcontroller has 32K bytes of in-system self-programmable flash memory (endurance: 10,000 write/erase cycles), a maximum clock frequency of 8 MHz, and 32 × 8 general-purpose working registers. It also provides 1024 bytes of EEPROM (endurance: 100,000 write/erase cycles), 2K bytes of internal SRAM, two 8-bit timer/counters with separate prescalers and compare modes, and one 16-bit timer/counter with separate prescaler, compare mode and capture mode. The operating voltage is 2.7 - 5.5 V and the speed grade is 0 - 8 MHz for the ATmega32L. It has an inbuilt 8-channel, 10-bit A/D converter.
Here the A/D converter (Port A) is used for converting the analog signal from the output of the band pass filter into a digital signal, which is processed by the processing unit of the microcontroller; accordingly, it generates appropriate control signals (at Port D of the microcontroller for 'SODDRO') to drive the motors used in the mechanical assembly.
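On the PC side, each recognized command must be encoded into the code that the microcontroller polls on the lower three bits of Port B; the codes and the Port D motor patterns below are taken from the assembly listing later in the report. How the PC physically drives the port pins is not shown in that excerpt, so the sketch below (Python, illustrative only) covers just the mapping.

```python
# Command codes read by the microcontroller on Port B (lower 3 bits),
# as used in the assembly listing (Section 5.3).
COMMAND_CODE = {
    "MOVE FORWARD": 0x01,
    "TURN LEFT": 0x02,
    "GO BACK": 0x03,
    "TURN RIGHT": 0x04,
}

# Port D patterns the firmware writes for the L298 for each motion,
# also taken from the assembly listing.
PORTD_PATTERN = {
    0x01: 0x55,   # forward
    0x02: 0x56,   # turn left (counter-clockwise)
    0x03: 0x66,   # back
    0x04: 0x65,   # turn right (clockwise)
}

def encode(command):
    """Map a recognized voice command to its Port B code."""
    return COMMAND_CODE[command]
```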
A dual full H-bridge motor driver IC, the L298, as shown in Figure 4.2a, is used to control the movement of the motors. Each H-bridge can drive its motor 'clockwise' or 'anticlockwise' depending upon the direction of current flow through the circuit. Using the L298, it is also possible to 'jam' or 'free' the motors if required. Basically, the L298 acts as an interface between the low-power control signals generated by the microcontroller and the motor assembly, which requires relatively high power to drive the motors. In this system the logic supply voltage is 5 V and the motor supply voltage is 6 V.
Figure 4.2a: Microcontroller and motor controller unit
4.2g Mechanical Assembly
This module mainly consists of two 3 V brushed DC motors, gear boxes and a vehicle chassis. The side-steering mechanism implemented in the mechanical assembly can effectively control the motors for taking sharp turns. Motor 1 controls the motion of the left wheels and motor 2 controls the right wheels, as shown in Figure 4.2b.
Figure 4.2b: Bottom view of mechanical assembly
4.3 List of Components

Chapter 5
Software Section

5.3 Assembly Code for Sound Detection
out portc,r16 ;signal for ready to get command input
READ_PORTB:
in r16,pinb ;read port B on which signal from pc is present
andi r16,$07 ; masking the upper five bits
cpi r16,$00
breq READ_PORTB
cpi r16,$01
breq MOVE_FORWARD
cpi r16,$02
breq TURN_LEFT
cpi r16,$03
breq GO_BACK
cpi r16,$04
breq TURN_RIGHT
rjmp READ_PORTB
MOVE_FORWARD:
ldi r29,$55 ;data to move robo FWD
out portd,r29 ;give control signal to port D to move
ldi r20,$b4
ldi r21,$1a
call MOVE
rjmp MAIN
TURN_LEFT:
ldi r29,$56 ;data to move robo CCW
out portd,r29 ;give control signal to port D to move
ldi r20,$b4
ldi r21,$1a
call MOVE
rjmp MAIN
GO_BACK:
ldi r29,$66 ;data to move robo back
out portd,r29 ;give control signal to port D to move
ldi r20,$b4
ldi r21,$1a
call MOVE
rjmp MAIN
TURN_RIGHT:
ldi r29,$65 ;data to move robo cw
out portd,r29 ;give control signal to port D to move
ldi r20,$b4
ldi r21,$1a
call MOVE
rjmp MAIN
MOVE:
; =============================
; delay loop of
; 100,000 cycles:
; -----------------------------
; delaying 99990 cycles:
ldi R22, $A5
WGLOOPdd00: ldi R23, $C9
WGLOOPdd01: dec R23
brne WGLOOPdd01
dec R22
brne WGLOOPdd00
; -----------------------------
; delaying 9 cycles:
ldi R22, $03
WGLOOPdd02: dec R22
brne WGLOOPdd02
; -----------------------------
; delaying 1 cycle:
nop
; =============================
LOOPABCD:
; =============================
; delay loop of
; 64 cycles:
; -----------------------------
; delaying 63 cycles:
ldi R22, $15
WGLOOPcc000: dec R22
brne WGLOOPcc000
; -----------------------------
; delaying 1 cycle:
nop
; =============================
dec r20
brne LOOPABCD
cpi r21,$00
breq END_LOOP
dec r21
brne LOOPABCD
END_LOOP:
ldi r29,$ff ; jam both motors
out portd,r29
; =============================
; delay loop generator
; 5,000,000 cycles:
; -----------------------------
; delaying 4999995 cycles:
ldi R22, $21
WGLOOPqqq0: ldi R23, $D6
WGLOOPqqq1: ldi R25, $EB
WGLOOPqqq2: dec R25
brne WGLOOPqqq2
dec R23
brne WGLOOPqqq1
dec R22
brne WGLOOPqqq0
; -----------------------------
; delaying 3 cycles:
ldi R22, $01
WGLOOPqqq3: dec R22
brne WGLOOPqqq3
; -----------------------------
; delaying 2 cycles:
nop
nop
; =============================
ret
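The cycle counts claimed in the delay-loop comments above can be verified arithmetically. The sketch below (Python, purely illustrative) assumes standard AVR instruction timing: ldi and dec take 1 cycle, brne takes 2 cycles when taken and 1 when not taken, and nop takes 1.

```python
def inner_loop(count):
    """A dec/brne loop entered with a register preloaded to `count`:
    (count-1) taken branches at 3 cycles each, plus a final 2-cycle pass."""
    return (count - 1) * 3 + 2

def two_level(outer, inner):
    """ldi Ro,outer; { ldi Ri,inner; inner dec/brne loop; dec Ro; brne }."""
    per_iter = 1 + inner_loop(inner) + 1 + 2      # ldi + inner + dec + brne
    return 1 + (outer - 1) * per_iter + (per_iter - 1)

def three_level(r22, r23, r25):
    """The triply nested WGLOOPqqq0/1/2 structure of the 5,000,000-cycle delay."""
    per_mid = 1 + inner_loop(r25) + 1 + 2
    mid_total = (r23 - 1) * per_mid + (per_mid - 1)
    per_out = 1 + mid_total + 1 + 2
    return 1 + (r22 - 1) * per_out + (per_out - 1)

# 100,000-cycle delay = 99990 (WGLOOPdd00/01) + 9 (WGLOOPdd02) + 1 (nop):
assert two_level(0xA5, 0xC9) == 99990
assert inner_loop(0x03) + 1 == 9                  # ldi + 3-pass dec/brne loop
# 5,000,000-cycle delay = 4999995 (WGLOOPqqq0/1/2) + 3 + 2 (nop, nop):
assert three_level(0x21, 0xD6, 0xEB) == 4999995
```

The assertions reproduce exactly the "delaying 99990 cycles", "delaying 9 cycles" and "delaying 4999995 cycles" figures in the listing's comments.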
Chapter 6
Results
The algorithms for face recognition and speech recognition have been successfully implemented in MATLAB 7.0.1, and the results are given below.
6.1 Face Recognition Results

Person          Recognition Efficiency
Aditya          85%
Subhayan        85%
Nilesh          82%
Chandra Veer    90%
6.2 Speech Recognition Results

Word            Recognition Efficiency
TURN LEFT       85%
MOVE RIGHT      80%
COME FORWARD    85%
GO BACK         75%
The above results were obtained without strictly controlled environmental conditions. For speech recognition, it is very important to maintain the same environmental conditions during creation of the dictionary and while taking sample voice commands. For face recognition, it is always required to maintain the same lighting conditions and image background during creation of the dictionary and while taking the sample image. The efficiency of the proposed algorithms can be improved significantly if strict and suitable laboratory conditions are provided.
Chapter 7
Summary and Conclusion
7.1 Summary
Algorithms for speech recognition and face recognition are of utmost importance as far as any security system is concerned. A reliable security system has at least two stages of security, and this system incorporates the same. First the person is authenticated using face recognition; after proper authentication, the voice commands of the person are accepted by the system, and the system acts accordingly. This system can be used as the authentication module of a humanoid robot or in real-time industrial applications for security. The security system is faster, simpler and more economical than previously reported systems.
7.2 Conclusion
A system for reliable recognition of speech and faces has been designed and developed. The system can be made highly efficient and effective if stringent environmental conditions are maintained. The setup for maintaining these environmental conditions is a one-time investment for any real-life application. Apart from that, this system is highly efficient and more economical than other systems generally used for providing security. The running cost of this system is much lower than that of other systems used for the same purpose.
Chapter 8
Future Scope
The proposed system is highly efficient and effective. The accuracy of the system can be improved remarkably if a setup providing stringent environmental conditions is available. In the proposed system, the accent is used as the distinguishing parameter for speech recognition, and the boundary points are used as the distinguishing parameters for faces. To improve the efficiency of these algorithms, other parameters for speech and image can be used, but this will definitely increase the system complexity and the processing time. The efficiency of the same algorithms can also be improved significantly if the concept of neural networks is incorporated in their implementation.
REFERENCES
[1] A. U. Batur, B. E. Flinchbaugh and M. H. Hayes, "A DSP-based approach for the implementation of face recognition algorithms," Proc. of ICASSP, pp. 253-256, 2003.
[2] A. U. Batur and M. H. Hayes, "Linear subspaces for illumination-robust face recognition," Proc. of IEEE Conf. Computer Vision and Pattern Recognition, pp. 296-301, 2001.
[3] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 586-591, 1991.
[4] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, pp. 696-710, July 1997.
[5] C. Kotropoulos and I. Pitas, "Rule-based face detection in frontal views," Proc. Int'l Conf. Acoustics, Speech and Signal Processing, Vol. 4, pp. 2537-2540, 1997.
[6] K. K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 39-51, Jan. 1998.
[7] R. F. Estrada and E. A. Starr, "50 years of acoustic signal processing for detection: coping with the digital revolution," IEEE Annals of the History of Computing, Vol. 27, Issue 2, pp. 65-78, April-June 2005.
[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Education (Singapore) Pte. Ltd, Delhi, India, pp. 519-560, 2004.
[9] A. Mishra and A. Jain, Programming in MATLAB 7.0.1, Agarwal