Android Speech to Text Converter for SMS Application

Post on 08-Nov-2014



PowerPoint presentation


Presented by Mr. Sachin Deshmukh

9970406068

Guided by Prof. S. K. Sonkar

A Seminar on Android Speech to Text Converter for SMS Application

Presentation Flow

Introduction
Related Work
Various Approaches
Various Modules
Working
Performance Parameters of the System
Applications
Advantages and Disadvantages
Future Scope
Conclusion
References

Introduction and Motivation

History

Traditional Method

Need?

Overview of Android Speech to Text Converter

• Speech recognition is done via the Internet

• Use of HMM

Related Work

1. Speech to Text Conversion using Android Platform


International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013, pp. 253-258

2. Android Speech To Text Converter For SMS Application

IOSR Journal of Engineering Mar. 2012, Vol. 2(3) pp: 420-423


3. Smart Texting System

International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 2, Mar-Apr 2012, pp. 1126-1128

4. SMS Application Using Speech To Text Converter In Android Mobiles

Association for International Journal in Computer Science & Electronics, Volume I, Reference ID: aijcse2005.

Various Approaches

1. Template-Based Approaches
2. Knowledge-Based Approaches
3. Neural Network-Based Approaches
4. Hidden Markov Model (HMM)-Based Speech Recognition

1. Template-Based Approach

In this approach, unknown speech is compared against a set of pre-recorded words (templates) to find the best match. The disadvantage is that the templates are fixed. Variations in speech can therefore only be modeled by using many templates per word, which eventually becomes impractical. Template preparation and matching become prohibitively expensive or impractical once the vocabulary grows beyond a few hundred words, because matching requires storage and processing power for every template. Template matching was also heavily speaker dependent, and continuous speech recognition was impossible.
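The slide does not say how an unknown utterance is compared with the stored templates. Dynamic time warping (DTW) is one common technique for this comparison. It aligns two feature sequences spoken at different speeds. The sketch below assumes DTW with Euclidean distances between frames. The feature vectors, the `dtw_distance` and `best_template` names, and the toy templates are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # one feature vector per speech frame (hypothetical features)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two feature-vector sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def best_template(unknown: Sequence[Frame],
                  templates: Dict[str, List[Sequence[Frame]]]) -> Optional[str]:
    """Return the word whose closest stored template has the smallest DTW distance."""
    best_word, best_dist = None, float("inf")
    for word, examples in templates.items():
        # Several templates per word are how this approach models speech variation.
        for template in examples:
            dist = dtw_distance(unknown, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word


if __name__ == "__main__":
    # Toy one-dimensional "features"; the unknown input is "yes" spoken slowly.
    templates = {"yes": [[[1.0], [2.0], [3.0]]], "no": [[[3.0], [2.0], [1.0]]]}
    print(best_template([[1.0], [1.0], [2.0], [3.0]], templates))  # yes
```

Every stored template must be compared against every input. This is why matching cost and memory grow with vocabulary size, as the slide notes.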

2. Knowledge-Based Approaches

The “expert” knowledge about variations in speech is hand-coded into a system. The system uses a set of features from the speech, and the training system then generates a set of production rules automatically from the samples. This has the advantage of explicitly modeling variations in speech. However, such expert knowledge is difficult to obtain and use successfully, so this approach was judged to be impractical and automatic learning procedures were sought instead.
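Here is a toy sketch of what hand-coded rules look like. It labels a single frame as silence, voiced or unvoiced, using rules over two classic features: short-time energy and zero-crossing rate. The thresholds are hypothetical values picked by hand. They stand in for the kind of expert knowledge the slide calls hard to obtain and tune. A real knowledge-based recognizer would need far more rules than this.

```python
from typing import Sequence

# Hypothetical hand-picked thresholds (the "expert knowledge"); not from the slides.
ENERGY_SILENCE = 0.01  # mean-square energy below this is treated as silence
ZCR_UNVOICED = 0.3     # fraction of sign changes above this suggests unvoiced speech


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean-square amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float]) -> str:
    """Apply hand-coded production rules, in order, to one frame."""
    if short_time_energy(frame) < ENERGY_SILENCE:
        return "silence"
    if zero_crossing_rate(frame) > ZCR_UNVOICED:
        return "unvoiced"  # noisy sounds such as /s/ change sign often
    return "voiced"        # vowels: more energy, fewer sign changes
```

Each rule is readable and explicitly tied to a property of speech. The catch is that every threshold has to be tuned by hand.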

3. Neural Network-Based Approaches

Another approach in acoustic modeling is the use of neural networks. They can solve much more complicated recognition tasks, but they do not scale as well as hidden Markov models (HMMs) to large vocabularies. A neural network (NN) is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing.
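This minimal sketch shows how a neural network turns a feature vector into word scores. It has one hidden layer with ReLU activations and a softmax output over candidate words. The layer sizes, weights and word labels are hypothetical placeholders. In practice the weights would be learned from training data (for example by backpropagation), which this sketch does not show.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """Fully connected layer: out[j] = sum_i weights[j][i] * x[i] + biases[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(features: Sequence[float], w1: Matrix, b1: Sequence[float],
                w2: Matrix, b2: Sequence[float]) -> List[float]:
    """Input features -> hidden layer (ReLU) -> class probabilities (softmax)."""
    hidden = relu(dense(features, w1, b1))
    return softmax(dense(hidden, w2, b2))


if __name__ == "__main__":
    # Hypothetical weights for 2 input features, 2 hidden units and 2 words.
    w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    w2, b2 = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]
    probs = mlp_forward([1.0, 0.0], w1, b1, w2, b2)
    print(dict(zip(["yes", "no"], probs)))
```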

4. Hidden Markov Model (HMM)-Based Speech Recognition

In this approach, the system is first trained on speech data, and a pattern representative is created using one or more test patterns. HMM-based recognition is then applied. Recognition, or pattern classification, is the process of comparing the unknown test pattern with each sound class reference pattern. For each reference pattern, the system computes a measure of similarity (distance) to the test pattern. A hidden Markov model is used for speech recognition, which converts the speech to text.
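The classification step above can be sketched with discrete HMMs, one per word. Each model scores the unknown observation sequence with the forward algorithm, and the word whose model gives the highest likelihood wins. This sketch assumes the speech frames have already been vector-quantized into integer symbols. The model parameters are made-up toy values. A practical system would also work in log space or use scaling to avoid underflow on long utterances.

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """HMM with discrete output symbols (e.g. vector-quantized frames)."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i]    = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state is j | current state i)
        self.emit = emit    # emit[i][k]  = P(symbol k | state i)

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: P(obs | model), summed over every hidden state path."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][o]
                     for j in range(n)]
        return sum(alpha)


def recognize(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Pick the word whose model explains the observations best."""
    return max(models, key=lambda word: models[word].likelihood(obs))


if __name__ == "__main__":
    # Toy 2-state left-to-right models over the symbols {0, 1}.
    left_to_right = [[0.5, 0.5], [0.0, 1.0]]
    models = {
        "yes": DiscreteHMM([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
        "no": DiscreteHMM([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
    }
    print(recognize([0, 0, 1, 1], models))  # yes
```

Here the "measure of similarity" is a likelihood rather than a distance, so the largest value wins.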

Various Modules

1. Speech Recognition
2. Speech Preprocessing
3. HMM Training

Speech samples are obtained from the speaker in real time, so speech recognition requires a microphone. Samples from different users need to be stored to make the system compatible with any type of voice.

1. SPEECH RECOGNITION

Sr. No. | Voice (spoken input) | Text (recognized output)
1 | Speech | Nick / Now is
2 | Cute | Spite
3 | Google | and / this / yes
4 | Yahoo | Down to / now

Table showing the output generated by the acquisition module

2. SPEECH PREPROCESSING

Speech captured in real time contains background noise that must be removed, because the later stages require a noise-free speech signal. Preprocessing reduces the amount of effort needed in the next stages. The input to speech preprocessing is the speech signal. This signal is converted into speech frames, which give unique samples.

Steps in Speech Preprocessing

1. The system must identify useful or significant samples from the speech signal. To accomplish this goal, the system divides the speech samples into overlapping frames.
2. The system checks for voice activity using endpoint detection and energy threshold calculations.
3. The speech samples are passed through a pre-emphasis filter.
4. The frames with voice activity are passed through a Hamming window.
5. The system performs autocorrelation analysis on each frame.
6. The system finds the linear predictive coding (LPC) coefficients using the Levinson-Durbin algorithm.

The Hamming window is applied to each frame to minimize signal discontinuities at the beginning and end of the frame.
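The six steps above can be chained into a short pipeline. This is a minimal sketch under several assumptions:

- an 8 kHz signal given as a list of floats,
- 30 ms frames (240 samples) with a 10 ms hop (80 samples),
- a pre-emphasis coefficient of 0.95 and LPC order 10,
- a fixed energy threshold standing in for full endpoint detection.

None of these values come from the slides, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Step 1: split samples into overlapping frames (overlap when hop < frame_len)."""
    return [list(x[s:s + frame_len]) for s in range(0, len(x) - frame_len + 1, hop)]


def has_voice_activity(frame: Sequence[float], threshold: float) -> bool:
    """Step 2 (simplified): keep frames whose mean-square energy exceeds a threshold."""
    return sum(v * v for v in frame) / len(frame) > threshold


def pre_emphasis(frame: Sequence[float], alpha: float = 0.95) -> List[float]:
    """Step 3: y[n] = x[n] - alpha * x[n-1], which boosts high frequencies."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]


def hamming(frame: Sequence[float]) -> List[float]:
    """Step 4: taper the frame edges to reduce discontinuities."""
    N = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))) for n, v in enumerate(frame)]


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Step 5: r[k] = sum_n frame[n] * frame[n + k] for k = 0..order."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Step 6: LPC coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n-k], plus residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1.0 - k * k)
    return a[1:], err


def lpc_features(x: Sequence[float], frame_len: int = 240, hop: int = 80,
                 order: int = 10, energy_threshold: float = 1e-4) -> List[List[float]]:
    """Run steps 1-6 and return one LPC vector per frame with voice activity."""
    features = []
    for frame in frame_signal(x, frame_len, hop):
        if not has_voice_activity(frame, energy_threshold):
            continue  # skipping silence also keeps r[0] > 0 in the recursion
        windowed = hamming(pre_emphasis(frame))
        coeffs, _ = levinson_durbin(autocorrelation(windowed, order), order)
        features.append(coeffs)
    return features
```

Silent frames are discarded before the Levinson-Durbin recursion. This avoids dividing by a zero-energy autocorrelation.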

3. HMM TRAINING

Training involves creating a pattern representative of the features of a class using one or more test patterns. A model commonly used for speech recognition is the HMM, a statistical model used to model an unknown system from an observed output sequence.
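The slides do not say how the HMM parameters are estimated. The usual method is Baum-Welch re-estimation, which handles hidden states. This minimal sketch swaps in a simpler case: maximum-likelihood estimation by counting, which applies when each training frame is already labeled with its state and its vector-quantized symbol. The labeling, the smoothing constant and the function name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Labeled = Sequence[Tuple[int, int]]  # one (state, symbol) pair per frame


def _normalize(row: List[float]) -> List[float]:
    total = sum(row)
    return [c / total for c in row]


def train_hmm_from_labels(sequences: Sequence[Labeled], n_states: int, n_symbols: int,
                          smoothing: float = 1.0):
    """Estimate (start, trans, emit) probabilities by counting in state-labeled data.

    `smoothing` is added to every count so unseen events keep a non-zero probability;
    it must be > 0 if some state never occurs in the training data.
    """
    start = [smoothing] * n_states
    trans = [[smoothing] * n_states for _ in range(n_states)]
    emit = [[smoothing] * n_symbols for _ in range(n_states)]
    for seq in sequences:
        start[seq[0][0]] += 1
        for state, symbol in seq:
            emit[state][symbol] += 1
        for (state, _), (next_state, _) in zip(seq, seq[1:]):
            trans[state][next_state] += 1
    return (_normalize(start),
            [_normalize(row) for row in trans],
            [_normalize(row) for row in emit])
```

The returned triple holds the standard HMM parameters: initial, transition and emission probabilities. Baum-Welch replaces the hard counts with expected counts computed from the forward and backward probabilities.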

Working

[Figure: Block diagram of the Speech To Text Conversion System, showing Speech Recognition, Speech Processing, Noise Free Data, HMM Training, HMM-Based Recognition, Hidden Markov Model, Text Storage, and Google’s speech recognition engine]

Performance Parameters of the System

1. Accuracy of Recognition

Accuracy is measured with the Word Error Rate (WER), whereas speed is measured with the Real Time Factor (RTF). WER is computed as

WER = (S + D + I) / N

where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference.
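S, D and I are counted from the alignment of the reference and recognized word sequences that needs the fewest edits. WER can therefore be computed with a word-level edit distance. This is a minimal sketch, assuming whitespace-tokenized strings and a non-empty reference (N > 0). The function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, using the minimum word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = fewest substitutions + deletions + insertions turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match or substitution
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One insertion ("a") and one substitution ("message" -> "massage") over N = 4 words.
    print(word_error_rate("send message to john", "send a massage to john"))  # 0.5
```

Insertions count against the reference length, so WER can exceed 1 when the recognizer adds many extra words.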


2. Speed of Recognition

The speed of a speech recognition system is commonly measured in terms of the Real Time Factor (RTF). If the system takes time P to process an input of duration I, the RTF is defined as

RTF = P / I
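RTF can be measured by timing the recognizer on a clip of known length. A value below 1 means the system runs faster than real time. In this minimal sketch, `recognize` stands for any hypothetical recognizer callable, and the input is a list of samples at a known sample rate.

```python
import time
from typing import Callable, Sequence


def real_time_factor(recognize: Callable[[Sequence[float]], object],
                     samples: Sequence[float], sample_rate: int) -> float:
    """RTF = P / I: wall-clock processing time over input duration (both in seconds)."""
    input_duration = len(samples) / sample_rate  # I
    t0 = time.perf_counter()
    recognize(samples)
    processing_time = time.perf_counter() - t0  # P
    return processing_time / input_duration


if __name__ == "__main__":
    # Hypothetical stand-in recognizer that takes 0.25 s to "process" 1 s of audio.
    def fake_recognizer(samples: Sequence[float]) -> str:
        time.sleep(0.25)
        return ""

    print(real_time_factor(fake_recognizer, [0.0] * 8000, 8000))
```

For a cloud engine such as Google's, P also includes network latency. The measured RTF therefore depends on the connection as well as on the recognizer.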

Applications

1. Sending Messages
2. Email
3. Web Search
4. Voice Dial
5. Data Entry
6. Speech to Text Converter in Mobile Phones

Advantages

• Natural way of interaction; it is not necessary to sit at a keyboard.
• Faster data processing.
• No training required for users.
• Useful for physically handicapped people.
• Memory savings, because Google's speech recognition engine can be used.

Disadvantages

• If there is noise or some other sound in the room (e.g. a television or a boiling kettle), the number of errors will increase.
• The microphone must be close to the user; more distant microphones (e.g. on a table or wall) tend to increase the number of errors.
• Speech must be preprocessed after it is acquired.
• The system requires voice training so that it recognizes the voice of that particular user.
• A permanent Internet connection is needed.

Future Scope

“Further work is planned to implement the model of speech recognition for different languages.”

Conclusion

References

[1] B. Raghavendhar Reddy, E. Mahender: "Speech to Text Conversion using Android Platform," International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013, pp. 253-258.

[2] Ms. Anuja Jadhav, Prof. Arvind Patil: "Android Speech to Text Converter for SMS Application," IOSR Journal of Engineering, Mar. 2012, Vol. 2(3), pp. 420-423.

[3] Jagriti Chand: "SMS Application Using Speech To Text Convertor In Android Mobiles," Association for International Journal in Computer Science & Electronics, Volume I, Issue I, Reference ID: aijcse2005.

[4] M. Bacchiani, F. Beaufays, J. Schalkwyk, M. Schuster, and B. Strope: "Deploying GOOG-411: Early lessons in data, measurement, and testing," in Proceedings of ICASSP, pp. 5260-5263, 2008.

Questions?

Thank You.
