
Silent sound interface

Jan 28, 2015





2. CONTENTS
Introduction
What is speech?
Sources of information
Brain computer interface (BCI)
Speech synthesis
Speech synthesis technologies
Block diagram
Features
Methods of producing
Electromyography
Image processing
Applications
In fiction
References

3. Suppose you are in a theatre, a noisy restaurant, or a bus: the noise around you is a big problem while talking on a mobile phone. In future this problem may be eliminated by silent sound technology, a new technology unveiled at the CeBIT fair. It transforms lip movements into a computer-generated voice for the listener at the other end of the line. A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. As such, it is a type of electronic lip reader. It works by a computer identifying the phonemes that an individual pronounces from non-auditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis.

4. The device uses electromyography, monitoring the tiny muscular movements that occur when we speak and converting them into electrical pulses that can be turned into speech without a sound being uttered. It also uses an image processing technique that converts digital data into a film image with minimal corrections and calibration.

5. Speech is the vocalized form of human communication. It is based upon the syntactic combination of words and names that are drawn from very large vocabularies (usually about 10,000 different words). A gestural form of human communication exists for the deaf in the form of sign language. In some cultures speech has become the basis of a written language, often one that differs in its vocabulary, syntax and phonetics from its associated spoken one, a situation called diglossia.

6. Sources of information:
Vocal tract
Bone conduction

7.
The vocal tract is the cavity in human beings and in animals where sound produced at the sound source (the larynx in mammals; the syrinx in birds) is filtered.

8. Bone conduction is the conduction of sound to the inner ear through the bones of the skull. Some hearing aids employ bone conduction, achieving an effect equivalent to hearing directly by means of the ears. A headset is ergonomically positioned on the temple and cheek, and the electromechanical transducer, which converts electric signals into mechanical vibrations, sends sound to the internal ear through the cranial bones. Likewise, a microphone can be used to record spoken sounds via bone conduction. The first description of a bone conduction hearing aid, in 1923, was Hugo Gernsback's "Osophone", which he later elaborated on with his "Phonosone".

9. Categories:
Ordinary products
Hearing aids
Specialized communication products
Advantages:
Ears free
High sound clarity in very noisy environments
Can give a perception of stereo sound
Disadvantages:
Some implementations require more power than headphones.
Less clear recording and playback than headphones.

10. A brain computer interface, often called a mind machine interface (MMI) or a direct neural interface, is a direct communication pathway between the brain and an external device. The field of BCI research and development has focused primarily on neuroprosthetics applications that aim at restoring damaged hearing, sight and movement. Thanks to the remarkable cortical plasticity of the brain, signals from implanted prostheses can, after adaptation, be handled by the brain like natural sensor or effector channels. Following years of animal experimentation, the first neuroprosthetic devices implanted in humans appeared in the mid-1990s.

11. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware.
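The recognition step described in slides 3-4 (identify phonemes from non-auditory sensor data, then hand them to a synthesizer) can be sketched as a toy in Python. This is only a minimal illustration of the idea: the "sensor frames" and phoneme templates below are invented placeholder numbers, not real EMG or lip-movement data.

```python
# Toy sketch of a silent-speech recognition step: each sensor
# frame is matched to the nearest stored phoneme template, and
# the recognized phonemes would then feed a synthesis stage.
# All templates and frames are invented placeholders.

PHONEME_TEMPLATES = {
    "h": [0.1, 0.4, 0.2],
    "i": [0.9, 0.8, 0.7],
}

def closest_phoneme(signal, templates=PHONEME_TEMPLATES):
    """Return the template key with the smallest squared distance."""
    def dist(t):
        return sum((a - b) ** 2 for a, b in zip(signal, t))
    return min(templates, key=lambda k: dist(templates[k]))

def recognize(frames):
    """Map each sensor frame to its best-matching phoneme."""
    return [closest_phoneme(f) for f in frames]

frames = [[0.12, 0.38, 0.22], [0.88, 0.79, 0.71]]
print(recognize(frames))  # ['h', 'i']
```

A real interface would use many more features per frame and a statistical classifier, but the shape of the computation (frame in, phoneme label out) is the same.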
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.

12. Speech synthesizing process: The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1980s.

13. The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. There are 8 types of synthesizing technologies:
a) Concatenative synthesis
b) Unit selection synthesis
c) Diphone synthesis
d) Domain-specific synthesis
e) Formant synthesis
f) Articulatory synthesis
g) HMM-based synthesis
h) Sine wave synthesis

14. CONCATENATIVE SYNTHESIS: Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech.
UNIT SELECTION SYNTHESIS: Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences.
DIPHONE SYNTHESIS: Diphone synthesis uses a minimal speech database containing all the diphones (sound-to-sound transitions) occurring in a language. The number of diphones depends on the phonotactics of the language: for example, Spanish has about 800 diphones and German about 2500.
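The concatenative idea above (look up stored recorded units and join them into one utterance) can be shown with a minimal sketch. The unit database here is a placeholder: short lists stand in for recorded waveform samples, and a real system would store actual audio and smooth the joins.

```python
# Minimal sketch of concatenative synthesis: look up a stored
# "recorded" unit for each requested symbol and join them end
# to end.  The unit database is an invented placeholder.

UNIT_DB = {
    "he": [0.1, 0.2, 0.1],
    "lo": [0.3, 0.1, 0.0],
}

def concatenate(units, db=UNIT_DB):
    """Join stored waveform units into one output waveform."""
    out = []
    for u in units:
        out.extend(db[u])
    return out

wave = concatenate(["he", "lo"])
print(wave)  # [0.1, 0.2, 0.1, 0.3, 0.1, 0.0]
```

The trade-off the slide mentions follows directly from this structure: smaller units (phones, diphones) cover any input text but join more often, which is where clarity is lost.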
In diphone synthesis, only one example of each diphone is contained in the speech database.

15. DOMAIN-SPECIFIC SYNTHESIS: Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances. It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports.
FORMANT SYNTHESIS: Formant synthesis does not use human speech samples at runtime. Instead, the synthesized speech output is created using additive synthesis and an acoustic model (physical modelling synthesis). Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial speech. This method is sometimes called rules-based synthesis.

16. ARTICULATORY SYNTHESIS: Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. Until recently, articulatory synthesis models had not been incorporated into commercial speech synthesis systems.
HMM-BASED SYNTHESIS: HMM-based synthesis is a synthesis method based on hidden Markov models, also called statistical parametric synthesis. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by HMMs. Speech waveforms are generated from the HMMs themselves based on the maximum likelihood criterion.

17. SINE WAVE SYNTHESIS: Sine wave synthesis is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles.

18. BLOCK DIAGRAM:

19. FEATURES:
AUDIO SPOTLIGHT: The Audio Spotlight transmitters generate a column of sound between three and five degrees wider than the transmitter. It converts ordinary audio into high-frequency ultrasonic signals that are outside the range of normal hearing.
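The formant and sine wave synthesis methods described in slides 15-17 both build speech-like sound from pure tones. A minimal additive-synthesis sketch, assuming invented formant-like frequencies and amplitudes (not measurements of real speech):

```python
import math

# Sketch of additive synthesis in the spirit of formant / sine
# wave synthesis: sum pure sine tones at formant-like frequencies
# to build an artificial waveform.  The frequency and amplitude
# values below are illustrative assumptions only.

def additive_tone(freqs, amps, duration=0.01, rate=8000):
    """Sum sine components into one sampled waveform (list of floats)."""
    n = int(duration * rate)
    return [
        sum(a * math.sin(2 * math.pi * f * t / rate)
            for f, a in zip(freqs, amps))
        for t in range(n)
    ]

# Three components standing in for the first three formants.
wave = additive_tone([700, 1200, 2600], [1.0, 0.5, 0.25])
print(len(wave))  # 80 samples = 10 ms at 8 kHz
```

In a real formant synthesizer these parameters would be varied over time by rules, which is why the slide calls the method rules-based synthesis.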
As these sound waves push out from the source, they interact with air pressure to create audible sounds. The sound field distribution is shown with equal loudness contours for a standard 1 kHz tone: the center area is loudest at 100% amplitude, while the sound level just outside the illustrated beam area is less than 10%. Audio Spotlight systems are much more sensitive to listener distance than traditional loudspeakers, and maximum performance is attained at roughly 1-2 m (3-6 feet) from the listener. Typical levels are 80 dB SPL at 1 kHz for the AS-16 model and 85 dB SPL for the AS-24. The larger AS-24 can output about twice the power and twice the low-frequency range.

20. This simulation is for a fixed source size (0.4 m / 16 in) with varying wavelength. From the statements above, we expect to see an omnidirectional response when the wavelength is large relative to the source, and higher directivity as the wavelength decreases.

21. METHODS OF PRODUCING:
ELECTROMYOGRAPHY
IMAGE PROCESSING

22. ELECTROMYOGRAPHY: Electromyography (EMG) is a technique for evaluating and recording the electrical activity produced by skeletal muscles. EMG is performed using an instrument called an electromyograph, to produce a record called an electromyogram. An electromyograph detects the electrical potential generated by muscle cells when these cells are electrically or neurologically activated.

23. Electromyographic sensors attached to the face record the electric signals produced by the facial muscles and compare them with pre-recorded signal patterns of spoken words. When there is a match, that sound is transmitted to the other end of the line, and the person at the other end hears the spoken words.

24. For such an interface, we should use 4 kinds of transducers:
1. Vibration sensors
2. Pressure sensors
3. Electromagnetic sensors
4. Motion sensors
IMAGE PROCESSING: The simplest form of image processing converts the data tape into a film image with minimal corrections and calibrations.
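The EMG word-matching step described in slides 22-23 (compare the sensed signal against pre-recorded patterns of spoken words, and transmit the sound only on a match) can be sketched as follows. The word templates, signal values, and the 0.9 threshold are all invented placeholders; a real system would use richer features and a trained model.

```python
# Sketch of the EMG template-matching step: compare an incoming
# sensor frame against pre-recorded word templates using a
# normalized dot product (cosine similarity), and report the best
# match only if it clears a threshold.  Templates are invented.

def normalize(sig):
    """Scale a signal to unit length."""
    mag = sum(x * x for x in sig) ** 0.5
    return [x / mag for x in sig]

def match_word(frame, templates, threshold=0.9):
    """Return the best-matching template word, or None if no match."""
    f = normalize(frame)
    best_word, best_score = None, threshold
    for word, tmpl in templates.items():
        score = sum(a * b for a, b in zip(f, normalize(tmpl)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

TEMPLATES = {"hello": [0.2, 0.9, 0.4], "stop": [0.8, 0.1, 0.1]}
print(match_word([0.21, 0.88, 0.41], TEMPLATES))  # hello
```

Returning None below the threshold matters in practice: it is what keeps the interface from transmitting a word when the facial muscle activity matches nothing in the database.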
25. Image processing flow (reconstructed from the flattened diagram):
Digital data → Pre-processing (with ancillary data) → Feature extraction → Image enhancement → Selection of training data → Manual interpretation → Decision and classification (supervised or unsupervised) → Classification output → Post-processing operations → Accuracy assessment → Maps and imagery, reports, data
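The supervised branch of the classification step above can be sketched as a tiny nearest-centroid classifier: class centroids are learned from training pixels, and each new pixel is assigned to the nearest centroid. The pixel values and class names are invented for illustration.

```python
# Toy sketch of supervised classification in the image processing
# flow: learn one centroid per class from training pixels, then
# assign each new pixel to the nearest centroid.  All values and
# class names below are invented placeholders.

def centroids(training):
    """Mean feature vector per class, from {class: [pixel, ...]}."""
    out = {}
    for cls, pixels in training.items():
        n = len(pixels)
        out[cls] = [sum(p[i] for p in pixels) / n
                    for i in range(len(pixels[0]))]
    return out

def classify(pixel, cents):
    """Assign a pixel to the class with the nearest centroid."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(pixel, c))
    return min(cents, key=lambda k: dist(cents[k]))

TRAIN = {"water": [[10, 40], [12, 38]], "soil": [[80, 60], [84, 58]]}
cents = centroids(TRAIN)
print(classify([11, 39], cents))  # water
```

An unsupervised classifier would instead discover the centroids itself (for example by clustering), which is the distinction the flow diagram draws between its supervised and unsupervised branches.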
