Automated Lip-Reading Technique for People with Speech Disabilities by Converting Identified Visemes into Direct Speech Using Image Processing and Machine Learning Techniques

Presented by: Ahmed Mesbah Ahmed El-taybany
Mentor: Dr. Marwan Torki
Dec 27, 2015

Problem

- Deaf-mute people
- People with residual hearing
- Other speech disabilities

Statistics

Chart: deaf-mute people statistics in Egypt (%), deaf-mute 10%, normal 90%

Background research

SIGN LANGUAGE RECOGNITION

WATCH KEYBOARD ELECTRONIC LARYNX

Main idea

- Decreasing physiological impacts
- A semi-normal state
- It has been shown that humans can replace ears with eyes for speech reading.

Audio-visual speech recognition (AVSR)

Capturing hardware and design

Design advantages and proof of concept

- The Mouthesizer: A Facial Gesture Musical Interface (2004)
- No face detection needed

Lip feature extraction

- Image-based approaches
- Model-based approaches

Lip feature extraction methods used

Chart: number of users per method for AAM, ACM, ASM, thresholding, manifold, LDA, PCA, DCT, DWT
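DCT is one of the image-based feature extraction methods listed above. The sketch below shows one common way it can be applied, under stated assumptions: take a small grayscale mouth region of interest, compute an orthonormal 2D DCT-II, and keep the low-frequency top-left coefficients as a compact feature vector. The names `dct_2d` and `dct_features`, the `keep` parameter, and the square block selection (used here in place of a zigzag scan) are hypothetical illustrations and do not reproduce the presenters' implementation.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a rectangular grayscale block (a list of pixel rows)."""
    rows, cols = len(block), len(block[0])

    def alpha(k, n):
        # Orthonormal scaling: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (block[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out_row.append(alpha(u, rows) * alpha(v, cols) * total)
        coeffs.append(out_row)
    return coeffs


def dct_features(mouth_roi, keep=3):
    """Hypothetical feature extractor: the keep x keep low-frequency DCT coefficients."""
    coeffs = dct_2d(mouth_roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# A uniform 4x4 patch has all of its energy in the DC coefficient.
print(dct_features([[1.0] * 4 for _ in range(4)], keep=2))
```

Because the transform is orthonormal, it preserves the block's energy. Most of the visual information in a smooth mouth image ends up in the few low-frequency coefficients that are kept.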

Classifiers

Chart: number of users per classifier for HMM, NN, KNN, SVM, RDA, DTW, MPTW, HSOM, HCM, FCN, DP

- Hidden Markov models (HMMs) and neural networks (NNs) were the most commonly used classifiers.
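HMMs were the most common classifier in the surveyed work, so here is a minimal sketch of how a per-word discrete HMM can score a sequence of quantized lip observations. It uses the scaled forward algorithm and picks the word model with the highest likelihood. The observation alphabet (0 = closed, 1 = half open, 2 = wide open), the two word models, and all of their probabilities are invented for illustration. They are not trained values and not the presenters' models.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability that state i emits symbol o
    """
    n = len(start)
    log_prob = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[j] * emit[j][symbol] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify_word(obs, models):
    """Return the label of the word model with the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Hypothetical 2-state left-to-right word models: (start, trans, emit).
MODELS = {
    "ba": ([1.0, 0.0],
           [[0.6, 0.4], [0.0, 1.0]],
           [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]),
    "oo": ([1.0, 0.0],
           [[0.7, 0.3], [0.0, 1.0]],
           [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2]]),
}

print(classify_word([0, 0, 2, 2], MODELS))  # closed lips, then wide open
```

In a full system, each word's parameters would be estimated from training sequences, for example with Baum-Welch. Only the scoring step is shown here.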

Datasets

- AVLetters (University of East Anglia)
- Oulu database (University of Oulu)
- CUAVE database (Clemson University)
- Home-made dataset

Problems of multi-speaker lip-reading systems

Variation in:
- Accents
- Talking speeds
- Skin color
- Lip shapes
- Illumination conditions
- Confusing recognition tasks
- Facial hair

International Phonetic Alphabet (IPA)

Visible speech

- Phonemes
- Visemes
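Several phonemes look identical on the lips, so the mapping from phonemes to visemes is many-to-one. The minimal sketch below assumes a small, illustrative grouping: the bilabials /p/, /b/, /m/ share one viseme and the labiodentals /f/, /v/ share another. Phonemes are written as lowercase ARPAbet-style symbols. Published viseme sets differ between studies, and the `PHONEME_TO_VISEME` table and its class names are hypothetical, not the presenters' set.

```python
from collections import defaultdict

# Illustrative, partial phoneme-to-viseme table (hypothetical class names).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "th": "dental", "dh": "dental",
    "w": "rounded", "uw": "rounded",
    "aa": "open", "ae": "open",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme classes ('unknown' if not in the table)."""
    return [PHONEME_TO_VISEME.get(p, "unknown") for p in phonemes]


def confusable_groups(table):
    """Invert the table: phonemes sharing a viseme cannot be told apart visually."""
    groups = defaultdict(list)
    for phoneme, viseme in table.items():
        groups[viseme].append(phoneme)
    return {v: sorted(ps) for v, ps in groups.items() if len(ps) > 1}


print(to_visemes(["b", "aa"]), to_visemes(["p", "aa"]))
```

Both calls return the same viseme sequence. This is why "ba" and "pa" cannot be distinguished from the lips alone, and why a later prediction step is needed.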

Letter prediction methods

Phonemes: seen and unseen

Prediction techniques, such as the Microsoft Speech API or Google, are used to recover unseen letters.
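The slide delegates prediction to an external engine. As a self-contained stand-in, and explicitly not the Microsoft Speech API or any Google service, a lexicon lookup can recover unseen letters. Each dictionary word is reduced to the sequence of visual classes of its seen letters, and the words that match the observed class sequence are ranked by frequency. The letter classes, the choice of which letters count as unseen, the tiny lexicon, and its counts are all hypothetical.

```python
# Hypothetical letter-to-visual-class table. Letters sharing a class look alike on
# the lips; letters missing from the table (e.g. d, t, n) are treated as unseen.
LETTER_CLASS = {
    "p": "B", "b": "B", "m": "B",
    "f": "F", "v": "F",
    "o": "O", "u": "O", "w": "O",
    "a": "A",
    "e": "E", "i": "E",
}

# Hypothetical lexicon with invented usage counts.
LEXICON = {"mad": 30, "bad": 80, "pat": 20, "bat": 45, "fan": 15, "van": 25}


def visual_signature(word):
    """Visual classes of the seen letters of a word (unseen letters are dropped)."""
    return tuple(LETTER_CLASS[ch] for ch in word if ch in LETTER_CLASS)


def predict_words(observed, lexicon=LEXICON, top_n=3):
    """Words whose seen letters match the observed classes, most frequent first."""
    target = tuple(observed)
    matches = [w for w in lexicon if visual_signature(w) == target]
    return sorted(matches, key=lambda w: lexicon[w], reverse=True)[:top_n]


print(predict_words(["B", "A"]))  # ['bad', 'bat', 'mad']
```

The observed classes "B A" only say "a bilabial, then an open vowel". The lexicon and frequency counts fill in both the ambiguous first letter and the unseen final letter.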

Lip reading system

1. Input
2. Feature extraction
3. Classification
4. Output
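The four stages can be sketched end to end. This is a minimal illustration under the following assumptions:

- Frames are small grayscale mouth-region images (lists of pixel rows, 0 = black, 255 = white), as a camera fixed on the mouth might deliver without a face detection step.
- Feature extraction is simple thresholding of the dark mouth interior into an opening height and width.
- Classification is a 1-nearest-neighbour match (a KNN variant) against labelled templates.

The threshold, the templates, and all function names are hypothetical.

```python
import math

DARK_THRESHOLD = 60  # hypothetical cut-off: pixels darker than this are mouth interior


def extract_features(frame, threshold=DARK_THRESHOLD):
    """Feature extraction by thresholding: (height, width) of the dark mouth opening."""
    dark = [(r, c) for r, row in enumerate(frame)
            for c, px in enumerate(row) if px < threshold]
    if not dark:
        return (0.0, 0.0)  # no dark pixels: lips closed
    rows = [r for r, _ in dark]
    cols = [c for _, c in dark]
    return (float(max(rows) - min(rows) + 1), float(max(cols) - min(cols) + 1))


def nearest_template(features, templates):
    """Classification: label of the closest template by Euclidean distance (1-NN)."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))


def lip_read(frames, templates):
    """Input frames -> features -> class labels (the output stage)."""
    return [nearest_template(extract_features(f), templates) for f in frames]


# Hypothetical viseme templates: (opening height, opening width) in pixels.
TEMPLATES = {"closed": (0.0, 0.0), "rounded": (2.0, 2.0), "wide open": (3.0, 4.0)}

W, D = 200, 20  # bright lip/skin pixel, dark mouth-interior pixel
CLOSED = [[W] * 6 for _ in range(5)]
ROUNDED = [[W] * 6,
           [W, W, D, D, W, W],
           [W, W, D, D, W, W],
           [W] * 6,
           [W] * 6]
WIDE = [[W] * 6,
        [W, D, D, D, D, W],
        [W, D, D, D, D, W],
        [W, D, D, D, D, W],
        [W] * 6]

print(lip_read([CLOSED, WIDE, ROUNDED], TEMPLATES))  # ['closed', 'wide open', 'rounded']
```

A real system would swap in richer features (for example, the model-based or transform-based methods surveyed earlier) and a sequence classifier such as an HMM. The stage boundaries stay the same.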

Applications

Thanks