P A G E 1 © 2015 Apio Systems, Inc. Confidential 1 Jared Sheehan @ Driversiti Speech Recognition as a User Interface

Speech Recognition as a User Interface

Feb 11, 2017

Jared Sheehan

Speech Recognition as a User Interface

Jared Sheehan @ Driversiti

Who am I

Glass Explorer, speech recognition enthusiast and big Android nerd

Android Lead @Driversiti - driving safety for the mobile generation

Speech Recognition application for the Amazon Fire Phone

Suite of applications - AIM Android, Engadget Android, Distro Android, TechCrunch Android, AOL HD, AIM Blackberry

Meetup evangelist: DC Android Meetup Group. Join today!

Overview

What is voice/speech recognition?

What awesome stuff can you do with it?

How it works

Demo!

Question and Answer

Hello Computer

Definition

What can you do with SR?

Technology that allows spoken input into software systems.

You speak to your computer, tablet, phone or device and it uses what you said as input to trigger some sort of action.

It can replace other methods of input such as clicking, swiping, typing or selecting in other ways.

It is a means to make devices and software more user-friendly and to increase productivity.

It is used extensively as a form of accessibility assistance.

ASR - Dictation

Automatic speech recognition (ASR), also called dictation

Translates speech input into words, sentences and punctuation.

Audio is input through a microphone and streamed to a recognizer, either on the device or on a server

The result is usually returned as a string with a confidence level

Very easy integration with Android: there are 2 ways to do it.
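
To make the "string with a confidence level" idea concrete, here is a minimal Python sketch (not Android code). It assumes the recognizer hands back several candidate transcriptions, each with a confidence between 0 and 1. The `Hypothesis` type, the `best_hypothesis` helper, the 0.5 threshold and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """One candidate transcription plus the recognizer's confidence (0.0 to 1.0)."""
    text: str
    confidence: float


def best_hypothesis(results: List[Hypothesis],
                    min_confidence: float = 0.5) -> Optional[Hypothesis]:
    """Return the most confident hypothesis, or None if nothing clears the threshold."""
    if not results:
        return None
    top = max(results, key=lambda h: h.confidence)
    return top if top.confidence >= min_confidence else None


# Made-up example data, not output from a real recognizer.
results = [
    Hypothesis("call mom", 0.91),
    Hypothesis("call tom", 0.42),
    Hypothesis("cold mom", 0.10),
]
best = best_hypothesis(results)
print(best.text if best else "Sorry, I didn't catch that")
```

Rejecting low-confidence results and asking the user to repeat is one possible design choice; an app could just as well show the top few candidates and let the user pick.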

How does it work?

A user speaks into a recording device of some sort

Speech recognition begins with the digital sampling of speech and then acoustic signal processing of the audio.

Several techniques, including dynamic time warping (DTW), hidden Markov models (HMMs) and neural networks (NNs), can achieve the desired results

Most systems use language-specific knowledge to tune the models.

Next is the actual recognition of phonemes, groups of phonemes and words
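
Of the techniques named above, dynamic time warping is the easiest to show in a few lines. The sketch below is a textbook DTW distance in Python, under simplifying assumptions: each frame of audio is reduced to a single number, whereas real systems compare feature vectors, and the `template`, `slow` and `other` sequences are invented for illustration. It is not how any particular recognizer is implemented.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical "word template" and two utterances to compare against it.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]  # same shape, spoken slower
other = [3.0, 1.0, 3.0, 1.0, 3.0]                          # a different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slower utterance aligns with the template at zero cost because DTW is allowed to stretch time, which is why it suits isolated-word matching where speaking rate varies.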

Speech Recognition system architecture

Into the weeds

Speaker dependence

Speaker independence

Continuous Speech

How good is your system? Hint: Word Error Rate
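
Word error rate (WER) is commonly defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal Python sketch, assuming whitespace tokenization and case-insensitive comparison (both are simplifications; the function name and the sample sentences are made up for illustration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # cost 0 when the words match
            deletion = prev[j] + 1                 # reference word missing from hypothesis
            insertion = curr[j - 1] + 1            # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("send a text to mom", "send a text to tom"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a percentage of words that were wrong.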

Isolated word

Is that all it does??

Dictation is cool, but not that cool

Next step is understanding what the user wants to do

Then act on it

Generally, the ASR results are passed into an intent recognition system along with additional contextual information

Contextual information can include where the utterance is coming from (mobile phone, computer), what app the user is in, their location, etc.

That information is used to determine the user's intent and execute the request.
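
A toy Python sketch of that step: ASR text plus context goes in, an intent with its slots comes out. Everything here is hypothetical (the `Intent` type, the two hand-written regex rules, the `context` keys); production systems generally use statistical or learned models rather than a rule table like this.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intent:
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical hand-written rules, for illustration only.
RULES = [
    ("send_sms", re.compile(r"^send a text to (?P<contact>\w+) saying (?P<body>.+)$")),
    ("find_nearby", re.compile(r"^find (?P<what>.+) near me$")),
]


def recognize_intent(utterance: str, context: Dict[str, str]) -> Optional[Intent]:
    """Match the ASR text against the rules and enrich the result with context."""
    text = utterance.lower().strip()
    for name, pattern in RULES:
        match = pattern.match(text)
        if match:
            slots = match.groupdict()
            # Context supplies what the words alone leave open, e.g. where "near me" is.
            if name == "find_nearby":
                slots["location"] = context.get("location", "unknown")
            return Intent(name, slots)
    return None


context = {"device": "mobile phone", "location": "Washington, DC"}
print(recognize_intent("Send a text to Mom saying running late", context))
print(recognize_intent("Find coffee near me", context))
```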

Intent recognition

Recognizing speech is only part of the process. How does Google Now know that I want to send an SMS message to a friend? How does Siri know when I want to know how tall Kobe Bryant is?

ASR is only the first step in true speech as a user interface. To successfully help users perform useful actions we must understand their intent. How do we do this?

Three systems: ASR, intent recognition (IR) and a dialog engine

The dialog engine takes the output from the IR system and sends responses and actionable information back to the caller.
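
The three-stage flow can be sketched end to end in Python. This is only an illustration of how the pieces connect: the `asr` function is a stand-in that pretends the audio bytes are already text, the keyword-based `intent_recognition` is deliberately naive, and the `Response` fields and action dictionaries are hypothetical shapes, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    speech: str                        # what to say back to the user
    action: Optional[Dict[str, str]]   # actionable information for the calling app


def asr(audio: bytes) -> str:
    """Stand-in for a real recognizer: pretends the audio is UTF-8 text."""
    return audio.decode("utf-8")


def intent_recognition(text: str) -> Dict[str, str]:
    """Toy IR step based on fixed phrase prefixes."""
    lowered = text.lower().strip()
    if lowered.startswith("text "):
        return {"intent": "send_sms", "contact": lowered[len("text "):]}
    if lowered.startswith("how tall is "):
        return {"intent": "lookup_height", "subject": lowered[len("how tall is "):]}
    return {"intent": "unknown"}


def dialog_engine(ir_output: Dict[str, str]) -> Response:
    """Turn the IR output into a spoken response plus something the caller can act on."""
    intent = ir_output["intent"]
    if intent == "send_sms":
        contact = ir_output["contact"]
        return Response(f"What do you want to say to {contact}?",
                        {"type": "compose_sms", "to": contact})
    if intent == "lookup_height":
        subject = ir_output["subject"]
        return Response(f"Looking up how tall {subject} is.",
                        {"type": "web_search", "query": f"{subject} height"})
    return Response("Sorry, I didn't understand that.", None)


def handle(audio: bytes) -> Response:
    """ASR -> intent recognition -> dialog engine."""
    return dialog_engine(intent_recognition(asr(audio)))


print(handle(b"how tall is Kobe Bryant"))
```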

Android Speech APIs

http://developer.android.com/reference/android/speech/package-summary.html

Relatively easy implementation

Two APIs: one with a UI and one without

InputMethodServices (keyboards) use the no-UI version

RecognizerIntent

UI is supplied for you

Fire the intent and get a result

Again very easy to use

SpeechRecognizer

UI is not supplied for you

Results arrive through RecognitionListener callbacks and can be streamed directly into an EditText

Still fairly easy to use

What else is there?

Onto intent recognition systems

Google Now

Google Now on Tap

Apple Siri

Amazon Fire Phone, Fire TV and Echo

Microsoft Cortana

Speech providers: Google, Nuance, IBM Watson

Google Voice Interaction API

Nuance Speech SDK

Dragon Mobile SDK: free up to 20k transactions per month

Upload custom vocabularies

Developer: uploads a new song and music vocabulary

Utterance: "Eminem" gets a higher probability than "M&M"
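
The effect of a custom vocabulary can be illustrated with a small rescoring sketch in Python. This is not the Nuance SDK and makes no claim about how it works internally: the `rescore` function, the `boost` factor of 2.0, the candidate scores and the vocabulary list are all assumptions invented to show the idea that in-domain terms get pushed up the ranking.

```python
from typing import Dict, Iterable


def rescore(candidates: Dict[str, float],
            custom_vocabulary: Iterable[str],
            boost: float = 2.0) -> Dict[str, float]:
    """Multiply the scores of in-vocabulary candidates by `boost`, then renormalize."""
    vocab = {term.lower() for term in custom_vocabulary}
    boosted = {
        text: score * (boost if text.lower() in vocab else 1.0)
        for text, score in candidates.items()
    }
    total = sum(boosted.values())
    return {text: score / total for text, score in boosted.items()}


# Two acoustically similar candidates with made-up scores.
candidates = {"M&M": 0.55, "Eminem": 0.45}
music_vocabulary = ["Eminem", "Lose Yourself"]

rescored = rescore(candidates, music_vocabulary)
print(max(candidates, key=candidates.get))  # M&M wins without the vocabulary
print(max(rescored, key=rescored.get))      # Eminem wins once the vocabulary is applied
```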

User Interface examples - Google Glass

User Interface examples - Google Glass continued

Enough talk!

Show me code!

[email protected]

www.meetup.com/DCAndroid/

Tweet: @jayroo5245

THANK YOU