Talking to machines, listening to humans
Gordon Plant, WIAD 2017
@gordonplant
Why talk to machines?
https://www.theguardian.com/culture/gallery/2015/jan/08/the-top-20-artificial-intelligence-films-in-pictures#img-7
If I could talk to the animals machines…
“If I could talk to the animals, just imagine it
Chatting to a chimp in chimpanzee
Imagine talking to a tiger, chatting to a cheetah
What a neat achievement that would be”
Doctor Dolittle (1967)
Humans anthropomorphise everything
• “54 per cent of people have verbally assaulted their computers, while 40 per cent have resorted to physical violence”
http://www.telegraph.co.uk/technology/5086091/Computer-rage-affects-more-than-half-of-Britons.html
Why talk to machines?
• Talking doesn’t interrupt other tasks
• Attention can remain focussed elsewhere
• Machines are ‘effort multipliers’
• Effort / Reward ratio is improved as effort is reduced
• Some intentions are hard to express via a GUI
Who’s talking already?
Survey of 1250 people by Creative Strategies, October 2016
• 22% use a voice assistant four to six times a week
• 33% think it is more convenient to talk than type
• 27% would prefer to interact with bots in the car
• 26% would prefer to interact with bots in the home
http://creativestrategies.com/no-bots-please-europeans/
How does a LUI work?
Utterance, intent, invocation
• Utterance
  • The spoken words
• Intent
  • A recognisable intent extracted from parsing the utterance
• Invocation phrase
  • The phrase that launches the relevant ‘skill’
A ‘skill’ is a bit like an app on your phone
Utterance structure
Alexa, tell HAL to open the pod bay doors
• Wake word: “Alexa”
• Request: “tell”
• Invocation name: “HAL”
• Connecting word: “to”
• Intent: “open the pod bay doors”
The whole phrase is the utterance.
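The breakdown above can be sketched as a toy parser. This is purely illustrative, not how Alexa actually works; the lists of request words and connecting words are assumptions based on the examples in this deck.

```python
import re

# Toy breakdown of an Alexa-style utterance into its labelled parts.
# Real voice platforms use statistical NLU; this is a deliberate simplification.
WAKE_WORD = "alexa"
REQUEST_WORDS = {"tell", "ask"}          # assumed set, from the examples here
CONNECTING_WORDS = {"to", "that", "about"}

def parse_utterance(utterance):
    """Split 'Alexa, tell HAL to open the pod bay doors' into its parts."""
    words = re.findall(r"[\w']+", utterance.lower())
    if not words or words[0] != WAKE_WORD:
        raise ValueError("no wake word: the device would not even be listening")
    if len(words) < 3 or words[1] not in REQUEST_WORDS:
        raise ValueError("no recognised request word")
    parts = {
        "wake_word": words[0],
        "request": words[1],
        "invocation_name": words[2],  # launches the relevant 'skill'
    }
    rest = words[3:]
    if rest and rest[0] in CONNECTING_WORDS:
        parts["connecting_word"] = rest.pop(0)
    parts["intent_phrase"] = " ".join(rest)  # handed to the skill for intent matching
    return parts

print(parse_utterance("Alexa, tell HAL to open the pod bay doors"))
```

Note the parser lowercases everything; the structure, not the casing, is the point.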
Let’s turn on the heating
The same intention can reach the boiler three ways:
• Voice input → skill → boiler command
• Touch input → app → boiler command
• Direct input → boiler command
Alexa, tell Hive to turn the heating on
• Wake word: “Alexa”
• Request: “tell”
• Invocation name: “Hive”
• Connecting word: “to”
• Intent: “turn the heating on”
The whole phrase is the utterance.
Alternative request words
• Talk to
• Open
• Launch
• Start
• Resume
• Run
• Load
• Begin
Let’s turn on the heating
Alexa, tell Hive to turn the heating on
Alexa, tell Hive to turn the heating on to 20 degrees
Alexa, tell Hive to put the heating on to 20 degrees
Alexa, tell Hive to boost my heating
Tell <invocation name> <connecting word> <some action>
Let’s turn on the heating
• Alexa, tell Hive to turn the heating on / Alexa, turn the heating on
• Alexa, tell Hive to turn the heating on to 20 degrees / Alexa, Hive to 20 degrees
• Alexa, tell Hive to put the heating on to 20 degrees / Alexa, put the heating on for 20 degrees
• Alexa, tell Hive to boost my heating / Alexa, tell Hive to put the heating on for 1 hour
Tell <invocation name> <connecting word> <some action>
For every phrase that works, there are many similar ones that don’t
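That brittleness comes from skills matching against a finite list of registered sample utterances. A minimal sketch of that model follows; the sample phrases, intent names and the exact-match rule are illustrative assumptions (real NLU is fuzzier), but the registered-samples idea is how skills are built.

```python
import re

# A skill only understands intents whose sample utterances were registered
# in advance; near-misses fall through to an error response.
SAMPLE_UTTERANCES = {
    "turn the heating on": "HeatingOnIntent",
    "turn the heating on to {temp} degrees": "SetTempIntent",
    "put the heating on to {temp} degrees": "SetTempIntent",
    "boost my heating": "BoostIntent",
}

def match_intent(phrase):
    for sample, intent in SAMPLE_UTTERANCES.items():
        # Turn '{slot}' placeholders into capture groups; match the rest literally.
        pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(sample))
        m = re.fullmatch(pattern, phrase)
        if m:
            return intent, m.groupdict()
    return None, {}  # "Sorry, I didn't understand that"

print(match_intent("turn the heating on"))                # ('HeatingOnIntent', {})
print(match_intent("put the heating on for 20 degrees"))  # (None, {})
```

Swapping one word (“for” instead of “to”) is enough to fall off the list, which is exactly the failure users hit.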
GUI vs LUI

GUI:
• Wake → invoke app → navigate app → tap button → confirmation
• Model / modes / actions are discoverable at launch
• User only needs to remember the name or location of the app

LUI:
• “Alexa, tell Hive to turn the heating on” → “OK”
• <invocation name> <connecting word> <some action>
• Model / modes / actions only discoverable by trial and error
• User needs to remember complete, structured sentences
GUI: We don’t have to think about resolving the click – it just happens.
LUI: Many inputs may resolve to the same ‘click’, and others may not resolve at all.
[Charts after Matthew Honnibal: phrasings such as “Alexa, tell Hive to turn the heating on”, “Alexa, ask Hive heating on” and “Alexa, turn the heating on” cluster around the “Heating on” intent; “Alexa, tell Hive to turn the heating off” and “Alexa, tell Hive heating off” cluster around “Heating off”; “Tell me a joke” falls near neither.]
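Honnibal’s point can be sketched as a nearest-intent classifier: many phrasings resolve to the same intent ‘button’, and anything too far from every button resolves to nothing. The word-overlap similarity and the threshold below are deliberately crude stand-ins for real NLU.

```python
# Crude word-overlap (Jaccard) similarity between an input phrase and each
# intent's example phrasings; below a threshold, nothing resolves at all.
INTENT_EXAMPLES = {
    "HeatingOn": ["turn the heating on", "heating on", "put the heating on"],
    "HeatingOff": ["turn the heating off", "heating off"],
}

def jaccard(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def resolve(phrase, threshold=0.5):
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        score = max(jaccard(phrase, ex) for ex in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    # Nothing close enough to any 'button' on the invisible canvas.
    return best_intent if best_score >= threshold else None

print(resolve("turn the heating on"))  # HeatingOn
print(resolve("tell me a joke"))       # None
```

Several different phrasings land on the same intent, while an off-canvas phrase lands nowhere, which is the LUI equivalent of clicking on empty space.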
Tell me a joke
What’s the weather?
Play some jazz
What’s the time in Seattle?
Play some rock
How’s my diary looking?
Add beer to my shopping list
Did Arsenal win last night?
Get the Batmobile ready
Buy more dishwasher tabs
Lock the back door
Where’s that beer I ordered?
https://medium.com/swlh/a-natural-language-user-interface-is-just-a-user-interface-4a6d898e9721#.bogmc1aru
The Invisible Canvas
“[With a LUI] you have a vastly bigger canvas on which the user
can “click”…But you still have to paint buttons, forms, navigation
menus etc. onto this canvas. You’re still wiring a UI to some fixed
underlying set of capabilities.”
Matthew Honnibal
https://medium.com/swlh/a-natural-language-user-interface-is-just-a-user-interface-4a6d898e9721#.bogmc1aru
Listening to people
https://www.youtube.com/watch?v=RFqe8U8qw-M
All conversation has a shared context
• When two people talk, their context will modify tone and content…
  • We have social rules around phone calls, texts, IM etc.
  • These are modifications to the rules of face-to-face conversation
  • We use different language for work / home / social contexts
• …but machines have no context to share
  • We have to explicitly model the context for the machine
  • Alexa does not have ‘common sense’
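Because the machine shares no context, the skill has to carry it explicitly, threading state through each turn. A minimal sketch, assuming a simple dict-shaped session (loosely in the spirit of the session attributes real voice platforms pass between turns; the replies and 20-degree default are made up):

```python
# Each turn receives the stored context and returns an updated one.
# Without that stored context, "turn it up" is unanswerable.
def handle_turn(utterance, context):
    if "heating" in utterance:
        context["topic"] = "heating"
        context["temperature"] = 20
        return "Heating set to 20 degrees.", context
    if "turn it up" in utterance:
        if context.get("topic") != "heating":
            return "Sorry, turn what up?", context  # no shared context to draw on
        context["temperature"] += 2
        return f"Heating set to {context['temperature']} degrees.", context
    return "Sorry, I didn't understand that.", context

ctx = {}
reply, ctx = handle_turn("put the heating on", ctx)
reply, ctx = handle_turn("turn it up", ctx)  # resolvable only because we stored context
print(reply)  # Heating set to 22 degrees.
```

A human would resolve “turn it up” from the conversation; the machine only can because we modelled the context by hand.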
Authentication
• In human-to-human conversation we authenticate on the sound of a voice almost instantly
• To Alexa, all voices are equal
• Without authentication, many potential uses for LUIs are insecure
https://uxdesign.cc/what-we-can-learn-from-alexas-mistakes-a4670a9e6c3e#.3l540jsf4
You talkin’ to me?
• We rely on tone of voice to provide metadata about the message content
• “It’s not what you said, it’s the way you said it.”
• Without the metadata, the communications capacity (bandwidth) is greatly reduced
GSOH essential
• “Amazon, Google, Apple and Facebook have been recruiting a diverse cast
of script writers, audio specialists and comedians. It is part of a much wider
drive in digital industry to hire those with an understanding of how etiquette,
creativity, dramatic timing and humour can elevate a digital experience.
Google, for instance, is reportedly working with joke writers from Pixar and
The Onion to imbue its new Assistant with some personality.”
• http://www.mobileuserexperience.com/?cat=79
Tone of voice
When will it be like the movies?
https://www.theguardian.com/culture/gallery/2015/jan/08/the-top-20-artificial-intelligence-films-in-pictures#img-17
Conclusion
https://goo.gl/images/lHH00z
Consider the context
• Spatial context
  • Is the user in a space where speech can be recognised?
• Social context
  • Is it socially acceptable to talk to a machine?
• Task context
  • Is the user engaged in some other task?
The Hype Cycle & Amara’s Law
“We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run”
Roy Amara
Source https://en.wikipedia.org/wiki/Hype_cycle
We are here