Specialization Module Speech Technology Timo Baumann [email protected] Universität Hamburg, Department of Informatics Natural Language Systems Group

Feb 22, 2022

Page 1: Specialization Module

Specialization Module

Speech Technology

Timo Baumann

[email protected]

Universität Hamburg, Department of Informatics, Natural Language Systems Group

Page 2: Specialization Module

Spoken Dialogue, a Complex Interactive System

Page 3: Specialization Module

The Noisy-channel Model

[Diagram: INFORMATION SOURCE → (MESSAGE) → TRANSMITTER → (SIGNAL) → channel, disturbed by a NOISE SOURCE → (RECEIVED SIGNAL) → RECEIVER → (MESSAGE) → DESTINATION]
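
The chain in the diagram can be played through in a few lines of code. The following is only a minimal sketch under assumptions that are not on the slide: the message is text, the signal is a list of bits, and the noise source flips each bit independently. The names `transmit`, `add_noise`, `receive`, and `flip_probability` are made up for this illustration.

```python
import random

def transmit(message: str) -> list[int]:
    """Transmitter: turn the message into a signal (here: a list of bits)."""
    bits = []
    for byte in message.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits

def add_noise(signal: list[int], flip_probability: float, rng: random.Random) -> list[int]:
    """Noise source (assumed model): flip each bit independently."""
    return [bit ^ 1 if rng.random() < flip_probability else bit for bit in signal]

def receive(received_signal: list[int]) -> str:
    """Receiver: turn the received signal back into a message."""
    data = bytearray()
    for start in range(0, len(received_signal) - 7, 8):
        byte = 0
        for bit in received_signal[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    # Corrupted bytes may no longer be valid UTF-8, so replace them on decoding.
    return data.decode("utf-8", errors="replace")

rng = random.Random(0)
signal = transmit("hello")
print(receive(add_noise(signal, 0.0, rng)))   # no noise: the message arrives unchanged
print(receive(add_noise(signal, 0.05, rng)))  # with noise: the message may be corrupted
```

With noise, the destination cannot tell from the received signal alone whether the message matches what the source intended.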

Page 4: Specialization Module

The Chain Model of Communication

[Diagram: INFORMATION SOURCE → (MESSAGE) → TRANSMITTER → (SIGNAL) → channel, disturbed by a NOISE SOURCE → (RECEIVED SIGNAL) → RECEIVER → (MESSAGE) → DESTINATION]

Page 5: Specialization Module

The Chain Model of Communication

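[Diagram, first chain: INFORMATION SOURCE → (MESSAGE) → TRANSMITTER → (SIGNAL) → channel, disturbed by a NOISE SOURCE → (RECEIVED SIGNAL) → RECEIVER → (MESSAGE) → DESTINATION]

[Diagram, second chain, shown without a noise source: INFORMATION SOURCE → (MESSAGE) → TRANSMITTER → (SIGNAL) → (RECEIVED SIGNAL) → RECEIVER → (MESSAGE) → DESTINATION]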

Page 6: Specialization Module

Chain model of Communication

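Speaker side (from idea to sound):

● pragmatics: find message that describes idea

● semantics/lexicology: determine structure to convey meaning

● syntax/morphology: sequentialize structure to word stream

● phonology/phonetics: represent words through sounds

Listener side (from sound to idea):

● phonology/phonetics: recombine sounds to words

● syntax/morphology: recover structure of sequence

● semantics/lexicology: determine meaning of structure

● pragmatics: recover idea described by message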

Page 7: Specialization Module

Human Communication (simplified)

Speaker: Concept → Verbalization → muscular movements

Listener: sensory impression → Interpretation → Received Concept

Page 8: Specialization Module

Human Communication (simplified)

Speaker: Concept → Verbalization → muscular movements

Listener: sensory impression → Interpretation → Received Concept

Concept ↔ Received Concept: correspondence (hopefully!)

Page 9: Specialization Module

But what about dialogue?

In what ways does the simple model for a dialog agent seem insufficient to you? (in pairs / small groups; 5 minutes)

Page 10: Specialization Module

Aspects of dialogue

● bi-directional communication

– no clear „sender“ and „receiver“; agents are both

● agents share the communication channel

– time-sharing

– additional feedback signals

– simultaneous speech is more frequent than we think!

● communication is controlled interactively by both the current-speaker and the current-listener

➔ local management within each layer (e.g. entrainment)

➔ turn-taking!

Page 11: Specialization Module

Dialogue (simplified)

dialogue agent dialogue agent

Page 12: Specialization Module

Turn-taking

● the question of who talks when in a dialogue – „who holds the floor“ → the task is called floor-tracking or end-of-turn-detection

● need to find out whether the other speaker has finished / whether it's OK to start speaking

Page 13: Specialization Module

The many kinds of turn-taking signals:

What may indicate that your turn is over / that your interlocutor may take the floor?

Page 14: Specialization Module

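Speaker side (from idea to sound):

● pragmatics: find message that describes idea

● semantics/lexicology: determine structure to convey meaning

● syntax/morphology: sequentialize structure to word stream

● phonology/phonetics: represent words through sounds

Listener side (from sound to idea):

● phonology/phonetics: recombine sounds to words

● syntax/morphology: recover structure of sequence

● semantics/lexicology: determine meaning of structure

● pragmatics: recover idea described by message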

Page 15: Specialization Module

Towards a model of a dialogue agent

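[Diagram: a pipeline of modules. Sound → speech recognition → Words → language understanding → DA → Dialog-Manager → DA → language generation → Words → speech synthesis → Sound. The Dialog-Manager also uses the History and the Domain (connection labelled SQL? DA?).]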

Page 16: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

Page 17: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

Sound

Page 18: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

Words

Page 19: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

DA

DA = dialog act

Page 20: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

SQL? DA?

DA = dialog act

Page 21: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

DA

DA = dialog act

Page 22: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

Words

DA = dialog act

Page 23: Specialization Module

A simple dialogue agent

Domain

speech recognition

History

Dialog-Manager

language understanding

speech synthesis

language generation

Sound

DA = dialog act
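
The walk through the simple dialogue agent (Sound, Words, DA, domain query, DA, Words, Sound) can be sketched as a chain of functions. This is a toy illustration under loud assumptions, not the lecture's implementation: the "sound" is plain text, the domain is a hypothetical two-entry table of opening hours, and all function and class names are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class DialogAct:
    """A dialog act (DA): an intent plus optional slots (assumed representation)."""
    intent: str
    slots: dict = field(default_factory=dict)

# Hypothetical toy domain: opening hours of two places.
DOMAIN = {"library": "9 to 18", "cafeteria": "11 to 14"}

def speech_recognition(sound: str) -> str:
    # Stand-in: the "sound" is already text; a real recogniser would decode audio.
    return sound.lower().strip()

def language_understanding(words: str) -> DialogAct:
    # Words -> DA, by simple keyword spotting.
    for place in DOMAIN:
        if place in words:
            return DialogAct("ask_hours", {"place": place})
    return DialogAct("unknown")

def dialog_manager(user_act: DialogAct, history: list) -> DialogAct:
    # DA -> DA, consulting the history and the domain.
    history.append(user_act)
    if user_act.intent == "ask_hours":
        place = user_act.slots["place"]
        return DialogAct("inform_hours", {"place": place, "hours": DOMAIN[place]})
    return DialogAct("request_rephrase")

def language_generation(system_act: DialogAct) -> str:
    # DA -> Words.
    if system_act.intent == "inform_hours":
        return f"The {system_act.slots['place']} is open from {system_act.slots['hours']}."
    return "Sorry, could you rephrase that?"

def speech_synthesis(words: str) -> str:
    # Stand-in: returns a tagged string instead of audio.
    return f"<audio: {words}>"

def dialogue_agent(sound: str, history: list) -> str:
    words = speech_recognition(sound)
    user_act = language_understanding(words)
    system_act = dialog_manager(user_act, history)
    return speech_synthesis(language_generation(system_act))

history = []
print(dialogue_agent("When is the library open?", history))
```

Each module has exactly one input type and one output type, and nothing in this chain decides when the agent should speak.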

Page 24: Specialization Module

Where is turn-taking?

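[Diagram: a pipeline of modules. Sound → speech recognition → Words → language understanding → DA → Dialog-Manager → DA → language generation → Words → speech synthesis → Sound. The Dialog-Manager also uses the History and the Domain (connection labelled SQL? DA?).]

DA = dialog act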

Page 25: Specialization Module

Where is turn-taking?

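[Diagram: a pipeline of modules. Sound → speech recognition → Words → language understanding → DA → Dialog-Manager → DA → language generation → Words → speech synthesis → Sound. The Dialog-Manager also uses the History and the Domain (connection labelled SQL? DA?). An added Floor Tracking component (labelled S: and U:) receives signals, namely words and prosody.]

DA = dialog act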

Page 26: Specialization Module

blackboard-based architecture

[Diagram: Speech Recognition, Language Understanding, Dialog Manager, Language Generation, and Speech Synthesis, together with domain and history, are all connected to a central Blackboard. Connection labels: speech, words, multiple.]
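
A blackboard can be sketched as a shared store that every module may write to and read from. The following is a minimal, single-threaded illustration under assumptions: the class `Blackboard`, the entry kinds (`"words"`, `"prosody"`, `"dialog_act"`, `"floor"`), and both example modules are hypothetical and only meant to show the data flow.

```python
from collections import defaultdict

class Blackboard:
    """Shared store: any module may post entries and read all other entries."""

    def __init__(self):
        self.entries = defaultdict(list)   # entry kind -> list of posted values
        self.subscribers = []              # callbacks notified on every post

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def post(self, kind, value):
        self.entries[kind].append(value)
        for callback in list(self.subscribers):
            callback(self, kind, value)

def understanding_module(board, kind, value):
    # Reacts to words, but also looks at prosody posted by another module.
    if kind == "words":
        rising = "rising" in board.entries["prosody"]
        is_question = rising or value.endswith("?")
        board.post("dialog_act", "question" if is_question else "statement")

def floor_tracker(board, kind, value):
    # Toy rule (assumption): after a question, the floor goes to the system.
    if kind == "dialog_act":
        board.post("floor", "system" if value == "question" else "user")

board = Blackboard()
board.subscribe(understanding_module)
board.subscribe(floor_tracker)
board.post("prosody", "rising")
board.post("words", "you are coming")
print(board.entries["floor"])
```

This sketch sidesteps concurrency entirely by running all callbacks in one thread; coordinating truly parallel modules on a shared store is the hard part.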

Page 28: Specialization Module

pipeline vs. blackboard

pipeline:

● conceptually very simple

● modules have one input type and one output type

● may use existing modules

● concurrency is easy

● unable to completely solve the problem

● modules can be merged if this is necessary/helpful

blackboard:

● conceptually simple (but complex interactions)

● modules may look at all other modules' output

● may use existing modules (but then lose advantage)

● concurrency is very hard

● in principle able to solve the dialogue problem

● merging is not necessary but possible

Page 29: Specialization Module

How can a simple dialogue agent work?

● dialog interaction is very robust

● in particular, turn-taking behaviour in humans is excellent

● (systems) theoretically: different attractors are available

– coming to a „slow mode“ of turn-taking if other is slow to respond

– conversing more clearly to be understood

– stopping oneself from giving feedback if other is confused by that

– ...
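
The "slow mode" attractor can be illustrated with a very small update rule. This is a sketch under assumptions, not a model from the lecture: the agent keeps a silence threshold (how long it waits before speaking) and nudges it towards the response gaps it observes in its partner. The function name, the `rate`, and the clamping bounds are invented for this example.

```python
def adapt_silence_threshold(current_ms, observed_gap_ms, rate=0.2,
                            minimum_ms=200, maximum_ms=2000):
    """Move the silence threshold one step towards the partner's observed gap."""
    updated = current_ms + rate * (observed_gap_ms - current_ms)
    # Keep the threshold within plausible bounds (assumed values).
    return max(minimum_ms, min(maximum_ms, updated))

threshold_ms = 500.0
for _ in range(20):
    # The partner consistently takes about 1200 ms to respond.
    threshold_ms = adapt_silence_threshold(threshold_ms, 1200)
print(round(threshold_ms))
```

As long as the partner stays slow, the threshold drifts towards the partner's pace and stays there, which is the sense in which the slow mode acts as a stable state.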

Page 30: Specialization Module

Conclusion

● most applied systems are modular and pipeline-based (possibly where some modules are merged, forming their own mini-blackboards)

● most applied systems ignore all but the very obvious turn-taking signals: they speak after no-one else has spoken for some time (e.g. 500 milliseconds) and they stop speaking when someone else barges in

● turn-taking is extremely complex and uses prosodic and other features

● turn-taking is very robust. This relies on attraction towards stable states in the complex dialogue system

– dialog systems get away with spending little effort on good turn-taking behaviour
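
The "very obvious" strategy described above (speak after some silence, stop on barge-in) can be sketched as follows. The frame-based input, the 10 ms frame length, and the event names are assumptions made for this illustration; only the 500 ms example threshold comes from the slide.

```python
def turn_taking_events(frames, frame_ms=10, silence_threshold_ms=500):
    """Yield (time_ms, event) pairs for a stream of activity frames.

    frames: iterable of (user_is_speaking, system_is_speaking) booleans,
    one pair per frame. Events are "take_turn" (nobody has spoken for the
    threshold duration) and "stop_speaking" (the user barges in).
    """
    silence_ms = 0
    turn_offered = False
    was_overlap = False
    for index, (user_speaking, system_speaking) in enumerate(frames):
        time_ms = (index + 1) * frame_ms
        overlap = user_speaking and system_speaking
        if user_speaking or system_speaking:
            silence_ms = 0
            turn_offered = False
            if overlap and not was_overlap:
                yield time_ms, "stop_speaking"   # barge-in detected
        else:
            silence_ms += frame_ms
            if silence_ms >= silence_threshold_ms and not turn_offered:
                turn_offered = True
                yield time_ms, "take_turn"
        was_overlap = overlap

# 300 ms of user speech, 600 ms of silence, 200 ms of system speech, then a barge-in.
frames = ([(True, False)] * 30 + [(False, False)] * 60
          + [(False, True)] * 20 + [(True, True)] * 5)
print(list(turn_taking_events(frames)))
```

The detector uses no prosodic or lexical cues at all, which is exactly the simplification the slide attributes to most applied systems.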

Page 31: Specialization Module

Thank you.

[email protected]

https://nats-www.informatik.uni-hamburg.de/SLP16

Universität Hamburg, Department of Informatics, Natural Language Systems Group

Page 32: Specialization Module

Further Reading

● Chain model of communication:

– M. Pétursson & J. Neppert (1996): Elementarbuch der Phonetik. Buske. StaBi: F Ling 062/6.

● Introduction to Dialogue and Linguistics:

– the relevant chapters in: Jurafsky and Martin (2009): Speech and Language Processing. Pearson International. InfBib: A JUR 4204x.

● Systems theoretic views on complex systems in general and on language in particular:

– Bertalanffy (1972): „The History and Status of General Systems Theory“. In: The Academy of Management Journal 15(4), pp. 407-426. via Google Scholar.

– Larsen-Freeman and Cameron (2008): Complex Systems and Applied Linguistics, Oxford University Press. StaBi: A 2009 / 7836.

Page 33: Specialization Module

Notes

Page 34: Specialization Module

Desired Learning Outcomes

● interaction management is a crucial aspect of dialogue

– in particular channel management in multiple ways

● turn-taking cannot easily be allocated to a „module“ but it emerges from the interaction

● prosody is a field of phenomena relevant in many linguistic layers

● students grasp the idea of emergence in complex systems and attraction as a principle to control such systems