Tutorial on Deep Probabilistic Generative Models for Robotics
Introduction
2020.10.25, IROS2020 on demand

Organized by
Takayuki Nagai, Osaka University
Tadahiro Taniguchi, Ritsumeikan University
Takato Horii, Osaka University
Chie Hieida, Nara Institute of Science and Technology
Kaede Hayashi, Ritsumeikan University

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
October 25-29, 2020, Las Vegas, NV, USA (Virtual)
978-1-7281-6211-9/20/$31.00 ©2020 IEEE

TS-2408 Tutorial Intro Video - PaperCept

Feb 10, 2022


Page 2: TS-2408 Tutorial Intro Video - PaperCept

Welcome!
• Introduction
  • What are Deep Probabilistic Generative Models (DPGMs)?
  • Why should we learn DPGMs?
  • What to learn?
• Theoretical side (2 talks)
  • Prof. Taniguchi
  • Dr. Okada
• Implementation side (2 talks)
  • Prof. Suzuki
  • Prof. Nakamura
• Information on this tutorial
  • Website, supplemental materials

Page 3: TS-2408 Tutorial Intro Video - PaperCept

from http://www.slideshare.net/issei_sato/deim2012-issei-sato

Probabilistic generative models

[Figure labels: number of data N, latent variable, generative process, observation]

Use these “generative models” for developing intelligent robots!

“Deep” Probabilistic Generative Models (DPGMs)

[Figure labels: inference network, inference model, generative model]

Probabilistic Generative Models: APPLICATIONS, THEORIES, TOOLS

Page 4: TS-2408 Tutorial Intro Video - PaperCept

We are interested in cognitive robotics

Constructive approach
• Constructive Human Science
  ⇒ Construct to know mechanisms behind our mind
  ⇒ Construct to use it for specific applications

[Figure labels: Science, Comparison, Decomposition, Construction, Constructive approach, Hypothesis, Algorithm, Evaluation, Engineering, Applications (service robots, industrial robots, elderly care, child care)]

Page 5: TS-2408 Tutorial Intro Video - PaperCept

Domestic service applications

Page 6: TS-2408 Tutorial Intro Video - PaperCept

What is the problem?
• Problem of building general intelligence
  • What is the essence of intelligence based on its own body?
  • We want to build a computational model and implement it

[Figure labels: $X$: image; $X$: speech signal (“Take the plastic bottle”); $Y$: joint torque; model $p(Y \mid X)$]

The ability required of the robot is to predict the (continuous) pattern $Y$ that generates its own action from the (continuous) input pattern $X$.

Page 7: TS-2408 Tutorial Intro Video - PaperCept

Approaches

Goal: $p(Y \mid X)$

$$p(Y \mid X) = \sum_z p(Y, z \mid X) = \sum_z p(Y \mid z)\, p(z \mid X)$$

$$p(Y \mid X) \approx p(Y \mid \hat{z}), \qquad \hat{z} = \arg\max_z p(z \mid X)$$

$$\hat{z} = \arg\max_z \frac{p(X \mid z)\, p(z)}{p(X)}$$

Pipeline (pattern recognition)
• Problem of finding the label $z$ corresponding to $X$
• $z$ is defined by hand ⇒ requires labeled data
• Training the classifier $p(X \mid z)$
• When $z$ is selected by the classifier, an action must be selected according to the result
• Hard to design $p(Y \mid z)$

Supervised end-to-end learning
• Maps $X$ directly to $Y$

Bayesian model
• $z$ is a latent variable (concept), which is acquired by the robot itself ⇒ unsupervised learning
• $X$: image, $Y$: joint torque, $z$: concept

[Figure labels: Neural Nets, AI and/or Robot, action, Rec./Rep.]
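The difference between summing over $z$ and committing to a single $\hat{z}$ can be seen on a toy discrete problem. The sketch below is illustrative only: the probability tables, the three concepts, and the two actions are made-up assumptions, not values from the tutorial.

```python
# Toy discrete example: 3 concepts z, 2 actions Y. All numbers are hypothetical.
p_z_given_x = [0.5, 0.3, 0.2]   # p(z | X) for one fixed input X
p_y_given_z = [                 # p(Y | z), one row per concept z
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.9],
]

def marginal_p_y(p_z, p_y_z):
    """Full marginalisation: p(Y|X) = sum_z p(Y|z) p(z|X)."""
    n_y = len(p_y_z[0])
    return [sum(p_z[k] * p_y_z[k][y] for k in range(len(p_z))) for y in range(n_y)]

def pipeline_p_y(p_z, p_y_z):
    """Pipeline approximation: pick z_hat = argmax_z p(z|X), then use p(Y|z_hat)."""
    z_hat = max(range(len(p_z)), key=lambda k: p_z[k])
    return p_y_z[z_hat]

p_marginal = marginal_p_y(p_z_given_x, p_y_given_z)  # approximately [0.53, 0.47]
p_pipeline = pipeline_p_y(p_z_given_x, p_y_given_z)  # [0.9, 0.1]
```

In this toy case the pipeline is far more confident about the first action than the full marginal, because it discards the probability mass on the two concepts that were not selected.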

Page 8: TS-2408 Tutorial Intro Video - PaperCept

What is understanding?
• Robot understanding of the real world
  • “Understanding”: prediction of unobservable information through concepts
  • “Meaning”: predicted contents
  • “Concept”: multimodal categorization generates concepts (categories)
• Symbol grounding

[Figure: concept space containing Concept #1 to Concept #4. Labels: soft (appearance), “stuffed toy” (phoneme seq.), inference, understanding, representation]

No shared ground truth
Everybody generates their own space
Communication solves the mismatch

Page 9: TS-2408 Tutorial Intro Video - PaperCept

Counter direction

• Not modeling $p(y \mid x)$ alone, but modeling joint probabilities

$$p(x, y, \cdots) = \sum_z p(x, y, \cdots \mid z)\, p(z)$$

[Figure: concept space containing Concept #1 to Concept #4. Labels: soft (appearance), “stuffed toy” (phoneme seq.), inference, understanding, representation]

No shared ground truth
Everybody generates their own space
Communication solves the mismatch

Page 10: TS-2408 Tutorial Intro Video - PaperCept

Multimodal Generative Models

Multimodal supervised learning
• $x$ (observations) → $z$ (latent variables) → $y$ (output)

Multimodal unsupervised learning
• $x$ and $y$ are both observations; $z$ is a latent variable
• Quantities shown: $p(x \mid y)$, $p(y \mid x)$, $p(x, y, \cdots)$

$$p(y \mid x) = \frac{p(x, y)}{p(x)}$$
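Once the joint distribution is modeled, either modality can be predicted from the other by conditioning. The sketch below illustrates this on binary toy variables. It assumes that $x$ and $y$ are conditionally independent given $z$; that assumption and all the numbers are illustrative, not taken from the tutorial.

```python
# Hypothetical toy model: 2 latent categories z, binary observations x and y.
p_z = [0.6, 0.4]                           # p(z)
p_x_given_z = [[0.8, 0.2], [0.3, 0.7]]     # p(x | z), rows indexed by z
p_y_given_z = [[0.9, 0.1], [0.25, 0.75]]   # p(y | z), rows indexed by z

def joint(x, y):
    """p(x, y) = sum_z p(x|z) p(y|z) p(z), assuming x and y are independent given z."""
    return sum(p_z[k] * p_x_given_z[k][x] * p_y_given_z[k][y] for k in range(len(p_z)))

def y_given_x(x):
    """p(y | x) = p(x, y) / p(x)."""
    p_x = sum(joint(x, y) for y in range(2))
    return [joint(x, y) / p_x for y in range(2)]

def x_given_y(y):
    """p(x | y) = p(x, y) / p(y): the counter direction, from the same joint."""
    p_y = sum(joint(x, y) for x in range(2))
    return [joint(x, y) / p_y for x in range(2)]

p_y_if_x0 = y_given_x(0)   # approximately [0.77, 0.23]
p_x_if_y0 = x_given_y(0)
```

Both conditionals come from the same three tables, which is the practical benefit of modeling the joint rather than a single direction.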

Page 11: TS-2408 Tutorial Intro Video - PaperCept

Basics of DPGMfR

1st tutorial talk
Prof. Tadahiro Taniguchi, Ritsumeikan University
Basics of Probabilistic Generative Models for Robotics

Page 12: TS-2408 Tutorial Intro Video - PaperCept

from http://www.slideshare.net/issei_sato/deim2012-issei-sato

Probabilistic generative model

[Figure labels: number of data N, latent variable, generative process]

Page 13: TS-2408 Tutorial Intro Video - PaperCept

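Multimodal generative model

[Figure: graphical model with observations $x$, $y$ and latent variable $z$, representing $p(x, y, \cdots)$. Other symbols shown: $\theta$, $\beta_1$, $\beta_2$, $\beta_k$]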

Page 14: TS-2408 Tutorial Intro Video - PaperCept

Multimodal categorization
Categorization of multimodal data

• Multimodal Latent Dirichlet Allocation (MLDA, MHDP, …) [Nakamura+ 09]

Inference of the parameters by Gibbs sampling

[Figure: graphical model over the vision, audition, tactile, and word modalities. Legend entries: Dirichlet prior, multinomial parameters, categories, multimodal information, multinomial parameters, Dirichlet prior. Symbols that survived extraction: $\pi^*$, $\beta^*$, $\theta$. Example object: “stuffed toy”, “soft”]

[Nakamura+ 09] Nakamura, T. et al., Grounding of word meanings in multimodal concepts using LDA, in Proc. IROS2009, pp. 3943–3948, 2009
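The generative side of an LDA-style multimodal model can be sketched in a few lines. The code below is an assumption-laden illustration, not the MLDA implementation of [Nakamura+ 09]: the function names, the hyperparameter values, and the use of only two modalities are hypothetical, and it covers sampling only, not Gibbs inference.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw one index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_object(alpha, beta, counts, rng):
    """Sample one object: category proportions theta, then features per modality.

    beta[m][k] is the feature distribution of category k in modality m;
    counts[m] is how many features to draw in modality m.
    """
    theta = sample_dirichlet(alpha, rng)
    observations = {}
    for modality, n in counts.items():
        features = []
        for _ in range(n):
            z = sample_categorical(theta, rng)                        # category for this feature
            features.append(sample_categorical(beta[modality][z], rng))
        observations[modality] = features
    return theta, observations

# Hypothetical parameters: 3 categories, a 3-symbol "vision" codebook, a 2-word vocabulary.
beta = {
    "vision": [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    "word": [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],
}
theta, obs = generate_object([1.0, 1.0, 1.0], beta, {"vision": 5, "word": 3}, random.Random(0))
```

Because every modality draws its categories from the same `theta`, the category structure is shared across vision and words, which is what makes cross-modal prediction possible after inference.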

Page 15: TS-2408 Tutorial Intro Video - PaperCept

Integrated cognitive model

• Building block (module): MLDA over the vision, audition, tactile, and word modalities
• Hierarchical connection of modules based on functions of the brain

[Figure labels: HMM, model-based planning, temporal learning, HMM, language learning, language area, Q-function, reinforcement learning, basal ganglia, corpus striatum, MLDA; node symbols z and w]

K. Miyazawa et al., “Integrated cognitive architecture for robot learning of action and language,” Frontiers in Robotics and AI, 2019

Page 16: TS-2408 Tutorial Intro Video - PaperCept

Integrated cognitive model w/ deep generative models

• Hierarchical connection of modules based on functions of the brain

[Figure labels: HMM, model-based planning, temporal learning, HMM, language learning, language area, Q-function, reinforcement learning, basal ganglia, corpus striatum, Deep mMLDA, LSTM (temporal learning), latent variables; node symbols z and w]

Page 17: TS-2408 Tutorial Intro Video - PaperCept

How does the robot use DPGMs?
• Planning/Control as probabilistic inference
• Relationship between DPGMs and MPC

[Figure labels: POMDP (World Model); Equivalent to; Complex cognitive model by DPGMs]

Planning/Control problems can be solved as probabilistic inference on the PGM

* Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
* Variational Inference MPC for Bayesian Model-based Reinforcement Learning
* PlaNet of the Bayesians: Reconsidering and Improving Deep Planning Network by Incorporating Bayesian Inference
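One simple way to see planning as inference is to treat sampled action sequences as draws from a prior and weight each by an "optimality" likelihood proportional to exp(reward / temperature). The sketch below is a minimal illustration under strong assumptions (a hypothetical 1-D system, a known dynamics model, a made-up reward). It is not the algorithm of any of the papers listed above.

```python
import math
import random

def rollout(actions, x0=0.0):
    """Hypothetical 1-D dynamics x_{t+1} = x_t + a_t; returns the final state."""
    x = x0
    for a in actions:
        x += a
    return x

def plan_by_inference(goal, horizon=3, n_samples=2000, temperature=0.1, seed=0):
    """Posterior mean of the action sequence given 'optimality', by importance sampling."""
    rng = random.Random(seed)
    # Prior over action sequences: independent standard normal actions.
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(horizon)] for _ in range(n_samples)]
    # Log-likelihood of optimality: reward / temperature, with reward = -(x_T - goal)^2.
    log_w = [-(rollout(a) - goal) ** 2 / temperature for a in candidates]
    max_log_w = max(log_w)                      # subtract the max for numerical stability
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    return [sum(w * a[t] for w, a in zip(weights, candidates)) / total
            for t in range(horizon)]

plan = plan_by_inference(goal=1.5)
final_state = rollout(plan)   # should land close to the goal
```

Lowering `temperature` concentrates the posterior on the highest-reward samples, while raising it keeps the plan closer to the prior.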

Page 18: TS-2408 Tutorial Intro Video - PaperCept

Planning/Control as inference

2nd tutorial talk
Dr. Masashi Okada, Panasonic Corp.
Theories of planning/control as probabilistic inference

Page 19: TS-2408 Tutorial Intro Video - PaperCept

Tools
• We need to implement very complex models in practice
• We have a very useful programming language for developing DPGMs!
  • Pixyz
• We have a framework for integrating multiple DPGMs (modules)
  • SERKET/Neuro-SERKET

* Nakamura T, Nagai T and Taniguchi T (2018) SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model. Front. Neurorobot. 12:25. doi: 10.3389/fnbot.2018.00025
* T. Taniguchi, T. Nakamura, M. Suzuki, R. Kuniyasu, K. Hayashi, A. Taniguchi, T. Horii, T. Nagai, Neuro-SERKET: Development of Integrative Cognitive System Through the Composition of Deep Probabilistic Generative Models, New Generation Computing. 38. 10.1007/s00354-019-00084-w

Page 20: TS-2408 Tutorial Intro Video - PaperCept

VAE (an example of DGMs)

Loss function: ELBO

[Figure labels: inference model, generative model]

This slide was provided by Dr. Suzuki
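For reference, the ELBO is the expected reconstruction log-likelihood under the inference model minus the KL divergence from the inference model to the prior. The sketch below estimates it by Monte Carlo for a diagonal Gaussian inference model and a standard normal prior. The unit-variance Gaussian likelihood and the `toy_decoder` function are illustrative assumptions, not the model from the slide.

```python
import math
import random

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def log_likelihood(x, x_mean):
    """log p(x | z) for a unit-variance Gaussian centred on the decoder output."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (a - b) ** 2
               for a, b in zip(x, x_mean))

def elbo_estimate(x, mu, log_var, decode, rng, n_samples=10):
    """Monte Carlo ELBO: E_q[log p(x|z)] - KL(q(z|x) || p(z))."""
    recon = 0.0
    for _ in range(n_samples):
        # Reparameterisation: z = mu + sigma * eps with eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        recon += log_likelihood(x, decode(z))
    return recon / n_samples - gaussian_kl(mu, log_var)

def toy_decoder(z):
    """Hypothetical decoder mapping a 1-D latent to a 2-D observation mean."""
    return [z[0], -z[0]]

elbo = elbo_estimate([1.0, -1.0], mu=[1.0], log_var=[0.0],
                     decode=toy_decoder, rng=random.Random(0))
```

In a real VAE, `mu` and `log_var` would be produced by the inference network from `x`, and the negative ELBO would be minimised by gradient descent.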

Page 21: TS-2408 Tutorial Intro Video - PaperCept

Multimodal deep generative models
• Encoder-decoder architecture is problematic in this case
  • Information cannot be predicted from the other input
• JMVAE [Suzuki+ 16]
• PoE [Wu+ 18]
• Use associator between Z [Jo+ 19]

[Figure labels: multi-modalities, shared representation, encoder, decoder]

This slide was provided by Dr. Suzuki
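A product of experts combines per-modality posteriors by multiplying their densities. For Gaussians this has a closed form: precisions add, and the mean is the precision-weighted average. The sketch below shows the one-dimensional case. Treating the experts as Gaussian and including a standard normal prior expert are assumptions made for illustration, and the numbers are made up.

```python
def product_of_gaussian_experts(means, variances):
    """Combine 1-D Gaussian experts: precisions add, mean is precision-weighted."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision

# Hypothetical experts: prior N(0, 1), vision N(2, 0.5), tactile N(1, 0.5).
prior = (0.0, 1.0)
vision = (2.0, 0.5)
tactile = (1.0, 0.5)

# Both modalities observed.
mean_both, var_both = product_of_gaussian_experts(
    [prior[0], vision[0], tactile[0]], [prior[1], vision[1], tactile[1]])

# Tactile missing: simply leave its expert out of the product.
mean_vision_only, var_vision_only = product_of_gaussian_experts(
    [prior[0], vision[0]], [prior[1], vision[1]])
```

Dropping an expert when its modality is missing is what lets the same shared representation be inferred from any subset of inputs.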

Page 22: TS-2408 Tutorial Intro Video - PaperCept

Pixyz: programming language for DPGMs

Loss function: ELBO

[Figure labels: inference model, generative model]

This slide was provided by Dr. Suzuki

3rd tutorial talk
Prof. Masahiro Suzuki, The University of Tokyo
Pixyz: a framework for developing complex deep generative models

Page 23: TS-2408 Tutorial Intro Video - PaperCept

Even more complex generative models
• Integration of modules
• Optimization as a whole

[Figure label: decomposition]

T. Nakamura, T. Nagai, T. Taniguchi, SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model, Front. Neurorobot., 26 June 2018

T. Taniguchi, T. Nakamura, M. Suzuki, R. Kuniyasu, K. Hayashi, A. Taniguchi, T. Horii, T. Nagai, Neuro-SERKET: Development of Integrative Cognitive System Through the Composition of Deep Probabilistic Generative Models, New Generation Computing. 38. 10.1007/s00354-019-00084-w

Page 24: TS-2408 Tutorial Intro Video - PaperCept

SERKET: integration of multiple models

4th tutorial talk
Prof. Tomoaki Nakamura, The University of Electro-Communications
A framework for constructing multimodal learning models: SERKET

Page 25: TS-2408 Tutorial Intro Video - PaperCept

Recap

• 4 tutorial talks
  • Theoretical side (2 talks)
    • by Prof. Taniguchi
    • by Dr. Okada
  • Implementation side (2 talks)
    • by Prof. Nakamura
    • by Prof. Suzuki
• Supplemental materials
  https://sites.google.com/view/dpgmfr/home
  • Slides, GitHub, sample code, papers, past workshops

Page 26: TS-2408 Tutorial Intro Video - PaperCept

This tutorial is presented by RSJ and JST CREST "Symbol Emergence in Robotics for Future Human-Machine Collaboration"

Enjoy!

Thanks for endorsing this tutorial!
• IEEE RAS TC on Robot Learning
• IEEE RAS TC on Cognitive Robotics
• IEEE CDS TC Task Force on Robotics