Post on 19-Aug-2020

Intelligent Perception

S. M. Ali Eslami

December 2016

[Figure: Underlying scene, Observation, "?"]

1. How should the scene be represented?

2. How should the representation be computed?

Learning paradigms

[Figure: Supervised Learning: x, z, y (label "horse")]

[Figure: Algorithm: Input, Preprocessing, Feature Extraction, Feature Selection, Learned Discrimination, Calibration, Output (Computer, Horse, Cow)]

[Figure: Algorithm: Input, Stage 1, Stage 2, Stage 3, Stage 4, Stage 5, Output (Computer, Horse, Cow)]

Introduction

● Optimize directly for the end loss

● End-to-end training, no engineered inputs

● With enough data, learn a big non-linear function

● Supervised labeling is often enough for transferable representations

● Large labeled dataset + big / deep neural network + GPUs
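As a minimal sketch of the bullets above, the following trains a tiny non-linear network end to end, with gradients of the final loss flowing to every parameter and no engineered features in between. The XOR data, the network size, the learning rate and the name `train_xor` are all illustrative assumptions, not taken from the talk.

```python
import math
import random


def train_xor(steps=2000, lr=0.5, hidden=4, seed=0):
    """Fit a tiny two-layer network to XOR by gradient descent on the end loss."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Parameters: input->hidden weights and biases, hidden->output weights and bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(data)
    losses = []
    for _ in range(steps):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in data:
            # Forward pass: tanh hidden layer, sigmoid output.
            h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
                 for j in range(hidden)]
            logit = sum(w2[j] * h[j] for j in range(hidden)) + b2
            p = 1.0 / (1.0 + math.exp(-logit))
            # End loss: binary cross-entropy between prediction and label.
            loss += -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
            # Backward pass: the loss gradient reaches every parameter.
            d_logit = p - y
            gb2 += d_logit
            for j in range(hidden):
                gw2[j] += d_logit * h[j]
                d_h = d_logit * w2[j] * (1 - h[j] ** 2)
                gb1[j] += d_h
                gw1[j][0] += d_h * x[0]
                gw1[j][1] += d_h * x[1]
        losses.append(loss / n)
        # Gradient-descent step on the batch-averaged gradients.
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            w1[j][0] -= lr * gw1[j][0] / n
            w1[j][1] -= lr * gw1[j][1] / n
        b2 -= lr * gb2 / n
    return losses


if __name__ == "__main__":
    history = train_xor()
    print("first loss:", history[0], "last loss:", history[-1])
```

The same loop scales in principle to the large networks the slide refers to; in practice an automatic-differentiation library and GPUs replace the hand-written backward pass.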

Deep Supervised Learning

Clarifai (2014)

Deep Supervised Learning

Text Classification: Zhang et al. (2015)

Video Classification: Simonyan et al. (2014)

Deep Supervised Learning

● Innovation continues
  ○ Inception (Szegedy et al., 2015)
  ○ Residual connections (He et al., 2015)
  ○ Batchnorm (Ioffe et al., 2015)

● Performance is continuously improving

Szegedy et al. (2015)
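Two of the innovations listed above are simple enough to sketch in a few lines. This is a simplified illustration, not the published layers: `batch_norm` omits the learned scale and shift as well as the running statistics used at test time, and `residual` wraps an arbitrary function `f` that stands in for a stack of learned layers. Both function names are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalise each feature to zero mean and unit variance across the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [sum((row[d] - means[d]) ** 2 for row in batch) / n
                 for d in range(dims)]
    return [[(row[d] - means[d]) / math.sqrt(variances[d] + eps)
             for d in range(dims)]
            for row in batch]


def residual(f, x):
    """Residual connection: the output is the input plus a correction f(x)."""
    fx = f(x)
    return [xi + fi for xi, fi in zip(x, fx)]


if __name__ == "__main__":
    print(batch_norm([[1.0, 10.0], [3.0, 30.0]]))
    print(residual(lambda v: [0.1 * vi for vi in v], [1.0, 2.0]))
```

With a residual connection, a block whose `f` outputs zeros reduces to the identity, which is one common intuition for why very deep stacks of such blocks remain trainable.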

Where does the data come from?

What is the correct representation?

Learning paradigms

[Figure: Supervised Learning: x, z, y (label "horse"). Reinforcement Learning: x, z, a (action "left"), env.]

Human-level control in ATARI

End-to-end reinforcement learning

Mnih et al. (2015)
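Mnih et al. (2015) train a deep network with Q-learning. The update at its core can be sketched in tabular form, with a dictionary in place of the network. The state names, the action names and the function `q_update` below are illustrative assumptions, not details from the talk.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99, terminal=False):
    """Apply one Q-learning update to the table q and return the new value.

    q maps (state, action) pairs to estimated returns; missing entries count as 0.
    """
    # Value of the best action available from the next state (0 if the episode ended).
    best_next = 0.0 if terminal else max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    # Move the current estimate a fraction alpha towards the bootstrapped target.
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


if __name__ == "__main__":
    table = {}
    print(q_update(table, "s0", "left", 1.0, "s1", ["left", "right"],
                   alpha=0.5, gamma=0.9))
```

Every update consumes one transition of experience, which is one way to read the question on the next slide about how much experience is really needed.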

How much experience do we really need?

Learning paradigms

[Figure: Supervised Learning: x, z, y (label "horse"). Reinforcement Learning: x, z, a (action "left"), env. Generative Modelling: x, z, Decoder, x, y. Further example labels: (2.3, -1, 0.5, 3), "not blinking".]

Highly structured

General Purpose Graphics Programming

Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, and Joshua B. Tenenbaum (2013)

Partially structured

A Stochastic Grammar of Images

Song-Chun Zhu and David Mumford (2007)

Partially structured

S. M. Ali Eslami and Christopher K. I. Williams (2012)

Fully unstructured

Geoffrey Hinton (2006); Antti Rasmus et al. (2016); Jeff Donahue et al. (2016)

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey Hinton

Neural Information Processing Systems (NIPS), 2016

Motivation

● To obtain object-based representations

● To learn from orders-of-magnitude less data

[Figure: Model with Image (x) and Cause (z). A single z is labelled "blue brick" for one image and "pile of bricks" for another.]

Not sufficient for:

● grasping

● counting

● transfer

● generalisation

[Figure: Model with Image (x) and Cause. Left: x with a single z ("pile of bricks"). Right: x with z1 and z2 ("blue brick", "red brick").]

[Figure: Model with Image (x) and Cause. Each object i has latents zwhat_i and zwhere_i and an attention output yatt_i (shown for i = 1, 2). Labels: "pile of bricks"; "blue brick", "red brick"; "blue brick above", "red brick below".]

[Figure: Model and Inference Network. Left: Decoder with x, y and a single z. Right: Decoder with x, y, hidden states h1 h2 h3 and latents z1 z2 z3.]

[Figure: Model and Inference Network. Decoder with x, y and hidden states h1 h2 h3. Each latent z_i (i = 1, 2, 3) splits into zpres_i, zwhat_i and zwhere_i; attention outputs yatt_1 and yatt_2.]


focus on representation not reconstruction

output is a set: order? count?

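[Figure: inference step i with x, y, hidden state h_i, latents zpres_i, zwhere_i and zwhat_i, a VAE with xatt_i and yatt_i, and output y_i; the step repeats (...).]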
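The latent structure named in the figures (zpres, zwhat, zwhere per object) can be illustrated with a toy generative process. This is a sketch under loud assumptions and not the model from the paper: zwhat is reduced to a single intensity instead of a learned code, zwhere to an integer position instead of a continuous transformation, and the decoder to pasting a constant patch. The name `sample_scene` and all parameter values are hypothetical.

```python
import random


def sample_scene(max_objects=3, p_present=0.6, canvas_size=8, patch_size=2, seed=None):
    """Sample a variable-length list of objects and render them onto a canvas."""
    rng = random.Random(seed)
    canvas = [[0.0] * canvas_size for _ in range(canvas_size)]
    objects = []
    for _ in range(max_objects):
        # z_pres: decide whether another object is present; stop at the first "no".
        if rng.random() >= p_present:
            break
        # z_what: stand-in appearance code (here just a pixel intensity).
        z_what = rng.random()
        # z_where: stand-in pose (here the top-left corner of the patch).
        limit = canvas_size - patch_size + 1
        z_where = (rng.randrange(limit), rng.randrange(limit))
        # "Decode" the object and add it to the canvas at its position.
        row0, col0 = z_where
        for r in range(patch_size):
            for c in range(patch_size):
                canvas[row0 + r][col0 + c] += z_what
        objects.append({"z_what": z_what, "z_where": z_where})
    return objects, canvas


if __name__ == "__main__":
    objs, image = sample_scene(seed=1)
    print(len(objs), "object(s)")
    for row in image:
        print(" ".join(f"{v:.1f}" for v in row))
```

The returned list has a different length from sample to sample, so the representation carries the object count directly. Inference runs the other way, recovering such a list from the image alone.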

Omniglot

Representational power

[Figure: Sum? (6, 9). Increasing order? (no, yes).]

Additional structure

[Figure: x and z. Learned: z is a distributed vector that correlates with "blue brick". Specified: class=brick, colour=blue, position=P, rotation=R.]

[Figure: Decoder with x, y, hidden states h1 h2 h3 and latents z1 z2 z3 (specified).]

Inverse graphics

Policy learning

[Figure labels: Table-top, MNIST]

Unsupervised Learning of 3D Structure from Images

Danilo Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess

Neural Information Processing Systems (NIPS), 2016

Motivation

● To recover 3D structure from 2D images

● To form stable representations, regardless of camera position

● Inherently ill-posed
  ○ All objects appear under self-occlusion, infinite explanations
  ○ Therefore build statistical models to know what’s likely and what’s not

● Even with models, inference is intractable
  ○ Important to capture multi-modal explanations

● How are 3D scenes best represented?
  ○ Meshes or voxels?

● Where is training data collected from?

Projection operators
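The transcript does not say which projection operators the paper uses. As one simple possibility, assumed here purely for illustration, a voxel grid can be projected to an image by taking the maximum occupancy along the depth axis. The function name `project_max` and the tiny volumes are hypothetical.

```python
def project_max(voxels):
    """Orthographic projection of a voxel grid indexed as voxels[depth][row][col].

    Each image pixel is the maximum occupancy found along the depth axis.
    """
    depth = len(voxels)
    rows = len(voxels[0])
    cols = len(voxels[0][0])
    return [[max(voxels[d][r][c] for d in range(depth)) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    # Two different volumes: one occupied voxel at the front, or the same voxel at the back.
    near = [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
    far = [[[0, 0], [0, 0]], [[1, 0], [0, 0]]]
    print(project_max(near))
    print(project_max(far))
```

Both volumes produce the same image, a small instance of the ill-posedness noted under Motivation: the image alone cannot tell them apart, so a model of which volumes are likely is needed.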

Unconditional samples

Class-conditional samples

Multi-modality of inference

3D structure from multiple 2D images

Inferring object meshes

Recap

● Deep Supervised Learning

● Deep Reinforcement Learning

● Model-based Methods

● Structured / Unstructured Generative Models

aeslami@google.com

arkitus.com
