
From Simple Innate Biases to Complex Visual Concepts · The following may not correspond to a particular course on MIT OpenCourseWare, but has been provided by the author

Jul 22, 2020


Page 1

From simple innate biases to complex visual concepts

This image is in the public domain.

Page 2

© Reuters. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 3

How it all starts

• Start without world knowledge

• Watch many movies of the world

• Develop representations of various concepts

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Image removed due to copyright restrictions. Please see the video.

Page 4

Hands Gaze

Difficult, appear early, important for subsequent learning of agents, goals, interactions,

© Harry L Anthony. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

© ciifka at Flickr.com. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 5

Hands and body parts are important

• Action recognition

• Gesture and communication

• Agent interactions

© Somesai via Flickr.com. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 6

Hands are difficult

Van Gogh Kirchner

Multiple appearances

Small and inconspicuous

This image is in the public domain.

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

© Joe Amaro. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

© Ernst Kerchner. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 7

Difficult to extract in unsupervised schemes

Informative fragments from people / no-people

Unsupervised Deep Learning

‘The problem of recovering human body configurations in a general setting is arguably the most difficult recognition problem in computer vision’

Mori, Malik, CVPR 2004

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Figure removed due to copyright restrictions. Please see the video.

Page 8

Building High-level Features Using Large Scale Unsupervised Learning

Ng et al., Stanford and Google, ICML 2012

1B connections, 10M YouTube images, 1000 machines, 16,000 cores, 3 days

Some statistically significant structures emerge with large data

Unsupervised learning does not discover hands

Figure removed due to copyright restrictions. Please see the video. Source: Le, Quoc V. "Building high-level features using large scale unsupervised learning." In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8595-8598. IEEE, 2013.

Page 9

In humans: Selectivity to hands appears early in infancy

Using a Head Camera to Study Visual Experience.

‘Overall…hands were in view and dynamically acting on an object in over 80% of the frames’.

Yoshida & Smith 2008

What makes hands learnable by humans?

© Wiley. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. Source: Yoshida, Hanako, and Linda B. Smith. "What's in view for toddlers? Using a head camera to study visual experience." Infancy 13, no. 3 (2008): 229-248.

Page 10

Motion, hand as ‘mover’ (7-month-olds)

See: Saxe & Carey, The perception of causality in infancy, Acta Psychologica, 2006

© fotosearch. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 11

Early sensitivity to special motion types

• High sensitivity to motion in general

(detecting motion, motion segmentation, tracking)

• Specific sub-classes of motion: self-motion, passive, and ‘mover’

A specific motion event is highly indicative of hands

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 12

Detecting ‘Mover’ Events

A moving image region causing a stationary region to move or change after contact.

Simple and primitive, prior to objects or figure-ground segmentation
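
The definition above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the detector used by Ullman, Harari, and Dorfman: the grid-of-cells frame representation, the intensity threshold, and the names `changed_cells` and `find_mover_events` are all invented for the example. Contact is approximated as motion arriving from a neighbouring cell, and the mover "leaving" as the cell changing on at least two consecutive frame transitions before it settles.

```python
from typing import List, Tuple

Frame = List[List[int]]  # assumed representation: one mean intensity per grid cell


def changed_cells(prev: Frame, curr: Frame, thresh: int = 10) -> List[List[bool]]:
    """Mark cells whose intensity changed by more than `thresh` between two frames."""
    return [[abs(c - p) > thresh for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def find_mover_events(frames: List[Frame], thresh: int = 10) -> List[Tuple[int, int, int]]:
    """Return (t, row, col) for cells that were still, were entered by motion
    from a neighbouring cell, and look different once the motion has left."""
    # motion[i] describes the change from frames[i] to frames[i + 1]
    motion = [changed_cells(a, b, thresh) for a, b in zip(frames, frames[1:])]
    rows, cols = len(frames[0]), len(frames[0][0])
    events = []
    for t in range(1, len(motion)):
        for r in range(rows):
            for c in range(cols):
                # the cell must be stationary first and then start changing
                if motion[t - 1][r][c] or not motion[t][r][c]:
                    continue
                neighbours = [(rr, cc)
                              for rr in range(max(0, r - 1), min(rows, r + 2))
                              for cc in range(max(0, c - 1), min(cols, c + 2))
                              if (rr, cc) != (r, c)]
                # a moving region must have been next door just before contact
                if not any(motion[t - 1][rr][cc] for rr, cc in neighbours):
                    continue
                end = t
                while end < len(motion) and motion[end][r][c]:
                    end += 1
                # skip cells that never settle, or where the mover arrived and stayed
                if end == len(motion) or end - t < 2:
                    continue
                # compare the cell before contact with the cell after the mover left
                if abs(frames[end][r][c] - frames[t][r][c]) > thresh:
                    events.append((t, r, c))
    return events


# A "hand" (100) moves right, reaches an "object" (50) and carries it away.
pickup = [[[100, 0, 50]], [[0, 100, 50]], [[0, 0, 100]],
          [[0, 100, 0]], [[100, 0, 0]], [[100, 0, 0]]]
print(find_mover_events(pickup))  # [(1, 0, 2)]
```

The sketch needs no object or figure-ground segmentation, only per-cell change over time, which is the sense in which the slide calls the cue simple and primitive. A hand that sweeps through an empty cell and leaves it unchanged produces no event.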

Courtesy of National Academy of Sciences, U. S. A. Used with permission. Source: Ullman, Shimon, Daniel Harari, and Nimrod Dorfman. "From simple innate biases to complex visual concepts." Proceedings of the National Academy of Sciences 109, no. 44 (2012): 18215-18220. Copyright © 2012 National Academy of Sciences, U.S.A.

Page 13

Movers detection

‘Mover’ as an innate teaching signal for hand

Motion alone is insufficient

Page 14

‘Mover’ events extracted from videos

High fraction of hand images (90% recall, 65% precision)

Internal supervision by movers and by tracking
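
For reference, the two figures quoted above are ratios of counts. The sketch below shows the arithmetic; the counts are invented solely to reproduce the quoted rates and are not taken from the paper, and `precision_recall` is a hypothetical helper name.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts, chosen only so that the rates match the slide.
precision, recall = precision_recall(true_pos=117, false_pos=63, false_neg=13)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.65 recall=0.90
```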

Courtesy of National Academy of Sciences, U. S. A. Used with permission. Source: Ullman, Shimon, Daniel Harari, and Nimrod Dorfman. "From simple innate biases to complex visual concepts." Proceedings of the National Academy of Sciences 109, no. 44 (2012): 18215-18220. Copyright © 2012 National Academy of Sciences, U.S.A.

Page 15

Training Videos

Movies of scenes, people moving, manipulating objects, moving hands.

‘Mover’ events are detected in all movies and used for training

Page 16

Hand detection in still images

Detection mainly of hands in object manipulation scenes

© Proceedings of the National Academy of Sciences. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. Source: Ullman, Shimon, Daniel Harari, and Nimrod Dorfman. "From simple innate biases to complex visual concepts." Proceedings of the National Academy of Sciences 109, no. 44 (2012): 18215-18220.

Page 17

Continued learning

• Two detection algorithms:

• Hands by their appearance

• Hands by the body context

Figure removed due to copyright restrictions. Please see the video. Source: Karlinsky, Leonid, Michael Dinerstein, Daniel Harari, and Shimon Ullman. "The chains model for detecting parts by their context." In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 25-32. IEEE, 2010.

Page 18

Hand by Surrounding Context

Face Shoulder Upper-arm Lower-arm Hand

Amano, Kezuka, Yamamoto 2004

Slaughter, Heron-Delaney 2010

Slaughter, Neary 2011

Page 19

Co-training

Appearance Pose

Two supervised classifiers

Internal co-supervision
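
A minimal co-training loop, assuming each example has two views (an appearance score and a context or pose score) and that a few seed labels are available. The 1-D nearest-centroid classifier, the confidence margin, and the names `centroid_classifier` and `co_train` are assumptions made for illustration; the slides do not specify the classifiers used.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, float]  # (appearance score, context score): two views of one patch


def centroid_classifier(values: List[float], labels: List[int]) -> Callable[[float], Tuple[int, float]]:
    """Fit a 1-D nearest-centroid classifier; predict(x) returns (label, margin)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)

    def predict(x: float) -> Tuple[int, float]:
        d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
        return (1 if d_pos < d_neg else 0), abs(d_pos - d_neg)

    return predict


def co_train(labeled: List[Tuple[Example, int]], unlabeled: List[Example],
             rounds: int = 3, margin: float = 2.0):
    """Alternate between the two views; each view's confident predictions
    become training labels that the other view also learns from."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled, pool
            clf = centroid_classifier([ex[view] for ex, _ in labeled],
                                      [y for _, y in labeled])
            remaining = []
            for ex in pool:
                label, confidence = clf(ex[view])
                if confidence >= margin:
                    labeled.append((ex, label))  # internal supervision
                else:
                    remaining.append(ex)
            pool = remaining
    return labeled, pool


seeds = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
unlabeled = [(9.0, 5.2), (1.0, 4.8), (5.1, 9.0), (4.9, 1.0)]
labeled, leftover = co_train(seeds, unlabeled)
```

In this toy run the appearance view is unsure about the last two examples, but the context view, retrained on the examples the appearance view labeled, classifies them confidently.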

Page 20

The chains computation:

[Figure: Chains model]

© The Weizmann Institute. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 21

[Figure: panels (a), (c), (d) Appearance, (e) Context]

Courtesy of National Academy of Sciences, U. S. A. Used with permission. Source: Ullman, Shimon, Daniel Harari, and Nimrod Dorfman. "From simple innate biases to complex visual concepts." Proceedings of the National Academy of Sciences 109, no. 44 (2012): 18215-18220. Copyright © 2012 National Academy of Sciences, U.S.A.

Page 22

Own Hands

(A) (B)

Yoshida & Smith

A learned class, not the basis of hands in general

Caregiver’s hands

© Wiley. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. Source: Yoshida, Hanako, and Linda B. Smith. "What's in view for toddlers? Using a head camera to study visual experience." Infancy 13, no. 3 (2008): 229-248.

Page 23

Own Hands

[Figure: three precision-recall plots (Recall on the x-axis, Precision on the y-axis, both from 0 to 1). Labels: Own hands, Movers; panels (A), (B); Manipulating, Freely moving]

Page 24

Gaze

• Infants follow the gaze of others

• Starts at 3-6 months and continues to develop

• Head orientation first, eye cues later

• Important in the development of communication and language

• Modeling mainly head direction

© ciifka at Flickr.com. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 25

Wollaston 1824

W.H. Wollaston, “On the Apparent Direction of Eyes in a Portrait,” Philosophical Trans. Royal Soc. of London, 1824.

This image is in the public domain.

Page 26

Gaze cues are subtle and inconspicuous

Page 27

Mover supplies the teaching signal

Page 28

Using hand ‘mover’ events to learn gaze direction
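
One way to read this slide: when a mover event occurs, the person causing it is usually looking at the contact point, so the direction from the face to the event can serve as a free training label for gaze. The sketch below assumes face and event positions are already available as image coordinates. The nearest-neighbour predictor is a stand-in chosen for brevity, not the model from the talk, and `gaze_label` and `predict_gaze` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def gaze_label(face: Point, mover: Point) -> float:
    """Angle in degrees, in [0, 360), of the line from the face to the mover event."""
    dx, dy = mover[0] - face[0], mover[1] - face[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def predict_gaze(descriptor: Sequence[float],
                 training: List[Tuple[Sequence[float], float]]) -> float:
    """Return the gaze label of the training face descriptor closest to `descriptor`."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(training, key=lambda pair: squared_distance(pair[0], descriptor))
    return nearest[1]


# Each training pair: (face descriptor seen at a mover event, label derived from that event).
training = [([1.0, 0.0], gaze_label((0, 0), (1, 0))),
            ([0.0, 1.0], gaze_label((0, 0), (0, 1)))]
print(predict_gaze([0.9, 0.2], training))
```

No external annotation is involved: every label comes from the mover detector, which is the sense in which the mover supplies the teaching signal.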

Page 29

HoG description
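
As a reminder of what a HoG (histogram of oriented gradients) descriptor records, here is a simplified single-cell version. It omits the block normalisation and bin interpolation of full HoG implementations, and the function name `hog_cell` and the 9-bin default are assumptions for the example rather than settings taken from the talk.

```python
import math
from typing import List


def hog_cell(patch: List[List[float]], bins: int = 9) -> List[float]:
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell.
    Each interior pixel votes with its gradient magnitude; border pixels are skipped."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / bin_width), bins - 1)] += magnitude
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0  # L2 normalisation
    return [h / norm for h in hist]


# A vertical edge: intensity rises from left to right, so all votes land in bin 0.
vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(hog_cell(vertical_edge))
```

A face descriptor of this kind would concatenate such histograms over a grid of cells covering the face region.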

Page 30

Gaze extraction 2D

Training Testing

Humans

Model

Page 31

Gaze results: 700 test images, 8 people, leave-one-out
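
Assuming "leave-one-out" here means holding out one person at a time (the slide does not say so explicitly), the evaluation splits could be generated as follows. `leave_one_person_out` is a hypothetical helper, and the sample data is made up.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, str]  # (person id, image id); both are placeholder strings


def leave_one_person_out(samples: List[Sample]) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_person, train, test) once per person, so that the
    model is never tested on someone it saw during training."""
    people = sorted({person for person, _ in samples})
    for held_out in people:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Made-up data: 8 people with 2 images each.
samples = [(f"person{p}", f"img{p}_{i}") for p in range(8) for i in range(2)]
folds = list(leave_one_person_out(samples))
print(len(folds))  # 8
```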

Page 32

Emerging Interpretation

Both agents are manipulating objects; the one on the left is interested in the other’s object

© Shutterstock. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 33

Mover events – Hands – Gaze – Word reference

When infants hear ‘He was mooping him’, they look in the speaker’s gaze direction and use it to interpret the new verb.

Nappa et al 2009

Internal supervision

Learning ‘trajectories’

© Psychology Press. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. Source: Nappa, Rebecca, Allison Wessel, Katherine L. McEldoon, Lila R. Gleitman, and John C. Trueswell. "Use of speaker's gaze and syntax in verb learning." Language Learning and Development 5, no. 4 (2009): 203-234.

Page 34

Innate capacities

Mover

Tracking

Mover-to-gaze

Co-training

…..

Concepts

Hand – appearance

Hand – context

Gaze

Nouns, verbs

‘Digital Baby’

Figure removed due to copyright restrictions. Please see the video. Source: Karlinsky, Leonid, Michael Dinerstein, Daniel Harari, and Shimon Ullman. "The chains model for detecting parts by their context." In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 25-32. IEEE, 2010.

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 35

Rational imitation in preverbal infants

Gyorgy Gergely, Harold Bekkering, Ildiko Kiraly, Nature 415, 2002

Reprinted by permission from Macmillan Publishers Ltd: Nature. Source: Gergely, György, Harold Bekkering, and Ildikó Király. "Developmental psychology: Rational imitation in preverbal infants." Nature 415, no. 6873 (2002): 755. © 2002.

Page 36

Learning and innate structures

• Complex concepts are neither learned on their own nor innate

• Domain-specific innate structures

• Not full solutions, but proto-concepts and strategies

• Not hands, but movers etc.

• Guide the system to develop meaningful representations

• Provide internal supervision

• ‘Learning trajectories’: mover – hand – gaze – reference

• Can extract meaningful concepts even when they are non-salient in the input

• From cognition to AI: incorporate similar structures in computational systems

Page 37

MIT OpenCourseWare https://ocw.mit.edu

Resource: Brains, Minds and Machines Summer Course

Tomaso Poggio and Gabriel Kreiman

The following may not correspond to a particular course on MIT OpenCourseWare, but has been provided by the author as an individual learning resource.

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.