Egocentric Future Localization Hyun Soo Park, Jyh-Jing Hwang, Yedong Niu, and Jianbo Shi

Egocentric Future Localization -  · Training . Road . Hand . Road . 1. Geometric inconsistency . ... • 26 scenes (13 indoor, 13 outdoor) • 65.5k frames (9.1 hours) Dataset summary

Dec 07, 2018

Page 1

Egocentric Future Localization

Hyun Soo Park, Jyh-Jing Hwang, Yedong Niu, and Jianbo Shi

Page 2

VIRAT Dataset

Page 3

VIRAT Dataset

Where will he go?

Page 4

VIRAT Dataset Kitani et al. ECCV12

Where will he go?

Page 5

VIRAT Dataset Kitani et al. ECCV12

Where will he go?

Page 6

VIRAT Dataset Kitani et al. ECCV12

Where will he go? If I were him, how would I move into the scene?

Page 7

What is he experiencing visually?

Page 8

First person view

Page 9

RGBD First person view

Page 10

First person view

Page 11

Occlusion

Occlusion

First person view

Page 12

First person view

Future egomotion

Page 13

Future localization

Page 14

Future localization

Page 15

Future localization

Page 16

Why challenging?

Page 17

Training

Why challenging?

Page 18

Training

Why challenging? Prediction

Page 19

Why challenging?

Training (image labels: Wall, Cars, Tree, Road, Hand)

Prediction (image labels: Wall, Cars, Tree, Road, Hand)

Page 20

Why challenging?

1. Geometric inconsistency

Training (image labels: Wall, Cars, Tree, Road, Hand)

Prediction (image labels: Wall, Cars, Tree, Road, Hand)

Page 21

Why challenging?

1. Geometric inconsistency

Training (image labels: Wall, Cars, Tree, Road, Hand)

Prediction (image labels: Wall, Cars, Tree, Road, Hand)

Vanishing line

Looking down Looking forward

Page 22

Why challenging?

1. Geometric inconsistency

2. Semantic inconsistency

Training (image labels: Wall, Cars, Tree, Road, Hand)

Prediction (image labels: Wall, Cars, Tree, Road, Hand)

Vanishing line

Looking down Looking forward

Page 23

Why challenging?

1. Geometric inconsistency

2. Semantic inconsistency

EgoRetinal representation

Preference learning

Training (image labels: Wall, Cars, Tree, Road, Hand)

Prediction (image labels: Wall, Cars, Tree, Road, Hand)

Vanishing line

Looking down Looking forward

Page 24

Image space

Page 25

n

Ground plane

Prediction: Configuration space (ground plane)

Image space

Page 26

n

Ground plane

Prediction: Configuration space (ground plane)

Image space

$x_t$

Predicted trajectory: $(x_1, \dots, x_F) = g(\cdot)$

Page 27

n

Ground plane

Prediction: Configuration space (ground plane)

Image space

$x_t$

Predicted trajectory: $(x_1, \dots, x_F) = g(\cdot)$

Page 28

$(x_1, \dots, x_F) = g(\cdot)$

n

Ground plane

$x_t$

$g(f(\cdot))$

Prediction: Configuration space (ground plane)

Projection to cfg. space Image space

Page 29

$(x_1, \dots, x_F) = g(\cdot)$

$x_t$

$g(f(\cdot))$

Prediction: Configuration space (ground plane)

Projection to cfg. space

n

Ground plane

Image space

Head orientation invariant

Page 30

$(x_1, \dots, x_F) = g(f(\cdot))$

n

Ground plane

Image space

$(r, \theta)$

$f(r, \theta)$

Page 31

$(x_1, \dots, x_F) = g(f(\cdot))$

n

Ground plane

Image space

$(r, \theta)$

$f(r, \theta) = \mathrm{proj}(\cdot)$, with arguments built from $u$, $n$, $r$, and $\theta$

Image projection
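
One way to picture the image projection step is to place the polar point $(r, \theta)$ on the ground plane and push it through a pinhole camera. The sketch below is only an illustration under stated assumptions, not the formula on the slide: the pinhole model, the camera frame convention (x right, y down, z forward), and the names `ground_to_pixel`, `height`, `focal`, `cx`, and `cy` are all introduced here for the example.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)

def ground_to_pixel(r, theta, n, height, focal, cx, cy):
    """Project the ground-plane point at polar (r, theta) into the image.

    Assumptions (not from the slides): pinhole camera, camera frame with
    x right / y down / z forward, `n` is the ground normal in that frame
    pointing up towards the camera, `height` is the camera height above
    the ground, and the optical axis is not parallel to `n`.
    """
    n = _unit(n)
    optical_axis = (0.0, 0.0, 1.0)
    # Heading on the ground: optical axis with its component along n removed.
    along_n = _dot(optical_axis, n)
    forward = _unit(tuple(z - along_n * k for z, k in zip(optical_axis, n)))
    right = _cross(forward, n)
    # Start at the point on the ground directly below the camera, then move
    # r metres in direction theta (0 = straight ahead, positive = right).
    point = tuple(-height * k + r * (math.cos(theta) * f + math.sin(theta) * s)
                  for k, f, s in zip(n, forward, right))
    if point[2] <= 0.0:
        return None  # behind the camera, not visible
    return (cx + focal * point[0] / point[2],
            cy + focal * point[1] / point[2])
```

Because the point is defined relative to the ground plane rather than the image, changing `n` (for example, when the wearer looks down) moves the pixel but not the ground location it refers to.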

Page 32

$(x_1, \dots, x_F) = g(f(\cdot))$

n

Ground plane

Image space

$(r, \theta)$

$f(r, \theta) = \mathrm{proj}(\cdot)$, with arguments built from $u$, $n$, $r$, and $\theta$

$u$, $u^{\mathsf{T}} n$

Height of occluding object

Page 33

n

Ground plane

$\Delta \log r \propto \frac{1}{D}$, where $D$ is depth.

Retinal representation

Cf. proxemics

Page 34

n

Ground plane

$\Delta \log r \propto \frac{1}{D}$, where $D$ is depth.

Retinal representation

Cf. proxemics

Persistent to 2D and 3D distance

Page 35

n

Ground plane

$\Delta \log r \propto \frac{1}{D}$, where $D$ is depth.

Retinal representation Persistent to 2D and 3D distance

3D distance

Page 36

n

Ground plane

$\Delta \log r \propto \frac{1}{D}$, where $D$ is depth.

Retinal representation Persistent to 2D and 3D distance

3D distance

log p

Page 37

EgoRetinal image projection EgoRetinal RGB

Page 38

EgoRetinal image projection EgoRetinal RGB EgoRetinal depth

Page 39

EgoRetinal image projection EgoRetinal RGB EgoRetinal depth

P1: Head orientation invariant
P2: 2D and 3D persistent
P3: Occlusion reasoning
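
The retinal representation can be pictured as a log-polar grid on the ground plane centered on the wearer. The following is a minimal sketch under that assumption; the function names, the grid size, and the range limits `r_min` and `r_max` are hypothetical values chosen for illustration, not taken from the slides.

```python
import math

def egoretinal_cell(x, z, r_min=0.5, r_max=15.0, n_r=32, n_theta=32):
    """Map a ground-plane point to a (radial_bin, angular_bin) cell.

    x is the lateral offset and z the forward distance, both in metres.
    Returns None for points outside the assumed map extent.
    """
    r = math.hypot(x, z)
    if z <= 0.0 or not (r_min <= r < r_max):
        return None
    theta = math.atan2(x, z)  # 0 = straight ahead, within (-pi/2, pi/2)
    # Uniform steps in log r: every radial bin spans the same distance ratio.
    u = (math.log(r) - math.log(r_min)) / (math.log(r_max) - math.log(r_min))
    v = (theta + math.pi / 2) / math.pi
    return min(int(u * n_r), n_r - 1), min(int(v * n_theta), n_theta - 1)

def radial_bin_width(i, r_min=0.5, r_max=15.0, n_r=32):
    """Metric width of radial bin i; grows with distance from the wearer."""
    ratio = (r_max / r_min) ** (1.0 / n_r)
    return r_min * ratio ** i * (ratio - 1.0)
```

With log spacing, nearby ground gets fine cells and distant ground gets coarse ones, which mirrors how a perspective image allots more pixels to closer surfaces.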

Page 40

Cartesian in image (LeCun et al.)

Configuration space

Page 41

Cartesian in image (LeCun et al.)

Configuration space

Page 42

Cartesian in image (LeCun et al.)

Configuration space

Cartesian in ground plane

Page 43

Cartesian in image (LeCun et al.)

Configuration space

Cartesian in ground plane

Page 44

Cartesian in image (LeCun et al.)

Configuration space

Cartesian in ground plane

Vanishing point

Page 45

Cartesian in image (LeCun et al.) Cartesian in ground plane EgoRetinal space

Configuration space

Vanishing point

Page 46

Page 47

Dataset summary

• 1280x960 stereo (100 mm baseline, ~15 m depth resolution)
• 26 scenes (13 indoor, 13 outdoor)
• 65.5k frames (9.1 hours)
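
The dataset figures imply an average capture rate, and the stereo baseline ties disparity to depth through the standard pinhole relation Z = f B / d. The sketch below works through both. The focal length passed to `stereo_depth_m` is a hypothetical value, since the slides do not give one, and the function name is likewise introduced here.

```python
FRAMES = 65_500       # 65.5k frames, from the dataset summary
HOURS = 9.1           # total recording time, from the dataset summary
BASELINE_M = 0.100    # 100 mm stereo baseline, from the dataset summary

# Average frame rate implied by the summary: about 2 frames per second.
average_fps = FRAMES / (HOURS * 3600)

def stereo_depth_m(disparity_px, focal_px, baseline_m=BASELINE_M):
    """Depth from disparity for a rectified stereo pair: Z = f * B / d.

    focal_px is an assumed focal length in pixels (not stated in the slides).
    """
    return focal_px * baseline_m / disparity_px
```

Because depth varies as 1/d, a one-pixel disparity error costs far more depth accuracy at long range than at short range.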

Page 48

n

Ground plane

Page 49

n

Ground plane

$x_t = (r_t, \theta_t, \omega_t)$

Page 50

n

Ground plane

$x_t = (r_t, \theta_t, \omega_t)$

$\omega_t$

Trajectory topology

Page 51

n

Ground plane

$x_t = (r_t, \theta_t, \omega_t)$

$\omega_t$

Trajectory topology

Traj A

Traj B

Page 52

n

Ground plane

$x_t = (r_t, \theta_t, \omega_t)$

$\omega_t$

Trajectory topology

Traj A

Traj B

Page 53

n

Ground plane Walkable pixel

Walkable pixels

$x_t = (r_t, \theta_t, \omega_t)$

Page 54

n

Ground plane

Walkable height

Walkable height

$x_t = (r_t, \theta_t, \omega_t)$

Page 55

Testing image Trajectory retrieval Depth cost Walking affordance
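
The slides do not say how a trajectory is retrieved for a testing image. As a hedged illustration only, the sketch below assumes nearest-neighbor lookup under Euclidean distance over some feature vector; the function `retrieve_trajectory`, the feature vectors, and the database layout are all hypothetical.

```python
import math

def retrieve_trajectory(query_feature, database):
    """Return the stored trajectory whose feature is closest to the query.

    `database` is an iterable of (feature, trajectory) pairs; features are
    equal-length sequences of floats. Returns None for an empty database.
    """
    best_trajectory, best_distance = None, math.inf
    for feature, trajectory in database:
        distance = math.dist(query_feature, feature)  # Euclidean distance
        if distance < best_distance:
            best_trajectory, best_distance = trajectory, distance
    return best_trajectory
```

For example, `retrieve_trajectory((0.9, 0.8), [((0.0, 0.0), "A"), ((1.0, 1.0), "B")])` returns `"B"`, the entry whose feature lies nearest to the query.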

Page 56

Testing image Trajectory retrieval Depth cost Walking affordance

Page 57

Testing image Trajectory retrieval

$\underset{X}{\text{minimize}} \; E_D + E_{RGB} + \lambda \lVert X - X^* \rVert^2$

Depth cost Walking affordance

Data cost

$X^*$

$X^*$: retrieved trajectory

Page 58

Testing image Trajectory retrieval

$\underset{X}{\text{minimize}} \; E_D + E_{RGB} + \lambda \lVert X - X^* \rVert^2$

Depth cost Walking affordance

Depth walking preference

$X^*$: retrieved trajectory

Page 59

Testing image Trajectory retrieval

$\underset{X}{\text{minimize}} \; E_D + E_{RGB} + \lambda \lVert X - X^* \rVert^2$

Depth cost RGB cost

Walking preference

$X^*$: retrieved trajectory
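
The objective on this slide combines a depth cost, an RGB cost, and a data term that keeps the result close to the retrieved trajectory. As a rough sketch of how such an objective could be minimized, the code below runs plain gradient descent with numerical gradients. The optimizer, the flat-list trajectory encoding, and the names `refine_trajectory`, `cost_depth`, and `cost_rgb` are assumptions made here; the slides do not specify how the minimization is carried out.

```python
def refine_trajectory(x_star, cost_depth, cost_rgb, lam=1.0,
                      step=0.05, iters=200, eps=1e-4):
    """Gradient descent on E_D(X) + E_RGB(X) + lam * ||X - X*||^2.

    x_star is the retrieved trajectory as a flat list of floats;
    cost_depth and cost_rgb are caller-supplied callables on such lists.
    """
    def objective(x):
        data = sum((a - b) ** 2 for a, b in zip(x, x_star))
        return cost_depth(x) + cost_rgb(x) + lam * data

    x = list(x_star)  # start from the retrieved trajectory
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            # Central finite difference along coordinate i.
            hi = x[:i] + [x[i] + eps] + x[i + 1:]
            lo = x[:i] + [x[i] - eps] + x[i + 1:]
            grad.append((objective(hi) - objective(lo)) / (2 * eps))
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

A larger `lam` keeps the refined trajectory closer to the retrieved one, while a smaller value lets the depth and RGB costs reshape it more freely.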

Page 60

Future localization

Page 61

Cartesian in image (LeCun et al.)

Page 62

Cartesian on ground plane

Page 63

Cartesian on ground plane

Short range error

Short range

Page 64

Ours

Short range error

Page 65

Page 66

Page 67

Page 68

Page 69

Page 70

Page 71

EgoRetinal map

Putting yourself in his shoes

Page 72

Egocentric Future Localization

Website: http://www.seas.upenn.edu/~hypar/future_loc.html