Page 1: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Siyu ZHANG

Research Engineer

ZJU-SenseTime Joint Lab of 3D Vision

Object 6D Pose Estimation by Action-Decision

Page 2: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

A Bit of Recap of Object 6D Pose Estimation

• As a regression problem

• Pose Estimation: direct regression

• Pose Tracking: render and regression

• As a matching problem

• Pose Estimation: matching from image pixels to points in object frame

• Pose Tracking: matching between frames


Page 3: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

• As a regression problem

• Pose Tracking: render and regression

• Render the image of target object

• Feed the rendered image and input image into network

• Regress the additive pose for the target object (a minimal loop sketch follows below)
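
The loop below is a minimal sketch of this render-and-regress scheme, not DeepIM's actual implementation; renderer and regress_delta are hypothetical placeholders for the renderer and the pose-regression network, and poses are 4x4 homogeneous matrices.

def render_and_regress(pose, observed, renderer, regress_delta, n_iters=4):
    """Iteratively refine a 4x4 object pose by regressing a relative update."""
    for _ in range(n_iters):
        rendered = renderer(pose)                  # image of the object at the current pose
        delta = regress_delta(rendered, observed)  # network predicts a 4x4 relative pose
        pose = delta @ pose                        # apply the additive (relative) update
    return pose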

DeepIM: Deep Iterative Matching for 6D Pose Estimation, ECCV, 2018

A Bit of Recap of Object 6D Pose Estimation

Page 4: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

• As a regression problem

• Pose Estimation: direct regression

• Pose Tracking: render and regression

• Are there possible improvements?


A Bit of Recap of Object 6D Pose Estimation

Page 5: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Content

• Paper 1: I Like to Move It: 6D Pose Estimation as an Action Decision Process - models object pose refinement as a discrete decision-making process

• Paper 2: Pose-Free Reinforcement Learning for 6D Pose Estimation - models object pose refinement as a reinforcement learning problem


Page 6: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Methodology

• Input:

• Cropped image of real object + rendered object with initial pose

• Concat with rendered depth and mask

• Output: one of 13 discrete actions (see the loop sketch after this list)

• Action: +tx, -tx, +rx, -rx, …, stop

• Step size is fixed

• Initialization: random seeds vote for the object center (detailed on the next page)
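
A minimal sketch of the resulting refinement loop, assuming numpy, a 4x4 homogeneous pose, and hypothetical renderer and policy_net callables; the step sizes are illustrative, not the paper's values.

import numpy as np

T_STEP = 0.01             # meters per translation action (illustrative)
R_STEP = np.deg2rad(2.0)  # radians per rotation action (illustrative)

# 13 discrete actions: +/- step on each of 6 DoF, plus "stop"
ACTIONS = [(kind, axis, sign)
           for kind in ("t", "r")
           for axis in range(3)
           for sign in (+1, -1)] + [("stop", 0, 0)]

def axis_rotation(axis, angle):
    """4x4 homogeneous rotation about one coordinate axis."""
    c, s = np.cos(angle), np.sin(angle)
    i, j = [(1, 2), (0, 2), (0, 1)][axis]
    R = np.eye(4)
    R[i, i] = c; R[j, j] = c
    R[i, j] = -s; R[j, i] = s
    return R

def refine_pose(pose, observed_crop, policy_net, renderer, max_steps=100):
    """Apply fixed-size discrete actions until the policy chooses 'stop'."""
    for _ in range(max_steps):
        rendered = renderer(pose)  # rendered RGB + depth + mask at the current pose
        x = np.concatenate([observed_crop, rendered], axis=-1)
        kind, axis, sign = ACTIONS[int(policy_net(x).argmax())]
        if kind == "stop":
            break
        if kind == "t":
            pose[axis, 3] += sign * T_STEP  # translate along one axis
        else:
            pose = pose @ axis_rotation(axis, sign * R_STEP)  # rotate in the object frame
    return pose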


Page 7: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Initialization by voting

• Observation:

• Usually translates first, then rotates

• Actions can still converge even with a large initial offset

• Method:

• Randomly sample seeds

• Vote for the object center by aggregating actions (see the sketch below)
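
A minimal sketch of the vote aggregation, assuming each randomly sampled seed contributes a 2D vote built from its projected center and the image-plane direction of its first predicted translation action (both are hypothetical inputs derived from running the policy once per seed):

import numpy as np

def vote_object_center(seed_centers, seed_directions):
    """Aggregate per-seed votes into one 2D object-center estimate.

    seed_centers:    (N, 2) projected centers of randomly sampled seed poses
    seed_directions: (N, 2) image-plane direction of each seed's first translation action
    """
    votes = seed_centers + seed_directions
    return np.median(votes, axis=0)  # median is robust to outlier seeds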


Page 8: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Methodology

• Input: RGB image + rendered image

• Output: one of 13 discrete actions

• Initialization: random seeds vote for the object center


Page 9: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Difference from existing approaches:

• Discrete actions with a fixed step size

• Intuition: why is this better?

• Generalization ability

• Wider convergence basin

• Lighter network + simpler task

• Synthetic training


Page 10: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Experiment

• Dataset: YCB-Video, LAVAL

• Trained only on YCB-Video and evaluated on both

• During analysis, they surprisingly found that:


Page 11: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Problem: the GT is wrong… (all previous methods were effectively overfitting to false GT)


Page 12: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Experiment

• Dataset: YCB-Video, LAVAL

• Trained only on YCB-Video and evaluated on both

• Evaluation on YCB-Video (single-object model) - surpasses the SoTA method


Page 13: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Robustness & Convergence Analysis

• Can still converge even when the initial pose is unreasonably bad - makes it possible to initialize without other approaches

• While previous methods (e.g. Deep 6DoF tracking) fail when the overlap is less than 50%


Page 14: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Experiment: Evaluation on LAVAL

• Not good enough in rotation


Page 15: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Action Decision Problem

• Experiment: Runtime


Page 16: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Quick recap of RL (considering only the control problem)

• Terminology:

• State: information about the world

• Action: sampled from the policy; transitions the current state to the next state

• Reward: how good the current action is

• Target: find the policy that optimizes the value function (the expected cumulative reward over all time steps; formulas below)
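
In standard notation (added here for reference; the slide states this in words), with discount factor \gamma \in [0, 1):

V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s\right], \qquad \pi^{*} = \arg\max_{\pi} V^{\pi}(s)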


Page 17: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Discussion: Compare (D)RL and Supervised Learning

• Similarities:

• Target: make the network produce outputs that maximize performance

• Method: optimize network parameters w.r.t. a performance measure

• For Supervised Learning

• The performance measure comes from a differentiable loss function of the network output

• Supervision is dense

• For Reinforcement Learning

• The performance measure does not necessarily relate directly to the network output (for example, it may come from the environment)

• Supervision is sparse and temporally correlated


Page 18: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Discussion: Compare (D)RL and Supervised Learning

• Use DRL instead of Supervised Learning when …

• The loss or some part of the network is non-differentiable

• Supervision is sparse

• The task is temporally correlated (e.g. path planning)


Page 19: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Quick recap of RL (considering only the control problem)

• Terminology:

• State: information about the world

• Action: sampled from the policy; transitions the current state to the next state

• Reward: how good the current action is

• Target: find the policy that optimizes the value function (the expected cumulative reward over all time steps)

• Why use RL instead of Supervised Learning?

• Use the 2D mask as sparse supervision


Page 20: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Problem formulation

• Maximize future discounted rewards:
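
In standard form, the discounted return being maximized is:

R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}, \qquad \gamma \in [0, 1)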


Page 21: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Reward: 2D Mask-based reward

• IoU Difference Reward: encourages overlap of the 2D masks

• Goal-Reached Reward: stop refining once IoU_thr is reached

• Centralization Reward: bootstraps the network (a combined sketch of all three terms follows below)
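
A minimal Python sketch of how the three mask-based terms could combine; the weights, the IoU threshold, and the exact form of the centralization term are illustrative assumptions, not the paper's values.

import numpy as np

def mask_iou(m1, m2):
    """IoU between two boolean 2D masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / max(union, 1)

def mask_centroid(m):
    """(row, col) centroid of a boolean mask; origin if the mask is empty."""
    ys, xs = np.nonzero(m)
    return np.array([ys.mean(), xs.mean()]) if len(ys) else np.zeros(2)

def step_reward(mask_prev, mask_curr, mask_gt, stopped, iou_thr=0.95):
    """Composite mask-based reward for one refinement step (illustrative weights)."""
    # IoU difference reward: each step should increase overlap with the GT mask
    r_iou = mask_iou(mask_curr, mask_gt) - mask_iou(mask_prev, mask_gt)
    # Goal-reached reward: bonus for stopping once IoU exceeds the threshold
    r_goal = 1.0 if stopped and mask_iou(mask_curr, mask_gt) >= iou_thr else 0.0
    # Centralization reward (assumed form): pull the mask centroids together
    r_center = -np.linalg.norm(mask_centroid(mask_curr) - mask_centroid(mask_gt))
    return r_iou + r_goal + 0.01 * r_center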


Page 22: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Problem formulation

• Maximize future discounted rewards:

• State: rendered RGB image, projection mask, observed RGB image, GT 2D box

• Action: discrete, hand-crafted actions


Page 23: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Action:

• Discrete hand-crafted actions

• Shown to be better than continuous actions


Page 24: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Composite Reinforced Optimization

• Policy-gradient optimization based on PPO (the clipped objective is shown below)

• Off-policy optimization with a replay buffer
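
For reference (standard PPO form, added here; not necessarily the paper's exact loss), the clipped surrogate objective being maximized is:

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}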


Page 25: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Experiment: Evaluation on Linemod & T-LESS


Page 26: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

6D Pose as Reinforcement Learning Problem

• Experiment: Runtime


Page 27: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Take-Home Message

• When the network is not working, check the training & validation data first (your algorithm may really outperform the ground truth!)

• When dealing with limited network capacity, learning less is more

• Consider using RL when supervision is sparse and temporally correlated.


Page 28: Object 6D Pose Estimation by Action-Decision - Siyu Zhang

Thanks for your Attention

Siyu ZHANG

Research Engineer

ZJU-SenseTime Joint Lab of 3D Vision

[email protected]