Page 1

L11: 6D Pose Estimation

Hao Su

Machine Learning meets Geometry

Ack: Minghua Liu and Jiayuan Gu for helping to prepare slides

Page 2

We aren’t talking about human pose


Figure from https://www.tensorflow.org/lite/models/pose_estimation/overview

Page 3

We are talking about object pose


Figure from https://paperswithcode.com/task/6d-pose-estimation

Page 4

Rigid Transformation

[Figure: source points are rotated by R ∈ SO(3) and translated by t ∈ ℝ^{3×1} onto the target points]

Transformation is relative!

Page 5

Rigid Transformation

• Rigid transformation T(p) = Rp + t, where p ∈ ℝ^{3×1}

• Represented by a rotation R ∈ SO(3) and a translation t ∈ ℝ^{3×1}

• All the rigid transformations {T} form the special Euclidean group, denoted by SE(3)

Page 6

6D Pose Estimation

Recognize the 3D location and orientation of an object relative to a canonical frame.

[Figure: canonical frame, pose 1, pose 2]

Page 7

6D Pose & Rigid Transformation

• 6D pose: object-level rigid transformation, associated with a canonical frame

• rigid transformation: can be object-level or scene-level, with no predefined canonical frame


Page 8

Agenda

• Introduction

• Rigid transformation estimation
  - Closed-form solution given correspondences
  - Iterative closest point (ICP)

• Learning-based approaches
  - Direct approaches
  - Indirect approaches


Page 9

Rigid Transformation Estimation

Page 10

Rigid Transformation Estimation

Rigid transformation T(p) = Rp + t

[Figure: source point cloud, rotated and translated onto the target point cloud]

Page 11

Correspondence

Rigid transformation T(p) = Rp + t

Given corresponding pairs (p_i, q_i) between source and target:

q_1 = T(p_1) = Rp_1 + t
q_2 = T(p_2) = Rp_2 + t
⋮
q_n = T(p_n) = Rp_n + t

3n equations from n pairs of points

Page 12

Estimate from Correspondence

Rigid transformation T(p) = Rp + t

How many pairs of points (p_i, q_i) are required to uniquely define a rigid transformation?

Page 13

Two Key Steps

• Find the correspondence between source and target
  - combinatorial problem
  - greedy heuristic or exhaustive search

• Estimate the rigid transformation given the correspondence
  - constrained (for rotation) optimization
  - closed-form solution

Page 14

Two Key Steps (recap)

Math is coming

Page 15

Least-Squares Estimation of Rigid Transformation

• Given source points P = {p_i} and target points Q = {q_i}, the objective (least-squares error) is:

  L = Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 16

Optimization Problem

• Parameters: R ∈ SO(3) and t ∈ ℝ^{3×1}

• Target point cloud: Q = {q_i}

• Source point cloud: P = {p_i}

• Objective: min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 17

Step I: Representing t by R

• Assuming R ∈ SO(3) is known:

  - Recall the objective: L = Σ_{i=1}^n ∥Rp_i + t − q_i∥²

  - Calculate the gradient: ∂L/∂t = 2 Σ_{i=1}^n (Rp_i + t − q_i)

  - Solve ∂L/∂t = 0: t = (Σ_{i=1}^n q_i)/n − R(Σ_{i=1}^n p_i)/n

Page 18

Step I: Representing t by R

• Substituting the t ∈ ℝ^{3×1} expressed by R, the objective simplifies to

  L = Σ_{i=1}^n ∥R p̄_i − q̄_i∥²,

  where p̄_i = p_i − (Σ_{j=1}^n p_j)/n and q̄_i = q_i − (Σ_{j=1}^n q_j)/n

Page 19

Step II: Solve R

Objective: L = Σ_{i=1}^n ∥R p̄_i − q̄_i∥²

⇒ R* = argmin_R ∥RP − Q∥_F, subject to RᵀR = I, det(R) = 1,

where P = [p̄_1, p̄_2, ⋯, p̄_n] ∈ ℝ^{3×n} and Q = [q̄_1, q̄_2, ⋯, q̄_n] ∈ ℝ^{3×n}

Page 20

Step II: Solve R

• Orthogonal Procrustes Problem: argmin_R ∥RP − Q∥_F, subject to RᵀR = I

• Notice: no determinant constraint here

• This problem has a closed-form solution!

  - Project M = QPᵀ onto the space of orthogonal matrices

  - Numerically, the magical SVD comes: if M = QPᵀ = UΣVᵀ, then R = UVᵀ

• The proof can be found on the Wikipedia page for the Orthogonal Procrustes problem

Page 21

Step II: Solve R

• How to satisfy the determinant constraint det(R) = 1?

• Assume R = UVᵀ

  - If det(R) = −1, flip the sign of the last column of V

• The proof can be found in Umeyama’s paper

Page 22

Summary of Closed-Form Solution (Known Correspondences)

• Known: P = {p_i}, Q = {q_i}

• Objective: min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

• Solution

  - Σ_{i=1}^n (q_i − q̄)(p_i − p̄)ᵀ = UΣVᵀ (SVD), where p̄ and q̄ are the centroids of P and Q

  - R = UVᵀ (flip the sign of the last column of V if det(R) = −1)

  - t = (Σ_{i=1}^n q_i)/n − R(Σ_{i=1}^n p_i)/n := q̄ − R p̄
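The summary above maps line-for-line onto code. A minimal NumPy sketch (the function name and the (3, n) array layout are illustrative choices, not from the slides):

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """P, Q: (3, n) arrays of corresponding source/target points."""
    p_bar = P.mean(axis=1, keepdims=True)   # source centroid p̄
    q_bar = Q.mean(axis=1, keepdims=True)   # target centroid q̄
    M = (Q - q_bar) @ (P - p_bar).T         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(M)             # M = U Σ Vᵀ
    R = U @ Vt
    if np.linalg.det(R) < 0:                # enforce det(R) = +1
        Vt[-1, :] *= -1                     # flip the sign of the last column of V
        R = U @ Vt
    t = q_bar - R @ p_bar                   # t = q̄ − R p̄
    return R, t
```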

Page 23

Two Key Steps (recap)

Page 24

Iterative Closest Point (ICP)

Page 25

Heuristic

• The closest point might be the corresponding point

• If starting from a transformation close to the actual one, we can iteratively improve the estimation


Page 26

Find Correspondence

[Figure: ground-truth correspondence vs. ICP (nearest-neighbor) correspondence between source points p_i and target points q_i]

Page 27

Update Transformation

Update the transformation by minimizing the least-squares error over the ICP correspondences.

Page 28

Iterate to Refine

Alternate between the two steps: find correspondence, then update the transformation.

Page 29

General ICP Algorithm

Starting from an initial transformation T = (R, t):

1. Find correspondence: for each point in the source point cloud transformed by the current transformation, find its nearest neighbor in the target point cloud

2. Update the transformation by minimizing an objective function E(T) over the correspondences

3. Go to step 1 until the transformation is no longer updated
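The three steps above fit in a short loop. A bare-bones point-to-point ICP sketch, reusing estimate_rigid_transform from the Page 22 sketch; the tolerance-based stopping test is a simple stand-in for "the transformation is no longer updated":

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, R, t, max_iters=50, tol=1e-6):
    """source: (3, n), target: (3, m); (R, t): initial transformation."""
    tree = cKDTree(target.T)                 # for nearest-neighbor queries
    for _ in range(max_iters):
        moved = R @ source + t               # step 1: apply current transform
        _, idx = tree.query(moved.T)         # nearest neighbor in target
        R_new, t_new = estimate_rigid_transform(source, target[:, idx])  # step 2
        if np.linalg.norm(R_new - R) + np.linalg.norm(t_new - t) < tol:
            return R_new, t_new              # step 3: converged
        R, t = R_new, t_new
    return R, t
```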

Page 30

Illustration

Animation from https://github.com/yassram/iterative-closest-point Animation from https://github.com/pglira/simpleICP

Page 31

Improve ICP

• Objective functions
  - point-to-point
  - point-to-plane (faster convergence, but requires normal computation)

• Outlier removal: discard pairs of points whose distance is too large

Page 33

Limitations of ICP

• Even with these improvements
  - easy to get stuck in local minima
  - requires a good initialization to work

Page 34

Acquire the Initialization for ICP

• Feature-based global registration (correspondences based on features)
  - http://www.open3d.org/docs/release/tutorial/pipelines/global_registration.html

• Go-ICP
  - https://github.com/yangjiaolong/Go-ICP

• TEASER (more robust to outliers)
  - https://github.com/MIT-SPARK/TEASER-plusplus

Read by yourself

Page 35

Learning-based Approaches

Page 36

Two Categories of Approaches

• Direct: predict (R, t) directly

• Indirect: predict corresponding pairs {(p_i, q_i)}
  - points in the canonical frame: {p_i}
  - points in the camera frame: {q_i}
  - estimate (R, t) by solving min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 37

Direct Approaches

Page 38

Direct Approaches

• Input: cropped image / point cloud / depth map of a single object

• Output: (R, t)

[Figure: image, depth map, or point cloud → neural network → (R, t)]

Page 39

Challenges for Direct Approaches

• The choice of the representation of R

  - Recall Lec 5: rotation matrix, Euler angles, quaternion, axis-angle, … (more will be introduced in this lecture)

  - Which is more learnable for neural networks?

Page 40

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 41

Continuity: 2D Rotation Example

A 2D rotation can be parameterized by the angle θ. However, the mapping from rotations to θ is discontinuous (it jumps at the 0/2π wrap-around) due to the topology difference between the circle and the real line.

Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019

Page 42

Rotation Representations

Representation   | Continuous parameterization | Unique for a rotation
-----------------|-----------------------------|----------------------
Rotation matrix  | ✔                           | ✔
Euler angles     | No                          | No (gimbal lock)
Angle-axis       | No                          | No (singularity)
Quaternion       | No                          | No (double covering)

Note that neural networks are generally continuous, so fitting a discontinuous function is not friendly to them.

Page 43

Continuous Representation

• Next, we will introduce two continuous representations: 6D (vector) and 9D (vector)


Page 44

6D Representation

• 6D representation: x = [a_1ᵀ, a_2ᵀ], where a_1, a_2 ∈ ℝ^{3×1}

• Convert to a rotation matrix R = [b_1, b_2, b_3] through the Gram-Schmidt process, where b_1, b_2, b_3 ∈ ℝ^{3×1} have unit length:

  b_1 = a_1 / ∥a_1∥
  b_2 ∝ a_2 − ⟨b_1, a_2⟩ b_1 (normalized to unit length)
  b_3 = b_1 × b_2

• Convert a rotation matrix to 6D: x = [b_1ᵀ, b_2ᵀ]

Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019
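A small sketch of this 6D ↔ rotation mapping (function names are illustrative):

```python
import numpy as np

def rotation_from_6d(x):
    """x: length-6 array [a1; a2] -> 3x3 rotation matrix via Gram-Schmidt."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1           # remove the b1 component of a2
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)                   # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)   # columns [b1, b2, b3]

def rotation_to_6d(R):
    """Inverse map: keep the first two columns of R."""
    return np.concatenate([R[:, 0], R[:, 1]])
```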

Page 45

9D Representation

• 9D representation (rotation matrix): X ∈ ℝ^{3×3}

• Find the closest rotation matrix R ∈ SO(3)

  - Recall the Orthogonal Procrustes Problem: argmin_R ∥RP − Q∥_F, subject to RᵀR = I

  - Equivalent to the special case P = I and Q = X

• Implementation: project X onto SO(3) via SVD

Levinson, Jake, et al. "An analysis of svd for deep rotation estimation." NeurIPS 2020
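A sketch of the SVD projection, with the determinant correction handled as on Page 21:

```python
import numpy as np

def rotation_from_9d(X):
    """Project a raw 3x3 prediction X onto SO(3)."""
    U, _, Vt = np.linalg.svd(X)
    d = np.sign(np.linalg.det(U @ Vt))       # +1 or -1
    return U @ np.diag([1.0, 1.0, d]) @ Vt   # nearest rotation with det = +1
```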

Page 46

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 47

Loss for Direct Approaches

• Shape-agnostic: distance between (R, t) and (R_GT, t_GT)

• Shape-aware: distance between the same shape X ∈ ℝ^{3×n} transformed by (R, t) and by (R_GT, t_GT), respectively

Page 48

Shape-agnostic Loss

• Rotation

  - Geodesic distance (angle difference, relative rotation error) on SO(3): arccos[(tr(R_GT Rᵀ) − 1)/2]

  - Mean squared error over ℝ^{3×3}: ∥R − R_GT∥²_F

  - L_p distance on the representation x: ∥x − x_GT∥_p

• Translation

  - L_p distance: ∥t − t_GT∥_p
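These losses are one-liners in NumPy; the arccos clip below is a standard numerical guard, not from the slides:

```python
import numpy as np

def geodesic_distance(R, R_gt):
    """Angle of the relative rotation R_gt Rᵀ, in radians."""
    cos = (np.trace(R_gt @ R.T) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against round-off

def rotation_mse(R, R_gt):
    return np.linalg.norm(R - R_gt, ord="fro") ** 2

def translation_lp(t, t_gt, p=2):
    return np.linalg.norm((t - t_gt).ravel(), ord=p)
```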

Page 49

Challenges: Symmetry

• Symmetry introduces ambiguities in the GT labels

• Requires specially designed losses to handle

[Figure: examples of symmetric objects in the YCB dataset]

Page 50

Rotational Symmetry

[Figure: (a) 1 symmetry axis, symmetry order 2; (b) 1 symmetry axis, infinite symmetry order; (c) 3 symmetry axes, symmetry orders (4, 2, 2)]

Page 51

Shape-agnostic Loss for Symmetric Objects

• Symmetry group: multiple symmetry-equivalent GT rotations ℛ = {R_GT^1, R_GT^2, ⋯, R_GT^n} (n is the order of symmetry)

• Min-of-N loss (for finite order of symmetry): min_{R_GT^i ∈ ℛ} L(R_GT^i, R)

Q: How to deal with infinite order?
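For the finite-order case, the Min-of-N loss is a single reduction over ℛ; a sketch reusing geodesic_distance from the Page 48 sketch:

```python
def min_of_n_loss(R, gt_rotations, loss=geodesic_distance):
    """gt_rotations: iterable of symmetry-equivalent GT rotations."""
    return min(loss(R_gt, R) for R_gt in gt_rotations)
```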

Page 52

Shape-agnostic Loss for Symmetric Objects (Infinite Order)

Use the angle θ between the predicted and GT symmetry axes

Read by yourself

Page 53

Motivation of Shape-aware Loss

[Figure: target pose vs. predicted pose 1 vs. predicted pose 2 — similar angle difference, but very different perceptual error]

Page 54

Shape-aware Loss

• Given a shape X ∈ ℝ^{3×n}, the loss for rotation can be the distance between RX and R_GT X

• e.g., per-point mean squared error (MSE): L = ∥RX − R_GT X∥²_F

Page 55

Shape-aware Loss for Symmetric Objects

• With ∥RX − R_GT X∥²_F as the distance metric, we can also apply the Min-of-N loss:

  L = min_{R_GT^i ∈ ℛ} ∥RX − R_GT^i X∥²_F
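A sketch of this shape-aware Min-of-N loss (X is the (3, n) model point set; names are illustrative):

```python
import numpy as np

def shape_aware_min_of_n(R, gt_rotations, X):
    """gt_rotations: iterable of symmetry-equivalent GT rotations."""
    return min(np.linalg.norm(R @ X - R_gt @ X, ord="fro") ** 2
               for R_gt in gt_rotations)
```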

Page 56

Shape-aware Loss for Symmetric Objects

• Recall the distance metrics for point clouds in Lec 7
  - Chamfer distance
  - earth mover's distance

• These distance metrics are compatible with symmetry (even infinite order)

• But they require a shape and might get stuck in local minima

Xiang et al., “PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes”, RSS 2018
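For reference, a brute-force Chamfer distance between two point sets; real pipelines use KD-trees or GPU ops, and whether squared distances are used varies by convention:

```python
import numpy as np

def chamfer_distance(A, B):
    """A: (3, n), B: (3, m) point sets; symmetric Chamfer distance."""
    d2 = ((A.T[:, None, :] - B.T[None, :, :]) ** 2).sum(-1)  # (n, m) sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```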

Page 57

Example of Local Minima for Chamfer Distance

[Figure: CD = 2.02. Rotate the red shape left a bit?]

Page 58

[Figure: CD = 2.05 — worse]

Page 59

[Figure: back to CD = 2.02. Reset, and try rotating right:]

Page 60

[Figure: CD = 2.50 — worse again]

Page 61

[Figure: CD = 2.05, CD = 2.02, CD = 2.50 side by side — both directions increase the Chamfer distance, so the middle pose is a local minimum]

Note: no local minima for Min-of-N

Page 62

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 63

Example: DenseFusion

Segment objects

Wang et al., “DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion”, CVPR 2019

Page 64

Example: DenseFusion

Get the image crop and the point cloud for an object

Page 65

Example: DenseFusion

Encode appearance (image) with CNN and geometry (point cloud) with PointNet

Page 66

Example: DenseFusion

Predict poses from features

Page 67

Example: DenseFusion

Compute the shape-aware loss:
  - non-symmetric objects: per-point MSE
  - symmetric objects: Chamfer distance

Min-of-N as an alternative?

Page 68

Indirect Approaches

Page 69

Indirect Approaches

• Input: cropped image / point cloud / depth map of a single object

• Output: corresponding pairs {(p_i, q_i)}
  - points in the canonical frame: {p_i}
  - points in the camera frame: {q_i}
  - estimate (R, t) by solving min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 70

Two Categories of Indirect Approaches

• If points in the canonical frame are known, predict their corresponding locations in the camera frame

• If points in the camera frame are known, predict their corresponding locations in the canonical frame


Page 71

Given Points in the Canonical Frame, Predict Corresponding Location in the Camera Frame

• Recall: three correspondences are enough

• Which points in the canonical frame should be given?

- Choice by PVN3D: keypoints in the canonical frame

He et al., “PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation”, CVPR 2020

Page 72

Keypoint Selection

• Option 1: bounding box vertices

• Option 2: farthest point sampling (FPS) over the CAD object model

Chen et al., “G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features”, CVPR 2020

Page 73

Example: PVN3D

Get point-wise features by fusing color and geometry features

Page 74

Example: PVN3D

For each keypoint:

• Voting: for each point in the camera frame, predict its offset to the keypoint (in the camera frame)

• Clustering: find one location from all the candidate votes
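A toy version of this voting-and-clustering step: each observed point casts a vote at its position plus its predicted offset. PVN3D clusters the votes (e.g., with MeanShift); a coordinate-wise median is used below as a cheap stand-in:

```python
import numpy as np

def vote_keypoint(points, offsets):
    """points, offsets: (n, 3). Each point votes with its predicted offset."""
    candidates = points + offsets         # votes in the camera frame
    return np.median(candidates, axis=0)  # robust aggregate of the votes
```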

Page 75

Given Points in the Camera Frame, Predict Corresponding Location in the Canonical Frame

• Which points in the camera frame should be given?
  - Choice by NOCS: every point in the camera frame

3D point in the camera frame (2D visible pixel with depth) → 3D point in the (normalized) canonical frame

Wang, He, et al., “Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation”, CVPR 2019

Page 76

Example: NOCS

[Figure: input image → output NOCS map, H × W × 3]

Note: the object is normalized to have a unit-length bounding-box diagonal in the canonical space, so the canonical space is called the “Normalized Object Coordinate Space” (NOCS)

Page 77

Example: NOCS for Symmetric Objects

• Given equivalent GT rotations ℛ = {R_GT^1, R_GT^2, ⋯, R_GT^n} (finite symmetry order n), we can generate n equivalent NOCS maps

• Similar to the shape-agnostic loss in direct approaches, we can use the Min-of-N loss

Page 78

Umeyama’s Algorithm

• However, the target points in the canonical space of NOCS are normalized, so we also need to predict a scale factor

• Similarity transformation estimation (rigid transformation + uniform scale factor)

• Closed-form solution
  - Umeyama’s algorithm: http://web.stanford.edu/class/cs273/refs/umeyama.pdf
  - Similar to the counterpart without scale

Read by yourself
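A sketch following Umeyama's closed form for the similarity case q ≈ s·Rp + t; setting s = 1 recovers the rigid solver from the Page 22 sketch (names are illustrative):

```python
import numpy as np

def estimate_similarity_transform(P, Q):
    """P, Q: (3, n) corresponding points; returns (s, R, t)."""
    n = P.shape[1]
    p_bar = P.mean(axis=1, keepdims=True)
    q_bar = Q.mean(axis=1, keepdims=True)
    Pc, Qc = P - p_bar, Q - q_bar
    U, S, Vt = np.linalg.svd(Qc @ Pc.T / n)  # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])               # determinant correction
    R = U @ D @ Vt
    var_p = (Pc ** 2).sum() / n              # source variance
    s = (S * np.diag(D)).sum() / var_p       # optimal uniform scale
    t = q_bar - s * (R @ p_bar)
    return s, R, t
```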

Page 79

Tips for Homework 2

• For learning-based approaches

  - Start with direct approaches

  - Crop the point cloud of each object from the GT depth map, given the GT segmentation mask

  - Train a neural network, e.g. PointNet, with a shape-agnostic loss

  - Improve the results by considering symmetry