Page 1

L11: 6D Pose Estimation

Hao Su

Machine Learning meets Geometry

Ack: Minghua Liu and Jiayuan Gu for helping to prepare slides

Page 2

We aren’t talking about human pose


Figure from https://www.tensorflow.org/lite/models/pose_estimation/overview

Page 3

We are talking about object pose


Figure from https://paperswithcode.com/task/6d-pose-estimation

Page 4

Rigid Transformation

[Figure: source points are rotated by R ∈ SO(3) and translated by t ∈ ℝ^{3×1} onto the target points]

Transformation is relative!

Page 5

Rigid Transformation

• Rigid transformation T(p) = Rp + t, where p ∈ ℝ^{3×1}

• Represented by a rotation R ∈ SO(3) and a translation t ∈ ℝ^{3×1}

• All the rigid transformations {T} form the special Euclidean group, denoted by SE(3)

Page 6

6D Pose Estimation

Recognize the 3D location and orientation of an object relative to a canonical frame.

[Figure: canonical frame, pose 1, pose 2]

Page 7

6D Pose & Rigid Transformation

• 6D pose: object-level rigid transformation, associated with a canonical frame

• rigid transformation: can be object-level or scene-level, with no predefined canonical frame


Page 8

Agenda

• Introduction

• Rigid transformation estimation
  - Closed-form solution given correspondences
  - Iterative closest point (ICP)

• Learning-based approaches
  - Direct approaches
  - Indirect approaches


Page 9

Rigid Transformation Estimation

Page 10

Rigid Transformation Estimation

Rigid transformation T(p) = Rp + t

[Figure: source point cloud, rotated and translated onto the target point cloud]

Page 11

Correspondence

Rigid transformation T(p) = Rp + t

Given corresponding pairs (p_i, q_i) between source and target:

q_1 = T(p_1) = Rp_1 + t
q_2 = T(p_2) = Rp_2 + t
⋮
q_n = T(p_n) = Rp_n + t

3n equations from n pairs of points

Page 12

Estimate from Correspondence

Rigid transformation T(p) = Rp + t

How many pairs of points (p_i, q_i) are required to uniquely define a rigid transformation?

Page 13

Two Key Steps

• Find the correspondence between source and target
  - combinatorial problem
  - greedy heuristic or exhaustive search

• Estimate the rigid transformation given the correspondence
  - constrained (for rotation) optimization
  - closed-form solution

Page 14

Two Key Steps (recap)

Math is coming

Page 15

Least-Squares Estimation of Rigid Transformation

• Given source points P = {p_i} and target points Q = {q_i}, the objective (least-squares error) is:

  L = Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 16

Optimization Problem

• Parameters: R ∈ SO(3) and t ∈ ℝ^{3×1}

• Target point cloud: Q = {q_i}

• Source point cloud: P = {p_i}

• Objective: min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 17

Step I: Representing t by R

• Assuming R ∈ SO(3) is known:

  - Recall the objective: L = Σ_{i=1}^n ∥Rp_i + t − q_i∥²

  - Calculate the gradient: ∂L/∂t = 2 Σ_{i=1}^n (Rp_i + t − q_i)

  - Solve ∂L/∂t = 0: t = (Σ_{i=1}^n q_i)/n − R(Σ_{i=1}^n p_i)/n

Page 18

Step I: Representing t by R

• Substituting the t ∈ ℝ^{3×1} expressed by R, the objective simplifies to

  L = Σ_{i=1}^n ∥R p̄_i − q̄_i∥²,

  where p̄_i = p_i − (Σ_{j=1}^n p_j)/n and q̄_i = q_i − (Σ_{j=1}^n q_j)/n

Page 19

Step II: Solve R

Objective: L = Σ_{i=1}^n ∥R p̄_i − q̄_i∥²

⇒ R* = argmin_R ∥RP − Q∥_F, subject to RᵀR = I, det(R) = 1,

where P = [p̄_1, p̄_2, ⋯, p̄_n] ∈ ℝ^{3×n} and Q = [q̄_1, q̄_2, ⋯, q̄_n] ∈ ℝ^{3×n}

Page 20

Step II: Solve R

• Orthogonal Procrustes Problem: argmin_R ∥RP − Q∥_F, subject to RᵀR = I

• Notice: no determinant constraint here

• This problem has a closed-form solution!

  - Project M = QPᵀ onto the space of orthogonal matrices

  - Numerically, the magical SVD comes: if M = QPᵀ = UΣVᵀ, then R = UVᵀ

• The proof can be found on the Wikipedia page for the Orthogonal Procrustes problem

Page 21

Step II: Solve R

• How to satisfy the determinant constraint det(R) = 1?

• Assume R = UVᵀ

  - If det(R) = −1, flip the sign of the last column of V

• The proof can be found in Umeyama’s paper

Page 22

Summary of Closed-Form Solution (Known Correspondences)

• Known: P = {p_i}, Q = {q_i}

• Objective: min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

• Solution

  - Σ_{i=1}^n (q_i − q̄)(p_i − p̄)ᵀ = UΣVᵀ (SVD), where p̄ and q̄ are the centroids of P and Q

  - R = UVᵀ (flip the sign of the last column of V if det(R) = −1)

  - t = (Σ_{i=1}^n q_i)/n − R(Σ_{i=1}^n p_i)/n := q̄ − R p̄
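The summary above maps line-for-line onto code. A minimal NumPy sketch (the function name and the (3, n) array layout are illustrative choices, not from the slides):

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """P, Q: (3, n) arrays of corresponding source/target points."""
    p_bar = P.mean(axis=1, keepdims=True)   # source centroid p̄
    q_bar = Q.mean(axis=1, keepdims=True)   # target centroid q̄
    M = (Q - q_bar) @ (P - p_bar).T         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(M)             # M = U Σ Vᵀ
    R = U @ Vt
    if np.linalg.det(R) < 0:                # enforce det(R) = +1
        Vt[-1, :] *= -1                     # flip the sign of the last column of V
        R = U @ Vt
    t = q_bar - R @ p_bar                   # t = q̄ − R p̄
    return R, t
```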

Page 23

Two Key Steps (recap)

Page 24

Iterative Closest Point (ICP)

Page 25

Heuristic

• The closest point might be the corresponding point

• If starting from a transformation close to the actual one, we can iteratively improve the estimation


Page 26

Find Correspondence

[Figure: ground-truth correspondence vs. ICP (nearest-neighbor) correspondence between source points p_i and target points q_i]

Page 27

Update Transformation

Update the transformation by minimizing the least-squares error over the ICP correspondences.

Page 28

Iterate to Refine

Alternate between the two steps: find correspondence, then update the transformation.

Page 29

General ICP Algorithm

Starting from an initial transformation T = (R, t):

1. Find correspondence: for each point in the source point cloud transformed by the current transformation, find its nearest neighbor in the target point cloud

2. Update the transformation by minimizing an objective function E(T) over the correspondences

3. Go to step 1 until the transformation is no longer updated
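The three steps above fit in a short loop. A bare-bones point-to-point ICP sketch, reusing estimate_rigid_transform from the Page 22 sketch; the tolerance-based stopping test is a simple stand-in for "the transformation is no longer updated":

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, R, t, max_iters=50, tol=1e-6):
    """source: (3, n), target: (3, m); (R, t): initial transformation."""
    tree = cKDTree(target.T)                 # for nearest-neighbor queries
    for _ in range(max_iters):
        moved = R @ source + t               # step 1: apply current transform
        _, idx = tree.query(moved.T)         # nearest neighbor in target
        R_new, t_new = estimate_rigid_transform(source, target[:, idx])  # step 2
        if np.linalg.norm(R_new - R) + np.linalg.norm(t_new - t) < tol:
            return R_new, t_new              # step 3: converged
        R, t = R_new, t_new
    return R, t
```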

Page 30

Illustration

Animation from https://github.com/yassram/iterative-closest-point Animation from https://github.com/pglira/simpleICP

Page 31

Improve ICP

• Objective functions
  - point-to-point
  - point-to-plane (faster convergence, but requires normal computation)

• Outlier removal: discard pairs of points whose distance is too large

Page 33

Limitations of ICP

• Even with these improvements
  - easy to get stuck in local minima
  - requires a good initialization to work

Page 34

Acquire the Initialization for ICP

• Feature-based global registration (correspondences based on features)
  - http://www.open3d.org/docs/release/tutorial/pipelines/global_registration.html

• Go-ICP
  - https://github.com/yangjiaolong/Go-ICP

• TEASER (more robust to outliers)
  - https://github.com/MIT-SPARK/TEASER-plusplus

Read by yourself

Page 35

Learning-based Approaches

Page 36

Two Categories of Approaches

• Direct: predict (R, t) directly

• Indirect: predict corresponding pairs {(p_i, q_i)}
  - points in the canonical frame: {p_i}
  - points in the camera frame: {q_i}
  - estimate (R, t) by solving min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 37

Direct Approaches

Page 38

Direct Approaches

• Input: cropped image / point cloud / depth map of a single object

• Output: (R, t)

[Figure: image, depth map, or point cloud → neural network → (R, t)]

Page 39

Challenges for Direct Approaches

• The choice of the representation of R

  - Recall Lec 5: rotation matrix, Euler angles, quaternion, axis-angle, … (more will be introduced in this lecture)

  - Which is more learnable for neural networks?

Page 40

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 41

Continuity: 2D Rotation Example

A 2D rotation can be parameterized by the angle θ. However, the mapping from rotations to θ is discontinuous (it jumps at the 0/2π wrap-around) due to the topology difference between the circle and the real line.

Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019

Page 42

Rotation Representations

Representation   | Continuous parameterization | Unique for a rotation
-----------------|-----------------------------|----------------------
Rotation matrix  | ✔                           | ✔
Euler angles     | No                          | No (gimbal lock)
Angle-axis       | No                          | No (singularity)
Quaternion       | No                          | No (double covering)

Note that neural networks are generally continuous, so fitting a discontinuous function is not friendly to them.

Page 43

Continuous Representation

• Next, we will introduce two continuous representations: 6D (vector) and 9D (vector)


Page 44

6D Representation

• 6D representation: x = [a_1ᵀ, a_2ᵀ], where a_1, a_2 ∈ ℝ^{3×1}

• Convert to a rotation matrix R = [b_1, b_2, b_3] through the Gram-Schmidt process, where b_1, b_2, b_3 ∈ ℝ^{3×1} have unit length:

  b_1 = a_1 / ∥a_1∥
  b_2 ∝ a_2 − ⟨b_1, a_2⟩ b_1 (normalized to unit length)
  b_3 = b_1 × b_2

• Convert a rotation matrix to 6D: x = [b_1ᵀ, b_2ᵀ]

Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019
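A small sketch of this 6D ↔ rotation mapping (function names are illustrative):

```python
import numpy as np

def rotation_from_6d(x):
    """x: length-6 array [a1; a2] -> 3x3 rotation matrix via Gram-Schmidt."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1           # remove the b1 component of a2
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)                   # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)   # columns [b1, b2, b3]

def rotation_to_6d(R):
    """Inverse map: keep the first two columns of R."""
    return np.concatenate([R[:, 0], R[:, 1]])
```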

Page 45

9D Representation

• 9D representation (rotation matrix): X ∈ ℝ^{3×3}

• Find the closest rotation matrix R ∈ SO(3)

  - Recall the Orthogonal Procrustes Problem: argmin_R ∥RP − Q∥_F, subject to RᵀR = I

  - Equivalent to the special case P = I and Q = X

• Implementation: project X onto SO(3) via SVD

Levinson, Jake, et al. "An analysis of svd for deep rotation estimation." NeurIPS 2020
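A sketch of the SVD projection, with the determinant correction handled as on Page 21:

```python
import numpy as np

def rotation_from_9d(X):
    """Project a raw 3x3 prediction X onto SO(3)."""
    U, _, Vt = np.linalg.svd(X)
    d = np.sign(np.linalg.det(U @ Vt))       # +1 or -1
    return U @ np.diag([1.0, 1.0, d]) @ Vt   # nearest rotation with det = +1
```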

Page 46

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 47

Loss for Direct Approaches

• Shape-agnostic: distance between (R, t) and (R_GT, t_GT)

• Shape-aware: distance between the same shape X ∈ ℝ^{3×n} transformed by (R, t) and by (R_GT, t_GT), respectively

Page 48

Shape-agnostic Loss

• Rotation

  - Geodesic distance (angle difference, relative rotation error) on SO(3): arccos[(tr(R_GT Rᵀ) − 1)/2]

  - Mean squared error over ℝ^{3×3}: ∥R − R_GT∥²_F

  - L_p distance on the representation x: ∥x − x_GT∥_p

• Translation

  - L_p distance: ∥t − t_GT∥_p
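These losses are one-liners in NumPy; the arccos clip below is a standard numerical guard, not from the slides:

```python
import numpy as np

def geodesic_distance(R, R_gt):
    """Angle of the relative rotation R_gt Rᵀ, in radians."""
    cos = (np.trace(R_gt @ R.T) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against round-off

def rotation_mse(R, R_gt):
    return np.linalg.norm(R - R_gt, ord="fro") ** 2

def translation_lp(t, t_gt, p=2):
    return np.linalg.norm((t - t_gt).ravel(), ord=p)
```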

Page 49

Challenges: Symmetry

• Symmetry introduces ambiguities in the GT labels

• Requires specially designed losses to handle

[Figure: examples of symmetric objects in the YCB dataset]

Page 50

Rotational Symmetry

[Figure: (a) 1 symmetry axis, symmetry order 2; (b) 1 symmetry axis, infinite symmetry order; (c) 3 symmetry axes, symmetry orders (4, 2, 2)]

Page 51

Shape-agnostic Loss for Symmetric Objects

• Symmetry group: multiple symmetry-equivalent GT rotations ℛ = {R_GT^1, R_GT^2, ⋯, R_GT^n} (n is the order of symmetry)

• Min-of-N loss (for finite order of symmetry): min_{R_GT^i ∈ ℛ} L(R_GT^i, R)

Q: How to deal with infinite order?
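For the finite-order case, the Min-of-N loss is a single reduction over ℛ; a sketch reusing geodesic_distance from the Page 48 sketch:

```python
def min_of_n_loss(R, gt_rotations, loss=geodesic_distance):
    """gt_rotations: iterable of symmetry-equivalent GT rotations."""
    return min(loss(R_gt, R) for R_gt in gt_rotations)
```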

Page 52

Shape-agnostic Loss for Symmetric Objects (Infinite Order)

Use the angle θ between the predicted and GT symmetry axes

Read by yourself

Page 53

Motivation of Shape-aware Loss

[Figure: target pose vs. predicted pose 1 vs. predicted pose 2 — similar angle difference, but very different perceptual error]

Page 54

Shape-aware Loss

• Given a shape X ∈ ℝ^{3×n}, the loss for rotation can be the distance between RX and R_GT X

• e.g., per-point mean squared error (MSE): L = ∥RX − R_GT X∥²_F

Page 55

Shape-aware Loss for Symmetric Objects

• With ∥RX − R_GT X∥²_F as the distance metric, we can also apply the Min-of-N loss:

  L = min_{R_GT^i ∈ ℛ} ∥RX − R_GT^i X∥²_F
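A sketch of this shape-aware Min-of-N loss (X is the (3, n) model point set; names are illustrative):

```python
import numpy as np

def shape_aware_min_of_n(R, gt_rotations, X):
    """gt_rotations: iterable of symmetry-equivalent GT rotations."""
    return min(np.linalg.norm(R @ X - R_gt @ X, ord="fro") ** 2
               for R_gt in gt_rotations)
```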

Page 56

Shape-aware Loss for Symmetric Objects

• Recall the distance metrics for point clouds in Lec 7
  - Chamfer distance
  - earth mover's distance

• These distance metrics are compatible with symmetry (even infinite order)

• But they require a shape and might get stuck in local minima

Xiang et al., “PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes”, RSS 2018
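For reference, a brute-force Chamfer distance between two point sets; real pipelines use KD-trees or GPU ops, and whether squared distances are used varies by convention:

```python
import numpy as np

def chamfer_distance(A, B):
    """A: (3, n), B: (3, m) point sets; symmetric Chamfer distance."""
    d2 = ((A.T[:, None, :] - B.T[None, :, :]) ** 2).sum(-1)  # (n, m) sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```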

Page 57

Example of Local Minima for Chamfer Distance

[Figure: CD = 2.02. Rotate the red shape left a bit?]

Page 58

[Figure: CD = 2.05 — worse]

Page 59

[Figure: back to CD = 2.02. Reset, and try rotating right:]

Page 60

[Figure: CD = 2.50 — worse again]

Page 61

[Figure: CD = 2.05, CD = 2.02, CD = 2.50 side by side — both directions increase the Chamfer distance, so the middle pose is a local minimum]

Note: no local minima for Min-of-N

Page 62

Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion

Page 63

Example: DenseFusion

Segment objects

Wang et al., “DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion”, CVPR 2019

Page 64

Example: DenseFusion

Get the image crop and the point cloud for an object

Page 65

Example: DenseFusion

Encode appearance (image) with CNN and geometry (point cloud) with PointNet

Page 66

Example: DenseFusion

Predict poses from features

Page 67

Example: DenseFusion

Compute the shape-aware loss:
  - non-symmetric objects: per-point MSE
  - symmetric objects: Chamfer distance

Min-of-N as an alternative?

Page 68

Indirect Approaches

Page 69

Indirect Approaches

• Input: cropped image / point cloud / depth map of a single object

• Output: corresponding pairs {(p_i, q_i)}
  - points in the canonical frame: {p_i}
  - points in the camera frame: {q_i}
  - estimate (R, t) by solving min_{R,t} Σ_{i=1}^n ∥Rp_i + t − q_i∥²

Page 70

Two Categories of Indirect Approaches

• If points in the canonical frame are known, predict their corresponding locations in the camera frame

• If points in the camera frame are known, predict their corresponding locations in the canonical frame


Page 71

Given Points in the Canonical Frame, Predict Corresponding Location in the Camera Frame

• Recall: three correspondences are enough

• Which points in the canonical frame should be given?

- Choice by PVN3D: keypoints in the canonical frame

He et al., “PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation”, CVPR 2020

Page 72

Keypoint Selection

• Option 1: bounding box vertices

• Option 2: farthest point sampling (FPS) over the CAD object model

Chen et al., “G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features”, CVPR 2020

Page 73

Example: PVN3D

Get point-wise features by fusing color and geometry features

Page 74

Example: PVN3D

For each keypoint:

• Voting: for each point in the camera frame, predict its offset to the keypoint (in the camera frame)

• Clustering: find one location from all the candidate votes
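A toy version of this voting-and-clustering step: each observed point casts a vote at its position plus its predicted offset. PVN3D clusters the votes (e.g., with MeanShift); a coordinate-wise median is used below as a cheap stand-in:

```python
import numpy as np

def vote_keypoint(points, offsets):
    """points, offsets: (n, 3). Each point votes with its predicted offset."""
    candidates = points + offsets         # votes in the camera frame
    return np.median(candidates, axis=0)  # robust aggregate of the votes
```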

Page 75

Given Points in the Camera Frame, Predict Corresponding Location in the Canonical Frame

• Which points in the camera frame should be given?
  - Choice by NOCS: every point in the camera frame

3D point in the camera frame (2D visible pixel with depth) → 3D point in the (normalized) canonical frame

Wang, He, et al., “Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation”, CVPR 2019

Page 76

Example: NOCS

[Figure: input image → output NOCS map, H × W × 3]

Note: the object is normalized to have a unit-length bounding-box diagonal in the canonical space, so the canonical space is called the “Normalized Object Coordinate Space” (NOCS)

Page 77

Example: NOCS for Symmetric Objects

• Given equivalent GT rotations ℛ = {R_GT^1, R_GT^2, ⋯, R_GT^n} (finite symmetry order n), we can generate n equivalent NOCS maps

• Similar to the shape-agnostic loss in direct approaches, we can use the Min-of-N loss

Page 78

Umeyama’s Algorithm

• However, the target points in the canonical space of NOCS are normalized, so we also need to predict a scale factor

• Similarity transformation estimation (rigid transformation + uniform scale factor)

• Closed-form solution
  - Umeyama’s algorithm: http://web.stanford.edu/class/cs273/refs/umeyama.pdf
  - Similar to the counterpart without scale

Read by yourself
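A sketch following Umeyama's closed form for the similarity case q ≈ s·Rp + t; setting s = 1 recovers the rigid solver from the Page 22 sketch (names are illustrative):

```python
import numpy as np

def estimate_similarity_transform(P, Q):
    """P, Q: (3, n) corresponding points; returns (s, R, t)."""
    n = P.shape[1]
    p_bar = P.mean(axis=1, keepdims=True)
    q_bar = Q.mean(axis=1, keepdims=True)
    Pc, Qc = P - p_bar, Q - q_bar
    U, S, Vt = np.linalg.svd(Qc @ Pc.T / n)  # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])               # determinant correction
    R = U @ D @ Vt
    var_p = (Pc ** 2).sum() / n              # source variance
    s = (S * np.diag(D)).sum() / var_p       # optimal uniform scale
    t = q_bar - s * (R @ p_bar)
    return s, R, t
```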

Page 79

Tips for Homework 2

• For learning-based approaches

  - Start with direct approaches

  - Crop the point cloud of each object from the GT depth map, given the GT segmentation mask

  - Train a neural network, e.g. PointNet, with a shape-agnostic loss

  - Improve the results by considering symmetry