WEAKLY- AND SEMI-SUPERVISED PANOPTIC SEGMENTATION
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
* Indicates equal contribution

INTRODUCTION
We present a weakly supervised model that jointly performs both semantic and instance segmentation – a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotations for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both "thing" and "stuff" classes, and thus explain all the pixels in the image.

QUANTITATIVE RESULTS

Table 1. Semantic and instance segmentation performance on Pascal VOC with varying levels of supervision. We obtain state-of-the-art results for both full and weak supervision.

VOC sup.  COCO sup. |  IoU  AP_vol  AP_0.5
Weak      Weak      | 75.7    55.5    59.5
Weak      Full      | 75.8    56.1    59.8
Full      Weak      | 77.5    58.9    62.7
Full      Full      | 79.0    59.5    63.1

Table 2. Semantic segmentation results on the Cityscapes validation set. Using more informative bounding-box cues for "thing" classes leads to a higher percentage of fully supervised performance than for "stuff" classes, which are trained with only image-level tags.

Method      IoU (weak)  IoU (full)     %
Ours (th.)        68.2        70.4  96.9
Ours (st.)        60.2        72.4  83.1
Ours (all)        63.6        71.6  88.8

Table 3. Instance-level segmentation results on Cityscapes. On the validation set, we report results for both "thing" (th.) and "stuff" (st.) classes. The online server, which evaluates the test set, only computes the AP for "thing" classes.

Method                            | Val. AP           | Val. AP_0.5       | Test AP
                                  |  th.   st.   all  |  th.   st.   all  |  th.
Ours (weak, ImageNet init.)       | 17.0  33.1  26.3  | 35.8  43.9  40.5  | 12.8
Ours (full, ImageNet init.)       | 24.3  42.6  34.9  | 39.6  52.9  47.3  | 18.8
Ours (full, PSPNet [8] init.) [1] | 28.6  52.6  42.5  | 42.5  62.1  53.8  | 23.4
Pixel Encoding [3]                |  9.9     -     -  |    -     -     -  |  8.9
RecAttend [4]                     |    -     -     -  |    -     -     -  |  9.5
InstanceCut [5]                   |    -     -     -  |    -     -     -  | 13.0
DWT [6]                           | 21.2     -     -  |    -     -     -  | 19.4
SGN [7]                           | 29.2     -     -  |    -     -     -  | 25.0
We compare to other fully supervised methods which produce non-overlapping instances.

Results are reported for:
- Semantic and instance segmentation on Pascal VOC (weak, semi, full)
- Semantic segmentation on Cityscapes (weak, full)
- Instance-level segmentation on Cityscapes (weak, full)

SEGMENTATION NETWORK STRUCTURE

Network structure (figure): the input image (W × H × 3) is passed to a "thing" detector and a fully convolutional network, which together form the category-level segmentation module. The box consistency and global terms derived from their outputs feed an instance CRF (the instance-level segmentation module), which produces the final W × H × (D + S) instance-level segmentation. The figure also marks which components are run forward and backward during training and which are run forward only.

We use the network architecture proposed in our previous fully supervised work [1], which produces non-overlapping instances. Each of the D detections (a variable number per image) defines a possible "thing" instance. We assume that there can only be a single instance of each "stuff" class in an image. Therefore, there can be (D + S) instances per image which we need to label, where S is the number of "stuff" classes.

The box consistency term encourages pixels inside a bounding box B_k (given by the detector for "things", or covering the whole image for "stuff") to associate with the k-th instance:

    ψ_Box(v_i = k) = { s_k,  i ∈ B_k
                     { 0,    otherwise

where s_k is the score of the k-th detection.

The global term handles poor detection localisation:

    ψ_Glob(v_i = k) = Q_i(l_k)

where Q_i(l_k) is the semantic segmentation probability of pixel i taking the class label l_k of the k-th instance.

We use the same CRF formulation as our earlier work [1] with densely connected pairwise terms [2]:

    E(v) = Σ_i −ln( w_1 ψ_Box(v_i) + w_2 ψ_Glob(v_i) ) + Σ_{i<j} ψ_Pair(v_i, v_j)

QUALITATIVE RESULTS

Qualitative comparison (figure): (a) Input image; (b) Weakly supervised model; (c) Fully supervised model.

[1] A. Arnab et al. Pixelwise instance segmentation with a dynamically instantiated network. In CVPR, 2017.
[2] P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.
[3] J. Uhrig et al. Pixel-level encoding and depth layering for instance-level semantic labeling. In GCPR, 2016.
[4] M. Ren and R. S. Zemel. End-to-end instance segmentation with recurrent attention. In CVPR, 2017.
[5] A. Kirillov et al. InstanceCut: from edges to instances with multicut. In CVPR, 2017.
[6] M. Bai and R. Urtasun. Deep watershed transform for instance segmentation. In CVPR, 2017.
[7] S. Liu et al.
SGN: Sequential grouping networks for instance segmentation. In ICCV, 2017.
[8] H. Zhao et al. Pyramid scene parsing network. In CVPR, 2017.

Project page: qizhuli.github.io/publication/weakly-supervised-panoptic-segmentation/
Code release: github.com/qizhuli/Weakly-Supervised-Panoptic-Segmentation

WEAKLY- AND SEMI-SUPERVISED TRAINING

"Stuff" branch: image-level tags → train a multi-label classifier → class activation maps (CAMs).
"Thing" branch: bounding boxes → run MCG and GrabCut → coarse foreground masks (FMs).
The outputs of the two branches are merged into pseudo ground truth, on which the segmentation network is trained. Iterative training: the network's predictions are merged back in to produce better pseudo ground truth, and this train-and-relabel cycle is repeated n times.

Figure 1. Approximate ground truth generated from image-level tags using weak localisation cues from a multi-label classification network. (1a) Input image; (1b) Localisation heatmaps; (1c) Approximate ground truth.

Figure 2. Approximate ground truth generated from bounding boxes using coarse object masks from MCG and GrabCut. (2a) Bounding boxes; (2b) Approximate semantic ground truth; (2c) Approximate instance ground truth.

Figure 3. (3a-3e) By using the output of the trained network, the initial approximate ground truth produced above (Iteration 0) can be iteratively refined. Black regions are "ignore" labels over which the loss is not computed during training. Note that for instance segmentation, permutations of instance labels of the same class are equivalent. (3f) The panoptic quality (PQ) of our panoptic segmentation results shows a significant improvement due to iterative training. Panels: (3a) Input image; (3b) Iteration 0; (3c) Iteration 2; (3d) Iteration 5; (3e) Ground truth; (3f) PQ vs. iteration. Colour legend: road, building, vegetation, sky.
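The merge of the two branches above can be sketched in code. This is a minimal illustration under stated assumptions, not the released implementation: the array shapes, the CAM threshold, the `IGNORE` value, and the function name are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of the pseudo-ground-truth merge described above:
# "stuff" cues come from thresholded class activation maps (CAMs), "thing"
# cues from per-detection foreground masks (e.g. MCG + GrabCut). Pixels
# claimed by no cue, or by conflicting "thing" cues, get an "ignore" label
# so no loss is computed there during training.

IGNORE = 255  # common convention for an ignore label

def merge_pseudo_gt(cams, stuff_ids, thing_masks, thing_ids, cam_thresh=0.2):
    """cams: (S, H, W) CAM scores in [0, 1] for S stuff classes.
    thing_masks: (D, H, W) boolean foreground masks for D detections.
    Returns an (H, W) semantic pseudo-ground-truth map."""
    S, H, W = cams.shape
    gt = np.full((H, W), IGNORE, dtype=np.uint8)

    # Stuff: assign each pixel to its highest-scoring CAM, if confident enough.
    best = cams.argmax(axis=0)
    conf = cams.max(axis=0)
    stuff_ids = np.asarray(stuff_ids)
    gt[conf > cam_thresh] = stuff_ids[best][conf > cam_thresh]

    # Things: detection masks override stuff labels; pixels covered by
    # overlapping detections of different classes become "ignore".
    claimed = np.zeros((H, W), dtype=bool)
    for mask, cls in zip(thing_masks, thing_ids):
        gt[mask & claimed & (gt != cls)] = IGNORE
        gt[mask & ~claimed] = cls
        claimed |= mask
    return gt
```

In the iterative-training loop, the network's own predictions would simply replace (or be intersected with) these first-round cues before re-running the merge.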
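The box consistency and global terms from the segmentation network structure can likewise be sketched as unary energies for the instance CRF. This is a hedged sketch of the general idea only: `w1`, `w2`, the numerical epsilon, and all names are assumptions, and "stuff" instances (whose "box" spans the whole image) are omitted for brevity.

```python
import numpy as np

# Illustrative (not the authors') computation of the two unary terms:
# psi_box gives score s_k to pixels inside detection box B_k, 0 outside;
# psi_glob gives each pixel the semantic probability of the detection's
# class, which helps recover instances when the box is poorly localised.

def instance_unaries(Q, detections, w1=1.0, w2=1.0):
    """Q: (H, W, C) semantic softmax output.
    detections: list of (class_id, score, (x1, y1, x2, y2)) "thing" boxes.
    Returns (H, W, K) unary energies for the K = len(detections) instances."""
    H, W, _ = Q.shape
    K = len(detections)
    box = np.zeros((H, W, K))
    glob = np.zeros((H, W, K))
    for k, (cls, score, (x1, y1, x2, y2)) in enumerate(detections):
        box[y1:y2, x1:x2, k] = score      # box consistency term
        glob[:, :, k] = Q[:, :, cls]      # global term
    # Unary energy: -ln(w1 * psi_box + w2 * psi_glob); the epsilon
    # guards against log(0) where both terms vanish.
    return -np.log(w1 * box + w2 * glob + 1e-12)
```

Pixels inside a confident detection box get a lower energy (stronger pull) towards that instance than pixels outside it; the pairwise Gaussian terms [2] would then sharpen the labelling to image edges.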