Improving Weakly-Supervised Object Localization By Micro-Annotation

Alexander Kolesnikov
akolesnikov@ist.ac.at

Christoph H. Lampert

IST Austria
Am Campus 1
3400 Klosterneuburg
Austria

[Figure: overview of the approach, panels (i) to (iv). Labels: CNN; mid-level patterns; localization score map; "for every image in the training set"; similarity matrix; Cluster I; Cluster II; object; distractor; localization score map with suppressed distractors.]

Object localization is a crucial step needed for building automatic systems for visual scene understanding. This task can be successfully tackled using fully-supervised learning methods, but these require annotations in the form of bounding boxes or per-pixel segmentation masks that are time-consuming and expensive to acquire. Therefore, it is important to develop weakly-supervised object localization learning techniques, which require much cheaper forms of annotation, e.g. image-level class labels.

Analyzing the current methods for weakly-supervised object localization, we arrive at the conclusion that they tend to fail for object classes that consistently co-occur with the same background elements (distractors), e.g. trains on tracks. We overcome these failures by developing a new procedure that determines the semantic parts that constitute the object detection and then discards distractor parts. The main steps of our approach are (see Figure above): (i) represent all predicted foreground regions of all images by mid-level features learned by a deep neural network, (ii) cluster these features using spectral clustering (the number of clusters is determined automatically), (iii) visualize the clusters and let a human annotator select which ones actually correspond to the object class of interest. The information about clusters and their annotation can then be used to better localize objects: (iv) for any (new) image, predict a foreground map using only the image regions that match clusters labeled as 'object'.
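Step (iv) can be illustrated with a minimal sketch. It assumes that clustering (step ii) has already produced one centroid per cluster and that the annotator (step iii) has marked each cluster as object or distractor. Matching a region to a cluster by cosine similarity to the nearest centroid is an assumption made here for illustration; the text only states that regions must "match" object clusters. The names `suppress_distractors`, `centroids`, and `is_object` are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def suppress_distractors(score_map, features, centroids, is_object):
    """Zero out scores at locations whose best-matching cluster is a distractor.

    score_map: H x W nested list of foreground scores.
    features:  H x W nested list of mid-level feature vectors.
    centroids: list of cluster centroids (assumed output of step ii).
    is_object: list of booleans from the annotator (step iii), one per cluster.
    """
    cleaned = []
    for score_row, feat_row in zip(score_map, features):
        out_row = []
        for score, feat in zip(score_row, feat_row):
            # Assumption: a region "matches" the cluster with the most similar centroid.
            best = max(range(len(centroids)),
                       key=lambda k: cosine(feat, centroids[k]))
            out_row.append(score if is_object[best] else 0.0)
        cleaned.append(out_row)
    return cleaned

# Toy data: cluster 0 plays the role of "train" (object),
# cluster 1 the role of "tracks" (distractor).
centroids = [[1.0, 0.0], [0.0, 1.0]]
is_object = [True, False]
score_map = [[0.9, 0.8],
             [0.7, 0.1]]
features = [[[0.9, 0.1], [0.8, 0.3]],
            [[0.1, 0.9], [0.2, 0.7]]]

cleaned = suppress_distractors(score_map, features, centroids, is_object)
print(cleaned)
```

In the toy data the bottom row has a high score (0.7) but track-like features, so it is suppressed, while the top row is kept unchanged.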

Note that the proposed method requires a virtually negligible amount of additional supervision: an annotator has to answer a few binary questions (typically 2 or 3) per semantic class. Huge datasets, such as ILSVRC, can be annotated by one annotator in just a few hours.
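The "few hours" estimate can be checked with back-of-the-envelope arithmetic. The sketch below assumes the 1000-class ILSVRC classification/localization label set and a hypothetical 5 seconds per binary question; neither number is stated in the text, and only the 2 to 3 questions per class come from it.

```python
def annotation_hours(num_classes, questions_per_class, seconds_per_question):
    """Total annotation time in hours for binary cluster questions."""
    return num_classes * questions_per_class * seconds_per_question / 3600.0

# Assumptions (not from the text): 1000 classes, 5 seconds per question.
low = annotation_hours(1000, 2, 5)
high = annotation_hours(1000, 3, 5)
print(f"{low:.1f} to {high:.1f} hours")
```

Under these assumptions the total stays within a single working session for one annotator; a slower pace per question scales the estimate linearly.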

The proposed approach can be readily used in combination with many existing localization methods. In this work we combine it with the current state-of-the-art methods for weakly-supervised bounding box prediction [2] and for weakly-supervised semantic segmentation [1], showing improved results on the challenging ILSVRC 2014 and PASCAL VOC 2012 datasets.

[1] A. Kolesnikov and C. H. Lampert. Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In ECCV, 2016.

[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
