Department of Electrical & Computer Engineering Visual Recognition with Humans in the Loop Authors: Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, and Serge Belongie Presented by: Yan Fang

Aug 09, 2020

Visual Recognition with Humans in the Loop

Authors: Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, and Serge Belongie

Presented by: Yan Fang

Overview

• Problem Introduction
- Challenge
- Goal
- Related Work

• Approach
- Method Overview
- Incorporating Computer Vision
- User Response

• Experiments & Results
- Datasets & Configuration
- Performance Evaluation
- Results

• Conclusion & Discussion

Problem Introduction

• Multi-class Object Recognition

• Challenge: computer vision performs poorly on fine-grained categories

Inter-category distinctions: easy for both computer and human

Fine-grained categories: hard for both computer and human

Why do we care?

• CV algorithms perform poorly even on basic-level categories, which is not acceptable

• Most datasets contain only a small number of object categories

• An important problem to study: helping people recognize types of objects they don't yet know how to identify

Why is it hard?

Difficulties for humans in fine-grained category classification:

• Recognizing the sub-class: hard

• Recognizing visual attributes: easy

Why is it hard?

Comparing human with computer:

                               Human     Computer
Memory, expertise, knowledge   Limited   Good
Basic visual capabilities      Good      Limited

Combine them together

Blue Belly

Finch?

Bunting?

Hard for computer and human Easy for human Easy for computer

Goal

• Build a human-computer framework for multi-class object recognition

• Make it easy to plug in any object recognition algorithm

• Use human assistance to improve performance

• Minimize the human effort needed for the recognition task

• Be good enough for real-life applications

Related Work

• Recognition of tightly related categories
- Datasets: Oxford Flowers 102, UIUC Birds, and STONEFLY
  Shortcomings: scaling, object domain, performance
- Similar work: Botanist's Field Guide
  Differences: intended user (expert vs. layperson), processing of the image

• Areas that combine vision and learning with human input
- Relevance feedback, active learning, expert systems
- Similar to, but different from, this work

• Scaling to a large number of categories
- Class taxonomies, feature sharing, error-correcting output codes (ECOC), attribute-based classification methods
- Can be plugged into this work

Approach

Method Overview

Goal: given an image, classify the bird's category

• Pose questions about visual properties that are easy for a human to answer

• Select each question intelligently, exploiting the visual content step by step

• Make the decision based on the refined probability distribution

Method Overview

Example: a visual 20-questions game for humans

Distinguishing among C classes needs O(log C) questions; with computer vision it can take fewer

http://20q.net
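
To make the O(log C) estimate concrete, here is a minimal sketch. It assumes idealized yes/no questions that are always answered correctly and always split the remaining candidates in half; the function name `min_binary_questions` is made up for this illustration. The value 200 matches the number of species in the Bird200 dataset used later in the talk.

```python
import math


def min_binary_questions(num_classes: int) -> int:
    """Fewest yes/no questions that can single out one of num_classes classes.

    Idealized assumption: every answer is correct and halves the candidates.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return math.ceil(math.log2(num_classes))


# 200 classes need at least 8 perfect binary questions (2**7 = 128 < 200 <= 256 = 2**8).
print(min_binary_questions(200))
```

Real users answer noisily and real attributes rarely split the classes evenly, so this is only a lower bound on the number of questions.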

Algorithm Details

Some terms:

• A set of possible questions (e.g. IsRed?, HasStripes?, BellyColor?)

• Each answer comes with a confidence value

// Initialize the question set

// Ask questions iteratively

//   Pick the next question by information gain

//   Pose the question to the user

// Make the final decision
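
The steps above can be sketched as a small loop. This is an illustrative skeleton, not the authors' implementation: the names `visual_20q`, `pick`, `ask`, and `update`, the toy attribute table, and the 0.9/0.1 agreement weights are all assumptions made for the example. The `pick` used in the demo is a placeholder that takes the first remaining question; an information-gain criterion would go there.

```python
from typing import Callable, Dict, List

Posterior = Dict[str, float]


def visual_20q(
    prior: Posterior,
    questions: List[str],
    pick: Callable[[Posterior, List[str]], str],
    ask: Callable[[str], str],
    update: Callable[[Posterior, str, str], Posterior],
    max_questions: int = 20,
) -> str:
    """Ask up to max_questions questions, then return the most probable class."""
    posterior = dict(prior)          # start from the prior over classes
    remaining = list(questions)      # questions not asked yet
    for _ in range(max_questions):
        if not remaining:
            break
        question = pick(posterior, remaining)              # choose next question
        remaining.remove(question)
        answer = ask(question)                             # pose it to the user
        posterior = update(posterior, question, answer)    # refine the distribution
    return max(posterior, key=posterior.get)               # make the decision


# Hypothetical toy data: three classes, two attribute questions.
TABLE = {
    "cardinal": {"IsRed?": "yes", "HasStripes?": "no"},
    "blue jay": {"IsRed?": "no", "HasStripes?": "yes"},
    "bluebird": {"IsRed?": "no", "HasStripes?": "no"},
}


def soft_update(posterior: Posterior, question: str, answer: str) -> Posterior:
    # Assumed noise model: classes that agree with the answer get weight 0.9, others 0.1.
    scores = {
        c: p * (0.9 if TABLE[c][question] == answer else 0.1)
        for c, p in posterior.items()
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


truth = "blue jay"
result = visual_20q(
    prior={c: 1 / 3 for c in TABLE},
    questions=["IsRed?", "HasStripes?"],
    pick=lambda post, rem: rem[0],      # placeholder selection rule
    ask=lambda q: TABLE[truth][q],      # simulated user who answers correctly
    update=soft_update,
)
print(result)
```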

More notations

At each time step t, a question is selected using:

• The response history: the set of responses collected before step t

• The index of the selected question in the question set

• The current probability distribution over classes

• The information gain that would be obtained by asking another question
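
One way to write these quantities down, assuming the notation of the Branson et al. paper that the talk presents. The symbols below are assumptions for illustration and may differ from the ones on the slide.

```latex
\begin{align*}
x &: \text{the input image} \\
c &\in \{1,\dots,C\} : \text{the class label} \\
q_{j(t)} &: \text{the question posed at time step } t, \quad j(t) \in \{1,\dots,n\} \\
U^{t-1} &= \{u_{j(1)},\dots,u_{j(t-1)}\} : \text{the responses collected before step } t \\
p(c \mid x, U^{t-1}) &: \text{the current class distribution} \\
I(c;\, u_i \mid x, U^{t-1}) &: \text{the expected information gain of asking question } i
\end{align*}
```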

Select Next Question

Maximize information gain, as in decision tree algorithms

Kullback–Leibler divergence: a measure of the difference between two distributions

Entropy of the class distribution

One term depends on the CV algorithm

One term depends on the user response

Cross-entropy?
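
A minimal sketch of the selection criterion, using the standard definition of expected information gain: the current entropy of the class distribution minus the expected entropy after the answer, which equals the expected KL divergence between the updated and current distributions. The function names and the toy likelihood tables are assumptions for illustration, not taken from the slides.

```python
import math
from typing import Dict

Dist = Dict[str, float]


def entropy(dist: Dist) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(posterior: Dist, likelihood: Dict[str, Dist]) -> float:
    """Expected reduction in class entropy from asking one question.

    posterior:  class -> current probability of that class
    likelihood: class -> {answer: probability of that answer given the class}
    """
    answers = {a for table in likelihood.values() for a in table}
    gain = entropy(posterior)
    for answer in answers:
        joint = {c: posterior[c] * likelihood[c].get(answer, 0.0) for c in posterior}
        p_answer = sum(joint.values())
        if p_answer == 0:
            continue  # this answer can never occur, so it contributes nothing
        updated = {c: v / p_answer for c, v in joint.items()}
        gain -= p_answer * entropy(updated)
    return gain


# Hypothetical example: four equally likely classes.
classes = ["a", "b", "c", "d"]
current = {c: 0.25 for c in classes}

# A question that splits the classes in half, and one every class answers the same way.
splitting = {c: ({"yes": 1.0, "no": 0.0} if c in ("a", "b") else {"yes": 0.0, "no": 1.0})
             for c in classes}
useless = {c: {"yes": 1.0, "no": 0.0} for c in classes}

gain_splitting = expected_information_gain(current, splitting)
gain_useless = expected_information_gain(current, useless)
print(gain_splitting, gain_useless)
```

The splitting question removes one bit of uncertainty and the uninformative one removes none, so a maximum-information-gain rule asks the splitting question first.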

Incorporate Computer Vision

• Any recognition algorithm can be plugged in, e.g. a classifier such as an SVM that uses attributes or features

• The purpose of computer vision is to evaluate $p(c \mid x)$, the probability of each class given the image

• This conditional prior helps update the current class distribution and determine which question to ask

• Using a CV algorithm is optional: this term can be obtained from any probability distribution, or simply replaced with the class prior

Incorporate Computer Vision

• A simple framework using Bayes' rule:

• Assume the user response is class-dependent, not image-dependent
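
Under that assumption, the update can be written as follows. This is a sketch in assumed notation ($x$ the image, $c$ the class, $U$ the set of user responses, $Z$ a normalizing constant); the slide's own symbols may differ.

```latex
\begin{align*}
p(c \mid x, U) &= \frac{p(U \mid c, x)\, p(c \mid x)}{Z}
               = \frac{p(U \mid c)\, p(c \mid x)}{Z},
\qquad
Z = \sum_{c'} p(U \mid c')\, p(c' \mid x)
\end{align*}
```

The second equality is where the class-dependent, image-independent assumption is used: once the class is known, the image adds nothing to the prediction of the user's response.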

Modeling User Response

• Assume questions are answered independently given the category (this works well experimentally)
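
With that independence assumption, the likelihood of a set of responses is a product of per-question terms, so the class posterior can be updated one response at a time. A minimal sketch follows; the function name `class_posterior`, the class names, and all probabilities are hypothetical values chosen for illustration.

```python
from typing import Dict

Dist = Dict[str, float]


def class_posterior(
    vision_prior: Dist,
    response_model: Dict[str, Dict[str, Dist]],
    responses: Dict[str, str],
) -> Dist:
    """Combine a computer-vision class estimate with user responses.

    vision_prior:   class -> probability of the class given the image
    response_model: class -> question -> {answer: probability given the class}
    responses:      question -> answer given by the user
    """
    scores = {}
    for cls, p_cls in vision_prior.items():
        score = p_cls
        for question, answer in responses.items():
            # Independence given the class: multiply per-question likelihoods.
            score *= response_model[cls][question].get(answer, 0.0)
        scores[cls] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("responses have zero probability under every class")
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical numbers: vision slightly prefers "finch", the user reports a blue belly.
vision_prior = {"finch": 0.6, "bunting": 0.4}
response_model = {
    "finch": {"BellyColor?": {"blue": 0.1, "yellow": 0.9}},
    "bunting": {"BellyColor?": {"blue": 0.8, "yellow": 0.2}},
}
posterior = class_posterior(vision_prior, response_model, {"BellyColor?": "blue"})
print(posterior)
```

In this toy case the user's answer overrides the vision estimate, which mirrors the situation where human input corrects a wrong computer vision prediction.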

Modeling User Response

• Dependencies of terms:

Modeling User Response

• Still need

• Assume

Weighted Dirichlet prior

Global attribute prior

Pooling together certainty labels
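
One common way to apply a weighted Dirichlet prior is to add pseudo-counts drawn from a global prior to the per-class answer counts. The sketch below shows that generic smoothing scheme; it is an assumption about how such a prior could be used, not the exact estimator from the talk, and the name `smoothed_answer_probs` and the counts are made up.

```python
from typing import Dict


def smoothed_answer_probs(
    counts: Dict[str, int],
    global_prior: Dict[str, float],
    prior_weight: float = 1.0,
) -> Dict[str, float]:
    """Estimate answer probabilities for one class and one question.

    counts:       answer -> number of times users gave it for this class
    global_prior: answer -> probability over all classes (must sum to 1)
    prior_weight: how many pseudo-observations the prior is worth
    """
    total = sum(counts.values()) + prior_weight
    return {
        answer: (counts.get(answer, 0) + prior_weight * p) / total
        for answer, p in global_prior.items()
    }


# Hypothetical data: four responses for one class, a flat global prior worth two observations.
estimate = smoothed_answer_probs(
    counts={"blue": 3, "yellow": 1},
    global_prior={"blue": 0.5, "yellow": 0.5},
    prior_weight=2.0,
)
print(estimate)

# With no observations at all, the estimate falls back to the global prior.
fallback = smoothed_answer_probs({}, {"blue": 0.5, "yellow": 0.5}, prior_weight=2.0)
print(fallback)
```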

Modeling User Response

Example of user response

Experiments & Results

Dataset & Configuration

Bird200 Dataset

• 6033 images, 200 species

• Difficult for a layperson

Questions:

• 25 questions, 288 binary attributes

• Deterministic attributes from whatbird.com

Answer Collection

Mechanical Turk Interface:

• Answers come from non-experts

• Prototypical images with supplementary material

• A randomly selected answer is used

Evaluation

Method Configuration:

• No Computer Vision

• Classifier based on SIFT, VL Features from Andrea Vedaldi

• Classifier based on attributes

Evaluation:

• Ask T questions, then measure classification accuracy

• Show images of the highest-probability class after each question; the user stops the process once these images verify the class

Results & Performance

• No Computer Vision

• Contribution of modeling user response

• Non-expert users are not ideal

Results & Performance

• Number of questions vs. accuracy

• CV algorithms do improve performance when fewer questions are asked

Results & Performance

• User stop tests

• CV algorithms reduce the labor of easy tasks

Results & Performance

• Similar performance on Animals with Attributes

• The attribute-based classifier works better than 1-vs-all

Computer Vision Help Case

• Computer vision helps select the right question, which leads to correct recognition

Human Response Help Case

• User responses help correct a wrong computer vision prediction

Failed Case

• A cropped image leads to wrong responses to certain questions (e.g. attributes of the belly)

• Two species are naturally similar, and the questions fail to capture the distinguishing attributes

Conclusion

• Pros
- A framework that combines computer vision and human recognition
- Compatible with any CV algorithm
- Human input improves accuracy on hard recognition tasks
- The computer reduces human labor on easy tasks
- Practical for real applications that help non-expert humans

• Cons
- Cropped images can lead to wrong answers to questions
- Might not work on very similar species
- Attribute selection is very complicated and depends on expert knowledge

Future Work

• Trend: reduce or exclude human effort in the framework

• Improve CV performance on hard problems

• Develop better question design and selection mechanisms

Discussion & Questions

What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations

Sudheendra Vijayanarasimhan and Kristen Grauman

Overview

• Problem

• Method Overview

• Experiments & Results

• Conclusion & Discussion

Problem Introduction

• Annotation of training data is very important to visual recognition

• Manual effort is required, and images are not equally informative

• Active learning does not fit visual category learning well:
- Images that contain multiple objects need multiple labels
- There are more annotation types: regions, segments
- Each annotation costs a different amount of effort, depending on its type and the image

Proposed Method

• A new active learning framework that weighs informativeness against annotation effort

• A multiple-instance, multi-label learning (MIML) formulation helps select the most promising annotation

• Capable of choosing both the image and the type of annotation

• Learns from humans to predict the effort cost of different images

Active Learning

Method Overview

Method Overview

Step 1. Learn object categories from multi-label images, with a mixture of weak and strong labels.

MIML: multiple-instance multi-label learning

MIML Scenario

Unlabeled images are oversegmented into regions

Multiple bags of regions

Different levels of annotation provide different informativeness

Method Overview

Step 2. Active multi-level selection of multi-label annotations

• Surveys unlabeled and partially labeled images

• Predicts the tradeoff between each annotation's informativeness and the manual effort it requires

• Selects the most promising annotation and updates the classifier
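
The selection step can be sketched as picking the candidate annotation whose estimated benefit most exceeds its predicted cost. This is a simplified illustration: the `Candidate` fields, the subtraction-based score, and the numbers are assumptions, and it presumes informativeness and cost have already been put on a comparable scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    image_id: str
    annotation_type: str      # e.g. image-level tag, region label, full segmentation
    informativeness: float    # estimated benefit to the classifier (assumed given)
    predicted_cost: float     # predicted manual effort, on the same scale (assumed)


def select_annotation(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the best informativeness-minus-cost score."""
    if not candidates:
        raise ValueError("no candidate annotations to choose from")
    return max(candidates, key=lambda c: c.informativeness - c.predicted_cost)


# Hypothetical pool mixing images and annotation types.
pool = [
    Candidate("img1", "image-level tag", informativeness=0.3, predicted_cost=0.1),
    Candidate("img1", "full segmentation", informativeness=0.9, predicted_cost=0.8),
    Candidate("img2", "region label", informativeness=0.5, predicted_cost=0.2),
]
best = select_annotation(pool)
print(best.image_id, best.annotation_type)
```

Note that the most informative request (the full segmentation) is not selected here because its predicted effort cancels most of its benefit.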

Experiments

• The MSRCv2 dataset: 591 images and 21 classes

• Evaluate three aspects:
- Accuracy of learning from multi-label examples
- Accuracy of annotation cost prediction
- Effectiveness at reducing manual effort

• RBF kernel for the SVM, parameters set by cross-validation, void regions ignored

Results

• Segment each image and obtain texton and color histograms for each blob

• Each image is a bag; each segment is an instance

• Image-level labels

• Accuracy on new images and new regions

Results

• Gather data with Amazon’s Mechanical Turk

• Classifiers on “Easy” and “Hard”

• Regressors predict actual time cost

Results

• Comparison of different selection strategies

• Accuracy: average value of the diagonal of the confusion matrix

• Region-level accuracy

• 80 random images added into unlabeled pool
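
The accuracy measure mentioned above can be sketched as follows, assuming the confusion matrix has true classes as rows and is normalized per row before the diagonal is averaged (that normalization is an assumption; the slide only says "average value of the diagonal"). The function name and the example matrix are made up.

```python
from typing import List


def mean_diagonal_accuracy(confusion: List[List[int]]) -> float:
    """Average of the per-class accuracies on the diagonal.

    confusion[i][j] = number of items of true class i predicted as class j.
    Classes with no items are skipped.
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total > 0:
            per_class.append(row[i] / total)
    if not per_class:
        raise ValueError("confusion matrix has no items")
    return sum(per_class) / len(per_class)


# Hypothetical two-class example with imbalanced classes.
confusion = [
    [8, 2],   # class 0: 8 of 10 correct
    [1, 1],   # class 1: 1 of 2 correct
]
score = mean_diagonal_accuracy(confusion)
print(score)
```

Averaging per class weights rare classes as much as frequent ones, so this score (0.65 here) can differ from the plain fraction of correct items (9 of 12).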

Results

• Comparison with and without the cost prediction function

• Works on Tree and Airplane, but not on Sky

Results

• Numbers for the evaluation of active selection

• Active selection takes less effort to achieve the same level of accuracy

Contribution

• An active learning framework that chooses annotation examples based on the balance between manual effort and informativeness

• Handles annotation types at different levels

• Active learning greatly reduces manual effort

• Effectively predicts the cost of annotation

• The multi-level, multi-label strategy outperforms traditional active learning methods

Discussion & Questions