COLLABORATIVE TEAM for TRECVID 2009
High-Level Feature Extraction Using SIFT GMMs, Audio Models, and MFoM
Ilseo Kim, Chin-Hui Lee, Department of Computer Science, Georgia Institute of Technology
Nakamasa Inoue, Shanshan Hao, Tatsuhiko Saito, Koichi Shinoda, Department of Computer Science, Tokyo Institute of Technology


Outline
1. SIFT Gaussian mixture models (GMMs) and audio models
2. Text representation of images
3. Multi-Class Maximal Figure-of-Merit (MC MFoM) classifier to combine 1 and 2

Best result: Mean InfAP = 0.168

1. SIFT GMMs and Audio Models

SIFT Feature Extraction
• Extract SIFT features from all the image frames with Harris-Affine / Hessian-Affine regions.
• Apply PCA to reduce the dimension [128 dim -> 32 dim].

[Figure: for each shot, SIFT features from Harris-Affine regions and from Hessian-Affine regions are each passed through PCA.]
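The PCA step can be sketched as follows. This is a minimal illustration, assuming NumPy is available and using random vectors in place of real SIFT descriptors; the function names `fit_pca` and `project` are hypothetical and not taken from the slides.

```python
import numpy as np

def fit_pca(descriptors, n_components=32):
    """Return (mean, basis); basis has shape (dim, n_components)."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / (len(descriptors) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def project(descriptors, mean, basis):
    """Project descriptors onto the principal components."""
    return (descriptors - mean) @ basis

rng = np.random.default_rng(0)
sift = rng.random((1000, 128))   # placeholder for real 128-dim SIFT descriptors
mean, basis = fit_pca(sift)
reduced = project(sift, mean, basis)
print(reduced.shape)
```

In practice the basis would be estimated once on training descriptors and reused for every shot.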

SIFT Gaussian Mixture Models
• Model SIFT features by a Gaussian Mixture Model (GMM).
  - Robustness is expected against the quantization errors that occur in hard-assignment clustering in the bag-of-words (BoW) approach.
• Probability density function (pdf) of the SIFT GMM:

  $p(x \mid \theta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$

  $K$: number of mixtures (512)
  $w_k$: mixing coefficient
  $\mathcal{N}$: pdf of Gaussian
  $\mu_k$: mean vector
  $\Sigma_k$: variance matrix
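A small sketch of evaluating a GMM pdf in the log domain. It assumes diagonal variance matrices, which the slides do not state, and uses a toy 2-component model instead of 512 mixtures; all names are illustrative.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumption)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_pdf(x, weights, means, variances):
    """log p(x) = log sum_k w_k N(x | mu_k, Sigma_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy model: 2 components in 2 dimensions.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
print(gmm_log_pdf([0.0, 0.0], weights, means, variances))
```

Working in the log domain avoids underflow when the feature dimension or the number of mixtures is large.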

SIFT Gaussian Mixture Models
• Maximum A Posteriori (MAP) adaptation

[Figure: a SIFT GMM UBM (Universal Background Model) is estimated from all videos; MAP adaptation with the SIFT features of a shot gives the SIFT GMM for the shot.]
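One common form of MAP adaptation updates only the mean vectors of the UBM with a relevance factor. The slides do not say which parameters are adapted or how, so the sketch below is an assumption: mean-only adaptation, diagonal variances, and a hypothetical relevance factor `tau`.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumption)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def map_adapt_means(features, weights, means, variances, tau=20.0):
    """Mean-only MAP adaptation of a UBM to the features of one shot."""
    n_mix, dim = len(means), len(means[0])
    counts = [0.0] * n_mix
    sums = [[0.0] * dim for _ in range(n_mix)]
    for x in features:
        logs = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)
        norm = sum(math.exp(v - top) for v in logs)
        for k in range(n_mix):
            resp = math.exp(logs[k] - top) / norm   # responsibility of mixture k
            counts[k] += resp
            for d in range(dim):
                sums[k][d] += resp * x[d]
    # Interpolate between the data mean and the UBM mean for each mixture.
    return [
        [(sums[k][d] + tau * means[k][d]) / (counts[k] + tau) for d in range(dim)]
        for k in range(n_mix)
    ]

ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
shot_features = [[0.5, 0.2], [0.3, -0.1], [4.2, 3.9]]
print(map_adapt_means(shot_features, ubm_weights, ubm_means, ubm_vars))
```

With few features in a shot, the adapted means stay close to the UBM means; with many features they move toward the shot statistics.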

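Classification
• Distance between SIFT GMMs: weighted sum of Mahalanobis distances

  $d(\theta_s, \theta_t) = \sum_{k=1}^{K} w_k^{U} \, (\mu_k^{s} - \mu_k^{t})^{\top} (\Sigma_k^{U})^{-1} (\mu_k^{s} - \mu_k^{t})$

  $U$: UBM, $s, t$: s-th and t-th shots
• SVM classification with probability outputs
  - Kernel function: computed from the distance between SIFT GMMs.
  - Finally, we obtain the posterior probability.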
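A sketch of the distance and of one possible kernel built on it. The kernel formula is not recoverable from the slides, so the RBF form `exp(-gamma * d)` and the parameter `gamma` are assumptions, as are the diagonal variances; the weights and variances are taken from the UBM.

```python
import math

def gmm_distance(means_s, means_t, ubm_weights, ubm_variances):
    """Weighted sum of Mahalanobis distances between per-mixture mean vectors."""
    total = 0.0
    for mu_s, mu_t, w, var in zip(means_s, means_t, ubm_weights, ubm_variances):
        total += w * sum((a - b) ** 2 / v for a, b, v in zip(mu_s, mu_t, var))
    return total

def rbf_kernel(distance, gamma=1.0):
    """Assumed kernel form: exp(-gamma * distance)."""
    return math.exp(-gamma * distance)

# Two shots, each described by the adapted means of a 2-mixture GMM.
ubm_weights = [0.5, 0.5]
ubm_variances = [[1.0, 4.0], [1.0, 1.0]]
shot_s = [[0.0, 0.0], [3.0, 3.0]]
shot_t = [[1.0, 2.0], [3.0, 3.0]]
d = gmm_distance(shot_s, shot_t, ubm_weights, ubm_variances)
print(d, rbf_kernel(d))
```

The resulting Gram matrix over all shot pairs could then be passed to an SVM implementation that accepts a precomputed kernel and produces probability outputs.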

Audio Models
• Features: Mel-Frequency Cepstral Coefficients (MFCCs)
• Models: Hidden Markov Models (HMMs)

[Figure: MFCC extraction, with blocks labeled FFT, spectrum, filter bank, Log, DCT, and MFCCs.]

Feature extraction process
1. Frame extraction
2. Windowing [Hamming window]
3. Fast Fourier transform (FFT)
4. Mel scale filter bank
5. Logarithmic transform
6. Discrete cosine transform (DCT)
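The six steps above can be sketched with NumPy as follows. Frame length, hop size, FFT size, and the numbers of filters and coefficients are illustrative assumptions; the slides do not give them.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=24, n_ceps=13):
    # 1. Frame extraction
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    # 2. Windowing with a Hamming window
    frames = frames * np.hamming(frame_len)
    # 3. FFT, then power spectrum
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 4. Mel scale filter bank
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 5. Logarithmic transform (small offset avoids log(0))
    log_energies = np.log(energies + 1e-10)
    # 6. DCT (type II, unnormalized) keeps the first n_ceps coefficients
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T

t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 440.0 * t))   # one second of a 440 Hz tone
print(features.shape)
```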

Hidden Markov Models
• Ergodic HMMs (2 states, GMMs with 512 mixtures)
• Log of likelihood ratio

[Figure: an HMM UBM is trained on all videos; the HMM for the target HLF is trained on videos of the target HLF.]

[Figure: for each shot, one likelihood is computed with the UBM and one with the target HMM; together they give the log of likelihood ratio.]
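A sketch of the log of likelihood ratio for one shot, using the forward algorithm in the log domain. The per-frame emission log-probabilities are taken as given (in the slides they would come from 512-mixture GMMs); the toy numbers and function names are assumptions.

```python
import math

def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def hmm_log_likelihood(log_emissions, log_init, log_trans):
    """Forward algorithm; log_emissions[t][j] = log p(o_t | state j)."""
    n_states = len(log_init)
    alpha = [log_init[j] + log_emissions[0][j] for j in range(n_states)]
    for t in range(1, len(log_emissions)):
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emissions[t][j]
            for j in range(n_states)
        ]
    return logsumexp(alpha)

# Ergodic 2-state HMM: every state can follow every state.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.5), math.log(0.5)], [math.log(0.5), math.log(0.5)]]

# Toy emission log-probabilities for 3 frames under each model.
emis_target = [[math.log(0.2), math.log(0.2)] for _ in range(3)]
emis_ubm = [[math.log(0.1), math.log(0.1)] for _ in range(3)]

llr = (hmm_log_likelihood(emis_target, log_init, log_trans)
       - hmm_log_likelihood(emis_ubm, log_init, log_trans))
print(llr)
```

A positive value means the target HLF model explains the audio of the shot better than the UBM.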

Combination of SIFT GMMs and Audio Models
• Outputs from
  - audio models: log of likelihood ratio
  - SIFT GMMs with Harris-Affine regions: posterior probability
  - SIFT GMMs with Hessian-Affine regions: posterior probability
• Log of likelihood ratio and posterior probability: the two are related up to a constant term.
• Combined log of likelihood ratio
  - Optimize weight parameters by 2-fold cross validation.
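A sketch of one way to combine the three outputs. The exact formulas are not recoverable from the slides, so the following are assumptions: posteriors are mapped to log-odds (which equal the log of likelihood ratio up to the constant log prior ratio), the combination is a weighted sum with weights summing to 1, and plain average precision stands in for InfAP when selecting the weights on a held-out fold.

```python
import math

def logit(p, eps=1e-6):
    """Log-odds of a posterior probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def combined_llr(audio_llr, p_harris, p_hessian, weights):
    w_audio, w_harris, w_hessian = weights
    return w_audio * audio_llr + w_harris * logit(p_harris) + w_hessian * logit(p_hessian)

def average_precision(scores, labels):
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def best_weights(shots, labels, step=0.25):
    """Grid search over weights summing to 1; shots are (audio_llr, p_harris, p_hessian)."""
    grid = [i * step for i in range(int(round(1.0 / step)) + 1)]
    best_ap, best_w = -1.0, None
    for wa in grid:
        for wh in grid:
            if wa + wh > 1.0 + 1e-9:
                continue
            w = (wa, wh, 1.0 - wa - wh)
            ap = average_precision([combined_llr(*shot, w) for shot in shots], labels)
            if ap > best_ap:
                best_ap, best_w = ap, w
    return best_ap, best_w

# One fold is used to pick the weights; the other fold would be used to evaluate them.
fold_shots = [(2.0, 0.9, 0.8), (-1.0, 0.2, 0.4), (0.5, 0.6, 0.3), (-2.0, 0.1, 0.2)]
fold_labels = [1, 0, 1, 0]
print(best_weights(fold_shots, fold_labels))
```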

2. Text Representation of Images and MC MFoM Classifier

Text Representation of Images

[Figure: processing pipeline.
 1. Segmentation
 2. Extract low-level features (object, color, texture, shape) -> clustering
 3. Image representation with visual alphabets (shown as a grid of cluster indices)
 4. Feature vector: counts of visual terms (unigrams and bigrams or more)
 5. Dimensionality reduction: apply LSA
 6. MC-ML learning -> Concept 1, Concept 2, ..., Concept n]
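A sketch of steps 3 to 5 of this pipeline, assuming each image is already a grid of cluster indices. Defining bigrams as horizontally and vertically adjacent pairs is an assumption, as are the toy grids; LSA is done here with a truncated SVD from NumPy.

```python
from collections import Counter
import numpy as np

def term_counts(grid):
    """Count unigrams and adjacent bigrams in a grid of visual-alphabet labels."""
    counts = Counter()
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            counts[(grid[r][c],)] += 1                       # unigram
            if c + 1 < cols:
                counts[(grid[r][c], grid[r][c + 1])] += 1    # horizontal bigram
            if r + 1 < rows:
                counts[(grid[r][c], grid[r + 1][c])] += 1    # vertical bigram
    return counts

def lsa(doc_term, rank):
    """Project documents (images) onto the top `rank` latent dimensions."""
    u, s, _ = np.linalg.svd(doc_term, full_matrices=False)
    return u[:, :rank] * s[:rank]

# Three toy images, each a 2x2 grid of cluster indices.
grids = [[[1, 1], [4, 1]], [[4, 4], [4, 9]], [[1, 4], [9, 9]]]
counts = [term_counts(g) for g in grids]
vocab = sorted(set().union(*counts))
doc_term = np.array([[c[t] for t in vocab] for c in counts], dtype=float)
reduced = lsa(doc_term, rank=2)
print(reduced.shape)
```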

MC MFoM Classifier
• Multi-Class (MC) learning approach
  - The MC learning approach can learn a classifier even when there are not enough positive samples, as is the case in the HLF extraction task in TRECVID 2009.
• Maximal Figure-of-Merit (MFoM) classifier
  - The MFoM classifier can directly optimize any objective performance metric, such as m-F1 and MAP, by approximating discrete functions with continuous functions and applying the generalized probabilistic descent (GPD) algorithm.

MC MFoM Learning Scheme
• The parameter set is estimated by directly optimizing an objective performance metric with a linear classifier.
• Given N concepts and a D-dimensional image representation, the decision rule compares the score of concept j with a geometric average of the scores of all concepts competing with concept j.
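The equations on this slide are not recoverable from the transcript. The sketch below follows the form commonly used in published MFoM work, which is an assumption here: linear scores, a competing score computed as the log of a generalized mean of exponentiated scores with a hypothetical parameter `eta`, and acceptance of concept j when its score exceeds the competing score.

```python
import math

def linear_score(x, weights, bias):
    """Score of one concept for a D-dimensional representation x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def anti_score(scores, j, eta=1.0):
    """Log of the generalized mean of exp(score) over concepts competing with j."""
    others = [s for i, s in enumerate(scores) if i != j]
    top = max(others)
    mean_exp = sum(math.exp(eta * (s - top)) for s in others) / len(others)
    return top + math.log(mean_exp) / eta

def decide(x, params, eta=1.0):
    """Return one accept/reject decision per concept."""
    scores = [linear_score(x, w, b) for w, b in params]
    return [scores[j] - anti_score(scores, j, eta) > 0 for j in range(len(scores))]

# N = 3 concepts, D = 1 feature; params holds (weights, bias) per concept.
params = [([2.0], 0.0), ([0.0], 0.0), ([0.0], 0.0)]
print(decide([1.0], params))
```

Because each concept is compared against all its competitors, several concepts can be accepted for the same image, which suits multi-label HLF extraction.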

MC MFoM Learning Scheme
• A misclassification function is defined so that its value indicates whether a correct decision is made.
• Discrete functions are approximated by continuous functions by introducing a sigmoid function.
• Most commonly used metrics can now be represented with the above approximations and directly optimized with the GPD algorithm.
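A sketch of how a sigmoid can turn the discrete counts behind F1 into smooth quantities. The conventions are assumptions in the style of published MFoM work: a misclassification value `d` below zero means the concept is accepted, and `alpha` is a hypothetical slope parameter. The GPD update itself is not shown; it would follow the gradient of the smoothed metric.

```python
import math

def sigmoid_loss(d, alpha=5.0):
    """Smooth stand-in for the step function: near 0 when d << 0, near 1 when d >> 0."""
    return 1.0 / (1.0 + math.exp(-alpha * d))

def smoothed_f1(d_values, labels, alpha=5.0):
    """F1 computed from soft true-positive, false-positive and false-negative counts."""
    tp = fp = fn = 0.0
    for d, is_positive in zip(d_values, labels):
        loss = sigmoid_loss(d, alpha)
        if is_positive:
            tp += 1.0 - loss   # accepted positive
            fn += loss         # rejected positive
        else:
            fp += 1.0 - loss   # accepted negative
    return 2.0 * tp / (2.0 * tp + fp + fn)

def hard_f1(d_values, labels):
    """Discrete F1 with the same convention (d < 0 means accepted)."""
    tp = sum(1 for d, y in zip(d_values, labels) if y and d < 0)
    fp = sum(1 for d, y in zip(d_values, labels) if not y and d < 0)
    fn = sum(1 for d, y in zip(d_values, labels) if y and d >= 0)
    return 2.0 * tp / (2.0 * tp + fp + fn)

d_values = [-2.0, -1.0, 1.0, 2.0]
labels = [True, False, True, False]
print(hard_f1(d_values, labels), smoothed_f1(d_values, labels, alpha=50.0))
```

As `alpha` grows the smoothed value approaches the discrete F1, while smaller values give a smoother surface that is easier to optimize by gradient steps.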

3. MFoM Fusion

Discriminant Fusion Scheme
• Model Based Transformation (MBT) fusion
  - Given N concepts, N score functions are learned by an MC MFoM classifier. Taking the N score functions as the basis for the transformation, we can obtain a new N-dimensional feature.
  - A new MC MFoM classifier can be trained using M×N-dimensional features.
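A sketch of the MBT idea, assuming M denotes the number of systems being fused (the slide does not define M). Each system maps its own input to N concept scores, and the M score vectors are concatenated; the linear score functions and inputs below are made up for illustration.

```python
def make_linear(weights, bias):
    """Build a hypothetical linear score function g(x) = w . x + b."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

def mbt_feature(x, score_functions):
    """Map an input to the N-dimensional vector of concept scores."""
    return [g(x) for g in score_functions]

def fused_feature(inputs, systems):
    """Concatenate the N-dimensional score vectors of M systems (M*N values)."""
    feature = []
    for x, score_functions in zip(inputs, systems):
        feature.extend(mbt_feature(x, score_functions))
    return feature

# M = 2 systems, N = 3 concepts; each system has its own input representation.
system_a = [make_linear([1.0, 0.0], 0.0), make_linear([0.0, 1.0], 0.0),
            make_linear([1.0, 1.0], -1.0)]
system_b = [make_linear([0.5], 0.1), make_linear([-0.5], 0.0), make_linear([2.0], 0.0)]
feature = fused_feature([[0.2, 0.8], [1.0]], [system_a, system_b])
print(feature)
```

The fused vectors would then serve as training input for the new MC MFoM classifier.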

• Rank fusion
  - The rank numbers from different systems are combined to get a new rank number:

    $r(x) = \sum_{i} w_i \, r_i(x)$

    $r_i(x)$: the rank number of shot x in the ranked output of classification system i
    $w_i$: the weight assigned to system i
  - 2-fold cross validation is used to determine the weight parameters.
  - This serves as a reference experiment for the MFoM fusion.
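A sketch of rank fusion as a weighted sum of rank numbers, which is an assumed reading of the lost equation. Shot identifiers, scores, and weights are made up for illustration.

```python
def rank_numbers(scores):
    """Map shot id -> rank number, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda shot: -scores[shot])
    return {shot: rank for rank, shot in enumerate(ordered, start=1)}

def fuse_ranks(system_scores, weights):
    """Return shots sorted by the weighted sum of their rank numbers (best first)."""
    ranks = [rank_numbers(scores) for scores in system_scores]
    fused = {
        shot: sum(w * r[shot] for w, r in zip(weights, ranks))
        for shot in ranks[0]
    }
    return sorted(fused, key=lambda shot: fused[shot])

system_a = {"shot1": 0.9, "shot2": 0.5, "shot3": 0.1}
system_b = {"shot1": 0.2, "shot2": 0.8, "shot3": 0.1}
print(fuse_ranks([system_a, system_b], weights=(0.7, 0.3)))
print(fuse_ranks([system_a, system_b], weights=(0.3, 0.7)))
```

Changing the weights shifts the final ordering toward the system that is trusted more, which is what the cross validation would tune.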

4. Experiment

Result

Run name                    System                                  MInfAP
A_TITGT-Titech-1_4          SIFT GMMs + Audio models (no fusion)    0.168
A_TITGT-Fusion-score-2_3    MFoM (MBT fusion) 1                     0.152
A_TITGT-Fusion-score-1_2    MFoM (MBT fusion) 2                     0.149
A_TITGT-Fusion-rank_1       Rank fusion                             0.147
A_TITGT-Gatech-Ftr_5        Visual word + MFoM (no fusion)          0.108
A_TITGT-Titech-1_6          Local + Global features (no fusion)     0.023

• Mean InfAP of SIFT GMMs + Audio models was 0.168, which ranked 11th among all A-type runs and 4th among all participating teams.
• The MFoM fusion works better than the rank fusion.

Result cont.

[Figure: results for SIFT GMMs + Audio (A_TITGT-Titech-1_4), Visual word + MFoM (A_TITGT-Gatech-Ftr_5), and Fusion best (A_TITGT-Fusion-score-2_3), shown together with the Max and Median.]

• Combination with audio is effective for the HLF extraction.
  - Good: Singing (0.229), People-dancing (0.319), People-playing-a-musical-instruments (0.155), Female-human-face-closeup (0.266).
• SIFT GMMs represent HLFs with the background.
  - Good: Airplane_flying (0.138), Boat_Ship (0.250).

Conclusion
• Combination of SIFT GMMs and audio models is effective for the HLF extraction (Mean InfAP = 0.168).
  - SIFT GMMs work well for various HLFs.
  - Audio models can detect HLFs in a complementary way.
• It is difficult to make a fusion of different systems.

Future work
• More improved collaboration work
• Using time/spatial region information