Feature Selection Methods for Bag-of-(visual)-Words Approaches
Schmiedeke, Kelm and Sikora
Communication Systems Group, Technische Universität Berlin
4 October, 2012
Dec 19, 2014
Schmiedeke: “Feature Selection Methods for BoW Approaches”
Motivation
sports
Lessons from last year
- Features derived from metadata (esp. tags) outperform visual and ASR ones
• Metadata: Naive Bayes (non-translated)
• Visual features: SVM (avg. pooled histograms)
• ASR transcripts: kNN (JSD)
- Uploaders mainly contribute to a single category
This year's question
- Does feature selection improve the results achieved with the BoW model?
Feature Selection / Transformation
- Mutual information:
- Term Frequency:
- PCA (Eigenvalue decomposition):
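The two selection scores listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common presence/absence form of mutual information between a term and a class, and it assumes term frequency means the total count of a term over the corpus. The function names, the count matrix `X` and the labels `y` are hypothetical.

```python
import numpy as np

def term_frequency(X):
    """Total occurrence count of each term over all documents."""
    return np.asarray(X).sum(axis=0)

def mutual_information(X, y, cls):
    """MI (in bits) between term presence and membership in class `cls`.

    X: (n_docs, n_terms) term-count matrix, y: (n_docs,) class labels.
    Assumption: presence/absence formulation, one score per term and class.
    """
    present = np.asarray(X) > 0                 # term occurs in the document
    in_cls = (np.asarray(y) == cls)[:, None]    # document belongs to cls
    n_docs = present.shape[0]
    mi = np.zeros(present.shape[1])
    # Sum over the four combinations of (term present?, document in class?).
    for t in (True, False):
        for c in (True, False):
            joint = ((present == t) & (in_cls == c)).sum(axis=0) / n_docs
            p_t = (present == t).sum(axis=0) / n_docs
            p_c = (in_cls == c).sum() / n_docs
            with np.errstate(divide="ignore", invalid="ignore"):
                part = joint * np.log2(joint / (p_t * p_c))
            mi += np.where(joint > 0, part, 0.0)  # treat 0 * log(0) as 0
    return mi

# Toy corpus: 4 documents, 3 terms, 2 classes (made-up data).
X = np.array([[2, 0, 1], [1, 0, 1], [0, 3, 1], [0, 1, 1]])
y = np.array(["a", "a", "b", "b"])
print(term_frequency(X))
print(mutual_information(X, y, "a"))
```

In the toy corpus, the first two terms each occur in exactly one class and get the maximum score of 1 bit, while the third term occurs in every document and scores 0, which is why it would be dropped by any MI-based selection.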
Feature Selection
- Concepts for term selection:
Top terms for religion: bibl (0.0897), jesu (0.0797), god (0.0796), unleaven (0.0782), eeli (0.0782), davideel (0.0781), ministri (0.0780), …, daytripp (0.0), adagio (0.0), acustica (0.0)
Top terms for politics: lunch (0.1200), obama (0.1113), polit (0.0982), grittv (0.0881), flander (0.0861), laura (0.0855), economi (0.0747), …, sonnet (0.0), screenplai (0.0), acustica (0.0)
Top terms for health: jama (0.0495), health (0.0378), report (0.0357), harta (0.0227), exceric (0.0211), yoga (0.0203), study (0.0192), …, ilsr (0.0), resystem (0.0), acustica (0.0)
- Top-k-Union:
- Top-k:
- Union>th: thresholds 0.0002, 0.0002, 0.0001
- Intersection>Th: thresholds 0.0002, 0.0002, 0.0001
Top terms for religion: bibl (0.0897), jesu (0.0797), god (0.0796), …, web, python, xbox, big, expo, …, daytripp (0.0), adagio (0.0), acustica (0.0)
Top terms for politics: lunch (0.1200), obama (0.1113), polit (0.0982), …, appl, googl, teen, music, tv, …, sonnet (0.0), screenplai (0.0), acustica (0.0)
Top terms for health: jama (0.0495), health (0.0378), report (0.0357), …, gossip, interview, iphon, san, texa, …, ilsr (0.0), resystem (0.0), acustica (0.0)
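The four selection strategies named on these slides could be sketched as below. The slides give only the names, so the semantics are assumptions: Top-k-Union is taken as the union of each class's k best terms, Top-k as the k best terms after pooling classes by their maximum score, Union>th as terms above the threshold in at least one class, and Intersection>Th as terms above the threshold in every class. The three thresholds shown on the slides are read here as one value per class, which is also an assumption. All function names and the `scores` matrix are hypothetical.

```python
import numpy as np

def top_k_union(scores, k):
    """Union of each class's k best terms. scores: (n_classes, n_terms)."""
    best = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return np.unique(best)  # sorted, duplicate-free term indices

def top_k(scores, k):
    """k best terms overall; classes are pooled by max score (assumption)."""
    pooled = scores.max(axis=0)
    return np.sort(np.argsort(-pooled, kind="stable")[:k])

def union_above(scores, th):
    """Terms whose score exceeds the threshold in at least one class."""
    th = np.asarray(th, dtype=float).reshape(-1, 1)  # scalar or per class
    return np.flatnonzero((scores > th).any(axis=0))

def intersection_above(scores, th):
    """Terms whose score exceeds the threshold in every class."""
    th = np.asarray(th, dtype=float).reshape(-1, 1)  # scalar or per class
    return np.flatnonzero((scores > th).all(axis=0))

# Made-up scores for 3 classes and 4 terms.
scores = np.array([
    [0.09, 0.00, 0.05, 0.001],
    [0.00, 0.12, 0.04, 0.002],
    [0.01, 0.00, 0.03, 0.003],
])
print(top_k_union(scores, k=1))
print(top_k(scores, k=2))
print(union_above(scores, th=0.02))
print(intersection_above(scores, th=[0.0005, 0.0005, 0.0005]))
```

Under these assumed definitions, the union variants keep terms that are highly specific to a single class, whereas the intersection variant keeps only terms that carry some score in every class.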
Official runs
- Bag of clustered SURF features transformed using PCA
• Result does not benefit from transformation
       official run   without FS/FT
mAP    0.2301         0.2309
CA     41.63 %        41.71 %
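For reference, a PCA transformation via eigenvalue decomposition of the covariance matrix, as named earlier in the deck, might look like the sketch below. It is a generic illustration under stated assumptions, not the configuration used for this run: the input is assumed to be a matrix of bag-of-visual-words histograms with one row per video, and the function name `pca_eig`, the number of components and the random data are hypothetical.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project rows of X onto the leading eigenvectors of the covariance.

    X: (n_samples, n_features) matrix, e.g. BoW histograms (assumption).
    Returns the projected data, the projection matrix and the eigenvalues.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = np.cov(Xc, rowvar=False)               # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # columns = principal axes
    return Xc @ components, components, eigvals[order]

# Made-up data: 20 "histograms" with 5 visual words each.
rng = np.random.default_rng(0)
X = rng.random((20, 5))
Z, W, ev = pca_eig(X, n_components=2)
print(Z.shape)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are reversed to keep the directions of largest variance.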
- Bag of filtered ASR transcript terms (Union>Th)
• Result does benefit from selection
       official run   without FS/FT
mAP    0.1035         0.0522
CA     32.53 %        26.54 %
- Bag of clustered SURF features filtered using MI and intersection>th strategy
• Result does slightly benefit from selection
       official run   without FS/FT
mAP    0.2259         0.2221
CA     40.80 %        40.78 %
- Bag of filtered terms derived from tags, title and descriptions (Union>Th)
• Result does benefit from selection
       official run   without FS/FT
mAP    0.5225         0.4146
CA     58.18 %        55.70 %
- Bag of clustered SURF features transformed using PCA and decision fusion using uploader
• Result does benefit from transformation
       official run   without FS/FT
mAP    0.3304         0.2988
CA     52.14 %        49.19 %
Conclusion & Future Work
- FS showed potential for improving the results
- The choice between MI and TF is not critical; both methods achieve roughly the same results
• Metadata (mAP): MI_12004 (0.5277) vs. TF_14976 (0.5275)
- Investigation of different scaling schemes (NB)
- Use of class-independent selection score (MI)
Backup
Thank you! Questions?