Generating Ground Truth for Music Mood Classification Using Mechanical Turk Jin Ha Lee & Xiao Hu JCDL 2012

Dec 01, 2014

Jin Ha Lee

Presentation of "Generating Ground Truth for Music Mood Classification Using Mechanical Turk" by Jin Ha Lee and Xiao Hu at the 12th Annual ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL).
Transcript
Page 1

Generating Ground Truth for Music Mood Classification Using Mechanical Turk

Jin Ha Lee & Xiao Hu

JCDL 2012

Page 2

Mood: a relatively long-lasting and stable emotional state (Meyer, 1956)

Emotion? Affect?

Page 3

Music mood

• Recently received a lot of attention in the MIR (Music Information Retrieval) domain

• “Audio Music Mood Classification” task in MIREX (Music Information Retrieval Evaluation eXchange), starting in 2007

• Critical for developing MDL

Page 4

• Evaluation is based on ground truth

Passionate

Bittersweet

Bittersweet

Bittersweet

Page 5

More is better!

However, generating ground truth based on human input is expensive and time-consuming

Page 6

How is it done in MIREX?

• A web-based survey system called E6K

• Invitations posted to MIREX and music-ir mailing lists in order to recruit volunteers

Page 7
Page 8

Can we use the CROWD instead of MUSIC EXPERTS?

Is there a better way?

Page 9

1. How do music mood classification results obtained from Mechanical Turk compare to those collected from music experts in MIREX?

2. How different or similar are the evaluation outcomes for the MIREX AMC task when based on ground truth collected from Mechanical Turk vs. E6K?

Page 10

Workers (Turkers)

Task Requester

Amazon Mechanical Turk (MTurk)

Page 11

Cluster1 passionate, rousing, confident, boisterous, rowdy

Cluster2 cheerful, fun, rollicking, sweet, amiable/good natured

Cluster3 bittersweet, poignant, wistful, literate, autumnal, brooding

Cluster4 humorous, silly, campy, quirky, whimsical, witty, wry

Cluster5 aggressive, intense, fiery, tense/anxious, volatile, visceral

TASK: Listen to 30-second music clips → Select one of the five mood clusters
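
The task above amounts to a single-label choice among five fixed clusters. A minimal Python sketch of that label set follows. The names `MOOD_CLUSTERS` and `cluster_of` are illustrative assumptions and are not taken from the study materials.

```python
# Mood clusters as listed on the slide; the dict layout is an assumption.
MOOD_CLUSTERS = {
    "Cluster1": ["passionate", "rousing", "confident", "boisterous", "rowdy"],
    "Cluster2": ["cheerful", "fun", "rollicking", "sweet", "amiable/good natured"],
    "Cluster3": ["bittersweet", "poignant", "wistful", "literate", "autumnal", "brooding"],
    "Cluster4": ["humorous", "silly", "campy", "quirky", "whimsical", "witty", "wry"],
    "Cluster5": ["aggressive", "intense", "fiery", "tense/anxious", "volatile", "visceral"],
}


def cluster_of(term):
    """Return the cluster whose term list contains `term`, or None if no cluster does."""
    needle = term.strip().lower()
    for cluster, terms in MOOD_CLUSTERS.items():
        if needle in terms:
            return cluster
    return None


print(cluster_of("Bittersweet"))  # Cluster3
```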

Page 12

Qualification test

Consistency check

Review process

Page 13

Basic Stats

1 HIT = 25 songs

1250 songs x 2 judgments = 2500 unique mood judgments

186 HITs collected - 86 HITs rejected = 100 HITs accepted
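
The figures on this slide are mutually consistent, which a short Python check can confirm. The variable names are assumptions chosen for illustration. The rejection rate is derived from the slide's counts and is not a number stated in the presentation.

```python
# Counts copied from the slide.
SONGS = 1250
JUDGMENTS_PER_SONG = 2
SONGS_PER_HIT = 25
HITS_COLLECTED = 186
HITS_REJECTED = 86

total_judgments = SONGS * JUDGMENTS_PER_SONG       # 2500
hits_accepted = HITS_COLLECTED - HITS_REJECTED     # 100
hits_needed = total_judgments // SONGS_PER_HIT     # HITs required to cover all judgments
rejection_rate = HITS_REJECTED / HITS_COLLECTED    # share of collected HITs that were rejected

print(total_judgments, hits_accepted, hits_needed)
print(f"rejection rate: {rejection_rate:.1%}")
```

The accepted HITs (100) match the number needed to cover 2500 judgments at 25 songs per HIT.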

Page 14

Stats on Collecting Data

                                          EVALUTRON 6000                               MTurk
Average Time Spent on Each Music Clip     21.54 seconds                                17.46 seconds
Total Time for Collecting All Judgments   38 days (+ additional in-house assessment)   19 days
Cost for Collecting All Judgments         $0                                           $60.50
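
To put the MTurk cost in per-unit terms, the sketch below divides the reported total by the 2500 judgments and 100 accepted HITs reported on the Basic Stats slide. This is a rough derivation under an assumption: the slides do not say whether the $60.50 includes fees or payments for rejected HITs. The variable names are illustrative.

```python
# Figures copied from the slides.
MTURK_COST_USD = 60.50
MTURK_JUDGMENTS = 2500
MTURK_ACCEPTED_HITS = 100
E6K_DAYS = 38      # excludes the additional in-house assessment noted on the slide
MTURK_DAYS = 19

cost_per_judgment = MTURK_COST_USD / MTURK_JUDGMENTS
cost_per_accepted_hit = MTURK_COST_USD / MTURK_ACCEPTED_HITS
time_ratio = E6K_DAYS / MTURK_DAYS  # lower bound, since E6K needed extra in-house work

print(f"cost per judgment:     ${cost_per_judgment:.4f}")
print(f"cost per accepted HIT: ${cost_per_accepted_hit:.3f}")
print(f"E6K took at least {time_ratio:.1f}x as many days as MTurk")
```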

Page 15

Comparison of E6K and MTurk data

Page 16

Number of Judgments and Distribution across Clusters

Cluster    E6K           MTurk         Diff. in % (E6K-MTurk)
Cluster1   405 (16.4%)   450 (18.0%)   -1.6%
Cluster2   472 (19.1%)   536 (21.4%)   -2.3%
Cluster3   542 (22.0%)   622 (24.9%)   -2.9%
Cluster4   412 (16.7%)   367 (14.7%)   2.0%
Cluster5   400 (16.2%)   403 (16.1%)   0.1%
Other      237 (9.6%)    122 (4.9%)    4.7%
Total      2468          2500          -
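
The percentages and differences in this table can be reproduced from the raw counts. The Python sketch below does so. The helper name `shares` and the dict layout are assumptions for illustration.

```python
# Judgment counts per cluster, copied from the table.
e6k = {"Cluster1": 405, "Cluster2": 472, "Cluster3": 542,
       "Cluster4": 412, "Cluster5": 400, "Other": 237}
mturk = {"Cluster1": 450, "Cluster2": 536, "Cluster3": 622,
         "Cluster4": 367, "Cluster5": 403, "Other": 122}


def shares(counts):
    """Convert raw counts into percentages of the column total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


e6k_pct = shares(e6k)
mturk_pct = shares(mturk)

for name in e6k:
    diff = e6k_pct[name] - mturk_pct[name]
    print(f"{name:9s} {e6k_pct[name]:5.1f}% {mturk_pct[name]:5.1f}% {diff:+5.1f}")
```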

Page 17

Distribution of Agreement

Cluster    E6K   MTurk   Both
Cluster1   121   89      29
Cluster2   130   131     44
Cluster3   163   216     91
Cluster4   121   85      42
Cluster5   126   121     64
Total      661   642     270
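
A short Python sketch can check the column totals and express the "Both" column as a share of each source. It assumes that "Both" counts songs on which the E6K and MTurk judgments agree on the same cluster, which the slide does not state explicitly. The variable names are illustrative.

```python
# (E6K, MTurk, Both) agreement counts per cluster, copied from the table.
agreement = {
    "Cluster1": (121, 89, 29),
    "Cluster2": (130, 131, 44),
    "Cluster3": (163, 216, 91),
    "Cluster4": (121, 85, 42),
    "Cluster5": (126, 121, 64),
}

e6k_total = sum(row[0] for row in agreement.values())
mturk_total = sum(row[1] for row in agreement.values())
both_total = sum(row[2] for row in agreement.values())

# Share of each source's agreed songs that also appear in the "Both" column.
both_share_of_e6k = both_total / e6k_total
both_share_of_mturk = both_total / mturk_total

print(e6k_total, mturk_total, both_total)
print(f"Both / E6K:   {both_share_of_e6k:.1%}")
print(f"Both / MTurk: {both_share_of_mturk:.1%}")
```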

Page 18

Confusion among the Clusters

Clusters                Disagreed in E6K   Disagreed in MTurk
Cluster 1 & Cluster 2   20                 95
Cluster 2 & Cluster 4   31                 86
Cluster 1 & Cluster 5   13                 74
⁞                       ⁞                  ⁞
Cluster 3 & Cluster 4   6                  27
Cluster 2 & Cluster 5   1                  22
Cluster 3 & Cluster 5   1                  20
Total                   253                595

Page 19

Russell’s model

[Figure: Clusters 1 to 5 positioned on Russell’s model]

Page 20

System Performance

E6K   Average accuracy   MTurk   Average accuracy
CL    0.65               GT      0.66
GT    0.64               CL      0.63
TL    0.64               TL      0.63
ME1   0.61               ME1     0.57
ME2   0.61               ME2     0.57
IM2   0.57               IM2     0.57
KL1   0.56               KL1     0.55
IM1   0.53               IM1     0.54
KL2   0.29               KL2     0.29
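
One way to quantify how closely the two system rankings match is a Spearman rank correlation over the accuracies in this table. This is an illustrative substitute and not the authors' method: the presentation compares rankings with TK-HSD. The helper names `average_ranks` and `pearson` are assumptions, and ties receive averaged ranks.

```python
# Average accuracies per system, copied from the table.
e6k = {"CL": 0.65, "GT": 0.64, "TL": 0.64, "ME1": 0.61, "ME2": 0.61,
       "IM2": 0.57, "KL1": 0.56, "IM1": 0.53, "KL2": 0.29}
mturk = {"GT": 0.66, "CL": 0.63, "TL": 0.63, "ME1": 0.57, "ME2": 0.57,
         "IM2": 0.57, "KL1": 0.55, "IM1": 0.54, "KL2": 0.29}


def average_ranks(scores):
    """Rank systems from best (1) to worst, giving tied scores their mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


systems = sorted(e6k)
rank_e6k = average_ranks(e6k)
rank_mturk = average_ranks(mturk)
# Spearman's rho is the Pearson correlation of the two rank vectors.
rho = pearson([rank_e6k[s] for s in systems], [rank_mturk[s] for s in systems])
print(f"Spearman rho = {rho:.2f}")
```

With these numbers rho comes out at roughly 0.95. The only order changes are CL and GT swapping the top position and IM2 tying with ME1 and ME2 under MTurk.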

Page 21

TK-HSD Rank Comparison

[Figure: TK-HSD rank comparison, E6K vs. MTurk]

Page 22

Conclusion

• Overall the human judgments from E6K and MTurk showed similar patterns:

– Judgment distribution across five mood clusters

– Agreement distribution across clusters

– Confusion among clusters

• System performance rankings from E6K and MTurk were also comparable

Page 23

Conclusion (Cont’d.)

• However, combined ground truth from E6K and MTurk is only about 60% the size of the original E6K ground truth

• Mood is a highly subjective feature for describing and organizing music

• Other means for judging the moods should be explored (e.g., ranking)

Page 24

Future work

• In-depth interview with users to investigate factors affecting people’s judgments on music mood

• More controlled study with different user groups

Page 25

Questions?