
MediaEval 2015 - Multi-Scale Approaches to the MediaEval 2015 "Emotion in Music" Task

Feb 13, 2017

Transcript
Page 1: MediaEval 2015 - Multi-Scale Approaches to the MediaEval 2015 "Emotion in Music" Task

Multi-scale Approaches to the MediaEval 2015 “Emotion in Music” Task

Mingxing Xu, Xinxing Li, Haishu Xianyu, Jiashen Tian, Fanghang Meng, Wenxiao Chen

Human-Computer Speech Interaction Lab. (HCSIL) Department of Computer Science and Technology

Tsinghua University, Beijing, China


Page 2

Motivation / Main Idea

1. High correlation within the music feature sequence

2. Multi-scale methods at three different levels

• Acoustic feature (run 3)

• Regression model (run 1, 2)

• Emotion annotation (run 4)

Figure: the three levels are acoustic feature, regression model, and emotion annotation.

Page 3

Feature Learning with Hierarchical Deep Neural Networks (DBNs + AE), Run 3

Acoustic features were organized into 4 groups according to their physical fundamentals and the time scales on which they were extracted.

NOTE: We submitted a paper to AAAI, containing details about this framework.

Figure labels: 60 ms, 25 ms, 25 ms, 25 ms; win: 1s; shift: 0.5s; final features @ 2 Hz

Page 4

Multi-scale BLSTM-RNNs based Fusion (1)

Figure: the input features (baseline features @ 2 Hz for Run 1, 2; new features for Run 3) are cut into sequences of length 60, 30, 20, and 10 and fed to BLSTM_60, BLSTM_30, BLSTM_20, and BLSTM_10. The four outputs go through a fusion step that gives the dynamic music emotion (arousal, valence).

NOTE:

• BLSTM-RNNs: 5 hidden layers (2 layers pre-trained), 250 units

• Sequence length (time-scale): 60, 30, 20, 10

• Sliding window with 50% overlap used during full-song testing
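The sliding-window note can be illustrated with a short sketch. The slides do not say how predictions in overlapping regions are merged, so averaging them is an assumption here, and `predict_window` is a hypothetical stand-in for a trained fixed-length BLSTM. The extra final window that covers the tail of the song is also an assumption.

```python
import numpy as np

def predict_full_song(features, predict_window, win_len=60):
    """Run a fixed-length model over a long sequence with 50% overlap.

    features: array of shape (T, D), one row per frame.
    predict_window: hypothetical callable mapping a (win_len, D) array
        to win_len predictions (stand-in for a trained BLSTM).
    Overlapping predictions are averaged (assumption, not from the slides).
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    if n_frames < win_len:
        raise ValueError("sequence is shorter than one window")
    hop = win_len // 2  # 50% overlap between consecutive windows
    starts = list(range(0, n_frames - win_len + 1, hop))
    # Add one last window so the end of the song is always covered.
    if starts[-1] + win_len < n_frames:
        starts.append(n_frames - win_len)
    total = np.zeros(n_frames)
    count = np.zeros(n_frames)
    for s in starts:
        total[s:s + win_len] += predict_window(features[s:s + win_len])
        count[s:s + win_len] += 1
    return total / count
```

The same function can be called with `win_len` set to 60, 30, 20, or 10 to obtain one full-song prediction per time scale.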


Page 5

Multi-scale BLSTM-RNNs based Fusion (2)

Figure: each of the 5 partitions splits the data into 411 clips for training and 20 clips for validation.

             trial 1    trial 2    trial 3
partition 1  RMSE 11    RMSE 12    RMSE 13
partition 2  RMSE 21    RMSE 22    RMSE 23
partition 3  RMSE 31    RMSE 32    RMSE 33
partition 4  RMSE 41    RMSE 42    RMSE 43
partition 5  RMSE 51    RMSE 52    RMSE 53

5 different partitions: select 20 clips randomly as the validation set

3 trials of the same model: randomized initial weights

Two criteria for model selection:

1. RMSE-first: select the model with the best RMSE for each time scale

2. RMSE+PARTITION: consider both RMSE and partition
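A minimal sketch of the partitioning and of the RMSE-first criterion, under stated assumptions: the total of 431 clips is simply 411 + 20 from the slide, the function names are hypothetical, and "best RMSE" is read as the lowest validation RMSE over all partition/trial pairs of one time scale. The RMSE+PARTITION rule is not specified in enough detail on the slide to be sketched.

```python
import numpy as np

def make_partitions(n_clips=431, n_val=20, n_partitions=5, seed=0):
    """Draw random train/validation splits (431 = 411 + 20 clips)."""
    rng = np.random.default_rng(seed)
    partitions = []
    for _ in range(n_partitions):
        perm = rng.permutation(n_clips)
        # First n_val shuffled indices form the validation set.
        partitions.append((perm[n_val:], perm[:n_val]))
    return partitions

def select_rmse_first(rmse):
    """Return (partition, trial) indices of the lowest validation RMSE.

    rmse: array-like of shape (n_partitions, n_trials) for one time scale.
    """
    rmse = np.asarray(rmse, dtype=float)
    p, t = np.unravel_index(np.argmin(rmse), rmse.shape)
    return int(p), int(t)
```

Calling `select_rmse_first` once per time scale (60, 30, 20, 10) would give one selected model per scale.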


Page 6

Multi-scale BLSTM-RNNs based Fusion (3)

Figure: the outputs of BLSTM_60, BLSTM_30, BLSTM_20, and BLSTM_10 are fused to give the dynamic music emotion (arousal, valence). Labels in the figure: GROUP 1 (RMSE-first), GROUP 2 (RMSE + PARTITION), ELM 1, ELM 2, AVERAGE (twice), + Delta, + Smoothing, Run 1, Run 2, Run 3.

triangle filter, length: 50
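The two building blocks named on this slide, ELM-based fusion and triangle-filter smoothing, can be sketched as follows. This is a generic extreme learning machine (random fixed hidden layer, output weights by least squares), not the authors' exact configuration: the hidden size, the tanh activation, and the use of the per-frame BLSTM predictions as ELM inputs are assumptions. It is also assumed that the length-50 triangle filter is the smoothing step applied to a predicted sequence; the edge padding is a further assumption.

```python
import numpy as np

class ELMRegressor:
    """Minimal extreme learning machine for regression (illustrative)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Hidden-layer weights are random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # Output weights are the least-squares solution.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

def triangle_smooth(x, length=50):
    """Smooth a 1-D sequence with a normalized triangular kernel."""
    x = np.asarray(x, dtype=float)
    # Drop the zero endpoints of the Bartlett window to get `length` taps.
    kernel = np.bartlett(length + 2)[1:-1]
    kernel = kernel / kernel.sum()
    pad_left = (length - 1) // 2
    pad_right = length - 1 - pad_left
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

In such a setup, each row of `X` would hold the predictions of the selected BLSTMs for one frame, and `triangle_smooth` would be applied to the fused arousal or valence sequence.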

Page 7

SVR based Hierarchical Regression

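Figure, two panels.

Panel 1 (song): the global feature feeds the Global SVR and the local feature feeds the Local SVR; the two outputs are summed (SUM) to give the dynamic music emotion (arousal, valence).

Panel 2 (clip, 30 s): the dynamic music emotion (arousal, valence) is split into a global trend and a local fluctuation, which are paired with the Global SVR (global feature) and the Local SVR (local feature).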

Global features: OpenSMILE, IS13_ComParE, 6373 features

Local features: OpenSMILE, IS13_ComParE_lld, 130 features, MEAN + STD, Win: 1s, Shift: 0.5s
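The local-feature step (mean and standard deviation over a 1 s window with a 0.5 s shift) can be sketched as below. The function name and the 100 frames-per-second rate of the low-level descriptors are assumptions; the slide only gives the window, the shift, and the 130 descriptors.

```python
import numpy as np

def window_stats(lld, frame_rate=100, win_s=1.0, shift_s=0.5):
    """Compute MEAN + STD of frame-level descriptors per window.

    lld: array of shape (n_frames, n_lld), e.g. n_lld = 130.
    frame_rate: frames per second of the descriptors (assumed 100).
    Returns an array of shape (n_windows, 2 * n_lld).
    """
    lld = np.asarray(lld, dtype=float)
    win = int(round(win_s * frame_rate))
    shift = int(round(shift_s * frame_rate))
    rows = []
    for start in range(0, lld.shape[0] - win + 1, shift):
        chunk = lld[start:start + win]
        rows.append(np.concatenate([chunk.mean(axis=0), chunk.std(axis=0)]))
    return np.array(rows)
```

With 130 descriptors this gives 260 values per window, and a 0.5 s shift gives two windows per second.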

Run 4
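The slides do not define how the annotation is split into a global trend and a local fluctuation. One possible reading, shown here purely as an assumption, takes the clip-level mean as the global trend and the residual as the local fluctuation, so that summing the two predictions restores the dynamic sequence. Both function names are hypothetical.

```python
import numpy as np

def decompose_annotation(y):
    """Split a dynamic annotation into (global trend, local fluctuation).

    Assumption: the global trend is the mean over the clip, and the
    local fluctuation is what remains after subtracting it.
    """
    y = np.asarray(y, dtype=float)
    global_trend = y.mean()
    local_fluctuation = y - global_trend
    return global_trend, local_fluctuation

def combine(global_pred, local_pred):
    """SUM step: add the global prediction to each local prediction."""
    return global_pred + np.asarray(local_pred, dtype=float)
```

Under this reading, the Global SVR would be trained on one target per clip and the Local SVR on one target per 0.5 s step.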


Page 8

Conclusions:

1. Several multi-scale approaches at three levels were proposed.

2. Results illustrated the effectiveness of our new methods.

3. Multi-scale BLSTMs based Fusion with ELMs (Run 2) was almost the best.

4. SVR based Hierarchical Regression is a promising method.

Future Work:

• Select the time scale automatically and systematically

• Improve multi-scale feature learning

Figure legend: BLSTM-AVG, BLSTM-ELM, R1 + NEW-F, SVR-HR

Page 9

Thank you for your attention!

Questions?
