Multimedia Processing Techniques for Retrieving, Extracting, and Accessing Musical Content
Stefan Balke, International Audio Laboratories Erlangen
PhD Defense
Vision
HeadIn
HeadOut
© AudioLabs, 2018
Talk Outline
§ Retrieval of Musical Themes
§ Extraction of Predominant Musical Voices
§ Web-Based Technologies for Accessing Musical Content
Retrieval of Musical Themes
Beethoven, Op. 67: Fate Motif
A Dictionary of Musical Themes
Harold Barlow and Sam Morgenstern
A Dictionary of Musical Themes: Datasets
§ Image: H. Barlow, S. Morgenstern, A Dictionary of Musical Themes; 10,000 themes from Western classical music
§ MIDI/Text (Symbolic, Text): Electronic Dictionary of Musical Themes; corresponding MIDI files plus metadata
§ Audio: Performances of the musical works
Example: Beethoven, Op. 67, Fate Motif
A Dictionary of Musical Themes: Audio-based Retrieval [Balke16]
§ Query: Musical theme
§ Goal: Retrieve audio recording (from the audio collection)
Challenges
§ Cross-modality: Symbolic vs. audio data
§ Tuning: Deviations from standard tuning
§ Transposition: Played key vs. written key
§ Tempo: Local & global tempo deviations
§ Polyphony: Monophonic query vs. polyphonic audio
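The transposition challenge listed above can be illustrated with a small sketch. It assumes that both query and recording are represented as sequences of 12-dimensional chroma vectors (index 0 = C, 1 = C#, and so on) and that a transposition is simulated by a cyclic shift of the chroma bins. The slides do not state how transposition is actually handled, and the function names are hypothetical.

```python
def shift_chroma(chroma_seq, semitones):
    """Cyclically shift every 12-dimensional chroma vector by `semitones`.

    A shift of +2 moves the energy of C to D, of D to E, and so on.
    """
    k = semitones % 12
    if k == 0:
        return [list(frame) for frame in chroma_seq]
    # The last k bins wrap around to the front of the vector.
    return [list(frame[-k:]) + list(frame[:-k]) for frame in chroma_seq]


def all_transpositions(chroma_seq):
    """Return the 12 transposed versions of a chroma sequence, keyed by shift."""
    return {k: shift_chroma(chroma_seq, k) for k in range(12)}
```

A retrieval system could match all 12 shifted versions of the query against a recording and keep the best one, which makes the result independent of the written key at the price of 12 times the matching effort.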
A Dictionary of Musical Themes: Retrieval Pipeline
[Figure: The Musical Theme (query) and each recording in the Audio Collection (examples: Beethoven, 5th Symphony; Brahms, Hungarian Dance) are converted to chroma representations (axes: Time (s), Chroma). For each recording, a Matching Function (Cost, between 0 and 1) is computed over Time (s); the Original Theme and its Repetition are marked in the matching function. Resulting ranking: 1. Beethoven, Op. 67; 2. Brahms, Hungarian Dance.]
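The retrieval pipeline can be sketched roughly as follows. The slides only show a cost curve between 0 and 1 per recording and do not name the alignment technique, so this sketch assumes plain diagonal matching without time warping: the query chroma sequence is slid over the recording, and the cost at each position is one minus the mean cosine similarity. All function names are hypothetical, and the approach in [Balke16] may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two chroma vectors (0.0 if one of them is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_function(query, recording):
    """Cost of aligning the query at every start frame of the recording.

    Assumption: no time warping, so tempo deviations are not compensated.
    For non-negative chroma values, every cost lies between 0 and 1.
    """
    n = len(query)
    costs = []
    for start in range(len(recording) - n + 1):
        sims = [cosine_similarity(q, recording[start + i]) for i, q in enumerate(query)]
        costs.append(1.0 - sum(sims) / n)
    return costs


def rank_recordings(query, collection):
    """Sort recording names by the minimum of their matching function."""
    scored = []
    for name, recording in collection.items():
        # Recordings shorter than the query get the worst possible cost.
        best = min(matching_function(query, recording), default=1.0)
        scored.append((best, name))
    return [name for _, name in sorted(scored)]
```

A low value of the matching function indicates a position where the theme may occur, and the recording with the lowest minimum is ranked first.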
A Dictionary of Musical Themes: Retrieval Experiment I
#Queries: 177 Themes, #Database: 100 Tracks (~11 h)

Top-K rate (%)     Top-1   Top-20
Baseline            45.2     76.8
+ Tuning            46.9     81.9
+ Transposition     53.7     91.0
+ Query Length      68.4     93.2
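The Top-1 and Top-20 figures can be read as the share of queries whose relevant recording appears among the first K results. A minimal sketch of that measure, assuming one relevant recording per query and 1-based ranks (the function name is hypothetical):

```python
def top_k_rate(ranks, k):
    """Percentage of queries whose relevant recording is ranked within the top k.

    `ranks` holds, for every query, the 1-based rank of its relevant recording.
    """
    if not ranks:
        raise ValueError("at least one query is required")
    hits = sum(1 for rank in ranks if rank <= k)
    return 100.0 * hits / len(ranks)
```

As a plausibility check, 80 hits among 177 queries give a rate that rounds to 45.2, which is consistent with the Baseline Top-1 value being a percentage of the 177 themes.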
A Dictionary of Musical Themes: Retrieval Experiment II
#Queries: 2046 Themes, #Database: 1113 (~120 h)

                              Top-1   Top-20   Top-50
Tuning + 10 s                  18.3     29.2     46.1
Transp. + Query Length *)      39.5     66.9     76.1
+ Predominant Melody *)        61.2     81.8     86.7

*) Results from a recent study together with Frank Zalkow.
Extraction of Predominant Musical Voices
Solo Voice Enhancement
Extraction of Predominant Musical Voices
Our Data-Driven Approach [Balke17]
Estimate “monophonic” time-frequency representation from a “polyphonic” audio recording using Deep Neural Networks (DNNs).
Predominant Melody Extraction
1. Model-based approach [Salamon13, Bosch16]
2. Data-driven approach [Bittner15, Rigaud16, Bittner17]
DNN Training
[Figure: Input and Target time-frequency representations; axes: Time (s), shown from 4 to 9, and Frequency (Hz), with ticks from 9 to 8372.]
Dataset: Weimar Jazz Database (WJD)
§ 299 transcribed jazz solos of monophonic instruments
§ ca. 10 h of annotated music
Thanks to the Jazzomat research group: M. Pfleiderer, K. Frieler, J. Abeßer, and W.-G. Zaddach
DNN Architecture [Balke17]
§ Basic DNN with 5 fully-connected layers.
§ Training is applied layer-wise [Bengio06, Uhlich15].
§ Layer parameters: W1, b1; W2, b2; W3, b3; W4, b4; W5, b5, each layer followed by a ReLU.
§ Dimensions: 120 throughout (input and all layer outputs).
§ Notation: Input, Output, Target, Loss, with Loss = MSE(Input, Output).
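The architecture described above (5 fully-connected layers of dimension 120, each followed by a ReLU, with a mean squared error loss) can be sketched in plain Python. This is only an illustration: the random weights are placeholders rather than trained or LLS-initialized values, the helper names are hypothetical, and which two signals enter the MSE during training is left to the caller.

```python
import random


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(W, b, v):
    """Affine map W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def forward(layers, v):
    """Apply every (W, b) pair followed by a ReLU, as on the architecture slide."""
    for W, b in layers:
        v = relu(dense(W, b, v))
    return v


def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_layers(num_layers=5, dim=120, seed=0):
    """Placeholder parameters: small Gaussian weights and zero biases (assumption)."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(dim)], [0.0] * dim)
        for _ in range(num_layers)
    ]
```

Calling `forward(random_layers(), frame)` on a 120-dimensional frame returns a 120-dimensional, non-negative vector, which `mse` can then compare against a reference frame of the same size.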
Layer-Wise Training
§ Initialize weights and bias with Linear Least Squares (LLS)
§ Train 600 epochs …
[Figure: Epoch axis with marks at 600, 1200, 1800, 2400, and 3000; one further layer is added per 600-epoch stage, from W1, b1 up to W5, b5.]
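The layer-wise schedule can be written down as a small helper. It assumes 0-based epoch counting and that each 600-epoch stage adds exactly one layer, as the epoch marks 600 to 3000 suggest. The LLS initialization itself is not covered here, and whether earlier layers keep training in later stages is not stated on the slides. The names are hypothetical.

```python
EPOCHS_PER_STAGE = 600  # taken from the slides
NUM_LAYERS = 5          # taken from the slides


def stage_boundaries():
    """Epoch counts at which a training stage ends."""
    return [EPOCHS_PER_STAGE * (i + 1) for i in range(NUM_LAYERS)]


def active_layers(epoch):
    """Number of layers present in the network at a 0-based epoch (assumption)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return min(epoch // EPOCHS_PER_STAGE + 1, NUM_LAYERS)
```

Under these assumptions, the network has one layer during epochs 0 to 599, two layers from epoch 600 on, and all five layers from epoch 2400 until training ends after 3000 epochs.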
Qualitative Evaluation
[Figure: Input, Target, and Output time-frequency representations; axes: Time (s), shown from 4 to 9, and Frequency (Hz), with ticks from 9 to 8372.]
Experiment: Jazz Music Retrieval
[Diagram: A Collection of Polyphonic Music Recordings, processed by Predominant Melody Extraction, is compared (vs.) with a Monophonic Transcription in the Retrieval Procedure.]
Experiment: Jazz Music Retrieval, Results
§ Baseline: Chroma-based matching [Mueller15]
§ Melodia: Quantized F0-trajectory [Salamon13]
§ DNN
[Figure: Mean Reciprocal Rank over Query Duration (s) for the three approaches.]
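The evaluation measure named in the results figure, the mean reciprocal rank, can be computed as sketched here. The sketch assumes that each query has exactly one relevant recording and that ranks are 1-based. The function name is hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; 1.0 means every query was ranked first."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / rank for rank in ranks) / len(ranks)
```

Unlike a Top-K rate, this measure rewards every improvement in rank, with the largest gains near the top of the result list.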
Web-Based Technologies for Accessing Musical Content
Audio
Score
Image
Text
Technologies for Accessing Musical Content [Balke18]
[Diagram: Retrieval Procedure (vs.)]
Audio
Symbolic
Image
Text
www