Estimation of Violin Bowing Features from Audio …media.aau.dk/smc/wp-content/uploads/2017/09/ML4... · ffw NN Estimation of Violin Bowing Features from Audio Recordings with Convolutional

ffwNN

Estimation of Violin Bowing Features from Audio Recordings with Convolutional Networks

Alfonso Perez-Carrillo
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
[email protected]

Hendrik Purwins
The School of Engineering and Science, Aalborg University Copenhagen, Copenhagen, DK
[email protected]

ML4Audio

The measurement (or direct acquisition) of musical gestures usually requires expensive sensing systems and complex setups that are generally intrusive in practice. In this work, we present an indirect acquisition method that estimates violin bowing controls from audio signal analysis by training convolutional neural networks on a previously recorded database of multimodal data (bowing controls and sound features) from violin performances.

Inputs & Outputs

[Figure: the sound is decomposed with a Sinusoidal Model (SMS) into a harmonic and a residual component. Panels: harmonic/residual spectrum; triangular analysis windows over Frequency [Hz] and samples, with bands numbered 10 to 40 for each component.]

Energy in 40 harmonic + 40 residual frequency bands, with logarithmic band centers and 50% overlap.
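The band-energy input (40 triangular windows with logarithmic band centers and 50% overlap) can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency range (`f_min`, `f_max`), the function names, and the reading of "50% overlap" as each triangle running from the previous band center to the next one are assumptions made for illustration. The same function would be applied separately to the harmonic and to the residual spectrum.

```python
import math

def log_band_points(f_min, f_max, n_bands):
    """Return n_bands + 2 log-spaced frequencies (assumed range).

    Point i+1 is the center of band i; points i and i+2 are its edges.
    """
    ratio = (f_max / f_min) ** (1.0 / (n_bands + 1))
    return [f_min * ratio ** i for i in range(n_bands + 2)]

def triangular_weight(f, lo, center, hi):
    """Weight of frequency f in a triangle rising lo->center, falling center->hi."""
    if f <= lo or f >= hi:
        return 0.0
    if f <= center:
        return (f - lo) / (center - lo)
    return (hi - f) / (hi - center)

def band_energies(freqs, magnitudes, n_bands=40, f_min=100.0, f_max=8000.0):
    """Sum the triangle-weighted squared magnitudes in each band.

    freqs and magnitudes describe one spectral frame (e.g. the harmonic
    or the residual component); f_min and f_max are hypothetical values.
    """
    pts = log_band_points(f_min, f_max, n_bands)
    energies = []
    for b in range(n_bands):
        lo, center, hi = pts[b], pts[b + 1], pts[b + 2]
        energies.append(sum(triangular_weight(f, lo, center, hi) * m * m
                            for f, m in zip(freqs, magnitudes)))
    return energies
```

Stacking these per-frame vectors over time would give one energy-over-time image per component, which is one plausible way to read the "Auditory EnergyGram" input.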

Outputs: Bowing Controls (measured with sensors): which string, bowing pressure, bowing speed, bow-bridge distance

Inputs: Auditory EnergyGram

Network Architecture

[Figure: network diagram. Labels recoverable from the figure: convolution kernels 2x2x1x9, 2x2x9x9 and 2x2x9x9; 9 x 3 x 2 before flatten; fully connected layer (100); fully connected layer; bow control.]

Evaluation

Correlation Coefficient
Mean Absolute Error: average error in parameter units.
Relative Absolute Error: unit-less average error percentage.
Root Relative Squared Error: similar to RAE, but weights outliers more heavily due to the square.
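The four evaluation measures can be written out as a short sketch. The poster does not give the formulas, so this assumes the commonly used definitions, in which RAE and RRSE normalize the error by that of a baseline that always predicts the mean of the measured values; the function name and the returned keys are illustrative only.

```python
import math

def evaluation_metrics(y_true, y_pred):
    """Compute CC, MAE, RAE (%) and RRSE (%) for one bowing parameter.

    y_true: values measured with sensors; y_pred: values estimated from audio.
    Assumes both sequences vary (non-zero variance) and have equal length.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Pearson correlation coefficient
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    cc = cov / math.sqrt(ss_t * ss_p)

    abs_err = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    sq_err = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))

    mae = abs_err / n                                   # in parameter units
    rae = 100.0 * abs_err / sum(abs(t - mean_t) for t in y_true)
    rrse = 100.0 * math.sqrt(sq_err / ss_t)             # squaring stresses outliers
    return {"cc": cc, "mae": mae, "rae": rae, "rrse": rrse}
```

Under these definitions, an RAE or RRSE of 100% means the estimate is no better than always predicting the mean of the measured parameter.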
