Estimation of Violin Bowing Features from Audio Recordings with Convolutional Networks

Alfonso Perez-Carrillo
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain

Hendrik Purwins
The School of Engineering and Science, Aalborg University Copenhagen, Copenhagen, DK

ML4Audio

The measurement (or direct acquisition) of musical gestures usually involves expensive sensing systems and complex setups that are generally intrusive in practice. In this work, we present an indirect acquisition method that estimates violin bowing controls from audio signal analysis. The method is based on training Convolutional Neural Networks on a previously recorded database of multimodal data (bowing controls and sound features) of violin performances.

Inputs & Outputs

Inputs: Auditory EnergyGram. A Sinusoidal Model (SMS) separates the sound into harmonic and residual components, and the energy is computed in 40 harmonic + 40 residual frequency bands. The bands use triangular analysis windows with logarithmic band centers and 50% overlap.

[Figure: harmonic/residual spectrum with triangular analysis windows over Frequency [Hz]; harmonic and residual band energies (bands 10 to 40) over time in samples.]

Outputs: Bowing Controls (measured with sensors): which string, bowing pressure, bowing speed, bow-bridge distance.

Network Architecture

[Figure: network architecture. Labels recoverable from the diagram: three convolutional stages with kernel shapes 2x2x1x9, 2x2x9x9 and 2x2x9x9; a flatten step of size 9 x 3 x 2; a fully connected layer of 100 units; a final fully connected layer producing the bow control.]

Evaluation

Correlation Coefficient
Mean Absolute Error: average error in parameter units.
Relative Absolute Error: unit-less average error percentage.
Root Relative Squared Error: similar to RAE, but weights outliers more heavily due to the square.
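
The four evaluation measures named above can be sketched in a few lines of NumPy. The poster does not give formulas, so this is a minimal sketch assuming the common definitions, in which RAE and RRSE normalize the error by that of a predictor that always outputs the mean of the measured values. The function name `evaluation_metrics` and the sample values are hypothetical.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """Return CC, MAE, RAE and RRSE for one bowing parameter.

    Assumed (standard) definitions; RAE and RRSE are returned as
    fractions, multiply by 100 for a percentage.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true            # estimation error per frame
    dev = y_true - y_true.mean()     # error of the mean-value baseline
    return {
        # Pearson correlation between measured and estimated values
        "cc": float(np.corrcoef(y_true, y_pred)[0, 1]),
        # average absolute error, in the units of the parameter
        "mae": float(np.mean(np.abs(err))),
        # total absolute error relative to the baseline (unit-less)
        "rae": float(np.sum(np.abs(err)) / np.sum(np.abs(dev))),
        # like RAE, but squaring weights large errors more heavily
        "rrse": float(np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))),
    }

# Hypothetical sensor measurements and network estimates.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 4.0]
metrics = evaluation_metrics(measured, estimated)
print(metrics)
```

Because RAE and RRSE are normalized by the spread of the measured values, they allow parameters with different units (for example bowing speed and bow-bridge distance) to be compared on one scale, which MAE alone does not.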
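
The band analysis described under Inputs & Outputs (triangular windows, logarithmic band centers, 50% overlap) can be illustrated with a small filterbank sketch. Only the band count of 40 and the window shape come from the poster. The frequency range, the FFT grid, and the reading of "50% overlap" as "each triangle reaches from the previous band center to the next one" are assumptions made for illustration, and `log_triangular_filterbank` is a hypothetical helper name.

```python
import numpy as np

def log_triangular_filterbank(n_bands, f_min, f_max, freqs):
    """Triangular windows with logarithmically spaced centers.

    Band i rises from edges[i] to its peak at edges[i + 1] and falls
    to zero at edges[i + 2], so neighbouring triangles overlap by half.
    """
    freqs = np.asarray(freqs, dtype=float)
    edges = np.geomspace(f_min, f_max, n_bands + 2)  # log-spaced edges
    fb = np.zeros((n_bands, freqs.size))
    for i in range(n_bands):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # take the lower slope and clip negative values to zero
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

# Assumed analysis grid: 0 to 8 kHz in 1025 bins (not from the poster).
freqs = np.linspace(0.0, 8000.0, 1025)
fb = log_triangular_filterbank(40, 100.0, 8000.0, freqs)

# Band energies of one (here random, placeholder) power spectrum.
rng = np.random.default_rng(0)
power_spectrum = rng.random(freqs.size)
band_energy = fb @ power_spectrum   # shape (40,)
```

Applying the same filterbank to the harmonic and to the residual spectrum of each frame would give the 40 + 40 band energies per frame that the poster describes as network input.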