
Audio-visual Source Association for String Ensembles through Multi-modal Vibrato Analysis

Bochen Li, Chenliang Xu, Zhiyao Duan

University of Rochester

14th Sound and Music Computing Conference
July 5 – 8, 2017
Espoo, Finland

Background

• Music → multi-modal art form
• See and listen → more enjoyment
• Popular music video streaming service

[Pie chart: Music 38.4%, Others]


Background

Multi-modal MIR

• Instrument Recognition

• Playing Activity Detection

• Polyphonic Music Analysis

• Fingering Estimation

• Conductor Following


The Problem – Audio-visual Source Association

[Figure: a String Music Performance video yields Detected Players and Separated Sound Tracks; Audio-visual Source Association links the two.]

The Problem – Audio-visual Source Association
Application

• Intuitive and user-friendly interaction with music performance videos
• Smart Music Editor
• Concert cameras automatically take close-up shots of the leading player/instrument

Prior Work
Bow Motion Analysis

• Bow Motion ↔ Note Onsets

Prior Work
Limitations

• Bow motion cannot tell players apart when they have the same rhythm, because their note onsets coincide

Proposed System Overview
Vibrato Features for String Instruments

• Vibrato → Audio pitch fluctuations
• Vibrato → Fine motions of left hand
• Correlate pitch fluctuations with fine motions of left hand

Method – Audio Analysis

Score-informed Source Separation
• Audio-score alignment
• Harmonic mask

Vibrato Extraction
• Score-informed pitch refinement on separated sources
• Auto-correlation on pitch trajectory

[2] Z. Duan and B. Pardo, “Soundprism: An online system for score-informed source separation of music audio,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, 2011.
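The "auto-correlation on pitch trajectory" step could be sketched roughly as follows. This is only an illustration under stated assumptions: the pitch contour of one note is sampled at a fixed frame rate, and vibrato is assumed to appear as a strong autocorrelation peak at lags corresponding to roughly 4 to 8 Hz. The function name `detect_vibrato`, the rate range, and the 0.5 threshold are hypothetical choices, not values given in the slides.

```python
import numpy as np

def detect_vibrato(pitch, frame_rate, min_hz=4.0, max_hz=8.0, threshold=0.5):
    """Return (is_vibrato, rate_hz) for one note's pitch contour.

    pitch: 1-D array of pitch values (e.g. MIDI numbers) over the note.
    frame_rate: pitch frames per second.
    The rate range and threshold are illustrative assumptions.
    """
    x = np.asarray(pitch, dtype=float)
    x = x - x.mean()                      # remove the note's nominal pitch
    energy = np.dot(x, x)
    if energy == 0.0:
        return False, 0.0                 # perfectly flat contour
    # Autocorrelation for non-negative lags, normalized so acf[0] == 1.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:] / energy
    # Lags (in frames) that correspond to the assumed vibrato rate range.
    lo = max(1, int(np.floor(frame_rate / max_hz)))
    hi = min(len(x) - 1, int(np.ceil(frame_rate / min_hz)))
    if hi < lo:
        return False, 0.0                 # note too short to judge
    best = lo + int(np.argmax(acf[lo:hi + 1]))
    return bool(acf[best] > threshold), frame_rate / best
```

A note whose contour oscillates at about 6 Hz would be flagged as vibrato with an estimated rate near 6 Hz, while a flat contour would not.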

Method – Video Analysis
Hand Tracking

• Kanade-Lucas-Tomasi (KLT) tracker with 30 feature points
• Bounding box: 70 × 70 pixels, centered at the median position of feature points
• Re-initialize feature points every 20 frames
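The bounding-box rule above can be illustrated with a small sketch. It covers only the box placement and the re-initialization schedule. The KLT tracking itself is left out and could come from any implementation (for example OpenCV's `cv2.goodFeaturesToTrack` and `cv2.calcOpticalFlowPyrLK`, which is an assumption about tooling, not something the slides state). The helper names and the clipping to the frame border are hypothetical.

```python
import numpy as np

BOX_SIZE = 70        # bounding box side in pixels (from the slide)
REINIT_EVERY = 20    # re-initialize feature points every 20 frames (from the slide)

def hand_bounding_box(points, frame_shape, box_size=BOX_SIZE):
    """Return (x0, y0, x1, y1) of a square box centered at the median point.

    points: array of shape (n_points, 2) holding (x, y) feature positions.
    frame_shape: (height, width) of the video frame, used to clip the box.
    """
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    cx, cy = np.median(pts, axis=0)       # median is robust to stray points
    half = box_size // 2
    height, width = frame_shape
    # Clip so the box stays inside the frame while keeping its full size
    # (the clipping is an assumption; the slides do not mention it).
    x0 = int(np.clip(round(cx) - half, 0, max(0, width - box_size)))
    y0 = int(np.clip(round(cy) - half, 0, max(0, height - box_size)))
    return x0, y0, x0 + box_size, y0 + box_size

def needs_reinit(frame_index, period=REINIT_EVERY):
    """True on frames where the feature points should be re-detected."""
    return frame_index % period == 0
```

Using the median rather than the mean keeps the box on the hand even when a few feature points drift onto the background.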

Method – Video Analysis
Fine-grained Motion Capture

• Optical flow estimation → pixel-level motion velocities
• Average the motion velocities within the bounding box
• Subtract its moving average to eliminate body motion

[Figure: Original Frame; Color-encoded Optical Flow v(t)]
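The averaging and moving-average subtraction described above might look like the following sketch. It assumes a dense flow field of shape (H, W, 2) is already available from some optical flow estimator. The function names, the centered window, and the edge padding are assumptions made for illustration; the slides do not give the window length or the exact formulas.

```python
import numpy as np

def mean_flow_in_box(flow, box):
    """Average a dense flow field of shape (H, W, 2) inside box=(x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return flow[y0:y1, x0:x1].reshape(-1, 2).mean(axis=0)

def remove_body_motion(velocity, window):
    """Subtract a centered moving average from a (T, 2) velocity sequence.

    window: moving-average length in frames (a hypothetical parameter).
    """
    v = np.asarray(velocity, dtype=float)
    kernel = np.ones(window) / window
    pad = window // 2
    # Edge-pad so the moving average is defined at the first and last frames.
    padded = np.pad(v, ((pad, window - 1 - pad), (0, 0)), mode="edge")
    smooth = np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(2)],
        axis=1,
    )
    return v - smooth
```

Slow drifts such as swaying of the upper body are absorbed by the moving average, so the residual keeps mostly the fast, small hand motion.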

Method – Video Analysis
Fine-grained Motion Capture

• Principal Component Analysis (PCA) → Identify principal motion along the fingerboard → 1-D Motion Velocity Curve V(t)
• Integration on V(t) → Motion Displacement Curve
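As a rough illustration of the two steps above, the sketch below projects a 2-D velocity sequence onto its first principal component and integrates the result with a cumulative sum. The function names are hypothetical, and the cumulative sum is only one simple way to approximate the integration; the slides do not specify the numerical scheme.

```python
import numpy as np

def principal_motion(velocity_2d):
    """Project (T, 2) velocities onto their first principal component."""
    v = np.asarray(velocity_2d, dtype=float)
    centered = v - v.mean(axis=0)
    # Eigenvector of the 2x2 covariance matrix with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    direction = eigvecs[:, np.argmax(eigvals)]
    return centered @ direction

def displacement(velocity_1d, frame_rate):
    """Integrate a 1-D velocity curve over time with a cumulative sum."""
    return np.cumsum(np.asarray(velocity_1d, dtype=float)) / frame_rate
```

The sign of a principal direction is arbitrary, so the resulting curve may be flipped relative to the true motion along the fingerboard.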


Method – Source-player Association

[Figure: Pitch & Motion (normalized) over one note. The Pitch Contour is plotted against the Motion Displacement Curve of the associated player and of a not-associated player.]

Method – Source-player Association

• Note-level matching score → Cross-correlation
• Track-level matching score → Sum of note-level matching scores

[Equation annotations: audio track index; player index; note index; normalized pitch; normalized motion; total number of vibrato notes in the p-th track]
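The two matching scores above could be sketched as follows. This is a minimal illustration, assuming the note-level score is the zero-lag cross-correlation of the zero-mean, unit-norm pitch and motion curves of one note, sampled on the same time grid. The exact normalization and any lag handling are not recoverable from the slides, and the function names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Zero-mean, unit-norm version of a 1-D curve (zeros if the curve is flat)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def note_score(pitch, motion):
    """Zero-lag cross-correlation of the normalized pitch and motion curves.

    Both curves must cover the same note and share the same time grid.
    """
    return float(np.dot(normalize(pitch), normalize(motion)))

def track_score(pitch_notes, motion_notes):
    """Sum of note-level scores over the vibrato notes of one audio track."""
    return sum(note_score(p, m) for p, m in zip(pitch_notes, motion_notes))
```

Summing over notes means tracks with many vibrato notes accumulate more evidence, so a single poorly matched note has limited influence.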

Method – Source-player Association

Track-level matching scores:

M1,1  M2,1  M3,1  M4,1
M1,2  M2,2  M3,2  M4,2
M1,3  M2,3  M3,3  M4,3
M1,4  M2,4  M3,4  M4,4

• Association score
  [Equation annotations: track-level matching score; total number of tracks (i.e., players); one permutation]
• Output the permutation that maximizes the association score
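The search over permutations can be illustrated with a brute-force sketch. It assumes the association score of a permutation is the sum of the selected track-level matching scores. The exact formula on the slide is not recoverable, and a product of scores would be handled the same way. The function name and the `M[p][q]` indexing convention (track p, player q) are choices made for this example.

```python
from itertools import permutations
import numpy as np

def best_association(M):
    """Return (assignment, score) maximizing the summed track-level scores.

    M[p][q] is the matching score between audio track p and player q.
    assignment[p] is the player index assigned to track p.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in permutations(range(n)):          # n! candidate associations
        score = sum(M[p, perm[p]] for p in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, float(best_score)
```

Exhaustive search is cheap for the ensemble sizes discussed here. For a sum-based score on larger ensembles, an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` would avoid enumerating every permutation.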

Experiments
Dataset: URMP Dataset [3]

• Individually recorded and assembled together
• 14 instruments, 44 piece arrangements

[3] B. Li*, X. Liu*, K. Dinesh, Z. Duan, and G. Sharma, “Creating a musical performance dataset for multimodal music analysis: Challenges, insights, and applications,” IEEE Trans. Multimedia, under review.

Experiments

Piece Selection
• 19 pieces → 5 duets, 4 trios, 7 quartets, 3 quintets
• Selection criterion: each piece contains at most 1 non-string instrument
• Same set as the baseline system (bow motion ↔ note onset)

Evaluation Measure
• Note-level Matching Accuracy: the % of vibrato notes that are best matched to the correct player, according to the note-level matching score
• Piece-level Association Accuracy: the % of pieces for which the correct association is returned, according to the piece-level association score
  (As polyphony increases, the number of incorrect candidate associations grows factorially)
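The factorial growth mentioned above can be made concrete with a short calculation. The sketch assumes every player is matched to exactly one track, so an ensemble of K players has K! candidate associations, of which one is correct. The function name is hypothetical, and the chance accuracy assumes a uniformly random guess among all candidates.

```python
from math import factorial

def candidate_counts(num_players):
    """Return (total associations, incorrect associations, chance accuracy)."""
    total = factorial(num_players)
    return total, total - 1, 1.0 / total

for k in (2, 3, 4, 5):   # duets to quintets, the ensemble sizes in the slides
    total, wrong, chance = candidate_counts(k)
    print(f"{k} players: {total} candidates, {wrong} incorrect, chance = {chance:.1%}")
```

A duet has only one incorrect candidate, whereas a quintet has 119, which is why piece-level association becomes much harder as the ensemble grows.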

Experiments
Results: Note-level Matching Accuracy

[Figure: note-level matching accuracy, showing Median / Mean and the accuracy by random guess]

Experiments
Results: Piece-level Association Accuracy

• Overall Accuracy: 94.7% (18 out of 19)
  Compared with Baseline: 89.5% (based on bow motion/audio onset)
• Error Case: No vibrato is used in the performance

Conclusions & Future Work

Conclusions
• Audio-visual source association for string music, by correlating pitch fluctuations and left-hand motions
• Highly effective, not demanding on camera angles
• Limitations: Vibrato is not guaranteed to appear in all pieces

Future Work
• Combine all motion features in string music
  Bow & Vibrato & Body movement & …
• Video → Vibrato analysis (rate & extent)
  From monophonic to polyphonic
• Step into woodwind & brass instruments
• Audio-visual Source Separation