Musical Sound Source Separation based on Computational Auditory Scene Analysis. Jinyu Han, Interactive Audio Lab, Northwestern University

Apr 09, 2018

Page 1

Musical Sound Source Separation based on Computational Auditory Scene Analysis

Jinyu Han

Interactive Audio Lab Northwestern University

Page 2

Outline of presentation

•  Cocktail party problem

•  Overlapped Harmonics

•  Least Squares Estimation

Page 3

Cocktail Party Problem

Fig. 1. A cocktail party (Image from Breakfast at Tiffany’s: Paramount Pictures)

Page 4

Cocktail Party Problem

Ensemble: pick one instrument

Page 5

Fig. 2. Bach Chorale: Ach Gott und Herr

Page 6

music.cs.northwestern.edu

Audio Source Separation

•  Separating out the individual sounds in an audio mixture

[Diagram: Source 1 + Source 2 = Mixture; Source Separation takes the Mixture and produces Estimate 1 and Estimate 2]

Page 7

Practical Applications

•  Hearing Aids

•  Automated transcription of speech and music

•  Automated sound source identification

•  Speech recognition systems

Page 8

Interesting Questions

•  How do humans separate sounds?

•  Can we build a machine to do this?

•  What cues in the sound are important to separate one sound from background noise?

Page 9

Approaches to Audio Separation

•  Blind Source Separation (BSS)

– Makes few assumptions about the sound source itself

– Usually works on mixtures of at least two channels

– Methods include: ICA, NMF, Beamforming

•  Computational Auditory Scene Analysis (CASA)

– Uses heuristic grouping cues based upon psychological observation

– Typically deals with single-channel mixtures

Page 10

Outline of presentation

•  Cocktail party problem

•  Overlapped Harmonics

•  Least Squares Estimation

Page 11

It’s NOT easy

[Figure: waveforms labeled Violin and Bassoon; x-axis: Time (s)]

Page 12

It’s NOT easy

[Figure: waveform; x-axis: Time (s)]

Lay the sources on top of each other

Page 13

It’s NOT easy (Time Domain)

[Figure: waveforms labeled Violin and Bassoon, and the mixture of violin and bassoon; x-axis: Time (s)]

Page 14

Time-frequency domain

[Figure: time-frequency plots labeled Bassoon and Violin]

Page 15

Time-frequency domain

[Figure: time-frequency plot of the mixture with regions labeled Bassoon, Violin, and Overlap (four overlapped regions); axis tick labels 345, 1389, 2414]

Page 16

Time-frequency domain

[Figure: the same plot as on Page 15 with an additional Bassoon label]

Page 17

Time-frequency domain

[Figure: the same plot as on Page 15 with an additional Violin label]

Page 18

Time-frequency domain

[Figure: the same plot as on Page 15]

Page 19

Approach #1

[Figure: time-frequency plot of the mixture; the four overlapped regions are marked "discard" and the remaining regions are labeled Bassoon or Violin; axis tick labels 345, 1389, 2414]

Page 20

Approach #1

•  Find the un-overlapped parts (belonging to a single source) in the mixture

•  Rebuild the sources from the un-overlapped parts

Page 21

Approach #1

•  Find the un-overlapped (i.e. single-source) parts of the mixture

– Use a multi-pitch tracker to track the pitch of each source

•  Rebuild the sources from the un-overlapped parts

– Based on the pitch, find the harmonics for each source

– Rebuild using only the un-overlapped harmonics
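The bookkeeping behind these steps can be sketched as follows. This is a minimal illustration, assuming the multi-pitch tracker has already produced one fundamental frequency per source. The function name `split_harmonics`, the 3% frequency tolerance, and the example pitches (110 Hz and 440 Hz) are hypothetical choices for illustration, not values taken from the slides.

```python
import numpy as np

def split_harmonics(f0_a, f0_b, n_harmonics=10, tol=0.03):
    """Return (clean, overlapped) harmonic numbers of source A.

    A harmonic of A is treated as overlapped when some harmonic of B lies
    within a relative frequency distance `tol` of it (assumed threshold).
    """
    harm_a = f0_a * np.arange(1, n_harmonics + 1)
    # Enough harmonics of B to cover the highest harmonic of A
    n_b = int(np.ceil(harm_a[-1] / f0_b)) + 1
    harm_b = f0_b * np.arange(1, n_b + 1)
    # Relative distance between every harmonic of A and every harmonic of B
    rel = np.abs(harm_a[:, None] - harm_b[None, :]) / harm_a[:, None]
    overlapped_mask = (rel < tol).any(axis=1)
    numbers = np.arange(1, n_harmonics + 1)
    return numbers[~overlapped_mask].tolist(), numbers[overlapped_mask].tolist()

# Hypothetical pitches: source A at 110 Hz, source B two octaves higher
clean, overlapped = split_harmonics(110.0, 440.0)
print("rebuild from harmonics:", clean)
print("discard harmonics:", overlapped)
```

With these made-up pitches, every fourth harmonic of the lower source coincides with a harmonic of the higher one, so Approach #1 would rebuild the lower source from the remaining harmonics only.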

Page 22

Just Take the un-overlapped Part

Page 23

Just Take the un-overlapped Part

Page 24

DO SOMETHING with the overlap!!!

Page 25

Take a close look at the first 10 harmonics

Page 26

A 3-D plot

[Figure: 3-D plot with labels 1st Harmonic and 5th Harmonic]

Page 27

[Figure: un-overlapped harmonic amplitude of bassoon from the mixture; two harmonics are marked "overlap"]

Page 28

[Figure: un-overlapped harmonic amplitude of bassoon from the mixture; two harmonics are marked "overlap"]

Page 29

[Figure: two panels, Harmonic amplitude of bassoon and Harmonic amplitude of bassoon in the mixture]

The 4th and 8th harmonics are overlapped by the violin in the mixture

Page 30

Spectral Smoothness

•  The amplitude of a harmonic partial is usually close to the amplitudes of the nearby partials of the same sound.

Page 31

Approach #2

•  Find the un-overlapped (i.e. single-source) parts of the mixture

•  Rebuild the sources from the un-overlapped parts

•  Rebuild the overlapped parts by interpolating from the un-overlapped parts adjacent to the overlapped parts
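The interpolation step can be sketched for a single time frame as below. This is a minimal illustration that assumes linear interpolation over harmonic number, which is one simple reading of "interpolating from the adjacent parts"; the slides do not specify the interpolation rule. The function name `interpolate_overlapped` and the amplitude values are made up.

```python
import numpy as np

def interpolate_overlapped(amps, overlapped):
    """Fill overlapped harmonic amplitudes from un-overlapped neighbors.

    amps: amplitudes of harmonics 1..N in one frame (values at overlapped
          positions are ignored).
    overlapped: 1-based harmonic numbers to reconstruct.
    """
    amps = np.asarray(amps, dtype=float)
    numbers = np.arange(1, len(amps) + 1)
    clean = ~np.isin(numbers, overlapped)
    out = amps.copy()
    # Linear interpolation between the nearest un-overlapped harmonics
    out[~clean] = np.interp(numbers[~clean], numbers[clean], amps[clean])
    return out

# Made-up frame: harmonics 4 and 8 are contaminated by the other source
frame = [1.0, 0.8, 0.6, 9.9, 0.4, 0.3, 0.2, 9.9, 0.1, 0.05]
smoothed = interpolate_overlapped(frame, overlapped=[4, 8])
print(smoothed)
```

The un-overlapped harmonics pass through unchanged, and each overlapped harmonic takes a value between its two clean neighbors.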

Page 32

Spectral Smoothness

Any problem?

[Figure: two panels, Original and Reconstruction]

Reconstruction: the 4th and 8th harmonics are interpolated from the neighboring harmonics

Page 33

Harmonic amplitude envelope

•  Divide the amplitude of the harmonic at time t by the amplitude of the harmonic at time t = 0

[Figure: two panels, Harmonic amplitude and Harmonic amplitude envelope]
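The normalization described on this slide can be sketched as below. This is a minimal illustration, assuming the harmonic amplitudes are already available as an array with one row per harmonic and one column per frame, and that the first frame is non-zero. The function name `amplitude_envelope` and the numbers are hypothetical.

```python
import numpy as np

def amplitude_envelope(harmonic_amps):
    """Normalize each harmonic's amplitude track by its first frame.

    harmonic_amps: array of shape (n_harmonics, n_frames); the first frame
    of every harmonic is assumed to be non-zero.
    """
    harmonic_amps = np.asarray(harmonic_amps, dtype=float)
    return harmonic_amps / harmonic_amps[:, :1]

# Two made-up harmonics that decay in the same way at different levels
amps = np.array([[2.0, 1.0, 0.5],
                 [0.4, 0.2, 0.1]])
env = amplitude_envelope(amps)
print(env)
```

After normalization both rows start at 1 and follow the same curve, which is what makes envelopes of different harmonics directly comparable.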

Page 34

Harmonic amplitude envelope

[Figure: harmonic amplitude envelope (in a log scale)]

Page 35

Common Amplitude Modulation (CAM) 

•  The amplitude envelopes of different harmonics of the same source exhibit similar temporal dynamics 

Page 36

Common Amplitude Modulation (CAM)

[Figure: two panels, Amplitude of un-overlapped harmonics and Harmonic amplitude envelope (normalized by the first frame)]

Page 37

Common Amplitude Modulation

•  For the overlapped harmonics, assume we know the amplitude at t = 1.

•  Reconstruct the harmonic amplitude (t = 2, 3, …) using the amplitude of the first frame (t = 1) and the envelope of a neighboring harmonic

Page 38

Approach #3

•  Find the un-overlapped (i.e. single-source) parts of the mixture

•  Estimate the amplitude of un-overlapped harmonics at t = 1

•  Rebuild the overlapped harmonics using the envelope of un-overlapped harmonics

Page 39

Approach #3

•  Let H4(t) indicate the amplitude of the 4th harmonic at time t, and H5(t) the amplitude of the neighboring 5th harmonic

•  The estimate: H4'(t) = H4(1) * H5(t) / H5(1)
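The estimate H4'(t) = H4(1) * H5(t) / H5(1) translates directly into array code. The sketch below is a minimal illustration with made-up numbers; the function name `cam_estimate` is hypothetical, and the initial amplitude of the overlapped harmonic is simply assumed to be known.

```python
import numpy as np

def cam_estimate(h_overlapped_first, h_neighbor):
    """Scale the neighbor's envelope by the overlapped harmonic's first frame.

    Implements H4'(t) = H4(1) * H5(t) / H5(1) for any harmonic pair.
    h_neighbor[0] plays the role of H5(1) and is assumed to be non-zero.
    """
    h_neighbor = np.asarray(h_neighbor, dtype=float)
    return h_overlapped_first * h_neighbor / h_neighbor[0]

h5 = np.array([0.4, 0.2, 0.1, 0.05])   # made-up un-overlapped 5th harmonic
h4_est = cam_estimate(0.6, h5)          # assumes H4(1) = 0.6 is known
print(h4_est)
```

The reconstructed track starts at the assumed initial amplitude and then follows the temporal shape of the neighboring harmonic.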

Page 40

Common Amplitude Modulation (CAM)

[Figure: two panels, Original and Reconstruction]

Page 41

Outline of presentation

•  Cocktail party problem

•  Overlapped harmonics

•  Least Squares Estimation

Page 42

Time-frequency domain

[Figure: time-frequency plot of the mixture with regions labeled Bassoon, Violin, and Overlap (four overlapped regions)]

Page 43

[Figure: observed amplitude of an overlapped harmonic in the mixture, shown with the estimated envelope of the bassoon and the estimated envelope of the violin, each scaled by its initial value (A and B)]

A and B indicate the amplitudes of the harmonic at the beginning, time t = 1

Page 44

[Figure: observed amplitude of an overlapped harmonic in the mixture, shown with the estimated envelope of the bassoon and the estimated envelope of the violin, each scaled by its initial value (A and B)]

A and B indicate the amplitudes of the harmonic at the beginning, time t = 1

Page 45

Estimate the initial amplitude

•  Identify the overlapped parts from the mixture

•  Estimate the amplitude envelope for the overlapped harmonics based on un-overlapped harmonics

•  Do a least squares estimation
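The least squares step can be sketched as below. This is a minimal illustration, assuming the observed amplitude of an overlapped harmonic is modeled as the sum of the two sources' envelopes, each scaled by its unknown initial amplitude. That additive model ignores phase interaction between the sources and is an assumption of this sketch, as are the function name `estimate_initial_amplitudes`, the assignment of `a` to the bassoon and `b` to the violin, and the synthetic envelopes.

```python
import numpy as np

def estimate_initial_amplitudes(mixture_amp, env_bassoon, env_violin):
    """Least-squares fit of M(t) ~ a * env_bassoon(t) + b * env_violin(t).

    The envelopes are assumed to be normalized by their first frame, so
    a and b are the initial amplitudes of the two overlapping harmonics.
    """
    E = np.column_stack([env_bassoon, env_violin])
    target = np.asarray(mixture_amp, dtype=float)
    (a, b), *_ = np.linalg.lstsq(E, target, rcond=None)
    return a, b

# Synthetic, noise-free example with known initial amplitudes 0.7 and 0.2
t = np.arange(20)
env_b = np.exp(-0.05 * t)               # made-up bassoon envelope
env_v = 1.0 + 0.3 * np.sin(0.4 * t)     # made-up violin envelope
mixture = 0.7 * env_b + 0.2 * env_v
a_hat, b_hat = estimate_initial_amplitudes(mixture, env_b, env_v)
print(a_hat, b_hat)
```

The fit is only well determined when the two envelopes differ in shape; if both sources had the same envelope, the two columns would be collinear and the split between the two amplitudes would be ambiguous.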

Page 46

Audio Source Separation

•  Separating out the individual sounds in an audio mixture

[Diagram: Source 1 + Source 2 = Mixture; Approach #3 takes the Mixture and produces Estimate 1 and Estimate 2]

Page 47

Approach #1 Bassoon

Page 48

Approach #3 Bassoon

Page 49

Original Bassoon

Page 50

Approach #1 Violin

Page 51

Approach #3 Violin

Page 52

Original Violin

Page 53

Audio Source Separation

•  Harmonic Masking

•  Spectral Smoothness

•  Common Amplitude Modulation

Page 54

END

•  Cocktail party problem

•  Overlapped harmonics

•  Least Squares Estimation

Page 55

Auditory Scene Analysis

•  Listeners parse the complex mixture of sounds arriving at the ears in order to form a mental representation of each sound source

•  This perceptual process is called auditory scene analysis

•  Two conceptual processes of auditory scene analysis (ASA):

– Segmentation. Decompose the acoustic mixture into sensory elements (segments)

– Grouping. Combine segments into groups, so that segments in the same group likely originate from the same environmental source

Page 56

Computational auditory scene analysis

•  Computational auditory scene analysis (CASA) approaches sound separation based on ASA principles

– Pitch continuity

– Harmonic partials

– Spectral shape

– Harmonic temporal envelope

– ……

Page 57

Pitch and Harmonics

Page 58

Timbre and Spectral shape

•  Harmonic structure feature

– Normalized relative amplitudes of harmonics
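A harmonic structure feature of this kind can be sketched as below. The slide does not say which normalization is used; dividing by the sum of the amplitudes is an assumption of this sketch, and the function name `harmonic_structure` and the amplitude values are made up.

```python
import numpy as np

def harmonic_structure(amps):
    """Relative harmonic amplitudes, normalized to sum to 1 (assumed norm)."""
    amps = np.asarray(amps, dtype=float)
    return amps / amps.sum()

# The same made-up spectrum played at two different loudness levels
quiet = harmonic_structure([0.5, 0.25, 0.125, 0.125])
loud = harmonic_structure([2.0, 1.0, 0.5, 0.5])
print(quiet, loud)
```

Both calls return the same vector, which illustrates why a normalized feature describes spectral shape rather than overall level.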

Page 59

Timbre and Spectral shape

Page 60

Group pitches into streams

Page 61

Harmonic Temporal Envelope

Fig: Amplitude envelopes of a clarinet playing a G# 

Page 62

Common Amplitude Modulation

•  Harmonics of the same source have correlated envelopes

•  Harmonics with strong energy are more correlated
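One simple way to quantify "correlated envelopes" is the Pearson correlation coefficient between two amplitude tracks. The sketch below is only an illustration of that idea with synthetic envelopes; the slides do not state which correlation measure is used, and the function name `envelope_correlation` is hypothetical.

```python
import numpy as np

def envelope_correlation(env_a, env_b):
    """Pearson correlation between two amplitude envelopes."""
    return float(np.corrcoef(env_a, env_b)[0, 1])

t = np.linspace(0.0, 1.0, 50)
decay = np.exp(-3.0 * t)          # made-up envelope shared by one source
swell = t ** 2                    # made-up envelope of a different source

same_source = envelope_correlation(1.0 * decay, 0.3 * decay)
other_source = envelope_correlation(decay, swell)
print(same_source, other_source)
```

Two harmonics that share a temporal shape correlate strongly regardless of their levels, while envelopes with different shapes do not.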

Page 63

Outline of presentation

•  Cocktail party problem

•  Computational Auditory Scene Analysis (CASA)

•  Harmonic instrument separation based upon CASA

Page 64

Sinusoids

Page 65

Fourier Transform

•  The Fourier transform breaks a signal into a sum of sinusoids
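The decomposition into sinusoids can be illustrated with a discrete Fourier transform of a synthetic signal. This is a minimal sketch, assuming an 8000 Hz sample rate and a one-second signal made of two sinusoids at 220 Hz and 440 Hz; all of these values are chosen for illustration only.

```python
import numpy as np

sr = 8000                       # sample rate in Hz (assumed)
t = np.arange(sr) / sr          # one second of sample times
signal = 1.0 * np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
# Scale so that each bin holds the amplitude of the matching sinusoid
magnitude = 2 * np.abs(spectrum) / len(signal)

# The two largest bins sit at the frequencies of the two sinusoids
peak_bins = np.argsort(magnitude)[-2:]
peak_freqs = sorted(freqs[peak_bins].tolist())
print(peak_freqs, magnitude[peak_bins])
```

Because both frequencies fall exactly on analysis bins here, the spectrum has two clean peaks whose heights match the amplitudes of the sinusoids; real instrument recordings would show broader peaks.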

Page 66

Sinusoid Model

Page 67

Phase Change

Page 68

Phase Change

Page 69

Sinusoidal Model

Page 70

Sinusoidal Model

Page 71

Sinusoidal Model

Page 72

Sinusoidal Model

Page 73

Sinusoidal Model