Copyright Nov. 2002, George Tzanetakis Digital Music & Music Processing George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University [email protected] http://www.cs.cmu.edu/~gtzan

Dec 19, 2015

Page 1:

Copyright Nov. 2002, George Tzanetakis

Digital Music & Music Processing

George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University

[email protected] http://www.cs.cmu.edu/~gtzan

Page 2:

Overview

Music Information Retrieval (MIR) and Computer Audition

Motivation

Techniques

Applications

Computer Music and Sound Synthesis

Examples, demos

Page 3:

MIR Music History

Timeline: 9000 B.C., 1000, 1700, 1877, 1960, 2002

Page 4:

Music

4 million recorded CD tracks

4000 CDs / month

Mp3 bandwidth %

Global

Pervasive

Persistent

Why?

Page 5:

The future of MIR

Library of all recorded music

Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate

Examples:

Page 6:

Audio MIR Pipeline

Hearing: Representation (Signal Processing)

Understanding: Analysis (Machine Learning)

Reacting: Interaction (Human Computer Interaction)

Page 7:

Traditional Music Representations

Page 8:

Time domain waveform (pressure vs. time)

Decompose into building blocks (frequency vs. time)

Page 9:

MIDI

Musical Instrument Digital Interface

Hardware interface

File format

Note events: duration, discrete pitch, “instrument”

Extensions: General MIDI

Notation, OMR, continuous pitch
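
As a rough illustration of the note-event idea on this slide, the sketch below models one event as a small record and converts its discrete pitch to a frequency with the standard equal-temperament formula (note 69 = A4 = 440 Hz). The `NoteEvent` class and its field names are hypothetical, chosen for this example only; they are not a MIDI file parser or any library's API.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """One note event (hypothetical record, not a MIDI file structure)."""
    pitch: int        # MIDI note number, 0-127 (69 = A4)
    onset: float      # start time in seconds (assumed field)
    duration: float   # length in seconds
    instrument: int   # program number, e.g. a General MIDI program


def midi_to_hz(pitch: int, a4: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number."""
    return a4 * 2.0 ** ((pitch - 69) / 12.0)


note = NoteEvent(pitch=60, onset=0.0, duration=0.5, instrument=0)
freq = midi_to_hz(note.pitch)   # middle C, roughly 261.63 Hz
```

Because pitch is an integer, anything between semitones (vibrato, glides) is lost, which is one reason the slide lists continuous pitch among the extensions.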

Page 10:

Symbolic vs Audio MIR

Audio -> Polyphonic Transcription -> Symbolic Representation (MIDI) -> MIR

Audio -> Computer Audition -> MIR

Machine Learning Models

Page 11:

Feature extraction

Page 12:

Timbral Texture

Timbre = what differentiates sounds of the same loudness and pitch

Timbral Texture = what differentiates mixtures of sounds

Global, statistical and fuzzy properties

Page 13:

Spectrum

(Figure: frames t and t+1, each labeled M)

Page 14:

Fourier Transform

P = 1/f (the period P is the reciprocal of the frequency f)
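
A minimal sketch of the P = 1/f relation, assuming NumPy and a synthetic test tone: the discrete Fourier transform of a sinusoid peaks at its frequency, and the reciprocal of that frequency gives the period. The sample rate and tone frequency are arbitrary values picked for the example.

```python
import numpy as np

sr = 8000                      # sample rate in Hz (assumed for the example)
f = 200.0                      # frequency of the test tone in Hz
t = np.arange(sr) / sr         # one second of sample times
x = np.sin(2 * np.pi * f * t)  # sinusoid that repeats every P = 1/f seconds

spectrum = np.abs(np.fft.rfft(x))            # magnitude of the DFT
freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)  # bin centre frequencies in Hz
peak_hz = freqs[np.argmax(spectrum)]         # strongest frequency component
period = 1.0 / peak_hz                       # P = 1/f, in seconds
```

With one second of audio the bins are 1 Hz apart, so the 200 Hz tone falls exactly on a bin and the recovered period is 5 ms.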

Page 15:

Short Time Fourier Transform

STFT Filterbank interpretation

Filters Oscillators

Amplitude

Frequency

output
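
The filterbank reading of the STFT can be sketched as follows, assuming NumPy, a Hann window, and window and hop sizes chosen only for illustration. Each column of the result can be viewed as the amplitude over time at the output of one band-pass filter centred on that bin's frequency. The function name `stft_mag` is hypothetical.

```python
import numpy as np


def stft_mag(x, win=512, hop=256):
    """Magnitude STFT: rows are time frames, columns are frequency bins."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


sr = 8000                                  # assumed sample rate in Hz
t = np.arange(sr) / sr
# Test signal: 500 Hz for the first half second, 1500 Hz afterwards
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

M = stft_mag(x)
bin_hz = sr / 512                          # spacing between filter centres
first_peak = np.argmax(M[0]) * bin_hz      # dominant band in the first frame
last_peak = np.argmax(M[-1]) * bin_hz      # dominant band in the last frame
```

Unlike a single Fourier transform of the whole signal, the frame-by-frame view shows when the energy moves from the 500 Hz band to the 1500 Hz band.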

Page 16:

Short Time Fourier Transform II

(Figure: frames t and t+1, each labeled M)

Page 17:

Formants

From “Real Sound Synthesis for Interactive Applications”, P. Cook, A K Peters, used by permission

Page 18:

Linear Prediction Coefficients

Source: impulses @ f0, white noise

Filter: lossless tubes

Source -> Filter -> Speech
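
One common way to obtain linear prediction coefficients is the autocorrelation method, sketched here under stated assumptions: NumPy only, a direct solve of the normal equations rather than the Levinson-Durbin recursion, and a synthetic source-filter signal (a single impulse driving a known two-pole filter). The function name `lpc` and the filter coefficients are illustrative choices, not taken from the slides.

```python
import numpy as np


def lpc(x, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum_k a[k] * x[n-1-k]."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], where R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1:])


# Source-filter toy: one impulse (the source) through a known all-pole filter
true_a = np.array([1.3, -0.8])   # assumed, stable two-pole filter
x = np.zeros(400)
x[0] = 1.0
for n in range(1, len(x)):
    x[n] += true_a[0] * x[n - 1]
    if n >= 2:
        x[n] += true_a[1] * x[n - 2]

est_a = lpc(x, order=2)          # should land very close to true_a
```

The recovered coefficients describe the filter (the spectral envelope, hence the formants) independently of the source that excited it.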

Page 19:

MPEG Audio Coding (mp3)

Analysis Filterbank

Psychoacoustic Model

Available bits

32 linearly spaced bands

Encoder: Slower, Complicated

Decoder: Faster, Simpler

Perceptual Audio Coding

Page 20:

Spectral Shape

(Figure: frame t, labeled M)

Centroid

Rolloff

Flux

RMS

Moments…
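
The spectral shape descriptors named on this slide can be sketched for a single magnitude frame as follows. The definitions are common textbook forms and are assumptions here: centroid and rolloff are reported in bin units, the rolloff threshold is 85%, flux is the squared difference between normalised successive frames, and RMS is taken over the magnitudes as a rough loudness proxy. The function name `spectral_shape` is hypothetical.

```python
import numpy as np


def spectral_shape(mag, prev_mag, rolloff_pct=0.85):
    """Return [centroid, rolloff, flux, rms] for one magnitude frame."""
    bins = np.arange(len(mag))
    # Centroid: magnitude-weighted mean bin (the spectrum's centre of gravity)
    centroid = np.sum(bins * mag) / np.sum(mag)
    # Rolloff: first bin at which the cumulative magnitude reaches rolloff_pct
    cumulative = np.cumsum(mag)
    rolloff = int(np.searchsorted(cumulative, rolloff_pct * cumulative[-1]))
    # Flux: change in spectral shape between successive normalised frames
    a = mag / np.sum(mag)
    b = prev_mag / np.sum(prev_mag)
    flux = np.sum((a - b) ** 2)
    # RMS of the magnitudes (assumed definition, see lead-in)
    rms = np.sqrt(np.mean(mag ** 2))
    return np.array([centroid, rolloff, flux, rms])


prev = np.array([0.0, 1.0, 0.0, 0.0])   # all energy in bin 1
cur = np.array([0.0, 0.0, 1.0, 0.0])    # all energy moves to bin 2
features = spectral_shape(cur, prev)
```

Computing this vector for every frame turns the spectrogram into a short time series of numbers that later analysis stages can work with.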

Page 21:

Summary of Timbral Texture Features

Time-Frequency analysis

Signal Processing (STFT, DWT)

Source-filter (LPC)

Perceptual (MP3)

Spectral Shape to feature vector

Page 22:

Pitch Content

Harmony-melody = pitch concepts

Music theory score = music

Bridge to symbolic MIR

Automatic music transcription

Non-transcriptive arguments

Page 23:

Automatic Pitch Detection

P = 1/f

Time-domain

Frequency-domain

Perceptual

Zero crossings

Autocorrelation analysis = peaks of function correspond to dominant pitches
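
The autocorrelation idea on this slide can be sketched as below, assuming NumPy, a single monophonic frame, and a search range of 50 to 1000 Hz chosen for the example. The lag of the highest autocorrelation peak is taken as the period P, and the pitch follows from f = 1/P. The function name `autocorr_pitch` is hypothetical.

```python
import numpy as np


def autocorr_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Estimate the dominant pitch in Hz from the highest autocorrelation peak."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1 :]  # lags 0..len(x)-1
    lo = int(sr / fmax)          # shortest period considered, in samples
    hi = int(sr / fmin)          # longest period considered, in samples
    lag = lo + np.argmax(ac[lo : hi + 1])
    return sr / lag              # f = 1/P with P = lag / sr


sr = 8000                        # assumed sample rate in Hz
t = np.arange(2048) / sr
# 200 Hz fundamental plus a weaker second harmonic at 400 Hz
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0 = autocorr_pitch(x, sr)
```

The harmonic at 400 Hz does not mislead the estimate here because both components repeat together only every 40 samples, the period of the 200 Hz fundamental.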

Page 24:

Pitch Histograms

Jazz, Irish

Chroma - folded

Height - unfolded
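
A small sketch of the folded versus unfolded distinction, assuming the detected pitches have already been rounded to MIDI note numbers. The unfolded histogram keeps pitch height (one bin per note), while the folded histogram collapses octaves into 12 chroma bins with a modulo. The function name `pitch_histograms` and the note list are illustrative only.

```python
import numpy as np


def pitch_histograms(midi_notes):
    """Return (unfolded, folded) pitch histograms for a list of MIDI notes."""
    notes = np.asarray(midi_notes, dtype=int)
    unfolded = np.bincount(notes, minlength=128)     # height: one bin per note
    folded = np.bincount(notes % 12, minlength=12)   # chroma: octaves collapsed
    return unfolded, folded


notes = [60, 64, 67, 72, 76, 48]   # C4 E4 G4 C5 E5 C3 (made-up example)
unfolded, folded = pitch_histograms(notes)
```

In this example the three C notes land in different unfolded bins but share chroma bin 0, so the folded histogram emphasises which pitch classes dominate regardless of register.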

Page 25:

Automatic Music Transcription

Original, Transcribed

Mixture signal

Noise suppression

Predominant pitch estimation

Remove detected sound

Estimate # of voices
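
The estimate-and-remove loop listed on this slide can be illustrated with a deliberately simplified toy. It works on pure tones only, treats the strongest spectral peak as the "predominant pitch", removes it by zeroing neighbouring bins, and stops when the residual falls below a floor. This is a stand-in for the idea, not the transcription system the slide refers to; the function name `iterative_peaks` and all parameters are assumptions.

```python
import numpy as np


def iterative_peaks(x, sr, max_voices=4, floor=0.1, width=3):
    """Toy loop: pick the strongest spectral peak, remove it, repeat."""
    window = np.hanning(len(x))
    residual = np.abs(np.fft.rfft(x * window))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    threshold = floor * residual.max()      # stop once peaks are this weak
    found = []
    for _ in range(max_voices):
        k = int(np.argmax(residual))
        if residual[k] < threshold:
            break
        found.append(float(freqs[k]))       # predominant frequency estimate
        # "Remove detected sound": zero the peak and its neighbouring bins
        residual[max(0, k - width) : k + width + 1] = 0.0
    return found


sr = 8000                                   # assumed sample rate in Hz
t = np.arange(4096) / sr
mixture = np.sin(2 * np.pi * 250 * t) + 0.6 * np.sin(2 * np.pi * 625 * t)
voices = iterative_peaks(mixture, sr)       # number of voices = len(voices)
```

Real instrument sounds have many harmonics that overlap between voices, which is what makes the removal step hard in practice and is not captured by this toy.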

Page 26:

Rhythm

Movement in time

Origins in Poetry (iambic, trochaic)

Foot tapping definition

Hierarchical semi-periodic structure at multiple levels of detail

Links to motion, dance

Running vs global

Page 27:

Self similarity

DWT

Autocorrelation

Peak Picking

Beat Histograms

Envelope Extraction
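
A reduced sketch of the blocks named on this slide, under loud assumptions: the DWT band split is skipped entirely, the envelope is obtained by full-wave rectification, a one-pole low-pass filter and downsampling, and only the single strongest autocorrelation peak is picked. The function names, the 40 to 200 BPM range and the synthetic click track are all illustrative.

```python
import numpy as np


def envelope(x, decim=16, alpha=0.99):
    """Full-wave rectify, one-pole low-pass, downsample, remove the mean."""
    rect = np.abs(x)
    smooth = np.empty_like(rect)
    acc = 0.0
    for i, v in enumerate(rect):
        acc = (1 - alpha) * v + alpha * acc
        smooth[i] = acc
    env = smooth[::decim]
    return env - env.mean()


def tempo_bpm(x, sr, decim=16, bpm_range=(40, 200)):
    """Tempo in BPM from the strongest envelope autocorrelation peak."""
    env = envelope(x, decim)
    ac = np.correlate(env, env, mode="full")[len(env) - 1 :]
    env_sr = sr / decim                       # envelope sample rate
    lo = int(env_sr * 60 / bpm_range[1])      # lag of the fastest tempo
    hi = int(env_sr * 60 / bpm_range[0])      # lag of the slowest tempo
    lag = lo + np.argmax(ac[lo : hi + 1])
    return 60.0 * env_sr / lag


sr = 4000                                     # assumed sample rate in Hz
x = np.zeros(sr * 8)
x[::2000] = 1.0                               # a click every 0.5 s (120 BPM)
bpm = tempo_bpm(x, sr)

beat_hist = np.zeros(201)                     # one bin per BPM value, 0..200
beat_hist[int(round(bpm))] += 1               # accumulate the detected peak
```

A full beat histogram would repeat this over many analysis windows and several peaks per window, so that secondary periodicities (half tempo, double tempo) also show up as bins.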

Page 28:

Beat Histograms

Page 29:

Analysis

Classification

Segmentation

Similarity Retrieval

Clustering

Thumbnailing

Fingerprinting

Page 30:

Analysis Overview

Musical Piece

Trajectory

Point

Page 31:

Query-by-example Content-based Retrieval

Ranked list of k nearest neighbors
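
Query-by-example retrieval over feature vectors can be sketched as a nearest-neighbour search. The example assumes each piece has already been reduced to one feature vector and uses plain Euclidean distance, which is an assumption; the collection values and the function name `k_nearest` are made up for illustration.

```python
import numpy as np


def k_nearest(query, collection, k=3):
    """Indices and distances of the k collection vectors closest to the query."""
    dists = np.linalg.norm(collection - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]


# Hypothetical 2-D feature vectors, one row per piece in the collection
collection = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
query = np.array([0.9, 0.1])                 # feature vector of the example
ranked, distances = k_nearest(query, collection, k=3)
```

The returned indices form the ranked list: the first entry is the closest match, and the distances can be shown to the user as a rough similarity score.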

Page 32:

QBE examples

Rock: Beatles

Jazz: Bobby Hutcherson

Funk: Mano Negra

World: Tibetan singer

Computer Music: Paul Lansky

Query Match

Page 33:

Automatic Musical Genre Classification

Categorical music descriptions created by humans

Fuzzy boundaries

Statistical properties: timbral texture, rhythmic structure, harmonic content

Automatic musical genre classification

Evaluate musical content features

Structure audio collections

Page 34:

Genregram demo

Dynamic real-time visualization for classification of radio signals

Page 35:

Audio segmentation

Detect changes of audio texture

Page 36:

Multifeature automatic segmentation methodology

Time series of feature vector v(t)

Detect abrupt changes in trajectory
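
A minimal sketch of the two steps on this slide, assuming NumPy: the frame-to-frame distance along the feature trajectory v(t) is computed, and frames where it exceeds the mean by a few standard deviations are reported as boundaries. Euclidean distance and the threshold rule are stand-in assumptions, and `segment_boundaries` is a hypothetical name.

```python
import numpy as np


def segment_boundaries(v, n_std=3.0):
    """Frame indices where the feature trajectory jumps abruptly."""
    step = np.linalg.norm(np.diff(v, axis=0), axis=1)   # |v(t) - v(t-1)|
    threshold = step.mean() + n_std * step.std()
    return [int(i) + 1 for i in np.flatnonzero(step > threshold)]


# Synthetic trajectory: 50 frames of one texture, then 50 frames of another
v = np.vstack([np.tile([0.0, 0.0], (50, 1)), np.tile([1.0, 1.0], (50, 1))])
v = v + 0.01 * np.sin(np.arange(100))[:, None]   # small deterministic wiggle
boundaries = segment_boundaries(v)
```

Because the decision uses all features at once, a change that is modest in each individual feature can still produce a clear jump in the combined trajectory.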

Page 37:

Context & Content Aware User Interfaces

Automatic results not perfect

Music listening is personal and subjective

Browsing vs retrieval

“Overview, zoom and filter, details on demand”, Shneiderman mantra

Adapt UI to music content and context

Computer Audition

Visualization

Page 38:

Content and Context

Content ~ file: genre, male voice, saxophone

Content ~ file, collection: similarity, slow-fast

Multiple visualizations: same content, different context

Page 39:

Timbregrams

Content & Context Similarity + Time Structure

Principal Component Analysis

Map feature vectors to color
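
The mapping from feature vectors to color can be sketched with a plain PCA, under assumptions: NumPy's SVD is used for the principal components, the first three components are rescaled to [0, 1] and read as red, green and blue, and the feature matrix is synthetic. Whether the original Timbregrams use three components or this scaling is not stated on the slide; `timbregram_colors` is a hypothetical name.

```python
import numpy as np


def timbregram_colors(features):
    """Map each feature vector (row) to an RGB triple via its top 3 PCs."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, strongest first
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T
    span = proj.max(axis=0) - proj.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    return (proj - proj.min(axis=0)) / span    # each channel scaled to [0, 1]


# Synthetic time series of 4-D feature vectors, one row per analysis window
t = np.linspace(0.0, 1.0, 20)
features = np.column_stack([t, t ** 2, np.sin(6 * t), np.cos(6 * t)])
colors = timbregram_colors(features)           # one RGB row per window
```

Drawing the rows as consecutive colored stripes gives a picture in which similar-sounding stretches share a color and changes of texture appear as color changes over time.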

Page 40:

Timbrespaces

Page 41:

Islands of Music

Page 42:

Auditory Scene Analysis