Classifying Motion Picture Audio

Post on 31-Jan-2016

Eirik Gustavsen, 07.06.07

Outline

• Motivation
• Thesis
• State of the Art
• Proposed system
• Experimental setup
• Results
• Future work
• Conclusion

Motivation

• Most projects classify clear classes or classes with noise.

• Few clear boundaries in motion picture audio
• Subjective descriptions of movies
• Difficult to compare movie content

Thesis

It is possible to automatically create a table of contents of a motion picture, based on its audio track only.

Research questions

• Find the best low level descriptors (LLDs) to classify motion picture audio

• Detect boundaries between audio classes within complex audio segments

• Automatically create a TOC based on the audio track only

Pre-Processing

• 44100 Hz sample rate
• Mono
• 16 bits
• 30 ms windows (LW)
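The window size implied by these settings can be checked with a short sketch. The sample rate and the 30 ms window come from the slide; the averaging downmix, the non-overlapping hop, and all function names are assumptions made for illustration.

```python
SAMPLE_RATE = 44100  # Hz, from the slide
WINDOW_MS = 30       # window length (LW), from the slide


def window_length(sample_rate=SAMPLE_RATE, window_ms=WINDOW_MS):
    """Number of samples in one analysis window."""
    return sample_rate * window_ms // 1000


def to_mono(left, right):
    """Average two channels into one (assumed downmix; the slide only says 'Mono')."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def frame_signal(samples, length, hop=None):
    """Split samples into windows of `length` samples.

    Assumption: windows do not overlap unless a hop is given.
    A trailing partial window is dropped.
    """
    hop = hop or length
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, hop)]


one_second = [0] * SAMPLE_RATE
frames = frame_signal(one_second, window_length())
print(window_length(), len(frames))
```

At 44100 Hz a 30 ms window holds 1323 samples, so one second of audio yields 33 complete windows.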

Low Level Descriptors

• Time domain
• Frequency domain

• Total of 23 low level descriptors

TIME DOMAIN

• Audio Power
• Audio Wave Form
• Root-Mean Square
• Short Time Energy
• Low Short Time Energy Ratio
• Zero-Crossing Rate
• High Zero-Crossing Rate Ratio
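A few of the time-domain descriptors can be sketched per window as follows. The slides do not give formulas, so these are common textbook definitions. The 0.5 and 1.5 thresholds in the two ratio functions are assumptions, not values taken from the presentation.

```python
import math


def rms(frame):
    """Root-mean square amplitude of one window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def short_time_energy(frame):
    """Mean squared amplitude of one window (some definitions use the sum)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def low_ste_ratio(energies, factor=0.5):
    """Share of windows whose energy is below factor * mean (assumed threshold)."""
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)


def high_zcr_ratio(zcrs, factor=1.5):
    """Share of windows whose ZCR is above factor * mean (assumed threshold)."""
    mean = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean) / len(zcrs)
```

The two ratio descriptors are computed over a run of windows (for example one second of audio) rather than over a single window.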

FREQUENCY DOMAIN

• Audio Spectrum Centroid
• Fundamental Frequency
• 10 Mel-Frequency Cepstral Coefficients
• Spectrum Flux
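Two of the frequency-domain descriptors can be illustrated with a small sketch. It is a simplification: it uses a naive DFT and a plain linear-frequency centroid. If the presentation's Audio Spectrum Centroid follows the MPEG-7 definition, that descriptor works on a log-frequency power spectrum instead. Function names are illustrative.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a demo)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz; 0.0 for a silent window."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_flux(prev_mags, mags):
    """Squared change in the magnitude spectrum between adjacent windows."""
    return sum((m - p) ** 2 for p, m in zip(prev_mags, mags))


# A pure tone sitting exactly on DFT bin 4 of a 64-sample window.
n, sample_rate = 64, 6400
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mags = magnitude_spectrum(tone)
print(round(spectral_centroid(mags, sample_rate, n), 3))
```

For real 1323-sample windows an FFT library would replace the naive DFT.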

Dimensionality reduction

Principal components analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.

f(1), f(2), f(3), f(4), f(5), ..., f(23) → PCA → d(1), d(2), d(3)
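A minimal sketch of the reduction from 23 descriptor values to 3 dimensions is given here. It is not the presentation's implementation: it assumes mean-centred data, finds the components by power iteration with deflation, and uses a fixed iteration count, all chosen to keep the example dependency-free.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.uniform(0.5, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca(rows, n_components=3):
    """Project rows onto their first n_components principal directions."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, components
```

Calling `pca(rows)` on a list of 23-value descriptor vectors returns one 3-value vector per window, matching f(1)...f(23) and d(1)...d(3) above.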

K Nearest Neighbors
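As an illustration of the classifier named on this slide, a k-nearest-neighbours vote over the reduced feature vectors might look like the sketch below. The value k=3, the Euclidean distance, and the example points are assumptions; the slides do not state them.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-dimensional training vectors, e.g. PCA output d(1)..d(3).
train_points = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (0.1, 0.2, 0.0),
                (5.0, 5.1, 4.9), (5.2, 4.8, 5.0), (4.9, 5.0, 5.1)]
train_labels = ["speech", "speech", "speech", "music", "music", "music"]

print(knn_classify(train_points, train_labels, (0.2, 0.1, 0.0)))
```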

Proposed system

Pre-Processing → LLD → Norm → PCA → KNN → Post-Processing → TOC Generation
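The final stage could be sketched as follows, assuming the classifier emits one label per 30 ms window. The presentation lists TOC generation as work still to be done, so the entry format and function names here are hypothetical.

```python
from itertools import groupby


def build_toc(labels, window_ms=30):
    """Merge runs of identical window labels into (start_ms, label) entries."""
    toc, index = [], 0
    for label, run in groupby(labels):
        toc.append((index * window_ms, label))
        index += len(list(run))
    return toc


def format_time(ms):
    """Render milliseconds as HH:MM:SS."""
    total = ms // 1000
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


labels = ["music"] * 100 + ["speech"] * 200 + ["music"] * 50
for start_ms, label in build_toc(labels):
    print(format_time(start_ms), label)
```

Working in integer milliseconds avoids floating-point drift when many windows are accumulated.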

Classifying Audio

Speech

Noise (white)

Music

”Silence”

Mixed audio classes

Class Boundary Detection
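The transcript does not describe the detection method, so the following is only a generic sketch: smooth the per-window labels with a majority filter (standing in for the post-processing step), then report every index where the label changes. The filter radius is an assumption.

```python
from collections import Counter


def smooth_labels(labels, radius=2):
    """Replace each label by the majority label in a window around it."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius):i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def find_boundaries(labels):
    """Indices at which the label differs from the previous window."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# One misclassified window at index 5, then a real change at index 11.
raw = ["speech"] * 5 + ["music"] + ["speech"] * 5 + ["music"] * 6
print(find_boundaries(raw))
print(find_boundaries(smooth_labels(raw)))
```

Without smoothing, the single misclassified window produces two spurious boundaries.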

Finding most suitable LLDs

Most Suitable:

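• Audio Spectrum Centroid (ASC)
• Audio Wave Form (AWF)
• Root-Mean Square (RMS)
• High Zero-Crossing Rate Ratio (HZCRR)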
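How the descriptors were ranked is not stated in the transcript. One plausible way to compare descriptor subsets, shown purely as an assumption, is to score each subset by leave-one-out nearest-neighbour accuracy and keep the best one. The data below is made up for the demonstration.

```python
import math
from collections import Counter
from itertools import combinations


def loo_accuracy(rows, labels, columns, k=1):
    """Leave-one-out k-NN accuracy using only the given feature columns."""
    correct = 0
    for i, row in enumerate(rows):
        query = [row[c] for c in columns]
        neighbours = sorted(
            (math.dist([r[c] for c in columns], query), lab)
            for j, (r, lab) in enumerate(zip(rows, labels)) if j != i
        )
        vote = Counter(lab for _, lab in neighbours[:k]).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / len(rows)


def best_subset(rows, labels, names, size):
    """Return (accuracy, descriptor names) for the best subset of a given size."""
    scored = [
        (loo_accuracy(rows, labels, cols), [names[c] for c in cols])
        for cols in combinations(range(len(names)), size)
    ]
    return max(scored, key=lambda s: s[0])


# Made-up data: the first column separates the classes, the second does not.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
labels = ["speech", "speech", "speech", "music", "music", "music"]
print(best_subset(rows, labels, ["ASC", "other"], size=1))
```

An exhaustive search like this grows quickly with the number of descriptors, so with 23 of them a greedy forward selection would be the more practical variant.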

Sample Results

Music with low volume

Clear speech

Speech with background environmental sounds

Fading between music and speech

Speech with background music

Jingle

”Some mistakes”

Future Work

• To be done in this thesis
  – Post processing
  – TOC

• Open research questions for future work
  – New motion picture audio classes
  – Detecting sound objects
  – Speech recognition

Conclusion

• Pre-processing makes it possible to classify motion picture audio correctly

• Using the right combination of LLDs improves the classification result

Questions

?
