Classifying Motion Picture Audio
Posted on 31-Jan-2016
Eirik Gustavsen, 07.06.07
Outline
• Motivation
• Thesis
• State of the Art
• Proposed system
• Experimental setup
• Results
• Future work
• Conclusion
Motivation
• Most projects classify clearly defined classes, or such classes with added noise
• Few clear boundaries in motion picture audio
• Subjective descriptions of movies
• Difficult to compare movie content
Thesis
It is possible to automatically create a table of contents of a motion picture, based on its audio track only.
Research questions
• Find the best low-level descriptors (LLDs) for classifying motion picture audio
• Detect boundaries between audio classes within complex audio segments
• Automatically create a table of contents (TOC) based on the audio track only
Pre-processing
• 44,100 Hz sample rate
• Mono
• 16 bits per sample
• 30 ms windows (LW)
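As an illustration of the pre-processing step, the following Python sketch (using NumPy) converts 16-bit PCM samples to floats, mixes stereo down to mono, and cuts the signal into 30 ms windows, which at 44,100 Hz is 1,323 samples per window. The function names are hypothetical, and the use of non-overlapping windows that drop a trailing partial window is an assumption; the slides do not say whether the windows overlap.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, as in the pre-processing step
WINDOW_MS = 30        # window length in milliseconds

def pcm16_to_float(samples):
    """Scale signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    return np.asarray(samples, dtype=np.int16).astype(np.float64) / 32768.0

def stereo_to_mono(stereo):
    """Average the two channels of an (n, 2) array into one mono channel."""
    return np.asarray(stereo, dtype=np.float64).mean(axis=1)

def frame_signal(signal, sample_rate=SAMPLE_RATE, window_ms=WINDOW_MS):
    """Split a mono signal into non-overlapping windows.

    Returns an array of shape (n_frames, frame_len). Trailing samples that
    do not fill a whole window are dropped (an assumption, not from the slides).
    """
    frame_len = int(sample_rate * window_ms / 1000)   # 1323 samples at 44.1 kHz
    n_frames = len(signal) // frame_len
    return np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
```

With these settings, one second of audio yields 33 full windows.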
Low-Level Descriptors
• Time domain and frequency domain
• Total of 23 low-level descriptors
TIME DOMAIN
• Audio Power
• Audio Wave Form
• Root-Mean Square
• Short Time Energy
• Low Short Time Energy Ratio
• Zero-Crossing Rate
• High Zero-Crossing Rate Ratio
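A minimal sketch of several of the time-domain descriptors, assuming each frame is a NumPy array of floats. RMS, short-time energy (here taken as the mean squared amplitude) and zero-crossing rate are computed per frame; HZCRR and LSTER are computed over a sequence of frames, using the thresholds commonly found in the audio-classification literature (1.5 times the mean ZCR and 0.5 times the mean STE). Those thresholds and all function names are assumptions, not taken from the thesis.

```python
import numpy as np

def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return float(np.sqrt(np.mean(np.square(frame))))

def short_time_energy(frame):
    """Short-time energy, taken here as the mean of the squared samples."""
    return float(np.mean(np.square(frame)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def hzcrr(frames, factor=1.5):
    """High Zero-Crossing Rate Ratio: share of frames whose ZCR exceeds
    `factor` times the mean ZCR of the sequence (assumed threshold)."""
    zcr = np.array([zero_crossing_rate(f) for f in frames])
    return float(np.mean(zcr > factor * zcr.mean()))

def lster(frames, factor=0.5):
    """Low Short-Time Energy Ratio: share of frames whose STE is below
    `factor` times the mean STE of the sequence (assumed threshold)."""
    ste = np.array([short_time_energy(f) for f in frames])
    return float(np.mean(ste < factor * ste.mean()))
```

Speech tends to alternate between voiced and unvoiced or silent frames, which is why ratios such as HZCRR and LSTER are computed over a run of frames rather than a single frame.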
FREQUENCY DOMAIN
• Audio Spectrum Centroid
• Fundamental Frequency
• 10 Mel-Frequency Cepstral Coefficients
• Spectrum Flux
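The sketch below shows two frequency-domain descriptors under stated simplifications: a power-weighted spectral centroid on a linear frequency axis (the MPEG-7 Audio Spectrum Centroid uses a logarithmic frequency scale instead, so this is a stand-in rather than the standard descriptor) and spectrum flux as the squared difference between consecutive normalized magnitude spectra. The Hann window and the function names are assumptions.

```python
import numpy as np

def magnitude_spectrum(frame):
    """Magnitude of the one-sided FFT of a Hann-windowed frame."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))

def spectral_centroid(frame, sample_rate=44100):
    """Power-weighted mean frequency in Hz (simplified, linear-frequency
    stand-in for the MPEG-7 Audio Spectrum Centroid)."""
    power = magnitude_spectrum(frame) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    return float((freqs * power).sum() / total) if total > 0 else 0.0

def _normalized_spectrum(frame):
    """Magnitude spectrum scaled to sum to 1 (left as-is for silent frames)."""
    mag = magnitude_spectrum(frame)
    total = mag.sum()
    return mag / total if total > 0 else mag

def spectral_flux(prev_frame, frame):
    """Sum of squared differences between consecutive normalized spectra."""
    a = _normalized_spectrum(prev_frame)
    b = _normalized_spectrum(frame)
    return float(np.sum((b - a) ** 2))
```

A steady tone gives a flux near zero between frames, while a change of sound source produces a larger value, which is what makes flux useful for separating music from speech.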
Dimensionality reduction
Principal component analysis (PCA) reduces a multidimensional data set to fewer dimensions for analysis by projecting it onto the directions of greatest variance.
f(1), f(2), f(3), f(4), f(5), ..., f(23)  →  PCA  →  d(1), d(2), d(3)
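The projection from 23 descriptors to three dimensions can be sketched with NumPy's SVD: center the feature matrix, take the first three right singular vectors as principal axes, and project onto them. The function names are hypothetical, and learning the axes from training frames and reusing them on new frames is an assumption about how the system applies PCA.

```python
import numpy as np

def pca_fit(features, n_components=3):
    """Learn a PCA projection from an (n_frames, n_features) matrix.

    Returns the per-feature means and the top `n_components` principal axes.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(features, mean, components):
    """Project feature vectors onto the learned principal axes."""
    return (features - mean) @ components.T
```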
K-Nearest Neighbors (KNN)
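A plain k-nearest-neighbors classifier is short enough to write directly. This sketch uses Euclidean distance and a majority vote; k = 3 and the function name are assumptions, since the slides give neither the value of k nor the distance measure.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query vector by majority vote of its k nearest training
    vectors, using Euclidean distance."""
    train_x = np.asarray(train_x, dtype=float)
    predictions = []
    for q in np.asarray(query_x, dtype=float):
        dists = np.linalg.norm(train_x - q, axis=1)
        nearest = np.argsort(dists)[:k]          # indices, closest first
        votes = Counter(train_y[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

Ties in the vote go to the label that appears first among the sorted neighbors, that is, the class with the closest member.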
Proposed system
Pre-processing → LLD → Norm → PCA → KNN → Post-processing → TOC generation
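The Norm stage is not described further; assuming it is a per-descriptor z-score normalization, it could look like the sketch below. Normalizing matters here because the descriptors live on very different scales (a centroid in hertz versus a zero-crossing rate between 0 and 1), and both PCA and Euclidean KNN would otherwise be dominated by the largest-valued descriptors. The function names are hypothetical.

```python
import numpy as np

def fit_zscore(train_features, eps=1e-12):
    """Per-descriptor mean and standard deviation from the training frames."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    return mean, np.maximum(std, eps)   # guard against constant descriptors

def apply_zscore(features, mean, std):
    """Rescale each descriptor to zero mean and unit variance."""
    return (features - mean) / std
```

Fitting the statistics on training data only and reusing them on new material keeps the test frames from influencing the scaling.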
Classifying Audio
• Speech
• Noise (white)
• Music
• “Silence”
Mixed audio classes
Class Boundary Detection
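The slides do not say how class boundaries are detected. One common approach, shown here as an assumed stand-in, compares the mean feature vector of the frames just before each position with that of the frames just after it, and reports local peaks of that distance above a threshold. The window half-width, the threshold and the function names are hypothetical.

```python
import numpy as np

def novelty_curve(features, half_width=10):
    """Distance between the mean feature vector of the `half_width` frames
    before each position and that of the `half_width` frames after it."""
    n = len(features)
    novelty = np.zeros(n)
    for i in range(half_width, n - half_width + 1):
        left = features[i - half_width:i].mean(axis=0)
        right = features[i:i + half_width].mean(axis=0)
        novelty[i] = np.linalg.norm(right - left)
    return novelty

def detect_boundaries(features, half_width=10, threshold=1.0):
    """Frame indices where the novelty is a local maximum above `threshold`."""
    nov = novelty_curve(np.asarray(features, dtype=float), half_width)
    return [i for i in range(1, len(nov) - 1)
            if nov[i] > threshold and nov[i] >= nov[i - 1] and nov[i] > nov[i + 1]]
```

With 30 ms frames, a boundary at frame index i lies roughly i × 0.03 s into the track.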
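Finding the most suitable LLDs
Most suitable:
• ASC (Audio Spectrum Centroid)
• AWF (Audio Wave Form)
• RMS (Root-Mean Square)
• HZCRR (High Zero-Crossing Rate Ratio)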
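The slides do not state how the most suitable LLDs were found. As one possible procedure, greedy forward selection adds descriptors one at a time, keeping the one that most improves the leave-one-out accuracy of a 1-nearest-neighbor classifier, and stops when no remaining descriptor helps. This is a standard wrapper method swapped in for illustration, not the thesis's procedure; the function names and the cap of four descriptors are assumptions.

```python
import numpy as np

def loo_1nn_accuracy(x, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # a frame may not vote for itself
    nearest = dists.argmin(axis=1)
    return float(np.mean(y[nearest] == y))

def forward_select(x, y, names, max_features=4):
    """Greedily add the descriptor column that most improves LOO accuracy."""
    chosen, best_acc = [], 0.0
    remaining = list(range(x.shape[1]))
    while remaining and len(chosen) < max_features:
        scores = [(loo_1nn_accuracy(x[:, chosen + [j]], y), j) for j in remaining]
        acc, j = max(scores)
        if acc <= best_acc:                  # stop when no descriptor helps
            break
        chosen.append(j)
        remaining.remove(j)
        best_acc = acc
    return [names[j] for j in chosen], best_acc
```

The pairwise distance matrix grows quadratically with the number of frames, so a search like this is only practical on a subsample of a film's frames.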
Sample Results
• Music with low volume
• Clear speech
• Speech with background environmental sounds
• Fading between music and speech
• Speech with background music
• Jingle
• “Some mistakes”
Future Work
• To be done in this thesis
  – Post-processing
  – TOC
• Open research questions for future work
  – New motion picture audio classes
  – Detecting sound objects
  – Speech recognition
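Post-processing and TOC generation are listed as still to be done, so the following is only a sketch of one way they might work: smooth the per-frame labels with a sliding majority vote to remove isolated misclassifications, then merge runs of equal labels into table-of-contents entries with start and end times. The 30 ms frame duration comes from the pre-processing step; the smoothing radius and the function names are assumptions.

```python
from collections import Counter

FRAME_SEC = 0.030   # one label per 30 ms window

def smooth_labels(labels, radius=5):
    """Replace each frame label by the most common label within +/- radius frames."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

def build_toc(labels, frame_sec=FRAME_SEC):
    """Merge runs of equal labels into (start_s, end_s, label) entries."""
    toc, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            toc.append((round(start * frame_sec, 3), round(i * frame_sec, 3), labels[start]))
            start = i
    return toc
```

For example, an isolated "speech" frame inside a stretch of music is absorbed by the smoothing, so it does not produce a spurious TOC entry.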
Conclusion
• Pre-processing makes it possible to classify motion picture audio correctly
• Using the right combination of LLDs improves the classification results
Questions?