An Auditory Classifier Employing a Wavelet Neural Network Implemented in a Digital Design
Thesis Defense
Jonathan Hughes
Department of Computer Engineering
Rochester Institute of Technology
Rochester, NY
August 11th, 2006
Overview
• Introduction
• Wavelets
• Feature Extraction
• Neural Networks
• Wavelet Neural Networks
• Results
• Conclusions
• Future Work
• Acknowledgements
• Questions
• Demonstration
Introduction
• Computer Systems
  – Can replace or improve upon human operators
• Auditory Processing
  – Voice recognition
  – Speaker recognition
  – Multimedia indexing
  – Sonar analysis
Introduction (cont.)
• Problem
  – Classify audio samples as either Voice or Music
  – But how to classify a time series?
• Wavelets
  – The wavelet transform reveals details of a time series with both time and frequency localization
• Feature Extraction
  – Need to extract meaningful features from wavelet coefficients
• Artificial Neural Networks
  – Excellent at classification tasks
  – Can classify the audio samples from the extracted features
• Wavelet Neural Network
Wavelets
• Signal analysis
  – Benefits from the Fourier Transform and, more recently, the Wavelet Transform
• Fourier Transform (early 1800s)
  – Superposition of sine and cosine functions
  – Reveals frequency information of a time series
  – Not very useful for localized, non-periodic signal analysis
Wavelets (cont.)
• Wavelet Transform (early 1900s)
  – Multi-resolution analysis
  – Small details are represented, as well as the gross features, and all scales in between
  – Uses a Mother Wavelet (Analyzing Wavelet) as the prototype function
  – Approximates the target function
    • Dilation
    • Translation
Wavelets (cont.)
[Figure: plot of the wavelet ψ00; horizontal axis from -0.5 to 1.5, vertical axis from -1.5 to 1.5]
Wavelets (cont.)
• Time-frequency components are generated by sets of scaling and wavelet functions
  – Low- and high-pass filters
  – Down-sampled high-pass filter outputs become wavelet detail coefficients
  – Down-sampled low-pass filter outputs are processed by the next level of filters
  – Final-stage low-pass filter output is the wavelet approximation coefficient
Wavelets (cont.)
• Digital Design Implementation
  – Designed to accept uncompressed digital audio
  – Sampled at 11.025 kHz
  – 16 bits per sample
  – Processed in blocks of 256 samples, or approximately 23 milliseconds
  – Requires log2(256) = 8 levels of filters to transform the audio data into wavelet coefficients
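The block size and level count above can be checked with a short software sketch. It is not the hardware design: it assumes Haar-style averaging and differencing filters, (a + b) / 2 for the low pass and (a - b) / 2 for the high pass, uses floating point rather than the 16-bit data path, and the name `haar_decompose` and the test signal are hypothetical.

```python
import math

def haar_decompose(block):
    """Decompose a power-of-two-length block with averaging/differencing filters.

    Returns (approximation, details), where details[0] holds the first
    (finest) level of detail coefficients.
    """
    n = len(block)
    levels = int(math.log2(n))
    assert 2 ** levels == n, "block length must be a power of two"
    low = list(block)
    details = []
    for _ in range(levels):
        # Pairing adjacent samples performs the down-sampling by two.
        pairs = list(zip(low[0::2], low[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # high-pass output
        low = [(a + b) / 2 for a, b in pairs]            # low-pass output
    return low[0], details

SAMPLE_RATE_HZ = 11025
BLOCK_SIZE = 256
block_ms = 1000 * BLOCK_SIZE / SAMPLE_RATE_HZ  # duration of one block

block = [float(i % 16) for i in range(BLOCK_SIZE)]  # hypothetical test signal
approx, details = haar_decompose(block)
sizes = [len(d) for d in details]
total = 1 + sum(sizes)  # one approximation plus all detail coefficients
```

Each level halves the data, so the eight levels produce 128, 64, 32, 16, 8, 4, 2 and 1 detail coefficients, which together with the single approximation coefficient account for all 256 outputs.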
Wavelets (cont.)
[Figure: low-pass and high-pass filter units, each built from 16-bit carry look-ahead adders and a multiplexer. $\text{LowPass} = \frac{X_a + X_b}{2}$, $\text{HighPass} = \frac{X_a - X_b}{2}$]
Wavelets (cont.)
[Figure: wavelet transform processor. Inputs AUDIO (16-bit), CLK, RESET and MODE; eight cascaded stages, each with a pair of 16-bit registers feeding a low-pass filter and a high-pass filter; an 8-bit counter controls the register loading; banks of 128, 64, 32, 16, 8, 4, 2, 1 and 1 4-bit save registers and matching 4-bit result registers; output: 256 wavelet coefficients]
Feature Extraction
• Data provided by the wavelet decomposition process
  – Multi-resolution and time- and frequency-localized
  – Not in an acceptable form for neural network processing
• Feature extractor
  – Generates meaningful features from the wavelet coefficients
  – Method of Stefan Pittner and Sagar V. Kamarthi
Feature Extraction (cont.)
• Feature extraction process
  – Requires initial setup steps that involve processing a training set of data
  – Customized for the data space of interest
  – Depends on the use of clusters to identify groups of wavelet coefficients
• Cluster Generation
Feature Extraction (cont.)
• Extracting the features
  – For each cluster, U_i, the feature, u_i, is calculated by taking the square root of the sum of the squares of each coefficient, v, in the cluster
  – Euclidean norm
  – $u_i = \sqrt{\sum_{v \in U_i} v^2}$
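The Euclidean-norm feature can be illustrated with a minimal sketch. The cluster index groups and coefficient values below are made up for illustration; the real cluster boundaries come from the training-set setup steps and are not reproduced here.

```python
import math

def extract_features(coefficients, clusters):
    """Return one feature per cluster: the Euclidean norm of its coefficients.

    `clusters` is a list of index lists; each list names the wavelet
    coefficients that belong to one cluster.
    """
    return [math.sqrt(sum(coefficients[i] ** 2 for i in cluster))
            for cluster in clusters]

# Hypothetical coefficients and cluster memberships, for illustration only.
coefficients = [3.0, 4.0, 0.0, -5.0, 12.0, 1.0]
clusters = [[0, 1], [2], [3, 4], [5]]
features = extract_features(coefficients, clusters)
```

Because the coefficients are squared, their sign does not affect the feature; only the energy within each cluster does.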
Feature Extraction (cont.)
• Digital Design Implementation
  – Cluster generation resulted in 34 clusters
  – The feature extractor module accepts wavelet coefficients as inputs
  – Allocates them to the cluster processors according to the cluster boundaries
Feature Extraction (cont.)
[Figure: feature cluster unit. Inputs CLK, RESET, MODE and twenty 4-bit wavelet coefficients; a 20-input mux driven by a 5-bit counter, a 4x4 multiplier, 8-bit registers with a carry look-ahead adder, and an 8-bit square root unit producing the 4-bit wavelet feature]
Feature Extraction (cont.)
[Figure: feature extractor. Inputs CLK, RESET, MODE and 256 4-bit wavelet coefficients; 34 feature cluster units with 20 inputs each; output: 34 4-bit wavelet features]
Neural Networks
• Neural networks can be applied to solve classification problems by means of a learning process
  – A solution to the classification problem can be found without the need for complex, often slow and inaccurate, algorithms
• Many different types of neural networks exist
  – Multi-layer perceptron, which has its basis in neural biology
Neural Networks (cont.)
• Multi-layer perceptron
  – Basic building block is the Neuron (perceptron)
    • Synapses are the inputs to the neuron
      – Each with its own weight in order to adjust the strength of the input
    • Adder component multiplies each input of the neuron by its respective weight and sums the results
      – This weighted sum is called the activation potential
      – $v_k = \sum_{j=0}^{m} w_{kj} x_j$
    • The activation function then applies a “squashing function” to the activation potential
      – Limits the permissible amplitude range of the output signal
      – $\varphi_k(v_k(n)) = a \tanh(b\, v_k(n))$
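A single neuron of this kind can be sketched in a few lines. The function name `neuron_output`, the default values of `a` and `b`, and the example inputs are placeholders, not values from the design; treating input 0 as a constant 1.0 (so that weight 0 acts as a bias) is likewise an assumption.

```python
import math

def neuron_output(inputs, weights, a=1.0, b=1.0):
    """Weighted sum of the inputs followed by a scaled tanh squashing function."""
    # Activation potential: each input multiplied by its weight, then summed.
    v = sum(w * x for w, x in zip(weights, inputs))
    # Squashing function: the output is confined to the range (-a, a).
    return a * math.tanh(b * v)

# Hypothetical example; inputs[0] = 1.0 plays the role of a bias input.
inputs = [1.0, 0.5, -0.25]
weights = [0.2, -0.4, 0.8]
y = neuron_output(inputs, weights)
```

However large the weighted sum becomes, the output magnitude never exceeds `a`, which is what allows a narrow fixed-width data path to carry the neuron output.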
Neural Networks (cont.)
• Multi-layer perceptron (cont.)
  – The number of layers and the number of neurons in each layer determine the number of decision regions that a multi-layer perceptron can define
    • One or more hidden layers (one of which is also known as the input layer)
    • Output layer
  – First trained with a known training set
Neural Networks (cont.)
• Multi-layer perceptron Training
  – Apply the known input to the input layer
  – Forward propagate the results through the other layers
    • Weights remain constant
  – Results from the output layer are then collected, and compared to the desired response
  – An error signal is calculated
Neural Networks (cont.)
• Multi-layer perceptron Training (cont.)
  – Error signal is then back propagated through the neural network, against the direction of the synaptic connections
  – Weights in the neural network are modified via back-propagation error-correction
  – Training continues until the weights of the neural network produce outputs that converge
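The training steps above correspond to textbook back-propagation, which can be sketched as follows. This is a generic illustration and not the novel training algorithm used in this work; the network size, learning rate, sample data and function names are all hypothetical.

```python
import math
import random

def forward(x, w1, w2):
    """Forward pass with constant weights; returns first-layer and output activations."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = [math.tanh(sum(w * hi for w, hi in zip(row, h))) for row in w2]
    return h, y

def train_step(x, target, w1, w2, lr=0.1):
    """One back-propagation update; returns the squared error before the update."""
    h, y = forward(x, w1, w2)
    err = [t - yi for t, yi in zip(target, y)]  # error signal
    # Output-layer local gradients: error times the tanh derivative (1 - y^2).
    d2 = [e * (1 - yi * yi) for e, yi in zip(err, y)]
    # Propagate the error back against the direction of the connections.
    d1 = [(1 - hj * hj) * sum(d2[k] * w2[k][j] for k in range(len(w2)))
          for j, hj in enumerate(h)]
    # Error-correction updates for both layers.
    for k, row in enumerate(w2):
        for j in range(len(row)):
            row[j] += lr * d2[k] * h[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] += lr * d1[j] * x[i]
    return sum(e * e for e in err)

random.seed(0)
n_in, n_first, n_out = 3, 3, 2  # hypothetical sizes, far smaller than the design
w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_first)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_first)] for _ in range(n_out)]
# Two made-up training samples; the first input is a constant 1.0 bias term.
samples = [([1.0, 0.9, 0.1], [0.8, -0.8]),
           ([1.0, 0.1, 0.9], [-0.8, 0.8])]
errors = []
for epoch in range(500):
    errors.append(sum(train_step(x, t, w1, w2) for x, t in samples))
```

Tracking the per-epoch error in `errors` gives a simple convergence check: training would stop once the error stops decreasing meaningfully.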
Neural Networks (cont.)
• Digital Design Implementation
  – 2-layer multi-layer perceptron
    • 34 input neurons
    • 2 output neurons
  – The 34 wavelet-features were fully connected to the 34 input neurons
  – Which were, in turn, fully connected to the output neurons
  – Each output neuron corresponds to one of the two result classes, voice and music
Neural Networks (cont.)
• The weights in the neural network were designed to be uploaded
  – Eliminates the need for training hardware in the design
  – Training was instead performed in a software simulation model
    • Novel algorithm
• Main neural network module
  – Contains 34 input neuron modules
  – 2 output neuron modules
  – Synchronization hardware
  – Result generator module
    • Generates VOICE, MUSIC, OTHER, and VALID signals
• Neuron module
  – Weight registers
  – Multiplier
  – Activation functions
    • Look-up based comparators
Neural Networks (cont.)
[Figure: neuron module. Inputs WEIGHTS (8-bit), CLK, RESET and 34 4-bit inputs; 34 8-bit weight registers with load lines generated from a 6-bit counter; a 34-input mux, an 8x4 multiplier, 12-bit registers with a carry look-ahead adder, and an activation function producing the 4-bit neuron output]
Neural Networks (cont.)
[Figure: neural network module. Inputs WEIGHTS (8-bit), CLK, RESET, MODE and 34 4-bit wavelet features; 34 input-layer neurons with 34 inputs each, whose outputs go to the inputs of the output-layer neurons; a 6-bit counter generates the select lines; a Determine Result block drives VALID, VOICE, MUSIC and OTHER]
Wavelet Neural Networks
• Applications of Wavelet Neural Networks are nearly as varied as their possible configurations
  – Function approximators
  – Signal classifiers
• Improve upon Artificial Neural Networks
  – Artificial Neural Networks have a limited ability to characterize local features of a time series
  – A wavelet function is used to condition the inputs to the neural network, such that only vital information about the signal is processed by the network
Wavelet Neural Networks (cont.)
• Digital Design Implementation
  – Discrete Haar wavelet processor
  – Feature extractor processor
  – 2-layer multi-layer perceptron consisting of 34 input neurons and 2 output neurons
  – Synchronization hardware
  – Mode Selection
    • Upload Weights
    • Classify Input
  – Imposed limitations to reduce hardware size
    • 4-bit data paths (where possible)
    • 8-bit weight registers (neural network)
Wavelet Neural Networks (cont.)
• Data Flow
  – 256 samples of 16-bit digital audio applied to the wavelet processor
    • Converted to 256 4-bit wavelet coefficients
  – 256 wavelet coefficients applied to the feature extractor processor
    • Resulting in 34 4-bit wavelet-features
  – 34 wavelet-features applied to the neural network processor
    • Wavelet-features were fully connected to the 34 input neurons
    • Which were, in turn, fully connected to the 2 output neurons
    • Each output neuron corresponds to one of the two result classes, voice and music
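The data flow can be traced end to end in a floating-point sketch that tracks only the stage sizes (256 samples, 256 coefficients, 34 features, 2 outputs). Everything specific here is an assumption made for illustration: the evenly spaced cluster boundaries, the random weights, the 440 Hz test tone and the function names. The 4-bit quantization of the hardware is not modeled.

```python
import math
import random

def haar_coefficients(block):
    """Return [approximation, coarsest detail, ..., finest details]."""
    low, coeffs = list(block), []
    while len(low) > 1:
        pairs = list(zip(low[0::2], low[1::2]))
        coeffs = [(a - b) / 2 for a, b in pairs] + coeffs  # high-pass outputs
        low = [(a + b) / 2 for a, b in pairs]              # low-pass outputs
    return low + coeffs

def cluster_features(coeffs, boundaries):
    """Euclidean norm of each contiguous slice coeffs[start:stop]."""
    return [math.sqrt(sum(c * c for c in coeffs[s:e])) for s, e in boundaries]

def layer(inputs, weights):
    """Fully connected layer of tanh neurons, one weight row per neuron."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

random.seed(1)
N_SAMPLES, N_FEATURES, N_OUTPUTS = 256, 34, 2
# Hypothetical, evenly spaced cluster boundaries (the trained ones differ).
edges = [round(i * N_SAMPLES / N_FEATURES) for i in range(N_FEATURES + 1)]
boundaries = list(zip(edges[:-1], edges[1:]))
# Random placeholder weights standing in for the uploaded, trained weights.
w_in = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
        for _ in range(N_FEATURES)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
         for _ in range(N_OUTPUTS)]

audio = [math.sin(2 * math.pi * 440 * n / 11025) for n in range(N_SAMPLES)]
coeffs = haar_coefficients(audio)               # 256 wavelet coefficients
features = cluster_features(coeffs, boundaries)  # 34 wavelet-features
outputs = layer(layer(features, w_in), w_out)    # 2 outputs: voice, music
```

With trained weights, the larger of the two outputs would indicate the predicted class; with the random placeholders used here, the outputs carry no meaning beyond confirming the shapes.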
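Wavelet Neural Networks (cont.)
[Figure: top-level wavelet neural network. Inputs AUDIO (16-bit), WEIGHTS (8-bit), CLK, RESET and MODE; the Wavelet Filter passes 256 4-bit wavelet coefficients to the Feature Extractor, which passes 34 4-bit wavelet features to the Neural Network; outputs VALID, VOICE, MUSIC and OTHER]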
Results
• The wavelet neural network was originally constructed as a software model
  – To experiment with network parameters
  – To determine neural network weight values
  – To determine ideal results
  – To provide a reference to verify correct hardware operation
• To conserve hardware resources, two additional models were created
  – 8-bit data paths, 8-bit weights (software sim only)
  – 4-bit data paths, 8-bit weights (digital design / sim)
Results (cont.)
• Wavelet Neural Network system was modeled in VHDL
• Synthesized with Synplicity Synplify
  – Actel ProASICPlus APA600 synthesized library cells
  – Target clock frequency of 11.025 kHz
  – 96,265 cells
  – Estimated max clock frequency of 15.6 MHz
Results (cont.)
Ideal Model
Data Set              Voice     Music     Other
Voice Training Set    98.76%     0.00%    1.24%
Voice Test Set        91.93%     2.88%    5.19%
Music Training Set     0.39%    98.26%    1.35%
Music Test Set         2.49%    90.23%    7.28%
Results (cont.)
8-bit weights, 8-bit data
Data Set              Voice     Music     Other
Voice Training Set    88.59%     7.79%    3.61%
Voice Test Set        86.74%    10.95%    2.31%
Music Training Set    13.44%    80.85%    5.71%
Music Test Set        10.15%    86.40%    3.45%
Results (cont.)
8-bit weights, 4-bit data
Data Set              Voice     Music     Other
Voice Training Set    80.23%    17.68%    2.09%
Voice Test Set        70.61%    28.82%    0.58%
Music Training Set    20.79%    77.47%    1.74%
Music Test Set        22.61%    76.44%    0.96%
Conclusions
• The trained digital design wavelet neural network was effective in correctly classifying the test data sets
• The novel design of the wavelet transform processor produced an efficient hardware design that was also a high performance pipeline
• The novel neural network training algorithm was effective in determining weight values that produced excellent classification results
Conclusions (cont.)
• The design of the hardware modules was straightforward to model in VHDL
• The synthesis was simple due to the low operating clock speed
• The ideal model of the wavelet neural network demonstrates what can be achieved with much larger hardware sizes
• A hardware implementation offers advantages over a purely software implementation
Future Work
• Implement the digital design in an FPGA with supporting circuitry on a PCB
• Add training hardware to the design
• Wavelet Neural Network configuration
  – Network parameters
  – Number of layers, number of neurons
  – Wavelet filter type
  – Feature extraction method
• Increase robustness of classifier by training with wider variety of audio samples
Future Work (cont.)
• Future areas of possible research
  – Speech recognition
  – Speaker recognition
  – Content-based music genre classification
Acknowledgements
• Dr. Kenneth W. Hsu (Computer Engineering)
• Dr. Pratapa V. Reddy (Computer Engineering)
• Dr. Marcin Łukowiak (Computer Engineering)
Acknowledgements (cont.)
• Dr. Roger Gaborski (Computer Science Department)
• Dr. Albert Titus (University at Buffalo)
• Anne DiFelice (Computer Engineering)
• Pam Steinkirchner (Computer Engineering)
• André Botha (Microsoft Corporation)
• Paul Brown (Intel Corporation)
• Stefan Pittner and Sagar V. Kamarthi (Northeastern University)
  – For allowing the use of their research in feature extraction
Acknowledgements (cont.)
• Heather Hughes
• Mr. John Hughes
• Mrs. Suellen Hughes
• Zachary Hughes
• Wendy Hughes
Questions
Demonstration
Demonstration Results
8-bit weights, 4-bit data
Data Set              Voice     Music     Other
Voice Training Set    80.23%    17.68%    2.09%
Voice Test Set        70.61%    28.82%    0.58%
Music Training Set    20.79%    77.47%    1.74%
Music Test Set        22.61%    76.44%    0.96%