An Auditory Classifier Employing a Wavelet Neural Network Implemented in a Digital Design Thesis Defense Jonathan Hughes [email protected] Department.

An Auditory Classifier Employing a Wavelet Neural Network Implemented in a Digital Design

Thesis Defense

Jonathan Hughes
[email protected]

Department of Computer Engineering
Rochester Institute of Technology
Rochester, NY
August 11th, 2006


Overview

• Introduction
• Wavelets
• Feature Extraction
• Neural Networks
• Wavelet Neural Networks
• Results
• Conclusions
• Future Work
• Acknowledgements
• Questions
• Demonstration

Introduction

• Computer Systems
  – Can replace or improve upon human operators

• Auditory Processing
  – Voice recognition
  – Speaker recognition
  – Multimedia indexing
  – Sonar analysis

Introduction (cont.)

• Problem
  – Classify audio samples as either Voice or Music
  – But how can a time series be classified?

• Wavelets
  – The wavelet transform reveals details of a time series with both time and frequency localization

• Feature Extraction
  – Meaningful features must be extracted from the wavelet coefficients

• Artificial Neural Networks
  – Excellent at classification tasks
  – Can classify the audio samples from the extracted features

• Wavelet Neural Network

Wavelets

• Signal analysis
  – Has benefited from the Fourier Transform and, more recently, the Wavelet Transform

• Fourier Transform (early 1800s)
  – Superposition of sine and cosine functions
  – Reveals the frequency content of a time series
  – Poorly suited to analyzing localized, non-periodic signals

Wavelets (cont.)

• Wavelet Transform (early 1900s)
  – Multi-resolution analysis
  – Small details are represented, as well as the gross features and all scales in between
  – Uses a mother wavelet (analyzing wavelet) as the prototype function
  – Approximates the target function through
    • Dilation
    • Translation
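Dilation and translation can be written as a family of wavelets generated from a single mother wavelet ψ. The dyadic indexing (j for scale, k for position) and the 2^{j/2} normalization below are the standard textbook convention and are assumed here; the Haar wavelet, which this design uses, serves as the example mother wavelet.

```latex
\psi_{j,k}(t) = 2^{j/2}\,\psi\!\left(2^{j}t - k\right), \qquad j,k \in \mathbb{Z}

\psi(t) =
\begin{cases}
 1,  & 0 \le t < \tfrac{1}{2} \\
 -1, & \tfrac{1}{2} \le t < 1 \\
 0,  & \text{otherwise}
\end{cases}
```

Increasing j compresses the wavelet (a finer scale), increasing k shifts it by 2^{-j}, and ψ_{0,0} is the mother wavelet itself.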

Wavelets (cont.)

Wavelets (cont.)

[Figure: mother wavelet ψ00, plotted for t from -0.5 to 1.5 with amplitude from -1.5 to 1.5]

Wavelets (cont.)

• Time-frequency components are generated by sets of scaling and wavelet functions
  – Low- and high-pass filters
  – Down-sampled high-pass filter outputs become the wavelet detail coefficients
  – Down-sampled low-pass filter outputs are processed by the next level of filters
  – The final-stage low-pass filter output is the wavelet approximation coefficient
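The filter-bank steps above can be sketched in Python. This is a minimal illustration, not the thesis implementation. It assumes the averaging form of the Haar filters, (a + b)/2 for the low pass and (a - b)/2 for the high pass (the orthonormal Haar filters scale by 1/√2 instead). It also assumes the output is ordered as the final approximation followed by the detail coefficients from coarsest to finest. The function names are hypothetical.

```python
def haar_step(x):
    """One filter level: low-pass (pairwise average) and high-pass
    (pairwise half-difference) outputs, each down-sampled by 2."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x):
    """Full pyramid decomposition of a block whose length is a power of two.

    The low-pass output of each level feeds the next level; the high-pass
    outputs are kept as detail coefficients.
    Returns [final approximation] + details (coarsest level first).
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # coarser levels go in front
    coeffs = list(approx)
    for d in details:
        coeffs.extend(d)
    return coeffs


print(haar_dwt([4, 2, 6, 8]))  # [5.0, -2.0, 1.0, -1.0]
```

A 256-sample block passes through eight levels of `haar_step` and yields 256 coefficients.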

Wavelets (cont.)

• Digital Design Implementation
  – Designed to accept uncompressed digital audio
  – Sampled at 11.025 kHz
  – 16 bits per sample
  – Processed in blocks of 256 samples, or approximately 23 milliseconds
  – Requires log2(256) = 8 levels of filters to transform the audio data into wavelet coefficients
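The block-size figures above can be checked with a few lines of arithmetic. The coefficient count per level follows from halving at each filter stage. It matches the register groups of 128, 64, 32, 16, 8, 4, 2, 1 and 1 in the wavelet filter block diagram, assuming the last 1 is the final approximation coefficient.

```python
import math

SAMPLE_RATE_HZ = 11_025
BLOCK = 256

duration_ms = 1000 * BLOCK / SAMPLE_RATE_HZ  # block length in milliseconds
levels = int(math.log2(BLOCK))               # number of filter levels

# Detail coefficients produced at each level (finest first),
# plus one final approximation coefficient.
detail_counts = [BLOCK >> (lvl + 1) for lvl in range(levels)]
total = sum(detail_counts) + 1

print(f"{duration_ms:.2f} ms, {levels} levels, {detail_counts} + 1 = {total}")
# 23.22 ms, 8 levels, [128, 64, 32, 16, 8, 4, 2, 1] + 1 = 256
```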

Wavelets (cont.)

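[Figure: Low Pass Unit and High Pass Unit. LowPass: (Xa + Xb)/2; HighPass: (Xa - Xb)/2. Each unit combines two 16-bit inputs in a carry look-ahead adder (carry-in Ci = 0 for the low pass, Ci = 1 for the high pass), and a mux built from bit 15 and bits 15-1 of the result performs the division by two.]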
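The low- and high-pass units above can be modeled at the integer level as follows. This is a sketch under stated assumptions. Inputs are taken as 16-bit two's-complement values, and the high-pass unit forms a - b as a + (NOT b) + 1, which is what a carry-in of 1 implies. The bit-15 / bits 15-1 selection is modeled as an arithmetic right shift by one. The exact mux wiring is not described, so this sketch computes an exact 17-bit intermediate sum to avoid overflow. All names are hypothetical.

```python
MASK16 = 0xFFFF


def to_signed16(u):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    u &= MASK16
    return u - 0x10000 if u & 0x8000 else u


def low_pass_unit(a, b):
    """(a + b) / 2 for 16-bit inputs: add with carry-in 0, then shift right."""
    s = to_signed16(a) + to_signed16(b)  # exact 17-bit sum
    return (s >> 1) & MASK16             # arithmetic shift keeps the sign


def high_pass_unit(a, b):
    """(a - b) / 2 for 16-bit inputs: a + NOT(b) + 1, then shift right."""
    not_b = to_signed16(~b & MASK16)     # bitwise NOT of b, i.e. -b - 1
    s = to_signed16(a) + not_b + 1       # carry-in of 1 completes a - b
    return (s >> 1) & MASK16


print(to_signed16(low_pass_unit(0x7FFF, 0x7FFF)))  # 32767, no overflow
print(to_signed16(high_pass_unit(2, 6)))           # -2
```

The right shift rounds toward negative infinity, which is also what a hardware shift of a two's-complement value does.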

Wavelets (cont.)

[Figure: Wavelet filter block diagram. Inputs AUDIO (16-bit), CLK, RESET, and MODE. 16-bit registers feed eight cascaded levels of Low Pass Filter and High Pass Filter units over 16-bit paths. An 8-bit counter drives the loading of all registers. The outputs are held as 4-bit values in save registers (128, 64, 32, 16, 8, 4, 2, 1, and 1) and copied to matching result registers that output the 256 wavelet coefficients.]

Feature Extraction

• Data provided by the wavelet decomposition process
  – Multi-resolution, and localized in both time and frequency
  – Not in a form suitable for direct neural network processing

• Feature extractor
  – Generates meaningful features from the wavelet coefficients
  – Based on the method of Stefan Pittner and Sagar V. Kamarthi

Feature Extraction (cont.)

• Feature extraction process
  – Requires initial setup steps that involve processing a training set of data
  – Customized for the data space of interest
  – Depends on the use of clusters to identify groups of wavelet coefficients

• Cluster Generation


• Extracting the features
  – For each cluster U_i, the feature u_i is calculated by taking the square root of the sum of the squares of each coefficient v_r in the cluster
  – Euclidean norm

$$u_i = \sqrt{\sum_{r \in U_i} v_r^{2}}$$


• Digital Design Implementation
  – Cluster generation found 34 clusters
  – The feature extractor module accepts wavelet coefficients as inputs
  – Allocates them to the cluster processors according to the cluster boundaries
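Once the clusters are known, the feature computation described above is a per-cluster Euclidean norm. In this minimal sketch, each cluster is given as a list of coefficient indices. The real cluster boundaries come from the training-set setup step and are not listed here, so the clusters in the example are hypothetical.

```python
import math


def extract_features(coeffs, clusters):
    """Return one Euclidean-norm feature per cluster.

    coeffs   -- wavelet coefficients for one block (e.g. 256 values)
    clusters -- list of index lists; each lists the coefficients in one cluster
    """
    features = []
    for indices in clusters:
        squares = (coeffs[r] ** 2 for r in indices)
        features.append(math.sqrt(sum(squares)))  # u_i = sqrt(sum of v_r^2)
    return features


# Hypothetical example: two clusters over six coefficients.
print(extract_features([3, 4, 0, 1, 2, 2], [[0, 1], [2, 3, 4, 5]]))  # [5.0, 3.0]
```

In the digital design, the same mapping reduces 256 coefficients to 34 features.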

Feature Extraction (cont.)

[Figure: Feature Cluster Unit. CLK, RESET, and MODE inputs. Up to 20 4-bit wavelet coefficients enter a 20-input mux driven by a 5-bit counter. A mux on bit 3 with a carry-in of 1 conditions each coefficient before a 4x4 multiplier. The 8-bit products are accumulated in an 8-bit register through an adder (Ci = 0). When done, an 8-bit square root produces the 4-bit wavelet feature.]

[Figure: Feature extractor. 34 Feature Cluster Units, each with 20 4-bit inputs, produce the 34 4-bit wavelet features. CLK, RESET, and MODE inputs.]
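A rough integer model of one Feature Cluster Unit follows the hardware described above. A counter steps through up to 20 four-bit coefficients, each one is squared by a 4x4 multiplier and accumulated in an 8-bit register, and an 8-bit square root yields the 4-bit feature. Two details are assumptions and are not taken from the design. The mux on bit 3 with a carry-in of 1 is read as an absolute-value stage (two's-complement negation). The accumulator is modeled as saturating at 255 because the design's overflow behavior is not stated.

```python
import math


def feature_cluster_unit(inputs):
    """Integer model of one cluster unit (hypothetical reading of the diagram).

    inputs -- up to 20 signed 4-bit coefficients (-8..7)
    Returns a 4-bit feature (0..15).
    """
    if len(inputs) > 20:
        raise ValueError("a cluster unit has at most 20 inputs")
    acc = 0  # 8-bit accumulator register
    for v in inputs:  # one coefficient per count of the 5-bit counter
        if not -8 <= v <= 7:
            raise ValueError("inputs must be signed 4-bit values")
        mag = -v if v < 0 else v             # ASSUMPTION: negate when bit 3 is set
        acc = min(acc + mag * mag, 255)      # ASSUMPTION: saturate at 8 bits
    return math.isqrt(acc)                   # 8-bit square root -> 4-bit feature


print(feature_cluster_unit([3, -4]))   # 5
print(feature_cluster_unit([7] * 20))  # 15 (accumulator saturated)
```

`math.isqrt` (Python 3.8+) returns the floor of the square root, which mirrors an integer square-root circuit.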

Neural Networks

• Neural networks can solve classification problems through a learning process
  – A solution can be found without the need for complex algorithms, which are often slow and inaccurate

• Many different types of neural networks exist
  – e.g., the multi-layer perceptron, which has its basis in neurobiology

Neural Networks (cont.)

• Multi-layer perceptron
  – Basic building block is the neuron (perceptron)

• Synapses are the inputs to the neuron
  – Each has its own weight, which adjusts the strength of that input

• An adder multiplies each input of the neuron by its respective weight and sums the results
  – This weighted sum is called the activation potential:

$$v_k = \sum_{j=0}^{m} w_{kj}\, x_j$$

• The activation function then applies a "squashing function" to the activation potential
  – Limits the permissible amplitude range of the output signal, where a and b are constants:

$$\varphi\big(v_k(n)\big) = a \tanh\big(b\, v_k(n)\big)$$
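The weighted sum and tanh squashing function described above fit in a few lines. In this sketch, x[0] is fixed at 1 so that w[0] acts as a bias, which is the usual reading of a sum that starts at j = 0. The default constants a = 1.7159 and b = 2/3 are a common textbook choice assumed for illustration, not values from the thesis.

```python
import math


def neuron(x, w, a=1.7159, b=2.0 / 3.0):
    """Single neuron: v = sum_j w[j] * x[j], output a * tanh(b * v).

    x[0] is conventionally 1, so w[0] acts as the bias term.
    a and b are assumed values chosen for illustration.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    v = sum(wj * xj for wj, xj in zip(w, x))  # activation potential
    return a * math.tanh(b * v)               # output bounded by +/- a


print(neuron([1, 0], [0, 5]))  # 0.0
```

However large the activation potential grows, the output never exceeds a in magnitude. This is the limited amplitude range the squashing function provides.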


• Multi-layer perceptron (cont.)
  – The number of layers and the number of neurons in each layer determine the number of decision regions that a multi-layer perceptron can define
    • One or more hidden layers (the first of which is also called the input layer)
    • Output layer
  – The network is first trained with a known training set


• Multi-layer perceptron training
  – Apply the known input to the input layer
  – Forward propagate the results through the other layers
    • Weights remain constant during this pass
  – Collect the results from the output layer and compare them to the desired response
  – Calculate an error signal

• Multi-layer perceptron training (cont.)
  – The error signal is then back-propagated through the neural network, against the direction of the synaptic connections
  – The weights are modified by back-propagation error correction
  – Training continues until the network's outputs converge
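The training loop described above can be sketched with standard batch back-propagation for a two-layer tanh network. This is the textbook algorithm and not the thesis's own training algorithm, which is not described here. The network size, learning rate, fixed epoch count used as the stopping rule, weight initialization and toy data are all assumptions.

```python
import numpy as np


def train_mlp(X, T, hidden=4, lr=0.5, epochs=500, seed=0):
    """Batch back-propagation for a 2-layer tanh network.

    X -- (n, d) inputs; T -- (n, c) targets in [-1, 1]
    Returns (W1, b1, W2, b2, losses).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = T.shape[1]
    W1 = rng.normal(0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, c))
    b2 = np.zeros(c)
    losses = []
    for _ in range(epochs):
        # Forward pass: weights are held constant.
        H = np.tanh(X @ W1 + b1)
        Y = np.tanh(H @ W2 + b2)
        E = T - Y  # error signal against the desired response
        losses.append(0.5 * np.mean(np.sum(E ** 2, axis=1)))
        # Backward pass: local gradients, output layer first.
        d2 = -E * (1 - Y ** 2)          # through the output tanh
        d1 = (d2 @ W2.T) * (1 - H ** 2)  # propagated back to the first layer
        # Error-correction weight updates (mean gradient over the batch).
        W2 -= lr * H.T @ d2 / n
        b2 -= lr * d2.mean(axis=0)
        W1 -= lr * X.T @ d1 / n
        b1 -= lr * d1.mean(axis=0)
    return W1, b1, W2, b2, losses


# Toy two-class data; each class has its own output neuron (+1 for "its" class).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
T = np.vstack([np.tile([1.0, -1.0], (50, 1)), np.tile([-1.0, 1.0], (50, 1))])
W1, b1, W2, b2, losses = train_mlp(X, T)
```

Here a fixed number of epochs stands in for "until the outputs converge". A practical version would stop once the loss stops improving.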


• Digital Design Implementation
  – 2-layer multi-layer perceptron
    • 34 input neurons
    • 2 output neurons
  – The 34 wavelet features were fully connected to the 34 input neurons
  – Which were, in turn, fully connected to the output neurons
  – Each output neuron corresponds to one of the two result classes, voice and music


• The weights in the neural network were designed to be uploaded
  – Eliminates the need for training hardware in the design
  – Training was instead performed in a software simulation model
    • Novel training algorithm

• Main neural network module
  – Contains 34 input neuron modules, 2 output neuron modules, synchronization hardware, and a result generator module
    • Generates the VOICE, MUSIC, OTHER, and VALID signals

• Neuron module
  – Weight registers
  – Multiplier
  – Activation functions
    • Look-up-based comparators


[Figure: Neuron module. WEIGHTS, CLK, and RESET inputs. A 6-bit counter generates the load lines for 34 8-bit weight registers. A select line drives a 34-input mux that picks each 4-bit input and its weight. An 8x4 multiplier feeds 12-bit registers and an adder that accumulate the weighted sum. When done, the activation function produces the 4-bit neuron output.]

[Figure: Neural network module. WEIGHTS, CLK, RESET, and MODE inputs. The 34 4-bit wavelet features go to all inputs of the 34 input-layer neurons, whose 4-bit outputs go to the inputs of the output-layer neurons. A 6-bit counter generates the select lines. A Determine Result block produces the VALID, VOICE, MUSIC, and OTHER outputs.]
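The neuron module processes one input per clock. A 6-bit counter selects each input-weight pair, an 8x4 multiplier forms the product, and a 12-bit register accumulates the sum. The following cycle-level model is hypothetical. It assumes unsigned 4-bit inputs (0..15) and signed 8-bit weights, and it assumes the 12-bit accumulator wraps around on overflow. The design does not state whether it wraps or saturates.

```python
def to_signed(value, bits):
    """Wrap an integer into a `bits`-wide two's-complement range."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def neuron_mac(features, weights):
    """Cycle-by-cycle multiply-accumulate for one neuron (hypothetical model).

    features -- 34 unsigned 4-bit inputs (0..15)
    weights  -- 34 signed 8-bit weights (-128..127)
    Returns the 12-bit accumulator value after all 34 cycles.
    """
    if len(features) != 34 or len(weights) != 34:
        raise ValueError("expected 34 inputs and 34 weights")
    if not all(0 <= f <= 15 for f in features):
        raise ValueError("inputs must be unsigned 4-bit values")
    if not all(-128 <= w <= 127 for w in weights):
        raise ValueError("weights must be signed 8-bit values")
    acc = 0
    for count in range(34):  # counter selects one input/weight pair per clock
        product = weights[count] * features[count]  # 8x4 multiplier
        acc = to_signed(acc + product, 12)          # ASSUMPTION: 12-bit wrap
    return acc


print(neuron_mac([1] * 34, [1] * 34))     # 34
print(neuron_mac([15] * 34, [127] * 34))  # -766: the true sum 64770 wrapped
```

The second call shows why weight scaling matters in a narrow data path. A large enough weighted sum no longer fits in 12 bits.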

Wavelet Neural Networks

• Applications of wavelet neural networks are nearly as varied as their possible configurations
  – Function approximators
  – Signal classifiers

• Improve upon plain artificial neural networks
  – Which have a limited ability to characterize local features of a time series
  – A wavelet function is used to condition the inputs to the neural network, so that only vital information about the signal is processed by the network

Wavelet Neural Networks (cont.)

• Digital Design Implementation
  – Discrete Haar wavelet processor
  – Feature extractor processor
  – 2-layer multi-layer perceptron consisting of 34 input neurons and 2 output neurons
  – Synchronization hardware
  – Mode selection
    • Upload Weights
    • Classify Input
  – Imposed limitations to reduce hardware size
    • 4-bit data paths (where possible)
    • 8-bit weight registers (neural network)

Wavelet Neural Networks (cont.)

• Data Flow
  – 256 samples of 16-bit digital audio are applied to the wavelet processor
    • Converted to 256 4-bit wavelet coefficients
  – The 256 wavelet coefficients are applied to the feature extractor processor
    • Resulting in 34 4-bit wavelet features
  – The 34 wavelet features are applied to the neural network processor
    • The wavelet features are fully connected to the 34 input neurons
    • Which are, in turn, fully connected to the 2 output neurons
    • Each output neuron corresponds to one of the two result classes, voice and music
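The data flow above can be traced end to end in NumPy to show how the data dimensions shrink: 256 samples, then 256 coefficients, then 34 features, then 34 neuron outputs, then 2 class outputs. This is a floating-point sketch, not a bit-accurate model. It omits the 4-bit quantization, the bias terms and the VALID/OTHER logic, which is not described. It assumes averaging Haar filters. The clusters and weights used in the example are hypothetical placeholders.

```python
import numpy as np


def haar_dwt(x):
    """Averaging Haar pyramid: a length-2**n block -> 2**n coefficients."""
    approx, details = np.asarray(x, dtype=float), []
    while approx.size > 1:
        pairs = approx.reshape(-1, 2)
        details.insert(0, (pairs[:, 0] - pairs[:, 1]) / 2)  # high-pass output
        approx = (pairs[:, 0] + pairs[:, 1]) / 2             # feeds next level
    return np.concatenate([approx] + details)


def classify_block(samples, clusters, W1, W2):
    """Wavelet transform -> cluster-norm features -> 2-layer tanh network."""
    coeffs = haar_dwt(samples)                                      # 256 values
    features = np.array([np.linalg.norm(coeffs[idx]) for idx in clusters])
    hidden = np.tanh(W1 @ features)                                 # 34 neurons
    outputs = np.tanh(W2 @ hidden)                                  # voice, music
    return coeffs, features, outputs


# Hypothetical placeholders: contiguous clusters and random weights.
rng = np.random.default_rng(0)
samples = rng.integers(-32768, 32768, 256)
clusters = np.array_split(np.arange(256), 34)
W1 = rng.normal(size=(34, 34))
W2 = rng.normal(size=(2, 34))
coeffs, features, outputs = classify_block(samples, clusters, W1, W2)
print(coeffs.shape, features.shape, outputs.shape)  # (256,) (34,) (2,)
```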

Wavelet Neural Networks (cont.)

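[Figure: Top-level wavelet neural network. AUDIO (16-bit), WEIGHTS (8-bit), CLK, RESET, and MODE inputs. The Wavelet Filter passes 4-bit wavelet coefficients to the Feature Extractor, which passes 34 4-bit wavelet features to the Neural Network. The outputs are VALID, VOICE, MUSIC, and OTHER.]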

Results

• The wavelet neural network was originally constructed as a software model
  – To experiment with network parameters
  – To determine neural network weight values
  – To determine ideal results
  – To provide a reference for verifying correct hardware operation

• To conserve hardware resources, two additional models were created
  – 8-bit data paths, 8-bit weights (software simulation only)
  – 4-bit data paths, 8-bit weights (digital design / simulation)

Results (cont.)

• The wavelet neural network system was modeled in VHDL

• Synthesized with Synplicity Synplify
  – Actel ProASICPlus APA600 synthesis library cells
  – Target clock frequency of 11.025 kHz
  – 96,265 cells
  – Estimated maximum clock frequency of 15.6 MHz
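Comparing the two clock figures shows how much timing slack the design has. This is consistent with the low clock rate noted in the conclusions. The sketch assumes, as the 11.025 kHz target suggests, that the design runs at one clock per audio sample.

```python
SAMPLE_RATE_HZ = 11_025   # target clock (assumed: one audio sample per clock)
F_MAX_HZ = 15.6e6         # estimated maximum clock frequency from synthesis

headroom = F_MAX_HZ / SAMPLE_RATE_HZ
print(f"timing headroom: {headroom:.0f}x")  # timing headroom: 1415x
```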

Results (cont.)

Ideal Model

                     Classified as:
                     Voice     Music     Other
Voice Training Set   98.76%     0.00%     1.24%
Voice Test Set       91.93%     2.88%     5.19%
Music Training Set    0.39%    98.26%     1.35%
Music Test Set        2.49%    90.23%     7.28%

Results (cont.)

8-bit Weights, 8-bit Data Paths

                  Classified as:
                  Voice     Music     Other
Voice Test Set   86.74%    10.95%     2.31%
Music Test Set   10.15%    86.40%     3.45%

Results (cont.)

4-bit Data Paths, 8-bit Weights (digital design)

                  Classified as:
                  Voice     Music     Other
Voice Test Set   70.61%    28.82%     0.58%
Music Test Set   22.61%    76.44%     0.96%


Conclusions

• The trained digital design wavelet neural network was effective in correctly classifying the test data sets

• The novel design of the wavelet transform processor resulted in an efficient hardware design that is also a high-performance pipeline

• The novel neural network training algorithm was effective in determining weight values that produced excellent classification results

Conclusions (cont.)

• The design of the hardware modules was straightforward to model in VHDL

• The synthesis was straightforward due to the low operating clock speed

• The ideal model of the wavelet neural network demonstrates what can be achieved with much larger hardware sizes

• A hardware implementation offers advantages over a purely software implementation

Future Work

• Implement the digital design in an FPGA with supporting circuitry on a PCB

• Add training hardware to the design

• Wavelet neural network configuration
  – Network parameters
  – Number of layers, number of neurons
  – Wavelet filter type
  – Feature extraction method

• Increase the robustness of the classifier by training with a wider variety of audio samples

Future Work (cont.)

• Future areas of possible research
  – Speech recognition
  – Speaker recognition
  – Content-based music genre classification

Acknowledgements

• Dr. Kenneth W. Hsu (Computer Engineering)

• Dr. Pratapa V. Reddy (Computer Engineering)

• Dr. Marcin Łukowiak (Computer Engineering)

Acknowledgements (cont.)

• Dr. Roger Gaborski (Computer Science Department)

• Dr. Albert Titus (University at Buffalo)

• Anne DiFelice (Computer Engineering)

• Pam Steinkirchner (Computer Engineering)

• André Botha (Microsoft Corporation)

• Paul Brown (Intel Corporation)

• Stefan Pittner and Sagar V. Kamarthi (Northeastern University)
  – For allowing the use of their research in feature extraction

Acknowledgements (cont.)

• Heather Hughes

• Mr. John Hughes

• Mrs. Suellen Hughes

• Zachary Hughes

• Wendy Hughes

Questions

Demonstration

Demonstration Results

                  Classified as:
                  Voice     Music     Other
Voice Test Set   70.61%    28.82%     0.58%
Music Test Set   22.61%    76.44%     0.96%

