ECE532 Group Report Group #22 April 9, 2012

XAudio: A Voice Controlled Music Player

ECE532 – Group Report

Group #22

April 9, 2012

Michael Cornacchia

Jason Deng

Goce Jankuloski

Table of Contents

Overview
    Project Description
    Manual
    System Block Diagram
    Summary
Outcome
    Proposed Acceptance Criteria
    Results
    Improvements
Project Schedule
    Planned Schedule
    Actual Schedule
    Schedule Comparison
Description of Blocks
    Word Processor
    Vcom_RxTx
    Audio (v1.00a)
    OPB_AC97 (v2.00a)
    XPS_SysACE (v1.00a)
    PLB_OPB_Bridge (v1.00c)
    Voice Recognition (voice_rec) uB (v7.10d)
    Voice Recognition (voice_rec) UART (v1.00a)
    Player uB (v7.10d)
    Player UART (v1.00a)
    XPS_GPIO (v1.00a)
    2to1 Mux
Description of Design Tree
Tips and Tricks
    Xilfatfs_v1_00_a


Overview

Project Description

The objective of this project was to design and build a voice-controlled music player. When the user speaks a predefined voice command into the device's microphone, the device carries out the corresponding action. The device stores in memory both the songs available for playback and the frequency signatures of the reference voice commands. The user controls music playback with typical commands, such as changing the playback mode or the current track. Spoken commands are converted to the frequency domain using a hardware-implemented DCT (discrete cosine transform) so that they can be compared with the reference commands stored in memory. The goal was to develop a music player that is fun to use because of its voice-enabled interaction.

Manual

Event | Type | Description
Left push button | Button/Switch | Toggles the operation mode between “voice command” (to accept user voice input) and “training mode” (to store user voice input)
Enter push button | Button/Switch | Saves the most recent voice command reference data to memory
Right push button | Button/Switch | Toggles the method of “training mode” between “overwrite” (erases existing voice reference data) and “average” (averages the existing voice reference data with the new data)
Toggle switch 3 | Button/Switch | Selects rs232 output between the UARTs of the 2 microblazes
Switches 0-2 | Button/Switch | Selects each word to train
Say “Play” | Voice Command | Play the current song
Say “Stop” | Voice Command | Stop the current song
Say “Next” | Voice Command | Go to the next song
Say “Pause” | Voice Command | Pause the current song
Say “Shuffle” | Voice Command | Set shuffle mode
Say “Repeat” | Voice Command | Set repeat mode
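The two training methods in the table can be illustrated with a short C sketch. The report does not show its training code, so the names train_reference and train_method_t are hypothetical, and the "average" method is assumed here to be a simple rounded mean of the stored and new values over 128 frequency bins.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 128  /* one value per DCT frequency bin, per the report */

/* Hypothetical names: the report does not show its training code. */
typedef enum { TRAIN_OVERWRITE, TRAIN_AVERAGE } train_method_t;

/* Update a stored reference with a newly captured command.
 * OVERWRITE discards the old reference; AVERAGE stores the rounded
 * mean of the old and new value in every bin (an assumed formula). */
void train_reference(uint8_t ref[NUM_BINS], const uint8_t sample[NUM_BINS],
                     train_method_t method)
{
    for (size_t i = 0; i < NUM_BINS; i++) {
        if (method == TRAIN_OVERWRITE) {
            ref[i] = sample[i];
        } else {
            /* Widen before adding so the sum cannot overflow 8 bits. */
            ref[i] = (uint8_t)(((unsigned)ref[i] + sample[i] + 1u) / 2u);
        }
    }
}
```

Under this assumption, repeated averaging weights recent recordings more heavily than old ones; a true running mean would need a per-word sample count as well.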


System Block Diagram

[Figure: system block diagram. Legend: existing IP, custom IP, external hardware (off chip). Interconnect: PLB, OPB, FSL. Blocks (type: instance name): MICROBLAZE: player_uB; MICROBLAZE: voice_rec_uB; wordprocessor; AUDIO: audio; PLBv46_OPB_BRIDGE: plb_opb_bridge; OPB_AC97: opb_ac97; XPS_UARTLITE: player_UART; XPS_UARTLITE: voice_rec_UART; 2to1 mux; vcom_rxtx; XPS_GPIO: GPIO; XPS_SYSACE: SysACE; XPS_INTC: voice_rec_intr; XPS_INTC: player_intr. Off-chip hardware: RS-232, AC97, led, switch, push button.]


Summary

The AC97 codec is used both for music playback and for capturing voice input. The custom word processor block, built around a DCT module, converts the time-domain microphone input into a frequency-domain representation and segments the audio stream into words. The frequency information describing each word is sent to the next custom block, the voice command receiver (Vcom_rxtx). This block accumulates the frequency information for a word and manages an FSL connection that forwards it to a MicroBlaze processor, where it is compared against a set of predefined word commands loaded in memory. If a match is found, the voice recognition MicroBlaze sends the corresponding command ID to the player MicroBlaze, which alters playback accordingly. The player MicroBlaze controls playback by writing to the FSL that connects it to the audio block. It also reads music files from the CF card through the SysACE interface controller.

Apart from the custom IP, every block is Xilinx IP. Most of it is available through the IP catalog in XPS; the exceptions are the opb_ac97 and audio cores, which are Xilinx IP distributed as part of a reference design.

Outcome

Proposed Acceptance Criteria:

- Device understands 9 out of every 10 given commands
- Device does not misrecognize one command for another
- Any sound/command outside of the predefined commands is rejected
- Processing of given commands happens within 2 seconds of input
- Eliminates noise: command understood in environment with sound level 65 dB
- Sound feedback is at an audible level in 65 dB room
- Loudest volume is below 95 dB
- Voice input sampling at a minimum of 44.1kHz
- Music output bit rate at minimum of 256 kbps

Results

The input processing hardware performs a 128-sample DCT in 1920 cycles, or 19.2 microseconds at 100 MHz, which allows it to process the required 44.1 kHz input audio (22.7 microsecond sampling period) in real time. Noise filtering is performed automatically in hardware by applying a threshold to each frequency bin separately before the frequency information is forwarded to the MicroBlaze. The software can perform the autocorrelation or mean squared error comparison in well under one second, meeting the desired two-second processing time for audio commands.
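The timing figures quoted above can be checked with a few lines of C. This is only a worked calculation using the numbers stated in the report; the function names are illustrative, and the block period ignores the downsampling described later, which makes each 128-sample block even longer.

```c
/* Figures taken from the report. */
#define CLOCK_HZ        100000000.0  /* 100 MHz system clock */
#define DCT_CYCLES      1920.0       /* cycles per 128-sample DCT */
#define SAMPLE_RATE_HZ  44100.0      /* microphone sampling rate */
#define DCT_SIZE        128.0        /* samples per DCT block */

/* Time to compute one DCT, in microseconds: 1920 / 100 MHz = 19.2 us. */
double dct_latency_us(void)   { return DCT_CYCLES / CLOCK_HZ * 1e6; }

/* Time between two input samples, in microseconds: about 22.7 us. */
double sample_period_us(void) { return 1e6 / SAMPLE_RATE_HZ; }

/* Time needed to collect one 128-sample block at 44.1 kHz, in microseconds. */
double block_period_us(void)  { return DCT_SIZE * sample_period_us(); }
```

Because the DCT latency is shorter than a single sample period, a transform started when a block completes is finished before the next sample arrives.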

The requirement for music output bit rate proved impossible to satisfy because of the limited read speed from the Compact Flash card to the MicroBlaze. We determined that 22 kHz single-channel .wav audio files could be read from the card in real time.


We have not yet been able to determine the command recognition accuracy because of last-minute issues with drivers and with meeting area constraints, which have only just been resolved. The final results for this requirement will be shown in our presentation, once we have determined appropriate settings for the microphone boost and trained the software to match the outputs of the final hardware.

Improvements

The fundamental tools for the project have been created: hardware for word segmentation and frequency-domain conversion; software for matching frequency patterns to reference words; and software and drivers for music player control, file reading and audio output. With this backbone in place, it would now be straightforward to make improvements such as adding commands, adding alternative pattern-matching functions to improve accuracy, or improving audio quality by reading higher bit rate music files from a different memory source.

Project Schedule

Planned Schedule

Week / Milestone

Feb 8
- Figure out music format and other logistics like different command sound patterns
- Look at existing FFT IP block and see how it works
- Bandwidth estimation

Feb 15
- Figure out protocols (FSL) and write the Verilog code for custom blocks (Week 1/2)
- Figure out MPMC protocol and store music into DDR, ability to access files for read

Feb 29
- Simulate existing FFT block
- Simulate individual custom blocks to verify correctness
- Figure out protocols and write the Verilog code for custom blocks (Week 2/2)
- Music files playing through AC97 codec

Mar 7
- Interface modules
  o Interface microblaze with the music reader
  o Interface microblaze with the command receiver

Mar 14
- Test out completed design

Mar 21
- Debug
- Optional: Visualizer component
  o Displaying video frame from memory
  o Creating video frame from FFT output

Mar 28
- Make sure everything is functional


Actual Schedule

Week / Milestone

Feb 8
- Bandwidth estimation
- Read about FFT IP block, and instantiated with estimated parameters
- Created Modelsim .do scripts for testing FFT
- Discussion of custom block I/O

Feb 15
- Created Matlab model for performing word segmentation and FFT
- Wrote up the code needed to read/write from CF card.
- Synthesized the hardware to test this out
- Prototype C code for frequency pattern recognition for words

Feb 29
- Created Verilog module for performing word segmentation
- Got the write/read working for CF
- Researched on how to get audio implemented by using old blocks.
- Coding, testing, and implementation of custom block “vcom_rxtx”

Mar 7
- Added support to “vcom_rxtx” for accumulating frequency data directly from the FFT
- Simulated word segmentation module
- Decided to use DCT instead of FFT
- Created Matlab model for performing 128 sample DCT with integer approximation of DCT matrix
- Wrote and synthesized to test out the audio playback. Wasn’t able to get it playing audio.

Mar 14
- Created C++ program to generate code for the DCT module multiplications and sums
- Created fast multiplier for the known bit widths to multiply
- Modified word segmenter to work on frequency converted data
- Simulated “wordprocessor” module, combining DCT and word segmentation
- Successfully got the audio working. Used reference design from Xilinx, that uses OPB
- Revised “vcom_rxtx” to support the DCT
- Started design of a testbench project in XPS to test “vcom_rxtx”, adding in software code for frequency pattern recognition

Mar 21
- Modified wordprocessor module to take input as a slave on an FSL
- Implemented downsampling by 4 before DCT to maximize use of frequency spectrum for 44.1kHz input
- Able to read .wav file from CF and play it onto the AC97
- Modified XilFatFS library to disable cache, otherwise cannot play more than 10s
- Implemented a “dummy_dct” custom block to test the output of “vcom_rxtx”
- Added FSL communication to the frequency pattern recognition software

Mar 28
- Integration and Debugging

Extra
- Changed DCT to perform 128 multiplications and sums per cycle for 128 cycles instead of the original parallel design to fit the fpga LUT and FF constraints
- Further decreased the size of “vcom_rxtx” by reducing wire bit width and removing double buffering functionality to reduce logic utilization
- Modified some xilfatfs driver code to force the configuration controller to reset and give the lock to the sysace controller, which fixes some minor errors


Schedule Comparison

Unsurprisingly, our actual schedule differs from the planned schedule. Most noticeably, it fell behind the plan, sometimes by as much as two to three weeks. The slowdown is attributed mainly to errors in existing drivers, running out of space on the FPGA, and modifications to the custom blocks due to unforeseen complexities.

For example, music playback from the CF card to the AC97 was originally planned to be done by Feb. 29 but was only completed by Mar. 21. Several unexpected obstacles significantly slowed progress here. In particular, the read/write functionality for the CF card took longer because of the way the board interprets the presence of the card. However, the slack in our planned schedule for this part of the work in the weeks of March allowed us to still finish on time.

Although the implementation of the custom blocks around Feb. 15 to Feb. 29 was completed on time, design changes to one custom block affected the operation of the other. As a result, the custom blocks were continuously modified through the week of Mar. 14. Similarly, in the “Extra” weeks, the system failed to map onto the FPGA, forcing further changes to the custom blocks and to various software settings to reduce logic utilization.


Description of Blocks

Word Processor

The core of this custom block is another custom module, which performs a discrete cosine transform on a sequence of 128 audio samples, each represented as a 16-bit integer. Because of constraints on the number of LUTs and FFs, only the five most significant bits of each sample are kept. These values are multiplied by pre-calculated four-bit cosine samples (the optimal bit width was determined through Matlab simulation) using a custom multiplier that multiplies by signed four-bit values in three cycles through a sequence of additions. The cosine values are hard-coded as inputs to the multipliers by a script that generates the Verilog file. The output frequency components are 17 bits wide because each is the sum of 128 scaled inputs.

The DCT block uses an FSM to store a sample each time iwrite is driven high. Once it has accumulated 128 time-domain values, it performs 128 dot products as described above to generate the frequency data. It then outputs the frequency information, raising owrite whenever a new value is ready.
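As a software model of the arithmetic described above, the following C sketch computes a 128-point DCT from 5-bit samples and signed 4-bit cosine coefficients. It is not the project's Verilog: the DCT-II form, the coefficient scale of 7, the rounding, and all function names are assumptions made for illustration. The cosine helper uses a Taylor series only so that the sketch builds without linking the math library.

```c
#include <stdint.h>

#define DCT_N      128  /* samples per transform, per the report */
#define COEF_SCALE 7.0  /* assumed: largest magnitude of a signed 4-bit coefficient */

static int8_t dct_coef[DCT_N][DCT_N];  /* quantized cosine table */

/* cos(pi * m / 256) for integer m >= 0, using symmetry to fold the angle
 * into [0, pi/2] and then a 10-term Taylor series. */
static double cos_phase(int m)
{
    const double pi = 3.14159265358979323846;
    double sign = 1.0, x, term = 1.0, sum = 1.0;
    m %= 512;                                    /* full period is 512 steps */
    if (m > 256) m = 512 - m;                    /* cos(2*pi - x) = cos(x) */
    if (m > 128) { m = 256 - m; sign = -1.0; }   /* cos(pi - x) = -cos(x) */
    x = pi * m / 256.0;
    for (int i = 1; i <= 10; i++) {
        term *= -x * x / ((2 * i - 1) * (2 * i));
        sum += term;
    }
    return sign * sum;
}

/* Keep the five most significant bits of a 16-bit sample (floor division
 * by 2^11, written without relying on arithmetic right shift). */
static int top5(int16_t s)
{
    int v = s;
    return v >= 0 ? v >> 11 : -((-v + 2047) >> 11);
}

/* Build the integer approximation of the (assumed DCT-II) matrix:
 * coef[k][n] = round(7 * cos(pi * (2n + 1) * k / 256)). */
void dct_init(void)
{
    for (int k = 0; k < DCT_N; k++)
        for (int n = 0; n < DCT_N; n++) {
            double c = COEF_SCALE * cos_phase((2 * n + 1) * k);
            dct_coef[k][n] = (int8_t)(c >= 0 ? c + 0.5 : c - 0.5);
        }
}

/* One dot product per output bin. Each product is at most 7 * 16 in
 * magnitude, so the sum of 128 products fits in 17 signed bits. */
void dct128(const int16_t in[DCT_N], int32_t out[DCT_N])
{
    for (int k = 0; k < DCT_N; k++) {
        int32_t acc = 0;
        for (int n = 0; n < DCT_N; n++)
            acc += dct_coef[k][n] * top5(in[n]);
        out[k] = acc;
    }
}
```

A caller would run dct_init() once and then dct128() on each block of 128 samples. A model like this is mainly useful for generating expected outputs to compare against a hardware simulation.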

The wordprocessor block is an FSM that downsamples the input by a factor of four. This factor was determined using a Matlab model to make the best use of the frequency bins of a 128-point DCT, given 44.1 kHz sampled audio and the range of frequencies observed when the keywords are spoken. The block also manages the communication protocols for receiving input from the FSL, driving the DCT block and outputting to the next custom block.
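The effect of downsampling on the frequency bins can be sketched in C. Two things here are assumptions: that the hardware simply keeps every fourth sample (the report does not say whether it filters or averages first), and that the transform is a DCT-II, for which bin k of an N-point transform at sample rate fs sits at k * fs / (2N).

```c
#include <stddef.h>
#include <stdint.h>

#define DECIMATION 4   /* downsampling factor, per the report */

/* Keep every fourth sample (plain decimation, an assumption).
 * Returns the number of samples written to out. */
size_t downsample_by_4(const int16_t *in, size_t n_in, int16_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i += DECIMATION)
        out[n_out++] = in[i];
    return n_out;
}

/* Frequency in Hz represented by DCT-II bin k of an n-point transform
 * at sample rate fs. */
double dct_bin_hz(int k, int n, double fs)
{
    return k * fs / (2.0 * n);
}
```

With fs = 44100 / 4 = 11025 Hz and n = 128, adjacent bins are about 43 Hz apart and the 128 bins span 0 to roughly 5.5 kHz. Without downsampling the same bins would be about 172 Hz apart and spread up to 22 kHz.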

The information is output to the adjacent block as a 128-bit vector with one bit per frequency component. Each bit is set by comparing the magnitude of that frequency component with an experimentally determined threshold, chosen to best distinguish between keywords. The number of ones in a vector is used as a measure of the signal power during the 128 time samples considered. A single vector with power above a threshold starts a word, and two consecutive vectors with power below a threshold end the word.
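The thresholding and word segmentation rules above can be modelled in C as follows. The threshold values are placeholders (the report's values were tuned experimentally and are not given), and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_BINS 128

/* Illustrative values only; the real thresholds were tuned experimentally. */
#define BIN_THRESHOLD   1000  /* per-bin magnitude threshold */
#define POWER_THRESHOLD 8     /* set bits needed for a frame to count as loud */

typedef struct { int in_word; int quiet_frames; } segmenter_t;
typedef enum { SEG_IDLE, SEG_WORD_FRAME, SEG_WORD_END } seg_event_t;

/* Reduce each frequency component to one bit and return the number of
 * set bits, which serves as the power estimate for the frame. */
int threshold_bins(const int32_t freq[NUM_BINS], uint8_t bits[NUM_BINS])
{
    int ones = 0;
    for (int k = 0; k < NUM_BINS; k++) {
        bits[k] = labs((long)freq[k]) > BIN_THRESHOLD;
        ones += bits[k];
    }
    return ones;
}

/* Advance the segmenter by one frame: a single loud frame starts a word,
 * and the second of two consecutive quiet frames ends it. */
seg_event_t segment_step(segmenter_t *s, int power)
{
    int loud = power > POWER_THRESHOLD;
    if (!s->in_word) {
        if (!loud)
            return SEG_IDLE;
        s->in_word = 1;
        s->quiet_frames = 0;
        return SEG_WORD_FRAME;
    }
    s->quiet_frames = loud ? 0 : s->quiet_frames + 1;
    if (s->quiet_frames >= 2) {
        s->in_word = 0;
        return SEG_WORD_END;
    }
    return SEG_WORD_FRAME;
}
```

Requiring two quiet frames before ending a word keeps a brief dip in power, such as a short pause between syllables, from splitting one command into two.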

[Figure: wordprocessor block ports: clk, reset, FSL_S_Data (32 bits), FSL_S_Control, FSL_S_Exists, FSL_S_Read, FSL_S_Clk, vec_out (128 bits), send, write, sendready. Internal dct module ports: clk, reset, in (16 bits), iwrite, out (17 bits), owrite.]


Vcom_RxTx

This custom block accumulates the frequency data into a histogram and, once signaled, sends it to “Voice Recognition uB” through an FSL. It receives a 128-bit set of frequency data from “Word Processor” on every clock cycle in which “dct_raw_write” is high. Each of the 128 bits is added to its own 8-bit counter, so the data sent to the MicroBlaze consists of 128 8-bit numbers, packed into 32-bit words to fit the FSL. The accumulated data is sent over the FSL when “dct_raw_send” is high. “dct_sendready” is used for handshaking to indicate that the send has completed.
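The accumulate-and-pack behaviour described above can be modelled in C. This is a behavioural sketch rather than the Verilog: saturating the counters at 255 and the byte order within each 32-bit word are assumptions, and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BINS  128
#define NUM_WORDS (NUM_BINS / 4)  /* four 8-bit counters per 32-bit FSL word */

typedef struct { uint8_t count[NUM_BINS]; } histogram_t;

/* Reset all counters before a new word starts. */
void hist_clear(histogram_t *h) { memset(h->count, 0, sizeof h->count); }

/* Models one cycle with dct_raw_write high: add each bit to its counter.
 * Saturating at 255 is an assumption; the report does not describe overflow. */
void hist_accumulate(histogram_t *h, const uint8_t bits[NUM_BINS])
{
    for (int k = 0; k < NUM_BINS; k++)
        if (bits[k] && h->count[k] < UINT8_MAX)
            h->count[k]++;
}

/* Models dct_raw_send: pack the counters into 32 words, with the lowest
 * bin in the least significant byte (the byte order is an assumption). */
void hist_pack(const histogram_t *h, uint32_t words[NUM_WORDS])
{
    for (int w = 0; w < NUM_WORDS; w++) {
        words[w] = 0;
        for (int b = 0; b < 4; b++)
            words[w] |= (uint32_t)h->count[4 * w + b] << (8 * b);
    }
}
```

Packing four counters per word means a complete histogram crosses the FSL in 32 transfers.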

Audio (v1.00a)

The audio block is a custom core that comes bundled with the opb_ac97 block in the Xilinx reference designs for audio playback. As shown in the block diagram, the audio block interfaces with the player MicroBlaze via FSL for audio playback, and with the word processor block for microphone voice input. The FSLs are implicitly connected to the opb_ac97 block, through which the audio is sent and received. It is unclear why Xilinx did not merge these two blocks into one.

OPB_AC97 (v2.00a)

This is the main block that controls and configures the AC97 codec. As mentioned above, it is taken from Xilinx’s own reference designs and interfaces with the OPB instead of the PLB. Its driver makes it easy to set the sampling rate and volume level, enable line in/out, and control the microphone. No modifications were necessary to get it working.

XPS_SysACE (v1.00a)

This is a standard Xilinx IP block used to communicate with the Compact Flash storage. No hardware changes were made to this block, but some library functions were changed in order to disable features such as cache buffering.

[Figure: Vcom_rxtx block ports: clk, rst, dct_raw_data (128 bits), dct_raw_write, dct_raw_send, dct_sendready, fsl_m_data (32 bits), fsl_m_control, fsl_m_write, fsl_m_full.]


PLB_OPB_Bridge (v1.00c)

This is a deprecated core that had to be used in order to communicate with the opb_ac97.

Voice Recognition (voice_rec) uB (v7.10d)

This MicroBlaze determines which word command the user issued and, for a valid command, sends the request to “Player uB”. The frequency data arrives from “Vcom_Rxtx” through an FSL, and the MicroBlaze is configured to be interrupted when FSL input is available. The frequency patterns are compared using an autocorrelation function. The word command is likewise output to “Player uB” through an FSL.
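The report names both autocorrelation and mean squared error as comparison methods. As an illustration, the C sketch below uses mean squared error, the simpler of the two. It is not the project's code: the function names, the flat layout of the reference array, and the rejection limit max_mse are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 128
#define NO_MATCH (-1)

/* Mean squared error between two 128-bin histograms. */
double hist_mse(const uint8_t *a, const uint8_t *b)
{
    uint32_t sum = 0;  /* at most 128 * 255^2, which fits in 32 bits */
    for (size_t i = 0; i < NUM_BINS; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += (uint32_t)(d * d);
    }
    return (double)sum / NUM_BINS;
}

/* refs holds num_refs histograms back to back (num_refs * NUM_BINS bytes).
 * Returns the index of the closest reference, or NO_MATCH when even the
 * best candidate has an error above max_mse (a hypothetical limit). */
int match_command(const uint8_t *input, const uint8_t *refs,
                  size_t num_refs, double max_mse)
{
    int best = NO_MATCH;
    double best_err = max_mse;
    for (size_t r = 0; r < num_refs; r++) {
        double err = hist_mse(input, refs + r * NUM_BINS);
        if (err <= best_err) {
            best_err = err;
            best = (int)r;
        }
    }
    return best;
}
```

The rejection limit is what allows sounds outside the predefined command set to be ignored rather than mapped to the nearest command.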

Voice Recognition (voice_rec) UART (v1.00a)

Standard UART block used to display messages over RS-232. It is used for debugging, to verify correct FSL operation and voice command recognition.

Player uB (v7.10d)

This MicroBlaze controls music playback and reads from and writes to the Compact Flash card. There were no hardware changes other than configuring it with the correct number of FSL links. On the software side, the XilFatFS (v1.00a) library was included, which provides higher-level read/write access to the card, much like file I/O in an ordinary C program.
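How the player might react to the six voice commands listed in the manual can be sketched as a small state machine in C. Everything in this sketch is hypothetical: the report does not give the command ID values, the player's internal state, or whether “Shuffle” and “Repeat” toggle or only set their modes (toggling is assumed here, as is wrapping to the first track after the last).

```c
#include <stdbool.h>

/* Hypothetical command IDs; the report does not list the values sent
 * over the FSL between the two MicroBlazes. */
typedef enum {
    CMD_PLAY, CMD_STOP, CMD_NEXT, CMD_PAUSE, CMD_SHUFFLE, CMD_REPEAT
} command_t;

typedef enum { STATE_STOPPED, STATE_PLAYING, STATE_PAUSED } play_state_t;

typedef struct {
    play_state_t state;
    int track;        /* index of the current track */
    int num_tracks;   /* must be greater than zero */
    bool shuffle;
    bool repeat;
} player_t;

/* Apply one recognized command to the player state. */
void player_handle(player_t *p, command_t cmd)
{
    switch (cmd) {
    case CMD_PLAY:
        p->state = STATE_PLAYING;
        break;
    case CMD_STOP:
        p->state = STATE_STOPPED;
        break;
    case CMD_PAUSE:
        if (p->state == STATE_PLAYING)   /* pausing only makes sense while playing */
            p->state = STATE_PAUSED;
        break;
    case CMD_NEXT:
        p->track = (p->track + 1) % p->num_tracks;  /* assumed wrap-around */
        break;
    case CMD_SHUFFLE:
        p->shuffle = !p->shuffle;        /* assumed toggle */
        break;
    case CMD_REPEAT:
        p->repeat = !p->repeat;          /* assumed toggle */
        break;
    }
}
```

In a design like this, the player's main loop would read a command ID from the FSL, call player_handle(), and then continue streaming audio according to the updated state.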

Player UART (v1.00a)

Standard UART block used to display messages over RS-232. It was also used to display the playback state of the player, such as the track number, track progress and play mode.

XPS_GPIO (v1.00a)

This general-purpose input/output block, which is connected to “Voice Recognition uB”, connects to the switches and push buttons for user input and testing. It also drives the LEDs to indicate some states, mainly for debugging and notification.

2to1 Mux

This custom block is a simple two-input multiplexer. It selects which of the two UARTs (one for each MicroBlaze) outputs its print messages through RS-232. The select pin is connected to a switch.


Description of Design Tree

File directory | File/Folder Name | Description
doc | | Documents
Xaudio | player_ub | Software project for music player
Xaudio | voice_rec_microblaze | Software project for voice recognition
Xaudio/code | Main.c | Software code for the music player
Xaudio/code | Audio | Software code to communicate with the AC97
Xaudio/code | Cf | Software code to communicate with the compact flash card
Xaudio/code | Voice_rec | Software code for voice recognition
Xaudio/data | System.ucf | User and pin constraints
Xaudio/drivers | ac97_v2_00_a | AC97 driver files
Xaudio/drivers | audio_v1_00_a | Audio driver files

All other files are automatically generated by XPS.

Tips and Tricks

Xilfatfs_v1_00_a

The existing driver code for the SysACE controller has trouble acquiring a lock on the controller, specifically in “xilfatfs_sysace.c”, where the SysACE is initialized in the function “init_ace()”. The System ACE ERROR LED is solid red at this point. The configuration controller holds the lock and, for some reason, does not release it even though no configuration file is present on the Compact Flash card. In the above-mentioned file, replace the call to XSysAce_Lock(…) in “init_ace()” with the following lines:

    XSysAce_ResetCfg(&Ace);
    XSysAce_Unlock(&Ace);
    XSysAce_Lock(&Ace, XTRUE);

This resets the configuration controller and forcibly acquires the lock for the SysACE controller. The SysACE is used only for file I/O in this project.

Second, in order to read large amounts of data from the CF card, we had to disable caching. Once the cache filled up, all of its sectors were marked as valid, so the software got stuck because it could not evict any sector from the cache. The fix is to disable caching in the same file as above, in the function read_sector(): remove the entire body of the function and replace it with this line:

    return read_sector_cf(sector, sector_buf);