2006 IEEE 24th Convention of Electrical and Electronics Engineers in Israel

Parallel Processing for a DSP Application using FPGA

Nonel Thirer, Member, IEEE, and Aviram Souhami

Abstract - In this paper we discuss a parallel architecture for an FPGA system that includes several simple embedded microprocessors (µP) for a digital signal processing (DSP) application. Each µP in the system has a different purpose and a separate code unit, but all the µPs share the same data unit. The architecture of a µP can vary in type: it may be designed in the traditional form of a µP, or as a FIR filter, a video pattern generator and so on. Such systems can be a good solution when the DSP's main process can be divided into several processes. Every µP can be reprogrammed to perform more than one function, and a superscalar operation mode can be introduced and controlled by the programmer. This type of platform was designed and tested for an audio synthesizer system.

Index Terms - Audio synthesizer, FPGA, multi-processor, parallel processing.

I. INTRODUCTION

Most DSP algorithms require complex tools and a massive number of instructions. Often it is necessary to perform these algorithms in real time. The solution has been to build an embedded system on a chip, which includes special digital hardware and a microcontroller or a microprocessor for better performance. In recent years, FPGA/ASIC-based systems have drawn a lot of attention, particularly with the introduction of fast CMOS reprogrammable logic devices [1]-[4]. These made it possible to manufacture FPGA-based DSP devices with reprogrammable multi-processors.

II. SYSTEM ARCHITECTURE

An audio synthesizer system (fig. 1) includes a user interface, an input block (including the data acquisition unit with one or more A/D converters), a DSP processing block and an output unit (including a D/A converter and a speaker).

During the synthesizing process, performed by the DSP block, the signal usually passes through three processing units (fig. 2): VCO (Voltage Controlled Oscillator), VCF (Voltage Controlled Filter) and VCA (Voltage Controlled Amplifier).

Fig. 1 Audio Synthesizer System Block Diagram

Fig. 2 Processing System Blocks

In order to simplify the implementation of this system on an FPGA platform and to permit parallel execution of some phases, it is necessary to identify the common resources used in the process [5]. In audio synthesizers, three basic functional units (fig. 3) are used: an FM oscillator, a Filter and an Amplitude Modulator (AM-DSB-TC).
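As a rough software illustration of these three functional units, the sketch below models each one as a per-sample operation in Python. The paper does not give the internal equations of the units, so everything here is an assumption: a sine-only phase-accumulator oscillator (the waveform select input is ignored), a one-pole low-pass as the filter, floating-point values instead of the 16-bit and 8-bit ports of fig. 3, and the textbook AM-DSB-TC form carrier * (1 + depth * lfo). All names are hypothetical.

```python
import math


class FMOscillator:
    """Sine oscillator whose frequency is offset by depth * fm_in (assumed model)."""

    def __init__(self, sample_rate=48_000.0):
        self.sample_rate = sample_rate
        self.phase = 0.0  # phase accumulator, in radians

    def step(self, freq, fm_in=0.0, depth=0.0):
        # Instantaneous frequency = base frequency + scaled modulation input.
        inst_freq = freq + depth * fm_in
        self.phase += 2.0 * math.pi * inst_freq / self.sample_rate
        self.phase %= 2.0 * math.pi
        return math.sin(self.phase)


def one_pole_lowpass(x, prev_y, k):
    """One-pole low-pass step (an assumed filter type); k in [0, 1] sets the cutoff."""
    return prev_y + k * (x - prev_y)


def am_dsb_tc(carrier, lfo, depth):
    """Double-sideband AM with transmitted carrier: carrier * (1 + depth * lfo)."""
    return carrier * (1.0 + depth * lfo)
```

With depth set to 0 the modulator passes the carrier unchanged, which is the "transmitted carrier" part of AM-DSB-TC.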

Fig. 3 Basic Functional Units: FM-Oscillator (inputs in_FM [16 bit], in_Depth [8 bit], in_freq [16 bit], in_Wave [8 bit]; output OSC_OUT [16 bit]), Filter (16-bit input, k [8 bit]; output filter_out [16 bit]) and AM-DSB-TC (inputs in_Depth [8 bit], in_Carrier [16 bit], in_LFO [16 bit]; 16-bit output).

The idea is that every processing unit from fig. 2 can be implemented by using the basic functional units from fig. 3. Thus the VCO stage is implemented using three FM oscillators (fig. 4), the VCF stage is implemented using two FM oscillators, two AM modulators and a Filter (fig. 5), and the VCA stage is implemented using an FM oscillator and an AM modulator (fig. 6).
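The unit counts stated above can be tallied in a few lines of Python. This is only bookkeeping over the numbers given in the text; the dictionary keys and the function name are hypothetical.

```python
from collections import Counter

# Basic functional units used by each stage, as listed in the text.
STAGES = {
    "VCO": Counter(fm_oscillator=3),
    "VCF": Counter(fm_oscillator=2, am_modulator=2, filter=1),
    "VCA": Counter(fm_oscillator=1, am_modulator=1),
}


def total_units(stages):
    """Sum the unit instances needed across all stages."""
    total = Counter()
    for units in stages.values():
        total += units
    return total


totals = total_units(STAGES)
```

Here `totals` holds six FM oscillator instances, three AM modulator instances and one filter instance, i.e. ten unit executions per sample drawn from only three unit types.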

N. Thirer is with the Holon Institute of Technology, 58102 Holon, Israel (e-mail: Tirer_n@hit.ac.il).

A. Souhami is with Runcom Technologies, Rishon le Zion, Israel (e-mail: aviram2k@gmail.com).

1-4244-0230-1/06/$20.00 ©2006 IEEE


Fig. 4 VCO Architecture (three FM-Oscillator blocks)

The main controller is responsible for data acquisition (reading samples from the A/D converter), data output to the speaker (via the D/A converter) and the "coordination" of the microprocessors' work. For this, the main controller unit loads one (or more) µPs with the data they need for proper operation and enables the µPs to start their operation. The controller waits until the µP finishes its process. Then it collects the results and configures the same µP and/or other µPs again.

Using only three types of µPs it is possible to perform all the required functions of the DSP's processing. Using identical µPs (by implementing all the functions in every µP) working in parallel makes it possible to improve performance in the VCF phase. Moreover, in our platform, the VCO phase of a new input sample can be processed in parallel with the VCA phase of the previous input sample, increasing the throughput of the system thanks to its pipelined operation [5].
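To make the throughput argument concrete, the following Python sketch compares a strictly sequential schedule with one in which the VCO phase of the next sample starts together with the VCA phase of the current one. The stage durations (in clock cycles) are made-up numbers, not measurements from the paper, and the model assumes that the VCF phase of a sample cannot start before the previous sample's VCA phase has finished.

```python
def sample_finish_times(n_samples, t_vco, t_vcf, t_vca, overlap):
    """Return the cycle at which each sample leaves the VCA stage.

    overlap=False: VCO(n+1) starts only after VCA(n) has finished.
    overlap=True:  VCO(n+1) starts together with VCA(n).
    """
    finish = []
    vco_start = 0
    prev_vca_end = 0
    for _ in range(n_samples):
        vco_end = vco_start + t_vco
        # Assumption: VCF of this sample waits for the previous VCA to finish.
        vcf_start = max(vco_end, prev_vca_end)
        vca_start = vcf_start + t_vcf
        vca_end = vca_start + t_vca
        finish.append(vca_end)
        vco_start = vca_start if overlap else vca_end
        prev_vca_end = vca_end
    return finish


# Hypothetical stage durations in clock cycles.
sequential = sample_finish_times(3, t_vco=30, t_vcf=50, t_vca=20, overlap=False)
pipelined = sample_finish_times(3, t_vco=30, t_vcf=50, t_vca=20, overlap=True)
```

Under these assumptions the steady-state period per sample drops from t_vco + t_vcf + t_vca to t_vcf + max(t_vco, t_vca), so the gain depends on how short the VCO and VCA phases are relative to the VCF phase.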

Fig. 5 VCF Architecture

Fig. 6 VCA Architecture

III. AN FPGA IMPLEMENTATION OF THE DSP BLOCK

The platform, as shown in figure 7, contains three software-defined µPs, each with a code segment in a ROM device, and a controller unit which contains a small RAM memory. Every µP is programmed to perform one or every one of the DSP's base functions.

Fig. 7 An FPGA Implementation of the DSP Processing Block. Controller Unit ports: CLK [1 bit], Reset [1 bit], Sample_In [16 bit], convert_start [1 bit], ADC_busy [1 bit], Sample_Out [16 bit], Load_DAC [1 bit], WR_DAC [1 bit] and, for each µP n (n = 1, 2, 3), Pn_En [1 bit], Pn_Data_RDY [1 bit], Pn_collected_data [1 bit], Pn_Data_Bus [16 bit], Pn_Data_in [16 bit], Pn_busy [1 bit]; the Controller Unit has a data segment. Ports of each µP: CLK [1 bit], Reset [1 bit], En [1 bit], data_ready [1 bit], data_collected [1 bit], data_in [16 bit], data_out [16 bit], busy [1 bit]; each µP has its own code segment.

IV. DSP MICROPROCESSOR UNITS

In order to perform the specific DSP functions (oscillator, filter and amplitude modulator), the microprocessors were programmed as RISC (reduced instruction set computer) processors.

The algorithm of every µP is:

* Read data (including the function to be executed) from the controller unit and store it in registers or in a small memory.
* Wait for a synchronous event to start data processing.
* Start the process execution and generate a "Busy" signal.
* Complete the process execution, deactivate the "Busy" signal and write the result to the output register.

To communicate with the controller, the µP uses "data_in" and "data_out" ports and "data_ready" and "data_collected" control signals (fig. 8), in a two-wire handshake protocol. The controller informs the microprocessor that new data is available (data_ready='1'). The µP reads the data from the bus and answers the controller that the data was collected (data_collected='1'). Then the controller finishes the communication by setting data_ready='0', and the µP finishes the communication by setting data_collected='0'.
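The µP algorithm and the handshake described above can be sketched together as a small behavioural model in Python. It is not the authors' HDL: the class, the method names, the fixed `latency` and the convention that the first transferred word names the function are all assumptions made for illustration; only the signal names data_ready, data_collected, busy, data_in and data_out come from the text.

```python
class MicroProcessor:
    """Toy model of one µP: handshake receive, then a busy/compute phase."""

    def __init__(self, functions, latency=2):
        self.functions = functions  # function name -> Python callable
        self.latency = latency      # hypothetical processing time in clocks
        self.received = []          # words latched from data_in
        self.data_collected = 0
        self.busy = 0
        self.data_out = None        # output register
        self._remaining = 0

    def on_data_ready(self, data_ready, data_in):
        """React to the controller's data_ready line."""
        if data_ready == 1 and self.data_collected == 0:
            self.received.append(data_in)  # read the word from the bus
            self.data_collected = 1        # acknowledge: data was collected
        elif data_ready == 0 and self.data_collected == 1:
            self.data_collected = 0        # end of communication

    def enable(self):
        """Start event: begin processing and raise Busy."""
        self.busy = 1
        self._remaining = self.latency

    def clock(self):
        """One clock tick; on the last tick write the result and drop Busy."""
        if not self.busy:
            return
        self._remaining -= 1
        if self._remaining == 0:
            name, *operands = self.received  # first word selects the function
            self.data_out = self.functions[name](*operands)
            self.received = []
            self.busy = 0


def send_word(up, word):
    """Controller side of one transfer; returns (data_ready, data_collected) pairs."""
    trace = [(1, up.data_collected)]      # controller raises data_ready
    up.on_data_ready(1, word)
    trace.append((1, up.data_collected))  # µP latched the word and acknowledged
    trace.append((0, up.data_collected))  # controller lowers data_ready
    up.on_data_ready(0, word)
    trace.append((0, up.data_collected))  # µP lowers data_collected
    return trace


up = MicroProcessor({"AM": lambda carrier, lfo, depth: carrier * (1 + depth * lfo)})
traces = [send_word(up, w) for w in ("AM", 0.5, 1.0, 0.5)]
up.enable()
up.clock()
busy_after_first_clock = up.busy
up.clock()
```

Each transfer walks through the four line states (1,0), (1,1), (0,1), (0,0), so neither side moves on until the other has seen the previous transition.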

Fig. 8 Communication Timing (P1_Data_Bus drives data_in, P1_Data_RDY drives data_ready, P1_collected_data carries data_collected; phases shown: data ready, µP reads data, acknowledge, end of communication)


V. THE CONTROLLER UNIT

The Controller Unit (CU) is the heart of the system. This module is used to transfer data between the memory, the DSP µPs and the I/O modules, as well as to control the activity of each DSP µP module.

The CU module performs none of the DSP's operations; it behaves like a microcontroller without an arithmetic and logic unit (ALU). The DSP processors act as the ALU of this module, but a generic ALU can be added if needed.

The CU includes a "code ROM", which operates as a "code memory", and a state machine that decodes the data from this memory (fig. 9).

The module performs the following principal tasks:

* Moving data from memory to a specific DSP module or I/O and enabling it.
* Moving data from a specific DSP module to memory.
* Waiting for an event on the "interrupt" ports, which connect to the finish ports of the DSP modules.
* Enabling a specific DSP module or resetting it.
* Stopping and waiting for "new sample" events (start reading the ROM from the beginning).

A reduced instruction set was implemented in order to simplify this module as much as possible. A richer instruction set, including, for example, "call" and "jump" operations, can be added if the code that controls the system has specific subroutines or complex instruction sequences that are repeated. This requires more logic elements, but less ROM capacity and a smaller memory interface.
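The task list above maps naturally onto a very small instruction set. The paper does not give the actual opcodes or their encoding, so the Python interpreter below uses invented mnemonics (MOVE_TO, ENABLE, WAIT, MOVE_FROM, RESET, STOP) and a simplified DSP module that finishes as soon as it is enabled; it is a sketch of the idea, not the authors' controller.

```python
class DspModule:
    """Stand-in for a DSP µP: collects inputs, computes when enabled."""

    def __init__(self, func):
        self.func = func
        self.inputs = []
        self.output = None
        self.finish = False  # models the module's finish port

    def enable(self):
        self.output = self.func(*self.inputs)
        self.inputs = []
        self.finish = True

    def reset(self):
        self.inputs = []
        self.output = None
        self.finish = False


def run_once(rom, memory, modules):
    """Execute the ROM program from address 0 until STOP (one sample period)."""
    for op, *args in rom:
        if op == "MOVE_TO":        # memory -> DSP module input
            addr, name = args
            modules[name].inputs.append(memory[addr])
        elif op == "ENABLE":       # start a DSP module
            modules[args[0]].enable()
        elif op == "WAIT":         # wait for the module's finish event
            if not modules[args[0]].finish:
                raise RuntimeError("module never finished in this toy model")
        elif op == "MOVE_FROM":    # DSP module output -> memory
            name, addr = args
            memory[addr] = modules[name].output
        elif op == "RESET":
            modules[args[0]].reset()
        elif op == "STOP":         # wait for the next new-sample event
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# Hypothetical program: multiply two memory words on module p1.
rom = [
    ("MOVE_TO", 0, "p1"),
    ("MOVE_TO", 1, "p1"),
    ("ENABLE", "p1"),
    ("WAIT", "p1"),
    ("MOVE_FROM", "p1", 2),
    ("STOP",),
]
memory = run_once(rom, {0: 3, 1: 4, 2: None}, {"p1": DspModule(lambda a, b: a * b)})
```

Because the program is a straight line with no jumps, repeated sequences must be written out in full, which is the ROM-size cost the text attributes to the reduced instruction set.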

The control unit includes two main processes (the "Decode" process and the "Read" process) and a ROM which contains the instruction code. Every process is implemented as a state machine.

The "Decode" process includes three states:

* Idle state: the process waits for an event on the "new sample" port, connected to the sample clock, which sends a pulse when a new sample is produced.
* Decoding state: after a new-sample event has occurred, the machine starts decoding the data coming from the ROM data bus. The machine knows that the data on the port is ready when "data ready" = '1'.
* Waiting state: the machine waits for a specific hardware event to rise to '1'. This mode is useful when we want to collect the output data of one of the modules.

The "Read" process also includes three states:

* Reading state: in this mode the machine increases the address counter on every clock.
* Restart state: the machine enters this state when wait4enable='1' and resets the address counter. When wait4enable='0' the machine goes to the reading state.
* Rewind state: when wait4event='1' the address counter is decreased by 2. The Decode process is now waiting for a hardware event and has not yet processed the last two addresses that were read, so decreasing the counter ensures that the instructions are executed in the right order.

These processes work in pipeline mode, increasing the system's performance.
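The "Read" process can be sketched as a three-state machine around the address counter. The Python model below is one possible reading of the description, under these assumptions: wait4enable has priority over wait4event, the counter is decreased by 2 once on entry to the Rewind state and then held while wait4event stays '1', and the counter never goes below 0. The class and method names are hypothetical.

```python
class ReadProcess:
    """Behavioural model of the ROM address counter (Reading/Restart/Rewind)."""

    def __init__(self):
        self.state = "READING"
        self.address = 0

    def clock(self, wait4enable, wait4event):
        """Advance one clock and return the ROM address for the next fetch."""
        if wait4enable:
            # Restart: go back to the beginning of the ROM program.
            self.state = "RESTART"
            self.address = 0
        elif wait4event:
            # Rewind: step back two addresses once, then hold (assumption).
            if self.state != "REWIND":
                self.address = max(self.address - 2, 0)
            self.state = "REWIND"
        else:
            # Reading: increase the address counter on every clock.
            self.state = "READING"
            self.address += 1
        return self.address


reader = ReadProcess()
addresses = [
    reader.clock(0, 0),  # reading
    reader.clock(0, 0),  # reading
    reader.clock(0, 0),  # reading
    reader.clock(0, 1),  # Decode starts waiting for a hardware event
    reader.clock(0, 1),  # still waiting, counter held
    reader.clock(0, 0),  # event arrived, reading resumes
    reader.clock(1, 0),  # new sample period: restart from address 0
]
```

The rewind distance of 2 reflects the two addresses that were already read while the Decode process was stalling, so those instructions are fetched again once the event arrives.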

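Fig. 9 Controller Unit Architecture (signals shown: ROM address, Data ready, Wait4event, Wait4enable, Data/Control bus, Mem. add. bus, I/O add. bus, New sample, Hardware events; the hardware-event ports are connected to the "Finish" ports of the DSP modules)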

VI. FPGA IMPLEMENTATION

The architecture of this system and the algorithms for the controller, for the microprocessors and for the communication between controller and microprocessors were implemented and successfully tested using an Altera Nios Stratix development kit and 10-bit parallel A/D and D/A converters (fig. 10). To interface with the A/D and D/A converters, special controllers were implemented (A/D Ctrl, D/A Ctrl). Also, a dual-port RAM was implemented to store the input data.


Fig. 10 System Architecture

VII. CONCLUSION

Our DSP algorithm for the synthesizer system is composed of three separate processes. In some cases, two or three processes will work in parallel to improve performance.

The Controller and the DSP microprocessors are programmed as RISC processors, to simplify the FPGA platform.

The parallel processing of data is provided by the architecture of our platform and the pipeline execution of the controller unit processes and of the DSP modules.

REFERENCES

[1] B. V. Herzen, "Signal Processing at 250MHz using High-Performance FPGAs", ACM International Symposium on FPGAs, 1997, pp. 62-68.

[2] T. S. Hall, D. V. Anderson, "A Framework for Teaching Real-Time Digital Signal Processing with Field-Programmable Gate Arrays", IEEE Trans. on Education, vol. 48, 3, 2005, pp. 551-558.

[3] Z. A. Zamindar, "Signal Processing Capability with the NuHorizons Spartan-3 Development Board", Xcell Journal, 52, 2005, pp. 28-30.

[4] M. Pradhan, "Simplified Micro-controller & FPGA Platform for DSP Applications", Proceedings of the 2005 IEEE Int'l Conf. on Microelectronic Systems Education (MSE'05), vol. IV, pp. 544-549.

[5] N. Thirer, I. David, I. Baal Zedaka, U. Efron, "Improvement of FPGA Pipelines Implementation", SPIE Conf. "Optical Engineering and Instrumentation", San Diego, CA, USA, 13-17 Aug. 2006, paper 6294-37.
