
Reconfigurable Baseband Blocks for Wireless Multistandard Transceivers

Department of Electrical and Computer Engineering

Faculty of Engineering and Architecture American University of Beirut

Final Year Project Spring 2005-2006

Advisors: Prof. Mazen Saghir

Prof. Walid Ali Ahmad

Members: Abdul Hadi Al-Sayed 200300531

Hasan Khalifeh 200301843

Houssam Hayek 200302327

Submitted On: 23.5.2006


Acknowledgements

This report would not have been possible without the help of the big-hearted people who gave us their invaluable time, information and kindness.

First, we would like to thank our supervisors, Prof. Mazen Saghir and Prof. Walid Ali-Ahmad, for finding and providing us with essential documents for our project. They motivated us to achieve our goals, regularly checked our progress and gave us valuable comments. Second, many thanks to Mr. Khaled Joujou, who supported us on a daily basis and helped us install the PCI 5640 LabVIEW 8.0 board on our computer in the communication lab. Third, we would like to thank National Instruments, "a technology pioneer and leader in virtual instrumentation", for providing the PCI 5640 LabVIEW 8.0 board, an essential device in our project.


Table of Contents


List of Illustrations

List of Tables

Abstract

1- Introduction

1.1- Problem Definition

1.2- Report Structure

2- Literature Survey

2.1- Overview of Proposed Wireless Standards

2.1.1- WIMAX

2.1.2- WCDMA

2.2- Software Radio Concept

2.3- FIR Filters

2.3.1- Variable FIR Filters

2.3.1.1- Design Methods for Variable FIR Filters

2.3.1.2- FIR Tap Design with Variable Frequency Response

2.3.2- Area Considerations in FIR Design Schemes

2.4- Hardware Platforms

2.4.1- FPGA features

2.4.2- LABVIEW 8.0 System Board: PCI 5640

2.5- Error Vector Magnitude (EVM) Metric

3- Design Alternatives

3.1- FIR vs. IIR: Advantages and Disadvantages

3.2- FPGA vs. DSP Chips

4- Project Design & Analysis

4.1- System Definition

4.2- FIR Filter Coefficients Design

4.2.1- FIR System Definition

4.2.2- Filter Coefficients Generation

4.2.2.1- WIMAX Channel FIR Filter Design

4.2.2.2- WCDMA Channel FIR Filter Design

4.3- LABVIEW Simulation of 3G Channels

4.3.1- WCDMA Channel Simulation

4.3.1.1- WCDMA Acquire Input Signal

4.3.1.2- Noise Introduction VI

4.3.1.3- Channel Filter VI


4.3.1.4- EVM VI

4.3.2- WIMAX Channel Simulation

4.3.2.1- WIMAX VIs Explanation

4.4- Reconfigurable System Architecture

4.5- Hardware Implementation

4.5.1- Reconfigurability Aspect

4.5.2- Data Links and Communications

4.5.3- Host-FPGA synchronization

4.5.4- FPGA-HOST synchronization

4.5.5- Memory Component

4.5.6- Number Representation

4.5.7- Convolution Process

4.5.8- Host Application

4.5.9- FPGA Process

4.6- Design Assessment

4.6.1- Testing Scheme

4.6.2- Results Assessment

5- System Design Constraints

Conclusion

Appendix

I- Digital Filter Coefficients Design

II- Area Considerations for Variable FIR Design

III- LABVIEW 8.0 System Board: PCI 5640

IV- Virtex II Pro FPGA Capabilities

V- Fixed Point Notation

Bibliography


List of Illustrations

-Figure 2.1: WIMAX Block Diagram

-Figure 2.2: WCDMA Block Diagram

-Figure 2.3: Ideal root raised cosine filters

-Figure 2.4: Standard FIR Filter

-Figure 2.5: Magnitude and Phase Error

-Figure 4.1: WIMAX FIR baseband filter

-Figure 4.2: FS10 WIMAX FIR

-Figure 4.3: WIMAX FIR Filter Response using FS10

-Figure 4.4: WCDMA FIR baseband filter

-Figure 4.5: WCDMA FIR baseband filter using FS10

-Figure 4.6: Simplified WCDMA channel

-Figure 4.7: LABVIEW based WCDMA transceiver block diagram

-Figure 4.8: WCDMA transmitter using ADS

-Figure 4.9: WCDMA transmitted signal and its constellation using ADS

-Figure 4.10: Top level View of the WCDMA Channel

-Figure 4.11: WCDMA Acquire Input Signal VI

-Figure 4.12: PSD of a WCDMA Signal

-Figure 4.13: Constellation of the Input Data

-Figure 4.14: Noise Introduction VI

-Figure 4.15: Noisy PSD

-Figure 4.16: Noisy QPSK Constellation

-Figure 4.17: Channel Filter LABVIEW VI

-Figure 4.18: PSD of the Filtered Signal

-Figure 4.19: Constellation after Filtering

-Figure 4.20: EVM LABVIEW VI

-Figure 4.21: LABVIEW WIMAX Transceiver Block Diagram

-Figure 4.22: WIMAX transmitter using ADS

-Figure 4.23: WIMAX transmitted signal using ADS

-Figure 4.24: Top level View of the WIMAX Channel in LABVIEW

-Figure 4.25: WIMAX PSD in LABVIEW

-Figure 4.26: WIMAX EVM LABVIEW Module

-Figure 4.27: WIMAX filter response. a) no noise b) noisy WIMAX signal for 0.3 noise standard deviation c) WIMAX filter response for 0.3 noise standard deviation

-Figure 4.28: Reconfigurable Transceiver Block Diagram

-Figure 4.29: Case Structure (Choose WCDMA / WIMAX)


-Figure 4.30: Host-FPGA Link

-Figure 4.31: FPGA Read Process

-Figure 4.32: DMA FIFO Read Method

-Figure 4.33: FIR implementation Using Arrays and FIFOs

-Figure 4.34: Write a 32-bit coefficient in the memory

-Figure 4.35: Convolution Process

-Figure 4.36: HOST VI

-Figure 4.37: FPGA VI

-Figure 4.38: Frequency Domain of the Input WCDMA signal

-Figure 4.39: Frequency Domain of the Input WCDMA signal

-Figure 4.40: WCDMA Initial Constellations

-Figure 4.41: WCDMA Filtered Constellations

-Figure 4.42: Frequency Domain of the Input WIMAX signal

-Figure 4.43: Frequency Domain of the filtered WIMAX signal

-Figure A.1: Transposed FIR

-Figure A.2: Transposed FIR with multiplier block

-Figure A.3: High level FPGA I/O architecture

-Figure A.5: High level Diagram of the PCI 5640


List of Tables

-Table 4.1: MATLAB code to generate FIR coefficients of WIMAX

-Table 4.2: MATLAB Testing for Fixed Point Notation


Abstract

As new wireless communication standards are introduced to the market, reconfigurable systems are becoming essential to solve the different problems posed by the coexistence of multiple standards. In this report, we propose an implementation technique for reconfiguring WIMAX and WCDMA transceivers. This technique highlights the design considerations related to the channel FIR filter present in the receiver of each of the aforementioned standards. The report also discusses the features of the PCI 5640 LabVIEW 8.0 device onto which the proposed design is downloaded. In addition, a common architecture is proposed to facilitate the future reconfiguration of the different modules in the transceiver. Implementation, performance, reconfigurable FIR filter testing and results are discussed in detail.

1. Introduction

1.1- Problem Definition

1.2- Report Structure


1-Introduction

Since the early 1980s, the evolution of new wireless communication standards has been remarkable, especially in the migration from analog communication systems to their digital equivalents. Later, the industrial competition between Asia, Europe and America encouraged the development of a single mobile system standardized all over the world, which would be of great benefit to the market [1]. Until such a standard is deployed, the market will face many problems due to the coexistence of multiple communication standards. Many researchers are therefore working on short-term solutions before the transition to worldwide standards takes place. One leading solution, the subject of our project, is the dynamic reconfiguration of the different modules in the system to suit the specifications of as many standards as possible.

1.1- Problem Definition

The heterogeneity found at the different layers of the wireless communication channel is increasing as new standards are introduced. Despite this problem, many countries, such as the European countries and Japan, are willing to install new base stations that support a multitude of communication standards such as GSM, EDGE, UMTS-FDD and Bluetooth. Designing such base stations efficiently requires studying the reconfigurable aspects of the different modules in order to avoid duplicating resources, so that the system can dynamically reconfigure itself to its environment as needed. This solution benefits both the end user and the manufacturers. The end user benefits from a higher quality of service, better connectivity and enhanced roaming. The manufacturers profit from easier introduction of new types of services, a shorter time to market and a lower cost of adding new standards.


In this report, we present the design and implementation of some blocks of a

reconfigurable transceiver that is adaptable to WCDMA and WIMAX.

1.2- Report Structure

Our report is organized as follows. Chapter 2 introduces our topic by giving a general survey of the subjects related to our design: an overview of the WIMAX and WCDMA wireless standards, with their specifications, is given, followed by a general introduction to the SDR concept and to FIR filter and FPGA related topics. In Chapter 3, the different design alternatives studied throughout our survey are presented. A more detailed description of our system design and analysis, including simulation and hardware implementation, is then given in Chapter 4, where we also present the testing scheme used and a detailed description of the results obtained. In Chapter 5, we present system design constraints from different perspectives, such as economic, social, sustainability and political ones. Finally, a conclusion section is added. An appendix covering further details about fixed point notation, the PCI 5640, area considerations in FIR design, Virtex II Pro FPGA capabilities and filter coefficients design is included for further information.


2. Literature Survey

2.1- Overview of Proposed Wireless Standards

2.1.1- WIMAX

2.1.2- WCDMA

2.2- Software Radio Concept

2.3- FIR Filters

2.3.1- Variable FIR Filters

2.3.1.1- Design Methods for Variable FIR Filters

2.3.1.2- FIR Tap Design With Variable Frequency Response

2.3.2- Area Considerations in FIR Design Schemes

2.4- Hardware Platforms

2.4.1- FPGA features

2.4.2- LABVIEW 8.0 System Board: PCI 5640

2.5- Error Vector Magnitude (EVM) Metric


2- Literature Survey

In this chapter, we present a literature survey of the topics needed for the design and implementation of this final year project. An overview of the proposed 3G wireless standards, WIMAX and WCDMA, including their specifications, is presented, followed by a general description of the software-defined radio (SDR) concept. A survey of FIR filters, in particular variable FIR filters, and of some related design techniques follows. Finally, we present the features of the Virtex-II Pro FPGA used, as well as the definition of the Error Vector Magnitude (EVM) metric.

2.1- Overview of Proposed Wireless Standards

With evolving technology, users' needs are becoming more demanding, especially in wireless communication. A user is no longer satisfied with using a mobile phone for voice communication alone, but also expects data services at high data rates, ranging from SMS to multimedia communication. For these reasons, new wireless standards have evolved to meet these and other user requirements. Among these standards are WIMAX and WCDMA.

2.1.1- WIMAX

WIMAX (Worldwide Interoperability for Microwave Access), also referred to as 802.16, is the current standard for broadband wireless MAN networks. It aims to provide a wireless alternative to cable, DSL and T1/E1 for last-mile broadband access, and will also be used to connect hot-spots to the Internet [2]. It has the potential for very long range (5-30 miles) and high speeds [3]. The first version of WIMAX was approved as IEEE Standard 802.16-2001 and published in 2002. This standard, however, addressed only fixed line-of-sight connections, focusing on licensed frequencies in the 10-66 GHz range, and could reach a maximum distance of 5 km [2].

To overcome these drawbacks, the standard was extended to 802.16a, which addresses the lower 2-11 GHz range; it can reach a maximum distance of 50 km (ten times farther) with a bit rate of up to 75 Mbit/s. The most important advantage of this standard, in addition to those already mentioned, is its support for non-line-of-sight (NLOS) operation, which is possible because it runs on lower frequency bands than the 10-66 GHz bands of the previous standard [2].

WIMAX offers higher capacity at lower cost than DSL or cable for extending fiber networks. It also supports multimedia and fast Internet applications. The block diagram of 802.16a is illustrated in Figure 2.1.

Figure 2.1: WIMAX Block Diagram

As shown in Figure 2.1, WIMAX uses OFDM, which makes non-line-of-sight (NLOS) operation possible [2].

WIMAX specifications are summarized below:

- Selectable channel bandwidths of 1.5, 1.75, 3, 3.5, 5.5, 7, 10, 14 and 20 MHz

- 256-point FFT/IFFT

- 10-bit AGC with fully programmable outputs for interfacing to any type of attenuator

- Support for a maximum of 128 dB of attenuation with ½ dB step resolution

- Interpolation and decimation filters for 2x oversampling

Moreover, the filter characteristics depend on the ADC (analog-to-digital converter) dynamic range and sample rate [5].

It is useful at this point to explain some of the blocks that appear in Figure 2.1:

Convolutional Encoder: This encoder encodes a stream of binary input vectors (K bits) and outputs a, usually larger, stream of output vectors (K*L bits), where L is a positive integer chosen to suit the design specifications [4]. The encoder plays an important role in fading or noisy environments because the added redundancy allows the decoder to correct some of the errors caused by such environments.
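To make the encoder description concrete, the sketch below implements a rate-1/2 convolutional encoder (L = 2: every input bit produces two output bits). It assumes a constraint length of 7 with the generator polynomials 171 and 133 (octal), the widely used pair that also appears in 802.11a; we assume, rather than quote, that it applies to the WIMAX chain of Figure 2.1. The function and constant names are our own.

```python
CONSTRAINT_LEN = 7
G1 = 0o171  # assumed generator for output X (binary 1111001)
G2 = 0o133  # assumed generator for output Y (binary 1011011)


def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") % 2


def conv_encode(bits):
    """Rate-1/2 feed-forward convolutional encoder.

    The register keeps the current input bit in its most significant position
    and the six previous bits below it; each input bit yields the pair (X, Y).
    """
    register = 0
    out = []
    for bit in bits:
        register = ((bit & 1) << (CONSTRAINT_LEN - 1)) | (register >> 1)
        out.append(parity(register & G1))  # X output
        out.append(parity(register & G2))  # Y output
    return out


if __name__ == "__main__":
    # An impulse input reproduces the generator taps, interleaved as (X, Y) pairs.
    print(conv_encode([1, 0, 0, 0, 0, 0, 0]))
```

Feeding a single 1 followed by six zeros returns the taps of G1 and G2 interleaved pair by pair, which is a quick way to check the register convention.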

Interleaver: This block further improves the performance of the encoder at the transmitter side and of the decoder at the receiver side [4]. It is especially important in fading environments, where errors tend to occur in bursts: by spreading a burst of errors over many (K*L) output blocks, it leaves only a few errors in each block, which the decoder can then correct.
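A simple row-column block interleaver illustrates how a burst of errors is spread out. This is a generic sketch: the array size and function names are our own assumptions, and the actual 802.16 interleaver is a permutation defined by the standard that differs from this one.

```python
def block_interleave(bits, rows, cols):
    """Write the bits row by row into a rows x cols array, read them column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(bits, rows, cols):
    """Inverse operation: write column by column, read row by row."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    out = [None] * (rows * cols)
    for i, bit in enumerate(bits):
        c, r = divmod(i, rows)  # position i was read from row r of column c
        out[r * cols + c] = bit
    return out


if __name__ == "__main__":
    data = list(range(12))                     # bit positions 0..11
    sent = block_interleave(data, rows=3, cols=4)
    burst = sent[:3]                           # three consecutive channel errors
    print(burst)                               # -> [0, 4, 8]
```

Three consecutive channel errors therefore land on original positions 0, 4 and 8, i.e. in different parts of the block, where the decoder is more likely to correct them.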

Modulator: WIMAX (802.16a) uses OFDM (orthogonal frequency division multiplexing), which divides the available bandwidth into many subcarriers and sends the data (for one user or multiple users) on each subcarrier. It uses shifted pulse-shaping filters at the transmitter and receiver, which reduce the crosstalk between subcarriers [4].

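Time guard: This block further improves the modulator by reducing the effect of multipath propagation. The guard interval is filled with a cyclic prefix (a copy of the last samples of each OFDM symbol), so the multipath channel acts on each symbol as a circular convolution. As a result, OFDM needs only one multiplication per subcarrier for equalization [4].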
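The modulator and time-guard descriptions can be illustrated with a minimal OFDM sketch using the 256-point FFT/IFFT from the specification list. The cyclic-prefix length of 32 samples (1/8 of a symbol), the QPSK data and the 3-tap multipath channel are illustrative assumptions, and noise is omitted. Because the prefix is longer than the channel memory, equalization reduces to one complex multiplication per subcarrier.

```python
import numpy as np

N_FFT = 256  # 256-point FFT/IFFT
N_CP = 32    # assumed guard (cyclic prefix) length: 1/8 of the symbol


def ofdm_modulate(subcarrier_symbols):
    """IFFT onto the subcarriers, then prepend the cyclic prefix (time guard)."""
    time_signal = np.fft.ifft(subcarrier_symbols, N_FFT)
    return np.concatenate([time_signal[-N_CP:], time_signal])


def ofdm_demodulate(received):
    """Drop the cyclic prefix and return to the frequency domain."""
    return np.fft.fft(received[N_CP:N_CP + N_FFT], N_FFT)


def one_tap_equalize(received_subcarriers, channel_taps):
    """Equalize with one complex multiplication (by 1/H[k]) per subcarrier."""
    channel_response = np.fft.fft(channel_taps, N_FFT)
    return received_subcarriers * (1.0 / channel_response)


rng = np.random.default_rng(0)
qpsk = (rng.choice([-1.0, 1.0], N_FFT) + 1j * rng.choice([-1.0, 1.0], N_FFT)) / np.sqrt(2)
channel = np.array([1.0, 0.5, 0.2j])             # hypothetical multipath channel
tx = ofdm_modulate(qpsk)
rx = np.convolve(tx, channel)[: len(tx)]         # multipath propagation, no noise
recovered = one_tap_equalize(ofdm_demodulate(rx), channel)
print(np.allclose(recovered, qpsk))
```

Without the prefix, the channel would smear the end of one symbol into the next and the per-subcarrier division would no longer undo it exactly.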

Puncturer: This block removes some of the encoded bits according to a puncturing pattern, reducing the bit rate to match that of the interleaver. Only redundant coded bits are removed, in such a way that no information is lost.
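Puncturing can be sketched as deleting coded bits according to a periodic pattern. The pattern below keeps three of every four bits of a rate-1/2 stream (X1, Y1, X2, dropping Y2), raising the code rate to 2/3; it is a hypothetical example chosen for illustration, not the pattern mandated by the standard. The receiver re-inserts erasure markers at the deleted positions so that the decoder can treat them as unknown bits.

```python
PATTERN = [1, 1, 1, 0]  # hypothetical: 1 = transmit, 0 = delete (rate 1/2 -> 2/3)


def puncture(coded_bits, pattern=PATTERN):
    """Keep only the coded bits whose pattern entry is 1."""
    return [b for i, b in enumerate(coded_bits) if pattern[i % len(pattern)]]


def depuncture(punctured_bits, n_coded, pattern=PATTERN, erasure=None):
    """Re-insert an erasure marker wherever a bit was deleted."""
    received = iter(punctured_bits)
    return [next(received) if pattern[i % len(pattern)] else erasure
            for i in range(n_coded)]


if __name__ == "__main__":
    coded = [1, 0, 1, 1, 0, 0, 1, 0]   # 8 coded bits from 4 input bits
    sent = puncture(coded)             # 6 bits sent: rate 4/6 = 2/3
    print(sent, depuncture(sent, len(coded)))
```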



2.1.2- WCDMA

Wideband CDMA (WCDMA) is a third-generation (3G) wireless standard. It uses a 5 MHz channel for both voice and data, offering an initial data speed of 384 kbps [3], and can reach speeds of up to 2 Mbps for voice, video, data and image transmission. WCDMA is also referred to as UMTS; the two terms have become interchangeable [3].

This standard is based on code division multiple access (CDMA), which allows multiple users to share the channel. However, this gives rise to inter-symbol interference (ISI) and to interference between users. To counter it, WCDMA uses a spread spectrum (SS) modulation technique that reduces interference by a factor L, called the "spreading gain" or "spreading factor". This modulation technique has other advantages. For example, it spreads the signal over a larger bandwidth with the same energy, thus reducing its power spectral density; the information signal can then lie below the noise threshold of an intentional jammer and escape the jamming action. The technique also allows multiple users to access the same frequency band at the same time. This, however, leads to some problems, especially interference, which WCDMA mitigates with techniques such as soft handover and softer handover. The WCDMA block diagram is shown in Figure 2.2.
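Spreading and despreading can be illustrated with orthogonal Walsh codes, the same family as WCDMA's OVSF channelization codes. In this sketch (our own names, a spreading factor of 4, no scrambling and no noise), two users transmit in the same band at the same time, and each is recovered by correlating with its own code, because the cross-correlation of orthogonal codes is zero.

```python
import numpy as np


def walsh_codes(spreading_factor):
    """Orthogonal +/-1 codes: rows of a Sylvester Hadamard matrix (size must be a power of 2)."""
    codes = np.array([[1]])
    while codes.shape[0] < spreading_factor:
        codes = np.block([[codes, codes], [codes, -codes]])
    return codes


def spread(symbols, code):
    """Replace each symbol by symbol * code, i.e. len(code) chips per symbol."""
    return np.outer(symbols, code).ravel()


def despread(chips, code):
    """Correlate each group of chips with the code and normalize by its length."""
    return chips.reshape(-1, len(code)) @ code / len(code)


codes = walsh_codes(4)
user_a = np.array([1, -1, 1, 1])
user_b = np.array([-1, -1, 1, -1])
channel = spread(user_a, codes[1]) + spread(user_b, codes[2])  # same band, same time
print(despread(channel, codes[1]), despread(channel, codes[2]))
```

The chip rate is four times the symbol rate, which is the bandwidth expansion by the spreading factor L described above.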

As shown in Figure 2.2, the mapping used is QPSK, which maps every two bits onto one symbol. Upsampling, usually by a factor of 4, is then performed to increase the sample rate, so that the output of the processing blocks is delivered to the DAC at the rate it requires [6].

Upsampling is performed by inserting 4-1 = 3 zeros between consecutive input samples; the net result is a DTFT compressed by a factor of 4 [7]. However, upsampling adds undesired spectral images to the original signal, centered on multiples of the original sampling rate. Some filtering is therefore needed to remove them (this is the interpolation filtering at the end of the chain) [6].
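The QPSK mapping and the zero-insertion upsampling step can be sketched as follows. The Gray-coded mapping of bit pairs onto (±1 ± j)/√2 is an assumed convention and the function names are our own. The final check shows why the interpolation filter is needed: the DFT of the upsampled sequence is the original DFT repeated four times, i.e. the spectrum compressed by 4 plus three images.

```python
import numpy as np


def qpsk_map(bits):
    """Map each pair of bits (b0, b1) to ((1 - 2*b0) + j*(1 - 2*b1)) / sqrt(2)."""
    pairs = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)


def upsample(samples, factor=4):
    """Insert factor - 1 zeros between consecutive samples."""
    out = np.zeros(len(samples) * factor, dtype=complex)
    out[::factor] = samples
    return out


symbols = qpsk_map([0, 0, 0, 1, 1, 0, 1, 1])   # 8 bits -> 4 symbols
upsampled = upsample(symbols)                   # 16 samples
# The 16-point DFT of the upsampled signal is the 4-point DFT tiled 4 times:
# the original spectrum compressed by 4, plus three spectral images.
print(np.allclose(np.fft.fft(upsampled), np.tile(np.fft.fft(symbols), 4)))
```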

Figure 2.2: WCDMA Block Diagram

The standard uses a root raised cosine (RRC) filter. In addition to acting as a pulse-shaping filter, the cascade of the transmit and receive RRC filters forms a raised cosine response, which cancels ISI in an ideal channel: the peak of each pulse coincides with the zero crossings of all the other pulses, so none of them affects its value at the sampling instant. The impulse response of the FIR root raised cosine filter is shown in Figure 2.3.

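The function obeys the following equation:

$$ h(t) = \frac{4\alpha}{\pi\sqrt{T}} \cdot \frac{\cos\left((1+\alpha)\dfrac{\pi t}{T}\right) + \dfrac{T}{4\alpha t}\,\sin\left((1-\alpha)\dfrac{\pi t}{T}\right)}{1 - \left(\dfrac{4\alpha t}{T}\right)^{2}} $$

where α is the rolloff factor and T is the symbol (chip) period.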

Figure 2.3: Ideal root raised cosine filters
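The RRC equation can be evaluated numerically as sketched below, using the algebraically equivalent form (1/√T)[sin(π(t/T)(1−α)) + 4α(t/T)cos(π(t/T)(1+α))] / [π(t/T)(1 − (4αt/T)²)]. The expression has removable singularities at t = 0 and t = ±T/(4α), which are replaced by their limiting values. The default rolloff α = 0.22 is the value specified for WCDMA; T = 1, the 8-samples-per-chip grid and the function name are our own assumptions.

```python
import numpy as np


def rrc_impulse(t, T=1.0, alpha=0.22):
    """Root raised cosine impulse response h(t), with the removable singularities handled."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = t / T                                        # time in symbol (chip) periods
    h = np.empty_like(u)
    at_zero = np.isclose(u, 0.0)
    at_edge = np.isclose(np.abs(4 * alpha * u), 1.0)  # t = +/- T / (4 alpha)
    regular = ~(at_zero | at_edge)

    ur = u[regular]
    numerator = np.sin(np.pi * ur * (1 - alpha)) + 4 * alpha * ur * np.cos(np.pi * ur * (1 + alpha))
    denominator = np.pi * ur * (1 - (4 * alpha * ur) ** 2)
    h[regular] = numerator / denominator

    # Limits of the expression at the singular points.
    h[at_zero] = 1 - alpha + 4 * alpha / np.pi
    theta = np.pi / (4 * alpha)
    h[at_edge] = (alpha / np.sqrt(2)) * ((1 + 2 / np.pi) * np.sin(theta) + (1 - 2 / np.pi) * np.cos(theta))
    return h / np.sqrt(T)


t = np.arange(-8, 8.125, 0.125)   # 8 samples per chip over +/- 8 chips (assumed span)
taps = rrc_impulse(t)
```

Sampling the response this way gives a finite set of FIR taps; truncating at ±8 chips is a design choice that trades filter length against residual ISI.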

Upconversion follows, converting the low-frequency signal into an RF signal. A filter called the "interpolation filter" is then applied; as discussed previously, it removes the undesired spectral images caused by upsampling and upconversion. The output then has the same rate as the DAC, so it can be fed to the DAC block and transmitted.

The receiver consists of the inverses of the blocks explained above.


2.2- Software Radio Concept

In the transition from 2G to 3G standards, a common implementation platform that can support all wireless standards in one way or another is needed.

One evolving technique for variable FIR implementation uses the SDR concept. "SDR is a rapidly evolving technology that is receiving enormous recognition and generating widespread interest in the telecommunication industry. It is the focus of research in the communication field world wide" [8].

"SDR refers to the technology where software modules running on a generic hardware platform are used to implement radio functions" [8, 9]. In other words, the hardware platform supports multiple software modules, so the system can switch between different standards by running the corresponding software.

SDR tries to achieve two main goals [10]:

- to move the digital part of the transmitter and the receiver as close as possible to the antenna (RF end)

- to replace ASICs with DSPs, since DSPs can process baseband signals and thus implement radio functionality in software

As illustrated above, most solutions rely on software to solve different problems. These solutions, however, are not necessarily optimal: to reach optimal performance, hardware programmable devices such as FPGAs need to be used alongside software.

SDR offers many advantages to the user, the most important of which is reconfigurability. Reconfigurability has been extensively researched in the last decade, especially in the field of mobile communications, and several groups have worked on reconfiguring the different modules of the channel. For example, [11] presents a software implementation of the modulation/demodulation schemes of GSM, UMTS, EDGE and Bluetooth on a single hardware platform. These standards use different modulation techniques: UMTS uses QPSK, GSM uses GMSK, EDGE uses 8PSK and Bluetooth uses GFSK. The need to implement these different systems on one hardware platform led the researchers to examine the mathematical representation of these modulation types. They observed that all of them can be expressed by a quadrature decomposition, which encouraged them to build a common architecture called digital-IF. Their implementation is done entirely in software and downloaded onto a DSP; the translation between frequency bands is also done in software. This implementation exploits the common mathematical structure of the different modulation schemes and benefits from a DSP board that allows the work to be done in the software domain.
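The digital-IF idea of [11], one quadrature modulator shared by several modulation schemes, can be sketched as follows. The IF frequency (1/8 cycle per sample), the 32 samples per symbol and the plain M-PSK constellations are illustrative assumptions; GMSK and GFSK, which need a phase-continuous baseband generator, are not covered. The point is only that any scheme expressed as I + jQ passes through the same modulator and demodulator code.

```python
import numpy as np

F_IF = 1 / 8   # assumed IF carrier, in cycles per sample
SPS = 32       # samples per symbol: exactly 4 carrier cycles
_n = np.arange(SPS)
COS_IF = np.cos(2 * np.pi * F_IF * _n)
SIN_IF = np.sin(2 * np.pi * F_IF * _n)


def psk_symbols(indices, order):
    """M-PSK baseband symbols I + jQ for integer indices 0..order-1."""
    return np.exp(2j * np.pi * np.asarray(indices) / order)


def iq_modulate(symbols):
    """Common digital-IF stage: s[n] = I*cos(2*pi*f*n) - Q*sin(2*pi*f*n)."""
    return np.concatenate([s.real * COS_IF - s.imag * SIN_IF for s in symbols])


def iq_demodulate(if_signal):
    """Coherent quadrature demodulation: correlate each symbol period with both carriers."""
    blocks = if_signal.reshape(-1, SPS)
    i = 2 / SPS * (blocks @ COS_IF)
    q = -2 / SPS * (blocks @ SIN_IF)
    return i + 1j * q


for order in (4, 8):   # QPSK (UMTS) and 8PSK (EDGE) through the same code path
    sym = psk_symbols(np.arange(order), order)
    print(order, np.allclose(iq_demodulate(iq_modulate(sym)), sym))
```

Because each symbol period spans a whole number of carrier cycles, the cosine and sine carriers are exactly orthogonal over it, so the correlations separate I and Q without an explicit lowpass filter.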

SDR can offer further advantages. These include "multi-service" operation, since an SDR system can theoretically operate in multi-service environments without being constrained to a particular standard; "multi-band" operation, since SDR systems can theoretically function on any radio frequency band; and an "update feature", since software modules implementing new characteristics can be downloaded to the hardware platform to keep the system up to date [1, 8, 12, 13].

Some drawbacks of SDR are that, depending on the design, the system may have higher power consumption, higher processing power (MIPS) requirements or higher initial costs.


2.3- FIR Filters

The digital filter is one of the basic blocks in digital signal processing and communication systems; its design, and hence its operation, can affect the performance of the whole system. Digital filters are characterized by several parameters, including their order and the values of their coefficients, which depend on the desired frequency response. Generally, there are two kinds of digital filters: IIR and FIR. Infinite impulse response (IIR) filters are filters whose impulse response can be infinite in duration. A finite impulse response (FIR) filter, on the other hand, has a finite number of taps in its impulse response [14, 15, 16]. It is one of the most widely used modules in DSP applications. "It performs a moving, weighted average on a discrete input signal, x(n), to produce an output signal" [17]. Thus, the output of an N-tap FIR filter depends only on the current input and the previous N-1 inputs.

The basic operation in the filtering process is to convolve the input with the filter weights, or taps, as given by the following equation:

$$ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (1) $$

Figure 2.4 shows the architecture of a standard, fully pipelined FIR filter implementing the formula above.

Figure 2.4: Standard FIR Filter
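Equation (1) can be sketched as a direct-form FIR filter in which a delay line holds the N most recent inputs and each output is their weighted sum, the moving, weighted average quoted above. This is a plain software model under the assumption that inputs before n = 0 are zero; it ignores the pipelining and fixed-point arithmetic of a hardware implementation such as Figure 2.4.

```python
def fir_direct_form(x, h):
    """y(n) = sum_{k=0}^{N-1} h(k) * x(n - k), assuming x(n) = 0 for n < 0."""
    delay_line = [0.0] * len(h)                  # holds x(n), x(n-1), ..., x(n-N+1)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]  # shift the newest input in
        y.append(sum(hk * xk for hk, xk in zip(h, delay_line)))
    return y


if __name__ == "__main__":
    print(fir_direct_form([1, 2, 3, 4], [1, 1]))               # 2-tap moving sum
    print(fir_direct_form([1, 0, 0, 0], [0.5, 0.25, 0.125]))   # impulse -> the taps
```

Feeding an impulse returns the taps themselves, which is the defining property of the impulse response h(k).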

FIR filter design involves two main stages: coefficient design and architecture design. The following sections deal with the first stage, coefficient design, for variable FIR filters; more information about filter coefficient design is given in Section I of the Appendix. The second stage, the FIR architecture and implementation, is described in Chapter 4 of the report, Project Design & Analysis.

2.3.1- Variable FIR filters

Variable digital filters are digital filters whose frequency characteristics depend on control or tuning parameters. The most common variable parameters include:

• Variable cutoff frequency

• Adjustable passband width

• Adjustable stopband width

• Controllable fractional delay

• Magnitude and number of ripples

• Attenuation level in various bands

Varying any of the above parameters changes the values of the filter coefficients and may also change the order of the filter, i.e. the number of taps.

2.3.1.1- Design Methods for Variable FIR Filters

Methods for designing variable digital filters can be classified into two main categories: transformation-based methods and spectral parameter approximation methods [18]. Transformation-based methods first design a filter with certain fixed frequency characteristics and then apply a transformation, based on predesigned parameters, to obtain a new filter with the desired frequency characteristics. This approach is generally applied to filters with variable cutoff frequencies. Spectral parameter approximation methods, on the other hand, approximate either the impulse response or the poles and zeros of the filter by polynomials that are functions of certain spectral parameters [18, 19, 20, 21]. One such technique is curve fitting, examined in the following section.

2.3.1.2- FIR Tap Design with Variable Frequency Response

Different approaches in each category have been proposed for designing the taps of variable digital filters. In this report, we focus on the most widely used ones. One old but still evolving technique, belonging to the second category, expresses the FIR impulse response as a linear combination of basis functions. Another technique relies on the frequency masking concept. Note that the frequency masking approach is a mix of the two categories, as will be illustrated later.

In the former, each filter coefficient is a multidimensional function, or polynomial, of the spectral parameter. Well-known algorithms for the optimal approximation of the filter coefficients include the least squares (LS) method, the weighted least squares (WLS) method and the curve fitting approach, described below:

Least Squares Approximation Method

By expressing the impulse response of the filter as a linear combination of basis

functions, the optimal LS (Least squares) solution for designing the filter then reduces to

solving a system of linear equations.

The impulse response of the variable filter, h(n,Φ), is considered as a linear combination

of the functions ψm (Φ), which depend on the variable parameter (Φ). This is illustrated in

the following equation.

$$ h(n,\Phi) = \sum_{m=0}^{M} c_{n,m}\,\psi_m(\Phi) $$

The functions ψ_m(Φ) constitute the basis functions; they are most often chosen to be orthonormal, but this is not necessary. The values c_{n,m} are the expansion coefficients. The aim is to determine the expansion coefficients, given the basis functions, such that the frequency response of h(n, Φ) approximates a desired variable frequency response as a function of the spectral parameter Φ. The approximation error E(ω, Φ), which is the difference between the desired and the approximated frequency responses, is a function of the expansion coefficients c_{n,m} [18]. The weighted L2 form of the error is given by:

$$ E = \int_{\Phi_s} \int_{\Omega_s} W(e^{j\omega}, \Phi)\, \left| E(\omega, \Phi) \right|^{2} \, d\omega \, d\Phi $$

where W is a weighting function that controls the amount of approximation error across the frequency space, Ω_s is the frequency space and Φ_s is the parameter space over which the spectral parameter vector Φ is varied. The L2 norm of E(ω, Φ) is a quadratic function of the expansion coefficients and has a unique minimum characterized by a system of linear equations: writing E in the appropriate form, differentiating it with respect to the expansion coefficients and setting the result to zero yields a linear system whose solution gives the optimal coefficients [18].
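The least squares procedure above can be sketched by replacing the double integral with a sum over a grid of (ω, Φ) samples, which turns the minimization into an ordinary linear least-squares problem. This sketch assumes polynomial basis functions ψ_m(Φ) = Φ^m (one common choice, not necessarily that of [18]) and a weight evaluated only at the grid points; the function names and the fractional-delay example are hypothetical.

```python
import numpy as np


def variable_response(c, omegas, phis):
    """H(e^{jw}, Phi) = sum_n sum_m c[n, m] Phi**m e^{-j w n}, evaluated pointwise."""
    n_taps, n_basis = c.shape
    h = (phis[:, None] ** np.arange(n_basis)) @ c.T          # h(n, Phi) at each point
    return np.sum(h * np.exp(-1j * omegas[:, None] * np.arange(n_taps)), axis=1)


def design_variable_fir_ls(desired, n_taps, degree, omegas, phis, weight=None):
    """Least-squares expansion coefficients c[n, m] with psi_m(Phi) = Phi**m.

    desired(w, phi) returns the desired complex response at the grid points;
    weight(w, phi), if given, returns non-negative weights W.
    """
    grid_w, grid_p = np.meshgrid(omegas, phis, indexing="ij")
    w, p = grid_w.ravel(), grid_p.ravel()
    wt = np.ones_like(w) if weight is None else weight(w, p)
    n = np.arange(n_taps)
    m = np.arange(degree + 1)
    # Column (n, m) is psi_m(Phi) * e^{-j w n}: the contribution of c[n, m].
    basis = (p[:, None, None] ** m) * np.exp(-1j * w[:, None, None] * n[:, None])
    A = basis.reshape(len(w), n_taps * (degree + 1)) * np.sqrt(wt)[:, None]
    d = desired(w, p) * np.sqrt(wt)
    # Stack real and imaginary parts so that the unknown coefficients stay real.
    A_real = np.vstack([A.real, A.imag])
    d_real = np.concatenate([d.real, d.imag])
    c, *_ = np.linalg.lstsq(A_real, d_real, rcond=None)
    return c.reshape(n_taps, degree + 1)


if __name__ == "__main__":
    # Hypothetical example: variable fractional delay D(w, p) = e^{-j w (3.5 + p)}
    omegas = np.linspace(0, 0.5 * np.pi, 64)
    phis = np.linspace(-0.5, 0.5, 21)
    fd = lambda w, p: np.exp(-1j * w * (3.5 + p))
    c = design_variable_fir_ls(fd, n_taps=8, degree=3, omegas=omegas, phis=phis)
    gw, gp = np.meshgrid(omegas, phis, indexing="ij")
    err = np.abs(variable_response(c, gw.ravel(), gp.ravel()) - fd(gw.ravel(), gp.ravel()))
    print("max |E| on the design grid:", err.max())
```

Passing a weight that is larger in the passband than elsewhere turns the same routine into a weighted least squares design.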

Weighted Least Squares Approximation

Using the weighted least squares approximation method, the designer can control the frequency response error, for example by minimizing it in the passband region; the cost is larger errors in the other regions.

By using an adaptive weight function (one whose weights are changed at run time), the norm of the frequency response error can be minimized.

If ρ is a real number representing the parameter to be varied (e.g. bandwidth, resonance frequency or group delay), the actual filter can be written in the following form:

$$ F(z,\rho) = \sum_{n=0}^{N} a_n(\rho)\, z^{-n} $$

where a_n(ρ) is a polynomial function of ρ. Thus F(z, ρ) can be written as the product of three matrices, F(z, ρ) = Z^T(z) A_M P(ρ), where

$$ Z(z) = [1,\; z^{-1},\; z^{-2},\; \ldots,\; z^{-N}]^T, \qquad P(\rho) = [1,\; \rho,\; \rho^{2},\; \ldots,\; \rho^{K}]^T $$

and A_M is the amplitudes matrix, whose row n holds the polynomial coefficients of a_n(ρ). F(z, ρ) can also be written as

$$ F(z,\rho) = A_v^T \left( Z(z) \otimes P(\rho) \right) $$

where ⊗ is the Kronecker product and A_v is the vector formed by concatenating the rows of A_M, i.e. [r_0 r_1 ... r_N].

Thus, for given z and ρ, F is a linear function of the entries of A_M. Note that the implementation cost is proportional to the number of elements in the A_M matrix.
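Because F(z, ρ) = Z^T(z) A_M P(ρ) = Σ_k ρ^k G_k(z), with fixed sub-filters G_k(z) = Σ_n A_M[n, k] z^{-n}, the variable filter can be realized as K + 1 fixed FIR branches whose outputs are combined by a Horner evaluation in ρ (the arrangement commonly called the Farrow structure). The sketch below models this in software with a random, hypothetical A_M and our own function names. It also checks the vector form numerically: concatenating the rows of A_M pairs with Z(z) ⊗ P(ρ), and concatenating its columns pairs with P(ρ) ⊗ Z(z).

```python
import numpy as np


def farrow_filter(x, A, rho):
    """Filter x with the variable FIR F(z, rho) = Z^T(z) A P(rho).

    A[n, k] is the coefficient of rho**k in tap n, so each column of A is a
    fixed FIR sub-filter G_k(z); only the final combination depends on rho.
    """
    branches = [np.convolve(x, A[:, k])[: len(x)] for k in range(A.shape[1])]
    y = np.zeros(len(x))
    for branch in reversed(branches):   # Horner: (G_K rho + G_{K-1}) rho + ... + G_0
        y = y * rho + branch
    return y


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))         # hypothetical A_M: N = 4, K = 2
x = rng.standard_normal(20)
rho = 0.3

# Reference: build the taps a_n(rho) = sum_k A[n, k] rho**k and convolve directly.
taps = A @ (rho ** np.arange(3))
print(np.allclose(farrow_filter(x, A, rho), np.convolve(x, taps)[: len(x)]))

# Vector forms of F(z, rho) at an arbitrary point z.
z = 0.7 + 0.2j
Z = z ** -np.arange(5)
P = rho ** np.arange(3)
F = Z @ A @ P
print(np.isclose(F, A.ravel() @ np.kron(Z, P)),            # rows of A_M with Z kron P
      np.isclose(F, A.ravel(order="F") @ np.kron(P, Z)))   # columns with P kron Z
```

Each branch needs one multiplier per entry of its column of A_M, which is why the cost grows with the number of elements in A_M, while changing ρ only affects the K final multiplications.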

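The cost function is defined so that the frequency response error is minimized along both the v and ρ axes:

$$ J = \sum_{\rho_i \in C} \sum_{v_l \in L} \left| F_{id}(e^{j\pi v_l}, \rho_i) - F(e^{j\pi v_l}, \rho_i) \right|^{2} w(v_l, \rho_i) $$

where v is the normalized frequency (ω = πv), C and L are the sets of sampled parameter and frequency values, "id" denotes the ideal filter and w is the weight function. Assuming that w is separable in v and ρ,

$$ w(v,\rho) = w(v)\, w(\rho). $$

The filter coefficients (the entries of A_M) are then designed such that J is minimized [22].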

Curve Fitting TechniqueCurve Fitting TechniqueCurve Fitting TechniqueCurve Fitting Technique

Conventional techniques suffer from the fact that the edge frequencies of the various bands cannot be controlled independently. The curve fitting technique removes this restriction by expressing the filter coefficients as analytical functions of the frequency specifications, which gives more flexibility than transformation-based techniques. The technique also belongs to the spectral parameter approximation category described above; however, it requires a long design time. The main idea is to optimize several fixed filters, each with a different value of the spectral parameter, and then use curve fitting to fit an analytic function to the coefficient values. Assuming that the frequency response, and thus the filter coefficients, change only slightly from one fixed filter to the next, curve fitting can give highly accurate results. The time required to obtain the designed coefficients depends strongly on the number of given fixed frequency responses (i.e., of fixed filters) and on the number of points selected from each response. A tradeoff must therefore be made between the accuracy of the filter taps and the design time, governed by the number of selected points: increasing the number of selected points per filter response increases the accuracy of the filter taps, but also the execution time of the convolution process. Moreover, the degrees of the chosen polynomials have a considerable effect on the accuracy of the filter coefficients [21].
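A minimal sketch of the curve-fitting idea, under stated assumptions: the "fixed filters" here are Hamming-windowed sinc low-pass designs (a stand-in for the optimized fixed filters of [21]), the spectral parameter is the cutoff frequency fc in cycles/sample, and every tap is fitted by least squares with a cubic polynomial in fc. All names and values are hypothetical.

```python
import math


def windowed_sinc(fc, half_len):
    """Hypothetical fixed filter: Hamming-windowed sinc low-pass, cutoff fc in cycles/sample."""
    length = 2 * half_len + 1
    taps = []
    for n in range(length):
        m = n - half_len
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        taps.append(ideal * window)
    return taps


def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    v = [[t ** k for k in range(degree + 1)] for t in ts]
    vtv = [[sum(row[i] * row[j] for row in v) for j in range(degree + 1)]
           for i in range(degree + 1)]
    vty = [sum(row[i] * y for row, y in zip(v, ys)) for i in range(degree + 1)]
    return solve(vtv, vty)


def fit_variable_filter(fc_grid, half_len, degree):
    """Design one fixed filter per grid point, then fit every tap as a polynomial in fc."""
    lo, hi = fc_grid[0], fc_grid[-1]

    def to_t(fc):
        return (2 * fc - lo - hi) / (hi - lo)   # map [lo, hi] onto [-1, 1] for conditioning

    designs = [windowed_sinc(fc, half_len) for fc in fc_grid]
    ts = [to_t(fc) for fc in fc_grid]
    polys = [polyfit(ts, [d[n] for d in designs], degree)
             for n in range(2 * half_len + 1)]

    def taps_at(fc):
        t = to_t(fc)
        return [sum(c * t ** k for k, c in enumerate(p)) for p in polys]

    return taps_at


grid = [0.10 + 0.01 * i for i in range(6)]      # six fixed designs, fc = 0.10 ... 0.15
taps_at = fit_variable_filter(grid, half_len=5, degree=3)
fitted = taps_at(0.125)                         # a cutoff that was never designed directly
direct = windowed_sinc(0.125, 5)
max_err = max(abs(f - d) for f, d in zip(fitted, direct))
print(f"max tap error: {max_err:.2e}")
```

Raising the number of grid designs or the polynomial degree trades design time for accuracy, which is the tradeoff described above.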

2.3.2- Area Considerations in FIR Design Schemes

There is a constant demand to reduce the amount of hardware used in a system. Doing so brings substantial benefits, such as reduced cost and power consumption, increased application functionality1, and thus better utilization of FPGA resources …

In most FIR implementations, hardware consumption is dominated by the multiplier blocks rather than by the adder modules [23], and different algorithms have been proposed for implementing multiplier blocks efficiently. Earlier algorithms aimed at minimizing the adder hardware cost, since from a VLSI point of view the adder cost was assumed to dominate the area. However, once the FPGA became the hardware platform, minimizing adder complexity no longer worked, since the “FPGA has a fixed architecture for implementing digital logic”. Instead, it is the architecture design that minimizes this cost [23].

Several commonly used approaches and architectures that improve resource utilization are considered below:

Consider first the standard FIR implementation shown in Fig 2.5, a fully parallel, fixed-coefficient FIR filter. For each tap, the filter requires one multiplier, one adder and one delay element, so its resource usage is proportional to the number of coefficients [14, 24, 25]. Other, more elaborate architectures and techniques include array multiplication, multipliers built from add and shift operations, the transposed FIR, the transposed FIR architecture with a multiplier block, the MAG (minimized adder graph) algorithm for multiplier design, and architectures based on computational sharing multipliers (CSHM). For more details about these methods, refer to Appendix II.
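To make the per-tap resource count concrete, here is a sketch (Python, hypothetical taps) of the direct form next to the transposed form. The direct form uses one multiplier, adder and delay per tap. In the transposed form every tap multiplies the same current input sample, which is what allows a shared multiplier block. Both compute the same convolution.

```python
def fir_direct(h, x):
    """Direct form: delay line of past inputs; per tap one multiply, one add, one delay."""
    delay = [0.0] * len(h)
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]              # shift the delay line
        out.append(sum(c * d for c, d in zip(h, delay)))
    return out


def fir_transposed(h, x):
    """Transposed form: every tap multiplies the current sample (so the products can
    come from one shared multiplier block) and partial sums move through registers."""
    regs = [0.0] * (len(h) - 1)                    # partial-sum registers
    out = []
    for sample in x:
        products = [c * sample for c in h]
        out.append(products[0] + (regs[0] if regs else 0.0))
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0.0)
                for i in range(len(regs))]
    return out


h_demo = [0.5, -1.0, 2.0, 0.25]                    # hypothetical 4-tap filter
x_demo = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0]
y_direct = fir_direct(h_demo, x_demo)
y_transposed = fir_transposed(h_demo, x_demo)
print(y_direct)
print(y_transposed)                                # same values as the direct form
```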

1 by using the extra available area

Literature Survey 2.4- Hardware Platforms


2.4- Hardware Platforms

“An FPGA is an array of gates with programmable interconnect and logic functions that can be redefined/ reconfigured after manufacture” [24]. FPGAs are characterized by their small size and high resource utilization.

Designers usually use VHDL or Verilog to define the various hardware resources. To access these resources, designers implement driver modules. Drivers form the interface between software and hardware and thus narrow the gap between hardware and software design as much as possible.

2.4.1 – FPGA Features

One advantage of FPGAs is that they are well optimized for DSP applications. For example, FPGAs can perform MAC (multiply and accumulate) operations very quickly. One way to implement the MAC operation on an FPGA is to use array multiplication with a pipelined structure, resulting in fast throughput. Another design uses look-up tables (LUTs), which can execute operations at high speed [24].
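As one illustration of a LUT-based MAC, the sketch below uses distributed arithmetic, a well-known technique chosen here as an example. It is not claimed to be the exact scheme of [24]. It assumes unsigned integer input samples and integer (fixed-point) coefficients. Signed inputs would need the MSB term subtracted instead of added.

```python
def build_da_lut(coeffs):
    """Entry at address a is the sum of coeffs[i] for every bit i set in a (2**K entries)."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(lut, samples, bits):
    """Bit-serial distributed arithmetic for unsigned `bits`-bit integer samples:
    y = sum over b of 2**b * LUT[address built from bit b of every sample]."""
    acc = 0
    for b in range(bits):
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> b) & 1) << i
        acc += lut[addr] << b          # shift-and-add replaces the multipliers
    return acc


coeffs = [3, -2, 5, 1]                 # hypothetical integer (fixed-point) coefficients
samples = [7, 0, 12, 9]                # hypothetical unsigned 4-bit input samples
lut = build_da_lut(coeffs)
y = da_inner_product(lut, samples, bits=4)
print(y, sum(c * s for c, s in zip(coeffs, samples)))  # the two results match
```

The LUT grows as 2^K with the number of taps K, which is why practical designs split long filters into several smaller LUTs.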

FPGAs are therefore increasingly becoming the implementation platform for high-speed DSP systems, since they offer many advantages that meet the needs of DSP applications. FPGA design can also reduce design time and thus time to market. Most current FPGA tools were created for fast ASIC prototyping, since they use engineering time very efficiently [24]. The reason is that the FPGA's high degree of flexibility eases the testing process: when errors are found, modifications can be made at nearly no cost.

Another characteristic is the dynamic reconfigurability of the FPGA. Usually, the FPGA is configured when the system is powered on and then performs a fixed operation until the power is turned off. Recent FPGAs, however, offer dynamic reconfiguration, where the user can reconfigure the FPGA during processing [24].


2.4.2 – LABVIEW 8.0 System Board: PCI 5640

The IF RIO LABVIEW-based reconfigurable system board, manufactured by NI, contains a reconfigurable FPGA surrounded by fixed I/O resources, such as ADCs and DACs, that can be controlled through software. The RIO board uses the LABVIEW software to create VI modules that run on the FPGA, known as FPGA VIs.

The advantage of such a board is that it uses relatively easy software, LABVIEW, which hides the complexity of the HDLs commonly used to design hardware components, VHDL and Verilog. Moreover, it also supports designs created using an HDL: modules written in VHDL or another HDL can be imported into LABVIEW as custom VIs.

For these reasons, we propose the LABVIEW 8.0 System Board: PCI 5640 as the underlying hardware platform. For more information about the LABVIEW board, refer to Appendices III and IV.


2.5- Error Vector Magnitude (EVM) Metric

The error vector magnitude (EVM) metric is used to evaluate the performance of designs such as modulators, filters… This measure has proven very effective in testing communication systems and is widely used today; for example, the IS-54 TDMA digital cellular standard uses it to specify the accuracy required of π/4-DQPSK modulators.

The algorithm involves the following steps:

1. Read an input signal with I and Q components.

2. Compute the ideal constellation positions, such as ±0.707 ± 0.707j for an ideal QPSK modulator, to use as a reference for comparison.
3. For every input sample (I and Q combination):
a. Calculate the distance between this point and each of the 4 reference constellation points (the distance between A and B in Figure 2.5, where B is a reference constellation point and A is an input sample).
b. Choose the minimum distance; the input sample is mapped to the corresponding reference point in the demodulator.
c. Square the minimum distance and normalize it by the squared distance between the origin and the reference point (AB²/OB²).
4. Add all the normalized squared distances.
5. Divide the sum by the number of inputs and take the square root to get the (RMS) EVM metric.

Figure 2.5: Magnitude and Phase Error
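The steps above can be sketched in Python as follows. Two assumptions are made: the reference points are the unit-energy QPSK points (±1 ± j)/√2, and the mean is taken before the square root, so the result is the RMS EVM. Other normalizations, for example by peak reference power, are also in use.

```python
import math

# Assumed reference: unit-energy QPSK points (+-1 +- j)/sqrt(2), i.e. about +-0.707 +- 0.707j.
QPSK_REF = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def evm_rms(received, reference=QPSK_REF):
    """Steps 3-5: nearest reference point, squared error normalized by |ref|^2,
    mean over all inputs, then square root (RMS EVM)."""
    total = 0.0
    for r in received:
        nearest = min(reference, key=lambda ref: abs(r - ref))   # steps 3a-3b
        total += abs(r - nearest) ** 2 / abs(nearest) ** 2      # step 3c: AB^2 / OB^2
    return math.sqrt(total / len(received))                      # steps 4-5


noisy = [p + 0.1 for p in QPSK_REF]   # hypothetical inputs, each 0.1 off its ideal point
print(evm_rms(noisy))                 # about 0.1, i.e. 10 %
```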


CH 3: Design Alternatives

3.1- FIR vs. IIR: Advantages and Disadvantages

3.2 FPGA vs. DSP Chips



Design Alternatives 3.1- FIR vs. IIR


3- Design Alternatives

Although the literature survey covered various design issues, it is still important to present some other design alternatives, evaluate their performance, and explain how they affected our choices throughout the design. The two main design alternatives we studied are presented in Sections 3.1 and 3.2. Other alternatives and algorithms were presented in the literature survey as illustrative examples.

3.1- FIR vs. IIR: Advantages and Disadvantages

FIR filters are characterized by a simple architecture and thus lower implementation complexity. For example, an FIR filter can be implemented using only a single multiplier and an accumulator. In addition, an FIR filter can use fewer bits than an IIR filter because it has no feedback loop, which introduces more errors1.
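A minimal sketch of the single-multiplier, single-accumulator FIR mentioned above (Python, hypothetical taps). One MAC unit is reused for every tap, so each output costs len(h) MAC cycles.

```python
def fir_single_mac(h, x):
    """Time-multiplexed FIR: one multiplier and one accumulator reused for every tap.
    Returns the outputs and the number of MAC cycles spent."""
    history = [0.0] * len(h)     # circular buffer of the last len(h) inputs
    head = 0
    out, cycles = [], 0
    for sample in x:
        history[head] = sample
        acc = 0.0                 # clear the accumulator
        for k, coeff in enumerate(h):
            acc += coeff * history[(head - k) % len(h)]   # one MAC per cycle: x[n - k]
            cycles += 1
        out.append(acc)
        head = (head + 1) % len(h)
    return out, cycles


outputs, mac_cycles = fir_single_mac([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0])
print(outputs, mac_cycles)     # impulse response, and len(h) cycles per output
```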

Unlike IIR filters, whose output can become unstable, FIR filters are always stable, since they have no feedback. In addition, an FIR filter has linear phase if its coefficients are symmetric or antisymmetric about the center tap [14]. This feature is essential for data transmission, video processing and high-quality audio systems [14,16].
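Here is a quick numerical check of the linear-phase property, as a sketch with hypothetical coefficients. For a symmetric impulse response of length N, H(e^{jω}) e^{jω(N-1)/2} is purely real, so the phase is linear and the group delay is a constant (N-1)/2 samples. An antisymmetric response gives a purely imaginary value instead; that case is not checked here.

```python
import cmath
import math


def freq_response(h, omega):
    """H(e^{j omega}) = sum_n h[n] e^{-j omega n}."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


def is_linear_phase(h, num_points=64, tol=1e-9):
    """Check that H(e^{jw}) e^{jw(N-1)/2} is real on a grid of w in [0, pi).
    True for symmetric h: phase = -w(N-1)/2 (up to sign flips of the real
    amplitude), so the group delay is the constant (N-1)/2 samples."""
    delay = (len(h) - 1) / 2
    for i in range(num_points):
        omega = math.pi * i / num_points
        rotated = freq_response(h, omega) * cmath.exp(1j * omega * delay)
        if abs(rotated.imag) > tol:
            return False
    return True


print(is_linear_phase([0.1, 0.3, 0.5, 0.3, 0.1]))   # symmetric taps
print(is_linear_phase([0.1, 0.3, 0.5, 0.2, 0.1]))   # symmetry broken
```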

Another advantage of FIR filters is that the errors introduced by quantizing the filter coefficients can have a low impact on the filter output, provided the quantization process is handled properly. This property is very important when a low bit error rate is desired [14].
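The following sketch (Python, hypothetical coefficients) shows one way quantization can be "handled properly". Rounding each coefficient to a fixed-point grid keeps a symmetric response symmetric, so linear phase is preserved. The frequency-response error is also bounded by the sum of the coefficient errors, which shrinks as fractional bits are added.

```python
import cmath
import math


def quantize(h, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits (fixed point)."""
    step = 2.0 ** -frac_bits
    return [round(c / step) * step for c in h]


def max_response_error(h, hq, num_points=256):
    """Largest |H(e^jw) - Hq(e^jw)| over num_points + 1 frequencies in [0, pi]."""
    worst = 0.0
    for i in range(num_points + 1):
        w = math.pi * i / num_points
        diff = sum((a - b) * cmath.exp(-1j * w * n) for n, (a, b) in enumerate(zip(h, hq)))
        worst = max(worst, abs(diff))
    return worst


h = [0.0123, -0.0456, 0.2891, 0.5, 0.2891, -0.0456, 0.0123]   # hypothetical symmetric taps
q8, q12 = quantize(h, 8), quantize(h, 12)
e8, e12 = max_response_error(h, q8), max_response_error(h, q12)
print(q8 == q8[::-1])            # symmetry (hence linear phase) survives rounding
print(e8, e12)                   # error shrinks with more fractional bits
```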

Even though the FIR filter possesses many advantages, it also has several disadvantages compared to the IIR filter. FIR filters usually have a higher order than IIR filters for a given spectral characteristic. Thus, FIR filters require more multipliers than IIR

1 Since the IIR filter output depends on previous outputs, errors propagate to future outputs, and thus more bits are needed to achieve the desired accuracy.


filters if the implementation is fully pipelined1, in which case every output needs one iteration. If, on the other hand, the implementation is not pipelined, the FIR filter takes more time than the IIR filter.

These disadvantages translate into larger memory and computational requirements. In addition, “FIR coefficients must be designed using an iterative method since the required filter length to satisfy a given filter specification can only be estimated” [16]. In other words, the designer estimates the order of the filter from the given specifications and then simulates the frequency response. If it does not meet the desired response, the designer re-estimates the order based on the previous results and repeats the process.
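One common way to obtain the starting estimate for this iterative loop is Kaiser's empirical formula for equiripple low-pass filters. It is used here as an illustration, not as the method of [16]. The ripple values and band edges in the example are hypothetical.

```python
import math


def kaiser_order_estimate(delta_p, delta_s, f_pass, f_stop, fs):
    """Kaiser's estimate for an equiripple low-pass FIR:
    N ~ (-20*log10(sqrt(delta_p*delta_s)) - 13) / (14.6 * (f_stop - f_pass) / fs)."""
    transition = (f_stop - f_pass) / fs          # normalized transition width
    numerator = -20 * math.log10(math.sqrt(delta_p * delta_s)) - 13
    return math.ceil(numerator / (14.6 * transition))


# Hypothetical spec: pass-band edge 2 MHz, stop-band edge 3.5 MHz, 32 MHz sampling,
# pass-band ripple 0.01 and stop-band ripple 0.001.
n0 = kaiser_order_estimate(0.01, 0.001, 2.0e6, 3.5e6, 32e6)
print(n0)  # starting order; design, check the response, then adjust and repeat
```

The estimate only seeds the loop. The designed response still has to be checked against the specification and the order adjusted up or down.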

1 By fully pipelined, we mean that every tap is assigned its own multiplier.


3.2 FPGA vs. DSP Chips

Digital signal processors (DSPs) are processors optimized to perform signal processing arithmetic. They have been used extensively over the last three decades. Since the introduction of FPGAs, however, customer interest in DSPs has declined. The choice between a DSP and an FPGA depends on several factors, as discussed below.

DSPs are characterized by their flexibility and ease of programming relative to the

FPGA. In a DSP system, the programmer does not need to understand the hardware

architecture [24]; the hardware implementation is hidden from the user. The DSP

programmer uses either C or assembly whereas the FPGA designer usually uses VHDL or

Verilog.

With respect to performance, the speed of a DSP chip is limited by the clock speed of the board, since the DSP processor operates sequentially and therefore cannot be fully parallelized. FPGAs, on the other hand, can be very fast if an appropriate parallel architecture is designed; however, they offer less flexibility than DSP processors.

Reconfigurability in a DSP is achieved by changing the program stored in its memory. In FPGAs, by contrast, reconfiguration is performed by downloading configuration data to the RAM.

The power consumption of a DSP depends on the number of memory elements used, regardless of the size of the executable program. The power consumption of an FPGA, in contrast, depends on the circuit design.

FPGAs are important when a parallel algorithm must be implemented, that is, when different components operate in parallel to implement the system functionality. In that case the execution speed is independent of the number of modules. This is in contrast to DSP systems, whose execution speed is inversely proportional to the number of functions they implement.


In conclusion, in most engineering projects it is the application that dictates which device and platform to use in order to achieve optimal performance at low cost. FPGAs outperform DSP systems in the areas of filters, convolvers, correlators, FFTs …, whereas DSPs are more practical for signal processing programs of a sequential nature.

Since we are designing a reconfigurable hardware platform that operates in parallel, an FPGA is more suitable than a DSP board. The implementation process is described in detail in the following chapters.


4. Project Design & Analysis

4.1- System Definition

4.4- FIR Filter Coefficients Design
4.4.1- FIR System Definition
4.4.2- Filter Coefficients Generation
4.4.2.1- WIMAX Channel FIR Filter Design
4.4.2.2- WCDMA Channel FIR Filter Design

4.2- LABVIEW Simulation of 3G Channels
4.2.1- WCDMA channel Simulation
4.2.1- WCDMA Acquire Input Signal
4.2.2- Noise Introduction VI
4.2.3- Channel Filter VI
4.2.4- EVM VI
4.2.2- WIMAX Channel Simulation
WIMAX VIs Explanation

4.3- Reconfigurable System Architecture

4.5- Hardware Implementation
4.5.1- Reconfigurability Aspect
4.5.2- Data Links and Communications
4.5.3- Host-FPGA synchronization
4.5.4- FPGA-HOST synchronization
4.5.5- Memory Component
4.5.6- Number Representation
4.5.7- Convolution Process
4.5.8- Host Application Processing

4.6- Design Assessment
4.6.1- Testing Scheme
4.6.2- Results



Project Design& Analysis 4.1- System Definition


4-Project Design & Analysis

Having completed the literature survey, we now move to the design and analysis phase. This chapter presents the system definition, the design of the variable FIR filter, the simulations performed, the reconfigurable architecture, the hardware implementation, and the project assessment.

4.1 System Definition

Our system is primarily a reconfigurable transceiver supporting two of the 3G standards: WCDMA and WIMAX. The transceiver is optimized to support these two standards, whose channels include different modules. Initially, we studied each scheme alone by examining its channel and requirements; we then built a common architecture that emphasizes reconfiguring the common modules, such as channel filters, pulse-shaping filters, and modulators, on both sides of the transceiver. This reconfigurability gives the designer both design flexibility and reduced hardware usage, which matters because the hardware platforms used in such implementations have limited resources. For example, implementing three different n-tap FIR channel filters would require 3n multipliers, instead of n multipliers for one reconfigurable filter. The hardware platform used in our project is the PCI 5640 IF RIO system board, manufactured by National Instruments. This board suits our design, providing high data rates, AD and DA converters, and the Virtex II-PRO v3000 FPGA. Our system has two types of inputs: the control/switch, which chooses between the two supported standards, and the data input port, which receives a sequence of I & Q modulated values representing a message sent over one of these standards. At the output, we get another set of I & Q values, and we target a low EVM degradation.


4.2- FIR Filter Coefficients Design

This section discusses the design of the FIR filter in terms of its order and its coefficient values. First, however, we define the FIR system; we then move to the generation of the filter coefficients for the WCDMA and WIMAX channel filters.

4.2.1- FIR System Definition

A reconfigurable FIR filter with a variable frequency response, supporting the WCDMA and WIMAX standards, is proposed. This reconfigurability has the clear advantage of removing unneeded extra hardware. Our system is part of a larger system that aims to adapt itself to a large variety of already-standardized wireless systems by means of a common hardware platform. We will first generate the impulse responses of the different filters using MATLAB. Then, we will use the Virtex-II V2MB 1000 system board as the hardware platform for our system. We will instantiate different hardware blocks on the system board, including the interrupt controller (to choose the type of signal), BRAM memory (to store the coefficients), on-chip multipliers …, and then connect the different components to build the whole system. Our system includes control as well as data inputs. In order for the filter to distinguish between the proposed standards, the system checks a boolean control: TRUE for WCDMA and FALSE for WIMAX. The data input, “InputSignal”, is the sequence of I & Q values to be filtered. Based on the control, the system uses the corresponding impulse response in the convolution process. The output of our system, “OutputSignal”, is the sequence of filtered values.
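The control-driven behaviour can be sketched in software (Python) as follows. The function name and the two coefficient banks are hypothetical placeholders. The point is that one convolution routine, standing in for one set of hardware multipliers, serves both standards, and only the coefficient bank changes with the boolean control.

```python
def reconfigurable_channel_filter(is_wcdma, i_samples, q_samples, wcdma_taps, wimax_taps):
    """Filter I and Q with the coefficient bank chosen by the boolean control
    (TRUE -> WCDMA, FALSE -> WIMAX); the convolution itself is shared."""
    taps = wcdma_taps if is_wcdma else wimax_taps

    def convolve(x):
        return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n >= k)
                for n in range(len(x))]

    return convolve(i_samples), convolve(q_samples)


WCDMA_TAPS = [1.0, 1.0]   # hypothetical placeholder banks; real ones come from MATLAB/FS10
WIMAX_TAPS = [0.5]
i_out, q_out = reconfigurable_channel_filter(True, [1, 2, 3], [0, 1, 0], WCDMA_TAPS, WIMAX_TAPS)
print(i_out, q_out)
```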

4.2.2- Filter Coefficients Generation

The filter design is divided into two main steps: determining the order and computing the coefficients. The order of the filter depends on the specifications: to meet given specs, the filter must have at least a minimum order to obtain an acceptable frequency response. Increasing the order beyond this minimum gives a sharper response, at the cost of more delay and hardware usage. First, we wrote a MATLAB function that determines the required filter order, generates the coefficients, and plots the frequency response, based on the pass-band and the rejections required in the different bands, as shown in Table 4.1. We also used the FS10 program to generate the FIR filter coefficients and compared them with those generated by the MATLAB simulation. In the subsequent section, we generate the FIR coefficients for the WIMAX standard. Coefficients for the WLAN can be generated similarly and shall be included in the spring final project report.

4.2.2.1- WIMAX Channel FIR Filter Design

Section 2.1.1, WIMAX standard specification, presents the different bit rates that WIMAX can support. Of these configurations, we consider an RF bandwidth of 7 MHz; that is, the WIMAX signal occupies 7 MHz at RF. Accordingly, at baseband the bandwidth of the WIMAX signal is 3.5 MHz1. [5] presents the FIR baseband specifications of WIMAX. Based on these specifications, we wrote MATLAB code to generate the FIR coefficients, as shown in Table 4.1. The resulting filter order is 52. Figure 4.1 shows the corresponding frequency response of the WIMAX FIR filter. As the figure shows, the cutoff frequency is approximately 3 MHz and the specifications are met.

% WIMAX has a bandwidth of 7 MHz at RF, so at baseband its bandwidth is 7/2 = 3.5 MHz.
% At the adjacent channel (7 MHz) the attenuation is at least 38 dB;
% at the alternate channel (14 MHz) it is at least 57 dB. All frequencies are in kHz.
freq_band = [0 2000 3500 7000 14000 16000];     % band edges of the WIMAX mask (kHz)
attenuation_dB = [0 0 -38 -57];                 % desired level in each band (dB)
attenuation = 10 .^ (attenuation_dB / 10);      % dB -> linear, treating the dB values as power ratios
Ripple_Ratio = [0.001 0.001 0.001 0.001];       % allowed deviation in each band
Sampling_Frequency = 32e3;                      % 32 MHz in kHz (ADS sample time 0.03125 us)
[N, fpts, mag, wt] = firpmord(freq_band, attenuation, Ripple_Ratio, Sampling_Frequency);
b = firpm(N, fpts, mag, wt);                    % Parks-McClellan (remez) design at the estimated order
[H, w] = freqz(b, 1, 512);                      % w in rad/sample, from 0 to pi
plot(w / pi * Sampling_Frequency / 2, 20 * log10(abs(H)))  % map 0..pi onto 0..Fs/2 (kHz)
axis([0 10000 -100 1])
xlabel('Frequency (kHz)'); ylabel('Attenuation (dB)')
title('FIR Baseband filter of WIMAX')

1 7 MHz / 2 = 3.5 MHz



Table 4.1: MATLAB code to generate FIR coefficients of WIMAX

Figure 4.1: WIMAX FIR baseband filter

To validate our results, the specifications were supplied to the FS10 program, as shown in Figure 4.2. The sampling rate is 32 MHz, the same value used in the MATLAB code, and the filter order and cutoff frequency also match. Figure 4.3 shows the corresponding frequency response, which is as expected. The only difference is the presence of higher-amplitude ripples, which are characteristic of the program's built-in design function. The group delay, also shown in the figure, is constant, which implies a linear-phase filter, a common characteristic of FIR filters.

Figure 4.2: FS10 WIMAX FIR



Figure 4.3: WIMAX FIR Filter Response using FS10

4.2.2.2- WCDMA Channel FIR Filter Design

The WCDMA signal is a wideband signal with a bandwidth of 3.84 MHz at radio frequency. Accordingly, at baseband, the cutoff frequency is 1.92 MHz. [26] gives the baseband filter specifications of WCDMA. Following the same procedure as in the previous section, the filter order turned out to be 48, with the frequency response shown in Figure 4.4. As shown in the figure, the -3 dB cutoff frequency is approximately 1.92 MHz. This frequency response is very similar to the one in Figure 4.5, which was simulated using the FS10 program. Again, the group delay is constant.



[Plot: FIR Baseband filter of WCDMA; x-axis Frequency (Hz), 0 to 10000; y-axis Attenuation (dB), -70 to 10; data cursor at X: 1914, Y: -3.83]

Figure 4.4: WCDMA FIR baseband filter

Figure 4.5: WCDMA FIR baseband filter using FS10


4.3- LABVIEW Channel Simulation

The LABVIEW simulation is intended to test the validity of our design before implementing it on a hardware platform. The metric used to test the reliability of the channel filter design is the constellation error vector magnitude (EVM). Since we are currently concerned with testing the channel filter, and since this metric does not require the design of a whole transceiver channel, we can compare the I/Q input of the filter with its I/Q output and then calculate the EVM. Because hardware implementation is time consuming, this LABVIEW software simulation is essential for testing and debugging, so that errors are found and corrected as early as possible in the design process. It also allows the user to optimize the design easily by choosing the best set of parameters.

4.3.1- Simulation of WCDMA channel using LABVIEW

The LABVIEW simulation of the WCDMA channel aims at establishing a reference against which we can compare the output of the WCDMA blocks implemented on the PCI IF RIO 5640 system board. The presented chain has a simpler block diagram than the one in Section 2.1.2, because our design does not take into account the effect of fading (only noise). Accordingly, the simplified WCDMA chain is given in Figure 4.6.

[Block diagram. Transmitter blocks: source (stream of bits), QPSK modulator (basic modulation), spread spectrum (advanced modulation), root raised cosine pulse-shaping filter, DAC. Receiver blocks: ADC, channel filter (FIR), root raised cosine pulse-shaping filter, despreading (advanced demodulation), QPSK demodulator (basic demodulation), sink (detected bits).]

Figure 4.6: Simplified WCDMA channel



As shown in Figure 4.6, the complete transceiver starts with a bit generator on the transmitter side and ends with the BER module on the receiver side. The BER metric is used to evaluate the performance of our channel filter and, more generally, the correctness of the whole transceiver design. Figure 4.7 shows the high-level block diagram of the WCDMA transceiver designed in LABVIEW.

Figure 4.7: LABVIEW based WCDMA transceiver block diagram

After testing each module separately, we connected all the blocks together as shown in the above diagram. However, a technical problem arose when executing the whole chain on the host: the virtual memory became too low, causing a fatal error that halted the LABVIEW software. Even though the host's memory capacity is not small, it was not enough to run this simulation, because simulating such transceivers requires a large number of input bits in order to observe a clear spectrum of the signal (time-frequency uncertainty principle). In our design, for example, we used a spreading factor of 16 and QPSK modulation. Thus, to reach 3.84 Mchips (one second of signal at 3.84 Mcps) at the transmitter's output, the number of bits, n, generated at the beginning of the transmitter is:

$$n = \frac{3.84 \times 10^{6}\ \text{chips} \times 2\ \text{bits/symbol}}{16\ \text{chips/symbol}} = 480{,}000\ \text{bits}$$

a number of bits that a host with 250 MB of RAM cannot push through the heavy processing of the different channel stages. As a result, and because the promised upgrade of the host's memory was only fulfilled in the last weeks of the semester, we had to find another way to perform the simulation.



Therefore, we decided to use the Advanced Design System (ADS) software by Agilent to simulate the transmitter part. We generated the transmitted I and Q (in-phase and quadrature) components of the signal using ADS, wrote these values to a text file, and then read them from LABVIEW and passed them to the receiver side. This technique is feasible for two main reasons. First, the generated files to be read by LABVIEW are not very large: they contain around 45000 values (enough to represent a WCDMA signal), far fewer than the 480,000 values needed in the previous case to generate a WCDMA signal. Second, the part of the chain simulated in LABVIEW, and thus the processing load, is greatly reduced. We therefore place the designed channel filter at the front end of the receiver side, in the LABVIEW software, to test whether it meets the specifications. Figure 4.8 shows the WCDMA transmitter in ADS:

[ADS schematic. Components: 3GPPFDD_DPCH source (SymbolRate=30ksps, SpreadCode=0, ScrambleCode=0, DataPattern=Random, SpecVersion=Version 12-00); RaisedCosineCx filter (SquareRoot=YES, ExcessBW=0.22, Length=FilterLength, Interpolation=SamplesPerChip); CxToRect splitter; TimedDataWrite sinks writing the I and Q data to "3GPPFDD_BS_DL_I_Data" and "3GPPFDD_BS_DL_Q_Data" (Start=666.6667 usec, Stop=2 msec); EVM measurement (ModType=QPSK, MeasType=EVM RMS, SymBurstLen=2560); CxToTimed (FCarrier=2140 MHz) and FloatToTimed converters; spectrum analyzers (Kaiser 7.865 window) and timed sinks for I and Q. Variables: SamplesPerChip=8, ChipsPerSlot=2560, StartSlot=0, NumSlotMeasured=15, FilterLength=16*SamplesPerChip+1, TimeStep=1/(3840000*SamplesPerChip).]

Figure 4.8: WCDMA transmitter using ADS



The above figure shows that the EVM metric (see Section 2.5) is now used to evaluate the performance of our digital channel filter. This is because we no longer generate bits and compare them with the output bits using the BER metric; instead, we compute the transmitted constellation points and compare them with the output constellation points using the EVM metric. The transmitted WCDMA signal and its constellation are shown in Figure 4.9, which shows the 3.84 Mcps bandwidth and the QPSK basic modulation, respectively.

[Constellation plot: I on the horizontal axis, Q on the vertical axis]

Figure 4.9: WCDMA transmitted signal and its constellation using ADS



The WCDMA channel LABVIEW VIs can be organized hierarchically. They include four main components:

• WCDMA Acquire Input Signal

• Noise Introduction

• Channel Filtering

• EVM Evaluation

Figure 4.10 illustrates the top-level connections between the four major components. Each VI (component) is explained in detail in the following sections.
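For orientation, the four-stage chain can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions, not the LABVIEW implementation. Random QPSK symbols with rectangular pulses (4 samples per symbol) replace the ADS files, and a 4-tap moving average stands in for the channel filter. All names are hypothetical.

```python
import math
import random

SPS = 4  # hypothetical samples per symbol
REF = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def acquire_input(num_symbols, rng):
    """Stand-in for 'WCDMA Acquire Input Signal': random QPSK symbols with
    rectangular pulses, instead of reading the ADS-generated I/Q files."""
    symbols = [rng.choice(REF) for _ in range(num_symbols)]
    return [s for s in symbols for _ in range(SPS)]


def add_awgn(samples, sigma, rng):
    """Stand-in for 'Noise Introduction': independent Gaussian noise on I and Q."""
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in samples]


def channel_filter(samples, taps):
    """Stand-in for 'Channel Filtering': direct-form FIR convolution."""
    return [sum(taps[k] * samples[n - k] for k in range(len(taps)) if n >= k)
            for n in range(len(samples))]


def evm_rms(points):
    """'EVM Evaluation': nearest-reference mapping, normalized squared error, RMS."""
    total = 0.0
    for p in points:
        ref = min(REF, key=lambda r: abs(p - r))
        total += abs(p - ref) ** 2 / abs(ref) ** 2
    return math.sqrt(total / len(points))


def run_chain(num_symbols=2000, sigma=0.1, seed=1):
    rng = random.Random(seed)
    tx = acquire_input(num_symbols, rng)
    rx = channel_filter(add_awgn(tx, sigma, rng), [1.0 / SPS] * SPS)
    decisions = rx[SPS - 1::SPS]  # one sample at the end of each symbol
    return evm_rms(decisions)


print(f"EVM (RMS): {run_chain():.4f}")
```

With a per-component noise standard deviation σ, averaging over 4 samples leaves an RMS EVM of roughly σ/√2 in this toy setup, which is a handy sanity check.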

Figure 4.10: Top level View of the WCDMA Channel

4.3.1.1- WCDMA Acquire Input Signal

This VI represents the whole ADS-based transmitter shown in the previous section; thus it contains the spreader, the QPSK modulator, and the root raised cosine pulse-shaping filter blocks. The VI acquires the I and Q waveforms from the data files generated by ADS. Each file contains coded and spread 3G I/Q signals taken over two 3GPP time slots. Each WCDMA frame has 15 slots with a total duration of 10 ms. Since 3.84 Mcps × 10 ms = 38,400 chips per frame, each slot carries 2560 chips and lasts 666.667 µs. Accordingly, two 3GPP time slots last 2 × 666.667 µs ≈ 1.333 ms at 3.84 Mcps, but the signal is sampled at 8× the chip rate (4× Nyquist), so every 8 samples in the time waveform represent one chip. These data are based on a 30 ksps voice-coded rate spread ×128 by spreading code number 0 (which consists of 128 ones). A more detailed view of the VI is shown in Figure 4.11. As can be seen, the in-phase and quadrature components of the WCDMA signal are stored in two files, which are fetched at run time and passed to the corresponding VIs. In the VI, we used an incremental time dt = 3.255E-8 s, which is calculated as follows:

$$dt = \frac{T_{chip}}{8} = \frac{1}{8 \times 3.84\times10^{6}} = 3.255\times10^{-8}\ \text{s}$$
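The timing figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the text (3.84 Mcps chip rate, 8× oversampling, a 10 ms frame of 15 slots, spreading factor 128); the constant names are our own.

```python
CHIP_RATE = 3.84e6          # WCDMA chip rate (chips/s)
OVERSAMPLING = 8            # samples per chip in the ADS data files
FRAME_DURATION = 10e-3      # one WCDMA radio frame (s)
SLOTS_PER_FRAME = 15
SPREADING_FACTOR = 128      # spreading factor of code number 0

dt = 1 / (CHIP_RATE * OVERSAMPLING)                 # sample interval used in the VI (s)
slot_duration = FRAME_DURATION / SLOTS_PER_FRAME    # about 666.667 us
chips_per_slot = round(CHIP_RATE * slot_duration)   # chips carried by one slot
samples_two_slots = 2 * chips_per_slot * OVERSAMPLING
symbol_rate = CHIP_RATE / SPREADING_FACTOR          # coded symbol rate (symbols/s)

print(f"dt = {dt:.4e} s, slot = {slot_duration * 1e6:.3f} us")
print(f"{chips_per_slot} chips/slot, {samples_two_slots} samples in two slots, "
      f"{symbol_rate / 1e3:.0f} ksps")
```

The same relations show why the 30 ksps voice rate and the spreading factor of 128 together give the 3.84 Mcps chip rate.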

The power spectral density of the signal and its constellation are shown in Figures 4.12 and 4.13, respectively.

Figure 4.11: WCDMA Acquire Input Signal VI



Figure 4.12: PSD of a WCDMA Signal

Figure 4.13: Constellation of the Input Data

Notice from Figure 4.12 that the signal is 3 dB below its peak at about 1.92 MHz, which is half the 3.84 Mcps chip rate and hence the baseband bandwidth of the WCDMA signal. The constellation of the signal, shown in Figure 4.13, is QPSK, which reflects the modulation applied to the input bit stream. QPSK is one of several valid constellation types for a WCDMA signal.



In a QPSK constellation, the ideal points are located at 4 different positions, two on the x-axis and two on the y-axis. This is roughly visible in Figure 4.13, where we notice 4 dense clusters representing the four ideal points, in addition to other points that deviate from their ideal locations. These deviations are due to the pulse shaping filter, which spreads the points and changes their I and Q values.

4.3.1.2- Noise Introduction VI

In real wireless communication scenarios, the channel is rarely ideal. The signal often suffers from path loss, shadowing and fading caused by the transmission environment, in addition to noise. Adding these impairments to our channel is therefore important for realistic modeling. In our simulation, however, because modeling all of these effects is complex and channel impairments are not the main focus of our project, we limited them to additive white Gaussian noise (AWGN). The “Noise Introduction” VI, shown in Figure 4.14, uses the “Add AWGN” VI to add noise to both the I and Q components of the WCDMA signal. This AWGN block has a variable input that sets the standard deviation of the Gaussian noise, and thus the amount of noise added to the signal. The power spectral density of the noisy signal and its constellation are shown in Figures 4.15 and 4.16.
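The effect of the AWGN block can be sketched as follows. This is a minimal Python model, not LABVIEW's implementation: it assumes zero-mean noise drawn independently for I and Q and controlled by the same standard deviation input described above. The function name add_awgn and the seed parameter are our own.

```python
import random

def add_awgn(i_samples, q_samples, sigma, seed=None):
    """Add independent zero-mean Gaussian noise with standard deviation sigma
    to the I and Q components (a model of the AWGN step)."""
    rng = random.Random(seed)  # seeded generator for repeatable runs
    noisy_i = [x + rng.gauss(0.0, sigma) for x in i_samples]
    noisy_q = [x + rng.gauss(0.0, sigma) for x in q_samples]
    return noisy_i, noisy_q
```

For example, add_awgn(i, q, sigma=0.3, seed=1) perturbs every sample of both components; raising sigma increases the scatter around the four QPSK clusters, as in Figure 4.16.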

Figure 4.14: Noise Introduction VI



Figure 4.15: Noisy PSD

Figure 4.16: Noisy QPSK Constellation

As can be noticed from Figure 4.15, the WCDMA signal now includes some distortion components. In addition, the corresponding constellation, shown in Figure 4.16, becomes noisier as well. This constellation contains only 400 samples of the noisy signal. Notice how the crosses spread away from the four main locations because of the added noise.

4.3.1.3- Channel Filter VI

The Channel Filter VI is one of the most important VIs in our project, since it evaluates the WCDMA filter designed previously. A detailed view of the channel filter is shown in Figure 4.17. The VI first reads the filter coefficients, previously designed using FS_10, from a file and convolves them with the noisy WCDMA signal. The convolution is performed separately on the in-phase and quadrature components of the signal.
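Assuming real-valued filter coefficients, filtering the complex baseband signal reduces to two independent real convolutions, one for I and one for Q. The Python sketch below shows this with a direct full-length convolution (the same output length as MATLAB's conv). The names convolve and channel_filter are illustrative, and reading the coefficient file is not modeled.

```python
def convolve(coeffs, signal):
    """Full linear convolution (output length len(coeffs) + len(signal) - 1)."""
    out = [0.0] * (len(coeffs) + len(signal) - 1)
    for k, h in enumerate(coeffs):
        for n, x in enumerate(signal):
            out[k + n] += h * x
    return out

def channel_filter(coeffs, i_samples, q_samples):
    """Filter the I and Q components separately with the same real FIR taps."""
    return convolve(coeffs, i_samples), convolve(coeffs, q_samples)

# Example: a 3-tap moving-sum filter applied to short I/Q sequences.
filtered_i, filtered_q = channel_filter([1, 1, 1], [1, 0, 0], [0, 2, 0])
```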

Figure 4.17: Channel Filter LABVIEW VI

The power spectral density of the filtered signal and the corresponding constellation are shown in Figures 4.18 and 4.19.

Figure 4.18: PSD of the Filtered Signal



Figure 4.19: Constellation after Filtering (Quadrature vs. In-phase)

In Figure 4.18, the filtered WCDMA signal is much cleaner than the noisy signal shown in Figure 4.15, which validates the channel filter design. Notice that the passband amplitude is preserved at about 10^-6 units, while the stopband level has been reduced from around 10^-11 to 10^-14 (a factor of 1000). The constellation in Figure 4.19 confirms this analysis: the points are now more concentrated around the four main centers, i.e. closer to the ideal transmitted WCDMA constellation. The remaining deviation after filtering is what the EVM measures, as discussed below.

4.3.1.4- Error Vector Magnitude (EVM) VI

The error vector magnitude (EVM) metric is used to evaluate the quality of modulators, filters, and similar blocks. Refer to section 2.5 for more details on the EVM algorithm.

The EVM measure is implemented in LABVIEW; a detailed view of the block diagram is shown in Figure 4.20. The VI has three inputs: the signal, its size, and the reference constellation positions. Since our WCDMA signal is upsampled by 8, every point is represented by 8 samples, and we take the best of these eight samples to calculate the EVM. To test the efficiency of the channel filter, we compare the EVM before adding noise with the EVM after adding noise and filtering the signal with our channel filter. The former is 10.0417%, and it degrades to 10.2858% after filtering. The degradation percentage is given by:

$$\text{Degradation} = \sqrt{\mathrm{EVM}_{new}^{2} - \mathrm{EVM}_{old}^{2}} = \sqrt{0.102858^{2} - 0.100417^{2}} \approx 2.227\%$$

In other words, adding noise and then filtering with our channel filter degrades the EVM by 2.227%, which is acceptable since it is below 5%. We conclude that our channel filter works efficiently and is ready to be synthesized on the board.
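The EVM evaluation can be sketched in Python as follows. This is an illustration under stated assumptions, not the LABVIEW VI. It uses the common RMS definition (error power normalized by reference power, which may differ from the exact normalization in section 2.5) and decides each received symbol against the nearest reference point. It also assumes the samples are already scaled to the reference constellation, and it interprets "the best of the eight samples" as the sampling offset within each chip that yields the lowest EVM. The on-axis QPSK reference set and all function names are our own.

```python
import math

# Assumed reference set: unit-magnitude QPSK points on the axes (cf. Figure 4.13).
QPSK_REF = (1 + 0j, 1j, -1 + 0j, -1j)

def evm_rms(symbols, reference=QPSK_REF):
    """RMS EVM: error power over reference power, each symbol compared
    with its nearest reference point."""
    err_power = 0.0
    ref_power = 0.0
    for s in symbols:
        ideal = min(reference, key=lambda r: abs(s - r))
        err_power += abs(s - ideal) ** 2
        ref_power += abs(ideal) ** 2
    return math.sqrt(err_power / ref_power)

def best_offset_evm(samples, oversampling=8, reference=QPSK_REF):
    """Try every sampling offset within one chip and keep the lowest EVM."""
    return min(evm_rms(samples[k::oversampling], reference)
               for k in range(oversampling))

def evm_degradation(evm_old, evm_new):
    """Degradation = sqrt(EVM_new^2 - EVM_old^2), the formula used in the text."""
    return math.sqrt(evm_new ** 2 - evm_old ** 2)
```

For example, evm_degradation(0.100417, 0.102858) evaluates to about 0.0223, the 2.227% reported above. The same function applies unchanged to the WIMAX EVM values.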


Figure 4.20: EVM LABVIEW VI

4.3.2- WIMAX Channel Simulation

The arguments presented earlier for the WCDMA channel simulation also apply to the WIMAX channel simulation. Similarly, we designed the whole WIMAX transceiver, but the host couldn't simulate the true number of bits needed… The high-level LABVIEW-based WIMAX transceiver block diagram is shown in Figure 4.21.

Figure 4.21: LABVIEW WIMAX Transceiver Block Diagram

Using the specifications of the WIMAX standard (see section 2.1.2), we simulated the WIMAX transmitter in ADS. Figure 4.22 shows the ADS-simulated WIMAX transmitter:

[ADS schematic, WMAN 802.16-2004 design: a WMAN_DL_SignalSrc_RF downlink source (DataPattern = S_16-QAM, Power = dbmtow(SignalPower), FCarrier = IF_Freq1) followed by a splitter, TimedToCx and CxToTimedIQ blocks (TStep = 0.03125 usec), a spectrum analyzer Spec_Q (ResBW = 3 kHz, Window = Hanning 0.50), and TimedDataWrite sinks writing the files "i" and "q". A WMAN_OFDM_DL_RxSensitivity_Info block is also present.]

Signal_Generation_VAR: CyclicPrefix = 1/4, Bandwidth = 7.0 MHz, OversamplingOption = 2, Rate_ID = 3, DataLength = 864, SignalPower = 10, BurstWithFEC = 1, NumberOfBurst = 1

RF_Channel_VARs: IF_Freq1 = 380 MHz, SNR_Rx = 16.4, Prx = -102+SNR_Rx+10*log(200/256*RF_Bandwidth*10e-6)

Measurement_VARs: Frame = 200; output variable RSS_Power = Prx

Note (Rate_ID mapping):

Rate_ID   Modulation   RS-CC rate
0         BPSK         1/2
1         QPSK         1/2
2         QPSK         3/4
3         16QAM        1/2
4         16QAM        3/4
5         64QAM        2/3
6         64QAM        3/4

Figure 4.22: WIMAX transmitter using ADS



The transmitted WIMAX signal is an OFDM signal; in the frequency domain it therefore consists of a series of shifted sinc functions, one per subcarrier. This is illustrated in Figure 4.23.

[Spectrum plot: dB(Spec_Q) versus freq (MHz), 360 to 400 MHz, -200 to 0 dB]

Figure 4.23: WIMAX transmitted signal using ADS

As in the WCDMA case, the WIMAX transceiver LABVIEW VIs can be organized hierarchically into four main components:

• WIMAX Acquire Input Signal

• Noise Introduction

• Channel Filtering

• EVM Evaluation

Figure 4.24 illustrates the top-level connections between the four major components. Each VI (component) is explained in the following section.

Figure 4.24: Top level View of the WIMAX Channel in LABVIEW

4.3.2.1- WIMAX VIs Explanation

The WIMAX Acquire Input Signal VI represents the transmitter designed in ADS and shown in Figure 4.22. It therefore contains the basic modulation and OFDM blocks, with no pulse shaping filter, as stated in the WIMAX specifications. The WIMAX signal was simulated with a 7 MHz RF bandwidth. The VI acquires the I and Q waveforms from the ADS-generated files. Each file contains coded I/Q signals taken over a 4.32 ms time slot (108 symbols at 40 µs per symbol). The signal is 16QAM modulated, rate-1/2 convolutionally coded, and uses a 256-point FFT (256 subcarriers). The sampling interval is 0.03125 µs.
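The 40 µs symbol time and the 4.32 ms file duration can be reproduced from the OFDM parameters. The Python sketch below assumes the IEEE 802.16-2004 OFDM PHY rule for the sampling frequency, Fs = floor(n·BW/8000)·8000, with sampling factor n = 8/7 for a 7 MHz channel. That factor is not stated in this report. Exact fractions keep the results free of rounding error.

```python
from fractions import Fraction

N_FFT = 256                        # OFDM FFT size
BANDWIDTH_HZ = 7_000_000           # channel bandwidth in the ADS setup
CP_RATIO = Fraction(1, 4)          # CyclicPrefix = 1/4 in the ADS setup
SAMPLING_FACTOR = Fraction(8, 7)   # assumed 802.16-2004 value for a 7 MHz channel
T_STEP = Fraction(1, 32_000_000)   # 0.03125 us sample interval of the files (s)
SYMBOLS_PER_FILE = 108

fs = (SAMPLING_FACTOR * BANDWIDTH_HZ // 8000) * 8000   # OFDM sampling frequency (Hz)
subcarrier_spacing = Fraction(fs, N_FFT)               # Hz
t_useful = 1 / subcarrier_spacing                      # useful symbol time (s)
t_symbol = t_useful * (1 + CP_RATIO)                   # useful time + cyclic prefix (s)
file_duration = SYMBOLS_PER_FILE * t_symbol            # s
samples_per_symbol = t_symbol / T_STEP                 # file samples per OFDM symbol

print(f"Fs = {fs / 1e6} MHz, spacing = {float(subcarrier_spacing)} Hz")
print(f"symbol = {float(t_symbol) * 1e6:.1f} us, file = {float(file_duration) * 1e3:.2f} ms, "
      f"{samples_per_symbol} samples/symbol")
```

Under these assumptions the useful symbol lasts 32 µs and the cyclic prefix 8 µs. The 0.03125 µs file step then corresponds to 4 samples per OFDM sample.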

To test this VI, we computed the power spectral density of the signal in these files and compared it with the ADS WIMAX transmitted signal shown in Figure 4.23. For a fair comparison, we split the ADS WIMAX signal into two symmetric halves and compared each half of the RF WIMAX signal with the baseband WIMAX LABVIEW PSD. The results were quite similar, as shown in Figure 4.25. Note that the baseband bandwidth is half the RF bandwidth (3.5 MHz = 7 MHz / 2).

Figure 4.25: WIMAX PSD in LABVIEW (Amp. Versus freq)

The noise introduction and EVM evaluation blocks are quite similar to their WCDMA counterparts. The channel filtering block has the same architecture as in WCDMA, but with different coefficient values and a different number of taps.

We chose a Hamming-window filter rather than a rectangular-window or Remez design because it proved more effective when a large amount of noise is added. This is illustrated in Figure 4.27: for a noise standard deviation of 0.3, the original signal becomes highly distorted, yet the Hamming-window filter successfully recovers it.
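A Hamming-window FIR filter of this kind can be designed with the windowed-sinc method. The method truncates the ideal low-pass impulse response and tapers it with the Hamming window w[n] = 0.54 - 0.46 cos(2πn/(N-1)). The Python sketch below is an illustration only, since the report's actual coefficients came from its own design flow. The example parameters are assumptions chosen to match the WIMAX numbers quoted above: 51 taps, and a 3.5 MHz cutoff at the 32 MHz rate implied by the 0.03125 µs sampling interval.

```python
import math

def hamming_lowpass(num_taps, cutoff_hz, fs_hz):
    """Windowed-sinc low-pass FIR taps with a Hamming window, scaled to unit DC gain."""
    fc = cutoff_hz / fs_hz                       # normalized cutoff (cycles/sample)
    mid = (num_taps - 1) / 2                     # center of the symmetric response
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]

def magnitude_response(taps, f_hz, fs_hz):
    """|H(f)| of the FIR filter evaluated at a single frequency."""
    w = 2 * math.pi * f_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

# Hypothetical WIMAX-like parameters: 51 taps, 3.5 MHz cutoff, 32 MHz sampling.
taps = hamming_lowpass(51, 3.5e6, 32e6)
for f in (1e6, 3.5e6, 6e6, 10e6):
    mag = max(magnitude_response(taps, f, 32e6), 1e-12)   # avoid log10(0)
    print(f"{f / 1e6:4.1f} MHz: {20 * math.log10(mag):7.1f} dB")
```

The taps are symmetric, so the filter has linear phase. The Hamming window trades a wider transition band for much lower stopband sidelobes than a rectangular window.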

As stated previously, EVM is the metric we use to evaluate the performance of the channel filter. Unlike the EVM for the WCDMA QPSK constellation, the EVM for WIMAX must account for the 16QAM constellation. The EVM formula changes accordingly, which affects the EVM VI shown in Figure 4.26.
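For 16QAM, the only change to a nearest-point EVM computation is the reference set. The Python sketch below assumes the usual square 16QAM grid with levels {±1, ±3} on each axis, normalized by √10 to unit average power. This normalization and the function names are our assumptions, not taken from the LABVIEW module.

```python
import math

def qam16_reference():
    """Square 16QAM grid with levels {-3, -1, 1, 3}, scaled to unit average power."""
    levels = (-3, -1, 1, 3)
    scale = math.sqrt(10)            # average power of the unscaled grid is 10
    return [complex(i, q) / scale for i in levels for q in levels]

def evm_rms(symbols, reference):
    """RMS EVM with a nearest-reference-point decision for each symbol."""
    err_power = 0.0
    ref_power = 0.0
    for s in symbols:
        ideal = min(reference, key=lambda r: abs(s - r))
        err_power += abs(s - ideal) ** 2
        ref_power += abs(ideal) ** 2
    return math.sqrt(err_power / ref_power)

ref16 = qam16_reference()
perturbed = [p + 0.05 for p in ref16]   # small constant offset on I only
print(f"EVM = {100 * evm_rms(perturbed, ref16):.1f} %")
```

Because the 16QAM decision regions are smaller than those of QPSK, the same absolute noise produces a larger EVM and more wrong decisions.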


Figure 4.26: WIMAX EVM LABVIEW Module

Applying this EVM metric, the EVM of the signal before the channel filter turns out to be 51.9898%. This value is very high because the 16QAM constellation points are scattered by the OFDM operation. Applying the OFDM demodulator before computing the EVM would give a much smaller value. However, as explained previously, our aim is to minimize the EVM degradation caused by the channel filter, not the EVM value itself. The EVM after applying the filter is 52.0834%. Accordingly, the degradation percentage is given by:

$$\text{Degradation} = \sqrt{\mathrm{EVM}_{new}^{2} - \mathrm{EVM}_{old}^{2}} = \sqrt{0.520834^{2} - 0.519898^{2}} \approx 3.12\%$$

Again, the degradation is below 5%, which justifies moving on to the implementation part.



Figure 4.27: WIMAX filter response. a) No noise; b) noisy WIMAX signal for a noise standard deviation of 0.3; c) WIMAX filter response for a noise standard deviation of 0.3 (Amp vs. freq)

Project Design& Analysis 4.4- Reconfigurable System Architecture


4.4- Reconfigurable System Architecture

Given the block diagrams for WLAN, WCDMA, and WIMAX in the previous chapter, we need a way to combine these standards so that reconfiguring between them is straightforward. As you may have noticed, those block diagrams include the blocks that handle a real channel scenario (noise and fading). Since no fading is simulated as a first step, our channel depends on noise alone. This allows us to remove some blocks from the previous block diagrams, such as the interleaver, the cyclic prefix insertion block, and convolutional encoding. What remains are the blocks dealing with modulation/demodulation (“basic”, such as QPSK, and “advanced”, such as OFDM and spread spectrum) and filtering. However, we kept the full block diagrams (those accounting for the real channel scenario) as a reference for future work, in case we improve the project to account for real fading channels in addition to noise. Figure 4.28 shows the common block diagram for noisy channels, reconfigurable between the aforementioned standards: WCDMA and WIMAX.

Notice that interpolation and decimation filtering are not shown explicitly in this block diagram. However, the DAC and ADC of the PCI 5640R board we are working with provide interpolation and decimation filtering, respectively. Briefly, the design starts by inputting a certain number of bits, and the bit rate can be handled in one of two ways. Either we input a fixed number of bits and upsample them along the chain of blocks to meet the bit rate specified by each standard, or we input the bits at a standard-dependent rate and use the same upsampling factor for all standards. Next, the bits are modulated. As the specifications show, QPSK is the basic modulation scheme common to all three standards. Then we perform either DSSS (direct sequence spread spectrum) for WCDMA or OFDM (orthogonal frequency division multiplexing) for WIMAX and WLAN. WIMAX and WLAN specify OFDM differently, which is why this block must be reconfigurable between the two. The OFDM modulation is carried out by an IFFT (256 points for WIMAX, 64 points for WLAN), which is a very efficient discrete implementation and takes the place of a pulse shaping filter. Finally, a root raised cosine filter is applied for WCDMA. While WIMAX does not use any pulse shaping filter, WLAN uses a shifted Gaussian shaping filter on each subcarrier produced by the OFDM operation. The output is then converted to the analog domain by the DAC block with the appropriate interpolation factor, as discussed previously.
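The reconfigurable transmit chain can be summarized as a common QPSK mapper followed by a standard-dependent block: DSSS spreading for WCDMA or an IFFT for WIMAX. The Python sketch below is illustrative only. It assumes an on-axis Gray-coded QPSK mapping and uses the all-ones code of length 128 mentioned earlier as the spreading code. The IFFT is computed as a naive inverse DFT with 1/N scaling, and the WLAN path, the root raised cosine filter, and the DAC are omitted.

```python
import cmath

# Assumed Gray-coded QPSK mapping onto the axes (pairs of bits -> symbol).
QPSK_MAP = {(0, 0): 1 + 0j, (0, 1): 1j, (1, 1): -1 + 0j, (1, 0): -1j}

def qpsk_map(bits):
    """Map consecutive bit pairs to QPSK symbols."""
    return [QPSK_MAP[(b0, b1)] for b0, b1 in zip(bits[0::2], bits[1::2])]

def dsss_spread(symbols, sf=128):
    """Multiply each symbol by the all-ones code of length sf (spreading code number 0)."""
    return [s for s in symbols for _ in range(sf)]

def ofdm_modulate(symbols, n_fft=256):
    """Naive inverse DFT (1/N scaling) of one block of subcarrier symbols."""
    x = list(symbols) + [0j] * (n_fft - len(symbols))   # unused subcarriers set to zero
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * n / n_fft) for k in range(n_fft)) / n_fft
            for n in range(n_fft)]

def transmit(bits, standard):
    """Common QPSK mapping followed by the standard-specific 'advanced' modulation."""
    symbols = qpsk_map(bits)
    if standard == "WCDMA":
        return dsss_spread(symbols)      # root raised cosine shaping would follow
    if standard == "WIMAX":
        return ofdm_modulate(symbols)    # no pulse shaping filter
    raise ValueError(f"unsupported standard: {standard}")
```

Only the branch after qpsk_map depends on the standard, which mirrors the idea of sharing the basic modulation block and reconfiguring the rest.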

Given that the channel is noisy, the “Reconfigurable Channel Filter (FIR)” block is implemented as described in section 4.5.

Figure 4.28: Reconfigurable Transceiver Block Diagram

The channel filter design depends on the specifications of each standard, such as the adjacent channel attenuation, the baseband bandwidth, and the required attenuation at specific frequencies. Refer to section 2.3 for more information about the “Channel Filter (FIR)” block and its specifications for each standard.


4.5- Hardware Implementation

In this section, we present the implementation details of the FPGA-based

reconfigurable FIR filter. We discuss how reconfigurability of the filter is implemented.

Moreover, we present the different problems encountered at the different stages of the

process, their solutions and some other proposed alternatives including their advantages

and disadvantages from an implementation point of view.

4.5.1- Reconfigurability Aspect

The LABVIEW-based board (PCI-5640) simplifies the implementation of a

reconfigurable system because of its two-level hierarchical model: Host and FPGA. The

host VI has the ability to read and modify the different parameters (controls and

indicators) of the FPGA by reading/passing their values to the FPGA VI through built-in

read/write control methods. It also transfers the digitized input signal from a file on the

host to a pre-assigned memory on the FPGA through the read/write control methods.

These methods serve as means of data communication between the host and the FPGA.

All of these features help us implement an efficient reconfigurable FIR filter, where the number of taps is controlled from the host level and the coefficient values are downloaded and stored in the FPGA memory at run time. Instead of storing the coefficients of all the filters for the different standards in the FPGA memory, we download the coefficients of only one filter at run time, depending on a user control. This method is preferred because it scales better as more standards are included: it requires less memory. In short, we can switch between the two standards simply by changing a boolean input control on the host, as shown in Figure 4.29. The boolean control is wired to the selector of a case structure. When it is true, the WCDMA coefficient and data files and other parameters are loaded; when it is false, the WIMAX files are loaded.
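The host-side selection and the run-time coefficient download can be sketched as follows. This is a Python model under assumptions, not LABVIEW code. Only WIMAX_Ham.txt and WIMAX_Q.txt appear in this report (Table 4.2), so the other file names are hypothetical placeholders. The FPGA coefficient memory is modeled as a list sized for the longest filter, which is the point of the scheme: it only ever holds one filter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StandardFiles:
    name: str
    coeff_file: str
    i_file: str
    q_file: str

# Only WIMAX_Ham.txt and WIMAX_Q.txt appear in this report; the rest are placeholders.
WCDMA_FILES = StandardFiles("WCDMA", "WCDMA_coeffs.txt", "WCDMA_I.txt", "WCDMA_Q.txt")
WIMAX_FILES = StandardFiles("WIMAX", "WIMAX_Ham.txt", "WIMAX_I.txt", "WIMAX_Q.txt")

def select_standard(use_wcdma):
    """Case structure on the host: True -> WCDMA files, False -> WIMAX files."""
    return WCDMA_FILES if use_wcdma else WIMAX_FILES

def download_coefficients(fpga_memory, coeffs):
    """Overwrite the coefficient memory with one filter's taps and return the
    tap count, which the host passes to the FPGA as the 'number of taps' control."""
    if len(coeffs) > len(fpga_memory):
        raise ValueError("filter is longer than the reserved coefficient memory")
    fpga_memory[:len(coeffs)] = coeffs
    return len(coeffs)

coeff_memory = [0] * 64                                    # sized for the longest filter only
n_taps = download_coefficients(coeff_memory, [3, 5, 3])   # e.g. a short test filter
```

With this scheme the memory requirement is set by the longest filter rather than by the sum of all filters, which is why it scales as standards are added.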



Figure 4.29: Case Structure (Choose WCDMA / WIMAX)

Another method for reconfiguring the FIR filter is to store the coefficients of all the standards in the FPGA memory blocks before run time. When the host VI runs, only a boolean control is then transferred from the host to the FPGA, and the corresponding coefficients are used for filtering. Although this alternative has a shorter initialization phase, it is inefficient in terms of memory usage. Targeting an efficient design, we decided to use the first method.

4.5.2- Data Links and Communications

As previously mentioned, the PCI-5640 board hierarchy is divided into two levels:

host and FPGA. The former keeps track of the controls and indicators sent to and received from the FPGA. It is also used to configure, in LABVIEW, some hardware resources that are already synthesized on the FPGA, such as the ADC and the DACs. At the FPGA level, the designed blocks, such as registers and memory blocks, are compiled and then synthesized on the Virtex II Pro FPGA. After the configuration parameters are set, the FPGA listens for data arriving at its inputs and sends results to a DMA FIFO, according to the logic of the design. Figure 4.30 shows a high-level diagram of the communication process between the host and the FPGA.



Figure 4.30: Host-FPGA Link

The communication link shown in Figure 4.30 is asynchronous, meaning that there is no guarantee that a value sent from the host VI will be received by the FPGA at a given time. In other words, a received value may be read several times before a new value arrives, or a sent value may not be read at all. As a result, the incoming signal gets distorted and the results are wrong. Synchronizing the two levels proved very difficult; our solution is presented below.

4.5.3- Host-FPGA Synchronization

The PCI 5640 board is a very important tool for communications systems

implementation especially that it is LABVIEW based where most of the needed

components can be easily implemented. Yet, this board is still under test and we, as

students, are helping in this procedure. As mentioned earlier, we faced a major problem during the filter implementation: how to read continuously, on the FPGA, the values sent from the host. This issue is critical in any application, since data always need to be sent from the host level to the FPGA level. We contacted NI through its online forums, in particular the NI IF-RIO forum, until we finally reached a solution. Figure 4.31 shows the process of reading a value on the FPGA from the coefficient control; this process guarantees that each sent element is read only once. It saves the previous value in a shift register, and on every iteration of the while loop it compares that previous value with the one available in the coefficients buffer. If they differ, a new value has been received. We then read it, save it in the shift register for comparison in the next iteration of the while loop, and pass it to the next stage. The value must also be nonzero, because every parameter on the FPGA defaults to zero; without this condition, the first iterations would be spent processing that default zero as if it were data. In other words, the FPGA writes a new value if and only if the output of the AND gate in Figure 4.31 is true.
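The read logic of Figure 4.31 can be modeled in a few lines of Python. This sketch simulates the polled control as a list of the values seen on successive loop iterations. It is not LABVIEW FPGA code, and the function name is our own. A value is accepted only when it differs from the shift register contents and is nonzero, the two conditions combined by the AND gate.

```python
def read_new_values(polled_values):
    """Accept a polled control value only if it differs from the value held
    in the shift register AND is nonzero (the AND gate of the read process)."""
    shift_register = 0              # FPGA controls default to zero
    accepted = []
    for value in polled_values:
        is_new = value != shift_register and value != 0
        if is_new:
            shift_register = value  # remember it for the next loop iteration
            accepted.append(value)  # pass it to the next stage
    return accepted

print(read_new_values([0, 0, 5, 5, 5, 7, 7, 3]))
```

The model also exposes the scheme's limitation: a value repeated back to back, or a genuine zero, is never delivered.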

Figure 4.31: FPGA Read Process

One may ask how we could send a zero, or the same value twice in a row, to the FPGA. A more complex implementation allows the same value to be sent more than once: a “time stamp” is attached to each value sent from the host to the FPGA. Since a time stamp never equals the one from the previous iteration, we can compare time stamps instead of coefficient values. A zero can also be sent successfully, since its time stamp is nonzero. To implement this design, however, we would need to encode the time stamp into each value sent to the FPGA, so that every value stays paired with its time stamp. Several bits would then have to be reserved for the time stamp, and the same number of bits of precision would be lost from the input values. The signal distortion would then increase dramatically because of error accumulation when signal terms are multiplied and added.
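The time-stamp alternative can be sketched by packing a small wrapping counter into the low bits of each 32-bit word. The Python model below assumes 4 stamp bits and a host that cycles the stamp through nonzero values; the report specifies neither. The model also shows the cost of the scheme, since only 28 bits remain for the value.

```python
STAMP_BITS = 4                        # hypothetical number of bits reserved for the stamp
VALUE_BITS = 32 - STAMP_BITS          # bits left for the data value
STAMP_MASK = (1 << STAMP_BITS) - 1
VALUE_MASK = (1 << VALUE_BITS) - 1

def pack(value, stamp):
    """Host side: signed value in the upper VALUE_BITS, stamp in the low bits."""
    limit = 1 << (VALUE_BITS - 1)
    if not -limit <= value < limit:
        raise ValueError("value does not fit in the reduced width")
    return ((value & VALUE_MASK) << STAMP_BITS) | (stamp & STAMP_MASK)

def unpack(word):
    """FPGA side: split a 32-bit word back into (signed value, stamp)."""
    stamp = word & STAMP_MASK
    raw = word >> STAMP_BITS
    if raw & (1 << (VALUE_BITS - 1)):  # restore the sign of the value
        raw -= 1 << VALUE_BITS
    return raw, stamp

def read_stamped(polled_words):
    """Accept a word whenever its stamp differs from the last accepted one.
    The initial stamp is 0 because FPGA controls default to zero."""
    last_stamp = 0
    accepted = []
    for word in polled_words:
        value, stamp = unpack(word)
        if stamp != last_stamp:
            last_stamp = stamp
            accepted.append(value)
    return accepted

# The host cycles stamps through 1..15 so that a stamp is never zero.
words = [0, pack(5, 1), pack(5, 1), pack(5, 2), pack(0, 3), pack(-7, 4)]
print(read_stamped(words))
```

Repeated values and zeros now get through, but the usable value range shrinks from 32 to 28 bits, which is the precision loss discussed in the text.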

Since adjacent input and coefficient values in our data are always different, we decided to keep the simple design without the time stamp and gain higher precision in our results.

4.5.4- FPGA- Host Synchronization

The link from the FPGA to the host is much simpler than the opposite direction, because the FPGA supports DMA FIFOs in this direction but not the other way around. Using DMA FIFOs, we store values in the FIFO on the FPGA and read them from the host VI. Since we control the write operation on the FPGA VI, every value is written to the FIFO exactly once. When the host VI reads a value, it is popped out (removed from the FIFO) and cannot be read again. Moreover, a host read does not return an empty (null) value, so no further synchronization between the two sides is needed. Placing the FIFO read operation in a loop guarantees that all values are read correctly. We use this FIFO read operation to transfer the results of the filtering process, which the host VI reads and saves in an array for later evaluation. Figure 4.32 shows a block diagram of the DMA FIFO read method. Its inputs include the Timeout value, usually set to zero to avoid delays. The PCI 5640 board we used supports only a zero timeout, and any other value causes a compilation error. A second input is the “Number of Elements”, the number of data values read from the FIFO in each iteration. Since the FPGA writes one value per iteration, only one value needs to be transferred to the host per iteration, so we set the number of elements to 1. Note also that the FIFO supports a 32-bit number representation, which matches our design as described in section 4.5.6.
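The reason this direction needs no extra handshaking can be shown with a small queue model. The Python sketch below uses collections.deque as a stand-in for the DMA FIFO and does not model NI's FIFO API. The host polls several times per FPGA write with a zero timeout, gets nothing when the FIFO is empty, and pops each value it reads. Every output therefore arrives exactly once and in order, including zeros and repeated values.

```python
from collections import deque

def run_link(filter_outputs, host_polls_per_write=3):
    """FPGA writes each output once; the host polls with a zero timeout and
    reads one element per poll ('Number of Elements' = 1)."""
    fifo = deque()
    received = []
    for y in filter_outputs:
        fifo.append(y)                            # FPGA side: single write per iteration
        for _ in range(host_polls_per_write):     # host side: several polls per write
            if fifo:                              # an empty FIFO returns nothing
                received.append(fifo.popleft())   # popped, so it is never read twice
    return received

print(run_link([4, 4, 0, 7]))
```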

Figure 4.32: DMA FIFO Read Method



4.5.5- Memory Component

In designing the FIR filter, we need to store the coefficients in memory in order to use them repeatedly in the convolution process. In addition, we need to keep track of the last N input values, where N is the number of coefficients, because every output depends on the last N inputs. Every new input must therefore replace the oldest input in memory before the convolution takes place. This is described in detail in section 4.5.7.

First, we tried to implement our FIR filter using arrays and FIFOs (local and

DMA). However, the compilation took around 15 hours and failed with a fatal error. After daily contact with the NI forums, we learned that NI discourages using arrays on this type of board and that other users of the same board had experienced the same problem. The reason is that arrays at the software level are mapped to registers on the FPGA, and there are not enough registers for large arrays such as ours. The array-based design is shown in Figure 4.33. It was a preliminary implementation of our filter that performed a simple convolution with fixed coefficients and a fixed filter order, and it failed for the reasons described above.

Figure 4.33: FIR implementation Using Arrays and FIFOs

We therefore migrated to a more practical solution: BRAM memory blocks. These memory blocks are organized as 8K × 16 bits. Since we use a 32-bit representation, as explained in section 4.5.6, a new problem arises: storing and reading 32-bit numbers in a 16-bit memory.

To solve this problem, we split each 32-bit number into two 16-bit numbers when writing and concatenate the two halves when reading, so each number occupies 2 memory addresses. Every 32-bit write/read operation thus involves two 16-bit write/read operations, one for the 16 most significant bits and another for the 16 least significant bits (see Figure 4.34). Although this doubles the memory needed compared with a 16-bit representation, it is necessary to obtain the required precision.
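The split/join operation can be sketched as follows. In this Python model the memory is a list of 16-bit words. Storing the 16 MSBs at the even address and the 16 LSBs at the next one is an assumption about Figure 4.34. Negative values are handled through their 32-bit two's-complement pattern. With 8K 16-bit words, such a memory holds 4K 32-bit values.

```python
MASK16 = 0xFFFF

def write32(memory, index, value):
    """Store a signed 32-bit value as two 16-bit words at addresses 2*index and 2*index+1."""
    word = value & 0xFFFFFFFF                    # 32-bit two's-complement pattern
    memory[2 * index] = (word >> 16) & MASK16    # 16 MSBs (assumed first address)
    memory[2 * index + 1] = word & MASK16        # 16 LSBs at the next address

def read32(memory, index):
    """Read the two 16-bit halves back and concatenate them into a signed value."""
    word = (memory[2 * index] << 16) | memory[2 * index + 1]
    return word - (1 << 32) if word & 0x80000000 else word

memory = [0] * 8192                 # 8K x 16-bit block -> room for 4096 32-bit values
write32(memory, 0, 123456789)
write32(memory, 1, -5)
print(read32(memory, 0), read32(memory, 1))
```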

Figure 4.34: Write a 32-bit coefficient in the memory


4.5.6- Number Representation

One of the important decisions in any digital design is the choice of number representation. Using MATLAB, we first computed the error that results from rounding the values to a fixed-point notation. All numbers are mapped to a single exponent, chosen according to the data range so as to minimize the loss of precision. A 16-bit representation gave an unacceptable error, whereas a 32-bit representation was sufficient. Using this representation, we found with MATLAB that scaling the output of the convolution by more than 10^9 causes overflow, because more than 32 bits would then be needed. The highest possible scaling is therefore 10^9, so the product of the coefficient and input scaling factors must not exceed this value.

Table 4.2 shows the MATLAB code that rounds the coefficients and input values using scaling factors between 10^3 and 10^6, in order to find the best combination for the WCDMA filter design. The results show that multiplying the coefficients by 982 and the input values by (10^9/982) gives the minimum error. Based on these results, we decided to scale the coefficients by 10^3 and the input values by 10^6 before sending them to the FPGA for filtering.

coeff_ideal  = load('WIMAX_Ham.txt');            % filter coefficient file
input_ideal  = load('WIMAX_Q.txt');              % input (Q component) file
output_ideal = conv(coeff_ideal, input_ideal);   % ideal floating-point FIR output

scales = 1:10:1000;                  % candidate split factors i
err    = zeros(size(scales));        % mean absolute error for each candidate
for k = 1:numel(scales)
    i = scales(k);
    coeff_round  = round(coeff_ideal * i * 1000);          % coefficients scaled by i*10^3
    input_round  = round(input_ideal * (1000 / i) * 1000); % inputs scaled by 10^6/i
    output_round = conv(coeff_round, input_round) / 1e9;   % remove the total 10^9 scaling
    err(k) = mean(abs(output_ideal - output_round));
end

[min_err, k_best] = min(err);        % min returns [value, index]
best_scale = scales(k_best)          % best split factor i

Table 4.2: MATLAB Testing for Fixed Point Notation
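The overflow constraint can also be checked directly. The Python sketch below is not the report's MATLAB script. It quantizes coefficients and inputs with the chosen scales (10^3 and 10^6) and convolves them with integer arithmetic, as the FPGA does. It then reports whether every accumulated value fits in a signed 32-bit word and rescales the result. Note that Python's round() breaks ties to even, unlike MATLAB's round, which rounds halves away from zero.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def fixed_point_fir(coeffs, inputs, coeff_scale=10**3, input_scale=10**6):
    """Quantize, convolve in integer arithmetic, check the signed 32-bit range,
    and rescale. Returns (outputs, fits_in_int32)."""
    qc = [round(c * coeff_scale) for c in coeffs]   # integer coefficients
    qx = [round(x * input_scale) for x in inputs]   # integer input samples
    acc = [0] * (len(qc) + len(qx) - 1)
    for k, c in enumerate(qc):
        for n, x in enumerate(qx):
            acc[k + n] += c * x                     # exact integer multiply-accumulate
    fits = all(INT32_MIN <= a <= INT32_MAX for a in acc)
    total_scale = coeff_scale * input_scale         # 10^9 with the chosen scales
    return [a / total_scale for a in acc], fits

outputs, fits = fixed_point_fir([0.25, 0.5, 0.25], [0.1, -0.2, 0.3])
print(outputs, fits)
```

With the scales fixed at 10^3 and 10^6, any output larger in magnitude than about 2.147 (2^31 / 10^9) no longer fits, which is the overflow limit described above.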



4.5.7- Convolution Process

The convolution process is the bulk of the FIR filter, so implementing it efficiently is very important for system performance. The main efficiency metrics in an FIR design and implementation are minimum latency and minimum hardware usage. Since the filter order is around 50, as described in section 2.3, we have two choices. The first is a parallel filter that performs the whole convolution at once. However, this requires fetching 2 × 50 values (coefficients and inputs) from memory in every iteration, which is far from practical. The board does not support this number of simultaneous fetches, so they would have to be performed serially anyway. A parallel implementation would be beneficial if registers were used instead of memory blocks, because register reads are independent of each other. Since we are obliged to use memory blocks, as described in section 4.5.5, a parallel filter is not an option. We therefore chose the second option, a serial FIR implementation. Figure 4.35 shows how the convolution process takes place on our hardware platform.

Figure 4.35: Convolution Process

In each iteration, the filtering process first finds the address of the data to be fetched from memory. It then fetches the two 16-bit halves at that address and at address + 1 and concatenates them as shown in Figure 4.34. Multiplication and accumulation then proceed until the last iteration (iteration “Number of Coefficients - 1”). At the last iteration, the convolution output is written once to the DMA FIFO, from which the host reads it at run time. The accumulator in our design is a 32-bit register initialized to zero.

The critical path of our hardware design, which consists mainly of the “number of coefficients” multiplications and additions, is shorter than the period at which the host sends data values to the FPGA.
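The serial multiply-accumulate loop can be modeled as below. This Python sketch keeps the last N inputs in a circular buffer, where the newest input overwrites the oldest. It resets the accumulator to zero for every new input and performs one multiply-accumulate per iteration from 0 to N - 1. It emits one output per input, just as the FPGA writes one value to the DMA FIFO. The wrap-around of a 32-bit register is modeled explicitly, while memory addressing and the 16-bit split are left out for clarity. The function names are our own.

```python
def to_int32(v):
    """Wrap an integer into the two's-complement range of a 32-bit register."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def serial_fir(coeffs, samples):
    """One output per input, computed with N sequential multiply-accumulate steps."""
    n_taps = len(coeffs)
    history = [0] * n_taps        # circular buffer holding the last N inputs
    newest = -1                   # index of the most recent input
    outputs = []
    for x in samples:                                  # one new input per host period
        newest = (newest + 1) % n_taps
        history[newest] = x                            # overwrite the oldest input
        acc = 0                                        # accumulator reset to zero
        for k in range(n_taps):                        # iterations 0 .. N-1
            acc = to_int32(acc + coeffs[k] * history[(newest - k) % n_taps])
        outputs.append(acc)                            # written once to the DMA FIFO
    return outputs

print(serial_fir([1, 2, 3], [1, 0, 0, 0]))   # impulse response: [1, 2, 3, 0]
```

The serial loop needs only one coefficient fetch and one input fetch per iteration, which is what makes it compatible with block memory.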


4.5.8- Host Application

At the host level, the host VI reads the output of the FIR filter from the DMA FIFO before displaying and evaluating the results. A block diagram of the host VI is shown in Figure 4.36. The host VI then plots the frequency-domain representation of the filter output and calculates the corresponding EVM values. The frequency-domain representation is important to show that the relevant frequency content of the signal was preserved, but it is only a visual evaluation. The EVM metric, in contrast, gives a numerical value of the signal's deviation from the original. Since we are designing a channel filter, an EVM degradation below 5% is a good indicator that the filter does not significantly disturb the distribution of the constellation points.

Figure 4.36: HOST VI

4.5.9- FPGA Process

The FPGA platform contains the hardware implementation of our reconfigurable FIR filter. On each iteration of the host's clock, the FPGA receives a new value that must go through the filtering process. This value is read as described in section 4.5.3, with the whole FPGA VI looping continuously to guarantee that every new value is read. Since the signal has in-phase and quadrature components, we decided to filter the two channels in parallel, as shown in Figure 4.37. Our FPGA design operates on 32-bit numbers and uses BRAM memory blocks and DMA FIFOs for data transfer, as described throughout this section.

Figure 4.37: FPGA VI


4.6- Design Assessment

In this section, we present the testing scheme used throughout the design process and how it helped us reach the final results. We cover both the software and hardware levels of the testing scheme, and we compare the results of the hardware implementation with those of the software simulation.

4.6.1- Testing Scheme

The testing process for the FIR filter implementation is divided into two main phases, presented below. The aim of this procedure is to limit possible errors and to check the feasibility of each phase as early as possible in the design and implementation process, in order to optimize the whole system at the lowest possible cost.

The first phase of the testing scheme consists of simulating the FIR filter in LABVIEW and evaluating the metrics used to qualify our design. In our case, the two main metrics are the frequency-domain response of the output and the error vector magnitude (EVM). Once this phase is successful and the results are satisfactory, we can fix the designed filter coefficients and start working on the hardware implementation. The results of this phase are presented in section 4.3, where we show how the WCDMA and WIMAX channels passed it. As mentioned previously, the EVM is a useful metric for filter testing because it shows how the filter affects the distribution of the constellation points, which matters especially in our case, where we cannot measure the bit error rate of the received bits. To assess the efficiency of our filter design, we need to guarantee that the EVM degradation between the input signal and the filtered one stays below 5 %, as with the 2.227 % obtained for WCDMA. In the frequency domain, we need to ensure that frequencies above the signal band experience a deep rejection that cancels the effect of any jammer. Once the LABVIEW simulation had reached the optimum design (lowest EVM degradation), we selected the FIR filter type for each standard: a rectangular window for WCDMA and a Hamming window for WIMAX.
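The choice between a rectangular and a Hamming window trades transition sharpness against stopband rejection, which is the frequency-domain criterion above. The following Python (NumPy) sketch compares the two windows on an arbitrary lowpass design point; the 101 taps, the 0.25π cutoff and the 0.35π stopband edge are illustrative assumptions, not the project's WCDMA or WIMAX filter parameters, and the function names are hypothetical.

```python
import numpy as np

def windowed_lowpass(num_taps, cutoff, window):
    """Window-method lowpass: the truncated ideal response multiplied by a window.

    cutoff is in rad/sample (0 < cutoff < pi); num_taps should be odd so that
    the taps are symmetric about n = 0.
    """
    m = (num_taps - 1) // 2
    n = np.arange(-m, m + 1)
    ideal = (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)  # sin(cutoff*n) / (pi*n)
    return n, ideal * window(num_taps)

def magnitude_response(n, h, omegas):
    """|H(e^{jw})| = |sum_n h(n) e^{-jwn}| evaluated on a grid of frequencies."""
    return np.abs(np.exp(-1j * np.outer(omegas, n)) @ h)

# Illustrative design point (not the project's WCDMA/WIMAX specifications)
taps, cutoff = 101, 0.25 * np.pi
stopband = np.linspace(0.35 * np.pi, np.pi, 2000)
rect_n, rect_h = windowed_lowpass(taps, cutoff, np.ones)
hamm_n, hamm_h = windowed_lowpass(taps, cutoff, np.hamming)
rect_peak = magnitude_response(rect_n, rect_h, stopband).max()
hamm_peak = magnitude_response(hamm_n, hamm_h, stopband).max()
print(f"peak stopband level, rectangular: {20 * np.log10(rect_peak):.1f} dB")
print(f"peak stopband level, Hamming:     {20 * np.log10(hamm_peak):.1f} dB")
```

With these settings the Hamming design should show a much lower stopband peak, at the cost of a wider transition band; the rectangular window gives the sharpest transition for a given number of taps.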



The second phase is hardware implementation testing, where we test the FIR filter after downloading it to the FPGA. This is the hardest part of the testing scheme, because every change in the FPGA VI requires recompiling the FPGA, and a single compilation may take hours. During this phase, we went through several design alternatives before reaching our final design. For example, we started designing our reconfigurable filter using only arrays and FIFOs (local and DMA), but this VI failed to compile after 15 hours of running (see figure 4.33). We then moved to an implementation based on BRAM memory blocks, which do not cause large latencies because their fetch times are negligible. In the next section, we present the results achieved with this implementation and compare them with the simulated results.

4.6.2- Results Assessment

In this section, we present the strengths of our implementation and how it helps save hardware resources. First, the PCI 5640 board supports the reconfigurability aspect through a two-level hierarchy, Host and FPGA. This hierarchy lets the host reuse the same hardware resources for different channels by setting a few control parameters, as described in section 4.5. Starting with the WCDMA channel, we obtained successful results on both metrics. The first metric is the frequency-domain response of the filtered signal compared to the input, as shown in figures 4.38 and 4.39. The results show that frequencies below 1.92 MHz keep their power, while any jamming at high frequencies is strongly attenuated: its amplitude drops from the range of $10^{-11}$ to $10^{-12}$ down to the range of $10^{-13}$ to $10^{-19}$, as seen in the graphs. Compared to the software simulation, the attenuation is better in hardware, since the LABVIEW simulation only reached rejection levels of the order of $10^{-13}$ to $10^{-15}$.



Figure 4.38: Frequency Domain of the Input WCDMA signal (Amp vs. freq)

Figure 4.39: Frequency Domain of the Filtered WCDMA signal (Amp vs. freq)

Concerning the second metric, the EVM increased from 0.102317 before filtering to 0.112394 after filtering, which corresponds to a degradation of:

$$\text{Degradation Percentage} = \sqrt{EVM_{New}^{2} - EVM_{Old}^{2}}$$

$$\text{Degradation Percentage} = \sqrt{0.112394^{2} - 0.102317^{2}} = 4.65\,\%$$
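The degradation formula can be checked numerically with the short Python sketch below. The function name is hypothetical, and the uncorrelated-error reading given in the docstring is one possible interpretation of the formula, not a statement from the original measurements.

```python
import math

def evm_degradation(evm_old, evm_new):
    """Degradation attributed to the filter: sqrt(EVM_new**2 - EVM_old**2).

    One way to read this formula: if the filter adds an error that is
    uncorrelated with the existing one, error powers (squared EVMs) add,
    so the filter's own contribution is the square root of the difference.
    """
    if evm_new < evm_old:
        raise ValueError("EVM_new is expected to be at least EVM_old")
    return math.sqrt(evm_new ** 2 - evm_old ** 2)

# WCDMA values reported in this section (EVM as fractions, not percentages)
wcdma = evm_degradation(0.102317, 0.112394)
print(f"WCDMA degradation: {100 * wcdma:.2f} %")  # WCDMA degradation: 4.65 %
```

The same function can be applied to the WIMAX EVM values reported below.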

The distributions of the 4-QAM constellation points before and after filtering are shown in figures 4.40 and 4.41, respectively. It is worth mentioning that the initial EVM value differs from the one obtained in the LABVIEW simulation, because in the hardware case we have to use fixed-point notation, so each value is rounded before the EVM is measured. Measuring the input EVM after the same rounding isolates the degradation caused by the filter itself from other effects, such as fixed-point notation errors. This degradation is higher than in the software simulation, where it was only about 2.27 %. The difference is due to the fact that the software simulation uses higher-precision values, which are truncated in the hardware implementation.

Figure 4.40: WCDMA Initial Constellations (Quad vs. in-phase)

Figure 4.41: WCDMA Filtered Constellations (Quad vs. in-phase)

The second standard in our reconfigurable system, WIMAX, shows similar results under testing. The EVM increases from 0.521008 to 0.524038, which corresponds to a degradation of:

$$\text{Degradation Percentage} = \sqrt{0.524038^{2} - 0.521008^{2}} = 5.62\,\%$$

This degradation is also higher than the one obtained in simulation, which is 3.12 %. Note that the EVM for WIMAX is higher than for WCDMA because, unlike the WCDMA chain, the WIMAX transmitter chain does not include a pulse shaping filter, and the pulse shaping filter plays a major role in keeping the constellation points well distributed, thus reducing the EVM. The frequency-domain representations of the input and output signals are shown in figures 4.42 and 4.43, respectively. The transition region of the PSD visible in figure 4.42 is clearly chopped off in figure 4.43.

Figure 4.42: Frequency Domain of the Input WIMAX signal (Amp vs. Freq)



Figure 4.43: Frequency Domain of the filtered WIMAX signal (Amp vs. Freq)

Overall, the LABVIEW simulation and the hardware implementation reached quite similar results, with the EVM favoring the simulation and the frequency response favoring the hardware implementation. We therefore consider a reconfigurable FIR filter on an FPGA to be a good option, competitive with other FIR filter implementations.


5- System Design Constraints

In any project, designers should be realistic and address as many of the constraints their system poses as possible, at every level: economic, social, political, and ethical, as well as more technical constraints such as manufacturability and sustainability.

Our project is an important step towards a uniform global system: although such a system contains different communication languages (WIMAX, WLAN, WCDMA, GSM, etc.), there would be a single communication language capable of connecting all of these non-uniform standards. Accordingly, there would no longer be a need for the present, relatively costly, so-called "Roaming Service". Economically, this would bring large savings, especially for businesspeople who travel frequently from one region to another. In addition, such a reconfigurable system has the clear advantage of reducing hardware resources while adapting to different standards.

However, such a system also poses some economic constraints. Since a reconfigurable system has to group multiple standards together, base stations, mobile systems, and wireless cards would have to include new wireless blocks, such as switches (to switch between the reconfigurable standards) or blocks that remove possible RF coupling between the signals; such blocks are not needed in a single-standard wireless device.

Socially, our project would help increase interaction between different cultures, because it would allow people in different regions using different communication standards to communicate with each other at low cost. Moreover, it would lower the price of international calls, so that parents can easily contact their relatives and keep up with their news.

Manufacturing is another constraint: deploying a reconfigurable hardware design in today's cell phones and base stations would take a long time. This transition between the two generations is likely to slow the rapid advancement of the latest features and applications integrated in our phones, such as cameras and video streaming, because research during this phase would concentrate mainly on building area- and power-efficient reconfigurable blocks. As a result, the transition involves a trade-off between unifying the world's systems and a slower pace of development in design and features.

Our project would be a direct competitor to roaming services, replacing them at very low cost. However, this raises a security issue, since communication between different countries would be possible without a sufficient level of control, which may have serious effects, especially nowadays, when globalization has encouraged many countries to try to dominate others.

Applied at the level of individuals, the security issue discussed above may lead to undesirable ethical consequences. People may take advantage of this new system to spy on other systems, and if the issue is not solved, such unethical acts could cause serious harm, depending on whom they target. This serious problem therefore needs careful attention, and its effects need to be seriously considered and dealt with.

Concerning sustainability, our project provides a common hardware platform on which any new standard can be integrated. However, this would not hold for new, highly developed standards, especially if they follow a completely different architecture. Two possibilities would then remain for our reconfigurable design: either continue to support the present standards at the expense of the new ones, or undergo major architectural changes to support the new standards, which would lead to sacrificing some old ones.

Conclusion

The objective of our FYP is to address a problem that the commercial wireless communication industry currently faces because of the different link-layer evolution steps each wireless standard has undergone. The coexistence, in many countries, of incompatible wireless network technologies, ranging from 2G to 4G, has made global roaming difficult to deploy and has complicated the roll-out of new features and services because of widespread legacy subscriber handsets [26]. Our project concept promises to solve such problems by implementing the radio functionality as software modules (using LABVIEW) running on a generic hardware platform, the PCI5640 Labview8.0 board.

We started with a general overview of the design through a literature survey of related subjects such as FIR filters, SDR, and EVM. We then reviewed some proposed common reconfigurable architectures and, more importantly, introduced our proposed reconfigurable block diagram for WIMAX and WCDMA in a noisy environment. We simulated the WCDMA and WIMAX chains in LABVIEW to obtain a theoretical reference for our implementation. We then designed the FIR filters and, most importantly, implemented a reconfigurable FIR module on the PCI5640 Labview8.0 board. The results were quite satisfactory, especially when compared with the theoretical results of the LABVIEW simulation: the small difference between the EVM degradations of the simulation and the implementation supports the validity of both our design and our implementation.

We believe that this project is an important step towards a fully reconfigurable transceiver. We therefore suggest that future FYP students continue where we left off and try to make the other blocks of the transceiver chain reconfigurable.

Appendix I- Digital Filter Coefficients Design

APPENDIX

I- Digital Filter Coefficients Design

Filter coefficients can be designed with an automated tool, a software application that generates the taps from various user-defined parameters. One example of such a tool is Filter Solutions 10.0 ("FS10.0").

The following procedure describes the tap design of an FIR filter. The frequency response of a digital FIR filter is periodic in the frequency domain, with a period equal to the sampling frequency ($2\pi$ in normalized frequency $\Omega$). Since it is periodic, it can be represented by a Fourier series:

$$H(e^{j\Omega}) = \sum_{k=-\infty}^{\infty} h(k)\, e^{-jk\Omega}$$

where h(k) are the impulse response coefficients that describe the digital FIR filter [16].

These coefficients can be determined from the frequency response using the following

equation:

$$h(n) = \frac{1}{2\pi} \int_{\Omega_o-\pi}^{\Omega_o+\pi} H(e^{j\Omega})\, e^{jn\Omega}\, d\Omega$$

In practice, only a finite number of these coefficients is kept. The number of coefficients N should be chosen according to the acceptable time delay and implementation cost. If the retained indices range from $-M$ to $M$, the number of coefficients is $N = 2M+1$; by making this selection, we effectively set all other coefficients to zero [15].

The frequency response can be determined using the following formula:

$$H(e^{j\Omega}) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-jn\Omega}$$

Plotting the desired frequency response against the one obtained from the designed coefficients shows whether the design is acceptable. The user can then adjust several parameters, such as the allowed ripple and the transition band, and accordingly increase or decrease the number of taps that implement the filter [15].
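The procedure above (truncate the Fourier-series coefficients to N = 2M+1 taps, then compare the resulting response with the desired one) can be sketched in Python with NumPy for an ideal lowpass response centered at $\Omega_o = 0$. The cutoff, transition width and tap counts are illustrative assumptions, the function names are hypothetical, and plain truncation is equivalent to using a rectangular window.

```python
import numpy as np

def truncated_lowpass(m, cutoff):
    """Fourier-series coefficients of an ideal lowpass response, kept for -M <= n <= M.

    For an ideal lowpass with cutoff wc, the integral gives h(n) = sin(wc*n) / (pi*n)
    (and h(0) = wc/pi); every coefficient outside the range is set to zero,
    leaving N = 2M + 1 taps.
    """
    n = np.arange(-m, m + 1)
    return n, (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)

def max_deviation(m, cutoff, transition):
    """Largest gap between the designed response and the desired (ideal) one,
    ignoring a transition band of the given width centered on the cutoff."""
    n, h = truncated_lowpass(m, cutoff)
    omegas = np.linspace(0, np.pi, 4000)
    # h is symmetric, so H(e^{jw}) = sum_n h(n) e^{-jwn} is real
    response = np.real(np.exp(-1j * np.outer(omegas, n)) @ h)
    desired = (omegas < cutoff).astype(float)
    outside_transition = np.abs(omegas - cutoff) > transition / 2
    return np.max(np.abs(response - desired)[outside_transition])

# Hypothetical spec: cutoff 0.25*pi rad/sample with a 0.1*pi transition band
for m in (10, 25, 50, 100):
    err = max_deviation(m, 0.25 * np.pi, 0.1 * np.pi)
    print(f"N = {2 * m + 1:3d} taps: max deviation = {err:.4f}")
```

Because the deviation is measured outside the transition band, it shrinks as taps are added, which is the accuracy versus tap-count trade-off described above; the overshoot right at the band edge (Gibbs phenomenon) does not shrink with plain truncation.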

Appendix II- Area Considerations for Variable FIR Design


II-Area Considerations for Variable FIR Design

As mentioned in section 2.3.2, there is a never-ending demand for reducing the amount of hardware used in a system. Doing so brings substantial benefits such as lower cost and power consumption, increased application functionality¹, and better utilization of FPGA resources.

In most FIR implementations, hardware consumption is dominated by the multiplier blocks rather than by the adder modules [23], and different algorithms have been proposed for implementing multiplier blocks efficiently. Earlier algorithms aimed at minimizing adder hardware cost, since from a VLSI point of view the adder cost was assumed to dominate the area requirement. However, once the FPGA became the hardware platform, minimizing adder complexity no longer works, because the "FPGA has a fixed architecture for implementing digital logic". Instead, it is the architecture design that minimizes such cost [23].

Several commonly used approaches and architectures that improve resource utilization are considered below.

Consider first the standard FIR implementation shown in figure 2.4: a fully parallel, fixed-coefficient FIR filter. For each tap, the filter requires one multiplier, one adder, and one delay element, so resource usage is proportional to the number of coefficients [14, 24, 25].

Other enhanced architectures and techniques, of higher complexity, include array multiplication, multipliers using add and shift operations, the transposed FIR architecture, the transposed FIR architecture with a multiplier block, the MAG (Minimized Adder Graph) algorithm for multiplier design, and architectures based on computational sharing multipliers (CSHM). These techniques are explained in more detail below:

¹ By using the extra available area.



Array Multiplication: this is one of the most commonly used techniques when fast multiplication is needed, mainly to implement MAC (multiply-and-accumulate) operations. In array multiplication, rows of adders are placed in parallel, and multiplexers decide whether or not to add each partial product based on the corresponding bit of the multiplicand [24]. A pipelined structure can be obtained by inserting flip-flops or registers between the rows of adders (stages). The drawback of array multiplication is that it needs a large number of logic blocks, even for a small number of multipliers.

Multipliers Using Add and Shift Operations: this technique is also called distributed arithmetic. It differs from the previous technique in the order in which it performs the steps of a MAC operation. Consider the FIR example: a typical FIR operation is the multiplication $a_i b_j$ of the $i$th tap $a_i$ by the $j$th input $b_j$. Breaking $a_i$ into its bits $a_{i,0}, \dots, a_{i,n-1}$, and writing $s(k)$ for a left shift by $k$ bits, $a_i b_j$ can be represented as:

$$a_i b_j = (a_{i,0} b_j) + (a_{i,1} b_j)\, s(1) + \cdots + (a_{i,n-1} b_j)\, s(n-1)$$

In distributed arithmetic, however, $a_i b_j$ is rearranged into a nested form that uses only one-bit shifts:

$$a_i b_j = (a_{i,0} b_j) + \Big( (a_{i,1} b_j) + \big( (a_{i,2} b_j) + \cdots + (a_{i,n-1} b_j)\, s(1) \cdots \big)\, s(1) \Big)\, s(1)$$

In other words, each partial product is added to the running sum before the next one-bit shift, so the multiplication becomes a sequence of shift-and-add steps. This helps reduce FPGA resources [24].
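The nested shift-and-add form above can be sketched in Python as follows. It assumes an unsigned n-bit coefficient `a` (a signed two's-complement coefficient would need an extra correction for its sign bit, omitted here) and only illustrates the arithmetic, not how it maps onto FPGA logic; the function name is hypothetical.

```python
def shift_add_multiply(a, b, n_bits):
    """Multiply b by an unsigned n_bits-wide coefficient a using only shifts and adds.

    Follows the nested form: start from the most significant bit of a, and at
    each step shift the running sum left by one bit, then add b if that bit is 1.
    """
    acc = 0
    for k in reversed(range(n_bits)):  # bits a_{n-1}, ..., a_1, a_0
        acc <<= 1                      # s(1): one-bit left shift
        if (a >> k) & 1:
            acc += b                   # add the partial product a_k * b
    return acc

print(shift_add_multiply(0b1011, 7, 4))  # 77
```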

Transposed FIR filter: another commonly used architecture is the transposed FIR architecture, shown in figure A.1. It is mathematically identical to the standard FIR implementation, but it allows more efficient pipelining than the standard form because of its reduced latency: all taps receive the input sample simultaneously, so taps with identical coefficient magnitudes can share multiplication resources [27].
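To illustrate the claim that the transposed structure is mathematically identical to the standard one, the Python sketch below implements both forms behaviorally and checks that they produce the same outputs. The pipelining and multiplier-sharing benefits are properties of the hardware mapping and are not modeled; the function names are hypothetical.

```python
def direct_form_fir(coeffs, samples):
    """Standard FIR: a tapped delay line on the input, y[n] = sum_k h[k] * x[n-k]."""
    taps = [0] * len(coeffs)
    out = []
    for x in samples:
        taps = [x] + taps[:-1]  # shift the input delay line
        out.append(sum(h * t for h, t in zip(coeffs, taps)))
    return out

def transposed_fir(coeffs, samples):
    """Transposed FIR: every tap multiplies the same current input sample, and the
    products are accumulated through a chain of registers toward the output."""
    regs = [0] * (len(coeffs) - 1)  # partial-sum registers between taps
    out = []
    for x in samples:
        products = [h * x for h in coeffs]  # all taps see x at the same time
        out.append(products[0] + (regs[0] if regs else 0))
        # Move each partial sum one stage closer to the output (uses the old regs)
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
                for k in range(len(regs))]
    return out

print(transposed_fir([1, 2, 3], [1, 0, 0, 0, 1]))  # [1, 2, 3, 0, 1]
```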



Figure A.1: Transposed FIR

Transposed FIR architecture with multiplier block: this is an enhanced version of the transposed FIR filter, shown in figure A.2. It introduces a multiplier block based on cascaded additions, subtractions, or shifts. The complexity of the multiplication process is hidden inside this block and is independent of the other operations, so the multiplier block determines the efficiency of the filter implementation [23].

Figure A.2: Transposed FIR with multiplier block

MAG (Minimized Adder Graph) algorithm for multiplier design: this algorithm was proposed by Dempster and Macleod. It generates minimum adder graphs that minimize the number of adders used to implement integer multiplication, and reducing the number of adders reduces hardware cost. In brief, the algorithm first finds the different graphs that can perform the required multiplication, then chooses among them according to the minimum number of required single-bit full adders [23].

Architecture based on Computational Sharing Multipliers (CSHM): this architecture takes advantage of the computational reuse of different partial vector products in order to improve resource utilization. It aims to reduce redundant computations in the convolution process. The main idea is to decompose the bit sequences that represent the different coefficients into a smaller set of sequences called alphabets. For example, if $c_0 = 00110111$, then $c_0 \cdot x$ can be rewritten as $2^4 \cdot (0011 \cdot x) + 0111 \cdot x$, so the coefficient is composed of the two alphabets 0011 and 0111. The alphabet set should span all the coefficients in use, so that whenever a coefficient contains an alphabet already used by another coefficient, it can reuse the previously computed product. The entire multiplication process is thus reduced to a set of add and shift operations [12, 27]. The approach can also be applied to FIR filters with programmable coefficients.
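A minimal sketch of the alphabet idea in Python, assuming 8-bit unsigned coefficients that are always split into two 4-bit alphabets, as in the $c_0$ example above. This fixed split is a simplifying assumption; the CSHM architecture in [27] is more general. The coefficient values and function names are hypothetical.

```python
def alphabet_products(x, alphabets):
    """Precompute x * alphabet once for each distinct alphabet (the shared part)."""
    return {a: a * x for a in alphabets}

def cshm_multiply(coeff, x, table):
    """Multiply an 8-bit unsigned coefficient by x by splitting it into two 4-bit
    alphabets: coeff * x = 2**4 * (high * x) + (low * x), reusing the table."""
    high, low = coeff >> 4, coeff & 0xF
    return (table[high] << 4) + table[low]

coeffs = [0b00110111, 0b01110011, 0b00110011]  # hypothetical coefficient set
needed = {c >> 4 for c in coeffs} | {c & 0xF for c in coeffs}
x = 9
table = alphabet_products(x, needed)  # only 0011 * x and 0111 * x are computed
print(len(table), [cshm_multiply(c, x, table) for c in coeffs])
```

All three coefficients are built from the same two precomputed products, which is the reuse the architecture exploits.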

Appendix III- LABVIEW 8.0 System Board: PCI 5640


III-LABVIEW 8.0 System Board: PCI 5640

The PCI 5640 device, a LABVIEW 8.0 RIO system board, is mainly based on a reconfigurable FPGA and some fixed I/O resources, such as an IF transceiver. Unlike traditional IF digitizers, whose functionality is completely predetermined, the FPGA lets the user configure the behavior of various modules to meet the system requirements. The FPGA is built around a reconfigurable architecture in which the user can define I/O resources or create new ones. Figure A.3 shows a high-level diagram of this reconfigurable architecture [28].

Figure A.3: High level FPGA I/O architecture

The I/O resources include the ADC and DAC, digital input lines, digital output lines, and others. Software modules access the device through the bus interface, while the FPGA provides the logic needed for connectivity¹ between the bus interface and the I/O resources. Figure A.4 illustrates the FPGA logic for the IF transceiver [28].

Figure A.5 shows a high-level diagram of the PCI 5640 device. Note that the DC power control and the memory modules are hidden to simplify the diagram [28].

¹ Timing, triggering, processing, custom I/O



Figure A.4: FPGA logic for the IF transceiver

Figure A.5: High level Diagram of the PCI 5640

The PCI 5640 device has two analog input (AI) ports. The input signal is passed through a lowpass filter, converted to a differential signal, and then passed to the ADC¹. The signal is then downconverted and passed to the FPGA.

The device also includes two analog output (AO) ports. The pulse shaping filter maps bits to signals that are passed to a compensation filter and then to an interpolation filter. The device then performs upconversion in the digital domain and finally passes the signal to the DAC².

¹ AD6654 component from Analog Devices

² AD9857 component from Analog Devices



The RTSI (Real-Time System Integration) bus allows multiple RIO PCI-5640 devices to share the same trigger and event synchronization signals.

The PCI bus provides the PCI 5640 device with a bus interface that has bus-mastering capabilities, allowing data to be transferred efficiently between the host PC and the 5640 device.

The PCI 5640 device also includes 2 MB of onboard SRAM in addition to the RAM available in the Virtex-II Pro (XC2VP30) FPGA. Appendix IV gives more details about the capabilities of the FPGA.

The advantage of this board for our project is that it is programmed with relatively easy-to-use software, LABVIEW, which hides the complexity of the HDL languages, VHDL and Verilog, commonly used to design hardware components. It also supports designs created in an HDL, so modules written in VHDL or another HDL can be imported into LABVIEW as custom VIs.

Appendix IV- Virtex-II Pro FPGA Capabilities


IV- Virtex II Pro FPGA Capabilities

The LABVIEW 8.0 PCI 5640 system board contains a Virtex II Pro device that is connected to all resources on the board (ADC, DAC, clock distribution circuit (CDC), external trigger, and so on). The Virtex II Pro is a platform FPGA based on IP cores and customized modules [29]. The device on our system board is the XC2VP30 FPGA, which incorporates many resources and features, including the following:

- RocketIO transceiver blocks: full-duplex serial transceivers with baud rates from 600 Mb/s to 3.125 Gb/s. They are flexible serial-to-parallel and parallel-to-serial embedded transceiver cores used to interconnect buses, backplanes, or other subsystems with high bandwidth [29]. Our device supports up to 8 RocketIO transceiver blocks.

- PowerPC processor blocks: embedded processor blocks with a Harvard architecture running at 300 MHz or more, able to execute instructions at a sustained rate of one instruction per cycle. Our device supports up to 2 PowerPC processors [29].

- 30816 logic cells, where a logic cell is defined as: logic cell = one 4-input LUT + one FF + carry logic.

- 18x18 multiplier blocks: 18-bit x 18-bit two's complement signed multipliers with a very efficient structure. The device holds up to 136 multiplier blocks.

- SelectRAM+ blocks: each block contains 18 Kb of true dual-port RAM, and blocks can be cascaded to implement larger memories. Our device supports 136 blocks of 18 Kb, for a maximum block RAM of 2448 Kb.

- Maximum of 644 user I/O pads.

- DCM (digital clock manager): provides self-calibrating, fully digital solutions for clock distribution delay compensation, clock multiplication and division, and fine and coarse clock phase shifting. Our device supports up to 8 DCMs.

Appendix V- Fixed Point Notation


V- Fixed Point Notation

In computer arithmetic, a number can be represented by a pair of integers $(n, e)$, the mantissa and the exponent, where the pair represents the value $n \cdot 2^{-e}$. If $e$ is a variable quantity, the pair $(n, e)$ represents a floating-point number. If, on the other hand, $e$ is known in advance (at compile time), the pair is said to be a fixed-point number.

The operations needed in fixed-point notation are the following:

• Conversion: a number is converted to fixed-point notation by dividing it by $2^{-e}$ (that is, multiplying it by $2^{e}$) and rounding the result, where $e$ is a fixed parameter. In our design, the mantissa is stored in registers as a 16-bit number, without the known exponent.

• Addition / Subtraction: the mantissas are added or subtracted, and the exponent is unchanged:

$$n\,2^{-e} \pm m\,2^{-e} = (n \pm m)\,2^{-e}$$

• Multiplication / Division: for multiplication, the two mantissas are multiplied and the result is shifted right $e$ times to return to the same exponent:

$$n\,2^{-e} \times m\,2^{-e} = mn\,2^{-2e} = \left(mn\,2^{-e}\right) 2^{-e} \quad\Rightarrow\quad \text{new mantissa} = mn \gg e$$

The argument above holds for any fixed exponent value; $e = 10$, for example, is a valid choice.

Note that the more bits are used to represent the mantissa and the exponent, the better the resolution of the output. The designer, however, wants to represent the coefficients with the fewest bits that still give good output accuracy, in order to make the best use of the available resources. Unfortunately, using fewer bits increases the quantization error: with rounding, the maximum error is half of the step $2^{-e}$, so each fractional bit removed doubles it.
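The fixed-point operations above can be sketched in Python with plain integers as mantissas. This is an illustrative model only: Python integers never overflow, so the 16-bit register width, overflow, and saturation are not modeled, and the function names are hypothetical. The exponent e = 10 is the example value given in the text.

```python
def to_fixed(value, e):
    """Convert a real value to a mantissa n with value ~= n * 2**-e (round to nearest)."""
    return round(value * (1 << e))

def from_fixed(n, e):
    """Recover the real value represented by mantissa n and fixed exponent e."""
    return n / (1 << e)

def fixed_add(n, m):
    """Both operands share the exponent e, so only the mantissas are added."""
    return n + m

def fixed_mul(n, m, e):
    """(n * 2**-e) * (m * 2**-e) = n*m * 2**-2e; shifting right by e bits returns
    the result to exponent e. Python's >> floors, so the dropped bits are truncated."""
    return (n * m) >> e

def max_rounding_error(e):
    """Worst conversion error over a grid of values in [-1, 1] for exponent e."""
    values = [k / 997 for k in range(-997, 998)]
    return max(abs(from_fixed(to_fixed(v, e), e) - v) for v in values)

E = 10  # the example exponent from the text (10 fractional bits)
a, b = to_fixed(0.75, E), to_fixed(-0.3, E)
print(from_fixed(fixed_add(a, b), E))     # 0.4501953125
print(from_fixed(fixed_mul(a, b, E), E))  # -0.2255859375
for e in (4, 8, 12):
    print(e, max_rounding_error(e), 2 ** -(e + 1))
```

The last loop shows the maximum rounding error staying within half a step, $2^{-(e+1)}$, and shrinking as fractional bits are added.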

Bibliography

1. Buracchini, E. "The software radio concept". IEEE Communications Magazine. Vol. 38, Issue 9, pages 138-143. September 2000.

2. H. Córdova, P. Boets, L. Van Biesen. "Insight Analysis into WI-MAX Standard and its Trends". Vrije Universiteit Brussel, Belgium.

3. PhoneScoop. "WIMAX 802.16". Retrieved in 2005 from http://www.phonescoop.com/glossary/term.php?gid=187

4. Amine Sobh, Mohammad Boulmalf, Shakil Akhtar. "Physical Layer Performance of 802.11g WLAN". Applied Telecommunication Symposium, UAE University.

5. Hassan Yaghobi. "WIMAX2: 802.16, Broadband Wireless Access: the next big thing in Wireless". Intel. September 16, 2003.

6. "Interpolation". Retrieved in 2005 from http://www.dspguru.com/info/faqs/multrate/interp.htm

7. Phil Schniter. "Upsampling". Connexions. Retrieved in October 2005 from http://cnx.rice.edu/content/m10403/latest/

8. Wipro Technologies. "Software-Defined Radio: A Technology Overview". White paper. August 2002. Available at http://www.wipro.com/dsp

9. G. Girau, M. Martina, A. Molino, A. Terreno, F. Vacca. "FPGA Digital Down Converter IP for SDR Terminals". IEEE Signals, Systems and Computers. Vol. 2, pages 1010-1014. November 2002.

10. Enrico Buracchini. "The Software Radio Concept". IEEE Communications Magazine. Vol. 38, pages 138-143. September 2000.

11. Apostolos A. Kountouris, Christophe Moy, Luc Rambaud, Pascal Le Corre. "A Reconfigurable Radio Case Study: A Software Based Multi-Standard Transceiver for UMTS, GSM, EDGE, and Bluetooth". IEEE Vehicular Technology Conference. Vol. 2, pages 1196-1200. October 2001.

12. Jerry C.-Y. Kao, C.-F. Su, Allen C.-H. Wu. "High Performance FIR Generation Based on a Timing Driven Architecture and Component Selection Method". IEEE Circuits and Systems. Vol. 4, pages 759-762. May 2005.

13. Jun Seo Lee, Jong Hyun Park, Sang Woo Kim, Ying Shan Lee, Heung Gyoon Ryu. "Implementation of DSP-Based Digital Receiver for the SDR Application". IEEE Communications Society. Vol. 1, pages 6-10. August-September 2004.

14. Litwin, Louis. "FIR and IIR digital filters: the effects of finite bit precision". IEEE Potentials. Vol. 19, Issue 4, pages 28-31. October-November 2000.

15. Mitra, Sanjit Kumar. "Digital Signal Processing: A Computer-Based Approach". 2nd edition. McGraw-Hill Series in Electrical and Computer Engineering, Boston. 2001.

16. Thede, Les. "Practical Analog and Digital Filter Design". Artech House Inc. 2004.

17. Hoffman, M. W., Stewart, R. W. "Digital Signal Processing From A to Z". Blue Box Multimedia Inc. 1998.

18. Carson K. S. Pun, S. C. Chan, K. S. Yeung, K. L. Ho. "On the Design and Implementation of FIR and IIR Filters With Variable Frequency Characteristics". IEEE Trans. on Circuits and Systems: Analog and Digital Signal Processing. Vol. 2, pages 185-188. May 2002.

19. Khalid H. Abed, Vivek Venugopal, Shailesh B. Nerurkar. "High Speed Digital Filter Design Using Minimal Signed Digit Representation". IEEE South East Conference. Pages 105-110. April 2005.

20. Tian Bo Deng. "Variable 2-D FIR Digital Filter Design and Parallel Implementation". IEEE Trans. on Circuits and Systems: Analog and Digital Signal Processing. Vol. 46, pages 631-635. May 1999.

21. Rachid Zarour, Moustafa M. Fahmy. "A Design Technique for Variable Digital Filters". IEEE Trans. on Circuits and Systems. Vol. 36, pages 1473-1478. November 1989.

22. Cain, G. D., Hermanowicz, E., Rojewski, M., Tarczynski, A. "WLS design of variable frequency response FIR filters". Proceedings of the 1997 IEEE International Symposium on Circuits and Systems. Vol. 4, pages 2244-2247. June 1997.

23. Macpherson, K. N., Stewart, R. W. "Low FPGA area multiplier blocks for full parallel FIR filters". Proceedings of the IEEE International Conference on Field-Programmable Technology. Pages 247-254. 2004.

24. Mark Cummings, Shinichiro Haruyama. "FPGA in the Software Radio". IEEE Communications Magazine. Vol. 37, pages 108-112. February 1999.

25. David A. Parker, Keshab K. Parhi. "Area-Efficient Parallel FIR Digital Implementation". IEEE Application Specific Systems, Architectures and Processors (ASAP). Pages 93-111. August 1996.

26. Xiaopeng Li. "Architectures and specs help analysis of multi-standard receivers". Available at http://www.ece.osu.edu/vlsi/architecture_multi_standard_receivers.htm. March 12, 2003.

27. Jongsun Park, Woopyo Jeong, Hunsoo Choo, Hamid Mahmoodi-Meimand, Kaushik Roy. "High Performance and Low Power FIR Filter Design Based on Sharing Multiplication". ACM Low Power Electronics and Design, USA. August 2002.

28. National Instruments. "NI 5640R User Manual". Retrieved in 2005 from www.ni.com

29. Xilinx. "Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet". Retrieved on October 10, 2005 from www.xilinx.com
