
Prasann D. Kulkarni, et al International Journal of Computer and Electronics Research [Volume 2, Issue 2, April 2013]

http://ijcer.org ISSN: 2278-5795 Page 94

LOW POWER ADD AND SHIFT MULTIPLIER DESIGN

BZFAD ARCHITECTURE

Prof. Prasann D. Kulkarni1, Prof. S. P. Deshpande2, Dr. G. R. Udupi3

1Lecturer, Dept of E&CE, KLS's VDRIT, Haliyal, India

2Asst. Prof, Dept of E&CE, KLS's G.I.T, Belgaum, India

3Principal, KLS's VDRIT, Haliyal, India

[email protected],

[email protected],

[email protected]

Abstract - A multiplier is one of the key hardware blocks in most digital and high-performance systems such as FIR filters, digital signal processors and microprocessors. With advances in technology, many researchers have tried, and are still trying, to design multipliers that offer high speed, low power consumption, regular layout (and hence less area), or a combination of these, making them suitable for various high-speed, low-power and compact VLSI implementations. However, area and speed are two conflicting constraints, so improving speed mostly results in a larger area. Here we look for the best trade-off between them. Multiplication generally proceeds in two basic steps: partial product generation followed by addition. We therefore first consider the design of the Wallace tree multiplier, followed by the Booth-Wallace multiplier, and compare their speed and power consumption.

Motivation - As the scale of integration keeps growing, more and more sophisticated signal processing systems are being implemented on a VLSI chip. These signal processing applications not only demand great computation capacity but also consume a considerable amount of energy. While performance and area remain the two major design goals, power consumption has become a critical concern in today's VLSI system design. The need for low-power VLSI systems arises from two main forces. First, with the steady growth of operating frequency and processing capacity per chip, large currents have to be delivered and the heat due to large power consumption must be removed by proper cooling techniques. Second, battery life in portable electronic devices is limited. Low-power design directly leads to prolonged operation time in these portable devices.

Multiplication is a fundamental operation in most signal processing algorithms. Multipliers have large area, long latency and consume considerable power. Therefore low-power multiplier design has been an important part of low-power VLSI system design. A system's performance is generally determined by the performance of the multiplier, because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area-consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints, so that improving speed mostly results in larger areas.

We study different adders and compare them to judge which adder is best suited to a given situation.

Ripple Carry Adder has a smaller area but lower speed.

Carry Select Adders are high-speed but possess a larger area.

Carry Look Ahead Adder lies in between, with a proper trade-off between time and area complexities.

Coming to multipliers, we consider different multipliers, starting from the Array Multiplier and moving to the Wallace Tree and Booth Multipliers, both Radix-2 and Radix-4.

The Array Multiplier is the worst case, consuming the highest amount of power. Next comes the Radix-2 Booth multiplier, which consumes less power than the array multiplier. The Wallace Tree multiplier and the Radix-4 Booth multiplier have nearly the same delay, while the Radix-4 Booth multiplier consumes less power. Hence we conclude that the Radix-4 Booth multiplier is best for low-power applications. However, the benefit achieved comes at the expense of increased hardware complexity: this implementation requires hardware for the encoding and for the selection of the partial products. Among other multipliers, shift-and-add multipliers have been used in many applications for their simplicity and relatively small area requirement. The BZFAD architecture gives an optimization in both power and area.

Table 1: Comparison of adders

Adder | Delay for n bits | Area for n bits | Area-delay product
Ripple carry adder | 2n | 7n | 14n^2
Carry select adder | 2.8 n^(1/2) | 14n | 39.6 n^(3/2)
Carry look ahead adder | 4 log2 n | 4n | 16 n log2 n

Table 2: Comparison of multipliers

Multiplier | Power consumption | Speed
Array multiplier | High | Limited
Radix-2 Booth multiplier | Less than array | Moderate
Radix-4 Booth | Less than other | Highest
Wallace tree multiplier | Less than radix-2 | High

1. INTRODUCTION

Power dissipation of VLSI chips is traditionally

a neglected subject. In the past, the device density

and frequency were low enough that it was not a

constraining factor in chips. As the scale of

integration improves, more transistors, faster and

smaller than their predecessors, are being packed

into a chip. This leads to the steady growth of the

operating frequency and processing capacity per

chip, resulting in increased power dissipation.

The power consumption in a digital CMOS circuit can be described by

$P_{avg} = P_{dynamic} + P_{shortcircuit} + P_{leakage} + P_{static}$ (1)

The dynamic power dissipation is caused by

charging and discharging of capacitances in the

circuit. The short circuit power consumption is

caused by the current flow through the direct path

existing between the power supply and the ground

during the transition phase. The n-MOS and p-

MOS transistors used in a CMOS logic circuit

commonly have non zero reverse leakage and sub

threshold current. The computation of a multiplier manipulates two input operands to generate many partial products for subsequent addition operations, which in a CMOS circuit design require many switching activities. The switching activities within the functional unit of a multiplier account for the majority of its power dissipation, as given in the following equation

$P_{switching} = \alpha \, C \, V_{dd}^{2} \, f_{clk}$ (2)

where $\alpha$ is the switching activity parameter, $C$ is the loading capacitance, $V_{dd}$ is the operating voltage and $f_{clk}$ is the operating frequency.
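As a quick numerical illustration of equation (2), the sketch below evaluates the switching power for one operating point. The function name and all numeric values (activity factor, load capacitance, supply voltage, clock frequency) are hypothetical and chosen only to show how the terms combine; they are not measurements from this work.

```python
def switching_power(alpha, c_load, v_dd, f_clk):
    """Dynamic switching power in watts, following equation (2)."""
    return alpha * c_load * v_dd ** 2 * f_clk


# Hypothetical operating point: 10 pF switched load, 2.5 V supply, 100 MHz clock.
baseline = switching_power(alpha=0.5, c_load=10e-12, v_dd=2.5, f_clk=100e6)
# Halving the switching activity halves the switching power, all else equal.
reduced = switching_power(alpha=0.25, c_load=10e-12, v_dd=2.5, f_clk=100e6)

print(f"baseline: {baseline * 1e3:.3f} mW, halved activity: {reduced * 1e3:.3f} mW")
```

Because the power is linear in the activity factor, any architectural change that removes transitions reduces this term in direct proportion.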

Shift-and-add multiplication is similar to the

multiplication performed by paper and pencil.

This method adds the multiplicand X to itself

Y times, where Y denotes the multiplier. To

multiply two numbers by paper and pencil, the

algorithm is to take the digits of the multiplier one

at a time from right to left, multiplying the

multiplicand by a single digit of the multiplier and

placing the intermediate product in the appropriate

positions to the left of the earlier results. To produce the final product, the conventional architecture for shift-and-add multipliers requires many switching activities, so its dynamic power dissipation is high. By eliminating or reducing the sources of switching activity in the conventional multiplier, a low-power multiplier architecture can be derived. Because the multiplier is a functional component of many digital systems, its power dissipation should be reduced as much as possible.


BZFAD

A low-power structure called BZ-FAD (Bypass

Zero, Feed A Directly) for shift-and-add

multipliers is proposed. The architecture

considerably lowers the switching activity of

conventional multipliers. The modifications to the multiplier, which multiplies A by B, include removing the shift of the B register, feeding A directly to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removing the partial product shift. The architecture makes use of a low-power ring counter proposed in this work.

Simulation results for 32-bit radix-2 multipliers

show that the BZ-FAD architecture lowers the

total switching activity up to 76% and power

consumption up to 30% when compared to the

conventional architecture. The proposed multiplier

can be used for low-power applications where the

speed is not a primary design parameter.

The rest of the paper is organized as follows. Sections 2 and 3 review the types of adders and multipliers. Section 4 gives background information about the conventional shift-and-add multiplier. Section 5 describes the architecture of the proposed low-power multiplier. Sections 6 and 7 describe the code and the hardware implementation. Results are discussed in Section 8, and the conclusion is in the last section.

2. TYPES OF ADDERS

Addition is the most common and most often used arithmetic operation in microprocessors, digital signal processors and, especially, digital computers. It also serves as a building block for the synthesis of all other arithmetic operations. Therefore, for an efficient implementation of an arithmetic unit, the binary adder structure is a very critical hardware unit. Although much research has been done on binary adder structures, studies based on their comparative performance analysis are few.

With respect to asymptotic delay time and area complexity, binary adder architectures can be categorized into primary classes, of which the three considered here are given below.

2.1 Ripple Carry Adder(RCA)

The well-known ripple carry adder is composed of cascaded full adders, as shown in Figure 1. The carry-out of one stage is fed directly to the carry-in of the next stage. An n-bit parallel adder requires n full adders.

Figure 1: A 4-bit Ripple Carry Adder

Logic equations

$g_i = a_i b_i$, $p_i = a_i \oplus b_i$

$c_{i+1} = g_i + p_i c_i$, $s_i = p_i \oplus c_i$
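The logic equations above can be sketched as a bit-level software model. This is only an illustrative model, not the hardware description used in this work; the function names `full_adder` and `ripple_carry_add` are hypothetical.

```python
def full_adder(a, b, c):
    """One full-adder stage built from the generate/propagate equations."""
    g = a & b          # generate:  g_i = a_i b_i
    p = a ^ b          # propagate: p_i = a_i xor b_i
    s = p ^ c          # sum:       s_i = p_i xor c_i
    c_next = g | (p & c)   # carry: c_{i+1} = g_i + p_i c_i
    return s, c_next


def ripple_carry_add(x, y, n, carry_in=0):
    """Add two n-bit unsigned integers by chaining n full adders."""
    carry = carry_in
    total = 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)   # carry ripples to the next stage
        total |= s << i
    return total, carry


print(ripple_carry_add(0b1011, 0b0110, 4))
```

The loop makes the linear delay visible: stage i cannot produce its carry until stage i-1 has produced its own.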

Complexity and Delay for n-bit RCA structure

$A_{RCA} = O(n) = 7n$

$T_{RCA} = O(n) = 2n$

Not very efficient when numbers with a large number of bits are used.

Delay increases linearly with bit length.

2.2 Carry Select Adder(CSLA)

In the carry select adder scheme, blocks of bits are added in two ways: one assuming a carry-in of 0 and the other a carry-in of 1. This results in two precomputed sum and carry-out signal pairs ($s^0_{i-1:k}, c^0_i$; $s^1_{i-1:k}, c^1_i$). Later, as the block's true carry-in ($c_k$) becomes known, the correct signal pair is selected. Generally multiplexers are used to propagate carries.

Figure 2: A Carry Select Adder with 1 level using

n/2- bit RCA

Logic equations

$s_{i-1:k} = \bar{c}_k \, s^0_{i-1:k} + c_k \, s^1_{i-1:k}$

$c_i = \bar{c}_k \, c^0_i + c_k \, c^1_i$
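A minimal software sketch of a one-level carry select adder follows, assuming the n-bit word is split into a lower and an upper half, each added by a ripple carry block. The helper names `rca` and `carry_select_add` are hypothetical, and the `if` expression stands in for the multiplexers of the selection equations above.

```python
def rca(x, y, n, c):
    """Ripple carry block: returns (n-bit sum, carry-out)."""
    s = 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        s |= (a ^ b ^ c) << i
        c = (a & b) | ((a ^ b) & c)
    return s, c


def carry_select_add(x, y, n, carry_in=0):
    """One-level carry select adder built from two half-width blocks."""
    half = n // 2
    upper = n - half
    low_mask = (1 << half) - 1

    # The lower block is an ordinary ripple carry adder; its carry-out is c_k.
    low_sum, c_k = rca(x & low_mask, y & low_mask, half, carry_in)

    # The upper block is precomputed for both possible carry-in values.
    s0, c0 = rca(x >> half, y >> half, upper, 0)
    s1, c1 = rca(x >> half, y >> half, upper, 1)

    # Multiplexer: pick the pair that matches the true carry-in c_k.
    high_sum, carry_out = (s1, c1) if c_k else (s0, c0)
    return low_sum | (high_sum << half), carry_out


print(carry_select_add(0b11001010, 0b01110111, 8))
```

The duplicated upper block is where the extra area goes; the gain is that both candidate results are ready by the time the lower carry arrives.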

Complexity and Delay for n-bit CSLA structure

$A_{CSLA} = O(n) = 14n$

$T_{CSLA} = O(n^{1/(l+1)}) = 2.8 n^{1/2}$

Because of the multiplexers, a larger area is required.

It has a lesser delay than the ripple carry adder (half the delay of RCA).

Hence the carry select adder is preferred when working with a smaller number of bits.

2.3 Carry Look Ahead Adder(CLA)

Carry Look Ahead Adder can produce carries

faster due to carry bits generated in parallel by an

additional circuitry whenever inputs change. This

technique uses carry bypass logic to speed up the

carry propagation.

Figure 3: 4-bit CLA

Logic equations

Let $a_i$ and $b_i$ be the augend and addend inputs, $c_i$ the carry input, and $s_i$ and $c_{i+1}$ the sum and carry-out of the ith bit position. The auxiliary functions $p_i$ and $g_i$, called the propagate and generate signals, together with the sum and carry outputs, are defined as follows.

$p_i = a_i + b_i$, $g_i = a_i b_i$

$s_i = a_i \oplus b_i \oplus c_i$, $c_{i+1} = g_i + p_i c_i$
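To illustrate how the carries can be formed in parallel, the sketch below expands the recurrence $c_{i+1} = g_i + p_i c_i$ so that every carry depends only on the $g$ and $p$ signals and the initial carry. It is a software model under that assumption, not a gate-level description; the function name `cla_add` is hypothetical.

```python
def cla_add(x, y, n=4, c0=0):
    """Carry look ahead addition with fully expanded carry expressions."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    p = [ai | bi for ai, bi in zip(a, b)]   # propagate: p_i = a_i + b_i
    g = [ai & bi for ai, bi in zip(a, b)]   # generate:  g_i = a_i b_i

    carries = [c0]
    for i in range(n):
        # Term for the initial carry: c0 p_0 p_1 ... p_i
        c_next = c0
        for k in range(i + 1):
            c_next &= p[k]
        # Terms g_j p_{j+1} ... p_i for every j <= i
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c_next |= term
        carries.append(c_next)

    total = 0
    for i in range(n):
        total |= (a[i] ^ b[i] ^ carries[i]) << i   # s_i = a_i xor b_i xor c_i
    return total, carries[n]


print(cla_add(0b1011, 0b0110))
```

No carry in this model is computed from another carry, which is the property that removes the ripple delay; the price is the growing number of product terms per carry.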

As the number of bits in the carry look ahead adder increases, the complexity increases because the number of gates in the expression for $c_{i+1}$ increases. So in practice it is not desirable to use the traditional CLA shown above, because it increases both the space required and the power.

Instead we use smaller carry look ahead adders in levels to create a larger CLA. Commonly the smaller CLA is taken as a 4-bit CLA, so we can define carry look ahead over a group of 4 bits. We therefore redefine carry generate as the group generated carry g[i,i+3] and carry propagate as the group propagated carry p[i,i+3], which are defined below.

Redefined Equations

$g_{[i,i+3]} = g_{i+3} + g_{i+2} p_{i+3} + g_{i+1} p_{i+2} p_{i+3} + g_i p_{i+1} p_{i+2} p_{i+3}$

$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
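The group signals can be checked with a short model. The sketch below computes the group generate and propagate of each 4-bit group and combines two groups to obtain the carry-out of an 8-bit addition, assuming the group carry relation $c_{i+4} = g_{[i,i+3]} + p_{[i,i+3]} c_i$. The function names are hypothetical.

```python
def group_generate_propagate(x, y, base):
    """Group generate and propagate for bits base .. base+3."""
    a = [(x >> (base + k)) & 1 for k in range(4)]
    b = [(y >> (base + k)) & 1 for k in range(4)]
    p = [ai | bi for ai, bi in zip(a, b)]
    g = [ai & bi for ai, bi in zip(a, b)]
    group_g = (g[3]
               | (g[2] & p[3])
               | (g[1] & p[2] & p[3])
               | (g[0] & p[1] & p[2] & p[3]))
    group_p = p[0] & p[1] & p[2] & p[3]
    return group_g, group_p


def carry_out_8bit(x, y, c0=0):
    """Carry-out of an 8-bit addition using two 4-bit groups."""
    carry = c0
    for base in (0, 4):
        group_g, group_p = group_generate_propagate(x, y, base)
        carry = group_g | (group_p & carry)   # carry into the next group
    return carry


print(carry_out_8bit(0xF0, 0x10))
```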

The modified block diagram for the 8-bit carry look ahead adder, built in levels from 4-bit CLAs, is shown below.

Figure 4: 8-BIT Carry Look Ahead Generator

(using 2-bit CLA)

Complexity and Delay for n-bit CLA structure

ACLA = O (n) = 14n

TCLA = O (log n) = 4 log2n.

3. TYPES OF MULTIPLIERS

3.1. Wallace Tree Multiplier

The Wallace tree multiplier is considerably faster than a simple array multiplier because its height is logarithmic in word size, not linear. However, in addition to the large number of adders required, the Wallace tree's wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers when design complexity is a concern.

Wallace tree styles use a log-depth tree network for reduction. Faster but irregular, they trade ease of layout for speed. Wallace tree styles are generally avoided for low-power applications, since the excess wiring is likely to consume extra power.

While substantially faster than the carry-save structure for large multipliers, the Wallace tree multiplier has the disadvantage of being very irregular, which complicates the task of coming up with an efficient layout.

Figure 5: Wallace Tree Block Diagram


A three-step process is used to multiply two numbers:

1. Formation of bit products.

2. Reduction of the bit product matrix into a two-row matrix by means of carry save adders.

3. Summation of the remaining two rows using a fast Carry Look Ahead Adder (CLA).
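The three steps above can be sketched in software as a column-compression model for unsigned operands. This is a simplified illustration: it uses only full adders (3:2 compressors) in each reduction pass and passes leftover bits through unchanged, and Python's built-in addition stands in for the final carry look ahead adder. The function name `wallace_multiply` is hypothetical.

```python
def wallace_multiply(x, y, n):
    """Multiply two n-bit unsigned integers by carry-save column reduction."""
    # Step 1: form the bit products and place each in the column of its weight.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Step 2: reduce until every column holds at most two bits.
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                a, b, c = col[k:k + 3]
                nxt[w].append(a ^ b ^ c)                         # sum bit
                nxt[w + 1].append((a & b) | (a & c) | (b & c))   # carry bit
                k += 3
            nxt[w].extend(col[k:])   # fewer than three bits left: pass through
        cols = nxt

    # Step 3: add the two remaining rows (stand-in for the final fast adder).
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) >= 1)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) >= 2)
    return row0 + row1


print(wallace_multiply(13, 11, 4))
```

Each full adder replaces three bits by two without changing the total weighted value, so the loop always terminates with a two-row matrix whose sum is the product.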

3.2 Booth's Multiplier

Though Wallace tree multipliers are faster than the traditional carry save method, they are also very irregular and hence complicate the layout. When the multiplier grows beyond 32 bits, large numbers of logic gates are required, and hence more interconnecting wires, which makes the chip design large and slows down the operating speed.

The Booth multiplier can be used in different modes such as radix-2, radix-4, radix-8, etc. We decided to use the Radix-4 Booth's Algorithm because the number of partial products is reduced to n/2.

3.2.1 Booth Multiplication Algorithm (Radix 4)

One way of realizing high-speed multipliers is to enhance parallelism, which helps in decreasing the number of subsequent calculation stages. The original version of Booth's multiplier (Radix 2) had two drawbacks.

1. The number of add/subtract operations is variable, which is inconvenient when designing parallel multipliers.

2. The algorithm becomes inefficient when there are isolated 1s.

These problems are overcome by using the Radix-4 Booth's Algorithm, which scans strings of three bits. The Booth multiplier designed in this project consists of four Modified Booth Encoders (MBE), four sign extension correctors, four partial product generators (each comprising a 5:1 multiplexer) and finally a Wallace tree adder. The Booth technique increases speed by reducing the number of partial products by half. Since an 8-bit Booth multiplier is used in this project, only four partial products need to be added instead of the eight partial products generated by a conventional multiplier. The architecture of the modified Booth's Algorithm used in this project is shown below.

Figure 6: Architecture of designed Booth

Multiplier.
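The radix-4 recoding can be illustrated with a small behavioural model. The sketch below assumes the standard radix-4 Booth digit rule, in which each overlapping group of three multiplier bits $(y_{2k+1}, y_{2k}, y_{2k-1})$ with $y_{-1} = 0$ yields a digit $y_{2k-1} + y_{2k} - 2y_{2k+1}$ in the set {-2, -1, 0, 1, 2}. It models only the arithmetic, not the encoder, sign extension or multiplexer hardware of the design; the function names are hypothetical.

```python
def booth_radix4_digits(y_pattern, n):
    """Recode an n-bit two's complement pattern (n even) into n/2 digits."""
    bits = [0] + [(y_pattern >> i) & 1 for i in range(n)]   # bits[0] is y_{-1}
    digits = []
    for k in range(n // 2):
        y_prev, y_mid, y_next = bits[2 * k], bits[2 * k + 1], bits[2 * k + 2]
        digits.append(y_prev + y_mid - 2 * y_next)
    return digits


def booth_radix4_multiply(x, y, n):
    """Multiply signed integers x and y, where y fits in n bits (n even)."""
    digits = booth_radix4_digits(y & ((1 << n) - 1), n)
    # One partial product per digit: digit * x, weighted by 4**k.
    partial_products = [d * x * (4 ** k) for k, d in enumerate(digits)]
    return sum(partial_products)


print(booth_radix4_digits(0b01101110, 8), booth_radix4_multiply(25, 110, 8))
```

For an 8-bit multiplier the recoding always returns four digits, which corresponds to the four partial products mentioned in the text.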

4. CONVENTIONAL SHIFT & ADD

MULTIPLIER

Figure 7 shows the architecture of a conventional shift-and-add multiplier. The dashed ovals show the major sources of switching activity. The multiplier is shifted in each cycle, and the bit shifted out of register B is connected to the select pin of the multiplexer mux_A. As the select signal changes, the output of mux_A also changes, which triggers activity in the adder. The partial product must be shifted in every cycle. The counter checks whether the required number of operations has been performed. The major sources of switching activity are summarized below.

Shifting of the B register

Activity in the counter

Activity in the adder

Switching between 0 and A in the multiplexer

Activity in the multiplexer select

Shifting of the partial product register

By eliminating or reducing the switching activities described above, a low-power architecture can be derived.

Figure 7: Architecture of conventional shift

and add multiplier with major

source of switching activity.
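The behaviour of the conventional architecture described in this section can be sketched as a cycle-by-cycle software model for unsigned operands. The register and signal names follow the text (B register, mux_A, partial product, counter), but the model itself is only an illustrative assumption about the datapath, not the VHDL used in this work.

```python
def shift_add_multiply(a, b, n):
    """Conventional shift-and-add multiplication of two n-bit unsigned values."""
    pp = 0          # partial product register
    b_reg = b       # multiplier register B, shifted right every cycle
    counter = 0     # binary counter checking the number of operations

    while counter < n:
        mux_a = a if (b_reg & 1) else 0   # B(0) selects between A and 0
        pp += mux_a << n                  # adder works on the upper half
        pp >>= 1                          # partial product shift
        b_reg >>= 1                       # B register shift
        counter += 1
    return pp


print(shift_add_multiply(13, 11, 4))
```

Every statement inside the loop runs in every cycle, which mirrors the list of switching sources above: the B shift, the multiplexer, the adder, the partial product shift and the counter are all active regardless of the multiplier bit.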


4.1 State Diagram

Figure 8: Conventional add shift multiplier

state diagram

5. THE PROPOSED LOW POWER

MULTIPLIER: BZ-FAD

5.1 Architecture

To derive a low-power architecture, we concentrate our effort on eliminating or reducing the sources of the switching activity discussed in the previous section. The proposed architecture, which is shown in Figure 11, is called BZ-FAD.

5.1.1 Shift of the B Register

An example of shifting of the register is shown here.

Figure 9: Shift and add multiplication example

In the traditional architecture (see Figure 9), to

generate the partial product, B(0) is used to decide

between A and 0. If the bit is 1, A should be

added to the previous partial product, whereas if it

is 0, no addition operation is needed to generate

the partial product. Hence, in each cycle, register

B should be shifted to the right so that its right bit

appears at B(0); this operation gives rise to some

switching activity.

Figure 10: Multiplier with ring counter

For a 3-bit multiplier, a 3-bit ring counter is used. Table 3 gives the required bit and counter output combination.

Table 3: Counter output with required bit.

To avoid this, in the proposed architecture (Figure 11) a multiplexer (M1) with a one-hot encoded bus selector chooses the hot bit of B in each cycle. A ring counter is used to select B(n) in the nth cycle.

As will be seen later, the same counter can be

used for block M2 as well. The ring counter used

in the proposed multiplier is noticeably wider (32

bits vs. 5 bits for a 32-bit multiplier) than the

binary counter used in the conventional

architecture; therefore an ordinary ring counter, if

used in BZ-FAD, would raise more transitions

than its binary counterpart in the conventional

architecture. To minimize the switching activity of

the counter, we utilize the low-power ring counter,

which is described in the next section.
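The selection of the hot bit can be sketched as follows. The model assumes an ordinary one-hot ring counter whose single 1 moves one position per cycle and a multiplexer that ANDs the counter state with B; it does not model the low-power ring counter itself. The function names are hypothetical.

```python
def ring_counter_states(n):
    """Yield the n one-hot states of an n-bit ring counter, starting at bit 0."""
    mask = (1 << n) - 1
    state = 1
    for _ in range(n):
        yield state
        state = ((state << 1) | (state >> (n - 1))) & mask   # rotate left by one


def hot_bit(b, one_hot):
    """One-hot multiplexer M1: return the bit of B selected by the counter."""
    return 1 if (b & one_hot) else 0


# 3-bit example: counter output and the selected bit of B in each cycle.
b = 0b101
for cycle, state in enumerate(ring_counter_states(3)):
    print(cycle, format(state, "03b"), hot_bit(b, state))
```

Because the counter itself points at the bit to be examined, register B never has to shift.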

5.1.2 Reducing Switching Activity of the Adder

In the conventional multiplier architecture

(Figure 7), in each cycle, the current partial

product is added to A (when B(0) is one) or to 0

(when B(0) is zero). This leads to unnecessary

transitions in the adder when B(0) is zero. In these

cases, the adder can be bypassed and the partial

product should be shifted to the right by one bit.

This is what is performed in the proposed

architecture which eliminates unnecessary

switching activities in the adder. As shown in

Figure 11, the Feeder and Bypass registers are

used to bypass the adder in the cycles where B(n)

is zero. In each cycle, the hot bit of the next cycle

(i.e., B(n + 1)) is checked. If it is 0, i.e., the adder

is not needed in the next cycle, the Bypass register

is clocked to store the current partial product. If


B(n + 1) is 1, i.e., the adder is really needed in the

next cycle, the Feeder register is clocked to store

the current partial product which must be fed to

the adder in the next cycle. Note that to select between the Feeder and Bypass registers we have used NAND and NOR gates, which are inverting logic; therefore, the inverted clock (~Clock in Figure 11) is fed to them. Finally, in each cycle, B(n) determines whether the partial product should come from the Bypass register or from the Adder output.

In each cycle, when the hot bit B(n) is zero, there

is no transition in the adder since its inputs do not

change. The reason is that in the previous cycle,

the partial product has been stored in the Bypass

register and the value of the Feeder register,

which is the input of the adder, remains

unchanged. The other input of the adder is A,

which is constant during the multiplication. This

enables us to remove the multiplexer and feed

input A directly to the adder, resulting in a

noticeable power saving. Finally, note that the

BZ-FAD architecture does not put any constraint

on the adder type. In this work, we have used the

ripple carry adder which has the least average

transition per addition among the look ahead,

carry skip, carry-select, and conditional sum

adders.
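The Feeder and Bypass mechanism can be sketched as a behavioural model for unsigned operands. Several details are assumptions made only for this illustration: both registers start at zero, the partial product is shifted before being stored, the value left after the last cycle is treated as the upper half of the result, and the finalized low bits are collected in a plain integer. The function name `bzfad_feeder_bypass` is hypothetical.

```python
def bzfad_feeder_bypass(a, b, n):
    """Model of adder bypassing; returns (product, cycles that used the adder)."""
    feeder = 0        # input register of the adder
    bypass = 0        # holds the partial product when the adder is skipped
    low_bits = 0      # finalized low-order product bits
    high = 0          # shifted upper part of the partial product
    adder_cycles = 0

    for cycle in range(n):
        hot = (b >> cycle) & 1
        if hot:
            pp = feeder + a          # A is fed directly to the adder
            adder_cycles += 1
        else:
            pp = bypass              # adder inputs stay unchanged this cycle

        low_bits |= (pp & 1) << cycle
        high = pp >> 1

        # Look at the hot bit of the next cycle to decide which register to clock.
        next_hot = (b >> (cycle + 1)) & 1 if cycle + 1 < n else 0
        if next_hot:
            feeder = high
        else:
            bypass = high

    return (high << n) | low_bits, adder_cycles


print(bzfad_feeder_bypass(13, 0b1001, 4))
```

In this model the adder is exercised only in cycles whose hot bit is 1, so the number of adder activations equals the number of 1s in B rather than the word length.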

5.1.3 Shift of the PP Register

In the conventional architecture, the partial product is shifted in each cycle, giving rise to transitions. Inspecting the multiplication algorithm

reveals that the multiplication may be completed

by processing the most significant bits of the

partial product, and hence, it is not necessary for

the least significant bits of the partial product to

be shifted. We take advantage of this observation

in the BZ-FAD architecture. Notice that in Figure

11 for PLow, the lower half of the partial product,

we use k latches (for a k-bit multiplier). These

latches are indicated by the dotted rectangle M2 in

Figure 11 .

Figure 11: The proposed low power multiplier

architecture (BZ-FAD)

In the first cycle, the least significant bit, PP(0),

of the product becomes finalized and is stored in

the rightmost latch of PLow. The ring counter

output is used to open (unlatch) the proper latch.

This is achieved by connecting the S/~H line of

the nth latch to the nth bit of the ring counter

which is '1' in the nth cycle. In this way, the nth

latch samples the value of the nth bit of the final

product (Figure 11). In the subsequent cycles, the

next least significant bits are finalized and stored

in the proper latches. When the last bit is stored in

the leftmost latch, the higher and lower halves of

the partial product form the final product result.

Using this method, no shifting of the lower half of

the partial product is required. The higher part of

the partial product, however, is still shifted.

Comparing the two architectures, BZ-FAD saves

power for two reasons: first, the lower half of the

partial product is not shifted, and second, this half

is implemented with latches instead of flip-flops.

Note that in the conventional architecture (Figure 7)

the data transparency problem of latches prohibits

us from using latches instead of flip-flops for

forming the lower half of the partial product. This

problem does not exist in BZ-FAD since the lower

half is not formed by shifting the bits in a shift

register.
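The role of the PLow latches can be sketched with a model in which the lower half of the product is a list of latches that are written once and never shifted. The ring counter is modelled as a one-hot list, and the latch whose enable line is driven by the hot bit samples the least significant bit of the current partial product. This is an illustrative assumption about the behaviour for unsigned operands; the function name `bzfad_with_plow_latches` is hypothetical.

```python
def bzfad_with_plow_latches(a, b, n):
    """Shift-and-add multiplication with an unshifted, latch-based lower half."""
    p_low = [0] * n                 # n latches; index 0 is the rightmost latch
    ring = [1] + [0] * (n - 1)      # one-hot ring counter, hot bit at position 0
    high = 0                        # upper part of the partial product

    for _ in range(n):
        cycle = ring.index(1)
        if (b >> cycle) & 1:
            high += a               # adder used only when the hot bit of B is 1

        # Only the latch opened by the ring counter samples the finalized bit.
        for k in range(n):
            if ring[k]:
                p_low[k] = high & 1

        high >>= 1                  # only the upper part is still shifted
        ring = [ring[-1]] + ring[:-1]   # advance the ring counter

    low = sum(bit << k for k, bit in enumerate(p_low))
    return (high << n) | low


print(bzfad_with_plow_latches(13, 11, 4))
```

Each latch is written exactly once, in the cycle in which its product bit becomes final, so no data ever moves between the latches.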


Figure 12: Manual approach for BZFAD

5.2 State Diagram

Figure 13: BZFAD state diagram

6. CONVENTIONAL MULTIPLIER CODE

DESCRIPTION

Simulation results are obtained following the architecture of the conventional add-and-shift multiplier. The total operation takes four states. The first state loads the registers and the second state calculates the first partial product. In the third state, the counter value is incremented and tested for the kth bit value. With every increment of the counter, until the required value is reached, the remaining shift and addition operations are performed. The output is visible at the transition from the third state to the fourth state, when the done signal goes high. The counter is then reset for further operations.

6.1 BZFAD Multiplier Code Description

We made a number of adjustments to the conventional multiplier architecture to reduce power. Simulation results are obtained following this BZFAD architecture. In the first state, the multiplier and multiplicand registers are loaded with their respective values and all the signals are initialized to zero. In the next state, in each cycle, the hot bit of the next cycle, that is, B(n+1), is checked. If it is 0, that is, the adder is not needed in the next cycle, the Bypass register is clocked to store the current partial product. If B(n+1) is 1, that is, the adder is needed in the next cycle, the Feeder register is clocked to store the current partial product, which must be fed to the adder in the next cycle. In each cycle the ring counter is advanced and its MSB is checked; when it becomes 1, the state is incremented. In the next state, the lower half of the partial product is stored in the PLow latches and the upper half is stored in the Feeder register, and these two are concatenated to form the final product.

7. HARDWARE IMPLEMENTATION

7.1 Basics About Spartan-II Trainer Kit

The Spartan-II trainer MXSFK-LC-208 is useful for realizing and verifying various digital designs. The user can write VHDL/Verilog code and verify the results by implementing it physically in the target device (FPGA - Field Programmable Gate Array). With the help of this trainer, the user can simulate and observe various input and output conditions to verify the implemented design. Various I/O standard interfaces to the device can also be selected.

7.2. Programmable Logic Devices [PLDS]

A Programmable Logic Device is a device

whose logic characteristics can be changed and

manipulated or stored through programming.

7.2.1 Different Types of PLDs.

7.2.1.1 Programmable Array Logic[PALS]

The most common and simple device that falls

in this category is the PAL, which simply consists

of an array of AND gates and an array of OR

gates. The AND array is programmable while the

OR array is relatively fixed.

7.2.1.2. Field Programmable Gate Arrays

[FPGAS]

FPGAs are arrays of logic blocks, which can be linked together to form complex logic implementations. They are separated into two categories: fine grained and coarse grained. Fine-grained devices are made up of a sea of gates, transistors or small macro cells, while coarse-grained devices are made up of bigger macro cells, which often consist of flip-flops and look-up tables that implement the combinational logic functions. These are RAM-based devices, i.e., they lose their configuration when power is switched off. Hence they have to be configured every time power is applied.

7.2.1.3 Complex Programmable Logic Devices

[CPLDS]

CPLDs are made up of smaller common macro cells, which are programmable. A CPLD consists of multiple PAL-like function blocks that can be interconnected through a switch matrix. These are [Flash] EPROM-based devices, i.e., they retain their configuration even when power is switched off. Hence they need not be configured every time power is applied.

7.2.1.4 Application Specific Integrated Circuits

[ASICS]

ASICs are prefabricated, pre-doped silicon chips. These are application-specific designs that cannot be reconfigured once manufactured. Once the design is completely finalized, it can be made as an ASIC. Design changes are then not possible, but the size and speed are better.

7.3 SPARTAN-II [FPGA]

The Spartan-II family is a second-generation high-volume production FPGA solution. Devices in this family are available with up to 200,000 gates and up to 200 MHz system performance at a 2.5 V supply.

Features of the Spartan-II families are:

1. On-chip RAM (block and distributed).

2. Fully PCI compliant.

3. Dedicated carry logic for high-speed

arithmetic.

4. Dedicated multiplier support.

5. Low power segmented routing architecture.

6. 16 high performance interface standards.

7. Four dedicated delay-locked loops (DLLs) for advanced clock control.

8. Power down mode (ICCO =100 mA).

9. Unlimited re-programmability.

7.3.1 Trainer Description

Technical Data

On-board FPGA Spartan-II XC2S50 in PQ 208 package, compatible with XC2S100, XC2S150 and XC2S200 in PQ 208 package.

2 keys for keyboard interface.

8 digital inputs and outputs with LED indication.

Two seven-segment displays.

On-board 4 MHz clock and power-on reset circuit.

User-selectable interface hardware.

Support required for VCCO is on board; no external supply is required.

Probing facility: All I/Os available to the user.

Power Supply

9-Volt Adapter supplied with Spartan-II Trainer.

Required VCCO (3.3V) and Vccint (2.5V) voltages are generated on board.

Seven Segment Led Display

Two 7-segment LED displays are provided. The user can use them as an aid to verify a design. They come in handy in counter-related applications to monitor the results.

LEDs

There are 18 LEDs in total on the trainer, grouped as follows.

1. The POWER-ON LED is used for power supply indication.

2. The DONE LED indicates successful configuration of the SPARTAN-II device.

3. Eight LEDs [IL0 to IL7] indicate the inputs applied by the user.

4. Eight LEDs [LD0 to LD7] indicate the output conditions.

Test Points [TPs]

User can use these points to verify ground, supply voltage, and clock.

DIP Switch

A single 8-way DIP switch [SW1] is provided to be used as input to the FPGA. The logic level applied to the FPGA through SW1 is seen on LEDs LD0 to LD7.

JUMPERS

Various jumpers are provided for

Selection of clock.

Selection of configuration mode.

KEYS

Two Keys are provided for Keyboard

Interface.

Downloading Cable

For downloading the design from PC, a 9 pin

D-Type male (J7) connector is provided on board.

The trainer can be connected to PC's parallel port

with a cable having 25 pins D-Type (male) to 9

pins D- type (female) connector. This cable is

provided with the trainer.

Figure 14: SPARTAN 2 Trainer


To implement a design on hardware, interfacing is required. The code can be downloaded to a hardware kit (in this case the Spartan-II FPGA) with the help of a software interfacing tool (Xilinx).

When we downloaded our programs for the conventional architecture and the BZFAD architecture to the Spartan-II kit, the results were obtained successfully. The images of the Spartan-II executing the program are shown.

8. RESULTS AND ANALYSIS

After understanding the architecture of both the conventional and BZFAD multipliers, the next step was to implement them. To accomplish this, we wrote the code in Very High Speed Integrated Circuit Hardware Description Language [VHDL]. This code was synthesized using Xilinx, simulated using the ISE simulator [isim], and implemented by downloading it to the Spartan-II FPGA kit. The simulation results, timing summary, area utilization and power analysis reports are shown below.

8.1 Simulation Results

The simulation results for both the conventional and BZFAD architectures follow in the order given below:

4-bit conventional multiplier

8-bit conventional multiplier

4-bit BZFAD multiplier

8-bit BZFAD multiplier

8.2 Timing Summary

Parameter | Conventional 8 bit | BZFAD 8 bit
Minimum period | 8.258 ns | 6.975 ns
Maximum frequency | 121.094 MHz | 143.362 MHz
Minimum input arrival time | 8.426 ns | 7.167 ns

Parameter | Conventional 16 bit | BZFAD 16 bit
Minimum period | 9.946 ns | 6.564 ns
Maximum frequency | 100.540 MHz | 152.352 MHz
Minimum input arrival time | 10.281 ns | 7.502 ns
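The relationship between the reported minimum period and maximum frequency, and the relative reduction in clock period, can be checked with a few lines of arithmetic. The period values below are copied from the 8-bit and 16-bit timing summaries; the function names and the choice of "period reduction" as a comparison metric are assumptions made for this illustration.

```python
# Minimum clock periods in nanoseconds, taken from the timing summary.
MIN_PERIOD_NS = {
    8: {"conventional": 8.258, "bzfad": 6.975},
    16: {"conventional": 9.946, "bzfad": 6.564},
}


def max_frequency_mhz(period_ns):
    """Maximum clock frequency implied by a minimum period (f = 1 / T)."""
    return 1000.0 / period_ns


def period_reduction_percent(conventional_ns, bzfad_ns):
    """Relative reduction of the minimum period achieved by BZFAD."""
    return 100.0 * (conventional_ns - bzfad_ns) / conventional_ns


for bits, t in MIN_PERIOD_NS.items():
    f_conv = max_frequency_mhz(t["conventional"])
    f_bzfad = max_frequency_mhz(t["bzfad"])
    gain = period_reduction_percent(t["conventional"], t["bzfad"])
    print(f"{bits}-bit: {f_conv:.3f} MHz vs {f_bzfad:.3f} MHz, "
          f"period reduced by {gain:.1f}%")
```

The computed frequencies agree with the tabulated maximum frequencies to within rounding.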

8.3 Area Utilization

Parameter | Conventional 4 bit | BZFAD 4 bit
Minimum period | 5.943 ns | 4.918 ns
Maximum frequency | 168.264 MHz | 203.33 MHz
Minimum input arrival time | 6.682 ns | 5.160 ns


8.4 Power Analysis

8.5 Result Summary

Figure 15: Area, Power and Delay comparison for

conventional and proposed BZFAD multiplier for

various bits.

Figure 16: Relationship between power reduction

and bit size of multiplier.


Figure 17: Simulation for 8 bit BZFAD

Figure 18: Simulation for 8 bit conventional

CONCLUSION

In this paper, a low-power architecture for shift-and-add multipliers was proposed. The modifications to the conventional architecture included the removal of the shift of the B register (in A × B), direct feeding of A to the adder, bypassing the adder whenever possible, use of a ring counter instead of the binary counter, and removal of the partial product shift. The results showed an average power reduction of 30% by the proposed architecture. We also compared our multiplier with SPST [6], a low-power tree-based array multiplier. The comparison showed that the power saving of BZ-FAD was only 6% lower than that of SPST, whereas the SPST area was five times higher than that of BZ-FAD. Thus, for applications where small area and high speed are important concerns, BZ-FAD is an excellent choice. Additionally, we proposed a low-power architecture for ring counters based on partitioning the counter into blocks of flip-flops that are clock gated with a special clock gating structure, the complexity of which is independent of the block sizes. The simulation results showed that, in comparison with the conventional architecture, the proposed architecture reduced the power consumption by more than 75% for the 64-bit counter.

REFERENCES

[1] M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, "BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 302-306, Feb. 2009.

[2] O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418-433, Jun. 2003.

[3] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 1st ed. Oxford, U.K.: Oxford Univ. Press, 2000.

[4] M. D. Ercegovac and Z. Huang, "High performance low power left to right array multiplier design," IEEE Trans. Comput., vol. 54, no. 2, pp. 272-283, Mar. 2006.

[5] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," Journal of Solid State Circuits, vol. 27, no. 4, Apr. 1992.

[6] N. M. Botros, HDL Programming (VHDL and Verilog), Dreamtech Press (available through John Wiley India and Thomson Learning), 2006 edition.

[7] C. H. Roth, Jr., Digital Systems Design Using VHDL, Thomson Learning, Inc., 9th reprint, 2006.

AUTHORS PROFILE

Mr. Prasann D. Kulkarni completed his B.E. in Electronics and Communication Engineering at KLS's Vishwanathrao Deshpande Rural Institute of Technology, Haliyal, Uttar Kannada, Karnataka, India. He is presently pursuing an M.Tech in Digital Electronics at KLS's G.I.T, Belgaum, Karnataka, India, and since 2008 he has been working as a lecturer at KLS's Vishwanathrao Deshpande Rural Institute of Technology, Haliyal, Uttar Kannada, Karnataka, India. His research interests are low-power embedded system design and fuzzy logic in neural applications.
