
ISSN 2319 – 1953 International Journal of Scientific Research in Computer Science Applications and Management Studies

IJSRCSAMS Volume 3, Issue 6 (November 2014) www.ijsrcsams.com

Design and Verification of High Speed MAC Unit by Using Booth Encoder with Low Power Consumption

1 S. Suresh Babu (M.Tech), 2 Aditya Putta, M.Tech, 3 Bighneswar Panda, M.Tech

2 Assistant Professor, HOD of ECE, 3 Assistant Professor, Dept. of ECE

1,2,3 Gokul Institute of Tech & Sciences, Piridi, Bobbili, Vizianagaram (Dt)-535558

Abstract- This paper presents the design and implementation of a Signed-Unsigned Modified Booth Encoding (SUMBE) multiplier. Existing modified Booth encoding (MBE) multipliers and the Baugh-Wooley multiplier operate on signed numbers only, while the array multiplier and the Braun array multiplier operate on unsigned numbers only. Modern computer systems therefore need a single dedicated, very high speed multiplier unit that handles both signed and unsigned numbers. The modified Booth encoder circuit generates half as many partial products, in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating one additional partial product. A carry save adder (CSA) tree and a final carry look-ahead (CLA) adder are used to speed up the multiplier. Because signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of the system. A configurable multiplier is designed that is optimized for low power and high speed and can be configured for either a single 16-bit multiplication or a single 8-bit multiplication. The output product can be truncated to further decrease power consumption and increase speed at the cost of some output precision. The proposed multiplier maintains acceptable output quality when truncation is performed, so it provides flexible arithmetic capacity and a tradeoff between output precision and power consumption. The approach also dynamically detects the input range of the operands and disables switching in the non-effective ranges. The ineffective circuitry is thereby deactivated, which reduces power consumption and increases the speed of operation. As a result, the proposed multiplier outperforms the conventional multiplier in both power and speed.

Keywords- MAC; Verilog; signed multiplier; unsigned multiplier; Baugh-Wooley multiplier; Booth encoder.

I. INTRODUCTION

Multiplication is an essential arithmetic operation whose applications date back several decades. Early ALUs performed multiplication with their adders. As array multipliers were introduced, clock rates increased and timing constraints became stricter, and more sophisticated methods of implementing multiplication have been proposed ever since. Multiplication is used very heavily in digital computing and digital electronics, especially in multimedia and digital signal processing (DSP) applications. Multiplication is performed in three main stages: the first stage generates the partial products through an array of AND gates; the second stage reduces the partial products using a partial product reduction scheme; and the final stage adds the remaining partial products to obtain the product. Power dissipation is recognized as a critical parameter in modern VLSI design. To keep pace with Moore's law and to produce consumer electronics with longer battery backup and less weight, low-power VLSI design is necessary. Multiplication is performed on two types of numbers: 1) signed and 2) unsigned.


II. BASICS OF MULTIPLIER

Multiplication is a mathematical operation that, at its simplest, is an abbreviated process of adding an integer to itself a specified number of times. A number (the multiplicand) is added to itself the number of times specified by another number (the multiplier) to form a result (the product). In elementary school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is then multiplied by each digit of the multiplier, beginning with the rightmost, least significant digit (LSD). The intermediate results (partial products) are placed one atop the other, each offset by one digit to align digits of the same weight. The final product is the sum of all the partial products. Although most people think of multiplication only in base 10, this technique applies equally to any base, including binary. Figure 1 shows the data flow for the basic multiplication technique just described. Each black dot represents a single digit.

Fig 1. Basic multiplication.

Here, we assume that the MSB represents the sign of the number. Multiplication is rather simple in digital electronics. It originates from the classical algorithm for the product of two binary numbers, which uses addition and shift-left operations to calculate the product. From this procedure we can deduce an algorithm for any kind of multiplication, as shown in Figure 1. The sign of the product can be determined at the initial stage, or, once the whole result is available, the MSB of the result gives the sign of the product.
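The shift-and-add procedure described above can be sketched in a few lines of Python. This is only an illustrative software model, not the paper's hardware: the function names and the default 8-bit width are assumptions, and the signed wrapper uses the "decide the sign at the initial stage" option by multiplying magnitudes.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Unsigned shift-and-add multiplication of two width-bit operands."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:          # test bit i of the multiplier
            product += multiplicand << i   # add the multiplicand shifted left by i
    return product


def signed_multiply(a: int, b: int, width: int = 8) -> int:
    """Decide the sign first, then multiply the magnitudes (hypothetical helper)."""
    negative = (a < 0) != (b < 0)
    magnitude = shift_add_multiply(abs(a), abs(b), width)
    return -magnitude if negative else magnitude


print(shift_add_multiply(13, 11))  # 13 * 11
print(signed_multiply(-7, 5))      # -7 * 5
```

Each loop iteration corresponds to one partial product: it is either zero or a shifted copy of the multiplicand.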

B. Booth's Algorithm for signed multiplication:

1. Let A, B and P each be x+y+1 bits long, where c is the x-bit multiplicand and d is the y-bit multiplier.
a. A: the binary value of c occupies the most significant bit (MSB) positions, and y+1 zeros are appended in the least significant bit (LSB) positions.
b. B: the two's complement of c occupies the MSB positions, and y+1 zeros are appended in the LSB positions.
c. P: x zeros occupy the MSB positions, followed by the binary value of d, and the LSB position is zero.
2. Examine the two least significant bits of P to choose the operation:
a. If they are 01, A is added to P: P = P + A.
b. If they are 10, B is added to P: P = P + B.
c. If they are 11, P is unchanged.
d. If they are 00, P is unchanged.
3. After the selected operation, arithmetically shift P right by one bit.
4. Repeat steps 2 and 3 once for each bit of the multiplier.
5. Drop the LSB from P to obtain the final product [7].
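The register-level steps above can be modeled as follows. This is a minimal sketch under stated assumptions: the function name and default widths are hypothetical, c and d are taken to be the multiplicand and multiplier, and the most negative multiplicand (for example -8 with x = 4) is excluded because its two's complement does not fit in x bits.

```python
def booth_multiply(c: int, d: int, x: int = 8, y: int = 8) -> int:
    """Radix-2 Booth multiplication of an x-bit c by a y-bit d (two's complement)."""
    total = x + y + 1
    mask = (1 << total) - 1
    A = (c & ((1 << x) - 1)) << (y + 1)       # c in the MSBs, y+1 zeros below
    B = ((-c) & ((1 << x) - 1)) << (y + 1)    # two's complement of c in the MSBs
    P = (d & ((1 << y) - 1)) << 1             # x zeros, then d, then a zero LSB
    for _ in range(y):
        low2 = P & 0b11
        if low2 == 0b01:
            P = (P + A) & mask
        elif low2 == 0b10:
            P = (P + B) & mask
        msb = P >> (total - 1)                # arithmetic shift right by one bit
        P = (P >> 1) | (msb << (total - 1))
    P >>= 1                                   # drop the LSB
    if P >> (x + y - 1):                      # reinterpret as a signed (x+y)-bit value
        P -= 1 << (x + y)
    return P


print(booth_multiply(3, -4, 4, 4))
```

Adding B is the same as subtracting the multiplicand, which is why a single adder suffices for both cases.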

C. Booth's Algorithm for unsigned numbers:

1. Let A and P each be x+y+1 bits long.
a. A: the binary value of c occupies the MSB positions, and y+1 zeros are appended in the LSB positions.
b. P: x zeros occupy the MSB positions, followed by the binary value of d, and the LSB position is zero.
2. Examine the two least significant bits of P to choose the operation:
a. If they are 01, A is added to P: P = P + A.
b. If they are 10, A is subtracted from P: P = P - A.
c. If they are 11, P is unchanged.
d. If they are 00, P is unchanged.
3. After the selected operation, the arithmetic shift is performed.
4. Repeat steps 2 and 3 once for each bit of the multiplier.
5. Drop the LSB from P to obtain the final product.

After implementing the signed Booth multiplier and the unsigned Booth multiplier, we combine the two in a single multiplier by using a select/control line. To implement the signed Baugh-Wooley multiplier, the array structure must be known:

$$P = a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{n-2} \mathrm{NOT}(a_{m-1} b_i)\, 2^{i+m-1} + \sum_{j=0}^{m-2} \mathrm{NOT}(a_j b_{n-1})\, 2^{j+n-1} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j\, 2^{i+j} + 2^{n-1} + 2^{m-1} - 2^{m+n-1} \qquad (1)$$

Setting m = n = 32 in the above equation gives the array structure needed to implement the 32-bit signed Baugh-Wooley multiplier.
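As a sanity check on the Baugh-Wooley expression, the sum can be evaluated term by term in software and compared with ordinary signed multiplication. The sketch below assumes the standard form of the expansion for an m-bit a and an n-bit b in two's complement; the function name is hypothetical and small widths are used only to keep the exhaustive check fast.

```python
def baugh_wooley(a: int, b: int, m: int, n: int) -> int:
    """Evaluate the Baugh-Wooley sum for an m-bit a and n-bit b (two's complement)."""
    abit = lambda i: (a >> i) & 1    # Python ints behave as sign-extended two's complement
    bbit = lambda j: (b >> j) & 1
    p = (abit(m - 1) & bbit(n - 1)) << (m + n - 2)
    for i in range(n - 1):           # complemented terms that involve a[m-1]
        p += (1 - (abit(m - 1) & bbit(i))) << (i + m - 1)
    for j in range(m - 1):           # complemented terms that involve b[n-1]
        p += (1 - (abit(j) & bbit(n - 1))) << (j + n - 1)
    for i in range(m - 1):           # ordinary AND terms
        for j in range(n - 1):
            p += (abit(i) & bbit(j)) << (i + j)
    p += (1 << (n - 1)) + (1 << (m - 1)) - (1 << (m + n - 1))
    return p


ok = all(baugh_wooley(a, b, 4, 4) == a * b
         for a in range(-8, 8) for b in range(-8, 8))
print(ok)
```

The constant terms absorb the corrections introduced by complementing the partial product bits, so the array needs only additions.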

Fig 2. Signed Multiplication Algorithm.

The basic multiplication algorithm is shown in Figure 2. Consider the bit representations of the multiplicand $x = x_{n-1} \ldots x_1 x_0$ and the multiplier $y = y_{n-1} \ldots y_1 y_0$. For unsigned multiplication, up to n shifted copies of the multiplicand are added to form the product. The entire process consists of three steps: partial product generation, partial product reduction and final addition. The bits of the partial products in each column are added to obtain two bits, a sum and a carry. Finally, the sum and carry bits of each column are added. Similarly, multiplying an n-bit multiplicand by an m-bit multiplier generates m partial products and a product that is n + m bits long. The method shown in Figure 3 is also called a non-Booth encoding scheme.

Fig 3. Multiplication operation in hardware (non-Booth encoding).
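The non-Booth scheme can be illustrated with a small software model that builds the m partial product rows with AND operations and then adds them column by column. This is a sketch only: the function names are assumptions, and a real reduction tree would use full and half adders rather than an integer column sum.

```python
def partial_products(x: int, y: int, n: int, m: int) -> list:
    """Row j is the n-bit multiplicand ANDed with multiplier bit j, shifted left by j."""
    x &= (1 << n) - 1
    return [(x if (y >> j) & 1 else 0) << j for j in range(m)]


def multiply_by_columns(x: int, y: int, n: int, m: int) -> int:
    """Add the partial product matrix one column at a time, rippling the carry."""
    rows = partial_products(x, y, n, m)
    product, carry = 0, 0
    for col in range(n + m):                       # the product is n + m bits long
        column_sum = carry + sum((row >> col) & 1 for row in rows)
        product |= (column_sum & 1) << col         # sum bit of this column
        carry = column_sum >> 1                    # carry into the next column
    return product


print(partial_products(0b1011, 0b110, 4, 3))
print(multiply_by_columns(0b1011, 0b110, 4, 3))
```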

III. MULTIPLIER ARCHITECTURE

In the majority of digital signal processing (DSP) applications, the critical operations usually involve many multiplications and/or accumulations. For real-time signal processing, a high-speed, high-throughput multiplier-accumulator (MAC) is always key to achieving a high-performance digital signal processing system. In the last few years, the main consideration in MAC design has been to enhance its speed, because speed and throughput are always a concern in digital signal processing systems. In the era of personal communication, however, low-power design has become another main design consideration, because the battery energy available to portable products limits the power consumption of the system. The main motivation of this work is therefore to investigate pipelined multiplier/accumulator architectures and circuit design techniques that are suitable for implementing high-throughput signal processing algorithms while achieving low power consumption. A conventional MAC unit consists of a fast multiplier and an accumulator that holds the sum of the previous consecutive products. The function of the MAC unit is given by the following equation:

$$F = \sum_{i} A_i B_i \qquad (2.1)$$

The main goal of a DSP processor design is to enhance the speed of the MAC unit while limiting its power consumption. In a pipelined MAC circuit, the delay of a pipeline stage is the delay of a 1-bit full adder, so estimating this delay helps identify the overall delay of the pipelined MAC. In this work, a 1-bit full adder is designed. Area, power and delay are calculated for the full adder, and the pipelined MAC unit is designed for low power on that basis.
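A behavioral sketch of the two building blocks mentioned above may help: a 1-bit full adder and the multiply-accumulate function of equation (2.1). The function names and the 20-bit accumulator width are assumptions made for illustration; the paper does not specify them.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def mac(pairs, acc_bits: int = 20) -> int:
    """Accumulate A_i * B_i over all pairs, wrapping at acc_bits (assumed width)."""
    mask = (1 << acc_bits) - 1
    acc = 0
    for a, b in pairs:
        acc = (acc + a * b) & mask   # one multiply-accumulate step per cycle
    return acc


print(full_adder(1, 1, 0))
print(mac([(1, 2), (3, 4), (5, 6)]))
```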

A. High-Speed Booth Encoded Parallel Multiplier Design:

Fast multipliers are essential parts of digital signal processing systems. The speed of the multiply operation is of great importance in digital signal processing as well as in today's general-purpose processors, especially since media processing took off. In the past, multiplication was generally implemented as a sequence of addition, subtraction and shift operations. Multiplication can be considered a series of repeated additions: the number to be added is the multiplicand, the number of times it is added is the multiplier, and the result is the product. Each addition step generates a partial product. In most computers, the operands usually contain the same number of bits. When the operands are interpreted as integers, the product is generally twice the length of the operands in order to preserve the information content. The repeated-addition method suggested by the arithmetic definition is so slow that it is almost always replaced by an algorithm that makes use of positional representation. A multiplier can be decomposed into two parts: the first generates the partial products, and the second collects and adds them.

Fig 4. Hardware architecture of the proposed MAC.

The basic multiplication principle is twofold: evaluation of the partial products and accumulation of the shifted partial products. It is performed by successive additions of the columns of the shifted partial product matrix. The 'multiplier' is successively shifted and gates the appropriate bit of the 'multiplicand'.

The delayed, gated instances of the multiplicand must all be in the same column of the shifted partial product matrix. They are then added to form the product bit for that column. Multiplication is therefore a multi-operand operation. The multiplication is then extended to both signed and unsigned operands.

B. Derivation of MAC Arithmetic:

Basic Concept: If an operation that multiplies two n-bit numbers and accumulates the result into a 2n-bit number is considered, the critical path is determined by the 2n-bit accumulation operation. If a pipeline scheme is applied to each step of the standard design of Fig 1, the delay of the last accumulator must be reduced in order to improve the performance of the MAC. The overall performance of the proposed MAC is improved by eliminating the accumulator itself and combining its function with the CSA. Once the accumulator has been eliminated, the critical path is determined by the final adder in the multiplier. The basic way to improve the performance of the final adder is to decrease its number of input bits. To reduce this number, the multiple partial products are compressed into a sum and a carry by the CSA. The number of sum and carry bits transferred to the final adder is reduced by adding the lower bits of the sums and carries in advance, within the range in which the overall performance is not degraded. A 2-bit CLA is used to add the lower bits in the CSA. In addition, to increase the output rate when pipelining is applied, the sums and carries from the CSA are accumulated instead of the outputs of the final adder: the sum and carry from the CSA in the previous cycle are fed back as inputs to the CSA. Because both the sum and the carry are fed back, the number of inputs to the CSA increases compared to the standard design. To handle this increase in the amount of data efficiently, the CSA architecture is modified to treat the sign bit.

Fig 5. Proposed arithmetic operation of multiplication

and accumulation.
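The idea of feeding back the sum and the carry, instead of a fully added accumulator value, can be sketched with a word-level 3:2 compressor. This is a simplified illustration under stated assumptions: the function names and the 16-bit width are hypothetical, each product is supplied directly by Python's multiplication rather than as Booth partial products, and the 2-bit CLA pre-addition of the lower bits is not modeled.

```python
def csa(a: int, b: int, c: int) -> tuple:
    """Carry save adder (3:2 compressor) on whole words: a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def mac_carry_save(pairs, width: int = 16) -> int:
    """Accumulate products in carry-save form; propagate carries only once at the end."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for a, b in pairs:
        s, c = csa(s, c, (a * b) & mask)   # previous sum and carry are fed back
        s &= mask
        c &= mask
    return (s + c) & mask                  # the single final carry-propagate addition


print(csa(5, 6, 7))
print(mac_carry_save([(3, 5), (7, 9), (250, 200)]))
```

No carry propagates inside the loop, so the per-cycle delay does not grow with the accumulator width; only the final addition is a full-width carry-propagate step.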

Equation Derivation: The aforementioned concept is applied to express the proposed MAC arithmetic. Then, the multiplication would be transferred to a hardware architecture that complies with the proposed concept, in which the feedback value for accumulation will be modified and expanded for the new MAC. First, if the multiplication in (4) is decomposed and rearranged, it becomes

If this is divided into the first partial product, sum of the middle partial products, and the final partial product. The reason for separating the partial product addition as is that three types of data are fed back for accumulation, which are the sum, the carry, and the pre-added results of the sum and carry from lower bits.

Now, the proposed concept is applied to in (5). If is first divided into upper and lower bits and rearranged, (8) will be derived. The first term of the right-hand side in (8) corresponds to the upper bits. It is the value that is fed back as the sum and the carry. The second term corresponds to the lower bits and is the value that is fed back as the addition result for the sum and carry.

The second term can be separated further into the carry term and sum term as

Thus,


V. MODIFIED BOOTH ENCODER

The new MBE recoder [1] was designed according to the following analysis. Table 1 presents the truth table of the new encoding scheme. The Z signal makes the output zero to compensate for the incorrect X2_b and Neg signals. Fig. 1 presents the circuit diagram of the encoder and decoder. The encoder generates the X1_b, X2_b and Z signals by encoding the three x-signals. The yLSB signal is the LSB of the y signal and is combined with the x-signals to determine the Row_LSB and Neg_cin signals. Similarly, yMSB is combined with x- an overview of the partial product array for an 8 × 8 multiplier. The sign extension circuitry developed in [22] and [23]. The conventional MBE partial product array has two drawbacks: 1) an additional partial product term at the (n-2)th bit position; 2) poor performance at the LSB part. To remedy these two drawbacks, the LSB part of the partial product array is modified. Referring to Fig. 2a, the Row_LSB (gray circle) and Neg_cin terms are combined and further simplified using Boolean minimization. The new equations for Row_LSB and Neg_cin can be written as (1) and (2), respectively.

To achieve high-speed multiplication, multiplication algorithms using parallel counters, such as the modified Booth algorithm, have been proposed, and some multipliers based on these algorithms have been implemented for practical use. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the operand word length.

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. Radix-4 Booth recoding halves the number of partial products. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by ±1, ±2 or 0 to obtain the same result. To Booth-recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, with a zero assumed to the right of the LSB. Figure 6 shows the grouping of bits from the multiplier term for use in modified Booth encoding.

Fig 6. Grouping of bits from the multiplier term.

Each block is decoded to generate the correct partial product. Encoding the multiplier Y with the modified Booth algorithm generates the five signed digits -2, -1, 0, +1 and +2. Each encoded digit of the multiplier performs a certain operation on the multiplicand X, as illustrated in Table 1.

Table 1. Multiplication operation on the multiplicand X.
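The grouping into overlapping blocks of three bits can be sketched as below. The sketch uses the standard radix-4 Booth digit rule, digit = -2*y[2k+1] + y[2k] + y[2k-1] with y[-1] = 0; the function name is hypothetical and the operand width n is assumed to be even.

```python
def booth_radix4_digits(y: int, n: int) -> list:
    """Recode an n-bit two's complement multiplier (n even) into n/2 digits in -2..+2."""
    assert n % 2 == 0
    padded = (y & ((1 << n) - 1)) << 1           # append the implicit 0 right of the LSB
    digits = []
    for k in range(n // 2):
        triplet = (padded >> (2 * k)) & 0b111    # bits y[2k+1], y[2k], y[2k-1]
        hi, mid, lo = (triplet >> 2) & 1, (triplet >> 1) & 1, triplet & 1
        digits.append(-2 * hi + mid + lo)
    return digits                                # least significant digit first


digits = booth_radix4_digits(0x006A, 16)
print(digits)
print(sum(d * 4**k for k, d in enumerate(digits)))  # weighted sum gives back the multiplier
```

An n-bit multiplier yields n/2 digits, hence n/2 partial products instead of n.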

For partial product generation, we adopt the radix-4 modified Booth algorithm to reduce the number of partial products by roughly one half. For multiplication of 2's complement numbers, the two-bit encoding of this algorithm scans a triplet of bits. The multiplier B is divided into groups of two bits, and the algorithm is applied to each group. Figure 7 shows an example of Booth multiplication of the two numbers "2AC9" and "006A". The shading denotes the part of the Booth multiplication in which all the numbers are zero, so that this part of the computation can be neglected. Saving those computations can reduce power consumption.

Fig 7. Illustration of multiplication using modified Booth encoding.

The PP generator generates five candidate partial products, {-2A, -A, 0, A, 2A}. One of these is selected according to the Booth encoding of the operand B. When the operand other than the Booth-encoded one has a small absolute value, there are opportunities to reduce the spurious power dissipated in the compression tree.
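The selection among the five candidates can be illustrated with the "2AC9" by "006A" example. The sketch below is a software model under stated assumptions: the function names are hypothetical, the digit rule is the standard radix-4 Booth rule, and the count of all-zero rows is reported only to show which partial products could be skipped.

```python
def booth_digit(y: int, k: int) -> int:
    """Radix-4 Booth digit k of a non-negative bit pattern y (bit -1 is taken as 0)."""
    hi = (y >> (2 * k + 1)) & 1
    mid = (y >> (2 * k)) & 1
    lo = (y >> (2 * k - 1)) & 1 if k > 0 else 0
    return -2 * hi + mid + lo


def booth_radix4_multiply(a: int, b: int, n: int = 16) -> tuple:
    """Multiply a by the n-bit two's complement b; returns (product, zero_rows)."""
    candidates = {-2: -2 * a, -1: -a, 0: 0, 1: a, 2: 2 * a}
    b_bits = b & ((1 << n) - 1)
    product, zero_rows = 0, 0
    for k in range(n // 2):
        digit = booth_digit(b_bits, k)
        if digit == 0:
            zero_rows += 1                       # this partial product is all zeros
        product += candidates[digit] << (2 * k)  # selected candidate, weighted by 4**k
    return product, zero_rows


product, zero_rows = booth_radix4_multiply(0x2AC9, 0x006A)
print(hex(product), zero_rows)
```

For this operand pair, half of the eight partial product rows are zero, which corresponds to the shaded region described for Figure 7.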

VII. PROPOSED SUMBE MULTIPLIER

The main goal of this paper is to design and implement an 8×8 multiplier for signed and unsigned numbers using the MBE technique. Table 2 shows the truth table of the MBE scheme. From Table 2, the MBE logic diagram is implemented as shown in Fig. 8. Using the MBE logic and considering the other conditions, the Boolean expression for the one-bit partial product generator is given by equation 3.

Table 2. Truth table for proposed MBE scheme.

Fig 8. Logic diagram for MBE.

Equation 3 is implemented as shown in Fig. 9. The SUMBE multiplier does not consider the encoder and the decoder logic separately; instead, they are implemented as a single unit called the partial product generator, as shown in Fig. 9. The negative partial products are converted into 2's complement by adding a negate (Ni) bit. The expression for the negate bit is given by Boolean equation 4, which is implemented as shown in Fig. 10. The sign extension required to convert the 2's complement signed multiplier into a signed-unsigned multiplier is given by equations 5 and 6, and the corresponding logic diagram is shown in Fig. 11.

The sign extension that converts the signed multiplier into a signed-unsigned multiplier works as follows. A one-bit control signal called the signed-unsigned (s_u) bit indicates whether the multiplication is signed or unsigned: s_u = 0 indicates unsigned multiplication, and s_u = 1 indicates signed multiplication. For unsigned multiplication, the sign-extended bits of both the multiplicand and the multiplier must be 0, that is, a8 = a9 = b8 = b9 = 0. For signed multiplication, the sign-extended bits depend on whether the multiplicand, the multiplier or both are negative. When the multiplicand is negative and the multiplier is positive, the values are s_u = 1, a7 = 1, b7 = 0, a8 = a9 = 1 and b8 = b9 = 0. When the multiplicand is positive and the multiplier is negative, the values are s_u = 1, a7 = 0, b7 = 1, a8 = a9 = 0 and b8 = b9 = 1.

Table 3 shows the SUMBE multiplier operation.
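The sign extension rule can be sketched as a behavioral model. The relation a8 = a9 = s_u AND a7 (and likewise for b) is inferred here from the cases listed in the text, since equations 5 and 6 are not reproduced; the function names are hypothetical. Extending both operands to 10 bits gives five radix-4 Booth digits, that is, one partial product more than a plain 8-bit signed multiplier.

```python
def extend(operand: int, s_u: int) -> int:
    """Extend an 8-bit operand to 10 bits: bits 8 and 9 are s_u AND bit 7 (inferred rule)."""
    operand &= 0xFF
    ext = s_u & (operand >> 7)
    return operand | (ext << 8) | (ext << 9)


def to_signed(value: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement number of the given width."""
    return value - (1 << bits) if value >> (bits - 1) else value


def sumbe_multiply(a: int, b: int, s_u: int) -> int:
    """8x8 signed-unsigned multiply via radix-4 Booth on the 10-bit extended operands."""
    a10 = to_signed(extend(a, s_u), 10)
    padded = extend(b, s_u) << 1                  # implicit 0 to the right of the LSB
    product = 0
    for k in range(5):                            # five partial products
        t = (padded >> (2 * k)) & 0b111
        digit = -2 * (t >> 2) + ((t >> 1) & 1) + (t & 1)
        product += (digit * a10) << (2 * k)
    return product & 0xFFFF                       # 16-bit product


print(format(sumbe_multiply(0xFF, 0xFF, 0), "016b"))
print(format(sumbe_multiply(0xFF, 0xFF, 1), "016b"))
```

With s_u = 0 the extended operands are non-negative, so the same signed Booth datapath returns the unsigned product.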

Fig 9. Logic diagram of 1-bit partial product generator.

Fig 10. Logic diagram of negate bit generator.

Fig 11. Logic diagram of sign converter.

VIII. PROPOSED CSA ARCHITECTURE

The architecture of the hybrid-type CSA that complies with the operation of the proposed MAC is shown in Fig. 12, which performs an 8-bit operation. In Fig. 12, Si simplifies the sign expansion and Ni compensates a 1's complement number into a 2's complement number. S[i] and C[i] correspond to the ith bit of the feedback sum and carry. Z[i] is the ith bit of the sum of the lower bits of each partial product that were added in advance, and Z'[i] is the previous result. In addition, Pj[i] corresponds to the ith bit of the jth partial product. Since the multiplier is for 8 bits, a total of four partial products are generated by the Booth encoder. This CSA requires at least four rows of full adders (FAs) for the four partial products, so a total of five FA rows are necessary, since one more row is needed for accumulation. For an n-bit MAC operation, the number of CSA levels is (n/2+1). The white square in Fig. 12 represents an FA and the gray square a half adder (HA). The rectangular symbol with five inputs is a 2-bit CLA with a carry input.

The critical path in this CSA is determined by the 2-bit CLA. It is also possible to implement the CSA with FAs only, without the CLA. However, if the lower bits of the previously generated partial products are not processed in advance by the CLAs, the number of bits for the final adder increases, which degrades performance when the entire multiplier or MAC is considered. In Table I, the characteristics of the proposed CSA architecture are summarized and briefly compared with other architectures. For the number system, the proposed CSA uses 1's complement, but ours uses a modified CSA array without sign extension. The biggest difference between ours and the others is the type of values that are fed back for accumulation. Ours has the smallest number of inputs to the final adder.

IX. CONCLUSION

An 8×8 multiplier-accumulator (MAC) is presented in this work. A radix-4 modified Booth multiplier circuit is used in the MAC architecture. Compared to other circuits, the Booth multiplier has the highest operational speed and a lower hardware count. The basic building blocks of the MAC unit are identified, and each block is analyzed for its performance. Power and delay are calculated for the blocks. A 1-bit MAC unit is designed with an enable input to reduce the total power consumption, based on the block-enable technique. Using this block, the N-bit MAC unit is constructed and its total power consumption is calculated with the power reduction techniques adopted in this work. The MAC unit designed in this work can be used in filter realizations for high-speed DSP applications.


Fig 12. Architecture of the proposed CSA tree.

X. SIMULATION RESULTS

The simulation results for signed and unsigned numbers are shown in binary. When the control signal s_u = 0, the 8-bit operands are treated as unsigned and the product is 11111111 × 11111111 = 1111111000000001. When the control signal s_u = 1, the 8-bit operands are treated as signed and the product is 11111111 × 11111111 = 0000000000000001.
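The two products quoted above can be cross-checked with ordinary integer arithmetic. The helper below is a hypothetical reference model, not the simulated design: it reinterprets the operand bit strings as two's complement when s_u = 1 and prints the product at twice the operand width.

```python
def multiply_bits(a_bits: str, b_bits: str, s_u: int) -> str:
    """Reference product of two equal-width bit strings, signed when s_u is 1."""
    width = len(a_bits)
    a, b = int(a_bits, 2), int(b_bits, 2)
    if s_u:
        a -= (a >> (width - 1)) << width   # subtract 2**width when the sign bit is set
        b -= (b >> (width - 1)) << width
    mask = (1 << (2 * width)) - 1
    return format((a * b) & mask, "0{}b".format(2 * width))


print(multiply_bits("11111111", "11111111", 0))  # unsigned: 255 * 255
print(multiply_bits("11111111", "11111111", 1))  # signed: (-1) * (-1)
```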

Fig 13. Simulation result of signed-unsigned

numbers.

Fig 14. Simulation result of signed unsigned number

in binary.

ACKNOWLEDGEMENT

S. Suresh Babu would like to thank Mrs. Telkar Kalpana, Associate Professor in the Department of ECE, who guided me throughout the project, supported me with technical ideas about the paper and motivated me to complete the work efficiently and successfully.

REFERENCES

[1] W.-C. Yeh and C.-W. Jen, "High-Speed Booth Encoded Parallel Multiplier Design," IEEE Transactions on Computers, vol. 49, no. 7, pp. 692-701, July 2000.

[2] Shiann-Rong Kuang, Jiun-Ping Wang, and Cang-Yuan Guo, "Modified Booth Multipliers with a Regular Partial Product Array," IEEE Transactions on Circuits and Systems-II, vol. 56, no. 5, May 2009.

[3] Li-Rong Wang, Shyh-Jye Jou and Chung-Len Lee, "A Well-Structured Modified Booth Multiplier Design," 978-1-4244-1617-2/08/$25.00 ©2008 IEEE.

[4] Soojin Kim and Kyeongsoon Cho, "Design of High-speed Modified Booth Multipliers Operating at GHz Ranges," World Academy of Science, Engineering and Technology 61, 2010.

[5] Magnus Sjalander and Per Larson-Edefors, "The Case for HPM-Based Baugh-Wooley Multipliers," Chalmers University of Technology, Sweden, March 2008.

[6] Z. Huang and M. D. Ercegovac, "High-Performance Low-Power Left-to-Right Array Multiplier Design," IEEE Trans. Computers, vol. 54, no. 3, pp. 272-283, Mar. 2005.

[7] Hsing-Chung Liang and Pao-Hsin Huang, "Testing Transition Delay Faults in Modified Booth Multipliers by Using C-testable and SIC Patterns," IEEE 2007, 1-4244-1272-2/07.

[8] Aswathy Sudhakar and D. Gokila, "Run-Time Reconfigurable Pipelined Modified Baugh-Wooley Multipliers," Advances in Computational Sciences and Technology, ISSN 0973-6107, vol. 3, no. 2 (2010), pp. 223-235.

[9] Myoung-Cheol Shin, Se-Hyeon Kang, and In-Cheol Park, "An Area-Efficient Iterative Modified-Booth Multiplier Based on Self-Timed Clocking," Industry, and Energy through the project System IC 2010, and by IC Design Education Center (IDEC).

[10] Leandro Z. Pieper, Eduardo A. C. da Costa, Sergio J. M. de Almeida, "Efficient Dedicated Multiplication Blocks for 2's Complement Radix-2^m Array Multipliers," Journal of Computers, vol. 5, no. 10, October 2010.

[11] C. R. Baugh and B. A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm," IEEE Transactions on Computers, vol. 22, no. 12, pp. 1045-1047, Dec. 1973.