7/31/2019 M.tech Final Project
1/23
VIF COLLEGE OF ENGINEERING &
TECHNOLOGY
(Himayat Nagar, Gandipet X Road)
(Affiliated to Jawaharlal Nehru Technological University)
logo
P.G.DEPARTMENT OF ECE
Seminar Report
On
--------------------------------------
Submitted By
Name:
Roll No:
Branch:
VIF COLLEGE OF ENGINEERING &
TECHNOLOGY(---------)
Logo
P.G.DEPARTMENT OF ECE
Certificate
This is to certify that Mrs. ---------------------------------
bearing H.T. No. -------- has satisfactorily completed the course of
Seminar entitled --------------- prescribed by JNTU for the I
SEMESTER of M.Tech (Branch) during the academic year 2010-2011.
Faculty In-charge Head of the department
Acknowledgement
Abstract
This paper provides a detailed study of a configurable multiplier optimized for low power
and high speed operations and which can be configured either for single 16-bit
multiplication operation, single 8-bit multiplication or twin parallel 8-bit multiplication.
The output product can be truncated to further decrease power consumption and increase
speed by sacrificing a bit of output precision. Furthermore, the proposed multiplier
maintains an acceptable output quality with enough accuracy when truncation is performed.
Thus it provides a flexible arithmetic capacity and a tradeoff between output precision and
power consumption. The approach also dynamically detects the input range of multipliers
and disables the switching operation of the non effective ranges. Thus the ineffective
circuitry can be efficiently deactivated, thereby reducing power consumption and
increasing the speed of operation. Thus the proposed multiplier outperforms the
conventional multiplier in terms of power and speed efficiencies
List of Tables
List of Figures
Table of Contents
Acknowledgement
Abstract
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Multipliers
Chapter 3: Booth's Algorithm
Chapter 4: Timing and Area Analysis
Chapter 5: Conclusion
Appendix A. Test File Verilog Code
References
Chapter 1
Introduction
Portable multimedia and digital signal processing (DSP) systems, which typically require flexible
processing ability, low power consumption, and short design cycles, have become increasingly popular over the
past few years. Many multimedia and DSP applications are highly multiplication-intensive, so the
performance and power consumption of these systems are dominated by their multipliers. A multiplier
manipulates two input operands to generate many partial products for subsequent addition,
which in CMOS circuit design requires many switching activities. This switching activity within the
functional unit accounts for the majority of the power consumption and also increases delay. Therefore, minimizing
switching activity can effectively reduce power dissipation and increase the speed of operation without
impacting the circuit's functional behavior. An energy-efficient multiplier is therefore highly desirable for many
multimedia applications.
The first multiplication algorithm developed for early computing requirements followed the
steps we use to multiply two numbers by hand [1]. According to Patterson and Hennessy [1], when this
algorithm was translated for computer use, it required five hardware components, as seen in Figure 1.1:
one register for each number (multiplicand, multiplier, and product), an ALU, and a
control unit. The algorithm multiplies each digit of the multiplier by the multiplicand and adds up the
individual results [1]. Since binary multiplication involves only 1s and 0s, multiplying each digit by the
multiplicand reduces to shifting and adding the multiplicand.
Figure 1.1 First Multiplication Hardware Implementation
As seen in Figure 1.1, the control tests the multiplier's least significant bit (LSB). If the LSB is 1, it
sends a signal to the ALU to add the multiplicand to the current product. The multiplier is then
shifted right to fetch the next bit, and the multiplicand is shifted left to prepare for the next multiplication
iteration. This algorithm, shown as a flowchart in Figure 1.2 [1], is the basis of the pen-and-paper method.
Figure 1.2 First Multiplication Algorithm Flowchart for 32-bit Numbers
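The control flow of Figure 1.2 can be modelled in software. Below is a small behavioral Python sketch (an illustrative model, not the hardware of Figure 1.1 and not the report's Verilog): it tests the multiplier's LSB, conditionally adds, and shifts both operands each iteration.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Shift-and-add multiplication, following the flowchart: test the
    multiplier's LSB; if it is 1, add the multiplicand to the product;
    then shift the multiplicand left and the multiplier right."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:           # the control tests the LSB
            product += multiplicand  # ALU adds multiplicand to product
        multiplicand <<= 1           # prepare for the next iteration
        multiplier >>= 1             # fetch the next bit
    return product

print(shift_add_multiply(123, 456))  # 56088
```

Note that the loop always runs for the full operand width, mirroring the fixed iteration count of the hardware.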
Many improvements to the traditional pen-and-paper algorithm have been attempted, mainly by
reducing the number of additions performed. In 1951, based on the observation that computers are faster at
shifting bits than adding them [1], Andrew Donald Booth developed what is now known as Booth's algorithm.
Many further refinements through the years have improved the efficiency and performance of
multiplication algorithms.
Here, an attempt is made to combine configurability, partially guarded computation, and the truncation
technique to design a high-speed and power-efficient configurable Booth multiplier (CBM). The main concerns are
speed, power efficiency, and structural flexibility. The proposed multiplier not only performs single 16-b,
single 8-b, or twin parallel 8-b multiplication operations but also offers a flexible tradeoff between output
accuracy and power consumption to achieve greater power savings.
Several techniques [1]-[3] for improving speed and power efficiency are analyzed. Approaches
such as guarded evaluation, clock gating, signal gating, and truncation reduce the power
consumption and increase the speed of multipliers by eliminating spurious computations according to the
dynamic range of the input operands. The work in [4] separated the arithmetic units into most and least
significant parts and turned off the most significant part when it did not affect the computation results, to save
power. Techniques in [5] that dynamically adjust two voltage supplies based on the range of the incoming
operands and disable ineffective ranges with zero-detection circuitry were presented to decrease the power
consumption of multipliers. In [6], a dynamic-range detector was developed to detect the effective range of the
two operands. The operand with the smaller dynamic range is used to generate the Booth encoding, so that partial
products have a greater opportunity to be zero, thereby reducing power consumption maximally.
Furthermore, in many multimedia and DSP systems the output product is frequently truncated due to the fixed
register size and bus width inside the hardware. Exploiting this characteristic, significant power savings can be
achieved by directly omitting the adder cells that compute the least significant bits of the output product,
although large truncation errors are introduced. Various error-compensation approaches and circuits have
therefore been proposed, which add estimated compensation carries to the carry inputs of the retained adder
cells to reduce the truncation error. In the constant scheme [7], constant error-compensation values were
pre-computed and added to reduce the truncation error. In contrast, data-dependent error-compensation
approaches [8]-[10] were developed to achieve better accuracy than the constant scheme, wherein
data-dependent error-compensation values are added to reduce the truncation error of array and Booth
multipliers (BMs).
Here, we attempt to combine configurability, partially guarded computation, and the truncation
technique to design a power-efficient configurable BM (CBM). Our main concerns are power efficiency and
structural flexibility. Since most common multimedia and DSP applications are based on 8- and 16-b operands,
the proposed multiplier is designed to perform not only single 16-b but also single 8-b or twin parallel 8-b
multiplication operations. The experimental results demonstrate that the proposed multiplier can provide various
configurable characteristics for multimedia and DSP systems and achieve greater power savings with only slight
area overhead.
Chapter 2
Multipliers
A binary multiplier is an electronic circuit used in digital electronics, such as
a computer, to multiply two binary numbers. It is built using binary adders.
A variety of computer arithmetic techniques can be used to implement a digital
multiplier. Most techniques involve computing a set of partial products, and then summing
the partial products together. This process is similar to the long-multiplication method
taught to primary schoolchildren for base-10 integers, but has been adapted here to a
base-2 (binary) numeral system.
History
Until the late 1970s, most minicomputers did not have a multiply instruction, and
so programmers used a "multiply routine" which repeatedly shifts and accumulates partial
results, often written using loop unwinding. Mainframe computers had multiply
instructions, but they performed the same sorts of shifts and adds as a "multiply routine".
Early microprocessors also had no multiply instruction. The Motorola 6809,
introduced in 1978, was one of the earliest microprocessors with a dedicated hardware
multiply instruction. It did the same sorts of shifts and adds as a "multiply routine", but
implemented in the microcode of the MUL instruction.
As more transistors per chip became available due to larger-scale integration, it
became possible to put enough adders on a single chip to sum all the partial products at
once, rather than reuse a single adder to handle each partial product one at a time.
Because some common digital signal processing algorithms spend most of their
time multiplying, digital signal processor designers sacrifice a lot of chip area in order to
make the multiply as fast as possible; a single-cycle multiply-accumulate unit often used
up most of the chip area of early DSPs.
Multiplication basics
The method taught in school for multiplying decimal numbers is based on calculating
partial products, shifting them to the left and then adding them together. The most difficult
part is to obtain the partial products, as that involves multiplying a long number by one
digit (from 0 to 9):
    123
  x 456
  =====
    738    (this is 123 x 6)
   615     (this is 123 x 5, shifted one position to the left)
+ 492      (this is 123 x 4, shifted two positions to the left)
  =====
  56088
A binary computer does exactly the same, but with binary numbers. In binary
encoding each long number is multiplied by one digit (either 0 or 1), and that is much
easier than in decimal, as the product by 0 or 1 is just 0 or the same number. Therefore, the
multiplication of two binary numbers comes down to calculating partial products (which
are 0 or the first number), shifting them left, and then adding them together (a binary
addition, of course):
       1011   (this is 11 in binary)
     x 1110   (this is 14 in binary)
     ======
       0000   (this is 1011 x 0)
      1011    (this is 1011 x 1, shifted one position to the left)
     1011     (this is 1011 x 1, shifted two positions to the left)
  + 1011      (this is 1011 x 1, shifted three positions to the left)
  =========
   10011010   (this is 154 in binary)
This is much simpler than in the decimal system, as there is no multiplication
table to remember: just shifts and adds.
This method is mathematically correct and has the advantage that a small CPU may
perform the multiplication using the shift and add features of its arithmetic logic unit
rather than a specialized circuit. The method is slow, however, as it involves many
intermediate additions, and these additions take a lot of time. Faster multipliers may be
engineered to perform fewer additions; a modern processor can multiply two 64-bit
numbers with 16 additions (rather than 64), and can do several steps in parallel.
The second problem is that the basic school method handles the sign with a separate
rule ("+ with + yields +", "+ with - yields -", etc.). Modern computers embed the sign of the
number in the number itself, usually in the two's complement representation. That forces
the multiplication process to be adapted to handle two's complement numbers, which
complicates the process a bit more. Similarly, processors that use ones' complement,
sign-and-magnitude, IEEE 754, or other binary representations require specific adjustments to
the multiplication process.
A more advanced approach: an unsigned example
For example, suppose we want to multiply two unsigned eight-bit integers together: a[7:0]
and b[7:0]. We can produce eight partial products by performing eight one-bit
multiplications, one for each bit in the multiplicand a:
p0[7:0] = a[0] * b[7:0] = {8{a[0]}} & b[7:0]
p1[7:0] = a[1] * b[7:0] = {8{a[1]}} & b[7:0]
p2[7:0] = a[2] * b[7:0] = {8{a[2]}} & b[7:0]
p3[7:0] = a[3] * b[7:0] = {8{a[3]}} & b[7:0]
p4[7:0] = a[4] * b[7:0] = {8{a[4]}} & b[7:0]
p5[7:0] = a[5] * b[7:0] = {8{a[5]}} & b[7:0]
p6[7:0] = a[6] * b[7:0] = {8{a[6]}} & b[7:0]
p7[7:0] = a[7] * b[7:0] = {8{a[7]}} & b[7:0]
where {8{a[0]}} means repeating a[0] (the 0th bit of a) 8 times (Verilog notation).
To produce our product, we then need to add up all eight of our partial products, as
shown here:
p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0]
+ p1[7] p1[6] p1[5] p1[4] p1[3] p1[2] p1[1] p1[0] 0
+ p2[7] p2[6] p2[5] p2[4] p2[3] p2[2] p2[1] p2[0] 0 0
+ p3[7] p3[6] p3[5] p3[4] p3[3] p3[2] p3[1] p3[0] 0 0 0
+ p4[7] p4[6] p4[5] p4[4] p4[3] p4[2] p4[1] p4[0] 0 0 0 0
+ p5[7] p5[6] p5[5] p5[4] p5[3] p5[2] p5[1] p5[0] 0 0 0 0 0
+ p6[7] p6[6] p6[5] p6[4] p6[3] p6[2] p6[1] p6[0] 0 0 0 0 0 0
+ p7[7] p7[6] p7[5] p7[4] p7[3] p7[2] p7[1] p7[0] 0 0 0 0 0 0 0
-------------------------------------------------------------------------------------------
P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
In other words, P[15:0] is produced by summing p0, p1, ..., p7, each shifted left by its bit index.
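The partial-product scheme above can be checked in software. The following is a behavioral Python sketch (not the report's Verilog; `unsigned_mul8` is an illustrative name): each partial product is formed by replicating one bit of a and ANDing it with b, and the eight products are then summed with the shifts shown in the summation table.

```python
def unsigned_mul8(a, b):
    """Multiply two unsigned 8-bit integers by forming eight partial
    products p0..p7 (each {8{a[i]}} & b) and summing them with the
    appropriate left shifts, giving the 16-bit result P[15:0]."""
    assert 0 <= a < 256 and 0 <= b < 256
    product = 0
    for i in range(8):
        replicated = 0xFF if (a >> i) & 1 else 0x00  # {8{a[i]}}
        p_i = replicated & b                         # partial product
        product += p_i << i                          # shift into place
    return product

print(unsigned_mul8(200, 100))  # 20000
```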
When the 1 added at bit position 0 (LSB) and all the -1's in bit columns 7 through 14 (where each of the MSBs
are located) are added together, they can be simplified to the single 1 that "magically"
floats out to the left. For an explanation and proof of why flipping the MSB saves us the
sign extension, see a computer arithmetic book.
Chapter 3
Booth's Algorithm
Booth's multiplication algorithm is a multiplication algorithm that multiplies two
signed binary numbers in two's complement notation. The algorithm was invented
by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck
College in Bloomsbury, London. Booth used desk calculators that were faster
at shifting than adding and created the algorithm to increase their speed. Booth's algorithm
is of interest in the study of computer architecture.
The algorithm
Booth's algorithm examines adjacent pairs of bits of theN-bit multiplierYin
signed two's complement representation, including an implicit bit below the least
significant bit,y-1 = 0. For each bityi, fori running from 0 toN-1, the bitsyi andyi-1 are
considered. Where these two bits are equal, the product accumulatorPremains unchanged.
Whereyi = 0 andyi-1 = 1, the multiplicand times 2i is added toP; and whereyi = 1 andyi-1 =
0, the multiplicand times 2i is subtracted fromP. The final value ofPis the signed product.
The representation of the multiplicand and product are not specified; typically,
these are both also in two's complement representation, like the multiplier, but any number
system that supports addition and subtraction will work as well. As stated here, the order of
the steps is not determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the
multiplication by 2^i is then typically replaced by incremental shifting of the P accumulator
to the right between steps; low bits can be shifted out, and subsequent additions and
subtractions can then be done just on the highest N bits of P [1]. There are many variations
and optimizations on these details.
The algorithm is often described as converting strings of 1's in the multiplier to a
high-order +1 and a low-order -1 at the ends of the string. When a string runs through the
MSB, there is no high-order +1, and the net effect is interpretation as a negative of the
appropriate value.
A typical implementation
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned
binary addition) one of two predetermined values A and S to a product P, then performing a
rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier,
respectively; and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers
should have a length equal to (x + y + 1).
   1. A: Fill the most significant (leftmost) bits with the value of m. Fill the
      remaining (y + 1) bits with zeros.
   2. S: Fill the most significant bits with the value of (-m) in two's complement
      notation. Fill the remaining (y + 1) bits with zeros.
   3. P: Fill the most significant x bits with zeros. To the right of this, append the
      value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   1. If they are 01, find the value of P + A. Ignore any overflow.
   2. If they are 10, find the value of P + S. Ignore any overflow.
   3. If they are 00, do nothing. Use P directly in the next step.
   4. If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step a single place to the right.
Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
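The five steps above can be sketched in Python as a behavioral model (not hardware; `booth_multiply` is an illustrative name). The sketch also keeps the extra guard bit on the left of A, S, and P, anticipating the correction for the most negative multiplicand that is discussed after the examples.

```python
def booth_multiply(m, r, x, y):
    """Booth's algorithm following steps 1-5. m, r are the signed
    multiplicand and multiplier; x, y are their bit widths. One extra
    guard bit beyond the (x + y + 1) length is kept on the left so the
    method also works when m is the most negative representable value."""
    n = x + y + 2                                 # x + y + 1, plus guard bit
    mask = (1 << n) - 1
    A = (m & ((1 << (x + 1)) - 1)) << (y + 1)     # m in the top bits
    S = (-m & ((1 << (x + 1)) - 1)) << (y + 1)    # -m in the top bits
    P = (r & ((1 << y) - 1)) << 1                 # r, with a 0 appended
    for _ in range(y):
        if P & 0b11 == 0b01:
            P = (P + A) & mask                    # last two bits 01: add A
        elif P & 0b11 == 0b10:
            P = (P + S) & mask                    # last two bits 10: add S
        P = (P >> 1) | (P & (1 << (n - 1)))       # arithmetic right shift
    P = (P >> 1) & ((1 << (x + y)) - 1)           # drop the appended bit
    if P >> (x + y - 1):                          # reinterpret as signed
        P -= 1 << (x + y)
    return P
```

For instance, `booth_multiply(3, -4, 4, 4)` reproduces the first example below and `booth_multiply(-8, 2, 4, 4)` the second.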
Example
Find 3 x (-4), with m = 3 and r = -4, and x = 4 and y = 4:
m = 0011, -m = 1101, r = 1100
A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0
Perform the loop four times:
1. P = 0000 1100 0. The last two bits are 00.
   P = 0000 0110 0. Arithmetic right shift.
2. P = 0000 0110 0. The last two bits are 00.
   P = 0000 0011 0. Arithmetic right shift.
3. P = 0000 0011 0. The last two bits are 10.
   P = 1101 0011 0. P = P + S.
   P = 1110 1001 1. Arithmetic right shift.
4. P = 1110 1001 1. The last two bits are 11.
   P = 1111 0100 1. Arithmetic right shift.
The product is 1111 0100, which is -12.
The above-mentioned technique is inadequate when the multiplicand is the most negative
number that can be represented (e.g. if the multiplicand has 4 bits then this value is -8).
One possible correction to this problem is to add one more bit to the left of A, S and P.
Below, we demonstrate the improved technique by multiplying -8 by 2 using 4 bits for the
multiplicand and the multiplier:
A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0
Perform the loop four times :
1. P = 0 0000 0010 0. The last two bits are 00.
P = 0 0000 0001 0. Right shift.
2. P = 0 0000 0001 0. The last two bits are 10.
P = 0 1000 0001 0. P = P + S.
P = 0 0100 0000 1. Right shift.
3. P = 0 0100 0000 1. The last two bits are 01.
P = 1 1100 0000 1. P = P + A.
P = 1 1110 0000 0. Right shift.
4. P = 1 1110 0000 0. The last two bits are 00.
P = 1 1111 0000 0. Right shift.
The product is 1111 0000 (after discarding the first and the last bit), which is -16.
Booth Recoding
Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by
recoding the numbers that are multiplied. It is the standard technique used in chip design,
and provides significant improvements over the "long multiplication" technique.
Shift and Add
A standard approach that might be taken by a novice to perform multiplication is to
"shift and add", or normal "long multiplication". That is, for each column in the multiplier,
shift the multiplicand the appropriate number of columns and multiply it by the value of the
digit in that column of the multiplier, to obtain a partial product. The partial products are
then added to obtain the final result:
            0 0 1 0 1 1
          x 0 1 0 0 1 1
          -------------
            0 0 1 0 1 1
          0 0 1 0 1 1
        0 0 0 0 0 0
      0 0 0 0 0 0
    0 0 1 0 1 1
  ---------------------
    0 0 1 1 0 1 0 0 0 1
With this system, the number of partial products is exactly the number of columns
in the multiplier.
Reducing the Number of Partial Products
It is possible to reduce the number of partial products by half, by using the
technique of radix-4 Booth recoding. The basic idea is that, instead of shifting and adding
for every column of the multiplier term and multiplying by 1 or 0, we only take every
second column, and multiply by +1, +2, -1, -2, or 0, to obtain the same results. So, to multiply by
7, we can multiply the partial product aligned against the least significant bit by -1, and
multiply the partial product aligned with the third column by 2:
Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)
Partial Product 1 = Multiplicand * 2, shifted left 2 bits (x 8)
This is the same result as the equivalent shift and add method:
Partial Product 0 = Multiplicand * 1, shifted left 0 bits (x 1)
Partial Product 1 = Multiplicand * 1, shifted left 1 bits (x 2)
Partial Product 2 = Multiplicand * 1, shifted left 2 bits (x 4)
Partial Product 3 = Multiplicand * 0, shifted left 3 bits (x 0)
The advantage of this method is the halving of the number of partial products. This is
important in circuit design as it relates to the propagation delay in the running of the
circuit, and the complexity and power consumption of its implementation.
It is also important to note that there is comparatively little complexity penalty in
multiplying by 0, +-1 or +-2. All that is needed is a multiplexer or equivalent, which has a delay
time that is independent of the size of the inputs. Negating two's complement numbers has the
added complication of needing to add a "1" to the LSB, but this can be overcome by adding
a single correction term with the necessary "1"s in the correct positions.
Radix-4 Booth Recoding
To Booth recode the multiplier term, we consider the bits in blocks of three, such that each
block overlaps the previous block by one bit. Grouping starts from the LSB, and the first
block only uses two bits of the multiplier (since there is no previous block to overlap):
Figure 1 : Grouping of bits from the multiplier term, for use in Booth recoding. The least
significant block uses only two bits of the multiplier, and assumes a zero for the third bit.
The overlap is necessary so that we know what happened in the last block, as the MSB of
the block acts like a sign bit. We then consult Table 1 to decide what the encoding will
be.
Block    Partial Product
 000      0
 001     +1 * Multiplicand
 010     +1 * Multiplicand
 011     +2 * Multiplicand
 100     -2 * Multiplicand
 101     -1 * Multiplicand
 110     -1 * Multiplicand
 111      0
Table 1 : Booth recoding strategy for each of the possible block values.
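Table 1 can be exercised with a small script. This is an illustrative Python sketch (names are not from the report) that maps each overlapping 3-bit block to its recoded digit, walking the multiplier from the LSB with the implicit 0 appended below it:

```python
# Radix-4 Booth recoding digit for each overlapping 3-bit block
# (bits y_(i+1) y_i y_(i-1)), per Table 1.
BOOTH_DIGIT = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}

def recode(multiplier, bits):
    """Return the Booth digits, least significant first. The multiplier
    is padded with the implicit 0 below the LSB; taking blocks from the
    two's complement pattern handles the sign automatically."""
    y = (multiplier & ((1 << bits) - 1)) << 1  # implicit y_(-1) = 0
    digits = []
    for i in range(0, bits, 2):                # one block per two columns
        block = (y >> i) & 0b111
        digits.append(BOOTH_DIGIT[block])
    return digits

print(recode(0b0111, 4))  # [-1, 2]  i.e. 7 = -1 + 2*4
```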
Since we use the LSB of each block to know what the sign bit was in the previous block,
and there are never any negative products before the least significant block, the LSB of the
first block is always assumed to be 0. Hence, we would recode our example of 7 (binary
0111) as :
0 1 1 1
block 0 : 1 1 0 Encoding : * (-1)
block 1 : 0 1 1 Encoding : * (2)
In the case where there are not enough bits to obtain an MSB for the last block, as below, we
sign-extend the multiplier by one bit.
0 0 1 1 1
block 0 : 1 1 0 Encoding : * (-1)
block 1 : 0 1 1 Encoding : * (2)
block 2 : 0 0 0 Encoding : * (0)
The previous example can then be rewritten as:
          0 0 1 0 1 1              , multiplicand
          0 1 0 0 1 1              , multiplier
            1   1  -1              , booth encoding of multiplier
  1 1 1 1 1 1 0 1 0 0              , negative term sign extended
      0 0 1 0 1 1
  0 0 1 0 1 1
                    1              , error correction for negation
  0 0 1 1 0 1 0 0 0 1              , discarding the carried high bit
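The worked example above can be reproduced programmatically. This is a hedged Python sketch (`booth_radix4_multiply` is an illustrative name, not from the report) that recodes the multiplier and sums one partial product per digit, shifted two positions per block:

```python
def booth_radix4_multiply(multiplicand, multiplier, bits):
    """Radix-4 Booth multiplication: recode the multiplier into
    overlapping 3-bit blocks, generate one partial product per block
    (0, +-1, or +-2 times the multiplicand), and sum the partial
    products with a left shift of two positions per block."""
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    y = (multiplier & ((1 << bits) - 1)) << 1   # implicit 0 below the LSB
    product = 0
    for i in range(0, bits, 2):
        digit = table[(y >> i) & 0b111]         # recoded digit for block
        product += (digit * multiplicand) << i  # partial product in place
    return product

print(booth_radix4_multiply(11, 19, 6))  # 209
```

Here 11 x 19 = 209 matches the hand calculation (multiplicand 001011, multiplier 010011, encoding +1 +1 -1).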
One possible implementation is in the form of a Booth recoder entity, such as the one in
Figure 2, with its outputs being used to form the partial product:
Figure 2 : Booth Recoder and its associated inputs and outputs.
In Figure 2:
- The zero signal indicates whether the multiplicand is zeroed before being used as a
partial product.
- The shift signal is used as the control to a 2:1 multiplexer, to select whether or not
the partial product bits are shifted left one position.
- Finally, the neg signal indicates whether or not to invert all of the bits to create a
negative product (which must be corrected by adding "1" at some later stage).
The described operations for Booth recoding and partial product generation can be
expressed in terms of logical operations if desired but, for synthesis, it was found to be
better to implement the truth tables in terms of VHDL case and if/then/else statements.
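The recoder's truth table can be modelled directly. This is an illustrative Python sketch (the report implements the recoder in VHDL); the signal names zero, shift, and neg follow Figure 2:

```python
def booth_recoder(block):
    """Control signals for one recoded 3-bit block, as in Figure 2:
    zero  - force the partial product to 0,
    shift - select the multiplicand shifted left one position (x2),
    neg   - invert all bits (negation completed later by adding 1)."""
    digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[block]
    return {"zero": digit == 0, "shift": abs(digit) == 2, "neg": digit < 0}
```

For example, block 100 (digit -2) asserts both shift and neg, matching Table 1.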
Sign Extension Tricks
Once the Booth recoded partial products have been generated, they need to be shifted and
added together in the following fashion:
      [Partial Product 1]
    [Partial Product 2] 0 0
  [Partial Product 3] 0 0 0 0
[Partial Product 4] 0 0 0 0 0 0
The problem with implementing this in hardware is that the first partial product needs to be
sign-extended by 6 bits, the second by 4 bits, and so on. This is easily achievable in
hardware, but requires more logic gates than if those bits could be kept permanently
constant.
1 1 1 1 1 1 1 0 1 0 0
0 0 0 0 0 1 0 1 1
0 0 0 1 0 1 1
            0 0 0 0 1          , error correction for negation
  0 0 1 1 0 1 0 0 0 1
Fortunately, there is a technique that achieves the same result with constant bits:
- Invert the most significant bit (MSB) of each partial product.
- Add an additional '1' to the MSB of the first partial product.