7/31/2019 M.tech Final Project
1/23
VIF COLLEGE OF ENGINEERING &
TECHNOLOGY
(Himayat Nagar, Gandipet X Road)
(Affiliated to Jawaharlal Nehru Technological University)
logo
P.G.DEPARTMENT OF ECE
Seminar Report
On
--------------------------------------
Submitted By
Name:
Roll No:
Branch:
VIF COLLEGE OF ENGINEERING &
TECHNOLOGY(---------)
Logo
P.G.DEPARTMENT OF ECE
Certificate
This is to certify that Mrs. ---------------------------------
bearing H.T. No. -------- has satisfactorily completed the course of
Seminar entitled --------------- prescribed by JNTU for the I
SEMESTER of M.Tech (Branch) during the academic year 2010-2011.
Faculty In-charge Head of the department
Acknowledgement
Abstract
This paper provides a detailed study of a configurable multiplier optimized for low power
and high speed operations and which can be configured either for single 16-bit
multiplication operation, single 8-bit multiplication or twin parallel 8-bit multiplication.
The output product can be truncated to further decrease power consumption and increase
speed by sacrificing a bit of output precision. Furthermore, the proposed multiplier
maintains an acceptable output quality with enough accuracy when truncation is performed.
Thus it provides a flexible arithmetic capacity and a tradeoff between output precision and
power consumption. The approach also dynamically detects the input range of multipliers
and disables the switching operation of the non effective ranges. Thus the ineffective
circuitry can be efficiently deactivated, thereby reducing power consumption and
increasing the speed of operation. Thus the proposed multiplier outperforms the
conventional multiplier in terms of power and speed efficiencies
List of Tables
List of Figures
Table of Contents
Acknowledgement
Abstract
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Multipliers
Chapter 3: Booth's Algorithm
Chapter 4: Timing and Area Analysis
Chapter 5: Conclusion
Appendix A. Test File Verilog Code
References
Chapter 1
Introduction
Portable multimedia and digital signal processing (DSP) systems, which typically require flexible
processing ability, low power consumption, and short design cycles, have become increasingly popular over the
past few years. Many multimedia and DSP applications are highly multiplication-intensive, so the
performance and power consumption of these systems are dominated by their multipliers. A multiplier
manipulates two input operands to generate many partial products for subsequent addition,
which in CMOS circuit design requires many switching activities. This switching activity within the
functional unit accounts for the majority of the power consumption and also increases delay. Therefore, minimizing
switching activity can effectively reduce power dissipation and increase the speed of operation without
impacting the circuit's functional behavior. An energy-efficient multiplier is therefore highly desirable for many
multimedia applications.
The first multiplication algorithm developed for early computing requirements followed the
steps we use to multiply two numbers by hand [1]. According to Patterson and Hennessy [1], when this
algorithm was translated for computer use, it required five hardware components, as seen in Figure 1.1:
one register for each number (multiplicand, multiplier, and product), an ALU, and a
control unit. The algorithm multiplies each digit of the multiplier by the multiplicand and adds up the
individual results [1]. Since binary multiplication involves only 1s and 0s, multiplying each digit by the
multiplicand reduces to shifting and adding the multiplicand.
Figure 1.1 First Multiplication Hardware Implementation
As seen in Figure 1.1, the control tests the multiplier's least significant bit (LSB). If the LSB is 1, it
sends a signal to the ALU to add the multiplicand to the current product. The multiplier is then
shifted right to fetch the next bit, and the multiplicand is shifted left to prepare for the next multiplication
iteration. This algorithm, shown as a flowchart in Figure 1.2 [1], is the basis of the pen-and-paper method.
Figure 1.2 First Multiplication Algorithm Flowchart for 32-bit Numbers
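The control flow of Figure 1.2 can be modelled in software. Below is a small behavioral Python sketch (an illustrative model, not the hardware of Figure 1.1 and not the report's Verilog): it tests the multiplier's LSB, conditionally adds, and shifts both operands each iteration.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Shift-and-add multiplication, following the flowchart: test the
    multiplier's LSB; if it is 1, add the multiplicand to the product;
    then shift the multiplicand left and the multiplier right."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:           # the control tests the LSB
            product += multiplicand  # ALU adds multiplicand to product
        multiplicand <<= 1           # prepare for the next iteration
        multiplier >>= 1             # fetch the next bit
    return product

print(shift_add_multiply(123, 456))  # 56088
```

Note that the loop always runs for the full operand width, mirroring the fixed iteration count of the hardware.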
Many improvements to the traditional pen-and-paper algorithm have been attempted, mainly by
reducing the number of additions performed. In 1951, based on the observation that computers are faster at
shifting bits than adding them [1], Andrew Donald Booth developed what is now known as Booth's algorithm.
Many further refinements through the years have improved the efficiency and performance of
multiplication algorithms.
Here, an attempt is made to combine configurability, partially guarded computation, and the truncation
technique to design a high-speed and power-efficient configurable Booth multiplier (CBM). The main concerns are
speed, power efficiency, and structural flexibility. The proposed multiplier not only performs single 16-b,
single 8-b, or twin parallel 8-b multiplication operations but also offers a flexible tradeoff between output
accuracy and power consumption to achieve greater power savings.
Several techniques [1]-[3] for improving speed and power efficiency are analyzed. Approaches
such as guarded evaluation, clock gating, signal gating, and truncation reduce the power
consumption and increase the speed of multipliers by eliminating spurious computations according to the
dynamic range of the input operands. The work in [4] separated the arithmetic units into most and least
significant parts and turned off the most significant part when it did not affect the computation results, to save
power. Techniques in [5] that dynamically adjust two voltage supplies based on the range of the incoming
operands and disable ineffective ranges with zero-detection circuitry were presented to decrease the power
consumption of multipliers. In [6], a dynamic-range detector was developed to detect the effective range of the
two operands. The operand with the smaller dynamic range is used to generate the Booth encoding, so that partial
products have a greater opportunity to be zero, thereby reducing power consumption maximally.
Furthermore, in many multimedia and DSP systems the output product is frequently truncated due to the fixed
register size and bus width inside the hardware. Exploiting this characteristic, significant power savings can be
achieved by directly omitting the adder cells that compute the least significant bits of the output product,
although large truncation errors are introduced. Various error-compensation approaches and circuits have
therefore been proposed, which add estimated compensation carries to the carry inputs of the retained adder
cells to reduce the truncation error. In the constant scheme [7], constant error-compensation values were
pre-computed and added to reduce the truncation error. In contrast, data-dependent error-compensation
approaches [8]-[10] were developed to achieve better accuracy than the constant scheme, wherein
data-dependent error-compensation values are added to reduce the truncation error of array and Booth
multipliers (BMs).
Here, we attempt to combine configurability, partially guarded computation, and the truncation
technique to design a power-efficient configurable BM (CBM). Our main concerns are power efficiency and
structural flexibility. Since most common multimedia and DSP applications are based on 8- and 16-b operands,
the proposed multiplier is designed to perform not only single 16-b but also single 8-b or twin parallel 8-b
multiplication operations. The experimental results demonstrate that the proposed multiplier can provide various
configurable characteristics for multimedia and DSP systems and achieve greater power savings with only slight
area overhead.
Chapter 2
Multipliers
A binary multiplier is an electronic circuit used in digital electronics, such as
a computer, to multiply two binary numbers. It is built using binary adders.
A variety of computer arithmetic techniques can be used to implement a digital
multiplier. Most techniques involve computing a set of partial products, and then summing
the partial products together. This process is similar to the long-multiplication method
taught to primary schoolchildren for base-10 integers, but has been adapted here to a
base-2 (binary) numeral system.
History
Until the late 1970s, most minicomputers did not have a multiply instruction, and
so programmers used a "multiply routine" which repeatedly shifts and accumulates partial
results, often written using loop unwinding. Mainframe computers had multiply
instructions, but they performed the same sorts of shifts and adds as a "multiply routine".
Early microprocessors also had no multiply instruction. The Motorola 6809,
introduced in 1978, was one of the earliest microprocessors with a dedicated hardware
multiply instruction. It did the same sorts of shifts and adds as a "multiply routine", but
implemented in the microcode of the MUL instruction.
As more transistors per chip became available due to larger-scale integration, it
became possible to put enough adders on a single chip to sum all the partial products at
once, rather than reuse a single adder to handle each partial product one at a time.
Because some common digital signal processing algorithms spend most of their
time multiplying, digital signal processor designers sacrifice a lot of chip area in order to
make the multiply as fast as possible; a single-cycle multiply-accumulate unit often used
up most of the chip area of early DSPs.
Multiplication basics
The method taught in school for multiplying decimal numbers is based on calculating
partial products, shifting them to the left and then adding them together. The most difficult
part is to obtain the partial products, as that involves multiplying a long number by one
digit (from 0 to 9):
    123
  x 456
  =====
    738    (this is 123 x 6)
   615     (this is 123 x 5, shifted one position to the left)
+ 492      (this is 123 x 4, shifted two positions to the left)
  =====
  56088
A binary computer does exactly the same, but with binary numbers. In binary
encoding each long number is multiplied by one digit (either 0 or 1), and that is much
easier than in decimal, as the product by 0 or 1 is just 0 or the same number. Therefore, the
multiplication of two binary numbers comes down to calculating partial products (which
are 0 or the first number), shifting them left, and then adding them together (a binary
addition, of course):
       1011   (this is 11 in binary)
     x 1110   (this is 14 in binary)
     ======
       0000   (this is 1011 x 0)
      1011    (this is 1011 x 1, shifted one position to the left)
     1011     (this is 1011 x 1, shifted two positions to the left)
  + 1011      (this is 1011 x 1, shifted three positions to the left)
  =========
   10011010   (this is 154 in binary)
This is much simpler than in the decimal system, as there is no multiplication
table to remember: just shifts and adds.
This method is mathematically correct and has the advantage that a small CPU may
perform the multiplication using the shift and add features of its arithmetic logic unit
rather than a specialized circuit. The method is slow, however, as it involves many
intermediate additions, and these additions take a lot of time. Faster multipliers may be
engineered to perform fewer additions; a modern processor can multiply two 64-bit
numbers with 16 additions (rather than 64), and can do several steps in parallel.
The second problem is that the basic school method handles the sign with a separate
rule ("+ with + yields +", "+ with - yields -", etc.). Modern computers embed the sign of the
number in the number itself, usually in the two's complement representation. That forces
the multiplication process to be adapted to handle two's complement numbers, which
complicates the process a bit more. Similarly, processors that use ones' complement,
sign-and-magnitude, IEEE 754, or other binary representations require specific adjustments to
the multiplication process.
A more advanced approach: an unsigned example
For example, suppose we want to multiply two unsigned eight-bit integers together: a[7:0]
and b[7:0]. We can produce eight partial products by performing eight one-bit
multiplications, one for each bit in the multiplicand a:
p0[7:0] = a[0] * b[7:0] = {8{a[0]}} & b[7:0]
p1[7:0] = a[1] * b[7:0] = {8{a[1]}} & b[7:0]
p2[7:0] = a[2] * b[7:0] = {8{a[2]}} & b[7:0]
p3[7:0] = a[3] * b[7:0] = {8{a[3]}} & b[7:0]
p4[7:0] = a[4] * b[7:0] = {8{a[4]}} & b[7:0]
p5[7:0] = a[5] * b[7:0] = {8{a[5]}} & b[7:0]
p6[7:0] = a[6] * b[7:0] = {8{a[6]}} & b[7:0]
p7[7:0] = a[7] * b[7:0] = {8{a[7]}} & b[7:0]
where {8{a[0]}} means repeating a[0] (the 0th bit of a) 8 times (Verilog notation).
To produce our product, we then need to add up all eight of our partial products, as
shown here:
p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0]
+ p1[7] p1[6] p1[5] p1[4] p1[3] p1[2] p1[1] p1[0] 0
+ p2[7] p2[6] p2[5] p2[4] p2[3] p2[2] p2[1] p2[0] 0 0
+ p3[7] p3[6] p3[5] p3[4] p3[3] p3[2] p3[1] p3[0] 0 0 0
+ p4[7] p4[6] p4[5] p4[4] p4[3] p4[2] p4[1] p4[0] 0 0 0 0
+ p5[7] p5[6] p5[5] p5[4] p5[3] p5[2] p5[1] p5[0] 0 0 0 0 0
+ p6[7] p6[6] p6[5] p6[4] p6[3] p6[2] p6[1] p6[0] 0 0 0 0 0 0
+ p7[7] p7[6] p7[5] p7[4] p7[3] p7[2] p7[1] p7[0] 0 0 0 0 0 0 0
-------------------------------------------------------------------------------------------
P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
In other words, P[15:0] is produced by summing p0, p1, ..., p7, each shifted left by its bit index.
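The partial-product scheme above can be checked in software. The following is a behavioral Python sketch (not the report's Verilog; `unsigned_mul8` is an illustrative name): each partial product is formed by replicating one bit of a and ANDing it with b, and the eight products are then summed with the shifts shown in the summation table.

```python
def unsigned_mul8(a, b):
    """Multiply two unsigned 8-bit integers by forming eight partial
    products p0..p7 (each {8{a[i]}} & b) and summing them with the
    appropriate left shifts, giving the 16-bit result P[15:0]."""
    assert 0 <= a < 256 and 0 <= b < 256
    product = 0
    for i in range(8):
        replicated = 0xFF if (a >> i) & 1 else 0x00  # {8{a[i]}}
        p_i = replicated & b                         # partial product
        product += p_i << i                          # shift into place
    return product

print(unsigned_mul8(200, 100))  # 20000
```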
When the 1 added at bit position 0 (LSB) and all the -1's in bit columns 7 through 14 (where each of the MSBs
are located) are added together, they can be simplified to the single 1 that "magically"
floats out to the left. For an explanation and proof of why flipping the MSB saves us the
sign extension, see a computer arithmetic book.
Chapter 3
Booth's Algorithm
Booth's multiplication algorithm is a multiplication algorithm that multiplies two
signed binary numbers in two's complement notation. The algorithm was invented
by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck
College in Bloomsbury, London. Booth used desk calculators that were faster
at shifting than adding and created the algorithm to increase their speed. Booth's algorithm
is of interest in the study of computer architecture.
The algorithm
Booth's algorithm examines adjacent pairs of bits of theN-bit multiplierYin
signed two's complement representation, including an implicit bit below the least
significant bit,y-1 = 0. For each bityi, fori running from 0 toN-1, the bitsyi andyi-1 are
considered. Where these two bits are equal, the product accumulatorPremains unchanged.
Whereyi = 0 andyi-1 = 1, the multiplicand times 2i is added toP; and whereyi = 1 andyi-1 =
0, the multiplicand times 2i is subtracted fromP. The final value ofPis the signed product.
The representation of the multiplicand and product are not specified; typically,
these are both also in two's complement representation, like the multiplier, but any number
system that supports addition and subtraction will work as well. As stated here, the order of
the steps is not determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the
multiplication by 2^i is then typically replaced by incremental shifting of the P accumulator
to the right between steps; low bits can be shifted out, and subsequent additions and
subtractions can then be done just on the highest N bits of P [1]. There are many variations
and optimizations on these details.
The algorithm is often described as converting strings of 1's in the multiplier to a
high-order +1 and a low-order -1 at the ends of the string. When a string runs through the
MSB, there is no high-order +1, and the net effect is interpretation as a negative of the
appropriate value.
A typical implementation
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned
binary addition) one of two predetermined values A and S to a product P, then performing a
rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier,
respectively; and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers
should have a length equal to (x + y + 1).
   1. A: Fill the most significant (leftmost) bits with the value of m. Fill the
      remaining (y + 1) bits with zeros.
   2. S: Fill the most significant bits with the value of (-m) in two's complement
      notation. Fill the remaining (y + 1) bits with zeros.
   3. P: Fill the most significant x bits with zeros. To the right of this, append the
      value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   1. If they are 01, find the value of P + A. Ignore any overflow.
   2. If they are 10, find the value of P + S. Ignore any overflow.
   3. If they are 00, do nothing. Use P directly in the next step.
   4. If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step a single place to the right.
Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
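The five steps above can be sketched in Python as a behavioral model (not hardware; `booth_multiply` is an illustrative name). The sketch also keeps the extra guard bit on the left of A, S, and P, anticipating the correction for the most negative multiplicand that is discussed after the examples.

```python
def booth_multiply(m, r, x, y):
    """Booth's algorithm following steps 1-5. m, r are the signed
    multiplicand and multiplier; x, y are their bit widths. One extra
    guard bit beyond the (x + y + 1) length is kept on the left so the
    method also works when m is the most negative representable value."""
    n = x + y + 2                                 # x + y + 1, plus guard bit
    mask = (1 << n) - 1
    A = (m & ((1 << (x + 1)) - 1)) << (y + 1)     # m in the top bits
    S = (-m & ((1 << (x + 1)) - 1)) << (y + 1)    # -m in the top bits
    P = (r & ((1 << y) - 1)) << 1                 # r, with a 0 appended
    for _ in range(y):
        if P & 0b11 == 0b01:
            P = (P + A) & mask                    # last two bits 01: add A
        elif P & 0b11 == 0b10:
            P = (P + S) & mask                    # last two bits 10: add S
        P = (P >> 1) | (P & (1 << (n - 1)))       # arithmetic right shift
    P = (P >> 1) & ((1 << (x + y)) - 1)           # drop the appended bit
    if P >> (x + y - 1):                          # reinterpret as signed
        P -= 1 << (x + y)
    return P
```

For instance, `booth_multiply(3, -4, 4, 4)` reproduces the first example below and `booth_multiply(-8, 2, 4, 4)` the second.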
Example
Find 3 x (-4), with m = 3 and r = -4, and x = 4 and y = 4:
m = 0011, -m = 1101, r = 1100
A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0
Perform the loop four times:
1. P = 0000 1100 0. The last two bits are 00.
   P = 0000 0110 0. Arithmetic right shift.
2. P = 0000 0110 0. The last two bits are 00.
   P = 0000 0011 0. Arithmetic right shift.
3. P = 0000 0011 0. The last two bits are 10.
   P = 1101 0011 0. P = P + S.
   P = 1110 1001 1. Arithmetic right shift.
4. P = 1110 1001 1. The last two bits are 11.
   P = 1111 0100 1. Arithmetic right shift.
The product is 1111 0100, which is -12.
The above-mentioned technique is inadequate when the multiplicand is the most negative
number that can be represented (e.g. if the multiplicand has 4 bits then this value is -8).
One possible correction to this problem is to add one more bit to the left of A, S and P.
Below, we demonstrate the improved technique by multiplying -8 by 2 using 4 bits for the
multiplicand and the multiplier:
A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0
Perform the loop four times :
1. P = 0 0000 0010 0. The last two bits are 00.
P = 0 0000 0001 0. Right shift.
2. P = 0 0000 0001 0. The last two bits are 10.
P = 0 1000 0001 0. P = P + S.
P = 0 0100 0000 1. Right shift.
3. P = 0 0100 0000 1. The last two bits are 01.
P = 1 1100 0000 1. P = P + A.
P = 1 1110 0000 0. Right shift.
4. P = 1 1110 0000 0. The last two bits are 00.
P = 1 1111 0000 0. Right shift.
The product is 1111 0000 (after discarding the first and the last bit), which is -16.
Booth Recoding
Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by
recoding the numbers that are multiplied. It is the standard technique used in chip design,
and provides significant improvements over the "long multiplication" technique.
Shift and Add
A standard approach that might be taken by a novice to perform multiplication is to
"shift and add", or normal "long multiplication". That is, for each column in the multiplier,
shift the multiplicand the appropriate number of columns and multiply it by the value of the
digit in that column of the multiplier, to obtain a partial product. The partial products are
then added to obtain the final result:
            0 0 1 0 1 1
          x 0 1 0 0 1 1
          -------------
            0 0 1 0 1 1
          0 0 1 0 1 1
        0 0 0 0 0 0
      0 0 0 0 0 0
    0 0 1 0 1 1
  ---------------------
    0 0 1 1 0 1 0 0 0 1
With this system, the number of partial products is exactly the number of columns
in the multiplier.
Reducing the Number of Partial Products
It is possible to reduce the number of partial products by half, by using the
technique of radix-4 Booth recoding. The basic idea is that, instead of shifting and adding
for every column of the multiplier term and multiplying by 1 or 0, we only take every
second column, and multiply by +1, +2, -1, -2, or 0, to obtain the same results. So, to multiply by
7, we can multiply the partial product aligned against the least significant bit by -1, and
multiply the partial product aligned with the third column by 2:
Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)
Partial Product 1 = Multiplicand * 2, shifted left 2 bits (x 8)
This is the same result as the equivalent shift and add method:
Partial Product 0 = Multiplicand * 1, shifted left 0 bits (x 1)
Partial Product 1 = Multiplicand * 1, shifted left 1 bits (x 2)
Partial Product 2 = Multiplicand * 1, shifted left 2 bits (x 4)
Partial Product 3 = Multiplicand * 0, shifted left 3 bits (x 0)
The advantage of this method is the halving of the number of partial products. This is
important in circuit design as it relates to the propagation delay in the running of the
circuit, and the complexity and power consumption of its implementation.
It is also important to note that there is comparatively little complexity penalty in
multiplying by 0, +-1 or +-2. All that is needed is a multiplexer or equivalent, which has a delay
time that is independent of the size of the inputs. Negating two's complement numbers has the
added complication of needing to add a "1" to the LSB, but this can be overcome by adding
a single correction term with the necessary "1"s in the correct positions.
Radix-4 Booth Recoding
To Booth recode the multiplier term, we consider the bits in blocks of three, such that each
block overlaps the previous block by one bit. Grouping starts from the LSB, and the first
block only uses two bits of the multiplier (since there is no previous block to overlap):
Figure 1 : Grouping of bits from the multiplier term, for use in Booth recoding. The least
significant block uses only two bits of the multiplier, and assumes a zero for the third bit.
The overlap is necessary so that we know what happened in the last block, as the MSB of
the block acts like a sign bit. We then consult Table 1 to decide what the encoding will
be.
Block    Partial Product
 000      0
 001     +1 * Multiplicand
 010     +1 * Multiplicand
 011     +2 * Multiplicand
 100     -2 * Multiplicand
 101     -1 * Multiplicand
 110     -1 * Multiplicand
 111      0
Table 1 : Booth recoding strategy for each of the possible block values.
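Table 1 can be exercised with a small script. This is an illustrative Python sketch (names are not from the report) that maps each overlapping 3-bit block to its recoded digit, walking the multiplier from the LSB with the implicit 0 appended below it:

```python
# Radix-4 Booth recoding digit for each overlapping 3-bit block
# (bits y_(i+1) y_i y_(i-1)), per Table 1.
BOOTH_DIGIT = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}

def recode(multiplier, bits):
    """Return the Booth digits, least significant first. The multiplier
    is padded with the implicit 0 below the LSB; taking blocks from the
    two's complement pattern handles the sign automatically."""
    y = (multiplier & ((1 << bits) - 1)) << 1  # implicit y_(-1) = 0
    digits = []
    for i in range(0, bits, 2):                # one block per two columns
        block = (y >> i) & 0b111
        digits.append(BOOTH_DIGIT[block])
    return digits

print(recode(0b0111, 4))  # [-1, 2]  i.e. 7 = -1 + 2*4
```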
Since we use the LSB of each block to know what the sign bit was in the previous block,
and there are never any negative products before the least significant block, the LSB of the
first block is always assumed to be 0. Hence, we would recode our example of 7 (binary
0111) as :
0 1 1 1
block 0 : 1 1 0 Encoding : * (-1)
block 1 : 0 1 1 Encoding : * (2)
In the case where there are not enough bits to obtain an MSB for the last block, as below, we
sign-extend the multiplier by one bit.
0 0 1 1 1
block 0 : 1 1 0 Encoding : * (-1)
block 1 : 0 1 1 Encoding : * (2)
block 2 : 0 0 0 Encoding : * (0)
The previous example can then be rewritten as:
          0 0 1 0 1 1              , multiplicand
          0 1 0 0 1 1              , multiplier
            1   1  -1              , booth encoding of multiplier
  1 1 1 1 1 1 0 1 0 0              , negative term sign extended
      0 0 1 0 1 1
  0 0 1 0 1 1
                    1              , error correction for negation
  0 0 1 1 0 1 0 0 0 1              , discarding the carried high bit
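The worked example above can be reproduced programmatically. This is a hedged Python sketch (`booth_radix4_multiply` is an illustrative name, not from the report) that recodes the multiplier and sums one partial product per digit, shifted two positions per block:

```python
def booth_radix4_multiply(multiplicand, multiplier, bits):
    """Radix-4 Booth multiplication: recode the multiplier into
    overlapping 3-bit blocks, generate one partial product per block
    (0, +-1, or +-2 times the multiplicand), and sum the partial
    products with a left shift of two positions per block."""
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    y = (multiplier & ((1 << bits) - 1)) << 1   # implicit 0 below the LSB
    product = 0
    for i in range(0, bits, 2):
        digit = table[(y >> i) & 0b111]         # recoded digit for block
        product += (digit * multiplicand) << i  # partial product in place
    return product

print(booth_radix4_multiply(11, 19, 6))  # 209
```

Here 11 x 19 = 209 matches the hand calculation (multiplicand 001011, multiplier 010011, encoding +1 +1 -1).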
One possible implementation is in the form of a Booth recoder entity, such as the one in
Figure 2, with its outputs being used to form the partial product:
Figure 2 : Booth Recoder and its associated inputs and outputs.
In Figure 2:
- The zero signal indicates whether the multiplicand is zeroed before being used as a
partial product.
- The shift signal is used as the control to a 2:1 multiplexer, to select whether or not
the partial product bits are shifted left one position.
- Finally, the neg signal indicates whether or not to invert all of the bits to create a
negative product (which must be corrected by adding "1" at some later stage).
The described operations for Booth recoding and partial product generation can be
expressed in terms of logical operations if desired but, for synthesis, it was found to be
better to implement the truth tables in terms of VHDL case and if/then/else statements.
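The recoder's truth table can be modelled directly. This is an illustrative Python sketch (the report implements the recoder in VHDL); the signal names zero, shift, and neg follow Figure 2:

```python
def booth_recoder(block):
    """Control signals for one recoded 3-bit block, as in Figure 2:
    zero  - force the partial product to 0,
    shift - select the multiplicand shifted left one position (x2),
    neg   - invert all bits (negation completed later by adding 1)."""
    digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[block]
    return {"zero": digit == 0, "shift": abs(digit) == 2, "neg": digit < 0}
```

For example, block 100 (digit -2) asserts both shift and neg, matching Table 1.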
Sign Extension Tricks
Once the Booth recoded partial products have been generated, they need to be shifted and
added together in the following fashion:
      [Partial Product 1]
    [Partial Product 2] 0 0
  [Partial Product 3] 0 0 0 0
[Partial Product 4] 0 0 0 0 0 0
The problem with implementing this in hardware is that the first partial product needs to be
sign-extended by 6 bits, the second by 4 bits, and so on. This is easily achievable in
hardware, but requires more logic gates than if those bits could be kept permanently
constant.
1 1 1 1 1 1 1 0 1 0 0
0 0 0 0 0 1 0 1 1
0 0 0 1 0 1 1
            0 0 0 0 1          , error correction for negation
  0 0 1 1 0 1 0 0 0 1
Fortunately, there is a technique that achieves the same result with constant bits:
- Invert the most significant bit (MSB) of each partial product.
- Add an additional '1' to the MSB of the first partial product.