DESIGN OF LOW POWER COMPLEX MULTIPLIER USING COMPRESSORS THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR DEGREE OF MASTER OF TECHNOLOGY IN VLSI DESIGN BY Nilay Chandrakant Ghumre UNDER GUIDANCE OF PROF. DR. R. B. Deshmukh Department of Electronics and Computer Science Engineering Visvesvaraya National Institute of Technology Nagpur, May 2010
DESIGN OF LOW POWER COMPLEX MULTIPLIER
USING COMPRESSORS
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR DEGREE OF MASTER OF TECHNOLOGY IN VLSI DESIGN
BY
Nilay Chandrakant Ghumre
UNDER GUIDANCE OF
PROF. DR. R. B. Deshmukh
Department of Electronics and Computer Science Engineering
Visvesvaraya National Institute of Technology
Nagpur, May 2010
DEPARTMENT OF ELECTRONICS AND COMPUTER SCIENCE
VISVESVARAYA NATIONAL INSTITUTE OF TECHNOLOGY
NAGPUR
Date:
CERTIFICATE
This is to certify that the thesis entitled “Design of Low Power
Complex Multiplier using Compressors” is bona fide work done at Visvesvaraya
National Institute of Technology, Nagpur, India by Nilay Chandrakant Ghumre and is
submitted to Visvesvaraya National Institute of Technology, Nagpur, India in partial
fulfillment of the degree of Master of Technology in VLSI Design.
(Dr. R. B. Deshmukh) (Dr. R. M. Patrikar)
Guide Head of Department
Department of Electronics and Computer Science Engineering
VNIT, Nagpur, India 440011. May 2010
DECLARATION
I herewith submit the thesis “Design of Low Power Complex
Multiplier using Compressors” to Visvesvaraya National Institute of Technology,
Nagpur for the degree of Master of Technology in VLSI Design. I carried it out under the
guidance of Prof. R. B. Deshmukh (Department of Electronics and Computer Science
Engineering).
This thesis has not been submitted to any other University/ Institute for
award of any degree or diploma.
Date: Nilay Chandrakant Ghumre
M. Tech, VLSI Design
VNIT, Nagpur, India.
Acknowledgement
I express my sincere gratitude to the many people who helped and
supported me during the project work. Without them I could not have completed the project
on time. I am thankful to my guide, Prof. Dr. R. B. Deshmukh, for his encouragement,
patience and valuable guidance throughout the entire project, to Prof. Dr. R. M. Patrikar for
his valuable suggestions, and to all members of the VLSI design lab for their cooperation
and coordination.
I also want to thank my colleagues and friends for their encouragement
while completing this project work. I want to thank my parents; without their emotional
and moral support nothing would have been possible. Their love and support always encouraged me.
Last but not least, I am very thankful to God, who provided me with good health and good
people around me.
Nilay Chandrakant Ghumre
ABSTRACT
In high-performance VLSI circuits, on-chip power density plays a
dominant role under both static and dynamic conditions because of shrinking device features. The
consumed power is usually dissipated as heat, affecting the performance and reliability of
the chip. The complex multiplier is an arithmetic circuit that is used extensively in DSP and
communication applications such as FFTs and digital filters. For fast circuit implementation,
a parallel multiplier is preferred. For large bit-width multiplications, a large number of
adders is required to perform the partial product addition.
Compressors are used to compress the partial product addition stages. Higher order
compressors permit the reduction of the vertical critical paths in a parallel multiplier,
resulting in a better speed-power product for the multiplier circuit. This thesis presents a novel
scheme for a 16*16 bit multiplier using thirteen different types of compressors. The
scheme is optimized for low power as well as high speed implementation over reported
schemes. It presents a low power multiplier design methodology that counts only the
number of 1’s in the partial products.
CONTENTS
1. INTRODUCTION
1.1 Introduction
1.2 Complex Number
1.2.1 Operation of Complex Numbers
1.3 Organization of Thesis
2. SURVEY OF COMPLEX MULTIPLICATION
2.1 General rule of Complex Multiplication
2.2 Cases of Multiplication
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficient
2.3.2 Multiplication of Complex Number using a low power parallel multiplier
2.4 Related Research
2.4.1 Braun Multiplier
2.4.2 Baugh-Wooley Multiplier
2.4.3 Multiplier using Bypassing circuitry
2.4.4 Multiplier using Adder-Subtractor Unit (ASU)
2.5 Signed Number Multiplication
2.5.1 Representation of Negative Numbers
2.5.2 Booth’s Recoding Algorithm
2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4
Introduction
The electronics industry has achieved a phenomenal growth over the last two
decades, mainly due to the rapid advances in integration technologies, large-scale
systems design - in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and consumer
electronics has been rising steadily, and at a very fast pace. Increasing demand for
portable electronics for computing and communication, as well as other applications, has
necessitated longer battery life, lower weight, and lower power consumption. In order to
satisfy these requirements, research activities focusing on low power/low voltage design
techniques are underway. Since 'power' is now one of the design decision variables, the
expanded design space required for low power has further increased the complexity of an
already non-trivial task. Low power design basically involves two concomitant tasks:
power estimation and analysis and power minimization. These tasks need to be carried
out at each of the levels in the design hierarchy, namely, the behavioral, architectural,
logic, circuit and physical levels.[1]
In the survey of the current state of the field, many of the salient power
estimation and minimization techniques proposed for low power VLSI design are
reviewed. For each of the design levels, we provide an overview of several power
estimation and minimization approaches and the CAD tools that support them. Finally,
future research issues are discussed that will be necessary in order to make the low power
design endeavor a successful one. In the majority of digital signal processing (DSP)
applications the critical operations are the multiplication and accumulation. Real-time
signal processing requires high speed and high throughput Multiplier unit that consumes
low power, which is always a key to achieve a high performance digital signal processing
system. The purpose of this work is design and implementation of a low power multiplier
unit with block enabling technique to save power[2].
1.1 Introduction
Device sizes are scaling down in line with Moore's Law. The sources of energy
consumption on a CMOS chip can be classified as static and dynamic power dissipation. The dominant component of energy consumption in CMOS is dynamic power consumption, caused by the actual effort of the circuit to switch. A first order approximation of the dynamic power consumption of CMOS circuitry is given by the formula:
P = C*V²*f
where P is the power, C is the effective switched capacitance, V is the supply voltage, and f is the frequency of operation. The power dissipation arises from the charging and discharging of the circuit node capacitances found on the output of every logic gate. Power management is the careful planning of the power budget for every subsystem of a VLSI chip. This is an especially important issue for today’s complex systems. The most important and successful use of power management is to deactivate a portion of the circuit when its computation is not required [3].
Every low-to-high logic transition in a digital circuit incurs a change of voltage, drawing energy from the power supply. A designer at the technological and architectural level can try to minimize the variables in this equation to minimize the overall energy consumption. However, power minimization is often a complex process of trade-offs between speed, area, and power consumption. The current work proposes a reduction of dynamic switching power in a 16*16 complex multiplier by using higher order compressors to reduce both the switching activity and the gate count.
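The first-order relation P = C*V²*f can be explored numerically. The following is a minimal sketch; the function name and the operating point (10 pF, 1.8 V, 100 MHz) are hypothetical values chosen only for illustration and do not come from the thesis.

```python
def dynamic_power(c_eff_farads, v_supply_volts, freq_hz):
    """First-order CMOS dynamic power in watts: P = C * V^2 * f."""
    return c_eff_farads * v_supply_volts ** 2 * freq_hz


# Hypothetical operating point, for illustration only.
baseline = dynamic_power(10e-12, 1.8, 100e6)
half_voltage = dynamic_power(10e-12, 0.9, 100e6)      # supply voltage halved
half_capacitance = dynamic_power(5e-12, 1.8, 100e6)   # switched capacitance halved

print(f"baseline         : {baseline * 1e3:.3f} mW")
print(f"half voltage     : {half_voltage / baseline:.2f} x baseline")
print(f"half capacitance : {half_capacitance / baseline:.2f} x baseline")
```

Halving the supply voltage cuts the estimate to a quarter, while halving the switched capacitance (which is what reduced switching activity and gate count aim at) cuts it to a half.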
Multipliers consume a large amount of power and incur most of their delay during partial product
addition. At this stage, most multipliers are designed with different kinds of adders
that can add two or three bits, or at most four bits when 4-2 compressors are used. For higher
order multiplications, a huge number of adders or compressors is needed to perform the
partial product addition. The binary counter property has been merged with the compressor
property to develop higher order compressors [3] [5].
1.2 Complex Number:-
A complex number is a number comprising a real and an imaginary part. It can be
written in the form a + bi, where a and b are real numbers, and i is the standard imaginary
unit with the property i² = −1. To construct a complex number, we associate with each
real number a second real number. A complex number is then an ordered pair of real
numbers (a, b).
Complex numbers were first conceived and defined to find solutions to cubic
equations. The solution of a general cubic equation in radicals (without trigonometric
functions) may require intermediate calculations containing the square roots of negative
numbers, even when the final solutions are real numbers. This ultimately led to the
fundamental theorem of algebra, which shows that with complex numbers, a solution
exists to every polynomial equation of degree one or higher. Complex numbers thus form
an algebraically closed field, where any polynomial equation has a root.
Complex numbers are usually written in the form a + bi, where a and b are real
numbers, and i is the imaginary unit, which has the property i² = −1. The real number a is
called the real part of the complex number, and the real number b is the imaginary part.
For example, 3 + 2i is a complex number, with real part 3 and imaginary part 2. If
Z = a + bi, the real part a is denoted by Re(Z) and the imaginary part b is denoted by Im(Z).
The complex numbers (C) are regarded as an extension of the real numbers (R) by
considering every real number as a complex number with an imaginary part of zero. The
real number a is identified with the complex number a + 0i. Complex numbers with a real
part of zero (Re(z)=0) are called imaginary numbers. Instead of writing 0 + bi, that
imaginary number is usually denoted as just bi. If b equals 1, instead of using 0 + 1i or 1i,
the number is denoted as i.
Two complex numbers are said to be equal if and only if their real parts are
equal and their imaginary parts are equal. In other words, if the two complex numbers are
In many real-time DSP applications, high performance is a prime target.
However, achieving this may be done at the expense of area, power dissipation and
accuracy. Attempts have been made to use alternative number systems to optimize the
realization of arithmetic blocks, maintaining high performance without incurring
prohibitive area and power increases[1].
Fourier transforms play an important role in many digital signal processing
applications including speech, signal and image processing. However, direct computation
of Discrete Fourier Transform (DFT) requires on the order of N² operations where N is
the transform size. Parallel-pipelined FFTs are preferred for both high throughput and
low power consumption.
2.1 General rule of Complex Multiplication:-
Consider two complex numbers: (a+bi) and (c+di) ,then
(a+bi).(c+di)=(ac-bd) + (ad+bc)i
(ac-bd) is the Real Part of Complex Multiplication and (ad+bc) is the Imaginary Part of
Complex Multiplication.
Remember that (ac–bd), the real part of the product, is the product of the real
parts minus the product of the imaginary parts, but (ad + bc), the imaginary part of the
product, is the sum of the two products of one real part and the other imaginary part.
The non-negative value √(a² + b²) is called the modulus of Z and is denoted as |Z|:
if Z = a + bi, then |Z| = √(a² + b²).
2.2 Cases of Multiplication:-
i) Multiplication of Complex Number with Real Number:-
In the above formula for multiplication, if d is zero, then you get a formula for
multiplying a complex number a+bi and a real number c together:
(a+bi).c = ac + bc i.
In other words, we just multiply both parts of the complex number by the real
number. For example, let us take two numbers (1+2i) and 3 then after multiplication
of these two numbers we get:-
(1+2i).3= 3+6i
Geometrically, when you double a complex number, just double the distance from the
origin, 0. Similarly, when you multiply a complex number z by 1/2, the result will be
half way between 0 and z. You can think of multiplication by 2 as a transformation
which stretches the complex plane C by a factor of 2 away from 0; and multiplication
by 1/2 as a transformation which squeezes C toward 0.
ii) Multiplication of Complex Number with Imaginary Number:-
In the above formula for multiplication, if c is zero, then you get a formula for
multiplying a complex number a+bi and an imaginary number di together:
(a+bi).di = -bd + ad i.
In other words, we multiply both parts of the complex number by d, and the negated
imaginary part becomes the new real part. For example, let us take two numbers (1+2i) and 3i then after
multiplication of these two numbers we get:-
(1+2i).3i = -6+3i
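The general rule of Section 2.1 and the two special cases above can be checked numerically. The sketch below uses a hypothetical helper name, complex_multiply, and simply evaluates (ac−bd) and (ad+bc) with four real multiplications.

```python
def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + bi)(c + di) using four real multiplications."""
    real = a * c - b * d   # product of real parts minus product of imaginary parts
    imag = a * d + b * c   # sum of the two cross products
    return real, imag


print(complex_multiply(1, 2, 3, 0))   # (1+2i) times the real number 3
print(complex_multiply(1, 2, 0, 3))   # (1+2i) times the imaginary number 3i
print(complex_multiply(1, 2, 3, 4))   # general case
```

Setting d = 0 or c = 0 reproduces the two worked examples in the text, (1+2i).3 and (1+2i).3i.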
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficiency:-
i) Complex Multiplication using LNS [2]:-
Complex multiplication for lower area, i.e. reducing the hardware cost of realizing a
Complex Multiplier, is explained below using the Logarithmic Number System (LNS). The LNS
based complex multiplier employs a correction algorithm. A conventional complex multiplier is composed of four real
multipliers, one adder and one subtractor. Attempts have been made to optimize the
realization of the complex multiplier by reducing the number of multipliers and
accumulating the partial products; however, the wider the input, the more partial product
layers must be added in order to compute the result. To solve this problem, one can
consider the LNS to realize the multiplication as shown in the equations
Xo = AC − BD = log⁻¹(log A + log C) − log⁻¹(log B + log D)
Yo = BC + AD = log⁻¹(log B + log C) + log⁻¹(log A + log D)
The figure shows the complex multiplier block diagram, which is composed of
logarithmic and anti-logarithmic converters and N-bit adders. This method can
significantly reduce the hardware needed to build a multiplier.
LNS provides a simple technique to compute multiplication at the cost of reduced
precision, so this approach has limited accuracy.
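The Xo and Yo equations above can be illustrated with a small floating-point sketch. It assumes base-2 logarithms, strictly positive operands (sign handling is left out), and a hypothetical frac_bits parameter that rounds each logarithm to a fixed number of fractional bits to mimic the reduced precision of a hardware log converter. None of these choices is taken from [2].

```python
import math


def lns_multiply(x, y, frac_bits=None):
    """Multiply two positive numbers as antilog(log x + log y), base 2."""
    log_x, log_y = math.log2(x), math.log2(y)
    if frac_bits is not None:
        # Assumed model of a finite-precision log converter.
        scale = 1 << frac_bits
        log_x = round(log_x * scale) / scale
        log_y = round(log_y * scale) / scale
    return 2.0 ** (log_x + log_y)


def lns_complex_multiply(a, b, c, d, frac_bits=None):
    """(A + Bi)(C + Di) for positive A, B, C, D following the Xo / Yo equations."""
    xo = lns_multiply(a, c, frac_bits) - lns_multiply(b, d, frac_bits)
    yo = lns_multiply(b, c, frac_bits) + lns_multiply(a, d, frac_bits)
    return xo, yo


print(lns_complex_multiply(3, 5, 7, 2))               # close to (11, 41)
print(lns_complex_multiply(3, 5, 7, 2, frac_bits=4))  # visibly less accurate
```

With only a few fractional bits in the logarithm the result drifts away from the exact product, which is the accuracy limitation noted in the text.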
ii) Complex Multiplier using OBC and DA [3]:-
A well known area-efficient method to implement a Complex Multiplier uses Offset
Binary Coding (OBC) and Distributed Arithmetic (DA). The structure of the Complex Multiplier using
OBC-DA is shown below:-
Figure 2.1. OBC-DA based Complex Multiplier structure [3]
It is formed by the following modules:
a) Two registers that each store a W-bit word (-(cR-cI) and -(cR+cI)), whose outputs are
connected to two multiplexers that are controlled by an XOR of the
input bits.
b) Two shift-accumulators (SA) that add and shift the multiplexer outputs.
In this structure a subtraction can happen in each cycle of the computation, in
contrast to the previous one, where it happens only during the last cycle. The extra-bit
slide is a bit-serial adder which is needed to complete the two’s complement in any
cycle. Another difference is that SA2 includes hardware for loading the offset
value (Ao) in the carry registers.
2.3.2 Multiplication of Complex Number using a low power parallel multiplier:-
The conventional technique of complex multiplication is given as
(A + Bj) . (C + Dj) = (AC – BD) + (AD + BC)j
It requires four multiplications and two additions. This technique uses a different
realization of complex multiplication that reduces the complexity of the circuit. The
canonical form of the obtained circuits makes them well suited for VLSI realizations.
Besides circuit reduction, the hardware or software for the control in the realization of the
algorithms is simplified, especially when either of these includes only complex
operations, as in an FFT. Each complex bit takes four possible values. Consequently, it
must be represented by two bits. This representation allows the development of
algorithms for operations with complex numbers and the ability to describe these
algorithms at the bit level. It is natural that these algorithms and the corresponding
circuits have great similarities to those for real numbers in two’s complement form.
Complex parallel multiplication is the most critical operation to realize. The parallel multiplier
includes specialized hardware circuitry designed to perform complex multiplication
operations at high speeds. The parallel multiplier requires significantly less die area than
conventionally required, which results in reduced manufacturing costs and reduced power
consumption [4].
2.4 Related Research:-
In FPGA designs, power reduction is possible only through reduced switching
activity, which determines the dynamic power. In general, dynamic power consumption is
defined as the power consumed while the clock is running and the external inputs are
switching. In general design practice, switching activity can be
reduced at various levels of the design flow. Architectural decisions in the early design
phases have the greatest impact. For high switching signals, delay balancing and
reduction of the number of logic levels are among the most efficient techniques to tackle
the power penalty. An obvious method to reduce the switching activity is to shut down the
idle part of the circuit, which is not in operating condition.
A general M x N parallel multiplier operates by computing the partial products in
parallel and by shifting and accumulating the partial products. In such a multiplier, switching activity is
poorly correlated with the input coefficient. Reducing the switching activity
of the components used in the design can minimize the power dissipation: if the kth bit of
the coefficient is zero, the kth row of adders need not be activated. However, this type of
multiplier does not help to reduce switching, since the adders switch unnecessarily
even if the kth bit is zero.
2.4.1 Braun Multiplier [4][5]:-
Figure 2.2 4x4 Braun Multiplier
The above figure shows the structure of a 4*4 Braun Multiplier. An n*n bit Braun
Multiplier requires n(n-1) adders and n² AND gates. In this technique each partial
product is added to the previous sum of partial products by using a row of adders. The
carry-out signals are shifted one bit to the left and then added to the sums of the first
adder row and the next partial product bits. The shifting of carry-out bits to the left is
done by the carry-save adder. As carry bits are passed diagonally downward to the next adder
stage, there is no horizontal carry propagation for the first four rows. Instead, the
respective carry bit is “saved” for the subsequent adder stage.
The Braun Multiplier has the drawback that the number of components required to
build it increases quadratically with the number of bits. This makes the
Braun Multiplier inefficient for large operands. The delay of the Braun Multiplier depends on the full adder cell
and also on the final adder in the last row. In this multiplier array, a full adder with balanced
carry and sum delays is desirable because sum and carry are both in the critical path.
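The carry-save structure described above can be modelled at bit level. The following is a behavioural sketch, not a gate-accurate netlist of Figure 2.2: it assumes n-1 carry-save rows of n-1 adders followed by one ripple row of n-1 adders, which gives the n(n-1) adders and n² AND gates quoted in the text. The function names are hypothetical.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return x ^ y ^ cin, (x & y) | (x & cin) | (y & cin)


def braun_multiply(a, b, n=4):
    """Unsigned n x n array multiply; returns (product, adder_count, and_count)."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    pp = [[a_bits[j] & b_bits[i] for j in range(n)] for i in range(n)]  # n*n ANDs
    adder_count = 0
    product = [pp[0][0]]          # bit 0 needs no adder
    s_in = pp[0][1:]              # sum inputs of the first adder row
    c_in = [0] * (n - 1)          # no saved carries yet
    for i in range(1, n):         # carry-save rows
        sums, carries = [], []
        for j in range(n - 1):
            s, c = full_adder(pp[i][j], s_in[j], c_in[j])
            sums.append(s)
            carries.append(c)
            adder_count += 1
        product.append(sums[0])               # lowest sum of the row is final
        s_in = sums[1:] + [pp[i][n - 1]]      # sums move one column right
        c_in = carries                        # carries are saved for the next row
    ripple = 0
    for k in range(n - 1):        # final ripple-carry row
        s, ripple = full_adder(s_in[k], c_in[k], ripple)
        product.append(s)
        adder_count += 1
    product.append(ripple)
    value = sum(bit << w for w, bit in enumerate(product))
    return value, adder_count, n * n


print(braun_multiply(13, 11))
```

For n = 4 the sketch reports 12 adders and 16 AND gates, matching n(n-1) and n².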
2.4.2 Baugh-Wooley Multiplier [6]:-
The Baugh-Wooley Multiplier is used for both unsigned and signed number
multiplication. Signed operands are represented in 2’s complement
form. Partial products are adjusted such that the negative signs move to the last step, which in
turn maximizes the regularity of the multiplication array. The Baugh-Wooley Multiplier
operates on signed operands with 2’s complement representation to make sure that the
signs of all partial products are positive.
To reiterate, the product of two numbers X and Y in 2’s complement form
can be obtained from the following product terms, each made of one AND gate.
Variables with bars denote prior inversions. Inverters are connected before the inputs of
the full adders or the AND gates as required by the algorithm. Each column represents the
addition in accordance with the respective weight of the product term.
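Since the product-term equation is not reproduced here, the sketch below uses one common formulation of the scheme (often called the modified Baugh-Wooley form), which may differ in detail from the array in [6]: every partial product that involves exactly one sign bit is inverted, and two constant 1s are added at weights 2^n and 2^(2n-1), so that all terms are added and none is subtracted. The function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def baugh_wooley_multiply(x, y, n=4):
    """Signed n x n product built only from positively weighted terms."""
    xb = [(x >> i) & 1 for i in range(n)]   # Python shifts sign-extend negatives
    yb = [(y >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            term = xb[i] & yb[j]
            if (i == n - 1) != (j == n - 1):
                term ^= 1                   # exactly one sign bit: invert the term
            total += term << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))  # fixed correction constants
    return to_signed(total, 2 * n)


print(baugh_wooley_multiply(-5, 3))
print(baugh_wooley_multiply(-8, -8))
```

The inversions correspond to the barred variables mentioned in the text; the result is taken modulo 2^(2n) and read back as a signed number.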
2.4.3 Multiplier using Bypassing circuitry:-
The main idea of this technique is based on the observation that
most modern multipliers produce a large number of signal transitions while adding zero
partial products. If any bit of the multiplier is zero, the corresponding row of adders need not be
activated, since the corresponding partial product is zero. The adders of a conventional multiplier,
however, perform summation of the zero partial products and, as a result, exhibit redundant
signal switching. The increased activity of the internal nodes results in unnecessary
power dissipation [7] [8].
To disable such an adder row, the partial sum of the previous adder
row is bypassed to the next adder row. This removes the unnecessary transitions by bypassing inputs to
outputs when the corresponding partial product is zero. Multiplexers are used at the output of
each full adder to pass the incoming partial sum directly to the next stage when the partial product is zero.
Figure 2.3 4*4 Bypass Multiplier
The tri-state buffers, placed at the inputs of the adder cell, disable signal transitions in
those adding cells which are bypassed. The output carry bits c are passed downwards,
instead of to the right [9].
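The effect of row bypassing can be sketched behaviourally. The model below does not reproduce the multiplexer and tri-state circuitry of Figure 2.3; it only assumes one adder row per multiplier bit and counts how many rows are active versus bypassed for a given operand. The function name is hypothetical.

```python
def row_bypass_multiply(a, b, n=4):
    """Unsigned multiply; returns (product, active_rows, bypassed_rows)."""
    acc = 0
    active = bypassed = 0
    for i in range(n):
        if (b >> i) & 1:
            acc += a << i     # row i is enabled and adds the shifted multiplicand
            active += 1
        else:
            bypassed += 1     # row i is skipped; acc passes through unchanged
    return acc, active, bypassed


print(row_bypass_multiply(13, 0b0101))
print(row_bypass_multiply(13, 0b1111))
```

A multiplier operand with many zero bits leaves most rows idle, which is where the switching-power saving comes from.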
2.4.4 Multiplier using Adder-Subtractor Unit (ASU) [4]:-
In this technique, higher power reduction can be achieved if the operand
contains more 0’s than 1’s. This approach uses a Binary / Booth
Recoding Unit which forces the operand to have more zeros. The advantage
is that if the operand contains runs of successive ones, the Binary / Booth
Recoding Unit converts these ones into zeros. The Adder-Subtractor Unit also removes the extra
2’s complement addition circuitry otherwise needed. The use of a look up table is a further
advantage of this design.
The switching activity of the components used in the design depends on the input
bit coefficient. This means that if the input bit coefficient is zero, the corresponding row or
column of adders need not be activated. The more zeros the operand contains, the higher the power
reduction that can be achieved, which is why the Binary / Booth Recoding Unit is used to
increase the number of zeros in the operand.
Figure 2.4 4*4 ASU Multiplier [4]
The figure shows the 4x4 low power ASU multiplier structure. This technique is
very useful for wider multiplicands, especially when they contain
runs of successive ones. Each ASU works as an adder or subtractor depending
upon the sign bit in the sign register. For multiplication with −1 the ASU works as a
subtractor, and with 0 and 1 it works as an adder. The great advantage of this
technique is that no extra addition circuitry is needed to add sign extension bits when the
multiplicand bit is –1. In the upper row of the architecture the sign bits must be ANDed with b0,
since when sj=1 and b0=0 the outputs would otherwise be wrong. At the bottom, the ASU
works as a half adder or subtractor depending upon the sign bits. For wider
multiplicands the smart adder chain continues.
Figure 2.5 Adder Subtractor Unit[1]
Figure 2.6: - Smart Adder (SA)
The Modified Full Adder-Subtractor Unit is constructed as shown in Figure 2.5. If aj is zero,
the FA is disabled. Here sj is the sign bit of the operand. The structure of the smart adder is shown in
Figure 2.6.
2.5 Signed Number Multiplication:-
As we have seen, in unsigned multiplication the user has to input the number as well as
its sign, so the complete multiplier requires more hardware and more
switching operations; hence the switching power, i.e. dynamic power, will be higher for
Unsigned Multiplication.
In Signed Multiplication, the user enters the signed number directly, so there is no
need to enter a separate sign bit for all four numbers. The only difference between a Signed
number and an Unsigned number is the range of the number. As we saw earlier in section
3.1, the range of an n-bit Unsigned number is from 0 to 2ⁿ−1. The range of an n-bit Signed
number is from −2ⁿ⁻¹ to +(2ⁿ⁻¹−1).
2.5.1 Representation of Negative Numbers:-
For fixed-point numbers in a radix r system, we have to decide how negative
numbers are represented. Two different forms are commonly used:-
1. Sign and Magnitude Representation.
2. Complement Representation.
1. Sign and Magnitude Representation:-
In this form of representation the sign and magnitude are represented separately. The first
digit is the sign bit and the remaining (n-1) bits are the magnitude. In the binary case, ‘0’
represents positive and ‘1’ represents negative. In the non-binary case, the values 0
and (r-1) are assigned to the sign digit of positive and negative numbers, respectively. In
the binary case all 2ⁿ sequences are utilized. The 2ⁿ⁻¹ sequences from 00----0 to 01----1
represent positive numbers, while the remaining 2ⁿ⁻¹ sequences from 10----0 to 11----1
represent negative numbers. A major disadvantage of the signed-magnitude
representation is that the operation to be performed may depend on the signs of the
operands. For example, when adding a positive number X and a negative number –Y, we
need to perform the calculation X+(-Y). If Y>X, then we should obtain as the final result
–(Y-X). For that we have to perform (Y-X), i.e., switch the order of the operands and
perform subtraction rather than addition, and then attach a minus sign to the result.
Example:- +7 would be 111 with a 0 in front, so 00000111 in an 8-bit representation.
-9 would be 1001 (+9) with a 1 in front, so 10001001 in an 8-bit representation.
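The encoding examples and the sign-dependent addition described above can be sketched as follows. The helper names are hypothetical, and returning +0 when the magnitudes cancel is an assumption of this sketch.

```python
def to_sign_magnitude(value, bits=8):
    """Return the sign-and-magnitude bit string of value (sign bit first)."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in bits - 1 bits")
    return format((sign << (bits - 1)) | magnitude, f"0{bits}b")


def add_sign_magnitude(sx, mx, sy, my):
    """Add two (sign, magnitude) pairs; the operation depends on the signs."""
    if sx == sy:
        return sx, mx + my        # same sign: add magnitudes
    if mx == my:
        return 0, 0               # assumption: cancelation yields +0
    if mx > my:
        return sx, mx - my        # larger magnitude decides the sign
    return sy, my - mx            # operands swapped, as in -(Y - X)


print(to_sign_magnitude(7))
print(to_sign_magnitude(-9))
print(add_sign_magnitude(0, 3, 1, 5))   # 3 + (-5)
```

The last call swaps the operands and subtracts, which is the extra decision logic that makes sign-magnitude arithmetic less convenient in hardware.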
2. Complement Representation:-
In complement representation, numbers are represented in two’s complement form in
the binary case. In this method, a positive number is represented in the same way as in the
signed-magnitude method. It is the most widely used method of representation. Positive
numbers are simply represented as a binary number with ‘0’ as the sign bit. To get the negative
of a number, convert all 0’s to 1’s and all 1’s to 0’s, and then add ‘1’. Suppose we have a number
in 2’s complement form and have to find its value: if the number
starts with ‘0’ it is a positive number, and if it starts with ‘1’ it is a
negative number.
If the number is negative, take its 2’s complement to get its magnitude
in ordinary binary. Let us take 1101. Taking the 2’s complement gives 0011.
As the number starts with ‘1’ it is negative, and 0011 is the binary representation of
positive 3. So the number is -3. Other negative numbers are represented in
2’s complement form in the same way.
Suppose we add +5 and -5 in decimal; we get ‘0’. Now represent these
numbers in 2’s complement form: +5 is 0101 and -5 is 1011. On adding
these two numbers we get 10000. Discarding the carry, the result is ‘0’.
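The worked examples above (1101 as -3, and +5 plus -5 giving 10000 before the carry is discarded) can be reproduced with a short sketch. The function names are hypothetical.

```python
def to_twos_complement(value, bits=4):
    """Return the bits-wide 2's complement bit string of value."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range for this width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(bit_string):
    """Decode a 2's complement bit string into a Python int."""
    raw = int(bit_string, 2)
    return raw - (1 << len(bit_string)) if bit_string[0] == "1" else raw


print(to_twos_complement(5), to_twos_complement(-5))
print(from_twos_complement("1101"))

raw_sum = int("0101", 2) + int("1011", 2)
print(format(raw_sum, "b"))              # five bits, including the carry
print(format(raw_sum & 0b1111, "04b"))   # carry discarded
```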
For signed multiplication we modified the Complex Multiplication
strategy. Normally it needs four multipliers and two adder/subtractor blocks,
but the modified strategy requires three multipliers and five adders.
For Complex Multiplication of two numbers:-
(a+jb).(c+jd) we get
Real Part:- (c-d).b + c.(a-b)
Imaginary Part:- (c+d).a – c.(a-b)
So only three multiplication terms are required, as c.(a-b) is a common term in
both results. Since a multiplier costs far more than an adder, this saves power compared with the previous method of
Complex Multiplication.
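The three-multiplier form can be checked against the direct four-multiplier rule. In the sketch below (hypothetical function name) the shared term c.(a-b) is computed once and reused.

```python
def complex_multiply_3mult(a, b, c, d):
    """(a + jb)(c + jd) with three real multiplications and five additions."""
    common = c * (a - b)          # shared term, computed once
    real = (c - d) * b + common   # expands to ac - bd
    imag = (c + d) * a - common   # expands to ad + bc
    return real, imag


print(complex_multiply_3mult(1, 2, 3, 4))
```

Expanding the products shows why it works: (c-d)b + c(a-b) = ac - bd and (c+d)a - c(a-b) = ad + bc.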
2.5.2 Booth’s Recoding Algorithm:-
Parallel multiplication using the basic Booth’s Recoding algorithm
is based on the fact that a partial product can be generated for a group of consecutive 0’s and 1’s; this is called Booth’s Recoding. The Booth’s Recoding algorithm is used to generate partial products efficiently. These partial products always have more bits than the input operands. The width of a partial product depends on the radix scheme used for recoding. The generated partial products are added by compressors as explained in section 3.2. This scheme uses fewer partial products and therefore consumes less power and area.
There are two types of algorithm, Radix-2 and Radix-4, to generate efficient partial products for multiplication. First we explain the basic technique of Booth’s Recoding algorithm and then the Modified Booth’s Recoding technique for both the Radix-2 and Radix-4 algorithms.
2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4:-
Booth proposed a recoding algorithm for high speed multiplication which reduces the number of partial products. Booth’s algorithm for multiplication is based on this observation. To do a multiplication A*B, where
A = an, an-1 ….. a0 is the multiplier
B = bn, bn-1 ….. b0 is the multiplicand
we check every two consecutive bits in A at a time:-
Ai   Ai-1   Y      Comments           Explanation
0    0      0      Middle of 0’s      String of 0’s, shift only
0    1      1.B    End of 1’s         Add and shift
1    0      -1.B   Beginning of 1’s   Add and shift
1    1      0      Middle of 1’s      String of 1’s, shift only
Table 2.1. Booth’s Recoding algorithm Radix-2
Ai+1   Ai   Ai-1   Y      Comments           Explanation
0      0    0      0      String of 0’s      Two bit shift only
0      0    1      1.B    End of 1’s         Add and two bit shift
0      1    0      1.B    A single 1         Add and two bit shift
0      1    1      2.B    End of 1’s         Add and two bit shift
1      0    0      -2.B   Beginning of 1’s   Add and two bit shift
1      0    1      -1.B   A single 0         Add and two bit shift
1      1    0      -1.B   Beginning of 1’s   Add and two bit shift
1      1    1      0      String of 1’s      Two bit shift only
Table 2.2. Booth’s Recoding algorithm Radix-4
Let us take an example:-
Radix-2:-
Suppose A is the Multiplier with value -5 and B is the Multiplicand with value +2, then
B=> 0010 (+2)
A=> 1011 (-5)
Looking at the above table for the multiplier, we first examine the LSB of A together with an
implied 0 to its right, and then each adjacent pair of bits in A. We get the partial products as:-
i) For 10 we have to perform -1.B, i.e., the 2’s complement of B, 1110.
ii) For 11 we have to put all 0’s, i.e., 0000.
iii) For 01 we have to perform 1.B, i.e., the value of B, 0010.
iv) For 10 again -1.B, i.e., 1110.
Here, some extra bits, called correction bits, are appended to match the width of the partial
products.
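The radix-2 recoding of Table 2.1 can be sketched in a few lines. The function names are hypothetical; each digit is a(i-1) minus a(i), with an implied a(-1) = 0, which gives +1 for the pair 01, -1 for 10, and 0 for 00 and 11.

```python
def booth_radix2_digits(a, n):
    """Recode an n-bit 2's complement multiplier into digits -1, 0, +1 (LSB first)."""
    digits = []
    prev = 0                          # implied bit to the right of the LSB
    for i in range(n):
        cur = (a >> i) & 1
        digits.append(prev - cur)     # pair (cur, prev): 01 -> +1, 10 -> -1
        prev = cur
    return digits


def booth_radix2_multiply(a, b, n):
    """Sum the recoded partial products digit * B * 2^i."""
    return sum(d * b * (1 << i) for i, d in enumerate(booth_radix2_digits(a, n)))


print(booth_radix2_digits(-5, 4))
print(booth_radix2_multiply(-5, 2, 4))
```

For A = -5 (1011) the digits, LSB first, are -1, 0, +1, -1, which are the four partial products -1.B, 0, 1.B, -1.B listed in the example.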
Radix-4:-
A=> -5 => 1 1 1 1 1 0 1 1
B=> 46 => 0 0 1 0 1 1 1 0
then the following Partial Products are generated:-
In the above technique of Booth’s Algorithm the vertical length of the partial products is larger, hence more adders are required, so power and area will be higher.
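For the radix-4 case, the recoding of Table 2.2 can be sketched with a lookup keyed by the overlapping bit triplet (Ai+1, Ai, Ai-1). The names below are hypothetical, and the operand width n is assumed to be even.

```python
# Digit selected by the triplet (A[i+1], A[i], A[i-1]), as in Table 2.2.
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(a, n):
    """Recode an n-bit (n even) 2's complement multiplier into n/2 digits, LSB first."""
    digits = []
    prev = 0                               # implied bit to the right of the LSB
    for i in range(0, n, 2):
        lo = (a >> i) & 1
        hi = (a >> (i + 1)) & 1
        digits.append(RADIX4_TABLE[(hi, lo, prev)])
        prev = hi                          # triplets overlap by one bit
    return digits


def booth_radix4_multiply(a, b, n):
    """Sum the recoded partial products digit * B * 4^k."""
    return sum(d * b * (4 ** k) for k, d in enumerate(booth_radix4_digits(a, n)))


print(booth_radix4_digits(-5, 8))
print(booth_radix4_multiply(-5, 46, 8))
```

An 8-bit multiplier yields four recoded digits instead of eight, so half as many partial products have to be added.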
-:References:-
[1] Solomentsev, E.D. (2001), "Complex number", in Hazewinkel, Michiel,
Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104
[2] Man Yan Kong; Langlois, J.M.P.; Al-Khalili, D. (2008), "Efficient FPGA
implementation of complex multipliers using the logarithmic number system", Circuits
and Systems, 2008. ISCAS 2008. IEEE International Symposium on Digital Object
[6] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication
algorithm", IEEE Trans. Comput., Dec. 1973, vol. C-22, pp. 1045-1047.
[7] Ko-Chi Kuo; Chi-Wen Chou (2006), "Low Power Multiplier with Bypassing
and Tree Structure", Circuits and Systems, 2006. APCCAS 2006. IEEE
Asia Pacific Conference, 4-7 Dec. 2006, pp. 602-605.
[8] J. Ohban, V.G. Moshnyaga, and K. Inoue, "Multiplier energy reduction through
bypassing of partial products", Asia-Pacific Conf. on Circuits and Systems, 2002, vol. 2,
pp. 13-17.
[9] Ming-Chen Wen, Sying-Jyan Wang, and Yen-Nan Lin, "Low Power Parallel
Multiplier with Column Bypassing", Electronics Letters, 12 May 2005, Volume
41, Issue 10, pp. 581-583.
Chapter 3.
Multiplier Unit
As explained in the previous chapters on the various techniques for Complex
Multipliers, a Complex Multiplier is implemented
using more than one Basic Multiplier; i.e. the normal
way to implement Complex Multiplication requires four Basic Multipliers. To make
the Complex Multiplier a low power unit, these Basic Multipliers are designed using the
Compressor technique. If the Basic Multiplier is designed for low power, then the Complex
Multiplier also becomes a low power unit.
Figure 3.1 Internal Block Diagram of 16*16 Basic Multiplier[2]
The above figure shows Internal Block Diagram of Basic Multiplier. It consists of three
stages:-
i) Partial Product Generator
ii) Different Order Compressors
iii) Parallel Adder
Below is the description of all three blocks that are used for multiplication.
3.1 Partial Product Generator:-
In an Unsigned Multiplier, we normally generate partial products and add
them to produce the result of the multiplier. Let ‘A’ and ‘B’ be two n-bit unsigned numbers
whose product ‘Z’ is 2n bits wide. First we generate the partial
products using the ‘AND’ operation. For an n-bit multiplication, n*n
partial products are generated.
Let us take two 16-bit numbers, A15-A0 called the Multiplicand and B15-B0 called the
Multiplier, as inputs of the multiplier. Partial products are generated by ANDing each bit of
‘A’ with each bit of ‘B’, so 16*16=256 partial products are generated. Each bit of the multiplicand is ANDed with every bit of the multiplier. a0 is ANDed with b0-b15, producing the sixteen partial products m00-m015 of the first row. Similarly, for the other 15 rows we AND a1-a15 with b0-b15 to produce the remaining 240 partial products, i.e. from m10 to m1515.
Figure 3.2. Partial Product Generator(4 Bit)
In the above diagram the Partial Product Generator is explained for the 4-bit case. Each multiplicand bit a0-a3 is ANDed with the multiplier bits b0-b3, producing sixteen partial products m00-m33. These Partial Products go to the inputs of the Compressors, which are used to reduce the partial product stages to only two stages.
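The AND-based generation described above can be sketched in a few lines of Python. The function names are hypothetical; the sketch assumes m[i][j] = a_i AND b_j, following the m00-m33 naming of the figure, and checks that the weighted sum of the partial products equals the product.

```python
def partial_products(a, b, n):
    """Return the n x n grid m[i][j] = a_i AND b_j for n-bit unsigned a and b."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    return [[ai & bj for bj in b_bits] for ai in a_bits]


def product_from_partials(m):
    """Each partial product m[i][j] carries the weight 2**(i + j)."""
    return sum(bit << (i + j) for i, row in enumerate(m) for j, bit in enumerate(row))


m = partial_products(13, 11, 4)
print(sum(len(row) for row in m))  # number of partial products for 4 bits
print(product_from_partials(m))    # weighted sum of all partial products
```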
3.2 Different Order Compressors[1][3][4]:-
After generation, the partial products go to the inputs of the compressors. Compressors are used to reduce the partial product stages of the multiplier. The main operation of a compressor is to count the number of 1’s. After generating the partial products we have to make vertical groups. Each vertical group counts its number of 1’s, and the count value of that group is passed on to the second stage.
3.2.1 Adder as Counter:-
An adder circuit, whether it is a full adder or a half adder, can be used as a counter which counts the number of 1’s.
The above table shows the half adder and full adder used as counters. If the inputs are A, B and C, then carry and sum together give the number of 1’s in binary form: Carry is the Most Significant Bit and Sum is the Least Significant Bit. The full adder takes three inputs and generates two outputs, so it compresses three bits into two bits and is called a 3:2 compressor. Similarly, on the basis of this logic we can make other types of compressors having more inputs, called higher order compressors. These compressors count the number of 1’s among a larger number of inputs. So, as the vertical length of the partial products increases, we can use these higher order compressors.
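The counting behaviour of the two adders can be verified exhaustively with a small Python sketch. The gate equations below are the textbook ones and the function names are chosen only for illustration.

```python
def half_adder(a, b):
    """Return (sum, carry) for two one-bit inputs."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) for three one-bit inputs (a 3:2 compressor)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


# Carry is the MSB and sum is the LSB of the number of 1s at the inputs.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, carry = full_adder(a, b, c)
            print(a, b, c, "->", carry, s)
```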
3.2.2 Compressor Logic:-
The different compressor logics are based on the concept of the full adder as a counter. A compressor can be defined as a single bit adder circuit that has more than three inputs, unlike the full adder, and fewer outputs than inputs. Note that a full adder has two outputs, so it counts up to three (11). Similarly, with three output bits a circuit counts up to a maximum value of seven (111). Compressors having four, five, six and seven inputs produce three outputs, which count up to a maximum value of seven (111). Compressors having eight to fifteen inputs produce four outputs, which count up to a maximum value of fifteen (1111). So, these compressors are built depending on the number of inputs they have and the count value they have to generate. Following is the description of the different compressor logics with their block diagrams:-
1) 4:3 Compressor:-
Figure 3.5. Block Diagram of 4:3 Compressor
Above figure shows the block diagram of the 4:3 Compressor. It consists of four inputs and three outputs. The 4:3 Compressor has two Half Adders and one Parallel Adder. If all four inputs are 1, it gives the maximum count value of 100. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
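A possible gate-level reading of this structure is sketched below in Python. The wiring of the internal two-bit parallel adder (one half adder followed by one full adder) is an assumption, since the block diagram itself is not reproduced here, and the function names are hypothetical.

```python
def half_adder(a, b):
    return a ^ b, a & b  # (sum, carry)


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)  # (sum, carry)


def compressor_4_3(x0, x1, x2, x3):
    """Count the 1s among four inputs; returns (j, j+1, j+2) output bits."""
    s1, c1 = half_adder(x0, x1)  # first half adder: count of x0, x1
    s2, c2 = half_adder(x2, x3)  # second half adder: count of x2, x3
    # Assumed parallel adder: adds the two-bit numbers (c1 s1) and (c2 s2).
    out_j, k = half_adder(s1, s2)
    out_j1, out_j2 = full_adder(c1, c2, k)
    return out_j, out_j1, out_j2


print(compressor_4_3(1, 1, 1, 1))  # LSB first
```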
2) 5:3 Compressor:-
Figure 3.6. Block Diagram of 5:3 Compressor
Above figure shows the block diagram of the 5:3 compressor. It consists of five inputs and three outputs. The 5:3 Compressor has one Half Adder, one Full Adder and a Parallel Adder. So, the maximum count value will be 101. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
3) 6:3 Compressor:-
Figure 3.7. Block Diagram of 6:3 Compressor
Above figure shows the block diagram of the 6:3 compressor. It consists of six inputs and three outputs. The 6:3 Compressor has two Full Adders and one Parallel Adder. So, the maximum count value of the 6:3 compressor will be 110. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
4) 7:3 Compressor:-
Figure 3.8. Block Diagram of 7:3 Compressor
Above figure shows the block diagram of the 7:3 compressor. It consists of seven inputs and three outputs. The 7:3 Compressor has one 4:3 Compressor, one Full Adder and one Parallel Adder. So, the maximum count value of the 7:3 compressor is 111. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
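The hierarchical construction (a 4:3 Compressor, a Full Adder and a Parallel Adder) can be illustrated with the following Python sketch. The internal wiring of both parallel adders is assumed, and the function names are hypothetical; the sketch only shows that such a composition counts seven inputs correctly.

```python
def half_adder(a, b):
    return a ^ b, a & b  # (sum, carry)


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)  # (sum, carry)


def compressor_4_3(x0, x1, x2, x3):
    s1, c1 = half_adder(x0, x1)
    s2, c2 = half_adder(x2, x3)
    out_j, k = half_adder(s1, s2)
    out_j1, out_j2 = full_adder(c1, c2, k)
    return out_j, out_j1, out_j2


def compressor_7_3(x):
    """Count the 1s among seven inputs; returns (j, j+1, j+2) output bits."""
    p0, p1, p2 = compressor_4_3(*x[:4])  # count of the first four inputs (0..4)
    q0, q1 = full_adder(*x[4:])          # count of the last three inputs (0..3)
    # Assumed parallel adder: adds the 3-bit and 2-bit counts (at most 7).
    out_j, c0 = half_adder(p0, q0)
    out_j1, c1 = full_adder(p1, q1, c0)
    out_j2, _ = half_adder(p2, c1)       # no overflow: the total never exceeds 7
    return out_j, out_j1, out_j2


print(compressor_7_3([1] * 7))  # LSB first
```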
5) 8:4 Compressor:-
Figure 3.9. Block Diagram of 8:4 Compressor
Above figure shows the block diagram of the 8:4 compressor. It consists of eight inputs and four outputs. The 8:4 Compressor has one 5:3 Compressor, one Full Adder and one Parallel Adder. The maximum count value of the 8:4 compressor is 1000. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
6) 9:4 Compressor:-
Figure 3.10. Block Diagram of 9:4 Compressor
Above figure shows the block diagram of the 9:4 Compressor. It consists of nine inputs and four outputs. The 9:4 Compressor has one 6:3 Compressor, one Full Adder and one Parallel Adder. The maximum count value of the 9:4 compressor is 1001. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
7) 10:4 Compressor:-
Figure 3.11. Block Diagram of 10:4 Compressor
Above figure shows the block diagram of the 10:4 Compressor. It consists of ten inputs and four outputs. The 10:4 Compressor has one 7:3 Compressor, one Full Adder and one Parallel Adder. The maximum count value of the 10:4 compressor is 1010. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
8) 11:4 Compressor:-
Figure 3.12. Block Diagram of 11:4 Compressor
Above figure shows the block diagram of the 11:4 Compressor. It consists of eleven inputs and four outputs. The 11:4 Compressor has one 7:3 Compressor, one 4:3 Compressor and one Parallel Adder. The maximum count value of the 11:4 compressor is 1011. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
9) 12:4 Compressor:-
Figure 3.13. Block Diagram of 12:4 Compressor
Above figure shows the block diagram of the 12:4 Compressor. It consists of twelve inputs and four outputs. The 12:4 Compressor has one 7:3 Compressor, one 5:3 Compressor and one three-bit Parallel Adder. The maximum count value of the 12:4 compressor is 1100. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
10) 13:4 Compressor:-
Figure 3.14. Block Diagram of 13:4 Compressor
Above figure shows the block diagram of the 13:4 Compressor. It consists of thirteen inputs and four outputs. The 13:4 Compressor has one 7:3 Compressor, one 6:3 Compressor and one three-bit Parallel Adder. The maximum count value of the 13:4 compressor is 1101. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
11) 14:4 Compressor:-
Figure 3.15. Block Diagram of 14:4 Compressor
Above figure shows the block diagram of the 14:4 Compressor. It consists of fourteen inputs and four outputs. The 14:4 Compressor has two 7:3 Compressors and one three-bit Parallel Adder. The maximum count value of the 14:4 compressor is 1110. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
12) 15:4 Compressor:-
Figure 3.16. Block Diagram of 15:4 Compressor
Above figure shows the block diagram of the 15:4 Compressor. It consists of fifteen inputs and four outputs. The 15:4 Compressor has one 8:4 Compressor, one 7:3 Compressor and one three-bit Parallel Adder. The maximum count value of the 15:4 compressor is 1111. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
13) 16:5 Compressor:-
Figure 3.17. Block Diagram of 16:5 Compressor
Above figure shows the block diagram of the 16:5 Compressor. It consists of sixteen inputs and five outputs. The 16:5 Compressor has two 8:4 Compressors and one four-bit Parallel Adder. The maximum count value of the 16:5 compressor is 10000. Consider the output bits represented as j, (j+1), (j+2), (j+3) and (j+4). (j+4)th bit is MSB and jth bit is LSB.
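The output widths and maximum count values listed for the compressors above can be cross-checked with a behavioural Python model. This is only a functional sketch with a hypothetical function name: it counts the 1s directly instead of modelling the internal adders of each block.

```python
def compressor(bits):
    """Behavioural N:M compressor: count the 1s and emit the count LSB first.

    The number of outputs M is the number of bits needed to represent N.
    """
    width = len(bits).bit_length()
    count = sum(bits)
    return [(count >> k) & 1 for k in range(width)]


# Print "N:M" and the maximum count (MSB first) for the 4:3 up to 16:5 compressors.
for n in range(4, 17):
    out = compressor([1] * n)
    print(f"{n}:{len(out)} max count = " + "".join(str(b) for b in reversed(out)))
```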
These different order Compressors are used to reduce the partial product stages. Compressors also reduce the switching operations, since they only count the number of 1’s. The generated partial products are divided vertically among the different order compressors.
3.3 Parallel Adders:-
Figure 3.18. Block Diagram of Parallel Adder
Above figure shows the Block Diagram of the Parallel Adder. It consists of cascaded Full Adders. The number of adders used depends on the length of the output: for N*N multiplication, 2N full adders are used. Here, the Cout of the first full adder is connected to the Cin of the next adjacent full adder. The main concept of this parallel adder comes from the Carry Look-ahead Adder. The output of the Parallel Adder is the final output of the Multiplier.
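The cascading of full adders described above (Cout of one stage feeding Cin of the next) can be sketched in Python as follows. The helper names are hypothetical, and the sketch models only the cascaded carry chain, not any look-ahead logic.

```python
def full_adder(a, b, cin):
    """Return (sum, cout) for three one-bit inputs."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def to_bits(value, width):
    """Unsigned integer to a list of bits, LSB first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an unsigned integer."""
    return sum(bit << k for k, bit in enumerate(bits))


def parallel_adder(x_bits, y_bits):
    """Add two equal-length bit vectors with a chain of full adders."""
    carry = 0
    out = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # Cout feeds the next stage's Cin
        out.append(s)
    return out + [carry]


print(from_bits(parallel_adder(to_bits(200, 8), to_bits(100, 8))))
```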
3.4 Architecture of Multiplier Using Compressor:-
Following figure shows the Architecture of 8*8 Multiplier using different order Compressors.
Figure 3.19. Architecture of 8*8 Multiplier using Compressors[2]
As shown in the above figure, the Partial Products are added in four stages. Adders and different compressors are used to minimize the stage operations. Compressors are chosen carefully so that the minimum number of outputs is generated. Consider column number eight, where eight bits are added at the first stage. These eight bits are added using an 8:4 Compressor, which generates four outputs and so decreases the number of bits for the next stage.
It should be mentioned that the output of each compressor from 4:3 to 7:3 has bit positions jth, (j+1)th and (j+2)th, where the jth bit is the LSB and the (j+2)th bit is the MSB. Compressors from 8:4 to 15:4 have bit positions jth, (j+1)th, (j+2)th and (j+3)th, where the jth bit is the LSB and the (j+3)th is the MSB. The 16:5 Compressor has bit positions jth, (j+1)th, (j+2)th, (j+3)th and (j+4)th, where the jth bit is the LSB and the (j+4)th is the MSB. Suppose the compressor in column number four is a 4:3 Compressor: its jth output goes to column number four, the next adjacent output, i.e. the (j+1)th output, goes to column number five, and the (j+2)th output goes to column number six. Similarly, for the eighth column, i.e. for the 8:4 compressor, its jth output goes to column number eight, the (j+1)th output goes to column number nine, the (j+2)th output goes to column number ten and the last, (j+3)th, output goes to column number eleven. Thus, these compressors reduce the vertical critical path more rapidly.
Now, similarly, for the next stage, if a vertical path has more than two bits, we use a compressor of that many bits to reduce the vertical critical path again. Finally, we use compressors up to the stage where only two bits remain vertically, and those two bits are added in parallel as explained in section 3.3.
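The stage-by-stage reduction described in this section can be modelled behaviourally in Python. The sketch below makes simplifying assumptions: every column with more than two bits is compressed in every stage, each compressor is modelled as a plain count of 1s, and the function names are hypothetical. The number of stages it reports therefore need not match the four stages of the figure.

```python
def partial_product_columns(a, b, n):
    """Group the n*n AND partial products by column weight (column i + j)."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def compress_stage(cols):
    """One stage: each column with more than two bits is replaced by the binary
    count of its 1s, routed to columns j, j+1, j+2, ... as described above."""
    nxt = [[] for _ in cols]
    for j, bits in enumerate(cols):
        if len(bits) <= 2:
            nxt[j].extend(bits)  # short column, passed through unchanged
            continue
        count = sum(bits)
        for k in range(len(bits).bit_length()):  # e.g. 8 inputs -> 4 outputs
            while len(nxt) <= j + k:
                nxt.append([])
            nxt[j + k].append((count >> k) & 1)
    return nxt


def multiply_with_compressors(a, b, n=8):
    """Return (product, number of compression stages) for n-bit unsigned a, b."""
    cols = partial_product_columns(a, b, n)
    stages = 0
    while any(len(c) > 2 for c in cols):
        cols = compress_stage(cols)
        stages += 1
    # Final parallel addition of the two remaining rows.
    row0 = sum(c[0] << j for j, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << j for j, c in enumerate(cols) if len(c) > 1)
    return row0 + row1, stages


product, stages = multiply_with_compressors(200, 123)
print(product, stages)
```

Each stage preserves the weighted sum of all bits, because a count bit k from column j is placed in column j+k; this is why the final two-row addition yields the product.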
The above Block Diagram shows the Modified Complex Multiplier, which consists of three
multipliers and three adder/subtractor units. This design requires one multiplier less
than the previous technique, so it consumes less power. To perform signed
multiplication we use Booth’s Radix algorithm. Booth’s Radix algorithm reduces the
partial products as compared to the normal multiplier algorithm. So, it reduces the switching
operations of the multiplier and hence reduces power. It is based on the fact that a partial product can be generated for a group of consecutive zeros and ones, which is called Booth’s recoding.
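The saving of one multiplier mentioned above can be illustrated with a well-known three-multiplication identity for complex products. This Python sketch is an assumption about the kind of rearrangement involved, not a transcription of the block diagram: the function names are hypothetical, and this particular arrangement uses five additions/subtractions, so the thesis's design may organise its adder/subtractor units differently.

```python
def complex_multiply_4(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) the normal way: four real multiplications."""
    return ar * br - ai * bi, ar * bi + ai * br


def complex_multiply_3(ar, ai, br, bi):
    """Same product with three real multiplications (assumed arrangement)."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2  # (real part, imaginary part)


print(complex_multiply_4(3, 2, 1, 4))
print(complex_multiply_3(3, 2, 1, 4))
```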
4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4[1][2]:-
Parallel Multiplication using the basic Booth Recoding Technique is explained in the previous section. Since this technique requires a lot of adders,
it requires more power and area. In the proposed multiplier design, we have reduced the
number of adders required in partial product addition, and hence the vertical length
of the Partial Products. In this technique, mainly the correction bits are reduced. This is done
without compromising the correctness of multiplication of 2’s complement numbers. We
have used a Multiplexer based Booth Recoding scheme to reduce the length and width of the
partial products.
In this technique, the change in scheme results in partial products which, after recoding,
are always longer than the input bit length by one bit in the Radix-2 scheme. Similarly, in the Radix-4
scheme the recoded partial products are always longer than the input bit length by two bits. These additional
bit/bits act as correction bit/bits to get the correct value of the multiplication. Also, in the
hardware realization of Booth’s recoding scheme, we can remove the extra select line which
is used at the time of recoding. Because of this extra select line the multiplexer size becomes
large. We have observed that if we do not consider this extra bit at the time of hardware
realization, we can reduce the size of one multiplexer. So, in radix 2 the LSB decides the first partial
product, and in radix 4 the first two LSBs decide the first partial product. These partial
products are then added using the proposed array of adders to achieve the correct
multiplication output. The working of this novel design is explained in the following
sections.
Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier[1]
In order to achieve signed number multiplication, the Partial Products are generated
using the Modified Booth’s Recoding Unit Multiplication block. After generation, the new
Partial Products are added using Compressors and a Parallel Adder. Below is the
explanation of the Modified Booth’s Recoding Unit for the Multiplier.
4.2.2 Modified Booth’s Recoding Unit[3]:-
Partial Products are generated using the Modified Booth’s Recoding Unit block. As
we saw in the previous section for the generation of Partial Products in the basic Booth’s Recoding
algorithm, we use the same concept to generate the partial products for the Modified
Booth’s Recoding Algorithm, with the length of each partial product exceeding the input bit
sequence by one for the Radix-2 scheme and by two for the Radix-4 scheme.
This modified technique is explained below:-
Radix-2 Method:-
As we saw in Table 1, the output partial products are added and shifted according to the
input sequence. Here, we use multiplexers to build the recoding unit. The select lines of the
multiplexers are the input bits of the multiplier, and the outputs follow the modified table.
The above Table shows how the partial products are generated according to the input bit sequence.
Here, we generate two extra bits according to the input bits. These two bits are
correction bits used to get the corrected output of the multiplication. The MSBs of the partial products need to
be added carefully. For that, a new structure of adder array is introduced. This modification
removes the problem of a large number of correction bits, which requires more
adders and hence more higher order compressors.
4.3 Compressors and Adders:-
Recoding and Addition scheme for Radix-2 and Radix-4 for four bit input sequence
[4] [5]:-
Figure 4.6 Addition scheme for Radix-2
Above figure shows the addition scheme for Radix-2, which has five-bit partial
products. These partial products are added using the compressor scheme as explained
previously. Here, the value of m(0)(4) is added diagonally, i.e., added to the diagonal bit, which
is the MSB of the second partial product and also a correction bit. So, we add m(0)(4)
to m(1)(4) and put the result in place of m(1)(4). Similarly, that new value of the
MSB of the second partial product row is added to the old MSB of the third partial product to get the
new value of the MSB of the third partial product, as shown in the above figure. After getting the new
values of the correction bits, we add these bits using compressors.
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2 [5]
Above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-2 scheme, where the
partial products are generated using the Modified Booth’s Recoding Unit. Here, we
generate partial products of 9 bits per row. In the first stage, these partial products are divided
into vertical blocks; these vertical blocks are half adders, full adders and different order
compressors. Vertical blocks of 2 bits are half adders and vertical blocks of 3 bits are full
adders. The outputs of these adders and compressors are arranged as explained in chapter 3.
The horizontal blocks are parallel adders, which are used for the addition that generates the final
multiplication result.
Figure 4.8 Addition scheme for Radix-4
Above figure shows the addition scheme for Radix-4, which has six partial product
bits per row: the four LSBs correspond to the input sequence and the two MSBs are correction bits. Here, the MSB of
the first row of partial products is added to both MSBs of the second row. In the Modified
Radix-4 scheme the total number of partial product rows is half that of the normal partial
product scheme. For example, if the multiplier is 4*4 bits, then the total number of rows of
partial products including correction bits is two, i.e. half of the rows of the original scheme,
as shown in the above figure. Similarly, for wider multipliers using the radix-4 scheme the
total number of partial product rows is half of the original, which results in fewer switching
operations and hence less power.
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4
Above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-4 scheme,
where the Partial Products are generated using the Modified Booth’s Recoding Unit. In this
scheme we generate partial products of 10 bits each, i.e. two extra bits for each row, as
explained in the table of the Radix-4 scheme. The main advantage of the Radix-4 scheme is that the
number of partial product rows becomes half that of the Radix-2 method; i.e., here in the
8*8 multiplier the number of partial product rows becomes four, so fewer compressors are
required and hence there are fewer switching operations, which results in low power.
-: References:-
[1] D. A. Pucknell, K. Eshraghian, Basic VLSI Design, Prentice-Hall, ISBN
81-203-0986-3.
[2] Israel Koren, Computer Arithmetic Algorithms, A.K.Peters Ltd. ISBN 1568811608.
[3] A.D.Booth, A signed binary multiplication technique, Quarterly Journal of Mechanics