Oct 29, 2015
CHAPTER 1
1. INTRODUCTION:
1.1 BACKGROUND:
In today's fast-developing technological world, the shift has been towards the
construction of small and portable devices. As the number of these battery-operated,
processor-driven devices increases and their performance demands grow, there is a
need to increase their processing speed and reduce their power dissipation. In such a
consumer-driven scenario, these demands call for a serious look into the construction
of the devices. Processors are used for such purposes, and in these processors major
operations such as FIR filtering, DCT, etc. are carried out through multipliers. As
multipliers are major components of DSP systems, optimization of the multiplier
design will surely lead to a better-operating DSP.
1.2 IMPORTANCE OF MULTIPLIER:
Computational performance of a DSP system is limited by its multiplication
performance; since multiplication dominates the execution time of most DSP
algorithms, a high-speed multiplier is much desired. Currently, multiplication time is
still the dominant factor in determining the instruction cycle time of a DSP chip.
With an ever-increasing quest for greater computing power on battery-operated
mobile devices, design emphasis has shifted from optimizing conventional delay time
and area size to minimizing power dissipation while still maintaining high
performance. Traditionally, the shift-and-add algorithm has been used for multiplier
design, but it is not suitable for VLSI implementation, nor from a delay point of view.
Among the important algorithms proposed in the literature for VLSI-implementable
fast multiplication are the array multiplier and the Wallace tree multiplier. This paper
presents the fundamental technical aspects behind these approaches. Low-power,
high-speed VLSI can be implemented with different logic styles. The three important
considerations in VLSI design are power, area and delay. Many logic styles for low
power dissipation and high speed have been proposed, and each logic style has its
own advantages in terms of speed and power.
1.3 MULTIPLIER SCHEMES:
There are two basic schemes in the multiplication process: serial multiplication
and parallel multiplication.
Serial Multiplication (Shift-Add)
A set of partial products is computed and the partial products are then summed
together. The implementations are primitive, with simple architectures (used when
there is no dedicated hardware multiplier).
Parallel Multiplication
Partial products are generated simultaneously. Parallel implementations are
used in high-performance machines, where computation latency needs to be
minimized.
Comparing the two types, parallel multiplication has the advantage over serial
multiplication: it takes fewer steps, so it performs faster.
1.4 MULTIPLIER FEATURES:
The features of the multiplier are
1.4.1 PIPELINING:
Pipelining allows the multiplier to accept and start the partial processing of a
new set of data even though part of another multiplication is still taking place.
1.4.2 MIXED ARCHITECTURE:
A mixed architecture has been considered, built around the Wallace tree
multiplier. This allows the design to take advantage of the low delay of the Wallace
multiplier.
1.4.3 CLOCKING:
Clocking has been arranged so as to allow the multiplier to work at its highest
clock frequency without compromising the proper flow of partial products through
the structure.
1.4.4 DATA RANGE:
The data range has been extended from the initial 4x4 bits to 16x16 bits, which
is the working data range actually required by many DSP processors.
1.4.5 STRUCTURAL MODELLING:
This ensures the best implementation of the multiplier, be it on an ASIC or on an
FPGA, and removes any chance of redundant hardware being generated.
CHAPTER 2
2.1 ADDER
In electronics, an adder is a digital circuit that performs addition of numbers. In
modern computers, adders reside in the arithmetic logic unit (ALU), where other
operations are performed as well. Although adders can be constructed for many
numerical representations, such as binary-coded decimal or excess-3, the most
common adders operate on binary numbers. In cases where two's complement is used
to represent negative numbers, it is trivial to modify an adder into an adder-subtractor.
2.2 TYPES OF ADDERS
For single-bit adders, there are two general types. A half adder has two inputs,
generally labeled A and B, and two outputs, the sum S and carry C. S is the XOR of
A and B, and C is the AND of A and B. Essentially the output of a half adder is the
two-bit sum of two one-bit numbers, with C being the more significant of the two
outputs. The second type of single-bit adder is the full adder. The full adder takes
into account a carry input, so that multiple adders can be cascaded to add larger
numbers. To remove ambiguity between the input and output carry lines, the carry in
is labeled Ci or Cin while the carry out is labeled Co or Cout.
Half adder
Fig 1: Half adder circuit diagram
A half adder is a logical circuit that performs an addition operation on two
binary digits. The half adder produces a sum and a carry value which are both binary
digits.
Following is the logic table for a half adder:
TABLE 1: HALFADDER
A B C S
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
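The half adder logic (S = A XOR B, C = A AND B) can be checked against Table 1 with a small software model (a Python sketch, not the circuit itself):

```python
def half_adder(a, b):
    """Half adder: sum is XOR, carry is AND."""
    return a ^ b, a & b  # (S, C)

# Reproduce Table 1: rows are (A, B, C, S)
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, c, s)
```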
Fig 2: Full adder circuit diagram
Schematic symbol for a 1-bit full adder
A full adder is a logical circuit that performs an addition operation on three
binary digits. The full adder produces a sum and a carry value, which are both binary
digits. It can be combined with other full adders (see below) or work on its own.
TABLE 2: FULL ADDER
A B Ci Co S
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Note that the final OR gate before the carry-out output may be replaced by an
XOR gate without altering the resulting logic. This is because the only discrepancy
between OR and XOR gates occurs when both inputs are 1, and for the adder shown
here one can check that this is never possible. Using only two types of gates is
convenient if one desires to implement the adder directly using common IC chips. A
full adder can be constructed from two half adders by connecting A and B to the
inputs of one half adder, connecting its sum to an input of the second half adder,
connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S
could be made the three-bit XOR of A, B, and Ci, and Co could be made the three-bit
majority function of A, B, and Ci. The output of the full adder is the two-bit
arithmetic sum of three one-bit numbers.
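The construction just described (two half adders plus an OR on the carries) can be modeled and checked against Table 2 (a Python sketch of the logic):

```python
def half_adder(a, b):
    return a ^ b, a & b

def full_adder(a, b, ci):
    """Full adder built from two half adders; the carries are ORed.
    (The OR could equally be an XOR: both carries are never 1 at once.)"""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, ci)
    return s, c1 | c2

# Verify against Table 2: Co and S form the two-bit sum of A + B + Ci
for a in (0, 1):
    for b in (0, 1):
        for ci in (0, 1):
            s, co = full_adder(a, b, ci)
            assert 2 * co + s == a + b + ci
```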
CHAPTER 3
LITERATURE SURVEY
3.1 BASIC MULTIPLIER ARCHITECTURES:
3.1.1 INTRODUCTION:
A basic multiplier consists of ANDed terms (as shown in Fig 3) and an array of
full adders and/or half adders arranged so as to obtain partial products at each level.
These partial products are added together to obtain the final result. It is the different
arrangements of, and construction changes in, these adders that lead to the various
structures of basic multipliers.
Fig 3: AND gate
The Full Adder (FA) implementation shows the two bits (A, B) and Carry In (Ci)
as inputs and Sum (S) and Carry Out (Cout) as outputs.
3.2 BINARY MULTIPLIER
A binary multiplier is an electronic hardware circuit used in digital electronics,
such as a computer or other electronic device, to perform rapid multiplication of two
numbers in binary representation. It is built using binary adders.
The rules for binary multiplication can be stated as follows:
1. If the multiplier digit is a 1, the multiplicand is simply copied down and
represents the product.
2. If the multiplier digit is a 0, the product is also 0.
For designing a multiplier circuit we should have circuitry to provide or do the
following:
1. It should be capable of identifying whether a bit is 0 or 1.
2. It should be capable of shifting the partial products left.
3. It should be able to add all the partial products to give the product as the sum
of the partial products.
4. It should examine the sign bits. If they are alike, the sign of the product will
be positive; if the sign bits are opposite, the product will be negative. The sign
bit of the product determined by the above criteria should be displayed along
with the product.
From the above discussion we observe that it is not necessary to wait until all
the partial products have been formed before summing them. In fact, the addition of
a partial product can be carried out as soon as it is formed.
Notations:
a multiplicand
b multiplier
p product
Binary multiplication (e.g. n = 4):
P = a x b
a(n-1) a(n-2) ... a1 a0 (multiplicand a)
b(n-1) b(n-2) ... b1 b0 (multiplier b)
p(2n-1) p(2n-2) ... p1 p0 (product p)
      x x x x      a (multiplicand)
      x x x x      b (multiplier)
      ---------
      x x x x      b0*a*2^0
     x x x x       b1*a*2^1   (partial products)
    x x x x        b2*a*2^2
   x x x x         b3*a*2^3
  ---------------
x x x x x x x x    p (product)
3.2.1 BASIC HARDWARE MULTIPLIER
Partial products: in binary, the partial products are trivial. If the multiplier
bit = 1, copy the multiplicand; else the row is 0. This is done with an AND gate.
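The AND-gate partial-product generation, followed by the weighted summation described above, can be sketched as a software model (Python, not the hardware):

```python
def partial_products(a_bits, b_bits):
    """Row i of partial products: multiplier bit b_i ANDed with each bit of a."""
    return [[bi & aj for aj in a_bits] for bi in b_bits]

def sum_rows(rows):
    """Add the rows, row i carrying weight 2^i (the left shift per row)."""
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))

# a = 13 (1101), b = 11 (1011); the bit lists are LSB first
rows = partial_products([1, 0, 1, 1], [1, 1, 0, 1])
print(sum_rows(rows))  # 13 * 11 = 143
```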
3.2.2 MULTIPLY ACCUMULATE CIRCUITS
Multiplication followed by accumulation is a operation in many digital systems,
particularly those highly interconnected like digital filters, neural networks, data
quantizers, etc. One typical MAC(multiply-accumulate) architecture is illustrated in
figure. It consists of multiplying 2 values, then adding the result to the previously
accumulated value, which must then be restored in the registers for future
accumulations. Another feature of MAC circuit is that it must check for overflow,
which might happen when the number of MAC operation is large . This design can be
done using component because we have already design each of the units shown in
figure. However since it is relatively simple circuit, it can also be designed directly. In
any case the MAC circuit, as a whole, can be used as a component in application like
digital filters and neural networks.
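The multiply-accumulate behaviour, including the overflow check mentioned above, can be sketched as follows; the 16-bit register width is an illustrative assumption (Python model, not the circuit):

```python
class MAC:
    """Multiply-accumulate: each step adds a product into a stored accumulator."""
    def __init__(self, width=16):
        self.acc = 0
        self.limit = 1 << width     # assumed register capacity
        self.overflow = False
    def step(self, a, b):
        self.acc += a * b           # multiply, then accumulate into the register
        self.overflow = self.acc >= self.limit
        return self.acc

mac = MAC()
for a, b in [(3, 4), (5, 6), (7, 8)]:   # e.g. a tap-by-tap dot product
    mac.step(a, b)
print(mac.acc)   # 3*4 + 5*6 + 7*8 = 98
```

This loop is exactly the inner loop of an FIR filter, which is why MAC units dominate DSP datapaths.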
3.3 WALLACE TREE MULTIPLIER:
A Wallace tree is an efficient hardware implementation of a digital circuit that
multiplies two integers. For a NxN bit multiplication, partial products are formed
from (N^2)AND gates. Next N rows of the partial products are grouped together in set
of three rows each. Any additional rows that are not a member of these groups are
transferred to the next level without modification. For a column consisting of three
15
partial products and a full adder is used with the sum dropped down to the same
column whereas the carry out is brought to the next higher column. For column with
two partial products, a half adder is used in place of full adder. At the final stage, a
carry propagation adder is used to add over all the propagating carries to get the final
result. It can also be implemented using Carry Save Adders. Sometimes it will be
Combined with Booth Encoding.Various other researches have been done to reduce
the number of adders, for higher order bits such as 16 & 32.Applications, as the use in
DSP for performing FFT,FIR, etc.,
3.3.1 WALLACE TREE HARDWARE ARCHITECTURE:
Fig 4: Wallace tree hardware architecture
3.3.2 FUNCTION:
The Wallace tree has three steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the
other, yielding n^2 results. Depending on the position of the multiplied bits, the
wires carry different weights; for example, the wire carrying the result of a2b3 has
weight 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.
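The three steps can be modeled in software with per-column bit lists; a full adder turns three bits of one weight into a sum of the same weight plus a carry of the next weight (a Python sketch of the reduction, not the gate-level design):

```python
from collections import defaultdict

def wallace_multiply(a, b, n=4):
    """Sketch of the three Wallace-tree steps using per-weight bit lists."""
    # Step 1: AND every bit pair; the result of a_i * b_j has weight 2^(i+j).
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    # Step 2: layers of full/half adders until every column holds <= 2 bits.
    while any(len(c) > 2 for c in cols.values()):
        nxt = defaultdict(list)
        for w, c in cols.items():
            while len(c) >= 3:              # full adder: sum stays, carry moves up
                x, y, z = c.pop(), c.pop(), c.pop()
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (y & z) | (x & z))
            if len(c) == 2:                 # half adder on a two-bit column
                x, y = c.pop(), c.pop()
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            nxt[w].extend(c)                # a lone bit passes through unchanged
        cols = nxt
    # Step 3: group the remaining bits into two numbers; add conventionally.
    row1 = sum(c[0] << w for w, c in cols.items() if len(c) > 0)
    row2 = sum(c[1] << w for w, c in cols.items() if len(c) > 1)
    return row1 + row2

print(wallace_multiply(13, 11))   # 13 * 11 = 143
```

Each pass of the while loop corresponds to one reduction layer of the tree.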
3.3.3 EXAMPLE:
Suppose two numbers are being multiplied:
a3a2a1a0 X
b3b2b1b0
___________________________________
a3b0 a2b0 a1b0 a0b0
a3b1 a2b1 a1b1 a0b1
a3b2 a2b2 a1b2 a0b2
a3b3 a2b3 a1b3 a0b3
_____________________________________
Arranging the partial products in the form of tree structure
a3b3 a2b3 a1b3 a0b3 a0b2 a0b1 a0b0
a3b2 a2b2 a1b2 a1b1 a1b0
a3b1 a2b1 a2b0
a3b0
3.3.4 ADDER ELEMENTS
Half Adder:
Full Adder:
3.3.5 ADVANTAGES:
Each layer of the tree reduces the number of vectors by a factor of 3:2.
Minimum propagation delay.
The benefit of the Wallace tree is that there are only O(log n) reduction layers,
whereas adding the partial products with regular adders would require O((log n)^2) time.
3.3.6 DISADVANTAGES:
Wallace trees do not provide any advantage over ripple adder trees in many FPGAs.
Due to the irregular routing, they may actually be slower and are certainly more difficult to route.
The adder structure grows for larger bit-width multiplications.
CHAPTER 4
4. ARRAY MULTIPLIER:
This is the most basic form of binary multiplier construction. Its basic principle
is exactly like pen-and-paper multiplication. It consists of a highly regular array of
full adders, the exact number depending on the length of the binary numbers to be
multiplied. Each row of this array generates a partial product. The value generated
by each partial product is then added to the sum and carry generated on the next row.
The final result of the multiplication is obtained directly after the last row. The
ANDed terms are generated using logic AND gates. The Full Adder (FA)
implementation shows the two bits (A, B) and Carry In (Ci) as inputs and Sum (S)
and Carry Out (Co) as outputs.
4.1 HARDWARE ARCHITECTURE
Fig 5: Hardware architecture
4.2 EXAMPLE :
4*4 bit multiplication
a3 a2 a1 a0
b3 b2 b1 b0
a3b0 a2b0 a1b0 a0b0
a3b1 a2b1 a1b1 a0b1
a3b2 a2b2 a1b2 a0b2
a3b3 a2b3 a1b3 a0b3
p7 p6 p5 p4 p3 p2 p1 p0
4.3 PRINCIPLES OF ARRAY MULTIPLIER:
Fig 6: Array multiplier
Due to its highly regular structure, the array multiplier is very easily constructed
and can also be densely implemented in VLSI, which takes less space. But compared
with other multiplier structures proposed later, it shows a high computational time. In
fact, the computational time is of order O(N), one of the highest of any multiplier
structure.
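The row-by-row operation of the array multiplier can be sketched as follows; each cell ANDs one multiplier bit with one multiplicand bit and adds it, via a full adder, to the sum and carry coming from the row above (a Python model of the array, not the hardware):

```python
def full_adder(a, b, ci):
    s = a ^ b ^ ci
    co = (a & b) | (b & ci) | (a & ci)
    return s, co

def array_multiply(a, b, n=4):
    """Row-by-row array multiplier: row r adds the partial product for
    multiplier bit b_r into the running sum, with a rippling carry."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    acc = [0] * (2 * n)                      # running sum bits, LSB first
    for row in range(n):
        b_bit = (b >> row) & 1
        carry = 0
        for col in range(n):
            pp = a_bits[col] & b_bit         # ANDed term for this cell
            acc[row + col], carry = full_adder(acc[row + col], pp, carry)
        acc[row + n] = carry                 # carry out of the row
    return sum(bit << i for i, bit in enumerate(acc))

print(array_multiply(13, 11))   # 13 * 11 = 143
```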
4.4 BAUGH-WOOLEY MULTIPLIER :
The Baugh-Wooley multiplier is used for both unsigned and signed
number multiplication. Signed operands are represented in 2's complement form.
The partial products are adjusted such that the negative signs move to the last step,
which in turn maximizes the regularity of the multiplication array. The
Baugh-Wooley multiplier operates on signed operands with 2's complement
representation so as to make sure that the signs of all partial products are positive.
The numerical values of two 2's complement numbers, say X and Y, can be obtained
from product terms each made of one AND gate.
Variables with bars denote prior inversions. Inverters are connected
before the inputs of the full adders or the AND gates as required by the algorithm.
Each column represents the addition in accordance with the respective weight of the
product term.
4.5 BAUGH-WOOLEY HARDWARE ARCHITECTURE:
Fig 7: Signed 2's-complement Baugh-Wooley multiplier
4.6 MULTIPLYING TWO'S COMPLEMENT NUMBERS:
The Baugh-Wooley multiplication algorithm is an efficient way to handle the
sign bits. This technique was developed in order to design regular multipliers suited
to 2's-complement numbers. Dr. Gebali has extended this basic idea and developed
efficient fast inner-product processors capable of performing double-precision
multiply-accumulate operations without a speed penalty. Let us consider two n-bit
numbers, A and B, to be multiplied. A and B can be represented as
A = -a(n-1)*2^(n-1) + sum(i=0..n-2) a(i)*2^i
B = -b(n-1)*2^(n-1) + sum(i=0..n-2) b(i)*2^i
where the a(i)'s and b(i)'s are the bits in A and B, respectively, and a(n-1) and
b(n-1) are the sign bits.
The product, P = A * B, is then given by the following equation:
P = a(n-1)b(n-1)*2^(2n-2) + sum(i=0..n-2) sum(j=0..n-2) a(i)b(j)*2^(i+j)
    - 2^(n-1)*sum(i=0..n-2) a(n-1)b(i)*2^i - 2^(n-1)*sum(i=0..n-2) b(n-1)a(i)*2^i
It indicates that the final product is obtained by subtracting the last two positive terms
from the first two terms.
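The sign-split expansion above can be verified numerically; the helper names below are illustrative, not from the original design (a Python check, assuming LSB-first bit lists):

```python
def twos_value(bits):
    """Value of a 2's-complement number given its LSB-first bits."""
    n = len(bits)
    return -bits[-1] * (1 << (n - 1)) + sum(bits[i] << i for i in range(n - 1))

def bw_product(a_bits, b_bits):
    """P = A*B via the sign-split expansion: two positive terms minus
    the two cross terms involving the sign bits."""
    n = len(a_bits)
    t1 = a_bits[-1] * b_bits[-1] * (1 << (2 * n - 2))
    t2 = sum(a_bits[i] * b_bits[j] << (i + j)
             for i in range(n - 1) for j in range(n - 1))
    t3 = (1 << (n - 1)) * sum(a_bits[-1] * b_bits[i] << i for i in range(n - 1))
    t4 = (1 << (n - 1)) * sum(b_bits[-1] * a_bits[i] << i for i in range(n - 1))
    return t1 + t2 - t3 - t4

# -3 (1101, LSB first [1,0,1,1]) times 5 (0101, LSB first [1,0,1,0])
a, b = [1, 0, 1, 1], [1, 0, 1, 0]
print(bw_product(a, b), twos_value(a) * twos_value(b))   # both are -15
```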
4.7 BLOCK DIAGRAM OF A 4*4 BAUGH-WOOLEY MULTIPLIER:
Fig 8: Block diagram of Baugh-Wooley multiplier
4.8 ADVANTAGES:
Minimum complexity.
Easily scalable.
Easily pipelined.
Regular shape, easy to place & route.
4.9 DISADVANTAGES:
High power consumption.
More digital gates resulting in large chip area.
CHAPTER 5
5.1 PROPOSED MULTIPLIER DESIGN
Mathematics is the mother of all sciences, full of magic and mysteries. The
ancient Indians were able to understand these mysteries and develop simple keys to
solve them. Thousands of years ago the Indians used these techniques in different
fields like construction of temples, astrology, medicine, science, etc., due to which
India emerged as the richest country in the world. The Indians called this system of
calculation Vedic mathematics. Vedic mathematics is much simpler and easier to
understand than conventional mathematics. The ancient system of Vedic
mathematics was reintroduced to the world by Swami Bharati Krishna Tirthaji
Maharaj, Shankaracharya of Goverdhan Peath; Vedic Mathematics was the name
given by him. Bharati Krishna, who was himself a scholar of Sanskrit, mathematics,
history and philosophy, was able to reconstruct the mathematics of the Vedas.
According to his research, all of mathematics is based on sixteen sutras, or
word-formulae, and thirteen sub-sutras. According to Mahesh Yogi, the sutras of
Vedic Mathematics are the software for the cosmic computer that runs this universe.
Vedic mathematics introduces wonderful applications to arithmetical computations,
theory of numbers, compound multiplication, algebraic operations, factorization,
simple quadratic and higher-order equations, simultaneous quadratic equations,
partial fractions, calculus, squaring, cubing, square roots, cube roots, coordinate
geometry and the wonderful Vedic numerical code. Conventional mathematics is an
integral part of engineering education, since most engineering system designs are
based on various mathematical approaches. All the leading manufacturers of
microprocessors have developed their architectures to be suitable for conventional
binary arithmetic methods. The need for faster processing speed is continuously
driving major improvements in processor technologies, as well as the search for new
algorithms. The Vedic mathematics approach is totally different and is considered
very close to the way a human mind works. A multiplier is one of the key hardware
blocks in most applications such as digital signal processing, encryption and
decryption algorithms in cryptography, and other logical computations. With
advances in technology, many researchers have tried to design multipliers which
offer high speed, low power consumption, regularity of layout (and hence less area),
or a combination of these qualities. The Vedic multiplier is considered here to satisfy
our requirements. In this work, we present multiplication based on Urdhva
Tiryagbhyam in binary, designed using a new proposed 4-bit adder and implemented
in an HDL. The work is organized as follows: the Vedic multiplication method based
on the Urdhva Tiryagbhyam sutra for binary numbers is discussed; a new 4-bit adder
is proposed, and the design and implementation of the above multiplier are dealt
with; finally, the experimental results obtained are summarized, together with the
conclusions of the work.
5.2 VEDIC MULTIPLIER:
Digital signal processors (DSPs) are very important in various engineering
disciplines. Fast multiplication is very important in DSPs for convolution, Fourier
transforms, etc. A fast method of multiplication based on ancient Indian Vedic
mathematics is proposed in this work. Among the various methods of multiplication
in Vedic mathematics, Urdhva Tiryagbhyam is discussed in detail. Urdhva
Tiryagbhyam is a general multiplication formula applicable to all cases of
multiplication. This algorithm is applied to digital arithmetic, and a multiplier
architecture is formulated. It is a highly modular design in which smaller blocks can
be used to build larger blocks. The coding is done in Verilog HDL and synthesis is
done using Altera Quartus II. The combinational delay obtained after synthesis is
compared with the performance of the Baugh-Wooley and Wallace tree multipliers,
which are fast multipliers. This Vedic multiplier can bring about great improvement
in DSP performance.
5.3 IMPORTANCE OF VEDIC MATHEMATICS:
Among the various methods of multiplication in Vedic mathematics, Urdhva
Tiryagbhyam, being a general multiplication formula, is equally applicable to all
cases of multiplication. It is more efficient in the multiplication of large numbers
with respect to speed and area. In this work, a 4 x 4 binary multiplier is designed
using this sutra. This multiplier can be used in applications such as digital signal
processing, encryption and decryption algorithms in cryptography, and other logical
computations. The design is implemented in Verilog HDL.
5.4 URDHVA TIRYAGBHYAM SUTRA:
The given Vedic multiplier is based on the Vedic multiplication formulae
(sutras). This sutra has traditionally been used for the multiplication of two numbers.
The Urdhva Tiryagbhyam sutra is a general multiplication formula applicable to all
cases of multiplication; it means "vertically and crosswise". The digits on the two
ends of a line are multiplied and the result is added to the previous carry. When there
are more lines in one step, all the results are added to the previous carry. The least
significant digit of the number thus obtained acts as one of the result digits and the
rest act as the carry for the next step. Initially the carry is taken to be zero. The line
diagram for the multiplication of two 4-bit numbers is shown in the figure.
To illustrate this multiplication scheme, let us consider the multiplication of two
decimal numbers (325 * 728). The line diagram for the multiplication is shown in the
figure. The digits on the two ends of each line are multiplied and the result is added
to the previous carry. When there are more lines in one step, all the results are added
to the previous carry. The least significant digit of the number thus obtained acts as
one of the result digits and the rest act as the carry for the next step. Initially the
carry is taken to be zero.
Fig 9: Multiplication of two decimal numbers by Urdhva tiryagbhyam sutra
The Urdhva Tiryagbhyam sutra can thus be used for the multiplication of two
decimal numbers; the same sutra is used in binary multiplication as shown in the
figure. The 4-bit binary numbers to be multiplied are written on two consecutive
sides of the square as shown in the figure. The square is divided into rows and
columns, where each row/column corresponds to one of the digits of either the
multiplier or the multiplicand. Thus, each bit of the multiplier has a small box
common with a digit of the multiplicand. Each bit of the multiplier is then
independently multiplied (logical AND) with every bit of the
multiplicand and the product is written in the common box. All the bits lying on a
crosswise dotted line are added to the previous carry. The least significant bit of the
obtained number acts as the result bit and the rest as the carry for the next step. The
carry for the first step (i.e., the dotted line on the extreme right side) is taken to be
zero. We can extend this method to higher-order binary numbers.
Fig 10: Multiplication of two 4-bit binary numbers by Urdhva tiryagbhyam sutra
Now we extend this sutra to the binary number system. For the multiplication
algorithm, let us consider the multiplication of two 8-bit binary numbers
A7A6A5A4A3A2A1A0 and B7B6B5B4B3B2B1B0. As the result of this
multiplication would be wider than 8 bits, we express it as
R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 together with a final carry.
As in the previous case, the digits on both ends of each line are multiplied and added
to the carry from the previous step. This generates one of the bits of the result and a
carry. This carry is added in the next step, and so the process goes on. If more
than one line is present in one step, all the results are added to the previous carry. In
each step, the least significant bit acts as the result bit and all the other bits act as the
carry. For example, if in some intermediate step we get 011, then 1 will act as the
result bit and 01 as the carry. Thus we get the following expressions:
R0=A0B0
C1R1=A0B1+A1B0
C2R2=C1+A0B2+A2B0+A1B1
C3R3=C2+A3B0+A0B3+A1B2+A2B1
C4R4=C3+A4B0+A0B4+A3B1+A1B3+A2B2
C5R5=C4+A5B0+A0B5+A4B1+A1B4+A3B2+A2B3
C6R6=C5+A6B0+A0B6+A5B1+A1B5+A4B2+A2B4 +A3B3
C7R7=C6+A7B0+A0B7+A6B1+A1B6+A5B2+A2B5 +A4B3+A3B4
C8R8=C7+A7B1+A1B7+A6B2+A2B6+A5B3+A3B5+A4B4
C9R9=C8+A7B2+A2B7+A6B3+A3B6+A5B4 +A4B5
C10R10=C9+A7B3+A3B7+A6B4+A4B6+A5B5
C11R11=C10+A7B4+A4B7+A6B5+A5B6
C12R12=C11+A7B5+A5B7+A6B6
C13R13=C12+A7B6+A6B7
C14R14=C13+A7B7
C14R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 is the final product.
Hence this is the general mathematical formula applicable to all cases of
multiplication. All the partial products are calculated in parallel, and the delay
associated is mainly the time taken by the carry to propagate through the adders
which form the multiplication array. So this is not an efficient algorithm for the
multiplication of large numbers, as a lot of propagation delay is involved in such
cases. To overcome this problem, the Nikhilam sutra presents an efficient method of
multiplying two large numbers.
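The column expressions R0 through C14R14 amount to the following procedure, sketched in Python (a software model of the sutra, not the Verilog design):

```python
def urdhva_multiply(a, b, n=8):
    """Column-wise (vertically and crosswise) multiplication: step k adds the
    crosswise products A_i*B_j with i + j = k to the previous carry; the LSB
    of the total is the result bit R_k and the remaining bits are the carry."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    carry, result = 0, 0
    for k in range(2 * n - 1):
        total = carry + sum(A[i] & B[k - i]
                            for i in range(n) if 0 <= k - i < n)
        result |= (total & 1) << k      # R_k = least significant bit
        carry = total >> 1              # the rest carries into step k+1
    return result | (carry << (2 * n - 1))

print(urdhva_multiply(0b11001010, 0b01101101))   # 202 * 109 = 22018
```

Each pass of the loop is one of the expressions above, e.g. k = 2 computes C2R2 = C1 + A0B2 + A2B0 + A1B1.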
5.5 THE MULTIPLIER ARCHITECTURE:
The multiplier architecture is based on the Urdhva Tiryagbhyam sutra. The
advantage of this algorithm is that the partial products and their sums are calculated
in parallel. This parallelism makes the multiplier clock-independent. The other main
advantage of this multiplier compared with other multipliers is its regularity; due to
this modular nature, the layout design is easy. The architecture can be explained
with two four-bit numbers, i.e. the multiplier and multiplicand are four-bit numbers.
The multiplicand and the multiplier are each split into two-bit blocks. According to
the algorithm, the 4 x 4 (A x B) bit multiplication is organized as follows:
A = AH - AL, B = BH - BL
A = A3A2A1A0
B = B3B2B1B0
AH = A3A2, AL = A1A0
BH = B3B2, BL = B1B0
5.6 METHODOLOGY:
Fig 11: Vedic algorithm
By the algorithm, the product can be obtained as follows:
A x B = (AH x BH)*2^4 + (AH x BL + AL x BH)*2^2 + AL x BL
i.e. the four partial products AL x BL, AH x BL, AL x BH and AH x BH are
computed in parallel and combined with the appropriate place shifts.
The parallel multiplications are shown in the figure.
The 4 x 4 bit multiplications can again be reduced to 2 x 2 bit multiplications.
The two-bit blocks of the 4-bit multiplicand and multiplier are divided further into
single-bit halves:
AH = AHH - AHL
BH = BHH - BHL
AH x BH = (AHH x BHH)*2^2 + (AHH x BHL + AHL x BHH)*2^1 + AHL x BHL
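The decomposition into 2 x 2 blocks can be sketched as follows; the place shifts 2^2 and 2^4, implicit in the block diagram, are made explicit (a Python model of the block structure):

```python
def mul2x2(a, b):
    """2 x 2-bit multiply, itself done vertically and crosswise on single bits."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    return (a0 & b0) + (((a1 & b0) + (a0 & b1)) << 1) + ((a1 & b1) << 2)

def mul4x4(a, b):
    """4 x 4-bit multiply from four 2 x 2 blocks combined with place shifts."""
    AL, AH = a & 0b11, (a >> 2) & 0b11
    BL, BH = b & 0b11, (b >> 2) & 0b11
    return (mul2x2(AH, BH) << 4) \
         + ((mul2x2(AH, BL) + mul2x2(AL, BH)) << 2) \
         + mul2x2(AL, BL)

print(mul4x4(13, 11))   # 13 * 11 = 143
```

The same decomposition applies recursively, which is why the design scales from 4x4 to 16x16 bits in a modular way.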
Here the parallel multiplications are
5.7 ADVANTAGE OF VEDIC METHODS
The usefulness of Vedic mathematics lies in the fact that it reduces the typical
calculations of conventional mathematics to very simple ones. This is because the
Vedic formulae are claimed to be based on the natural principles on which the human
mind works. Vedic mathematics is a methodology of arithmetic rules that allows
more efficient implementations in terms of speed. This is a very interesting field and
presents some effective algorithms which can be applied to various branches of
engineering, such as computing.
CHAPTER 6
6.1 VERILOG LANGUAGE
6.1.1 Introduction of Verilog HDL
Verilog HDL has evolved as a standard hardware description language and
offers many useful features. Verilog HDL is a general-purpose hardware description
language that is easy to learn and easy to use. It is similar in syntax to the C
programming language, so designers with C programming experience will find it
easy to learn. Verilog HDL allows different levels of abstraction to be mixed in the
same model; thus, a designer can define a hardware model in terms of switches,
gates, RTL, or behavioral code. Also, a designer needs to learn only one language for
stimulus and hierarchical design. Most popular logic synthesis tools support Verilog
HDL, which makes it the language of choice for designers. All fabrication vendors
provide Verilog HDL libraries for post-logic-synthesis simulation; thus, designing a
chip in Verilog HDL allows the widest choice of vendors. The Programming
Language Interface (PLI) is a powerful feature that allows the user to write custom C
code to interact with the internal data structures of Verilog. Designers can customize
a Verilog HDL simulator to their needs with the PLI.
6.2 Importance of HDLs
HDLs have many advantages compared to traditional schematic-based design.
Designs can be described at a very abstract level by use of HDLs. Designers can write
their RTL description without choosing a specific fabrication technology. Logic
synthesis tools can automatically convert the design to any fabrication technology. If a
new technology emerges, designers do not need to redesign their circuit. They simply
input the RTL description to the logic synthesis tool and create a new gate-level
netlist, using the new fabrication technology. The logic synthesis tool will optimize
the circuit in area and timing for the new technology. By describing designs in HDLs,
functional verification of the design can be done early in the design cycle. Since
designers work at the RTL level, they can optimize and modify the RTL description
until it meets the desired functionality. Most design bugs are eliminated at this point.
This cuts down design cycle time significantly because the probability of hitting a
functional bug at a later time in the gate-level netlist or physical layout is minimized.
Designing with HDLs is analogous to computer programming. A textual description
with comments is an easier way to develop and debug circuits. This also provides a
concise representation of the design, compared to gate-level schematics. Gate-level
schematics are almost incomprehensible for very complex designs. HDL-based design
is here to stay. With rapidly increasing complexities of digital circuits and
increasingly sophisticated EDA tools, HDLs are now the dominant method for large
digital designs. No digital circuit designer can afford to ignore HDL-based design.
New tools and languages focused on verification have emerged in the past few
years. These languages are better suited for functional verification. However, for logic
design, HDLs continue as the preferred choice.
6.3 Trends in HDLs
The speed and complexity of digital circuits have increased rapidly. Designers
have responded by designing at higher levels of abstraction. Designers have to think
only in terms of functionality. EDA tools take care of the implementation details.
With designer assistance, EDA tools have become sophisticated enough to achieve a
close to optimum implementation. The most popular trend currently is to design in
HDL at the RTL level, because logic synthesis tools can create gate-level netlists
from RTL-level designs. Behavioral synthesis allowed engineers to design directly in terms
of algorithms and the behavior of the circuit, and then use EDA tools to do the
translation and optimization in each phase of the design. However, behavioral
synthesis did not gain widespread acceptance. Today, RTL design continues to be
very popular. Verilog HDL is also being constantly enhanced to meet the needs of
new verification methodologies. Formal verification and assertion checking
techniques have emerged. Formal verification applies formal mathematical
techniques to verify the correctness of Verilog HDL descriptions and to establish
equivalency between RTL and gate-level netlists. However, the need to describe a
design in Verilog HDL will not go away. Assertion checkers allow checking to be
embedded in the RTL code. This is a convenient way to do checking in the most
important parts of a design. New verification languages have also gained rapid
acceptance. These languages combine the parallelism and hardware constructs from
HDLs with the object oriented nature of C++.
These languages also provide support for automatic stimulus creation, checking,
and coverage. However, these languages do not replace Verilog HDL. They simply
boost the productivity of the verification process. Verilog HDL is still needed to
describe the design. For very high-speed and timing-critical circuits like
microprocessors, the gate-level netlist provided by logic synthesis tools is not optimal.
In such cases, designers often mix gate-level description directly into the RTL
description to achieve optimum results. This practice runs counter to the high-level
design paradigm, yet it is frequently used for high-speed designs because designers
need to squeeze the last bit of timing out of their circuits, and EDA tools sometimes
prove insufficient to achieve the desired results. Another technique that is used for
system-level design is a mixed bottom-up methodology where the designers use either
existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks
to quickly bring up their system simulation. This is done to reduce development costs
and compress design schedules. For example, consider a system that has a CPU,
graphics chip, I/O chip, and a system bus. The CPU designers would build the next-
generation CPU themselves at an RTL level, but they would use behavioral models for
the graphics chip and the I/O chip and would buy a vendor-supplied model for the
system bus. Thus, the system-level simulation for the CPU could be up and running
very quickly and long before the RTL descriptions for the graphics chip and the I/O
chip are complete.
CHAPTER 7
7.1 INTRODUCTION OF FPGA:
A field-programmable gate array (FPGA) is an integrated circuit designed to
be configured by a customer or a designer after manufacturing, hence the name
"field-programmable". The FPGA configuration is generally specified using a hardware
description language (HDL), similar to that used for an application-specific integrated
circuit (ASIC); circuit diagrams were previously used to specify the configuration, as
they were for ASICs, but this is increasingly rare. Contemporary FPGAs have large
resources of logic gates and RAM blocks to implement complex digital computations.
As FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a
challenge to verify correct timing of valid data within setup time and hold time. Floor
planning enables resource allocation within the FPGA to meet these time constraints.
FPGAs can be used to implement any logical function that an ASIC could perform.
The ability to update the functionality after shipping, partial re-configuration of a
portion of the design and the low non-recurring engineering costs relative to an ASIC
design (notwithstanding the generally higher unit cost), offer advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like many (changeable) logic gates that can be inter-wired in
(many) different configurations. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most
FPGAs, the logic blocks also include memory elements, which may be simple flip-
flops or more complete blocks of memory.
Some FPGAs have analog features in addition to digital functions. The most
common analog feature is programmable slew rate and drive strength on each output
pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise
ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-
speed channels that would otherwise run too slow. Another relatively common analog
feature is differential comparators on input pins designed to be connected to
differential signaling channels. A few "mixed signal FPGAs" have integrated
peripheral analog-to-digital converters (ADCs) and digital-to-analog converters
(DACs) with analog signal conditioning blocks, allowing them to operate as a system-
on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and
zeros on its internal programmable interconnect fabric, and field-programmable
analog array (FPAA), which carries analog values on its internal programmable
interconnect fabric.
7.2 FPGA ARCHITECTURE:
The most common FPGA architecture consists of an array of logic blocks
(called Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on
vendor), I/O pads, and routing channels. Generally, all the routing channels have the
same width (number of wires). Multiple I/O pads may fit into the height of one row or
the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs/LABs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even among
designs with the same amount of logic. For example, a crossbar switch requires much
more routing than a systolic array with the same gate count. Since unused routing
tracks increase the cost (and decrease the performance) of the part without providing
any benefit, FPGA manufacturers try to provide just enough tracks so that most
designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is
determined by estimates such as those derived from Rent's rule or by experiments with
existing designs.
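Rent's rule, mentioned above, relates the number of external terminals T of a block to its gate count G as T = t * G^p, where t is the average number of terminals per gate and p is the Rent exponent. A quick sketch in Python; the constants chosen here are illustrative assumptions for the example, not vendor data:

```python
def rent_terminals(gates, t=4.0, p=0.6):
    """Rent's rule: T = t * G**p.

    t is the average number of terminals per gate and p the Rent
    exponent (typically around 0.5-0.75 for ordinary logic; the
    defaults here are assumptions for illustration only).
    """
    return t * gates ** p

# Terminals per gate fall as the block grows, which is why routing
# demand does not scale linearly with logic capacity:
for g in (100, 1_000, 10_000):
    print(g, round(rent_terminals(g)))
```

This sub-linear growth is one reason manufacturers can provision "just enough" routing tracks for most designs of a given LUT count.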
In general, a logic block (CLB or LAB) consists of a few logical cells (called
ALM, LE, Slice etc.). A typical cell consists of a 4-input LUT, a Full adder (FA) and a
D-type flip-flop, as shown below. The LUTs in this figure are split into two 3-input
LUTs. In normal mode, those are combined into a 4-input LUT through the left mux.
In arithmetic mode, their outputs are fed to the FA. The selection of mode is
programmed into the middle multiplexer. The output can be either synchronous or
asynchronous, depending on the programming of the mux to the right, in the figure
example. In practice, the FA, in whole or in part, is implemented as functions inside
the LUTs in order to save space.
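The cell described above can be sketched as a small behavioral model. The sketch below is an illustrative simplification of the cell in Fig 12, not any vendor's exact implementation: the functions `lut3` and `logic_cell` and their configuration encoding are assumptions made for this example.

```python
def lut3(config, a, b, c):
    """A 3-input LUT is an 8-bit truth table indexed by its inputs."""
    index = (a << 2) | (b << 1) | c
    return (config >> index) & 1

def logic_cell(cfg_lo, cfg_hi, a, b, c, d, arithmetic=False, carry_in=0):
    """Behavioral sketch of one logic cell (simplified assumption).

    Normal mode: the two 3-input LUTs act as one 4-input LUT, with
    input d selecting between them (the 'left mux' in the figure).
    Arithmetic mode: the two LUT outputs feed a full adder instead.
    Returns (output, carry_out); carry_out is None in normal mode.
    """
    lo = lut3(cfg_lo, a, b, c)
    hi = lut3(cfg_hi, a, b, c)
    if arithmetic:
        s = lo ^ hi ^ carry_in                          # full-adder sum
        carry_out = (lo & hi) | (carry_in & (lo ^ hi))  # full-adder carry
        return s, carry_out
    return (hi if d else lo), None

# Normal mode, both LUTs programmed as 3-input AND (only index 7 -> 1):
out, _ = logic_cell(0b10000000, 0b10000000, 1, 1, 1, 0)
```

Registering the output through the D flip-flop (the synchronous path selected by the right-hand mux) is omitted here to keep the model purely combinational.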
Fig 12: FPGA architecture
ALMs and Slices usually contain two or four structures similar to the example figure,
with some shared signals. CLBs/LABs typically contain a few ALMs/LEs/Slices. In
recent years, manufacturers have started moving to 6-input LUTs in their high-
performance parts, claiming increased performance. Clock signals (and often other
high-fanout signals) are normally routed via special-purpose dedicated routing
networks in commercial FPGAs, so they are managed separately from ordinary
signals.
7.3 CYCLONE II FPGA FAMILY:
Altera's Cyclone II FPGA family is designed on an all-layer copper, low-k, 1.2-
V SRAM process and is optimized for the smallest possible die size. Built on TSMC's
highly successful 90-nm process technology using 300-mm wafers, the Cyclone II
FPGA family offers higher densities, more features, exceptional performance, and the
benefits of programmable logic at ASIC prices.
7.4 Altera's Cyclone II FPGA Family Features:
1. Cost-Optimized Architecture
The Cyclone II architecture is optimized for the lowest cost and offers up to 68,416
logic elements (LEs), more than 3x the density of first-generation Cyclone FPGAs.
The logic resources in Cyclone II FPGAs can be used to implement complex
applications.
2. High Performance
Cyclone II FPGAs are 60 percent faster than competing low-cost 90-nm FPGAs,
making them the highest-performing low-cost 90-nm FPGAs on the market.
3. Low Power
Cyclone II FPGAs consume half the power of competing low-cost 90-nm FPGAs,
dramatically reducing both static and dynamic power.
4. Process Technology
Cyclone II FPGAs are manufactured on 300-mm wafers using TSMC's leading-
edge 90-nm, low-k dielectric process technology.
5. Embedded Multipliers
Cyclone II FPGAs offer up to 150 embedded 18 x 18 multipliers that are ideal for low-cost
digital signal processing (DSP) applications. These multipliers are capable of
implementing common DSP functions such as finite impulse response (FIR) filters,
fast Fourier transforms (FFTs), correlators, encoders/decoders, and numerically
controlled oscillators (NCOs).
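The FIR filters named above are a good illustration of the multiply-accumulate pattern these embedded multiplier blocks serve: each output sample is y[n] = sum over k of h[k] * x[n-k]. A minimal software sketch of a direct-form FIR filter (illustrative coefficients; this is a behavioral model, not a hardware description):

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR filter: each output is a sum of
    coefficient * delayed-sample products -- exactly the
    multiply-accumulate work handed to embedded 18 x 18
    multiplier blocks in hardware implementations."""
    taps = [0] * len(coeffs)
    out = []
    for x in samples:
        taps = [x] + taps[:-1]          # shift the delay line
        out.append(sum(h * t for h, t in zip(coeffs, taps)))
    return out

# 3-tap example with illustrative unit coefficients:
print(fir_filter([1, 1, 1], [1, 2, 3, 4]))  # [1, 3, 6, 9]
```

In an FPGA, each `h * t` product would map onto one embedded multiplier and the sum onto the LE fabric or dedicated adder chains.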
6. Fast-On Capability
Select Cyclone II FPGAs offer fast-on capability, allowing them to be
operational soon after power-up and making them ideal for automotive and other
applications where quick start-up time is essential. Cyclone II FPGAs that offer a
faster power-on reset (POR) time are designated with an "A" in the device ordering
code (EP2C5A, EP2C8A, EP2C15A, and EP2C20A).
CHAPTER 8
RESULTS
8.1 BAUGH-WOOLEY MULTIPLIER
Fig 13: Simulation result for the Baugh-Wooley multiplier
Fig 14: RTL schematic of the Baugh-Wooley multiplier
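The Baugh-Wooley scheme handles signed (two's-complement) multiplication by complementing the partial-product terms that involve exactly one sign bit and adding fixed correction constants, so that all partial products can be summed with plain unsigned adders. The sketch below is a behavioral model of the textbook 4-bit formulation, useful for checking results against the simulation; it is an assumption about the exact design, not the synthesized RTL:

```python
def baugh_wooley(a, b, n=4):
    """Baugh-Wooley signed multiply, behavioral model (illustrative).

    a, b are n-bit two's-complement integers. Partial products that
    involve exactly one sign bit are complemented, and correction 1s
    are added at bit positions n and 2n-1, so the whole array can be
    summed with unsigned adders. Returns the signed 2n-bit product.
    """
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            pp = abit[i] & bbit[j]
            if (i == n - 1) != (j == n - 1):   # exactly one sign bit
                pp ^= 1                        # complemented partial product
            total += pp << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))     # correction constants
    total &= (1 << (2 * n)) - 1                # keep 2n bits
    if total >= 1 << (2 * n - 1):              # interpret as signed
        total -= 1 << (2 * n)
    return total
```

For example, `baugh_wooley(-3, 5)` evaluates the complemented array plus corrections and recovers -15, matching ordinary signed multiplication.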
BAUGH-WOOLEY:
8.1.1 AREA (TOTAL LOGIC ELEMENTS)
8.1.2 POWER:
8.1.3 DELAY (TIMING REQUIREMENT)
8.2 VEDIC MULTIPLIER
Fig 15: Simulation result of the Vedic multiplier
Fig 16: RTL schematic of the Vedic multiplier
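The Vedic multiplier is built on the Urdhva Tiryakbhyam ("vertically and crosswise") sutra: column k of the product sums every crosswise term a_i * b_j with i + j = k, plus the carry from the previous column, so all partial products for a column are generated in parallel. A small software model of the unsigned binary case, for checking results rather than describing the synthesized RTL:

```python
def vedic_multiply(a, b, n=4):
    """Urdhva Tiryakbhyam (vertically and crosswise) multiplication.

    Column k gathers the crosswise products a_i * b_j with i + j == k;
    the column sum's low bit is that result bit, and the remaining
    bits carry into the next column. Behavioral model of the n x n
    unsigned case (an illustrative sketch, not the project's RTL).
    """
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        column = carry + sum(abit[i] & bbit[k - i]
                             for i in range(n) if 0 <= k - i < n)
        result |= (column & 1) << k       # this column's product bit
        carry = column >> 1               # propagate to the next column
    result |= carry << (2 * n - 1)        # final carry-out
    return result

print(vedic_multiply(13, 11))  # 143
```

Because every column's crosswise terms are independent, the hardware can compute them concurrently, which is the source of the delay advantage measured below.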
8.2.1 AREA (TOTAL LOGIC ELEMENTS)
8.2.2 DELAY (TIMING REQUIREMENT)
8.2.3 POWER:
                           TIME (ns)   POWER (mW)   TOTAL LOGIC ELEMENTS
BAUGH-WOOLEY MULTIPLIER     22.314       195.68             41
VEDIC MULTIPLIER            16.939       195.18             28
8.3 COMPARISON OF TWO MULTIPLIERS
8.3.1 DELAY (TIMING REQUIREMENT)
BAUGH-WOOLEY: 22.314 ns
VEDIC: 16.939 ns
8.3.2 AREA (TOTAL LOGIC ELEMENTS)
(Bar chart: total logic elements of the Baugh-Wooley and Vedic multipliers)
BAUGH-WOOLEY: 41
VEDIC: 28
8.3.3 POWER
BAUGH-WOOLEY: 195.68 mW
VEDIC: 195.18 mW
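The measured values above can be turned into relative improvements with a short calculation, using only the numbers reported in this chapter:

```python
def improvement(baseline, value):
    """Percent reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Delay, area, and power of the Vedic design vs the Baugh-Wooley design:
print(round(improvement(22.314, 16.939), 1))  # delay reduction, percent
print(round(improvement(41, 28), 1))          # area reduction, percent
print(round(improvement(195.68, 195.18), 2))  # power reduction, percent
```

This works out to roughly a 24% delay reduction and a 32% area reduction for the Vedic design, with power essentially unchanged.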
CHAPTER 9
CONCLUSION
Multipliers are among the most important components of many systems, so we
always need better multiplier designs, and in particular designs that consume less
power. Through our project we set out to determine which of the two algorithms
works best. Our project gives a clear picture of the two multipliers and their
implementation in the Altera Quartus II tool. We found that the Vedic multiplier is a
much better option than the Baugh-Wooley multiplier, and we concluded this from the
results for power consumption and total area. The total area of the Vedic multiplier is
much less than that of the Baugh-Wooley multiplier, and its power consumption is
marginally less, as our results clearly show; the smaller structure also speeds up the
calculation and makes the system faster. When the two multipliers were compared, the
Baugh-Wooley multiplier consumed more power and occupied the larger area,
because it uses a larger number of adders, which slows the system down since more
calculation has to be done. In the end we determine that the Urdhva Tiryakbhyam
algorithm works best.
CHAPTER 10
REFERENCES:
[1] S. Vaidya and D. Dandekar, "Delay-Power Performance of Multipliers in VLSI
Design," International Journal of Computer Networks & Communications (IJCNC),
Vol. 2, No. 4, July 2010.
[2] K. K. Mahapatra, "Design and Implementation of Different Multipliers Using
VHDL," Dept. of Electronics and Communication Engineering, National Institute of
Technology, Rourkela, 2007.
[3] H. S. Dhillon and A. Mitra, "A Reduced-Bit Multiplication Algorithm for Digital
Arithmetic," International Journal of Computational and Mathematical Sciences, 2:2,
2008.
[4] D. Krishnaveni (Department of TCE, A.P.S College of Engineering) and T. G.
Umarani (Department of ECE, A.P.S College of Engineering, Somanahalli), "VLSI
Implementation of Vedic Multiplier with Reduced Delay," International Journal of
Advanced Technology & Engineering Research (IJATER), National Conference on
Emerging Trends in Technology (NCET-Tech).
[5] P. Asadi and K. Navi, "A New Low Power 32 x 32-bit Multiplier," World Applied
Sciences Journal, IDOSI Publications.
[6] H. Thapliyal and H. R. Arbania, "A Novel Parallel Multiply and Accumulate
(V-MAC) Architecture Based on Ancient Indian Vedic Mathematics."
[7] C. Senthilpari, A. K. Singh and K. Diwadkar, "Low Power and High Speed 8x8
Bit Multiplier Using Non-clocked Pass Transistor Logic," IEEE, 2007.
[8] K.-S. Yeo and K. Roy, Low-Voltage, Low-Power VLSI Subsystems, McGraw-Hill.
[9] J. D. Lee, Y. J. Yoon, K. H. Lee and B.-G. Park, "Application of Dynamic Pass-
Transistor Logic to an 8-Bit Multiplier," Journal of the Korean Physical Society,
March 2001.
[10] C. F. Law, S. S. Rofail and K. S. Yeo, "A Low-Power 16 x 16-Bit Parallel
Multiplier Utilizing Pass-Transistor Logic," IEEE Journal of Solid-State Circuits,
October 1999.
[11] C. N. Marimuthu and P. Thiangaraj, "Low Power High Performance Multiplier,"
ICGST-PDCS, Volume 8, December 2008.
[12] P. Parate, "ASIC Implementation of 4-Bit Multipliers," IEEE Computer Society,
ICETET, 2008.
Books referred:
1. VHDL by B. Bhaskar.
2. S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Second
Edition, Prentice Hall PTR, February 21, 2003.