Oct 29, 2015
CHAPTER 1
1. INTRODUCTION:
1.1 BACKGROUND:
In today's fast-developing technological world, the shift has been towards the
construction of small and portable devices. As the number of these battery-operated,
processor-driven devices increases and their performance demands grow, there is a
need to increase their processing speed and reduce their power dissipation. In such a
consumer-driven scenario, these demands call for a serious look into the construction
of the devices. Processors are used for such purposes, and in these processors major
operations such as FIR filtering, DCT, etc. are carried out through multipliers. As
multipliers are major components of DSP systems, optimization of the multiplier
design will surely lead to a better-operating DSP.
1.2 IMPORTANCE OF MULTIPLIER:
Computational performance of a DSP system is limited by its multiplication
performance; since multiplication dominates the execution time of most DSP
algorithms, a high-speed multiplier is much desired. Currently, multiplication time is
still the dominant factor in determining the instruction cycle time of a DSP chip.
With an ever-increasing quest for greater computing power on battery-operated
mobile devices, design emphasis has shifted from optimizing conventional delay time
and area size to minimizing power dissipation while still maintaining high
performance. Traditionally, the shift-and-add algorithm has been used for multiplier
design, but it is not suitable for VLSI implementation, nor from a delay point of view.
Among the important algorithms proposed in the literature for VLSI-implementable
fast multiplication are the array multiplier and the Wallace tree multiplier. This paper
presents the fundamental technical aspects behind these approaches. Low-power,
high-speed VLSI can be implemented with different logic styles. The three important
considerations in VLSI design are power, area and delay. Many logic styles for low
power dissipation and high speed have been proposed, and each logic style has its
own advantages in terms of speed and power.
1.3 MULTIPLIER SCHEMES:
There are two basic schemes in the multiplication process: serial multiplication
and parallel multiplication.
Serial Multiplication (Shift-Add)
A set of partial products is computed and the partial products are then summed
together. The implementations are primitive, with simple architectures (used when
there is no dedicated hardware multiplier).
Parallel Multiplication
Partial products are generated simultaneously. Parallel implementations are
used in high-performance machines, where computation latency needs to be
minimized.
Comparing the two types, parallel multiplication has the advantage over serial
multiplication: it takes fewer steps, so it performs faster.
1.4 MULTIPLIER FEATURES:
The features of the multiplier are
1.4.1 PIPELINING:
Pipelining allows the multiplier to accept and start the partial processing of a
new set of data even though part of another multiplication is still taking place.
1.4.2 MIXED ARCHITECTURE:
A mixed architecture has been considered, built around the Wallace tree
multiplier. This allows the design to take advantage of the low delay of the Wallace
multiplier.
1.4.3 CLOCKING:
Clocking has been arranged so as to allow the multiplier to work at its highest
clock frequency without compromising the proper flow of partial products through
the structure.
1.4.4 DATA RANGE:
The data range has been extended from the initial 4x4 bits to 16x16 bits, which
is the working data range actually required by many DSP processors.
1.4.5 STRUCTURAL MODELLING:
This ensures the best implementation of the multiplier, be it on an ASIC or on an
FPGA, and removes any chance of redundant hardware being generated.
CHAPTER 2
2.1 ADDER
In electronics, an adder is a digital circuit that performs addition of numbers. In
modern computers, adders reside in the arithmetic logic unit (ALU), where other
operations are performed as well. Although adders can be constructed for many
numerical representations, such as binary-coded decimal or excess-3, the most
common adders operate on binary numbers. In cases where two's complement is used
to represent negative numbers, it is trivial to modify an adder into an adder-subtractor.
2.2 TYPES OF ADDERS
For single-bit adders, there are two general types. A half adder has two inputs,
generally labeled A and B, and two outputs, the sum S and carry C. S is the XOR of
A and B, and C is the AND of A and B. Essentially the output of a half adder is the
two-bit sum of two one-bit numbers, with C being the more significant of the two
outputs. The second type of single-bit adder is the full adder. The full adder takes
into account a carry input, so that multiple adders can be cascaded to add larger
numbers. To remove ambiguity between the input and output carry lines, the carry in
is labeled Ci or Cin while the carry out is labeled Co or Cout.
Half adder
Fig 1: Half adder circuit diagram
A half adder is a logical circuit that performs an addition operation on two
binary digits. The half adder produces a sum and a carry value which are both binary
digits.
Following is the logic table for a half adder:
TABLE 1: HALFADDER
A B C S
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
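The half adder logic (S = A XOR B, C = A AND B) can be checked against Table 1 with a small software model (a Python sketch, not the circuit itself):

```python
def half_adder(a, b):
    """Half adder: sum is XOR, carry is AND."""
    return a ^ b, a & b  # (S, C)

# Reproduce Table 1: rows are (A, B, C, S)
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, c, s)
```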
Fig 2: Full adder circuit diagram
Schematic symbol for a 1-bit full adder
A full adder is a logical circuit that performs an addition operation on three
binary digits. The full adder produces a sum and a carry value, which are both binary
digits. It can be combined with other full adders (see below) or work on its own.
TABLE 2: FULL ADDER
A B Ci Co S
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Note that the final OR gate before the carry-out output may be replaced by an
XOR gate without altering the resulting logic. This is because the only discrepancy
between OR and XOR gates occurs when both inputs are 1, and for the adder shown
here one can check that this is never possible. Using only two types of gates is
convenient if one desires to implement the adder directly using common IC chips. A
full adder can be constructed from two half adders by connecting A and B to the
inputs of one half adder, connecting its sum to an input of the second half adder,
connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S
could be made the three-bit XOR of A, B, and Ci, and Co could be made the three-bit
majority function of A, B, and Ci. The output of the full adder is the two-bit
arithmetic sum of three one-bit numbers.
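The construction just described (two half adders plus an OR on the carries) can be modeled and checked against Table 2 (a Python sketch of the logic):

```python
def half_adder(a, b):
    return a ^ b, a & b

def full_adder(a, b, ci):
    """Full adder built from two half adders; the carries are ORed.
    (The OR could equally be an XOR: both carries are never 1 at once.)"""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, ci)
    return s, c1 | c2

# Verify against Table 2: Co and S form the two-bit sum of A + B + Ci
for a in (0, 1):
    for b in (0, 1):
        for ci in (0, 1):
            s, co = full_adder(a, b, ci)
            assert 2 * co + s == a + b + ci
```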
CHAPTER 3
LITERATURE SURVEY
3.1 BASIC MULTIPLIER ARCHITECTURES:
3.1.1 INTRODUCTION:
A basic multiplier consists of ANDed terms (as shown in Fig 3) and an array of
full adders and/or half adders arranged so as to obtain partial products at each level.
These partial products are added together to obtain the final result. It is the different
arrangements of, and construction changes in, these adders that lead to the various
structures of basic multipliers.
Fig 3: AND gate
The Full Adder (FA) implementation shows the two bits (A, B) and Carry In (Ci)
as inputs and Sum (S) and Carry Out (Cout) as outputs.
3.2 BINARY MULTIPLIER
A binary multiplier is an electronic hardware circuit used in digital electronics,
such as a computer or other electronic device, to perform rapid multiplication of two
numbers in binary representation. It is built using binary adders.
The rules for binary multiplication can be stated as follows:
1. If the multiplier digit is a 1, the multiplicand is simply copied down and
represents the product.
2. If the multiplier digit is a 0, the product is also 0.
For designing a multiplier circuit we should have circuitry to provide or do the
following:
1. It should be capable of identifying whether a bit is 0 or 1.
2. It should be capable of shifting the partial products left.
3. It should be able to add all the partial products to give the product as the sum
of the partial products.
4. It should examine the sign bits. If they are alike, the sign of the product will
be positive; if the sign bits are opposite, the product will be negative. The sign
bit of the product determined by the above criteria should be displayed along
with the product.
From the above discussion we observe that it is not necessary to wait until all
the partial products have been formed before summing them. In fact, the addition of
a partial product can be carried out as soon as it is formed.
Notations:
a multiplicand
b multiplier
p product
Binary multiplication (e.g. n = 4):
P = a x b
a(n-1) a(n-2) ... a1 a0 (multiplicand a)
b(n-1) b(n-2) ... b1 b0 (multiplier b)
p(2n-1) p(2n-2) ... p1 p0 (product p)
      x x x x      a (multiplicand)
      x x x x      b (multiplier)
      ---------
      x x x x      b0*a*2^0
     x x x x       b1*a*2^1   (partial products)
    x x x x        b2*a*2^2
   x x x x         b3*a*2^3
  ---------------
x x x x x x x x    p (product)
3.2.1 BASIC HARDWARE MULTIPLIER
Partial products: in binary, the partial products are trivial. If the multiplier
bit = 1, copy the multiplicand; else the row is 0. This is done with an AND gate.
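The AND-gate partial-product generation, followed by the weighted summation described above, can be sketched as a software model (Python, not the hardware):

```python
def partial_products(a_bits, b_bits):
    """Row i of partial products: multiplier bit b_i ANDed with each bit of a."""
    return [[bi & aj for aj in a_bits] for bi in b_bits]

def sum_rows(rows):
    """Add the rows, row i carrying weight 2^i (the left shift per row)."""
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))

# a = 13 (1101), b = 11 (1011); the bit lists are LSB first
rows = partial_products([1, 0, 1, 1], [1, 1, 0, 1])
print(sum_rows(rows))  # 13 * 11 = 143
```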
3.2.2 MULTIPLY ACCUMULATE CIRCUITS
Multiplication followed by accumulation is a operation in many digital systems,
particularly those highly interconnected like digital filters, neural networks, data
quantizers, etc. One typical MAC(multiply-accumulate) architecture is illustrated in
figure. It consists of multiplying 2 values, then adding the result to the previously
accumulated value, which must then be restored in the registers for future
accumulations. Another feature of MAC circuit is that it must check for overflow,
which might happen when the number of MAC operation is large . This design can be
done using component because we have already design each of the units shown in
figure. However since it is relatively simple circuit, it can also be designed directly. In
any case the MAC circuit, as a whole, can be used as a component in application like
digital filters and neural networks.
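The multiply-accumulate behaviour, including the overflow check mentioned above, can be sketched as follows; the 16-bit register width is an illustrative assumption (Python model, not the circuit):

```python
class MAC:
    """Multiply-accumulate: each step adds a product into a stored accumulator."""
    def __init__(self, width=16):
        self.acc = 0
        self.limit = 1 << width     # assumed register capacity
        self.overflow = False
    def step(self, a, b):
        self.acc += a * b           # multiply, then accumulate into the register
        self.overflow = self.acc >= self.limit
        return self.acc

mac = MAC()
for a, b in [(3, 4), (5, 6), (7, 8)]:   # e.g. a tap-by-tap dot product
    mac.step(a, b)
print(mac.acc)   # 3*4 + 5*6 + 7*8 = 98
```

This loop is exactly the inner loop of an FIR filter, which is why MAC units dominate DSP datapaths.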
3.3 WALLACE TREE MULTIPLIER:
A Wallace tree is an efficient hardware implementation of a digital circuit that
multiplies two integers. For a NxN bit multiplication, partial products are formed
from (N^2)AND gates. Next N rows of the partial products are grouped together in set
of three rows each. Any additional rows that are not a member of these groups are
transferred to the next level without modification. For a column consisting of three
15
partial products and a full adder is used with the sum dropped down to the same
column whereas the carry out is brought to the next higher column. For column with
two partial products, a half adder is used in place of full adder. At the final stage, a
carry propagation adder is used to add over all the propagating carries to get the final
result. It can also be implemented using Carry Save Adders. Sometimes it will be
Combined with Booth Encoding.Various other researches have been done to reduce
the number of adders, for higher order bits such as 16 & 32.Applications, as the use in
DSP for performing FFT,FIR, etc.,
3.3.1 WALLACE TREE HARDWARE ARCHITECTURE:
Fig 4: Wallace tree hardware architecture
3.3.2 FUNCTION:
The Wallace tree has three steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the
other, yielding n^2 results. Depending on the position of the multiplied bits, the
wires carry different weights; for example, the wire carrying the result of a2b3 has
weight 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.
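The three steps can be modeled in software with per-column bit lists; a full adder turns three bits of one weight into a sum of the same weight plus a carry of the next weight (a Python sketch of the reduction, not the gate-level design):

```python
from collections import defaultdict

def wallace_multiply(a, b, n=4):
    """Sketch of the three Wallace-tree steps using per-weight bit lists."""
    # Step 1: AND every bit pair; the result of a_i * b_j has weight 2^(i+j).
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    # Step 2: layers of full/half adders until every column holds <= 2 bits.
    while any(len(c) > 2 for c in cols.values()):
        nxt = defaultdict(list)
        for w, c in cols.items():
            while len(c) >= 3:              # full adder: sum stays, carry moves up
                x, y, z = c.pop(), c.pop(), c.pop()
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (y & z) | (x & z))
            if len(c) == 2:                 # half adder on a two-bit column
                x, y = c.pop(), c.pop()
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            nxt[w].extend(c)                # a lone bit passes through unchanged
        cols = nxt
    # Step 3: group the remaining bits into two numbers; add conventionally.
    row1 = sum(c[0] << w for w, c in cols.items() if len(c) > 0)
    row2 = sum(c[1] << w for w, c in cols.items() if len(c) > 1)
    return row1 + row2

print(wallace_multiply(13, 11))   # 13 * 11 = 143
```

Each pass of the while loop corresponds to one reduction layer of the tree.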
3.3.3 EXAMPLE:
Suppose two numbers are being multiplied:
a3a2a1a0 X
b3b2b1b0
___________________________________
a3b0 a2b0 a1b0 a0b0
a3b1 a2b1 a1b1 a0b1
a3b2 a2b2 a1b2 a0b2
a3b3 a2b3 a1b3 a0b3
_____________________________________
Arranging the partial products in the form of tree structure
a3b3 a2b3 a1b3 a0b3 a0b2 a0b1 a0b0
a3b2 a2b2 a1b2 a1b1 a1b0
a3b1 a2b1 a2b0
a3b0
3.3.4 ADDER ELEMENTS
Half Adder:
Full Adder:
3.3.5 ADVANTAGES:
Each layer of the tree reduces the number of vectors by a factor of 3:2.
Minimum propagation delay.
The benefit of the Wallace tree is that there are only O(log n) reduction layers,
whereas adding the partial products with regular adders would require O((log n)^2) time.
3.3.6 DISADVANTAGES:
Wallace trees do not provide any advantage over ripple adder trees in many FPGAs.
Due to the irregular routing, they may actually be slower and are certainly more difficult to route.
The adder structure grows for larger bit-width multiplications.
CHAPTER 4
4. ARRAY MULTIPLIER:
This is the most basic form of binary multiplier construction. Its basic principle
is exactly like pen-and-paper multiplication. It consists of a highly regular array of
full adders, the exact number depending on the length of the binary numbers to be
multiplied. Each row of this array generates a partial product. The value generated
by each partial product is then added to the sum and carry generated on the next row.
The final result of the multiplication is obtained directly after the last row. The
ANDed terms are generated using logic AND gates. The Full Adder (FA)
implementation shows the two bits (A, B) and Carry In (Ci) as inputs and Sum (S)
and Carry Out (Co) as outputs.
4.1 HARDWARE ARCHITECTURE
Fig 5: Hardware architecture
4.2 EXAMPLE :
4*4 bit multiplication
a3 a2 a1 a0
b3 b2 b1 b0
a3b0 a2b0 a1b0 a0b0
a3b1 a2b1 a1b1 a0b1
a3b2 a2b2 a1b2 a0b2
a3b3 a2b3 a1b3 a0b3
p7 p6 p5 p4 p3 p2 p1 p0
4.3 PRINCIPLES OF ARRAY MULTIPLIER:
Fig 6: Array multiplier
Due to its highly regular structure, the array multiplier is very easily constructed
and can also be densely implemented in VLSI, which takes less space. But compared
with other multiplier structures proposed later, it shows a high computational time. In
fact, the computational time is of order O(N), one of the highest of any multiplier
structure.
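The row-by-row operation of the array multiplier can be sketched as follows; each cell ANDs one multiplier bit with one multiplicand bit and adds it, via a full adder, to the sum and carry coming from the row above (a Python model of the array, not the hardware):

```python
def full_adder(a, b, ci):
    s = a ^ b ^ ci
    co = (a & b) | (b & ci) | (a & ci)
    return s, co

def array_multiply(a, b, n=4):
    """Row-by-row array multiplier: row r adds the partial product for
    multiplier bit b_r into the running sum, with a rippling carry."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    acc = [0] * (2 * n)                      # running sum bits, LSB first
    for row in range(n):
        b_bit = (b >> row) & 1
        carry = 0
        for col in range(n):
            pp = a_bits[col] & b_bit         # ANDed term for this cell
            acc[row + col], carry = full_adder(acc[row + col], pp, carry)
        acc[row + n] = carry                 # carry out of the row
    return sum(bit << i for i, bit in enumerate(acc))

print(array_multiply(13, 11))   # 13 * 11 = 143
```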
4.4 BAUGH-WOOLEY MULTIPLIER :
The Baugh-Wooley multiplier is used for both unsigned and signed
number multiplication. Signed operands are represented in 2's complement form.
The partial products are adjusted such that the negative signs move to the last step,
which in turn maximizes the regularity of the multiplication array. The
Baugh-Wooley multiplier operates on signed operands with 2's complement
representation so as to make sure that the signs of all partial products are positive.
The numerical values of two 2's complement numbers, say X and Y, can be obtained
from product terms each made of one AND gate.
Variables with bars denote prior inversions. Inverters are connected
before the inputs of the full adders or the AND gates as required by the algorithm.
Each column represents the addition in accordance with the respective weight of the
product term.
4.5 BAUGH-WOOLEY HARDWARE ARCHITECTURE:
Fig 7: Signed 2's-complement Baugh-Wooley multiplier
4.6 MULTIPLYING TWO'S COMPLEMENT NUMBERS:
The Baugh-Wooley multiplication algorithm is an efficient way to handle the
sign bits. This technique was developed in order to design regular multipliers suited
to 2's-complement numbers. Dr. Gebali has extended this basic idea and developed
efficient fast inner-product processors capable of performing double-precision
multiply-accumulate operations without a speed penalty. Let us consider two n-bit
numbers, A and B, to be multiplied. A and B can be represented as
A = -a(n-1)*2^(n-1) + sum(i=0..n-2) a(i)*2^i
B = -b(n-1)*2^(n-1) + sum(i=0..n-2) b(i)*2^i
where the a(i)'s and b(i)'s are the bits in A and B, respectively, and a(n-1) and
b(n-1) are the sign bits.
The product, P = A * B, is then given by the following equation:
P = a(n-1)b(n-1)*2^(2n-2) + sum(i=0..n-2) sum(j=0..n-2) a(i)b(j)*2^(i+j)
    - 2^(n-1)*sum(i=0..n-2) a(n-1)b(i)*2^i - 2^(n-1)*sum(i=0..n-2) b(n-1)a(i)*2^i
It indicates that the final product is obtained by subtracting the last two positive terms
from the first two terms.
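The sign-split expansion above can be verified numerically; the helper names below are illustrative, not from the original design (a Python check, assuming LSB-first bit lists):

```python
def twos_value(bits):
    """Value of a 2's-complement number given its LSB-first bits."""
    n = len(bits)
    return -bits[-1] * (1 << (n - 1)) + sum(bits[i] << i for i in range(n - 1))

def bw_product(a_bits, b_bits):
    """P = A*B via the sign-split expansion: two positive terms minus
    the two cross terms involving the sign bits."""
    n = len(a_bits)
    t1 = a_bits[-1] * b_bits[-1] * (1 << (2 * n - 2))
    t2 = sum(a_bits[i] * b_bits[j] << (i + j)
             for i in range(n - 1) for j in range(n - 1))
    t3 = (1 << (n - 1)) * sum(a_bits[-1] * b_bits[i] << i for i in range(n - 1))
    t4 = (1 << (n - 1)) * sum(b_bits[-1] * a_bits[i] << i for i in range(n - 1))
    return t1 + t2 - t3 - t4

# -3 (1101, LSB first [1,0,1,1]) times 5 (0101, LSB first [1,0,1,0])
a, b = [1, 0, 1, 1], [1, 0, 1, 0]
print(bw_product(a, b), twos_value(a) * twos_value(b))   # both are -15
```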
4.7 BLOCK DIAGRAM OF A 4*4 BAUGH-WOOLEY MULTIPLIER:
Fig 8: Block diagram of Baugh-Wooley multiplier
4.8 ADVANTAGES:
Minimum complexity.
Easily scalable.
Easily pipelined.
Regular shape, easy to place & route.
4.9 DISADVANTAGES:
High power consumption.
More digital gates resulting in large chip area.
CHAPTER 5
5.1 PROPOSED MULTIPLIER DESIGN
Mathematics is the mother of all sciences, full of magic and mysteries. The
ancient Indians were able to understand these mysteries and develop simple keys to
solve them. Thousands of years ago the Indians used these techniques in different
fields like construction of temples, astrology, medicine, science, etc., due to which
India emerged as the richest country in the world. The Indians called this system of
calculation Vedic mathematics. Vedic mathematics is much simpler and easier to
understand than conventional mathematics. The ancient system of Vedic
mathematics was reintroduced to the world by Swami Bharati Krishna Tirthaji
Maharaj, Shankaracharya of Goverdhan Peath; Vedic Mathematics was the name
given by him. Bharati Krishna, who was himself a scholar of Sanskrit, mathematics,
history and philosophy, was able to reconstruct the mathematics of the Vedas.
According to his research, all of mathematics is based on sixteen sutras, or
word-formulae, and thirteen sub-sutras. According to Mahesh Yogi, the sutras of
Vedic Mathematics are the software for the cosmic computer that runs this universe.
Vedic mathematics introduces wonderful applications to arithmetical computations,
theory of numbers, compound multiplication, algebraic operations, factorization,
simple quadratic and higher-order equations, simultaneous quadratic equations,
partial fractions, calculus, squaring, cubing, square roots, cube roots, coordinate
geometry and the wonderful Vedic numerical code. Conventional mathematics is an
integral part of engineering education, since most engineering system designs are
based on various mathematical approaches. All the leading manufacturers of
microprocessors have developed their architectures to be suitable for conventional
binary arithmetic methods. The need for faster processing speed is continuously
driving major improvements in processor technologies, as well as the search for new
algorithms. The Vedic mathematics approach is totally different and is considered
very close to the way a human mind works. A multiplier is one of the key hardware
blocks in most applications such as digital signal processing, encryption and
decryption algorithms in cryptography, and other logical computations. With
advances in technology, many researchers have tried to design multipliers which
offer high speed, low power consumption, regularity of layout (and hence less area),
or a combination of these qualities. The Vedic multiplier is considered here to satisfy
our requirements. In this work, we present multiplication based on Urdhva
Tiryagbhyam in binary, designed using a new proposed 4-bit adder and implemented
in an HDL. The work is organized as follows: the Vedic multiplication method based
on the Urdhva Tiryagbhyam sutra for binary numbers is discussed; a new 4-bit adder
is proposed, and the design and implementation of the above multiplier are dealt
with; finally, the experimental results obtained are summarized, together with the
conclusions of the work.
5.2 VEDIC MULTIPLIER:
Digital signal processors (DSPs) are very important in various engineering
disciplines. Fast multiplication is very important in DSPs for convolution, Fourier
transforms, etc. A fast method of multiplication based on ancient Indian Vedic
mathematics is proposed in this work. Among the various methods of multiplication
in Vedic mathematics, Urdhva Tiryagbhyam is discussed in detail. Urdhva
Tiryagbhyam is a general multiplication formula applicable to all cases of
multiplication. This algorithm is applied to digital arithmetic, and a multiplier
architecture is formulated. It is a highly modular design in which smaller blocks can
be used to build larger blocks. The coding is done in Verilog HDL and synthesis is
done using Altera Quartus II. The combinational delay obtained after synthesis is
compared with the performance of the Baugh-Wooley and Wallace tree multipliers,
which are fast multipliers. This Vedic multiplier can bring about great improvement
in DSP performance.
5.3 IMPORTANCE OF VEDIC MATHEMATICS:
Among the various methods of multiplication in Vedic mathematics, Urdhva
Tiryagbhyam, being a general multiplication formula, is equally applicable to all
cases of multiplication. It is more efficient in the multiplication of large numbers
with respect to speed and area. In this work, a 4 x 4 binary multiplier is designed
using this sutra. This multiplier can be used in applications such as digital signal
processing, encryption and decryption algorithms in cryptography, and other logical
computations. The design is implemented in Verilog HDL.
5.4 URDHVA TIRYAGBHYAM SUTRA:
The given Vedic multiplier is based on the Vedic multiplication formulae
(sutras). This sutra has traditionally been used for the multiplication of two numbers.
The Urdhva Tiryagbhyam sutra is a general multiplication formula applicable to all
cases of multiplication; it means "vertically and crosswise". The digits on the two
ends of a line are multiplied and the result is added to the previous carry. When there
are more lines in one step, all the results are added to the previous carry. The least
significant digit of the number thus obtained acts as one of the result digits and the
rest act as the carry for the next step. Initially the carry is taken to be zero. The line
diagram for the multiplication of two 4-bit numbers is shown in the figure.
To illustrate this multiplication scheme, let us consider the multiplication of two
decimal numbers (325 * 728). The line diagram for the multiplication is shown in the
figure. The digits on the two ends of each line are multiplied and the result is added
to the previous carry. When there are more lines in one step, all the results are added
to the previous carry. The least significant digit of the number thus obtained acts as
one of the result digits and the rest act as the carry for the next step. Initially the
carry is taken to be zero.
Fig 9: Multiplication of two decimal numbers by Urdhva tiryagbhyam sutra
The Urdhva Tiryagbhyam sutra can thus be used for the multiplication of two
decimal numbers; the same sutra is used in binary multiplication as shown in the
figure. The 4-bit binary numbers to be multiplied are written on two consecutive
sides of the square as shown in the figure. The square is divided into rows and
columns, where each row/column corresponds to one of the digits of either the
multiplier or the multiplicand. Thus, each bit of the multiplier has a small box
common with a digit of the multiplicand. Each bit of the multiplier is then
independently multiplied (logical AND) with every bit of the
multiplicand and the product is written in the common box. All the bits lying on a
crosswise dotted line are added to the previous carry. The least significant bit of the
obtained number acts as the result bit and the rest as the carry for the next step. The
carry for the first step (i.e., the dotted line on the extreme right side) is taken to be
zero. We can extend this method to higher-order binary numbers.
Fig 10: Multiplication of two 4-bit binary numbers by Urdhva tiryagbhyam sutra
Now we extend this sutra to the binary number system. For the multiplication
algorithm, let us consider the multiplication of two 8-bit binary numbers
A7A6A5A4A3A2A1A0 and B7B6B5B4B3B2B1B0. As the result of this
multiplication would be wider than 8 bits, we express it as
R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 together with a final carry.
As in the previous case, the digits on both ends of each line are multiplied and added
to the carry from the previous step. This generates one of the bits of the result and a
carry. This carry is added in the next step, and so the process goes on. If more
than one line is present in one step, all the results are added to the previous carry. In
each step, the least significant bit acts as the result bit and all the other bits act as the
carry. For example, if in some intermediate step we get 011, then 1 will act as the
result bit and 01 as the carry. Thus we get the following expressions:
R0=A0B0
C1R1=A0B1+A1B0
C2R2=C1+A0B2+A2B0+A1B1
C3R3=C2+A3B0+A0B3+A1B2+A2B1
C4R4=C3+A4B0+A0B4+A3B1+A1B3+A2B2
C5R5=C4+A5B0+A0B5+A4B1+A1B4+A3B2+A2B3
C6R6=C5+A6B0+A0B6+A5B1+A1B5+A4B2+A2B4 +A3B3
C7R7=C6+A7B0+A0B7+A6B1+A1B6+A5B2+A2B5 +A4B3+A3B4
C8R8=C7+A7B1+A1B7+A6B2+A2B6+A5B3+A3B5+A4B4
C9R9=C8+A7B2+A2B7+A6B3+A3B6+A5B4 +A4B5
C10R10=C9+A7B3+A3B7+A6B4+A4B6+A5B5
C11R11=C10+A7B4+A4B7+A6B5+A5B6
C12R12=C11+A7B5+A5B7+A6B6
C13R13=C12+A7B6+A6B7
C14R14=C13+A7B7
C14R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 is the final product.
Hence this is the general mathematical formula applicable to all cases of
multiplication. All the partial products are calculated in parallel, and the delay
associated is mainly the time taken by the carry to propagate through the adders
which form the multiplication array. So this is not an efficient algorithm for the
multiplication of large numbers, as a lot of propagation delay is involved in such
cases. To overcome this problem, the Nikhilam sutra presents an efficient method of
multiplying two large numbers.
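The column expressions R0 through C14R14 amount to the following procedure, sketched in Python (a software model of the sutra, not the Verilog design):

```python
def urdhva_multiply(a, b, n=8):
    """Column-wise (vertically and crosswise) multiplication: step k adds the
    crosswise products A_i*B_j with i + j = k to the previous carry; the LSB
    of the total is the result bit R_k and the remaining bits are the carry."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    carry, result = 0, 0
    for k in range(2 * n - 1):
        total = carry + sum(A[i] & B[k - i]
                            for i in range(n) if 0 <= k - i < n)
        result |= (total & 1) << k      # R_k = least significant bit
        carry = total >> 1              # the rest carries into step k+1
    return result | (carry << (2 * n - 1))

print(urdhva_multiply(0b11001010, 0b01101101))   # 202 * 109 = 22018
```

Each pass of the loop is one of the expressions above, e.g. k = 2 computes C2R2 = C1 + A0B2 + A2B0 + A1B1.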
5.5 THE MULTIPLIER ARCHITECTURE:
The multiplier architecture is based on the Urdhva Tiryagbhyam sutra. The
advantage of this algorithm is that the partial products and their sums are calculated
in parallel. This parallelism makes the multiplier clock-independent. The other main
advantage of this multiplier compared with other multipliers is its regularity; due to
this modular nature, the layout design is easy. The architecture can be explained
with two four-bit numbers, i.e. the multiplier and multiplicand are four-bit numbers.
The multiplicand and the multiplier are each split into two-bit blocks. According to
the algorithm, the 4 x 4 (A x B) bit multiplication is organized as follows:
A = AH - AL, B = BH - BL
A = A3A2A1A0
B = B3B2B1B0
AH = A3A2, AL = A1A0
BH = B3B2, BL = B1B0
5.6 METHODOLOGY:
Fig 11: Vedic algorithm
By the algorithm, the product can be obtained as follows:
A x B = (AH x BH)*2^4 + (AH x BL + AL x BH)*2^2 + AL x BL
i.e. the four partial products AL x BL, AH x BL, AL x BH and AH x BH are
computed in parallel and combined with the appropriate place shifts.
The parallel multiplications are shown in the figure.
The 4 x 4 bit multiplications can again be reduced to 2 x 2 bit multiplications.
The two-bit blocks of the 4-bit multiplicand and multiplier are divided further into
single-bit halves:
AH = AHH - AHL
BH = BHH - BHL
AH x BH = (AHH x BHH)*2^2 + (AHH x BHL + AHL x BHH)*2^1 + AHL x BHL
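The decomposition into 2 x 2 blocks can be sketched as follows; the place shifts 2^2 and 2^4, implicit in the block diagram, are made explicit (a Python model of the block structure):

```python
def mul2x2(a, b):
    """2 x 2-bit multiply, itself done vertically and crosswise on single bits."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    return (a0 & b0) + (((a1 & b0) + (a0 & b1)) << 1) + ((a1 & b1) << 2)

def mul4x4(a, b):
    """4 x 4-bit multiply from four 2 x 2 blocks combined with place shifts."""
    AL, AH = a & 0b11, (a >> 2) & 0b11
    BL, BH = b & 0b11, (b >> 2) & 0b11
    return (mul2x2(AH, BH) << 4) \
         + ((mul2x2(AH, BL) + mul2x2(AL, BH)) << 2) \
         + mul2x2(AL, BL)

print(mul4x4(13, 11))   # 13 * 11 = 143
```

The same decomposition applies recursively, which is why the design scales from 4x4 to 16x16 bits in a modular way.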
Here the parallel multiplications are
5.7 ADVANTAGE OF VEDIC METHODS
The usefulness of Vedic mathematics lies in the fact that it reduces the typical
calculations of conventional mathematics to very simple ones. This is because the
Vedic formulae are claimed to be based on the natural principles on which the human
mind works. Vedic mathematics is a methodology of arithmetic rules that allows
more efficient implementations in terms of speed. This is a very interesting field and
presents some effective algorithms which can be applied to various branches of
engineering, such as computing.
CHAPTER 6
6.1 VERILOG LANGUAGE
6.1.1 Introduction of Verilog HDL
Verilog HDL has evolved as a standard hardware description language and
offers many useful features. Verilog HDL is a general-purpose hardware description
language that is easy to learn and easy to use. It is similar in syntax to the C
programming language, so designers with C programming experience will find it
easy to learn. Verilog HDL allows different levels of abstraction to be mixed in the
same model; thus, a designer can define a hardware model in terms of switches,
gates, RTL, or behavioral code. Also, a designer needs to learn only one language for
stimulus and hierarchical design. Most popular logic synthesis tools support Verilog
HDL, which makes it the language of choice for designers. All fabrication vendors
provide Verilog HDL libraries for post-logic-synthesis simulation; thus, designing a
chip in Verilog HDL allows the widest choice of vendors. The Programming
Language Interface (PLI) is a powerful feature that allows the user to write custom C
code to interact with the internal data structures of Verilog. Designers can customize
a Verilog HDL simulator to their needs with the PLI.
6.2 Importance of HDLs
HDLs have many advantages compared to traditional schematic-based design.
Designs can be described at a very abstract level by use of HDLs. Designers can write
their RTL description without choosing a specific fabrication technology. Logic
synthesis tools can automatically convert the design to any fabrication technology. If a
new technology emerges, designers do not need to redesign their circuit. They simply
input the RTL description to the logic synthesis tool and create a new gate-level
netlist, using the new fabrication technology. The logic synthesis tool will optimize
the circuit in area and timing for the new technology. By describing designs in HDLs,
functional verification of the design can be done early in the design cycle. Since
designers work at the RTL level, they can optimize and modify the RTL description
until it meets the desired functionality. Most design bugs are eliminated at this point.
This cuts down design cycle time significantly because the probability of hitting a
functional bug at a later time in the gate-level netlist or physical layout is minimized.
Designing with HDLs is analogous to computer programming. A textual description
with comments is an easier way to develop and debug circuits. This also provides a
concise representation of the design, compared to gate-level schematics. Gate-level
schematics are almost incomprehensible for very complex designs. HDL-based design
is here to stay. With rapidly increasing complexities of digital circuits and
increasingly sophisticated EDA tools, HDLs are now the dominant method for large
digital designs. No digital circuit designer can afford to ignore HDL-based design.
New tools and languages focused on verification have emerged in the past few
years. These languages are better suited for functional verification. However, for logic
design, HDLs continue as the preferred choice.
6.3 Trends in HDLs
The speed and complexity of digital circuits have increased rapidly. Designers
have responded by designing at higher levels of abstraction. Designers have to think
only in terms of functionality. EDA tools take care of the implementation details.
With designer assistance, EDA tools have become sophisticated enough to achieve a
close to optimum implementation. The most popular trend currently is to design in
HDL at the RTL level, because logic synthesis tools can create gate-level netlists
from RTL-level designs. Behavioral synthesis allowed engineers to design directly in terms
of algorithms and the behavior of the circuit, and then use EDA tools to do the
translation and optimization in each phase of the design. However, behavioral
synthesis did not gain widespread acceptance. Today, RTL design continues to be
very popular. Verilog HDL is also being constantly enhanced to meet the needs of
new verification methodologies. Formal verification and assertion checking
techniques have emerged. Formal verification applies formal mathematical
techniques to verify the correctness of Verilog HDL descriptions and to establish
equivalency between RTL and gate-level netlists. However, the need to describe a
design in Verilog HDL will not go away. Assertion checkers allow checking to be
embedded in the RTL code. This is a convenient way to do checking in the most
important parts of a design. New verification languages have also gained rapid
acceptance. These languages combine the parallelism and hardware constructs from
HDLs with the object oriented nature of C++.
These languages also provide support for automatic stimulus creation, checking,
and coverage. However, these languages do not replace Verilog HDL. They simply
boost the productivity of the verification process. Verilog HDL is still needed to
describe the design. For very high-speed and timing-critical circuits like
microprocessors, the gate-level netlist provided by logic synthesis tools is not optimal.
In such cases, designers often mix gate-level description directly into the RTL
description to achieve optimum results. This practice runs counter to the high-level
design paradigm, yet it is frequently used for high-speed designs because designers
need to squeeze the last bit of timing out of their circuits, and EDA tools sometimes
prove insufficient to achieve the desired results. Another technique that is used for
system-level design is a mixed bottom-up methodology where the designers use either
existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks
to quickly bring up their system simulation. This is done to reduce development costs
and compress design schedules. For example, consider a system that has a CPU,
graphics chip, I/O chip, and a system bus. The CPU designers would build the next-
generation CPU themselves at an RTL level, but they would use behavioral models for
the graphics chip and the I/O chip and would buy a vendor-supplied model for the
system bus. Thus, the system-level simulation for the CPU could be up and running
very quickly and long before the RTL descriptions for the graphics chip and the I/O
chip are complete.
CHAPTER 7
7.1 INTRODUCTION OF FPGA:
A field-programmable gate array (FPGA) is an integrated circuit designed to
be configured by a customer or a designer after manufacturing, hence the name
"field-programmable". The FPGA configuration is generally specified using a hardware
description language (HDL), similar to that used for an application-specific integrated
circuit (ASIC); circuit diagrams were previously used to specify the configuration, as
they were for ASICs, but this is increasingly rare. Contemporary FPGAs have large
resources of logic gates and RAM blocks to implement complex digital computations.
As FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a
challenge to verify correct timing of valid data within setup time and hold time. Floor
planning enables resource allocation within the FPGA to meet these time constraints.
FPGAs can be used to implement any logical function that an ASIC could perform.
The ability to update the functionality after shipping, partial re-configuration of a
portion of the design and the low non-recurring engineering costs relative to an ASIC
design (notwithstanding the generally higher unit cost), offer advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like many (changeable) logic gates that can be inter-wired in
(many) different configurations. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most
FPGAs, the logic blocks also include memory elements, which may be simple flip-
flops or more complete blocks of memory.
Some FPGAs have analog features in addition to digital functions. The most
common analog feature is programmable slew rate and drive strength on each output
pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise
ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-
speed channels that would otherwise run too slow. Another relatively common analog
feature is differential comparators on input pins designed to be connected to
differential signaling channels. A few "mixed signal FPGAs" have integrated
peripheral analog-to-digital converters (ADCs) and digital-to-analog converters
(DACs) with analog signal conditioning blocks, allowing them to operate as a system-
on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and
zeros on its internal programmable interconnect fabric, and field-programmable
analog array (FPAA), which carries analog values on its internal programmable
interconnect fabric.
7.2 FPGA ARCHITECTURE:
The most common FPGA architecture consists of an array of logic blocks
(called Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on
vendor), I/O pads, and routing channels. Generally, all the routing channels have the
same width (number of wires). Multiple I/O pads may fit into the height of one row or
the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs/LABs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even among
designs with the same amount of logic. For example, a crossbar switch requires much
more routing than a systolic array with the same gate count. Since unused routing
tracks increase the cost (and decrease the performance) of the part without providing
any benefit, FPGA manufacturers try to provide just enough tracks so that most
designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is
determined by estimates such as those derived from Rent's rule or by experiments with
existing designs.
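Rent's rule, mentioned above, relates the number of external terminals T of a block to its gate count G as T = t * G^p, where t is the average number of terminals per gate and p is the Rent exponent. A quick sketch in Python; the constants chosen here are illustrative assumptions for the example, not vendor data:

```python
def rent_terminals(gates, t=4.0, p=0.6):
    """Rent's rule: T = t * G**p.

    t is the average number of terminals per gate and p the Rent
    exponent (typically around 0.5-0.75 for ordinary logic; the
    defaults here are assumptions for illustration only).
    """
    return t * gates ** p

# Terminals per gate fall as the block grows, which is why routing
# demand does not scale linearly with logic capacity:
for g in (100, 1_000, 10_000):
    print(g, round(rent_terminals(g)))
```

This sub-linear growth is one reason manufacturers can provision "just enough" routing tracks for most designs of a given LUT count.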
In general, a logic block (CLB or LAB) consists of a few logical cells (called
ALM, LE, Slice etc.). A typical cell consists of a 4-input LUT, a Full adder (FA) and a
D-type flip-flop, as shown below. The LUTs in this figure are split into two 3-input
LUTs. In normal mode, those are combined into a 4-input LUT through the left mux.
In arithmetic mode, their outputs are fed to the FA. The selection of mode is
programmed into the middle multiplexer. The output can be either synchronous or
asynchronous, depending on the programming of the mux to the right, in the figure
example. In practice, the FA, in whole or in part, is implemented as functions inside
the LUTs in order to save space.
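The cell described above can be sketched as a small behavioral model. The sketch below is an illustrative simplification of the cell in Fig 12, not any vendor's exact implementation: the functions `lut3` and `logic_cell` and their configuration encoding are assumptions made for this example.

```python
def lut3(config, a, b, c):
    """A 3-input LUT is an 8-bit truth table indexed by its inputs."""
    index = (a << 2) | (b << 1) | c
    return (config >> index) & 1

def logic_cell(cfg_lo, cfg_hi, a, b, c, d, arithmetic=False, carry_in=0):
    """Behavioral sketch of one logic cell (simplified assumption).

    Normal mode: the two 3-input LUTs act as one 4-input LUT, with
    input d selecting between them (the 'left mux' in the figure).
    Arithmetic mode: the two LUT outputs feed a full adder instead.
    Returns (output, carry_out); carry_out is None in normal mode.
    """
    lo = lut3(cfg_lo, a, b, c)
    hi = lut3(cfg_hi, a, b, c)
    if arithmetic:
        s = lo ^ hi ^ carry_in                          # full-adder sum
        carry_out = (lo & hi) | (carry_in & (lo ^ hi))  # full-adder carry
        return s, carry_out
    return (hi if d else lo), None

# Normal mode, both LUTs programmed as 3-input AND (only index 7 -> 1):
out, _ = logic_cell(0b10000000, 0b10000000, 1, 1, 1, 0)
```

Registering the output through the D flip-flop (the synchronous path selected by the right-hand mux) is omitted here to keep the model purely combinational.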
Fig 12: FPGA architecture
ALMs and Slices usually contain two or four structures similar to the example figure,
with some shared signals. CLBs/LABs typically contain a few ALMs/LEs/Slices. In
recent years, manufacturers have started moving to 6-input LUTs in their high-
performance parts, claiming increased performance. Clock signals (and often other
high-fanout signals) are normally routed via special-purpose dedicated routing
networks in commercial FPGAs, so they are managed separately from ordinary
signals.
7.3 CYCLONE II FPGA FAMILY:
Altera's Cyclone II FPGA family is designed on an all-layer copper, low-k, 1.2-
V SRAM process and is optimized for the smallest possible die size. Built on TSMC's
highly successful 90-nm process technology using 300-mm wafers, the Cyclone II
FPGA family offers higher densities, more features, exceptional performance, and the
benefits of programmable logic at ASIC prices.
7.4 Altera's Cyclone II FPGA Family Features:
1. Cost-Optimized Architecture
The Cyclone II architecture is optimized for the lowest cost and offers up to 68,416
logic elements (LEs), more than 3x the density of first-generation Cyclone FPGAs.
The logic resources in Cyclone II FPGAs can be used to implement complex
applications.
2. High Performance
Cyclone II FPGAs are 60 percent faster than competing low-cost 90-nm FPGAs,
making them the highest-performing low-cost 90-nm FPGAs on the market.
3. Low Power
Cyclone II FPGAs consume half the power of competing low-cost 90-nm FPGAs,
dramatically reducing both static and dynamic power.
4. Process Technology
Cyclone II FPGAs are manufactured on 300-mm wafers using TSMC's leading-
edge 90-nm, low-k dielectric process technology.
5. Embedded Multipliers
Cyclone II FPGAs offer up to 150 embedded 18 x 18 multipliers that are ideal for low-cost
digital signal processing (DSP) applications. These multipliers are capable of
implementing common DSP functions such as finite impulse response (FIR) filters,
fast Fourier transforms (FFTs), correlators, encoders/decoders, and numerically
controlled oscillators (NCOs).
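The FIR filters named above are a good illustration of the multiply-accumulate pattern these embedded multiplier blocks serve: each output sample is y[n] = sum over k of h[k] * x[n-k]. A minimal software sketch of a direct-form FIR filter (illustrative coefficients; this is a behavioral model, not a hardware description):

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR filter: each output is a sum of
    coefficient * delayed-sample products -- exactly the
    multiply-accumulate work handed to embedded 18 x 18
    multiplier blocks in hardware implementations."""
    taps = [0] * len(coeffs)
    out = []
    for x in samples:
        taps = [x] + taps[:-1]          # shift the delay line
        out.append(sum(h * t for h, t in zip(coeffs, taps)))
    return out

# 3-tap example with illustrative unit coefficients:
print(fir_filter([1, 1, 1], [1, 2, 3, 4]))  # [1, 3, 6, 9]
```

In an FPGA, each `h * t` product would map onto one embedded multiplier and the sum onto the LE fabric or dedicated adder chains.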
6. Fast-On Capability
Select Cyclone II FPGAs offer fast-on capability, allowing them to be
operational soon after power-up and making them ideal for automotive and other
applications where quick start-up time is essential. Cyclone II FPGAs that offer a
faster power-on reset (POR) time are designated with an "A" in the device ordering
code (EP2C5A, EP2C8A, EP2C15A, and EP2C20A).
CHAPTER 8
RESULTS
8.1 BAUGH-WOOLEY MULTIPLIER
Fig 13: Simulation result for the Baugh-Wooley multiplier
Fig 14: RTL schematic of the Baugh-Wooley multiplier
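The Baugh-Wooley scheme handles signed (two's-complement) multiplication by complementing the partial-product terms that involve exactly one sign bit and adding fixed correction constants, so that all partial products can be summed with plain unsigned adders. The sketch below is a behavioral model of the textbook 4-bit formulation, useful for checking results against the simulation; it is an assumption about the exact design, not the synthesized RTL:

```python
def baugh_wooley(a, b, n=4):
    """Baugh-Wooley signed multiply, behavioral model (illustrative).

    a, b are n-bit two's-complement integers. Partial products that
    involve exactly one sign bit are complemented, and correction 1s
    are added at bit positions n and 2n-1, so the whole array can be
    summed with unsigned adders. Returns the signed 2n-bit product.
    """
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            pp = abit[i] & bbit[j]
            if (i == n - 1) != (j == n - 1):   # exactly one sign bit
                pp ^= 1                        # complemented partial product
            total += pp << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))     # correction constants
    total &= (1 << (2 * n)) - 1                # keep 2n bits
    if total >= 1 << (2 * n - 1):              # interpret as signed
        total -= 1 << (2 * n)
    return total
```

For example, `baugh_wooley(-3, 5)` evaluates the complemented array plus corrections and recovers -15, matching ordinary signed multiplication.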
BAUGH-WOOLEY:
8.1.1 AREA (TOTAL LOGIC ELEMENTS)
8.1.2 POWER:
8.1.3 DELAY (TIMING REQUIREMENT)
8.2 VEDIC MULTIPLIER
Fig 15: Simulation result of the Vedic multiplier
Fig 16: RTL schematic of the Vedic multiplier
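The Vedic multiplier is built on the Urdhva Tiryakbhyam ("vertically and crosswise") sutra: column k of the product sums every crosswise term a_i * b_j with i + j = k, plus the carry from the previous column, so all partial products for a column are generated in parallel. A small software model of the unsigned binary case, for checking results rather than describing the synthesized RTL:

```python
def vedic_multiply(a, b, n=4):
    """Urdhva Tiryakbhyam (vertically and crosswise) multiplication.

    Column k gathers the crosswise products a_i * b_j with i + j == k;
    the column sum's low bit is that result bit, and the remaining
    bits carry into the next column. Behavioral model of the n x n
    unsigned case (an illustrative sketch, not the project's RTL).
    """
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        column = carry + sum(abit[i] & bbit[k - i]
                             for i in range(n) if 0 <= k - i < n)
        result |= (column & 1) << k       # this column's product bit
        carry = column >> 1               # propagate to the next column
    result |= carry << (2 * n - 1)        # final carry-out
    return result

print(vedic_multiply(13, 11))  # 143
```

Because every column's crosswise terms are independent, the hardware can compute them concurrently, which is the source of the delay advantage measured below.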
8.2.1 AREA (TOTAL LOGIC ELEMENTS)
8.2.2 DELAY (TIMING REQUIREMENT)
8.2.3 POWER:
                           TIME (ns)   POWER (mW)   TOTAL LOGIC ELEMENTS
BAUGH-WOOLEY MULTIPLIER     22.314       195.68             41
VEDIC MULTIPLIER            16.939       195.18             28
8.3 COMPARISON OF TWO MULTIPLIERS
8.3.1 DELAY (TIMING REQUIREMENT)
BAUGH-WOOLEY: 22.314 ns
VEDIC: 16.939 ns
8.3.2 AREA (TOTAL LOGIC ELEMENTS)
(Bar chart: total logic elements of the Baugh-Wooley and Vedic multipliers)
BAUGH-WOOLEY: 41
VEDIC: 28
8.3.3 POWER
BAUGH-WOOLEY: 195.68 mW
VEDIC: 195.18 mW
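The measured values above can be turned into relative improvements with a short calculation, using only the numbers reported in this chapter:

```python
def improvement(baseline, value):
    """Percent reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Delay, area, and power of the Vedic design vs the Baugh-Wooley design:
print(round(improvement(22.314, 16.939), 1))  # delay reduction, percent
print(round(improvement(41, 28), 1))          # area reduction, percent
print(round(improvement(195.68, 195.18), 2))  # power reduction, percent
```

This works out to roughly a 24% delay reduction and a 32% area reduction for the Vedic design, with power essentially unchanged.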
CHAPTER 9
CONCLUSION
Multipliers are among the most important components of many systems, so we
always need better multiplier designs, and in particular designs that consume less
power. Through our project we set out to determine which of the two algorithms
works best. Our project gives a clear picture of the two multipliers and their
implementation in the Altera Quartus II tool. We found that the Vedic multiplier is a
much better option than the Baugh-Wooley multiplier, and we concluded this from the
results for power consumption and total area. The total area of the Vedic multiplier is
much less than that of the Baugh-Wooley multiplier, and its power consumption is
marginally less, as our results clearly show; the smaller structure also speeds up the
calculation and makes the system faster. When the two multipliers were compared, the
Baugh-Wooley multiplier consumed more power and occupied the larger area,
because it uses a larger number of adders, which slows the system down since more
calculation has to be done. In the end we determine that the Urdhva Tiryakbhyam
algorithm works best.
CHAPTER 10
REFERENCES:
[1] S. Vaidya and D. Dandekar, "Delay-Power Performance of Multipliers in VLSI
Design," International Journal of Computer Networks & Communications (IJCNC),
Vol. 2, No. 4, July 2010.
[2] K. K. Mahapatra, "Design and Implementation of Different Multipliers Using
VHDL," Dept. of Electronics and Communication Engineering, National Institute of
Technology, Rourkela, 2007.
[3] H. S. Dhillon and A. Mitra, "A Reduced-Bit Multiplication Algorithm for Digital
Arithmetic," International Journal of Computational and Mathematical Sciences, 2:2,
2008.
[4] D. Krishnaveni (Department of TCE, A.P.S College of Engineering) and T. G.
Umarani (Department of ECE, A.P.S College of Engineering, Somanahalli), "VLSI
Implementation of Vedic Multiplier with Reduced Delay," International Journal of
Advanced Technology & Engineering Research (IJATER), National Conference on
Emerging Trends in Technology (NCET-Tech).
[5] P. Asadi and K. Navi, "A New Low Power 32 x 32-bit Multiplier," World Applied
Sciences Journal, IDOSI Publications.
[6] H. Thapliyal and H. R. Arbania, "A Novel Parallel Multiply and Accumulate
(V-MAC) Architecture Based on Ancient Indian Vedic Mathematics."
[7] C. Senthilpari, A. K. Singh and K. Diwadkar, "Low Power and High Speed 8x8
Bit Multiplier Using Non-clocked Pass Transistor Logic," IEEE, 2007.
[8] K.-S. Yeo and K. Roy, Low-Voltage, Low-Power VLSI Subsystems, McGraw-Hill.
[9] J. D. Lee, Y. J. Yoon, K. H. Lee and B.-G. Park, "Application of Dynamic Pass-
Transistor Logic to an 8-Bit Multiplier," Journal of the Korean Physical Society,
March 2001.
[10] C. F. Law, S. S. Rofail and K. S. Yeo, "A Low-Power 16 x 16-Bit Parallel
Multiplier Utilizing Pass-Transistor Logic," IEEE Journal of Solid-State Circuits,
October 1999.
[11] C. N. Marimuthu and P. Thiangaraj, "Low Power High Performance Multiplier,"
ICGST-PDCS, Volume 8, December 2008.
[12] P. Parate, "ASIC Implementation of 4-Bit Multipliers," IEEE Computer Society,
ICETET, 2008.
Books referred:
1. VHDL by B. Bhaskar.
2. S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Second
Edition, Prentice Hall PTR, February 21, 2003.