Nov. 29, 2005 ELEC6970-001 1 Power Minimization Using Voltage Reduction and Parallel Processing By Sudheer Vemula
Outline:-
- Goal of the project
- Introduction to parallel processing
- Delay of the critical path in the given circuit, a 32x32 array multiplier
- Methods to introduce parallelism in the given circuit
- Reduction in delay of the critical path due to the introduced parallelism
- Calculations estimating area and delay
- Conclusion
Goal of the Project
- To reduce the power consumption of the circuit by reducing the power supply voltage.
- Consequence: the delay of the critical path increases.
- To compensate for the increase in delay by introducing parallelism.
- To calculate the reduction in power.
Parallel Processing
Definition:- Concurrent execution of several programs, or of several blocks of a program, is known as parallel processing [1].
Types of parallelism: data parallelism and control parallelism.
- Data parallelism is the parallel execution of a single expression on data distributed over multiple processors [2].
- Control parallelism is the parallelism achieved by the simultaneous execution of multiple threads [3].
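Voltage Scaling and Delay:-
Since a transistor is a voltage-controlled current device, its resistance depends on the voltage and current. Taking each transition time as 0.5RC, with R_n and R_p approximated by V_dd / I_dsatn and V_dd / I_dsatp, the average gate delay is

\[
\text{Delay} = \frac{t_f + t_r}{2} = 0.5\,(0.5\,R_p C + 0.5\,R_n C) = \frac{C V_{dd}}{4}\left(\frac{1}{I_{dsatn}} + \frac{1}{I_{dsatp}}\right) \approx \frac{k V_{dd}}{(V_{dd} - V_t)^{\alpha}}
\]

where \alpha = 2 for low V_dd.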
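The effect of supply voltage on delay can be explored numerically. The following is a minimal sketch, assuming the form k V_dd / (V_dd - V_t)^alpha with alpha = 2; the threshold voltage of 0.7 V, the constant k = 1, and the sample supply voltages are illustrative assumptions, not values from the project.

```python
def gate_delay(vdd, vt=0.7, alpha=2.0, k=1.0):
    """Gate delay modelled as k * Vdd / (Vdd - Vt)**alpha.

    The defaults for vt and k are assumed values for illustration only.
    """
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * vdd / (vdd - vt) ** alpha


def relative_delay(vdd, vdd_ref, vt=0.7, alpha=2.0):
    """Delay at vdd divided by the delay at the reference supply vdd_ref."""
    return gate_delay(vdd, vt, alpha) / gate_delay(vdd_ref, vt, alpha)


if __name__ == "__main__":
    # Hypothetical supply voltages, compared against an assumed 5 V reference.
    for v in (5.0, 3.3, 2.5, 1.8):
        print(f"Vdd = {v:.1f} V -> delay x{relative_delay(v, 5.0):.2f}")
```

Under these assumptions the delay grows faster than linearly as the supply approaches the threshold voltage, which is the increase that the added parallelism has to absorb.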
Critical Path:-
[Figure: 4 x 4 array multiplier with inputs A0-A3 and B0-B3 and outputs Y0-Y7, showing the critical path.]
Delay of the critical path for a multiplier of order n x m = (2m + n - 2) FA delay units
Delay of the critical path for a multiplier of order 32 x 32 = 2x32 + 32 - 2 = 94 units
Approximate area of the 32 x 32 multiplier = 1024 FAs + 128 FAs (equivalent area of the AND gates) = 1152 FAs
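These two estimates are easy to reuse for the partitioned multipliers that follow. A minimal sketch: the function names are hypothetical, and the ratio of 8 AND gates per FA is an assumption inferred from the figures above (1024 partial-product cells giving 128 FAs of AND-gate area).

```python
def critical_path_delay(n, m):
    """Critical-path delay of an n x m array multiplier, in FA delay units."""
    return 2 * m + n - 2


def multiplier_area(n, m, and_gates_per_fa=8):
    """Approximate area of an n x m array multiplier, in FA equivalents.

    One FA per partial-product cell, plus the AND gates converted to FA
    equivalents. and_gates_per_fa=8 is an assumed ratio, chosen so that
    a 32 x 32 multiplier gives 1024 + 128 = 1152 FAs.
    """
    cells = n * m
    return cells + cells / and_gates_per_fa


if __name__ == "__main__":
    print(critical_path_delay(32, 32), multiplier_area(32, 32))
```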
Horizontal Partition:-
[Figure: horizontal partition. Two 32 x 16 multipliers, each with a 32-bit and a 16-bit input and a 48-bit output, feed a 32-bit adder and a 16-bit half adder (with Cout) to form the 64-bit result. A 4 x 4 example shows the array split into the rows for A0-A1 and the rows for A2-A3.]
Critical path delay for a multiplier of order 32 x 16
= (2x16 + 32 - 2) + delay of the 32-bit full adder (FA) + delay of the 16-bit half adder (HA)
= 62 + delay of the 32-bit FA + delay of the 16-bit HA
Ex.: A = 98 and B = 76
AB = (90 x 76) + (8 x 76)
= (9 x 76) x 10 + (8 x 76)
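The decimal example carries over to binary by splitting A into 16-bit halves instead of decimal digits, so the factor of 10 becomes a shift by 16 bits. A minimal sketch, assuming unsigned 32-bit operands; the function name is hypothetical and Python integers stand in for the two hardware multipliers.

```python
MASK16 = 0xFFFF


def multiply_split_a(a, b):
    """32 x 32 unsigned product from two partial products, with A split."""
    a_lo = a & MASK16            # lower 16 bits of A
    a_hi = (a >> 16) & MASK16    # upper 16 bits of A
    p_lo = a_lo * b              # first multiplier, at most 48 bits
    p_hi = a_hi * b              # second multiplier, at most 48 bits
    # The upper partial product is weighted by 2**16, like the "x 10" above.
    return (p_hi << 16) + p_lo


if __name__ == "__main__":
    print(multiply_split_a(98, 76))
```

The two partial products are independent, so they can be computed at the same time; only the final addition depends on both.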
Vertical Partition
[Figure: vertical partition. Two 16 x 32 multipliers, each with a 32-bit and a 16-bit input, feed a 32-bit full adder (with Cout) and a 16-bit half adder to form the 64-bit result. A 4 x 4 example shows the array split into two groups of B columns.]
Ex.: A=98 and B=76
AB = (98x70) + (98x6) = (98x7)x10 + (98x6)
Critical path delay for a multiplier of order 16x32
= (2x32+16-2) + Delay of the 32 bit FA+ Delay of the 16 bit HA
=78 + Delay of the 32 bit FA+ Delay of the 16 bit HA
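The way the adder and the half adder combine the two partial products can be sketched in software. This is one plausible reading of the block diagram, not a description of the actual netlist: the low half of the first product passes straight through, an n-bit adder sums the overlapping bits, and the carry-out is added to the top bits by a half-adder chain. The function name and the width parameter n are assumptions.

```python
def partitioned_multiply(a, b, n=32):
    """n x n unsigned multiply from two partial products, with B split."""
    h = n // 2
    mask_h = (1 << h) - 1
    mask_n = (1 << n) - 1

    p1 = a * (b & mask_h)           # A x lower half of B, n + h bits
    p2 = a * ((b >> h) & mask_h)    # A x upper half of B, n + h bits

    low = p1 & mask_h                      # result bits [h-1:0], no addition
    mid_sum = (p1 >> h) + (p2 & mask_n)    # n-bit full adder
    mid = mid_sum & mask_n                 # result bits [n+h-1:h]
    cout = mid_sum >> n                    # carry-out of the adder, 0 or 1
    high = (p2 >> n) + cout                # half-adder chain: add the carry

    return (high << (n + h)) | (mid << h) | low


if __name__ == "__main__":
    print(partitioned_multiply(98, 76))
```

With n=4 the same function handles small hand-checkable cases, which makes the bit alignment of the two products easy to verify.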
Delay of the 32 bit FA:-
The products and the sum are computed simultaneously, so the FA introduces a delay of only 1 unit.
The remaining delay is due to the HA.
The delay due to the 16-bit HA is approximately equal to 8 FA units.
Let A = 1010 and B = 1011:
1010 x 10 = 10100
1010 x 11 = 11110
Product1:-        1 1 1 1 0
Product2:-    1 0 1 0 0
Sum:-       0 1 1 0 1 1 1 0
Eliminating the Delay due to Half Adder:-
[Figure: two 16 x 32 multipliers with 48-bit outputs, a 32-bit full adder with Cout, a 16-bit half adder with a constant '1' input, and a 16-bit multiplexer producing the 64-bit result.]
Here a 16-bit multiplexer is introduced to eliminate the delay due to the 16-bit half adder.
The only additional delay is that of the multiplexer.
Delay of this circuit = 78 + 1 + 0.5 (approximate delay of the mux) = 79.5
Additional number of gates = 32 FAs + 16 HAs + multiplexers ~ 32 + 8 + 5 = 45 FAs
The same procedure can be applied to the circuit with horizontal partitioning.
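One way to read the multiplexer arrangement is as a carry-select scheme: the upper 16 bits plus '1' are computed ahead of time, and the carry-out of the 32-bit adder only chooses between the two candidates. The sketch below assumes that reading; the function name is hypothetical, and the partial products p1 and p2 are assumed to come from splitting B into 16-bit halves.

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1


def merge_with_mux(p1, p2):
    """Merge two 48-bit partial products into the 64-bit result.

    p1 = A x B[15:0] and p2 = A x B[31:16]; the result is p1 + (p2 << 16).
    """
    low = p1 & MASK16                       # passes straight to the output
    mid_sum = (p1 >> 16) + (p2 & MASK32)    # 32-bit full adder
    cout = mid_sum >> 32                    # its carry-out, 0 or 1

    upper = p2 >> 32                        # top 16 bits of p2
    # Half adder with a constant '1' input: does not wait for cout, so it
    # can run in parallel with the 32-bit adder.
    upper_plus_1 = (upper + 1) & MASK16
    high = upper_plus_1 if cout else upper  # 16-bit multiplexer

    return (high << 48) | ((mid_sum & MASK32) << 16) | low


if __name__ == "__main__":
    a, b = 0xFFFFFFFF, 0xFFFFFFFF
    print(hex(merge_with_mux(a * (b & MASK16), a * (b >> 16))))
```

Because both candidates for the upper 16 bits are ready before the carry arrives, the carry only has to pass through the multiplexer rather than ripple through the half-adder chain.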
[Figure: 32 x 32 multiplier built from four 16 x 16 multipliers. Each pair of 16 x 16 multipliers (32-bit outputs) feeds a 16-bit full adder with Cout and a 16-bit HA with a constant '1' input to give a 48-bit result; the two 48-bit results feed a 32-bit FA and a 16-bit HA with a constant '1' input to give the 64-bit result.]
Ex.: A = 98 and B = 76
AB = (90 x 76) + (8 x 76)
= (9 x 76) x 10 + (8 x 76)
= (9 x 7) x 100 + (9 x 6) x 10 + (8 x 7) x 10 + (8 x 6)
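The four-term expansion maps directly onto four 16 x 16 products, with each decimal shift replaced by a shift of 16 bits. A minimal sketch, assuming unsigned 32-bit operands; the function and variable names are hypothetical.

```python
MASK16 = 0xFFFF


def multiply_four_way(a, b):
    """32 x 32 unsigned product from four 16 x 16 products."""
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16

    hh = a_hi * b_hi    # corresponds to (9 x 7) in the example
    hl = a_hi * b_lo    # corresponds to (9 x 6)
    lh = a_lo * b_hi    # corresponds to (8 x 7)
    ll = a_lo * b_lo    # corresponds to (8 x 6)

    upper = (hh << 16) + hl    # A_hi x B, a 48-bit result
    lower = (lh << 16) + ll    # A_lo x B, a 48-bit result
    return (upper << 16) + lower


if __name__ == "__main__":
    print(multiply_four_way(98, 76))
```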
Delay and Area Calculations:-
Delay of the circuit = (2x16 + 16 - 2) + 1.5 + (delay due to 32-bit FA) + 1.5
Delay due to the 32-bit FA is 16 units, because the 16 LSBs of the FA are computed simultaneously with the previous stage, whereas the 16 MSBs are computed without any overlap.
Therefore, delay = 49 + 16 = 65
Area overhead = 2 x 16-bit FAs + 32-bit FA + 3 x 16-bit HAs + 3 x 16-bit multiplexers ~ 64 + 24 + 3 x 8 = 112 FAs
Percentage reduction in delay = (94 - 65) x 100 / 94 = 30.9%
Percentage increase in area = (112 / 1152) x 100 = 9.7%
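The arithmetic on this slide can be checked with a few lines of code. A minimal sketch using only the unit counts quoted above (a 16-bit HA and a 16-bit multiplexer each counted as 8 FAs); the function names are hypothetical.

```python
BASE_DELAY = 94     # 32 x 32 array multiplier, in FA delay units
BASE_AREA = 1152    # 32 x 32 array multiplier, in FA equivalents


def percent_delay_reduction(new_delay, base=BASE_DELAY):
    return (base - new_delay) * 100 / base


def percent_area_increase(overhead, base=BASE_AREA):
    return overhead * 100 / base


def two_level_partition():
    """Delay and area overhead of the circuit with four 16 x 16 multipliers."""
    multiplier = 2 * 16 + 16 - 2    # 16 x 16 array multiplier: 46 units
    fa32 = 16                       # non-overlapped 16 MSBs of the 32-bit FA
    delay = multiplier + 1.5 + fa32 + 1.5
    # 2 x 16-bit FA + 32-bit FA, 3 x 16-bit HA, 3 x 16-bit multiplexer
    overhead = (2 * 16 + 32) + 3 * 8 + 3 * 8
    return delay, overhead


if __name__ == "__main__":
    delay, overhead = two_level_partition()
    print(delay, overhead)
    print(round(percent_delay_reduction(delay), 1),
          round(percent_area_increase(overhead), 1))
```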
Circuit with improved Delay:-
[Figure: the circuit with four 16 x 16 multipliers, in which the final stage uses a 16-bit CLA and a 16-bit FA together with a 16-bit HA with a constant '1' input to produce the 64-bit result. Each pair of 16 x 16 multipliers still feeds a 16-bit full adder and a 16-bit HA to give a 48-bit result.]
Delay and Area Calculations:-
Delay of the circuit = (2x16 + 16 - 2) + 1.5 + (delay due to 16-bit CLA) + 1.5
Therefore, delay = 49 + (16 / 3.6) ~ 53.5 [4]
Area overhead = 2 x 16-bit FAs + 16-bit FA + 16-bit carry look-ahead adder (CLA) + 3 x 16-bit HAs + 3 x 16-bit multiplexers
~ 32 + 16 + 16 x (10 / 7.2) + 24 + 24 [4] = 48 + 22 + 48 = 118 FAs
Percentage reduction in delay = (94 - 53.5) x 100 / 94 = 43.08%
Percentage increase in area = (118 / 1152) x 100 = 10.24%
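The CLA terms can be reproduced with the two factors quoted from [4]. The sketch below assumes that 3.6 is the delay ratio and 10/7.2 the area ratio of a CLA relative to a ripple-carry adder of the same width; that interpretation, and the function names, are assumptions rather than statements from the source.

```python
CLA_DELAY_RATIO = 3.6        # ripple-carry delay / CLA delay (assumed meaning)
CLA_AREA_RATIO = 10 / 7.2    # CLA area / ripple-carry area (assumed meaning)


def cla_delay(bits):
    """Delay of a CLA of the given width, in FA delay units."""
    return bits / CLA_DELAY_RATIO


def cla_area(bits):
    """Area of a CLA of the given width, in FA equivalents."""
    return bits * CLA_AREA_RATIO


def cla_circuit():
    """Delay and area overhead of the circuit using a 16-bit CLA."""
    delay = (2 * 16 + 16 - 2) + 1.5 + cla_delay(16) + 1.5
    # 2 x 16-bit FA, 16-bit FA, 16-bit CLA, 3 x 16-bit HA, 3 x 16-bit mux
    overhead = 2 * 16 + 16 + cla_area(16) + 3 * 8 + 3 * 8
    return delay, overhead


if __name__ == "__main__":
    delay, overhead = cla_circuit()
    print(f"delay ~ {delay:.2f} units, overhead ~ {overhead:.2f} FAs")
```

Evaluated without intermediate rounding, the results land close to the 53.5 units and 118 FAs quoted on the slide.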
[Figure: variant of the circuit with four 16 x 16 multipliers. One pair feeds a 16-bit full adder and a 16-bit HA to give a 48-bit result; the other pair feeds a 16-bit full adder to give a 32-bit result. The final stage uses a 16-bit CLA, a 16-bit FA, a single FA (outputs S and C), and a 15-bit HA with a constant '1' input to produce the 64-bit result.]
Delay and Area Calculations:-
Delay of the circuit = (2x16 + 16 - 2) + 1.5 + (delay due to 16-bit CLA) + 1.5 + 1 (added delay due to one FA)
Therefore, delay = 49 + (16 / 3.6) + 1 ~ 54.5 [4]
Area overhead = 2 x 16-bit FAs + 16-bit FA + 16-bit carry look-ahead adder (CLA) + 16-bit HA + 1-bit FA + 15-bit HA + 3 x 16-bit multiplexers
~ 32 + 16 + 16 x (10 / 7.2) + 8 + 1 + 8.5 + 24 [4] = 48 + 22 + 41.5 = 111.5 FAs
Percentage reduction in delay = (94 - 54.5) x 100 / 94 = 42.02%
Percentage increase in area = (111.5 / 1152) x 100 = 9.7%
32x32 Multiplier with 4x4 Multipliers:-
New delay of the circuit = (2x4 + 4 - 2) + 1.5 + 1.5 + 10 (CLAs) + 3 + 4.5 (both from previous circuit values) = 30.5
New area overhead = 8 x 4-bit FAs + 8 x 4-bit HAs + 4 x 4-bit CLAs + 4 x 4-bit FAs + overhead of previous circuit = 32 + 16 + 16 x (10 / 7.2) + 16 + 111.5 ~ 198 FAs
Percentage reduction in delay = (94 - 30.5) x 100 / 94 ~ 68%
Percentage increase in area = (198 / 1152) x 100 ~ 17%
Conclusion:-
The percentage reduction in delay is much higher than the percentage increase in area. So there is a very high possibility that the final power consumed after voltage scaling is much less than the original value.
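A rough feel for the possible saving can be obtained by combining the delay model k V_dd / (V_dd - V_t)^alpha with the delay and area figures. The sketch below is illustrative only and rests on several assumptions not taken from the project: an original supply of 5 V, a threshold voltage of 0.7 V, alpha = 2, switched capacitance proportional to area, and dynamic power proportional to C V_dd^2 at unchanged throughput. It lowers the supply until the faster circuit is as slow as the original one.

```python
def delay_factor(vdd, vt, alpha):
    """Relative gate delay, proportional to Vdd / (Vdd - Vt)**alpha."""
    return vdd / (vdd - vt) ** alpha


def scaled_supply(vdd0, delay_reduction, vt=0.7, alpha=2.0):
    """Supply at which the faster circuit matches the original delay.

    delay_reduction is the fractional reduction in critical-path delay
    (for example 0.68). Solved by bisection; valid for alpha >= 1, where
    the delay factor decreases as the supply rises.
    """
    target = delay_factor(vdd0, vt, alpha) / (1.0 - delay_reduction)
    lo, hi = vt + 1e-6, vdd0
    for _ in range(100):
        mid = (lo + hi) / 2
        if delay_factor(mid, vt, alpha) > target:
            lo = mid    # too slow at this supply: move up
        else:
            hi = mid    # fast enough: try a lower supply
    return hi


def power_ratio(vdd0, delay_reduction, area_increase, vt=0.7, alpha=2.0):
    """New power / original power, assuming power ~ area * Vdd**2."""
    vdd = scaled_supply(vdd0, delay_reduction, vt, alpha)
    return (1.0 + area_increase) * (vdd / vdd0) ** 2


if __name__ == "__main__":
    # 68% delay reduction and 17% area increase from the 4 x 4 version;
    # the 5 V starting supply is an assumed value.
    print(f"Vdd ~ {scaled_supply(5.0, 0.68):.2f} V, "
          f"power ratio ~ {power_ratio(5.0, 0.68, 0.17):.2f}")
```

The result depends strongly on the assumed threshold voltage and starting supply, so it indicates a trend rather than a measured saving.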
References
[1] dspvillage.ti.com/docs/catalog/dspplatform/details.jhtml
[2] www.llnl.gov/CASC/Overture/henshaw/documentation/App/manual/node160.html
[3] books.nap.edu/html/up_to_spedd/appD.html
[4] J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, Boston, MA, 1996.