Low-power parallel multiplier with column bypassing

M.-C. Wen, S.-J. Wang and Y.-N. Lin

A low-power parallel multiplier design is proposed in which some columns of the multiplier array can be turned off whenever their outputs are known. Unlike previous designs, it keeps the original array structure and introduces no extra boundary cells. Experimental results show that it saves 10% of the power for random inputs. Greater power reduction can be achieved if the operands contain more 0s than 1s.

Introduction: Multiplication is an essential arithmetic operation in common DSP applications such as filtering and the fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, so power-efficient multipliers are very important in the design of low-power DSP systems.

CMOS is currently the dominant technology in digital VLSI. Two components contribute to the power dissipation in CMOS circuits: static dissipation, which is due to leakage current, and dynamic dissipation, which is due to switching transient current and to the charging and discharging of load capacitances. Since leakage current is usually small, dynamic dissipation is the major source of power consumption in CMOS circuits. Dynamic power is dissipated only when a CMOS gate switches from one stable state to another. Thus, power consumption can be reduced by reducing the switching activity of a logic circuit without changing its function.

Many low-power multiplier designs can be found in the literature. A straightforward approach is to design a full adder (FA) that consumes less power [1]. Power can also be reduced through structural modification; for example, rows of partial products can be ignored [2].

Parallel multiplier: Consider the multiplication of two unsigned n-bit numbers, where $A = a_{n-1} a_{n-2} \ldots a_0$ is the multiplicand and $B = b_{n-1} b_{n-2} \ldots b_0$ is the multiplier. The product $P = p_{2n-1} p_{2n-2} \ldots p_0$ can be written as

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (a_i b_j)\, 2^{i+j}$$

An array implementation, known as the Braun multiplier [3], is shown in Fig. 1. The Baugh-Wooley multiplier uses the same array structure to handle 2's complement multiplication, with some of the partial products replaced by their complements. The multiplier array consists of $(n-1)$ rows of carry-save adders (CSA), each row containing $(n-1)$ FA cells. Each FA in the CSA array has two outputs: the sum bit goes down, while the carry bit goes to the lower-left FA. An FA in the first row has only two valid inputs, and its third input bit is set to 0; it can therefore be replaced by a two-input half adder. The last row is a ripple-carry adder for carry propagation. In this Letter, we propose a low-power design for this multiplier.

Fig. 1 4 × 4 Braun multiplier

Low-power multipliers with row bypassing: A low-power multiplier design may disable the operations in some rows to save power [2]. If bit $b_j$ is 0, all partial products $a_i b_j$, $0 \le i \le n-1$, are zero, so the additions in the corresponding row in Fig. 1 can be bypassed. The row-bypassing multiplier is shown in Fig. 2. Each cell in the CSA array is augmented with three tri-state gates and two multiplexers. For example, let $b_2$ be 0 in Fig. 2. In this case, the CSA in the second row (enclosed in the circle) can be bypassed, and the outputs of the first row are fed directly to the third-row CSA. However, since the rightmost FA in the second row is disabled, it does not perform its addition, and its output is therefore incorrect. To remedy this problem, extra circuitry must be added; these elements are located in the triangular area of Fig. 2.
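The Braun array described above can be checked with a small bit-level model. The Python sketch below is a hypothetical illustration, not code from this Letter; the names `full_adder`, `partial_product_sum`, `braun_multiply` and `sum_into` are our own. It assumes the $FA_{i,j}$ indexing defined with Theorem 1 in the next section: $FA_{i,j}$, $0 \le i, j \le n-2$, adds the partial product $a_j b_{i+1}$, the sum bit of the FA directly above ($FA_{i-1,j+1}$) and the carry bit of the upper-right FA ($FA_{i-1,j}$). Partial products that have no FA of their own ($a_j b_0$ above the first row and $a_{n-1} b_{i+1}$ at the left edge) are treated as boundary sum inputs; this boundary convention is our reading of Fig. 1, not something the text states.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def partial_product_sum(a, b, n):
    """Direct evaluation of P = sum_i sum_j (a_i b_j) 2^(i+j)."""
    return sum((((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
               for i in range(n) for j in range(n))


def braun_multiply(a, b, n):
    """Bit-level model of the n x n Braun array of Fig. 1 (assumes n >= 2)."""
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    sums = [[0] * (n - 1) for _ in range(n - 1)]
    carries = [[0] * (n - 1) for _ in range(n - 1)]

    def sum_into(i, j):
        # Sum bit that row i passes down at column j. Row -1 stands for the
        # partial products a_j*b_0, and column n-1 for the boundary partial
        # products a_{n-1}*b_{i+1}; neither has an FA of its own (assumption).
        if j == n - 1:
            return abit[n - 1] & bbit[i + 1]
        if i == -1:
            return abit[j] & bbit[0]
        return sums[i][j]

    # (n-1) CSA rows of (n-1) FAs; row 0 has a zero carry input (half adder).
    for i in range(n - 1):
        for j in range(n - 1):
            carry_in = carries[i - 1][j] if i > 0 else 0
            sums[i][j], carries[i][j] = full_adder(
                abit[j] & bbit[i + 1], sum_into(i - 1, j + 1), carry_in)

    # Low product bits p_0 .. p_{n-1} leave the rightmost column.
    p = [abit[0] & bbit[0]] + [sums[i][0] for i in range(n - 1)]
    # The last row is a ripple-carry adder producing p_n .. p_{2n-1}.
    ripple = 0
    for k in range(n - 1):
        bit, ripple = full_adder(sum_into(n - 2, k + 1), carries[n - 2][k], ripple)
        p.append(bit)
    p.append(ripple)
    return sum(bit << k for k, bit in enumerate(p))
```

Comparing `braun_multiply` with `partial_product_sum` over every pair of small operands exercises both the product formula and the array wiring. In this model, a row whose multiplier bit $b_{i+1}$ is 0 receives only zero partial products, which is the situation row bypassing exploits.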
Fig. 2 4 × 4 Braun multiplier with row bypassing

Proposed method: Instead of bypassing rows of full adders, we propose a multiplier design in which columns of adders are bypassed. In this approach, the operations in a column can be disabled if the corresponding bit of the multiplicand is 0. This approach has two advantages. First, it eliminates the extra correcting circuit shown in Fig. 2. Secondly, the modified FA is simpler than that used in the row-bypassing multiplier.

Assume that we compute 1010 × 1111 in Fig. 1. It can be verified that, for the FAs in the first and third diagonals, two of the three input bits are 0: the carry bit from the upper-right FA, and the partial product $a_i b_j$ (note that $a_0 = a_2 = 0$). As a result, the output carry bit of such an FA is 0, and its output sum bit is simply equal to the third input bit, which is the sum output of the FA above it. The following theorem shows that this holds in general. Therefore, when $a_i$ is 0, the operations in the corresponding diagonal can be disabled, since all of its outputs are known.

We refer to the FAs in a diagonal in Fig. 1 as a column. Let $FA_{i,j}$ denote the full adder located in row $i$ and column $j$, $0 \le i, j \le n-2$, of the $(n-1) \times (n-1)$ array shown in Fig. 1; $FA_{0,0}$ is the adder at the upper-right corner. The following theorem justifies column bypassing.

Theorem 1: When $a_j = 0$, the outputs of every adder cell $FA_{i,j}$ in column $j$ are as follows:
1. The output carry bit is 0.
2. The output sum bit is equal to the output sum bit of $FA_{i-1,j+1}$.
When $i = 0$ or $j = n-2$, $FA_{i-1,j+1}$ lies outside the array, and its sum output stands for the partial product that enters the array at that position ($a_{j+1} b_0$ or $a_{n-1} b_i$, respectively).

Proof: We prove the theorem by induction on the row index $i$.
1. Consider row 0. In row 0, there are only two bits to be added: adder $FA_{0,j}$ computes $a_j b_1 + a_{j+1} b_0$.
If $a_j = 0$, the output carry bit must be 0, and the output sum bit is equal to $a_{j+1} b_0$.
2. Assume that the theorem holds for row $i$.
3. In row $i+1$, the inputs of $FA_{i+1,j}$ are the carry bit from $FA_{i,j}$, the sum bit from $FA_{i,j+1}$, and the partial product $a_j b_{i+2}$. Since $a_j = 0$, the partial product is 0, and by the induction hypothesis the carry bit from $FA_{i,j}$ is also 0. With two of the three inputs equal to 0, the output carry bit of $FA_{i+1,j}$ is 0 and its output sum bit is equal to the sum bit sent by $FA_{i,j+1}$.

According to Theorem 1, when $a_j = 0$, the operations in column $j$ can be ignored, and the full adders in that column can be disabled since their outputs are known.

Fig. 3 4 × 4 column-bypassing multiplier

ELECTRONICS LETTERS 12th May 2005 Vol. 41 No. 10
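Theorem 1 and the bypassing rule can be exercised with a bit-level model of the array run with and without column bypassing. The Python sketch below is a hypothetical functional model written for illustration, not the authors' circuit: the names `column_bypass_multiply`, `check_theorem1`, `sum_into` and the `bypass` flag are our own, and a disabled FA is modelled simply by forwarding the sum bit of $FA_{i-1,j+1}$ and forcing the carry to 0, as Theorem 1 permits. It assumes the $FA_{i,j}$ indexing above, with $FA_{i,j}$ adding $a_j b_{i+1}$, and says nothing about power: the count of disabled cells is only a rough proxy for saved switching activity and ignores any overhead of the bypass circuitry.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def column_bypass_multiply(a, b, n, bypass=True):
    """Bit-level model of an n x n Braun array (n >= 2) with column bypassing.

    Returns (product, sums, carries, disabled), where sums[i][j] and
    carries[i][j] are the outputs of FA_{i,j} and `disabled` counts the FA
    cells that were not evaluated because their column bit a_j is 0.
    """
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    sums = [[0] * (n - 1) for _ in range(n - 1)]
    carries = [[0] * (n - 1) for _ in range(n - 1)]

    def sum_into(i, j):
        # Sum output of FA_{i,j}; row -1 and column n-1 are the boundary
        # partial products a_j*b_0 and a_{n-1}*b_{i+1}, which have no FA.
        if j == n - 1:
            return abit[n - 1] & bbit[i + 1]
        if i == -1:
            return abit[j] & bbit[0]
        return sums[i][j]

    disabled = 0
    for i in range(n - 1):
        for j in range(n - 1):
            s_in = sum_into(i - 1, j + 1)             # from FA_{i-1,j+1}
            c_in = carries[i - 1][j] if i > 0 else 0  # from FA_{i-1,j}
            if bypass and abit[j] == 0:
                # Column j is off: Theorem 1 gives carry 0 and sum = s_in,
                # so the FA is skipped and s_in is forwarded unchanged.
                sums[i][j], carries[i][j] = s_in, 0
                disabled += 1
            else:
                sums[i][j], carries[i][j] = full_adder(
                    abit[j] & bbit[i + 1], s_in, c_in)

    # Low bits leave the rightmost column; the last row is a ripple adder.
    p = [abit[0] & bbit[0]] + [sums[i][0] for i in range(n - 1)]
    ripple = 0
    for k in range(n - 1):
        bit, ripple = full_adder(sum_into(n - 2, k + 1), carries[n - 2][k], ripple)
        p.append(bit)
    p.append(ripple)
    product = sum(bit << k for k, bit in enumerate(p))
    return product, sums, carries, disabled


def check_theorem1(a, b, n):
    """Assert both claims of Theorem 1 on the ordinary (non-bypassed) array."""
    _, sums, carries, _ = column_bypass_multiply(a, b, n, bypass=False)
    for i in range(n - 1):
        for j in range(n - 1):
            if (a >> j) & 1:
                continue  # Theorem 1 only covers columns with a_j = 0
            if j + 1 == n - 1:
                upper = ((a >> (n - 1)) & 1) & ((b >> i) & 1)  # a_{n-1} b_i
            elif i == 0:
                upper = ((a >> (j + 1)) & 1) & (b & 1)          # a_{j+1} b_0
            else:
                upper = sums[i - 1][j + 1]                      # FA_{i-1,j+1}
            assert carries[i][j] == 0, "claim 1: carry must be 0"
            assert sums[i][j] == upper, "claim 2: sum passes through"
```

Checking every operand pair for small n is cheap. For the 1010 × 1111 example from the text, `column_bypass_multiply(0b1010, 0b1111, 4)` returns the product 150 with 6 of the 9 FA cells disabled (columns 0 and 2). Only $a_0, \ldots, a_{n-2}$ control columns in this model; $a_{n-1}$ enters only through boundary partial products, so it never triggers a bypass.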