1 Combined LNS Adder/Subtractors for DCT Hardware Jie Ruan & Mark G. Arnold
Dec 21, 2015
2
Outline
Logarithmic Number System (LNS)
Discrete Cosine Transform (DCT)
Combined LNS adder/subtractor
3
LNS (Logarithmic Number System)
Represents a number by a sign bit and an exponent to a certain base b
Word format: S (sign bit) | Exponent (n-1 bits)
F (precision): number of fractional bits in the exponent
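As an illustration of the word format above, the following Python sketch encodes a real number as a sign bit plus a fixed-point exponent with F fractional bits. The function names, the choice b = 2, and the round-to-nearest step are assumptions made for illustration and are not taken from the slides; zero has no logarithm and would need a special code, which this sketch does not model.

```python
import math

def lns_encode(value, F, b=2.0):
    """Return (sign, e): the sign bit and the exponent as an integer scaled by 2**F."""
    if value == 0:
        raise ValueError("zero needs a special code in LNS")
    sign = 1 if value < 0 else 0
    # Quantize log_b|value| to F fractional bits.
    e = round(math.log(abs(value), b) * (1 << F))
    return sign, e

def lns_decode(sign, e, F, b=2.0):
    """Convert a (sign, scaled exponent) pair back to a real number."""
    magnitude = b ** (e / (1 << F))
    return -magnitude if sign else magnitude
```

For example, `lns_encode(8.0, 4)` stores the exponent 3 as the integer 48, and decoding that pair returns 8.0. Values that are not exact powers of b come back with a small relative error set by F.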
4
Properties of LNS
Large dynamic range
Multiplication, division and exponentiation are easy
Addition is not a linear operation in LNS
Adder cost grows exponentially with word length
Advantageous at low precision
5
LNS Arithmetic Units
Multiplication
• log_b(XY) = log_b X + log_b Y
• The cost is a fixed-point adder
Addition
• More complex process than multiplication
• E.g., when calculating log_b(X+Y), with x = log_b X and y = log_b Y:
1. Calculate z = x - y (corresponds to Z = X/Y)
2. Table lookup s_b(z) = log_b(1 + b^z) (corresponds to 1 + X/Y)
3. log_b(X+Y) = y + s_b(z) (corresponds to Y(1 + X/Y) = X + Y)
Subtraction
• d_b(z) = log_b|1 - b^z|
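The three operations above can be sketched in Python as follows. This is a minimal floating-point model, not the hardware: `s_b` and `d_b` are computed directly where the slides use table lookups, the base `B = 2` is an assumption, and only magnitudes of positive operands are handled (sign bits are ignored).

```python
import math

B = 2.0  # base b (assumed; the slides leave b general)

def s_b(z):
    """Addition function: log_b(1 + b**z)."""
    return math.log(1.0 + B ** z, B)

def d_b(z):
    """Subtraction function: log_b|1 - b**z|; undefined at z = 0 (X == Y)."""
    return math.log(abs(1.0 - B ** z), B)

def lns_mul(x, y):
    """log_b(X*Y): a single fixed-point addition."""
    return x + y

def lns_add(x, y):
    """log_b(X+Y) = y + s_b(x - y)."""
    z = x - y
    return y + s_b(z)

def lns_sub(x, y):
    """log_b|X-Y| = y + d_b(x - y)."""
    z = x - y
    return y + d_b(z)
```

With x = log_2 6 and y = log_2 2, `lns_add` returns approximately log_2 8 and `lns_sub` approximately log_2 4, up to floating-point rounding.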
6
LNS Multiplication and Addition
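[Figure: block diagrams of LNS multiplication and LNS addition, with x = log_b X and y = log_b Y]
LNS multiplication: one adder computes log_b(XY) = x + y
LNS addition: a subtractor forms z = x - y, z addresses the s_b(z) and d_b(z) tables, and an adder produces log_b(X+Y) = y + s_b(z) (= y + d_b(z) when Sx≠Sy)
s_b(z) = log_b(1 + 2^z), d_b(z) = log_b|1 - 2^z| (written for b = 2)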
7
Discrete Cosine Transform
F(u,v) = \frac{c(u)}{2} \frac{c(v)}{2} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
u, v = 0...7, x, y = 0...7
An important part of MPEG encoding
Two-dimensional 8x8 DCT
The 2-D DCT is usually performed as two rounds of 1-D DCT to reduce the hardware cost
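The two-round approach can be checked numerically with the Python sketch below, which computes the 8x8 2-D DCT once as 1-D DCTs over rows and then columns, and once as the direct double sum. The normalization c(0) = 1/sqrt(2), c(k) = 1 otherwise is the standard one and is assumed here because it is not legible on the slide; the function names are illustrative.

```python
import math

N = 8

def c(k):
    # Standard DCT normalization factor (assumed, not shown on the slide).
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct_1d(f):
    """8-point 1-D DCT of a sequence f."""
    return [c(u) / 2.0 * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16.0)
                             for x in range(N))
            for u in range(N)]

def dct_2d_separable(block):
    """2-D DCT as two rounds of 1-D DCT; block[x][y] -> result[u][v]."""
    rows = [dct_1d(row) for row in block]              # round 1: along y
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # round 2: along x
    return [list(r) for r in zip(*cols)]               # transpose back to [u][v]

def dct_2d_direct(block):
    """2-D DCT as the direct double sum, for comparison."""
    return [[c(u) * c(v) / 4.0 * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16.0)
                * math.cos((2 * y + 1) * v * math.pi / 16.0)
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]
```

Both functions agree to within floating-point rounding on any 8x8 block. The separable form needs 16 one-dimensional transforms per block, which is why a single 1-D DCT datapath can be reused.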
8
LNS DCT in MPEG encoding
Floating-point cost is too high for portable systems
LNS gives the same visual result as fixed-point at the same precision
LNS has a shorter word length than fixed-point numbers
At the same dynamic range and precision for MPEG-1
• Fixed-point: (12+F) bits
• LNS: (6+F) bits
9
Fast DCT algorithm
Chen's 1-D DCT algorithm (one cycle)
• Directly factorizes the DCT matrix
• 16 multiplications
• 26 additions
• Performs one 8-point 1-D DCT in one cycle
Two-cycle version by reusing hardware
• 14 adders
• 10 multipliers
• Performs one 8-point 1-D DCT in two cycles
10
Diagram of Chen’s 1-D DCT
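[Figure: signal-flow graph of Chen's 1-D DCT; minus signs mark the subtracting branches]
Inputs: f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7)
Outputs (in graph order): F(0), F(4), F(2), F(6), F(1), F(5), F(3), F(7)
Multiplier coefficients: S(1/4), C(1/4), S(1/8), C(1/8), S(1/8), -C(1/8), C(1/4), S(1/4), S(1/16), C(1/16), -S(7/16), C(7/16), S(5/16), C(5/16), -S(3/16), C(3/16)
S(m/n)=sin(mπ/n), C(m/n)=cos(mπ/n)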
11
Combined LNS adders/subtractors
Many computational units in the DCT compute both of the results below:
X+Y
X-Y
The two computations always access different tables: one the s_b(z) table, the other the d_b(z) table
The table-lookup part and some combinational parts can be shared between the two computations
12
Combined LNS adder/subtractors
x = log_b X, y = log_b Y
X+Y
1. z = x - y
2. Table lookup s_b(z) = log_b(1 + 2^z)
3. y + s_b(z)
X-Y
1. z = x - y
2. Table lookup d_b(z) = log_b|1 - 2^z|
3. y + d_b(z)
Same hardware
Same address for different tables
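The sharing described above can be sketched in Python as a unit with one subtractor and one address that reads two tables. This is an illustrative model only: the precision `F = 4`, the table range, base 2, and the function name are assumptions, and the sign bits of the results are not modeled.

```python
import math

F = 4                  # fractional bits of the exponent (assumed)
SCALE = 1 << F
Z_MAX = 8 * SCALE      # tables cover |z| <= 8 (assumed range)

# Two tables in one address space: the index is the scaled integer z.
S_TABLE = {z: round(math.log2(1 + 2 ** (z / SCALE)) * SCALE)
           for z in range(-Z_MAX, Z_MAX + 1)}
# d_b is undefined at z = 0 (X == Y), so that entry is left out.
D_TABLE = {z: round(math.log2(abs(1 - 2 ** (z / SCALE))) * SCALE)
           for z in range(-Z_MAX, Z_MAX + 1) if z != 0}

def combined_add_sub(x, y, same_sign=True):
    """x, y: log2 magnitudes scaled by 2**F.
    Returns (log2|X+Y|, log2|X-Y|) in the same scaled format."""
    z = x - y                               # one shared subtractor
    s = y + S_TABLE[z]                      # same address z ...
    d = y + D_TABLE[z]                      # ... into the other table
    # When the operand signs differ, the two tables swap roles.
    return (s, d) if same_sign else (d, s)
```

For X = 6 and Y = 2 the unit returns values within one unit in the last place of 3 and 2 (scaled by 2**F), using a single subtraction for both results.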
13
Combined LNS adder/subtractors (type 1)
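[Figure: block diagram of the type 1 combined adder/subtractor]
A subtractor forms z = x - y, and z addresses both the s_b(z) and d_b(z) tables
Output 1: log_b(X+Y) = y + s_b(z) (= y + d_b(z) when Sx≠Sy)
Output 2: log_b|X-Y| = y + d_b(z) (= y + s_b(z) when Sx≠Sy)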
15
Diagram of Chen’s 1-D DCT
[Figure: signal-flow graph of Chen's 1-D DCT; minus signs mark the subtracting branches]
Inputs: f(0), f(1), f(2), f(3), f(4), f(5), f(6), f(7)
Outputs (in graph order): F(0), F(4), F(2), F(6), F(1), F(5), F(3), F(7)
Multiplier coefficients: S(1/4), C(1/4), S(1/8), C(1/8), S(1/8), -C(1/8), C(1/4), S(1/4), S(1/16), C(1/16), -S(7/16), C(7/16), S(5/16), C(5/16), -S(3/16), C(3/16)
S(m/n)=sin(mπ/n), C(m/n)=cos(mπ/n)
Callouts on the graph repeat the coefficient groups S(1/8), C(1/8), -C(1/8), S(1/8)
16
Combined LNS adder/subtractors
Some computation units perform the computations below:
a1X+a2Y
-a2X+a1Y (a1, a2 are constants)
Example coefficients from the DCT graph: S(1/8), C(1/8), S(1/8), -C(1/8)
The two computations access different tables in an LNS adder
Share the table-lookup part
Add some extra combinational hardware
The table lookups of the two computations use different addresses
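A minimal Python sketch of such a unit follows, assuming base 2, positive X and Y, and positive constants a1 and a2. The functions `s_b` and `d_b` stand in for the shared tables and are computed directly; the function name `type2_unit` is hypothetical, and only magnitudes are returned (result signs are not modeled).

```python
import math

def s_b(z):
    """Stands in for the shared s_b table: log2(1 + 2**z)."""
    return math.log2(1.0 + 2.0 ** z)

def d_b(z):
    """Stands in for the shared d_b table: log2|1 - 2**z|."""
    return math.log2(abs(1.0 - 2.0 ** z))

def type2_unit(x, y, log_a1, log_a2):
    """x = log2 X, y = log2 Y; log_a1, log_a2 = log2 of the constants.
    Returns (log2(a1*X + a2*Y), log2|-a2*X + a1*Y|)."""
    # The constant multiplications are fixed-point additions in LNS.
    a1x, a2y = log_a1 + x, log_a2 + y
    a2x, a1y = log_a2 + x, log_a1 + y
    z1 = a1x - a2y          # address for the sum
    z2 = a2x - a1y          # a different address for the difference
    return a2y + s_b(z1), a1y + d_b(z2)
```

Unlike the X+Y / X-Y case, the two results need two subtractions because z1 and z2 differ, which is the extra combinational hardware; the table storage itself can still be shared.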
17
Combined LNS adder/subtractors (type 2)
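[Figure: block diagram of the type 2 combined adder/subtractor]
Inputs: log_b(a1X), log_b(a2Y), log_b(a2X), log_b(a1Y)
Two subtractors form the addresses z1 and z2 into the s_b(z) and d_b(z) tables
Output 1: log_b(a1X+a2Y) = y + s_b(z1) (= y + d_b(z1) when Sx≠Sy)
Output 2: log_b(-a2X+a1Y) = y + d_b(z2) (= y + s_b(z2) when Sx≠Sy)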
18
Portions of table-lookup part in LNS adders
[Figure: bar chart of area (y-axis, 0 to 3000) versus F = 2, 3, 4, 5 (x-axis); series: Combinational, Table-lookup]
19
ROM size with/without combined LNS adder/subtractors
[Figure: bar chart of ROM bits (y-axis, 0 to 200000) for the One-cycle and Two-cycle designs; series: Without, With]
20
Hardware comparison for LNS adder and LNS adder/subtractors
[Figure: chart of area (y-axis, 0 to 4000) versus F = 2, 3, 4, 5 (x-axis); series: Ordinary, Type 1, Type 2]
21
LNS adder/subtractors in Chen’s hardware
Hardware | LNS adders | Ordinary | Type 1 | Type 2
Direct inferred hardware | 26 | 0 | 10 | 3
Two-cycle version hardware | 14 | 4 | 3 | 2
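One way to read the table is that every combined unit, type 1 or type 2, delivers two results and so stands in for two LNS adders. That reading is an assumption, as is taking the first column to be the adder count of the uncombined design, but the short Python check below shows it is consistent with both rows. The function name is illustrative.

```python
def adders_covered(ordinary, type1, type2):
    """Number of LNS additions covered, assuming each combined unit
    (type 1 or type 2) replaces two ordinary LNS adders."""
    return ordinary + 2 * type1 + 2 * type2

# Rows of the table: (LNS adders, ordinary, type 1, type 2)
rows = {
    "Direct inferred hardware": (26, 0, 10, 3),
    "Two-cycle version hardware": (14, 4, 3, 2),
}

for name, (total, ordinary, type1, type2) in rows.items():
    print(name, adders_covered(ordinary, type1, type2) == total)
```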
22
Hardware comparison for Chen’s DCT algorithm at F=4
[Figure: bar chart, y-axis 0 to 140000; x-axis groups: One-cycle, Two-cycle; labels: Fixed-point, One-cycle, Two-cycle]
23
Conclusion
Combined LNS adder/subtractors give significant area savings in DCT hardware
Suitable for reducing area in portable MPEG devices
Some overhead when converting to/from fixed-point