Hindawi Publishing Corporation
VLSI Design
Volume 2010, Article ID 794891, 19 pages
doi:10.1155/2010/794891
Review Article
CORDIC Architectures: A Survey
B. Lakshmi and A. S. Dhar
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
Correspondence should be addressed to B. Lakshmi,
[email protected]
Received 6 October 2009; Accepted 10 January 2010
Academic Editor: Kiyoung Choi
Copyright 2010 B. Lakshmi and A. S. Dhar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the last decade, the CORDIC algorithm has drawn wide attention from academia and industry for various applications such as DSP, biomedical signal processing, software defined radio, neural networks, and MIMO systems, to mention just a few. It is an iterative algorithm that requires only simple shift and addition operations for the hardware realization of basic elementary functions. Since CORDIC is used as a building block in various single chip solutions, the critical aspects to be considered are high speed, low power, and low area, for achieving reasonable overall performance. In this paper, we first classify the CORDIC algorithm based on the number system and discuss its importance in the implementation of the CORDIC algorithm. Then, we present a systematic and comprehensive taxonomy of rotational CORDIC algorithms, which are subsequently discussed in depth. Special attention has been devoted to the higher radix and flat techniques proposed in the literature for reducing the latency. Finally, a detailed comparison of various algorithms is presented, which can provide first-order information to designers looking for either further improvement of performance or selection of rotational CORDIC for a specific application.
1. Introduction
The current research in the design of high speed VLSI architectures for real-time digital signal processing (DSP) algorithms has been directed by the advances in VLSI technology, which have provided designers with significant impetus for porting algorithms into architectures. Many of the algorithms used in DSP and matrix arithmetic require elementary functions such as trigonometric, inverse trigonometric, logarithm, exponential, multiplication, and division functions. The commonly used software solutions for the digital implementation of these functions are the table lookup method and polynomial expansions, which require a number of multiplications and additions/subtractions. However, digit-by-digit methods exist for the evaluation of these elementary functions, and they compute faster than software solutions.
Some of the digit-by-digit methods for the computation of the above mentioned elementary functions were described by Henry Briggs in 1624 in Arithmetica Logarithmica [1, 2]. These are iterative pseudo-division and pseudo-multiplication processes, which resemble repeated-addition multiplication and repeated-subtraction division. In 1959, Volder proposed a special purpose digital computing unit known as COordinate Rotation DIgital Computer (CORDIC) while building a real time navigational computer for use in an aircraft [3, 4]. This algorithm was initially developed for trigonometric functions, which were expressed in terms of basic plane rotations.
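The conventional method of implementing the 2D vector rotation shown in Figure 1 uses the Givens rotation transform, which is represented by the equations

$$x_{\text{out}} = x_{\text{in}}\cos\theta - y_{\text{in}}\sin\theta, \qquad y_{\text{out}} = x_{\text{in}}\sin\theta + y_{\text{in}}\cos\theta, \tag{1}$$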
where $(x_{\text{in}}, y_{\text{in}})$ and $(x_{\text{out}}, y_{\text{out}})$ are the initial and final coordinates of the vector, respectively. The hardware realization of these equations requires four multiplications, two additions/subtractions, and access to a table stored in memory for the trigonometric coefficients. The CORDIC algorithm computes the 2D rotation using iterative equations that employ only shift and add operations. The versatility of CORDIC was enhanced when Daggett developed algorithms on the same basis in 1959 to convert between binary and binary coded decimal (BCD) number representations [5]. These iterative methods were described using a decimal radix for the design of powerful small machines by Meggitt in 1962 [6]. Subsequently, in 1971, Walther [7, 8] proposed a unified algorithm to compute rotation in circular, linear, and hyperbolic coordinate systems with the same CORDIC algorithm, embedding the coordinate system as a parameter.
During the 50 years since the CORDIC algorithm was introduced, a wide variety of applications have emerged. The algorithm received increased attention after a unified approach was proposed for its implementation [7]. Thereafter, CORDIC based computing became the choice for scientific calculator applications. The HP-2152A co-processor, the HP-9100 desktop calculator, and the HP-35 calculator are a few such devices based on the CORDIC algorithm [1, 8]. A CORDIC arithmetic processor chip was designed and implemented to perform the various functions possible in the rotation and vectoring modes of the circular, linear, and hyperbolic coordinate systems [9]. Since then, the CORDIC technique has been used in many applications [10]. Examples include single chip CORDIC processors for DSP applications [11–15], linear transformations [16–21], digital filters [17, 22–24], and matrix based signal processing algorithms [25, 26]. More recently, advances in VLSI technology and the advent of EDA tools have extended the application of the CORDIC algorithm to biomedical signal processing [27], neural networks [28], software defined radio [29], and MIMO systems [30], to mention a few.
Although CORDIC may not be the fastest technique for performing these operations, it is attractive because of its potential for efficient and low cost implementation in a large class of applications. Several modifications to the CORDIC algorithm have been proposed in the literature during the last two decades. They aim to provide high performance and low cost hardware solutions for the real time computation of two dimensional vector rotation and of transcendental functions.
A new type of arithmetic operation, called fast rotations or orthonormal μ-rotations over a set of fixed angles, has been proposed [31]. These orthonormal μ-rotations are based on the idea of CORDIC and share the property that performing the rotation requires a minimal number of shift-add operations. Such fast rotation methods form a viable low cost alternative to CORDIC arithmetic for certain applications. These include FIR filter banks for image processing, the generation of spherical sample rays in 3D graphics, and the computation of eigenvalue decomposition and singular value decomposition.
We have carried out a critical study of the different architectures proposed in the literature for 2D rotational CORDIC in the circular coordinate system, as a starting point for further latency reduction or throughput improvement. In this paper, we review the architectures proposed for rotational CORDIC. Specifically, we focus on redundant unfolded architectures that employ techniques suitable for increasing throughput and reducing latency.
The rest of the paper is organized as follows:

- Section 2 presents the basics of redundant arithmetic.
- Section 3 reviews the generalized CORDIC algorithm and the radix-2 and radix-4 CORDIC algorithms.
- Section 4 discusses general architectures employed in the literature for implementing the CORDIC algorithm.
- Section 5 presents the complete taxonomy of rotational CORDIC algorithms.
- Section 6 presents the low latency nonredundant CORDIC algorithm.
- Sections 7–9 describe different redundant CORDIC algorithms, together with the architectures proposed in the literature for rotational CORDIC.
- Section 10 compares the different methods.
- Section 11 presents the conclusions.

Figure 1: Two dimensional vector rotation (a vector of length $M_{\text{in}}$ is rotated about the origin O from $(x_{\text{in}}, y_{\text{in}})$ to $(x_{\text{out}}, y_{\text{out}})$).
2. Redundant Arithmetic [32, 33]
A nonredundant radix-$r$ number system has the digit set $\{0, 1, \ldots, r-1\}$, and every number has a unique representation. To avoid carry propagation delay in addition, a redundant number system is employed. The two common redundant number systems employed in CORDIC arithmetic are the signed-digit (SD) number system [34–37] and the carry-save (CS) number system [38].

In an SD number system with radix $r$, numbers are represented with the digit set $\{-\alpha, -\alpha+1, \ldots, -1, 0, 1, \ldots, \beta\}$, where $1 \le \alpha, \beta \le r-1$. For a symmetric digit set, $\alpha = \beta$. Each digit $s$ of the SD number system is then represented as $(s^+, s^-)$ by $(p, n)$ encoding such that $s^+ - s^- = s$. In the radix-2 SD number system, numbers are represented with the digits $\{-1, 0, 1\}$. In the carry-save number system, numbers are represented with the digit set $\{0, 1, 2\}$.

In both the SD and the CS number systems, each number can be represented in multiple ways. This redundancy limits carry propagation from each stage to its immediately more significant bit position only. In both SD and CS adders, all sum bits are generated within two full adder delays, independent of the word length. Hence, redundant arithmetic can accelerate additions/subtractions, because the carries are either free or propagate only a limited distance.
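The limited carry propagation described above can be illustrated with radix-2 SD addition. The Python sketch below uses one textbook carry-free scheme, which is assumed here and not taken from [32–38]. Each position forms $p_i = x_i + y_i$. It then chooses a transfer $t_{i+1}$ and an interim digit $w_i$ by looking only at $p_i$ and $p_{i-1}$, and outputs $s_i = w_i + t_i$. The names `sd_add` and `sd_value` are hypothetical.

```python
def sd_value(digits):
    """Value of a radix-2 SD number given least significant digit first."""
    return sum(d * 2**i for i, d in enumerate(digits))


def sd_add(x, y):
    """Carry-free addition of two radix-2 SD numbers with digits in {-1, 0, 1}.

    x and y are equal-length lists, least significant digit first.
    The result has len(x) + 1 digits, all in {-1, 0, 1}.
    """
    n = len(x)
    p = [a + b for a, b in zip(x, y)]      # position sums, each in {-2, ..., 2}
    t = [0] * (n + 1)                      # t[i] = transfer into position i
    w = [0] * n                            # interim sum digits
    for i in range(n):
        # Only the next lower position is inspected: if p[i-1] >= 0, the
        # transfer it sends is 0 or +1; otherwise it is 0 or -1.
        lower_nonneg = i == 0 or p[i - 1] >= 0
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
        elif p[i] == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
    # Each sum digit depends only on positions i, i-1 and i-2, so hardware can
    # form all digits in parallel, independent of the word length.
    return [w[i] + t[i] for i in range(n)] + [t[n]]


# 3 = [1, 1, 0] and 1 = [1, 0, 0], least significant digit first
s = sd_add([1, 1, 0], [1, 0, 0])
print(s, sd_value(s))   # prints [0, 0, 1, 0] 4
```

The Python loop runs sequentially. However, no value travels more than two positions, which is the property the text attributes to SD adders.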
3. CORDIC Algorithm
The CORDIC algorithm rotates a vector $v$ on the XY-plane in the circular, linear, or hyperbolic coordinate system, depending on the function to be evaluated. Trajectories of the vector $v_i$ for successive CORDIC iterations are shown in Figure 2. This is an iterative convergence algorithm that performs a rotation using a series of specific incremental rotation angles. The angles are selected so that each iteration can be performed by shift and add operations.

The norm of a vector in these coordinate systems is defined as $\sqrt{x^2 + m y^2}$, where $m \in \{1, 0, -1\}$ represents a circular, linear, or hyperbolic coordinate system, respectively. In the circular coordinate system, the norm preserving rotation trajectory is the circle $x^2 + y^2 = 1$. Similarly, the norm preserving rotation trajectories in the hyperbolic and linear coordinate systems are $x^2 - y^2 = 1$ and $x = 1$, respectively.

The CORDIC method can be employed in two different modes, namely, the rotation mode and the vectoring mode. The rotation mode performs a general rotation by a given angle $\theta$. The vectoring mode computes the unknown angle of a vector by performing a finite number of microrotations.
3.1. Generalized CORDIC Algorithm. The generalized equations of the CORDIC algorithm for one iteration can be written as [7]

$$x_{i+1} = x_i - m\,\sigma_i\,y_i\,r^{-S_{m,i}}, \qquad y_{i+1} = y_i + \sigma_i\,x_i\,r^{-S_{m,i}}, \qquad z_{i+1} = z_i - \sigma_i\,\alpha_{m,i}, \tag{2}$$

where:

- $\sigma_i$ represents the direction of rotation, either clockwise or counterclockwise;
- $r$ is the radix of the number system;
- $m$ selects the circular ($m = 1$), linear ($m = 0$), or hyperbolic ($m = -1$) coordinate system;
- $S_{m,i}$ is the nondecreasing integer shift sequence;
- $\alpha_{m,i}$ is the elementary rotation angle.

The elementary rotation angle depends directly on $S_{m,i}$ through the relation

$$\alpha_{m,i} = \frac{1}{\sqrt{m}}\tan^{-1}\!\left(\sqrt{m}\,r^{-S_{m,i}}\right). \tag{3}$$
The shift sequence $S_{m,i}$ depends on the coordinate system and on the radix of the number system. $S_{m,i}$ affects the convergence of the algorithm, and the number of iterations $n$ affects the accuracy of the final result. Both are discussed in detail later. The value of $\sigma_i$ depends on the radix of the number system. Assuming that the vector lies in either the first or the fourth quadrant, $\sigma_i$ is determined by

$$\sigma_i = \begin{cases} \operatorname{sign}(z_i), & \text{for rotation mode},\\ -\operatorname{sign}(y_i), & \text{for vectoring mode}, \end{cases} \tag{4}$$
where $z$ and $y$ are the steering variables in rotation and vectoring mode, respectively. The microrotations are not pure rotations; each one also changes the length of the vector. To maintain a constant vector length, the results have to be scaled by the scale factor

$$K = \prod_i k_i, \qquad k_i = \sqrt{1 + m\,\sigma_i^2\,r^{-2S_{m,i}}}, \tag{5}$$

where $k_i$ denotes the elementary scaling factor of the $i$th iteration, and $K$ is the resultant scaling factor after $n$ iterations. Computing and compensating the scale factor adds computational overhead and hardware. The extent of this overhead depends on the number system employed in the CORDIC arithmetic.
With appropriate initial values of $x$, $y$, and $z$, both the rotation and the vectoring modes can compute the commonly used elementary functions [39] listed in Table 1.
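A minimal floating-point sketch of the generalized radix-2 iteration (2)–(5) is given below. It reproduces some of the entries of Table 1.

Assumptions not stated in the text:

- The radix-2 shift sequence is $S_{m,i} = i$ for $m = 1$ and $m = 0$.
- For $m = -1$, the sequence starts at 1, and shifts 4, 13, 40, … are repeated. This is the usual way of making hyperbolic CORDIC converge.
- All function names are hypothetical.

```python
import math


def shift_sequence(m, n):
    """First n radix-2 shifts S_{m,i} (assumed: 0, 1, 2, ... or hyperbolic with repeats)."""
    if m != -1:
        return list(range(n))              # circular and linear: 0, 1, 2, ...
    seq, k, repeat = [], 1, 4              # hyperbolic: 1, 2, 3, 4, 4, 5, ...
    while len(seq) < n:
        seq.append(k)
        if k == repeat and len(seq) < n:   # repeat shifts 4, 13, 40, ...
            seq.append(k)
            repeat = 3 * repeat + 1
        k += 1
    return seq


def elementary_angle(m, s):
    """alpha_{m,i} of eq. (3) for shift s."""
    t = 2.0 ** -s
    if m == 1:
        return math.atan(t)
    if m == 0:
        return t                           # limit of eq. (3) as m -> 0
    return math.atanh(t)                   # m = -1


def scale_factor(m, n):
    """K = prod sqrt(1 + m * 2**(-2 S_{m,i})), eq. (5) with sigma_i = +/-1."""
    return math.prod(math.sqrt(1 + m * 4.0 ** -s) for s in shift_sequence(m, n))


def cordic(x, y, z, m, mode, n=40):
    """Generalized radix-2 CORDIC, eqs. (2) and (4); outputs are not divided by K."""
    for s in shift_sequence(m, n):
        if mode == "rotation":
            sigma = 1 if z >= 0 else -1    # sigma_i = sign(z_i)
        else:
            sigma = -1 if y >= 0 else 1    # sigma_i = -sign(y_i)
        x, y = x - m * sigma * y * 2.0 ** -s, y + sigma * x * 2.0 ** -s
        z = z - sigma * elementary_angle(m, s)
    return x, y, z


K1 = scale_factor(1, 40)
print(cordic(1 / K1, 0.0, 0.5, 1, "rotation"))   # approx. (cos 0.5, sin 0.5, 0)
print(cordic(3.0, 4.0, 0.0, 1, "vectoring"))     # approx. (5 * K1, 0, atan(4/3))
```

Only the sign selection and the angle table change between coordinate systems. This mirrors the way $m$ is embedded as a parameter in the unified algorithm.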
3.2. CORDIC Algorithm for Circular Coordinate System. This section describes 2D plane rotation in the circular coordinate system in detail, since it is used in many applications. By realizing two dimensional vector rotation in the circular coordinate system, the CORDIC algorithm calculates trigonometric functions, the rotation of a vector, and the angle of a vector. Figure 3 shows the rotation of a vector of length $M_{\text{in}}$ by a sequence of microrotations through the elementary angles $\alpha_i$. For $m = 1$, (2) represents the iterative rotation by an angle $\alpha_i$ in the circular coordinate system and is given by

$$x_{i+1} = x_i - \sigma_i\,y_i\,r^{-i}, \qquad y_{i+1} = \sigma_i\,x_i\,r^{-i} + y_i, \qquad z_{i+1} = z_i - \sigma_i\,\alpha_i. \tag{6}$$

The values of $\alpha_i$ are chosen such that $\tan(\alpha_i) = r^{-i}$. Multiplication by the tangent term is therefore reduced to a simple shift operation.

The norm of the vector in the $(i+1)$th iteration is extended compared to that in the $i$th iteration, that is, $M_{i+1} = M_i\sqrt{1 + \tan^2\alpha_i}$. The growth in magnitude over all iterations depends on the radix of the number system and on the number of iterations, and is represented by the scale factor $K$.

The direction of each iterative rotation is determined by $z_i$ in rotation mode and by $y_i$ in vectoring mode. In both modes, the number of microrotations depends on the desired computing accuracy and can be constant for a particular computer of finite word length. The number of microrotations in turn determines the number of elementary angles. The iterative equations of the CORDIC algorithm for the radix-2 and radix-4 number systems are presented in the following sections.
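Table 1: Realization of some functions using the CORDIC algorithm.

m | Mode | Initialization | Output
1 (Circular) | Rotation | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = \theta$ | $x_n = K_m(x_{\text{in}}\cos\theta - y_{\text{in}}\sin\theta)$, $y_n = K_m(y_{\text{in}}\cos\theta + x_{\text{in}}\sin\theta)$, $z_n = 0$
1 (Circular) | Rotation | $x_0 = 1/K_m$, $y_0 = 0$, $z_0 = \theta$ | $x_n = \cos\theta$, $y_n = \sin\theta$, $z_n = 0$
1 (Circular) | Rotation | $x_0 = 1$, $y_0 = a$, $z_0 = \pi/2$ | $x_n = \sqrt{1 + a^2}$, $y_n = \sin\ldots$, $z_n = 0$
1 (Circular) | Vectoring | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = 0$ | $x_n = K_m\,\operatorname{sign}(x_0)\,(x_{\text{in}}^2 + y_{\text{in}}^2)^{1/2}$, $y_n = 0$, $z_n = \tan^{-1}(y_{\text{in}}/x_{\text{in}})$
0 (Linear) | Rotation | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = z$ | $x_n = x_{\text{in}}$, $y_n = y_{\text{in}} + x_{\text{in}}z$, $z_n = 0$
0 (Linear) | Vectoring | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = z$ | $x_n = x_{\text{in}}$, $y_n = 0$, $z_n = z + y_{\text{in}}/x_{\text{in}}$
−1 (Hyperbolic) | Rotation | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = \theta$ | $x_n = K_m(x_{\text{in}}\cosh\theta + y_{\text{in}}\sinh\theta)$, $y_n = K_m(y_{\text{in}}\cosh\theta + x_{\text{in}}\sinh\theta)$, $z_n = 0$
−1 (Hyperbolic) | Rotation | $x_0 = 1/K_m$, $y_0 = 0$, $z_0 = \theta$ | $x_n = \cosh\theta$, $y_n = \sinh\theta$, $z_n = 0$
−1 (Hyperbolic) | Rotation | $x_0 = a$, $y_0 = a$, $z_0 = \theta$ | $x_n = ae^{\theta}$, $y_n = ae^{\theta}$, $z_n = 0$
−1 (Hyperbolic) | Vectoring | $x_0 = x_{\text{in}}$, $y_0 = y_{\text{in}}$, $z_0 = \theta$ | $x_n = K_m\,\operatorname{sign}(x_0)\,(x_{\text{in}}^2 - y_{\text{in}}^2)^{1/2}$, $y_n = 0$, $z_n = \theta + \tanh^{-1}(y_{\text{in}}/x_{\text{in}})$
−1 (Hyperbolic) | Vectoring | $x_0 = a$, $y_0 = 1$, $z_0 = 0$ | $x_n = \sqrt{a^2 - 1}$, $y_n = 0$, $z_n = \coth^{-1}a$
−1 (Hyperbolic) | Vectoring | $x_0 = a + 1$, $y_0 = a - 1$, $z_0 = 0$ | $x_n = 2\sqrt{a}$, $y_n = 0$, $z_n = 0.5\ln a$
−1 (Hyperbolic) | Vectoring | $x_0 = a + b$, $y_0 = a - b$, $z_0 = 0$ | $x_n = 2\sqrt{ab}$, $y_n = 0$, $z_n = 0.5\ln(a/b)$

Figure 2: Rotation in various coordinate systems. Each panel shows successive vectors $v_0, v_1, v_2, \ldots$, with $v(i) = [x(i)\ y(i)]^T$:
(a) circular system, trajectory $x^2 + y^2 = 1$;
(b) linear system, trajectory $x = 1$;
(c) hyperbolic system, trajectory $x^2 - y^2 = 1$, with asymptotes $y = \pm x$.

Figure 3: CORDIC algorithm based 2D vector rotation:
(a) rotation of a vector of length $M_{\text{in}}$ from $(x_{\text{in}}, y_{\text{in}})$ to $(x_{\text{out}}, y_{\text{out}})$;
(b) the decomposition $\theta = \sum_{i=0}^{\infty}\alpha_i$, $\mathrm{ROT}(\theta) = \prod_{i=0}^{\infty}\mathrm{ROT}(\alpha_i)$;
(c) a single microrotation from $(X_i, Y_i)$ to $(X_{i+1}, Y_{i+1})$, which stretches the vector length from $M_i$ to $M_i\sqrt{1 + \tan^2\alpha_i}$.

3.2.1. Rotation Mode. In rotation mode, the input angle $\theta$ is decomposed into a finite number of elementary angles [3]:

$$\theta = \sigma_0\alpha_0 + \sigma_1\alpha_1 + \cdots + \sigma_{n-1}\alpha_{n-1}, \tag{7}$$

where $n$ is the number of microrotations, $\alpha_i$ is the elementary angle for the $i$th iteration, and $\sigma_i$ is the direction of the $i$th microrotation. In rotation mode, $z_0$ is the angle accumulator and is initialized with the input rotation angle. In every iteration, the direction of rotation must be chosen so as to reduce the magnitude of the residual angle in the angle accumulator. It is therefore determined by the sign of the residual angle obtained in the previous iteration. After $n$ microrotations, the coordinates of the vector are

$$x_n = K\,(x_{\text{in}}\cos\theta - y_{\text{in}}\sin\theta), \qquad y_n = K\,(x_{\text{in}}\sin\theta + y_{\text{in}}\cos\theta), \qquad z_n \approx 0. \tag{8}$$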
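The decomposition (7) can be checked numerically. The short Python sketch below makes the following assumptions:

- The elementary angles are the radix-2 ones, $\alpha_i = \tan^{-1}(2^{-i})$.
- Each direction $\sigma_i$ is taken from the sign of the current residual angle, as described above.
- The name `decompose` is hypothetical.

The sketch records the chosen directions and returns the leftover residual angle $z_n$. For angles inside the convergence range, $|z_n|$ should not exceed the last elementary angle $\alpha_{n-1}$.

```python
import math


def decompose(theta, n):
    """Greedy radix-2 decomposition theta ~ sum(sigma_i * atan(2**-i)), eq. (7)."""
    z, sigmas = theta, []
    for i in range(n):
        sigma = -1 if z < 0 else 1          # rotate toward a smaller residual angle
        sigmas.append(sigma)
        z -= sigma * math.atan(2.0 ** -i)
    return sigmas, z                        # z is the residual angle z_n


sigmas, residual = decompose(0.7, 12)
print(sigmas)
print(abs(residual) <= math.atan(2.0 ** -11))   # residual bounded by alpha_{n-1}
```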
3.2.2. Vectoring Mode. In vectoring mode, the unknown angle of a vector is determined by performing a finite number of microrotations that satisfy the relation [3]

$$\theta = \sigma_0\alpha_0 + \sigma_1\alpha_1 + \cdots + \sigma_{n-1}\alpha_{n-1}. \tag{9}$$

The vectoring mode rotates the input vector through a predetermined set of $n$ elementary angles so as to bring the $y$ coordinate of the final vector as close to zero as possible. In every iteration, the direction of rotation is therefore determined by the sign of the residual $y$ coordinate obtained in the previous iteration. After $n$ iterations, the coordinates obtained in vectoring mode are

$$x_n = K\sqrt{x_{\text{in}}^2 + y_{\text{in}}^2}, \qquad y_n \approx 0, \qquad z_n = \tan^{-1}\!\left(\frac{y_{\text{in}}}{x_{\text{in}}}\right). \tag{10}$$
3.2.3. Radix-2 CORDIC Algorithm. The iteration equations of the radix-2 CORDIC algorithm [7] in rotation mode of the circular coordinate system at the $(i+1)$th step are obtained by setting $r = 2$ in (6):

$$x_{i+1} = x_i - \sigma_i 2^{-i} y_i, \qquad y_{i+1} = \sigma_i 2^{-i} x_i + y_i, \qquad z_{i+1} = z_i - \sigma_i\alpha_i, \tag{11}$$

where $\alpha_i = \tan^{-1}(2^{-i})$ and

$$\sigma_i = \begin{cases} -1, & \text{for } z_i < 0,\\ 1, & \text{otherwise}. \end{cases} \tag{12}$$

To maintain a constant vector length, the results have to be scaled by the scale factor $K$, given by

$$K = \prod_{i=0}^{n-1}\sqrt{1 + 2^{-2i}}. \tag{13}$$
For radix-2 CORDIC, $K \approx 1.65$. The major drawback of the conventional CORDIC algorithm is its relatively high latency and low throughput. These result from the sequential nature of the iteration process, with carry propagate addition and variable shifting in every iteration. Pipelined implementations have been proposed to overcome these drawbacks [40, 41]. However, carry propagate addition remained a bottleneck for further throughput improvement.

Two major methodologies have been employed in the literature to increase the speed of CORDIC implementations. The first reduces the delay of each iteration by adopting redundant arithmetic in radix-2 CORDIC, which eliminates carry propagate addition [42]. The second reduces the number of iterations by increasing the radix employed for the implementation of the CORDIC algorithm [43].
The redundant radix-2 CORDIC [42] was proposed by employing redundant arithmetic. The directions of rotation $\sigma_i$ are selected from the set $\{-1, 0, 1\}$, in contrast to the set $\{-1, 1\}$ employed in the conventional CORDIC. Because determining the sign of a redundant number takes a long time, these $\sigma_i$ values are computed by evaluating a few of the most significant digits of $z_i$. For $\sigma_i = 0$, this redundant CORDIC algorithm performs no rotation-extension. This affects the value of the scale factor $K$ and makes it data-dependent. Therefore, $K$ has to be calculated for each microrotation, and this calculation and correction increase the computation time and hardware.
3.2.4. Redundant Radix-4 CORDIC Algorithm. As mentioned above, the speed of a CORDIC implementation can be improved by reducing the number of iterations. The iteration equations of the radix-4 CORDIC algorithm in rotation mode at the $(i+1)$th step are derived by setting $r = 4$ in (6):

$$x_{i+1} = x_i - \sigma_i 4^{-i} y_i, \qquad y_{i+1} = \sigma_i 4^{-i} x_i + y_i, \qquad w_{i+1} = w_i - \tan^{-1}(\sigma_i 4^{-i}), \tag{14}$$
where $\sigma_i \in \{-2, -1, 0, 1, 2\}$. The final $x$ and $y$ coordinates are scaled by

$$K = \prod_{i \ge 0} k_i = \prod_{i \ge 0}\left(1 + \sigma_i^2\,4^{-2i}\right)^{1/2}. \tag{15}$$
Here, the scale factor $K$ depends on the values of $\sigma_i$ and hence has to be computed in every iteration. For radix-4 CORDIC, $K$ lies in the range $(1, 2.52)$. In this CORDIC, the direction of rotation is computed from the estimated value of $w_i$ [43]. The $w$ path computes the estimated $w_i$ and evaluates a selection function to determine $\sigma_i$. This increases the iteration delay compared to radix-2. However, radix-4 CORDIC needs only half the number of iterations required by radix-2 CORDIC.
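The Python sketch below runs the radix-4 recurrence (14) in floating point and accumulates the variable scale factor (15) alongside it. It does not reproduce the redundant, estimate-based selection function of [43]. As a simplifying assumption, $\sigma_i$ is chosen greedily as the digit in $\{-2, \ldots, 2\}$ whose elementary angle $\tan^{-1}(\sigma_i 4^{-i})$ is closest to the remaining angle $w_i$. The recurrence starts at $i = 0$, and the function name is hypothetical.

```python
import math


def radix4_rotate(x, y, theta, n):
    """Radix-4 rotation, eqs. (14)-(15), with a simple greedy digit selection.

    The greedy rule is an illustrative assumption, not the selection
    function of [43]. Returns (x_n, y_n, w_n, K); x_n and y_n are unscaled.
    """
    w, K = theta, 1.0
    for i in range(n):
        t = 4.0 ** -i
        # Pick the digit whose micro-angle atan(sigma * 4**-i) is closest to w_i.
        sigma = min(range(-2, 3), key=lambda s: abs(w - math.atan(s * t)))
        x, y = x - sigma * t * y, sigma * t * x + y
        w -= math.atan(sigma * t)
        K *= math.sqrt(1 + sigma * sigma * t * t)   # k_i depends on sigma_i
    return x, y, w, K


x, y, w, K = radix4_rotate(1.0, 0.0, 1.0, 22)
print(x / K, y / K)   # close to cos(1) and sin(1)
print(K)              # data-dependent, within the range quoted above
```

Because $K$ varies with the chosen digits, the final division by $K$ cannot be replaced by a fixed constant. This is the extra cost of higher-radix CORDIC that the text describes.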
The following sections cover scale factor computation and compensation, followed by the convergence and accuracy aspects of the CORDIC algorithm.
3.2.5. Scale Factor Computation. The CORDIC rotation steps change the length of the vector in every iteration, which distorts its norm as shown in Figure 3. The change is given by (5). In nonredundant radix-2 CORDIC, $K$ is constant since $\sigma_i = \pm 1$. However, $K$ is no longer constant for nonredundant radices higher than 2 or for redundant number systems.

For radix-2, the scale factor needs to be computed only for the first $n/2$ iterations. This is because $k_i = \sqrt{1 + 2^{-2i}}$ becomes unity at $n$-bit precision for $i > n/2 + 1$.

In redundant radix-4 CORDIC [43], the scale factor (15) is not constant. However, it is sufficient to compute $K$ for the first $n/4$ iterations, since $k_i = \sqrt{1 + \sigma_i^2 4^{-2i}}$ becomes unity thereafter.
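The claim that $k_i$ becomes unity after about $n/2$ (radix-2) or $n/4$ (radix-4) iterations can be checked exactly with rational arithmetic. The Python sketch below does this under the following assumptions:

- "Unity" is taken to mean that $k_i$ differs from 1 by less than $2^{-n}$.
- The largest digit magnitude is used for each radix: 1 for radix 2 and 2 for radix 4.
- The function names are hypothetical.

The sketch also evaluates the constant radix-2 $K$ of (13).

```python
import math
from fractions import Fraction


def first_unity_index(radix, sigma_max, n):
    """Smallest i with sqrt(1 + sigma_max**2 * radix**(-2i)) < 1 + 2**-n (exact test)."""
    bound = (1 + Fraction(1, 2**n)) ** 2          # square both sides to avoid sqrt
    i = 0
    while 1 + Fraction(sigma_max**2, radix ** (2 * i)) >= bound:
        i += 1
    return i


def radix2_scale_factor(n):
    """Constant K of eq. (13) for n radix-2 iterations."""
    return math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))


print(first_unity_index(2, 1, 32))   # 16 (about n/2)
print(first_unity_index(4, 2, 32))   # 9 (about n/4)
print(radix2_scale_factor(32))       # about 1.6468
```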
3.2.6. Scale Factor Compensation. Scale factor compensation consists of scaling the final coordinates $(x_n, y_n)$ by $1/K$. The most direct method is to multiply $(x_n, y_n)$ by $1/K$, which can be realized with the CORDIC module in linear mode [7]. However, this method requires $n$ shift and add operations, which is comparable to the computational effort of the CORDIC algorithm itself.

Since $K^{-1}$ is constant for radix-2, the computational overhead can be reduced by using a CSD recoded multiplier. With CSD representation, the number of nonzero digits is reduced on average to $n/3$ [32]. Hence, multiplication with a CSD recoded multiplier takes approximately one third of the effort required with a conventional multiplier.

Scaling can also be implemented with a Wallace tree that fully parallelizes the multiplication. This approach is preferred for applications that aim for low latency at the expense of more silicon area [44].
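CSD-recoded scaling can be illustrated with the Python sketch below. It recodes a fixed-point approximation of $1/K$ into canonical signed digits using the standard non-adjacent-form recoding, and then applies the constant with one add or subtract per nonzero digit.

Assumptions:

- radix-2 $K$ is taken over 32 iterations;
- the constant is quantized to 16 fractional bits;
- the helper names are hypothetical.

```python
import math


def csd(value):
    """Canonical signed-digit (non-adjacent form) recoding of a nonnegative integer.

    Returns digits in {-1, 0, 1}, least significant first, with no two
    adjacent nonzero digits.
    """
    digits = []
    while value:
        if value & 1:
            d = 2 - (value & 3)          # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d                   # value - d is divisible by 4
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def scale_by_csd(x, digits, frac_bits):
    """x times the CSD constant: one add/subtract per nonzero digit.

    Multiplying by a power of two stands in for a hardware shift.
    """
    return sum(d * x * 2.0 ** (i - frac_bits) for i, d in enumerate(digits) if d)


FRAC_BITS = 16                                           # assumed coefficient precision
K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(32))
c = round(2**FRAC_BITS / K)                              # 1/K scaled by 2**16
digits = csd(c)
print("nonzero digits, binary:", bin(c).count("1"))
print("nonzero digits, CSD:   ", sum(d != 0 for d in digits))
print(scale_by_csd(1.0, digits, FRAC_BITS), 1 / K)       # agree to about 2**-16
```

The number of add/subtract operations equals the number of nonzero CSD digits. This is why fewer nonzero digits translate directly into a cheaper constant multiplier.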
Scaling may also be done by extending the sequence of CORDIC iterations [9, 16, 17], which avoids the additional hardware required by the direct method. Reference [44] compares several scale factor compensation techniques proposed in the literature for radix-2 CORDIC. It also presents two additional methods, the additive and the multiplicative decomposition approaches. The results show that, for CORDIC applications employing array and pipelined architectures, the additive technique offers a low latency solution and the multiplicative technique offers an area economical solution. An algorithm has also been proposed [45] that performs scale factor compensation in parallel with the CORDIC rotation, using nonredundant and redundant arithmetic. It thereby eliminates the final multiplication [3] or the additional scaling iterations [9, 16, 17].
3.2.7. Convergence. In rotation mode, the CORDIC algorithm rotates the vector so as to bring $z$ as close to zero as possible; in vectoring mode, it does the same for the $y$ coordinate of the final vector. The maximum angle by which the vector can be rotated depends on the shift sequence [7]. The expected results are obtained only if $z$ or $y$ is driven sufficiently close to zero. Moreover, $z$ or $y$ is guaranteed to reach zero only if the initial values of the vector, $(x_{\text{in}}, y_{\text{in}}, z_{\text{in}})$ or $(x_{\text{in}}, y_{\text{in}})$, lie within the permissible range. These ranges define the domain of convergence of the CORDIC algorithm.

For $n$-bit precision, the given rotation angle can be decomposed as

$$\theta = \sum_{i=0}^{n-1}\sigma_i\alpha_i + \varepsilon, \tag{16}$$
where $\varepsilon$ is an angle approximation error such that $|\varepsilon| < \alpha_{n-1}$, which is negligible in practical computation [7]. In rotation and vectoring mode, this angle approximation error can be computed as

$$\varepsilon_{\text{rotation}} = \tan^{-1}(z_n), \qquad \varepsilon_{\text{vectoring}} = \tan^{-1}\!\left(\frac{y_n}{x_n}\right). \tag{17}$$
For a given shift sequence, the magnitude of each elementary angle can be predetermined as

$$\alpha_i = \tan^{-1}\!\left(r^{-S_{m,i}}\right), \tag{18}$$

where $r$ is the radix of the number system. The direction of rotation $\sigma_i$ must be selected to drive $z$ (rotation mode) or $y$ (vectoring mode) towards zero. The range of $\sigma_i$ depends on the radix and on the digit set of the number system. Both the number of iterations and the elementary angles traversed during those iterations are predetermined. The range of $\theta$ over which the CORDIC algorithm can be used, called the domain of convergence, is therefore given by [7]

$$|\theta_{\max}| = \sum_{i=0}^{n-1}\alpha_i + \alpha_{n-1}. \tag{19}$$
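For rotation mode, the convergence range of the CORDIC algorithm can be defined as

$$|z_{\text{in}}| \le \sum_{i=0}^{n-1}\alpha_i + \alpha_{n-1}, \tag{20}$$

and for vectoring mode as

$$\left|\tan^{-1}\!\left(\frac{y_{\text{in}}}{x_{\text{in}}}\right)\right| \le \sum_{i=0}^{n-1}\alpha_i + \alpha_{n-1}. \tag{21}$$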
The expected final results cannot be obtained if the initial values $x_{\text{in}}$, $y_{\text{in}}$, and $z_{\text{in}}$ do not satisfy these convergence conditions. Preprocessing techniques can extend the range of convergence of the CORDIC algorithm from $\pi/2$ to $\pi$ [7, 27, 46].
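The Python sketch below does two things:

- It evaluates the radix-2 circular bound (19), which is about 1.743 rad (roughly 99.9°).
- It shows one simple way to handle angles up to ±π: an exact pre-rotation by ±π/2, which only swaps $x$ and $y$ and negates one of them.

This particular preprocessing step is an illustrative assumption and is not necessarily the scheme used in [7, 27, 46]. The function names are hypothetical.

```python
import math


def convergence_range(n):
    """Eq. (19): sum of the n radix-2 elementary angles plus the last one."""
    alphas = [math.atan(2.0 ** -i) for i in range(n)]
    return sum(alphas) + alphas[-1]


def cordic_rotate(x, y, theta, n=32):
    """Radix-2 circular rotation, eqs. (11)-(13), after a +/- pi/2 pre-rotation."""
    # Rotating by +/- pi/2 is exact: swap x and y and negate one of them.
    if theta > math.pi / 2:
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:
        x, y, theta = y, -x, theta + math.pi / 2
    K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))
    z = theta
    for i in range(n):
        sigma = -1 if z < 0 else 1
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    return x / K, y / K


print(convergence_range(32))         # about 1.7433 rad, i.e. larger than pi/2
print(cordic_rotate(1.0, 0.0, 3.0))  # approx. (cos 3, sin 3), beyond the basic range
```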
3.3. Accuracy. The accuracy of the CORDIC algorithm is affected by two primary sources of error: angle approximation error and rounding error. Error bounds for both sources have been derived through a detailed numerical analysis of the CORDIC algorithm [47]. These approximation and rounding error bounds are combined to yield the overall quantization error of the CORDIC computation. The overall quantization error can be kept within the target precision by using an additional $\log_2 n$ guard bits in the implementation of the CORDIC algorithm [7].
3.3.1. Angle Approximation Error. Theoretically, the rotation angle is decomposed into an infinite number of elementary angles, as shown in Figure 3. A practical implementation uses only a finite number $n$ of microrotations. Hence, the input rotation angle can only be approximated, which results in an angle approximation error

$$|\varepsilon| < \alpha_{n-1}, \tag{22}$$

where $\alpha_{n-1}$, the elementary angle of the last microrotation, bounds the residual angle after $n$ microrotations. The accuracy of the output of the $n$th iteration is therefore principally limited by the magnitude of the last rotation angle.

Figure 4: Taxonomy of CORDIC architectures: folded (bit serial, word serial) and unfolded (pipelined, parallel).
3.3.2. Rounding Error. The second type of error, called rounding error, arises because the CORDIC internal variables are truncated to the finite length of the storage elements. Scale factor compensation also contributes to this error. In binary code, truncating the intermediate results after every iteration introduces a maximum rounding error of $\log_2 n$ bits. To achieve a final accuracy of 1 bit in $n$ bits, an additional $\log_2 n$ guard bits must therefore be used in the implementation of this algorithm [7].
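The role of guard bits can be made visible with a fixed-point sketch. The Python code below is an illustration under stated assumptions and not the analysis of [47]:

- $n$ radix-2 rotation-mode iterations run on integers with $n + g$ fractional bits.
- Every right shift truncates (floors) the result, modelling a hardware shifter.
- The elementary angles and $1/K$ are rounded to the same word length.
- The function names are hypothetical.

The sketch measures the worst-case error of $\cos\theta$ and $\sin\theta$ over a grid of angles. The reader can then compare $g = 0$ with $g = \lceil\log_2 n\rceil$ and with $\alpha_{n-1}$.

```python
import math


def fixed_point_cordic(theta, n, guard):
    """Radix-2 rotation mode on integers with n + guard fractional bits.

    Right shifts floor the result, modelling truncation in hardware.
    Returns (cos, sin) estimates as floats.
    """
    frac = n + guard
    one = 1 << frac
    K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))
    angles = [round(math.atan(2.0 ** -i) * one) for i in range(n)]  # quantized alpha_i
    x, y, z = round(one / K), 0, round(theta * one)                # x0 = 1/K, y0 = 0
    for i in range(n):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - angles[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + angles[i]
    return x / one, y / one


def max_error(n, guard):
    """Worst |error| of cos and sin over angles -1.5, -1.49, ..., 1.5 rad."""
    worst = 0.0
    for k in range(-150, 151):
        t = k / 100
        c, s = fixed_point_cordic(t, n, guard)
        worst = max(worst, abs(c - math.cos(t)), abs(s - math.sin(t)))
    return worst


n = 16
g = math.ceil(math.log2(n))                  # log2(n) guard bits
print("no guard bits:  ", max_error(n, 0))
print("with guard bits:", max_error(n, g))
print("alpha_{n-1}:    ", math.atan(2.0 ** -(n - 1)))
```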
4. CORDIC Architectures
This section presents a few architectures for mapping the CORDIC algorithm into hardware. Based on how the three iterative equations (6) are realized, the architectures can be broadly classified as folded or unfolded, as shown in Figure 4.

Folded architectures map each of the difference equations of the CORDIC algorithm into hardware and time multiplex all the iterations onto a single functional unit. Folding thus provides a means of trading area for time in signal processing architectures. Folded architectures can be further divided into bit-serial and word-serial architectures, depending on whether the functional unit implements the logic for one bit or for one word of each iteration of the CORDIC algorithm.
The CORDIC algorithm has traditionally been implemented with a bit serial architecture, in which all iterations are executed in the same hardware [3]. This slows down the computation, so it is not suitable for high speed implementation.

The word serial architecture [7, 48] is an iterative CORDIC architecture obtained by realizing the iteration equations (6). In this architecture, the shifters are reconfigured in each iteration to produce the shift required by that iteration, and the elementary angles $\alpha_i$ are read from a lookup table. The dominant speed factors in the word serial iterations are carry/borrow propagate addition/subtraction and variable shifting. These make the conventional CORDIC implementation [7] too slow for high speed applications.

These drawbacks were overcome by unfolding the iteration process [41, 48], so that each processing element always performs the same iteration, as shown in Figure 5. Compared to the folded architecture, the main advantage of the unfolded pipelined architecture is high throughput. It uses hard-wired shifts instead of time and area consuming barrel shifters, and it eliminates the ROM. For $n$-bit precision, the pipelined architecture improves throughput by a factor of $n$, while the hardware grows by a factor less than $n$.
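The difference between the folded and unfolded organizations can be mimicked in software. The Python sketch below models an unfolded pipeline under the following assumptions:

- Each stage is built once, with its own hard-wired shift amount and stored elementary angle.
- Every stage has a pipeline register.
- A new input enters every clock cycle.
- Floating-point arithmetic stands in for the fixed-point datapath.
- The names `make_stage` and `run_pipeline` are hypothetical.

The model reproduces the behaviour described above: the first result appears after $n$ cycles, and one result follows every cycle after that.

```python
import math


def make_stage(i):
    """Build unfolded stage i: its shift 2**-i and angle alpha_i are fixed at build time."""
    shift = 2.0 ** -i                    # hard-wired shift (no barrel shifter)
    alpha = math.atan(2.0 ** -i)         # constant angle (no ROM lookup)

    def stage(x, y, z):
        sigma = -1 if z < 0 else 1       # rotation mode, as in eq. (12)
        return x - sigma * y * shift, y + sigma * x * shift, z - sigma * alpha

    return stage


def run_pipeline(inputs, n):
    """Cycle-level model of an n-stage pipeline; returns a list of (cycle, (x, y, z))."""
    stages = [make_stage(i) for i in range(n)]
    regs = [None] * n                    # pipeline register after each stage
    pending, outputs, cycle = list(inputs), [], 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        # All stages fire in the same cycle; update from the last stage backwards
        # so that each stage reads the value its predecessor held last cycle.
        for k in range(n - 1, 0, -1):
            prev = regs[k - 1]
            regs[k] = stages[k](*prev) if prev is not None else None
        regs[0] = stages[0](*pending.pop(0)) if pending else None
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


n = 16
K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))
for cycle, (x, y, _) in run_pipeline([(1 / K, 0.0, 0.3), (1 / K, 0.0, -0.7)], n):
    print(cycle, x, y)   # first result in cycle n, the next one a cycle later
```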
5. CORDIC Taxonomy
Over the years, the implementation of the CORDIC algorithm has evolved from conventional nonredundant to redundant arithmetic to suit the varying requirements of applications. The unfolded implementation with redundant arithmetic started the efforts to address the high latency of conventional CORDIC. Subsequently, several modifications of the redundant CORDIC algorithm have been proposed to reduce iteration delay, latency, area, and power. The evolution of the unfolded rotational CORDIC algorithms is shown in Figure 6. As this taxonomy is fairly rich, the remainder of the review presents it in a top-down manner.

Based on the number system employed, CORDIC is broadly classified as nonredundant or redundant. The major drawback of the conventional CORDIC algorithm [3, 7] was low throughput and high latency, caused by the carry propagate adder used to implement the iterative equations. This was at odds with the simplicity and novelty of the CORDIC algorithm, and it led several researchers to devise methods to increase the speed of execution. The obvious solution is to reduce the time for each iteration, the number of iterations, or both. Redundant arithmetic has been employed to reduce the time for each iteration of the conventional CORDIC. In the following sections, we analyze and present the features of different pipelined and nonpipelined unfolded implementations of the rotational CORDIC.
6. Low Latency Nonredundant Radix-2 CORDIC [49]
A significant improvement of the conventional rotational CORDIC algorithm in the circular coordinate system has been proposed in [50]. It employs a linear approximation to the rotation once the remaining angle is small. Let $\theta_r$ denote the remaining angle. It is chosen small enough that the first order Taylor series approximations $\sin\theta_r \approx \theta_r$ and $\cos\theta_r \approx 1$ can be used. An architecture implementing this algorithm with nonredundant arithmetic is presented in [49].

For the first $n/2 + 1$ microrotations, the iteration equations of this algorithm are the same as those of the conventional CORDIC algorithm (11). The $\sigma_i$ values for the first $n/3$ iterations are determined iteratively from the sign of the angle accumulator $z_i$. From iteration $n/3 + 1$ onwards, the rotation directions can be generated in parallel. This is possible because, as the iteration index increases, the circular arctangent values approach the radix-2 coefficients, as evident from the expression

$$\lim_{k \to \infty}\frac{\tan^{-1}(2^{-k})}{2^{-k}} = 1. \tag{23}$$

For iterations $(n/3 + 1) \le i \le (n/2 + 1)$, all $\sigma_i$ values are determined from the recoded representation of the remaining angle $z_{(n/3+1)}$. These $\sigma_i$ values are then used to obtain $z_{(n/2+1)}$ from $z_{(n/3+1)}$. For $i > (n/2 + 1)$, the CORDIC microrotations are replaced by a single rotation using the remaining angle $z_{(n/2+1)}$. Thus, (11) is modified as

$$x_f = x_{(n/2+2)} = k_{(n/2+1)}\left(x_{(n/2+1)} - \theta_r\,y_{(n/2+1)}\right), \qquad y_f = y_{(n/2+2)} = k_{(n/2+1)}\left(\theta_r\,x_{(n/2+1)} + y_{(n/2+1)}\right), \tag{24}$$

where $\theta_r = z_{(n/2+1)}$, $k_{(n/2+1)}$ is the scale factor in the $(n/2+1)$th iteration, and $(x_f, y_f)$ are the scaled final coordinates.
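The idea behind (24) can be tried in floating point. The Python sketch below is an illustration, not the architecture of [49]:

- It performs the first $n/2 + 1$ conventional radix-2 microrotations.
- It then replaces all remaining ones by a single rotation that uses $\sin\theta_r \approx \theta_r$ and $\cos\theta_r \approx 1$.
- It scales the result by the reciprocal of the product of the first $n/2 + 1$ elementary scale factors. This is our reading of $k_{(n/2+1)}$ and is an assumption.

For simplicity, the directions $\sigma_i$ are computed sequentially here, whereas the algorithm described above generates the later ones in parallel from the recoded remaining angle. The function name is hypothetical.

```python
import math


def low_latency_rotate(x, y, theta, n):
    """Target n bits: n//2 + 1 microrotations, then one linear-approximation step."""
    m = n // 2 + 1
    z = theta
    for i in range(m):                       # conventional iterations, eq. (11)
        sigma = -1 if z < 0 else 1
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    inv_k = 1 / math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(m))
    r = z                                    # remaining angle theta_r, |r| <= atan(2**-(m-1))
    # Single rotation with sin(r) ~ r and cos(r) ~ 1, combined with scaling, cf. eq. (24)
    return inv_k * (x - r * y), inv_k * (r * x + y)


c, s = low_latency_rotate(1.0, 0.0, 0.9, 32)
print(abs(c - math.cos(0.9)), abs(s - math.sin(0.9)))   # both well below 1e-9
```

The neglected terms are of order $\theta_r^2/2$. With $|\theta_r| \lesssim 2^{-n/2}$, they stay near $2^{-n}$, which is why stopping after about half the iterations does not cost accuracy.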
Scale Factor. The low latency nonredundant radix-2 CORDIC algorithm has a constant scale factor, since $\sigma_i \in \{-1, 1\}$. It performs the scale factor compensation concurrently with the computation of the $x$ and $y$ coordinates, using two multipliers in parallel [49]. In contrast, the algorithm of [50] requires two multiplications in series.
7. Constant Scale Factor Redundant Radix-2 CORDIC
Redundant radix-2 CORDIC methods can be classified asvariable
and constant scale factor methods based on thedependence of scale
factor on the input angle. In redundantradix-2 CORDIC [42], i {1,
0, 1} and hence scale factorK is data-dependent. Therefore, K has
to be calculated foreach microrotation. This calculation and
correction increasesthe computation time and hardware. Several
redundantCORDIC algorithms with constant scale factor are
availablein the literature [5153] to address data dependency ofthe
scale factor as shown in Figure 7. In these methods,the iterative
rotations of a point around the origin on theXY-plane are
considered (see Figure 1). The direction ofeach rotation depends on
the sign of steering variable zi,which represents the remaining
angle of rotation. Sincethe computation of the sign of redundant
number requiresmore time, estimated value of zi (zi) is used to
determinethe direction of rotation. The estimated value is
computedbased on the value of the three most significant digits
ofzi. Constant scale factor is achieved by restricting i tothe set
{1, 1}, thus facilitating a faster implementation.The constant
scale factor methods can be classified basedon the arithmetic
employed as redundant radix-2 CORDICwith signed digit arithmetic
and carry save arithmetic (seeFigure 7).
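The effect of estimating the sign from a short window of digits can be shown with a small sketch. It assumes a hypothetical signed-digit fraction stored as a Python list in which `digits[j]` ∈ {−1, 0, 1} carries weight 2^−j, and a window of three digits as described above. The discarded tail is always smaller in magnitude than 2^−2, but the estimate can still have the wrong sign when the leading window is zero or nearly cancels, which is why these methods spend extra effort on correction.

```python
def sd_value(digits):
    """Value of a signed-digit fraction: digits[j] in {-1, 0, 1} has weight 2^-j."""
    return sum(d * 2.0 ** -j for j, d in enumerate(digits))

def estimated_direction(digits, window=3):
    """Pick sigma in {-1, +1} from the `window` most significant digits only."""
    z_hat = sd_value(digits[:window])
    return 1 if z_hat >= 0 else -1

z = [0, 0, 0, -1, -1, -1]                       # true value is negative
print(sd_value(z), estimated_direction(z))      # -0.21875 1
```

In the example the three leading digits are all zero, so the estimate selects σ = +1 although the exact remaining angle is negative.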
Figure 5: Unfolded pipelined CORDIC architecture.
Scale Factor. The scale factor need not be computed for the implementation of any of the constant scale factor techniques discussed in this section. In these methods, no specific scale factor compensation technique is considered; a suitable compensation technique can be chosen depending on the application.
7.1. Constant Scale Factor Redundant CORDIC Using SD Arithmetic. The redundant radix-2 CORDIC using SD arithmetic can be further classified based on the technique employed to achieve a constant scale factor (see Figure 7). These methods are implemented using the basic CORDIC iteration recurrences (11) with the necessary transformations.
7.1.1. Double Rotation Method [51]. The double rotation method performs two rotation-extensions for each elementary angle during the first n/2 iterations, for n-bit precision, to achieve a constant scale factor independent of the operand. One rotation-extension is performed for every elementary angle in the iterations beyond n/2. A negative rotation is performed by two negative subrotations, and a positive rotation by two positive subrotations. A nonrotation is performed by one negative and one positive subrotation. Hence, 50% additional iterations are required compared to the redundant CORDIC [42] (the first n/2 iterations are doubled, giving 1.5n in total).
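Why the double rotation keeps the scale factor constant can be checked numerically. In this sketch, the function name and the shift parameter `j` are illustrative assumptions, not taken from [51]; each redundant digit σ ∈ {−1, 0, 1} is mapped to a pair of subrotations as described above. Whichever digit is chosen, the vector length is multiplied by the same factor 1 + 2^−2j, and a nonrotation leaves the angle unchanged.

```python
import math

# Subrotation pairs for each redundant digit, as described for the double rotation method.
SUBROTATIONS = {-1: (-1, -1), 0: (-1, +1), +1: (+1, +1)}

def double_rotation_step(x, y, sigma, j):
    """Apply the two subrotations encoding digit sigma, each with tangent 2^-j."""
    t = 2.0 ** -j
    for s in SUBROTATIONS[sigma]:
        x, y = x - s * t * y, y + s * t * x
    return x, y

for sigma in (-1, 0, 1):
    x, y = double_rotation_step(0.6, 0.3, sigma, 3)
    gain = math.hypot(x, y) / math.hypot(0.6, 0.3)
    print(sigma, round(gain, 6))     # same gain 1 + 2^-6 = 1.015625 for every sigma
```

For σ = 0 the two subrotations cancel in angle but not in length, which is exactly what makes the gain independent of the operand.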
7.1.2. Correcting Rotation Method [51]. This is another method proposed to achieve a constant scale factor for the computation of sine and cosine functions. This method avoids the rotation corresponding to σi = 0 and performs one rotation-extension in every iteration depending on the estimated value of zi. Further, extra rotation-extensions are performed at fixed intervals to correct the error introduced by avoiding σi = 0 and to ensure convergence. If b fractional bits are used to estimate zi, the interval between correcting iterations should be less than or equal to (b − 2) [54]. This method also requires 50% additional iterations if three or four most significant digits are used for sign estimation. The increase in latency of rotational CORDIC due to these double rotation and correcting iteration methods is reduced using the branching algorithm [52].
Figure 6: Taxonomy of CORDIC algorithms.

Figure 7: Taxonomy of constant scale factor redundant radix-2 CORDIC methods (signed digit arithmetic: double rotation (1991), correcting rotation (1991), branching (1993), double step (1998); carry-save arithmetic: high-speed bit-level pipelined (1992, 1996), low-latency (1992), DCORDIC (1996)).

7.1.3. Branching Method [52]. This method implements the CORDIC algorithm using SD arithmetic, restricting the direction of rotations σi to ±1 without the need for extra rotations. This requires two modules in parallel to perform two conventional CORDIC iterations, such that the correct result is retained at the end of each iteration. The two modules perform the rotation in the same direction if the sign of the corresponding zi can be determined. Otherwise, branching is performed by making one CORDIC module (z+) perform a rotation with σi = +1 and the other module (z−) perform a rotation with σi = −1. The direction of the next rotation is decided by the sign of zi in the module whose value is smaller. In every iteration i, the angle accumulator (z+ or z−) computes the remaining angle (z+i or z−i) to determine the direction of rotation for the next iteration. The direction of rotation is determined by examining a window of three digits of z+i or z−i.
The disadvantage of the branching method is the necessity of performing two conventional CORDIC iterations in parallel, which requires almost twofold effort in terms of implementation complexity. In addition, one of the modules is not utilized when branching does not take place. However, this method offers a faster implementation than the double and correcting rotation methods [51], since it does not require additional iterations to achieve a constant scale factor.
7.1.4. Double Step Branching Method [53]. The double step branching method enhances the performance of the branching algorithm by improving the utilization of the hardware. In the branching method, the two modules perform different computations only when branching takes place; the double step method instead determines two distinct σi values in each step, with some additional hardware. It determines the two directions of rotation by examining the six most significant digits to perform a double step. These six digits are divided into two subgroups of three digits each, and each subgroup is handled in parallel to generate the required σi using zeroing modules (z path). Although the double stepping method introduces a small hardware overhead compared to the branching method, it is better than the latter since it increases the utilization of the x/y rotator modules.
7.2. Constant Scale Factor Redundant CORDIC Using CS Arithmetic. It is worth discussing here one more classification related to constant scale factor redundant radix-2 CORDIC (see Figure 7). The implementation of redundant CORDIC with constant scale factor using signed digit arithmetic results in an increase in the chip area [51–53] and latency [51] by at least 50% compared to redundant radix-2 CORDIC [42]. The low latency CORDIC algorithm [55] and the differential CORDIC algorithm [56, 57] with constant scale factor using CS arithmetic have been proposed to reduce this overhead; their details are discussed below.
7.2.1. Low Latency Redundant CORDIC [55]. This algorithm is proposed to reduce the latency of redundant CORDIC [51] by subdividing the n iterations into different groups and using different techniques for each group. For all the iterations, if σi = ±1, the conventional iteration equations (11) are used. This method avoids σi = 0 for iterations in the range 0 ≤ i ≤ (n − 3)/4 and employs the correcting rotation method [51]. For iterations (n − 3)/4 < i ≤ (n + 1)/2, σi = 0 is considered a valid choice. Since for this group of iterations

$$k_i = \sqrt{1 + 2^{-2i}} = 1 + 2^{-2i-1}$$

holds within n-bit precision, the vector is not rotated for σi = 0. However, the length of the vector is still increased by the scale factor for that iteration, as the final coordinates are scaled assuming a constant scale factor. For the iterations i > (n + 1)/2, no correcting factor is required, as the scale factor becomes unity.
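The boundary (n − 3)/4 can be motivated with the expansion √(1 + ε) = 1 + ε/2 − ε²/8 + ⋯ for ε = 2^−2i: the shift-and-add approximation ki ≈ 1 + 2^−2i−1 is off by about 2^−(4i+3), which drops below 2^−n once 4i + 3 exceeds n. The double-precision check below (function names are illustrative, not from [55]) finds the first index where the approximation holds; for n = 16, 24, and 32 it agrees with the first integer above (n − 3)/4.

```python
import math

def shift_add_error(i):
    """|sqrt(1 + 2^-2i) - (1 + 2^-(2i+1))|, which is roughly 2^-(4i+3)."""
    return abs(math.sqrt(1 + 2.0 ** (-2 * i)) - (1 + 2.0 ** (-2 * i - 1)))

def first_shift_add_index(n):
    """Smallest i for which the approximation holds within n-bit precision."""
    i = 0
    while shift_add_error(i) >= 2.0 ** -n:
        i += 1
    return i

for n in (16, 24, 32):
    print(n, first_shift_add_index(n), (n - 3) // 4 + 1)   # 16 4 4, 24 6 6, 32 8 8
```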
7.2.2. DCORDIC [56]. In the sign estimation methods [51–53], half of the computational effort in the x/y/z data paths of rotational CORDIC is required to allow for the correction of possible errors, as the sign estimation is not entirely perfect. This problem is reduced by the high-speed bit-level pipelining technique with CS arithmetic proposed in [57]. This algorithm transforms the conventional CORDIC iteration equations (11) into partially fixed iteration equations, given by

$$z_{i+1} = |z_i| - \alpha_i,$$
$$x_{i+1} = x_i - \operatorname{sign}(z_i)\,2^{-i} y_i,$$
$$y_{i+1} = \operatorname{sign}(z_i)\,2^{-i} x_i + y_i. \quad (25)$$
It is clear from these expressions that the computation of x and y requires the actual sign of zi, while the angle accumulator requires only the absolute value of zi. The actual sign of zi (σi) can be determined by taking into account the initial sign of z0 and tracking the sign changes during the absolute value computation of zi. Similarly, all σi values are computed recursively. Later, this technique was implemented with SD arithmetic and proposed as the Differential CORDIC (DCORDIC) algorithm [56]. Since the sign calculation of the steering variable (zi) during the absolute value computation takes a long time, a most-significant-digit-first absolute value technique is employed. This technique replaces the word-level sign dependence by a bit-level dependence, reducing the overall computation time. A bit-level pipelined architecture is proposed to implement these transformed iteration sequences, thus allowing high operational speed.

Figure 8: Classification of CORDIC algorithms based on the radix (redundant CORDIC: radix 2 (1987), radix 2-4 (1996), radix 4 (1993, 1997)).
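The sign bookkeeping of (25) for DCORDIC can be sketched in floating point. Because the conventional remaining angle equals σi(|zi| − αi), the DCORDIC recursion reproduces it up to sign, and the true direction follows from σi+1 = σi · sign(zi+1) with σ0 = sign(z0). The sketch below uses hypothetical function names, double precision instead of the bit-level pipelined datapath of [56], and takes sign(0) as +1; it checks that the differential z path yields the same rotation directions as the conventional one.

```python
import math

def conventional_directions(theta, n):
    """sigma_i from the sign of the conventional remaining angle z_i."""
    z, sigmas = theta, []
    for i in range(n):
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * math.atan(2.0 ** -i)
    return sigmas

def differential_directions(theta, n):
    """sigma_i from the z recursion of (25), which keeps only |z_i| - alpha_i."""
    z = theta
    s = 1 if theta >= 0 else -1              # sigma_0 = sign(z_0)
    sigmas = []
    for i in range(n):
        sigmas.append(s)
        z = abs(z) - math.atan(2.0 ** -i)    # z_{i+1} = |z_i| - alpha_i
        s *= 1 if z >= 0 else -1             # sign change tracked by a product
    return sigmas

print(all(conventional_directions(t, 24) == differential_directions(t, 24)
          for t in (0.3, -1.2, 0.77, -0.05)))   # True
```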
8. Higher Radix Redundant CORDIC
As mentioned earlier, throughput and latency are important performance attributes in CORDIC based systems. The various radix-2 CORDIC algorithms presented so far may be used to reduce the iteration delay, thereby improving the throughput, with a constant scale factor. Higher radix CORDIC algorithms using SD arithmetic [54, 58] and CS arithmetic [43, 59] are proposed to address latency reduction. This is possible since a higher radix representation reduces the number of iterations (each radix-4 digit covers two radix-2 positions). The classification of redundant CORDIC algorithms proposed in the literature based on the radix of the number system is shown in Figure 8. The application of radix-4 rotations in the CORDIC algorithm was initially proposed in [54] to accelerate the radix-2 algorithm.
The scale factor need not be computed for the constant scale factor algorithms discussed in this section. Since no specific scale factor compensation technique is considered for these methods, a compensation technique can be chosen depending on the application.
8.1. Pipelined Radix-4 CORDIC [58]. The generalized CORDIC algorithm for any radix in the three coordinate systems, and its implementation in the rotation mode of the circular coordinate system using a radix-4 pipelined CORDIC processor, are presented in [58]. This algorithm performs two successive radix-2 microrotations with the same microrotation angle using the iteration equations

$$x_{i+1} = x_i - (\sigma_{i,1} + \sigma_{i,2})\,4^{-i} y_i - \sigma_{i,1}\sigma_{i,2}\,4^{-2i} x_i,$$
$$y_{i+1} = (\sigma_{i,1} + \sigma_{i,2})\,4^{-i} x_i + y_i - \sigma_{i,1}\sigma_{i,2}\,4^{-2i} y_i,$$
$$z_{i+1} = z_i - (\sigma_{i,1} + \sigma_{i,2})\,\alpha_i, \quad (26)$$

where σi,1 and σi,2 are two redundant radix-2 coefficients that decompose the radix-4 coefficient σi ∈ {−2, −1, 0, +1, +2}, satisfying the relation σi = σi,1 + σi,2. The value of αi is selected as α0 = 2^−1 and αi = 4^−i for 1 ≤ i ≤ n − 1. The selection function for σi is determined using the five most significant digits of the z coordinate, ensuring the convergence of this algorithm. This algorithm is designed using SD arithmetic and, for i < n/4, requires two adders/subtractors for each stage of the x/y data path, in contrast to the one adder/subtractor required in radix-2 CORDIC [42]. However, the number of additions required is reduced during the last n/4 stages, since for i ≥ n/4 the 4^−2i terms of (26) fall below the n-bit precision.
Scale Factor Computation. The scale factor K in the radix-4 CORDIC algorithm is variable, since σi takes values from the digit set {−2, −1, 0, +1, +2}. K is computed in each iteration using a combinational circuit realizing the expression

$$K = \prod_{i=0}^{n/2-1} k_i = \prod_{i=0}^{n/2-1} \left(1 + \sigma_{i,1}^2\, 4^{-2i}\right)^{1/2} \left(1 + \sigma_{i,2}^2\, 4^{-2i}\right)^{1/2}. \quad (27)$$
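To see where (26) and (27) come from, the sketch below (hypothetical function names, double precision) applies the x/y part of (26) and compares it with two successive shift-and-add radix-2 microrotations that share the tangent 4^−i. It also shows that the length gain of one radix-4 step is the factor ki of (27), which depends on the chosen digits; this is why K is variable here.

```python
import math

def radix4_xy(x, y, s1, s2, i):
    """x/y part of (26) for digits s1, s2 in {-1, 0, 1}."""
    t = 4.0 ** -i
    x_new = x - (s1 + s2) * t * y - s1 * s2 * t * t * x
    y_new = (s1 + s2) * t * x + y - s1 * s2 * t * t * y
    return x_new, y_new

def two_radix2_microrotations(x, y, s1, s2, i):
    """The same step written as two ordinary microrotations with tangent 4^-i."""
    t = 4.0 ** -i
    for s in (s1, s2):
        x, y = x - s * t * y, y + s * t * x
    return x, y

def k_i(s1, s2, i):
    """Per-iteration scale factor of (27)."""
    return math.sqrt(1 + s1 ** 2 * 4.0 ** (-2 * i)) * math.sqrt(1 + s2 ** 2 * 4.0 ** (-2 * i))

print(radix4_xy(0.7, -0.2, 1, 1, 1))
print(two_radix2_microrotations(0.7, -0.2, 1, 1, 1))   # equal up to rounding
print(k_i(0, 0, 1), k_i(1, 1, 1))                      # gain depends on the digits
```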
8.2. Redundant Radix 2-4 CORDIC [59]. The number of rotations in a redundant radix-2 CORDIC rotation unit is reduced by about 25% by expressing the direction of rotations in radix-2 and radix-4 [54]. This algorithm employs different modified CORDIC algorithms using CS arithmetic for different subsets of iterations. For the iterations 1 ≤ i ≤ (n/2 + 1), the redundant radix-4 CORDIC algorithm is employed, thus halving the number of iterations. A unified architecture is proposed for the implementation of this algorithm to operate in the rotation/vectoring mode of the circular and hyperbolic coordinate systems.
Scale Factor Computation. This algorithm achieves a constant scale factor, since the rotation corresponding to σ = 0 is avoided for i ≤ n/2 + 1. For i > n/2 + 1, the scale factor need not be computed, as ki = √(1 + 4^−2i) ≈ 1.
8.3. Radix-4 CORDIC [43]. A redundant radix-4 CORDIC algorithm is proposed using CS arithmetic to reduce the latency compared to redundant radix-2 CORDIC [42]. This algorithm (14) computes the σi values using two different techniques. For the microrotations in the range 0 ≤ i < n/6, σi is determined sequentially using the angle accumulator. For the microrotations in the range i ≥ n/6, the σi values are predicted from the remaining angle after the first n/6 microrotations [60]; since radix-4 microrotation i corresponds to radix-2 position 2i, the n/3 limit of radix-2 prediction becomes n/6 here. Thus, the complexity of the w path is n/6, compared to n in the other architectures [42–53] presented in the previous sections. For the range 0 ≤ i < n/6, the microrotations are pipelined in two stages to increase the throughput. A 32-bit pipelined architecture is proposed for the implementation of the radix-4 CORDIC algorithm using CS arithmetic.
Scale Factor Computation. The possible scale factors are precomputed and stored in a ROM. The number of possible scale factors for σi² ∈ {0, 1, 4} is 3^(n/4+1). The size of the ROM and its access time increase with n. Hence, the scale factors for some iterations are stored in the ROM, and these values are used to compute the scale factor for the remaining iterations with combinational logic. This logic is designed by realizing the first few terms of the Taylor series expansion of the scale factor. For this redundant radix-4 implementation, the number of iterations is reduced at the expense of additional hardware for computing the scale factor.

Figure 9: Taxonomy of direction prediction based CORDIC algorithms (parallel CORDIC: low-latency (1992), P-CORDIC (2002), and hybrid CORDIC with mixed and partitioned variants: flat CORDIC (2002), Para-CORDIC (2004), semiflat CORDIC (2006)).
9. Parallel CORDIC Algorithms
The CORDIC algorithms discussed so far represent the rotation angle θ using a set of elementary angles αi, called the arc tangent radix set [3],

$$\theta = \sigma_0\alpha_0 + \sigma_1\alpha_1 + \cdots + \sigma_{n-1}\alpha_{n-1}, \quad (28)$$

where αi = tan^−1(2^−i) and σi ∈ {−1, 1}, satisfying the convergence theorem [7]

$$\alpha_i - \sum_{j=i+1}^{n-1} \alpha_j < \alpha_{n-1}, \quad (29)$$

in contrast to the representation using a normal radix

$$\theta = \sigma_0 2^{0} + \sigma_1 2^{-1} + \cdots + \sigma_{n-1} 2^{-n+1}. \quad (30)$$

The direction of rotation σi for the ith iteration is determined after computing the (i − 1) iterations sequentially. It is evident from this sequential dependence of the radix system that the speed of the CORDIC algorithm can be improved by avoiding the sequential behavior in the computation of the σi values or of the x/y coordinates. The various redundant CORDIC algorithms proposed in the literature employing either one or both of these techniques are shown in Figure 9 and are discussed in the following sections.
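Condition (29) can be checked numerically for the arc tangent radix set. The sketch below (hypothetical function name, double precision) evaluates (29) for 0 ≤ i ≤ n − 2, since for i = n − 1 the two sides coincide. The arc tangent set with n = 16 satisfies it, whereas a set that shrinks too fast, such as αi = 3^−i, does not. Double precision limits this direct check to moderate n, because near i = n − 2 the two sides of (29) differ only by a term of order 2^−3n.

```python
import math

def satisfies_convergence(alphas):
    """Check (29): alpha_i - sum_{j>i} alpha_j < alpha_{n-1} for 0 <= i <= n - 2."""
    n = len(alphas)
    return all(alphas[i] - sum(alphas[i + 1:]) < alphas[n - 1] for i in range(n - 1))

atr = [math.atan(2.0 ** -i) for i in range(16)]   # arc tangent radix set
fast = [3.0 ** -i for i in range(16)]              # shrinks too fast to converge
print(satisfies_convergence(atr), satisfies_convergence(fast))   # True False
```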
9.1. Low Latency Radix-2 CORDIC [55]. The low latency parallel radix-2 CORDIC architecture presented for the rotation mode [55] predicts the σi's by eliminating the sequential dependency of the z path. In order to minimize the prediction error, directions are predicted for a group of iterations at a time rather than for all iterations together. This architecture does not allow rotation for index i = 0. Hence, the convergence range of this architecture is less than (−π/2, +π/2): without the i = 0 microrotation, the reachable angle is Σi≥1 tan^−1(2^−i) ≈ 0.958 rad. On the other hand, the requirement of redundant-to-binary conversions of intermediate results in the z path restricts the pipelined implementation of this architecture. In order to reduce the latency of this parallelizing scheme further, a termination algorithm and a Booth encoding method have been proposed.
9.2. P-CORDIC [61]. The sequential procedure in the computation of the direction of rotations of the CORDIC algorithm is eliminated by the P-CORDIC algorithm, while maintaining a constant scale factor. This algorithm precomputes the direction of microrotations before the actual CORDIC rotation starts iteratively in the x/y path. This is obtained by deriving a relation between the constructed binary representation of the direction of rotations d and the rotation angle θ [40, 62], given by

$$d = 0.5\,\theta + 0.5\,c_1 + \operatorname{sign}(\theta)\,\epsilon_0 + \epsilon, \quad (31)$$

where $c_1 = 2 - \sum_{i=0}^{n-1}\left(2^{-i} - \tan^{-1}(2^{-i})\right)$, $\epsilon = \sum_{i=1}^{n/3} \sigma_i \epsilon_i$, $\epsilon_0 = 1 - \tan^{-1}(1)$, and $\epsilon_i = 2^{-i} - \tan^{-1}(2^{-i})$. Here, ε is computed using the partial offset εi and the corresponding direction bit σi for the first n/3 iterations only, since the value of εi decreases by a factor of 8 per iteration and is negligible beyond n/3 iterations. The directions of rotation for any input angle in binary form are obtained by realizing this expression, taking a variable offset from a ROM. The unfolded architecture proposed for the implementation of this algorithm eliminates the z path and reduces the area of the implementation. This architecture achieves latency and hardware reduction over the radix-2 unfolded parallel architecture [55].
Scale Factor. The scale factor in the implementation of the P-CORDIC algorithm remains constant, as σi ∈ {−1, 1} is generated for the implementation of the x/y path. The scale factor compensation is implemented using the constant factor multiplication technique discussed in Section 3.2.6.
9.3. Hybrid CORDIC Algorithm. For an n-bit fixed point CORDIC processor in the circular coordinate system, nearly n/3 iterations must be computed sequentially. This is true for both the generation of directions and the rotation, without affecting accuracy [60]. The subsequent rotation directions for the last 2n/3 iterations can be generated in parallel, since the conventional circular ATR values approach the radix-2 coefficients progressively with increasing iteration index, that is,

$$\lim_{k\to+\infty} \frac{\tan^{-1}(2^{-k})}{2^{-k}} = 1. \quad (32)$$

This behavior is exploited by the hybrid CORDIC algorithms to speed up the conventional CORDIC rotator. These algorithms partition θ into θH and θL. The rotations by θH are performed as in the conventional CORDIC algorithm, and the iterations related to θL can be simplified as in the linear coordinate system. This approach led to the development of several parallel CORDIC algorithms [63–65]. These can be categorized broadly as mixed-hybrid CORDIC and partitioned-hybrid CORDIC algorithms. In mixed-hybrid CORDIC algorithms [65], the input angle θ and the initial coordinates (xin, yin) are used to compute the rotations for the first n/3 iterations as in the conventional CORDIC. The remaining angle after these first n/3 iterations is used for computing the directions for the last 2n/3 iterations. The implementation is designed to keep the fast timing characteristics of redundant arithmetic in the x/y path of the CORDIC processing. In the partitioned-hybrid CORDIC [63, 64], the first n/3 directions of rotation are generated using the first n/3 bits of θ, and the last 2n/3 directions of rotation are predicted using the 2n/3 least significant bits of θ.
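Why roughly n/3 iterations must stay sequential can be checked with the Taylor expansion tan^−1(x) = x − x³/3 + ⋯: the difference between the elementary angle tan^−1(2^−k) and its radix-2 weight 2^−k is about 2^−3k/3, which falls below 2^−n only once 3k exceeds about n − log2 3. The double-precision sketch below (hypothetical function names) finds the first such k; for n = 16 and n = 32 it returns 5 and 11, the same values as (n − log2 3)/3 rounded up.

```python
import math

def atr_error(k):
    """Gap between the radix-2 weight 2^-k and the elementary angle atan(2^-k)."""
    return 2.0 ** -k - math.atan(2.0 ** -k)

def first_predictable_index(n):
    """Smallest k from which atan(2^-k) can be replaced by 2^-k within 2^-n."""
    k = 0
    while atr_error(k) >= 2.0 ** -n:
        k += 1
    return k

for n in (16, 32):
    print(n, first_predictable_index(n), math.ceil((n - math.log2(3)) / 3))   # 16 5 5, then 32 11 11
```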
9.3.1. Flat CORDIC [63]. The flat CORDIC algorithm is proposed to eliminate the iterative nature of the x/y path, reducing the total computation time. This algorithm transforms the x/y recurrences (11) of the conventional CORDIC into a parallelized version by successive substitution, expressing the final vectors in terms of the initial vectors and resulting in a single equation for n-bit precision. The expressions for the final coordinates of a 16-bit sine/cosine generator are

$$x_{16} = \Big[1 - \Big\{\big(\sigma_1\sigma_2 2^{-1}2^{-2} + \sigma_1\sigma_3 2^{-1}2^{-3} + \sigma_2\sigma_3 2^{-2}2^{-3} + \cdots + \sigma_9\sigma_{10} 2^{-9}2^{-10}\big) - \big(\sigma_1\sigma_2\sigma_3\sigma_4 2^{-1}2^{-2}2^{-3}2^{-4} + \cdots + \sigma_2\sigma_3\sigma_4\sigma_5 2^{-2}2^{-3}2^{-4}2^{-5} + \cdots + \sigma_3\sigma_4\sigma_6\sigma_7 2^{-3}2^{-4}2^{-6}2^{-7}\big) + E_{CX}\Big\}\Big],$$

$$y_{16} = \Big[\sigma_1 2^{-1} + \sigma_2 2^{-2} + \cdots + \sigma_{16} 2^{-16} - \big(\sigma_1\sigma_2\sigma_3 2^{-1}2^{-2}2^{-3} + \cdots + \sigma_5\sigma_7\sigma_8 2^{-5}2^{-7}2^{-8}\big) + \big(\sigma_1\sigma_2\sigma_3\sigma_4\sigma_5 2^{-1}2^{-2}2^{-3}2^{-4}2^{-5} + \cdots + \sigma_2\sigma_3\sigma_4\sigma_5\sigma_6 2^{-2}2^{-3}2^{-4}2^{-5}2^{-6}\big) + E_{CY}\Big], \quad (33)$$

where ECX and ECY are the error compensation factors in x16 and y16, respectively. xin and yin are initialized with 1/K and 0, respectively. The 16 sign digits (σ1, σ2, ..., σ15, σ16) for 16-bit precision represent the polarity of the 16 microrotations required to achieve the target angle. These equations demonstrate the complete parallelization of the conventional CORDIC algorithm. This technique precomputes σi, which takes values from the set {−1, 1}, to achieve a constant scale factor. The σi's for the first n/3 iterations are precomputed employing a technique called the Split Decomposition Algorithm (SDA), which limits the input angle range to (0, π/4) [66]. The last 2n/3 σi's are predicted from the remaining angle after n/3 iterations. The internal word length of the architecture proposed for this technique is taken as (n + log2 n) for n-bit external accuracy [47]. It may be noted that the complete parallelization of the x/y iterations leads to an exponential increase in the number of terms to be flattened, affecting the circuit complexity. In addition, the implementation of flat CORDIC needs complex combinational hardware blocks with poor scalability.
Scale Factor. The scale factor in the implementation of the flat CORDIC algorithm is maintained constant, since σi ∈ {−1, 1}. The scale factor compensation is implemented using a multiplier designed with a CS adder tree.
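The successive substitution behind the flat CORDIC expressions can be reproduced for a short word length. Writing each microrotation of (11) as multiplication by 1 + jσi2^−i, the flattened x (y) coordinate is the sum, over all even-size (odd-size) subsets of the microrotations, of the products of their σi2^−i terms with alternating signs; this is the source of the pair, quadruple, triple, and quintuple terms in (33). The sketch below (hypothetical function names, initial vector (1, 0), no truncation or error compensation terms) checks the flattened form against the iteration. Since it enumerates all 2^n subsets, it also shows why full flattening grows exponentially with precision.

```python
from itertools import combinations
from math import isclose, prod

def iterative_xy(sigmas):
    """Apply x' = x - s*2^-i*y, y' = y + s*2^-i*x for i = 1..n, starting at (1, 0)."""
    x, y = 1.0, 0.0
    for i, s in enumerate(sigmas, start=1):
        t = s * 2.0 ** -i
        x, y = x - t * y, y + t * x
    return x, y

def flattened_xy(sigmas):
    """Same result as a single sum over all subsets of microrotations."""
    a = [s * 2.0 ** -i for i, s in enumerate(sigmas, start=1)]
    x = y = 0.0
    for k in range(len(a) + 1):
        e_k = sum(prod(c) for c in combinations(a, k))   # elementary symmetric sum
        if k % 2 == 0:
            x += (-1) ** (k // 2) * e_k       # 1 - pairs + quadruples - ...
        else:
            y += (-1) ** (k // 2) * e_k       # singles - triples + quintuples - ...
    return x, y

sigmas = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
print(iterative_xy(sigmas))
print(flattened_xy(sigmas))    # agrees up to rounding
```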
9.3.2. Para-CORDIC [64]. Para-CORDIC parallelizes the generation of the directions of rotation from the binary value of the input angle θ by employing binary to bipolar representation (BBR) and microrotation angle recoding (MAR) techniques. This algorithm computes the x/y coordinates iteratively while eliminating the iterative z path completely. The input angle θ is divided into a higher part θH and a lower part θL. The two's complement binary representation of the input angle is

$$\theta = (-d_0) + \sum_{i=1}^{l-1} d_i 2^{-i} + \sum_{i=l}^{n} d_i 2^{-i}, \quad (34)$$

where di ∈ {0, 1} and l = (n − log2 3)/3. The (l − 1) bits of the input angle are converted into BBR, and the MAR technique is employed to determine the directions of rotation σ1 to σl−1. Since tan^−1(2^−i) ≠ 2^−i, this method performs additional microrotations for every iteration depending on each positional binary weight 2^−i for i = 1, 2, ..., l − 1. The remaining angle after the first (l − 1) rotations is added to θL. The values σl to σn+1 are obtained from the BBR of the corrected θL. This method eliminates the ROM for storing the predetermined directions of rotation. However, it requires additional x/y stages for the repetition of certain microrotations and an array of adders to compute the corrected θL.
9.3.3. Semi-Flat CORDIC [65]. The iterative nature of the implementation of the conventional CORDIC algorithm is partially eliminated by the semi-flat algorithm. It is designed for the semi-parallelization of the x/y/z recurrences, to improve the speed of a rotational unfolded CORDIC without increasing the area requirements. The internal precision is taken higher than the required external precision in order to reduce the quantization error encountered in the CORDIC algorithm, as discussed in Section 3.2.6. For the first β bits of σi, the x/y recurrences are computed iteratively using the double rotation method [51], resulting in xβ−1/yβ−1. Then, xn−1/yn−1 can be expressed in terms of these xβ−1/yβ−1, if all σi's are predicted. The σi's for (nint/3 − β) bits (nint = internal precision) of the input angle are precomputed and stored in a ROM, which is addressed by (nint/3 − β) bits of the input angle. The remaining (2nint/3) σi's are predicted from the rotation angle [60]. It may be noted that neither a description nor a reference is provided for the split decomposition method employed to precompute the (nint/3 − β) σi's.

The computation time and the chip area are affected by the choice of β, as is clear from the simulation results presented in [65]. These simulation results show that the best trade-off is obtained with β = 6 and β = 8 for a 16-bit CORDIC (internal precision 22 bits) and a 32-bit CORDIC (internal precision 39 bits), respectively. After β iterations, all the terms of (xn/yn) are added using a Wallace tree, flattening the x/y path. However, this architecture has poor scalability.
Scale Factor. This algorithm achieves a constant scale factor, since σi takes values from the set {−1, 1}.
10. Comparison
We present a latency estimate comparison of the unfolded architectures available in the literature for 2D rotational CORDIC in Table 2. Latency is defined as the sum of the delays for the computation of the redundant x/y coordinates, the scale factor compensation, and the redundant to binary conversion of the final x/y coordinates. The design details of the scale factor compensation and redundant to binary conversion stages are not available in the literature for all the architectures, as discussed in the previous sections (Sections 6–9). Hence, we have compared all the CORDIC algorithms with respect to the latency required for the rotation computation, excluding the scale factor compensation and redundant to binary conversion stages. All the architectures presented in this table are implemented using redundant arithmetic, except the conventional CORDIC [3] and the low latency nonredundant CORDIC [49].
The nonpipelined and pipelined implementations of the conventional radix-2 CORDIC algorithm [3, 7] require n iterations to compute the x/y coordinates iteratively. The iteration delay depends on the fast carry propagate adder, which is the bottleneck to increasing throughput and reducing latency.
The application of redundant arithmetic [42] to the conventional CORDIC makes σi take values from the set {−1, 0, 1} instead of the set {−1, 1}. The σi values are computed iteratively, and the choice σi = 0 makes the otherwise constant scale factor variable. The variable scale factor increases the area and delay for the scale factor computation. The latency of this implementation is n·tstage, where tstage is the iteration stage delay in terms of the full adder delay tFA.
The double rotation and correcting rotation redundant CORDIC methods using SD arithmetic are proposed in [51] to reduce the cost of the scale factor computation. The nonpipelined and pipelined implementations of these methods require a latency of 1.5n·tstage to compute the final x/y coordinates iteratively. These methods achieve a constant scale factor at the cost of a 50% increase in latency compared to [42].
The low latency CORDIC algorithm [55] reduces the latency to ((9n − 3)/8)·tstage, compared to 1.5n·tstage in [51]. This algorithm computes the directions of rotation and the x/y coordinates iteratively. In addition, a nonpipelined architecture using a prediction technique is also proposed in that paper. The latency of this architecture is (n + log3 n − 1)·tstage.
The branching algorithm using signed digit arithmetic is proposed to achieve a constant scale factor. The latency of the nonpipelined and pipelined implementations of this algorithm is n·tstage. This algorithm thus avoids the 50% latency overhead of [51] when computing the final x/y coordinates iteratively. However, it requires double the hardware, as two sets of x/y/z modules are employed.
The directions of rotation computed using the sign estimation methods [51, 52, 55] may not be accurate; therefore, half of the computational effort is required for correction. The DCORDIC algorithm is proposed to determine the directions of rotation iteratively using the sign of the steering variable. However, this method requires an initial latency of n·tFA before the CORDIC rotation starts, to obtain the first direction of rotation. The signs for the remaining iterations are obtained with one full adder delay each, using a bit-level pipelined architecture with n stages. This implementation requires a latency of n·tstage + (n + 1)·tFA to compute the final x/y coordinates iteratively. In addition, this method requires 2.5n initial register rows for skewing of the input data.
All the methods presented so far reduce the latency by decreasing the iteration delay using redundant arithmetic. Since latency reduction can also be obtained by reducing the number of iterations, this has motivated the implementation of the radix-4 pipelined CORDIC processor [58], which results in a latency of (3n/4 + 1)·tstage.
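For a concrete feel of the iteration-count formulas quoted so far, the small helper below (a hypothetical name; the dictionary labels are only descriptive) evaluates them for a given n. The results are in units of tstage, and the stage delay itself differs between designs, so equal numbers do not imply equal latency in tFA.

```python
def latency_in_stages(n):
    """Iterative x/y latency, in units of t_stage, for the formulas quoted above."""
    return {
        "redundant radix-2 [42]": n,
        "double / correcting rotation [51]": 1.5 * n,
        "low latency [55]": (9 * n - 3) / 8,
        "branching [52]": n,
        "radix-4 pipelined [58]": 3 * n / 4 + 1,
    }

for name, stages in latency_in_stages(32).items():
    print(f"{name:36s} {stages:g}")
```

For n = 32 this gives 32, 48, 35.625, 32, and 25 stages, respectively.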
The mixed radix CORDIC algorithm [59] is proposed using radix-2 and radix-4 rotations for designing a pipelined processor to operate in the rotation and vectoring modes of the circular and hyperbolic coordinate systems. This pipelined architecture requires (3n/4 + 1) stages with three different stage delays (tstage): 31tNAND (1 ≤ i < n/4), 34tNAND (n/4 ≤ i ≤ (n/2 + 1)), and 36tNAND (i > (n/2 + 1)). This architecture has a larger stage delay, as it is designed for various modes of operation.
The advantage of applying radix-4 rotations for all iteration stages is exploited in [43] with fewer adders compared to [58]. For the microrotations in the range 0 ≤ i < n/6, the pipelined architecture proposed for this algorithm determines the σi values sequentially using the angle accumulator. For the microrotations in the range i ≥ n/6, the σi values are determined from the remaining angle after n/6 iterations. The latency of this architecture to compute the final x/y coordinates iteratively is (2n/3 + 2)·tstage.
In [61], the P-CORDIC algorithm is proposed to eliminate the z path completely, using a linear relation between the rotation angle and the corresponding directions of all microrotations for the rotation mode. This algorithm computes the x/y coordinates iteratively. The latency of the nonpipelined architecture proposed to implement this algorithm for n-bit precision is (n/12 + log2 n + 1.75 + 2n)·tFA.
The iterative nature of the x/y/z path is eliminated at the cost of scalability by the flat CORDIC algorithm [63]. This algorithm transforms the x/y recurrences (11) of the conventional CORDIC into a parallelized version by successive substitution, expressing the final vectors in terms of the initial vectors and resulting in a single equation for n-bit precision. The directions of rotation are precomputed before initiating the computation of the x/y coordinates. The final x/y coordinates are computed using combinational blocks with a latency of 34tFA for 16-bit and 50tFA for 32-bit precision. The expressions for the x and y variables need to be derived, and the combinational building blocks have to be redesigned, whenever the precision changes.

Table 2: Comparison of various rotational CORDIC architectures (SD: signed digit, CS: carry save). Where two latency values are given, they are for the nonpipelined and pipelined implementations, respectively.

| Method (year) | Radix | Arithmetic | Latency (tFA) | Scale factor K |
|---|---|---|---|---|
| Nonredundant (1959) [3] | 2 | 2's complement | n²; n² | Constant |
| Redundant (1987) [42] | 2 | CS | n·tstage; n·tstage | Variable |
| Double rotation / Correcting (1991) [51] | 2 | SD | 1.5n·tstage; 1.5n·tstage | Constant |
| Low latency (1992) [55] | 2 | CS | (n + log3 n − 1)·tstage; ((9n − 3)/8)·tstage | Constant |
| Branching (1993) [52] | 2 | SD | n·tstage; n·tstage | Constant |
| DCORDIC (1996) [56] | 2 | SD/CS | n·tstage + n + 1 | Constant |
| Radix-4 (1993) [58] | 4 | SD | (3n/4 + 1)·tstage | Variable |
| Radix 2-4 (1996) [59] | 2, 4 | CS | (3n/4 + 1)·tstage | Constant |
| Radix-4 (1997) [43] | 4 | CS | (2n/3 + 1)·tstage (z path: n/6) | Variable |
| P-CORDIC (2002) [61] | 2 | SD | 1.7n + 1.25 + log2 n | Constant |
| Flat CORDIC (2002) [63] | 2 | SD | 34 for 16-bit, 50 for 32-bit (combinational) | Constant |
| Para-CORDIC (2004) [64] | 2 | CS | 2(s(n) + n/2 − l + 2) + log1.5 n + 2 | Constant |
| Semi-flat (2006) [65] | 2 | SD | 33 for 16-bit (x/y path: β iterations, then combinational) | Constant |
| Nonredundant low-latency (2008) [49] | 2 | 2's complement | (n/2 + 2)·tadder + tmultiplier (x/y path: (n/2 + 1) iterations and multiplier; z path: n/3) | Constant |
In [64], the Para-CORDIC algorithm is proposed to precompute the directions of rotation without using a ROM, while eliminating the iterative z path completely. This method uses additional x/y stages for the repetition of certain microrotations to predict the directions of rotation, in contrast to the ROM employed in [61, 63, 65]. The latency of this Para-CORDIC is (2(s(n) + n/2 − l + 2) + log1.5 n + 2)·tFA, where l = (n − log2 3)/3 and s(n) represents the total number of microrotations required in the MAR recoding of the (l − 1) bits of the input angle. The values of s(n) for 16/32/64-bit precision are 5, 18, and 52, respectively.
The semiflat technique is proposed in [65] to partially eliminate the iterative nature of the x/y/z paths for the (n − β) iterations (β = 6 for a 16-bit CORDIC and β = 8 for a 32-bit CORDIC). The latency of the nonpipelined implementation of this algorithm is 33tFA for 16-bit and 49tFA for 32-bit precision. It is observed that this architecture is combinational after β iterations and has poor scalability.
In [49], the x/y coordinates are computed iteratively for the first (n/2 + 1) iterations using (n/2 + 1) fast adders. These values are then used to compute the final x/y coordinates with two multipliers operating in parallel followed by one adder, giving a latency of $(n/2 + 2)\,t_{adder} + t_{multiplier}$. The rotation directions for the first (n/3 + 1) iterations are determined iteratively from the sign of the angle accumulator $z_i$; for $(n/3 + 1) < i \le (n/2 + 1)$, the rotation directions are generated in parallel.
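The structure described for [49] can be illustrated with a floating-point sketch in Python: (n/2 + 1) conventional shift-and-add microrotations followed by a single multiply-based step for the residual angle. How the two multipliers are used is an assumption here: the sketch applies the residual angle $z$ as a first-order rotation, $x_f = x - z\,y$ and $y_f = y + z\,x$. After microrotations $i = 0, \dots, n/2$ the residual satisfies $|z| \le \tan^{-1} 2^{-n/2}$, so the neglected $z^2/2$ term stays below $2^{-n}$. For simplicity, every direction is taken from the sign of $z_i$ (the parallel generation for $i > n/3 + 1$ is not modelled), the constant scale factor is removed by a division, and the function name `low_latency_rotate` is hypothetical.

```python
import math


def low_latency_rotate(x: float, y: float, theta: float, n: int) -> tuple:
    """Hypothetical sketch: rotate (x, y) by theta to roughly n-bit accuracy.

    Valid for |theta| within the basic CORDIC convergence range (about 1.7 rad).
    """
    m = n // 2 + 1  # number of conventional microrotations, i = 0 .. m-1
    z = theta
    for i in range(m):
        sigma = 1.0 if z >= 0 else -1.0  # direction from the sign of z_i
        # Shift-and-add microrotation (tuple assignment uses the old x, y).
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    # Residual |z| <= atan(2^-(m-1)); apply it as a first-order rotation
    # (cos z ~ 1, sin z ~ z) using two multiplications that can run in parallel.
    x_f = x - z * y
    y_f = y + z * x
    # Remove the constant CORDIC gain K = prod sqrt(1 + 2^-2i).
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(m))
    return x_f / gain, y_f / gain


for n in (16, 32):
    xr, yr = low_latency_rotate(1.0, 0.0, 0.7, n)
    err = max(abs(xr - math.cos(0.7)), abs(yr - math.sin(0.7)))
    print(f"n={n}: error = {err:.2e}, 2^-n = {2.0 ** -n:.2e}")
```

The sketch only mirrors the dataflow (iterative stages, then two multiplications); it does not model fixed-point word lengths or the adder and multiplier delays.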
11. Conclusions
In this paper, we have surveyed algorithms for the unfolded implementation of 2D rotational CORDIC. Special attention has been devoted to a systematic and comprehensive classification of the solutions proposed in the literature. In addition to the pipelined implementation of the nonredundant radix-2 CORDIC algorithm, which has received wide attention in the past, we have discussed the importance of redundant and higher radix algorithms. We have also stressed the importance of prediction algorithms that precompute the directions of rotation, and of parallelizing the x/y path. It is worth noting that the algorithms considered here should not be viewed as mutually exclusive alternatives; rather, they should be combined according to the design constraints of a specific application.
We can now draw final conclusions about the different algorithms for achieving an efficient implementation of an application-specific rotational CORDIC. Applying redundant arithmetic to the pipelined implementation of the conventional radix-2 CORDIC algorithm doubles the area but reduces the adder delay of each stage from $(\log_2 n)\,t_{FA}$ to $2\,t_{FA}$. Similarly, the hardware and iteration delay of redundant radix-2 CORDIC can be reduced by employing a prediction technique to precompute the directions of rotation. A further latency reduction can be achieved by integrating the prediction technique with redundant radix-4 arithmetic, trading area for the variable scale factor computation. Another important observation is that solutions which fully parallelize the x/y path sacrifice the modularity and regularity of the architecture, leading to poorly scalable implementations. Finally, we conclude that a solution which applies prediction and x/y-path parallelization techniques to a redundant CORDIC algorithm, while still permitting a scalable architecture, can achieve both latency reduction and throughput improvement.
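The constant $2\,t_{FA}$ stage delay of redundant arithmetic can be illustrated, for the carry-save (CS) case, with a small bit-level model in Python. Each CS word is held as a pair of vectors (the source of the doubled area), and adding two CS words needs only two rows of full adders (3:2 compressors) regardless of the word length, because no carry ripples across the word until a final conversion. The sketch models words as unsigned integers modulo $2^w$; the function names and the word length are illustrative, and the subtraction and the shifted operand $y_i 2^{-i}$ of a real CORDIC stage are not modelled.

```python
from typing import Tuple

CS = Tuple[int, int]  # (sum vector, carry vector), each a w-bit unsigned word


def full_adder_level(a: int, b: int, c: int, w: int) -> CS:
    """One row of w independent full adders (a 3:2 compressor): delay t_FA."""
    mask = (1 << w) - 1
    s = (a ^ b ^ c) & mask                               # bitwise sum
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask  # majority, moved up one bit
    return s, carry


def cs_add(x: CS, y: CS, w: int) -> CS:
    """Add two carry-save words with two full-adder rows: delay 2 t_FA."""
    s1, c1 = full_adder_level(x[0], x[1], y[0], w)
    return full_adder_level(s1, c1, y[1], w)


def cs_to_binary(x: CS, w: int) -> int:
    """Final carry-propagate addition, needed only once at the output."""
    return (x[0] + x[1]) & ((1 << w) - 1)


w = 16
a = (0x1234, 0x0F0F)
b = (0xFFFF, 0x8001)
total = cs_add(a, b, w)
print(hex(cs_to_binary(total, w)), hex((sum(a) + sum(b)) % (1 << w)))
```

Both printed values agree, while the two `full_adder_level` calls contain no bit-to-bit carry chain; the single carry-propagate addition is deferred to `cs_to_binary`.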
References
[1] D. S. Cochran, "Algorithms and accuracy in the HP-35," Hewlett-Packard Journal, vol. 23, no. 10, 1972.
[2] J.-M. Muller, Elementary Functions: Algorithms and Implementation, Birkhäuser, Boston, Mass, USA, 2004.
[3] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, vol. 8, no. 3, pp. 330-334, 1959.
[4] J. E. Volder, "The birth of CORDIC," Journal of VLSI Signal Processing, vol. 25, no. 2, pp. 101-105, 2000.
[5] D. H. Daggett, "Decimal-binary conversions in CORDIC," IRE Transactions on Electronic Computers, vol. 8, pp. 335-339, 1959.
[6] J. E. Meggitt, "Pseudo division and pseudo multiplication processes," IBM Journal, vol. 6, no. 2, pp. 210-226, 1962.
[7] J. S. Walther, "A unified algorithm for elementary functions," in Proceedings of the AFIPS Spring Joint Computer Conference, pp. 379-385, May 1971.
[8] J. S. Walther, "The story of Unified CORDIC," Journal of VLSI Signal Processing, vol. 25, no. 2, pp. 107-112, 2000.
[9] G. L. Haviland and A. A. Tuszynski, "A CORDIC arithmetic processor chip," IEEE Journal of Solid-State Circuits, vol. 15, no. 1, pp. 4-15, 1980.
[10] Y. H. Hu, "CORDIC-based VLSI architectures for digital signal processing," IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16-35, 1992.
[11] A. A. J. de Lange, A. J. van der Hoeven, E. F. Deprettere, and J. Bu, "Optimal floating-point pipeline CMOS CORDIC processor," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '88), vol. 3, pp. 2043-2047, June 1988.
[12] A. A. J. de Lange, A. J. van der Hoeven, E. F. Deprettere, and P. Dewilde, "An application specific IC for digital signal processing: the floating point pipeline CORDIC processor," in Proceedings of the European Conference on ASIC Design (ASIC '90), pp. 62-67, May 1990.
[13] D. E. Metafas and C. E. Goutis, "A DSP processor with a powerful set of elementary arithmetic operations based on cordic and CCM algorithms," Microprocessing and Microprogramming, vol. 30, no. 1-5, pp. 51-57, 1990.
[14] D. Timmermann, H. Hahn, B. J. Hosticka, and G. Schmidt, "A programmable CORDIC chip for digital signal processing applications," IEEE Journal of Solid-State Circuits, vol. 26, no. 9, pp. 1317-1321, 1991.
[15] A. A. J. de Lange and E. F. Deprettere, "Design and implementation of a floating-point quasi-systolic general purpose CORDIC rotator for high-rate parallel data and signal processing," in Proceedings of the 10th IEEE Symposium on Computer Arithmetic, pp. 272-281, June 1991.
[16] A. M. Despain, "Fourier transform computers using CORDIC iterations," IEEE Transactions on Computers, vol. 23, no. 10, pp. 993-1001, 1974.
[17] H. M. Ahmed, J.-M. Delosme, and M. Morf, "Highly concurrent computing structures for matrix arithmetic and signal processing," Computer, vol. 15, no. 1, pp. 65-82, 1982.
[18] Y. H. Hu and S. Naganathan, "A novel implementation of a chirp Z-transform using a CORDIC processor," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 2, pp. 352-354, 1990.
[19] A. S. Dhar and S. Banerjee, "An array architecture for fast computation of discrete Hartley transform," IEEE Transactions on Circuits and Systems, vol. 38, no. 9, pp. 1095-1098, 1991.
[20] K. Maharatna, A. S. Dhar, and S. Banerjee, "A VLSI array architecture for realization of DFT, DHT, DCT and DST," Signal Processing, vol. 81, no. 9, pp. 1813-1822, 2001.
[21] K. C. Ray and A. S. Dhar, "CORDIC-based unified VLSI architecture for implementing window functions for real time spectral analysis," IEE Proceedings: Circuits, Devices and Systems, vol. 153, no. 6, pp. 539-544, 2006.
[22] S. K. Rao and T. Kailath, "Orthogonal digital filters for VLSI implementation," IEEE Transactions on Circuits and Systems, vol. 31, no. 11, pp. 933-945, 1984.
[23] P. P. Vaidyanathan, "A unified approach to orthogonal digital filters and wave digital filters based on LBR two pair extraction," IEEE Transactions on Circuits and Systems, vol. 32, no. 7, pp. 673-686, 1985.
[24] Y. H. Hu and H. E. Liao, "CALF: a CORDIC adaptive lattice filter," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 990-993, 1992.
[25] J. R. Cavallaro and F. T. Luk, "CORDIC arithmetic for an SVD processor," Journal of Parallel and Distributed Computing, vol. 5, no. 3, pp. 271-290, 1988.
[26] J. A. Lee and T. Lang, "SVD by constant-factor-redundant-CORDIC," in Proceedings of the 10th IEEE Symposium on Computer Arithmetic, pp. 264-271, June 1991.
[27] A. Banerjee, A. S. Dhar, and S. Banerjee, "FPGA realization of a CORDIC based FFT processor for biomedical signal processing," Microprocessors and Microsystems, vol. 25, no. 3, pp. 131-142, 2001.
[28] A. Meyer-Bäse, R. Watzel, U. Meyer-Bäse, and S. Foo, "A parallel CORDIC architecture dedicated to compute the Gaussian potential function in neural networks," Engineering Applications of Artificial Intelligence, vol. 16, no. 7-8, pp. 595-605, 2003.
[29] C. Y. Kang and E. E. Swartzlander Jr., "Digit-pipelined direct digital frequency synthesis based on differential CORDIC," IEEE Transactions on Circuits and Systems I, vol. 53, no. 5, pp. 1035-1044, 2006.
[30] H. Wang, P. Leray, and J. Palicot, "Reconfigurable architecture for MIMO systems based on CORDIC operators," Comptes Rendus Physique, vol. 7, no. 7, pp. 735-750, 2006.
[31] G. J. Hekstra and E. F. A. Deprettere, "Fast rotations: low-cost arithmetic methods for orthonormal rotation," in Proceedings of the 13th IEEE Symposium on Computer Arithmetic, pp. 116-125, October 1997.
[32] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, John Wiley & Sons, New York, NY, USA, 1979.
[33] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, New York, NY, USA, 1999.
[34] A. Avizienis, "Signed-digit number representation for fast parallel arithmetic," IRE Transactions on Electronic Computers, vol. 10, pp. 389-400, 1961.
[35] D. E. Atkins, "Introduction to the role of redundancy in computer arithmetic," IEEE Computer Magazine, vol. 8, no. 6, pp. 74-77, 1975.
[36] B. Parhami, "Carry-free addition of recoded binary signed-digit numbers," IEEE Transactions on Computers, vol. 37, no. 11, pp. 1470-1476, 1988.
[37] B. Parhami, "Generalized signed-digit number systems: a unifying framework for redundant number representations," IEEE Transactions on Computers, vol. 39, no. 1, pp. 89-98, 1990.
[38] T. G. Noll, "Carry-save arithmetic for high-speed digital signal processing," in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, pp. 982-986, May 1990.
[39] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," in Proceedings of the 6th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '98), pp. 191-200, February 1998.
[40] P. W. Baker, "Suggestion for a fast binary sine/cosine generator," IEEE Transactions on Computers, vol. 25, no. 11, pp. 1134-1136, 1976.
[41] Y. H. Hu, "Pipelined CORDIC architecture for the implementation of rotational based algorithm," in Proceedings of the International Symposium on VLSI Technology, Systems and Applications, p. 259, May 1985.
[42] M. D. Ercegovac and T. Lang, "Fast cosine/sine implementation using on-line CORDIC," in Proceedings of the 21st Asilomar Conference on Signals, Systems, and Computers, 1987.
[43] E. Antelo, J. Villalba, J. D. Bruguera, and E. L. Zapata, "High performance rotation architectures based on the Radix-4 CORDIC algorithm," IEEE Transactions on Computers, vol. 46, no. 8, pp. 855-870, 1997.
[44] D. Timmermann, H. Hahn, B. J. Hosticka, and B. Rix, "A new addition scheme and fast scaling factor compensation methods for CORDIC algorithms," The VLSI Journal on Integration, vol. 11, no. 1, pp. 85-100, 1991.
[45] J. Villalba, J. A. Hidalgo, E. L. Zapata, E. Antelo, and J. D. Bruguera, "CORDIC architectures with parallel compensation of the scale factor," in Proceedings of the International Conference on Application Specific Array Processors, pp. 258-269, Strasbourg, France, July 1995.
[46] X. Hu, R. G. Harber, and S. C. Bass, "Expanding the range of convergence of the CORDIC algorithm," IEEE Transactions on Computers, vol. 40, no. 1, pp. 13-21, 1991.
[47] Y. H. Hu, "The quantization effects of the CORDIC algorithm," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 834-844, 1992.
[48] M. D. Ercegovac and T. Lang, Digital Arithmetic, Elsevier, Amsterdam, The Netherlands, 2004.
[49] E. Antelo, J. Villalba, and E. L. Zapata, "A low-latency pipelined 2D and 3D CORDIC processors," IEEE Transactions on Computers, vol. 57, no. 3, pp. 404-417, 2008.
[50] H. M. Ahmed, "Efficient elementary function generation with multipliers," in Proceedings of the 9th Symposium on Computer Arithmetic, pp. 52-59, September 1989.
[51] N. Takagi, T. Asada, and S. Yajima, "Redundant CORDIC methods with a constant scale factor for sine and cosine computation," IEEE Transactions on Computers, vol. 40, no. 9, pp. 989-995, 1991.
[52] J. Duprat and J.-M. Muller, "The CORDIC algorithm: new results for fast VLSI implementation," IEEE Transactions on Computers, vol. 42, no. 2, pp. 168-178, 1993.
[53] D. S. Phatak, "Double step branching CORDIC: a new algorithm for fast sine and cosine generation," IEEE Transactions on Computers, vol. 47, no. 5, pp. 587-602, 1998.
[54] J. A. Lee and T. Lang, "Constant-factor redundant CORDIC for angle calculation and rotation," IEEE Transactions on Computers, vol. 41, no. 8, pp. 1016-1025, 1992.
[55] D. Timmermann, H. Hahn, and B. J. Hosticka, "Low latency time CORDIC algorithms," IEEE Transactions on Computers, vol. 41, no. 8, pp. 1010-1015, 1992.
[56] H. Dawid and H. Meyr, "The differential CORDIC algorithm: constant scale factor redundant implementation without correcting iterations," IEEE Transactions on Computers, vol. 45, no. 3, pp. 307-318, 1996.
[57] H. Dawid and H. Meyr, "High speed bit-level pipelined architectures for redundant CORDIC implementation," in Proceedings of the International Conference on Application, pp. 358-372, 1992.
[58] J. D. Bruguera, E. Antelo, and E. L. Zapata, "Design of a pipelined Radix 4 CORDIC processor," Parallel Computing, vol. 19, no. 7, pp. 729-744, 1993.
[59] E. Antelo, J. D. Bruguera, and E. L. Zapata, "Unified mixed Radix 2-4 redundant CORDIC processor," IEEE Transactions on Computers, vol. 45, no. 9, pp. 1068-1073, 1996.
[60] S. Wang, V. Piuri, and E. E. Swartzlander Jr., "Hybrid CORDIC algorithms," IEEE Transactions on Computers, vol. 46, no. 11, pp. 1202-1207, 1997.
[61] M. Kuhlmann and K. K. Parhi, "P-CORDIC: a precomputation based rotation CORDIC algorithm," EURASIP Journal on Applied Signal Processing, vol. 2002, no. 9, pp. 936-943, 2002.
[62] M. Kuhlmann and K. K. Parhi, "A high-speed CORDIC algorithm and architecture for DSP applications," in Proceedings of the IEEE Workshop on Signal Processing Systems (SiPS '99), pp. 732-741, October 1999.
[63] B. Gisuthan and T. Srikanthan, "Pipelining flat CORDIC based trigonometric function generators," Microelectronics Journal, vol. 33, no. 1-2, pp. 77-89, 2002.
[64] T.-B. Juang, S.-F. Hsiao, and M.-Y. Tsai, "Para-CORDIC: parallel CORDIC rotation algorithm," IEEE Transactions on Circuits and Systems I, vol. 51, no. 8, pp. 1515-1524, 2004.
[65] H. S. Kebbati, J. Ph. Blonde, and F. Braun, "A new semi-flat architecture for high speed and reduced area CORDIC chip," Microelectronics Journal, vol. 37, no. 2, pp. 181-187, 2006.
[66] T. Srikanthan and B. Gisuthan, "A novel technique for eliminating iterative based computation of polarity of micro-rotations in CORDIC based sine-cosine generators," Microprocessors and Microsystems, vol. 26, no. 5, pp. 243-252, 2002.