Research Article
Novel Receiver Architecture for LTE-A Downlink Physical Control Format Indicator Channel with Diversity

S. Syed Ameer Abbas,1 S. J. Thiruvengadam,2 and S. Susithra1

1 Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi 626 005, India
2 Department of Electronics and Communication Engineering, Thiagarajar College of Engineering, Madurai 625 015, India
Correspondence should be addressed to S. Syed Ameer Abbas; abbas_mepco@yahoo.com
Received 14 November 2013; Revised 27 April 2014; Accepted 5 May 2014; Published 5 June 2014

Academic Editor: Chien-In Henry Chen

Copyright © 2014 S. Syed Ameer Abbas et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The physical control format indicator channel (PCFICH) carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of control information in the long term evolution-advanced (LTE-A) downlink system. In this paper, two novel low-complexity receiver architectures are proposed to implement the maximum likelihood- (ML-) based algorithm that decodes the CFI value in a field programmable gate array (FPGA) at the user equipment (UE). The performance of the proposed architectures is analyzed in terms of timing cycles, operational resource requirement, and resource complexity. In LTE-A, the base station and the UE have multiple antenna ports to provide transmit and receive diversity. The proposed architectures are implemented in a Virtex-6 xc6vlx240tff1156-1 FPGA device for various antenna configurations at the base station and the UE. When multiple antenna ports are used at the base station, transmit diversity is obtained by applying space frequency block coding (SFBC). It is shown that the proposed architectures use fewer operational units in the FPGA than the traditional direct method of implementation.
1. Introduction
The goal of the third generation partnership project (3GPP) long term evolution-advanced (LTE-A) wireless standard is to increase the capacity and speed of wireless data communication. The LTE-A physical layer is a highly efficient means of conveying both data and control information between an enhanced base station, popularly known as eNodeB, and mobile user equipment (UE). It supports both frequency division duplex (FDD) and time division duplex (TDD) configurations in uplink and downlink operations. Further, it provides a wide range of system bandwidths in order to operate in a large number of different spectrum allocations [1].
The LTE-A standard has six physical channels for the downlink: the physical broadcast channel (PBCH), physical downlink shared channel (PDSCH), physical multicast channel (PMCH), physical downlink control channel (PDCCH), physical hybrid automatic repeat request (ARQ) indicator channel (PHICH), and physical control format indicator channel (PCFICH). PBCH carries the basic system information for the other channels to be configured and operated in the LTE-A grid. The PDSCH is the main data-bearing channel. PMCH is defined for future use. In LTE-A, the control signals are transmitted at the start of each subframe in the LTE-A grid. PDCCH carries scheduling information of different types, such as downlink resource scheduling and uplink power control instructions. PHICH is used to send the acknowledgement/negative acknowledgement bit to UEs to indicate whether the uplink user data is correctly received or not. PCFICH carries the control information about the number of orthogonal frequency division multiplexing (OFDM) symbols used for transmission of downlink control information. The high data rate in LTE-A places high processing demands on all layers of the system, including high digital signal processing (DSP) hardware processing in the physical layer. Further, the hardware implementation of receiver structures of various physical channels in LTE-A becomes a challenging task as the computational complexity increases.
Hindawi Publishing Corporation, VLSI Design, Volume 2014, Article ID 825183, 15 pages, http://dx.doi.org/10.1155/2014/825183
In [2], receivers were designed for a 2 × 2 antenna system and for quadrature phase shift keying (QPSK) modulation and quadrature amplitude modulation (16-QAM and 64-QAM). Though the successive interference cancellation (SIC) receiver meets the timing requirements of the LTE system, it is complex, and the K-best list sphere detector (K-LSD) receiver has high latency. In [3], field programmable gate array (FPGA) and application specific integrated circuit (ASIC) implementations of receivers based on the linear minimum mean-square error (LMMSE), the K-LSD, the iterative successive interference cancellation (SIC) detector, and the iterative K-LSD algorithms are carried out for a spatial multiplexing based LTE-A system. The SIC algorithm is found to perform worse than the K-LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The ASIC receivers are designed to meet the decoding throughput requirements in LTE, and the K-LSD is found to be the most complex receiver, although it gives the most reliable data transmission throughput. It is shown that a receiver architecture that could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver. FPGA implementation of a MIMO detector based on two typical sphere decoding algorithms, namely, the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm, is carried out in [4]. In this implementation method, three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellations, and the concurrent execution of multiple steps during the closest lattice point search. The implementation of a low-complexity codebook searching engine is proposed to support both LTE and LTE-A operations [5]. In [6], VLSI implementation of a low-complexity multiple input multiple output (MIMO) symbol detector based on a novel MIMO detection algorithm called modified fixed-complexity soft-output (MFCSO) detection is presented. It includes a microcode-controlled channel preprocessing unit, separate channel memory, and a pipelined detection unit. A MATLAB-based downlink physical-layer simulator for LTE, intended only for research applications, is presented in [7]. In [8], maximum likelihood- (ML-) based receiver structures are developed for decoding the downlink control channels PCFICH and PHICH in the LTE wireless standard, and the performance of the receivers has been analyzed for various configurations. The analytical results were validated against computer simulations, but hardware implementation of the structures was not coded or synthesized. In [9], direct implementation of receive algorithms was carried out in FPGA for downlink control channels in LTE. However, most of these works either propose architectures for FPGA implementation or analyze the performance of various receiver structures in a generalized manner. The objective of this paper is to propose novel architectures for FPGA implementation of the transmit and receive processing of the downlink PCFICH channel in the LTE-A standard in particular.
Table 1: CFI 32-bit block code.

CFI   Code word ⟨b31, ..., b0⟩
1     01101101101101101101101101101101
2     10110110110110110110110110110110
3     11011011011011011011011011011011
4     00000000000000000000000000000000
1.1. Transmit and Receive Processing of PCFICH. In PCFICH, the control format indicator (CFI) is carried as a 32-bit code word that represents the CFI value 1, 2, 3, or 4. The CFI informs the UE of the number of OFDM symbols used for the transmission of PDCCH information in a subframe. The 32-bit code word corresponding to the CFI value is scrambled and QPSK modulated. The resulting 16 complex QPSK symbols are mapped to the resource elements of the first OFDM symbol of every subframe, after layer mapping and precoding to obtain transmit diversity when two or more antenna ports are used at the eNodeB [10]. The 32-bit code words for the four possible values of CFI are given in Table 1. A general block diagram of the transmitter and receiver processing of PCFICH is shown in Figure 1.
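As an illustration of the code-word-to-symbol step described above, the following minimal Python sketch maps each Table 1 code word to 16 QPSK symbols. It rests on several assumptions that are not taken from the paper: the cell-specific scrambling step is omitted, the bits are consumed in the order in which they are printed in Table 1, and the usual LTE Gray mapping (bit pair 00 to (1 + j)/√2) is used. The names `CFI_CODEWORDS`, `qpsk_modulate`, and `cfi_symbols` are hypothetical.

```python
import math

# Code words copied from Table 1, in the order they are printed there.
CFI_CODEWORDS = {
    1: "01101101101101101101101101101101",
    2: "10110110110110110110110110110110",
    3: "11011011011011011011011011011011",
    4: "00000000000000000000000000000000",
}


def qpsk_modulate(bits):
    """Map a bit string to unit-energy QPSK symbols, two bits per symbol.

    Assumed mapping: the first bit of each pair sets the sign of the real
    part and the second bit sets the sign of the imaginary part (0 -> +, 1 -> -).
    """
    a = 1 / math.sqrt(2)
    symbols = []
    for k in range(0, len(bits), 2):
        i_bit, q_bit = int(bits[k]), int(bits[k + 1])
        symbols.append(complex(a * (1 - 2 * i_bit), a * (1 - 2 * q_bit)))
    return symbols


def cfi_symbols(cfi):
    """Return the 16 QPSK symbols for a CFI value (scrambling omitted)."""
    return qpsk_modulate(CFI_CODEWORDS[cfi])
```

Under these assumptions, `cfi_symbols(m)` plays the role of a precomputed 16 × 1 data vector for CFI value m; a real receiver would also have to account for the scrambling sequence.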
The OFDM signal is transmitted through a frequency selective fading channel. It is assumed that the number of receive antenna ports at the UE is K. At each receive antenna port of the UE, resource-element demapping follows cyclic prefix removal and the fast Fourier transform (FFT). The 16 × 1 received signal vector at each antenna port is equalized in the frequency domain at each subcarrier using the corresponding 16 × 1 channel frequency response vector. The outputs of the frequency domain equalizer from each antenna port are summed. The resulting 16 × 1 complex vector is applied to the maximum likelihood (ML) detector to detect the CFI value. The objective of this paper is to synthesize and implement the receiver architecture for PCFICH.
The paper is structured as follows. Section 2 explains the system model and basic implementation architectures for single input single output (SISO) and single input multiple output (SIMO) configurations. The system model and basic implementation architecture for multiple input single output (MISO) and multiple input multiple output (MIMO) configurations are described in Sections 3 and 4, respectively. The proposed implementation architectures using folding and superscalar methods are given in Section 5 for SISO, SIMO, MISO, and MIMO configurations. Section 6 analyzes the performance of the proposed architectures, and Section 7 concludes the paper with remarks on future work.
2. System Model and Implementation Architecture for SISO and SIMO Configurations
The received signal model for the SISO configuration of PCFICH is given by

$$\mathbf{y} = \mathbf{h} \circ \mathbf{d}^{(m)} + \mathbf{w}, \tag{1}$$

where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is the 16 × 1 received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is the 16 × 1 channel frequency response vector, $\mathbf{d}^{(m)} = [d_0^{(m)}, d_1^{(m)}, \ldots, d_{15}^{(m)}]^T$ is the 16 × 1 complex QPSK symbol vector corresponding to the CFI value $m$ from the set {1, 2, 3, 4}, "$\circ$" represents element-by-element multiplication, and $\mathbf{w}$ is a 16 × 1 additive white noise vector whose elements are zero mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

$$\mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^{2}. \tag{2}$$

Figure 1: Block diagram of transmitter and receiver processing.

Figure 2: Basic architecture for SISO configuration.
Figure 2 shows the basic architecture for estimating CFI using (2) in the SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as input to the four receiver processing blocks (RPB), along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram for RPB CFI-1 is shown in Figure 3. It computes the expression $\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)} \|^{2}$ assuming CFI = 1. In RPB-m, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant 16 × 1 vector is subtracted from the 16 × 1 received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output of the RPB.
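The RPB computation and the minimum search in (2) can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions, not the hardware design: the candidate vectors are built from the 3-bit patterns that repeat in Table 1 with scrambling omitted, the channel is drawn from a seeded pseudo-random generator, and the reception is noise free so that the result is deterministic. The names `rpb`, `detect_cfi`, `candidate`, and `CANDIDATES` are hypothetical.

```python
import math
import random


def rpb(y, h, d):
    """Receiver processing block: sum over k of |y_k - h_k * d_k|^2."""
    return sum(abs(yk - hk * dk) ** 2 for yk, hk, dk in zip(y, h, d))


def detect_cfi(y, h, candidates):
    """Evaluate one RPB per CFI value and return the value with the smallest output."""
    metrics = {m: rpb(y, h, d) for m, d in candidates.items()}
    return min(metrics, key=metrics.get), metrics


# Illustrative candidate vectors (assumption: unscrambled Table 1 code words).
A = 1 / math.sqrt(2)
PATTERNS = {1: "011", 2: "101", 3: "110", 4: "000"}


def candidate(m):
    bits = (PATTERNS[m] * 11)[:32]
    return [complex(A * (1 - 2 * int(bits[k])), A * (1 - 2 * int(bits[k + 1])))
            for k in range(0, 32, 2)]


CANDIDATES = {m: candidate(m) for m in PATTERNS}

# Toy channel and a noise-free reception of CFI = 3.
rng = random.Random(0)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]
y = [hk * dk for hk, dk in zip(h, CANDIDATES[3])]

cfi_hat, metrics = detect_cfi(y, h, CANDIDATES)
```

In this noise-free toy case the RPB output for the transmitted value is exactly zero, so `cfi_hat` equals 3; with noise, the same minimum search applies to nonzero outputs.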
The inputs to the CFI detector are the 16-bit outputs of the RPBs, $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram of the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1, input $r$ [...]
where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the 16 × 1 noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]
$$\mathrm{CFI} = \arg\min_{m = 1, 2, 3, 4} \sum_{k=1}^{K} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2}. \tag{6}$$
The basic architecture for estimating CFI using (6) in the 1 × 2 SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$_m^{(k)}$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r_m^{(0)}$, and at the 1st receive antenna, $r_m^{(1)}$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
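The per-antenna summation in (6) can be modelled as follows. The sketch is an assumption-laden toy: the candidate vectors have only four real-valued elements instead of the 16 QPSK symbols of the paper, the two channel vectors are arbitrary, and a small constant offset stands in for noise. The names `rpb`, `detect_cfi_simo`, and `TOY` are hypothetical.

```python
def rpb(y, h, d):
    """Per-antenna receiver processing block: sum over k of |y_k - h_k * d_k|^2."""
    return sum(abs(yk - hk * dk) ** 2 for yk, hk, dk in zip(y, h, d))


def detect_cfi_simo(ys, hs, candidates):
    """Add the RPB outputs of all receive antennas for each CFI value, then pick the minimum."""
    metrics = {}
    for m, d in candidates.items():
        metrics[m] = sum(rpb(y, h, d) for y, h in zip(ys, hs))
    best = min(metrics, key=metrics.get)
    return best, metrics


# Toy candidates (illustrative 4-element vectors, not the real PCFICH symbols).
TOY = {1: [1, 1, -1, -1], 2: [1, -1, 1, -1], 3: [-1, 1, 1, -1], 4: [1, 1, 1, 1]}

# One toy channel vector per receive antenna (1 x 2 SIMO).
hs = [
    [1 + 0j, 0.5j, -1 + 0j, 0.2 + 0.1j],
    [0.3 - 0.4j, 1 + 1j, 0.1j, -0.7 + 0j],
]

# CFI = 2 is sent; a constant 0.01 offset plays the role of noise.
ys = [[hk * dk + 0.01 for hk, dk in zip(h, TOY[2])] for h in hs]

cfi_hat, metrics = detect_cfi_simo(ys, hs, TOY)
```

Each antenna contributes its own RPB output, and only the sums are compared, which mirrors the adder placed in front of the CFI detector in the SIMO architecture.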
3. System Model and Implementation Architecture for MISO Configuration
In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at the eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at the eNodeB. The 16 × 1 complex symbol vector output of the modulation mapper is applied to the layer mapper. The 8 × 1 symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using [...]
Figure 7: Proposed MISO receiver architecture for PCFICH.

Figure 12: Multiplicands rearrangement for a single complex multiplication block.
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}$$
where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can be written compactly as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}$$
where $\mathbf{y}_i$ is the 2 × 1 received signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \tag{10}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as
$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}$$
The elements of the decoder output are calculated as

$$\begin{aligned} z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, & i &= 0, 2, 4, 6, 8, 10, 12, 14, \\ z_{i+1}^{*} &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, & i &= 0, 2, 4, 6, 8, 10, 12, 14. \end{aligned} \tag{12}$$
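To see what the decoder in (12) recovers, the sketch below forms one noise-free received pair according to (8) and decodes it with (12). It assumes, purely for illustration, that each channel coefficient is the same on the two adjacent subcarriers; in that case both outputs reduce to the transmitted symbols scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$. The function names and numeric values are hypothetical.

```python
import math


def sfbc_receive_pair(d_i, d_i1, h0_i, h1_i, h0_i1, h1_i1):
    """Noise-free received pair (y_i, y_{i+1}^*) following the model in (8)."""
    y_i = h0_i * d_i - h1_i * d_i1.conjugate()
    y_i1_conj = h1_i1.conjugate() * d_i + h0_i1.conjugate() * d_i1.conjugate()
    return y_i, y_i1_conj


def sfbc_decode_pair(y_i, y_i1_conj, h0_i, h1_i, h0_i1, h1_i1):
    """Decoder outputs (z_i, z_{i+1}^*) following (12)."""
    z_i = h0_i.conjugate() * y_i + h1_i1 * y_i1_conj
    z_i1_conj = -h1_i.conjugate() * y_i + h0_i1 * y_i1_conj
    return z_i, z_i1_conj


# Toy values: two QPSK symbols and a channel that is flat across the pair.
a = 1 / math.sqrt(2)
d_i, d_i1 = complex(a, a), complex(-a, a)
h0, h1 = 0.8 + 0.3j, -0.2 + 0.9j

y_i, y_i1_conj = sfbc_receive_pair(d_i, d_i1, h0, h1, h0, h1)
z_i, z_i1_conj = sfbc_decode_pair(y_i, y_i1_conj, h0, h1, h0, h1)

# Combined channel gain seen by both decoded symbols in this flat case.
gain = abs(h0) ** 2 + abs(h1) ** 2
```

With these values, `z_i` equals `gain * d_i` and `z_i1_conj` equals `gain * d_i1.conjugate()` up to rounding, which is the diversity combining that the RDB implements per subcarrier pair.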
The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.
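The processing block and the final minimum search described above can be modelled as follows. This is a hedged sketch: the two-level comparison tree is an assumption made for illustration and is not claimed to match the wiring of the comparator modules in Figure 4, and the three-element vectors are toy data rather than real decoder outputs. The names `pb_output`, `comparator`, `cfi_detector`, and `S` are hypothetical.

```python
def pb_output(z, s):
    """Processing block: sum of squared magnitudes of the elements of z - s."""
    return sum(abs(zk - sk) ** 2 for zk, sk in zip(z, s))


def comparator(a, b):
    """Return the (metric, cfi) pair with the smaller metric; a tie keeps the first input."""
    return a if a[0] <= b[0] else b


def cfi_detector(r1, r2, r3, r4):
    """Pick the CFI whose PB output is smallest using pairwise comparisons (assumed tree)."""
    low = comparator((r1, 1), (r2, 2))
    high = comparator((r3, 3), (r4, 4))
    return comparator(low, high)[1]


# Toy precomputed vectors and a toy decoded vector close to the CFI = 2 candidate.
S = {1: [1, 1, 1j], 2: [1, -1, 1j], 3: [-1, -1, -1j], 4: [-1, 1, 1j]}
z = [0.9 + 0.1j, -1.1 + 0j, 0.95j]

r = [pb_output(z, S[m]) for m in (1, 2, 3, 4)]
cfi_hat = cfi_detector(*r)
```

Here the second PB output is the smallest, so the detector returns CFI = 2.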
4. System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}$$
where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}$$

where $\mathbf{y}_i$ is the 4 × 1 received signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\, \mathbf{y}_i, \tag{15}$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as
$$\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}$$
The decoder outputs are given by

$$\begin{aligned} z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, & i &= 0, 2, 4, 6, 8, 10, 12, 14, \\ z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, & i &= 0, 2, 4, 6, 8, 10, 12, 14. \end{aligned} \tag{17}$$
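The following sketch transcribes the two sums in (17) for one subcarrier pair and checks them against the row-by-row matrix product of (16). It is only a consistency check of the written equations on arbitrary toy numbers; it does not model noise, the channel, or the hardware, and the function names are hypothetical.

```python
def mimo_decode_pair(y0_i, y0_i1, y1_i_conj, y1_i1_conj, h00, h01, h10, h11):
    """Decoder outputs (z_i, z_{i+1}^*) written out term by term as in (17)."""
    z_i = (h00.conjugate() * y0_i - h10.conjugate() * y0_i1
           + h01 * y1_i_conj + h11 * y1_i1_conj)
    z_i1_conj = (h01.conjugate() * y0_i - h11.conjugate() * y0_i1
                 - h00 * y1_i_conj - h10 * y1_i1_conj)
    return z_i, z_i1_conj


def mimo_decode_matrix(y_vec, h00, h01, h10, h11):
    """The same outputs computed as the 2 x 4 matrix times 4 x 1 vector product in (16)."""
    rows = [
        [h00.conjugate(), -h10.conjugate(), h01, h11],
        [h01.conjugate(), -h11.conjugate(), -h00, -h10],
    ]
    return tuple(sum(a * b for a, b in zip(row, y_vec)) for row in rows)


# Arbitrary toy channel coefficients and received entries
# (ordered as y_i^(0), y_{i+1}^(0), y_i^(1)*, y_{i+1}^(1)*).
h00, h01, h10, h11 = 0.5 + 0.2j, -0.3 + 0.7j, 0.1 - 0.9j, 0.8 + 0.4j
y_vec = (1 + 2j, -0.5 + 0.3j, 0.2 - 1j, -1.5 - 0.7j)

direct = mimo_decode_pair(*y_vec, h00, h01, h10, h11)
matrix = mimo_decode_matrix(y_vec, h00, h01, h10, h11)
```

Each output needs four complex multiplications, so one subcarrier pair costs eight and the eight pairs cost 64 complex multiplications in total, which is the workload that the RDBM has to schedule.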
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.
The PCFICH receiver architecture for the 2 × 2 MIMO configuration is shown in Figure 10.
The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element of the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).
5. PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on two VLSI DSP techniques, namely, folding and superscalar processing.
5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, an optimum rearrangement of the terms is carried out to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$e = \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \qquad f = \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \tag{18}$$
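This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$\begin{aligned} e &= \left[ \mathrm{Re}\{y\} - \mathrm{Im}\{y\} \right] \left[ \mathrm{Re}\{h\} + \mathrm{Im}\{h\} \right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \end{aligned} \tag{19}$$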
Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), each is computed once and reused, so only three multiplications are required, at the cost of five additions. This rearrangement of the multiplicands is employed in the processing blocks, as shown in Figure 12.
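The three-multiplication rearrangement can be checked numerically with the short sketch below. It forms the first product as (Re{y} − Im{y})(Re{h} + Im{h}), reuses the two cross products for both outputs, and compares the result with Python's built-in complex product. The function name `complex_mult_3` is hypothetical, and the sketch says nothing about word lengths or timing in the FPGA.

```python
def complex_mult_3(h, y):
    """Compute h * y with three real multiplications and five real additions."""
    re_h, im_h = h.real, h.imag
    re_y, im_y = y.real, y.imag
    p1 = (re_y - im_y) * (re_h + im_h)  # multiplication 1 (uses two additions)
    p2 = re_y * im_h                    # multiplication 2, shared by e and f
    p3 = im_y * re_h                    # multiplication 3, shared by e and f
    e = p1 - p2 + p3                    # real part (two additions)
    f = p2 + p3                         # imaginary part (one addition)
    return complex(e, f)


# Compare against the direct four-multiplication product on a few toy values.
PAIRS = [
    (1 + 2j, 3 - 4j),
    (-0.5 + 0.25j, 2.0 + 0j),
    (0.7071 - 0.7071j, -0.3 + 0.9j),
    (0j, 5 + 5j),
]
max_error = max(abs(complex_mult_3(h, y) - h * y) for h, y in PAIRS)
```

For example, `complex_mult_3(1 + 2j, 3 - 4j)` returns `(11+2j)`, the same value as the direct product, with one fewer real multiplication.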
5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures in which multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, a folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to obtain the output, the resources are minimized in this method.
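A behavioural model of the folded RPB is sketched below: the same two multipliers are reused once per clock cycle for each of the 16 hardware lines, and an accumulator collects the partial sums. The model assumes that the difference terms are already available in registers, as the text describes, and it counts only the two squaring multipliers; switch control, delays, and fixed-point effects are not modelled. All names and the toy input vectors are hypothetical.

```python
def rpb_parallel(y, h, s):
    """Direct form: all 16 hardware lines evaluated at once (32 multipliers)."""
    return sum(abs(yk - hk * sk) ** 2 for yk, hk, sk in zip(y, h, s))


def rpb_folded(y, h, s):
    """Folded form: one hardware line per clock cycle on two shared multipliers."""
    # Difference terms, assumed to be computed beforehand and held in registers.
    diffs = [yk - hk * sk for yk, hk, sk in zip(y, h, s)]
    accumulator = 0.0
    cycles = 0
    for diff in diffs:
        re_sq = diff.real * diff.real   # shared multiplier 1 (real part)
        im_sq = diff.imag * diff.imag   # shared multiplier 2 (imaginary part)
        accumulator += re_sq + im_sq    # first adder, then accumulating adder
        cycles += 1
    return accumulator, cycles


# Toy 16-element inputs (illustrative values only).
y = [complex(k, -k) / 10 for k in range(16)]
h = [complex(1, 0.1 * k) for k in range(16)]
s = [(-1) ** k * (1 + 1j) for k in range(16)]

folded_value, folded_cycles = rpb_folded(y, h, s)
parallel_value = rpb_parallel(y, h, s)
```

The folded loop produces the same metric as the parallel form after 16 iterations, which matches the 16 accumulation cycles stated above; the comparison of the four metrics would follow in the next cycle.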
The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the received signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the received signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. The superscalar processing method combines parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged, with pipelining of the subtraction and squared magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only squared magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers taking 2 clock cycles, and an additional 32 multipliers are required for the PB taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
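The multiplier counts quoted for the superscalar method follow from simple arithmetic, reproduced in the sketch below. The function and constant names are hypothetical, and one point is an inference rather than a statement from the text: the MIMO decoder is assumed to need two complex multiplications per hardware line for each receive antenna, which is what makes the reused RDB pool come out at 128 multipliers.

```python
LINES = 16                  # hardware lines (one per PCFICH symbol)
REAL_MULTS_PER_COMPLEX = 4  # direct complex multiplication
SQUARE_MULTS_PER_LINE = 2   # real^2 and imag^2 in the RPB/PB
RDB_CYCLES = 2              # RDB multipliers are reused over two clock cycles
PB_CYCLES = 4               # RPB/PB pipeline depth in clock cycles


def superscalar_cost(tx_ports, rx_ports):
    """Return (multipliers, clock cycles per output) for one antenna configuration."""
    squaring = LINES * SQUARE_MULTS_PER_LINE
    if tx_ports == 1:
        # SISO/SIMO: no decoding block, one RPB set per receive antenna.
        return squaring * rx_ports, PB_CYCLES
    # MISO/MIMO: assumed 2 complex multiplications per line per receive antenna.
    complex_per_line = 2 * rx_ports
    rdb_total = LINES * complex_per_line * REAL_MULTS_PER_COMPLEX
    rdb_reused = rdb_total // RDB_CYCLES
    return rdb_reused + squaring, RDB_CYCLES + PB_CYCLES


COSTS = {
    "SISO": superscalar_cost(1, 1),
    "SIMO": superscalar_cost(1, 2),
    "MISO": superscalar_cost(2, 1),
    "MIMO": superscalar_cost(2, 2),
}
```

Under these assumptions the model gives 32, 64, 96, and 160 multipliers for SISO, SIMO, MISO, and MIMO, with an output every 4 clock cycles for the first two and every 6 clock cycles for the last two.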
6. Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it is suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
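The per-subcarrier time budget follows directly from the slot duration, the number of OFDM symbols per slot, and the number of subcarriers. The short sketch below reproduces that arithmetic; the function names are illustrative, and the delay values passed to `meets_budget` are placeholders rather than measured results.

```python
# Timing budget per subcarrier at 1.4 MHz bandwidth (illustrative sketch).
SLOT_DURATION_S = 0.5e-3       # duration of one slot
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS_PER_SYMBOL = 72


def time_per_subcarrier_ns():
    """Time available for detection at each subcarrier, in nanoseconds."""
    elements_per_slot = OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS_PER_SYMBOL
    return SLOT_DURATION_S / elements_per_slot * 1e9


def meets_budget(total_delay_ns):
    """True when a receiver delay fits inside the per-subcarrier budget."""
    return total_delay_ns <= time_per_subcarrier_ns()


print(round(time_per_subcarrier_ns(), 3))   # budget in ns
print(meets_budget(500.0), meets_budget(1500.0))  # placeholder delays
```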
A general architecture based on the folding method, which operates in all four configurations (SISO, SIMO, MISO, and MIMO), has also been developed. In this architecture
Table 3 Resource requirements of proposed architecture using folding method

Parameter                        Value
Multipliers                      58
Adders                           518
Minimum clock period (ns)        16.033
Total delay (ns)                 256.528
Speed (MHz)                      3.898
Total dynamic power (mW)         1019
Figure 20 Implemented device in FPGA editor
a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching, contributed by the clock (246mW), logic (670mW), and the block arithmetic (103mW).
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e". The power consumed includes both static power and dynamic power due to internal switching.
Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.
Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.
7 Conclusion
In this paper, low complexity, low resource single or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have any direct financial relation with any commercial entity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
In [2], receivers were designed for a 2 × 2 antenna system and for quadrature phase shift keying (QPSK) modulation and quadrature amplitude modulation (16-QAM and 64-QAM). Though the successive interference cancellation (SIC) receiver meets the timing requirements of the LTE system, it is complex, and the K-best list sphere detector (K-LSD) receiver has high latency. In [3], field programmable gate array (FPGA) and application specific integrated circuit (ASIC) implementations of receivers based on the linear minimum mean-square error (LMMSE), the K-LSD, the iterative successive interference cancellation (SIC) detector, and the iterative K-LSD algorithms are carried out for a spatial multiplexing based LTE-A system. The SIC algorithm is found to perform worse than the K-LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The ASIC receivers are designed to meet the decoding throughput requirements in LTE, and the K-LSD is found to be the most complex receiver, although it gives the best reliable data transmission throughput. It is shown that a receiver architecture which could be reconfigured to use a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver. FPGA implementation of a MIMO detector based on two typical sphere decoding algorithms, namely, the Viterbo-Boutros (VB) algorithm and the Schnorr-Euchner (SE) algorithm, is carried out in [4]. In this implementation method, three levels of parallelism are explored to improve the decoding rate: the concurrent execution of the channel matrix preprocessing on an embedded processor and the decoding functions on customized hardware modules, the parallel decoding of real/imaginary parts for complex constellation, and the concurrent execution of multiple steps during the closest lattice point search. The implementation of a low-complexity codebook searching engine is proposed to support both LTE and LTE-A operations [5]. In [6], VLSI implementation of a low-complexity multiple input multiple output (MIMO) symbol detector based on a novel MIMO detection algorithm called modified fixed-complexity soft-output (MFCSO) detection is presented. It includes a microcode-controlled channel preprocessing unit, separate channel memory, and a pipelined detection unit. A MATLAB-based downlink physical-layer simulator for LTE, intended only for research applications, is presented in [7]. In [8], maximum likelihood- (ML-) based receiver structures are developed for decoding the downlink control channels PCFICH and PHICH in the LTE wireless standard, and the performance of the receivers has been analyzed for various configurations. The analytical results were validated against computer simulations, but hardware implementation of the structures was not coded or synthesized. In [9], direct implementation of receive algorithms was carried out in FPGA for downlink control channels in LTE. However, most of these works either propose architectures for FPGA implementation or analyze the performance of various receiver structures in a generalized manner. The objective of this paper is to propose novel architectures for FPGA implementation of transmit and receive processing of the downlink PCFICH channel in the LTE-A standard in particular.
Table 1 CFI 32-bit block code

CFI   Code word ⟨b31, ..., b0⟩
1     01101101101101101101101101101101
2     10110110110110110110110110110110
3     11011011011011011011011011011011
4     00000000000000000000000000000000
1.1 Transmit and Receive Processing of PCFICH. In PCFICH, the control format indicator (CFI) contains a 32-bit code word that represents the value of CFI as 1, 2, 3, or 4. The CFI informs the UE about the number of OFDM symbols used for the transmission of PDCCH information in a subframe. The 32-bit code word corresponding to the value of CFI is scrambled and QPSK modulated. The resultant 16 QPSK complex symbols are mapped to the resource elements of the first OFDM symbol of every subframe, after layer mapping and precoding to obtain transmit diversity when two or more antenna ports are used at eNodeB [10]. The 32-bit code words for the four possible values of CFI are given in Table 1. A general block diagram of the transmitter and receiver processing of PCFICH is shown in Figure 1.
The OFDM signal is transmitted through a frequency selective fading channel. It is assumed that the number of receive antenna ports at the UE is K. At each receive antenna port of the UE, resource-element demapping follows the cyclic prefix removal and fast Fourier transformation (FFT). The 16 × 1 receive signal vector at each antenna port is equalized in the frequency domain at each subcarrier using the corresponding 16 × 1 channel frequency response vector. The outputs of the frequency domain equalizer from each antenna port are summed up. The resultant 16 × 1 complex vector is applied to the maximum likelihood (ML) detector for detecting the CFI value. The objective of this paper is to synthesize and implement the receiver architecture for PCFICH.
The paper is structured as follows. Section 2 explains the system model and basic implementation architectures for single input single output (SISO) and single input multiple output (SIMO) configurations. The system model and basic implementation architecture for multiple input single output (MISO) and multiple input multiple output (MIMO) configurations are described in Sections 3 and 4, respectively. The proposed implementation architectures using folding and superscalar methods are given in Section 5 for SISO, SIMO, MISO, and MIMO configurations. Section 6 analyzes the performance of the proposed architectures, and Section 7 concludes the paper with remarks on future work.
2 System Model and Implementation Architecture for SISO and SIMO Configurations
The received signal model for SISO configuration of PCFICHis given by
\[ \mathbf{y} = \mathbf{h} \circ \mathbf{d}^{(m)} + \mathbf{w} \tag{1} \]
Figure 1 Block diagram of transmitter and receiver processing
Figure 2 Basic architecture for SISO configuration
where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is a 16 × 1 received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is a 16 × 1 channel frequency response vector, $\mathbf{d}^{(m)} = [d_0^{(m)}, d_1^{(m)}, \ldots, d_{15}^{(m)}]^T$ is a 16 × 1 complex QPSK symbol vector corresponding to the CFI value $m$ from the set {1, 2, 3, 4}, "∘" represents element by element multiplication, and $\mathbf{w}$ is a 16 × 1 additive white noise vector whose elements are zero mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as
\[ \mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^2 \tag{2} \]
Figure 2 shows the basic architecture for estimating CFI using (2) in SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as input to the four receiver processing blocks (RPB), along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram for RPB CFI-1 is shown in Figure 3. It computes the expression $\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)} \|^2$, assuming CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant (16 × 1) vector is subtracted from the (16 × 1) received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output of the RPB.
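The detection rule in (2) can be illustrated in software. The following is a minimal sketch rather than the authors' hardware design: it takes the code words from Table 1, omits the scrambling step, and assumes a QPSK mapping of two consecutive bits per symbol with unit average power. The function names `qpsk`, `rpb`, and `detect_cfi` are illustrative.

```python
import math

# Code words copied from Table 1, in the order they are printed there.
CODE_WORDS = {
    1: "01101101101101101101101101101101",
    2: "10110110110110110110110110110110",
    3: "11011011011011011011011011011011",
    4: "00000000000000000000000000000000",
}


def qpsk(bits):
    """Map a bit string to QPSK symbols, two bits per symbol (assumed mapping)."""
    scale = 1 / math.sqrt(2)
    return [
        complex(scale * (1 - 2 * int(bits[i])), scale * (1 - 2 * int(bits[i + 1])))
        for i in range(0, len(bits), 2)
    ]


# Precomputed data vectors d^(m), 16 symbols each (scrambling omitted).
D = {m: qpsk(word) for m, word in CODE_WORDS.items()}


def rpb(y, h, d):
    """Receiver processing block: squared norm of y - h o d."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi(y, h):
    """CFI detector: index of the RPB with the minimum output."""
    metrics = {m: rpb(y, h, d) for m, d in D.items()}
    return min(metrics, key=metrics.get)


# Noise-free usage example with an arbitrary known channel.
h = [complex(0.5 + 0.1 * i, -0.2) for i in range(16)]
y = [hi * di for hi, di in zip(h, D[3])]
print(detect_cfi(y, h))
```

In the noise-free example the metric of the transmitted code word is exactly zero, so the detector returns the transmitted CFI.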
The inputs to the CFI detector are the 16-bit outputs of the RPBs, $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram of the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1 input $r$
where "$K$" represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the 16 × 1 noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]
\[ \mathrm{CFI} = \arg\min_{m = 1, 2, 3, 4} \sum_{k=1}^{K} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^2 \tag{6} \]
The basic architecture for estimating CFI using (6) in 1 × 2 SIMO configuration, shown in Figure 5, is similar to the basic architecture of SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$_m^{(k)}$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r_m^{(0)}$, and at the 1st receive antenna, $r_m^{(1)}$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
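The per-antenna combining in (6) amounts to adding the RPB outputs of all receive antennas before taking the minimum. A small sketch follows; the candidate vectors are shortened to two symbols for readability and are purely hypothetical, as are the function names.

```python
def rpb(y, h, d):
    """Receiver processing block: squared norm of y - h o d for one antenna."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi_simo(ys, hs, candidates):
    """ys, hs: one vector per receive antenna; candidates: {m: d^(m)}."""
    combined = {
        m: sum(rpb(y, h, d) for y, h in zip(ys, hs))  # r_m = sum over antennas
        for m, d in candidates.items()
    }
    return min(combined, key=combined.get)


# Hypothetical two-symbol candidates standing in for the 16 x 1 vectors d^(m).
candidates = {
    1: [1 + 1j, 1 - 1j],
    2: [-1 + 1j, 1 + 1j],
    3: [-1 - 1j, -1 + 1j],
    4: [1 + 1j, 1 + 1j],
}
hs = [[0.9 + 0.2j, 0.4 - 0.7j], [0.1 + 1.1j, -0.6 + 0.3j]]  # two receive antennas
ys = [[hi * di for hi, di in zip(h, candidates[2])] for h in hs]
print(detect_cfi_simo(ys, hs, candidates))
```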
3 System Model and Implementation Architecture for MISO Configuration
In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at eNodeB. The 16 × 1 complex symbol vector output of the modulation mapper is applied to the layer mapper. The 8 × 1 symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using
Figure 7 Proposed MISO receiver architecture for PCFICH
Figure 12 Multiplicands rearrangement for a single complex multiplication block
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In 2 × 1 MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as
\[ \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{8} \]
where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as
\[ \mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i} \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{9} \]
where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by
\[ \mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i} \mathbf{y}_i \tag{10} \]
where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as
\[ \begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix} \begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{11} \]
The elements of the decoder output are calculated as
\[ \begin{aligned} z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, \\ z_{i+1}^{*} &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{12} \]
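A quick numerical check of (8) and (12) can be sketched as follows. It assumes, for the check only, a noise-free link and a channel that is identical on the two adjacent subcarriers; under that assumption each decoder output should equal the transmitted symbol scaled by the sum of the squared channel magnitudes. The function names are illustrative and not taken from the paper.

```python
def sfbc_receive_pair(d_i, d_i1, h0, h1):
    """Noise-free (y_i, conj(y_{i+1})) following (8).

    h0, h1: tuples (value at subcarrier i, value at subcarrier i+1)
    for transmit antenna ports 0 and 1.
    """
    y_i = h0[0] * d_i - h1[0] * d_i1.conjugate()
    y_i1_conj = h1[1].conjugate() * d_i + h0[1].conjugate() * d_i1.conjugate()
    return y_i, y_i1_conj


def sfbc_decode_pair(y_i, y_i1_conj, h0, h1):
    """Decoder outputs (z_i, conj(z_{i+1})) following (12)."""
    z_i = h0[0].conjugate() * y_i + h1[1] * y_i1_conj
    z_i1_conj = -h1[0].conjugate() * y_i + h0[1] * y_i1_conj
    return z_i, z_i1_conj


# Assumed flat channel over the subcarrier pair (same value at i and i+1).
a, b = 0.8 - 0.3j, -0.2 + 0.6j
h0, h1 = (a, a), (b, b)
d_i, d_i1 = (1 + 1j) / 2 ** 0.5, (1 - 1j) / 2 ** 0.5

y_i, y_i1_conj = sfbc_receive_pair(d_i, d_i1, h0, h1)
z_i, z_i1_conj = sfbc_decode_pair(y_i, y_i1_conj, h0, h1)
gain = abs(a) ** 2 + abs(b) ** 2
print(abs(z_i - gain * d_i), abs(z_i1_conj - gain * d_i1.conjugate()))
```

Both printed residuals should be at the level of floating-point rounding, which shows that the cross terms cancel when the channel does not change between the two subcarriers.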
The PCFICH receive architecture for 2 × 1 MISO configuration is shown in Figure 7. Receiver decoding block (RDB)
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively. The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.
4 System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by
\[ \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix} \begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix} + \begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{13} \]
where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form it is written as
\[ \mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i} \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{14} \]
where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by
\[ \mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i} \mathbf{y}_i \tag{15} \]
where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as
\[ \begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix} = \begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{16} \]
The decoder outputs are given by
\[ \begin{aligned} z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, \\ z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{17} \]
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.
The PCFICH receiver architecture of 2 × 2 MIMO configuration is shown in Figure 10.
The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of PBs and
Figure 15 Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$)
the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared to determine the minimum value by the CFI detector shown in Figure 4.
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the resources in the FPGA effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.
5.1 Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^{H}_{\mathrm{eff}}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, optimum rearrangement of the terms is carried out to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The output real part ($e$) and imaginary part ($f$) terms are given by
\[ e = \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \qquad f = \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\} \tag{18} \]
It requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as
\[ \begin{aligned} e &= \left[ \mathrm{Re}\{y\} - \mathrm{Im}\{y\} \right] \left[ \mathrm{Re}\{h\} + \mathrm{Im}\{h\} \right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\} \end{aligned} \tag{19} \]
Since the terms $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), they are computed once and reused, so (19) requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
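The rearrangement can be checked numerically. The sketch below uses the grouping $(\mathrm{Re}\{y\} - \mathrm{Im}\{y\})(\mathrm{Re}\{h\} + \mathrm{Im}\{h\})$, which is the grouping for which the three products reproduce (18); the function name is illustrative, and the code models only the arithmetic, not the hardware of Figure 12.

```python
def complex_mult_3(re_h, im_h, re_y, im_y):
    """Complex product h*y with three real multiplications and five additions."""
    p1 = (re_y - im_y) * (re_h + im_h)  # multiplication 1 (two additions feed it)
    p2 = re_y * im_h                    # multiplication 2, shared by e and f
    p3 = im_y * re_h                    # multiplication 3, shared by e and f
    e = p1 - p2 + p3                    # real part
    f = p2 + p3                         # imaginary part
    return e, f


# Compare against the direct four-multiplication form of (18).
h, y = complex(3, 4), complex(1, 2)
e, f = complex_mult_3(h.real, h.imag, y.real, y.imag)
print((e, f), h * y)
```

For the integer-valued example above, both forms give a real part of -5 and an imaginary part of 10.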
5.2 Proposed Architecture Using Folding Method. Folding architecture systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed to a single functional unit [12]. It is used for synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is processed by the two multipliers to get the value of $z_i$. Four switches operating at system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to get the output, the resources are minimized in this method.
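The time-multiplexing idea can be modelled behaviourally. The following sketch is a software analogy of the folded RPB, assuming one difference value is presented per clock cycle; the class and method names are hypothetical and do not correspond to the RTL of the implemented design.

```python
class FoldedRPB:
    """One hardware line (two multipliers, two adders) reused every clock cycle."""

    def __init__(self):
        self.acc = 0.0   # accumulator register fed by the second adder
        self.cycles = 0  # number of clock cycles consumed so far

    def clock(self, y_i, h_i, d_i):
        diff = y_i - h_i * d_i          # difference held in the input registers
        re_sq = diff.real * diff.real   # multiplier 1: real part squared
        im_sq = diff.imag * diff.imag   # multiplier 2: imaginary part squared
        self.acc += re_sq + im_sq       # first adder, then accumulating adder
        self.cycles += 1
        return self.acc


def run_folded(y, h, d):
    """Feed the 16 symbol positions sequentially through one folded unit."""
    unit = FoldedRPB()
    for y_i, h_i, d_i in zip(y, h, d):
        unit.clock(y_i, h_i, d_i)
    return unit.acc, unit.cycles


# Example with arbitrary values for a 16-symbol vector.
h = [complex(0.3 * i, 1.0 - 0.1 * i) for i in range(16)]
d = [complex(1, -1)] * 16
y = [hi * di + complex(0.05, -0.02) for hi, di in zip(h, d)]
r1, cycles = run_folded(y, h, d)
print(r1, cycles)
```

The accumulated value matches the direct sum over all 16 hardware lines, while only two multipliers are active in any cycle; the result is available after 16 cycles, which corresponds to detection at the 17th clock cycle described in the text.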
The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.
5.3 Proposed Architecture Using Superscalar Method. The superscalar approach is another low resource utilizing VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers.
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
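The multiplier counts quoted for the superscalar method can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and constants are hypothetical, and the figure of 4 complex multiplications per hardware line for MIMO is an assumption inferred from the four products in each decoder output of (17), not a number stated in the text.

```python
# Illustrative arithmetic only; names and structure are not from the paper.
HARDWARE_LINES = 16               # hardware lines in the RDB
REAL_MULTS_PER_COMPLEX_MULT = 4   # direct complex multiplication
RDB_CLOCK_CYCLES = 2              # RDB multipliers are reused over 2 cycles
PB_MULTIPLIERS = 32               # multipliers needed by the PB

def superscalar_multipliers(complex_mults_per_line):
    """Return (RDB multipliers, total multipliers) for one configuration."""
    real_mults = (HARDWARE_LINES * complex_mults_per_line
                  * REAL_MULTS_PER_COMPLEX_MULT)
    rdb = real_mults // RDB_CLOCK_CYCLES
    return rdb, rdb + PB_MULTIPLIERS

miso = superscalar_multipliers(2)   # 2 complex multiplications per line
mimo = superscalar_multipliers(4)   # assumed 4 per line, see lead-in
print("MISO:", miso)
print("MIMO:", mimo)
```

Under these assumptions the totals come out as 96 multipliers for MISO and 160 for MIMO, matching the counts in the paragraph above.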
6. Results and Discussion
The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on folding and superscalar processing methods require fewer resource elements.

[Schematic labels: selection input Diversity; enable wires e; blocks Div0 to Div3 (SISO, SIMO, MISO, MIMO); outputs Count CFI1 to Count CFI4.]

Figure 19: Resource utilization graph for generalized architecture.
In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it still meets the real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
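The timing constraint quoted above can be reproduced directly from the slot structure. The following sketch is a worked check under stated assumptions: the function names are hypothetical, and the clock period is read from Table 3 as 16.033 ns, which is an assumption about where the decimal point lost in extraction belongs.

```python
# Worked check of the per-subcarrier time budget; names are illustrative.
SLOT_DURATION_S = 0.5e-3     # one slot of the 10 ms radio frame
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72             # 1.4 MHz bandwidth

def per_subcarrier_budget_ns():
    """Time available per subcarrier within one OFDM symbol, in ns."""
    return SLOT_DURATION_S / (OFDM_SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9

def folded_delay_ns(clock_period_ns, cycles=16):
    """Delay of the folded datapath over its sequential clock cycles."""
    return clock_period_ns * cycles

budget_ns = per_subcarrier_budget_ns()
delay_ns = folded_delay_ns(16.033)   # assumed Table 3 clock period
print(f"budget = {budget_ns:.3f} ns, folded delay = {delay_ns:.3f} ns")
print("within constraint:", delay_ns < budget_ns)
```

With these inputs the budget evaluates to about 992.063 ns, and 16 cycles at the assumed clock period give 256.528 ns, which is consistent with the total delay listed in Table 3.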
A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246mW), logic (670mW), and the block arithmetic (103mW).

Table 3: Resource requirements of proposed architecture using folding method.
Parameter                        Value
Multipliers                      58
Adders                           518
Minimum clock period (ns)        16.033
Total delay (ns)                 256.528
Speed (MHz)                      3.898
Total dynamic power (mW)         1019

Figure 20: Implemented device in FPGA editor.
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.
7. Conclusion
In this paper, low-complexity, low-resource single- or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211, Version 11.0.0, Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 1: Block diagram of transmitter and receiver processing. Panels: baseband transmit processing at base station eNodeB (layer mapping, precoding, OFDM symbol generator per transmit antenna port), transmission channels, and receive processing at user equipment (resource element demapper, frequency domain equalization of each subcarrier, and comparison with the $16 \times 1$ precomputed data vectors for CFI 1, 2, 3, 4).
Figure 2: Basic architecture for SISO configuration. Blocks: RPB CFI-1 to RPB CFI-4, fed by the $16 \times 1$ received signal vector $\mathbf{y}$, the $16 \times 1$ channel frequency response vector $\mathbf{h}$, and the precomputed data vectors $\mathbf{d}^{(1)}$ to $\mathbf{d}^{(4)}$, with outputs $r_1$ to $r_4$ driving the CFI detector.
where $\mathbf{y} = [y_0, y_1, \ldots, y_{15}]^T$ is a $16 \times 1$ received signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{15}]^T$ is a $16 \times 1$ channel frequency response vector, $\mathbf{d}^{(m)} = [d^{(m)}_0, d^{(m)}_1, \ldots, d^{(m)}_{15}]^T$ is a $16 \times 1$ complex QPSK symbol vector corresponding to the CFI value $m$ from the set $\{1, 2, 3, 4\}$, "$\circ$" represents element-by-element multiplication, and $\mathbf{w}$ is a $16 \times 1$ additive white noise vector whose elements are zero-mean Gaussian random numbers with unit variance. The objective is to detect the value of CFI from the received signal vector $\mathbf{y}$, assuming the channel frequency response vector $\mathbf{h}$ to be known. Using the maximum likelihood (ML) principle, CFI is detected as

\[
\mathrm{CFI} = \arg\min_{m} \left\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(m)} \right\|^2. \tag{2}
\]
Figure 2 shows the basic architecture for estimating CFI using (2) in the SISO configuration. The received signal vector $\mathbf{y}$ and the channel frequency response vector $\mathbf{h}$ are provided as input to the four receiver processing blocks (RPB), along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The internal diagram for RPB CFI-1 is shown in Figure 3. It computes the expression $\| \mathbf{y} - \mathbf{h} \circ \mathbf{d}^{(1)} \|^2$, assuming CFI = 1. In RPB-$m$, the precomputed data vector $\mathbf{d}^{(m)}$ is multiplied element by element with the channel frequency response vector. The resultant ($16 \times 1$) vector is subtracted from the ($16 \times 1$) received signal vector $\mathbf{y}$. The sum of the squared magnitudes of the elements of the resultant vector is the output of the RPB.
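The RPB computation and the minimum search of (2) can be sketched in a few lines of Python. This is a minimal behavioural sketch, not the hardware design: the function names are hypothetical, and the candidate vectors built by `candidate` are illustrative QPSK patterns chosen only to be distinct, not the PCFICH codewords defined by the standard.

```python
import math

# Hypothetical unit-energy QPSK alphabet used only for illustration.
QPSK = tuple(c / math.sqrt(2) for c in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j))

def candidate(m, n=16):
    """Illustrative stand-in for the precomputed data vector d(m)."""
    return [QPSK[(i * m) % 4] for i in range(n)]

def rpb(y, h, d):
    """RPB output: sum of squared magnitudes of the elements of y - h∘d."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))

def detect_cfi(y, h, candidates):
    """Return (CFI, metrics); CFI is the index of the smallest RPB output."""
    metrics = [rpb(y, h, d) for d in candidates]
    return 1 + metrics.index(min(metrics)), metrics

candidates = [candidate(m) for m in (1, 2, 3, 4)]
h = [complex(1 + 0.1 * i, 0.5 - 0.05 * i) for i in range(16)]
y = [hi * di for hi, di in zip(h, candidates[2])]  # noise-free, CFI = 3 sent
cfi, metrics = detect_cfi(y, h, candidates)
print("detected CFI:", cfi)
```

In the noise-free case the metric of the transmitted candidate is exactly zero, so the detector returns the transmitted CFI; with noise, the same search returns the candidate closest to the received vector.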
The inputs to the CFI detector are the 16-bit outputs of the RPBs, $r_1$, $r_2$, $r_3$, and $r_4$. The CFI detector determines which RPB output has the minimum value. The internal diagram for the CFI detector circuit, which has 4 comparator modules (CM), is shown in Figure 4. In CM-1 input $r$
where $K$ represents the number of receive antennas at the UE, $\mathbf{h}^{(k)}$ is the $16 \times 1$ channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_k$ is the $16 \times 1$ noise vector at the $k$th receive antenna. Now the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]

\[
\mathrm{CFI} = \arg\min_{m \in \{1, 2, 3, 4\}} \sum_{k=1}^{K} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^2. \tag{6}
\]
The basic architecture for estimating CFI using (6) in the $1 \times 2$ SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$^{(k)}_m$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r^{(0)}_m$, and the 1st receive antenna, $r^{(1)}_m$, are added to get the $m$th input $r_m$ of the CFI detector circuit.
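The per-antenna summation in (6) can be illustrated by extending the single-antenna metric with a sum over receive antennas. The sketch below is an assumption-laden illustration: the function names, the candidate vectors, and the channel values are all hypothetical and serve only to exercise the combining step.

```python
import math

# Hypothetical unit-energy QPSK alphabet; not the standard's codewords.
QPSK = tuple(c / math.sqrt(2) for c in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j))

def candidate(m, n=16):
    """Illustrative stand-in for the precomputed data vector d(m)."""
    return [QPSK[(i * m) % 4] for i in range(n)]

def rpb(y, h, d):
    """Per-antenna RPB output: squared norm of y - h∘d."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))

def detect_cfi_simo(ys, hs, candidates):
    """Add the RPB outputs of all receive antennas, then pick the minimum."""
    metrics = [sum(rpb(y, h, d) for y, h in zip(ys, hs)) for d in candidates]
    return 1 + metrics.index(min(metrics)), metrics

candidates = [candidate(m) for m in (1, 2, 3, 4)]
hs = [
    [complex(1 + 0.1 * i, 0.5) for i in range(16)],          # antenna 0
    [complex(0.3, -0.8 + 0.05 * i) for i in range(16)],      # antenna 1
]
# Noise-free reception of the CFI = 2 candidate on both antennas.
ys = [[hi * di for hi, di in zip(h, candidates[1])] for h in hs]
cfi, metrics = detect_cfi_simo(ys, hs, candidates)
print("detected CFI:", cfi)
```

Each inner call to `rpb` plays the role of one RPB-CFI block at one antenna, and the outer sum corresponds to the adder that forms $r_m$ before the CFI detector.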
3. System Model and Implementation Architecture for MISO Configuration
In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at the eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at the eNodeB. The $16 \times 1$ complex symbol vector output of the modulation mapper is applied to the layer mapper. The $8 \times 1$ symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using
Figure 7: Proposed MISO receiver architecture for PCFICH. Blocks: RDB-1 to RDB-4 and PB-1 to PB-4, fed by the $16 \times 1$ received signal vector $\mathbf{y}$, the $16 \times 1$ channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$, and the $16 \times 1$ precomputed signal vectors $\mathbf{s}_1$ to $\mathbf{s}_4$, with outputs $r_1$ to $r_4$ driving the CFI detector.

Figure 12: Multiplicands rearrangement for a single complex multiplication block.
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the $2 \times 1$ MISO configuration, the receive signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

\[
\begin{bmatrix} y_i \\ y^*_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{(0)}_i & -h^{(1)}_i \\ h^{*(1)}_{i+1} & h^{*(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} d_i \\ d^*_{i+1} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n^*_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{8}
\]
where $h^{(0)}_i$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

\[
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{9}
\]
where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

\[
\mathbf{z}_i = \mathbf{H}^H_{\mathrm{eff},i}\, \mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{10}
\]

where $\mathbf{H}^H_{\mathrm{eff},i}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as

\[
\begin{bmatrix} z_i \\ z^*_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{*(0)}_i & h^{(1)}_{i+1} \\ -h^{*(1)}_i & h^{(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} y_i \\ y^*_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{11}
\]
The elements of the decoder output are calculated as

\[
\begin{aligned}
z_i &= h^{*(0)}_i\, y_i + h^{(1)}_{i+1}\, y^*_{i+1}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z^*_{i+1} &= -h^{*(1)}_i\, y_i + h^{(0)}_{i+1}\, y^*_{i+1}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned} \tag{12}
\]
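The decoding step in (12) can be checked numerically for one subcarrier pair. The sketch below is illustrative only: the function names are hypothetical, the received samples are generated noise-free from (8), and the channel is assumed to be constant across the two adjacent subcarriers, which is an assumption made here so that the cross terms cancel.

```python
import math

def sfbc_receive_pair(d_i, d_i1, h0_i, h1_i, h0_i1, h1_i1):
    """Noise-free received samples (y_i, y_{i+1}) following (8)."""
    y_i = h0_i * d_i - h1_i * d_i1.conjugate()
    y_i1_conj = h1_i1.conjugate() * d_i + h0_i1.conjugate() * d_i1.conjugate()
    return y_i, y_i1_conj.conjugate()

def rdb_pair(y_i, y_i1, h0_i, h1_i, h0_i1, h1_i1):
    """Decoder outputs (z_i, z_{i+1}) following (12)."""
    z_i = h0_i.conjugate() * y_i + h1_i1 * y_i1.conjugate()
    z_i1_conj = -h1_i.conjugate() * y_i + h0_i1 * y_i1.conjugate()
    return z_i, z_i1_conj.conjugate()

# Illustrative values; the channel is assumed flat over the subcarrier pair.
h0 = complex(0.8, 0.3)    # antenna port 0 to receive antenna
h1 = complex(-0.2, 0.6)   # antenna port 1 to receive antenna
d_i = (1 + 1j) / math.sqrt(2)
d_i1 = (-1 + 1j) / math.sqrt(2)

y_i, y_i1 = sfbc_receive_pair(d_i, d_i1, h0, h1, h0, h1)
z_i, z_i1 = rdb_pair(y_i, y_i1, h0, h1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2
print("z_i / gain      =", z_i / gain)
print("z_{i+1} / gain  =", z_i1 / gain)
```

Under the flat-channel assumption, each decoder output equals the transmitted symbol scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$, which is the transmit diversity gain that the PB then exploits when comparing against the precomputed vectors.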
The PCFICH receiver architecture for the $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}^T_0, \mathbf{z}^T_2, \ldots, \mathbf{z}^T_{14}]^T$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.
4. System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

\[
\begin{bmatrix} y^{(0)}_i \\ y^{(0)}_{i+1} \\ y^{(1)*}_i \\ y^{(1)*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h^*_{01} & -h^*_{00} \\ h^*_{11} & -h^*_{10} \end{bmatrix}
\begin{bmatrix} d_i \\ d^*_{i+1} \end{bmatrix}
+
\begin{bmatrix} n^{(0)}_i \\ n^{(0)*}_{i+1} \\ n^{(1)}_i \\ n^{(1)*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{13}
\]
where $h_{ab}$ represents the channel frequency response vector between the $b$th transmit antenna and the $a$th receive antenna, and $n^{(j)}_i$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

\[
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\, \mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{14}
\]
where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

\[
\mathbf{z}_i = \mathbf{H}^H_{\mathrm{eff},i}\, \mathbf{y}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \tag{15}
\]

where $\mathbf{H}^H_{\mathrm{eff},i}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as

\[
\begin{bmatrix} z_i \\ z^*_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^*_{00} & -h^*_{10} & h_{01} & h_{11} \\ h^*_{01} & -h^*_{11} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y^{(0)}_i \\ y^{(0)}_{i+1} \\ y^{(1)*}_i \\ y^{(1)*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14. \tag{16}
\]
The decoder outputs are given by

\[
\begin{aligned}
z_i &= h^*_{00}\, y^{(0)}_i - h^*_{10}\, y^{(0)}_{i+1} + h_{01}\, y^{(1)*}_i + h_{11}\, y^{(1)*}_{i+1}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z^*_{i+1} &= h^*_{01}\, y^{(0)}_i - h^*_{11}\, y^{(0)}_{i+1} - h_{00}\, y^{(1)*}_i - h_{10}\, y^{(1)*}_{i+1}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned} \tag{17}
\]
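The arithmetic of one decoding step in (17) can be written out as a small function. This is a sketch of the computation only, with a hypothetical function name and arbitrary input values chosen so the result can be followed by hand; it makes no claim about how the RDBM is wired in hardware.

```python
def rdbm_pair(y0_i, y0_i1, y1_i, y1_i1, h00, h01, h10, h11):
    """Return (z_i, conj(z_{i+1})) for one subcarrier pair, following (17).

    y0_* and y1_* are the samples at receive antennas 0 and 1; each output
    uses four complex multiplications, as in the expanded equations.
    """
    z_i = (h00.conjugate() * y0_i - h10.conjugate() * y0_i1
           + h01 * y1_i.conjugate() + h11 * y1_i1.conjugate())
    z_i1_conj = (h01.conjugate() * y0_i - h11.conjugate() * y0_i1
                 - h00 * y1_i.conjugate() - h10 * y1_i1.conjugate())
    return z_i, z_i1_conj

# Arbitrary hand-checkable inputs: all channel coefficients equal to one.
one = complex(1, 0)
z_i, z_i1_conj = rdbm_pair(
    complex(1, 0), complex(2, 0), complex(0, 3), complex(4, 0),
    one, one, one, one,
)
print(z_i, z_i1_conj)
```

With unit channel coefficients the first output reduces to $1 - 2 - 3j + 4$ and the second to $1 - 2 + 3j - 4$, which makes the sign pattern of (17) easy to verify by hand.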
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.
The PCFICH receiver architecture of the $2 \times 2$ MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_1$, $\mathbf{s}^{(0)}_2$, $\mathbf{s}^{(0)}_3$, and $\mathbf{s}^{(0)}_4$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_1$, $\mathbf{s}^{(1)}_2$, $\mathbf{s}^{(1)}_3$, and $\mathbf{s}^{(1)}_4$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_i$ and $\mathbf{y}^{(1)}_i$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}^T_0, \mathbf{z}^T_2, \ldots, \mathbf{z}^T_{14}]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}^{(0)}_1$ and $\mathbf{y}^{(1)}_1$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared to determine the minimum value by the CFI detector shown in Figure 4.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).
5. PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly from the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.
5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^H_{\mathrm{eff}}$ with the received signal vector. This increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The output real part ($e$) and imaginary part ($f$) are given by

\[
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned} \tag{18}
\]

It requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

\[
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} + \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} - \mathrm{Im}\{h\}\right] + \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned} \tag{19}
\]

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), it requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
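The three-multiplication rearrangement can be verified against ordinary complex multiplication. The sketch below is one standard way of sharing the two cross products between the real and imaginary parts; the function name is hypothetical, and the code is a software check of the arithmetic rather than a description of the circuit in Figure 12.

```python
def complex_mult_3(h, y):
    """Multiply h and y using three real multiplications and five additions."""
    rh, ih, ry, iy = h.real, h.imag, y.real, y.imag
    p1 = ry * ih                   # multiplication 1, shared by e and f
    p2 = iy * rh                   # multiplication 2, shared by e and f
    p3 = (ry + iy) * (rh - ih)     # multiplication 3 (two additions inside)
    e = p3 + p1 - p2               # real part: rh*ry - ih*iy
    f = p1 + p2                    # imaginary part: rh*iy + ih*ry
    return complex(e, f)

h = complex(3, 4)
y = complex(1, 2)
print(complex_mult_3(h, y), h * y)
```

Expanding `p3 + p1 - p2` gives `rh*ry - ih*iy`, so the result matches the four-multiplication form in (18) while saving one multiplier per complex product.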
5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any one time, one resultant signal pair is processed using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although it takes more clock cycles to get the output, the resources are minimized in this method.
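The time-multiplexing described for the folded RPB can be mimicked in software by reusing one pair of multipliers across 16 iterations. The sketch below is a behavioural model under stated assumptions: the function name, the cycle counter, and the input values are hypothetical, and one loop iteration is taken to stand for one clock cycle.

```python
def folded_rpb(y, h, d):
    """Behavioural model of a folded RPB: two multipliers reused 16 times.

    Returns (accumulated output, number of clock cycles modelled).
    """
    # Difference vector held in registers before the folded unit runs.
    diff = [yi - hi * di for yi, hi, di in zip(y, h, d)]
    acc = 0.0     # accumulation register behind the second adder
    cycles = 0
    for e in diff:                    # one hardware line per clock cycle
        re_sq = e.real * e.real       # multiplier 1: real part squared
        im_sq = e.imag * e.imag       # multiplier 2: imaginary part squared
        z = re_sq + im_sq             # first adder
        acc += z                      # second adder accumulates z_0..z_15
        cycles += 1
    return acc, cycles

# Arbitrary illustrative inputs giving integer-valued differences.
y = [complex(i, 1) for i in range(16)]
h = [complex(1, 0)] * 16
d = [complex(1, 0)] * 16
r1, cycles = folded_rpb(y, h, d)
print(r1, cycles)
```

The model produces the same sum as 16 parallel hardware lines would, but with two multipliers instead of 32, which is the trade of area for clock cycles that the folding method makes.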
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "e". Power consumed includes both static power and dynamic power due to internal switching.
Figure 19 shows the resource utilization graph, which shows the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.
Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.
7. Conclusion
In this paper, low complexity, low resource single or multi-antenna CFI detection at the receiver has been proposed and analyzed using ModelSim simulation and implementation in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area optimized hardware implementation of receiver structures for PCFICH. In future, a total hardware accommodating all the physical downlink control channels of the 3GPP-LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation," 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
where "$K$" represents the number of receive antennas at UE, $\mathbf{h}^{(k)}$ is the 16 × 1 channel frequency response vector between the transmit antenna and the $k$th receive antenna, and $\mathbf{w}_{k}$ is the 16 × 1 noise vector at the $k$th receive antenna. Now, the objective is to detect the value of CFI from the received signal vectors at each receive antenna, assuming that the channel frequency response vectors at each receive antenna are known. Maximal ratio combining is carried out at the receiver. Using the maximum likelihood (ML) principle, CFI is estimated as [9]
$$\mathrm{CFI} = \arg\min_{m=1,2,3,4} \sum_{k=1}^{K} \left\| \mathbf{y}^{(k)} - \left( \mathbf{h}^{(k)} \circ \mathbf{d}^{(m)} \right) \right\|^{2} \tag{6}$$
The basic architecture for estimating CFI using (6) in the 1 × 2 SIMO configuration, shown in Figure 5, is similar to the basic architecture of the SISO configuration. The received signal vector $\mathbf{y}^{(k)}$ and the channel frequency response vector $\mathbf{h}^{(k)}$ are provided as input to the four receiver processing blocks (RPB-CFI$^{(k)}_{m}$) at the $k$th receive antenna, along with the precomputed data vectors $\mathbf{d}^{(1)}$, $\mathbf{d}^{(2)}$, $\mathbf{d}^{(3)}$, and $\mathbf{d}^{(4)}$. The outputs from the $m$th RPB at the 0th receive antenna, $r^{(0)}_{m}$, and at the 1st receive antenna, $r^{(1)}_{m}$, are added to get the $m$th input $r_{m}$ of the CFI detector circuit.
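The decision rule in (6) can be illustrated with a short, noise-free Python sketch for the 1 × 2 SIMO case. Several things here are assumptions made for illustration only: the candidate data vectors are built from a repeated 3-bit pattern per CFI value (all zeros for CFI = 4) with QPSK mapping and without scrambling, the channel gains are made-up numbers, and the function names `cfi_symbols`, `rpb`, and `detect_cfi` are hypothetical.

```python
import math

# Assumed 32-bit CFI codewords: a 3-bit pattern repeated (CFI 4 is all zeros).
PATTERNS = {1: "011", 2: "101", 3: "110", 4: "0"}


def cfi_symbols(cfi: int) -> list[complex]:
    """Map an assumed CFI codeword to 16 QPSK symbols (no scrambling)."""
    pattern = PATTERNS[cfi]
    bits = [int(pattern[n % len(pattern)]) for n in range(32)]
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * bits[2 * n], 1 - 2 * bits[2 * n + 1]) * scale
            for n in range(16)]


def rpb(y, h, d):
    """Receiver processing block: squared distance ||y - h o d||^2."""
    return sum(abs(yi - hi * di) ** 2 for yi, hi, di in zip(y, h, d))


def detect_cfi(ys, hs):
    """ys and hs hold one 16-element list per receive antenna."""
    metrics = {m: sum(rpb(y, h, cfi_symbols(m)) for y, h in zip(ys, hs))
               for m in (1, 2, 3, 4)}
    return min(metrics, key=metrics.get), metrics


# Noise-free 1 x 2 SIMO example with arbitrary channel gains.
h0 = [complex(0.8, 0.3)] * 16
h1 = [complex(-0.2, 0.9)] * 16
tx = cfi_symbols(2)
ys = [[h * d for h, d in zip(hk, tx)] for hk in (h0, h1)]
detected, metrics = detect_cfi(ys, [h0, h1])
print(detected)
```

Each call to `rpb` plays the role of one receiver processing block, and summing the per-antenna outputs before taking the minimum corresponds to adding $r^{(0)}_{m}$ and $r^{(1)}_{m}$ ahead of the CFI detector.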
3. System Model and Implementation Architecture for MISO Configuration
In MISO and MIMO configurations, space frequency block code (SFBC) based layer mapping and precoding are carried out to obtain transmit diversity when two or more antenna ports are used at eNodeB, as per the 3GPP LTE wireless standard [1, 11]. It is assumed that 2 antenna ports are used at eNodeB. The 16 × 1 complex symbol vector output of the modulation mapper is applied to the layer mapper. The 8 × 1 symbol vectors at layer 0 and layer 1 are given by $[d_0, d_2, d_4, d_6, d_8, d_{10}, d_{12}, d_{14}]$ and $[d_1, d_3, d_5, d_7, d_9, d_{11}, d_{13}, d_{15}]$. The precoding is carried out using (7).
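Equation (7) itself is not reproduced in this transcript. A form that is consistent with the received-signal model in (8), with the row index denoting the antenna port and the column index denoting the subcarrier, would be the following. The symbol $x^{(p)}_{i}$ (the symbol sent on port $p$ at subcarrier $i$) is notation introduced here for illustration, and any power normalization factor is omitted as an assumption.

$$\begin{bmatrix} x^{(0)}_{i} & x^{(0)}_{i+1} \\ x^{(1)}_{i} & x^{(1)}_{i+1} \end{bmatrix} = \begin{bmatrix} d_{i} & d_{i+1} \\ -d^{*}_{i+1} & d^{*}_{i} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14$$

Substituting these entries into $y_{i} = h^{(0)}_{i} x^{(0)}_{i} + h^{(1)}_{i} x^{(1)}_{i}$ and into the conjugate of the corresponding expression for $y_{i+1}$ reproduces both rows of (8).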
Figure 7: Proposed MISO receiver architecture for PCFICH. (Blocks: RDB, RDB-1 to RDB-4, PB-1 to PB-4, and CFI detector. Inputs: 16 × 1 received signal vector y, 16 × 1 channel frequency response vectors h(0) and h(1), and 16 × 1 precomputed signal vectors s1 to s4. Internal signals: z, z(1) to z(4), and r1 to r4. Output: detected CFI.)

Figure 12: Multiplicands rearrangement for a single complex multiplication block.
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as
$$\begin{bmatrix} y_{i} \\ y^{*}_{i+1} \end{bmatrix} = \begin{bmatrix} h^{(0)}_{i} & -h^{(1)}_{i} \\ h^{*(1)}_{i+1} & h^{*(0)}_{i+1} \end{bmatrix} \begin{bmatrix} d_{i} \\ d^{*}_{i+1} \end{bmatrix} + \begin{bmatrix} n_{i} \\ n^{*}_{i+1} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{8}$$
where $h^{(0)}_{i}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_{i}$ is the data symbol at the $i$th subcarrier, and $n_{i}$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as
$$\mathbf{y}_{i} = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_{i} + \mathbf{n}_{i}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{9}$$
where $\mathbf{y}_{i}$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_{i}$ is the 2 × 1 complex signal vector, and $\mathbf{n}_{i}$ is the 2 × 1 noise vector. The objective is to detect the elements $d_{i}$ and $d_{i+1}$ of the data vector $\mathbf{d}_{i}$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_{i}$ is given by
$$\mathbf{z}_{i} = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_{i}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{10}$$
where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as
$$\begin{bmatrix} z_{i} \\ z^{*}_{i+1} \end{bmatrix} = \begin{bmatrix} h^{*(0)}_{i} & h^{(1)}_{i+1} \\ -h^{*(1)}_{i} & h^{(0)}_{i+1} \end{bmatrix} \begin{bmatrix} y_{i} \\ y^{*}_{i+1} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{11}$$
The elements of decoder output are calculated as
$$\begin{aligned} z_{i} &= h^{*(0)}_{i}\, y_{i} + h^{(1)}_{i+1}\, y^{*}_{i+1}, \\ z^{*}_{i+1} &= -h^{*(1)}_{i}\, y_{i} + h^{(0)}_{i+1}\, y^{*}_{i+1}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{12}$$
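The decoding step in (12) can be checked numerically. The following Python sketch is a minimal illustration under stated assumptions: the transmit mapping is the one implied by (8), the channel is noise-free and constant over the 16 subcarriers (made-up gains), and the function names `sfbc_encode` and `sfbc_decode` are hypothetical.

```python
def sfbc_encode(d):
    """Return the symbols sent on antenna ports 0 and 1, pair by pair."""
    port0, port1 = [], []
    for i in range(0, len(d), 2):
        port0 += [d[i], d[i + 1]]
        port1 += [-d[i + 1].conjugate(), d[i].conjugate()]
    return port0, port1


def sfbc_decode(y, h0, h1):
    """Apply (12) to each subcarrier pair; z[i + 1] is returned de-conjugated."""
    z = [0j] * len(y)
    for i in range(0, len(y), 2):
        y_next_conj = y[i + 1].conjugate()
        z[i] = h0[i].conjugate() * y[i] + h1[i + 1] * y_next_conj
        z_next_conj = -h1[i].conjugate() * y[i] + h0[i + 1] * y_next_conj
        z[i + 1] = z_next_conj.conjugate()
    return z


# Noise-free 2 x 1 MISO example with arbitrary, frequency-flat channel gains.
d = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)] * 4
h0 = [complex(0.6, 0.2)] * 16
h1 = [complex(-0.3, 0.7)] * 16
port0, port1 = sfbc_encode(d)
y = [h0[i] * port0[i] + h1[i] * port1[i] for i in range(16)]
z = sfbc_decode(y, h0, h1)
gain = abs(h0[0]) ** 2 + abs(h1[0]) ** 2
max_error = max(abs(z[i] - gain * d[i]) for i in range(16))
print(max_error)
```

When the channel is equal on the two subcarriers of a pair, the cross terms cancel and each decoder output is the transmitted symbol scaled by the combined channel gain, which is why the precomputed vectors can be compared against z directly after scaling.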
The PCFICH receiver architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_{i}$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_{1}$, $\mathbf{s}_{2}$, $\mathbf{s}_{3}$, and $\mathbf{s}_{4}$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_{1}$ is the output $r_{1}$ of PB-1. Similarly, $r_{2}$, $r_{3}$, and $r_{4}$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_{1}$, $r_{2}$, $r_{3}$, and $r_{4}$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.
4. System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by
$$\begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h^{*}_{01} & -h^{*}_{00} \\ h^{*}_{11} & -h^{*}_{10} \end{bmatrix} \begin{bmatrix} d_{i} \\ d^{*}_{i+1} \end{bmatrix} + \begin{bmatrix} n^{(0)}_{i} \\ n^{(0)*}_{i+1} \\ n^{(1)}_{i} \\ n^{(1)*}_{i+1} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{13}$$
where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n^{(j)}_{i}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as
$$\mathbf{y}_{i} = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_{i} + \mathbf{n}_{i}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{14}$$
where $\mathbf{y}_{i}$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_{i}$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_{i}$ is the 4 × 1 noise vector. The objective is to detect the elements $d_{i}$ and $d_{i+1}$ of the data vector $\mathbf{d}_{i}$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_{i}$ is given by
$$\mathbf{z}_{i} = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_{i}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{15}$$
where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as
$$\begin{bmatrix} z_{i} \\ z^{*}_{i+1} \end{bmatrix} = \begin{bmatrix} h^{*}_{00} & -h^{*}_{10} & h_{01} & h_{11} \\ h^{*}_{01} & -h^{*}_{11} & -h_{00} & -h_{10} \end{bmatrix} \begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix}, \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{16}$$
The decoder outputs are given by
$$\begin{aligned} z_{i} &= h^{*}_{00}\, y^{(0)}_{i} - h^{*}_{10}\, y^{(0)}_{i+1} + h_{01}\, y^{(1)*}_{i} + h_{11}\, y^{(1)*}_{i+1}, \\ z^{*}_{i+1} &= h^{*}_{01}\, y^{(0)}_{i} - h^{*}_{11}\, y^{(0)}_{i+1} - h_{00}\, y^{(1)*}_{i} - h_{10}\, y^{(1)*}_{i+1}, \end{aligned} \quad i = 0, 2, 4, 6, 8, 10, 12, 14 \tag{17}$$
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.
The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_{1}$, $\mathbf{s}^{(0)}_{2}$, $\mathbf{s}^{(0)}_{3}$, and $\mathbf{s}^{(0)}_{4}$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_{1}$, $\mathbf{s}^{(1)}_{2}$, $\mathbf{s}^{(1)}_{3}$, and $\mathbf{s}^{(1)}_{4}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_{i}$ and $\mathbf{y}^{(1)}_{i}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_{i}$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}^{(0)}_{1}$ and $\mathbf{y}^{(1)}_{1}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_{1}$ of PB1. Similarly, $r_{2}$, $r_{3}$, and $r_{4}$ are computed for the other CFI values. The values $r_{1}$, $r_{2}$, $r_{3}$, and $r_{4}$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_{1}$ to $T_{4}$).
5. PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on VLSI DSP techniques, namely, the folding and superscalar processing approaches.
5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^{H}_{\mathrm{eff}}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, optimum rearrangement of the terms is carried out to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The output real part ($e$) and imaginary part ($f$) terms are given by
$$\begin{aligned} e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\ f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \end{aligned} \tag{18}$$
It requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as
$$\begin{aligned} e &= \left[\mathrm{Re}\{y\} + \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} - \mathrm{Im}\{h\}\right] + \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\ f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \end{aligned} \tag{19}$$
Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ are shared between $e$ and $f$ in (19), it requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
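The trade of one multiplication for three extra additions can be sketched in Python. This is an illustration rather than the authors' hardware description: the function name `complex_mul_3` is hypothetical, and the arrangement used is e = (Re{y} + Im{y})(Re{h} - Im{h}) + Re{y}Im{h} - Im{y}Re{h} with f = Re{y}Im{h} + Im{y}Re{h}, which can be verified by expanding the bracketed product.

```python
def complex_mul_3(h: complex, y: complex) -> complex:
    """Compute h * y with three real multiplications and five additions."""
    p1 = (y.real + y.imag) * (h.real - h.imag)   # multiplication 1, additions 1 and 2
    p2 = y.real * h.imag                         # multiplication 2
    p3 = y.imag * h.real                         # multiplication 3
    e = p1 + p2 - p3                             # real part, additions 3 and 4
    f = p2 + p3                                  # imaginary part, addition 5
    return complex(e, f)


h = complex(3, 4)
y = complex(1, -2)
product = complex_mul_3(h, y)
print(product, h * y)
```

The products p2 and p3 are computed once and reused for both outputs, which is the reuse of intermediate products described above.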
5.2. Proposed Architecture Using Folding Method. Folding architecture systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed to a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_{1}$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At a time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_{i}$. Four switches operating at system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_{i}$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_{0}$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Though it takes more clock cycles to get the output, the resources are minimized in this method.
The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_{1}$ to $T_{6}$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low resource utilizing VLSI DSP technique. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation for the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
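The multiplier counts quoted for the superscalar method can be reproduced with a few lines of Python. The sketch assumes that the MIMO decoding block needs 4 complex multiplications per hardware line (one per term in the decoder outputs), which is inferred from the quoted total rather than stated in the text; all names are illustrative.

```python
HARDWARE_LINES = 16
REAL_MULTS_PER_COMPLEX = 4     # direct complex multiplication, as in (18)
RDB_CYCLES = 2                 # the RDB multipliers are reused over two cycles
PB_MULTIPLIERS = 32            # 16 lines x 2 square-magnitude multipliers


def superscalar_multipliers(complex_mults_per_line: int) -> int:
    """Total multipliers for the decoding block plus the processing block."""
    rdb_real_mults = HARDWARE_LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = rdb_real_mults // RDB_CYCLES
    return rdb_multipliers + PB_MULTIPLIERS


miso = superscalar_multipliers(2)   # 2 complex multiplications per line
mimo = superscalar_multipliers(4)   # assumed 4 complex multiplications per line
print(miso, mimo)
```

With 2 decoding cycles followed by 4 processing cycles, both configurations deliver an output every 6th clock cycle; only the multiplier count differs.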
6. Results and Discussion
The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it remains suitable for real-time frame timings.
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
where ldquo119870rdquo represents the number of receive antennas at UEh(119896) is 16 times 1 channel frequency response vector between thetransmit antenna and 119896th receive antenna and w
119896is 16 times 1
noise vector at 119896th receive antenna Now the objective is todetect the value ofCFI from the received signal vectors at eachreceive antenna assuming the channel frequency responsevectors at each receive antenna are knownThemaximal ratiocombining is carried out at the receiver Using maximumlikelihood (ML) principle CFI is estimated as [9]
CFI = min119898=1234
119870
sum
119896=1
10038171003817100381710038171003817y(119896) minus (h(119896) ∘ d(119898))10038171003817100381710038171003817
2
(6)
The basic architecture for estimating CFI using (6) in1 times 2 SIMO configuration shown in Figure 5 is similar to thebasic architecture of SISO configuration The received signalvector y(119896) and the channel frequency response vector h(119896)are provided as input to the four receiver processing blocks(RPB-CFI(119896)
119898) at 119896th receive antenna along with precomputed
data vectors d(1) d(2) d(3) and d(4)The outputs from themthRPB at 0th receive antenna 119903(0)
119898and 1st receive antenna 119903(1)
119898are
added to get themth input 119903119898of the CFI detector circuit
3 System Model and ImplementationArchitecture for MISO Configuration
In MISO and MIMO configurations space frequency blockcode (SFBC) based layer mapping and precoding are carriedout to obtain transmit diversity when two or more antennaports are used at eNodeB as per the 3GPP LTE wirelessstandard [1 11] It is assumed that 2 antenna ports are usedat eNodeB The 16 times 1 complex symbol vector output ofthe modulation mapper is applied to the layer mapper The8 times 1 symbol vectors at layer 0 and layer 1 are given by[1198890 1198892 1198894 11988961198898 11988910 11988912 and 119889
14] and [119889
1 1198893 1198895 1198897
1198899 11988911 11988913 and 119889
15] The precoding is carried out using
6 VLSI Design
PB-4
Detected CFI
PB-1
PB-2
PB-3
RDB-1
CFI detector
RDB
RDB-2
RDB-3
RDB-4
16 times 1 channel frequencyresponse vector
16 times 1 received signalvector y
16 times 1 precomputed signalvector s1
16 times 1 precomputed signalvector s2
16 times 1 precomputed signalvector s3
16 times 1 precomputed signalvector s4
h(0)h(1)
z
z(1)
z(2)
z(3)
z(4)
r1
r2
r3r4
Figure 7 Proposed MISO receiver architecture for PCFICH
16 times 1 channel frequency response vectors h(0) and h(1)
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the $2 \times 1$ MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix},
\quad i = 0, 2, 4, \ldots, 14,
\tag{8}
$$
where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, \ldots, 14,
\tag{9}
$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad i = 0, 2, 4, \ldots, 14,
\tag{10}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as
$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix}
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix},
\quad i = 0, 2, 4, \ldots, 14.
\tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h_i^{*(0)} y_i + h_{i+1}^{(1)} y_{i+1}^{*}, & i &= 0, 2, 4, \ldots, 14, \\
z_{i+1}^{*} &= -h_i^{*(1)} y_i + h_{i+1}^{(0)} y_{i+1}^{*}, & i &= 0, 2, 4, \ldots, 14.
\end{aligned}
\tag{12}
$$
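As an illustration of the decoding step in (12), the following minimal Python sketch builds a noise-free received pair according to (8) and then applies (12). The function names `sfbc_receive` and `sfbc_decode` and the numerical values are hypothetical, chosen only for this example. The check at the end additionally assumes that the channel is identical on the two adjacent subcarriers, in which case each decoder output is the transmitted symbol scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
def sfbc_receive(d_i, d_n, h0_i, h1_i, h0_n, h1_n):
    """Noise-free received samples (y_i, y_{i+1}) following (8).

    d_i, d_n   : data symbols at subcarriers i and i+1
    h0_*, h1_* : channel responses from antenna ports 0 and 1
                 at subcarrier i (suffix _i) and i+1 (suffix _n)
    """
    y_i = h0_i * d_i - h1_i * d_n.conjugate()
    # Second row of (8) gives conj(y_{i+1}); conjugate it back.
    y_n_conj = h1_n.conjugate() * d_i + h0_n.conjugate() * d_n.conjugate()
    return y_i, y_n_conj.conjugate()


def sfbc_decode(y_i, y_n, h0_i, h1_i, h0_n, h1_n):
    """Decoder outputs (z_i, z_{i+1}) following (12)."""
    z_i = h0_i.conjugate() * y_i + h1_n * y_n.conjugate()
    z_n_conj = -h1_i.conjugate() * y_i + h0_n * y_n.conjugate()
    return z_i, z_n_conj.conjugate()


# Illustrative values only (not taken from the paper).
h0, h1 = complex(0.8, 0.6), complex(-0.3, 0.4)
d0, d1 = complex(1, 1), complex(1, -1)
y0, y1 = sfbc_receive(d0, d1, h0, h1, h0, h1)
z0, z1 = sfbc_decode(y0, y1, h0, h1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2
```

With a flat channel over the subcarrier pair, `z0` and `z1` equal `gain * d0` and `gain * d1` up to rounding, which is the diversity combining that the RDB performs.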
Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.

The PCFICH receiver architecture for the $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.
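The processing blocks and the CFI determination circuit can be sketched in a few lines of Python: each PB output is a sum of squared magnitudes of a difference vector, and the detected CFI is the index of the smallest output. The names `pb_metric` and `detect_cfi` are hypothetical, and the four short candidate vectors below are toy values for illustration, not the 16-symbol PCFICH codewords defined by the standard.

```python
def pb_metric(z, s):
    """Sum of squared magnitudes of the elements of (z - s), as in one PB."""
    return sum(abs(zk - sk) ** 2 for zk, sk in zip(z, s))


def detect_cfi(z, candidates):
    """Return (detected CFI, all metrics) by picking the minimum PB output.

    candidates maps each CFI value to its precomputed data vector.
    """
    metrics = {cfi: pb_metric(z, s) for cfi, s in candidates.items()}
    return min(metrics, key=metrics.get), metrics


# Toy candidate vectors (hypothetical, length 4 instead of 16).
q = complex(1, 1)
candidates = {
    1: [q, -q, q, -q],
    2: [q, q, -q, -q],
    3: [-q, q, q, -q],
    4: [q, q, q, q],
}
# A decoder output close to the CFI = 3 candidate.
z = [c + 0.1 for c in candidates[3]]
cfi, metrics = detect_cfi(z, candidates)
```

In hardware the four metrics are produced by four parallel blocks and compared by the detector circuit; the dictionary here only stands in for that parallel structure.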
4 System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix},
\quad i = 0, 2, 4, \ldots, 14,
\tag{13}
$$
where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, \ldots, 14,
\tag{14}
$$

where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i, \quad i = 0, 2, 4, \ldots, 14,
\tag{15}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as
$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix},
\quad i = 0, 2, 4, \ldots, 14.
\tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h_{00}^{*} y_i^{(0)} - h_{10}^{*} y_{i+1}^{(0)} + h_{01} y_i^{(1)*} + h_{11} y_{i+1}^{(1)*}, & i &= 0, 2, 4, \ldots, 14, \\
z_{i+1}^{*} &= h_{01}^{*} y_i^{(0)} - h_{11}^{*} y_{i+1}^{(0)} - h_{00} y_i^{(1)*} - h_{10} y_{i+1}^{(1)*}, & i &= 0, 2, 4, \ldots, 14.
\end{aligned}
\tag{17}
$$
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.

The PCFICH receiver architecture of the $2 \times 2$ MIMO configuration is shown in Figure 10.
The receiver decoding block (RDBM) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly, based on the basic architectures developed in Sections 3 and 4. However, to utilize the FPGA resources effectively, the basic architectures are implemented using modified architectures based on two VLSI DSP techniques, namely, folding and superscalar processing.
5.1 Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and the imaginary part ($f$) of the output are given by

$$
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned}
\tag{18}
$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} + \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} - \mathrm{Im}\{h\}\right] + \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned}
\tag{19}
$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), each is computed once and reused, so (19) requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
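The saving can be checked numerically. The sketch below uses one algebraically valid three-multiplier arrangement, $e = (\mathrm{Re}\{y\} + \mathrm{Im}\{y\})(\mathrm{Re}\{h\} - \mathrm{Im}\{h\}) + \mathrm{Re}\{y\}\mathrm{Im}\{h\} - \mathrm{Im}\{y\}\mathrm{Re}\{h\}$ and $f = \mathrm{Re}\{y\}\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\mathrm{Re}\{h\}$, in which the two cross products are shared between the real and imaginary parts. The function name `complex_mul_3` is hypothetical, and the code is only a software model of the multiplier count, not the authors' hardware description.

```python
def complex_mul_3(re_h, im_h, re_y, im_y):
    """Multiply (re_h + j*im_h) by (re_y + j*im_y) with three real multipliers.

    Returns (e, f), the real and imaginary parts of the product.
    """
    p1 = re_y * im_h                    # multiplier 1 (shared by e and f)
    p2 = im_y * re_h                    # multiplier 2 (shared by e and f)
    p3 = (re_y + im_y) * (re_h - im_h)  # multiplier 3, after 2 additions
    f = p1 + p2                         # addition 3
    e = p3 + p1 - p2                    # additions 4 and 5
    return e, f


# Example: (3 + 4j) * (1 - 2j) = 11 - 2j
e, f = complex_mul_3(3, 4, 1, -2)
```

Expanding `p3` gives `re_y*re_h - re_y*im_h + im_y*re_h - im_y*im_h`, so adding `p1` and subtracting `p2` leaves exactly the real part in (18).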
5.2 Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures in which multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated with single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At any time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two switches pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Though more clock cycles are needed to get the output, the resources are minimized in this method.
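The folding idea for one SISO RPB can be modelled in software as a loop that reuses the same two multipliers and an accumulator once per clock cycle. This is a behavioural sketch under stated assumptions: the function names `folded_rpb` and `direct_rpb` are hypothetical, the 16 register values are taken as already computed complex differences, and switch and delay elements are not modelled.

```python
def direct_rpb(diff):
    """Direct form: 16 parallel hardware lines, 2 multipliers each (32 total)."""
    return sum(d.real * d.real + d.imag * d.imag for d in diff)


def folded_rpb(diff):
    """Folded form: one hardware line reused over len(diff) clock cycles.

    Returns (accumulated metric, number of clock cycles used).
    """
    acc = 0.0
    cycles = 0
    for d in diff:                 # one stored difference per clock cycle
        sq_re = d.real * d.real    # shared multiplier 1
        sq_im = d.imag * d.imag    # shared multiplier 2
        acc += sq_re + sq_im       # first adder, then accumulating adder
        cycles += 1
    return acc, cycles


# 16 illustrative difference values held in registers.
registers = [complex(k, -k) for k in range(16)]
r_folded, cycles = folded_rpb(registers)
r_direct = direct_rpb(registers)
```

Both forms give the same metric; the folded form takes 16 cycles, so the CFI decision can follow in the 17th cycle, while the multiplier count drops from 32 to 2 (a folding factor of $N = 16$).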
The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3 Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. The superscalar processing method combines parallel processing and pipelining strategies. In this case, the 16 pairs of hardware lines operate in parallel, and the subtraction and squared magnitude operations for each CFI are pipelined. The SISO configuration does not have complex multiplications; it has only squared magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers.

This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications of SISO, and the output is obtained at every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "$d$" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
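The multiplier budgets quoted for the superscalar method follow from simple counting, which the sketch below reproduces. The function name `superscalar_multipliers` is hypothetical. The figure of 4 complex multiplications per hardware line for MIMO is an assumption inferred from the four products per output in (17); it is consistent with the 128 reused RDB multipliers stated in the text.

```python
LINES = 16                 # hardware lines per block
REAL_PER_COMPLEX = 4       # real multiplications per complex multiplication
RDB_CYCLES = 2             # RDB multipliers are reused over two clock cycles
PB_CYCLES = 4              # PB / RPB stage latency in clock cycles


def superscalar_multipliers(config):
    """Return (multiplier count, clock cycles per output) for a configuration."""
    pb = LINES * 2                         # squared magnitude: 2 multipliers/line
    if config == "SISO":
        return pb, PB_CYCLES
    if config == "SIMO":
        return 2 * pb, PB_CYCLES           # two receive antennas
    complex_per_line = {"MISO": 2, "MIMO": 4}[config]   # MIMO value assumed
    rdb = LINES * complex_per_line * REAL_PER_COMPLEX // RDB_CYCLES
    return rdb + pb, RDB_CYCLES + PB_CYCLES


budget = {c: superscalar_multipliers(c) for c in ("SISO", "SIMO", "MISO", "MIMO")}
```

This yields 32, 64, 96, and 160 multipliers for SISO, SIMO, MISO, and MIMO, with outputs every 4, 4, 6, and 6 clock cycles, respectively.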
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target device Virtex-6 has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on folding and superscalar processing methods require fewer resource elements.

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
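The timing budget above can be reproduced with a short calculation. In the sketch below, the slot duration, symbol count, and subcarrier count come from the text; the clock period of 16.033 ns is an assumption, taken from reading the extracted Table 3 entry as 16.033 ns, and the function names are hypothetical.

```python
SLOT_DURATION_S = 0.5e-3    # one slot of the 10 ms radio frame
SYMBOLS_PER_SLOT = 7        # OFDM symbols per slot
SUBCARRIERS = 72            # subcarriers per OFDM symbol at 1.4 MHz


def time_per_subcarrier_ns():
    """Maximum detection time available per subcarrier, in nanoseconds."""
    return SLOT_DURATION_S / (SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9


def folded_delay_ns(clock_period_ns, cycles=16):
    """Total delay of the folded architecture for a given clock period."""
    return clock_period_ns * cycles


budget_ns = time_per_subcarrier_ns()      # about 992.063 ns
delay_ns = folded_delay_ns(16.033)        # assumed clock period, about 256.528 ns
meets_constraint = delay_ns < budget_ns
```

Under these assumptions, the 16-cycle folded computation uses roughly a quarter of the per-subcarrier budget, which is why the slower folded design still satisfies the frame timing.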
A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "$e$" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246mW), logic (670mW), and the block arithmetic (103mW).

Table 3: Resource requirements of proposed architecture using folding method.
Parameter: Value
Multipliers: 58
Adders: 518
Minimum clock period (ns): 16.033
Total delay (ns): 256.528
Speed (MHz): 3.898
Total dynamic power (mW): 1019

Figure 20: Implemented device in FPGA editor.
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "$e$". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.
7 Conclusion
In this paper, low-complexity, low-resource single-antenna or multiantenna CFI detection at the receiver has been proposed and analyzed using ModelSim and implemented in the Virtex-6 device with the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211, Version 11.0.0, Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA), Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 12 Multiplicands rearrangement for a single complexmultiplication block
This is repeated for all the 8 symbols in layer 0 andlayer 1 Equation (7) defines the transmission format withthe row index indicating the antenna port number and thecolumn index indicating the subcarrier index In 2 times 1MISOconfiguration the receive signals at 119894th and (119894+1)th subcarrierare given in matrix form as
[119910119894
119910lowast
119894+1
] = [ℎ(0)
119894minusℎ(1)
119894
ℎlowast(1)
119894+1ℎlowast(0)
119894+1
] [119889119894
119889lowast
119894+1
] + [119899119894
119899lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14
(8)
where ℎ(0)119894
represents the channel frequency response of 119894thsubcarrier between 0th transmit antenna port and receive
antenna 119889119894is data symbol at 119894th subcarrier and 119899
119894is the
noise at 119894th subcarrier at the receive antenna Equation (8)can simply be represented as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (9)
where y119894is 2 times 1 receive signal vector Heff119894 is the 2 times 2
channel matrix d119894is 2 times 1 complex signal vector and n
119894is
2 times 1 noise vector The objective is to detect the elements119889119894and 119889
119894+1of the data vector d
119894 Assuming that the elements
of channel frequency response matrix Heff119894 are perfectlyknown at the receiver the decoder output vector z
whereH119867eff119894 is theHermitian of the 2times2 channel transmissionmatrix Equation (10) is expanded as
[119911119894
119911lowast
119894+1
] = [ℎlowast(0)
119894ℎ(1)
119894+1
minusℎlowast(1)
119894ℎ(0)
119894+1
] [119910119894
119910lowast
119894+1
]
for 119894 = 0 2 4 6 8 10 12 14(11)
The elements of decoder output are calculated as
119911119894= ℎlowast(0)
119894119910119894+ ℎ(1)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
119911lowast
i+1 = minusℎlowast(1)
119894119910119894+ ℎ(0)
119894+1119910lowast
119894+1 for 119894 = 0 2 4 6 8 10 12 14
(12)
The PCFICH receive architecture for 2 times 1MISO config-uration is shown in Figure 7 Receiver decoding block (RDB)
VLSI Design 9
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14 2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15Folded unit
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi+1
Imi Im+1
r1
Channel frequency response hi
Received signalvector y i
Precomputeddata si
minus
2l + 1 3 5 7 9 11 13 15
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
5.2. Proposed Architecture Using Folding Method. Folding is a systematic technique for determining the control circuits in DSP architectures in which multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used to synthesize DSP architectures that operate with single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.

The folding architecture is introduced in the RPB for SISO and SIMO configurations and in the RDB and PB for MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, one RPB uses 32 multipliers. To reduce the number of multipliers and adders, a folded architecture is proposed. It uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. One element of this difference (a real and imaginary pair) is processed at a time by the two multipliers to obtain $z_i$. Four switches operating at the system clock speed are used: two pass the real part of the signal to one multiplier, and the other two pass the imaginary part to the other multiplier. The multipliers pass their products to the first adder, which forms $z_i$. The output of the first adder is passed, with a delay, to the second adder, which accumulates the values $z_0$ to $z_{15}$ into a register over successive clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the hardware resources are minimized.
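The time-multiplexing described above can be sketched in software as follows. This is an illustrative model only, not the synthesized design: `folded_rpb` is a hypothetical name, each loop iteration stands for one clock cycle, and the two products inside the loop stand for the two shared multipliers.

```python
def folded_rpb(diff):
    """Accumulate sum(|diff_k|^2) with one folded unit, one element per clock cycle.

    diff: the 16 complex difference values held in registers.
    Returns (r, cycles): the accumulated metric and the number of cycles used.
    """
    acc = 0.0      # accumulator register behind the second adder
    cycles = 0
    for d in diff:                   # one iteration models one clock cycle
        re_sq = d.real * d.real      # multiplier 1: real part squared
        im_sq = d.imag * d.imag      # multiplier 2: imaginary part squared
        z = re_sq + im_sq            # first adder forms z_i
        acc = acc + z                # second adder accumulates z_0 .. z_15
        cycles += 1
    return acc, cycles


# Example with 16 made-up difference values (k - jk).
diff = [complex(k, -k) for k in range(16)]
r1, cycles = folded_rpb(diff)
print(r1, cycles)
```

A fully parallel version would compute all 16 squared magnitudes at once with 32 multipliers; the folded loop trades that for 16 sequential passes through the same two multipliers.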
The folded architecture of the decoding block for MISO and MIMO, which involves complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the direct form uses 32 complex multiplications and 16 additions. The folded architecture reduces this to just 2 complex multiplications and one addition and requires five switches. Two switches pass the first element of the received signal vector and its corresponding channel frequency response to one multiplier, and two other switches pass the second element of the received signal vector and its channel frequency response to the other multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before the result moves to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$). [Timing diagram: the real-part and imaginary-part complex multiplications of the RDB hardware lines feed four PBs, with delay elements $d$ aligning the outputs $r_1$ to $r_4$.]

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. It combines parallel processing and pipelining. Here, the 16 pairs of hardware lines operate in parallel, and the subtraction and squared-magnitude operations for each CFI are pipelined. The SISO configuration has no complex multiplications, only squared-magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each with 2 multipliers, giving a total of 32 multipliers. This setup requires more hardware resources than folding, but an output is obtained every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which processes signals from two receive antennas, requires twice as many multiplications as SISO, and an output is again obtained every 4th clock cycle. The block "$d$" represents the delay element introduced to buffer the values so that the outputs are produced at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB reuses 128 multipliers over 2 clock cycles, and an additional 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration. In both cases, an output is obtained every 6th clock cycle, as shown in Figure 16. The block "$d$" represents the delay element introduced to buffer the values so that the outputs are produced at the same time instant.
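The multiplier counts quoted above follow from simple arithmetic, reproduced in the sketch below. The function name `superscalar_budget` is hypothetical, and the figure of four complex multiplications per hardware line for MIMO is an inference from the four terms in each line of (17), not a number stated in the text.

```python
LINES = 16                    # hardware lines per block
REAL_MULTS_PER_COMPLEX = 4    # direct complex multiplication, as in (18)
PB_MULTIPLIERS = 32           # 16 lines x 2 multipliers in the processing block


def superscalar_budget(complex_mults_per_line, rdb_cycles=2, pb_cycles=4):
    """Return (total multipliers, clock cycles per output) for the superscalar form."""
    real_mults = LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = real_mults // rdb_cycles    # multipliers reused across RDB cycles
    return rdb_multipliers + PB_MULTIPLIERS, rdb_cycles + pb_cycles


print("MISO:", superscalar_budget(2))   # 2 complex multiplications per line
print("MIMO:", superscalar_budget(4))   # assumed: 4 complex multiplications per line
```

Under these assumptions the sketch returns 96 multipliers for MISO and 160 for MIMO, both with an output every 6 clock cycles, which matches the counts in the paragraph above.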
6. Results and Discussion

The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device. The target device has only 768 DSP elements. Table 2 compares the proposed folding and superscalar architectures with the direct implementation of the PCFICH receiver in terms of resource utilization, speed, and power for all of the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on folding and superscalar processing require fewer resource elements.

[RTL schematic: the selection input Diversity drives the enable wires $e$ of the four blocks Div0 (SISO), Div1 (SIMO), Div2 (MISO), and Div3 (MIMO), which produce the outputs Count CFI1 to Count CFI4.]

Figure 19: Resource utilization graph for generalized architecture.

In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but the speed remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers in one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method because of the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. It is therefore inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed folding-based architecture is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
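The timing budget quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check under stated assumptions: the function names are hypothetical, and the clock period is read from Table 3 as 16.033 ns, a decimal placement inferred from the fact that 16 such cycles give the tabulated total delay.

```python
SLOT_DURATION_S = 0.5e-3       # one slot of the 10 ms radio frame
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS = 72               # 1.4 MHz LTE bandwidth


def time_per_subcarrier_ns():
    """Time available per subcarrier within one OFDM symbol, in nanoseconds."""
    return SLOT_DURATION_S / OFDM_SYMBOLS_PER_SLOT / SUBCARRIERS * 1e9


def folded_delay_ns(clock_period_ns, cycles=16):
    """Total delay of the folded architecture over the given number of cycles."""
    return clock_period_ns * cycles


budget_ns = time_per_subcarrier_ns()
delay_ns = folded_delay_ns(16.033)   # assumed reading of the Table 3 clock period
print(f"budget = {budget_ns:.3f} ns, folded delay = {delay_ns:.3f} ns")
print("within budget:", delay_ns < budget_ns)
```

With these numbers the folded design uses roughly a quarter of the per-subcarrier budget, which is the margin behind the statement that the total delay is within the LTE time constraint.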
A general architecture based on the folding method, which operates in all four of the SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable "$e$" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general folding-based architecture show that it uses minimal resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching and is contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Value |
| --- | --- |
| Multipliers | 58 |
| Adders | 518 |
| Minimum clock period (ns) | 16.033 |
| Total delay (ns) | 256.528 |
| Speed (MHz) | 3.898 |
| Total dynamic power (mW) | 1019 |

Figure 20: Implemented device in FPGA editor.
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named "$e$". The power consumed includes both static power and dynamic power due to internal switching.
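The enable mechanism can be pictured as a one-hot decode of the selection input. The sketch below is a behavioural illustration only: the name `enable_lines` and the numeric encoding of "diversity" (0 for SISO up to 3 for MIMO, following the div0 to div3 ordering) are assumptions, not details taken from the synthesized RTL.

```python
CONFIGS = ("SISO", "SIMO", "MISO", "MIMO")   # div0, div1, div2, div3


def enable_lines(diversity):
    """Return the four enable bits e for div0..div3; exactly one is set."""
    if diversity not in range(len(CONFIGS)):
        raise ValueError("diversity must be 0, 1, 2, or 3")
    return tuple(int(k == diversity) for k in range(len(CONFIGS)))


for code, name in enumerate(CONFIGS):
    print(name, enable_lines(code))
```

Only the selected submodule is active in this model, so the others do not toggle.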
Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.
7. Conclusion

In this paper, a low-complexity, low-resource CFI detection scheme for single-antenna and multiantenna receivers has been proposed, analyzed using ModelSim, and implemented on the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by rearranging the arithmetic operations and by employing a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for area-optimized hardware implementation of receiver structures for PCFICH. In future, complete hardware accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests

The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.

Acknowledgments

The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled "Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing" (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References

[1] 3GPP TS 36.211, Version 11.0.0, Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA), Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 12: Multiplicands rearrangement for a single complex multiplication block.
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the $2 \times 1$ MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{(0)}_{i} & -h^{(1)}_{i} \\ h^{*(1)}_{i+1} & h^{*(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} d_i \\ d^{*}_{i+1} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n^{*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{8}
$$

where $h^{(0)}_{i}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can be written compactly as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{9}
$$

where $\mathbf{y}_i$ is the $2 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $2 \times 2$ channel matrix, $\mathbf{d}_i$ is the $2 \times 1$ complex signal vector, and $\mathbf{n}_i$ is the $2 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_i,
\tag{10}
$$

where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the $2 \times 2$ channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{*(0)}_{i} & h^{(1)}_{i+1} \\ -h^{*(1)}_{i} & h^{(0)}_{i+1} \end{bmatrix}
\begin{bmatrix} y_i \\ y^{*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{11}
$$

The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h^{*(0)}_{i}\, y_i + h^{(1)}_{i+1}\, y^{*}_{i+1}, && i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z^{*}_{i+1} &= -h^{*(1)}_{i}\, y_i + h^{(0)}_{i+1}\, y^{*}_{i+1}, && i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned}
\tag{12}
$$
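A small numerical sketch can make (8) and (12) concrete. It is written under simplifying assumptions that are not part of the paper's hardware: the noise is set to zero, the channel is taken to be the same on both subcarriers of a pair, and the function names `sfbc_receive_pair` and `sfbc_decode_pair` are hypothetical.

```python
def sfbc_receive_pair(d_i, d_n, h0_i, h1_i, h0_n, h1_n):
    """Noise-free (y_i, conj(y_{i+1})) following (8); suffix _n means index i+1."""
    y_i = h0_i * d_i - h1_i * d_n.conjugate()
    y_n_conj = h1_n.conjugate() * d_i + h0_n.conjugate() * d_n.conjugate()
    return y_i, y_n_conj


def sfbc_decode_pair(y_i, y_n_conj, h0_i, h1_i, h0_n, h1_n):
    """Decoder outputs (z_i, conj(z_{i+1})) following (12)."""
    z_i = h0_i.conjugate() * y_i + h1_n * y_n_conj
    z_n_conj = -h1_i.conjugate() * y_i + h0_n * y_n_conj
    return z_i, z_n_conj


# Made-up channel values, identical on subcarriers i and i+1 (flat over the pair).
h0, h1 = complex(0.8, 0.3), complex(-0.2, 0.5)
d_i, d_n = complex(1, 1), complex(1, -1)

y_i, y_n_conj = sfbc_receive_pair(d_i, d_n, h0, h1, h0, h1)
z_i, z_n_conj = sfbc_decode_pair(y_i, y_n_conj, h0, h1, h0, h1)

gain = abs(h0) ** 2 + abs(h1) ** 2
print(z_i / gain, (z_n_conj / gain).conjugate())
```

When the channel is flat over the subcarrier pair, the cross terms cancel and each decoder output is the transmitted symbol scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$, which is the diversity gain the SFBC scheme is designed to deliver.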
The PCFICH receiver architecture for the $2 \times 1$ MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO. [Diagram: the channel frequency response $h_i$, the received signal vector $y_i$, and the precomputed data $s_i$ feed a single folded unit whose real and imaginary parts are squared and accumulated through a delay element $D$ into $r_1$.]

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, and $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The output $r_1$ of PB-1 is the sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.
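The processing-block and CFI determination steps amount to a minimum-distance search, which the following sketch illustrates. It is not the authors' circuit: `detect_cfi` is a hypothetical name, and the four candidate vectors are made-up QPSK patterns used only for illustration, not the CFI codewords defined by 3GPP.

```python
POINTS = (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)   # unnormalized QPSK points


def hypothetical_candidates():
    """Four made-up 16-element vectors standing in for s_1 .. s_4."""
    return [[POINTS[(n * k) % 4] for n in range(16)] for k in range(1, 5)]


def detect_cfi(z, candidates):
    """Return (cfi, metrics), where metrics[k-1] = sum(|z_n - s_k[n]|^2)."""
    metrics = [sum(abs(zn - sn) ** 2 for zn, sn in zip(z, s)) for s in candidates]
    cfi = metrics.index(min(metrics)) + 1      # CFI values are numbered from 1
    return cfi, metrics


candidates = hypothetical_candidates()
z = [s + 0.05 for s in candidates[2]]          # third candidate plus a small offset
cfi, metrics = detect_cfi(z, candidates)
print(cfi, metrics)
```

Each entry of `metrics` plays the role of one processing block output $r_k$, and the final `min` corresponds to the comparison performed by the CFI determination circuit.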
4. System Model and Implementation Architecture for MIMO Configuration

In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h^{*}_{01} & -h^{*}_{00} \\ h^{*}_{11} & -h^{*}_{10} \end{bmatrix}
\begin{bmatrix} d_i \\ d^{*}_{i+1} \end{bmatrix}
+
\begin{bmatrix} n^{(0)}_{i} \\ n^{(0)*}_{i+1} \\ n^{(1)}_{i} \\ n^{(1)*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{13}
$$

where $h_{ab}$ represents the channel frequency response between the $b$th transmit antenna and the $a$th receive antenna, and $n^{(j)}_{i}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i, \quad i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{14}
$$

where $\mathbf{y}_i$ is the $4 \times 1$ receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the $4 \times 2$ channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the $2 \times 1$ data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the $4 \times 1$ noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}^{H}_{\mathrm{eff},i}\,\mathbf{y}_i,
\tag{15}
$$

where $\mathbf{H}^{H}_{\mathrm{eff},i}$ is the Hermitian of the $4 \times 2$ channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z^{*}_{i+1} \end{bmatrix}
=
\begin{bmatrix} h^{*}_{00} & -h^{*}_{10} & h_{01} & h_{11} \\ h^{*}_{01} & -h^{*}_{11} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y^{(0)}_{i} \\ y^{(0)}_{i+1} \\ y^{(1)*}_{i} \\ y^{(1)*}_{i+1} \end{bmatrix},
\quad i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{16}
$$

The decoder outputs are given by

$$
\begin{aligned}
z_i &= h^{*}_{00}\, y^{(0)}_{i} - h^{*}_{10}\, y^{(0)}_{i+1} + h_{01}\, y^{(1)*}_{i} + h_{11}\, y^{(1)*}_{i+1}, && i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z^{*}_{i+1} &= h^{*}_{01}\, y^{(0)}_{i} - h^{*}_{11}\, y^{(0)}_{i+1} - h_{00}\, y^{(1)*}_{i} - h_{10}\, y^{(1)*}_{i+1}, && i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned}
\tag{17}
$$
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO. Note: receiver decoding block (RDB) in MISO is termed as RDBM in MIMO. [Diagram: a folded RDB unit combines the channel frequency response vectors $h_i$, $h_{i+1}$ with the received signal vectors $y_i$, $y_{i+1}$ (with conjugation) to give $z_i$; a folded PB unit then subtracts the precomputed data and accumulates $r_1$.]
The PCFICH receiver architecture for the $2 \times 2$ MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the $16 \times 1$ received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The $16 \times 1$ precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}^{(0)}_{1}$, $\mathbf{s}^{(0)}_{2}$, $\mathbf{s}^{(0)}_{3}$, and $\mathbf{s}^{(0)}_{4}$, respectively, for antenna 0 and as $\mathbf{s}^{(1)}_{1}$, $\mathbf{s}^{(1)}_{2}$, $\mathbf{s}^{(1)}_{3}$, and $\mathbf{s}^{(1)}_{4}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}^{(0)}_{i}$ and $\mathbf{y}^{(1)}_{i}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the $16 \times 1$ vector $\mathbf{z} = [\mathbf{z}^{T}_{0}, \mathbf{z}^{T}_{2}, \ldots, \mathbf{z}^{T}_{14}]^{T}$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}^{(0)}_{1}$ and $\mathbf{y}^{(1)}_{1}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The output $r_1$ of PB1 is the sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The outputs $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$). [Timing diagram: four RPBs with delay elements $d$ aligning the outputs $r_1$ to $r_4$.]
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
Figure 12: Multiplicands rearrangement for a single complex multiplication block.
This is repeated for all the 8 symbols in layer 0 and layer 1. Equation (7) defines the transmission format, with the row index indicating the antenna port number and the column index indicating the subcarrier index. In the 2 × 1 MISO configuration, the received signals at the $i$th and $(i+1)$th subcarriers are given in matrix form as

$$
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{(0)} & -h_i^{(1)} \\ h_{i+1}^{*(1)} & h_{i+1}^{*(0)} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i \\ n_{i+1}^{*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{8}
$$
where $h_i^{(0)}$ represents the channel frequency response of the $i$th subcarrier between the 0th transmit antenna port and the receive antenna, $d_i$ is the data symbol at the $i$th subcarrier, and $n_i$ is the noise at the $i$th subcarrier at the receive antenna. Equation (8) can simply be represented as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i,
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{9}
$$
where $\mathbf{y}_i$ is the 2 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 2 × 2 channel matrix, $\mathbf{d}_i$ is the 2 × 1 complex signal vector, and $\mathbf{n}_i$ is the 2 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i,
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{10}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 2 × 2 channel transmission matrix. Equation (10) is expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_i^{*(0)} & h_{i+1}^{(1)} \\ -h_i^{*(1)} & h_{i+1}^{(0)} \end{bmatrix}
\begin{bmatrix} y_i \\ y_{i+1}^{*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{11}
$$
The elements of the decoder output are calculated as

$$
\begin{aligned}
z_i &= h_i^{*(0)}\,y_i + h_{i+1}^{(1)}\,y_{i+1}^{*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z_{i+1}^{*} &= -h_i^{*(1)}\,y_i + h_{i+1}^{(0)}\,y_{i+1}^{*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned}
\tag{12}
$$
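The decoding step in (12) can be sketched in a few lines of Python for one subcarrier pair. This is only an illustration: the function names `sfbc_receive_pair` and `sfbc_decode_pair` are hypothetical, the channel is assumed to be identical on subcarriers $i$ and $i+1$, and noise is omitted, so the decoder outputs reduce to the data symbols scaled by $|h^{(0)}|^2 + |h^{(1)}|^2$.

```python
import math


def sfbc_receive_pair(h0, h1, d_i, d_next):
    """Noise-free received pair following the signal model (8).

    Assumption: the channel is the same on subcarriers i and i+1.
    Returns (y_i, y_{i+1}).
    """
    y_i = h0 * d_i - h1 * d_next.conjugate()
    y_next_conj = h1.conjugate() * d_i + h0.conjugate() * d_next.conjugate()
    return y_i, y_next_conj.conjugate()


def sfbc_decode_pair(h0_i, h1_i, h0_next, h1_next, y_i, y_next):
    """Decoder outputs (z_i, conj(z_{i+1})) as written in (12)."""
    y_next_conj = y_next.conjugate()
    z_i = h0_i.conjugate() * y_i + h1_next * y_next_conj
    z_next_conj = -h1_i.conjugate() * y_i + h0_next * y_next_conj
    return z_i, z_next_conj


# Illustrative values only (not taken from the paper).
h0, h1 = 1 + 2j, 0.5 - 1j
d_i = (1 + 1j) / math.sqrt(2)
d_next = (1 - 1j) / math.sqrt(2)

y_i, y_next = sfbc_receive_pair(h0, h1, d_i, d_next)
z_i, z_next_conj = sfbc_decode_pair(h0, h1, h0, h1, y_i, y_next)
gain = abs(h0) ** 2 + abs(h1) ** 2
```

Under these assumptions `z_i` equals `gain * d_i` and `z_next_conj` equals `gain * d_next.conjugate()`, which is the diversity combining that the SFBC structure is meant to provide.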
The PCFICH receive architecture for the 2 × 1 MISO configuration is shown in Figure 7. The receiver decoding block (RDB) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (10), assuming that the channel frequency response vectors $\mathbf{h}^{(0)}$ and $\mathbf{h}^{(1)}$ are known. The detailed internal architecture of RDBM is shown in Figure 11. The decoder output vectors $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, 4 are represented as $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively.

The detailed structure of the receiver decoding blocks (RDB) is shown in Figure 8. The output vectors $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$, $\mathbf{z}^{(3)}$, $\mathbf{z}^{(4)}$ from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4). The detailed architecture of PB-1 is shown in Figure 9. The sum of the squared magnitudes of the elements of the difference vector between the decoded output vector $\mathbf{z}$ and the precomputed data vector $\mathbf{s}_1$ is the output $r_1$ of PB-1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for CFI = 2, 3, and 4 using PB-2, PB-3, and PB-4, respectively. The processing block outputs $r_1$, $r_2$, $r_3$, and $r_4$ are applied to the CFI determination circuit shown in Figure 4 to detect the CFI value.

Figure 13: Illustration of folded architecture of RPB in SISO and SIMO.
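The processing-block and CFI determination steps described above amount to a minimum-distance search, which can be sketched as follows. The helper names are hypothetical, and the four candidate vectors are made-up QPSK patterns used only to exercise the logic; they are not the PCFICH sequences defined in 3GPP TS 36.211.

```python
# Hypothetical QPSK alphabet and made-up 16-symbol candidates.
# These are NOT the real PCFICH codewords; they only illustrate the search.
QPSK = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
CANDIDATES = [[QPSK[(n * k) % 4] for n in range(16)] for k in (1, 2, 3, 4)]


def pb_output(z, s):
    """Processing block: sum of squared magnitudes of the elements of (z - s)."""
    return sum(abs(z_n - s_n) ** 2 for z_n, s_n in zip(z, s))


def detect_cfi(z, candidates=CANDIDATES):
    """CFI determination: pick the candidate with the smallest metric r_k."""
    r = [pb_output(z, s) for s in candidates]
    return r.index(min(r)) + 1, r


# Decoder output modelled as candidate 3 plus a small constant disturbance.
z = [s_n + (0.1 - 0.05j) for s_n in CANDIDATES[2]]
cfi, r = detect_cfi(z)
```

Here `cfi` comes out as 3 because `r[2]` is the smallest of the four metrics; in hardware the same comparison is what the CFI determination circuit of Figure 4 performs on $r_1$ to $r_4$.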
4. System Model and Implementation Architecture for MIMO Configuration
In the MIMO system, the signals at the $i$th and $(i+1)$th subcarriers in the receive array are given by

$$
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix}
=
\begin{bmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \\ h_{01}^{*} & -h_{00}^{*} \\ h_{11}^{*} & -h_{10}^{*} \end{bmatrix}
\begin{bmatrix} d_i \\ d_{i+1}^{*} \end{bmatrix}
+
\begin{bmatrix} n_i^{(0)} \\ n_{i+1}^{(0)*} \\ n_i^{(1)} \\ n_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{13}
$$
where $h_{ab}$ represents the channel frequency response vector between the $b$th transmit antenna and the $a$th receive antenna and $n_i^{(j)}$ represents the noise in the $i$th subcarrier at the $j$th receive antenna. In vector form, it is written as

$$
\mathbf{y}_i = \mathbf{H}_{\mathrm{eff},i}\,\mathbf{d}_i + \mathbf{n}_i,
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{14}
$$
where $\mathbf{y}_i$ is the 4 × 1 receive signal vector, $\mathbf{H}_{\mathrm{eff},i}$ is the 4 × 2 channel frequency response matrix at the $i$th and $(i+1)$th subcarriers, $\mathbf{d}_i$ is the 2 × 1 data vector at the $i$th and $(i+1)$th subcarriers, and $\mathbf{n}_i$ is the 4 × 1 noise vector. The objective is to detect the elements $d_i$ and $d_{i+1}$ of the data vector $\mathbf{d}_i$. Assuming that the elements of the channel frequency response matrix $\mathbf{H}_{\mathrm{eff},i}$ are perfectly known at the receiver, the decoder output vector $\mathbf{z}_i$ is given by

$$
\mathbf{z}_i = \mathbf{H}_{\mathrm{eff},i}^{H}\,\mathbf{y}_i,
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14,
\tag{15}
$$

where $\mathbf{H}_{\mathrm{eff},i}^{H}$ is the Hermitian of the 4 × 2 channel transmission matrix. This can be expanded as

$$
\begin{bmatrix} z_i \\ z_{i+1}^{*} \end{bmatrix}
=
\begin{bmatrix} h_{00}^{*} & -h_{10}^{*} & h_{01} & h_{11} \\ h_{01}^{*} & -h_{11}^{*} & -h_{00} & -h_{10} \end{bmatrix}
\begin{bmatrix} y_i^{(0)} \\ y_{i+1}^{(0)} \\ y_i^{(1)*} \\ y_{i+1}^{(1)*} \end{bmatrix},
\quad \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\tag{16}
$$
The decoder outputs are given by

$$
\begin{aligned}
z_i &= h_{00}^{*}\,y_i^{(0)} - h_{10}^{*}\,y_{i+1}^{(0)} + h_{01}\,y_i^{(1)*} + h_{11}\,y_{i+1}^{(1)*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14, \\
z_{i+1}^{*} &= h_{01}^{*}\,y_i^{(0)} - h_{11}^{*}\,y_{i+1}^{(0)} - h_{00}\,y_i^{(1)*} - h_{10}\,y_{i+1}^{(1)*}, && \text{for } i = 0, 2, 4, 6, 8, 10, 12, 14.
\end{aligned}
\tag{17}
$$
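As a rough sketch, the two decoder outputs in (17) can be computed as two row-by-vector products over the stacked receive vector of (16). The function name `mimo_decode_pair` and the numeric values are hypothetical; the coefficients are transcribed literally from (16) and (17), and the sketch makes no claim beyond reproducing those expressions.

```python
def mimo_decode_pair(h00, h01, h10, h11, y0_i, y0_next, y1_i, y1_next):
    """Return (z_i, conj(z_{i+1})) for one subcarrier pair of the 2x2 link.

    The stacked receive vector and both coefficient rows follow (16).
    """
    y_stack = [y0_i, y0_next, y1_i.conjugate(), y1_next.conjugate()]
    row_z_i = [h00.conjugate(), -h10.conjugate(), h01, h11]
    row_z_next = [h01.conjugate(), -h11.conjugate(), -h00, -h10]
    # Four complex products per output element, eight per subcarrier pair.
    z_i = sum(a * b for a, b in zip(row_z_i, y_stack))
    z_next_conj = sum(a * b for a, b in zip(row_z_next, y_stack))
    return z_i, z_next_conj


# Illustrative values only (not taken from the paper).
h00, h01, h10, h11 = 1 + 1j, 2 - 1j, -1 + 0.5j, 0.5 + 2j
y0_i, y0_next, y1_i, y1_next = 1 - 2j, 3 + 1j, -2 + 2j, 0.5 - 1j
z_i, z_next_conj = mimo_decode_pair(h00, h01, h10, h11,
                                    y0_i, y0_next, y1_i, y1_next)
```

Each output element needs four complex multiplications, twice as many as in the MISO case of (12), which is why the MIMO decoding block dominates the multiplier count.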
Figure 14: Illustration of proposed architecture for RDB and PB in MISO and MIMO (folded RDB unit followed by folded PB unit). Note: the receiver decoding block (RDB) in MISO is termed RDBM in MIMO.
The PCFICH receiver architecture of the 2 × 2 MIMO configuration is shown in Figure 10.

The receiver decoding block (RDBM) gets the 16 × 1 received signal vector $\mathbf{y}$ and computes the decoder output vector using (14), assuming that the channel frequency response vectors $\mathbf{h}^{(00)}$, $\mathbf{h}^{(01)}$, $\mathbf{h}^{(10)}$, and $\mathbf{h}^{(11)}$ are known. The 16 × 1 precomputed data vectors for CFI = 1, 2, 3, and 4 are represented as $\mathbf{s}_1^{(0)}$, $\mathbf{s}_2^{(0)}$, $\mathbf{s}_3^{(0)}$, and $\mathbf{s}_4^{(0)}$, respectively, for antenna 0 and as $\mathbf{s}_1^{(1)}$, $\mathbf{s}_2^{(1)}$, $\mathbf{s}_3^{(1)}$, and $\mathbf{s}_4^{(1)}$, respectively, for antenna 1. The received signal vectors $\mathbf{y}_i^{(0)}$ and $\mathbf{y}_i^{(1)}$ are multiplied with the four channel estimation vectors to give the decoded output vector $\mathbf{z}$, which is sent to the processing block (PB) shown in Figure 9. The decoder outputs $\mathbf{z}_i$, $i = 0, 2, 4, \ldots, 14$, are stacked as the 16 × 1 vector $\mathbf{z} = [\mathbf{z}_0^T, \mathbf{z}_2^T, \ldots, \mathbf{z}_{14}^T]^T$. Similarly, RDBM1 gives the output vector $\mathbf{z}^{(1)}$ using the precomputed data vectors $\mathbf{y}_1^{(0)}$ and $\mathbf{y}_1^{(1)}$ and the channel estimation vectors. The architecture of the PBs and the CFI detection architecture are similar to those of the MISO system. The sum of the squared magnitudes of the differences between each element in the decoded output vector $\mathbf{z}$ and its precomputed data in the vector $\mathbf{z}^{(1)}$ is the output $r_1$ of PB1. Similarly, $r_2$, $r_3$, and $r_4$ are computed for the other CFI values. The values $r_1$, $r_2$, $r_3$, and $r_4$ are compared by the CFI detector shown in Figure 4 to determine the minimum value.

Figure 15: Illustration of superscalar method for SISO and SIMO (with no complex multiplications and operating from $T_1$ to $T_4$).
5. PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implemented directly based on the basic architectures developed in Sections 3 and 4. However, in order to utilize the FPGA resources effectively, the basic architectures are implemented using modified novel architectures based on two VLSI DSP techniques, namely, the folding and superscalar processing approaches.
5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the 16 × 1 received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}_{\mathrm{eff}}^{H}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, an optimum rearrangement of the terms is carried out to minimize the number of multiplications. Further, the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$
\begin{aligned}
e &= \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \\
f &= \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}.
\end{aligned}
\tag{18}
$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$
\begin{aligned}
e &= \left[\mathrm{Re}\{y\} - \mathrm{Im}\{y\}\right]\left[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}\right] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}, \\
f &= \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}.
\end{aligned}
\tag{19}
$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), it requires only three multiplications but five additions. This kind of rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
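The three-multiplication rearrangement can be checked with a short sketch. The function name `complex_mul_3` is hypothetical, and the bracketed factor is taken as $[\mathrm{Re}\{y\} - \mathrm{Im}\{y\}][\mathrm{Re}\{h\} + \mathrm{Im}\{h\}]$, which is the sign choice that expands back to the four-multiplication form in (18).

```python
def complex_mul_3(re_h, im_h, re_y, im_y):
    """Multiply (re_h + j im_h) by (re_y + j im_y) with three real multiplications.

    Returns (e, f), the real and imaginary parts of the product.
    """
    p1 = (re_y - im_y) * (re_h + im_h)  # multiplication 1 (two additions)
    p2 = re_y * im_h                    # multiplication 2, shared by e and f
    p3 = im_y * re_h                    # multiplication 3, shared by e and f
    e = p1 - p2 + p3                    # two additions
    f = p2 + p3                         # one addition
    return e, f


# (3 + 4j) * (5 - 2j) = 23 + 14j
e, f = complex_mul_3(3, 4, 5, -2)
```

The reference product `(3 + 4j) * (5 - 2j)` gives the same real and imaginary parts, so the rearranged form trades one multiplier for three extra adders without changing the result.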
5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed to a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of RPB in SISO and SIMO configurations and of RDB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. In order to reduce the number of multipliers and adders, the folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times in a sequential way. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. At a time, one resultant signal pair is involved in the computation, using two multipliers to get the value of $z_i$. Four switches operating at system clock speed are involved in the architecture: two switches are used to pass the real part of the signal to one multiplier, while the other two switches are used to pass the imaginary part of the signal to another multiplier. The multipliers pass the products to the first adder for $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
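A behavioural sketch of the folded SISO processing block may help to picture the time multiplexing. It is a software model under stated assumptions, not the RTL: the name `folded_rpb` is hypothetical, the differences are assumed to be already stored in registers, and one loop iteration stands for one clock cycle in which the same two multipliers and the accumulator are reused.

```python
def folded_rpb(h, s, y):
    """Model one folded hardware line reused over len(y) clock cycles.

    Returns (r, valid_cycle): the accumulated metric and the clock cycle at
    which it is available (one cycle after the last accumulation).
    """
    # Differences between h[n]*s[n] and y[n], assumed to be held in registers.
    diffs = [h_n * s_n - y_n for h_n, s_n, y_n in zip(h, s, y)]
    acc = 0.0
    cycle = 0
    for d in diffs:
        cycle += 1
        re_sq = d.real * d.real  # multiplier 1: real part squared
        im_sq = d.imag * d.imag  # multiplier 2: imaginary part squared
        acc += re_sq + im_sq     # first adder, then delayed accumulator
    return acc, cycle + 1


# Illustrative 16-element vectors (not taken from the paper).
h = [complex(1, 0.5)] * 16
s = [complex((-1) ** n, 1) for n in range(16)]
y = [h_n * s_n + complex(0.25, -0.5) for h_n, s_n in zip(h, s)]
r, valid_cycle = folded_rpb(h, s, y)
```

With 16 elements the metric becomes valid at cycle 17, matching the timing stated above, while only two real multipliers are ever active instead of the 32 used by the direct structure.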
The folded architecture of the decoding block of MISO and MIMO, involving complex multiplication of the channel frequency response vector and the receive signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resource elements used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches are used to pass the first element of the receive signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches are used to pass the second element of the receive signal vector and its channel frequency response vector to another multiplier. These four switches operate at system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. The superscalar processing method includes parallel processing and pipelining strategies. In this case, parallel operation of the 16 pairs of hardware lines is arranged with pipelining of the subtraction and square magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only square magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications as SISO, and the output is obtained at every 4th clock cycle. The block “$d$” represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration, and the output is obtained at every 6th clock cycle, as shown in Figure 16. The block “$d$” represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
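The multiplier counts quoted for the superscalar method follow from simple arithmetic, sketched below. The function name is hypothetical, and the figure of four complex multiplications per hardware line for MIMO is an assumption read off the four products per output element in (17); the MISO figure of two per line is stated in the text.

```python
HARDWARE_LINES = 16          # one line per element of the 16 x 1 vector
REAL_MULTS_PER_COMPLEX = 4   # direct complex multiplication, as in (18)
PB_MULTIPLIERS = 32          # processing block: 2 multipliers per line


def superscalar_multipliers(complex_mults_per_line, rdb_cycles=2):
    """Return (RDB multipliers after reuse, total multipliers incl. PB)."""
    real_mults = HARDWARE_LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = real_mults // rdb_cycles  # reused over the RDB clock cycles
    return rdb_multipliers, rdb_multipliers + PB_MULTIPLIERS


miso = superscalar_multipliers(2)  # 2 complex multiplications per line
mimo = superscalar_multipliers(4)  # assumed 4 complex multiplications per line
latency_cycles = 2 + 4             # RDB cycles plus PB cycles
```

This reproduces 64 and 96 multipliers for MISO and 128 and 160 for MIMO, with an output every 6th clock cycle, and makes explicit that the saving comes from reusing the RDB multipliers over two clock cycles.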
6. Results and Discussion
The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target device Virtex-6 has only 768 DSP elements. Table 2 shows the performance of the proposed architectures using the folding and superscalar methods compared with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements.

Figure 19: Resource utilization graph for generalized architecture.
In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it is still suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within the LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
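The per-subcarrier time budget quoted above can be reproduced with a short calculation. The constant and function names are hypothetical; the numbers (0.5 ms slot, 7 OFDM symbols per slot, 72 subcarriers at 1.4 MHz bandwidth) are the ones given in the text.

```python
SLOT_DURATION_S = 0.5e-3     # one slot of the 10 ms radio frame
OFDM_SYMBOLS_PER_SLOT = 7
SUBCARRIERS_PER_SYMBOL = 72  # 1.4 MHz LTE bandwidth


def per_subcarrier_budget_ns(slot_s=SLOT_DURATION_S,
                             symbols=OFDM_SYMBOLS_PER_SLOT,
                             subcarriers=SUBCARRIERS_PER_SYMBOL):
    """Time available per subcarrier, in nanoseconds."""
    return slot_s / (symbols * subcarriers) * 1e9


budget_ns = per_subcarrier_budget_ns()
print(f"{budget_ns:.3f} ns")  # prints "992.063 ns"
```

A slot carries 7 × 72 = 504 subcarrier positions, so 0.5 ms divided by 504 gives the 992.063 ns limit against which the total receiver delay is compared.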
A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable “$e$” is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input “diversity.” CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (246 mW), logic (670 mW), and the block arithmetic (103 mW).

Table 3: Resource requirements of proposed architecture using folding method.

| Parameter | Value |
| --- | --- |
| Multipliers | 58 |
| Adders | 518 |
| Minimum clock period (ns) | 16.033 |
| Total delay (ns) | 256.528 |
| Speed (MHz) | 3.898 |
| Total dynamic power (mW) | 1019 |

Figure 20: Implemented device in FPGA editor.
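The timing entries of Table 3 can be cross-checked against each other and against the 992.063 ns per-subcarrier limit. The sketch below assumes that the extracted values read as a 16.033 ns minimum clock period, a 256.528 ns total delay, and a 3.898 MHz speed, and that the total delay corresponds to the 16 accumulation cycles of the folded unit; these readings are assumptions, chosen because they are mutually consistent.

```python
CLOCK_PERIOD_NS = 16.033  # assumed reading of the minimum clock period
FOLDED_CYCLES = 16        # accumulation cycles of the folded unit
BUDGET_NS = 0.5e-3 / (7 * 72) * 1e9  # per-subcarrier limit at 1.4 MHz

total_delay_ns = CLOCK_PERIOD_NS * FOLDED_CYCLES  # about 256.528 ns
detection_rate_mhz = 1e3 / total_delay_ns         # about 3.898 MHz
meets_constraint = total_delay_ns < BUDGET_NS
```

Under these assumptions the total delay is roughly a quarter of the available time, which is consistent with the statement that the folded architecture stays within the LTE timing constraint.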
Figure 18 shows the RTL schematic of the 4 diversity blocks “div0,” “div1,” “div2,” and “div3,” corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named “$e$.” The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.
7. Conclusion

In this paper, a low complexity, low resource, single or multiantenna CFI detection at the receiver has been proposed and analyzed using ModelSim and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a total hardware accommodating all the physical downlink control channels of the 3GPP-LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, “SIC and K-best LSD receiver implementation for a MIMO-OFDM system,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO ’08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance-complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, “System architecture and implementation of MIMO sphere decoders on FPGA,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 2, pp. 188–196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, “Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems,” IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, “VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, “Simulating the long term evolution physical layer,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO ’09), pp. 1471–1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, “Performance analysis of the 3GPP-LTE physical control channels,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, “FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques,” International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 13 Illustration of folded architecture of RPB in SISO and SIMO
gets the 16 times 1 received signal vector y and computes thedecoder output vector using (10) assuming that the channelfrequency response vectors h(0) and h(1) are known Thedetailed internal architecture of RDBM is shown in Figure 11The decoder output vectors z
119894 119894 = 0 2 4 14 are stacked as
16 times 1 vector z = [1199111198790 119911119879
2 119911
119879
14]119879 The 16 times 1 precomputed
data vectors forCFI= 1234 are represented as s1 s2 s3and s4
respectivelyThe detailed structure of receiver decoding blocks (RDB)
is shown in Figure 8 The output vectors z(1) z(2) z(3) z(4)from RDB-1 to RDB-4 are fed to the processing blocks (PB-1 to PB-4) The detailed architecture of PB-1 is shown inFigure 9 The sum of the square magnitude of the elementsof difference vector between decoded output vector z and theprecomputed data vector s
1is the output 119903
1of PB-1 Similarly
1199032 1199033 and 119903
4are computed for CFI = 2 3 and 4 using PB-
2 PB-3 and PB-4 respectively The processing block outputs1199031 1199032 1199033 and 119903
4are applied to the CFI determination circuit
shown in Figure 4 to detect the CFI value
4 System Model and ImplementationArchitecture for MIMO Configuration
In MIMO system the signals at 119894th and (119894 + 1)th subcarrier inthe receive array are given by
[[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]]
]
=
[[[
[
ℎ00
ℎ01ℎ10 ℎ11ℎlowast
01 minusℎlowast
00ℎlowast
11 minusℎlowast
10
]]]
]
[119889119894
119889lowast
119894+1
] +
[[[[[[[[
[
119899(0)
119894
119899(0)lowast
119894+1
119899(1)
119894
119899(1)lowast
119894+1
]]]]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14(13)
where ℎ119886119887
represents the channel frequency response vectorbetween 119887th transmit antenna and 119886th receive antenna and119899(119895)
119894represents the noise in 119894th subcarrier in 119895th receive
antenna In vector form it is written as
y119894= Heff119894d119894 + n
119894 for 119894 = 0 2 4 6 8 10 12 14 (14)
where y119894is 4times1 receive signal vectorHeff119894 is the 4times2 channel
frequency response vector at 119894th and (119894 + 1)th subcarrier d119894
is 2 times 1 data vector at 119894th and (119894 + 1)th subcarrier and n119894is
4 times 1 noise vector The objective is to detect the elements 119889119894
and 119889119894+1
of the data vector d119894 Assuming that the elements of
channel frequency response matrixHeff119894 are perfectly knownat the receiver the decoder output vector z is given by
whereH119867eff119894 is theHermitian of the 4times2 channel transmissionmatrix This can be expanded as
[119911119894
119911lowast
+1
] = [ℎlowast
00minusℎlowast
10
ℎlowast
01minusℎlowast
11
ℎ01
ℎ11
minusℎ00minusℎ10
]
[[[[[
[
119910(0)
119894
119910(0)
119894+1
119910(1)lowast
119894
119910(1)lowast
119894+1
]]]]]
]
for 119894 = 0 2 4 6 8 10 12 14
(16)
The decoder outputs are given by
119911119894= ℎlowast
00119910(0)
119894minus ℎlowast
10119910(0)
119894+1+ ℎ01119910(1)lowast
119894+ ℎ11119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
119911lowast
119894+1= ℎlowast
01119910(0)
119894minus ℎlowast
11119910(0)
119894+1minusℎ00119910(1)lowast
119894minusℎ10119910(1)lowast
119894+1
for 119894 = 0 2 4 6 8 10 12 14
(17)
10 VLSI Design
Channel frequencyresponse vector
Channel frequencyresponse vector
Received signalvector
Received signalvector
Precomputeddata vector z(k)i
from RDB-1using s1i
y i
y i
y i+1
y i+1
hi+1 hi+1
hihi
zi
Folded RDB unit
Folded PB unit
Conjugate
DRei
Imi
Rei
Rei
Rei+1
Rei+1
Imi
Imi
Imi+1
Im+1
minus
r1
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 1 3 5 7 9 11 13 15
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 0 2 4 6 8 10 12 14
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 1 3 5 15
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
2l + 0 2 4 14
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
5.1. Direct Implementation with Multiplicands Rearranged Method. In the receiver architecture for SISO and SIMO, the $16 \times 1$ received signal vector is directly subtracted from the precomputed data vector for a given CFI. This requires fewer multipliers and adders than MISO and MIMO. In MISO and MIMO configurations, complex multiplications are necessary for the multiplication of $\mathbf{H}^{H}_{\mathrm{eff}}$ with the received signal vector, which increases the number of multiplications in the CFI detection process. Hence, the terms are rearranged to minimize the number of multiplications, and the intermediate products are reused in the calculation of the real and imaginary parts. Consider the multiplication of two complex numbers $\mathrm{Re}\{h\} + j\,\mathrm{Im}\{h\}$ and $\mathrm{Re}\{y\} + j\,\mathrm{Im}\{y\}$. The real part ($e$) and imaginary part ($f$) of the output are given by

$$e = \mathrm{Re}\{h\}\,\mathrm{Re}\{y\} - \mathrm{Im}\{h\}\,\mathrm{Im}\{y\}, \qquad f = \mathrm{Re}\{h\}\,\mathrm{Im}\{y\} + \mathrm{Im}\{h\}\,\mathrm{Re}\{y\}. \tag{18}$$

This requires four multiplications and two additions. To reduce the number of multiplications, the terms in (18) are rearranged as

$$e = [\mathrm{Re}\{y\} - \mathrm{Im}\{y\}]\,[\mathrm{Re}\{h\} + \mathrm{Im}\{h\}] - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\},$$
$$f = \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}. \tag{19}$$

Since the products $\mathrm{Re}\{y\}\,\mathrm{Im}\{h\}$ and $\mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$ appear in both $e$ and $f$ in (19), each is computed once and reused, so (19) requires only three multiplications but five additions. This rearrangement of the multiplicands is employed in the processing blocks at the cost of increased additions, as shown in Figure 12.
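The three-multiplication rearrangement can be checked numerically with a short sketch. It uses the identity $e = (\mathrm{Re}\{y\} - \mathrm{Im}\{y\})(\mathrm{Re}\{h\} + \mathrm{Im}\{h\}) - \mathrm{Re}\{y\}\,\mathrm{Im}\{h\} + \mathrm{Im}\{y\}\,\mathrm{Re}\{h\}$, with the two shared products reused for $f$. The function name `complex_mul_3` and the variable names are illustrative assumptions, not identifiers from the paper.

```python
def complex_mul_3(h_re, h_im, y_re, y_im):
    """Multiply (h_re + j*h_im) by (y_re + j*y_im) with three real multiplications."""
    p1 = y_re * h_im                    # multiplication 1, shared by e and f
    p2 = y_im * h_re                    # multiplication 2, shared by e and f
    p3 = (y_re - y_im) * (h_re + h_im)  # multiplication 3 (two additions feed it)
    e = p3 - p1 + p2                    # real part (two additions)
    f = p1 + p2                         # imaginary part (one addition)
    return e, f


# Compare against Python's built-in complex arithmetic.
h, y = complex(3, 4), complex(1, 2)
e, f = complex_mul_3(h.real, h.imag, y.real, y.imag)
print((e, f), h * y)
```

The count matches the text: three multiplications and five additions or subtractions, against four multiplications and two additions for the direct form.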
5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated with single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of the RPB in SISO and SIMO configurations and of the RPB and PB in MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, a folding architecture is proposed. This architecture uses only two multipliers and performs the operation of a single hardware line 16 times sequentially. The difference between the product of the channel frequency response vector with the precomputed data vector and the received signal vector is stored in registers. In each clock cycle, one resultant signal pair (real and imaginary parts) is processed by the two multipliers to obtain the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass their products to the first adder, which forms $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
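The folded SISO datapath described above can be modelled cycle by cycle in software. The following is a behavioural sketch only, assuming the 16 difference values are already held in registers; the name `folded_metric` is hypothetical, and the switches are represented simply by the loop selecting one stored value per clock.

```python
def folded_metric(diffs):
    """Accumulate |d|^2 over stored differences, one element per clock cycle.

    Models two multipliers, a first adder forming z_i, and a second adder
    with a register accumulating z_0 ... z_15.
    """
    acc = 0.0      # accumulator register behind the second adder
    cycles = 0
    for d in diffs:                  # switches select one stored pair per clock
        re_sq = d.real * d.real      # multiplier 1: real part squared
        im_sq = d.imag * d.imag      # multiplier 2: imaginary part squared
        z_i = re_sq + im_sq          # first adder
        acc = acc + z_i              # second adder plus register
        cycles += 1
    return acc, cycles


# Hypothetical stored differences for one CFI hypothesis.
diffs = [complex(1, 2)] * 16
r, cycles = folded_metric(diffs)
print(r, cycles)  # the decision would follow on the next (17th) cycle
```

The same two multipliers are reused 16 times, which is the resource saving that folding trades against the longer latency.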
The folded architecture of the decoding block of MISO and MIMO, which involves complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces these to just 2 complex multiplications and one addition, requires five switches. Two switches pass the first element of the received signal vector and its corresponding channel frequency response vector to one multiplier, and the other two switches pass the second element of the received signal vector and its channel frequency response vector to another multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another low-resource VLSI DSP technique. Superscalar processing combines parallel processing and pipelining strategies. In this case, the 16 pairs of hardware lines operate in parallel, with pipelining of the subtraction and squared-magnitude operations for each CFI. The SISO configuration does not have complex multiplications; it has only squared-magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which involves signal processing for two receive antennas, requires twice the number of multiplications as SISO, and the output is again obtained at every 4th clock cycle. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.

For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires reuse of 128 multipliers, taking 2 clock cycles, and an additional 32 multipliers are required for the PB, taking 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration. For both configurations, the output is obtained at every 6th clock cycle, as shown in Figure 16. The block "d" represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
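The multiplier totals of 96 and 160 quoted for the superscalar method can be reproduced with simple arithmetic. The sketch below is a bookkeeping aid, not part of the design. The helper name `superscalar_multipliers` is hypothetical, and the value of 4 complex multiplications per hardware line for MIMO is an assumption inferred from the stated reuse of 128 multipliers over two cycles.

```python
LINES = 16                   # hardware lines in the RDB
REAL_MULTS_PER_COMPLEX = 4   # direct form of a complex multiplication


def superscalar_multipliers(complex_mults_per_line, rdb_cycles=2, pb_multipliers=32):
    """Physical multipliers needed when the RDB work is spread over rdb_cycles."""
    total_real = LINES * complex_mults_per_line * REAL_MULTS_PER_COMPLEX
    rdb_multipliers = total_real // rdb_cycles  # multipliers reused each cycle
    return rdb_multipliers + pb_multipliers


miso = superscalar_multipliers(complex_mults_per_line=2)  # stated in the text
mimo = superscalar_multipliers(complex_mults_per_line=4)  # assumed for MIMO
latency_cycles = 2 + 4                                    # RDB cycles + PB cycles
print(miso, mimo, latency_cycles)
```

Under these assumptions the RDB contributes 64 or 128 multipliers, the PB adds 32, and the RDB and PB stages together span six clock cycles.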
6. Results and Discussion
The proposed receiver architectures for PCFICH in SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 FPGA xc6vlx240tff1156-1 device board. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the proposed architectures using folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on folding and superscalar processing methods require fewer resource elements.

Figure 19: Resource utilization graph for generalized architecture.
In the folding approach, resource utilization is lower than in the direct and superscalar approaches at the cost of reduced speed of operation, but it remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method due to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for SISO, SIMO, MISO, and MIMO configurations.
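The per-subcarrier time budget follows from dividing one 0.5 ms slot by 7 OFDM symbols and 72 subcarriers. The sketch below works through that arithmetic and compares it with the latency of a 16-cycle folded datapath. The function names are hypothetical, and the 16.033 ns clock period is an assumed reading of the minimum clock period reported for the folded design, used here only as an example input.

```python
def per_subcarrier_budget_ns(slot_ms=0.5, symbols_per_slot=7, subcarriers=72):
    """Time available per subcarrier, in nanoseconds."""
    slot_ns = slot_ms * 1e6
    return slot_ns / (symbols_per_slot * subcarriers)


def folded_delay_ns(clock_period_ns, cycles=16):
    """Latency of a folded datapath that needs `cycles` clock periods."""
    return clock_period_ns * cycles


budget = per_subcarrier_budget_ns()   # 500000 ns / 504
delay = folded_delay_ns(16.033)       # assumed clock period in ns
print(round(budget, 3), round(delay, 3), delay < budget)
```

With these inputs the folded latency is roughly a quarter of the available budget, which is consistent with the statement that the design meets the LTE timing constraint.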
A general architecture based on the folding method, which operates in all four configurations (SISO, SIMO, MISO, and MIMO), has also been developed. In this architecture, a control variable "e" is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input "diversity". The CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (2.46 mW), logic (6.70 mW), and the block arithmetic (1.03 mW).

Table 3: Resource requirements of proposed architecture using folding method.

Parameter                        Value
Multipliers                      58
Adders                           518
Minimum clock period (ns)        16.033
Total delay (ns)                 256.528
Speed (MHz)                      3.898
Total dynamic power (mW)         10.19

Figure 20: Implemented device in FPGA editor.
Figure 18 shows the RTL schematic of the 4 diversity blocks "div0", "div1", "div2", and "div3", corresponding to SISO, SIMO, MISO, and MIMO and controlled by wires named "e". The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between the components configured into the FPGA device.
7. Conclusion
In this paper, low-complexity, low-resource single- or multi-antenna CFI detection at the receiver has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, computational complexity and resource utilization are minimized by employing arithmetic operational rearrangement and a suboptimal sequential DSP algorithm called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In future, a complete hardware design accommodating all the physical downlink control channels of 3GPP-LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the Grant to do the Project titled Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing (no. 8023RIDRPS0391112). They are also thankful to the Managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211, Version 11.0.0, Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360-3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188-196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545-3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlfuhrer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471-1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 14 Illustration of proposed architecture for RDB and PB in MISO and MIMO Note receiver decoding block (RDB) in MISO istermed as RDBM in MIMO
The PCFICH receiver architecture of 2 times 2MIMO configura-tions is shown in Figure 10
Receiver decoding block (RDBM) gets the 16times1 receivedsignal vector y and computes the decoder output vector using(14) assuming that the channel frequency response vectorsh(00) h(01) h(10) and h(11) are knownThe 16times1 precomputeddata vectors for CFI = 1 2 3 and 4 are represented as s(0)
1
s(0)2 s(0)3 and s(0)
4 respectively for antenna 0 and as s(1)
1
s(1)2 s(1)3 and s(1)
4 respectively for antenna 1 The received
signal vectors y(0)119894
and y(1)119894multiply with the four channel
estimation vectors to give decoded output vector z that is sentto the processing block (PB) which is shown in Figure 9 Thedecoder outputs 119911
119894 119894 = 0 2 4 14 are stacked as 16 times 1
vector z = [1199111198790 119911119879
2 119911
119879
14]119879 Similarly RDBM1 gives output
vector z(1) using the precomputed data vectors y(0)1
and y(1)1
and channel estimation vectors The architecture of PBs and
VLSI Design 11
clk
T1 T2 T3 T4
RPB
RPB
RPB
RPB
d
d
dd
d
d
r4
r3
r2
r1
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
52 Proposed Architecture Using Folding Method Foldingarchitecture systematically determines the control circuits inDSP architectures where multiple algorithmic operations aretime-multiplexed to a single functional unit [12] It is usedfor synthesis of DSP architectures that can be operated atsingle or multiple clocks It reduces the number of hardwarefunctional units (FUs) by a factor of 119873 at the expense ofincreased computation time
The folding architecture is introduced in the receiverstructure of RPB in SISO and SIMO configurations and ofRPB and PB in MISO and MIMO configurations as shownin Figures 13 and 14 respectively For SISO RPB there are16 hardware lines to calculate the value of 119903
1each requiring
twomultipliers Hence the number of multipliers used in oneRPB is 32 In order to reduce the number of multipliers andadders folding architecture is proposed This architectureuses only two multipliers and performs the operation of asingle hardware line 16 times in sequential wayThe differencebetween the product of channel frequency response vectorwith the precomputed data vector and the received signalvector is stored in registers At a time one resultant signalpair involves in computation using two multipliers to get thevalue of 119911
119894 Four switches operating in system clock speed
are involved in the architecture where two switches are usedto pass the real part of the signal to one multiplier whilethe other two switches are used to pass the imaginary partof the signal to another multiplier The multipliers pass theproducts to the first adder for 119911
119894 The output of the first adder
is passed to the second adder with a delay to accumulate thevalues 119911
0to 11991115into a register in subsequent clock cyclesThis
process requires 16 clock cycles and the CFI is detected at the17th clock cycle Though it takes longer time for the clockcycles to get the output the resources are minimized in thismethod
The folded architecture of decoding block of MISO andMIMO involving complex multiplication of the channelfrequency response vector and the receive signal vector isshown in Figure 14 There are 2 complex multiplications andone addition in each of the 16 hardware lines Hence totalresource elements used are 32 complex multiplications and16 additions The folded architecture which reduces to just2 complex multiplications and one addition requires fiveswitches Two switches are used to pass the first elementof the receive signal vector and its corresponding channel
12 VLSI Design
clk
T1 T2 T3 T4 T5 T6
PB
PB
PB
PB
d
d
dd
d
d
dr4
r3
r2
r1
From other hardware lines
From other hardware lines
Complexmultiplications(real part) of a
hardware line from
Complex
multiplications
(imaginary part) of
a hardware line from RDB
RDB
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
Figure 15 Illustration of superscalar method for SISO and SIMO(with no complex multiplications and operating from 119879
1to 1198794)
the CFI detection architecture are similar to that of theMISOsystem The sum of the squared magnitude of the differencebetween each element in the decoded output vector z and itsprecomputed data in the vector z(1) is the output 119903
1of PB1
Similarly 1199032 1199033 and 119903
4are computed for other CFIThe 119903
1 1199032
1199033 and 119903
4are compared to determine the minimum value by
the CFI detector shown in Figure 4
5 PCFICH Receiver Implementation Methods
The PCFICH receiver architectures can be implementeddirectly based on the basic architectures developed inSections 3 and 4 But in order to effectively utilize theresources in FPGA the implementation of basic architecturesis done using the modified novel architectures based on VLSIDSP techniques namely folding and superscalar processingapproach
51 Direct Implementation with Multiplicands RearrangedMethod In the receiver architecture for SISO and SIMOthe 16 times 1 received signal vector is directly subtracted fromthe precomputed data vector for a given CFI This requireslesser number of multipliers and adders when comparedto MISO and MIMO In MISO and MIMO configurationscomplex multiplications are necessary for the multiplicationof H119867119890119891119891
with the received signal vector It increases thenumber of multiplications in the CFI detection processHence optimum rearrangement of the terms is carried outto minimize the number of multiplications Further theintermediate products are reused in the calculation of real andimaginary parts Consider the multiplication of two complexnumbers Reℎ+119895 Imℎ and Re119910+119895 Im119910The output realpart (119890) and imaginary part (119891) terms are given by
119890 = Re ℎRe 119910 minus Im ℎ Im 119910 119891 = Re ℎ Im 119910 + Im ℎRe 119910
(18)
It requires four multiplications and two additions To reducethe number of multiplications the terms in (18) are rear-ranged as
119890 = [Re 119910 minus Im 119910] [Re ℎ minus Im ℎ]
minus Re 119910 Im ℎ + Im 119910Re ℎ
119891 = Re 119910 Im ℎ + Im 119910Re ℎ
(19)
Since the terms Re119910 Imℎ and Im119910Reℎ are in (19) itrequires only three multiplications but five additions Thiskind of rearrangement of themultiplicands is employed in theprocessing blocks at the cost of increased additions as shownin Figure 12
5.2. Proposed Architecture Using Folding Method. Folding systematically determines the control circuits in DSP architectures where multiple algorithmic operations are time-multiplexed onto a single functional unit [12]. It is used for the synthesis of DSP architectures that can be operated at single or multiple clocks. It reduces the number of hardware functional units (FUs) by a factor of $N$ at the expense of increased computation time.
The folding architecture is introduced in the receiver structure of the RPB in the SISO and SIMO configurations and of the RPB and PB in the MISO and MIMO configurations, as shown in Figures 13 and 14, respectively. For the SISO RPB, there are 16 hardware lines to calculate the value of $r_1$, each requiring two multipliers. Hence, the number of multipliers used in one RPB is 32. To reduce the number of multipliers and adders, the folding architecture uses only two multipliers and performs the operation of a single hardware line 16 times in sequence. The difference between the received signal vector and the product of the channel frequency response vector with the precomputed data vector is stored in registers. At any one time, one resultant signal pair is processed by the two multipliers to obtain the value of $z_i$. Four switches operating at the system clock speed are involved in the architecture: two switches pass the real part of the signal to one multiplier, while the other two pass the imaginary part of the signal to the other multiplier. The multipliers pass their products to the first adder to form $z_i$. The output of the first adder is passed to the second adder with a delay to accumulate the values $z_0$ to $z_{15}$ into a register in subsequent clock cycles. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle. Although more clock cycles are needed to obtain the output, the resources are minimized in this method.
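The folded SISO datapath described above can be modelled in software as a loop that reuses one pair of multipliers for 16 cycles. The Python sketch below is a behavioural illustration only, under stated assumptions: the names `folded_rpb` and `detect_cfi` are hypothetical, the candidate vectors are made-up QPSK-like sequences rather than the LTE CFI codewords, the channel values are arbitrary, and the four metrics are evaluated one after another purely for convenience.

```python
def folded_rpb(differences):
    """Accumulate |diff_i|^2 over 16 stored differences, one per clock cycle."""
    if len(differences) != 16:
        raise ValueError("the RPB processes exactly 16 hardware lines")
    accumulator = 0.0  # register written by the second adder
    cycles = 0
    for diff in differences:             # one folded hardware line per cycle
        real_sq = diff.real * diff.real  # multiplier 1: real part
        imag_sq = diff.imag * diff.imag  # multiplier 2: imaginary part
        z_i = real_sq + imag_sq          # first adder forms z_i
        accumulator += z_i               # second adder accumulates z_0..z_15
        cycles += 1
    return accumulator, cycles


def detect_cfi(y, h, candidates):
    """Return the 1-based index of the candidate with the smallest metric."""
    metrics = []
    for d in candidates:
        # Differences assumed to be precomputed and held in registers.
        diffs = [h_i * d_i - y_i for h_i, d_i, y_i in zip(h, d, y)]
        metrics.append(folded_rpb(diffs)[0])
    return metrics.index(min(metrics)) + 1, metrics


# Hypothetical stand-ins, NOT the LTE CFI codewords.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
candidates = [[QPSK[(i + k) % 4] for i in range(16)] for k in range(4)]
h = [complex(1 + 0.1 * i, 0.5 - 0.05 * i) for i in range(16)]  # arbitrary channel
y = [h_i * d_i for h_i, d_i in zip(h, candidates[2])]          # noise-free, third candidate

cfi, metrics = detect_cfi(y, h, candidates)
print(cfi)  # 3
```

With a noise-free received vector built from the third candidate, the third metric is exactly zero and the detector returns 3. The loop runs 16 times per metric, which mirrors the 16 accumulation cycles before the decision becomes available.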
The folded architecture of the decoding block of MISO and MIMO, which involves complex multiplication of the channel frequency response vector and the received signal vector, is shown in Figure 14. There are 2 complex multiplications and one addition in each of the 16 hardware lines. Hence, the total resources used are 32 complex multiplications and 16 additions. The folded architecture, which reduces this to just 2 complex multiplications and one addition, requires five switches. Two switches pass the first element of the received signal vector and its corresponding channel frequency response to one multiplier, and the other two switches pass the second element of the received signal vector and its channel frequency response to the other multiplier. These four switches operate at the system clock speed. The multipliers pass their products to the adder through the fifth switch before moving to the PB. This process requires 16 clock cycles, and the CFI is detected at the 17th clock cycle.

Figure 16: Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from $T_1$ to $T_6$).

Figure 17: Simulation waveform for PCFICH receiver.
5.3. Proposed Architecture Using Superscalar Method. The superscalar approach is another VLSI DSP technique with low resource utilization. Superscalar processing combines parallel processing and pipelining. In this case, the 16 pairs of hardware lines operate in parallel, and the subtraction and squared-magnitude operations are pipelined for each CFI. The SISO configuration has no complex multiplications and has only squared-magnitude operations. Hence, the RPB of SISO has 16 hardware lines, each having 2 multipliers, which results in a total of 32 multipliers. This setup requires more hardware resources than folding, but the output is obtained at every 4th clock cycle, as shown in Figure 15. The SIMO configuration, which processes the signals of two receive antennas, requires twice as many multiplications as SISO, and the output is again obtained at every 4th clock cycle. The block “$d$” represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
For the MISO configuration, the RDB has 16 hardware lines with 2 complex multiplications each. Since each complex multiplication requires four real multiplications, the RDB needs 128 real multiplications, which can be executed in two clock cycles by reusing 64 multipliers. A further 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 96 multipliers are required in the MISO configuration. For the MIMO configuration, the RDB requires the reuse of 128 multipliers over 2 clock cycles, and an additional 32 multipliers are required for the PB, which takes 4 clock cycles. Hence, 160 multipliers are required for the MIMO configuration. In both cases, the output is obtained at every 6th clock cycle, as shown in Figure 16. The block “$d$” represents the delay element introduced to buffer the values and produce the outputs at the same time instant.
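The multiplier counts quoted for the superscalar method follow from simple bookkeeping, which the Python sketch below reproduces. It is a back-of-the-envelope model, not the synthesized design: the function name and constants are hypothetical, two transmit and two receive antennas are assumed for the diversity cases, and each complex multiplication is counted as four real multiplications, as in the text.

```python
HARDWARE_LINES = 16
MULTS_PER_SQUARED_MAGNITUDE = 2   # Re^2 and Im^2 on each hardware line
COMPLEX_MULTS_PER_LINE = 2        # per hardware line in the RDB
REAL_MULTS_PER_COMPLEX = 4        # direct form, as counted in the text
RDB_CYCLES = 2                    # RDB multipliers are reused over 2 cycles
PB_CYCLES = 4                     # subtraction and squared-magnitude pipeline


def superscalar_budget(n_tx, n_rx):
    """Return (multipliers, output period in clock cycles) for one configuration."""
    square_stage = HARDWARE_LINES * MULTS_PER_SQUARED_MAGNITUDE  # 32
    if n_tx == 1:
        # SISO/SIMO: squared-magnitude stage only, replicated per receive antenna.
        return square_stage * n_rx, PB_CYCLES
    # MISO/MIMO: the RDB complex products are spread over RDB_CYCLES clock cycles.
    rdb_real_mults = (HARDWARE_LINES * COMPLEX_MULTS_PER_LINE
                      * REAL_MULTS_PER_COMPLEX * n_rx)
    rdb_multipliers = rdb_real_mults // RDB_CYCLES
    return rdb_multipliers + square_stage, RDB_CYCLES + PB_CYCLES


for name, (n_tx, n_rx) in {"SISO": (1, 1), "SIMO": (1, 2),
                           "MISO": (2, 1), "MIMO": (2, 2)}.items():
    print(name, superscalar_budget(n_tx, n_rx))
```

Under these assumptions, the model gives 32, 64, 96, and 160 multipliers for SISO, SIMO, MISO, and MIMO, with an output every 4 cycles for the first two and every 6 cycles for the last two.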
6. Results and Discussion
The proposed receiver architectures for PCFICH in the SISO, SIMO, MISO, and MIMO configurations are implemented using the Xilinx PlanAhead tool on the Virtex-6 xc6vlx240tff1156-1 FPGA device. The target Virtex-6 device has only 768 DSP elements. Table 2 compares the performance of the proposed architectures using the folding and superscalar methods with the direct implementation of the PCFICH receiver in terms of resource utilisation, speed, and power for all of the SISO, SIMO, MISO, and MIMO configurations. The proposed architectures based on the folding and superscalar processing methods require fewer resource elements than the direct implementation.

Figure 18: RTL schematic of the generalized architecture, with blocks Div0 to Div3 (SISO, SIMO, MISO, and MIMO) enabled by the signals “e” according to the Diversity input.

Figure 19: Resource utilization graph for generalized architecture.
In the folding approach, resource utilization is lower than in the direct and superscalar approaches, at the cost of reduced speed of operation, but the approach remains suitable for real-time frame timings. When the LTE-A system operates at 1.4 MHz bandwidth, the maximum time available for detection at each subcarrier is 992.063 ns, since each slot of 0.5 ms duration in a frame (10 ms radio frame duration) consists of 7 OFDM symbols and there are 72 subcarriers along one OFDM symbol. The total delay in the receiver architecture is within this LTE time constraint. The dynamic power consumption is lower in the folding method than in the superscalar method owing to the decrease in block arithmetic. The direct method does not require sequential execution and clocking, and hence its total power consumption is due to static power. Hence, it is inferred that the proposed architecture based on the folding method is more suitable for CFI detection. The simulation waveform of the proposed architecture based on the folding method is shown in Figure 17 for the SISO, SIMO, MISO, and MIMO configurations.
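The timing budget quoted above can be reproduced with a few lines of arithmetic. The Python sketch below is only a worked check with hypothetical function names. It assumes that the per-subcarrier budget is the slot duration divided by the number of OFDM symbols and subcarriers, and it takes the minimum clock period as 16.033 ns, that is, the Table 3 entry read with its decimal point (an assumption supported by 16 × 16.033 = 256.528).

```python
SLOT_DURATION_S = 0.5e-3   # one LTE slot
SYMBOLS_PER_SLOT = 7       # OFDM symbols per slot
SUBCARRIERS = 72           # subcarriers at 1.4 MHz bandwidth


def per_subcarrier_budget_ns():
    """Time available per subcarrier, in nanoseconds."""
    return SLOT_DURATION_S / (SYMBOLS_PER_SLOT * SUBCARRIERS) * 1e9


def folding_delay_ns(clock_period_ns, cycles=16):
    """Total delay of the folded datapath for a given clock period."""
    return clock_period_ns * cycles


budget = per_subcarrier_budget_ns()
delay = folding_delay_ns(16.033)   # clock period assumed from Table 3
print(f"budget = {budget:.3f} ns, folded delay = {delay:.3f} ns")
print("within budget:", delay < budget)
```

Under these assumptions the folded datapath uses roughly a quarter of the available time per subcarrier, which is consistent with the statement that the total delay meets the LTE time constraint.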
A general architecture based on the folding method, which operates in all four SISO, SIMO, MISO, and MIMO configurations, has also been developed. In this architecture, a control variable “$e$” is used to enable or disable the submodules SISO, SIMO, MISO, or MIMO according to the selection input “diversity.” CFI is detected at every 17th clock cycle. The synthesis results of the general architecture based on folding show that it utilizes minimum resources in the XC6VLX240TFF1156-1 Virtex-6 device (768 DSPs). This is summarized in Table 3. Dynamic power consumption is due to internal switching contributed by the clock (2.46 mW), logic (6.70 mW), and the block arithmetic (1.03 mW).

Table 3: Resource requirements of proposed architecture using folding method.

Parameter                      Value
Multipliers                    58
Adders                         518
Minimum clock period (ns)      16.033
Total delay (ns)               256.528
Speed (MHz)                    3.898
Total dynamic power (mW)       10.19

Figure 20: Implemented device in FPGA editor.
Figure 18 shows the RTL schematic of the 4 diversity blocks “div0,” “div1,” “div2,” and “div3,” corresponding to SISO, SIMO, MISO, and MIMO, controlled by wires named “$e$.” The power consumed includes both static power and dynamic power due to internal switching.

Figure 19 shows the resource utilization graph, which gives the percentage of registers, lookup tables (LUTs), slices, DSP elements, and buffers used.

Figure 20 shows the implemented device in the FPGA editor, with the implemented components and the interconnections between them configured into the FPGA device.
7. Conclusion
In this paper, low-complexity, low-resource CFI detection for single- or multi-antenna receivers has been proposed, analyzed using ModelSim, and implemented in the Virtex-6 device using the Xilinx PlanAhead tool. In the receiver, the computational complexity and the resource utilization are minimized by rearranging the arithmetic operations and by employing a suboptimal sequential DSP technique called the folding approach. The proposed architecture using the folding method complies with the LTE frame timing constraint in the SISO, SIMO, MISO, and MIMO configurations. It is a suitable solution for the area-optimized hardware implementation of receiver structures for PCFICH. In the future, a complete hardware design accommodating all the physical downlink control channels of 3GPP LTE-A with low resource utilization could be synthesized and implemented.
Conflict of Interests
The authors do not have a direct financial relation with any commercial identity mentioned in the paper or any other conflict of interests.
Acknowledgments
The authors wish to express their sincere thanks to the All India Council for Technical Education, New Delhi, for the grant to do the project titled “Design of Testbed for the Development of Optimized Architectures of MIMO Signal Processing” (no 8023RIDRPS0391112). They are also thankful to the managements of Mepco Schlenk Engineering College, Sivakasi, and Thiagarajar College of Engineering, Madurai, for their constant support and encouragement to carry out this research work successfully.
References
[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, “SIC and K-best LSD receiver implementation for a MIMO-OFDM system,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, “Performance-complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, “System architecture and implementation of MIMO sphere decoders on FPGA,” IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, “Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems,” IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, “VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, “Simulating the long term evolution physical layer,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, “Performance analysis of the 3GPP-LTE physical control channels,” EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, “FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques,” International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Figure 16 Illustration of superscalar method for MISO and MIMO (with complex multiplications and operating from 1198791to 1198796)
Figure 17 Simulation waveform for PCFICH receiver
frequency response vector to one multiplier and other twoswitches are used to pass the second element of receive signalvector and its channel frequency response vector to anothermultiplierThese four switches operate in system clock speedThe multipliers pass their products to the adder through thefifth switch before moving to PB This process requires 16clock cycles and the CFI is detected at the 17th clock cycle
53 Proposed Architecture Using Superscalar Method Super-scalar approach is another low resource utilizing VLSI DSPtechnique The superscalar processing method includes par-allel processing and pipelining strategies In this case paralleloperation for the 16 pairs of hardware lines is arrangedwith pipelining of the subtraction and square magnitudeoperations for each CFI SISO configuration does not havecomplex multiplications and it has only square magnitudeoperations Hence the RPB of SISOhas 16 hardware lines eachhaving 2 multipliers which results to a total of 32 multipliers
This setup requires more hardware resources than foldingbut the output is obtained at every 4th clock cycle as shownin Figure 15 SIMO configuration which involves two receiveantenna signal processing requires twice the number ofmultiplications as that of SISO and the output is obtainedat every 4th clock cycle The block ldquo119889rdquo represents the delayelement introduced to buffer the values and produce theoutputs at the same time instant
For MISO configuration the RDB has 16 hardware lineswith 2 complex multiplications each Since each complexmultiplication requires four real multiplications RDB can beexecuted in two clock cycles by reusing 64 multipliers 32multipliers are required for PB taking 4 clock cyclesHence 96multipliers are required in MISO configuration For MIMOconfiguration the RDB requires reuse of 128 multiplierstaking 2 clock cycles and an additional 32 multipliers arerequired for the PB taking 4 clock cycles Hence 160 multi-pliers are required for MISO configuration and the output isobtained at every 6th clock cycle as shown in the Figure 16The block ldquo119889rdquo represents the delay element introduced tobuffer the values and produce the outputs at the same timeinstant
6 Results and Discussion
The proposed receiver architectures for PCFICH in SISOSIMO MISO and MIMO configurations are implementedusing the Xilinx PlanAhead tool on the Virtex-6 FPGAxc6vlx240tff1156-1 device board The target device Virtex-6has only 768 DSP elements Table 2 shows the performanceof the proposed architectures using folding and superscalarmethods being compared with the direct implementationof PCFICH receiver in terms of resource utilisation speedand power for all the SISO SIMO MISO and MIMO
VLSI Design 13
clk clk
clk
clk
clk
clkDiversity[10]
Diversity[1] GND 1 o wide mux 1Address[10] Output[30]
RTL rom 2 4
3
2
1
0
e
e
e
e
Div0
Div1
Div2
Div3
SISO
SIMO
MISO
MIMO
Count CFI1[490]Count CFI2[490]Count CFI3[490]Count CFI4[490]Out[310] Op
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
Figure 19 Resource utilization graph for generalized architecture
configurations The proposed architectures based on foldingand superscalar processing methods require less number ofresource elements
In the folding approach resource utilization is less com-pared to the direct and superscalar approach at the cost ofreduced speed of operation but it is suitable for real-time
frame timings When the LTE-A system operates at 14MHzbandwidth maximum time available for detection at eachsubcarrier is 992063 ns since each slot of 05ms duration in aframe (10ms radio frame duration) consists of 7 OFDM sym-bols and there are 72 subcarriers along one OFDM symbolThe total delay in the receiver architecture is within the LTEtime constraint The dynamic power consumption is less inthe folding method compared to superscalar method due todecrease in block arithmetic Direct method does not requiresequential execution and clocking and hence total powerconsumption is due to static power Hence it is inferredthat the proposed architecture based on folding method ismore suitable for CFI detection The simulation waveformof the proposed architecture based on folding method isshown in Figure 17 for SISO SIMO MISO and MIMOconfigurations
A general architecture based on folding method whichoperates at all the four SISO SIMO MISO and MIMOconfigurations has also been developed In this architecture
14 VLSI Design
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36 211 Version 11 0 0 Release 11 (2012-10) EvolvedUniversal Terrestrial Radio Access (E-UTRA) Physical Chan-nels and Modulation 2012
[2] J Ketonen and M Juntti ldquoSIC and K-best LSD receiverimplementation for a MIMO-OFDM systemrdquo in Proceedings ofthe 16th European Signal Processing Conference (EUSIPCO rsquo08)Lausanne Switzerland August 2008
[3] J Ketonen M Juntti and J R Cavallaro ldquoPerformancemdashcomplexity comparison of receivers for a LTE MIMOOFDMsystemrdquo IEEE Transactions on Signal Processing vol 58 no 6pp 3360ndash3372 2010
[4] X Huang C Liang and J Ma ldquoSystem architecture andimplementation of MIMO sphere decoders on FPGArdquo IEEETransactions on Very Large Scale Integration Systems vol 16 no2 pp 188ndash196 2008
[5] Y Lin Y Chen C Chu C Zhan and A Wu ldquoDual-modelow-complexity codebook searching algorithm and VLSI archi-tecture for LTELTE-advanced systemsrdquo IEEE Transactions onSignal Processing vol 61 no 14 pp 3545ndash3562 2013
[6] DWu J Eilert R Asghar and D Liu ldquoVLSI implementation ofa fixed-complexity soft-output MIMO detector for high-speedwirelessrdquo EURASIP Journal on Wireless Communications andNetworking vol 2010 Article ID 893184 13 pages 2010
[7] C Mehlfuhrer M Wrulich J C Ikuno D Bosanska and MRupp ldquoSimulating the long term evolution physical layerrdquo in
VLSI Design 15
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999
Table 3 Resource requirements of proposed architecture using folding method
Parameter Multipliers Adders Minimum clock period (ns) Total delay (ns) Speed (MHz) Total dynamic power (mW)Value 58 518 16033 256528 3898 1019
Figure 20 Implemented device in FPGA editor
a control variable ldquo119890rdquo is used to enable or disable thesubmodules SISO SIMO MISO or MIMO according tothe selection input ldquodiversityrdquo CFI is detected at every 17thclock cycle The synthesis results of a general architecturebased on folding show that it utilizes minimum resourcesin XC6VLX240TFF1156-1 Virtex 6 device (768 DSPs) Thisis summarized in Table 3 Dynamic power consumption isdue to internal switching contributed by the clock (246mW)logic (670mW) and the block arithmetic (103mW)
Figure 18 shows the RTL schematic of 4 diversity blocksldquodiv0rdquo ldquodiv1rdquo ldquodiv2rdquo and ldquodiv3rdquo corresponding to SISO SIMOMISO and MIMO controlled by wires named ldquo119890rdquo Powerconsumed includes both static power and dynamic power dueto internal switching
Figure 19 shows the resource utilization graph whichshows the percentage of registers lookup tables (LUTs) slicesDSP elements and buffers used
Figure 20 shows the implemented device in FPGA editorwith the implemented components and interconnectionsbetween the components configured into the FPGA device
7 Conclusion
In this paper low complexity low resource single or multi-antenna CFI detection at the receiver system has been pro-posed and analyzed using modelsim and implementation inthe Virtex-6 device in Xilinx PlanAhead tool In the receivercomputational complexity and the resource utilization areminimized by employing arithmetic operational rearrange-ment and suboptimal sequential DSP algorithm called thefolding approach The proposed architecture using folding
method complies with the LTE frame timing constraint inSISO SIMOMISO andMIMO configurations It is a suitablesolution for the area optimized hardware implementation ofreceiver structures for PCFICH In future a total hardwareaccommodating all the physical downlink control channelsof the 3GPP-LTE-A with low resource utilization could besynthesized and implemented
Conflict of Interests
The authors do not have direct financial relation with anycommercial identity mentioned in the paper or any otherconflict of interests
Acknowledgments
The authors wish to express their sincere thanks to All IndiaCouncil for Technical Education NewDelhi for the Grant todo the Project titled Design of Testbed for the Developmentof Optimized Architectures of MIMO Signal Processing(no 8023RIDRPS0391112) They are also thankful tothe Managements of Mepco Schlenk Engineering CollegeSivakasi and Thiagarajar College of Engineering Maduraifor their constant support and encouragement to carry outthis research work successfully
References
[1] 3GPP TS 36.211 Version 11.0.0 Release 11 (2012-10), Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 2012.
[2] J. Ketonen and M. Juntti, "SIC and K-best LSD receiver implementation for a MIMO-OFDM system," in Proceedings of the 16th European Signal Processing Conference (EUSIPCO '08), Lausanne, Switzerland, August 2008.
[3] J. Ketonen, M. Juntti, and J. R. Cavallaro, "Performance-complexity comparison of receivers for a LTE MIMO-OFDM system," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3360–3372, 2010.
[4] X. Huang, C. Liang, and J. Ma, "System architecture and implementation of MIMO sphere decoders on FPGA," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 2, pp. 188–196, 2008.
[5] Y. Lin, Y. Chen, C. Chu, C. Zhan, and A. Wu, "Dual-mode low-complexity codebook searching algorithm and VLSI architecture for LTE/LTE-advanced systems," IEEE Transactions on Signal Processing, vol. 61, no. 14, pp. 3545–3562, 2013.
[6] D. Wu, J. Eilert, R. Asghar, and D. Liu, "VLSI implementation of a fixed-complexity soft-output MIMO detector for high-speed wireless," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 893184, 13 pages, 2010.
[7] C. Mehlführer, M. Wrulich, J. C. Ikuno, D. Bosanska, and M. Rupp, "Simulating the long term evolution physical layer," in Proceedings of the 17th European Signal Processing Conference (EUSIPCO '09), pp. 1471–1478, Glasgow, UK, August 2009.
[8] S. J. Thiruvengadam and L. M. A. Jalloul, "Performance analysis of the 3GPP-LTE physical control channels," EURASIP Journal on Wireless Communications and Networking, vol. 2010, Article ID 914934, 10 pages, 2010.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[10] S. S. A. Abbas and S. J. Thiruvengadam, "FPGA implementation of 3GPP-LTE-A physical downlink control channel using diversity techniques," International Journal of Wireless and Mobile Computing, vol. 9, no. 2, p. 84, 2013.
[11] S. Ahmadi, LTE-Advanced: A Practical Systems Approach to Understanding 3GPP LTE Releases 10 and 11 Radio Access Technologies, Academic Press, 2013.
[12] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley-Interscience, 1999.
Proceedings of the 17th European Signal Processing Conference(EUSIPCO rsquo09) pp 1471ndash1478 Glasgow UK August 2009
[8] S J Thiruvengadam and L M A Jalloul ldquoPerformance anal-ysisof the 3GPP-LTE physical control channelsrdquo EURASIPJournalon Wireless Communications and Networking vol 2010Article ID 914934 10 pages 2010
[9] S M Alamouti ldquoA simple transmit diversity technique forwireless communicationsrdquo IEEE Journal on Selected Areas inCommunications vol 16 no 8 pp 1451ndash1458 1998
[10] S S A Abbas and S JThiruvengadam ldquoFPGA implementationof 3GPP-LTE-A physical downlink control channel using diver-sity techniquesrdquo International Journal of Wireless and MobileComputing vol 9 no 2 p 84 2013
[11] S Ahmadi LTE-Advanced A Practical Systems Approach toUnderstanding 3GPP LTE Releases 10 and 11 Radio AccessTechnologies Sassan Ahmadi Academic Press 2013
[12] K K Parhi VLSI Digital Signal Processing SystemsmdashDesign andImplementation Wiley-Interscience 1999