-
Zuara Technologies Battle with bugs
No.82, Station road, Radha nagar, Chrompet, Chennai-44. Mobile:
09095188016. Mail.id: [email protected]
Web site: www.zuaratech.com
82, Station road, Radha nagar,Chrompet Chennai-44
Mob.: 9095188016/9677465689
1. ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for Real-Time Segmentation of High Definition Video
Background identification is a common feature in many video processing systems. This paper proposes two hardware implementations of the OpenCV version of the Gaussian mixture model (GMM), a background identification algorithm. The implemented version of the algorithm allows a fast initialization of the background model, while an innovative, hardware-oriented formulation of the GMM equations enables the proposed circuits to perform real-time background identification on high-definition (HD) video sequences with a frame size of 1920 × 1080. The first of the two circuits targets commercial field-programmable gate-array (FPGA) devices. When implemented on a Virtex-6 vlx75t, the proposed circuit processes 91 HD frames per second (fps) and uses 3% of the FPGA logic resources. The second circuit targets the UMC 90-nm CMOS standard-cell technology and is proposed in two versions, both of which can process more than 60 HD fps. The first version uses the constant voltage scaling technique to provide a low-power implementation; it occupies a silicon area of 28847 μm² and dissipates 15.3 pJ per pixel. The second version is designed to reduce silicon area utilization; it occupies 21847 μm² and dissipates 49.4 pJ per pixel.
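The per-pixel update that such GMM circuits evaluate can be sketched in software. This is a simplified Stauffer-Grimson style model, not the OpenCV variant or the hardware-oriented reformulation used in the paper; the class name `PixelGMM` and all parameter values (3 components, learning rate 0.05, 2.5-sigma match test, 0.7 background weight fraction, variance floor) are illustrative assumptions.

```python
import math


class PixelGMM:
    """Simplified per-pixel Gaussian mixture background model (illustrative parameters)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0, bg_fraction=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var, self.bg_fraction = init_var, min_var, bg_fraction
        self.w, self.mu, self.var = [], [], []

    def _ranked(self):
        # High weight and low variance => most "background-like" component first.
        return sorted(range(len(self.w)),
                      key=lambda i: self.w[i] / math.sqrt(self.var[i]), reverse=True)

    def update(self, x):
        """Classify pixel value x with the current model, then update it. True = foreground."""
        ranked = self._ranked()
        # Background = smallest ranked prefix whose cumulative weight exceeds bg_fraction.
        background, total = set(), 0.0
        for i in ranked:
            background.add(i)
            total += self.w[i]
            if total > self.bg_fraction:
                break
        # First ranked component within 2.5 standard deviations matches x.
        match = next((i for i in ranked
                      if abs(x - self.mu[i]) <= 2.5 * math.sqrt(self.var[i])), None)
        is_foreground = match is None or match not in background

        for i in range(len(self.w)):
            self.w[i] = (1 - self.alpha) * self.w[i] + (self.alpha if i == match else 0.0)
        if match is not None:
            rho = self.alpha  # simplification: rho = alpha instead of alpha * N(x | mu, var)
            d = x - self.mu[match]
            self.mu[match] += rho * d
            self.var[match] = max((1 - rho) * self.var[match] + rho * d * d, self.min_var)
        elif len(self.w) < self.k:
            self.w.append(self.alpha)
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:
            # Replace the least probable component with a wide Gaussian centred on x.
            j = min(range(self.k), key=lambda i: self.w[i])
            self.w[j], self.mu[j], self.var[j] = self.alpha, float(x), self.init_var
        s = sum(self.w)
        self.w = [wi / s for wi in self.w]
        return is_foreground
```

A frame-level implementation would hold one such model per pixel; the hardware designs in the paper instead pipeline these equations in fixed point.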
2. Design and FPGA Implementation of High-Speed, Fixed-Latency Serial Transceivers
Fixed-latency serial links are important components of distributed measurement and control systems. However, most high-speed serializer-deserializer (SerDes) chips do not keep the same link latency after each power-up or reset. In this paper, we propose a fixed-latency serial transceiver based on dynamic clock phase shifting and changeable delay tuning technologies. Our solution can handle all possible phase offsets between the transmitted and received clocks, so it relaxes the requirement of distributing the same reference clock to both the transmitter and the receiver. It also eliminates the reset-relock process of the roulette approach. We present a specific implementation based on the serial transceiver in the Xilinx Virtex-5 FPGA. Experimental results indicate that our transceiver can achieve a deterministic latency with sub-nanosecond precision.
3. DART: A Programmable Architecture for NoC Simulation on FPGAs
The increased demand for on-chip communication bandwidth resulting from the multicore trend has made packet-switched networks-on-chip (NoCs) a more compelling choice for the communication backbone in next-generation systems. However, NoC designs involve many power, area, and performance tradeoffs in topology, buffer sizes, routing algorithms, and flow control mechanisms; hence, the study of new NoC designs can be very time-intensive. To address these challenges, we propose DART, a fast and flexible FPGA-based NoC simulation architecture. Rather than laying the NoC out in hardware on the FPGA like previous approaches, our design virtualizes the NoC by mapping its components to a generic NoC simulation engine composed of a fully connected collection of fundamental components (e.g., routers and flit queues). This approach has two main advantages: 1) because it is virtualized, it can simulate any NoC, and 2) any NoC can be mapped to the engine without rebuilding it, which can take significant time for a large FPGA design. We demonstrate 1) that an implementation of DART on a Virtex-II Pro FPGA can achieve a speedup of over 100× over the cycle-based software simulator Booksim while maintaining the same level of simulation accuracy, and 2) that a more modern Virtex-6 FPGA can accommodate a 49-node DART implementation.
4. Defense Against Primary User Emulation Attacks in Cognitive Radio Networks Using Advanced Encryption Standard
This paper considers primary user emulation attacks in cognitive radio networks operating in the white spaces of the digital TV (DTV) band. We propose a reliable AES-assisted DTV scheme, in which an AES-encrypted reference signal is generated at the TV transmitter and used as the sync bits of the DTV data frames. Because the transmitter and the receiver share a secret, the reference signal can be regenerated at the receiver and used to accurately identify the authorized primary users. In addition, when combined with an analysis of the autocorrelation of the received signal, the presence of a malicious user can be detected accurately whether or not the primary user is present. We analyze the effectiveness of the proposed approach through both theoretical analysis and simulation examples. It is shown that with the AES-assisted DTV scheme, both the primary user and the malicious user can be detected with high accuracy under primary user emulation attacks. It should be emphasized that the proposed scheme requires no changes in hardware or system structure except for a plug-in AES chip. Potentially, it can be applied directly to today's DTV system under primary user emulation attacks for more efficient spectrum sharing.
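The core idea (only a key holder can regenerate the sync reference, and the receiver authenticates by correlation) can be sketched as below. The Python standard library has no AES, so this sketch plainly swaps in HMAC-SHA256 in counter mode as the keyed pseudo-random function; the function names, the 0.5 decision threshold, and the frame-index input are hypothetical. The autocorrelation-based malicious-user detection from the abstract is not shown.

```python
import hashlib
import hmac
import random


def reference_bits(key: bytes, frame_index: int, n_bits: int) -> list:
    """Keyed pseudo-random sync sequence for one DTV frame.

    Stand-in: HMAC-SHA256 in counter mode replaces AES; both act here as a keyed
    pseudo-random function that only holders of `key` can regenerate.
    """
    bits, counter = [], 0
    while len(bits) < n_bits:
        msg = frame_index.to_bytes(8, "big") + counter.to_bytes(4, "big")
        for byte in hmac.new(key, msg, hashlib.sha256).digest():
            bits.extend((byte >> (7 - i)) & 1 for i in range(8))
        counter += 1
    return bits[:n_bits]


def correlation(rx, ref):
    """Map bits to +/-1 and average the products: ~1 for a match, ~0 for unrelated bits."""
    return sum((2 * a - 1) * (2 * b - 1) for a, b in zip(rx, ref)) / len(ref)


def is_authorized(rx_sync, key, frame_index, threshold=0.5):
    ref = reference_bits(key, frame_index, len(rx_sync))
    return correlation(rx_sync, ref) > threshold


if __name__ == "__main__":
    key = b"shared-secret"                              # hypothetical shared secret
    legit = reference_bits(key, 7, 512)
    rng = random.Random(1)
    noisy = [b ^ (rng.random() < 0.1) for b in legit]   # ~10% bit errors on the channel
    forged = [rng.randint(0, 1) for _ in range(512)]    # emulator without the key
    print(is_authorized(noisy, key, 7), is_authorized(forged, key, 7))
```

Correlating against a per-frame reference, rather than checking exact equality, keeps the check tolerant of channel bit errors.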
5. Energy-Efficient Resource Allocation in OFDM Systems With Distributed Antennas
In this paper, we develop an energy-efficient resource-allocation scheme with proportional fairness for downlink multiuser orthogonal frequency-division multiplexing (OFDM) systems with distributed antennas. Our aim is to maximize energy efficiency (EE) under the constraints of the overall transmit power of each remote access unit (RAU), proportional fairness data rates, and bit error rates (BERs). Because the optimization problem is nonconvex, obtaining the optimal solution is extremely computationally complex. Therefore, we develop a low-complexity suboptimal algorithm that separates subcarrier allocation and power allocation. In this algorithm, we first allocate subcarriers by assuming equal power distribution. Then, by exploiting the properties of fractional programming, we transform the nonconvex optimization problem in fractional form into an equivalent optimization problem in subtractive form, which has a tractable solution. Next, an optimal energy-efficient power-allocation algorithm is developed to maximize EE while maintaining proportional fairness. Through computer simulation, we demonstrate the effectiveness of the proposed low-complexity algorithm and illustrate the fundamental tradeoff between energy-efficient and spectral-efficient transmission designs.
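The fractional-to-subtractive transformation is the classic Dinkelbach method. A minimal sketch, assuming subcarriers are already allocated (one gain per subcarrier), a single total power budget, and a constant circuit power; the proportional-fairness and BER constraints of the paper are deliberately omitted, and all names are illustrative.

```python
import math


def water_fill(gains, level):
    return [max(level - 1.0 / g, 0.0) for g in gains]


def solve_subtractive(gains, q, p_max):
    """argmax_p sum(log2(1 + p_k g_k)) - q * sum(p_k)  s.t. sum(p_k) <= p_max, p_k >= 0."""
    if q > 0:
        p = water_fill(gains, 1.0 / (q * math.log(2)))  # unconstrained stationary point
        if sum(p) <= p_max:
            return p
    # Power budget is active: bisect the water level until total power equals p_max.
    lo, hi = 0.0, p_max + max(1.0 / g for g in gains)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(water_fill(gains, mid)) > p_max:
            hi = mid
        else:
            lo = mid
    return water_fill(gains, lo)


def sum_rate(gains, p):
    return sum(math.log2(1.0 + pk * g) for pk, g in zip(p, gains))


def dinkelbach(gains, p_max, p_circuit, tol=1e-9, max_iter=50):
    """Maximize EE = rate / (p_circuit + sum p) through the subtractive form F(q)."""
    q = 0.0
    for _ in range(max_iter):
        p = solve_subtractive(gains, q, p_max)
        rate, power = sum_rate(gains, p), p_circuit + sum(p)
        if rate - q * power < tol:  # F(q) = 0 at the optimal energy efficiency
            break
        q = rate / power
    return rate / power, p
```

Each iteration solves a concave subtractive problem (water-filling with a power cap), and q increases monotonically toward the maximum EE.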
6. Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating
Clock gating is a predominant technique used for power saving. It is observed that the commonly used synthesis-based gating still leaves a large amount of redundant clock pulses. Data-driven gating aims to disable these. To reduce the hardware overhead involved, flip-flops (FFs) are grouped so that they share a common clock enabling signal. The question of which group size maximizes the power savings was answered in a previous paper. Here we answer the question of which FFs should be placed in a group to maximize the power reduction. We propose a practical solution based on the toggling activity correlations of FFs and their physical position proximity constraints in the layout. Our data-driven clock gating is integrated into a commercial Electronic Design Automation (EDA) backend design flow, achieving a total power reduction of 15%-20% for various types of large-scale state-of-the-art industrial and academic designs in 40- and 65-nanometer process technologies. These savings are achieved on top of the savings obtained by clock gating synthesis performed by commercial EDA tools and by gating manually inserted into the register transfer level design.
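Why toggle correlation matters can be shown with a small cost model: a shared enable fires whenever any FF in the group toggles, so every other FF in the group receives a redundant pulse in that cycle. For a pair, the redundant pulses equal the Hamming distance between the two toggle traces. The greedy pairing below is an illustrative sketch with a fixed group size of 2 and no layout-proximity constraint; it is not the paper's design flow, and the function names are hypothetical.

```python
from itertools import combinations


def enabled_cycles(traces, group):
    """Cycles in which the shared enable fires: any FF of the group toggles."""
    return sum(1 for cycle in zip(*(traces[i] for i in group)) if any(cycle))


def redundant_pulses(traces, groups):
    """Clock pulses delivered to an FF in a cycle where that FF does not toggle."""
    total = 0
    for g in groups:
        total += len(g) * enabled_cycles(traces, g) - sum(sum(traces[i]) for i in g)
    return total


def greedy_pairs(traces):
    """Greedily pair FFs whose toggle traces are most alike (fewest redundant pulses)."""
    free = set(range(len(traces)))
    groups = []
    while len(free) > 1:
        best = min(combinations(sorted(free), 2),
                   key=lambda pair: redundant_pulses(traces, [pair]))
        groups.append(best)
        free -= set(best)
    groups.extend((i,) for i in sorted(free))  # an odd FF out stays alone
    return groups
```

With traces sampled from simulation, pairing correlated FFs (for example, bits of the same register that load together) keeps the enable's duty cycle close to each member's own activity.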
7. Effect of Image Downsampling on Steganographic Security
The accuracy of steganalysis in digital images primarily depends on the statistical properties of neighboring pixels, which are strongly affected by the image acquisition pipeline as well as by any processing applied to the image. In this paper, we study how the detectability of embedding changes is affected when the cover image is downsampled prior to embedding. This topic is important for practitioners because the vast majority of images posted on websites or image sharing portals, or attached to e-mails, are downsampled. It is also relevant to researchers, as the security of steganographic algorithms is commonly evaluated on databases of downsampled images. In the first part of this paper, we investigate empirically how the steganalysis results depend on the parameters of the resizing algorithm: the choice of the interpolation kernel, the scaling factor (resize ratio), antialiasing, and the downsampled pixel grid alignment. We report on several novel phenomena that appear valid universally across the tested cover sources, steganographic methods, and steganalysis features. This paper continues with a theoretical analysis of the simplest interpolation kernel, the box kernel. By fitting a Markov chain model to pixel rows, we analytically compute the Fisher information rate for any mutually independent embedding operation and derive the proper scaling of the secure payload with resizing. For least significant bit (LSB) matching and a limited range of downscaling, the theory fits experiments rather well, which indicates the existence of a new scaling law expressing the length of the secure payload when the cover size is modified by subsampling.
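The two operations the theoretical part combines, box-kernel downsampling and LSB matching, are short enough to sketch directly. This sketch assumes an integer scaling factor and 8-bit grayscale values stored as nested lists; it illustrates the operations only, not the steganalysis or the Fisher-information computation, and the function names are hypothetical.

```python
import random


def box_downsample(img, k):
    """Downsample by an integer factor k with the box kernel: average each k x k block."""
    h, w = len(img) // k, len(img[0]) // k
    return [[int(sum(img[r * k + i][c * k + j] for i in range(k) for j in range(k))
                 / (k * k) + 0.5)                      # round half up
             for c in range(w)] for r in range(h)]


def lsb_match(pixels, bits, rng):
    """LSB matching (+/-1 embedding): if a pixel's LSB differs from the bit, add or subtract 1."""
    out = list(pixels)
    for i, b in enumerate(bits):
        if out[i] % 2 != b:
            step = rng.choice((-1, 1))
            if out[i] == 0:
                step = 1        # stay inside the 0..255 range
            elif out[i] == 255:
                step = -1
            out[i] += step
    return out
```

Unlike LSB replacement, the random sign of the +/-1 change avoids the pairs-of-values asymmetry, which is why LSB matching is a common baseline in such studies.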
8. An FPGA-Based Fully Synchronized Design of a Bilateral Filter for Real-Time Image Denoising
In this paper, a detailed description of a synchronous field-programmable gate array implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for one unique reason: it reduces noise while preserving details. The design is described at the register-transfer level. The distinctive feature of our design concept is a change of clock domain that makes kernel-based processing possible, that is, the processing of the entire filter window within one pixel clock cycle. This feature of the kernel-based design is supported by arranging the input data into groups so that the internal clock of the design is a multiple of the pixel clock given by a targeted system. Additionally, by exploiting the separability and the symmetry of one filter component, the complexity of the design is greatly reduced. Combining these features, the bilateral filter is implemented as a highly parallelized pipeline structure with very economical and effective utilization of dedicated resources. Owing to the modularity of the filter design, kernels of different sizes can be implemented with low effort using our design and the given instructions for scaling. Because the original form of the bilateral filter is implemented with no approximations or modifications, the resulting image quality depends only on the chosen filter parameters. The quantization of the filter coefficients introduces only negligible quality loss.
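A floating-point reference model is useful for checking a hardware bilateral filter against the unapproximated original. The sketch below also shows the separable, symmetric spatial component the abstract refers to: the 2-D Gaussian weight factors into g(dx) * g(dy). The window radius, sigma values, and replicate-border handling are illustrative assumptions.

```python
import math


def bilateral(img, radius=2, sigma_s=1.5, sigma_r=20.0):
    """Reference (floating-point) bilateral filter on a 2-D list of gray levels."""
    h, w = len(img), len(img[0])
    # Spatial kernel is separable and symmetric: exp(-(dx^2 + dy^2) / 2s^2) = g(dx) * g(dy).
    g = [math.exp(-(d * d) / (2 * sigma_s ** 2)) for d in range(-radius, radius + 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)      # replicate borders
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    v = img[yy][xx]
                    # Range kernel: neighbours far from the centre value get tiny weights.
                    weight = (g[dy + radius] * g[dx + radius]
                              * math.exp(-((v - center) ** 2) / (2 * sigma_r ** 2)))
                    num += weight * v
                    den += weight
            out[y][x] = num / den
    return out
```

Comparing such a model with the fixed-point output gives a direct measure of the quality loss caused by coefficient quantization.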
9. Subjective Evaluation of HEVC and AVC/H.264 in Mobile Environments
This paper compares the quality of AVC/H.264- and HEVC-encoded video in low-bandwidth mobile environments. In this study, the focus within the mobile environment is smartphones. The key characteristics of a smartphone are a small screen, usually 3.5 to 5.0 inches diagonal for high-end smartphones, and typical cellular network bandwidth of 3G or faster. Subjective evaluations were conducted to assess the user experience on a mobile device with a small screen and video coded at 200 and 400 kbps. The studies showed compelling evidence that a user's experience in low-bandwidth mobile environments is very similar between HEVC and AVC/H.264. The results suggest that the benefits of HEVC over AVC/H.264 are less clear in a mobile environment with lower video bitrates and resolutions.
10. Improved Method to Select the Lagrange Multiplier for Rate-Distortion Based Motion Estimation in Video Coding
The motion estimation (ME) process used in the H.264/AVC reference software is based on minimizing a cost function that involves two terms (distortion and rate) that are balanced through a Lagrangian parameter, usually denoted as λ_motion. In this paper, we propose an algorithm to improve the conventional way of estimating λ_motion and, consequently, the ME process. First, we show that the conventional estimation of λ_motion turns out to be significantly less accurate when ME-compromising events, which cause the ME process to perform poorly, happen. Second, with the aim of improving the coding efficiency in these cases, an efficient algorithm is proposed that allows the encoder to choose among three different values of λ_motion for the Inter 16x16 partition size. More precisely, for this partition size, the proposed algorithm allows the encoder to additionally test λ_motion = 0 and an arbitrarily large λ_motion, which correspond to the minimum-distortion and minimum-rate solutions, respectively. By testing these two extreme values, the algorithm avoids making large ME errors. Experimental results on video segments exhibiting this type of ME-compromising event reveal an average rate reduction of 2.20% for the same coding quality with respect to the JM15.1 reference software of H.264/AVC. The algorithm has also been compared with a state-of-the-art algorithm called context adaptive Lagrange multiplier. Additionally, two illustrative examples of the subjective performance improvement are provided.
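The cost being minimized is J = D + λ_motion · R, with D the block distortion (SAD here) and R the bits for the motion vector difference. The sketch shows why the two extra λ_motion values act as the minimum-distortion and minimum-rate solutions. The candidate set, its SAD values, and the use of 1e9 as a finite stand-in for "arbitrarily large" are illustrative assumptions. The conventional value uses the commonly cited JM relation λ_mode = 0.85 · 2^((QP-12)/3), λ_motion = sqrt(λ_mode) for SAD. MVD bits are counted as signed Exp-Golomb se(v) lengths.

```python
import math

LAMBDA_MIN_RATE = 1e9  # finite stand-in for an "arbitrarily large" lambda_motion


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def lambda_motion_jm(qp: int) -> float:
    # Commonly cited JM relation for SAD-based ME.
    return math.sqrt(0.85 * 2 ** ((qp - 12) / 3))


def best_mv(candidates, pred, lam):
    """candidates maps (mvx, mvy) -> SAD; minimize J = SAD + lam * R(mv - pred)."""
    def cost(mv):
        rate = se_bits(mv[0] - pred[0]) + se_bits(mv[1] - pred[1])
        return candidates[mv] + lam * rate
    return min(candidates, key=cost)


def mvs_for_16x16(candidates, pred, qp):
    """Motion vectors chosen under the conventional lambda and the two extremes."""
    return {lam: best_mv(candidates, pred, lam)
            for lam in (lambda_motion_jm(qp), 0.0, LAMBDA_MIN_RATE)}
```

How the encoder then picks among the three resulting vectors (for example, through its mode-decision cost) is not shown here.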
11. An Overview of Information Hiding in H.264/AVC Compressed Video
Information hiding refers to the process of inserting information into a host to serve specific purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video domain are surveyed. First, the general framework of information hiding is conceptualized by relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated using various data representation schemes such as bit plane replacement, spread spectrum, histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which information hiding takes place are then identified, including the prediction process, transformation, quantization, and entropy coding. Related information hiding methods at each venue are briefly reviewed, along with the targeted applications, appropriate diagrams, and references. A timeline diagram is constructed to chronologically summarize the invention of information hiding methods in the compressed still image and video domains since 1992. The considered information hiding methods are also compared in terms of venue, payload, bitstream size overhead, video quality, computational complexity, and video criteria. Further perspectives and recommendations are presented to provide a better understanding of the current trend of information hiding and to identify new opportunities for information hiding in compressed video.
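Matrix encoding, one of the data representation schemes listed above, can be illustrated with its Hamming-code form. k message bits are carried in the syndrome of n = 2^k - 1 cover LSBs, and at most one LSB changes. This is a generic sketch; applying it to particular H.264/AVC syntax elements is not shown, and the function names are hypothetical.

```python
def syndrome(lsbs):
    """XOR of the 1-based positions whose LSB is 1 (parity-check matrix with columns 1..n)."""
    s = 0
    for pos, b in enumerate(lsbs, start=1):
        if b:
            s ^= pos
    return s


def embed(lsbs, message, k):
    """Embed a k-bit integer into n = 2^k - 1 LSBs, flipping at most one of them."""
    assert len(lsbs) == 2 ** k - 1 and 0 <= message < 2 ** k
    out = list(lsbs)
    d = syndrome(out) ^ message
    if d:
        out[d - 1] ^= 1  # flipping position d XORs d into the syndrome
    return out


def extract(lsbs):
    return syndrome(lsbs)
```

For k = 3, three message bits ride on seven LSBs with at most one change, which is the efficiency gain matrix encoding offers over embedding one bit per LSB.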
12. VLSI Architecture Design of Guided Filter for 30 Frames/s Full-HD Video
Filtering is widely used in image and video processing for various applications. Recently, the guided filter has been proposed and has become one of the popular filtering methods. In this paper, to meet the computational demand of guided filtering for full-HD video, a double integral image architecture for guided filter ASIC design is proposed. In addition, a reformulation of the guided filter formula is proposed, which prevents the error resulting from truncation of the fractional part and allows the regularization parameter to be modified on the user's demand. The hardware architecture of the guided image filter is then proposed; it can be embedded in mobile devices to achieve real-time HD applications. To the best of our knowledge, this paper is also the first ASIC design for the guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and support Full-HD (1920 × 1080) video at 30 frames/s with a gate count of 92.9K and 3.2 KB of on-chip memory. Moreover, in terms of hardware efficiency, our architecture also outperforms previous works based on the bilateral filter.
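All window means in the guided filter are box filters, which is why integral images suit it: any window sum costs four lookups regardless of radius. The sketch follows the standard guided filter equations (a = cov(I, p) / (var(I) + eps), b = mean(p) - a · mean(I), q = mean(a) · I + mean(b)) with a single integral image per box mean. It is not the paper's double integral image architecture or its reformulated fixed-point formula. The radius, eps, and border clipping are assumptions.

```python
def integral(img):
    """Summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            row += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row
    return s


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window clipped at the borders: four lookups per pixel."""
    h, w = len(img), len(img[0])
    s = integral(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(y - r, 0), min(y + r, h - 1) + 1
        for x in range(w):
            x0, x1 = max(x - r, 0), min(x + r, w - 1) + 1
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def elementwise(a, b):
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def guided_filter(I, p, r, eps):
    """Guided filter: local linear model q = a * I + b fitted in every window."""
    h, w = len(I), len(I[0])
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    corr_Ip, corr_II = box_mean(elementwise(I, p), r), box_mean(elementwise(I, I), r)
    a = [[(corr_Ip[y][x] - mean_I[y][x] * mean_p[y][x])
          / (corr_II[y][x] - mean_I[y][x] ** 2 + eps) for x in range(w)] for y in range(h)]
    b = [[mean_p[y][x] - a[y][x] * mean_I[y][x] for x in range(w)] for y in range(h)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * I[y][x] + mean_b[y][x] for x in range(w)] for y in range(h)]
```

With the input used as its own guide and a small eps, edges pass through almost unchanged, while flat regions are smoothed.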
13. Property Analysis of XOR-Based Visual Cryptography
A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images (printed on transparencies) distributed among n participants. When any k participants superimpose their transparencies on an overhead projector (OR operation), the secret image can be revealed visually by the human visual system without computation. However, the monotone property of the OR operation degrades the visual quality of the reconstructed image for OR-based VCS (OVCS). Accordingly, XOR-based VCS (XVCS), which uses the XOR operation for decoding, was proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and XVCS. Our main contribution is to prove theoretically that the basis matrices of a (k,n)-OVCS can be used in a (k,n)-XVCS. Meanwhile, the contrast is enhanced 2^(k-1) times.
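The contrast gain can be checked numerically for the (k,k) case using the classic Naor-Shamir basis matrices. In these matrices, the columns of S0 (white) are the even-weight k-bit vectors and the columns of S1 (black) are the odd-weight ones. Contrast is taken as (weight for black - weight for white) / pixel expansion, with 1 meaning a black subpixel. This illustrates the stated 2^(k-1) factor for one family of basis matrices only; it is not the paper's general (k,n) proof.

```python
from fractions import Fraction
from itertools import product


def naor_shamir_basis(k):
    """(k,k) basis matrices as column lists: S0 = even-weight columns, S1 = odd-weight columns."""
    cols = list(product((0, 1), repeat=k))
    s0 = [c for c in cols if sum(c) % 2 == 0]
    s1 = [c for c in cols if sum(c) % 2 == 1]
    return s0, s1  # each holds 2^(k-1) columns (the pixel expansion)


def stacked_weight(columns, op):
    """Hamming weight after combining the k shares column-wise with OR or XOR."""
    if op == "or":
        return sum(1 for c in columns if any(c))
    return sum(1 for c in columns if sum(c) % 2 == 1)


def contrast(k, op):
    s0, s1 = naor_shamir_basis(k)
    m = len(s0)
    return Fraction(stacked_weight(s1, op) - stacked_weight(s0, op), m)
```

Under OR stacking, a white pixel still shows 2^(k-1) - 1 black subpixels, giving contrast 1/2^(k-1); under XOR, every white column cancels to 0, giving contrast 1.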
14. Effectiveness of Leakage Power Analysis Attacks on DPA-Resistant Logic Styles Under Process Variations
This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to cryptographic VLSI circuits in which circuit-level countermeasures against Differential Power Analysis (DPA) are adopted. Security metrics used for assessing the DPA resistance of crypto core implementations, such as the minimum number of measurements to disclosure (MTD) and the asymptotic correlation coefficient, have been extended to the case of LPA. The LPA resistance has been evaluated in terms of MTD as a function of the on-chip noise. Noise variances up to 10000 times greater than the signal variance have been taken into account, and LPA attacks have been successfully executed for all the logic styles under analysis using fewer than 100000 measurements. Moreover, the role of process variations has been investigated through extensive Monte Carlo simulations to evaluate their impact on the leakage model for the logic styles under analysis. Results show that LPA attacks can be successfully carried out on the different anti-DPA logic styles even in the presence of process variations. To the best of our knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario where on-chip noise and process variations are taken into account.
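The correlation coefficient and MTD metrics mentioned above can be illustrated with a toy correlation attack on simulated measurements. Everything about the target is an assumption made for the sketch. The leakage model is the Hamming weight of a 4-bit S-box output plus Gaussian noise, and the S-box is the PRESENT S-box, standing in for the attacked crypto core. No logic-style or process-variation effects are modeled.

```python
import random

# PRESENT 4-bit S-box, a small stand-in for the attacked crypto core.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v):
    return bin(v).count("1")


def simulate_traces(key, n, noise_sd, seed=0):
    """Hypothetical leakage: Hamming weight of the S-box output plus Gaussian noise."""
    rng = random.Random(seed)
    pts = [rng.randrange(16) for _ in range(n)]
    leak = [hw(SBOX[p ^ key]) + rng.gauss(0.0, noise_sd) for p in pts]
    return pts, leak


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def rank_keys(pts, leak):
    """Key guesses ordered by |correlation| between predicted and measured leakage."""
    scores = {k: abs(pearson([hw(SBOX[p ^ k]) for p in pts], leak)) for k in range(16)}
    return sorted(scores, key=scores.get, reverse=True)


def measurements_to_disclosure(key, pts, leak, step=100):
    """Smallest trace count after which the correct key stays ranked first at every checkpoint."""
    mtd = None
    for n in range(step, len(pts) + 1, step):
        if rank_keys(pts[:n], leak[:n])[0] == key:
            if mtd is None:
                mtd = n
        else:
            mtd = None
    return mtd
```

Raising `noise_sd` lowers the correct key's correlation and raises the MTD, which is the dependence on on-chip noise that the paper quantifies.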
15. Data Hiding in Encrypted H.264/AVC Video Streams by Codeword Substitution
Digital video sometimes needs to be stored and processed in an encrypted format to maintain security and privacy. For the purpose of content annotation and/or tampering detection, it is necessary to perform data hiding in these encrypted videos. Hiding data directly in the encrypted domain, without decryption, preserves the confidentiality of the content. In addition, it is more efficient than decrypting the video, hiding the data, and re-encrypting it. In this paper, a novel scheme of data hiding directly in the encrypted version of an H.264/AVC video stream is proposed, which includes three parts: H.264/AVC video encryption, data embedding, and data extraction. By analyzing the properties of the H.264/AVC codec, the codewords of intraprediction modes, the codewords of motion vector differences, and the codewords of residual coefficients are encrypted with stream ciphers. Then, a data hider may embed additional data in the encrypted domain by using a codeword substitution technique, without knowing the original video content. To adapt to different application scenarios, data extraction can be done either in the encrypted domain or in the decrypted domain. Furthermore, the video file size is strictly preserved even after encryption and data embedding. Experimental results have demonstrated the feasibility and efficiency of the proposed scheme.
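Why file size can be preserved becomes clear from how an Exp-Golomb codeword is built. XOR-encrypting only the info bits after the leading "1" yields another valid codeword of the same length. The sketch uses unsigned ue(v) codes for simplicity and a SHA-256 counter-mode keystream as a plainly swapped-in stand-in for the paper's stream cipher. The substitution rule, which replaces a codeword with the same-length codeword whose last bit equals the data bit, is a hypothetical simplification. Only extraction in the encrypted domain is shown.

```python
import hashlib


def ue_encode(v):
    """Unsigned Exp-Golomb codeword ue(v) as a bit string."""
    x = bin(v + 1)[2:]
    return "0" * (len(x) - 1) + x


def ue_decode(bits):
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:2 * zeros + 1], 2) - 1


def keystream(key, n):
    """Stand-in stream cipher: SHA-256 in counter mode."""
    out, ctr = "", 0
    while len(out) < n:
        block = hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        out += "".join(format(byte, "08b") for byte in block)
        ctr += 1
    return out[:n]


def xor_info_bits(code, ks):
    """XOR only the info bits after the leading '1', so the codeword length is unchanged."""
    zeros = len(code) - len(code.lstrip("0"))
    info = code[zeros + 1:]
    return code[:zeros + 1] + "".join(str(int(a) ^ int(b)) for a, b in zip(info, ks))


def encrypt_codewords(codes, key):
    """Encrypt (or, applied again, decrypt) a list of codewords."""
    ks = keystream(key, sum(len(c) for c in codes))
    out, pos = [], 0
    for c in codes:
        out.append(xor_info_bits(c, ks[pos:pos + len(c)]))
        pos += len(c)
    return out


def embed_bit(code, bit):
    """Hypothetical substitution: same-length codeword whose last bit equals `bit`."""
    return code if len(code) == 1 else code[:-1] + str(bit)  # ue(0) = '1' carries nothing


def extract_bit(code):
    return int(code[-1]) if len(code) > 1 else None
```

Because every substituted codeword has the same length as the one it replaces, the stream stays decodable and its size is unchanged.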
16. Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images
This paper studies the impact of secure watermark embedding in digital images by proposing a practical implementation of secure spread-spectrum watermarking using distortion optimization. Strong security properties (key security and subspace security) can be achieved using natural watermarking (NW), because this particular embedding keeps the distribution of the watermarked signal identical to that of the host; we therefore use elements of transportation theory to minimize the global distortion. Next, we apply this new modulation, called transportation NW (TNW), to design a secure watermarking scheme for grayscale images. TNW uses a multiresolution image decomposition combined with a multiplicative embedding, which is taken into account at the distribution level. We show that the distortion relies solely on the variance of the wavelet subbands used during the embedding. To maximize a target robustness after JPEG compression, we select the combinations of subbands offering the lowest bit error rates for a target PSNR ranging from 35 to 55 dB, and we propose an algorithm to select them. The use of transportation theory also provides an average PSNR gain of 3.6 dB with respect to the previous embedding for a set of 2000 images.
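A basic transportation-theory fact makes "minimize the global distortion" concrete. On the real line with squared-error cost, the assignment between two equally sized sets of values that minimizes total distortion is the monotone (sorted) one. The sketch checks this against brute force. It only illustrates that one-dimensional property and does not reproduce TNW's multiplicative wavelet-domain embedding; all names are hypothetical.

```python
from itertools import permutations


def monotone_coupling(x, z):
    """Assign target values z to host values x by rank (sorted matching)."""
    order_x = sorted(range(len(x)), key=lambda i: x[i])
    y = [0.0] * len(x)
    for i, zj in zip(order_x, sorted(z)):
        y[i] = zj
    return y


def distortion(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def brute_force_min(x, z):
    # Exhaustive search over all n! assignments; only for tiny inputs.
    return min(distortion(x, p) for p in permutations(z))
```

Sorting makes the optimal assignment an O(n log n) operation, which is what makes distribution-preserving embedding with minimal distortion practical at image scale.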
17. Impulse Noise Estimation and Removal for OFDM Systems
Orthogonal frequency-division multiplexing (OFDM) is a modulation scheme that is widely used in wired and wireless communication systems. While OFDM is ideally suited to deal with frequency-selective channels and AWGN, its performance may be dramatically impacted by the presence of impulse noise. In fact, very strong noise impulses in the time domain might result in the erasure of whole OFDM blocks of symbols at the receiver. Impulse noise can be mitigated by treating it as a sparse signal in time and using recently developed algorithms for sparse signal reconstruction. We propose an algorithm that uses the guard band null subcarriers for impulse noise estimation and cancellation. Instead of relying on ℓ1 minimization, as done in some popular general-purpose compressive sensing schemes, the proposed method jointly exploits the specific structure of this problem and the available a priori information for sparse signal recovery. The computational complexity of the proposed algorithm is very competitive with respect to sparse signal reconstruction schemes based on ℓ1 minimization. The proposed method is compared with other state-of-the-art methods in terms of achievable rates for an OFDM system with impulse noise and AWGN.
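Null subcarriers carry no data, so whatever the receiver observes on them is the DFT of the noise. A single time-domain impulse can then be located by correlating those observations against each candidate position, and its amplitude follows from the peak. This is only the one-atom greedy step (the first iteration of orthogonal matching pursuit), not the paper's structured estimator. The 64-point block and the choice of null bins are assumptions, and AWGN is omitted.

```python
import cmath


def null_observations(noise_time, null_bins):
    """DFT of the time-domain impulse noise, seen on the null (data-free) subcarriers."""
    n = len(noise_time)
    return {k: sum(e * cmath.exp(-2j * cmath.pi * k * t / n) for t, e in enumerate(noise_time))
            for k in null_bins}


def estimate_single_impulse(obs, n):
    """Return (position, complex amplitude) of the best-matching single impulse."""
    bins = list(obs)
    best_pos, best_corr = 0, 0j
    for m in range(n):
        # Correlate with the signature an impulse at position m would leave on the null bins.
        c = sum(obs[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in bins)
        if abs(c) > abs(best_corr):
            best_pos, best_corr = m, c
    return best_pos, best_corr / len(bins)
```

Subtracting the estimated impulse from the received time-domain block before the FFT is the cancellation step; repeating the estimate on the residual extends this to several impulses.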
18. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications for Efficient FIR Filter Implementation
The multiple constant multiplication (MCM) scheme is widely used for implementing transposed direct-form FIR filters. While the research focus of MCM has been on more effective common subexpression elimination, the optimization of adder-trees, which sum up the computed subexpressions for each coefficient, has been largely neglected. In this paper, we identify the resource minimization problem in the scheduling of adder-tree operations for the MCM block and present a mixed integer programming (MIP) based algorithm for more efficient MCM-based implementation of FIR filters. Experimental results show that up to 15% reduction of area and 11.6% reduction of power (with averages of 8.46% and 5.96%, respectively) can be achieved on top of the already optimized adder/subtractor network of the MCM block.
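For background on what an MCM adder-tree sums, the sketch below decomposes each constant into canonical signed digits (CSD). The resulting shifted copies of the input are then added pairwise in a balanced tree, reporting the adder count and depth. This is the textbook shift-add formulation; the MIP-based scheduling from the paper is not reproduced, and the function names are hypothetical.

```python
def csd(c):
    """Canonical signed-digit digits of integer c, least significant first (each -1, 0 or 1)."""
    digits = []
    while c != 0:
        if c % 2:
            d = 2 - (c % 4)  # +1 if c = 1 (mod 4), -1 if c = 3 (mod 4)
            c -= d
        else:
            d = 0
        digits.append(d)
        c //= 2
    return digits


def adder_tree(terms):
    """Sum terms pairwise, level by level; returns (sum, depth in adder stages)."""
    level, depth = list(terms), 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        depth += 1
    return (level[0] if level else 0), depth


def mcm_multiply(x, c):
    """c * x from shifted copies of x; returns (product, adders/subtractors, tree depth)."""
    partials = [(x << i) if d > 0 else -(x << i) for i, d in enumerate(csd(c)) if d]
    total, depth = adder_tree(partials)
    return total, max(len(partials) - 1, 0), depth
```

For example, 83 = 64 + 16 + 4 - 1 needs three adders/subtractors in two stages. In an MCM block, subexpressions shared across coefficients reduce this count further before the adder-trees are scheduled.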
19. Frequency Estimation of Distorted and Noisy Signals in Power Systems by FFT-Based Approach
This paper focuses on the accurate frequency estimation of power signals corrupted by stationary white noise. The noneven item interpolation FFT based on the triangular self-convolution window is described. A simple analytical expression for the variance of the noise contribution to the frequency estimate is derived, which shows that the variance of the frequency estimate is proportional to the energy of the adopted window. Based on the proposed method, the noise level of the measurement channel can be estimated, and the optimal parameters (e.g., sampling frequency and window length) of the interpolation FFT algorithm that minimize the variance of the frequency estimate can thus be determined. An application in a power quality analyzer verified the usefulness of the proposed method.
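The principle of interpolated FFT frequency estimation is as follows. Locate the spectral peak, then use the ratio of the two largest bin magnitudes to estimate the fractional bin offset. The sketch uses a periodic Hann window, for which the offset is δ = (2β - 1)/(β + 1) with β the neighbour-to-peak magnitude ratio. This Hann window is a plainly swapped-in stand-in for the triangular self-convolution window of the paper, whose interpolation formula differs. Sampling rate and record length in the test are assumptions, and the noise-variance analysis is not shown.

```python
import cmath
import math


def dft_bins(x, bins):
    """Direct DFT evaluated only at the requested bins."""
    n = len(x)
    return {k: sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in bins}


def estimate_frequency(x, fs):
    """Two-point interpolated DFT with a periodic Hann window."""
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    spec = dft_bins([a * b for a, b in zip(x, w)], range(n // 2))
    mags = {k: abs(v) for k, v in spec.items()}
    k = max(range(1, n // 2 - 1), key=mags.get)  # peak bin, leaving room for both neighbours
    if mags[k + 1] >= mags[k - 1]:
        beta = mags[k + 1] / mags[k]
        delta = (2 * beta - 1) / (beta + 1)       # true frequency lies right of bin k
    else:
        beta = mags[k - 1] / mags[k]
        delta = -(2 * beta - 1) / (beta + 1)      # true frequency lies left of bin k
    return (k + delta) * fs / n
```

With 256 samples at 1600 Hz the bin spacing is 6.25 Hz, yet the interpolation recovers a 50.3 Hz tone to within a few thousandths of a hertz in the noise-free case.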
20. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and Calibration Approaches
The fast Fourier transform (FFT) algorithm is widely used as a standard tool for spectral analysis because of its computational efficiency. However, the presence of multiple tones frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the use of a large number of FFT points and results in large area and power overheads. In this paper, an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular harmonic and intermodulation components. This accurate FFT analysis approach is based on coherent sampling, but it requires a significantly smaller number of points, making the FFT realization more suitable for on-chip built-in testing and calibration applications that require area and power efficiency. The technique was assessed by comparing simulation results from the proposed method for single and multiple tones with simulation results obtained from the FFT of coherently sampled tones. The results indicate that proper selection of test tone frequencies can avoid spectral leakage even with multiple narrowly spaced tones. When low-frequency signals are captured with an analog-to-digital converter (ADC) for on-chip analysis, the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth. Post-layout simulations of a 16-point FFT showed that third-order intermodulation (IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50 dB below the fundamental tones, which are quantized with 10-bit resolution. In a 45-nm CMOS technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its estimated power consumption is 6.47 mW.
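Coherent sampling, on which the method builds, means choosing each test tone as f = (M/N) · fs with M an integer coprime to N. This places the tone exactly on a bin so that a short FFT shows no leakage. The sketch picks the nearest odd M, which is coprime when N is a power of two. It then compares leakage for a coherent and a non-coherent tone with a 16-point DFT. The 1 MHz sampling rate and the target frequencies are illustrative assumptions, not values from the paper.

```python
import cmath
import math


def coherent_frequency(f_target, fs, n):
    """Nearest f = (M / n) * fs with M odd (coprime to n when n is a power of two)."""
    exact = f_target * n / fs
    m = round(exact)
    if m % 2 == 0:
        m = m - 1 if exact <= m else m + 1  # nearer odd neighbour (lower one on a tie)
    m = max(1, min(m, n // 2 - 1))
    return m * fs / n, m


def leakage_fraction(f, fs, n, tone_bin):
    """Share of DFT energy outside the tone's bin and its mirror; ~0 when sampling is coherent."""
    x = [math.sin(2 * math.pi * f * t / fs) for t in range(n)]
    spec = [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]
    energy = [abs(s) ** 2 for s in spec]
    outside = sum(e for k, e in enumerate(energy) if k not in (tone_bin, n - tone_bin))
    return outside / sum(energy)
```

For two-tone IM3 tests, the same rule applies to each tone. Choosing the bins so that the third-order products also fall on distinct bins keeps them separable.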
21. Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter
With Low
Adaptation-Delay
In this paper, we present an efficient architecture for the
implementation of a delayed least mean
square adaptive filter. For achieving lower adaptation-delay and
area-delay-power efficient
implementation, we use a novel partial product generator and
propose a strategy for optimized
balanced pipelining across the time-consuming combinational
blocks of the structure. From
synthesis results, we find that the proposed design offers
nearly 17% less area-delay product
(ADP) and nearly 14% less energy-delay product (EDP) than the
best of the existing systolic
structures, on average, for filter lengths N=8, 16, and 32. We
propose an efficient fixed-point
implementation scheme of the proposed architecture, and derive
the expression for steady-state
error. We show that the steady-state mean squared error obtained
from the analytical result
matches the simulation result. Moreover, we propose a
bit-level pruning of the
proposed architecture, which provides nearly 20% saving in ADP
and 9% saving in EDP over
the proposed structure before pruning without noticeable
degradation of steady-state-error
performance.
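The delayed-LMS recursion itself can be sketched in floating point. This is an illustrative model, not the paper's fixed-point architecture: the function name, step size `mu`, delay value, and the unknown test system are all hypothetical. The only difference from ordinary LMS is that the weight update at time n uses the error and input vector from time n - D, which is the adaptation delay that pipelining introduces.

```python
import random


def dlms_identify(h_true, mu=0.05, delay=2, steps=5000, seed=1):
    """System identification with a delayed LMS (DLMS) adaptive filter.

    The update w(n+1) = w(n) + mu * e(n - delay) * x(n - delay) uses an error
    that is `delay` samples old, modelling the adaptation delay introduced
    by pipelining the error-computation and weight-update blocks.
    """
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0.0] * taps
    x_vec = [0.0] * taps             # tapped delay line, newest sample first
    pending = []                     # (error, input vector) pairs not yet applied
    for _ in range(steps):
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(h * x for h, x in zip(h_true, x_vec))    # unknown system output
        y = sum(wi * x for wi, x in zip(w, x_vec))       # adaptive filter output
        pending.append((d - y, x_vec))
        if len(pending) > delay:
            e_old, x_old = pending.pop(0)
            w = [wi + mu * e_old * xo for wi, xo in zip(w, x_old)]
    return w


print(dlms_identify([0.5, -0.3, 0.2, 0.1]))
```

With a small step size the extra delay barely slows convergence, which is why a modest adaptation delay is an acceptable price for a deeper pipeline.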
22. Efficient Integer DCT Architectures for High Efficiency
Video CODEC standard
In this paper, we present area- and power-efficient
architectures for the implementation of
integer discrete cosine transform (DCT) of different lengths to
be used in High Efficiency Video
Coding (HEVC). We show that an efficient constant
matrix-multiplication scheme can be used to
derive parallel architectures for 1-D integer DCT of different
lengths. We also show that the
proposed structure can be reused for DCT of lengths 4, 8,
16, and 32 with a throughput of 32
DCT coefficients per cycle irrespective of the transform size.
Moreover, the proposed
architecture could be pruned to reduce the complexity of
implementation substantially with only
a marginal effect on the coding performance. We propose
power-efficient structures for folded
and full-parallel implementations of 2-D DCT. From the synthesis
result, it is found that the
proposed architecture involves nearly 14% less area-delay
product (ADP) and 19% less energy
per sample (EPS) compared to the direct implementation of the
reference algorithm, on average,
for integer DCT of lengths 4, 8, 16, and 32. Also, an additional
19% saving in ADP and 20%
saving in EPS can be achieved by the proposed pruning algorithm
with nearly the same
throughput rate. The proposed architecture is found to support
ultrahigh-definition 7680 × 4320
video at 60 frames/s, which is one of the applications of
HEVC.
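The constant-matrix-multiplication idea can be illustrated on the smallest case. The sketch below assumes the standard HEVC 4-point core transform matrix and uses the familiar even-odd (butterfly) decomposition with shift-and-add constant multipliers; it is a generic textbook structure, not a reproduction of the paper's reusable 4/8/16/32-point architecture, and the function names are hypothetical.

```python
# HEVC 4-point integer core transform matrix.
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_direct(x):
    """Reference: plain matrix-vector product (16 multiplications)."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


def mul83(v):
    return (v << 6) + (v << 4) + (v << 1) + v   # 83 = 64 + 16 + 2 + 1


def mul36(v):
    return (v << 5) + (v << 2)                   # 36 = 32 + 4


def dct4_butterfly(x):
    """Even-odd decomposition with shift-add constant multipliers."""
    x0, x1, x2, x3 = x
    e0, e1 = x0 + x3, x1 + x2        # even part feeds outputs 0 and 2
    o0, o1 = x0 - x3, x1 - x2        # odd part feeds outputs 1 and 3
    return [
        (e0 + e1) << 6,              # 64 * (e0 + e1)
        mul83(o0) + mul36(o1),
        (e0 - e1) << 6,              # 64 * (e0 - e1)
        mul36(o0) - mul83(o1),
    ]


print(dct4_butterfly([1, 2, 3, 4]))  # [640, -285, 0, -25]
```

Replacing the 16 general multiplications of the direct product with a few shared shifts and additions is what makes constant-matrix multiplication attractive in hardware.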
23. Low-Cost Low-Power ASIC Solution for Both DAB+ and DAB Audio
Decoding
DAB+ is the upgraded version of digital audio broadcasting
(DAB). DAB and DAB+ coexist in
many countries, so receivers are required to be compatible with
both standards. In this paper, a
solution integrating an MPEG1-LayerII (MP2) decoder and an
advanced audio coding
(AAC) low-complexity (AAC LC) decoder is proposed to provide
basic audio decoding for both
DAB and DAB+. It also uses simple methods to improve
high-frequency and stereo quality
instead of the more complex spectral band replication and parametric
stereo. A highly integrated low-
power audio decoder design compatible with DAB/DAB+ and using a
purely ASIC approach is
presented. As a result of the system structure optimization and
hardware sharing, the audio
decoder is fabricated in 1P4M 0.18-µm CMOS technology using only
3.2 mm² of silicon area
(including 147 456 bits of RAM and 170 496 bits of ROM). The
power consumption of the audio
decoder is 10.4 mW for DAB audio decoding and 8.5 mW for DAB+
audio decoding.
Laboratory and field tests show that the decoder functions correctly and
the audio quality is good for
receiving both DAB and DAB+. The audio decoder is thus proven to
be a low-cost low-
power solution for the two existing DAB standards.
24. Low-Power Digital Signal Processor Architecture for Wireless
Sensor Nodes
Radio communication exhibits the highest energy consumption in
wireless sensor nodes. Given
their limited energy supply from batteries or scavenging, these
nodes must trade data
communication for on-the-node computation. Currently, they are
designed around off-the-
shelf low-power microcontrollers. But by employing a more
appropriate processing element, the
energy consumption can be significantly reduced. This paper
describes the design and
implementation of the newly proposed folded-tree architecture
for on-the-node data processing
in wireless sensor networks, using parallel prefix operations
and data locality in hardware.
Measurements of the silicon implementation show a 10-20×
improvement in energy
compared with the traditional modern microcontrollers found in
sensor nodes.
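Parallel-prefix operations are usually mapped onto a binary tree with an up-sweep and a down-sweep. The following is a sequential Python model of the classic Blelloch exclusive scan, assuming a power-of-two input length and an associative operator; it illustrates tree-based prefix computation in general and is not the paper's folded-tree hardware. The function name is hypothetical.

```python
import operator


def blelloch_exclusive_scan(values, op=operator.add, identity=0):
    """Work-efficient exclusive prefix scan (Blelloch up-sweep/down-sweep).

    Each pass of an inner `for` loop models one level of a binary tree; the
    operations within a level are independent and could run in parallel.
    """
    a = list(values)
    n = len(a)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    # Up-sweep (reduce): internal tree nodes accumulate subtree results.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            a[i] = op(a[i - step], a[i])
        step *= 2
    # Down-sweep: push prefixes back down the same tree.
    a[n - 1] = identity
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]          # left child inherits the parent prefix
            a[i] = op(a[i], left)       # right child: parent prefix then left subtree
        step //= 2
    return a


print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Any associative operator works, for example `max` with `identity=float("-inf")`, which is what makes the same tree reusable for different on-the-node computations.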
25. Memory Footprint Reduction for Power-Efficient Realization
of 2-D Finite
Impulse Response Filters
We have analyzed memory footprint and combinational complexity
to arrive at a systematic
design strategy to derive area-delay-power-efficient
architectures for two-dimensional (2-D)
finite impulse response (FIR) filters. We have presented novel
block-based structures for
separable and non-separable filters with a smaller memory footprint
through memory sharing and memory
reuse along with appropriate scheduling of computations and
design of storage architecture. The
proposed structures involve L times less storage per output
(SPO), and nearly L times less energy
consumption per output (EPO) compared with the existing
structures, where L is the input block-
size. They involve L times more arithmetic resources than the
best of the corresponding existing
structures, and produce L times more throughput with less memory
bandwidth (MBW) than
others. We have also proposed separate generic structures for
separable and non-separable filter-
banks, and a unified structure of filter-bank constituting
symmetric and general filters. The
proposed unified structure for 6 parallel filters involves
nearly 3.6L times more multipliers, 3L
times more adders, and $(N^{2}-N+2)$ fewer registers than a similar existing
unified structure, and computes
6L times more filter outputs per cycle with 6L times less MBW
than the existing design, where
N is the FIR filter size in each dimension. ASIC synthesis results
show that for filter size (4 × 4),
input-block size L=4, and image size (512 × 512), the proposed
block-based non-separable and
generic non-separable structures, respectively, involve 5.95
times and 11.25 times less area-
delay-product (ADP), and 5.81 times and 15.63 times less EPO
than the corresponding existing
structures. The proposed unified structure involves 4.64 times
less ADP and 9.78 times less EPO
than the corresponding existing structure.
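The separable/non-separable distinction can be made concrete with a small sketch. It assumes "valid"-region filtering in correlation form and a separable kernel equal to the outer product of a column tap vector and a row tap vector; it shows the textbook row-then-column decomposition, not the paper's block-based, memory-sharing structures, and the function names are hypothetical.

```python
def fir2d_direct(image, kernel):
    """Direct 2-D FIR filtering ('valid' region, correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(cols - kw + 1)
        ]
        for r in range(rows - kh + 1)
    ]


def fir2d_separable(image, col_taps, row_taps):
    """Same filter for a separable kernel k[i][j] = col_taps[i] * row_taps[j].

    A horizontal 1-D pass is followed by a vertical 1-D pass over its output.
    """
    kw, kh = len(row_taps), len(col_taps)
    horiz = [
        [sum(row_taps[j] * row[c + j] for j in range(kw)) for c in range(len(row) - kw + 1)]
        for row in image
    ]
    return [
        [sum(col_taps[i] * horiz[r + i][c] for i in range(kh)) for c in range(len(horiz[0]))]
        for r in range(len(horiz) - kh + 1)
    ]


image = [[(3 * r + 5 * c) % 11 for c in range(6)] for r in range(6)]
col_taps, row_taps = [1, 2, 1], [1, 0, -1]
kernel = [[ci * rj for rj in row_taps] for ci in col_taps]
print(fir2d_direct(image, kernel) == fir2d_separable(image, col_taps, row_taps))  # True
```

For an N × N separable kernel the row-column form needs 2N multiplications per output instead of N², which is one reason separable and non-separable filters call for different structures.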
26. Ultra-High Throughput Low-Power Packet Classification
Packet classification is used by networking equipment to sort
packets into flows by comparing
their headers to a list of rules, with packets placed in the
flow determined by the matched rule. A
flow is used to decide a packet's priority and the manner in
which it is processed. Packet
classification is a difficult task because all
packets must be processed at wire speed
and rulesets can contain tens of thousands of rules. The
contribution of this paper is a hardware
accelerator that can classify up to 433 million packets per
second when using rule sets containing
tens of thousands of rules with a peak power consumption of only
9.03 W when using a Stratix
III field-programmable gate array (FPGA). The hardware
accelerator uses a modified version of
the HyperCuts packet classification algorithm, with a new
pre-cutting process used to reduce the
amount of memory needed to save the search structure for large
rulesets so that it is small
enough to fit in the on-chip memory of an FPGA. The modified
algorithm also removes the need
for floating-point division to be performed when classifying a
packet, allowing higher clock
speeds and thus obtaining higher throughputs.
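The matching semantics that any classifier, including a HyperCuts-based accelerator, must reproduce can be stated as a simple linear search. The sketch below assumes a hypothetical 5-tuple rule format with one inclusive range per header field and first-match priority; it is a functional reference model only, with none of the decision-tree cutting or memory optimizations described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass(frozen=True)
class Rule:
    """Hypothetical 5-tuple rule: one inclusive range per header field."""
    src_ip: Range
    dst_ip: Range
    src_port: Range
    dst_port: Range
    proto: Range
    flow: str


def matches(rule: Rule, header: Sequence[int]) -> bool:
    fields = (rule.src_ip, rule.dst_ip, rule.src_port, rule.dst_port, rule.proto)
    return all(lo <= value <= hi for (lo, hi), value in zip(fields, header))


def classify(rules: Sequence[Rule], header: Sequence[int]) -> Optional[str]:
    """Return the flow of the first (highest-priority) matching rule, if any."""
    for rule in rules:
        if matches(rule, header):
            return rule.flow
    return None


ANY_IP, ANY_PORT = (0, 2**32 - 1), (0, 65535)
rules = [
    Rule(ANY_IP, (0x0A000000, 0x0AFFFFFF), ANY_PORT, (80, 80), (6, 6), "web"),
    Rule(ANY_IP, ANY_IP, ANY_PORT, ANY_PORT, (0, 255), "default"),
]
print(classify(rules, (0xC0A80001, 0x0A000005, 40000, 80, 6)))   # web
print(classify(rules, (0xC0A80001, 0x0A000005, 40000, 443, 6)))  # default
```

A linear scan costs one rule comparison per rule per packet, which becomes infeasible at hundreds of millions of packets per second with tens of thousands of rules; decision-tree algorithms such as HyperCuts exist to avoid exactly that.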
27. A Configurable and Low-Power Mixed Signal SoC for Portable
ECG Monitoring
Applications
This paper describes a mixed-signal ECG system-on-chip (SoC)
that is capable of implementing
configurable functionality with low power consumption
for portable ECG monitoring applications. A low-voltage,
high-performance analog front-end
extracts 3-channel ECG signals and a single-channel impedance
measurement with
high signal quality. A custom digital signal processor provides
the configurability and advanced
functionality such as motion artifact removal and R-peak detection.
The SoC is implemented in a
0.18-µm CMOS process and consumes as little as 31.1 µW from a 1.2-V supply.
28. Partial Access Mode: New Method for Reducing Power
Consumption of Dynamic
Random Access Memory
Demands have been placed on a dynamic random access memory
(DRAM) to not only have
increased memory capacity and data transfer speed, but also have
reduced operating and standby
currents. When a system uses a DRAM, a refresh operation is
necessary because of its data
retention time restriction: each bit of the DRAM is stored as an
amount of electrical charge in a
storage capacitor that is discharged by the leakage current.
Power consumption for the refresh
operation increases in proportion to the memory capacity. We
propose
a new method to reduce the refresh power consumption by
effectively extending the memory cell
retention time. Conversion from 1 cell/bit to $2^{N}$ cells/bit
reduces the variation in the
retention time among memory cells. Although active
power increases by a factor of $2^{N}$,
the refresh time increases by more than $2^{N}$ as a consequence
of the fact that the majority
decision does better than averaging for the tail distribution of
retention time. The conversion can
be realized very simply from the structure of the DRAM array
circuit, and it reduces the
frequency of disturbance and power consumption by two orders of
magnitude. On the basis of
this conversion method, we propose
a partial access mode to reduce power consumption dynamically
when the full memory capacity
is not required.
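The majority-decision argument can be explored with a rough Monte Carlo sketch. Every modelling choice here is a hypothetical assumption, not taken from the paper: cell retention times follow a made-up two-population distribution (mostly healthy cells plus a rare, much leakier tail), a bit stored in several cells is treated as lost once at least half of its cells have leaked (ties count as failures), and the array's refresh interval is set by its weakest bit. The printed numbers depend entirely on these assumptions.

```python
import random


def bit_retention(cell_times):
    """Retention time of one bit stored in several cells, decided by majority.

    Assumption: the bit is lost once at least half of its m cells have
    leaked, i.e. at the ceil(m/2)-th shortest cell retention time.
    """
    ordered = sorted(cell_times)
    return ordered[(len(ordered) + 1) // 2 - 1]


def array_retention(bits):
    """Refresh interval an array needs: the retention of its weakest bit."""
    return min(bit_retention(cells) for cells in bits)


def sample_cell(rng, weak_fraction=1e-4):
    """Hypothetical retention model (arbitrary units) with a rare leaky tail."""
    scale = 0.01 if rng.random() < weak_fraction else 1.0
    return scale * rng.lognormvariate(0.0, 0.2)


rng = random.Random(2024)
num_bits = 1 << 16
for n in (0, 2):
    cells = 2 ** n
    bits = [[sample_cell(rng) for _ in range(cells)] for _ in range(num_bits)]
    print(f"{cells} cell(s)/bit: weakest-bit retention {array_retention(bits):.3f}")
```

Under this model a single leaky cell stops setting the refresh interval once it is outvoted by its neighbours, which is the intuition behind the tail-distribution argument; whether the gain actually exceeds the $2^{N}$ active-power cost depends on the real retention distribution.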
29. Reliability-Oriented Placement and Routing Algorithm for
SRAM-Based FPGAs
As the feature size shrinks to the nanometer scale, SRAM-based
FPGAs will become
increasingly vulnerable to soft errors. Existing
reliability-
oriented placement and routing approaches primarily focus on
reducing the fault occurrence
probability (node error rate) of soft errors. However, our
analysis shows that, besides the fault
occurrence probability, the propagation probability (error
propagation probability) plays an
important role and should be taken into consideration. In this
paper, we first propose a cube-
based analysis algorithm to efficiently and accurately estimate
the error propagation
probability. Based on such a model, we propose a novel
reliability-
oriented placement and routing algorithm that combines both the
fault occurrence probability and
the error propagation probability together to enhance
system-level robustness against soft errors.
Experimental results show that, compared with the baseline
versatile place and route technique,
the proposed scheme can reduce the failure rate by 20.73%, and
increase the mean time between
failures by 39.44%.
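Why the error propagation probability matters can be shown with a first-order estimate. The sketch assumes that the circuit failure rate is the sum over nodes of node error rate × error propagation probability (a rare-event approximation), and it uses hypothetical numbers in arbitrary units; it does not reproduce the paper's cube-based estimation algorithm.

```python
def failure_rate(nodes):
    """First-order failure rate: a fault counts only if it also propagates.

    `nodes` is a list of (node_error_rate, error_propagation_probability).
    """
    return sum(ner * epp for ner, epp in nodes)


def mtbf(nodes):
    """Mean time between failures as the reciprocal of the failure rate."""
    return 1.0 / failure_rate(nodes)


# Two hypothetical placements of the same design (arbitrary units).
placement_a = [(1.5e-6, 0.8), (0.5e-6, 0.2)]
placement_b = [(0.5e-6, 0.8), (2.0e-6, 0.2)]

for name, nodes in (("A", placement_a), ("B", placement_b)):
    total_ner = sum(ner for ner, _ in nodes)
    print(f"{name}: sum of node error rates {total_ner:.2e}, "
          f"failure rate {failure_rate(nodes):.2e}, MTBF {mtbf(nodes):.3g}")
```

Placement A has the lower total node error rate, so an approach that only minimizes fault occurrence would prefer it, yet B fails less often once propagation is weighted in.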
30. Time-Based All-Digital Technique for Analog Built-in
Self-Test
A scheme for built-in self-test of analog signals with minimal
area overhead for measuring on-
chip voltages in an all-digital manner is presented. The method
is well suited for a distributed
architecture, where the routing of analog signals over long
paths is minimized. A clock is routed
serially to the sampling heads placed at the nodes of analog
test voltages. The sampling head
present at each test node, which consists of a pair of delay
cells and a pair of flip-flops, locally
converts the test voltage to a skew between a pair of subsampled
signals, thus giving rise to as
many subsampled signal pairs as the number of nodes. To measure
a certain analog voltage, the
corresponding subsampled signal pair is fed to a delay
measurement unit to measure the skew
between this pair. The concept is validated by designing a test
chip in a UMC 130-nm CMOS
process. Sub-millivolt accuracy for static signals is
demonstrated for a measurement time of a
few seconds, and an effective number of bits of 5.29 is
demonstrated for low-bandwidth signals
in the absence of sample-and-hold circuitry.
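To put the ENOB figure in context, the conventional relation for a full-scale sine input, ENOB = (SINAD - 1.76 dB) / 6.02 dB, can be inverted to find the SINAD that 5.29 effective bits implies. The helper names below are hypothetical; the formula is the standard definition, not something specific to the paper.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD (dB), full-scale sine input."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob):
    """Inverse relation: SINAD (dB) implied by a given ENOB."""
    return 6.02 * enob + 1.76


print(f"ENOB 5.29 -> SINAD {sinad_from_enob(5.29):.1f} dB")  # 33.6 dB
```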
31. Improved 8-Point Approximate DCT for Image and Video
Compression Requiring
Only 14 Additions
The multimedia market's demand for low-energy video processing
systems such as HEVC
has led to extensive development of fast
algorithms for the efficient
approximation of 2-D DCT transforms. The DCT is employed in a
multitude of compression
standards due to its remarkable energy compaction properties.
Multiplier-free approximate DCT
transforms have been proposed that offer superior compression
performance at very low circuit
complexity. Such approximations can be realized in digital VLSI
hardware using additions and
subtractions only, leading to significant reductions in chip
area and power consumption
compared to conventional DCTs and integer transforms. In this
paper, we introduce a novel 8-
point DCT approximation that requires only 14 addition
operations and no multiplications. The
proposed transform possesses low computational complexity and is
compared to state-of-the-art
DCT approximations in terms of both algorithm complexity and
peak signal-to-noise ratio. The
proposed DCT approximation is a candidate for reconfigurable
video standards such as HEVC.
The proposed transform and several other DCT approximations are
mapped to systolic-array
digital architectures and physically realized as digital
prototype circuits using FPGA technology
and mapped to 45 nm CMOS technology.
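The multiplier-free principle can be illustrated with the signed DCT (SDCT), an older, well-known approximation that replaces every 8-point DCT-II coefficient by its sign. This is a swapped-in example chosen for simplicity, not the 14-addition transform proposed in the paper, whose matrix is not reproduced here. Evaluated directly, the SDCT needs 56 additions; a dedicated fast algorithm with a different matrix, as in the paper, is what brings the count down much further. The function names are hypothetical.

```python
import math


def sign_dct_matrix(n=8):
    """Signed DCT (SDCT): the sign of each DCT-II basis coefficient.

    For n = 8 no basis coefficient is exactly zero, so every entry is +1 or -1.
    """
    return [
        [1 if math.cos(math.pi * (2 * j + 1) * k / (2 * n)) > 0 else -1 for j in range(n)]
        for k in range(n)
    ]


def apply_add_only(matrix, x):
    """Apply a {+1, -1} matrix using only additions and subtractions.

    Returns the outputs and the number of two-input additions/subtractions;
    the sign of the first term is handled by negation and is not counted.
    """
    outputs, ops = [], 0
    for row in matrix:
        acc = x[0] if row[0] > 0 else -x[0]
        for sign, value in zip(row[1:], x[1:]):
            acc = acc + value if sign > 0 else acc - value
            ops += 1
        outputs.append(acc)
    return outputs, ops


T = sign_dct_matrix()
y, ops = apply_add_only(T, [10, 12, 11, 9, 8, 8, 7, 5])
print(y, ops)  # ops is 56 for this direct evaluation
```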
32. Reconfigurable CORDIC-Based Low-Power DCT Architecture Based
on Data
Priority
This paper presents a low-power coordinate rotation digital
computer (CORDIC)-based
reconfigurable discrete cosine transform (DCT) architecture. The
main idea of this paper is based
on the observation that not all computations in the DCT are
equally important in generating
the frequency-domain outputs. Because the DCT coefficients
differ in importance,
the number of CORDIC iterations can be dynamically changed to
efficiently trade off image
quality for power consumption. Thus, the computational energy
can be significantly reduced
without seriously compromising the image quality. The proposed
CORDIC-based 2-D DCT
architecture is implemented in a 0.13-µm CMOS process, and the
experimental results show
that our reconfigurable DCT achieves power savings ranging from
22.9% to 52.2% over the
CORDIC-based Loeffler DCT at the cost of minor image quality
degradation.
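The iteration-count trade-off can be seen in a minimal floating-point model of CORDIC in rotation mode. The function names, test angle, and iteration counts are assumptions for illustration, and the sketch does not model how the paper assigns iterations to individual DCT coefficients. Each iteration roughly halves the residual angle, so stopping early saves shift-add operations at the cost of precision.

```python
import math


def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by `angle` radians with CORDIC in rotation mode.

    Only scaling by 2**-i (a shift in hardware), additions, and a small
    arctangent table are needed; the constant gain is divided out at the end.
    Converges for |angle| up to about 1.74 rad (the sum of the table).
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x / gain, y / gain


def rotation_error(angle, iterations):
    """Euclidean error of rotating the unit vector (1, 0) by `angle`."""
    x, y = cordic_rotate(1.0, 0.0, angle, iterations)
    return math.hypot(x - math.cos(angle), y - math.sin(angle))


for n in (4, 8, 12, 16):
    print(f"{n:2d} iterations: error {rotation_error(math.pi / 5, n):.1e}")
```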
33. Data Encoding Techniques for Reducing Energy Consumption in
Network-on-Chip
As technology shrinks, the power dissipated by the links of a
network-on-chip (NoC) starts to
compete with the power dissipated by the other elements of the
communication subsystem,
namely, the routers and the network interfaces (NIs). In this
paper, we present a set of data
encoding schemes aimed at reducing the power dissipated by the
links of an NoC. The proposed
schemes are general and transparent with respect to the
underlying NoC fabric (i.e., their
application does not require any modification of the routers and
link architecture). Experiments
carried out on both synthetic and real traffic scenarios show
the effectiveness of the proposed
schemes, which save up to 51% of power dissipation and
14% of energy consumption
without any significant performance degradation and with less
than 15% area overhead in the NI.
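As an illustration of link-level data encoding in general, the sketch below implements classic bus-invert coding, a well-known earlier scheme swapped in here for simplicity; it is not one of the encoding schemes proposed in the paper. Each word is sent either as-is or complemented, whichever toggles fewer link wires relative to the previous transfer, and one extra invert line tells the receiver which choice was made. Transitions on the invert line itself are not counted here, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of wires that toggle between two consecutive bus values."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (sent_word, invert_bit) pairs that limit payload toggling."""
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        if hamming(word, prev) > width // 2:
            sent, invert = (~word) & mask, 1   # complement toggles fewer wires
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(sent ^ mask) if invert else sent for sent, invert in encoded]


def payload_transitions(sent_words):
    """Total toggles on the payload wires, starting from an all-zero bus."""
    prev, total = 0, 0
    for w in sent_words:
        total += hamming(w, prev)
        prev = w
    return total


words = [0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55]
encoded = bus_invert_encode(words, width=8)
print(payload_transitions(words), payload_transitions([s for s, _ in encoded]))  # 32 8
```

Because encoding and decoding happen only at the endpoints, a scheme of this kind fits the paper's requirement that routers and links stay unmodified.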
34. Achieving High-Performance On-Chip Networks With
Shared-Buffer Routers
On-chip routers typically have buffers dedicated to their input
or output ports for temporarily
storing packets in case contention occurs on output physical
channels. Buffers, unfortunately,
consume significant portions of router area and power budgets.
While running a traffic trace,
however, not all input ports of routers have incoming packets
that need to be transferred
simultaneously. Therefore, many buffer queues in
the network are empty while other
queues are mostly busy. This observation motivates us to design a
router architecture with shared
queues (RoShaQ), which maximizes buffer
utilization by allowing the sharing of
multiple buffer queues among input ports. Sharing queues makes
buffer use more
efficient and hence achieves higher throughput when the
network load becomes heavy. On
the other hand, at light traffic load, our router achieves low
latency by allowing packets to
effectively bypass these shared queues. Experimental results on
a 65-nm CMOS standard-cell
process show that over synthetic traffic RoShaQ has 17% less
latency and 18% higher
saturation throughput than a typical virtual-channel (VC) router.
Because of its higher
performance, RoShaQ consumes 9% less energy per transferred
packet than a VC router given the
same buffer space capacity. Over real multitask applications and
E3S embedded benchmarks
using the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower
latency than a VC router
and, when targeting the same application throughput, 30% lower
energy per packet.
35. Energy Efficiency Optimization Through Codesign of the
Transmitter and Receiver
in High-Speed On-Chip Interconnects
A novel equalized global link architecture and driver-receiver
codesign flow are proposed for
high-speed and low-energy on-chip communication by utilizing a
continuous-time linear
equalizer (CTLE). The proposed global link is analyzed using a
linear system method, and a
formula for the CTLE eye opening is derived to provide high-level
design guidelines and insights.
Compared with the separate driver-receiver design flow, over 50%
energy reduction is observed.
The final optimal solution achieves 20-Gb/s signaling over 10
mm, 2.6-µm pitch on-chip
transmission line with 15.5-ps/mm latency and 0.196-pJ/b energy
using 45-nm technology.
Monte Carlo simulation also shows that the 3σ/μ values for power and delay
variation in the proposed
global link are 13.1% and 4.6%, respectively.
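The role of the CTLE can be illustrated with a generic one-zero, two-pole transfer function, H(s) = A(1 + s/ωz) / ((1 + s/ωp1)(1 + s/ωp2)), a common first-order model rather than the paper's circuit. The gain and corner frequencies below are hypothetical, and 10 GHz is used as the Nyquist frequency of 20-Gb/s signaling on the assumption of NRZ data.

```python
import math


def ctle_gain(freq_hz, dc_gain, f_zero, f_pole1, f_pole2):
    """Magnitude of H(j*2*pi*f) for a one-zero, two-pole CTLE model."""
    s = 2j * math.pi * freq_hz
    wz, wp1, wp2 = (2 * math.pi * f for f in (f_zero, f_pole1, f_pole2))
    return abs(dc_gain * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2)))


# Hypothetical design point: a zero well below the poles gives high-frequency peaking.
params = dict(dc_gain=0.5, f_zero=1e9, f_pole1=10e9, f_pole2=20e9)
nyquist = 10e9  # 20 Gb/s, assuming NRZ signaling
peaking_db = 20 * math.log10(ctle_gain(nyquist, **params) / ctle_gain(0.0, **params))
print(f"DC gain {ctle_gain(0.0, **params):.2f}, peaking at 10 GHz {peaking_db:.1f} dB")
```

Boosting high-frequency content relative to DC compensates the low-pass response of a long on-chip wire, which is what opens the eye at the receiver.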