
VLSI 2014 IEEE TITLES

Oct 13, 2015


Here is a list of the best VLSI 2014 IEEE titles. The list contains titles on low-power VLSI, image processing, signal processing, audio processing, video processing, steganography, and cryptography.

Zuara Technologies provides educational and development boards for industry and has been at the forefront of technological and market innovation in FPGA and embedded products for high-performance computing applications and embedded development.
We offer certified project internships for B.E/B.Tech, M.E/M.Tech, and PhD candidates in the fields of VLSI, embedded systems, MATLAB, and software and firmware development. Our deep industry knowledge enables us to provide you with innovative ideas that help you improve productivity.

For more details, contact us at [email protected]
Transcript
  • Zuara Technologies Battle with bugs

    No.82, Station road, Radha nagar, Chrompet, Chennai-44. Mobile: 09095188016. Mail.id: [email protected]

    Web site: www.zuaratech.com

    82, Station road, Radha nagar,Chrompet Chennai-44

    Mob.: 9095188016/9677465689

    1. ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for

    Real-Time Segmentation of High Definition Video

    Background identification is a common feature in many video processing systems. This paper

    proposes two hardware implementations of the OpenCV version of

    the Gaussian mixture model (GMM), a background identification algorithm. The implemented

    version of the algorithm allows a fast initialization of the background model while an innovative,

    hardware-oriented, formulation of the GMM equations makes the proposed circuits able to

    perform real-time background identification on high-definition (HD) video sequences with frame

    size 1920 × 1080. The first of the two circuits is designed with commercial field-programmable

    gate-array (FPGA) devices as target. When implemented on Virtex6 vlx75t, the proposed circuit

    processes 91 HD fps (frames per second) and uses 3% of FPGA logic resources. The second circuit

    is oriented to the implementation in UMC-90 nm CMOS standard cell technology, and is

    proposed in two versions. Both versions can process at a frame rate higher than 60 HD fps. The

    first version uses the constant voltage scaling technique to provide a low power implementation.

    It provides silicon area occupation of 28847 μm² and energy dissipation per pixel of 15.3

    pJ/pixel. The second version is designed to reduce silicon area utilization and occupies 21847

    μm² with an energy dissipation of 49.4 pJ/pixel.

    2. Design and FPGA Implementation of High-Speed, Fixed-Latency Serial

    Transceivers

    Fixed-latency serial links are important components of the distributed measurement and control

    systems. However, most high-speed Serializer-Deserializer (SerDes) chips do not keep the same

    link latency after each power-up or reset. In this paper, we propose a fixed-

    latency serial transceiver based on dynamic clock phase shifting and changeable delay tuning

    technologies. Our solution can process all possible phase offsets between the transmitted and

    received clocks, so it relaxes the requirement of fanning in the same reference clock both to the

    transmitter and to the receiver. It also eliminates the reset-relock process in the roulette approach.

    We present a specific example of implementation based on the serial transceiver in Xilinx Virtex

    5 FPGA. The experimental results indicate that our transceiver can achieve a

    deterministic latency with sub-nanosecond precision.

    3. DART: A Programmable Architecture for NoC Simulation on FPGAs

    The increased demand for on-chip communication bandwidth as a result of the multicore trend

    has made packet-switched networks-on-chip (NoCs) a more compelling choice for the

    communication backbone in next-generation systems. However, NoC designs have many power,

    area, and performance tradeoffs in topology, buffer sizes, routing algorithms, and flow control

    mechanisms; hence, the study of new NoC designs can be very time intensive. To address these

    challenges, we propose DART, a fast and flexible FPGA-based NoC simulation architecture.

    Rather than laying the NoC out in hardware on the FPGA like previous approaches, our design

    virtualizes the NoC by mapping its components to a generic NoC simulation engine, composed

    of a fully connected collection of fundamental components (e.g., routers and flit queues). This

    approach has two main advantages: 1) since it is virtualized it can simulate any NoC, and 2)

    any NoC can be mapped to the engine without rebuilding it, which can take significant time for a

    large FPGA design. We demonstrate 1) that an implementation of DART on a Virtex-II Pro

    FPGA can achieve over 100× speedup over the cycle-based software simulator Booksim,

    while maintaining the same level of simulation accuracy, and 2) that a more modern Virtex-6

    FPGA can accommodate a 49-node DART implementation.

    4. Defense Against Primary User Emulation Attacks in Cognitive Radio Networks

    Using Advanced Encryption Standard

    This paper considers primary user emulation attacks in cognitive radio networks operating in the

    white spaces of the digital TV (DTV) band. We propose a reliable AES-assisted DTV scheme, in

    which an AES-encrypted reference signal is generated at the TV transmitter and used as the sync

    bits of the DTV data frames. By allowing a shared secret between the transmitter and the

    receiver, the reference signal can be regenerated at the receiver and used to achieve accurate

    identification of the authorized primary users. In addition, when combined with the analysis on

    the autocorrelation of the received signal, the presence of the malicious user can be detected

    accurately whether or not the primary user is present. We analyze the effectiveness of the

    proposed approach through both theoretical analysis and simulation examples. It is shown that

    with the AES-assisted DTV scheme, the primary user, as well as malicious user, can be detected

    with high accuracy under primary user emulation attacks. It should be emphasized that the

    proposed scheme requires no changes in hardware or system structure except for a plug-in AES

    chip. Potentially, it can be applied directly to today's DTV system

    under primary user emulation attacks for more efficient spectrum sharing.

    5. Energy-Efficient Resource Allocation in OFDM Systems With Distributed Antennas

    In this paper, we develop an energy-efficient resource-allocation scheme with proportional

    fairness for downlink multiuser orthogonal frequency-division multiplexing

    (OFDM) systems with distributed antennas. Our aim is to maximize energy efficiency (EE) under

    the constraints of the overall transmit power of each remote access unit (RAU), proportional

    fairness data rates, and bit error rates (BERs). Because of the nonconvex nature of the

    optimization problem, obtaining the optimal solution is extremely computationally complex.

    Therefore, we develop a low-complexity suboptimal algorithm, which separates

    subcarrier allocation and power allocation. For the low-complexity algorithm, we first allocate

    subcarriers by assuming equal power distribution. Then, by exploiting the properties of fractional

    programming, we transform the nonconvex optimization problem in fractional form into an

    equivalent optimization problem in subtractive form, which includes a tractable solution. Next,

    an optimal energy-efficient power-allocation algorithm is developed to maximize EE while

    maintaining proportional fairness. Through computer simulation, we demonstrate the

    effectiveness of the proposed low-complexity algorithm and illustrate the fundamental tradeoff

    between energy- and spectral-efficient transmission designs.

    6. Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating

    Clock gating is a predominant technique used for power saving. It is observed that the commonly

    used synthesis-based gating still leaves a large amount of redundant clock pulses. Data-

    driven gating aims to disable these. To reduce the hardware overhead involved, flip-flops (FFs)

    are grouped so that they share a common clock enabling signal. The question of what is

    the group size maximizing the power savings is answered in a previous paper. Here we answer

    the question of which FFs should be placed in a group to maximize the power reduction. We

    propose a practical solution based on the toggling activity correlations of FFs and their physical

    position proximity constraints in the layout. Our data-driven clock gating is integrated into an

    Electronic Design Automation (EDA) commercial backend design flow, achieving total power

    reduction of 15%-20% for various types of large-scale state-of-the-art industrial and

    academic designs in 40 and 65 nanometer process technologies. These savings are achieved on

    top of the savings obtained by clock gating synthesis performed by commercial EDA tools, and

    gating manually inserted into the register transfer level design.
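
The effect of group membership on clock activity can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the flow described in the paper: it assumes a group is clocked in a cycle when any member's data input differs from its stored value, that all flip-flops reset to 0, and the names `clock_pulses` and `traces` are hypothetical.

```python
def clock_pulses(traces, groups):
    """Count clock pulses delivered to flip-flops when each group shares one enable.

    traces: one list of D-input bits per flip-flop, one bit per cycle.
    groups: tuples of flip-flop indices that share a clock-enable signal.
    Assumption: a group is clocked in a cycle only if some member's D input
    differs from its current output Q.
    """
    q = [0] * len(traces)  # assumed reset state of every flip-flop
    pulses = 0
    for t in range(len(traces[0])):
        for group in groups:
            if any(traces[ff][t] != q[ff] for ff in group):
                pulses += len(group)  # every member of the group sees the pulse
                for ff in group:
                    q[ff] = traces[ff][t]
    return pulses


# Hypothetical toggling activity: flip-flops 0 and 2 change together,
# and flip-flops 1 and 3 change together.
a = [1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 1, 1, 0]
traces = [a, b, a, b]

correlated = clock_pulses(traces, [(0, 2), (1, 3)])
uncorrelated = clock_pulses(traces, [(0, 1), (2, 3)])
print(correlated, uncorrelated)
```

In this toy trace, pairing flip-flops with matching activity delivers half as many clock pulses as pairing flip-flops with unrelated activity, which is the intuition behind grouping by toggling correlation. Physical proximity constraints are not modeled here.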

    7. Effect of Image Downsampling on Steganographic Security

    The accuracy of steganalysis in digital images primarily depends on the statistical properties of

    neighboring pixels, which are strongly affected by the image acquisition pipeline as well as any

    processing applied to the image. In this paper, we study how the detectability of embedding

    changes is affected when the cover image is downsampled prior to embedding. This topic is

    important for practitioners because the vast majority of images posted on

    websites, image sharing portals, or attached to e-mails are downsampled. It is also relevant to

    researchers as the security of steganographic algorithms is commonly evaluated on databases of

    downsampled images. In the first part of this paper, we investigate empirically how the

    steganalysis results depend on the parameters of the resizing algorithm: the choice of the

    interpolation kernel, the scaling factor (resize ratio), antialiasing, and the downsampled pixel

    grid alignment. We report on several novel phenomena that appear valid universally across the

    tested cover sources, steganographic methods, and steganalysis features. This paper continues

    with a theoretical analysis of the simplest interpolation kernel - the box kernel. By fitting a

    Markov chain model to pixel rows, we analytically compute the Fisher information rate for any

    mutually independent embedding operation and derive the proper scaling of the secure payload

    with resizing. For least significant bit (LSB) matching and a limited range of downscaling, the

    theory fits experiments rather well, which indicates the existence of a new scaling law expressing

    the length of the secure payload when the cover size is modified by subsampling.
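
For readers unfamiliar with the box kernel mentioned above, the following minimal sketch shows what it does to a grayscale image: each output pixel is the plain average of a block of input pixels. It assumes an integer scaling factor and a block grid aligned to the top-left corner; the function name `box_downsample` is hypothetical and the code does not reproduce the paper's experiments.

```python
def box_downsample(image, factor):
    """Downsample a 2-D list of pixel values by averaging factor x factor blocks.

    Assumptions: integer factor, block grid aligned to the top-left pixel,
    and any incomplete blocks at the right or bottom edge are dropped.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


image = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
small = box_downsample(image, 2)
print(small)
```

Because every output pixel mixes several input pixels, the statistics of neighboring pixels change with the scaling factor, which is the property the abstract relates to detectability.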

    8. An FPGA-Based Fully Synchronized Design of a Bilateral Filter for Real-Time

    Image Denoising

    In this paper, a detailed description of a synchronous field-programmable gate array

    implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for

    one unique reason: It reduces noise while preserving details. The design is described on register-

    transfer level. The distinctive feature of our design concept consists of changing the clock

    domain in a manner that kernel-based processing is possible, which means the processing of the

    entire filter window at one pixel clock cycle. This feature of the kernel-based design is supported

    by the arrangement of the input data into groups so that the internal clock of the design is a

    multiple of the pixel clock given by a targeted system. Additionally, by the exploitation of the

    separability and the symmetry of one filter component, the complexity of the design is widely

    reduced. Combining these features, the bilateral filter is implemented as a highly parallelized

    pipeline structure with very economical and effective utilization of dedicated resources. Due to

    the modularity of the filter design, kernels of different sizes can be implemented with low effort

    using our design and given instructions for scaling. As the original form of the bilateral filter with

    no approximations or modifications is implemented, the resulting image quality depends on the

    chosen filter parameters only. Due to the quantization of the filter coefficients, only negligible

    quality loss is introduced.
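
As a software reference for what such a filter computes, here is a minimal one-dimensional sketch of the standard bilateral filter: each output sample is a weighted average in which the weight is the product of a spatial Gaussian and a Gaussian on the intensity difference. It is not the FPGA design; the function name `bilateral_1d` and the parameter values are illustrative assumptions.

```python
import math


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Apply a standard bilateral filter to a 1-D list of samples.

    sigma_s controls the spatial (distance) weight and sigma_r the range
    (intensity difference) weight. The window is clipped at the borders.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((k - i) ** 2) / (2 * sigma_s ** 2))
            rng = math.exp(-((signal[k] - signal[i]) ** 2) / (2 * sigma_r ** 2))
            num += spatial * rng * signal[k]
            den += spatial * rng
        out.append(num / den)
    return out


step = [0, 0, 0, 0, 100, 100, 100, 100]    # a sharp edge
noisy = [10, 12, 10, 12, 10]               # small fluctuations on a flat area
edge_out = bilateral_1d(step, radius=2, sigma_s=1.0, sigma_r=10.0)
flat_out = bilateral_1d(noisy, radius=2, sigma_s=1.0, sigma_r=10.0)
print(edge_out)
print(flat_out)
```

The edge stays essentially untouched because samples across it receive a near-zero range weight, while the small fluctuations are averaged out. This is the noise reduction with detail preservation that the abstract cites as the reason for choosing the filter.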

    9. Subjective evaluation of HEVC and AVC/H.264 in mobile environments

    This paper compares the quality of AVC/H.264 and HEVC encoded video in low bandwidth

    mobile environments. In this study, the focus within the mobile environment is smart phones.

    The key characteristics of a smart phone are smaller screen size, which is usually 3.5 inches

    diagonal to 5.0 inches diagonal for high end smart phones and typical cellular network

    bandwidth, which is 3G or faster. Subjective evaluations were conducted to evaluate the user

    experience on a mobile device with a small screen size and video coded at 200 and 400 Kbps.

    The studies showed compelling evidence that a user's experience in low bandwidth mobile

    environments is very similar between HEVC and AVC/H.264. The results suggest the benefits of

    HEVC over AVC/H.264 in a mobile environment with lower video bitrates and resolutions are

    not as clear.

    10. Improved Method to Select the Lagrange Multiplier for Rate-Distortion Based

    Motion Estimation in Video Coding

    The motion estimation (ME) process used in the H.264/AVC reference software is based on

    minimizing a cost function that involves two terms (distortion and rate) that are properly

    balanced through a Lagrangian parameter, usually denoted as λ_motion. In this paper we propose

    an algorithm to improve the conventional way of estimating λ_motion and, consequently, the ME

    process. First, we show that the conventional estimation of λ_motion turns out to be significantly

    less accurate when ME-compromising events, which cause the ME process to perform poorly,

    occur. Second, with the aim of improving the coding efficiency in these cases, an efficient

    algorithm is proposed that allows the encoder to choose between three different values of

    λ_motion for the Inter 16x16 partition size. To be more precise, for this partition size, the

    proposed algorithm allows the encoder to additionally test λ_motion = 0 and λ_motion arbitrarily

    large, which correspond to minimum distortion and minimum rate solutions, respectively. By

    testing these two extreme values, the algorithm avoids making large ME errors. The

    experimental results on video segments exhibiting this type of ME-compromising events reveal

    an average rate reduction of 2.20% for the same coding quality with respect to the JM15.1

    reference software of H.264/AVC. The algorithm has been also tested in comparison with a

    state-of-the-art algorithm called context adaptive Lagrange multiplier. Additionally, two

    illustrative examples of the subjective performance improvement are provided.
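
The role of the Lagrangian parameter can be made concrete with a small sketch. It assumes the usual cost form J = D + λ·R over a set of candidate motion vectors, each summarized by a hypothetical (distortion, rate) pair. The function names and numbers are illustrative, and the abstract does not say how the encoder makes the final choice among the three resulting candidates.

```python
def best_candidate(candidates, lam):
    """Return the (distortion, rate) pair that minimizes J = D + lam * R."""
    return min(candidates, key=lambda c: c[0] + lam * c[1])


def choose_with_extremes(candidates, lam_conventional, lam_large=1e9):
    """Collect the picks for lam = 0, the conventional value, and a very large value.

    lam = 0 ignores rate (minimum distortion); a very large lam is dominated
    by rate (minimum rate). lam_large is an assumed stand-in for "arbitrarily large".
    """
    return {
        "min_distortion": best_candidate(candidates, 0.0),
        "conventional": best_candidate(candidates, lam_conventional),
        "min_rate": best_candidate(candidates, lam_large),
    }


# Hypothetical candidates: (distortion, rate in bits)
candidates = [(100, 2), (60, 10), (40, 30)]
picks = choose_with_extremes(candidates, lam_conventional=4.0)
print(picks)
```

With these numbers the three settings select three different candidates, which shows why testing the two extremes can guard against a poorly estimated conventional value.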

    11. An Overview of Information Hiding in H.264/AVC Compressed Video

    Information hiding refers to the process of inserting information into a host to serve specific

    purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video

    domain are surveyed. First, the general framework of information hiding is conceptualized by

    relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by

    using various data representation schemes such as bit plane replacement, spread spectrum,

    histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which

    information hiding takes place are then identified, including prediction process, transformation,

    quantization, and entropy coding. Related information hiding methods at each venue are briefly

    reviewed, along with the presentation of the targeted applications, appropriate diagrams, and

    references. A timeline diagram is constructed to chronologically summarize the invention of

    information hiding methods in the compressed still image and video domains since 1992. A

    comparison among the considered information hiding methods is also conducted in terms of

    venue, payload, bitstream size overhead, video quality, computational complexity, and video

    criteria. Further perspectives and recommendations are presented to provide a better

    understanding of the current trend of information hiding and to identify new opportunities for

    information hiding in compressed video.

    12. VLSI Architecture Design of Guided Filter for 30 Frames/s Full-HD

    Video

    Filtering is widely used in image and video processing for various applications. Recently, the

    guided filter has been proposed and has become one of the popular filtering methods. In this paper, to

    meet the computation demand of guided filtering in full-HD video, a double integral image

    architecture for guided filter ASIC design is proposed. In addition, a reformulation of the guided

    filter formula is proposed, which can prevent the error resulting from truncation in the fractional

    part and modify the regularization parameter on user's demand. The hardware architecture of

    the guided image filter is then proposed and can be embedded in mobile devices to achieve real-

    time HD applications. To the best of our knowledge, this paper is also the first ASIC design for

    guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and

    support for Full-HD (1920 × 1080) 30 frames/s with 92.9K gate counts and 3.2 KB on-chip

    memory. Moreover, for the hardware efficiency, our architecture is also the best compared to

    other previous works with bilateral filter.

    13. Property Analysis of XOR-Based Visual Cryptography

    A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images

    (printed on transparencies) distributed among n participants. When any k participants

    superimpose their transparencies on an overhead projector (OR operation), the secret image can

    be visually revealed by a human visual system without computation. However, the monotone

    property of OR operation degrades the visual quality of reconstructed image for OR-based VCS

    (OVCS). Accordingly, XOR-based VCS (XVCS), which uses XOR operation for decoding, was

    proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and

    XVCS. Our main contribution is to theoretically prove that the basis matrices of (k,n)-OVCS can

    be used in (k,n)-XVCS. Meanwhile, the contrast is enhanced 2^(k-1)

    times.
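
A small worked example for the simplest case, k = n = 2, makes the comparison tangible. The sketch below uses the textbook (2,2) basis matrices with two subpixels per pixel and the common contrast definition (black weight minus white weight, divided by the number of subpixels). These choices and all names are assumptions for illustration, not taken from the paper.

```python
# Textbook (2,2) basis matrices: rows are shares, columns are subpixels.
WHITE = [[1, 0], [1, 0]]
BLACK = [[1, 0], [0, 1]]


def stack(rows, op):
    """Combine share rows subpixel by subpixel with the given binary operation."""
    result = rows[0]
    for row in rows[1:]:
        result = [op(a, b) for a, b in zip(result, row)]
    return result


def contrast(op):
    """Contrast = (weight of stacked black - weight of stacked white) / subpixels."""
    m = len(WHITE[0])
    w_white = sum(stack(WHITE, op))
    w_black = sum(stack(BLACK, op))
    return (w_black - w_white) / m


or_contrast = contrast(lambda a, b: a | b)    # superimposing transparencies
xor_contrast = contrast(lambda a, b: a ^ b)   # XOR-based decoding
print(or_contrast, xor_contrast)
```

With the same basis matrices, XOR decoding turns the white pixel fully white and doubles the contrast relative to OR stacking in this k = 2 case.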

    14. Effectiveness of Leakage Power Analysis Attacks on DPA-Resistant Logic Styles

    Under Process Variations

    This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to

    cryptographic VLSI circuits on which circuit level countermeasures against Differential Power

    Analysis (DPA) are adopted. Security metrics used for assessing the DPA-resistance of crypto

    core implementations, such as the minimum number to disclosure (MTD) and the asymptotic

    correlation coefficient, have been extended to the case of LPA. The LPA-resistance has been

    evaluated in terms of MTD as a function of the on chip noise. Noise variances up to 10000 times

    greater than the signal variance have been taken into account and LPA attacks have been

    successfully executed for all the logic styles under analysis using less than 100000

    measurements. Moreover the role of process variations has been investigated through extensive

    Monte Carlo simulations in order to evaluate their impact on the leakage model for the logic

    styles under analysis. Results show that LPA attacks can be successfully carried out on the

    different anti-DPA logic styles even in presence of process variations. To the best of our

    knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario

    where on chip noise and process variations are taken into account.

    15. Data Hiding in Encrypted H.264/AVC Video Streams by Codeword Substitution

    Digital video sometimes needs to be stored and processed in an encrypted format to maintain

    security and privacy. For the purpose of content notation and/or tampering detection, it is

    necessary to perform data hiding in these encrypted videos. In this way, data hiding in encrypted

    domain without decryption preserves the confidentiality of the content. In addition, it is more

    efficient without decryption followed by data hiding and re-encryption. In this paper, a novel

    scheme of data hiding directly in the encrypted version of H.264/AVC video stream is proposed,

    which includes the following three parts, i.e., H.264/AVC video encryption, data embedding, and

    data extraction. By analyzing the property of H.264/AVC codec, the codewords of

    intraprediction modes, the codewords of motion vector differences, and the codewords of

    residual coefficients are encrypted with stream ciphers. Then, a data hider may embed additional

    data in the encrypted domain by using codeword substitution technique, without knowing the

    original video content. In order to adapt to different application scenarios, data extraction can be

    done either in the encrypted domain or in the decrypted domain. Furthermore, video file size is

    strictly preserved even after encryption and data embedding. Experimental results have

    demonstrated the feasibility and efficiency of the proposed scheme.

    16. Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images

    This paper studies the impact of secure watermark embedding in digital images by proposing a

    practical implementation of secure spread-spectrum watermarking using distortion optimization.

    Because strong security properties (key-security and subspace-security) can be achieved using

    natural watermarking (NW) since this particular embedding leaves the distribution of the host and

    watermarked signals unchanged, we use elements of transportation theory to minimize the global

    distortion. Next, we apply this new modulation, called transportation NW (TNW), to design a

    secure watermarking scheme for grayscale images. The TNW uses a multiresolution image

    decomposition combined with a multiplicative embedding which is taken into account at the

    distribution level. We show that the distortion solely relies on the variance of the wavelet

    subbands used during the embedding. In order to maximize a target robustness after JPEG

    compression, we select different combinations of subbands offering the lowest Bit Error Rates

    for a target PSNR ranging from 35 to 55 dB and we propose an algorithm to select them. The use

    of transportation theory also provides an average PSNR gain of 3.6 dB with respect to

    the previous embedding for a set of 2000 images.

    17. Impulse Noise Estimation and Removal for OFDM Systems

    Orthogonal Frequency Division Multiplexing (OFDM) is a modulation scheme that is widely

    used in wired and wireless communication systems. While OFDM is ideally suited to deal with

    frequency selective channels and AWGN, its performance may be dramatically impacted by the

    presence of impulse noise. In fact, very strong noise impulses in the time domain might result in

    the erasure of whole OFDM blocks of symbols at the receiver. Impulse noise can be mitigated by

    considering it as a sparse signal in time, and using recently developed algorithms for sparse

    signal reconstruction. We propose an algorithm that utilizes the guard band null subcarriers for

    the impulse noise estimation and cancellation. Instead of relying on ℓ1 minimization as done

    in some popular general-purpose compressive sensing schemes, the proposed method jointly

    exploits the specific structure of this problem and the available a priori information for sparse

    signal recovery. The computational complexity of the proposed algorithm is very competitive

    with respect to sparse signal reconstruction schemes based on ℓ1 minimization. The proposed

    method is compared with respect to other state-of-the-art methods in terms of achievable rates

    for an OFDM system with impulse noise and AWGN.

    18. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications

    for Efficient FIR Filter Implementation

    The multiple constant multiplication (MCM) scheme is widely used for implementing transposed

    direct-form FIR filters. While the research focus of MCM has been on more effective common

    subexpression elimination, the optimization of adder-trees, which sum up the computed sub-

    expressions for each coefficient, is largely omitted. In this paper, we have identified the resource

    minimization problem in the scheduling of adder-tree operations for the MCM block, and

    presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based

    implementation of FIR filters. Experimental results show that up to 15% reduction of area and

    11.6% reduction of power (with an average of 8.46% and 5.96% respectively) can be achieved

    on top of the already optimized adder/subtractor network of the MCM block.
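
    As a rough illustration of the MCM idea (not the MIP-based adder-tree scheduling of the paper), the Python sketch below multiplies one input sample by several constants using only shifts and additions, reuses a shared subexpression, and feeds the products into a transposed direct-form FIR filter. The coefficients 5, 10, and 21 and the function names are hypothetical choices made for this example.

```python
def mcm_block(x):
    """Multiply x by the constants 5, 10 and 21 using shifts and adds only."""
    t5 = (x << 2) + x      # 5x  = 4x + x        (one adder, shared below)
    t10 = t5 << 1          # 10x = 2 * 5x        (shift only, reuses t5)
    t21 = (t5 << 2) + x    # 21x = 4 * 5x + x    (one more adder, reuses t5)
    return t5, t10, t21


def fir_transposed(samples):
    """3-tap transposed direct-form FIR with coefficients h = [5, 10, 21]."""
    z1 = z2 = 0            # delay registers between the output adders
    outputs = []
    for x in samples:
        p0, p1, p2 = mcm_block(x)   # all products come from one MCM block
        outputs.append(p0 + z1)
        z1 = p1 + z2
        z2 = p2
    return outputs


impulse_response = fir_transposed([1, 0, 0, 0])
```

    In this sketch the three products cost two adders in total instead of three multipliers, which is the kind of saving that common subexpression sharing aims for.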

    19. Frequency Estimation of Distorted and Noisy Signals in Power Systems by FFT-

    Based Approach

    This paper focuses on the accurate frequency estimation of power signals corrupted by

    stationary white noise. The noneven item interpolation FFT based on the triangular self-

    convolution window is described. A simple analytical expression for the variance of noise

    contribution on the frequency estimation is derived, which shows that the variances of frequency

    estimation are proportional to the energy of the adopted window. Based on the proposed method,

    the noise level of the measurement channel can be estimated, and optimal parameters (e.g.,

    sampling frequency and window length) of the interpolation FFT algorithm that minimize the

    variances of frequency estimation can thus be determined. The application in a power quality

    analyzer verified the usefulness of the proposed method.

    20. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and

    Calibration Approaches

    The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral

    analysis because of its computational efficiency. However, the presence of multiple tones

    frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the

    use of a large number of FFT points that results in large area and power overheads. In this paper,

    an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular

    harmonic and intermodulation components. This accurate FFT analysis approach is based on

    coherent sampling, but it requires a significantly smaller number of points to make

    the FFT realization more suitable for on-chip built-in testing and calibration applications that

    require area and power efficiency. The technique was assessed by comparing the simulation

    results from the proposed method of single and multiple tones with the simulation results

    obtained from the FFT of coherently sampled tones. The results indicate that the proper selection

    of test tone frequencies can avoid spectral leakage even with multiple narrowly spaced tones.

    When low-frequency signals are captured with an analog-to-digital converter (ADC) for on-chip

    analysis, the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth

    limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation

    (IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50

    dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS

    technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its

    estimated power consumption is 6.47 mW.
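
    The role of coherent sampling can be illustrated with a small sketch. Assuming a record of N = 16 samples (matching the 16-point FFT mentioned above) and a test tone that completes a whole number of cycles within the record, all tone energy falls into a single pair of bins; a tone with a fractional number of cycles leaks into neighbouring bins. The helper names and the choice of 3 and 3.5 cycles are hypothetical, and the transform is a plain DFT rather than the method proposed in the paper.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the N-point DFT, computed directly (fine for small N)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def tone(cycles, n):
    """n samples of a unit-amplitude cosine with `cycles` periods per record."""
    return [math.cos(2 * math.pi * cycles * t / n) for t in range(n)]


N = 16
coherent = dft_magnitudes(tone(3, N))        # whole number of cycles
non_coherent = dft_magnitudes(tone(3.5, N))  # fractional number of cycles

# Largest magnitude found outside the bins that should hold the tone (3 and N-3).
leak_coherent = max(m for k, m in enumerate(coherent) if k not in (3, N - 3))
leak_non_coherent = max(m for k, m in enumerate(non_coherent) if k not in (3, N - 3))
```

    With the coherent tone, the leakage value is at the level of floating-point rounding, whereas the non-coherent tone spreads a large part of its energy into other bins.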

    21. Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low

    Adaptation-Delay

    In this paper, we present an efficient architecture for the implementation of a delayed least mean

    square adaptive filter. For achieving lower adaptation-delay and area-delay-power efficient

    implementation, we use a novel partial product generator and propose a strategy for optimized

    balanced pipelining across the time-consuming combinational blocks of the structure. From

    synthesis results, we find that the proposed design offers nearly 17% less area-delay product

    (ADP) and nearly 14% less energy-delay product (EDP) than the best of the existing systolic

    structures, on average, for filter lengths N=8, 16, and 32. We propose an efficient fixed-point

    implementation scheme of the proposed architecture, and derive the expression for steady-state

    error. We show that the steady-state mean squared error obtained from the analytical result

    matches the simulation result. Moreover, we have proposed a bit-level pruning of the

    proposed architecture, which provides nearly 20% saving in ADP and 9% saving in EDP over

    the proposed structure before pruning without noticeable degradation of steady-state-error

    performance.
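
    For readers unfamiliar with the delayed LMS algorithm, a minimal floating-point sketch follows. It assumes the textbook update w(n+1) = w(n) + mu * e(n-m) * x(n-m), where m is the adaptation delay, and identifies a small hypothetical FIR system. It does not model the pipelined fixed-point architecture, the partial product generator, or the pruning described in the abstract; the step size, delay, and system coefficients are assumptions made for this example.

```python
import random
from collections import deque


def delayed_lms(x, d, num_taps, mu, delay):
    """Delayed LMS: the weight update at time n uses the error and the input
    vector from `delay` samples earlier (delay=0 gives the ordinary LMS)."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps            # most recent input sample first
    pending = deque()                  # (error, input vector) pairs awaiting use
    errors = []
    for xn, dn in zip(x, d):
        taps = [xn] + taps[:-1]        # new list each time, safe to store
        y = sum(wi * ti for wi, ti in zip(w, taps))
        e = dn - y
        errors.append(e)
        pending.append((e, taps))
        if len(pending) > delay:
            e_old, taps_old = pending.popleft()
            w = [wi + mu * e_old * ti for wi, ti in zip(w, taps_old)]
    return w, errors


# Hypothetical system-identification run with a noise-free 3-tap plant.
random.seed(0)
h = [0.5, -0.3, 0.2]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
w, errors = delayed_lms(x, d, num_taps=3, mu=0.05, delay=4)
```

    The adaptation delay is what allows the error computation and the weight update to be pipelined in hardware, at the price of a tighter stability bound on the step size.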

    22. Efficient Integer DCT Architectures for High Efficiency Video CODEC standard

    In this paper, we present area- and power-efficient architectures for the implementation of

    integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video

    Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to

    derive parallel architectures for 1-D integer DCT of different lengths. We also show that the

    proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32

    DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed

    architecture could be pruned to reduce the complexity of implementation substantially with only

    a marginal effect on the coding performance. We propose power-efficient structures for folded

    and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the

    proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy

    per sample (EPS) compared to the direct implementation of the reference algorithm, on average,

    for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20%

    saving in EPS can be achieved by the proposed pruning algorithm with nearly the same

    throughput rate. The proposed architecture is found to support ultrahigh-definition 7680 × 4320

    video at 60 frames/s, which is one of the applications of HEVC.

    23. Low-Cost Low-Power ASIC Solution for Both DAB+ and DAB Audio Decoding

    DAB+ is the upgraded version of digital audio broadcasting (DAB). DAB and DAB+ coexist in

    many countries, so receivers are required to be compatible with both standards. In this paper, a

    solution integrating an MPEG1-LayerII (MP2) decoder and an advanced audio coding

    (AAC) low-complexity (AAC LC) decoder is proposed to provide basic audio decoding for both

    DAB and DAB+. It also utilizes simple methods to improve high frequencies and stereo quality

    instead of complicated spectrum band replication and parametric stereo. A highly integrated low-

    power audio decoder design compatible with DAB/DAB+ and using a purely ASIC approach is

    presented. As a result of the system structure optimization and hardware sharing, the audio

    decoder is fabricated in 1P4M 0.18-μm CMOS technology using only 3.2 mm² silicon area

    (including 147 456 bits RAM and 170 496 bits ROM). The power consumption of the audio

    decoder is 10.4 mW for DAB audio decoding and 8.5 mW for DAB+ audio decoding.

    Laboratory and field tests show that the function is correct and the audio quality is good for

    receiving both DAB and DAB+. The audio decoder is thus proven to be a low-cost low-

    power solution for the two existing DAB standards.

    24. Low-Power Digital Signal Processor Architecture for Wireless Sensor Nodes

    Radio communication exhibits the highest energy consumption in wireless sensor nodes. Given

    their limited energy supply from batteries or scavenging, these nodes must trade data

    communication for on-the-node computation. Currently, they are designed around off-the-

    shelf low-power microcontrollers. But by employing a more appropriate processing element, the

    energy consumption can be significantly reduced. This paper describes the design and

    implementation of the newly proposed folded-tree architecture for on-the-node data processing

    in wireless sensor networks, using parallel prefix operations and data locality in hardware.

    Measurements of the silicon implementation show an improvement of 10-20× in terms of energy

    as compared to traditional modern micro-controllers found in sensor nodes.
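
    The parallel prefix operations mentioned above can be sketched in software. The example below is a generic tree-based exclusive scan with an up-sweep and a down-sweep phase; it is meant only to show the data flow over a binary tree and is not a model of the folded-tree hardware itself. The function name and default arguments are hypothetical.

```python
def tree_prefix_scan(values, op=lambda a, b: a + b, identity=0):
    """Exclusive prefix scan over a binary tree (up-sweep, then down-sweep).
    The number of values must be a non-zero power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    a = list(values)

    # Up-sweep: every level combines independent pairs of partial results.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            a[i] = op(a[i - step], a[i])
        step *= 2

    # Down-sweep: distribute the prefixes back towards the leaves.
    a[n - 1] = identity
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]
            a[i] = op(a[i], left)
        step //= 2
    return a


running_totals = tree_prefix_scan([3, 1, 4, 1, 5, 9, 2, 6])
```

    All operations within one tree level are independent of each other, which is what makes this pattern attractive for parallel hardware with local data movement.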

    25. Memory Footprint Reduction for Power-Efficient Realization of 2-D Finite

    Impulse Response Filters

    We have analyzed memory footprint and combinational complexity to arrive at a systematic

    design strategy to derive area-delay-power-efficient architectures for two-dimensional (2-D)

    finite impulse response (FIR) filters. We have presented novel block-based structures for

    separable and non-separable filters with less memory footprint by memory sharing and memory-

    reuse along with appropriate scheduling of computations and design of storage architecture. The

    proposed structures involve L times less storage per output (SPO), and nearly L times less energy

    consumption per output (EPO) compared with the existing structures, where L is the input block-

    size. They involve L times more arithmetic resources than the best of the corresponding existing

    structures, and produce L times more throughput with less memory band-width (MBW) than

    others. We have also proposed separate generic structures for separable and non-separable filter-

    banks, and a unified structure of filter-bank constituting symmetric and general filters. The

    proposed unified structure for 6 parallel filters involves nearly 3.6L times more multipliers, 3L

    times more adders, ($N^{2}-N+2$) fewer registers than a similar existing unified structure, and computes

    6L times more filter outputs per cycle with 6L times less MBW than the existing design, where

    N is the FIR filter size in each dimension. ASIC synthesis results show that for filter size (4 × 4),

    input-block size L=4, and image-size (512 × 512), the proposed block-based non-separable and

    generic non-separable structures, respectively, involve 5.95 times and 11.25 times less area-

    delay-product (ADP), and 5.81 times and 15.63 times less EPO than the corresponding existing

    structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO

    than the corresponding existing structure.

    26. Ultra-High Throughput Low-Power Packet Classification

    Packet classification is used by networking equipment to sort packets into flows by comparing

    their headers to a list of rules, with packets placed in the flow determined by the matched rule. A

    flow is used to decide a packet's priority and the manner in which it is processed. Packet

    classification is a difficult task due to the fact that all packets must be processed at wire speed

    and rulesets can contain tens of thousands of rules. The contribution of this paper is a hardware

    accelerator that can classify up to 433 million packets per second when using rule sets containing

    tens of thousands of rules with a peak power consumption of only 9.03 W when using a Stratix

    III field-programmable gate array (FPGA). The hardware accelerator uses a modified version of

    the HyperCuts packet classification algorithm, with a new pre-cutting process used to reduce the

    amount of memory needed to save the search structure for large rulesets so that it is small

    enough to fit in the on-chip memory of an FPGA. The modified algorithm also removes the need

    for floating point division to be performed when classifying a packet, allowing higher clock

    speeds and thus obtaining higher throughputs.
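
    To make the classification task concrete, here is a minimal software baseline: a linear search that compares five header fields against an ordered rule list and returns the flow of the first matching rule. This is the naive reference behaviour that decision-tree algorithms such as HyperCuts accelerate, not the modified algorithm of the paper. The field layout, the rule contents, and the names `Rule` and `classify` are assumptions made for this example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high) bounds for one header field


class Rule(NamedTuple):
    src: Range
    dst: Range
    src_port: Range
    dst_port: Range
    proto: Range
    flow: str


def classify(packet: Sequence[int], rules: Sequence[Rule]) -> Optional[str]:
    """Return the flow of the first (highest-priority) rule that matches
    all five header fields, or None if no rule matches."""
    for rule in rules:
        fields = rule[:5]
        if all(lo <= value <= hi for value, (lo, hi) in zip(packet, fields)):
            return rule.flow
    return None


ANY32 = (0, 0xFFFFFFFF)
ANY16 = (0, 0xFFFF)
rules = [
    # TCP traffic from 10.0.0.0/8 to destination port 80.
    Rule((0x0A000000, 0x0AFFFFFF), ANY32, ANY16, (80, 80), (6, 6), "web"),
    # Catch-all rule with the lowest priority.
    Rule(ANY32, ANY32, ANY16, ANY16, (0, 255), "default"),
]
web_flow = classify((0x0A010203, 0xC0A80001, 12345, 80, 6), rules)
other_flow = classify((0x0A010203, 0xC0A80001, 12345, 22, 6), rules)
```

    The cost of this baseline grows linearly with the number of rules, which is why rule sets with tens of thousands of entries call for a precomputed search structure.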

    27. A Configurable and Low-Power Mixed Signal SoC for Portable ECG Monitoring

    Applications

    This paper describes a mixed-signal ECG System-on-chip (SoC) that is capable of implementing

    configurable functionality with low-power consumption

    for portable ECG monitoring applications. A low-voltage and high performance analog front-end

    extracts 3-channel ECG signals and single channel impedance measurement with

    high signal quality. A custom digital signal processor provides the configurability and advanced

    functionality like motion artifact removal and R peak detection. The SoC is implemented in

    0.18-μm CMOS process and consumes a minimum of 31.1 μW from a 1.2 V supply.

    28. Partial Access Mode: New Method for Reducing Power Consumption of Dynamic

    Random Access Memory

    Demands have been placed on a dynamic random access memory (DRAM) to not only have

    increased memory capacity and data transfer speed, but also have reduced operating and standby

    currents. When a system uses a DRAM, a refresh operation is necessary because of its data

    retention time restriction: each bit of the DRAM is stored as an amount of electrical charge in a

    storage capacitor that is discharged by the leakage current. Power consumption for the refresh

    operation increases in proportion to the memory capacity. We propose

    a new method to reduce the refresh power consumption by effectively extending the memory cell

    retention time. Conversion from 1 cell/bit to $2^{N}$ cells/bit reduces the variation in the

    retention time among memory cells. Although active power increases by a factor of $2^{N}$,

    the refresh time increases by more than $2^{N}$ as a consequence of the fact that the majority

    decision does better than averaging for the tail distribution of retention time. The conversion can

    be realized very simply from the structure of the DRAM array circuit, and it reduces the

    frequency of disturbance and power consumption by two orders of magnitude. On the basis of

    this conversion method, we propose

    a partial access mode to reduce power consumption dynamically when the full memory capacity

    is not required.

    29. Reliability-Oriented Placement and Routing Algorithm for SRAM-Based FPGAs

    As the feature size shrinks to the nanometer scale, SRAM-based FPGAs will become

    increasingly vulnerable to soft errors. Existing reliability-

    oriented placement and routing approaches primarily focus on reducing the fault occurrence

    probability (node error rate) of soft errors. However, our analysis shows that, besides the fault

    occurrence probability, the propagation probability (error propagation probability) plays an

    important role and should be taken into consideration. In this paper, we first propose a cube-

    based analysis algorithm to efficiently and accurately estimate the error propagation

    probability. Based on such a model, we propose a novel reliability-

    oriented placement and routing algorithm that combines both the fault occurrence probability and

    the error propagation probability together to enhance system-level robustness against soft errors.

    Experimental results show that, compared with the baseline versatile place and route technique,

    the proposed scheme can reduce the failure rate by 20.73%, and increase the mean time between

    failures by 39.44%.

    30. Time-Based All-Digital Technique for Analog Built-in Self-Test

    A scheme for built-in self-test of analog signals with minimal area overhead for measuring on-

    chip voltages in an all-digital manner is presented. The method is well suited for a distributed

    architecture, where the routing of analog signals over long paths is minimized. A clock is routed

    serially to the sampling heads placed at the nodes of analog test voltages. The sampling head

    present at each test node, which consists of a pair of delay cells and a pair of flip-flops, locally

    converts the test voltage to a skew between a pair of subsampled signals, thus giving rise to as

    many subsampled signal pairs as the number of nodes. To measure a certain analog voltage, the

    corresponding subsampled signal pair is fed to a delay measurement unit to measure the skew

    between this pair. The concept is validated by designing a test chip in a UMC 130-nm CMOS

    process. Sub-millivolt accuracy for static signals is demonstrated for a measurement time of a

    few seconds, and an effective number of bits of 5.29 is demonstrated for low-bandwidth signals

    in the absence of sample-and-hold circuitry.

    31. Improved 8-Point Approximate DCT for Image and Video Compression Requiring

    Only 14 Additions

    The demand for low-energy video processing systems, such as HEVC, in the

    multimedia market has led to extensive development in fast algorithms for the efficient

    approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression

    standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT

    transforms have been proposed that offer superior compression performance at very low circuit

    complexity. Such approximations can be realized in digital VLSI hardware using additions and

    subtractions only, leading to significant reductions in chip area and power consumption

    compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-

    point DCT approximation that requires only 14 addition operations and no multiplications. The

    proposed transform possesses low computational complexity and is compared to state-of-the-art

    DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The

    proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC.

    The proposed transform and several other DCT approximations are mapped to systolic-array

    digital architectures and physically realized as digital prototype circuits using FPGA technology

    and mapped to 45 nm CMOS technology.

    32. Reconfigurable CORDIC-Based Low-Power DCT Architecture Based on Data

    Priority

    This paper presents a low-power coordinate rotation digital computer (CORDIC)-based

    reconfigurable discrete cosine transform (DCT) architecture. The main idea of this paper is based

    on the interesting fact that not all the computations in the DCT are equally important in generating

    the frequency domain outputs. Considering the difference in importance among the DCT coefficients,

    the number of CORDIC iterations can be dynamically changed to efficiently trade off image

    quality for power consumption. Thus, the computational energy can be significantly reduced

    without seriously compromising the image quality. The proposed CORDIC-based 2-D DCT

    architecture is implemented using a 0.13-μm CMOS process, and the experimental results show

    that our reconfigurable DCT achieves power savings ranging from 22.9% to 52.2% over the

    CORDIC-based Loeffler DCT at the cost of minor image quality degradations.
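
    The trade-off between CORDIC iteration count and accuracy can be seen in a short floating-point sketch. It assumes the standard rotation-mode CORDIC recurrence and rotates a unit vector by π/8, an angle chosen here only as an example; the function names are hypothetical and the code does not reproduce the data-priority scheme of the paper.

```python
import math


def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.
    Each iteration needs only shifts and adds in a fixed-point design;
    fewer iterations mean less work but a larger residual angle error."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    # Compensate the known gain of the micro-rotations.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, y / gain


def rotation_error(iterations, angle=math.pi / 8):
    """Distance between the CORDIC result and the exact rotation of (1, 0)."""
    cx, cy = cordic_rotate(1.0, 0.0, angle, iterations)
    return math.hypot(cx - math.cos(angle), cy - math.sin(angle))


coarse_error = rotation_error(4)    # few iterations: cheaper, less accurate
fine_error = rotation_error(16)     # more iterations: costlier, more accurate
```

    Running fewer iterations on the less important coefficients is one way to spend less energy where the error matters least.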

    33. Data Encoding Techniques for Reducing Energy Consumption in Network-on-Chip

    As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to

    compete with the power dissipated by the other elements of the communication subsystem,

    namely, the routers and the network interfaces (NIs). In this paper, we present a set of data

    encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed

    schemes are general and transparent with respect to the underlying NoC fabric (i.e., their

    application does not require any modification of the routers and link architecture). Experiments

    carried out on both synthetic and real traffic scenarios show the effectiveness of the proposed

    schemes, which save up to 51% of power dissipation and 14% of energy consumption

    without any significant performance degradation and with less than 15% area overhead in the NI.
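
    The general principle behind link encoding is to reduce the number of wires that toggle between consecutive transfers. The sketch below uses classic bus-invert coding as a stand-in; it is a well-known textbook technique and not one of the schemes proposed in the paper. The 8-bit link width, the function names, and the sample data are assumptions made for this example.

```python
def bus_invert_encode(words, width=8):
    """Send each word either as-is or inverted, whichever toggles fewer
    data wires. Returns (data_on_wires, invert_flag) pairs."""
    mask = (1 << width) - 1
    prev = 0                       # value currently held on the data wires
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(prev ^ word).count("1")
        if toggles > width // 2:
            sent, flag = ~word & mask, 1
        else:
            sent, flag = word, 0
        encoded.append((sent, flag))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (data, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~sent & mask) if flag else sent for sent, flag in encoded]


def count_transitions(values):
    """Total number of bit toggles over a sequence, starting from all zeros."""
    prev, total = 0, 0
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
raw_toggles = count_transitions(words)
# Include the invert line as a ninth wire when counting coded toggles.
coded_toggles = count_transitions([(flag << 8) | sent for sent, flag in encoded])
```

    For this worst-case alternating pattern, the coded link toggles only the invert wire, at the cost of one extra wire and a small encoder and decoder at the link ends.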

    34. Achieving High-Performance On-Chip Networks With Shared-Buffer Routers

    On-chip routers typically have buffers dedicated to their input or output ports for temporarily

    storing packets in case contention occurs on output physical channels. Buffers, unfortunately,

    consume significant portions of router area and power budgets. While running a traffic trace,

    however, not all input ports of routers have incoming packets that need to be transferred

    simultaneously. Therefore, a large number of buffer queues in the network are empty while other

    queues are mostly busy. This observation motivates us to design a router architecture with shared

    queues (RoShaQ), which maximizes buffer utilization by allowing the sharing of

    multiple buffer queues among input ports. Sharing queues, in fact, makes using buffers more

    efficient and hence achieves higher throughput when the network load becomes heavy. On

    the other hand, at light traffic load, our router achieves low latency by allowing packets to

    effectively bypass these shared queues. Experimental results on a 65-nm CMOS standard-cell

    process show that over synthetic traffic RoShaQ has 17% less latency and 18% higher

    saturation throughput than a typical virtual-channel (VC) router. Because of its higher

    performance, RoShaQ consumes 9% less energy per transferred packet than a VC router given the

    same buffer space capacity. Over real multitask applications and E3S embedded benchmarks

    using the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower latency than a VC router

    and, when targeting the same application throughput, 30% lower energy per packet.

    35. Energy Efficiency Optimization Through Codesign of the Transmitter and Receiver

    in High-Speed On-Chip Interconnects

    A novel equalized global link architecture and driver-receiver codesign flow are proposed for

    high-speed and low-energy on-chip communication by utilizing a continuous-time linear

    equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the

    formula of CTLE eye opening is derived to provide high-level design guidelines and insights.

    Compared with the separate driver-receiver design flow, over 50% energy reduction is observed.

    The final optimal solution achieves 20-Gb/s signaling over 10 mm, 2.6-μm pitch on-chip

    transmission line with 15.5-ps/mm latency and 0.196-pJ/b energy using 45-nm technology.

    Monte Carlo simulation also shows that 3σ/μ for power and delay variation in the proposed

    global link are 13.1% and 4.6%, respectively.