
CHAPTER I

INTRODUCTION

1.1 INTRODUCTION ABOUT THE PROJECT

Nanotechnology provides smaller, faster, and lower energy devices which

allow more powerful and compact circuitry; however, these benefits come with

a cost—the nanoscale devices may be less reliable. Thermal- and shot-noise

estimations alone suggest that the transient fault rate of an individual nanoscale

device (e.g., transistor or nanowire) may be orders of magnitude higher than

today’s devices. As a result, we can expect combinational logic to be

susceptible to transient faults in addition to storage cells and communication

channels. Therefore, the paradigm of protecting only memory cells and

assuming the surrounding circuitries (i.e., encoder and decoder) will never

introduce errors is no longer valid. In this paper, we introduce a fault-tolerant

nanoscale memory architecture which tolerates transient faults both in the

storage unit and in the supporting logic (i.e., encoder, decoder (corrector), and

detector circuitries). Particularly, this involves identifying a class of error-

correcting codes (ECCs) that guarantees the existence of a simple fault-tolerant

detector design. This class satisfies a new, restricted definition for ECCs which

guarantees that the ECC codeword has an appropriate redundancy structure

such that it can detect multiple errors occurring in both the stored codeword in

memory and the surrounding circuitries. We call this type of error-correcting code a fault-secure detector capable ECC (FSD-ECC). The parity-check matrix of an FSD-ECC has a particular structure such that the decoder circuit generated from the parity-check matrix is fault-secure. The ECCs we identify

in this class are close to optimal in rate and distance, suggesting we can

achieve this property without sacrificing traditional ECC metrics. We use the

fault-secure detection unit to design a fault-tolerant encoder and corrector by

monitoring their outputs. If a detector detects an error in either of these units,

that unit must repeat the operation to generate the correct output vector. Using

this retry technique, we can correct potential transient errors in the encoder and

corrector outputs and provide a fully fault-tolerant memory system.

The novel contributions of this paper include the following:

1. a mathematical definition of ECCs which have a simple FSD and do not require the addition of further redundancy in order to achieve the fault-secure property;

2. identification and proof that an existing LDPC code (EG-LDPC) has the FSD property;

3. a detailed ECC encoder, decoder, and corrector design that can be built out of fault-prone circuits when protected by this fault-secure detector, itself implemented in fault-prone circuits and guarded with a simple OR gate built out of reliable circuitry.

To further show the practical viability of these codes, the engineering design of a nanoscale memory system based on these encoders and decoders is carried out, including the following:

memory banking strategies and scrubbing

reliability analysis

unified ECC scheme for both permanent memory bit

defects and transient upsets

This allows us to report the area, performance, and reliability achieved for

systems based on these encoders and decoders.

1.2 LITERATURE SURVEY

H. Naeimi and A. DeHon, “Fault secure encoder and decoder for memory applications,” in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007.

Proposed the concept of a nanowire-based, sub-lithographic memory

architecture tolerant to transient faults. Both the storage elements and the

supporting ECC encoder and corrector are implemented in dense, but

potentially unreliable, nanowire based technology. This compactness is made

possible by a recently introduced fault-secure detector design [18]. Using Euclidean Geometry error-correcting codes (ECCs), particular codes are identified which correct up to eight errors in data words, achieving a FIT rate at or below one for the entire memory system for bit and nanowire transient failure rates as high as 10^-17 upsets/device/cycle, with a total area below 1.7× the area

of the unprotected memory for memories as small as 0.1 Gbit. Scrubbing

designs are explored and this shows that the overhead for serial error

correction and periodic data scrubbing can be below 0.02% for fault rates as

high as 10^-20 upsets/device/cycle. A design is presented to unify the error-

correction coding and circuitry used for permanent defect and transient fault

tolerance.

M. C. Davey and D. J. C. MacKay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, Jun. 1998.

Discussed the conventional assumption that memory cells are the only circuitry susceptible to transient faults and that all the supporting circuitry around the memory (i.e., encoders and decoders) is fault-free. As a result, most prior designs for fault-tolerant memory systems focused on protecting only the memory cells. However, as feature sizes continue to scale down and sub-lithographic devices are used, the surrounding circuitry of the memory system will also be susceptible to permanent defects and transient faults.

S. J. Piestrak, A. Dandache, and F. Monteiro, “Designing fault-secure

parallel encoders for systematic linear error correcting codes,” IEEE Trans.

Reliab., vol. 52, june 2003

Proposed a scheme that uses redundancy to build a fault-tolerant encoder. It develops a fault-secure encoder unit using a concurrent parity-prediction scheme. Like the general parity-prediction technique, it concurrently generates (predicts) the parity bits of the encoder outputs (encoded bits) from

the encoder inputs (information bits). The predicted parity bits are then

compared against the actual parity function of the encoder output (encoded

bits) to check the correctness of the encoder unit. The parity predictor circuit

implementation is further optimized for each ECC to make a more compact

design. For this reason, efficient parity prediction designs are tailored to a

specific code. Simple parity prediction guarantees single error detection;

however, no generalization is given for detecting multiple errors in the detector

other than complete replication of the prediction and comparison units.

H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, “Codes on

finite geometries,” IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.

Proposed Euclidean Geometry codes constructed from the lines and points of the corresponding finite geometries. Euclidean Geometry codes

are also called EG-LDPC codes because they are low-density parity-check (LDPC) codes. LDPC codes have a limited number of 1's in each row and column of the parity-check matrix; this limit guarantees limited complexity in their associated detectors and correctors, making them fast and lightweight.

D. J. C. MacKay, “Good error-correcting codes based on very sparse

matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.

Proposed a simple electromechanical memory device in which an

iron nano particle shuttle is controllably positioned within a hollow nano tube

channel. The shuttle can be moved reversibly via an electrical write signal and

can be positioned with nanoscale precision. The position of the shuttle can be

read out directly via a blind resistance read measurement, allowing application

as a non volatile memory element with potentially hundreds of memory states

per device. The shuttle memory has application for archival storage, with

information density as high as 10^12 bits/in^2, and thermodynamic stability in

excess of one billion years.

H. Wymeersch, H. Steendam, and M. Moeneclaey, “Log-domain

decoding of LDPC codes over GF(q),” in Proc. IEEE Int. Conf. Commun.,

Paris, France, Jun. 2004, pp. 772–776.

Presented a performance and reliability analysis of a scaled crossbar molecular switch memory and demultiplexer. In particular, a multi-switch junction fault-tolerance scheme is compared with a banking defect-tolerance

scheme. Results indicate that delay and power scale linearly with increasing

number of redundant molecular switch junctions. The

multi-switch junction scheme was also shown to achieve greater than 99%

reliability for molecular switch junction failure rates of less than 20%, when a

redundancy of at least 3 was implemented. In contrast, the banking scheme

was only effective for molecular switch junction failure rates of less than 1%, which

requires over three times the number of banking modules.

CHAPTER II

SYSTEM ANALYSIS

2.1 EXISTING METHOD

With the popularity of mobile wireless devices soaring, the

wireless communication market continues to see rapid growth. However, with

this growth comes a significant challenge. Many applications, such as digital

video, need new high data rate wireless communication algorithms. The

continuous evolution of these wireless specifications is constantly widening the

gap between wireless algorithmic innovation and hardware implementation. In

addition, low power consumption is now a critical design issue, since the life

of a battery is a key differentiator among consumer mobile devices. The chip

designer's most important task is to implement highly complex algorithms into

hardware as quickly as possible, while still retaining power efficiency. High

Level Synthesis (HLS) methodology has already been widely adopted as the

best way to meet the challenge. This article gives an example in which an HLS

tool is used, together with architectural innovation, to create a low power

LDPC decoder.

HLS methodology allows the hardware design to be completed at a

higher level of abstraction such as C/C++ algorithmic description. This

provides significant time and cost savings, and paves the way for designers to

handle complex designs quickly and efficiently, producing results that compare

favorably with hand design. HLS tools also offer specific power-saving

features, designed to solve the problems of power optimization. In any design,

there are huge opportunities for power reduction at both the system and the

architecture levels.

HLS can make a significant contribution to power reduction at the

architecture level, specifically by offering the following: Ease of architecture

and micro-architecture exploration and ease of frequency and voltage

exploration. Use of high level power reduction techniques such as multi-level

clock gating, which are time-consuming and error-prone when done manually

at the RTL level. Power-saving opportunities at the RTL and gate-level are

limited and have a much smaller impact on the total power consumption.

LOW DENSITY PARITY CHECK CODERS

Forward Error Correction (FEC) coding, a core technology in wireless

communications, has already advanced from 2G convolutional/block codes to

more powerful 3G Turbo codes. Recently, designers have been looking

elsewhere for help with the more complex 4G systems. A Low-Density, Parity-

Check (LDPC) encoding scheme is an attractive proposition for these systems,

because of its excellent error correction performance and highly parallel

decoding scheme. Nevertheless, it is a major challenge for any designer to

create quickly and efficiently a high performance LDPC decoder which also

meets the data rate and power consumption constraints in wireless handsets.

LDPC decoders vary significantly in their levels of parallelism, which range

from fully parallel to partially parallel to fully sequential. A fully parallel

decoder requires a large amount of hardware resources. Moreover, it hard-

wires the entire parity matrix into hardware, and therefore can only support one

particular LDPC code. This makes it impractical to implement in a wireless

system-on-a-chip (SoC) because different or multiple LDPC codes might need

to be supported eventually. Partial parallel architectures can achieve high

throughput decoding at a reduced hardware complexity. However, the level of

parallelism in these instances has to be at the sub-circulant (shifted identity

matrix) level, which makes it code-specific as well and therefore can be too

inflexible for the wireless SoC.

2.2 PROPOSED METHOD

In this paper a fault-tolerant nano-technology

memory system that tolerates faults in the encoder, corrector and detector

circuitry as well as the memory is presented. Euclidean Geometry codes with a

fault-secure detector are used to design this memory system. These particular

codes tolerate up to 8 errors in the stored data and up to 16 total errors in

memory and correction logic with an area less than 1.7 times the unprotected

memory area. This involves determining an optimum scrubbing

interval, banking scheme, and corrector parallelism so that error correction has

negligible performance overhead. This design includes a nanoscale corrector to

tolerate permanent cross point defects. Nanotechnology provides smaller,

faster, and lower energy devices, which allow more powerful and compact

circuitry; however, these benefits come with a cost—the nanoscale devices

may be less reliable. Thermal- and shot-noise estimations alone suggest that

the transient fault rate of an individual nanoscale device (e.g., transistor or

nanowire) may be orders of magnitude higher than today’s devices. As a result,

we can expect combinational logic to be susceptible to transient faults, not just

the storage and communication systems. Therefore, to build fault-tolerant

nanoscale systems, we must protect both combinational logic and memory

against transient faults. In the present work we introduce a fault-tolerant

nanoscale memory architecture which tolerates transient faults both in the

storage unit and in the supporting logic (i.e., encoder and decoder (corrector)

circuitry). Our proposed system with high fault-tolerant capability is feasible

when the following two fundamental properties are satisfied:

1) Any single error in the encoder or corrector circuitry can only corrupt a

single codeword digit (i.e., cannot propagate to multiple codeword digits).

2) There is a Fault Secure detector (FSD) circuit which can detect any limited

combination of errors in the received codeword or the detector circuit itself.

Property 1 is guaranteed by not sharing logic between the circuitry which

produces each bit. The FSD (Property 2) is possible with a more constrained

definition for the ECC. Fig. 2.1 shows the memory architecture based on this

FSD. There are two FSD units monitoring the output vector of the encoder and

corrector circuitry. If an error is detected at the output of the encoder or

corrector units, that unit has to redo the operation to generate the correct output

vector. Using this detect-and-repeat technique, potential transient errors

can be corrected in the encoder or corrector output to provide a fault-tolerant

memory system with fault-tolerant supporting circuitry. The conventional

strategy only works as long as we can expect the encoding, decoding, and

checking logic to be fault-free, which would prevent the use of nanoscale

devices.

It is important to note that transient errors accumulate in the memory

words over time. In order to avoid error accumulation, which exceeds the code

correction capability, the system must scrub memory frequently to remove

errors. Memory scrubbing is periodically reading memory words from the

memory, correcting any potential errors, and writing the corrected words back

into the memory. The frequency of scrubbing must be determined carefully.

The scrubbing frequency impacts the throughput from two directions:

i) The memory cannot be used on scrubbing cycles, reducing the memory bandwidth available to the application; more frequent scrubbing increases this throughput loss.

ii) During normal operation, when an error is detected in a memory word, the system must spend a number of cycles correcting the error; these cycles also take bandwidth away from the application. When scrubbing happens less frequently, more errors accumulate in the memory, so more memory reads require error correction, increasing bandwidth loss.
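For illustration only, the VHDL sketch below shows how a periodic scrub request could be generated in hardware. The entity name, ports, and the default interval are assumptions made for this example and are not taken from this design; in practice the interval would be chosen from the device fault rate and the code's correction capability, as discussed above.

library ieee;
use ieee.std_logic_1164.all;

-- Minimal sketch of a scrub scheduler: request one scrub pass every
-- SCRUB_INTERVAL clock cycles (placeholder value).
entity scrub_scheduler is
  generic (SCRUB_INTERVAL : natural := 1_000_000);
  port (
    clk, rst  : in  std_logic;
    scrub_req : out std_logic   -- pulses high for one cycle per interval
  );
end entity;

architecture rtl of scrub_scheduler is
  signal count : natural range 0 to SCRUB_INTERVAL - 1 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count     <= 0;
        scrub_req <= '0';
      elsif count = SCRUB_INTERVAL - 1 then
        count     <= 0;
        scrub_req <= '1';   -- start a read-correct-write pass over the bank
      else
        count     <= count + 1;
        scrub_req <= '0';
      end if;
    end if;
  end process;
end architecture;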

Fig 2.1: Fault-tolerant memory architecture with multiple parallel pipelined correctors

The information bits are fed into the encoder to encode the information

vector, and the fault secure detector of the encoder verifies the validity of the

encoded vector. If the detector detects any error, the encoding operation must

be redone to generate the correct codeword. The codeword is then stored in the

memory. During memory access operation, the stored code words will be

accessed from the memory unit. Code words are susceptible to transient faults

while they are stored in the memory; therefore a corrector unit is designed to

correct potential errors in the retrieved code words.
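A minimal VHDL sketch of this detect-and-repeat control flow on the write path is given below. The entity, its port names, and the single-cycle encode/check timing are illustrative assumptions rather than the actual design; the encoder and fault-secure detector are assumed to be combinational blocks driven by this controller.

library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the write-path retry loop: encode, check with the fault-secure
-- detector, and re-run the encoder until no error is detected.
entity write_path_ctrl is
  port (
    clk, rst   : in  std_logic;
    start      : in  std_logic;  -- new information word available
    det_error  : in  std_logic;  -- error flag from the fault-secure detector
    enc_enable : out std_logic;  -- (re)run the encoder
    mem_write  : out std_logic   -- commit the checked codeword to memory
  );
end entity;

architecture rtl of write_path_ctrl is
  type state_t is (IDLE, ENCODE, CHECK, COMMIT);
  signal state : state_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE   => if start = '1' then state <= ENCODE; end if;
          when ENCODE => state <= CHECK;
          -- Retry on a detected error: a transient fault is not expected to
          -- repeat identically, so re-encoding produces a clean codeword.
          when CHECK  => if det_error = '1' then state <= ENCODE;
                         else state <= COMMIT; end if;
          when COMMIT => state <= IDLE;
        end case;
      end if;
    end if;
  end process;

  enc_enable <= '1' when state = ENCODE else '0';
  mem_write  <= '1' when state = COMMIT else '0';
end architecture;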

CHAPTER III

DEVELOPMENT ENVIRONMENT

3.1. HARDWARE ENVIRONMENT

1. WINDOWS XP

2. DUAL CORE processor

3. 512 MB SDRAM

4. JTAG CABLE

5. CPLD

3.1.1 INTRODUCTION TO CPLD

A complex programmable logic device (CPLD) is a programmable logic

device with complexity between that of PALs and FPGAs, and architectural

features of both. The building block of a CPLD is the macro cell, which

contains logic implementing disjunctive expressions and more specialized

logic operations.

Features in common with PALs:

Non-volatile configuration memory. Unlike many FPGAs, an external

configuration ROM isn't required, and the CPLD can function

immediately on system start-up.

For many legacy CPLD devices, routing constrains most logic blocks to

have input and output signals connected to external pins, reducing

opportunities for internal state storage and deeply layered logic. This is

usually not a factor for larger CPLDs and newer CPLD product families.

Features in common with FPGAs:

Large number of gates available. CPLDs typically have the equivalent of

thousands to tens of thousands of logic gates, allowing implementation

of moderately complicated data processing devices. PALs typically have

a few hundred gate equivalents at most, while FPGAs typically range

from tens of thousands to several million.

Some provisions for logic more flexible than sum-of-

product expressions, including complicated feedback paths between

macro cells, and specialized logic for implementing various commonly-

used functions, such as integer arithmetic.

The most noticeable difference between a large CPLD and a small FPGA is

the presence of on-chip non-volatile memory in the CPLD. This distinction is

rapidly becoming less relevant, as several of the latest FPGA products also

offer models with embedded configuration memory. The characteristic of non-

volatility makes the CPLD the device of choice in modern digital designs to

perform 'boot loader' functions before handing over control to other devices

not having this capability. A good example is where a CPLD is used to load

configuration data for an FPGA from non-volatile memory.

CPLDs were an evolutionary step from even smaller devices that preceded

them, PLAs (first shipped by Signetics), and PALs. These in turn were

preceded by standard logic products, that offered no programmability and were

"programmed" by wiring several standard logic chips together.

Because they offer high speeds and a range of capacities, CPLDs are useful

for a very wide assortment of applications, from implementing random glue

logic to prototyping small gate arrays. One of the most common uses in

industry at this time, and a strong reason for the large growth of the CPLD

market, is the conversion of designs that consist of multiple SPLDs into a

smaller number of CPLDs.

CPLDs can realize reasonably complex designs, such as graphics controller,

LAN controllers, UARTs, cache control, and many others. As a general rule-

of-thumb, circuits that can exploit wide AND/OR gates, and do not need a very

large number of flip-flops are good candidates for implementation in CPLDs.

A significant advantage of CPLDs is that they provide simple design changes

through re-programming (all commercial CPLD products are re-

programmable). With in-system programmable CPLDs it is even possible to re-

configure hardware (an example might be to change a protocol for a

communications circuit) without power-down. Designs often partition

naturally into the SPLD-like blocks in a CPLD. The result is more predictable

speed-performance than would be the case if a design were split into many

small pieces and then those pieces were mapped into different areas of the

chip. Predictability of circuit implementation is one of the strongest advantages

of CPLD architectures.

Commercially Available FPGAs

As one of the largest growing segments of the semiconductor industry,

the FPGA market-place is volatile. As such, the pool of companies involved

changes rapidly and it is somewhat difficult to say which products will be the

most significant when the industry reaches a stable state. For this reason, and

to provide a more focused discussion, we will not mention all of the FPGA

manufacturers that currently exist, but will instead focus on those companies

whose products are in widespread use at this time. In describing each device

we will list its capacity, nominally in 2-input NAND gates as given by the

vendor. Gate count is an especially contentious issue in the FPGA industry,

and so the numbers given in this paper for all manufacturers should not be

taken too seriously.

Wags have taken to calling them “dog” gates, in reference to the

traditional ratio between human and dog years. There are two basic categories

of FPGAs on the market today: 1. SRAM-based FPGAs and 2. antifuse-based

FPGAs. In the first category, Xilinx and Altera are the leading manufacturers

in terms of number of users, with the major competitor being AT&T. For

antifuse-based products, Actel, Quicklogic and Cypress, and Xilinx offer

competing products.

3.2 SOFTWARE ENVIRONMENT

SOFTWARE TOOLS:

MODEL SIM

XILINX

3.2.1 AN INTRODUCTION ABOUT MODEL SIM

ModelSim XE-III is a complete PC HDL simulation environment that

enables you to verify the HDL source code and functional and timing models

of your designs. Each of the ModelSim tools includes a complete HDL

simulation and debugging environment providing 100% VHDL and Verilog

language coverage, a source code viewer/editor, waveform viewer, design

structure browser, list window, and a host of other features designed to

enhance productivity.

ModelSim is an easy-to-use yet versatile

VHDL/(System)Verilog/SystemC simulator by Mentor Graphics. It supports

behavioral, register transfer level, and gate-level modeling. ModelSim supports

all platforms used here at the Institute of Digital and Computer Systems

(i.e. Linux, Solaris and Windows) and many others too. On Linux and Solaris

platforms ModelSim can be found preinstalled on Institute's computers.

Windows users, however, must install it themselves. This tutorial is intended

for users with no previous experience with the ModelSim simulator. It introduces the basic flow of setting up the ModelSim simulator, compiling your designs, and the simulation basics with ModelSim SE. The example used in this

tutorial is a small design written in VHDL and only the most basic commands

will be covered in this tutorial. This tutorial was made by using version 6.1b of

ModelSim SE on Linux.

The example used in this tutorial is a simple design describing an

electronic lock that can be unlocked by entering a 4-digit PIN (4169) code

from a key pad. When the lock detects the correct input sequence, it will set its

output high for one clock cycle as a sign to unlock the door. The figure below

shows the state machine of the design. The design also includes one dummy

variable (count_v) which has no practical meaning but is used to demonstrate

debug methods in ModelSim.

Modelsim eases the process of finding design defects with an

intelligently engineered debug environment. The ModelSim debug

environment efficiently displays design data for analysis and debug of all

languages. ModelSim allows many debug and analysis capabilities to be

employed post-simulation on saved results, as well as during live simulation

runs. For example, the coverage viewer analyzes and annotates source code

with code coverage results, including FSM state and transition, statement,

expression, branch, and toggle coverage. Signal values can be annotated in the

source window and viewed in the waveform viewer, easing debug navigation

with hyperlinked navigation between objects and their declarations and between

visited files. Race conditions, delta, and event activity can be analyzed in the

list and wave windows. User-defined enumeration values can be easily defined

for quicker understanding of simulation results. For improved debug

productivity, ModelSim also has graphical and textual dataflow capabilities.

FEATURES

High-performance, high-capacity engine for the fastest regression suite

throughput

Native support of Verilog, VHDL, and SystemC for effective

verification of the most sophisticated design environments

Fast time-to-debug causality tracing and multi-language debug

environment

Advanced code coverage and analysis tools for fast time to coverage

closure

3.2.2 AN INTRODUCTION ABOUT XILINX

Xilinx is a supplier of programmable logic devices. It is known for

inventing the field programmable gate array (FPGA) and as the first

semiconductor company with a fabless manufacturing model. Xilinx was

founded in 1984 by two semiconductor engineers, Ross Freeman and Bernard

Vonderschmitt, who were both working for integrated circuit and solid-state

device manufacturer Zilog Corp. Xilinx designs, develops and markets

programmable logic products including integrated circuits (ICs), software

design tools, predefined system functions delivered as intellectual property (IP)

cores, design services, customer training, field engineering, and technical support. Xilinx sells both FPGA and CPLD programmable logic devices for

electronic equipment manufacturers in end markets such as communications,

industrial, consumer, automotive and data processing.

Xilinx's FPGAs have been used for the ALICE (A Large Ion Collider

Experiment) at the CERN European laboratory on the French-Swiss border to

map and disentangle the trajectories of thousands of subatomic particles. The

Virtex-II Pro, Virtex-4, Virtex-5, and Virtex-6 FPGA families are focused on

system-on-chip (SoC) designers because they include up to two embedded

IBM PowerPC cores.

Xilinx FPGAs can run a regular embedded OS (such as Linux or

vxWorks) and can implement processor peripherals in programmable logic.

Xilinx's IP cores range from simple functions (BCD encoders, counters, etc.) and domain-specific cores (digital signal processing, FFT and FIR cores) to complex systems (multi-gigabit networking cores, MicroBlaze soft

microprocessor, and the compact Picoblaze microcontroller). Xilinx also

creates custom cores for a fee. The ISE Design Suite is the central electronic

design automation (EDA) product family sold by Xilinx. The ISE Design Suite

features include design entry and synthesis supporting Verilog or VHDL,

place-and-route (PAR), completed verification and debug using ChipScope Pro

tools, and creation of the bit files that are used to configure the chip.

Xilinx's Embedded Developer's Kit (EDK) supports the embedded

PowerPC 405 and 440 cores (in Virtex-II Pro and some Virtex-4 and -5 chips)

and the Microblaze core. Xilinx's System Generator for DSP implements DSP

designs on Xilinx FPGAs. A freeware version of its EDA software called ISE

WebPACK is used with some of its non-high-performance chips. Xilinx is the

only (as of 2007) FPGA vendor to distribute a native Linux freeware synthesis

toolchain. The Spartan series targets applications with a low-power footprint,

extreme cost sensitivity and high-volume; e.g. displays, set-top boxes, wireless

routers, and other applications. The Spartan-6 family is built on a 45 nm, 9-metal-layer, dual-oxide process technology. The Spartan-6 was

marketed in 2009 as a low-cost solution for automotive, wireless

communications, flat-panel display and video surveillance applications.

3.2.3 HISTORICAL PERSPECTIVE-VLSI

The electronics industry has achieved a phenomenal growth over

the last two decades, mainly due to the advent of VLSI. The number of

applications of integrated circuits in high-performance computing,

telecommunications and consumer electronics has been rising steadily and at

a very fast pace. Typically, the required computational power (or, in other

words, the intelligence) of these applications is the driving force for the fast

development of this field. The current leading-edge technologies (such as low

bit-rate video and cellular communications) already provide the end users a

certain amount of processing power and portability. This trend is expected to

be continued with very important implications of VLSI and systems design.

As more and more complex functions are required in

various data processing and telecommunications devices, the need to integrate these functions in a small system/package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to the rapid progress in processing and interconnect technology. The evolution of logic complexity in integrated circuits over the last three decades marks the milestone of each era. Here,

the numbers for circuit complexity should be interpreted only as representative

examples to show the order of magnitude. A logic block can contain ten to

hundred transistors depending upon the function.

The important message here is that the logic complexity per chip has been

increasing exponentially. The monolithic integration of a large number of

functions on a single chip usually provides:

Less area / volume and therefore compactness.

Less power consumption.

Less testing requirements at system level.

Higher reliability, mainly due to improved on-chip

Interconnects.

Higher speed, due to significantly reduced interconnection length.

Significant cost savings.

Therefore, the current trend of integration will also continue in the foreseeable

future.

3.2.3.1 VLSI DESIGN FLOW

Fig 3.1: VLSI Design Flow

The design process, at various levels, is usually evolutionary in nature. It

starts with a given set of requirements. Initial design is developed and tested

against the requirements. When requirements are not met, the design has to be

improved. If such improvement is either not possible or too costly, then the

revision of requirements and its impact analysis must be considered.

The VLSI design flow consists of three major domains, namely:

Behavioral domain

Structural domain

Geometrical Layout domain.

The design flow starts from the algorithm that describes the behavior of the target chip. The corresponding architecture of the processor is first defined. It is

mapped onto the chip surface by floor planning. The next design evolution in

the behavioral domain defines finite state machines (FSMs), which are structurally implemented with functional modules such as registers and arithmetic logic units (ALUs). These modules are then geometrically placed

onto the chip surface using CAD tools for automatic module placement

followed by routing, with a goal of minimizing the interconnects area and

signal delays. In the third evolution, the behavioral modules are implemented with leaf cells. At this stage the chip is described in terms of

logic gates (leaf cells), which can be placed and interconnected by using a cell

placement and routing program. The last evolution involves a detailed Boolean

description of leaf cells and mask generation. In standard-cell based design,

leaf cells are already pre-designed and stored in a library for logic design use.

3.2.4 VHDL –AN OVERVIEW

VHDL is a hardware description language. The word ‘hardware’

however is used in a wide variety of contexts, which range from complete

systems like personal computers on one side to the small logic gates on their internal integrated circuits on the other side.

3.2.4.1 USES OF VHDL:

Since VHDL is a standard, the chip vendors can easily exchange their

circuit designs without depending on their proprietary software. The designing

process can be greatly simplified, as each component is designed individually

and all such components are interconnected to form a full system; hierarchy

and timing are always maintained.

With simulators available, a circuit can be tested easily and any error

found can be rectified without the expense of using a physical prototype, which

means that design time and expenditure are greatly reduced. Programs

written in either of the HDLs can be easily understood as they are similar to

programs of C or Pascal.

3.2.4.2 FEATURES OF VHDL:

VHDL provides five different types of primary constructs, called design

units. They are,

Entity: It consists of a design’s interface signals to the external circuitry

Architecture: It describes a design’s behavior and functionality.

Package: It contains frequently used declarations, constants, functions,

procedures, user data types, and components.

Configuration: It binds an entity to an architecture when there are multiple architectures for a single entity.

Library: It consists of all the compiled design units, such as entities, architectures, packages, and configurations.

3.2.4.3 RANGE OF USE:

The design process always starts with a specification phase. The

component, which is to be designed, is defined with respect to function, size,

interfaces, etc. Despite the complexity of the final product, mainly simple

methods based on paper and pencil are used most of the time. After that,

self-contained modules have to be defined on the system level. Behavior

models of standard components can be integrated into the system from libraries

of commercial model developers. The overall system can already be

simulated.

On the logic level, the models that have to be designed are described

with all the synthesis aspects in view. As long as only a certain subset of

VHDL constructs is used, commercial synthesis programs can derive the

Boolean functions from this abstract model description and map them to the

elements of an ASIC gate library or the configurable logic blocks of FPGAs.

The result is a net list of the circuit or of the module on the gate level.

Finally, the circuit layout for a specific ASIC technology can be created

by means of other tools from the net list description. Every transition to a

lower abstraction level must be proven by functional validation. For this

purpose, the description is simulated in such a way that for all stimuli (i.e., input

signals for the simulation) the module’s responses are compared. VHDL is

suitable for the design phases from system level to gate level.

3.2.4.4 APPLICATION FIELD:

VHDL is used mainly for the development of Application Specific

Integrated Circuits (ASICs). Tools for the automatic transformation of VHDL code into a gate-level netlist were developed at an early point in time.

This transformation is called synthesis and is an integral part of current design

flows. For use with Field Programmable Gate Arrays (FPGAs), several problems exist. In the first

step, Boolean equations are derived from the VHDL description, no matter whether an ASIC or an FPGA is the target technology. But now, this Boolean

code has to be partitioned into the configurable logic blocks (CLB) of the

FPGA. This is more difficult than the mapping onto an ASIC library. Another

big problem is the routing of the CLBs, as the available resources for

interconnections are the bottleneck of current FPGAs.

MODELING PROCEDURES USING VHDL

STRUCTURAL STYLE OF MODELING

In the structural style of modeling, an entity is described as a set of inter-

connected components. Here the architecture body is composed of two parts:

the declarative part and the statement part. The declarative part contains the

component declarations. The declared components are instantiated in the

statement part.

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as

a set of statements that are executed sequentially in the specified order. This set

of sequential statements, which are specified inside a process statement, do not

explicitly specify the structure of the entity but merely its functionality. A

process statement is a concurrent statement that can appear within an

architecture body.

DATA FLOW STYLE OF MODELING

In this modeling style, the flow of data through the entity is expressed

primarily using concurrent signal assignment statements. The structure of the

entity is not explicitly specified in this modeling style, but it can be implicitly

deduced.

MIXED STYLE OF MODELING:

It is possible to mix the three modeling styles described above in a single architecture body. That is, within an architecture body, we could use

component instantiation statements, concurrent signal assignment statements,

and process statements.
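As a small illustration (not taken from the project's design files), the entity below mixes the dataflow and behavioral styles in one architecture body: a concurrent conditional assignment computes a comparison, and a process registers the result.

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a 2-bit comparator with a registered output,
-- mixing dataflow (concurrent assignment) and behavioral (process) styles.
entity compare_reg is
  port (
    clk   : in  std_logic;
    a, b  : in  std_logic_vector(1 downto 0);
    equal : out std_logic
  );
end entity compare_reg;

architecture mixed of compare_reg is
  signal eq_comb : std_logic;
begin
  -- Dataflow style: concurrent conditional signal assignment.
  eq_comb <= '1' when a = b else '0';

  -- Behavioral style: sequential statements inside a process.
  process (clk)
  begin
    if rising_edge(clk) then
      equal <= eq_comb;
    end if;
  end process;
end architecture mixed;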

CHAPTER IV

ARCHITECTURE DETAILS

This paper presents a high-throughput decoder architecture for generic

quasi-cyclic low-density parity-check (QC-LDPC) codes. Various

optimizations are employed to increase the clock speed. A row permutation

scheme is proposed to significantly simplify the implementation of the shuffle

network in LDPC decoder. An approximate layered decoding approach is

explored to reduce the critical path of the layered LDPC decoder. Provided are

an LDPC encoder and decoder, and LDPC encoding and decoding methods.

The LDPC encoder includes: a code generating circuit that includes a memory

storing a first parity check matrix and sums a first row which is at least one

row of the first parity check matrix and a second row which is at least one of

the remaining rows of the first parity check matrix to output a second parity

check matrix; and an encoding circuit receiving the second parity check matrix

and an information word to output an LDPC-encoded code word. Also the

LDPC decoder includes: a code generating circuit including a memory which

stores a first parity check matrix and summing a first row which is at least one

row of the first parity check matrix and a second row which is at least one of

the remaining rows of the first parity check matrix to output a second parity

check matrix; and a decoding circuit receiving the second parity check matrix

and a code word to output an LDPC-decoded information word. The low-density

parity-check (LDPC) code invented in 1962 by Robert Gallager is a linear

block code defined by a very sparse parity check matrix, which is populated

primarily with zeros and sparsely with ones.

When it was first introduced, the LDPC code was too complicated to

implement, and so it was forgotten for a long time until not too long ago. The

LDPC code was brought to light again in 1995, and an irregular LDPC code

(which is a generalization of the LDPC code suggested by Robert Gallager)

was introduced in 1998. When the LDPC code was first introduced by

Gallager, a probabilistic decoding algorithm was also suggested, and the LDPC

code which is decoded using this algorithm exhibited excellent performance

characteristics. The LDPC code also showed improved performance when

extended to non-binary code as well as binary code to define code words. Like

the turbo code, the LDPC code yields a bit error rate (BER) approaching a

Shannon channel capacity limit, which is the theoretical maximum amount of

digital data that can be transmitted in a given bandwidth in presence of a

certain noise interference. The irregular LDPC code which is known to have

the best performance only needs an additional 0.13 dB from the Shannon

channel capacity to achieve a BER of 10^-6 when a code length is a million bits

in an additive white Gaussian noise (AWGN) channel environment, and is thus

suitable for applications which require high-quality transmission with a very

low BER.

Unlike algebraic decoding algorithms usually used for decoding a block

code, the decoding algorithm of the LDPC code is a probabilistic decoding

algorithm to which a belief-propagation algorithm, which employs graph theory and estimation theory, is applied “as is”. An LDPC decoder computes a

probability of a bit corresponding to each bit of a code word received through a

channel being “1” or “0”. The probability information computed by the LDPC

decoder is referred to as a message, and the quality of the message can be

checked through each parity defined in a parity check matrix. If a certain parity

of the parity check matrix is satisfied, i.e., the result of a parity check is

positive, the computed message is specially referred to as a parity check

message and contains the most probable value of each code word bit. The

parity check message for each parity is used to determine the value of a

corresponding bit, and information on a computed bit is referred to as a bit

message. Through a procedure of repeating such message transmission, the

information for bits of each code word comes to satisfy all parities of the

parity-check matrix. Finally, when all parities of the parity-check matrix are

satisfied, the decoding of the code word is finished. In an environment where a

signal to noise (S/N) ratio is low, systematic codes are used, and thus certain

portions of the code word are extracted to reproduce information bits.

If a channel is a frequency selective fading channel, adaptive modulation

and coding is used for low-error communication. The LDPC code is a type of

block channel code and thus has the disadvantage of being difficult to

adaptively modulate compared to a trellis code such as a convolutional code or a

turbo code to which a desired form of modulation and coding can easily be

applied through puncturing. In order for the LDPC code to support various

code rates for adaptive transmission, it has to have various code matrices,

which carries the disadvantage of the encoder and the decoder needing a large

memory.

4.1 SUMMARY OF THE INVENTION

The present invention is directed to an LDPC encoder, an LDPC

decoder, and LDPC encoding and decoding methods in which a size of a

memory of the encoder and decoder can be reduced by forming, from one

parity-check matrix, a smaller parity-check matrix. A first aspect of the present

invention is to provide an LDPC encoder, including: a code generating circuit

including a memory which stores a first parity check matrix and summing a

first row which is at least one row of the first parity check matrix and a second

row which is at least one of the remaining rows of the first parity check matrix

to output a second parity check matrix; and an encoding circuit receiving the

second parity check matrix and an information word to output an LDPC-

encoded code word.

A second aspect of the present invention is to provide an LDPC decoder,

including: a code generating circuit including a memory which stores a first

parity check matrix and summing a first row which is at least one row of the

first parity check matrix and a second row which is at least one of the

remaining rows of the first parity check matrix to output a second parity check

matrix; and a decoding circuit receiving the second parity check matrix and a

code word to output an LDPC-decoded information word.

A third aspect of the present invention is to provide an LDPC encoder,

including: a code generating circuit including a memory which stores a first

parity check matrix and outputting a second parity check matrix formed by

removing a first row which is at least one row of the first parity check matrix;

and an encoding circuit receiving the second parity check matrix and an

information word to output an LDPC-encoded code word.

A fourth aspect of the present invention is to provide an LDPC decoder,

including: a code generating circuit including a memory which stores a first

parity check matrix and outputting a second parity check matrix formed by

removing a first row which is at least one row of the first parity check matrix;

and a decoding circuit receiving the second parity check matrix and a code

word to output an LDPC-decoded information word.

A fifth aspect of the present invention is to provide an LDPC encoding

method, including: storing a first parity check matrix in a memory; summing a

first row which is at least one row of the first parity check matrix and a second

row which is at least one of the remaining rows of the first parity check matrix

to form a second parity check matrix; and receiving the second parity check

matrix and an information word and performing LDPC-encoding.

A sixth aspect of the present invention is to provide an LDPC decoding

method, including: storing a first parity check matrix in a memory; summing a

first row which is at least one row of the first parity check matrix and a second

row which is at least one of the remaining rows of the first parity check matrix

to form a second parity check matrix; and receiving the second parity check

matrix and a code word and performing LDPC-decoding. Low Density Parity

Check (LDPC) codes offer excellent error correcting performance. However,

current implementations are not capable of achieving the performance required

by next generation storage and telecom applications. Extrapolation of many of

those designs is not possible because of routing congestion. This article

proposes a new architecture, based on a redefinition of a lesser-known LDPC

decoding algorithm. As random LDPC codes are the most powerful, we abstain

from making simplifying assumptions about the LDPC code which could ease

the routing problem. We avoid the routing congestion problem by going for

multiple independent sequential decoding machines, each decoding separate

received codewords. In this serial approach the required amount of memory

must be multiplied by the large number of machines. Our key contribution is a

check node centric reformulation of the algorithm which gives huge memory

reduction and which thus makes the serial approach possible.

NANO-X API

The Nano-X API tries to be compliant with the Microsoft Win32 and

WinCE GDI standard. Currently, there is support for most of the graphics

drawing and clipping routines, as well as automatic window title bar drawing

and dragging windows for movement. The Nano-X API is message-based, and

allows programs to be written without regard to the eventual window

management policies implemented by the system. The Nano-X API is not

currently client/server, and will be discussed in more detail in the section

called Nano-X API.

NANO-X API

The Nano-X API is modeled after the mini-x server written initially by

David Bell, which was a reimplementation of X on the MINIX operating

system. It loosely follows the X Window System Xlib API, but the names all

being with GrXXX() rather than X...(). Currently, the Nano-X API is

client/server, but does not have any provisions for automatic window

dressings, title bars, or user window moves. There are several groups writing

widget sets currently, which will provide such things. Unfortunately, the user

programs must also then write only to a specific widget set API, rather than

using the Nano-X API directly, which means that only the functionality

provided by the widget set will be upwardly available to the applications

programmer. (Although this could be considerable, in the case that, say Gdk

was ported.)

In recent years, research on nanotechnology has advanced rapidly. Novel

nanodevices have been developed, such as those based on carbon nanotubes,

nanowires, etc. Using these emerging nanodevices, diverse nanoarchitectures

have been proposed. Among them, hybrid nano/CMOS reconfigurable

architectures have attracted attention because of their advantages in

performance, integration density, and fault tolerance. Recently, a high

performance hybrid nano/CMOS reconfigurable architecture, called NATURE,

was presented. NATURE comprises CMOS reconfigurable logic and

interconnect fabric, and CMOS-fabrication-compatible nanomemory. High-

density, fast nano RAMs are distributed in NATURE as on-chip storage to

store multiple reconfiguration copies for each reconfigurable element. It

enables cycle-by-cycle runtime reconfiguration and a highly efficient

computational model, called temporal logic folding. Through logic folding,

NATURE provides more than an order of magnitude improvement in logic

density and area-delay product, and significant design flexibility in performing

area-delay trade-offs, at the same technology node. Moreover, NATURE can

be fabricated using mainstream photolithography fabrication techniques.

Hence, it offers a currently commercially viable reconfigurable architecture

with high performance, superior logic density, and outstanding design

flexibility, which is very attractive for deployment in cost-conscious embedded

systems.

In order to fully explore the potential of NATURE and further improve

its performance, in this article, a thorough design space exploration is

conducted to optimize its architecture. Investigations in terms of different logic

element architectures, interconnect designs, and various technologies for nano

RAMs are presented. Nano RAMs can not only be used as storage for

configuration bits, but the high density of nano RAMs also makes them

excellent candidates for large-capacity on-chip data storage in NATURE.

Many logic- and memory-intensive applications, such as video and image

processing, require large storage of temporal results. To enhance the capability

of NATURE for implementing such applications, we investigate the design of

nano data memory structures in NATURE and explore the impact of memory

density. Experimental results demonstrate significant throughput

improvements due to area saving from logic folding and parallel data

processing.

CHAPTER V

SYSTEM MODULES

5.1 FAULT TOLERANCE APPROACH

A fault-tolerance technique is based on at least one of three

types of redundancy: time, data, or hardware redundancy. Hardware

redundancy means the replication of hardware modules and some kind of result

comparison or voting instance. The inherent redundancy in field-

programmable logic resulting from the regular cell-based structure allows a

very efficient implementation of hardware redundancy. The faulty resource

must not be reused by the new configuration. After the reconfiguration, the

possible effect of the fault must be confined for some applications and the

circuit must be reset to a consistent state. Then the system can continue to

operate. The idea of an autonomous mechanism for fault detection and

reconfiguration at an appropriate speed, in terms of the regarded system, is the

starting point for the fault tolerance technique presented here. The technique

combines a scalable hardware-based fault detection mechanism with a fast

online fault reconfiguration technique and a check pointing and rollback

mechanism for fault recovery. The reconfiguration is based on a hardware-

implemented reconfiguration controller: the reconfiguration control unit

(RCU). In contrast to other online fault test and reconfiguration strategies described in the literature, the fault detection mechanism must provide the fault location and

trigger reconfiguration. The reconfiguration step must replace the current

configuration data set by an alternative configuration (which provides a fault-

avoiding mapping of the user circuit) and trigger recovery. The recovery step

must bring the whole system back into a consistent state. For a fast online technique, such differentiation is too time-consuming, and a simpler

approach must be taken: all faults are assumed to be permanent. Even under

this assumption, no general technique is available today which controls the

appropriate reconfiguration procedure.

Fig. 5.1: Phases of the fault tolerance technique.

The basic characteristics of fault tolerance require:

1. No single point of repair

2. Fault isolation to the failing component

3. Fault containment to prevent propagation of the failure

4. Availability of reversion modes

Fault-tolerant systems are typically based on the concept of redundancy.
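A minimal VHDL sketch of the operate/reconfigure/recover control cycle described above is shown below. The signal names and handshake are illustrative assumptions; the actual reconfiguration control unit (RCU) described in the text is considerably more involved.

library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the fault-tolerance phases: detect a fault during operation,
-- load an alternative (fault-avoiding) configuration, then recover state.
entity ft_phase_ctrl is
  port (
    clk, rst       : in  std_logic;
    fault_detected : in  std_logic;  -- from the hardware fault detection mechanism
    reconf_done    : in  std_logic;  -- alternative configuration has been loaded
    recover_done   : in  std_logic;  -- rollback to the last checkpoint finished
    reconf_start   : out std_logic;
    recover_start  : out std_logic
  );
end entity;

architecture rtl of ft_phase_ctrl is
  type phase_t is (OPERATE, RECONFIGURE, RECOVER);
  signal phase : phase_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        phase <= OPERATE;
      else
        case phase is
          when OPERATE =>
            if fault_detected = '1' then
              phase <= RECONFIGURE;   -- fault located: trigger reconfiguration
            end if;
          when RECONFIGURE =>
            if reconf_done = '1' then
              phase <= RECOVER;       -- bring the circuit back to a consistent state
            end if;
          when RECOVER =>
            if recover_done = '1' then
              phase <= OPERATE;
            end if;
        end case;
      end if;
    end if;
  end process;

  reconf_start  <= '1' when phase = RECONFIGURE else '0';
  recover_start <= '1' when phase = RECOVER else '0';
end architecture;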

5.2 NANOMEMORY ARCHITECTURE MODEL

This section presents the design structure of the encoder, corrector, and detector units of

our proposed fault-tolerant memory system. We also present the

implementation of these units on a sub-lithographic, nanowire-based substrate.

Before going into the design structure details we start with a brief overview of

the sub-lithographic memory architecture model.

Fig. 5.2: Structure of the Nano Memory core

We use the Nano Memory and Nano PLA architectures to implement the

memory core and the supporting logic, respectively. Nano Memory and Nano

PLA are based on nanowire crossbars. The Nano Memory architecture developed in prior work can achieve high storage density even after including the lithographic-scale address wires and accounting for defects. This design uses a nanowire

crossbar to store memory bits and a limited number of lithographic scale wires

for address and control lines. Fig. 5.2 shows a schematic overview of this

memory structure. The nanowires can be uniquely selected through the two

address decoders located on the two sides of the memory core. Instead of using

a lithographic-scale interface to read and write into the memory core, we use a

nanowire-based interface. The reason that we can remove the lithographic-

scale interface is that all the blocks interfacing with the memory core (encoder,

corrector and detectors) are implemented with nanowire-based crossbars.

5.3 FAULT SECURE DETECTOR

The core of the detector operation is to generate the syndrome vector,

which basically implements the following vector-matrix multiplication on the received encoded vector C and the parity-check matrix H:

S = C · H^T

Fig. 5.3: Fault-secure detector for the (15, 7, 5) EG-LDPC code

This binary sum is implemented with XOR gates. Fig. 5.3 shows the detector circuit for the (15, 7, 5) EG-LDPC code. Since the row weight of the parity-check matrix is ρ, generating one digit of the syndrome vector requires a ρ-input XOR gate, or (ρ-1) two-input XOR gates. For the whole detector, this takes n(ρ-1) two-input XOR gates. Table 5.1 lists some of the smaller EG-LDPC codes together with the corresponding Hamming and Gilbert-Varshamov bounds.

Hamming bound      EG-LDPC          Gilbert-Varshamov bound
(14, 7, 5)         (15, 7, 5)       (17, 7, 5)
(58, 37, 9)        (63, 37, 9)      (67, 37, 9)
(222, 175, 17)     (255, 175, 17)   (255, 175, 17)

TABLE 5.1: Code parameters (n, k, d) of EG-LDPC codes compared with the Hamming and Gilbert-Varshamov bounds

An error is detected if any of the syndrome bits has a nonzero value. The

final error detection signal is implemented by an OR function of all the

syndrome bits. The output of this OR gate, which takes all the syndrome bits as inputs, is the error detector signal (see Fig. 5.3). In order to avoid a single point of failure, we must implement the

OR gate with a reliable substrate (e.g., in a system with sub-lithographic

nanowire substrate, the OR gate is implemented with reliable lithographic

technology—i.e., lithographic-scaled wire-OR).
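The detector operation can be summarized in a few lines of Python: each syndrome bit is the mod-2 (XOR) sum of the codeword bits selected by one row of H, and the error signal is the OR of all syndrome bits. The small (6, 3) parity-check matrix below is only an illustration, not the (15, 7, 5) EG-LDPC matrix of Fig. 5.3.

# Fault-secure detection sketch: S = C . H^T over GF(2), error = OR of S.

def syndrome(codeword, H):
    """One syndrome bit per row of H: the XOR (mod-2 sum) of the selected bits."""
    return [sum(h * c for h, c in zip(row, codeword)) % 2 for row in H]

def error_detected(codeword, H):
    """Final error signal: OR of all the syndrome bits."""
    return int(any(syndrome(codeword, H)))

# Illustrative (6, 3) parity-check matrix; each row has weight 3, so each
# syndrome bit is a 3-input XOR (equivalently, two 2-input XOR gates).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

valid = [1, 0, 1, 1, 1, 0]            # satisfies all three parity checks
faulty = valid[:]
faulty[2] ^= 1                        # inject a single transient bit-flip

print(error_detected(valid, H))       # 0: no error flagged
print(error_detected(faulty, H))      # 1: at least one syndrome bit is nonzero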

5.4 ENCODER

An n-bit codeword c, which encodes a k-bit information vector i, is generated by multiplying the information vector by the k x n generator matrix G; i.e., c = i · G. EG-LDPC codes are not systematic, so the information bits would have to be decoded from the encoded vector; this is not desirable for our fault-tolerant approach because of the added complexity and delay. However, these codes are cyclic codes [15], and we used the standard procedure to convert the cyclic generator matrices to systematic generator matrices for all the EG-LDPC codes under consideration.


Fig. 5.4: Structure of an encoder circuit for the (15, 7, 5) EG-LDPC code

The above figure shows the encoder circuit that computes the parity bits of the (15, 7, 5) EG-LDPC code. In this figure, i = (i0, ..., i6) is the information vector; it is copied to bits c0, ..., c6 of the encoded vector c, and the rest of the encoded vector, the parity bits, are linear sums (XOR) of the information bits. If the building blocks are two-input gates, then the encoder circuitry takes 22 two-input XOR gates. The encoder circuit area for the other EG-LDPC codes under consideration follows in the same way from their generator matrices.
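As a sketch of systematic encoding, the snippet below multiplies an information vector by a systematic generator matrix over GF(2): the first k codeword bits are the information bits and the remaining bits are XOR parity sums. A small (7, 4) generator matrix is used here for brevity; the (15, 7, 5) EG-LDPC generator matrix of Fig. 5.4 would be used in exactly the same way.

# Systematic encoding sketch: c = i . G over GF(2).

def encode(info_bits, G):
    """Multiply the k-bit information vector by the k x n generator matrix."""
    n = len(G[0])
    return [sum(info_bits[k] * G[k][j] for k in range(len(G))) % 2
            for j in range(n)]

# Illustrative (7, 4) systematic generator matrix G = [I | P]: the identity
# part copies the information bits, each column of P defines one parity bit.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

i = [1, 0, 1, 1]
print(encode(i, G))    # [1, 0, 1, 1, 0, 1, 0]: information bits, then parity bits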

5.5 CORRECTOR

1) ONE-STEP MAJORITY-LOGIC CORRECTOR

One-step majority-logic correction is the procedure that identifies the correct value of each bit in the codeword directly from the received

codeword; this is in contrast to the general message-passing error correction


strategy (e.g., [23]) which may demand multiple iterations of error diagnosis

and trial correction. Avoiding iteration makes the correction latency both small

and deterministic.

This method consists of two parts:

1) Generating a specific set of linear sums of the received vector bits

2) Finding the majority value of the computed linear sums

A linear sum of the received encoded vector bits can be formed by computing the inner product of the received vector and a row of the parity-check matrix. This sum is called a parity-check sum.
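A minimal sketch of the one-step majority-logic decision for a single bit is given below. It assumes the caller supplies the parity-check rows that are orthogonal on that bit (every row checks the bit, and no other bit is shared between rows); the specific orthogonal row sets of the (15, 7, 5) EG-LDPC code are not reproduced here.

# One-step majority-logic correction of one codeword bit (generic sketch).

def correct_bit(received, bit_index, orthogonal_rows):
    """Flip received[bit_index] if a majority of its parity-check sums are 1."""
    sums = [sum(h * r for h, r in zip(row, received)) % 2
            for row in orthogonal_rows]
    if sum(sums) > len(sums) // 2:       # majority of the checks fail
        received[bit_index] ^= 1         # the bit is assumed to be in error
    return received

# Tiny demonstration: four checks orthogonal on bit 0 of a length-5 word.
rows = [[1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 0, 0, 1, 0],
        [1, 0, 0, 0, 1]]
word = [1, 0, 0, 0, 0]                   # all-zero word with bit 0 flipped
print(correct_bit(word, 0, rows))        # [0, 0, 0, 0, 0]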

2) MAJORITY CIRCUIT IMPLEMENTATION

Here we present a compact implementation of the majority gate using sorting networks.
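The idea can be sketched as follows: for single-bit inputs, a compare-exchange element reduces to an (AND, OR) pair, a sorting network built from such elements sorts the inputs, and the middle wire then carries the majority value. The odd-even transposition network below is only one possible choice, picked for clarity rather than to match the actual nanowire implementation.

# Majority gate via a sorting network (sketch).

def compare_exchange(a, b):
    """Two-bit sorter: (min, max) = (a AND b, a OR b)."""
    return a & b, a | b

def majority(bits):
    """Sort the bits with an odd-even transposition network; return the median."""
    w = list(bits)
    n = len(w)
    for stage in range(n):                       # n stages are enough to sort
        for i in range(stage % 2, n - 1, 2):
            w[i], w[i + 1] = compare_exchange(w[i], w[i + 1])
    return w[n // 2]                             # middle wire = majority (odd n)

print(majority([1, 0, 1, 1, 0]))   # 1 (three of the five inputs are 1)
print(majority([0, 0, 1, 0, 1]))   # 0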

5.6 BANKED MEMORY

Large memories are conventionally organized as sets of smaller memory

blocks called banks. The reason for breaking a large memory into smaller

banks is to trade off overall memory density for access speed and reliability.

Excessively small bank sizes will incur a large area overhead for memory

drivers and receivers. Large memory banks require long rows and columns

which result in high-capacitance wires that consequently increase the delay. Furthermore, long wires are more susceptible to breaks and bridging defects. Therefore, excessively large memory banks have a high defect rate and low performance.


Fig.5.5. Banked memory organization, with single global corrector.

The number of faults that accumulate in the memory is directly related

to the scrubbing period. The longer the scrubbing period is, the larger the

number of errors that can accumulate in the system. However, scrubbing all

memory words serially can take a long time. If the time to serially scrub the

memory becomes noticeable compared to the scrubbing period, it can reduce

the system performance. To reduce the scrubbing time, we can potentially scrub all the memory banks in parallel.
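A back-of-the-envelope comparison of serial versus bank-parallel scrubbing time is sketched below. The word count, bank count, and per-word scrub time are assumed values chosen only to illustrate the scaling, not measurements of the proposed memory.

# Illustrative scrubbing-time comparison (all parameters are assumptions).

words_total  = 1 << 20          # 1M memory words in total
num_banks    = 64               # words split evenly across the banks
t_scrub_word = 50e-9            # assumed time to read/correct/write one word (s)

serial_time   = words_total * t_scrub_word
parallel_time = (words_total // num_banks) * t_scrub_word   # banks scrubbed concurrently

print(f"serial scrub:   {serial_time * 1e3:.1f} ms")    # about 52.4 ms
print(f"parallel scrub: {parallel_time * 1e3:.3f} ms")  # about num_banks times faster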

CHAPTER VI

SYSTEM IMPLEMENTATION


6.1 PROCESS (Dynamic Reconfiguration)

The feasibility of run-time reconfiguration of FPGAs has been

established by a large number of case studies. However, these systems have

typically involved an ad hoc combination of hardware and software. The

software that manages the dynamic reconfiguration is typically specialized to

one application and one hardware configuration. We present three different

applications of dynamic reconfiguration, based on research activities at

Glasgow University, and extract a set of common requirements. We present the

design of an extensible run-time system for managing the dynamic

reconfiguration of FPGAs, motivated by these requirements. The system is

called RAGE, and incorporates operating-system-style services that permit sophisticated, high-level operations on circuits.

ECC stands for "Error Correction Codes" and is a method used to detect

and correct errors introduced during storage or transmission of data. Certain

kinds of RAM chips inside a computer implement this technique to correct

data errors and are known as ECC Memory. ECC Memory chips are

predominantly used in servers rather than in client computers. Memory errors

are proportional to the amount of RAM in a computer as well as the duration of

operation. Since servers typically contain several Gigabytes of RAM and are in

operation 24 hours a day, the likelihood of errors cropping up in their memory

chips is comparatively high and hence they require ECC Memory.

Memory errors are of two types, namely hard and soft. Hard errors are

caused due to fabrication defects in the memory chip and cannot be corrected

once they start appearing. Soft errors on the other hand are caused

predominantly by electrical disturbances. Memory errors that are not corrected

immediately can eventually crash a computer. This again has more relevance


to a server than a client computer in an office or home environment. When a

client crashes, it normally does not affect other computers even when it is

connected to a network, but when a server crashes it brings the entire network

down with it. Hence ECC memory is mandatory for servers but optional for

clients unless they are used for mission critical applications.

ECC Memory chips mostly use Hamming Code or Triple Modular

Redundancy as the method of error detection and correction. These are known

as FEC codes or Forward Error Correction codes that manage error correction

on their own instead of going back and requesting the data source to resend the

original data. These codes can correct single-bit errors occurring in data. Multi-bit errors are very rare and hence do not pose much of a threat to memory systems.
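As an illustration of the single-bit correction mentioned above, the sketch below uses the classic (7, 4) Hamming code, in which the syndrome, read as a binary number, directly gives the position of the flipped bit (or 0 if no error is detected).

# Single-bit error correction with the (7, 4) Hamming code.

H = [[1, 0, 1, 0, 1, 0, 1],     # each column j is the binary encoding of j+1
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def hamming_correct(word):
    """Correct at most one flipped bit in a 7-bit Hamming codeword."""
    s = [sum(h * b for h, b in zip(row, word)) % 2 for row in H]
    position = s[0] + 2 * s[1] + 4 * s[2]      # 0 means no error detected
    corrected = word[:]
    if position:
        corrected[position - 1] ^= 1           # flip the offending bit
    return corrected

codeword = [0, 1, 1, 0, 0, 1, 1]               # valid codeword (data bits 1, 0, 1, 1)
noisy = codeword[:]
noisy[5] ^= 1                                  # a single soft error
print(hamming_correct(noisy) == codeword)      # True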

ENCODING PROCESS

EGLDPC codes have received tremendous attention in the coding

community because of their excellent error correction capability and near-

capacity performance. Some randomly constructed EGLDPC codes, measured

in Bit Error Rate (BER), come very close to the Shannon limit for the AWGN

channel (within 0.05 dB) with iterative decoding and very long block sizes (on the order of 10^6 to 10^7). However, for many practical applications (e.g.,

packet-based communication systems), shorter and variable block-size

EGLDPC codes with good Frame Error Rate (FER) performance are desired.

Communications in packet-based wireless networks usually involve a large

per-frame overhead including both the physical (PHY) layer and MAC layer

headers. As a result, the design for a reliable wireless link often faces a trade-

off between channel utilization (frame size) and error correction capability.

One solution is to use adaptive burst profiles, in which transmission parameters


relevant to modulation and coding may be assigned dynamically on a burst-by-

burst basis. Therefore, LDPC codes with variable block lengths and multiple

code rates for different quality of service under various channel conditions are

highly desired.

FLOW OF ENCODING PROCESS

Fig. 6.1: Flow of encoding process

In the recent literature, there are many EGLDPC decoder architectures

but few of them support variable block-size and multi-rate decoding. For

example, a 1 Gbps 1024-bit, rate 1/2 EGLDPC decoder has been implemented.

However, this architecture supports only one particular EGLDPC code by

wiring the whole Tanner graph into hardware. A code rate programmable

EGLDPC decoder has been proposed, but the code length is still fixed to 2048 bits for simple VLSI implementation. In [3], an EGLDPC decoder that supports three

block sizes and four code rates is designed by storing 12 different parity check

matrices on-chip. As we can see, the main design challenge for supporting

variable block sizes and multiple code rates stems from the random or

unstructured nature of the EGLDPC codes. Generally, support for different


block sizes of EGLDPC codes would require different hardware architectures.

To address this problem, we propose a generalized decoder architecture based

on the quasi-cyclic EGLDPC codes that can support a wider range of block

sizes and code rates at a low hardware requirement. To balance the

implementation complexity and the decoding throughput, a structured EGLDPC code was recently proposed for modern wireless communication systems, including but not limited to IEEE 802.16e and IEEE 802.11n. An expansion factor P divides the variable nodes and the check nodes into clusters of size P such that, if there exists an edge between a variable cluster and a check cluster, then P variable nodes connect to P check nodes via a permutation (cyclic-shift) network. Generally, support for different block sizes and code rates implies the use of multiple PCMs. Storing all the PCMs on-chip is

almost impractical and expensive. A good tradeoff between design complexity

and decoding throughput is partially parallel decoding by grouping a certain

number of variable and check nodes into a cluster for parallel processing.

Furthermore, the layered decoding algorithm can be applied to improve the decoding convergence time by a factor of two and hence increase the throughput. The structured EGLDPC code is effectively suited to efficient VLSI implementation because it significantly simplifies memory access and message passing. The PCM can be viewed as a group of concatenated horizontal layers, where the column weight is at most 1 in each layer due to the cyclic-shift structure.
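The cluster structure can be illustrated with a short sketch that expands a base matrix into a full parity-check matrix: each base entry becomes either a P x P all-zero block or a cyclically shifted P x P identity block. The base matrix below is a toy example, not an actual IEEE 802.16e/802.11n or EG-LDPC base matrix.

# Quasi-cyclic PCM expansion sketch: -1 -> zero block, s >= 0 -> identity
# block cyclically shifted by s columns.

def expand(base, P):
    rows, cols = len(base), len(base[0])
    H_full = [[0] * (cols * P) for _ in range(rows * P)]
    for i in range(rows):
        for j in range(cols):
            s = base[i][j]
            if s < 0:
                continue                          # zero block
            for r in range(P):
                H_full[i * P + r][j * P + (r + s) % P] = 1
    return H_full

base = [[0, 1, -1],
        [2, -1, 0]]
for row in expand(base, 3):                       # 2x3 base -> 6x9 PCM
    print(row)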

6.2 TESTING TECHNIQUES

In this project, we describe simple iterative decoders for low-density parity-check codes based on Euclidean geometries, suitable for practical very-large-scale-integration implementation in applications requiring very fast decoders.


The decoders are based on shuffled and replica-shuffled versions of iterative bit-flipping (BF) and quantized weighted BF schemes. The proposed decoders converge faster and provide better ultimate performance than standard BF decoders. We present simulations that illustrate the performance versus complexity tradeoffs for these decoders, and we show in some cases, through importance sampling, that no significant error floor exists. We also present novel architectures comprising one parallel and two semi-parallel decoder architectures for popular PG-based LDPC codes.

These architectures have no memory clash and are further reconfigurable for different lengths (and their corresponding rates). The architectures can be configured either for regular belief-propagation decoding or for majority-logic decoding (MLD). We also analyze storage circuits constructed from unreliable memory components and propose a memory construction, using low-density parity-check codes, based on a construction originally made by Taylor. The storage circuit consists of unreliable memory cells along with a correcting circuit. The correcting circuit is also constructed from unreliable logic gates along with a small number of perfect gates. The modified construction enables the memory device to perform better than the original construction, and we present numerical results supporting these claims.
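For reference, a minimal Gallager-style bit-flipping decoder of the kind discussed above is sketched below: in each iteration, every bit counts how many of its parity checks are unsatisfied, and the most-suspect bits are flipped. This generic sketch omits the shuffled and replica-shuffled scheduling of the text, and the (7, 4) Hamming parity-check matrix is used only as a small test case.

# Generic bit-flipping (BF) decoding sketch.

def bit_flip_decode(word, H, max_iters=20):
    """Return (decoded_word, success_flag)."""
    w = list(word)
    for _ in range(max_iters):
        checks = [sum(h * b for h, b in zip(row, w)) % 2 for row in H]
        if not any(checks):
            return w, True                       # all parity checks satisfied
        # Count the unsatisfied checks touching each bit, then flip the worst.
        counts = [sum(checks[i] for i, row in enumerate(H) if row[j])
                  for j in range(len(w))]
        worst = max(counts)
        for j, c in enumerate(counts):
            if c == worst:
                w[j] ^= 1
    return w, False

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
codeword = [0, 1, 1, 0, 0, 1, 1]
received = codeword[:]
received[5] ^= 1                                 # one bit in error
print(bit_flip_decode(received, H))              # ([0, 1, 1, 0, 0, 1, 1], True)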

CHAPTER VII

PERFORMANCE AND LIMITATIONS

REED-SOLOMON APPLICATIONS

Modem technologies: xDSL, cable modems

CD and DVD players

Digital audio and video broadcast: HDTV/digital TV

Data storage and retrieval systems: hard-disk drives, CD-ROM

Wireless communications: cell phones, base stations, wireless-enabled PDAs

Digital satellite communication and broadcast

RAID controllers with fault tolerance

7.1 APPLICATIONS:

Used in SoC and NoC processors

Used in radios

Used in almost all electronic devices

Loopback BIST model for digital transceivers with limited test circuitry

Spot-defect models (typical of CMOS technology) based on noise and nonlinear analysis, using fault abstraction

7.2 MERITS OF SYSTEM

Reduces maintenance cost

High-speed fault tolerance

Faults can be easily identified

Process capability

No external circuitry required

Does not affect the internal architecture of the nanomemory

Multiple faults can be easily handled


7.3 LIMITATIONS OF SYSTEM

Hardware faults cannot be recognized

Only pre-designed regions can be checked

May negatively impact manufacturers' current silicon-chip technology

Only used in specific applications

7.4 FUTURE ENHANCEMENT

With advances in science, electrical and electronic devices have reached unimaginable levels. The main requirement of any good device is that it serves its purpose effectively, and BIST enables this efficiency. A future BIST system can be designed in such a way that hardware faults can also be indicated so that they can be corrected. A multiprocessor system-on-chip is an integrated system that performs real-time tasks at low power and low cost.

CHAPTER VIII

OUTPUT RESULTS AND DISCUSSIONS

ENCODER


DECODER


EXISTING METHOD’S RESULT


PAPER’S RESULT

PROPOSED METHOD’S RESULT


CHAPTER IX

CONCLUSION

This paper presents an algebraic method for constructing Modified EG

low-density parity-check (LDPC) codes based on the structural properties of

Euclidean geometries. The construction method results in a class of M-EG-

LDPC codes. The key novel contribution of this paper is identifying and

defining a new class of error-correcting codes whose redundancy makes the

design of fault-secure detectors (FSD) particularly simple. We further quantify

the importance of protecting encoder and decoder circuitry against transient

errors, illustrating a scenario where the system failure rate (FIT) is dominated

by the failure rate of the encoder and decoder. We prove that Euclidean

geometry low-density parity-check (EG-LDPC) codes have the fault-secure

detector capability.


CHAPTER X

REFERENCES

[1] H. Naeimi and A. DeHon, "Fault secure encoder and decoder for memory applications," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007, pp. 409-417.

[2] M. Davey and D. J. C. MacKay, "Low density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.

[3] H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, "Codes on finite geometries," IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.

[4] S. J. Piestrak, A. Dandache, and F. Monteiro, "Designing fault-secure parallel encoders for systematic linear error correcting codes," IEEE Trans. Reliab., vol. 52, no. 4, Dec. 2003.

[5] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399-431, Mar. 1999.

[6] H. Wymeersch, H. Steendam, and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE Int. Conf. Commun., Paris, France, Jun. 2004, pp. 772-776.
