University of Stuttgart

Institute of Computer Architecture and Computer Engineering

Haupt-Seminar

Reliable Networks-on-Chip in the Many-Core Era

Error Correction Techniques on NoC Protocol Layers

23 June 2009

Author: Ahmed Garamoun

Advisor: Dipl.-Inf. Adan Kohler

Reviewers: Prof. Dr. H.-J. Wunderlich,

Prof. Dr. M. Radetzki


Contents      

List of Figures
List of Tables
1 Introduction
2 Analysis of Error Recovery Schemes
  2.1 Retransmission Scheme
  2.2 Error Correction Scheme
    2.2.1 Crosstalk Avoidance Correction Codes
      2.2.1.1 Introduction to Crosstalk
      2.2.1.2 Single Error Correction Codes
      2.2.1.3 Multiple Error Correction Codes
      2.2.1.4 Power Dissipation
  2.3 Hybrid Scheme
3 Error Control Policies
  3.1 Switch Architecture
  3.2 End-to-End Error Control Policy
  3.3 Switch-to-Switch Error Control Policy
4 Conclusion
References


List of Figures

Figure 1: Retransmission scheme
Figure 2: Minimum Hamming distance for error detection and correction codes
Figure 3: Error correction capabilities for different coding approaches under different fault scenarios
Figure 4: (a) Cross-section shows victim and aggressor wires, (b) Possible crosstalk effects
Figure 5: DAP encoder
Figure 6: DAP decoder
Figure 7: Encoder for BSC
Figure 8: Decoder for BSC
Figure 9: Encoder and decoder for DAPBI
Figure 10: Encoder for CADEC
Figure 11: Decoding algorithm for CADEC
Figure 12: Circuit implementation for CADEC decoder
Figure 13: Decoding algorithm for JTEC
Figure 14: Encoder for optimized JTEC
Figure 15: Decoder algorithm for optimized JTEC
Figure 16: Power dissipation tradeoff as function of error correcting codes
Figure 17: Voltage swing reduction as function of error probability
Figure 18: Energy dissipation for BFT architecture with different coding schemes
Figure 19: Switch architecture
Figure 20: Switch architecture for End-to-End error control policy
Figure 21: Switch architecture for Switch-to-Switch error control policy

List of Tables

Table 1: The analyzed code approaches
Table 2: Effective capacitance (Ceff) of the victim wire and its corresponding transition activities
Table 3: DAP code examples
Table 4: MDR code examples
Table 5: BSC code examples

Chapter 1 Introduction

As device geometries shrink toward the nanometer scale, current Systems-on-Chip (SoCs) integrate from 10 to more than 100 embedded functional units and storage blocks [Jant05], and the interconnection between these blocks has become the bottleneck in achieving a high degree of integration. To solve this problem, researchers introduced Networks-on-Chip (NoCs) as a way to meet integration and performance requirements. NoCs are highly exposed to several sources of transient noise [Mura05], which affect signal integrity and system reliability. As a result, error control and recovery schemes have been proposed.

Error recovery schemes and error control policies protect the system from errors that affect its reliability. Chapter 2 discusses three error recovery schemes [Ross07]. The first detects errors and, in case of erroneous data, retransmits that data. The second uses error correction codes; here the focus is on crosstalk avoidance error correction codes. The third is a hybrid scheme, which combines error detection with retransmission and error correction. Chapter 3 discusses error control policies [Mura05], which determine where errors are detected and corrected. For NoCs, two error control policies in different network layers can be used: the end-to-end control policy, which operates in the network layer, and the switch-to-switch control policy, which operates in the link layer.

Since the different recovery and error control schemes vary in area, power and performance overhead, the choice of the error control and recovery schemes requires studying the impact of choosing one scheme on the other [Ross07].  


Chapter 2 Analysis of Error Recovery Schemes

There are a number of solutions for handling the increasing number of errors caused by technology scaling in NoCs. Information redundancy, or coding, is one of the most feasible [Vitk08]. Current research focuses on handling a limited number of errors, single or double. With further technology scaling, error rates are expected to increase drastically. These errors fall into two categories: transient errors, which cause a component to malfunction for some time, and permanent errors, which cause a component to malfunction forever [Leht07]. An efficient fault tolerance approach should consider the occurrence of multiple simultaneous permanent and transient errors. In this chapter three possible schemes for error recovery are discussed.

2.1 Retransmission Scheme

In the retransmission scheme, the receiver notifies the sender in case of errors, causing the sender to repeat the transmission, as shown in Figure 1. Because the retransmission scheme is simple, it has been reported to be a power-efficient scheme [Ross07]. The reason is that a given minimum Hamming distance provides a higher error detection capability than error correction capability. The Hamming distance between two codewords is the number of positions in which the corresponding symbols differ. As shown in Figure 2, detecting s errors requires a minimum Hamming distance of s + 1, whereas correcting t errors requires a minimum Hamming distance of 2t + 1. A detection code therefore needs less redundancy than a correction code of the same strength, which leads to less area overhead and less power consumption. On the other hand, the retransmission scheme increases the delay, and a certain throughput cannot be guaranteed. If permanent errors occur, the retransmission scheme fails completely [Leht07]. One way to handle permanent errors in the retransmission scheme is to send the data over two different routes, but this doubles the network load. Another way to handle permanent errors and to reduce the delay overhead is to use error correction codes.

Figure 1: Retransmission scheme.

 

 

Figure 2: Minimum Hamming distance for error detection and correction codes  
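The relation between minimum Hamming distance and detection/correction capability can be illustrated with a minimal Python sketch. The function names and the 3-bit repetition code are hypothetical choices for illustration only; they do not come from the cited works.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(code: list[str]) -> int:
    """Smallest pairwise Hamming distance over all codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def capabilities(d_min: int) -> tuple[int, int]:
    """Return (s, t): s = d_min - 1 detectable errors,
    t = floor((d_min - 1) / 2) correctable errors."""
    return d_min - 1, (d_min - 1) // 2


# Hypothetical toy code: 3-bit repetition code.
repetition_code = ["000", "111"]
d = min_distance(repetition_code)
s, t = capabilities(d)
print(d, s, t)  # 3 2 1
```

With a distance of 3 the same code detects two errors but corrects only one, which is the asymmetry that makes detection plus retransmission cheaper in redundancy.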

2.2 Error Correction Scheme

There are several error correction coding approaches, such as Hamming, BCH, and Reed-Solomon codes. In this section, the Hamming code is discussed as one example, together with its capability to correct single and burst errors. In addition, crosstalk avoidance correction coding approaches are discussed in more detail.

Hamming codes are the most widely used codes in NoC error protection [Srid05]. The minimum distance of a Hamming code is 3 for single error correction and 4 for single error correction with double error detection. A Hamming code with (n = 71, k = 64) is taken as the example in our analysis, where n is the width of the codeword and k is the width of the data word. Its code rate R, which equals k divided by n, is 0.90 [Leht07].

Interleaving is an efficient way to handle burst errors. The data word is partitioned, each part is encoded separately, and the final codeword is formed by taking one bit from each encoded part in turn. This method is popular because it is a simple and cheap way to handle burst errors. In addition, multiple single errors can be corrected as long as they affect different parts, which increases the power of the error correction scheme.

One possible solution is to combine Hamming codes with interleaving. Two approaches based on this solution are analyzed.
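The effect of interleaving on burst errors can be sketched in Python as follows. This is a minimal illustration assuming four equal-length blocks of 21 bits and a simple round-robin bit order; the block contents are placeholders and the helper names are hypothetical.

```python
def interleave(blocks):
    """Take one bit from each equal-length block in turn."""
    return [bit for column in zip(*blocks) for bit in column]


def deinterleave(stream, num_blocks):
    """Inverse of interleave for equal-length blocks."""
    return [stream[i::num_blocks] for i in range(num_blocks)]


def errors_per_block(num_blocks, block_len, burst_start, burst_len):
    """Count how many bits of each block a contiguous burst hits."""
    hits = [0] * num_blocks
    total = num_blocks * block_len
    for pos in range(burst_start, min(burst_start + burst_len, total)):
        hits[pos % num_blocks] += 1
    return hits


def worst_case_hits(num_blocks, block_len, burst_len):
    """Largest number of errors any single block sees, over all burst positions."""
    total = num_blocks * block_len
    return max(
        max(errors_per_block(num_blocks, block_len, start, burst_len))
        for start in range(total)
    )


# Placeholder block contents; only the positions matter here.
blocks = [[b] * 21 for b in (0, 1, 0, 1)]
stream = interleave(blocks)
print(deinterleave(stream, 4) == blocks)  # True
print(worst_case_hits(4, 21, 4))          # 1
print(worst_case_hits(4, 21, 5))          # 2
```

Under these assumptions a burst of up to four adjacent bits puts at most one error into each block, so a single-error-correcting code per block is sufficient; a burst of five already exceeds that.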

[Figure 1 block labels: Sender, Receiver, Core A, Core B, Error Detection Encoder, Error Detection Decoder, Buffer, DATA, ACK/NACK]

[Figure 2 legend: dmin is the minimum Hamming distance, s is the number of errors to be detected and t is the number of errors to be corrected]

In the first approach, the 64-bit data word is partitioned into 3 parts (21, 21, 22), which are encoded by Hamming codes as 2×(26, 21) and (27, 22), giving a rate of R = 0.81. In the second approach, the data word is partitioned into 4 parts (16, 16, 16, 16), which are encoded by 4 instances of a (21, 16) Hamming code, giving a rate of R = 0.76.

The chart in Figure 3 shows simulation results of applying the different error correcting approaches to different error scenarios, including combinations of single errors and burst errors.

Code        n    k    Rate   Notes
Hamming 1   71   64   0.90   Single error correction
Hamming 2   79   64   0.81   2x(26,21)+(27,22)
Hamming 3   84   64   0.76   4x(21,16)

Table 1: The analyzed code approaches.  
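The codeword widths and rates in Table 1 can be reproduced with a short Python sketch. It relies on the standard Hamming bound for single error correction (the smallest r with 2^r ≥ k + r + 1 parity bits for k data bits); the function names are hypothetical.

```python
def hamming_parity_bits(k: int) -> int:
    """Smallest r such that 2**r >= k + r + 1 (SEC Hamming code)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def interleaved_code(parts):
    """Return (n, k, rate) when each part is Hamming-encoded separately."""
    k = sum(parts)
    n = sum(p + hamming_parity_bits(p) for p in parts)
    return n, k, k / n


approaches = [
    ("Hamming 1", [64]),
    ("Hamming 2", [21, 21, 22]),
    ("Hamming 3", [16, 16, 16, 16]),
]
for name, parts in approaches:
    n, k, rate = interleaved_code(parts)
    print(f"{name}: n={n} k={k} R={rate:.2f}")
# Hamming 1: n=71 k=64 R=0.90
# Hamming 2: n=79 k=64 R=0.81
# Hamming 3: n=84 k=64 R=0.76
```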

 

Figure 3: Error correction capabilities for different coding approaches under different fault scenarios. 

 

2.2.1 Crosstalk Avoidance Correction Codes

Crosstalk has become one of the major sources of transient errors as a result of technology scaling [Phil06]. The difficulty in correcting crosstalk errors lies in their data dependency and in the difficulty of predicting the next data bit [Gang08]. This section introduces the principle of crosstalk and discusses crosstalk avoidance codes for single and multiple error correction.

[Figure 3 axes: fraction of correct transmissions (0 to 1) versus fault scenario X;Y (1;0, 2;0, 3;0, 0;2, 0;3, 0;4, 0;2x2, 1;2, 2;2, 1;3), where X is the number of single errors and Y is the length of burst errors; series: Hamming 1, Hamming 2, Hamming 3]

2.2.1.1 Introduction to Crosstalk

Crosstalk is the unwanted coupling of energy between adjacent wires, which affect each other during switching activity. Figure 4(a) shows a cross-section of three wires on the chip, where the middle wire is the victim wire (V), which is affected through crosstalk by simultaneous transitions on its neighboring aggressor wires A1 and A2 [Phil06]. To study the effect of the aggressor wires on the victim wire, let the capacitance between the victim and the substrate be Cs and the capacitance between the victim and each of the aggressors be Cc.

 

 

    

Figure 4: (a) Cross-section showing victim and aggressor wires, (b) Possible crosstalk effects.

The transition activities and their corresponding effective capacitances Ceff, which result from Cc and Cs, are shown in Table 2. The symbol ↑ represents a rising transition, ↓ a falling transition and – no transition. When the victim and the aggressors switch in the same direction (case 1 in Table 2), the crosstalk has the least impact on the effective capacitance, Ceff = Cs; Cc has no effect on Ceff because the voltage difference between the victim and the aggressors does not change. The worst case occurs when the victim switches in one direction and both aggressors switch in the other direction (case 5 in Table 2). The change in the voltage difference between the victim and aggressor A1 is then twice the voltage swing because the two wires switch in opposite directions; the same is true for V and A2, resulting in a total effective capacitance of Ceff = Cs + 4Cc. The possible effects of crosstalk on the signal on the victim wire are shown in Figure 4(b). The observable faults are glitches and delays that correspond to certain transition patterns [Fran06, Cuvi99].

Case   Ceff        Transition patterns
1      Cs          (↑,↑,↑) (↓,↓,↓)
2      Cs + Cc     (–,↑,↑) (–,↓,↓) (↑,↑,–) (↓,↓,–)
3      Cs + 2Cc    (–,↑,–) (–,↓,–) (↑,↑,↓) (↑,↓,↓) (↓,↑,↑) (↓,↓,↑)
4      Cs + 3Cc    (–,↑,↓) (–,↓,↑) (↑,↓,–) (↓,↑,–)
5      Cs + 4Cc    (↑,↓,↑) (↓,↑,↓)

Table 2: Effective capacitance (Ceff) of the victim wire and its corresponding transition activities. 
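The rows of Table 2 follow from a simple rule: each aggressor contributes 0, 1 or 2 times Cc depending on whether it switches with the victim, stays quiet, or switches against it. A minimal Python sketch of that rule, assuming the patterns are ordered (A1, V, A2) and using hypothetical names:

```python
UP, DOWN, HOLD = 1, -1, 0


def coupling_factor(a1: int, v: int, a2: int) -> int:
    """Return m such that Ceff = Cs + m * Cc for the pattern (A1, V, A2).

    Assumes the victim V is switching; the pattern order is an assumption.
    """
    if v == HOLD:
        raise ValueError("victim must switch")
    return abs(v - a1) + abs(v - a2)


def effective_capacitance(a1: int, v: int, a2: int, cs: float, cc: float) -> float:
    """Ceff of the victim wire for given Cs and Cc."""
    return cs + coupling_factor(a1, v, a2) * cc


print(coupling_factor(UP, UP, UP))      # 0 -> case 1
print(coupling_factor(HOLD, UP, DOWN))  # 3 -> case 4
print(coupling_factor(UP, DOWN, UP))    # 4 -> case 5
```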

2.2.1.2 Single Error Correction Codes

Duplicate Add Parity (DAP)

DAP [Gang08, Pand06a] achieves crosstalk avoidance by duplicating the data bit lines, which reduces the worst-case capacitance to Cs + 2Cc. In addition, a parity bit is added to correct single errors. The DAP encoder, shown in Figure 5, is simple: it consists of only 3 XOR gates, which generate the parity bit. The decoder, shown in Figure 6, consists of 4 XOR gates to detect errors and a multiplexer to select the error-free data word. Table 3 shows examples of DAP codewords; the leftmost bit is the parity bit. Moreover, the wiring can be optimized by intelligent spacing [Ross05], which decreases the spacing between duplicated wires and can be applied at design time when routing the wires.

 

Figure 5: DAP Encoder  

 

[Figure 5 labels: data inputs X0 to X3, three XOR gates (=1), codeword outputs Y0 to Y8]

Data word   Codeword
0100        100110000
0011        000001111
1010        011001100

Table 3: DAP code examples. 
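A minimal Python sketch of DAP encoding and decoding for 4-bit data words follows. The codeword layout (parity bit first, then every data bit twice) is inferred from the codeword examples; the decoder rule (keep the first copy if its recomputed parity matches the received parity bit, otherwise take the second copy) is one common formulation and is an assumption here, not a transcription of Figure 6. All function names are hypothetical.

```python
def parity(bits: str) -> str:
    """Even parity: XOR of all bits, as a '0'/'1' character."""
    return str(bits.count("1") % 2)


def dap_encode(data: str) -> str:
    """Parity bit followed by each data bit duplicated."""
    return parity(data) + "".join(b + b for b in data)


def dap_decode(word: str) -> str:
    """Select the copy that is consistent with the received parity bit."""
    p, body = word[0], word[1:]
    copy_a, copy_b = body[0::2], body[1::2]
    return copy_a if parity(copy_a) == p else copy_b


def flip(word: str, i: int) -> str:
    """Invert bit i of a bit string."""
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]


def corrects_all_single_errors() -> bool:
    """Exhaustively inject every single-bit error into every 4-bit word."""
    for value in range(16):
        data = format(value, "04b")
        word = dap_encode(data)
        for i in range(len(word)):
            if dap_decode(flip(word, i)) != data:
                return False
    return True


print(dap_encode("0100"))            # 100110000
print(dap_encode("0011"))            # 000001111
print(corrects_all_single_errors())  # True
```

The selection rule works because a single error corrupts at most one of the three parts: if it hits the first copy or the parity bit, the parity check fails and the untouched second copy is used; if it hits the second copy, the check passes and the untouched first copy is used.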

 

Figure 6: DAP Decoder 

 

Modified Dual Rail (MDR)

MDR [Gang08, Pand06a] works in basically the same way as DAP; the only modification is the duplication of the parity bit in order to reduce the effect of crosstalk on it. Table 4 shows examples of MDR codewords; the two leftmost bits are the duplicated parity bit.

Data word   Codeword
0100        1100110000
0011        0000001111
1010        0011001100

Table 4: MDR code examples.

[Figure 6 labels: codeword inputs Y0 to Y8, four XOR gates (=1), four 2:1 multiplexers, Decoder Enable, data outputs X0 to X3]

Boundary Shift Code (BSC)

To understand the idea behind the Boundary Shift Code, two definitions are needed. The first is the invalid transition: a transition from one codeword to another in which two adjacent bits change in opposite directions, which results in the worst-case coupling capacitance. For example, (01100011, 11011010) is an invalid transition. A code is self-shielding if it avoids invalid transitions. The second is the dependent boundary: a place where two adjacent bits differ, denoted by the position of the left bit of the pair in the codeword. For example, the codewords 11001110 and 01100111 have the dependent boundaries (2,4,7) and (1,3,5), respectively. BSC [Pate04] is based on two facts. First, two codewords whose sets of dependent boundaries do not overlap cannot form an invalid transition. Second, a one-bit circular shift converts a codeword with odd dependent boundaries into a codeword with even dependent boundaries and vice versa, so alternating between the two forms yields a self-shielding code. The BSC encoder shown in Figure 7 duplicates the data word, which gives even dependent boundaries, and sends the duplicated codeword together with the parity bit under control of the clock signal. On the next clock cycle, the next codeword with its parity bit is sent circularly shifted by one bit. The transmitted data therefore forms a self-shielding code. The decoder is similar to the DAP decoder, with an additional multiplexer array that restores the unshifted codeword. Table 5 shows examples of BSC codewords; the parity bit is the leftmost bit of an unshifted codeword and the rightmost bit of a shifted one.

Clock   Data word   Codeword
1       0100        100110000
2       0011        000011110
3       1010        011001100

Table 5: BSC code examples. 
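The two definitions and the alternating shift can be checked with a minimal Python sketch. It assumes 4-bit data words, the DAP-style layout (parity bit first, then duplicated data bits), no shift on odd clock cycles and a one-bit circular left shift on even clock cycles; these conventions are inferred from the codeword examples, and all names are hypothetical.

```python
def dependent_boundaries(word: str) -> tuple[int, ...]:
    """1-based positions i where bit i differs from bit i + 1."""
    return tuple(i + 1 for i in range(len(word) - 1) if word[i] != word[i + 1])


def is_invalid_transition(old: str, new: str) -> bool:
    """True if two adjacent bits change in opposite directions."""
    for i in range(len(old) - 1):
        left = int(new[i]) - int(old[i])
        right = int(new[i + 1]) - int(old[i + 1])
        if left * right == -1:
            return True
    return False


def bsc_encode(data: str, clock: int) -> str:
    """DAP-style codeword, circularly shifted left on even clock cycles."""
    word = str(data.count("1") % 2) + "".join(b + b for b in data)
    if clock % 2 == 0:
        word = word[1:] + word[0]
    return word


def is_self_shielding() -> bool:
    """Check every pair of consecutive 4-bit data words in both clock phases."""
    words = [format(v, "04b") for v in range(16)]
    for first in words:
        for second in words:
            for clock in (1, 2):
                old = bsc_encode(first, clock)
                new = bsc_encode(second, clock + 1)
                if is_invalid_transition(old, new):
                    return False
    return True


print(dependent_boundaries("11001110"))               # (2, 4, 7)
print(dependent_boundaries("01100111"))               # (1, 3, 5)
print(is_invalid_transition("01100011", "11011010"))  # True
print(bsc_encode("0011", 2))                          # 000011110
print(is_self_shielding())                            # True
```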

 

 

Figure 7: Encoder for BSC.

[Figure 7 labels: data inputs X0 to X3, three XOR gates (=1), five 2:1 multiplexers, CLK, codeword outputs Y0 to Y8, Decoder Enable]

Figure 8: Decoder for BSC.

Duplicate Add Parity Bus Invert (DAPBI)

DAPBI [Srid05] combines a DAP code with a low-power property. Bus invert means that the codeword is inverted if the current data word differs from the previous data word in more than half of its bits. As shown in Figure 9, DAPBI is realized by a DAP code combined with a bus invert module. The low-power property of DAPBI comes from reducing the transition activity on the data bus, since the power dissipation is

$P = \frac{1}{2}\,\alpha\,C_{net}\,V_{dd}^{2}\,f$

where P is the power dissipation, α is the transition activity, Cnet is the capacitance, Vdd is the supply voltage and f is the frequency.
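The bus-invert rule itself can be sketched in a few lines of Python. This is a generic bus-invert illustration, not the DAPBI circuit of Figure 9: how the invert flag is transmitted and how the module is combined with DAP are not modeled, and the names are hypothetical.

```python
def bus_invert(prev_bus: str, data: str) -> tuple[str, str]:
    """Return (bus_word, invert_flag) for the next transfer.

    The word is inverted when more than half of the bus lines would toggle.
    """
    toggles = sum(a != b for a, b in zip(prev_bus, data))
    if toggles > len(data) / 2:
        inverted = "".join("1" if b == "0" else "0" for b in data)
        return inverted, "1"
    return data, "0"


def toggles_on_bus(prev_bus: str, bus_word: str) -> int:
    """Number of data lines that change state."""
    return sum(a != b for a, b in zip(prev_bus, bus_word))


word, flag = bus_invert("0000", "1110")
print(word, flag)                    # 0001 1
print(toggles_on_bus("0000", word))  # 1
```

In this example three of four lines would toggle without inversion, and only one toggles with it, which lowers the transition activity α on the data lines.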

Figure 9: Encoder and Decoder for DAPBI.

   

[Figure 9 labels: Encoder, Decoder, Dataword, Bus-invert, Metric Computation, parity, XOR gates (=1), 2:1 multiplexers]

[Figure 8 labels: codeword inputs Y0 to Y8, CLK, Decoder Enable, 2:1 multiplexers, four XOR gates (=1), data outputs X0 to X3]

2.2.1.3 Multiple Error Correction Codes

Crosstalk Avoidance Double Error Correcting (CADEC) Code

CADEC [Gang08] combines a SEC Hamming code with DAP. A 32-bit data word is first encoded by a (38, 32) SEC Hamming encoder, which provides single error correction with a Hamming distance of 3. DAP then duplicates the 38 encoded bits and adds a parity bit, giving 77 lines (2×38 bits for data, 1 bit for parity). The minimum Hamming distance between different codewords is 7, which means that in principle triple errors can be corrected. CADEC corrects only double errors because the decoder is simpler than one for triple error correction, which leads to less area and power overhead. The encoder for CADEC is shown in Figure 10.

 

Figure 10: Encoder for CADEC. 

 

The decoding process for CADEC is more complex than the encoding process; it is explained with the help of the flow chart shown in Figure 11. The decoding algorithm, which is not valid for more than double errors, consists of the following steps:

1- Separate the received 77 bits into two 38-bit copies, A and B, and a single parity bit P0.

2- Calculate the parity of each copy, PA for A and PB for B.

3- If PA=PB, either both copies are error-free, or there is a single error in each copy, or there is a double error in one copy and the other copy is error-free. In that case one copy, for example B, is sent to syndrome detection to check whether it contains a double error; if it does, the other copy is used instead, because the Hamming decoder can correct only one bit.

4- If PA≠PB, one of the copies contains a single error and the other copy is error-free. The two cases can be told apart by comparing PA and PB with P0.

5- The final step is to perform the Hamming decoding on the selected copy.

The circuit implementation for the CADEC decoder is shown in Figure 12.
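The copy-selection steps above can be exercised with a minimal Python sketch. To keep an exhaustive check small, it uses a toy Hamming(7,4) code instead of the (38, 32) code, and it stores the two copies one after the other rather than with duplicated bits on adjacent wires; both are simplifying assumptions, and all names are hypothetical. A nonzero syndrome is used as the test for "copy B is not error-free" in step 3.

```python
from itertools import combinations, product

R = 3           # parity bits of the toy Hamming(7,4) code
N = 2 ** R - 1  # length of one encoded copy


def syndrome(code):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def hamming_encode(data):
    """Place data on non-power-of-two positions, then fix the parity bits."""
    code = [0] * N
    bits = iter(data)
    for pos in range(1, N + 1):
        if pos & (pos - 1):
            code[pos - 1] = next(bits)
    s = syndrome(code)
    for i in range(R):
        code[(1 << i) - 1] = (s >> i) & 1
    return code


def hamming_decode(code):
    """Correct at most one error, then extract the data bits."""
    code = list(code)
    s = syndrome(code)
    if s:
        code[s - 1] ^= 1
    return [code[pos - 1] for pos in range(1, N + 1) if pos & (pos - 1)]


def cadec_encode(data):
    """Copy A, copy B, and the parity P0 of one copy."""
    a = hamming_encode(data)
    return a + a + [sum(a) % 2]


def cadec_decode(word):
    a, b, p0 = word[:N], word[N:2 * N], word[2 * N]
    pa, pb = sum(a) % 2, sum(b) % 2
    if pa != pb:                    # step 4: pick the copy that agrees with P0
        chosen = a if pa == p0 else b
    else:                           # step 3: check one copy's syndrome
        chosen = b if syndrome(b) == 0 else a
    return hamming_decode(chosen)   # step 5


def all_double_errors_corrected() -> bool:
    """Inject every pair of bit errors into every codeword."""
    for data in product([0, 1], repeat=N - R):
        word = cadec_encode(list(data))
        for i, j in combinations(range(len(word)), 2):
            bad = list(word)
            bad[i] ^= 1
            bad[j] ^= 1
            if cadec_decode(bad) != list(data):
                return False
    return True


print(all_double_errors_corrected())  # True
```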


 

Figure 11: Decoding algorithm for CADEC. 

 

 

 

Figure 12: Circuit implementation for CADEC decoder. 


Joint crosstalk avoidance and Triple Error Correction (JTEC) code

The encoder for JTEC [Gang08a], which generates codewords with a minimum Hamming distance of 7, resembles the CADEC encoder shown in Figure 10. To correct triple errors the decoder becomes more complicated, since it requires syndrome and parity computation for each copy. The decoding algorithm is explained with the help of the flow chart shown in Figure 13.

1- Separate the received bits into two copies, A and B, and a single parity bit P0.

2- Compute the syndromes SA, SB and the parity bits PA, PB of the two copies.

3- If the syndrome SA≠0, copy A contains one or two errors and the other copy contains two or one errors, respectively, so the copy with a single error has to be selected.

• If P0=PA, copy A contains two errors and copy B contains at most a single error, so copy B is chosen and sent to the SEC Hamming decoder.

• If P0≠PA, copy A contains one error and copy B may contain double errors, a single error, or no error at all. SB is checked: if copy B is error-free (SB=0), it is chosen and sent to the SEC Hamming decoder; otherwise copy A is chosen and sent to the SEC Hamming decoder.

4- If SA=0, copy A either contains triple errors or is error-free, so PA is checked. PA=0 indicates an even number of errors (0, 2, 4, ..., 2n). Since double errors were already ruled out by the previous condition and at most triple errors can be corrected, A must be error-free. Otherwise, copy B is chosen if its syndrome SB=0, and copy A is chosen if SB≠0.

Figure 13: Decoding algorithm for JTEC. 


Optimization of Joint crosstalk avoidance and Triple Error Correction (JTEC) code

To avoid the delay overhead caused by the long chains of XOR gates used for parity generation, an optimized JTEC code [Gang08a] is discussed. The optimization is shown in Figure 14. The encoder combines the parity bit P0 with the first (38, 32) Hamming-coded copy A to obtain a copy with SEC/DED protection, while the other copy B has SEC protection only. Since JTEC targets triple error correction and the data word is available in two copies with different formats, a double error in one copy means that the other copy has at most a single error, which can be corrected by an SEC code. If the first 39-bit copy (A + P0) has triple errors, copy B is error-free. The flow chart in Figure 15 explains the algorithm of the optimized JTEC decoder.

 

Figure 14: Encoder for optimized JTEC. 


 

Figure 15: Decoder Algorithm for optimized JTEC. 

2.2.1.4 Power Dissipation

There are two strategies to reduce power dissipation, as shown in Figure 16. Selecting the most power-efficient strategy requires modeling and simulating the network topology together with the coding approach [Jant05, Pand06].

      

Figure 16: Power dissipation tradeoff as function of error correcting codes.

[Figure 16 content. Strategy 1: increase the voltage (increases the power dissipation); the noise margin is increased; the error probability decreases; a simple error correction mechanism suffices; low area and power overhead for the encoder and decoder (decreases the power dissipation). Strategy 2: decrease the voltage (decreases the power dissipation); the noise margin is decreased; the error probability increases; a complex error correction mechanism is needed; high area and power overhead for the encoder and decoder (increases the power dissipation).]

Figure 17 shows the voltage swing reduction for several crosstalk avoidance codes [Gang08a]. As shown, the voltage swing can be reduced from 0.77 V to 0.42 V depending on the error correcting scheme. Since power depends on the square of the voltage swing, a reduction of the voltage swing by a factor of 2 would lead to a reduction in power by a factor of 4. Because a more complex encoder and decoder are needed to handle the increased error probability, this factor of 4 is not a realistic estimate. In the experiment shown in Figure 18 [Gang08a], a butterfly-fat-tree (BFT) based NoC architecture with 64 IP blocks is simulated to compare the power dissipation of different crosstalk avoidance error correcting schemes. As can be seen, the change in power dissipation across the coding schemes does not follow the quadratic dependence on the voltage swing (Figure 17).
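As a worked example of the quadratic dependence, take the two voltage swing values quoted above, 0.77 V and 0.42 V, and assume that everything except the voltage swing stays the same (in particular, encoder and decoder overhead is ignored):

```latex
\frac{P_{0.42\,\mathrm{V}}}{P_{0.77\,\mathrm{V}}}
  = \left(\frac{0.42}{0.77}\right)^{2}
  \approx 0.30
```

Under this idealized assumption the swing-dependent power would drop to roughly 30% of its original value, a factor of about 3.4. This is an upper bound on the saving, since the more complex codec needed at the lower swing consumes part of it.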

 

Figure 17 : Voltage swing reduction as function of Error probability  

 

Figure 18: Energy dissipation for BFT architecture with different coding schemes. 

 

2.3 Hybrid Scheme

The Hybrid Scheme [Mura05] combines the error correction scheme with the retransmission scheme. The receiver corrects up to a specific number of erroneous bits; if more bits are erroneous, it requests the sender to retransmit the data. The Hybrid Scheme is not an ideal error correcting scheme: the switch drops the packet if a header flit contains an uncorrectable error. Recovering from such a case is difficult since it requires powerful error correction codes.

[Figure 17 axes: voltage swing (V), 0.2 to 1, versus error probability, 1.E-23 to 1.E-03; series: DAP, CADEC, JTEC]

[Figure 18 axes: energy dissipation (pJ), 0 to 2500, versus coding scheme (DAP, CADEC, JTEC)]

 

Chapter 3 Error Control Policies

3.1 Switch Architecture

In the widely used packet-based communication, each packet consists of data units called flits. The first flit of a packet (the header) determines the routing path of the packet; the crossbar in each switch is then set to forward the packet to the next switch. Figure 19 shows the switch architecture. The switch has 5 channels: 4 for communication with the adjacent switches and a 5th for communication with the network interface (NI), which is connected to the processing element (PE).


     

Figure 19: Switch architecture 

 

 

3.2 End-to-End Error control policy

In the End-to-End error control policy [Ross07, Mura05, Victo08], the flits are encoded for error detection/correction at the sender and decoded only at the receiver. To implement the End-to-End control policy, the switch architecture is modified as shown in Figure 20. One encoder and one decoder are connected to the NI, since the sender and receiver are PEs, which are connected to the NI. The area overhead of the End-to-End error control policy consists of one encoder, one decoder and extended buffer registers to match the codeword width. The additional delay overhead per flit is ∆TD-Flit = TEncoder + TDecoder, where TEncoder and TDecoder are the encoder and decoder delays, respectively.


Figure 20: Switch architecture for End‐to‐End error control policy. 

 

3.3 Switch-to-Switch Error Control Policy

In the Switch-to-Switch error control policy [Ross07, Mura05, Vitk08], the flits are encoded and decoded for error correction/detection at each hop of the transmission from the sender to the receiver. To implement the Switch-to-Switch policy, the switch architecture is modified as shown in Figure 21. Four decoders and four encoders are needed, connected to the inputs and outputs, respectively, of the S, N, W and E channels. A codeword received from S, N, W or E is decoded and then either sent to the NI, saved in the buffers, or encoded again and sent to S, N, W or E to continue along its routing path to the destination NI. The area overhead of the Switch-to-Switch error control policy consists of four encoders, four decoders and extended output buffer registers to match the codeword width. The additional delay overhead per flit is given by $\Delta T_{D\text{-}Flit} = N_{max} \times (T_{Encoder} + T_{Decoder})$, where $N_{max}$ is the maximum number of hops made by a flit and $T_{Encoder}$ and $T_{Decoder}$ are the encoder and decoder delays, respectively.
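To make the difference between the two policies concrete, the following Python sketch evaluates both per-flit delay formulas side by side. The delay values and the hop count are hypothetical numbers chosen only for illustration; they are not measurements from the cited works. The encoder and decoder counts are the ones stated in the text.

```python
def delay_overhead_end_to_end(t_encoder, t_decoder):
    """Per-flit delay overhead: one encoding and one decoding in total."""
    return t_encoder + t_decoder


def delay_overhead_switch_to_switch(t_encoder, t_decoder, n_max):
    """Per-flit delay overhead: one encoding and one decoding per hop."""
    return n_max * (t_encoder + t_decoder)


# (encoders, decoders) required by each policy, as stated in the text.
CODEC_COUNT = {"end_to_end": (1, 1), "switch_to_switch": (4, 4)}

# Hypothetical values, for illustration only.
T_ENCODER_NS = 0.5
T_DECODER_NS = 1.0
N_MAX = 6

e2e = delay_overhead_end_to_end(T_ENCODER_NS, T_DECODER_NS)
s2s = delay_overhead_switch_to_switch(T_ENCODER_NS, T_DECODER_NS, N_MAX)
print(f"End-to-End:       {e2e} ns")
print(f"Switch-to-Switch: {s2s} ns")
```

Under these assumptions the Switch-to-Switch overhead is larger by a factor of N_MAX. In exchange, each link is checked separately, so an error is detected or corrected at the hop where it occurs rather than at the destination.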


Figure 21: Switch architecture for Switch‐to‐Switch error control policy.


Chapter 4 Conclusion

As Networks-on-Chip have become the interconnection backbone of many-core Systems-on-Chip, and as technology scales, NoCs are highly susceptible to transient errors, which makes it more challenging to keep the system robust and power efficient. Error recovery schemes can decrease the power consumption by 20% for a given error rate and performance [Vitk08].

This report discussed how and where errors can be corrected in the NoC. Several error recovery schemes and error control policies have been presented. The report mainly dealt with one type of transient error, namely crosstalk errors. It started from an understanding of the crosstalk phenomenon, then discussed the single and multiple error correction techniques with crosstalk avoidance, and examined the power consumption after integrating the encoders and decoders for these techniques. Finally, error control policies were briefly discussed.

Network-on-Chip reliability remains a challenge for upcoming technologies: reliability has to be increased using powerful error recovery schemes with low power consumption. Future technologies change the effect of error sources. For example, in nanoscale technologies crosstalk has not only become one of the main error sources but also consumes a significant amount of energy during communication between IP cores.


References

[Cuvi99] M.Cuviello, S.Dey, X.Bai and Y.Zhao. "Fault modeling and simulation for crosstalk in system-on-chip interconnects," Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on, 1999, pp. 297-303

[Fran06] A.P.Frantz, F.L.Kastensmidt, L.Carro and E.Cota. "Dependable Network-on-Chip Router Able to Simultaneously Tolerate Soft Errors and Crosstalk," Test Conference, 2006. ITC '06. IEEE International, Oct. 2006, pp. 1-9

[Gang08] A.Ganguly, P.P.Pande, B.Belzer and C.Grecu. "Design of Low Power & Reliable Networks on Chip Through Joint Crosstalk Avoidance and Multiple Error Correction Coding," J. Electron. Test. 24, June 2008, 1-3

[Gang08a] A.Ganguly, P.P.Pande and B.Belzer. "Crosstalk-Aware Channel Coding Schemes for Energy Efficient and Reliable NoC Interconnects," accepted for publication in IEEE Transactions on VLSI, August 2008, available: http://ieeexplore.ieee.org/xpls/pre_abs_all.jsp?isnumber=4359553&arnumber=4801555

[Jant05] A.Jantsch, R.Lauter and A.Vitkowski. "Power analysis of link level and end-to-end data protection in networks on chip," Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, 23-26 May 2005, vol. 2, pp. 1770-1773

[Leht07] T.Lehtonen, P.Liljeberg and J.Plosila. "Analysis of forward error correction methods for nanoscale networks-on-chip," in Proceedings of the 2nd International Conference on Nano-Networks (Catania, Italy, September 24-26, 2007). ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, Belgium, 1-5

[Mura05] S.Murali, T.Theocharides, N.Vijaykrishnan, M.J.Irwin, L.Benini and G.De Micheli. "Analysis of error recovery schemes for networks on chips," Design & Test of Computers, IEEE, Sept.-Oct. 2005, vol. 22, no. 5, pp. 434-442


[Pand06] P.P.Pande, H.Zhu, A.Ganguly and C.Grecu. "Crosstalk-aware Energy Reduction in NoC Communication Fabrics," SOC Conference, 2006 IEEE International, 24-27 Sept. 2006, pp. 225-228

[Pand06a] P.P.Pande, A.Ganguly, B.Feero, B.Belzer and C.Grecu. "Design of Low power & Reliable Networks on Chip through joint crosstalk avoidance and forward error correction coding," 21st IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT'06), 2006, pp. 466-476

[Pate04] K.N.Patel and I.L.Markov. "Error-correction and crosstalk avoidance in DSM busses," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, Oct. 2004, vol. 12, no. 10, pp. 1076-1080

[Phil06] J.-M.Philippe, S.Pillement and O.Sentieys. "Area efficient temporal coding schemes for reducing crosstalk effects," Quality Electronic Design, 2006. ISQED '06. 7th International Symposium on, 27-29 March 2006, pp.6 pp.-339

[Ross05] D.Rossi, C.Metra, A.K.Nieuwland and A.Katoch. "Exploiting ECC redundancy to minimize crosstalk impact," Design & Test of Computers, IEEE, Jan.-Feb. 2005, vol. 22, no. 1, pp. 59-70

[Ross07] D.Rossi, P.Angelini and C.Metra. "Configurable Error Control Scheme for NoC Signal Integrity," 13th IEEE International On-Line Testing Symposium (IOLTS 2007), 2007, pp. 43-48

[Srid05] S.R.Sridhara and N.R.Shanbhag. "Coding for system-on-chip networks: a unified framework," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, June 2005, vol. 13, no. 6, pp. 655-667

[Vitk08] A.Vitkovski, A.Jantsch, R.Lauter, R.Haukilahti and E.Nilsson. "Low-power and error protection coding for network-on-chip traffic," Computers & Digital Techniques, IET, November 2008, vol. 2, no. 6, pp. 483-492