Paper Report
Presenter: Jyun-Yan Li
Multiplexed redundant execution: A technique for efficient fault tolerance
in chip multiprocessors
Pramod Subramanyan, Virendra Singh (Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India)
Kewal K. Saluja (Electrical and Computer Engg. Dept., University of Wisconsin-Madison, Madison, WI)
Erik Larsson (Dept. of Computer and Info. Science, Linköping University, Linköping, Sweden)
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010
Cite count: 16
2
Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects, and process variations, causing fault tolerance to become important even for general-purpose processors targeted at the commodity market.
To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures execute a single application using two threads, typically as one leading thread and one trailing thread. Errors are detected by comparing the outputs produced by these two threads. These architectures schedule a single application on two cores or two thread contexts of a CMP.
Abstract – part1
3
As a result, besides the additional energy consumption and performance overhead required to provide fault tolerance, such schemes also impose a throughput loss. Consequently, a CMP capable of executing 2n threads in non-redundant mode can only execute half as many (n) threads in fault-tolerant mode.
In this paper we propose multiplexed redundant execution (MRE), a low-overhead architectural technique that executes multiple trailing threads on a single processor core. MRE exploits the observation that it is possible to accelerate the execution of the trailing thread by providing execution assistance from the leading thread.
Abstract – part2
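The abstract above describes the baseline scheme MRE builds on: the same program runs as a leading and a trailing thread, and outputs are compared before leaving the core. A minimal sketch of that idea (illustrative only, not the paper's implementation; the tiny instruction set and register names are invented for the example):

```python
# Minimal sketch of dual-redundant execution: two copies of the same
# program run independently, and their store outputs are compared
# before anything is released to the rest of the system.

def run(program, inputs):
    """Execute a tiny straight-line program; return the list of stores."""
    regs, stores = {}, []
    for op, *args in program:
        if op == "load":
            dst, src = args
            regs[dst] = inputs[src]
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "store":
            stores.append(regs[args[0]])
    return stores

program = [("load", "r1", "x"), ("load", "r2", "y"),
           ("add", "r3", "r1", "r2"), ("store", "r3")]
inputs = {"x": 2, "y": 3}

leading = run(program, inputs)    # leading thread
trailing = run(program, inputs)   # trailing thread (redundant copy)

# Output comparison: a mismatch here would signal a detected fault.
assert leading == trailing
print(leading)                    # [5]
```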
4
Execution assistance combined with coarse-grained multithreading allows MRE to schedule multiple trailing threads concurrently on a single core with only a small performance penalty. Our results show that MRE increases the throughput of a fault-tolerant CMP by 16% over an ideal dual modular redundant (DMR) architecture.
Abstract – part3
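To see why multiplexing trailing threads recovers throughput, note that an assisted trailing thread needs far fewer cycles per instruction than its leader, so one core can time-share several trailers. A back-of-the-envelope sketch (the CPI values are illustrative assumptions, not measurements from the paper):

```python
# Sketch of MRE's throughput argument: trailing threads receive
# execution assistance (branch outcomes, load values) from their
# leaders, so they run much faster and can share a single core.

LEADING_CPI = 1.0      # leading thread: normal execution speed
TRAILING_CPI = 0.4     # assumed: assisted trailing thread needs ~40% of the work

def cores_needed(n_apps):
    """Cores for n redundantly-executed applications under MRE-style sharing."""
    leading_cores = n_apps                  # one core per leading thread
    trailing_work = n_apps * TRAILING_CPI   # total trailing-thread demand
    trailing_cores = -(-trailing_work // 1) # ceiling: cores for all trailers
    return leading_cores + int(trailing_cores)

# Classic DMR needs 2 cores per application; MRE shares trailing cores.
print(cores_needed(4))   # 6 cores instead of 8 under DMR
```

Under these assumed numbers, four redundant applications fit on six cores rather than the eight a lockstep DMR scheme would require, which is the kind of throughput recovery the abstract quantifies.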
5
Chip multiprocessors (CMPs) have become the major driver of performance growth
Susceptible to soft errors, wear-out-related permanent faults, ...
2 cores or thread contexts execute a single program in the CMP
Throughput loss: the throughput of the CMP decreases to half
System cost: cooling, energy, and maintenance costs
What’s the problem
6
Related work
AR-SMT [22], SRT [21]: use SMT to detect transient faults; the leading thread stores results in a delay buffer, and the trailing thread re-executes and compares results
CRT [18]
SRTR [29]: adds recovery to SRT
CRTR [13]: adds recovery to CRT
Razor [11]: replicates critical pipeline registers and compares them to detect errors
Power-efficient redundant execution [26]: dynamic frequency and voltage scaling to reduce power
This paper: Multiplexed redundant execution, a technique for efficient fault tolerance in chip multiprocessors
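The delay-buffer mechanism shared by the AR-SMT/SRT family can be sketched in a few lines: the leading thread pushes each committed result into a FIFO, and the trailing thread re-executes the same instructions and checks against it. The function name and fault-reporting format below are invented for illustration:

```python
from collections import deque

# Sketch of delay-buffer fault detection (AR-SMT/SRT style): results
# committed by the leading thread wait in a FIFO until the trailing
# thread re-produces them; any mismatch indicates a transient fault.

def detect_faults(leading_results, trailing_results):
    delay_buffer = deque()
    faults = []
    for r in leading_results:                 # leading thread commits first
        delay_buffer.append(r)
    for i, r in enumerate(trailing_results):  # trailing thread re-executes
        expected = delay_buffer.popleft()
        if r != expected:
            faults.append(i)                  # mismatch at instruction i
    return faults

print(detect_faults([1, 2, 3], [1, 2, 3]))   # [] -> no fault detected
print(detect_faults([1, 2, 3], [1, 9, 3]))   # [1] -> fault at instruction 1
```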
7
Input replication
Issue input values to both threads
Optimize the trailing thread:
Load Value Queue (LVQ) for load data
Branch Outcome Queue (BOQ) for fetched instructions
Output comparator
Verify the results of both threads before they are forwarded to the rest of the system
Store queue prevents store data from leaving the core until it has been verified
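The Load Value Queue above can be sketched as a small FIFO between the two threads: the leader performs the real memory access and enqueues the value, and the trailer consumes it instead of accessing memory, so both see identical inputs. The function names and addresses below are illustrative, not from the paper:

```python
from collections import deque

# Sketch of a Load Value Queue (LVQ) for input replication: the
# leading thread does the real loads and forwards (address, value)
# pairs; the trailing thread reads the queue instead of memory.

memory = {0x10: 7, 0x14: 9}
lvq = deque()

def leading_load(addr):
    value = memory[addr]        # real memory access
    lvq.append((addr, value))   # replicate the input for the trailer
    return value

def trailing_load(addr):
    qaddr, value = lvq.popleft()
    assert qaddr == addr        # both threads must issue the same access
    return value                # no second memory access needed

a = leading_load(0x10) + leading_load(0x14)    # leading thread
b = trailing_load(0x10) + trailing_load(0x14)  # trailing thread
assert a == b
print(a)                                       # 16
```

The BOQ works analogously for branch outcomes, which is also what lets the trailing thread skip branch prediction entirely.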