Presented by Chen, Ting-Wei
Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids
Maria Chtepen, Filip H.A. Claeys, Bart Dhoedt, Member, IEEE, Filip De Turck, Member, IEEE, Piet Demeester, Senior Member, IEEE, and Peter A. Vanrolleghem
2
Table of Contents
• Introduction
• Adaptive Checkpointing Heuristics
• Replication-Based Heuristics
• Conclusion and Future Work
3
Introduction
• A novel fault-tolerant algorithm that combines
– Checkpointing
– Replication
• Evaluated in
– A newly developed grid simulation environment, Dynamic Scheduling in Distributed Environments (DSiDE)
4
Introduction (cont.)
• Simulation
– Runs employ workload and system parameters
• Taken from the logs of several large-scale parallel production systems
– Using the discrete-event grid simulator DSiDE
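To illustrate what a discrete-event simulation of checkpointing looks like, here is a minimal sketch in Python. It is not DSiDE and does not reproduce the authors' heuristics. The function name `simulate` and the model are assumptions made only for illustration: a single task runs on a single resource, checkpoints are free and taken at fixed intervals of completed work, recovery is instantaneous, and a failure rolls the task back to its last checkpoint.

```python
import heapq


def simulate(task_length, checkpoint_interval, failure_times):
    """Return the completion time of one task on one resource.

    Hypothetical toy model (not DSiDE): checkpoints cost nothing, recovery
    is instantaneous, and a failure rolls back to the last checkpoint.
    Pass checkpoint_interval=None to disable checkpointing.
    """
    # Event queue ordered by time: (time, kind, attempt_id).
    events = [(t, "failure", 0) for t in failure_times]
    heapq.heapify(events)

    saved = 0.0    # work preserved by the last checkpoint
    started = 0.0  # time at which the current attempt started
    attempt = 0    # identifies which "finish" event is still valid
    heapq.heappush(events, (task_length, "finish", attempt))

    while events:
        now, kind, tag = heapq.heappop(events)
        if kind == "finish":
            if tag == attempt:
                return now
            continue  # stale finish event from an attempt that failed

        # Failure: keep only the work covered by completed checkpoints.
        progress = saved + (now - started)
        if checkpoint_interval:
            saved = (progress // checkpoint_interval) * checkpoint_interval
        else:
            saved = 0.0
        started = now
        attempt += 1
        heapq.heappush(events, (now + task_length - saved, "finish", attempt))

    raise AssertionError("unreachable: a valid finish event is always queued")


if __name__ == "__main__":
    # One failure at t=5 for a task of length 10.
    print(simulate(10, 2, [5]))     # with checkpoints every 2 units of work
    print(simulate(10, None, [5]))  # without checkpointing
```

In this toy model, a failure costs only the work done since the last checkpoint, while without checkpointing the task restarts from scratch. A real grid simulator would also account for checkpoint overhead, recovery time, and multiple resources.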
5
Introduction (cont.)
• Comparable throughput and fault tolerance
– Static checkpointing with optimal