Dept. of Computer Science & Engineering, CUHK
Performance and Effectiveness Analysis of Checkpointing in Mobile Environments
Chen Xinyu
2003-01-22
Outline
Introduction
Mobile Environment – Wireless CORBA
Performance and Effectiveness Analysis of Checkpointing
Conclusions and Future Work
Introduction
Mobile Computing
  Permanent failures: physical damage
  Transient failures: mobile hosts, wireless links, environmental conditions
Checkpointing and Rollback Recovery
Checkpoint: the program’s states saved during failure-free execution
Repair: brings the failed device back to normal operation
Rollback: reloads the program’s states saved at the most recent checkpoint
Recovery: reprocesses the program from the most recent checkpoint, applying the logged messages, up to the point just before the failure
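The four terms above can be illustrated with a minimal sketch. It assumes a toy program whose state is a running total of received message values, and an in-memory stand-in for stable storage; the class and method names are hypothetical and are not taken from the talk.

```python
import copy


class CheckpointedProgram:
    """Toy program: its state is a running total of received message values."""

    def __init__(self):
        self.state = {"total": 0, "received": 0}
        # Hypothetical stand-in for stable storage that survives a failure.
        self._checkpoint = copy.deepcopy(self.state)
        # Messages received since the most recent checkpoint.
        self._log = []

    def _apply(self, value):
        self.state["total"] += value
        self.state["received"] += 1

    def receive(self, value):
        """Log the message, then process it."""
        self._log.append(value)
        self._apply(value)

    def checkpoint(self):
        """Save the current state; the log can then be discarded."""
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def fail(self):
        """A failure wipes the volatile state."""
        self.state = None

    def rollback_and_recover(self):
        """Rollback: reload the saved state. Recovery: replay logged messages."""
        self.state = copy.deepcopy(self._checkpoint)
        for value in self._log:
            self._apply(value)
```

In this sketch, receiving 1 and 2, taking a checkpoint, receiving 3, failing, and then calling `rollback_and_recover()` restores the state the program held just before the failure.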
Wireless CORBA Architecture
[Figure: Wireless CORBA architecture. Labeled elements: Visited Domain with Access Bridges ab1 and ab2 and Static Hosts; Home Domain with Access Bridges, Static Hosts and the Home Location Agent; Terminal Domain with the Terminal Bridge on mobile host mh1; GIOP Tunnel between the Terminal Bridge and an Access Bridge, carrying GTP messages.]
Program’s Termination Condition
A program is successfully terminated if it receives N computational messages continuously
Assumptions
Failure occurrence, message arrival and handoff events each follow a homogeneous Poisson process with its own rate parameter
Failures do not occur when the program is in the repair or rollback process
A failure is detected as soon as it occurs
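The Poisson assumption can be sketched in code as follows. The sketch additionally assumes the three processes are independent, so that their superposition is again a Poisson process whose rate is the sum of the three rates. The function name and the parameter names `fail_rate`, `msg_rate` and `handoff_rate` are hypothetical.

```python
import random


def next_event(rng, fail_rate, msg_rate, handoff_rate):
    """Return (waiting_time, kind) for the next event among three
    independent Poisson processes (independence is an assumption here)."""
    total_rate = fail_rate + msg_rate + handoff_rate
    # Time to the next event of the merged process is exponential(total_rate).
    wait = rng.expovariate(total_rate)
    # The event belongs to each process with probability rate / total_rate.
    u = rng.random() * total_rate
    if u < fail_rate:
        kind = "failure"
    elif u < fail_rate + msg_rate:
        kind = "message"
    else:
        kind = "handoff"
    return wait, kind
```

A call such as `next_event(random.Random(0), 0.1, 1.0, 0.2)` draws one event; repeated calls generate a trace of failures, message arrivals and handoffs.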
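Execution without Checkpointing
[Figure: execution timeline without checkpointing, starting at time 0 along the t axis. Labels: failures F1 ... Fj, handoffs H1 ... Hk, message arrivals m0(N), m1(n1), mj(1) ... mj(N), intervals X0, Y0, Z0, and X(N). Legend: Repair (R), Handoff (H).]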
Conditional Execution Time without Checkpointing
LST (Laplace-Stieltjes Transform) without Checkpointing
LST and Expectation of Program Execution Time
Bounded Situations
Without handoff
Without handoff and failure
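As a worked illustration of the simplest bounded situation (no handoff and no failure), assume messages arrive as a Poisson process with rate $\lambda$; the symbol $\lambda$ is chosen here for illustration and is not taken from the slides. The execution time $X(N)$ is then the sum of $N$ independent exponential inter-arrival times, which gives the standard Erlang result:

```latex
\[
  \phi_{X(N)}(s) \;=\; E\!\left[e^{-s X(N)}\right]
  \;=\; \left(\frac{\lambda}{s+\lambda}\right)^{N},
  \qquad
  E[X(N)] \;=\; -\left.\frac{d\,\phi_{X(N)}(s)}{ds}\right|_{s=0}
  \;=\; \frac{N}{\lambda}.
\]
```

Under this assumption, $N/\lambda$ is the smallest expected execution time any of the other cases can achieve, since failures and handoffs only add time.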
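Execution with Equi-number Checkpointing
[Figure: execution timeline for the i-th checkpointing interval, between checkpoints Ci-1 and Ci, starting at time 0 along the t axis. Labels: failures Fi(1) ... Fi(j), handoffs Hi(1) ... Hi(k), message arrivals mi0(a), mi1(ni1), mij(1) ... mij(a), intervals Xi(0), Yi(0), Zi(0), Xi(N,a), and durations R+C, H, C. Legend: Repair + Rollback, Handoff, Checkpointing.]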
Conditional Execution Time & LST with Checkpointing
LST and Expectation of Program Execution Time
Average Effectiveness
Effective interval: an interval during which the program produces useful work towards its completion
Wasted interval: repair and rollback, handoff, checkpoint creation, and wasted computation
Average effectiveness: the fraction of the execution time that an MH spends in effective intervals
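The definitions above can be turned into a small Monte Carlo sketch. It is not the analytical model of the talk: it assumes independent Poisson events, constant repair, handoff and checkpoint durations during which no events occur, no checkpoint after the final interval, and that all time since the last checkpoint is wasted computation when a failure strikes. All names and parameters are hypothetical.

```python
import random


def simulate_effectiveness(n_messages, per_interval, msg_rate, fail_rate,
                           handoff_rate, repair_time, handoff_time,
                           checkpoint_time, rng):
    """Simulate one run; return (total_time, effective_time)."""
    total_rate = msg_rate + fail_rate + handoff_rate
    clock = 0.0
    saved = 0            # messages covered by the latest checkpoint
    pending = 0          # messages received since that checkpoint
    pending_time = 0.0   # time since that checkpoint, useful unless a failure hits
    effective = 0.0
    while saved + pending < n_messages:
        wait = rng.expovariate(total_rate)
        clock += wait
        pending_time += wait
        u = rng.random() * total_rate
        if u < msg_rate:
            pending += 1
            # Checkpoint after every `per_interval` messages, except at the end.
            if pending == per_interval and saved + pending < n_messages:
                clock += checkpoint_time          # wasted: checkpoint creation
                saved += pending
                effective += pending_time
                pending, pending_time = 0, 0.0
        elif u < msg_rate + fail_rate:
            clock += repair_time                  # wasted: repair and rollback
            pending, pending_time = 0, 0.0        # wasted computation
        else:
            clock += handoff_time                 # wasted: handoff
    effective += pending_time
    return clock, effective
```

Dividing the returned `effective_time` by `total_time` and averaging over many runs gives an empirical estimate of the average effectiveness under these assumptions. Setting `per_interval` equal to `n_messages` corresponds to running without checkpointing.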
Optimal Checkpointing Interval
Beneficial Condition
Equi-number Checkpointing
Equi-number checkpointing with respect to message number: the number of messages in each checkpointing interval is fixed
Equi-number checkpointing with respect to checkpoint number: the number of checkpoints is fixed
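The difference between the two variants shows up when the total message count N changes. The sketch below is an illustration only; it assumes each checkpointing interval ends with one checkpoint and rounds up when N does not divide evenly. The function names are hypothetical.

```python
import math


def plan_fixed_message_number(n_messages, per_interval):
    """Keep the messages per interval fixed; the interval count grows with N.
    Returns (messages_per_interval, number_of_intervals)."""
    return per_interval, math.ceil(n_messages / per_interval)


def plan_fixed_checkpoint_number(n_messages, n_intervals):
    """Keep the interval count fixed; the messages per interval grow with N.
    Returns (messages_per_interval, number_of_intervals)."""
    return math.ceil(n_messages / n_intervals), n_intervals
```

For example, doubling N from 100 to 200 with 10 messages per interval doubles the number of intervals from 10 to 20 in the first variant, whereas holding 10 intervals fixed doubles the interval length from 10 to 20 messages in the second.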
Equi-number Checkpointing with respect to Checkpoint Number
Equi-number Checkpointing with respect to Message Number
Comparison Between Checkpointing and Without Checkpointing
Average Effectiveness vs. Message Arrival Rate and Handoff Rate
Conclusions
Introduce an equi-number checkpointing strategy
Derive LST and expectation of program execution time
Derive average effectiveness
Derive optimal checkpointing interval
Identify the beneficial condition
Future Work
Analytical model
  Message queuing effect during repair and recovery
  General event distributions
Fault tolerance in ad hoc networks
  Without infrastructure support
  Self-organizing and adaptive
Thank You