Rare-Event Simulation of Regenerative Systems: Estimation of the Mean and Distribution of Hitting Times Bruno Tuffin Based on joint works with P. L’Ecuyer, P. Glynn and M. Nakayama The 12th International Conference on Monte Carlo Methods and Applications July 8-12, 2019 Sydney, Australia B. Tuffin (Inria) Hitting times MCM 2019 1 / 44
Outline
1 A short tutorial on rare-event simulation for reg. systems
2 IS application: simulation of highly reliable Markovian systems
3 Mean Time To Failure (MTTF) estimation by simulation: direct or regenerative estimator?
   Crude estimations
   Comparison of crude estimators
   Importance Sampling estimators
4 Quantiles and tail-distribution measures
   Definitions
   Exponential approximation and associated estimators
   Numerical examples
Introduction: rare events and dependability
In telecommunication networks: loss probability of a small unit of information (a packet, or a cell in ATM networks), connectivity of a set of nodes,
in dependability analysis: probability that a system is failed at a given time, availability, mean-time-to-failure,
in air control systems: probability of collision of two aircraft,
in particle transport: probability of penetration of a nuclear shield,
in biology: probability of some molecular reactions,
in insurance: probability of ruin of a company,
in finance: value at risk (maximal loss with a given probability in a predefined time),
...
Context: Time To Failure (TTF) estimation
Dependability analysis is of primary importance in many areas
- nuclear power plants
- telecommunications
- manufacturing
- transport systems
- computer science
Focus on the time to failure (TTF): random time to reach failure
Even for Markov chains, models are usually too large for exact solution ⇒ computation by simulation
Example: Highly Reliable Markovian Systems (HRMS)
System with c types of components. $X = (X_1, \dots, X_c)$ with $X_i$ the number of up components of type i.
Markov chain. Failure rates are O(ε), but not repair rates. Failure propagations possible.
System down when in grey state(s)
Goal:
- compute p, the probability, starting from (2, 2), of hitting failure before returning to (2, 2): small if ε is small.
- compute the TTF: long time if ε is small.
S-valued regenerative process $X = (X(t) : t \ge 0)$
Goal: compute $\alpha = E[T]$, where
$$T = \inf\{t \ge 0 : X(t) \in A\}$$
is the hitting time of subset A
Regeneration times $0 = \Gamma(0) < \Gamma(1) < \cdots$, with iid cycles
$$\big((\tau(k), (X(\Gamma(k-1) + s) : 0 \le s < \tau(k))) : k \ge 1\big),$$
where $\tau(k) = \Gamma(k) - \Gamma(k-1)$ is the length of the kth regenerative cycle
Highly Reliable Markovian Systems (HRMS)
System with c types of components. $X = (X_1, \dots, X_c)$ with $X_i$ the number of up components of type i.
1: state with all components up.
Failure rates are O(ε), but not repair rates. Failure propagations possible.
System down (in A) when some combinations of components are down.
Goal: compute $\mu(\mathbf{1}) \equiv p(\mathbf{1})$, with $p(y)$ the probability of hitting A before 1 when starting from y (denominator of the ratio estimator of the MTTF)
Simulation using the embedded DTMC. Failure probabilities are O(ε) (except from 1). How to improve (accelerate) this?
Existing method: ∀ y ≠ 1, increase the probability of the set of failures to a constant 0.5 < q < 0.9 and use individual probabilities proportional to the original ones (simple failure biasing, SFB), or uniform (balanced failure biasing, BFB).
Failures are not rare anymore. Bounded relative error (BRE) property verified for BFB.
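The failure-biasing rule described above can be sketched for a single state as follows. This is a minimal illustration, not the authors' implementation: the function name `failure_biasing`, the dictionary representation of transitions, and the choice to rescale repair probabilities proportionally to share the remaining mass 1 − q are assumptions of this sketch.

```python
def failure_biasing(fail_probs, repair_probs, q=0.8, balanced=True):
    """Return importance-sampling transition probabilities out of one state.

    fail_probs / repair_probs: dicts mapping next state -> original probability.
    q: total probability given to the set of failure transitions under IS.
    balanced=True gives BFB (uniform over failures); False gives SFB
    (proportional to the original failure probabilities).
    """
    total_fail = sum(fail_probs.values())
    total_repair = sum(repair_probs.values())
    # Assumption: nothing is changed when only one kind of transition exists
    # (for instance from the state with all components up).
    if total_fail == 0.0 or total_repair == 0.0:
        return {**fail_probs, **repair_probs}
    new_probs = {}
    for y, p in fail_probs.items():
        new_probs[y] = q / len(fail_probs) if balanced else q * p / total_fail
    for y, p in repair_probs.items():
        # Assumption: repairs share the remaining mass 1 - q proportionally.
        new_probs[y] = (1.0 - q) * p / total_repair
    return new_probs


# Hypothetical state with two failure transitions of different orders in eps.
fails = {"f1": 1e-3, "f2": 1e-6}
repairs = {"r": 1.0 - 1e-3 - 1e-6}
bfb = failure_biasing(fails, repairs, q=0.8, balanced=True)
sfb = failure_biasing(fails, repairs, q=0.8, balanced=False)
```

Under BFB both failure transitions receive the same probability, so a path made of several O(ε²) failures is no longer rare under the sampling measure, whereas SFB keeps their original relative weights.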
HRMS Example, and IS
Figure: Original probabilities
Figure: Probabilities under IS/BFB
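Assume $E[\tau^3] < \infty$. If $t_b p_b \to \infty$ as $b \to \infty$, then we have that as $b \to \infty$,
$$\sqrt{t_b p_b}\left(\frac{\alpha_{i,b}(t_b)}{E[T_b]} - 1\right) \Rightarrow \sqrt{E[\tau]}\, N(0,1), \quad i = 1, 2,$$
and
$$\sqrt{t_b p_b}\left(\frac{\alpha_{1,b}(t_b)}{E[T_b]} - \frac{\alpha_{2,b}(t_b)}{E[T_b]}\right) \Rightarrow 0.$$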
Numerical results for HRMS
System with 3 component types, with $n_i = 3$, failure rates ε, repair rates 1, and system is down whenever fewer than two components of any one type are operational.
Direct:
m | ε | Confidence Interval | Variance | CPU | Work Norm. Var.
Direct estimator: bounded relative variance, but computational time issue
Regenerative estimator: rather a rare event issue.
Efficient Regenerative IS estimators extensively studied.
Question:
What about the direct estimator?Can its combination with IS yield an efficient estimator?
We will play with the toy example:
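CTMC on states 0, 1, 2 with transition rates 0 → 1: 2ε, 1 → 2: ε, 1 → 0: 1,
with embedded DTMC 0 → 1: 1, 1 → 2: ε/(1 + ε), 1 → 0: 1/(1 + ε).
$$E_\varepsilon(T_\varepsilon) = \sum_{n=0}^{\infty} (n+1)\left(\frac{1}{2\varepsilon} + \frac{1}{1+\varepsilon}\right)\left(\frac{1}{1+\varepsilon}\right)^{n} \frac{\varepsilon}{1+\varepsilon} = \frac{1+3\varepsilon}{2\varepsilon^{2}}$$
$$E_\varepsilon[(T_\varepsilon)^2] = \sum_{n=0}^{\infty} (n+1)^2\left(\frac{1}{2\varepsilon} + \frac{1}{1+\varepsilon}\right)^{2}\left(\frac{1}{1+\varepsilon}\right)^{n} \frac{\varepsilon}{1+\varepsilon} = \frac{(2+\varepsilon)(1+3\varepsilon)^2}{4(1+\varepsilon)\varepsilon^{4}}$$
$$E_\varepsilon(N) = \sum_{n=0}^{\infty} (2+2n)\left(\frac{1}{1+\varepsilon}\right)^{n} \frac{\varepsilon}{1+\varepsilon} = \frac{2(1+\varepsilon)}{\varepsilon}$$
with N: # transitions in a run.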
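The three closed forms for the toy example can be checked numerically by truncating the series. The sketch below is only a sanity check; the function names and the truncation length `terms` are choices of this illustration, and, as in the sums themselves, each visit to states 0 and 1 is charged its mean sojourn time.

```python
def toy_series(eps, terms=5000):
    """Truncated versions of the three series for the toy chain."""
    cycle = 1.0 / (2.0 * eps) + 1.0 / (1.0 + eps)  # mean time of one 0 -> 1 -> . excursion
    stay = 1.0 / (1.0 + eps)   # embedded P(1 -> 0)
    fail = eps / (1.0 + eps)   # embedded P(1 -> 2)
    mean_t = second_t = mean_n = 0.0
    prob = fail                # P(exactly n returns to 0 before failure), here n = 0
    for n in range(terms):
        mean_t += (n + 1) * cycle * prob
        second_t += ((n + 1) * cycle) ** 2 * prob
        mean_n += (2 + 2 * n) * prob
        prob *= stay
    return mean_t, second_t, mean_n


def toy_closed_forms(eps):
    """Closed forms for E(T), E(T^2) and E(N) of the toy chain."""
    return (
        (1 + 3 * eps) / (2 * eps**2),
        (2 + eps) * (1 + 3 * eps) ** 2 / (4 * (1 + eps) * eps**4),
        2 * (1 + eps) / eps,
    )


series = toy_series(0.1)
closed = toy_closed_forms(0.1)
```

The ratio $E_\varepsilon[(T_\varepsilon)^2]/E_\varepsilon(T_\varepsilon)^2 = (2+\varepsilon)/(1+\varepsilon)$ stays bounded as ε → 0, while $E_\varepsilon(N)$ grows like 2/ε.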
Failure biasing
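Change the probability of making a failure transition to be ρ, independent of ε
Embedded DTMC under IS: 0 → 1: 1, 1 → 2: ρ, 1 → 0: 1 − ρ.
With L the likelihood ratio and $\tilde E_\varepsilon$ the expectation under IS,
$$\tilde E_\varepsilon[(T_\varepsilon L)^2] = E_\varepsilon[(T_\varepsilon)^2 L] = \sum_{n=0}^{\infty} (n+1)^2\left(\frac{1}{2\varepsilon} + \frac{1}{1+\varepsilon}\right)^{2} \frac{\left(\left(\frac{1}{1+\varepsilon}\right)^{n} \frac{\varepsilon}{1+\varepsilon}\right)^{2}}{(1-\rho)^{n}\rho}$$
Converging sum iff $1/((1+\varepsilon)^2(1-\rho)) < 1$, i.e., ρ small enough:
$$\rho < 1 - \frac{1}{(1+\varepsilon)^2} = 2\varepsilon - 3\varepsilon^2 + o(\varepsilon^2).$$
But
$$\tilde E_\varepsilon(N) = \sum_{n=0}^{\infty} (2+2n)(1-\rho)^{n}\rho = \frac{2}{\rho}.$$
The average simulation time for a single run will increase to infinity as ε → 0!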
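The trade-off on this slide can be made concrete with a few lines of code. The sketch below evaluates the second-moment series in closed form, using the standard identity $\sum_{n \ge 0} (n+1)^2 r^n = (1+r)/(1-r)^3$ for $|r| < 1$ with $r = 1/((1+\varepsilon)^2(1-\rho))$; the function names are hypothetical and ρ is assumed to lie in (0, 1).

```python
import math


def fb_second_moment(eps, rho):
    """Second moment of T*L under failure biasing with failure probability rho."""
    cycle = 1.0 / (2.0 * eps) + 1.0 / (1.0 + eps)
    stay = 1.0 / (1.0 + eps)   # original P(1 -> 0)
    fail = eps / (1.0 + eps)   # original P(1 -> 2)
    ratio = stay**2 / (1.0 - rho)
    if ratio >= 1.0:
        return math.inf        # the series diverges
    return cycle**2 * fail**2 / rho * (1.0 + ratio) / (1.0 - ratio) ** 3


def fb_rho_bound(eps):
    """Largest rho (exclusive) for which the second moment is finite."""
    return 1.0 - 1.0 / (1.0 + eps) ** 2


def fb_mean_transitions(rho):
    """Expected number of transitions per run under IS."""
    return 2.0 / rho


eps = 0.1
for rho in (0.5, 0.15, 0.05):
    print(rho, fb_second_moment(eps, rho), fb_mean_transitions(rho))
```

A value of ρ that does not depend on ε, such as 0.5, eventually violates the bound as ε shrinks, while any ρ below the bound forces at least about 1/ε transitions per run.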
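Zero-variance approximation
For a CTMC with transition matrix $(P_{x,y})_{x,y \in S}$, if $E_{\varepsilon,x}$ denotes the expectation starting from x and $\lambda(x)$ the total rate out of x, the IS transition matrix
$$\tilde P_{x,y} = P_{x,y}\, \frac{1/\lambda(x) + E_{\varepsilon,y}(T_\varepsilon)}{E_{\varepsilon,x}(T_\varepsilon)}$$
yields an estimator with variance zero.
On our toy example, the only probability we can change is from 1:
embedded DTMC under IS: 0 → 1: 1, 1 → 2: ρ, 1 → 0: 1 − ρ.
$$\rho = \frac{\varepsilon}{1+\varepsilon} \cdot \frac{\frac{1}{1+\varepsilon} + 0}{\frac{1+2\varepsilon}{2\varepsilon^2}} = \frac{2\varepsilon^3}{(1+\varepsilon)^2(1+2\varepsilon)}$$
yields variance 0.
But the estimation takes on average longer time, $2/\rho = \Theta(\varepsilon^{-3})$, as ε gets closer to zero.
An approximation of the zero-variance IS can be inefficient, producing an unbounded work-normalized relative variance.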
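The zero-variance property on the toy chain can be observed directly by simulation. The sketch below makes two assumptions that the slide does not spell out: the estimator is written in the usual additive-cost form, where the mean sojourn time charged at each step is multiplied by the likelihood ratio accumulated up to and including that step, and the function name and seed are arbitrary.

```python
import random


def zero_variance_run(eps, rng):
    """One IS run of the toy chain started in state 0.

    Returns (estimate of E[T], number of transitions in the run).
    """
    p_fail = eps / (1.0 + eps)   # original P(1 -> 2)
    p_rep = 1.0 / (1.0 + eps)    # original P(1 -> 0)
    rho = 2.0 * eps**3 / ((1.0 + eps) ** 2 * (1.0 + 2.0 * eps))  # IS P(1 -> 2)
    cost0 = 1.0 / (2.0 * eps)    # mean sojourn time in state 0
    cost1 = 1.0 / (1.0 + eps)    # mean sojourn time in state 1
    estimate, weight, steps = 0.0, 1.0, 0
    while True:
        # Transition 0 -> 1 has probability 1 under both measures.
        estimate += cost0 * weight
        steps += 1
        # Transition out of state 1, sampled under IS.
        steps += 1
        if rng.random() < rho:
            weight *= p_fail / rho
            estimate += cost1 * weight
            return estimate, steps
        weight *= p_rep / (1.0 - rho)
        estimate += cost1 * weight


rng = random.Random(2019)
eps = 0.2
runs = [zero_variance_run(eps, rng) for _ in range(20)]
exact = (1 + 3 * eps) / (2 * eps**2)
```

Every run returns the same value, equal to $(1+3\varepsilon)/(2\varepsilon^2)$ up to rounding, but the number of transitions per run averages 2/ρ, which is the cost highlighted on the slide.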
Discussion on the impact of the approximation
For $\rho = \frac{2\varepsilon^3}{(1+\varepsilon)^2(1+2\varepsilon)}$, we retrieve zero variance.
For $\rho = \varepsilon^3$ (approximation of good asymptotic order), the variance is $\Theta(\varepsilon^{-2})$, but the work-normalized relative variance is unbounded due to the computational time.
For $\rho = 2\varepsilon^3$ (exact first-order term), the variance is $\Theta(1)$, which is better but still not sufficient to yield a bounded work-normalized variance.
Much better than an exact first-order approximation is required. Hard to obtain in practice.
Conclusions on MTTF estimation
We have compared two standard estimators of the MTTF for regenerative processes:
a direct one, expressed as the average of simulated times to failure
one making use of the regenerative structure
1 Crude direct and ratio-based estimators are asymptotically equivalent (in two asymptotic contexts)
2 When IS is used, the regenerative expression is the recommended one.
Basic idea
Let F be the cumulative distribution function of T
Goal: for fixed 0 < q < 1, estimate the q-quantile
$$\xi = F^{-1}(q) \equiv \inf\{t : F(t) \ge q\}$$
and the conditional tail expectation (CTE)
$$\gamma = E[T \mid T > \xi].$$
Assumption: X is (classically) regenerative, with $0 = \Gamma_0 < \Gamma_1 < \Gamma_2 < \cdots$ the sequence of regeneration times
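For reference, the two quantities defined above have straightforward crude estimators from i.i.d. samples of T. The sketch below is a generic empirical version, not one of the estimators studied in the talk; the function name is hypothetical.

```python
import math


def empirical_quantile_cte(samples, q):
    """Empirical q-quantile and conditional tail expectation of a sample."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(n * q))        # smallest k with k/n >= q
    xi = ordered[k - 1]                 # inf{t : F_n(t) >= q}
    tail = [t for t in ordered if t > xi]
    cte = sum(tail) / len(tail) if tail else float("nan")
    return xi, cte


xi_hat, cte_hat = empirical_quantile_cte([3, 9, 1, 7, 5, 10, 2, 8, 4, 6], 0.5)
```

When hitting A is rare, each sample of T is expensive to simulate, which is what motivates the regenerative decomposition and the exponential approximation that follow.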
Decomposition
Using $\tau_i = \Gamma_i - \Gamma_{i-1}$ and M the number of first cycles not reaching A,
$$T = \sum_{i=1}^{M} \tau_i + T_{M+1}$$
with $T_i = \inf\{t \ge 0 : X(\Gamma_{i-1} + t) \in A\}$ the time to the next hit to A after $\Gamma_{i-1}$.
M is a geometric r.v. with $P(M = k) = p(1-p)^k$, where
$$p = P(T < \tau).$$
Recall that the regenerative structure of X allows us to express
$$\alpha = E[T] = \frac{E[T \wedge \tau]}{p} \equiv \frac{\zeta}{p}.$$
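The ratio expression α = ζ/p suggests estimating numerator and denominator from simulated cycles. The sketch below does this with crude simulation (no importance sampling) on the three-state toy chain used earlier in the talk, with regenerations at visits to state 0; the function name, the seed, and the moderately large ε are choices of this illustration.

```python
import random


def regenerative_mttf(eps, n_cycles, rng):
    """Crude regenerative ratio estimator of E[T] on the toy chain 0, 1, 2."""
    p_fail = eps / (1.0 + eps)   # embedded P(1 -> 2), i.e. P(T < tau)
    total_min = 0.0              # running sum of min(T, tau) over cycles
    hits = 0                     # number of cycles that reach A = {2}
    for _ in range(n_cycles):
        # A cycle starts in 0, moves to 1, then either fails or returns to 0.
        # In both cases min(T, tau) is the time spent in 0 plus the time in 1.
        total_min += rng.expovariate(2.0 * eps) + rng.expovariate(1.0 + eps)
        if rng.random() < p_fail:
            hits += 1
    zeta_hat = total_min / n_cycles
    p_hat = hits / n_cycles      # zero if no cycle hits A: the rare-event issue
    return zeta_hat / p_hat


rng = random.Random(12345)
alpha_hat = regenerative_mttf(0.5, 100_000, rng)
```

For ε = 0.5 the exact value is $(1+3\varepsilon)/(2\varepsilon^2) = 5$. For small ε almost no cycle hits A, so `p_hat` becomes unreliable, which is where importance sampling on the denominator comes in.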
Asymptotic regimes/exponential approximation
Introduction of a rarity parameter ε
Assumption: p ≡ pε → 0 as ε→ 0.
- Ex HRMS: Probability of reaching a failed state before coming back to the initial (perfectly working) state goes to 0 with failure rates
- Ex GI/G/1 queue: considering a receding set of states (number of customers) $A \equiv A_\varepsilon = \{b_\varepsilon, b_\varepsilon + 1, b_\varepsilon + 2, \dots\}$.
Theorem (Known result)
The scaled hitting time $T_\varepsilon/\alpha_\varepsilon$ converges weakly to an exponential: for each $x \ge 0$,
$$P_\varepsilon(T_\varepsilon/\alpha_\varepsilon \le x) \to 1 - e^{-x} \quad \text{as } \varepsilon \to 0.$$
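If $T_\varepsilon$ is treated as exactly exponential with mean $\alpha_\varepsilon$, the quantile and CTE follow in closed form: solving $1 - e^{-\xi/\alpha} = q$ gives $\xi = -\alpha \ln(1-q)$, and memorylessness gives $\gamma = \xi + \alpha$. The sketch below plugs an estimate of α into these formulas; it illustrates what the theorem implies and is not necessarily the exact form of the estimators presented in the talk. The function name and the input value are hypothetical.

```python
import math


def exponential_approximation(alpha_hat, q):
    """Quantile and CTE of an exponential law with mean alpha_hat."""
    xi = -alpha_hat * math.log(1.0 - q)   # solves 1 - exp(-xi / alpha) = q
    cte = xi + alpha_hat                  # memoryless property
    return xi, cte


xi_approx, cte_approx = exponential_approximation(alpha_hat=2.0, q=0.5)
```

Here `alpha_hat` could come from any estimator of the mean hitting time, for instance a regenerative ratio estimator combined with importance sampling.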
Quantile and CTE estimators based on the exponential approximation
From
- Very efficient
- But biased.... for small ε, does not seem a problem in practice
- Other less biased estimators studied in our WSC'2018 paper.
References
Mainly based on
- P. L’Ecuyer and B. Tuffin. Approximating Zero-Variance Importance Sampling in a Reliability Setting. Annals of Operations Research, Vol. 189, pp. 277-297, Sept. 2011.
- P.W. Glynn, M.K. Nakayama, and B. Tuffin. On the estimation of the mean time to failure by simulation. In Proceedings of the 2017 Winter Simulation Conference, Las Vegas, NV, USA, Dec. 2017.
- P.W. Glynn, M.K. Nakayama, and B. Tuffin. Using Simulation to Calibrate Exponential Approximations to Tail-Distribution Measures of Hitting Times to Rarely Visited Sets. In Proceedings of the 2018 Winter Simulation Conference, Gothenburg, Sweden, Dec. 2018.
Other selected references on rare events
- G. Rubino and B. Tuffin (eds). Rare Event Simulation using Monte Carlo Methods. John Wiley, 2009.
- P. L’Ecuyer, J. Blanchet, B. Tuffin, and P.W. Glynn. Asymptotic Robustness of Estimators in Rare-Event Simulation. ACM Transactions on Modeling and Computer Simulation, Vol. 20, Num. 1, Article 6, 2010.
- P. L’Ecuyer, V. Demers, and B. Tuffin. Rare Events, Splitting, and Quasi-Monte Carlo. ACM Transactions on Modeling and Computer Simulation, Vol. 17, Num. 2, Article 9, 2007.
- P. L’Ecuyer and B. Tuffin. Approximate Zero-Variance Simulation. In Proceedings of the 2008 Winter Simulation Conference, 2008.