Introduction Choice of distributions to fit Fit of distributions Simulation of uncertainty Conclusion
Fitting parametric distributions using R: the fitdistrplus package
M. L. Delignette-Muller - CNRS UMR 5558
R. Pouillot
J.-B. Denis - INRA MIAJ
useR! 2009, 10/07/2009
Background
Specifying the probability distribution that best fits a data sample among a predefined family of distributions
- a frequent need, especially in Quantitative Risk Assessment
- general-purpose maximum-likelihood fitting routine for the parameter estimation step: fitdistr (MASS) (Venables and Ripley, 2002)
- possibility to implement other steps using R (Ricci, 2005)
- but no specific package dedicated to the whole process
- difficulty to work with censored data
Objective
Build a package that provides functions to help the whole process of specification of a distribution from data
- choose among a family of distributions the best candidates to fit a sample
- estimate the distribution parameters and their uncertainty
- assess and compare the goodness-of-fit of several distributions
that specifically handles different kinds of data
- discrete
- continuous with possible censored values (right-, left- and interval-censored with several upper and lower bounds)
Technical choices
- Skewness-kurtosis graph for the choice of distributions (Cullen and Frey, 1999)
- Two fitting methods
  - matching moments
    for a limited number of distributions and non-censored data
  - maximum likelihood (mle) using optim (stats)
    for any distribution, predefined or defined by the user
    for non-censored or censored data
- Uncertainty on parameter estimations
  - standard errors from the Hessian matrix (only for mle)
  - parametric or non-parametric bootstrap
- Assessment of goodness-of-fit
  - chi-squared, Kolmogorov-Smirnov, Anderson-Darling statistics
  - density, cdf, P-P and Q-Q plots
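The maximum-likelihood item above (optim plus standard errors from the Hessian matrix) can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the simulated sample, the log-scale parametrisation, the starting values and the helper name negloglik are all hypothetical choices.

```r
# Hypothetical stand-in sample; the talk's serving.size data are not reproduced here
set.seed(1)
x <- rgamma(200, shape = 3, rate = 0.05)

# Negative log-likelihood of a gamma distribution.
# Parameters are optimised on the log scale so that shape and rate stay positive.
negloglik <- function(logpar, data) {
  -sum(dgamma(data, shape = exp(logpar[1]), rate = exp(logpar[2]), log = TRUE))
}

# Start from the exponential fit (shape = 1, rate = 1 / mean)
start <- c(0, -log(mean(x)))

# optim() uses Nelder-Mead by default; hessian = TRUE returns the numerical
# Hessian of the negative log-likelihood at the optimum
fit <- optim(start, negloglik, data = x, hessian = TRUE)

# Back-transform the estimates to shape and rate
est <- exp(fit$par)

# Standard errors: invert the Hessian (log scale), then apply the delta method
se_log <- sqrt(diag(solve(fit$hessian)))
se <- est * se_log

result <- data.frame(parameter = c("shape", "rate"),
                     estimate = est, std.error = se)
print(result)
```

Working on the log scale is one simple way to keep the optimiser inside the valid parameter range; other constraints or algorithms could be used instead.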
Main functions of fitdistrplus
- descdist: provides a skewness-kurtosis graph to help choose the best candidate(s) to fit a given dataset
- fitdist and plot.fitdist: for a given distribution, estimate parameters and provide goodness-of-fit graphs and statistics
- bootdist: for a fitted distribution, simulates the uncertainty in the estimated parameters by bootstrap resampling
- fitdistcens, plot.fitdistcens and bootdistcens: same functions dedicated to continuous data with censored values
Skewness-kurtosis plot for continuous data
Ex. on consumption data: food serving sizes (g)
> descdist(serving.size)
[Figure: Cullen and Frey graph. x-axis: square of skewness (0 to 4); y-axis: kurtosis (1 to 10). Legend: observation; theoretical distributions: normal, uniform, exponential, logistic, beta, lognormal, gamma (Weibull is close to gamma and lognormal).]
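The two coordinates of such a graph can be computed by hand. The sketch below uses plain moment estimators on a simulated sample (a hypothetical stand-in for serving.size) and compares them with a few well-known theoretical reference points; descdist may use different (for instance bias-corrected) estimators, so the numbers need not match its output exactly. The helper name skew_kurt is hypothetical.

```r
set.seed(2)
x <- rgamma(500, shape = 3, rate = 0.05)   # hypothetical stand-in sample

# Plain moment estimators (no small-sample correction)
skew_kurt <- function(x) {
  z <- x - mean(x)
  m2 <- mean(z^2); m3 <- mean(z^3); m4 <- mean(z^4)
  c(sq.skewness = (m3 / m2^1.5)^2, kurtosis = m4 / m2^2)
}
obs <- skew_kurt(x)

# Theoretical reference points: (square of skewness, kurtosis)
ref <- rbind(normal      = c(0, 3),
             uniform     = c(0, 1.8),
             exponential = c(4, 9),
             logistic    = c(0, 4.2))
colnames(ref) <- names(obs)

# Gamma distributions lie on the line kurtosis = 3 + 1.5 * (square of skewness);
# a gap close to zero suggests that a gamma is a plausible candidate
gamma_line_gap <- unname(obs["kurtosis"] - (3 + 1.5 * obs["sq.skewness"]))

print(obs)
print(ref)
print(gamma_line_gap)
```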
Skewness-kurtosis plot for continuous data with bootstrap option
> descdist(serving.size, boot=1001)
[Figure: Cullen and Frey graph. x-axis: square of skewness (0 to 4); y-axis: kurtosis (1 to 10). Legend: observation; bootstrapped values; theoretical distributions: normal, uniform, exponential, logistic, beta, lognormal, gamma (Weibull is close to gamma and lognormal).]
Skewness-kurtosis plot for discrete data
Ex. on microbial data: counts of colonies on small food samples
> descdist(colonies.count, discrete=TRUE)
[Figure: Cullen and Frey graph. x-axis: square of skewness (0 to 15); y-axis: kurtosis (1 to 21). Legend: observation; theoretical distributions: normal, negative binomial, Poisson.]
Fit of a given distribution by maximum likelihood or matching moments
Ex. on consumption data: food serving sizes (g)
Maximum likelihood estimation
> fg.mle <- fitdist(serving.size, "gamma", method="mle")
> summary(fg.mle)
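For the matching-moments alternative named in the slide title, closed-form estimates exist for some distributions. The sketch below shows them for the gamma and lognormal cases on a simulated sample; the data and the variable names are hypothetical, and whether the package uses exactly this variance estimator is an assumption.

```r
set.seed(3)
x <- rgamma(300, shape = 3, rate = 0.05)   # hypothetical stand-in sample

m <- mean(x)
v <- mean((x - m)^2)   # empirical variance (divisor n)

# Gamma: mean = shape / rate, variance = shape / rate^2
gamma_mme <- c(shape = m^2 / v, rate = m / v)

# Lognormal: mean = exp(meanlog + sdlog^2 / 2),
#            variance = (exp(sdlog^2) - 1) * mean^2
sdlog <- sqrt(log(1 + v / m^2))
lnorm_mme <- c(meanlog = log(m) - sdlog^2 / 2, sdlog = sdlog)

print(gamma_mme)
print(lnorm_mme)
```

By construction, both fitted distributions reproduce the empirical mean and variance exactly, which is what "matching moments" means here.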
Goodness-of-fit graphs for discrete data
Ex. on microbial data: counts of colonies on small food samples
> fnbinom <- fitdist(colonies.count, "nbinom")
> plot(fnbinom)
[Figure, two panels. Top: "Empirical (black) and theoretical (red) distr.", density against data (0 to 12). Bottom: "Empirical (black) and theoretical (red) CDFs", CDF against data (0 to 12).]
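The quantities compared in these two panels can be tabulated directly. The following sketch fits a negative binomial distribution by maximum likelihood with optim and lists empirical and theoretical probabilities and CDFs; the simulated counts stand in for colonies.count, and the (size, mu) parametrisation and the helper name negloglik_nb are assumptions made for the illustration.

```r
set.seed(7)
counts <- rnbinom(200, size = 2, mu = 3)   # hypothetical stand-in for colony counts

# Negative log-likelihood, parameters on the log scale to keep them positive
negloglik_nb <- function(logpar, data) {
  -sum(dnbinom(data, size = exp(logpar[1]), mu = exp(logpar[2]), log = TRUE))
}

fit_nb <- optim(c(0, log(mean(counts))), negloglik_nb, data = counts)
est_nb <- c(size = exp(fit_nb$par[1]), mu = exp(fit_nb$par[2]))

# Empirical and theoretical probabilities on the observed range of counts
k <- 0:max(counts)
emp_prob  <- as.vector(table(factor(counts, levels = k))) / length(counts)
theo_prob <- dnbinom(k, size = est_nb["size"], mu = est_nb["mu"])

comparison <- data.frame(
  count       = k,
  empirical   = emp_prob,
  theoretical = theo_prob,
  emp.cdf     = cumsum(emp_prob),
  theo.cdf    = pnbinom(k, size = est_nb["size"], mu = est_nb["mu"])
)
print(round(comparison, 3))
```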
Fit of a given distribution by maximum likelihood to censored data
Ex. on microbial censored data: concentrations in food
with left censored values (not detected)
and interval censored values (detected but not counted)
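How censored values enter the likelihood can be sketched in base R: an exact value contributes its density, a left-censored value the CDF at its upper bound, and an interval-censored value the difference of the CDF at its two bounds. The simulated data, the censoring thresholds, the left/right column layout and the helper name negloglik_cens are assumptions made for this illustration.

```r
set.seed(4)
true_values <- rnorm(150, mean = 0.3, sd = 1.2)   # hypothetical concentrations

# Censored data set with two columns:
#   left = NA      -> left-censored (only an upper bound is known)
#   left < right   -> interval-censored
#   left == right  -> observed exactly
cens <- data.frame(left = true_values, right = true_values)
nd <- true_values < -0.5                          # "not detected"
cens$left[nd]  <- NA
cens$right[nd] <- -0.5
ic <- true_values >= -0.5 & true_values < 0.5     # "detected but not counted"
cens$left[ic]  <- -0.5
cens$right[ic] <- 0.5

# Negative log-likelihood of a normal distribution for such data;
# the standard deviation is optimised on the log scale
negloglik_cens <- function(par, d) {
  mu <- par[1]
  s  <- exp(par[2])
  leftc <- is.na(d$left)
  exact <- !leftc & d$left == d$right
  intc  <- !leftc & d$left < d$right
  ll <- sum(dnorm(d$left[exact], mu, s, log = TRUE)) +
    sum(pnorm(d$right[leftc], mu, s, log.p = TRUE)) +
    sum(log(pnorm(d$right[intc], mu, s) - pnorm(d$left[intc], mu, s)))
  -ll
}

fit_cens <- optim(c(0, 0), negloglik_cens, d = cens)
est_cens <- c(mean = fit_cens$par[1], sd = exp(fit_cens$par[2]))
print(est_cens)
```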
Goodness-of-fit graphs for censored data
Ex. on microbial censored data: concentrations in food
> plot(fnorm)
[Figure: "Cumulative distribution plot", CDF (0 to 1) against censored data (−2 to 4).]
Bootstrap resampling
Ex. on microbial censored data
> bnorm <- bootdistcens(fnorm)
> summary(bnorm)
Nonparametric bootstrap medians and 95% CI
      Median    2.5%   97.5%
mean   0.233  -0.455   0.875
sd     1.294   0.908   1.776
> plot(bnorm)
[Figure: "Scatterplot of the bootstrapped values of the two parameters", sd (1.0 to 2.0) against mean (−0.5 to 1.0).]
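The logic of a nonparametric bootstrap summary like the one on this slide can be sketched in a few lines of base R. To keep the example short it uses a non-censored simulated sample and the closed-form normal estimates; the data, the number of iterations and the helper name fit_norm are assumptions, and this is not the package's implementation.

```r
set.seed(5)
x <- rnorm(100, mean = 0.3, sd = 1.2)   # hypothetical non-censored sample

# Maximum-likelihood estimates of a normal distribution (closed form)
fit_norm <- function(d) c(mean = mean(d), sd = sqrt(mean((d - mean(d))^2)))

# Nonparametric bootstrap: resample the data with replacement and refit
niter <- 1001
boot_est <- t(replicate(niter, fit_norm(sample(x, replace = TRUE))))

# Medians and 95% percentile intervals of the bootstrapped estimates
boot_summary <- t(apply(boot_est, 2, quantile, probs = c(0.5, 0.025, 0.975)))
colnames(boot_summary) <- c("Median", "2.5%", "97.5%")
print(round(boot_summary, 3))

# Scatterplot of the bootstrapped parameter pairs
plot(boot_est[, "mean"], boot_est[, "sd"], xlab = "mean", ylab = "sd")
```

A parametric bootstrap would differ only in the resampling step: new samples would be simulated from the fitted distribution instead of being drawn from the data.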
Use of the bootstrap in risk assessment
The bootstrap sample may be used to take into account uncertainty in risk assessment, in two-dimensional Monte Carlo simulations, as proposed in the package mc2d.
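The idea of a two-dimensional Monte Carlo simulation can be illustrated without any package: the outer dimension carries uncertainty (one bootstrapped parameter set per iteration) and the inner dimension carries variability (values simulated from the distribution with those parameters). The sketch below is a base R illustration with hypothetical data and sizes; it does not use or reproduce the mc2d API.

```r
set.seed(6)
x <- rnorm(100, mean = 0.3, sd = 1.2)   # hypothetical sample

n_unc <- 200    # size of the uncertainty dimension
n_var <- 1000   # size of the variability dimension

# Uncertainty: one (mean, sd) pair per bootstrap resample
boot_par <- t(replicate(n_unc, {
  d <- sample(x, replace = TRUE)
  c(mean = mean(d), sd = sd(d))
}))

# Variability: for each parameter pair, simulate values and keep a statistic
# of interest (here the 95th percentile)
p95 <- apply(boot_par, 1, function(p) {
  unname(quantile(rnorm(n_var, mean = p["mean"], sd = p["sd"]), 0.95))
})

# Uncertainty about the 95th percentile: median and 95% interval
ci <- quantile(p95, c(0.025, 0.5, 0.975))
print(round(ci, 3))
```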
Still many things to do
fitdistrplus is still under development. Many improvements are planned:
- other goodness-of-fit statistics
- other graphs for goodness-of-fit for censored data (Turnbull, ...)
- optimized choice of the algorithm used in optim for the likelihood maximization
- graphs of likelihood contours (detection of identifiability problems)
- ...
Do not hesitate to send us other ideas for improvement!