Research Article
A Poisson-Gamma Model for Zero Inflated Rainfall Data
Nelson Christopher Dzupire,1 Philip Ngare,1,2 and Leo Odongo1,3
1Pan African University Institute of Basic Sciences, Technology and Innovation, Juja, Kenya
2University of Nairobi, Nairobi, Kenya
3Kenyatta University, Nairobi, Kenya
Correspondence should be addressed to Nelson Christopher Dzupire; ndzupire@cc.ac.mw
Received 7 November 2017; Accepted 20 February 2018; Published 4 April 2018
Academic Editor: Steve Su
Copyright © 2018 Nelson Christopher Dzupire et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Rainfall modeling is significant for prediction and forecasting purposes in agriculture, weather derivatives, hydrology, and risk and disaster preparedness. Normally two models are used to model the rainfall process as a chain dependent process representing the occurrence and intensity of rainfall. Such two models help in understanding the physical features and dynamics of the rainfall process. However, rainfall data is zero inflated and exhibits overdispersion, which is always underestimated by such models. In this study we have modeled the two processes simultaneously as a compound Poisson process. The rainfall events are modeled as a Poisson process while the intensity of each rainfall event is Gamma distributed. We minimize overdispersion by introducing the dispersion parameter in the model, implemented through Tweedie distributions. Simulated rainfall data from the model shows a resemblance to the actual rainfall data in terms of seasonal variation, means, variance, and magnitude. The model also provides mechanisms for small but important properties of the rainfall process. The model developed can be used in forecasting and predicting rainfall amounts and occurrences, which is important in weather derivatives, agriculture, hydrology, and prediction of drought and flood occurrences.
1. Introduction
Climate variables, in particular rainfall occurrence and intensity, hugely impact the human and physical environment. Knowledge of the frequency of occurrence and the intensity of rainfall events is essential for planning, designing, and management of various water resources systems [1]. Specifically, rain-fed agriculture is a sector sensitive to weather, and crop production is directly dependent on the amount of rainfall and its occurrence. Rainfall modeling has a great impact on crop growth, weather derivatives, hydrological systems, drought and flood management, and crop simulation studies.
Rainfall modeling is also important in pricing of weather derivatives, which are financial instruments used as a tool for risk management to reduce risk associated with adverse or unexpected weather conditions.
Further, as climate change greatly affects the environment, there is an urgent need for predicting the variability of rainfall for future periods under different climate change scenarios in order to provide necessary information for high quality climate related impact studies [1].
However, modeling precipitation poses a lot of challenges. The first is accurate measurement of precipitation, since rainfall data consists of sequences of values which are either zero or some positive number (intensity) depending on the depth of accumulation over discrete intervals. In addition, factors like wind can affect collection accuracy. Second, rainfall is localized, unlike temperature, which is highly correlated across regions; therefore a derivative holder based on rainfall may suffer geographical basis risk in the case of pricing weather derivatives. The final challenge is the choice of a proper probability distribution function to describe precipitation data. The statistical properties of precipitation are far more complex, and a more sophisticated distribution is required [2].
Rainfall has been modeled as a chain dependent process where a two-state Markov chain model represents the occurrence of rainfall and the intensity of rainfall is modeled by fitting a suitable distribution like Gamma [3], exponential, and mixed exponential [1, 4]. These models are easy to understand and interpret and use maximum likelihood to find the parameters. However, the models involve many parameters to fully describe the dynamics of rainfall, as well as making several assumptions for the process.
Hindawi, Journal of Probability and Statistics, Volume 2018, Article ID 1012647, 12 pages, https://doi.org/10.1155/2018/1012647
Wilks [5] proposed a multisite model for daily precipitation using a combination of a two-state Markov process (for the rainfall occurrence) and a mixed exponential distribution (for the precipitation amount). He found that the mixture of exponential distributions offered a much better fit than the commonly used Gamma distribution.
In the study of Leobacher and Ngare [3], the precipitation is modeled on a monthly basis by constructing a suitable Markov-Gamma process to take into account seasonal changes of precipitation. It is assumed that rainfall data for different years of the same month is independent and identically distributed. It is also assumed that precipitation can be forecast with sufficient accuracy for a month.
Another approach to modeling rainfall is based on the Poisson cluster model, where two of the most recognized cluster based models in the stochastic modeling of rainfall are the Neyman-Scott Rectangular Pulses model and the Bartlett-Lewis Rectangular Pulse model. These models represent rainfall sequences in time and rainfall fields in space, where both the occurrence and depth processes are combined. The difficulty in Poisson cluster models, as observed by Onof et al. [6], is the challenge of how many features should be addressed so that the model is still mathematically tractable. In addition, the models are best fitted by the method of moments and so require matching analytic expressions for statistical properties such as the mean and variance.
Carmona and Diko [7] developed a time-homogeneous jump Markov process to describe rainfall dynamics. The rainfall process was assumed to be in the form of storms, which themselves consist of cells. At a cell arrival time the rainfall process jumps up by a random amount, and at extinction time it jumps down by a random amount, both modeled as Poisson processes. Each time the rain intensity changes, an exponential increase occurs either upwards or downwards. To preserve nonnegative intensity, the downward jump size is truncated to the current jump size. The Markov jump process also allows for a jump directly to zero, corresponding to the state of no rain [8].
In this study the rainfall process is modeled as a single model where the occurrence and intensity of rainfall are simultaneously modeled. The Poisson process models the daily occurrence of rainfall, while the intensity is modeled using a Gamma distribution as the magnitude of the jumps of the Poisson process. Hence we have a compound Poisson process, which is a Poisson-Gamma model. The contribution of this study is twofold: a Poisson-Gamma model that simultaneously describes the rainfall occurrence and intensity, and a suitable model for zero inflated data which reduces overdispersion.
This paper is structured as follows. In Section 2 the Poisson-Gamma model is described and then formulated mathematically, while Section 3 presents methods of estimating the parameters of the model. In Section 4 the model is fitted to the data, and the goodness of fit of the model is evaluated by mean deviance, whereas quantile residuals perform the diagnostic check of the model. Simulation and forecasting are carried out in Section 5, and the study concludes in Section 6.
2. Model Formulation
2.1. Model Description. Rainfall comprises discrete and continuous components, in that if it does not rain the amount of rainfall is discrete, whereas if it rains the amount is continuous. In most research works [3, 4, 9] the rainfall process is represented by two separate models: one is for the occurrence and, conditioned on the occurrence, another model is developed for the amount of rainfall. Rainfall occurrence is basically modeled as a first or higher order Markov chain process, and conditioned on this process a distribution is used to fit the precipitation amount. Commonly used distributions are Gamma, exponential, mixture of exponential, Weibull, and so on. These models work based on several assumptions and the inclusion of several parameters to capture the observed temporal dependence of the rainfall process. However, rainfall data exhibit overdispersion [10], which is caused by various factors like clustering, unaccounted temporal correlation, or the fact that the data is a product of Bernoulli trials with unequal probability of events. The stochastic models developed in this way underestimate the overdispersion of rainfall data, which may result in underestimating the risk of low or high seasonal rainfall.
Our interest in this research is to simultaneously model the occurrence and intensity of rainfall in one model. We model the rainfall process by using a Poisson-Gamma probability distribution, which is flexible enough to model the exact zeros and the amount of rainfall together.
Rainfall is modeled as a compound Poisson process, which is a Levy process with Gamma distributed jumps. This is motivated by the sudden changes of rainfall amount from zero to a large positive value following each rainfall event, which are modeled as pure jumps of the compound Poisson process.
We assume rainfall arrives in the form of storms following a Poisson process, and at each arrival time the current intensity increases by a random amount based on a Gamma distribution. The jumps of the driving process represent the arrival of the storm events, each generating a jump of random size. Each storm comprises cells that also arrive following another Poisson process.
The Poisson cluster process gives an appropriate tool, as rainfall data indicate the presence of clusters of rainfall cells. As observed by Onof et al. [6], use of Gamma distributed variables for cell depth improves the reproduction of extreme values.
Lord [11] used the Poisson-Gamma compound process to model motor vehicle crashes, examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Wang [12] proposed a Poisson-Gamma compound approach for species richness estimation.
2.2. Mathematical Formulation. Let $N_t$ be the total number of rainfall events per day, following a Poisson process such that
The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i\ge 1}$, assumed to be independently and identically Gamma distributed and independent of the times of occurrence of rainfall.
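As a minimal sketch of this construction, the following Python snippet draws daily totals as a Poisson number of events, each contributing a Gamma distributed jump. The function name, the Knuth sampler, and the parameter values (`lam`, `shape`, `scale`) are illustrative assumptions and are not taken from the study.

```python
import math
import random

def simulate_daily_rainfall(lam, shape, scale, n_days, seed=0):
    """Draw n_days totals from a compound Poisson-Gamma model.

    lam   : assumed Poisson rate of rainfall events per day
    shape : assumed Gamma shape of each event's amount
    scale : assumed Gamma scale of each event's amount
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_days):
        # Knuth's method for a Poisson(lam) count (adequate for small lam).
        limit, count, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            count += 1
            prod *= rng.random()
        # Sum of `count` independent Gamma jumps; zero events gives an exact 0.0.
        totals.append(sum((rng.gammavariate(shape, scale) for _ in range(count)), 0.0))
    return totals

rain = simulate_daily_rainfall(lam=0.3, shape=0.9, scale=12.0, n_days=20000)
dry_fraction = sum(1 for r in rain if r == 0.0) / len(rain)
print(round(dry_fraction, 3), round(math.exp(-0.3), 3))  # empirical share of dry days vs exp(-lam)
```

Under these assumptions the share of exact zeros should sit close to `exp(-lam)`, which is how a single model can carry both the dry days and the positive amounts.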
$$\mathrm{E}\left(e^{s(L(1)+L(2)+\cdots+L(j))}\right)P(N(t)=j),$$
because of the independence of $L$ and $N(t)$.
$$\ln M_L(s)=\lambda\,(M_Y(s)-1)=\lambda\left[(1-\alpha s)^{-P}-1\right]. \tag{6}$$
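A short worked step, derived here from the cumulant generating function rather than quoted from the text: writing (6) as $\ln M_L(s)=\lambda[(1-\alpha s)^{-P}-1]$, with $\alpha$ read as the Gamma scale and $P$ as the Gamma shape, differentiation at $s=0$ gives the first two cumulants of the daily total.

$$\frac{d}{ds}\ln M_L(s)\Big|_{s=0}=\lambda\alpha P\,(1-\alpha s)^{-P-1}\Big|_{s=0}=\lambda\alpha P,$$
$$\frac{d^{2}}{ds^{2}}\ln M_L(s)\Big|_{s=0}=\lambda\alpha^{2}P(P+1)\,(1-\alpha s)^{-P-2}\Big|_{s=0}=\lambda\alpha^{2}P(P+1).$$

So, under this reading of the parameters, $\mathrm{E}[L]=\lambda\alpha P$ and $\mathrm{Var}(L)=\lambda\alpha^{2}P(P+1)$.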
If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.
If on a particular day no rainfall occurred, then
We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as
$$f_\theta(L)=p_0\,\delta_0(L)+q_0 f^{+}$$
If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M=n-m$ zeros, where $m>0$.
We observe that $m\sim \mathrm{Bi}(n,\,1-\exp(-\lambda))$ and $p(m=0)=\exp(-n\lambda)$; hence the likelihood function is
Now for $\lambda$ we have
$$\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda}=m-n+\frac{m}{1-e^{-\lambda}}+(-1)m+\frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i,$$
$$\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda}=0\implies m-n+\frac{m}{1-e^{-\lambda}}+(-1)m+\frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i=0. \tag{15}$$
We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ also cannot be expressed in closed form. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form, and therefore it is difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models as described below.
Definition 3 (see [14]). A probability density function of the form
$$f(y;\theta,\Theta)=a(y,\Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta-k(\theta)\right]\right\} \tag{16}$$
for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.
$\Theta>0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant of the exponential dispersion model: when $\Theta=1$, the derivatives $k'(\cdot)$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.
If we let $L_i=\log f(y_i;\theta_i,\Theta)$ be the contribution of $y_i$ to the likelihood function $L=\sum_i L_i$, then
$$L_i=\frac{1}{\Theta}\left[y_i\theta_i-k(\theta_i)\right]+\log a(y_i,\Theta),\qquad \frac{\partial L_i}{\partial\theta_i}=\frac{1}{\Theta}\left(y_i-k'(\theta_i)\right),\qquad \frac{\partial^{2}L_i}{\partial\theta_i^{2}}=-\frac{1}{\Theta}k''(\theta_i). \tag{17}$$
However, we expect that $\mathrm{E}(\partial L_i/\partial\theta_i)=0$ and $-\mathrm{E}(\partial^{2}L_i/\partial\theta_i^{2})=\mathrm{E}(\partial L_i/\partial\theta_i)^{2}$, so that
$$\mathrm{E}\left(\frac{1}{\Theta}\left(y_i-k'(\theta_i)\right)\right)=0,\qquad \frac{1}{\Theta}\left(\mathrm{E}(y_i)-k'(\theta_i)\right)=0,\qquad \mathrm{E}(y_i)=k'(\theta_i), \tag{18}$$
$$-\mathrm{E}\left(-\frac{1}{\Theta}k''(\theta_i)\right)=\mathrm{E}\left(\frac{1}{\Theta}\left(y_i-k'(\theta_i)\right)\right)^{2},\qquad \frac{k''(\theta_i)}{\Theta}=\frac{\mathrm{Var}(y_i)}{\Theta^{2}},\qquad \mathrm{Var}(y_i)=\Theta\,k''(\theta_i). \tag{19}$$
Therefore the mean of the distribution is $\mathrm{E}[Y]=\mu=dk(\theta)/d\theta$ and the variance is $\mathrm{Var}(Y)=\Theta\,(d^{2}k(\theta)/d\theta^{2})$.
The relationship $\mu=dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y)=\Theta V(\mu)$, where $V(\mu)$ is called a variance function.
Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu)=\mu^{p}$ for $p\in(-\infty,0]\cup[1,\infty)$ are called Tweedie family distributions.
Examples are as follows: for $p=0$ we have a normal distribution; for $p=1$ and $\Theta=1$ it is a Poisson distribution; for $p=2$ it is a Gamma distribution; and for $p=3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.
From $\mathrm{Var}(Y)=\Theta\,(d^{2}k(\theta)/d\theta^{2})$, then for Tweedie family distributions we have
by equating the constants of integration above to zero. For $p\neq 1$ we have $\mu=[(1-p)\theta]^{1/(1-p)}$, so that
$$\int dk(\theta)=\int\left[(1-p)\theta\right]^{1/(1-p)}d\theta,\qquad k(\theta)=\frac{\left[(1-p)\theta\right]^{(2-p)/(1-p)}}{2-p}=\frac{\mu^{2-p}}{2-p},\qquad p\neq 2. \tag{22}$$
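The relations $k'(\theta)=\mu$ and $k''(\theta)=\mu^{p}$ can be checked numerically for the Tweedie cumulant function $k(\theta)=[(1-p)\theta]^{(2-p)/(1-p)}/(2-p)$ with $\theta=\mu^{1-p}/(1-p)$. The sketch below uses finite differences; the helper names and the values of `mu` and `p` are arbitrary choices for illustration.

```python
def tweedie_theta(mu, p):
    """Canonical parameter theta = mu^(1-p) / (1-p), for p not equal to 1 or 2."""
    return mu ** (1.0 - p) / (1.0 - p)

def tweedie_k(theta, p):
    """Cumulant function k(theta) = [(1-p) theta]^((2-p)/(1-p)) / (2-p)."""
    return ((1.0 - p) * theta) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

def central_diff(f, x, h):
    """First derivative by a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_diff(f, x, h):
    """Second derivative by a symmetric finite difference."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, p = 3.0, 1.5                      # illustrative values only
theta = tweedie_theta(mu, p)
k = lambda t: tweedie_k(t, p)
mean_numeric = central_diff(k, theta, 1e-5)
var_fn_numeric = second_diff(k, theta, 1e-4)
print(mean_numeric, mu)               # k'(theta) against mu
print(var_fn_numeric, mu ** p)        # k''(theta) against the variance function mu^p
```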
Proposition 5. The cumulant generating function of a Tweedie distribution for $1<p<2$ is
$$\log M_Y(t)=\frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1+t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)}-1\right]. \tag{23}$$
Proof. From (16) the moment generating function is given by
For $1<p<2$ we substitute $\theta$ and $k(\theta)$ to have
$$\log M_Y(t)=\frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1+t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)}-1\right]. \tag{26}$$
By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda,\alpha,P)$ expressed as follows:
$$\lambda=\frac{\mu^{2-p}}{\Theta(2-p)},\qquad \alpha=\Theta(p-1)\mu^{p-1},\qquad P=\frac{2-p}{p-1}. \tag{27}$$
The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1<p<2$ can represent the Poisson-Gamma compound process. In addition, $\lambda>0$ and $\alpha>0$ imply $\mu>0$ and $\Theta>0$.
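The mapping in (27) can be sketched as a small Python helper. The function name is hypothetical, and the dispersion value passed below is made up for illustration; the two printed checks rely on the standard compound Poisson-Gamma moments, mean $\lambda\alpha P$ and variance $\lambda\alpha^{2}P(P+1)$, which are not quoted from the text.

```python
def tweedie_to_poisson_gamma(mu, disp, p):
    """Map Tweedie (mu, dispersion, power p) to (lam, alpha, shape) as in (27).

    Requires mu > 0, disp > 0 and 1 < p < 2.
    """
    if not (mu > 0 and disp > 0 and 1.0 < p < 2.0):
        raise ValueError("need mu > 0, disp > 0 and 1 < p < 2")
    lam = mu ** (2.0 - p) / (disp * (2.0 - p))    # Poisson rate
    alpha = disp * (p - 1.0) * mu ** (p - 1.0)    # Gamma scale
    shape = (2.0 - p) / (p - 1.0)                 # Gamma shape P
    return lam, alpha, shape

# disp=8.0 is a hypothetical dispersion; mu=3.2 is an arbitrary daily mean.
lam, alpha, shape = tweedie_to_poisson_gamma(mu=3.2, disp=8.0, p=1.5306)
print(lam * alpha * shape)                        # compound mean, compare with mu
print(lam * alpha ** 2 * shape * (shape + 1.0))   # compound variance, compare with disp * mu**p
```

The two printed quantities should reproduce $\mu$ and $\Theta\mu^{p}$, which is a quick consistency check on the parameterization.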
Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is
$$P(L=0)=\exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28}$$
and the probability of having a rainfall event is
$$P(L>0)=W(\lambda,\alpha,L,P)\exp\left[\frac{L}{(1-p)\mu^{p-1}}-\frac{\mu^{2-p}}{2-p}\right]. \tag{29}$$
Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).
The function $W(\lambda,\alpha,L,P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches its maximum [15].
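For illustration, the probability of a dry day in (28) can be evaluated directly. In the sketch below the function name and the dispersion value are assumptions; only the power 1.5306 corresponds to a value reported later in the paper.

```python
import math

def prob_no_rain(mu, disp, p):
    """P(L = 0) = exp(-mu^(2-p) / (disp * (2-p))) for 1 < p < 2, following (28)."""
    return math.exp(-mu ** (2.0 - p) / (disp * (2.0 - p)))

# Illustrative daily means (mm) with a hypothetical dispersion of 8.0.
for mu in (0.4, 3.0, 10.0):
    print(mu, round(prob_no_rain(mu, disp=8.0, p=1.5306), 4))
```

A larger daily mean lowers the probability of an exact zero, which is how a seasonal mean also induces a seasonal pattern in the probability of no rainfall.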
3. Parameter Estimation
We approximate the function $W(\lambda,\alpha,L,P)=\sum_{j=1}^{\infty}\dfrac{\lambda^{j}(\alpha L)^{jP}e^{-\lambda}}{j!\,\Gamma(jP)}=\sum_{j=1}^{\infty}W_j$ following the procedure in [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L>0$ we have the following.
Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by
$$\log W_{\max}=\frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}+(1+P)-P\log P-(1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right]-\log(2\pi)-\frac{1}{2}\log P-\log\frac{L^{2-p}}{(2-p)\Theta}, \tag{31}$$
where $j_{\max}=\dfrac{L^{2-p}}{(2-p)\Theta}$.
$$\frac{L^{jP}(p-1)^{jP}\,\mu^{(2-p)j+(p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)}. \tag{33}$$
The term $\mu^{(2-p)j+(p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation
$$W(L,\Theta,P)=\sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)}=\sum_{j=1}^{\infty}\frac{z^{j}}{j!\,\Gamma(jP)}=\sum_{j=1}^{\infty}W_j,\qquad\text{where } z=\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}. \tag{34}$$
Considering $W_j$, we have
$$\log W_j=j\log z-\log j!-\log\Gamma(Pj)=j\log z-\log\Gamma(j+1)-\log\Gamma(Pj). \tag{35}$$
Using Stirling's approximation of Gamma functions, we have
$$\log W_j\approx j\left[\log z+(1+P)-P\log P-(1-P)\log j\right]-\log(2\pi)-\frac{1}{2}\log P-\log j. \tag{37}$$
For $1<p<2$ we have $P=(2-p)/(p-1)>0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have
$$\frac{\partial\log W_j}{\partial j}\approx\log z-\frac{1}{j}-\log j-P\log(Pj)\approx\log z-\log j-P\log(Pj), \tag{38}$$
where $1/j$ is ignored for large $j$. Solving $\partial\log W_j/\partial j=0$, we have
$$j_{\max}=\frac{L^{2-p}}{(2-p)\Theta}. \tag{39}$$
Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have
$$\log W_{\max}=\frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}+(1+P)-P\log P-(1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right]-\log(2\pi)-\frac{1}{2}\log P-\log\frac{L^{2-p}}{(2-p)\Theta}. \tag{40}$$
Hence the result follows.
It can be observed that $\partial\log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L,\Theta,P)$ by $\hat W(L,\Theta,P)=\sum_{j=j_d}^{j_u}W_j$, the approximation error is bounded by a geometric sum:
$$W(L,\Theta,P)-\hat W(L,\Theta,P)<W_{j_d-1}\,\frac{1-r_l^{\,j_d-1}}{1-r_l}+W_{j_u+1}\,\frac{1}{1-r_u},\qquad r_l=\exp\left(\frac{\partial\log W_j}{\partial j}\right)\Bigg|_{j=j_d-1},\qquad r_u=\exp\left(\frac{\partial\log W_j}{\partial j}\right)\Bigg|_{j=j_u+1}. \tag{41}$$
For quick and accurate evaluation of $W(\lambda,\alpha,L,P)$, the series is summed over only those terms which contribute significantly to the sum.
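The idea of summing only the significant terms can be sketched in Python as follows. This is an illustrative sketch, not the procedure of [15]: it takes $\log W_j=j\log z-\log\Gamma(j+1)-\log\Gamma(Pj)$ as in (35), locates the peak by a direct scan instead of the closed-form $j_{\max}$, and works on the log scale to avoid overflow. The inputs `log_z` and `shape` are placeholders.

```python
import math

def log_wj(j, log_z, shape):
    """log W_j = j log z - log Gamma(j + 1) - log Gamma(shape * j)."""
    return j * log_z - math.lgamma(j + 1.0) - math.lgamma(shape * j)

def log_series_sum(log_z, shape, tol=1e-16, j_cap=100000):
    """Return log(sum_{j>=1} W_j), keeping terms larger than tol * W_max.

    The peak index is found by scanning upward, which works because
    log W_j is concave in j; terms are then added outward from the peak.
    """
    j_peak = 1
    while j_peak < j_cap and log_wj(j_peak + 1, log_z, shape) > log_wj(j_peak, log_z, shape):
        j_peak += 1
    log_max = log_wj(j_peak, log_z, shape)
    cutoff = math.log(tol)
    total = 1.0                       # the peak term, scaled by exp(-log_max)
    j = j_peak - 1
    while j >= 1 and log_wj(j, log_z, shape) - log_max > cutoff:
        total += math.exp(log_wj(j, log_z, shape) - log_max)
        j -= 1
    j = j_peak + 1
    while j < j_cap and log_wj(j, log_z, shape) - log_max > cutoff:
        total += math.exp(log_wj(j, log_z, shape) - log_max)
        j += 1
    return log_max + math.log(total)

print(log_series_sum(log_z=math.log(50.0), shape=0.9))
```

Scaling every term by the largest one keeps the sum numerically stable even when individual $W_j$ would overflow a floating point number.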
Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the choice of the distribution for a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].
The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In the case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n}y_ix_{ij}$.
Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is
$$L(\beta)=\sum_{i=1}^{n}\frac{y_i\theta_i-k(\theta_i)}{\Theta}+\sum_{i=1}^{n}\log a(y_i,\Theta). \tag{42}$$
But $\theta_i=\sum_{j=1}^{p}\beta_jx_{ij}$; hence
$$\sum_{i=1}^{n}y_i\theta_i=\sum_{i=1}^{n}y_i\sum_{j=1}^{p}\beta_jx_{ij}=\sum_{j=1}^{p}\beta_j\sum_{i=1}^{n}y_ix_{ij}. \tag{43}$$
Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\mathrm{Var}(y_i)$.
Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i=\sum_{j=1}^{p}\beta_jx_{ij}=g(\mu_i)$. The likelihood equations are
$$\frac{\partial L(\beta)}{\partial\beta_j}=\sum_{i=1}^{n}\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}=\sum_{i=1}^{n}\frac{y_i-\mu_i}{\Theta\mu_i^{p}}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}. \tag{46}$$
Since $\mathrm{Var}(y_i)=\Theta V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.
Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, as it cannot be expressed in closed form for $1<p<2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$ as well as for the diagnostic check of the model.
Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat\beta$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.
Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J=\mathrm{E}(-\partial^{2}L(\beta)/\partial\beta_h\partial\beta_j)$,
$$\mathrm{E}\left[\left(\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ih}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i-\mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial\mu_i}{\partial\eta_i}\right)\right],$$
where $W=\mathrm{diag}\left[(1/\mathrm{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}\right]$.
Therefore $\hat\beta$ has an approximate $N[\beta,(X^{T}WX)^{-1}]$ distribution with $\mathrm{Var}(\hat\beta)=(X^{T}\hat WX)^{-1}$, where $\hat W$ is $W$ evaluated at $\hat\beta$.
To compute $\hat\beta$ we use the iteratively reweighted least squares algorithm proposed by Dobson and Barnett [17], where the iterations use the working weights
$$\frac{w_i}{V(\mu_i)\,g'(\mu_i)^{2}}, \tag{49}$$
where $V(\mu_i)=\mu_i^{p}$.
However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities have specified $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
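The iteratively reweighted least squares step for a fixed $p$ can be sketched in plain Python. The sketch assumes a log link (so $g'(\mu)=1/\mu$ and the working weight becomes $\mu^{2-p}$ with unit prior weights), a design matrix whose first column is the intercept, and synthetic data; the function names, the coefficients, and the data are all hypothetical and only exercise the algorithm.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def irls_tweedie_log_link(X, y, p, n_iter=50):
    """Estimate beta for variance function mu^p and log link by IRLS."""
    k = len(X[0])
    beta = [0.0] * k
    beta[0] = math.log(sum(y) / len(y))       # start from the overall mean
    for _ in range(n_iter):
        xtwx = [[0.0] * k for _ in range(k)]
        xtwz = [0.0] * k
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = math.exp(eta)
            w = mu ** (2.0 - p)               # 1 / (V(mu) g'(mu)^2) with g = log
            z = eta + (yi - mu) / mu          # working response
            for a in range(k):
                xtwz[a] += w * xi[a] * z
                for c in range(k):
                    xtwx[a][c] += w * xi[a] * xi[c]
        beta = solve_linear(xtwx, xtwz)       # weighted least squares update
    return beta

# Synthetic seasonal data with exact zeros on every third day (made up).
days = range(1, 366)
X = [[1.0, math.sin(2 * math.pi * d / 365), math.cos(2 * math.pi * d / 365)] for d in days]
true_beta = [0.5, 0.3, 1.2]
y = [math.exp(sum(b * v for b, v in zip(true_beta, row))) * (d % 3) for d, row in zip(days, X)]
beta_hat = irls_tweedie_log_link(X, y, p=1.5)
print([round(b, 3) for b in beta_hat])
```

A profile likelihood search over $p$ would wrap a fit like this in an outer loop over candidate values of $p$; that outer step needs the Tweedie density and is not shown here.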
Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by
$$\hat\Theta=\sum_{i=1}^{n}\frac{\left[L_i-\mu_i(\hat\beta)\right]^{2}}{\mu_i(\hat\beta)^{p}}. \tag{50}$$
Since for $1<p<2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
4. Data and Model Fitting
4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.
In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district (panel title: Rainfall for Balaka for 10 years; x-axis: year; y-axis: amount).
Figure 2: Variance mean relationship (x-axis: log(mean); y-axis: log(variance)).
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log variance and the log mean, which can be expressed as
$$\log(\text{Variance})=\alpha+\beta\log(\text{mean}), \tag{51}$$
$$\text{Variance}=A\cdot\text{mean}^{\beta},\qquad A\in\mathbb{R}. \tag{52}$$
Hence the variance can be expressed as some power $\beta\in\mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
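The slope in (51) can be estimated with ordinary least squares on the log scale. The sketch below is an illustration only: the function name is hypothetical, and the group means and variances are made up so that the variance is exactly $8\cdot\text{mean}^{1.5}$; with real data they would come from grouping the observations (for instance by calendar day, which is an assumption here).

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = a + b * log(mean); returns (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up group means with variance = 8 * mean^1.5, only to exercise the fit.
means = [0.2, 0.5, 1.0, 2.5, 6.0, 11.0]
variances = [8.0 * m ** 1.5 for m in means]
intercept, power = fit_power_law(means, variances)
print(round(math.exp(intercept), 6), round(power, 6))
```

A fitted slope between 1 and 2 is what points towards the Poisson-Gamma member of the Tweedie family.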
4.2. Fitted Model. To model the daily rainfall data we use sin and cos as predictors due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years, so that the years are uniform in our modeling.
where $i=1,2,\ldots,365$ corresponds to days of the year and $a_0, a_1, a_2$ are the coefficients of regression.
In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.
From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were determined to take into account the seasonal variations in the stochastic model.
Comparing the actual means and the predicted means, for 2 July we have $\mu=0.3820$ whereas $\hat\mu=0.4333$; similarly, for 31 December we have $\mu=9.0065$ and $\hat\mu=10.6952$, respectively. Figure 4 shows the estimated mean and actual mean, where the model generally behaves well.
Figure 4: Actual versus predicted mean (panel title: Rainfall means; x-axis: days of the year; y-axis: mean; series: actual mean, predicted mean).
4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat\theta_i$ for all $i$, with $\hat\mu$ the model's mean estimate. Let $\tilde\theta_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde\mu=y_i$.
The goodness of fit is determined by the deviance, which is defined as
$$-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]=2\sum_{i=1}^{n}\frac{y_i\tilde\theta_i-k(\tilde\theta_i)}{\Theta}-2\sum_{i=1}^{n}\frac{y_i\hat\theta_i-k(\hat\theta_i)}{\Theta}=2\sum_{i=1}^{n}\frac{y_i(\tilde\theta_i-\hat\theta_i)-k(\tilde\theta_i)+k(\hat\theta_i)}{\Theta}=\frac{\mathrm{Dev}(y,\hat\mu)}{\Theta}. \tag{56}$$
$\mathrm{Dev}(y,\hat\mu)$ is called the deviance of the model, and the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.
In terms of Tweedie distributions with $1<p<2$, the deviance is
$$\mathrm{Dev}_p=2\sum_{i=1}^{n}\left(\frac{y_i^{2-p}-(2-p)\,y_i\mu_i^{1-p}+(1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}$$
Based on results from fitting the model, the residual deviance of 43144 is less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.
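The Tweedie deviance in (57) and the comparison between a fitted and a null model can be sketched as follows. The function name is hypothetical, and the observations and fitted means are made up for illustration; they are not the Balaka data.

```python
def tweedie_deviance(y, mu, p):
    """Total deviance (57) for 1 < p < 2; y may contain exact zeros, mu must be > 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        num = yi ** (2.0 - p) - (2.0 - p) * yi * mi ** (1.0 - p) + (1.0 - p) * mi ** (2.0 - p)
        total += num / ((1.0 - p) * (2.0 - p))
    return 2.0 * total

y = [0.0, 0.0, 4.2, 0.0, 12.5, 1.1]          # made-up daily amounts
mu_fit = [0.8, 1.5, 3.9, 0.6, 9.0, 2.0]      # made-up fitted means
mu_null = [sum(y) / len(y)] * len(y)         # intercept-only (null) model
residual_dev = tweedie_deviance(y, mu_fit, 1.5)
null_dev = tweedie_deviance(y, mu_null, 1.5)
print(residual_dev, null_dev)
```

Each term is zero when the observation equals its fitted mean, and a day with no rainfall contributes $2\mu_i^{2-p}/(2-p)$, so exact zeros are handled without any special casing.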
4.4. Diagnostic Check. The model diagnostic is considered as a way of residual analysis. The fitted model faces challenges to be assessed, especially for days with no rainfall at all, as they produce spurious results and distracting patterns, similarly as observed by [15]. Since this is a nonnormal regression, residuals are far from being normally distributed and having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).
Figure 5: Residuals of the model (x-axis: year; y-axis: residuals).
Figure 6: Q-Q plot of the quantile residuals (panel title: Normal Q-Q Plot; x-axis: theoretical quantiles; y-axis: sample quantiles).
So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.
The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.
Mathematically, let $a_i=\lim_{y\uparrow y_i}F(y;\hat\mu_i,\hat\Theta)$ and $b_i=F(y_i;\hat\mu_i,\hat\Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y;\mu,\Theta)$; then the randomized quantile residuals for $y_i$ are
$$r_{q,i}=\Phi^{-1}(u_i), \tag{58}$$
with $u_i$ being a uniform random variable on $(a_i,b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat\mu$ and $\hat\Theta$.
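The construction of randomized quantile residuals can be sketched in Python. Because the Tweedie distribution function has no closed form, the sketch swaps in a simpler stand-in with the same structure, a point mass at zero plus an exponential distribution for wet days; this stand-in, the function names, and the values `P0` and `SCALE` are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

def randomized_quantile_residual(y, cdf, cdf_left, rng):
    """Return Phi^{-1}(u) with u uniform on (cdf_left(y), cdf(y)].

    Where the distribution is continuous the interval collapses and u = cdf(y).
    """
    a, b = cdf_left(y), cdf(y)
    u = b if a == b else b - (b - a) * rng.random()   # rng.random() in [0, 1) gives u in (a, b]
    return NormalDist().inv_cdf(u)

P0, SCALE = 0.7, 10.0   # hypothetical P(no rain) and mean wet-day amount

def cdf(y):
    """Stand-in distribution function: atom at zero plus exponential amounts."""
    return P0 + (1.0 - P0) * (1.0 - math.exp(-y / SCALE))

def cdf_left(y):
    """Left limit of the stand-in cdf; it differs from cdf only at the atom y = 0."""
    return 0.0 if y == 0.0 else cdf(y)

rng = random.Random(1)
ys = [0.0 if rng.random() < P0 else rng.expovariate(1.0 / SCALE) for _ in range(5000)]
res = [randomized_quantile_residual(y, cdf, cdf_left, rng) for y in ys]
mean = sum(res) / len(res)
var = sum((r - mean) ** 2 for r in res) / (len(res) - 1)
print(round(mean, 2), round(var, 2))
```

When the model is correctly specified, as it is by construction here, the residuals should look standard normal even though 70% of the simulated observations are exact zeros.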
Figure 6 shows the normal Q-Q plot and, as can be observed, there are no large deviations from the straight line, only small deviations at the tail. The linearity observed indicates an acceptable fitted model.
Figure 7: Simulated rainfall and observed rainfall (panel title: Rainfall amount; x-axis: days of the year; y-axis: amount; series: predicted, actual).
Figure 8: Probability of rainfall occurrence (panel title: Probability of no rainfall; x-axis: days of the year; y-axis: probability).
5. Simulation
The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. The simulated data for 2015 are then compared graphically with the observed data, as shown in Figure 7.
The different statistics of the simulated data and actual data are shown in Table 2 for comparison.
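A minimal sketch of how one sample path of such a model might be simulated is given below. It assumes constant parameters (the fitted model lets the mean vary with the day of the year), and the values of `lam`, `alpha`, and `shape` are purely illustrative, not the fitted estimates.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_daily_rainfall(lam, alpha, shape, n_days, seed=0):
    """Each day: a Poisson number of events, each adding a Gamma(shape, scale=alpha) amount."""
    rng = random.Random(seed)
    amounts = []
    for _ in range(n_days):
        n_events = poisson_draw(lam, rng)
        amounts.append(sum(rng.gammavariate(shape, alpha) for _ in range(n_events)))
    return amounts

rain = simulate_daily_rainfall(lam=0.3, alpha=8.0, shape=1.2, n_days=20000)
dry_fraction = sum(1 for r in rain if r == 0.0) / len(rain)
mean_amount = sum(rain) / len(rain)
print(f"dry fraction {dry_fraction:.3f} vs exp(-lam) {math.exp(-0.3):.3f}")
print(f"mean {mean_amount:.3f} vs lam*shape*alpha {0.3 * 1.2 * 8.0:.3f}")
```

Days with no event contribute exact zeros, so the simulated series is zero inflated by construction.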
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.
However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated this probability. It is suggested here that the use of a truncated Fourier series could improve this estimation as compared to the sinusoidal terms.
But it performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain in a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.
Table 2: Data statistics.
                        Min    1st Qu   Median   Mean    3rd Qu   Max
Predicted data          0.00   0.00     0.00     3.314   0.00     116.5
Actual data [10 yrs]    0.00   0.00     0.00     3.183   0.300    123.7
Actual data [2015]      0.00   0.00     0.00     3.328   0.00     84.5
In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.
6. Conclusion
A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, which is independent of the events, follows a Gamma distribution. Unlike several studies of precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystem risk assessment, drought predictions, and weather derivatives, as we are able to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, such as the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.
The model developed can easily be used for forecasting future events, and in terms of weather derivatives the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions, upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.
The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of overdispersion of the data, which is also common in the aforementioned models.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.
References
[1] A. Hussain, “Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions”
[2] M. Cao, A. Li, and J. Z. Wei, “Precipitation modeling and contract valuation: A frontier in weather derivatives,” The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, “On modelling and pricing rainfall derivatives with seasonality,” Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, “Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls,” Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, “Multisite generalization of a daily stochastic precipitation generation model,” Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, “Rainfall modelling using Poisson-cluster processes: a review of developments,” Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, “Pricing precipitation based derivatives,” International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, “Pricing rainfall futures at the CME,” Technical report, Humboldt University, Collaborative Research Center.
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, “A nonparametric model for stochastic generation of daily rainfall occurrence,” Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, “Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, “Estimating species richness by a Poisson-compound gamma model,” Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, “On the compound Poisson-gamma distribution,” Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, “Evaluation of Tweedie exponential dispersion model densities by Fourier inversion,” Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
understand and interpret and use maximum likelihood to find the parameters. However, the models involve many parameters to fully describe the dynamics of rainfall, as well as making several assumptions for the process.
Wilks [5] proposed a multisite model for daily precipitation using a combination of a two-state Markov process (for the rainfall occurrence) and a mixed exponential distribution (for the precipitation amount). He found that the mixture of exponential distributions offered a much better fit than the commonly used Gamma distribution.
In the study of Leobacher and Ngare [3], the precipitation is modeled on a monthly basis by constructing a suitable Markov-Gamma process to take into account seasonal changes of precipitation. It is assumed that rainfall data for the same month in different years are independent and identically distributed. It is also assumed that precipitation can be forecast with sufficient accuracy for a month.
Another approach to modeling rainfall is based on the Poisson cluster model; two of the most recognized cluster based models in the stochastic modeling of rainfall are the Neyman-Scott Rectangular Pulses model and the Bartlett-Lewis Rectangular Pulses model. These models represent rainfall sequences in time and rainfall fields in space, where both the occurrence and depth processes are combined. The difficulty in Poisson cluster models, as observed by Onof et al. [6], is deciding how many features should be addressed so that the model is still mathematically tractable. In addition, the models are best fitted by the method of moments and so require matching analytic expressions for statistical properties such as the mean and variance.
Carmona and Diko [7] developed a time-homogeneous jump Markov process to describe rainfall dynamics. The rainfall process was assumed to be in the form of storms, which themselves consist of cells. At a cell arrival time the rainfall process jumps up by a random amount, and at extinction time it jumps down by a random amount, both modeled as Poisson processes. Each time the rain intensity changes, an exponential increase occurs either upwards or downwards. To preserve nonnegative intensity, the downward jump size is truncated to the current jump size. The Markov jump process also allows for a jump directly to zero, corresponding to the state of no rain [8].
In this study, the rainfall process is modeled as a single model where the occurrence and intensity of rainfall are simultaneously modeled. The Poisson process models the daily occurrence of rainfall, while the intensity is modeled using a Gamma distribution as the magnitude of the jumps of the Poisson process. Hence we have a compound Poisson process, which is a Poisson-Gamma model. The contribution of this study is twofold: a Poisson-Gamma model that simultaneously describes the rainfall occurrence and intensity, and a suitable model for zero inflated data which reduces overdispersion.
This paper is structured as follows. In Section 2 the Poisson-Gamma model is described and then formulated mathematically, while Section 3 presents methods of estimating the parameters of the model. In Section 4 the model is fitted to the data, and the goodness of fit of the model is evaluated by mean deviance, whereas quantile residuals are used for the diagnostic check of the model. Simulation and forecasting are carried out in Section 5, and the study concludes in Section 6.
2. Model Formulation
2.1. Model Description. Rainfall comprises discrete and continuous components, in that if it does not rain the amount of rainfall is discrete, whereas if it rains the amount is continuous. In most research works [3, 4, 9] the rainfall process is represented by two separate models: one for the occurrence and, conditioned on the occurrence, another for the amount of rainfall. Rainfall occurrence is basically modeled as a first or higher order Markov chain process, and conditioned on this process a distribution is used to fit the precipitation amount. Commonly used distributions are Gamma, exponential, mixture of exponentials, Weibull, and so on. These models rely on several assumptions and on the inclusion of several parameters to capture the observed temporal dependence of the rainfall process. However, rainfall data exhibit overdispersion [10], which is caused by various factors like clustering, unaccounted temporal correlation, or the fact that the data is a product of Bernoulli trials with unequal probability of events. The stochastic models developed in this way underestimate the overdispersion of rainfall data, which may result in underestimating the risk of low or high seasonal rainfall.
Our interest in this research is to simultaneously model the occurrence and intensity of rainfall in one model. We model the rainfall process using a Poisson-Gamma probability distribution, which is flexible enough to model the exact zeros and the amount of rainfall together.
Rainfall is modeled as a compound Poisson process, which is a Lévy process with Gamma distributed jumps. This is motivated by the sudden changes of rainfall amount from zero to a large positive value following each rainfall event, which are modeled as pure jumps of the compound Poisson process.
We assume rainfall arrives in the form of storms following a Poisson process, and at each arrival time the current intensity increases by a random amount based on a Gamma distribution. The jumps of the driving process represent the arrival of the storm events, each generating a jump of random size. Each storm comprises cells that also arrive following another Poisson process.
The Poisson cluster process is an appropriate tool, as rainfall data indicate the presence of clusters of rainfall cells. As observed by Onof et al. [6], the use of Gamma distributed variables for cell depth improves the reproduction of extreme values.
Lord [11] used the Poisson-Gamma compound process to model motor vehicle crashes, examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Wang [12] proposed a Poisson-Gamma compound approach for species richness estimation.
2.2. Mathematical Formulation. Let $N_t$ be the total number of rainfall events per day, following a Poisson process such that
The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i \ge 1}$, assumed to be independently and identically Gamma distributed and independent of the times of the occurrence of rainfall.
$$E\left(e^{s(L(1) + L(2) + \cdots + L(j))}\right) P(N(t) = j) \quad \text{because of independence of } L \text{ and } N(t)$$
$$\ln M_L(s) = \lambda\left(M_Y(s) - 1\right) = \lambda\left[(1 - \alpha s)^{-P} - 1\right] \tag{6}$$
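As a quick check on (6), the first two cumulants follow by differentiating at $s = 0$. This is a worked sketch that assumes $M_Y(s) = (1 - \alpha s)^{-P}$ is the moment generating function of a Gamma variable with shape $P$ and scale $\alpha$; the symbol $K_L$ for the cumulant generating function is introduced here only for illustration.

```latex
\begin{aligned}
K_L(s) &= \ln M_L(s) = \lambda\left[(1-\alpha s)^{-P} - 1\right],\\
K_L'(s) &= \lambda P \alpha\,(1-\alpha s)^{-P-1}
  \;\Rightarrow\; E(L) = K_L'(0) = \lambda P \alpha,\\
K_L''(s) &= \lambda P (P+1)\,\alpha^{2}\,(1-\alpha s)^{-P-2}
  \;\Rightarrow\; \operatorname{Var}(L) = K_L''(0) = \lambda P (P+1)\,\alpha^{2}.
\end{aligned}
```

The mean is the expected number of events times the mean jump size, and the variance exceeds that of a plain Poisson count through the factor $P(P+1)\alpha^2$.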
If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.
If on a particular day there is no rainfall that occurred then
We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as $f_\theta(L) = p_0 \delta_0(L) + q_0 f^{+}$
If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.
We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $p(m = 0) = \exp(-n\lambda)$; hence the likelihood function is
Now, for $\lambda$, we have
$$\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial \lambda} = m - n + \frac{m}{1 - e^{-\lambda}} + (-1)^m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i,$$
$$\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial \lambda} = 0 \implies m - n + \frac{m}{1 - e^{-\lambda}} + (-1)^m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0. \tag{15}$$
We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ also cannot be expressed in closed form. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form, and therefore it is difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.
Definition 3 (see [14]). A probability density function of the form
$$f(y; \theta, \Theta) = a(y, \Theta) \exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}$$
for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.
$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant function of the exponential dispersion model, since when $\Theta = 1$ the derivatives $k'(\cdot), k''(\cdot), \ldots$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.
If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the log-likelihood function $L = \sum_i L_i$, then
$$L_i = \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \qquad \frac{\partial L_i}{\partial \theta_i} = \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \qquad \frac{\partial^2 L_i}{\partial \theta_i^2} = -\frac{1}{\Theta}k''(\theta_i). \tag{17}$$
However, we expect that $E(\partial L_i/\partial \theta_i) = 0$ and $-E(\partial^2 L_i/\partial \theta_i^2) = E(\partial L_i/\partial \theta_i)^2$, so that
$$E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) = 0, \qquad \frac{1}{\Theta}\left(E(y_i) - k'(\theta_i)\right) = 0, \qquad E(y_i) = k'(\theta_i), \tag{18}$$
$$-E\left(-\frac{1}{\Theta}k''(\theta_i)\right) = E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^2, \qquad \frac{k''(\theta_i)}{\Theta} = \frac{\mathrm{Var}(y_i)}{\Theta^2}, \qquad \mathrm{Var}(y_i) = \Theta k''(\theta_i). \tag{19}$$
Therefore the mean of the distribution is $E[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$.
The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.
Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^p$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.
Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.
From $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$, then for a Tweedie family distribution we have
by equating the constants of integration above to zero. For $p \neq 1$ we have $\mu = [(1 - p)\theta]^{1/(1-p)}$, so that
$$\int dk(\theta) = \int [(1 - p)\theta]^{1/(1-p)}\, d\theta, \qquad k(\theta) = \frac{[(1 - p)\theta]^{(2-p)/(1-p)}}{2 - p} = \frac{\mu^{2-p}}{2 - p}, \quad p \neq 2. \tag{22}$$
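The cumulant function in (22) can be checked numerically: its first derivative should return $\mu$ and its second derivative the variance function $\mu^p$. The following is a small finite-difference sketch; the function names and the test values of `mu`, `p`, and `h` are illustrative choices only.

```python
def theta_from_mu(mu, p):
    """Canonical parameter theta = mu^(1-p) / (1-p), valid for p != 1."""
    return mu ** (1.0 - p) / (1.0 - p)

def cumulant(theta, p):
    """Tweedie cumulant function k(theta) = [(1-p) theta]^((2-p)/(1-p)) / (2-p)."""
    return ((1.0 - p) * theta) ** ((2.0 - p) / (1.0 - p)) / (2.0 - p)

mu, p, h = 3.0, 1.5, 1e-4
theta = theta_from_mu(mu, p)
# Central finite differences for k'(theta) and k''(theta).
k_prime = (cumulant(theta + h, p) - cumulant(theta - h, p)) / (2 * h)
k_double_prime = (cumulant(theta + h, p) - 2 * cumulant(theta, p)
                  + cumulant(theta - h, p)) / h ** 2
print(k_prime, mu)               # first derivative should be close to mu
print(k_double_prime, mu ** p)   # second derivative should be close to V(mu) = mu^p
```

Note that for $1 < p < 2$ the canonical parameter $\theta$ is negative, so $(1 - p)\theta$ is positive and the fractional power is well defined.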
Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}$$
Proof. From (16) the moment generating function is given by
For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}$$
By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:
$$\lambda = \frac{\mu^{2-p}}{\Theta(2 - p)}, \qquad \alpha = \Theta(p - 1)\mu^{p-1}, \qquad P = \frac{2 - p}{p - 1}. \tag{27}$$
The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
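The mapping in (27) is easy to verify numerically. The sketch below converts Tweedie parameters to Poisson-Gamma parameters and checks that the compound sum has mean $\mu$ and variance $\Theta\mu^p$; it assumes $\alpha$ is the Gamma scale parameter, and the function name and input values are illustrative.

```python
def tweedie_to_poisson_gamma(mu, dispersion, p):
    """Map Tweedie (mu, Theta, p), 1 < p < 2, to (lambda, alpha, P) as in (27)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma case requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (dispersion * (2.0 - p))   # Poisson rate
    alpha = dispersion * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale (assumed)
    shape = (2.0 - p) / (p - 1.0)                      # Gamma shape P
    return lam, alpha, shape

lam, alpha, shape = tweedie_to_poisson_gamma(mu=4.0, dispersion=2.0, p=1.5)
mean = lam * shape * alpha                           # E(L) of the compound sum
variance = lam * shape * (shape + 1) * alpha ** 2    # Var(L) of the compound sum
print(mean, variance)   # should match mu and Theta * mu^p
```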
Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is
$$P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2 - p)}\right] \tag{28}$$
and the probability of having a rainfall event is
$$P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1 - p)\mu^{p-1}} - \frac{\mu^{2-p}}{2 - p}\right]. \tag{29}$$
Proof. This follows by directly substituting the values of $\lambda$, $\theta$, and $k(\theta)$ into (16).
The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches the maximum [15].
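Equation (28) is simple to evaluate once $\mu$, $\Theta$, and $p$ are known. The sketch below shows how the probability of a dry day falls as the daily mean rises; the dispersion value is hypothetical, and the means and index are illustrative inputs rather than a reproduction of the fitted model.

```python
import math

def prob_no_rain(mu, dispersion, p):
    """P(L = 0) = exp(-mu^(2-p) / (Theta (2-p))) for 1 < p < 2, as in (28)."""
    return math.exp(-mu ** (2.0 - p) / (dispersion * (2.0 - p)))

# Illustrative inputs: the dispersion value below is hypothetical.
p_index, dispersion = 1.53, 10.0
for label, mu in [("low-rainfall day", 0.43), ("high-rainfall day", 10.7)]:
    print(label, round(prob_no_rain(mu, dispersion, p_index), 3))
```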
3. Parameter Estimation
We approximate the function $W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^j (\alpha L)^{jP} e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j$ following the procedure by [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.
Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by
$$\log W_{\max} = \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta}, \tag{31}$$
where $j_{\max} = L^{2-p}/((2 - p)\Theta)$.
$$\frac{L^{jP}(p - 1)^{jP}\,\mu^{(2-p)j + (p-1)jP}}{\Theta^{j(1-P)}(2 - p)^{j}\, j!\,\Gamma(jP)} \tag{33}$$
The term $\mu^{(2-p)j + (p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation
$$W(L, \Theta, P) = \sum_{j=1}^{\infty} \frac{L^{jP}(p - 1)^{jP}}{\Theta^{j(1-P)}(2 - p)^{j}\, j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} \frac{z^{j}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j, \quad \text{where } z = \frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)}. \tag{34}$$
Considering $W_j$, we have
$$\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j + 1) - \log\Gamma(Pj). \tag{35}$$
Using Stirling's approximation of Gamma functions we have
$$\log W_j \approx j\left[\log z + (1 + P) - P\log P - (1 - P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}$$
For $1 < p < 2$ we have $P = (2 - p)/(p - 1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have
$$\frac{\partial \log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}$$
where $1/j$ is ignored for large $j$. Solving for $\partial \log W_j/\partial j = 0$, we have
$$j_{\max} = \frac{L^{2-p}}{(2 - p)\Theta}. \tag{39}$$
Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have
$$\log W_{\max} = \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta}. \tag{40}$$
Hence the result follows.
It can be observed that $\partial \log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:
$$W(L, \Theta, P) - \hat{W}(L, \Theta, P) < W_{j_d - 1}\,\frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1}\,\frac{1}{1 - r_u}, \qquad r_l = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \qquad r_u = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}. \tag{41}$$
For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed over only those terms which contribute significantly to the sum.
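The idea of summing only the significant terms can be sketched as follows. This is not the procedure of [15] verbatim: instead of using the closed-form $j_{\max}$, the sketch locates the largest term by direct search (valid because $\log W_j$ has a single peak) and then adds terms on either side until they fall below a relative tolerance. The function names, tolerance, and input values are assumptions for illustration.

```python
import math

def log_w_term(j, log_z, shape):
    """log W_j = j log z - log j! - log Gamma(j P)."""
    return j * log_z - math.lgamma(j + 1) - math.lgamma(j * shape)

def log_wright_series(z, shape, tol=1e-16, j_cap=100000):
    """Return log(sum_j W_j), summing outward from the largest term."""
    log_z = math.log(z)
    # Locate the largest term by direct search; log W_j has a single peak in j.
    j_max = 1
    while (j_max < j_cap
           and log_w_term(j_max + 1, log_z, shape) > log_w_term(j_max, log_z, shape)):
        j_max += 1
    log_max = log_w_term(j_max, log_z, shape)
    log_tol = math.log(tol)
    total = 1.0                 # the j_max term, scaled by W_max
    j = j_max + 1
    while j < j_cap and log_w_term(j, log_z, shape) - log_max > log_tol:   # upper tail
        total += math.exp(log_w_term(j, log_z, shape) - log_max)
        j += 1
    j = j_max - 1
    while j >= 1 and log_w_term(j, log_z, shape) - log_max > log_tol:      # lower tail
        total += math.exp(log_w_term(j, log_z, shape) - log_max)
        j -= 1
    return log_max + math.log(total)   # work on the log scale to avoid overflow

print(log_wright_series(z=50.0, shape=0.885))
```

Scaling every term by the largest one keeps the sum numerically stable even when individual terms would overflow on the original scale.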
Generalized linear models (GLMs) extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of the random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].
The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models, upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In the case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.
Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is
$$\sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i, \Theta). \tag{42}$$
But $\theta_i = \sum_{j=1}^{p} \beta_j x_{ij}$; hence
$$\sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j=1}^{p} \beta_j x_{ij} = \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43}$$
Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\mathrm{Var}(y_i)$.
Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are
$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\Theta\mu_i^{p}}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i}. \tag{46}$$
Since $\mathrm{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.
Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, whose density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for the diagnostic check of the model.
Proposition 10. Under the standard regularity conditions, for large $n$ the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.
Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E(-\partial^2 L(\beta)/\partial \beta_h \partial \beta_j)$, whose entries involve
$$E\left[\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ih}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\right],$$
where $W = \mathrm{diag}\left[(1/\mathrm{Var}(y_i))(\partial \mu_i/\partial \eta_i)^2\right]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T} W X)^{-1}]$ distribution, with $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^{T} \hat{W} X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.
To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm as described by Dobson and Barnett [17], where the iterations use the working weights
$$\frac{w_i}{V(\mu_i)\, g'(\mu_i)^2}, \tag{49}$$
where $V(\mu_i) = \mu_i^{p}$.
However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities specify $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over several values of $p$ until we have the value of $p$ which maximizes the log-likelihood function.
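The iteratively reweighted least squares step for a fixed $p$ can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a log link (so the working weights reduce to $\mu_i^{2-p}$), unit prior weights, and a hypothetical design row with an intercept and one annual sine and cosine term; `solve_linear`, `seasonal_row`, and `fit_tweedie_irls` are names introduced here.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def seasonal_row(day):
    """Hypothetical design row: intercept plus one annual sine and cosine term."""
    angle = 2.0 * math.pi * day / 365.0
    return [1.0, math.sin(angle), math.cos(angle)]

def fit_tweedie_irls(rows, y, p, n_iter=50):
    """IRLS for a log-link GLM with variance function V(mu) = mu^p (p held fixed)."""
    k = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)   # start at the overall mean
    for _ in range(n_iter):
        xtwx = [[0.0] * k for _ in range(k)]
        xtwz = [0.0] * k
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = math.exp(eta)
            weight = mu ** (2.0 - p)        # 1 / (V(mu) g'(mu)^2) with g = log
            z = eta + (yi - mu) / mu        # working response
            for r in range(k):
                xtwz[r] += weight * x[r] * z
                for c in range(k):
                    xtwx[r][c] += weight * x[r] * x[c]
        beta = solve_linear(xtwx, xtwz)     # weighted least squares update
    return beta

rows = [seasonal_row(d) for d in range(1, 366)]
true_beta = [0.5, 0.3, -0.2]                # illustrative coefficients only
y = [math.exp(sum(b * v for b, v in zip(true_beta, x))) for x in rows]  # noise-free means
print(fit_tweedie_irls(rows, y, p=1.5))
```

Because the dispersion $\Theta$ cancels in the weighted least squares update, the regression coefficients can be estimated without it; a profile search over $p$ would wrap this routine in an outer loop.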
Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by
$$\hat{\Theta} = \sum_{i=1}^{n} \frac{\left[L_i - \mu_i(\hat{\beta})\right]^2}{\mu_i(\hat{\beta})^{p}}. \tag{50}$$
Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
4. Data and Model Fitting
4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.
In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district (panel title: Rainfall for Balaka for 10 years; x-axis: year, 1995–2015; y-axis: amount).
Figure 2: Variance mean relationship (x-axis: log(mean); y-axis: log(variance)).
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the logarithm of the variance and the logarithm of the mean, which can be expressed as
$$\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}$$
$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}$$
Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
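The slope in (51) can be estimated by ordinary least squares on the log scale. The sketch below uses synthetic group means and variances that satisfy (52) exactly; how the rainfall data are grouped (for example, by day of the year) is an assumption left to the reader, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = intercept + slope * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), slope   # (A, beta) in Variance = A * mean^beta

# Synthetic group means and variances that follow Variance = 2 * mean^1.5 exactly.
means = [0.2, 0.5, 1.0, 2.5, 6.0, 11.0]
variances = [2.0 * m ** 1.5 for m in means]
A, beta = fit_power_law(means, variances)
print(A, beta)
```

A fitted slope between 1 and 2 would be consistent with the Poisson-Gamma range of the Tweedie index.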
4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms of the day of the year as predictors, due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years so that every year has the same length in our modeling. In the regression, $i = 1, 2, \ldots, 365$ corresponds to the days of the year and $a_0, a_1, a_2$ are the regression coefficients.
In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.
From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.
Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and the actual mean; the model generally behaves well.
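A seasonal mean of this kind can be sketched as follows. The exact form of the regression equation is not recoverable from the text, so the sketch assumes a log link and a period of 365 days, that is, $\log \mu_i = a_0 + a_1 \sin(2\pi i/365) + a_2 \cos(2\pi i/365)$. The function name `seasonal_mean` and any coefficient values passed to it are hypothetical, not the fitted values from Table 1.

```python
import math

def seasonal_mean(day, a0, a1, a2, period=365):
    """Mean rainfall on a given day of the year under an assumed log link.

    day        : day of the year, 1..period
    a0, a1, a2 : regression coefficients (hypothetical inputs)
    period     : length of the seasonal cycle in days (assumed to be 365)
    """
    angle = 2.0 * math.pi * day / period
    return math.exp(a0 + a1 * math.sin(angle) + a2 * math.cos(angle))
```

Because the predictors are periodic, the mean for day $i$ equals the mean for day $i + 365$, which is what allows the fitted model to be extended to a future year.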
4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$, and let $\hat{\mu}$ be the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu}_i = y_i$.
Figure 4: Actual versus predicted mean (rainfall means against days of the year).
The goodness of fit is determined by the deviance, which is defined as
$$\begin{aligned}
-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right]
&= 2\sum_{i=1}^{n}\frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n}\frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} \\
&= 2\sum_{i=1}^{n}\frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\mathrm{Dev}(y, \hat{\mu})}{\Theta}.
\end{aligned} \tag{56}$$
$\mathrm{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.
In terms of Tweedie distributions with $1 < p < 2$, the deviance is
$$\mathrm{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\,\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}$$
Based on results from fitting the model, the residual deviance is 43144, less than the null deviance 62955, which implies that the fitted model explains the data better than a null model.
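The deviance in (57) is straightforward to evaluate directly. The following sketch implements the formula as written; the function name `tweedie_deviance` is an assumption, and the code is not taken from the software the authors used. Days with zero rainfall are handled without special cases because $y^{2-p} = 0$ when $y = 0$ and $1 < p < 2$.

```python
def tweedie_deviance(y, mu, p):
    """Tweedie deviance of equation (57) for a power parameter 1 < p < 2.

    y  : observed values (zeros allowed)
    mu : fitted means (strictly positive)
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this form of the deviance requires 1 < p < 2")
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (yi ** (2 - p)
                     - (2 - p) * yi * mi ** (1 - p)
                     + (1 - p) * mi ** (2 - p))
        total += numerator / ((1 - p) * (2 - p))
    return 2.0 * total
```

The deviance is zero when every fitted mean equals its observation and grows as the fit worsens, which is the sense in which the residual deviance is compared with the null deviance.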
4.4. Diagnostic Check. The model diagnostic is carried out by residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).
Figure 5: Residuals of the model (residuals against year, 1995–2015).
Figure 6: Q-Q plot of the quantile residuals (normal Q-Q plot, sample quantiles against theoretical quantiles).
So we assess the model based on quantile residuals whichremove the pattern in discrete data by adding the smallestamount of randomization necessary on the cumulative prob-ability scale
The quantile residuals are obtained by inverting thedistribution function for each response and finding theequivalent standard normal quantile
Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \mu_i, \Theta)$ and $b_i = F(y_i; \mu_i, \Theta)$, where $F$ is the cumulative distribution function of the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residuals for $y_i$ are
$$r_{q,i} = \Phi^{-1}(u_i), \tag{58}$$
with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
Figure 6 shows the normalized Q-Q plot and as canbe observed there are no large deviations from the straightline only small deviations at the tail The linearity observedindicates an acceptable fitted model
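The construction in (58) can be sketched as below, assuming the values $a_i$ and $b_i$ have already been computed from the fitted distribution function. For a dry day, $a_i = 0$ and $b_i = P(L = 0)$; for a wet day the distribution is continuous, so $a_i = b_i$. The function name `quantile_residual` and the small clamping constant are assumptions made for this illustration.

```python
import random
from statistics import NormalDist

def quantile_residual(a, b, rng=random):
    """Randomized quantile residual: Phi^{-1}(u) with u uniform on (a, b].

    a, b : left limit and value of the fitted CDF at the observation
    rng  : object with a random() method (the random module by default)
    """
    if not 0.0 <= a <= b <= 1.0:
        raise ValueError("need 0 <= a <= b <= 1")
    # rng.random() lies in [0, 1), so u lies in (a, b]; u equals b when a == b.
    u = b - (b - a) * rng.random()
    # Keep u strictly inside (0, 1) so that the normal quantile is finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)
```

Plotting the sorted residuals against standard normal quantiles gives a Q-Q plot of the kind shown in Figure 6.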
Figure 7: Simulated rainfall and observed rainfall (predicted and actual rainfall amount against days of the year).
Figure 8: Probability of rainfall occurrence (panel title: probability of no rainfall, against days of the year).
5. Simulation
The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. The simulated series is then compared graphically with the 2015 data, as shown in Figure 7.
The different statistics of the simulated data and actual data are shown in Table 2 for comparison.
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.
However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. It is suggested here that the use of a truncated Fourier series, instead of the sinusoidal terms, may improve this estimation.
The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on the time of the season.
In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

Table 2: Data statistics.
                       Min    1st Qu  Median  Mean    3rd Qu  Max
Predicted data         0.00   0.00    0.00    3.314   0.00    116.5
Actual data [10 yrs]   0.00   0.00    0.00    3.183   0.300   123.7
Actual data [2015]     0.00   0.00    0.00    3.328   0.00    84.5
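A single day of rainfall under the compound Poisson-Gamma model can be simulated as sketched below: draw the number of rainfall events from a Poisson distribution, then add up that many Gamma-distributed intensities. This is a minimal illustration using only the Python standard library, not the authors' simulation code. The function names are assumptions, and the parameters `lam`, `alpha`, and `shape` stand for the Poisson rate, Gamma scale, and Gamma shape of the model.

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (suited to small rates)."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_daily_rainfall(lam, alpha, shape, rng):
    """One day's total: a Poisson number of Gamma(shape, scale=alpha) events."""
    n_events = sample_poisson(lam, rng)
    # An empty sum gives exactly 0.0, so dry days are produced naturally.
    return math.fsum(rng.gammavariate(shape, alpha) for _ in range(n_events))
```

Calling `simulate_daily_rainfall` once per day with day-specific parameters yields a synthetic series containing both exact zeros and positive amounts, whose mean per day is `lam * shape * alpha`.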
6. Conclusion
A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, which is independent of the events, follows a Gamma distribution. Unlike several studies of precipitation modeling in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, that is, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought prediction, and weather derivatives, as it allows us to simulate synthetic rainfall data. The model provides mechanisms for understanding the fine-scale structure, such as the number and mean of rainfall events, mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.
The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived from a simulated sample path by summing up daily precipitation over the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
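The index construction described above amounts to a sum over a window of a simulated path. The sketch below illustrates it; the function names and the convention of 0-based, inclusive day positions are assumptions made here, and the averaging over paths is a plain Monte Carlo mean rather than a pricing formula from the paper.

```python
def rainfall_index(daily_rainfall, start, end):
    """Cumulative rainfall over days start..end (0-based, inclusive)."""
    return sum(daily_rainfall[start:end + 1])

def mean_index(paths, start, end):
    """Average index over several simulated sample paths."""
    values = [rainfall_index(path, start, end) for path in paths]
    return sum(values) / len(values)
```

For instance, `rainfall_index([0.0, 2.5, 0.0, 1.5, 4.0], 1, 3)` adds the amounts on the second to fourth days of the path.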
Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an existing framework in place for fitting and diagnostic testing of the model.
The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing the underestimation of overdispersion of the data, which is also common in the aforementioned models.
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
The authors extend their gratitude to Pan African UniversityInstitute for Basic Sciences Technology and Innovation forthe financial support
References
[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions"
[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
The amount of rainfall is the total sum of the jumps of each rainfall event, say $(y_i)_{i \ge 1}$, which are assumed to be independently and identically Gamma distributed and independent of the times of occurrence of rainfall.
$$E\left(e^{s(L(1) + L(2) + \cdots + L(j))}\right) P(N(t) = j),$$
because of the independence of $L$ and $N(t)$.
$$\ln M_L(s) = \lambda\,(M_Y(s) - 1) = \lambda\left[(1 - \alpha s)^{-P} - 1\right]. \tag{6}$$
If we observe the occurrence of rainfall for $n$ periods, then we have the sequence $\{L_i\}_{i=1}^{n}$, which is independent and identically distributed.
If no rainfall occurred on a particular day, then
We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as $f_\theta(L) = p_0\,\delta_0(L) + q_0 f^{+}$
If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.
We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $p(m = 0) = \exp(-n\lambda)$; hence the likelihood function is
Now, for $\lambda$, we have
$$\begin{aligned}
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} &= m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i, \\
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial\lambda} = 0 &\implies m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0.
\end{aligned} \tag{15}$$
We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ cannot be expressed in closed form either. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form, and therefore it is difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.
Definition 3 (see [14]). A probability density function of the form
$$f(y; \theta, \Theta) = a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}$$
for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.
$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant of the exponential dispersion model: when $\Theta = 1$, the derivatives $k'(\cdot), k''(\cdot), \ldots$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.
If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the log-likelihood function $L = \sum_i L_i$, then
$$\begin{aligned}
L_i &= \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \\
\frac{\partial L_i}{\partial\theta_i} &= \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \\
\frac{\partial^2 L_i}{\partial\theta_i^2} &= -\frac{1}{\Theta}k''(\theta_i).
\end{aligned} \tag{17}$$
However, we expect that $E(\partial L_i/\partial\theta_i) = 0$ and $-E(\partial^2 L_i/\partial\theta_i^2) = E(\partial L_i/\partial\theta_i)^2$, so that
$$\begin{aligned}
E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) &= 0, \\
\frac{1}{\Theta}\left(E(y_i) - k'(\theta_i)\right) &= 0, \\
E(y_i) &= k'(\theta_i),
\end{aligned} \tag{18}$$
$$\begin{aligned}
-E\left(-\frac{1}{\Theta}k''(\theta_i)\right) &= E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^2, \\
\frac{k''(\theta_i)}{\Theta} &= \frac{\mathrm{Var}(y_i)}{\Theta^2}, \\
\mathrm{Var}(y_i) &= \Theta k''(\theta_i).
\end{aligned} \tag{19}$$
Therefore the mean of the distribution is $E[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$.
The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.
Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^p$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.
Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.
From $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$, then for a Tweedie family distribution we have
by equating the constants of integration above to zero. For $p \neq 1$ we have $\mu = [(1 - p)\theta]^{1/(1-p)}$, so that
$$\begin{aligned}
\int dk(\theta) &= \int \left[(1 - p)\theta\right]^{1/(1-p)} d\theta, \\
k(\theta) &= \frac{\left[(1 - p)\theta\right]^{(2-p)/(1-p)}}{2 - p} = \frac{\mu^{2-p}}{2 - p}, \quad p \neq 2.
\end{aligned} \tag{22}$$
Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}$$
Proof. From (16) the moment generating function is given by
For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2 - p}\left[\left(1 + t\Theta(1 - p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}$$
By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:
$$\lambda = \frac{\mu^{2-p}}{\Theta(2 - p)}, \quad \alpha = \Theta(p - 1)\mu^{p-1}, \quad P = \frac{2 - p}{p - 1}. \tag{27}$$
The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is
$$P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2 - p)}\right] \tag{28}$$
and the probability of having a rainfall event is
$$P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1 - p)\mu^{p-1}} - \frac{\mu^{2-p}}{2 - p}\right]. \tag{29}$$
Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).
The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches its maximum [15].
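The mapping in (27) and the zero probability in (28) can be written as two short functions. This is a sketch for illustration; the names `tweedie_to_poisson_gamma` and `prob_no_rain` are assumptions, not identifiers from the paper.

```python
import math

def tweedie_to_poisson_gamma(mu, theta, p):
    """Map Tweedie (mu, Theta, p) to Poisson-Gamma (lambda, alpha, P) as in (27)."""
    if not (1.0 < p < 2.0 and mu > 0.0 and theta > 0.0):
        raise ValueError("requires 1 < p < 2, mu > 0 and theta > 0")
    lam = mu ** (2 - p) / (theta * (2 - p))   # Poisson rate of rainfall events
    alpha = theta * (p - 1) * mu ** (p - 1)   # Gamma scale of each event
    shape = (2 - p) / (p - 1)                 # Gamma shape, P in the text
    return lam, alpha, shape

def prob_no_rain(mu, theta, p):
    """Probability of a dry day, equation (28): exp(-lambda)."""
    lam, _, _ = tweedie_to_poisson_gamma(mu, theta, p)
    return math.exp(-lam)
```

A quick consistency check is that the product `lam * alpha * shape` recovers the mean `mu`, since the mean of a compound Poisson sum is the event rate times the mean event size.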
3. Parameter Estimation
We approximate the function
$$W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty}\frac{\lambda^{j}(\alpha L)^{jP}e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j$$
following the procedure by [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.
Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by
$$\begin{aligned}
\log W_{\max} = {} & \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] \\
& - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta},
\end{aligned} \tag{31}$$
where $j_{\max} = L^{2-p}/((2 - p)\Theta)$.
$$\frac{L^{jP}(p - 1)^{jP}\mu^{(2-p)j + (p-1)jP}}{\Theta^{j(1-P)}(2 - p)^{j}\,j!\,\Gamma(jP)}. \tag{33}$$
The term $\mu^{(2-p)j + (p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation
$$W(L, \Theta, P) = \sum_{j=1}^{\infty}\frac{L^{jP}(p - 1)^{jP}}{\Theta^{j(1-P)}(2 - p)^{j}\,j!\,\Gamma(jP)} = \sum_{j=1}^{\infty}\frac{z^{j}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j, \quad \text{where } z = \frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)}. \tag{34}$$
Considering $W_j$ we have
$$\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j + 1) - \log\Gamma(Pj). \tag{35}$$
Using Stirling's approximation of Gamma functions we have
$$\log W_j \approx j\left[\log z + (1 + P) - P\log P - (1 - P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}$$
For $1 < p < 2$ we have $P = (2 - p)/(p - 1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$ we have
$$\frac{\partial\log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}$$
where $1/j$ is ignored for large $j$. Solving for $\partial\log W_j/\partial j = 0$ we have
$$j_{\max} = \frac{L^{2-p}}{(2 - p)\Theta}. \tag{39}$$
Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have
$$\begin{aligned}
\log W_{\max} = {} & \frac{L^{2-p}}{(2 - p)\Theta}\left[\log\frac{L^{P}(p - 1)^{P}}{\Theta^{(1-P)}(2 - p)} + (1 + P) - P\log P - (1 - P)\log\frac{L^{2-p}}{(2 - p)\Theta}\right] \\
& - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2 - p)\Theta}.
\end{aligned} \tag{40}$$
Hence the result follows.
It can be observed that $\partial\log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:
$$\begin{aligned}
W(L, \Theta, P) - \hat{W}(L, \Theta, P) &< W_{j_d - 1}\frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1}\frac{1}{1 - r_u}, \\
r_l &= \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \\
r_u &= \exp\left(\frac{\partial\log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}.
\end{aligned} \tag{41}$$
For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed only over those terms which contribute significantly to the sum.
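The truncated summation can be sketched as follows, using $W_j = z^j / (j!\,\Gamma(jP))$ and $z$ as written in (34). The code starts near $j_{\max}$ from (39), moves to the largest term, and then adds terms on both sides until they fall below a relative cutoff. The function name `log_wright_series` and the cutoff of 37 on the log scale are assumptions made for this sketch; it illustrates the idea and is not the implementation used in [15].

```python
import math

def log_wright_series(L, theta, p, drop=37.0):
    """Log of the series sum_j z**j / (j! * Gamma(j*P)) for L > 0 and 1 < p < 2."""
    shape = (2 - p) / (p - 1)   # P in the text
    log_z = (shape * math.log(L) + shape * math.log(p - 1)
             - (1 - shape) * math.log(theta) - math.log(2 - p))

    def log_term(j):
        return j * log_z - math.lgamma(j + 1) - math.lgamma(j * shape)

    # Start near j_max of (39), then climb to the largest term; the terms are
    # unimodal in j because log_term is concave.
    j = max(1, round(L ** (2 - p) / ((2 - p) * theta)))
    while log_term(j + 1) > log_term(j):
        j += 1
    while j > 1 and log_term(j - 1) > log_term(j):
        j -= 1
    peak = log_term(j)

    # Collect the terms on either side that are within `drop` of the peak.
    terms = [peak]
    k = j - 1
    while k >= 1 and log_term(k) > peak - drop:
        terms.append(log_term(k))
        k -= 1
    k = j + 1
    while log_term(k) > peak - drop:
        terms.append(log_term(k))
        k += 1

    # Log-sum-exp keeps the result stable when individual terms are very large.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

Working on the log scale avoids overflow for heavy rainfall days, where individual terms of the series can be very large.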
Generalized linear models (GLMs) extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of the random variable $y$, and the choice is not restricted to normality, unlike in linear regression [16].
The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models; consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In the case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.
Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is
$$\sum_{i=1}^{n}\frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n}\log a(y_i, \Theta). \tag{42}$$
But $\theta_i = \sum_{j}^{p}\beta_j x_{ij}$; hence
$$\sum_{i=1}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i\sum_{j}^{p}\beta_j x_{ij} = \sum_{j}^{p}\beta_j\sum_{i=1}^{n} y_i x_{ij}. \tag{43}$$
Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely, $\mu_i$ and $\mathrm{Var}(y_i)$.
Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p}\beta_j x_{ij} = g(\mu_i)$. The likelihood equations are
$$\frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\frac{\partial\mu_i}{\partial\eta_i} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\Theta\mu_i^{p}}\,x_{ij}\frac{\partial\mu_i}{\partial\eta_i}. \tag{46}$$
Since $\mathrm{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and the variance characterizes the distribution.
Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of a full likelihood analysis of the Tweedie distribution, which cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for the diagnostic check of the model.
Proposition 10. Under the standard regularity conditions, for large $n$ the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.
Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E(-\partial^2 L(\beta)/\partial\beta_h\,\partial\beta_j)$. The contribution of observation $i$ to $J$ is
$$E\left[\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ih}\frac{\partial\mu_i}{\partial\eta_i}\right)\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\frac{\partial\mu_i}{\partial\eta_i}\right)\right],$$
so that $J = X^{T}WX$, where $W = \mathrm{diag}\left[(1/\mathrm{Var}(y_i))(\partial\mu_i/\partial\eta_i)^2\right]$.
Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.
To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm described by Dobson and Barnett [17], where the iterations use the working weights
$$\frac{w_i}{V(\mu_i)\,g'(\mu_i)^2}, \tag{49}$$
where $V(\mu_i) = \mu_i^{p}$.
However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, so that most researchers working with Tweedie densities fix $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over several values of $p$ until we have the value of $p$ which maximizes the log-likelihood function.
Given the estimated values of 119901 and 120573 then the unbiasedestimator of Θ is given by
Θ = 119899sum119894=1
[119871 119894 minus 120583119894 (120573)]2120583119894 (120573)119901 (50)
Since for 1 lt 119901 lt 2 the Tweedie density can not be expressedin closed form it is recommended that the maximumlikelihood estimate of Θ must be computed iteratively fromfull data [15]
4 Data and Model Fitting
41 Data Analysis Daily rainfall data of Balaka district inMalawi covering the period 1995ndash2015 is used The data wasobtained from Meteorological Surveys of Malawi Figure 1shows a plot of the data
In summary the minimum value is 0mmwhich indicatesthat there were no rainfall on particular days whereas themaximum amount is 1237mm The mean rainfall for thewhole period is 3167mm
Rainfall for Balaka for 10 years
020
60
100
Am
ount
2000 2005 2010 20151995Year
Figure 1 Daily rainfall amount for Balaka district
minus8
minus4
0246
log(
varia
nce)
minus2 0 2minus4log(mean)
Figure 2 Variance mean relationship
We investigated the relationship between the variance andthe mean of the data by plotting the log(variance) againstlog(mean) as shown in Figure 2 From the figure we canobserve a linear relationship between the variance and themean which can be expressed as
log (Variance) = 120572 + 120573 log (mean) (51)
Variace = 119860 lowastmean120573 119860 isin R (52)
Hence the variance can be expressed as some power 120573 isin R
of the mean agreeing with the Tweedie variance functionrequirement
42 Fitted Model To model the daily rainfall data we use sinand cos as predictors due to the cyclic nature and seasonalityof rainfall We have assumed that February ends on 28th forall the years to be uniform in our modeling
where 119894 = 1 2 365 corresponds to days of the year and1198860 1198861 1198862 are the coefficients of regressionIn the first place we estimate 119901 by maximizing the profile
log-likelihood function Figure 3 shows the graph of theprofile log-likelihood function As can be observed the valueof 119901 that maximizes the function is 15306
From the results obtained after fitting themodel both thecyclic cosine and sine terms are important characteristics fordaily rainfall Table 1 The covariates were determined to takeinto account the seasonal variations in the stochastic model
Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally
43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean
Rainfall means
Actual meanPredicted mean
0
5
10
15
20
Mea
n
100 200 3000Days of the year
Figure 4 Actual versus predicted mean
estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894
The goodness of fit is determined by deviance which isdefined as
minus 2 [ maximum likelihood of the fitted modelMaximum likelihood of the saturated model
119910119894120579119894 minus 119896 (120579119894)Θ minus 2 119899sum119894=1
119910119894120579119894 minus 119896 (120579119894)Θ= 2 119899sum
119894=1
119910119894 (120579119894 minus 120579119894) minus 119896 (120579119894) + 119896 (120579119894)Θ = Dev (119910 120583)Θ
(56)
Dev(119910 120583) is called the deviance of the model and the greaterthe deviance the poorer the fitted model as maximizing thelikelihood corresponds to minimizing the deviance
In terms of Tweedie distributions with 1 lt 119901 lt 2 thedeviance is
Dev119901
= 2 119899sum119894=1
(1199102minus119901119894 minus (2 minus 119901) 1199101198941205831minus119901119894 + (1 minus 119901) 1205832minus119901119894(1 minus 119901) (2 minus 119901) ) (57)
Based on results from fitting the model the residualdeviance is 43144 less than the null deviance 62955 whichimplies that the fitted model explains the data better than anull model
4.4. Diagnostic Check. Model diagnostics are carried out through residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from normally distributed and do not have equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model. [Plot "Residuals": residuals against year.]

Figure 6: Q-Q plot of the quantile residuals. [Plot "Normal Q-Q Plot": sample quantiles against theoretical quantiles.]
So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.
Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residuals for $y_i$ are

$$
r_{q,i} = \Phi^{-1}(u_i), \tag{58}
$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
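The construction in (58) can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, the values $a_i$ and $b_i$ are assumed to have been computed elsewhere from the fitted distribution function, and the dry-day case assumes that the probability of zero rainfall is $\exp(-\lambda)$ for a Poisson rate $\lambda$.

```python
import math
import random
from statistics import NormalDist


def randomized_quantile_residual(a, b, rng=random):
    """Return Phi^{-1}(u) with u uniform on (a, b], as in (58).

    For a continuous observation a == b, so no randomization is needed.
    """
    if not 0.0 <= a <= b <= 1.0:
        raise ValueError("need 0 <= a <= b <= 1")
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    u = b if a == b else a + (b - a) * (1.0 - rng.random())
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


def dry_day_residual(lam, rng=random):
    """Residual for y_i = 0: a_i = 0 and b_i = P(L = 0) = exp(-lam)."""
    return randomized_quantile_residual(0.0, math.exp(-lam), rng)
```

Only the dry days are randomized here; for a wet day the residual is simply the normal quantile of the fitted cumulative probability at the observed amount.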
Figure 6 shows the normal Q-Q plot. As can be observed, there are no large deviations from the straight line, only small deviations at the tail. The linearity observed indicates an acceptable fitted model.
Figure 7: Simulated rainfall and observed rainfall. [Plot "Rainfall amount": amount against days of the year, predicted and actual.]

Figure 8: Probability of rainfall occurrence. [Plot "Probability of no rainfall": probability against days of the year.]
5. Simulation

The model is simulated to test whether it produces data with characteristics similar to the actual observed rainfall. The simulation is done for a period of two years, where one is the last year of the data (2015) and the other (2016) is a future prediction. The simulated data was then compared graphically with the 2015 data, as shown in Figure 7.
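One way such a simulation could be set up is sketched below. It is not the authors' implementation: the log link, the coefficient names `a0`, `a1`, `a2`, and the function names are assumptions made for illustration, and the gamma distribution is parameterized by shape $P = (2-p)/(p-1)$ and scale $\alpha = \Theta(p-1)\mu^{p-1}$, with Poisson rate $\lambda = \mu^{2-p}/(\Theta(2-p))$.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small daily event rates."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_day(mu, dispersion, p, rng):
    """Draw one daily total from the Poisson-Gamma (Tweedie, 1 < p < 2) model."""
    lam = mu ** (2 - p) / (dispersion * (2 - p))    # rate of rainfall events
    scale = dispersion * (p - 1) * mu ** (p - 1)    # gamma scale
    shape = (2 - p) / (p - 1)                       # gamma shape
    events = poisson_draw(lam, rng)
    # No events gives an exact zero; otherwise sum the event intensities.
    return math.fsum(rng.gammavariate(shape, scale) for _ in range(events))


def simulate_year(a0, a1, a2, dispersion, p, rng):
    """Simulate 365 days with a sinusoidal seasonal mean (assumed log link)."""
    year = []
    for day in range(1, 366):
        angle = 2 * math.pi * day / 365
        mu = math.exp(a0 + a1 * math.sin(angle) + a2 * math.cos(angle))
        year.append(simulate_day(mu, dispersion, p, rng))
    return year
```

For example, `simulate_year(0.0, 0.5, 1.5, 8.0, 1.5306, random.Random(42))` returns a list of 365 nonnegative amounts containing both exact zeros and positive totals; the coefficient and dispersion values in that call are placeholders, not fitted values.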
Summary statistics of the simulated data and the actual data are shown in Table 2 for comparison.
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. It is suggested here that a truncated Fourier series could improve this estimate compared with the single sinusoid.

The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8). In addition, we can also tell that the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on the time of the season.

Table 2: Data statistics.

                      Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
Predicted data        0.00   0.00     0.00    3.314  0.00     116.5
Actual data [10 yrs]  0.00   0.00     0.00    3.183  0.300    123.7
Actual data [2015]    0.00   0.00     0.00    3.328  0.00     84.5
In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.
6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, independent of the events, follows a Gamma distribution. Unlike several studies of precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought predictions, and weather derivatives, since it allows us to simulate synthetic rainfall data. The model provides mechanisms for understanding fine scale structure such as the number and mean of rainfall events, mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.
The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence, expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions, upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.
The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps to reduce the underestimation of overdispersion in the data, which is common in the aforementioned models.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.
References
[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions"
[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
We can express the probability density function $f_\theta(L)$ in terms of a Dirac function as $f_\theta(L) = p_0\,\delta_0(L) + q_0 f^{+}$.

If we assume that there are $m$ positive values $L_1, L_2, \ldots, L_m$, then there are $M = n - m$ zeros, where $m > 0$.

We observe that $m \sim \mathrm{Bi}(n, 1 - \exp(-\lambda))$ and $p(m = 0) = \exp(-n\lambda)$; hence the likelihood function is
Now for $\lambda$ we have

$$
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial \lambda} = m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i,
$$

$$
\frac{\partial \log L(\theta; L_1, L_2, \ldots, L_n)}{\partial \lambda} = 0 \implies m - n + \frac{m}{1 - e^{-\lambda}} + (-1)m + \frac{1}{\lambda}\sum_{i=1}^{m}\sum_{j=1}^{\infty} i = 0. \tag{15}
$$
We can observe from the above evaluation that $\lambda$ cannot be expressed in closed form; a similar derivation shows that $\alpha$ also cannot be expressed in closed form. Therefore we can only estimate $\lambda$ and $\alpha$ using numerical methods. Withers and Nadarajah [13] also observed that the probability density function cannot be expressed in closed form and that it is therefore difficult to find the analytic form of the estimators. So we will express the probability density function in terms of exponential dispersion models, as described below.
Definition 3 (see [14]). A probability density function of the form

$$
f(y; \theta, \Theta) = a(y, \Theta)\exp\left\{\frac{1}{\Theta}\left[y\theta - k(\theta)\right]\right\} \tag{16}
$$

for suitable functions $k(\cdot)$ and $a(\cdot)$ is called an exponential dispersion model.

$\Theta > 0$ is the dispersion parameter. The function $k(\theta)$ is the cumulant function of the exponential dispersion model, since when $\Theta = 1$ the derivatives $k'(\cdot), k''(\cdot), \ldots$ are the successive cumulants of the distribution [15]. The exponential dispersion models were first introduced by Fisher in 1922.
If we let $L_i = \log f(y_i; \theta_i, \Theta)$ be the contribution of $y_i$ to the log-likelihood function $L = \sum_i L_i$, then

$$
L_i = \frac{1}{\Theta}\left[y_i\theta_i - k(\theta_i)\right] + \log a(y_i, \Theta), \qquad
\frac{\partial L_i}{\partial \theta_i} = \frac{1}{\Theta}\left(y_i - k'(\theta_i)\right), \qquad
\frac{\partial^2 L_i}{\partial \theta_i^2} = -\frac{1}{\Theta}k''(\theta_i). \tag{17}
$$
However, we expect that $E(\partial L_i/\partial \theta_i) = 0$ and $-E(\partial^2 L_i/\partial \theta_i^2) = E(\partial L_i/\partial \theta_i)^2$, so that

$$
E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right) = 0, \qquad
\frac{1}{\Theta}\left(E(y_i) - k'(\theta_i)\right) = 0, \qquad
E(y_i) = k'(\theta_i), \tag{18}
$$

$$
-E\left(-\frac{1}{\Theta}k''(\theta_i)\right) = E\left(\frac{1}{\Theta}\left(y_i - k'(\theta_i)\right)\right)^2, \qquad
\frac{k''(\theta_i)}{\Theta} = \frac{\mathrm{Var}(y_i)}{\Theta^2}, \qquad
\mathrm{Var}(y_i) = \Theta k''(\theta_i). \tag{19}
$$
Therefore the mean of the distribution is $E[Y] = \mu = dk(\theta)/d\theta$ and the variance is $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$.

The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\mathrm{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.
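As a quick check of the mean and variance relations, consider the Poisson distribution, a standard example of an exponential dispersion model with cumulant function $k(\theta) = e^{\theta}$ and $\Theta = 1$ (this worked case is an illustration added here, not part of the derivation above):

```latex
\[
\mu = \frac{dk(\theta)}{d\theta} = e^{\theta}, \qquad
\mathrm{Var}(Y) = \Theta\,\frac{d^{2}k(\theta)}{d\theta^{2}} = e^{\theta} = \mu, \qquad
\text{so that } \theta = \log\mu \text{ and } V(\mu) = \mu .
\]
```

The mean equals the variance, as expected for a Poisson variable, and inverting $\mu = e^{\theta}$ gives the variance function directly.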
Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^p$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: $p = 0$ gives a normal distribution; $p = 1$ with $\Theta = 1$ gives a Poisson distribution; $p = 2$ gives a Gamma distribution; and $p = 3$ gives the inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.
From $\mathrm{Var}(Y) = \Theta\,(d^2k(\theta)/d\theta^2)$, for a Tweedie family distribution we have $d\mu/d\theta = \mu^p$, so that

$$
\theta = \int \mu^{-p}\,d\mu = \frac{\mu^{1-p}}{1-p}, \qquad p \neq 1,
$$

by equating the constants of integration to zero. For $p \neq 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that

$$
\int dk(\theta) = \int [(1-p)\theta]^{1/(1-p)}\,d\theta, \qquad
k(\theta) = \frac{[(1-p)\theta]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \qquad p \neq 2. \tag{22}
$$
Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is

$$
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}
$$

Proof. From (16), the moment generating function is given by

$$
M_Y(t) = \exp\left\{\frac{1}{\Theta}\left[k(\theta + t\Theta) - k(\theta)\right]\right\}.
$$

For $1 < p < 2$ we substitute $\theta$ and $k(\theta)$ to have

$$
\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}
$$
By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:

$$
\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad
\alpha = \Theta(p-1)\mu^{p-1}, \qquad
P = \frac{2-p}{p-1}. \tag{27}
$$
The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
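The mapping in (27) can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and it assumes that $\alpha$ acts as the gamma scale, which is the reading under which the compound mean $\lambda P \alpha$ reduces to $\mu$.

```python
def tweedie_to_poisson_gamma(mu, dispersion, p):
    """Map Tweedie parameters (mu, Theta, p) to (lambda, alpha, P) as in (27)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    if mu <= 0 or dispersion <= 0:
        raise ValueError("need mu > 0 and dispersion > 0")
    lam = mu ** (2 - p) / (dispersion * (2 - p))    # Poisson rate of events
    alpha = dispersion * (p - 1) * mu ** (p - 1)    # gamma scale (assumed)
    shape = (2 - p) / (p - 1)                       # gamma shape P
    return lam, alpha, shape
```

With these values the compound mean `lam * shape * alpha` equals `mu`, and the compound variance `lam * alpha**2 * shape * (shape + 1)` equals `dispersion * mu**p`, in line with the Tweedie variance function.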
Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is

$$
P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28}
$$

and the probability of having a rainfall event is

$$
P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right]. \tag{29}
$$

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches its maximum [15].
3. Parameter Estimation

We approximate the function $W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \dfrac{\lambda^j(\alpha L)^{jP}e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j$ following the procedure in [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by

$$
\log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}, \tag{31}
$$

where $j_{\max} = \dfrac{L^{2-p}}{(2-p)\Theta}$.
Proof. Substituting (27) into $\lambda^j(\alpha L)^{jP}/(j!\,\Gamma(jP))$ gives

$$
\frac{L^{jP}(p-1)^{jP}\,\mu^{(2-p)j+(p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)}. \tag{33}
$$

The term $\mu^{(2-p)j+(p-1)jP}$ depends on the values of $L$, $p$, $P$, and $\Theta$, so we maximize the summation

$$
W(L, \Theta, P) = \sum_{j=1}^{\infty}\frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\,j!\,\Gamma(jP)} = \sum_{j=1}^{\infty}\frac{z^{j}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j, \qquad \text{where } z = \frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}. \tag{34}
$$
Considering $W_j$, we have

$$
\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j+1) - \log\Gamma(Pj). \tag{35}
$$

Using Stirling's approximation of Gamma functions, we have

$$
\log W_j \approx j\left[\log z + (1+P) - P\log P - (1-P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}
$$
For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have

$$
\frac{\partial \log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}
$$

where $1/j$ is ignored for large $j$. Solving $\partial \log W_j/\partial j = 0$, we have

$$
j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}. \tag{39}
$$
Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have

$$
\log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}. \tag{40}
$$

Hence the result follows.
It can be observed that $\partial \log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:

$$
W(L, \Theta, P) - \hat{W}(L, \Theta, P) < W_{j_d-1}\,\frac{1 - r_l^{\,j_d-1}}{1 - r_l} + W_{j_u+1}\,\frac{1}{1 - r_u}, \qquad
r_l = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \qquad
r_u = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}. \tag{41}
$$
For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed only over those terms which contribute significantly to the sum.
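The idea of summing only the significant terms can be illustrated on the compound Poisson-Gamma density itself. The sketch below is an assumption-laden illustration rather than the procedure of [15]: it writes the density for $y > 0$ as a Poisson-weighted sum of gamma densities with shape $jP$ and scale $\alpha$, finds the largest term by a direct scan instead of Stirling's approximation, and stops with a simple relative tolerance instead of the bound (41). All function names are hypothetical.

```python
import math


def log_term(j, y, lam, scale, shape):
    """Log of the j-th term: Poisson(j; lam) times Gamma(y; j*shape, scale)."""
    return (-lam + j * math.log(lam) - math.lgamma(j + 1)
            + (j * shape - 1) * math.log(y) - y / scale
            - j * shape * math.log(scale) - math.lgamma(j * shape))


def poisson_gamma_density(y, lam, scale, shape, rel_tol=1e-12):
    """Density at y > 0, summing outward from the largest term of the series."""
    if y <= 0:
        raise ValueError("the series applies to y > 0 only")
    # The log-terms are concave in j, so the first downturn marks the peak.
    j_peak = 1
    while log_term(j_peak + 1, y, lam, scale, shape) > log_term(j_peak, y, lam, scale, shape):
        j_peak += 1
    peak = log_term(j_peak, y, lam, scale, shape)
    cutoff = math.log(rel_tol)
    total = 1.0                      # the peak term, relative to itself
    j = j_peak - 1
    while j >= 1:                    # walk down from the peak
        rel = log_term(j, y, lam, scale, shape) - peak
        if rel < cutoff:
            break
        total += math.exp(rel)
        j -= 1
    j = j_peak + 1
    while True:                      # walk up from the peak
        rel = log_term(j, y, lam, scale, shape) - peak
        if rel < cutoff:
            break
        total += math.exp(rel)
        j += 1
    return math.exp(peak) * total
```

Working relative to the largest log-term keeps the sum numerically stable, and because the terms fall off faster than geometrically away from the peak, only a small number of them are ever evaluated.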
Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of the random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models, upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In the case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is

$$
L(\beta) = \sum_{i}^{n}\frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i}^{n}\log a(y_i, \Theta). \tag{42}
$$

But $\theta_i = \sum_{j}^{p}\beta_j x_{ij}$; hence

$$
\sum_{i}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i\sum_{j}^{p}\beta_j x_{ij} = \sum_{j}^{p}\beta_j\sum_{i=1}^{n} y_i x_{ij}. \tag{43}
$$
Proposition 9. Given that $y_i$ is distributed as (16), its distribution depends only on its first two moments, namely $\mu_i$ and $\mathrm{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p}\beta_j x_{ij} = g(\mu_i)$. The likelihood equations are

$$
\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i} = \sum_{i=1}^{n}\frac{y_i - \mu_i}{\Theta\mu_i^{p}}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i}. \tag{46}
$$

Since $\mathrm{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and the variance characterizes the distribution.
Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of a full likelihood analysis of the Tweedie distribution, which cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for the diagnostic check of the model.
Proposition 10. Under the standard regularity conditions, for large $n$ the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximately normal distribution.
Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E(-\partial^2 L(\beta)/\partial \beta_h \partial \beta_j)$, with entries

$$
J_{hj} = \sum_{i=1}^{n} E\left[\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ih}\,\frac{\partial \mu_i}{\partial \eta_i}\right)\left(\frac{y_i - \mu_i}{\mathrm{Var}(y_i)}\,x_{ij}\,\frac{\partial \mu_i}{\partial \eta_i}\right)\right] = \sum_{i=1}^{n}\frac{x_{ih}x_{ij}}{\mathrm{Var}(y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^{2},
$$

so that $J = X^{T}WX$, where $W = \mathrm{diag}[(1/\mathrm{Var}(y_i))(\partial \mu_i/\partial \eta_i)^2]$.

Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm described by Dobson and Barnett [17], where the iterations use the working weights

$$
w_i = \frac{1}{V(\mu_i)\,g'(\mu_i)^{2}}, \tag{49}
$$

where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, so most researchers working with Tweedie densities fix $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimates of $\beta$ and $\Theta$ and compute the log-likelihood function. This is repeated over several values of $p$ until we have the value that maximizes the log-likelihood function.
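Returning to the working weights used in the iteratively reweighted least squares step, a concrete case may help. Assuming a log link, $g(\mu) = \log\mu$ (an assumption made here for illustration; the link function is not fixed by the text above), the weights take a particularly simple form:

```latex
\[
g(\mu_i) = \log \mu_i
\;\Longrightarrow\;
g'(\mu_i) = \frac{1}{\mu_i},
\qquad
\frac{1}{V(\mu_i)\, g'(\mu_i)^{2}} = \frac{\mu_i^{2}}{\mu_i^{p}} = \mu_i^{\,2-p}.
\]
```

For $1 < p < 2$ the exponent $2 - p$ lies in $(0, 1)$, so under this assumed link, days with larger fitted means receive more weight, but less than proportionally.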
Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by

$$
\hat{\Theta} = \sum_{i=1}^{n}\frac{[L_i - \mu_i(\hat{\beta})]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50}
$$

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data for Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district. [Plot "Rainfall for Balaka for 10 years": amount against year.]

Figure 2: Variance mean relationship. [Plot of log(variance) against log(mean).]
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as

$$
\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}
$$

$$
\text{Variance} = A \cdot \text{mean}^{\beta}, \qquad A \in \mathbb{R}. \tag{52}
$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
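The power in (52) can be estimated by ordinary least squares on the log scale, as in (51). The following is a minimal sketch under the assumption that positive group means and variances (for example, one pair per day of the year across all years) have already been computed; the function name is hypothetical.

```python
import math


def fit_variance_power(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Returns (A, beta) with A = exp(alpha), so that variance ~ A * mean ** beta.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope on the log-log scale
    alpha = y_bar - beta * x_bar      # intercept on the log-log scale
    return math.exp(alpha), beta
```

A fitted slope between 1 and 2 would be consistent with the Poisson-Gamma member of the Tweedie family, although this regression is only a rough screening step and not a substitute for the profile likelihood estimate of $p$.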
4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors, due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in every year, so that all years are uniform in our modeling.

In the model, $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0, a_1, a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.
From the results obtained after fitting themodel both thecyclic cosine and sine terms are important characteristics fordaily rainfall Table 1 The covariates were determined to takeinto account the seasonal variations in the stochastic model
Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally
43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean
Rainfall means
Actual meanPredicted mean
0
5
10
15
20
Mea
n
100 200 3000Days of the year
Figure 4 Actual versus predicted mean
estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894
The goodness of fit is determined by deviance which isdefined as
minus 2 [ maximum likelihood of the fitted modelMaximum likelihood of the saturated model
119910119894120579119894 minus 119896 (120579119894)Θ minus 2 119899sum119894=1
119910119894120579119894 minus 119896 (120579119894)Θ= 2 119899sum
119894=1
119910119894 (120579119894 minus 120579119894) minus 119896 (120579119894) + 119896 (120579119894)Θ = Dev (119910 120583)Θ
(56)
Dev(119910 120583) is called the deviance of the model and the greaterthe deviance the poorer the fitted model as maximizing thelikelihood corresponds to minimizing the deviance
In terms of Tweedie distributions with 1 lt 119901 lt 2 thedeviance is
Dev119901
= 2 119899sum119894=1
(1199102minus119901119894 minus (2 minus 119901) 1199101198941205831minus119901119894 + (1 minus 119901) 1205832minus119901119894(1 minus 119901) (2 minus 119901) ) (57)
Based on results from fitting the model the residualdeviance is 43144 less than the null deviance 62955 whichimplies that the fitted model explains the data better than anull model
44 Diagnostic Check Themodel diagnostic is considered asa way of residual analysis The fitted model faces challenges
10 Journal of Probability and Statistics
Residuals
2000 2005 2010 20151995Year
0
50
100
150
Resid
uals
Figure 5 Residuals of the model
Normal Q-Q Plot
minus2
0
2
4
Sam
ple q
uant
iles
minus2 0minus4 42Theoretical quantiles
Figure 6 Q-Q plot of the quantile residuals
to be assessed especially for days with no rainfall at all as theyproduce spurious results and distracting patterns similarlyas observed by [15] Since this is a nonnormal regressionresiduals are far from being normally distributed and havingequal variances unlike in a normal linear regression Herethe residuals lie parallel to distinct values hence it is difficultto make any meaningful decision about the fitted model(Figure 5)
So we assess the model based on quantile residuals whichremove the pattern in discrete data by adding the smallestamount of randomization necessary on the cumulative prob-ability scale
The quantile residuals are obtained by inverting thedistribution function for each response and finding theequivalent standard normal quantile
Mathematically let 119886119894 = lim119910uarr119910119894119865(119910 120583119894 Θ) and 119887119894 = 119865(119910119894120583119894 Θ) where 119865 is the cumulative function of the probability
density function 119891(119910 120583 Θ) then the randomized quantileresiduals for 119910119894 are
119903119902119894 = Φminus1 (119906119894) (58)
with 119906119894 being the uniform random variable on (119886119894 119887119894] Therandomized quantile residuals are distributed normally bar-ring the variability in 120583 and Θ
Figure 6 shows the normalized Q-Q plot and as canbe observed there are no large deviations from the straightline only small deviations at the tail The linearity observedindicates an acceptable fitted model
Rainfall amount
Days of the year1 44 95 153 218 283 348 413 478 543 608 673
PredictedActual
020
60
100
Am
ount
Figure 7 Simulated rainfall and observed rainfall
Probability of no rainfall
065
075
085
095
Prob
abili
ty
100 200 3000Days of the year
Figure 8 Probability of rainfall occurrence
5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one was the last year of the data (2015) and the other year (2016) was a future prediction. A comparison was then made graphically for the 2015 data, as shown in Figure 7.

The different statistics of the simulated data and actual data are shown in Table 2 for comparison.

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated that probability. It is suggested here that the use of a truncated Fourier series could improve this estimation as compared to the sinusoidal.

But it performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
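A simulation of this kind can be sketched as follows. The sketch assumes the parameter mapping of equation (27) with $\alpha$ read as the Gamma scale; the daily means `mu_by_day`, the dispersion `theta = 8.0`, the power `p = 1.5`, and the function names are all hypothetical placeholders rather than the fitted values of the paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small daily rates used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_daily_rainfall(mu_by_day, theta, p, seed=None):
    """Simulate one rainfall total per day from the compound Poisson-Gamma model."""
    rng = random.Random(seed)
    shape = (2.0 - p) / (p - 1.0)  # Gamma shape P from equation (27)
    rainfall = []
    for mu in mu_by_day:
        lam = mu ** (2.0 - p) / (theta * (2.0 - p))  # Poisson rate of rainfall events
        alpha = theta * (p - 1.0) * mu ** (p - 1.0)  # Gamma scale of each event
        n_events = sample_poisson(lam, rng)
        # The sum of n independent Gamma(P, alpha) intensities is Gamma(n * P, alpha);
        # a day with no events receives an exact zero.
        total = rng.gammavariate(n_events * shape, alpha) if n_events > 0 else 0.0
        rainfall.append(total)
    return rainfall


# Hypothetical seasonal means with the wet season around the turn of the year.
mu_by_day = [0.5 + 5.0 * (1.0 + math.cos(2.0 * math.pi * d / 365.0)) for d in range(1, 366)]
simulated = simulate_daily_rainfall(mu_by_day, theta=8.0, p=1.5, seed=2015)
dry_fraction = sum(1 for x in simulated if x == 0.0) / len(simulated)
```

Because dry days arise from a zero event count, the simulated series contains exact zeros without a separate occurrence model, which is the property the section emphasizes.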
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [4]. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.

Table 2: Data statistics.

                        Min    1st Qu.  Median  Mean    3rd Qu.  Max
Predicted data          0.00   0.00     0.00    3.314   0.00     116.5
Actual data [10 yrs]    0.00   0.00     0.00    3.183   0.300    123.7
Actual data [2015]      0.00   0.00     0.00    3.328   0.00     84.5
In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.

6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity of each event, independent of the events, follows a Gamma distribution. Unlike several studies on precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystem risk assessment, drought predictions, and weather derivatives, since it can simulate synthetic rainfall data. The model provides mechanisms for understanding the fine-scale structure, like the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events, and in terms of weather derivatives the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of overdispersion of the data, which is also common in the aforementioned models.
Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.
References

[1] A. Hussain, “Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions.”
[2] M. Cao, A. Li, and J. Z. Wei, “Precipitation modeling and contract valuation: a frontier in weather derivatives,” The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, “On modelling and pricing rainfall derivatives with seasonality,” Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, “Analysis of rainfall derivatives using daily precipitation models: opportunities and pitfalls,” Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, “Multisite generalization of a daily stochastic precipitation generation model,” Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, “Rainfall modelling using Poisson-cluster processes: a review of developments,” Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, “Pricing precipitation based derivatives,” International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, “Pricing rainfall futures at the CME,” Technical report, Humboldt University, Collaborative Research Center.
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, “A nonparametric model for stochastic generation of daily rainfall occurrence,” Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, “Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, “Estimating species richness by a Poisson-compound gamma model,” Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, “On the compound Poisson-gamma distribution,” Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, “Evaluation of Tweedie exponential dispersion model densities by Fourier inversion,” Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
The relationship $\mu = dk(\theta)/d\theta$ is invertible, so that $\theta$ can be expressed as a function of $\mu$; as such we have $\operatorname{Var}(Y) = \Theta V(\mu)$, where $V(\mu)$ is called the variance function.

Definition 4. The family of exponential dispersion models whose variance functions are of the form $V(\mu) = \mu^{p}$ for $p \in (-\infty, 0] \cup [1, \infty)$ are called Tweedie family distributions.

Examples are as follows: for $p = 0$ we have a normal distribution; for $p = 1$ and $\Theta = 1$ it is a Poisson distribution; for $p = 2$ it is a Gamma distribution; and for $p = 3$ it is an inverse Gaussian distribution. Tweedie densities cannot be expressed in closed form (apart from the examples above) but can instead be identified by their cumulant generating functions.
From $\operatorname{Var}(Y) = \Theta\, d^{2}k(\theta)/d\theta^{2}$, for the Tweedie family of distributions we have $d\mu/d\theta = \mu^{p}$, so that
$$\theta = \begin{cases} \dfrac{\mu^{1-p}}{1-p}, & p \neq 1, \\[1ex] \log \mu, & p = 1, \end{cases}$$
by equating the constants of integration above to zero. For $p \neq 1$ we have $\mu = [(1-p)\theta]^{1/(1-p)}$, so that
$$\int dk(\theta) = \int [(1-p)\theta]^{1/(1-p)}\, d\theta, \qquad k(\theta) = \frac{[(1-p)\theta]^{(2-p)/(1-p)}}{2-p} = \frac{\mu^{2-p}}{2-p}, \quad p \neq 2. \tag{22}$$
Proposition 5. The cumulant generating function of a Tweedie distribution for $1 < p < 2$ is
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{23}$$

Proof. From (16) the moment generating function is given by
$$M_Y(t) = \exp\left\{\frac{k(\theta + t\Theta) - k(\theta)}{\Theta}\right\}, \quad \text{so that} \quad \log M_Y(t) = \frac{k(\theta + t\Theta) - k(\theta)}{\Theta}.$$
For $1 < p < 2$ we substitute $\theta = \mu^{1-p}/(1-p)$ and $k(\theta) = \mu^{2-p}/(2-p)$ to have
$$\log M_Y(t) = \frac{1}{\Theta}\,\frac{\mu^{2-p}}{2-p}\left[\left(1 + t\Theta(1-p)\mu^{p-1}\right)^{(2-p)/(1-p)} - 1\right]. \tag{26}$$
By comparing the cumulant generating functions in Lemma 1 and Proposition 5, the compound Poisson process can be thought of as a Tweedie distribution with parameters $(\lambda, \alpha, P)$ expressed as follows:
$$\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad \alpha = \Theta(p-1)\mu^{p-1}, \qquad P = \frac{2-p}{p-1}. \tag{27}$$

The requirement that the Gamma shape parameter $P$ be positive implies that only Tweedie distributions with $1 < p < 2$ can represent the Poisson-Gamma compound process. In addition, $\lambda > 0$ and $\alpha > 0$ imply $\mu > 0$ and $\Theta > 0$.
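As a quick numerical check of (27), the sketch below maps hypothetical Tweedie parameters to $(\lambda, \alpha, P)$ and confirms that the compound Poisson-Gamma mean and variance reproduce $\mu$ and $\Theta\mu^{p}$. It assumes $\alpha$ is the Gamma scale (the reading under which the compound mean $\lambda P \alpha$ equals $\mu$); the function name and the input values are illustrative only.

```python
def tweedie_to_poisson_gamma(mu, theta, p):
    """Map Tweedie (mu, Theta, p) to Poisson-Gamma (lambda, alpha, P) as in (27)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-Gamma representation needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))  # Poisson rate of rainfall events
    alpha = theta * (p - 1.0) * mu ** (p - 1.0)  # Gamma scale of each event
    shape = (2.0 - p) / (p - 1.0)                # Gamma shape P
    return lam, alpha, shape


# Hypothetical parameter values, chosen only to exercise the formulas.
mu, theta, p = 3.0, 2.0, 1.5
lam, alpha, shape = tweedie_to_poisson_gamma(mu, theta, p)

# Moments of a Poisson sum of Gamma(shape, scale=alpha) variables.
compound_mean = lam * shape * alpha
compound_variance = lam * shape * (1.0 + shape) * alpha ** 2
```

Here `compound_mean` recovers `mu` and `compound_variance` recovers `theta * mu ** p`, which is the Tweedie variance function with dispersion.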
Proposition 6. Based on the Tweedie distribution, the probability of receiving no rainfall at all is
$$P(L = 0) = \exp\left[-\frac{\mu^{2-p}}{\Theta(2-p)}\right] \tag{28}$$
and the probability of having a rainfall event is
$$P(L > 0) = W(\lambda, \alpha, L, P)\exp\left[\frac{L}{(1-p)\mu^{p-1}} - \frac{\mu^{2-p}}{2-p}\right]. \tag{29}$$

Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches its maximum [15].
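Equation (28) is simple enough to evaluate directly. The sketch below does so for hypothetical parameter values; `prob_no_rain` is an illustrative name, not part of any library.

```python
import math


def prob_no_rain(mu, theta, p):
    """P(L = 0) = exp(-lambda), with lambda = mu^(2 - p) / (Theta (2 - p)) as in (28)."""
    return math.exp(-mu ** (2.0 - p) / (theta * (2.0 - p)))


# Hypothetical values: a low-mean (dry season) day and a high-mean (wet season) day.
dry_season = prob_no_rain(mu=0.5, theta=8.0, p=1.5)
wet_season = prob_no_rain(mu=10.0, theta=8.0, p=1.5)
```

For fixed $\Theta$ and $p$, a larger mean $\mu$ gives a larger event rate $\lambda$ and therefore a smaller probability of a dry day.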
3. Parameter Estimation

We approximate the function
$$W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^{j}(\alpha L)^{jP} e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j$$
following the procedure by [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by
$$\log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}, \tag{31}$$
where $j_{\max} = L^{2-p}/((2-p)\Theta)$.
$$\frac{L^{jP}(p-1)^{jP}\,\mu^{(2-p)j+(p-1)jP}}{\Theta^{j(1-P)}(2-p)^{j}\, j!\,\Gamma(jP)} \tag{33}$$

The term $\mu^{(2-p)j+(p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation
$$W(L, \Theta, P) = \sum_{j=1}^{\infty} \frac{L^{jP}(p-1)^{jP}}{\Theta^{j(1-P)}(2-p)^{j}\, j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} \frac{z^{j}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j, \quad \text{where } z = \frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)}. \tag{34}$$

Considering $W_j$ we have
$$\log W_j = j\log z - \log j! - \log\Gamma(Pj) = j\log z - \log\Gamma(j+1) - \log\Gamma(Pj). \tag{35}$$

Using Stirling's approximation of Gamma functions we have
$$\log W_j \approx j\left[\log z + (1+P) - P\log P - (1-P)\log j\right] - \log(2\pi) - \frac{1}{2}\log P - \log j. \tag{37}$$
For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$ we have
$$\frac{\partial \log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P\log(Pj) \approx \log z - \log j - P\log(Pj), \tag{38}$$
where $1/j$ is ignored for large $j$. Solving for $\partial \log W_j/\partial j = 0$ we have
$$j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}. \tag{39}$$

Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have
$$\log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta}\left[\log\frac{L^{P}(p-1)^{P}}{\Theta^{(1-P)}(2-p)} + (1+P) - P\log P - (1-P)\log\frac{L^{2-p}}{(2-p)\Theta}\right] - \log(2\pi) - \frac{1}{2}\log P - \log\frac{L^{2-p}}{(2-p)\Theta}. \tag{40}$$

Hence the result follows.
It can be observed that $\partial \log W_j/\partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:
$$W(L, \Theta, P) - \hat{W}(L, \Theta, P) < W_{j_d - 1}\,\frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1}\,\frac{1}{1 - r_u}, \qquad r_l = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_d - 1}, \quad r_u = \exp\left(\frac{\partial \log W_j}{\partial j}\right)\bigg|_{j = j_u + 1}. \tag{41}$$

For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed for only those terms which contribute significantly to the sum.
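The idea of summing only the significant terms can be sketched as follows. This is a simplified illustration, not the algorithm of [15]: it locates the largest term by direct search instead of using the closed form (39), works on the log scale with `math.lgamma`, and stops once terms fall below a relative tolerance. The names `log_w_term` and `log_w_series` and the inputs `log_z = 2.0`, `shape = 0.9` are hypothetical.

```python
import math


def log_w_term(j, log_z, shape):
    """log W_j = j log z - log Gamma(j + 1) - log Gamma(P j), as in equation (35)."""
    return j * log_z - math.lgamma(j + 1.0) - math.lgamma(shape * j)


def log_w_series(log_z, shape, rel_tol=1e-12, max_index=1_000_000):
    """Return the log of sum_j W_j, keeping only terms within rel_tol of the largest."""
    # The terms rise to a single peak and then fall, so walk up until they decrease.
    j_max = 1
    while (j_max < max_index
           and log_w_term(j_max + 1, log_z, shape) > log_w_term(j_max, log_z, shape)):
        j_max += 1
    log_peak = log_w_term(j_max, log_z, shape)
    log_cutoff = math.log(rel_tol)

    scaled_sum = 1.0  # the peak term divided by itself
    for step in (-1, 1):  # walk down from the peak, then up from it
        j = j_max + step
        while 1 <= j <= max_index:
            log_ratio = log_w_term(j, log_z, shape) - log_peak
            if log_ratio < log_cutoff:
                break
            scaled_sum += math.exp(log_ratio)
            j += step
    return log_peak + math.log(scaled_sum)


# Hypothetical inputs: in practice z comes from (34) and the shape P from (27).
log_w = log_w_series(log_z=2.0, shape=0.9)
```

Dividing every term by the peak before exponentiating avoids overflow when $z$ is large, which is the practical reason for working with $\log W_j$ rather than $W_j$.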
Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the choice of the distribution for a random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In case of a canonical link function, the sufficient statistics for $\beta_j$ are $\sum_{i=1}^{n} y_i x_{ij}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is
$$L(\beta) = \sum_{i=1}^{n} \frac{y_i\theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i, \Theta). \tag{42}$$
But $\theta_i = \sum_{j}^{p} \beta_j x_{ij}$; hence
$$\sum_{i}^{n} y_i\theta_i = \sum_{i=1}^{n} y_i \sum_{j}^{p} \beta_j x_{ij} = \sum_{j}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43}$$
Proposition 9. Given that $y_i$ is distributed as (16), then its distribution depends only on its first two moments, namely $\mu_i$ and $\operatorname{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are
$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\Theta\mu_i^{p}}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i}. \tag{46}$$
Since $\operatorname{Var}(y_i) = \Theta V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.

Clearly a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, as it cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$ as well as for diagnostic checking of the model.
Proposition 10. Under the standard regularity conditions, for large $n$ the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E(-\partial^{2}L(\beta)/\partial\beta_h\,\partial\beta_j)$, with elements
$$J_{hj} = \sum_{i=1}^{n} E\left[\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\, x_{ih}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\left(\frac{y_i - \mu_i}{\operatorname{Var}(y_i)}\, x_{ij}\, \frac{\partial \mu_i}{\partial \eta_i}\right)\right] = \sum_{i=1}^{n} \frac{x_{ih}x_{ij}}{\operatorname{Var}(y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^{2},$$
so that $J = X^{T}WX$, where $W = \operatorname{diag}[(1/\operatorname{Var}(y_i))(\partial\mu_i/\partial\eta_i)^{2}]$.

Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T}WX)^{-1}]$ distribution with $\widehat{\operatorname{Var}}(\hat{\beta}) = (X^{T}\hat{W}X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm proposed by Dobson and Barnett [17], where the iterations use the working weights
$$\frac{w_i}{V(\mu_i)\, g'(\mu_i)^{2}}, \tag{49}$$
where $V(\mu_i) = \mu_i^{p}$ and $g'$ is the derivative of the link function.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities have specified $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimate of $\beta$, $\Theta$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by
$$\hat{\Theta} = \sum_{i=1}^{n} \frac{[L_i - \mu_i(\hat{\beta})]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50}$$
Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
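A moment-based estimate of the dispersion built on the sum in (50) can be sketched as below. One assumption is added here and is not taken from the text: the Pearson sum is divided by the residual degrees of freedom $n - q$, with $q$ the number of regression coefficients, which is the usual normalization for a Pearson dispersion estimate. The function name and the data are hypothetical.

```python
def pearson_dispersion(y, mu, p, n_params):
    """Pearson-type estimate of Theta: sum of (y - mu)^2 / mu^p over (n - q)."""
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than regression coefficients")
    pearson_sum = sum((y_i - mu_i) ** 2 / mu_i ** p for y_i, mu_i in zip(y, mu))
    # Assumption: normalize by the residual degrees of freedom n - q.
    return pearson_sum / (n - n_params)


# Hypothetical observations and fitted means, with an intercept-only model (q = 1).
theta_hat = pearson_dispersion(y=[0.0, 2.0, 6.0], mu=[1.0, 1.0, 4.0], p=1.5, n_params=1)
```

With these inputs the three Pearson terms are 1, 1 and 0.5, so the estimate is 2.5 / 2 = 1.25.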
4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district (panel title: Rainfall for Balaka for 10 years; rainfall amount against year, 1995–2015).

Figure 2: Variance mean relationship (log(variance) against log(mean)).
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as
$$\log(\text{Variance}) = \alpha + \beta\log(\text{mean}), \tag{51}$$
$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52}$$
Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
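The slope $\beta$ in (51) can be estimated by ordinary least squares on the log scale. The sketch below assumes the data have already been grouped (for example by day of the year, which is an assumption about how such a plot would be built) into paired group means and variances; the function name and the synthetic inputs are hypothetical.

```python
import math


def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Synthetic group statistics that follow Variance = 3 * mean^1.5 exactly.
group_means = [0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
group_variances = [3.0 * m ** 1.5 for m in group_means]
alpha_hat, beta_hat = fit_power_law(group_means, group_variances)
```

A fitted slope between 1 and 2 is what points to the Poisson-Gamma member of the Tweedie family; groups with zero mean or zero variance would need to be dropped before taking logarithms.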
4.2. Fitted Model. To model the daily rainfall data we use sin and cos as predictors due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th for all the years, to be uniform in our modeling. In the fitted model, $i = 1, 2, \ldots, 365$ corresponds to days of the year and $a_0$, $a_1$, $a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.

From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were determined to take into account the seasonal variations in the stochastic model.
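The seasonal covariates can be illustrated with a short sketch. Two things here are assumptions rather than statements from the text: a log link between the linear predictor and the mean, and a period of exactly 365 days (consistent with the 28-day February convention). The coefficient values and function names are hypothetical.

```python
import math


def seasonal_covariates(day, period=365):
    """Intercept, sine and cosine covariates for a given day of the year."""
    angle = 2.0 * math.pi * day / period
    return [1.0, math.sin(angle), math.cos(angle)]


def predicted_mean(day, coeffs):
    """Mean rainfall for a day, assuming a log link on the linear predictor."""
    eta = sum(a * x for a, x in zip(coeffs, seasonal_covariates(day)))
    return math.exp(eta)


# Hypothetical coefficients (a0, a1, a2); not the estimates reported in Table 1.
coeffs = (0.8, 0.3, 1.5)
means = [predicted_mean(day, coeffs) for day in range(1, 366)]
```

A single sine and cosine pair yields one smooth annual cycle; adding higher harmonics would give the truncated Fourier series form.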
Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and actual mean, where the model generally behaves well.
4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$ and $\hat{\mu}$ be the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model with corresponding $\tilde{\mu} = y_i$.

Figure 4: Actual versus predicted mean (panel title: Rainfall means; mean against days of the year; actual mean and predicted mean series).
The goodness of fit is determined by the deviance, which is defined as
$$-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right] = 2\sum_{i=1}^{n} \frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n} \frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} = 2\sum_{i=1}^{n} \frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\operatorname{Dev}(y, \hat{\mu})}{\Theta}. \tag{56}$$
$\operatorname{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is
$$\operatorname{Dev}_p = 2\sum_{i=1}^{n}\left(\frac{y_i^{2-p} - (2-p)\,y_i\mu_i^{1-p} + (1-p)\,\mu_i^{2-p}}{(1-p)(2-p)}\right). \tag{57}$$

Based on results from fitting the model, the residual deviance is 43144, less than the null deviance of 62955, which implies that the fitted model explains the data better than a null model.
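Equation (57) translates directly into code. The sketch below is illustrative only; `tweedie_deviance` is a hypothetical helper, and the example values are chosen so the results can be checked by hand.

```python
def tweedie_deviance(y, mu, p):
    """Tweedie deviance of equation (57) for 1 < p < 2; exact zeros in y are allowed."""
    if not 1.0 < p < 2.0:
        raise ValueError("this form of the deviance assumes 1 < p < 2")
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        numerator = (y_i ** (2.0 - p)
                     - (2.0 - p) * y_i * mu_i ** (1.0 - p)
                     + (1.0 - p) * mu_i ** (2.0 - p))
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total


# A perfect fit contributes nothing; a dry day with fitted mean 4 contributes 8 at p = 1.5.
perfect_fit = tweedie_deviance([2.0], [2.0], p=1.5)
dry_day = tweedie_deviance([0.0], [4.0], p=1.5)
```

Computing this once with the fitted means and once with a constant mean gives the residual and null deviances that the paragraph above compares.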
4.4. Diagnostic Check. The model diagnostic is considered as a way of residual analysis. The fitted model faces challenges to be assessed, especially for days with no rainfall at all, as they produce spurious results and distracting patterns, similarly as observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (residuals against year, 1995–2015).

Figure 6: Q-Q plot of the quantile residuals (panel title: Normal Q-Q Plot; sample quantiles against theoretical quantiles).
So we assess the model based on quantile residuals whichremove the pattern in discrete data by adding the smallestamount of randomization necessary on the cumulative prob-ability scale
The quantile residuals are obtained by inverting thedistribution function for each response and finding theequivalent standard normal quantile
Mathematically let 119886119894 = lim119910uarr119910119894119865(119910 120583119894 Θ) and 119887119894 = 119865(119910119894120583119894 Θ) where 119865 is the cumulative function of the probability
density function 119891(119910 120583 Θ) then the randomized quantileresiduals for 119910119894 are
119903119902119894 = Φminus1 (119906119894) (58)
with 119906119894 being the uniform random variable on (119886119894 119887119894] Therandomized quantile residuals are distributed normally bar-ring the variability in 120583 and Θ
Figure 6 shows the normalized Q-Q plot and as canbe observed there are no large deviations from the straightline only small deviations at the tail The linearity observedindicates an acceptable fitted model
Rainfall amount
Days of the year1 44 95 153 218 283 348 413 478 543 608 673
PredictedActual
020
60
100
Am
ount
Figure 7 Simulated rainfall and observed rainfall
Probability of no rainfall
065
075
085
095
Prob
abili
ty
100 200 3000Days of the year
Figure 8 Probability of rainfall occurrence
5 Simulation
Themodel is simulated to test whether it produces data withsimilar characteristics to the actual observed rainfall Thesimulation is done for a period of two years where one wasthe last year of the data (2015) and the other year (2016) wasa future prediction Then comparison was done with a graphfor 2015 data as shown in Figure 7
The different statistics of the simulated data and actualdata are shown in Table 2 for comparisons
The main objective of simulation is to demonstrate thatthe Poisson-Gamma can be used to predict and forecastrainfall occurrence and intensity simultaneously Based onthe results above (Figure 8) the model has shown that itworks well in predicting the rainfall intensity and hence canbe used in agriculture actuarial science hydrology and so on
However the model performed poorly in predictingprobability of rainfall occurrence as it underestimated theprobability of rainfall occurrence It is suggested here thatprobably the use of truncated Fourier series can improve thisestimation as compared to the sinusoidal
But it performed better in predicting probability of norainfall on days where there was little or no rainfall asindicated in Figure 8
It can also be observed that the model produces syntheticprecipitation that agrees with the four characteristics of astochastic precipitation model as suggested by [4] as followsThe probability of rainfall occurrence obeys a seasonal pat-tern (Figure 8) in addition we can also tell that a probabilityof a rain in a day is higher if the previous day was wetwhich is the basis of precipitation models that involve the
Journal of Probability and Statistics 11
Table 2 Data statistics
Min 1st Qu Median Mean 3rd Qu MaxPredicted data 000 000 000 3314 000 1165Actual data [10 yrs] 000 000 000 3183 0300 1237Actual data [2015] 000 000 000 3328 000 845Markov process From Figure 7 we can also observe variationof rainfall intensity based on time of the season
In addition the model allows modeling of exact zeros inthe data and is able to predict a probability of no rainfall eventsimultaneously
6 Conclusion
A daily stochastic rainfall model was developed based ona compound Poisson process where rainfall events followa Poisson distribution and the intensity is independentof events following a Gamma distribution Unlike severalresearches that have been carried out into precipitationmodeling whereby two models are developed for occurrenceand intensity the model proposed here is able to model bothprocesses simultaneously The proposed model is also ableto model the exact zeros the event of no rainfall whichis not the case with the other models This precipitationmodel is an important tool to study the impact of weatheron a variety of systems including ecosystem risk assessmentdrought predictions and weather derivatives as we can beable to simulate synthetic rainfall data The model providesmechanisms for understanding the fine scale structure likenumber and mean of rainfall events mean daily rainfalland probability of rainfall occurrence This is applicable inagriculture activities disaster preparedness and water cyclesystems
The model developed can easily be used for forecastingfuture events and in terms ofweather derivatives theweatherindex can be derived from simulating a sample path bysumming up daily precipitation in the relevant accumulationperiod Rather than developing a weather index which is notflexible enough to forecast future events we can use thismodel in pricing weather derivatives
Proof. This follows by directly substituting the values of $\lambda$ and $\theta$, $k(\theta)$ into (16).

The function $W(\lambda, \alpha, L, P)$ is an example of Wright's generalized Bessel function; however, it cannot be expressed in terms of the more common Bessel functions. To evaluate it, the value of $j$ is determined for which the function $W_j$ reaches its maximum [15].
3. Parameter Estimation

We approximate the function
\[ W(\lambda, \alpha, L, P) = \sum_{j=1}^{\infty} \frac{\lambda^{j} (\alpha L)^{jP} e^{-\lambda}}{j!\,\Gamma(jP)} = \sum_{j=1}^{\infty} W_j \]
following the procedure by [15], where the value of $j$ is determined for which $W_j$ reaches its maximum. We treat $j$ as continuous, so that $W_j$ is differentiated with respect to $j$ and the derivative is set to zero. So for $L > 0$ we have the following.

Lemma 7 (see [15]). The log maximum approximation of $W_j$ is given by
\[ \log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta} \left[ \log \frac{L^{P} (p-1)^{P}}{\Theta^{(1-P)} (2-p)} + (1+P) - P \log P - (1-P) \log \frac{L^{2-p}}{(2-p)\Theta} \right] - \log(2\pi) - \frac{1}{2} \log P - \log \frac{L^{2-p}}{(2-p)\Theta}, \tag{31} \]
where $j_{\max} = \dfrac{L^{2-p}}{(2-p)\Theta}$.
\[ \frac{L^{jP} (p-1)^{jP}\, \mu^{(2-p)j + (p-1)jP}}{\Theta^{j(1-P)} (2-p)^{j}\, j!\, \Gamma(jP)} \tag{33} \]

The term $\mu^{(2-p)j + (p-1)jP}$ depends on the $L$, $p$, $P$, $\Theta$ values, so we maximize the summation
\[ W(L, \Theta, P) = \sum_{j=1}^{\infty} \frac{L^{jP} (p-1)^{jP}}{\Theta^{j(1-P)} (2-p)^{j}\, j!\, \Gamma(jP)} = \sum_{j=1}^{\infty} \frac{z^{j}}{j!\, \Gamma(jP)} = \sum_{j=1}^{\infty} W_j, \quad \text{where } z = \frac{L^{P} (p-1)^{P}}{\Theta^{(1-P)} (2-p)}. \tag{34} \]

Considering $W_j$, we have
\[ \log W_j = j \log z - \log j! - \log \Gamma(Pj) = j \log z - \log \Gamma(j+1) - \log \Gamma(Pj). \tag{35} \]

Using Stirling's approximation of Gamma functions, we have
\[ \log W_j \approx j \left[ \log z + (1+P) - P \log P - (1-P) \log j \right] - \log(2\pi) - \frac{1}{2} \log P - \log j. \tag{37} \]

For $1 < p < 2$ we have $P = (2-p)/(p-1) > 0$; hence the logarithms have positive arguments. Differentiating with respect to $j$, we have
\[ \frac{\partial \log W_j}{\partial j} \approx \log z - \frac{1}{j} - \log j - P \log(Pj) \approx \log z - \log j - P \log(Pj), \tag{38} \]
where $1/j$ is ignored for large $j$. Solving for $\partial \log W_j / \partial j = 0$, we have
\[ j_{\max} = \frac{L^{2-p}}{(2-p)\Theta}. \tag{39} \]

Substituting $j_{\max}$ in $\log W_j$ to find the maximum approximation of $W_j$, we have
\[ \log W_{\max} = \frac{L^{2-p}}{(2-p)\Theta} \left[ \log \frac{L^{P} (p-1)^{P}}{\Theta^{(1-P)} (2-p)} + (1+P) - P \log P - (1-P) \log \frac{L^{2-p}}{(2-p)\Theta} \right] - \log(2\pi) - \frac{1}{2} \log P - \log \frac{L^{2-p}}{(2-p)\Theta}. \tag{40} \]

Hence the result follows.
It can be observed that $\partial \log W_j / \partial j$ is monotonically decreasing; hence $\log W_j$ is strictly concave as a function of $j$. Therefore $W_j$ decays faster than geometrically on either side of $j_{\max}$ [15]. Therefore, if we are to estimate $W(L, \Theta, P)$ by $\hat{W}(L, \Theta, P) = \sum_{j=j_d}^{j_u} W_j$, the approximation error is bounded by a geometric sum:
\[ \begin{aligned} W(L, \Theta, P) - \hat{W}(L, \Theta, P) &< W_{j_d - 1} \frac{1 - r_l^{\,j_d - 1}}{1 - r_l} + W_{j_u + 1} \frac{1}{1 - r_u}, \\ r_l &= \exp\left( \frac{\partial \log W_j}{\partial j} \right) \bigg|_{j = j_d - 1}, \qquad r_u = \exp\left( \frac{\partial \log W_j}{\partial j} \right) \bigg|_{j = j_u + 1}. \end{aligned} \tag{41} \]

For quick and accurate evaluation of $W(\lambda, \alpha, L, P)$, the series is summed over only those terms which contribute significantly to the sum.
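The truncated summation described above can be sketched in Python. This is a minimal illustration and not the authors' implementation: it evaluates $\log W(\lambda, \alpha, L, P)$ from the series definition given at the start of this section, locates the largest term by a simple upward scan (an assumption made for simplicity, in place of the closed-form $j_{\max}$), and adds neighbouring terms until they fall below a relative tolerance. The function names and the tolerance `log_tol` are hypothetical.

```python
import math


def log_w_term(j, lam, alpha, L, P):
    """log of W_j = lam^j (alpha*L)^(j*P) e^(-lam) / (j! Gamma(j*P))."""
    return (j * math.log(lam) + j * P * math.log(alpha * L) - lam
            - math.lgamma(j + 1) - math.lgamma(j * P))


def log_w_series(lam, alpha, L, P, log_tol=-37.0):
    """Return (log W, j_peak), summing only terms within exp(log_tol) of the peak.

    Assumes lam, alpha, L, P > 0. log W_j is concave in j, so the terms
    rise to a single peak and then fall away on both sides.
    """
    # Locate the largest term by scanning upward from j = 1.
    j_peak = 1
    while (log_w_term(j_peak + 1, lam, alpha, L, P)
           > log_w_term(j_peak, lam, alpha, L, P)):
        j_peak += 1
    peak = log_w_term(j_peak, lam, alpha, L, P)

    total = 1.0  # the peak term itself, scaled by exp(-peak)
    j = j_peak - 1
    while j >= 1:  # walk down from the peak
        diff = log_w_term(j, lam, alpha, L, P) - peak
        if diff < log_tol:
            break
        total += math.exp(diff)
        j -= 1
    j = j_peak + 1
    while True:  # walk up from the peak
        diff = log_w_term(j, lam, alpha, L, P) - peak
        if diff < log_tol:
            break
        total += math.exp(diff)
        j += 1
    return peak + math.log(total), j_peak
```

Working on the log scale and rescaling by the peak term avoids overflow when individual terms are very large or very small.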
Generalized linear models extend the standard linear regression models to incorporate nonnormal response distributions and possibly nonlinear functions of the mean. The advantage of GLMs is that the fitting process maximizes the likelihood for the chosen distribution of the random variable $y$, and the choice is not restricted to normality, unlike linear regression [16].

The exponential dispersion models are the response distributions for the generalized linear models. Tweedie distributions are members of the exponential dispersion models upon which the generalized linear models are based. Consequently, fitting a Tweedie distribution follows the framework of fitting a generalized linear model.
Lemma 8. In case of a canonical link function, the sufficient statistics for $\{\beta_j\}$ are $\left\{ \sum_{i=1}^{n} y_i x_{ij} \right\}$.

Proof. For $n$ independent observations $y_i$ of the exponential dispersion model (16), the log-likelihood function is
\[ L(\beta) = \sum_{i=1}^{n} \frac{y_i \theta_i - k(\theta_i)}{\Theta} + \sum_{i=1}^{n} \log a(y_i, \Theta). \tag{42} \]

But $\theta_i = \sum_{j}^{p} \beta_j x_{ij}$; hence
\[ \sum_{i=1}^{n} y_i \theta_i = \sum_{i=1}^{n} y_i \sum_{j}^{p} \beta_j x_{ij} = \sum_{j}^{p} \beta_j \sum_{i=1}^{n} y_i x_{ij}. \tag{43} \]
Proposition 9. Given that $y_i$ is distributed as (16), then its distribution depends only on its first two moments, namely $\mu_i$ and $\mathrm{Var}(y_i)$.

Proof. Let $g(\mu_i)$ be the link function of the GLM such that $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$. The likelihood equations are
\[ \frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\mathrm{Var}(y_i)} x_{ij} \frac{\partial \mu_i}{\partial \eta_i} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\Theta \mu_i^{p}} x_{ij} \frac{\partial \mu_i}{\partial \eta_i}. \tag{46} \]

Since $\mathrm{Var}(y_i) = V(\mu_i)$, the relationship between the mean and variance characterizes the distribution.
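As a concrete reading of the likelihood equations (46), the sketch below evaluates the score vector for a Tweedie GLM. It assumes a log link, so that $\partial \mu_i / \partial \eta_i = \mu_i$; the link choice and the function name `tweedie_score` are illustrative assumptions rather than details taken from the paper.

```python
import math


def tweedie_score(X, y, beta, p, theta=1.0):
    """Score vector: sum_i (y_i - mu_i) / (theta * mu_i^p) * x_ij * dmu_i/deta_i.

    Assumes a log link, mu_i = exp(eta_i), hence dmu_i/deta_i = mu_i.
    X is a list of covariate rows, y a list of responses, beta a list
    of coefficients.
    """
    score = [0.0] * len(beta)
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        mu = math.exp(eta)
        # (y - mu) / Var(y) times dmu/deta, shared by every covariate
        common = (yi - mu) / (theta * mu ** p) * mu
        for j, xij in enumerate(row):
            score[j] += common * xij
    return score
```

Only the mean $\mu_i$ and the variance $\Theta \mu_i^{p}$ enter the computation, which is the point of Proposition 9: the score can be evaluated without the Tweedie density itself.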
Clearly, a GLM only requires the first two moments of the response $y_i$; hence, despite the difficulty of full likelihood analysis of the Tweedie distribution, whose density cannot be expressed in closed form for $1 < p < 2$, we can still fit a Tweedie distribution family. The likelihood is only required to estimate $p$ and $\Theta$, as well as for diagnostic checks of the model.
Proposition 10. Under the standard regularity conditions, for large $n$, the maximum likelihood estimator $\hat{\beta}$ of $\beta$ for a generalized linear model is efficient and has an approximate normal distribution.

Proof. From the log-likelihood, the covariance matrix of the distribution is the inverse of the information matrix $J = E\left( -\partial^{2} L(\beta) / \partial \beta_h\, \partial \beta_j \right)$,
\[ E\left[ \left( \frac{y_i - \mu_i}{\mathrm{Var}(y_i)} x_{ih} \frac{\partial \mu_i}{\partial \eta_i} \right) \left( \frac{y_i - \mu_i}{\mathrm{Var}(y_i)} x_{ij} \frac{\partial \mu_i}{\partial \eta_i} \right) \right], \]
where $W = \mathrm{diag}\left[ (1/\mathrm{Var}(y_i)) (\partial \mu_i / \partial \eta_i)^{2} \right]$. Therefore $\hat{\beta}$ has an approximate $N[\beta, (X^{T} W X)^{-1}]$ distribution with $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^{T} \hat{W} X)^{-1}$, where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.

To compute $\hat{\beta}$ we use the iteratively reweighted least squares algorithm as described by Dobson and Barnett [17], where the iterations use the working weights
\[ \frac{w_i}{V(\mu_i)\, g'(\mu_i)^{2}}, \tag{49} \]
where $V(\mu_i) = \mu_i^{p}$.

However, estimating $p$ is more difficult than estimating $\beta$ and $\Theta$, such that most researchers working with Tweedie densities fix $p$ a priori. In this study we use the procedure in [15], where the maximum likelihood estimator of $p$ is obtained by directly maximizing the profile likelihood function. For any given value of $p$ we find the maximum likelihood estimate of $\beta$, $\Theta$ and compute the log-likelihood function. This is repeated several times until we have a value of $p$ which maximizes the log-likelihood function.
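The iteratively reweighted least squares step, which is run for each candidate value of $p$, can be sketched as follows. The sketch assumes a log link and unit prior weights, so the working weight reduces to $\mu_i^{2-p}$; it uses a small pure-Python linear solver to stay self-contained. The function names, the starting values, and the convergence tolerance are illustrative assumptions, not the authors' code.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def irls_tweedie_log_link(X, y, p, max_iter=100, tol=1e-10):
    """Fit beta for fixed p with V(mu) = mu^p and a log link (unit prior weights)."""
    n, q = len(X), len(X[0])
    y_bar = sum(y) / n
    eta = [math.log(y_bar)] * n  # start every fitted mean at the sample mean
    beta = [0.0] * q
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # log link: g'(mu) = 1/mu, so 1 / (V(mu) g'(mu)^2) equals mu^(2 - p)
        w = [m ** (2.0 - p) for m in mu]
        # working response: eta + (y - mu) * g'(mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(q)]
                for a in range(q)]
        xtwz = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(q)]
        new_beta = solve_linear(xtwx, xtwz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
        if step < tol:
            break
    return beta
```

For an intercept-only model the fitted mean equals the sample mean for any $p$, which gives a quick sanity check on the iteration.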
Given the estimated values of $p$ and $\beta$, the unbiased estimator of $\Theta$ is given by
\[ \hat{\Theta} = \sum_{i=1}^{n} \frac{\left[ L_i - \mu_i(\hat{\beta}) \right]^{2}}{\mu_i(\hat{\beta})^{p}}. \tag{50} \]

Since for $1 < p < 2$ the Tweedie density cannot be expressed in closed form, it is recommended that the maximum likelihood estimate of $\Theta$ be computed iteratively from the full data [15].
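The sum in equation (50) is the Pearson statistic for the variance function $V(\mu) = \mu^{p}$. A small sketch is given below. The second function divides the statistic by the residual degrees of freedom, which is the usual form of the Pearson dispersion estimator in GLM practice; that divisor is an assumption here, since no normalizing factor is visible in equation (50) as printed. Both function names are hypothetical.

```python
def pearson_statistic(y, mu, p):
    """Sum of squared Pearson residuals, sum_i (y_i - mu_i)^2 / mu_i^p."""
    return sum((yi - mi) ** 2 / mi ** p for yi, mi in zip(y, mu))


def pearson_dispersion(y, mu, p, n_params):
    """Pearson statistic divided by the residual degrees of freedom (assumed divisor)."""
    return pearson_statistic(y, mu, p) / (len(y) - n_params)
```

Here `mu` holds the fitted means $\mu_i(\hat{\beta})$ and `n_params` is the number of estimated regression coefficients.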
4. Data and Model Fitting

4.1. Data Analysis. Daily rainfall data of Balaka district in Malawi covering the period 1995–2015 is used. The data was obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.

In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district (panel title: "Rainfall for Balaka for 10 years"; x-axis: Year, 1995–2015; y-axis: Amount).
Figure 2: Variance mean relationship (x-axis: log(mean); y-axis: log(variance)).
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the log of the variance and the log of the mean, which can be expressed as
\[ \log(\text{Variance}) = \alpha + \beta \log(\text{mean}), \tag{51} \]
\[ \text{Variance} = A \cdot \text{mean}^{\beta}, \quad A \in \mathbb{R}. \tag{52} \]

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
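The log-log relationship in (51) and (52) can be checked with an ordinary least-squares line. The sketch below takes group-wise means and variances as input; how the rainfall data were grouped to produce Figure 2 is not stated in this section, so the grouping is left to the caller. The function name is hypothetical.

```python
import math


def fit_power_variance(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Returns (alpha, beta, A) with A = exp(alpha), so variance ~ A * mean^beta.
    All inputs must be strictly positive.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (yv - y_bar) for x, yv in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta, math.exp(alpha)
```

A fitted slope between 1 and 2 would be consistent with the Poisson-Gamma range of the Tweedie power parameter.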
4.2. Fitted Model. To model the daily rainfall data we use sin and cos as predictors, due to the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th for all the years, to be uniform in our modeling. Here $i = 1, 2, \ldots, 365$ corresponds to days of the year, and $a_0$, $a_1$, $a_2$ are the coefficients of regression.

In the first place we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.
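The outer loop of the profile-likelihood search for $p$ can be sketched as a grid search. The inner fit, which would estimate $\beta$ and $\Theta$ for a fixed $p$ and return the maximized log-likelihood, is passed in as a callable; `fit_given_p` and `toy_inner_fit` are hypothetical names, and the toy function is only a smooth stand-in so that the sketch runs, not a Tweedie likelihood.

```python
def profile_likelihood_search(p_grid, fit_given_p):
    """Return (best_p, best_loglik, curve) over a grid of candidate p values.

    fit_given_p(p) must return the log-likelihood maximized over the
    remaining parameters with p held fixed.
    """
    curve = [(p, fit_given_p(p)) for p in p_grid]
    best_p, best_ll = max(curve, key=lambda pair: pair[1])
    return best_p, best_ll, curve


def toy_inner_fit(p):
    """Illustrative stand-in for the inner fit: a smooth curve with one peak."""
    return -(p - 1.53) ** 2


# Candidate values 1.05, 1.06, ..., 1.95 inside the Poisson-Gamma range (1, 2).
grid = [1.05 + 0.01 * k for k in range(91)]
best_p, best_ll, curve = profile_likelihood_search(grid, toy_inner_fit)
```

The returned `curve` is what a plot such as Figure 3 would display; a finer grid around the best value can be used to refine the estimate.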
From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were determined to take into account the seasonal variations in the stochastic model.
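The seasonal covariates can be illustrated with a short sketch that builds one design row per day of the year. The angle $2\pi i / 365$ and the log link used in `seasonal_mean` are assumptions made for illustration, since the fitted-model equation is not reproduced in this section; the function names are hypothetical.

```python
import math


def seasonal_design_row(day, period=365):
    """Covariate row [1, sin(2*pi*day/period), cos(2*pi*day/period)]."""
    angle = 2.0 * math.pi * day / period
    return [1.0, math.sin(angle), math.cos(angle)]


def seasonal_mean(day, a0, a1, a2, period=365):
    """Mean rainfall under an assumed log link: exp(a0 + a1*sin + a2*cos)."""
    _, s, c = seasonal_design_row(day, period)
    return math.exp(a0 + a1 * s + a2 * c)


# One row per day of a 365-day year (29 February dropped, as in the text).
X = [seasonal_design_row(i) for i in range(1, 366)]
```

Because the covariates are periodic, day 366 maps to the same row as day 1, so the fitted mean curve closes smoothly across the year boundary.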
Comparing the actual means and the predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated mean and actual mean, where the model behaves well generally.
Figure 4: Actual versus predicted mean (panel title: "Rainfall means"; series: actual mean, predicted mean; x-axis: Days of the year; y-axis: Mean).

4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$ and $\hat{\mu}$ as the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model with corresponding $\tilde{\mu}_i = y_i$.
The goodness of fit is determined by deviance, which is defined as
\[ -2 \log \left[ \frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}} \right] = 2 \sum_{i=1}^{n} \frac{y_i \tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2 \sum_{i=1}^{n} \frac{y_i \hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} = 2 \sum_{i=1}^{n} \frac{y_i (\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\mathrm{Dev}(y, \hat{\mu})}{\Theta}. \tag{56} \]
$\mathrm{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fitted model, as maximizing the likelihood corresponds to minimizing the deviance.

In terms of Tweedie distributions with $1 < p < 2$, the deviance is
\[ \mathrm{Dev}_p = 2 \sum_{i=1}^{n} \left( \frac{y_i^{2-p} - (2-p)\, y_i\, \mu_i^{1-p} + (1-p)\, \mu_i^{2-p}}{(1-p)(2-p)} \right). \tag{57} \]

Based on results from fitting the model, the residual deviance is 43144, less than the null deviance 62955, which implies that the fitted model explains the data better than a null model.
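Equation (57) translates directly into code. The sketch below computes the total Tweedie deviance for $1 < p < 2$; the function name is hypothetical. The residual deviance uses the fitted means, while a null deviance would use a single common mean for every observation.

```python
def tweedie_deviance(y, mu, p):
    """Total deviance of equation (57) for 1 < p < 2, with y >= 0 and mu > 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        numerator = (yi ** (2 - p)
                     - (2 - p) * yi * mi ** (1 - p)
                     + (1 - p) * mi ** (2 - p))
        total += numerator / ((1 - p) * (2 - p))
    return 2.0 * total
```

The expression stays finite at $y_i = 0$, which is why dry days contribute to the deviance without special handling.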
4.4. Diagnostic Check. The model diagnostic is considered as a way of residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as they produce spurious results and distracting patterns, similar to what was observed by [15]. Since this is a nonnormal regression, residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (panel title: "Residuals"; x-axis: Year, 1995–2015; y-axis: Residuals).

Figure 6: Q-Q plot of the quantile residuals (panel title: "Normal Q-Q Plot"; x-axis: Theoretical quantiles; y-axis: Sample quantiles).
So we assess the model based on quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.

The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.

Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \hat{\mu}_i, \hat{\Theta})$ and $b_i = F(y_i; \hat{\mu}_i, \hat{\Theta})$, where $F$ is the cumulative distribution function of the probability density function $f(y; \mu, \Theta)$; then the randomized quantile residuals for $y_i$ are
\[ r_{q,i} = \Phi^{-1}(u_i), \tag{58} \]
with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are distributed normally, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
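The randomization step in (58) can be sketched on its own, leaving the evaluation of the distribution function $F$ to the caller. For a dry day, $a_i = 0$ and $b_i$ is the fitted probability of no rainfall; for a wet day the distribution is continuous, so $a_i = b_i$ and no randomization takes place. The function name and the clamping constants are assumptions made for this sketch.

```python
import random
from statistics import NormalDist


def randomized_quantile_residual(a, b, rng=random):
    """Return Phi^{-1}(u) with u drawn uniformly on (a, b].

    When a == b the draw collapses to u = b, the continuous case.
    """
    # rng.random() lies in [0, 1), so u lies in (a, b].
    u = b - (b - a) * rng.random()
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    u = min(max(u, 1e-300), 1.0 - 1e-16)
    return NormalDist().inv_cdf(u)
```

Passing a seeded `random.Random` instance as `rng` makes the residuals reproducible between runs.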
Figure 6 shows the normal Q-Q plot and, as can be observed, there are no large deviations from the straight line, only small deviations at the tail. The linearity observed indicates an acceptable fitted model.
Figure 7: Simulated rainfall and observed rainfall (panel title: "Rainfall amount"; series: Predicted, Actual; x-axis: Days of the year; y-axis: Amount).

Figure 8: Probability of rainfall occurrence (panel title: "Probability of no rainfall"; x-axis: Days of the year; y-axis: Probability).
5. Simulation

The model is simulated to test whether it produces data with similar characteristics to the actual observed rainfall. The simulation is done for a period of two years, where one was the last year of the data (2015) and the other year (2016) was a future prediction. A comparison was then made with a graph of the 2015 data, as shown in Figure 7.
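A simulation of this kind can be sketched from the compound Poisson construction itself: draw a Poisson number of rainfall events for the day, then add up that many Gamma-distributed intensities. The daily parameters `lams` and `scales` would come from the fitted model; here they are plain inputs, and all function names are hypothetical. The Poisson draw uses Knuth's multiplication method, chosen only to keep the sketch within the standard library.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_daily_rain(lam, shape, scale, rng):
    """One day's total: a Poisson number of events, each with a Gamma amount."""
    n_events = poisson_draw(lam, rng)
    return sum((rng.gammavariate(shape, scale) for _ in range(n_events)), 0.0)


def simulate_year(lams, shape, scales, seed=0):
    """Simulate one value per day from per-day Poisson rates and Gamma scales."""
    rng = random.Random(seed)
    return [simulate_daily_rain(lam, shape, s, rng) for lam, s in zip(lams, scales)]
```

Days with zero events return exactly 0.0, so the simulated series contains exact zeros in the same way as observed rainfall.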
The different statistics of the simulated data and actual data are shown in Table 2 for comparison.

The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model has shown that it works well in predicting the rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.

However, the model performed poorly in predicting the probability of rainfall occurrence, as it underestimated that probability. It is suggested here that the use of a truncated Fourier series could improve this estimation compared with the sinusoidal predictors.

But it performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model as suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, we can also tell that the probability of rain on a day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity based on time of the season.

Table 2: Data statistics.

                        Min    1st Qu.  Median  Mean   3rd Qu.  Max
Predicted data          0.00   0.00     0.00    3.314  0.00     116.5
Actual data [10 yrs]    0.00   0.00     0.00    3.183  0.300    123.7
Actual data [2015]      0.00   0.00     0.00    3.328  0.00     84.5
In addition, the model allows modeling of exact zeros in the data and is able to predict the probability of a no-rainfall event simultaneously.
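The probability of a dry day follows from the Poisson part of the model: it is the probability of zero rainfall events. The sketch below uses the standard correspondence between Tweedie parameters $(\mu, \Theta, p)$ with $1 < p < 2$ and the compound Poisson-Gamma parameters. That the paper's earlier sections use exactly this parameterization is an assumption, although the Gamma shape agrees with $P = (2-p)/(p-1)$ above. The function names are hypothetical.

```python
import math


def poisson_gamma_params(mu, theta, p):
    """Map Tweedie (mu, theta, p), 1 < p < 2, to (Poisson rate, Gamma shape, Gamma scale)."""
    lam = mu ** (2 - p) / (theta * (2 - p))   # expected number of rainfall events
    shape = (2 - p) / (p - 1)                 # Gamma shape of each event's amount
    scale = theta * (p - 1) * mu ** (p - 1)   # Gamma scale of each event's amount
    return lam, shape, scale


def prob_no_rain(mu, theta, p):
    """P(Y = 0) = exp(-lam): no rainfall event occurs on the day."""
    lam, _, _ = poisson_gamma_params(mu, theta, p)
    return math.exp(-lam)
```

The mapping preserves the mean, since rate times shape times scale equals $\mu$, which is a useful check when moving between the two parameterizations.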
6. Conclusion

A daily stochastic rainfall model was developed based on a compound Poisson process, where rainfall events follow a Poisson distribution and the intensity, independent of the events, follows a Gamma distribution. Unlike several studies of precipitation modeling in which two models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool to study the impact of weather on a variety of systems, including ecosystems, risk assessment, drought predictions, and weather derivatives, since it can simulate synthetic rainfall data. The model provides mechanisms for understanding the fine scale structure, like the number and mean of rainfall events, mean daily rainfall, and probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.

The model developed can easily be used for forecasting future events, and in terms of weather derivatives the weather index can be derived from simulating a sample path by summing up daily precipitation in the relevant accumulation period. Rather than developing a weather index which is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.

Rainfall data is generally zero inflated in that the amount of rainfall received on a day can be zero with a positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Hence expressing it in terms of a Tweedie distribution makes estimating the parameters easy. In addition, Tweedie distributions belong to the exponential family of distributions upon which generalized linear models are based; hence there is an already existing framework in place for fitting and diagnostic testing of the model.

The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps in reducing underestimation of overdispersion of the data, which is also common in the aforementioned models.
Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.
References

[1] A. Hussain, "Stochastic modeling of rainfall processes: a markov chain-mixed exponential model for rainfalls in different climatic conditions"
[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
Journal of Probability and Statistics 7
where 1119895 is ignored for large 119895 Solving for (120597 log119882119895)120597119895 = 0we have
119895max = 1198712minus119901(2 minus 119901)Θ (39)
Substituting 119895max in log119882119895 to find the maximum approxima-tion of119882119895 we have
log119882max = 1198712minus119901(2 minus 119901)Θ [log 119871119875 (119901 minus 1)119875Θ(1minus119875) (2 minus 119901) + (1 + 119875)minus 119875 log119875 minus (1 minus 119875) log 1198712minus119901(2 minus 119901)Θ] minus log (2120587) minus 12sdot log119875 minus log 1198712minus119901(2 minus 119901)Θ
(40)
Hence the result follows
It can be observed that 120597119882119895120597119895 is monotonically decreas-ing hence log119882119895 is strictly convex as a function of 119895Therefore 119882119895 decays faster than geometrically on either sideof 119895max [15] Therefore if we are to estimate 119882(119871Θ 119875) by(119871 Θ 119875) = sum119895119906
119895=119895119889119882119895 the approximation error is bounded
by geometric sum
119882(119871Θ 119875) minus (119871 Θ 119875)lt 119882119895119889minus1
1 minus 119903119895119889minus11198971 minus 119903119897 + 119882119895119906+1
11 minus 119903119906 119903119897 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119889 minus 1119903119906 = exp(120597119882119895120597119895 )100381610038161003816100381610038161003816100381610038161003816119895 = 119895119906 + 1
(41)
For quick and accurate evaluation of119882(120582 120572 119871 119875) the seriesis summed for only those terms in the series which contributesignificantly to the sum
Generalized linear models extend the standard linearregressionmodels to incorporate nonnormal response distri-butions and possibly nonlinear functions of the mean Theadvantage of GLMs is that the fitting process maximizes thelikelihood for the choice of the distribution for a randomvariable 119910 and the choice is not restricted to normality unlikelinear regression [16]
The exponential dispersion models are the responsedistributions for the generalized linear models Tweedie dis-tributions are members of the exponential dispersion modelsupon which the generalized linear models are based Conse-quently fitting a Tweedie distribution follows the frameworkof fitting a generalized linear model
Lemma 8 In case of a canonical link function the sufficientstatistics for 120573119895 are sum119899
119894=1 119910119894119909119894119895
Proof For 119899 independent observations 119910119894 of the exponentialdispersion model (16) the log-likelihood function is
119910119894120579119894 minus 119896 (120579119894)Θ + 119899sum119894
log 119886 (119910119894 Θ) (42)
But 120579119894 = sum119901119895 120573119895119909119894119895 hence
119899sum119894
119910119894120579119894 = 119899sum119894=1
119910119894 119901sum119895
120573119895119909119894119895 = 119901sum119895
120573119895 119899sum119894=1
119910119894119909119894119895 (43)
Proposition 9 Given that 119910119894 is distributed as (16) then itsdistribution depends only on its first two moments namely 120583119894and Var(119910119894)Proof Let 119892(120583119894) be the link function of the GLM such that120578119894 = sum119901
119895=1 120573119895119909119894119895 = 119892(120583119894) The likelihood equations are
120597119871 (120573)120597120573 = 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894 = 119910119894 minus 120583119894Θ120583119901119894 119909119894119895 120597120583119894120597120578119894 (46)
Since Var(119910119894) = 119881(120583119894) the relationship between themean andvariance characterizes the distribution
Clearly a GLM only requires the first two moments ofthe response 119910119894 hence despite the difficulty of full likelihoodanalysis of Tweedie distribution as it can not be expressedin closed form for 1 lt 119901 lt 2 we can still fit aTweedie distribution family The likelihood is only requiredto estimate 119901 and Θ as well as diagnostic check of the model
Proposition 10 Under the standard regularity conditions forlarge 119899 the maximum likelihood estimator 120573 of 120573 for general-ized linear model is efficient and has an approximate normaldistribution
Proof From the log-likelihood the covariance matrix of thedistribution is the inverse of the information matrix J =E(minus1205972119871(120573)120597120573ℎ120597120573119895)
Var (119910119894)119909119894ℎ 120597120583119894120597120578119894)( 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894)]
where119882 = diag[(1Var(119910119894))(120597120583119894120597120578119894)2]Therefore 120573 has an approximate 119873[120573 (119883119879119882119883)minus1] with
Var(120573) = (119883119879119883)minus1 where is evaluated at 120573To compute 120573 we use the iteratively reweighted least
square algorithmproposed byDobson andBarnett [17]wherethe iterations use the working weights 119908119894119908119894119881 (120583119894) 119892 (120583119894)2 (49)
where 119881(120583119894) = 120583119901119894 However estimating 119901 is more difficult than estimating120573 and Θ such that most researchers working with Tweedie
densities have119901 a priori In this study we use the procedure in[15]where themaximum likelihood estimator of119901 is obtainedby directly maximizing the profile likelihood function Forany given value of119901wefind themaximum likelihood estimateof 120573Θ and compute the log-likelihood function This isrepeated several times until we have a value of 119901 whichmaximizes the log-likelihood function
Given the estimated values of 119901 and 120573 then the unbiasedestimator of Θ is given by
Θ = 119899sum119894=1
[119871 119894 minus 120583119894 (120573)]2120583119894 (120573)119901 (50)
Since for 1 lt 119901 lt 2 the Tweedie density can not be expressedin closed form it is recommended that the maximumlikelihood estimate of Θ must be computed iteratively fromfull data [15]
4. Data and Model Fitting
4.1. Data Analysis. Daily rainfall data for Balaka district in Malawi covering the period 1995–2015 are used. The data were obtained from Meteorological Surveys of Malawi. Figure 1 shows a plot of the data.
In summary, the minimum value is 0 mm, which indicates that there was no rainfall on particular days, whereas the maximum amount is 123.7 mm. The mean rainfall for the whole period is 3.167 mm.
Figure 1: Daily rainfall amount for Balaka district (panel title: "Rainfall for Balaka for 10 years"; amount against year, 1995–2015).
Figure 2: Variance-mean relationship (log(variance) against log(mean)).
We investigated the relationship between the variance and the mean of the data by plotting log(variance) against log(mean), as shown in Figure 2. From the figure we can observe a linear relationship between the logarithm of the variance and the logarithm of the mean, which can be expressed as

$$\log(\text{Variance}) = \alpha + \beta \log(\text{mean}), \tag{51}$$

$$\text{Variance} = A \cdot \text{mean}^{\beta}, \quad A = e^{\alpha} \in \mathbb{R}. \tag{52}$$

Hence the variance can be expressed as some power $\beta \in \mathbb{R}$ of the mean, agreeing with the Tweedie variance function requirement.
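The log-log relationship in (51) and (52) can be checked numerically with an ordinary least-squares fit. The following Python sketch is illustrative only: the function name and the group means and variances are hypothetical values chosen to follow an exact power law, not the Balaka data.

```python
import math


def fit_variance_power(means, variances):
    """Least-squares fit of log(variance) = alpha + beta * log(mean).

    Returns (A, beta) with A = exp(alpha), so that variance ~ A * mean**beta.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return math.exp(alpha), beta


# Hypothetical group means and variances following variance = 2 * mean**1.5.
group_means = [0.2, 0.5, 1.0, 3.0, 8.0]
group_variances = [2.0 * m ** 1.5 for m in group_means]
A, beta = fit_variance_power(group_means, group_variances)
```

A fitted slope between 1 and 2 is what one would expect when a Poisson-Gamma (Tweedie) variance function is appropriate.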
4.2. Fitted Model. To model the daily rainfall data we use sine and cosine terms as predictors because of the cyclic nature and seasonality of rainfall. We have assumed that February ends on the 28th in all years so that the modeling is uniform.
where $i = 1, 2, \ldots, 365$ corresponds to the day of the year and $a_0, a_1, a_2$ are the regression coefficients.
First, we estimate $p$ by maximizing the profile log-likelihood function. Figure 3 shows the graph of the profile log-likelihood function. As can be observed, the value of $p$ that maximizes the function is 1.5306.
From the results obtained after fitting the model, both the cyclic cosine and sine terms are important characteristics for daily rainfall (Table 1). The covariates were chosen to take into account the seasonal variations in the stochastic model.
Comparing the actual and predicted means, for 2 July we have $\mu = 0.3820$ whereas $\hat{\mu} = 0.4333$; similarly, for 31 December we have $\mu = 9.0065$ and $\hat{\mu} = 10.6952$, respectively. Figure 4 shows the estimated and actual means; the model generally behaves well.
Figure 4: Actual versus predicted mean (panel title: "Rainfall means"; mean against day of the year).
4.3. Goodness of Fit of the Model. Let the maximum likelihood estimate of $\theta_i$ be $\hat{\theta}_i$ for all $i$, with $\hat{\mu}$ the model's mean estimate. Let $\tilde{\theta}_i$ denote the estimate of $\theta_i$ for the saturated model, with corresponding $\tilde{\mu}_i = y_i$.
The goodness of fit is determined by the deviance, which is defined as

$$-2\log\left[\frac{\text{maximum likelihood of the fitted model}}{\text{maximum likelihood of the saturated model}}\right] = 2\sum_{i=1}^{n} \frac{y_i\tilde{\theta}_i - k(\tilde{\theta}_i)}{\Theta} - 2\sum_{i=1}^{n} \frac{y_i\hat{\theta}_i - k(\hat{\theta}_i)}{\Theta} = 2\sum_{i=1}^{n} \frac{y_i(\tilde{\theta}_i - \hat{\theta}_i) - k(\tilde{\theta}_i) + k(\hat{\theta}_i)}{\Theta} = \frac{\mathrm{Dev}(y, \hat{\mu})}{\Theta}. \tag{56}$$
$\mathrm{Dev}(y, \hat{\mu})$ is called the deviance of the model; the greater the deviance, the poorer the fit, since maximizing the likelihood corresponds to minimizing the deviance.
In terms of Tweedie distributions with $1 < p < 2$, the deviance is

$$\mathrm{Dev}_p = 2\sum_{i=1}^{n} \frac{y_i^{2-p} - (2-p)\, y_i\, \mu_i^{1-p} + (1-p)\, \mu_i^{2-p}}{(1-p)(2-p)}. \tag{57}$$

Based on the results from fitting the model, the residual deviance is 43144, which is less than the null deviance of 62955; this implies that the fitted model explains the data better than a null model.
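The Tweedie deviance in (57) is straightforward to evaluate directly, including for days with exactly zero rainfall. The function below is a minimal Python sketch under the stated restriction $1 < p < 2$; the function name and the example values are illustrative assumptions.

```python
def tweedie_deviance(y, mu, p):
    """Total Tweedie deviance, equation (57), for 1 < p < 2.

    y may contain exact zeros; every mu must be strictly positive.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    total = 0.0
    for yi, mi in zip(y, mu):
        # For y_i = 0 the first two terms vanish because 2 - p > 0.
        numerator = (
            yi ** (2.0 - p)
            - (2.0 - p) * yi * mi ** (1.0 - p)
            + (1.0 - p) * mi ** (2.0 - p)
        )
        total += numerator / ((1.0 - p) * (2.0 - p))
    return 2.0 * total


# Hypothetical observations: the null model uses the overall mean for every day.
obs = [0.0, 0.0, 1.2, 7.5]
null_mu = [sum(obs) / len(obs)] * len(obs)
null_deviance = tweedie_deviance(obs, null_mu, 1.5)
```

Evaluating the same function at the fitted means gives the residual deviance, which can then be compared with the null deviance as in the text.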
4.4. Diagnostic Check. Model diagnostics are carried out by means of residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and from having equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values, so it is difficult to make any meaningful decision about the fitted model (Figure 5).
Figure 5: Residuals of the model (panel title: "Residuals"; residuals against year, 1995–2015).
Figure 6: Q-Q plot of the quantile residuals (panel title: "Normal Q-Q Plot"; sample quantiles against theoretical quantiles).
We therefore assess the model using quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.
The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.
Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \mu_i, \Theta)$ and $b_i = F(y_i; \mu_i, \Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. The randomized quantile residuals for $y_i$ are then

$$r_{q,i} = \Phi^{-1}(u_i), \tag{58}$$

where $\Phi$ is the standard normal distribution function and $u_i$ is a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, apart from the variability in $\hat{\mu}$ and $\hat{\Theta}$.
Figure 6 shows the normal Q-Q plot of the quantile residuals. As can be observed, there are no large deviations from the straight line, only small deviations in the tail. The observed linearity indicates an acceptable fit.
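The randomization in (58) only matters on dry days, where the distribution function jumps from 0 to $P(Y = 0)$. The Python sketch below illustrates this case, assuming the standard Poisson-Gamma mapping in which the Poisson rate is $\lambda = \mu^{2-p}/(\Theta(2-p))$, so that $P(Y = 0) = e^{-\lambda}$. For positive rainfall it relies on a user-supplied `cdf` callable, a hypothetical interface standing in for a Tweedie distribution function routine that is not implemented here.

```python
import math
import random
from statistics import NormalDist


def prob_zero(mu, theta, p):
    """P(Y = 0) for the Poisson-Gamma (Tweedie, 1 < p < 2) model: exp(-lambda)."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))
    return math.exp(-lam)


def quantile_residual(y, mu, theta, p, cdf=None, rng=random):
    """Randomized quantile residual as in equation (58).

    cdf is a user-supplied callable cdf(y, mu, theta, p) returning F(y);
    it is needed only for y > 0 (hypothetical interface, not a library call).
    """
    if y == 0.0:
        # F jumps from a = 0 to b = P(Y = 0) at zero; draw u uniformly on (a, b].
        b = prob_zero(mu, theta, p)
        u = b * (1.0 - rng.random())
    else:
        # F is continuous for y > 0, so a = b and no randomization is needed.
        u = cdf(y, mu, theta, p)
    return NormalDist().inv_cdf(u)
```

Residuals computed this way for all days can then be compared with standard normal quantiles in a Q-Q plot.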
Figure 7: Simulated rainfall and observed rainfall (panel title: "Rainfall amount"; predicted and actual amount against day of the year).
Figure 8: Probability of rainfall occurrence (panel title: "Probability of no rainfall"; probability against day of the year).
5. Simulation
The model is simulated to test whether it produces data with characteristics similar to the actual observed rainfall. The simulation covers a period of two years: the last year of the data (2015) and a future year (2016). The simulated values were then compared graphically with the 2015 data, as shown in Figure 7.
The summary statistics of the simulated and actual data are shown in Table 2 for comparison.
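A simulation of this kind can be sketched in Python as follows. The sketch assumes the standard correspondence between the Tweedie parameters $(\mu, \Theta, p)$ and the Poisson rate and Gamma shape and scale, a log link for the seasonal mean, and purely illustrative coefficient and dispersion values; none of the numbers below are the fitted values of this study.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_rainfall(mu, theta, p, rng):
    """One draw from the Poisson-Gamma model with mean mu and variance theta * mu**p."""
    lam = mu ** (2.0 - p) / (theta * (2.0 - p))   # expected number of rainfall events
    shape = (2.0 - p) / (p - 1.0)                  # Gamma shape of each event amount
    scale = theta * (p - 1.0) * mu ** (p - 1.0)    # Gamma scale of each event amount
    n_events = poisson_draw(lam, rng)
    if n_events == 0:
        return 0.0                                 # exact zero: a dry day
    # The sum of n independent Gamma(shape, scale) amounts is Gamma(n * shape, scale).
    return rng.gammavariate(n_events * shape, scale)


def seasonal_mean(day, a0, a1, a2):
    """Hypothetical seasonal mean with a log link: exp(a0 + a1 sin + a2 cos)."""
    angle = 2.0 * math.pi * day / 365.0
    return math.exp(a0 + a1 * math.sin(angle) + a2 * math.cos(angle))


rng = random.Random(2015)
# Illustrative coefficients and dispersion only; not the fitted values from Table 1.
year = [simulate_rainfall(seasonal_mean(d, 0.5, 0.3, 1.2), 6.0, 1.53, rng)
        for d in range(1, 366)]
```

Summary statistics of such a simulated year (minimum, quartiles, mean, maximum) can then be set against those of the observed data.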
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.
However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. It is suggested here that a truncated Fourier series, rather than a single sinusoid, might improve this estimation.
The model performed better in predicting the probability of no rainfall on days where there was little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8); in addition, the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity with the time of the season.

Table 2: Data statistics.

                       Min    1st Qu  Median  Mean    3rd Qu  Max
Predicted data         0.00   0.00    0.00    3.314   0.00    116.5
Actual data [10 yrs]   0.00   0.00    0.00    3.183   0.300   123.7
Actual data [2015]     0.00   0.00    0.00    3.328   0.00    84.5
In addition, the model allows exact zeros in the data to be modeled and simultaneously predicts the probability of a no-rainfall event.
6. Conclusion
A daily stochastic rainfall model was developed based on a compound Poisson process in which the number of rainfall events follows a Poisson distribution and the intensity of each event follows a Gamma distribution, independently of the number of events. Unlike several studies of precipitation modeling in which two separate models are developed for occurrence and intensity, the model proposed here is able to model both processes simultaneously. The proposed model is also able to model the exact zeros, that is, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystem risk assessment, drought prediction, and weather derivatives, since synthetic rainfall data can be simulated from it. The model provides mechanisms for understanding fine-scale structure such as the number and mean of rainfall events, the mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.
The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing the daily precipitation over the relevant accumulation period. Rather than developing a weather index that is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Expressing it in terms of a Tweedie distribution makes the parameters easier to estimate. In addition, Tweedie distributions belong to the exponential family of distributions, on which generalized linear models are based; hence there is an existing framework for fitting and diagnostic testing of the model.
The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps to reduce the underestimation of overdispersion in the data, which is common in the aforementioned models.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors extend their gratitude to Pan African University Institute for Basic Sciences, Technology and Innovation for the financial support.
References
[1] A. Hussain, "Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions"
[2] M. Cao, A. Li, and J. Z. Wei, "Precipitation modeling and contract valuation: A frontier in weather derivatives," The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, "On modelling and pricing rainfall derivatives with seasonality," Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, "Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls," Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, "Multisite generalization of a daily stochastic precipitation generation model," Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, "Rainfall modelling using Poisson-cluster processes: a review of developments," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, "Pricing precipitation based derivatives," International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and Pricing in Financial Markets for Weather Derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, "Pricing rainfall futures at the CME," Technical report, Humboldt University, Collaborative Research Center
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, "A nonparametric model for stochastic generation of daily rainfall occurrence," Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, "Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, "Estimating species richness by a Poisson-compound gamma model," Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, "On the compound Poisson-gamma distribution," Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression Modeling with Actuarial and Financial Applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.
Var (119910119894)119909119894ℎ 120597120583119894120597120578119894)( 119910119894 minus 120583119894Var (119910119894)119909119894119895 120597120583119894120597120578119894)]
where119882 = diag[(1Var(119910119894))(120597120583119894120597120578119894)2]Therefore 120573 has an approximate 119873[120573 (119883119879119882119883)minus1] with
Var(120573) = (119883119879119883)minus1 where is evaluated at 120573To compute 120573 we use the iteratively reweighted least
square algorithmproposed byDobson andBarnett [17]wherethe iterations use the working weights 119908119894119908119894119881 (120583119894) 119892 (120583119894)2 (49)
where 119881(120583119894) = 120583119901119894 However estimating 119901 is more difficult than estimating120573 and Θ such that most researchers working with Tweedie
densities have119901 a priori In this study we use the procedure in[15]where themaximum likelihood estimator of119901 is obtainedby directly maximizing the profile likelihood function Forany given value of119901wefind themaximum likelihood estimateof 120573Θ and compute the log-likelihood function This isrepeated several times until we have a value of 119901 whichmaximizes the log-likelihood function
Given the estimated values of 119901 and 120573 then the unbiasedestimator of Θ is given by
Θ = 119899sum119894=1
[119871 119894 minus 120583119894 (120573)]2120583119894 (120573)119901 (50)
Since for 1 lt 119901 lt 2 the Tweedie density can not be expressedin closed form it is recommended that the maximumlikelihood estimate of Θ must be computed iteratively fromfull data [15]
4 Data and Model Fitting
41 Data Analysis Daily rainfall data of Balaka district inMalawi covering the period 1995ndash2015 is used The data wasobtained from Meteorological Surveys of Malawi Figure 1shows a plot of the data
In summary the minimum value is 0mmwhich indicatesthat there were no rainfall on particular days whereas themaximum amount is 1237mm The mean rainfall for thewhole period is 3167mm
Rainfall for Balaka for 10 years
020
60
100
Am
ount
2000 2005 2010 20151995Year
Figure 1 Daily rainfall amount for Balaka district
minus8
minus4
0246
log(
varia
nce)
minus2 0 2minus4log(mean)
Figure 2 Variance mean relationship
We investigated the relationship between the variance andthe mean of the data by plotting the log(variance) againstlog(mean) as shown in Figure 2 From the figure we canobserve a linear relationship between the variance and themean which can be expressed as
log (Variance) = 120572 + 120573 log (mean) (51)
Variace = 119860 lowastmean120573 119860 isin R (52)
Hence the variance can be expressed as some power 120573 isin R
of the mean agreeing with the Tweedie variance functionrequirement
42 Fitted Model To model the daily rainfall data we use sinand cos as predictors due to the cyclic nature and seasonalityof rainfall We have assumed that February ends on 28th forall the years to be uniform in our modeling
where 119894 = 1 2 365 corresponds to days of the year and1198860 1198861 1198862 are the coefficients of regressionIn the first place we estimate 119901 by maximizing the profile
log-likelihood function Figure 3 shows the graph of theprofile log-likelihood function As can be observed the valueof 119901 that maximizes the function is 15306
From the results obtained after fitting themodel both thecyclic cosine and sine terms are important characteristics fordaily rainfall Table 1 The covariates were determined to takeinto account the seasonal variations in the stochastic model
Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally
43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean
Rainfall means
Actual meanPredicted mean
0
5
10
15
20
Mea
n
100 200 3000Days of the year
Figure 4 Actual versus predicted mean
estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894
The goodness of fit is determined by deviance which isdefined as
minus 2 [ maximum likelihood of the fitted modelMaximum likelihood of the saturated model
119910119894120579119894 minus 119896 (120579119894)Θ minus 2 119899sum119894=1
119910119894120579119894 minus 119896 (120579119894)Θ= 2 119899sum
119894=1
119910119894 (120579119894 minus 120579119894) minus 119896 (120579119894) + 119896 (120579119894)Θ = Dev (119910 120583)Θ
(56)
Dev(119910 120583) is called the deviance of the model and the greaterthe deviance the poorer the fitted model as maximizing thelikelihood corresponds to minimizing the deviance
In terms of Tweedie distributions with 1 lt 119901 lt 2 thedeviance is
Dev119901
= 2 119899sum119894=1
(1199102minus119901119894 minus (2 minus 119901) 1199101198941205831minus119901119894 + (1 minus 119901) 1205832minus119901119894(1 minus 119901) (2 minus 119901) ) (57)
Based on results from fitting the model the residualdeviance is 43144 less than the null deviance 62955 whichimplies that the fitted model explains the data better than anull model
44 Diagnostic Check Themodel diagnostic is considered asa way of residual analysis The fitted model faces challenges
10 Journal of Probability and Statistics
Residuals
2000 2005 2010 20151995Year
0
50
100
150
Resid
uals
Figure 5 Residuals of the model
Normal Q-Q Plot
minus2
0
2
4
Sam
ple q
uant
iles
minus2 0minus4 42Theoretical quantiles
Figure 6 Q-Q plot of the quantile residuals
to be assessed especially for days with no rainfall at all as theyproduce spurious results and distracting patterns similarlyas observed by [15] Since this is a nonnormal regressionresiduals are far from being normally distributed and havingequal variances unlike in a normal linear regression Herethe residuals lie parallel to distinct values hence it is difficultto make any meaningful decision about the fitted model(Figure 5)
So we assess the model based on quantile residuals whichremove the pattern in discrete data by adding the smallestamount of randomization necessary on the cumulative prob-ability scale
The quantile residuals are obtained by inverting thedistribution function for each response and finding theequivalent standard normal quantile
Mathematically let 119886119894 = lim119910uarr119910119894119865(119910 120583119894 Θ) and 119887119894 = 119865(119910119894120583119894 Θ) where 119865 is the cumulative function of the probability
density function 119891(119910 120583 Θ) then the randomized quantileresiduals for 119910119894 are
119903119902119894 = Φminus1 (119906119894) (58)
with 119906119894 being the uniform random variable on (119886119894 119887119894] Therandomized quantile residuals are distributed normally bar-ring the variability in 120583 and Θ
Figure 6 shows the normalized Q-Q plot and as canbe observed there are no large deviations from the straightline only small deviations at the tail The linearity observedindicates an acceptable fitted model
Rainfall amount
Days of the year1 44 95 153 218 283 348 413 478 543 608 673
PredictedActual
020
60
100
Am
ount
Figure 7 Simulated rainfall and observed rainfall
Probability of no rainfall
065
075
085
095
Prob
abili
ty
100 200 3000Days of the year
Figure 8 Probability of rainfall occurrence
5 Simulation
Themodel is simulated to test whether it produces data withsimilar characteristics to the actual observed rainfall Thesimulation is done for a period of two years where one wasthe last year of the data (2015) and the other year (2016) wasa future prediction Then comparison was done with a graphfor 2015 data as shown in Figure 7
The different statistics of the simulated data and actualdata are shown in Table 2 for comparisons
The main objective of simulation is to demonstrate thatthe Poisson-Gamma can be used to predict and forecastrainfall occurrence and intensity simultaneously Based onthe results above (Figure 8) the model has shown that itworks well in predicting the rainfall intensity and hence canbe used in agriculture actuarial science hydrology and so on
However the model performed poorly in predictingprobability of rainfall occurrence as it underestimated theprobability of rainfall occurrence It is suggested here thatprobably the use of truncated Fourier series can improve thisestimation as compared to the sinusoidal
But it performed better in predicting probability of norainfall on days where there was little or no rainfall asindicated in Figure 8
It can also be observed that the model produces syntheticprecipitation that agrees with the four characteristics of astochastic precipitation model as suggested by [4] as followsThe probability of rainfall occurrence obeys a seasonal pat-tern (Figure 8) in addition we can also tell that a probabilityof a rain in a day is higher if the previous day was wetwhich is the basis of precipitation models that involve the
Journal of Probability and Statistics 11
Table 2 Data statistics
Min 1st Qu Median Mean 3rd Qu MaxPredicted data 000 000 000 3314 000 1165Actual data [10 yrs] 000 000 000 3183 0300 1237Actual data [2015] 000 000 000 3328 000 845Markov process From Figure 7 we can also observe variationof rainfall intensity based on time of the season
In addition the model allows modeling of exact zeros inthe data and is able to predict a probability of no rainfall eventsimultaneously
6 Conclusion
A daily stochastic rainfall model was developed based ona compound Poisson process where rainfall events followa Poisson distribution and the intensity is independentof events following a Gamma distribution Unlike severalresearches that have been carried out into precipitationmodeling whereby two models are developed for occurrenceand intensity the model proposed here is able to model bothprocesses simultaneously The proposed model is also ableto model the exact zeros the event of no rainfall whichis not the case with the other models This precipitationmodel is an important tool to study the impact of weatheron a variety of systems including ecosystem risk assessmentdrought predictions and weather derivatives as we can beable to simulate synthetic rainfall data The model providesmechanisms for understanding the fine scale structure likenumber and mean of rainfall events mean daily rainfalland probability of rainfall occurrence This is applicable inagriculture activities disaster preparedness and water cyclesystems
The model developed can easily be used for forecastingfuture events and in terms ofweather derivatives theweatherindex can be derived from simulating a sample path bysumming up daily precipitation in the relevant accumulationperiod Rather than developing a weather index which is notflexible enough to forecast future events we can use thismodel in pricing weather derivatives
Rainfall data is generally zero inflated in that the amountof rainfall received on a day can be zero with a posi-tive probability but continuously distributed otherwise Thismakes it difficult to transform the data to normality bypower transforms or to model it directly using continu-ous distribution The Poisson-Gamma distribution has acomplicated probability density function whose parametersare difficult to estimate Hence expressing it in terms of aTweedie distribution makes estimating the parameters easyIn addition Tweedie distributions belong to the exponentialfamily of distributions upon which generalized linear modelsare based hence there is an already existing framework inplace for fitting and diagnostic testing of the model
Themodel developed allows the information in both zeroand positive observations to contribute to the estimationof all parts of the model unlike the other model [3 4 9]which conditions rainfall intensity based on probability of
occurrence In addition the introduction of the dispersionparameter in the model helps in reducing underestimationof overdispersion of the data which is also common in theaforementioned models
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
The authors extend their gratitude to Pan African UniversityInstitute for Basic Sciences Technology and Innovation forthe financial support
References
[1] AHussain ldquoStochasticmodeling of rainfall processes amarkovchain-mixed exponential model for rainfalls in different cli-matic conditionsrdquo
[2] M Cao A Li and J Z Wei ldquoPrecipitation modeling andcontract valuation A frontier in weather derivativesrdquo TheJournal of Alternative Investments vol 7 no 2 pp 93ndash99 2004
[3] G Leobacher and P Ngare ldquoOn modelling and pricing rainfallderivativeswith seasonalityrdquoAppliedMathematical Finance vol18 no 1 pp 71ndash91 2011
[4] M Odening O Musshoff and W Xu ldquoAnalysis of rainfallderivatives using daily precipitation models Opportunities andpitfallsrdquo Agricultural Finance Review vol 67 no 1 pp 135ndash1562007
[5] D S Wilks ldquoMultisite generalization of a daily stochasticprecipitation generation modelrdquo Journal of Hydrology vol 210no 1ndash4 pp 178ndash191 1998
[6] C Onof R E Chandler A Kakou P Northrop H S Wheaterand V Isham ldquoRainfall modelling using poisson-cluster pro-cesses a review of developmentsrdquo Stochastic EnvironmentalResearch and Risk Assessment vol 14 no 6 pp 384ndash411 2000
[7] R Carmona and P Diko ldquoPricing precipitation based deriva-tivesrdquo International Journal of Theoretical and Applied Financevol 8 no 7 pp 959ndash988 2005
[8] F E Benth and J S Benth Modeling and pricing in financialmarkets for weather derivatives vol 17 World Scientific 2012
[9] B Lopez Cabrera M Odening and M Ritter ldquoPricing rainfallfutures at the CMErdquo Technical report Humboldt UniversityCollaborative Research Center
[10] T I Harrold A Sharma and S J Sheather ldquoA nonparametricmodel for stochastic generation of daily rainfall occurrencerdquoWater Resources Research vol 39 no 10 2003
[11] D Lord ldquoModeling motor vehicle crashes using Poisson-gamma models examining the effects of low sample meanvalues and small sample size on the estimation of the fixed
[12] J-P Wang ldquoEstimating species richness by a Poisson-com-pound gamma modelrdquo Biometrika vol 97 no 3 pp 727ndash7402010
[13] C S Withers and S Nadarajah ldquoOn the compound Poisson-gamma distributionrdquo Kybernetika vol 47 no 1 pp 15ndash37 2011
[14] E W Frees R Derrig and G Meyers Regression modelingwith actuarial and financial applications vol 1 CambridgeUniversity Press Cambridge UK 2014
[15] P K Dunn and G K Smyth ldquoEvaluation of Tweedie exponen-tial dispersion model densities by Fourier inversionrdquo Statisticsand Computing vol 18 no 1 pp 73ndash86 2008
[16] A Agresti Foundations of linear and generalized linear modelsJohn Wiley amp Sons 2015
[17] A J Dobson and A G Barnett An Introduction to GeneralizedLinear Models Texts in Statistical Science Series CRC PressBoca Raton Fla USA 3rd edition 2008
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Comparing the actual means and the predicted means for 2July we have 120583 = 03820 whereas 120583 = 04333 similarly for 31December we have 120583 = 90065 and 120583 = 106952 respectivelyFigure 4 shows the estimated mean and actual mean wherethe model behaves well generally
43 Goodness of Fit of the Model Let the maximum likeli-hood estimate of 120579119894 be 120579119894 for all 119894 and 120583 as the modelrsquos mean
Rainfall means
Actual meanPredicted mean
0
5
10
15
20
Mea
n
100 200 3000Days of the year
Figure 4 Actual versus predicted mean
estimate Let 120579119894 denote the estimate of 120579119894 for the saturatedmodel with corresponding 120583 = 119910119894
The goodness of fit is determined by deviance which isdefined as
minus 2 [ maximum likelihood of the fitted modelMaximum likelihood of the saturated model
119910119894120579119894 minus 119896 (120579119894)Θ minus 2 119899sum119894=1
119910119894120579119894 minus 119896 (120579119894)Θ= 2 119899sum
119894=1
119910119894 (120579119894 minus 120579119894) minus 119896 (120579119894) + 119896 (120579119894)Θ = Dev (119910 120583)Θ
(56)
Dev(119910 120583) is called the deviance of the model and the greaterthe deviance the poorer the fitted model as maximizing thelikelihood corresponds to minimizing the deviance
In terms of Tweedie distributions with 1 lt 119901 lt 2 thedeviance is
Dev119901
= 2 119899sum119894=1
(1199102minus119901119894 minus (2 minus 119901) 1199101198941205831minus119901119894 + (1 minus 119901) 1205832minus119901119894(1 minus 119901) (2 minus 119901) ) (57)
Based on results from fitting the model the residualdeviance is 43144 less than the null deviance 62955 whichimplies that the fitted model explains the data better than anull model
4.4. Diagnostic Check. Model diagnostics are carried out through residual analysis. The fitted model is difficult to assess, especially for days with no rainfall at all, as these produce spurious results and distracting patterns, as similarly observed by [15]. Since this is a nonnormal regression, the residuals are far from being normally distributed and do not have equal variances, unlike in a normal linear regression. Here the residuals lie parallel to distinct values; hence it is difficult to make any meaningful decision about the fitted model (Figure 5).

Figure 5: Residuals of the model (plot of residuals against year, 1995 to 2015).

Figure 6: Q-Q plot of the quantile residuals (normal Q-Q plot of sample quantiles against theoretical quantiles).
We therefore assess the model using quantile residuals, which remove the pattern in discrete data by adding the smallest amount of randomization necessary on the cumulative probability scale.
The quantile residuals are obtained by inverting the distribution function for each response and finding the equivalent standard normal quantile.
Mathematically, let $a_i = \lim_{y \uparrow y_i} F(y; \mu_i, \Theta)$ and $b_i = F(y_i; \mu_i, \Theta)$, where $F$ is the cumulative distribution function corresponding to the probability density function $f(y; \mu, \Theta)$. Then the randomized quantile residuals for $y_i$ are

$$
r_{q,i} = \Phi^{-1}(u_i), \tag{58}
$$

with $u_i$ being a uniform random variable on $(a_i, b_i]$. The randomized quantile residuals are normally distributed, barring the variability in $\hat{\mu}$ and $\hat{\Theta}$.
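The construction in (58) can be sketched in a few lines. To stay within the Python standard library, this sketch assumes the special case where each event amount is exponential (Gamma shape 1, which corresponds to $p = 1.5$), and it is parameterized by a Poisson rate `lam` and an amount scale `scale` rather than by $(\mu, \Theta)$. All function names and parameter values are hypothetical. A general Tweedie distribution function would need a numerical method, for example of the kind discussed in [15].

```python
import math
import random
from statistics import NormalDist


def cp_cdf(y, lam, scale, max_terms=200):
    """CDF of a Poisson(lam) sum of exponential(scale) amounts (Gamma shape 1)."""
    if y < 0:
        return 0.0
    x = y / scale
    pois = math.exp(-lam)      # P(N = 0), the point mass at zero
    cdf = pois
    term = math.exp(-x)        # exp(-x) * x**k / k! for k = 0
    erlang_tail = 0.0          # P(sum of n exponentials > y), built term by term
    for n in range(1, max_terms + 1):
        pois *= lam / n        # P(N = n)
        erlang_tail += term    # now sums k = 0 .. n-1
        term *= x / n          # advance to k = n
        cdf += pois * max(0.0, 1.0 - erlang_tail)
    return min(cdf, 1.0)


def quantile_residual(y, lam, scale, rng):
    """Randomized quantile residual: inverse normal CDF of u, u uniform on (a, b]."""
    b = cp_cdf(y, lam, scale)
    a = 0.0 if y == 0 else b   # F jumps only at zero; it is continuous for y > 0
    u = a + (b - a) * (1.0 - rng.random())
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(u)


def draw(lam, scale, rng):
    """Simulate one observation from the same Poisson-exponential model."""
    n, t = 0, rng.expovariate(lam)
    while t < 1.0:             # count Poisson arrivals in one time unit
        n += 1
        t += rng.expovariate(lam)
    return sum(rng.expovariate(1.0 / scale) for _ in range(n))


rng = random.Random(1)
lam, scale = 0.3, 8.0          # hypothetical parameter values
ys = [draw(lam, scale, rng) for _ in range(4000)]
res = [quantile_residual(y, lam, scale, rng) for y in ys]
mean = sum(res) / len(res)
sd = math.sqrt(sum((r - mean) ** 2 for r in res) / (len(res) - 1))
print(round(mean, 3), round(sd, 3))
```

Because the data here are simulated from the same model that is used to compute the residuals, the residuals should have mean close to 0 and standard deviation close to 1. Randomization only matters for dry days, where the distribution function jumps from 0 to $e^{-\lambda}$; for wet days $a_i = b_i$ and the residual is deterministic.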
Figure 6 shows the normal Q-Q plot of the quantile residuals. There are no large deviations from the straight line, only small deviations in the tails. The observed linearity indicates an acceptable fit.
Figure 7: Simulated rainfall and observed rainfall (plot of rainfall amount against days of the year; predicted and actual series).

Figure 8: Probability of rainfall occurrence (plot titled "Probability of no rainfall"; probability against days of the year).
5. Simulation
The model is simulated to test whether it produces data with characteristics similar to the actual observed rainfall. The simulation covers a period of two years: the last year of the data (2015) and one year of future prediction (2016). The simulated values are compared graphically with the 2015 data in Figure 7.
Summary statistics of the simulated and actual data are shown in Table 2 for comparison.
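The simulation step can be sketched as follows. This is an illustrative sketch only: it uses constant, hypothetical parameter values (`lam`, `shape`, `scale`), whereas the model in the paper lets the parameters vary with the day of the year, and the helper names `simulate_day` and `summarize` are made up for this example. It produces the same kind of six-number summary as Table 2.

```python
import math
import random
import statistics


def simulate_day(lam, shape, scale, rng):
    """One day's rainfall: sum of N ~ Poisson(lam) Gamma(shape, scale) event amounts."""
    n, t = 0, rng.expovariate(lam)
    while t < 1.0:             # count Poisson arrivals within the day
        n += 1
        t += rng.expovariate(lam)
    # An empty sum gives exactly zero, so dry days are exact zeros.
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


def summarize(values):
    """Min, quartiles, mean, and max, in the layout of Table 2."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values), "q1": q1, "median": med,
        "mean": statistics.fmean(values), "q3": q3, "max": max(values),
    }


rng = random.Random(2015)
lam, shape, scale = 0.2, 0.8, 16.0    # hypothetical values, not fitted estimates
days = [simulate_day(lam, shape, scale, rng) for _ in range(2 * 365)]

stats = summarize(days)
dry_fraction = sum(d == 0 for d in days) / len(days)
print(stats)
print(dry_fraction, math.exp(-lam))   # empirical vs theoretical P(no rain)
```

With a Poisson rate this small most days are dry, so the minimum, median, and both quartiles are typically zero while the mean and maximum are positive, which is the pattern seen in Table 2.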
The main objective of the simulation is to demonstrate that the Poisson-Gamma model can be used to predict and forecast rainfall occurrence and intensity simultaneously. Based on the results above (Figure 8), the model works well in predicting rainfall intensity and hence can be used in agriculture, actuarial science, hydrology, and so on.
However, the model performed poorly in predicting the probability of rainfall occurrence, which it underestimated. Using a truncated Fourier series instead of the sinusoidal function may improve this estimate.
It performed better, however, in predicting the probability of no rainfall on days with little or no rainfall, as indicated in Figure 8.
It can also be observed that the model produces synthetic precipitation that agrees with the four characteristics of a stochastic precipitation model suggested by [4], as follows. The probability of rainfall occurrence obeys a seasonal pattern (Figure 8). In addition, the probability of rain on a given day is higher if the previous day was wet, which is the basis of precipitation models that involve the Markov process. From Figure 7 we can also observe variation of rainfall intensity with the time of the season.

Table 2: Data statistics.

                        Min    1st Qu  Median  Mean   3rd Qu  Max
Predicted data          0.00   0.00    0.00    3.314  0.00    116.5
Actual data [10 yrs]    0.00   0.00    0.00    3.183  0.300   123.7
Actual data [2015]      0.00   0.00    0.00    3.328  0.00    84.5
In addition, the model allows modeling of exact zeros in the data and, at the same time, is able to predict the probability of a no-rainfall event.
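A short worked identity shows where the probability of no rainfall comes from. It assumes the standard compound Poisson-Gamma parameterization of a Tweedie distribution with mean $\mu$, dispersion $\Theta$, and index $1 < p < 2$. The symbols $\lambda$ (Poisson rate), $\alpha$ (Gamma shape), and $\gamma$ (Gamma scale) are introduced here for illustration and may differ from the notation used elsewhere in the paper.

$$
\lambda = \frac{\mu^{2-p}}{\Theta(2-p)}, \qquad
\alpha = \frac{2-p}{p-1}, \qquad
\gamma = \Theta (p-1) \mu^{p-1}, \qquad
P(Y = 0) = e^{-\lambda} = \exp\left(-\frac{\mu^{2-p}}{\Theta(2-p)}\right).
$$

As a check, $\lambda \alpha \gamma = \mu$, so the mean is recovered. For hypothetical values $\mu = 1$, $\Theta = 4$, and $p = 1.5$, this gives $\lambda = 1/(4 \times 0.5) = 0.5$ and $P(Y = 0) = e^{-0.5} \approx 0.607$.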
6. Conclusion
A daily stochastic rainfall model was developed based on a compound Poisson process, in which rainfall events follow a Poisson distribution and the intensity of each event, independent of the number of events, follows a Gamma distribution. Unlike several earlier studies of precipitation modeling, in which two separate models are developed for occurrence and intensity, the proposed model handles both processes simultaneously. It also models the exact zeros, that is, the event of no rainfall, which is not the case with the other models. This precipitation model is an important tool for studying the impact of weather on a variety of systems, including ecosystems, risk assessment, drought prediction, and weather derivatives, since it can simulate synthetic rainfall data. The model provides mechanisms for understanding fine-scale structure such as the number and mean of rainfall events, the mean daily rainfall, and the probability of rainfall occurrence. This is applicable in agricultural activities, disaster preparedness, and water cycle systems.
The model developed can easily be used for forecasting future events. In terms of weather derivatives, the weather index can be derived by simulating a sample path and summing daily precipitation over the relevant accumulation period. Rather than developing a weather index that is not flexible enough to forecast future events, we can use this model in pricing weather derivatives.
Rainfall data is generally zero inflated, in that the amount of rainfall received on a day can be zero with positive probability but is continuously distributed otherwise. This makes it difficult to transform the data to normality by power transforms or to model it directly using a continuous distribution. The Poisson-Gamma distribution has a complicated probability density function whose parameters are difficult to estimate. Expressing it in terms of a Tweedie distribution makes the parameters easier to estimate. In addition, Tweedie distributions belong to the exponential family of distributions, upon which generalized linear models are based; hence there is an existing framework for fitting and diagnostic testing of the model.
The model developed allows the information in both zero and positive observations to contribute to the estimation of all parts of the model, unlike the other models [3, 4, 9], which condition rainfall intensity on the probability of occurrence. In addition, the introduction of the dispersion parameter in the model helps reduce the underestimation of overdispersion in the data, which is also common in the aforementioned models.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors extend their gratitude to Pan African University, Institute for Basic Sciences, Technology and Innovation, for the financial support.
References
[1] A. Hussain, “Stochastic modeling of rainfall processes: a Markov chain-mixed exponential model for rainfalls in different climatic conditions.”
[2] M. Cao, A. Li, and J. Z. Wei, “Precipitation modeling and contract valuation: A frontier in weather derivatives,” The Journal of Alternative Investments, vol. 7, no. 2, pp. 93–99, 2004.
[3] G. Leobacher and P. Ngare, “On modelling and pricing rainfall derivatives with seasonality,” Applied Mathematical Finance, vol. 18, no. 1, pp. 71–91, 2011.
[4] M. Odening, O. Musshoff, and W. Xu, “Analysis of rainfall derivatives using daily precipitation models: Opportunities and pitfalls,” Agricultural Finance Review, vol. 67, no. 1, pp. 135–156, 2007.
[5] D. S. Wilks, “Multisite generalization of a daily stochastic precipitation generation model,” Journal of Hydrology, vol. 210, no. 1–4, pp. 178–191, 1998.
[6] C. Onof, R. E. Chandler, A. Kakou, P. Northrop, H. S. Wheater, and V. Isham, “Rainfall modelling using Poisson-cluster processes: a review of developments,” Stochastic Environmental Research and Risk Assessment, vol. 14, no. 6, pp. 384–411, 2000.
[7] R. Carmona and P. Diko, “Pricing precipitation based derivatives,” International Journal of Theoretical and Applied Finance, vol. 8, no. 7, pp. 959–988, 2005.
[8] F. E. Benth and J. S. Benth, Modeling and pricing in financial markets for weather derivatives, vol. 17, World Scientific, 2012.
[9] B. Lopez Cabrera, M. Odening, and M. Ritter, “Pricing rainfall futures at the CME,” Technical report, Humboldt University, Collaborative Research Center.
[10] T. I. Harrold, A. Sharma, and S. J. Sheather, “A nonparametric model for stochastic generation of daily rainfall occurrence,” Water Resources Research, vol. 39, no. 10, 2003.
[11] D. Lord, “Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed
[12] J.-P. Wang, “Estimating species richness by a Poisson-compound gamma model,” Biometrika, vol. 97, no. 3, pp. 727–740, 2010.
[13] C. S. Withers and S. Nadarajah, “On the compound Poisson-gamma distribution,” Kybernetika, vol. 47, no. 1, pp. 15–37, 2011.
[14] E. W. Frees, R. Derrig, and G. Meyers, Regression modeling with actuarial and financial applications, vol. 1, Cambridge University Press, Cambridge, UK, 2014.
[15] P. K. Dunn and G. K. Smyth, “Evaluation of Tweedie exponential dispersion model densities by Fourier inversion,” Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[16] A. Agresti, Foundations of linear and generalized linear models, John Wiley & Sons, 2015.
[17] A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Texts in Statistical Science Series, CRC Press, Boca Raton, Fla, USA, 3rd edition, 2008.