Transcript

Slide 1

Prepared by: Prof. Dr. Kemo Sokolija, Sarajevo 2012.

APPLICATIONS OF PROBABILITY, STATISTICS AND RANDOM PROCESSES IN SCIENCE AND ENGINEERING

Topic: Estimation of distribution parameters

Carl Friedrich Gauss was born on 30 April 1777 in Braunschweig (Brunswick), in the Duchy of Braunschweig-Wolfenbüttel, now part of Lower Saxony, Germany, as the son of poor working-class parents. Indeed, his mother was illiterate and never recorded the date of his birth, remembering only that he had been born on a Wednesday, eight days before the Feast of the Ascension, which itself occurs 40 days after Easter. Gauss would later solve this puzzle about his birthdate in the context of finding the date of Easter, deriving methods to compute the date in both past and future years. He was christened and confirmed in a church near the school he attended as a child.

Gauss was a child prodigy. There are many anecdotes about his precocity while a toddler, and he made his first ground-breaking mathematical discoveries while still a teenager. He completed Disquisitiones Arithmeticae, his magnum opus, in 1798 at the age of 21, though it was not published until 1801. This work was fundamental in consolidating number theory as a discipline and has shaped the field to the present day.

Gauss's intellectual abilities attracted the attention of the Duke of Braunschweig, who sent him to the Collegium Carolinum (now Technische Universität Braunschweig), which he attended from 1792 to 1795, and to the University of Göttingen from 1795 to 1798. While at university, Gauss independently rediscovered several important theorems; his breakthrough occurred in 1796 when he showed that any regular polygon with a number of sides which is a Fermat prime (and, consequently, those polygons with any number of sides which is the product of distinct Fermat primes and a power of 2) can be constructed by compass and straightedge. This was a major discovery in an important field of mathematics; construction problems had occupied mathematicians since the days of the Ancient Greeks, and the discovery ultimately led Gauss to choose mathematics instead of philology as a career. Gauss was so pleased by this result that he requested that a regular heptadecagon be inscribed on his tombstone. The stonemason declined, stating that the difficult construction would essentially look like a circle.

The year 1796 was most productive for both Gauss and number theory. He discovered a construction of the heptadecagon on 30 March. He further advanced modular arithmetic, greatly simplifying manipulations in number theory. On 8 April he became the first to prove the quadratic reciprocity law. This remarkably general law allows mathematicians to determine the solvability of any quadratic equation in modular arithmetic. The prime number theorem, conjectured on 31 May, gives a good understanding of how the prime numbers are distributed among the integers. Gauss also discovered that every positive integer is representable as a sum of at most three triangular numbers on 10 July and then jotted down in his diary the famous note: "ΕΥΡΗΚΑ! num = Δ + Δ + Δ". On 1 October he published a result on the number of solutions of polynomials with coefficients in finite fields, which 150 years later led to the Weil conjectures.


(*) plot = to enter data points on a chart

The term parameter estimation refers to the process of using sample data (in reliability engineering these are usually times to failure or success data) in order to estimate the parameters of a chosen distribution. Several methods are used for this purpose. This lecture gives an overview of the methods used in the analysis of life data.

We will begin with a relatively simple method:

the graphical method (Probability Plotting*), and continue with more sophisticated methods:

rank regression (Rank Regression), also called least squares (Least Squares),

the maximum likelihood method (Maximum Likelihood Estimation) and the Bayesian method (Bayesian Estimation Method).

Estimation of distribution parameters


Graphical method - Probability Plotting

Mathematically speaking, this is the most weakly founded method for estimating distribution parameters. It consists of literally entering the data by hand on paper specially constructed for this purpose (probability plotting paper).

The method is based on linearizing the cumulative distribution function (cdf) using the specially constructed paper.

The steps of the method listed below will be illustrated on the example of the 2-parameter Weibull distribution: linearization of the unreliability function,

construction of the special paper (chart),

determination of the coordinates of the points, and use of the resulting plot to read off the distribution parameters.

Linearization of the unreliability function (Graphical method - Probability Plotting)

In the case of the 2-parameter Weibull distribution, the cdf (i.e. the unreliability function Q(t)) is given by Q(t) = 1 - exp[-(t/η)^β].

This function can be linearized, i.e. brought into the general form y = mx + b, as follows:


Graphical method - Probability Plotting

Introducing the substitutions y = ln(-ln[1 - Q(t)]) and x = ln(t),

we can write the previous equation in the form y = mx + b.

This is now a linear equation, i.e. a straight line with slope m = β and intercept on the ordinate b = -β ln(η).
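For reference, the algebra behind this linearization (reconstructed here from the 2-parameter Weibull unreliability function quoted above):

Q(t) = 1 - exp[-(t/η)^β]
1 - Q(t) = exp[-(t/η)^β]
ln[1 - Q(t)] = -(t/η)^β
ln(-ln[1 - Q(t)]) = β ln(t) - β ln(η)

With y = ln(-ln[1 - Q(t)]) and x = ln(t) this becomes y = βx - β ln(η), so m = β and b = -β ln(η).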

Graphical method - Probability Plotting

Construction of the paper (chart). The next step consists of constructing the Weibull probability plotting paper with suitably transformed y and x axes. The transformation of the x axis is a simple logarithmic transformation. The transformation of the y axis is more complex, since it is a double logarithmic reciprocal transformation, y = ln(ln(1/[1 - Q(t)])),

where Q(t) is the unreliability function. Several versions of this paper exist; the one created by ReliaSoft (reliability engineering software) can be found at www.weibull.com.

The figure below shows a somewhat modified form of this paper. (Graphical method - Probability Plotting)


The paper (chart) is constructed on the basis of the transformations of the y and x axes shown above, where the y axis represents unreliability and the x axis the time to failure. For every point that we want to enter into the plot, both of these values must be known.

Once all the observed points have been entered into the plot, the best possible straight line must be drawn through (between) them.

After that, the slope of the resulting line can be determined (some specially constructed papers have slope indicators that make this calculation easier). As we have seen, this slope is in fact the sought parameter β (the shape parameter) of the Weibull distribution. The second parameter, the scale parameter η (also called the life parameter), is simply read off the x axis as the time corresponding to the unreliability Q(t) = 63.2%. (Graphical method - Probability Plotting)

Why? (Graphical method - Probability Plotting)

If we substitute t = η into the expression for the unreliability function, we obtain:
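Substituting t = η into the Weibull unreliability function gives:

Q(η) = 1 - exp[-(η/η)^β] = 1 - e^(-1) ≈ 0.632, i.e. 63.2%, regardless of the value of β.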

Determining the coordinates of the points (Graphical method - Probability Plotting)

The points on the plot represent our data, i.e. the times-to-failure data in the case of particular interest to us here.

If, for example, we carried out tests and found that the tested units failed after 13, 24, 34 and 45 hours, we will use these times as the x (i.e. t) coordinates of our points.

Determining the positions on the y axis, i.e. the corresponding values of the unreliability function, is somewhat more complicated. For each time to failure we need to establish the cumulative percentage of units that have failed. For example, the cumulative percentage of units that have failed by the tenth hour might be 25%, by the twentieth hour 50%, and so on. This simple method cannot be applied in practice, because we have data for a sample and not for the complete population. An alternative approach must therefore be used.

The most commonly used method for determining the unreliability function, i.e. the cumulative percentages of failed units, is the median rank method.


Graphical method - Probability Plotting

Median rank method. The median rank is the value of the true probability of failure Q(T_j) at the j-th failure obtained at the 50% confidence level. The rank can be determined for any confidence level P, greater than zero and less than one, by solving the following cumulative binomial equation for Z:

P = sum over k = j to N of C(N, k) Z^k (1 - Z)^(N - k)

N = sample size; j = order number of the failure (experiment). The median rank is therefore obtained by solving the above equation for Z, setting P = 0.50.

Solving the cumulative binomial equation for Z requires the use of numerical methods.
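A minimal numerical sketch of this calculation, assuming SciPy is available (the helper name median_rank is ours, not from the lecture):

```python
# Sketch: solve the cumulative binomial equation for the median rank (P = 0.50).
from scipy.stats import binom
from scipy.optimize import brentq

def median_rank(j, N, P=0.50):
    """Return Z such that P(at least j failures out of N | unreliability Z) = P."""
    # sum_{k=j}^{N} C(N,k) Z^k (1-Z)^(N-k) = P   <=>   1 - BinomCDF(j-1; N, Z) = P
    f = lambda Z: 1.0 - binom.cdf(j - 1, N, Z) - P
    return brentq(f, 1e-9, 1 - 1e-9)

print(round(median_rank(1, 6), 4))  # ~0.1091, close to Benard's (1-0.3)/(6+0.4) = 0.1094
```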

Graphical method - Probability Plotting

Benard's approximation for the median rank. In practical applications, instead of the binomial equation, the following approximation for the median rank, known as Benard's approximation, is used:
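In its usual form, Benard's approximation reads:

MR_j ≈ (j - 0.3) / (N + 0.4),

where j is the failure order number and N is the sample size.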

Kaplan-Meier estimator. The Kaplan-Meier estimator (also known as the product limit estimator) is used as an alternative for estimating the unreliability function, i.e. the cumulative percentages of failed units. The equation from which the estimator is determined is:

m = total number of data points, n = total number of units, r_j = number of failures in the j-th data group, s_j = number of units surviving in the j-th group.
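With this notation, a commonly used form of the product limit estimate of the unreliability (a reconstruction of the standard formula, not copied from the slide) is:

F̂(t_i) = 1 - product over j = 1 to i of (n_j - r_j)/n_j,   where n_j = n - (sum of r_k + s_k over the groups before group j),

so the estimate at the i-th data point multiplies the conditional survival fractions of all groups up to and including the i-th.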

Graphical method - Example

Consider six identical units subjected to a reliability test under identical conditions. All of the tested units failed, after the following numbers of operating hours: 96, 257, 498, 763, 1051 and 1744. The graphical method for determining the distribution parameters is carried out in the following steps: 1. Arrange the times to failure in ascending order. The same methodology can be applied to every distribution whose cdf can be linearized. Since their cdfs differ, different papers must be prepared for different distributions. In the following example we illustrate the application of the 1-parameter exponential distribution.

Graphical method - Probability Plotting. 2. Estimate the unreliability function, i.e. the cumulative percentages of failed units; after applying Benard's approximation we obtain: 3. Having obtained the x and y coordinates of all six points, we plot them on the specially constructed paper for the exponential distribution.
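A small sketch of step 2: Benard's approximation applied to the six ordered failure times of this example (the printed percentages are computed here for illustration, not quoted from the slides):

```python
# Benard's approximation of the median ranks for the example data.
times = [96, 257, 498, 763, 1051, 1744]   # hours to failure, already in ascending order
N = len(times)
for j, t in enumerate(times, start=1):
    mr = (j - 0.3) / (N + 0.4)            # Benard's approximation of the median rank
    print(f"t = {t:5d} h   Q(t) ~ {mr:.2%}")
# -> 10.94%, 26.56%, 42.19%, 57.81%, 73.44%, 89.06%
```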

Graphical method - Example

4. Draw the best possible straight line through the obtained points.

Through the ordinate Q(t) = 63.2%, or R(t) = 36.8%, a horizontal line is drawn to the intersection with the fitted line, and from the intersection point a vertical is dropped onto the abscissa.

The value obtained on the abscissa is the estimated mean parameter, which in this particular case is μ = 833. This means that λ = 1/μ = 0.0012. Since Q(μ) = 1 - exp(-μ/μ) = 1 - e^(-1) = 0.632 = 63.2%, this percentage is always 63.2%.

Graphical method - Example

It is now possible to determine the reliability for any mission time. For example, for t = 15 hours we read off R(t = 15) = 98.15%. The same result can be obtained analytically by substituting the estimated parameter value and the given time into the expression for the reliability function of the one-parameter exponential distribution.

Comment on the graphical method. Apart from the obvious shortcomings of the graphical method, manual plotting does not always give consistent results: two people drawing a straight line through the same set of points will not obtain the same line, which will certainly affect the final result.

This method was used earlier, before the widespread use of computers; today it is used less and less, since computers make it possible to apply much more complicated, but more accurate, methods.
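For the record, the analytic check mentioned in the example above: R(t) = exp(-t/μ), so R(15) = exp(-15/833) ≈ 0.982, close to the value read from the plot.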


Least squares method - Rank regression

(Figure labels: linear regression OK; strongly skewed linear regression; coordinates of the data points.)

Least squares method - Rank regression

The method of least squares, or of least squared deviations, is the mathematical version of the graphical method described above. The terms linear regression and least squares are synonyms. The minimization can be carried out in the vertical or in the horizontal direction. In the case of regression on X, the line should be placed so that the horizontal distances are minimized; in the case of regression on Y, the vertical distances are minimized. The least squares method requires that the straight line drawn through the points on the plot be such that the sum of the squares of the distances from the line to the data points is minimized.

Regression on Y (Least squares method)

Suppose we have obtained a set of pairs (x1, y1), (x2, y2), ..., (xN, yN) and that the x values are known exactly. Then, according to the least squares principle, which minimizes the vertical distances between the data points and the straight line approximating the data, the best line (the hat symbol ^ indicates an estimate) is the one that gives:

where â and b̂ are the least squares estimates of a and b, and N is the number of data points.

Least squares method

These equations are minimized by the estimators:
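The estimators referred to here are the standard closed-form least squares solutions for the line ŷ = â + b̂x (written out for reference):

b̂ = [ Σ x_i y_i - (Σ x_i)(Σ y_i)/N ] / [ Σ x_i² - (Σ x_i)²/N ]
â = (Σ y_i)/N - b̂ (Σ x_i)/N = ȳ - b̂ x̄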

Regression on X (Least squares method). Where â and b̂ are the least squares estimates of a and b, and N is the number of data points.

Suppose we have obtained a set of pairs (x1, y1), (x2, y2), ..., (xN, yN) and that the x values are known exactly. Then, according to the least squares principle, which in this case minimizes the horizontal distances between the data points and the straight line approximating the data, the best line (the hat symbol ^ indicates an estimate) is the one that gives:


σ_xy = covariance of the variables x and y, σ_x = standard deviation of x, σ_y = standard deviation of y. (Least squares method)

Correlation coefficient. The correlation coefficient (ρ) is a measure of the goodness of the linear regression, i.e. of how well the regression model fits the obtained data. In the analysis of life data it is a measure of the linear relationship (correlation) between the median ranks and the times to failure. The correlation coefficient of the population is defined as:

The estimator of the population correlation coefficient (ρ) is the sample correlation coefficient r̂, given by the equation: The range in which ρ varies: the closer the correlation coefficient is to ±1, the better the linear regression. A value of +1 indicates that all pairs (x_i, y_i) lie on a straight line with positive slope, while a value of -1 indicates a perfect fit with negative slope. If the correlation coefficient equals zero, the data are stochastically scattered and there is no correlation between x and y in the linear model. (Least squares method)
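The definitions referred to above, in their usual form (reproduced here for convenience):

ρ = σ_xy / (σ_x σ_y)
r̂ = Σ (x_i - x̄)(y_i - ȳ) / sqrt[ Σ (x_i - x̄)² · Σ (y_i - ȳ)² ],   with -1 ≤ r̂ ≤ +1.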

Comment on the least squares method (Least squares method)

The least squares method is a good method for estimating parameters when dealing with functions that can be linearized. For such distributions the calculations are relatively simple and require neither numerical techniques nor the use of tables. In addition, through the correlation coefficient, this method offers a good measure of how well the chosen distribution fits the data.

The least squares method is best used in cases where the data set contains complete data, i.e. when the data consist only of times to failure, without censored or interval data.


Maximum likelihood method (Maximum Likelihood Estimation, MLE). Concept. The principle of the MLE method:

Choose the parameters that maximize the likelihood function. (Maximum likelihood method)

http://www.aiaccess.net/English/home.htm
http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_likelihood.htm

Likelihood (interactive animation)

This illustration shows a sample of n independent observations, and two continuous distributions f1(x) and f2(x), with f2(x) being just f1(x) translated by a certain amount. Of these two distributions, which one is the most likely to have generated the sample? Clearly, the answer is f1(x), and we would like to formalize this intuition. Although this is not strictly impossible, we don't believe that f2(x) generated the sample because all the observations are in regions where the values of f2(x) are small: the probability for an observation to appear in such a region is small, and it is even more unlikely that all the observations in the sample would appear in low density regions. On the other hand, the values taken by f1(x) are substantial for all the observations, which are then where one would expect them to be, would the sample be actually generated by f1(x).

Definition of the likelihood
Of the many ways to quantify this intuitive judgement, one turns out to be remarkably effective.

Continuous distributions
For any continuous probability distribution f(x), just multiply the values of f(x) for each of the observations in the sample, denote the result L, and call it the likelihood of the distribution f(x) for this particular sample:
Likelihood = L = Π_i f(x_i), i = 1, 2, ..., n
Clearly, the likelihood can have a large value only if all the observations are in regions where f(x) is not very small. This definition has the additional advantage that L receives a natural interpretation. The sample x = {x1, x2, ..., xn} may be regarded as a single observation generated by the n-variate probability distribution f(x1, x2, ..., xn) = Π_i f(x_i), because of the independence of the individual observations. So the likelihood of the distribution is just the value of the n-variate probability density f(x1, x2, ..., xn) for the set of observations in the sample considered as a unique n-variate observation.

Discrete distributions
A similar definition is obtained for discrete distributions by replacing "probability density" by "probability". Let p(x) be a discrete distribution defined by P{X = k} = p(k). The likelihood of this distribution for a given sample x = {x1, x2, ..., xn} is defined by
Likelihood = L = Π_i p(x_i), i = 1, 2, ..., n
The likelihood is then the probability of the sample x considered as a single observation drawn from the discrete probability distribution p(x1, x2, ..., xn) = Π_i p(x_i).

Likelihood and estimation, Maximum Likelihood estimators
These considerations make us believe that "likelihood" might be a helpful concept for identifying the distribution that generated a given sample. First note, though, that as such, this approach is moot if we don't a priori restrict the scope of our search: the probability distribution leading to the largest possible value of the likelihood is obtained by assigning the probability 1/n to each of the points where there is an observation, and assigning the value 0 to f(x) for any other point of the x axis. This result is both trivial and useless. But consider the example given in the above illustration: f1(x) and f2(x) are assumed to belong to a family of distributions, all identical in shape and differing only by their position along the x axis (location family). It now makes sense to ask for which position of the generic distribution f(x) the likelihood is largest.

If we denote θ the parameter adjusting the horizontal position of the distribution, one may consider the value of θ conducive to the largest likelihood as being probably fairly close to the true (and unknown) value θ0 of the parameter of the distribution that actually generated the sample. It then appears that the concept of likelihood may lead to a method of parameter estimation. The method consists in retaining, as an estimate of θ0, the value of θ conducive to the largest possible value of the sample likelihood. This method is thus called Maximum Likelihood estimation, which is, in fact, the most powerful and widely used method of parameter estimation these days. An estimator θ* obtained by maximizing the likelihood of a probability distribution defined up to the value of a parameter θ is called a Maximum Likelihood estimator and is usually denoted "MLE". When we need to emphasize the fact that the likelihood depends on both the sample x = {x_i} and the parameter θ, we'll denote it L(x, θ).

We used a continuous probability distribution for illustrating the concept of Maximum Likelihood, but the principle is valid for any distribution, either discrete or continuous.

Log-Likelihood
The likelihood is defined as a product, and maximizing a product is usually more difficult than maximizing a sum. But if a function L(θ) is changed into a new function L'(θ) by a monotonously increasing transformation, then L(θ) and L'(θ) will clearly reach their maximum values for the same value of θ. In particular, if the monotonous transformation is logarithmic, maximization of a product is then turned into an easier maximization of a sum. The logarithm of the likelihood is called the log-likelihood, and will be denoted log-L. So, by definition:
Log-likelihood = log-L = Σ_i log(f(x_i)), i = 1, 2, ..., n
and the likelihood and log-likelihood reach their extrema for the same values of θ.

Maximizing the likelihood
The principle of Maximum Likelihood estimation is straightforward enough, but its practice is fraught with difficulties, as is the case of any optimization problem.

Identifying extrema
From your highschool days, you may remember that the extrema of a differentiable function L(θ) verify the condition dL(θ)/dθ = 0.

As we just mentioned, this condition is also valid for the log-likelihood log-L instead of the likelihood L. So the most natural approach to maximizing a differentiable likelihood is to first solve this equation. Yet, this alone is far from solving the problem, for a number of reasons.

Tractability
Even though most classical likelihoods are differentiable (with the important exception of the uniform distribution), there is no reason why the solutions of this equation should have simple analytical forms. As a matter of fact, more often than not, they don't, and it may then be necessary to resort to computer numerical techniques to identify the extrema of the likelihood function (as is typically the case, for example, with Logistic Regression).

Identifying maxima
The above equation identifies extrema of L(θ), but says nothing about which of these extrema are maxima (that we are interested in) and which are minima (that we are not interested in). To make things worse, some inflexion points may also satisfy the equation. So, after the solutions of the equation have been identified, one must go through these solutions to retain only those corresponding to maxima. Computer optimization techniques can be adjusted so as to identify only maxima. Recall that a genuine maximum also verifies d²L(θ)/dθ² < 0, a condition that has to be checked for every solution of the first equation.
* There is no reason why the likelihood would have a single maximum. So once the maxima have been found, only the largest among them is retained, provided it is within the allowed range of the parameter θ. It can be shown, though, that under certain regularity conditions, the probability for the likelihood function to have a unique maximum tends to 1 as the sample size grows without limit.
* The equation identifies extrema only within the interior of the range of θ. It is therefore ineffective in identifying extrema:
- that are on the boundary of the range of θ when this range is limited,
- or that are "at infinity" (lower image of the above illustration); the likelihood then has no maximum.

Numerical errors
When the maximum of the likelihood is identified by computer numerical techniques, the issue of the validity of the solution thus found is crucial. The value resulting from intense numerical computation may be extremely sensitive to round-off errors, thus conducing to estimated values of the parameter that may be substantially different from the value that would be obtained in the absence of computation errors. This is particularly true when the true value of the parameter is in a region where the likelihood varies very little with θ, or when the maximum is at infinity.

Numerical instabilities
In the same line of reasoning, it may happen that the value of the likelihood is extremely sensitive to small changes in the values of the observations. Because real-world observations are always somewhat uncertain, it is a good idea to check the changes in the estimated value of the parameter when the values of the observations are slightly modified. If these small modifications lead to large variations in the estimated value of the parameter, the original value of the estimated parameter should be regarded with some suspicion.

Multivariate likelihood
Maximization of the likelihood may also be used for estimating several parameters simultaneously. This will be the case:
1) When two (or more) parameters of a univariate distribution are estimated simultaneously (for example, simultaneous estimation of the mean and of the variance of a normal distribution, see the animation below).
2) When a (vector) parameter of a multivariate distribution is estimated. For example:
* Estimating the mean of a p-variate distribution is equivalent to the simultaneous estimation of p univariate parameters (the coordinates of the distribution mean).
* Estimating a covariance matrix involves the simultaneous estimation of its n(n + 1)/2 coefficients (owing to the symmetry of the matrix).

The situation is now a bit more complex than in the univariate case.

First order conditions
As in the univariate case (and with the same restrictions concerning the range), local extrema can be identified by setting to 0 all the partial derivatives of the likelihood function with respect to the components of the parameter. For example, if the vector parameter θ has two components θ1 and θ2, the extrema of the likelihood must verify ∂L/∂θ1 = 0 and ∂L/∂θ2 = 0.

Second order conditions
The second order conditions permitting to certify that a point of a twice continuously differentiable function L verifying the above first order conditions is indeed a maximum are somewhat more complicated than in the univariate case. It is in fact a set of two conditions:
1) At least one of the second partial derivatives of L with respect to the components of the parameter must be negative (not just non positive): ∂²L/∂θ_i² < 0 for at least one i.
2) The determinant of the matrix of the second order partial derivatives of L must be positive (not just non negative).
This last condition is in practice fairly annoying, as it usually leads to cumbersome calculations even in simple cases.

Animation
This animation illustrates the idea of maximizing the likelihood of a normal distribution when its two parameters (mean and variance) have to be estimated simultaneously. The distribution likelihood is the product of the heights of all the green connections from the sample points to the gaussian curve; the posted value is the ratio of the current likelihood to the largest possible likelihood.

Properties of Maximum Likelihood estimators
So far, we only convinced ourselves that maximizing the likelihood of a sample seems to be a reasonable way of estimating the value of the parameter of a distribution, but we also anticipated some technical difficulties in doing so. So why insist on Maximum Likelihood estimation? It turns out that MLEs have very interesting properties, which we now enunciate.

Invariance property of MLEs
Suppose we identified θ*, the Maximum Likelihood estimator of the parameter θ. Suppose also that what we are really interested in is not θ, but rather a function of θ, say φ(θ). How can we find an estimator of φ(θ)? For example, is the MLE of a variance σ² of any help in identifying an estimator of the standard deviation σ? It is. We'll show that for any function φ(.), if θ* is the Maximum Likelihood estimator of θ, then φ(θ*) is the Maximum Likelihood estimator of φ(θ).

Fixed size samples
* We show here that if the parameter θ admits an efficient estimator θ* (i.e. unbiased with minimum variance), then the value of this estimator for the sample is the unique solution of the above likelihood equation.
* We show here that if the parameter θ admits a sufficient statistic, and if the MLE is unique, then this MLE is a function of this sufficient statistic. In addition, if this unique MLE is itself sufficient, then it is minimal sufficient.

Asymptotic properties of MLEs
The strongest justification for Maximum Likelihood estimation may be found in the asymptotic (that is, for large samples) properties of MLEs.
1) Consistency. The least that can be expected from a statistic as a candidate estimator is to be consistent. We'll show that, under certain regularity conditions, a MLE is indeed consistent: for larger and larger samples, its variance tends to 0 and its expectation tends to the true value θ0 of the parameter.
2) Asymptotic normality. As the sample size grows without limit, we'll show that the distribution of a MLE converges to a normal distribution. Even for moderately large samples, the distribution of a MLE is approximately normal.
3) Asymptotic efficiency. Last but certainly not least, we'll show that a MLE is asymptotically efficient. What this means is that as the sample size grows without limit, the ratio of the variance of a MLE to the Cramér-Rao lower bound tends to 1. As a MLE is asymptotically unbiased, it is then also asymptotically efficient.

Remember, though, that the asymptotic properties of an estimator, good as they may be, say nothing about the properties of this estimator for small samples, and there is no reason to believe that MLEs are particularly good estimators for small samples. In particular:
* Consistency implies asymptotic unbiasedness, but MLEs have no reason for being unbiased estimators, and more often than not, they are biased.
* Asymptotic efficiency implies the smallest possible variance for very large samples, but says nothing about the variance of a MLE for moderate size samples.

Maximum likelihood estimation and data modeling
The concept of Maximum Likelihood estimation extends to the question of estimating the parameters of a model (whether predictive or descriptive). Consider for example the case of Simple Linear Regression y = α + βx + ε. For each data point x_i, the standard linear regression model tells us that the measurement y_i is normally distributed (under the normality assumption) according to N(α + βx_i, σ). Fitting the model provides estimates of α, β and σ. This fit may be achieved by assigning the quantity N(y_i - (α + βx_i), σ) to each data point, and defining the likelihood of the model by L = Π_i N(y_i - (α + βx_i), σ) (where y_i is the measurement in x_i), and then maximizing this likelihood with respect to α, β and σ. The results of this maximization process are the Maximum Likelihood estimates of the model parameters. This approach clearly generalizes to any model consisting of:
1) a deterministic part (here α + βx),
2) and a random part with a probability distribution known up to the values of some parameters (here, σ).

In the case of standard Linear Regression, Maximum Likelihood estimation is nearly equivalent to Least Squares estimation, but this is an exception rather than a rule. For example, with Logistic Regression or Neural Networks, Maximum Likelihood estimation is just about the only operational technique for estimating the parameters of the model.

Likelihood and tests
Since likelihood measures the quality of the fit between a distribution and a sample, it should be expected to play an important role in tests bearing on the choice between candidate distributions as the distribution that generated the sample. The simplest example of use of the likelihood in tests is to be found with the Neyman-Pearson theorem, which states that the Best Critical Region for a test that has to decide between two candidate distributions is entirely determined by considerations about the likelihoods of these two distributions for the sample at hand. The Neyman-Pearson theorem is limited to the case where both the null and the alternative hypothesis are simple. Its generalization to composite hypotheses gives rise to a general and powerful method for building tests known as the Likelihood Ratio Test (LRT) method.

Caveat
Maximum Likelihood estimation is attractive because it is conceptually simple and receives an intuitive interpretation. Yet, a mathematically rigorous approach to the properties of MLEs is difficult, and invariably involves regularity conditions on the likelihood function that are difficult to establish, difficult to interpret and difficult to check in real life applications. These regularity conditions cannot be casually ignored, and the already long life of Maximum Likelihood estimation is illustrated by a number of lethally pathological behaviors of MLEs, even for the most basic properties (e.g. consistency). So MLEs should certainly not be considered as a magic solution to be selected without regard for other types of estimators.
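To make these notes concrete in the context of this lecture, here is a minimal sketch of estimating the two Weibull parameters (shape β, scale η) from the six failure times of the earlier plotting example by numerically maximizing the log-likelihood. It assumes NumPy and SciPy are available; the function and variable names are ours, and the output is illustrative rather than a value quoted in the lecture.

```python
# Minimal MLE sketch: fit a 2-parameter Weibull (shape beta, scale eta) to the
# failure times from the plotting example by maximizing the log-likelihood.
import numpy as np
from scipy.optimize import minimize

times = np.array([96.0, 257.0, 498.0, 763.0, 1051.0, 1744.0])

def neg_log_likelihood(params):
    beta, eta = params
    if beta <= 0 or eta <= 0:          # keep the search inside the parameter range
        return np.inf
    z = times / eta
    # Weibull pdf: f(t) = (beta/eta) * (t/eta)^(beta-1) * exp(-(t/eta)^beta)
    log_f = np.log(beta / eta) + (beta - 1.0) * np.log(z) - z**beta
    return -np.sum(log_f)

result = minimize(neg_log_likelihood, x0=[1.0, times.mean()], method="Nelder-Mead")
beta_hat, eta_hat = result.x
print(f"beta ~ {beta_hat:.2f}, eta ~ {eta_hat:.0f} h")
```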


Maximum likelihood method. Likelihood - the idea.

The figure shows a sample consisting of n independent observations and two continuous distributions f1(x) and f2(x), where f2(x) is in fact f1(x) translated by a certain amount.

Although it is not entirely impossible, we doubt (do not believe) that the sample was generated by the function f2(x), since all the observations lie in a region where the values of f2(x) are small: the probability that an observation appears in that region is small, and it is even less probable that all of the observations in the sample appeared in a region of low probability density. On the other hand, the values that f1(x) takes at all of the observations are substantial, and precisely such that there is no doubt that the sample could have been generated by the function f1(x). These considerations lead us to believe that likelihood can be an effective concept for identifying the distribution that generated the sample. (Source: http://www.aiaccess.net/English/home.htm, http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_likelihood.htm)

If we look at the functions f1(x) and f2(x), we see that they belong to the same family of distributions, which have the same shape and differ only in their position along the x axis (location).

It now makes sense to ask which position gives the greatest likelihood of having generated the given sample.

If we denote by θ the parameter that determines the horizontal position of the distribution, we can say that the value of θ that leads to the largest likelihood is most probably close to the true and unknown value θ0 of the parameter of the distribution that actually generated the sample. It appears that the concept of likelihood can lead us to a method for estimating the parameter.

The method consists in taking, as the estimator of the parameter θ0, the value of θ that leads to the largest possible value of the likelihood of the sample.

This method is called the maximum likelihood method (Maximum Likelihood Estimation, MLE) and is the most powerful and today the most widely used method for estimating distribution parameters. The estimator θ* obtained by maximizing the likelihood of a probability distribution defined up to the value of a parameter θ is called the maximum likelihood estimator and is usually denoted MLE.

When we want to emphasize that the likelihood depends both on the sample X = {x_i} and on the parameter θ, we will denote it L(X, θ). (Maximum likelihood method)
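In symbols, the estimator just described is

θ* = arg max over θ of L(X, θ) = arg max over θ of Π_i f(x_i; θ),

and in practice one equivalently maximizes the log-likelihood Σ_i ln f(x_i; θ), setting its derivative with respect to θ to zero.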
It now makes sense to ask for which position of the generic distribution f(x) the likelihood is largest. If we denote by θ the parameter adjusting the horizontal position of the distribution, one may consider the value of θ conducive to the largest likelihood as being probably fairly close to the true (and unknown) value θ0 of the parameter of the distribution that actually generated the sample.

It then appears that the concept of likelihood may lead to a method of parameter estimation. The method consists in retaining, as an estimate of θ0, the value of θ conducive to the largest possible value of the sample likelihood. This method is called Maximum Likelihood estimation and is, in fact, the most powerful and widely used method of parameter estimation today. An estimator θ* obtained by maximizing the likelihood of a probability distribution defined up to the value of a parameter θ is called a Maximum Likelihood estimator and is usually denoted "MLE". When we need to emphasize the fact that the likelihood depends on both the sample x = {xi} and the parameter θ, we'll denote it L(x, θ).

We used a continuous probability distribution for illustrating the concept of Maximum Likelihood, but the principle is valid for any distribution, either discrete or continuous.

Log-likelihood
The likelihood is defined as a product, and maximizing a product is usually more difficult than maximizing a sum. But if a function L(θ) is changed into a new function L'(θ) by a monotonically increasing transformation, then L(θ) and L'(θ) will clearly reach their maximum values for the same value of θ. In particular, if the monotonic transformation is logarithmic, maximization of a product is turned into an easier maximization of a sum. The logarithm of the likelihood is called the log-likelihood and will be denoted log-L. So, by definition:

Log-likelihood = log-L = ∑ log(f(xi)), i = 1, 2, ..., n

and the likelihood and log-likelihood reach their extrema for the same values of θ.

Maximizing the likelihood
The principle of Maximum Likelihood estimation is straightforward enough, but its practice is fraught with difficulties, as is the case with any optimization problem.

Identifying extrema
From your high-school days, you may remember that the extrema of a differentiable function L(θ) verify the condition

dL(θ)/dθ = 0

As we just mentioned, this condition is also valid for the log-likelihood log-L instead of the likelihood L. So the most natural approach to maximizing a differentiable likelihood is to first solve this equation. Yet, this alone is far from solving the problem, for a number of reasons.

Tractability
Even though most classical likelihoods are differentiable (with the important exception of the uniform distribution), there is no reason why the solutions of this equation should have simple analytical forms. As a matter of fact, more often than not, they don't, and it may then be necessary to resort to numerical techniques to identify the extrema of the likelihood function (as is typically the case, for example, with Logistic Regression).

Identifying maxima
The above equation identifies extrema of L(θ), but says nothing about which of these extrema are maxima (which we are interested in) and which are minima (which we are not). To make things worse, some inflexion points may also satisfy the equation. So, after the solutions of the equation have been identified, one must go through them to retain only those corresponding to maxima. Computer optimization techniques can be adjusted so as to identify only maxima. Recall that a genuine maximum also verifies

d²L(θ)/dθ² < 0

a condition that has to be checked for every solution of the first equation.

* There is no reason why the likelihood should have a single maximum. So once the maxima have been found, only the largest among them is retained, provided it is within the allowed range of the parameter θ. It can be shown, though, that under certain regularity conditions the probability for the likelihood function to have a unique maximum tends to 1 as the sample size grows without limit.

* The equation identifies extrema only within the interior of the range of θ. It is therefore ineffective in identifying extrema:
- that are on the boundary of the range of θ when this range is limited,
- or that are "at infinity". The likelihood then has no maximum.

Numerical errors
When the maximum of the likelihood is identified by numerical techniques, the validity of the solution thus found is a crucial issue. The value resulting from intense numerical computation may be extremely sensitive to round-off errors, leading to estimated parameter values that may be substantially different from the values that would be obtained in the absence of computation errors. This is particularly true when the true value of the parameter is in a region where the likelihood varies very little with θ, or when the maximum is at infinity.

Numerical instabilities
In the same line of reasoning, it may happen that the value of the likelihood is extremely sensitive to small changes in the values of the observations. Because real-world observations are always somewhat uncertain, it is a good idea to check the changes in the estimated value of the parameter when the values of the observations are slightly modified. If these small modifications lead to large variations in the estimate, the original estimate should be regarded with some suspicion.

Multivariate likelihood
Maximization of the likelihood may also be used for estimating several parameters simultaneously.
This will be the case:
1) When two (or more) parameters of a univariate distribution are estimated simultaneously (for example, simultaneous estimation of the mean and of the variance of a normal distribution).
2) When a (vector) parameter of a multivariate distribution is estimated. For example:
* Estimating the mean of a p-variate distribution is equivalent to the simultaneous estimation of p univariate parameters (the coordinates of the distribution mean).
* Estimating a covariance matrix involves the simultaneous estimation of its n(n + 1)/2 distinct coefficients (owing to the symmetry of the matrix).

The situation is now a bit more complex than in the univariate case.

First order conditions
As in the univariate case (and with the same restrictions concerning the range), local extrema can be identified by setting to 0 all the partial derivatives of the likelihood function with respect to the components of the parameter. For example, if the vector parameter θ has two components θ1 and θ2, the extrema of the likelihood must verify

∂L/∂θ1 = 0 and ∂L/∂θ2 = 0

Second order conditions
The second order conditions permitting to certify that a point of a twice continuously differentiable function L verifying the above first order conditions is indeed a maximum are somewhat more complicated than in the univariate case. They are in fact a set of two conditions:
1) At least one of the second partial derivatives of L with respect to the components of the parameter must be negative (not just non-positive): ∂²L/∂θi² < 0 for at least one i.
2) The determinant of the matrix of the second order partial derivatives of L must be positive (not just non-negative).
This last condition is in practice fairly annoying, as it usually leads to cumbersome calculations even in simple cases.

(The original glossary page illustrates this with an interactive animation in which a candidate normal distribution is fitted to a sample by adjusting its mean and standard deviation; the likelihood is the product of the density values at the sample points, and the posted value is the ratio of the current likelihood to the largest possible likelihood.)

Properties of Maximum Likelihood estimators
So far, we have only convinced ourselves that maximizing the likelihood of a sample seems to be a reasonable way of estimating the value of the parameter of a distribution, but we also anticipated some technical difficulties in doing so. So why insist on Maximum Likelihood estimation? It turns out that MLEs have very interesting properties, which we now state.

Invariance property of MLEs
Suppose we identified θ*, the Maximum Likelihood estimator of the parameter θ. Suppose also that what we are really interested in is not θ, but rather a function of θ, say φ(θ). How can we find an estimator of φ(θ)? For example, is the MLE of a variance of any help in identifying an estimator of the standard deviation? It is.
For any function φ(·), if θ* is the Maximum Likelihood estimator of θ, then φ(θ*) is the Maximum Likelihood estimator of φ(θ).

Fixed-size samples
* If the parameter θ admits an efficient estimator θ* (i.e. unbiased with minimum variance), then the value of this estimator for the sample is the unique solution of the likelihood equation.
* If the parameter θ admits a sufficient statistic, and if the MLE is unique, then this MLE is a function of that sufficient statistic. In addition, if this unique MLE is itself sufficient, then it is minimal sufficient.

Asymptotic properties of MLEs
The strongest justification for Maximum Likelihood estimation may be found in the asymptotic (that is, large-sample) properties of MLEs.
1) Consistency. The least that can be expected from a statistic as a candidate estimator is that it be consistent. Under certain regularity conditions, an MLE is indeed consistent: for larger and larger samples, its variance tends to 0 and its expectation tends to the true value θ0 of the parameter.
2) Asymptotic normality. As the sample size grows without limit, the distribution of an MLE converges to a normal distribution. Even for moderately large samples, the distribution of an MLE is approximately normal.
3) Asymptotic efficiency. Last but certainly not least, an MLE is asymptotically efficient: as the sample size grows without limit, the ratio of the variance of the MLE to the Cramér-Rao lower bound tends to 1. As an MLE is asymptotically unbiased, it is then also asymptotically efficient.

Remember, though, that the asymptotic properties of an estimator, good as they may be, say nothing about the properties of that estimator for small samples, and there is no reason to believe that MLEs are particularly good estimators for small samples. In particular:
* Consistency implies asymptotic unbiasedness, but MLEs have no reason to be unbiased estimators, and more often than not they are biased.
* Asymptotic efficiency implies the smallest possible variance for very large samples, but says nothing about the variance of an MLE for moderate sample sizes.

Maximum Likelihood estimation and data modeling
The concept of Maximum Likelihood estimation extends to the question of estimating the parameters of a model (whether predictive or descriptive). Consider for example the case of simple linear regression, y = α + βx + ε. For each data point xi, the standard linear regression model tells us that the measurement yi is normally distributed (under the normality assumption) according to N(α + βxi, σ). Fitting the model provides estimates of α, β and σ. This fit may be achieved by assigning to each data point the value of the N(α + βxi, σ) density at yi, and defining the likelihood of the model by

L = ∏ fN(yi; α + βxi, σ)

(where yi is the measurement at xi), and then maximizing this likelihood with respect to α, β and σ. The results of this maximization process are the Maximum Likelihood estimates of the model parameters. This approach clearly generalizes to any model consisting of:
1) a deterministic part (here α + βx),
2) and a random part with a probability distribution known up to the values of some parameters (here, σ).

In the case of standard Linear Regression, Maximum Likelihood estimation is nearly equivalent to Least Squares estimation, but this is an exception rather than a rule.
For example, with Logistic Regression or Neural Networks, Maximum Likelihood estimation is just about the only operational technique for estimating the parameters of the model.

Likelihood and tests
Since the likelihood measures the quality of the fit between a distribution and a sample, it should be expected to play an important role in tests bearing on the choice between candidate distributions as the distribution that generated the sample. The simplest example of the use of the likelihood in tests is found in the Neyman-Pearson theorem, which states that the Best Critical Region for a test that has to decide between two candidate distributions is entirely determined by considerations about the likelihoods of these two distributions for the sample at hand. The Neyman-Pearson theorem is limited to the case where both the null and the alternative hypothesis are simple. Its generalization to composite hypotheses gives rise to a general and powerful method for building tests known as the Likelihood Ratio Test (LRT) method.

Caveat
Maximum Likelihood estimation is attractive because it is conceptually simple and receives an intuitive interpretation. Yet, a mathematically rigorous treatment of the properties of MLEs is difficult, and invariably involves regularity conditions on the likelihood function that are difficult to establish, difficult to interpret and difficult to check in real-life applications. These regularity conditions cannot be casually ignored, and the long history of Maximum Likelihood estimation includes a number of severely pathological behaviours of MLEs, even for the most basic properties (e.g. consistency). So MLEs should certainly not be considered a magic solution to be selected without regard for other types of estimators.
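A minimal numerical sketch of the ideas above (not from the original slides): the two parameters of a normal distribution, mean and standard deviation, are estimated simultaneously by numerically maximizing the log-likelihood, exactly the situation in which the likelihood equations need a computer rather than a closed form. The data and starting values are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)      # hypothetical sample

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:                                  # keep the search inside the valid parameter range
        return np.inf
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Nelder-Mead needs no derivatives, which is convenient when the likelihood
# equations have no simple analytical solution.
result = minimize(neg_log_likelihood, x0=[np.median(x), 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)   # close to the sample mean and (biased) sample standard deviation
```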


The method of maximum likelihood (Maximum Likelihood, ML) - the concept

Let X = {x1, ..., xn} be a sample of the random variable X obtained after n experiments. Let the procedure by which an experiment is carried out be such that it does not influence the result of the next experiment (this is also required by other statistical methods). If this assumption holds, the sample can be thought of as a set of individual measurements of mutually independent and identically distributed random variables X1, ..., Xn. Suppose the set θ, i.e. the set of parameters of the population distribution, is given.

The product f(x1|θ) dx1

is nothing other than the probability that the random variable X1 falls in an infinitesimal interval of width dx1 centred at x1, or, more generally:

Pr(xi ≤ Xi ≤ xi + dxi) = f(xi|θ) dxi

The product

∏ f(xi|θ) dxi, i = 1, ..., n

represents the probability that the multivariate random variable Z = (X1, ..., Xn) falls in an infinitesimal neighbourhood of volume dx1 dx2 ... dxn centred at the point whose coordinates are (x1, ..., xn).

Let us now define the likelihood function:

L(θ) = ∏ f(xi|θ), i = 1, ..., n

The principle of maximum likelihood (ML)
The ML principle requires that, for a given sample X, one maximizes the probability of observing the random variable in an infinitesimal neighbourhood of the point of the Euclidean n-dimensional space Rⁿ defined by the vector X.

For an assumed probability density function f(x|θ), the likelihood function of the random sample {x1, ..., xn} is defined as:

L(θ) = f(x1|θ) · f(x2|θ) · ... · f(xn|θ) = ∏ f(xi|θ)

When the distribution has K unknown parameters, the parameter estimates are obtained by solving the system:

∂ ln L / ∂θk = 0, k = 1, 2, ..., K

The function l = ln(L) is the log-likelihood (sometimes called the reduced likelihood function). Accordingly, the estimator of the parameter set θ is the vector θ̂n such that:

L(θ̂n) = max L(θ), i.e. θ̂n = arg max L(θ)

This condition is satisfied at stationary points of the function L. The logic of the method: vary θ̂n until the function L(θ) reaches its maximum, i.e. until the maximum probability of observing the data is attained.

The advantage of using the function l instead of L is that products are transformed into sums, so the first derivatives are simpler to compute.

The two functions (l and L) reach their maximum at the same value of θ.
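A minimal sketch of this point with made-up data: for an exponential distribution the likelihood equation has the closed-form solution λ̂ = n / Σxi, and a simple grid search shows that L(λ) and l = ln L(λ) peak at the same λ.

```python
import numpy as np

x = np.array([1.2, 0.4, 2.5, 0.9, 1.7])          # hypothetical sample
lam_grid = np.linspace(0.05, 3.0, 500)

# Likelihood and log-likelihood of the exponential pdf f(x|lambda) = lambda * exp(-lambda x)
L = np.array([np.prod(lam * np.exp(-lam * x)) for lam in lam_grid])
logL = np.array([np.sum(np.log(lam) - lam * x) for lam in lam_grid])

print(lam_grid[np.argmax(L)])      # argmax of L
print(lam_grid[np.argmax(logL)])   # same argmax for l = ln L
print(len(x) / x.sum())            # closed-form MLE: n / sum(x)
```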

From a statistical point of view, the method of maximum likelihood is, with some exceptions, the most robust of all the parameter estimation techniques considered here.

The basic idea behind this method is to obtain, for a given distribution, the parameter values that most plausibly describe the observed data.

As an example, take the data set (−3, 0, 4) and suppose we are interested in estimating the mean. If we had to choose the most plausible value of the mean among the values 5, 1 and 10, which would we choose? In this case the most plausible value is 1 (a numerical check is sketched after the likelihood equations below). Similarly, in the method of maximum likelihood we choose those parameter values with which the assumed distribution would most plausibly generate the observed sample.

Mathematically this can be formulated as follows. If X is a continuous variable whose pdf is:

f(x; θ1, θ2, ..., θk)

where θ1, θ2, ..., θk are the unknown parameters, which have to be estimated from R independent observations x1, x2, ..., xR, which in the case of life data analysis correspond to times to failure, then the likelihood function is given as:

L(θ1, ..., θk | x1, ..., xR) = ∏ f(xi; θ1, θ2, ..., θk), i = 1, ..., R

The logarithm of the function L is given as:

Λ = ln L = ∑ ln f(xi; θ1, θ2, ..., θk), i = 1, ..., R

The maximum likelihood estimators (parameter values) θ̂1, θ̂2, ..., θ̂k are obtained by maximizing L or Λ. By maximizing Λ, which is much easier than working with L, the maximum likelihood estimators θ̂1, θ̂2, ..., θ̂k are obtained as the simultaneous solution of the k equations:

∂Λ/∂θj = 0, j = 1, 2, ..., k
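A quick numerical check of the (−3, 0, 4) example above, assuming (only for illustration) a normal distribution with unit standard deviation: among the candidate means 5, 1 and 10, the value 1 gives the largest log-likelihood.

```python
import numpy as np
from scipy.stats import norm

data = np.array([-3.0, 0.0, 4.0])
for mu in (5.0, 1.0, 10.0):
    # sum of log-densities of the sample for this candidate mean
    print(mu, norm.logpdf(data, loc=mu, scale=1.0).sum())
```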

The method of maximum likelihood for censored data

Left censoring - the value of a data point lies below a certain value, but it is not known by how much.

Interval censoring - the value of a data point lies somewhere in the interval between two known values.

Right censoring - the value of a data point lies above a certain value, but it is not known by how much.



Classification of life data

Data on a component, product or system can be separated into the following categories:

complete data, right-censored data, interval-censored data and left-censored data.

Complete data

Complete data, as shown in the figure on the right, refer to cases in which all the units subjected to the test fail and the time to failure of each of them is known.

Right-censored data

Right-censored data, also called suspended data, consist of units that did not fail during the test (figure on the right). Take as an example 5 units subjected to a test. Three units failed, and their times to failure (in hours) are 65, 76 and 84.

(Figure: "Right-censored data" - five units, three marked "failed" and two marked "operating".)

The two remaining units were still operating when the test was stopped, after 85 and 100 hours respectively. The data for these two units are said to be suspended, or right-censored.

Right-censored data

Suppose a single right-censored observation at time t = 1500.

Which function gives the probability of this happening?

R(1500) gives the probability that the unit fails at time t = 1500 or later (R is the reliability, or survival, function, R(t) = 1 − F(t)).


Interval-censored data

The second type of censored data, shown in the figure on the right, is interval-censored data. These data contain uncertainty as to when the unit actually failed.

For example, five units are subjected to a test and inspected every 100 hours to determine their status (failed or still operating).

The status can be determined only at the time of inspection. If a unit has failed, it is known only that it failed between two inspections; the actual failure time is not known. Instead of the actual time to failure, an interval is recorded.


Interval-censored data

Suppose a single interval-censored observation: the failure occurs at some time between 1000 and 1300.

Which function gives the probability of this happening?

F(1300) − F(1000) gives the probability that the unit fails at some time between 1000 and 1300.


Left-censored data

Left-censored data are a special case of interval-censored data, for which it is known that the time to failure of the given unit lies between time zero and some moment at which an inspection was performed.

If, for example, the inspection took place at hour 100, the unit could have failed at any moment between zero and one hundred hours.


Left-censored data

Suppose a single left-censored observation at time t = 500.

Which function gives the probability of this happening?

F(500) gives the probability that the unit fails at time t = 500 or earlier.
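A minimal sketch of the likelihood contribution of each of the three censored observations above (right-censored at t = 1500, interval-censored on (1000, 1300), left-censored at t = 500). The Weibull distribution and its parameter values are assumed only for illustration; they are not from the slides.

```python
from scipy.stats import weibull_min

shape, scale = 1.5, 1000.0                          # hypothetical parameter values
dist = weibull_min(shape, scale=scale)

right_term = dist.sf(1500)                          # R(1500) = 1 - F(1500)
interval_term = dist.cdf(1300) - dist.cdf(1000)     # F(1300) - F(1000)
left_term = dist.cdf(500)                           # F(500)
print(right_term, interval_term, left_term)
```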


Complete and suspended data

The classification of data (complete or suspended) can be open to different interpretations, depending on what we want to measure. In certain circumstances a unit, even though it has failed, may for the purposes of the analysis be classified as suspended. Take the data in the table on the right, which gives the times to failure of a product that failed due to failure modes labelled A, B and C. If the goal of the analysis is to determine the probability of failure of the product regardless of the failure mode, the data shown are classified as complete data. If, however, the goal of the analysis is to determine the probability of failure of the product due to, say, failure mode A only, the data relating to failure modes B and C are treated as suspended data.



MLE for right-censored data

When a maximum likelihood analysis is performed on data that contain suspended units, the likelihood function must be extended to take the suspended units into account. The estimation technique itself remains unchanged. Consider a distribution whose pdf and cdf are

f(t; θ1, θ2, ..., θk) and F(t; θ1, θ2, ..., θk),

where θ1, θ2, ..., θk are the unknown parameters to be estimated from R observed failures at times T1, T2, ..., TR and M observed suspensions at S1, S2, ..., SM. The likelihood function is then given as

L = ∏ f(Ti; θ1, ..., θk) · ∏ [1 − F(Sj; θ1, ..., θk)], i = 1, ..., R, j = 1, ..., M

The sought parameters are obtained by maximizing the above equation.
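A minimal sketch of this right-censored likelihood, using the failure times 65, 76, 84 and the suspension times 85, 100 from the earlier example; the Weibull distribution and the starting values are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

failures = np.array([65.0, 76.0, 84.0])        # exact failure times T_i
suspensions = np.array([85.0, 100.0])          # right-censored (suspended) times S_j

def neg_log_L(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:               # stay inside the valid parameter range
        return np.inf
    ll = weibull_min.logpdf(failures, shape, scale=scale).sum()    # sum ln f(T_i)
    ll += weibull_min.logsf(suspensions, shape, scale=scale).sum() # sum ln [1 - F(S_j)]
    return -ll

res = minimize(neg_log_L, x0=[1.0, 100.0], method="Nelder-Mead")
print(res.x)                                   # estimated (shape, scale)
```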

MLE for interval- and left-censored data

Including left- and interval-censored data in the MLE method amounts to adding one more term to the likelihood function L. When dealing with interval data, failures are assumed to occur within intervals, i.e. between time A and time B (or between time 0 and time B in the case of left-censored data), where A < B. Assuming there are P interval observations, the likelihood function has to be modified by multiplying it by the term

∏ [F(Bl; θ1, ..., θk) − F(Al; θ1, ..., θk)], l = 1, ..., P

If all the available data are interval data, this term alone represents the likelihood function.

The complete likelihood function

After including the terms corresponding to the different types of data, the likelihood function can be written in the form

L = ∏ f(Ti; θ1, ..., θk) · ∏ [1 − F(Sj; θ1, ..., θk)] · ∏ [F(IlU; θ1, ..., θk) − F(IlL; θ1, ..., θk)]

with i = 1, ..., R, j = 1, ..., M, l = 1, ..., P, where:



R = the number of units with an exact (observed) failure
M = the number of suspended units
P = the number of units with left- or interval-censored data
θk = the parameters of the distribution
Ti = the times to failure
Sj = the suspension times
IlU = the end (upper bound) of the interval of the l-th group
IlL = the beginning (lower bound) of the interval of the l-th group

The total number of units is N = R + M + P.

Note that when any of the numbers R, M or P is equal to zero, the corresponding term in the complete likelihood function is not zero but equal to one.
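A minimal sketch of the complete log-likelihood above. The Weibull distribution is assumed for illustration; the exact failure and suspension times are taken from the earlier example, while the interval observations are made up.

```python
import numpy as np
from scipy.stats import weibull_min

T = np.array([65.0, 76.0, 84.0])               # exact failure times (R = 3)
S = np.array([85.0, 100.0])                    # suspension times (M = 2)
I = np.array([[100.0, 200.0], [150.0, 250.0]]) # interval observations [IlL, IlU] (P = 2)

def log_L(shape, scale):
    dist = weibull_min(shape, scale=scale)
    ll = dist.logpdf(T).sum()                                   # sum ln f(T_i)
    ll += dist.logsf(S).sum()                                   # sum ln [1 - F(S_j)]
    ll += np.log(dist.cdf(I[:, 1]) - dist.cdf(I[:, 0])).sum()   # sum ln [F(IlU) - F(IlL)]
    # If R, M or P were zero, the corresponding sum would simply be 0,
    # i.e. the corresponding factor of L equals one, as noted above.
    return ll

print(log_L(1.5, 150.0))                       # log-likelihood at one hypothetical parameter point
```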

Comments on the MLE method

The MLE method has many properties that make it attractive in practice.

The method is asymptotically consistent, which means that as the sample size grows the estimates converge to the true values.

The method is asymptotically efficient, which means that for large samples it gives the most precise estimates.

The method is asymptotically unbiased, which means that for large samples the true values can be expected on average.

For a sufficiently large sample, the distribution of the estimates themselves is normal.
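A minimal simulation sketch of the asymptotic behaviour listed above (made-up settings): for an exponential distribution the MLE is n / Σxi, and over many simulated samples its distribution concentrates around the true value, with an approximately normal shape for large n.

```python
import numpy as np

rng = np.random.default_rng(1)
true_lambda, n, reps = 0.5, 200, 2000
estimates = np.array([n / rng.exponential(1 / true_lambda, size=n).sum() for _ in range(reps)])

print(estimates.mean())   # close to 0.5 (asymptotically unbiased)
print(estimates.std())    # shrinks as n grows (consistency)
```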



Unfortunately, achieving the properties listed above requires a fairly large sample size: thirty to fifty, up to more than a hundred exact times to failure, depending on the application.

It is known, for example, that for small samples the MLE estimates of the shape parameter of the Weibull distribution are strongly biased. This bias can cause discrepancies in the analysis.
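A minimal simulation sketch of this small-sample bias (all settings are assumptions chosen for illustration): with only n = 10 failure times per sample, the fitted Weibull shape parameter tends, on average, to come out above the true value.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(2)
true_shape, n, reps = 1.5, 10, 500
shapes = []
for _ in range(reps):
    sample = weibull_min.rvs(true_shape, scale=100.0, size=n, random_state=rng)
    c_hat, loc_hat, scale_hat = weibull_min.fit(sample, floc=0)   # two-parameter fit (location fixed at 0)
    shapes.append(c_hat)

print(np.mean(shapes))    # tends to lie noticeably above the true value 1.5
```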

There are also pathological situations in which the asymptotic properties of MLE are absent. One of them occurs when estimating the location parameter of the three-parameter Weibull distribution in cases where the shape parameter has a value close to one.


However, MLE can handle suspended and interval data better than the regression method, especially when dealing with a heavily censored data set containing only a few exact times to failure or when the censoring times are unevenly distributed.

MLE also makes estimation possible in cases with one or even no observed failures, which the regression method of course cannot do.

Experience shows that the regression technique is appropriate for small samples with little censored data.

In cases with a lot of censored data, a large disproportion in the intervals, and/or a sufficient sample size, the MLE method is preferred.


Bayesian estimation

So far we have dealt with approaches to parameter estimation based on classical statistics. In this chapter a second school of thought is introduced: Bayesian statistics. The basic premise of Bayesian statistics (in the context of life data analysis) is to arrive at statistical conclusions by combining the current observations with previously acquired knowledge. By including prior information about the parameters, it is possible to obtain their posterior distribution and thus draw conclusions about the model parameters and their functions.

Prior information may come from field experience or from observations obtained in previously performed comparable experiments.

This kind of analysis can be particularly useful when only a limited amount of data is available for a given design or failure mode, but there is a good prior understanding of the failure rate behaviour of the design or failure mode under consideration.

Thomas Bayes was born in London. In 1719 he enrolled at the University of Edinburgh to study logic and theology: as a Nonconformist, Oxford and Cambridge were closed to him. He is known to have published two works in his lifetime: Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731), and An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of the Analyst (published anonymously in 1736), in which he defended the logical foundation of Isaac Newton's calculus against the criticism of George Berkeley, author of The Analyst. It is speculated that Bayes was elected as a Fellow of the Royal Society in 1742 on the strength of the Introduction to the Doctrine of Fluxions, as he is not known to have published any other mathematical works during his lifetime. Bayes died in Tunbridge Wells, Kent. He is buried in Bunhill Fields Cemetery in London, where many Nonconformists are buried.


Bayesian estimation

Bayes' theorem

Suppose we have assessed prior probabilities for the parameters of interest, and after that we obtain new information. We would like to find a method for computing the revised, or posterior, probabilities.

Bayes' theorem gives us the answer.


(Figure: "Revision of probabilities by applying Bayes' theorem" - prior probabilities, plus new information, combined through Bayes' theorem, yield posterior probabilities.)



Bayes' theorem for a given parameter θ:

P(B|A) = P(A|B) P(B) / P(A)

p(θ | data) = p(data | θ) p(θ) / p(data), with 1/p(data) = const

Posterior ∝ likelihood × prior

Prior = the probability of the parameter; it represents what we thought before seeing the data of the experiment.
Likelihood = the probability of the data for a given parameter, based on the data of the experiment.
Posterior = what we think given the prior information and the experimental data just obtained.

The theorem relates the conditional probability density of the parameter (the posterior probability) to the unconditional probability density (the prior, since it depends on information obtained before the experiment begins).
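A minimal sketch of "posterior ∝ likelihood × prior" with made-up numbers: three candidate values of a defect probability θ, a prior over them, and an observation of 2 defects in 10 inspected units.

```python
import numpy as np
from scipy.stats import binom

theta = np.array([0.1, 0.2, 0.4])      # candidate parameter values
prior = np.array([0.5, 0.3, 0.2])      # prior probabilities (sum to 1)
likelihood = binom.pmf(2, 10, theta)   # p(data | theta): 2 defects in 10 units

posterior = prior * likelihood
posterior /= posterior.sum()           # 1/p(data) is just a normalizing constant
print(posterior)
```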

Bayes' rule

Bayes' rule provides a framework for combining prior information with the current sample data in order to draw conclusions about the model under consideration. Prior knowledge about the parameters is expressed through the function φ(θ), called the prior distribution.

By applying Bayes' rule, the posterior distribution of the parameter vector θ provides updated information about the parameters θ. Bayes' rule is expressed through the following expression for the pdf:

f(θ | Data) = L(Data | θ) φ(θ) / ∫ζ L(Data | θ) φ(θ) dθ

where:
θ = the vector of parameters of the chosen distribution
ζ = the range of the vector θ (the domain of integration)
L(Data | θ) = the likelihood function based on the chosen distribution and the data
φ(θ) = the prior distribution for each of the parameters
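A minimal sketch of Bayes' rule in pdf form, with hypothetical data and an assumed prior: the posterior of an exponential failure rate λ is computed on a grid, and the normalizing integral in the denominator (the marginal probability discussed below) is evaluated numerically.

```python
import numpy as np
from scipy.stats import gamma, expon

data = np.array([120.0, 340.0, 95.0, 410.0])           # hypothetical times to failure
lam = np.linspace(1e-4, 0.05, 2000)                     # grid over the parameter range

prior = gamma.pdf(lam, a=2.0, scale=1/200.0)            # phi(lambda): an assumed prior
likelihood = np.prod(expon.pdf(data[:, None], scale=1/lam), axis=0)  # L(Data | lambda)

marginal = np.sum(prior * likelihood) * (lam[1] - lam[0])  # numerical normalizing integral
posterior = prior * likelihood / marginal
print(lam[np.argmax(posterior)])                        # posterior mode of lambda
```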

The integral in the Bayes' rule equation, often called the marginal probability, is a constant and is interpreted as the probability of obtaining the sample data, given the prior distribution. In the general case it is an improper integral, whose evaluation