BachelorThesisinStatisticsandDataAnalysis A Bayesian ...1223567/FULLTEXT01.pdf · BachelorThesisinStatisticsandDataAnalysis A Bayesian approach to predict the number of soccer goals

Bachelor Thesis in Statistics and Data Analysis

A Bayesian approach to predictthe number of soccer goals

Modeling with Bayesian Negative Binomial regression

Joakim BäcklundNils Johdet

Division of Statistics and Machine LearningDepartment of Computer and Information Science

Linköpings University

June 2018 | LIU-IDA/STAT-G–18/006–SE

Supervisor: Lecturer. Isak Hietala

Examiner: Lecturer. Ann-Charlotte Hallberg

Abstract

This thesis focuses on a well-known topic in sports betting, predicting the numberof goals in soccer games. The data set used comes from the top English soccerleague: Premier League, and consists of games played in the seasons 2015/16 to2017/18. This thesis approaches the prediction with the auxiliary support of theodds from the betting exchange Betfair. The purpose is to find a model that cancreate an accurate goal distribution. The methods used are Bayesian NegativeBinomial regression and Bayesian Poisson regression. The results conclude thatthe Poisson regression is the better model because of the presence of underdisper-sion. We argue that the methods can be used to compare different sportsbooksaccuracies, and may help creating better models.

Acknowledgements

We would like to express our gratitude to our supervisor Lecturer. Isak Hietalafor his perpetual guidance and assistance in keeping the progress on schedule. Wewould also like to extend our gratitude to Ph.D. Student Per Sidén for valuableinsights and constructive suggestions. We would also like to thank Assistant pro-fessor Bertil Wegmann for ideas regarding the Bayesian modeling, his willingnessto give his time so generously has been very much appreciated. Lastly, we wish toexpress our gratitude to our opponents Sjoerd Schelhaas and Hugo Hjalmarssonfor providing much appreciated and useful feedback on the thesis.

Contents

1 Introduction 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Sports betting . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Soccer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.3 Betfair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.4 Odds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Previous studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3.1 Research questions . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Social and ethical aspects . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Data 5

2.1 Data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Distribution of the number of goals . . . . . . . . . . . . . . . . . . 6

3 Methods 8

3.1 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.1.1 Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . 8

3.1.2 Negative binomial distribution . . . . . . . . . . . . . . . . . 9

3.1.3 Gamma-Poisson mixture . . . . . . . . . . . . . . . . . . . . 10

3.2 Bayesian Inference and Modeling . . . . . . . . . . . . . . . . . . . 10

CONTENTS CONTENTS

3.2.1 Non-bayesian approach to regression . . . . . . . . . . . . . 11

3.2.2 Bayesian approach to regression . . . . . . . . . . . . . . . . 12

3.2.3 Poisson regression . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2.4 The Negative Binomial case . . . . . . . . . . . . . . . . . . 13

3.3 Markov Chain Monte Carlo (MCMC) . . . . . . . . . . . . . . . . . 14

3.3.1 Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3.2 Hamiltonian Monte Carlo . . . . . . . . . . . . . . . . . . . 16

3.3.3 MCMC Diagnostic . . . . . . . . . . . . . . . . . . . . . . . 17

3.4 Model evaluation and comparison . . . . . . . . . . . . . . . . . . . 18

3.4.1 Kullback-Leibler divergence . . . . . . . . . . . . . . . . . . 18

3.4.2 Deviance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.4.3 Widely Applicable Information Criterion (WAIC) . . . . . . 19

3.4.4 Akaike weights . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.5 Implementation in R . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.5.1 RStan Version 2.17.3 . . . . . . . . . . . . . . . . . . . . . . 21

3.5.2 rethinking Version 1.59 . . . . . . . . . . . . . . . . . . . . . 21

4 Results 22

4.1 Model comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.2 MCMC Diagnostic . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.2.1 Poisson model with total line 3.5 . . . . . . . . . . . . . . . 23

4.2.2 Negative Binomial model with total line 3.5 . . . . . . . . . 26

4.3 Predictive posterior distributions . . . . . . . . . . . . . . . . . . . 28

5 Discussion 30

5.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.3 Applications of method . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.4 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6 Conclusion 32

List of Figures

2.1 Bar graph of soccer goals distribution in the data set . . . . . . . . 7

3.1 Trace plot comparison of an unhealthy and a healthy Markov Chain 17

4.1 Trace plot for the Poisson model (3.5) . . . . . . . . . . . . . . . . . 24

4.2 Accumulated posterior quantiles of β1 from the Poisson model . . . 25

4.3 Pairs plot for Poisson model with total line 3.5 . . . . . . . . . . . . 26

4.4 Trace plot for the Negative Binomial model . . . . . . . . . . . . . 27

4.5 Predictive posterior distribution comparisons for models: Poisson35And NegBin35 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.6 Predictive posterior distribution comparisons on new data betweenmodels: Poisson35 And NegBin35 . . . . . . . . . . . . . . . . . . . 29

6.1 Pairs plot for Negative Binomial model with total line 3.5 . . . . . . 35

6.2 Accumulated posterior quantiles of β1 from the Negative Binomialmodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6.3 Accumulated posterior quantiles of β2 from the Negative Binomialmodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

List of tables

2.1 Example data of one observation from Betexplorer . . . . . . . . . . 5

2.2 Example of processed data with the implied probability for Overeach line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4.1 WAIC model comparisons . . . . . . . . . . . . . . . . . . . . . . . 22

4.2 Parameter estimation and diagnostics, Poisson model (3.5) . . . . . 23

4.3 Parameter estimation and diagnostics, Negative Binomial model (3.5) 26

Keywords

Odds – A reflection of the likelihood of a possible event expressed numerically. Inbetting, the decimal odds is expressed as the ratio of payoff to the stake wagered.

Implied probability – A conversion of odds into a percentage, calculated by theinversion of the odds.

Sportsbook – An organization that accepts bets usually on sports. They handlethe odds pricing, correction of the result and the payout of the winning.

Betting exchange – A service where the customers can choose to lay (give) odds,or place bets at other customers odds, also known as a prediction market, similarto a future exchange. The betting exchange provides the platform, leagues andgames, correction of result and the payout of the winnings.

Total – A common bet in sports is whether the total number of goals scored byboth teams is over or under a certain number, called the total-line.

Line – A number set by the the market or sportsbook before the event, wherebets can be placed on over or under the given number.

1. Introduction

This chapter provides an introduction to sports betting and the betting marketexchange Betfair. The second section presents previous studies in the field of goalpredicting. The third section covers the purpose of this thesis, and the last sectionprovides a reflection regarding the social and ethical aspects of this thesis.

1.1 Background

This section describes the history of sports betting, and a description of the bettingmarket exchange Betfair.

1.1.1 Sports betting

Gambling in general dates back to before written history; while sports bettinghave allegedly existed for as long as sports has been around, there are recordsof gambling at sports events and outcomes of gladiator fights from the Romanempire. [1]

Before sports betting was legalized in Nevada in 1931; people in the U.S placedtheir wagers trough privately run enterprises referred to as “bookies”. In UnitedKingdom, sports betting was not allowed until 1961. In 1994, Antigua and Barbudawas the first country to pass a law that allowed operators to apply for onlinegambling licences. However, sportsbooks did not get involved until 2001 whenU.K territories Isle of Man and Gibraltar began to offer licenses. [2]

Thanks to the sports betting industry’s online introduction, a great number ofsportsbooks has emerged. The competition has driven their margins down, andput more pressure on the accuracy of their models, to still continue to generateprofit. The lower margins has also given the market more incentive to try to beatthe sportsbooks.

1

1.2. PREVIOUS STUDIES Introduction

1.1.2 Soccer

The most popular sport is soccer, in sports betting this is not an exception. About70 percent of the market share is estimated to come from soccer. The most populartype of betting is outcome betting, but the competition between sportsbooks hasresulted in the appearances of other bet-types such as the total. [3]

1.1.3 Betfair

A well-known online gambling website is Betfair which was established in 2000.The company is particularly known for its betting exchange which is one of thelargest in the world. The customers get to decide what events they are willing toplace or lay bets on and to what odds. This results in a larger scale of possiblewagers to be found compared to if the wagers were to be decided by the sportsbookitself. [4]

1.1.4 Odds

Odds for games are often available days in advance, before the actual start of theevent. Information such as scoring average, player injuries, weather condition, andteam line-ups can be expected to be reflected in the odds. This is due to a conceptknown as the wisdom of the crowds coined by James Surowiecki. He implies thatthe collective wisdom is often more accurate than a judgement from one singleperson. [5]

1.2 Previous studies

There have been plenty of previous studies regarding models that predict theexpected number of goals in soccer. Most of them focus on using home and awayscoring averages to predict the total number of goals in an upcoming game.

One thesis researches the betting markets risk management regarding closing odds;the authors use a time-independent Poisson model to predict the results and com-pare it to the odds. They state that this model is quite similar to the model thatsome sportsbooks already use. They conclude that the odds, in some cases, arebeatable by the model, but it is often adjusted by the sportsbook next year. Themethod the authors used is foremost established from an article from Maher. [6]Maher remarks in the summary that “Previous authors have rejected the Poisson

2

1.3. PURPOSE Introduction

model for association football scores in favour of the Negative Binomial.” Mahersarticle investigates the Poisson model further by including parameters for attack-ing and defensive strengths of each team. The author draws the conclusion thatan independent Poisson model gives a good description of the number of goalsin soccer games, but improvements can be achieved by using a bivariate Poissonmodel instead of the independent Poisson model. [7]

In a recently published article in the International Journal of Forecasting, theauthors look into a bivariate Weibull model to predict results and number of goalsin soccer games. The authors rejected the Poisson distribution in favor of theWeibull count distribution, which provided better predictions of both results andthe number of goals. [8]

Another thesis evaluates if the odds are beatable by trying three different modelsto predict the number of goals in a soccer game. These three models are thefollowing:

• Gamblers assessment - using previous number of goals made by teams andcalculating an average.

• Poisson Distribution Assessment - This model assumes that the number ofgoals follow the Poisson Distribution and uses the average of the previousgoals and inserts it into the Poisson Distribution.

• Dixon-Coles Assessment - This model is based on previous number of goalsscored, creating parameters for the team’s offensive and defensive strength.It also takes into account whether the team plays at home or away.

The author concludes: “that with the approaches taken, it was not possible tocreate probability assessments which were better than those of the bookmakers.However, results show, that it is possible to almost match them.” [9]

1.3 Purpose

The purpose of this thesis is to find a model which can predict the distributionof the total numbers of goals in soccer games, using total-lines odds set by themarket. It would be convenient to use one or two total-lines to represent all thelines. Because sometimes all lines are not available, another reason is that a simplermodel is often to be preferred. Also, if one or two total-lines would to represent allthe total-lines, it would be an indication that the odds is useful when predictingthe number of goals in soccer.

3

1.4. SOCIAL AND ETHICAL ASPECTS Introduction

1.3.1 Research questions• Can the odds be used to create a useful predictive goal distribution?• Is negative binomial regression appropriate to model soccer goals in Premier

League?

1.4 Social and ethical aspects

This thesis does not use any data that can be connected to a certain person orobject which means that no cautions of data privacy policy has to be considered.

However, gambling is a controversial health issue and can have negative economicaland social consequences. Gambling can become an addictive pursuit for people.The proliferation of internet, and as consequence profusion of sportsbooks, providespeople with a perpetual appeal towards gambling. This thesis does not, in anyway, recommend people to try gambling.

1.5 Delimitations

The data set in this thesis does not consist of all games played in Premier Leagueduring the seasons 2015/16-2017/18. The reason is that the data source Betex-plorer did not have all the total lines available for every game. Therefore, onlygames with lines available between 0.5-4.5 has been selected.

4

2. Data

The data consist of the number of goals scored, total lines and odds from 687 soccergames played in the Premier League from season 2015/16 to 2017/18. PremierLeague is the highest ranked league of the English soccer league system and consistof 20 competing teams each season. Each team plays 38 games throughout theseason. The data was scraped from the website Betexplorer. [10]

Betexplorer saves the results and the odds for each game from a number of sports-books and betting exchanges. The table below presents an example of the rawdata.

Table 2.1: Example data of one observation from Betexplorer

game ID sportsbook total line Over Under1 Betfair 1.5 1.5 2.81 Betfair 2.5 1.95 1.951 Betfair 3.5 2.9 1.4

Table 2.1 presents three total lines from a single game and their correspondingodds for over and under the total line. The over and under columns represent theclosing odds on each outcome.

2.1 Data processing

The odds from table 2.1 are converted and stored as implied probabilities. Theimplied probabilities are an indication of how often a bet must win for it to breakeven in the long run. Suppose that someone wager 1 dollar on a bet with 2.0 oddsand if the wager is won, 2 dollar will be handed out. In this case the impliedprobability is 50% which means he or she has to win half the time to break evenin the long run.

The firs step in order to calculate the normalized implied probabilities given by theodds is to determine the factor used to normalize the implied probability (nmf)which is done by the following equation:

nmf =n∑i=1

1odds of outcome i

Where n is the number of possible outcomes.

5

2.2. DISTRIBUTION OF THE NUMBER OF GOALS Data

The next step is to calculate the breakeven odds (BEodds). This is done by thefollowing equation:

BEodds = nmf · odds (2.1)

The implied probabilities are then calculated by inverting the BEodds that wasdetermined in equation 2.1.

When this procedure is done, the data of implied probabilities are stored into adatabase and extracted in the following format.

Table 2.2: Example of processed data with the implied probability for Over each line

obs. 0.5 1.5 2.5 3.5 4.5 5.5 6.5 Goals1 0.917 0.746 0.520 0.296 0.139 0.059 0.022 42 0.878 0.642 0.373 0.192 0.080 0.031 0.010 03 0.944 0.800 0.572 0.358 0.185 0.086 0.035 14 0.928 0.747 0.490 0.284 0.129 0.056 0.021 15 0.978 0.901 0.752 0.565 0.361 0.208 0.099 2

Table 2.2 presents the implied probabilities for the outcomes over 0.5 to 6.5 goalsfrom five different games and the number of goals scored in each game. This isthe final data that was received after processing the original data.

2.2 Distribution of the number of goals

To get a visual understanding of how the number of goals is distributed, a bargraph is presented below.

6

2.2. DISTRIBUTION OF THE NUMBER OF GOALS Data

Figure 2.1: Bar graph of soccer goals distribution in the data set

Figure 2.1 illustrates a barplot for the goal distribution in the data set. The figureshows that most of the games end with two or three goals, and more than fourgoals are not as common.

7

3. Methods

In this chapter, the relevant distributions are first described; the second sectioncovers Bayesian inference and modeling. The third section describes Markov ChainMonte Carlo algorithms and the diagnostic tools used. The fourth section isabout model evaluation and comparison. In the last section, the packages used forBayesian modeling in R are described.

3.1 Distributions

This section describes the distributions Poisson, Negative binomial and their re-lationship to each other. The section also covers the Gamma-Poisson distributionand how it is relevant for this thesis.

3.1.1 Poisson distribution

The Poisson distribution is often used for the counts of event that occur in a giveninterval of time. The assumptions of the Poisson distribution are the following

• The number of times an event occurs is denoted k and can take values as 0,1, 2 ... n.

• Trials are made independently.

• The event has the same probability to occur throughout the whole time-interval.

The probability mass function has the following formula:

P (k | λ) = e−λ · λk

k!where k is the actual number of events occurring from the Poisson experiment, λis the average number of events occuring (mean) and is also equal to the variance.[11]

8

3.1. DISTRIBUTIONS Methods

3.1.2 Negative binomial distribution

The negative binomial distribution is a discrete probability distribution of thenumber of failures from a sequence of independent Bernoulli-trials, until a specified(and fixed) number of successes occurs. The negative binomial distribution hastwo parameters r and p and has the probability mass function

f(k | r, p) = p(X = k) =(r + k − 1

k

)pk(1− p)r for k = 0, 1, 2, ... (3.1)

where k equals the number of successes that occur before the rth failure and pis the probability of success.

Another common formulation is

f(k | r, p) = p(X = k) =(r + k − 1

k

)pr(1− p)k for k = 0, 1, 2, ...,

where k equals the number of failures that occur before the rth success and pis the probability of success. [11]

When counting the number of X failures before the r:th success, the expectednumber of failures is

E[X] = r(1− p)p

= µ

and the variance

V ar(X) = r(1− p)p2 = r(1− p)p+ r(p− 1)2

p2

= r(1− p)p

+ r(p− 1)2

p2 = µ+

(r(p− 1)2

)r

p2 · 1r

⇒ V ar(X) = µ+ µ2

r

as r → ∞, Negative Binomial distribution converges to the Poisson distribution,And for small r, Negative Binomial gives a larger variance than Poisson. Therefore,r is referred to as the dispersion or shape parameter. Hence, the Negative Binomialdistribution is an alternative to the Poisson distribution when the variance isgreater than the mean (overdispersion). [12]

9

3.2. BAYESIAN INFERENCE AND MODELING Methods

3.1.3 Gamma-Poisson mixture

Consider a Poisson distribution with parameter λ. The λ parameter follows agamma distribution with shape parameter r, and the scale parameter θ = p

1−pwhich can be expressed as the rate parameter β = 1

θ= 1−p

p. The mixture model

can be expressed as

f(k | r, p) =∫ ∞

0fPoisson

(k|λ) · f

Gamma

(λ|r, 1−p

p

)dλ

=∫ ∞

0

λk

k! e−λ · λr−1 e

−λ(1−p)/p

( p1−p)rΓ(r)dλ

= (1− p)rp−rk!Γ(r)

∫ ∞0

λr+k−1e−λ/pdλ

= (1− p)rp−rk!Γ(r) pr+kΓ(r + k)

⇒ f(k | r, p) = p(X = k) = Γ(r + k)k!Γ(r) pk(1− p)r where {r ∈ R | r > 0}

Γ(r+k)k!Γ(r) corresponds to

(r+k−1k

)in the probability mass function of the negative

binomial in Eq. 3.1. This means that r now has been extended to all positivereal values which will prove to be an essential part in subsection 3.3.2. Hence, theNegative binomial distribution can be described as a Poisson distribution mixedwith the Gamma distribution where λ is gamma distributed with shape parameterr and scale parameter θ = p

1−p . [13]

3.2 Bayesian Inference and Modeling

This section describes how Bayesian inference works and how it can be applied toa regular multiple regression.

If no citation is specified in this section, it can be assumed that the source is R.McElreath, Statistical rethinking 2016. [14]

10


Bayesian inference gives the opportunity to keep previous information in the anal-ysis. The prior information that the user would want to keep in the analysis canbe information from experience, previous experiments or data. Suppose the useris a non-expert in the subject and has no previous information available. The usershould then use a vague prior which means the prior distribution play a minimalrole in the posterior distribution.

After deciding how vague the prior should be, the model is updated by actual datawhich educates the model further from the prior information, which results in aposterior. This update is performed by Bayes’ theorem which is a fundamental partof Bayesian modeling. Bayes theorem is used to calculate conditional probabilitiesand is represented by the following equation

p(A | B) = p(B | A) · p(A)p(B)

The equation below shows how Bayesian modeling is performed using Bayes the-orem

p(θ | datadatadata) = p(datadatadata | θ) · p(θ)p(datadatadata) ∝ p(datadatadata | θ) · p(θ)

where θ is the set of parameters. Hence p(θ) is the prior, p(datadatadata | θ) is thelikelihood of data given the parameters, and p(θ | datadatadata) is the posterior probabilitydistribution for the parameters θ given the observed data.

For every parameter intended to be estimated in a a Bayesian model, an initialsets of plausibilities has to be provided which are the priors.

3.2.1 Non-bayesian approach to regression

Consider a standard multiple linear regression, the equation is defined as

YYY = XXXβββ + εεε

where YYY is n × 1 column vector of the response variable, XXX is a n × k matrix,βββ is a k × 1 vector of regression coefficients, and εεε is a n × 1 column vector ofindependent and identically normally distributed random variables.

11


This formula can be rewritten as the probabilistic model behind it as

µµµ = XXXβββ

yi ∼ N (µi, σ)

Where the elements yi in YYY follows the normal distribution with mean µi andstandard deviation σ.

The parameters can be estimated by using ordinary least squares, done by mini-mizing the squared errors of fitted values, or by maximizing the likelihood function

θ̂MLE = arg maxβββ, σ

n∏i=1N (yi | xxx′iβββ, σ) (3.2)

where θ is the set of parameters in the vector βββ and σ. N is a probability densityfunction of the normal distribution, evaluated at yi with mean xxx′iβββ and standarddeviation σ. [15]

3.2.2 Bayesian approach to regression

In the Bayesian approach, instead of maximizing the likelihood function, eachparameter is assigned a prior distribution, and then Bayes theorem is used

f(βββ, σ | Y,X)︸︷︷︸posterior

∝n∏i=1N (yi | xxx′iβββ, σ)︸︷︷︸

likelihood

fβββ(βββ) fσ(σ)︸︷︷︸priors

When uniform priors are used, they correspond to fp(x) ∝ 1. Because the likeli-hood function is the same as in Eq. 3.2, maximum likelihood of the parameterswill be the same as its Bayesian counterpart, maximum a posteriori probability(MAP) estimate with uniform priors.

If a posterior distribution belongs to the same distribution family as the priorprobability distribution, it is called a conjugate prior. It is convenient because theparameters of the posterior distribution are then directly available.

12


3.2.3 Poisson regression

Poisson regression is a generalized linear model that assumes the response variabley follows a Poisson distribution. The model takes the form

yi ∼ Poisson(λi)

log(λi) = xxx′iβββ

βββ ∼ N (0, σ)

where xxxi is a (1 + k) × 1 vector of k numbers of explanatory variables, and βββ isa (k+1) × 1 vector of regression coefficients.

σ is arbitrarily but some guidelines are:

• vague: σ = 106

• weak informative: σ = 10

• informative prior: σ = 1

A log link is applied to ensure that the parameter λ maps only to positive values.

3.2.4 The Negative Binomial case

A Negative binomial regression can be written as an extended Gamma-Poissonmixture generalized model. What makes it extended is the use of two linearfunctions, one for each parameter. It can be compared to fitting a regressionmodel with normally distributed data with non-constant variance for σ. Thatsituation would also require a linear function for each parameter. [16]

The gamma probability density function with shape parameter k and scale param-eter θ has the following probability density function

1Γ(k)θkx

k−1e−xθ

with the mean E[X] = kθ and variance V ar(X) = kθ2

By inserting the shape parameter from section 3.1.3 k = r and the scale parameterθ = p

1−p It follows that µ is equal to rp1−p .

13

3.3. MARKOV CHAIN MONTE CARLO (MCMC) Methods

yi ∼ GamPois(µi, θi)

log(µi) = xxx′iβββµlog(θi) = xxx′iβββθ

Where xxxi is a (1 + k) × 1 vector of k number of explanatory variables, βββµ and βββθare two (1 + k) × 1 vectors of the regression coefficients.

Independent priors for each coefficients in the βββ vectors:

βββ ∼ N (0, σ)

σ is arbitrary.

The parameters µ and θ must be positive in order to ensure that r is positive and pto map between 0 and 1; therefore, a log link is applied to each of the parameters.

3.3 Markov Chain Monte Carlo (MCMC)


Markov Chain Monte Carlo (MCMC) are a group of algorithms which purpose is tosample from the posterior by constructing a Markov Chain that uses the posteriordistribution as the marginal distribution. The MCMC is commonly used when aconjugate prior cannot be used. The samples that are obtained by this processare used to approximate the posterior. The process requires no assumption ofthe shape of the posterior distribution, which makes it possible to sample directlyfrom it. Generalized linear- and multilevel models, which produce non-Gaussianposterior distributions, has a great benefit of using this process. This is becausethe MCMC has the ability to directly estimate models, without assuming multi-variate normality for instance. Besides these benefits that characterizes MCMC,the process is very time-consuming and some added monitoring of the process isalso required to ensure the MCMC is performing well, which will be explained insubsection 3.3.3 .

3.3.1 Markov Chain

A Markov Chain is a stochastic mathematical process that generates transitionsbetween different states of a variable. The process can be used on both discreteand categorical variables and are used to analyze how the outcome of the variable

14


changes within two consecutive time-periods. The set of all possible states for thevariable is called state space which can include different types of states such asweather conditions, goals scored etc.

Information on the probability of transitioning from one state to another in theprocess at time t, is given by a transition matrix. A process that can describe atransition to n different states, L1, L2, ..., Ln. The probability for the process tobe in a certain state at time t is presented in a vector such as

xxxt =

x1x2...xn

Furthermore, the probability for the process to transit between one state to anothercan be presented in a transition matrix

P =

1→ 1 2→ 1 ... n→ 11→ 2 2→ 2 ... n→ 2. . . .. . . .. . . .

1→ n 2→ n ... n→ n

Since the elements in the transition matrix is probabilities, they range from 0 to1 and each column in the matrix sums to 1.

With the aid of the two above-mentioned matrices, the probability vector for timet+ 1 can now be calculated by the equation

xxxt+1 = Pxxxt

By repeating the equation, probabilities further into the future can be calculatedby

xxxt+s = P sxxxt

Where s is the number of steps into the future.

By letting the probability distribution of xxxt be given by the n × 1 vector πππ, wheren is the number of states. A Markov Chain has reached an equilibrium distributionπππ once it satisfies

Pπππ = πππ

15


hence, πππ is an eigenvector with the eigenvalue 1. [17]

Regardless of which initial starting state that is chosen, the equilibrium probabil-ity distribution of states will be reached where no more changes will occur in thedistribution. Different type of MCMC algorithms use this to construct stationaryMarkov Chains, so that the equilibrium probability distribution is the target dis-tribution. If a stationary chain can be constructed, the chain can be iterated froman arbitrarily starting point many times. The draws generated would appear tobe coming from the target distribution.

3.3.2 Hamiltonian Monte Carlo

In statistics, Monte Carlo refers to algorithms used to solve computationally heavyproblems through simulations of random numbers and estimate the sample averagei.e. of the draws from the Markov Chains with the help of the law of large numbers.The Hamiltonian Monte Carlo (HMC) is an algorithm to sample from an unknownposterior distribution through the MCMC process.

The HMC is an effective algorithm when models consist of hundreds or even thou-sands parameters. The HMC can be thought of as a algorithm which pretends thatthe vector of parameters determine the position of a particle that has no friction,comparable to a physics simulation. HMC builds upon knowing how the density ischanging at the particle’s current location. The surface for the frictionless particleto glide across is provided by the log-posterior. Depending on if the log-posterioris very flat or very steep, the particle can glide for a long period of time or a shortperiod of time until it turns around. When the particle turns, it is because of thegradient changes direction.

The particle can glide for a long period of time until it changes direction whenthe log-posterior is very flat due to lack of information in the likelihood and flatpriors. The particle does not glide for a long period of time until it turns aroundwhen the log-posterior is very steep due to very concentrated likelihood or priors.This process provides an understanding of how the parameter’s distribution isscattered by learning from the gliding particle. The more time the particle spendsat a location, the more dense the log-posterior and vice versa.

Only when the parameters are continuous, Hamiltonian Monte Carlo can be usedsince the particle cant glide through a discrete parameter’s surface.

16


3.3.3 MCMC Diagnostic

In order to assess whether the convergence of the MCMC algorithm has occurredor not, a number of diagnostics can be used.

The most useful way for diagnosing a Markov Chain is too inspect a trace plot.A trace plot shows the samples in sequential order for each parameter. The firstpart of the plot is the warm-up phase (the gray regions in figure 3.1), where thechain is adapting for efficient sampling. The remaining region of the plot is thesampling used for inference. By inspecting the plot, a healthy Markov Chain canbe identified by stationarity and good mixing. Stationarity means that the meanis stable through the plot, and a well-mixing chain means each sample is not highlycorrelated with the sample before it. A low or non-existent correlation betweeneach sample provides a greater amount of information from a given number ofdraws from the posterior. Figure 3.1 illustrates two trace plots from an unhealthy(left) and a healthy Markov Chain (right).

Figure 3.1: Trace plot comparison of an unhealthy and a healthy Markov Chain

One metric used for diagnostic is called R̂. The metric gives an indication ofwhether a chain has converged to the equilibrium distribution. It is done by com-paring its behavior to other randomly initialized chains. The R̂ statistic measuresthe ratio of the average variance of samples, within each chain, to the variance ofthe pooled samples across all chains. If all the chains are at an equilibrium, thesewill be the same and R̂ will be equal to one. If the chains have not converged toa common distribution, the R̂ statistic will be greater than one. [18]

Another metric used is the effective sample size neff , it is an estimate of the numberof independent draws from the posterior distribution where anything greater than100 is considered adequate. Because the draws within a Markov Chain are notindependent if there is autocorrelation, the effective sample size, neff , will besmaller than the total sample size, N. The larger the ratio of neff to N the better.

17

3.4. MODEL EVALUATION AND COMPARISON Methods

Plots for the accumulated posterior means and quantiles, for each parameter, canalso be inspected for ensuring the convergence to a fixed value.

A Highest Posterior Density Interval (HPDI) is the smallest possible interval con-taining the probability mass specified. The interval is similar to a credibilityinterval, which is reminiscent of a confidence interval, but a HDPI can differ froma credibility interval when the posterior distribution is skewed or multimodal.

3.4 Model evaluation and comparison


There are many ways to evaluate and compare Bayesian models. Comparing themodels can be useful even if the predictive accuracy is considered poor because itcan help to decide where to go next in order to improve the model.

3.4.1 Kullback-Leibler divergence

One measurement used to provide a distance for how much a model deviates froma perfect model is called Information entropy and is is defined as

H(p) = −E[log(pi)] = −n∑i

pi(log(pi))

Where there are n possible events, and each event i has the probability pi

Information entropy needs to be quantified to be able to say how far a modelis from the target distribution. The Kullback-Leibler divergence is the averagedifference between the true target distribution p and the predicted distribution q.The difference is measured in log probability and the formula is

DKL(p || q) =∑i

pi(log(pi)− log(qi)) =∑i

pi log(piqi

)(3.3)

In a scenario where DKL = 0, q can be used to predict the true target distributionp which means the model that was used to predict the distribution q is the perfectmodel.

18


3.4.2 Deviance

While K-L Divergence provides a measure of distance, the target distribution pis still unknown. However, by comparing the divergences of two models q and mfor example, it is known from Eq. 3.3 that log(pi) is used for both calculating qand m. Which means, when comparing them, they are subtracted from each otherand shows that all p’s can be canceled; it has no impact on how far the comparingmodels are apart. Hence, each model’s average log-probability is sufficient forcomparing models, and an approximation of these averages is obtained by summingthe log-probabilities of each observed case for q and m.

This measurement is called deviance, and it is a relative model fit measurementand an approximation of K-L divergence. Deviance for model q is defined by

D(q) = −2∑i

log(qi)

where i denotes each case (observation) and each qi is the likelihood of case i.

3.4.3 Widely Applicable Information Criterion (WAIC)

Unfortunately, deviance has the same flaw as the coefficient of determination; italways improves when the model gets more complex. Information criteria correctsthat flaw by taking the complexity of the model into account and penalize complexmodels.

WAIC is the generalized version of the Akaike information criterion (AIC), itis an example of an information criterion for out-of-sample deviance. WAIC iscalculated by taking averages of the loglikelihood over the posterior distribution.It does not require a multivariate normal posterior distribution, compared to otherinformation criterion and is often more accurate.

The main feature of the WAIC is that it is pointwise. This is advantageous becausethe observations may have different uncertainties, and some can be harder topredict than others. The disadvantage is that it requires independent observations.

The WAIC consist of two parts, the first part is the log-pointwise-predictive-density:

lppd =N∑i=1

log[p(yi)] (3.4)

19


where p(yi) is the likelihood of observation yi.

For each set of parameters sampled from the posterior distribution, the lppd com-putes the likelihood of observation yi. It will then average the likelihoods of eachobservation i and sum over all observations. This is an analog of deviance averagedover the posterior distribution.

The second part is the effective number of parameters pWAIC

pWAIC =N∑i=1

V (yi) (3.5)

where V (yi) is the variance in log-likelihood for observation i.

Combining Eq. 3.4 and Eq. 3.5, WAIC is defined as

WAIC = −2(lppd− pWAIC)

and is an estimate of out-of-sample deviance.

3.4.4 Akaike weights

The Akaike weights is used to compare the models relative predictive accuracy.The weights are calculated by converting the expected deviance given by WAICto a probability scale. In a set of m models, the weight for model i is given by

wi =exp

[− 1

2dWAICi]

m∑j=1

exp[− 1

2dWAICj]

where dWAICi is the difference between model i WAIC value and the model withthe lowest WAIC.

The sum of all weights in the set of models will equal to 1 and each individualmodel will have a weight between 0 and 1. Each weight can be interpreted suchas the probability that the model i has the best prediction ability, given the set ofmodels.

20

3.5. IMPLEMENTATION IN R Methods

3.5 Implementation in R

3.5.1 RStan Version 2.17.3

Stan is a state-of-the-art platform for statistical modeling and high-performancestatistical computation. RStan is the R interface to Stan.

3.5.2 rethinking Version 1.59

The package rethinking essentially consists of two functions, map and map2stan.These functions force the user to build up a statistical function. In this thesis thefunction map2stan is used. The function builds a Stan model that is used to fitthe model with Hamiltonian Monte Carlo sampling.

The functions dgampois and rgampois computes the density and produces a ran-dom sample from a Gamma-Poisson mixture probability distribution.The functionparameters are the gamma parameters of the mean µ and the scale θ. Internally,the function uses dnbinom and rnbinom, which takes the parameters r and p. Theparameters r and p are calculated by r = µ

θand p = θ

1+θ .

21

4. Results

In this chapter the model evaluation and comparisons are presented in order todetermine which models that has the best prediction ability. The second sectionpresents the MCMC diagnostics for the two best models. The last section illus-trates the predictive abilities of the posterior distributions for these models.

4.1 Model comparison

In order to compare the different models predictive ability, a table is presentedbelow with WAIC values and Akaike weights for each model.

Table 4.1: WAIC model comparisons

Model (expl. var.) WAIC pWAIC dWAIC Akaike WeightPoisson (3.5) 2562.8 1.8 0.0 0.67Poisson (2.5) 2564.3 1.7 1.5 0.32Poisson (1.5) 2571.6 2.1 8.8 0.01

Negative Binomial (3.5) 2714.7 2.2 151.8 0.00Negative Binomial (4.5) 2715.1 2.3 152.3 0.00Negative Binomial (2.5) 2715.8 2.3 153.0 0.00Negative Binomial (5.5) 2715.9 2.3 153.1 0.00

Negative Binomial (0.5, 1.5) 2717.0 3.8 154.1 0.00Negative Binomial (0.5, 3.5) 2718.4 3.4 155.5 0.00Negative Binomial (0.5, 4.5) 2718.8 3.5 156.0 0.00Negative Binomial (0.5, 2.5) 2718.8 3.3 156.0 0.00

Negative Binomial (1.5) 2719.6 2.3 156.7 0.00Negative Binomial (0.5) 2727.1 2.1 164.3 0.00

Table 4.1 presents the WAIC values and Akaike weights for each of the models.To each of the models, the same weak informative prior has been used for allparameters, normally distributed with mean 0 and standard deviation of 10.

The top three models that has the lowest WAIC values are all Poisson models.This value is an estimate of out-of-sample deviance and provides the conclusionthat these three models has better predictive capability than the remaining models.Hence, the Poisson models has the best ability for predictions.

The Akaike weights confirms that the Poisson models are to prefer over the Nega-

22

4.2. MCMC DIAGNOSTIC Results

tive Binomial models to predict new observations. The probability for the Poissonmodel with explanatory variable 3.5 to be chosen as the model with best predictiveability is 67 %. The table shows that there are no Negative Binomial Model thathave a probability above zero which means that neither of them can be chosen asthe model with best predictive ability. 1

To summarize the model comparison of the thirteen models that is presented intable 4.1. the Poisson model with predictor variable 3.5 turns out to be the bestmodel. The best Negative Binomial model is when the predictor variable 3.5 isused. These two models will be further analyzed in subsections 4.2.1 and 4.2.2.

4.2 MCMC Diagnostic

This section presents the MCMC diagnostics of the best Poisson model and thebest Negative Binomial model that was selected from the model comparison. Byobserving MCMC diagnostics for each model, it is possible to determine whetherthey have fulfilled all the criteria for being reliable and useful models.

4.2.1 Poisson model with total line 3.5

A summary of the posterior result of the Poisson model with total line 3.5 ispresented below.

Table 4.2: Parameter estimation and diagnostics, Poisson model (3.5)

Mean StdDev lower 0.909 upper 0.909 neff R̂β01 0.58 0.10 0.42 0.75 156 1.00β1 1.45 0.29 0.98 2.01 158 1.00

Table 4.2 shows that the posterior mean for the intercept is 0.58 and 1.45 forthe slope parameter. 0.909 and upper 0.909 is a 90.9 percent highest posteriordensity interval. The reason why 90.9% was chosen as limit was because it providesa simple interpretation of the interval. The odds for the parameters posterior meanto be within the interval is 0.909

1−0.909 = 10, which means it is ten times more likelythat the parameters posterior mean is within the interval than outside.

The intervals for β01 and β1 indicate that both posterior distributions are reliablyabove zero. neff is greater than 100 and R̂ is approximately 1, this indicates thatthe Markov chain has converged to an equilibrium distribution.

1Note that the probabilities are rounded

23


To investigate further whether the model has a well calibrated Markov Chain ornot, a trace plot is presented below.

Figure 4.1: Trace plot for the Poisson model (3.5)

Figure 4.1 shows trace plots for the parameters in the Negative binomial model.The trace plot looks stationary, with a distinct zig-zag motion between the samples.The zig-zag motion is a sign that there is no correlation between the samples andthe eff value above each plot is greater than 100 which indicates that the chain ishealthy.

To determine how many iterations that is needed for the quantiles of the slopeparameter to converge, the following plot is visualized below.

24


Figure 4.2: Accumulated posterior quantiles of β1 from the Poisson model

Figure 4.2 illustrates a convergence plot for the accumulated posterior quantiles ofthe parameter β1. The figure shows that after approximately 100 iterations, eachof the quantiles have converged.

It is also of interest to determine if there exist correlation between each pair ofparameters in the model. This is visualized in a pairs plot below.

25


Figure 4.3: Pairs plot for Poisson model with total line 3.5

The diagonal of figure 4.3 presents the marginal estimate densities, which illus-trates the distribution of the values of the parameter in the Markov chain. Thelower left square shows the correlation coefficient between each pair of parameter.β01 is the intercept and β1 is the slope parameter in the Poisson model. The strongcorrelation is not a big concern, since HMC can handle it well. 2

4.2.2 Negative Binomial model with total line 3.5

A summary of the posterior result of the Negative Binomial model with total line3.5 as predictor variable is presented below.

Table 4.3: Parameter estimation and diagnostics, Negative Binomial model (3.5)

Mean StdDev lower 0.909 upper 0.909 neff R̂β01 0.54 0.17 0.22 0.81 501 1.00β02 6.61 3.26 1.30 12.05 171 1.01β1 1.56 0.55 0.65 2.50 505 1.00β2 7.96 8.64 -5.46 24.11 462 1.00

Table 4.3 shows that every parameter in the model has a neff value above 100and a R̂ value below 1.1. This is an indication that the models Markov Chainhas converged to an equilibrium distribution. The 90.9 percent highest posteriordensity interval for the parameter β2 is the only parameter that is not reliablyabove zero.

To investigate whether the model has a well adjusted Markov Chain or not, a traceplot is presented below.

2 The correlation can be reduced by centering the explanatory variable, which has been tried,it gave a lower correlation but a slightly higher WAIC.

26


Figure 4.4: Trace plot for the Negative Binomial model

Figure 4.4 shows that the Markov chains appears to be stationary, with a distinctzig-zag motion between the samples. The Markov Chains looks to be healthy sincethe zig-zag motion is a sign that there are no correlation between the samples. Theconclusion that was determined from table 4.3 that the parameter β2 cannot bereliably over zero, can clearly be seen in figure 4.4 where the draws range from -11up to 30.

To determine how many iterations that is needed for the quantiles of a the pa-rameters to converge, a similar plot as in figure 4.2 was produced for both β1 andβ2 which can be seen in the appendix (Figure 6.2). It appeared that each of thequantiles for the two parameters had converged. The correlation between eachpair of parameter in the model is visualized in a pair plot which also can be seenin the appendix (Figure 6.3).

27

4.3. PREDICTIVE POSTERIOR DISTRIBUTIONS Results

4.3 Predictive posterior distributions

This section aims to illustrate the predictive ability of the models from the lastsection. The models are also retrained on a sub-sample of the data set. Thesenew models are tested on the sub-sample of data it has not seen before, in orderto evaluate their predicting ability.

The predictive posterior distribution for the Poisson model and the Negative Bi-nomial model is presented in the histogram below, together with data. The x-axisshows the numbers of goals and the y-axis shows the density.

Figure 4.5: Predictive posterior distribution comparisons for models: Poisson35 And NegBin35

Figure 4.5 illustrates that both models posterior fits the data relatively well, butthey both are underestimating the probability of 4 goals. Both models are accuratewhen predicting from 5 up to 9 goals and the predictions of 0 and 1 goals are notfar away from data.

To see how well the two models predictive posterior distributions fits new data,the histogram below is presented.

28

4.3. PREDICTIVE POSTERIOR DISTRIBUTIONS Results

Figure 4.6: Predictive posterior distribution comparisons on new data between models: Poisson35 And NegBin35

Figure 4.6 illustrates the predictive posterior distributions for the Poisson andNegative Binomial Models. The models are trained on data which consists of oddsand goals for the Premier League seasons 2015-2016 and 2016-2017. The datain this figure consists of the goal distribution of games played in Premier Leagueseason 2017-2018. The figure shows that the Poisson model predicts the new databetter than the Negative Binomial model in the goal region of 0 to 3.

29

5. Discussion

In this chapter we discuss the results, their limitations, application areas, andfuture studies.

5.1 Limitations

One shortcoming of these results is that not every linear combination of explana-tory variables could be tested. Instead, we have chosen to test the combinationswe anticipated to be good predictors and not be too correlated. This means thatthe best model may have been evaded. Also, the data is time-dependent, even ifthe time dependency can be expected to be reflected in the odds. The thought ofincluding future data in the training sample and test it on previous data does notseem reasonable.

When we produce the predictive posterior distributions, it is done by samplingfrom the predictive distributions. As a consequence, the results are only based onsmall samples.

This thesis only uses data from the Premier League which showed signs of under-dispersion. The choice of the best predictive model might have been different if weinstead included many different leagues since we do not know if Premier Leaguetend to be more underdispersed than other leagues. The majority of leagues mighthave preferred the Negative Binomial over the Poisson when predicting number ofgoals.

5.2 Results

The predictive posterior distribution for the Poisson model shows to fit the databetter than the Negative Binomial model. We believe that the cause for this isthat the variance for the goals scored in the 2015-2018 seasons seems to be lowerthan the mean. This means that the Negative Binomial distribution should notbe used since this distribution allows the variance to be higher than the meanbut not lower. We had hoped to create a Negative Binomial model in order tobeat the regular Poisson model when predicting the number of goals in soccer,since some of the previous studies indicates that the Poisson distribution should

30

5.3. APPLICATIONS OF METHOD Discussion

be substituted by the Negative Binomial distribution. This is because goal-datatends to be overdispersed; however, our data was underdispersed. We believe thatwe might have been unlucky by choosing seasons where underdispersion existed,which should be reasonably rare. We believe that in general Negative Binomialmight still be the best way to go compared to Poisson when predicting soccer goalssince most seasons generally tend to be overdispersed.

5.3 Applications of method

We argue that by using odds as an explanatory variable, more variables that maynot be incorporated into the odds can be added to the model. Therefore, if themodels WAIC improves, the market might underestimate the variables added. Wethink that this method can also be used to compare different sportsbooks models,in order to find out which one is using the most accurate model for predicting thetotal number of goals.

5.4 Future work

For future work, we propose using predictors for estimating both home and awaygoals. And introducing a copula to account for correlation between home and awaygoals. Furthermore, we suggest using a method that can handle underdisperseddata, such as Conway-Maxwell-Poisson regression. It would also be interesting tostudy more than one league.

31

6. Conclusion

This chapter summarizes the results in order to answer the research questions thatwere stated at the beginning of the thesis. The first research question was

• Can the odds be used to create a useful predictive goal distribution?

Answer: Useful predictive goal distributions can be obtained by using the odds. Itis enough to know the odds for over 3.5 goals in a soccer game in Premier Leagueto receive useful predictive goal distributions.

The secound reserach question was

• Is negative binomial regression appropriate to model soccer goals in PremierLeague?

Answer: It is not, negative binomial model has a considerable margin of error forpoint predicting the number of goals.

32

Bibliography

[1] The history of Sports Betting. url: http://www.onlinegamblingsites.org/history/sports-betting/ (visited on 05/18/2018).

[2] A Brief History of Gambling. url: https://medium.com/edgefund/a-brief-history-of-gambling-a7f46dbf4403 (visited on 05/18/2018).

[3] Frank Keogh and Gary Rose. Football betting - the global gambling industryworth billions. url: https://www.bbc.com/sport/football/24354124(visited on 05/18/2018).

[4] Betfair. Betfair Exchange. url: https://www.betfair.com/ (visited on03/04/2018).

[5] J. Surowiecki. The Wisdom of Crowds. Bantam Doubleday Dell PublishingGroup Inc, 2011.

[6] Markus Ådahl och Kevin Musasa Kazadi Ehsan Fazlhashemi. “En studie ispelmarknadens riskhantering kring stängningsodds.” "page 8". MA thesis.Umeå University, 2016. url: http://www.diva-portal.org/smash/get/diva2:897973/FULLTEXT01.pdf.

[7] Maher M.J. “Association Football scores.” In: (1982). "page 109-118".[8] Georgi Boshnakov, Tarak Kharrat, and Ian Mchale. “A Bivariate Weibull

Count Model for Forecasting Association Football Scores”. In: InternationalJournal of Forecasting 33.2 (2017), pp. 458–466. issn: 0169-2070. doi: 10.1016/j.ijforecast.2016.11.006.

[9] Rasmus B. Olesen. “Assessing the number of goals in soccer matches”. "re-sume och page 8,9". MA thesis. Danmark: Ålborg universitet, 2008. url:http://projekter.aau.dk/projekter/files/14466581/assessingthenumberofgoalsinsoccermatches.pdf.

[10] Betexplorer. url: https://www.betexplorer.com (visited on 01/18/2018).[11] Schervish M. DeGroot M. Probability and Statistics. Pearson Education, Inc,

2012.[12] Negative binomial regression. url: http://www.karlin.mff.cuni.cz/

~pesta/NMFM404/NB.html (visited on 04/18/2018).[13] Randall Reese. Poisson versus Negative Binomial Regression. url: http:

//www.math.usu.edu/jrstevens/biostat/PoissonNB.pdf.[14] R. McElreath. Statistical rethinking: A bayesian course with examples in R

and Stan. Boca Raton: CRC Press., 2016.

33

http://www.onlinegamblingsites.org/history/sports-betting/

http://www.onlinegamblingsites.org/history/sports-betting/

https://medium.com/edgefund/a-brief-history-of-gambling-a7f46dbf4403

https://medium.com/edgefund/a-brief-history-of-gambling-a7f46dbf4403

https://www.bbc.com/sport/football/24354124

https://www.betfair.com/

http://www.diva-portal.org/smash/get/diva2:897973/FULLTEXT01.pdf

http://www.diva-portal.org/smash/get/diva2:897973/FULLTEXT01.pdf

https://doi.org/10.1016/j.ijforecast.2016.11.006

https://doi.org/10.1016/j.ijforecast.2016.11.006

http://projekter.aau.dk/projekter/files/14466581/assessingthenumberofgoalsinsoccermatches.pdf

http://projekter.aau.dk/projekter/files/14466581/assessingthenumberofgoalsinsoccermatches.pdf

https://www.betexplorer.com

http://www.karlin.mff.cuni.cz/~pesta/NMFM404/NB.html

http://www.karlin.mff.cuni.cz/~pesta/NMFM404/NB.html

http://www.math.usu.edu/jrstevens/biostat/PoissonNB.pdf

http://www.math.usu.edu/jrstevens/biostat/PoissonNB.pdf

BIBLIOGRAPHY

[15] Andreas Svensson Fredrik Lindsten Thomas B. Schön and Niklas Wahlström.url: http://www.it.uu.se/edu/course/homepage/sml/literature/probabilistic_modeling_compendium.pdf.

[16] "Mail conversation with Assistant Professor Bertil Wegmann from Dept. ofComputer and Information Science, Linköpings University".

[17] Sheldon M. Ross. Introduction to Probability Models. Academic press, 1997.[18] Stan: A Probabilistic Programming Language. url: https://www.jstatsoft.

org/article/view/v076i01/v76i01.pdf (visited on 05/05/2018).

34

http://www.it.uu.se/edu/course/homepage/sml/literature/probabilistic_modeling_compendium.pdf

http://www.it.uu.se/edu/course/homepage/sml/literature/probabilistic_modeling_compendium.pdf

https://www.jstatsoft.org/article/view/v076i01/v76i01.pdf

https://www.jstatsoft.org/article/view/v076i01/v76i01.pdf

Appendix

Figure 6.1: Pairs plot for Negative Binomial model with total line 3.5

Figure 6.2: Accumulated posterior quantiles of β1 from the Negative Binomial model

35

BIBLIOGRAPHY

Figure 6.3: Accumulated posterior quantiles of β2 from the Negative Binomial model

36

BachelorThesisinStatisticsandDataAnalysis A Bayesian ...1223567/FULLTEXT01.pdf · BachelorThesisinStatisticsandDataAnalysis A Bayesian approach to predict the number of soccer goals

Documents