Nov 19, 2020

  • Bachelor Thesis in Statistics and Data Analysis

    A Bayesian approach to predict the number of soccer goals

    Modeling with Bayesian Negative Binomial regression

    Joakim Bäcklund Nils Johdet

    Division of Statistics and Machine Learning Department of Computer and Information Science

    Linköping University

    June 2018 | LIU-IDA/STAT-G–18/006–SE

  • Supervisor: Lecturer Isak Hietala

    Examiner: Lecturer Ann-Charlotte Hallberg

  • Abstract

    This thesis focuses on a well-known topic in sports betting: predicting the number of goals in soccer games. The data set comes from the top English soccer league, the Premier League, and consists of games played in the seasons 2015/16 to 2017/18. The prediction is supported by odds from the betting exchange Betfair. The purpose is to find a model that produces an accurate goal distribution. The methods used are Bayesian Negative Binomial regression and Bayesian Poisson regression. The results indicate that Poisson regression is the better model because of the presence of underdispersion. We argue that the methods can be used to compare the accuracy of different sportsbooks and may help create better models.

  • Acknowledgements

    We would like to express our gratitude to our supervisor, Lecturer Isak Hietala, for his constant guidance and for helping us keep the work on schedule. We would also like to thank Ph.D. student Per Sidén for valuable insights and constructive suggestions, and Assistant Professor Bertil Wegmann for ideas regarding the Bayesian modeling. His willingness to give his time so generously has been very much appreciated. Lastly, we wish to thank our opponents, Sjoerd Schelhaas and Hugo Hjalmarsson, for their useful feedback on the thesis.

  • Contents

    1 Introduction

    1.1 Background

    1.1.1 Sports betting

    1.1.2 Soccer

    1.1.3 Betfair

    1.1.4 Odds

    1.2 Previous studies

    1.3 Purpose

    1.3.1 Research questions

    1.4 Social and ethical aspects

    1.5 Delimitations

    2 Data

    2.1 Data processing

    2.2 Distribution of the number of goals

    3 Methods

    3.1 Distributions

    3.1.1 Poisson distribution

    3.1.2 Negative binomial distribution

    3.1.3 Gamma-Poisson mixture

    3.2 Bayesian Inference and Modeling

    3.2.1 Non-Bayesian approach to regression

    3.2.2 Bayesian approach to regression

    3.2.3 Poisson regression

    3.2.4 The Negative Binomial case

    3.3 Markov Chain Monte Carlo (MCMC)

    3.3.1 Markov Chain

    3.3.2 Hamiltonian Monte Carlo

    3.3.3 MCMC Diagnostic

    3.4 Model evaluation and comparison

    3.4.1 Kullback-Leibler divergence

    3.4.2 Deviance

    3.4.3 Widely Applicable Information Criterion (WAIC)

    3.4.4 Akaike weights

    3.5 Implementation in R

    3.5.1 RStan Version 2.17.3

    3.5.2 rethinking Version 1.59

    4 Results

    4.1 Model comparison

    4.2 MCMC Diagnostic

    4.2.1 Poisson model with total line 3.5

    4.2.2 Negative Binomial model with total line 3.5

    4.3 Predictive posterior distributions

    5 Discussion

    5.1 Limitations

    5.2 Results

    5.3 Applications of method

    5.4 Future work

    6 Conclusion

  • List of Figures

    2.1 Bar graph of soccer goals distribution in the data set

    3.1 Trace plot comparison of an unhealthy and a healthy Markov Chain

    4.1 Trace plot for the Poisson model (3.5)

    4.2 Accumulated posterior quantiles of β1 from the Poisson model

    4.3 Pairs plot for Poisson model with total line 3.5

    4.4 Trace plot for the Negative Binomial model

    4.5 Predictive posterior distribution comparisons for models: Poisson35 and NegBin35

    4.6 Predictive posterior distribution comparisons on new data between models: Poisson35 and NegBin35

    6.1 Pairs plot for Negative Binomial model with total line 3.5

    6.2 Accumulated posterior quantiles of β1 from the Negative Binomial model

    6.3 Accumulated posterior quantiles of β2 from the Negative Binomial model

  • List of Tables

    2.1 Example data of one observation from Betexplorer

    2.2 Example of processed data with the implied probability for Over each line

    4.1 WAIC model comparisons

    4.2 Parameter estimation and diagnostics, Poisson model (3.5)

    4.3 Parameter estimation and diagnostics, Negative Binomial model (3.5)

  • Keywords

    Odds – A numerical expression of the likelihood of a possible event. In betting, decimal odds are the ratio of the payoff to the stake wagered.

    Implied probability – The probability implied by the odds, calculated as the inverse of the decimal odds and often expressed as a percentage.

    Sportsbook – An organization that accepts bets, usually on sports. It handles the pricing of odds, the settlement of results, and the payout of winnings.

    Betting exchange – A service where customers can either lay (offer) odds or place bets at other customers' odds. It is also known as a prediction market and is similar to a futures exchange. The betting exchange provides the platform, the leagues and games, the settlement of results, and the payout of winnings.

    Total – A common bet in sports is whether the total number of goals scored by both teams is over or under a certain number, called the total line.

    Line – A number set by the market or sportsbook before the event; bets can be placed on over or under that number.
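As an illustration of the implied probability defined above, the following minimal Python sketch inverts decimal odds. The function name `implied_probability` and the example prices for "Over" at three total lines are hypothetical and are not taken from the thesis data; the thesis itself works in R.

```python
def implied_probability(decimal_odds: float) -> float:
    """Return the probability implied by decimal odds, i.e. 1 / odds."""
    # Assumption: decimal odds are quoted as payoff per unit stake,
    # so a meaningful price is strictly greater than 1.
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


# Hypothetical decimal odds for "Over" at three total lines of one game.
over_odds = {1.5: 1.25, 2.5: 2.00, 3.5: 4.00}

# Implied probability that the total number of goals exceeds each line.
over_probabilities = {
    line: implied_probability(odds) for line, odds in over_odds.items()
}

for line, probability in over_probabilities.items():
    print(f"Over {line}: {probability:.2f}")
```

With these made-up prices, the higher the total line, the longer the odds and the lower the implied probability of "Over".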

  • 1. Introduction

    This chapter provides an introduction to sports betting and the betting exchange Betfair. The second section presents previous studies in the field of goal prediction. The third section covers the purpose of this thesis, and the last section reflects on the social and ethical aspects of this thesis.

    1.1 Background

    This section outlines the history of sports betting and describes the betting exchange Betfair.

    1.1.1 Sports betting

    Gambling in general predates written history, and sports betting has allegedly existed for as long as sports have been around. There are records from the Roman Empire of gambling on sports events and on the outcomes of gladiator fights. [1]

    Before sports betting was legalized in Nevada in 1931, people in the U.S. placed their wagers through privately run enterprises referred to as “bookies”. In the United Kingdom, sports betting was not allowed until 1961. In 1994, Antigua and Barbuda became the first country to pass a law that allowed operators to apply for online gambling licenses. However, sportsbooks did not get involved until 2001, when the U.K. territories Isle of Man and Gibraltar began to offer licenses. [2]

    Thanks to the sports betting industry’s online introduction, a great
