Highlights
Football: Discovering elapsing-time bias
in the science of success
L. Galli, G. Galvan, T. Levato, C. Liti, V. Piccialli, M. Sciandrone
• We conjecture that players' behavior is more and more correlated with the match outcome as the 90 minutes elapse.
• We demonstrate the effect of this elapsing-time bias by applying a host of machine learning techniques to a large corpus of finely detailed football matches from European leagues.
• We show that we can predict the outcome of a match with high confidence simply by looking at the last 15 minutes of the game.
• We design a new task and show that it is not affected by elapsing-time bias.
Football: Discovering elapsing-time bias
in the science of success
L. Galliᵃ,∗, G. Galvanᵇ, T. Levatoᵇ, C. Litiᶜ, V. Piccialliᶜ, M. Sciandroneᵇ
ᵃ Chair for Mathematics of Information Processing, RWTH Aachen University, Pontdriesch 10, 52062 Aachen (Germany)
ᵇ Dipartimento di Ingegneria dell'Informazione, Università di Firenze, Via di Santa Marta 3, 50139 Firenze (Italy)
ᶜ Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università degli Studi di Roma "Tor Vergata", Via del Politecnico 1, 00133 Roma (Italy)
Abstract
One of the fundamental topics in sports analytics is the science of success, i.e., the study of the correlation between players' performances and their success. This is a very challenging task, especially in the case of team sports, among which football is a prominent example. This paper is concerned with uncovering a dangerous bias that is present in most of the approaches proposed in the literature that apply statistical techniques or machine learning models to study the correlation between team performances and match outcome. In particular, we find that players' behavior on a time interval is more and more correlated with the match outcome as the 90 minutes elapse. As an extreme example, we show that we can predict the outcome of a match with high confidence simply by looking at the last 15 minutes of the game.
Table 4: AUC (mean ± ci) for the different models using statistics over the whole match.
curve (AUC) [16] in the macro-averaging fashion¹. In particular, we compute the AUC on 10 random splits where 90% of the data is used for training and the remaining 10% as the test set. For each independent 90% split, the hyper-parameters of the models are chosen with a 5-fold cross-validation scheme, except for NNs, where we use early stopping on a validation set containing 20% of the original training data.
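As an illustration, this protocol can be sketched as follows; this is a minimal sketch assuming a feature matrix X and Win/Draw/Loss labels y (names and the hyper-parameter grid are illustrative, and logistic regression stands in for any of the compared models).

```python
# Minimal sketch of the evaluation protocol (X, y and the grid are illustrative).
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

aucs = []
for seed in range(10):                      # 10 independent 90%/10% splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.1, random_state=seed, stratify=y)
    # hyper-parameters chosen by 5-fold cross-validation on the training split
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_tr, y_tr)
    probs = search.predict_proba(X_te)
    # macro-averaged one-vs-rest AUC over Win / Draw / Loss
    aucs.append(roc_auc_score(y_te, probs, multi_class="ovr", average="macro"))
print(f"AUC: {np.mean(aucs):.3f} +/- {1.96 * np.std(aucs) / np.sqrt(10):.3f}")
```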
3.1. Study of the Final Output with Whole-Match Statistics
The mean AUC (macro) and confidence intervals are reported in Table 4. Logistic regression, XGBoost, SVM and NNs proved to be statistically equivalent², while k-NN and RF scored significantly worse. The AUC of the top four models is around 0.82, meaning that the features extracted are highly correlated with the outcome of the match. Moreover, the performances do not seem to be strongly influenced by the choice of the machine learning model (considering the top five models).
¹ We choose macro- over micro-averaging as the former weights all the different classes (Win, Draw, Lose) in the same way, regardless of the number of examples in each class.
² This was assessed using Welch's t-test to compare the 10 AUC test scores.

As a further analysis of the model performances, we show the ROC curves
for the logistic regression model in Figure 3. To extract the curves we modify the test procedure. Namely, instead of randomly sampling 10 different test sets, we perform a 10-fold cross-validation test. In this way we gather the predictions of the model on all the examples of the dataset (which are, however, unseen when the predictions are computed). We use such predictions to extract four different curves: one curve for each class (in a one-vs-all fashion), obtained by comparing the probability of belonging to a class against the probability of belonging to the other two (as in the standard two-class case), and a cumulative curve according to the macro-averaging scheme (a minimal sketch of this procedure is given after the observations below). From Figure 3 we can make the following observations.
• The results of the win-vs-all curve are very similar to those obtained in Figure 2 of [10]: 0.88 AUC in comparison with 0.89 ± 0.02. In fact, the win-vs-all curve is basically simulating the task addressed in [10], where only the Win and Not win outputs are considered.
• In accordance with the literature (e.g., [9, 11, 10]), it can be noticed that, while Wins and Losses are well identified by the method, Draws are not classified as clearly. This means that the machine learning model is (at least to some extent) able to understand when one of the teams is prevailing over the other, but does not have the same confidence in classifying matches that end in draws.
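The curve-extraction procedure described above can be sketched as follows, reusing the same illustrative X and y as before; the class ordering returned by np.unique matches the column ordering of scikit-learn's predict_proba.

```python
# Minimal sketch: out-of-fold probabilities via 10-fold CV, then one-vs-all ROC curves.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

classes = np.unique(y)  # sorted order, matching predict_proba's columns
probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=10, method="predict_proba")
y_bin = label_binarize(y, classes=classes)
per_class_auc = {}
for i, c in enumerate(classes):
    # one class against the other two, as in the standard two-class case
    fpr, tpr, _ = roc_curve(y_bin[:, i], probs[:, i])
    per_class_auc[c] = auc(fpr, tpr)
# the macro-average weights all classes equally
print(per_class_auc, np.mean(list(per_class_auc.values())))
```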
To help understand how the models behave w.r.t. the different classes, we show the confusion matrix for the latter experiment. The confusion matrix is obtained by discretizing the probabilities computed by the model, choosing, for each instance, the most probable class. The confusion matrix is shown in Figure 4.
Figure 3: ROC curves for the logistic regression model. Win vs All (AUC = 0.88), Loss vs All (AUC = 0.87), Draw vs All (AUC = 0.70), macro-average (AUC = 0.82).
true \ predicted | Draw | Win | Loss
Draw | 0.45 | 0.28 | 0.27
Win | 0.20 | 0.72 | 0.08
Loss | 0.21 | 0.08 | 0.71

Figure 4: Confusion matrix for the logistic regression model.
We notice that wins and losses have high accuracy (∼72%) and that mistaking wins for losses is a rare kind of error (∼8%). The most frequent errors are, instead, those where draws are taken for wins or losses and vice versa. This does not come as a surprise, as it confirms the lower AUC obtained on the draw-vs-all curve of Figure 3.
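Reusing probs and classes from the previous sketch, the discretization step can be written as follows (a minimal sketch; confusion_matrix's normalize option yields the row-normalized frequencies of Figure 4).

```python
# Minimal sketch: discretize probabilities into hard labels, then build the
# row-normalized confusion matrix (rows: true class, columns: predicted class).
from sklearn.metrics import confusion_matrix

pred = classes[probs.argmax(axis=1)]  # most probable class per instance
cm = confusion_matrix(y, pred, labels=classes, normalize="true")
print(cm)                             # each row sums to 1
```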
3.2. Study of the Final Output with Intervals Statistics
As a second step, we introduce time intervals to analyze the effect of exploiting a more detailed description of the match. In this phase, we keep the result of the whole match as the labels, but we extract the 141 features described in Table A.6 from each of the 15-minute intervals (as described in Section 2).

Table 5: AUC (mean ± ci) for the different models using statistics extracted every 15 minutes over the whole match.
The results are shown in Table 5. By comparing it with Table 4 we can notice that each method gains from 3 to 6 points when exploiting a finer description of the match. This means that a single aggregation of the performances over all the intervals is hiding some useful information. The feature vector extracted this way is now even more correlated with the outcome of the match. As in Table 4, Table 5 also shows that the same group of five machine learning models (logistic regression, XGBoost, linear SVM, Gaussian SVM and NNs) perform similarly (they are actually statistically equivalent). This seems to suggest, once again, that the choice of the model is not crucial for this task.
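As a rough sketch of this feature construction, assuming a hypothetical compute_features helper that returns the 141 statistics of Table A.6 for a window of events:

```python
# Minimal sketch of the per-interval feature construction; compute_features is
# a hypothetical helper returning the 141 statistics of Table A.6 for a window.
import numpy as np

INTERVALS = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75), (75, 90)]

def match_vector(events):
    """events: chronological list of (minute, event, ...) tuples for one match."""
    parts = []
    for start, end in INTERVALS:
        window = [e for e in events if start <= e[0] < end]
        parts.append(compute_features(window))  # 141 values per interval
    return np.concatenate(parts)                # 6 * 141 = 846 features
```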
3.3. Study of the Partial Output with Partial-Intervals Statistics
We now present a set of experiments that will progressively bring us to
discover the presence of elapsing-time bias. As a first step in this direction,
we focus on understanding whether the high correlation between game statistics and final match outcome can also be found between partial game statistics
and partial outcomes. Namely, we consider different sections of the match of increasing duration (0-15, 0-30, . . . , 0-90) and we train a model to output the partial outcome at the end of each one. For example, if we consider the first 30 minutes of the match, we train a model that receives as input the game statistics (divided into 15-minute intervals as before) of the first 30 minutes and must output who is winning at the end of those 30 minutes. For the sake of simplicity we show only the performance of the logistic regression model, since, as shown in the previous experiments, all the other models are not significantly better. The results are shown in terms of AUC (macro) in Figure 5.

Figure 5: AUC (macro) for logistic regression trained on different sections of the match.

From Figure 5 we can observe the following.
• The performance of the models for partial outcomes is far worse than the performance of the model for the whole match, with roughly 17 points of difference between the 0-90 and 0-30 models.
• If we exclude the 0-15 model, there is an evident increasing trend in the performance of the model that is positively correlated with both
(a) labels that are getting closer to the end of the match;
(b) features that are taking into account a larger section of the match
(as well as becoming more numerous).
To explain this phenomenon, and to understand whether it is a matter of (a) labels or (b) features (or both), we first shift our focus to the model itself. Namely, we analyze how the model exploits the different features. To this aim, we compute the permutation importance, also known as Mean Decrease Accuracy. According to this technique, which can be traced back to [17], the model is treated as a black-box estimator and the importance of a feature is measured by the decrease in test performance obtained by replacing the entire corresponding test column with random noise. For this method to work, noise must be drawn from the same distribution as the original feature values. We follow [18] and compute the new column by a random shuffle of the old one. Notice that the model is not retrained (as it would be in, for instance, feature selection techniques); we simply use the model to make predictions on permuted test sets. Once this procedure is performed on all the columns (several times), we have a measure of the importance of each feature. We consider a logistic regression model trained on 80% of the dataset and we apply the permutation importance procedure to the remaining 20%. We use this tool to analyze how each model exploits the features. In Figure 6 we show the feature importance aggregated by intervals. Namely, we sum the feature importances of all features belonging to the same 15-minute interval.
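A minimal sketch of this procedure, using scikit-learn's permutation_importance (which implements the same shuffle-based idea); the final reshape assumes, for illustration, that features are ordered interval by interval.

```python
# Minimal sketch of permutation importance on a held-out 20% split
# (X and y are illustrative; the model is trained once and never retrained).
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# each test column is shuffled several times; the importance of a feature is
# the mean drop in test score caused by destroying that column
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
# aggregate by 15-minute interval, assuming features ordered interval by interval
interval_importance = result.importances_mean.reshape(6, 141).sum(axis=1)
```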
Figure 6: Feature importance for the different models.
We notice that, except for the 0-30 model, all the other models rely much more on the last interval to make their predictions. This is particularly evident in the 0-90 model, where the last interval is ∼20 times more important than the others.
Given this evidence, to deepen the analysis, we propose to build models that, to the extreme, only exploit the features from the last available 15-minute interval. Even if the labels are not changed, we will refer to these models as 0-15, 15-30, . . . , 75-90, to differentiate them from the previous models. The plot is shown in Figure 7.

Figure 7: AUC (macro) for logistic regression trained on different sections of the match.
Now, by comparing Figures 5 and 7 we can observe the following.
• In both cases (again excluding the 0-15 model), there is an evident increasing trend in the performance of the models as the labels get closer to the final outcome of the match. Note that in this new study the number of features is not changing.
• Each model of Figure 7 loses some AUC points w.r.t. the corresponding one of Figure 5 (except for the 0-15 model, which is exactly the same):
– 15-30: 3 AUC points (0.65 vs 0.68);
– 30-45: 4 AUC points (0.66 vs 0.70);
– 45-60: 3 AUC points (0.72 vs 0.75);
– 60-75: 6 AUC points (0.73 vs 0.79);
– 75-90: 1 AUC point (0.84 vs 0.85).
We can see that the loss of points is not correlated with the number of features removed between the models of Figure 5 and those of Figure 7. In fact, even if the 75-90 model only uses 141 features, in comparison with the 846 of the 0-90 model, it only loses 1 AUC point w.r.t. it.
From these observations, we can first conclude that the increasing trend shown in Figure 5 is not related to the number of features considered, but rather to the labels getting closer to the final outcome.

Moreover, if we now focus on the final outcome (as in the 0-90 and 75-90 models), we can conclude that the last 15 minutes of the match are the most important for understanding it. In fact, it is possible to train a model only on this last interval (the 75-90 model) and obtain AUC performances comparable to those obtained by exploiting the whole match (the 0-90 model). Note that the performances obtained by this 75-90 model are even better than any of those in Table 4. We believe this can be considered strong proof of the existence of what we named elapsing-time bias.
Our conjecture here is that players are reacting to the partial outcome (e.g., [12, 13, 14, 15]), either by trying to change it before the end or by trying to maintain it. Moreover, the partial and the final outcome get closer in expectation as the 90 minutes elapse. This means that the expected outcome becomes a stronger and stronger bias as time elapses. Even if this phenomenon is intuitive and has already influenced some researchers [2, 6], to the best of our knowledge, this is the first quantitative study performed to show the significant consequences that it has on soccer match analysis.
3.4. Study of the Real-Time Output with Real-Time Statistics
In this section, we address the issue of elapsing-time bias by introducing a novel way to set up the learning problem. In particular, we study the correlation between the statistics extracted on a single interval and the outcome of that specific interval. More precisely, an interval is labeled as Win/Draw/Loss if team A scored more/the same/fewer goals than team B within that time interval. This yields six different tasks for each match, one for each of the six time intervals (0-15, 15-30, . . . , 75-90). Note that the features here are the same as those of Figure 7, but the labels are different. Thanks to this modification of the output, the partial (and final) outcome is never observed. This means that game statistics are not studied jointly with the partial (or final) outcome, but rather w.r.t. the real-time consequences of the 15-minute interval.

Figure 8: AUC (mean ± ci) for the different intervals.
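A minimal sketch of this labeling rule, assuming goal times are available as lists of minutes for the two teams (names and the example are illustrative):

```python
# Minimal sketch of the real-time labeling rule: an interval is labeled by the
# goals scored *within* it only (goal-minute lists are illustrative inputs).
def interval_label(goals_a, goals_b, start, end):
    ga = sum(start <= m < end for m in goals_a)  # team A goals in the interval
    gb = sum(start <= m < end for m in goals_b)  # team B goals in the interval
    if ga > gb:
        return "Win"
    if ga < gb:
        return "Loss"
    return "Draw"

# team A wins the match 2-1, but the 75-90 interval itself ends 1-1:
print(interval_label([12, 78], [80], 75, 90))  # -> "Draw"
```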
The results are reported in Figure 8 and from them we can observe the
following.
• The results are fairly stable among the different intervals, in contrast with those of Figure 7. The last interval of each half is, apparently, slightly more difficult to analyze than the others, but this can probably be explained by noticing that these two intervals are inherently more chaotic from a game perspective.
• As in the previous experiments, logistic regression, XGBoost, SVM and NNs performed equally well, while k-NN and RF scored significantly worse.
• The AUC scores obtained here are not as high as those obtained in Section 3.3, suggesting that the task is more challenging. Thanks to the real-time nature of this task, these models never address the partial (or final) output.
We can now conclude that elapsing-time bias does not affect the newly proposed task. In particular, since in Figures 7 and 8 the features are the same, the issue is solved by not exploiting them to understand the partial (or final) output. Note that the game statistics are not modified, so they may still be affected by the expected output. However, since we are not addressing the partial (or final) output, we have a model that does not exploit this bias to understand the game.
To better evaluate the performance of these models we propose two additional comparisons against:

• a baseline model which employs as features only the score at the beginning of the interval;

• a model trained with the same feature vector of game statistics with the addition of the score at the beginning of the interval.
We focus only on the logistic regression model for simplicity. The results are depicted in Figure 9. We notice that the inclusion of the initial score in the feature vector does not seem to change the results significantly. This suggests that the model leverages the match statistics we propose in a meaningful and fairly effective way. This is also highlighted by the comparison with the baseline, which is significantly lower and approaches a value of 0.5, i.e., random guessing.

Figure 9: AUC (mean ± ci) for the different intervals.
Finally, we show that the newly proposed models have the additional desirable characteristic of exploiting the game statistics in a fairly uniform way. In particular, we propose an analysis very similar to that of Figure 6. In this case, we further partition each of the 15-minute intervals into 3 sub-intervals³ of 5 minutes⁴. In Figure 10 the feature importance is now aggregated over intervals of 5 minutes. It is interesting to notice that the newly designed task yields models that observe the whole match in a much more uniform way. In contrast with Figure 6, this means that there is no single interval (e.g., the last 15 minutes) that can be exploited to understand the whole match.

Figure 10: Feature importance for the different models.
³ The 75-90 model has 4 intervals because it also includes the injury time.
⁴ The results are slightly deteriorated in this new 5-minute setting (∼1-2 points w.r.t. the 15-minute model). This is caused by a fragmentation of the features into pieces of information that are too fine-grained and thus not easily exploited by the models.
4. Conclusions
In this work, we conducted a series of experiments devoted to exposing what we called elapsing-time bias. Namely, we discovered that, as the match progresses, we are able to predict the final match outcome with high confidence simply by looking at the in-game statistics of the last 15-minute interval. We conjecture that the latter is due to players reacting to the partial outcome, which gets closer, as time goes by, to the final score.

We believe that such a bias poses a serious problem when using the obtained models to perform any kind of subsequent analysis, e.g., as is often done in the literature, determining which factors contribute most to winning a match.
Hence, we proposed to re-frame the learning problem in such a way that it is not affected by elapsing-time bias. In particular, the novel task does not use the final (or partial) outcome as the mapping labels. For each interval, the mapping output is instead computed taking into account only the goals scored within that time frame.
Finally, we presented a novel set of experiments to analyze the perfor-
mance of the models in this new scenario. This study shows that the novel
task is not affected by elapsing-time bias.
Appendix A. Models
In this appendix we briefly describe the different machine learning models employed in this work, which implementations were used, and how they were set up.
name | description | events | mean | std | min | max
aerial absolute | # of aerial | aerial | 28.64 | 13.51 | 1.0 | 106.0
aerial percentage | succ % on aerial | aerial | 0.49 | 0.12 | 0.0 | 1.0
attack event | # of almost any event within zone | almost any event | 192.01 | 63.25 | 27.0 | 578.0
ball possession | # of possession events | possession events | 0.49 | 0.1 | 0.16 | 0.84
ball recovery | # of ball recovery | ball recovery | 47.2 | 11.97 | 1.0 | 119.0
ball touch | # of ball touch | ball touch | 17.17 | 9.07 | 0.0 | 48.0
blocked pass | # of blocked pass | blocked pass | 3.71 | 4.61 | 0.0 | 28.0
challenge absolute | # of challenge | challenge | 8.37 | 4.5 | 0.0 | 44.0
challenge defence absolute | # of challenge within zone | challenge | 2.36 | 1.96 | 0.0 | 15.0
chance missed | # of chance missed | chance missed | 0.06 | 0.27 | 0.0 | 3.0
claim | # of claim | claim | 1.35 | 1.37 | 0.0 | 11.0
clearance | # of clearance | clearance | 28.05 | 12.62 | 1.0 | 100.0
corner | # of corner | corner | 5.02 | 2.87 | 0.0 | 21.0
cross absolute | # of crosses | pass | 12.17 | 5.76 | 0.0 | 48.0
cross not claimed | # of cross not claimed | cross not claimed | 0.05 | 0.24 | 0.0 | 4.0
cross percentage | succ % on crosses | pass | 0.24 | 0.14 | 0.0 | 1.0
defence event | # of almost any event within zone | almost any event | 191.88 | 41.85 | 59.0 | 496.0
dispossessed | # of dispossessed | dispossessed | 11.82 | 4.61 | 0.0 | 35.0
error | # of error | error | 0.27 | 0.54 | 0.0 | 4.0
error defence | # of error within zone | error | 0.19 | 0.44 | 0.0 | 4.0
formation change | # of formation change | formation change | 0.75 | 0.95 | 0.0 | 12.0
foul | # of foul | foul | 13.82 | 4.55 | 1.0 | 38.0
good skill | # of good skill | good skill | 0.29 | 0.67 | 0.0 | 9.0
h index | h index on length of actions | pass | 7.69 | 1.59 | 3.0 | 15.0
inner center shots | # of shot events within zone | shot events | 5.35 | 3.09 | 0.0 | 23.0
inner left shots | # of shot events within zone | shot events | 0.09 | 0.31 | 0.0 | 3.0
inner right shots | # of shot events within zone | shot events | 0.08 | 0.29 | 0.0 | 3.0
interception | # of interception | interception | 17.22 | 7.03 | 1.0 | 65.0
keeper pickup | # of keeper pickup | keeper pickup | 7.09 | 3.09 | 0.0 | 25.0
keeper sweeper | # of keeper sweeper | keeper sweeper | 0.77 | 1.05 | 0.0 | 9.0
mean n seq | mean length of actions | pass | 5.55 | 0.98 | 3.29 | 11.96
mean pass length | mean length of passages | pass | 22.24 | 1.91 | 14.68 | 31.81
missed shots | # of missed shots | missed shots | 5.21 | 2.67 | 0.0 | 19.0
n excessive seq | # of long actions | pass | 8.79 | 5.62 | 0.0 | 42.0
n seq | # of actions | pass | 44.0 | 10.63 | 9.0 | 93.0
n shot seq | # of pass + shots events | pass + shots events | 0.98 | 1.24 | 0.0 | 12.0
offside given | # of offside given | offside given | 2.49 | 1.94 | 0.0 | 15.0
offside pass | # of offside pass | offside pass | 2.48 | 1.93 | 0.0 | 15.0
offside provoked | # of offside provoked | offside provoked | 2.48 | 1.93 | 0.0 | 15.0
other shots | # of shot events within zone | shot events | 0.42 | 0.72 | 0.0 | 8.0
outer center shots | # of shot events within zone | shot events | 5.19 | 2.94 | 0.0 | 23.0
outer left shots | # of shot events within zone | shot events | 0.2 | 0.47 | 0.0 | 5.0
outer right shots | # of shot events within zone | shot events | 0.15 | 0.4 | 0.0 | 4.0
pass absolute | # of pass | pass | 470.08 | 111.69 | 156.0 | 1096.0
pass attack absolute | # of pass within zone | pass | 54.72 | 23.19 | 4.0 | 231.0
pass attack percentage | succ % on pass within zone | pass | 0.57 | 0.09 | 0.1 | 0.95
pass defence absolute | # of pass within zone | pass | 68.72 | 16.61 | 22.0 | 215.0
pass defence percentage | succ % on pass within zone | pass | 0.67 | 0.12 | 0.23 | 1.0
pass percentage | succ % on pass | pass | 0.74 | 0.07 | 0.31 | 0.93
passages possession | passage distance covered | pass | 7906.97 | 2450.43 | 1575.01 | 19466.68
penalty faced | # of penalty faced | penalty faced | 0.08 | 0.28 | 0.0 | 3.0
punch | # of punch | punch | 0.58 | 0.86 | 0.0 | 9.0
red card | # of red card | red card | 0.05 | 0.23 | 0.0 | 3.0
save | # of save | save | 6.04 | 3.27 | 0.0 | 31.0
saved shot | # of saved shot | saved shot | 6.06 | 3.29 | 0.0 | 31.0
shield ball opp | # of shield ball opp | shield ball opp | 0.54 | 0.79 | 0.0 | 7.0
shot on post | # of shot on post | shot on post | 0.23 | 0.49 | 0.0 | 4.0
smother | # of smother | smother | 0.11 | 0.36 | 0.0 | 6.0
substitution | # of substitution | substitution | 2.79 | 0.48 | 0.0 | 9.0
tackle absolute | # of tackle | tackle | 20.03 | 5.92 | 3.0 | 47.0
tackle defence absolute | # of tackle within zone | tackle | 6.75 | 3.29 | 0.0 | 24.0
tackle defence percentage | succ % on tackle within zone | tackle | 0.77 | 0.19 | 0.0 | 1.0
tackle percentage | succ % on tackle | tackle | 0.75 | 0.1 | 0.15 | 1.0
take on absolute | # of take on | take on | 18.58 | 7.39 | 0.0 | 65.0
take on attack absolute | # of take on within zone | take on | 6.71 | 3.74 | 0.0 | 28.0
take on attack percentage | succ % on take on within zone | take on | 0.32 | 0.22 | 0.0 | 1.0
take on percentage | succ % on take on | take on | 0.44 | 0.15 | 0.0 | 1.0
various error | # of error events | error events | 0.39 | 0.67 | 0.0 | 5.0
x mass center | mean x position of events | almost any event | 47.57 | 4.83 | 23.66 | 65.95
y mass center | mean y position of events | almost any event | 48.53 | 3.37 | 32.87 | 60.88
yellow card | # of yellow card | yellow card | 2.03 | 1.37 | 0.0 | 10.0

Table A.6: For every feature the table shows the name, the description, the events involved in the extraction, and a few basic statistics. The features shown here are extracted from the performances of both teams over the whole match.
k-Nearest Neighbor
A popular classification method is k-Nearest Neighbor (k-NN) [19]. This learning algorithm is memory-based: the fitting procedure amounts to memorizing the training set. Given a query point x, k-NN searches for the k nearest neighbors {x_{i1}, . . . , x_{ik}} of x among the training points according to some distance function. The output is then produced by a majority vote among the training labels {y_{i1}, . . . , y_{ik}}. When probabilities are required, no voting is performed and probabilities are obtained simply by counting the number of examples of each class among the k neighbors. This is how we produce probabilistic outputs in our experiments in Section 3.
We used k-NN, for its simplicity, as a baseline against the other algorithms. We employed the k-NN implementation available in the scikit-learn [20] collection with the Euclidean norm as the distance function. As an example, k-NN has been used to predict the outcome of football matches given the bookmakers' odds in [21].
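A minimal sketch of this setup (the value of k and the X_tr/X_te split names are illustrative):

```python
# Minimal sketch of the k-NN baseline with Euclidean distance.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=15, metric="euclidean")
knn.fit(X_tr, y_tr)              # memory-based: fitting stores the training set
probs = knn.predict_proba(X_te)  # class frequencies among the k neighbors
```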
Support Vector Machines
Support Vector Machines [22] were originally developed for binary classification problems. The idea is to find a hyperplane (w, b) that separates positive (P) and negative (N) examples and, once such a hyperplane has been determined, express the classification function⁵ f : Rⁿ → {−1, 1} as

f(x) = sign(⟨w, x⟩ + b).    (A.1)

⁵ Here, the two classes are −1 and 1.
In practice, linear SVMs may perform poorly when the data are highly non-linear. Non-linear SVMs map the input vectors into a higher-dimensional space F, called the feature space, through a non-linear mapping function φ. In this work we used both the linear version of SVM and SVM equipped with the RBF kernel K(x, z) = exp(−γ‖x − z‖²), with γ > 0.
When probabilistic outputs are required, the distance from the hyperplane can be turned into a probability estimate through calibration techniques like isotonic regression or Platt's scaling. For the experiments we employed the Platt's scaling technique available in scikit-learn [20].
As for the multi-class case, several extensions of the SVM model have been proposed in the literature. Two of the most popular are the one-vs-all and one-vs-one techniques. Both rely on training several binary SVM classifiers. In our experiments we used the one-vs-all version. We used libSVM [23] to train the SVMs. For an extensive treatment of SVMs we refer the reader to the book [24]. SVMs have been widely used for sports applications as well [25, 26].
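A minimal sketch of this setup through scikit-learn's wrapper of libSVM (C and gamma are illustrative; probability=True fits a Platt-style calibrator internally):

```python
# Minimal sketch of the one-vs-all RBF SVM with Platt-scaled probabilities.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

svm = OneVsRestClassifier(
    SVC(kernel="rbf", C=1.0, gamma=0.1, probability=True))  # one binary SVM per class
svm.fit(X_tr, y_tr)
probs = svm.predict_proba(X_te)
```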
Neural Networks
Neural Networks (NNs) are a powerful class of functions that can approximate any continuous function, provided that enough parameters are employed, as stated in the universal approximation theorem [27].

Neural networks can be arranged in a variety of ways depending on the number and size of layers and on the choice of activation, output and loss functions. In our experiments, we used the hyperbolic tangent as the activation function, the softmax as the output function and the cross-entropy as the loss function. The number and size of layers were determined using cross-validation.
We used TensorFlow [28] to implement the NNs and Stochastic Gradient Descent [29] to train them. Applications of neural networks in sports prediction can be found, for example, in [30].
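A minimal sketch of such a network in tf.keras (layer sizes, the learning rate, the 846-feature input of the whole-match setting, and the integer-encoded labels y_tr_int are illustrative; in the paper the architecture was chosen by cross-validation and early stopping used 20% of the training data):

```python
# Minimal sketch of the NN setup: tanh activations, softmax output,
# cross-entropy loss, SGD training, early stopping on a validation split.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh", input_shape=(846,)),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(3, activation="softmax"),   # Win / Draw / Loss
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy")  # cross-entropy loss
model.fit(X_tr, y_tr_int, epochs=200,
          validation_split=0.2,  # early stopping on 20% of the training data
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])
```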
Logistic Regression
Although Logistic Regression was first developed as a modification of Linear Regression for classification, it can be described as a simple neural network. It is, in fact, equivalent to a single-layer feed-forward neural network with a single output neuron and a sigmoid output function. The training loss of choice is the binary cross-entropy. Its multi-class extension can be achieved in a one-vs-all or one-vs-one fashion, as described for SVM, or by modifying the architecture to include multiple output units; in this case, the softmax is chosen as the output function, as in NNs. In our experiments, we used the latter. We took the implementation from the scikit-learn collection [20] and used the ℓ2 norm for the regularization term.
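A minimal sketch of this setup (the regularization strength C is illustrative; with scikit-learn's default lbfgs solver the multi-class extension is the multinomial/softmax one described above):

```python
# Minimal sketch of l2-regularized, multinomial logistic regression.
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
logreg.fit(X_tr, y_tr)
probs = logreg.predict_proba(X_te)  # softmax probabilities over the three classes
```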
Random Forests and XGBoost
Random Forests [17] and XGBoost [31] are two very popular machine
learning techniques that both exploit Decision Trees as a building block for
composing larger methods.
Random Forests are probably the most famous application of ensembling: given a variety of different models, each prediction is obtained by combining their outputs. In the case of Random Forests, this combination is obtained by employing many randomized Decision Tree models built by sub-sampling the dataset along the axes of examples and/or features, together with a random selection of the splitting feature.
XGBoost is a specific implementation of the Gradient Boosting algorithm originally proposed by Friedman [32]. Boosting techniques also belong to the class of ensemble methods, since the final function is obtained by combining various simpler functions (in this case, Decision Trees), but they differ in the way these are trained. We refer the reader to [31] for further details on the Gradient Boosting algorithm.
We used the implementations from [20] and [33] for Random Forest and XGBoost, respectively. In [25] both Random Forest and XGBoost have been used to predict the outcome of a football match.
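A minimal sketch of the two ensembles (hyper-parameter values are illustrative; y_tr_int denotes integer-encoded labels, which recent xgboost versions require):

```python
# Minimal sketch of the two tree ensembles.
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=500, random_state=0)  # bagged randomized trees
rf.fit(X_tr, y_tr)

xgb = XGBClassifier(n_estimators=500, learning_rate=0.1)       # gradient-boosted trees
xgb.fit(X_tr, y_tr_int)
```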
References
[1] L. Pappalardo, P. Cintia, A. Rossi, E. Massucco, P. Ferragina, D. Pedreschi, F. Giannotti, A public data set of spatio-temporal match events in soccer competitions, Scientific Data 6 (1) (2019) 1–15.
[2] H. Liu, M.-A. Gomez, C. Lago-Penas, J. Sampaio, Match statistics related to winning in the group stage of 2014 Brazil FIFA World Cup, Journal of Sports Sciences 33 (12) (2015) 1205–1213.
[3] F. A. Moura, L. E. B. Martins, S. A. Cunha, Analysis of football game-related statistics using multivariate techniques, Journal of Sports Sciences 32 (20) (2014) 1881–1887.
[4] C. Lago-Penas, J. Lago-Ballesteros, A. Dellal, M. Gomez, Game-related statistics that discriminated winning, drawing and losing teams from the Spanish soccer league, Journal of Sports Science & Medicine 9 (2) (2010) 288.
[5] C. Lago-Penas, J. Lago-Ballesteros, E. Rey, Differences in performance indicators between winning and losing teams in the UEFA Champions League, Journal of Human Kinetics 27 (2011) (2011) 135–146.
[6] H. Lepschy, H. Wasche, A. Woll, Success factors in football: an analysis of the German Bundesliga, International Journal of Performance Analysis in Sport 20 (2) (2020) 150–164.
[7] J. Castellano, D. Casamichana, C. Lago, The use of match statistics that discriminate between successful and unsuccessful soccer teams, Journal of Human Kinetics 31 (1) (2012) 137–147.
[8] H. Liu, W. G. Hopkins, M.-A. Gomez, Modelling relationships between match events and match outcome in elite football, European Journal of Sport Science 16 (5) (2016) 516–525.
[9] L. Pappalardo, P. Cintia, Quantifying the relation between performance and success in soccer, Advances in Complex Systems 21 (4) (2018) 1750014.
[10] Y. Li, R. Ma, B. Goncalves, B. Gong, Y. Cui, Y. Shen, Data-driven team ranking and match performance analysis in Chinese Football Super League, Chaos, Solitons & Fractals 141 (2020) 110330.
[11] R. P. Bunker, F. Thabtah, A machine learning framework for sport result prediction, Applied Computing and Informatics 15 (1) (2019) 27–33.
[12] J. Sampaio, C. Lago, L. Casais, N. Leite, Effects of starting score-line, game location, and quality of opposition in basketball quarter score, European Journal of Sport Science 10 (6) (2010) 391–396.
[13] L. Vaz, M. Van Rooyen, J. Sampaio, Rugby game-related statistics that discriminate between winning and losing teams in IRB and Super Twelve close games, Journal of Sports Science & Medicine 9 (1) (2010) 51.
[14] M.-A. Gomez, A. DelaSerna, C. Lupo, J. Sampaio, Effects of situational variables and starting quarter score in the outcome of elite women's water polo game quarters, International Journal of Performance Analysis in Sport 14 (1) (2014) 73–83.
[15] C. Lupo, G. Condello, L. Capranica, A. Tessitore, Women's water polo world championships: Technical and tactical aspects of winning and losing teams in close and unbalanced games, The Journal of Strength & Conditioning Research 28 (1) (2014) 210–222.
[16] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (8) (2006) 861–874.
[17] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.