# Is Football/Soccer Purely Stochastic, Made Out of Luck, or Maybe Predictable? How Does Bayesian Reasoning Assess Sports?

## Abstract

## 1. Introduction

## 2. Theoretical Framework

#### Hamiltonian Monte Carlo (HMC) with RStan

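The `Shinystan` package in R offers a variety of graphical and numerical tools related to the convergence of the Monte Carlo chains, such as $n_{\mathrm{eff}}/N$ (effective sample size over total sample size), mcse/sd (Monte Carlo standard error over posterior standard deviation), the HMC/NUTS sampler statistics, and the autocorrelation of each parameter, detailed by chain.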
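To make the $n_{\mathrm{eff}}/N$ and mcse/sd ratios concrete, the sketch below computes them for one simulated chain. It is a simplified single-chain estimator written for illustration, not the multi-chain estimator used by Stan and `Shinystan`; the AR(1) "chain" and all function names are hypothetical stand-ins for real HMC draws.

```python
import numpy as np


def autocorr(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / n / var for k in range(max_lag + 1)])


def ess_single_chain(x: np.ndarray) -> float:
    """Effective sample size n_eff = N / (1 + 2 * sum of autocorrelations),
    truncating the sum at the first non-positive lag (a simplification of
    Geyer's initial-sequence rule; Stan uses a more robust multi-chain version)."""
    rho = autocorr(x, max_lag=len(x) // 2)
    tau = 1.0
    for k in range(1, len(rho)):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return len(x) / tau


def ar1_chain(phi: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical autocorrelated 'chain' (stationary AR(1), unit variance)."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi**2) * rng.normal()
    return x


rng = np.random.default_rng(1)
draws = ar1_chain(phi=0.6, n=1000, rng=rng)
n_eff = ess_single_chain(draws)
sd = draws.std(ddof=1)
mcse = sd / np.sqrt(n_eff)  # Monte Carlo standard error of the posterior mean

print(f"n_eff/N = {n_eff / len(draws):.3f}")  # theory for AR(1), phi=0.6: 0.25
print(f"mcse/sd = {mcse / sd:.4f}")           # equals 1 / sqrt(n_eff)
```

Because mcse/sd reduces to $1/\sqrt{n_{\mathrm{eff}}}$, the two ratios carry the same information: strongly autocorrelated chains have a small $n_{\mathrm{eff}}/N$ and, correspondingly, a large mcse/sd.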

The `Shinystan` package [21] generates a table, for each chain separately and for all chains together, that reports the mean, standard deviation, maximum and minimum of the following HMC/NUTS properties of the chains:

- accept_stat: For HMC without NUTS, the standard Metropolis acceptance probability; for NUTS, the average acceptance probability over the trajectory. Values closer to one indicate more robust (typically smaller-step) exploration;
- stepsize: The step size of the leapfrog integrator used in the Hamiltonian simulation. If it is too large, the integration will be imprecise and too many proposals will be rejected; if it is too small, many small steps will be needed, which leads to long simulation times per iteration;
- treedepth: The depth of the tree built by NUTS, i.e., the base-2 logarithm of the number of leapfrog steps taken during the Hamiltonian simulation. A treedepth of 0 means that the first leapfrog step was immediately rejected and the initial state was returned;
- n_leapfrog: The number of leapfrog steps performed during the Hamiltonian simulation. If it is small, the sampler behaves like a random walk; if it is large, the algorithm does more work in each iteration;
- divergent: An indicator (0 or 1) of whether the transition ended in a divergence, i.e., a numerical failure of the leapfrog integrator; its average over iterations is the proportion of divergent transitions;
- energy: The value of the Hamiltonian at each sample. The energy diagnostic for HMC quantifies how heavy the tails of the posterior distribution are.
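The quantities in the list above can be seen in action in a toy sampler. The sketch below is plain HMC with a fixed number of leapfrog steps on a two-dimensional standard normal target, not Stan's NUTS, so treedepth is not simulated. The function names are hypothetical, and the divergence threshold of 1000 on the energy error is chosen to mirror Stan's default; treat it as an assumption.

```python
import math

import numpy as np


def potential(q: np.ndarray) -> float:
    """U(q) = -log p(q) (up to a constant) for a standard normal target."""
    return 0.5 * float(q @ q)


def grad_potential(q: np.ndarray) -> np.ndarray:
    return q


def hmc_transition(q, stepsize, n_leapfrog, rng, max_energy_error=1000.0):
    """One plain (static) HMC transition; returns the next state and the
    per-iteration statistics accept_stat, stepsize, n_leapfrog, divergent, energy."""
    p = rng.normal(size=q.shape)                     # fresh momentum
    h0 = potential(q) + 0.5 * float(p @ p)           # initial Hamiltonian
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integrator: half momentum step, alternating full steps, half step.
    p_new -= 0.5 * stepsize * grad_potential(q_new)
    for i in range(n_leapfrog):
        q_new += stepsize * p_new
        if i < n_leapfrog - 1:
            p_new -= stepsize * grad_potential(q_new)
    p_new -= 0.5 * stepsize * grad_potential(q_new)
    h1 = potential(q_new) + 0.5 * float(p_new @ p_new)
    energy_error = h1 - h0
    # Flag a divergence when the energy error explodes (threshold assumed here).
    divergent = (not math.isfinite(h1)) or energy_error > max_energy_error
    accept_stat = 0.0 if divergent else math.exp(min(0.0, -energy_error))
    accepted = rng.uniform() < accept_stat
    stats = {
        "accept_stat": accept_stat,
        "stepsize": stepsize,
        "n_leapfrog": n_leapfrog,
        "divergent": float(divergent),
        "energy": h1 if accepted else h0,
    }
    return (q_new if accepted else q), stats


def run(stepsize, n_leapfrog, n_iter=500, seed=0):
    """Run one chain and average each statistic, as a per-chain table would."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    history = []
    for _ in range(n_iter):
        q, stats = hmc_transition(q, stepsize, n_leapfrog, rng)
        history.append(stats)
    return {k: float(np.mean([s[k] for s in history])) for k in history[0]}


print(run(stepsize=0.2, n_leapfrog=10))  # small steps: accept_stat near one
print(run(stepsize=2.5, n_leapfrog=20))  # step too large: transitions diverge
```

With a step size of 0.2 the energy error stays small and accept_stat is close to one; with a step size of 2.5 the leapfrog integrator is unstable for this target, every transition is flagged as divergent, and no proposal is accepted.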

## 3. Methodology

#### 3.1. Data

#### 3.2. Description of the Sports (Hierarchical) Model

## 4. Results

The `Shinystan` package, which provides graphical and numerical summaries of the parameters, was adopted for the model and convergence diagnostics in the R software. The parameters were estimated with the HMC algorithm using four independent chains of 2000 iterations each, with warmup = 1000 and thin = 1. All statistical analyses in this study consider a credibility level of 50% (as the criterion for statistical significance).
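The sampling settings above fix how many draws each summary is based on, and the 50% credibility level corresponds to the interval between the 25% and 75% posterior quantiles. The sketch below shows that bookkeeping on hypothetical normal draws for a single parameter (the location 0.25 and scale 0.14 are illustrative only). It assumes, as the later discussion of significant attack parameters suggests, that a parameter counts as significant when its 50% interval excludes zero.

```python
import numpy as np

N_CHAINS, ITER, WARMUP, THIN = 4, 2000, 1000, 1
draws_per_chain = (ITER - WARMUP) // THIN      # 1000 retained draws per chain
total_draws = N_CHAINS * draws_per_chain       # 4000 pooled draws

rng = np.random.default_rng(2021)
# Hypothetical posterior draws for one parameter (one row per chain).
draws = rng.normal(loc=0.25, scale=0.14, size=(N_CHAINS, draws_per_chain))

pooled = draws.reshape(-1)                     # pool the four chains
mean = pooled.mean()
q25, q75 = np.quantile(pooled, [0.25, 0.75])   # 50% equal-tailed credible interval
significant = q25 > 0 or q75 < 0               # assumed rule: interval excludes zero
print(f"{total_draws} draws, mean={mean:.3f}, 50% CI=({q25:.3f}, {q75:.3f}), "
      f"excludes zero: {significant}")
```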

The `Shinystan` package eases the analysis of the results through its intuitive, interactive interface: the `launch_shinystan` function opens an interface for exploring the HMC chains, in which various graphs and parameter estimates can be viewed in more detail.

`Shinystan` also makes it possible to view the posterior distributions of all the parameters in the statistical model, which helps in understanding what is being analyzed and, if any anomaly appears, in correcting it or running further tests. A plot of the posterior distribution is generated for each parameter in the model; Figure 8 shows the posterior distributions of the ${\mu}_{\mathrm{def}}$, ${\tau}_{\mathrm{def}}$, ${\mu}_{\mathrm{att}}$ and ${\tau}_{\mathrm{att}}$ parameters obtained from the HMC chains. These graphs show that the Home parameter is distributed symmetrically (Figure 5), whereas the ${\mu}_{\mathrm{def}}$ and ${\mu}_{\mathrm{att}}$ parameters are slightly skewed (Figure 8). The variability parameters ${\tau}_{\mathrm{att}}$ and ${\tau}_{\mathrm{def}}$ govern the dispersion of the team attack and defense parameters, respectively; the attack dispersion appears significant compared with that of defense, since the posterior distribution of ${\tau}_{\mathrm{def}}$ is concentrated around zero.
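Why a ${\tau}_{\mathrm{def}}$ near zero means that teams barely differ in defense can be illustrated by drawing team effects from the hierarchical prior. This sketch assumes that team effects follow $\mathrm{Normal}(\mu, \tau)$ with $\tau$ acting as a standard deviation (scale), which is what the small spread of the team defense estimates in the posterior summary table suggests; $\mu$ is set to 0 purely for illustration, and only the two $\tau$ values are taken from the reported posterior means.

```python
import numpy as np

# Posterior means of the league-level scales reported for Model 2.
TAU_ATT, TAU_DEF = 0.2729, 0.0343
N_TEAMS = 18  # teams in the 2020 Chilean Premier League

rng = np.random.default_rng(7)
# Assumption: team effects ~ Normal(mu, tau) with tau as a standard deviation;
# mu = 0 is an illustrative choice, not an estimate.
att = rng.normal(0.0, TAU_ATT, size=N_TEAMS)
def_ = rng.normal(0.0, TAU_DEF, size=N_TEAMS)

print("spread of attack effects :", round(float(att.std(ddof=1)), 4))
print("spread of defense effects:", round(float(def_.std(ddof=1)), 4))
```

Under this assumption the simulated defense effects cluster tightly around the league mean, mirroring the posterior summaries, where every team's defense mean lies within about ±0.03 while the attack means range from about −0.06 to 0.50.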

The metrics provided by the `Shinystan` interface are presented in Table 3, in which the `accept_stat` metric broadly validates the convergence of each chain, since its values (0.7272 to 0.8924 per chain) are close to one.

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| HMC | Hamiltonian Monte Carlo |
| MCMC | Markov chain Monte Carlo |
| NUTS | No-U-Turn Sampler |
| MAP | Maximum a Posteriori |
| CIs | Credibility Intervals |
| GLM | Generalized Linear Model |
| LOO | Leave-one-out |
| elpd_loo | LOO expected log pointwise predictive density |
| p_loo | LOO effective number of parameters |
| looic | LOO Information Criterion |
| SE | Standard Error |
| att | Attack capacity |
| def | Defense capacity |
| $\theta $ | Poisson parameter related to the expected number of events (goals) of each team |
| ${\beta}_{\mathrm{home}}$ | Home advantage parameter for the team playing at home |
| ${\beta}_{1}$ and ${\beta}_{3}$ | Model 1: attack ability parameters at home and away |
| ${\beta}_{2}$ and ${\beta}_{4}$ | Model 1: defense ability parameters at home and away |
| ${\beta}_{\mathrm{att}}$ | Model 2: attack ability parameter (for each team) |
| ${\beta}_{\mathrm{def}}$ | Model 2: defense ability parameter (for each team) |
| ${\mu}_{\mathrm{att}}$ | Model 2: attack location (mean) parameter, 2020 Chilean Premier League |
| ${\mu}_{\mathrm{def}}$ | Model 2: defense location (mean) parameter, 2020 Chilean Premier League |
| ${\tau}_{\mathrm{att}}$ | Model 2: attack variability parameter, 2020 Chilean Premier League |
| ${\tau}_{\mathrm{def}}$ | Model 2: defense variability parameter, 2020 Chilean Premier League |
| UCA | Universidad Católica team |
| ULC | Unión La Calera team |
| UCH | Universidad de Chile team |
| UES | Unión Española team |
| PAL | Palestino team |
| DAN | Deportes Antofagasta team |
| COB | Cobresal team |
| HUA | Huachipato team |
| CUN | Curicó Unido team |
| OHI | O’Higgins team |
| SWA | Santiago Wanderers team |
| EVE | Everton team |
| UCO | Universidad de Concepción team |
| AIT | Audax Italiano team |
| DLS | Deportes La Serena team |
| CCO | Colo Colo team |
| DIQ | Deportes Iquique team |
| COU | Coquimbo Unido team |

## References

- Radicchi, E.; Mozzachiodi, M. Social talent scouting: A new opportunity for the identification of football players? Phys. Cult. Sport **2016**, 70, 28.
- Schumaker, R.P.; Solieman, O.K.; Chen, H. Sports Data Mining; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010; Volume 26.
- Morgulev, E.; Azar, O.H.; Lidor, R. Sports analytics and the big-data era. Int. J. Data Sci. Anal. **2018**, 5, 213–222.
- Beal, R.; Norman, T.J.; Ramchurn, S.D. Artificial intelligence for team sports: A survey. Knowl. Eng. Rev. **2019**, 34, e28.
- Louzada, F.; Maiorano, A.C.; Ara, A. iSports: A web-oriented expert system for talent identification in soccer. Expert Syst. Appl. **2016**, 44, 400–412.
- Santos-Fernandez, E.; Mengersen, K.L.; Wu, P. Bayesian methods in sport statistics. In Wiley StatsRef: Statistics Reference Online; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2019; pp. 1–8.
- Anderson, C.; Sally, D. The Numbers Game: Why Everything You Know about Soccer Is Wrong; Penguin: London, UK, 2013.
- Baio, G.; Blangiardo, M. Bayesian hierarchical model for the prediction of football results. J. Appl. Stat. **2010**, 37, 253–264.
- Lee, A.J. Modeling scores in the Premier League: Is Manchester United really the best? Chance **1997**, 10, 15–19.
- Suzuki, A.K.; Salasar, L.E.B.; Leite, J.; Louzada-Neto, F. A Bayesian approach for predicting match outcomes: The 2006 (Association) Football World Cup. J. Oper. Res. Soc. **2010**, 61, 1530–1539.
- Santana, H.; Ferreira, P.H.; Ara, A.; Louzada, F.; Suzuki, A.K. Modelagem Estatística e de Aprendizado de Máquina: Previsão do Campeonato Brasileiro Série A 2017. Matemática Estatística Foco **2019**, 7, 42-a.
- Constantinou, A.C.; Fenton, N.E.; Neil, M. pi-football: A Bayesian network model for forecasting Association Football match outcomes. Knowl.-Based Syst. **2012**, 36, 322–339.
- Hervert-Escobar, L.; Hernandez-Gress, N.; Matis, T.I. Bayesian based approach learning for outcome prediction of soccer matches. In International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2018; pp. 269–279.
- Poisson, S.D. Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile; Bachelier: Paris, France, 1837.
- Gelade, G.A. The influence of team composition on attacking and defending in football. J. Sport. Econ. **2018**, 19, 1174–1190.
- Moreno, E.; Martínez, C. Bayesian and frequentist evidence in one-sided hypothesis testing. In TEST; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–20.
- Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. **1953**, 21, 1087–1092.
- Alder, B.J.; Wainwright, T.E. Studies in molecular dynamics. I. General method. J. Chem. Phys. **1959**, 31, 459–466.
- Brooks, S.; Gelman, A.; Jones, G.; Meng, X.L. Handbook of Markov Chain Monte Carlo; CRC Press: Boca Raton, FL, USA, 2011.
- Betancourt, M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv **2017**, arXiv:1701.02434.
- Gabry, J.; Simpson, D.; Vehtari, A.; Betancourt, M.; Gelman, A. Visualization in Bayesian workflow. J. R. Stat. Soc. Ser. A Stat. Soc. **2019**, 182, 389–402.
- CJ, D.; Chakravarty, A. Team Contingent or Sport Native? A Bayesian Analysis of Home Field Advantage in Professional Soccer. J. Bus. Anal. **2021**, 4, 67–75.
- Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. **2017**, 27, 1413–1432.
- Muth, C.; Oravecz, Z.; Gabry, J. User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. Quant. Methods Psychol. **2018**, 14, 99–119.
- Crowder, M.; Dixon, M.; Ledford, A.; Robinson, M. Dynamic modelling and prediction of English Football League matches for betting. J. R. Stat. Soc. Ser. D Stat. **2002**, 51, 157–168.

**Figure 2.** Teams’ performance throughout the 2020 Chilean Premier Soccer League. Whether playing at home or away, most teams scored a median of one goal per match (i.e., one goal or fewer in 50% of their matches). At home, the best-performing teams were PAL, UCA, UCH, DAN, and AIT; as visitors, the best performances were by ULC and UCA.

**Figure 3.** Prediction of the 2020 Chilean championship by the fitted model, holding out the last 5 rounds of 2020 games. The points in red mark the teams with statistically significant parameters; only their attacking strength was significant.

**Figure 4.** Maximum a posteriori (MAP) estimates of the β parameters and 50% credibility intervals (50% CIs) for each team: attack parameters on the left and defense parameters on the right.

**Figure 8.** League-level distributions of the defense and attack parameters ($\mu $ and $\tau $) in the Chilean league. The main diagonal shows histograms of the posterior distributions of the attack and defense parameters for the 2020 Chilean First Division. The upper triangle shows evidence, from the HMC draws, of independence among the parameter estimates; the lower triangle shows the bivariate posterior distributions of the parameters.

| Model | Statistic | Estimate | SE |
|---|---|---|---|
| Model 1 (Non-hierarchical) | elpd_loo | −15.4 | 2.3 |
| | p_loo | 0.5 | 0.3 |
| | looic | 30.7 | 4.7 |
| Model 2 (Hierarchical) | elpd_loo | −15.4 | 2.1 |
| | p_loo | 0.2 | 0.1 |
| | looic | 30.8 | 4.1 |
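The LOO statistics in this table are linked: looic is elpd_loo on the deviance scale, $\mathrm{looic} = -2\,\mathrm{elpd\_loo}$, and its SE is twice that of elpd_loo, where elpd_loo sums the pointwise values and its SE is $\sqrt{n\,\mathrm{Var}(\text{pointwise})}$. The sketch below encodes these relations (the pointwise values in the usage line are hypothetical) and checks that the reported rows agree up to one-decimal rounding.

```python
import numpy as np


def loo_summary(elpd_pointwise: np.ndarray) -> dict:
    """elpd_loo, its SE, and looic (deviance scale) from pointwise LOO values."""
    n = len(elpd_pointwise)
    elpd = float(np.sum(elpd_pointwise))
    se_elpd = float(np.sqrt(n * np.var(elpd_pointwise, ddof=1)))
    return {"elpd_loo": elpd, "se_elpd_loo": se_elpd,
            "looic": -2.0 * elpd, "se_looic": 2.0 * se_elpd}


# Hypothetical pointwise values, only to show the calculation.
print(loo_summary(np.array([-1.2, -0.8, -1.5, -0.9, -1.1])))

# Reported values: (elpd_loo, SE, looic, SE of looic).
REPORTED = {"Model 1": (-15.4, 2.3, 30.7, 4.7), "Model 2": (-15.4, 2.1, 30.8, 4.1)}
for name, (elpd, se, looic, se_looic) in REPORTED.items():
    # One-decimal rounding of each value allows gaps of up to about 0.15.
    print(name, round(-2 * elpd, 1), looic, round(2 * se, 1), se_looic)
```

Both models share the same elpd_loo (−15.4), so their looic values differ only through rounding of the underlying estimates.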

| Parameter | Posterior Mean | Posterior Quantile 25% | Posterior Quantile 75% |
|---|---|---|---|
| HOME | $1.37\times {10}^{8}$ | $-5.96\times {10}^{9}$ | $6.45\times {10}^{9}$ |
| UCA_att | 0.4999 | 0.4071 | 0.5947 |
| ULC_att | 0.4185 | 0.3278 | 0.5124 |
| UES_att | 0.3616 | 0.2715 | 0.4532 |
| UCH_att | 0.2636 | 0.1724 | 0.3557 |
| PAL_att | 0.2535 | 0.1608 | 0.3487 |
| AIT_att | 0.2208 | 0.1292 | 0.3168 |
| COB_att | 0.1965 | 0.1087 | 0.2902 |
| HUA_att | 0.183 | 0.0868 | 0.2799 |
| DAN_att | 0.1583 | 0.0623 | 0.2573 |
| SWA_att | 0.142 | 0.0482 | 0.2429 |
| OHI_att | 0.1107 | 0.0162 | 0.2067 |
| CUN_att | 0.1061 | 0.0184 | 0.1972 |
| DIQ_att | 0.0711 | −0.0231 | 0.1681 |
| UCO_att | 0.0687 | −0.024 | 0.1657 |
| EVE_att | 0.0519 | −0.0445 | 0.1513 |
| DLS_att | 0.0082 | −0.0872 | 0.1085 |
| CCO_att | −0.0311 | −0.1284 | 0.071 |
| COU_att | −0.059 | −0.1584 | 0.0434 |
| UCO_def | −0.0105 | −0.0136 | 0.0055 |
| ULC_def | −0.0042 | −0.0091 | 0.0066 |
| COU_def | −0.0097 | −0.0134 | 0.0051 |
| COB_def | −0.0013 | −0.0086 | 0.0085 |
| UCA_def | 0.0084 | −0.0051 | 0.0134 |
| UCH_def | 0.0104 | −0.0042 | 0.0137 |
| DIQ_def | −0.0114 | −0.0143 | 0.0044 |
| DLS_def | −0.0002 | −0.0085 | 0.0085 |
| PAL_def | −0.008 | −0.0121 | 0.0059 |
| UES_def | −0.0254 | −0.0249 | 0.0022 |
| SWA_def | −0.0249 | −0.0242 | 0.0023 |
| HUA_def | −0.0077 | −0.0117 | 0.0057 |
| CUN_def | −0.0235 | −0.0213 | 0.0025 |
| CCO_def | −0.0051 | −0.0104 | 0.0069 |
| OHI_def | 0.0014 | −0.007 | 0.0086 |
| EVE_def | −0.0058 | −0.0096 | 0.0065 |
| DAN_def | −0.0044 | −0.0095 | 0.0068 |
| AIT_def | −0.0219 | −0.0202 | 0.0024 |
| ${\mu}_{\mathrm{att}}$ | $1.12\times {10}^{8}$ | $-6.56\times {10}^{8}$ | $6.71\times {10}^{9}$ |
| ${\tau}_{\mathrm{att}}$ | 0.2729 | 0.2288 | 0.311 |
| ${\mu}_{\mathrm{def}}$ | $1.08\times {10}^{8}$ | $-6.68\times {10}^{9}$ | $6.84\times {10}^{9}$ |
| ${\tau}_{\mathrm{def}}$ | 0.0343 | 0.0047 | 0.0452 |
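The 50% credibility level can be applied directly to the quartiles in the table above. The sketch below assumes that a team parameter counts as significant when its 50% interval, from the 25% to the 75% posterior quantile, excludes zero; the data are the team rows copied from the table. Under that assumption the rule flags 12 attack parameters and no defense parameter, in line with the Figure 3 caption, which states that only attacking strength was significant.

```python
# (team, posterior mean, 25% quantile, 75% quantile) from the posterior summary table
ATTACK = [
    ("UCA", 0.4999, 0.4071, 0.5947), ("ULC", 0.4185, 0.3278, 0.5124),
    ("UES", 0.3616, 0.2715, 0.4532), ("UCH", 0.2636, 0.1724, 0.3557),
    ("PAL", 0.2535, 0.1608, 0.3487), ("AIT", 0.2208, 0.1292, 0.3168),
    ("COB", 0.1965, 0.1087, 0.2902), ("HUA", 0.183, 0.0868, 0.2799),
    ("DAN", 0.1583, 0.0623, 0.2573), ("SWA", 0.142, 0.0482, 0.2429),
    ("OHI", 0.1107, 0.0162, 0.2067), ("CUN", 0.1061, 0.0184, 0.1972),
    ("DIQ", 0.0711, -0.0231, 0.1681), ("UCO", 0.0687, -0.024, 0.1657),
    ("EVE", 0.0519, -0.0445, 0.1513), ("DLS", 0.0082, -0.0872, 0.1085),
    ("CCO", -0.0311, -0.1284, 0.071), ("COU", -0.059, -0.1584, 0.0434),
]
DEFENSE = [
    ("UCO", -0.0105, -0.0136, 0.0055), ("ULC", -0.0042, -0.0091, 0.0066),
    ("COU", -0.0097, -0.0134, 0.0051), ("COB", -0.0013, -0.0086, 0.0085),
    ("UCA", 0.0084, -0.0051, 0.0134), ("UCH", 0.0104, -0.0042, 0.0137),
    ("DIQ", -0.0114, -0.0143, 0.0044), ("DLS", -0.0002, -0.0085, 0.0085),
    ("PAL", -0.008, -0.0121, 0.0059), ("UES", -0.0254, -0.0249, 0.0022),
    ("SWA", -0.0249, -0.0242, 0.0023), ("HUA", -0.0077, -0.0117, 0.0057),
    ("CUN", -0.0235, -0.0213, 0.0025), ("CCO", -0.0051, -0.0104, 0.0069),
    ("OHI", 0.0014, -0.007, 0.0086), ("EVE", -0.0058, -0.0096, 0.0065),
    ("DAN", -0.0044, -0.0095, 0.0068), ("AIT", -0.0219, -0.0202, 0.0024),
]


def excludes_zero(rows):
    """Teams whose 50% interval [q25, q75] lies entirely on one side of zero."""
    return [team for team, _mean, q25, q75 in rows if q25 > 0 or q75 < 0]


sig_att = excludes_zero(ATTACK)
sig_def = excludes_zero(DEFENSE)
print(len(sig_att), sig_att)
print(len(sig_def), sig_def)
```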

**Table 3.** Metrics provided by the `Shinystan` package on chain convergence for the hierarchical model.

| Chain | Accept_stat | Stepsize | Treedepth | N_leapfrog | Divergent | Energy |
|---|---|---|---|---|---|---|
| All chains | 0.7922 | 0.0265 | 7.0215 | 158.1773 | 0.0158 | 494.9190 |
| Chain 1 | 0.7909 | 0.0224 | 7.4130 | 218.1270 | 0.0060 | 498.7200 |
| Chain 2 | 0.7583 | 0.0276 | 6.9330 | 125.8300 | 0.0050 | 494.4215 |
| Chain 3 | 0.7272 | 0.0260 | 6.6970 | 145.0860 | 0.0510 | 463.3970 |
| Chain 4 | 0.8924 | 0.0297 | 7.0430 | 143.6660 | 0.0010 | 523.1376 |
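The "All chains" row of Table 3 can be reproduced from the per-chain rows: with chains of equal length, the pooled mean of each statistic is the average of the four chain means. The sketch below checks this to within the four-decimal rounding of the table. It also converts the divergent means into counts, assuming the statistics are computed over the 1000 post-warmup iterations of each chain (2000 iterations minus 1000 of warmup, thin = 1); the Stan documentation treats any divergent transition after warmup as a sign that the posterior may not have been explored reliably.

```python
# Per-chain means from Table 3 (Chains 1-4)
CHAINS = {
    "accept_stat": [0.7909, 0.7583, 0.7272, 0.8924],
    "stepsize":    [0.0224, 0.0276, 0.0260, 0.0297],
    "treedepth":   [7.4130, 6.9330, 6.6970, 7.0430],
    "n_leapfrog":  [218.1270, 125.8300, 145.0860, 143.6660],
    "divergent":   [0.0060, 0.0050, 0.0510, 0.0010],
    "energy":      [498.7200, 494.4215, 463.3970, 523.1376],
}
REPORTED_ALL = {"accept_stat": 0.7922, "stepsize": 0.0265, "treedepth": 7.0215,
                "n_leapfrog": 158.1773, "divergent": 0.0158, "energy": 494.9190}

# With equal-length chains, the pooled mean is the average of the chain means.
pooled = {k: sum(v) / len(v) for k, v in CHAINS.items()}
for k, v in pooled.items():
    print(f"{k:12s} pooled={v:.4f} reported={REPORTED_ALL[k]:.4f}")

# Assumption: 1000 post-warmup iterations per chain.
POST_WARMUP = 1000
divergent_counts = [round(d * POST_WARMUP) for d in CHAINS["divergent"]]
print("divergent transitions per chain:", divergent_counts)
```

Under that assumption, Chain 3 accounts for most of the divergences (about 51 of roughly 63 in total).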

**Table 4.** Median predicted goals, from the four chains, for the last 5 games of the 2020 Chilean championship.

| Match (Home × Away) | Home: Observed Goals | Home: Model 1 | Home: Model 2 | Away: Observed Goals | Away: Model 1 | Away: Model 2 |
|---|---|---|---|---|---|---|
| DLS × AIT | 0 | 1 | 1 | 2 | 1 | 1 |
| UCO × UCA | 1 | 1 | 1 | 2 | 2 | 1 |
| PAL × COU | 2 | 1 | 1 | 2 | 1 | 1 |
| EVE × HUA | 1 | 1 | 1 | 0 | 1 | 1 |
| OHI × CCO | 1 | 1 | 1 | 1 | 1 | 1 |
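A quick way to compare the two models in Table 4 is to count how often each one reproduces the exact score, the match result (home win, draw, away win), and the individual goal counts. The sketch below is only a descriptive tabulation of the table, not a metric reported by the authors, and with five matches the counts say little statistically; the function names are hypothetical.

```python
# (match, observed home, M1 home, M2 home, observed away, M1 away, M2 away)
TABLE4 = [
    ("DLS x AIT", 0, 1, 1, 2, 1, 1),
    ("UCO x UCA", 1, 1, 1, 2, 2, 1),
    ("PAL x COU", 2, 1, 1, 2, 1, 1),
    ("EVE x HUA", 1, 1, 1, 0, 1, 1),
    ("OHI x CCO", 1, 1, 1, 1, 1, 1),
]


def outcome(home: int, away: int) -> str:
    """Match result from the home team's perspective."""
    return "H" if home > away else "A" if home < away else "D"


def agreement(model: int) -> dict:
    """Count exact scores, correct results and correct goal counts;
    model=1 selects the Model 1 columns, model=2 the Model 2 columns."""
    exact = result = goals = 0
    for _match, gh, m1h, m2h, ga, m1a, m2a in TABLE4:
        ph, pa = (m1h, m1a) if model == 1 else (m2h, m2a)
        exact += (ph, pa) == (gh, ga)
        result += outcome(ph, pa) == outcome(gh, ga)
        goals += (ph == gh) + (pa == ga)
    return {"exact_score": exact, "result": result, "goal_counts": goals}


for m in (1, 2):
    print(f"Model {m}: {agreement(m)} (scores/results out of 5, goal counts out of 10)")
```

Model 2 predicts a 1–1 draw in every match, so it can only be right when the observed result is a draw; Model 1 also recovers the away win in UCO × UCA.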

**Table 5.** Final standings of the 2020 Chilean Premier League. The positions highlighted in bold were correctly predicted by the fitted Bayesian hierarchical model.

| Position | Team Name | Points |
|---|---|---|
| 1 | Universidad Católica (UCA) | 65 |
| 2 | Unión La Calera (ULC) | 57 |
| 3 | Universidad de Chile (UCH) | 52 |
| 4 | Unión Española (UES) | 52 |
| 5 | Palestino (PAL) | 51 |
| 6 | Deportes Antofagasta (DAN) | 48 |
| 7 | Cobresal (COB) | 47 |
| 8 | Huachipato (HUA) | 46 |
| 9 | Curicó Unido (CUN) | 46 |
| 10 | O’Higgins (OHI) | 45 |
| 11 | Santiago Wanderers (SWA) | 44 |
| 12 | Everton (EVE) | 43 |
| 13 | Universidad de Concepción (UCO) | 41 |
| 14 | Audax Italiano (AIT) | 41 |
| 15 | Deportes La Serena (DLS) | 39 |
| 16 | Colo Colo (CCO) | 39 |
| 17 | Deportes Iquique (DIQ) | 38 |
| 18 | Coquimbo Unido (COU) | 35 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Blanco, L.B.; Ferreira, P.H.; Louzada, F.; Nascimento, D.C.d.
Is Football/Soccer Purely Stochastic, Made Out of Luck, or Maybe Predictable? How Does Bayesian Reasoning Assess Sports? *Axioms* **2021**, *10*, 276.
https://doi.org/10.3390/axioms10040276

**Chicago/Turabian Style**

Blanco, Leonardo Barrios, Paulo Henrique Ferreira, Francisco Louzada, and Diego Carvalho do Nascimento.
2021. "Is Football/Soccer Purely Stochastic, Made Out of Luck, or Maybe Predictable? How Does Bayesian Reasoning Assess Sports?" *Axioms* 10, no. 4: 276.
https://doi.org/10.3390/axioms10040276