An Expected Goals Model for Analyzing a 5-a-Side Soccer for the Blind Using Ten Machine Learning Algorithms with SHAP Interpretability
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Dataset
2.3. Variables and Characteristics
2.4. Data Split and Protection Against Overfitting
2.5. Algorithms Compared
2.6. Evaluation Metrics
2.7. SHAP Analysis
2.8. Implementation
2.9. Ethical Considerations
3. Results
3.1. Comparison of the Ten Algorithms
3.2. SHAP Interpretability Based on the Model’s Overall Importance
3.3. SHAP Heatmap
3.4. Functional Dependence of the Main Predictors
3.5. Individual Decomposition: Waterfall and Decision Plots
4. Discussion
4.1. Practical Implications
4.2. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Model | Best Hyperparameters (Selected by 5-Fold Inner CV on AUC) |
|---|---|
| Logistic Regression | C = 10.0; penalty = l2; solver = lbfgs |
| Random Forest | max_depth = 5; min_samples_split = 2; n_estimators = 300 |
| Extra Trees | max_depth = 5; n_estimators = 300 |
| Gradient Boosting | learning_rate = 0.1; max_depth = 2; n_estimators = 200 |
| XGBoost | learning_rate = 0.05; max_depth = 3; n_estimators = 200 |
| LightGBM | learning_rate = 0.1; n_estimators = 200; num_leaves = 15 |
| CatBoost | depth = 3; iterations = 100; learning_rate = 0.1 |
| SVM (RBF) | C = 0.5; gamma = scale |
| KNN | n_neighbors = 7; weights = uniform |
| MLP | alpha = 0.01; hidden_layer_sizes = (64, 32); learning_rate_init = 0.01 |
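The table above lists the hyperparameters selected by 5-fold inner cross-validation on AUC, but the search procedure itself is not reproduced here. As a rough illustration of that selection idea for the KNN row, the sketch below runs a small grid search using only the Python standard library. The synthetic data, the candidate grid, and all function names (`auc_roc`, `stratified_folds`, `knn_scores`, `grid_search_cv`) are assumptions made for this example and are not the authors' pipeline; in practice a library implementation would normally replace these helpers.

```python
import itertools
import random
import statistics


def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Spread the indices of each class over k folds so class ratios stay similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_scores(X_train, y_train, X_test, n_neighbors, weights):
    """Score each test point by the (optionally distance-weighted) share of positive neighbours."""
    out = []
    for x in X_test:
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)) ** 0.5, yt)
            for xt, yt in zip(X_train, y_train)
        )[:n_neighbors]
        w = [1.0 if weights == "uniform" else 1.0 / (d + 1e-9) for d, _ in nearest]
        out.append(sum(wi * yi for wi, (_, yi) in zip(w, nearest)) / sum(w))
    return out


def grid_search_cv(X, y, grid, score_fn, k=5):
    """Return the parameter combination with the highest mean AUC over k folds."""
    folds = stratified_folds(y, k)
    names = list(grid)
    results = {}
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        aucs = []
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            scores = score_fn(
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in test_idx], **params,
            )
            aucs.append(auc_roc([y[i] for i in test_idx], scores))
        results[values] = statistics.mean(aucs)
    best = max(results, key=results.get)
    return dict(zip(names, best)), results[best]


# Synthetic, illustrative shots only: 20 "goals" and 80 "no goals" in (x, y) metres.
rng = random.Random(42)
y = [1] * 20 + [0] * 80
X = [(rng.gauss(8.0 if label else 11.0, 2.0), rng.gauss(5.0, 2.0)) for label in y]

grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
best_params, best_auc = grid_search_cv(X, y, grid, knn_scores)
print(best_params, round(best_auc, 3))
```

Any other model can be plugged in by passing a different `score_fn` with the same signature; the selected values on synthetic data are not expected to match those in the table.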

| Variable | Type | No-Goal: Mean ± SD or n (%) | Goal: Mean ± SD or n (%) | Overall: Mean ± SD or n (%) |
|---|---|---|---|---|
| Final X-coordinate (m) | Continuous | 10.32 ± 4.33 | 8.57 ± 2.93 | 10.18 ± 4.26 |
| Final Y-coordinate (m) | Continuous | 5.39 ± 2.88 | 5.52 ± 1.98 | 5.40 ± 2.81 |
| Distance to goal (m) | Continuous | 11.73 ± 4.15 | 10.02 ± 2.25 | 11.59 ± 4.05 |
| Progression vector (m) | Continuous | 14.19 ± 7.18 | 15.19 ± 6.96 | 14.27 ± 7.14 |
| Shot angle (°) | Continuous | 26.52 ± 15.13 | 29.49 ± 16.43 | 26.76 ± 15.20 |
| Match minute (absolute) | Continuous | 23.41 ± 18.64 | 20.79 ± 18.85 | 22.96 ± 18.65 |
| Match minute (per-half) | Continuous | 13.85 ± 9.31 | 13.64 ± 9.59 | 13.82 ± 9.33 |
| Origin: Open play | Categorical | 57 (41.9%) | 12 (42.9%) | 69 (42.1%) |
| Origin: Set piece | Categorical | 43 (31.6%) | 6 (21.4%) | 49 (29.9%) |
| Origin: Counterattack | Categorical | 17 (12.5%) | 4 (14.3%) | 21 (12.8%) |
| Origin: High recovery | Categorical | 19 (14.0%) | 4 (14.3%) | 23 (14.0%) |
| Origin: Penalty | Categorical | 0 (0.0%) | 2 (7.1%) | 2 (1.2%) |
| Combination: Individual | Categorical | 88 (64.7%) | 24 (85.7%) | 112 (68.3%) |
| Combination: Team | Categorical | 48 (35.3%) | 4 (14.3%) | 52 (31.7%) |
| Leg: Right | Categorical | 107 (78.7%) | 16 (57.1%) | 123 (75.0%) |
| Leg: Left | Categorical | 28 (20.6%) | 12 (42.9%) | 40 (24.4%) |
| Leg: Body | Categorical | 1 (0.7%) | 0 (0.0%) | 1 (0.6%) |
| Rival: PER | Categorical | 31 (22.8%) | 10 (35.7%) | 41 (25.0%) |
| Rival: COL | Categorical | 28 (20.6%) | 8 (28.6%) | 36 (22.0%) |
| Rival: CHL | Categorical | 26 (19.1%) | 4 (14.3%) | 30 (18.3%) |
| Rival: MEX | Categorical | 27 (19.9%) | 6 (21.4%) | 33 (20.1%) |
| Rival: BRA | Categorical | 24 (17.6%) | 0 (0.0%) | 24 (14.6%) |
| Near kickboard (Y < 3 m or Y > 17 m) | Binary | 34 (25.0%) | 0 (0.0%) | 34 (20.7%) |
| Close zone (X < 6 m) | Binary | 31 (22.8%) | 4 (14.3%) | 35 (21.3%) |
| Second half | Binary | 65 (47.8%) | 10 (35.7%) | 75 (45.7%) |
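The binary variables in the table above are defined by explicit thresholds on the shot coordinates (Y < 3 m or Y > 17 m for the kickboard flag, X < 6 m for the close zone). A minimal sketch of how such flags, and the "n (%)" cells, could be derived is given below. The `Shot` class, its field names, and the three example shots are hypothetical; only the thresholds and the counts passed to `count_pct` come from the table.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    x: float   # final X-coordinate in metres (hypothetical field name)
    y: float   # final Y-coordinate in metres (hypothetical field name)
    half: int  # 1 for first half, 2 for second half (assumed encoding)


def derive_flags(shot):
    """Apply the thresholds stated in the table to one shot."""
    return {
        "near_kickboard": shot.y < 3.0 or shot.y > 17.0,
        "close_zone": shot.x < 6.0,
        "second_half": shot.half == 2,
    }


def count_pct(count, total):
    """Format a count as 'n (p%)' with one decimal, as in the table cells."""
    return f"{count} ({100 * count / total:.1f}%)"


# Three made-up shots, used only to exercise the thresholds.
shots = [Shot(4.5, 10.0, 1), Shot(12.0, 2.0, 2), Shot(9.0, 18.5, 1)]
flags = [derive_flags(s) for s in shots]

print(flags[0])
print(count_pct(34, 164))  # 34 (20.7%), the overall near-kickboard cell
```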

| Model | Accuracy | Balanced Accuracy | Precision | Recall | F1 | AUC-ROC | AUC-PR | MCC | Brier ↓ | Log-Loss ↓ | CV-AUC (5-Fold) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CatBoost | 0.920 | 0.864 | 0.778 | 0.778 | 0.778 | 0.913 | 0.828 | 0.729 | 0.072 | 0.253 | 0.871 ± 0.139 |
| Gradient Boosting | 0.920 | 0.821 | 0.857 | 0.667 | 0.750 | 0.927 | 0.829 | 0.711 | 0.077 | 0.345 | 0.863 ± 0.172 |
| LightGBM | 0.900 | 0.852 | 0.700 | 0.778 | 0.737 | 0.840 | 0.796 | 0.677 | 0.078 | 0.339 | 0.833 ± 0.126 |
| Random Forest | 0.900 | 0.809 | 0.750 | 0.667 | 0.706 | 0.921 | 0.823 | 0.648 | 0.075 | 0.268 | 0.869 ± 0.139 |
| XGBoost | 0.900 | 0.809 | 0.750 | 0.667 | 0.706 | 0.921 | 0.825 | 0.648 | 0.078 | 0.258 | 0.806 ± 0.146 |
| Extra Trees | 0.840 | 0.816 | 0.538 | 0.778 | 0.636 | 0.892 | 0.681 | 0.553 | 0.131 | 0.422 | 0.807 ± 0.113 |
| KNN | 0.880 | 0.753 | 0.714 | 0.556 | 0.625 | 0.915 | 0.603 | 0.561 | 0.095 | 0.304 | 0.880 ± 0.075 |
| SVM (RBF) | 0.800 | 0.835 | 0.471 | 0.889 | 0.615 | 0.875 | 0.520 | 0.543 | 0.113 | 0.353 | 0.826 ± 0.078 |
| Logistic Regression | 0.820 | 0.804 | 0.500 | 0.778 | 0.609 | 0.897 | 0.677 | 0.519 | 0.129 | 0.388 | 0.830 ± 0.103 |
| MLP | 0.860 | 0.741 | 0.625 | 0.556 | 0.588 | 0.913 | 0.672 | 0.506 | 0.126 | 0.979 | 0.820 ± 0.088 |
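To make the threshold-based columns of the table above concrete, the sketch below computes accuracy, balanced accuracy, precision, recall, F1, and MCC from a confusion matrix, plus the Brier score and log-loss from predicted probabilities. The confusion matrix (TP = 7, FP = 2, FN = 2, TN = 39) is not reported in the source; it is an inference that happens to reproduce the CatBoost row to three decimals and should be read as an assumption. The two-item probability example is purely illustrative, and all function names are hypothetical.

```python
import math


def threshold_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)


def log_loss(y_true, p, eps=1e-15):
    """Mean negative log-likelihood with clipping to avoid log(0) (lower is better)."""
    total = 0.0
    for yi, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(y_true)


# Inferred (not reported) confusion matrix consistent with the CatBoost row.
m = threshold_metrics(tp=7, fp=2, fn=2, tn=39)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.92, 'balanced_accuracy': 0.864, 'precision': 0.778,
#  'recall': 0.778, 'f1': 0.778, 'mcc': 0.729}

# Illustrative probabilities for one goal and one non-goal.
print(brier_score([1, 0], [0.8, 0.1]), log_loss([1, 0], [0.8, 0.1]))
```

Because MCC uses all four cells of the confusion matrix, it penalizes a model that ignores the minority class more strongly than accuracy does, which matters when goals are a small share of the shots.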

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Becerra-Patiño, B.A.; Yáñez-Sepúlveda, R.; Pino-Ortega, J. An Expected Goals Model for Analyzing a 5-a-Side Soccer for the Blind Using Ten Machine Learning Algorithms with SHAP Interpretability. Data 2026, 11, 164. https://doi.org/10.3390/data11070164