# The Impact of Selecting a Validation Method in Machine Learning on Predicting Basketball Game Outcomes


## Abstract


## 1. Introduction

#### Related Literature Review

## 2. Applied Algorithms and Methods

#### 2.1. Supervised Classification Machine Learning Algorithms

#### 2.1.1. Logistic Regression Algorithm

W, w_{0}, and θ are the parameters of the model.
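The model parameters mentioned above can be illustrated with a minimal sketch. It assumes the linear function has the form y(x) = W·x + w_{0} and that f(z) is the standard sigmoid 1/(1 + e^{−z}). The function names, weights, feature values, and the 0.5 decision threshold are hypothetical, and θ is left out because its role is not specified in this section.

```python
import math

def linear(x, W, w0):
    """Linear function y(x) = W . x + w0 (assumed form)."""
    return sum(w_i * x_i for w_i, x_i in zip(W, x)) + w0

def sigmoid(z):
    """Logistic (sigmoid) function f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, W, w0, threshold=0.5):
    """Return the estimated probability of the positive class and the predicted label."""
    p = sigmoid(linear(x, W, w0))
    return p, p >= threshold

# Hypothetical weights and a hypothetical two-feature sample.
W = [0.8, -0.5]
w0 = 0.1
x = [1.0, 2.0]

p, label = predict(x, W, w0)
print(f"P(positive class) = {p:.3f}, predicted positive: {label}")
```

In a game-outcome setting the positive class could stand for a home win, but that mapping is also an assumption of this sketch.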

#### 2.1.2. Naïve Bayes Algorithm

The posterior probability P(c_{j} | x) of sample x belonging to class c_{j} is computed according to the Bayes’ rule:

P(c_{j} | x) = P(x | c_{j}) · P(c_{j}) / P(x)
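A small sketch may help make the posterior computation concrete. It assumes categorical features and the usual naïve conditional-independence assumption, so that P(x | c_{j}) is the product of per-feature likelihoods. The class names, priors, and likelihood values below are invented for illustration and are not taken from the study.

```python
def naive_bayes_posteriors(x, priors, likelihoods):
    """Compute P(c_j | x) for every class c_j.

    priors:      {class: P(c_j)}
    likelihoods: {class: [ {feature_value: P(x_i = value | c_j)}, ... ]}
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for i, value in enumerate(x):
            # Naive assumption: features are independent given the class.
            p *= likelihoods[c][i][value]
        joint[c] = p
    evidence = sum(joint.values())  # P(x), the normalising constant
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical two-class problem with two binary features.
priors = {"home_win": 0.6, "away_win": 0.4}
likelihoods = {
    "home_win": [{"yes": 0.7, "no": 0.3}, {"yes": 0.5, "no": 0.5}],
    "away_win": [{"yes": 0.2, "no": 0.8}, {"yes": 0.5, "no": 0.5}],
}

posteriors = naive_bayes_posteriors(["yes", "yes"], priors, likelihoods)
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

The predicted class is simply the one with the largest posterior; the denominator P(x) is the same for all classes and only rescales the values so that they sum to one.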

#### 2.1.3. Decision Trees Algorithm

#### 2.1.4. Multilayer Perceptron Neural Network Algorithm

#### 2.1.5. Random Forest Algorithm

_{k}} is a set of uniformly distributed, completely independent vectors and x is the input vector pattern.

#### 2.1.6. K-NN Algorithm

#### 2.1.7. LogitBoost Algorithm

#### 2.2. Data Validation Methods

T_{i} for i = 1, 2, …, k. The parameter k is not strictly defined; it depends on the situation and the expert’s assessment. With this method, one of the k subsets is used for testing and the remaining k − 1 subsets are used for training. The procedure is repeated k times, so that each subset serves as the test set once, and the average accuracy of the model is calculated (Figure 2). Cross-validation has an advantage over the Train&Test method because all T_{i} subsets are used for training as well as for testing.
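The procedure described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the setup used in the study: the folds are contiguous and unshuffled, the labels are invented, and a majority-class predictor stands in for a real learning algorithm. All function names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous, near-equal subsets T_1..T_k."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_class(labels):
    """Placeholder 'model': always predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)

def cross_validate(labels, k):
    """Average accuracy over k rounds; each subset is the test set exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predicted = majority_class(train_labels)
        correct = sum(1 for i in test_idx if labels[i] == predicted)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical outcomes: 1 = home win, 0 = away win.
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
folds = k_fold_indices(len(labels), 5)
cv_accuracy = cross_validate(labels, 5)
print(f"5-fold average accuracy: {cv_accuracy:.2f}")
```

By contrast, a Train&Test split corresponds to a single round of this loop, so only one part of the data is ever used for testing.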

## 3. Data Acquisition and Preparation

#### 3.1. Data Acquisition

#### 3.2. Data Preparation

#### Feature Vector

## 4. Results and Discussion

#### 4.1. Prediction Results by Using Disjoint Datasets and Train&Test Validation Method

#### 4.2. Prediction Results by Using Disjoint Datasets and Cross-Validation Method

#### 4.3. Comparison of Prediction Results of Train&Test Validation Method and Cross-Validation Method by Using Disjoint Datasets

#### 4.4. Prediction Results by Using Up-To-Date Data and Train&Test Validation Method

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Table of Notations

Name/Abbrev. | Full Name/Explanation |
---|---|
y(x) | linear function |
f(z) | logistic (sigmoid) function |
W, w_{0}, θ | parameters of the logistic regression model |
P(c_{j} \| x) | posterior probability in Naïve Bayes algorithm |
x | sample |
c_{j} | class according to the Bayes’ rule |
d_{Euclidean}(x, y) | Euclidean distance in k-NN algorithm |
2fgm, 2fga | Number of two-pointers made/attempted by the player/team |
3fgm, 3fga | Number of three-pointers made/attempted by the player/team |
ftm, fta | Number of free throws made/attempted by the player/team |
defReb, offReb | Number of defensive/offensive rebounds by the player/team |
ast | Number of assists by the player/team |
st | Number of steals by the player/team |
to | Number of turnovers by the player/team |
blcks | Number of blocks made by the player/team |
flsCmmtd | Number of fouls committed by the player/team |


**Figure 7.** Comparison of the average algorithm accuracy by validation method, regardless of the dataset length used.

**Figure 9.** Comparison of the average results obtained by using the Train&Test validation method with disjoint datasets or up-to-date data.


Machine Learning Algorithm | 1 Training Season + 1 Testing Season | 2 Training Seasons + 1 Testing Season | 1 Training Season + 2 Testing Seasons | 2 Training Seasons + 2 Testing Seasons | 3 Training Seasons + 2 Testing Seasons | Average |
---|---|---|---|---|---|---|
Logistic regr. | 57.09% | 56.47% | 55.63% | 56.01% | 55.62% | 56.16% |
Naive Bayes | 57.40% | 57.20% | 55.76% | 54.97% | 53.65% | 55.80% |
Decision tree | 55.03% | 55.16% | 53.75% | 53.66% | 49.87% | 53.49% |
Multilayer perc. | 57.13% | 56.32% | 55.64% | 55.86% | 55.58% | 56.11% |
K-NN | 58.94% | 59.04% | 57.76% | 57.33% | 56.42% | 57.90% |
Random forest | 57.96% | 56.94% | 56.94% | 55.39% | 54.14% | 56.27% |
LogitBoost | 56.46% | 54.48% | 55.31% | 53.36% | 52.84% | 54.49% |
Average | 57.14% | 56.52% | 55.83% | 55.23% | 54.02% | 55.75% |

Machine Learning Algorithm | 2 Seasons | 3 Seasons | 4 Seasons | 5 Seasons | Average |
---|---|---|---|---|---|
Logistic regr. | 58.14% | 57.50% | 57.01% | 56.02% | 57.17% |
Naive Bayes | 58.67% | 58.08% | 57.54% | 56.00% | 57.57% |
Decision tree | 55.07% | 53.61% | 52.95% | 51.85% | 53.37% |
Multilayer perc. | 57.89% | 57.38% | 56.95% | 56.00% | 57.06% |
K-NN | 60.12% | 59.46% | 58.53% | 57.69% | 58.95% |
Random forest | 58.74% | 57.23% | 55.85% | 53.59% | 56.35% |
LogitBoost | 55.78% | 55.83% | 54.80% | 52.50% | 54.73% |
Average | 57.77% | 57.01% | 56.23% | 54.81% | 56.46% |

Machine Learning Algorithm | 1 Training Season + 1 Testing Season | 2 Training Seasons + 1 Testing Season | 1 Training Season + 2 Testing Seasons | 2 Training Seasons + 2 Testing Seasons | 3 Training Seasons + 2 Testing Seasons | Average |
---|---|---|---|---|---|---|
Logistic regr. | 59.29% | 58.97% | 59.97% | 59.47% | 59.44% | 59.43% |
Naive Bayes | 59.22% | 58.03% | 58.58% | 57.77% | 57.58% | 58.24% |
Decision tree | 54.97% | 55.10% | 54.85% | 54.20% | 54.18% | 54.66% |
Multilayer perc. | 58.23% | 58.70% | 59.97% | 59.46% | 59.50% | 59.17% |
K-NN | 60.06% | 59.23% | 60.82% | 59.87% | 60.06% | 60.01% |
Random forest | 59.56% | 58.63% | 58.92% | 57.50% | 56.60% | 58.24% |
LogitBoost | 57.55% | 56.24% | 55.82% | 54.57% | 54.39% | 55.71% |
Average | 57.92% | 57.84% | 58.42% | 57.55% | 57.39% | 57.92% |
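The overall averages in the bottom-right cells of the three tables above can be compared directly. The short sketch below does that arithmetic. The tables carry no captions here, so the mapping of each table to a setup (in order: Train&Test with disjoint datasets, cross-validation with disjoint datasets, Train&Test with up-to-date data) is an assumption inferred from the section headings, and the dictionary keys are hypothetical labels.

```python
# Overall average accuracy (%) taken from the "Average" cell of each table.
# The key names and the table-to-setup mapping are assumptions of this sketch.
overall_average = {
    "train_test_disjoint": 55.75,
    "cross_validation_disjoint": 56.46,
    "train_test_up_to_date": 57.92,
}

baseline = overall_average["train_test_disjoint"]

# Difference to the baseline in percentage points, rounded to two decimals.
gains = {
    name: round(accuracy - baseline, 2)
    for name, accuracy in overall_average.items()
}

for name, gain in gains.items():
    print(f"{name}: {overall_average[name]:.2f}% ({gain:+.2f} pp vs. baseline)")
```

Under that assumed mapping, the differences are below one percentage point between the two validation methods on disjoint datasets and a little above two points for up-to-date data.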

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Horvat, T.; Havaš, L.; Srpak, D.
The Impact of Selecting a Validation Method in Machine Learning on Predicting Basketball Game Outcomes. *Symmetry* **2020**, *12*, 431.
https://doi.org/10.3390/sym12030431
