Construction of a Predictive Model for MLB Matches
Abstract
1. Introduction
1.1. Background
1.2. Literature Review
- The Bet365 [5] online bookmaker announces its first-five-innings betting odds at 10 p.m. China Standard Time. These are pre-game odds, obtained before the start of the match. The timing is relevant because an MLB starting pitcher usually rests four or five days between starts, so most MLB teams carry five or six starting pitchers on their rosters. These pitchers, and the sequence in which they pitch, are known as the rotation. For the most part, the starting lineups stay the same once they are posted. The choice of bookmaker is unlikely to matter much: odds are transparent across online bookmakers and competition in the market is high, so prices are largely homogeneous. The betting odds already include the bookmaker’s profit margin, so the inverse odds cannot be treated directly as implied probabilities. In the next section, we introduce how to remove the bookmaker’s margin using basic normalization (BN), following the literature.
- We calculated the ENRS (based on batter vs. pitcher career statistics) and the probability of leading in the first five innings, which is the new implied probability (NIP) proposed in this article.
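The idea of a "probability of leading" can be illustrated with a minimal sketch. It assumes that each team's run total over the first five innings is described by a discrete distribution and that the two totals are independent; both are assumptions made for illustration, and the function name and the numbers below are hypothetical rather than taken from the article. How the article obtains the run distributions, and how it treats ties, is not specified here.

```python
def lead_probabilities(pmf_a, pmf_b):
    """Return (P(A > B), P(A == B), P(A < B)) for two independent run totals.

    pmf_a[k] and pmf_b[k] are the probabilities of scoring exactly k runs.
    """
    lead = 0.0
    tie = 0.0
    for a, prob_a in enumerate(pmf_a):
        for b, prob_b in enumerate(pmf_b):
            if a > b:
                lead += prob_a * prob_b
            elif a == b:
                tie += prob_a * prob_b
    total = sum(pmf_a) * sum(pmf_b)
    return lead, tie, total - lead - tie


# Hypothetical five-inning run distributions (illustrative numbers only).
team_a = [0.20, 0.30, 0.25, 0.15, 0.10]
team_b = [0.35, 0.30, 0.20, 0.10, 0.05]

lead, tie, trail = lead_probabilities(team_a, team_b)
print(round(lead, 2), round(tie, 2), round(trail, 2))
```

Any run distribution could be plugged in as `pmf_a` and `pmf_b`, for example one fitted with a negative binomial model, provided it is truncated at a point where the remaining tail mass is negligible.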
2. Determining Outcome Probabilities from Betting Odds
2.1. Basic Normalization
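As a minimal sketch of basic normalization, the inverse decimal odds can be divided by their sum so that the resulting probabilities add up to 1. The function name is hypothetical, and the example assumes that the O1 and O2 columns in the result tables are the decimal odds for the two teams.

```python
def basic_normalization(odds):
    """Convert decimal odds into probabilities that sum to 1."""
    if any(o <= 1.0 for o in odds):
        raise ValueError("decimal odds must be greater than 1")
    inverse = [1.0 / o for o in odds]
    overround = sum(inverse)  # exceeds 1 by the bookmaker's margin
    return [q / overround for q in inverse]


# NYY vs. TOR on 15 September 2018, assuming O1 = 1.33 and O2 = 3.30.
probs = basic_normalization([1.33, 3.30])
print([round(p, 2) for p in probs])  # [0.71, 0.29]
```

The first value agrees with the BN entry of 0.71 listed for that match in Appendix A.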
2.2. New Implied Probability (NIP)
3. Data Processing
3.1. Research Tools
3.2. Rank Probability Score (RPS)
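A minimal sketch of the rank probability score in its commonly used form, RPS = (1/(r−1)) Σ_{i=1}^{r−1} (Σ_{j=1}^{i} (p_j − e_j))², where p_j is the forecast probability of ordered outcome j and e_j is 1 for the observed outcome and 0 otherwise. The function name and the two-outcome encoding (index 0 for the first team leading, index 1 for the second) are assumptions made for illustration.

```python
def rank_probability_score(probs, outcome_index):
    """RPS for one forecast over r ordered outcomes (lower is better)."""
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two outcomes")
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    for i in range(r - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (r - 1)


# NYY vs. TOR, BN forecast 0.71 for NYY, and NYY led (index 0).
print(round(rank_probability_score([0.71, 0.29], 0), 2))  # 0.08
# WSH vs. ATL, NIP forecast 0.76 for WSH, but ATL led (index 1).
print(round(rank_probability_score([0.76, 0.24], 1), 2))  # 0.58
```

With only two outcomes the score reduces to the squared error of the forecast, and both values agree with the corresponding RPS entries tabulated for 15 September 2018.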
3.3. Expected Value (EV)
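A minimal sketch of the expected value of a bet at decimal odds: a unit stake returns a profit of (odds − 1) with probability p and loses the stake with probability (1 − p), so the expected profit is p × odds − 1. The function name is hypothetical, and pairing the probabilities with the O1 and O2 columns is an assumption.

```python
def expected_value(prob, decimal_odds, stake=1.0):
    """Expected profit of a bet that wins with probability `prob`."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    return stake * (prob * decimal_odds - 1.0)


# NYY vs. TOR: NIP = 0.83 for NYY at O1 = 1.33.
print(round(expected_value(0.83, 1.33), 2))  # 0.1
# WSH vs. ATL: NIP = 0.76 for WSH, so 0.24 for ATL at O2 = 2.35.
print(round(expected_value(1 - 0.76, 2.35), 2))  # -0.44
```

Both results agree with the tabulated EV values for those matches, which appear to be evaluated for the team named in the Lead column; that reading is an inference from the numbers, not something stated in the tables.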
4. Results
4.1. Comparison of the Prediction Probability and the Real Outcomes
4.2. Comparison of the Expected Value
5. Discussion and Application
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Date | Matches | | BN | NIP | NIP–NBD | Results |
---|---|---|---|---|---|---|
15 September 2018 | NYY | TOR | 0.71 | 0.83 | 0.84 | 1 |
15 September 2018 | WSH | ATL | 0.59 | 0.76 | 0.78 | 0 |
15 September 2018 | MIA | PHI | 0.39 | 0.21 | 0.20 | 0 |
15 September 2018 | SF | COL | 0.45 | 0.27 | 0.23 | 1 |
15 September 2018 | DET | CLE | 0.37 | 0.53 | 0.53 | 1 |
15 September 2018 | MIN | KC | 0.57 | 0.50 | 0.43 | 1 |
16 September 2018 | MIN | KC | 0.47 | 0.11 | 0.10 | 0 |
16 September 2018 | CIN | CHC | 0.40 | 0.74 | 0.93 | - |
16 September 2018 | ARI | HOU | 0.36 | 0.15 | 0.12 | 0 |
16 September 2018 | COL | SF | 0.56 | 0.36 | 0.35 | 0 |
18 September 2018 | ATL | STL | 0.54 | 0.13 | 0.11 | 0 |
18 September 2018 | ARI | CHC | 0.58 | 0.53 | - | - |
19 September 2018 | BOS | NYY | 0.42 | 0.43 | 0.41 | 1 |
19 September 2018 | NYM | PHI | 0.36 | 0.59 | 0.59 | 1 |
19 September 2018 | CWS | CLE | 0.31 | 0.41 | 0.34 | 0 |
19 September 2018 | CIN | MIL | 0.36 | 0.72 | 0.74 | 1 |
20 September 2018 | CIN | MIL | 0.39 | 0.12 | 0.11 | 0 |
20 September 2018 | NYM | PHI | 0.54 | 0.36 | 0.35 | 0 |
21 September 2018 | CWS | CLE | 0.36 | 0.60 | 0.61 | 1 |
21 September 2018 | BOS | NYY | 0.40 | 0.36 | 0.36 | 0 |
21 September 2018 | OAK | ANA | 0.54 | 0.88 | 0.93 | 1 |
22 September 2018 | COL | ARI | 0.47 | 0.26 | 0.22 | 0 |
22 September 2018 | HOU | ANA | 0.68 | 0.53 | 0.53 | 1 |
22 September 2018 | SD | LAD | 0.29 | 0.95 | 0.99 | 1 |
22 September 2018 | NYY | BAL | 0.73 | 0.80 | 0.82 | 1 |
22 September 2018 | PIT | MIL | 0.43 | 0.65 | 0.75 | 1 |
23 September 2018 | COL | ARI | 0.38 | 0.84 | 0.86 | 1 |
23 September 2018 | ATL | PHI | 0.58 | 0.66 | 0.67 | 1 |
23 September 2018 | HOU | ANA | 0.71 | 0.64 | 0.67 | 1 |
23 September 2018 | SEA | TEX | 0.50 | 0.87 | 0.89 | 1 |
24 September 2018 | COL | ARI | 0.46 | 0.48 | 0.48 | 1 |
24 September 2018 | OAK | MIN | 0.64 | 0.67 | 0.70 | 0 |
24 September 2018 | TOR | TB | 0.38 | 0.15 | - | 0 |
25 September 2018 | ARI | LAD | 0.37 | 0.42 | 0.41 | 1 |
25 September 2018 | BAL | BOS | 0.31 | 0.07 | 0.05 | 0 |
25 September 2018 | CWS | CLE | 0.33 | 0.45 | 0.35 | - |
25 September 2018 | PIT | CHC | 0.41 | 0.06 | 0.03 | 1 |
25 September 2018 | OAK | SEA | 0.41 | 0.24 | 0.23 | 1 |
25 September 2018 | SD | SF | 0.44 | 0.58 | 0.59 | 1 |
25 September 2018 | HOU | TOR | 0.64 | 0.60 | 0.59 | 1 |
26 September 2018 | ANA | TEX | 0.59 | 0.88 | - | 0 |
26 September 2018 | CLE | CWS | 0.65 | 0.71 | 0.72 | 1 |
26 September 2018 | OAK | SEA | 0.50 | 0.87 | 0.90 | 1 |
26 September 2018 | SF | SD | 0.50 | 0.49 | 0.48 | 0 |
27 September 2018 | ARI | LAD | 0.49 | 0.98 | 0.95 | 1 |
27 September 2018 | BOS | BAL | 0.77 | 0.98 | 1.00 | - |
27 September 2018 | CWS | CLE | 0.38 | 0.07 | 0.03 | 0 |
27 September 2018 | CHC | PIT | 0.63 | 0.36 | 0.35 | 1 |
27 September 2018 | COL | PHI | 0.65 | 0.97 | 0.98 | 1 |
27 September 2018 | MIN | DET | 0.57 | 0.54 | 0.54 | 1 |
27 September 2018 | NYM | ATL | 0.62 | 0.59 | 0.58 | - |
27 September 2018 | OAK | SEA | 0.50 | 0.47 | 0.47 | 1 |
27 September 2018 | MIL | STL | 0.51 | 0.61 | 0.63 | 1 |
27 September 2018 | TB | NYY | 0.38 | 0.03 | 0.03 | 1 |
28 September 2018 | CHC | PIT | 0.61 | 0.89 | 0.91 | 1 |
28 September 2018 | NYM | ATL | 0.48 | 0.15 | 0.11 | 1 |
29 September 2018 | NYY | BOS | 0.56 | 0.85 | 0.87 | 1 |
29 September 2018 | PIT | CIN | 0.47 | 0.55 | 0.55 | 1 |
29 September 2018 | KC | CLE | 0.38 | 0.38 | 0.33 | 0 |
29 September 2018 | MIA | NYM | 0.44 | 0.71 | 0.75 | 1 |
29 September 2018 | SD | ARI | 0.41 | 0.10 | 0.06 | - |
29 September 2018 | SEA | TEX | 0.63 | 0.71 | 0.73 | 1 |
29 September 2018 | LAD | SF | 0.63 | 0.63 | 0.65 | 1 |
29 September 2018 | TB | TOR | 0.62 | 0.33 | 0.31 | 1 |
30 September 2018 | COL | WSH | 0.51 | 0.07 | 0.06 | 0 |
30 September 2018 | KC | CLE | 0.30 | 0.31 | 0.30 | - |
30 September 2018 | CWS | MIN | 0.44 | 0.02 | 0.04 | 0 |
30 September 2018 | MIA | NYM | 0.40 | 0.51 | 0.50 | - |
30 September 2018 | PHI | ATL | 0.59 | 0.64 | 0.67 | - |
30 September 2018 | TOR | TB | 0.30 | 0.03 | 0.02 | - |
References
- Jeffrey, M. The House Advantage: Playing the Odds to Win Big in Business, 1st ed.; St. Martin’s Press: New York, NY, USA, 2010.
- Street & Smith’s Sports Business Journal. Available online: https://www.sportsbusinessdaily.com/Journal/Issues/2018/04/16/World-Congress-of-Sports/Research.aspx (accessed on 16 April 2018).
- Cantinotti, M.; Ladouceur, R.; Jacques, C. Sports betting: Can gamblers beat randomness? Psychol. Addict. Behav. 2004, 18, 143–147.
- Garcia, J.; Perez, L.; Rodriguez, P. Football pools sales: How important is a football club in the top divisions? Int. J. Sport Financ. 2008, 3, 167–176.
- Bet365. Available online: https://help.bet365.com/product-help/sports/rules/baseball (accessed on 30 December 2020).
- Shin, H.S. Measuring the incidence of insider trading in a market for state-contingent claims. Econ. J. 1993, 103, 1141–1153.
- Beating the Bookies with Their Own Numbers-and How the Online Sports Betting Market is Rigged. Available online: https://arxiv.org/abs/1710.02824 (accessed on 30 December 2020).
- Shin, H.S. Optimal betting odds against insider traders. Econ. J. 1991, 101, 1179–1185.
- Shin, H.S. Prices of state contingent claims with insider traders, and the favourite-longshot bias. Econ. J. 1992, 102, 426–435.
- Dixon, M.; Coles, S. Modeling association football scores and inefficiencies in the football betting market. Appl. Stat. 1997, 46, 265–280.
- Maher, M.J. Modeling association football scores. Stat. Neerl. 1982, 36, 3.
- Vlastakis, N.; Dotsis, G.; Markellos, R.N. Nonlinear modelling of European football scores using support vector machines. Appl. Econ. 2008, 40, 111–118.
- Constantinou, A.C.; Fenton, N.E. Profiting from arbitrage and odds biases of the European football gambling market. J. Gambl. Bus. Econ. 2013, 7, 41–70.
- Constantinou, A.C.; Fenton, N.E.; Neil, M. Profiting from an inefficient association football gambling market: Prediction, risk and uncertainty using Bayesian networks. Knowl. Based Syst. 2013, 50, 60–86.
- Bukiet, B.; Harold, E.R.; Palacios, J.L. A Markov chain approach to baseball. Operat. Res. 1997, 45, 14–23.
- D’Esopo, D.A.; Lefkowitz, B. The distribution of runs in the game of baseball. In Optimal Strategies in Sports; Ladany, P., Machol, R.E., Eds.; North-Holland Publishing Company: Amsterdam, Holland, 1997.
- Hirotsu, N. Reconsideration of the best batting order in baseball: Is the order to maximize the expected number of runs really the best? J. Quant. Anal. Sports 2011, 7, 1–12.
- Hirotsu, N.; Bickel, J.E. Optimal batting orders in run-limit-rule baseball: A Markov chain approach. IMA J. Manag. Math. 2014, 27, 297–313.
- Hirotsu, N.; Wright, M. Modeling a baseball game to optimize pitcher substitution strategies using dynamic programming. Econ. Manag. Optim. Sports 2004, 62, 131–161.
- Fritz, K.; Bukiet, B. Objective method for determining the most valuable player in major league baseball. Int. J. Perform. Anal. Sport 2010, 10, 152–169.
- Smith, Z.J. A Markov Chain Model for Predicting Major League Baseball. Ph.D. Thesis, University of Texas, Austin, TX, USA, 2016.
- Chang, C.C. Runner advancement model application in CPBL. J. Taiwan Intell. Technol. Appl. Stat. 2017, 15, 31–46.
- Chang, C.C. Using new sabermetrics index to provide CPBL hitter salaries. J. Taiwan Intell. Technol. Appl. Stat. 2018, 16, 19–30.
- Chang, C.C. Improving Taiwanese baseball data analysis goes hand in hand with U.S.A. Sci. Educ. Monthly 2018, 407, 2–18.
- Wunderlich, F.; Memmert, D. Are betting returns a useful measure of accuracy in (sports) forecasting? Int. J. Forecast. 2020, 36, 713–722.
- Epstein, E. A scoring system for probability forecasts of ranked categories. J. Appl. Meteorol. 1969, 8, 985–987.
- STATS.SEANDOLINAR.com. Available online: https://stats.seandolinar.com/mlb-run-distribution-neg-binomial/ (accessed on 30 December 2020).
- MLB. Available online: https://www.mlb.com/starting-lineups/ (accessed on 30 December 2020).
- MLB. Available online: http://mlb.mlb.com/stats/sortable_batter_vs_pitcher.jsp#season=2018 (accessed on 30 December 2020).
- Constantinou, A.C.; Fenton, N.E. Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models. J. Quant. Anal. Sports 2012, 8, 1–14.
- Albert, J. Streaky hitting in baseball. J. Quant. Anal. Sports 2008, 4, 1–32.
- Rump, C. Data clustering for fitting parameters of a Markov chain model of multi-game playoff series. J. Quant. Anal. Sports 2008, 4, 1–19.
- Cserepy, N.; Ostrow, R.; Weems, B. Predicting the Final Score of Major League Baseball Games; Stanford University: Stanford, CA, USA, 2015.
Code Number | (Base State, Number of Outs) | Code Number | (Base State, Number of Outs) | Code Number | (Base State, Number of Outs) |
---|---|---|---|---|---|
1 | (0, 0) | 9 | (0, 1) | 17 | (0, 2) |
2 | (1, 0) | 10 | (1, 1) | 18 | (1, 2) |
3 | (2, 0) | 11 | (2, 1) | 19 | (2, 2) |
4 | (3, 0) | 12 | (3, 1) | 20 | (3, 2) |
5 | (12, 0) | 13 | (12, 1) | 21 | (12, 2) |
6 | (13, 0) | 14 | (13, 1) | 22 | (13, 2) |
7 | (23, 0) | 15 | (23, 1) | 23 | (23, 2) |
8 | (123, 0) | 16 | (123, 1) | 24 | (123, 2) |
| | | | | 25 | (X, 3) |
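The numbering in this table follows a regular pattern: the eight base states repeat for 0, 1, and 2 outs, and code 25 is the absorbing three-out state. A minimal sketch of that mapping, with hypothetical helper names and base states written as strings:

```python
# Base states in the order used by the table: "0" means bases empty,
# "13" means runners on first and third, and so on.
BASE_STATES = ["0", "1", "2", "3", "12", "13", "23", "123"]


def state_code(bases, outs):
    """Map (base state, number of outs) to its code number 1..25."""
    if outs == 3:
        return 25  # absorbing state (X, 3): the half-inning is over
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1, 2, or 3")
    return 8 * outs + BASE_STATES.index(bases) + 1


def decode_state(code):
    """Inverse of state_code: map a code number back to (base state, outs)."""
    if code == 25:
        return ("X", 3)
    if not 1 <= code <= 24:
        raise ValueError("code must lie in 1..25")
    outs, index = divmod(code - 1, 8)
    return (BASE_STATES[index], outs)


print(state_code("13", 2))  # 22
print(decode_state(16))     # ('123', 1)
```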
Hitting Conditions | Outcome |
---|---|
Base on Balls (BB) | Batter safely reaches first base; runners do not advance unless forced, so a run scores only when the bases are loaded. |
One-Base Hit (1B) | Batter safely reaches first base; the runner on first safely reaches second base, and all other runners score. |
Two-Base Hit (2B) | Batter safely reaches second base; the runner on first safely reaches third base, and all other runners score. |
Three-Base Hit (3B) | Batter safely reaches third base, and all runners score. |
Home Run (HR) | Batter scores, and all runners score. |
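The advancement rules in this table can be sketched as a small transition function. The function name and the set-based representation of occupied bases are hypothetical, and the walk row is read here as "runners advance only when forced", which is an interpretation of the table rather than a statement from it.

```python
def advance(bases, event):
    """Apply a batting event to the occupied bases.

    bases: iterable of occupied bases, a subset of {1, 2, 3}.
    Returns (new_bases, runs_scored).
    """
    bases = set(bases)
    on_first = 1 in bases
    behind_first = len(bases - {1})  # runners on second or third

    if event == "BB":
        runs = 0
        if on_first:
            if 2 in bases:
                if 3 in bases:
                    runs = 1  # bases loaded: the runner on third is forced home
                bases.add(3)
            bases.add(2)
        bases.add(1)
        return bases, runs
    if event == "1B":
        return ({1, 2} if on_first else {1}), behind_first
    if event == "2B":
        return ({2, 3} if on_first else {2}), behind_first
    if event == "3B":
        return {3}, len(bases)
    if event == "HR":
        return set(), len(bases) + 1
    raise ValueError(f"unknown event: {event}")


print(advance({1, 3}, "1B"))     # ({1, 2}, 1)
print(advance({1, 2, 3}, "HR"))  # (set(), 4)
```

Outs are left unchanged by all five events, so a full transition only needs to pair the returned base state with the current number of outs.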
Date | Matches | | RPS_BN | RPS_NIP | RPS_NIP–NBD | O1 | O2 | Lead | EV_NIP | EV_NIP–NBD |
---|---|---|---|---|---|---|---|---|---|---|
15 September 2018 | NYY | TOR | 0.08 | 0.03 | 0.02 | 1.33 | 3.3 | 1 | 0.1 | 0.12 |
15 September 2018 | WSH | ATL | 0.35 | 0.58 | 0.6 | 1.6 | 2.35 | 2 | −0.44 | −0.48 |
15 September 2018 | MIA | PHI | 0.15 | 0.04 | 0.04 | 2.45 | 1.57 | 2 | 0.24 | 0.26 |
15 September 2018 | SF | COL | 0.3 | 0.53 | 0.59 | 2.1 | 1.71 | 1 | −0.43 | −0.51 |
15 September 2018 | DET | CLE | 0.4 | 0.22 | 0.22 | 2.6 | 1.52 | 1 | 0.37 | 0.38 |
15 September 2018 | MIN | KC | 0.18 | 0.25 | 0.32 | 1.66 | 2.2 | 1 | −0.17 | −0.28 |
Date | Matches | | RPS_BN | RPS_NIP | RPS_NIP–NBD | O1 | O2 | Lead | EV_NIP | EV_NIP–NBD |
---|---|---|---|---|---|---|---|---|---|---|
15 September 2018 | NYY | TOR | 0.08 | 0.03 | 0.02 | 1.33 | 3.3 | 1 | 0.1 | 0.12 |
19 September 2018 | BOS | NYY | 0.33 | 0.32 | 0.35 | 2.25 | 1.64 | 1 | −0.03 | −0.07 |
21 September 2018 | BOS | NYY | 0.16 | 0.13 | 0.13 | 2.4 | 1.58 | 2 | 0 | 0.02 |
22 September 2018 | NYY | BAL | 0.07 | 0.04 | 0.03 | 1.3 | 3.5 | 1 | 0.04 | 0.07 |
27 September 2018 | TB | NYY | 0.38 | 0.94 | 0.94 | 2.5 | 1.55 | 1 | −0.92 | −0.92 |
29 September 2018 | NYY | BOS | 0.19 | 0.02 | 0.02 | 1.68 | 2.15 | 1 | 0.43 | 0.46 |
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chang, C.-H. Construction of a Predictive Model for MLB Matches. Forecasting 2021, 3, 102-112. https://doi.org/10.3390/forecast3010007