Neural Networks and Betting Strategies for Tennis

Abstract: Recently, the interest of the academic literature in sports statistics has increased enormously. In such a framework, two of the most significant challenges are developing a model able to beat the existing approaches and, within a betting market framework, guaranteeing returns superior to those of the set of competing specifications considered. This contribution attempts to achieve both these results in the context of male tennis. In tennis, several approaches to predicting the winner are available, among which the regression-based, point-based and paired-comparison models of the competitors' abilities play a significant role. Contrary to the existing approaches, this contribution employs artificial neural networks (ANNs) to forecast the probability of winning in tennis matches, starting from all the variables used in a large selection of the previous methods. From an out-of-sample perspective, the implemented ANN model outperforms four out of five competing models, independently of the considered period. From the betting perspective, we propose four different strategies. The resulting returns on investment obtained from the ANN appear to be broader and more robust than those obtained from the best competing model, irrespective of the betting strategy adopted.


Introduction
In sports statistics, two of the most significant challenges are developing a forecasting model able to robustly beat the existing approaches and, within a betting market framework, guaranteeing returns superior to those of the competing specifications. This contribution attempts to achieve both these results in the context of male tennis. So far, the approaches to forecasting the winning player in tennis can be divided into three main categories: the regression-based approach, the point-based procedure, and the comparison of the latent abilities of players. A comprehensive review of these methods, with a related comparison, is provided by Kovalchik (2016). Contributions belonging to the regression approach usually use probit or logit regressions, as in the papers of Lisi and Zanella (2017), Del Corral and Prieto-Rodríguez (2010), Klaassen and Magnus (2003), Clarke and Dyte (2000) and Boulier and Stekler (1999), among others. In point-based models the interest is in predicting the winner of a single point, as in Knottenbelt et al. (2012), Barnett et al. (2006) and Barnett and Clarke (2005). Approaches relying on the comparison of players' abilities have been pioneered by the work of McHale and Morton (2011), who make use of a Bradley-Terry (BT)-type model. All these forecasting models are difficult to compare in terms of global results because of the different periods, matches or evaluation criteria involved. To summarize only a few of the many contributions: the model of Del Corral and Prieto-Rodríguez (2010) achieves a Brier Score (Brier 1950), the equivalent of the Mean Squared Error for binary outcomes, of 17.5% for the Australian Open 2009, while that of Lisi and Zanella (2017) reaches 16.5% for the four Grand Slam Championships played in 2013. Knottenbelt et al. (2012) declare a Return-On-Investment (ROI) of 3.8% for 2713 male matches played in 2011, and the work of McHale and Morton (2011) leads to a ROI of 20% in 2006.
Despite these promising results, building a comprehensive forecasting model in tennis (possibly adaptive to the most recent results) is complicated by the large number of variables that may influence the outcome of a match. A potential solution to this problem is provided by O'Donoghue (2008), who suggests using principal component analysis to reduce the number of key variables influencing tennis matches. Another, unexplored, possibility in this framework is taking advantage of all the variables at hand through machine learning algorithms, mapping all the relationships between these variables and the matches' outcomes. Therefore, this paper introduces supervised, feed-forward Artificial Neural Networks (ANNs) (reviewed in Suykens et al. (1996), Paliwal and Kumar (2009), Haykin (2009), and Zhang et al. (1998), among others), trained by backpropagation, in order to forecast the probability of winning in tennis matches. The ANNs can be seen as flexible regression models with the advantage of efficiently handling a large number of variables (Liestbl et al. 1994). Moreover, when ANNs are used, the non-linear relationships between the output of interest, that is, the probability of winning, and all the potentially influential variables are taken into account. In this respect, our ANN model benefits from a vast number of input variables, some deriving from a selection of the existing approaches and others purposely created to deal with particular aspects of tennis matches not covered by the current literature.
In particular, we consider the following forecasting models, both as sources of input variables and as competing models: The logit regression of Lisi and Zanella (2017), labelled as "LZR"; the logit regression of Klaassen and Magnus (2003), labelled as "KMR"; the probit regression of Del Corral and Prieto-Rodríguez (2010), labelled as "DCPR"; the BT-type model of McHale and Morton (2011), labelled as "BTM"; the point-based approach of Barnett and Clarke (2005), labelled as "BCA". Overall, our input set consists of more than thirty variables for each player.
The contribution of our work is twofold: (i) For the first time, the ANNs are employed in the tennis literature 1 ; (ii) four betting strategies for the favourite and the underdog player are proposed. Through these betting strategies, we test if and by how much the estimated probabilities of winning of the proposed approach are economically profitable with respect to the best approach among those considered. Furthermore, the proposed betting strategies help with the difficult task of deciding which matches to bet on.
The ANNs have recently been employed in many different and heterogeneous frameworks: Wind (Cao et al. 2012) and pollution (Zainuddin and Pauline 2011) data, hydrological time series (Jain and Kumar 2007), tourism data (Atsalakis et al. 2018; Palmer et al. 2006), financial data for pricing European options (Liu et al. 2019) or forecasting exchange rate movements (Allen et al. 2016), betting data to promote responsible gambling (Hassanniakalager and Newall 2019), and many other fields. Moreover, the use of artificial intelligence for predicting issues related to sports, or sports outcomes directly, is increasing, proving that neural network modeling can be a useful approach to handle very complex problems. For instance, Nikolopoulos et al. (2007) investigate the performance of ANNs with respect to traditional regression models in forecasting the audience shares of television programs showing sports matches. Maszczyk et al. (2014) investigate the performance of ANNs with respect to that of regression models in the context of javelin throwers. Silva et al. (2007) study the 200 m individual medley and 400 m front crawl events through ANNs to predict competitive performance in swimming, while Maier et al. (2000) develop a neural network model to predict the distance reached by javelins. The work of Condon et al. (1999) predicts a country's success at the Olympic Games, comparing the performance of the proposed ANN with that obtained from linear regression models. In the context of team sports, Şahin and Erol (2017), Hucaljuk and Rakipović (2011), and McCullagh et al. (2010) for football and Loeffelholz et al. (2009) for basketball attempt to predict the outcomes of matches.
1 To the best of our knowledge, only two other contributions focus on tennis: Sipko and Knottenbelt (2015) and Somboonphokkaphan et al. (2009). Nevertheless, the former contribution is a master's thesis, and the latter is a conference proceeding.
Generally, all the previously cited works employing the ANNs have better performances than the traditional, parametric approaches.
From a statistical viewpoint, the proposed configuration of the ANN outperforms four out of five competing approaches, independently of the period considered. The best competitor is the LZR model, which is at least as good as the proposed ANN. Economically, the ROIs of the ANN and those of the LZR are extensively compared under the four betting strategies proposed. In this respect, the ANN model robustly yields higher, sometimes much higher, ROIs than those of the LZR.
The rest of the article is structured as follows. Section 2 illustrates the notation and some definitions we adopt. Section 3 details the design of the ANN used. Section 4 is devoted to the empirical application. Section 5 presents the betting strategies and the results of their application, while Section 6 concludes.

Notation and Definitions
Let Y i,j = 1 be the event that player i wins match j, defeating the opponent. Naturally, if player i has been defeated in match j, then Y i,j = 0. The aim of this paper is to forecast the probability of winning of player i for match j, that is, p i,j . Since p i,j is a probability, the probability for the other player is obtained as its complement to one. Let us define two types of information sets: Definition 1. The information set including all the information related to match j when the match is over is defined as F j .
Definition 2. The information set including all the information related to match j before the beginning of the match is defined as F j|j−1 .
It is important to underline that when Definition 2 holds, the outcome of the match is unknown. Forecasting p i,j takes advantage of a number of variables influencing the outcome of the match. Let X j = (X 1,i,j , · · · , X N,i,j ) be the N-dimensional vector of variables (potentially) influencing match j, according to F j or F j|j−1 . Note that X j may include information related to each or both players but, for ease of notation, we intentionally drop the suffix i from X j . Moreover, X j can include both quantitative and qualitative variables, like, for instance, the surface of the match or the handedness of the players. In the latter case, the variables under consideration are expressed by dummies.
Definition 3. A generic nth variable X n,i,j ∈ X j is defined as "structural" if and only if the following expression holds, ∀j: X n,i,j |F j = X n,i,j |F j|j−1 .
By Definition 3 we focus on variables whose information is known and invariant both before the match and after its conclusion. Therefore, player characteristics (such as height, weight, ranks, and so forth), and tournament specifications (like the level of the match, the surface, the country where the match is played) are invariant to F j and F j|j−1 . Hence, these types of variables are defined as structural ones. Instead, information concerning end-of-match statistics (such as the number of aces, the number of points won on serve, on return, and so forth) is unknown for the match j if the information set F j|j−1 is used.
Let o i,j and o ii,j be the published odds for the match j and players i and ii, with i ≠ ii, respectively. The odds o i,j and o ii,j denote the amount received by a bettor after an investment of one dollar, for instance, in case of a victory of player i or ii, respectively. Throughout the paper, we define the two players of a match j as favourite and underdog, synthetically denoted as " f " and "u", according to the following definition: Definition 4. A player i in the match j becomes the favourite of that match if a given quantitative variable included in X j is smaller than the value observed for the opponent ii, with i ≠ ii.
Definition 4 is left intentionally general to allow different favourite identifications among the competing models. For instance, one may argue that the favourite is the player i that, for the match j, has the smallest odds, that is, o i,j < o ii,j . Otherwise, one may claim that the favourite is the player with the highest rank. It is important to underline that Definition 4 does not imply that players are unequivocally defined as favourite or underdog: different criteria may select different favourites.

Design of the Artificial Neural Network
The general structure of the ANN used in this work belongs to the Multilayer Perceptron class. In particular, the topology of the ANN model we implement, illustrated in Figure 1, consists of an input, a hidden and an output layer. Nodes belonging to the input layer are a large selection of variables (potentially) influencing the outcomes, while the output layer consists of only one neuron, with a response of zero, representing the event "player i has lost the match", or one, representing the event "player i has won the match". According to Hornik (1991), we use only one single hidden layer. This architecture gives the model good forecasting performance, provided that the number of nodes included in the hidden layer is sufficiently large. More formally, the ANN structure adopted to forecast the probability of winning of player i in the match j, that is p i,j , taking advantage of the set of variables included in X j , operates similarly to a non-linear regression. To simplify the notation, from now on we will always refer to the favourite player (according to Definition 4), such that the suffix identifying player i will disappear, except when otherwise required. A comprehensive set of the notations used in this work is summarized in Table A1. Therefore, let X n,j ∈ X j be the nth variable in the match j, potentially influencing the outcome of that match. All the variables in X j , with j = 1, · · · , J, feed the N-dimensional input nodes. Moreover, let σ(·) be the transfer or activation function, which in this work is the logistic sigmoid. Therefore, the ANN model with N inputs, one hidden layer having N h < N nodes and an output Y j is defined as follows:

Y_j = σ( ∑_{r=1}^{N_h} w_{j,r} · σ( ∑_{n=1}^{N} v_{n,j,r} · X_{n,j} + β_r ) ),   (1)

where w j,r represents the weight of the rth hidden neuron, v n,j,r is the weight associated with the nth variable for the match j, and β r is the rth element of the bias vector.
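The forward pass of Equation (1) can be sketched in a few lines. The paper's implementation uses R with keras; the pure-Python sketch below (with illustrative weight values, not the estimated ones) only shows the mechanics of the single-hidden-layer computation:

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation, as used for both layers in Equation (1)."""
    return 1.0 / (1.0 + math.exp(-x))

def ann_forward(x, v, beta, w):
    """Single-hidden-layer forward pass.
    x: the N input variables for one match
    v[r][n]: input-to-hidden weight for variable n and hidden node r
    beta[r]: bias of hidden node r; w[r]: hidden-to-output weight.
    Returns the estimated probability of winning."""
    hidden = [sigmoid(sum(vr[n] * x[n] for n in range(len(x))) + br)
              for vr, br in zip(v, beta)]
    return sigmoid(sum(wr * h for wr, h in zip(w, hidden)))

# Toy example: 3 inputs, 2 hidden nodes (weights are purely illustrative)
p = ann_forward([0.5, -1.0, 2.0],
                v=[[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]],
                beta=[0.0, 0.1],
                w=[1.0, -1.0])
```

Because the outer activation is a sigmoid, the output is always a valid probability in (0, 1), which is what allows it to be read directly as p i,j .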
The set-up of the ANN model employed here consists of two distinct steps: A learning and a testing phase. The former uses a dataset of J L < J matches, and the latter a dataset of J Te < J L matches, with J L + J Te = J. The learning phase, in turn, encompasses two steps: A training and a validation procedure. In the former, the weights of Equation (1) are found on the training sample, whose dimension is J Tr < J L , by means of machine learning algorithms. These learning algorithms find the optimal weights minimizing the error between the desired output provided in the training sample and the actual output computed by the model. The error is evaluated in terms of the Brier Score [BS]:

BS = (1/J_Tr) ∑_{j=1}^{J_Tr} (Y_j − p̂_j)².   (2)

Unfortunately, the performance of such a model highly depends on: (i) The starting values given to the weights in the first step of the estimation routine; (ii) the pre-fixed number of nodes belonging to the hidden layer. To overcome this drawback, a subset of J V = J L − J Tr matches is used to evaluate the performance of M different models in a (pseudo) out-of-sample perspective. In other words, M different ANN specifications are estimated, varying the number of nodes in the hidden layer and the initial values of the weights, in order to reduce the influence of both. All these M different ANN specifications are evaluated on the validation sample. Once the optimal weights are obtained, that is, the weights minimizing Equation (2) for the J V matches of the learning phase, the resulting specification is used to predict the probability of winning in the testing phase. In such a step, the left-hand side of Equation (1) is replaced by p̂ j . A detailed scheme of the overall procedure is presented in Figure 2.
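The Brier Score of Equation (2) is straightforward to compute; a minimal sketch (the function name is illustrative):

```python
def brier_score(outcomes, probs):
    """Mean squared difference between binary outcomes (0/1) and forecast
    probabilities (Brier 1950): lower is better, 0 is a perfect forecast."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

# Four matches: the favourite won matches 1, 3 and 4
bs = brier_score([1, 0, 1, 1], [0.8, 0.3, 0.6, 0.9])  # 0.075
```

A constant forecast of 0.5 always yields a BS of 0.25, so a model is only informative if it stays below that benchmark.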
The algorithm used to train the neural network is a feedforward procedure with batch training through the Adam optimization algorithm (Kingma and Ba 2014). The Adam algorithm, whose name derives from adaptive moment estimation, is an optimization method belonging to the stochastic gradient descent class. Such an algorithm shows better performance than several competitors (such as AdaGrad and RMSprop), mainly in ANN settings with sparse gradients and non-convex optimization, as highlighted by Ruder (2016), among others. More in detail, the Adam algorithm computes adaptive learning rates for each model parameter and has been shown to be efficient in different situations (for instance, when the dataset is very large or when there are many parameters to estimate).
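The Adam update rule (Kingma and Ba 2014) can be illustrated on a single scalar parameter. This sketch uses the hyperparameter defaults suggested in the original paper (learning rate, β1 = 0.9, β2 = 0.999); the paper's own training runs through keras and may use different settings:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta.
    m, v: exponential moving averages of the gradient and squared gradient;
    t: 1-based step counter used for bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta**2 (gradient = 2*theta) starting from theta = 1.0
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

Note that the effective step size is roughly the learning rate regardless of the raw gradient magnitude, which is what makes Adam robust across parameters with very different scales.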
A typical problem affecting the ANNs is the presence of noise in the training set, which may lead to (fragile) models too specialized on the training data. In the related literature, several procedures have been proposed to mitigate this problem, like the dropout (Srivastava et al. 2014) and early stopping (Caruana et al. 2001) approaches. Dropout is a regularization technique consisting of randomly dropping some neurons from the optimization. As a consequence, when some neurons are randomly ignored during the training phase, the remaining neurons have more influence, giving the model more stability. In the training phase, a 10% dropout regularization rate is used on the hidden layer nodes. Early stopping is a method which helps determine when the learning phase has to be interrupted. In particular, at the end of each epoch, the validation loss (that is, the BS) is computed. If the loss has not improved after two epochs, the learning phase is stopped and the last model is selected.
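The early-stopping rule described above (stop after two epochs without improvement of the validation BS) can be sketched as follows; the paper relies on keras callbacks for this, so the class below is an illustrative pure-Python equivalent:

```python
class EarlyStopping:
    """Stop training when the validation loss (here, the Brier Score)
    has not improved for `patience` consecutive epochs."""
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss        # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
losses = [0.30, 0.25, 0.26, 0.27, 0.20]   # hypothetical validation BS per epoch
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

In this hypothetical run, training halts at epoch 3 (the second consecutive epoch without improvement over the best BS of 0.25), so the later improvement at epoch 4 is never seen; the patience parameter trades off noise tolerance against wasted epochs.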

Empirical Application
The software used in this work is R. The R package for the ANN estimation, validation, and testing procedure is keras (Allaire and Chollet 2020). All the codes are available upon request. Data used in this work come from the merging of two different datasets. The first dataset is the historical archive of the site www.tennis-data.co.uk. This archive includes the results of all the most important tennis tournaments (Masters 250, 500 and 1000, Grand Slams and ATP Finals), closing betting odds of different professional bookmakers, ranks, ATP points, and final scores for each player, year by year. The second set of data comes from the site https://github.com/JeffSackmann/tennis_atp. This second archive reports approximately the same set of matches as the first dataset, but with some information about the players, such as handedness, height, and, more importantly, a large set of statistics in terms of points for each match. For instance, the number of points won on serve, breakpoints saved, aces, and so forth are stored in this dataset. As regards the period under consideration, the first matches were played on 4 July 2005, while the most recent on 28 September 2018. After merging these two datasets, the initial number of matches is 35,042. As said in the Introduction, we aim to configure an ANN including all the variables employed in the existing approaches to estimate the probability of winning in tennis matches. Therefore, after removing the incomplete matches (1493) and the matches with missing values in variables used by at least one approach (among the competing specifications LZR, KMR, DCPR, BTM, and BCA), the final dataset reduces to 26,880 matches. All the variables used by each forecasting model are synthetically described in Table 1, while in the Supplementary Materials each of them is properly defined.
It is worth noting that some variables have been configured with the specific intent of exploring aspects not covered by the other approaches so far. These brand new variables are: X 1 -X 12 , X 17 , X 18 , and X 20 . For instance, X 17 and X 18 take into account the fatigue accumulated by the players, in terms of time spent on court and number of games played in the last matches. Moreover, the current form of the player (X 20 ) may be a good proxy of the outcome of the upcoming match. This variable compares the current rank of the player with the average rank over the last six months. If the current ranking is better than the six-month average, the player is in a good period. The last column of Table 1 reports the test statistics of the non-linearity test of Teräsvirta et al. (1993), which provides a justification for using the ANNs. The null hypothesis of this test is linearity in mean between the dependent variable (that is, the outcome of the match) and the potential variable influencing this outcome (that is, the variables illustrated in the table). The results of this test, on the basis of all the non-dummy variables included in X j , generally suggest rejecting the linearity hypothesis, corroborating the use of the ANNs to capture the detected non-linearity.
In the learning phase (alternatively named the in-sample period), each model uses the variables reported in Table 1 according to the information set F j . In both the learning and testing procedures, with reference to the variables X 19 and X 31 , we only focus on the odds provided by the professional bookmaker Bet365. The reason we use the odds provided by Bet365 is that it presents the largest betting coverage (over 98%) of the whole set of matches. The implied probabilities (X 19 ) coming from the Bet365 odds have been normalized according to the procedure proposed by Shin (1991, 1992, 1993). Other normalization techniques are discussed in Candila and Scognamillo (2018).
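Shin's procedure removes the bookmaker margin by modelling a proportion z of insider traders and solving for the z that makes the corrected probabilities sum to one. The sketch below implements the closed form commonly used in the betting literature for Shin probabilities, with a bisection search on z; it is an illustrative implementation under those assumptions, and the paper's exact code may differ:

```python
import math

def shin_probabilities(odds, tol=1e-10):
    """Convert bookmaker decimal odds to Shin-normalised probabilities.
    pi_i = 1/odds_i carry the bookmaker margin (their sum beta exceeds 1);
    z, the insider proportion, is found by bisection so that the corrected
    probabilities sum to one."""
    pi = [1.0 / o for o in odds]
    beta = sum(pi)                       # the overround, > 1 in practice

    def probs(z):
        return [(math.sqrt(z * z + 4 * (1 - z) * p * p / beta) - z)
                / (2 * (1 - z)) for p in pi]

    lo, hi = 0.0, 0.999999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:        # the sum decreases in z
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)

# Hypothetical two-way market: favourite at 1.40, underdog at 3.10
p_f, p_u = shin_probabilities([1.40, 3.10])
```

Compared with the naive normalization (dividing each inverse odd by the overround), Shin's correction shrinks the margin non-proportionally, penalizing longshots more, which is consistent with the favourite-longshot bias.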
All the models are estimated using six expanding windows, each starting from 2005 and composed of at least 8 years (until 2012). The structure of the resulting procedure is depicted in Figure 3, where the sequence of green circles represents the learning sample and the brown circles the testing (or out-of-sample) periods. Globally, the out-of-sample time span is the period 2013-2018, for a total of 9818 matches. It is worth noting that the final ANN models for each testing year will have a unique set of optimal weights and may also retrieve a different number of nodes belonging to the hidden layer. The ANN models selected by the validation procedure, as illustrated above, have a number of nodes in the hidden layer varying from 5 to 30.
[Table 1 notes: LZR denotes the set of variables of the logit regression of Lisi and Zanella (2017); KMR that of the logit regression of Klaassen and Magnus (2003); DCPR that of the probit regression of Del Corral and Prieto-Rodríguez (2010); BTM stands for the BT-type model of McHale and Morton (2011) and BCA for the Barnett and Clarke (2005) point-based approach. The column "Non-lin. test" reports the Teräsvirta test statistics, whose null hypothesis is "linearity in mean" between Y and each continuous variable. *** and * denote significance at the 1% and 10% levels, respectively.]
The importance of each input variable, for each learning phase, is depicted in Figure 4, which reports the summed product of the connection weights according to the procedure proposed by Olden et al. (2004). Such a method calculates the variable importance as the product of the raw input-hidden and hidden-output connection weights between each input and output neuron, summed across all hidden neurons. Independently of the learning window, the most important variables appear to be X 12 (Head-to-head), X 3 (Won point return frequency), X 7 (Completeness of the player), and X 8 (Advantage on service).
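The Olden et al. (2004) connection-weights calculation reduces to a weighted sum over the hidden layer; a minimal sketch (weights are illustrative):

```python
def olden_importance(v, w):
    """Connection-weights importance (Olden et al. 2004): for each input n,
    sum over hidden nodes r of the product of the input-to-hidden weight
    v[r][n] and the hidden-to-output weight w[r]. The sign indicates the
    direction of the input's effect on the output."""
    n_inputs = len(v[0])
    return [sum(v[r][n] * w[r] for r in range(len(w)))
            for n in range(n_inputs)]

# Two inputs, two hidden nodes (illustrative weights):
# input 0: 0.5*1.0 + 0.1*(-2.0) = 0.3 ; input 1: -0.2*1.0 + 0.4*(-2.0) = -1.0
imp = olden_importance(v=[[0.5, -0.2], [0.1, 0.4]], w=[1.0, -2.0])
```

Unlike permutation-based importance, this method preserves the sign of the relationship, so a negative value reads as "higher values of this input lower the estimated winning probability".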

Out-of-Sample Forecast Evaluation
Once all the models and the ANN have been estimated for each learning period, the out-of-sample estimates are obtained. When we go out-of-sample, the structural variables are invariant, such that no predictions are needed for them. For all the other variables, instead, we need to forecast the values for each match j using the information set F j|j−1 . For some variables, like X 24 , the literature proposes a method to forecast them (Barnett and Clarke 2005). For the other variables, we consider either the historical mean of the last year (or less, if that period is unavailable) or the value of the previous match. Obviously, a model only based on this predicted information, like the BCA, is expected to perform worse than a model only based on structural variables. The evaluation of the performance of the proposed ANN model with respect to each competing model is carried out through the Diebold-Mariano (DM) test, according to the version specifically indicated for binary outcomes suggested by Gneiting and Katzfuss (2014) (their Equation (11)). The DM test statistics with the relative significance are reported in columns two to five of Table 2. The proposed model statistically outperforms all the competing models except the LZR. This holds independently of the specific out-of-sample period. It is interesting to note that the LZR is able to outperform the proposed ANN only in 2013. When the learning sample increases, this significant superiority vanishes. Given that, from a statistical point of view, only the model of Lisi and Zanella (2017) has a performance similar to that of the proposed ANN, only the LZR will be used for evaluation purposes in the following section, dedicated to the betting strategies. Note that the ANN model includes the same set of variables as the LZR, except for variable X 31 (see Table 1).
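A simple form of the DM test on Brier-score differentials can be sketched as follows. This illustrative version assumes no autocorrelation in the loss differential (one-step-ahead forecasts) and uses the plain sample variance; the paper follows the Gneiting and Katzfuss (2014) variant, which may differ in the variance estimator:

```python
import math

def dm_statistic(y, p1, p2):
    """Diebold-Mariano statistic on the Brier-score differentials
    d_j = (y_j - p1_j)^2 - (y_j - p2_j)^2.
    Negative values favour model 1; under the null of equal accuracy the
    statistic is approximately standard normal."""
    d = [(yy - a) ** 2 - (yy - b) ** 2 for yy, a, b in zip(y, p1, p2)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance of d
    return mean / math.sqrt(var / n)

# Hypothetical four matches where model 1 is uniformly sharper than model 2
stat = dm_statistic([1, 0, 1, 0],
                    [0.9, 0.1, 0.8, 0.2],
                    [0.6, 0.4, 0.6, 0.4])
```

With thousands of out-of-sample matches, as here, the normal approximation for the statistic is very accurate.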
Therefore, it is worth performing an additional analysis between the proposed (full) ANN model and a restricted ANN specification based only on the variables of the LZR. We label this model as "LZR-ANN". The resulting evaluation is illustrated in the last column of Table 2, where again the DM test statistics are reported. Interestingly, the full ANN model largely outperforms the restricted LZR-ANN, signalling the importance of including as many input variables as possible in the ANN model.

Betting Strategies
As said before, the variables related to the bookmaker odds used in the learning and testing phases were those provided by Bet365. However, for the same match, different odds provided by other professional bookmakers are available. Therefore, let o i,k,j be the odds provided by bookmaker k for the player i and the match j, with k = 1, · · · , K. Then, the definition of the best odds is provided: Definition 5. The best odds for player i and the match j, denoted by o Best i,j , is the maximum value among the K available odds for the match j. Formally:

o Best i,j = max_{k=1,···,K} o i,k,j .

Let p̂ i,j be the estimated probability obtained from the proposed ANN or the LZR for the player i and the match j, over the out-of-sample matches (the period 2013-2018). Let us focus on the probabilities for the favourites, p̂ f,j . If these probabilities are high, the model (ANN or LZR) gives the favourite a high chance of winning the match. Otherwise, low probabilities p̂ f,j imply that the model signals the other player as the true favourite. From an economic point of view, it is convenient to choose a threshold above which a bet is placed on the favourite, and below which a bet is placed on the underdog. High thresholds imply a smaller number of bets, but with greater chances for the selected player to win the match. These thresholds are based on the sample quantiles of the estimated probabilities, according to the following definitions: Definition 6. Let P f and P u be all the estimated probabilities for a particular (out-of-sample) period, for the favourite and underdog players, respectively. Definition 7. Let q(α) be the sample quantile corresponding to the probability α, that is: Pr(P f ≤ q(α)) = α or Pr(P u ≤ q(α)) = α.
So far, we have only identified the jth match among the total number of matches played. However, for the setting of some betting strategies, it is useful to refer the match j to the day t, through the following definitions: Definition 8. Let j(t) and J(t) be, respectively, the jth and the total number of matches played on day t. Assuming that t = 1, · · · , T, the total number of matches played is J = ∑_{t=1}^{T} J(t).
Definition 9. Let p̂ st f,j(t) , p̂ nd f,j(t) , p̂ st u,j(t) and p̂ nd u,j(t) be the estimated top-2 probabilities (following Definition 5) for the favourite and the underdog (according to Definition 4) among all the J(t) matches played on day t, respectively.
Finally, the best odds, that is, the highest odds on the basis of Definition 5, whose estimated probabilities follow Definition 9, are defined as follows: Definition 10. Let o Best,st i,j(t) and o Best,nd i,j(t) be the best odds for player i associated with the matches played on day t satisfying Definition 9.
The four proposed betting strategies are synthetically denoted by S 1 , S 2 , S 3 and S 4 . The first two strategies produce gains (or losses) coming from single bets. In this case, an outcome for the match j successfully predicted is completely independent of another outcome, also correctly predicted. Therefore, from an economic viewpoint, the gain coming from two successful bets is equal to the sum of the single bets. However, it is also possible to obtain larger gains if two events are jointly considered in a single bet. In this circumstance, the gains are obtained from the multiplication (and not from the summation) of the odds. The strategies S 3 and S 4 take advantage of this aspect. In particular, they allow betting jointly on two matches for each day t, at most. Therefore, S 3 and S 4 may be more remunerative than betting on a single event, but are much riskier, because they yield gains only if both the events forming the single bet are correctly predicted. All the strategies select the matches to bet on according to different sample quantiles depending on α (see Definition 7). For example, let α be close to one. In the case of betting on the favourites, only the probabilities p̂ f,j with very high values will be selected. That is, only the matches in which the victory of the favourite player is considered almost certain will be suggested. The same, but with the opposite meaning, holds for those matches whose estimated probabilities p̂ f,j fall below the sample quantile when α is close to zero. In that case, the model (ANN or a competing specification) signals the favourite player (according to Definition 4) as the likely loser, such that it is more convenient to bet on the adversary.
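The quantile-based selection shared by all four strategies can be sketched as follows; the probability values and the quantile convention (smallest sample value covering a fraction α) are illustrative:

```python
import math

def sample_quantile(values, alpha):
    """Empirical quantile q(alpha): smallest sample value v such that at
    least a fraction alpha of the sample is <= v."""
    s = sorted(values)
    k = max(0, math.ceil(alpha * len(s)) - 1)
    return s[k]

def select_bets(p_fav, alpha):
    """Bet on the favourite when p_hat_f >= q(alpha), on the underdog
    otherwise: the selection rule shared by strategies S1-S4."""
    q = sample_quantile(p_fav, alpha)
    return ["favourite" if p >= q else "underdog" for p in p_fav]

# Five hypothetical favourite probabilities; alpha = 0.8 keeps only the
# two most confident favourite calls as favourite bets
picks = select_bets([0.55, 0.60, 0.70, 0.85, 0.90], alpha=0.8)
```

Raising α therefore shrinks the set of favourite bets to the matches the model is most confident about, exactly the trade-off between number of bets and winning chances described above.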

Strategy S 1
The strategy S 1 suggests betting one dollar on the favourite, according to Definition 4, in the match j if and only if p̂ f,j ≥ q(α). The returns of the strategy S 1 , labelled as R S1 f,j , are:

R S1 f,j = o Best f,j − 1, if player f wins the match j; (3a)
R S1 f,j = −1, if player f loses the match j. (3b)

The strategy S 1 suggests betting one dollar on the underdog, according to Definition 4, in the match j if and only if p̂ f,j < q(α). In this case, the returns, labelled as R S1 u,j , are:

R S1 u,j = o Best u,j − 1, if player u wins the match j; (4a)
R S1 u,j = −1, if player u loses the match j. (4b)

Strategy S 2
The strategy S 2 , contrary to S 1 , suggests betting the amount o Best f,j on the favourite (always satisfying Definition 4) in the match j if and only if p̂ f,j ≥ q(α). The returns of S 2 for the favourite, labelled as R S2 f,j , are:

R S2 f,j = (o Best f,j)² − o Best f,j , if player f wins the match j; (5a)
R S2 f,j = −o Best f,j , if player f loses the match j. (5b)

The amount o Best u,j is placed on the underdog satisfying Definition 4 in the match j if and only if p̂ f,j < q(α). The returns of such a strategy, labelled as R S2 u,j , are:

R S2 u,j = (o Best u,j)² − o Best u,j , if player u wins the match j; (6a)
R S2 u,j = −o Best u,j , if player u loses the match j. (6b)

Strategy S 3
For the favourite, on the basis of Definition 4, S 3 suggests betting one dollar on the top-2 matches of day t, both satisfying Definition 9 and the conditions p̂ st f,j(t) ≥ q(α) and p̂ nd f,j(t) ≥ q(α). The returns for the day t, labelled as R S3 f,t , are:

R S3 f,t = o Best,st f,j(t) · o Best,nd f,j(t) − 1, if both players f win; (7a)
R S3 f,t = −1, if at least one player f loses. (7b)

For the underdog selected by Definition 4, S 3 suggests betting one dollar on the top-2 matches of day t, both satisfying Definition 9 and the conditions p̂ st f,j(t) < q(α) and p̂ nd f,j(t) < q(α). The returns for the day t, labelled as R S3 u,t , are:

R S3 u,t = o Best,st u,j(t) · o Best,nd u,j(t) − 1, if both players u win; (8a)
R S3 u,t = −1, if at least one player u loses. (8b)

Strategy S 4
The strategy S 4 for the favourite suggests betting the amount o Best,st f,j(t) · o Best,nd f,j(t) on the top-2 matches of day t, both satisfying Definition 9 and the conditions p̂ st f,j(t) ≥ q(α) and p̂ nd f,j(t) ≥ q(α). The returns for the day t, labelled as R S4 f,t , are:

R S4 f,t = (o Best,st f,j(t) · o Best,nd f,j(t))² − o Best,st f,j(t) · o Best,nd f,j(t) , if both players f win; (9a)
R S4 f,t = −o Best,st f,j(t) · o Best,nd f,j(t) , if at least one player f loses. (9b)
Finally, the strategy S 4 for the underdog suggests betting the amount o Best,st u,j(t) · o Best,nd u,j(t) on the top-2 matches of day t, both satisfying Definition 9 and the conditions p̂ st f,j(t) < q(α) and p̂ nd f,j(t) < q(α). The returns for the day t, labelled as R S4 u,t , are:

R S4 u,t = (o Best,st u,j(t) · o Best,nd u,j(t))² − o Best,st u,j(t) · o Best,nd u,j(t) , if both players u win; (10a)
R S4 u,t = −o Best,st u,j(t) · o Best,nd u,j(t) , if at least one player u loses. (10b)
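The four payoff rules reduce to two functions: a single bet (S 1 with a one-dollar stake, S 2 with a stake equal to the best odds) and a joint, accumulator-style bet on two matches (S 3 with a one-dollar stake, S 4 with a stake equal to the product of the two best odds). A sketch with hypothetical odds:

```python
def single_bet_return(odds, stake, won):
    """Return of a single bet: stake*(odds - 1) on a win, -stake on a loss.
    Covers S1 (stake = 1) and S2 (stake = odds)."""
    return stake * (odds - 1) if won else -stake

def double_bet_return(odds1, odds2, stake, won1, won2):
    """Return of a joint bet on two matches: the odds multiply, and both
    predictions must be right. Covers S3 (stake = 1) and S4
    (stake = odds1 * odds2)."""
    return stake * (odds1 * odds2 - 1) if (won1 and won2) else -stake

# Hypothetical best odds of 1.5 and 2.0
r_s1 = single_bet_return(1.5, 1.0, True)              # win:  +0.5
r_s2 = single_bet_return(1.5, 1.5, False)             # loss: -1.5
r_s3 = double_bet_return(1.5, 2.0, 1.0, True, True)   # win:  +2.0
r_s4 = double_bet_return(1.5, 2.0, 3.0, True, False)  # loss: -3.0
```

The comparison between r_s1 and r_s3 makes the risk-return trade-off concrete: the joint bet pays 2.0 instead of 0.5 for the same one-dollar stake, but a single wrong prediction wipes it out.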

Return-On-Investment (ROI)
The ROI obtained from a given strategy is calculated as the ratio between the sum of the returns previously defined and the amount of money invested. Let B^{S_l}_i be the total number of bets placed on player i, with i = {f, u}, according to the strategy S_l, with l = {1, 2, 3, 4}. Moreover, let EXP^{S_l}_{i,j} be the amount of money invested in betting on player i, according to strategy S_l, in the match j. In other words, EXP^{S_l}_{i,j} is nothing other than the quantity staked, reported in Equations (3b), (5b), (7b) and (9b) for the bets on the favourite and in Equations (4b), (6b), (8b) and (10b) for the bets on the underdog. The ROI for the player i and the strategy S_l, expressed in percentage and labelled as ROI^{S_l}_i, is:

    ROI^{S_l}_i = 100 · ( Σ_j R^{S_l}_{i,j} ) / ( Σ_j EXP^{S_l}_{i,j} ).  (11)

Note that when the strategies S3 and S4 are employed, the calculation of the ROI through Equation (11) requires the substitution of j with t. The ROI can be positive or negative, depending on the numerator of Equation (11). In the first case, the strategy under investigation has yielded positive returns; otherwise, the strategy has led to some losses.
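The ROI computation reduces to one line of code; as a sketch (the function name is ours), given the per-match net returns and the corresponding stakes:

```python
# Illustrative ROI helper: percentage ratio of total net returns to total stakes,
# following the ROI definition used for the four betting strategies.

def roi_percent(returns, stakes):
    """ROI in percent: 100 * sum of net returns / sum of money invested."""
    return 100.0 * sum(returns) / sum(stakes)
```

For instance, three one-dollar bets with net returns 0.5, −1.0 and 2.0 invest 3 dollars, net 1.5 dollars, and hence yield a ROI of 50%.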
The ROIs for all the strategies, according to different values of the probability α used to calculate the quantile q(α), for the ANN and LZR methods and for the favourite and underdog players, are reported in Panels A of Tables 3 and 4, respectively. The same tables report the ROIs for a number of subsets of matches, for robustness purposes (Panels B to D). Several points can be underlined looking at Panels A. First, note that when α = 0.5, the quantile corresponds to the median. In this case, all the 9818 matches forming the out-of-sample datasets (from 2013 to 2018) are equally divided into bets on the favourites and bets on the underdogs. Obviously, betting on all the matches is never a good idea: Independently of the strategy adopted, the ROIs coming from betting on the favourites are always negative. However, those obtained from the matches selected by the ANN model are less negative than those obtained from the LZR. For instance, according to strategy S4, the losses for the ANN model equal 10.01% of the invested capital, while those for the LZR equal 12.84%. The occurrence of negative returns in betting markets has been largely investigated in the literature. In terms of expected values, returns are positive or negative according to the type of bettor. As pointed out by Coleman (2004), there exists a group of skilled or informed bettors who, placing low-odds bets, have positive or near-zero expected returns. Another group of bettors, defined as risk lovers, being less skilled or informed than the first group, place bets mainly at longer odds; this group has negative expected returns. Therefore, negative returns too are perfectly consistent with betting markets. Second, the number of matches largely decreases when the probability α increases (for Table 3) or decreases (for Table 4).
When the bets are placed only on the matches with a well-defined favourite (that is, p̂_{f,j} ≥ q(α), with α = 0.95), the ROIs become positive. Considering the last two columns of Panel A of Table 3, independently of the strategy adopted, the ROIs of the ANN model are always positive and larger than those of the LZR method. Betting on the underdogs when the estimated probability for the favourite is below the smallest quantile considered (α = 0.05) again yields positive ROIs, but this time only for the ANN (last two columns of Panel A of Table 4). Summarizing the results in the top panels of Tables 3 and 4, it clearly appears that, among the sixteen considered sets of bets, the ANN model outperforms the competing model fifteen times for bets on the favourites (only under S3 and α = 0.75 are the losses greater for the ANN model) and fourteen times for bets on the underdogs (under S4 and α = 0.5 and α = 0.25 the LZR specification reports positive ROIs while the ANN model reports negative results). Given these prominent achievements, the robustness of the ROIs obtained using the probability of winning estimated by the implemented ANN is extensively evaluated. This issue is addressed by restricting the set of matches to bet on according to three different criteria, shown in Panels B to D of Tables 3 and 4. More in detail, for bets on the favourites and underdogs, only the matches played in Grand Slams and in Grand Slams and Masters 1000 are, respectively, selected (Panels B). Panels C depict the ROIs for the bets on matches with a top-50 player as favourite (Table 3) or as underdog (Table 4). Finally, Panels D show the ROIs for the matches played only in the first six months of each year. Again, the ROIs of the proposed model are generally better, sometimes much better, than the corresponding ROIs of the LZR.
For instance, betting on the underdogs with the sample quantile chosen for α = 0.10 and the strategy S4 yields a ROI of 79.51% for the ANN, against losses of 1.52% of the invested capital for the LZR.
To conclude, even if from a statistical point of view the out-of-sample performance of the ANN is not different from that of the LZR, from an economic viewpoint the results of the proposed model are very encouraging. Irrespective of the betting strategy adopted, of the quantile used to select the matches to bet on, and of the subset of matches considered (all the out-of-sample matches or smaller sets based on three alternative criteria), the ROIs guaranteed by the ANN are almost always better than those obtained from the LZR method.

Conclusions
The ultimate goals of a forecasting model aiming at predicting the probability of winning of a team/player before the beginning of the match are robustly beating the existing approaches from a statistical point of view and guaranteeing superior returns from an economic one. This contribution attempted to pursue both these objectives, in the context of male tennis. So far, three main categories of forecasting models are available within this context: Regression-based, point-based, and paired-based approaches. Each method differs from the others not only in the statistical methodology applied (for instance, probit or logit for the regression-based approach, BT-type models for the paired-based specification, and so forth) but also in the variables considered as influencing the outcome of the match. This latter aspect highlights that several variables can potentially influence the result of the matches, and taking all of them into consideration may be problematic, mainly in parametric models such as standard regressions. In this framework, we investigate the possibility of using ANNs with a variety of input variables. More in detail, the design of the proposed ANN consists of a feed-forward network trained by backpropagation, with the Adam algorithm incorporated in the optimization procedure. As regards the input nodes, some variables come from a selection of the existing approaches aimed at forecasting the winner of the match, while other variables are intentionally created to take into account some aspects not covered by the other approaches. For instance, variables handling the players' fatigue, both in terms of minutes and in terms of games played, are constructed. Five models belonging to the previously cited approaches constitute the competing specifications.
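As a purely illustrative sketch of the training scheme just recalled (a feed-forward network trained by backpropagation with the Adam update rule), the following toy implementation uses one hidden layer; layer size, learning rate, and data are our assumptions and do not reproduce the paper's configuration or input variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_ann(X, y, hidden=8, lr=0.01, epochs=500):
    """Fit a one-hidden-layer network for a win probability via Adam (toy sketch)."""
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    m = [np.zeros_like(p) for p in params]   # Adam first-moment estimates
    v = [np.zeros_like(p) for p in params]   # Adam second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, epochs + 1):
        # forward pass: tanh hidden layer, sigmoid output
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        # backpropagation of the binary cross-entropy gradient
        dz2 = (p - y[:, None]) / n
        dh = (dz2 @ W2.T) * (1 - h ** 2)
        grads = [X.T @ dh, dh.sum(0), h.T @ dz2, dz2.sum(0)]
        # Adam update with bias correction
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g ** 2
            mhat = m[i] / (1 - beta1 ** t)
            vhat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * mhat / (np.sqrt(vhat) + eps)
    def predict(Xn):
        hn = np.tanh(Xn @ W1 + b1)
        return (1.0 / (1.0 + np.exp(-(hn @ W2 + b2)))).ravel()
    return predict
```

The paper's actual network differs in its inputs (the regression-based, point-based and fatigue variables described above) and architecture; this sketch only illustrates the feed-forward/backpropagation/Adam mechanics.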
From a statistical point of view, the proposed configuration of the ANN beats four out of five competing approaches, while it is at least as good as the regression model labelled as LZR, proposed by Lisi and Zanella (2017). In order to provide help in the complicated issue of choosing the matches on which to place a bet, four betting strategies are proposed. The matches are selected when the estimated probability exceeds (or falls below) given sample quantiles: Higher quantiles imply fewer matches on which to place a bet, and vice versa. Moreover, two of these strategies suggest jointly betting on the two matches played in a day whose estimated probabilities are the most confident, among all the available probabilities of that day, about the two selected players. These two strategies can assure higher returns at the price of being riskier, because they depend on the outcomes of two matches. In comparing the ROIs of the ANN with those of the LZR obtained from the four betting strategies proposed, it emerges that the implemented ANN model guarantees higher, sometimes much higher, net returns than those of the LZR. These superior ROIs are achieved irrespective of the choice of the player to bet on (favourite or underdog) and of the subset of matches selected according to three different criteria.
Further research may extend the ANN tools to other sports. Moreover, the four proposed betting strategies could be applied to other disciplines, taking advantage of market inefficiencies (see, for instance, the contribution of Cortis et al. (2013) on inefficiency in the soccer betting market).