Analyse Success Model of Split Time and Cut-Off Point Values of Physical Demands to Keep Category in Semi-Professional Football Players

The aim of this study was to analyse different success models, using split times and cut-off point values of physical demands, to keep category in semi-professional football players. An ad hoc observational controlled study was carried out with ten regular outfield players (840 match data records; 25.2 ± 6.3 years, 1.79 ± 0.75 m, 74.9 ± 5.8 kg and 16.5 ± 6 years of football experience) monitored using 15 Hz GPS devices. During 14 official matches of the Spanish Second Division B in the 2016/2017 season, match data were coded according to the situational variable (score) and classified by match result (winning, losing or drawing). The results show significant differences among match outcomes in high-intensity attributes when split time was considered: in velocity zones at 0–15 min (p = 0.043, ηp² = 0.065, medium), 30–45 min (p = 0.010, ηp² = 0.094, medium) and 60–75 min (p = 0.015, ηp² = 0.086, medium), and in sprints at 60–75 min (p = 0.042, ηp² = 0.066, medium) and 75–90 min (p = 0.002, ηp² = 0.129, medium). Decision tree induction was applied to reduce the disparity range of the data according to six 15-min intervals and to determine the cut-off point values for every parameter combination. Multivariate models could be established for the main high-intensity action criteria, allowing all rules and their attributes to be derived and enabling the detection and visualisation of relationships and patterns among the variables that determine success.


Introduction
Appl. Sci. 2020, 10, 5299 2 of 13
Football is characterised by brief linear and non-linear high-intensity efforts interspersed with recovery periods of variable length (short or long) [1], and performance depends on technical, tactical, biomechanical, psychological and physiological characteristics [2]. During a professional match, a football player covers a total distance of 9-14 km [3]. Most of this distance is covered at low intensity (<14 km/h), while only 7-12% is covered at high intensity (14-21 km/h) and 1-4% in very high-intensity running (21.1-24 km/h) and sprinting (over 24 km/h) [4]. In this context, the movement patterns of football players are intermittent in nature [5], with high-intensity actions involving accelerations and decelerations being critical points [6], and a work:rest ratio of 1:8 [7].
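As an illustration of how the intensity thresholds quoted above could be applied to a GPS speed trace, the following Python sketch assigns each speed sample to a zone and accumulates the distance covered in each zone. The function names, the zone labels and the choice to assign the 21.0-21.1 km/h gap to the very high-intensity zone are assumptions made for illustration; they are not part of the cited classification [4].

```python
from collections import defaultdict


def intensity_zone(speed_kmh: float) -> str:
    """Classify one speed sample using the thresholds quoted in the text.

    Assumption: speeds between 21.0 and 21.1 km/h (a gap in the quoted
    ranges) are treated as very high intensity.
    """
    if speed_kmh < 14.0:
        return "low"
    if speed_kmh <= 21.0:
        return "high"
    if speed_kmh <= 24.0:
        return "very_high"
    return "sprint"  # over 24 km/h


def distance_by_zone(speeds_kmh, hz=15):
    """Approximate distance (m) per zone from a fixed-rate speed trace.

    Each sample contributes speed (converted from km/h to m/s) multiplied
    by the sampling interval 1/hz seconds.
    """
    dt = 1.0 / hz
    totals = defaultdict(float)
    for v in speeds_kmh:
        totals[intensity_zone(v)] += (v / 3.6) * dt
    return dict(totals)
```

For example, one second of samples at 36 km/h (10 m/s) recorded at 15 Hz adds approximately 10 m to the sprint zone.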
Performance is also influenced by multiple situational variables, such as the competitive level [8], the opponent's level [9], the tactical system [10], the pressure to achieve the objective [11] and the conditions, quantity and intensity of the efforts demanded during competition [12]. Recently, several studies have relied on these physical demands to propose metrics that quantify specific aspects of football performance [13][14][15]. This performance and its connection with competitive success (i.e., winning a match) have been the focus of several studies [12,15,16]. A positive relationship between greater physical demand and match success has been reported, e.g., in the English Premier League [1], the Spanish La Liga [12] and elite women's football [17].
Training in elite and semi-professional football generally includes the tracking of workloads, movement patterns and performance by GPS or other tracking systems [18]. This produces a huge amount of data about each player, team, match and season. In this context, data mining techniques are employed to assist decision-making [19]. Sports teams can gain an advantage over their rivals by converting data into applied knowledge through appropriate data extraction and interpretation [20].
Moreover, although various unidentified factors may influence the results, data mining can still be valuable for predicting them [21]. To date, the use of data mining techniques to predict football success has focused on decision tree classifiers [22]. This technique generates outcomes by repeatedly partitioning the training data [23]. An important feature of this machine learning model is its ability to determine the best threshold, or cut-off point, values that separate the classes in the training data [24]. These models may be critical when comparing the load-related information of football drills used by elite and semi-professional players with competitive match demands [25].
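To make the idea of a cut-off point concrete, the sketch below searches for the single threshold on one numeric attribute that maximises information gain (entropy reduction) over the class labels, which is the basic step a decision tree repeats at every node. It is a minimal illustration assuming a binary split (value <= threshold vs. value > threshold) with candidate thresholds at midpoints between consecutive distinct values; it is not the implementation used by any particular software package.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cutoff(values, labels):
    """Return (threshold, information_gain) for the best split value <= t.

    Returns (None, 0.0) when no split improves on the parent entropy.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2  # midpoint candidate
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

With toy values [1, 2, 3, 10, 11, 12] labelled D, D, D, S, S, S, the function returns a threshold of 6.5 with a gain of 1 bit, since that split separates the two classes perfectly.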
However, no study has yet investigated the influence of physical performance variables on match results using the cut-off point method. Therefore, the aim of this study was to analyse the influence of different physical performance variables on match results, using the cut-off point method, in a semi-professional football team fighting to keep its category.
This study differs from previous work and contributes to the literature both in the field of machine learning applications and in the field of performance analysis. According to an updated review published in 2019, only one of the studies identified included physical variables for discriminating results and establishing cut-off points, and it was applied to basketball [26]. That review also found decision trees to be one of the two most productive machine learning methods [26]. Previous machine learning studies in soccer used as discriminating variables whether star players participated [27], technical variables such as passes or shots [28][29][30], or previous results in the season [31]. A more recent study is the only one found that used physical variables to establish cut-off points, but it was applied to simulated games [32]. There is no clear evidence on the use of decision trees and cut-off points with physical variables in real competitive situations, or in specific situations such as the fight to keep category. This study therefore provides a relevant advance in knowledge by extending the use of decision trees to physical variables and showing how they can be applied to specific competitive situations.

Sample
Ten male regular outfield players (25.2 ± 6.3 years, 1.79 ± 0.75 m, 74.9 ± 5.8 kg and 16.5 ± 6 years of football experience; goalkeepers excluded) participated voluntarily in 14 competitive matches of the Spanish Second Division B league, played while the team still had options to keep its category. Only players who played regularly and in most official league matches were considered (i.e., the inclusion criterion was more than 65 min of playing time in each match) [33]. All players and coaches were informed of the study protocol, and signed informed consent was obtained from the participants before the study. The study was conducted according to the requirements of the Declaration of Helsinki (2013), was approved by the Clinical Research Ethical Committee of the European University of Madrid (CIPI48/2019) and received formal approval from the professional football club involved.

Procedures
The final sample consisted of 840 match data records. All variables were recorded with a 15 Hz GPS system, a 100 Hz, 10 G accelerometer and a 50 Hz magnetometer (GPSport, Canberra, Australia) and divided into six 15-min split times (i.e., 0-15, 15-30, 30-45, 45-60, 60-75 and 75-90 min). In the sampled matches (7 home and 7 away), the team scored a total of 11 goals and conceded 18 (i.e., 2 wins, 5 draws and 7 losses). Only 14 of the 38 matches of the season were analysed: matches before the 22nd match were excluded, as were those after the 35th match, when staying in the same division had become mathematically impossible. After performance tagging of each match, all GPS data were downloaded to a PC and analysed using the GPS software package TeamAMS v4.0 (GPSport, Canberra, Australia).
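The segmentation of each match into six 15-min split times, and the resulting sample size, can be sketched as follows. The function name `split_index` and the rule that stoppage time is folded into the last split of each half are assumptions made for illustration; the text does not state how added time was handled. The final lines only restate the arithmetic 10 players × 14 matches × 6 splits, which is consistent with the reported 840 match data records.

```python
SPLIT_LABELS = ["0-15", "15-30", "30-45", "45-60", "60-75", "75-90"]


def split_index(half: int, seconds_into_half: float) -> int:
    """Map a timestamp to one of six 15-min splits (1..6).

    Assumption (not stated in the text): stoppage time is folded into the
    last split of each half, so second-half data never spill into split 7.
    """
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    within = min(int(seconds_into_half // 900), 2)  # 900 s = 15 min
    return (half - 1) * 3 + within + 1


# Sample-size arithmetic: one record per player, match and split.
players, matches, splits = 10, 14, 6
n_records = players * matches * splits
```

For instance, a sample recorded 1000 s into the second half falls in split 5, i.e. the 60-75 min interval.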
Matches were classified by outcome in order to identify the dimensions of physical performance, each outcome having a membership value in the fuzzy set of success (loss, 0 points = 0; draw, 1 point = 1/3; win, 3 points = 1). The fuzzy set "Success" is defined over the domain from 0 to $3|C|$ possible points for a set of matches $C$, with the membership function
$$\mathrm{success}(C) = \frac{\mathrm{points}(C)}{3\,|C|}$$
This function can be generalised using $W$ (number of matches won), $D$ (number of matches drawn) and $L$ (number of matches lost), so that $|C| = W + D + L$. With $A_W$, $A_D$ and $A_L$ the values assigned to a won, drawn and lost match, respectively, such that $A_L < A_D < A_W$ [36], the membership function of the fuzzy set Success is
$$\mathrm{success}(C) = \frac{A_W W + A_D D + A_L L}{A_W\,|C|}$$
which reduces to the previous expression for $A_W = 3$, $A_D = 1$ and $A_L = 0$. Because success and failure are difficult to define and depend on the subject's perspective [37], and in line with the approaches used in previous research [38], we propose segmenting the level of success, measured with the membership function of the fuzzy set Success, into three labels: Success (S), Defeat (D) and an intermediate label (Unsuccess, U). Finally, the sample was divided according to this situational variable (score), classified by match result (winning, losing or drawing) and divided into the six split times described above.
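A minimal Python sketch of this membership function is given below, using exact fractions. The default point values (win = 3, draw = 1, loss = 0) reproduce the first formula. The helper `success_label` and its `lower`/`upper` thresholds are hypothetical: the boundaries used to segment the continuous value into Success, Unsuccess and Defeat are not given in this section, so they must be supplied by the caller.

```python
from fractions import Fraction


def success(wins, draws, losses, a_w=3, a_d=1, a_l=0):
    """Membership of a set of matches in the fuzzy set Success.

    Implements (a_w*W + a_d*D + a_l*L) / (a_w*|C|) with |C| = W + D + L.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("empty set of matches")
    if not (a_l < a_d < a_w):
        raise ValueError("expected a_l < a_d < a_w")
    return Fraction(a_w * wins + a_d * draws + a_l * losses, a_w * n)


def success_label(membership, lower, upper):
    """Hypothetical three-label segmentation of a membership value.

    'D' (Defeat) below `lower`, 'S' (Success) at or above `upper`,
    'U' (Unsuccess) in between. Thresholds are illustrative inputs.
    """
    if membership < lower:
        return "D"
    if membership >= upper:
        return "S"
    return "U"
```

For the 14 analysed matches (2 wins, 5 draws, 7 losses), `success(2, 5, 7)` gives 11/42 ≈ 0.26, while a single draw gives exactly 1/3.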

Statistical Analysis
To investigate the effect of the cut-off point values that best differentiate the split times on physical demands in competition, data normality was first confirmed with the Shapiro-Wilk test (p > 0.05). Comparisons among success, unsuccess and defeat were performed using a one-way ANOVA with post hoc analysis (p < 0.05). The proportion of total variability attributable to a factor was estimated via the partial eta-squared (ηp²) effect size, interpreted as follows: small (ηp² = 0.01-0.059), medium (ηp² = 0.06-0.14) and large (ηp² > 0.14).
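For reference, in a one-way between-groups design the partial eta-squared equals SS_between / (SS_between + SS_within). The sketch below computes the F statistic and ηp² with the Python standard library and applies the interpretation bands given above; the "negligible" label for values below 0.01 is an assumption, since the text defines no band there. It does not compute p-values; in practice a library routine such as `scipy.stats.f_oneway` could be used for that step.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, partial eta squared) for a one-way between-groups ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_p2 = ss_between / (ss_between + ss_within)
    return f_stat, eta_p2


def interpret_eta_p2(value):
    """Map an effect size onto the bands used in the text."""
    if value < 0.01:
        return "negligible"  # assumption: no band is defined below 0.01
    if value < 0.06:
        return "small"
    if value <= 0.14:
        return "medium"
    return "large"
```

For example, the ηp² of 0.129 reported for sprints at 75-90 min falls in the medium band.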
The decision tree, generated with the ID3 algorithm using a gain ratio criterion, was the tool selected to reduce the disparity within the range of data (i.e., to minimise entropy) [22] and to describe the physical performance of the team's football players [18]. To establish the cut-off point values according to six 15-min intervals, a classifier model was used [39]. The dataset was analysed using RapidMiner Studio v8.1 (RapidMiner, Inc., Boston, MA, USA) to obtain the cut-off point values for the match. Finally, the cut-off points for each split time were determined with three linguistic labels (Figure 1).
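The gain ratio criterion mentioned above normalises information gain by the split information (the entropy of the partition itself), which penalises attributes that split the data into many small branches. The Python sketch below computes it for a nominal attribute and performs one greedy, ID3-style attribute selection step. The helper names and toy data are hypothetical, and RapidMiner's internal implementation may differ in details such as the handling of numeric attributes.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute_values, labels):
    """Information gain of splitting on a nominal attribute / split information."""
    n = len(labels)
    partitions = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        partitions[value].append(label)
    parts = list(partitions.values())
    weights = [len(p) / n for p in parts]
    remainder = sum(w * entropy(p) for w, p in zip(weights, parts))
    info_gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    # A single-branch split carries no information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """One greedy step: the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio([r[a] for r in rows], labels))
```

An attribute that separates S from D perfectly into two branches scores 1.0, whereas an identifier-like attribute with a unique value per record achieves the same information gain but scores only 0.5, because its split information is 2 bits for four records.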

Results
Significant differences were found in total distance covered in splits 0-15, 30-45 and 60-75, with more metres covered in matches that ended in defeat (p < 0.05). In addition, more sprints were performed in defeated matches in splits 60-75 and 75-90 (p < 0.05). The descriptive and inferential statistics for each variable and split time are presented in Table 1.
The players' movement patterns were classified into three models, one for each criterion considered (i.e., velocity zones, number of sprints, and number of accelerations and decelerations). Three decision trees, one for each criterion variable, were generated over the six 15-min intervals (I1 to I6, spanning 0-90 min), identifying the cut-off values in each split time that differentiated wins, draws and losses in competitive matches.
A decision tree was created (Figure 2) categorising work-rate movement patterns into six speed zones, ranging from 0 to >21.0 km·h⁻¹. The attributes were codified hierarchically into 16 root nodes (RNs 1-16), which classify the performance cut-off in each split time into 16 levels. The statistical tendency for differences showed: Success (S1, 2, 3, 6, 8, 9, 12, 13, 14), Unsuccess (U4) and Defeat (D5, 7, 10, 11, 15, 16).
The decision tree for the number of sprints (Figure 3) characterised sprinting performance during match play according to attributes codified hierarchically into 12 RNs (RNs 1-12), which classify the performance cut-off in each split time into 12 levels. The statistical tendency was: Success (S1, 6, 7, 10, 11, 12), Unsuccess (U3) and Defeat (D2, 4, 5, 8, 9).
Finally, the decision tree for the number of accelerations and decelerations (Figure 4) profiled these markers during match play, with attributes codified hierarchically into 10 RNs (RNs 1-10), which classify the performance cut-off in each split time into 10 levels. The statistical tendency for differences showed: Success (S2, 5, 6, 9, 10), Unsuccess (U4, 8) and Defeat (D1, 3, 7).
In addition to the attributes that could be categorised into the three labels (i.e., Success, Unsuccess and Defeat) through their nodes, the fuzzy success set was applied to assess node membership gradually in the real unit interval [0, 1], taking into account that some information, in particular the statistical distributions, is incomplete (Table 2).
Finally, the RNs and attributes enabled the identification of linguistic labels for success patterns in physical demands (speed zones, sprints, and accelerations and decelerations), establishing a hierarchy among the six 15-min split times and the cut-off point values of the variables most decisive in reaching success (Table 3).
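Each root-to-leaf path of a decision tree can be read as an IF-THEN rule whose conditions are cut-off points on split-time attributes. The following sketch represents a small tree with hypothetical attribute names and thresholds (not the values reported in the figures or tables), lists its rules and classifies a record; it only illustrates the mechanics of rule extraction.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # "S" (Success), "U" (Unsuccess) or "D" (Defeat)


@dataclass
class Node:
    attribute: str   # hypothetical split-time feature name
    threshold: float  # cut-off point: go left if value <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def extract_rules(tree, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path."""
    if isinstance(tree, Leaf):
        yield list(conditions), tree.label
        return
    yield from extract_rules(tree.left, conditions + (f"{tree.attribute} <= {tree.threshold}",))
    yield from extract_rules(tree.right, conditions + (f"{tree.attribute} > {tree.threshold}",))


def classify(tree, record):
    """Follow the cut-off points from the root down to a leaf label."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.attribute] <= tree.threshold else tree.right
    return tree.label


# Toy tree with illustrative attribute names and thresholds.
toy_tree = Node("sprints_75_90", 4.0,
                Leaf("D"),
                Node("accelerations_30_45", 10.0, Leaf("S"), Leaf("U")))

for conds, label in extract_rules(toy_tree):
    print("IF " + " AND ".join(conds) + f" THEN {label}")
```

Run as shown, the loop prints one rule per leaf, three in total, the first being `IF sprints_75_90 <= 4.0 THEN D`.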

Discussion
This research identifies the cut-off points of efforts performed during football matches to keep category in semi-professional football players. The main contribution is the proposal to model the success of a set of matches, taking into account the situational variable match result (winning, losing or drawing), using a fuzzy set of success and a discretisation of its continuous value within [0, 1]. Accordingly, the physical demands assessed were coded across three linguistic labels (i.e., success, unsuccess and defeat), establishing the cut-off point values of six velocity zones and of the numbers of accelerations, decelerations and sprints according to six 15-min intervals.
Regarding the speed zones, an increase in Z1 per player, especially during the first split time (0-15 min), is considered the main factor separating defeat/unsuccess from success. It is important to highlight that the first 15 min are the most random part of a match [40], since the level score line and the absence of fatigue equalise the work rate and high-intensity actions of both teams [41]. Nevertheless, this period is key, since scoring a goal during the first split time means scoring first. Scoring first has been shown to be the strongest predictor of success in both the group and knockout stages of elite football tournaments [42] because, when a team is drawing or losing, its attempts on goal are reduced [43], while the winning team opts for less risky play with a well-structured defensive strategy [44]. It is also important to highlight that, in the last split (75-90 min), more actions in Z3 per player than the opponent are required. In this context, the ability to maintain skill proficiency during match play is considered an important factor in overall player performance and match success [45]. A disproportionate number of goals is scored in this period of a match [46], confirming a relationship between match-related fatigue and success [47]. These results suggest that high-intensity actions are less important in this phase of play, since fatigue affects all players, reducing their performance by 20% [48]. Thus, to achieve success, teams may maintain a consistent running pace that avoids unnecessary loss of possession, a key aspect previously noted for the final 15 min of play [49].
Sprints constitute one of the most important activities in football, even though they represent only 1-12% of the total distance covered in a match [50]. Our results suggest the need to perform 8.5 sprints per player to achieve success, regardless of the period of the match. This highlights the importance of conditioning work, since football matches may demand the same type of high-intensity actions at the beginning (0-15 min) as at the end (75-90 min). Such high-intensity actions have been associated with the least successful elite teams [51]. Successful teams may therefore be able to impose their style of play on less successful teams, which may have implications for the reduction in physical performance during the later stages of the game [52]. In this sense, Rampinini et al. [51] highlighted the relationship between the number of sprints completed in the early stages of the game and the decrement in intense efforts in the later stages. The large variability of sprint results reported in the literature should be emphasised [53,54]; it is partly a consequence of methodological differences between studies and may also be directly related to game factors (level of opposition, stage of the competitive season, etc.) [51]. Nevertheless, the evidence points to the need to maintain a homogeneous level of high-intensity actions throughout the match to be successful [52], in agreement with our results.
Regarding acceleration and deceleration actions, the data show the need to carry out a controlled number of accelerations in order to achieve success, especially in the last split of each half. In this regard, Vigh-Larsen et al. [55] found that these high-intensity actions tend to decrease gradually throughout each half, recovering temporarily at the onset of the second half. As previously mentioned, Di Salvo et al. [52] showed that successful teams do not need to perform more high-intensity actions and can control matches through better tactical positioning. Along the same lines, elite youth football players perform more accelerations per game (>130) than top-elite players (~119) [56], owing to the style of play in youth matches and the players' lack of the experience needed to sustain the demands of the game [57].
Finally, this study has some limitations. First, the sample size was small, and different teams and players could yield findings that differ from those of the present research. Second, because only a single team was studied, it was impossible to divide the sample by position to obtain individual positional profiles (i.e., full-back, wing-back, central defender, sweeper, midfielder, winger and centre forward). It would be interesting to extend this study to a professional competition including all 20 teams and their players (~11,500 observations per season). In addition, future studies using small sample sizes should consider Bayesian models, random forests, SVMs, multilayer perceptrons and regression models to examine differences between matches and between positional performances.

Conclusions
This study proposes the creation of different decision trees based on the cut-off points of efforts performed during football matches to keep category in semi-professional football players.
This method (i.e., a machine learning algorithm based on a top-down greedy approach) allowed us to validate our findings, and it can handle both numerical and categorical data. First, we identified the importance of the variables and their cut-off points according to six 15-min intervals. Second, we used a nonparametric method (i.e., one that makes no assumption about the distribution of the data or the structure of the classifier). Finally, each node was built greedily by selecting the split with the largest information gain for the categorical target.
The main practical implication of this study is that coaches can use the ID3 algorithm as an ecological tool to establish cut-off point values of physical performance under official match conditions according to six split times. The classification approach can be used mainly to assess training and competition by establishing the cut-off points of velocity zones, accelerations and decelerations, which divide the predictor space (independent variables) into distinct, non-overlapping regions. In addition, the fuzzy set of success provides criteria for selecting appropriate training drill patterns for optimal physical preparation during training sessions while the team still has the option of keeping its category in semi-professional football. This study also opens up the possibility of applying new statistical techniques to improve decision-making in football, which could also be applied to other team sports.