The Role of Veracity on the Load Monitoring of Professional Soccer Players: A Systematic Review in the Face of the Big Data Era

Big Data has real value when the veracity of the collected data has been previously identified. However, data veracity for load monitoring in professional soccer players has not yet been analyzed. This systematic review aims to evaluate the current evidence from the scientific literature related to data veracity for load monitoring in professional soccer. Systematic searches of the PubMed, Scopus, and Web of Science databases were conducted for reports on the data veracity of diverse load monitoring tools and the associated parameters used in professional soccer. Ninety-four studies were finally included in the review, with 39 different tools used and 578 associated parameters identified. The pooled sample consisted of 2066 footballers (95% male: 24 ± 3 years; 5% female: 24 ± 1 years). Seventy-three percent of these studies did not report veracity metrics for any of the parameters from these tools. Thus, data veracity was found for 54% of tools and 23% of parameters. The current information will assist in the selection of the most appropriate tools and parameters to be used for load monitoring with traditional and Big Data approaches while identifying those still requiring the analysis of their veracity metrics or their improvement to acceptable veracity levels.


Introduction
Load monitoring is used to ensure the best workload that optimizes physical fitness and sports performance while protecting athletes from injury or illness in soccer [1]. In this context, a true individualization of all training factors in team sports is mandatory for a better fitness-fatigue equilibrium through an individualized monitoring approach [2]. Meanwhile, the number of injuries [3] and the physical performance of soccer players continue to increase over the years [4]. For instance, in a 13-year longitudinal analysis, it was found that hamstring injuries have increased annually by 4% in professional soccer players [3]. Factors associated with load monitoring, such as player load, match frequency, playing style, team management, and the continuity of technical staff, have also influenced these trends [3]. On the other hand, it is well accepted that physical performance in soccer matches has continuously improved over the past years. For instance, the total distance covered, high-intensity running and sprinting distances, and number of sprints increased by 2% [4]. However, the VO2max of elite female [5] and male [6] soccer players has not changed between 1989 and 2007 for females and between 1989 and 2012 for males. This paradox of similar aerobic fitness between decades, despite objectively increased match demands, highlights the need to select appropriate monitoring tools. Furthermore, considering that soccer is a complex sport in which players need to develop several physical capacities (e.g., acceleration, agility, endurance), the selection of the best tools to monitor the evolution of players' physical fitness over the season is required to better manage the complex balance between training, competition, recovery, and evaluation [7][8][9][10].
We are living in a time when technology has experienced fast evolution in the sports arena [11]. This technological growth has enabled the 24 h monitoring of athletes on an individual basis [12], thus allowing large datasets, i.e., Big Data, to be gathered in sport settings. Careful analyses of these large datasets can enhance our knowledge in sports science and medicine, thus supporting the decision-making process for designing appropriate training and competitive strategies [13,14]. However, some premises, known as the four Vs of Big Data, should be attained from these datasets for these purposes: (1) volume, (2) variety, (3) velocity, and (4) veracity [15]. Volume refers to the size of the datasets; variety refers to different data formats and data sources; velocity describes the speed at which data are generated and processed for analyses; and veracity refers to the accuracy, quality, relevance, uncertainty, reliability, and predictive value of the collected data [14][15][16][17]. Focusing on veracity to identify the best monitoring tools and parameters in soccer is mandatory to identify the "errors" [18]. Furthermore, veracity is a preliminary step to effectively meet all these premises and, thus, optimize the applicability of Big Data to soccer with the best cost-to-benefit ratio. For instance, according to IBM [15], poor data quality generates an economic cost of around USD 3.1 trillion per year in the USA. If this value were the gross domestic product (GDP) of a country, it would be placed among the top 6 GDPs in the world.
Therefore, this systematic review aims to evaluate data veracity for load monitoring in professional soccer. This information is important for a deep understanding of the tools and parameters used in load monitoring as well as the effective implementation of Big Data analyses in professional clubs.

Materials and Methods
We adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [19].

Sources and Study Selection Process
Three electronic databases (PubMed, Scopus, and Web of Science) were systematically searched from inception to August 2020. The command line ("monitoring" OR "monitor") AND ("training load" OR "load") AND ("soccer" OR "football") was used in the title, abstract, and keywords during the electronic searches. The selection process was completed by two authors (J.G.C. and C.A.C.F.). If some doubt was found during this step, a third author (D.B.) assisted in the decision. The quality appraisal was completed by one author (J.G.C.).

Eligibility Criteria
Initially, the articles reviewed were identified after reading the titles and abstracts based on the following inclusion criteria: (1) The study was written in English; (2) The study was published as original research in a peer-reviewed journal, and a full-text article was available; (3) Data were reported for soccer players; (4) The participants were professional soccer players; (5) Load monitoring parameters were included.
Then, after this first screening, the eligibility criteria according to PECO were applied in the remaining full manuscripts. (P)articipants: healthy, professional soccer players of any age and sex. (E)xposure: exposure to load monitoring in the training session and/or on match day. (C)omparators: control groups without load monitoring are accepted but not mandatory. (O)utcomes: load monitoring tools and parameters.

Data Extraction
Five authors (A.L.M., G.R.C., R.G., F.M.V., and S.D.R.) extracted the following information from the included full-text articles: authors, year, sample information (mean, age, sex, sample size, competitive level), study design, veracity data, and information on load monitoring tools and parameters. The parameters were classified according to the applied method (e.g., laboratory or portable methods; venous or capillary blood samples). Discrepancies were resolved through discussion until consensus was reached. A narrative synthesis of data was performed.
The median was used when a range of a veracity metric was reported in a manuscript. All veracity metrics of the reported parameters were used to determine the median and range (minimum-maximum) for each tool.

Quality Assessment
The quality of all studies was evaluated by one author (J.G.C.) using objective evaluation criteria (see supplementary material, Table S1) based on a previous study by Saw et al. [20]. Scores were assigned based on how well each criterion was met, assuming a maximum possible score of 8 (low risk of bias). Studies with a score of ≤4 were considered poor and were, therefore, subsequently excluded.

Veracity Analysis
Veracity analysis was performed based on measures related to the accuracy, uncertainty, reliability, and quality of data [18,[21][22][23]. All these measures were only obtained when explicitly reported from the authors' own data. For the instruments used, accuracy was checked in the manufacturer's official documentation or website. Accuracy is understood as the closeness of the agreement between the result of a measurement and the true value of the measurand [24]. Uncertainty and reliability were obtained from absolute and relative consistencies using the standard error of measurement (SEM) [25], also known as the typical error of measurement (TEM) or typical error (TE) [26], together with measures derived from the SEM itself, such as the minimum difference (MD) or minimum detectable change (MDC), and the intraclass correlation coefficient (ICC) [25][26][27]. The SEM is a measure of the amount of error variance in a set of obtained scores, different from the standard error of estimate (SEE), which is the standard deviation of true scores when the observed score is held constant [28]. Standard uncertainty is the uncertainty of the result of a measurement expressed as a standard deviation [29]. Precision is understood as the closeness of agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement; precision is also called repeatability [24]. Limits of agreement (LOAs) refer to the reference interval for the test-retest differences expected for 95% of the population [30]. The coefficient of variation (CV) refers to the variation around the average, expressed as a percentage; data quality by means of the CV is interpreted through the level of instability [31].
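As a minimal, illustrative sketch (not taken from any of the included studies), the test-retest metrics defined above can be computed from two trials of the same measurement. The function names are our own; the SEM/TE here is calculated as the standard deviation of the difference scores divided by the square root of two, one common formulation among those cited.

```python
import math


def typical_error(trial1, trial2):
    """SEM/TE as SD of test-retest differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    mean_d = sum(diffs) / len(diffs)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (len(diffs) - 1))
    return sd_d / math.sqrt(2)


def cv_percent(values):
    """Coefficient of variation: SD around the mean, expressed as a percentage."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return 100 * sd / m


def limits_of_agreement(trial1, trial2):
    """Bland-Altman 95% limits of agreement for test-retest differences."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    mean_d = sum(diffs) / len(diffs)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (len(diffs) - 1))
    return mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d
```

Such a sketch makes explicit that the SEM/TE and LOAs are absolute (in measurement units), whereas the CV is a relative, percentage-based metric.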
Some reference values are assumed to better understand the results. SEM values lower than 5% have been suggested [32,33], and the SEM has been classified as good (SEM < 5.0%), moderate (SEM = 5.0-9.9%), or poor (SEM ≥ 10.0%) [34][35][36]. However, it is recommended to present the corresponding value with confidence intervals (CIs) [37,38] because this is a measure of how much the measured test scores are spread around a "true" score [37,38]. Although the SEM may be better than the ICC [25] for evaluating reliability, both are reported in the present manuscript. According to Koo and Li [39], ICC values < 0.50 = poor, between 0.50 and 0.75 = moderate, between 0.75 and 0.90 = good, and > 0.90 = indicative of excellent reliability. For the CV, it has been suggested that CV > 30.0% = large, CV between 10.0% and 30.0% = medium, and CV < 10.0% = small [40]. These reference values were used in the figure to determine the traffic-light systems (i.e., red, amber, or green). For accuracy, MDC, SEE, precision, uncertainty, and LOAs, no reference values were found in the scientific literature.
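The thresholds above can be expressed as a simple classification sketch, which is how the traffic-light system was operationalized conceptually; the function names are illustrative, not part of any cited tool.

```python
def rate_sem(sem_pct):
    """SEM bands [34-36]: good < 5.0%, moderate 5.0-9.9%, poor >= 10.0%."""
    if sem_pct < 5.0:
        return "good"
    if sem_pct < 10.0:
        return "moderate"
    return "poor"


def rate_icc(icc):
    """ICC bands following Koo and Li [39]."""
    if icc < 0.50:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


def rate_cv(cv_pct):
    """CV bands [40]: small < 10%, medium 10-30%, large > 30%."""
    if cv_pct < 10.0:
        return "small"
    if cv_pct <= 30.0:
        return "medium"
    return "large"
```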

Results
The initial search returned 1035 articles (see Figure 1). After the removal of duplicates (n = 461), a total of 574 studies were retained for full-text screening. Following the eligibility assessment, 479 studies were excluded as they did not meet the inclusion criteria, while one full-text manuscript was not found. Finally, 94 studies were included in this systematic review.

Characteristics of the Studies and Risk of Bias
The pooled sample included 2066 participants aged 24 ± 3 years, composed mostly of male athletes (95%). The average duration of the load monitoring interventions was 168 days (range: 1-1034). The athletes were all professional soccer players from first divisions (81%), second divisions (12%), third divisions (1%), or national teams. Thirty-nine different tools were identified, and a total of 578 parameters were used to monitor the load from these 39 tools. Data veracity was reported for 54% of the tools in at least one parameter, which corresponded to 23% of these parameters. Thus, most studies did not report veracity metrics for their tools and parameters (see details in Figure ??, Table 1, and Table S2). Specifically, Table S2 presents a summary of all the selected studies, with further information on study design (and duration), sample level (n; sex; age; country of sample), tools (brand and model or reference), accuracy reported by the company, and parameters as well as veracity analyses (metrics).

Discussion
The purpose of this systematic review is to evaluate data veracity for load monitoring in professional soccer. We describe here the metrics reported for tools and their associated parameters in a sample with 87% of athletes playing at top-level divisions or in their national teams. Thirty-nine different tools were found in the included studies; however, 73% of these studies did not report any veracity metric. Thus, data veracity was found for 54% of tools and 23% of associated parameters. Of note, some veracity metrics showed great variability. The SEM and MDC ratings were between 0.3-89.7% (good to poor) and 48.0-305.0%, respectively. The ICC ratings ranged from poor to excellent reliability (i.e., 0.004-0.98), and the CV ratings from small to large (i.e., 1.1-193.5%). For the remaining metrics, the results were between 2.0-10.0% for bias, 4.0-22.0% for SEE, and 3.0-10.0% for precision. In addition, some tools were used without the required accuracy. All this information provides a precise state of the art regarding the potential of current monitoring tools to be used with Big Data approaches.

The Impact of Accuracy and CV on the Practical Application of Tools and Parameters
The need for reporting accuracy can be better understood through an example reported by the United Nations Industrial Development Organization [24]. In practical terms, if the accuracy of a tool is ±3, then a reading of 100 displayed during a measurement means the actual value could be anywhere between 97 and 103, inclusive. In this regard, it is important to note that some tools did not present the required accuracy for the experimental design performed in each study. For example, the accuracy of an infrared camera is ±2 °C, while the applied protocol in the study expected changes in body temperature between 0.3 and 1.5 °C [98]. Moreover, in EPTS with a sampling rate of 10 Hz, the accuracy for high accelerations (>4 m/s²) is compromised, which may have an impact on the interpretation of results [79] because of the poor estimation of instantaneous velocity when performing these very high accelerations (>4 m/s²) during player tracking in team sports [135]. Therefore, tools with inadequate accuracy for the required interventions need to improve their accuracy; otherwise, they will generate input noise, thus compromising the outputs of a Big Data approach.
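The logic of the infrared-camera example can be sketched as a simple check: an instrument's ± accuracy should be smaller than the smallest change the protocol aims to detect. Both helper functions below are our own illustration, not from the cited standard.

```python
def reading_interval(reading, accuracy):
    """Range of possible true values given an instrument accuracy of +/- accuracy."""
    return reading - accuracy, reading + accuracy


def accuracy_adequate(accuracy, smallest_expected_change):
    """A tool's +/- accuracy must be smaller than the smallest change of interest."""
    return accuracy < smallest_expected_change


# UNIDO-style example: accuracy +/-3 around a reading of 100
low, high = reading_interval(100, 3)          # (97, 103)

# Infrared-camera example from the text: +/-2 degC accuracy vs. 0.3-1.5 degC
# expected changes in body temperature -> the tool is inadequate
ok = accuracy_adequate(2.0, 1.5)              # False
```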
A common concern of sports scientists as well as strength and conditioning coaches of high-level soccer clubs when using monitoring tools refers to reliability, which is one of the main factors related to the discrepancy between the expected and actual effectiveness of monitoring, thus representing a potential barrier to successful interventions [136]. For this purpose, the classic classification of the CV is considered for the decision-making process: >30% = large, between 10% and 30% = medium, and <10% = small [40]. In the present systematic review, extremely large CVs were found, such as 193.5% for sprinting at speeds ≥25.2 km/h measured with EPTS [129]. Moreover, 137% for the time between peak power and peak force during a CMJ [137] and 679% for the variance from the mean bedtime assessed by actigraphy [138] have been reported in the scientific literature. Large CVs make it extremely difficult to detect real differences between moments after an intervention unless these differences are also very large [139]. Therefore, caution should be taken when using parameters with a CV greater than 30%.

ICC and SEM Should Always Be Reported Together
The ICC is probably the most popular reliability metric in the literature. Our analysis showed that the ICC was often included as a reliability metric but with large variability, ranging from poor = 0.004 (i.e., strain from session-RPE) [131] to excellent = 0.98 (i.e., contraction time measured with tensiomyography or jump height of a CMJ) [57,131]. Particularly, in the case of tensiomyography, although there are sufficient data in favor of its good-to-excellent relative reliability in the scientific literature (i.e., ICC = 0.70-0.99) [140], in agreement with our findings (i.e., ICC = 0.80-0.98), more evidence is necessary for identifying the accuracy of this tool, according to a current systematic review with meta-analysis [141], mainly because reliability based only on the ICC cannot be recommended [142]. This is because there is a relationship between the ICC and between-subjects variability, where the heterogeneity of the subjects is a determinant for the ICC obtained [25]. An excellent ICC can mask poor trial-to-trial consistency when between-subjects variability is high. Conversely, a poor ICC can be found even when trial-to-trial variability is low if the between-subjects variability is also low. In this case, the homogeneity of the subjects' means will make it difficult to differentiate between subjects even though the absolute measurement error is small. In other words, if individuals differ little from each other, the ICC may be poor, even if the trial-to-trial variability is small [25]. Thus, an examination of the ICC in conjunction with the SEM is, therefore, recommended [142]. For example, an average HR with ICC = 0.77 (good) and SEM = 3.0% (good), in one case [127], would be preferable to the same parameter with ICC = 0.89 (good) and SEM = 27% (poor) [124].
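The dependence of the ICC on between-subjects variability can be made concrete through the standard relation SEM = SD × √(1 − ICC), rearranged to show the ICC implied by a fixed absolute error at a given sample heterogeneity. This is a generic illustration of the principle in [25], not a calculation from any included study.

```python
def icc_from_sem(sem, between_subjects_sd):
    """ICC implied by a fixed absolute error (SEM) at a given between-subjects SD.

    From SEM = SD * sqrt(1 - ICC), it follows that ICC = 1 - (SEM / SD) ** 2.
    """
    return 1 - (sem / between_subjects_sd) ** 2


# Identical measurement error (SEM = 2.0 units), two hypothetical squads:
icc_heterogeneous = icc_from_sem(2.0, 10.0)  # high between-subjects SD -> 0.96
icc_homogeneous = icc_from_sem(2.0, 3.0)     # low between-subjects SD  -> ~0.56
```

With the same trial-to-trial error, the heterogeneous sample yields an "excellent" ICC while the homogeneous sample yields only a "moderate" one, which is exactly why the ICC should be examined alongside the SEM.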
On the other hand, the SEM is not affected by between-subjects variability, as occurs with the ICC. The SEM has been used to define the boundaries in which a subject's true score lies, and it can be calculated as both an absolute and a relative score [25,26,37,38]. However, the range of variation of this veracity metric for the found parameters was high (i.e., between good = 0.3% for total distance covered, as measured with EPTS, and poor = 89.7% for accelerations >4 m/s², measured with EPTS) [64,73]. This would suggest that, for the same tool, some parameters could be more recommendable than others. In this regard, some researchers are using a limit of 10% of the SEM to determine which parameters can be used in subsequent analyses [35,87,143]. Although its use is strongly recommended by the specialized literature [25,26], some points of confusion can be highlighted. The first is that it is called either standard error of measurement (SEM), typical error of measurement (TEM), or typical error (TE), even with the same form of calculation. The second point, which is the most impactful, is that the SEM expressed as a percentage is commonly referred to as the coefficient of variation (CV), even though it is not calculated by dividing the standard deviation by the mean and multiplying by 100 [144]. These facts can easily drive researchers and practitioners to misinterpretations. Furthermore, the SEM can also be used in the interpretation of individual scores as the minimum difference (MD) or minimum detectable change (MDC) [25,26]. In the present study, the MDC ranged from 48.0% for the distance at maximal sprint speed recorded with EPTS to 305.0% for the very high-speed running time obtained with EPTS [76], relative to the group average. This extremely high variability could hinder the verification of real changes in an individual's performance. Of note, the MDC in professional soccer players is not easily found in the scientific literature.
Apart from the included study in the current systematic review, another study reported an MDC between 1.0% and 30.0% for performance in a new agility test [145]. Therefore, the SEM is highly recommended for determining the reliability of any tool and parameter, while its interpretation can be facilitated when reported in both relative and absolute terms.
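As a brief sketch of the SEM-to-MDC step discussed above, the MDC at 95% confidence is conventionally derived as SEM × 1.96 × √2; the function below is illustrative and assumes that formulation.

```python
import math


def mdc95(sem):
    """Minimum detectable change at 95% confidence: SEM * 1.96 * sqrt(2)."""
    return sem * 1.96 * math.sqrt(2)


# An SEM of 1.0 unit implies that an individual change must exceed
# roughly 2.77 units before it can be considered real (at 95% confidence).
change_threshold = mdc95(1.0)
```

Reporting both the SEM and the MDC it implies, in absolute and relative terms, makes it far easier for practitioners to judge whether an individual player's change is genuine.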

Needs, Limitations, and Potential of Big Data in Professional Soccer
There is limited information regarding the last three metrics of veracity. Bias was reported as absolute values in only two studies [127,129], and, in one study, it was reported as relative values ranging between 2.0-10.0%, with the SEE ranging between 4.0-22.0% for total and sprint distances between 19.8-25.2 and above 25.2 km/h, recorded with EPTS [133]. Precision ranged between 3.0-10.0% for urine metabolomic parameters in a single study [128]. In view of the definition itself, these three metrics are important for appropriately understanding veracity. Therefore, more studies with the use of these metrics in professional soccer athletes are warranted, as well as more studies with professional female soccer players, who represented only 5% of the samples in the included studies. Moreover, 94% of the included articles are observational studies, and among the experimental study designs, none were randomized controlled trials. Despite the difficulty of carrying out experimental studies in professional soccer settings, we suggest that more of these studies are warranted to increase the veracity of monitoring tools and their associated parameters. The approach applied in this systematic review is based on one of the 4 Vs of Big Data. However, the other Vs also have an impact on soccer and should be the focus of future research. In this regard, the volume that describes the magnitude of the data [14], usually measured in terabytes (10¹²), petabytes (10¹⁵), zettabytes (10²¹), and even yottabytes (10²⁴) [15], has increased exponentially, with a 300-fold increase in the last 15 years, from 2005 to 2020, reaching 40 zettabytes (i.e., 37.3 trillion gigabytes) [15]. Therefore, this unprecedented volume of data is overwhelming, thus increasing the risks of it not being fully used to inform practice [13]. For example, a dataset from a Bundesliga season resulted in 400 gigabytes of tracking data [14].
Regarding this, variety refers to the heterogeneity of data, i.e., to the different data formats and sources, which can be differentiated among structured (e.g., relational data), semi-structured (e.g., XML data), and unstructured data (e.g., emails, pictures, videos, or social networking data) [14][15][16][17]. Thus, in soccer, data variety refers to position, video, fitness, training, skill performance, and health data [14]. Moreover, velocity describes the speed at which novel data are generated, processed, and analyzed [14][15][16][17]. Specifically for soccer, velocity may vary from real-time streams of physiological and positional data to stored data for notational analyses during training and competition [14]. All three of these key concepts characterizing Big Data are highly relevant and should be considered alongside the suggestion of Lukoianova and Rubin [23], who have stated that Big Data can only have value when its veracity can be established and, thereby, the information quality confirmed.
To date, Big Data has been reported in the scientific literature on soccer only in relation to tactical analysis [14,146] but not in relation to load monitoring. Although data can be considered big based on the number of Vs, there are no universal benchmarks for the number of Vs [14][15][16][17][18][21][22][23][146][147]. Therefore, the current systematic review establishes the initial basis for the implementation of Big Data approaches for load monitoring in professional soccer while identifying the strengths and limitations of the current evidence before moving on to the real applications of Big Data: data management, which involves the processes and supporting technologies used to acquire, store, prepare, and retrieve data for analysis, and analytics, which refers to the techniques used to analyze and acquire intelligence from Big Data [146,147].
Finally, we would like to make a brief statement and endorse an editorial on the current trend of adopting extreme positions in sports science, which is not recommended [148]. Thus, this review highlights the importance of data veracity analysis for objectively using tools and their associated parameters for load monitoring in professional soccer, considering equally traditional approaches and the forefront of technological advances.

Conclusions
In conclusion, a wide diversity of tools and parameters is used to monitor loads in professional soccer. However, it is not common to find data veracity for these tools and parameters in the scientific literature. The reported veracity metrics will assist in the selection and use of the best monitoring tools and their associated parameters for load monitoring in professional soccer. Before looking for new tools and parameters, the current ones need to present adequate levels of data veracity (accuracy, reliability, and quality of data). Therefore, this information is warranted when aiming to use predictive analytics for structured Big Data in professional soccer.

1. The use of Big Data approaches without appropriate data veracity can undermine the precision of predictive analytics models and generate fatal errors with a high economic cost;
2. Up to now, data veracity is not commonly reported in scientific studies using tools and parameters for load monitoring in professional soccer;
3. It is necessary to analyze and share data veracity more frequently to perform the best data management and analytics when applying Big Data.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app11146479/s1, Table S1: Risk of bias score; Table S2: Summary of the selected studies. Reference [149] is cited in the supplementary materials.

Data Availability Statement: After publication, all data necessary to understand and assess the conclusions of the manuscript are available to any reader of Applied Sciences.