A Systematic Review and Meta-Analysis on the Association and Differences between Aerobic Threshold and Point of Optimal Fat Oxidation

Over the past two decades, scientists have attempted to determine whether the point of maximal fat oxidation (FATmax) and the aerobic threshold (AerT) are connected. Such a relationship would allow a more tailored training approach for athletes and improve the efficacy of individualized exercise prescriptions for numerous health-related issues. However, studies have reported conflicting results, and the issue remains unresolved. This systematic review and meta-analysis aimed (i) to examine the strength of the association between FATmax and AerT using correlation coefficients (r) and standardized mean differences (SMD) as effect sizes (ES); and (ii) to identify potential moderators and their influence on ES variability. This study was registered with PROSPERO (CRD42021239351) and ClinicalTrials.gov (NCT03789045). PubMed and Google Scholar were searched, and fourteen articles, yielding a total of 35 ESs for r and 26 ESs for SMD, were included. The ESs were analyzed using a multilevel random-effects meta-analysis. Our results support the presence of a significant association between FATmax and AerT exercise intensities. In conclusion, given the large ES variance caused by clinical and methodological differences among the studies, we recommend that future studies strictly standardize the data collection and analysis of FATmax- and AerT-related outcomes.


Introduction
Lipids and carbohydrates are the dominant fuels used by humans during exercise, and their absolute and relative contributions are influenced by sex, diet, exercise intensity and duration, time of day, and fitness level [1]. At moderate exercise intensities, the energy contribution from lipids increases; it then declines markedly, to zero, at heavy to severe intensities, from which point carbohydrates become the dominant energy substrate [2,3]. Because carbohydrate stores are limited, they can constrain performance during prolonged and/or heavy-intensity activities; in contrast, as little as 1% of body fat can supply sufficient energy for up to 90 km of physical movement, making fat a more suitable fuel source [2]. The maximal fat oxidation point (FATmax) is commonly used to describe the exercise intensity at which fat oxidation is highest, whereas the exercise intensity at which fat oxidation becomes negligible is labeled FATmin [3,4].
Regular exercise at FATmax intensity has been proposed as a key factor in optimizing the body's ability to oxidize lipids, which is of great interest to athletes [5]. Moreover, with the current obesity epidemic representing a serious medical problem due to its association with numerous chronic diseases (e.g., cardiovascular diseases, hypertension, diabetes), exercising at FATmax intensity has also gained considerable attention among public health professionals and has been recommended for treating a number of chronic health issues [5][6][7]. Accurately prescribing exercise intensity is a complex task, and researchers and practitioners disagree about which method for designing an efficient training plan is most appropriate [8][9][10]. The traditional approach, commonly represented in the literature, prescribes exercise intensity as a percentage of maximal oxygen uptake (%VO2max) or maximal heart rate (%HRmax) [8,9]. However, prescriptions based on %VO2max have revealed moderate to large inter-subject variability at the intensities yielding FATmax [1,8]. This variability is lower when %HRmax is used (55-65%HRmax), yet it remains ambiguous for individualized exercise prescription [11,12]. Hence, exercise intensities expressed as a fixed percentage of maximal values might not accurately reflect the metabolic responses of the human body [11,13,14]. For these reasons, some authors recommend prescribing exercise intensity with a more standardized method, such as individual metabolic thresholds, since traditional methods fail to account for differences in the subject's metabolic stress [9][10][11].
In contrast to relative percentages of VO2max or HRmax, an individualized approach to exercise intensity prescription based on metabolic thresholds describes specific metabolic phases during exercise and thus aims to account for differences in the body's physiological and functional capacity [11]. This approach might also homogenize the elicited metabolic stress and consequently reduce inter-individual variability in metabolic responses despite phenotypic differences [9,15].
During exercise of increasing intensity, three phases of energy production, delineated by two threshold points, can be distinguished [15]. These threshold points are termed metabolic thresholds and can be determined by either gas analysis or blood lactate techniques [15,16]. Over the years, scientists have used different terms for these two thresholds, depending on whether they referred to the underlying physiological processes or to the methods used to identify them [11,15,16]. For further clarification of the physiological and methodological significance of the thresholds, readers are referred to [11,15,16]. In this paper, we consider only the first threshold, which we refer to as the aerobic threshold (AerT), in line with the conceptual framework for performance diagnosis and training prescription proposed and clearly described by Meyer et al. (2005) [15].
Ever since the term FATmax was introduced, scientists have tried to determine whether a relationship exists between the exercise intensities matching FATmax and AerT, with the aim of enabling a more individualized exercise prescription [17][18][19]. If such a connection exists, it would integrate the most relevant indices for planning and assessing an effective exercise program [9,10]. However, over the last two decades, conflicting results with high inter-study variability have been reported on the association between the exercise intensities matching FATmax and AerT [20][21][22]. These variations may have resulted from both methodological and clinical differences within and between studies [12]. To our knowledge, no study has systematically explained this variability.
Therefore, this systematic review and meta-analysis aimed to examine the association between FATmax and AerT, identify relevant moderators, and examine their influence on effect size variability.

Study Design
This systematic review was registered in the International Prospective Register of Systematic Reviews (PROSPERO) (ID: CRD42021239351) and is part of a pre-registered trial on ClinicalTrials.gov (ID: NCT03789045). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist and flow diagram were used to structure this review, describe the methodology, and systematically present the search findings [23].

Search Strategy
We searched MEDLINE (via PubMed) and Google Scholar for studies exploring the relationship between FATmax and AerT. The search used Boolean logic, in which the operators AND/OR restrict the results to documents containing the key terms relevant to the review. To capture the variety of terminology in use [15,16], the search combined the following key terms: ("fat oxidation" OR "maximal fat oxidation" OR "optimal fat oxidation" OR "peak fat oxidation") AND ("aerobic threshold" OR "anaerobic threshold" OR "ventilatory threshold" OR "lactate threshold" OR "metabolic threshold" OR "gas exchange threshold"). The search was developed and conducted by two independent researchers (PR and NZ) and reported according to the PRISMA statement [23]. Furthermore, since the term FATmax was introduced in 2001 [2], inclusion was restricted to publications from 1 January 2001 onwards. The last search was run on 1 April 2021.
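The Boolean structure of the search can be reproduced programmatically: terms within each concept are joined with OR, and the two concepts are joined with AND. The following is a minimal sketch. The helper names are hypothetical, and the exact query syntax accepted by PubMed or Google Scholar (field tags, date limits) is not modeled here.

```python
# Hypothetical helper that assembles the Boolean query described in the text.
# Terms inside a concept block are OR-ed; the concept blocks are AND-ed.

FAT_TERMS = [
    "fat oxidation",
    "maximal fat oxidation",
    "optimal fat oxidation",
    "peak fat oxidation",
]
THRESHOLD_TERMS = [
    "aerobic threshold",
    "anaerobic threshold",
    "ventilatory threshold",
    "lactate threshold",
    "metabolic threshold",
    "gas exchange threshold",
]


def or_block(terms: list[str]) -> str:
    """Quote each phrase and join the phrases with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*blocks: list[str]) -> str:
    """Combine several OR blocks with AND."""
    return " AND ".join(or_block(b) for b in blocks)


if __name__ == "__main__":
    print(build_query(FAT_TERMS, THRESHOLD_TERMS))
```

Keeping the terms in lists makes it easy to report the exact query and to add synonyms later without re-typing the whole string.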

Data Extraction
Two investigators (PR and NZ) independently screened the retrieved records in an unblinded, standardized manner and checked whether they met the eligibility criteria; a third reviewer (FMC) was involved when their opinions differed. Data were extracted from each selected article and summarized in an Excel spreadsheet (Microsoft, Redmond, WA, USA) for further analysis.

Inclusion/Exclusion Criteria
Research articles were selected using the population, intervention, comparison, outcomes, and study design (PICOS) framework [24]. The following study characteristics were required for inclusion: (i) original studies (i.e., randomized and non-randomized controlled trials, cohort studies, case-control studies, and cross-sectional studies) (ii) in the form of full texts, abstracts, or congress presentations. Furthermore, included studies had to report the exercise intensities at which AerT and FATmax occurred (i.e., mean ± SD) and their association (i.e., Pearson, Spearman, or Kendall rank correlation coefficient). If either of these two requirements was not reported or could not be determined, the corresponding author was contacted. Regardless of their form, studies were eligible only if they provided a clear and reproducible description of the methods used to determine both AerT and FATmax. Case reports, editorials, reviews, and opinion papers were excluded. In addition, only studies that included participants with no evidence of metabolic, pulmonary, or cardiovascular disease (conditions potentially affecting substrate utilization) were considered. Apart from age (18-60 years), no other restrictions on participant characteristics were applied. Reports had to be written in English and published in a peer-reviewed journal. The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach was applied to rate the quality of the body of evidence for the selected studies [25].
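Part of the eligibility rules above can be expressed as a screening function. The following is a minimal sketch, assuming each retrieved record has already been coded into the hypothetical fields shown. Not every criterion is encoded (for example, publication form and GRADE rating are left out). A missing mean ± SD or correlation leads to "contact author" rather than exclusion, mirroring the text.

```python
from dataclasses import dataclass

# Designs the text lists as excluded.
EXCLUDED_DESIGNS = {"case report", "editorial", "review", "opinion"}


@dataclass
class Record:
    """Hypothetical fields coded for one retrieved record."""
    year: int                   # publication year
    language: str
    design: str                 # e.g. "cross-sectional", "cohort", "review"
    min_age: float              # youngest participant (years)
    max_age: float              # oldest participant (years)
    healthy: bool               # no metabolic, pulmonary or cardiovascular disease
    methods_reproducible: bool  # clear methods for both AerT and FATmax
    reports_mean_sd: bool       # AerT and FATmax intensities as mean ± SD
    reports_correlation: bool   # Pearson, Spearman or Kendall coefficient


def screen(rec: Record) -> str:
    """Return 'excluded', 'contact author' or 'eligible'."""
    if rec.year < 2001 or rec.language != "English":
        return "excluded"
    if rec.design in EXCLUDED_DESIGNS or not rec.healthy:
        return "excluded"
    if rec.min_age < 18 or rec.max_age > 60:
        return "excluded"
    if not rec.methods_reproducible:
        return "excluded"
    if not (rec.reports_mean_sd and rec.reports_correlation):
        # Missing outcome data: the corresponding author is contacted first.
        return "contact author"
    return "eligible"
```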

Statistical Analysis
All statistical analyses were performed in R (version 4.0.4; The R Foundation, Vienna, Austria) using the metafor and dmetar packages [26,27]. Pearson's correlation coefficients (r) and standardized mean differences (SMD) between the exercise intensities at which AerT and FATmax occurred were used as ESs. These were analyzed separately to estimate, respectively, the strength of the correlation and the magnitude of the difference between the two exercise intensities. Summary SMD and r estimates were determined using a random-effects model and presented as means with 95% confidence intervals (CI) and prediction intervals (PI) [28].
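To illustrate how a random-effects summary with its CI and PI is obtained, the following minimal sketch pools study-level ESs. It deliberately swaps in a simpler technique than the one used in this review: the review fitted multilevel models with metafor, whereas the sketch uses the two-level DerSimonian-Laird estimator of between-study variance and normal quantiles for both intervals. Its output should therefore not be expected to reproduce the reported estimates.

```python
import math
from statistics import NormalDist


def random_effects_pool(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects pooling (simplified, two-level).

    effects   -- study-level effect sizes (e.g. Fisher z or SMD)
    variances -- their sampling variances
    Returns the pooled estimate, its SE, tau^2, Q, the CI and a PI.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)           # about 1.96 for 95%
    ci = (mu - z * se, mu + z * se)
    half_pi = z * math.sqrt(tau2 + se ** 2)             # normal-approximation PI
    pi = (mu - half_pi, mu + half_pi)
    return {"estimate": mu, "se": se, "tau2": tau2, "Q": q, "ci": ci, "pi": pi}
```

When the ESs are homogeneous, tau² is zero and the PI collapses onto the CI. With heterogeneous ESs, the PI widens because it adds the between-study variance to the uncertainty of the mean.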
Because r can be biased by the measurement unit used to compute it, owing to possible spurious correlation, a meta-analysis of r was performed separately for each measurement unit used to express exercise intensity at FATmax and AerT [29]. Indeed, when variables are non-independent because they share a common denominator (i.e., variables expressed relative to body weight or to maximal values), their correlation can overestimate the correlation between the original, independent variables [29,30]. In contrast, SMD allows ESs derived from different measurement units to be pooled in a single meta-analysis [30].
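The common-denominator problem can be demonstrated with a short simulation. Two variables that are generated independently show essentially no correlation. After both are divided by the same variable denominator, they become clearly correlated. The values are hypothetical and were chosen only to make the effect visible: absolute intensities of about 2 L/min with roughly 10% variation, and body mass of about 70 kg with roughly 20% variation.

```python
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_ratio_correlation(n=5000, seed=1):
    """Correlate two independent variables before and after dividing
    both by a shared, variable denominator (hypothetical values)."""
    rng = random.Random(seed)
    # Independent absolute intensities (e.g. VO2 in L/min), about 10% CV
    a = [rng.gauss(2.0, 0.2) for _ in range(n)]
    b = [rng.gauss(2.0, 0.2) for _ in range(n)]
    # Shared denominator (e.g. body mass in kg), about 20% CV
    mass = [rng.gauss(70.0, 14.0) for _ in range(n)]
    a_rel = [ai / m for ai, m in zip(a, mass)]
    b_rel = [bi / m for bi, m in zip(b, mass)]
    return pearson_r(a, b), pearson_r(a_rel, b_rel)


if __name__ == "__main__":
    r_raw, r_ratio = simulate_ratio_correlation()
    print(f"raw r = {r_raw:.2f}, ratio r = {r_ratio:.2f}")
```

The larger the variability of the shared denominator relative to that of the numerators, the stronger the spurious correlation. This is why correlations computed in relative units are pooled separately from those computed in absolute units.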
To account for dependencies among ESs, which in some cases were clustered within the same study, multilevel meta-analyses (MA) were performed. In these models, level 1 captures sampling error, level 2 the within-study variance, and level 3 the between-study variance [31][32][33]. The distribution of variance among the levels was used to identify the required levels of modeling [33]. Moreover, the identifiability of each model parameter (the variance components) was checked using the profile likelihood (PL) of each fitted model; when the PL did not provide an identifiable profile of a parameter, a simpler model was considered [34]. Sensitivity analyses based on the leave-one-out method and a statistical outlier test were conducted for three purposes: to explore the impact of excluding or including any ES, to determine whether any ES distorted the pooled overall estimate, and to avoid potential pseudoreplication problems [35,36]. In a multilevel meta-analysis, the risk of bias across studies (publication bias) requires a specialized analysis; thus, publication bias was assessed using an extended funnel plot test and a funnel plot adapted for multilevel meta-analysis [37]. The risk of bias in the individual included studies (i.e., internal validity) was examined using the ROBIS tool, and the results of this assessment are presented graphically [38].
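Deciding which levels to model requires computing the share of total variance at each level. The following minimal sketch assumes the approach commonly used for three-level models (for example in dmetar). There, level 1 is represented by a "typical" sampling variance ṽ = (k − 1)Σwᵢ / ((Σwᵢ)² − Σwᵢ²), with wᵢ = 1/vᵢ. The level 2 and level 3 variance estimates are taken as inputs from an already fitted model, and the function and key names are hypothetical.

```python
def variance_distribution(sampling_variances, sigma2_within, sigma2_between):
    """Share of total variance (%) at each level of a three-level model.

    sampling_variances -- sampling variance v_i of every ES (level 1)
    sigma2_within      -- estimated within-study variance (level 2)
    sigma2_between     -- estimated between-study variance (level 3)
    """
    k = len(sampling_variances)
    w = [1.0 / v for v in sampling_variances]
    sw = sum(w)
    # "Typical" within-study sampling variance used for level 1
    typical_v = (k - 1) * sw / (sw ** 2 - sum(wi ** 2 for wi in w))
    total = typical_v + sigma2_within + sigma2_between
    return {
        "level1_sampling": 100 * typical_v / total,
        "level2_within": 100 * sigma2_within / total,
        "level3_between": 100 * sigma2_between / total,
    }
```

With equal sampling variances, ṽ reduces to that common variance. The level 2 and level 3 shares together correspond to the total I² of the multilevel model.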
For each outcome, heterogeneity was assessed with Cochran's Q test, which compares the total observed variation (Q) with its expected value under homogeneity (df), and was quantified with Higgins' I² statistic [39,40]. When heterogeneity was present, the random-effects model was preferred, with each ES weighted by the inverse of the sum of its within-study and between-study variances [28]. Additionally, the PI calculated for each pooled ES gave the range within which the true effect of an individual new study is expected to fall, further describing the degree of heterogeneity between studies [26]. In cases of relevant heterogeneity between studies (0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity; 50% to 90%: may represent substantial heterogeneity; 75% to 100%: considerable heterogeneity), relevant moderators and matching subgroups were identified, and changes in the variation of effects were analyzed [33,41]. For each SMD and r meta-analysis, potential sources of heterogeneity were assessed with a test of moderators, using an omnibus test based on the F distribution [42]; this test was performed only if at least three ESs were available per subgroup [33]. Cohen's f was used to describe the strength and practical significance of each test of moderators, with f = 0.10, 0.25, and 0.40 indicating small, medium, and large effects, respectively [43]. The critical value of the F distribution (f²) was obtained from tables at α = 0.05 [43].
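The following sketch shows the heterogeneity and moderator quantities named above. It makes two assumptions. First, the Higgins I² shown is the classic two-level formula, I² = (Q − df)/Q. It will generally differ from the multilevel I² values reported in the Results, which are derived from variance components. Second, Cohen's f is obtained from the omnibus F statistic via f² = F·df1/df2, a conversion the text does not specify. The label "negligible" for f < 0.10 is also an addition.

```python
def higgins_i2(q, df):
    """Two-level Higgins I^2 (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def cohens_f_from_F(F, df1, df2):
    """Cohen's f from an omnibus F statistic, assuming f^2 = F * df1 / df2."""
    return (F * df1 / df2) ** 0.5


def label_f(f):
    """Benchmarks from the text: 0.10 small, 0.25 medium, 0.40 large."""
    if f >= 0.40:
        return "large"
    if f >= 0.25:
        return "medium"
    if f >= 0.10:
        return "small"
    return "negligible"  # below the smallest benchmark (assumed label)


def moderator_significant(F, critical_F):
    """Omnibus decision against a tabled critical F value at alpha = 0.05."""
    return F > critical_F
```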
When r was used as the ES, measures of association other than Pearson's r were converted to r as previously described [44]. Moreover, to obtain unbiased weights for each study, r values from the original studies were converted to z values with Fisher's z-transformation; the z values were used in the statistical analyses and transformed back to r for presentation [28,45]. Correlations were classified as weak (r ≤ 0.30), moderate (r = 0.30-0.50), significant (r = 0.50-0.70), and strong (r > 0.70) [46].
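The following minimal sketch shows the correlation-scale operations described above. The rank-to-Pearson conversions are an assumption, since the method of [44] is not restated here: the relations r = sin(πτ/2) for Kendall's tau and r = 2·sin(πρ/6) for Spearman's rho, which hold under bivariate normality, are commonly used ones. Fisher's z = atanh(r) has sampling variance 1/(n − 3). The classifier follows the text's bands, with the handling of boundary values and the use of |r| assumed.

```python
import math


def kendall_to_pearson(tau):
    """r = sin(pi * tau / 2), assuming bivariate normality."""
    return math.sin(math.pi * tau / 2)


def spearman_to_pearson(rho):
    """r = 2 * sin(pi * rho / 6), assuming bivariate normality."""
    return 2 * math.sin(math.pi * rho / 6)


def fisher_z(r, n):
    """Fisher z-transform of r and its sampling variance 1 / (n - 3)."""
    return math.atanh(r), 1.0 / (n - 3)


def z_to_r(z):
    """Back-transform a (pooled) z value to the correlation scale."""
    return math.tanh(z)


def label_r(r):
    """Bands used in the text; boundary values go to the lower band (assumed)."""
    r = abs(r)
    if r <= 0.30:
        return "weak"
    if r <= 0.50:
        return "moderate"
    if r <= 0.70:
        return "significant"
    return "strong"
```

Pooling is done on the z scale, where the sampling variance depends only on n, and the pooled z is back-transformed with tanh for reporting.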
When SMD was used as the ES, the means and SDs of exercise intensities at FATmax and AerT were extracted from all retrieved studies. SMD was used to quantify the differences between FATmax and AerT because the studies assessed the same outcome (i.e., exercise intensity at FATmax and AerT) but measured it in different units [47]. Additionally, since the FATmax and AerT intensities were paired data from the same individuals, the SMD was computed for paired measures, taking into account the correlation between the two exercise intensities [48]. Treating the two intensities as independent measures could introduce a unit-of-analysis error. This would yield confidence intervals that are likely to be too wide and reduce the trial's weight, possibly masking clinically important heterogeneity [39]. For one study that did not report r [49], r could only be calculated in a measurement unit different from the one used to report the means and SDs (i.e., %VO2max). Therefore, the r used to compute its SMD was assumed to equal the pooled r derived from the same measurement unit as the ES. Cohen's rule of thumb for interpreting the SMD was followed: a value of ≤0.2 indicates no effect, 0.21 to 0.5 a small effect, 0.51 to 0.8 a medium effect, and >0.81 a large effect [43].
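A paired SMD can be computed from reported means, SDs, and the correlation between the two intensities. The following minimal sketch assumes the matched-groups formulation of Borenstein et al.: the within-subject SD is recovered as S_within = S_diff / √(2(1 − r)), and the variance of d is V = (1/n + d²/(2n))·2(1 − r). Whether [48] uses exactly this formulation is an assumption, and no small-sample (Hedges) correction is applied. The sign is negative when the first intensity is lower than the second.

```python
import math


def paired_smd(m1, sd1, m2, sd2, r, n):
    """SMD and its variance for two paired measures (requires r < 1).

    m1, sd1 -- mean and SD of the first intensity (e.g. FATmax)
    m2, sd2 -- mean and SD of the second intensity (e.g. AerT)
    r       -- correlation between the two intensities
    n       -- number of participants
    """
    s_diff = math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)  # SD of differences
    s_within = s_diff / math.sqrt(2 * (1 - r))                    # within-subject SD
    d = (m1 - m2) / s_within
    var_d = (1 / n + d ** 2 / (2 * n)) * 2 * (1 - r)              # shrinks as r grows
    return d, var_d


def label_smd(d):
    """Thresholds from the text, applied to |d|."""
    d = abs(d)
    if d <= 0.2:
        return "no effect"
    if d <= 0.5:
        return "small"
    if d <= 0.8:
        return "medium"
    return "large"
```

The factor 2(1 − r) in the variance shows why ignoring the pairing is costly. With a positive r, treating the intensities as independent overstates the variance, widening the CI and down-weighting the study.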
A descriptive exploration of the plotted ESs (r and SMD) was used as a proxy of agreement. High agreement was assumed for a high r combined with a small SMD (r > 0.7 and SMD < 0.5), and low agreement for a low r or a medium-to-large SMD (r < 0.7 or SMD > 0.51). This method was also used to explore measurement units as a potential source of spurious correlation [29].
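The descriptive agreement rule can be written as a small classifier. Two assumptions are made that the text does not state. First, SMD is taken in absolute value. Second, pairs that satisfy neither rule (for example r exactly 0.7, or an SMD between 0.50 and 0.51) are labeled "intermediate".

```python
def agreement(r, smd):
    """Descriptive agreement category from paired r and SMD values."""
    smd = abs(smd)  # direction of the difference ignored (assumption)
    if r > 0.7 and smd < 0.5:
        return "high"
    if r < 0.7 or smd > 0.51:
        return "low"
    return "intermediate"  # gap between the two published rules


if __name__ == "__main__":
    # Hypothetical (r, SMD) pairs
    for r, smd in [(0.85, -0.3), (0.5, 0.2), (0.9, 1.0), (0.8, 0.505)]:
        print(r, smd, agreement(r, smd))
```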

Descriptive Results
The systematic search retrieved a total of 71,400 papers, of which 18,300 were identified as duplicates. The titles and language of the remaining papers were then screened, and 575 potentially relevant papers were retained for abstract screening. Subsequently, eighty records were selected for full-text reading as potentially eligible, of which sixty-six were excluded because they did not fulfill the inclusion criteria. Data extraction was performed independently by two reviewers (PR and NZ) using a data extraction form in Excel (Microsoft, Redmond, WA, USA). Any disagreements about data extraction were resolved by consensus or by the decision of a third reviewer. For one study, correlation coefficients were converted from Kendall's tau to Pearson's r [20]. The literature search and study selection process are reported according to the PRISMA statement in Figure 1. The internal validity of the included studies is presented in Figure 2. One correlation ES was excluded as an outlier after sensitivity analysis (results not shown): its estimate was too extreme to fit within the pooled ES and did not overlap with the 95% CI of the pooled effect [50]. Moreover, it contributed disproportionately to heterogeneity while having little influence on the overall pooled weight (presumably due to its small sample size). Overall, fourteen papers were included in the present study, with a total of 35 ESs for r and 26 ESs for SMD.
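The outlier criterion described above flags an ES whose confidence interval does not overlap the 95% CI of the pooled effect. The following minimal sketch implements that rule. It is assumed to be similar in spirit to dmetar's `find.outliers`, but it is not that function. The function names are hypothetical, and normal-theory 95% CIs are assumed for the individual ESs.

```python
def normal_ci(est, se, z=1.96):
    """Normal-theory confidence interval for a single ES."""
    return est - z * se, est + z * se


def find_outliers(effects, ses, pooled_ci):
    """Indices of ESs whose 95% CI does not overlap the pooled 95% CI.

    effects   -- study ESs (e.g. Fisher z values)
    ses       -- their standard errors
    pooled_ci -- (lower, upper) bounds of the pooled effect
    """
    lo_p, hi_p = pooled_ci
    flagged = []
    for i, (y, se) in enumerate(zip(effects, ses)):
        lo, hi = normal_ci(y, se)
        if hi < lo_p or lo > hi_p:  # entirely below or above the pooled CI
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical ESs: the third lies far above the pooled CI
    print(find_outliers([0.5, 0.6, 2.0], [0.1, 0.1, 0.1], (0.4, 0.7)))
```

A flagged ES is a candidate for removal, not an automatic exclusion. The text combines this check with leave-one-out results on heterogeneity and pooled weight before excluding one ES.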

Exploring Heterogeneity and Subgroup Analysis
Because heterogeneity was anticipated, three investigators (PR, AGFJ, and FMC) independently identified common variables (i.e., relevant moderators) and their subgroups so that sources of variation could be investigated. The identified moderator variables are reported in Table 1, categorized by differences in study characteristics (i.e., methodological diversity) and in study populations (i.e., clinical diversity).

Clinical Diversity
When the selected studies were examined for clinical diversity, two moderators were identified: (i) sex and (ii) physical activity level, each with two subgroups (Table 1). The included papers collectively evaluated 855 participants, of whom 526 (61.52%) were male and 329 (38.48%) were female. Seven studies (50%) tested only males, one (7.14%) tested only females, and six (42.86%) evaluated both sexes. By physical activity level, 296 (34.62%) participants were active and 559 (65.38%) were inactive (i.e., sedentary or obese). Seven studies (50%) examined active participants, six (42.86%) examined inactive participants, and only one (7.14%) examined both groups.

Methodological Diversity
When the selected studies were examined for methodological diversity, six moderators with relevant subgroups were identified (Table 1). Cycle ergometry was the preferred method in nine (64.29%) studies, whereas a treadmill was used in five (35.71%). To determine AerT, five (35.71%) studies used the lactate method and seven (50%) used gas analysis; both methods were used in two (14.29%) studies. Five measurement units were used to establish correlations: mL/min/kg in thirteen (37.14%) cases; L/min, %VO2max, and b/min in seven (20%) cases each; and %HRmax in two (5.71%) cases. All studies assessed VO2max/AerT with a graded exercise test (GXT) protocol. Twelve (85.71%) studies used shorter stages (≤3 min), whereas longer stages (>3 min) were used in only one study (7.14%); one further study (7.14%) examined the association using both short and long stages. FATmax was identified by visual inspection of the appropriate plots in nine (64.29%) studies, whereas five (35.71%) studies used a mathematical model. Finally, ten (71.43%) studies determined FATmax during the same GXT used to assess VO2max/AerT, whereas four (28.57%) used an additional test.

Overall Effect
All data provided by the selected studies met the inclusion standard for the correlation meta-analysis, and the studies were evaluated for heterogeneity and bias. For each measurement unit, Table 2 presents the pooled ESs (random-effects model) with 95% CIs and PIs, together with the heterogeneity statistics and the distribution of variance. Inspection of the adapted funnel plot for multilevel meta-analysis (results not shown) for each measurement unit used to assess r showed a fairly symmetrical distribution of study-specific effects, with no indication of publication bias.

B/min Measurement Method
For the observed measurement method, only one moderator met the criteria set to perform the test of moderators.

Sex
In the observed data (with the sub-grouping), the PL for the three-level MA was non-informative, so a two-level MA was fitted. When sex was examined as a moderator, the test of moderators revealed no statistical differences (F1,4 = 0.54

mL/kg/min Measurement Method
For the observed measurement method, the following moderators met the inclusion criteria set to perform the test of moderators.

%VO2max Measurement Method
For the observed measurement method, the following moderators met the inclusion criteria set to perform the test of moderators.

FATmax Detection Method
The PL for the three-level MA was non-informative, so a two-level MA was fitted. When the FATmax detection method was evaluated as a moderator, two subgroups were identified: visual and mathematical methods (F1,5 = 0.06, critical f² = 6.61, p = 0.823). For the visual method, the mean ES was 0.52 (95% CI −0.76 to 0.97; PI −0.99 to 1.00), whereas for the mathematical method it was 0.60 (95% CI −0.18 to 0.92; PI −0.77 to 0.98). For this moderator, pooled values for a common estimate confirmed substantial heterogeneity (Q = 60.71, df = 5, I² = 93.86%, p < 0.001). Total I² was considerable for the visual inspection method (Q = 50.48, df = 2, I² = 97.61%, p < 0.001) and substantial for the mathematical method (Q = 10.22, df = 3, I² = 73.40%, p < 0.001). In both groups, the expected effect for future studies ranged from very weak to very strong, reflecting high heterogeneity.

L/min Measurement Method
For the observed measurement method, no moderators met the inclusion criteria set to perform the test of moderators.
Meta-Analysis of Standardized Mean Differences

Overall Effect
Because SMDs derived from different measurement units can be pooled without the bias that affects r, all calculated SMDs were pooled in a single meta-analysis. Figure 3 presents the ESs (random-effects model) with 95% CIs, the heterogeneity statistics, and the distribution of variance for each measurement unit. The pooled PI was −1.43 to 0.97, with SE = 0.16. Residual heterogeneity was substantial and similarly divided between level 2 (39.75%) and level 3 (52.73%). FATmax preceded AerT in 69.23% of the ESs (18 of 26), and the test of moderators revealed that AerT tended to precede FATmax only with long-stage GXTs (Table 3). Inspection of the adapted funnel plot for multilevel meta-analysis (results not shown) for each measurement unit used to assess SMD showed a fairly symmetrical distribution of study-specific effects, with no indication of publication bias.

Sex
For this moderator, the PL for the three-level MA was informative, so this model was fitted. The test of moderators revealed F1,22 = 0.01 (critical f² = 4.30, p = 0.936). Pooled values for a common estimate of heterogeneity showed Q = 290.54, df = 22, p < 0.001, with variance distributed roughly equally between level 2 (41.65%) and level 3 (51.49%).

Ergometer
The PL for the three-level MA was informative, so a three-level MA was used. For the type of ergometer, the test of moderators showed F1,24 = 0.33 (critical f² = 4.260, p = 0.572). Residual heterogeneity was confirmed for this moderator (Q = 285.26, df = 24, p < 0.001), with I² originating from level 2 (35.06%) and level 3 (58.09%).

FATmax Detection Method
The PL for the three-level MA was informative. When the FATmax detection method was evaluated, two subgroups were identified, visual and mathematical methods, with the following results: F1,24 = 1.76, critical f² = 4.260, p = 0.197. Pooled values for a common estimate confirmed substantial heterogeneity for this moderator (Q = 300.77, df = 24, I² = 92.39%, with 38.67% at level 2 and 53.72% at level 3, p < 0.001). In both groups, the expected effect for future studies ranged from very weak to very strong, reflecting high heterogeneity.

Correlation Coefficients and Standardized Mean Differences
Finally, r and SMD between the exercise intensities at FATmax and AerT were plotted against each other in Figure 4, providing an integrated view of all ESs. The combination of r and SMD can be seen as a proxy of agreement. High agreement corresponds to a high r and low SMD (r > 0.7 and SMD < 0.5, represented by the dark grey square in Figure 4), and low agreement to a low r or high SMD (r < 0.7 or SMD > 0.51, white background in Figure 4).

0.857. Overall heterogeneity was high (Q = 281.50, df = 24, p < 0.001). In studies using gas analysis, heterogeneity was substantial (I² = 95.23%) and was distributed within (1.3%) and between (93.93%) studies. In studies using the lactate method, heterogeneity was moderate (Q = 92.03, df = 13, I² = 89.41%, p < 0.001) and unevenly distributed between level 2 (71.62%) and level 3 (17.79%).

Discussion
The main conclusion of our study was that FATmax and AerT exhibit a consistent and strong association. Moreover, differences in the studies' methodological designs, combined with their clinical diversity, seem to contribute to the apparently inconsistent (i.e., heterogeneous) results. These findings have an important practical application, allowing a more tailored and individualized approach to exercise prescription.
The strongest r, with the narrowest 95% CI and PI, was observed for mL/min/kg, indicating the highest association when this measurement unit is used. The test of moderators, followed by subgroup analysis, revealed no influence of clinical differences on the correlation between FATmax and AerT. However, substantial heterogeneity was noted for both clinical moderators and remained considerable even after subgroup analysis. If heterogeneity is lower within subgroups than across the pooled data, the subgroup analysis can be assumed to help explain the heterogeneity in the overall analysis; this was not the case here. Potential explanations could be drawn from participant characteristics (e.g., demographics, sex, age range, body composition), which could not be controlled in our study. Furthermore, stronger correlations and narrower 95% CIs and PIs were observed in studies using a treadmill, gas analysis for AerT detection, and a single GXT to determine AerT and FATmax. These findings suggest more robust and reliable results, since narrower CIs indicate a more precise estimate of the "true" effect. A sufficient number of participants was included in each subgroup, so the covariate distribution is not a concern for these results. As a general rule, the larger the differences between subgroups and the narrower their CIs, the more robust and reliable the results. Regarding the methods used to detect AerT, the lactate subgroup showed large dispersion in the correlations; hence, these results cannot be considered to derive from a single population. By contrast, gas analysis proved to be a more reliable method for comparing the exercise intensities of AerT and FATmax, whereas the lactate method significantly reduced the strength of the association.
Nevertheless, the considerable heterogeneity observed in each subgroup could originate from the multiple methods available for either gas analysis or lactate testing, which require further examination [56]. Based on the available data, we recommend that future trials prefer gas analysis over lactate methods when assessing the correlation between AerT and FATmax. Regarding the test used to determine FATmax, no heterogeneity was identified when an additional test was used, presumably because of the small number of included studies. In contrast, substantial heterogeneity was found in the group where FATmax and AerT were determined during a single GXT, presumably as a result of methodological differences among the studies. These differences can probably be attributed to differing GXT protocols (stage length and incline), which contribute to subgroup variability. In our MA, stages ranging from 1 to 6 min were reported (Table 1). However, because of the small number of included studies, and to allow an adequate assessment of statistical significance, stages ≤3 min were classified as short and stages >3 min as long [13,14]. Although reliability was higher in the short-stage group, it remains unclear whether 1-, 2-, or 3-min stages are most adequate, as no consensus exists on the optimal GXT methodology for determining VO2max/AerT [1,8,13]. Hence, a single GXT should be the preferred approach, which, in the long run, would allow further insight into the variability within short stages. Furthermore, this strategy could prevent potential pseudoreplication problems while being less time-consuming and more economical for both researchers and participants [12,36].
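The short/long dichotomy for GXT stage duration is a simple threshold rule. The sketch below applies it to a hypothetical set of study protocols and counts how many fall into each subgroup, which makes visible why small subgroup sizes constrain the analysis. The study labels and stage lengths are invented for illustration; only the 3-min cut-off comes from the text.

```python
from collections import Counter

SHORT_STAGE_MAX_MIN = 3.0  # cut-off used in this review: <=3 min short, >3 min long


def classify_stage(stage_min: float) -> str:
    """Return 'short' or 'long' for a GXT stage duration given in minutes."""
    if stage_min <= 0:
        raise ValueError("stage duration must be positive")
    return "short" if stage_min <= SHORT_STAGE_MAX_MIN else "long"


# Hypothetical protocols (study label -> stage length in minutes), not Table 1 data
protocols = {"Study A": 1, "Study B": 2, "Study C": 3, "Study D": 4, "Study E": 6}
groups = {label: classify_stage(m) for label, m in protocols.items()}
print(groups)
print(Counter(groups.values()))  # number of protocols per subgroup
```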
Although the L/min unit yielded the second strongest correlation coefficient, no moderator met the inclusion criteria for the test of moderators. Nevertheless, the narrow 95% CI and associated PI indicate robust and reliable results, suggesting a possible future application of this unit. However, because an insufficient number of participants was included in each subgroup, covariate distribution remains a concern for this analysis.
When intensities were expressed in b/min, a lower r with wider 95% CI and PI was observed compared with the previously discussed units, and only one moderator met the required standards for the test of moderators. Sex was found to have no influence on correlation strength, although a somewhat stronger correlation, with wider 95% CI and PI, was observed in females than in males. The wider intervals in females could be attributed to the substantially larger number of males included in this test of moderators, which allows more robust estimates for males. Considering this, future studies should expect high variability in their results when this unit is used to assess the correlation.
Although %VO2max yielded the weakest correlation, none of the tested moderators had a significant influence on correlation strength. This result, in line with our original framework, shows that individual metabolic responses to exercise cannot be accurately identified by calculating fixed percentages of maximal values [11,13,14]. Moreover, future studies using this unit could expect large heterogeneity in their results, since the observed 95% CI and PI were wide; such wide CIs provide only an imprecise estimate of the "true" effect, which lowers the reliability of the results.
The SMD analysis showed a small negative pooled effect size, indicating no significant difference between the intensities at which FATmax and AerT occur, although FATmax tended to precede AerT. Furthermore, the duration of the stages used in the exercise test for VO2max/AerT determination may affect the association between FATmax and AerT. When long stages were used in a GXT, AerT tended to precede FATmax, suggesting that this moderator influences the order in which the two intensities occur. A possible reason is that longer stages tend to underestimate FATmax [1,12]. This finding suggests that, when prescribing individualized exercise, researchers and practitioners should pay particular attention to standardizing the methods used to identify VO2max/AerT, as this will also help optimize fat oxidation [9][10][11]. Heterogeneity in the pooled SMD meta-analysis was high and tended to decrease in the subgroup SMD analyses, indicating that methodological differences influence the exercise intensities matching FATmax and AerT. The test of moderators showed no significant effect for any of the moderators considered, which, as in the r ES analyses, suggests that most of the heterogeneity arose from multiple factors related to the lack of standardized methods and guidelines for assessing the association between FATmax and AerT.
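Because the sign of the pooled SMD carries the ordering information discussed above, it helps to fix the convention explicitly. The sketch below computes a Hedges-corrected SMD as FATmax minus AerT, so a negative value means FATmax occurs at a lower intensity (i.e., precedes AerT). The function name is hypothetical, and both the standardizer (the average of the two SDs) and the paired df = n - 1 correction are assumptions; the included studies and this review may have used a different SMD variant for paired data.

```python
import math


def smd_fatmax_minus_aert(mean_fat, sd_fat, mean_aer, sd_aer, n):
    """Hedges-corrected SMD of FATmax minus AerT intensity for a paired sample.

    Standardizes by the average of the two SDs (d_av) and applies the
    small-sample correction J = 1 - 3 / (4 * df - 1) with df = n - 1
    (an assumption for paired data). Negative values mean FATmax occurs
    at a lower intensity than AerT.
    """
    if n < 3:
        raise ValueError("need at least 3 participants")
    sd_av = math.sqrt((sd_fat ** 2 + sd_aer ** 2) / 2)
    d = (mean_fat - mean_aer) / sd_av
    df = n - 1
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical study: intensities in %VO2max, 20 participants
g = smd_fatmax_minus_aert(48.0, 9.0, 52.0, 10.0, 20)
print(f"g = {g:.2f}")
```

Swapping the two intensities only flips the sign, so pooling such values preserves the "which comes first" information that the moderator analysis of stage length relies on.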
When r and SMD ESs were plotted (see Figure 4), no evident pattern related to the tested moderators was observed. Of the 25 matched correlation coefficients and SMDs, 10 showed high r and low SMD, and these were distributed relatively equally across the moderators that fulfilled the requirements for the test of moderators. Moreover, although high r and low SMD ESs appeared more frequently in certain subgroups (i.e., short-stage VO2max protocols, single GXT, and gas analysis), several other matched ESs from studies using these moderators did not show high r and low SMD. Interestingly, a pattern did emerge at the study level: if one matched pair of ESs showed high r and low SMD, so did all the other ESs from that study. This implies that the combination of low SMD and high r was relatively uniform within studies, suggesting further methodological or clinical differences that remained uncovered by the moderator analysis. Studies using units such as b/min or L/min, which can be considered non-spurious measures (i.e., not normalized to individual characteristics), tended to show weak to moderate correlations or high SMDs, with one exception [18]. In contrast, studies with low SMD and high correlations tended to use potentially spurious measures, such as %HRmax, %VO2max, and mL/min/kg (see the dark grey rectangle in Figure 4). This may be because expressing exercise intensity in relative terms normalizes it across individuals, thereby reducing interindividual variability.
The present study is not without limitations. First, this systematic review included only peer-reviewed studies published in English, restricted to healthy adult subjects, and pooled data from nonrandomized studies. Therefore, many confounding factors that might affect the correlation between FATmax and AerT could not be controlled. Second, although the total number of participants was large, the numbers in several subgroups were relatively limited, which hindered appropriate statistical analysis. Third, although these findings strongly suggest that a relationship exists between the two intensities, correlations and SMDs are not sufficient to confirm that the intensities coincide or that the error between the two measures is small. Indeed, agreement and correlation are widely used concepts for assessing the relationship between variables, and although similar and related, they represent different notions of connection [57]. Because the selected studies lacked the information needed to assess agreement directly, our results could not provide direct information on the levels of agreement and error between the parameters. However, since high correlations and low SMDs are both necessary conditions for high levels of agreement, our study was able to provide a proxy of agreement. Nevertheless, a future study exploring agreement is required.
The current results have wide practical implications. The development of affordable and unconventional field tests, such as methods based on heart rate variability [58], perceived exertion [59], and respiratory frequency [60], could allow the AerT to be identified with relatively good accuracy, simplicity, and speed. From a physiological perspective, AerT could be considered a "biomarker" for FATmax identification (and vice versa), which in turn could replace costly tests aimed at detecting these indices. From a practical point of view, the existence of a FATmax and AerT association has important implications for both athletes and clinical populations, providing a basis for more applied and individualized training prescriptions, since FATmax training has been described as an effective method to enhance physical activity and fat oxidation [6,7]. This knowledge can improve training effectiveness by helping to recognize each individual's unique physiological and energetic demands, allowing an accurate and highly tailored aerobic training prescription.

Conclusions
Several recommendations can be drawn from our results. First, methodological and clinical differences between and within studies can play a decisive role in establishing the correlation between FATmax and AerT. The use of mL/min/kg as the measurement unit, short GXT stages, and gas analysis is recommended to standardize future trials. In addition, the assessment should preferably be performed during a single GXT session, whereas clinical differences, ergometer type, and FATmax detection method all have a trivial influence on correlation strength. Moreover, our MA yielded wide PIs, ranging from almost 0 to 1. Therefore, if methods are not standardized, the r of future studies can be expected to range from almost no correlation to near-perfect correlation. Hence, we conclude that when methodological differences are standardized, a strong correlation between FATmax and AerT is present.
Author Contributions: R.P. and C.F.M. participated in the design of the study, contributed to data collection, data reduction/analysis, and interpretation of results; Z.N., M.M., and F.J.A.-G. participated in the design of the study and contributed to data collection; P.T. contributed to data analysis and interpretation of results. All authors equally contributed to the manuscript writing. All authors have read and agreed to the published version of the manuscript, and agree with the order of presentation of the authors.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical review and approval were waived for this study because it is a systematic review and meta-analysis.
Informed Consent Statement: Patient consent was waived because this study is a systematic review and meta-analysis.