A Meta-Analysis of Sampled Maximal Aerobic Capacity Data for Boys Aged 11 Years Old or Less Obtained by Cycle Ergometry

The aim of this study was to develop distributions of VO2max based on measured values that exist in the literature in prepubertal boys using cycle ergometry. PRISMA guidelines were followed in conducting this research. One database was searched for peak and maximal VO2 values in healthy boys with mean age under 11 years old. Data were split into articles reporting absolute and relative VO2max values and analyzed accordingly. Multilevel models grounded in Bayesian principles were used. We investigated associations between VO2max and body mass, year of the study, and country of origin. Differences in “peak” and “maximal” VO2 were assessed. Absolute VO2max (Lmin−1) increases with age (P ~100%) but mean relative VO2max does not change (P ~100%). Absolute VO2max is higher in more recent studies (P = 95.7 ± 0.3%) and mean relative VO2max is lower (P = 99.6 ± 0.1%). Relative VO2max in the USA is lower compared with boys from other countries (P = 98.8 ± 0.2%), but there are no differences in absolute values. Mean aerobic capacity estimates presented as “peak” values are higher than “maximal” values on an absolute basis (P = 97.5 ± 0.3%) but not on a relative basis (P = 99.6 ± 0.1%). Heavier boys have lower cardiorespiratory fitness (P ≈ 100%), and body mass seems to be increasing faster with age in the USA compared with other countries (P = 92.3 ± 0.3%). New reference values for cardiorespiratory fitness are presented for prepubertal boys obtained with cycle ergometry. This is new, as no reference values have been determined so far based on actual measured values in prepubertal boys. Aerobic capacity normalized to body weight does not change with age. Cardiorespiratory fitness in prepubertal boys is declining, which is associated with increasing body mass over the last few decades. Lastly, this study did not find any statistically significant difference in the sample’s mean aerobic capacity estimates using the ”peak” and “maximum” distinctions identified in the literature.


Introduction
Cardiopulmonary exercise testing is considered the gold standard for cardiorespiratory fitness (CRF) in pediatric medicine [1]. In exercise testing, maximal oxygen uptake (VO 2max ) is determined by using indirect calorimetry, which requires a skilled clinician and the use of standardized exercise treadmill protocols or cycle ergometry [2]. Reference values for CRF are needed to assess disease progression, for intervention monitoring, or to assess suboptimal aerobic performance [3]. Even though both treadmills and cycle ergometers are considered criterion measures of CRF, the two methods often produce statistically different estimates for the same child. The differences may be as large as 7-15%, with treadmill estimates being higher than cycle estimates [4][5][6]. Cycle ergometry has an advantage, however, as the test is not easily constrained by the mechanical limitations of the patient, such as deviant walking patterns and soreness in joints. In addition, during cycle ergometry, there is a lower chance of movement artifacts in the ECG and blood pressure recordings [1].
Another issue in exercise physiology is which criteria should be used for determining CRF. For a VO 2max determination, a plateau of VO 2 needs to be achieved. Usually, this is hard to attain in children. If relaxed criteria are used, the highest VO 2 value is then called VO 2peak . We know that the exercise physiology literature distinguishes between VO 2max and VO 2peak metrics, and we preserve this terminological difference in the tables but use CRF to designate maximal oxygen consumption in both cases. Differences in the two metrics are another aspect of uncertainty surrounding what is being measured. In fact, some reviewers state that the highly conditional nature of CRF estimates makes their validity and reliability questionable, especially during growth and maturation [7].
The literature has shown that a statistically significant difference in CRF could exist between girls and boys using absolute VO 2max values or relative VO 2max to body mass, and boys could have higher CRF values [7]. In addition, young girls and boys participate in different activities, and girls are less involved in organized sports and spend less time practicing [8]. These differences in activities cause boys to be outdoors more than girls, on average [9,10]. Thus, there are physiological and behavioral reasons why CRF estimates should be gender specific. Secondly, younger children spend more time outdoors and undertake more moderate and vigorous physical activities than older children, even if that time is modest-not optimal-for almost everyone [8,11,12]. When looking into CRF for children, using only children with mean age under 11 years of age can minimize physiological complications associated with puberty, resulting in significant changes in total body and muscle mass, stroke volume, growth velocity, oxygen uptake kinetics, fat oxidation rates, and blood lactate responses to work [13][14][15].
Given the lack of age-and gender-specific CRF reference values in prepubertal children, there is a need to develop observed distributions of VO 2max based on criterion methods rather than estimated or regression-based predicted values that are currently widely used. To the best of our knowledge, this is the first meta-analysis to critically examine CRF in boys 11 y old or less measured with cycle ergometry and distinguishing between VO 2max and VO 2peak indicators. The aim of this analysis is to provide researchers, medical experts, and sports practitioners with criterion-based observed values based on sampled studies identified in the literature. In addition, the aim of this paper is to assess whether there are significant differences in VO 2max and VO 2peak in boys under 11 y old and to compare values in girls of the same age.

Design
The Consideration of Population, Intervention, Comparator, Outcomes, and Study design (PICOS) framework was used.

Population
Included subjects were subjects who: (1) had mean age under 11 years old, (2) were stated to be healthy, (3) were without cardiovascular disease, pulmonary diseases (except asthma), morbid obesity, developmental disabilities, or muscular dystrophies, and (4) were free from injury. Overweight participants were included. Children with asthma were also included, as they seemed to have physical activity levels comparable with those of the normal pediatric population [16].

Intervention
Regardless of the interventions reported in many of the original articles, only preintervention data were used.

Comparator
VO 2max and VO 2peak metrics used to denote aerobic capacity were compared. In addition, comparisons were made based on the year and location of the study (USA versus non-USA countries; conducted solely to provide compatible sample sizes).

Outcomes
The main outcomes were VO 2max and VO 2peak metrics measured with cycle ergometry.

Study Design
Articles were considered for the analysis if: (1) they were published in a peer-reviewed journal, (2) they had mean/standard deviation VO 2max /VO 2peak parameters for each sample, along with mean age data for the subjects, and (3) if maximal effort was achieved during the incremental test.
These measures provided us with children (girls and boys) that used various testing methods for measuring CRF. Consequently, all articles reporting CRF of girls and mixedgender groups were removed from further analysis. In addition, analysis excluded articles with graphical results only, field studies, treadmill incremental tests, or other nonstandardized protocols and types of incremental tests. The study followed PRISMA guidelines, and the flow diagram presenting the study design can be found in Figure 1.

Intervention
Regardless of the interventions reported in many of the original articles, only preintervention data were used.

Comparator
VO2max and VO2peak metrics used to denote aerobic capacity were compared. In addition, comparisons were made based on the year and location of the study (USA versus non-USA countries; conducted solely to provide compatible sample sizes).

Outcomes
The main outcomes were VO2max and VO2peak metrics measured with cycle ergometry.

Study Design
Articles were considered for the analysis if: (1) they were published in a peerreviewed journal, (2) they had mean/standard deviation VO2max/VO2peak parameters for each sample, along with mean age data for the subjects, and (3) if maximal effort was achieved during the incremental test.
These measures provided us with children (girls and boys) that used various testing methods for measuring CRF. Consequently, all articles reporting CRF of girls and mixedgender groups were removed from further analysis. In addition, analysis excluded articles with graphical results only, field studies, treadmill incremental tests, or other nonstandardized protocols and types of incremental tests. The study followed PRISMA guidelines, and the flow diagram presenting the study design can be found in Figure 1.  potential articles included boys and girls with incremental tests in cycle ergometry and treadmills. All potential articles up to 2019 were hand searched by two researchers. In 2022, the same criteria were used to conduct an additional search from 1 January 2019 to 31 March 2022. Finally, articles that included data from girls and using treadmills were excluded. It is beyond the scope of one manuscript to include boys and girls, so girls were analyzed in a separate paper (in publication). This research includes only the comparison between boys and girls to determine whether there are any differences in cardiorespiratory fitness.

Statistical Analysis
All statistical conclusions developed in this paper utilize applied Bayesian inferential methodology included in the STAN library for R programming [17]. Among other attributes and capabilities, STAN promotes the use of Bayesian inferential models to allow a researcher to evaluate the likelihood that one distribution-in our case, VO 2max -has the same statistical properties as another distribution purported to describe the same phenomenon. In general, random probability Markov chain Monte Carlo (MCMC) algorithms are used to sample from the two distributions to facilitate the comparisons. A wide variety of Bayesian model comparison techniques are available in STAN to facilitate these types of statistical testing.
When comparing two groups, we used a simple normal model (the so-called Bayesian t-test): where y is the input data (VO 2max or VO 2peak measurements), µ is the location parameter, and σ the scale parameter. The default Stan priors (flat improper priors) were used. As observed, y is assumed to be approximately normally distributed.
For the linear regression model, the equation: was used, where y is the input data (VO 2max or VO 2peak measurements), x is the dependent variable (e.g., age or year of study), α 1 is the intercept for the location parameter, β 1 is the regression coefficient for the location parameter, α 2 is the intercept for the scale parameter, and β 2 is the regression coefficient for the scale parameter.
In other words, this model can detect both changes in the mean VO 2max and the VO 2max between-study variance through time or with age. Before making any statistical inferences, we executed all the necessary diagnostics (e.g., trace plots, estimated sample sizes, posterior predictive checks) to ensure the suitability of our models.
With P, we denote the probability that a particular research claim is true. We used a capital P to not confuse the probabilities calculated with Bayesian analyses with P-values from frequentist statistics. Unlike with P-values, with Bayesian statistics, we can directly quantify the probability (P) of a particular research question, which arguably provides us with the most direct, transparent, and intuitive measure of how certain we are about a claim we are making. Note that with Bayesian approaches, we can easily calculate the probability that the opposite of a particular claim is true (1 − P). Because of all this, the use of Bayesian statistical analyses has been on the rise over the last couple of years [17][18][19]. Uncertainty in all our analyses is reported with the Monte Carlo standard error (MCSE) measure.

Results
The analyses included 95 study samples that reported absolute values of aerobic capacity (both VO 2max and VO 2peak metrics) in units of Lmin −1 (included articles can be found in Table S1) and 118 study samples that reported relative VO 2max /VO 2peak measures in units of mLkg −1 min −1 (included articles can be found in Table S2) [15,. Observed distributions of VO 2max/peak are presented in Table 1. We are as sure as we can be (P~100%) that absolute CRF increases with age. The probability that the between-study standard deviation increases with age is 87.1 ± 1.4% ( Figure 2). Looking into differences between VO 2max and VO 2peak , this study suggests that mean VO 2peak is larger than mean VO 2max . We can claim this with a probability of 97.5 ± 0.3% ( Figure 3). When checking for any changes across the years, the analysis showed that the mean absolute CRF is higher in more recent studies. We can claim this with a probability of 95.7 ± 0.3%. The between-study variability seems to be dropping, but we can claim this only with less than a 90% certainty (P = 89.9 ± 0.3%) ( Figure 4). We also looked for any differences between CRF in the USA and other countries of the world. We cannot claim there are differences here ( Figure 5). Lastly, we looked into body mass. In articles reporting absolute values, CRF (Lmin −1 ) is higher in boys with greater body mass (P ≈ 100%) ( Figure 6). Looking into differences in body mass between the USA and countries in the rest of the world, there are no significant differences between trends in body mass (Figure 7), but in general, USA boys seem to be heavier (P = 96.05 ± 0.4%). What is more, boys in studies using VO 2peak seem to be heavier than those with VO 2max (P = 98.7 ± 0.2%).
found in Table S1) and 118 study samples that reported relative VO2max/VO2peak measures in units of mLkg −1 min −1 (included articles can be found in Table S2) [15,. Observed distributions of VO2max/peak are presented in Table 1. We are as sure as we can be (P ~100%) that absolute CRF increases with age. The probability that the between-study standard deviation increases with age is 87.1 ± 1.4% (Figure 2). Looking into differences between VO2max and VO2peak, this study suggests that mean VO2peak is larger than mean VO2max. We can claim this with a probability of 97.5 ± 0.3% (Figure 3). When checking for any changes across the years, the analysis showed that the mean absolute CRF is higher in more recent studies. We can claim this with a probability of 95.7 ± 0.3%. The between-study variability seems to be dropping, but we can claim this only with less than a 90% certainty (P = 89.9 ± 0.3%) (Figure 4). We also looked for any differences between CRF in the USA and other countries of the world. We cannot claim there are differences here ( Figure 5). Lastly, we looked into body mass. In articles reporting absolute values, CRF (Lmin −1 ) is higher in boys with greater body mass (P ≈ 100%) ( Figure 6). Looking into differences in body mass between the USA and countries in the rest of the world, there are no significant differences between trends in body mass (Figure 7), but in general, USA boys seem to be heavier (P = 96.05 ± 0.4%). What is more, boys in studies using VO2peak seem to be heavier than those with VO2max (P = 98.7 ± 0.2%).       ). This is the opposite of the finding with mean relative VO2max (mLkg −1 min −1 ), which is lower in newer studies (lower figure). We can claim this with a probability of 99.57 ± 0.05%. The between-study variability seems to be dropping, and we can claim this with a probability of 90.94 ± 0.3%.  ). This is the opposite of the finding with mean relative VO 2max (mLkg −1 min −1 ), which is lower in newer studies (lower figure). We can claim this with a probability of 99.57 ± 0.05%. The between-study variability seems to be dropping, and we can claim this with a probability of 90.94 ± 0.3%.   We cannot claim that the mean relative CRF or its standard deviation changes with age ( Figure 2). Secondly, we checked for differences between VO2peak and VO2max. The opposite was found in the absolute CRF. Our study suggests that mean VO2max is higher than mean VO2peak. We can claim this with a probability of 99.6 ± 0.1% (Figure 3). Moreover, it seems that the mean relative CRF is lower in more recent studies. We can claim this with a probability of 99.6 ± 0.1%. The between-study variability seems to be dropping, which we can claim with a probability of only 90.9 ± 0.3% (Figure 4). Our study also suggests that the mean relative CRF in other countries is higher than in the USA. We We cannot claim that the mean relative CRF or its standard deviation changes with age ( Figure 2). Secondly, we checked for differences between VO 2peak and VO 2max . The opposite was found in the absolute CRF. Our study suggests that mean VO 2max is higher than mean VO 2peak . We can claim this with a probability of 99.6 ± 0.1% (Figure 3). Moreover, it seems that the mean relative CRF is lower in more recent studies. We can claim this with a probability of 99.6 ± 0.1%. The between-study variability seems to be dropping, which we can claim with a probability of only 90.9 ± 0.3% (Figure 4). Our study also suggests that the mean relative CRF in other countries is higher than in the USA. We can claim this with a probability of 98.8 ± 0.2% ( Figure 5). Investigating body mass in articles reporting relative values, mean relative CRF (mLkg −1 min −1 ) is lower when participants have higher body mass (P ≈ 100%) ( Figure 6). It could be that USA boys are a bit heavier on average, but we cannot claim this with a very high probability (P = 82.9 ± 0.7%). We did, however, observe that in these articles, body mass seems to be increasing faster with age in the USA compared to other countries (P = 92.3 ± 0.3%) (Figure 7). Finally, boys in studies using VO 2peak also seem to be heavier in articles reporting relative values compared with studies with VO 2max (P = 99.6 ± 0.1%).

Is There Any Difference between Boys and Girls?
No significant differences exist between relative cardiorespiratory fitness values in prepubertal boys and girls. The probability that boys have lower values than girls is only 73.6 ± 1%.

Models in Practice
At https://demsarjure.shinyapps.io/vo 2max /, (access date: 26 December 2022) a simple app can be found in which the measured VO 2max in Lmin −1 , participant's age, and weight are put in the calculator. The app then uses the fitted Bayesian models to calculate and visualize the percentile for the data that were provided. The app calculates absolute VO 2max when VO 2max and age are provided and relative VO 2max when weight is provided as well. The dashed vertical lines denote the 95% CI.

Discussion
To the best of our knowledge, this is a meta-analysis with the largest dataset of CRF measurements in boys with mean age under 11 that performed cycle ergometry. Based on the articles included, normative values for prepubertal boys are presented, and a prediction model based on age has been developed for researchers and clinicians to use.
Children with mean ages 4 to 11 were included in this meta-analysis. Relative CRF (normalized to body mass) or its standard deviation did not change with age, which is in line with norms found in boys from 8 to 18 years old [116]. However, articles reporting CRF not normalized to body mass (mean absolute CRF) showed that CRF and its standard deviation in prepubertal boys are dependent on age. This can be explained by increasing body mass as boys age. Body mass is metabolically active tissue that uses oxygen consumption during exercise. This finding is supported by higher absolute CRF in heavier boys in this metaanalysis. Interestingly, mean relative CRF is lower in heavier boys, which we suggest can be explained by the methodology used. We excluded only morbid obesity, so overweight subjects were included in our analysis. The decision to include them seemed necessary since obesity has become a global epidemic during the last three decades, especially in developed countries. In 2013, 23.8% of boys were overweight or obese [117]. In children with obesity, CRF has declined in the last decades, and it is vital to improve the level of physical activity and to improve their aerobic fitness [118]. Creating normative values for boys with normal body mass index only would not be useful for those who are overweight or obese. Heavier children might be more susceptible to cardiovascular risk later in life and will need clinical evaluation and follow-up. To conclude, heavier boys seem to have lower aerobic capacity, which can be explained by overweight individuals also included in this analysis. Having normative values for boys (both absolute and relative CRF values) is, thus, necessary for understanding an individual's fitness and could gain even greater importance as boys seem to have become heavier in recent years.
Although specifics of determining VO 2peak in contrast to VO 2max are widely discussed in the literature, there are no recommendations for their use in children based on large studies. This analysis showed that mean VO 2peak is higher than mean VO 2max in studies reporting absolute values, and the opposite was found in studies with relative values. In theory, lower VO 2peak than VO 2max could be explained by subjects not reaching their actual maximal oxygen uptake in articles using VO 2peak metric, which is a general concern when Life 2023, 13, 276 12 of 18 reporting maximal VO 2 in children and adults. However, there is no clear explanation why children with measured absolute values would show higher VO 2peak than mean VO 2max . Our study cannot provide clarification of this finding. However, we would like to suggest that higher body mass in boys with VO 2peak as compared with boys with VO 2max could be a reason for this. We also observed higher body mass in studies using relative VO 2peak as compared with boys with relative VO 2max , but these values are normalized to body mass.
In more recent studies, absolute CRF values (not normalized to body mass) are higher and mean relative CRF values (normalized to body mass) are lower. We can assume this is the result of increasing body mass in boys involved in the studies analyzed. The betweenstudy variability is dropping, which we suggest can be explained by improved methodological approaches and more articles in recent years (85 groups prior to 1995 vs. 128 subject groups after 1995). Understanding that based on these results, prepubertal boys are becoming less fit, which is not only observed or estimated but can now be supported by actual VO 2max measurements as well.
Finally, there is no difference between absolute CRF in the USA and other countries, but relative values are lower in the USA. We found an association between higher body mass and higher absolute values, and we can also say with certainty that USA boys are heavier in the studies included in our analysis. Both absolute and relative values thus indicate that prepubertal boys from the USA have lower endurance capacity than boys from other countries. If we try to interpret that with the data from our analysis that body mass in the USA increases faster than in other countries, we can expect endurance capacity to decrease even further in the future. This is alarming since CRF is the most important marker of health among the health-related physical fitness components in children and adolescents [119][120][121], and there is an inverse relationship between cardiorespiratory fitness during childhood and cardiovascular disease risk factors in adulthood [122].

Limitations
There are some limitations that should be considered. Firstly, we did not distinguish among the many protocols used for each of the approaches during cycle ergometry. These protocols are quite important in estimating aerobic capacity, but protocol nuances used by individual laboratories make sorting them into logical categories very difficult. The same can be said about criteria used to determine whether a child has attained his personal best CRF for a particular test averaging time [123]. These criteria usually include: RER (≥1.0), no change in VO 2 with increasing workload (i.e., a plateau in VO 2 ), visible signs of exhaustion, and attainment of age-predicted heart rate or some percentage of it [5,124] but are not identical in all studies involved.

Conclusions
New reference values for CRF are presented for prepubertal boys that can be used for physical fitness classification on an individual level for medical experts and sports practitioners. Aerobic capacity normalized to body weight does not change with mean age in boys 4-11 years old, which can be very useful in clinical settings for early diagnosis of reduced cardiorespiratory fitness. It seems that values are not different from those in prepubertal girls.
CRF in prepubertal boys is declining. Our results show that this is associated with increasing body mass, but it also suggests boys with mean age under 11 years old might be less active than in the past. In addition, aerobic capacity is lower in boys in the USA, which is associated with increased body mass. In light of the obesity pandemic, these results indicate more action is needed to improve physical activity in prepubertal boys in order to reduce the likelihood of increased cardiovascular risk later in life. CRF references presented by this analysis can aid in evaluating obesity criteria and physical fitness in prepubertal boys.
Finally, this study did not find any advantage in determining CRF values with the VO 2peak or VO 2max metrics. Based on our findings, it seems that in prepubertal boys, the Life 2023, 13, 276 13 of 18 differences are not significant enough to be important. This can help researchers and clinicians as they perform cardiopulmonary tests on prepuberal boys.