Comparison of Polar Active Watch and Waist- and Wrist-Worn ActiGraph Accelerometers for Measuring Children’s Physical Activity Levels during Unstructured Afterschool Programs

Background: The purpose of this study was to examine the convergent validity of the Polar Active Watch (PAW), a consumer-grade wrist-worn activity monitor, against waist- and wrist-worn research-grade monitors, the ActiGraph GT3X+/GT9X accelerometers, in children. Methods: Fifty-one children (18 boys; mean age = 10.30 ± 0.91 years) wore the three monitors (PAW, GT3X+, and GT9X) during an 80-min afterschool program across five school days. Time spent in sedentary behavior, light-intensity physical activity (LPA), and moderate- and vigorous-intensity physical activity (MVPA) was estimated from each monitor. Correlation, mixed model, mean absolute percentage error, equivalence testing, and Bland-Altman analyses were used to examine the comparability of PA estimates of the PAW with those of the GT3X+/GT9X accelerometers. Results: Moderate to strong correlations for sedentary and MVPA minutes, and weak correlations for LPA, were observed between the PAW and GT3X+/GT9X accelerometers. Significant mean differences were found, where the PAW tended to overestimate time in sedentary and MVPA and underestimate LPA minutes, compared to the GT3X+/GT9X accelerometers. However, a non-significant mean difference in MVPA minutes was observed when using an adjusted MET threshold (≥4 METs) for the PAW, compared to the GT3X+ accelerometer. Conclusions: The PAW showed moderate convergent validity for sedentary and MVPA minutes against the GT3X+/GT9X accelerometers. However, caution is needed in the direct comparison between the monitors due to relatively large mean differences and within-group variability.


Introduction
Physical activity (PA) is one of the most important lifestyle behaviors associated with better health and wellbeing in all age groups. In particular, there is an extensive body of literature demonstrating positive health benefits of PA in children [1,2], such as a reduced risk of overweight and obesity [3], as well as better metabolic [4], bone [5], and mental health [6]. As a result, world and national public health authorities have released PA guidelines suggesting that school-aged children should engage in 60 min of daily moderate- and vigorous-intensity PA (MVPA) [7,8]. Further, given the importance of school as a place for the promotion of PA in children, the Institute of Medicine recommends that at least 30 min of MVPA per school day be accumulated in the school environment [9], which includes, but is not limited to, physical education, before- and after-school PA programs, and recess [10].
it is supposed to measure), it provides a form of evidence of construct validity, particularly when the counterpart monitor has known validity against a criterion measure [38,39]. In this regard, we compared PA intensity levels estimated from the PAW to those from research-grade PA monitors, the ActiGraph accelerometers, which have been previously validated in children [40,41] and frequently compared to different types of consumer-grade PA monitors [42][43][44]. Although the ActiGraph accelerometers have been extensively validated in the literature, the evidence is specific to the location of placement (wrist vs. waist), the activity cut-points, and the model of ActiGraph accelerometer used. Therefore, this study examined the convergent validity of the PAW for the assessment of different PA intensity levels in children, with particular focus on its comparability with the waist-worn GT3X+ and wrist-worn GT9X ActiGraph accelerometers, using previously established cut-points for each wear location.

Study Setting and Participants
A total of 51 children (18 boys) aged between 9 and 11 years (mean age = 10.30 years) participated in this study. All children were enrolled in a school-based afterschool program in a US public elementary school located in a high-poverty area of a mid-size west Texas city. The afterschool program, established as part of the US federally funded East Lubbock Promise Neighborhood (ELPN) project, offers a variety of sessions that include, but are not limited to, PA, arts, music, and science every school day. Each child had an individualized schedule requiring participation in two sessions per day of 40 min each, one of which was a mandatory PA session on every school day. The PA session was designed to promote moderate- and vigorous-intensity PA (MVPA) through age-appropriate and task-oriented activities (e.g., kicking, running, throwing), and was taught by trained undergraduate- and graduate-level coaches in the school gym or on the playground. The non-PA sessions were delivered in a classroom, auditorium, or technology/science lab, where children were supervised and taught by trained afterschool teachers in a relaxed environment. Thus, the afterschool program covered the range of activity levels, from sedentary to vigorous intensity, in supervised school environments. Children participating in the afterschool program were healthy, without physical impairments that would limit their physical activities. Each child's standing height (cm) and body weight (kg) were measured by trained staff using a stadiometer (SECA, Seca Co., Hamburg, Germany) and a mechanical scale (Health-O-Meter Professional, Sunbeam Products Co., Boca Raton, FL, USA). Participant characteristics are presented in Table 1.
This study was conducted under the ELPN protocols approved by the Institutional Review Board of Texas Tech University (#503995), in accordance with the Declaration of Helsinki, and the parents' informed consent and child's verbal assent were obtained at the beginning of the semester. Interested readers can refer to a previous report [45] for more details about the ELPN project.

Procedures
PA data were collected using three accelerometers (the PAW, the waist-worn GT3X+, and the wrist-worn GT9X) during 80 min of afterschool programs for up to five school days for each child. Children were asked to visit a pre-designated classroom before the start of their first afterschool program session each day. Children were instructed to wear the PAW and GT9X accelerometer in the same order on their non-dominant wrists, and the GT3X+ accelerometer on an elastic belt around their waists, until the end of the afterschool program each day. The choice of wear location for the respective ActiGraph accelerometer was based solely on the discretion of the researchers, given the size of the accelerometers relative to the children's wrists (i.e., GT9X < GT3X+), as well as available resources (i.e., wristbands and elastic belts).

Polar Active Watch
The PAW is a lightweight (45 g), watch-style uniaxial accelerometer, with a screen on its face displaying time and activity levels. The child's sex, date of birth, height, and weight were entered into the Polar GOFIT website (https://polargofit.com/), and the Polar Websync Software (version 2.8.3, Polar Electro Oy, Kempele, Finland) was used to initialize the device and transfer data from the device for further analyses. The PAW provides 30-s epoch metabolic equivalent task (MET) scores estimated by the manufacturer's proprietary algorithm.

ActiGraph GT3X+ and GT9X Accelerometers
The GT3X+ (firmware v3.2.1, ActiGraph Inc, Pensacola, FL, USA) and GT9X (firmware v1.7.1) are two of the latest generations of ActiGraph accelerometers, which can be worn either at the waist or on the wrist using an elastic belt or watch band. Both devices record accelerations in three axes (vertical, antero-posterior, and medio-lateral), with a dynamic range of ±6 g for the GT3X+ and ±8 g for the GT9X, at sampling frequencies ranging from 30 to 100 Hz, which are converted into activity counts at a user-defined epoch length. For this study, the devices were set to collect data at 30 Hz, and the data were downloaded in 30-s epochs using the ActiGraph ActiLife software (version 6.13.3, ActiGraph Inc, Pensacola, FL, USA). All devices were synchronized to the time on the computer clock.
Time spent in different PA intensity levels was estimated. For the PAW, three different sets of MET thresholds were used. The first set (PAW#1) comprised the Polar-defined MET thresholds in the Polar Active software. The second set (PAW#2) comprised the standard MET thresholds widely used in the literature; yet these thresholds may not be appropriate for children when a child-specific resting metabolic rate (RMR) is not accounted for [46,47]. Since it is currently unknown whether the Polar proprietary algorithm adjusts for the higher RMR in children when estimating MET values, we created an additional set (PAW#3), which has been used as alternative MET thresholds for children [47], including in US national health surveillance [48].
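The classification step can be illustrated with a minimal Python sketch. It assumes the threshold values given in the Discussion (standard: <1.5 METs sedentary, ≥3 METs MVPA; adjusted: <2.0 and ≥4 METs; Polar-defined: ≥3.5 METs MVPA, with the sedentary bound read here as <2.0 METs). The function and variable names are hypothetical and are not part of the Polar software or of the original SAS analysis.

```python
from collections import Counter

# (sedentary upper bound, MVPA lower bound) in METs.
# Values are taken from the Discussion; the names are illustrative only.
MET_THRESHOLDS = {
    "PAW#1": (2.0, 3.5),  # Polar-defined (sedentary bound assumed to be <2.0)
    "PAW#2": (1.5, 3.0),  # standard thresholds
    "PAW#3": (2.0, 4.0),  # adjusted thresholds for children
}

EPOCH_MIN = 0.5  # each PAW epoch is 30 s, i.e., half a minute


def classify_met(met, threshold_set):
    """Return 'sedentary', 'LPA', or 'MVPA' for one 30-s MET value."""
    sed_upper, mvpa_lower = MET_THRESHOLDS[threshold_set]
    if met < sed_upper:
        return "sedentary"
    if met >= mvpa_lower:
        return "MVPA"
    return "LPA"


def minutes_by_intensity(met_epochs, threshold_set):
    """Sum minutes per intensity level over a sequence of 30-s MET values."""
    counts = Counter(classify_met(m, threshold_set) for m in met_epochs)
    return {level: counts.get(level, 0) * EPOCH_MIN
            for level in ("sedentary", "LPA", "MVPA")}
```

Applying the three threshold sets to the same MET series shows how the choice of thresholds alone shifts minutes between intensity levels.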
As previously noted, several activity cut-points have been proposed for the waist- and wrist-worn ActiGraph accelerometers, yet no single cut-point is recognized as standard. For this study, we selected one set of previously proposed cut-points for each wear location: Evenson's cut-points [40] for the waist-worn GT3X+ accelerometer and Chandler's cut-points for the wrist-worn GT9X accelerometer, using activity counts per 30-s from the vertical axis, according to Reference [49].
Evenson's cut-points were developed using an older generation of ActiGraph accelerometer (model 7164), which has different internal mechanisms compared to the GT3X+ accelerometer. Based on the suggestion of Cain et al. [50], we enabled the low frequency extension filter during post data processing to attenuate possible biases in applying Evenson's cut-points to the GT3X+ activity counts data. Chandler's cut-points were developed using the GT3X+ accelerometer; however, a recent study reported that activity counts estimated from the GT3X+ and GT9X accelerometers were comparable [51], regardless of activation of the low frequency extension filter [52]. For consistency, the low frequency extension filter was also enabled when processing the GT9X accelerometer data. In addition, Evenson's and Chandler's cut-points, which were originally calibrated in 15-s and 5-s epoch lengths, respectively, were converted to 30-s cut-points by multiplying by 2 and 6, respectively.
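The epoch conversion described above is a linear rescaling of the count thresholds. The following Python sketch shows the arithmetic; the function name is hypothetical, and the threshold numbers are placeholders for illustration, not the published Evenson or Chandler cut-points.

```python
def scale_cutpoints(cutpoints, original_epoch_s, target_epoch_s=30):
    """Scale count thresholds linearly from one epoch length to another."""
    if target_epoch_s % original_epoch_s != 0:
        raise ValueError("target epoch must be a whole multiple of the original")
    factor = target_epoch_s // original_epoch_s
    return {name: value * factor for name, value in cutpoints.items()}


# Placeholder thresholds (counts per epoch), NOT the published cut-points.
example_15s = {"lpa_min": 30, "mvpa_min": 600}
example_5s = {"lpa_min": 10, "mvpa_min": 200}

print(scale_cutpoints(example_15s, 15))  # 15-s thresholds multiplied by 2
print(scale_cutpoints(example_5s, 5))    # 5-s thresholds multiplied by 6
```

Linear scaling assumes that counts accumulate proportionally with epoch length, which is the assumption acknowledged as a possible source of error in the limitations.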
Time spent in different PA intensity levels during afterschool periods was calculated for each monitoring day. The first and last 10 min of the data were removed prior to the calculation to account for the time spent distributing and returning the devices. There were 152 data points recorded across the entire sample (median monitoring days: 3 days; interquartile range: 2-4).
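As a small illustration of the trimming step, the sketch below drops the first and last 10 min from a session recorded in 30-s epochs. The function name and defaults are assumptions for illustration; with an 80-min session (160 epochs), 120 epochs, or 60 min, remain for analysis.

```python
def trim_session(epochs, epoch_s=30, trim_min=10):
    """Drop the first and last `trim_min` minutes from a list of epochs."""
    n_trim = trim_min * 60 // epoch_s  # number of epochs to drop at each end
    if len(epochs) <= 2 * n_trim:
        raise ValueError("session is too short to trim")
    return epochs[n_trim:len(epochs) - n_trim]


session = list(range(160))        # 80 min of 30-s epochs (epoch indices)
analyzed = trim_session(session)  # keeps epochs 20 through 139
print(len(analyzed) * 30 / 60)    # analyzed duration in minutes
```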

Statistical Analysis
We first created scatter plots to examine the linear relation between MET values per 30-s obtained from the PAW and activity counts per 30-s obtained from the GT3X+ and GT9X accelerometers, respectively, for each monitoring day. The purpose of this analysis was to examine the convergence between monitors in a way that is not influenced by the choice of MET thresholds for the PAW or activity count cut-points for the GT3X+ and GT9X accelerometers. The average time spent in different PA intensity levels was calculated and compared between monitors using a mixed model, with a random intercept accounting for multiple observations within each child. The correlation coefficients between the PAW and GT3X+ or GT9X accelerometers were calculated for each PA intensity level using a mixed model with a random intercept, as outlined in Hamlett et al. [53], for the same reason (i.e., multiple observations within each child for each monitor). The bias-corrected 95% confidence intervals for the correlation coefficients were estimated using the bootstrapping method, with 200 bootstrap resamples drawn from the observations at the child level.
The agreement of PA estimates between the PAW and GT3X+ (and GT9X) accelerometers was examined using mean absolute percentage error (MAPE) and a modified Bland-Altman method. The mean bias and 95% limits of agreement (LOA) were calculated using a one-way random effects model, where multiple monitoring days are nested within a random factor of subject and the LOA is defined by the true difference between monitors in addition to between- and within-subject random variability. The 95% confidence intervals of LOA were additionally calculated using the method of variance estimates recovery, which has been shown to be superior to a conventional approach using a delta method [54], in order to estimate the maximum limits of upper and lower LOA. Further details of the modified Bland-Altman method, and a SAS macro implementing the calculations of LOA with multiple observations within individuals, are available from Zou [54].
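For readers less familiar with these agreement metrics, the Python sketch below computes MAPE and a conventional Bland-Altman bias with 95% LOA. It is a simplification: it treats all observations as independent and therefore does not reproduce the one-way random effects model or the variance-estimates-recovery intervals used in this study. The function names are hypothetical, and MAPE is assumed here to be the mean of |PAW − reference| / reference × 100, with zero-valued reference observations skipped.

```python
from statistics import mean, stdev


def mape(test, reference):
    """Mean absolute percentage error of `test` relative to `reference` (%)."""
    pairs = [(t, r) for t, r in zip(test, reference) if r != 0]
    if not pairs:
        raise ValueError("no non-zero reference values")
    return 100 * mean(abs(t - r) / r for t, r in pairs)


def simple_bland_altman(test, reference):
    """Return (bias, lower LOA, upper LOA), ignoring repeated measures."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up minutes for illustration only (not study data).
paw_minutes = [12.0, 8.0, 13.0]
ref_minutes = [10.0, 10.0, 10.0]
print(mape(paw_minutes, ref_minutes))
print(simple_bland_altman(paw_minutes, ref_minutes))
```

Because MAPE averages absolute errors, over- and underestimation do not cancel, whereas the Bland-Altman bias can be near zero even when individual differences are large.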
In addition, an equivalence test was performed using the two one-sided t-tests (TOST) approach. Following the guidelines outlined in Dixon et al. [55], two one-sided hypotheses were formulated, where the mean ratio of PA estimates between monitors was compared with the upper and lower limits of 10% equivalence zones (i.e., Ha1: 0.9 < mean ratio; and Ha2: mean ratio < 1.11). Since the primary focus of this study was to examine convergent validity of the PAW relative to the ActiGraph accelerometers, PA estimates obtained from the GT3X+ and GT9X accelerometers were used as references when creating equivalence zones. The selection of 10% equivalence zones was based on previous studies comparing consumer- and research-grade activity monitors [56]. Two variables associated with each hypothesis were created for each monitoring day within each child (i.e., Da1 = PAW − 0.9 × GT3X+ (or GT9X); and Da2 = PAW − 1.1 × GT3X+ (or GT9X)). A mixed model with a random intercept was used to test the mean of each hypothesized variable against zero. The two-sided p-value estimated from the model (i.e., for the intercept parameter) was divided by 2 to obtain the one-sided p-value only if the effect was in the hypothesized direction (i.e., positive t-value for Da1 and negative t-value for Da2). The equivalence of PA estimates of the PAW with the estimates from the GT3X+ and GT9X accelerometers was established when one-sided p-values from both tests were less than 0.05 [55]. All data management and statistical analyses were performed using SAS v9.4 (SAS Institute, Cary, NC, USA), and statistical significance was set at p < 0.05.
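The bookkeeping of the TOST procedure can be sketched in Python as follows. The sketch builds the two derived variables and applies the p-value halving rule and the decision rule; it does not fit the random-intercept mixed model, so the two-sided p-values and t-values are taken as inputs from whatever model is fitted. Function names are hypothetical, and returning 1 − p/2 when the effect is in the opposite direction is the usual one-sided conversion, added here as an assumption because the text only specifies the halving case.

```python
def equivalence_variables(paw, reference, lower=0.9, upper=1.1):
    """Build Da1 = PAW - lower * ref and Da2 = PAW - upper * ref per observation."""
    da1 = [p - lower * r for p, r in zip(paw, reference)]
    da2 = [p - upper * r for p, r in zip(paw, reference)]
    return da1, da2


def one_sided_p(two_sided_p, t_value, expect_positive):
    """Convert a two-sided p-value to one-sided, given the hypothesized sign."""
    in_direction = t_value > 0 if expect_positive else t_value < 0
    # Halve only when the effect is in the hypothesized direction.
    return two_sided_p / 2 if in_direction else 1 - two_sided_p / 2


def is_equivalent(p_da1, t_da1, p_da2, t_da2, alpha=0.05):
    """Equivalence holds only if both one-sided tests are significant."""
    p1 = one_sided_p(p_da1, t_da1, expect_positive=True)   # mean(Da1) > 0
    p2 = one_sided_p(p_da2, t_da2, expect_positive=False)  # mean(Da2) < 0
    return p1 < alpha and p2 < alpha
```

For example, `is_equivalent(0.04, 2.1, 0.02, -2.5)` returns `True`, whereas a positive t-value for Da2 would fail the second test and the function would return `False`.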

Results
The scatter plots depicting linear relationships between MET values from the PAW and activity counts from the GT3X+ and GT9X accelerometers are presented in Figure 1. There were positive relationships, where MET values from the PAW were more strongly related to activity counts from the GT9X accelerometer (r = 0.72; 95% CI = 0.71-0.73) than to those from the GT3X+ accelerometer (r = 0.46; 95% CI = 0.45-0.47). The time spent in PA intensity levels estimated from a mixed model with a random intercept across the monitors is presented in Table 2. [...] 25-19.39), when compared with the PAW#1 and PAW#2. When compared with the estimates from the GT3X+ accelerometer, the PAW yielded significantly different PA estimates across all intensity levels, regardless of the MET thresholds used. Likewise, PA estimates from the GT9X were statistically different from the estimates of PAW#1 for all intensity levels, but non-significant differences were seen in LPA when compared with PAW#2, and in MVPA when compared with PAW#3. The results of the Bland-Altman analysis showing agreement of PA estimates from the PAW with the GT3X+ and GT9X accelerometers are shown in Figures 2 and 3, respectively.
Overall, more than 95% and 98% of data points fell within the 95% LOA and the maximum allowable LOA (i.e., upper and lower limits of the 95% confidence interval of LOA), respectively, across PA intensity levels.
Pertaining to the comparison with the GT3X+ accelerometer (Figure 1 [...]
Table 3 presents the correlations, MAPE, and mean ratios of PA estimates from the PAW relative to the GT3X+ and GT9X accelerometers. The correlations of the PAW with the GT3X+ accelerometer ranged between 0.20 and 0.67 for PAW#1, between 0.32 and 0.71 for PAW#2, and between 0.16 and 0.64 for PAW#3, where the strongest and weakest correlations were found in MVPA and LPA, respectively, across all MET thresholds of the PAW. When compared with the GT3X+, the largest MAPE was seen in sedentary minutes estimated from PAW#1 (121.68%; 95% CI = 84.87-158.49) and in MVPA estimated from PAW#2 (98.38%; 95% CI = 67.69-129.06) and PAW#3 (69.16%; 95% CI = 47.10-91.22). The lowest MAPE was seen in LPA across all MET thresholds of the PAW. The results of the equivalence test using mean ratios based on the TOST approach demonstrated non-equivalence of PA estimates of the PAW at the 10% equivalence zones of the GT3X+ accelerometer. Similarly, when compared to the GT9X accelerometer, the weakest correlations were found in LPA and the strongest correlations were found in MVPA across all MET thresholds of the PAW. The MAPE was largest in MVPA and smallest in LPA across all MET thresholds of the PAW. Significant equivalence was observed in LPA estimated from PAW#2 (mean ratio: 1.11; 95% CI = 1.01-1.20), but no other equivalence was found relative to the 10% equivalence zone of the GT9X accelerometer.
Table 3 notes: [...] [53] and bias-corrected 95% confidence intervals were estimated using the bootstrapping method; c values represent mean absolute percentage error (MAPE, %) and 95% confidence intervals in parentheses; d values represent mean ratio of PA estimates of the PAW over the estimates from the GT3X+ or GT9X accelerometers and 95% confidence intervals in parentheses. * significantly equivalent with the PAW#1 based on two one-sided t-tests (TOST) equivalence test.

Discussion
The purpose of this study was to examine the convergent validity of the PAW, a consumer-grade PA monitor, for the assessment of PA intensity levels in children, against the research-grade waist-worn GT3X+ and wrist-worn GT9X accelerometers. Three sets of MET thresholds were applied to the PAW for the estimation of PA intensity levels, and PA estimates were compared with the estimates obtained from the GT3X+ using Evenson's cut-points and the GT9X using Chandler's cut-points. Our findings demonstrated that PA estimates from the PAW were moderately to highly correlated with the estimates from the GT3X+ and GT9X accelerometers for sedentary (r ranges 0.45-0.66) and MVPA (r ranges 0.64-0.75), whereas weak to moderate correlations were found for LPA (r ranges 0.16-0.54). PA estimates obtained from the PAW generally showed greater correlations with the estimates from the GT9X (r ranges 0.20-0.75) than the GT3X+ (r ranges 0.16-0.71) for each intensity level. In particular, greater correlations were found for the PAW using standard MET thresholds (PAW#2) with the GT3X+ accelerometer (r ranges 0.32-0.71), and the PAW using adjusted MET thresholds (PAW#3) showed greater correlations with the GT9X accelerometer (r ranges 0.66-0.75).
The convergence between monitors is established when PA estimates from the monitors are sufficiently correlated; however, there is little consensus about the optimal correlation coefficient to determine the level of convergent validity. The rules of thumb that have been widely applied in the literature are to interpret a correlation between 0.0 and <0.25 as a weak relationship, ≥0.25-<0.50 as a moderate relationship, ≥0.50-<0.75 as a strong relationship, and ≥0.75 as a very strong relationship. However, it is also suggested that correlation coefficients be interpreted relative to common practice in a field [57]. As previously noted, there is currently little evidence available in the literature regarding the convergent validity of the PAW in comparison with a research-grade PA monitor; however, several studies have examined the convergent validity of different consumer-grade PA monitors against the ActiGraph accelerometers. One study compared the waist-worn Fitbit Zip (Fitbit Inc., San Francisco, CA, USA) with the waist-worn GT3X+ accelerometer in children using Evenson's cut-points, and interpreted correlations ranging from 0.24 to 0.72 for MVPA minutes as weak to moderate and correlations ranging from 0.57 to 0.87 for sedentary minutes as moderate to strong [44]. Among the adult population, Gomersall et al. [58] reported correlations of MVPA minutes estimated from the Fitbit One and Jawbone Up (Jawbone Inc., San Francisco, CA, USA) to be 0.80 and 0.72, respectively, compared to the GT3X accelerometer, and interpreted them as strong correlations. Another study that compared six consumer-grade PA monitors with the GT3X accelerometer demonstrated correlations ranging between 0.52 and 0.91 for MVPA minutes and interpreted them as weak to strong correlations.
Although there are some discrepancies in interpreting correlations within the context of convergent validity of consumer-grade PA monitors in the literature, it has also been suggested that r ≥ 0.70 is recommended to claim convergent validity, and r < 0.50 should be avoided [38]. In this regard, our findings may demonstrate moderate (r ≥ 0.50) convergent validity of the PAW, particularly using the Polar-defined (PAW#1) and adjusted MET thresholds (PAW#3), for the estimation of sedentary minutes against the GT3X+ and GT9X accelerometers. Pertaining to MVPA minutes, the PAW, regardless of which MET thresholds were used, showed at least moderate convergent validity relative to the GT3X+, and greater convergent validity was demonstrated when compared with the GT9X accelerometer. However, there was weak convergent validity for the estimation of LPA minutes.
Aside from the interpretation of correlation coefficients within the context of convergent validity of consumer-grade monitors, our findings can be compared with those of Mooses et al. [44], who examined the validity of the Fitbit Zip in relation to the GT3X+ accelerometer in a school environment. Their findings partly contrast with our results, which showed greater correlations for MVPA than for sedentary minutes, but some similarities were also observed. In their study, the highest correlation of 0.72 for MVPA was reported during a physical education lesson, followed by a correlation of 0.56 observed at recess, which were comparable with the correlations that we observed between the PAW and GT3X+ accelerometer for MVPA minutes (r ranges 0.67-0.71). In contrast, the correlations for sedentary minutes during physical education and recess time were reported as 0.85 and 0.87, respectively, which were higher than the correlations we observed for sedentary minutes against the GT3X+ accelerometer (r ranges 0.48-0.65). Although it is still difficult to directly compare the results between the studies due to the fundamental differences between the monitors (e.g., uniaxial vs. triaxial accelerometer sensors for the PAW and Fitbit Zip, respectively), this may indicate that the convergent validity of the PAW is comparable with that of other consumer-grade monitors for the estimation of MVPA minutes, but not for sedentary minutes.
Correlation is a primary method for evaluating convergent validity; however, it is not sensitive to systematic differences in mean values between the monitors, or to the variability of differences within the group. In the present study, we found that, in general, the mean PA estimates from the PAW were not statistically equivalent to the estimates from the GT3X+ and GT9X accelerometers, regardless of which MET thresholds were used for the PAW. Although more than 95% of the data points were within the observed LOAs in the Bland-Altman analyses, the relatively wide range of LOA indicates large variability in differences within the group. We further observed significant proportional bias, where mean differences tended to increase with increasing average estimates between the monitors. Considering that this study focused on 60 min of afterschool programs, it should be noted that longer measurement periods are likely to increase differences in PA estimates between the PAW and GT3X+/GT9X accelerometers, as well as within-group variability in differences.
Meanwhile, the present findings also indicated that the PAW consistently overestimated time spent in MVPA relative to the GT3X+ and GT9X accelerometers, with the largest MAPE (168.56%) observed between PAW#2 and the GT9X accelerometer, despite the strong correlation observed (r = 0.72). In contrast, MAPE was consistently lower when comparing LPA minutes between the monitors across the MET threshold conditions of the PAW. The smallest MAPE was observed when comparing PAW#2 and GT3X+ (30.50%), where the lowest correlation (r = 0.16) was also calculated. Conceptually, the correlation is a measure of the linear relationship between the monitors, showing consistency of the relative position of the same participants between the monitors, whereas MAPE is a measure of absolute error in PA estimates between the monitors. These results imply that MVPA minutes obtained from the PAW may have limited comparability with the estimates from the GT3X+ and GT9X accelerometers at the group level; however, the relative position of the participants within the group may be comparable between the monitors. Conversely, although smaller differences may be expected when comparing the group mean of LPA minutes between the monitors, large unknown random errors may reduce the comparability of concordant changes in LPA minutes within the group.
The underlying mechanisms explaining the observed systematic and proportional biases could not be clearly elucidated in the present study, but it should be noted that selection of the MET thresholds proportionally influenced PA estimates from the PAW. For children, the use of standard MET thresholds (<1.5 METs for sedentary and ≥3 METs for MVPA) for the estimation of PA intensity levels has raised concerns, because the rate of energy expenditure is expressed relative to a standard resting metabolic rate (RMR) for adults (3.5 mL/kg/min) rather than that of children, whose RMR can be up to 6 mL/kg/min [59]. As an alternative, thresholds of <2.0 METs for sedentary and ≥4 METs for MVPA have been proposed as adjusted MET thresholds when the standard adult RMR is used [46,60]. As previously noted, it is currently unclear whether the PAW produces MET values after accounting for child-specific RMR. The Polar-defined MET thresholds define sedentary as <2.0 METs and MVPA as ≥3.5 METs, which are 0.5 METs greater than the standard MET thresholds and, for MVPA, 0.5 METs lower than the adjusted MET threshold, resulting in significantly different PA estimates when compared with PA estimates obtained using the standard and adjusted MET thresholds. Evenson's cut-points were developed based on ten semi-structured activities ranging from sedentary to vigorous-intensity PA, where the maximum MET value of sedentary activities and the minimum MET value of MVPA activities were about 1.3 and 3.7 METs, respectively, after accounting for child-specific RMR. In this regard, our results showing a smaller mean difference in MVPA minutes when using adjusted MET thresholds (PAW#3) with the GT3X+ accelerometer may indicate that child-specific RMR is not accounted for when estimating MET values from the PAW.
However, inconsistent findings were observed, where sedentary and LPA minutes were over- and underestimated, respectively, by PAW#3 relative to the GT3X+ accelerometer, which may also indicate that the PAW is not sensitive enough to discriminate between sedentary and LPA levels when compared to the GT3X+ accelerometer.
It is widely acknowledged that the placement of the activity monitor may alter measured PA outcomes. In particular, a wrist-worn PA monitor is generally known to be more sensitive to upper-body locomotor movements [61,62], and we expected high comparability between the PAW and GT9X accelerometer, as both monitors were placed on the same non-dominant wrist. In the present study, there was a stronger correlation (r = 0.72) of MET values from the PAW with the activity counts from the wrist-worn GT9X accelerometer, compared to the waist-worn GT3X+ accelerometer (r = 0.46). However, as previously discussed, the amounts of PA measured by the PAW were not comparable to those from the GT9X accelerometer. On the other hand, there was no systematic mean bias between the GT3X+ and GT9X accelerometers (p > 0.05; the results are not shown in the table), with strong correlations across all intensity levels between the monitors (r ≥ 0.87). This may support the notion that appropriate calibration of raw accelerometry data obtained from waist- and wrist-worn activity trackers would result in comparable PA estimates [63,64]. In this regard, the lack of comparability of the PAW with the GT9X, as well as the GT3X+, accelerometer may be attributed to unknown measurement errors introduced when calibrating raw accelerations into MET values. Currently, most companies manufacturing consumer-grade PA monitors do not provide access to raw acceleration data or their proprietary algorithms. For the PAW, it is likely that the required input parameters, including age, sex, height, and weight, are accounted for when estimating METs in the Polar software; however, the detailed parameters and proprietary algorithms are not publicly available. Thus, it is difficult to elucidate where exactly the observed discrepancies arose in the current study.
In addition, the proprietary algorithms of most companies are often changed over time as new functions or results from their own discrete research are added, affecting intra-monitor reliability, as well as comparability with other consumer- and research-grade PA monitors [65]. Therefore, we echo the recommendation of Evenson et al. [19] suggesting that these companies consider revealing their proprietary algorithms, at least to some extent (e.g., calibration methods, input parameters, and research outputs), which can potentially advance measurement practice of both consumer- and research-grade PA monitors.
There are several limitations that should be considered when interpreting the present findings. First, our study did not have a criterion measure of energy expenditure, so we were unable to draw any conclusions on the criterion validity of the PAW. Additionally, as previously noted, Evenson's and Chandler's cut-points were developed using different models of ActiGraph accelerometers. Although we enabled the low frequency extension filter during post data processing to reduce possible bias from using different models of ActiGraph accelerometers, and recent studies have shown high comparability of activity counts between the GT3X+ and GT9X accelerometers, neither monitor can be considered a criterion measure of PA. Furthermore, Evenson's and Chandler's cut-points were converted to 30-s cut-points, which might introduce errors in estimating time spent in PA intensity levels from the GT3X+ and GT9X accelerometers. Collectively, our findings should be interpreted within the context of convergent validity of the PAW in comparison with the waist-worn GT3X+ and the wrist-worn GT9X accelerometers using Evenson's and Chandler's cut-points, respectively. Future work should examine the accuracy of the PAW against a criterion measure of energy expenditure, such as indirect calorimetry. In addition, the MET threshold for LPA in PAW#3 was 2.00-3.99 METs, based on a recent study that reported higher accuracy of <2.0 METs for the classification of sedentary activities in children when child-specific RMR is not accounted for [47]. However, the MET thresholds for LPA used in Evenson's original calibration study [40] were about 1.5-3.99 METs, and this discrepancy in MET thresholds for LPA might introduce errors when comparing LPA minutes between PAW#3 and the GT3X+ accelerometer. Lastly, PA observations were taken exclusively from the 80-min school-based afterschool program setting, and we did not record the specific activities that occurred during the program.
Although the observations were taken across multiple schooldays to increase total observation time per individual and the types of activity that took place included all intensity levels ranging from sedentary to vigorous-intensity, we suggest that future research should utilize longer monitoring times in free-living settings or include a wide range of unstructured and structured activities in a controlled laboratory setting.

Conclusions
We examined the convergent validity of the PAW for the assessment of PA in children compared to the waist-worn GT3X+ and wrist-worn GT9X accelerometers. Overall, the results indicated weak convergent validity of the PAW for the assessment of LPA minutes, but moderate convergent validity for the estimation of sedentary and MVPA minutes, relative to the GT3X+ and GT9X accelerometers, particularly when applying the Polar-defined or adjusted MET thresholds. There were significant mean differences in PA estimates between the monitors, with less discrepancy observed in MVPA minutes when using the adjusted MET thresholds. Collectively, the data suggested that within-group comparisons of relative levels of sedentary and MVPA minutes estimated from the PAW could be comparable to those of the GT3X+ and GT9X accelerometers. However, between-group comparisons of PA estimates obtained from different monitors are not recommended due to the large uncertainties regarding their comparability at the group level, particularly when using the Polar-defined MET thresholds. Such uncertainties may be attenuated when applying the adjusted MET thresholds for the estimation of PA levels from the PAW. Given the potential of the PAW as a means of PA assessment or as an intervention component among children, across both research and practical applications, further research is needed to examine the criterion validity of the PAW in children.
Author Contributions: Y.K. conceived and designed the study, conducted statistical analyses, interpreted the results, and wrote the first draft of the manuscript; M.L. conceived and designed the study, collected the data, interpreted the results, and critically revised the manuscript. All authors read and approved the final manuscript.

Funding:
The ELPN project is supported by the Promise Neighborhoods Grant from the US Department of Education (award #1009047). The funder does not have any role in the study other than to provide funding.

Conflicts of Interest:
The authors declare no conflict of interest.