Validity and Reliability of Physiological Data in Applied Settings Measured by Wearable Technology: A Rapid Systematic Review

The purpose of this review was to evaluate the current state of the literature and to identify the types of study designs, wearable devices, statistical tests, and exercise modes used in validation and reliability studies conducted in applied settings/outdoor environments. The review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We identified nine articles that fit our inclusion criteria, eight of which tested for validity and one of which tested for reliability. The studies tested 28 different devices with exercise modalities of running, walking, cycling, and hiking. While there were no universally common analytical techniques used to measure accuracy or validity, correlative measures were used in 88% of studies, mean absolute percentage error (MAPE) in 75%, and Bland–Altman plots in 63%. Intra-class correlation was used to determine reliability. There were no universally common thresholds to determine validity; however, among the studies that used MAPE and correlation, only five devices had a MAPE of < 10% and a correlation value of > 0.7. Overall, the current review establishes the need for greater testing in applied settings when validating wearables. Researchers should seek to incorporate multiple intensities, populations, and modalities into their study designs while utilizing appropriate analytical techniques to measure and determine validity and reliability.


Introduction
Advances in technology have allowed researchers to learn how the body reacts to the stresses placed upon it through sport, physical activity, and exercise. Laboratory technology has advanced from early direct calorimeters, to whole-room open-circuit indirect calorimeters, to Douglas Bags, then pedometers, metabolic carts, portable metabolic systems, and other means designed to measure physiological metrics during exercise [1,2]. Technologies like Douglas Bags and portable metabolic systems have been revolutionary to the field of exercise physiology, allowing research to be performed in applied settings. This has enabled researchers to take athletes or participants into the field to measure the physiological responses to the stresses of exercise. Wearables and fitness trackers are the natural progression of this technology, with the added benefit of reduced cost and increased prevalence. With the popularity of wearable technology increasing year over year, there are unique opportunities and insights now available. As these fitness trackers are meant to be worn continuously by the general public, they provide a wealth of new data, previously unavailable to sport and exercise scientists, public health and wellness experts, and medical professionals. A total of 722 million variables. The purpose was to represent the current state of the technology, and due to the rapid evolution of this technology, anything prior to 2010 is antiquated and too different from the devices being released today.

Search Strategy
Researchers performed three phases of screening, and teams of two independent reviewers performed all searches and reviews. First, the researchers identified all relevant articles by title screening only; second, eligibility was determined by abstract screening; and finally, full-article screening was performed. Any inconsistencies in eligible literature within teams were resolved by a third researcher. Reviewers exported the references into the citation manager of their choice (RefWorks or EndNote), then sent their completed list as an Excel spreadsheet (exported from the citation manager) to the third researcher, who compiled the lists and determined final eligibility. If eligibility could not be determined from the text, reviewers contacted the author by email to clarify; there were no instances where the author could not be contacted for clarification.
Google Scholar was used as the search database, and the single search string which utilized keywords and Boolean operators was: "Running OR Walking OR Biking OR Cycling OR Swimming OR Rowing OR Hiking OR Triathlon OR Exercise + Activity Trackers OR Fitness Trackers OR Wearable Technology OR Wearables + Validity OR Reliability OR outdoors OR field" (see Table 1).
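As an illustration only, the search string above can be assembled from its three keyword groups. The grouping into activity, device, and outcome terms, and all variable and function names, are assumptions made for this sketch; the review reports only the final string.

```python
# Hypothetical grouping of the keywords in the reported search string.
activity_terms = ["Running", "Walking", "Biking", "Cycling", "Swimming",
                  "Rowing", "Hiking", "Triathlon", "Exercise"]
device_terms = ["Activity Trackers", "Fitness Trackers",
                "Wearable Technology", "Wearables"]
outcome_terms = ["Validity", "Reliability", "outdoors", "field"]


def build_search_string(*groups):
    """Join terms within a group with OR and join the groups with '+'."""
    return " + ".join(" OR ".join(group) for group in groups)


search_string = build_search_string(activity_terms, device_terms, outcome_terms)
```

Keeping the groups separate makes it easy to see which terms are alternatives (OR) and which groups must all be matched (+).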

Data Extraction
Teams of two independent reviewers extracted relevant data from each study into an Excel spreadsheet, including the number of subjects, information on the wearable device being tested, statistical measures used to determine validity, outdoor location, and exercise format and intensity. Any inconsistencies were resolved between the reviewers on that team.

Risk of Bias Assessment
The Cochrane Risk of Bias Tool 2.0 (ROB 2.0) was used to assess the methodological quality of the individual studies and the risk of bias [31]. Teams of two researchers collaborated to fill out the assessment tool for each study.

Results
The search string returned 17,300 articles. During the screening process, the researchers learned that Google Scholar does not allow users to go past 100 pages (1000 articles). Therefore, while the search produced 17,300 results, only 1000 results were evaluated for inclusion. This limitation was not known prior to choosing Google Scholar as the database, but given the popularity of Google Scholar (82% of academic knowledge seekers start their research with Google Scholar) [32], the size of Google Scholar [33], and the scope of the rapid review being performed, we determined that this would be a sufficient assortment of articles for the current rapid review.
A total of 157 articles remained after title screening and 38 after abstract screening, and nine articles met the criteria for inclusion after full-article screening [34–42] (see Figure 1 and Table 2).

Exercise Mode
The studies reviewed utilized several exercise modalities to test for validity or reliability, including walking or hiking [34,36,41,42], running or trail running [34,37,38,39,40,41,42], and cycling [41] (see "Exercise Modality" column in Tables 3-5). These activities were performed under various intensities and durations. Duration was reported as both distance (km) and time (min). The average reported running distance was 2.2 ± 1.3 km, while the average walking distance was 2.1 ± 1.4 km. Distance was not reported for cycling exercise [41]. Six articles [34,35,37,38,40,42] reported time as their measure of duration, with Navalta et al. [37] and Adamakis [34] both reporting distance and time. Adamakis reported a timed duration for two different exercise protocols, walking and running (which were each factored into the average separately). Among the articles that reported time as their measurement, an average of 25.1 min was spent performing the study-specific protocols.
The intensities under which the participants performed the activities were primarily described as a generalized, self-selected pace. Carrier et al. [35] was among the articles that described a self-selected pace for their participants but included a stipulation that the pace be maintained above 70% of the subject's maximal heart rate. This was in accordance with the guidelines of the wearable technology utilized to estimate aerobic capacity. Other descriptions of exercise intensity included Wahl et al. [40], who described their protocol as an outdoor run maintained at a speed of 10.1 km/h, and Zanetti et al. [42], who described a specific, intermittent intensity designed to simulate the demands of running/exercising in a rugby match.

Study Design
With respect to the study design employed by the investigations meeting the criteria for inclusion in this rapid review, the data extracted were participant characteristics, types of statistical analyses, criterion measures used, and physiological variables tested.
The number of participants utilized for determining the validity and reliability of wearable devices in an outdoor/applied setting ranged from n = 1 to n = 44 (n = 19.6 ± 12 participants, reported as mean ± SD). Only 56% (5/9) of studies had over 20 participants. Seventy-seven percent of studies (7/9) tested both male (n = 11 ± 6) and female (n = 9 ± 7) participants. Participants in all investigations were predominantly young, with an average mean age of 27 ± 5 years. Without exception, the participants were screened to be healthy and free of illness. Four investigations (57%) required participants to maintain a chronic level of physical activity. The studies reviewed for this paper all included information on the biological sex, age, weight, and height of the participants, as is commonly reported, with Parak et al. [38] also reporting body mass index (BMI), Wahl et al. [40] and Zanetti et al. [42] both reporting body fat percentage, Xie et al. [41] reporting weekly physical activity, and Carrier et al. [35] reporting weekly average run distance.
One investigation evaluated wearable device reliability [36], and the remaining eight reported validity compared to a criterion measure (see "Reliability/Validity Measure" column in Tables 3-5). Reliability was determined using the intraclass correlation coefficient (ICC), and measures were considered reliable if the ICC was greater than 0.70 with an accompanying p-value less than 0.05. Considering validity measurements, 88% (7/8) of studies used multiple indicators of agreement. Four investigations [34,35,41,42] reported two measures of validity, while three studies [37,38,40] utilized four or more statistical tests for agreement. Among the statistical tests, correlative measures (Pearson, ICC, Spearman, Lin's concordance correlation coefficient (CCC)) were employed in 88% (7/8) of studies, with thresholds for considering a device valid based either on statistical significance (p < 0.05) or on a predefined cutoff (r > 0.70). Mean absolute percent error (MAPE) was used in 75% (6/8) of studies, with a value lower than 10% considered valid in two studies and thresholds not reported in the methodology of the remaining four investigations. The typical error of the estimate (TEE) or mean absolute error (MAE) was employed in four investigations, with effect size calculations used to determine the validity thresholds in one study and thresholds not reported in the remaining three investigations. Bland–Altman plots were utilized in 63% (5/8) of investigations.
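To make the agreement measures named above concrete, the following is a minimal sketch of how MAPE, a Pearson correlation, and Bland–Altman bias with 95% limits of agreement might be computed for paired device and criterion readings. The function names, the heart rate values, and the combined rule of MAPE < 10% with r > 0.70 are illustrative assumptions; the included studies differed in the tests and thresholds they applied.

```python
import math
from statistics import mean, stdev


def mape(device, criterion):
    """Mean absolute percentage error (%) of the device against the criterion."""
    return 100.0 * mean(abs(d - c) / abs(c) for d, c in zip(device, criterion))


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bland_altman(device, criterion):
    """Return (bias, lower limit, upper limit) using bias ± 1.96 SD of the differences."""
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def meets_thresholds(device, criterion, max_mape=10.0, min_r=0.70):
    """Assumed combined rule: MAPE below max_mape and r above min_r."""
    return mape(device, criterion) < max_mape and pearson_r(device, criterion) > min_r


# Hypothetical paired heart rate readings (beats/min), not data from any included study.
criterion_hr = [120.0, 135.0, 150.0, 160.0, 172.0]
device_hr = [118.0, 137.0, 149.0, 163.0, 170.0]
```

Reporting all three together is useful because a high correlation alone does not rule out systematic over- or underestimation, which the Bland–Altman bias exposes.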

Wearable Device
From all studies reviewed, a total of 28 consumer devices were tested and no novel devices were used (a full list of devices can be found in Table 2). In studies with multiple devices being tested [34,37,40,41], the order in which they were placed on the wrist/forearm was randomized. Of the devices tested, one was biometric clothing (Hexoskin), one was a ring (Motiv Ring), one was a forearm-worn device (Scosche Rhythm+), one was an earbud (Jabra Elite Sport Earbuds), and 24 were wrist-worn devices (see Table 2).

Device Validity
Device validity was determined for several different physiological metrics, including energy expenditure, heart rate, ventilation rate, VO2max, and minute ventilation.

Estimated Energy Expenditure
As shown in Table 3, the energy expenditure estimations from wearable technology devices continued to have low agreement with criterion portable metabolic units when this measure was obtained in an outdoor environment. Of the twenty-three different wearable devices evaluated in the literature to meet inclusion in the present rapid review, none were considered to return acceptable validity measures for exercise occurring in a natural setting, according to the authors of the original studies. Additionally, no investigations reported reliability measures for estimated energy expenditure outdoors.

Heart Rate
Heart rate measures depend largely on the device type and outdoor location utilized. The Hexoskin smart shirt displayed poor reliability [36] and validity [39] when utilized in trail situations (hiking and trail running). Similarly, every photoplethysmography-based device evaluated during trail running returned heart rate values that were not deemed acceptable by the authors [37]. On the other hand, with the exception of the Xiaomi Mi Band 2, wrist-worn devices returned acceptable agreement when compared to palpated heart rate measurements when participants ran and walked around a track or rode a fixed path [41] (see Table 4).

Other Physiological Variables
One investigation evaluated the ability of wearable technology devices to return acceptable validity measures for the estimated physiological variables of ventilation rate and minute ventilation [39], while two evaluated VO2max [35,38]. The Hexoskin biometric shirt displayed acceptable agreement for ventilation rate but not minute ventilation [39] in a trail environment. The PulseOn monitor and Garmin fenix 3 provided acceptable validity for estimating maximal aerobic capacity when participants ran on an outdoor track [35,38] (see Table 5). No reliability data were available for these variables in an outdoor setting.

Outdoor Location/Environment
Of the studies reviewed, various outdoor locations were chosen: outdoor trails (4/9) [34,36,37,39], a paved track (3/9) [35,38,41], free-living conditions (1/9) [40], and a rugby field (1/9) [42]. All studies had varying descriptions of the environment. Adamakis [34], Montes et al. [36], Navalta et al. [37], and Tanner et al. [39] were the only studies to mention grade or elevation, with Adamakis taking place at a 49-acre park on a path with both wooden and paved surfaces with no increase or decrease in grade. Montes et al. noted the starting elevation for both days: day one was recorded at 5446 feet above sea level, and day two was recorded at 5757 feet above sea level at the trailhead, which then rose to 6443 feet above sea level at a 17.6% grade. The trail names were not mentioned; however, both trails were graded as class I on the Yosemite Decimal System (YDS). Navalta  Parak et al. [38] and Tanner et al. [39] were the only studies to mention temperature; Tanner et al. reported a range of 26.2–32.3 °C. Parak et al. did not give a specific temperature but listed stipulations for conducting testing, since testing took place in the winter months. The stipulations were no rain or snow, and a temperature above −10 °C.
The study by Zanetti et al. [42] took place on a rugby pitch to simulate game aspects, but no other information about climate or environment was given. Wahl et al. [40] also did not describe the environment of their outdoor running route. Xie et al. [41] used a standard 400 m track for part of the testing, though they did not describe their predetermined outdoor cycling route. Environment was not explicitly described for every study and/or session, but inferences could be made by the geographical location of each study. Adamakis

Risk of Bias
The risk of bias and methodological quality of the studies included in the present review were assessed using the Cochrane Risk of Bias Assessment Tool (ROB 2.0) [31]. The assessment tool uses five domains to evaluate the quality of the study and the individual risk of bias (1. randomization process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) [34].

Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review.

All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review. , 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the e, 5. selection of the reported result), which produces an overall bias result in the form of isk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies least "Some concerns" for bias due to the randomization procedures being irrelevant to on-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) ble 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. 
All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). Table 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias of all studies included in this review.  , 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the e, 5. selection of the reported result), which produces an overall bias result in the form of isk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. 
All the studies least "Some concerns" for bias due to the randomization procedures being irrelevant to on-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) ble 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. 
selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). , 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the e, 5. selection of the reported result), which produces an overall bias result in the form of isk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies least "Some concerns" for bias due to the randomization procedures being irrelevant to on-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) ble 6. Cochrane Risk of Bias Assessment tool output used to assess the quality and the risk of bias all studies included in this review. process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. 
selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34). process, 2. deviations from intended interventions, 3. missing outcome data, 4. measurement of the outcome, 5. selection of the reported result), which produces an overall bias result in the form of "Low risk", "Some concerns"/unclear risk of bias, and "High risk", as seen in Table 6. All the studies had at least "Some concerns" for bias due to the randomization procedures being irrelevant to validation-type study designs. One study had a high risk of bias due to the sample size of one (n = 1) (34).

Discussion
The purpose of this rapid systematic review was to evaluate the current state of the literature and to identify the types of study designs, wearable devices, statistical tests, and exercise formats used in studies conducted in applied settings/outdoor environments. According to our findings, the present volume of literature validating wearable technology in applied settings is small compared to the larger body of wearable technology validation literature. We believe that determining the validity and reliability of wearable technology devices in applied settings is important (1) for consumers to have confidence in the measurements that are being generated, (2) for coaches, practitioners, and athletes to have accurate and reliable physiological data available, and (3) for researchers who wish to conduct investigations in applied settings utilizing these devices. Our findings indicate two main themes that should be considered when investigators intend to conduct validity or reliability testing on wearable devices in outdoor settings: study design and the analytical techniques utilized. Each theme is discussed in further detail below.

Study Design
Out of the nine papers that were included, only one paper analyzed the reliability of the device [36]. Reliability is an important aspect in determining the effectiveness of wearable technology and researchers should design validation and reliability studies to remedy this deficiency. This limitation has been noted in other systematic reviews specific to wearable technology for tracking physical activity [5,48]. The current findings, again, highlight the need for study designs to account for device reliability.
A difficulty in outdoor validation is designing robust and complex training or testing protocols, and authors should aim to design more rigorous and purposive studies to improve the level of testing, similar to what would be found in laboratory-based studies. These may include utilizing different intensities, modalities, environments, populations, and collection times, to name a few. Wearable technology purports to measure physiological variables in a range of different exercises; however, running, walking, and biking are the main exercise modalities evaluated, and there remains a need to validate these devices using other modes of exercise. The Consumer Technology Association recommends at least 5 min of data collection during trials obtaining heart rate [49]; however, that may still be insufficient. The average collection time for the studies included in this review was 25.1 min. Researchers should also try to account for a range of body compositions, BMI, age, biological sex, skin type, etc., as the Consumer Technology Association also recommends [49]. Of the studies reviewed, only one [38] reported BMI (although all reported height and weight, so the BMI could be calculated), two [40,42] reported body fat percentage, and none reported skin types (although not all authors used devices with photoplethysmography or near-infrared sensors that could be impacted by the skin type). The Consumer Technology Association has also recommended that at least twenty participants be utilized [49–51], and only 56% (5/9) of studies met this guideline.
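Because all of the reviewed studies reported height and weight, BMI can be recovered even where it was not reported. The following is a minimal sketch of that calculation, assuming weight in kilograms and height in metres; the function name and the example values are illustrative and are not taken from any of the included studies.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg, 1.75 m.
print(round(bmi(70.0, 1.75), 1))
```

When only group means of height and weight are available, the value computed this way describes a participant of mean size rather than the mean BMI of the sample, so it should be treated as an approximation.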
These consumer devices are primarily going to be used at self-selected paces; consequently, it is important to have a self-selected pace as a condition of the validity testing. However, researchers should also make an effort to incorporate different intensities of exercise, as these devices are intended to be used throughout the spectrum of exercise intensity. Seven of the nine studies included in the current review used a self-selected pace.
Researchers should also seek to validate the devices in a range of environments, including different altitudes, temperatures, humidity levels, etc., whenever reasonable. When reporting the results of the studies, researchers should also include information about the testing environment under which the devices were utilized. The studies for the current review reported a range of environmental factors, such as the testing surface, geographical region, temperature range, altitude, and grade. Designing studies to test under these different conditions and circumstances will provide better resolution, for both the consumer and the researcher, as to the unique circumstances and intensities in which each device may be considered valid.

Analytical Techniques, Validity Criteria, and Quality Assessment
According to Welk et al. [10], 87% of the activity monitoring validation literature uses correlation coefficients and 52% uses MAPE. In the present review, multiple statistical tests were performed in 88% of the validation studies. We recommend that researchers looking to validate a device perform at least three analyses to assess validity: (1) some type of correlation test (Pearson, Spearman, ICC, CCC), (2) MAPE, and (3) Bland-Altman plots with 95% limits of agreement. MAE or root mean square error (RMSE) can also be useful, as they are in the same units as the device measurement [52–54], and some authors have seen fit to perform mean comparisons using standard hypothesis testing methods (t-test, ANOVA) with a "flipped" alpha level to determine accuracy [10]. While this type of analysis can be useful in determining whether the device tends to overestimate or underestimate compared to the criterion measure, these tests were designed to determine whether a difference exists, and a lack of a significant difference is not the same as accuracy or validity. Therefore, statistical analyses using MAPE, correlation, and Bland-Altman plots should also be performed.
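The recommended analyses can be computed from paired device and criterion measurements. The sketch below is one possible way to do so using only the Python standard library; it is not the procedure of any included study. It assumes Pearson correlation as the correlative measure (Spearman, ICC, and CCC are not implemented), expresses MAPE relative to the criterion value, and uses the conventional 1.96 multiplier for the 95% limits of agreement, which assumes roughly normally distributed differences. The function name and the heart rate values are hypothetical.

```python
import math


def agreement_metrics(device, criterion):
    """Return MAPE, MAE, RMSE, Pearson r, and Bland-Altman bias with 95% limits.

    Assumes paired samples, non-zero criterion values, and non-constant inputs.
    """
    if len(device) != len(criterion) or len(device) < 2:
        raise ValueError("need paired samples of equal length (n >= 2)")
    n = len(device)
    diffs = [d - c for d, c in zip(device, criterion)]

    # Error magnitudes: MAPE is unitless (%), MAE and RMSE are in device units.
    mape = 100.0 * sum(abs(e) / abs(c) for e, c in zip(diffs, criterion)) / n
    mae = sum(abs(e) for e in diffs) / n
    rmse = math.sqrt(sum(e * e for e in diffs) / n)

    # Pearson correlation between device and criterion values.
    mean_d = sum(device) / n
    mean_c = sum(criterion) / n
    cov = sum((d - mean_d) * (c - mean_c) for d, c in zip(device, criterion))
    var_d = sum((d - mean_d) ** 2 for d in device)
    var_c = sum((c - mean_c) ** 2 for c in criterion)
    r = cov / math.sqrt(var_d * var_c)

    # Bland-Altman: mean difference (bias) and 95% limits of agreement,
    # using the sample standard deviation of the differences.
    bias = sum(diffs) / n
    sd = math.sqrt(sum((e - bias) ** 2 for e in diffs) / (n - 1))
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    return {"mape": mape, "mae": mae, "rmse": rmse, "r": r,
            "bias": bias, "loa": limits}


# Hypothetical heart rate data (beats per minute), for illustration only.
device_hr = [102, 118, 131, 149, 160]
criterion_hr = [100, 120, 130, 150, 165]
for name, value in agreement_metrics(device_hr, criterion_hr).items():
    print(name, value)
```

Reporting the bias and limits of agreement alongside MAPE and correlation shows the direction of the error (over- or underestimation), which a correlation coefficient alone does not reveal.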
As discussed earlier, there are no widely accepted criteria to determine validity, and they vary between authors, journals, and reviewers. Of the six studies that utilized MAPE as a criterion for validity determination, two established a validity threshold of <10%, while the other four did not report a threshold. TEE or MAE was utilized in four investigations, with one study utilizing an effect size calculation to determine validity, while the other three did not establish thresholds for TEE or MAE. Five out of eight validation studies utilized Bland-Altman plots, although no quantitative measure has been developed to establish thresholds associated with Bland-Altman plots. Correlative measures were highly common and were performed in 8/9 studies evaluated, with validity thresholds for correlation values ranging from >0.7 to >0.9. While acceptable analyses are beginning to emerge, there remains the need to establish universally acceptable validity criteria. As there are not even agreed-upon criteria to measure accuracy and validity, accepted thresholds to determine validity have even less consensus. As the purpose of validation studies is to answer the question of whether a device is valid, thresholds to answer that question are essential. While specific use cases of the devices may influence whether a given validity threshold would be acceptable to certain populations (research, professional and collegiate athletics, consumer use, etc.), it is, nevertheless, important to establish appropriate thresholds to determine when devices may be considered valid.
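To illustrate how explicit thresholds turn reported statistics into a validity decision, the sketch below screens devices against a MAPE of <10% and a correlation of >0.7, the least strict thresholds reported among the reviewed studies. This is an illustrative assumption rather than an agreed standard; the device names and values are hypothetical, and the thresholds are left as parameters because acceptable values may differ by use case.

```python
from typing import NamedTuple


class DeviceResult(NamedTuple):
    name: str
    mape_percent: float
    correlation: float


def meets_thresholds(result, max_mape=10.0, min_correlation=0.7):
    """True if MAPE is below max_mape and correlation is above min_correlation."""
    return (result.mape_percent < max_mape
            and result.correlation > min_correlation)


def screen(results, **thresholds):
    """Return the names of the devices that satisfy both thresholds."""
    return [r.name for r in results if meets_thresholds(r, **thresholds)]


# Hypothetical results, for illustration only.
results = [
    DeviceResult("Device A", mape_percent=6.5, correlation=0.82),
    DeviceResult("Device B", mape_percent=14.0, correlation=0.91),
    DeviceResult("Device C", mape_percent=8.0, correlation=0.55),
]
print(screen(results))
print(screen(results, min_correlation=0.9))
```

Applying the stricter correlation threshold of >0.9 to the same hypothetical data removes every device, which shows how strongly the conclusion of a validation study can depend on the threshold chosen.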
The deficiency of proper analytical methods extends to the evaluation of the quality of the articles, as is evidenced by the lack of appropriate "Risk of Bias" assessment tools for a review such as this one. While some systematic reviews of validation literature use common risk of bias tools, such as the Cochrane [31] or Joanna Briggs Institute [55] assessment tools [56], others have simply chosen not to perform a risk of bias assessment [11]. While the Cochrane tool was used in the current review, there is a need to develop an assessment tool more appropriate to the study designs used in the validation of wearable technology.
Beyond establishing appropriate measurement criteria, thresholds, and a proper risk of bias tool for validation studies, there is no easy way for practitioners, researchers, athletes, or consumers to determine whether a device is valid, and under what circumstances it may be valid, without combing through potentially hundreds of peer-reviewed articles. This is time-consuming and difficult for anyone to do, and even less feasible for athletes or consumers, as they may not have access to certain articles or journals. There is a need for an easily accessible, independent database to succinctly characterize which devices may be used in specific scenarios, based on the independent, peer-reviewed validation literature. This would be helpful for anyone seeking to use wearable technology, from consumers using it for recreational fitness purposes to academics and professionals conducting high-level research. As the capabilities of these devices to measure more physiological metrics inevitably improve, the need for independent research will continue to increase. In addition to adding new activities, manufacturers should also seek to continually improve the list of physiological variables that the devices can measure.

Limitations
A limitation of the current review is that Google Scholar does not allow the user to go past page 100 (1000 search results). This was not known to the researchers prior to starting the review; however, due to the popularity of Google Scholar (as stated earlier, 82% of academics start their research using Google Scholar) [32], the decision was made to move forward despite this limitation. This review has been labeled a "rapid review" mainly because of the search abilities associated with using Google Scholar.

Conclusions
The purpose of this review was to evaluate the current state of the literature and to identify the types of study designs, wearable devices, statistical tests, and exercise modes used in validation and reliability studies conducted in applied settings/outdoor environments. As a result, we identified nine studies that fit our inclusion criteria and reflected the current state of the literature. The included studies tested 28 wearable devices in outdoor environments, with exercise modalities of running, walking, cycling, hiking, and trail running. There were not any universally common analytical techniques used to determine validity; however, correlative measures were used in 88% of the studies, mean absolute percentage error (MAPE) was used in 75%, and Bland-Altman plots were used in 63%. The devices that had a MAPE lower than 10% and a correlation value of greater than 0.7 in any measured variable were: Garmin Vivosmart (Energy Expenditure), Garmin Vivoactive (Energy Expenditure), Suunto Spartan Sport w/HRM (HR), Garmin fenix 3 HR (VO2max), and the PulseOn (VO2max).
Overall, the current review established the need for greater testing in outdoor or applied settings when validating wearable technology. Researchers should seek to incorporate multiple intensities, populations, and exercise modalities into their study designs while utilizing appropriate analytical techniques to determine validity and reliability. The results of these studies will have even greater relevance when validated in the field or in applied settings. Researchers who perform the validation of these devices enable others to confidently use these devices to drive training, health, and wellness decisions, as well as to enable the use of these devices in future research.

Funding: This research received no external funding.