Reported Outcome Measures in Studies of Real-World Ambulation in People with a Lower Limb Amputation: A Scoping Review

Background: The rapidly increasing use of wearable technology to monitor free-living ambulatory behavior raises the question of to what extent the chosen outcome measures are representative of real-world situations. This scoping review aims to provide an overview of the purposes for which wearable activity monitors are used in people with a Lower Limb Amputation (LLA) in the real world, to identify the reported outcome measures, and to evaluate to what extent the reported outcome measures capture essential information from real-world ambulation of people with LLA. Methods: The literature search covered three databases (MEDLINE, CINAHL, and EMBASE) for articles published between January 1999 and January 2022, complemented by a hand-search. Results and conclusions: A total of 98 articles met the inclusion criteria. According to the included studies' main objective, the articles were classified into observational (n = 46), interventional (n = 34), algorithm/method development (n = 12), and validity/feasibility studies (n = 6). Reported outcome measures were grouped into eight categories: step count (reported in 73% of the articles), intensity of activity/fitness (31%), type of activity/body posture (27%), commercial scores (15%), prosthetic use and fit (11%), gait quality (7%), GPS (5%), and accuracy (4%). We argue that researchers should be more careful in choosing reliable outcome measures, in particular regarding the frequently used category step count. However, contemporary technology is limited in providing a comprehensive picture of real-world ambulation. The novel knowledge from this review should encourage researchers and developers to engage in debating and defining the framework of ecological validity in rehabilitation sciences, and how this framework can be utilized in the development of wearable technologies and future studies of real-world ambulation in people with LLA.


Introduction
The use of wearable technology to monitor real-world ambulatory activity in people with a Lower Limb Amputation (LLA) has grown rapidly in the past decade. Activity monitors have the potential to provide objective information about people's ambulatory behavior and participation in the community, an important domain of the International Classification of Functioning, Disability and Health [1]. Data from the Swedish Amputation and Prosthetic Registry show that 27-48% of the 5762 persons with LLA report not walking outdoors one year post-amputation [2]. Community walking is essential to participate in work, leisure, social activities, and family roles, and the inability to ambulate outside of the home increases isolation and dependency [3]. Monitoring ambulatory behavior of people with LLA in a free-living setting gives valuable information that can be used to develop

Source of Evidence Screening and Selection
The reference manager software tool EndNote X9 (Clarivate, Philadelphia, PA, USA) was used to export records and to identify and remove duplicates. Remaining records were exported to the web-based analysis tool Rayyan (Rayyan Systems Inc., Cambridge, MA, USA), which allows for blinded screening by multiple reviewers [23]. The screening process involved two phases: (1) reviewing based on title and abstract and (2) full-text article reviewing. For the first phase, two reviewers independently assessed eligibility for inclusion and rated each record as 'include', 'exclude', or 'maybe'. Articles were included if the studies involved (1) at least one person with LLA using a prosthesis, (2) quantitative measurements using wearable technology in a real-world setting, e.g., outside the clinic or laboratory, and (3) participants who were free to decide their ambulatory behavior, without receiving supervision or other instructions. Studies including only persons with partial foot amputation were excluded. Additionally, studies in a real-world setting where participants performed a testing protocol or received supervision or other instructions on how to behave were excluded. Following phase 1, the two reviewers held debriefing meetings to discuss disagreements in the screening process, which were then resolved by consensus. The second phase was performed by one reviewer, who reviewed the full text of the articles included in phase 1 against the inclusion criteria. All included and excluded articles from phase 2 were subsequently discussed with the second reviewer to verify whether the two reviewers agreed on the decisions of the first reviewer. Only peer-reviewed articles in the English language were included. Yet, since this is a scoping review, conference abstracts, clinical letters, and Ph.D. theses were included if they met the inclusion criteria. When duplicates consisted of an original research article and a conference abstract, the original research article was prioritized and the conference abstract was excluded.

Data Extraction
The following data were extracted from each article: author(s), year of publication, country of publication, title, study design, objective(s), study population (number of participants, age, etiology, and level of amputation), control group if applicable, wearable technology used, placement of technology, environment and duration of real-world measurements, intervention if applicable, outcome measures reported for the wearable technology used, key findings, conclusion, and clinical relevance. The first reviewer initially extracted the data, and the second reviewer verified the data.

Analysis and Presentation of Results
The main objective of each study was used to synthesize categories of study design. The study design categories were used as the main structure of an overview table of all included studies. The following information from the extracted data was listed in Table 1: First author (year and country of publication), title, objective, study population (number of participants (number of females)), age, level and etiology of amputation, technology used (placement on body and duration of monitoring), reported outcome measures, and key findings. The study design categories were described with a detailed summary of the main objective from all articles. The publication year of included articles was summarized and presented in Figure 2. The analysis of reported outcome measures was performed by collecting all reported outcome measures related to the used wearable technology from each included article. The categorization of reported outcome measures was an iterative process of searching for patterns in the large number of outcome measures and discussions between the first and second reviewer.

Table 1

[24]
Title: Understanding changes in physical activity among lower limb prosthesis users: A COVID-19 case series (clinical letter)
Objective: Understand potential changes to Physical Activity (PA) during shutdown and "shelter-in-place" orders.
Reported outcome measures: Steps per day (overall, pre-index, post-index); supplemental data: number of bouts; steps per bout; time per bout; steps per day normalized to pre-index step count
Key findings: Two participants demonstrated clear signs of overall reduced activity through beginning stages of the COVID-19 pandemic.

2 Rosenblatt (2021, USA) [25]
Title: Prosthetic disuse leads to lower balance confidence in a long-term user of a transtibial prosthesis
Objective: Assess the impact of prosthesis disuse on balance, gait, PA and balance confidence.
Study population: n = 1(0); 76; TTA; cancer
Technology: StepWatch 3; prosthesis; 2 × 7 days
Reported outcome measures: Steps per day
Key findings: Balance confidence, walking speed and steps per day decreased with 19%, 12%, and 19%, respectively, following 4 months of prosthesis disuse; functional measures were not impacted.

3 Miller (2021, USA) [26]
Title: Patterns of sitting, standing, and stepping after lower limb amputation
Objective: Describe sitting, standing, and stepping patterns and compare the patterns between people with dysvascular Lower Limb Amputation (LLA) and traumatic LLA.

Key findings: Daily step count, moderate-intensity and high-intensity ambulation, and HR-QoL increased, but low-intensity ambulation decreased.

[28]
Title: Relationship between level of daily activity and upper-body aerobic capacity in adults with a lower limb amputation
Objective: Investigate the relationship between upper-body peak aerobic capacity (VO2peak), PA levels, and walking capacity.
Reported outcome measures: Steps per day, time in sedentary, low, moderate, high intensity (%); peak intensity level
Key findings: VO2peak correlated significantly with daily step count, sedentary time, high-intensity activity level, and peak-intensity activity level, preferred walking speed, and 2-min walking test.

6 Davis-Wilson (2021, USA) [29]
Title: Cumulative loading in individuals with non-traumatic lower limb amputation, individuals with diabetes mellitus, and healthy individuals (conference abstract)
Objective: Determine if differences existed in cumulative loading between individuals with diabetes + LLA, individuals with diabetes, and healthy individuals of similar health.
Study population: n = 6(0); 58 ± 6; level and etiology of amputation N/A
Technology: ActiGraph GT3X; hip; 10 days
Reported outcome measures: Steps per day; cumulative loading (body weight/day)
Key findings: No

Key findings: Morning-to-afternoon percent limb fluid volume change per hour was not strongly correlated to percent time weight-bearing or to self-report outcomes.

21 Esposito (

Reported outcome measures: Duration dynamic activities, walking, dynamic activities besides walking (%); sit-to-stand transitions (n); overall and walking body motility (g); resting heart rate; absolute heart rate during walking, normalized heart rate during walking (bpm); heart rate reserve (%)
Key findings: Participants with amputation had lower percentage dynamic activities and body motility during walking than controls. No significant differences in heart rate and percentage heart rate reserve during walking.

Key findings: Strong agreement between self-reported and measured activity between 9.00am-9:00pm for 34% of participants. Poor agreement between self-reported and measured time spent in various activity intensities.

42 Kanade (

Key findings: Uptime was highest in persons with rotationplasty, and similar between persons with limb-sparing reconstruction and above-knee amputation.

Reported outcome measures: Duration dynamic activities, walking, (%); sit-to-stand transitions (n); overall and walking body motility (g); resting heart rate; absolute heart rate during walking; normalized heart rate during walking (bpm); percentage heart rate reserve (%)
Key findings: Participants with amputation had lower activity levels and body motility during walking than controls. No differences in normalized heart rate during walking.

46 Coleman (1999, USA) [21]
Title: Step activity monitor: long-term, continuous recording of ambulatory function
Objective: Provide guidelines for use of the Step Activity Monitor (SAM), and results of accuracy and reliability testing, and case study descriptions.
Study population: n = 2(1); age N/A; 2 TTA; etiology N/A
Technology: Step Activity Monitor (later StepWatch); ankle; 2 × 1 week
Reported outcome measures: Total steps; duration inactivity (hours/day), low, moderate and high activity
Key findings: SAM is accurate, reliable, and can be used to perform long-term step counting on a range of subjects. It is viable means for monitoring gait activity outside of the laboratory during normal daily activities.

Interventional studies

1 Vanicek (2021, UK) [68]
Title: STEPFORWARD study: a randomized controlled feasibility trial of a self-aligning prosthetic ankle-foot for older patients with vascular-related amputations
Objective: Determine the feasibility of a Randomized Controlled Trial (RCT) of the effectiveness and cost-effectiveness of a self-aligning prosthetic ankle-foot compared with a standard prosthetic ankle-foot.
Study population: n = 55(8); 68.8 ± 9.6; all TTA; all non-traumatic (diabetes, PVD, blood clot, or other)
Technology: ActivPAL4; prosthesis; 2 × 1 week
Reported outcome measures: Steps per day; stepping (min/day) (baseline, final)
Key findings: The consent, retention and completion rates demonstrate that it is feasible to recruit and retain participants to a future trial.

Reported outcome measures: Steps per day
Key findings: Prosthetic foot stiffness category was significantly associated with changes in prosthetic foot-ankle biomechanics, but not with changes in gait symmetry, community ambulation and relative foot stiffness perception.

Key findings: Coached participants had greater decreases in waist circumference than the self-directed control group.

Key findings: The home-based intervention was promising in terms of efficacy, safety and acceptability.

Key findings: The behavior-change intervention group showed within-group increase in daily step count, and had a higher increase in daily step count than the control group, demonstrating that the intervention might increase walking activity.

Title: Step activity and 6-Minute Walk Test outcomes when wearing low-activity or high-activity prosthetic feet
Objective: Determine changes in daily step count and 6MWT with Low-Activity feet (LA) and high-activity Energy-Storage-And-Return (ESAR) feet, and examine sensitivity of these measures to classify different feet.

Key findings: Participants report benefitting in their performance from using an MPK, but this was not reflected in the daily activity levels.

Reported outcome measures: Steps per day; duration activity (minutes/weekdays, weekend days and all days); figure including number of bouts (dots), bout duration (x-axis) and cadence (y-axis)
Key findings: Pylon type and knee type had no effect on daily activity level or activity duration. The only significant difference was for the prosthetic-side knee angle at initial contact, which was higher with the rigid pylon than the SAP while walking a controlled speed, suggesting SAP is as effective as rigid pylon.

34 Coleman (

Search Results
The initial literature search resulted in 4006 records after removing duplicates. Then, 3198 records were excluded after screening the title and abstract, leaving 115 articles for full-text eligibility screening. Of these, 93 articles met the inclusion criteria, and in addition, 5 articles were identified through hand-searching, resulting in 98 articles included in this review. See details of the literature search and reasons for exclusion in the PRISMA flowchart in Figure 1.
The articles were structured into four categories, based on the primary reason for using wearable technology: observational studies (n = 46), interventional studies (n = 34), algorithm/method development studies (n = 12), and validity/feasibility studies (n = 6).

Categories of Reported Outcome Measures
The reported outcome measures related to real-world ambulation were merged into eight categories: step count, fitness and intensity of activity, type of activity and body posture, commercial scores, prosthetic use and fit, gait quality, GPS, and accuracy.
Step count was also reported in combination with other variables of walking activity, such as number of steps per intensity activity [49,90], per walking technique [59], per walking bout [24,105], per activity classification [102,112], or the maximal number of consecutive steps taken [33]. Some articles reported on step count related to location, such as number of steps taken at home and away from home [70,105,110], per community category [54,57], or the difference in steps between a day inpatient and a day out-patient [36].

Fitness and Intensity of Activity
Thirty articles reported outcome measures that were related to a person's fitness level or the intensity of the measured activity. The majority of the studies measured the cadence in steps per minute, which is an indication of ambulatory intensity [120]. Sixteen articles reported the time, frequency, or number of steps in specific intensity intervals, based on the parameters cadence or acceleration [4,21,27,28,45,47,49,51,61,62,81,90,95,101,117,118]. The number of intensity intervals and the cut-off values were diverse, although most studies used intervals for low-, medium-, and high-intensity activity. Some studies additionally included the time or frequency spent sedentary, i.e., inactivity [21,28,45,47,81,101]. Four articles demonstrated cadence distribution, by visualizing cadence per walking bout categorized according to duration and number of bouts [86], or by quantifying the cadence variability [34,69,108]. Five articles reported the most intensive walking activity as the maximum or peak values of cadence averaged over a certain time frame, such as the average cadence of the most intensive 60 min, 30 min, or 1 min [28,51,61,108,114]. Parameters related to activity intensity and fitness other than cadence were walking speed [70,105], heart rate [63,67], and the acceleration of body movements in m/s² or g (1 g = 9.81 m/s²) [62,63,67,119]. Two articles reported the cadence variability scale parameter, which was a calculation of the distribution spread of cadence variability over the duration of the observation period [34,108].
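To make the interval-based and peak-based measures concrete, the following is a minimal sketch, assuming a minute-by-minute cadence series as input. The cut-off values and function names are hypothetical; the reviewed studies used diverse cut-offs, and it is also an assumption here that the "most intensive" minutes need not be consecutive.

```python
from typing import Dict, List

# Hypothetical cut-offs in steps/min; not taken from any of the reviewed studies.
LOW_MAX = 30
MEDIUM_MAX = 60


def intensity_minutes(steps_per_minute: List[int]) -> Dict[str, int]:
    """Count the minutes spent sedentary and in low, medium, and high intensity."""
    counts = {"sedentary": 0, "low": 0, "medium": 0, "high": 0}
    for cadence in steps_per_minute:
        if cadence == 0:
            counts["sedentary"] += 1
        elif cadence <= LOW_MAX:
            counts["low"] += 1
        elif cadence <= MEDIUM_MAX:
            counts["medium"] += 1
        else:
            counts["high"] += 1
    return counts


def peak_cadence(steps_per_minute: List[int], n_minutes: int) -> float:
    """Average cadence of the n most intensive minutes (assumed non-consecutive)."""
    if n_minutes <= 0 or n_minutes > len(steps_per_minute):
        raise ValueError("n_minutes must be between 1 and the number of recorded minutes")
    top = sorted(steps_per_minute, reverse=True)[:n_minutes]
    return sum(top) / n_minutes
```

Calling `peak_cadence(series, 30)` on a full day of minute-level data would give one possible reading of the "most intensive 30 min" measure; a study that requires consecutive minutes would need a sliding-window variant instead.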

Type of Activity and Body Posture
Twenty-six articles reported outcome measures that were related to the type of activity or body posture. To what extent the activity was specified varied among studies. Ten articles reported only the amount of activity and/or inactivity in duration, percentage or number of bouts, without further specifying the type of activity [38,60,61,77,83,92,93,95,98,113]. Articles that specified the type of activity reported activities such as stepping, walking, sitting, lying, standing, or other activities [24,37,41,43,62,63,66–68,80,91,109,119]. Two articles included an additional specification by classifying walking activity into different categories, such as turns [102] or directional locomotion [112]. Three articles reported the number of sit-to-stand transitions [26,63,67].
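As an illustration of how posture-based measures of this kind could be derived, the sketch below assumes that a device or classifier has already produced one posture label per fixed-length epoch. The label names, the grouping of upright postures, and the function names are assumptions for illustration and do not reproduce any specific device's algorithm.

```python
from collections import Counter
from typing import Dict, List

# Assumed grouping: these labels count as "upright" for transition detection.
UPRIGHT = {"standing", "stepping"}


def posture_share(labels: List[str]) -> Dict[str, float]:
    """Percentage of epochs spent in each posture label."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


def sit_to_stand_transitions(labels: List[str]) -> int:
    """Count epochs where 'sitting' is directly followed by an upright posture."""
    return sum(
        1
        for prev, cur in zip(labels, labels[1:])
        if prev == "sitting" and cur in UPRIGHT
    )
```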

Commercial Scores
Fifteen articles reported a commercial score, or a score based on a custom calculation. Three articles reported a commercial score of the K-level [87,88,114]. Three articles reported a commercial score that indicated level of physical activity, i.e., the physical activity index [62], modus index [69], ambulation energy index [69], peak performance index [69], and Fitbit activity score [117]. The latter article also reported the Fitbit Web derived miles walked, calories, and number of floors climbed. However, since the commercial score did not account for height, weight and age of the users, the authors also developed a custom model for calculation of calories and the activity score including these variables [117]. Six articles reported a custom calculated score. Two articles calculated the K-level, using the three variables potential to ambulate, cadence variability and energy expenditure [48,116]. In addition, Orendurff et al. [116] reported a clinically judged K-level by a prosthetist, who subjectively rated the three variables in figures from the StepWatch data. Three articles reported the distance walked, of which two calculated distances using the clinically assessed step length multiplied by daily step count [33,96]. Darter et al. [97] reported distance walked and walking speed, but did not further describe the calculation.
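The distance estimate used in [33,96], step length multiplied by daily step count, can be written out as a short sketch. The function name and the units (metres in, kilometres out) are assumptions for illustration.

```python
def distance_walked_km(daily_steps: int, step_length_m: float) -> float:
    """Estimate daily distance as step count times clinically assessed step length."""
    if daily_steps < 0 or step_length_m <= 0:
        raise ValueError("daily_steps must be >= 0 and step_length_m must be > 0")
    return daily_steps * step_length_m / 1000.0
```

Because a single clinically assessed step length is applied to every step, this estimate cannot reflect changes in step length across terrain or walking speed.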

Prosthetic Use and Fit
Eleven articles reported outcome measures related to prosthetic use and prosthetic fit. Five articles reported results on the duration that the prosthesis was worn, although using diverse terms [26,27,39,41,101]. Three studies included counts of doffing the prosthesis [41,109], or the duration of the prosthesis doffed [83]. Outcome measures related to prosthetic fit aimed to monitor displacement of the socket relative to the limb after wearing the prosthesis, and were measured through sensor pressure change or sensor signal loss during the wearing period [106,107].
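Wear duration and doffing counts can both be derived from a simple worn/not-worn signal. The sketch below assumes such a signal is available as one boolean per epoch; how the reviewed studies detected wear is not reproduced here, and the function name and epoch length are hypothetical.

```python
from typing import List, Tuple


def wear_summary(worn: List[bool], epoch_minutes: float = 1.0) -> Tuple[float, int]:
    """Return (wear time in hours, number of doffing events).

    A doffing event is counted each time the signal changes from worn to not worn.
    """
    wear_hours = sum(worn) * epoch_minutes / 60.0
    doffs = sum(1 for prev, cur in zip(worn, worn[1:]) if prev and not cur)
    return wear_hours, doffs
```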

Gait Quality
Seven studies reported outcome measures that were related to gait quality. Davis-Wilson et al. [29] assessed cumulative loading during ambulation that was calculated by the formula: daily steps/2 × peak ground reaction force, the latter was measured with a force plate and normalized to body weight. Kim et al. [105] calculated stride length from the three-dimensional position of the foot using IMU data. Frossard et al. [112,113] reported in two articles temporospatial parameters, i.e., the duration of the gait cycle, swing and support phases, and kinetic parameters, i.e., the forces, moments and impulses along the anteroposterior, mediolateral, and long axis of the prosthesis to categorize ADL activities into different locomotory activities. Two articles developed an algorithm to indicate gait quality, i.e., Kaufman et al. [80] calculated so-called gait entropy, and Gaunaurd et al. [74] developed a Machine Learning Classifier that gave biofeedback related to balance, toe load and knee flexion. Kaluf et al. [69] reported stance/swing time, that was calculated with the ModusTrex software.
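The cumulative loading formula attributed to Davis-Wilson et al. [29] above, daily steps/2 × peak ground reaction force, can be expressed as a minimal sketch. The function and parameter names are assumptions; the peak force is taken as already normalized to body weight, so the result is in body weights per day.

```python
def cumulative_loading(daily_steps: float, peak_grf_bw: float) -> float:
    """Cumulative loading in body weights per day.

    daily_steps: total daily step count from the activity monitor.
    peak_grf_bw: peak ground reaction force from a force plate, in body weights.
    """
    if daily_steps < 0 or peak_grf_bw < 0:
        raise ValueError("inputs must be non-negative")
    return daily_steps / 2 * peak_grf_bw
```

Note that this measure combines a real-world quantity (daily steps) with a laboratory quantity (peak force), so it inherits the assumption that the laboratory force is representative of every real-world step.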

GPS
Six studies included GPS data in addition to ambulatory activity measurements; however, only five reported GPS-related results. Jamieson et al. [103] used GPS to record elevation data in the Strava app (Strava Inc., San Francisco, CA, USA), to aid with labelling uphill and downhill movement, but did not report results directly related to the GPS data. Kim et al. [70,105] used a GPS-enabled smartphone in two studies in which non-sedentary periods were identified from the raw data and combined with the location to determine where the activity occurred. Results of daily steps, cadence and walking speed were divided into measures at home and away from home. Godfrey et al. [114] used GPS data to confirm whether steps were taken in the home or in the community to calculate the Modified Clinical K-level. In two other studies from Hordacre et al. [54,57], a GPS travel recorder was combined with a StepWatch to specify community activity into seven categories: employment, residential, commercial, health services, recreational, social, and home.
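One simple way to split step counts into "at home" and "away from home" is to compare each GPS fix with a known home location. The sketch below is an assumption-laden illustration, not the method of any cited study: the 50 m home radius, the sample format (latitude, longitude, steps), and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0
HOME_RADIUS_M = 50.0  # hypothetical threshold for counting a fix as "home"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def steps_home_vs_away(
    samples: List[Tuple[float, float, int]], home: Tuple[float, float]
) -> Dict[str, int]:
    """Sum the steps of each (lat, lon, steps) sample into 'home' or 'away'."""
    totals = {"home": 0, "away": 0}
    for lat, lon, steps in samples:
        distance = haversine_m(lat, lon, home[0], home[1])
        totals["home" if distance <= HOME_RADIUS_M else "away"] += steps
    return totals
```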

Accuracy
Four articles reported outcome measures of accuracy that were directly related to the wearable technology used. Griffiths et al. [104] reported F-scores and confusion matrices for eight different models that classified the postures sitting, standing, stepping and lying. Similarly, Jamieson et al. [103] implemented classifiers and a neural network for activity recognition using eight models and three levels of label resolution. They reported classification accuracy, F1-scores, and confusion matrices for the two models with the highest accuracy and the accuracy of the models for each participant with LLA [103]. Redfield et al. [111] reported on the agreement between activity classification using one or two accelerometers, and van Dam et al. [119] reported the test-retest reliability of the activity monitor by identical assessments on two separate days.
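For readers less familiar with these accuracy measures, the following sketch shows how a confusion matrix and per-class F1-scores relate to each other for a posture classifier. It uses only the standard library; the function names are illustrative and the cited studies' own implementations are not known.

```python
from typing import Dict, List


def confusion_matrix(true: List[str], pred: List[str], classes: List[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def f1_per_class(matrix: List[List[int]], classes: List[str]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN) for each class; 0.0 if the class never occurs."""
    scores = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as c, but another class
        fn = sum(matrix[i]) - tp  # truly c, but predicted as another class
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores
```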

Reported Outcome Measure in Categories Per Study Design
The most frequently reported outcome measure category was step count, followed by outcome measures related to fitness/intensity of activity (Table 2).

Discussion
The overall purpose of this scoping review was to survey the scientific literature to evaluate the use of wearable activity monitors in reporting real-world ambulation and prosthetic use in people with LLA. The results demonstrate that the number of studies using wearable technologies is rising; hence, it is important to understand the opportunities and limitations in the use of these devices. By classifying the included articles according to their study design, we demonstrated that the number of algorithm/method development studies and validity/feasibility studies was relatively low, most likely because these studies are challenging to perform in the real world and hence are more often conducted in the laboratory [10]. The majority of the studies using wearable technologies in the real world were observational and interventional studies. This is not surprising, as wearable technologies enable monitoring a person's natural behavior and allow for observation over time, or assessment of the effect of an intervention. Although there exists a large battery of performance-based tests that can detect changes in physical functions and capacity [121], real-world measurements have revealed that capacity is not necessarily the same as performance [34,37]. Studies have shown that half of older community-dwelling adults classified by clinic-based tests as having high functional capacity exhibit low functional level behavior in the community [48]. The use of wearable technology in the real world extends the understanding of a person's natural behavior by monitoring parameters that have not been feasible to measure in the laboratory.
This review identified multiple outcome measures that have become available for monitoring with the use of wearable technologies in the real world. The most frequently reported outcome measure was daily step count, which is an indication of the level of physical activity. Results of multiple studies have demonstrated that the majority of people with LLA do not meet the recommended level of physical activity [16]. Lower levels of physical activity in this population are associated with an increased risk of developing cardiovascular diseases [122], and lower perceived quality of life [123]. Monitoring physical activity may facilitate the development of personalized treatments that optimize the individual health status.
Step count is also used to calculate cadence (steps·min⁻¹), a measure that is closely related to walking speed and hence indicates the intensity of walking [124]. Walking intensity classified in intervals provides valuable information about the structure of daily walking activity. For instance, Kim et al. [105] showed that people with LLA and able-bodied control persons had a similar variance in walking intensity, but the LLA group had a more positively skewed distribution of intensity, indicating that both groups had similar ranges of intensity, but that the LLA group took more of their steps at lower cadence. The cadence variability, i.e., the ability to walk at multiple speeds, is considered an important determinant of functional mobility and hence community ambulation. Some studies used cadence to report the upper boundaries of physical activity [28,46,51,61,108,114]. Peak values of performed physical activity are an indication of a person's fitness and ability to perform high-intensity physical activity. This is important, because previous research demonstrated that a larger amount of high-intensity physical activity is associated with higher cardiorespiratory fitness [28]. Despite the scarce evidence on this topic, the results indicate that assessment of the most intensive physical activity performed in the real world can be a valuable measure to assess overall health status.
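The distinction between variance and skewness in the finding of Kim et al. [105] can be illustrated with a small sketch. It computes the population (Fisher-Pearson) skewness of a cadence sample; the function name is hypothetical and the example values in the usage note are invented for illustration, not taken from any study.

```python
from statistics import fmean, pstdev
from typing import List


def skewness(values: List[float]) -> float:
    """Population (Fisher-Pearson) skewness: mean cubed deviation divided by sd cubed."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)
```

Two invented samples such as `[20, 20, 20, 20, 80]` and `[80, 80, 80, 80, 20]` have the same variance but skewness of opposite sign: the first, positively skewed sample has most of its values at low cadence, which is the pattern described for the LLA group.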
Other identified outcome measures in this review were related to prosthetic use and information about the environment in which the prosthesis is used. The amount and structure of prosthetic use are directly related to the amount of prosthetic ambulation, which, in turn, is an indication of prosthetic fit and trust in the prosthesis. For instance, studies have shown that donning and doffing the prosthesis influences limb fluid volume, and temporarily doffing the socket is necessary to facilitate limb fluid volume recovery that is retained during subsequent activity [107,109]. However, Balkman et al. [41] argue that frequent donning and doffing of the prosthesis can be an indication of a poorly fitted prosthesis, which can cause skin problems on the stump. Monitoring when, how and how much a person uses the prosthesis can provide valuable information to clinicians about prosthetic fit and functioning. To investigate the amount and location of prosthetic use in the community, multiple studies have used GPS and found that ambulatory patterns outside the home are different from inside the home [70,105]. Ambulation away from home requires a higher level of functional mobility, because it generally covers larger distances, and is influenced by environmental factors such as obstacles, terrain, and variable weather conditions [54,57]. Jamieson et al. [103] used recordings from a chest-mounted camera in addition to GPS data to determine the type of terrain that participants walked over. They observed variation among the participants, i.e., some participants walked on certain terrains that other participants avoided or rarely walked over, such as sandy terrain [103]. Hence, assessment of the amount and patterns of community activity is important in prosthesis prescription and to examine the ability to participate in society, which is an important determinant of quality of life [54].
By categorizing the identified outcome measures in this review, we were able to obtain a clear overview of which categories were reported in different study design. The results showed that of the eight categories, the category step count was the most frequent reported category, regardless of the study design. It is, however, arguable whether step count is an appropriate outcome measure for different research questions. To obtain a sufficient degree of construct validity in a study, it is important that the chosen outcome measures reasonably represent what it intends to measure. A surprising observation in this review is that 82% of the interventional studies report one or multiple outcome measures related to step count, however, the majority of the studies did not find a significant effect of the intervention on daily step count [70][71][72][73]76,78,82,85,86,90,91,93,95,96,[98][99][100]. For instance, Klute et al. [116] measured improvements in kinematic and metabolic walking efficiency in laboratory tests using a microprocessor knee versus a hydraulic knee, but reported no change in real-world ambulatory patterns. Accordingly, Andrysek et al. [85] showed no difference in step count between an automatic stance-phase lock knee (ASPL) and a weight-activated braking knee, despite the lower energy expenditure measured for the ASPL knee. Moreover, participants in this study rated the ASPL knee higher in terms of knee stability and improved walking, which could be interpreted as encouraging factors for prosthetic use, but this did not result in increased step count. Segal et al. [90] demonstrated that participants wearing a torsion adapter tended to take more low-and medium-intensity steps, but fewer high-intensity steps compared to a rigid adapter. However, total daily step count was not different between the adapters, indicating that the structure of walking might change, but not the total amount of ambulation [90]. According to Wurdeman et al. 
[82], changing a prosthesis will change the biomechanics of the individual, such as the step length, but not the behavior and daily routines that mainly determine the number of steps walked. Yet, interventions targeting behavior change, such as physical activity level, have neither demonstrated significant increases in daily step count [75,77]. A few studies have shown small increases immediately after the intervention, but these effects disappeared on the long-term [81,84]. Imam et al. [84] demonstrated a long-term improvement in walking capacity, but the intervention did not result in participants increasing their physical activity level. Hence, it is suggested that it is the individual's willingness to change or changes in daily routines that can lead to behavior change, rather than any enabling technology or intervention [101]. Likewise, observational studies that aim to gain understanding of mobility are sometimes limited in construct validity. Anderson et al. [31] found no difference in daily step count between fallers and non-fallers, indicating that number of steps is not necessarily a determinant for falls. The participants experienced falls mostly caused by intrinsic destabilization sources, inadequate weight shift patterns, and transfer-related functional activities, i.e., factors that are related to balance, and it would therefore be more likely to detect a between-group difference in parameters related to balance [31]. Overall, our findings demonstrate the often-used outcome measure step count has limited ability to detect changes in walking behavior, and this might have consequences on a study's construct validity. Hence, researchers should consider whether they capture the relevant information when designing their studies. The challenge whether sampled information is representative of the investigated situation was earlier demonstrated in the representative design developed by Brunswik [125]. 
The representative design, which is a methodological approach to achieving generalizability of results, requires researchers to sample information that is representative of the 'target ecology', and to specify how those conditions are represented in the experiment. Building on this approach, we encourage researchers to define what they intend to measure in their experiments and to reflect on whether the selected outcome measure can answer their research question. Additionally, we recommend that researchers elaborate more precisely on the limitations of the reported outcome measures to avoid misinterpretation of the results.
As discussed earlier, in-laboratory studies of the mobility of people with LLA are often limited in the extent to which study findings can be generalized to real-world situations [5]. Measurements in the real world may overcome some of the limitations of in-laboratory testing, enhancing the ecological validity of the studies. According to the definition of Martin T. Orne, ecological validity refers to the generalization of experimental findings to the real world outside the laboratory [126]. Despite the increasing popularity of studying people with LLA outside the laboratory, ecological validity is a rarely discussed topic in the field of prosthetic mobility [127]. Among the studies included in this review, only four mentioned the term ecological validity, and none specified what the term implies or how it is relevant to the interpretation of their results [59,73,78,112]. This concern was expressed earlier by Holleman et al. [128] in the field of social sciences, where there is an ongoing debate about the definition of ecological validity and how to enhance the understanding of human behavior in the real world. There seems to be no agreed-upon definition in the literature, nor any form of classification or quantitative approach to determine or evaluate a study's ecological validity. Holleman et al. [128] describe how technological advances have further stimulated researchers to emphasize the importance of studying human behavior in the real world. However, they additionally argue that labeling an experiment as 'ecologically valid' because it is conducted in a 'real-world' environment can lead to misleading and potentially counterproductive conclusions [128]. Therefore, they highlight the importance of developing and critically examining the contemporary framework of ecological validity.
The contemporary framework for the evaluation of ecological validity includes the dimensions of stimuli, tasks, behaviors, and research context, each of which can be evaluated on a continuum of artificiality versus naturality and simplicity versus complexity. Whereas the in-clinic environment is characterized by its artificiality and simplicity, the real world is at the other extreme and is characterized by its naturality and complexity. For the studies included in this review that monitor prosthetic ambulation in the real world, the environment is in principle higher in ecological validity than in laboratory studies. That is, the study subjects perform their normal behavior, without receiving instructions and without any other demand characteristics that can influence their behavior. However, based on the results of this review, we believe that contemporary wearable technologies are limited in their ability to capture the essential information of real-world ambulation. This review demonstrated a poor diversity in reported outcome measures; in particular, studies using commercial devices rather than custom-developed devices were limited to reporting step count or the intensity of activity. Therefore, we wish to offer some suggestions for the future development of wearable technologies. First, we observed that few studies included essential determinants of community ambulation, such as parameters related to prosthetic fit or gait quality (11% and 7% of the included articles, respectively). Research has demonstrated that gait symmetry and step length, i.e., indications of gait quality, are associated with performance-based measures [129], and that walking capacity is associated with walking performance in the community [28,61]. Hence, gait quality might also be associated with the amount and structure of community ambulation.
Therefore, the technological development and advancement of wearable sensors should include outcome measures of the gait quality of prosthetic ambulation. Second, we observed that parameters related to balance, which is an important determinant of prosthetic mobility [130], are not yet included among the features of contemporary wearable technology, although several studies have demonstrated that balance confidence is associated with the level of community activity and participation [42,49]. Hence, future research should investigate the potential of including parameters related to balance in the advancement of wearable technologies. Last, mobility involves dimensions that are challenging to quantify, such as pain, fear, motivation, confidence, or other psychosocial aspects [3,32]. Enhancing the understanding of the complexity of prosthetic mobility in daily life may facilitate further development of wearable technologies for the purpose of monitoring ambulatory behavior in this population. As such, we recommend that future researchers draw on studies that investigate prosthetic mobility in daily life using a holistic approach, such as those performed by Hafner et al. [8], Batten et al. [3], and Miller et al. [32]. On the other hand, we recognize that the complexity of real-world ambulation and the diversity in human behavior might go beyond the potential of technology. Yet, technological advancements that aim to integrate more variables that are important determinants of prosthetic mobility can enhance the opportunities to capture essential information of real-world ambulation.

Limitations
First, our review was limited to English publications only, and may have excluded important studies published in other languages. Second, the classification of articles based on study design was performed by the two reviewers and judged according to the aim of the study. Many studies had multiple study objectives that could be considered under different study designs, such as interventional studies that in addition had objectives that were essentially observational, or algorithm/development studies that also included a form of accuracy assessment. However, we believe that the classification of study designs used in the present review is appropriate for describing the main objective of each study. Third, the synthesis of outcome measure categories was a subjective evaluation by the reviewers. The large variety in outcome measures and related units may have caused some overlap between categories. For instance, cadence was considered an indication of walking intensity, although it is in essence based on the number of steps. However, we considered the category step count to be more related to the level of physical activity, and the intensity of activity to be more related to the structure of walking throughout the day. Last, our concern regarding the extent to which reported outcome measures can answer a study's research question was based on an overall evaluation of all studies included in this review. Judging each individual study goes beyond the scope of a scoping review [128]. Yet, we believe that our evaluation was sufficient to emphasize the need to report outcome measures that capture the essential information of real-world ambulation of people with LLA.

Conclusions
To the best of our knowledge, this is the first review that presents the reported outcome measures in studies of real-world ambulation in people with LLA. We identified that the most frequently used outcome measure was related to step count, regardless of study design. We have expressed our concern that step count might not be a reliable outcome measure for detecting the effect of an intervention, as step count is highly dependent on a person's daily routine. Other important outcome measures, such as those related to the type of activity or the intensity of activity, were reported less often. Only a few studies reported outcome measures related to gait quality or prosthetic fit. In future research, we encourage researchers to reflect on whether the selected outcome measures are representative of the investigated situation, and to elaborate on the limitations of the reported outcome measures. Additionally, we argue that contemporary technology is limited in providing a comprehensive picture of real-world ambulation. In the future development of wearable technologies, we encourage researchers to integrate variables that are important determinants of prosthetic mobility. Furthermore, as the use of wearable technology in the real world is expected to increase further, we encourage researchers in the rehabilitation sciences to engage in the debate on, and the development of, the definition and framework of ecological validity.