An Objective Methodology for the Selection of a Device for Continuous Mobility Assessment

Continuous monitoring by wearable technology is ideal for quantifying mobility outcomes in “real-world” conditions. Concurrent factors such as validity, usability, and acceptability of such technology need to be accounted for when choosing a monitoring device. This study proposes a bespoke methodology focused on defining a decision matrix to allow for effective decision making. A weighting system based on responses (n = 69) from a purpose-built questionnaire circulated within the IMI Mobilise-D consortium and its external collaborators was established, accounting for respondents’ background and level of expertise in using wearables in clinical practice. Four domains (concurrent validity, CV; human factors, HF; wearability and usability, WU; and data capture process, CP), associated evaluation criteria, and scores were established through literature research and group discussions. While the CV was perceived as the most relevant domain (37%), the others were also considered highly relevant (WU: 30%, HF: 17%, CP: 16%). Respondents (~90%) preferred a hidden fixation and identified the lower back as an ideal sensor location for mobility outcomes. Overall, this study provides a novel, holistic, objective, as well as a standardized approach accounting for complementary aspects that should be considered by professionals and researchers when selecting a solution for continuous mobility monitoring.


Domains and Relevant Criteria
The factors to evaluate were grouped into four domains (d = 1, ..., 4). Both the domains and the associated criteria (c = 1, ..., N) were identified (Figure 2) through a combination of literature search and expert opinion within the IMI Mobilise-D consortium, which brings together many of the world's leading scientists, clinicians, and companies working on mobility assessment (>150 professionals from 34 partners; https://www.mobilise-d.eu):
• Concurrent validity: factors related to the validity of the measurements.
• Human factors: factors related to the context of data capture, the user's perception of the technology, data security and privacy, and the effect of monitoring outside clinical settings.
• Wearability and usability for the wearer: e.g., size, location, fixation modality, and charging frequency.
• Data capture process: e.g., whether a calibration procedure, device programming, or anthropometric information is required for appropriate data capture.

Concurrent Validity Criteria
To properly assess the criteria within this domain, reference parameters measured with a gold standard system (e.g., stereophotogrammetry or an instrumented walkway for mobility evaluation) need to be available. While several parameters can be captured during continuous mobility monitoring, this study focused on real-world walking speed (RWS) as a representative example. The level of agreement (expressed as the intraclass correlation coefficient, ICC), accuracy, robustness, and reliability of RWS measurements can be assessed to quantify the associated sources of error. Since the validity of RWS estimation depends on the identification of both a walking bout [5] and the initial and final contacts of the foot with the floor [29], the validity of these events needs to be considered as well.

Human Factors Criteria
Acceptance and adoption of wearable devices are affected by the wearer's view on the use of such devices to manage their health condition [33,34], data security [33], and their experience of, and adherence to, the proposed data capture process [38]. Of paramount importance for the wearer is the perceived impact that being monitored can have on daily life activities, as well as trust in the measurements collected by the device; perceived usefulness strongly correlates with wearer acceptance [43].

Wearability and Usability Criteria
Widespread deployment of wearables requires "perceived usefulness" by the stakeholders, and benefits of use to be balanced with "perceived ease of use" [43]. Comfort, battery life, and feedback provided by the device are additional elements to be considered within this domain [35,36], as well as its size, location, and method of attachment to the body [39,44,45].

Data Capture Process Criteria
Some devices/algorithms perform optimally when additional calibration procedures are performed, such as holding a static posture [7,31], device programming [46], or providing anthropometric measurements as an input [6,21]. These elements directly affect participant-device interaction and should be accounted for.

Weighting System
A questionnaire was designed to establish the relevance (i.e., weighting system) of the selected domains and criteria. Approval from the University of Sheffield Research Ethics Committee (Application 027211) was obtained for this study, and participants agreed to take part in the research by completing the anonymous online form. The online questionnaire was circulated among the 34 partner institutions belonging to the Mobilise-D consortium, which consists of more than 150 professionals (e.g., scientists, clinicians, and companies) working on mobility assessment using wearable devices (www.mobilise-d.eu), and its external collaborators. Before widespread distribution, the ad-hoc questionnaire was pilot tested for both readability and data acquisition, using feedback from various professionals.
Following the process visualized in Figure 3, the gathered responses were used to assess:
i. Respondents' background.
ii. Respondents' level of expertise (LoE) in using wearable devices, based on yes/no questions that included:
2. As a researcher, have you ever used a wearable device?
3. Have you ever used wearable devices directly on patients as opposed to healthy individuals?
4. Have you ever analysed the information/data extracted from wearable devices to characterise patients' mobility?
iii. Each positive response was scored as 0.25, and the total LoE was obtained as the sum of the partial scores. The LoE of each participant was then classified as excellent, good, average, poor, or none if the total LoE was 1.00, 0.75, 0.50, 0.25, or 0, respectively.
iv. Respondents' perceived level of importance of each domain and criterion, based on a 1-5 Likert scale (1 = unimportant; 5 = very important).
v. The modal value of the responses for each domain and criterion, $\omega_d$ and $\omega_{d,c}$, respectively, calculated from the preferences indicated by each respondent. The preferences were multiplied by the relevant LoE, which allowed the respondents' level of expertise to be accounted for.
Finally, the computed weights were normalised as [42]:
For each domain d:
$$w_d = \frac{\omega_d}{\sum_{k=1}^{4} \omega_k}$$
For each criterion c:
$$w_{d,c} = \frac{\omega_{d,c}}{\sum_{j=1}^{N} \omega_{d,j}}$$
where N is the number of criteria included in the relevant domain (d).
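The expertise scoring and weight normalisation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the ratings in the usage note are invented, and reading "multiplied by the relevant LoE" as an LoE-weighted vote count (with ties resolved towards the higher rating) is an assumption, since the text does not spell out that step.

```python
from collections import defaultdict

# Classification of the total LoE, as stated in the text.
LOE_LABELS = {1.0: "excellent", 0.75: "good", 0.5: "average", 0.25: "poor", 0.0: "none"}


def level_of_expertise(answers):
    """Total LoE: 0.25 for each positive answer to the four expertise questions."""
    if len(answers) != 4:
        raise ValueError("expected answers to four questions")
    return 0.25 * sum(1 for a in answers if a)


def loe_weighted_mode(ratings, loes):
    """Likert rating (1-5) with the largest LoE-weighted support.

    Assumption: each respondent's vote counts in proportion to their LoE;
    ties are resolved towards the higher rating.
    """
    support = defaultdict(float)
    for rating, loe in zip(ratings, loes):
        support[rating] += loe
    return max(support, key=lambda r: (support[r], r))


def normalise(raw_weights):
    """Divide each raw weight by the sum over all items so the result sums to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: value / total for name, value in raw_weights.items()}
```

For instance, `level_of_expertise([True, True, True, False])` gives 0.75, which `LOE_LABELS` maps to "good"; passing hypothetical modal values such as `{"CV": 5, "WU": 5, "HF": 4, "CP": 4}` to `normalise` returns weights that sum to 1.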

Scoring System
Each criterion was first classified as either "benefit" or "cost" (Table 1) and scored higher or lower according to whether it implied a better or worse sensor/algorithm solution [42]. Scores were normalised with respect to their range of variation within each criterion and domain:
Benefit criteria:
$$e_{d,c} = \frac{s_{d,c} - \min(s_{d,c})}{\max(s_{d,c}) - \min(s_{d,c})}$$
Cost criteria:
$$e_{d,c} = \frac{\max(s_{d,c}) - s_{d,c}}{\max(s_{d,c}) - \min(s_{d,c})}$$
where $s_{d,c}$ is the score assigned to criterion c of domain d, and $e_{d,c}$ is its normalised value ($0 \le e_{d,c} \le 1$). Only respondents who had declared a technical background were asked to score the concurrent validity criteria, based on the following definitions:
• Accuracy: closeness of an estimated parameter ($p$) to the "true value" measured using a gold standard ($p_{GS}$), expressed as a percentage:
$$e\% = \frac{|p - p_{GS}|}{p_{GS}} \times 100$$
• Robustness to changes in the device positioning, quantified as e%.
• Reliability between different trials, quantified as e%.
• ICC: the agreement between $p$ and $p_{GS}$ in different trials.
• Sensitivity (%): describes the true positive (TP) events, i.e., the number of gait events (GEs, defined as initial and final foot-to-ground contacts and used to identify strides, steps, and gait cycle phases [18], expressed as unitless numbers) and walking bouts (WBs) correctly identified with a device/algorithm solution ($n_{GE}$), as compared to the values from a gold standard ($n_{GE\_GS}$):
$$\text{Sensitivity} = \frac{n_{GE}}{n_{GE\_GS}} \times 100$$
• Specificity (%): number of true negative (TN) events relative to the actual events assessed with a gold standard:
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100$$
• Positive predictive value (%): TP events over the total number of identified GEs, including falsely detected GEs (TP + FP):
$$\text{PPV} = \frac{TP}{TP + FP} \times 100$$
Criteria from the other domains were scored using the system shown in Table 1. Location and fixation modality scores were defined by asking participants to rank possible choices taken from the literature [20,47-49]. They were asked to indicate the best three from twelve locations (lower back/hip/waist; pocket; chest; neck (body-fixed); neck (pendant); head; foot; ankle; shank; thigh; wrist; arm) and five fixation modalities (adhesive on the skin; strap above/below clothes; clip above/below clothes). The recorded ranking scores (1, 2, and 3 for the 3rd, 2nd, and 1st choice, respectively) were then scaled by the respondents' LoE.
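The scoring step can be illustrated with a short Python sketch. It assumes the usual min-max form for the benefit/cost normalisation, a percentage error relative to the gold standard, and the standard confusion-matrix definitions of sensitivity, specificity, and positive predictive value; the function names and the numbers in the usage note are hypothetical and not taken from the study.

```python
def normalise_scores(scores, kind):
    """Min-max normalise raw scores for one criterion across candidate solutions.

    kind="benefit": a higher raw score maps to a higher normalised score.
    kind="cost":    a higher raw score maps to a lower normalised score.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must vary across solutions to be normalised")
    if kind == "benefit":
        return [(s - lo) / (hi - lo) for s in scores]
    if kind == "cost":
        return [(hi - s) / (hi - lo) for s in scores]
    raise ValueError("kind must be 'benefit' or 'cost'")


def percent_error(p, p_gs):
    """Absolute error of an estimate p relative to the gold standard p_gs, in percent."""
    return abs(p - p_gs) / abs(p_gs) * 100


def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and positive predictive value, in percent.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn) * 100,
        "specificity": tn / (tn + fp) * 100,
        "ppv": tp / (tp + fp) * 100,
    }
```

As a usage example, `normalise_scores([2, 4, 6], "cost")` returns `[1.0, 0.5, 0.0]`, so the solution with the lowest cost receives the highest normalised score, and `percent_error(1.1, 1.0)` is approximately 10.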

Comparison of Concurrent Solutions
For each monitoring solution i, an overall score $E_i$ was finally computed from the partial scores obtained for the different domains and criteria and from the calculated weights:
$$E_i = \sum_{d=1}^{4} w_d \, e_d, \qquad e_d = \sum_{c=1}^{N} w_{d,c} \, e_{d,c} \qquad (5)$$
where $w_d$ is the normalised weight of domain d, and $e_d$ is the overall score of domain d, obtained by combining the scores $e_{d,c}$ with the normalised weights $w_{d,c}$ assigned to each of the N criteria.
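A minimal sketch of this aggregation, assuming the overall score is a weighted sum of domain scores and each domain score is a weighted sum of its criterion scores. The domain weights below are the percentages reported in the abstract; the criterion names, criterion weights, and scores for the example device are hypothetical and serve only to show the arithmetic.

```python
def domain_score(criterion_weights, criterion_scores):
    """Weighted sum of the normalised criterion scores within one domain."""
    return sum(w * criterion_scores[c] for c, w in criterion_weights.items())


def overall_score(domain_weights, criterion_weights, criterion_scores):
    """Weighted sum of the domain scores for one monitoring solution."""
    return sum(
        w_d * domain_score(criterion_weights[d], criterion_scores[d])
        for d, w_d in domain_weights.items()
    )


# Domain weights as reported in the abstract (CV 37%, WU 30%, HF 17%, CP 16%).
DOMAIN_WEIGHTS = {"CV": 0.37, "WU": 0.30, "HF": 0.17, "CP": 0.16}

# Hypothetical criterion weights; within each domain they sum to 1.
CRITERION_WEIGHTS = {
    "CV": {"accuracy": 0.5, "robustness": 0.5},
    "WU": {"location": 1.0},
    "HF": {"perceived_usefulness": 1.0},
    "CP": {"calibration": 1.0},
}

# Hypothetical normalised scores (0 to 1) for one device.
DEVICE_A = {
    "CV": {"accuracy": 1.0, "robustness": 0.5},
    "WU": {"location": 1.0},
    "HF": {"perceived_usefulness": 0.0},
    "CP": {"calibration": 0.5},
}

score_a = overall_score(DOMAIN_WEIGHTS, CRITERION_WEIGHTS, DEVICE_A)
```

With these illustrative inputs, the concurrent validity domain scores 0.75 and the overall score is 0.37 × 0.75 + 0.30 × 1.0 + 0.17 × 0.0 + 0.16 × 0.5 = 0.6575. Repeating the calculation for each candidate device yields the ranking.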

Application of the Decision Matrix
Among the studies in the literature that evaluate different solutions for estimating digital mobility outcomes (DMOs), the information and results from two were used to populate the decision matrix and to demonstrate in practice how this tool can be used in future research. Only a subset of the proposed criteria had scores available in these studies, and this subset was used in the decision matrix. The weighting system was therefore adjusted accordingly, based on the results obtained in this study. Benefit and cost scores were assigned based on Table 1 and on the relevant information obtained through the ad-hoc questionnaire (i.e., fixation modality and device location), and were normalised as described in Section 2.3. For each wearable device, the overall score was calculated using Equation (5).
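One plausible way to adjust the weights when only some domains or criteria are available is to drop the missing entries and rescale the remaining weights so that they again sum to 1. The text does not state the exact adjustment used, so the sketch below is an assumption; the function name is hypothetical, and the domain weights are those reported in the abstract.

```python
def renormalise_subset(weights, available):
    """Keep only the available items and rescale their weights to sum to 1."""
    kept = {name: w for name, w in weights.items() if name in available}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no available item carries a positive weight")
    return {name: w / total for name, w in kept.items()}


# Domain weights as reported in the abstract.
domain_weights = {"CV": 0.37, "WU": 0.30, "HF": 0.17, "CP": 0.16}

# Example: a study that reports only validity and wearability information.
adjusted = renormalise_subset(domain_weights, available={"CV", "WU"})
```

Under this assumption, the two retained domains keep their relative importance (0.37 to 0.30), with adjusted weights of roughly 0.55 for concurrent validity and 0.45 for wearability and usability.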

Participants
Sixty-nine participants submitted their responses to the questionnaire (Figure 4). Among them, 83% had either an excellent or good level of expertise (LoE ≥ 0.75) in the use of wearable devices. Table 2 shows the normalised weights for domains and relevant criteria, as calculated based on each respondent's perceived level of importance (Figure 5). Based on the obtained modal values $\omega_d$ of each domain, both the concurrent validity and the wearability and usability domains were classified as "very important" for a seven-day mobility monitoring solution. The other two domains were labeled as "important", and this classification was not modified when the respondents' LoE was considered (Figure 5).

Scores
The preferred location and fixation modality were "lower back/hip/waist" and "strap below clothes," respectively, as shown in Figure 6. The most common reasons given for choosing the lower back/hip/waist location were the respondents' previous experience with this solution in their patients, comfort, proximity to the centre of mass, the possibility of integrating the device into a belt, and the potential to "track" the movement of both lower limbs with a single device. Fixation with a strap below the clothes was preferred because of the robustness of this method, the possibility of hiding the sensor, the preservation of participant privacy, and past positive experiences with this approach.
Step time accuracy and robustness (the highest e% value reported for each method) were taken as representative of walking bout detection accuracy and robustness, respectively (Table 3).

Example 2
Among the seven wearable devices explored in Storm et al. [28], four (S1-Movemonitor, S2-Up, S3-One, S4-ActivPAL) were selected for the concurrent evaluation, which was performed using step detection accuracy as the concurrent validity criterion (Table 4). The mean step detection accuracy was calculated for each monitoring solution over the values reported for slow, self-selected, and fast walking speeds.

Discussion
This study aimed to propose a standardised methodology for selecting the optimal device for continuous mobility monitoring, with a special focus on walking speed. Although this method was implemented using professionals/researchers, a similar approach could also be used to evaluate user perspectives. The novelty of this approach is that it allows researchers to assess the relevance of domains that were previously quantified only in isolation [33-37,39,40], such as the wearability and usability of a device, in combination with aspects related to its validity and other domains. This supports a more robust choice of a specific solution.
The different aspects to be considered when exploring concurrent continuous mobility monitoring solutions were first identified, and their relevance was assessed by capturing information from experts in this research area. The identified domains and relevant criteria, the weighting system, and the scoring system are the three elements that define the decision matrix.
The scoring system, which combined "benefit" and "cost" criteria, highlighted the differences among monitoring solutions and allowed the calculation of an overall score for each of them [42]. This procedure allows a trade-off across multiple concurrent domains and criteria.
The weighting system was obtained via an experts' questionnaire and constituted an objective methodology to assess the selected elements' relevance while aiming to identify an optimal monitoring solution. Critically, this method's reliability does not rely on the knowledge and expertise of a single decision-maker, which could bias the outcomes [42]. The novelty of this developed approach is that it allows researchers to consider the respondents' expertise, making the unbiased results especially relevant for the field. The use of examples taken from the literature demonstrated how this framework could be used when only a subset of domains/criteria are available by adjusting the relevant scoring system to specific requirements.
From a professional's perspective, the concurrent validity domain, which is the one most widely considered in the literature when a new wearable device is proposed, was also confirmed to be the most important in this study (37%), even by respondents from a non-technical background (33%). Nonetheless, results indicate that the other domains are also important for the widespread deployment of wearable devices (wearability and usability: 30%; human factors: 17%; data capture process: 16%). Recently, a study [50] attempted to provide some guidelines for selecting and comparing different devices; however, the focus was still mostly on how technical specifications and raw data quality affect the validity domain.
Respondents to the questionnaire, who were professionals (i.e., developers, clinicians, and researchers) who deploy the technology, were asked to select the best three location and fixation solutions. This allowed an exact ranking to be established among different solutions for continuous mobility monitoring. Although previous studies have assessed the effect of different device locations [23-32], the effect of a variety of fixation methods had not yet been explored. Thus, we have developed and applied a novel quantitative approach that allows these criteria to be explicitly identified and ranked. Almost 90% of the respondents chose a device placed on the lower back (of these, 62%, 24%, and 14% identified this as the first, second, and third choice, respectively) because it provides accurate measurement and can be integrated with a belt. This solution is indeed usually accepted for long-term at-home use, approximates the centre of mass location, and is the most common location adopted in studies assessing mobility [48]. For the fixation method, the solution identified as ideal is one that can be hidden under the clothes (83% of the respondents). In particular, a strap (43%) and adhesive on the skin (43%) were indicated as the most robust fixation methods (i.e., those with less relative motion between the device and the segment where it is placed). Other explicit preferences for location and fixation modality included choosing a solution that provides reliable measures, comfort, and device aesthetics, confirming what has been previously reported in the literature [35,43].
Once developed, the proposed framework was successfully used for ranking concurrent solutions using data extracted from the literature to compare different algorithms applied to the same raw data, which led to conclusions similar to those from the original study [31] while also providing a single summary score for each proposed solution. When used to evaluate the performance of the different wearable devices reported in Storm et al. [28], the differences among the solutions can be further highlighted, not just considering their concurrent validity, but also the other three domains, which are key elements for the widespread use of this technology. Moreover, a similar methodology could also be implemented when selecting concurrent devices for different applications.
Users (i.e., wearers, either patients or participants), who are the real stakeholders and will directly use this technology, did not participate in this questionnaire, which represents the main limitation of this study. Nonetheless, their opinion might have been biased by their previous experience, which is usually limited to using a single wearable device. To include this essential aspect, future studies should either recruit a specific population of individuals who have previously experienced different wearable solutions for mobility monitoring or have them participate in an ad-hoc comparative assessment. As highlighted by Manta et al. [51], both patients and care partners should be engaged in the selection and development of digital mobility outcome solutions in order to identify a solution that is effective, helpful, and improves both quality and efficiency in clinical research and care. Their perception of the importance of the identified factors could then be integrated and combined with the information collected in this study. Moreover, it would be of interest to evaluate the effect that awareness of the criteria behind the design of a device might have on user perception and acceptance of the device.

Conclusions
This study proposed a novel, holistic, objective, and standardised methodology that accounts for the complementary aspects professionals and researchers should consider when selecting a solution for continuous mobility monitoring. An ad-hoc decision matrix was established for this aim, and its definition made explicit that a comprehensive approach should be adopted when choosing a technology for continuous mobility monitoring if widespread adoption is the goal. In particular, the four identified domains (concurrent validity, human factors, wearability and usability, and data capture process) should be considered simultaneously when evaluating concurrent solutions.