Quantification of Movement in Stroke Patients under Free Living Conditions Using Wearable Sensors: A Systematic Review

Stroke is a main cause of long-term disability worldwide, placing a large burden on individuals and health care systems. Wearable technology can potentially objectively assess and monitor patients outside clinical environments, enabling a more detailed evaluation of their impairment and allowing individualization of rehabilitation therapies. The aim of this review is to provide an overview of setups used in literature to measure movement of stroke patients under free living conditions using wearable sensors, and to evaluate the relation between such sensor-based outcomes and the level of functioning as assessed by existing clinical evaluation methods. After a systematic search we included 32 articles, totaling 1076 stroke patients from acute to chronic phases and 236 healthy controls. We summarized the results by type and location of sensors, and by sensor-based outcome measures and their relation with existing clinical evaluation tools. We conclude that sensor-based measures of movement provide additional information in relation to clinical evaluation tools assessing motor functioning and both are needed to gain better insight in patient behavior and recovery. However, there is a strong need for standardization and consensus, regarding clinical assessments, but also regarding the use of specific algorithms and metrics for unsupervised measurements during daily life.


Introduction
Stroke is classically characterized as a neurological deficit attributed to an acute focal injury of the central nervous system by a vascular cause and is a major cause of disability and death worldwide [1]. Around 80% of stroke victims survive [2], but their quality of life can be severely impacted in both physical and physical-emotional domains [3]. The physical effects of a stroke in the brain mainly manifest on the contralateral side of the body and can be extremely persistent: research shows that 3-6 months after stroke, 55-75% of stroke survivors still experience problems in functioning of the affected body side [4].
Motor functioning of stroke patients is typically assessed in the controlled environment of a clinic, hospital or research laboratory, usually by asking the patient to perform standard clinical assessment tests which include repetitive tasks or isolated movements. However, this functional assessment is not representative of individual free-living behavior [5]. Because daily life functioning is severely affected by stroke [3], monitoring patients during their activities of daily living (ADL) could provide more valid information about patients' functioning in their home environment. This knowledge enhances evaluation of the effects of rehabilitation interventions, which could help to improve the interventions and quality of care and thus eventually stroke patients' quality of life.
Recent technological developments in wearable technology have led to a steady increase in the number of studies monitoring movements related to activities of daily living of stroke patients. These wearable systems benefit from a high acceptance rate and simplicity and can be used independently from a base station, which makes them easy to use outside controlled environments. However, most studies that use wearable measurement systems in stroke patients are using them to measure repetition of a task or routine in clinical or simulated ADL conditions in well controlled settings. Measurements in these controlled conditions may result in performance bias and are thus not representative of the actual patients' functioning in their home environment [6]. Additionally, those measurement systems used in simulated ADL environments might not be reliable for examining daily life functioning in free-living environments: movements in the latter less organized environment are self-initiated, usually task oriented, less predictable and have a higher variation [7].
Lately, an increasing number of studies have tried to gain better insight into the real-life behavior of patients by evaluating them in their free-living environment [8][9][10][11]. Using wearable sensors quantitative aspects of movement, such as the amount of activity, the number of steps or repetitions or the intensity of movement can be determined. However, for functioning in daily life qualitative aspects of movement are also important, since they provide information about how movements are performed, and insight into those movement aspects that need to be tailored during rehabilitation. Since continuous monitoring of stroke patients' functioning in daily life has inherent challenges that are not present in the clinic or lab, a clear overview of the possibilities for capturing quantitative and qualitative aspects of movement through sensor-based assessment would be of great interest. Furthermore, an overview of relations between such sensor-based measures and existing clinical evaluation tools would provide insight in the validity and added value of continuous movement monitoring in stroke patients.
A few prior studies executed reviews that are related to this topic. The study of Noorkõiv et al. [12] is most comparable in scope to ours: with the aim of assessing the additional clinical value of accelerometry after stroke they selected eight studies that investigated upper extremity activity after stroke, in free-living environments and discussed correlations between accelerometry and clinical measures. However, many developments in this area of research have happened in the last decade. Another interesting review for our purposes is the one by Johansson et al. [13] who studied the use of wearable sensors for clinical applications in stroke, as well as epilepsy and Parkinson's disease. Their aim was to synthesize knowledge from quantitative and qualitative clinical studies, executed in laboratory, hospital as well free living environments, including studies of movement as well as physical activity. They included 24 studies in stroke but did not provide an overview of measurement set-ups. Gebruers et al. [14] systematically reviewed clinimetric properties and clinical applicability of different accelerometry-based measurement techniques in stroke patients. With that aim in mind, they also discussed correlations between accelerometry and common stroke scales when reported in included studies. They did not specifically focus on activity in free living conditions, however, nor did they discuss or summarize the measurement setups used. Fini et al. [15] described how physical activity is monitored following stroke, summarising methods and devices used across the stroke pathway and documenting their psychometric properties. They did not study quantification of movement nor did they focus on free living conditions, however. The most recent related review [16] focused on how wearable technologies have been used over the past decade to assess gait and mobility, but not other types of movement, in stroke patients. They did not focus on free living conditions, either. In summary, there is no recent overview of how quantitative and qualitative aspects of movement are captured through sensor-based assessment in stroke patients, during free-living conditions. Furthermore, a recent overview of relations between such sensor-based measures and existing clinical evaluation tools in this context is also missing.
The aim of this review is therefore: (1) to provide an overview of setups used in literature to measure the quantitative and qualitative aspects of movements of stroke patients under free living conditions using wearable sensors, and (2) to evaluate the relation between the sensor-based outcomes that are obtained from moving in a free living environment and the level of functioning as assessed by existing clinical evaluation methods.

Materials and Methods
This review was performed according to the preferred reporting items for systematic reviews and meta-analysis statement (PRISMA) [17] and registered in the international prospective register of systematic reviews (PROSPERO registration ID: CRD42020207226).

Search Method
A literature search was conducted on the 30 December 2021, using the PubMed, Scopus and Web of Science databases.
The search term was composed by four extensive parts separated by the AND operators: (1) Terms for stroke (2)  After combining the results of these searches, three steps were taken to identify eligible studies. First, one author (MB) removed duplicates. Second, two authors (MB, HD) screened the remaining papers independently by titles and abstracts, using the inclusion/exclusion criteria specified below. Third, remaining full-text versions were assessed independently by two authors (MB, HD). Any disagreements in the results were resolved in a consensus meeting with a third assessor (CL).

Eligibility Criteria
Included studies involved patients after a stroke episode, used wearable sensor technology for movement assessment and quantifying patient movement. Articles in other languages than English as well as abstracts, case reports, reviews, study protocols, nonoriginal research or theses were excluded. Furthermore, as we aim to only include studies under free living conditions, by which we mean that participants carried out their nor-mal routine at the clinic or at home without any behavioral constraints or instructions, studies were excluded if they used only standardized assessments or tests, if participants performed a list of scripted activities, or if they used non-passive setups, such as robotic assistance, electrical stimulation or feedback systems. Studies were also excluded if the main topic was activity recognition or measurement of the amount of physical activity, e.g., by energy expenditure. Finally, studies taking place during the hyper-acute phase of stroke (less than one day after stroke) were also excluded on the rationale that patients in such an early phase-in most cases bedridden and clinically monitored-can hardly perform any kind of activities of daily living.

Assessment of Methodological Quality
Each of the selected articles was assessed on methodological quality by two authors (MB, HD) using the tool developed by Downs and Black [18] (Supplementary Materials). For items 3-5, 10, 19, 25 and 27, we added an explanation to the table in Supplementary Materials on how the items were interpreted in the context of our review. In case of inconsistencies regarding the scores on the items, disagreements were resolved by consensus or by discussion with a third assessor (CL). The study quality was classified as high (score ≥14, 75% of the maximum score of 19), moderate (9 ≤ score < 14, 50-74% of the maximum score) or low (score < 9, 50% of the maximum score). Low-quality studies were excluded from further data extraction and synthesis.

Data Extraction and Synthesis
The data were extracted by two authors (MB, HD) using pre-formatted forms that included authors and year of publication, experimental design, sensor technology and placement, measurement task, population, outcomes extracted (clinical measures, sensorbased measures) and results.
Note that the aim of the review was not to provide a meta-analysis of results; due to the wide variety in the design of studies and outcome measures used it was not possible to pool outcome measures together. Therefore, a narrative synthesis was created, covering the main descriptive themes from the selected articles relevant for our research questions, such as characteristics of the sensor setups and relations between sensor-based movement characteristics and the patient's clinical state. The latter were expressed in correlation strength.

Results
The database search yielded 2561 studies. After removal of duplicates, 1671 remained. Application of the eligibility criteria excluded 1610 articles during title and abstract screening and 28 articles during full-text screening. The remaining 33 articles were assessed for methodological quality. One study was excluded due to poor methodological quality, scoring only four points (Supplementary Materials). Out of the remaining 32 studies that were included for data synthesis, 29 (91%) were of high quality [5,[8][9][10][11] and three (9%) of moderate quality [43][44][45]. This inclusion process is described in Figure 1. A summary of all included studies is presented in Appendix A in Table A1 (continuous recording in the clinic or during rehabilitation) and Table A2 (continuous recording during ADL).

Study Design, Sample Size and Participant Characteristics
All 32 included studies were observational, of which 17 were cross-sectional and 15 were prospective cohort studies. Sixteen studies assessed activities in the hospital or rehabilitation center (Appendix A; Table A1), and sixteen studies activities of daily living in the home environment (Appendix A; Table A2). Eleven studies compared results between stroke patients and healthy controls and five studies had their stroke patients divided into two groups for comparison between fallers and not-fallers [11], walking patients and wheelchair users [5], inpatients and outpatients [25] or controls and patients following Constraint Induced Movement Therapy (CIMT) [26,27] or other interventions [41].
In total 1076 stroke patients were included in the 32 studies, ranging from four to 169 stroke patients per study. Ten studies included data of in total 236 healthy controls in their analysis [5,8,20,23,29,33,35,36,40,44,45]. The reported average age of the stroke participants was 61 years while healthy participants were usually younger with a reported average of 55.8 years.

Protocol
All studies consisted of at least one hour of continuous measurement. In 16 of the included studies measurements were performed in pure ecological conditions at the home of the participants (Appendix A; Table A2). In the other 16 studies participants were measured or started being measured in the hospital or rehabilitation center (Appendix A, Table A1). The clinical motor assessment instruments differed between studies, however, the free-living part of their protocols was in essence the same: it started with the instrumentation of the participant by the researcher, after which they could move freely during the length of the assessment. In five studies participants were measured in both clinical and home environments (Appendix A, Tables A1 and A2). The time patients were measured ranged from 2 to 168 h. In the home environment the mean measurement duration was 73 ± 66.5 h while it was 54.5 ± 69.5 h in the hospital/rehabilitation environment. The longest measurements were 24 h a day for at least 7 consecutive days in the home and hospital environments [11,39,43]. In total 7936 h of healthy participant and 100,815 h of patient data were gathered, the latter divided into 72,463 h in home environments and 28,352 h at hospitals and rehabilitation centers.

Sensor Placement and Technology
Accelerometers and activity monitors-with embedded accelerometer sensors-were used in all studies ( Figure 2). Five studies used inertial measurement units (IMUs) [19,31,32,34,45] that combine tri-axial accelerometers, gyroscopes and magnetometers, although three of them [19,34,45] did not include the data from the magnetometer in their analysis. The other two studies [31,32] used IMUs as part of a full body motion measuring system. Two studies of the same research group [29,44] used an activity monitor combined with an electrohydraulic activity sensor that, by means of a fluid-filled tube laid over the arm from shoulder to wrist, was able to measure the elevation of the arm with respect to the body. Finally, de Lucena et al. [36] used a setup made from an accelerometer, four magnetometers and a magnetic ring to measure the amount of hand activity.
Most of the studies-26 of 32-focused their research exclusively on upper limb functioning and placed the sensors on the wrists or forearms, while twelve of these studies also placed sensors on other parts of the body [10,19,20,26,29,31,32,35,36,39,43,44]. Except for the studies [29,44], in which the (electrohydraulic) sensors were placed along the arm, from shoulder to wrist, and [36] in which a magnetic ring was used as part of the setup, the information of the additional sensors was used to derive parameters for full body kinematic models [31,32], activity recognition or gait detection [10,19,20,26,35,39]. Four studies combined upper and lower body data [5,34,37,43], and three focused on walking only, in particular on the amount, number of walking bouts and gait characteristics [11,21,23]. While the wrists were clearly the preferred location of upper body studies, gait and lower body performance were evaluated by using single or a combination of sensors on the sternum, lower back, hips, thighs and ankles [11,21,23].

Movement Measures Derived from Sensors
In total 110 different variables were reported that were calculated from the sensor signals, to examine quantitative (amount of movement) and qualitative (symmetry, variability, kinematics) aspects of upper limb activity and gait performance.

Upper Limb Activity-Related Movement Measures
Most of the upper limb activity related movement measures were based on the magnitude of the 3D accelerometer vector x 2 + y 2 + z 2 . Several studies [5,[8][9][10]19,20,22,24,25,28,30,35,36,[39][40][41][42][43] integrated this value over a time window -ranging from a few seconds [8] to several hours [10]-to quantify the amount and intensity of upper limb movement. Studies that reported their findings using similar values, like the signal magnitude area [34], or 'counts' are also included in this category. Counts represent filtered and summed acceleration signals, created by the software of manufacturers of several commercially available activity trackers. While counts also employ the magnitude of the acceleration, the algo-rithms used in creating them are not always available, which limits comparability between activity trackers. A simplification of the movement magnitude is the use time, which is calculated by applying a threshold to the amount of movement over a period of time, to determine what is movement and what is noise. Use time is usually calculated using the norm of the acceleration, except in the study by Flury et al. [37], that uses an algorithm based on the information from triaxial gyroscopes. From these two values-magnitude of movement and use time-measures of quantitative as well as qualitative aspects of movement can be calculated.
Additionally, six studies [19,29,31,32,44,45], analyzed the quality of movement by calculating kinematic measures from IMUs and electrohydraulic sensors, and one study [36] focused on the number of hand movements using a custom setup and algorithm.

Measures of Quantitative Aspects of Upper Limb Movement
Unilateral and bilateral magnitude. Unilateral magnitude refers to the amount of movement of either the affected upper limb (AUL) or the unaffected upper limb (UUL) of stroke patients. When comparing unilateral magnitude of stroke patients with controls the AUL is usually compared to the non-dominant upper limb (NDUL) of controls, and the UUL to the dominant upper limb (DUL) of controls. In eleven studies [5,[8][9][10]19,20,22,28,34,39,43] these measures were used. Bilateral magnitude refers to the amount of simultaneous upper limb movement (AUL + UUL), calculated per time window [8,28,30].
Unilateral, bilateral and total use time. Six studies [8,19,20,24,27,37] reported about the total time spent over a magnitude threshold for individual AUL and UUL (DUL and NDUL in controls) movements. Bilateral use time was reported by four studies [8,20,29,44] and is defined as the total time of performing simultaneous upper limb movements. Finally, by adding up the unilateral use time and simultaneous bilateral use time Total use time is obtained [8,29].

Measures of Qualitative Aspects of Upper Limb Movement
Symmetry of arm movement, usually expressed as a ratio, is one of the main qualitative outcomes evaluated in the literature.
Magnitude ratio was reported in ten studies [8,10,19,24,25,28,30,34,35,39] and refers to the magnitude of AUL acceleration relative to the magnitude of the UUL acceleration (AUL/UUL), per time window. It represents the relative contribution of each arm to the activity, and is an indicator of symmetry in use intensity. Similarly, the Use Ratio [25][26][27]29,30,37,44] refers to the use time of the AUL relative to the use time of the UUL and is an indicator of symmetry of upper arm use. For both magnitude and use ratio values near 1 indicate more symmetric movement.
Variability of arm movement as reported in three studies [25,33,38] is also used to asses the quality of upper limb movement. Acceleration variability is usually calculated as the standard deviation of the acceleration norm σ, per 1 min epoch [33] or over the entire monitoring period [25]. Additionally, the norm of the standard deviation of the acceleration vector components σ 2 x + σ 2 y + σ 2 z , is used as another measure of acceleration variability that may be more sensitive to rotation movements [33,38]. Variation Ratio, both a variability and a symmetry outcome, is calculated over the total recording period [25], or for every 24 h [33] as the magnitude of AUL acceleration variability relative to the magnitude of UUL acceleration variability. We include in this category the score proposed by Le Heron et al. [40], because it is both a measure of variability and symmetry between AUL and UUL.
Finally, arm kinematic measures provide several metrics of movement quality. Average joint range of motion was reported in one study [31], describing the degree of motion of elbow and shoulder. Additionally, several reaching related measures of movement were identified. Reaching area [31] is defined as the encircled trajectory of the hand position relative to the pelvis in the horizontal plane and describes the degree of motion of elbow and shoulder. The maximum distance of the hand relative to the pelvis defines reaching distance [31].
Reaching counts refers to the number of times the hand showed a displacement of more than 10 cm away from the preferred hand position [31,32] and reaching ratio is calculated as the ratio of reaching counts of the AUL relative to the UUL [31]. Distribution of forearm elevation, reported in three studies [19,29,44], is based on the distribution of the forearm elevation relative to the body over time. It is reported as a probability distribution [19] or as movement time spent in separated vertical regions relative to the body [29,44]. Gross arm movement time is the duration, during the recording period, of movements in which the sum of a change of forearm orientation in yaw and elevation is more than 30 • within a time period of 2 s, but only if the movement occurs within a range of forearm elevation between −30 • and +30 • [19,45].

Hand Movement Related Measures
Only one study focused on the study of hand movements during ADL [36]. This study proposes the quantitative measure 'HAND counts' measuring the amount of movement of the fingers with respect to the wrist, where the sensors are located.

Lower Body and Gait Related Measures
One study [34] studied lower limb activity using sensor measures equivalent to the ones used for the upper limb, namely unilateral magnitude of the affected lower limb (ALL) and unaffected lower limb (ULL) and magnitude ratio (ALL/ULL). Six included studies [5,11,21,23,37,43] reported gait related measures obtained from walking bouts during ADL. The extracted gait related measures were both quantitative and qualitative and are widely used throughout gait studies. Quantitative gait measures were duration of walking, number of steps and number of walking bouts [5,21,23,37] while the measures of qualitative aspects of movement were gait symmetry, walking speed, stride time, stride (step) regularity, standard deviation of accelerations, harmonic ratio, index of harmonicity, frequency, amplitude and width of the dominant power peak, local divergence exponent and step-time ratio [5,11,21,23,43].

General Measures of Quantitative Aspects of Movement
By applying human activity recognition (HAR) algorithms, four of the included studies evaluated the duration of patient activities during the day, providing quantitative measures of movement. The main measures of this type were percentage of activity time during the day, i.e., how long patients engage in dynamic activities or passive activities expressed as a percentage of the day [20,23,35,37].

Comparison of Movement Measures with Clinical Assessment Tools
Sixteen studies compared sensor-based movement measures with clinical assessment tools [9,10,19,22,[24][25][26][27][28][33][34][35][36]38,40,42]. All studies used Pearson or Spearman correlation tests to determine the relationships between sensor-based measures and clinical assessment scale scores, assuming a significance level of α = 0.05. Correlation values were interpreted as follows: 0.00-0.30 = weak; 0.31-0.50 = low; 0.51-0.70 = moderate; 0.71-0.90 = strong; 0.91-1.00 = very strong. An overview of the correlations between the different sensorbased movement measures and clinical assessment tools, in relation to sample size and grouped by unilateral and bilateral magnitude, use time, magnitude ratio, time of use ratio, movement variability, kinematic and hand movement outcomes is presented in Figure 3. Here, the results are discussed per clinical assessment tool. The Motor Activity Log (MAL) is a semi-structured questionnaire that assesses the patient's perception of how much the affected arm is used and the quality of movement of its use during normal daily activities at home. These two aspects of upper limb functioning are captured by the MAL's two subscales: Amount of Use (AOU) and Quality of Movement (QOM). The MAL uses a 5-point Likert scale, where a higher score reflects a better ability to use the affected arm [47]. The MAL-AOU was assessed in four studies [9,10,22,28] and the MAL-QOM in five studies [9,10,[26][27][28]. Unilateral upper limb magnitude had a low-to-moderate correlation with both MAL-AOU (r = 0.37-0.58) and MAL-QOM (r = 0.43-0.65) [9,10,22,28]. The correlation between MAL-AOU and MAL-QOM and magnitude ratio ranged from strong [10] (r = 0.84, r = 0.79, respectively) to moderate (r = 0.6, r = 0.66 respectively) [28]. Use ratio had a moderate-to-strong correlation with MAL-QOM (r = 0.52-0.71) [26,27]. These findings imply a positive relation between a more balanced use of UUL and AUL and both a better perception of use of the AUL and a higher quality of movement of that limb, as captured by the MAL.
The Stroke Impairment Scale (SIS) for clinical assessment, is a self-reported index of overall physical activity. Participants rate their ability to function in their daily environment on a 5-point Likert scale, where a higher score reflects the experience of fewer difficulties [48]. One of the articles used the total SIS score [27], one used the Physical Function subscale [9] and one used the Hand Function and Mobility subscales [28]. Unilateral and bilateral magnitude of upper limb function had a low to moderate correlation with the Physical Function (unilateral: r = 0.42) [9], Hand Function (unilateral: r = 0.61, bilateral: 0.43) and Mobility (unilateral: r = 0.41, bilateral: r = 0.39) subscale scores [28]. Magnitude ratio had a low correlation with the Mobility subscale score (r = 0.23) and a moderate correlation with the Hand Function subscale score (r = 0.58) [28]. However, the use ratio was only weakly correlated (r = 0.16) with the total SIS score [27], indicating that overall self-reported ability of physical functioning is low to moderately correlated to the aforementioned sensor-based metrics.
The National Institute of Health Stroke Scale (NIHSS) is an impairment scale consisting of 15 items and has been widely used in clinical trials and as initial assessment tool after stroke, with a higher score reflecting more severe stroke symptoms [48]. The NIHSS was used in two studies [10,33], and the Supplementary Motor Scale of the NIHSS (SMS-NIHSS) was assessed by two studies [33,40]. In this supplementary scale, motor function of the shoulder, wrist, hip and ankle is assessed, while the NIHSS only includes proximal motor function. The correlation of unilateral magnitude and magnitude ratio with NIHSS was moderately negative (r = −0.69, r = −0.60, respectively), implying that more severe stroke symptoms are characterized by low absolute AUL use and low AUL use compared to UUL use [10]. Le Heron et al. (2014) found a moderate correlation with their index-similar to variation ratio-and SMS-NIHSS (r = −0.53) and Iacovelli et al. (2019) found that the correlation between the asymmetry index of the acceleration variability and both NIHSS and SMS-NIHSS was moderate (r = 0.41, r = 0.51) but strong (r = 0.71, r = 0.81, respectively) for the second index proposed, which is more sensitive to rotational movements. These results imply that more severe stroke symptoms are characterized by a larger difference in variability in acceleration between upper limbs, especially during rotational movements.
The Box and Blocks Test (BBT) can be considered a fast screening tool for assessing gross manual dexterity, with a higher score reflecting better gross manual dexterity. The BBT provides information about the speed of performance but qualitative information on movement performance is not fully captured [49]. The correlation between several metrics, computed with and without periods of walking, and the BBT ranged from moderate to very strong [19]. Unilateral magnitude in-and excluding walking was moderately and very strongly correlated with the BBT score (r = 0.69, r = 0.93, respectively). Similarly, the correlation of the magnitude ratio AUL/UUL with the BTT increased from r = 0.49 to r = 0.84 when excluding walking implying that removal of walking periods from the data positively affects the correlation between these metrics and the BBT score. BBT score had a strong correlation with unilateral use time including walking (r = 0.77), a moderate correlation with distribution of forearm elevation probability excluding walking (r = 0.68), a moderate correlation with the amount of hand use [36] and a strong correlation with gross movement duration (r ≥ 0.90) [19]. Furthermore, Rand & Eng (2015) also found a moderate correlation between BBT and unilateral magnitude of the affected arm (r = 0.62).
These results indicate a moderate to strong correlation of sensor-based measures and the BBT score. It should be noted, however, that one of these studies only had 10 stroke participants [19], making these correlations less reliable.
The Action Research Arm Test (ARAT) is a test for upper limb functioning that consists of 19 items which are divided into 4 subscales: grasp, grip, pinch and gross movement. A higher score on the ARAT reflects better upper limb functioning [49]. Unilateral AUL magnitude was moderately correlated with the ARAT score in [22], while Urbin et al. [25] found a strong correlation between the ARAT and five sensor-based metrics: AUL median acceleration magnitude (r = 0.75), use ratio (r = 0.79), magnitude ratio (r = 0.83), acceleration variability (r = 0.73) and variation ratio (r = 0.85). The strong correlation between these variability and ratio metrics and the ARAT score implies that better function is characterized by a more dynamic and symmetrical use of AUL and UUL.
Other clinical scales that were considered assessed functioning after rehabilitation or during ADL, activity level or real-world arm use based on spontaneous behavior. The Fugl-Meyer upper extremity motor assessment scale (FMA-UE) was found to be strongly correlated with AUL magnitude (r = 0.82) [34] (r = 0.7) [42] and moderately to strongly correlated with arm ratio measures, both magnitude (r = 0.82) [34] (r = 0.59) [42] and use time (r = 0.85) [24]. A moderate correlation was also found between its lower extremity scale (FMA-LE) and the leg magnitude ratio (r = 0.61) [34]. These results indicate that higher sensor-based symmetry values are related to better performance. Strong correlations were found between magnitude ratio and the Upper Extremity-and Hand-subscales of Brunnstrom Recovery Stage (BRS), and the Simple Test for Evaluating Hand Function (STEF)-affected side (r = 0.77, r = 0.71, r = 0.86, respectively) [10]. Furthermore, moderate to strong correlations were found between unilateral magnitude and the STEF (r = 67 AUL, r = 0.75 UUL) [10]. The rest of the correlations between clinical scales and sensor-based metrics were either low or moderate [10,22,27,35,38,42].

Discussion
The aim of this review was to provide an overview of the literature describing setups used to measure the quantitative and qualitative aspects of movements of stroke patients under free living conditions using wearable sensors. Additionally, we evaluated the relation between the sensor-based outcomes that were obtained from moving in a freeliving environment and the level of functioning as assessed by existing clinical evaluation methods. Based on the results of our review it appears that continuous monitoring of motor function of stroke patients during their activities of daily living using wearable sensors is feasible, and an overview of setups is provided. The sensor-based outcomes showed weak to strong correlations with the scores on clinical scales assessing motor functioning.
In comparison to earlier reviews related to our topic, we considerably updated the discussed literature and added a structured overview of setups used to measure the quantitative and qualitative aspects of movements of stroke patients under free living conditions using wearable sensors. Noorkõiv et al. [12] included eight studies in their review, of which five [5,24,27,28,41], are also included in ours. Johansson et al. [13] included 24 studies in stroke, of which eight [20,[23][24][25][26][27][28]40] are also included in ours. Studies that were discussed in earlier reviews, but not in ours, mostly either didn't focus on free-living conditions, or discussed physical activity instead of quantitative or qualitative aspects of movement. Some of these earlier reviews [12,14], are also explicitly limited to studies that used accelerometry or accelerometry-based proprietary devices such as actigraphs, actiwatches or step counters, probably because the use of gyroscopes or combinations of sensors in IMUs is more recent.
The studies included in the current review made use of different wearable sensors to measure movements of stroke patients in free living conditions. All included studies used at least 3D accelerometer sensors. Reliability of accelerometer derived movement measures has been addressed extensively in numerous laboratory studies. Until recently, the technology necessary for long term measurements was not available. However, due to technological advancements, accelerometers are now easily available at low cost and with low power consumption, making long term data collection feasible. This might explain why all studies included in the present review, derived measures from at least accelerometers, to quantify movements during activities of daily living. Inertial measurement units (IMUs), combining accelerometers with other sensors, can provide more specific information about the patient's movements, because by integrating information from 3D accelerometers, gyroscopes and magnetometers, body segment orientation and joint angles can be calculated [50]. Considering technological developments concerning miniaturization and power consumption, we expect to see a further increase in the number of studies of movement in daily life using IMUs over the next years. Custom made equipment based on newer and improved sensor technology may also provide an alternative to the monitoring of factors that were impossible to quantify until now, such as the 'manumeter' [36] which allows the quantification of hand movements during ADL. Another type of custom system used in one of the studies included in this review, is the electrohydraulic activity sensor [29,44]. This sensor was used to provide information about the elevation of the forearm relative to the body. In principle, for this specific purpose IMUs can be used also [31], which has advantages in terms of wearability, usability and compliance. However, when the aim is to accurately measure isolated upper limb multi joint movements, and the coordination between the joints, single-sensor systems might not be adequate. More complex sensor configurations, with additional sensors attached to each segment of the arm, as well as to the trunk, will definitely improve the analysis of such multi-joint movements [19].
The location of the sensors was quite similar in the studies that focused on the measurement of upper body movement. All studies used accelerometers on both wrists, except for the studies using the most extensive measurement set-ups. The studies with electrohydraulic sensors [29,44] needed setups covering the whole arms and some studies using IMUs [31,32] needed additional sensors on the whole body to create a full-body kinematic model. While this may provide more information compared to the use of only accelerometers on the wrist (and sometimes sternum, waist and/or lower limb), the more complex setup needed when using multiple IMUs may reduce usability and long-term compliance. While the more technologically advanced IMUs may eventually replace single 3D accelerometers, the use of single 3D accelerometer sensors, that require less power consumption and processing time, on one or only a few body positions allowing for easy use, appears to be a good alternative for long term monitoring of upper limbs during daily living.
For the identified studies of gait during ADL, only accelerometers were used, located on the sternum, lower back, upper leg and/or lower leg, Accelerometers at these locations have excellent to good validity and reliability for spatiotemporal gait measures, but for measures of quality of gait this depends on the measurement protocol, algorithms and design [51]. With respect to sensor locations, with the appropriate correction methods, vertical and anterio-posterior accelerations were found to be reliable for back and shank sensor locations in osteoarthritis patients [52]. For stroke patients in a controlled setting, a sensor on the back was found to give reliable outcomes with respect to gait asymmetry [52][53][54]. However, reliability has not been assessed for stroke patients during long-term measurements in unsupervised settings.
Our overview of the reported correlations between sensor-based measures of movement and clinical scale scores suggests that the sensor-based measures represent a different construct compared to clinical scales thus providing additional information. After all, a perfect correlation between sensor-based measures and clinical scale scores would indicate that sensors and clinical scales provide the exact same information. In the studies included in this review, however, the correlation is mostly weak to moderate. This thus implies that with sensors, additional information is gained on top of the information gained from clinical scales. With patient reported outcome measures, insight into how patients experience their functioning and limitations is gained, while sensors provide objective information about (ADL) functioning. Previous research has shown that these are indeed two different constructs with patient reported outcomes being influenced by e.g., amount of pain experienced during ADL [55]. Both, however, provide information that is valuable and important to improve the care of this patient group.

Limitations and Future Recommendations
Quality of included studies was mostly good, with only three studies being of moderate quality (9%). Lowest scores were found for items related to corrections for or assessment of confounding variables, and participant inclusion bias. The latter is a common issue in patient studies; usually participants are not recruited from the general population. Furthermore, only studies written in English were included. While we selected studies for this review based on well-defined selection criteria, it was very difficult to combine results from multiple studies. We found a disparity among studies in time measured after stroke, the units in which the outcomes were reported (e.g., actigraph counts vs. homemade counts vs. g/min), the length of the measurement, the lack of transparency of algorithms to calculate parameters, differences in design and methodology as well the use of a wide variety of clinical scales to assess upper extremity functioning in stroke patients (see Figure 3). This underlines the need for clear protocols, guidelines and transparency of algorithms, with respect to the use of wearable measurement systems and the extraction of metrics from these systems when monitoring stroke patients outside the clinic. Also, studies should avoid reporting their findings in proprietary, closed-source units. When reporting timebased outcomes, fraction or percentage of time should be prioritized over absolute values in hours. We also encourage future studies to report the numerical value of the measures extracted during their experiments, even if it's not one of the aims of their study, as it could help with future pooling and meta-analysis of such measures.
Thanks to technological developments, in the near future, more sensors will become available that are "wearable", precise and fast enough for monitoring in free living conditions. For example, continuous monitoring using insole pressure sensors, instead of the use of force plates which until now have been used in controlled settings [55], could provide information about the symmetry and loading patterns of hemiparetic gait during activities of daily living (ADL). Wearable wireless electromyography could help to understand balance impairments, loading and compensatory activity during ADL, for walking as well as upper extremity function, providing directions for interventions [56]. Similarly, muscle activity patterns of walking might provide important information about compensatory abilities and intervention strategies [57,58]. Modern barometric pressure sensors, which are fast, low in noise and can detect height differences as little as 10 cm should also be considered. While some studies already used them to help improve activity classification for stroke patients [59,60], to our knowledge no current study has used these to extract measures of movement in stroke patients. This kind of sensor has already demonstrated very good results, combined with accelerometers, in estimating Timed-up-and-go (TUG) scores from ADL data in hip arthroplasty patients [61].

Conclusions
Continuous monitoring of motor function of stroke patients during their activities of daily living using wearable sensors is in principle feasible and provides complementary information to clinical assessments. With recent advances in technology the options, regarding the type of sensors that can be used, increase, as well as the accuracy and possibilities of extracting different metrics from them. Sensor-based measures of movement provide additional information in relation to the scores on clinical scales assessing motor functioning and both are needed to gain better insight in patient behavior and recovery.
However, there is a strong need for standardization and consensus, regarding clinical assessments, but also regarding the use of specific algorithms and metrics for unsupervised measurements during daily life. Lab protocols and metrics, cannot simply be generalized to unsupervised settings.