A Smartphone-Based Algorithm for L Test Subtask Segmentation

Background: Subtask segmentation can provide useful information from clinical tests, allowing clinicians to better assess a patient’s mobility status. A new smartphone-based algorithm was developed to segment the L Test of functional mobility into stand-up, sit-down, and turn subtasks. Methods: Twenty-one able-bodied participants each completed five L Test trials with a smartphone attached to their posterior pelvis. A custom-designed application on the smartphone collected linear acceleration, gyroscope, and magnetometer data, which were then processed by a threshold-based algorithm for subtask segmentation. Results: The algorithm produced good results (>97% accuracy, >98% specificity, >74% sensitivity) for all subtasks. Conclusions: These results are a substantial improvement over previously published results for the L Test and for similar functional mobility tests. This smartphone-based approach is an accessible method for providing useful metrics from the L Test that can support better clinical decision-making.


Introduction
Functional mobility tests are used in rehabilitation to monitor patient progress and assess a patient's ability to move and ambulate safely [1]. Common functional mobility tests include the timed up-and-go test (TUG) and the L Test of functional mobility (L Test) [1]. Although the two tests involve similar movements, Deathe and Miller [1] reported advantages of the L Test over the TUG, emphasizing that the L Test includes turns in both directions, increases the distance walked, and decreases the ceiling effect often associated with the TUG [1].
The L Test begins with the person sitting in a chair. The person then stands up and walks forward three meters to a marker, turns 90°, walks seven meters to a second marker, turns 180°, walks back to the first marker, turns 90°, walks back to the chair, turns 180°, and sits down [1] (Figure 1). Each movement is referred to as a subtask (i.e., stand-up, walk, turn, sit-down). Individual assessment of these subtasks has been useful in predictive recovery measures [2]. For example, fall risk has been correlated with stand-up and sit-down task durations, as well as with 180° turns [2]. However, current data collection methods are limited to a clinician's stopwatch measuring total test time [3]; hence, the opportunity to gain extra information is lost. Additionally, clinicians do not have the time to calculate subtask timing from a secondary data collection source within a typical clinical environment. Hence, automating the segmentation and postprocessing of these tests should enable clinicians to obtain and use subtask information within clinical encounters for decision-making, thereby creating a more effective clinical experience.
A variety of data collection methods have been proposed for subtask segmentation, including 2D video recordings, wearable sensors, and ambient sensors [3]. These approaches can also decrease manual error and variance between clinicians compared to using a stopwatch [3]. Wearable sensors are one of the best options due to their accessibility, minimal space requirements, and affordability [3]. An inertial measurement unit (IMU) is an inexpensive and accessible type of wearable sensor that can measure parameters such as acceleration, angular velocity, and turn angle. Figure 2 shows a physical representation of these parameters, which follow the same labeling convention as [4]: anteroposterior acceleration (APa), angular velocity (APω), and rotation (APR); mediolateral acceleration (MLa), angular velocity (MLω), and rotation (MLR); and vertical acceleration (Va), angular velocity (Vω), and rotation (VR). Additionally, the azimuth signal provides the horizontal angle from true north. TUG subtask segmentation using IMU sensors is a valid approach [2-27] and can be completed using smartphone sensor data [16,18,25]. A smartphone approach does not require additional equipment and has good statistical agreement with movement analysis devices [28]. Notably, prior smartphone-based studies did not primarily aim to evaluate the segmentation algorithms themselves, but rather to establish the validity of using a smartphone IMU for subtask segmentation.
To date, there is no validated approach for segmenting the L Test. In this paper, we report research to develop and evaluate an approach for fully segmenting the stand-up, sit-down, and turn subtasks of the L Test using data acquired from a single pelvis-worn smartphone. Preliminary results were presented in a conference paper [29]. A successful segmentation approach will provide additional movement information for the clinician while requiring no more time than completing an L Test trial, accommodating clinical appointment durations. Appropriate subtask segmentation will also enable further analysis and research with AI-based modeling (e.g., fall risk, movement quality).

Participants
Data were collected from a convenience sample of 6 male and 15 female able-bodied participants between the ages of 19 and 68 (average age: 36 ± 19 years). Individuals with cognitive issues that affected their ability to follow instructions were excluded. Participants provided informed consent prior to participating. The study was approved by the University of Ottawa's Office of Research Ethics and Integrity (H-09-22-8351). Participant characteristics are shown in Table 1.

Data Collection
A custom belt was fastened around the waist of each participant, holding a Samsung Galaxy S10+ smartphone in a posterior pocket (Figure 3). This posterior-pelvis position was chosen because it approximates the body's center of mass and has demonstrated efficacy in other algorithms [30]. A custom app written by our group [31] recorded IMU data at 60 Hz, including raw and linear acceleration in the mediolateral, anteroposterior, and vertical directions; rotation angle in the mediolateral, anteroposterior, and vertical directions; azimuth angle; and angular velocity in the mediolateral, anteroposterior, and vertical directions. Participants were instructed to complete the L Test at a walking speed that was fast but safe according to their own capabilities. Five trials were completed by each participant. The app provided an auditory cue to indicate that it had begun recording, after which participants were informed that they could begin the first trial. Once each trial was completed, the participant was given an opportunity to rest before beginning the next trial.

Ground Truth
An Apple iPhone XR was used to video-record participants at 30 Hz while they completed the tests. Ground truth times of events of interest were determined from the video using Kinovea [32]. The beginning of the stand-up task was defined as the beginning of trunk flexion, and the end corresponded to maximal trunk extension, whether this occurred before or after the first step. Turn initiation was the beginning of pelvis rotation and turn completion was the end of this rotation. The beginning of the sit-down subtask was defined as the beginning of trunk flexion and the end was the end of trunk extension. Timestamps of three foot strikes from the first walkway were also recorded and then used to align the ground truth times with the inertial data, since foot strikes provided clear acceleration peaks.
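As an illustration of this alignment step, the sketch below shows one way the constant offset between the video clock and the smartphone clock could be estimated from foot-strike acceleration peaks. The function name, the peak-detection settings, and the use of the vertical acceleration channel are assumptions for illustration, not the published implementation.

# Minimal sketch: estimate the video-to-IMU clock offset from matched
# foot-strike events. Peak-detection settings are illustrative only.
import numpy as np
from scipy.signal import find_peaks

def estimate_clock_offset(vertical_accel, fs, video_strike_times, n_strikes=3):
    # Foot strikes appear as sharp peaks in the vertical acceleration signal.
    peaks, _ = find_peaks(vertical_accel,
                          height=np.mean(vertical_accel) + np.std(vertical_accel),
                          distance=int(0.4 * fs))
    imu_strike_times = peaks[:n_strikes] / fs
    # The mean difference between matched events gives the constant clock offset.
    return float(np.mean(np.asarray(video_strike_times[:n_strikes]) - imu_strike_times))

# Ground-truth event times from the video can then be shifted by this offset
# before being compared with the algorithm's output.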

Preprocessing
Raw data were imported into a custom-built Python program. The algorithm corrected jumps in the azimuth signal that occurred when a participant turned past 360° or 0° [30]. A threshold technique identified changes in azimuth magnitude greater than 10° between data points and, when such a jump was found, added or subtracted this magnitude change from the signal [30]. An example of these jumps can be seen in the azimuth plot in Figure 4a. Acceleration and angular velocity data were filtered using a fourth-order zero-lag Butterworth low-pass filter with a 4 Hz cut-off frequency [30], a commonly used approach for filtering data for segmentation [4,20,30].
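For illustration, a minimal preprocessing sketch is shown below, assuming 60 Hz sampling and signals stored as NumPy arrays. It implements one plausible version of the azimuth jump correction and the zero-lag Butterworth low-pass filter described above; the helper names and the exact correction logic are assumptions rather than the published code.

import numpy as np
from scipy.signal import butter, filtfilt

def correct_azimuth_jumps(azimuth_deg, jump_threshold=10.0):
    # Remove discontinuities where the azimuth wraps past 0 or 360 degrees by
    # subtracting any between-sample change larger than the threshold from the
    # remainder of the signal, keeping it continuous.
    corrected = np.asarray(azimuth_deg, dtype=float).copy()
    for i in range(1, len(corrected)):
        step = corrected[i] - corrected[i - 1]
        if abs(step) > jump_threshold:
            corrected[i:] -= step
    return corrected

def lowpass_zero_lag(signal, fs=60.0, cutoff=4.0, order=4):
    # Butterworth low-pass filter applied forward and backward (filtfilt)
    # so that no phase lag is introduced.
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)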

Algorithm Overview
The algorithm classified the stand-up, first 90° turn, first 180° turn, second 90° turn, second 180° turn, and sit-down subtasks. Each subtask was found using the algorithm described in Section 2.5.3. After a subtask was found, data up to the end of the identified subtask were removed and the classifier resumed classifying subsequent tasks. The only exception involved the last 180° turn and the sit-down task, since these two subtasks happen simultaneously for some participants.
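The overall flow can be summarized by the sketch below, in which find_subtask is a hypothetical stand-in for the sliding-window search described in the following sections; the subtask labels and the overlap handling follow the description above, but the code itself is illustrative only.

# Illustrative sequential pass over a trial: each subtask is searched for in
# order, and data up to the end of the identified subtask are consumed before
# the next search, except between the last 180-degree turn and the sit-down,
# which may overlap.
SUBTASK_ORDER = ["stand-up", "turn-90-1", "turn-180-1",
                 "turn-90-2", "turn-180-2", "sit-down"]

def segment_trial(signals, find_subtask):
    events = {}
    cursor = 0
    for name in SUBTASK_ORDER:
        start, end = find_subtask(name, signals, cursor)
        events[name] = (start, end)
        # The last 180-degree turn and the sit-down can overlap, so the cursor
        # is only advanced for the other subtasks.
        if name != "turn-180-2":
            cursor = end
    return events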

Threshold Selection
When completing tasks within the L Test, participants maintained a relatively consistent torso posture during straight walking but showed more torso motion while standing up, turning, and sitting down. Therefore, a threshold approach was adopted to differentiate between walking and these other subtasks. To calculate these thresholds, four steps (two strides) from the initial seven-meter walking section were segmented for each participant. A 0.33 s sliding window (20 data points at 60 Hz, overlapping, and advancing by one data point within the given range) was used to calculate both the standard deviation (SD) and the magnitude change within each window, for mediolateral angular velocity and mediolateral rotation angle, for each participant. The mean of these values across all participants was then calculated to provide a final mean and standard deviation for these signals while walking. The thresholds for stand-up and sit-down were set to the mean plus a required number of standard deviations, depending on the signal and subtask (Section 2.5.3). Turning thresholds were initially taken from [30] and updated to provide suitable values for 90° turns. The 0.33 s sliding window was chosen as an appropriate window size for subtasks since, for healthy individuals, the average step takes about 0.47 to 0.59 s during moderate-to-vigorous walking [33], and a 0.33 s window was found to give sufficient time to observe the signals for a subtask without missing a change in magnitude or standard deviation. If the window were too small, the signals might not change enough within the window for proper classification, particularly for people who walked slowly. If the window were too large, data from a previous stride might interfere with the classification of the current task.
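A minimal sketch of this threshold-setting step is shown below, assuming the walking strides have already been segmented and interpreting "magnitude change" as the range (maximum minus minimum) within each window; the function names and that interpretation are assumptions.

import numpy as np

WINDOW = 20  # 0.33 s at the 60 Hz sampling rate

def window_stats(signal, window=WINDOW):
    # Overlapping sliding window advancing by one sample: collect the standard
    # deviation and the magnitude change (range) inside each window.
    signal = np.asarray(signal, dtype=float)
    sds, changes = [], []
    for i in range(len(signal) - window + 1):
        w = signal[i:i + window]
        sds.append(np.std(w))
        changes.append(np.max(w) - np.min(w))
    return np.array(sds), np.array(changes)

def walking_baseline(walking_signal):
    # Mean and standard deviation of the windowed statistics during straight
    # walking; subtask thresholds are then set as mean + k standard deviations.
    sds, changes = window_stats(walking_signal)
    return {"sd_mean": sds.mean(), "sd_sd": sds.std(),
            "mag_mean": changes.mean(), "mag_sd": changes.std()}

# Example: a stand-up threshold on mediolateral angular velocity might be
# base["sd_mean"] + k * base["sd_sd"], with k chosen per signal and subtask.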

Subtask Identification
A sliding window approach was used to identify the beginning and end of each subtask. Pseudocode depicting the sliding window approach for the stand-up task is given in Appendix A. Determining the beginning of a subtask required two steps. First, a sliding window (0.33 s long, overlapping, advancing by one data point) over the signals relevant to the subtask was used to determine whether a magnitude threshold was crossed. If the window moved beyond the search area (each subtask has specific start and end search times, Table 2) without the threshold being crossed, the threshold was decreased by 10% and the sliding window repeated the search from the beginning of the search area. Second, the window continued forward from the location of the magnitude threshold crossing, but using a standard deviation threshold. The subtask beginning was set to the end of the window where the standard deviation threshold was crossed.
The end of each subtask was identified by moving forward from the subtask beginning until the standard deviation crossed the threshold again. If the window could not identify a location where the standard deviation passed the threshold, the threshold was increased by 10% and the sliding window repeated the search from the subtask beginning. The direction of search for the sit-down subtask was reversed to minimize the effects of movement during the 180° turn that occurs before sitting down. Sliding window and subtask identification parameters are listed in Table 2.
Additionally, some participants had a slower trunk angular velocity but a greater range of motion; therefore, MLω did not pass the threshold during the stand-up and sit-down subtasks. For these cases, lower thresholds were introduced for MLω and higher thresholds for MLR, as shown in Table 2.
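The sketch below condenses the sliding-window search described above (and the Appendix A pseudocode) for a single signal: a magnitude-change threshold locates the approximate onset, a standard-deviation threshold marks the subtask start and end, and thresholds are relaxed or tightened by 10% when no crossing is found. The single-signal simplification and the fallback handling are assumptions, not the published implementation.

import numpy as np

WINDOW = 20  # 0.33 s at 60 Hz

def find_subtask_bounds(signal, search_start, search_end, mag_thr, sd_thr):
    sig = np.asarray(signal, dtype=float)
    last_start = len(sig) - WINDOW

    # Step 1: slide forward until the magnitude-change threshold is crossed;
    # if the search area is exhausted, lower the threshold by 10% and retry.
    while True:
        hit = None
        for i in range(search_start, min(search_end, last_start)):
            w = sig[i:i + WINDOW]
            if np.max(w) - np.min(w) > mag_thr:
                hit = i
                break
        if hit is not None:
            break
        mag_thr *= 0.9

    # Step 2: continue forward until the standard-deviation threshold is
    # crossed; the subtask start is the end of that window.
    start = None
    for i in range(hit, last_start):
        if np.std(sig[i:i + WINDOW]) > sd_thr:
            start = i + WINDOW
            break
    if start is None:
        start = hit + WINDOW  # fallback for this simplified sketch

    # Step 3: the subtask end is where the standard deviation drops back below
    # the threshold; if no crossing is found, raise the threshold by 10%.
    while True:
        for i in range(start, last_start):
            if np.std(sig[i:i + WINDOW]) < sd_thr:
                return start, i
        sd_thr *= 1.1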

Results
A total of 105 trials from 21 participants were classified by the algorithm. Figure 5 shows an example of the output of one trial. Table 3 shows ground truth and algorithm-identified means and standard deviations of subtask durations for each participant. Table 4 shows the accuracy, specificity, and sensitivity for stand-up, sit-down, and turning subtask classification. The performance metrics were calculated using Python's scikit-learn library, with a 0.06 s (two video frames) error allowance.
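For clarity, the sketch below shows how the per-sample accuracy, specificity, and sensitivity could be computed with scikit-learn, assuming the ground-truth and algorithm outputs have been expanded to binary per-sample labels (1 = inside the subtask) and that samples within the error allowance around each boundary have already been excluded; the exact evaluation code is an assumption.

from sklearn.metrics import confusion_matrix

def subtask_metrics(y_true, y_pred):
    # Binary per-sample labels: 1 = sample lies inside the subtask, 0 = outside.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),  # true negative rate
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
    }

# Example with toy per-sample labels:
# subtask_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
# -> {'accuracy': 0.667, 'specificity': 0.667, 'sensitivity': 0.667}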

Discussion
The new threshold-based algorithm for segmenting the L Test provided excellent outcomes, showing that the proposed smartphone-based approach can provide viable subtask segmentation. Accuracy and specificity exceeded 97.1% for all subtasks, demonstrating that the algorithm correctly identified the tasks for almost all data windows. Sensitivity, or the true positive rate, was lower for all subtasks; however, for most tasks, the algorithm correctly identified the timestamps at which the subtasks took place.
The sit-down subtask had lower metrics, notably sensitivity, than the other tasks. This was most likely because some participants began to drop their head and round their shoulders to look down at the chair before and during the preceding turn, which may have caused the algorithm to occasionally detect some pelvis flexion and misclassify this movement as the start of the sit-down task. As previously mentioned in Section 2.5.2, some of this misclassification was addressed using a higher threshold for this part of the algorithm. Additionally, at the end of the turn, participants would often fidget or fall into the chair, causing additional movement after the task was finished. For the stand-up task, participants were more commonly completely still beforehand, which did not introduce variable movements into the signal and left a clear distinction between the beginning and end of the task. Reversing the array (as mentioned in Table 2) minimized the effect of variable movements on the identification of sit-down task initiation.
The algorithm performed worse for participants who took longer to fully extend their torso during the stand-up task. Similarly, those who took longer turns also tended to have lower subtask identification metrics than those with shorter turn durations. The thresholds currently used are global thresholds based on all participants, so these outliers were not well labeled. Future studies could investigate participant-specific thresholds tuned to each participant's straight-walking data. This would allow the algorithm to be applied directly to different patient populations while still ensuring sufficient performance.
The current approach improved upon the results from our preliminary study [29]. The results in [29] achieved 97.9% accuracy, 98.5% specificity, and 86.1% sensitivity for stand-up; 94.6% accuracy, 96.2% specificity, and 72.1% sensitivity for sit-down; and 90.2% accuracy, 95.7% specificity, and 70.5% sensitivity for all turns [29]. The most substantial improvements occurred in the sensitivity of the stand-up and turning tasks: stand-up sensitivity increased by 6.2%, and turn sensitivities increased by 3.8% to 11.6% [29].
A study by Abdollah et al. [13] produced good results for stand-up and sit-down tasks in a single-sensor TUG test; however, turns were not segmented. With a single tri-axial accelerometer mounted on a participant's head and a rule-based threshold algorithm, they obtained 95% accuracy, 100% specificity, and 90% sensitivity for the stand-up task, and 98% accuracy, 100% specificity, and 98% sensitivity for the sit-down task [13]. Our results surpassed these accuracy measurements and were within 1.4% of the specificity outcomes. The current algorithm also surpassed their stand-up sensitivity by 7.4%, but fell short by 19.7% for the sit-down subtask. Pew et al. [26] also published algorithms for segmenting some parts of the L Test, including results for walking and turning subtasks. They used a variety of machine learning algorithms, with the highest turning accuracy being 96%, obtained with support vector machines (SVMs) [26]. The current algorithm surpasses this for all turns.
A 0.07 s error allowance is smaller than the measurement error of the commonly used stopwatch approach for the L Test, which has been reported to be 0.2 s. Yahalom et al. [25] noted that the measurement error using a stopwatch is estimated to be the same for both start and stop times; however, with a stopwatch, this error can vary between clinicians. A human-controlled approach can also be subjective, introducing further variability in factors such as when the clinician begins or ends timing (for example, before or after the patient leans back in the chair during the sit-down task), and can be affected by distractions [25]. Therefore, even with error allowances, a more objective method of measuring these subtasks should provide more consistent measurements.

Limitations and Future Work
One limitation of the study is that the algorithm was only tested on able-bodied individuals. Future work should include validation with people who have mobility deficits, since their biomechanical signals can differ from those of able-bodied participants [34]. Additionally, creating a database of L Test segment timings could help clinicians interpret outcomes from data segmentation. Implementing this algorithm in a smartphone app could provide clinicians with these data in an efficient manner. Further signal analysis during the L Test could also provide clinicians with additional details and assist in future evaluations of fall risk [35].

Conclusions
A novel method of subtask segmentation was developed and successfully evaluated for the L Test. When compared to published segmentation results for the TUG, the performance metrics of the proposed algorithm generally surpassed previous outcomes, with >97% accuracy, >98% specificity, >74% sensitivity, and >79% precision for all subtasks. The smartphone-based approach was chosen for its accessibility and ease of use, so that the outcomes can be seamlessly integrated into a clinical setting. This technology should allow precise and useful metrics to be obtained from functional mobility tests such as the L Test and provide a basis for future AI-based outcome measures.

Figure 1. Route for the L Test. The participant can choose the direction for 180° turns.

Figure 2. Parametric directions used in inertial data.

Figure 3. Participant completing an L Test trial.

Figure 4. Example of raw data (a) and preprocessed data (b) collected by the app for mediolateral acceleration, azimuth, pitch, and vertical angular velocity signals.

Figure 5. Examples of inertial data for an L Test trial with (a) mediolateral linear acceleration, (b) azimuth, (c) pitch, and (d) mediolateral angular velocity. Red indicates the stand-up and sit-down subtasks, orange indicates the 90° turn subtasks, and green indicates the 180° turn subtasks.

Figure A1. Pseudocode for the stand-up subtask. Here, i represents the starting index in the data array for the window; i2 represents the starting index in the data array for the window once the first set of thresholds has been crossed. SD is the standard deviation of the signal under investigation within a given window. MLω is mediolateral angular velocity and MLR is mediolateral rotation.

Table 1. Participant characteristics.

Table 3. Algorithm and ground truth (GT) mean and standard deviation of subtask durations for all participants.

Table 4. Performance metrics for subtasks. Duration difference is the mean absolute difference and standard deviation between the algorithm time and the ground truth time across all participants and trials.