Machine Learning Approaches for Activity Recognition and/or Activity Prediction in Locomotion Assistive Devices—A Systematic Review

Locomotion assistive devices equipped with a microprocessor can potentially adapt their behavior automatically when the user transitions from one locomotion mode to another. Many developments in the field have come from machine learning driven controllers that recognize the current locomotion mode or predict the upcoming one. This review synthesizes the machine learning algorithms designed to recognize or predict a locomotion mode in order to automatically adapt the behavior of a locomotion assistive device. A systematic review was conducted on the Web of Science and MEDLINE databases (as well as in the retrieved papers) to identify articles published between 1 January 2000 and 31 July 2020. This systematic review is reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines and is registered on Prospero (CRD42020149352). Study characteristics, sensors and algorithms used, accuracy and robustness were also summarized. In total, 1343 records were identified and 58 studies were included in this review. The most frequently investigated experimental condition was level ground walking along with stair and ramp ascent/descent activities. The machine learning algorithms implemented in the included studies reached global mean accuracies of around 90%. However, the robustness of those algorithms needs to be evaluated more broadly, notably in everyday life. We also propose some guidelines for homogenizing future reports.


Introduction
Healthy humans can easily adjust their locomotor pattern to the multiple environments encountered in daily living, such as stair ascent/descent, slope ascent/descent, obstacle clearance, walking on uneven floors, cross-slopes or different surfaces. However, for people with lower limb impairments such as unilateral lower limb amputation, dealing with most of these environmental changes becomes challenging [1].
Sensors 2020, 20, 6345 3 of 30
out classification for recognizing locomotion modes. Studies using a Machine Learning regression approach were excluded.
• The articles must be related to locomotion in various environments, e.g., level ground walking, stair ascent/descent, ramp ascent/descent, obstacle clearance, walking on a cross-slope, turning or walking on different surfaces. Studies were included if at least two locomotion modes were investigated.
• Only lower limb assistive devices such as exoskeletons, prostheses (for below- or above-knee amputation) or orthoses were considered.
• Studies were excluded if they met at least one of the following exclusion criteria: (1) non-human subjects (robots or animals), (2) minor volunteers (under 18 years old), (3) volunteers equipped with an upper-limb device.

Information Sources
The PubMed and Web of Knowledge (including Web of Science core collection, Derwent Innovation Index, Russian Citation index, SciELO Citation Index) databases were searched on 31 July 2020. The two search strings used are given in the Supplementary Material. Articles published in English between 1 January 2000 and 31 July 2020 were included. Systematic reviews and meta-analyses were excluded. Conference papers were excluded if a corresponding peer-reviewed article by the same authors had been included. Additional articles were identified by searching the reference lists of the papers retrieved with the search strategy described above.

Study Selection
The search strings were defined and validated by all authors. One person (FL) performed the initial search and removed the duplicates. Two main readers (DL, FL) independently screened the titles and abstracts of all articles identified during the initial search. In case of disagreement, a third reader (LC) decided whether to include or exclude the article. Afterwards, the two readers (DL, FL) read the full text of the articles selected in the previous step and checked them for eligibility using the criteria of our Modified QualSyst Tool, which can be found in the Supplementary Material of this article. The process used to create the Modified QualSyst Tool is described in Section 2.4.1. Any disagreements on the eligibility of an article were resolved by the third reader (LC).

Quality Assessment in Included Articles
The quality of the included articles was assessed with the QualSyst Tool [7], modified for the purpose of evaluating Machine Learning algorithms implemented on locomotion assistive devices. The sections below further explain how scores were assigned to each article using this tool.

Creating the Modified QualSyst Tool
Our first step was to remove irrelevant items from the QualSyst Tool [7] (Criteria 3 and 5 to 12, e.g., blinding of investigators, of subjects, etc.). Next, we added items which are relevant to the implementation of Machine Learning algorithms such as analysis windows, selected features, evaluation method of the algorithm, etc. All items were validated by all the authors and the quality of included articles was assessed by the main readers (FL, DL). The final version of this Modified QualSyst Tool can be found in the Supplementary Material of this article.

Rating Articles Using the Modified QualSyst Tool
Twelve items were used for rating the articles. For each item, the article was rated with a score between 0 and 2 (2 indicating that the information was fully supplied, 1 partially supplied and 0 not provided). Guidelines were created to allow consistent ratings across the included papers. The items were organized as follows:
• The first two items evaluated whether the hypotheses and objectives of the study were sufficiently described and whether the study design was appropriate.
• Item 3 evaluated whether the volunteer characteristics were sufficiently described.
• Items 4 to 10 evaluated whether the Machine Learning approach was described in enough detail to allow repeatability.
• Items 11 and 12 evaluated whether the results were reported in enough detail and whether the conclusions were consistent with them.
The score of each article was computed as the average of its 12 item ratings, so the maximum possible score was 2. For ease of comprehension, this 0-2 score was converted to a 0-100% scale (0% indicating that no information was provided at all and 100% that all items were fully reported). More details on this scoring procedure and the guidelines used can be found in the Supplementary Material.
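The scoring arithmetic above can be sketched as follows. The sketch assumes that each article's twelve item ratings are stored as integers in {0, 1, 2}. The function name `quality_score` is hypothetical and is not part of the Modified QualSyst Tool.

```python
def quality_score(ratings):
    """Return the Modified QualSyst score of one article as a percentage.

    Each of the 12 items is rated 0 (no information), 1 (partial) or
    2 (full); the article score is the mean rating (0-2) rescaled to 0-100 %.
    """
    if len(ratings) != 12:
        raise ValueError("expected 12 item ratings")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("ratings must be 0, 1 or 2")
    mean_rating = sum(ratings) / len(ratings)  # between 0 and 2
    return mean_rating / 2 * 100  # rescale to a percentage


# Example: an article fully describing 8 items and partially describing 4.
print(round(quality_score([2] * 8 + [1] * 4), 1))  # 83.3
```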

Synthesis of the Results
The following elements were extracted from the included studies and grouped:
• Investigated population (pathology and number of volunteers) and type of assistive device (above-knee prosthesis, below-knee prosthesis, orthosis or exoskeleton).
• The main elements of the experimental protocol. The studied locomotor activities are given along with the walking speed of the volunteers. The 'Critical Timing' is reported; it is the latest moment at which the behavior of the locomotion assistive device can be adapted to the new locomotion mode without disturbing the user. The type of sensors used in each study is reported along with the total number of measurement axes per sensor. Details on the implementation of the machine learning algorithm are also reported (online and/or offline implementation; forward prediction and/or backward recognition [8]).
• The signal processing techniques and Machine Learning algorithms used. This includes the type and length of the analysis windows and the features extracted for the analyses; if several configurations were tested, only the optimal one is given. The machine learning algorithms are listed, and their overall results are reported in terms of accuracy (A). When studies reported error rates (E), the corresponding mean overall accuracy was computed as A = 100 - E (in percent). For studies recruiting both healthy volunteers and patients, the reported accuracy corresponds to the patients.

Study Selection
The literature search produced 288 articles on PubMed and 1078 articles on Web of Science. Additionally, four studies were manually identified from the references of these articles and added to the review. After removing the duplicates, 1343 articles remained for screening. On the basis of title and abstract screening, 1267 articles were excluded from the review. Two authors independently read the full texts of the remaining 76 articles and checked them for eligibility. Finally, 58 articles were considered eligible to be included in this review. The PRISMA Flow Chart [6] is provided (Figure 1).
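The screening counts reported above can be checked with simple arithmetic. In this sketch, the number of duplicates and the number of full-text exclusions are not reported in the text. They are derived by subtraction, assuming every record is counted once per source.

```python
# Records identified (values reported in the text).
pubmed, web_of_science, from_references = 288, 1078, 4
identified = pubmed + web_of_science + from_references

after_deduplication = 1343
duplicates = identified - after_deduplication  # derived, not reported

excluded_on_title_abstract = 1267
full_text_assessed = after_deduplication - excluded_on_title_abstract

included = 58
excluded_on_full_text = full_text_assessed - included  # derived, not reported

print(identified, duplicates, full_text_assessed, excluded_on_full_text)
# 1370 27 76 18
```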

Quality of the Included Studies
The quality score of each study, computed with the items of the Modified QualSyst Tool, is provided in Table 1, and the detailed quality scores are presented in the Supplementary Material. The mean quality score across the included articles was 68.4% ± 13.4%.
Table 1. Quality assessment and recruited volunteers in the included studies.

Extracted Elements of the Included Studies
In this section, we summarize some of the key aspects of the extracted elements of the included studies.

Type of Assistive Device and Related Population
The type of assistive device used in each study and the related population are detailed in Table 1. Four types of devices were used in the included studies: prostheses for transfemoral amputation (i.e., above-knee prostheses), prostheses for transtibial amputation (i.e., below-knee prostheses), exoskeletons and orthoses.

• Above-knee prostheses. This was the largest group among the included studies (N = 32). Among these thirty-two studies, the recruited population were either patients with unilateral […]. There were healthy volunteers and patients with transfemoral amputation or knee disarticulation (N = 10). Finally, there were healthy volunteers wearing an above-knee prosthesis with an L-shape adaptor (N = 3).
• Below-knee prostheses. This was the second largest group in this review (N = 18). Among those eighteen studies, the recruited population consisted either of patients with unilateral transtibial amputation (N = 13) or of healthy volunteers or patients with unilateral transtibial amputation (N = 5).
• Exoskeletons and orthoses. This constituted the smallest group in this review (N = 6 and N = 2, respectively). In all eight studies, the recruited population consisted of healthy volunteers wearing the assistive device.

Locomotion Activities and Walking Speed
The locomotion activities and walking speed investigated in each study are reported in Table 2.
The most common experimental protocol investigated level ground walking along with stair and ramp ascent/descent activities (N = 43). In other studies, level ground walking was investigated only with stair ascent and/or descent activities (N = 13). Among those fifty-six (43 + 13) studies, additional activities were also considered, such as obstacle clearance (N = 6), turning (N = 2) or squatting (N = 1) for 'dynamic' activities, and standing (N = 23) or sitting (N = 6) for static activities. The remaining two papers investigated level ground walking with cross-slope walking (N = 1) and level ground walking with turning (N = 1).
In most studies, the walking speed was not provided (N = 33); it can be assumed that the volunteers walked at a self-selected speed in these thirty-three studies. In seventeen studies, the volunteers were asked to walk at a self-selected speed (N = 17). A small number of studies investigated different walking speeds: volunteers were asked to walk either at a self-selected speed or at a slower or faster pace for different locomotion activities (N = 6). In the two remaining studies, the volunteers were asked to walk at a predefined speed of 0.7 m/s (N = 2).

Identifying the Critical Timing
The Critical Timings used in each study are provided in Table 2.
Among the studies focusing on ankle-knee or ankle-foot prostheses (N = 50), most investigated the transitions between locomotion modes (N = 39). Several definitions of the critical timing were used; they are described below. Firstly, one study (N = 1), conducted by Huang et al. [23] in 2010, defined the critical timing as 200 ms before prosthesis foot off for all transitions. Figure 2 illustrates the critical timing used in Huang et al. [23] for both level ground walking to stair ascent and stair descent to level ground walking transitions.
Secondly, some studies (N = 15; e.g., Huang et al. 2011 [24]) placed the critical timing at well-defined gait events (e.g., Foot Off and Foot Contact). For transitions from level ground walking to any other locomotion mode, the critical timing was defined at prosthesis foot off. For transitions from any locomotion mode to level ground walking, it was defined at prosthesis foot contact on level ground.
Thirdly, some studies (N = 5; e.g., Spanias et al. [42]) delayed the critical timing in order to improve locomotion mode prediction. For transitions from level ground walking to any other locomotion mode, the critical timing was defined 90 ms after a gait event such as prosthesis foot off, mid-swing, prosthesis foot contact or mid-stance.
Finally, a recent study (N = 1) by Xu et al. [49] in 2018 adapted the definition of the critical timing to the transition type and to the transitioning leg. As a result, the critical timing was delayed when the amputated leg was the leading leg for the transition. For level ground walking to stair ascent or stair descent transitions, the critical timing was defined either at the last prosthesis foot off or at the first prosthesis foot contact on the stairs. For any other transition, the critical timing was defined either at the first prosthesis foot contact in the new locomotion mode or at the first prosthesis foot off in the new locomotion mode.
The remaining studies either did not investigate the transitions (N = 11) or investigated them without reporting the critical timing used (N = 17).
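The sketch below illustrates how the first three definitions differ. It computes the critical timing from the times of the prosthesis gait events that frame a transition. The names are hypothetical and the event times are made up. For simplicity, the delayed definition applies its 90 ms delay to the same event as the second definition. The reviewed studies, by contrast, used various gait events (foot off, mid-swing, foot contact or mid-stance).

```python
from enum import Enum


class Definition(Enum):
    """Critical-timing definitions summarized above (names are hypothetical)."""
    BEFORE_FOOT_OFF = "fixed offset before prosthesis foot off"  # e.g., [23]
    AT_GAIT_EVENT = "at prosthesis foot off / foot contact"      # e.g., [24]
    DELAYED = "fixed delay after the gait event"                 # e.g., [42]


def critical_timing(definition, from_level_walking, foot_off, foot_contact,
                    offset=0.200, delay=0.090):
    """Return the critical timing (s) of one transition.

    from_level_walking: True for level walking -> other mode,
                        False for other mode -> level walking.
    foot_off, foot_contact: times (s) of the prosthesis foot off and of the
                            prosthesis foot contact framing the transition.
    """
    if definition is Definition.BEFORE_FOOT_OFF:
        return foot_off - offset  # same rule for every transition
    # Foot off when leaving level walking, foot contact when returning to it.
    event = foot_off if from_level_walking else foot_contact
    if definition is Definition.AT_GAIT_EVENT:
        return event
    return event + delay  # Definition.DELAYED


for d in Definition:
    print(d.name, round(critical_timing(d, True, foot_off=1.00, foot_contact=1.45), 3))
# BEFORE_FOOT_OFF 0.8
# AT_GAIT_EVENT 1.0
# DELAYED 1.09
```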
Among the studies focusing on orthoses or exoskeletons (N = 8), only three studies investigated the transitions between locomotion modes. In Long et al. [29], the critical timing occurred at foot contact of the contralateral leg of the exoskeleton. In Wang et al. [47], the critical timing occurred at foot contact of the ipsilateral leg in the new locomotion mode. Finally, in Zhou et al. [65], the critical timing occurred at mid-swing when the leg wearing the exoskeleton led the transition. It occurred at the last foot off for transitions from level ground walking to any other locomotion mode, and at the first foot off for transitions from any locomotion mode to level ground walking. The remaining studies either did not investigate the transitions (N = 4) or did not report the critical timings used (N = 1).

Figure 2. Panel A represents a patient with an amputation transitioning from level walking to stair ascent. The upper part of the panel is a spatial representation of the patient's motion, and the line below is a temporal representation of the foot contact events; a dashed line maps the spatial representation to the temporal one. In the spatial representation, the points mark the locations where the foot will hit or leave the ground. The temporal axis details the Foot Contact (FC) and Foot Off (FO) gait events for both sides. The critical timing is defined 200 ms prior to the prosthesis Foot Off event, according to Huang et al. [23]. Blue points are associated with the sound leg (index S) and red points with the prosthesis side (index P). Panel B uses the same representation for a patient transitioning from level walking to stair descent.

Online/Offline Implementation of Machine Learning Algorithm for Prediction of the Upcoming Locomotion Mode or Recognition of the Current Locomotion Mode
Information regarding the type of implementation of the Machine Learning algorithms is provided in Table 2: recognition and/or prediction algorithm and online and/or offline implementation.
The Machine Learning algorithms developed in the studies included in this systematic review were designed either to predict the upcoming locomotion mode (N = 30) or to recognize the current locomotion mode (N = 24). Some studies developed a Locomotion Mode Recognition system with adaptive strategies (N = 4). In these systems, a forward predictor identified the upcoming locomotion mode while a backward estimator recognized the current locomotion mode. The backward estimator was used to label new data, with which the forward predictor could then be updated.
Most algorithms were trained and evaluated offline (N = 40); the others were trained offline and evaluated online (N = 18).
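The adaptive scheme described above can be sketched as follows: a forward predictor is updated with data labeled by a backward estimator. This is a toy illustration under stated assumptions. A nearest-class-mean classifier stands in for the forward predictors actually used in the reviewed studies. The backward estimator is a hypothetical callable that labels a completed step.

```python
import math


class NearestMeanPredictor:
    """Toy forward predictor: assigns a feature vector to the closest class mean."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of examples

    def update(self, features, label):
        """Add one labeled example; class means are updated incrementally."""
        total = self.sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features):
        def distance(label):
            n = self.counts[label]
            return math.dist(features, [s / n for s in self.sums[label]])
        return min(self.sums, key=distance)


def adaptive_step(predictor, features_before_event, backward_estimator, step_data):
    """Process one step with the forward/backward adaptive scheme.

    1. The forward predictor classifies the upcoming mode from the data
       available before the critical timing.
    2. Once the step is complete, the backward estimator (hypothetical
       callable) labels it using the data of the whole step.
    3. The forward predictor is updated with this newly labeled example.
    """
    prediction = predictor.predict(features_before_event)
    label = backward_estimator(step_data)
    predictor.update(features_before_event, label)
    return prediction, label


predictor = NearestMeanPredictor()
predictor.update([0.0, 0.0], "LW")   # initial (offline) training data
predictor.update([10.0, 10.0], "SA")
# A drifted level-walking step is first mispredicted as stair ascent...
print(adaptive_step(predictor, [6.0, 6.0], lambda step: "LW", step_data=None))  # ('SA', 'LW')
# ...but after the update, the same features are classified as level walking.
print(predictor.predict([6.0, 6.0]))  # LW
```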

Data Type and Sensors Used
The details concerning the sensors used in the studies are provided in Table 2. Sensors used in the included studies were of four types:

Analysis Windows
The details concerning the analysis windows used in each study are provided in Table 3. Three types of analysis windows can be distinguished: sliding (N = 30), unique (N = 7) or multiple (N = 19) windows. The sliding method defines a window length and a window increment, so consecutive windows can overlap. For the unique and multiple methods, each analysis window was defined either by a starting or ending point and a fixed window length (N = 21), or by both end points, giving a variable window length (N = 5). The remaining studies (N = 2) did not provide any information concerning analysis windows.
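The sliding configuration can be sketched as follows. The 1 kHz sampling rate, 150 ms window length and 50 ms increment are hypothetical values and are not taken from a specific study.

```python
def sliding_windows(signal, length, increment):
    """Split a sampled signal into analysis windows of `length` samples,
    starting every `increment` samples.

    Consecutive windows overlap whenever increment < length; a final
    incomplete window is dropped.
    """
    if length <= 0 or increment <= 0:
        raise ValueError("length and increment must be positive")
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, increment)]


fs = 1000  # sampling rate in Hz (hypothetical)
signal = list(range(fs))  # 1 s of dummy samples
windows = sliding_windows(signal, length=round(0.150 * fs), increment=round(0.050 * fs))
print(len(windows), len(windows[0]), windows[1][0])  # 18 150 50
```

With these values, consecutive windows share 100 samples, and a new window (and thus a new classification) becomes available every 50 ms.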

Features
The detailed features and domains used in each study can be found in Table 3. Two main feature domains have been investigated in the included studies: the time domain (e.g., mean, minimum, maximum, standard deviation) (N = 48) and the time-frequency domain (e.g., coefficients of the wavelet transform) (N = 1). One study compared the performance of machine learning algorithms using either time-domain or time-frequency domain features [9].
The remaining studies either did not provide any information concerning the features used (N = 1) or used the raw time series measured by the sensors without extracting any features (N = 7).

Machine Learning Algorithms and Their Accuracies
Details on the machine learning algorithms used in studies and their reported accuracies are presented in Table 3.
Most of the studies used classical, readily available pattern recognition algorithms (Bishop 2006 [66]). Three algorithms were implemented more often than others: Linear Discriminant Analysis (LDA), Support Vector Machines (SVM) and Dynamic Bayesian Networks (DBN). Some less typical adaptive algorithms were also used: Learning From Testing data (LIFT) and the Entropy Based Algorithm (EBA) were each used twice, and Transductive SVM was used once [15,27].

Discussion
This systematic review included 58 articles implementing Machine Learning classifiers designed to identify the locomotion mode of assistive device users. Such algorithms were generally implemented as high-level controllers able to automatically adapt the behavior of lower limb prostheses, exoskeletons or orthoses. We used the PubMed and Web of Science core collection databases to find our references, because most medical literature (including biomedical engineering) can be found in these two databases. In addition, we performed an extensive search through the references of the papers retrieved from these databases. As we were focusing on medical literature, we did not include Scopus as one of the databases for this review; a small number of relevant papers may therefore have been missed.
Accuracy and robustness (e.g., stable performance during long-term use) were the variables most often used to report the results of studies investigating locomotion on different terrains. The influence of (1) sensors, (2) analysis windows and features and (3) machine learning algorithms on the accuracy and robustness of locomotion mode classifiers is discussed below. It should be noted that the accuracies reported in this review are those supplied in each paper. The studies were conducted under different conditions (e.g., number of subjects, conditions tested). Accuracies can therefore be compared within each study but cannot be compared precisely between studies.

Influence of Sensor Choice
Several sensors have been used to build locomotion mode classifiers. The choice of sensors may influence the accuracy and the robustness of the classifiers, as detailed in the sections below.
Firstly, IMUs measure acceleration and rotational speed along three orthogonal axes. For example, Stolyarov et al. [43] classified level ground walking (LW), stair ascent (SA), stair descent (SD), ramp ascent (RA) and ramp descent (RD) with LDA. They showed that including trajectory information of the prosthesis increased the average accuracy compared to using only accelerations and rotational speeds (from 80.9% to 94.1%). They suggested using filtering techniques (e.g., Kalman filters, particle filters) to reduce drift. They also pointed out that the performance of the classification algorithms might be reduced at slow walking speeds. Zhou et al. [65] also demonstrated the capacity of IMUs to detect the locomotion mode. Using an SVM, they classified three locomotion modes (LW, SA, SD) exclusively from IMU data and achieved above 90% accuracy using orientation information. The signals combining acceleration, rotational speed and orientation were directly provided by the IMUs (MPU 9250, InvenSense®; the filtering technique was not reported in the sensor's data sheet).
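The sketch below illustrates why filtering reduces drift in IMU-derived orientation. It deliberately uses a complementary filter, a simpler technique than the Kalman or particle filters mentioned above. The axis and sign conventions, the blending factor `alpha` and the gyroscope bias are all assumptions chosen for illustration.

```python
import math


def complementary_pitch(gyro_rates, accels, dt, alpha=0.98):
    """Estimate a pitch angle (rad) by fusing gyroscope and accelerometer data.

    gyro_rates: angular velocity about the pitch axis (rad/s), one per sample.
    accels:     (a_x, a_z) accelerometer readings (m/s^2), one per sample.
    dt:         sampling period (s).
    Integrating the gyroscope alone drifts with any bias; the gravity-based
    accelerometer angle is noisy but drift-free. The filter trusts the
    gyroscope at short time scales and the accelerometer in the long run.
    """
    pitch = math.atan2(*accels[0])  # initialize from the gravity direction
    estimates = []
    for rate, (a_x, a_z) in zip(gyro_rates, accels):
        accel_pitch = math.atan2(a_x, a_z)
        pitch = alpha * (pitch + rate * dt) + (1 - alpha) * accel_pitch
        estimates.append(pitch)
    return estimates


# Motionless sensor for 10 s with a hypothetical gyroscope bias of 0.01 rad/s.
dt, n, bias = 0.01, 1000, 0.01
estimates = complementary_pitch([bias] * n, [(0.0, 9.81)] * n, dt)
print(round(bias * n * dt, 3), round(estimates[-1], 4))  # 0.1 0.0049
```

Pure integration of the biased gyroscope would drift to 0.1 rad after 10 s. The fused estimate instead settles at a small constant offset.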
However, these studies suggested that algorithm performance could improve when IMU signals are fused with signals from other sensors. Thus, in most studies using IMUs, information from this sensor was fused with measurements from other sensors (see below).
Secondly, load cells measure the interaction forces between the device and the user. For example, Huang et al. [24] classified five locomotion modes (LW, SA, SD, RA, RD) with LDA and SVM using only a 6 degrees of freedom (DOF) load cell mounted on the prosthetic pylon of an above-knee prosthesis. The phase-dependent strategy achieved 85 to 95% accuracy during the stance phase (Initial Double Limb Stance (DS1), Single Limb Stance (SS) and Terminal Double Limb Stance (DS2)). However, accuracy dropped to 50-60% during the swing (SW) phase for both LDA and SVM classifiers. Similar drops in accuracy were reported when using only plantar pressure measurements [13,46]. According to the authors [24], the low classification accuracies in the swing phase were almost certainly due to the low forces/moments generated during that phase.
Thirdly, early studies reported that EMG signals measured from the residual limb contain useful information for locomotion mode prediction. For example, Huang et al. [24] and Miller et al. [33] classified five locomotion modes (LW, SA, SD, RA, RD) using EMG signals measured in the residual limb of patients with unilateral transfemoral and transtibial amputation, respectively. LDA and SVM classifiers were used in both studies. For volunteers with transfemoral amputation [24], the SVM achieved an accuracy above 90% in all phases. LDA achieved similar accuracies in the stance phase but a slightly lower accuracy (85%) in the swing phase. For volunteers with transtibial amputation [33], both LDA and SVM achieved around 98% accuracy. Many researchers have pointed out that EMG signals suffer from disturbances, especially shifts in electrode position when donning and doffing a prosthesis. Miller et al. [33] reported a mean loss in accuracy of 15.8% and 23.1% for the LDA and SVM classifiers, respectively, when the medial gastrocnemius electrode was shifted. Both studies concluded that EMG signals could help classify locomotion modes as long as the signals are not disturbed. Several studies have suggested ways of reducing these problems; they are discussed in the 'Algorithm robustness' Section 4.1.2 below.
Finally, sensor fusion has been shown to significantly increase the accuracy of locomotion mode classifiers [24,54]. For example, Huang et al. [24] observed higher accuracies when combining EMG and load cell data than when using either EMG or load cell data alone (an increase of up to 5.9% for an SVM classifier). Since then, data from different sensors have commonly been fused to reach higher accuracies. In another example, Young et al. [54] used 13 mechanical sensors (IMU, load cell, position, velocity and torque at the knee and ankle joints) and recorded EMG signals from 9 muscles of the residual limb of volunteers with a transfemoral amputation. A DBN algorithm predicting upcoming locomotion modes reached 99% accuracy for steady-state steps and 88% accuracy for transitional steps.

Algorithm Robustness
Sensor measurement noise over time can affect the performance of locomotion mode classifiers. To achieve reliable behavior of locomotion assistive devices during long-term use, the influence of such noise should be considered. The techniques implemented to account for sensor noise are discussed here.
EMG signals were mostly reported to be disturbed by environmental noise, changes in electrode conductivity, shifts in electrode position or even loss of electrode contact [67,68]. Three techniques have been used to cope with such disturbances. The first trains the ML algorithm with several electrode displacement configurations [33]. The second builds a sensor fault detection system so that disturbed EMG channels are removed when detected as noisy [23,40]. The third uses an adaptive framework so that the ML algorithm can be updated when EMG signals are disturbed [42]; this adaptive algorithm also included a sensor fault detection system. Alternatively, according to some researchers [62,63], capacitive sensing systems could eventually replace EMG. These systems measure the gap change between the residual limb and the prosthetic socket [63], and they appear to be robust to donning and doffing an ankle-knee prosthesis and to load bearing changes [62].
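The channel-removal idea can be illustrated with a deliberately simple rule. This is a hypothetical z-score heuristic on the mean absolute value (MAV) of each channel. It is not the fault detectors used in [23,40,42], and the threshold and channel names are assumptions.

```python
from statistics import fmean, stdev


def noisy_channels(baseline_mav, current_mav, z_threshold=3.0):
    """Return the EMG channels to exclude from classification.

    baseline_mav: {channel: MAV values of windows recorded during training}
    current_mav:  {channel: MAV of the current analysis window}
    A channel is flagged when its current MAV lies more than z_threshold
    standard deviations away from its training mean (hypothetical rule).
    """
    flagged = set()
    for channel, history in baseline_mav.items():
        mu, sigma = fmean(history), stdev(history)
        if abs(current_mav[channel] - mu) > z_threshold * sigma:
            flagged.add(channel)
    return flagged


baseline = {"ch1": [1.0, 1.1, 0.9, 1.0], "ch2": [0.5, 0.6, 0.4, 0.5]}
print(noisy_channels(baseline, {"ch1": 1.05, "ch2": 3.0}))  # {'ch2'}
```

The flagged channels would then be dropped before feature extraction, for example by using a classifier trained without them.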

Influence of Analysis Windows
In this section, we will discuss the influence of the analysis window configuration on the accuracy of locomotion mode prediction.
Among the included studies, sliding (N = 30, Table 3) and multiple (N = 19, Table 3) analysis windows were the preferred configurations. Implementing sliding windows requires building one classifier per gait phase, whereas implementing multiple windows requires building one classifier per analysis window [50][51][52][53][54][55]. The number of classifiers therefore depends on the number of gait phases or on the number of windows, respectively. In the case of sliding windows, Chen et al. [12] observed that the number of phases used for phase-dependent classification significantly influences algorithm accuracy. Using four gait phases (DS1, SS, DS2, SW) increased the accuracy of both LDA and QDA compared to using only two phases (stance, swing). As the sliding window method generally covers a longer portion of the gait phase in question, the data to be classified are generally more variable.
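The phase-dependent strategy (one classifier per gait phase) can be sketched as follows. Toy nearest-class-mean classifiers stand in for the LDA, QDA or SVM classifiers of the reviewed studies. The one-dimensional feature and its values are hypothetical.

```python
import math
from collections import defaultdict


class PhaseDependentClassifier:
    """One toy nearest-class-mean classifier per gait phase (e.g., DS1, SS, DS2, SW).

    Training data and queries are routed by gait phase, so each phase-specific
    classifier only learns the patterns of its own phase.
    """

    def __init__(self):
        self.data = defaultdict(lambda: defaultdict(list))  # phase -> label -> vectors

    def fit(self, samples):
        """samples: iterable of (phase, features, label) tuples."""
        for phase, features, label in samples:
            self.data[phase][label].append(features)
        return self

    def predict(self, phase, features):
        if phase not in self.data:
            raise KeyError(f"no classifier trained for phase {phase!r}")
        classes = self.data[phase]

        def distance(label):
            vectors = classes[label]
            mean = [sum(column) / len(vectors) for column in zip(*vectors)]
            return math.dist(features, mean)

        return min(classes, key=distance)


# Hypothetical 1-D feature whose meaning differs between swing and single stance.
clf = PhaseDependentClassifier().fit([
    ("SW", [1.0], "LW"), ("SW", [5.0], "SA"),
    ("SS", [1.0], "SA"), ("SS", [5.0], "LW"),
])
print(clf.predict("SW", [1.2]), clf.predict("SS", [1.2]))  # LW SA
```

The same feature value leads to different decisions depending on the gait phase. A single classifier pooling all phases could not capture this.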
Several studies reported that the length of the analysis windows had a significant impact on algorithm performance for both multiple window [53,54] and sliding window [22] configurations. Using multiple windows, Young et al. [53,54] showed that there was an optimal window length (between 200 and 300 ms) for classification accuracy using mechanical and EMG data, for both steady-state and transitional data. A similar result was found in a study using sliding windows, where the window length, but not the window increment, affected algorithm performance [22]. For online implementation, however, a smaller window increment ensures a faster response time, since locomotion mode classification is performed more often.
More recently, some researchers ended their analysis windows not at classic gait events such as foot contact but after a short delay. For example, Simon et al. [37,69] used an analysis window ending 90 ms after foot contact or foot off. This delay increased the accuracy of a DBN algorithm and did not affect the stability of the users of a powered above-knee prosthesis.

Influence of Features
The feature set used in each study was highly dependent on the sensors used. For EMG signals, two types of features were tested: (1) time-domain features and (2) time-frequency domain features. The most commonly used time-domain features were the mean absolute value, waveform length, number of zero crossings and number of slope sign changes (N = 21), and the coefficients of autoregressive models (N = 8). Among time-frequency domain features, the coefficients of the wavelet transform of EMG signals were used once [9]. Ai et al. [9] compared LDA and SVM performance when using time-domain or time-frequency domain features. For one volunteer with a below-knee amputation, both algorithms reached higher accuracies with time-domain features (e.g., for the SVM, 91.9% with time-domain features vs. 82.3% with time-frequency features). Additionally, time-domain features were easier and faster to compute [9].
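The four most common EMG time-domain features listed above can be computed as follows for one analysis window. The exact definitions, particularly of the noise thresholds, vary between studies. This sketch uses one common variant, and its `threshold` parameter is an assumption.

```python
def emg_time_domain_features(x, threshold=0.0):
    """Time-domain features of one EMG analysis window (list of samples).

    threshold: hypothetical amplitude threshold that ignores small
    fluctuations in the zero-crossing and slope-sign-change counts.
    """
    n = len(x)
    mav = sum(abs(v) for v in x) / n  # mean absolute value
    wl = sum(abs(x[i] - x[i - 1]) for i in range(1, n))  # waveform length
    zc = sum(1 for i in range(1, n)  # zero crossings
             if x[i - 1] * x[i] < 0 and abs(x[i] - x[i - 1]) >= threshold)
    ssc = sum(1 for i in range(1, n - 1)  # slope sign changes
              if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0
              and max(abs(x[i] - x[i - 1]), abs(x[i] - x[i + 1])) >= threshold)
    return {"MAV": mav, "WL": wl, "ZC": zc, "SSC": ssc}


print(emg_time_domain_features([1, -1, 2, -2, 0]))
# {'MAV': 1.2, 'WL': 11, 'ZC': 3, 'SSC': 3}
```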
A large number of studies (N = 48, Table 3) used mechanical sensors (IMUs, load cells, encoders, pressure insoles, etc.). The most common feature set (N = 34, Table 3) was a combination of the following time-domain features: mean, maximum, minimum and standard deviation. Initial and final values were sometimes added to the feature set (N = 8, Table 3).
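The most common mechanical feature set can be sketched as follows. The sketch assumes the population standard deviation, since the reviewed studies do not specify which variant they used. Concatenating the features of several channels also shows one simple form of the feature-level sensor fusion discussed earlier.

```python
from statistics import fmean, pstdev


def mechanical_features(window, include_endpoints=False):
    """Mean, max, min and (population) standard deviation of one sensor
    channel over an analysis window, optionally followed by its initial
    and final values."""
    features = [fmean(window), max(window), min(window), pstdev(window)]
    if include_endpoints:
        features += [window[0], window[-1]]
    return features


def feature_vector(channels, include_endpoints=False):
    """Concatenate the features of several channels (e.g., IMU axes, load cell)."""
    vector = []
    for window in channels:
        vector += mechanical_features(window, include_endpoints)
    return vector


print(feature_vector([[1.0, 2.0], [3.0, 5.0]]))
# [1.5, 2.0, 1.0, 0.5, 4.0, 5.0, 3.0, 1.0]
```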
Finally, feature reduction techniques were sometimes used to find the minimal feature set necessary for successful classification and to avoid overfitting (N = 14). Wrapper techniques such as Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) were used to pick the features with the highest impact on classification accuracy [39] (N = 8). Such methods are time consuming [18]. Zhang et al. [57] compared the processing time of two wrapper methods and a filter method and found the filter method to be much faster (84 s for the filter method vs. 1978 s for SBS).
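Sequential Forward Selection can be sketched as a greedy loop around a scoring function. In a real wrapper, `score` would retrain and evaluate the classifier (e.g., with cross-validation) for every candidate subset. This is why wrapper methods are time consuming. The toy score used here is purely hypothetical.

```python
def sequential_forward_selection(candidates, score):
    """Greedy wrapper selection: repeatedly add the feature that most improves
    `score`, stopping when no addition improves it.

    score: callable mapping a list of features to a validation accuracy.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scores = {f: score(selected + [f]) for f in remaining}  # one evaluation per candidate
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best


# Hypothetical score: each feature adds a fixed gain; uninformative ones cost 0.05.
GAINS = {"mean": 0.30, "max": 0.20, "sd": 0.10}


def toy_score(subset):
    return 0.5 + sum(GAINS.get(f, -0.05) for f in subset)


selected, best = sequential_forward_selection(["mean", "max", "min", "sd"], toy_score)
print(selected, round(best, 2))  # ['mean', 'max', 'sd'] 1.1
```

Sequential Backward Selection works the same way in reverse. It starts from the full set and repeatedly removes the feature whose removal hurts the score least.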

On Accuracy
A variety of ML algorithms were used in the included studies. The most frequently used algorithms were LDA (N = 29, Table 3), SVM (N = 19, Table 3) and DBN (N = 10, Table 3). CNNs were also used (N = 4, Table 3), mainly to avoid manual feature selection.
LDA is easy to implement since no hyperparameters need to be tuned [48,70]. This algorithm is fast (computation times of 1.29 ms [48] and 0.078 ms with parallelization [32]) and not prone to overfitting [9]. For these reasons, it is often used as a baseline when comparing the performance of several algorithms [32,42]. More importantly, in some studies LDA obtained accuracies similar to those of neural networks [48] and SVM [33].
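To show why LDA needs no hyperparameter tuning, here is a minimal NumPy sketch of LDA with a pooled covariance matrix and uniform class priors. Fitting only requires the class means and one shared covariance matrix, and prediction reduces to a single matrix product. The class name and the use of a pseudo-inverse (to tolerate a singular covariance matrix) are choices made for this illustration, not details reported by the cited studies.

```python
import numpy as np


class LDA:
    """Linear discriminant analysis with a pooled covariance matrix and
    uniform class priors (no hyperparameters to tune)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)  # sorted class labels
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class covariance: total within-class scatter / (n - k).
        centered = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (len(X) - len(self.classes_))
        self.cov_inv_ = np.linalg.pinv(cov)
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Linear discriminant: g_c(x) = x^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c.
        # With uniform priors, the log-prior term is identical for all classes.
        w = self.means_ @ self.cov_inv_  # shape (n_classes, n_features)
        b = -0.5 * np.sum(w * self.means_, axis=1)
        return X @ w.T + b

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```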
Even though hyperparameters such as the kernel parameter and the penalty factor need to be tuned for SVM [16], SVMs tuned with optimization techniques (e.g., grid search [9], particle swarm optimization [29]) have been found in some studies to reach slightly better performance than LDA [9,24] or QDA [62].
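The grid search mentioned above can be sketched as an exhaustive loop over candidate values of the penalty factor `C` and the RBF kernel parameter `gamma`. The `evaluate` callable is a hypothetical placeholder: it would train an SVM with the given values and return its validation accuracy. In practice, the candidate grids are usually logarithmically spaced.

```python
import itertools


def grid_search(evaluate, C_values, gamma_values):
    """Exhaustive search over SVM hyperparameters.

    evaluate: hypothetical callable (C, gamma) -> validation accuracy
    Returns the best (C, gamma) pair and its accuracy.
    """
    best_params, best_acc = None, float("-inf")
    # Try every combination of penalty factor and kernel parameter.
    for C, gamma in itertools.product(C_values, gamma_values):
        acc = evaluate(C, gamma)
        if acc > best_acc:
            best_params, best_acc = (C, gamma), acc
    return best_params, best_acc
```

In scikit-learn, `GridSearchCV` combined with `SVC` performs the same kind of search with built-in cross-validation.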
Young et al. were among the first to use DBNs, in 2013 [50,51]. By combining past information with information from the current state, the DBN obtained higher classification accuracies than LDA [54] (88% vs. 85% transitional accuracy for DBN and LDA, respectively). Unlike LDA with uniform priors, the DBN takes transition probabilities into account (e.g., in stair ascent mode, the next mode is more likely to be stair ascent or level ground walking).
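The role of transition probabilities can be illustrated with a simplified recursive Bayesian filter. This is the forward recursion of a hidden Markov model, which is a special case of a DBN; it is not the specific model of Young et al. At each step, the belief over modes is propagated through a transition model and then weighted by the classifier's likelihoods for the current observation. The mode names, probabilities and dictionary-based interface are assumptions for illustration.

```python
def forward_filter(likelihoods, transition, prior):
    """Recursive Bayesian estimate of the locomotion mode (HMM forward pass).

    likelihoods: list of dicts {mode: p(observation | mode)}, one per step,
                 e.g. produced by a per-step classifier
    transition: dict {previous_mode: {next_mode: probability}}
    prior: dict {mode: probability}, the belief before the first step
    Returns the most probable mode after each step.
    """
    belief = dict(prior)
    decisions = []
    for lik in likelihoods:
        # Predict: propagate the previous belief through the transition model.
        predicted = {
            m: sum(belief[p] * transition[p][m] for p in belief) for m in belief
        }
        # Update: weight by the current evidence, then normalize.
        unnormalized = {m: predicted[m] * lik[m] for m in belief}
        total = sum(unnormalized.values())
        belief = {m: v / total for m, v in unnormalized.items()}
        decisions.append(max(belief, key=belief.get))
    return decisions
```

With likelihoods `{LW: 0.8, SA: 0.2}`, `{LW: 0.4, SA: 0.6}` and `{LW: 0.1, SA: 0.9}`, a self-transition probability of 0.9 and a uniform prior, the filter keeps level walking at the second step, where a frame-by-frame classifier would already switch to stair ascent, and switches to stair ascent at the third step, once the evidence is strong enough.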
Finally, CNNs were recently used in a few studies [16,44,58,59]. For example, Zhang et al. [58,59] used depth images from a depth camera coupled with an IMU, both mounted on the pylon of an above-knee prosthesis. CNNs, which are known to perform well on image datasets, are often used to avoid manual feature selection. CNNs were also applied to non-image data, e.g., IMU data [44] or load cell data [16]. All four studies using CNNs reported accuracies above 89%, but none of them implemented the designed CNN online.
The most common error was misclassification between the ramp ascent and level ground walking modes [50]. Grouping the ramp ascent and level walking classes was reported to improve the performance of locomotion mode classifiers [50]. Such a technique is relevant when the control laws (impedance in [43,50]) are similar for both modes. Zhang et al. [59] evaluated the influence of such errors (misclassifications between level walking and incline walking) on the stability of the user of an above-knee prosthesis using angular momentum and a subjective questionnaire. The effect of an error was found to depend on its type, its duration and the gait phase in which it occurred. Errors were considered critical if they disturbed the stability of the prosthesis user. This appears to be a good criterion for evaluating the importance of errors when designing a locomotion mode classifier.
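The effect of grouping classes on the reported accuracy can be computed directly from the labels, as in the sketch below. The label names and the `merge` mapping are hypothetical. As noted above, such a grouping is only meaningful when the device uses the same control law for the merged modes.

```python
def accuracy(true_labels, predicted_labels, merge=None):
    """Classification accuracy, optionally after merging classes.

    merge: dict mapping original labels to grouped labels, e.g.
           {"level_walking": "LW/RA", "ramp_ascent": "LW/RA"}
           (label names are hypothetical)
    """
    merge = merge or {}
    # A prediction counts as correct if both labels map to the same group.
    hits = sum(
        merge.get(t, t) == merge.get(p, p)
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)
```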

On Robustness
Very few studies have evaluated the performance of locomotion mode classifiers during long-term use. Adaptive frameworks have been proposed to deal with EMG disturbances [42] or to maintain stable performance during long-term use [27]. For example, Spanias et al. [42] designed a forward predictor and a backward estimator. The forward predictor is an ML algorithm designed to predict the upcoming locomotion mode of an assistive device user, while the backward estimator is an ML algorithm designed to recognize the current locomotion mode. The backward estimator was used to label new data, which were then incorporated into the training set and used to update the parameters of the forward predictor. Spanias et al. [42] used this framework to deal with EMG disturbances: the adaptive algorithm learned to reincorporate disturbed EMG channels over time and was reported to perform significantly better than a non-adaptive algorithm. In another example, Liu et al. [27] compared the performance of adaptive algorithms with that of a non-adaptive algorithm across multiple sessions within a single experimental day. After donning and doffing the prosthesis, the adaptive algorithms were reported to update the classifier boundaries and recover the initial accuracy, whereas the performance of the non-adaptive algorithm gradually decreased. In summary, adaptive frameworks seem to be a promising solution for long-term locomotion mode classification.
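The forward predictor/backward estimator loop described above can be sketched as follows. As simplifying assumptions, the forward predictor is a nearest-centroid classifier whose centroids are updated with every newly labelled sample, and the backward estimator is a caller-supplied function. Neither corresponds to the actual algorithms of Spanias et al. [42] or Liu et al. [27]; the class and function names are hypothetical.

```python
class NearestCentroid:
    """Minimal forward predictor whose class centroids can be updated online."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def add(self, x, label):
        # Accumulate the feature sum and count for this class.
        s = self.sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        def squared_distance(label):
            centroid = [v / self.counts[label] for v in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.sums, key=squared_distance)


def adaptive_session(predictor, backward_estimator, stream):
    """Forward/backward adaptation loop (simplified sketch).

    predictor: forward predictor with predict() and add() methods
    backward_estimator: hypothetical callable labelling a completed step
    stream: iterable of (pre_transition_features, completed_step_data)
    Returns the online predictions of the forward predictor.
    """
    predictions = []
    for x, step in stream:
        predictions.append(predictor.predict(x))  # used online to control the device
        label = backward_estimator(step)  # recognized once the step is complete
        predictor.add(x, label)  # newly labelled sample updates the predictor
    return predictions
```

Suppose a predictor is initialized with one level walking sample at 0.0 and one stair ascent sample at 10.0, and then sees level walking steps whose feature drifts to 4, 5, 6 and 6. The adaptive loop keeps classifying them as level walking, whereas the initial, non-updated predictor would classify a feature value of 6 as stair ascent.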

Propositions for Future Work
This systematic review included 58 articles published between 1 January 2000 and 31 July 2020, all of which implemented ML-based locomotion mode classifiers designed for users of lower limb assistive devices. As can be seen from Table 3, classification accuracies under the tested conditions were almost always very high, indicating good progress towards more intelligent prosthetic devices. Nevertheless, there is still room for improvement. Here, we propose some recommendations concerning research reports in the field, as well as suggestions for moving forward with the use of these devices in the daily lives of patients.

Homogenization of Reports
We start with the terms used in the field. This is not a trivial matter, as homogenizing terminology would improve understanding between researchers and hence speed up progress. There is much confusion around the terms recognition and prediction: they are used interchangeably across studies but do not refer to the same goal. We propose that classifying the locomotion mode before the critical timing be considered a prediction task, whereas a classification made after the critical timing be considered a recognition task. As a reminder, the critical timing is the latest moment at which the behavior of the locomotion assistive device can be adapted to the new locomotion mode without disturbing the user. A more discriminating use of the two terms would make the studies easier to understand.
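The proposed terminology can be made operational with a simple rule: a decision counts as a prediction if it is available no later than the critical timing. The sketch below assumes that both times are expressed in milliseconds from a common reference (e.g., the start of the transitional step). Treating a decision made exactly at the critical timing as a prediction is a boundary convention assumed for this sketch; the function name is hypothetical.

```python
def classify_decision(decision_ms, critical_ms):
    """Label a locomotion mode decision as 'prediction' or 'recognition'.

    decision_ms: time at which the classifier output becomes available
    critical_ms: critical timing for the same transition (same reference)
    Returns the label and the lead time (positive if made before the critical timing).
    """
    lead_time_ms = critical_ms - decision_ms
    # Assumed convention: a decision exactly at the critical timing is still in time.
    kind = "prediction" if lead_time_ms >= 0 else "recognition"
    return kind, lead_time_ms
```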
The reporting of accuracies suffers from a similar lack of precision. While many reports distinguish between accuracies during steady-state and transitional steps, several do not. Pooling the accuracy obtained in steady state with that obtained in transitional steps is misleading, as errors tend to be more frequent in transitional steps. We therefore propose that accuracies always be reported separately for these two conditions.
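The sketch below shows the proposed separate reporting. Each classified step is tagged as steady state or transitional, and the two accuracies are computed separately, alongside the pooled value for comparison. The record format and function name are assumptions for illustration.

```python
def split_accuracies(records):
    """Separate steady-state and transitional accuracies.

    records: iterable of (is_transitional_step, true_mode, predicted_mode)
    Returns (steady_state_accuracy, transitional_accuracy, pooled_accuracy).
    """
    counts = {True: [0, 0], False: [0, 0]}  # step type -> [correct, total]
    for transitional, true_mode, predicted_mode in records:
        counts[transitional][0] += int(true_mode == predicted_mode)
        counts[transitional][1] += 1

    def acc(correct, total):
        return correct / total if total else float("nan")

    steady = acc(*counts[False])
    transitional = acc(*counts[True])
    pooled = acc(counts[False][0] + counts[True][0], counts[False][1] + counts[True][1])
    return steady, transitional, pooled
```

With 9 of 10 steady-state steps and 1 of 2 transitional steps correctly classified, the pooled accuracy is about 83%, which hides a transitional accuracy of only 50%.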

Recommendations for Generalization to Daily Life Conditions
The review shows that significant progress has been made in easing the use of prosthetic devices across multiple terrains. Nevertheless, some obvious steps remain necessary to ensure the comfortable use of these devices in the daily lives of patients.
An obvious first step would be to include more daily life conditions when testing the devices, for example different angles of approach towards stairs or slopes [9,28], different staircases [18] or changes in load bearing [51]. A good extension for many of the studies included in this review would be to test the algorithms outside the laboratory. Only a very small number of studies took this step. For example, Zhang et al. [58,59] evaluated a CNN classifier with data acquired both indoors and outdoors. Such studies are to be encouraged.
Another important condition to include, to make prosthetic devices more usable in daily life, is walking at multiple speeds. Once again, very few researchers have investigated this condition; one study that has taken a step in this direction is that of Liu et al. (2017) [28].
A third variation that is not often taken into consideration is the transitioning leg, i.e., the leg used to enter a new terrain. Although subjects tend to favor one leg when crossing into new conditions, the side used is not always the same, and subjects can change the transitioning leg. A handful of studies, such as that of Zhou et al. [65], have taken this into account and reported better accuracies when the locomotion mode classifier was trained with data from both transitioning legs. This condition could easily be included in future studies in the field.
We now turn from the conditions to be tested to ways of decreasing the burden of developing an algorithm tuned to each patient. Gathering the data needed to train the ML algorithm for each patient can be long and burdensome, and a few researchers have proposed ways to reduce the difficulty of this step. For example, Zhang et al. [60] proposed an automatic training method based on environmental sensing: a radar distance meter coupled with an IMU sensed the environment and automatically labelled the acquired data. Automatic labelling could also be achieved with depth cameras [58,59]. Another approach is the use of subject-independent models, which could reduce the amount of training data needed; efforts of this type have been made by Young et al. [52] and Spanias [8]. Including such approaches in future investigations of prediction or recognition algorithms would have the additional benefit of reducing training time for the patient.