Validity of Instrumented Insoles for Step Counting, Posture and Activity Recognition: A Systematic Review

With the growing interest in daily activity monitoring, several insole designs have been developed to identify postures, detect activities, and count steps. However, the validity of these devices is not clearly established. The aim of this systematic review was to synthesize the available information on the criterion validity of instrumented insoles in detecting postures, activities, and steps. A literature search of six databases identified 33 articles that met the inclusion criteria. These studies evaluated 17 different insole models and involved 290 participants aged 16 to 75 years. Criterion validity was assessed using six statistical indicators. For posture and activity recognition, accuracy varied from 75.0% to 100%, precision from 65.8% to 100%, specificity from 98.1% to 100%, sensitivity from 73.0% to 100%, and identification rate from 66.2% to 100%. For step counting, accuracies were very high (94.8% to 100%). Across studies, different postures and activities were assessed using different criterion validity indicators, leading to heterogeneous results. Instrumented insoles appeared to be highly accurate for step counting. However, measurement properties were variable for posture and activity recognition. These findings call for a standardized methodology to investigate the measurement properties of such devices.


Introduction
There is growing evidence regarding the role of regular physical activity in improving and preserving functional autonomy and in preventing many diseases and disorders [1][2][3][4][5]. For example, regular physical activity has been shown to contribute to preventing recurrent stroke [1,5], obesity [1,4], cardiovascular diseases [1][2][3], and cancer [1]. Physical activity can be defined as any bodily movement produced by skeletal muscles that results in energy expenditure [6]. Identifying relevant postures (e.g., sitting, standing) and activities (e.g., walking, jogging, ascending/descending stairs or ramps, cycling) in daily life provides important information about an individual's active or sedentary behavior and is thus a crucial component of daily physical activity measurement.
Physical activity may be evaluated using subjective and objective methods [7][8][9][10][11][12][13][14]. Subjective methods such as questionnaires [7,8] and individual diaries [15] are relatively inexpensive and are the most feasible methods for large population-based studies. However, they present some limitations such step, search parameters #1, #2 and #3 were combined to retrieve references that covered all three concepts (#4). The search strategy was adapted for each database and limited to articles in English or French published from each database's inception to 6 May 2019. The detailed search strategy is presented in Table 1.

Selection Criteria
We considered for inclusion only studies assessing the psychometric properties of instrumented insoles for step counting and for posture and activity recognition. Instrumented insoles were defined as insoles integrating at least one of the following: pressure sensors, an accelerometer, a gyroscope, a magnetometer, an inertial measurement unit (IMU), or other electronic sensors (e.g., heart rate sensors). Outcomes included stride or step counts and the recognition of postures (lying, sitting and standing) or activities (walking, jogging, ascending/descending ramps or stairs, cycling, and riding an elevator up or down). The targeted psychometric property was criterion validity. Studies that involved another measurement system in addition to the insoles were included if the instrumented insole data could be extracted. Only scientific papers with available full text were included.

Article Selection
After removing duplicate references from the search results, two reviewers independently screened the titles and abstracts to identify potentially eligible articles based on the selection criteria. Preliminary selection results were compared, and discrepancies were resolved through discussion between the reviewers. When it was unclear from the title and abstract whether a publication met the selection criteria, the full text was read before a final decision was made. The full texts of all preselected references were independently reviewed by the two reviewers to determine whether they met the selection criteria. Discrepancies were again discussed and, if no consensus was reached, a third reviewer was consulted to make the final decision regarding selection.

Data Extraction
The relevant data from the selected full-text articles were extracted by one reviewer. Extracted data were as follows: full article reference, participants' characteristics (diagnosis, age), data collection setting and duration, insole design (sensing elements, sampling frequency, data transmission method), criterion methods (or gold standard), algorithms used, outcomes (step count, postures and activities), and criterion validity results.
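Keeping one fixed record per article is a simple way to make the extracted items comparable across studies. The sketch below is a hypothetical Python data structure mirroring the fields listed above; the class name, field names and value formats are illustrative assumptions, not the extraction form actually used in this review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtractionRecord:
    """Hypothetical per-article extraction record (field names are illustrative)."""
    reference: str                          # full article reference
    diagnosis: str                          # participants' diagnosis, e.g. "healthy"
    age_range: str                          # participants' age, as reported
    setting: str                            # data collection setting, e.g. "laboratory"
    duration: Optional[str]                 # data collection duration, if reported
    sensing_elements: List[str]             # e.g. ["pressure sensors", "accelerometer"]
    sampling_frequency_hz: Optional[float]  # insole sampling frequency
    transmission: str                       # data transmission method, e.g. "Bluetooth"
    criterion_method: str                   # criterion method or gold standard
    algorithms: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)         # step count, postures, activities
    validity: Dict[str, float] = field(default_factory=dict)  # indicator name -> reported value
```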

Methodological Quality Assessment
The methodological quality of the studies reported in the selected articles was assessed using a structured quality appraisal tool developed by MacDermid [29]. This tool consists of 12 criteria pertaining to the study question and design, measurement methods, analyses and recommendations. Each item was scored as 0, 1, 2 or NA (not applicable), giving a maximum possible score of 24. For each article, the quality score was expressed as a percentage:

$$\text{Quality score} = \frac{\text{obtained score}}{\text{total possible score}} \times 100\% \qquad (1)$$

As per de Oliveira et al. [30], study quality was categorized as follows: "high quality" for scores ≥ 80.0%, "good quality" for scores between 70.0% and 79.9%, "moderate quality" for scores between 50.0% and 69.9%, and "low quality" for scores < 50.0%. These categories and scores were used to assist with interpretation of the review findings. However, no article was excluded from the review based on study quality.
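Equation (1) and the category cut-offs can be expressed as two small functions. This is a minimal Python sketch, assuming that each MacDermid item is recorded as 0, 1 or 2 and that items rated NA (represented here as `None`) are removed from both the obtained and the total possible score; the function names are hypothetical.

```python
from typing import Optional, Sequence


def quality_score(item_scores: Sequence[Optional[int]]) -> float:
    """Quality score (%) per Equation (1); None marks an item rated NA.

    Assumption: NA items are dropped, so the total possible score is
    2 points per applicable item.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1, 2 or None (NA)")
    obtained = sum(applicable)
    total_possible = 2 * len(applicable)
    return obtained / total_possible * 100.0


def quality_category(score: float) -> str:
    """Map a percentage score to the de Oliveira et al. categories."""
    if score >= 80.0:
        return "high quality"
    if score >= 70.0:
        return "good quality"
    if score >= 50.0:
        return "moderate quality"
    return "low quality"
```

For example, an article scoring 2 on eleven items and 1 on the twelfth obtains 23 of 24 points, about 95.8%, which falls in the high-quality category.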

Results
The electronic search retrieved 2030 records, from which 930 duplicates were removed. The titles and abstracts of the remaining 1100 records were screened (1015 excluded), and the full texts of the remaining 85 references were then reviewed (52 excluded). Thirty-three articles, reporting independent studies, met the inclusion criteria. The search history and selection process are presented in Figure 1. Of the 33 included articles, 27 evaluated posture and activity recognition and seven evaluated step counting, with one article [31] reporting on both. Most of the included articles were published in the past 10 years, as illustrated in Figure 2. Tables 2 and 3 describe the technical features of the insoles used for posture and activity recognition and for step counting, respectively. Table 4 summarizes the criterion validity of insoles for posture and activity recognition, while the criterion validity of insoles for step counting is included in Table 3. Finally, Table 5 summarizes the methodological quality appraisal of the included studies using the MacDermid tool.
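The record counts in this paragraph can be checked with simple arithmetic. The sketch below recomputes each selection stage and the number of distinct articles across the two outcome groups, counting the overlapping article once (inclusion-exclusion); the function name is illustrative.

```python
def selection_flow(retrieved: int, duplicates: int,
                   excluded_at_screening: int, excluded_at_full_text: int) -> dict:
    """Recompute the number of records remaining at each selection stage."""
    screened = retrieved - duplicates                  # titles/abstracts screened
    full_text = screened - excluded_at_screening       # full texts reviewed
    included = full_text - excluded_at_full_text       # articles included
    return {"screened": screened, "full_text": full_text, "included": included}


flow = selection_flow(2030, 930, 1015, 52)
print(flow)  # {'screened': 1100, 'full_text': 85, 'included': 33}

# One article [31] reports both outcomes, so it is counted once.
posture_activity, step_counting, both = 27, 7, 1
print(posture_activity + step_counting - both)  # 33
```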

Insole Models and Technical Features
The 33 articles described 17 different insole models, most of which (16/17) were academic research prototypes (see Tables 2 and 3). Only one of these 17 insole models was commercially available [32,33]. Data were transmitted via Bluetooth, other wireless modules, or wired connections, with sampling frequencies ranging from 10 Hz to 400 Hz (see Tables 2 and 3). For step detection, instrumented insoles were validated against visual observation [25,31,34,35], other devices (the Runtastic pedometer application and other smartphone applications) [34,36], or a predefined number of steps [24,36,37] (see Table 3). To validate the instrumented insoles for posture and activity recognition, smart insole data were compared with data from direct observation during data collection, from video recordings, or from other wearable devices: a 2D accelerometer (ADXL202), a gyroscope (Murata, ENC-03J), the ActivPAL device, PPAC (plantar-pressure based ambulatory classification), and FF (foot force sensor) + GPS [18,26,31–33] (see Table 2).
For step counting, accuracies ranged from 94.8% to 100% (see Table 3) [24,25,35]. Similarly, one article reported an intraclass correlation coefficient (ICC) of 0.99 for the number of steps [31]. Error rates of 4% and 0% were reported for walking and running, respectively [36]. In two articles, the reported measurement errors for step counting during walking were 0% [34] and < 1% [37]. The methods used to validate step counting differed from one article to another.
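The included articles did not necessarily define step-counting accuracy and error in the same way. A common convention, assumed here rather than taken from the reviewed articles, expresses the error as the absolute difference between the detected and criterion step counts relative to the criterion count, and accuracy as 100% minus that error. The Python sketch below uses hypothetical function names.

```python
def step_count_error(detected: int, reference: int) -> float:
    """Absolute error of a detected step count, as a percentage of the criterion count."""
    if reference <= 0:
        raise ValueError("criterion step count must be positive")
    return abs(detected - reference) / reference * 100.0


def step_count_accuracy(detected: int, reference: int) -> float:
    """Accuracy as 100% minus the absolute percentage error (assumed convention)."""
    return 100.0 - step_count_error(detected, reference)
```

With 96 steps detected against 100 observed (hypothetical counts), the error is 4.0% and the accuracy 96.0%. Because the absolute value is taken, overcounting and undercounting by the same number of steps are penalized equally.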

Methodological Quality of Included Articles
Based on the MacDermid criteria, the total methodological quality score of each article was calculated (Table 5). Total quality scores ranged from 12 to 23 points, corresponding to 50.0% to 95.6%. For posture and activity recognition, methodological quality was high in 16 articles, good in six, and moderate in four, as illustrated in Table 5. For step counting, methodological quality was high in one article, good in two, and moderate in three (see Table 5).

Discussion
The aim of the present review was to assess and synthesize the available information on the criterion validity of instrumented insoles in detecting postures, activities and steps. This systematic review included 33 articles that reported criterion validity data for smart insoles, such as accuracy, precision, specificity, sensitivity, identification rate, and measurement error, for step counting and for posture and activity recognition. Accuracy varied from 75.0% to 100%, precision from 65.8% to 100%, and specificity from 98.1% to 100%. These ranges exclude the detection of ascending/descending stairs, for which some articles reported very low criterion validity (3.0% to 53.0%) [26,42,52], although the studies of Sazonov et al. [49], Zhang et al. [41], Sugimoto et al. [51], Peng et al. [47], and Chen et al. [39] did not report such low values. Sensitivity varied from 73.0% to 100%, identification rate from 66.2% to 100%, and measurement error was at most 4.0%. Walking, standing and sitting were the most frequently assessed activities. In summary, the criterion validity of instrumented insoles varied from one article to another and was expressed using different indicators. Overall, instrumented insoles appear to be best suited to step counting.
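Accuracy, precision, sensitivity and specificity are all derived from the agreement between insole classifications and criterion labels. As a minimal sketch, assuming per-window labels and a one-vs-rest treatment of each posture or activity (the reviewed articles may have computed these indicators differently), they can be obtained as follows; the function names are hypothetical.

```python
from typing import Dict, Sequence


def class_indicators(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Dict[str, float]:
    """One-vs-rest criterion validity indicators for a single posture/activity label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label:
            if truth == label:
                tp += 1  # correctly detected
            else:
                fp += 1  # detected but absent from the criterion
        elif truth == label:
            fn += 1      # present in the criterion but missed
        else:
            tn += 1      # correctly rejected

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def overall_agreement(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Proportion of windows whose predicted label matches the criterion label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, with hypothetical criterion labels `["sit", "sit", "stand", "walk", "walk", "walk"]` and predictions `["sit", "stand", "stand", "walk", "walk", "stand"]`, the indicators for `"walk"` are accuracy 5/6, precision 1.0, sensitivity 2/3 and specificity 1.0, while the overall proportion of correctly classified windows is 4/6. Values labeled "accuracy" may therefore not be directly comparable when articles compute them differently.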
The variation in reported criterion validity could be related to several factors, such as the methods used to train the algorithms, dataset size, and the heterogeneity of activities and postures. For example, larger datasets may yield higher rates of successful classification. The training and validation datasets used by Edgar et al. [43] contained 4800 and 2400 feature vectors, respectively, and yielded successful classification rates of 99.3% for training and 89.6% for validation. Zhang et al. [41], on the other hand, reported identification rates of 98.8% for training and 98.3% for validation based on 11 268 feature vectors for training and 8687 for validation. The results from Sazonov et al. [42,61] suggest that the training method may also influence the accuracy of event detection. These researchers trained two types of models: an individual model, trained separately for each participant, and a group model. The individual model yielded higher accuracy for both training and validation (99.9% and 99.1%, respectively) than the group model (95.0% and 91.5%, respectively) [40,42]. Finally, the variability of reported accuracies between studies may be due to the heterogeneity of the activities to be detected (sitting, walking, etc.). Indeed, articles with an experimental task of only three distinct activities (standing, sitting and walking) reported high criterion validity [31,40,45,46], whereas in other articles, experimental conditions with more than three activities (standing, sitting, walking, car driving, vacuuming, ascending/descending stairs, elevator up/down, dancing, lying down, shelving items, washing, etc.) led to lower detection rates [32,33,39,43,44,53,57]. These differences in experimental conditions preclude direct comparison of the instrumented insole models used across studies.
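The difference between the two training schemes lies in which data a model sees before validation. The sketch below assumes, because the exact protocol is not detailed here, that the individual model is trained and validated on disjoint windows from one participant, whereas the group model is trained on the other participants and validated on the held-out one (leave-one-subject-out). The window representation and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of one labeled data window: (participant_id, window_index)
Window = Tuple[str, int]


def individual_split(windows: List[Window], participant: str,
                     train_fraction: float = 0.5) -> Tuple[List[Window], List[Window]]:
    """Individual model: train and validate on disjoint windows of the same participant."""
    own = [w for w in windows if w[0] == participant]
    cut = int(len(own) * train_fraction)
    return own[:cut], own[cut:]


def group_split(windows: List[Window], participant: str) -> Tuple[List[Window], List[Window]]:
    """Group model (leave-one-subject-out): train on everyone else, validate on this participant."""
    train = [w for w in windows if w[0] != participant]
    test = [w for w in windows if w[0] == participant]
    return train, test


windows = [(p, i) for p in ("P1", "P2", "P3") for i in range(4)]
ind_train, ind_test = individual_split(windows, "P1")
grp_train, grp_test = group_split(windows, "P1")
print(len(ind_train), len(ind_test))  # 2 2
print(len(grp_train), len(grp_test))  # 8 4
```

Under this assumption, the individual model is exposed to the tested person's own movement patterns during training, which is one plausible reason for its higher validation accuracy; the group model instead estimates how well a classifier generalizes to a new user.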
It is, therefore, difficult to state which insole performed best. Such a conclusion would require more consistent studies testing different models of instrumented insoles under standardized experimental conditions. However, the findings of this systematic review may help identify the most appropriate device for a given application. Some authors reported that their algorithms could not discriminate walking from standing, with confusion occurring mainly in people with low walking speed (0.69 ± 0.35 m/s), such as older adults and stroke survivors [32,40]. This suggests that event detection algorithms may be sensitive to movement amplitude, so that upright events such as standing and slow walking can be confused on the basis of sensor signals, leading to false positive results. Similar observations have been reported for other physical activity monitoring devices (pedometers, accelerometers and inertial sensors), which are known to be less accurate at slow walking speeds [22,62–64]. However, it is worth noting that among the 33 articles included in this review, only three assessed the criterion validity of instrumented insoles in stroke survivors [31,40,42] and only two in older adults [32,33]. Therefore, more studies are required to investigate the accuracy of instrumented insoles in discriminating standing from slow walking.
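The amplitude sensitivity described above can be illustrated with a toy classifier that labels a window as walking when its signal variability exceeds a fixed threshold. In this sketch, the sinusoidal signals, their amplitudes and the 0.3 threshold are hypothetical assumptions chosen for illustration, not values from the reviewed articles: a slow-walking signal of reduced amplitude falls below the threshold and is labeled as standing.

```python
import math
import statistics
from typing import List


def synthetic_signal(amplitude: float, cadence_hz: float = 1.0,
                     fs_hz: float = 100.0, seconds: float = 2.0) -> List[float]:
    """Hypothetical sinusoidal sensor signal; amplitude stands in for movement intensity."""
    n = int(fs_hz * seconds)
    return [amplitude * math.sin(2 * math.pi * cadence_hz * i / fs_hz) for i in range(n)]


def classify_window(window: List[float], threshold: float = 0.3) -> str:
    """Label a window 'walking' if its standard deviation exceeds a fixed threshold."""
    return "walking" if statistics.pstdev(window) > threshold else "standing"


normal_walk = synthetic_signal(amplitude=1.0)   # standard deviation about 0.71
slow_walk = synthetic_signal(amplitude=0.3)     # standard deviation about 0.21
quiet_stand = synthetic_signal(amplitude=0.05)  # standard deviation about 0.04 (small sway)

print(classify_window(normal_walk))  # walking
print(classify_window(slow_walk))    # standing (misclassified)
print(classify_window(quiet_stand))  # standing
```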
Ten different algorithms were identified in this systematic review of 33 articles, ranging from simple approaches (e.g., binary decision trees) to more complex machine learning algorithms (e.g., Support Vector Machine, SVM). It is difficult to compare these algorithms directly because of the variability in the experimental conditions under which they were used. However, two articles examined the accuracy of the same instrumented insole for posture and activity recognition using either two (SVM, MLP) [53] or three algorithms (SVM, MLP, MLD) [48] under the same experimental conditions and on the same dataset. The authors [48,53] found no significant difference between these algorithms. However, SVM required more storage space than MLD and MLP [48]. MLD and MLP could run on wearable devices such as insoles, whereas SVM ran only on a computer, with data stored and processed off-line. Thus, in that implementation, SVM could not be used for real-time posture and activity recognition.
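To illustrate why a simple decision tree is light enough to run on an insole, the sketch below classifies a data window as sitting, standing or walking with two threshold tests. The features (load on the insole relative to body weight, variability of acceleration) and the threshold values are hypothetical assumptions chosen for illustration; they are not taken from any of the reviewed articles.

```python
from dataclasses import dataclass


@dataclass
class WindowFeatures:
    load_fraction: float  # hypothetical: mean insole load divided by body weight
    motion_std: float     # hypothetical: std of acceleration magnitude in the window (g)


def classify_posture(f: WindowFeatures,
                     load_threshold: float = 0.4,
                     motion_threshold: float = 0.1) -> str:
    """Two-level binary decision tree with hypothetical thresholds."""
    if f.load_fraction < load_threshold:
        return "sitting"   # feet carry little body weight
    if f.motion_std < motion_threshold:
        return "standing"  # full load, little movement
    return "walking"       # full load, rhythmic movement
```

Once the thresholds are fixed, classification needs only two comparisons and no stored training data, whereas a kernel SVM typically retains its support vectors for prediction, which is consistent with the higher storage requirement reported for SVM [48].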
This systematic review highlights the need for consensus on the methodology and on the measurement properties to report (e.g., accuracy, precision, specificity) when validating insoles for posture and activity recognition. The ideal study design would compare different models of insoles using the same experimental protocol. However, this seems difficult because, unlike other wearable devices such as wrist-worn accelerometers, a participant can wear only one pair of insoles at a time. Therefore, instead of comparing multiple insoles within the same data collection, future research would benefit from a standardized insole validation methodology. Consensus on the postures and activities to include in insole validation studies would also facilitate comparison of the psychometric properties of different models of this new monitoring device. Moreover, the use of similar evaluation methods and standardized postures and activities could enable data pooling for comparative analyses [7].
Because most of the studies in this review were conducted in a laboratory setting, we found limited evidence regarding the criterion validity of insoles in outdoor settings. Future work should evaluate the psychometric properties of insoles in the community. To make this possible, sensors and circuit boards should be integrated into the insoles rather than being separate components that must be connected for data collection, as was the case for most of the 16 research prototypes evaluated here. When external components must be carried during data collection, some insoles may be uncomfortable to use while performing daily activities. Another limitation of the findings of this review is that many studies enrolled only young, healthy adults. Such samples do not allow the insoles to be tested in individuals with varied or unstable walking patterns. Indeed, the accuracy of instrumented insoles for step counting and for posture and activity recognition could differ between normal and pathological or modified walking patterns. Thus, insoles need to be evaluated across different walking patterns.

Conclusions
This systematic review summarizes the validity of instrumented insoles for step counting and activity recognition. Instrumented insoles appeared to be highly valid for step counting, but their measurement properties for posture and activity recognition varied, owing to the heterogeneity of the experimental conditions and tasks under which different insole models were tested with ten different algorithms. The most frequently assessed activities were walking, standing and sitting. In addition, several indicators (e.g., accuracy, precision, measurement error) were used to assess the criterion validity of the insoles. Further research should standardize the criterion validity indicators to be reported and the postures and activities used to test the insoles.