Collection of Kinematic and Kinetic Data of Young and Adult, Male and Female Subjects Performing Periodic and Transient Gait Tasks for Gait Pattern Recognition

Abstract: The aim of the study was to develop a database of biomechanical data for multiple gait tasks. This database will be used to create a real-time gait pattern classifier that will be implemented in a new-generation active knee prosthesis. To this end, we collected kinematic and kinetic data of 40 subjects performing 16 gait tasks, categorized as periodic and transient motions. We analyzed four distinct sub-populations, differentiated by age and gender. As the classifier will also rely on inertial data, we chose to synthesize these signals within the motion capture environment. To assess the effects of gender and age, we performed a correlation analysis on the signals used as input to the classifier. The results indicate that there is no need to differentiate the subjects into four distinct classes when developing the classifier. Sample data of the dataset are made publicly available.


Introduction
In this paper we present the collection and analysis of a biomechanical database covering multiple gait tasks. This database will be used to develop a real-time classifier implemented in a new-generation active knee prosthesis, called GARP (Ginocchio Artificiale a Rigenerazione di Potenza, i.e., artificial knee with power regeneration).
The classifier will use a random forest, a machine learning algorithm, as in other works in the literature [1,2]. Its output will be based on data from a rotary encoder on the knee joint and a six-axis IMU (inertial measurement unit) located inside the prosthetic element corresponding to the biological shank. To implement this control system, a comprehensive database is needed both to train the classifier and to develop the control logic.
We chose to create the database instead of using data from literature for multiple reasons.
First, even if many databases can be found in literature [3][4][5][6][7], only a few of them give full access to all data, as they typically present only means and standard deviations. Typical classifiers, such as the random forest, instead need to be trained using data from as many single trials as possible.
Secondly, we wanted to have a database containing certain specific gait tasks, and we were not able to find any database containing all those sought in the same dataset. As observed by different authors [3,4], it is not advisable to merge data from multiple sources, as some inconsistencies can arise due to the use by different research groups of different biomechanical protocols, experimental conditions, subject characteristics and so forth.
Finally, we chose to collect data without using an instrumented treadmill, as we did not want to bias the measure of the IMU due to the relative movement of the belt with respect to the ground. Further, the clinical validity of data collected on a treadmill is still under discussion [8].
Given these considerations, we decided to create our own comprehensive dataset. To do so, we built instrumented stairs and ramps, and we collected kinematic and kinetic data of 40 subjects, evenly divided into four classes based on age and gender. Subjects were asked to perform 16 activities of daily living, such as level walking, climbing stairs and sitting.
An important element of novelty of this database is that we collected data of two categories of motion, namely periodic and transient tasks. Periodic tasks are motions that are repeated cyclically, whereas transient ones represent the passage from one periodic gait task to the next one.
After collecting the data, we performed an analysis of correlation on the data that will be used to train the classifier of the knee prosthesis. The goal of the analysis was to assess if it would be more appropriate to treat the classes as separate ones when training the classifier, or if we could consider all subjects as belonging to a unique population.
Sample data of the dataset are made publicly available (see Supplementary Materials).

Instrumentation
Kinematics data were collected using an 8-camera optoelectronic motion capture system (Smart-DX 6000, BTS Bioengineering), capturing images at 250 Hz. Kinetics signals were provided by force platforms (BTS P6000D, BTS Bioengineering) with a sampling rate of 1000 Hz.
We built instrumented wooden stairs and ramps (Figure 1). For the design of the stairs we took as a reference the regulation of Cybathlon [9] (step height: 170 mm; step width: 280 mm). The inclination of the ramp was set to 10°.
In order to collect kinetic data, force platforms were included in the design of both the stairs and the ramp. Thanks to the modular design of the stairs, it was possible to move the force platforms into different positions within the structure (Figure 1a).

Processing Protocol and Virtual IMU
We used a modified version of the built-in processing protocol of the Smart-DX software (Smart Clinic, BTS Bioengineering) called Helen Hayes MM, derived from the studies of Kadaba and Davis [10,11]. It uses 22 markers placed in specific anatomical reference locations.
For practical reasons, we chose to synthesize the linear acceleration and angular velocity signals of the IMU that was to be mounted on the prosthesis. To do so, we created a reference frame within the motion capture environment, rigidly attached to the leg and coincident in both location and orientation with the IMU itself. Subsequently, we differentiated the position of this reference frame twice and its spatial orientation once with respect to time. Finally, we added the gravity vector, properly reoriented, to the 3D acceleration obtained previously.
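The virtual IMU procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing protocol actually used: the function name `virtual_imu`, the use of NumPy finite differences, the representation of orientation as rotation matrices mapping sensor coordinates to laboratory coordinates, and a laboratory frame whose Z axis points upward are all assumptions introduced here.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def virtual_imu(pos, rot, fs, up=(0.0, 0.0, 1.0)):
    """Synthesize accelerometer and gyroscope signals from motion capture data.

    pos: (N, 3) origin of the sensor frame in laboratory coordinates [m]
    rot: (N, 3, 3) rotation matrices mapping sensor to laboratory coordinates
    fs:  sampling rate [Hz]
    up:  unit vector of the laboratory vertical axis (assumption: Z up)
    """
    dt = 1.0 / fs
    # Differentiate the position twice to obtain the linear acceleration.
    vel = np.gradient(pos, dt, axis=0)
    acc_lab = np.gradient(vel, dt, axis=0)
    # Add the gravity term, then express the sum in sensor coordinates (R^T v).
    acc_sensor = np.einsum("nji,nj->ni", rot, acc_lab + G * np.asarray(up))
    # Differentiate the orientation once: R^T * dR/dt is the skew-symmetric
    # matrix of the angular velocity expressed in sensor coordinates.
    rot_dot = np.gradient(rot, dt, axis=0)
    skew = np.einsum("nji,njk->nik", rot, rot_dot)
    skew = 0.5 * (skew - np.transpose(skew, (0, 2, 1)))  # remove numerical symmetric part
    gyro = np.stack([skew[:, 2, 1], skew[:, 0, 2], skew[:, 1, 0]], axis=1)
    return acc_sensor, gyro  # [m/s^2], [rad/s]

# Example: a sensor fixed in space, spinning about the vertical axis at 2 rad/s.
fs = 250.0
t = np.arange(0, 1, 1 / fs)
c, s = np.cos(2.0 * t), np.sin(2.0 * t)
rot = np.zeros((t.size, 3, 3))
rot[:, 0, 0], rot[:, 0, 1] = c, -s
rot[:, 1, 0], rot[:, 1, 1] = s, c
rot[:, 2, 2] = 1.0
acc, gyro = virtual_imu(np.zeros((t.size, 3)), rot, fs)
```

In this example the synthesized accelerometer reads only the gravity term along the sensor Z axis, and the gyroscope reads approximately 2 rad/s about Z. With real marker data, the double differentiation amplifies noise, so some low-pass filtering of the trajectories would likely be needed before this step.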
To assess the validity of this method, we performed an in vitro test that consisted of comparing data from the physical and the virtual IMU sensors. To do so, we attached an IMU to a T-shaped marker triad (Figure 2a) and collected data from both systems while moving them together, performing semi-random rapid movements (range of linear acceleration: ±40 m/s², angular velocity: ±800 deg/s) (Figure 2b). The results of the validation are presented in the section Results.
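A per-axis comparison of this kind could look like the sketch below. It assumes that the physical and virtual signals have already been synchronized and resampled to a common rate, which the text does not detail; the function name `per_axis_correlation` and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np

def per_axis_correlation(physical, virtual):
    """Pearson correlation between matching columns of two (N, 6) arrays
    (3 accelerometer axes followed by 3 gyroscope axes)."""
    physical = np.asarray(physical, dtype=float)
    virtual = np.asarray(virtual, dtype=float)
    return np.array([
        np.corrcoef(physical[:, k], virtual[:, k])[0, 1]
        for k in range(physical.shape[1])
    ])

# Synthetic stand-in for the two recordings: the "virtual" signal is the
# "physical" one plus a small amount of noise.
rng = np.random.default_rng(0)
physical = rng.normal(size=(1000, 6))
virtual = physical + 0.01 * rng.normal(size=(1000, 6))
r = per_axis_correlation(physical, virtual)
```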

Data Collection
We collected lower limb kinematic and kinetic data of 40 subjects, performing 5 repetitions of 16 different gait tasks at self-selected speed, for a total of 3200 sets of signals. During data processing, some of them were discarded, mainly due to missing markers not noticed during data collection.
The 16 gait tasks were categorized as periodic and transient, as shown in Table 1. In periodic gait tasks the biomechanical pattern is repeated periodically, whereas transient tasks represent the movement used to pass from one periodic state to the next. We chose to collect data of only 8 transient tasks out of the 56 partial permutations theoretically possible (ordered pairs of the 8 periodic tasks), because collecting all of them would have been impractical due to time and fatigue constraints.
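The count of 56 follows from taking ordered pairs of distinct periodic tasks: with 16 tasks in total and 8 of them transient, there are 8 periodic tasks, hence 8 × 7 = 56 possible transitions. The short sketch below enumerates them; the labels `P1` to `P8` are placeholders, not the task names of Table 1.

```python
from itertools import permutations

# Placeholder labels for the 8 periodic tasks (hypothetical names).
periodic_tasks = [f"P{i}" for i in range(1, 9)]

# Each transient task goes from one periodic task to a different one,
# so the order within the pair matters.
transitions = list(permutations(periodic_tasks, 2))
print(len(transitions))  # 56
```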

Correlation Analysis
We wanted to assess the best strategy to train the random forest gait-task classifier. We considered two options: the first was to keep the four sub-populations analyzed as distinct, and the second was to consider all the subjects as belonging to a single class, hence maintaining a good number of signals per gait task.
We focused our analysis on the signals that are most representative of the motion in the sagittal plane (i.e., knee flexion angle, IMU linear accelerations X and Z, and IMU angular velocity Y), as the others present a higher noise-to-signal ratio.
We computed the Pearson correlation coefficient of all the combinations of signals corresponding to a specific gait task. We called "overall correlation coefficient" (OCC) the mean of the correlation coefficients corresponding to the combinations of all the available trials of a specific gait task. This metric is a measure of the overall repeatability of the signal under analysis. We called "class correlation coefficient" (CCC) the mean of all the correlation coefficients of the combinations of trials corresponding only to a specific sub-population (YM, YF, AM or AF) for a gait task. In the latter formulation, the correlation coefficients between waveforms belonging to different classes are not taken into account. The mean of the four CCCs obtained, called "mean class correlation coefficient" (mCCC), represents the repeatability of the signal within each sub-population.
We repeated this procedure for all the gait tasks analyzed, obtaining a vector of OCC values and a vector of mCCC values. We chose to exclude the sitting and standing tasks from this analysis, because their small range of motion leads to inherently poor self-correlation.
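The OCC, CCC and mCCC definitions can be illustrated with the following sketch for one signal of one gait task. It assumes that every trial has been time-normalized to the same number of samples, which is not stated in the text, and that each class contains at least two trials; the function names and the toy waveforms are hypothetical.

```python
import numpy as np

def mean_pairwise_corr(waveforms):
    """Mean Pearson correlation over all distinct pairs of rows."""
    r = np.corrcoef(waveforms)                 # (n_trials, n_trials) matrix
    iu = np.triu_indices(len(waveforms), k=1)  # each pair once, no diagonal
    return float(r[iu].mean())

def occ_and_mccc(waveforms, labels):
    """Return OCC, the per-class CCCs and mCCC for one signal of one task.

    waveforms: (n_trials, n_samples) array, one time-normalized trial per row
    labels:    class of each trial (e.g. "YM", "YF", "AM", "AF")
    """
    waveforms = np.asarray(waveforms, dtype=float)
    labels = np.asarray(labels)
    occ = mean_pairwise_corr(waveforms)  # all pairs, across classes too
    cccs = {str(c): mean_pairwise_corr(waveforms[labels == c])
            for c in np.unique(labels)}  # pairs within one class only
    mccc = float(np.mean(list(cccs.values())))
    return occ, cccs, mccc

# Toy example with two classes whose waveforms are mirrored: repeatability is
# perfect within each class (CCC = 1) but poor overall (OCC = -1/3).
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 101))
waveforms = np.vstack([x, 2.0 * x + 1.0, -x, -3.0 * x])
labels = ["YM", "YM", "AF", "AF"]
occ, cccs, mccc = occ_and_mccc(waveforms, labels)
```

The toy data are deliberately extreme: they show the situation in which mCCC would clearly exceed OCC, that is, the case in which keeping the sub-populations separate would be justified.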

Virtual IMU Validation
We compared data measured by a physical IMU with the corresponding virtual one. We confirmed the validity of the method, as the correlation between signals was always above 0.985 for all the 6 axes, even when performing highly dynamic movements (Figure 2b).

Database
We found good correspondence between the data we collected and those found in literature in terms of overall trends [3][4][5][6][7]. However, we chose not to perform any further statistical comparison between databases. In fact, it would not be trivial to determine whether a possible difference in the signals is due to a real difference in gait biomechanics or to the different processing protocols used. Some representative kinematic signals are reported in Figure 3.

Correlation Analysis
Representative OCC and mCCC values are reported in Table 2. We performed a paired t-test between the OCC and mCCC vectors, which failed to reject the null hypothesis for all four types of signals analyzed (p = 0.887, p = 0.719, p = 0.699, p = 0.726). Therefore, we found no evidence that the two vectors differ, and they can be considered as belonging to the same population. For this reason, we chose not to distinguish between classes when training the classifier.
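A paired t-test of this kind can be sketched with the Python standard library alone. The code below is an illustration, not the statistical software used in the study: the p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is adequate for moderate values of t, and the OCC and mCCC values in the example are invented.

```python
import math
import statistics

def t_pdf(x, dof):
    """Probability density of the Student t distribution."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

def two_sided_p(t, dof, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    s = t_pdf(0.0, dof) + t_pdf(b, dof)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = 2 * s * h / 3  # probability that |T| < |t|
    return max(0.0, 1.0 - central)

def paired_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value).

    Requires at least two pairs whose differences are not all identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)

# Invented OCC and mCCC values for four gait tasks (illustration only).
occ_values = [0.90, 0.85, 0.80, 0.95]
mccc_values = [0.91, 0.84, 0.81, 0.94]
t_stat, dof, p_value = paired_t_test(occ_values, mccc_values)
```

A p-value well above the chosen significance level, as in the invented example, means that the test gives no evidence of a systematic difference between the two vectors.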

Discussion
In the present paper we reported several achievements obtained during the course of this work. First of all, we were able to build instrumented stairs and ramps (Figure 1) that allowed us to collect kinematic and kinetic data of subjects performing a wide variety of activities of daily living.
Secondly, we developed a method that allowed us to synthesize six-axis IMU signals from motion capture data. We assessed the validity of the method by comparing signals coming from a physical and a virtual IMU (Figure 2). As the correlation of corresponding signals was above 0.985 for all six axes of the IMU, we can consider this method valid.
Third, we performed a correlation analysis on the data that will be used in the gait classifier. It showed that, for the purpose of our study and in the above-mentioned conditions, we can justifiably train the classifier on a single population, composed of all the subjects belonging to the four sub-populations analyzed. Moreover, by following this strategy, we expect the classifier to be more robust when classifying more scattered signals.
Finally, in this work we were able to compile a kinematic and kinetic database covering multiple gait tasks, which presents several features that make it particularly valuable.
First of all, it contains all the data of every single trial. Conversely, typical biomechanical databases available in literature report only means and standard deviations of the metrics studied. This aspect is of great importance, particularly when applying a machine learning algorithm, which typically requires a large number of samples to be trained properly.
Secondly, it contains a large number of kinematic and kinetic sets of trial data, namely 2605. This is due to the large number of subjects enrolled and the wide variety of gait tasks analyzed.
Third, it contains data on transient movements between periodic gait tasks. We hypothesize that this feature has the potential to improve gait classification reliability, by making possible the categorization and classification of these types of movements. In fact, once the classifier recognizes a specific transient motion, and by knowing the previous periodic gait task, it could hypothetically be able to "predict" the subsequent periodic gait task. As shown in Figure 3, the kinematic pattern of transient movements starts from the kinematic pattern of the previous periodic task and ends following the pattern of the subsequent periodic task. To our knowledge, this is the first work that shows this behavior.
Fourth, as every single set of data contains the position and orientation of the reference frame attached to each lower limb, we are able to generate other virtual IMUs using the virtual IMU method. This potentially allows us to optimize the position, orientation and number of IMUs needed to maximize the classifier's reliability.
A limitation of this work is that the database contains data of healthy subjects, who can have different biomechanical patterns compared to amputees [12,13] depending on many factors, such as amputation level or ankle-foot prosthesis model used. This choice was made because of the difficulty of enrolling a large number of subjects with amputation. Nevertheless, a classifier trained on this dataset could represent a starting point that could be refined using subject-specific data.
Sample data of our dataset are made publicly available together with some more technical details about the processing protocol (see Supplementary Materials).

Conclusions
In this work we compiled a kinematic and kinetic database suitable for creating a gait task classifier. It presents several elements that make it valuable, such as the number of subjects enrolled, the number of gait tasks analyzed, the presence of periodic and transient gait tasks and the possibility of generating six-axis virtual IMU signals starting from kinematic data.
Supplementary Materials: Sample data of our dataset and more technical information about the processing protocol are available online at http://doi.org/10.5281/zenodo.3628229.