What Lies Beneath One's Feet? Terrain Classification Using Inertial Data of Human Walk

Abstract: The objective of this study was to investigate whether the inertial data collected from a normal human walk can be used to reveal the underlying terrain types. For this purpose, we recorded the gait patterns of normal human walk on six different terrain types with variation in hardness and friction using body-mounted inertial sensors. We collected accelerations and angular velocities of 40 healthy subjects with two smartphone-embedded inertial measurement units (MPU-6500) attached at two different body locations (chest and lower back). The recorded data were segmented with a stride-based segmentation approach and 194 tempo-spectral features were computed for each stride. We trained two machine learning classifiers, namely random forest and support vector machine, and cross-validated the results with a 10-fold cross-validation strategy. The classification tasks were performed on indoor-outdoor terrains, hard-soft terrains, and a combination of binary, ternary, quaternary, quinary and senary terrains. From the experimental results, classification accuracies of 97% and 92% were achieved for indoor-outdoor and hard-soft terrains, respectively. The classification results for binary, ternary, quaternary, quinary and senary class classification were 96%, 94%, 92%, 90%, and 89%, respectively. These results demonstrate that the stride data collected with the low-level signals of a single IMU can be used to train classifiers and predict terrain types with high accuracy. Moreover, the problem at hand can be solved invariant of sensor type and sensor location.


Introduction
Terrain classification is an active area of research having a wide range of applications, e.g., outdoor terrain navigation, the recommendation of floor types for health care environments, sports flooring, consumer suggestion systems and autonomous driving [1,2]. In the literature, several terrain classification approaches have been proposed based on different types of data (e.g., visual data, acoustic, physical touch, etc.) typically acquired using optical cameras, 3D laser scanners and on-board sensors mounted over humanoid robots [3][4][5], autonomous off-road driving vehicles [6][7][8] and aerial platforms [9].
The traditional camera-based approaches employ visual features to distinguish different terrain types. One of the earliest approaches was proposed by Weszka et al. [10], who performed terrain classification using automatic texture measures. Anantrasirichai et al. [11] also presented an algorithm that used visual features captured during the human walk. Similarly, Ma et al. employed aerial image data for terrain classification to support off-road navigation of a ground vehicle using low-rank sparse representation. Peterson et al. [12] also used imagery data from an aerial vehicle for ground robot navigation. With the aim of terrain classification, Dornik et al. [6] classified different soil types using geographic object-based analysis on images. Laible et al. [7] fused color information together with 3D scans obtained from LiDAR to perform terrain classification. Similarly, Ojeda et al. [13] employed fused data acquired from a suite of sensors consisting of a microphone, accelerometer, gyroscope, infrared sensor and motor current to train a feedforward neural network for terrain classification.
From a robotics perspective, Wu et al. [3] proposed a small legged robot that used an array of miniature capacitive tactile sensors to directly measure ground reaction forces (GRF) and used them to classify terrains. Zhang et al. [14] also sensed ground forces using a force/torque sensor for biomimetic hexapod robots walking on unstructured terrains. Similarly, Giguere et al. [4] described a tactile probe for surface classification with mobile robots, using a single-axis accelerometer. Belter et al. [5] also addressed the issue of terrain perception and classification using noisy range data acquired via a laser scanning and terrain mapping module in humanoid robots. Valada et al. [15] performed robotic acoustic-based terrain classification that exploited the sound waves originating from the terrain-vehicle interaction to build a spectrogram image that is fed to a convolutional neural network to learn deep features for subsequent terrain classification. Rothrock et al. [16] framed the problem of terrain classification as a semantic segmentation problem in which they employed DeepLab to visually classify terrain type and then registered it with slope angles and wheel slip data to generate a prediction model for the Mars Rover mission. Similarly, for planetary missions, Brooks et al. [17] also analyzed the vibration patterns obtained via terrain-vehicle interaction to distinguish different terrain types. Yaguang et al. [18] proposed a method for terrain classification based on a combination of speeded-up robust features for an autonomous multi-legged walking robot.
In the context of autonomous off-road driving, the terrain classification problem has been actively studied in the last decade. For instance, Manduchi et al. [1] presented a technique for terrain classification and obstacle detection for autonomous navigation using a single-axis lidar and a stereo pair of sensors. Dupont et al. [19] and Lu et al. [20] also proposed terrain classification based on the vibrations generated by autonomous ground vehicles, the latter using a laser-stripe-based structured light sensor. Further, Delmerico et al. [21] performed terrain classification using rapid training of a classifier for autonomous air and ground robots. In the problem of robotic terrain classification, the performance of the proposed methods depends on the appropriate navigation strategy, machine vibration and obstacles in the path.
Although these approaches work well, their accuracy is constrained by several challenges. For example, in the case of visual sensors, motion, occlusion and change of appearance in visual data due to varying illumination conditions cause degradation in performance. Similarly, the robotics- and autonomous off-road-driving-based approaches have trade-offs in terms of accuracy, cost, and restrictive ambient conditions [3]. Thus, owing to these issues, reliably classifying terrain types while maintaining accuracy with a low-cost solution is a highly challenging task. Moreover, there is a need to explore alternative approaches to traditional terrain classification. For instance, it is well known that human beings can capture information about terrain during walking by sensing it with their feet and by the sound of their footsteps [22]. The kinematic properties of the human motion pattern allow capturing the motion data for gait analysis, which in turn has been used as a reliable source for activity recognition [23] and estimating soft biometrics including gait-based age estimation [24,25], gender classification [24,26], emotion recognition [27] and human authentication/identification [28,29]. Moreover, modern devices such as smartphones and wearables, which are now ubiquitously available, are typically equipped with many sensors. For instance, on-board/embedded inertial measurement units (IMUs) consisting of tri-axial accelerometers, tri-axial gyroscopes and tri-axial magnetometers are able to provide inertial data (i.e., 3D accelerations and angular velocities with an acceptably low noise rate) at no additional cost. Furthermore, together with such sensors, these smart devices are usually equipped with powerful processors capable of performing high computational tasks and thus can potentially capture and analyze inertial data without compromising the normal use of the device.
This has potentially created many opportunities to solve real-life problems [30] including soft biometrics classification [31], reconstruction of human motion [32], and measuring physical health and basic activities.
The use of smartphone inertial data for terrain classification has not yet been extensively explored. To the best of our knowledge, there exist only a few methods that utilize inertial data of human gait for terrain classification [33,34]. Table 1 highlights the main approaches in terrain classification. In this context, this paper proposes a novel terrain classification framework that uses tempo-spectral features extracted from inertial data collected with a smartphone. The extracted features are used to train machine learning classifiers (support vector machine and random forest) and predict terrain types. More precisely, the gait patterns of normal human walk over six different terrain types with variations in hardness and friction were recorded with inertial sensors. The following are the main contributions of the proposed approach:

• We collected gait data of 40 healthy participants using body-mounted inertial sensors (embedded in smartphones) attached at two body locations, i.e., chest and lower back. The data were collected on six different types of terrain: carpet, concrete floor, grass, asphalt, soil, and tiles (as explained in Section 2.4). The data can be freely accessed by sending an email to the corresponding author.

• We propose a set of 194 tempo-spectral hand-crafted features per stride, which can be used to train different supervised learning classifiers (random forest and support vector machine) and predict terrains. The prediction accuracy remained above 90% for terrains under different classes such as indoor-outdoor, hard-soft, and a combination of binary, ternary, quaternary, quinary and senary terrain classes (details in Sections 2.4 and 3).

• From the experimental results, we found that the lower back location is more suitable for sensor placement than the chest for the task of terrain classification, as it produced the highest classification accuracies (details are in Section 4.1).

Table 1. Main approaches in terrain classification (excerpt):
Study            | Domain                      | Year | Sensor      | Classifier
Lu et al. [20]   | Autonomous off-road driving | 2009 | Laser       | PNN
Hu et al. [33]   | Human gait                  | 2018 | IMU         | LSTM
Diaz et al. [34] | Human gait                  | 2018 | Camera, IMU | BoW model

Selection of Terrains
Several types of terrain, both natural and man-made, exist around us. The goal of this study was to classify those terrains which humans encounter on a daily basis. In this context, we chose six different types of terrain having variations in hardness and friction. The hard terrains include concrete floor, asphalt, and tiles, whereas the soft terrains include carpet, grass, and soil (Figure 1).
An indoor environment was used to record data for concrete floor, carpet, and tiles, whereas an outdoor environment was used for the rest of the terrains (Table 2).

Characteristics of the Population
The population consisted of 40 healthy South Asian subjects who voluntarily participated in the study. The subjects were briefed about the nature of the experiments, the type of data to be collected, and how it would be used in the research. All willing participants were asked to sign a consent form. The characteristics of the population including age, height, weight, and male-to-female ratio are given in Table 3.

Placement of Sensors
Recent studies have shown that body locations such as the chest [35], arm [36], waist and lower back [36][37][38], and ankle [24,38] are appropriate for human activity recognition and soft biometrics estimation. We chose two different body locations, i.e., chest and lower back, for the placement of sensors. Two smartphones were attached to the chest and the lower back. The data collected from each sensor were processed independently to find the most suitable body location for the task of terrain classification, which was one of the main objectives of this study.

Data Collection
Most modern digital devices such as smartphones are equipped with a range of sensors including cameras and inertial measurement units (IMUs). For the purpose of data collection, we used two Android-based smartphones. The smartphones have on-board 6D IMUs (MPU-6500; InvenSense, San Jose, CA, USA) which can measure tri-axial accelerations and tri-axial angular velocities. The technical specifications of both MPU-6500 units are given in Table 4. The smartphones were tightly attached at the chest and lower back of the subject using elastic belts, as shown in Figure 2. A data recording application was developed to record 6D inertial data with the smartphones. The data were recorded at a sampling rate of 75 Hz.
The standardized gait task consisted of a straight walk for a distance of 10 m from a starting point, turning around, and returning to the starting point. Subjects were asked to walk in their natural gait while wearing shoes and repeat the standardized gait task twice on all six surfaces. This resulted in 40 m of walking per terrain (10 m out and 10 m back, performed twice), i.e., 40 × 6 = 240 m of walking by each subject over all terrains.

Segmentation of Signals into Strides
Human gait is characterized as "the succession of phases separated by foot-strikes (the foot is in contact with the ground) and takeoffs (the foot leaves the ground)" [39]. A stride is defined as a complete cycle from the heel strike of a foot to the next heel strike of the same foot [40]. The low-level signals of IMUs are known to be noisy, and suppression of noise is necessary in order to minimize its effect and correctly segment the raw signals into strides. To reduce the noise, we employed a moving average filter [41,42], which is a well-known method of noise suppression. A window size of nine samples was used to smooth the raw signal. The smoothed signal can be decomposed into single steps using local maxima (peaks) or local minima (valleys) [43][44][45]. We detected the local minima in the y-axis of the acceleration signal only, as it is the gravitational axis. The same valleys were used to segment all of the 3D accelerations and 3D angular velocities. This technique ensured that the length of the segmented step remained the same for all 6D components. For strides, we segmented the signal only when the same foot consecutively struck the ground twice, i.e., two consecutive steps (Figure 3).
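The smoothing-and-valley-detection pipeline described above can be sketched as follows. This is a minimal illustration using NumPy and SciPy, not the authors' code: the nine-sample window comes from the text, while the valley-prominence threshold and function names are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def smooth(signal, window=9):
    """Moving-average filter with a nine-sample window, as in the text."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def segment_strides(acc_y, prominence=0.5):
    """Segment a 1-D acceleration signal (gravitational y-axis) into strides.

    Valleys (local minima) are detected on the smoothed signal; a stride
    spans two consecutive steps, i.e., every second valley-to-valley
    segment. The prominence threshold is an illustrative assumption.
    """
    smoothed = smooth(np.asarray(acc_y, dtype=float))
    # find_peaks on the negated signal yields the local minima (valleys)
    valleys, _ = find_peaks(-smoothed, prominence=prominence)
    # a stride = two consecutive steps, so pair every other valley
    return [(valleys[i], valleys[i + 2]) for i in range(0, len(valleys) - 2, 2)]

# The same valley indices would then be used to cut all six IMU channels,
# so that every stride segment has equal length across the 6D components.
```

In practice the returned index pairs would be applied to the 3D acceleration and 3D angular velocity channels alike, preserving alignment between the six components.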

Features Extraction
Choosing a set of good features is critical to the success of a classifier. In this regard, we computed 33 unique features per stride. When computed from the 6D low-level signals, this resulted in 194 features per stride (32 × 6 = 192 plus 1 × 2 = 2; the signal magnitude area was computed for the x-axis of acceleration and angular velocities only, while the rest of the features were computed for all 6D components). The 194 tempo-spectral features from each stride are shown in Table 5. The temporal features include mean, standard deviation, median, maximum, minimum, root mean square, signal magnitude area, index of maximum, index of minimum, power, entropy, energy, skewness, kurtosis, interquartile range, mean absolute deviation, jerk, zero crossing rate and max-min difference.
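A subset of the temporal features listed above can be computed per stride roughly as follows. This is a sketch with NumPy and SciPy; the function and key names are illustrative, not the authors' code, and the jerk proxy shown is an assumption.

```python
import numpy as np
from scipy.stats import skew, kurtosis, iqr

def temporal_features(stride):
    """Compute a subset of the time-domain features for one 1-D stride signal."""
    stride = np.asarray(stride, dtype=float)
    diff = np.diff(stride)
    return {
        "mean": stride.mean(),
        "std": stride.std(),
        "median": np.median(stride),
        "max": stride.max(),
        "min": stride.min(),
        "rms": np.sqrt(np.mean(stride ** 2)),
        "index_max": int(np.argmax(stride)),
        "index_min": int(np.argmin(stride)),
        "energy": np.sum(stride ** 2),
        "skewness": skew(stride),
        "kurtosis": kurtosis(stride),
        "iqr": iqr(stride),
        "mad": np.mean(np.abs(stride - stride.mean())),   # mean absolute deviation
        "jerk": np.mean(np.abs(diff)),                    # mean rate of change (assumed proxy)
        "max_min_diff": stride.max() - stride.min(),
    }
```

Repeating such a function over all six IMU channels per stride yields the time-domain part of the feature vector.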
We computed the Discrete Fourier Transform (DFT) of the raw signal. For an input vector x of length N, the DFT is a vector X of length N given by the formula:

X(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N},   k = 0, 1, ..., N − 1    (1)

Using Equation (1), we computed the spectral features, which include mean, maximum value, magnitude, band power of the signal, energy and nine coefficients of the FFT. Table 5. List of hand-crafted features extracted from 6D accelerations and angular velocities for each stride. The feature set consists of a total of 194 tempo-spectral features, of which 110 features were computed from the time domain (T) and 84 features were computed from the frequency domain (F).
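The spectral features derived from Equation (1) can be sketched with NumPy's FFT. The nine leading coefficients follow the text; the feature names and the band-power normalization are assumptions for illustration.

```python
import numpy as np

def spectral_features(stride, n_coeffs=9):
    """Frequency-domain features of one stride via the DFT (Equation (1))."""
    x = np.asarray(stride, dtype=float)
    X = np.fft.fft(x)                 # the DFT of Equation (1)
    mag = np.abs(X)                   # magnitude spectrum
    features = {
        "spec_mean": mag.mean(),
        "spec_max": mag.max(),
        # By Parseval's theorem, sum(|X|^2)/N equals the time-domain energy
        "spec_energy": np.sum(mag ** 2) / len(x),
        "band_power": np.mean(mag ** 2) / len(x),   # assumed normalization
    }
    for k in range(n_coeffs):         # nine leading FFT coefficients
        features[f"fft_coeff_{k}"] = mag[k] if k < len(mag) else 0.0
    return features
```

Concatenating these with the temporal features per channel gives the full 194-dimensional per-stride feature vector described in Table 5.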

Table 5 columns: Feature, Domain, Sensor, Features, Equation. Total: 194 features.

Classification
The primary goal of the study was the prediction of ground surface or terrain using hand-crafted features. For this purpose, we used the Scikit-learn library (version 0.19.2), which is an open-source machine learning library for Python [46]. We used two supervised learning algorithms, i.e., Random Forest (RF) and Support Vector Machine (SVM), and trained them using the feature set discussed in the previous section. We used grid search to find the best parameters for classification of terrains. For RF, the model was trained using the following parameters: number_of_trees = 500, criterion = "gini", max_features = "auto", and max_depth = "none". For SVM, the following parameters were used for training the model: kernel = "RBF", C = 10, and gamma = 0.01.
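The grid search over hyper-parameters can be reproduced roughly as follows with scikit-learn. The grids shown are assumptions centered on the best values reported in the text; the exact grids the authors searched are not given.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def tune(model, grid, X, y, cv=10):
    """Exhaustive grid search with k-fold CV; returns best estimator and params."""
    search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_estimator_, search.best_params_

# Illustrative grids around the reported best parameters (assumed, not the
# authors' exact search space).
rf_grid = {"n_estimators": [100, 500], "criterion": ["gini", "entropy"],
           "max_depth": [None, 10, 20]}
svm_grid = {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]}

# Usage (X = per-stride feature matrix, y = terrain labels):
#   rf_best, rf_params = tune(RandomForestClassifier(), rf_grid, X, y)
#   svm_best, svm_params = tune(SVC(), svm_grid, X, y)
```

Note that in scikit-learn, `max_depth=None` lets each tree grow until its leaves are pure, which matches the reported `max_depth = "none"` setting.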
Since the sensors were placed at two different locations on the subjects' bodies, each model was trained and validated with a feature set computed from the data collected through the respective sensor. Furthermore, five different sets of features for each stride against each dataset were computed to test the prediction accuracy of each model under variable feature sets. The k-fold cross-validation strategy [47], which is a well-known model for cross-validation, was used to measure the performance of each predictor. The chosen value of k was 10.
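The 10-fold cross-validation of both classifiers can be sketched as follows with scikit-learn. `X` and `y` are placeholders for a per-stride feature matrix and its terrain labels; the fixed hyper-parameters are the best values reported in the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate(X, y, k=10):
    """Mean accuracy (and std) of RF and SVM under k-fold cross-validation."""
    models = {
        # Best parameters reported in the text
        "RF": RandomForestClassifier(n_estimators=500, criterion="gini",
                                     max_depth=None),
        "SVM": SVC(kernel="rbf", C=10, gamma=0.01),
    }
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results
```

Running this once per sensor location (chest, lower back) and per feature subset reproduces the structure of the result tables that follow.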

Binary Classification
In the binary classification, the terrains were grouped into pairs or binary classes: indoor (carpet, concrete, and tiles) and outdoor (grass, asphalt, and soil); hard (concrete, tiles, and asphalt) and soft (carpet, grass, and soil); and pair-wise (one-to-one). The classification results of all three cases are presented in the following subsections.

Indoor-Outdoor Classification
In the case of indoor-outdoor classification, the best classification accuracy of 97.48% (±0.29) was achieved with SVM for the lower back sensor, followed by the chest sensor with 97.11% (±0.32), when trained with all features (194 features per stride). A comparable classification rate on all features was observed with RF, where the best accuracy was achieved with the lower back sensor at 96.52% (±0.34), followed by the chest sensor at 95.72% (±0.41). The precision, recall, and f1-score remained above 95% in all cases. Figure 4 shows the confusion matrices of indoor-outdoor classification computed with SVM and RF using the different sets of features computed from the lower back sensor data. Detailed classification results for different sensor placements against different feature sets are presented in Table 6. In general, for both sensor placements, models trained with all features produced the highest classification accuracies.

Hard-Soft Classification
The hard-soft terrain classification is a binary classification case where the surfaces were categorized into hard surfaces (concrete floor, asphalt, and tiles) and soft surfaces (carpet, grass, and soil). The best classification accuracy of 92.08% (±0.25) was observed with RF for the lower back sensor when trained on all features. On the same feature set and the same lower back sensor, the SVM produced an average classification accuracy of 90.64% (±0.33). For the chest, the RF achieved a classification accuracy of 89.08% (±0.28), whereas the SVM achieved a classification accuracy of 89.57% (±0.31). Figure 5 shows the confusion matrices computed by RF and SVM on all features using the lower back sensor. Detailed classification results for different sensor placements against different feature sets are presented in Table 6. A trend similar to indoor-outdoor classification was observed again, where the models trained with all features produced the highest classification accuracies for all sensor placements.

Pair-Wise Classification
The goal of pair-wise classification was to compute and compare classification accuracies of different terrains against each other as pairs.
The best classification accuracies were achieved with all features computed from the lower back sensor data and trained with SVM, as shown in Figure 6. The best pair-wise classification accuracies of above 98% were observed between carpet and asphalt, carpet and soil, asphalt and tiles, and soil and tiles. The lowest pair-wise classification accuracy of 91% was seen between grass and soil. The precision, recall, and f1-score remained above 93% in all cases. Detailed pair-wise classification results are presented as bar graphs in Figure 7 and in Table 6.

Senary Classification
The senary classification is the most significant case as all surfaces were compared with each other. The results are presented in Table 7. The trend remained similar to all of the previous cases, where the best average classification accuracies were achieved with the feature set computed from the lower-back-mounted sensor. The RF produced an average classification accuracy of 88.7% (±0.4), whereas SVM produced an average classification accuracy of 87.5% (±0.43). For the chest sensor, the average classification accuracies remained at 86% for both SVM and RF. The precision, recall, and f1-score remained above 84% in all of the cases. Figure 8 shows the confusion matrices computed with SVM and RF for the lower back sensor for different feature sets. The best classification accuracies of 89% and 87.5% were achieved with temporal features for RF and with all features for SVM, respectively. The feature set computed with the 3D angular velocities produced the lowest classification accuracies for both RF and SVM. From the confusion matrices computed on all features, the highest confusion was mostly observed between soft and hard surfaces, e.g., carpet and concrete (RF: 8.3%, SVM: 6.9%), carpet and tiles (RF: 7.8%, SVM: 8.4%), and asphalt and soil (RF: 3.9%, SVM: 4.0%). Interestingly, high confusion was also observed between grass and soil (RF: 7.9%, SVM: 9.2%), which was mainly due to the natural similarities between the two.
The radar graph in Figure 8b shows the classification accuracies of senary classification computed with all features using RF and SVM. It is observable that the highest classification accuracy was achieved for carpet, whereas the lowest classification accuracy was achieved for soil (because of confusion between soil and grass).

Summary of Findings
The goal of this work was to classify the type of terrain from the inertial data of strides collected with a single body-mounted IMU. 6D accelerations and angular velocities were recorded with the help of two Android-based smartphones (on-board MPU-6500 IMU). The sensors were mounted at two different body locations, i.e., chest and lower back. A total of 40 volunteers participated in the data collection sessions and their gait data were recorded on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles. A valley detection method was used to segment the low-level 6D gait signals into strides. A total of 194 tempo-spectral features were computed for each stride, which included 110 time-domain features and 84 spectral-domain features. The chosen predictors were Random Forest and Support Vector Machine, and 10-fold cross-validation was used as the validation model. Figure 9 shows the comparison of classification accuracies achieved with SVM and RF for each sensor location. It is observable that the highest classification accuracy was achieved with the lower back sensor data, followed by the chest data. Furthermore, it can be seen that SVM performed better with fewer classes, i.e., indoor-outdoor, hard-soft, binary and ternary classification, whereas RF performed better with more classes, i.e., quaternary, quinary and senary. This holds for the feature sets computed from the data collected with the sensors attached at the lower back and chest positions. The gradual drop in the classification accuracies as the number of classes increases was because of the natural similarities between different types of surfaces, which caused a significant number of samples to be misclassified.

Comparison with Existing Approaches
Terrain classification has been extensively studied in the domain of humanoid robots and autonomous mobile vehicles; however, we found only a few studies that focus on terrain classification using inertial data of human gait. In the experimental setup of Hu et al. [33], an IMU was mounted on the L5 vertebra of the subjects and the gait data of 35 subjects were recorded on two different types of surfaces: flat surfaces and uneven bricks. A deep learning model with long short-term memory units was used for training and prediction. They achieved a surface classification accuracy of 96.3%. Anantrasirichai et al. [11] used visual features captured from body-mounted cameras during human locomotion. They considered three different classes of terrains, i.e., hard surfaces, soft surfaces, and unwalkable surfaces. They reported a classification accuracy of 82%. Diaz et al. [34] proposed a terrain identification and surface inclination estimation system for a prosthetic leg using visual and inertial sensors. They recorded data on six different surfaces and achieved an average classification accuracy of 86%. Libby et al. [48] performed acoustic-based terrain classification for robots that used the sound from the interaction of vehicle and terrain. They reported a classification rate of 92%. They performed a sliding-window operation for feature extraction, which is less time-efficient than the proposed stride-based method. In comparison to these approaches, the proposed approach was tested on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles, and its classification accuracies outperformed the others in all cases, as shown in Table 8.

Limitations
In our experiments, we only collected data on six different terrains, namely carpet, concrete floor, grass, asphalt, soil, and tiles. However, there exist many practical terrains such as pebbles, sand, gravel, exercise mats, footpaths, etc., which should be considered in data collection. This would help in analyzing the behavior of the proposed approach over a broader spectrum. Similarly, our database consists of only 40 subjects with a male-to-female ratio of 30:10, which is unbalanced as only 25% of the population is female. Extending the database by including more female participants to balance the population and testing the proposed approach with the extended dataset is another important direction for future work. Another limitation of the proposed approach is the placement of the sensors, i.e., chest and lower back. There exist many other practical sensor placement locations such as wrists, side and back pockets, chest pockets, etc., that should also be considered. This would help in performing terrain classification in a ubiquitous manner. Another important aspect is that, although the data from both sensors were collected simultaneously, there was no stringent requirement for synchronization, as both time series were independent from each other. Human intervention was only needed in the pre-processing step for training data preparation, to retain the time series data recorded while the subject was walking. Since we were interested in terrain classification using single strides, these strides were extracted automatically using local minima from the time series data. For practical applications, automatic segmentation of strides and decision making (for terrain classification) using sequential analysis is indeed a direction of future work.

Conclusions
The novelty of this work is finding a set of hand-crafted features from the inertial data of human gait that can be used to train classifiers and predict different types of terrains. We showed that a single stride has enough information encoded to predict terrains. We also showed that the set of hand-crafted features can be computed from either of the two body locations, i.e., lower back or chest. The highest classification accuracy of 97% was achieved for indoor-outdoor classification, whereas the lowest classification accuracy of 89% was achieved for senary classification. The results also show that, for lower back and chest sensors, SVM performed better than RF in the binary classification case only. We also presented a comparison of different sets of features, computed from the tempo-spectral domain, where the set of all features performed better than any other feature set. Moreover, our results show that smartphones can be used for data collection and terrain classification. The possible applications of the proposed approach include monitoring the condition of sidewalks to identify areas which need renovation, terrain awareness systems to guide visually impaired persons [50], and production of digital terrain models using body-mounted IMUs [51].

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The results of ternary, quaternary and quinary classifications are discussed in the following subsections.

Appendix A.1. Ternary Classification
For ternary classification, any three of the six surfaces were compared with each other and classification accuracies were computed. This resulted in a total of 20 ternary combinations, as shown in the parallel coordinate chart (Figure A1). In most of the cases, the best average classification accuracies of around 96% were achieved with the feature set computed from the lower-back-mounted sensor, and both SVM and RF produced comparable results. The lowest average classification accuracies of up to 91% were observed with the feature set computed from the chest sensor. Figure A2 shows the confusion matrices of all of the 20 ternary terrain combinations using SVM on all features computed from the lower back sensor data. The best ternary classification accuracy of above 96% was achieved among carpet, asphalt, and soil, whereas the lowest accuracy of 89% was observed among carpet, concrete, and tiles as well as grass, asphalt, and soil. In the case of RF, the average classification accuracy for the lower back sensor remained above 93% when trained with all features. For the chest sensor, both SVM and RF produced an average classification accuracy of 92%. The precision, recall, and F1-score remained above 89% in all cases. Detailed results are presented in Table A1.

Appendix A.2. Quaternary Classification
The objective of quaternary classification was to compute and compare the classification accuracies of any four of the six surfaces against each other. In this regard, a total of 15 quaternary combinations were created and the results are presented in the parallel coordinate chart (Figure A3). In general, a trend similar to ternary terrain classification was observed, where the best average classification accuracies were achieved with the feature set computed from the lower-back-mounted sensor for both SVM and RF classifiers. The best classification accuracy, however, dropped from 96% to 94%. The lowest average classification accuracies were observed with the feature set computed from the chest sensor. Figure A4 presents the confusion matrices for all 15 quaternary terrain combinations using SVM on all features computed from the lower back sensor data. The best quaternary classification accuracy of around 94% was achieved among concrete, grass, soil, and tiles, whereas the lowest accuracy of around 90.15% was observed among carpet, grass, soil, and tiles. The average classification accuracy for the lower back sensor remained above 91% for both SVM and RF when trained with all features, as shown in Table A1. For the chest sensor, both SVM and RF produced an average classification accuracy of 90%. The precision, recall, and F1-score remained above 86% in all cases.

Figure A4. Confusion matrices of quaternary classification with the lower back sensor. All results were obtained with the all-features set (194 features) using the SVM classifier. Classes: C_Ca = carpet, C_Co = concrete, C_Gr = grass, C_As = asphalt, C_So = soil, and C_Ti = tiles.