Vibration and Image Texture Data Fusion-Based Terrain Classification Using WKNN for Tracked Robots

Abstract: For terrain recognition during vehicle driving, this paper carries out terrain classification research based on vibration and image information. Twenty time-domain features and eight frequency-domain features of vibration signals that are highly correlated with terrain are selected, and principal component analysis (PCA) is used to reduce the dimensionality of the time-domain and frequency-domain features while retaining the main information. Meanwhile, the texture features of the terrain images are extracted using the gray-level co-occurrence matrix (GLCM) technique.


Introduction
Tracked chassis vehicles have the advantages of high traction, low ground pressure, and good passing performance, and are frequently used in engineering machinery, agricultural machinery, military vehicles, etc. [1]. Tracked robots are typical tracked vehicles that can replace humans in dangerous jobs (such as demolition and chemical detection) through equipped cameras, sensors, manipulators, and other devices, effectively reducing manual labor intensity and avoiding potential dangers. With the rapid development of autonomous driving technology, higher requirements have been put forward for intelligent path planning and precise path tracking control [2], and the online identification of ground types is a key factor in further improving their performance. At present, scholars have conducted extensive research on terrain classification. According to the type of sensor used, terrain classification methods can be divided into two categories: those based on external sensors (vision, radar, etc.) and those based on internal sensors (acceleration sensors, current sensors, etc.) [3]. External sensors detect information about the environment, while internal sensors detect information about the state of the robot itself.
The vibration signals between the wheels or tracks and the ground during robot motion can directly reflect different terrain characteristics. In a study by Ward et al. [4], an algorithm is presented to classify terrain using a single suspension-mounted accelerometer. Shi et al. improved the Laplacian SVM based on the homogeneities in the feature space and the temporal dimension, which increased classification accuracy [5]. Xue et al. performed time-domain feature extraction of vibration signals and used the KNN method for terrain classification [6]. Komma et al. proposed an adaptive terrain classification strategy based on vehicle vibration information using a probabilistic framework of Bayesian filters [7]. Brooks et al. proposed a self-supervised learning framework that enables a robotic system to learn to predict the mechanical properties of distant terrain based on measurements of the mechanical properties of similar terrain that has previously been traversed [8]. DuPont et al. studied a vehicle terrain classification algorithm using a probabilistic neural network, where the frequency responses of terrain-induced vehicle vibrations comprise the terrain signature [9]. Bai et al. proposed a terrain classification approach based on 3D vibrations induced in a rover structure by the wheel-terrain interactions [10]. Du et al. used PCA to reduce the dimensionality of the time-frequency-domain features of the vibration data and performed terrain classification based on the reduced feature vectors [11]. These methods all perform terrain classification based on vibration signals, but relying on vibration signals alone may be limited by the difficulty of data acquisition, leading to a decrease in classification accuracy.
The camera and radar can directly acquire information regarding the external environment of a robot to determine the current driving terrain. Woods et al. proposed a method for segmenting and classifying terrain types based on range data to establish a geometric model of the terrain that maps terrain types to friction coefficients [12]. Wu et al. used a novel hybrid coding architecture and deep filter banks, combining a stacked denoising sparse autoencoder and Fisher vectors, for visual terrain classification [13]. Hu et al. used a deep convolutional neural network and a conditional random field to classify terrain based on radar information [14]. Filitchkin et al. used a bag of visual words created from speeded-up robust features with a support vector machine classifier to classify terrain [15]. Kurup et al. used a fusion of visual data from a camera and vibrational data from an inertial measurement unit [16]. In [17], a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Hanson et al. used a machine-learning-based approach with neural networks to fuse image, spectral, and inertial measurement unit data for terrain classification [18].
The aforementioned research demonstrates that vibration-signal-based terrain classification methods have the advantages of low cost, easy implementation, and high efficiency, but they are susceptible to external disturbances, such as the vibration and load changes of the tracked robots themselves. Although terrain images can directly reflect the current terrain, they are also affected by factors such as lighting, shading, and weather changes, which can lead to incorrect terrain classification. Unlike the terrain classification of traditional wheeled robots, tracked robots work in harsher environments and vibrate more intensely while driving, which means that the vibration signals contain more disturbances; all of this adds to the complexity of terrain classification for tracked robots. As a result, terrain classification research on tracked robots remains a challenging task. Under these conditions, existing methods based on a single source of information cannot achieve the expected results, and an effective terrain classification strategy is needed to address these challenges. Considering the complexity of the environment during terrain classification, image texture features, which reflect rich image information with strong anti-interference ability, are selected as a supplement. Therefore, this paper proposes a terrain classification method for tracked robots based on the fusion of PCA-processed vibration features and image texture features using WKNN. The main contributions and innovations of this work are summarized as follows: (1) The characteristics of different driving terrains are described by extracting time-domain and frequency-domain features from the vibration signals of tracked robots. These extracted features are then processed using the PCA algorithm to reduce dimensionality, transforming the original vibration signals into more representative feature vectors.
Additionally, the PCA-processed vibration signal features are fused with the terrain image texture features to obtain more detailed and precise information describing the current terrain of the tracked robot. (2) To enhance the significance of closer samples in determining terrain classification, the traditional KNN algorithm is modified by introducing a weighting operation, resulting in the improved WKNN algorithm. Combined with the previously extracted feature matrix that fuses vibration and image information, the tracked robot achieves terrain classification with high accuracy and robustness.

Time-Domain Feature Extraction
The time-domain signal is a one-dimensional signal with time characteristics, which is more intuitive in signal processing. The primary behavior of the signal is determined by its amplitude, period, and other characteristics. In this paper, various time-domain features, such as the mean amplitude, square root amplitude, maximum value, minimum value, peak, peak-to-peak value, square mean root, root mean square, crest factor, clearance (margin) factor, kurtosis, variance, standard deviation, skewness, waveform factor, pulse factor, and skewness factor, are selected to describe the time-domain characteristics of the signal [19,20], as shown in Table 1.

Table 1. Time-domain feature extraction (fragment): mean amplitude; root mean square; skewness factor; crest factor T9 = T5/T8; peak state factor; clearance (margin) factor T10 = T6/T8; yield factor T20 = T5/T2.
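As an illustrative sketch, a subset of these time-domain features might be computed as follows. The feature names and definitions below follow common textbook conventions and are assumptions on our part, not formulas taken verbatim from Table 1:

```python
import numpy as np

def time_domain_features(x):
    """Compute a representative subset of time-domain features
    (illustrative; the numbering T1..T20 of the paper is not reproduced)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    feats = {
        "mean_amplitude": abs_x.mean(),
        "square_root_amplitude": np.sqrt(abs_x).mean() ** 2,
        "max": x.max(),
        "min": x.min(),
        "peak": abs_x.max(),
        "peak_to_peak": x.max() - x.min(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "variance": x.var(),
        "std": x.std(),
        "skewness": np.mean((x - x.mean()) ** 3) / x.std() ** 3,
        "kurtosis": np.mean((x - x.mean()) ** 4) / x.std() ** 4,
    }
    # Dimensionless shape factors derived from the quantities above
    feats["crest_factor"] = feats["peak"] / feats["rms"]
    feats["waveform_factor"] = feats["rms"] / feats["mean_amplitude"]
    feats["pulse_factor"] = feats["peak"] / feats["mean_amplitude"]
    feats["margin_factor"] = feats["peak"] / feats["square_root_amplitude"]
    return feats
```

For a pure sine wave, for example, the RMS is the amplitude divided by sqrt(2) and the crest factor is sqrt(2), which is a quick sanity check for such a feature extractor.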

Frequency-Domain Extraction
The Fourier transform of the signal can reveal frequency components that are difficult to obtain in the time domain, and new feature vectors can be obtained through frequency-domain analysis. In this paper, we choose typical features such as the mean, frequency center, variance of the mean frequency, median value, peak, root mean square, mean square frequency, and root mean square frequency to describe the frequency-domain characteristics, as shown in Table 2.

Table 2. Frequency-domain feature extraction.

Table 2 (fragment): mean; frequency center; root mean square; variance of mean frequency; median value F4 = median(s(u)); root mean square frequency.

To ensure consistent weights, value ranges, and units across the features, the extracted time-domain and frequency-domain features are first normalized. Each extracted feature is scaled using min-max normalization so that all feature values lie in the range between 0 and 1, as shown in Equation (1).
X = (x_i − x_imin) / (x_imax − x_imin)   (1)

where x_i is the i-th feature value, x_imin is the minimum value of the i-th feature, x_imax is the maximum value of the i-th feature, and X is the normalized feature value. The PCA method is a widely used technique for reducing the dimensionality of a dataset, thereby enabling improved analysis and visualization. The fundamental idea is to transform the features (dimensions) of the original dataset into a new coordinate system composed of the principal components of the original data. Principal components are the directions of largest variance: a set of mutually orthogonal unit vectors that describe the largest variance in the dataset. The main steps of PCA dimensionality reduction are as follows [11,21]: (1) First, the data are standardized using the Z-score so that the units and ranges of the different features are the same, as shown in Equation (2).

Z = (X − µ) / σ   (2)
where X is the original data, µ is the mean value of the data, and σ is the standard deviation of the data.
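A combined sketch of the frequency-domain feature extraction of Table 2 and the min-max normalization of Equation (1). The spectral definitions below (spectral centroid, RMS frequency, etc.) are common textbook choices assumed by us, not the paper's exact formulas:

```python
import numpy as np

def frequency_domain_features(x, fs):
    """FFT-based features in the spirit of Table 2 (definitions assumed)."""
    spectrum = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))  # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spectrum / spectrum.sum()                  # normalized spectral weights
    fc = np.sum(freqs * p)                         # frequency center (spectral centroid)
    return {
        "mean_spectrum": spectrum.mean(),
        "frequency_center": fc,
        "variance_of_mean_frequency": np.sum((freqs - fc) ** 2 * p),
        "median_value": np.median(spectrum),
        "peak_frequency": freqs[np.argmax(spectrum)],
        "root_mean_square_frequency": np.sqrt(np.sum(freqs ** 2 * p)),
    }

def min_max_normalize(X):
    """Equation (1): scale each feature (column) into [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```

Feeding a pure 50 Hz tone through such an extractor should place both the peak frequency and the frequency center at 50 Hz, which is a useful sanity check.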
(2) Based on the standardized data matrix Z, the covariance matrix C is constructed as follows:

C = (1 / (n − 1)) Zᵀ Z   (3)

where n is the number of samples.
(3) Based on this, the eigenvalues of the covariance matrix C and the corresponding eigenvectors can be obtained. (4) The selection criteria for the principal components are as follows: the eigenvalues are ranked from largest to smallest, and the number m of principal components satisfying a 99% cumulative contribution rate is determined according to the magnitudes of the eigenvalues. The eigenvectors corresponding to the top m eigenvalues are chosen as the principal components.
(5) Based on the selected principal components, the projection matrix P is constructed as follows:

P = [q_1, q_2, . . ., q_m]   (4)

where q_i is the eigenvector corresponding to the i-th principal component.
The original data are multiplied by the projection matrix to obtain the reduced-dimensional data in the new coordinate system, as shown in Equation (5).

Y = Z P   (5)
where Y is the reduced-dimensional data, Z is the standardized data, and P is the projection matrix. The above steps yield the dimensionality-reduced dataset Y, in which each column corresponds to a principal component. These components represent the mapping of the original data in the new coordinate system. In this paper, the time-domain and frequency-domain features are combined and reduced in dimensionality using the PCA algorithm to reduce noise and eliminate redundant information in the feature vectors, yielding the primary feature vectors of the acceleration information for the subsequent terrain classification task.
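Steps (1)-(5) can be sketched as follows; this is a minimal NumPy implementation of the generic PCA procedure described above, not the authors' code:

```python
import numpy as np

def pca_reduce(X, contribution=0.99):
    """PCA dimensionality reduction following steps (1)-(5):
    standardize, covariance, eigendecomposition, select components
    reaching the cumulative contribution rate, then project."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # step (1): Z-score, Equation (2)
    C = np.cov(Z, rowvar=False)                 # step (2): covariance matrix, Equation (3)
    eigvals, eigvecs = np.linalg.eigh(C)        # step (3): eigendecomposition
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()  # step (4): cumulative contribution rate
    m = int(np.searchsorted(ratio, contribution) + 1)
    P = eigvecs[:, :m]                          # step (5): projection matrix, Equation (4)
    return Z @ P                                # Equation (5): Y = Z P
```

By construction, the retained columns of Y carry at least 99% of the total variance of the standardized data, while near-redundant features collapse into components with negligible eigenvalues.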

Image Texture Feature Extraction
Since terrain images contain rich terrain information, and their key features are reflected in aspects such as the grayscale and texture information of the images, this paper uses GLCM analysis to extract terrain texture features from the images [22]. The GLCM method is commonly used for image texture feature extraction. It describes the texture information of an image by analyzing the gray-level relationships between different pixels in the image. Commonly used GLCM features include contrast, correlation, energy, entropy, etc. [23]. These features can describe the interrelationship of pixels in different directions in an image and reflect its texture characteristics. Based on theoretical analysis and experimental results, five texture feature parameters, namely energy S_E, contrast S_con, correlation S_cor, entropy S_En, and difference moment D, are selected as feature vectors in this paper, which are calculated as follows [24,25]:

S_E = Σ_i Σ_j P(i, j)²
S_con = Σ_i Σ_j (i − j)² P(i, j)
S_cor = Σ_i Σ_j (i − µ_i)(j − µ_j) P(i, j) / (σ_i σ_j)
S_En = −Σ_i Σ_j P(i, j) log P(i, j)
D = Σ_i Σ_j P(i, j) / (1 + (i − j)²)

where P(i, j) denotes the element in the i-th row and j-th column of the GLCM; L is the grayscale quantization level; µ_i and µ_j are the mean values of the rows and columns of P, respectively; and σ_i and σ_j are the standard deviations of the rows and columns of P, respectively.
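A minimal sketch of GLCM construction and the five texture features for a single pixel offset. The image is assumed to be pre-quantized to `levels` gray levels, and the entropy here uses base-2 logarithms (the base only rescales the value):

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    normalized to a probability distribution. img must hold integer
    gray levels in [0, levels)."""
    img = np.asarray(img)
    M = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            M[img[y, x], img[y + dy, x + dx]] += 1
    return M / M.sum()

def glcm_features(P):
    """Energy, contrast, correlation, entropy, and inverse difference
    moment from a normalized GLCM (standard textbook definitions)."""
    L = P.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    s_i = np.sqrt(((i - mu_i) ** 2 * P).sum())
    s_j = np.sqrt(((j - mu_j) ** 2 * P).sum())
    nz = P[P > 0]                      # skip zero entries in the entropy sum
    return {
        "energy": (P ** 2).sum(),
        "contrast": ((i - j) ** 2 * P).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * P).sum() / (s_i * s_j),
        "entropy": -(nz * np.log2(nz)).sum(),
        "inverse_difference_moment": (P / (1.0 + (i - j) ** 2)).sum(),
    }
```

A binary checkerboard is a convenient test pattern: a horizontal offset always pairs opposite gray levels, giving maximal contrast and a correlation of −1.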

WKNN Classification Algorithm
The KNN algorithm is a machine learning algorithm originally proposed by Cover and Hart [26], and its core idea is to find the k training samples in the sample space that are nearest to the sample to be classified [27]. These k samples vote according to their categories, and the sample to be classified is finally assigned to the category with the most votes. The KNN algorithm functions as follows: (1) Calculating the distance between the sample to be classified and the training samples.
For each sample to be classified, the distance between it and each sample in the training set must be calculated. Commonly used distance measures include the Euclidean distance, the Manhattan distance, and the Chebyshev distance. In this paper, we use the Euclidean distance. Assuming the sample to be classified is x = (x_1, x_2, . . ., x_n) and a training sample is y = (y_1, y_2, . . ., y_n), the distance d(x, y) between them can be calculated using the following equation:

d(x, y) = sqrt( (x_1 − y_1)² + (x_2 − y_2)² + . . . + (x_n − y_n)² )

(2) Selecting the closest k training samples. Based on the calculated distances between the sample to be classified and the training samples, the k training samples with the smallest distances are selected.
(3) Counting the categories of the k training samples and voting.
For these k training samples, the category to which each belongs is counted, and a vote is taken. A majority vote is used, i.e., the category with the most votes is assigned to the sample to be classified. If there is a tie, one of the tied categories can be selected at random as the final result.
(4) Classifying the samples to be classified into a category with the most votes.
In the KNN algorithm, the selection of the k value is a key issue. If k is too small, the algorithm is easily disturbed by noise, leading to classification errors; if k is too large, local features are easily ignored, decreasing classification accuracy. In order to account for the influence of the distances of neighboring samples on the prediction result, this paper uses the WKNN classification algorithm, which weights the votes of the k nearest samples in order to deal with unbalanced data sets and improve classification accuracy. Commonly used weight calculation methods include inverse-distance weights and kernel-function weights. In this paper, we choose the inverse-distance weight (i.e., the closer the neighbor, the larger the weight):

w_i = 1 / d(x, y_i)

where y_i is the i-th of the k nearest training samples. Finally, the classification votes are weighted according to the weights of the neighboring samples, and the final classification result is obtained using weighted voting or weighted averaging.
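The procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the small `eps` guard against division by zero for exact matches is our own assumption:

```python
import numpy as np

def wknn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """WKNN with inverse-distance weights: the k nearest training
    samples vote, each vote weighted by 1/d, so closer neighbors
    count more than distant ones."""
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to all training samples
    nearest = np.argsort(d)[:k]                  # indices of the k nearest samples
    weights = 1.0 / (d[nearest] + eps)           # inverse-distance weights
    votes = {}
    for idx, w in zip(nearest, weights):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)             # class with the largest weighted vote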

Experimental Method
The experiments were conducted using a tracked robot whose photograph is shown in Figure 1a. The parameters of the accelerometer (HWT605) and the camera module (RER-USB4KHBR01) used for terrain classification are shown in Tables 3 and 4, respectively.
When a tracked robot traverses various road surfaces, the irregular impacts during contact between the track and the ground generate vibrations, which are transmitted to various parts of the body through, for example, the transmission mechanism. The results of various experiments show that installing the vibration sensor on top of the body gives the best measurements. In order to obtain accurate terrain image data, a common approach is to mount the camera at the front of the tracked robot. With this configuration, the tracked robot can capture and record terrain information in real time as it travels. The resulting data acquisition scheme used for the terrain classification of the tracked robot is shown in Figure 1b. The host computer (laptop) communicates with the STM32 controller of the tracked robot through serial communication to control its motion. Five common driving environments for tracked robots, namely asphalt road, cement road, marble road, grassland, and cobblestone road, are selected for the terrain classification research. The speed of the tracked robot is set to low (0.3 m/s) and high (0.7 m/s), and three sets of data are collected for each terrain and speed. To ensure that the acceleration data are not affected by vehicle starts and stops, the first 3 s and the last 3 s of each data set are excluded, generating a new vibration data set while retaining sufficient experimental time. Considering the data transmission and storage speed, the camera frame rate is set to 1 FPS for image data collection. The images acquired by the camera and the corresponding vibration subseries acquired by the accelerometer are shown in Figure 2.
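A sketch of the data preparation step described above (trimming the start/stop transients and splitting the remainder into windows). The sampling rate and window length below are illustrative assumptions, not values reported in the paper:

```python
import numpy as np

def trim_and_segment(signal, fs, trim_s=3.0, window=256):
    """Drop the first and last trim_s seconds (start/stop transients),
    then split the remaining signal into fixed-length windows.
    fs and window are assumed example values."""
    n_trim = int(trim_s * fs)
    core = np.asarray(signal)[n_trim:len(signal) - n_trim]
    n_windows = len(core) // window
    # Discard the trailing partial window and reshape into segments
    return core[:n_windows * window].reshape(n_windows, window)
```

For example, a 20 s recording at an assumed 100 Hz yields 1400 usable samples after trimming, i.e., five 256-sample windows.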


Terrain Classification Based on Vibration Signals

The average of the three sets of vibration data mentioned above is taken and segmented into training samples; the original vibration data sets are similarly segmented to be used as test samples. The time-domain and frequency-domain features of the training and test samples are extracted according to Tables 1 and 2. The overall average accuracy rate (AAR) of the tracked robot at 0.3 m/s is 57.14%, and the overall AAR at 0.7 m/s is 60.00%. At low speeds, tracked robots are less affected by vibration feedback from the road and relatively more affected by external interference, resulting in lower classification accuracy at this time. In addition, because asphalt, cement, and marble roads have similar surfaces and produce similar vibration data, their features are easily confused, which causes confusion during terrain classification and decreases the classification accuracy.
The time-domain and frequency-domain features mentioned above are extracted. Then, the PCA algorithm is applied to reduce their dimensionality, and a more representative feature matrix is obtained. Finally, the WKNN algorithm is used for terrain classification. The terrain classification results of the tracked robot at speeds of 0.3 m/s and 0.7 m/s are shown in Tables 7 and 8. Comparing Tables 5 and 7, the accuracy increases from 57.14% to 62.86% at 0.3 m/s; comparing Tables 6 and 8, the accuracy increases from 60.00% to 80.00% at 0.7 m/s. These findings indicate that the WKNN classification method based on the PCA processing of vibration data proposed in this paper significantly improves the classification accuracy compared to the traditional KNN classification method based on the original data.

Terrain Classification by Fusing Vibration and Image Information
According to the previous analysis, due to the similarity between asphalt, cement, and marble roads, relying solely on vibration signals for terrain classification leads to a decrease in classification accuracy. In this paper, image texture is therefore introduced as an additional basis for classification. Figure 3 shows the texture feature parameters extracted from images under different terrain conditions. Comparing these parameters reveals the texture differences between different ground images, demonstrating that terrain classification and recognition can be achieved by analyzing the texture characteristics of ground images.
The classification results using image texture features are shown in Tables 9 and 10. Comparing Tables 5 and 9, the accuracy at v = 0.3 m/s increases from 57.14% to 77.14%; comparing Tables 6 and 10, the accuracy at v = 0.7 m/s increases from 60.00% to 62.86%. Compared to using vibration signals, classification based on image texture features achieves higher accuracy. However, it is strongly affected by the real-world environment: the experiment is conducted under near-ideal conditions, and working in harsh actual environments will lead to a decrease in classification accuracy. It is therefore necessary to integrate vibration signal features and image texture features to improve the terrain classification accuracy of tracked robots.
Vibration information and image texture are thus selected simultaneously for feature extraction. The vibration features are first processed using the PCA method and then fused with the image texture features; based on the fused features, the WKNN algorithm is used for terrain classification. The classification results are shown in Tables 11 and 12. Comparing Tables 5 and 11, the accuracy increases from 57.14% to 97.14% at a speed of 0.3 m/s; similarly, comparing Tables 6 and 12, the accuracy increases from 60.00% to 91.43% at 0.7 m/s. We can conclude that the WKNN algorithm using the fusion of vibration and image information proposed in this paper performs better than classification based on vibration or image information alone, with a significant improvement in accuracy.
The comparison of the results of the various classification methods for the different features is shown in Figure 4. Compared with the traditional single-feature classification methods, the WKNN classification method proposed in this paper, which fuses multiple data features, achieves greatly improved accuracy, indicating superior performance in terrain classification. At the same time, this shows that combining image information and vibration signals for terrain classification not only provides more comprehensive and diverse information sources, but also that the two types of information are complementary, thereby improving the robustness of the classifier.
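The feature-level fusion described above amounts to concatenating, per sample, the PCA-reduced vibration feature vector with the five image texture features. A minimal sketch, where the dimensions are illustrative rather than the paper's actual feature counts:

```python
import numpy as np

def fuse_features(vib_pca, texture):
    """Feature-level fusion: horizontally concatenate PCA-reduced
    vibration features with image texture features, one row per sample."""
    vib_pca = np.atleast_2d(np.asarray(vib_pca, dtype=float))
    texture = np.atleast_2d(np.asarray(texture, dtype=float))
    assert vib_pca.shape[0] == texture.shape[0], "one row per sample required"
    return np.hstack([vib_pca, texture])
```

The fused matrix then feeds directly into the WKNN classifier, so each neighbor distance reflects both vibration and texture information at once.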

Conclusions
This paper proposes a WKNN algorithm that integrates vibration and image information for the terrain classification of tracked robots. The PCA technique is used to reduce the dimensionality of the vibration features, and a new vibration feature vector is constructed by selecting the feature vectors that meet a 99% contribution rate. At the same time, image texture features are extracted and combined at the feature layer to generate new terrain feature vectors. The fused feature information is then classified using the WKNN algorithm. Finally, experiments are conducted to validate the proposed method.
The experimental results demonstrate that the WKNN classification method based on the PCA processing of vibration data proposed in this paper significantly improves the classification accuracy compared to the traditional KNN classification method based on the original data. At a velocity of 0.3 m/s, the classification accuracy improves from 57.14% to 62.86%, whereas at 0.7 m/s, the accuracy rises from 60.00% to 80.00%, demonstrating the superiority of this approach. By combining PCA-processed vibration data and image texture data and using the WKNN algorithm for terrain classification, the classification accuracy reaches 97.14% at 0.3 m/s and 91.43% at 0.7 m/s, further enhancing the precision of terrain classification. Combining vibration and image information not only provides a more comprehensive and diverse source of information, but also improves the accuracy and robustness of terrain classification. This indicates that the method proposed in this paper has great potential for the terrain classification of tracked robots. Future research will further explore and optimize this method to make it applicable to a broader range of tracked-robot application scenarios.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.