Frequency-Temporal Disagreement Adaptation for Robotic Terrain Classification via Vibration in a Dynamic Environment

Accurate real-time terrain classification is of great importance to an autonomous robot working in the field, because it allows the robot to avoid non-geometric hazards, adjust its control scheme, or improve its localization accuracy. In this paper, we investigate vibration-based terrain classification (VTC) in a dynamic environment and propose a novel learning framework, named DyVTC, which handles online-collected unlabeled data with concept drift. In the DyVTC framework, the exterior disagreement (ex-disagreement) and interior disagreement (in-disagreement) are newly proposed based on feature diversity and intrinsic temporal correlation, respectively. This disagreement mechanism is used to design a pseudo-labeling algorithm that is effective at extracting and labeling key samples; consequently, the classification accuracy can be recovered by incremental learning in a changing environment. Since two sets of features are extracted from the frequency and time domains to generate disagreements, we also name the proposed method feature-temporal disagreement adaptation (FTDA). The real-world experiment shows that the proposed DyVTC reaches an accuracy of 89.5% in a dynamic environment, whereas the traditional time- and frequency-domain terrain classification methods reach only 48.8% and 71.5%, respectively.


Introduction
Robotic terrain classification refers to the process by which a mobile robot classifies the terrain that it is traversing or will traverse as one of a set of predefined classes [1]. An accurate terrain classification method is of great importance to an autonomous robot performing field tasks, which usually require traversing a variety of terrains such as sand, grass, gravel, or clay [2,3]. For example, if a wheeled robot decides to traverse sandy ground, its wheels may sink into the sand; and therefore, the robot could [...] our work. In the learning algorithm, the concepts of ex- and in-disagreement are introduced, which are verified to be powerful in extracting key samples and labeling them with high accuracy.
The rest of the paper is organized as follows. Section 2 covers the framework description of the proposed terrain classification method, as well as the details of some key steps, including feature extraction, classification algorithm, Bayesian filter, domain fusion, and pseudo-labeling algorithm. Section 3 presents the experimental verification, including the description of the experimental robot and data collection, performance evaluation of classifier and Bayesian filter, and comparative study between the existing methods and ours. The paper is concluded in Section 4.

Methodology
The framework of the proposed DyVTC is shown in Figure 1. A single vibration point provides extremely limited information, so we use a vibration frame, which is composed of a certain number of successive vibration points, to extract representative features. All vibration frames are transformed into samples in both the time and frequency domains. Based on the labeled time- and frequency-domain vibration samples, two classifiers are obtained by batch training, one per domain. The above process is offline. When the mobile robot is operating outdoors, online-collected vibration samples are fed into the pre-trained classifiers, and the classifier-output terrain predictions are then fed into a Bayesian filter to yield a better terrain prediction. Meanwhile, the classifier- and filter-output terrain predictions are analyzed based on the mechanism of ex- and in-disagreement, so that some key samples can be extracted and labeled with high accuracy. When these pseudo-labeled samples accumulate to some extent, they are used to re-train the classifiers incrementally. The rest of the section elaborates on the key steps in DyVTC.
Figure 1. Framework of the proposed DyVTC. The rectangular and elliptical blocks represent operations and datasets, respectively. The rectangles without corners represent models. We use the color blue to highlight the online parts.

Feature Extraction
We use an accelerometer to measure the acceleration along the vertical axis at 100 Hz, thus obtaining the acceleration time series. Due to the presence of gravity, the accelerometer does not measure a pure motion vibration, but the vertical acceleration mixed with the gravitational acceleration. Hence, we subtract the gravitational acceleration constant from the acceleration time series to obtain the vibration time series. The vibration time series is then split into vibration frames, each of which contains $n$ vibration points. To guarantee real-time terrain classification, each vibration frame overlaps the successive one by 50%. Define a vibration frame by $a = (a_1, a_2, \dots, a_n)$. We are now in a position to extract features from $a$ in the frequency domain and the time domain.
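The framing step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `split_into_frames` and the value of `GRAVITY` are assumptions, since the text does not state the gravitational constant that is subtracted.

```python
# Hypothetical helper: gravity removal followed by framing with 50% overlap.
GRAVITY = 9.81  # m/s^2; assumed value, not given in the text


def split_into_frames(accel, n, gravity=GRAVITY):
    """Subtract gravity and cut the series into frames of n points with 50% overlap."""
    vibration = [sample - gravity for sample in accel]
    hop = n // 2  # each frame starts half a frame after the previous one
    frames = []
    for start in range(0, len(vibration) - n + 1, hop):
        frames.append(vibration[start:start + n])
    return frames
```

With a 100 Hz stream and `n = 100`, a new frame becomes available every 50 samples, that is, every 0.5 s; trailing samples that do not fill a complete frame are dropped in this sketch.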

Frequency-Domain Features
Expressing a time series in the frequency domain usually simplifies the mathematical analysis and helps in understanding the signal components. The discrete Fourier transform (DFT) is a powerful tool for obtaining the amplitude spectrum of a time series and is therefore used intensively in time-series analysis. The $N$-point DFT of the vibration frame $a$ is defined by [40]

$$A_k = \sum_{i=1}^{N} a_i \, e^{-j 2\pi k (i-1)/N}, \qquad k = 0, 1, \dots, N-1,$$

where $j^2 = -1$ and $k$ is the frequency index. The DFT is usually implemented with an efficient algorithm known as the fast Fourier transform (FFT). For an $N$-point FFT, the parameter $N$ is typically specified as a power of 2 or a value that can be factored into a product of small prime numbers. In the case $N > n$, the vibration frame $a$ should be padded with zeros; that is, the terms from $a_{n+1}$ to $a_N$ are set to zero. The accelerometer usually works at a frequency of up to 100 Hz. If the terrain classification is desired to work at 1 Hz, which means that a prediction should be given every second, then we use the 128-point FFT to transform the vibration frames into their spectra. If the spectrum is treated as the feature directly, the feature is a 128-dimensional vector. In order to reduce the feature dimension, we sample some entries uniformly from the spectral vector to constitute the feature.
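As a rough illustration of the zero-padding and uniform subsampling described above, the sketch below computes the amplitude spectrum with a direct $O(N^2)$ DFT so that it needs only the standard library; in practice an FFT routine would replace the inner loop. The function names and the subsampling `stride` are assumptions, because the text does not say how many spectral entries are kept.

```python
import cmath


def amplitude_spectrum(frame, N=128):
    """Zero-pad the frame to N points and return the magnitudes of its N-point DFT."""
    if len(frame) > N:
        raise ValueError("frame must not be longer than N")
    padded = list(frame) + [0.0] * (N - len(frame))  # terms a_{n+1}..a_N set to zero
    spectrum = []
    for k in range(N):
        coeff = sum(padded[i] * cmath.exp(-2j * cmath.pi * k * i / N) for i in range(N))
        spectrum.append(abs(coeff))
    return spectrum


def frequency_features(frame, N=128, stride=4):
    """Keep every stride-th spectral entry (stride is an assumed value)."""
    return amplitude_spectrum(frame, N)[::stride]
```

For a 100-point frame, `frequency_features(frame)` returns a 32-dimensional vector under the assumed stride of 4.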

Time-Domain Features
In addition to the frequency domain, we also extract features directly in the time domain. A 10-dimensional feature vector $\varphi = (\varphi_1, \varphi_2, \dots, \varphi_{10})$ is obtained, and its entries are shown in Table 1. Note that $\varphi_5$ can be extended by setting $\tau = 1, 2, \dots, n-1$. However, according to Khintchine's law, $\tau \ll n$ should be guaranteed in order to bound the estimation error of $\varphi_5$. In this paper, we choose $\tau = 1$.
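A minimal sketch of some of the time-domain features named in the text is given below. The exact normalizations are assumptions (population variance, and autocorrelation normalized by the variance), and only the features whose descriptions appear in this text are covered; the function and key names are hypothetical.

```python
def time_domain_features(a, tau=1):
    """Compute a subset of time-domain features of a vibration frame a."""
    n = len(a)
    mean = sum(a) / n
    centered = [x - mean for x in a]
    # Zero-crossing number of the raw frame and of the mean-removed frame
    zcn = sum(1 for i in range(n - 1) if a[i] * a[i + 1] < 0)
    zcn_centered = sum(1 for i in range(n - 1) if centered[i] * centered[i + 1] < 0)
    variance = sum(c * c for c in centered) / n  # assumed 1/n normalization
    if variance > 0:
        autocorr = sum(centered[i] * centered[i + tau] for i in range(n - tau))
        autocorr /= (n - tau) * variance
    else:
        autocorr = 0.0  # constant frame: no meaningful correlation
    return {
        "zcn": zcn,
        "mean": mean,
        "zcn_centered": zcn_centered,
        "variance": variance,
        "autocorrelation": autocorr,
        "maximum": max(a),
        "minimum": min(a),
    }
```

The frame `[3, 1, 3, 1]` shows why the mean-removed zero-crossing count is useful: the raw frame never changes sign, so `zcn` is 0, while `zcn_centered` is 3.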

Support Vector Machine
Let $\{(x_1, y_1), \dots, (x_m, y_m)\}$ denote the training set, where $m$ is the size of the training set and $y_i \in \{\pm 1\}$. The support vector machine (SVM) aims to construct a separating hyperplane between two classes of points that maximizes the margin between the hyperplane and the support vectors [41]. Usually the hyperplane cannot be found in the original sample space. For such a nonlinear classification task, the kernel technique is applied to map the original data to a high-dimensional feature space by $\phi : x \to \phi(x)$. The inner product of points in the feature space is then computed implicitly by a kernel function. In our work, we use two common kernel functions: the linear kernel $\kappa(x_i, x_j) = x_i^{\top} x_j$ and the Gaussian kernel $\kappa(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$, where $\sigma$ denotes the width of the Gaussian kernel. A soft margin is used to regularize the trade-off between minimizing the training error and maximizing the margin. Therefore, an SVM can be described as the following optimization problem [42]

$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + \lambda \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i\left(\omega^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, m,$$

where $\omega$ is the vector normal to the hyperplane, $b$ is a scalar bias, $\xi_i$ are the slack variables, and $\lambda$ is the soft margin parameter. Multi-class classification with SVMs can be performed using the one-versus-one approach. The SVM model can be updated online using an incremental SVM (e.g., [43]). Because only the support vectors participate in the learning process, the incremental SVM greatly reduces the training time and seldom loses accuracy.
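The two kernels and the one-versus-one decomposition can be illustrated with a short sketch. The function names are hypothetical, and the $2\sigma^2$ scaling in the Gaussian kernel is the common convention assumed here rather than a detail confirmed by the text.

```python
import math
from itertools import combinations


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(xi * zi for xi, zi in zip(x, z))


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel of width sigma (assumed 2*sigma^2 scaling)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def one_vs_one_pairs(class_labels):
    """List the class pairs for which a binary SVM is trained in one-versus-one."""
    return list(combinations(class_labels, 2))
```

For the six terrains used in the experiments, `one_vs_one_pairs(range(6))` yields 15 pairs, so a one-versus-one scheme trains 15 binary SVMs per domain.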

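Table 1. Time-domain features.

| Name | Equation | Description |
|---|---|---|
| Zero-crossing number (ZCN) | $\varphi_1 = \sum_{i=1}^{n-1} I(a_i a_{i+1} < 0)$ | $I(\cdot)$ is an indicator function, which outputs 1 if the expression in $(\cdot)$ holds, or 0 otherwise. This feature is an approximation of the frequency of $a$. |
| Mean | $\varphi_2 = \frac{1}{n}\sum_{i=1}^{n} a_i$ | Although the gravitational acceleration has been subtracted, the mean of $a$ may diverge considerably from zero for some coarse terrains. |
| ZCN in $\bar{a}$ | $\varphi_3 = \sum_{i=1}^{n-1} I(\bar{a}_i \bar{a}_{i+1} < 0)$, with $\bar{a}_i = a_i - \varphi_2$ | $\varphi_3$ is a complement to $\varphi_1$, which avoids $\varphi_1 \approx 0$ even for a high-frequency vibration signal when the robot is traversing coarse terrains. |
| Variance | $\varphi_4 = \frac{1}{n}\sum_{i=1}^{n} (a_i - \varphi_2)^2$ | Intuitively, the variance is higher when the terrain becomes coarser. |
| Autocorrelation | $\varphi_5 = \frac{1}{(n-\tau)\varphi_4}\sum_{i=1}^{n-\tau} (a_i - \varphi_2)(a_{i+\tau} - \varphi_2)$ | $\tau < n$ is an integer indicating the time difference. As a measure of non-randomness, $\varphi_5$ gets larger with a stronger dependency between $a_i$ and $a_{i+\tau}$. |
| Maximum | $\varphi_6 = \max(a)$ | $\varphi_6$ indicates the biggest bump of the terrain. |
| Minimum | $\varphi_7 = \min(a)$ | $\varphi_7$ indicates the deepest puddle of the terrain. |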

Bayesian Filter
The recursive form of the Bayesian filter can be found in [44]. Define $\chi_t$ as the state at time $t$, $c_t$ the measurement, and $C_t = \{c_1, c_2, \dots, c_t\}$ the measurement set. The purpose is to acquire $P(\chi_t \mid C_t)$, the a posteriori probability density function (pdf) of $\chi_t$ conditioned on $C_t$. Given $P(\chi_{t-1} \mid C_{t-1})$, we have

$$P(\chi_t \mid C_{t-1}) = \int P(\chi_t \mid \chi_{t-1}) \, P(\chi_{t-1} \mid C_{t-1}) \, d\chi_{t-1}, \qquad (5)$$

$$P(\chi_t \mid C_t) = \frac{P(c_t \mid \chi_t) \, P(\chi_t \mid C_{t-1})}{\int P(c_t \mid \chi_t) \, P(\chi_t \mid C_{t-1}) \, d\chi_t}, \qquad (6)$$

where $P(\chi_t \mid C_{t-1})$ denotes the a priori pdf of $\chi_t$ conditioned on $C_{t-1}$. Generally speaking, analytic solutions to Equations (5) and (6) are unavailable in most cases, so the estimation problem for a continuous state is seldom tackled by the Bayesian filter. However, if the state is discrete and the number of states is not too large, the Bayesian filter is a practicable method for solving such a state estimation problem. In terrain classification, the state at time $t$ is defined as $\chi_t \in \{1, 2, \dots, \ell\}$, where $\ell$ is the number of terrains and $i = 1, 2, \dots, \ell$ denotes the terrain ID. The measurement $c_t \in \{1, 2, \dots, \ell\}$ is the classifier-output terrain prediction. Given $P(\chi_{t-1} \mid C_{t-1})$, we have

$$P(\chi_t = i \mid C_{t-1}) = \sum_{j=1}^{\ell} P(\chi_t = i \mid \chi_{t-1} = j) \, P(\chi_{t-1} = j \mid C_{t-1}), \qquad (7)$$

$$P(\chi_t = i \mid C_t) = \frac{P(c_t \mid \chi_t = i) \, P(\chi_t = i \mid C_{t-1})}{\sum_{k=1}^{\ell} P(c_t \mid \chi_t = k) \, P(\chi_t = k \mid C_{t-1})}, \qquad (8)$$

where $P(\chi_t \mid C_{t-1})$ denotes the a priori pdf of $\chi_t$ conditioned on $C_{t-1}$, $P(\chi_t = i \mid \chi_{t-1} = j)$ denotes the probability that the mobile robot moves from terrain $j$ to terrain $i$ at time $t$, and $P(c_t = j \mid \chi_t = i)$ denotes the probability of the classifier outputting terrain $j$ conditioned on terrain $i$. Meanwhile, we observe that the denominator of Equation (8) is a normalizer.

Applying the Bayesian filter to improve the terrain classification is premised on knowing $P(\chi_0 \mid C_0)$, $P(c_t \mid \chi_t)$, and $P(\chi_t \mid \chi_{t-1})$. First, the initial a posteriori pdf $P(\chi_0 \mid C_0)$, where $C_0$ denotes an empty measurement set, describes the distribution of the terrain on which the mobile robot is initially located. If the initial terrain is known to be terrain $i$, then $P(\chi_0 = i \mid C_0) = 1$ and $P(\chi_0 = j \mid C_0) = 0$ for all $j \neq i$; otherwise, $P(\chi_0 \mid C_0)$ is assumed to be a uniform distribution, namely, $P(\chi_0 = i \mid C_0) = 1/\ell$ for $i = 1, 2, \dots, \ell$. Second, $P(c_t \mid \chi_t)$, which is required during the measurement-update procedure, is determined by the confusion matrix. Third, $P(\chi_t \mid \chi_{t-1})$, which is required during the time-update procedure, describes the correlation of the sampled terrain series. Given $\ell$ terrains, an $\ell \times \ell$ square matrix $M$ with elements $m_{ij} = P(\chi_t = i \mid \chi_{t-1} = j)$ is defined. The diagonal elements $m_{ii}$, where $i = 1, 2, \dots, \ell$, should be assigned a relatively large value not greater than 1, based on the heuristic that terrain is spatially continuous. The off-diagonal elements $m_{ij}$, where $i \neq j$, can be determined by the terrain distribution in a map. For example, if terrain $i$ possesses more area than terrain $j$, then $m_{ij} < m_{ji}$. It should be guaranteed that the sum of a row equals 1. A general and simple setup of $M$ is $m_{ii} = \mu$ for $i = 1, 2, \dots, \ell$ and $m_{ij} = (1-\mu)/(\ell-1)$ for $i \neq j$.
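The discrete time update and measurement update can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: terrain IDs are zero-based, the function names are hypothetical, the transition matrix uses the simple symmetric setup with diagonal value `mu`, and `likelihood` is assumed to be a row-normalized confusion matrix whose rows are real terrains and whose columns are predicted terrains.

```python
def make_transition_matrix(num_terrains, mu):
    """Symmetric setup: mu on the diagonal, (1 - mu) / (num_terrains - 1) elsewhere."""
    off = (1.0 - mu) / (num_terrains - 1)
    return [[mu if i == j else off for j in range(num_terrains)]
            for i in range(num_terrains)]


def bayes_filter_step(posterior_prev, measurement, transition, likelihood):
    """One recursion of the discrete Bayesian filter.

    posterior_prev[j] = P(chi_{t-1} = j | C_{t-1})
    transition[i][j]  = P(chi_t = i | chi_{t-1} = j)
    likelihood[i][c]  = P(c_t = c | chi_t = i)
    """
    n = len(posterior_prev)
    # Time update: a priori distribution over terrains
    prior = [sum(transition[i][j] * posterior_prev[j] for j in range(n))
             for i in range(n)]
    # Measurement update: weight by the classifier likelihood and normalize
    unnormalized = [likelihood[i][measurement] * prior[i] for i in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        return prior  # measurement impossible under the model; keep the prior
    return [u / total for u in unnormalized]
```

The filter-output terrain prediction is the index of the largest entry of the returned vector. With `mu` close to 1, a single outlying classifier output rarely flips that index, which is the behavior the spatial-continuity heuristic aims for.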

Pseudo-Labeling Algorithm
The pseudo-labeling algorithm aims to extract key samples and label them with high accuracy. The term key samples refers to the unlabeled samples that cannot be correctly classified. We now introduce a new term, interior disagreement (in-disagreement). For each domain, we have two terrain predictions at the same time. The classifier outputs are read as the a priori terrain predictions, while the filter outputs are read as the a posteriori terrain predictions. If the a priori and a posteriori terrain predictions of the same domain at a certain time are different, this phenomenon is referred to as in-disagreement. The exterior disagreement (ex-disagreement) compares the two domains: the a priori ex-disagreement means that the a priori terrain predictions of the two domains at a certain time are different, and, similarly, the a posteriori ex-disagreement means that the a posteriori terrain predictions of the two domains at a certain time are different. Based on the in- and ex-disagreement, we propose the following heuristic rules:
1. If one domain (denoted as the 1st domain) exhibits in-disagreement at a certain time, the sample is likely to be a key sample of the 1st domain.
2. Based on the first rule, if at the same time the other domain (denoted as the 2nd domain) does not exhibit in-disagreement, and there is no a posteriori ex-disagreement between the two domains, then the 2nd-domain terrain prediction is likely to be a reliable label for the 1st-domain key sample.
3. If in-disagreement appears in both domains, but there is no a posteriori ex-disagreement, the filter-output terrain prediction can be used to label the samples from both domains.
4. If neither in-disagreement nor ex-disagreement appears at a certain time, the sample is likely to be classified correctly and is thus not a key sample.
Now we present the algorithm in detail. Define $\gamma \in \{T, F\}$ as the domain type, where $T$ stands for the time domain and $F$ for the frequency domain. In the $\gamma$ domain, upon receiving a sample $x^{\gamma}_t$, the $\gamma$-domain classifier outputs the a priori terrain prediction $c^{\gamma}_t$, and the Bayesian filter then outputs the a posteriori terrain prediction $\hat{c}^{\gamma}_t$. The pseudo-labeling algorithm is shown in Algorithm 1. Because the rules are built on the mechanism of in- and ex-disagreement, we name it in- and ex-disagreement-based pseudo-labeling (IE). The proposed IE is a simple yet efficient method for extracting and labeling key samples, which will be verified in Section 3.

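Algorithm 1 In- and Ex-Disagreement-Based Pseudo-Labeling Algorithm (IE)
Input: The unlabeled samples $x^T_t$ and $x^F_t$, the a priori terrain predictions $c^T_t$ and $c^F_t$, the a posteriori terrain predictions $\hat{c}^T_t$ and $\hat{c}^F_t$, where $t = 1, 2, \dots, K$.
Output: Pseudo-labeled sample sets $L^T$ and $L^F$, for the time and frequency domain, respectively.
$L^T \leftarrow \emptyset$, $L^F \leftarrow \emptyset$
for $t = 1$ to $K$ do
    if $c^T_t \neq \hat{c}^T_t$ and $c^F_t = \hat{c}^F_t$ and $\hat{c}^T_t = \hat{c}^F_t$ then
        $L^T \leftarrow L^T \cup \{(x^T_t, \hat{c}^F_t)\}$
    end if
    if $c^F_t \neq \hat{c}^F_t$ and $c^T_t = \hat{c}^T_t$ and $\hat{c}^T_t = \hat{c}^F_t$ then
        $L^F \leftarrow L^F \cup \{(x^F_t, \hat{c}^T_t)\}$
    end if
    if $c^T_t \neq \hat{c}^T_t$ and $c^F_t \neq \hat{c}^F_t$ and $\hat{c}^T_t = \hat{c}^F_t$ then
        $L^T \leftarrow L^T \cup \{(x^T_t, \hat{c}^T_t)\}$, $L^F \leftarrow L^F \cup \{(x^F_t, \hat{c}^F_t)\}$
    end if
end for
return $L^T$ and $L^F$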
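The heuristic rules of the pseudo-labeling step can be expressed compactly in code. The sketch below follows rules 2 to 4 as stated in the text; the function name `ie_pseudo_label` and the list-based data layout are assumptions made for illustration.

```python
def ie_pseudo_label(x_T, x_F, c_T, c_F, chat_T, chat_F):
    """Return pseudo-labeled sets (L_T, L_F) as lists of (sample, label) pairs.

    c_*    : a priori (classifier-output) terrain predictions
    chat_* : a posteriori (filter-output) terrain predictions
    """
    L_T, L_F = [], []
    for t in range(len(x_T)):
        in_T = c_T[t] != chat_T[t]           # in-disagreement, time domain
        in_F = c_F[t] != chat_F[t]           # in-disagreement, frequency domain
        ex_post = chat_T[t] != chat_F[t]     # a posteriori ex-disagreement
        if ex_post:
            continue                         # no rule assigns a label in this case
        if in_T and not in_F:
            L_T.append((x_T[t], chat_F[t]))  # rule 2: frequency domain labels time sample
        elif in_F and not in_T:
            L_F.append((x_F[t], chat_T[t]))  # rule 2 with the domains swapped
        elif in_T and in_F:
            L_T.append((x_T[t], chat_T[t]))  # rule 3: filter output labels both samples
            L_F.append((x_F[t], chat_F[t]))
        # rule 4: no disagreement at all, so the sample is not a key sample
    return L_T, L_F
```

Because each decision uses only the predictions at time `t`, the same logic can be applied frame by frame as predictions arrive, without waiting for a complete data chunk.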

Fusion of Terrain Predictions
In ensemble learning, voting schemes, including majority, plurality, and weighted voting, are general ways to fuse different predictions [45]. However, they cannot be applied to our fusion task directly, since we only have two domains. Two dedicated schemes follow: The 1st fusion scheme is [...] The 2nd fusion scheme is [...] where $o^2_t$ denotes the fused terrain prediction using the 2nd fusion scheme and $v^{\gamma}_t$ denotes the confidence vector of the $\gamma$-domain Bayesian filtering at time $t$. The weight w > 0 should be a number larger than 1. The function $M\{\cdot\}$ returns the index of the largest element in the vector. Because the terrain IDs correspond to the vector indices, $M\{\cdot\}$ returns the terrain prediction.
The fusion schemes above fuse the a posteriori terrain predictions, but they can also be used to fuse the a priori terrain predictions.
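Since the fusion equations are not reproduced in this text, the following is only a guess at what a weighted fusion of two confidence vectors could look like: it assumes a weighted sum in which the weight `w` multiplies the frequency-domain vector, followed by the arg-max operator $M\{\cdot\}$. Both the form of the sum and the choice of the weighted domain are assumptions, and the function name is hypothetical.

```python
def fuse_predictions(v_T, v_F, w=1.0):
    """Return the index of the largest entry of v_T + w * v_F (assumed fusion form)."""
    scores = [vt + w * vf for vt, vf in zip(v_T, v_F)]
    # arg-max plays the role of M{.}: vector indices correspond to terrain IDs
    return max(range(len(scores)), key=scores.__getitem__)
```

With `w = 1` both domains contribute equally; increasing `w` lets the weighted domain override the other when their confidence vectors point to different terrains.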

Experimental Verification
In this section, we first describe the experimental robot, the experimental terrains, and the details of the experimental data collection. Second, we demonstrate the performance of the traditional terrain classification methods when data drift exists. Third, we show how Bayesian filtering improves the classification results. Finally, a comparative study is conducted to verify the effectiveness of the proposed DyVTC.

Experimental Data Collection
The experimental robot, its electronic system structure, and the signal flows are shown in Figure 2. The robot is 340 mm in length, 270 mm in width, 230 mm in height, and 2.6 kg in mass. The diameter and width of the wheels are 130 mm and 60 mm, respectively. With a power supply of 12 V, the robot can traverse coarse ground at a speed of up to 1.5 m/s. An accelerometer-gyroscope-magnetometer integrated sensor (MPU9250) and an odometry constitute the sensor system. The main configurations of the odometry, gyroscope, accelerometer, and magnetometer are listed in Table 2. The odometry consists of four incremental encoders mounted directly on the motor shafts to measure the motor rotational speeds, from which the robot's moving speed is obtained. The accelerometer-gyroscope-magnetometer integrated sensor is used to obtain the robot pose and the vibration. The micro control unit (MCU) reads the Z-axis accelerometer at 100 Hz. Meanwhile, the robot's moving speed and pose are measured every second and are used to evaluate the robot's motion modes. The robot is controlled with a smartphone, which sends commands to the robot via Bluetooth. The MCU is an Arduino Mini Pro development board, which performs simple and fundamental operations such as data gathering, motor control, and command receiving. While the robot is working, all data are stored in the local memory (a T-Flash card); afterwards, the card is unplugged from the robot and connected to a desktop computer (3.20 GHz, 8 GB RAM), to which the data are transferred. All algorithms are evaluated on the computer based on the gathered data. Among the terrains listed in [46], we select for the experiment the six terrains that a robot is most likely to traverse. As shown in Figure 3, some of them are artificial terrains (e.g., asphalt road), while others are natural ones (e.g., natural grass). These terrains differ in rigidity, roughness, and flatness.
The segments of vibration time series collected on the six terrains and the corresponding terrain photographs are shown in Figure 3. Compared with the other terrains, the interaction between the robot and the cobble path generates a highly distinguishable vibration. This vibration has a higher frequency, larger magnitude, and weaker autocorrelation, because the cobble path is relatively rigid and irregular. The vibrations of the other five terrains may not be easy to discriminate intuitively because of their slight differences; however, the differences can still be found in terms of their variation tendency. Different motion states might also cause data drift, but this can be eliminated by sufficient data collection, as the number of motion states is relatively limited. Hence, in our data collection experiment, we control the experimental robot to wander on the six terrains at speeds ranging from the minimum speed (0.2 m/s) to the maximum speed (1.1 m/s) and in different motion modes (e.g., circular and linear motion), which avoids data drift caused by an insufficient experiment. We collect the vibration data in two different environments, thus obtaining two vibration datasets: $D_1$ and $D_2$. Intuitively speaking, (i) the grass in a garden and on a roadside may differ in height, (ii) natural grass gets harder in fine weather and softer after rain, and (iii) the soil is harder at night than in the daytime because of the lower temperature. Environment One and Environment Two both include the aforementioned six terrains, but differ in location, weather, and temperature. For each dataset, the vibration time series are segmented into vibration frames of 100 points with 50% overlap; therefore, $D_1$ and $D_2$ are transformed into $S_1$ and $S_2$, which are composed of vibration frames. As shown in Figure 4a, $S_1$ is divided into $S_{1.1}$ and $S_{1.2}$, each of which contains 3000 frames. Similarly, as shown in Figure 4b,

Performance Evaluation of Classifier
To evaluate the classifier performance in a static environment, i.e., when the training and test data are both gathered in Environment One, we train two classifiers on $S^T_{1.1}$ and $S^F_{1.1}$, and test them on $S^T_{1.2}$ and $S^F_{1.2}$, respectively. The Gaussian kernel is employed for the time-domain classifier. For the frequency-domain classifier, because the feature vector is of high dimension, we employ a linear kernel. We use the confusion matrix to show the classification performance. The rows of the confusion matrices represent the real terrains, while the columns represent the predicted terrains. The trained time- and frequency-domain SVM models achieve accuracies of 85.4% and 86.5%, respectively, which are acceptable for a field robot. It is observed that the main confusion exists between natural grass (NG) and sand beach (SB). Unlike the other terrains, NG and SB are both natural terrains and have similar rigidity and unevenness. In addition, plastic track (PT) cannot be easily classified. The classifier $C^T$ cannot distinguish PT and asphalt road (AR) perfectly, while $C^F$ confuses PT with artificial grass (AG). PT, AR, and AG are all artificial terrains, which are made to enhance the traversability of pedestrians or vehicles, so they usually have similar characteristics in rigidity, roughness, and flatness.
To evaluate the classifier performance in a dynamic environment, we use the classifiers trained on $S^T_{1.1}$ and $S^F_{1.1}$ to predict $S^T_{2.1}$ and $S^F_{2.1}$, respectively. Due to the data drift, the accuracies on $S^T_{2.1}$ and $S^F_{2.1}$ reach only 48.8% and 71.5%, respectively. As illustrated in Figure 5a, Figure 6a,b, the classifier performs much better under data drift, but only about 33% of NG and SB samples can be classified correctly. The performance degradation of the SVM model is caused by data drift. In our experiment, NG is the most changeable terrain and hence becomes the main class that confuses the classifier.
The fusion accuracies with different w are shown in Figure 7. It is observed that the fusion of the time-and frequency-domain classifiers could increase the classification accuracy slightly, with an appropriate w. The time-domain classifier performs much worse than the frequency-domain one, so the increase of fusion accuracy is not significant.
The offline terrain classification, in which the classifiers operate on $S^T_1$/$S^F_1$, achieves a maximum accuracy of 92.7%, so the offline classification accuracy is improved by fusion. However, the online terrain classification, in which the classifiers operate on $S^T_2$/$S^F_2$, does not see a significant improvement: the online classification accuracy increases by only about 1%, and only if $w$ is set appropriately. If we have no a priori knowledge of the two domains and do not know which is better, the weight $w$ is usually set to 1.

Performance Evaluation of Bayesian Filter
We are now in a position to evaluate how the Bayesian filter improves the classifier-output terrain predictions. Figure 8 shows in detail how the Bayesian filter corrects the classifier's outputs. By taking the temporal correlation in the sample stream into consideration, the prediction of the current terrain is no longer based on the current vibration frame alone, but on a combination of the current vibration frame and the previous terrain prediction. Hence, as shown in Figure 8a,b, the incorrect predictions by the classifier at times 1674, 1676, 2837, 2838, 2840, 2841, 2852, 2854, and 2855 are corrected by the Bayesian filter. The Bayesian filter regards the classifier-output terrain predictions as observations. Because the temporal correlation introduces lags in the response to variations of the observations, the Bayesian filter outputs incorrect predictions at times 1663 and 1665-1667, as shown in Figure 8b. Such lags can be found in Figure 8c as well. Furthermore, it is known that the Bayesian filter has the ability to track the observations. Therefore, as the Figure [...] Denote by $C^T_1$, $C^F_1$, $C^T_2$, and $C^F_2$ the outputs of $C^T(S^{\prime T}_1)$, $C^F(S^{\prime F}_1)$, $C^T(S^T_2)$, and $C^F(S^F_2)$, where $S^{\prime T}_1 \subset S^T_1$ and $S^{\prime F}_1 \subset S^F_1$ denote the testing sets of $S^T_1$ and $S^F_1$, respectively. Feeding these classifier output sets into the Bayesian filter increases the classification accuracy by approximately 5% to 10%. The filtering accuracies with different $\mu$ are shown in Figure 9. It is observed that the accuracies almost reach 97% and 98% when the Bayesian filter operates on $C^T_1$ and $C^F_1$, which means that the offline classification accuracy increases by approximately 10%. On the other hand, filtering on $C^T_2$ and $C^F_2$ is less effective and increases the classification accuracy by only approximately 7%.
The influences of different diagonal elements on the Bayesian filtering and the pseudo-labeling algorithm are shown in Figure 10. The term "true" means the number of extracted samples that are not key samples (i.e., samples that can be classified correctly), while "false" means the number of extracted key samples (i.e., samples that cannot be classified correctly). The term "all" means the total number of extracted samples. The terms "true-positive", "false-positive", and "all-positive" mean the numbers of true samples, false samples, and all samples, respectively, that are correctly labeled by the proposed pseudo-labeling algorithm. It is observed that the Bayesian filter increases the classification accuracy to some extent. In the time domain, the accuracy gain varies from 1.6% to 4% and peaks when the diagonal element exceeds 97%. The improvement is even more apparent in the frequency domain. Furthermore, the pseudo-labeling algorithm extracts more false and false-positive samples as the diagonal element gets larger, in both the time and frequency domains. In contrast, the numbers of true and true-positive samples do not increase significantly. Therefore, the pseudo-labeling algorithm performs better with a larger diagonal element.

Comparative Study of Adaptation in a Dynamic Environment
As shown above, the classifier trained on $S_1$ cannot achieve a high accuracy on $S_2$ because of the data drift. We are now in a position to evaluate how DyVTC can recover the classification accuracy by incremental learning on the data chunks. As mentioned above, terrain classification in a dynamic environment has rarely been investigated. We therefore construct several terrain classification methods by applying existing learning algorithms, and evaluate the performance of the proposed DyVTC against these constructed methods. The 9 methods are as follows:
• IE1: The proposed DyVTC. IE is the abbreviation of in- and ex-disagreement.
• IE2: Similar to IE1, but uses the a priori ex-disagreement instead of the a posteriori ex-disagreement.
• IE3: Similar to IE1, but uses both the a priori and a posteriori ex-disagreement, which are combined using logical OR.
• CT.95: Uses the co-training algorithm (see [47]) to tackle the terrain classification problem. The confidence threshold is 0.95.
• CT.8: Similar to CT.95, but the confidence threshold is 0.8.
• ST.95: Uses the self-training algorithm for both domains. A similar idea can be found in [35,36]. The confidence threshold is 0.95.
• ST.8: Similar to ST.95, but the confidence threshold is 0.8.
• KM.95: Uses an advanced fuzzy k-means (see [48]) semi-supervised clustering algorithm to label the newly collected samples. The confidence threshold is 0.95.
• KM.8: Similar to KM.95, but the confidence threshold is 0.8.
The performances of the pseudo-labeling algorithms are shown in Table 3. It is observed that IE1 outperforms all the other algorithms in accuracy. The IE1 algorithm extracts only 100-200 samples from the whole 3000 samples, and its true-positive accuracy is 0. However, most of the extracted samples are key samples, and these key samples are labeled with an extremely high accuracy (over 95%). Hence, as shown in Figure 11, such a pseudo-labeling algorithm can increase the classification accuracy on $S_2$. IE2 and IE3 are variants of IE1. IE2 extracts many true samples and labels them with 100% accuracy, but its false-positive accuracy is 0%. This indicates that IE2 brings no valuable information and thus can neither increase nor decrease the classification accuracy. All indices of IE3 are the sums of the corresponding indices of IE1 and IE2, and consequently, the performance of IE3 lies between those of IE1 and IE2. We can also observe that the pseudo-labeling accuracies of IE2 and IE3 decrease at learning steps 2 and 3, but the classifier accuracy does not decrease. This is because IE2 and IE3 have a high true-positive accuracy, which guarantees that the classifier accuracy does not decrease after the update. In conclusion, it is best to use the a posteriori ex-disagreement in the pseudo-labeling algorithm. CT.95 and CT.8 increase the accuracy of the time-domain classifier but decrease that of the frequency-domain classifier, which is caused by the unequal performances of the two domains. The frequency-domain classifier performs much better, so it acts as a supervisor of the time-domain classifier. ST.95 and ST.8 do not utilize a mutual learning mechanism, and thus have no effect on the classifier accuracy. KM.95 and KM.8 only work under the clustering assumption, which is seldom satisfied when data drift occurs. Hence, the classifier accuracy decreases after updating with the KM methods.
In conclusion, the IE methods can increase the classifier accuracy by incremental learning, whereas the other methods either have no effect or are even counterproductive.
Figure 11. Accuracies of iterative incremental learning. The pseudo-labeling algorithm is conducted on $S_{2.1}$, $S_{2.2}$, and $S_{2.3}$, while the classifier is re-trained incrementally at the end of $S_{2.1}$, $S_{2.2}$, and $S_{2.3}$. The original classifier, which is trained on $S_{1.1}$, is tested on $S_{2.1}$, while the updated classifiers are tested on $S_{2.2}$, $S_{2.3}$, and $S_{2.4}$. The fusion is based on the 2nd scheme. The markers distinguish the fusion accuracy and, for each of the frequency and time domains, the filter, classifier, and pseudo-labeling accuracies.
The time cost is shown in Table 4. It can be observed that IE1, IE2, and IE3 take the shortest time to generate the pseudo-labeled sample set, while KM.95 and KM.8 are the most time-consuming. Unlike KM.95 and KM.8, which can only work after a data chunk has been collected completely, IE1, IE2, IE3, CT.95, CT.8, ST.95, and ST.8 generate the pseudo-labeled samples as soon as the prediction of a vibration frame is finished, so the pseudo-labeling time cost of these methods can be ignored. For the incremental learning part, compared with CT, ST, and KM, IE1 and IE3 use fewer pseudo-labeled samples to train the last classifier incrementally, but are the most time-consuming. This is because the majority of the pseudo-labeled samples generated by IE1 and IE3 are correctly labeled key samples, which cause the classifier to change. Even so, the incremental learning of IE1 and IE3 finishes within one second, which is sufficient for real-time application.

Conclusions
In this paper, we propose a novel vibration-based terrain classification method for autonomous robots working in a dynamic environment, mainly to suppress the effect of data drift during the period in which manual labels are not available. Our main contribution is an ex- and in-disagreement-based learning algorithm, which is verified to be powerful in extracting key samples and labeling them with high accuracy. In order to activate such a learning framework, we divide the vibration view into two domains, which may produce ex-disagreements, and introduce the Bayesian filter to correct the classification results, which may produce in-disagreements. The real-world experiment shows that the proposed DyVTC reaches an accuracy of 89.5%, which outperforms the existing VTC methods.