4.2. Comparison of the Construction and Predictive Performance of Each Model
After processing 432 MPL trajectories (preprocessing, stop analysis, trajectory interpolation, and road network matching), 1831 precise moving segments were derived. To comprehensively evaluate the accuracy of the transportation mode recognition results in this study, as well as the effects of the precision of residents’ travel trajectories, geographical environment features, and navigation features on transportation mode recognition, we constructed three datasets (
D1,
D2 and
D3) for the transportation mode recognition models and conducted comparative experiments. The specific settings are shown in
Table 5. In particular,
D1 and
D3 use precise moving segments as samples, while
D2 uses raw moving segments.
D1 extracts the eight spatiotemporal feature parameters (
Table 4) from its samples, whereas
D2 and
D3 both extract 35 feature parameters from their respective samples. Datasets
D1,
D2, and
D3 have an identical sample size of 1831. Each of the three datasets was used to train the SOM, RF, and XGBoost models, yielding nine models (comparison groups) in total.
For each model, the corresponding dataset was randomly split into a training set (75%) and a test set (25%). The hyperparameters of the nine models were optimized using grid search coupled with 5-fold cross-validation on the training set. The best-performing hyperparameters were used to construct the final travel mode recognition models. These models were subsequently evaluated on the test set, and their predictive performances are presented in
Table 6.
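The experimental design described above (nine dataset and model combinations, a 75%/25% split, and grid search with 5-fold cross-validation) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names, the random seed, and the candidate values in `xgb_grid` are assumptions, since the text reports only the selected optima and not the searched grid.

```python
import itertools
import random

# Nine comparison groups: Model 1 = D1 + SOM, ..., Model 9 = D3 + XGBoost.
ALGORITHMS = ["SOM", "RF", "XGBoost"]
DATASETS = ["D1", "D2", "D3"]
models = {
    i + 1: (dataset, algorithm)
    for i, (algorithm, dataset) in enumerate(itertools.product(ALGORITHMS, DATASETS))
}

def split_indices(n_samples, test_frac=0.25, seed=0):
    """Randomly split sample indices into training and test index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_frac)
    return idx[n_test:], idx[:n_test]

def k_folds(train_idx, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]

def grid(param_grid):
    """Enumerate every hyperparameter combination of a grid."""
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))

# Hypothetical candidate values; only the selected optima appear in the text.
xgb_grid = {
    "max_depth": [2, 5, 7],
    "n_estimators": [150, 300],
    "learning_rate": [0.01, 0.1],
}

train_idx, test_idx = split_indices(1831)
n_candidates = sum(1 for _ in grid(xgb_grid))
```

In a full pipeline, each candidate from `grid` would be fitted on the `fit` indices and scored on the validation indices of every fold, and the candidate with the best mean score would be refitted on the whole training set before evaluation on `test_idx`.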
Figure 6 presents the ROC curves of the nine models on the test set. The optimal hyperparameters and corresponding predictive performance for all nine models are summarized as follows: ① Model 1 (
D1 + SOM) achieved the best fit with a learning rate (η) of 0.01 and a neighborhood radius (γ) of 2.5. On the test set, it attained an AUC of 0.781, an accuracy (
Acc) of 63.2%, and an
F1-score of 0.633. ② Model 2 (
D2 + SOM) performed best with η = 0.01 and γ = 2, yielding an AUC of 0.845, an
Acc of 70.0%, and an
F1-score of 0.702. ③ Model 3 (
D3 + SOM) found its optimal performance with η = 0.01 and γ = 2, resulting in an AUC of 0.86, an
Acc of 74.5%, and an
F1-score of 0.736. ④ Model 4 (
D1 + RF) achieved optimal performance with 500 decision trees (
ntree), obtaining an AUC of 0.87, an
Acc of 76.8%, and an
F1-score of 0.765. ⑤ Model 5 (
D2 + RF) performed best with
ntree = 500, achieving an AUC of 0.905, an
Acc of 84.0%, and an
F1-score of 0.84. ⑥ Model 6 (
D3 + RF) reached its optimal configuration with
ntree = 300, which produced an AUC of 0.908, an
Acc of 89.8%, and an
F1-score of 0.902. ⑦ Model 7 (
D1 + XGBoost) was optimized with a maximum tree depth (
max_depth) of 7, 300 boosting rounds (
n_estimators), and a learning rate (α) of 0.1. It demonstrated strong performance with an AUC of 0.929, an
Acc of 81.5%, and an
F1-score of 0.814. ⑧ Model 8 (
D2 + XGBoost) achieved the best fit with
max_depth = 2,
n_estimators = 150, and α = 0.01, attaining an AUC of 0.948, an
Acc of 86.0%, and an
F1-score of 0.869. ⑨ Model 9 (
D3 + XGBoost) performed best with
max_depth = 5,
n_estimators = 150, and α = 0.1, achieving superior results with an AUC of 0.966, an
Acc of 91.8%, and an
F1-score of 0.92.
Table 6 reports the recognition performance of the three datasets under the SOM, RF, and XGBoost models. Holding the dataset fixed enables cross-model performance comparison. The results reveal that SOM achieves the lowest performance, RF exhibits moderate performance, and XGBoost consistently attains the highest performance across all datasets. In
D1, it outperformed SOM and RF in accuracy by 18.3% and 4.7%, and in
F1-score with respective margins of 0.181 and 0.049. This trend continued in
D2, with accuracy advantages of 16.0% and 2.0%, and
F1-score advantages of 0.167 and 0.029 over SOM and RF. Similarly, in
D3, it led SOM and RF by 17.3% and 2.0% in accuracy, and by 0.184 and 0.018 in
F1-score. These comparisons demonstrate the superior performance of XGBoost in transportation mode recognition, with its accuracy reaching 91.8% in
D3.
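The accuracy margins quoted above can be reproduced directly from the reported test-set accuracies. The short sketch below is only a worked check of that arithmetic; the dictionary layout and the function name `margins` are illustrative, and the differences are in percentage points.

```python
# Test-set accuracy (%) of the nine models, as reported in the text.
ACC = {
    "D1": {"SOM": 63.2, "RF": 76.8, "XGBoost": 81.5},
    "D2": {"SOM": 70.0, "RF": 84.0, "XGBoost": 86.0},
    "D3": {"SOM": 74.5, "RF": 89.8, "XGBoost": 91.8},
}

def margins(scores, best="XGBoost"):
    """Margin of the `best` model over every other model, per dataset."""
    return {
        dataset: {
            model: round(values[best] - value, 1)
            for model, value in values.items()
            if model != best
        }
        for dataset, values in scores.items()
    }

xgb_margins = margins(ACC)
```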
Figure 6 also indicates that Model 9 achieves the highest prediction performance (AUC = 0.966). In summary, Model 9 (
D3 + XGBoost) is the optimal transportation mode recognition model. To assess whether the performance differences are statistically significant, we conducted five repetitions of 10-fold cross-validation on the training set, yielding 50 performance observations (
Acc and AUC) for each model. Subsequent analysis of these results was performed using the Mann–Whitney U test. The results indicated that Model 9 significantly outperforms all other models, with
p-values < 0.05 in the significance tests for both
Acc and AUC when compared to the other models (
Table 7). This conclusion is further supported by the independent test set, where the XGBoost-based model achieved the highest accuracy (91.8%) and the highest AUC (0.966). To gain deeper insights into the stability of this estimation, we performed 10,000 Bootstrap resamplings on the test set and calculated the 95% confidence interval, which was [90.2%, 92.5%]. The narrow range of this interval indicates high confidence in our estimation that the accuracy of Model 9 exceeds 90%.
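The two statistical procedures mentioned above can be illustrated with a small standard-library sketch. It makes several assumptions that the text does not confirm: the Mann–Whitney U test is computed with the normal approximation (midranks for ties, no tie or continuity correction), the confidence interval is a percentile bootstrap, and all input values are hypothetical rather than the study's data.

```python
import math
import random

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # midrank shared by tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

def bootstrap_accuracy_ci(correct, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy from 0/1 correctness flags."""
    rng = random.Random(seed)
    n = len(correct)
    accs = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = accs[int(alpha / 2 * n_boot)]
    upper = accs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical inputs: 50 cross-validation accuracies for each of two models.
acc_a = [0.90 + 0.001 * i for i in range(50)]
acc_b = [0.80 + 0.001 * i for i in range(50)]
u_stat, p_value = mann_whitney_u(acc_a, acc_b)
```

For the interval, `correct` would hold one flag per test sample (1 if the predicted mode equals the true mode, 0 otherwise); each resample draws the same number of samples with replacement and recomputes the accuracy.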
When applying identical models, the results reflect performance variations across datasets (
Table 6). Overall,
D2 outperforms
D1, while
D3 exceeds both (
Table 6). Specifically, with SOM,
D3’s accuracy is 11.3% higher than
D1 and 4.5% higher than
D2. For RF,
D3 shows 13.0% and 5.8% improvements over
D1 and
D2. Using XGBoost,
D3 achieves 10.3% and 5.8% gains over
D1 and
D2. The experiments demonstrate that, for recognizing urban residents’ transportation modes from MPL data, constructing a comprehensive feature parameter system improves accuracy more than trajectory interpolation alone, and that combining both approaches achieves the best recognition performance. Together, these results support the following analytical framework for MPL data: first, address positioning uncertainty through spatiotemporal interpolation of travel trajectories; second, construct a comprehensive feature parameter system incorporating spatiotemporal features of the trajectory data, geographical environment features, and navigation features; and finally, employ the XGBoost machine learning model to develop the transportation mode recognition model.
4.3. Influence of the Precision of Residents’ Travel Trajectories on Recognition Results
Based on the aforementioned findings, which demonstrate the superior performance of the XGBoost model, a further analysis was conducted to elucidate the contribution of precise movement segments and comprehensive feature parameters to transportation mode recognition within this model. Specifically, the results from the three datasets using the XGBoost model were analyzed and evaluated.
Figure 7 shows the confusion matrices of the recognition results for Model 7 (
D1 + XGBoost), Model 8 (
D2 + XGBoost) and Model 9 (
D3 + XGBoost). Based on these results, we further explore the role and impact of the precision of residents’ travel trajectories, geographical environment features, and navigation features on transportation mode recognition.
Figure 7b,c show the confusion matrices of the recognition results for Model 8 and Model 9, revealing the impact of the precision of the residents’ travel trajectories on the accuracy of transportation mode recognition. The results show that, when the same set of travel feature parameters is used, road network matching and precise interpolation of the MPL trajectory improve the recognition accuracy of every transportation mode. The most substantial improvements are observed for buses and cars, with increases of 12% and 6%, respectively. The recognition accuracy increased by 5% for both bicycles and electric bicycles, and improved slightly for walking and subway, by 3% and 2%, respectively. Moreover, the confusion matrices of the two sets of results indicate that, in the absence of road network matching and precise interpolation, misclassification occurs for almost all transportation modes. Walking can be misclassified as bicycle, electric bicycle, bus, or car; misclassification occurs even between walking and car, two transportation modes with substantially different characteristics. Once precise moving segments are constructed, misclassification between transportation modes with pronounced differences is reduced. However, misclassifications remain more frequent between transportation modes with similar travel features, such as walking misclassified as bicycle and bicycle misclassified as walking or electric bicycle. Thus, road network matching and precise interpolation of the MPL trajectory improve the model’s ability to distinguish transportation modes with distinct travel characteristics, but it remains difficult to fully distinguish similar transportation modes (such as walking, bicycle, and electric bicycle, or bus and car).
The results of
Acc and
F1-score for Model 8 and Model 9 presented in
Table 6 indicate that with road network matching and interpolation processing, all the indicators improved by approximately 6%. This further indicates that the construction of precise moving segments is beneficial for improving the recognition accuracy of the urban residents’ transportation mode based on MPL data.
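The per-mode recognition accuracies discussed in this section are read from confusion matrices. As a minimal sketch, assuming the percentages in Figure 7 are row-normalized (the share of each true mode's samples assigned to each predicted mode), they could be computed as below. The function names and the three-mode count matrix are hypothetical and are not taken from Figure 7.

```python
def per_class_recall(confusion, labels):
    """Per-mode recognition accuracy; rows are true modes, columns predictions."""
    recall = {}
    for i, label in enumerate(labels):
        total = sum(confusion[i])
        recall[label] = confusion[i][i] / total if total else float("nan")
    return recall

def overall_accuracy(confusion):
    """Share of all samples that lie on the diagonal of the matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)

# Hypothetical counts for three modes only (not the values of Figure 7).
labels = ["walking", "bicycle", "electric bicycle"]
confusion = [
    [90, 8, 2],   # true walking
    [9, 85, 6],   # true bicycle
    [1, 9, 90],   # true electric bicycle
]
recall = per_class_recall(confusion, labels)
```

Comparing the `recall` dictionaries of two models mode by mode gives the kind of per-mode improvement reported above for Model 8 versus Model 9.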
4.4. Influence of Geographical Environment Features and Navigation Features on Recognition Results
A comparative analysis of the overall recognition results reveals that Model 9 achieved superior performance relative to Model 7. As summarized in
Table 6, Model 9 exhibited a gain of approximately 10% in both accuracy and
F1-score, suggesting that the added geographical environment and navigation features contributed to the enhanced model performance.
Figure 7a,c present the confusion matrices of Model 7 and Model 9 for the recognition results of each transportation mode, further revealing the impacts of geographical environment features and navigation features on the accuracy of transportation mode recognition. The results show that the model’s recognition accuracy for each transportation mode improved with the addition of geographical environment feature parameters and navigation feature parameters. The increase in bus recognition accuracy of 16% is the largest observed, whereas the recognition accuracies of bicycles and cars increased by 14% and 9%, respectively. The improvements in recognition accuracy for electric bicycles, subway and walking are relatively small at 8%, 6% and 7%, respectively. Furthermore, the results indicate that the model distinguishes similar transportation modes better after the addition of geographical environment feature parameters and navigation feature parameters. For example, in the absence of these feature parameters, buses may be misclassified as bicycles, electric bicycles, or cars. In particular, 10% of the bus samples are misclassified as cars, whereas after adding these feature parameters, only 4% of the bus samples are misclassified as cars, and there are no misclassifications as other modes. Similarly, buses, electric bicycles, bicycles and walking all present similar situations, indicating that adding geographical environment features and navigation features is beneficial for improving the model’s ability to distinguish similar transportation modes. A detailed class-wise analysis in
Figure 7c reveals varying recognition accuracy across modes. The subway mode achieved the highest accuracy (97%), with misclassifications confined solely to cars and buses. This was followed by the bus mode (95%), which was only mistaken for cars. Walking and car modes attained accuracies of 93% and 91%, respectively, and were confused only with their respective similar modes. In contrast, the bicycle (87%) and electric bicycle (85%) modes achieved lower accuracy due to their confusion with a broader set of alternatives, such as walking, buses, and each other.
In summary, the proposed comprehensive feature parameter system considerably enhances the recognition accuracy for individual transportation modes and effectively reduces misclassification between similar modes, such as bicycles and electric bicycles. Distinguishing between these two modes is notoriously challenging when relying solely on spatiotemporal feature parameters. Our framework addresses this challenge by integrating navigation features with refined spatiotemporal dynamics, which collectively reveal systematic differences in travel behavior and kinematic potential. First and most critically, navigation feature parameters provide powerful indirect evidence for discrimination. For each moving segment, we synchronously obtained the navigation time and navigation distance for both modes. The actual travel time and distance of an authentic electric bicycle trip will closely match the navigation-calculated values for an electric bicycle (
NTE and
NDE) while deviating from those for a conventional bicycle (
NTBi and
NDBi) to a large extent. As shown in
Figure 8, the gain-based feature importance analysis confirms that
NDBi,
NTE, and
NTBi are among the most critical features, indicating that the model successfully learns the complex matching relationship between the observed trajectory and the navigation predictions for each mode. Second, refined spatiotemporal features capture the micro-level kinematic differences between the two modes. Despite similar trajectory shapes, electric bicycles systematically achieve higher average and maximum speeds due to power assistance. Furthermore, the assisted propulsion results in more stable power output, leading to a lower velocity variance during acceleration or uphill climbing compared to human-powered bicycles, which exhibit greater speed fluctuations. These features effectively complement and reinforce the evidence provided by the navigation parameters.
Figure 8 also confirms that
maximum speed and
average speed are key discriminative features. Finally, geographical environment feature parameters—such as the distance from the trip origin/destination to bicycle parking points (
DONBi/
DDNBi) and the proximity of trajectory points to dedicated bicycle lanes (
RPCC)—provide additional contextual information, further enhancing the robustness of the classification.
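The matching relationship between an observed segment and the per-mode navigation predictions can be made concrete with a small sketch. Note that this explicit mismatch score is only an illustration of the intuition: in the study, the navigation times and distances enter the XGBoost model as raw features, and the model learns the relationship itself. The function names, the equal weighting of time and distance, the units, and the example values are all assumptions.

```python
def relative_deviation(actual, navigated):
    """Absolute deviation of an observed value from a navigation estimate."""
    return abs(actual - navigated) / navigated

def navigation_mismatch(travel_time, travel_dist, nav):
    """Mean relative deviation in time and distance for each candidate mode.

    `nav` maps a mode name to its (navigation time, navigation distance).
    """
    return {
        mode: (relative_deviation(travel_time, nav_time)
               + relative_deviation(travel_dist, nav_dist)) / 2
        for mode, (nav_time, nav_dist) in nav.items()
    }

# Hypothetical segment: 12 min and 3.1 km observed; navigation estimates
# play the role of (NTE, NDE) and (NTBi, NDBi), in minutes and kilometres.
nav = {"electric bicycle": (11.0, 3.0), "bicycle": (17.0, 3.0)}
mismatch = navigation_mismatch(12.0, 3.1, nav)
closest = min(mismatch, key=mismatch.get)
```

In this made-up example the observed segment deviates far less from the electric bicycle estimate than from the bicycle estimate, which is the kind of evidence the navigation features are described as providing.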
4.5. Analysis of Misclassification
In
Figure 7c, we can observe the misclassification between different modes. We find that 8% of the walking samples are misclassified as bicycles, and 9% of the bicycle samples are misclassified as walking. Additionally, electric bicycles are misclassified mainly as bicycles or cars, with misclassification rates of 8% and 6%, respectively. The misclassification rate between cars and buses is 8%. These five cases (walking misclassified as bicycle, bicycle as walking, electric bicycle as bicycle, electric bicycle as car, and car as bus) are the most common types of misclassification. Therefore, this study conducted an in-depth exploration of the reasons for these five types of misclassification.
To explore the reasons for the generation of these five most common types of misclassified samples, we set up five groups of misclassification contrast sets, as shown in
Table 8. The first group in
Table 8 includes Class A and Class A′. The samples that are actually walking but misclassified as bicycles are defined as Class A, whereas the samples that are actually bicycles and correctly recognized as bicycles are defined as Class A′. The second group consists of Class B and Class B′. The samples that are actually bicycles but misclassified as walking are defined as Class B, whereas the samples that are actually walking and correctly recognized as walking are defined as Class B′. The third group is composed of Class C and Class C′. The samples that are actually electric bicycles but misclassified as bicycles are defined as Class C, whereas the samples that are actually bicycles and correctly recognized as bicycles are defined as Class C′. Class D and Class D′ belong to the fourth group. The samples that are actually electric bicycles but misclassified as cars are defined as Class D, whereas the samples that are actually cars and correctly recognized as cars are defined as Class D′. The fifth group comprises Class E and Class E′. The samples that are actually cars but misclassified as buses are defined as Class E, whereas the samples that are actually buses and correctly recognized as buses are defined as Class E′. We performed a comprehensive analysis of the 35 feature parameters (
Table 4) for misclassified samples and the corresponding 35 feature parameters for correctly classified samples among the five groups of misclassification contrast sets. We quantified the similarity between them by using the Pearson correlation coefficient and Spearman correlation coefficient.
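For reference, the two similarity measures named above can be computed as in the following standard-library sketch. It takes two equal-length numeric sequences; how the values of a misclassified class and its correctly classified counterpart are paired for each feature parameter is not stated in the text, so that pairing is left to the caller. The function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))
```

Pearson measures linear association, whereas Spearman depends only on rank order, so a monotonic but nonlinear relationship yields a Spearman coefficient of 1 and a Pearson coefficient below 1.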
As shown in
Figure 9a, the navigation feature parameters (such as
NDS,
NTS, and
NTE) and some geographical environment feature parameters (such as
Road density,
RPCS, and
DONBu) are not strongly correlated between Class A and Class A′. However,
Average speed,
RPCC, and
DONBi are highly correlated. Given actual travel habits, this misclassification is most likely because pedestrians and cyclists often share non-motorized vehicle lanes and other transportation facilities. As shown in
Figure 9b, the spatiotemporal feature parameters such as
Average speed,
Minimum speed, and
Travel distance, are highly correlated between Class B and Class B′. Given actual travel habits, this misclassification is very likely because elderly and female cyclists ride at slower speeds, which makes the spatiotemporal characteristics of cycling similar to those of walking. As shown in
Figure 9c, the parameters such as
Travel distance,
Minimum speed,
NTBi, and
NDBi exhibit clear correlations between Class C and Class C′. This misclassification may arise from short-distance and low-speed travel associated with the different models and speed limits of electric bicycles. In this scenario, the travel characteristics are often similar to those of bicycles, so misclassification is likely. As shown in
Figure 9d, the parameters such as
Maximum speed,
Velocity range, and
AHD_B are markedly correlated between Class D and Class D′. Combined with the actual situation, it can be seen that the misclassification of this group may be due to the similarity in maximum speed between certain electric bicycles and cars, as well as the habit of some users traveling to commercial areas by using electric bicycles. As shown in
Figure 9e, parameters such as
Minimum speed and
Velocity range demonstrate a notable correlation between Class E and Class E′. This may be due to the similarity in speed characteristics between cars and buses under traffic congestion, which leads to misclassification. In general, misclassified samples often result from special circumstances, such as congestion and travel by elderly residents, which cause travel characteristics to closely resemble those of other categories.