In this section, a series of experiments was conducted to substantiate the efficacy of the proposed PMLNet. All experiments were performed on an i7-8700K CPU (3.70 GHz) with 32 GB of RAM, using the TensorFlow 1.14 framework.
  4.1. Dataset Partition
As previously mentioned, after pre-processing, the raw datasets are divided into four sub-datasets: the Current Driving Segment Travel Time dataset, the Bus Stop Dwelling Time dataset, the Stop-to-Stop Travel Time dataset, and the Transfer Point Waiting Time dataset. These sub-datasets are used to train and evaluate the four sub-modules, and each is further divided into training, validation, and testing datasets.
It should be noted that, to validate the performance of the proposed PMLNet, a pre-partitioning process is conducted in which a separate testing dataset is held out in advance. The corresponding results of the dataset partition are shown in Table 6.
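The pre-partitioning step can be illustrated with a minimal sketch. It assumes that the records are ordered in time and that the held-out testing data are taken from the end of the sequence; the function name `partition` and both split fractions are hypothetical placeholders, and the actual partition sizes are those reported in Table 6.

```python
def partition(records, test_frac=0.1, val_frac=0.1):
    """Hold out a testing set first, then split the rest into training and validation.

    Both fractions are hypothetical placeholders, not the values used in the paper.
    """
    n = len(records)
    n_test = int(n * test_frac)
    # Pre-partition: set the testing records aside before any other split.
    held_out = records[n - n_test:]
    remaining = records[:n - n_test]
    # Split what is left into training and validation parts.
    n_val = int(len(remaining) * val_frac)
    train = remaining[:len(remaining) - n_val]
    val = remaining[len(remaining) - n_val:]
    return train, val, held_out


# Toy example with 100 integer "records".
train, val, test = partition(list(range(100)))
```

Because the testing records are removed before the training and validation split, they cannot leak into model fitting or parameter tuning.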
  4.2. Model Training
As mentioned in Section 3.1.3, two basic models, namely LSTM and the proposed MDARNN, are employed in this study. The model selection is task-specific. For Current Driving Segment Travel Time and Transfer Point Waiting Time, which are locally driven many-to-one prediction tasks, LSTM is adopted, as it efficiently captures short-term dependencies. For Bus Stop Dwelling Time and Stop-to-Stop Travel Time, which involve long-term dependencies and require the integration of local factors in many-to-many predictions, MDARNN is used. Its attention mechanisms effectively filter local features, making it suitable for the complexity of long-sequence tasks. The corresponding parameter settings are described as follows.
- MDARNN
  - Number of input features: 55 features in total, comprising 7 macro impact factors and 48 local impact factors;
  - Encoder hidden layer length: 20 units;
  - Decoder hidden layer length: 30 units;
  - Time step: 5;
  - Batch size: 160;
  - Initial learning rate: 0.006.

- LSTM
  - Number of input features: 55 features in total, comprising 7 macro impact factors and 48 local impact factors;
  - Number of neurons: 160 units;
  - Fully connected layer neurons (Current Driving Segment Travel Time module): 10 units;
  - Fully connected layer neurons (Transfer Point Waiting Time module): 3 units;
  - Batch size: 160;
  - Initial learning rate: 0.006.
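To make the input shape implied by these settings concrete, the following sketch builds many-to-one samples from a feature sequence. It assumes a sliding window of 5 time steps over 55-dimensional feature vectors, with the target taken at the last step of each window; the function name `make_windows` and this windowing scheme are illustrative assumptions rather than the paper's documented pipeline.

```python
TIME_STEP = 5     # time step from the parameter list
N_FEATURES = 55   # 7 macro + 48 local impact factors


def make_windows(features, targets, time_step=TIME_STEP):
    """Pair each window of `time_step` consecutive feature vectors
    with the target at the window's last step (many-to-one layout)."""
    samples = []
    for end in range(time_step, len(features) + 1):
        window = features[end - time_step:end]
        samples.append((window, targets[end - 1]))
    return samples


# Toy data: 8 time steps of zero-valued features with targets 0..7.
features = [[0.0] * N_FEATURES for _ in range(8)]
targets = list(range(8))
samples = make_windows(features, targets)
```

Each sample is a 5 x 55 window, so a sequence of 8 time steps yields 4 samples.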
 
As shown in Table 7, the initial learning rate (step size) of all four modules is set to 0.006, and the Adam optimizer dynamically adjusts the learning rate and updates the network’s weights and biases. Each module is trained for 100 epochs with a batch size of 160 instances.
All four modules use the mean absolute error (MAE) as their loss function, and the Adam optimizer searches for the combination of parameters that minimizes the MAE on the training set. The details of the algorithm are shown in Algorithm 3, and the corresponding explanation is provided in Table 8.
        
| Algorithm 3: Adam Algorithm | 
| Input: Output:
 
 ![Applsci 15 08104 i003]() | 
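As a rough illustration of how Adam minimizes an MAE loss, the sketch below applies the standard Adam update rule to a toy one-feature linear model. Only the learning rate of 0.006 comes from the settings above. The values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are assumptions here, as are the toy model, the number of steps, and the function names.

```python
import math


def adam_minimize_mae(xs, ys, lr=0.006, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Fit y ~ w*x + b by minimizing MAE with the standard Adam update."""
    params = [0.0, 0.0]   # [w, b]
    m = [0.0, 0.0]        # first-moment (mean) estimates
    v = [0.0, 0.0]        # second-moment (uncentered variance) estimates
    n = len(xs)
    for t in range(1, steps + 1):
        # Subgradient of MAE = mean(|w*x + b - y|) with respect to w and b.
        grad = [0.0, 0.0]
        for x, y in zip(xs, ys):
            err = params[0] * x + params[1] - y
            sign = (err > 0) - (err < 0)
            grad[0] += sign * x / n
            grad[1] += sign / n
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


def mae(xs, ys, w, b):
    """Mean absolute error of the linear model on the data."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = adam_minimize_mae(xs, ys)
initial_loss = mae(xs, ys, 0.0, 0.0)
final_loss = mae(xs, ys, w, b)
```

Because the MAE subgradient does not shrink near the optimum, the parameters keep oscillating in a small neighborhood of the minimizer instead of settling exactly on it; the per-parameter scaling by the second-moment estimate keeps each step on the order of the learning rate.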
The training performance of the four sub-models is shown in Figure 9a–d. In each panel, the blue line is the loss curve on the training set and the yellow line is the loss curve on the validation set. In the left plot, the vertical axis is the error measured by MAE, which drives the parameter updates. In the right plot, the vertical axis is the error measured by mean squared error (MSE), which does not take part in the parameter updates and serves only as a reference curve for tuning the model. The horizontal axis is the number of training epochs. As the number of epochs increases, the training and validation losses of all four models converge well.
  4.4. Contrast Experiments
To assess the effectiveness of the proposed model, PMLNet, seven alternative methods are selected for comparison. Among them, Historical Average (HA), Support Vector Regression (SVR), and Partitioning and Combination Framework Linear Regression (PCF-LR) are existing methods, while PCF-unweight, PCF-LSTM, PCF-DARNN, and Pure MDARNN serve as ablation experiments.
HA: This method uses historical travel records as the basic data and takes the average of the historical records as the predicted travel time [23].
SVR (Support Vector Regression): SVR is employed to predict the total bus travel time, following the methodology outlined in [29].
PCF-unweight: The current travel time is derived directly from the sum of the outputs of all four sub-modules, without weighting by the real-time traffic flow impact factors, as shown in Equation (24).
PCF-LSTM: The method predicts all four time periods by utilizing LSTM, with four real-time traffic impact factors being employed for calibration.
PCF-DARNN: Instead of MDARNN, this method employs DA-RNN to predict Stop-to-Stop Travel Time and Bus Stop Dwelling Time. The total predicted travel time is weighted and summed by four real-time traffic impact factors.
Pure MDARNN: This approach uses the MDARNN model introduced in this work to predict bus travel time without the partitioning and combination framework. The model’s predictions are then calibrated using traffic factors as in Equation (25), whose two terms represent, respectively, the travel time of the qth bus on the route predicted by the MDARNN model in the first 30 min and the true travel time of the qth bus on the route in the first 30 min.
PCF-LR: PCF-LR combines the historical average method for predicting Transfer Point Waiting Time with an LSTM model for travel time prediction. The total travel time is then obtained using linear regression. This method is employed for Bus Arrival Time prediction in [9].
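Two of the simpler baselines above can be sketched directly from their descriptions. The code below is illustrative only: the function names and the numeric inputs are hypothetical, and the weighting by real-time traffic impact factors used by PMLNet is deliberately not reproduced because its equation is not restated in this section.

```python
def historical_average(history):
    """HA baseline: predict the mean of past travel times (minutes)."""
    return sum(history) / len(history)


def pcf_unweight(segment, dwell, stop_to_stop, transfer_wait):
    """PCF-unweight: plain sum of the four sub-module outputs (minutes),
    with no real-time traffic impact factor weighting."""
    return segment + dwell + stop_to_stop + transfer_wait


# Hypothetical values in minutes, not taken from the paper's data.
ha_prediction = historical_average([30.0, 32.0, 34.0])
unweighted_total = pcf_unweight(2.0, 3.0, 20.0, 5.0)
```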
In this study, four bus routes are considered, and the results of the performance comparison are presented in Table 10. All experiments were conducted in the same environment and on the same dataset, processed as described in this paper and partitioned as shown in Table 6.
PMLNet outperforms the other models on all routes, confirming the effectiveness of the parallel prediction framework. Notably, MAPE decreases slightly as route distance increases (Route 4 is the shortest). This is because bus travel time depends on travel status, real-time traffic congestion, stop dwelling time, and transfer waiting time. On shorter routes, the road sections are shorter and there are fewer bus stops, so the predicted congestion status, travel time between stops, dwelling time at stops, and waiting time at the transfer point vary widely. As the travel distance increases, the predicted values of the four sub-modules tend to stabilize, resulting in better relative accuracy. In contrast, longer routes tend to have a larger MAE, because their average travel time is greater and the absolute error accumulates as the travel time increases. For this reason, Route 4, with the shortest travel distance, has a higher MAPE and a lower MAE than the other routes.
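The opposite trends of MAE and MAPE across route lengths follow from how the two metrics are defined, which a small worked example can make concrete. The travel times below are hypothetical and are not taken from Table 10; the function names are likewise illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs (minutes)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical travel times in minutes.
short_actual, short_pred = [10.0, 12.0], [10.5, 11.5]   # short route, 0.5 min errors
long_actual, long_pred = [50.0, 60.0], [51.0, 59.0]     # long route, 1.0 min errors

short_mae, short_mape = mae(short_actual, short_pred), mape(short_actual, short_pred)
long_mae, long_mape = mae(long_actual, long_pred), mape(long_actual, long_pred)
```

In this example the short route has the smaller absolute error but the larger percentage error, because each error is divided by a much shorter travel time.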
Figure 12 shows the MAE and MAPE of the four models on the different routes. The MAE and MAPE of PCF-DARNN are higher than those of PMLNet, suggesting that MDARNN, which considers macro impact factors, is more suitable than DA-RNN for bus travel time prediction. Pure MDARNN performs worse in both MAE and MAPE than the models that use the partitioning and combination framework, possibly because the framework assigns different neural networks according to the characteristics of the four travel time components. The performance of PCF-unweight on the four routes is inferior to that of PMLNet, which demonstrates that the real-time traffic factors are effective in reducing the deviation of the bus travel time prediction.
The overall performance of PMLNet is also the best, as shown in Table 11. Compared with the other methods, PMLNet achieves low values of both MAE and MAPE. The MAPE is 2.91%, which indicates the effectiveness of the PMLNet model. The MAE is 1.45 min, meaning that the estimated travel time differs from the actual travel time by 1.45 min on average across the four routes, a deviation small enough to be acceptable to passengers. The experimental results show that the correction factor improves the prediction accuracy on each route by about 15–40% by integrating real-time traffic flow information.
  4.5. Bus Travel Time Prediction
To validate the effectiveness of PMLNet, a bus travel time prediction system based on PMLNet is designed. As shown in Figure 13, the system employs a client/server model to achieve travel time prediction. The client handles real-time data collection and processing, travel time prediction, and output display, while the server handles database access and PMLNet training. The data collected by the client are sent to the server and stored in the server database.
The client first collects and processes the real-time macro and local factors. It then sends the real-time location information to the client page and sends the real-time data to both PMLNet and the server. After obtaining the features, the PMLNet module outputs the predicted bus travel time. Finally, the client page displays the real-time location information and the predicted bus travel time. On the server side, the real-time data, actual travel time, and predicted travel time are stored in the database. As sufficient data accumulate, the model can undergo incremental training on the server to enhance its generalization capability.
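The server-side bookkeeping described here can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class names `TripRecord` and `Server`, the `store` method, and the retraining threshold are all hypothetical, and the paper specifies neither the database schema nor the amount of data required before incremental training.

```python
from dataclasses import dataclass, field


@dataclass
class TripRecord:
    features: list         # real-time macro and local factors
    predicted_min: float   # travel time predicted on the client
    actual_min: float      # travel time observed after the trip


@dataclass
class Server:
    retrain_threshold: int = 3   # hypothetical; not specified in the paper
    database: list = field(default_factory=list)

    def store(self, record):
        """Store a record and report whether enough data has accumulated
        to trigger incremental training."""
        self.database.append(record)
        return len(self.database) >= self.retrain_threshold


# Three hypothetical trips sent from the client to the server.
server = Server()
flags = [server.store(TripRecord([0.0], 30.0, 31.0 + i)) for i in range(3)]
```

Storing the predicted and actual travel times side by side is what allows the accumulated records to serve later as labeled data for incremental training.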
The system was tested on a XIAOMI 10 on 15 June 2023. The test trip ran from Route 805 to Route 202, starting at Jialing Garden and ending at Golden Fort, with the tester transferring at Shapingba Station.
The client screenshots show the user interface of the bus travel time prediction system. This interface is designed to provide users with key information about their bus trips. The main elements displayed include the route map, predicted and actual travel times, and important station markers. Before starting their trip, users can interact with the interface by checking the predicted times to plan their commute. After the trip, they can compare the actual travel times with the predicted ones to assess the system’s accuracy.
Figure 14 shows the client page at the start location and at the destination. Panel (a) shows the predicted travel time, which consists of two driving periods and one waiting period, while panel (b) shows the actual travel time of the three periods. The test results reveal that the error between the predicted travel time and the actual travel time is less than two minutes, which indicates that the bus travel time prediction system can accurately identify the real-time location of the bus and predict the travel time with high accuracy.