A Bayesian-Optimized Mixture of Experts Framework for Short-Term Traffic Flow Prediction
Abstract
1. Introduction
2. Literature Review
3. Methods
3.1. Problem Definition
3.2. Multi-Scale Traffic Flow Prediction Using BO-MoE
3.2.1. Input Feature Layer
3.2.2. MoE Module
3.2.3. BO Algorithm
| Algorithm 1. BO-MoE for Traffic Prediction |
|---|
| **Input:** traffic flow data $X$, hyperparameter space $\mathcal{X}$, prediction horizon $s$, number of iterations $T$ |
| **Output:** predicted traffic flow $\hat{Y}$ |
| **Procedure:** |
| Initialize the GP surrogate model |
| **for** $i = 1$ **to** $T$ **do** |
| // GP posterior prediction |
| Compute the posterior mean and variance |
| // Select hyperparameters using EI |
| $x_i = \arg\max_{x \in \mathcal{X}}$ Expected_Improvement($x \mid \mathrm{GP}$) |
| // Build and train the MoE model with the selected hyperparameters |
| MoE_model = Build_MoE($x_i$) |
| MoE_model = Train_MoE(Model = MoE_model, Data = $X$) |
| // Evaluate model performance on the validation set |
| $y_i$ = Evaluate_MAE(MoE_model, Validation_Data) |
| // Update the surrogate model |
| GP = Update_GP(GP, $x_i$, $y_i$) |
| **end for** |
| // Select the optimal hyperparameters |
| $x^{*} = x_j$, where $j = \arg\min_{i} y_i$ |
| // Build and train the final model |
| Final_MoE = Build_MoE($x^{*}$) |
| Final_MoE = Train_MoE(Model = Final_MoE, Data = $X$) |
| // Generate predictions |
| $\hat{Y}$ = Predict(Final_MoE, $s$) |
| **return** $\hat{Y}$ |

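
The loop in Algorithm 1 can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: hyperparameters are assumed to be rescaled to the unit hypercube, the surrogate is initialized from a few random evaluations (Algorithm 1 does not specify the initial design), the GP uses a fixed RBF kernel (a fuller implementation would typically fit kernel hyperparameters by maximizing the marginal likelihood), EI is maximized by scoring random candidates, and the hypothetical `toy_validation_mae` function stands in for the Build_MoE, Train_MoE and Evaluate_MAE steps. Refitting the GP on all evaluated points at each iteration plays the role of Update_GP. Integer-valued hyperparameters such as layer counts or attention heads would have to be mapped back from the unit interval and rounded before each training run.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit variance between rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """GP posterior mean and standard deviation at x_query (targets standardized internally)."""
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-12
    y_norm = (y_train - y_mean) / y_std
    chol = np.linalg.cholesky(rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train)))
    k_star = rbf_kernel(x_train, x_query)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_norm))
    v = np.linalg.solve(chol, k_star)
    mean = k_star.T @ alpha
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)  # prior variance k(x, x) = 1
    return mean * y_std + y_mean, np.sqrt(var) * y_std


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best_y, xi=0.01):
    """EI for minimization: expected amount by which the objective falls below best_y - xi."""
    improvement = best_y - mean - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def bayesian_optimize(objective, dim, n_init=5, n_iter=20, n_candidates=2000, seed=0):
    """Minimize objective over [0, 1]^dim with a GP surrogate and the EI acquisition."""
    rng = np.random.default_rng(seed)
    xs = rng.random((n_init, dim))  # initial design (assumption)
    ys = np.array([objective(x) for x in xs])
    for _ in range(n_iter):
        candidates = rng.random((n_candidates, dim))
        mean, std = gp_posterior(xs, ys, candidates)  # GP posterior prediction
        ei = expected_improvement(mean, std, ys.min())
        x_next = candidates[np.argmax(ei)]  # select hyperparameters using EI
        y_next = objective(x_next)  # build, train and evaluate (stand-in)
        xs = np.vstack([xs, x_next])  # update the surrogate's training data
        ys = np.append(ys, y_next)
    best = int(np.argmin(ys))  # x* = evaluated point with the lowest validation error
    return xs[best], ys[best]


def toy_validation_mae(x):
    """Hypothetical stand-in for Build_MoE -> Train_MoE -> Evaluate_MAE."""
    return float((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)


best_x, best_y = bayesian_optimize(toy_validation_mae, dim=2)
print(best_x, best_y)
```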
4. Results and Discussion
4.1. Evaluation Metrics
4.2. Dataset
4.3. Implementation Details
4.4. Comparative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Hyperparameter | PEMS04 | PEMS08 | NZ |
|---|---|---|---|
| Learning rate | 0.0005 | 0.0001 | 0.0003 |
| Dropout | 0.006 | 0.004 | 0.002 |
| BiLSTM hidden size | 32 | 256 | 128 |
| BiLSTM layers | 3 | 2 | 3 |
| TCN channels | [64, 128, 256] | [96, 192, 384] | [96, 192, 384] |
| TCN kernel size | 3 | 2 | 3 |
| Transformer hidden size | 64 | 64 | 128 |
| Transformer attention heads | 8 | 8 | 4 |
| Transformer layers | 2 | 3 | 3 |
| Expert gating hidden size | 32 | 128 | 32 |
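
The gating hidden sizes in the table above imply a learned gate that weights the expert outputs. The NumPy sketch below shows a standard dense softmax-gated mixture of experts with random, untrained weights, only to make the data flow concrete. The gate architecture (one ReLU hidden layer), its input features, the three experts and all tensor shapes are assumptions for illustration, not the paper's reported design; `SoftmaxGate` and `mixture_forecast` are hypothetical names.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


class SoftmaxGate:
    """Hypothetical gating network: input features -> one weight per expert."""

    def __init__(self, in_dim, hidden, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_experts))
        self.b2 = np.zeros(n_experts)

    def __call__(self, features):
        hidden = np.maximum(features @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return softmax(hidden @ self.w2 + self.b2)  # (batch, n_experts), rows sum to 1


def mixture_forecast(expert_preds, gate_weights):
    """Weighted sum of expert forecasts.

    expert_preds: (n_experts, batch, horizon); gate_weights: (batch, n_experts).
    """
    return np.einsum("kbh,bk->bh", expert_preds, gate_weights)


rng = np.random.default_rng(1)
features = rng.normal(size=(4, 12))  # hypothetical: 12 past readings per sample
expert_preds = rng.normal(size=(3, 4, 12))  # stand-ins for BiLSTM, TCN, Transformer forecasts
gate = SoftmaxGate(in_dim=12, hidden=32, n_experts=3)  # 32 = PEMS04 gating hidden size
weights = gate(features)
y_hat = mixture_forecast(expert_preds, weights)
print(weights.sum(axis=1), y_hat.shape)  # each row of weights sums to 1; shape (4, 12)
```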

| Abbreviations | Expert Combinations |
|---|---|
| LSTM-TCN | |
| LSTM-TCN-Transformer | |
| LSTM-GRU-Transformer | |
| GRU-TCN-Transformer | |
| BiLSTM-GNN-Transformer | |
| BiLSTM-GRU-TCN | |
| BiLSTM-AGC-Transformer | |
| BiLSTM-TCN-Transformer | |

| PEMS04 MAE | PEMS04 RMSE | PEMS04 MAPE | PEMS04 R² | PEMS08 MAE | PEMS08 RMSE | PEMS08 MAPE | PEMS08 R² | NZ MAE | NZ RMSE | NZ MAPE | NZ R² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18.439 | 30.717 | 0.122 | 0.958 | 13.583 | 22.705 | 0.099 | 0.972 | 19.869 | 41.203 | 0.282 | 0.939 |
| 18.466 | 30.794 | 0.122 | 0.957 | 13.489 | 22.694 | 0.088 | 0.972 | 19.637 | 41.075 | 0.275 | 0.936 |
| 19.566 | 31.586 | 0.146 | 0.955 | 14.209 | 23.128 | 0.096 | 0.971 | 20.232 | 42.63 | 0.271 | 0.935 |
| 18.394 | 30.688 | 0.122 | 0.958 | 13.273 | 22.611 | 0.088 | 0.972 | 20.222 | 43.094 | 0.275 | 0.932 |
| 18.375 | 30.74 | 0.121 | 0.958 | 13.346 | 22.681 | 0.088 | 0.972 | 19.719 | 41.515 | 0.28 | 0.938 |
| 18.522 | 30.769 | 0.123 | 0.957 | 14.012 | 23.032 | 0.094 | 0.971 | 19.685 | 41.258 | 0.27 | 0.939 |
| 18.475 | 30.884 | 0.122 | 0.957 | 13.607 | 22.815 | 0.089 | 0.972 | 19.908 | 40.743 | 0.27 | 0.942 |
| 18.403 | 30.792 | 0.122 | 0.957 | 13.086 | 22.631 | 0.087 | 0.972 | 19.938 | 42.525 | 0.269 | 0.932 |

| Model | PEMS04 MAE | PEMS04 RMSE | PEMS04 MAPE | PEMS04 R² | PEMS08 MAE | PEMS08 RMSE | PEMS08 MAPE | PEMS08 R² | NZ MAE | NZ RMSE | NZ MAPE | NZ R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | 19.740 | 32.271 | 0.132 | 0.952 | 15.108 | 24.494 | 0.101 | 0.965 | 21.051 | 42.448 | 0.289 | 0.940 |
| GRU | 19.665 | 32.220 | 0.133 | 0.952 | 14.929 | 24.361 | 0.100 | 0.966 | 21.834 | 46.000 | 0.282 | 0.931 |
| BiLSTM | 19.298 | 31.643 | 0.128 | 0.954 | 14.669 | 23.877 | 0.098 | 0.968 | 20.469 | 41.567 | 0.275 | 0.939 |
| TCN | 18.561 | 30.858 | 0.124 | 0.957 | 13.681 | 22.775 | 0.091 | 0.972 | 20.360 | 42.930 | 0.279 | 0.933 |
| Transformer | 19.833 | 31.643 | 0.162 | 0.955 | 14.429 | 23.136 | 0.128 | 0.971 | 21.164 | 42.746 | 0.345 | 0.935 |
| ASTGCN | 21.905 | 36.331 | 0.144 | 0.941 | 15.651 | 25.856 | 0.107 | 0.963 | 23.143 | 48.813 | 0.310 | 0.925 |
| DGCN | 20.598 | 33.288 | 0.146 | 0.920 | 17.425 | 26.809 | 0.120 | 0.919 | 25.158 | 51.518 | 0.396 | 0.824 |
| PGECRN | 19.682 | 32.220 | 0.134 | 0.953 | 14.571 | 24.515 | 0.102 | 0.967 | 34.791 | 76.372 | 0.404 | 0.880 |
| BO- | 18.42 | 30.778 | 0.126 | 0.958 | 13.564 | 22.891 | 0.093 | 0.971 | 19.986 | 42.933 | 0.271 | 0.933 |
| BO- | 18.338 | 30.687 | 0.122 | 0.958 | 13.495 | 22.751 | 0.091 | 0.972 | 19.953 | 43.071 | 0.275 | 0.935 |
| BO- | 18.195 | 30.490 | 0.120 | 0.958 | 13.157 | 22.488 | 0.086 | 0.972 | 19.742 | 40.504 | 0.267 | 0.936 |
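
The comparison tables report MAE, RMSE, MAPE and a fourth goodness-of-fit column. A minimal NumPy sketch of these metrics follows, assuming the standard definitions, that MAPE is reported as a fraction (0.122 rather than 12.2%), that the fourth column is the coefficient of determination R², and that zero-flow readings are excluded from MAPE; the paper's exact conventions (masking, averaging over horizons and sensors) may differ. The hypothetical `relative_reduction` helper shows how a table comparison translates into a percentage, using the PEMS04 MAE of the LSTM baseline and of the last BO- row.

```python
import numpy as np


def traffic_metrics(y_true, y_pred, eps=1e-8):
    """MAE, RMSE, MAPE (as a fraction) and R^2 under standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    nonzero = np.abs(y_true) > eps  # assumption: zero-flow readings are skipped in MAPE
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "MAPE": float(np.mean(np.abs(err[nonzero]) / np.abs(y_true[nonzero]))),
        "R2": float(1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean()) ** 2)),
    }


def relative_reduction(baseline, proposed):
    """Fractional error reduction of `proposed` versus `baseline` (lower error is better)."""
    return (baseline - proposed) / baseline


y_true = [100, 200, 0, 400]
y_pred = [110, 190, 5, 380]
print(traffic_metrics(y_true, y_pred))  # MAE 11.25, RMSE 12.5

# PEMS04 MAE from the comparison table: LSTM 19.740 vs. the last BO- row 18.195
print(round(relative_reduction(19.740, 18.195), 4))  # 0.0783
```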

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, J.; Ren, J.; Wang, H.; Xie, F.; Chen, S.; Jiang, M. A Bayesian-Optimized Mixture of Experts Framework for Short-Term Traffic Flow Prediction. Modelling 2026, 7, 55. https://doi.org/10.3390/modelling7020055

