1. Introduction
With rising population, vehicle ownership, and traffic congestion, public transport systems are needed to make cities more sustainable and viable. One factor that affects the efficiency of a public transport system is bus arrival time, which directly influences commuter satisfaction. Accurate prediction of bus arrival times helps transport agencies and city planners deliver better services and reduce operational problems caused by traffic delays. Traffic congestion is the main reason for bus arrival delays. Predicting traffic density with modern machine learning (ML) techniques, and using that prediction to estimate bus arrival times, is one way to tackle these delays. Traffic density is the number of vehicles on the road at a given time. It is an important parameter because it affects speed and travel time: a higher density of private cars makes total travel time longer than expected. Historically, bus schedules and arrival time predictions relied on static timetables and basic traffic forecasts that could not adjust to real-time traffic conditions. Urban traffic flows can evolve quickly, which makes prediction and control increasingly complicated, and inaccurate predictions lead to dissatisfaction among commuters. Machine learning can analyze large datasets to find patterns that improve prediction accuracy. The goal of this work is to build a machine learning-based framework for traffic density modeling and bus arrival time prediction.
The dataset used in this research contains over one million instances with spatial, temporal, and traffic variables such as longitude, latitude, time, date, average speed, and number of vehicles; vehicle locations are encoded using geohashes. Based on these variables, traffic density is classified as “1 (High)” or “0 (Low)”. ML models such as Logistic Regression (LG), Naïve Bayes (NB), Gradient Boosting (GB), K-Nearest Neighbors (K-NN), and Support Vector Machine (SVM) can capture complex relationships among these variables that traditional methods struggle to represent [1].
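As an illustration of the geohash encoding mentioned above, the following minimal sketch implements the standard public geohash algorithm in plain Python. The function name `geohash_encode` and the chosen precision are assumptions made for this example; the precision used in the dataset is not stated in the paper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(latitude, longitude, precision=6):
    """Encode a latitude/longitude pair as a geohash string (illustrative sketch)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bit_count = 0
    value = 0
    use_longitude = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits map to one base-32 character
            chars.append(BASE32[value])
            bit_count = 0
            value = 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, precision=3))  # prints "u4p"
```

Nearby points share a common geohash prefix, which is why a geohash can serve as a compact categorical location feature.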
Using traffic density forecasts in bus scheduling can improve the transport system. Accurate arrival time (AT) prediction also improves performance, for example by reducing passenger waiting time, and it influences the share of travelers who choose public transport. These predictions can also support flexible traffic management, such as adjusting routes during periods of high traffic density or changing the number of buses according to road conditions. Such improvements save passengers time and support smart city goals for urban transport.
This research is useful in practice and also adds to the knowledge about intelligent transportation systems (ITS). Previous studies have examined traffic prediction with different machine learning models and shown that these models can handle large datasets. Zhang et al. explained how spatial–temporal features can improve prediction accuracy [2]. However, few studies focus specifically on traffic density as a means of improving arrival time prediction. This research addresses that gap by using a traffic density model. As cities keep growing, the problems caused by heavy traffic are becoming more obvious. Traffic delays lead to billions of dollars in lost work time and add to air pollution by increasing greenhouse gas emissions. Using traffic density models for public transportation can encourage more people to use buses instead of cars, helping to reduce these problems.
2. Literature Review
Recent research on traffic prediction illustrates the growing adoption of machine learning for solving problems in transport. Researchers have considered several types of models for predicting traffic flow, density, and congestion. Models such as LG, GB, and SVM have handled large datasets successfully and yielded high accuracy. Other works have targeted deep learning methods such as the long short-term memory (LSTM) network, which is efficient at analyzing patterns over time to give better predictions. Most research targets general traffic forecasting and does not specifically address how traffic density affects public transportation. This gap motivates studies such as ours, in which traffic density prediction is proposed for improving bus scheduling and alleviating traffic problems in cities. This work builds on previous studies in an effort to develop smart, data-driven solutions that improve transportation in cities.
The authors use the Markov Model, which achieved 98% accuracy for traffic density estimation. Future work aims to integrate dynamic factors, road diversity, and reinforcement learning for optimization [3]. The models by [4] showed a mean error of 13%. Future research will refine data quality and ramp metering and expand model capabilities. The authors in [5] provided the best accuracy, while Fast Fourier Transform (FFT) was most efficient for real-time forecasting. Future work includes improving datasets and scalability and exploring Convolutional Neural Networks (CNNs).
Andrzej Sroczyński, Andrzej Czyżewski: A single-layer LSTM and gated recurrent unit (GRU) model excelled in traffic prediction. Future goals focus on adaptive control and real-time deployment [6]. The authors of [7] used a Topological Graph Convolutional Network (ToGCN) with Seq2Seq and achieved 78.12% accuracy in urban traffic flow prediction. Future plans include scaling for V2X and IoT integration [7]. The authors of [8] applied an Artificial Neural Network (ANN) and reached a Mean Absolute Percentage Error (MAPE) of 32.47% for short-term traffic density. Hybrid methods and longer data collection are planned next [8].
The authors in [9] surpassed traditional methods with 93% accuracy. Future efforts aim at hybrid models and real-time applications. The authors of [10] used deep learning to improve urban traffic forecasts by automating feature extraction. Future research will enhance parallel computing and data integration [10]. Ref. [11] used a sinusoid-based spatial model that improved load balancing in cellular networks. Future work includes combining real-time data with ML for better predictions.
Anurag Kanungo, Ayush Sharma, Chetan Singla: A dynamic traffic light algorithm reduced congestion by 35%. Future research will focus on adaptive algorithms and scalability with ML integration [12]. Idriss Moumen, Jaafar Abouchabaka, Najat Rafalia: LSTM and Decision Trees predicted traffic flow effectively. Future directions include scaling models for large cities and real-time usage [13]. The authors applied supervised learning with Simulation of Urban Mobility (SUMO) simulations, which delivered high accuracy. Future goals involve enhancing scalability, integrating advanced ML models, and improving accuracy [14].
Srishti Bhargava, Krishna Prakasha, and Ishita Sinha developed the Green Way Algorithm, which does not rely on a specific machine learning model. The simulations ran successfully, but no accuracy figures were reported. In future work, they plan to test the algorithm with real-time data, compare it with other methods, and use past data to make better predictions [15]. Joseph Mathew and P. M. Xavier wrote “A survey”, which gives an overview of the current state of several methods of traffic detection using wireless signals. The paper discusses the strengths and weaknesses of each methodology with respect to its capability to monitor traffic conditions, and it offers a deeper understanding of how different wireless technologies can improve the efficiency and accuracy of road traffic management [16].
The paper estimated bus arrival times by combining a machine learning model with traffic density patterns. The maximum accuracy achieved for the bus ETA predictions was approximately 92%. In future work, the authors plan to enhance the system with high-resolution traffic data and a more advanced machine learning model to obtain better estimates, and to test the system in other cities to assess its adaptability [17].
3. Methodology
Traffic density modeling involves using data such as vehicle counts and actual speed to represent road congestion. This modeling plays a crucial role in accurately predicting bus arrival times. By incorporating this approach, the reliability of public transport can be improved, the commuter experience enhanced, and broader smart city initiatives supported [16].
3.1. Model
The model framework was implemented using RapidMiner, as shown in Figure 1.
Figure 1 illustrates the overall workflow implemented in RapidMiner. First, the dataset containing traffic density information was retrieved (Retrieve operator). Next, the target attribute was assigned using the Set Role operator. The dataset was then divided into training and testing subsets with the Split Data operator. A Logistic Regression model was trained on the training subset and subsequently applied to the testing subset using the Apply Model operator. Finally, model performance was evaluated with the Performance operator, providing accuracy and related metrics. The results of the evaluation metrics are shown in Figure 2.
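The workflow described above (split, train, apply, evaluate) can be approximated outside RapidMiner. The following is a minimal sketch in plain Python, not the authors' RapidMiner process: the function names, the gradient-descent training, the two toy features, and the labelling rule are all assumptions chosen only to make the example self-contained.

```python
import math
import random

def split_data(rows, train_ratio=0.8, seed=42):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(rows, epochs=300, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def apply_model(model, rows):
    """Predict 1 (High) when the estimated probability is at least 0.5."""
    w, b = model
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
            for x, _ in rows]

def accuracy(rows, predictions):
    """Share of rows whose predicted label equals the true label."""
    return sum(1 for (_, y), p in zip(rows, predictions) if y == p) / len(rows)

# Synthetic toy data (NOT the paper's dataset): scaled vehicle count and speed.
rng = random.Random(0)
data = []
for _ in range(200):
    vehicles, speed = rng.random(), rng.random()
    label = 1 if vehicles > speed else 0  # hypothetical labelling rule
    data.append(([vehicles, speed], label))

train, test = split_data(data)             # Split Data (80/20)
model = train_logistic_regression(train)   # Logistic Regression
preds = apply_model(model, test)           # Apply Model
print(f"test accuracy: {accuracy(test, preds):.3f}")  # Performance
```

The target column is carried as the second element of each row, which plays the part of the Set Role step in this sketch.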
3.1.1. Logistic Regression
Logistic Regression is a simple and interpretable model commonly used for binary or multi-class classification tasks. It predicts class probabilities based on a linear relationship between the input features and the target variable. In this study, the Logistic Regression model achieved an accuracy of 99.91%.
3.1.2. Naïve Bayes
Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem, which assumes that features are conditionally independent given the class label. This model is computationally efficient and well suited for high-dimensional data. The Naïve Bayes model achieved an accuracy of 94.92%.
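To make the conditional independence assumption concrete, here is a minimal Gaussian Naïve Bayes sketch in plain Python. The Gaussian likelihood, the function names, and the toy rows are assumptions for illustration; the paper does not state which Naïve Bayes variant was configured in RapidMiner.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, features):
    """Pick the class with the highest log posterior.

    Independence means the per-feature log likelihoods are simply added.
    """
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy rows: [vehicle count, average speed in km/h]
X = [[80, 15], [95, 12], [70, 20], [10, 55], [20, 60], [15, 50]]
y = [1, 1, 1, 0, 0, 0]  # 1 = High density, 0 = Low density
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, [85, 18]))  # prints 1
print(predict_gaussian_nb(nb_model, [12, 58]))  # prints 0
```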
3.1.3. K-Nearest Neighbors
K-NN is a non-parametric algorithm that predicts the class of a data point based on the majority class among its k nearest neighbors in the feature space. It is intuitive and effective for small- to medium-sized datasets. The K-NN model in this study achieved an accuracy of 98.73%.
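The majority-vote rule can be sketched in a few lines of plain Python. The Euclidean distance, the value k = 3, and the toy rows are assumptions for illustration; the paper does not report the distance measure or the k used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy rows: [vehicle count, average speed in km/h]
X = [[80, 15], [95, 12], [70, 20], [10, 55], [20, 60], [15, 50]]
y = [1, 1, 1, 0, 0, 0]  # 1 = High density, 0 = Low density

print(knn_predict(X, y, [85, 18]))  # prints 1
print(knn_predict(X, y, [12, 58]))  # prints 0
```

Because the vote depends on raw distances, features on very different scales would normally be standardized first. Each prediction also scans the whole training set, which is one reason K-NN is usually associated with small- to medium-sized data.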
3.1.4. Support Vector Machine
Support Vector Machine is a supervised machine learning model that aims to find the optimal hyperplane that best separates data points into distinct classes. In this study, the SVM model—referred to as the Hyper Hyper Model—achieved an accuracy of 54.62%, which is notably lower than the other models tested, suggesting limitations in its performance for this dataset.
3.1.5. Gradient Boosting
Gradient Boosting is an ensemble technique that builds multiple decision trees sequentially, where each new tree focuses on correcting the errors of the previous ones. This iterative process results in a robust model for both classification and regression tasks. The Gradient Boosting model achieved an accuracy of 98.97%, demonstrating strong predictive capability.
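The sequential error-correcting idea can be illustrated with a minimal sketch that boosts one-split decision stumps on the log loss. This is a simplified stand-in written for this example, not the RapidMiner implementation: the stump learner, the number of trees, the learning rate, and the toy rows are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave points on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]

def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_gradient_boosting(X, y, n_trees=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current errors."""
    scores = [0.0] * len(X)  # raw log-odds, starting from zero
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the score: y - p
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate

def predict_gradient_boosting(model, row):
    stumps, learning_rate = model
    score = sum(learning_rate * stump_predict(stump, row) for stump in stumps)
    return 1 if score >= 0 else 0  # score >= 0 is equivalent to probability >= 0.5

# Hypothetical toy rows: [vehicle count, average speed in km/h]
X = [[80, 15], [95, 12], [70, 20], [10, 55], [20, 60], [15, 50]]
y = [1, 1, 1, 0, 0, 0]  # 1 = High density, 0 = Low density
gb_model = fit_gradient_boosting(X, y)
print(predict_gradient_boosting(gb_model, [85, 18]))  # prints 1
print(predict_gradient_boosting(gb_model, [12, 58]))  # prints 0
```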
3.2. Framework
This study utilizes a dataset consisting of 1,048,576 instances with 9 attributes, with traffic density designated as the target variable [18]. The dataset is split into 80% for training and 20% for testing. The five machine learning models applied include LG, NB, K-NN, SVM, and GB.
The respective accuracies obtained from each model are shown in Table 1:
Logistic Regression (LG): 99.91%;
Naïve Bayes (NB): 94.92%;
K-Nearest Neighbors (K-NN): 98.73%;
Support Vector Machine (SVM): 54.62%;
Gradient Boosting: 98.97%.
Table 1. Model accuracy.
| Model | Accuracy (%) |
|---|---|
| Logistic Regression | 99.91 |
| Naïve Bayes | 94.92 |
| K-NN | 98.73 |
| Support Vector Machine | 54.62 |
| Gradient Boosting | 98.97 |
These results show that the Logistic Regression model achieved the highest accuracy, 99.91%, ahead of Gradient Boosting (98.97%) and K-NN (98.73%). In comparison, the Markov Model proposed by Hira Beenish et al. reached an accuracy of 98% [3]. This comparison suggests that our Logistic Regression approach may be more reliable and effective for traffic density prediction.
The strength of our model lies in its ability to handle large-scale datasets with numerous features, making it particularly advantageous for applications in urban traffic management and public transportation planning.
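For reference, accuracy is the share of test instances whose predicted class matches the true class. The short sketch below computes it, together with precision and recall, from a binary confusion matrix. The function name and the label lists are made up for illustration and are not results from this study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Illustrative labels only: 1 = High density, 0 = Low density
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy, precision, and recall are all 0.75 here
```

Reporting precision and recall alongside accuracy is useful when the High and Low classes are not equally frequent, since accuracy alone can hide errors on the rarer class.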
4. Conclusions
Our research demonstrates the potential of machine learning in predicting traffic density using large-scale datasets. In this study, five models (LG, NB, K-NN, SVM, and GB) were implemented and compared, and four of them exceeded 94% accuracy. The predictions generated by these models can support the enhancement of urban transportation systems.
The proposed framework serves as a foundation for the development of smarter transportation solutions, particularly in optimizing bus scheduling and mitigating traffic congestion. Additionally, it contributes to making urban travel more environmentally sustainable by encouraging more efficient use of public transport.
For future work, incorporating real-time traffic data will be essential to improve prediction accuracy. Other influential factors—such as weather conditions, road types, and unexpected events—should also be integrated to further strengthen the model’s reliability and robustness. In conclusion, this study offers a solid starting point for leveraging machine learning techniques to address traffic-related challenges and to improve urban mobility systems.
Author Contributions
A.U. conceptualized the study, designed the research framework, and supervised the project. T.M.A. performed data collection, preprocessing, model implementation, evaluation of results, and prepared figures and tables. C.I. provided supervision and guidance, critically reviewed the methodology and results, validated findings, and contributed to manuscript editing and final approval. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data will be made available upon reasonable request to the first author.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Chen, C.M.; Liang, C.C.; Chu, C.P. Long-term travel time prediction using gradient boosting. J. Intell. Transp. Syst. 2020, 24, 109–124.
- Zhang, Y.; Li, Y.; Zhou, X.; Luo, J.; Zhang, Z.L. Urban traffic dynamics prediction—A continuous spatial-temporal meta-learning approach. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–19.
- Beenish, H.; Javid, T.; Fahad, M.; Shahzad, Y.; Ishaq, F. Contemporary Study of Machine Learning Algorithms for Traffic Density Estimation in Intelligent Transportation Systems. Intell. Transp. Syst. 2023, 5, 592–608.
- Muñoz, L.; Sun, X.; Horowitz, R.; Alvarez, L. Traffic density estimation with the cell transmission model. In Proceedings of the 2003 American Control Conference (ACC), Denver, CO, USA, 4–6 June 2003; pp. 3750–3755.
- Sun, P.; Aljeri, N.; Boukerche, A. Machine learning-based models for real-time traffic flow prediction in vehicular networks. IEEE Netw. 2020, 34, 178–185.
- Sroczynski, A.; Czyzewski, A. Road traffic can be predicted by machine learning equally effectively as by complex microscopic model. Sci. Rep. 2023, 13, 14523.
- Qiu, H.; Zheng, Q.; Msahli, M.; Memmi, G.; Qiu, M.; Lu, J. Topological graph convolutional network-based urban traffic flow and density prediction. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4560–4569.
- Fang, C.Y.; Chiou, C.F.; Chen, C.L.; Chen, S.W. Dangerous driving condition analysis in driver assistance systems. In Proceedings of the 2009 12th International IEEE Conference on Intelligent Transportation Systems, St. Louis, MO, USA, 4–7 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6.
- Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873.
- Liu, Z.; Li, Z.; Wu, K.; Li, M. Urban traffic prediction from mobility data using deep learning. IEEE Netw. 2018, 32, 40–46.
- Humayun, M.; Ashfaq, F.; Jhanjhi, N.Z.; Alsadun, M.K. Traffic management: Multi-scale vehicle detection in varying weather conditions using yolov4 and spatial pyramid pooling network. Electronics 2022, 11, 2748.
- Ashfaq, F.; Ghoniem, R.M.; Jhanjhi, N.Z.; Khan, N.A.; Algarni, A.D. Using dual attention BiLSTM to predict vehicle lane changing maneuvers on highway dataset. Systems 2023, 11, 196.
- Zeroual, A.; Harrou, F.; Sun, Y. Predicting road traffic density using a machine learning-driven approach. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 9–10 December 2021.
- Moumen, I.; Abouchabaka, J.; Rafalia, N. Smart traffic forecasting: Leveraging adaptive machine learning and big data analytics for traffic flow prediction. IAES Int. J. Artif. Intell. 2024, 13, 2323–2332.
- Chhatpar, P.; Doolani, N.; Shahani, S.; Priya, R.L. Machine learning solutions to vehicular traffic congestion. In Proceedings of the 2018 International Conference on Smart City and Emerging Technology (ICSCET), Mumbai, India, 5 January 2018; pp. 1–4.
- Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A new model for predicting component-based software reliability using soft computing. IEEE Access 2019, 7, 147191–147203.
- Babbar, H.; Rani, S.; Masud, M.; Verma, S.; Anand, D.; Jhanjhi, N. Load balancing algorithm for migrating switches in software-defined vehicular networks. Comput. Mater. Contin. 2021, 67, 1301–1316.
- Airehrour, D.; Gutierrez, J.; Ray, S.K. GradeTrust: A secure trust based routing protocol for MANETs. In Proceedings of the 2015 International Telecommunication Networks and Applications Conference (ITNAC), Sydney, Australia, 18–20 November 2015; pp. 65–70.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).