SBMEV: A Stacking-Based Meta-Ensemble Vehicle Classification Framework for Real-World Traffic Surveillance
Abstract
1. Introduction
- A new dataset comprising 10,864 vehicle images, captured from real-world traffic scenarios on urban highways in India, is introduced. Detailed and structured annotations are provided to ensure reproducibility and enable rigorous evaluation of vehicle classification methods.
- Eleven state-of-the-art deep learning architectures, namely VGG16, VGG19, MobileNetV2, Xception, AlexNet, ResNet50, ResNet152, DenseNet121, DenseNet201, InceptionV3, and NASNetMobile, were rigorously evaluated on the EAHVSD and JUIVCD datasets, providing an extensive performance analysis.
- A stacking-based meta-ensemble learning strategy is employed for vehicle classification, integrating diverse base learners and combining their predictions through multiple meta-learners. This approach enhances model diversity and achieves strong performance across multiple evaluation metrics, including accuracy, precision, recall, F1 score, Cohen’s Kappa, and ROC AUC.
- Hyperparameter tuning strategies were incorporated to optimise both the individual models and the proposed ensemble framework, resulting in improved classification efficiency and robustness.
- The proposed ensemble framework was validated on both datasets, demonstrating superior performance compared to individual models and existing approaches in terms of accuracy and reliability.
2. Related Work
2.1. Vehicle Classification Using Deep Learning Techniques
2.2. Ensemble Learning Approaches
2.3. Stacking-Based Meta Ensemble Approach
3. Methodology
3.1. Data Collection
3.1.1. Proposed EAHVSD Dataset
3.1.2. JUIVCD Dataset
3.2. Data Annotation Process
Algorithm 1. YOLO-to-XML Annotation Conversion
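The conversion behind Algorithm 1 maps YOLO’s normalised centre-based boxes to Pascal-VOC-style pixel corners. A minimal sketch (function names and the use of Python’s standard library are illustrative assumptions, not the paper’s implementation; note that re-deriving pixel corners from six-decimal normalised values can be off by one pixel due to rounding):

```python
import xml.etree.ElementTree as ET

def yolo_to_voc(x_c, y_c, w, h, img_w, img_h):
    """Convert a normalised YOLO box (centre, width, height) to VOC pixel corners."""
    xmin = round((x_c - w / 2) * img_w)
    ymin = round((y_c - h / 2) * img_h)
    xmax = round((x_c + w / 2) * img_w)
    ymax = round((y_c + h / 2) * img_h)
    return xmin, ymin, xmax, ymax

def box_to_xml(label, corners):
    """Wrap one converted box in a VOC-style <object> element."""
    obj = ET.Element("object")
    ET.SubElement(obj, "name").text = label
    bnd = ET.SubElement(obj, "bndbox")
    for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), corners):
        ET.SubElement(bnd, tag).text = str(val)
    return obj
```

For a 1920 × 1080 frame, each annotation row’s normalised centre/size values convert to the corner coordinates stored in the XML file.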
3.3. Data Augmentation
3.4. Data Split
- (a) Training dataset: The training set of the custom dataset comprises four folders: LCV_0, LMV_1, OSV_2, and Truck_3, containing 1044, 4211, 1186, and 1162 images, respectively. In total, the training set consists of 7603 vehicle images.
- (b) Validation dataset: The validation set of the custom dataset comprises four folders: LCV_0, LMV_1, OSV_2, and Truck_3, containing 224, 902, 254, and 249 images, respectively. In total, the validation set consists of 1634 vehicle images.
- (c) Testing dataset: The testing set of the custom dataset comprises four folders: LCV_0, LMV_1, OSV_2, and Truck_3, containing 224, 903, 250, and 255 images, respectively. In total, the testing set consists of 1632 vehicle images.
3.5. Pre-Trained Deep Learning Models
3.5.1. VGGNet
3.5.2. MobileNetV2
3.5.3. DenseNet
3.5.4. InceptionV3
3.6. Meta-Learners in the Stacking-Based Meta-Ensemble Framework
- (a) Logistic Regression (LR): LR is employed as a linear meta-learner that estimates class posterior probabilities by learning weighted combinations of CNN outputs. Its simplicity and interpretability provide a strong baseline for evaluating the effectiveness of the stacking framework.
- (b) Random Forest (RF): RF serves as a non-linear meta-learner capable of modelling complex interactions among the CNN probability features. By aggregating decisions from multiple randomised trees, RF improves robustness and reduces overfitting at the meta level.
- (c) Support Vector Machine (SVM): The SVM meta-learner constructs optimal separating hyperplanes in the CNN-derived feature space. Kernel-based SVMs further enable non-linear decision boundaries, enhancing class discrimination in challenging scenarios.
- (d) K-Nearest Neighbour (KNN): KNN is utilised as an instance-based meta-learner that assigns class labels based on similarity between CNN probability vectors. This approach provides a non-parametric perspective on ensemble decision fusion.
- (e) Multi-Layer Perceptron (MLP): The MLP meta-learner models non-linear relationships among the CNN outputs through multiple fully connected layers, enabling more expressive ensemble modelling.
- (f) XGBoost: XGBoost is employed as a powerful gradient-boosted meta-learner that iteratively refines ensemble predictions by minimising classification loss. Its regularisation mechanisms improve the generalisation and stability of the stacking model.
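The meta-learners above correspond to standard scikit-learn estimators. The sketch below instantiates them with illustrative default hyperparameters (these are assumptions, not the paper’s tuned settings); XGBoost is noted in a comment since it is a separate dependency:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Each meta-learner consumes the concatenated CNN class-probability vectors
# produced by the base models.
meta_learners = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": SVC(kernel="rbf", probability=True, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=42),
    # "XGB": xgboost.XGBClassifier(...)  # gradient-boosted meta-learner, extra dependency
}
```

Each estimator exposes the same `fit`/`predict` interface, so the stacking framework can swap meta-learners without further changes.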
3.7. Stacking-Based Meta-Ensemble Learning Technique
Algorithm 2. Stacking-Based Meta-Ensemble Training
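The training loop of Algorithm 2 can be sketched with out-of-fold stacking: each base model’s class probabilities are generated on held-out folds, so the meta-learner never sees predictions made on data the base model was trained on. Small scikit-learn classifiers stand in for the fine-tuned CNN base learners here (an assumption purely for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy 4-class problem standing in for the vehicle classes.
X, y = make_classification(n_samples=300, n_classes=4, n_informative=6,
                           random_state=42)

# Stand-ins for the CNN base learners (illustrative only).
base_models = [DecisionTreeClassifier(max_depth=5, random_state=42), GaussianNB()]

# Level-0: out-of-fold class probabilities from each base model.
meta_features = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")
    for m in base_models
])

# Level-1: the meta-learner is trained on the stacked probability features.
meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
print(meta.score(meta_features, y))
```

With two base models and four classes, the stacked feature matrix has eight columns per sample; any of the meta-learners from Section 3.6 could replace the logistic regression in the last step.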
4. Experimental Setup and Implementation
4.1. Hardware and Software Configuration
4.2. Computational Efficiency Analysis
4.3. Training Configuration: Hyperparameters, Optimiser and Loss Function
4.4. Evaluation Metrics
- (a) Accuracy (A): Accuracy measures the proportion of correctly classified samples in the test set, providing an overall indication of model performance [46].
- (b) Precision (P): Precision evaluates the ratio of correctly predicted positive samples to the total number of predicted positive samples. It reflects the model’s ability to minimise false positives [46].
- (c) Recall (R): Recall represents the proportion of actual positive samples that are correctly identified. It indicates the model’s sensitivity and is particularly important in scenarios with class imbalance [46].
- (d) F1-score (F1): The F1-score is the harmonic mean of Precision and Recall. It provides a balanced metric for evaluating classifiers that takes both false positives and false negatives into account [47].
- (e) ROC–AUC Curve: The Receiver Operating Characteristic (ROC) curve illustrates the relationship between the True Positive Rate (sensitivity) and the False Positive Rate (1 − specificity) across varying thresholds. The area under the curve (AUC) provides a scalar measure of discrimination capability, with higher values indicating superior performance [48].
- (f) Cohen’s Kappa: Cohen’s kappa quantifies inter-annotator agreement for categorical classification while accounting for the possibility of agreement occurring by chance [49].
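All six metrics map directly onto scikit-learn. The binary toy labels below are illustrative only; for the multi-class vehicle setting, Precision, Recall, and F1 would take an averaging mode such as `average='weighted'`:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
y_prob = [0.1, 0.6, 0.7, 0.8]  # predicted probability of class 1

print(accuracy_score(y_true, y_pred))     # fraction of correct predictions
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(f1_score(y_true, y_pred))           # harmonic mean of P and R
print(cohen_kappa_score(y_true, y_pred))  # chance-corrected agreement
print(roc_auc_score(y_true, y_prob))      # threshold-free discrimination
```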
5. Evaluation and Result
5.1. Results on the EAHVSD Dataset
5.2. Ensemble Model Results on the EAHVSD Dataset
5.3. Individual and Ensemble Model Results on the Publicly Benchmarked JUIVCD Dataset
5.4. Comparison Study on Model and Ensemble Performance on the EAHVSD and JUIVCD Datasets
5.5. Ablation Study
5.6. ROC-AUC Analysis of the Proposed Ensemble Model
5.7. Comparison with Existing Studies
6. Threats to Validity
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ITS | Intelligent Transportation System |
| Faster R-CNN | Region-Based Convolutional Neural Network |
| YOLO | You Only Look Once |
| VANETs | Vehicular Ad Hoc Networks |
| VRe-ID | Vehicle Re-Identification |
| MFCC | Mel-Frequency Cepstral Coefficients |
| PHOG | Pyramid Histogram of Oriented Gradients |
| Bi-LSTM | Bidirectional Long Short-Term Memory |
| DenseNet | Densely Connected Convolutional Networks |
| InceptionV3 | Deep CNN with Inception Modules |
| NASNetMobile | Neural Architecture Search Network for Mobile Devices |
| RetinaNet | Object Detection Model with Focal Loss |
| RCNet | Road Condition Classification Network |
| JUVCSID | Jamia University Vehicle Classification Surveillance Indian Dataset |
| NGSIM | Next-Generation Simulation Dataset |
| KITTI | Karlsruhe Institute of Technology and Toyota Technological Institute Dataset |
| GAN | Generative Adversarial Network |
| WMVE | Weighted Majority Voting Ensemble |
| CBAM | Convolutional Block Attention Module |
| AI | Artificial Intelligence |
| ML | Machine Learning |
| DL | Deep Learning |
| SVM | Support Vector Machine |
| DT | Decision Tree |
| RF | Random Forest |
| LBP | Local Binary Pattern |
| IoT | Internet of Things |
| MLP | Multi-Layer Perceptron |
| ROI | Region of Interest |
| SWA | Soft-Weighted-Average |
| GRU | Gated Recurrent Unit |
| CNN | Convolutional Neural Network |
| SSD | Single Shot MultiBox Detector |
| HOG | Histogram of Oriented Gradients |
| VGGNet | Visual Geometry Group Networks |
| ResNet | Residual Neural Network |
| LSTM | Long Short-Term Memory |
| AUC | Area Under the Curve |
| NB | Naive Bayes |
References
- Ambardekar, A.; Nicolescu, M.; Bebis, G.; Nicolescu, M. Vehicle classification framework: A comparative study. EURASIP J. Image Video Process. 2014, 2014, 29. [Google Scholar] [CrossRef]
- Boukerche, A.; Siddiqui, A.J.; Mammeri, A. Automated vehicle detection and classification: Models, methods, and techniques. ACM Comput. Surv. (CSUR) 2017, 50, 1–39. [Google Scholar] [CrossRef]
- Butt, M.A.; Khattak, A.M.; Shafique, S.; Hayat, B.; Abid, S.; Kim, K.-I.; Ayub, M.W.; Sajid, A.; Adnan, A. Convolutional neural network-based vehicle classification in adverse illuminous conditions for intelligent transportation systems. Complexity 2021, 2021, 6644861. [Google Scholar] [CrossRef]
- Maity, S.; Saha, D.; Singh, P.K.; Sarkar, R. JUIVCDv1: Development of a still-image-based dataset for Indian vehicle classification. Multimed. Tools Appl. 2024, 83, 71379–71406. [Google Scholar] [CrossRef]
- Pandey, A.D.; Kumar, B.; Parida, M.; Mudgal, A.; Chouksey, A.K.; Mishra, R. Vehicle classification using accelerometer signals and machine-learning techniques. J. Intell. Transp. Syst. 2025, 1–29. [Google Scholar] [CrossRef]
- Chen, X.; Liu, Y.; Li, S. BML-YOLO: Multi-scale vehicle target detection method based on feature fusion. Signal Image Video Process. 2025, 19, 745. [Google Scholar] [CrossRef]
- Yu, S.; Wu, Y.; Li, W.; Song, Z.; Zeng, W. A model for fine-grained vehicle classification based on deep learning. Neurocomputing 2017, 257, 97–103. [Google Scholar] [CrossRef]
- Battiato, S.; Farinella, G.M.; Furnari, A.; Puglisi, G.; Snijders, A.; Spiekstra, J. An integrated system for vehicle tracking and classification. Expert Syst. Appl. 2015, 42, 7263–7275. [Google Scholar] [CrossRef]
- Venkatasivarambabu, P.; Babu, R.K.; Jagan, B.; Rai, H.M.; Agarwal, N.; Agarwal, S. Vehicle tracking and classification for intelligent transportation systems using YOLOv5 and modified deep SORT with HRNN. Signal Image Video Process. 2025, 19, 1–12. [Google Scholar] [CrossRef]
- Usama, M.; Anwar, H.; Anwar, S. Vehicle and license plate recognition with novel dataset for toll collection. Pattern Anal. Appl. 2025, 28, 57. [Google Scholar] [CrossRef]
- Seo, A.; Jeon, H.; Son, Y. Robust prediction method for pedestrian trajectories in occluded video scenarios. Soft Comput. 2025, 29, 4449–4459. [Google Scholar] [CrossRef]
- Aljebreen, M.; Alabduallah, B.; Mahgoub, H.; Allafi, R.; Hamza, M.A.; Ibrahim, S.S.; Yaseen, I.; Alsaid, M.I. Integrating IoT and honey badger algorithm-based ensemble learning for accurate vehicle detection and classification. Ain Shams Eng. J. 2023, 14, 102547. [Google Scholar] [CrossRef]
- Gayen, S.; Maity, S.; Singh, P.K.; Sarkar, R. SimSANet: A simple sequential attention-aided deep neural network for vehicle make and model recognition. Neural Comput. Appl. 2025, 37, 319–339. [Google Scholar] [CrossRef]
- Chen, W.; Sun, Q.; Wang, J.; Dong, J.-J.; Xu, C. A novel model based on AdaBoost and deep CNN for vehicle classification. IEEE Access 2018, 6, 60445–60455. [Google Scholar] [CrossRef]
- Hedeya, M.A.; Eid, A.H.; Abdel-Kader, R.F. A super-learner ensemble of deep networks for vehicle-type classification. IEEE Access 2020, 8, 98266–98280. [Google Scholar] [CrossRef]
- Stocker, M.; Silvonen, P.; Rönkkö, M.; Kolehmainen, M. Detection and classification of vehicles by measurement of road-pavement vibration and by means of supervised machine learning. J. Intell. Transp. Syst. 2016, 20, 125–137. [Google Scholar] [CrossRef]
- Lee, H.; Coifman, B. Using LIDAR to validate the performance of vehicle classification stations. J. Intell. Transp. Syst. 2015, 19, 355–369. [Google Scholar] [CrossRef]
- Pateriya, P.; Trivedi, A.; Malhotra, R. Transforming traffic management: Vehicle classification in smart transportation systems. In Proceedings of the International Conference on Structural Engineering and Construction Management, Angamaly, India, 5–7 June 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1011–1023. [Google Scholar]
- Guo, L.; Li, R.; Jiang, B. An ensemble broad learning scheme for semisupervised vehicle type classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5287–5297. [Google Scholar] [CrossRef]
- Liu, W.; Zhang, M.; Luo, Z.; Cai, Y. An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors. IEEE Access 2017, 5, 24417–24425. [Google Scholar] [CrossRef]
- Liu, W.; Luo, Z.; Li, S. Improving deep ensemble vehicle classification by using selected adversarial samples. Knowl.-Based Syst. 2018, 160, 167–175. [Google Scholar] [CrossRef]
- Pemila, M.; Pongiannan, R.K.; Narayanamoorthi, R.; Sweelem, E.A.; Hendawi, E.; El-Sebah, M.I.A. Classification of vehicles using machine learning algorithm on the extensive dataset. IEEE Access 2024, 12, 98338–98351. [Google Scholar] [CrossRef]
- Ghosh, T.; Gayen, S.; Maity, S.; Valenkova, D.; Sarkar, R. A feature fusion-based custom deep learning model for vehicle make and model recognition. In Proceedings of the 13th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 11–14 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4. [Google Scholar]
- Saputra, W.S.J.; Puspaningrum, E.Y.; Syahputra, W.F.; Sari, A.P.; Via, Y.V.; Idhom, M. Car classification based on image using transfer learning convolutional neural network. In Proceedings of the 2022 IEEE Information Technology International Seminar (ITIS), Surabaya, Indonesia, 19–21 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 324–327. [Google Scholar]
- Kapaliya, S.; Swain, D.; Kaur, H.; Satapathy, S. An efficient deep learning based vehicle classification system for Indian vehicles. In Proceedings of the 2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), Gunupur, India, 15–17 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5. [Google Scholar]
- Wang, Y.; Deng, Y.; Zheng, Y.; Chattopadhyay, P.; Wang, L. Vision transformers for image classification: A comparative survey. Technologies 2025, 13, 32. [Google Scholar] [CrossRef]
- Wang, Y.; Yin, Y.; Li, Y.; Qu, T.; Guo, Z.; Peng, M.; Jia, S.; Wang, Q.; Zhang, W.; Li, F. Classification of plant leaf disease recognition based on self-supervised learning. Agronomy 2024, 14, 500. [Google Scholar] [CrossRef]
- Shvai, N.; Hasnat, A.; Meicler, A.; Nakib, A. Accurate classification for automatic vehicle-type recognition based on ensemble classifiers. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1288–1297. [Google Scholar] [CrossRef]
- Zhang, B. Reliable classification of vehicle types based on cascade classifier ensembles. IEEE Trans. Intell. Transp. Syst. 2012, 14, 322–332. [Google Scholar] [CrossRef]
- Zhang, H.; Fu, R. An ensemble learning–online semi-supervised approach for vehicle behavior recognition. IEEE Trans. Intell. Transp. Syst. 2021, 23, 10610–10626. [Google Scholar] [CrossRef]
- Jagannathan, P.; Rajkumar, S.; Frnda, J.; Divakarachari, P.B.; Subramani, P. Moving vehicle detection and classification using gaussian mixture model and ensemble deep learning technique. Wirel. Commun. Mob. Comput. 2021, 2021, 5590894. [Google Scholar] [CrossRef]
- Wang, S. Real operational labeled data of air handling units from office, auditorium, and hospital buildings. Sci. Data 2025, 12, 1481. [Google Scholar] [CrossRef] [PubMed]
- Wang, S. Effectiveness of traditional augmentation methods for rebar counting using UAV imagery with Faster R-CNN and YOLOv10-based transformer architectures. Sci. Rep. 2025, 15, 33702. [Google Scholar] [CrossRef]
- Sagi, O.; Rokach, L. Ensemble learning: A survey. WIREs Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
- Zhang, H.; Guo, Y.; Wang, C.; Fu, R. Stacking-based ensemble learning method for the recognition of the preceding vehicle lane-changing manoeuvre: A naturalistic driving study on the highway. IET Intell. Transp. Syst. 2022, 16, 489–503. [Google Scholar] [CrossRef]
- Khoshkangini, R.; Mashhadi, P.; Tegnered, D.; Lundström, J.; Rögnvaldsson, T. Predicting vehicle behaviour using multi-task ensemble learning. Expert Syst. Appl. 2023, 212, 118716. [Google Scholar] [CrossRef]
- Yang, J.; Zhang, H.; Zhou, Y.; Guo, Z.; Lin, F. Improved DAB-DETR model for irregular traffic obstacles detection in vision-based driving environment perception scenario. Appl. Intell. 2025, 55, 541. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Mascarenhas, S.; Agarwal, M. A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for image classification. In Proceedings of the 2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON), Bengaluru, India, 19–21 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 96–99. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Naskinova, I. Transfer learning with NASNet-Mobile for Pneumonia X-ray classification. Asian-Eur. J. Math. 2023, 16, 2250240. [Google Scholar] [CrossRef]
- Buckland, M.; Gey, F. The relationship between recall and precision. J. Am. Soc. Inf. Sci. 1994, 45, 12–19. [Google Scholar] [CrossRef]
- Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef]
- Malhotra, R.; Khan, K. OpTunedSMOTE: A novel model for automated hyperparameter tuning of SMOTE in software defect prediction. Intell. Data Anal. 2025, 29, 787–807. [Google Scholar] [CrossRef]
- Artstein, R.; Poesio, M. Inter-coder agreement for computational linguistics. Comput. Linguist. 2008, 34, 555–596. [Google Scholar] [CrossRef]
- Townsend, J.T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 1971, 9, 40–50. [Google Scholar] [CrossRef]
- Wang, H.; Yu, Y.; Cai, Y.; Chen, X.; Chen, L.; Li, Y. Soft-weighted-average ensemble vehicle detection method based on single-stage and two-stage deep learning models. IEEE Trans. Intell. Veh. 2020, 6, 100–109. [Google Scholar] [CrossRef]










| Dataset | Images | Classes | Model/Method | Accuracy | Key Limitation |
|---|---|---|---|---|---|
| CompCars + real-world images 1 | 45,230 | 5 | CNN + AdaBoost + SVM | 99.50% | Evaluated on limited public benchmarks |
| BIT-Vehicle 2 | 9850 | 6 | ResNet50, Xception, DenseNet + Super Learner (DL ensemble) | 97.62% | Confusion among visually similar vehicle classes |
| BIT-Vehicle 2 | 64,000 | 8 | Ensemble Broad Learning System (BLS-based) | 91.23% | Increased training time due to ensemble structure |
| VINCI + Indian traffic images 3,4 | 73,638 | 5 | VGG-14, InceptionV3 + CatBoost | 99.03% | Class imbalance and overlap between vehicle categories |
| MIO-TCD 5 | 648,959 | 11 | ResNet50, Xception, DenseNet + Super Learner | 97.94% | Marginal performance gains from data augmentation |
| EAHVSD (Proposed) | 10,864 | 4 | Stacking-based CNN Meta-Ensemble | 96.04% | Single-view data; limited samples in LCV and OSV classes |
| Parameters | Description |
|---|---|
| Location of site | Hyderabad, India |
| Type of camera | Surveillance camera |
| Camera installation height (m) | 7.2 |
| Frame per second (FPS) | 25 |
| Video resolution (pixel size) | 1920 × 1080 |
| Road dimension (m) | Length: 56 m, Width: 20 m |
| Data collection sessions | 9 December 2023 (afternoon, 25 min); 24 January 2024 (afternoon, 52 min); 25 January 2024 (night, 10 min); 2 February 2024 (morning, 48 min); 6 February 2024 (morning, 29 min) |
| Condition diversity | Morning, afternoon, and night recordings ensure variation in illumination and traffic conditions |
| Total number of images | 10,864 images |
| Class Label | X_center | Y_center | Width | Height | Xmin | Ymin | Xmax | Ymax |
|---|---|---|---|---|---|---|---|---|
| LMV_1 | 0.309766 | 0.465278 | 0.107031 | 0.113889 | 492 | 441 | 697 | 564 |
| LMV_1 | 0.647656 | 0.757639 | 0.142188 | 0.254167 | 1106 | 681 | 1381 | 956 |
| Truck_3 | 0.633594 | 0.445833 | 0.173438 | 0.230556 | 1050 | 357 | 1383 | 606 |
| Description | Details |
|---|---|
| Model architecture | Convolutional Neural Network (CNN) |
| Input image size | |
| Normalisation | Pixel values rescaled to the range [0, 1] |
| Data augmentation | Applied using Keras ImageDataGenerator |
| Augmentation types | Rescale to [0, 1]; Rotation range: 20; Zoom range: 0.2; Horizontal flip: True; Width shift range: 0.1; Height shift range: 0.1; Shear range: 0.1; Brightness range: [0.8, 1.2] |
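The augmentation settings in the table correspond to a Keras `ImageDataGenerator` configuration along these lines (a config sketch, not the paper’s exact code; the rescale factor 1/255 is an assumption implied by the [0, 1] normalisation of 8-bit images):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,          # normalise 8-bit pixel values to [0, 1]
    rotation_range=20,          # random rotations up to 20 degrees
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    brightness_range=[0.8, 1.2],
)
```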
| Category | Specification |
|---|---|
| Execution Environment | Cloud-based GPU platform (Google Colab Pro) |
| System RAM | 53 GB (allocated) |
| GPU | NVIDIA Tesla T4 (22.5 GB VRAM) |
| Processor | Intel Core i5 (7th Gen) for local testing |
| Storage | ∼235.7 GB cloud disk allocation |
| Programming Frameworks | Python 3.10, TensorFlow 2.x, Keras |
| Supporting Libraries | NumPy, Pandas, Scikit-learn, Matplotlib |
| Development Tools | Jupyter Notebook, Google Colab Pro+ |
| Training Strategy | Training with fixed random seeds 42 |
| Compute Usage | Approx. 1.41 compute units per hour (4–5 h run time) |
| Inference Execution | GPU-enabled test-time inference |
| Model | Batch Size | Dense Units | Dropout | Learning Rate | Epochs |
|---|---|---|---|---|---|
| VGG16 | 16 | 250 | 0.2 | 0.0001 | 100 |
| VGG19 | 16 | 250 | 0.2 | 0.0001 | 80 |
| MobileNetV2 | 32 | 512 | 0.2 | 0.0001 | 70 |
| Xception | 64 | 250 | 0.2 | 0.0001 | 100 |
| AlexNet | 64 | 250 | 0.3 | 0.0001 | 90 |
| ResNet50 | 32 | 250 | 0.2 | 0.0001 | 100 |
| ResNet152 | 32 | 512 | 0.2 | 0.0001 | 100 |
| DenseNet121 | 16 | 250 | 0.3 | 0.0001 | 100 |
| DenseNet201 | 16 | 250 | 0.2 | 0.001 | 100 |
| InceptionV3 | 16 | 250 | 0.2 | 0.0001 | 120 |
| NASNetMobile | 32 | 250 | 0.3 | 0.0001 | 100 |
| Model | EAHVSD A | EAHVSD P | EAHVSD R | EAHVSD F1 | JUIVCD A | JUIVCD P | JUIVCD R | JUIVCD F1 |
|---|---|---|---|---|---|---|---|---|
| VGG16 | 0.924 | 0.924 | 0.924 | 0.924 | 0.843 | 0.843 | 0.840 | 0.840 |
| VGG19 | 0.872 | 0.878 | 0.872 | 0.872 | 0.867 | 0.858 | 0.858 | 0.858 |
| MobileNetV2 | 0.923 | 0.919 | 0.919 | 0.919 | 0.912 | 0.916 | 0.916 | 0.916 |
| Xception | 0.885 | 0.883 | 0.885 | 0.882 | 0.955 | 0.954 | 0.954 | 0.954 |
| AlexNet | 0.837 | 0.839 | 0.837 | 0.836 | 0.786 | 0.779 | 0.779 | 0.779 |
| ResNet50 | 0.636 | 0.620 | 0.636 | 0.564 | 0.606 | 0.573 | 0.573 | 0.573 |
| ResNet152 | 0.652 | 0.563 | 0.652 | 0.590 | 0.557 | 0.516 | 0.516 | 0.516 |
| DenseNet121 | 0.927 | 0.931 | 0.927 | 0.926 | 0.921 | 0.922 | 0.922 | 0.922 |
| DenseNet201 | 0.926 | 0.930 | 0.926 | 0.927 | 0.923 | 0.922 | 0.922 | 0.922 |
| InceptionV3 | 0.959 | 0.959 | 0.959 | 0.959 | 0.941 | 0.938 | 0.938 | 0.938 |
| NASNetMobile | 0.908 | 0.905 | 0.898 | 0.897 | 0.909 | 0.910 | 0.910 | 0.910 |
| Vehicle Class | VGG16 P | VGG16 R | VGG16 F1 | MobileNetV2 P | MobileNetV2 R | MobileNetV2 F1 | InceptionV3 P | InceptionV3 R | InceptionV3 F1 | DenseNet121 P | DenseNet121 R | DenseNet121 F1 | DenseNet201 P | DenseNet201 R | DenseNet201 F1 | Proposed Ensemble P | Proposed Ensemble R | Proposed Ensemble F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LCV_0 | 0.84 | 0.86 | 0.85 | 0.90 | 0.78 | 0.83 | 0.89 | 0.88 | 0.88 | 0.86 | 0.82 | 0.84 | 0.86 | 0.82 | 0.84 | 0.90 | 0.92 | 0.91 |
| LMV_1 | 0.99 | 0.99 | 0.99 | 1.00 | 0.98 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| OSV_2 | 0.92 | 0.85 | 0.88 | 0.73 | 0.94 | 0.82 | 0.89 | 0.89 | 0.89 | 0.92 | 0.80 | 0.86 | 0.92 | 0.80 | 0.86 | 0.93 | 0.90 | 0.91 |
| Truck_3 | 0.79 | 0.80 | 0.79 | 0.82 | 0.73 | 0.77 | 0.81 | 0.85 | 0.83 | 0.73 | 0.86 | 0.79 | 0.73 | 0.86 | 0.79 | 0.85 | 0.87 | 0.86 |
| Vehicle Class | InceptionV3 P | InceptionV3 R | InceptionV3 F1 | MobileNetV2 P | MobileNetV2 R | MobileNetV2 F1 | VGG16 P | VGG16 R | VGG16 F1 | DenseNet201 P | DenseNet201 R | DenseNet201 F1 | DenseNet121 P | DenseNet121 R | DenseNet121 F1 | Ensemble P | Ensemble R | Ensemble F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_Car | 0.91 | 1.00 | 0.95 | 0.68 | 1.00 | 0.81 | 0.59 | 1.00 | 0.74 | 0.60 | 1.00 | 0.75 | 0.73 | 1.00 | 0.84 | 0.64 | 1.00 | 0.78 |
| 1_Bus | 0.90 | 0.97 | 0.94 | 0.85 | 0.97 | 0.90 | 0.75 | 0.91 | 0.82 | 0.93 | 0.94 | 0.93 | 0.74 | 0.98 | 0.84 | 0.94 | 0.97 | 0.96 |
| 2_Bicycle | 1.00 | 0.78 | 0.87 | 0.83 | 0.91 | 0.87 | 0.91 | 0.64 | 0.75 | 0.97 | 0.94 | 0.96 | 0.85 | 0.93 | 0.89 | 0.96 | 0.95 | 0.96 |
| 3_Ambassador | 1.00 | 0.88 | 0.93 | 0.97 | 0.72 | 0.83 | 1.00 | 0.58 | 0.74 | 1.00 | 0.38 | 0.55 | 0.99 | 0.73 | 0.84 | 1.00 | 0.66 | 0.80 |
| 4_Van | 0.94 | 0.97 | 0.95 | 0.82 | 0.91 | 0.86 | 0.98 | 0.74 | 0.84 | 0.77 | 0.96 | 0.86 | 0.96 | 0.87 | 0.91 | 0.96 | 0.93 | 0.95 |
| 5_Motorized2W | 0.98 | 1.00 | 0.99 | 0.97 | 0.94 | 0.96 | 0.88 | 0.97 | 0.92 | 0.99 | 0.99 | 0.99 | 0.97 | 0.95 | 0.96 | 0.99 | 0.99 | 0.99 |
| 6_Rickshaw | 0.89 | 0.96 | 0.93 | 0.96 | 0.99 | 0.97 | 0.97 | 0.96 | 0.96 | 0.97 | 0.96 | 0.96 | 0.99 | 0.97 | 0.98 | 0.98 | 1.00 | 0.99 |
| 7_Motorvan | 1.00 | 0.64 | 0.78 | 0.78 | 0.64 | 0.70 | 0.57 | 0.36 | 0.44 | 1.00 | 0.82 | 0.90 | 1.00 | 0.82 | 0.90 | 1.00 | 0.73 | 0.84 |
| 8_Truck | 0.62 | 0.95 | 0.75 | 0.74 | 0.71 | 0.72 | 0.68 | 0.39 | 0.49 | 0.59 | 0.93 | 0.72 | 0.41 | 0.98 | 0.58 | 0.73 | 0.83 | 0.78 |
| 9_Autorickshaw | 0.90 | 0.96 | 0.93 | 0.97 | 0.82 | 0.89 | 0.98 | 0.80 | 0.88 | 0.97 | 0.94 | 0.95 | 0.99 | 0.66 | 0.79 | 0.99 | 0.91 | 0.95 |
| 10_Toto | 0.73 | 0.48 | 0.58 | 0.87 | 0.57 | 0.68 | 0.82 | 0.39 | 0.53 | 0.47 | 0.87 | 0.61 | 0.61 | 0.74 | 0.67 | 0.95 | 0.78 | 0.86 |
| 11_Minitruck | 0.97 | 0.55 | 0.70 | 0.87 | 0.54 | 0.67 | 0.52 | 0.80 | 0.63 | 0.98 | 0.50 | 0.66 | 0.87 | 0.43 | 0.58 | 0.85 | 0.74 | 0.79 |
| Model | EAHVSD A | EAHVSD P | EAHVSD R | EAHVSD F1 | EAHVSD AUC | EAHVSD Kappa | JUIVCD A | JUIVCD P | JUIVCD R | JUIVCD F1 | JUIVCD AUC | JUIVCD Kappa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG16 | 0.92 | 0.92 | 0.92 | 0.92 | 0.99 | 0.89 | 0.86 | 0.86 | 0.81 | 0.81 | 0.99 | 0.46 |
| MobileNetV2 | 0.90 | 0.91 | 0.91 | 0.91 | 0.99 | 0.85 | 0.87 | 0.88 | 0.86 | 0.87 | 0.98 | 0.74 |
| InceptionV3 | 0.93 | 0.94 | 0.94 | 0.94 | 0.99 | 0.89 | 0.92 | 0.93 | 0.92 | 0.92 | 1.00 | 0.64 |
| DenseNet121 | 0.92 | 0.92 | 0.92 | 0.92 | 0.99 | 0.86 | 0.89 | 0.89 | 0.85 | 0.85 | 0.99 | 0.58 |
| DenseNet201 | 0.91 | 0.92 | 0.92 | 0.92 | 0.99 | 0.87 | 0.89 | 0.89 | 0.83 | 0.84 | 0.99 | 0.42 |
| Our Proposed Ensemble | 0.96 | 0.94 | 0.94 | 0.94 | 0.99 | 0.93 | 0.95 | 0.93 | 0.91 | 0.91 | 1.00 | 0.89 |
| Configuration | VGG16 | MobileNetV2 | InceptionV3 | DenseNet121 | DenseNet201 | EAHVSD Accuracy (%) | JUIVCD Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Individual (VGG16) | ✓ | – | – | – | – | 92.0 | 86.0 |
| Individual (MobileNetV2) | – | ✓ | – | – | – | 90.0 | 87.0 |
| Individual (InceptionV3) | – | – | ✓ | – | – | 93.0 | 92.0 |
| Individual (DenseNet121) | – | – | – | ✓ | – | 92.0 | 89.0 |
| Individual (DenseNet201) | – | – | – | – | ✓ | 91.0 | 89.0 |
| 3-Ensemble Model | – | ✓ | ✓ | – | ✓ | 95.20 | 94.01 |
| 4- Ensemble Model | ✓ | ✓ | ✓ | – | ✓ | 94.73 | 92.70 |
| Majority Voting Ensemble | ✓ | ✓ | ✓ | ✓ | ✓ | 94.20 | 93.83 |
| Proposed Ensemble | ✓ | ✓ | ✓ | ✓ | ✓ | 96.04 | 95.28 |
| Dataset | Model | Images | Classes | Performance | Study |
|---|---|---|---|---|---|
| JUIVCD | Xception, InceptionV3, DenseNet121 | 6335 | 12 | 95.00% | [4] |
| CompCars+ | CNN + AdaBoost + SVM | 45,230 | 5 | 99.50% | [14] |
| KITTI | Soft weighted-average ensemble | 7518 | 4 | 94.75% | [51] |
| MIO-TCD | ResNet50, Xception, DenseNet + Super Learner | 648,959 | 11 | 97.94% | [15] |
| BIT-Vehicle | ResNet50, Xception, DenseNet + Super Learner | 9850 | 6 | 97.62% | [15] |
| MIO-TCD | ResNet50 + Ensemble Broad Learning System | 4000 | 4 | 94.63% | [19] |
| BIT-Vehicle | Ensemble Broad Learning System | 64,000 | 8 | 91.23% | [19] |
| Proposed EAHVSD | Stacking-based ensemble (VGG-16, MobileNetV2, InceptionV3, DenseNet-121, and DenseNet-201) | 10,864 | 4 | 96.04% | – |
| JUIVCD | Stacking-based ensemble (VGG-16, MobileNetV2, InceptionV3, DenseNet-121, and DenseNet-201) | 6335 | 6 | 95.28% | – |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Pateriya, P.; Trivedi, A.; Malhotra, R. SBMEV: A Stacking-Based Meta-Ensemble Vehicle Classification Framework for Real-World Traffic Surveillance. Appl. Sci. 2026, 16, 520. https://doi.org/10.3390/app16010520

