Optimization of Indoor Pedestrian Counting Based on Target Detection and Tracking
Abstract
1. Introduction
2. System Framework and Optimization Methods
2.1. System Framework and Process
- (1) The YOLOv8n model is made lightweight, reducing its size and computational cost while preserving its detection performance.
- (2) Pedestrian trajectories are smoothed by correcting pedestrian positions, which reduces trajectory jumps and improves trajectory stability.
- (3) The motion trends of neighboring pedestrians are used to correct the predicted trajectories of occluded pedestrians, markedly improving the accuracy of their predicted positions.
- (4) An auxiliary counting zone is designed to reduce counting errors that tend to occur in narrow areas, improving the accuracy of pedestrian counting.
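As a rough illustration of contribution (4), the sketch below counts line crossings only when a track fully traverses a buffer band around the counting line, so jitter near the line cannot trigger repeated counts. This is a minimal, hypothetical sketch: the function name, the horizontal-line geometry, and the In/Out direction convention are our assumptions, not the paper's implementation.

```python
# Hypothetical sketch of line-crossing counting with an auxiliary (buffer)
# zone: a crossing is registered only when a track moves from one side of
# the band [line_y - buffer, line_y + buffer] to the other, suppressing
# double counts from jittery trajectories in narrow areas.

def count_crossings(track_ys, line_y=100.0, buffer=10.0):
    """Count (in, out) crossings of a horizontal counting line for one track.

    track_ys: sequence of the track's y-coordinates, one per frame.
    """
    in_count = out_count = 0
    side = None  # 'above' or 'below'; set only once the track leaves the band
    for y in track_ys:
        if y < line_y - buffer:
            if side == 'below':
                in_count += 1  # traversed the whole band upward
            side = 'above'
        elif y > line_y + buffer:
            if side == 'above':
                out_count += 1  # traversed the whole band downward
            side = 'below'
        # Inside the band the previous side is kept, so oscillation around
        # the line does not produce extra counts.
    return in_count, out_count
```

With a jittery track such as `[80, 95, 105, 98, 103, 115]`, a zero-width line (`buffer=0`) registers three crossings, while a 10-pixel band registers the single genuine one.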
2.2. Improvement of Pedestrian Target Detection
2.2.1. Lightweight Module Design: RGCSPELAN
2.2.2. LSCD Detection Head
2.3. Improvement of Pedestrian Target Tracking and Counting Model
2.3.1. Problems Existing in DeepSort
2.3.2. Correction of Pedestrian Position
2.3.3. Improvement of the Target Prediction Position
2.3.4. Counting with Auxiliary Zone
3. Experiment and Analysis
3.1. Experimental Dataset
3.2. Environmental Configuration and Evaluation Indicators
3.3. Comparative Experiment on the Improvement of Object Detection Models
3.4. Ablation Experiment of Target Detection Model
3.5. Influence of Parameters on Experiments
3.5.1. Influences of Tolerance Threshold and Pulling Step
3.5.2. Analysis of Buffer
3.5.3. Influences of the Length of Frame Window
3.6. Pedestrian Counting Experiment
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Configuration | Training & Test Environment (TE1) | Test Environment (TE2) |
|---|---|---|
| Purpose | Model Training & Test | Model Test |
| Operating System | Windows 11 | Windows 10 |
| Memory | 128 GB | 16 GB |
| CPU | Intel Core i9-13900K | Intel Core i5-12400F |
| GPU | NVIDIA GeForce RTX 3090 | NVIDIA GeForce RTX 4060 Ti |
| GPU Memory | 24 GB | 8 GB |
| CUDA Toolkit | 12.1 | 11.3 |
| Python Version | 3.11 | 3.8 |
| Deep Learning Framework | PyTorch 2.4.1 + cu121 | PyTorch 1.13 + cu117 |
| Models | Precision (%) | Recall (%) | mAP@50 (%) | mAP@50:95 (%) | FLOPs (G) | Param (M) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv5n | 94.68 | 90.06 | 95.58 | 75.97 | 7.1 | 2.50 | 157.37 |
| YOLOv8n | 94.25 | 91.18 | 95.89 | 76.41 | 8.1 | 3.01 | 225.81 |
| YOLO11n | 94.75 | 90.32 | 95.58 | 76.35 | 6.3 | 2.58 | 196.61 |
| RL_YOLOv8 | 94.99 | 91.25 | 96.40 | 76.50 | 5.5 | 1.67 | 233.80 |
| Model | YOLOv5n | YOLOv8n | YOLO11n | RL_YOLOv8 |
|---|---|---|---|---|
| FPS | 60.27 | 67.15 | 54.69 | 72.15 |
| Models | Precision | Recall | mAP@50 | mAP@50:95 | FLOPs (G) | Param (M) |
|---|---|---|---|---|---|---|
| YOLOv5n | 0.7698 | 0.6084 | 0.7120 | 0.4295 | 7.1 | 2.50 |
| YOLOv8n | 0.7723 | 0.6125 | 0.7153 | 0.4341 | 8.1 | 3.01 |
| YOLO11n | 0.7704 | 0.6040 | 0.7091 | 0.4315 | 6.3 | 2.58 |
| RL_YOLOv8 | 0.7631 | 0.6156 | 0.7151 | 0.4328 | 5.5 | 1.67 |
| Models | Precision | Recall | mAP@50 | mAP@50:95 | FLOPs (G) | Param (M) |
|---|---|---|---|---|---|---|
| YOLOv5n | 0.7826 | 0.4970 | 0.5753 | 0.3494 | 7.1 | 2.50 |
| YOLOv8n | 0.7825 | 0.5023 | 0.5880 | 0.3591 | 8.1 | 3.01 |
| YOLO11n | 0.7871 | 0.5025 | 0.5823 | 0.3561 | 6.3 | 2.58 |
| RL_YOLOv8 | 0.7588 | 0.5245 | 0.6021 | 0.3670 | 5.5 | 1.67 |
| RGCSPELAN | LSCD | Precision (%) | Recall (%) | mAP@50 (%) | mAP@50:95 (%) | FLOPs (G) | Param (M) | FPS |
|---|---|---|---|---|---|---|---|---|
| - | - | 94.25 | 91.18 | 95.66 | 75.97 | 8.1 | 3.01 | 225.81 |
| √ | - | 94.91 | 90.68 | 95.66 | 76.35 | 7.1 | 2.31 | 234.84 |
| - | √ | 94.46 | 90.55 | 95.83 | 76.76 | 6.5 | 2.36 | 227.36 |
| √ | √ | 94.99 | 91.25 | 96.40 | 76.50 | 5.5 | 1.67 | 233.80 |
| Tolerance Threshold | True Count (In + Out) | Model Counting (In + Out) | TP (In + Out) | FP (In + Out) | FN (In + Out) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|
| 3 | 55 + 8 | 50 + 11 | 48 + 7 | 2 + 4 | 7 + 1 | 90.16 | 87.30 | 88.71 |
| 5 | 55 + 8 | 55 + 10 | 52 + 7 | 3 + 3 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| 8 | 55 + 8 | 56 + 9 | 52 + 7 | 4 + 2 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| 10 | 55 + 8 | 56 + 9 | 52 + 7 | 4 + 2 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| 15 | 55 + 8 | 55 + 9 | 51 + 7 | 4 + 2 | 4 + 1 | 90.63 | 92.06 | 91.34 |
| 20 | 55 + 8 | 55 + 9 | 51 + 7 | 4 + 2 | 4 + 1 | 90.63 | 92.06 | 91.34 |
| 30 | 55 + 8 | 54 + 9 | 50 + 7 | 4 + 2 | 5 + 1 | 90.48 | 90.48 | 90.48 |
| 40 | 55 + 8 | 54 + 9 | 50 + 7 | 4 + 2 | 5 + 1 | 90.48 | 90.48 | 90.48 |
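The Precision, Recall, and F1 columns above are the usual detection-style metrics computed from the pooled In and Out counts. As a quick check, the tolerance-threshold-5 row can be reproduced as follows (a minimal sketch; the function and variable names are ours):

```python
# Recompute Precision/Recall/F1 from the TP/FP/FN entries of one table row.
# In and Out counts are pooled before computing the metrics, which matches
# the published values for the tolerance-threshold-5 row.

def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Tolerance threshold = 5: TP = 52 + 7, FP = 3 + 3, FN = 3 + 1
p, r, f1 = prf1(52 + 7, 3 + 3, 3 + 1)
print(round(100 * p, 2), round(100 * r, 2), round(100 * f1, 2))
# → 90.77 93.65 92.19
```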
| Buffer | True Count (In + Out) | Model Counting (In + Out) | TP (In + Out) | FP (In + Out) | FN (In + Out) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 55 + 8 | 55 + 9 | 51 + 7 | 4 + 2 | 4 + 1 | 90.63 | 92.06 | 91.34 |
| 1 | 55 + 8 | 56 + 9 | 52 + 7 | 4 + 2 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| 1.5 | 55 + 8 | 54 + 9 | 50 + 7 | 4 + 2 | 5 + 1 | 90.48 | 90.48 | 90.48 |
| 2 | 55 + 8 | 55 + 10 | 51 + 7 | 4 + 3 | 4 + 1 | 89.23 | 92.06 | 90.63 |
| Frame Window | True Count (In + Out) | Model Counting (In + Out) | TP (In + Out) | FP (In + Out) | FN (In + Out) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|
| 5 | 55 + 8 | 50 + 9 | 48 + 7 | 2 + 2 | 7 + 1 | 93.22 | 87.30 | 90.16 |
| 10 | 55 + 8 | 54 + 9 | 51 + 7 | 3 + 2 | 4 + 1 | 92.06 | 92.06 | 92.06 |
| 15 | 55 + 8 | 56 + 9 | 52 + 7 | 4 + 2 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| 20 | 55 + 8 | 55 + 9 | 50 + 7 | 5 + 2 | 5 + 1 | 89.06 | 90.48 | 89.76 |
| 25 | 55 + 8 | 57 + 9 | 52 + 7 | 5 + 2 | 3 + 1 | 89.39 | 93.65 | 91.47 |
| 30 | 55 + 8 | 58 + 9 | 50 + 7 | 8 + 2 | 5 + 1 | 85.07 | 90.48 | 87.69 |
| Result | Scene | True Count (In + Out) | Model Counting (In + Out) | TP (In + Out) | FP (In + Out) | FN (In + Out) | P (%) | R (%) | F1 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Original counting result | Line 1 | 55 + 8 | 59 + 8 | 52 + 6 | 7 + 2 | 3 + 2 | 86.57 | 92.06 | 89.23 |
| Original counting result | Line 2 | 12 + 4 | 12 + 1 | 11 + 1 | 1 + 0 | 1 + 3 | 92.31 | 75.00 | 82.76 |
| Original counting result | Line 3 | 104 + 28 | 111 + 32 | 103 + 28 | 8 + 4 | 1 + 0 | 91.61 | 99.24 | 95.27 |
| Improved counting result | Line 1 | 55 + 8 | 56 + 9 | 52 + 7 | 4 + 2 | 3 + 1 | 90.77 | 93.65 | 92.19 |
| Improved counting result | Line 2 | 12 + 4 | 11 + 4 | 10 + 4 | 1 + 0 | 2 + 0 | 93.33 | 87.50 | 90.32 |
| Improved counting result | Line 3 | 104 + 28 | 107 + 28 | 104 + 28 | 3 + 0 | 0 + 0 | 97.78 | 100.00 | 98.88 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Song, L.; Han, L.; Wang, J.; Feng, H.; Ji, R. Optimization of Indoor Pedestrian Counting Based on Target Detection and Tracking. ISPRS Int. J. Geo-Inf. 2026, 15, 136. https://doi.org/10.3390/ijgi15030136

