A Visual Trajectory-Based Method for Personnel Behavior Recognition in Industrial Scenarios
Abstract
1. Introduction
- Development of an integrated processing pipeline tailored for industrial scenarios, combining robust detection, multi-object tracking, perspective correction, and trajectory-feature-based classification into an automated workflow for multi-person behavior recognition.
- Proposal of an enhanced YOLOv8n personnel detection and localization model:
- We integrate RFAConv (Receptive-Field Attention Convolution) into the model to more effectively capture information disparities across spatial locations, thereby enhancing detection accuracy.
- The incorporation of the EMA (Efficient Multi-scale Attention) module facilitates efficient extraction of small-target features while strengthening the model’s focus on critical attributes.
- Development of a perspective correction method tailored for industrial settings: A perspective transformation algorithm establishes a mapping from tilted camera views to top-view space, removing the distortion that oblique surveillance angles introduce into personnel movement trajectories.
- Design of discriminative trajectory features: We introduce novel features, including trajectory curvature entropy and primary direction angle, which effectively capture subtle motion patterns.
2. Related Work
2.1. Robust Trajectory Acquisition in Industrial Environments
2.2. Geometric Distortion Correction via Perspective Transformation
2.3. Personnel Behavior Recognition Model Establishment
3. Methodologies
3.1. Overall Framework
- Personnel Detection: Video frames are fed into the enhanced YOLOv8 model to localize personnel in an industrial scene.
- Personnel Tracking: The BoT-SORT tracker associates personnel IDs and generates trajectory position data.
- Perspective Correction: Surveillance footage undergoes perspective transformation to map raw trajectory coordinates to a standardized bird’s-eye view space.
- Feature Extraction: Transformed trajectories undergo feature engineering to construct a trajectory feature vector dataset.
- Behavior Recognition: The Random Forest classifier identifies personnel behaviors based on the extracted features (the complete pipeline is sketched in code below).
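To make the data flow concrete, the following is a minimal end-to-end sketch of the five stages using the Ultralytics tracking API, OpenCV, and NumPy. The weight path, video source, and homography file are placeholders, and the feature/classifier step refers forward to the sketches in Sections 3.5 and 4.6.1; this illustrates the workflow rather than reproducing the authors' released code.

```python
import cv2
import numpy as np
from ultralytics import YOLO

# Stages 1-2: detection + BoT-SORT tracking (Ultralytics ships a botsort.yaml config)
model = YOLO("improved_yolov8n.pt")          # placeholder: the enhanced detector weights
results = model.track(source="factory_cam.mp4", tracker="botsort.yaml",
                      persist=True, classes=[0], stream=True)

# Stage 3: homography to bird's-eye view, computed as in Section 3.4 (placeholder file)
H = np.load("homography.npy")

tracks = {}                                   # track_id -> list of top-view points
for r in results:
    if r.boxes.id is None:
        continue
    ids = r.boxes.id.int().tolist()
    xywh = r.boxes.xywh.cpu().numpy()
    # the bottom-center of each box approximates the ground-contact point
    feet = np.stack([xywh[:, 0], xywh[:, 1] + xywh[:, 3] / 2], axis=1)
    bev = cv2.perspectiveTransform(feet.reshape(-1, 1, 2).astype(np.float32), H)
    for tid, (x, y) in zip(ids, bev.reshape(-1, 2)):
        tracks.setdefault(tid, []).append((float(x), float(y)))

# Stages 4-5 (Sections 3.5 and 4.6.1): per-track feature vectors feed a trained
# Random Forest, e.g. label = clf.predict([trajectory_features(traj)])[0]
```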
3.2. Behavior Analysis
- Normal Walking: Near-linear trajectory with consistent velocity.
- Prolonged Stillness: Minimal directional changes and increasing trajectory length over time within specific regions.
- Abnormal Acceleration: Sudden velocity increase within localized zones.
- Area Loitering: Extended presence within a confined area with near-zero displacement magnitude.
3.3. Improved YOLOv8n
3.3.1. Incorporation of the RFAConv
- The Attention Weight Branch: This branch generates attention maps by first using average pooling to aggregate global information, followed by 1 × 1 convolutions for efficient channel interaction. A Softmax function then produces the final weights, which highlight the most critical feature locations within the receptive field.
- The Spatial Feature Branch: Operating concurrently, this branch uses grouped convolutions to extract a rich set of spatial feature maps from the same input features. Each group of convolutions generates a distinct feature map corresponding to a specific part of the receptive field (a PyTorch sketch of the two branches follows).
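Below is a minimal PyTorch sketch of this two-branch structure, modeled on publicly available RFAConv implementations; the layer choices and names are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class RFAConv(nn.Module):
    """Receptive-Field Attention Convolution (sketch) for a k x k receptive field."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.k = k
        # attention weight branch: average pooling -> 1x1 grouped conv -> softmax
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=stride, padding=k // 2),
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=1, groups=in_ch, bias=False),
        )
        # spatial feature branch: grouped conv expands each channel into k*k maps
        self.gen_feature = nn.Sequential(
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=k, stride=stride,
                      padding=k // 2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch * k * k),
            nn.ReLU(inplace=True),
        )
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=k)

    def forward(self, x):
        b, c = x.shape[:2]
        aw = self.get_weight(x)
        h, w = aw.shape[2:]
        # softmax over the k*k positions of each channel's receptive field
        attn = torch.softmax(aw.view(b, c, self.k ** 2, h, w), dim=2)
        feat = self.gen_feature(x).view(b, c, self.k ** 2, h, w)
        out = (attn * feat).view(b, c, self.k, self.k, h, w)
        # tile each weighted k x k window into the spatial plane, then reduce
        out = out.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * self.k, w * self.k)
        return self.conv(out)

# e.g. RFAConv(64, 128)(torch.randn(1, 64, 40, 40)).shape -> (1, 128, 40, 40)
```

The final stride-k convolution collapses each attention-weighted k × k window back to one output location, so spatial resolution matches a standard convolution.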
3.3.2. Incorporation of the EMA
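For orientation, here is a sketch of the EMA module following the public reference implementation accompanying [25]: features are split into channel groups, gated along H and W by a 1 × 1 branch, passed through a parallel 3 × 3 branch, and fused by cross-spatial matrix attention. Treat the grouping factor and layer arrangement as indicative, not as the authors' exact integration into YOLOv8n.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-scale Attention (sketch after the reference code of [25])."""
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        cg = channels // self.groups
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global descriptor per branch
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> H profile
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> W profile
        self.gn = nn.GroupNorm(cg, cg)
        self.conv1x1 = nn.Conv2d(cg, cg, kernel_size=1)
        self.conv3x3 = nn.Conv2d(cg, cg, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = x.reshape(b * self.groups, -1, h, w)
        # 1x1 branch: joint H/W encoding, split back, and gate the group features
        x_h = self.pool_h(g)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(g)                           # 3x3 branch: local context
        # cross-spatial learning: each branch's global descriptor attends to the other
        y1 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        y2 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        m1 = x2.reshape(b * self.groups, c // self.groups, -1)
        m2 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (y1 @ m1 + y2 @ m2).reshape(b * self.groups, 1, h, w)
        return (g * weights.sigmoid()).reshape(b, c, h, w)
```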
3.4. Perspective Transformation
- Feature point calibration: The four vertices of the rectification area in the surveillance image are located through annotation.
- Define target dimensions: The target image dimensions are set according to the scale ratio of the region of interest.
- Execute matrix computation and transformation: Compute the 3 × 3 homography matrix $H$ using the DLT [44] with four pairs of source–target point correspondences.
- For each pixel $(u, v)$ in the target image, inversely map to source coordinates via $(x, y, w)^{\top} = H^{-1}(u, v, 1)^{\top}$, yielding $(x_s, y_s) = (x/w, y/w)$.
- The pixel value at $(x_s, y_s)$ is computed via an interpolation algorithm (e.g., bilinear) and populated into the target image (see the OpenCV sketch below).
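In OpenCV these steps collapse to a few calls, since `warpPerspective` performs the inverse mapping and interpolation internally. The corner coordinates and metre-to-pixel scale below are illustrative placeholders, not calibration values from the paper.

```python
import cv2
import numpy as np

# Step 1: four annotated vertices of the rectification area in the camera image
src = np.float32([[412, 310], [868, 295], [1010, 640], [265, 660]])  # placeholders

# Step 2: target size from the real-world aspect ratio of the region (e.g., 10 m x 6 m)
scale = 80                                    # pixels per metre, an assumption
w, h = 10 * scale, 6 * scale
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Step 3: DLT on the four correspondences; OpenCV solves the 3x3 homography H
H = cv2.getPerspectiveTransform(src, dst)

# Steps 4-5: inverse mapping + bilinear interpolation, done inside warpPerspective
img = cv2.imread("frame.jpg")                 # placeholder frame
topdown = cv2.warpPerspective(img, H, (w, h), flags=cv2.INTER_LINEAR)

# Trajectory points are mapped the same way without warping the whole image
traj = np.float32([[[600, 480]], [[615, 470]]])   # (N, 1, 2) pixel points
traj_bev = cv2.perspectiveTransform(traj, H)
```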
3.5. Trajectory Feature Engineering
- Trajectory Curvature Entropy
- Primary Direction Angle
- Motion Velocity and Directional Anomaly
- Path Length and Displacement (a sketch computing all four feature groups follows this list)
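Since the paper's exact formulas are not reproduced here, the following NumPy sketch uses plausible stand-in definitions: curvature entropy as the Shannon entropy of the turning-angle histogram, primary direction as the angle of the net displacement vector, and straightforward speed, path-length, and displacement statistics over perspective-corrected trajectories.

```python
import numpy as np

def trajectory_features(traj, fps=25.0, n_bins=12):
    """Sketch: features from an (N, 2) array of bird's-eye-view positions, N >= 3."""
    traj = np.asarray(traj, dtype=float)
    seg = np.diff(traj, axis=0)                        # per-frame displacement vectors
    step = np.linalg.norm(seg, axis=1)
    path_length = step.sum()
    displacement = np.linalg.norm(traj[-1] - traj[0])

    headings = np.arctan2(seg[:, 1], seg[:, 0])
    turns = np.angle(np.exp(1j * np.diff(headings)))   # turning angles in (-pi, pi]
    hist, _ = np.histogram(turns, bins=n_bins, range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)
    curvature_entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # assumed definition

    net = traj[-1] - traj[0]
    primary_direction = np.arctan2(net[1], net[0])     # assumed: angle of net motion

    speed = step * fps
    return {
        "curvature_entropy": curvature_entropy,
        "primary_direction": primary_direction,
        "mean_speed": speed.mean(),
        "max_speed": speed.max(),
        "path_length": path_length,
        "displacement": displacement,
        "straightness": displacement / max(path_length, 1e-9),
    }
```

Intuitively, normal walking yields low curvature entropy and straightness near 1, loitering yields high entropy with near-zero displacement, and abnormal acceleration shows a high max-to-mean speed ratio.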
4. Experiments and Analysis
4.1. Experimental Settings
4.1.1. Dataset
4.1.2. Implementation
- Input image size: 640 × 640;
- Epoch: 150;
- Initial learning rate: 0.01;
- Weight decay: 0.05;
- Batch size: 8;
- Optimizer: SGD;
- Patience: 20;
- Data augmentation strategy: mosaic, flip, scale, translate, HSV (these settings map onto the training call sketched below).
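Assuming the standard Ultralytics training interface, the configuration above corresponds roughly to the call below; the augmentation magnitudes (`mosaic=1.0`, `fliplr=0.5`, etc.) are illustrative defaults rather than values reported in the paper.

```python
from ultralytics import YOLO

model = YOLO("improved_yolov8n.yaml")   # placeholder: model config with RFAConv + EMA
model.train(
    data="industrial_personnel.yaml",   # placeholder dataset config
    imgsz=640,
    epochs=150,
    lr0=0.01,
    weight_decay=0.05,
    batch=8,
    optimizer="SGD",
    patience=20,
    mosaic=1.0,                         # augmentation switches from the list above
    fliplr=0.5,
    scale=0.5,
    translate=0.1,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
)
```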
4.2. Evaluation Metrics
4.2.1. Evaluation Metrics for Detection
4.2.2. Evaluation Metrics for Tracking
4.2.3. Evaluation Metrics for Personnel Behavior Recognition
4.3. Object Detection Experiments
4.3.1. Comparative Experiments of Different Models
4.3.2. Ablation Study
4.3.3. Visualization Analysis
4.3.4. Performance on Environment-Specific Test
4.4. Experiments on Tracking
4.5. Perspective Transformation Experiment
4.6. Personnel Behavior Recognition Experiment
4.6.1. Classifier Optimization and Feature Validation
- class_weight = "balanced";
- max_depth = 15;
- min_samples_leaf = 1;
- min_samples_split = 2;
- n_estimators = 150 (see the classifier sketch below).
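These values map one-to-one onto scikit-learn's `RandomForestClassifier`. A minimal sketch, with synthetic placeholder data standing in for the trajectory feature vectors of Section 3.5:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# placeholders: in practice X holds trajectory feature vectors, y behavior labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))
y = rng.choice(["walk", "loiter", "accel", "still"], size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(
    n_estimators=150,
    max_depth=15,
    min_samples_split=2,
    min_samples_leaf=1,
    class_weight="balanced",   # compensates for the smaller Abnormal Acceleration class
    random_state=42,
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```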
4.6.2. Experimental Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Xiao, X.; Jun, K.; Jian, W.; Long, C.; Hong, H. Experimental platform for industrial worker behavior analysis based on computer vision. Exp. Technol. Manag. 2024, 41, 101–110. [Google Scholar]
- Wang, H.; Lv, L.; Li, X.; Li, H.; Leng, J.; Zhang, Y.; Thomson, V.; Liu, G.; Wen, X.; Sun, C.; et al. A safety management approach for industry 5.0's human-centered manufacturing based on digital twin. J. Manuf. Syst. 2023, 66, 1–12. [Google Scholar] [CrossRef]
- Lu, Y.; Zheng, H.; Chand, S.; Xia, W.; Liu, Z.; Xu, X.; Wang, L.; Qin, Z.; Bao, J. Outlook on human-centric manufacturing towards industry 5.0. J. Manuf. Syst. 2022, 62, 612–627. [Google Scholar] [CrossRef]
- Ahmed, I.; Anisetti, M.; Jeon, G. An IoT-based human detection system for complex industrial environment with deep learning architectures and transfer learning. Int. J. Intell. Syst. 2022, 37, 10249–10267. [Google Scholar] [CrossRef]
- Fort, A.; Peruzzi, G.; Pozzebon, A. Quasi-real time remote video surveillance unit for LoRaWAN-based image transmission. In Proceedings of the 2021 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT), Rome, Italy, 7–9 June 2021; pp. 588–593. [Google Scholar]
- Sun, Q.; Yang, Y. Unsupervised video anomaly detection based on multi-timescale trajectory prediction. Comput. Vis. Image Underst. 2023, 227, 103615. [Google Scholar] [CrossRef]
- Nguyen, H.; Ta, N.; Nguyen, C.; Bui, T.; Pham, M.; Nguyen, M. YOLO based real-time human detection for smart video surveillance at the edge. In Proceedings of the 2020 IEEE Eighth International Conference on Communications and Electronics (ICCE), Phu Quoc Island, Vietnam, 13–15 January 2021; pp. 439–444. [Google Scholar] [CrossRef]
- Li, W.; Li, J.; Xiao, F.; Li, X. Research on pedestrian target detection method based on improved YOLOv8 algorithm. In Proceedings of the 2024 IEEE 8th International Conference on Vision, Image and Signal Processing (ICVISP), Kunming, China, 27–29 December 2024; pp. 1–5. [Google Scholar] [CrossRef]
- Mao, J.; Wang, H.; Li, D. Optimized and improved YOLOv8 dense pedestrian detection algorithm. In Proceedings of the 2024 9th International Conference on Communication, Image and Signal Processing (CCISP), Gold Coast, Australia, 13–15 November 2024; pp. 57–61. [Google Scholar] [CrossRef]
- Rajakumar, S.; Azad, R.B. A novel YOLOv8 architecture for human activity recognition of occluded pedestrians. Int. J. Electr. Comput. Eng. 2024, 14, 2088–8708. [Google Scholar] [CrossRef]
- Zhao, X.; Li, H.; Zhao, Z.; Li, S. Height reverse perspective transformation for crowd counting. Front. Imaging 2023, 2, 1271885. [Google Scholar] [CrossRef]
- Yin, W.; Zang, X.; Wu, L.; Zhang, X.; Zhao, J. A distortion correction method based on actual camera imaging principles. Sensors 2024, 24, 2406. [Google Scholar] [CrossRef]
- Wang, C.; Ding, Y.; Cui, K.; Li, J.; Xu, Q.; Mei, J. A perspective distortion correction method for planar imaging based on homography mapping. Sensors 2025, 25, 1891. [Google Scholar] [CrossRef] [PubMed]
- Colque, R.M.; Cayllahua, E.; de Melo, V.C.; Chavez, G.C.; Schwartz, W.R. Anomaly event detection based on people trajectories for surveillance videos. In Proceedings of the VISIGRAPP (5: VISAPP), Valletta, Malta, 27–29 February 2020; pp. 107–116. [Google Scholar]
- Hu, X.; Dai, J.; Huang, Y.; Yang, H.; Zhang, L.; Chen, W.; Yang, G.; Zhang, D. A weakly supervised framework for abnormal behavior detection and localization in crowded scenes. Neurocomputing 2020, 383, 270–281. [Google Scholar] [CrossRef]
- Lee, Y.; Lee, S.; Kim, B.; Kim, D. GLBRF: Group-based lightweight human behavior recognition framework in video camera. Appl. Sci. 2024, 14, 2424. [Google Scholar] [CrossRef]
- Pronello, C.; Garzón Ruiz, X.R. Evaluating the performance of video-based automated passenger counting systems in real-world conditions: A comparative study. Sensors 2023, 23, 7719. [Google Scholar] [CrossRef]
- Li, W. Analysis of object detection performance based on Faster R-CNN. J. Phys. Conf. Ser. 2021, 1827, 012085. [Google Scholar] [CrossRef]
- Terven, J.; Córdova-Esparza, D.; Romero-González, J. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318. [Google Scholar] [CrossRef]
- Zhu, J. Research on remote sensing small target detection method based on lightweight YOLOv8. Acad. J. Comput. Inf. Sci. 2025, 8, 83–89. [Google Scholar] [CrossRef]
- Wan, Z.; Lan, Y.; Xu, Z.; Shang, K.; Zhang, F. DAU-YOLO: A lightweight and effective method for small object detection in UAV images. Remote Sens. 2025, 17, 1768. [Google Scholar] [CrossRef]
- Zheng, X.; Bi, J.; Li, K.; Zhang, G.; Jiang, P. SMN-YOLO: Lightweight YOLOv8-based model for small object detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2025, 22, 1–5. [Google Scholar] [CrossRef]
- Bai, J.; Zhang, S. RFA-YOLOv8: Steel plate surface defect detection algorithm based on improved YOLOv8. J. Phys. Conf. Ser. 2024, 2872, 012025. [Google Scholar] [CrossRef]
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object tracking by associating every detection box. In Proceedings of the 17th European Conference on Computer Vision, ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar]
- Li, T.; Li, Z.; Mu, Y.; Su, J. Pedestrian multi-object tracking based on YOLOv7 and BoT-SORT. In Proceedings of the 2023 3rd International Conference on Computer Vision and Pattern Analysis (ICCPA 2023), Hangzhou, China, 7–9 April 2023; Volume 12754. [Google Scholar]
- Habib, S.; Hussain, A.; Albattah, W.; Islam, M.; Khan, S.; Khan, R.U.; Khan, K. Abnormal activity recognition from surveillance videos using convolutional neural network. Sensors 2021, 21, 8291. [Google Scholar] [CrossRef] [PubMed]
- Li, X.; Zhang, B.; Sander, P.V.; Liao, J. Blind geometric distortion correction on images through deep learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4855–4864. [Google Scholar]
- Bruls, T.; Porav, H.; Kunze, L.; Newman, P. The right (angled) perspective: Improving the understanding of road scenes using boosted inverse perspective mapping. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 302–309. [Google Scholar] [CrossRef]
- Abu-raddaha, A.; El-Shair, Z.A.; Rawashdeh, S. Leveraging perspective transformation for enhanced pothole detection in autonomous vehicles. J. Imaging 2024, 10, 227. [Google Scholar] [CrossRef]
- Zhang, J.; Chen, B.; Gao, J.; Ji, S.; Zhang, X.; Wang, Z. A perspective transformation method based on computer vision. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; pp. 765–768. [Google Scholar] [CrossRef]
- Wang, Z.; Fan, Y.; Zhang, H. Lane-line detection algorithm for complex road based on OpenCV. In Proceedings of the 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 11–13 October 2019; pp. 1404–1407. [Google Scholar] [CrossRef]
- Kaseris, M.; Kostavelis, I.; Malassiotis, S. A comprehensive survey on deep learning methods in human activity recognition. Mach. Learn. Knowl. Extr. 2024, 6, 842–876. [Google Scholar] [CrossRef]
- Kulbacki, M.; Segen, J.; Chaczko, Z.; Rozenblit, J.W.; Kulbacki, M.; Klempous, R.; Wojciechowski, K. Intelligent video analytics for human action recognition: The state of knowledge. Sensors 2023, 23, 4258. [Google Scholar] [CrossRef] [PubMed]
- Zhao, S.; Zhu, J.; Lu, J.; Ju, Z.; Wu, D. Lightweight human behavior recognition method for visual communication AGV based on CNN-LSTM. Int. J. Crowd Sci. 2025, 9, 133–138. [Google Scholar] [CrossRef]
- Rodrigues, R.; Bhargava, N.; Velmurugan, R.; Chaudhuri, S. Multi-timescale trajectory prediction for abnormal human activity detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2626–2634. [Google Scholar]
- Fernández, J.D.; García-González, J.; Benítez-Rochel, R.; Molina-Cabello, M.A.; López-Rubio, E. Anomalous trajectory detection for automated traffic video surveillance. In Bio-Inspired Systems and Applications: From Robotics to Ambient Intelligence; Ferrández Vicente, J.M., Ed.; Springer: Cham, Switzerland, 2020; Volume 13259. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Bustoni, I.A.; Hidayatulloh, I.; Ningtyas, A.M.; Purwaningsih, A.; Azhari, S.N. Classification methods performance on human activity recognition. J. Phys. Conf. Ser. 2020, 1456, 012027. [Google Scholar] [CrossRef]
- Baldominos, A.; Cervantes, A.; Saez, Y.; Isasi, P. A comparison of machine learning and deep learning techniques for activity recognition using mobile devices. Sensors 2019, 19, 521. [Google Scholar] [CrossRef]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 1st ed.; Cambridge University Press: Cambridge, UK, 2003; pp. 88–109. [Google Scholar]
- Abdel-Aziz, Y.I.; Karara, H.M.; Hauck, M. Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Photogramm. Eng. Remote Sens. 2015, 81, 103–107. [Google Scholar] [CrossRef]
- Chartrand, R.; Yin, W. Iteratively reweighted algorithms for compressive sensing. In Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 31 March–4 April 2008; pp. 3869–3872. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1116–1124. [Google Scholar]
- CVHub520/X-AnyLabeling. Available online: https://github.com/CVHub520/X-AnyLabeling (accessed on 8 September 2025).
Models | AP50 (%) | AP50:95 (%) | Model Size (MB) | GFLOPs | FPS |
---|---|---|---|---|---|
Faster R-CNN | 58.3 | 26.7 | 524.5 | 386.2 | 32 |
YOLOv5-n | 69.4 | 37.2 | 3.6 | 4.1 | 702 |
YOLOv5-s | 72.7 | 38.0 | 13.7 | 15.8 | 399 |
YOLOv7-tiny | 74.3 | 39.2 | 11.7 | 13.0 | 285 |
YOLOv8-n | 71.3 | 40.6 | 5.9 | 8.1 | 433 |
YOLOv8-s | 77.5 | 43.4 | 21.5 | 28.4 | 181 |
YOLOv9-t | 70.9 | 37.8 | 5.8 | 10.7 | 172 |
YOLOv10-n | 70.6 | 35.9 | 5.5 | 8.2 | 415 |
Ours | 78.2 | 44.8 | 6.0 | 8.4 | 201 |
Baseline | RFA | EMA | AP50 (%) | AP50:95 (%) | Model Size (MB) | GFLOPs | FPS |
---|---|---|---|---|---|---|---|
YOLOv8n | | | 71.3 | 40.6 | 5.9 | 8.1 | 433 |
YOLOv8n | √ | | 75.8 | 44.2 | 6.0 | 8.3 | 237 |
YOLOv8n | | √ | 74.1 | 43.5 | 6.0 | 8.1 | 369 |
YOLOv8n | √ | √ | 78.2 | 44.8 | 6.0 | 8.4 | 201 |
Condition | Model | AP50 (%) | AP50:95 (%) |
---|---|---|---|
(a) | Baseline | 67.7 | 37.4 |
(a) | Ours | 75.7 | 42.2 |
(b) | Baseline | 60.8 | 29.3 |
(b) | Ours | 71.7 | 38.1 |
(c) | Baseline | 71.3 | 40.6 |
(c) | Ours | 78.2 | 44.8 |
Model | Scene | MOTA (%) | MOTP (%) | FPS |
---|---|---|---|---|
Improved YOLOv8n + BoT-SORT | 1 | 86.95 | 7.81 | 59.2 |
Improved YOLOv8n + BoT-SORT | 2 | 80.10 | 11.84 | 57.3 |
Improved YOLOv8n + BoT-SORT | 3 | 71.92 | 13.90 | 56.7 |
Behavior | Precision (%) | Recall (%) | F1-Score (%) | Number of Test Samples |
---|---|---|---|---|
Normal Walking | 83.4 | 81.1 | 82.4 | 95 |
Area Loitering | 84.4 | 83.5 | 83.9 | 97 |
Abnormal Acceleration | 80.6 | 86.2 | 83.3 | 29 |
Prolonged Stillness | 88.4 | 90.5 | 89.4 | 84 |