4.3.1. Pedestrian Detection Result
The training results for the improved YOLOv8 model are presented in Figure 5, which shows how the training loss, validation loss, precision, recall, mAP@0.5, and mAP@0.5:0.95 change as the number of epochs increases. Both the training loss and the validation loss decrease over time, indicating that the model is learning, while precision, recall, mAP@0.5, and mAP@0.5:0.95 trend upward, reflecting improved detection performance. These curves support the effectiveness of the pedestrian detection model in identifying pedestrians across various scenarios.
As shown in Table 1, we evaluated our proposed model against the baseline YOLOv8 network. Integrating the SoftNMS technique improved every detection metric: precision increased by 0.93%, recall by 1.55%, mAP@0.5 by 0.61%, and mAP@0.5:0.95 by 10.17%. Incorporating GhostNet, in turn, reduced model complexity: the number of parameters fell by 39.98%, corresponding to a 37.1% decrease in model size, and the FLOPs were reduced by 35.8%. Combining SoftNMS and GhostNet led to a 3.38% increase in precision and a 3.07% increase in mAP@0.5:0.95. These findings indicate that the two techniques improve the YOLOv8n model while maintaining a balance between accuracy and a lightweight design.
Although Ghost convolution on its own degrades detection performance, combining SoftNMS with Ghost convolution improves performance over the baseline. This is due to the complementary nature of the two techniques: SoftNMS suppresses duplicate detections, compensating for the slight performance decrease caused by Ghost convolution. Moreover, Ghost convolution brings reduced complexity and computation cost, enhanced feature representation, and an increased receptive field, all of which contribute to the overall improvement.
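The duplicate-suppression behavior described above can be illustrated with a minimal sketch of Soft-NMS. The Gaussian score decay, the `sigma` value, and the score threshold are assumptions chosen for illustration; this section does not state which Soft-NMS variant or hyperparameters were used in our model.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (illustrative; sigma and score_thresh are assumed values).

    Instead of deleting boxes that overlap the current best box, their scores
    are decayed by exp(-IoU^2 / sigma). Returns kept indices and final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the highest-scoring box among those still under consideration.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        kept_scores.append(float(scores[best]))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the scores of the other boxes according to their overlap.
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop boxes whose decayed score is negligible.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, kept_scores
```

Because heavily overlapping boxes are down-weighted rather than removed outright, a pedestrian partially occluded by another can still survive with a reduced score, which is consistent with the gains observed in crowded scenes.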
Figure 6 illustrates the detection results of YOLOv8 and the improved models on sample pedestrian images. The images cover various scenarios, including crowded streets and pedestrian crossings. The bounding boxes, together with the corresponding class labels and confidence scores, indicate the detected pedestrians and the associated certainty levels. The improved models detect a higher number of pedestrians, including those in densely crowded areas, and the bounding boxes localize the pedestrians accurately even when individuals are closely packed together. The improved models are also more sensitive to pedestrians at different scales: they detect both small and large pedestrians, providing coverage across different sizes and distances. This capability is particularly important in real-world scenarios, where pedestrians may appear at varying scales.
To further validate the efficacy of the proposed model, we compared visualizations produced with Grad-CAM. Grad-CAM is a technique that generates a heatmap representing the network’s attention to different regions of the input image. By applying Grad-CAM to both the original YOLOv8 and the improved model, we obtained heatmaps illustrating their attention to the detection targets, as depicted in Figure 7. The heatmaps show that the improved model responds with higher intensity in the detection target area than the original YOLOv8. This observation suggests that the improved model extracts and leverages the feature information of the detection target more effectively.
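For readers unfamiliar with how such a heatmap is formed, the following is a minimal sketch of the core Grad-CAM computation on a single convolutional layer. It assumes the layer activations and the gradients of some target score with respect to them have already been extracted as arrays of shape (C, H, W); which layer and which detection score were used for Figure 7 is not specified here, so those choices are left to the caller.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM step for one layer (illustrative sketch).

    activations, gradients: arrays of shape (C, H, W), where gradients holds
    d(score)/d(activations) for a caller-chosen target score.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # Channel importance: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))             # shape (C,)
    # Weighted sum of the activation maps over channels.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    # Keep only regions with a positive influence on the score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image; a higher intensity over the pedestrian region corresponds to the stronger response discussed above.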
4.3.2. Pedestrian Tracking Result
To assess the algorithm’s effectiveness, we used various detectors and trackers in the tracking process. Additionally, we used the mobilenetv2 ReID model for person re-identification.
For the ByteTrack tracker, the pedestrian tracking results on MOT17 are shown in Table 2. When combined with the public YOLOv8n detector, the tracker achieved a MOTA of 33.599%, a MOTP of 81.432%, an IDF1 of 44.349%, and a HOTA of 37.604%, with 3076 false positives (FP) and 71,246 false negatives (FN). Using the private YOLOv8n detector improved performance significantly. The MOTA increased to 50.529% and the IDF1 score rose to 57.377%, indicating improved detection and tracking accuracy. The HOTA score increased to 45.715%, showing enhanced overall performance. The FP and FN values decreased to 2222 and 53,026, respectively.
Integrating the SoftNMS technique into YOLOv8n improved several metrics. The MOTP increased to 80.343%, indicating improved precision in object tracking. The IDF1 score improved to 57.473%, suggesting better detection and tracking capabilities. Additionally, the HOTA score reached 45.808%, demonstrating the effectiveness of the SoftNMS integration. The IDSW value decreased to 297, indicating fewer identity switches and better tracking consistency. Incorporating GhostNet into the YOLOv8n architecture resulted in a slightly lower MOTA of 43.759% while maintaining a high MOTP of 80.602%. The FP value decreased to 1349, indicating fewer false positives, and the IDSW value decreased to 233, again indicating fewer identity switches and better tracking consistency. Integrating both SoftNMS and GhostNet into the YOLOv8 algorithm led to a slight decrease in the performance metrics, which can be attributed to the trade-off between detection accuracy and model complexity introduced by these techniques.
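To make the reported numbers easier to interpret, the sketch below shows how MOTA and IDF1 are conventionally computed from error counts, following the standard CLEAR MOT and identity-metric definitions. The function and argument names are illustrative, and the example counts are hypothetical rather than taken from Table 2; in particular, the total number of ground-truth boxes is not given in this section.

```python
def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt

def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), from identity-level matches."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0

# Hypothetical counts, for illustration only.
example_mota = mota(fn=20, fp=10, idsw=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

Since FN usually dominates the numerator, a detector that misses fewer pedestrians raises MOTA far more than a comparable reduction in identity switches, which matches the large gap between the public and private detectors.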
For the OCSORT tracker, similar performance trends were observed in Table 3. Combining the tracker with the different YOLOv8n detectors consistently improved the tracking performance compared to the baseline YOLOv8. With the private YOLOv8n detector, significant improvements were observed across multiple metrics. The MOTA increased to 56.546%, indicating enhanced tracking accuracy. The IDF1 score improved significantly to 62.324%, demonstrating enhanced detection and tracking capabilities. The HOTA score increased to 49.462%, indicating better overall tracking performance. Notably, the IDSW value decreased to 594, indicating fewer identity switches. This yielded better tracking results than the baseline YOLOv8n detector. Incorporating SoftNMS further improved the tracking performance. The MOTP improved to 80.013%. The IDF1 score reached 62.733%, indicating enhanced detection and tracking accuracy. The HOTA score increased to 49.889%, demonstrating the effectiveness of the combined approach. Moreover, the IDSW value decreased to 591, suggesting fewer identity switches, which helps maintain consistent object identities throughout the tracking process.
When evaluating the OCSORT with GIOU tracker using different YOLOv8n detectors, as shown in Table 4, the overall performance was slightly better than with the original IOU.
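The difference between the two similarity measures can be seen in a minimal sketch of GIOU for axis-aligned boxes, following its standard definition. The function name and the [x1, y1, x2, y2] box format are assumptions for illustration; how the measure is wired into the OCSORT association step is not shown here.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as [x1, y1, x2, y2] (illustrative).

    GIoU = IoU - (area(C) - area(A ∪ B)) / area(C), where C is the smallest
    axis-aligned box enclosing both A and B. Values lie in (-1, 1].
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection is zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    if union <= 0:
        raise ValueError("boxes must have positive area")
    # Smallest enclosing box of the two inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull
```

Unlike plain IOU, which is zero for every pair of non-overlapping boxes, GIOU still ranks disjoint boxes by how far apart they are, so a track whose prediction narrowly misses its detection can still be matched in preference to a distant one.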
Figure 8 depicts the variation of different performance metrics with respect to the alpha values, allowing us to identify optimal alpha values that strike a balance between these factors and make informed decisions for optimizing the multi-object tracking system.
The pedestrian tracking results on the MOT20 dataset are shown in Table 5, Table 6 and Table 7. The ByteTrack tracker achieved an MOTA of 21.4% when combined with the public YOLOv8n detector, which improved significantly to 57.436% with the private YOLOv8n detector. The OCSORT tracker performed even better, achieving an MOTA of 30.032% and 64.933% with the public and private YOLOv8n detectors, respectively. As on the MOT17 dataset, the OCSORT tracker with GIOU consistently outperformed the original OCSORT tracker across various metrics. These results highlight the benefit of combining advanced trackers with improved detectors in pedestrian tracking, and the importance of selecting an appropriate configuration to achieve higher accuracy and reliability.
According to Table 8 and Table 9, our approach outperforms the other methods on several metrics, including the MOTA, IDF1, MOTP, HOTA, and IDSW scores. These results validate the effectiveness of our method in achieving accurate and robust multiple object tracking.