Abstract
Falls among the elderly are a leading cause of injury and mortality worldwide, which makes reliable real-time monitoring essential. This study develops a lightweight, accurate, and efficient fall detection framework based on an improved YOLOv5s model. The proposed architecture incorporates a Convolutional Block Attention Module (CBAM) to strengthen salient feature extraction, optimizes multi-scale feature fusion in the Neck to improve small-object detection, and re-clusters the anchor boxes to match the horizontal body shape typical of a fallen person. A multi-scene dataset of 11,314 images was constructed to evaluate performance under diverse lighting, occlusion, and spatial conditions. Experimental results show that the improved YOLOv5s achieves a mean average precision (mAP@0.5) of 94.2%, a recall of 92.5%, and a false alarm rate of 4.2%, outperforming the baseline YOLOv5s and YOLOv4 models while sustaining real-time detection at 32 FPS. These results indicate that combining attention mechanisms, adaptive feature fusion, and anchor optimization improves robustness and generalization. Performance declines slightly under extreme lighting and heavy occlusion, a limitation that points to multimodal fusion and illumination-invariant modeling as directions for future work. Overall, the study contributes a scalable, deployable framework for intelligent, non-intrusive safety monitoring in elderly care.
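The abstract does not state how the anchor boxes are re-clustered. A common approach for YOLO-family detectors is k-means over the ground-truth box widths and heights, assigning each box to the anchor with which it has the highest IoU. The sketch below illustrates that idea under these assumptions; the function names `wh_iou` and `kmeans_anchors`, the median update rule, and the default parameters are hypothetical and are not taken from the study.

```python
import numpy as np


def wh_iou(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes.

    Only width and height are compared, as if every box and anchor
    shared the same top-left corner. Returns an (N, K) array.
    """
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_anchors = anchors[:, 0] * anchors[:, 1]
    return inter / (area_boxes[:, None] + area_anchors[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors (assumed procedure).

    Each box is assigned to the anchor with the highest IoU, which is
    equivalent to using 1 - IoU as the distance. Each anchor is then
    moved to the median size of its members.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise anchors from k distinct ground-truth boxes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        updated = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # an empty cluster keeps its previous anchor
                updated[j] = np.median(members, axis=0)
        if np.allclose(updated, anchors):
            break
        anchors = updated
    # Sort by area so anchors can be split across detection scales.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Applied to a dataset dominated by lying postures, such a procedure would be expected to yield anchors that are wider than they are tall, which is the motivation the abstract gives for re-clustering.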