Lightweight RT-DETR with Attentional Up-Downsampling Pyramid Network
Abstract
1. Introduction
2. Background and Literature Review
2.1. Gesture-Recognition and Fall-Detection Methods Based on Classical Machine Learning
2.2. Gesture-Recognition and Fall-Detection Methods Based on Deep Learning
2.3. Recent Advances and Motivation for This Work
- By integrating the ADown downsampling module and the RepNCSPELAN4 module from YOLOv9 into the RT-DETR model’s backbone, we enhanced its recognition accuracy while reducing computational complexity. This modification enables the model to be deployed on resource-constrained hardware and allows for seamless upgrades without any performance loss.
- To address the accuracy limitations of the feature fusion network, we revisited the structure of RT-DETR R18. Its original upsampling and downsampling stages rely solely on nearest-neighbor interpolation and plain convolutions, which limits feature selectivity, adaptability, and the retention of global information. Inspired by attention mechanisms, we developed the AdaptiveGateUpsample and AdaptiveGateDownsample modules, which use attention to overcome these shortcomings and preserve both local and global information, yielding more precise upsampling and more effective downsampling and thereby enhancing feature representation and model performance. Based on these modules, we constructed the Attentional Up-Downsampling Pyramid Network (AUDPN), an attention-based pyramid structure that improves multi-scale object detection and maintains high accuracy and robustness under challenging conditions such as occlusion and complex lighting.
- Finally, the practical value of the model was validated through experiments on gesture recognition and human anomaly detection in medical care environments. The model successfully demonstrated its utility by automatically recognizing user gestures and abnormal human postures, combined with gesture-based control and fall posture-detection alarms.
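To make the gating idea behind the AdaptiveGateUpsample contribution concrete, the sketch below is a minimal pure-Python illustration, not the authors' implementation: a feature map is enlarged by nearest-neighbor interpolation and each channel is then re-weighted by a sigmoid gate computed from its global average (a squeeze-and-excitation-style gate, assumed here for illustration).

```python
import math

def nearest_upsample(fmap, scale=2):
    """Nearest-neighbor upsampling of a C x H x W nested-list feature map."""
    out = []
    for ch in fmap:
        new_ch = []
        for y in range(len(ch) * scale):
            src_row = ch[y // scale]
            new_ch.append([src_row[x // scale] for x in range(len(src_row) * scale)])
        out.append(new_ch)
    return out

def channel_gate(fmap):
    """Re-weight each channel by a sigmoid of its global average,
    emphasizing informative channels (squeeze-and-excitation style)."""
    gated = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        g = 1.0 / (1.0 + math.exp(-mean))  # gate value in (0, 1)
        gated.append([[v * g for v in row] for row in ch])
    return gated

def adaptive_gate_upsample(fmap, scale=2):
    """Sketch of attention-gated upsampling: interpolate, then gate channels."""
    return channel_gate(nearest_upsample(fmap, scale))

# A 1 x 2 x 2 feature map becomes a gated 1 x 4 x 4 map.
up = adaptive_gate_upsample([[[1.0, 2.0], [3.0, 4.0]]])
```

In the actual module the gate would be learned; here it is a fixed function of the channel statistics, which is enough to show how gating makes interpolation content-dependent.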
3. Materials and Methods
3.1. Data Collection and Preprocessing
3.2. Design of Lightweight RT-DETR with Attentional Up-Downsampling Pyramid Network
- We incorporated the ADown downsampling module from YOLOv9 to optimize the backbone, using average pooling to reduce feature-map dimensions. This lowers the parameter count, making the model more lightweight; in addition, ADown's parameters can be merged into convolutional layers at inference time, further improving efficiency.
- We replaced the feature extraction module in the RT-DETR R18 backbone with the RepNCSPELAN4 module from YOLOv9. This module is more lightweight and offers greater versatility and efficiency for handling complex training tasks.
- We independently developed attention-based upsampling and downsampling modules (AdaptiveGateUpsample and AdaptiveGateDownsample) to replace the original neck upsampling and downsampling modules. These modules leverage attention mechanisms to address the shortcomings of conventional upsampling and downsampling, improving feature selectivity, adaptability, and the retention of both local and global information. As a result, the upsampling becomes more precise, and the downsampling more effective. This led to the creation of a new feature fusion network, the Attentional Up-Downsampling Pyramid Network (AUDPN), which enhances object-detection capabilities.
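The spatial-reduction step that ADown builds on can be sketched as 2×2 average pooling with stride 2. This is a simplified illustration only: the real module also splits channels and mixes pooled branches through convolutions, which are omitted here.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on a C x H x W nested-list
    feature map: halves the height and width of each channel."""
    out = []
    for ch in fmap:
        pooled = []
        for y in range(0, len(ch) - 1, 2):
            pooled.append([
                (ch[y][x] + ch[y][x + 1] + ch[y + 1][x] + ch[y + 1][x + 1]) / 4.0
                for x in range(0, len(ch[0]) - 1, 2)
            ])
        out.append(pooled)
    return out

# One 4x4 channel is reduced to 2x2, each cell averaging a 2x2 block.
down = avg_pool_2x2([[[1.0, 2.0, 3.0, 4.0],
                      [5.0, 6.0, 7.0, 8.0],
                      [9.0, 10.0, 11.0, 12.0],
                      [13.0, 14.0, 15.0, 16.0]]])
# down[0] == [[3.5, 5.5], [11.5, 13.5]]
```

Because pooling has no learnable weights, this step contributes nothing to the parameter count, which is why substituting it for strided convolutions helps the lightweight design.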
3.2.1. ADown
3.2.2. RepNCSPELAN4
3.2.3. Attentional Up-Downsampling Pyramid Network (AUDPN)
3.3. Evaluation Metrics
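The metrics reported in the results (precision, recall, and mAP@0.5) are built on IoU matching: a detection counts as a true positive when its IoU with an unmatched ground-truth box reaches 0.5. The helper below is a simplified single-class sketch with greedy matching and hypothetical box data, not the exact evaluation code; mAP additionally averages precision over recall levels.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(preds, gts, thr=0.5):
    """Greedy single-class matching: a prediction is a true positive when it
    overlaps an unmatched ground-truth box with IoU >= thr."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall

# Hypothetical boxes: one correct detection plus one spurious one.
p, r = precision_recall([(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 10, 10)])
# p == 0.5, r == 1.0
```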
4. Results
4.1. Training Environment and Hyperparameter Settings
4.2. Ablation Experiments
4.3. Comparative Experiments
4.4. Performance Evaluation of Lightweight RT-DETR with AUDPN
5. Discussion
5.1. Findings
5.2. Limitations
- Real-Time Performance: Although the model reaches 57.2 FPS in the training environment, its frame rate drops significantly when deployed on resource-limited devices, such as industrial control computers without GPU acceleration. For fast and accurate inference, we therefore recommend deploying the model on devices with GPU acceleration.
- Task Specificity: The current model is primarily optimized for gesture recognition and fall detection. Its generalization to other object-detection tasks still requires further validation. One limitation of the proposed system is its reliance on commonly used gestures, which, while reducing the cognitive burden on elderly users, may lead to potential gesture misclassification in environments where similar gestures are frequently performed. For example, the “OK” gesture might be misinterpreted in certain social settings where it is commonly used. However, as previously discussed, requiring elderly users to learn custom gestures is impractical, making the adoption of familiar gestures the most viable solution. Future research should explore advanced disambiguation strategies, such as integrating hand keypoint tracking or multi-modal inputs (e.g., combining voice commands with gestures) to improve recognition accuracy in these scenarios.
- Cost Considerations: An important limitation is the potentially high cost of building and deploying intelligent companion robots. In the context of China's healthcare system, where cost sensitivity is critical, this could impede widespread adoption despite the model's technical advantages. The intelligent companion robot used in this paper costs about 10,000 RMB, while the recommended high-configuration version costs around 20,000 RMB.
5.3. Future Research Directions
- Enhanced Data Diversity: Training and testing the model on more diverse datasets collected from various healthcare centers and home environments could improve adaptability and robustness.
- Model Optimization: Further refinements of the model structure and algorithmic enhancements, potentially drawing inspiration from emerging models such as YOLOv11, may boost real-time performance on resource-constrained devices.
- Cost-Reduction Strategies: Research into more cost-effective hardware integration and model-compression techniques will be crucial for reducing the overall expense of intelligent companion robot systems.
- Beneficiary-Centric Applications: Future studies should also focus on long-term pilot deployments in Chinese eldercare facilities and community healthcare centers. The primary beneficiaries of this research include elderly individuals, family caregivers, healthcare providers, and policymakers. Intelligent companion robots equipped with our lightweight model could reduce caregiver burden, enable rapid intervention in emergencies, and ultimately lower healthcare costs by preventing severe injuries.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- United Nations. World Population Prospects—Population Division—United Nations. Available online: https://population.un.org/wpp/ (accessed on 30 December 2024).
- United Nations. World Population Prospects 2024. 2024. Available online: https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/undesa_pd_2024_wpp_2024_advance_unedited_0.pdf (accessed on 30 December 2024).
- Babič, F.; Ljiljana, T.M.; Bekić, S.; Holzinger, A. Machine Learning for Family Doctors: A Case of Cluster Analysis for Studying Aging Associated Comorbidities and Frailty. Lect. Notes Comput. Sci. 2019, 11713, 178–194.
- Barry, A.; Heale, R.; Pilon, R.; Lavoie, A. The Meaning of Home for Ageing Women Living Alone: An Evolutionary Concept Analysis. Health Soc. Care Community 2017, 26, e337–e344.
- Finlay, J.M.; Kobayashi, L.C. Social Isolation and Loneliness in Later Life: A Parallel Convergent Mixed-Methods Case Study of Older Adults and Their Residential Contexts in the Minneapolis Metropolitan Area, USA. Soc. Sci. Med. 2018, 208, 25–33.
- Tang, Y. Research on Home-Based Elderly Care Services for Empty Nest Elderly in Rural Communities. Heilongjiang Hum. Resour. Soc. Secur. 2022, 31–33. Available online: https://kns.cnki.net/kcms2/article/abstract?v=VcTOyLYtvEyplYs2_tWdrXhflPLEgFzjNyzOwdcU5owyptMPyHrvZvAgFp6YcrywtbX0wRRnWStjQ1Vm8fYjISEPh7WyVDFdRAMH3oRynHjf9YLi6D49hwKPCNzwz3AmvctwZv0Vy7KUqTW9y5UpXIw7mV9bFAke7w2EZcJXLFMsEL7BWcj2Xg==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Nguyen, H.; Manolova, G.; Daskalopoulou, C.; Vitoratou, S.; Prince, M.; Prina, A.M. Prevalence of Multimorbidity in Community Settings: A Systematic Review and Meta-Analysis of Observational Studies. J. Comorbidity 2019, 9, 2235042X19870934.
- Xie, Q. Research on Fall Detection Based on Deep Learning. Master’s Thesis, Nanjing University of Information Science and Technology, Nanjing, China, 2024; p. 59. Available online: https://link.cnki.net/doi/10.27248/d.cnki.gnjqc.2024.000194 (accessed on 3 January 2025).
- World Health Organization. Disability and Health. Available online: https://www.who.int/zh/news-room/fact-sheets/detail/disability-and-health (accessed on 30 December 2024).
- Santos, N.B.; Bavaresco, R.S.; Tavares, J.E.R.; Ramos, G.d.O.; Barbosa, J.L.V. A Systematic Mapping Study of Robotics in Human Care. Robot. Auton. Syst. 2021, 144, 103833.
- Ba, S.; Hu, L.; Huang, K. Analysis of the Impact of Population Aging on the Life Insurance Industry: Product Structure Optimization and Pricing Strategy Discussion from an Actuarial Perspective. Hainan Financ. 2024, 3–17. Available online: https://kns.cnki.net/kcms2/article/abstract?v=VcTOyLYtvEzwGVKzrQOoARiJPad56hz-dGT3x_bMH9q9-9ZKsxyNlECkQksnIoessgP12TvrIBwqaEeRSH-tzlULjr5g-AW6mfkPZpB6qQSoZ_XdTVwXyu6I_RvTVToLYJvZrK_Ckw3a3diwJCXMDg8avpfEB_pMhrsB1v5o6oqWULjo_UEvDg==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Department of Planning, Development and Information Technology of the People’s Republic of China. Statistical Bulletin on the Development of China’s Health Industry in 2023. 2024. Available online: http://www.nhc.gov.cn/guihuaxxs/s3585u/202408/6c037610b3a54f6c8535c515844fae96.shtml (accessed on 31 December 2024).
- Hooker, R.; Cawley, J.; Everett, C. Predictive Modeling the Physician Assistant Supply: 2010–2025. Public Health Rep. 2011, 126, 708–716.
- Cooper, S.; Fava, D.; Vivas, C.; Marchionni, L.; Ferro, F. ARI: The Social Assistive Robot and Companion. In Proceedings of the 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy, 31 August–4 September 2020; pp. 745–751.
- Auerbach, D. Will the NP Workforce Grow in the Future? New Forecasts and Implications for Healthcare Delivery. Med. Care 2012, 50, 606–610.
- Robinson, H.; Macdonald, B.; Broadbent, E. The Role of Healthcare Robots for Older People at Home: A Review. Int. J. Soc. Robot. 2014, 6, 575–591.
- Broekens, J.; Heerink, M.; Rosendal, H. Assistive Social Robots in Elderly Care: A Review. Gerontechnology 2009, 8, 94–103.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-To-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
- Bavaresco, R.; Barbosa, J.; Vianna, H.; Büttenbender, P.; Dias, L. Design and Evaluation of a Context-Aware Model Based on Psychophysiology. Comput. Methods Programs Biomed. 2020, 189, 105299.
- D’Onofrio, G.; Fiorini, L.; Hoshino, H.; Matsumori, A.; Okabe, Y.; Tsukamoto, M.; Limosani, R.; Vitanza, A.; Greco, F.R.; Greco, A.; et al. Assistive Robots for Socialization in Elderly People: Results Pertaining to the Needs of the Users. Aging Clin. Exp. Res. 2019, 31, 1313–1329.
- Torta, E.; Oberzaucher, J.; Werner, F.; Cuijpers, R. Attitudes towards Socially Assistive Robots in Intelligent Homes: Results from Laboratory Studies and Field Trials. J. Hum.-Robot Interact. 2013, 1, 76–99.
- Simonov, M.; Bazzani, M.; Frisiello, A. Ubiquitous Monitoring & Service Robots for Care. In Proceedings of the 35th German Conference on Artificial Intelligence, Saarbrücken, Germany, 24–27 September 2012; p. 93.
- Tan, J.; Chan, W.; Robinson, N.; Croft, E.; Kulic, D. A Proposed Set of Communicative Gestures for Human Robot Interaction and an RGB Image-Based Gesture Recognizer Implemented in ROS. arXiv 2021, arXiv:2109.09908.
- Wada, Y.S.; Lotfi, A.; Mahmud, M.; Machado, P.; Kubota, N. Gesture Recognition Intermediary Robot for Abnormality Detection in Human Activities. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 1415–1421.
- Werner, C.; Kardaris, N.; Koutras, P.; Zlatintsi, A.; Maragos, P.; Bauer, J.M.; Hauer, K. Improving Gesture-Based Interaction between an Assistive Bathing Robot and Older Adults via User Training on the Gestural Commands. Arch. Gerontol. Geriatr. 2020, 87, 103996.
- Zhang, Q.; Zhang, Y.; Liu, Z. A Dynamic Hand Gesture Recognition Algorithm Based on CSI and YOLOv3. J. Phys. Conf. Ser. 2019, 1267, 012055.
- Zhou, Z.; Han, F.; Wang, Z. Application of Improved SSD Algorithm in Chinese Sign Language Recognition. Comput. Eng. Appl. 2021, 57, 156–161.
- Wu, S.; Li, Z.; Li, S.; Liu, Q.; Wu, W. Static Gesture Recognition Algorithm Based on Improved YOLOv5s. Electronics 2023, 12, 596.
- Zhang, J.; Feng, T. Gesture Recognition Based on Improved Faster R-CNN. Inf. Commun. 2019, 44–46. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtp-vSJs7CyNIes_FGpIsqWsv9ebKEnCsEexL7hAbjl0WMtBk6dz9w8NvLkLWjM0gQ_u-5sbtrup6rLR3PrPaM73eJlmK4aSSkQqVkQwJ7bu3zEua24wIJLE3dcmqvWWX6PosxikdQG2mT2VTBP1oM4uaNSyKeqPc5lQZdYZp6FBYg==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Niu, Y.; Wu, Y.; Sun, K.; Lu, H.; Zhao, P. Gesture Recognition Detection Based on Lightweight Convolutional Neural Networks. Electron. Meas. Technol. 2022, 45, 91–98.
- Wu, J.; Jiang, L. Intelligent Monitoring System for Elderly Living Alone Based on Ros Service Robots. Electron. Technol. Softw. Eng. 2021, 78–80. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtq3OLFyN-QPZfH5Ob75UBpZ48f7KIDiSwA9tTYCkkcnUeV-v_PYufG7SR9IGXRSYtiVavfBKhyADM69nFKecNzmWY3-xQGd1pvPTR6P1PzRJuNY-48JawpYWN3rS2ys2A5SshAg2xmMiV1rOHdtkDOozQbHv-ym1yrbJriyjCDnJw==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Chen, X.; Zhao, Y.; Wang, F.; Cao, X.; Yang, Y. Fall Detection Alarm System for Service Robots Based on Static Image Pose Estimation. Sci. Technol. Innov. Her. 2023, 1–3.
- Yan, S.; Xiong, Y.; Lin, D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. AAAI Conf. Artif. Intell. 2018, 32, 7444–7452.
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Adaptive Spectral Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv 2018, arXiv:1805.07694.
- Min, W.; Cui, H.; Rao, H.; Li, Z.; Yao, L. Detection of Human Falls on Furniture Using Scene Analysis Based on Deep Learning and Activity Characteristics. IEEE Access 2018, 6, 9324–9335.
- Adhikari, K.; Bouchachia, H.; Nait-Charif, H. Activity Recognition for Indoor Fall Detection Using Convolutional Neural Network. In Proceedings of the 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 81–84.
- Xu, S.; Wang, H.; Zhang, H.; Pang, J. A Keypoint-Based Method for Infrared Image Fall Detection. Infrared Technol. 2021, 43, 1003–1007. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtqMuhRTYLLOBtlSOeSw7D6sgr9AmhVyzR8nOtjbb5ivJnPFKme0ThCQVeqIgp7q6i84FXSf6qwz5MgTa2-Tj2HFPBA-ELkqOjkQjmZUIrp9EmPXOLBzdqwQpmpM_nsGkgwU4V_5_khO1yqtQz6QizMaT3KMW_UCIsxrBelEC4qtaw==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Chhetri, S.; Alsadoon, A.; Al-Dala’in, T.; Prasad, P.W.C.; Rashid, T.A.; Maag, A. Deep Learning for Vision-Based Fall Detection System: Enhanced Optical Dynamic Flow. Comput. Intell. 2020, 37, 578–595.
- Qi, Y.; Chen, S.; Sun, L. Fall Detection Using Dual-Stream CNN Based on Improved ViBe Algorithm. Comput. Eng. Des. 2023, 44, 1812–1819.
- Wang, X.; Zheng, X.; Gao, H.; Zeng, Z.; Zhang, Y. Fall Detection Algorithm Based on Convolutional Neural Network and Multi Discriminative Features. J. Comput.-Aided Des. Graph. 2023, 35, 452–462. Available online: https://link.cnki.net/urlid/11.2925.tp.20230410.1444.002 (accessed on 3 January 2025).
- Gupta, B.; Shukla, P.; Mittal, A. K-Nearest Correlated Neighbor Classification for Indian Sign Language Gesture Recognition Using Feature Fusion. In Proceedings of the 2016 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 7–9 January 2016; pp. 1–5.
- Pan, T.-Y.; Lo, L.-Y.; Yeh, C.-W.; Li, J.-W.; Liu, H.-T.; Hu, M.-C. Real-Time Sign Language Recognition in Complex Background Scene Based on a Hierarchical Clustering Classification Method. In Proceedings of the 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), Taipei, Taiwan, 20–22 April 2016; pp. 64–67.
- Sharma, S.; Jain, S.; Khushboo. A Static Hand Gesture and Face Recognition System for Blind People. In Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 7–8 March 2019; pp. 534–539.
- Athira, P.K.; Sruthi, C.J.; Lijiya, A. A Signer Independent Sign Language Recognition with Co-Articulation Elimination from Live Videos: An Indian Scenario. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 771–781.
- Redmon, C.-Y.; Divvala, S.-M.; Girshick, J.-W.; Farhadi, L.-W.; Huang, C.-L. Vision-Based Fall Detection through Shape Features. In Proceedings of the 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), Taipei, Taiwan, 20–22 April 2016; pp. 237–240.
- Gunale, K.; Mukherji, P. Indoor Human Fall Detection System Based on Automatic Vision Using Computer Vision and Machine Learning Algorithms. J. Eng. Sci. Technol. 2018, 13, 2587–2605. Available online: https://api.semanticscholar.org/CorpusID:189891830 (accessed on 3 January 2025).
- Zhang, F.; Zhu, J. Fall Detection Technology Based on Dual Cameras. Comput. Syst. Appl. 2020, 29, 186–192.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Hu, D.; Zhu, J.; Liu, J.; Wang, J.; Zhang, X. Gesture Recognition Based on Modified YOLOv5s. IET Image Process. 2022, 16, 2124–2132.
- Yang, Z.; Shen, Y.; Shen, Y. Football Referee Gesture Recognition Algorithm Based on YOLOv8s. Front. Comput. Neurosci. 2024, 18, 1341234.
- Jiang, L. Application of Visual Gesture Recognition Technology in Simulation Training of Aviation Medicine. Master’s Thesis, North China University of Technology, Beijing, China, 2022; p. 72. Available online: https://link.cnki.net/doi/10.26926/d.cnki.gbfgu.2022.000695 (accessed on 3 January 2025).
- Meng, Q.; Dai, J.; Cha, J.; Xiong, Y.; Si, B. Common Gesture Recognition Based on Yolov8 Algorithm. Mod. Instrum. Med. Treat. 2023, 29, 12–20. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtogTJhNnVM9_6PIzl6cqWTLexizIQbvgMjPcu4jI8ysIYLwhOTscKZdHQWfabnOABr9qei9XPZJVwyCN6T9RZJGXMnBGBHpkIUYYBA4NGoB5lrtwuj0cCPHVJhLTS356nSR4pHZwn1wVDWcKY5Bt9jXWX1SmDDBK_7TrsoLr3EIDA==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Song, J.; Xu, H.; Zhu, X.; Huang, X.; Chen, C.; Wang, Z. OEF-YOLO: An Improved YOLOv8 Algorithm for Fall Detection. Comput. Eng. 2024, 1–16.
- Zong, Z. Research on Home Fall Detection Based on Improved Yolov5s. Master’s Thesis, Nanchang University, Nanchang, China, 2023; p. 72. Available online: https://link.cnki.net/doi/10.27232/d.cnki.gnchu.2023.003973 (accessed on 3 January 2025).
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Dai, X.; Chen, Y.; Yang, J.; Zhang, P.; Yuan, L.; Zhang, L. Dynamic DETR: End-To-End Object Detection with Dynamic Attention. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 2968–2977.
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised Pre-Training for Object Detection with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610.
- Kapitanov, A.; Kvanchiani, K.; Nagaev, A.; Kraynov, R.; Makhliarchuk, A. HaGRID—HAnd Gesture Recognition Image Dataset. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024.
- Eraso, J.C.; Muñoz, E.; Muñoz, M.; Pinto, J. Dataset CAUCAFall. Mendeley Data 2022. V4. Available online: https://data.mendeley.com/datasets/7w7fccy7ky/4 (accessed on 3 January 2025).
- Roboflow Universe Projects. Fall Detection Object Detection Dataset and Pre-Trained Model by Roboflow Universe Projects. Roboflow Universe. Available online: https://universe.roboflow.com/roboflow-universe-projects/fall-detection-ca3o8 (accessed on 31 December 2024).
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Computer Vision—ECCV 2024; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2024; Volume 15089, pp. 1–21.
- Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
- Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient Long-Range Attention Network for Image Super-Resolution. In Computer Vision—ECCV 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 649–667.
- Li, Z.; Wu, Y.; Jiang, H.; Lei, D.; Pan, F.; Qiao, J.; Fu, X.; Guo, B. RT-DETR-SoilCuc: Detection Method for Cucumber Germination in Soil-Based Environment. Front. Plant Sci. 2024, 15, 1425103.
- Yu, Y.; Mu, Y. Research on Interpolation Algorithm. Mod. Comput. (Prof. Ed.) 2014, 32–35. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtoGbG38vL2mgf5TqgvM6iWToQl_G06TEEaIBPdws4o914lEOKfAuEidvtVVRwV6kbe8ObazAihaaIUXoDzJmck6-pRHgJZhCcVQV3X5wD-o%20fGxu6604b2Erm86IW60USMSdETRY0Oum2GSsVSCy_SLpchQlzZcdFmdL36j7tEpx4w==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Wang, Y.; Zhang, Q.; Li, N. Research on Image Deformation Based on Interpolation Algorithm. Instrum. Anal. Monit. 2014, 19–21. Available online: https://kns.cnki.net/kcms2/article/abstract?v=amOBmv6QLtrRVKzp6PjEtVophzc2kNAEw5su4e_8_FADrlIjACFKJseWyg4CPPFzkVPmFCRzsKBoaCMcDQ_BCn9l0p5pHXtg27HRevzViGAndeduGxKZulNiTlCtpTV0H6awOnTiZ3sx5qHEyvDYR9WPKWPGAilvZdok9ZamrJWw9q59AjkZtw==&uniplatform=NZKPT&language=CHS (accessed on 3 January 2025).
- Hu, J.; Shen, L.; Sun, G. Squeeze-And-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Jiang, W.; Gao, Y.; Yuan, H.; Liu, W. Image Classification Network with Gated Mechanism. Acta Electron. Sin. 2024, 52, 2393–2406. Available online: https://link.cnki.net/urlid/11.2087.TN.20240808.1000.002 (accessed on 3 January 2025).
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
- Gong, Y.; Shen, X. Smoking Detection Algorithm Based on YOLOv3 with Transposed Convolution Fusion. Comput. Meas. Control 2024, 32, 40–46+54.
- Gu, Z.; Liu, G.; Shao, C.; Yu, H. Downsampling Algorithm Incorporating Size Receptive Field Mechanism in Deep Detection Methods. Comput. Sci. Explor. 2023, 18, 2727–2737. Available online: https://link.cnki.net/urlid/11.5602.TP.20231122.1143.006 (accessed on 3 January 2025).
Dataset Name | Dataset Source |
---|---|
Fall | Dataset CAUCAFall, fall-detection-ca3o8_dataset |
Call | HaGRID—HAnd Gesture Recognition Image Dataset |
OK | HaGRID—HAnd Gesture Recognition Image Dataset |
Stop | HaGRID—HAnd Gesture Recognition Image Dataset |
Forward | HaGRID—HAnd Gesture Recognition Image Dataset |
Backwards | HaGRID—HAnd Gesture Recognition Image Dataset |
Serial Number | Type | Content |
---|---|---|
1 | Fall Posture | Fall |
2 | Gesture | Call |
3 | Gesture | OK |
4 | Gesture | Stop |
5 | Gesture | Forward |
6 | Gesture | Backwards |
Parameter | Value |
---|---|
Epoch | 300 |
Batch Size | 28 |
Image Size | 640 |
Cache | False |
Workers | 4 |
Patience | 8 |
Model | mAP@0.5 (%) | Params (M) | GFLOPs | Weight Size (MB) | FPS |
---|---|---|---|---|---|
RT-DETR R18 | 99.0 | 19.879464 | 57.0 | 36.8 | 49.5 |
ADown + RepNCSPELAN4 | 99.3 | 9.27348 | 27.6 | 18.2 | 49.2 |
ADown + RepNCSPELAN4 + AdaptiveGateDownsample | 99.3 | 9.01236 | 26.8 | 17.7 | 49.9 |
Lightweight RT-DETR with AUDPN | 99.4 | 9.60372 | 30.4 | 18.9 | 57.2 |
Model | mAP@0.5 (%) | mAP@0.5–0.95 (%) | Precision (%) | Recall (%) |
---|---|---|---|---|
RT-DETR R18 | 99.0 | 86.1 | 99.4 | 98.7 |
ADown + RepNCSPELAN4 | 99.3 | 85.7 | 99.3 | 99.1 |
ADown + RepNCSPELAN4 + AdaptiveGateDownsample | 99.3 | 86.2 | 99.5 | 99.4 |
Lightweight RT-DETR with AUDPN | 99.4 | 86.4 | 99.6 | 99.4 |
Model | mAP@0.5 (%) | Params (M) | Precision (%) | Recall (%) | GFLOPs |
---|---|---|---|---|---|
YOLOv5s | 98.9 | 9.113858 | 98.6 | 96.5 | 23.8 |
YOLOv6s | 98.4 | 16.298594 | 98.9 | 95.8 | 44.0 |
YOLOv8s | 99.1 | 11.127906 | 99.1 | 97.1 | 28.4 |
YOLOv5m | 99.2 | 25.04869 | 99.2 | 98.8 | 64.0 |
YOLOv8m | 99.3 | 25.843234 | 99.4 | 98.8 | 78.7 |
Lightweight RT-DETR with AUDPN | 99.4 | 9.60372 | 99.6 | 99.4 | 30.4 |
Name | Version/Configuration |
---|---|
Ubuntu | 20.04 |
CUDA | No |
PyTorch | 1.13.1 (CPU) |
OpenCV | 4.10.0 |
Python | 3.8.0 |
CPU | i7-10870 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, N.; Han, X.; Song, X.; Fang, X.; Wu, M.; Yu, Q. Lightweight RT-DETR with Attentional Up-Downsampling Pyramid Network. Appl. Sci. 2025, 15, 3309. https://doi.org/10.3390/app15063309