Article

EdgeVidCap: A Channel-Spatial Dual-Branch Lightweight Video Captioning Model for IoT Edge Cameras

1 School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
2 Department of Computer Science and Information Engineering, Providence University, Taichung 43301, Taiwan
* Author to whom correspondence should be addressed.
Sensors 2025, 25(16), 4897; https://doi.org/10.3390/s25164897
Submission received: 24 June 2025 / Revised: 30 July 2025 / Accepted: 6 August 2025 / Published: 8 August 2025

Abstract

With the deep integration of edge computing and Internet of Things (IoT) technologies, the computational capabilities of intelligent edge cameras continue to advance, providing new opportunities for the local deployment of video understanding algorithms. However, existing video captioning models suffer from high computational complexity and large parameter counts, making it challenging for them to meet the real-time processing requirements of resource-constrained IoT edge devices. In this work, we propose EdgeVidCap, a lightweight video captioning model specifically designed for IoT edge cameras. Specifically, we design a hybrid module termed Synergetic Attention State Mamba (SASM) that incorporates channel attention mechanisms to enhance feature selection and leverages State Space Models (SSMs) to efficiently capture long-range spatial dependencies, achieving efficient spatiotemporal modeling of multimodal video features. In the caption generation stage, we propose an adaptive attention-guided LSTM decoder that dynamically adjusts feature weights according to video content and auto-regressively generates semantically rich, accurate textual descriptions. We comprehensively evaluate EdgeVidCap on mainstream datasets, including MSR-VTT and MSVD. Experimental results demonstrate that our model achieves higher accuracy than existing approaches, and that our streamlined frame filtering mechanism improves processing efficiency while producing more reliable descriptions after frame selection.
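The abstract names two ingredients of the SASM module, a channel attention gate for feature selection and a linear state-space recurrence for long-range dependencies, but gives no implementation details. The following is a minimal NumPy sketch of those two generic building blocks only; all dimensions, weight shapes, and the residual fusion step are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel gate over a (T, C) feature sequence."""
    s = x.mean(axis=0)                  # squeeze: global average over time -> (C,)
    g = sigmoid(w2 @ np.tanh(w1 @ s))   # excitation: bottleneck MLP -> per-channel gate
    return x * g                        # re-weight channels

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return np.stack(ys)

# Toy dimensions (hypothetical): T frames, C channels, H hidden state size.
T, C_dim, H = 8, 16, 4
x = rng.standard_normal((T, C_dim))
w1 = rng.standard_normal((C_dim // 4, C_dim)) * 0.1   # bottleneck down-projection
w2 = rng.standard_normal((C_dim, C_dim // 4)) * 0.1   # up-projection back to C
A = np.eye(H) * 0.9                                   # stable decaying state dynamics
B = rng.standard_normal((H, C_dim)) * 0.1
Cm = rng.standard_normal((C_dim, H)) * 0.1

gated = channel_attention(x, w1, w2)   # channel-attention branch
y = ssm_scan(gated, A, B, Cm)          # state-space branch over the gated sequence
fused = gated + y                      # residual fusion (an assumption, not the paper's design)
print(fused.shape)  # (8, 16)
```

This only illustrates the general mechanics the abstract refers to; the actual SASM module, its Mamba parameterization, and the dual-branch fusion would differ in the published model.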
Keywords: edge computing; video captioning; IoT; lightweight neural networks; attention mechanism; state space models

Share and Cite

MDPI and ACS Style

Guo, L.; Li, X.; Wang, J.; Xiao, J.; Hou, Y.; Zhi, P.; Yong, B.; Li, L.; Zhou, Q.; Li, K. EdgeVidCap: A Channel-Spatial Dual-Branch Lightweight Video Captioning Model for IoT Edge Cameras. Sensors 2025, 25, 4897. https://doi.org/10.3390/s25164897


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
