AI, Volume 6, Issue 11 (November 2025) – 1 article

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
20 pages, 2894 KB  
Article
End-to-End Swallowing Event Localization via Blue-Channel-to-Depth Substitution in RGB-D: GRNConvNeXt-Modified AdaTAD with KAN-Chebyshev Decoder
by Derek Ka-Hei Lai, Zi-An Zhao, Andy Yiu-Chau Tam, Jing Li, Jason Zhi-Shen Zhang, Duo Wai-Chi Wong and James Chung-Wai Cheung
AI 2025, 6(11), 276; https://doi.org/10.3390/ai6110276 - 22 Oct 2025
Abstract
Background: Swallowing is a complex biomechanical process, and its impairment (dysphagia) poses major health risks for older adults. Current diagnostic methods such as the videofluoroscopic swallowing study (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES) are effective but invasive, resource-intensive, and unsuitable for continuous monitoring. This study proposes a novel end-to-end RGB–D framework for automated swallowing event localization in continuous video streams. Methods: The framework enhances the AdaTAD backbone through three key innovations: (i) finding the optimal strategy for integrating depth information to capture subtle neck movements, (ii) examining the best adapter design for efficient temporal feature adaptation, and (iii) introducing a Kolmogorov–Arnold Network (KAN) decoder that leverages Chebyshev polynomials for non-linear temporal modeling. Evaluation on a proprietary swallowing dataset comprising 641 clips and 3153 annotated events demonstrated the effectiveness of the proposed framework. We analysed and compared the modification strategy across designs of adapters, decoders, input channel combinations, regression methods, and patch embedding techniques. Results: The optimized configuration (VideoMAE + GRNConvNeXtAdapter + KAN + RGD + boundary regression + sinusoidal embedding) achieved an average mAP of 83.25%, significantly surpassing the baseline I3D + RGB + MLP model (61.55%). Ablation studies further confirmed that each architectural component contributed incrementally to the overall improvement. Conclusions: These results establish the feasibility of accurate, non-invasive, and automated swallowing event localization using depth-augmented video. The proposed framework paves the way for practical dysphagia screening and long-term monitoring in clinical and home-care environments.
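The abstract's KAN decoder replaces fixed activations with learnable univariate functions expanded in a Chebyshev polynomial basis. A minimal sketch of one such layer is shown below; it is illustrative only and assumes a plain batch-of-vectors input — the class name `ChebyKANLayer`, the `tanh` domain squashing, and the parameter shapes are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ChebyKANLayer(nn.Module):
    """Sketch of a KAN layer with a Chebyshev polynomial basis.

    Each input-output edge learns its own univariate function as a
    weighted sum of Chebyshev polynomials T_0..T_degree.
    """

    def __init__(self, in_dim: int, out_dim: int, degree: int = 4):
        super().__init__()
        self.degree = degree
        # one learnable coefficient per (input, output, polynomial order)
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1)
            / (in_dim * (degree + 1)) ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # squash inputs into [-1, 1], the natural domain of Chebyshev polynomials
        x = torch.tanh(x)
        # build T_0..T_degree via the recurrence T_k(x) = 2x * T_{k-1}(x) - T_{k-2}(x)
        polys = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            polys.append(2 * x * polys[-1] - polys[-2])
        basis = torch.stack(polys[: self.degree + 1], dim=-1)  # (batch, in_dim, degree+1)
        # contract over input dimension and polynomial order
        return torch.einsum("bid,iod->bo", basis, self.coeffs)
```

In a temporal-detection decoder such a layer would act per time step on the backbone's feature vectors, replacing the MLP regression/classification head that the baseline uses.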