AI, Volume 6, Issue 11 (November 2025) – 1 article

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
20 pages, 2894 KB  
Article
End-to-End Swallowing Event Localization via Blue-Channel-to-Depth Substitution in RGB-D: GRNConvNeXt-Modified AdaTAD with KAN-Chebyshev Decoder
by Derek Ka-Hei Lai, Zi-An Zhao, Andy Yiu-Chau Tam, Jing Li, Jason Zhi-Shen Zhang, Duo Wai-Chi Wong and James Chung-Wai Cheung
AI 2025, 6(11), 276; https://doi.org/10.3390/ai6110276 - 22 Oct 2025
Abstract
Background: Swallowing is a complex biomechanical process, and its impairment (dysphagia) poses major health risks for older adults. Current diagnostic methods such as the videofluoroscopic swallowing study (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES) are effective but invasive, resource-intensive, and unsuitable for continuous monitoring. This study proposes a novel end-to-end RGB–D framework for automated swallowing event localization in continuous video streams. Methods: The framework enhances the AdaTAD backbone through three key innovations: (i) finding the optimal strategy for integrating depth information to capture subtle neck movements, (ii) examining the best adapter design for efficient temporal feature adaptation, and (iii) introducing a Kolmogorov–Arnold Network (KAN) decoder that leverages Chebyshev polynomials for non-linear temporal modeling. Evaluation on a proprietary swallowing dataset comprising 641 clips and 3153 annotated events demonstrated the effectiveness of the proposed framework. We analysed and compared the modification strategy across designs of adapters, decoders, input channel combinations, regression methods, and patch embedding techniques. Results: The optimized configuration (VideoMAE + GRNConvNeXtAdapter + KAN + RGD + boundary regression + sinusoidal embedding) achieved an average mAP of 83.25%, significantly surpassing the baseline I3D + RGB + MLP model (61.55%). Ablation studies further confirmed that each architectural component contributed incrementally to the overall improvement. Conclusions: These results establish the feasibility of accurate, non-invasive, and automated swallowing event localization using depth-augmented video. The proposed framework paves the way for practical dysphagia screening and long-term monitoring in clinical and home-care environments.
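The abstract's KAN decoder replaces fixed activations with learnable univariate functions expanded in a Chebyshev polynomial basis. A minimal sketch of one such layer is shown below; it is illustrative only and assumes a plain batch-of-vectors input — the class name `ChebyKANLayer`, the `tanh` domain squashing, and the parameter shapes are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ChebyKANLayer(nn.Module):
    """Sketch of a KAN layer with a Chebyshev polynomial basis.

    Each input-output edge learns its own univariate function as a
    weighted sum of Chebyshev polynomials T_0..T_degree.
    """

    def __init__(self, in_dim: int, out_dim: int, degree: int = 4):
        super().__init__()
        self.degree = degree
        # one learnable coefficient per (input, output, polynomial order)
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1)
            / (in_dim * (degree + 1)) ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # squash inputs into [-1, 1], the natural domain of Chebyshev polynomials
        x = torch.tanh(x)
        # build T_0..T_degree via the recurrence T_k(x) = 2x * T_{k-1}(x) - T_{k-2}(x)
        polys = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            polys.append(2 * x * polys[-1] - polys[-2])
        basis = torch.stack(polys[: self.degree + 1], dim=-1)  # (batch, in_dim, degree+1)
        # contract over input dimension and polynomial order
        return torch.einsum("bid,iod->bo", basis, self.coeffs)
```

In a temporal-detection decoder such a layer would act per time step on the backbone's feature vectors, replacing the MLP regression/classification head that the baseline uses.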