MDPI - Publisher of Open Access Journals

31 pages, 12192 KB

Open Access Review

Harnessing Multi-Camera Video Fusion: Technologies, Applications, and Future Prospects

by Chicheng Ma and Leiyang Xu

Digital 2026, 6(2), 47; https://doi.org/10.3390/digital6020047 (registering DOI) - 12 Jun 2026

The rapid advancement of information technology and multimedia applications has led to an increasing demand for video data processing. In particular, video fusion technology in multi-camera environments, which integrates and optimizes video data from multiple camera viewpoints, plays a crucial role in enhancing visual quality and improving the completeness of information. This technology addresses the challenge of obtaining high-quality video content in complex and dynamic environments. By improving image clarity, expanding perspective information, and enhancing scene understanding, video fusion technology has shown significant potential for a wide range of applications, attracting considerable attention from both academia and industry. Although several review articles on video fusion exist, they tend to focus on isolated aspects of the technology and often lack a comprehensive, systematic overview of the field. To fill this gap, this paper provides an in-depth review of the research on video fusion technology in multi-camera scenarios. The paper covers the definition of video fusion; offers a detailed classification of key technologies, such as geometric correction and alignment, perspective fusion, spatio-temporal fusion, and multi-modal fusion; and explores its applications in diverse fields including surveillance security, virtual reality, film and television production, intelligent transportation, medical imaging, robotics, and unmanned aerial vehicles. Additionally, the paper examines the role of edge caching in video fusion, highlights the current challenges faced by the field, and discusses the potential of video fusion technology for driving innovation across multiple industries.

14 pages, 127365 KB

Open Access Article

CGS-BR: Construction and Benchmarking of a Respiratory Behavior Dataset for the Chinese Giant Salamander

by Dingwei Mao, Yan Zhou, Maochun Wang, Chenyang Shi, Yuanqiong Chen and Qinghua Luo

Animals 2026, 16(8), 1272; https://doi.org/10.3390/ani16081272 - 21 Apr 2026

Viewed by 375

Abstract

The Chinese giant salamander (Andrias davidianus) is a nationally protected species in China, and its respiratory behavior serves as a key indicator of its physiological state, health status, and biological rhythm. However, research on intelligent monitoring of its respiratory behavior remains limited due to several challenges, including the species’ nocturnal habits, which result in low image contrast and poor quality in dark environments; extremely subtle breathing movements; and high-cost manual annotation, which leads to a scarcity of high-quality annotated visual data. These factors severely constrain the application of deep learning techniques in this field. To support research on respiratory behavior monitoring in the Chinese giant salamander, this study constructs and releases the CGS-BR dataset, which is the first vision-based dataset dedicated specifically to respiratory behavior detection in this species. The dataset was collected under controlled simulated breeding conditions and consists of 1732 images extracted from 215 high-definition video clips. Following a standardized procedure, each complete respiratory cycle is manually annotated into four stages: head-up, diving, exhalation, and inhalation. To validate the effectiveness of this dataset, this study selects YOLOv8n as the baseline model, which balances detection accuracy, speed, and parameter count, enabling efficient giant salamander respiratory detection under limited resources. By comparing it with several representative models, we provide a reliable evaluation of the dataset’s applicability. CGS-BR aims to provide fundamental data support for research on respiratory monitoring in the Chinese giant salamander, laying the foundation for subsequent applications in conservation management, captive breeding, health monitoring, and early disease warning.

(This article belongs to the Special Issue Artificial Intelligence as a Useful Tool in Behavioural Studies)

14 pages, 715 KB

Open Access Article

The Nerve-Sparing Quality (NSQ) Score: A Novel Intraoperative Scoring System for Assessing Nerve-Sparing Quality During Robot-Assisted Radical Prostatectomy—A Concept and Feasibility Study

by Jakub Kempisty, Krzysztof Balawender, Oskar Dąbrowski and Karol Burdziak

J. Clin. Med. 2026, 15(8), 2979; https://doi.org/10.3390/jcm15082979 - 14 Apr 2026

Viewed by 437

Abstract

Introduction: Nerve-sparing (NS) during robot-assisted radical prostatectomy (RARP) plays a critical role in postoperative functional recovery, particularly urinary continence and erectile function. Despite the importance of precise neurovascular bundle (NVB) preservation, intraoperative assessment of NS quality remains largely subjective and lacks standardized evaluation tools. The aim of this study was to develop and preliminarily evaluate a structured intraoperative scoring system designed specifically for assessing NS quality during RARP. Methods: A novel 10-point intraoperative NS scoring system (NSQ Score) based on five domains was developed: dissection plane, bleeding control, bundle manipulation, continuity of dissection, and symmetry. Each parameter was rated on a 0–2 scale. Thirty robot-assisted radical prostatectomy (RARP) procedures performed in 2024 were randomly selected from a prospectively maintained institutional surgical video archive. Cases were not pre-filtered based on tumor stage, surgical difficulty, or intraoperative complexity. High-definition video recordings of the nerve-sparing phase were anonymized and independently evaluated by three experienced observers blinded to patient outcomes and to each other’s assessments. Inter-rater agreement was analyzed using weighted Cohen’s kappa statistics with quadratic weights, complemented by exact and near-agreement proportions. Cluster bootstrap resampling was applied to account for bilateral observations. Results: A total of 48 evaluable observations were analyzed. The overall inter-rater agreement demonstrated a weighted kappa of 0.41 (95% CI 0.36–0.48), indicating fair-to-moderate agreement among reviewers. Exact agreement occurred in 43% of observations, while near-agreement (allowing one ordinal level difference) reached 98%. Among individual parameters, symmetry demonstrated the highest reliability with substantial agreement (κ = 0.70; 95% CI 0.58–0.81). 
Other domains showed fair agreement, including intraoperative bleeding (κ = 0.36), continuity of dissection (κ = 0.39), bundle manipulation (κ = 0.34), and dissection plane (κ = 0.27). Agreement levels were comparable between left- and right-sided dissections. Conclusions: We propose a novel structured intraoperative scoring system for evaluating nerve-sparing quality during RARP. The scale is simple, procedure-specific, and feasible for structured postoperative or video-based assessment. Preliminary results demonstrate fair-to-moderate inter-rater reliability with very high near-agreement, supporting the feasibility of this tool for clinical use. The proposed scoring system may facilitate standardized training, objective performance assessment, and future studies correlating intraoperative NS quality with functional outcomes.
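The agreement statistics named in this abstract (quadratic-weighted Cohen's kappa, exact agreement, and near-agreement within one ordinal level) can be illustrated with a minimal two-rater sketch. The function names are hypothetical, the 0–2 scale matches the per-domain ratings described above, and the abstract does not say how the three observers were combined, so pairwise comparison is an assumption here. The cluster bootstrap used for the confidence intervals is not reproduced.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic weights for two raters on levels 0..num_levels-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    marg_a = Counter(rater_a)                  # marginal counts for rater A
    marg_b = Counter(rater_b)                  # marginal counts for rater B
    max_dist = (num_levels - 1) ** 2
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / max_dist   # quadratic disagreement weight
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_disagreement == 0:
        return 1.0  # both raters used one identical level throughout
    return 1.0 - obs_disagreement / exp_disagreement


def agreement_proportions(rater_a, rater_b):
    """Return (exact agreement, near-agreement within one ordinal level)."""
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    near = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n
    return exact, near
```

Calling `quadratic_weighted_kappa(scores_a, scores_b, 3)` on two observers' 0–2 ratings for one domain would give one pairwise kappa; the ratings passed in would come from the study data, which are not reproduced here.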

(This article belongs to the Special Issue Robotic Urologic Surgery: Clinical Applications and Advances)

5 pages, 1314 KB

Open Access Perspective

From Low-Resource Innovation to High-Resource Learning: Head-Mounted Cameras as a Tool to Strengthen Surgical and Burn Care Training

by Einar Logi Snorrason, Fredrik Huss, Ali Modarressi and Morten Kildal

Eur. Burn J. 2026, 7(2), 20; https://doi.org/10.3390/ebj7020020 - 1 Apr 2026

Viewed by 537

Abstract

While the global surgeon deficit continues to demand urgent action, traditional “over-the-shoulder” teaching is increasingly constrained by infection-control demands and crowded operating rooms. Over the past four years, we integrated head-mounted smart cameras into reconstructive-surgery workshops across East Africa. Utilizing voice-controlled, stabilized video technology, we provided trainees with a high-definition, wearer’s-perspective view that enhanced visualization without compromising the sterile field. Following remarkably high acceptance in Africa, we have initiated a pilot study at the National Burn Centre in Sweden to apply these lessons to a high-income setting. Our findings suggest that this technology improves surgical education while supporting infection-control stewardship through reduced overcrowding. This experience illustrates a reverse innovation, where tools refined under the logistical constraints of African operating theatres offer scalable solutions for universal challenges in surgical training and patient safety.

(This article belongs to the Special Issue Innovative Applications and Challenges of Emerging Materials and Technologies in Burn Treatment)

13 pages, 4077 KB

Open Access Article

Redefining Access to the Mesiotemporal Lobe: The Transplanum Polare Approach with Cadaveric and Operative Video Demonstration

by Jesse Shamsul, Alessandro Pesaresi, Daniele Starnoni, Samia Messaoudi, Lorenzo Dolci, Hugues Cadas, Sami Schranz, Sara Sabatasso, Vincent Dunet, Roy T. Daniel, Pablo González-López and Lorenzo Giammattei

Brain Sci. 2026, 16(4), 351; https://doi.org/10.3390/brainsci16040351 - 25 Mar 2026

Cited by 1 | Viewed by 1525

Abstract

Objectives: This study aims to define the surgical anatomy, technical feasibility, advantages, and limitations of the transplanum polare approach (TPPA) through detailed cadaveric dissection and a representative clinical case, evaluating its potential as a safe and effective alternative to traditional approaches to the mesiotemporal lobe. Methods: A cadaveric dissection was performed on one adult head injected with colored latex, using standard microsurgical instruments and high-definition video documentation. Each procedural step was recorded and illustrated with cadaveric photographs. Additionally, a clinical case of mesiotemporal cavernous hemangioma resected via TPPA is presented, including an operative video. Results: The dissection demonstrated a direct and safe trajectory to the amygdala and hippocampal head, with clear identification of key vascular and white matter landmarks. In the clinical case, the lesion was completely resected with no postoperative neurological deficits. Conclusions: The TPPA represents a novel microsurgical corridor to the mesiotemporal region, minimizing cortical disruption, Sylvian fissure dissection, and manipulation of middle cerebral artery branches. Although its exposure is limited posteriorly, the TPPA could offer an optimal balance between functional preservation and surgical accessibility, constituting a valuable addition to the modern microsurgical armamentarium.

(This article belongs to the Special Issue Innovations in Skull Base Surgery)

14 pages, 2689 KB

Open Access Article

The Infra-Bullar Groove: Assessing a Novel Surgical Landmark for Identifying the Natural Maxillary Ostium

by Jameel Ghantous, Ayalon Hadar, Itay Chen, Chanan Shaul and Boaz Forer

Life 2026, 16(3), 475; https://doi.org/10.3390/life16030475 - 15 Mar 2026

Viewed by 395

Abstract

Functional endoscopic sinus surgery (FESS) is the gold-standard surgical treatment for chronic rhinosinusitis (CRS) not responding to appropriate medical therapy. Identifying the natural maxillary ostium (NMO) during FESS is often challenging due to the lack of definitive landmarks. To aid in the identification of the NMO, we describe and assess the feasibility of using a new surgical landmark, the infra-bullar groove (IBG), and evaluate its visibility and reproducibility during FESS. Methods: Video recordings of 41 maxillary antrostomy procedures in patients with varying severity of CRS were reviewed. Surgeons of different experience levels assessed IBG visibility and its termination at the NMO. Results: In video recordings where the ethmoid bulla was preserved during uncinectomy, the IBG and its connection to the NMO were successfully identified by all reviewers in 100% of analyzable cases. The mean time to IBG identification did not significantly differ among surgeons. The IBG was consistently more pronounced in cases with well-pneumatized bullae. Conclusions: Under controlled surgical conditions where the ethmoid bulla is preserved during uncinectomy, the IBG demonstrates high visibility and reproducibility for locating the NMO. However, further prospective studies are needed to establish its real-world utility and impact on surgical outcomes.

(This article belongs to the Section Medical Research)

25 pages, 3230 KB

Open Access Article

Lightweight State-Space Model-Based Video Quality Enhancement for Quadruped Robot Dog Decoded Streams

by Wentao Feng, Yuanchun Huang and Zhenglong Yang

Electronics 2026, 15(6), 1151; https://doi.org/10.3390/electronics15061151 - 10 Mar 2026

Viewed by 550

Abstract

In the field of intelligent inspection, high-definition video data collected by quadruped robot dogs face severe transmission and storage constraints. Although existing advanced lossy video coding standards can significantly improve compression efficiency, they inevitably introduce severe compression artifacts in low-bit-rate scenarios. To address this issue, this paper proposes a video decoding quality enhancement network named Video Quality Restoration Network (VQRNet), based on a dual-stream architecture. Specifically, the Local Feature Extraction component incorporates a Progressive Feature Fusion Module (PFFM) with a four-stage progressive structure. By integrating reparameterized convolution and attention mechanisms, PFFM focuses on capturing high-frequency texture details to repair small-scale distortions. Simultaneously, the Multi-Scale Lightweight Spatial Attention Module (MLSA) performs spatial feature recalibration, leveraging multi-scale convolution to adaptively identify and enhance key spatial regions, specifically addressing multi-scale distortion. In the Global Feature Extraction component, the State-Space Attention Module (SSAM) combines State-Space Models (SSMs) with attention mechanisms to capture long-range dependencies and contextual information, targeting large-scale distortions caused by high-intensity compression. To verify the performance of the proposed algorithm, a dedicated dataset comprising 20 real-world video sequences captured by quadruped robot dogs (partitioned into 15 training and 5 testing sequences) was constructed, and the VTM 23.4 reference software was employed to simulate compression degradation using four quantization parameters (QP 30, 35, 40, and 45). Experimental results demonstrate that VQRNet outperforms state-of-the-art quality enhancement methods, including MIRNet, NAFNet, TRRHA, and CTNet, in terms of core metrics such as PSNR and SSIM. 
In the QP = 30 scenario, VQRNet achieves an average PSNR of 40.33 dB, a significant improvement of 3.32 dB over the VTM 23.4 baseline (37.01 dB), while demonstrating significant advantages in computational complexity and parameter efficiency, requiring only 5.27 G FLOPs and 1.40 M parameters, with an average inference latency of only 11.82 ms per 128 × 128 patch. This work provides robust technical support for the efficient video perception of quadruped robot dogs.
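For readers unfamiliar with the PSNR figures quoted above, the following is a minimal sketch of the standard PSNR definition for 8-bit samples, together with the arithmetic behind the reported 3.32 dB gain. The function names and the flat-list pixel representation are illustrative assumptions, not the evaluation code used in the paper.

```python
import math


def mean_squared_error(reference, restored):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)


def psnr_db(reference, restored, peak=255.0):
    """PSNR in dB for a given peak value; infinite when the signals are identical."""
    mse = mean_squared_error(reference, restored)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Gain reported in the abstract at QP = 30: enhanced PSNR minus the VTM 23.4 baseline.
reported_gain_db = round(40.33 - 37.01, 2)
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error energy.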

16 pages, 5863 KB

Open Access Article

A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames

by Xiuzhen Wu, Yahui Qi, Liang Qin, Shi Yan and Jianxiu Zhang

Automation 2026, 7(2), 43; https://doi.org/10.3390/automation7020043 - 5 Mar 2026

Viewed by 619

Abstract

Because they are low-cost, lightweight, flexible, and offer wide shooting coverage, UAVs play an important role in situational awareness in fields such as disaster prevention and mitigation and urban planning and management. In these applications, UAV aerial photography is limited by the camera’s field of view, so a single image cannot provide a high-definition panorama of the complete target area. Image mosaicking is therefore essential, but mosaicking with only a single UAV cannot meet the strict real-time requirements of situational awareness. To address these problems, this paper proposes a fast multi-UAV aerial image mosaic method based on key frames. First, the multi-UAV area-coverage flight strategy is determined from the size of the task area and the UAV flight parameters. Next, the field of view of the camera pod, the flight speed, and the flight altitude are used to determine the key-frame extraction time window during aerial photography. An image matching-rate calculation is then designed to extract key frames within that window, and the key frames are transmitted to the ground image-stitching system. There, an improved Laplacian pyramid method quickly fuses and stitches the key frames from each UAV into a panoramic map. Experiments show that the method quickly obtains high-precision real-scene map information for the task area. Compared with the single-UAV method and the multi-UAV full-video-stream stitching method, it greatly reduces computing-power consumption and communication-bandwidth requirements and improves the efficiency and real-time performance of panoramic map acquisition.
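The abstract states that the key-frame extraction period depends on field of view, flight speed, and altitude, but gives no formula. The sketch below shows one common geometric relationship under loudly stated assumptions: a downward-pointing camera over flat ground, a constant speed, and a hypothetical `overlap` parameter for the fraction of ground shared by consecutive key frames. It should not be read as the authors' rule.

```python
import math


def ground_footprint_m(altitude_m, fov_deg):
    """Along-track ground coverage of one frame for a nadir camera over flat terrain."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)


def keyframe_interval_s(altitude_m, fov_deg, speed_mps, overlap=0.5):
    """Time between key frames so that consecutive frames share `overlap` of their footprint.

    `overlap` is a hypothetical tuning parameter, not a value from the paper.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    advance_m = ground_footprint_m(altitude_m, fov_deg) * (1.0 - overlap)
    return advance_m / speed_mps
```

For example, at 100 m altitude with a 90° field of view the footprint is about 200 m, so flying at 10 m/s with 50% overlap gives one key frame roughly every 10 s.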

(This article belongs to the Special Issue Inventory Monitoring and Control Through High-Level Coordination of Drone Swarms)

11 pages, 590 KB

Open Access Article

Design and Performance Evaluation of Communication Systems Based on Non-Orthogonal Overlapped Chirp Modulation

by Guoping Liu, Jiaju Zhang, Qiusheng Gao, Wenjiang Pei, Junpeng Zhang and Sinuo Jiao

Symmetry 2026, 18(3), 412; https://doi.org/10.3390/sym18030412 - 27 Feb 2026

Viewed by 331

Abstract

With the evolution of smart grids, power communication networks are increasingly required to support high-bandwidth and diversified services such as high-definition video, real-time control, and positioning under severe resource limitations, which imposes dual challenges of communication capacity and spectrum constraints. Conventional orthogonal modulation schemes exhibit significant limitations in spectral efficiency and concurrent access capabilities, particularly in supporting high-density user environments. To address this, we propose a communication system based on non-orthogonal overlapped chirp (NOOC) modulation, in which the intrinsic symmetry properties of chirp waveforms are utilized to enhance system design and performance. We first construct the system architecture with a multi-symbol concurrent transmission scheme and introduce continuous orthogonal phase modulation to improve symbol distinguishability and mitigate inter-symbol interference, an approach that effectively harnesses signal symmetry for interference suppression. At the receiver, a low-complexity demodulation algorithm based on correlation matrix computation is developed, further improved through oversampling techniques that exploit temporal and spectral symmetry in signal design. Monte Carlo simulations confirm that the proposed system outperforms traditional orthogonal chirp and orthogonal frequency division multiplexing systems in bit error rate performance and spectral efficiency across varying signal-to-noise ratios and modulation schemes. The proposed NOOC system achieves spectral efficiency scaling linearly with concurrency level K, reaching up to 16 bits/s/Hz for K = 16 with BPSK, compared to 1 bit/s/Hz in orthogonal systems. The study provides both a theoretical foundation and practical insights for developing symmetry-aware, efficient, and reliable air interface technologies suitable for future power-private networks.
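The linear scaling claim can be made concrete with a small calculation. The sketch assumes that each of the K concurrent symbols carries log2(M) bits for an M-ary constellation and that the orthogonal reference carries one such symbol in the same time-bandwidth resource; this is consistent with the 16 versus 1 bit/s/Hz figures quoted for BPSK, but the general form for other constellations is an assumption, and the function name is hypothetical.

```python
import math


def spectral_efficiency_bits_per_s_hz(concurrency_k, modulation_order):
    """Assumed model: K overlapped symbols, each carrying log2(M) bits, per unit time-bandwidth."""
    if concurrency_k < 1 or modulation_order < 2:
        raise ValueError("need K >= 1 and M >= 2")
    return concurrency_k * math.log2(modulation_order)


# BPSK (M = 2): orthogonal reference (K = 1) versus the reported K = 16 case.
orthogonal_bpsk = spectral_efficiency_bits_per_s_hz(1, 2)
nooc_bpsk_k16 = spectral_efficiency_bits_per_s_hz(16, 2)
```

Under this model the gain over the orthogonal reference is simply K, which is why the bit error rate behavior at high concurrency, rather than raw throughput, becomes the limiting factor.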

(This article belongs to the Section Engineering and Materials)

12 pages, 728 KB

Open Access Article

Ciliary Beat Frequency and Pattern: An Accessible Tool for the Screening of Primary Ciliary Dyskinesia

by Elise Kaspi, Julie Mazenq, Adrien Pagin, Rana Mitri-Frangieh, Mohamed Boucekine, Karine Baumstarck, Thomas Radulesco, Justin Michel, Nadine Dufeu, Jean-Christophe Dubus, Patrice Roll and Diane Frankel

Diagnostics 2026, 16(5), 704; https://doi.org/10.3390/diagnostics16050704 - 27 Feb 2026

Viewed by 587

Abstract

Background/Objectives: Primary ciliary dyskinesia (PCD) is a rare inherited disorder caused by dysfunction of motile cilia, leading to chronic respiratory disease. Diagnosis is challenging due to heterogeneous and non-specific clinical manifestations and the absence of a single definitive diagnostic test. Current diagnostic strategies rely on a combination of functional, ultrastructural, and genetic analyses. The objective of this study was to evaluate whether ciliary beat frequency (CBF), combined with ciliary beat pattern (CBP) assessment using digital high-speed video microscopy (DHSV), could serve as an effective first-line screening tool to identify patients requiring further diagnostic investigations. Methods: This single-center retrospective study included 65 patients (52 children and 13 adults) with clinical suspicion of PCD. Ciliary beat analysis was performed on nasal or bronchial samples using DHSV and Sisson–Ammons Video Analysis software. CBF and CBP were assessed and compared between patients with confirmed PCD and those in whom PCD was excluded based on transmission electron microscopy (TEM) and/or molecular genetic analysis. Results: Fifteen patients were diagnosed with PCD. Mean CBF was significantly lower in the PCD group compared with the non-PCD group (3.3 Hz vs. 8.1 Hz; p < 0.001). A CBF cut-off value of 5.25 Hz yielded a sensitivity of 78.6% and a specificity of 95.7%. Three patients with PCD had CBF values above this threshold; however, two of them exhibited abnormal CBP. Sample type, patient age, and the presence of airway pathogens did not significantly influence CBF measurements. Conclusions: CBF and CBP analysis using DHSV represents a useful first-line screening tool within a multifaceted diagnostic approach for PCD, allowing rapid identification of patients who should undergo further confirmatory testing.
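How a frequency cut-off translates into sensitivity and specificity can be sketched as follows. The function name is hypothetical, a measurement strictly below the cut-off is assumed to count as a positive screen (the abstract does not say how a value exactly at the threshold is handled), and no patient data from the study are reproduced.

```python
def screen_by_cbf(cbf_hz, has_pcd, cutoff_hz=5.25):
    """Return (sensitivity, specificity) when CBF below the cut-off flags suspected PCD."""
    tp = fn = tn = fp = 0
    for value, diseased in zip(cbf_hz, has_pcd):
        flagged = value < cutoff_hz  # assumption: strictly below the cut-off is positive
        if diseased and flagged:
            tp += 1
        elif diseased:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return tp / (tp + fn), tn / (tn + fp)
```

Passing, for instance, made-up measurements `[3.0, 4.0, 6.0, 8.0, 9.0, 5.0]` with labels `[True, True, True, False, False, False]` gives a sensitivity and specificity of two thirds each; these numbers are illustrative only.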

(This article belongs to the Section Clinical Laboratory Medicine)

17 pages, 2088 KB

Open Access Article

Perception-Driven and Object-Aware Fast MTT Partitioning for H.266/VVC: A Saliency-Guided Complexity Reduction Framework

by Chih-Ying Lin, Jia-Yi Yeh, Yu-Cheng Chen, Yi-Fan Li, Chih-Ming Lien, Mei-Juan Chen and Chia-Hung Yeh

Electronics 2026, 15(1), 133; https://doi.org/10.3390/electronics15010133 - 27 Dec 2025

Cited by 1 | Viewed by 850

Abstract

The H.266/Versatile Video Coding (VVC) standard was developed to address the growing demand for compressing ultra-high-definition video content, supporting resolutions ranging from 4K to 8K and beyond. H.266/VVC improves coding efficiency by introducing a flexible quadtree with nested multi-type tree (QT-MTT) partitioning and various advanced coding tools. However, these improvements substantially increase the encoding complexity. To address this issue, we propose a perception-driven and object-aware algorithm that accelerates the MTT process in H.266/VVC intra coding. Our method integrates pixel-level saliency detection with object bounding box detection. Specifically, visually distinguishable (VD) pixels are identified using a just noticeable distortion (JND) model based on average background luminance, while detected-object regions are extracted using a YOLO object detection network. These two types of perceptual information are combined to guide adaptive encoding decisions. For each frame, a perception-driven pixel map labeled with VD pixels and a YOLO-based object map are generated. Within the MTT framework, partitioning decisions are determined jointly by standard deviation metrics derived from VD pixels and detected-object region coverage. By incorporating flexible threshold settings, the proposed method can meet different users’ requirements. In this paper, we performed experiments under three threshold settings. The experimental results demonstrate that the proposed method reduces H.266/VVC intra coding time by 27.94% to 43.11%, with BDBR increases of only 1.02% to 1.53%, thus achieving an appropriate trade-off between encoding speed and coding efficiency.
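The abstract says partitioning decisions combine a standard deviation computed over VD pixels with detected-object coverage, but it does not give the rule or the thresholds. The sketch below shows one plausible, hypothetical form of such an early-termination test for a single coding unit; the function name, the AND combination, and both threshold values are assumptions for illustration only.

```python
import statistics


def skip_mtt_split(luma_block, vd_mask, object_mask,
                   std_threshold=4.0, coverage_threshold=0.1):
    """Hypothetical rule: skip further MTT splitting for smooth, object-free blocks.

    luma_block, vd_mask, and object_mask are equally sized 2D lists; mask entries are 0 or 1.
    """
    # Collect luma values only at visually distinguishable (VD) pixel positions.
    vd_values = [pixel
                 for row, mask_row in zip(luma_block, vd_mask)
                 for pixel, is_vd in zip(row, mask_row) if is_vd]
    vd_std = statistics.pstdev(vd_values) if len(vd_values) > 1 else 0.0
    # Fraction of the block covered by detected-object regions.
    total = sum(len(row) for row in object_mask)
    covered = sum(1 for row in object_mask for flag in row if flag)
    coverage = covered / total
    return vd_std < std_threshold and coverage < coverage_threshold
```

Raising either threshold would skip more partition checks, trading coding efficiency for speed, which mirrors the flexible threshold settings the abstract describes in general terms.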

(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence, 2nd Edition)

23 pages, 5039 KB

Open Access Article

A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution

by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng

Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025

Viewed by 920

Abstract

In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. 
Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 dB of the SimVP-v2 baseline. It also reduces the error metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, the LPIPS score is substantially lowered to 0.22. These figures confirm that A3DSimVP enhances both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications.
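To give a feel for what replacing 2D depthwise separable convolutions with 3D counterparts costs, the sketch below counts weights for both variants. The channel count and kernel sizes in the example are hypothetical and are not taken from the A3DSimVP architecture; only the counting formula (depthwise weights plus 1 × 1 pointwise weights, no bias) is standard.

```python
def depthwise_separable_params(channels_in, channels_out, kernel, temporal_kernel=1):
    """Weight count of a depthwise separable convolution without bias terms.

    temporal_kernel=1 corresponds to the 2D case; larger values give the 3D case.
    """
    depthwise = channels_in * temporal_kernel * kernel * kernel  # one filter per input channel
    pointwise = channels_in * channels_out                       # 1x1 (or 1x1x1) channel mixing
    return depthwise + pointwise


# Hypothetical layer: 64 channels in and out, 3x3 spatial kernel.
params_2d = depthwise_separable_params(64, 64, kernel=3)
params_3d = depthwise_separable_params(64, 64, kernel=3, temporal_kernel=3)
```

In this illustrative layer only the depthwise term grows with the temporal kernel, while the pointwise term, which dominates the count, is unchanged.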

(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)

22 pages, 14012 KB

Open AccessArticle

Video Frame Interpolation for Extreme Motion Scenes Based on Dual Alignment and Region-Adaptive Interaction

by Xin Ning, Jiantao Qu, Junyi Duan, Kun Yang and Youdong Ding

Symmetry 2025, 17(12), 2097; https://doi.org/10.3390/sym17122097 - 6 Dec 2025

Viewed by 1227

Abstract

Video frame interpolation in ultra-high-definition extreme motion scenes remains highly challenging due to large displacements, nonlinear motion, and occlusions that disrupt spatio-temporal symmetry. To address this issue, this study proposes a frame interpolation method for extreme motion scenes based on dual alignment and region-adaptive interaction, approached from the perspectives of cross-frame localization and adaptive reconstruction. Specifically, we design a two-stage motion information alignment strategy that obtains two types of motion information via optical flow estimation and offset estimation and progressively guides reference pixels toward accurate long-range cross-frame localization. This mitigates structural misalignment caused by limited receptive fields while alleviating the spatio-temporal asymmetry caused by inconsistent inter-frame motion speed and direction. Building on this, we introduce a region-adaptive interaction module that adapts motion representations to different regions through cross-frame interaction and uses distinct attention pathways to capture both the global context and local high-frequency motion details. The result is a dynamic feature fusion tailored to regional characteristics, which significantly enhances the model’s ability to perceive overall structure and texture details in extreme motion scenarios. In addition, a motion compensation module explicitly captures pixel motion relationships by constructing a global correlation matrix, compensating for the positioning errors of the dual alignment module in extreme motion or occlusion areas. The experimental results demonstrate that the proposed method achieves excellent overall performance in ultra-high-definition extreme motion scenes, with a PSNR improvement of 0.05 dB over state-of-the-art methods. 
In multi-frame interpolation tasks, it achieves an average PSNR gain of 0.31 dB, demonstrating strong cross-scene interpolation capability. Full article
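The global correlation matrix mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' module; it assumes the common formulation in which every feature vector of one frame is compared with every feature vector of the other by a scaled dot product, and a softmax over each row yields soft matching weights. The names `correlation_matrix`, `softmax`, and `soft_match`, and the toy features, are hypothetical.

```python
import math


def correlation_matrix(feats_a, feats_b):
    """All-pairs scaled dot products between per-pixel feature vectors of two frames."""
    scale = 1.0 / math.sqrt(len(feats_a[0]))
    return [
        [scale * sum(x * y for x, y in zip(fa, fb)) for fb in feats_b]
        for fa in feats_a
    ]


def softmax(row):
    """Numerically stable softmax over one row of correlation scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_match(feats_a, feats_b, positions_b):
    """Expected (x, y) position in frame B for each pixel of frame A."""
    matches = []
    for row in correlation_matrix(feats_a, feats_b):
        weights = softmax(row)
        x = sum(w * p[0] for w, p in zip(weights, positions_b))
        y = sum(w * p[1] for w, p in zip(weights, positions_b))
        matches.append((x, y))
    return matches


# Toy example: two pixels whose features swap places between the frames.
feats_a = [[10.0, 0.0], [0.0, 10.0]]
feats_b = [[0.0, 10.0], [10.0, 0.0]]
positions_b = [(0.0, 0.0), (1.0, 0.0)]
matches = soft_match(feats_a, feats_b, positions_b)
```

Because every pixel is compared with every other pixel, this kind of matching is not restricted by a local receptive field, which is why it is often used for large displacements; the price is memory that grows with the square of the number of pixels.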

(This article belongs to the Special Issue Symmetry in Artificial Intelligence and Applications)

20 pages, 1343 KB

Open AccessArticle

Hybrid CDN Architecture Integrating Edge Caching, MEC Offloading, and Q-Learning-Based Adaptive Routing

by Aymen D. Salman, Akram T. Zeyad, Asia Ali Salman Al-karkhi, Safanah M. Raafat and Amjad J. Humaidi

Computers 2025, 14(10), 433; https://doi.org/10.3390/computers14100433 - 13 Oct 2025

Cited by 1 | Viewed by 3601

Abstract

Content Delivery Networks (CDNs) have evolved to meet surging data demands and stringent low-latency requirements driven by emerging applications like high-definition video streaming, virtual reality, and IoT. This paper proposes a hybrid CDN architecture that combines edge caching, Multi-access Edge Computing (MEC) offloading, and reinforcement learning (Q-learning) for adaptive routing. In the proposed system, popular content is cached at radio access network edges (e.g., base stations) and computation-intensive tasks are offloaded to MEC servers, while a Q-learning agent dynamically routes user requests to the optimal service node (cache, MEC server, or origin) based on the network state. The study presents a detailed system design and a comprehensive simulation-based evaluation. The results demonstrate that the proposed hybrid approach significantly improves cache hit ratios and reduces end-to-end latency compared to traditional CDNs and simpler edge architectures. The Q-learning-enabled routing adapts to changing load and content popularity, converging to efficient policies that outperform static baselines. To isolate each component’s contribution, the hybrid model is tested against variants lacking MEC, edge caching, or the RL-based controller. The paper concludes with a discussion of practical considerations, limitations, and future directions for intelligent CDN networking at the edge. Full article
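The routing idea described above can be sketched with tabular Q-learning. This is a toy illustration, not the paper's simulator: the two network states, the latency table `LATENCY_MS`, the hyperparameters, and the function names are all assumptions made for the example. The agent picks a service node (cache, MEC server, or origin), receives the negative latency as reward, and applies the standard update Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s′, ·) − Q(s, a)).

```python
import random

STATES = ("low_load", "high_load")
ACTIONS = ("cache", "mec", "origin")

# Hypothetical mean latencies in milliseconds for each (state, action) pair.
LATENCY_MS = {
    "low_load": {"cache": 10.0, "mec": 25.0, "origin": 80.0},
    "high_load": {"cache": 40.0, "mec": 20.0, "origin": 90.0},
}


def train_router(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Learn a Q-table for request routing with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore with probability epsilon, otherwise route greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        # Reward is the negative observed latency (mean plus small jitter).
        reward = -(LATENCY_MS[state][action] + rng.uniform(-2.0, 2.0))
        # In this toy model the load level changes at random between requests.
        next_state = rng.choice(STATES)
        best_next = max(q[next_state].values())
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued service node for each network state."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}


policy = greedy_policy(train_router())
```

With this latency table the learned policy should route to the cache under low load and to the MEC server under high load, since those nodes have the lowest latency in each state. A realistic state would also encode content popularity and per-node queue lengths, which this sketch omits.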

(This article belongs to the Special Issue Edge and Fog Computing for Internet of Things Systems (2nd Edition))

23 pages, 1348 KB

Open AccessReview

Opportunities Offered by Telemedicine in the Care of Patients Affected by Fractures and Critical Issues: A Narrative Review

by Giulia Vita, Valerio Massimo Magro, Andrea Sorbino, Concetta Ljoka, Nicola Manocchio and Calogero Foti

J. Clin. Med. 2025, 14(20), 7135; https://doi.org/10.3390/jcm14207135 - 10 Oct 2025

Cited by 6 | Viewed by 2699

Abstract

Telerehabilitation is an effective, accessible addition or alternative to conventional rehabilitation for fracture management, especially in older adults after hip fractures, leveraging video visits, mHealth apps, virtual reality (VR), and wearable sensors to deliver exercise, education, and monitoring at home with high satisfaction and adherence. Across non-surgical and surgical contexts, telemedicine shows feasibility and cost benefits, with mixed superiority but consistent non-inferiority for functional outcomes versus in-person care. In hip fracture populations, randomized and non-randomized studies indicate improvements in functional independence measure (FIM), Timed Up and Go test (TUG), Activities of Daily Living/Instrumental Activities of Daily Living (ADLs/IADLs), and quality of life, with some evidence for reduced anxiety and depression, while effects on mobility, pain, and adverse events remain uncertain overall. In patients with upper-limb fractures, telerehabilitation appears to improve function and pain, though strength gains may lag compared with in-person therapy in some trials; adjuncts like motor imagery and virtual reality may enhance outcomes and motivation. Application is facilitated by user-friendly platforms, caregiver involvement, and simple modalities such as structured phone follow-up. Limitations include small samples, heterogeneous protocols, scarce long-term data, and a predominance of non-inferiority or complementary designs, warranting larger, definitive trials. This technology can lead to improved patient management at home, effortlessly verifying treatment compliance, efficacy, and safety, while simultaneously reducing the need for hospitalization, promoting a more peaceful recovery. Here, we have undertaken a narrative review of the medical–scientific literature in this field. Full article

(This article belongs to the Special Issue Recent Advances in the Management of Fractures)

Search Results (150)
