Search Results (165)

Search Parameters:
Keywords = intra-frame

25 pages, 9990 KB  
Article
Bidirectional Mamba-Enhanced 3D Human Pose Estimation for Accurate Clinical Gait Analysis
by Chengjun Wang, Wenhang Su, Jiabao Li and Jiahang Xu
Fractal Fract. 2025, 9(9), 603; https://doi.org/10.3390/fractalfract9090603 - 17 Sep 2025
Viewed by 339
Abstract
Three-dimensional human pose estimation from monocular video remains challenging for clinical gait analysis due to high computational cost and the need for temporal consistency. We present Pose3DM, a bidirectional Mamba-based state-space framework that models intra-frame joint relations and inter-frame dynamics with linear computational complexity. Replacing transformer self-attention with state-space modeling improves efficiency without sacrificing accuracy. We further incorporate fractional-order total-variation regularization to capture long-range dependencies and memory effects, enhancing temporal and spatial coherence in gait dynamics. On Human3.6M, Pose3DM-L achieves 37.9 mm MPJPE under Protocol 1 (P1) and 32.1 mm P-MPJPE under Protocol 2 (P2), with 127 M MACs per frame and 30.8 G MACs in total. Relative to MotionBERT, P1 and P2 errors decrease by 3.3% and 2.4%, respectively, with 82.5% fewer parameters and 82.3% fewer MACs per frame. Compared with MotionAGFormer-L, Pose3DM-L improves P1 by 0.5 mm and P2 by 0.4 mm while using 60.6% less computation: 30.8 G vs. 78.3 G total MACs and 127 M vs. 322 M per frame. On AUST-VisGait across six gait patterns, Pose3DM consistently yields lower MPJPE, standard error, and maximum error, enabling reliable extraction of key gait parameters from monocular video. These results highlight state-space models as a cost-effective route to real-time gait assessment using a single RGB camera.
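The Protocol 1 metric quoted above (MPJPE) is simply the mean Euclidean distance between predicted and ground-truth joints; P-MPJPE additionally applies a rigid Procrustes alignment before the same distance. A minimal NumPy sketch of the metric, with array shapes assumed rather than taken from the paper:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error (Protocol 1).

    pred, gt: arrays of shape (frames, joints, 3) in millimetres.
    Returns the mean Euclidean distance over all frames and joints.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy usage with random poses (17 joints, as in Human3.6M).
rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 17, 3)) * 100.0          # ground-truth joints (mm)
pred = gt + rng.normal(scale=10.0, size=gt.shape)   # noisy prediction
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```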

19 pages, 2675 KB  
Article
Fast Intra-Coding Unit Partitioning for 3D-HEVC Depth Maps via Hierarchical Feature Fusion
by Fangmei Liu, He Zhang and Qiuwen Zhang
Electronics 2025, 14(18), 3646; https://doi.org/10.3390/electronics14183646 - 15 Sep 2025
Viewed by 311
Abstract
As a new generation 3D video coding standard, 3D-HEVC offers highly efficient compression. However, its recursive quadtree partitioning mechanism and frequent rate-distortion optimization (RDO) computations lead to a significant increase in coding complexity. In particular, intra-frame coding in depth maps, which incorporates tools like depth modeling modes (DMMs), substantially prolongs the decision-making process for coding unit (CU) partitioning, becoming a critical bottleneck in compression encoding time. To address this issue, this paper proposes a fast CU partitioning framework based on hierarchical feature fusion convolutional neural networks (HFF-CNNs). It aims to significantly accelerate the overall encoding process while ensuring excellent encoding quality by optimizing depth map CU partitioning decisions. This framework synergistically captures a CU's global structure and local details through multi-scale feature extraction and channel attention mechanisms (SE module). It introduces two external features, a wavelet energy ratio designed to quantify the texture complexity of a depth-map CU and the quantization parameter (QP) that reflects encoding quality, enhancing the model's dynamic perception across different dimensions. Ultimately, it outputs depth-corresponding partitioning predictions through three fully connected layers, strictly adhering to HEVC's quad-tree recursive segmentation mechanism. Experimental results demonstrate that, across eight standard test sequences, the proposed method achieves an average encoding time reduction of 48.43%, significantly lowering intra-frame encoding complexity with a BDBR increment of only 0.35%. The model exhibits outstanding lightweight characteristics with minimal inference time overhead. Compared with representative existing methods, the proposed approach achieves a better balance between cross-resolution adaptability and computational efficiency, providing a feasible optimization path for real-time 3D-HEVC applications.
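The abstract names two ingredients: channel attention (SE) over CU features, and external scalars (wavelet energy ratio, QP) fused before fully connected layers. The toy PyTorch classifier below illustrates both; layer sizes and module names are assumptions, not the authors' HFF-CNN.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excite: re-weight channels

class CUPartitionNet(nn.Module):
    """Toy CU split/no-split classifier: conv features with SE attention,
    fused with two external scalars (wavelet energy ratio, QP)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            SEBlock(32),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Sequential(       # three FC layers -> split logit
            nn.Linear(32 + 2, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 32), nn.ReLU(inplace=True),
            nn.Linear(32, 1))

    def forward(self, cu_luma, energy_ratio, qp):
        f = self.features(cu_luma).flatten(1)                 # (N, 32)
        f = torch.cat([f, energy_ratio.unsqueeze(1), qp.unsqueeze(1)], dim=1)
        return self.classifier(f)

# Toy usage: a batch of 8 depth-map CUs of size 32x32.
cu = torch.randn(8, 1, 32, 32)
logit = CUPartitionNet()(cu, torch.rand(8), torch.full((8,), 32.0))
print(logit.shape)   # torch.Size([8, 1])
```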

24 pages, 3398 KB  
Article
DEMNet: Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding
by Feng He, Qiran Zhang, Yichuan Li and Tianci Wang
Remote Sens. 2025, 17(17), 2963; https://doi.org/10.3390/rs17172963 - 26 Aug 2025
Viewed by 740
Abstract
Infrared dim and small target detection aims to accurately localize targets within complex backgrounds or clutter. However, under extremely low signal-to-noise ratio (SNR) conditions, single-frame detection methods often fail to effectively detect such targets. In contrast, multi-frame detection can exploit temporal cues to significantly improve the probability of detection (Pd) and reduce false alarms (Fa). Existing multi-frame approaches often employ 3D convolutions/RNNs to implicitly extract temporal features. However, they typically lack explicit modeling of target motion. To address this, we propose a Dual Encoder–Decoder Multi-Frame Infrared Small Target Detection Network with Motion Encoding (DEMNet) that explicitly incorporates motion information into the detection process. The first multi-level encoder–decoder module leverages spatial and channel attention mechanisms to fuse hierarchical features across multiple scales, enabling robust spatial feature extraction from each frame of the temporally aligned input sequence. The second encoder–decoder module encodes both inter-frame target motion and intra-frame target positional information, followed by 3D convolution to achieve effective motion information fusion. Extensive experiments demonstrate that DEMNet achieves state-of-the-art performance, outperforming recent advanced methods such as DTUM and SSTNet. For the DAUB dataset, compared to the second-best model, DEMNet improves Pd by 2.42 percentage points and reduces Fa by 4.13 × 10⁻⁶ (a 68.72% reduction). For the NUDT dataset, it improves Pd by 1.68 percentage points and reduces Fa by 0.67 × 10⁻⁶ (a 7.26% reduction) compared to the next-best model. Notably, DEMNet demonstrates even greater advantages on test sequences with SNR ≤ 3.
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
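A rough sketch of the general idea of explicit motion encoding: per-frame spatial features and frame-difference motion maps are fused by a 3D convolution. This is an illustrative stand-in, not DEMNet's dual encoder–decoder; all module names and shapes are assumed.

```python
import torch
import torch.nn as nn

class MotionFusion(nn.Module):
    """Toy motion-aware fusion: per-frame spatial features are stacked along
    time together with frame-difference 'motion' maps, then fused by 3D conv."""
    def __init__(self, channels=16):
        super().__init__()
        self.spatial = nn.Sequential(            # shared 2D encoder per frame
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.motion = nn.Sequential(             # encoder for frame differences
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)
        self.head = nn.Conv3d(channels, 1, kernel_size=1)   # per-frame target map

    def forward(self, clip):                     # clip: (N, T, 1, H, W)
        n, t, c, h, w = clip.shape
        diffs = torch.zeros_like(clip)
        diffs[:, 1:] = clip[:, 1:] - clip[:, :-1]            # crude inter-frame motion cue
        s = self.spatial(clip.reshape(n * t, c, h, w)).reshape(n, t, -1, h, w)
        m = self.motion(diffs.reshape(n * t, c, h, w)).reshape(n, t, -1, h, w)
        x = torch.cat([s, m], dim=2).permute(0, 2, 1, 3, 4)  # (N, 2C, T, H, W)
        return self.head(torch.relu(self.fuse(x)))           # (N, 1, T, H, W)

out = MotionFusion()(torch.randn(2, 5, 1, 64, 64))
print(out.shape)   # torch.Size([2, 1, 5, 64, 64])
```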

18 pages, 4672 KB  
Article
Desynchronization Resilient Audio Watermarking Based on Adaptive Energy Modulation
by Weinan Zhu, Yanxia Zhou, Deyang Wu, Gejian Zhao, Zhicheng Dong, Jingyu Ye and Hanzhou Wu
Mathematics 2025, 13(17), 2736; https://doi.org/10.3390/math13172736 - 26 Aug 2025
Viewed by 656
Abstract
With the rapid proliferation of social media platforms and user-generated content, audio data is frequently shared, remixed, and redistributed online. This raises urgent needs for copyright protection and traceability to safeguard the integrity and ownership of such content. Resilience to desynchronization attacks remains a significant challenge in audio watermarking. Most existing techniques face a trade-off between embedding capacity, robustness, and imperceptibility, making it difficult to meet all three requirements effectively in real-world applications. To address this issue, we propose an improved patchwork-based audio watermarking algorithm. Each audio frame is divided into two non-overlapping segments, from which mid-frequency energy features are extracted and modulated for watermark embedding. A linearly decreasing buffer compensation mechanism balances imperceptibility and robustness. Additionally, an optimization algorithm is incorporated to enhance watermark transparency while maintaining resistance to desynchronization attacks. During watermark extraction, each bit of the watermark is recovered by analyzing the intra-frame energy relationships. Furthermore, we provide a theoretical analysis demonstrating that the proposed method is robust against various types of attack. Extensive experimental results demonstrate that the proposed scheme ensures high audio quality, strong robustness against desynchronization attacks, and a higher embedding capacity than existing methods.
(This article belongs to the Special Issue Information Security and Image Processing)
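The intra-frame energy relationship used for extraction can be illustrated with a toy time-domain version of the patchwork idea: split a frame into two segments, nudge their energies apart to embed a bit, and compare the energies to read it back. The modulation strength and segment layout are placeholders, not the paper's mid-frequency scheme.

```python
import numpy as np

def embed_bit(frame, bit, strength=1.1):
    """Embed one bit by modulating the energy relation between the two
    halves of an audio frame (toy time-domain patchwork embedding)."""
    a, b = np.split(frame.copy(), 2)
    if bit == 1:            # bit 1 -> first segment made the stronger one
        a *= strength; b /= strength
    else:                   # bit 0 -> second segment made the stronger one
        a /= strength; b *= strength
    return np.concatenate([a, b])

def extract_bit(frame):
    a, b = np.split(frame, 2)
    return int(np.sum(a ** 2) > np.sum(b ** 2))   # compare intra-frame energies

rng = np.random.default_rng(1)
frame = rng.normal(size=2048)
for bit in (0, 1):
    wm = embed_bit(frame, bit)
    print(bit, extract_bit(wm))    # embedded vs. recovered bit
```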

30 pages, 37977 KB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Viewed by 608
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics.
(This article belongs to the Section Sensing and Imaging)
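The SVRO step of keeping text-relevant patches can be sketched as a top-k cosine-similarity selection over pre-computed CLIP-style embeddings. The embeddings below are random stand-ins and k is an assumption; the actual CLIP encoders and the paper's modules are not reproduced.

```python
import torch
import torch.nn.functional as F

def select_salient_patches(patch_feats, text_feat, k=16):
    """Keep the k patches whose embeddings are most similar to the text query.

    patch_feats: (frames, patches, dim) pre-computed patch embeddings
    text_feat:   (dim,) sentence embedding in the same space (e.g. CLIP)
    Returns (frames, k, dim) selected features and the (frames, k) indices.
    """
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    sim = p @ t                                # cosine similarity, (frames, patches)
    top = sim.topk(k, dim=-1).indices          # (frames, k)
    idx = top.unsqueeze(-1).expand(-1, -1, patch_feats.size(-1))
    return patch_feats.gather(1, idx), top

frames, patches, dim = 8, 49, 512              # e.g. a 7x7 patch grid per frame
selected, idx = select_salient_patches(torch.randn(frames, patches, dim),
                                        torch.randn(dim), k=16)
print(selected.shape, idx.shape)               # (8, 16, 512) (8, 16)
```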

18 pages, 2545 KB  
Article
Reliable Indoor Fire Detection Using Attention-Based 3D CNNs: A Fire Safety Engineering Perspective
by Mostafa M. E. H. Ali and Maryam Ghodrat
Fire 2025, 8(7), 285; https://doi.org/10.3390/fire8070285 - 21 Jul 2025
Cited by 1 | Viewed by 1027
Abstract
Despite recent advances in deep learning for fire detection, much of the current research prioritizes model-centric metrics over dataset fidelity, particularly from a fire safety engineering perspective. Commonly used datasets are often dominated by fully developed flames, mislabel smoke-only frames as non-fire, or lack intra-video diversity due to redundant frames from limited sources. Some works treat smoke detection alone as early-stage detection, even though many fires (e.g., electrical or chemical) begin with visible flames and no smoke. Additionally, attempts to improve model applicability through mixed-context datasets—combining indoor, outdoor, and wildland scenes—often overlook the unique false alarm sources and detection challenges specific to each environment. To address these limitations, we curated a new video dataset comprising 1108 annotated fire and non-fire clips captured via indoor surveillance cameras. Unlike existing datasets, ours emphasizes early-stage fire dynamics (pre-flashover) and includes varied fire sources (e.g., sofa, cupboard, and attic fires), realistic false alarm triggers (e.g., flame-colored objects, artificial lighting), and a wide range of spatial layouts and illumination conditions. This collection enables robust training and benchmarking for early indoor fire detection. Using this dataset, we developed a spatiotemporal fire detection model based on the mixed convolutions ResNets (MC3_18) architecture, augmented with Convolutional Block Attention Modules (CBAM). The proposed model achieved 86.11% accuracy, 88.76% precision, and 84.04% recall, along with low false positive (11.63%) and false negative (15.96%) rates. Compared to its CBAM-free baseline, the model exhibits notable improvements in F1-score and interpretability, as confirmed by Grad-CAM++ visualizations highlighting attention to semantically meaningful fire features. These results demonstrate that effective early fire detection is inseparable from high-quality, context-specific datasets. Our work introduces a scalable, safety-driven approach that advances the development of reliable, interpretable, and deployment-ready fire detection systems for residential environments.
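A compact CBAM sketch (channel attention followed by spatial attention, shown in 2D for brevity) of the kind the authors insert into the MC3_18 backbone; the reduction ratio and spatial kernel size are the commonly used defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention (2D variant; the paper applies it inside a 3D backbone)."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                                   # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))                  # channel attention from
        mx = self.mlp(x.amax(dim=(2, 3)))                   # avg- and max-pooled stats
        x = x * torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        s = torch.cat([x.mean(dim=1, keepdim=True),         # spatial attention from
                       x.amax(dim=1, keepdim=True)], dim=1)  # per-pixel avg/max
        return x * torch.sigmoid(self.spatial(s))

y = CBAM(64)(torch.randn(2, 64, 56, 56))
print(y.shape)   # torch.Size([2, 64, 56, 56])
```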

31 pages, 3781 KB  
Article
Enhancing Sustainable Mobility Through Gamified Challenges: Evidence from a School-Based Intervention
by Martina Vacondio, Federica Gini, Simone Bassanelli and Annapaola Marconi
Sustainability 2025, 17(14), 6586; https://doi.org/10.3390/su17146586 - 18 Jul 2025
Viewed by 724
Abstract
Promoting behavioral change in mobility is essential for sustainable urban development. This study evaluates the effectiveness of gamified challenges in fostering sustainable travel behaviors among high school students and teachers within the High School Challenge (HSC) 2024 campaign in Lecco, Italy. Over a 13-week period, participants tracked their commuting habits via the gamified mobile application Play&Go, which awarded points for sustainable mobility choices and introduced weekly challenges. Using behavioral (GPS-based tracking) and self-report data, we assessed the influence of challenge types, player characteristics (HEXAD Player Types, Big Five traits), and user experience evaluations on participation, retention, and behavior change. The results show that challenges, particularly those based on walking distances and framed as intra-team goals, significantly enhanced user engagement and contributed to improved mobility behaviors during participants’ free time. Compared to the 2023 edition without challenges, the 2024 campaign achieved better retention. HEXAD Player Types were more predictive of user appreciation than Personality Traits, though these effects were more evident in subjective evaluations than in actual behavior. Overall, the findings highlight the importance of tailoring gamified interventions to users’ motivational profiles and structuring challenges around SMART principles. This study contributes to the design of behaviorally informed, scalable solutions for sustainable mobility transitions.

21 pages, 2816 KB  
Article
AutoStageMix: Fully Automated Stage Cross-Editing System Utilizing Facial Features
by Minjun Oh, Howon Jang and Daeho Lee
Appl. Sci. 2025, 15(13), 7613; https://doi.org/10.3390/app15137613 - 7 Jul 2025
Viewed by 420
Abstract
StageMix is a video compilation of multiple stage performances of the same song, edited seamlessly together using appropriate editing points. However, generating a StageMix requires specialized editing techniques and is a considerably time-consuming process. To address this challenge, we introduce AutoStageMix, an automated StageMix generation system designed to perform all processes automatically. The system is structured into five principal stages: preprocessing, feature extraction, transition point identification, editing path determination, and StageMix generation. The initial stage of the process involves audio analysis to synchronize the sequences across all input videos, followed by frame extraction. After that, facial features are extracted from each video frame. Next, transition points are identified, which form the basis for face-based transitions, inter-stage cuts, and intra-stage cuts. Subsequently, a cost function is defined to facilitate the creation of cross-edited sequences. The optimal editing path is computed using Dijkstra’s algorithm to minimize the total cost of editing. Finally, the StageMix is generated by applying appropriate editing effects tailored to each transition type, aiming to maximize visual appeal. Experimental results suggest that our method generally achieves lower NME scores than existing StageMix generation approaches across multiple test songs. In a user study with 21 participants, AutoStageMix achieved viewer satisfaction comparable to that of professionally edited StageMixes, with no statistically significant difference between the two. AutoStageMix enables users to produce StageMixes effortlessly and efficiently by eliminating the need for manual editing.
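The editing-path step can be sketched as Dijkstra over (segment, video) states, where each step either stays on the same stage video or cuts to another one at some cost. The cost function below is a placeholder for the paper's face-feature-based cost, and the state layout is an assumption.

```python
import heapq

def cheapest_edit_path(num_videos, num_segments, cost):
    """Dijkstra over (segment, video) states: at each segment boundary the mix
    either stays on the same stage video or cuts to another one, paying
    cost(segment, from_video, to_video). Returns total cost and chosen videos."""
    heap = [(0.0, (0, v), None) for v in range(num_videos)]
    heapq.heapify(heap)
    dist, prev = {}, {}
    while heap:
        d, (seg, vid), parent = heapq.heappop(heap)
        if (seg, vid) in dist:
            continue                      # state already finalized
        dist[(seg, vid)], prev[(seg, vid)] = d, parent
        if seg + 1 == num_segments:
            continue
        for nxt in range(num_videos):
            state = (seg + 1, nxt)
            if state not in dist:
                heapq.heappush(heap, (d + cost(seg, vid, nxt), state, (seg, vid)))
    # Pick the cheapest final state and walk the predecessors back.
    end = min((s for s in dist if s[0] == num_segments - 1), key=dist.get)
    path, node = [], end
    while node is not None:
        path.append(node[1])
        node = prev[node]
    return dist[end], path[::-1]

# Toy cost: staying on a stage is free, cutting between stages costs 1.
total, path = cheapest_edit_path(3, 6, lambda seg, a, b: 0.0 if a == b else 1.0)
print(total, path)
```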

17 pages, 7292 KB  
Article
QP-Adaptive Dual-Path Residual Integrated Frequency Transformer for Data-Driven In-Loop Filter in VVC
by Cheng-Hsuan Yeh, Chi-Ting Ni, Kuan-Yu Huang, Zheng-Wei Wu, Cheng-Pin Peng and Pei-Yin Chen
Sensors 2025, 25(13), 4234; https://doi.org/10.3390/s25134234 - 7 Jul 2025
Viewed by 540
Abstract
As AI-enabled embedded systems such as smart TVs and edge devices demand efficient video processing, Versatile Video Coding (VVC/H.266) becomes essential for bandwidth-constrained Multimedia Internet of Things (M-IoT) applications. However, its block-based coding often introduces compression artifacts. While CNN-based methods effectively reduce these artifacts, maintaining robust performance across varying quantization parameters (QPs) remains challenging. Recent QP-adaptive designs like QA-Filter show promise but are still limited. This paper proposes DRIFT, a QP-adaptive in-loop filtering network for VVC. DRIFT combines a lightweight frequency fusion CNN (LFFCNN) for local enhancement and a Swin Transformer-based global skip connection for capturing long-range dependencies. LFFCNN leverages octave convolution and introduces a novel residual block (FFRB) that integrates multiscale extraction, QP adaptivity, frequency fusion, and spatial-channel attention. A QP estimator (QPE) is further introduced to mitigate double enhancement in inter-coded frames. Experimental results demonstrate that DRIFT achieves BD rate reductions of 6.56% (intra) and 4.83% (inter), with a gain of up to 10.90% on the BasketballDrill sequence. Additionally, LFFCNN reduces the model size by 32% while slightly improving the coding performance over QA-Filter.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
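A common way to make a single in-loop filter QP-adaptive is to feed the normalized QP to the network as an extra input plane; the sketch below shows that idea in a plain residual block. It is not the DRIFT/LFFCNN design (octave convolutions, frequency fusion, Swin-based skip), and the normalization constant is an assumption.

```python
import torch
import torch.nn as nn

class QPConditionedResBlock(nn.Module):
    """Residual block whose convolutions see the normalized QP as an extra
    input plane, a simple way to make one in-loop filter QP-adaptive."""
    def __init__(self, channels=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels + 1, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels + 1, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, qp):                   # x: (N, C, H, W), qp: (N,)
        qp_plane = (qp / 63.0).view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        y = self.act(self.conv1(torch.cat([x, qp_plane], dim=1)))
        y = self.conv2(torch.cat([y, qp_plane], dim=1))
        return x + y                            # residual connection

feat = torch.randn(4, 32, 64, 64)
out = QPConditionedResBlock()(feat, torch.tensor([22.0, 27.0, 32.0, 37.0]))
print(out.shape)   # torch.Size([4, 32, 64, 64])
```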

24 pages, 2149 KB  
Article
STA-3D: Combining Spatiotemporal Attention and 3D Convolutional Networks for Robust Deepfake Detection
by Jingbo Wang, Jun Lei, Shuohao Li and Jun Zhang
Symmetry 2025, 17(7), 1037; https://doi.org/10.3390/sym17071037 - 1 Jul 2025
Viewed by 938
Abstract
Recent advancements in deep learning have driven the rapid proliferation of deepfake generation techniques, raising substantial concerns over digital security and trustworthiness. Most current detection methods primarily focus on spatial or frequency domain features but show limited effectiveness when dealing with compressed videos and cross-dataset scenarios. Observing that mainstream generation methods use frame-by-frame synthesis without adequate temporal consistency constraints, we introduce the Spatiotemporal Attention 3D Network (STA-3D), a novel framework that combines a lightweight spatiotemporal attention module with a 3D convolutional architecture to improve detection robustness. The proposed attention module adopts a symmetric multi-branch architecture, where each branch follows a nearly identical processing pipeline to separately model temporal-channel, temporal-spatial, and intra-spatial correlations. Our framework additionally implements Spatial Pyramid Pooling (SPP) layers along the temporal axis, enabling adaptive modeling regardless of input video length. Furthermore, we mitigate the inherent asymmetry in the quantity of authentic and forged samples by replacing standard cross entropy with focal loss for training. This integration facilitates the simultaneous exploitation of inter-frame temporal discontinuities and intra-frame spatial artifacts, achieving competitive performance across various benchmark datasets under different compression conditions: for the intra-dataset setting on FF++, it improves the average accuracy by 1.09 percentage points compared to existing SOTA, with a more significant gain of 1.63 percentage points under the most challenging C40 compression level (particularly for NeuralTextures, achieving an improvement of 4.05 percentage points); while for the intra-dataset setting, AUC is enhanced by 0.24 percentage points on the DFDC-P dataset.
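Focal loss, which the authors use in place of standard cross-entropy to counter the real/fake class imbalance, has a standard binary form; the alpha and gamma values below are the usual defaults, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss for binary real/fake classification: down-weights easy
    examples by (1 - p_t)^gamma and re-balances classes with alpha."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)              # prob. of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(16)
labels = (torch.rand(16) > 0.8).float()                      # imbalanced labels
print(binary_focal_loss(logits, labels).item())
```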

19 pages, 1230 KB  
Article
A Graph Convolutional Network Framework for Area Attention and Tracking Compensation of In-Orbit Satellite
by Shuai Wang, Ruoke Wu, Yizhi Jiang, Xiaoqiang Di, Yining Mu, Guanyu Wen, Makram Ibrahim and Jinqing Li
Appl. Sci. 2025, 15(12), 6742; https://doi.org/10.3390/app15126742 - 16 Jun 2025
Viewed by 409
Abstract
In order to solve the problems of low tracking accuracy of in-orbit satellites by ground stations and slow processing speed of satellite target tracking images, this paper proposes an orbital satellite regional tracking and prediction model based on graph convolutional networks (GCNs). By performing superpixel segmentation on the satellite tracking image information, we constructed an intra-frame superpixel seed graph node network, enabling the conversion of spatial optical image information into artificial-intelligence-based graph feature data. On this basis, we propose and build an in-orbit satellite region of interest prediction model, which effectively enhances the perception of in-orbit satellite feature information and can be used for in-orbit target prediction. This model, for the first time, combines intra-frame and inter-frame graph structures to improve the sensitivity of GCNs to the spatial feature information of in-orbit satellites. Finally, the model is trained and validated using real satellite target tracking image datasets, demonstrating the effectiveness of the proposed model.
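A minimal Kipf-and-Welling-style graph convolution over superpixel node features illustrates the GCN side of the pipeline; the superpixel segmentation and the intra-/inter-frame graph construction are the paper's contribution and are not reproduced here, so the adjacency below is random.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = D^-1/2 (A + I) D^-1/2 H W."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, feats, adj):               # feats: (nodes, F), adj: (nodes, nodes)
        a = adj + torch.eye(adj.size(0))          # add self-loops
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.lin(a_norm @ feats))

# Toy graph: 50 superpixel nodes with 8-dim features and a random symmetric adjacency.
feats = torch.randn(50, 8)
adj = (torch.rand(50, 50) > 0.9).float()
adj = ((adj + adj.t()) > 0).float()
print(GCNLayer(8, 16)(feats, adj).shape)          # torch.Size([50, 16])
```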

32 pages, 4311 KB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Viewed by 665
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user’s viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By ingeniously combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the introduction of residual connections reinforces the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement.
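The phrase "gating operations with residual connections" can be illustrated by a block whose convolutional branch is modulated by a learned sigmoid gate before being added back to the input. Channel counts and layer layout are placeholders, not DRGNet's configuration.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Conv block whose output is modulated by a learned sigmoid gate before
    being added back to the input (gating combined with a residual connection)."""
    def __init__(self, channels=32):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return x + self.feature(x) * self.gate(x)   # residual path keeps original detail

x = torch.randn(1, 32, 128, 128)                    # e.g. features of a decoded frame
print(GatedResidualBlock()(x).shape)                # torch.Size([1, 32, 128, 128])
```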

8 pages, 502 KB  
Proceeding Paper
Adaptive Frequency and Assignment Algorithm for Context-Based Arithmetic Compression Codes for H.264 Video Intraframe Encoding
by Huang-Chun Hsu and Jian-Jiun Ding
Eng. Proc. 2025, 98(1), 4; https://doi.org/10.3390/engproc2025098004 - 4 Jun 2025
Viewed by 379
Abstract
In modern communication technology, short videos are increasingly used on social media platforms. The advancement of video codecs is pivotal in communication. In this study, we developed a new scheme to encode the residue of intraframes. For the H.264 baseline profile, we used context-based arithmetic variable-length coding (CAVLC) to encode the residue of integer transforms in a block-wise manner. In the developed method, the DC and AC coefficients are separated. In addition, context assignment, adaptive scanning, range increment, and mutual learning are adopted in a mixture of fixed-length and variable-length schemes, and block-wise compressions of the frequency table are applied to obtain improved compression rates. Compressing the frequency table prevents CAVLC from being hindered by horizontally/vertically dominated blocks. The developed method outperforms CAVLC, with average reductions of 7.81%, 8.58%, and 7.88% for quarter common intermediate format (QCIF), common intermediate format (CIF), and full high-definition (FHD) inputs.
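The adaptive-frequency idea can be sketched as a per-context count table whose entries grow as symbols are coded, so the probability estimates driving the entropy coder track local statistics. This is a generic adaptive model; the paper's context assignment, adaptive scanning, and mutual-learning steps are not reproduced.

```python
class AdaptiveFrequencyModel:
    """Per-context symbol counts that adapt as symbols are coded; the counts
    would drive an arithmetic or variable-length coder's probability estimates."""

    def __init__(self, num_symbols, num_contexts):
        # Start every count at 1 so no symbol ever has zero probability.
        self.freq = [[1] * num_symbols for _ in range(num_contexts)]

    def probability(self, context, symbol):
        row = self.freq[context]
        return row[symbol] / sum(row)

    def update(self, context, symbol, increment=1):
        self.freq[context][symbol] += increment     # count/range increment step

model = AdaptiveFrequencyModel(num_symbols=4, num_contexts=2)
for sym in [0, 0, 1, 0, 3, 0]:                      # residue symbols in context 0
    print(f"P(sym={sym} | ctx=0) = {model.probability(0, sym):.3f}")
    model.update(0, sym)
```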

31 pages, 5995 KB  
Article
Study on Seismic Performance of Frame–Shear Wall Split-Foundation Structures with Shear Walls on Both Grounding Ends
by Wusu Wang, Baolong Jiang, Yingmin Li, Yangyang Tang and Shuyan Ji
Buildings 2025, 15(11), 1852; https://doi.org/10.3390/buildings15111852 - 28 May 2025
Viewed by 767
Abstract
This study focuses on the fundamental mechanical behavior of frame–shear wall split-foundation structures with shear walls at both upper and lower ground ends, investigating their basic mechanical characteristics, internal force redistribution patterns, and the influence of the intra-stiffness ratio on seismic performance. From the analysis results, it can be found that the relative drop height of frame–shear wall split-foundation structures significantly affects their internal force patterns. Shear-bending stiffness should be adopted in stiffness calculations to reflect the stiffness reduction effect of drop height on lower embedding shear walls. In frame–shear wall split-foundation structures, the existence of drop height causes upper embedding columns to experience more unfavorable stress conditions compared to lower embedding shear walls, potentially preventing lower embedding shear walls from serving as the primary seismic defense line. Strengthening lower embedding shear walls to reduce the intra-stiffness ratio can mitigate this issue. Performance evaluation under bidirectional rare earthquakes shows greater along-slope directional damage than cross-slope directional damage. Increasing shear wall length to reduce the intra-stiffness ratio improves component rotation-based performance, but shear strain-based evaluation of upper embedding shear walls indicates a limited improvement in shear capacity. Special attention should therefore be paid to along-slope directional shear capacity of upper embedding shear walls during structural design.

13 pages, 2778 KB  
Article
Speckle-Tracking Echocardiography in Dogs: Evaluating Imaging Parameters and Methodological Variability in Global Longitudinal Strain Assessment
by Jonas E. Mogensen, Maiken B. T. Bach, Pernille G. Bay, Tuğba Varlik, Jakob L. Willesen, Caroline H. Gleerup and Jørgen Koch
Animals 2025, 15(11), 1523; https://doi.org/10.3390/ani15111523 - 23 May 2025
Viewed by 1103
Abstract
Two-dimensional speckle-tracking echocardiography (2D-STE) is an advanced imaging technique that offers quantitative insights into myocardial function by analyzing the motion of speckles created during ultrasound–tissue interactions. This study aims to evaluate the reliability of 2D-STE by examining the impact of key technical parameters on global longitudinal strain (GLS) measurement accuracy and comparing two speckle-tracking analysis methods provided by GE Healthcare: quantitative analysis of the 2D strain (2D strain) and automated function imaging (AFI). The prospective study consisted of two cohorts. In the first cohort, including 16 healthy dogs, the influence of frame rate, heart rate variation, zoom, transducer frequency, and image foreshortening on speckle-tracking values was assessed. In the second cohort, which included 10 healthy dogs, 2D-STE parameters were obtained with the 2D strain and AFI to assess agreement between the methods and observer variability. Our findings indicate that foreshortening (p < 0.01, Cohen’s d: 0.52, CI: −17.81 to −24.83) and heart rate variability (p = 0.02, Cohen’s d: 0.72, CI: −18.07 to −26.23) significantly affect speckle-tracking measurements, whereas zoom, frame rate, and frequency did not show a significant impact. Additionally, while the 2D strain and AFI exhibited a strong correlation, a significant systematic bias was identified, with AFI underestimating strain values compared to the 2D strain. Intra- and inter-observer coefficients of variation (CV) were below 9% for both methods, supporting their reliability. These findings emphasize the need to optimize image acquisition and selection criteria, which enhances the accuracy and reliability of the speckle-tracking analysis.
(This article belongs to the Special Issue Advances in Diagnostic Imaging in Small Animal Cardiology)
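The effect sizes and observer-variability figures reported above follow standard formulas (pooled-SD Cohen's d, and the coefficient of variation as SD relative to the mean). The sketch below applies them to simulated strain values, not the study's data.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_sd = np.sqrt(((len(a) - 1) * np.var(a, ddof=1) +
                         (len(b) - 1) * np.var(b, ddof=1)) / (len(a) + len(b) - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd

def coefficient_of_variation(x):
    """CV in percent: SD relative to the mean of repeated measurements."""
    return 100.0 * np.std(x, ddof=1) / abs(np.mean(x))

rng = np.random.default_rng(0)
gls_full = rng.normal(-21.0, 2.0, 16)        # simulated GLS (%) from well-aligned views
gls_foreshortened = gls_full + 2.5           # foreshortening shifts strain values
print(f"Cohen's d: {cohens_d(gls_foreshortened, gls_full):.2f}")
print(f"Intra-observer CV: {coefficient_of_variation(gls_full):.1f}%")
```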
