Search Results (185)

Search Parameters:
Keywords = information aggregation block

21 pages, 4010 KiB  
Article
PCES-YOLO: High-Precision PCB Detection via Pre-Convolution Receptive Field Enhancement and Geometry-Perception Feature Fusion
by Heqi Yang, Junming Dong, Cancan Wang, Zhida Lian and Hui Chang
Appl. Sci. 2025, 15(13), 7588; https://doi.org/10.3390/app15137588 - 7 Jul 2025
Abstract
Printed circuit board (PCB) defect detection faces challenges such as small-target feature loss and severe background interference. To address these issues, this paper proposes PCES-YOLO, an enhanced YOLOv11-based model. First, a newly developed Pre-convolution Receptive Field Enhancement (PRFE) module replaces C3k in the C3k2 module, and a ConvNeXtBlock with an inverted bottleneck is introduced in the P4 layer, greatly improving small-target feature capture and semantic understanding. The second key innovation is the Efficient Feature Fusion and Aggregation Network (EFAN), which integrates a lightweight Spatial-Channel Decoupled Downsampling (SCDown) module and three innovative fusion pathways. This achieves a substantial parameter reduction while effectively integrating shallow detail features with deep semantic features, preserving critical defect information across feature levels. Finally, the Shape-IoU loss function is incorporated, focusing on bounding-box shape and scale for more accurate regression and enhanced defect localization precision. Experiments on the enhanced Peking University PCB defect dataset show that PCES-YOLO achieves a mAP50 of 97.3% and a mAP50–95 of 77.2%. Compared to YOLOv11n, it improves mAP50 by 3.6% and mAP50–95 by 15.2%. Compared to YOLOv11s, it increases mAP50 by 1.0% and mAP50–95 by 5.6% while also significantly reducing the model parameters. The performance of PCES-YOLO is also evaluated against mainstream object detection algorithms, including Faster R-CNN, SSD, and YOLOv8n. These results indicate that PCES-YOLO outperforms these algorithms in detection accuracy and efficiency, making it a promising high-precision and efficient solution for PCB defect detection in industrial settings.
(This article belongs to the Section Computing and Artificial Intelligence)
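
The SCDown module named in this abstract decouples downsampling into a channel step and a spatial step. Below is a minimal PyTorch sketch of the common formulation of that idea (a 1×1 pointwise convolution for channel mixing, then a stride-2 depthwise convolution for spatial reduction); the layer choices are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SCDown(nn.Module):
    """Spatial-channel decoupled downsampling (illustrative sketch)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        # Channel transform first: cheap 1x1 pointwise convolution.
        self.pw = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        # Spatial downsampling second: stride-2 depthwise convolution,
        # so channel mixing and spatial reduction stay decoupled.
        self.dw = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2,
                            padding=1, groups=c_out, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.dw(self.pw(x)))

x = torch.randn(1, 64, 80, 80)
print(SCDown(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```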

19 pages, 6772 KiB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN.
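
The Center Masking augmentation mentioned above can be sketched very compactly: occlude a central patch of the UAV image so the network must rely on surrounding context. The mask ratio and zero fill value below are assumptions, not the authors' exact settings.

```python
import torch

def center_mask(img: torch.Tensor, ratio: float = 0.3) -> torch.Tensor:
    """img: (C, H, W); zero out a centered (ratio*H) x (ratio*W) patch."""
    _, h, w = img.shape
    mh, mw = int(h * ratio), int(w * ratio)
    top, left = (h - mh) // 2, (w - mw) // 2
    out = img.clone()
    out[:, top:top + mh, left:left + mw] = 0.0  # occlude the center
    return out

uav = torch.rand(3, 256, 256)
masked = center_mask(uav)  # central 76x76 region set to zero
```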

17 pages, 9400 KiB  
Article
MRCA-UNet: A Multiscale Recombined Channel Attention U-Net Model for Medical Image Segmentation
by Lei Liu, Xiang Li, Shuai Wang, Jun Wang and Silas N. Melo
Symmetry 2025, 17(6), 892; https://doi.org/10.3390/sym17060892 - 6 Jun 2025
Abstract
Deep learning techniques play a crucial role in medical image segmentation for diagnostic purposes, with traditional convolutional neural networks (CNNs) and emerging transformers having achieved satisfactory results. CNN-based methods focus on extracting the local features of an image, which are beneficial for handling image details and textural features. However, the receptive fields of CNNs are relatively small, resulting in poor performance when processing images with long-range dependencies. Conversely, transformer-based methods are effective in handling global information; however, they suffer from significant computational complexity arising from the building of long-range dependencies. Additionally, they lack the ability to perceive image details and to exploit channel features. These problems can result in unclear image segmentation and blurred boundaries. Accordingly, in this study, a multiscale recombined channel attention (MRCA) module is proposed, which can simultaneously extract both global and local features and can explore channel features during feature fusion. Specifically, the proposed MRCA first employs multibranch extraction of image features and performs operations such as blocking, shifting, and aggregating the image at different scales. This step enables the model to recognize multiscale information locally and globally. Feature selection is then performed to enhance the predictive capability of the model. Finally, features from different branches are connected and recombined across channels to complete the feature fusion. Benefiting from fully exploring the channel features, an MRCA-based U-Net (MRCA-UNet) framework is proposed for medical image segmentation. Experiments conducted on the Synapse multi-organ segmentation (Synapse) dataset and the International Skin Imaging Collaboration (ISIC-2018) dataset demonstrate the competitive segmentation performance of the proposed MRCA-UNet, achieving an average Dice Similarity Coefficient (DSC) of 81.61% and a Hausdorff Distance (HD) of 23.36 on Synapse and an Accuracy of 95.94% on ISIC-2018.
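
The final fusion step described above (connect branch features, then recombine across channels) can be illustrated with a short sketch: concatenated branches are mixed by a 1×1 convolution and gated by a squeeze-and-excitation-style channel attention. The branch contents and the exact recombination are simplified assumptions, not the MRCA module itself.

```python
import torch
import torch.nn as nn

class ChannelRecombine(nn.Module):
    def __init__(self, channels: int, branches: int):
        super().__init__()
        # 1x1 conv mixes (recombines) the concatenated branch channels.
        self.mix = nn.Conv2d(channels * branches, channels, kernel_size=1)
        # Channel attention: global pool -> bottleneck MLP -> sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, branch_feats):
        fused = self.mix(torch.cat(branch_feats, dim=1))
        return fused * self.gate(fused)

feats = [torch.randn(1, 32, 56, 56) for _ in range(3)]
print(ChannelRecombine(32, 3)(feats).shape)  # torch.Size([1, 32, 56, 56])
```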

20 pages, 13045 KiB  
Article
Detection of Crack Sealant in the Pretreatment Process of Hot In-Place Recycling of Asphalt Pavement via Deep Learning Method
by Kai Zhao, Tianzhen Liu, Xu Xia and Yongli Zhao
Sensors 2025, 25(11), 3373; https://doi.org/10.3390/s25113373 - 27 May 2025
Abstract
Crack sealant is commonly used to fill pavement cracks and improve the Pavement Condition Index (PCI). However, during asphalt pavement hot in-place recycling (HIR), the irregular shapes and random distribution of crack sealants can cause issues such as agglomeration and ignition. To address these problems, it is necessary to mill large areas containing crack sealant or pre-mark locations for removal after heating. Currently, detecting and recording crack sealant locations, types, and distributions is conducted manually, which significantly reduces efficiency. While deep learning-based object detection has been widely applied to distress detection, crack sealants present unique challenges. They often appear as wide black patches that overlap with cracks and potholes, and complex background noise further complicates detection. Additionally, no dataset specifically for crack sealant detection currently exists. To overcome these challenges, this paper presents a specialized dataset created from 1,983 pavement images. A deep learning detection algorithm named YOLO-CS (You Only Look Once Crack Sealant) is proposed. This algorithm integrates the RepViT (Representation Learning with Visual Tokens) network to reduce computational complexity while capturing the global context of images. Furthermore, the DRBNCSPELAN (Dilated Reparam Block with Cross-Stage Partial and Efficient Layer Aggregation Networks) module is introduced to ensure efficient information flow, and a lightweight shared convolution (LSC) detection head is developed. The results demonstrate that YOLO-CS outperforms other algorithms, achieving a precision of 88.4%, a recall of 84.2%, and an mAP (mean average precision) of 92.1%. Moreover, YOLO-CS significantly reduces parameters and memory consumption. Integrating Artificial Intelligence-based algorithms into HIR significantly enhances construction efficiency.
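
A lightweight shared-convolution detection head of the kind mentioned above saves parameters by reusing one set of convolution weights across all pyramid levels. The sketch below shows that weight-sharing pattern; the channel counts and single shared layer are illustrative assumptions, not the LSC head's exact design.

```python
import torch
import torch.nn as nn

class SharedConvHead(nn.Module):
    def __init__(self, channels: int, num_outputs: int):
        super().__init__()
        # These weights are shared by every feature-map scale.
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1), nn.SiLU(inplace=True))
        self.predict = nn.Conv2d(channels, num_outputs, 1)

    def forward(self, pyramid):
        # Apply the same head to each scale (e.g., P3, P4, P5).
        return [self.predict(self.shared(p)) for p in pyramid]

p3, p4 = torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40)
outs = SharedConvHead(64, num_outputs=6)([p3, p4])
print([o.shape for o in outs])
```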

19 pages, 1649 KiB  
Article
SFSIN: A Lightweight Model for Remote Sensing Image Super-Resolution with Strip-like Feature Superpixel Interaction Network
by Yanxia Lyu, Yuhang Liu, Qianqian Zhao, Ziwen Hao and Xin Song
Mathematics 2025, 13(11), 1720; https://doi.org/10.3390/math13111720 - 23 May 2025
Abstract
Remote sensing image (RSI) super-resolution plays a critical role in improving image details and reducing costs associated with physical imaging devices. However, existing super-resolution methods are not applicable to resource-constrained edge devices because they are hampered by a large number of parameters and significant computational complexity. To address these challenges, we propose a novel lightweight super-resolution model for remote sensing images, a strip-like feature superpixel interaction network (SFSIN), which combines the flexibility of convolutional neural networks (CNNs) with the long-range learning capabilities of a Transformer. Specifically, the Transformer captures global context information through long-range dependencies, while the CNN performs shape-adaptive convolutions. By stacking strip-like feature superpixel interaction (SFSI) modules, we aggregate strip-like features to enable deep feature extraction from local and global perspectives. Unlike traditional methods that rely solely on direct upsampling for reconstruction, our model uses the convolutional block attention module with upsampling convolution (CBAMUpConv), which integrates deep features from spatial and channel dimensions to improve reconstruction performance. Extensive experiments on the AID dataset show that SFSIN outperforms ten state-of-the-art lightweight models. SFSIN achieves a PSNR of 33.10 dB and an SSIM of 0.8715 on the ×2 scale, outperforming competing models both quantitatively and qualitatively, while also excelling at higher scales.
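
A block pairing CBAM-style attention with an upsampling convolution, in the spirit of the CBAMUpConv module above, can be sketched as follows: channel and spatial attention refine the deep features, then a convolution plus PixelShuffle performs the ×2 upsample. The reduction ratio and the PixelShuffle choice are assumptions.

```python
import torch
import torch.nn as nn

class CBAMUp(nn.Module):
    def __init__(self, c: int, scale: int = 2):
        super().__init__()
        self.ca = nn.Sequential(  # channel attention
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // 8, 1),
            nn.ReLU(inplace=True), nn.Conv2d(c // 8, c, 1), nn.Sigmoid())
        self.sa = nn.Sequential(  # spatial attention over pooled maps
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.up = nn.Sequential(  # conv expands channels, shuffle upsamples
            nn.Conv2d(c, c * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, x):
        x = x * self.ca(x)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        x = x * self.sa(pooled)
        return self.up(x)

print(CBAMUp(64)(torch.randn(1, 64, 48, 48)).shape)  # (1, 64, 96, 96)
```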

28 pages, 17488 KiB  
Article
Attentive Multi-Scale Features with Adaptive Context PoseResNet for Resource-Efficient Human Pose Estimation
by Ali Zakir, Sartaj Ahmed Salman, Gibran Benitez-Garcia and Hiroki Takahashi
Electronics 2025, 14(11), 2107; https://doi.org/10.3390/electronics14112107 - 22 May 2025
Abstract
Human Pose Estimation (HPE) remains challenging due to scale variation, occlusion, and high computational costs. Standard methods often struggle to capture detailed spatial information when keypoints are obscured, and they typically rely on computationally expensive deconvolution layers for upsampling, making them inefficient for real-time or resource-constrained scenarios. We propose AMFACPose (Attentive Multi-scale Features with Adaptive Context PoseResNet) to address these limitations. Specifically, our architecture incorporates Coordinate Convolution 2D (CoordConv2d) to retain explicit spatial context, alleviating the loss of coordinate information in conventional convolutions. To reduce computational overhead while maintaining accuracy, we utilize Depthwise Separable Convolutions (DSCs), separating spatial and pointwise operations. At the core of our approach is an Adaptive Feature Pyramid Network (AFPN), which replaces costly deconvolution-based upsampling by efficiently aggregating multi-scale features to handle diverse human poses and body sizes. We further introduce Dual-Gate Context Blocks (DGCBs) that refine global context to manage partial occlusions and cluttered backgrounds. The model integrates Squeeze-and-Excitation (SE) blocks and the Spatial–Channel Refinement Module (SCRM) to emphasize the most informative feature channels and spatial regions, which is particularly beneficial for occluded or overlapping keypoints. For precise keypoint localization, we replace dense heatmap predictions with coordinate classification using Multi-Layer Perceptron (MLP) heads. Experiments on the COCO and CrowdPose datasets demonstrate that AMFACPose surpasses the existing 2D HPE methods in both accuracy and computational efficiency. Moreover, our implementation on edge devices achieves real-time performance while preserving high accuracy, confirming the suitability of AMFACPose for resource-constrained pose estimation in both benchmark and real-world environments.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
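
The CoordConv2d layer referenced above makes position explicit by concatenating two normalized coordinate channels to the input before a standard convolution. The sketch below follows the widely used CoordConv formulation; it is not necessarily the authors' exact variant.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, c_in: int, c_out: int, **kw):
        super().__init__()
        # Two extra input channels carry the (y, x) coordinate maps.
        self.conv = nn.Conv2d(c_in + 2, c_out, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([yy, xx]).expand(b, -1, -1, -1)
        return self.conv(torch.cat([x, coords], dim=1))

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
print(layer(torch.randn(2, 3, 64, 64)).shape)  # (2, 16, 64, 64)
```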

26 pages, 13657 KiB  
Article
Multilevel Feature Cross-Fusion-Based High-Resolution Remote Sensing Wetland Landscape Classification and Landscape Pattern Evolution Analysis
by Sijia Sun, Biao Wang, Zhenghao Jiang, Ziyan Li, Sheng Xu, Chengrong Pan, Jun Qin, Yanlan Wu and Peng Zhang
Remote Sens. 2025, 17(10), 1740; https://doi.org/10.3390/rs17101740 - 16 May 2025
Abstract
Analyzing wetland landscape pattern evolution is crucial for managing wetland resources. High-resolution remote sensing serves as a primary method for monitoring wetland landscape patterns. However, the complex landscape types and spatial structures of wetlands pose challenges, including interclass similarity and intraclass spatial heterogeneity, leading to the low separability of landscapes and difficulties in identifying fragmented and small objects. To address these issues, this study proposes the multilevel feature cross-fusion wetland landscape classification network (MFCFNet), which combines the global modeling capability of Swin Transformer with the local detail-capturing ability of convolutional neural networks (CNNs), facilitating the discrimination of intraclass consistency and interclass differences. To alleviate the semantic confusion caused by fusing different-level features with semantic gaps, we introduce a deep–shallow feature cross-fusion (DSFCF) module between the encoder and the decoder. We also incorporate a global–local attention block (GLAB) to aggregate global contextual information and local detail. The constructed Shengjin Lake Wetland Gaofen Image Dataset (SLWGID) is utilized to evaluate the performance of MFCFNet, achieving an OA, mIoU, and F1 score of 93.23%, 78.12%, and 87.05%, respectively. MFCFNet is used to classify the wetland landscape of Shengjin Lake from 2013 to 2023. A landscape pattern evolution analysis is conducted, focusing on landscape transitions, area changes, and pattern characteristic variations. The method demonstrates effectiveness for the dynamic monitoring of wetland landscape patterns, providing valuable insights for wetland conservation.
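
Combining global context with local detail, as the GLAB block above does, can be illustrated with a minimal sketch: a global branch built from pooled channel statistics gates the features, while a depthwise convolution supplies local detail, and the two are summed. The fusion-by-sum and branch designs are assumptions, not the paper's block.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.global_gate = nn.Sequential(   # global contextual attention
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.local = nn.Sequential(         # local detail branch
            nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1))

    def forward(self, x):
        return x * self.global_gate(x) + self.local(x)

print(GlobalLocalBlock(48)(torch.randn(1, 48, 64, 64)).shape)
```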

21 pages, 29272 KiB  
Article
Multi-Strategy Enhancement of YOLOv8n Monitoring Method for Personnel and Vehicles in Mine Air Door Scenarios
by Lei Zhang, Hongjing Tao, Zhipeng Sun and Weixun Yi
Sensors 2025, 25(10), 3128; https://doi.org/10.3390/s25103128 - 15 May 2025
Abstract
The mine air door is the primary facility for regulating airflow and controlling the passage of personnel and vehicles. Intelligent monitoring of personnel and vehicles within the mine air door system is a crucial measure to ensure the safety of mine operations. To address the slow speed and low efficiency of traditional detection methods in mine air door scenarios, this study proposes a CGSW-YOLO man-vehicle monitoring model based on YOLOv8n. Firstly, the Faster Block module, which incorporates partial convolution (PConv), is integrated with the C2f module of the backbone network. This combination minimizes redundant calculations during the convolution process and expedites the model’s aggregation of multi-scale information. Secondly, standard convolution is replaced with GhostConv in the backbone network to further reduce the number of model parameters. Additionally, the Slim-neck module is integrated into the neck feature fusion network to enhance the information fusion capability of various feature maps while maintaining detection accuracy. Finally, WIoUv3 is utilized as the loss function, and a dynamic non-monotonic focusing mechanism is implemented to adjust the quality of the anchor frame dynamically. The experimental results indicate that the CGSW-YOLO model performs strongly in monitoring personnel and vehicles in mine air door scenarios. Its Precision (P), Recall (R), and mAP@0.5 are 88.2%, 93.9%, and 98.0%, respectively, representing improvements of 0.2%, 1.5%, and 1.7% over the original model. The frame rate has increased to 135.14 f·s⁻¹, a rise of 35.14%. Additionally, the parameters, floating-point operations (FLOPs), and model size are 2.36 M, 6.2 G, and 5.0 MB, respectively, reductions of 21.6%, 23.5%, and 20.6% compared to the original model. Verification on on-site surveillance video demonstrates that the CGSW-YOLO model is effective at monitoring both personnel and vehicles in mine air door scenarios.
(This article belongs to the Special Issue Recent Advances in Optical Sensor for Mining)
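
The PConv operation inside the Faster Block above removes redundant computation by convolving only a fraction of the channels and passing the rest through untouched. The sketch below shows that split; the 1/4 ratio is the conventional FasterNet choice and an assumption here.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.c_conv = int(channels * ratio)  # channels that get convolved
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, padding=1)

    def forward(self, x):
        a, b = x[:, :self.c_conv], x[:, self.c_conv:]
        return torch.cat([self.conv(a), b], dim=1)  # b passes through as-is

print(PConv(64)(torch.randn(1, 64, 40, 40)).shape)  # (1, 64, 40, 40)
```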

33 pages, 2794 KiB  
Article
Soil Bulk Density, Aggregates, Carbon Stabilization, Nutrients and Vegetation Traits as Affected by Manure Gradients Regimes Under Alpine Meadows of Qinghai–Tibetan Plateau Ecosystem
by Mahran Sadiq, Nasir Rahim, Majid Mahmood Tahir, Aqila Shaheen, Fu Ran, Guoxiang Chen and Xiaoming Bai
Plants 2025, 14(10), 1442; https://doi.org/10.3390/plants14101442 - 12 May 2025
Abstract
Climate change and overgrazing significantly constrain the sustainability of meadow land and vegetation in the livestock industry of the Tibetan Plateau ecosystem. In the context of climate change mitigation, grassland soil C sequestration and forage sustainability, it is important to understand how manure regimes influence SOC stability, grassland soil, forage structure and nutritional quality. However, the responses of SOC fractions, soil and forage structure and quality to manure gradient practices remain unclear, particularly in the Tianzhu belt, and require further investigation. A field study was undertaken to evaluate soil bulk density, aggregate fractions and dynamics in SOC concentration, permanganate-oxidizable SOC fractions, SOC stabilization and soil nutrients at the soil aggregate level under manure gradient practices. Moreover, the forage biodiversity, aboveground biomass and nutritional quality of alpine meadow plant communities were also explored. Four treatments, i.e., control (CK), sole sheep manure (SM), cow dung alone (CD) and a mixture of sheep manure and cow dung (SMCD), under five input rates, i.e., 0.54, 1.08, 1.62, 2.16 and 2.70 kg m⁻², were employed in a randomized complete block design with four replications. Our analysis confirmed the maximum soil bulk density (BD) (0.80 ± 0.05 g cm⁻³) and micro-aggregate fraction (45.27 ± 0.77%) under CK, whilst the maximum macro-aggregate fraction (40.12 ± 0.54%) was documented under 2.70 kg m⁻² of SMCD. The SOC, very-labile C fraction (Cfrac1), labile C fraction (Cfrac2) and non-labile/recalcitrant C fraction (Cfrac4) increased with manure input levels, being highest under the 2.16 kg m⁻² and 2.70 kg m⁻² applications of sole SM and of the 50% SM + 50% CD integration (SMCD), whereas the less-labile fraction (Cfrac3) was highest under CK across aggregate fractions. However, manures under varying gradients improved SOC pools and stabilization for both macro- and micro-aggregates. A negative response of the carbon management index (CMI) in macro-aggregates was observed, whilst CMI in the micro-aggregate fraction showed a positive response to manure addition with input rates, being maximal under sole SM addition averaged across gradients. The higher SOC pools and CMI under SM, CD and SMCD might be owing to the higher soil organic matter inputs under higher doses of manure. Moreover, the highest accumulation of soil nutrients, for instance, TN, AN, TP, AP, TK, AK and DTPA-extractable Zn, Cu, Fe and Mn, was recorded in SM, CD and SMCD under varying gradients over CK at both aggregate fractions. More nutrient accumulation was found in macro-aggregates than in micro-aggregates, which might be attributed to the physical protection offered by macro-aggregates. Overall, manure addition under varying input rates improved the plant community structure and enhanced meadow yield, plant community diversity and nutritional quality relative to CK. Therefore, alpine meadows should be managed sustainably via the adoption of the sole SM practice at a 2.16 kg m⁻² input rate for the ecological utilization of the meadow ecosystem. The results of this study deliver a novel perspective on the responses of alpine meadows’ SOC pools, SOC stabilization and nutrients at the aggregate level, as well as vegetation structure, productivity and forage nutritional quality, to manure input rate practices. Moreover, this research offers valuable information for ensuring climate change mitigation and the clean production of alpine meadows in the Qinghai–Tibetan Plateau area of China.
(This article belongs to the Section Plant Ecology)
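
For readers tracking the CMI results above: the carbon management index is conventionally computed from a carbon pool index and a lability index (following Blair et al.); the formulation below states that standard definition, which this study presumably instantiates with its permanganate-oxidizable C fractions.

```latex
% Conventional CMI definition (assumed; the study's exact variant may differ).
\[
\mathrm{CMI} = \mathrm{CPI} \times \mathrm{LI} \times 100, \qquad
\mathrm{CPI} = \frac{C_{\text{total, treatment}}}{C_{\text{total, reference}}}, \qquad
\mathrm{LI} = \frac{L_{\text{treatment}}}{L_{\text{reference}}}, \qquad
L = \frac{C_{\text{labile}}}{C_{\text{non-labile}}}
\]
```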

17 pages, 2144 KiB  
Article
DEPANet: A Differentiable Edge-Guided Pyramid Aggregation Network for Strip Steel Surface Defect Segmentation
by Yange Sun, Siyu Geng, Chengyi Zheng, Chenglong Xu, Huaping Guo and Yan Feng
Algorithms 2025, 18(5), 279; https://doi.org/10.3390/a18050279 - 9 May 2025
Abstract
The steel strip is an important and ideal material for the automotive and aerospace industries due to its superior machinability, cost efficiency, and flexibility. However, surface defects such as inclusions, spots, and scratches can significantly impact product performance and durability. Accurately identifying these defects remains challenging due to the complex texture structures and subtle variations in the material. To tackle this challenge, we propose a Differentiable Edge-guided Pyramid Aggregation Network (DEPANet) that utilizes edge information to improve segmentation performance. DEPANet adopts an end-to-end encoder-decoder framework, where the encoder consists of three key components: a backbone network, a Differentiable Edge Feature Pyramid network (DEFP), and Edge-aware Feature Aggregation Modules (EFAMs). The backbone network is designed to extract overall features from the strip steel surface, while the proposed DEFP utilizes learnable Laplacian operators to extract multiscale edge information of defects. In addition, the proposed EFAMs aggregate the overall features generated by the backbone and the edge information obtained from DEFP using the Convolutional Block Attention Module (CBAM), which combines channel attention and spatial attention mechanisms, to enhance feature expression. Finally, through the decoder, implemented as a Feature Pyramid Network (FPN), the multiscale edge-enhanced features are progressively upsampled and fused to reconstruct high-resolution segmentation maps, enabling precise defect localization and robust handling of defects across various sizes and shapes. DEPANet demonstrates superior segmentation accuracy, edge preservation, and feature representation on the SD-saliency-900 dataset, outperforming other state-of-the-art methods and delivering more precise and reliable defect segmentation.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
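
A learnable Laplacian edge extractor in the spirit of the DEFP above can be sketched as a depthwise convolution initialized with the Laplacian kernel and then trained, keeping edge extraction differentiable and adaptable. Initializing-then-finetuning is an assumption about the "learnable Laplacian operator" phrasing.

```python
import torch
import torch.nn as nn

class LearnableLaplacian(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        with torch.no_grad():                 # start from the Laplacian
            self.dw.weight.copy_(lap.expand(channels, 1, 3, 3))

    def forward(self, x):
        return self.dw(x)  # gradients still flow; the kernel can adapt

edges = LearnableLaplacian(32)(torch.randn(1, 32, 64, 64))
print(edges.shape)  # torch.Size([1, 32, 64, 64])
```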

19 pages, 2647 KiB  
Article
FDI-VSR: Video Super-Resolution Through Frequency-Domain Integration and Dynamic Offset Estimation
by Donghun Lim and Janghoon Choi
Sensors 2025, 25(8), 2402; https://doi.org/10.3390/s25082402 - 10 Apr 2025
Abstract
The increasing adoption of high-resolution imaging sensors across various fields has led to a growing demand for techniques to enhance video quality. Video super-resolution (VSR) addresses this need by reconstructing high-resolution videos from lower-resolution inputs; however, directly applying single-image super-resolution (SISR) methods to video sequences neglects temporal information, resulting in inconsistent and unnatural outputs. In this paper, we propose FDI-VSR, a novel framework that integrates spatiotemporal dynamics and frequency-domain analysis into conventional SISR models without extensive modifications. We introduce two key modules: the Spatiotemporal Feature Extraction Module (STFEM), which employs dynamic offset estimation, spatial alignment, and multi-stage temporal aggregation using residual channel attention blocks (RCABs); and the Frequency–Spatial Integration Module (FSIM), which transforms deep features into the frequency domain to effectively capture global context beyond the limited receptive field of standard convolutions. Extensive experiments on the Vid4, SPMCs, REDS4, and UDM10 benchmarks, supported by detailed ablation studies, demonstrate that FDI-VSR not only surpasses conventional VSR methods but also achieves competitive results compared to recent state-of-the-art methods, with improvements of up to 0.82 dB in PSNR on the SPMCs benchmark and notable reductions in visual artifacts, all while maintaining lower computational complexity and faster inference.
(This article belongs to the Section Sensing and Imaging)
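
Frequency-spatial integration in the spirit of the FSIM above can be sketched compactly: features pass through a real FFT, a 1×1 convolution mixes the stacked real/imaginary parts (a global operation, since every frequency bin sees the whole image), and an inverse FFT returns to the spatial domain with a residual add. The layer details are assumptions.

```python
import torch
import torch.nn as nn

class FreqSpatialBlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * c, 2 * c, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        f = torch.fft.rfft2(x, norm="ortho")        # (b, c, h, w//2 + 1)
        z = torch.cat([f.real, f.imag], dim=1)      # stack real/imag parts
        z = self.mix(z)                             # mix in frequency domain
        real, imag = z.chunk(2, dim=1)
        y = torch.fft.irfft2(torch.complex(real, imag),
                             s=(h, w), norm="ortho")
        return x + y                                # residual fusion

print(FreqSpatialBlock(32)(torch.randn(1, 32, 64, 64)).shape)
```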

25 pages, 4732 KiB  
Article
Analysis of Core–Periphery Structure Based on Clustering Aggregation in the NFT Transfer Network
by Zijuan Chen, Jianyong Yu, Yulong Wang and Jinfang Xie
Entropy 2025, 27(4), 342; https://doi.org/10.3390/e27040342 - 26 Mar 2025
Abstract
With the rise of blockchain technology and the Ethereum platform, non-fungible tokens (NFTs) have emerged as a new class of digital assets. The NFT transfer network exhibits core–periphery structures derived from different partitioning methods, leading to local discrepancies and global diversity. We propose a core–periphery structure characterization method based on Bayesian and stochastic block models (SBMs). This method incorporates prior knowledge to improve the fit of core–periphery structures obtained from various partitioning methods. Additionally, we introduce a locally weighted core–periphery structure aggregation (LWCSA) scheme, which determines local aggregation weights using the minimum description length (MDL) principle. This approach results in a more accurate and representative core–periphery structure. The experimental results indicate that core nodes in the NFT transfer network constitute approximately 2.3–5% of all nodes. Compared to baseline methods, our approach improves the normalized mutual information (NMI) index by 6–10%, demonstrating enhanced structural representation. This study provides a theoretical foundation for further analysis of the NFT market.
(This article belongs to the Special Issue Entropy, Econophysics, and Complexity)
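
The aggregation step described above combines several candidate core–periphery partitions into one. A minimal sketch of weighted aggregation follows: each method's binary core labels enter a weighted vote. Here the weights are supplied directly; in the paper they come from the MDL principle, which this sketch does not implement.

```python
import numpy as np

def aggregate_partitions(partitions: np.ndarray,
                         weights: np.ndarray) -> np.ndarray:
    """partitions: (m, n) binary matrix, m methods x n nodes (1 = core).
    Returns aggregated binary core labels via a weighted vote."""
    score = weights @ partitions / weights.sum()   # weighted core share
    return (score >= 0.5).astype(int)

parts = np.array([[1, 1, 0, 0, 0],    # three candidate partitions
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 0, 1]])
w = np.array([0.5, 0.3, 0.2])         # e.g., MDL-derived weights
print(aggregate_partitions(parts, w))  # [1 1 0 0 0]
```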

36 pages, 34376 KiB  
Article
Fast Fourier Asymmetric Context Aggregation Network: A Controllable Photo-Realistic Clothing Image Synthesis Method Using Asymmetric Context Aggregation Mechanism
by Haopeng Lei, Ying Hu, Mingwen Wang, Meihai Ding, Zhen Li and Guoliang Luo
Appl. Sci. 2025, 15(7), 3534; https://doi.org/10.3390/app15073534 - 24 Mar 2025
Abstract
Clothing image synthesis has emerged as a crucial technology in the fashion domain, enabling designers to rapidly transform creative concepts into realistic visual representations. However, the existing methods struggle to effectively integrate multiple guiding information sources, such as sketches and texture patches, limiting their ability to precisely control the generated content. This often results in issues such as semantic inconsistencies and the loss of fine-grained texture details, which significantly hinders the advancement of this technology. To address these issues, we propose the Fast Fourier Asymmetric Context Aggregation Network (FCAN), a novel image generation network designed to achieve controllable clothing image synthesis guided by design sketches and texture patches. In the FCAN, we introduce the Asymmetric Context Aggregation Mechanism (ACAM), which leverages multi-scale and multi-stage heterogeneous features to achieve efficient global visual context modeling, significantly enhancing the model’s ability to integrate guiding information. Complementing this, the FCAN also incorporates a Fast Fourier Channel Dual Residual Block (FF-CDRB), which utilizes the frequency-domain properties of Fast Fourier Convolution to enhance fine-grained content inference while maintaining computational efficiency. We evaluate the FCAN on the newly constructed SKFashion dataset and the publicly available VITON-HD and Fashion-Gen datasets. The experimental results demonstrate that the FCAN consistently generates high-quality clothing images aligned with the design intentions while outperforming the baseline methods across multiple performance metrics. Furthermore, the FCAN demonstrates superior robustness to varying texture conditions compared to the existing methods, highlighting its adaptability to diverse real-world scenarios. These findings underscore the potential of the FCAN to advance this technology by enabling controllable and high-quality image generation.
(This article belongs to the Section Computing and Artificial Intelligence)
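
Multi-scale global context aggregation, loosely in the spirit of the ACAM above, can be sketched with pyramid pooling: features pooled at several scales are projected, upsampled back, and fused with the input so each position sees global context. This is a generic stand-in, not the authors' asymmetric mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContext(nn.Module):
    def __init__(self, c: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.ModuleList(nn.Conv2d(c, c, 1) for _ in scales)
        self.fuse = nn.Conv2d(c * (len(scales) + 1), c, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        ctx = [x]
        for s, proj in zip(self.scales, self.proj):
            p = F.adaptive_avg_pool2d(x, s)          # coarse global context
            ctx.append(F.interpolate(proj(p), size=(h, w),
                                     mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(ctx, dim=1))

print(MultiScaleContext(32)(torch.randn(1, 32, 64, 64)).shape)
```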

30 pages, 11153 KiB  
Article
GCA2Net: Global-Consolidation and Angle-Adaptive Network for Oriented Object Detection in Aerial Imagery
by Shenbo Zhou, Zhenfei Liu, Hui Luo, Guanglin Qi, Yunfeng Liu, Haorui Zuo, Jianlin Zhang and Yuxing Wei
Remote Sens. 2025, 17(6), 1077; https://doi.org/10.3390/rs17061077 - 19 Mar 2025
Abstract
Enhancing the detection capabilities of rotated objects in aerial imagery is a vital aspect of the burgeoning field of remote sensing technology. The objective is to identify and localize objects oriented in arbitrary directions within the image. In recent years, the capacity for rotated object detection has seen continuous improvement. However, existing methods largely employ traditional backbone networks, where static convolutions excel at extracting features from objects oriented at a specific angle. In contrast, most objects in aerial imagery are oriented in various directions. This poses a challenge for backbone networks to extract high-quality features from objects of different orientations. In response to the challenge above, we propose the Dynamic Rotational Convolution (DRC) module. By integrating it into the ResNet backbone network, we form the backbone network presented in this paper, DRC-ResNet. Within the proposed DRC module, rotation parameters are predicted by the Adaptive Routing Unit (ARU), employing a data-driven approach to adaptively rotate convolutional kernels to extract features from objects oriented in various directions within different images. Building upon this foundation, we introduce a conditional computation mechanism that enables convolutional kernels to more flexibly and efficiently adapt to the dramatic angular changes of objects within images. To better integrate key information within images after obtaining features rich in angular details, we propose the Multi-Order Spatial-Channel Aggregation Block (MOSCAB) module, which is aimed at enhancing the integration capacity of key information in images through selective focusing and global information aggregation. Meanwhile, considering the significant semantic gap between features at different levels during the feature pyramid fusion process, we propose a new multi-scale fusion network named AugFPN+. This network reduces the semantic gap between different levels before feature fusion, achieves more effective feature integration, and minimizes the spatial information loss of small objects to the greatest extent possible. Experiments conducted on popular benchmark datasets DOTA-V1.0 and HRSC2016 demonstrate that our proposed model has achieved mAP scores of 77.56% and 90.4%, respectively, significantly outperforming current rotated detection models.
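
Dynamically rotated convolution kernels in the spirit of the DRC module above can be sketched as follows: a tiny routing head predicts one angle per input (standing in for the ARU), the 3×3 kernel is resampled on a rotated grid, and the rotated kernel is applied. Batch size 1 keeps the sketch simple; the real module's routing and conditional computation are not modeled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicRotConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.1)
        self.route = nn.Linear(c_in, 1)   # predicts a rotation angle

    def forward(self, x):                 # x: (1, c_in, H, W)
        theta = torch.tanh(self.route(x.mean(dim=(2, 3)))) * torch.pi
        cos, sin = torch.cos(theta[0, 0]), torch.sin(theta[0, 0])
        # Affine matrix that rotates the kernel sampling lattice.
        mat = torch.stack([torch.stack([cos, -sin, torch.zeros(())]),
                           torch.stack([sin, cos, torch.zeros(())])])
        grid = F.affine_grid(mat.unsqueeze(0).expand(len(self.weight), 2, 3),
                             self.weight.shape, align_corners=False)
        w_rot = F.grid_sample(self.weight, grid, align_corners=False)
        return F.conv2d(x, w_rot, padding=1)

print(DynamicRotConv(16, 32)(torch.randn(1, 16, 40, 40)).shape)
```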

15 pages, 3474 KiB  
Article
New Underwater Image Enhancement Algorithm Based on Improved U-Net
by Sisi Zhu, Zaiming Geng, Yingjuan Xie, Zhuo Zhang, Hexiong Yan, Xuan Zhou, Hao Jin and Xinnan Fan
Water 2025, 17(6), 808; https://doi.org/10.3390/w17060808 - 12 Mar 2025
Abstract
(1) Objective: As light propagates through water, it undergoes significant attenuation and scattering, causing underwater images to experience color distortion and exhibit a bluish or greenish tint. Additionally, suspended particles in the water further degrade image quality. This paper proposes an improved U-Net network model for underwater image enhancement to generate high-quality images. (2) Method: Instead of incorporating additional complex modules into enhancement networks, we opted to simplify the classic U-Net architecture. Specifically, we replaced the standard convolutions in U-Net with our self-designed efficient basic block, which integrates a simplified channel attention mechanism. Moreover, we employed Layer Normalization to enhance the capability of training with a small number of samples and used the GELU activation function to achieve additional benefits in image denoising. Furthermore, we introduced the SK fusion module into the network to aggregate feature information, replacing traditional concatenation operations. In the experimental section, we used the “Underwater ImageNet” dataset from “Enhancing Underwater Visual Perception (EUVP)” for training and testing. EUVP, established by Islam et al., is a large-scale dataset comprising paired images (high-quality clear images and low-quality blurry images) as well as unpaired underwater images. (3) Results: We compared our proposed method with several high-performing traditional algorithms and deep learning-based methods. The traditional algorithms include He, UDCP, ICM, and ULAP, while the deep learning-based methods include CycleGAN, UGAN, UGAN-P, and FUnIE-GAN. The results demonstrate that our algorithm exhibits outstanding competitiveness on the Underwater ImageNet dataset. Compared to the currently optimal lightweight model, FUnIE-GAN, our method reduces the number of parameters by 0.969 times and decreases floating-point operations (FLOPs) by more than half. In terms of image quality, our approach achieves a minimal UCIQE reduction of only 0.008 while improving the NIQE by 0.019 compared to state-of-the-art (SOTA) methods. Finally, extensive ablation experiments validate the feasibility of our designed network. (4) Conclusions: The underwater image enhancement algorithm proposed in this paper significantly reduces model size and accelerates inference speed while maintaining high processing performance, demonstrating strong potential for practical applications.
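
An efficient basic block of the kind described above can be sketched as normalization, a convolution with GELU, and a simplified channel attention (global average pool followed by a single 1×1 convolution, with no sigmoid bottleneck), wrapped in a residual connection. The exact composition in the paper may differ; this follows the common simplified-attention pattern.

```python
import torch
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)        # LayerNorm stand-in (1 group)
        self.conv = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.GELU())
        self.sca = nn.Sequential(             # simplified channel attention
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1))

    def forward(self, x):
        y = self.conv(self.norm(x))
        return x + y * self.sca(y)            # residual connection

print(SimpleBlock(32)(torch.randn(1, 32, 64, 64)).shape)
```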
